Towards the Memorization Effect of Neural Networks in Adversarial Training

阿新 • Published: 2022-04-03


Xu H., Liu X., Wang W., Jain A. K., Tang J., Ding W., Wu Z. and Liu Z. Towards the memorization effect of neural networks in adversarial training. In International Conference on Learning Representations (ICLR), 2022.

Overview

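The authors divide samples into two kinds: typical and atypical (the latter can be understood as rare, hard samples that resemble other classes). A network can separate typical samples using semantic features, whereas atypical samples usually have to be memorized. In standard training, memorizing atypical samples does not hurt the network's generalization. In adversarial training, however, trying to memorize atypical samples tends to lower natural accuracy, so the authors propose Benign Adversarial Training (BAT) to treat these atypical samples more carefully.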

Main content

Typical and atypical samples

First, define the 'memorization value' of algorithm $\mathcal{A}$ trained on dataset $\mathcal{D}$ at sample $x_i$:

\[\tag{1} \mathrm{mem}(\mathcal{A}, \mathcal{D}, x_i) =\mathop{\mathbf{Pr.}} \limits_{F \leftarrow \mathcal{A}(\mathcal{D})} (F(x_i) = y_i) -\mathop{\mathbf{Pr.}} \limits_{F \leftarrow \mathcal{A}(\mathcal{D} \setminus x_i)} (F(x_i) = y_i). \]

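A large value means the network must memorize this sample in order to classify it correctly (in other words, the sample's features lie outside the distribution of the rest of the dataset).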
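Eq. (1) can be estimated by Monte Carlo: train many models with and without $x_i$ and compare how often each group classifies $x_i$ correctly. The sketch below is only an illustration under a loud assumption: the "algorithm" $\mathcal{A}$ is a toy 1-nearest-neighbour classifier fitted on a random subsample (standing in for the randomness of network training), and `nn_learner`, `estimate_mem` and `keep_prob` are hypothetical names, not the estimator used in the paper.

```python
import random


def nn_learner(data, rng, keep_prob=0.8):
    """Toy stochastic algorithm A (hypothetical): 1-NN on a random subsample.

    Each (x, y) pair is kept with probability keep_prob; this subsampling is the
    only source of randomness, standing in for random initialization / SGD noise.
    """
    subset = []
    while not subset:  # resample until at least one point survives
        subset = [(x, y) for x, y in data if rng.random() < keep_prob]

    def predict(query):
        # label of the closest kept training point (squared Euclidean distance)
        _, label = min(subset, key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)))
        return label

    return predict


def estimate_mem(data, i, n_trials=2000, seed=0):
    """Monte Carlo estimate of mem(A, D, x_i) in Eq. (1)."""
    rng = random.Random(seed)
    x_i, y_i = data[i]
    data_without_i = data[:i] + data[i + 1:]
    acc_with = sum(nn_learner(data, rng)(x_i) == y_i for _ in range(n_trials)) / n_trials
    acc_without = sum(nn_learner(data_without_i, rng)(x_i) == y_i for _ in range(n_trials)) / n_trials
    return acc_with - acc_without


data = [
    ((0.0, 0.0), 0), ((0.0, 1.0), 0), ((1.0, 0.0), 0), ((1.0, 1.0), 0),
    ((10.0, 10.0), 1), ((10.0, 11.0), 1), ((11.0, 10.0), 1), ((11.0, 11.0), 1),
    ((0.5, -3.0), 1),  # atypical: labelled 1 but lies next to the class-0 cluster
]
print(estimate_mem(data, 0))  # typical point: close to 0
print(estimate_mem(data, 8))  # atypical point: close to keep_prob = 0.8
```

On this toy data the typical point at the origin is classified correctly by its same-class neighbours whether or not it is in the training set, so its value is near 0. The atypical point is classified correctly only when it is itself in the training subset, so its value is close to `keep_prob`.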

The above applies to training samples. For a test sample $(x_j', y_j')$ and a training sample $(x_i, y_i)$, the influence of $x_i$ on $x_j'$ is:

\[\tag{2} \mathrm{infl}(\mathcal{A}, \mathcal{D}, x_i, x_j') =\mathop{\mathbf{Pr.}} \limits_{F \leftarrow \mathcal{A}(\mathcal{D})} (F(x_j') = y_j') -\mathop{\mathbf{Pr.}} \limits_{F \leftarrow \mathcal{A}(\mathcal{D} \setminus x_i)} (F(x_j') = y_j'). \]
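Eq. (2) has the same structure as Eq. (1) except that accuracy is measured on a test point; Eq. (1) is the special case $x_j' = x_i$. The sketch below estimates it by Monte Carlo under the same illustrative assumption: a toy 1-nearest-neighbour learner on random subsamples replaces network training, and all function names and parameters are hypothetical.

```python
import random


def nn_learner(data, rng, keep_prob=0.8):
    """Toy stochastic algorithm A (hypothetical): 1-NN on a random subsample of `data`."""
    subset = []
    while not subset:  # resample until at least one point survives
        subset = [(x, y) for x, y in data if rng.random() < keep_prob]

    def predict(query):
        # label of the closest kept training point (squared Euclidean distance)
        _, label = min(subset, key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)))
        return label

    return predict


def estimate_infl(data, i, x_test, y_test, n_trials=2000, seed=0):
    """Monte Carlo estimate of infl(A, D, x_i, x_j') in Eq. (2)."""
    rng = random.Random(seed)
    data_without_i = data[:i] + data[i + 1:]
    acc_with = sum(nn_learner(data, rng)(x_test) == y_test for _ in range(n_trials)) / n_trials
    acc_without = sum(nn_learner(data_without_i, rng)(x_test) == y_test for _ in range(n_trials)) / n_trials
    return acc_with - acc_without


data = [
    ((0.0, 0.0), 0), ((0.0, 1.0), 0), ((1.0, 0.0), 0), ((1.0, 1.0), 0),
    ((10.0, 10.0), 1), ((10.0, 11.0), 1), ((11.0, 10.0), 1), ((11.0, 11.0), 1),
    ((0.5, -3.0), 1),  # atypical training point
]
x_test, y_test = (0.6, -2.9), 1  # test point right next to the atypical training point
print(estimate_infl(data, 8, x_test, y_test))  # large: only the atypical point explains it
print(estimate_infl(data, 0, x_test, y_test))  # near 0: a typical point barely matters
```

Passing `x_test, y_test = data[i]` turns this into the memorization estimate of Eq. (1).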

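Given a threshold $t$, define the atypical training and test samples (writing $\mathrm{mem}(x_i)$ and $\mathrm{infl}(x_i, x_j')$ for short):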

\[\mathcal{D}_{\mathrm{atyp}} := \{x_i \in \mathcal{D} \mid \mathrm{mem}(x_i) > t\}, \\ \mathcal{D}_{\mathrm{atyp}}' := \{x_j' \in \mathcal{D}' \mid \exists\, x_i \in \mathcal{D}_{\mathrm{atyp}} : \mathrm{infl}(x_i, x_j') > t\}. \]
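Once memorization and influence values have been estimated, both sets are obtained by thresholding. This sketch assumes the scores are already stored in dictionaries (a hypothetical input format) and reads the test-set condition as "strongly influenced by at least one atypical training sample"; demanding high influence from every atypical training sample would leave the set essentially empty. The scores in the example are made up for illustration.

```python
def split_atypical(mem, infl, t=0.15):
    """Threshold memorization / influence scores into atypical sets.

    mem:  {train_id: mem(x_i)}
    infl: {(train_id, test_id): infl(x_i, x_j')}
    """
    atyp_train = {i for i, value in mem.items() if value > t}
    # a test sample is atypical if some atypical training sample influences it strongly
    atyp_test = {j for (i, j), value in infl.items() if i in atyp_train and value > t}
    return atyp_train, atyp_test


# illustrative scores only, not taken from the paper
mem = {"a": 0.02, "b": 0.60, "c": 0.30}
infl = {("b", "u"): 0.40, ("c", "v"): 0.10, ("a", "w"): 0.50}
train_set, test_set = split_atypical(mem, infl)
print(sorted(train_set), sorted(test_set))  # ['b', 'c'] ['u']
```

Test sample `w` is not marked atypical even though its influence score is high, because the training sample that influences it is typical.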

Poor generalization on atypical samples

The authors set $t = 0.15$ and train on the full dataset. They observe:

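Both natural accuracy and robustness on the training set are very high, which means that ResNet18 and WRN28 both have sufficient representational capacity;
As training accuracy rises, natural accuracy on $\mathcal{D}_{\mathrm{atyp}}'$ rises as well, but robustness on it barely changes, which suggests that memorizing atypical samples does not make the model robust on them.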

Conflict between typical and atypical samples in robustness

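Here the authors start from the typical samples and gradually add atypical samples. They find that forcing the network to memorize these atypical samples actually degrades its performance on typical data. The authors argue that because atypical samples are few, lie close to other classes, and are hard to distinguish, trying to memorize them leads the network to learn worse features.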

Benign Adversarial Training (BAT)

The authors address this problem with sample reweighting and a discrimination loss.

cost-sensitive reweighting strategy

\[w_i = \left \{ \begin{array}{ll} \exp(-\alpha \cdot q(x_i^{adv})) & \text{if } \mathrm{mem}(x_i) > t \text{ and } \mathop{\arg\max}_k F_k(x_i^{adv}) \neq y_i, \\ 1 & \text{otherwise}, \end{array} \right . \]

where

\[q(x_i^{adv}) = \max_{k \neq y_i} F_k(x_i^{adv}). \]
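The reweighting rule translates directly into a small function. This is a sketch under the assumption that $F_k$ denotes softmax probabilities (so $q \in [0, 1]$); `bat_weight` and its argument names are hypothetical, and the default `t=0.15` simply reuses the threshold quoted in the analysis section.

```python
import math


def bat_weight(mem_i, probs_adv, y_i, alpha=1.0, t=0.15):
    """Cost-sensitive weight w_i for one training sample.

    mem_i:     memorization value mem(x_i)
    probs_adv: class probabilities F(x_i^adv) on the adversarial example (assumed softmax)
    y_i:       true label
    """
    predicted = max(range(len(probs_adv)), key=lambda k: probs_adv[k])
    if mem_i > t and predicted != y_i:
        # q = highest probability assigned to any wrong class
        q = max(p for k, p in enumerate(probs_adv) if k != y_i)
        return math.exp(-alpha * q)
    return 1.0


print(round(bat_weight(0.50, [0.7, 0.2, 0.1], y_i=1), 3))  # atypical, misclassified: exp(-0.7) -> 0.497
print(bat_weight(0.05, [0.7, 0.2, 0.1], y_i=1))           # typical sample: 1.0
print(bat_weight(0.50, [0.1, 0.8, 0.1], y_i=1))           # atypical but correct: 1.0
```

A larger `alpha` shrinks the weight of confidently misclassified atypical samples more aggressively; typical samples always keep weight 1.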

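So an atypical sample whose adversarial example is misclassified is down-weighted, and the more confident the wrong prediction, the smaller its weight. The classification loss is then: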

\[\mathop{\arg \min} \limits_F \frac{1}{\sum_{i}w_i} \sum_{i} [w_i \cdot \mathcal{L}(F(x_i^{adv}), y_i)]. \]
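The normalized weighted objective above can be computed as follows. The notes do not name $\mathcal{L}$; this sketch assumes it is cross-entropy on predicted probabilities, and `weighted_ce` is a hypothetical name.

```python
import math


def weighted_ce(probs, labels, weights):
    """(1 / sum_i w_i) * sum_i w_i * CE(F(x_i^adv), y_i), with CE = -log p_y (assumed loss)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    loss = sum(w * -math.log(p[y]) for p, y, w in zip(probs, labels, weights))
    return loss / total


probs = [[0.5, 0.5], [0.25, 0.75]]  # F(x_i^adv) for two adversarial examples
labels = [0, 1]
print(weighted_ce(probs, labels, [1.0, 1.0]))  # plain mean of the two CE terms
print(weighted_ce(probs, labels, [1.0, 0.0]))  # second sample ignored, roughly log 2
```

Dividing by $\sum_i w_i$ rather than by the batch size turns the sum into a weighted average, so down-weighting some samples does not shrink the overall scale of the loss.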

discrimination loss

\[\mathcal{L}_{DL}(F) = \mathop{\mathbb{E}} \limits_{(x_i, y_i), (x_j, y_j), \{(x_b,y_b)\}_{b=1}^B} \Big[-\log \frac{e^{h^T(x_i^{adv}) h(x_j^{adv}) / \tau}}{\sum_{b=1}^B e^{h^T(x_i^{adv}) h(x_b^{adv}) / \tau}} \Big], \]

where $\tau$ is a temperature and the samples satisfy

\[y_i = y_j, \\ y_b \not= y_i, \: b=1,2,\cdots, B, \\ \mathrm{mem}(x_i), \mathrm{mem}(x_j), \mathrm{mem}(x_b) < t. \]

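That is, for typical samples, the loss pulls the features $h(x_i)$ (from the penultimate layer) of same-class samples together and pushes those of different classes apart.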
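A single term inside the expectation of $\mathcal{L}_{DL}$ can be sketched as below, written as log-sum-exp minus the positive logit for numerical stability. It follows the formula exactly as given in these notes, with only the $B$ negatives in the denominator; the widely used InfoNCE/SupCon form also adds the positive pair to the denominator, so check the paper before relying on either variant. The temperature default, the plain-list feature vectors and the function names are assumptions for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def discrimination_loss(h_anchor, h_pos, h_negs, tau=0.5):
    """One term of L_DL: -log( exp(h_i.h_j / tau) / sum_b exp(h_i.h_b / tau) ).

    h_anchor, h_pos: penultimate-layer features of two same-class typical samples.
    h_negs:          features of B typical samples from other classes.
    """
    pos_logit = dot(h_anchor, h_pos) / tau
    neg_logits = [dot(h_anchor, h_b) / tau for h_b in h_negs]
    # log-sum-exp over the negatives, shifted by the max for numerical stability
    m = max(neg_logits)
    log_denominator = m + math.log(sum(math.exp(s - m) for s in neg_logits))
    return log_denominator - pos_logit


# anchor aligned with its positive and away from both negatives
good = discrimination_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]], tau=1.0)
# anchor aligned with a negative instead
bad = discrimination_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0], [-1.0, 0.0]], tau=1.0)
print(good < bad)  # True
```

Averaging this term over sampled tuples $(x_i, x_j, \{x_b\})$ of typical samples gives a Monte Carlo estimate of $\mathcal{L}_{DL}$. Because the positive pair is absent from the denominator, the value can be negative, as it is for `good` here.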

The final objective is:

\[\mathop{\arg \min} \limits_F \Big\{ \frac{1}{\sum_{i}w_i} \sum_{i} [w_i \cdot \mathcal{L}(F(x_i^{adv}), y_i)] + \beta \cdot \mathcal{L}_{DL}(F) \Big\}. \]

Experimental settings:

$\alpha \in \{1, 2\}$, $\beta = 0.2$;
160 epochs, momentum = 0.9, weight decay = 5e-4;
lr = 0.1, multiplied by 0.1 at epochs 80 and 120;
perturbation budget: CIFAR $8/255$; TinyImageNet $4/255$.
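The optimization settings can be collected into a small config with an explicit step schedule. This is a sketch, assuming 0-indexed epochs and a schedule that is updated once per epoch; `learning_rate` and `CONFIG` are hypothetical names, and only one of the two listed $\alpha$ values is shown.

```python
def learning_rate(epoch, base_lr=0.1, milestones=(80, 120), gamma=0.1):
    """Step schedule: start at 0.1 and multiply by 0.1 at epochs 80 and 120 (0-indexed epochs assumed)."""
    lr = base_lr
    for milestone in milestones:
        if epoch >= milestone:
            lr *= gamma
    return lr


CONFIG = {
    "epochs": 160,
    "momentum": 0.9,
    "weight_decay": 5e-4,
    "alpha": 1.0,  # the notes list alpha = 1 or 2
    "beta": 0.2,
    "epsilon": {"CIFAR": 8 / 255, "TinyImageNet": 4 / 255},
}

for epoch in (0, 79, 80, 119, 120, 159):
    print(epoch, learning_rate(epoch))
```

In a PyTorch setup (the notes do not name a framework), `torch.optim.SGD(params, lr=0.1, momentum=0.9, weight_decay=5e-4)` combined with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[80, 120], gamma=0.1)`, stepped once per epoch, yields the same learning-rate sequence.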
