Understanding and Improving Fast Adversarial Training

阿新 • Published: 2021-10-23

Overview
Main content
The role of the Random Step
Linearity
gradient alignment
Code

Andriushchenko M. and Flammarion N. Understanding and improving fast adversarial training. In Advances in Neural Information Processing Systems (NeurIPS), 2020.

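Overview

This paper mainly investigates:

why plain FGSM fails to improve robustness;
why FGSM-RS (i.e. FGSM with an added random perturbation) improves robustness more reliably;
a regularization method that improves robustness even without the random perturbation.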

Main content

Adversarial training is the most effective defense to date. Its idea is:

\[\min_{\theta} \: \mathbb{E}_{(x, y) \sim D} [\max_{\|\delta\| \le \epsilon} \ell(x + \delta, y ;\theta) ]. \]

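The inner maximization is usually solved approximately with PGD. Such multi-step methods are time-consuming, however, so some recent work builds on FGSM instead. Its finding is that adding a random perturbation before the FGSM step effectively improves the network's robustness: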

\[\delta_{FGSM-RS} := \Pi_{[-\epsilon, \epsilon]^d} [\eta + \alpha \mathrm{sign} (\nabla_x \ell(x + \eta, y; \theta))], \: \eta \sim \mathcal{U}([-\epsilon, \epsilon]^d). \]
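The definition above can be sketched in plain Python. This is only an illustration under assumptions: `grad_fn` is a hypothetical callable that returns \(\nabla_x \ell\) for a fixed label and fixed parameters, inputs are plain lists, and the linear toy loss in the demo is made up.

```python
import random


def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def clip(v, lo, hi):
    """Project a scalar onto the interval [lo, hi]."""
    return max(lo, min(hi, v))


def fgsm_delta(grad_fn, x, eps):
    """Plain FGSM: step of size eps along the sign of the input gradient at x."""
    return [eps * sign(g) for g in grad_fn(x)]


def fgsm_rs_delta(grad_fn, x, eps, alpha, rng):
    """FGSM-RS: uniform random start eta, one signed step of size alpha,
    then projection back onto the box [-eps, eps]^d."""
    eta = [rng.uniform(-eps, eps) for _ in x]
    g = grad_fn([xi + ei for xi, ei in zip(x, eta)])
    return [clip(ei + alpha * sign(gi), -eps, eps) for ei, gi in zip(eta, g)]


if __name__ == "__main__":
    # Toy linear loss l(x) = <w, x>, whose input gradient is w everywhere (assumption).
    w = [1.0, -2.0, 0.5]
    toy_grad = lambda x: w
    print(fgsm_delta(toy_grad, [0.0, 0.0, 0.0], eps=0.1))
    print(fgsm_rs_delta(toy_grad, [0.0, 0.0, 0.0], eps=0.1, alpha=0.125,
                        rng=random.Random(0)))
```

In a real training loop the gradient would come from an autodiff framework; the sketch only shows the order of operations (random start, signed step, projection).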

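However, the authors find that the range of \(\epsilon\) over which this method provides robustness is very narrow.

Like FGSM-AT, its robustness suddenly collapses at some point, so it does not scale well.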

The role of the Random Step

Why does RS help to some extent? The authors argue that once RS is added, \(\epsilon\) in some sense becomes 'smaller'.

The authors derive:

\[\mathbb{E}_{\eta} [\|\delta_{FGSM-RS}(\eta)\|_2] \le \sqrt{d}\sqrt{-\frac{1}{6\epsilon}\alpha^3 + \frac{1}{2}\alpha^2 + \frac{1}{3}\epsilon^2} \in [\frac{1}{\sqrt{3}}\sqrt{d}\epsilon, \sqrt{d}\epsilon] \le \|\delta_{FGSM}\|_2 = \sqrt{d}\epsilon. \]
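The bound can be checked numerically. The sketch below is a Monte Carlo estimate under a simplifying assumption that is mine, not the paper's: the gradient sign in each coordinate is drawn independently of \(\eta\). The stated interval for the bound holds for \(\alpha \in [0, 2\epsilon]\), where the expression under the square root is non-decreasing in \(\alpha\); the concrete values of `d`, `eps` and `alpha` are arbitrary.

```python
import math
import random


def rs_bound(d, eps, alpha):
    """Right-hand side of the bound on E[||delta_FGSM-RS||_2]."""
    return math.sqrt(d) * math.sqrt(
        -alpha ** 3 / (6 * eps) + alpha ** 2 / 2 + eps ** 2 / 3
    )


def simulate_rs_norm(d, eps, alpha, trials, seed=0):
    """Return (mean L2 norm, root-mean-square L2 norm) of sampled FGSM-RS
    perturbations, with gradient signs drawn independently of eta (assumption)."""
    rng = random.Random(seed)
    norms = []
    for _ in range(trials):
        sq = 0.0
        for _ in range(d):
            eta = rng.uniform(-eps, eps)
            s = rng.choice((-1.0, 1.0))
            delta = max(-eps, min(eps, eta + alpha * s))
            sq += delta * delta
        norms.append(math.sqrt(sq))
    mean_norm = sum(norms) / trials
    rms_norm = math.sqrt(sum(n * n for n in norms) / trials)
    return mean_norm, rms_norm


if __name__ == "__main__":
    d, eps = 100, 8 / 255
    alpha = 1.25 * eps
    mean_norm, rms_norm = simulate_rs_norm(d, eps, alpha, trials=2000)
    print("empirical mean norm:", mean_norm)
    print("bound:              ", rs_bound(d, eps, alpha))
    print("FGSM norm:          ", math.sqrt(d) * eps)
```

The empirical mean norm should sit slightly below the bound (the gap is Jensen's inequality), and both stay below the FGSM norm \(\sqrt{d}\epsilon\).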

In particular, the authors tried FGSM with a smaller \(\epsilon\) (and without RS) and found that it performs on par with the RS variant.

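Linearity

Next, the authors present their own explanation of why FGSM behaves so anomalously.
They argue that early in training FGSM solves the inner maximization fairly accurately, but that it becomes inaccurate as training proceeds. The reason, in their view, is that \(\ell(x;\theta)\) is no longer so linear in \(x\).
Recall that FGSM is exactly the optimal solution in the linear case: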

\[\delta_{FGSM} = \arg \max_{\|\delta\|_{\infty} \le \epsilon} \langle \nabla_x \ell(x, y;\theta),\delta \rangle, \]
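As a small sanity check of this statement, the sketch below compares \(\epsilon\,\mathrm{sign}(g)\) against a brute-force search over all corners of the \(\ell_\infty\) box (a linear function on a box attains its maximum at a corner). The gradient vector `g` is a made-up example.

```python
import itertools


def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def fgsm_from_grad(g, eps):
    """Closed-form maximizer of <g, delta> subject to ||delta||_inf <= eps."""
    return [eps * sign(gi) for gi in g]


def linear_gain(g, delta):
    """Value of the linearized objective <g, delta>."""
    return sum(gi * di for gi, di in zip(g, delta))


def best_corner_gain(g, eps):
    """Brute-force maximum of <g, delta> over all 2^d corners of the box."""
    corners = itertools.product((-eps, eps), repeat=len(g))
    return max(linear_gain(g, c) for c in corners)


if __name__ == "__main__":
    g = [0.5, -2.0, 1.5]  # hypothetical input gradient
    eps = 0.1
    print(linear_gain(g, fgsm_from_grad(g, eps)), best_corner_gain(g, eps))
```

Both values equal \(\epsilon \|g\|_1\), which is why FGSM is exact only as long as the loss is well approximated by its linearization inside the ball.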

When \(\ell\) is not so linear inside the \(\epsilon\)-ball, this solution is no longer good. How linear the loss is can be measured by the following quantity:

\[\mathbb{E}_{(x, y) \sim D, \eta \sim \mathcal{U}([-\epsilon, \epsilon]^d)} [\cos(\nabla_x \ell(x, y;\theta), \nabla_x \ell(x + \eta, y; \theta))], \]

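As the authors' plot of this quantity shows, standard FGSM and FGSM-RS become increasingly locally non-linear during training, so their solutions to the inner maximization get worse and worse.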
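The alignment measure can be estimated by sampling \(\eta\). The sketch below is a minimal illustration for a single input, assuming a hypothetical `grad_fn` that returns \(\nabla_x \ell\) for a fixed label and parameters; the two toy losses are made up to contrast a linear loss with a strongly curved one.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        raise ValueError("cosine is undefined for a zero gradient")
    return dot / (nu * nv)


def gradient_alignment(grad_fn, x, eps, n_samples, rng):
    """Average cosine between the gradient at x and at x + eta,
    with eta drawn uniformly from [-eps, eps]^d."""
    g0 = grad_fn(x)
    total = 0.0
    for _ in range(n_samples):
        eta = [rng.uniform(-eps, eps) for _ in x]
        total += cosine(g0, grad_fn([xi + ei for xi, ei in zip(x, eta)]))
    return total / n_samples


def linear_grad(x):
    """Gradient of the toy loss l(x) = <w, x> with w = (1, -2, 0.5)."""
    return [1.0, -2.0, 0.5]


def curved_grad(x):
    """Gradient of the toy loss l(x) = sum_i sin(20 * x_i) / 20."""
    return [math.cos(20.0 * xi) for xi in x]


if __name__ == "__main__":
    rng = random.Random(0)
    x = [0.0, 0.0, 0.0]
    print("linear loss:", gradient_alignment(linear_grad, x, 0.1, 200, rng))
    print("curved loss:", gradient_alignment(curved_grad, x, 0.1, 200, rng))
```

For the linear loss the alignment is 1 up to rounding; for the curved loss it drops below 1, which is the regime in which a single signed gradient step stops being a good solution of the inner maximization.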

gradient alignment

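The solution proposed in the paper is to use the quantity above as a regularization term.
Personally, I find this regularizer more interesting than earlier ideas that try to make the gradients small (it is not limited to smoothness).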
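Written out in the notation of the alignment measure above, the regularizer penalizes one minus the cosine. The form below is my understanding of the paper rather than something spelled out in these notes; \(\lambda\) is a weighting hyperparameter, and combining the term with the FGSM training loss is likewise an assumption:

\[\Omega(\theta) := \mathbb{E}_{(x, y) \sim D, \eta \sim \mathcal{U}([-\epsilon, \epsilon]^d)} [1 - \cos(\nabla_x \ell(x, y;\theta), \nabla_x \ell(x + \eta, y; \theta))], \]

\[\min_{\theta} \: \mathbb{E}_{(x, y) \sim D} [\ell(x + \delta_{FGSM}, y ;\theta)] + \lambda \Omega(\theta). \]

When the loss is locally linear the two gradients coincide and \(\Omega(\theta) = 0\), so the penalty only becomes active once the gradients inside the \(\epsilon\)-ball start to disagree.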

Code

Original code
