
Page 1:

by learning the Distributions of Adversarial Examples

Boqing Gong
Joint work with Yandong Li, Lijun Li, Liqiang Wang, & Tong Zhang

Published in ICML 2019

Pages 2-3:

Intriguing properties of deep neural networks (DNNs)

Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2014). Intriguing properties of neural networks. ICLR.
Goodfellow, I. J., Shlens, J., & Szegedy, C. (2015). Explaining and harnessing adversarial examples. ICLR.

Page 4:

Projected gradient descent (PGD) attack

Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2017). Towards deep learning models resistant to adversarial attacks. arXiv:1706.06083.
Kurakin, A., Goodfellow, I., & Bengio, S. (2016). Adversarial machine learning at scale. arXiv:1611.01236.
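
For concreteness, here is a minimal sketch of an ℓ∞ PGD attack in PyTorch; the model, loss, and hyperparameters are placeholder choices rather than the exact settings discussed in the talk.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Minimal l_inf PGD: repeatedly ascend the loss along the sign of the
    gradient, then project back into the eps-ball around the clean input x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()            # gradient ascent step
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)   # project to eps-ball
        x_adv = x_adv.clamp(0.0, 1.0)                           # keep valid pixel range
    return x_adv
```

Note that this step needs white-box access to gradients; the black-box attacks discussed later must get by with queries only.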

Page 5:

Intriguing results (1)

~100% attack success rates on CIFAR10 & ImageNet


Page 6:

Intriguing results (2)

Page 7:

Intriguing results (2)

Adversarial examples generalize between different DNNs

E.g., AlexNet vs. InceptionV3

Page 8:

Intriguing results (3)

A universal adversarial perturbation

Moosavi-Dezfooli, S. M., Fawzi, A., Fawzi, O., & Frossard, P. (2017). Universal adversarial perturbations. CVPR.

Page 9:

In a nutshell, white-box adversarial attacks can

Fool different DNNs for almost all test examples

Most data points lie near the classification boundaries.

Fool different DNNs by the same adversarial examples

The classification boundaries of various DNNs are close.

Fool different DNNs by a single universal perturbation

Most examples can be turned adversarial by moving them along the same direction by the same amount.
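
To make the last point above concrete, applying a universal perturbation means adding one fixed vector v to every input; a small illustrative sketch (computing v itself, e.g. as in Moosavi-Dezfooli et al., is not shown):

```python
import numpy as np

def apply_universal_perturbation(images, v, eps=10/255):
    """Add the same perturbation v to every image in the batch: one shared
    direction, one shared magnitude, clipped to the eps-ball and pixel range."""
    v = np.clip(v, -eps, eps)
    return np.clip(images + v, 0.0, 1.0)
```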

Page 10:

However, white-box adversarial attacks can

Not apply to most real-world scenarios

Not work when the network architecture is unknown

Not work when the weights are unknown

Not work when querying the network is prohibitive (e.g., too costly)

Page 11:

Black-box attacks

[Figure: an input image and the target model's output scores, e.g. Panda: 0.88493, Indri: 0.00878, Red Panda: 0.00317; a black-box attacker sees only these outputs.]

Substitute attack (Papernot et al., 2017)

Decision-based (Brendel et al., 2017)

Boundary-tracing (Cheng et al., 2019)

Zeroth-order (Chen et al., 2017)

Natural evolution strategies (Ilyas et al., 2018)
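
As one illustration of how the score-based methods in this list avoid gradients, here is a hedged sketch of a finite-difference (zeroth-order) gradient estimate in the spirit of Chen et al. (2017); the coordinate sampling, step size, and loss function are illustrative assumptions, not the published algorithm.

```python
import numpy as np

def zeroth_order_gradient(loss_fn, x, num_coords=128, h=1e-4):
    """Estimate the gradient of a black-box loss by symmetric finite differences
    over a random subset of pixel coordinates (all other entries stay zero)."""
    grad = np.zeros_like(x)
    flat_x, flat_grad = x.reshape(-1), grad.reshape(-1)
    coords = np.random.choice(flat_x.size, size=min(num_coords, flat_x.size),
                              replace=False)
    for i in coords:
        e = np.zeros_like(flat_x)
        e[i] = h
        # two model queries per coordinate: f(x + h*e_i) and f(x - h*e_i)
        flat_grad[i] = (loss_fn((flat_x + e).reshape(x.shape))
                        - loss_fn((flat_x - e).reshape(x.shape))) / (2 * h)
    return grad
```

At image dimensionality this costs two queries per sampled coordinate, which already hints at the query-budget and curse-of-dimensionality issues raised on the next slide.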

Page 12:

Drawbacks: bad local optima, non-smooth optimization, the curse of dimensionality, defense-specific gradient estimation, etc.

Existing attacks search for "the" adversarial perturbation (for an input)

Athalye, A., Carlini, N., & Wagner, D. (2018). Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. ICML.

Page 13:

Learns the distribution of adversarial examples (for any input)

Our work

Page 14:

Learns the distribution of adversarial examples (for an input)

Reduces the “attack dimension” → fewer queries into the network (see the sketch below)

Smooths the optimization → higher attack success rates

Characterizes the risk of the input example → new defense methods

Our work
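
One common way to realize the “reduces the attack dimension” point is to search over perturbations in a much smaller space and upsample them to image resolution; the sketch below is an illustration under that assumption, not the paper's exact parameterization.

```python
import numpy as np

def upsample_perturbation(low_dim_pert, factor=7):
    """Nearest-neighbour upsampling of a small perturbation, e.g. from 32x32x3
    to 224x224x3, so the search distribution lives only in the small space."""
    return np.kron(low_dim_pert, np.ones((factor, factor, 1)))
```

Searching in the smaller space means the evolution strategy has far fewer dimensions to explore, so fewer queries are needed per useful update.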

Page 15:

Learns the distribution of adversarial examples (for an input)

Our work


Page 16:

Learns the distribution of adversarial examples (for an input)

A sample from the distribution fails the DNN with high probability.

Our work

Page 17:

Which family of distributions?

Pages 18-24:

Natural evolution strategies (NES)

Wierstra, D., Schaul, T., Glasmachers, T., Sun, Y., Peters, J., & Schmidhuber, J. (2014). Natural evolution strategies. JMLR.
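
In this transcript the NES slides carry only the citation; the identity they rely on is the score-function ("search gradient") estimator, which for a Gaussian search distribution takes the standard form below (restated here for completeness, with mean μ and standard deviation σ):

\nabla_{\mu}\,\mathbb{E}_{z\sim\mathcal{N}(\mu,\sigma^{2}I)}\big[f(z)\big]
  = \mathbb{E}\big[f(z)\,\nabla_{\mu}\log\mathcal{N}(z;\mu,\sigma^{2}I)\big]
  \approx \frac{1}{n\sigma}\sum_{i=1}^{n} f(\mu+\sigma\varepsilon_{i})\,\varepsilon_{i},
  \qquad \varepsilon_{i}\sim\mathcal{N}(0,I)

So the expected objective can be improved using only function evaluations f(·), which is exactly the access a black-box attacker has.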

Page 25:

Black-box
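
A minimal sketch of how such a search distribution can drive a black-box attack: keep a Gaussian over perturbations of the input, score samples with loss values obtained from model queries only, and move the mean along the NES gradient estimate. The function names, the projection, and the hyperparameters below are illustrative assumptions, not the exact algorithm from the paper.

```python
import numpy as np

def nes_blackbox_attack(query_loss, x, eps=8/255, sigma=0.1, lr=0.02,
                        pop_size=50, iters=300):
    """Learn a Gaussian N(mu, sigma^2 I) over perturbations of x from black-box
    loss queries; samples from the learned distribution are candidate attacks."""
    mu = np.zeros_like(x)
    for _ in range(iters):
        noise = np.random.randn(pop_size, *x.shape)
        perts = np.clip(mu + sigma * noise, -eps, eps)            # stay in eps-ball
        losses = np.array([query_loss(np.clip(x + p, 0.0, 1.0)) for p in perts])
        adv = (losses - losses.mean()) / (losses.std() + 1e-8)    # normalized scores
        weights = adv.reshape(-1, *([1] * x.ndim))
        grad = (weights * noise).mean(axis=0) / sigma             # NES search gradient
        mu += lr * grad                       # ascend the attack loss
    return np.clip(x + np.clip(mu, -eps, eps), 0.0, 1.0)
```

Here query_loss(image) is assumed to return a scalar the attacker wants to maximize (e.g., the loss of the true label computed from the returned probabilities); the projection is ignored when forming the gradient, which is acceptable for a sketch.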

Page 26:

Experiment setup

Attack 13 defended DNNs & 2 vanilla DNNs

Consider both ℓ∞ and ℓ2 perturbation bounds

Examine all test examples of CIFAR10 & 1000 of ImageNet

Excluding those misclassified by the targeted DNN

Evaluate by attack success rates

Page 27:

Attack success rates, ImageNet

Page 28:

Attack success rates, CIFAR10

Page 29:

Attack success rate vs. optimization steps

Page 30:

Transferabilities of the adversarial examples

Page 31:

A universally effective defense technique?

Adversarial training / defensive learning

Inner maximization: the PGD attack crafts worst-case perturbations.

Outer minimization: update the DNN weights on those adversarial examples.
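
Adversarial training alternates the two sides of this min-max problem. A hedged sketch of one training epoch, reusing the pgd_attack helper sketched earlier (optimizer, epsilon, and data loader are placeholders):

```python
import torch
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, eps=8/255):
    """One epoch of PGD adversarial training: the inner maximization crafts
    adversarial examples, the outer minimization updates the DNN weights."""
    model.train()
    for x, y in loader:
        x_adv = pgd_attack(model, x, y, eps=eps)   # inner max (PGD attack)
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()                            # outer min over DNN weights
        optimizer.step()
```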

Page 32:

In a nutshell,

Is a powerful black-box attack, with success rates on par with or above white-box attacks

Is universal: the same algorithm defeats a variety of defenses

Characterizes the distributions of adversarial examples

Reduces the “attack dimension”

Speeds up the defensive learning (ongoing work)

Page 33:

Physical adversarial attack

Joint work with Yang Zhang, Hassan Foroosh, & Philip David

Published in ICLR 2019
Boqing Gong

Page 34:

Recall the following result

A universal adversarial perturbation

Moosavi-Dezfooli, S. M., Fawzi, A., Fawzi, O., & Frossard, P. (2017). Universal adversarial perturbations. CVPR.

Page 35:

Physical attack: universal perturbation → 2D mask

Eykholt, Kevin, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. "Robust physical-world attacks on deep learning models." CVPR 2018.

Page 36:

Physical attack: 2D mask → 3D camouflage

Gradient descent w.r.t. camouflage c in order to minimize detection scores for the vehicle under all feasible locations

Non-differentiable

Page 37:

Repeat until done:
1. Camouflage a vehicle
2. Drive it around and take many pictures of it
3. Detect it with Faster R-CNN & save the detection scores

→ Dataset: {(camouflage, vehicle, background, detection score)}

Physical attack: 2D mask → 3D camouflage

Page 38:

Fit a DNN to predict any camouflage’s corresponding detection scores

Physical attack: 2D mask → 3D camouflage

Page 39:

Physical attack: 2D mask → 3D camouflage

Gradient descent w.r.t. camouflage c in order to minimize detection scores for the vehicle under all feasible locations

Non-differentiable, but approximated by a DNN
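
A hedged sketch of the idea on this slide: fit a small network that predicts the detector's score from the camouflage, then run gradient descent on the camouflage through this differentiable surrogate. The architecture, tensor shapes, and training details here are illustrative assumptions (real inputs would also encode the vehicle pose and background), not the published clone network.

```python
import torch

class ScoreSurrogate(torch.nn.Module):
    """Toy clone network: predicts the detection score from a flattened camouflage."""
    def __init__(self, camo_dim):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(camo_dim, 256), torch.nn.ReLU(),
            torch.nn.Linear(256, 1), torch.nn.Sigmoid())

    def forward(self, camo):
        return self.net(camo)

def optimize_camouflage(surrogate, camo_dim, steps=200, lr=0.05):
    """Gradient descent on the camouflage through the differentiable surrogate,
    standing in for the non-differentiable paint/photograph/detect pipeline."""
    camo = torch.rand(1, camo_dim, requires_grad=True)
    opt = torch.optim.Adam([camo], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        surrogate(camo.clamp(0, 1)).mean().backward()  # minimize predicted score
        opt.step()
    return camo.detach().clamp(0, 1)
```

The dataset collected on the earlier slide, {(camouflage, vehicle, background, detection score)}, is what such a surrogate would be trained on.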

Page 41:

Why do we care?

Page 42:

Observation, re-observation, & future work

Defended DNNs are still vulnerable to transfer attacks (though only to a moderate degree)

Adversarial examples from black-box attacks are less transferable than those from white-box attacks

All future work on defenses will adopt adversarial training

Adversarial training will become faster (we are working on it)

We should certify DNNs’ expected robustness by

Page 43:

New works to watch

Stateful DNNs: Goodfellow (2019). A Research Agenda: Dynamic Models to Defend Against Correlated Attacks. arXiv:1903.06293.

Explaining adversarial examples: Ilyas et al. (2019) Adversarial Examples Are Not Bugs, They Are Features. arXiv:1905.02175.

Faster adversarial training: Zhang et al. (2019). You Only Propagate Once: Painless Adversarial Training Using Maximal Principle. arXiv:1905.00877; Shafahi et al. (2019). Adversarial Training for Free! arXiv:1904.12843.

Certifying DNNs’ expected robustness: Webb et al. (2019). A Statistical Approach to Assessing Neural Network Robustness. ICLR. && Cohen et al. (2019). Certified adversarial robustness via randomized smoothing. arXiv preprint arXiv:1902.02918.