
Page 1:

by learning the Distributions of Adversarial Examples

Boqing Gong
Joint work with Yandong Li, Lijun Li, Liqiang Wang, & Tong Zhang

Published in ICML 2019

Pages 2-3:

Intriguing properties of deep neural networks (DNNs)

Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2014). Intriguing properties of neural networks. ICLR.
Goodfellow, I. J., Shlens, J., & Szegedy, C. (2015). Explaining and harnessing adversarial examples. ICLR.

Page 4:

Projected gradient descent (PGD) attack

Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2017). Towards deep learning models resistant to adversarial attacks. arXiv:1706.06083.
Kurakin, A., Goodfellow, I., & Bengio, S. (2016). Adversarial machine learning at scale. arXiv:1611.01236.
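
For concreteness, here is a minimal sketch of an ℓ∞ PGD attack in PyTorch; the model, loss, and hyperparameters are placeholder choices rather than the exact settings discussed in the talk.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Minimal l_inf PGD: repeatedly ascend the loss along the sign of the
    gradient, then project back into the eps-ball around the clean input x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()            # gradient ascent step
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)   # project to eps-ball
        x_adv = x_adv.clamp(0.0, 1.0)                           # keep valid pixel range
    return x_adv
```

Note that this step needs white-box access to gradients; the black-box attacks discussed later must get by with queries only.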

Page 5:

Intriguing results (1)

~100% attack success rates on CIFAR10 & ImageNet


Page 6:

Intriguing results (2)

Page 7:

Intriguing results (2)

Adversarial examples generalize between different DNNs

E.g., AlexNet vs. InceptionV3

Page 8:

Intriguing results (3)

A universal adversarial perturbation

Moosavi-Dezfooli, S. M., Fawzi, A., Fawzi, O., & Frossard, P. (2017). Universal adversarial perturbations. CVPR.

Page 9:

In a nutshell, white-box adversarial attacks can

Fool different DNNs for almost all test examples

Most data points lie near the classification boundaries.

Fool different DNNs by the same adversarial examples

The classification boundaries of various DNNs are close.

Fool different DNNs by a single universal perturbation

Most examples can be turned adversarial by moving them along the same direction by the same amount.
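
To make the last point above concrete, applying a universal perturbation means adding one fixed vector v to every input; a small illustrative sketch (computing v itself, e.g. as in Moosavi-Dezfooli et al., is not shown):

```python
import numpy as np

def apply_universal_perturbation(images, v, eps=10/255):
    """Add the same perturbation v to every image in the batch: one shared
    direction, one shared magnitude, clipped to the eps-ball and pixel range."""
    v = np.clip(v, -eps, eps)
    return np.clip(images + v, 0.0, 1.0)
```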

Page 10:

However, white-box adversarial attacks can

Not apply to most real-world scenarios

Not work when the network architecture is unknown

Not work when the weights are unknown

Not work when querying the network is prohibitive (e.g., too costly)

Page 11:

Black-box attacks

[Figure: an input image and the target model's output scores, e.g. Panda: 0.88493, Indri: 0.00878, Red Panda: 0.00317; a black-box attacker sees only these outputs.]

Substitute attack (Papernot et al., 2017)

Decision-based (Brendel et al., 2017)

Boundary-tracing (Cheng et al., 2019)

Zeroth-order (Chen et al., 2017)

Natural evolution strategies (Ilyas et al., 2018)
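
As one illustration of how the score-based methods in this list avoid gradients, here is a hedged sketch of a finite-difference (zeroth-order) gradient estimate in the spirit of Chen et al. (2017); the coordinate sampling, step size, and loss function are illustrative assumptions, not the published algorithm.

```python
import numpy as np

def zeroth_order_gradient(loss_fn, x, num_coords=128, h=1e-4):
    """Estimate the gradient of a black-box loss by symmetric finite differences
    over a random subset of pixel coordinates (all other entries stay zero)."""
    grad = np.zeros_like(x)
    flat_x, flat_grad = x.reshape(-1), grad.reshape(-1)
    coords = np.random.choice(flat_x.size, size=min(num_coords, flat_x.size),
                              replace=False)
    for i in coords:
        e = np.zeros_like(flat_x)
        e[i] = h
        # two model queries per coordinate: f(x + h*e_i) and f(x - h*e_i)
        flat_grad[i] = (loss_fn((flat_x + e).reshape(x.shape))
                        - loss_fn((flat_x - e).reshape(x.shape))) / (2 * h)
    return grad
```

At image dimensionality this costs two queries per sampled coordinate, which already hints at the query-budget and curse-of-dimensionality issues raised on the next slide.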

Page 12:

Drawbacks: bad local optima, non-smooth optimization, the curse of dimensionality, defense-specific gradient estimation, etc.

Existing attacks search for "the" adversarial perturbation (for an input)

Athalye, A., Carlini, N., & Wagner, D. (2018). Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. ICML.

Page 13:

Learns the distribution of adversarial examples (for any input)

Our work

Page 14:

Learns the distribution of adversarial examples (for an input)

Reduces the “attack dimension” → fewer queries into the network (see the sketch below)

Smooths the optimization → higher attack success rates

Characterizes the risk of the input example → new defense methods

Our work
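
One common way to realize the “reduces the attack dimension” point is to search over perturbations in a much smaller space and upsample them to image resolution; the sketch below is an illustration under that assumption, not the paper's exact parameterization.

```python
import numpy as np

def upsample_perturbation(low_dim_pert, factor=7):
    """Nearest-neighbour upsampling of a small perturbation, e.g. from 32x32x3
    to 224x224x3, so the search distribution lives only in the small space."""
    return np.kron(low_dim_pert, np.ones((factor, factor, 1)))
```

Searching in the smaller space means the evolution strategy has far fewer dimensions to explore, so fewer queries are needed per useful update.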

Page 15:

Learns the distribution of adversarial examples (for an input)

Our work


Page 16:

Learns the distribution of adversarial examples (for an input)

A sample from the distribution fails the DNN with high probability.

Our work

Page 17:

Which family of distributions?

Pages 18-24:

Natural evolution strategies (NES)

Wierstra, D., Schaul, T., Glasmachers, T., Sun, Y., Peters, J., & Schmidhuber, J. (2014). Natural evolution strategies. JMLR.
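
In this transcript the NES slides carry only the citation; the identity they rely on is the score-function ("search gradient") estimator, which for a Gaussian search distribution takes the standard form below (restated here for completeness, with mean μ and standard deviation σ):

\nabla_{\mu}\,\mathbb{E}_{z\sim\mathcal{N}(\mu,\sigma^{2}I)}\big[f(z)\big]
  = \mathbb{E}\big[f(z)\,\nabla_{\mu}\log\mathcal{N}(z;\mu,\sigma^{2}I)\big]
  \approx \frac{1}{n\sigma}\sum_{i=1}^{n} f(\mu+\sigma\varepsilon_{i})\,\varepsilon_{i},
  \qquad \varepsilon_{i}\sim\mathcal{N}(0,I)

So the expected objective can be improved using only function evaluations f(·), which is exactly the access a black-box attacker has.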

Page 25:

Black-box
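
A minimal sketch of how such a search distribution can drive a black-box attack: keep a Gaussian over perturbations of the input, score samples with loss values obtained from model queries only, and move the mean along the NES gradient estimate. The function names, the projection, and the hyperparameters below are illustrative assumptions, not the exact algorithm from the paper.

```python
import numpy as np

def nes_blackbox_attack(query_loss, x, eps=8/255, sigma=0.1, lr=0.02,
                        pop_size=50, iters=300):
    """Learn a Gaussian N(mu, sigma^2 I) over perturbations of x from black-box
    loss queries; samples from the learned distribution are candidate attacks."""
    mu = np.zeros_like(x)
    for _ in range(iters):
        noise = np.random.randn(pop_size, *x.shape)
        perts = np.clip(mu + sigma * noise, -eps, eps)            # stay in eps-ball
        losses = np.array([query_loss(np.clip(x + p, 0.0, 1.0)) for p in perts])
        adv = (losses - losses.mean()) / (losses.std() + 1e-8)    # normalized scores
        weights = adv.reshape(-1, *([1] * x.ndim))
        grad = (weights * noise).mean(axis=0) / sigma             # NES search gradient
        mu += lr * grad                       # ascend the attack loss
    return np.clip(x + np.clip(mu, -eps, eps), 0.0, 1.0)
```

Here query_loss(image) is assumed to return a scalar the attacker wants to maximize (e.g., the loss of the true label computed from the returned probabilities); the projection is ignored when forming the gradient, which is acceptable for a sketch.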

Page 26:

Experiment setup

Attack 13 defended DNNs & 2 vanilla DNNs

Consider both ℓ∞ and ℓ2 perturbation bounds

Examine all test examples of CIFAR10 & 1000 of ImageNet

Excluding those misclassified by the targeted DNN

Evaluate by attack success rates

Page 27:

Attack success rates, ImageNet

Page 28:

Attack success rates, CIFAR10

Page 29:

Attack success rate vs. optimization steps

Page 30:

Transferabilities of the adversarial examples

Page 31:

A universally effective defense technique?

Adversarial training / defensive learning

Inner maximization: the PGD attack crafts worst-case perturbations.

Outer minimization: update the DNN weights on those adversarial examples.
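
Adversarial training alternates the two sides of this min-max problem. A hedged sketch of one training epoch, reusing the pgd_attack helper sketched earlier (optimizer, epsilon, and data loader are placeholders):

```python
import torch
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, eps=8/255):
    """One epoch of PGD adversarial training: the inner maximization crafts
    adversarial examples, the outer minimization updates the DNN weights."""
    model.train()
    for x, y in loader:
        x_adv = pgd_attack(model, x, y, eps=eps)   # inner max (PGD attack)
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()                            # outer min over DNN weights
        optimizer.step()
```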

Page 32:

In a nutshell,

Is a powerful black-box attack, with success rates on par with or above white-box attacks

Is universal: the same algorithm defeats a variety of defenses

Characterizes the distributions of adversarial examples

Reduces the “attack dimension”

Speeds up the defensive learning (ongoing work)

Page 33:

Physical adversarial attack

Joint work with Yang Zhang, Hassan Foroosh, & Philip David

Published in ICLR 2019
Boqing Gong

Page 34:

Recall the following result

A universal adversarial perturbation

Moosavi-Dezfooli, S. M., Fawzi, A., Fawzi, O., & Frossard, P. (2017). Universal adversarial perturbations. CVPR.

Page 35:

Physical attack: universal perturbation → 2D mask

Eykholt, Kevin, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. "Robust physical-world attacks on deep learning models." CVPR 2018.

Page 36:

Physical attack: 2D mask → 3D camouflage

Gradient descent w.r.t. camouflage c in order to minimize detection scores for the vehicle under all feasible locations

Non-differentiable

Page 37:

Repeat until done:
1. Camouflage a vehicle
2. Drive it around and take many pictures of it
3. Detect it with Faster R-CNN & save the detection scores

→ Dataset: {(camouflage, vehicle, background, detection score)}

Physical attack: 2D mask → 3D camouflage

Page 38:

Fit a DNN to predict any camouflage’s corresponding detection scores

Physical attack: 2D mask → 3D camouflage

Page 39:

Physical attack: 2D mask → 3D camouflage

Gradient descent w.r.t. camouflage c in order to minimize detection scores for the vehicle under all feasible locations

Non-differentiable, but approximated by a DNN
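
A hedged sketch of the idea on this slide: fit a small network that predicts the detector's score from the camouflage, then run gradient descent on the camouflage through this differentiable surrogate. The architecture, tensor shapes, and training details here are illustrative assumptions (real inputs would also encode the vehicle pose and background), not the published clone network.

```python
import torch

class ScoreSurrogate(torch.nn.Module):
    """Toy clone network: predicts the detection score from a flattened camouflage."""
    def __init__(self, camo_dim):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(camo_dim, 256), torch.nn.ReLU(),
            torch.nn.Linear(256, 1), torch.nn.Sigmoid())

    def forward(self, camo):
        return self.net(camo)

def optimize_camouflage(surrogate, camo_dim, steps=200, lr=0.05):
    """Gradient descent on the camouflage through the differentiable surrogate,
    standing in for the non-differentiable paint/photograph/detect pipeline."""
    camo = torch.rand(1, camo_dim, requires_grad=True)
    opt = torch.optim.Adam([camo], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        surrogate(camo.clamp(0, 1)).mean().backward()  # minimize predicted score
        opt.step()
    return camo.detach().clamp(0, 1)
```

The dataset collected on the earlier slide, {(camouflage, vehicle, background, detection score)}, is what such a surrogate would be trained on.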

Page 41:

Why do we care?

Page 42:

Observation, re-observation, & future work

Defended DNNs are still vulnerable to transfer attacks (though only to a moderate degree)

Adversarial examples from black-box attacks are less transferable than those from white-box attacks

All future work on defenses will adopt adversarial training

Adversarial training will become faster (we are working on it)

We should certify DNNs’ expected robustness by

Page 43:

New works to watch

Stateful DNNs: Goodfellow (2019). A Research Agenda: Dynamic Models to Defend Against Correlated Attacks. arXiv:1903.06293.

Explaining adversarial examples: Ilyas et al. (2019) Adversarial Examples Are Not Bugs, They Are Features. arXiv:1905.02175.

Faster adversarial training: Zhang et al. (2019). You Only Propagate Once: Painless Adversarial Training Using Maximal Principle. arXiv:1905.00877; Shafahi et al. (2019). Adversarial Training for Free! arXiv:1904.12843.

Certifying DNNs’ expected robustness: Webb et al. (2019). A Statistical Approach to Assessing Neural Network Robustness. ICLR. && Cohen et al. (2019). Certified adversarial robustness via randomized smoothing. arXiv preprint arXiv:1902.02918.