
Page 1:

Countering Adversarial Images using Input Transformations

Chuan Guo, Mayank Rana, Moustapha Cisse, Laurens van der Maaten

Presented by

Hari Venugopalan, Zainul Abi Din

Page 2:

Motivation: Why is this a hard problem to solve?

● Adversarial examples are very easy to generate and very difficult to defend against, a problem tied to neural networks behaving as piecewise-linear functions

● Adversarial Examples transfer from one model to another very easily

● This problem is not well understood; researchers are divided on the nature of these samples.

Page 3:

Problem Definition: Adversarial Example

Source: Towards Evaluating the Robustness of Neural Networks, Carlini and Wagner

● x is the original input; x’ is the perturbed input, found by adding some noise to x
● Given a classifier h, the predictions h(x) and h(x’) should not be the same,
● while d(x, x’) stays smaller than some threshold.
● In other words: keep the distortion low while pushing x across the decision boundary (see the code sketch below)

[Figure: original input x, plus noise, yields perturbed input x’]
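As a sketch, this definition translates directly into code; classify stands in for h and an L2 norm for d, both hypothetical choices for illustration:

```python
import numpy as np

def is_adversarial(classify, x, x_adv, threshold):
    """x_adv is adversarial iff the label changes while the distortion stays small."""
    label_changed = classify(x) != classify(x_adv)             # h(x) != h(x')
    small_distortion = np.linalg.norm(x - x_adv) <= threshold  # d(x, x') below threshold
    return label_changed and small_distortion
```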

Page 4:

Measure of distortion: Normalized L2-Dissimilarity

● Computed as (1/N) Σ_n ||x_n − x’_n||_2 / ||x_n||_2
● x is the actual image
● x’ is the adversarial image
● N is the total number of images
● A strong attack keeps the L2-dissimilarity low (see the sketch below)

Source: Countering Adversarial Images using Input Transformations, Chuan Guo
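A direct NumPy translation of this measure (a sketch; X and X_adv are assumed to hold the N images flattened row-wise):

```python
import numpy as np

def normalized_l2_dissimilarity(X, X_adv):
    """Average of ||x_n - x'_n||_2 / ||x_n||_2 over all N images."""
    diffs = np.linalg.norm(X_adv - X, axis=1)  # ||x_n - x'_n||_2 per image
    norms = np.linalg.norm(X, axis=1)          # ||x_n||_2 per image
    return np.mean(diffs / norms)
```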

Page 5:

Intuitively, what is an attack?

[Diagram: X (original input) → classifier → h(X), a vector of class probabilities;
X’ (perturbed image) → the same classifier → h(X’), now yielding different probabilities]

Image taken from: Towards Evaluating the Robustness of Neural Networks, Carlini and Wagner

Page 6:

Types of attacks in DNNs

Source: ZOO: Zeroth Order Optimization Based Black-box Attacks to Deep Neural Networks without Training Substitute Models, Pin-Yu Chen

● White-Box: The attacker is assumed to have complete knowledge of the network’s weights and architecture

● Black-Box: Does not require internal model access

● Gray-Box: The attacker knows the model but does not know the defense strategy used.

● Targeted Attack: Attacker selects the class they want the example to be misclassified as.

● Untargeted Attack: Any misclassification is the goal

Page 7:

How do you find this noise? Fast Gradient Sign Method

● Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy - 2015

1. x’ = x + η
2. η = ε · sign(∇x J(θ, x, y)), where J is the training loss and ε controls the perturbation magnitude
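A minimal PyTorch sketch of FGSM; the classifier model, the epsilon value, and the [0, 1] pixel range are assumptions:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon=0.03):
    """Fast Gradient Sign Method: x' = x + epsilon * sign(grad_x J(theta, x, y))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)        # J(theta, x, y)
    loss.backward()                            # populates x.grad
    eta = epsilon * x.grad.sign()              # the adversarial noise
    return (x + eta).detach().clamp(0.0, 1.0)  # keep pixels in a valid range
```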

Page 8:

DeepFool:

● f(x) is the classifier; given a sample x0, the attack changes the sign of f(x0)
● x0 is projected orthogonally onto the decision boundary f(x) = 0
● r = −( f(x0) / ||∇x f(x0)||^2 ) ∇x f(x0)
● Keep adding r to x0 until the sign of f(x0) changes (sketch below)

Source: DeepFool: A simple and accurate method to fool deep neural networks, Moosavi Dezfooli
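A minimal sketch of this iteration for a binary classifier; f (scalar score) and grad_f (its gradient) are hypothetical callables, and the overshoot constant follows the paper's trick of stepping slightly past the boundary:

```python
import numpy as np

def deepfool_binary(f, grad_f, x0, max_iter=50, overshoot=0.02):
    """Iteratively project x toward the decision boundary f(x) = 0."""
    x = x0.copy()
    for _ in range(max_iter):
        if np.sign(f(x)) != np.sign(f(x0)):  # label flipped: done
            break
        g = grad_f(x)
        # minimal perturbation: orthogonal projection onto the (linearized) boundary
        r = -(f(x) / (np.linalg.norm(g) ** 2)) * g
        x = x + (1 + overshoot) * r          # small overshoot crosses the boundary
    return x
```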

Page 9:

Adversarial Transformation Networks (ATNs)

Image Source: DeepFool: A simple and accurate method to fool deep neural networks, Moosavi Dezfooli

Concept Source: Learning to generate adversarial examples, Shumeet Baluja

[Diagram: the ATN gf,θ(X) learns the noise that, added to X, produces X’; the target NN fw maps X’ to class probabilities. The ATN learns to generate an adversarial sample for the target NN.]

Page 10:

ATN Cost: gradient taken w.r.t. θ

Source: Learning to generate adversarial examples, Shumeet Baluja

Cost = Σx [ β · LX(gf,θ(x), x) + LY(fw(gf,θ(x)), r(fw(x), t)) ]

r(f(x), t)k = norm( α · max f(x) if k = t; f(x)k otherwise ), i.e. the target class t is boosted to α times the current maximum, and the vector is then renormalized
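A sketch of the reranking function r under these assumptions; reading norm as renormalization to a valid probability vector is our interpretation:

```python
import numpy as np

def rerank(y, t, alpha=1.5):
    """r(y, t): set component t to alpha * max(y) (alpha > 1 makes t the argmax),
    then renormalize so the output sums to one."""
    out = y.copy()
    out[t] = alpha * out.max()  # boost the target class above the current maximum
    return out / out.sum()      # norm(...): rescale back to a probability vector
```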

Page 11:

Carlini and Wagner Attack:

Source: Towards Evaluating the Robustness of Neural Networks, Carlini and Wagner

min over x’ of [ ||x − x’||2 + λf · max(−κ, Z(x’)h(x) − max{Z(x’)k : k ≠ h(x)}) ]

● Z(x’): input to the softmax layer (the logits)
● Z(x’)k: k-th component of Z(x’)
● max{Z(x’)k : k ≠ h(x)}: largest logit other than the original class
● Z(x’)h(x) − max{Z(x’)k : k ≠ h(x)}: gap between the original-class logit and the best other logit
● κ: confidence parameter
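A hedged PyTorch sketch of this objective for a single example; names such as logits_adv are ours, not the paper's:

```python
import torch

def cw_objective(x, x_adv, logits_adv, orig_label, lam=1.0, kappa=0.0):
    """Carlini-Wagner objective: distortion plus a margin term on the logits Z(x')."""
    dist = torch.norm(x - x_adv)                         # ||x - x'||_2
    z = logits_adv.clone()                               # Z(x'): pre-softmax logits
    z_orig = z[orig_label]                               # Z(x')_{h(x)}
    z[orig_label] = float('-inf')                        # exclude the original class
    margin = z_orig - z.max()                            # gap to the best other logit
    return dist + lam * torch.clamp(margin, min=-kappa)  # lam * max(-kappa, margin)
```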

Page 12:

Countering Adversarial Images

● Improve the robustness of the model
● Exploit randomness and non-differentiability
● Defenses are either model-specific or model-agnostic

Page 13:

Model-Specific Defenses

● Based on robust optimization
● A minimization-maximization approach is followed (see the sketch below)
● Tied to the learning algorithm and regularization scheme
● Make assumptions about the nature of the adversary
● Do not satisfy Kerckhoffs’s principle
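As a sketch, the minimization-maximization objective behind robust optimization can be written as follows (standard adversarial-training notation, assumed here rather than taken from the slides):

```latex
% Train on the worst-case perturbation delta within an epsilon-ball:
% the inner max finds the strongest attack, the outer min fits the model to it
\min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}}
  \left[ \max_{\|\delta\| \le \epsilon} J(\theta,\, x + \delta,\, y) \right]
```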

Page 14:

Model-Agnostic Defenses

● Transform images to remove perturbations
● JPEG compression and image re-scaling are examples
● The paper aims to increase the effectiveness of model-agnostic defenses

Page 15:

Feature Squeezing

● Proposed by Xu et al.
● Detects adversarial inputs
● The input space is reduced by “squeezing out” features, e.g. reducing color bit depth (sketch below)
● Model outputs are compared on the original and squeezed inputs
● Differing outputs signal an adversarial input

Source: Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks, Xu et al.
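A minimal sketch of one squeezer (bit-depth reduction) and the detection test; classify_probs is a hypothetical function returning a probability vector, and the threshold is illustrative:

```python
import numpy as np

def bit_depth_squeeze(x, bits=3):
    """Reduce color depth, one of the squeezers proposed by Xu et al. (x in [0, 1])."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

def looks_adversarial(classify_probs, x, threshold=1.0):
    """Flag the input if predictions on the original and squeezed versions diverge."""
    p_orig = classify_probs(x)
    p_squeezed = classify_probs(bit_depth_squeeze(x))
    return np.linalg.norm(p_orig - p_squeezed, ord=1) > threshold  # L1 gap
```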

Page 16:

JPEG Compression

● Proposed by Dziugaite et al.
● Removes perturbations by compressing images (sketch below)
● Lossy compression can discard parts of the perturbation
● Effective against small-magnitude perturbations
● With larger perturbations, compression is unable to recover the non-adversarial image

Source: A study of the effect of JPG compression on adversarial images, Dziugaite et al.
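A sketch of the defense as a JPEG round trip using Pillow; the quality setting is an assumption:

```python
import io
import numpy as np
from PIL import Image

def jpeg_defense(x, quality=75):
    """Re-encode an image as JPEG; lossy compression may discard the perturbation."""
    img = Image.fromarray((x * 255).astype(np.uint8))  # x assumed HxWx3 in [0, 1]
    buf = io.BytesIO()
    img.save(buf, format='JPEG', quality=quality)
    buf.seek(0)
    return np.asarray(Image.open(buf), dtype=np.float32) / 255.0
```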

Page 17:

Total Variation Minimization

● The transformation is cast as an optimization problem (sketch below)
● The transformed image stays close to the input but has low total variation
● Inspired by noise removal

Source: Countering Adversarial Images using Input Transformations
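A gradient-descent sketch of the idea on a grayscale image (anisotropic TV); the paper's actual formulation also drops random pixels via a mask and uses a tuned solver, both omitted here:

```python
import numpy as np

def tv_denoise(x, lam=0.1, lr=0.1, steps=100):
    """Minimize ||z - x||_2^2 + lam * TV(z) by gradient descent (x: 2-D float array)."""
    z = x.astype(np.float64).copy()
    for _ in range(steps):
        grad = 2.0 * (z - x)              # gradient of the fidelity term
        dx = np.sign(np.diff(z, axis=0))  # subgradient of |z[i+1,j] - z[i,j]|
        dy = np.sign(np.diff(z, axis=1))  # subgradient of |z[i,j+1] - z[i,j]|
        grad[:-1, :] -= lam * dx          # contribution to d/dz[i,j]
        grad[1:, :] += lam * dx           # contribution to d/dz[i+1,j]
        grad[:, :-1] -= lam * dy
        grad[:, 1:] += lam * dy
        z -= lr * grad                    # descent step
    return z
```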

Page 18:

Image Quilting

● Images are pieced together from small patches drawn from a database
● A non-parametric technique
● The database contains only clean patches
● Patches are placed over predefined points and the edges are smoothed
● Patches are selected using k-nearest neighbors on the database (sketch below)
● The resulting image should contain no perturbations, since it is built entirely from clean patches
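A much-simplified sketch: replace each patch with its single nearest clean neighbor; the full method samples among the K nearest patches and smooths the seams, and patch_db is a hypothetical array of clean patches:

```python
import numpy as np

def quilt(x, patch_db, patch=8):
    """Rebuild a grayscale image block-by-block from nearest clean patches.
    patch_db: array of shape (M, patch, patch) holding clean patches."""
    out = x.copy()  # blocks that don't fit evenly are left unchanged
    flat_db = patch_db.reshape(len(patch_db), -1)
    for i in range(0, x.shape[0] - patch + 1, patch):
        for j in range(0, x.shape[1] - patch + 1, patch):
            q = x[i:i + patch, j:j + patch].reshape(1, -1)
            idx = np.argmin(((flat_db - q) ** 2).sum(axis=1))  # nearest neighbor (L2)
            out[i:i + patch, j:j + patch] = patch_db[idx]
    return out
```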

Page 19:

Experiments

● Performed in black-box and gray-box settings
● Performed on the ImageNet dataset
● A ResNet-50 model was attacked
● Attacks included FGSM, I-FGSM, DeepFool, and Carlini-Wagner
● Top-1 classification accuracy was reported for varying normalized L2-dissimilarities

Page 20:

Gray Box: Image Transformations at Test Time

Source: Countering Adversarial Images using Input Transformations

Page 21:

Black Box: Image transformations at test time

Source: Countering Adversarial Images using Input Transformations

Page 22:

Black Box: Ensembling and model transfer

Source: Countering Adversarial Images using Input Transformations

Page 23:

Gray Box: Image transformations at test time

Source: Countering Adversarial Images using Input Transformations

Page 24:

Comparison to Ensemble Adversarial Training

Source: Countering Adversarial Images using Input Transformations

Page 25:

Our experiment: Running TVM with an ATN

● We built an ATN that raised a neural network’s error rate from 0.34 to 0.91
● The error rate only fell from 0.91 to 0.90 when the ATN perturbation was transformed using TVM
● The dataset consisted of 20000 non-MNIST images

Page 26:

Conclusions and Questions

● Image transformations are a more generic defense
● They benefit from randomization
● They benefit from non-differentiability
● Why were they not successful against the ATN attack?
● Like ATNs, can the best transformation be “learned”?
● How good are these transformations in a complete white-box setting?

Page 27:

Merci Beaucoup (Thank You Very Much)