
Page 1:

Adversarial Attacks on Deep-learning based NLP

Tutorial @ ICONIP’20

Dr. Wei (Emma) Zhang

21 November 2020

Page 2:

Self-Introduction – Wei (Emma) Zhang

• Short Bio:

– Lecturer, School of Computer Science, The University of Adelaide, Australia. July 2019 - now

– Research Fellow, Department of Computing, Macquarie University. Mar 2017 - June 2019

– Ph.D. in Computer Science, The University of Adelaide, Australia. Aug 2013 - Feb 2017

• Research:

– Text mining, Natural Language Processing

– Internet of Things applications

Page 3:

Motivation of this Tutorial

• Deep neural networks (DNNs) have gained significant popularity in many Artificial Intelligence (AI) applications

• DNNs are vulnerable to strategically modified samples, named adversarial examples

• Most attention has been put on generating adversarial examples for Computer Vision applications

• Relatively few works target Natural Language Processing DNN models, but they show a promising increasing trend

Page 4:

Expected Goal of this Tutorial

• Develop a shared vocabulary to talk about adversarial attacks on textual DNNs

• Understand adversarial attacks on textual DNNs and how they differ from attacks on images

• Perform black-box and white-box attacks

• Adopt defence strategies

Page 5:

Adversarial Examples: a (very) Brief History

• History

– L-BFGS [Szegedy et al. ICLR'14]

• Coined the term "adversarial example": worst-case inputs

• Finds the minimum distance between original points and adversarial points that makes the output (label) change incorrectly.

– FGSM [Goodfellow et al. ICLR'15]: Fast Gradient Sign Method

• Linear explanation

• Fast computation

– [Jia and Liang EMNLP'17]: first work in NLP

• Most papers are in computer vision, more than 3 times as many as in NLP.

Page 6:

Content of this Tutorial

• Introduction to Adversarial Examples

• Attack considerations

• Black-box Attack

• (a 5-minute break)

• White-box Attack

• Attack on Multi-modal Applications

• Adversarial Training via adversarial examples

• Future Perspective

Wei Emma Zhang, Quan Z. Sheng, Ahoud Abdulrahmn F. Alhazmi, Chenliang Li: Adversarial Attacks on Deep-learning Models in Natural Language Processing: A Survey. ACM Trans. Intell. Syst. Technol. 11(3): 24:1-24:41, 2020

Page 7:

Content of this Tutorial

• Introduction to Adversarial Examples

• Attack considerations

• Black-box Attack

• White-box Attack

• Attack on Multi-modal Applications

• Adversarial Training via adversarial examples

• Future Perspective

Page 8:

An Example

• Paragraph: “The largest portion of the Huguenots to settle in the Cape arrived between 1688 and 1689…but quite a few arrived as late as 1700; thereafter, the numbers declined.”

• Question: “The number of new Huguenot colonists declined after what year?”

• Correct Answer: “1700”


Model used: BiDAF Ensemble (Seo et al., 2016)

Robin Jia and Percy Liang. Adversarial Examples for Evaluating Reading Comprehension Systems. EMNLP’17.

Page 9:

An Example

• Paragraph: “The largest portion of the Huguenots to settle in the Cape arrived between 1688 and 1689…but quite a few arrived as late as 1700; thereafter, the numbers declined. The number of old Acadian colonists declined after the year of 1675.”

• Question: “The number of new Huguenot colonists declined after what year?”

• Correct Answer: “1700”

• Predicted Answer: “1675”


Robin Jia and Percy Liang. Adversarial Examples for Evaluating Reading Comprehension Systems. EMNLP’17.

Model used: BiDAF Ensemble (Seo et al., 2016)

Page 10:

Adversarial Examples

Page 11:

Formal Definition

Given:

– A DNN model $f$

– An allowed perturbation set $S$ with certain constraints

An adversarial example for $x$ is a point $x' = x + \eta$ for $\eta \in S$, such that

$f(x + \eta) \neq f(x)$ (untargeted), or

$f(x + \eta) = y'$ (targeted)
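The definition translates directly into a success check for any attack. A minimal sketch in Python (generic names; `f` stands for the victim model's label function, not any specific implementation):

```python
# Success conditions from the definition above, with x_prime = x + eta.
def is_adversarial(f, x, x_prime, target_label=None):
    if target_label is None:              # untargeted: f(x + eta) != f(x)
        return f(x_prime) != f(x)
    return f(x_prime) == target_label     # targeted: f(x + eta) = y'
```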

Page 12:

Terminology


• Adversarial examples

– Perturbed examples.

• Adversarial attack (evasion attack): – A method for generating adversarial examples

• Adversarial Machine Learning

– Technique that attempts to fool models by supplying deceptive input.

Page 13:

Terminology


• Adversarial Training

– The process of introducing adversarial examples into training to make the model more robust.

• Generative Adversarial Nets (GANs): a different concept. Not to be confused!

– A non-cooperative game

Page 14:

Why do adversarial attacks matter?


[Song et al. ICML’18]

Page 15:

Why do adversarial attacks matter?


Samuel G. Finlayson et al. Science 2019;363:1287-1289

Page 16:

Why do adversarial attacks matter?


Samuel G. Finlayson et al. Science 2019;363:1287-1289

Page 17:


Why do adversarial attacks matter?

Page 18:

Why do adversarial attacks matter?

Page 19:

To study adversarial attacks

• Test the robustness of the model against worst-case examples.


• Discern how a model actually understands its input.

• Improve models through training and optimization with adversarial examples (adversarial training).

Page 20:

How to attack?

Page 21:

Content of this Tutorial

• Introduction to Adversarial Examples

• Attack considerations

• Black-box Attack

• White-box Attack

• Attack on Multi-modal Applications

• Adversarial Training via adversarial examples

• Future Perspective

Page 22:

Prepare for an attack

• Black-box or White-box

• Targeted or Untargeted

• Character-level, Word-level, Sentence-level, Subword-level

• Single Modal or Multi-Modal

Page 23:

Prepare for an attack

Wei Emma Zhang, Quan Z. Sheng, Ahoud Abdulrahmn F. Alhazmi, Chenliang Li: Adversarial Attacks on Deep-learning Models in Natural Language Processing: A Survey. ACM Trans. Intell. Syst. Technol. 11(3): 24:1-24:41, 2020

Page 24:

General Steps

• Attack Positions

– Which word/subword/character?

– The whole sentence

– Representation in latent space


• Attack Strategies

Insert, delete, switch, or replace with:

– Similar token

– Synonyms, paraphrase

– Token within a constraint distance

Page 25:

General Steps

• Perturbation Control: a way to measure the size of the perturbation, so that it can be controlled to ensure the ability to fool the victim DNN while remaining hardly perceivable.

– Edit-based measurement (on the original representation)

• e.g., Levenshtein Distance

– Jaccard similarity coefficient (on the original representation)

• Checks token overlap

– Semantic-preserving measurement (on both representations)

– Norm-based measurement (L-p, on the vectorized representation)
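The two text-side measurements can be computed directly on the raw strings. A small sketch using only the Python standard library (function names are illustrative):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit-based measurement: minimum insertions/deletions/substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity coefficient: token overlap between the two texts."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

print(levenshtein("the movie was great", "the movie was grate"))  # 2
print(jaccard("the movie was great", "the movie was grate"))      # 0.6
```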

Page 26:

General Steps


• Language Control

– Valid word or word embedding

– Grammar checker

– Language model

• Perplexity

– Paraphrases
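One way to apply the language-model check is to score candidates by perplexity and discard disfluent ones. A sketch with GPT-2 from the Hugging Face `transformers` library (the slides do not prescribe a particular LM or threshold; both are assumptions here):

```python
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    """Per-token perplexity of `text` under GPT-2."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean token cross-entropy
    return math.exp(loss.item())

# Keep only candidates that stay roughly as fluent as the original.
original = "the movie was great"
candidates = ["the movie was grate", "the film was great"]
fluent = [c for c in candidates if perplexity(c) < 2.0 * perplexity(original)]
```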

Page 27:

Content of this Tutorial

• Introduction to Adversarial Examples

• Attack considerations

• Black-box Attack

• White-box Attack

• Attack on Multi-modal Applications

• Adversarial Training via adversarial examples

• Future Perspective

Page 28:

Black-Box Attack


Diagram: the attacker turns the original example into an adversarial example, and can only query the application DNN and observe its output.

Page 29:

Black-Box Attack

• Edit Adversaries

• Paraphrase Adversaries

• GAN-based Adversaries

• BERT-based Adversaries

Page 30:

Jia and Liang EMNLP’17

• Generate concatenative adversaries

– Append distracting text to the paragraph

– Must ensure that added text does not actually answer the question


Paragraph: “The largest portion of the Huguenots to settle in the Cape arrived between 1688 and 1689…but quite a few arrived as late as 1700; thereafter, the numbers declined. The number of old Acadian colonists declined after the year of 1675.”

Question: “The number of new Huguenot colonists declined after what year?”

Correct Answer: “1700”

Predicted Answer: “1675”

Page 31:

Jia and Liang EMNLP'17: AddSent

• Change entities, numbers, antonyms:

"What city did Tesla move to in 1880? Prague" → "What city did Tadakatsu move to in 1881?"

• Generate a fake answer with the same NER/POS tag: "Chicago"

• Convert to a declarative sentence:

"Tadakatsu moved the city of Chicago to in 1881."

• Have crowd workers fix errors:

– "Tadakatsu moved to the city of Chicago in 1881."

– "Tadakatsu moved to Chicago in 1881."

– "In 1881, Tadakatsu moved to the city of Chicago."

• The model fails if distracted by any of these (concatenated to the paragraph).

Page 32:

Jia and Liang EMNLP’17: AddSent

• F1 score

Page 33:

Edit Adversaries

• DeepWordBug [Gao et al. SP'18]

– Scoring function: measure word importance according to its effect on the model prediction

– Character-level transforms are applied to the highest-ranked tokens

– Minimize the edit distance of the perturbation
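A simplified version of the scoring idea, treating the victim as a black box that returns the probability of its current prediction (this leave-one-out variant stands in for the paper's scoring functions):

```python
def word_scores(predict_proba, words):
    """Score each word by the confidence drop when it is removed.
    predict_proba: text -> probability of the originally predicted class."""
    base = predict_proba(" ".join(words))
    scores = []
    for i in range(len(words)):
        reduced = " ".join(words[:i] + words[i + 1:])
        scores.append(base - predict_proba(reduced))  # higher = more important
    return scores

def swap_transform(word):
    """One DeepWordBug-style character edit: swap two adjacent middle letters,
    keeping the edit distance of the perturbation small."""
    if len(word) < 4:
        return word
    mid = len(word) // 2
    chars = list(word)
    chars[mid - 1], chars[mid] = chars[mid], chars[mid - 1]
    return "".join(chars)
```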

Page 34:

Edit Adversaries

• Probability Weighted Word Saliency [Ren et al. ACL’19]


– Saliency: the drop in the classification probability when the word is replaced by an unknown token

– Use the synonym that maximizes the change of the prediction output as the substitute word.
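A compact sketch of that selection rule (simplified from the paper; `predict_proba` and the synonym source are placeholders, and the paper combines the two factors through a softmax over saliency):

```python
def pwws_choose(predict_proba, words, synonyms):
    """For each word: saliency = confidence drop when replaced by <unk>;
    substitute = the synonym with the largest confidence drop."""
    base = predict_proba(" ".join(words))
    ranked = []
    for i, w in enumerate(words):
        unk = words[:i] + ["<unk>"] + words[i + 1:]
        saliency = base - predict_proba(" ".join(unk))
        best, best_drop = w, 0.0
        for s in synonyms.get(w, []):          # e.g., WordNet synonyms
            cand = words[:i] + [s] + words[i + 1:]
            drop = base - predict_proba(" ".join(cand))
            if drop > best_drop:
                best, best_drop = s, drop
        ranked.append((i, best, saliency * best_drop))
    return sorted(ranked, key=lambda t: -t[2])  # replace in this order
```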

Page 35:

Edit Adversaries

• Linguistic information

– POS-Tags [Alhazmi et al. IJCNN’20]


Ahoud Abdulrahmn F. Alhazmi, Wei Emma Zhang, Quan Z. Sheng, Abdulwahab Aljubairy: Analyzing the Sensitivity of Deep Neural Networks for Sentiment Analysis: A Scoring Approach. IJCNN 2020

Ahoud Abdulrahmn F. Alhazmi, Wei Emma Zhang, Quan Z. Sheng, Abdulwahab Aljubairy: Are Modern Deep Learning Models for Sentiment Analysis Brittle: An Examination on Part-of-Speech. IJCNN 2020

Page 36:

Edit Adversaries

• Semantic relatedness [Alzantot et al. EMNLP’18]

– Nearest neighbours in the embedding space (GloVe) are candidate substitutions for adversarial examples

– Post-process the GloVe embeddings to ensure that the nearest neighbours are synonyms

– Use a language model to filter out words that do not fit the context surrounding the word

– Pick the one that maximizes the target label's prediction probability when it replaces the word.
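Candidate generation for the first step might look like the following (a sketch; `emb` is assumed to be a word-to-vector dictionary of GloVe embeddings, and the result would still pass through the LM and label-probability filters above):

```python
import numpy as np

def nearest_neighbours(word, emb, k=8):
    """k nearest words to `word` by cosine similarity in embedding space."""
    vocab = [w for w in emb if w != word]
    mat = np.stack([emb[w] for w in vocab])
    v = emb[word]
    sims = mat @ v / (np.linalg.norm(mat, axis=1) * np.linalg.norm(v) + 1e-9)
    return [vocab[i] for i in np.argsort(-sims)[:k]]
```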

Page 37:

Edit Adversaries – Summary

• Sentence-level

• Character-level and word-level

– Swap with neighboring character/word

– Delete character/word

– Insert distracting character/word

– Replace words with their synonym or paraphrases

– Change a verb to the wrong tense or form

– Negate the root verb of the source input

– Change verbs, adjectives, or adverbs to their antonyms.

– …

Page 38:

Paraphrase-based Adversaries

• SCPN [Iyyer et al. NAACL'18]

– Produces a paraphrase of the given sentence with a desired syntax; the output is the targeted paraphrase of the original sentence.

• Use the pre-trained PARANMT-50M corpus from Wieting and Gimpel (2017): 50 million paraphrases obtained by backtranslating the Czech side of the Czech-English parallel corpus.

• Parse the backtranslated paraphrases using the Stanford parser: ⟨s1, s2⟩ → ⟨p1, p2⟩

• Relax the target syntactic form to a parse template (top two levels of the linearized parse tree): p2 → t2

• Given a paraphrase pair ⟨s1, s2⟩ and corresponding target syntax trees ⟨p1, p2⟩, an encoder-decoder model is trained on: ⟨s1, p2⟩ → s2

• Given syntax trees ⟨p1, p2⟩ and templates ⟨t1, t2⟩, a parse generator is trained on: ⟨p1, t2⟩ → p2

Page 39:

Paraphrase-based Adversaries

• SCPN: two trained components

– A: ⟨s1, p2⟩ → s2

– B: ⟨p1, t2⟩ → p2

Page 40:

Paraphrase-based Adversaries

• SCPN: On sentiment analysis

Page 41:

GAN-based Adversaries

• [Zhao et al. ICLR'18] Search the space of latent dense representations z of the input x and find an adversarial z*. Then map z* back to x*.

• Generator G on X. G: z → x

• Inverter I: x → z

• Adversarial example: x* = G(z*), where the perturbation is applied to the dense representation z, such that f(x*) ≠ f(x)

• Training: the generator is trained with a GAN objective; the inverter is trained to map x back to its latent code z

Page 42:

GAN-based Adversaries

• Generating:

– utilize the inverter to obtain the latent vector , and feed perturbations in the neighborhood of to the generator to generate natural samples . Check the prediction on .

– Incrementally increase the search range within which the perturbations are randomly sampled until the generated samples that change the prediction. Among these samples, choose the one which has the closest to the original as an adversarial example x.
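The search loop can be sketched as follows (illustrative code, assuming numpy-vector latents and given generator G, inverter I, and victim model f; the paper's hybrid search is simplified to a widening scan):

```python
import numpy as np

def search_adversary(x, f, G, I, step=0.01, n_samples=100, max_range=1.0):
    z, y = I(x), f(x)                  # latent code and original prediction
    r = step
    while r <= max_range:              # incrementally widen the search range
        deltas = np.random.randn(n_samples, z.shape[0])
        deltas *= r * np.random.rand(n_samples, 1) / (
            np.linalg.norm(deltas, axis=1, keepdims=True) + 1e-9)
        cands = [(np.linalg.norm(d), G(z + d)) for d in deltas]
        hits = [(dist, c) for dist, c in cands if f(c) != y]
        if hits:                       # pick the sample closest to z
            return min(hits, key=lambda t: t[0])[1]
        r += step
    return None
```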

Page 43:

GAN-based Adversaries

• For text:

– Use a regularized autoencoder to get a continuous representation of the text

– Two MLPs as the generator and the inverter

Page 44:

BERT-based

• BAE [Garg et al. EMNLP'20]

– A black-box attack based on a language model

– Perturbations of the input sentence are achieved by masking a part of the input and using an LM to fill in the mask.

The authors use BERT-MLM to predict masked tokens in the text for generating adversarial examples. The MASK token replaces a word (BAE-R attack) or is inserted to the left/right of a word (BAE-I).

Page 45:

BERT-based

• Compute token importance by examining the change in the model prediction, to decide which word to perturb.

• For each token, in descending order of importance:

– Predict the top-k tokens for the mask, ranked by sentence similarity

– For BAE-R, discard candidate tokens whose POS differs from the original

– Choose the token in the most similar sentence

– If none changes the prediction, choose the one that decreases the prediction probability the most

– Check the prediction: it changes (success), or all tokens have been checked (fail)
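A sketch of the BAE-R candidate step using the Hugging Face `transformers` fill-mask pipeline (the model choice and `top_k` are illustrative; the full attack adds the similarity ranking and POS filter described above):

```python
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

def bae_r_candidates(words, i, top_k=8):
    """Mask the i-th word and let BERT-MLM propose replacement tokens."""
    masked = words[:i] + [unmasker.tokenizer.mask_token] + words[i + 1:]
    preds = unmasker(" ".join(masked), top_k=top_k)
    return [p["token_str"] for p in preds
            if p["token_str"].lower() != words[i].lower()]

print(bae_r_candidates("the acting was great".split(), 3))
```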

Page 46:

Content of this Tutorial

• Introduction to Adversarial Examples

• Attack considerations

• Black-box Attack

• (a 5-minute break)

• White-box Attack

• Attack on Multi-modal Applications

• Adversarial Training via adversarial examples

• Future Perspective

Page 47:

Content of this Tutorial

• Introduction to Adversarial Examples

• Attack considerations

• Black-box Attack

• (a 5-minute break)

• White-box Attack

• Attack on Multi-modal Applications

• Adversarial Training via adversarial examples

• Future Perspective

Page 48:

White-Box Attack


Diagram: the attacker has full access to the application DNN (architecture, parameters, gradients) when turning the original example into an adversarial example.

Page 49:

Fast Gradient Sign Method (FGSM)

• FGSM [Goodfellow et al. ICLR'15]

Page 50:

FGSM

• Linearize the loss $J(\theta, x, y)$ around the input $x$

• Maximize $\eta^{\top} \nabla_x J(\theta, x, y)$ subject to $\|\eta\|_{\infty} \leq \epsilon$

which yields the perturbation $\eta = \epsilon \, \mathrm{sign}(\nabla_x J(\theta, x, y))$, where $\nabla_x J$ is the gradient of the loss on $x$ and $\epsilon$ is the perturbation size.
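In code, FGSM is a few lines. A generic PyTorch sketch (for text, this is applied to embedding vectors rather than discrete tokens, since tokens are not differentiable):

```python
import torch

def fgsm(model, loss_fn, x, y, eps=0.01):
    """x' = x + eps * sign(grad_x J(theta, x, y))."""
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).detach()
```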

Page 51:

FGSM

• FGSM hypothesizes that the designs of modern DNNs intentionally encourage linear behavior for computational gains.

Page 52:

White-Box Attack


• Gradient-based

• Optimisation-based

• Attention-based

• Adversarial Reprogramming

• Invariance Attack

Page 53:

Gradient-based Textual Attack

• TextFool [Liang et al. IJCAI’18]

Page 54:

TextFool

• Identify text items ("hot" phrases) that are important for classification according to their cost gradients

– Word-level: word vectors that possess the highest gradient magnitude

– Character-level: hot characters are the dimensions with the highest gradient magnitude; hot words contain at least 3 hot characters.

• Then leverage these items to insert / modify / remove text

A combination of the three strategies flipped one example from "Building" (83.7%) to "Means of Transportation" (95.7%).

Page 55:

TextFool

• Modification

– The modification should follow the direction of the cost gradient of the target class, and go against the direction of the cost gradient of the original class.

TextFool attacks CNN-based models.

Page 56:

HotFlip [Ebrahimi et al. ACL'17]

• Generate adversarial examples with character "flips".

– Flip: given the one-hot representation of the input, a flip in the j-th character of the i-th word (a → b, e.g., "aid" → "bid") is represented by a vector $\vec{v}_{ijb}$ with $-1$ at the position of character a, $+1$ at the position of character b, and 0 elsewhere.

– The first-order approximation of the change in loss is the directional derivative along that vector; maximize it:

$\max_{i,j,b} \; \nabla_x L^{\top} \vec{v}_{ijb} = \max_{i,j,b} \; \frac{\partial L}{\partial x_{ij}^{(b)}} - \frac{\partial L}{\partial x_{ij}^{(a)}}$

Page 57:

HotFlip


• Inserts and deletes can be treated as sequences of character flips

• Multiple changes: fewer than 20% of characters flipped.

• Word-level: derivatives with respect to one-hot word vectors, plus semantics-preserving constraints.

• Efficiency: rather than query-based methods that need multiple forward and backward passes, HotFlip requires only one forward and one backward pass.
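The one-pass efficiency comes from scoring every possible flip with a single gradient. A PyTorch sketch of the first-order scoring (the tensor layout is an assumption: `grad_L` and `one_hot` have shape words × chars × alphabet):

```python
import torch

def best_flip(grad_L, one_hot):
    """Estimated loss increase of flipping char a -> b at position (i, j)
    is grad_L[i, j, b] - grad_L[i, j, a]; return the best-scoring flip."""
    cur = (one_hot * grad_L).sum(-1, keepdim=True)  # gradient at current char
    gain = grad_L - cur
    gain = gain.masked_fill(one_hot.bool(), float("-inf"))  # skip no-op flips
    flat = torch.argmax(gain)
    W, C, A = gain.shape
    i, j, b = flat // (C * A), (flat // A) % C, flat % A
    return int(i), int(j), int(b), float(gain[i, j, b])
```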

Page 58:

Seq2Sick [Cheng et al. AAAI’20]

• A machine translation example

– Non-overlapping attack

– Target keywords: "Hund sitzt"

Page 59:

Seq2Sick

• Non-overlapping attack: the adversarial output must share no tokens with the original output, so the loss pushes the logit of each originally predicted token below the largest other logit (their difference < 0).

Page 60:

Seq2Sick

• Targeted keyword attack

– The keywords must appear in the output, but their positions are not specified

Page 61:

Seq2Sick

• Overall objective function: $L_{\text{keyword}}$ (or $L_{\text{non-overlapping}}$) + group lasso to avoid large changes (perturbation control) + a regularization term that penalizes a large distance to the nearest point in the word embedding space.

Page 62:

Attention-based

• [Blohm et al. CoNLL’18]

– For Question Answering

– Two white-box attention-based attacks

• Word-level:

– The authors leveraged the model’s internal attention distribution to find the pivotal sentence, which is assigned a larger weight by the model to derive the correct answer.

– Then they exchanged the words that received the most attention with randomly chosen words from a known vocabulary.

• Sentence-level:

– Remove the whole sentence that gets the highest attention.

Page 63:

Reprogramming

• Adversarial Reprogramming [Elsayed et al., ICLR'19] is a class of adversarial attacks where the adversary repurposes an existing neural network for a new task chosen by the attacker, without needing to compute the specific desired output.

• Adversarial reprogramming shares the same basic idea as adversarial examples: the attack changes the behavior of a deep learning model by making changes to its input.

Page 64:

Reprogramming [Neekhara EMNLP’19]

Diagram: a classifier C trained on the original task (inputs s, labels ls) is repurposed as C' for the adversary's task (inputs t, labels lt).

Page 65:

Reprogramming

• A context-based vocabulary remapping model: a trainable 3D matrix.

• Generate the adversarial sequence s from the adversary-task input by applying this remapping to each token in its context.

Page 66:

Reprogramming

• As $s_i$ is discrete, the optimization problem is non-differentiable.

• The Gumbel-Softmax trick smooths s into a differentiable relaxation.
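A sketch of the relaxation (shapes are illustrative: sequence length × vocabulary size; PyTorch also ships this as `torch.nn.functional.gumbel_softmax`):

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, tau=1.0):
    """Differentiable 'soft one-hot' samples over the vocabulary."""
    g = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
    return F.softmax((logits + g) / tau, dim=-1)

soft_s = gumbel_softmax_sample(torch.randn(10, 5000), tau=0.5)  # 10 tokens
```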

Page 67:

Reprogramming

Page 68:

Invariance Attack

• Invariance Attack [Chaturvedi et al. arXiv'20]

– Contrary to most adversarial attacks, this method provides the model with maximally perturbed inputs that cause no change to the model's output (i.e., invariance):

$f(x'') = y$

where $x''$ is a maximally perturbed input.

– Use gradients to choose the positions for replacement

– Choose words that are in the vocabulary but not in the input sentence as replacements, picking the ones that keep the loss minimal.

Page 69:

Content of this Tutorial

• Introduction to Adversarial Examples

• Attack considerations

• Black-box Attack

• White-box Attack

• Attack on Multi-modal Applications

• Adversarial Training via Adversarial Examples

• Future Perspective

Page 70:

Attacks on Multi-Modal Applications

• From the continuous data space (images) to the discrete data space (text)

Page 71:

Show-and-Fool [Chen et al. ACL’18]

• White-box optimization-based attack on CNN+RNN

• Targeted caption and targeted keyword

Page 72:

Show-and-Fool

• Targeted caption strategy:

– Given a targeted caption, craft the image perturbation so that the model generates exactly that caption

Page 73:

Show-and-Fool

• Targeted keywords strategy

– Given targeted keywords, craft the image perturbation so that the generated caption contains all of them

Page 74:

Show-and-Fool

Page 75:

Content of this Tutorial

• Introduction to Adversarial Examples

• Attack considerations

• Black-box Attack

• White-box Attack

• Attack on Multi-modal Applications

• Adversarial Training via Adversarial Examples

• Future Perspective

Page 76:

Empirical Defences

• Defences that seem to work in practice, but lack theoretical proof.

• Many defences were broken soon after new attacks were released in publications.

• One notable exception is adversarial training

– A heuristic defence: it has no theoretical proof but is effective in most cases.

[Madry et al. '18]

Page 77:

Adversarial Training

• Adversarial training is the process of training a model to correctly classify both unmodified examples and adversarial examples.

– Data augmentation extends the original training set with the generated adversarial examples

– Model regularization enforces the generated adversarial examples as a regularizer, following the form shown on Page 79.

Page 78:

Adversarial Training

• Adversarial training is the process of training a model to correctly classify both unmodified examples and adversarial examples.

– Data augmentation extends the original training set with the generated adversarial examples


[Jia and Liang. EMNLP’17]

Page 79:

Adversarial Training

• Adversarial training is the process of training a model to correctly classify both unmodified examples and adversarial examples.

– Model Regularization. Model regularization enforces the generated adversarial examples as a regularizer and follows the form

$\tilde{J}(\theta, x, y) = \alpha \, J(\theta, x, y) + (1 - \alpha) \, J(\theta, x', y)$

where $x'$ is the adversarial example generated from $x$. In FGSM:

$x' = x + \epsilon \, \mathrm{sign}(\nabla_x J(\theta, x, y))$

Page 80:

Adversarial Training [Miyato et al. ICLR’17]

• Applying perturbations to the word embeddings in a recurrent neural network

Figure: the model with perturbed word embeddings.
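A sketch of one training step with embedding-space perturbations (names and the L2 normalization follow the common formulation of this method; `model` is assumed to map embeddings to logits):

```python
import torch

def adversarial_loss(model, loss_fn, emb, y, eps=1.0):
    """Clean loss + loss on adversarially perturbed word embeddings."""
    # First pass: get the gradient direction on the embeddings.
    e = emb.clone().detach().requires_grad_(True)
    g = torch.autograd.grad(loss_fn(model(e), y), e)[0]
    r_adv = eps * g / (g.norm(dim=-1, keepdim=True) + 1e-9)
    # Second pass: clean loss plus loss on perturbed embeddings.
    clean = loss_fn(model(emb), y)
    adv = loss_fn(model(emb + r_adv), y)
    return clean + adv  # combined objective used for backprop
```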

Page 81:

[Sato et al. ACL’20]

• Applies the aforementioned embedding-perturbation method to NMT

Page 82:

Content of this Tutorial

• Introduction to Adversarial Examples

• Attack considerations

• Black-box Attack

• White-box Attack

• Attack on Multi-modal Applications

• Adversarial Training via Adversarial Examples

• Future Perspective

Page 83:

Future Perspectives

• Perceivability vs Attack Effectiveness


• Invariance-based Attack

• Transferability [Yuan et al. '19]

– Same architecture with different data

– Different architectures with the same application

– Different architectures with different data

• Increase the robustness of the NLP models

• More applications

Page 84:

Thanks!

Q&A


Please refer to our paper: Wei Emma Zhang, Quan Z. Sheng, Ahoud Abdulrahmn F. Alhazmi, Chenliang Li: Adversarial Attacks on Deep-learning Models in Natural Language Processing: A Survey. ACM Trans. Intell. Syst. Technol. 11(3): 24:1-24:41, 2020