![Page 1: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c59b7cbd25360046204b2/html5/thumbnails/1.jpg)
Virtual Adversarial Training for Semi-Supervised Text Classification
Takeru Miyato, Andrew M. Dai, Ian Goodfellow
Slides By Roee Aharoni BIU NLP summer 2016 reading group
![Page 2: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c59b7cbd25360046204b2/html5/thumbnails/2.jpg)
Motivation
![Page 3: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c59b7cbd25360046204b2/html5/thumbnails/3.jpg)
Outline
• Introduction to (virtual) adversarial training
• Virtual adversarial training for text classification
• Experimental Setup
• Results (and some analysis)
• Conclusions
![Page 11: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c59b7cbd25360046204b2/html5/thumbnails/11.jpg)
A basic NN training procedure
Training Example
True Label
Prediction
Compute Loss
Backpropagate loss
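The steps on this slide (predict, compute the loss against the true label, backpropagate) can be sketched in NumPy. This is a minimal sketch, assuming a linear softmax classifier in place of the network; `softmax`, `train_step`, and the toy data are illustrative names, not taken from the slides.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # shift logits for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def train_step(W, x, y, lr=0.1):
    """One SGD step for a linear softmax classifier (assumed stand-in for the NN).

    W: (num_classes, dim) weights, x: (dim,) training example, y: int true label.
    Returns the updated weights and the loss measured before the update.
    """
    p = softmax(W @ x)             # prediction
    loss = -np.log(p[y])           # compute loss (cross-entropy with the true label)
    grad_z = p.copy()
    grad_z[y] -= 1.0               # gradient of the loss with respect to the logits
    grad_W = np.outer(grad_z, x)   # backpropagate to the weights
    return W - lr * grad_W, loss

# Toy data, purely illustrative.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(3, 4))
x = rng.normal(size=4)
y = 2
W_new, loss_before = train_step(W, x, y)
_, loss_after = train_step(W_new, x, y)
```

Repeating the step on the same example should lower its loss, which is what `loss_before` and `loss_after` let you check.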
![Page 12: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c59b7cbd25360046204b2/html5/thumbnails/12.jpg)
Adversarial Examples
![Page 13: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c59b7cbd25360046204b2/html5/thumbnails/13.jpg)
Adversarial Examples
That’s problematic. How can we solve it?
![Page 23: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c59b7cbd25360046204b2/html5/thumbnails/23.jpg)
Adversarial Training for NN’s
Training Example
True Label
Prediction
Compute Loss Function
Backpropagate loss
Perturbed Example
“panda”
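The procedure on this slide can be sketched as follows: perturb the training example in the direction that increases the loss, then compute the loss of the perturbed example against the original true label. This is a minimal sketch, assuming a linear softmax classifier whose input gradient has a closed form; `adversarial_example`, `nll`, and `epsilon` are illustrative names.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # shift logits for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def nll(W, x, y):
    """Negative log-likelihood of label y under a linear softmax model."""
    return -np.log(softmax(W @ x)[y])

def adversarial_example(W, x, y, epsilon):
    """Return x + r_adv, where r_adv has L2 norm epsilon and points up the loss gradient."""
    p = softmax(W @ x)
    grad_z = p.copy()
    grad_z[y] -= 1.0                 # gradient of the loss with respect to the logits
    g = W.T @ grad_z                 # gradient of the loss with respect to the input x
    r_adv = epsilon * g / (np.linalg.norm(g) + 1e-12)
    return x + r_adv

# Toy data, purely illustrative.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
x = rng.normal(size=4)
y = 1
x_adv = adversarial_example(W, x, y, epsilon=0.5)
clean_loss = nll(W, x, y)
adv_loss = nll(W, x_adv, y)  # this term is added to the training cost
```

In a real network the input gradient `g` would come from backpropagation rather than a closed form.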
![Page 24: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c59b7cbd25360046204b2/html5/thumbnails/24.jpg)
Adversarial Training for NN’s
• The general addition to the cost function for adversarial training:
• In practice, the perturbation is taken along the gradient direction:
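The two formulas referred to above are not in the transcript text. As a reconstruction following the formulation in the paper (Miyato, Dai, Goodfellow), with \hat{\theta} denoting a constant copy of the current parameters:

```latex
% Adversarial term added to the cost function
-\log p(y \mid x + r_{\mathrm{adv}}; \theta),
\qquad
r_{\mathrm{adv}} = \operatorname*{arg\,min}_{r,\ \lVert r \rVert \le \epsilon}
  \log p(y \mid x + r; \hat{\theta})

% Linear approximation used in practice
r_{\mathrm{adv}} = -\epsilon \, \frac{g}{\lVert g \rVert_2},
\qquad
g = \nabla_{x} \log p(y \mid x; \hat{\theta})
```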
![Page 25: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c59b7cbd25360046204b2/html5/thumbnails/25.jpg)
Virtual Adversarial Training
• Extends adversarial training to the semi-supervised regime
• The key idea: make the output distributions for an original example and its perturbed version close to each other
• Enables the use of large amounts of unlabeled data
![Page 32: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c59b7cbd25360046204b2/html5/thumbnails/32.jpg)
Virtual Adversarial Training for NN’s
Unlabeled Example
Perturbed Example
Prediction Distribution (one for each of the two examples)
Measure KL Divergence-based loss
Backpropagate loss
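The label-free procedure on this slide can be sketched as follows. This is a minimal sketch, assuming a linear softmax classifier and a single gradient step from a small random direction to approximate the worst-case perturbation; `virtual_adversarial_perturbation`, `xi`, and `epsilon` are illustrative names.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # shift logits for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q):
    """KL divergence KL(p || q) between two discrete distributions."""
    return float(np.sum(p * (np.log(p) - np.log(q))))

def virtual_adversarial_perturbation(W, x, epsilon, xi=0.1, rng=None):
    """Approximate the perturbation of norm epsilon that most changes the prediction."""
    rng = np.random.default_rng(0) if rng is None else rng
    p = softmax(W @ x)                   # prediction distribution on the unlabeled example
    d = rng.normal(size=x.shape)
    d /= np.linalg.norm(d)               # small random direction to start from
    q = softmax(W @ (x + xi * d))
    g = W.T @ (q - p)                    # gradient of KL(p || q) w.r.t. the perturbation (linear model)
    return epsilon * g / (np.linalg.norm(g) + 1e-12)

# Toy data, purely illustrative; note that no label is used.
rng = np.random.default_rng(1)
W = rng.normal(size=(3, 4))
x = rng.normal(size=4)
r = virtual_adversarial_perturbation(W, x, epsilon=0.5)
vat_loss = kl(softmax(W @ x), softmax(W @ (x + r)))  # KL-based loss to backpropagate
```

Because only the model's own predictions are compared, the same loss can be computed on unlabeled data.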
![Page 33: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c59b7cbd25360046204b2/html5/thumbnails/33.jpg)
Virtual Adversarial Training for NN’s
• The general addition to the cost function for virtual adversarial training:
• Again, in practice, there is an efficient way to approximate this (as detailed in Miyato et al., 2016)
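The formula referred to above is not in the transcript text. As a reconstruction following the formulation in the paper, with \hat{\theta} a constant copy of the current parameters:

```latex
% Virtual adversarial term added to the cost function
\mathrm{KL}\!\left[\, p(\cdot \mid x; \hat{\theta}) \;\middle\|\; p(\cdot \mid x + r_{\text{v-adv}}; \theta) \,\right],
\qquad
r_{\text{v-adv}} = \operatorname*{arg\,max}_{r,\ \lVert r \rVert \le \epsilon}
  \mathrm{KL}\!\left[\, p(\cdot \mid x; \hat{\theta}) \;\middle\|\; p(\cdot \mid x + r; \hat{\theta}) \,\right]
```

No label y appears in either expression, which is what allows unlabeled examples to be used.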
![Page 34: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c59b7cbd25360046204b2/html5/thumbnails/34.jpg)
Model - Adversarial Training for Text Classification
• Adversarial perturbations typically consist of small modifications to very many real-valued inputs (e.g., the pixels in the previous examples)
• For text classification, the input is discrete, and usually represented as a series of high-dimensional one-hot vectors (where such small modifications are impossible).
• Solution: define the perturbation on continuous word embeddings instead of discrete word inputs.
![Page 35: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c59b7cbd25360046204b2/html5/thumbnails/35.jpg)
Model - Adversarial Training for Text Classification
• The perturbation is applied to normalized embeddings, so that the network cannot make it insignificant by learning embeddings with very large norms:
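The normalization in the paper standardizes each embedding dimension using word-frequency-weighted mean and variance. A minimal NumPy sketch, assuming an embedding matrix `V` with one row per vocabulary word and a vector `freq` of word counts (both names are illustrative):

```python
import numpy as np

def normalize_embeddings(V, freq, eps=1e-12):
    """Standardize embeddings with frequency-weighted statistics.

    V: (vocab_size, dim) embedding matrix, freq: (vocab_size,) word counts.
    """
    f = freq / freq.sum()                            # relative frequency of each word
    mean = (f[:, None] * V).sum(axis=0)              # E(v) = sum_j f_j v_j
    var = (f[:, None] * (V - mean) ** 2).sum(axis=0) # Var(v) = sum_j f_j (v_j - E(v))^2
    return (V - mean) / np.sqrt(var + eps)

# Toy data, purely illustrative.
rng = np.random.default_rng(0)
V = rng.normal(loc=3.0, scale=10.0, size=(50, 8))
freq = rng.integers(1, 100, size=50).astype(float)
V_bar = normalize_embeddings(V, freq)
```

After normalization the embeddings have a fixed scale, so a perturbation of norm epsilon cannot be shrunk into irrelevance by scaling the embeddings up.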
![Page 36: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c59b7cbd25360046204b2/html5/thumbnails/36.jpg)
Adversarial Training for Text Classification
• We model the input text as:
• The perturbation is defined as:
• And the addition to the loss function is:
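The formulas for this slide are not in the transcript text. As a reconstruction following the paper, where \bar{v}^{(t)} is the normalized embedding of the t-th word, T is the sequence length, and N is the number of labeled examples:

```latex
% Input text as a sequence of normalized word embeddings
s = \left[ \bar{v}^{(1)}, \bar{v}^{(2)}, \dots, \bar{v}^{(T)} \right]

% Adversarial perturbation on the embeddings
r_{\mathrm{adv}} = -\epsilon \, \frac{g}{\lVert g \rVert_2},
\qquad
g = \nabla_{s} \log p(y \mid s; \hat{\theta})

% Adversarial loss
L_{\mathrm{adv}}(\theta) = -\frac{1}{N} \sum_{n=1}^{N}
  \log p\!\left( y_n \mid s_n + r_{\mathrm{adv},n}; \theta \right)
```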
![Page 37: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c59b7cbd25360046204b2/html5/thumbnails/37.jpg)
Virtual Adversarial Training for Text Classification
• Here, the perturbation is defined as:
• And the addition to the loss function is then:
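The formulas for this slide are not in the transcript text. As a reconstruction following the paper, where d is a small random vector with the same dimensions as s and N' is the number of labeled plus unlabeled examples:

```latex
% Virtual adversarial perturbation on the embeddings
r_{\text{v-adv}} = \epsilon \, \frac{g}{\lVert g \rVert_2},
\qquad
g = \nabla_{s + d}\, \mathrm{KL}\!\left[\, p(\cdot \mid s; \hat{\theta}) \;\middle\|\; p(\cdot \mid s + d; \hat{\theta}) \,\right]

% Virtual adversarial loss
L_{\text{v-adv}}(\theta) = \frac{1}{N'} \sum_{n'=1}^{N'}
  \mathrm{KL}\!\left[\, p(\cdot \mid s_{n'}; \hat{\theta}) \;\middle\|\; p(\cdot \mid s_{n'} + r_{\text{v-adv},n'}; \theta) \,\right]
```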
![Page 38: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c59b7cbd25360046204b2/html5/thumbnails/38.jpg)
Experimental Settings
• 5 datasets:
• Sentiment classification (binary): IMDB, Rotten Tomatoes, Elec
• Topic classification (multiclass): DBpedia, RCV1
![Page 39: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c59b7cbd25360046204b2/html5/thumbnails/39.jpg)
Experimental Settings - Preprocessing
• Treat punctuation as spaces
• Convert words to lower case
• Remove words which appear in only one document
• RCV1 - remove stop words
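The preprocessing steps can be sketched with the standard library. This is a minimal sketch, assuming whitespace tokenization and ASCII punctuation only; `tokenize`, `preprocess`, and the optional `stop_words` argument (for the RCV1 case) are illustrative names.

```python
import string
from collections import Counter

# Map every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def tokenize(text):
    """Treat punctuation as spaces, lower-case, and split on whitespace."""
    return text.translate(_PUNCT_TO_SPACE).lower().split()

def preprocess(documents, stop_words=frozenset()):
    """Drop words that appear in only one document, plus any given stop words."""
    tokenized = [tokenize(doc) for doc in documents]
    # Document frequency: count each word once per document.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    return [
        [w for w in tokens if doc_freq[w] > 1 and w not in stop_words]
        for tokens in tokenized
    ]

docs = ["Great movie, great cast!", "A great plot; weak cast.", "Weak movie."]
cleaned = preprocess(docs)
# cleaned == [["great", "movie", "great", "cast"], ["great", "weak", "cast"], ["weak", "movie"]]
```

Here "a" and "plot" are dropped because each occurs in a single document.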
![Page 40: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c59b7cbd25360046204b2/html5/thumbnails/40.jpg)
Pre-Training Tricks and Hyperparams
• Initialize word embeddings and LSTM weights with RNNLM on labeled and unlabeled examples
• Single layer LSTM, 1024 units (512 for BiLSTM)
• Embedding size: 256(IMDB, BiLSTM)/512(Rest)
• Sampled softmax loss with 1024 candidate samples (?)
• Adam optimization, 256 samples per batch
• 0.5 dropout rate on the word embeddings
![Page 41: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c59b7cbd25360046204b2/html5/thumbnails/41.jpg)
Classification Model Tricks and Hyperparams
• 1 Hidden layer before softmax, 30(IMDB, Elec, Rotten)/128(Rest) units
• ReLU activation function
• batch size - 64(IMDB, Elec, RCV1)/128(Rest)
• 10k-20k training steps for each model
• Truncated back propagation - stop back propagating after 400 steps
• Generate perturbation after dropout
• Optimize epsilon, dropout rate on validation set
![Page 42: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c59b7cbd25360046204b2/html5/thumbnails/42.jpg)
Results - IMDB
• Adversarial and virtual adversarial training show lower negative log-likelihood
• Virtual adversarial training also improves the adversarial training loss
![Page 43: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c59b7cbd25360046204b2/html5/thumbnails/43.jpg)
Results - IMDB
![Page 44: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c59b7cbd25360046204b2/html5/thumbnails/44.jpg)
Embedding-Based Analysis
![Page 45: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c59b7cbd25360046204b2/html5/thumbnails/45.jpg)
Results - Elec (sentiment), RCV1 (topic)
• Improved SOTA on Elec, RCV1, without using CNNs
![Page 46: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c59b7cbd25360046204b2/html5/thumbnails/46.jpg)
Results - Sentiment analysis: Rotten Tomatoes
• Adversarial + virtual adversarial training performs on par with SOTA
• Virtual adversarial training alone is weaker than the baseline, possibly due to the small number of labeled examples and the short sentences
![Page 47: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c59b7cbd25360046204b2/html5/thumbnails/47.jpg)
Results - DBpedia
• Baseline itself improves over SOTA, virtual adversarial performs best
![Page 48: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c59b7cbd25360046204b2/html5/thumbnails/48.jpg)
Related Work
• Dropout/Random Noise
• Generative Models
• Pre-Training as semi-supervised learning
![Page 49: Virtual Adversarial Training for Semi-Supervised …Virtual Adversarial Training for Semi-Supervised Text Classification Takeru Miyato, Andrew M. Dai, Ian Goodfellow Slides By Roee](https://reader033.vdocument.in/reader033/viewer/2022050416/5f8c59b7cbd25360046204b2/html5/thumbnails/49.jpg)
Conclusion
• Adversarial and virtual adversarial training provide good regularization for text classification with RNNs
• They give SOTA or on-par results on the examined datasets
• “Improved quality” of word embeddings