
Convolutional Neural Networks for Text Classification

Sebastian Sierra

MindLab Research Group

July 1, 2016

Sebastian Sierra (MindLab Research Group) NLP Summer Class July 1, 2016 1 / 32


Outline

1 What is a Convolution?

2 What are Convolutional Neural Networks?

3 CNN for NLP

4 CNN hyperparameters

5 Example: The Model

6 Bibliography


What is a Convolution?

Convolutions are great for extracting features from images.

Convolutional Neural Networks (CNNs) are biologically inspired variants of MLPs.


Green: Input, Yellow: Convolutional Filter, Red: Output
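The operation described by the legend above (a filter sliding over an input to produce an output) can be sketched in a few lines of NumPy. This is only an illustration, not code from the slides: the function name `conv2d_valid` and the 5×5 input and 3×3 filter values are made up for the example, and the filter is applied without flipping (cross-correlation), as is usual in CNN libraries.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the elementwise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            # Each output cell is the sum of the window multiplied by the filter.
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# Hypothetical input ("green") and filter ("yellow").
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

feature_map = conv2d_valid(image, kernel)  # output ("red")
print(feature_map)
# [[4. 3. 4.]
#  [2. 4. 3.]
#  [2. 3. 4.]]
```

A 5×5 input and a 3×3 filter give a 3×3 output, because the filter fits in 5 − 3 + 1 = 3 positions along each axis.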


Convolutional Neural Network

Figure 1: Close up of Convolutional Neural Network


Convolutional Neural Network

CNNs are networks composed of several layers of convolutions with nonlinear activation functions like ReLU or tanh applied to the results.

Traditional layers are fully connected; CNNs instead use local connections.

Each layer applies many different filters (up to thousands), like the one shown above, and combines their results.


Properties of Convolutional Neural Networks

Local Invariance

Compositionality


CNN in NLP

Do they make sense in NLP?

Recurrent Neural Networks might seem to make more sense for learning patterns from a text sequence. CNNs are not cognitively or linguistically plausible.

Advantage

There are fast GPU implementations for CNNs


An example of how a CNN works


Char-CNN

Let’s try this model

$C \in \mathbb{R}^{d \times l}$: matrix representation of a sequence of length $l$ (140, 300, ?).

$H \in \mathbb{R}^{d \times w}$: convolutional filter matrix, where

$d$: dimensionality of character embeddings (30 used)

$w$: width of the convolution filter (3, 4, 5)
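To make the shapes concrete, the sketch below builds the matrix C for a short string. It is only an illustration under assumptions not stated in the slides: the alphabet, the random embedding table `E`, the right zero-padding to a fixed length and the helper name `encode` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

alphabet = "abcdefghijklmnopqrstuvwxyz "  # hypothetical character vocabulary
char_to_idx = {ch: i for i, ch in enumerate(alphabet)}

d = 30   # dimensionality of character embeddings (value from the slide)
l = 140  # fixed sequence length (one of the values from the slide)

# Hypothetical embedding table: one d-dimensional column per character.
# In a trained model these values would be learned.
E = rng.normal(scale=0.1, size=(d, len(alphabet)))

def encode(text, length=l):
    """Return C with shape (d, length): column i is the embedding of character i.
    Characters outside the alphabet are skipped; short texts are zero-padded on the right."""
    ids = [char_to_idx[ch] for ch in text.lower() if ch in char_to_idx][:length]
    C = np.zeros((d, length))
    C[:, :len(ids)] = E[:, ids]
    return C

C = encode("convolutions for text")
print(C.shape)  # (30, 140)
```

A filter H of width w then covers w consecutive columns of C, that is, w consecutive characters across all d embedding dimensions.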


A simple architecture


Steps for applying a CNN

1 Apply a convolution between $C$ and $H$ to obtain a vector $f \in \mathbb{R}^{l-w+1}$:

$$f[i] = \langle C[*, i : i + w - 1], H \rangle$$

where $\langle A, B \rangle = \mathrm{Tr}(AB^T)$ is the Frobenius inner product.

2 This vector $f$ is also known as a feature map.

3 Take the maximum value over time as the feature that represents filter $H$ (max-over-time pooling, i.e. k-max pooling with $k = 1$):

$$\hat{f} = \mathrm{relu}\left(\max_i \{f[i]\} + b\right)$$

4 Then do the same for all $m$ filters:

$$z = [\hat{f}_1, \ldots, \hat{f}_m]$$
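The four steps above can be sketched in NumPy as follows. This is a minimal illustration with random, untrained values; the function names `feature_map` and `cnn_features`, the number of filters and the seed are assumptions made for the example, and indices are 0-based rather than 1-based.

```python
import numpy as np

def feature_map(C, H):
    """Steps 1-2: f[i] = <C[:, i:i+w], H> (Frobenius inner product); f has length l - w + 1."""
    l = C.shape[1]
    w = H.shape[1]
    return np.array([np.sum(C[:, i:i + w] * H) for i in range(l - w + 1)])

def cnn_features(C, filters, biases):
    """Steps 3-4: max over time, add the bias, apply ReLU, and stack one value per filter."""
    pooled = [np.max(feature_map(C, H)) + b for H, b in zip(filters, biases)]
    return np.maximum(0.0, np.array(pooled))

rng = np.random.default_rng(0)
d, l = 30, 140                # values quoted on the Char-CNN slide
C = rng.normal(size=(d, l))   # stand-in for an embedded character sequence
widths = [3, 4, 5]            # filter widths quoted on the Char-CNN slide
# Hypothetical choice: two filters per width, so m = 6.
filters = [rng.normal(size=(d, w)) for w in widths for _ in range(2)]
biases = np.zeros(len(filters))

f = feature_map(C, filters[0])
z = cnn_features(C, filters, biases)
print(f.shape, z.shape)  # (138,) (6,)
```

Because each filter contributes a single pooled value, the length of z depends only on the number of filters m, not on the sequence length l or the filter widths.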


Intuition behind

Why ReLU and not tanh?

Should I use multiple filter weights H?

Should I use variable filter widths w?

Can I add another channel, as in the computer vision domain?

Is Max-Pooling capturing the most important activation?

Would they capture morphological relations?


Regularization tricks

Use dropout (gradients are only backpropagated through the entries of $z$ that were not dropped).

Constrain the L2 norms of the weight vectors of each class (rows in the softmax matrix $W^{(S)}$) to a fixed number: if $\|W^{(S)}_c\| > s$, then rescale it so that $\|W^{(S)}_c\| = s$.

Early Stopping
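The first two tricks can be sketched as below. This is an illustration under assumptions: the function names `dropout` and `clip_row_norms`, the keep probability, the threshold `s = 3.0` and the random values are made up for the example, and the dropout shown is the common "inverted" variant that rescales the kept entries at training time.

```python
import numpy as np

def dropout(z, keep_prob, rng):
    """Zero a random subset of the entries of z; gradients then flow only through kept entries.
    Kept entries are divided by keep_prob (inverted dropout) to preserve the expected value."""
    mask = rng.random(z.shape) < keep_prob
    return z * mask / keep_prob, mask

def clip_row_norms(W, s):
    """Rescale every row of W whose L2 norm exceeds s so that its norm equals s."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.minimum(1.0, s / np.maximum(norms, 1e-12))
    return W * scale

rng = np.random.default_rng(0)
z = rng.random(6)  # stand-in for the pooled feature vector z
z_dropped, mask = dropout(z, keep_prob=0.5, rng=rng)

# Hypothetical softmax matrix: 4 classes (rows), 6 features (columns).
W = rng.normal(scale=2.0, size=(4, 6))
W_clipped = clip_row_norms(W, s=3.0)  # typically applied after each gradient update
print(np.linalg.norm(W_clipped, axis=1))
```

Rows whose norm is already at most `s` are left unchanged, so the constraint only acts on classes whose weights have grown too large.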


CNN in NLP - Previous Work

Previous work: NLP (almost) from scratch (Collobert et al. 2011).

Sentence or paragraph modelling using words as input (Kim 2014; Kalchbrenner, Grefenstette, and Blunsom 2014; Johnson and T. Zhang 2015a; Johnson and T. Zhang 2015b).

Text classification using characters as input (Kim et al. 2016; X. Zhang, Zhao, and LeCun 2015).


Collobert, Ronan et al. (2011). “Natural language processing (almost) from scratch”. In: Journal of Machine Learning Research 12.Aug, pp. 2493–2537.

Johnson, Rie and Tong Zhang (2015a). “Effective Use of Word Order for Text Categorization with Convolutional Neural Networks”. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Denver, Colorado: Association for Computational Linguistics, pp. 103–112. url: http://www.aclweb.org/anthology/N15-1011.

Johnson, Rie and Tong Zhang (2015b). “Semi-supervised convolutional neural networks for text categorization via region embedding”. In: Advances in Neural Information Processing Systems, pp. 919–927.

Kalchbrenner, Nal, Edward Grefenstette, and Phil Blunsom (2014). “A Convolutional Neural Network for Modelling Sentences”. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Baltimore, Maryland: Association for Computational Linguistics, pp. 655–665. url: http://www.aclweb.org/anthology/P14-1062.

Kim, Yoon (2014). “Convolutional Neural Networks for Sentence Classification”. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha, Qatar: Association for Computational Linguistics, pp. 1746–1751. url: http://www.aclweb.org/anthology/D14-1181.

Kim, Yoon et al. (2016). “Character-Aware Neural Language Models”. In: Thirtieth AAAI Conference on Artificial Intelligence.

Zhang, Xiang, Junbo Zhao, and Yann LeCun (2015). “Character-level Convolutional Networks for Text Classification”. In: Advances in Neural Information Processing Systems (NIPS 2015). Vol. 28.
