Lecture 5: Convolutional Neural Networks
TRANSCRIPT
Lecture 5: Convolutional Neural Networks
LTAT.01.001 – Natural Language Processing
Kairit Sirts ([email protected])
17.03.2021
Plan for today
● CNN as an analog to the human visual system
● CNN components
  ● Convolutional layers
  ● Pooling layers
  ● Linear and non-linear layers
● An example CNN for text classification
● (Interpreting CNN decisions)
Convolutional neural networks
● Feature extractors
● Analogous to the brain’s visual system
● First applied in the image processing domain
● Now also used in NLP
Feature extraction in image detection
Visual system in the brain
Convolutional NN for image classification
“Artificial” and “natural” CNN
Convolutional layers
Convolutional layers
$I$ − input, $F$ − filter, $O$ − output

For a 3×3 filter, each output element is the sum of element-wise products between the filter and the input window it covers, e.g.:

$$O_{1,1} = I_{1,1}F_{1,1} + I_{1,2}F_{1,2} + I_{1,3}F_{1,3} + I_{2,1}F_{2,1} + I_{2,2}F_{2,2} + I_{2,3}F_{2,3} + I_{3,1}F_{3,1} + I_{3,2}F_{3,2} + I_{3,3}F_{3,3}$$

In general, for a $k \times k$ filter:

$$O_{i,j} = \sum_{a=1}^{k}\sum_{b=1}^{k} I_{i+a-1,\,j+b-1} \cdot F_{a,b}$$

https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-convolutional-neural-networks
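The formula above can be sketched directly in NumPy; the function name `conv2d` and the toy 4×4 input are illustrative only:

```python
import numpy as np

def conv2d(image, filt):
    """Valid 2D convolution (no padding): slide the filter over the
    input and sum the element-wise products at each position.
    Matches O[i,j] = sum_{a,b} I[i+a, j+b] * F[a,b] (0-based indices)."""
    n, m = image.shape
    k, _ = filt.shape
    out = np.zeros((n - k + 1, m - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+k, j:j+k] * filt)
    return out

# A 3x3 all-ones filter applied to a 4x4 input gives a 2x2 output.
I = np.arange(16, dtype=float).reshape(4, 4)
F = np.ones((3, 3))
O = conv2d(I, F)
print(O.shape)   # (2, 2)
```

Note this is the cross-correlation form used on the slide (no filter flipping), which is what deep learning libraries compute as well.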
Convolutional layer for text
Embed Embed Embed Embed Embed Embed
The cat sat on the mat
Convolutional layer for text
● The convolutional filter is an n-gram detector
● Concatenate the embeddings of the n-gram
● Compute the dot product between the concatenated embedding vector and the filter
● The result is a scalar score for the n-gram
Embed Embed Embed
The cat sat
×
1D filter
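A minimal NumPy sketch of this scoring step, with made-up sizes (embedding dim 4, trigram filter) and random vectors standing in for trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (all sizes hypothetical): embedding dim 4, trigram filter.
emb_dim, k = 4, 3
words = "The cat sat on the mat".split()
embed = {w: rng.normal(size=emb_dim) for w in set(words)}

# A trigram filter is a single vector of length k * emb_dim.
filt = rng.normal(size=k * emb_dim)

# Score one n-gram: concatenate its embeddings, take the dot product.
ngram = words[0:3]                              # ['The', 'cat', 'sat']
x = np.concatenate([embed[w] for w in ngram])   # length-12 vector
score = x @ filt                                # a single scalar score
```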
Convolutional layer for text
Embed Embed Embed Embed Embed Embed
The cat sat on the mat
×
Another (but equivalent) formulation
Embed Embed Embed Embed Embed Embed
The cat sat on the mat
× Element-wise multiplication, then sum up
Again, apply to every n-gram
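The equivalence of the two formulations can be checked numerically; all names and sizes below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
emb_dim, k, n = 4, 3, 6              # hypothetical sizes
E = rng.normal(size=(n, emb_dim))    # one embedding row per word
filt = rng.normal(size=k * emb_dim)  # flat 1D trigram filter

# Formulation 1: dot product with the concatenated n-gram window.
s1 = E[0:k].reshape(-1) @ filt

# Formulation 2: reshape the filter to (k, emb_dim), multiply
# element-wise with the window, then sum everything up.
F2 = filt.reshape(k, emb_dim)
s2 = np.sum(E[0:k] * F2)

# Apply to every n-gram in the text: n - k + 1 scores.
scores = np.array([np.sum(E[i:i+k] * F2) for i in range(n - k + 1)])
```

Both formulations compute the same sum of pairwise products; they differ only in how the filter is laid out.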
Different filters: unigrams, bigrams, trigrams, etc.
Embed Embed Embed Embed Embed Embed
The cat sat on the mat
× × ×
unigrams bigrams trigrams
Number of filters
● There are typically several filters for each n-gram type
● This is a tunable hyperparameter
The cat sat on the mat
×
Number of filters = 3
Final representation
● After applying 5 different filters to unigrams, bigrams and trigrams
The cat sat on the mat
unigrams bigrams trigrams
filters = 5
Narrow and wide convolution
● Narrow convolution:
  ● Start the n-grams from the first word and end with the last word in the text
  ● The resulting vector has n − k + 1 elements
    ● n – length of the sequence
    ● k – n-gram length
The cat sat on the mat
Narrow and wide convolution
● Wide convolution:
  ● Pad the sequence at the beginning and at the end with a special PAD symbol
  ● The length of the resulting vector is equal to n + k − 1
PAD The cat sat on the mat PAD
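A sketch of both variants, assuming (as is common) that PAD is embedded as a zero vector; `conv1d` and all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
emb_dim, k, n = 4, 3, 6                  # hypothetical sizes
E = rng.normal(size=(n, emb_dim))        # embedded 6-word text
F = rng.normal(size=(k, emb_dim))        # trigram filter

def conv1d(E, F):
    """Score every k-gram window of E with filter F."""
    k = F.shape[0]
    return np.array([np.sum(E[i:i+k] * F) for i in range(len(E) - k + 1)])

# PAD embedded as a zero vector (an assumption for this sketch),
# k - 1 pads on each side for a wide convolution.
pad = np.zeros((k - 1, emb_dim))

narrow = conv1d(E, F)                        # n - k + 1 = 4 scores
wide = conv1d(np.vstack([pad, E, pad]), F)   # n + k - 1 = 8 scores
```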
Wide convolution
● Trigram filter
PAD PAD The cat sat on the mat PAD PAD
Pooling layers
Pooling layers
● The goal is to reduce the dimensionality of the feature map (the output of the convolutional layer)
● Brings the representations of different inputs to the same dimensionality
● Typical pooling operations:
  ● max-pooling (over time)
  ● average pooling
Max-pooling over time
● Retain the maximum value over all values computed by a filter
● The information about the most relevant n-grams is kept
The cat sat on the mat
×
Number of filters = 3
max-pool
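A one-line NumPy sketch of this step, assuming the feature map is stored as positions × filters (sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical feature map: 4 n-gram positions x 3 filters.
feature_map = rng.normal(size=(4, 3))

# Max-pooling over time: for each filter, keep its largest score
# across all positions -> a fixed-size vector for any text length.
pooled = feature_map.max(axis=0)
```

Because the maximum is taken along the time axis, texts of different lengths all pool down to a vector with one entry per filter.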
Max-pooling over time
unigrams bigrams trigrams
filters = 5
The cat sat on the mat
unigrams bigrams trigrams
filters = 5
The cat sat on the mat
unigrams bigrams trigrams
Other pooling layers
● Average pooling: compute the average along the filter axis
● K-max-pooling: retain the k maximum elements (instead of just one), keeping the original order of the elements
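A possible NumPy sketch of k-max-pooling; `k_max_pool` is an illustrative helper, not a library function:

```python
import numpy as np

def k_max_pool(scores, k):
    """Keep the k largest values of a 1D score sequence,
    preserving their original (temporal) order."""
    idx = np.sort(np.argsort(scores)[-k:])   # top-k positions, re-sorted
    return scores[idx]

s = np.array([0.1, 0.9, 0.3, 0.7, 0.2])
top2 = k_max_pool(s, 2)        # [0.9, 0.7]: largest two, order kept
avg = s.mean()                 # average pooling, for comparison
```

Sorting the selected indices is what preserves the temporal order: without it, the values would come out ranked by magnitude instead.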
Linear and non-linear layers
Linear layer
● A linear layer is just a parameter matrix
● The input vector is multiplied with the linear layer matrix from the left
[Figure: input vector 𝒙 multiplied by the linear layer matrix 𝑊 (input dim × output dim) gives the output vector 𝒚]

𝒚 = 𝒙 ⋅ 𝑊
Bias vector
● Often a bias vector is part of the linear layer
[Figure: input vector 𝒙 multiplied by 𝑊, plus the bias vector 𝒃, gives the output vector 𝒚]

𝒚 = 𝒙 ⋅ 𝑊 + 𝒃
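In NumPy, with made-up dimensions, the linear layer with bias is a single matrix-vector product:

```python
import numpy as np

rng = np.random.default_rng(4)
in_dim, out_dim = 5, 3                    # hypothetical dimensions
W = rng.normal(size=(in_dim, out_dim))    # linear layer: a parameter matrix
b = rng.normal(size=out_dim)              # bias vector
x = rng.normal(size=in_dim)               # input (row) vector

y = x @ W + b                             # y = x . W + b
```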
Non-linear activation functions
● Typically applied on top of a linear layer
● Except for the linear output layer – after that typically comes the softmax
An example CNN for text classification
Y. Kim, 2014. Convolutional Neural Networks for Sentence Classification
Things to consider
● Pretrained word embeddings or learn from scratch?
● What kind of n-gram filters?
● How many filters per n-gram?
● Dropout?
● How to model the output layer in the binary case? One-dimensional with sigmoid or two-dimensional with softmax?
● Something else?
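Putting the components together, here is a toy NumPy forward pass in the spirit of Kim (2014); all sizes, the random weights, and the helper names are illustrative sketches, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(5)

def conv1d(E, F):
    """Narrow convolution of embeddings E (n, d) with filter F (k, d)."""
    k = F.shape[0]
    return np.array([np.sum(E[i:i+k] * F) for i in range(len(E) - k + 1)])

def cnn_forward(E, filters, W, b):
    """Kim-style forward pass: convolve with every filter, max-pool each
    feature map over time, concatenate, apply a linear output layer."""
    pooled = np.array([conv1d(E, F).max() for F in filters])
    logits = pooled @ W + b
    e = np.exp(logits - logits.max())    # softmax over the classes
    return e / e.sum()

emb_dim, n, n_classes = 8, 6, 2
E = rng.normal(size=(n, emb_dim))         # embedded "The cat sat on the mat"
filter_sizes, per_size = (1, 2, 3), 5     # 5 filters each for uni/bi/trigrams
filters = [rng.normal(size=(k, emb_dim))
           for k in filter_sizes for _ in range(per_size)]
W = rng.normal(size=(len(filters), n_classes))
b = np.zeros(n_classes)

probs = cnn_forward(E, filters, W, b)
print(probs.shape)   # (2,)
```

A real implementation would add a non-linearity after the convolution, dropout before the output layer, and trained rather than random parameters.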
Interpreting CNNs
Decision interpretation
● For each test item it is possible to check which n-grams passed the max-pool and thus contributed to the classification decision
Examples from: Jacovi et al., 2018. Understanding Convolutional Neural Networks for Text Classification
Model interpretation
● Data from Reddit
● Model for predicting users with self-reported eating disorder
● Posts extracted from non mental health related subreddits
● The model configuration:
  ● The model architecture of Kim (2014)
  ● n-gram filters: unigrams to 5-grams
  ● 100 filters per n-gram
  ● Inputs: all posts per user concatenated together
  ● Maximum length of the input: 15000 words
  ● All words appearing fewer than 20 times are replaced with the UNK token
● Look at which n-grams most frequently passed through the max-pool of each filter