NLP Programming Tutorial 10 – Neural Networks
Graham Neubig
Nara Institute of Science and Technology (NAIST)
Prediction Problems
Given x, predict y
Example we will use:
● Given an introductory sentence from Wikipedia
● Predict whether the article is about a person
● This is binary classification (of course!)
Given
Gonso was a Sanron sect priest (754–827) in the late Nara and early Heian periods.
Predict
Yes!
Shichikuzan Chigogataki Fudomyoo is a historical site located at Magura, Maizuru City, Kyoto Prefecture.
No!
Linear Classifiers
y = sign(w⋅φ(x)) = sign(∑i=1..I wi⋅φi(x))
● x: the input
● φ(x): a vector of feature functions {φ1(x), φ2(x), …, φI(x)}
● w: the weight vector {w1, w2, …, wI}
● y: the prediction, +1 if “yes”, -1 if “no”
● (sign(v) is +1 if v >= 0, -1 otherwise)
Example Feature Functions: Unigram Features
● Equal to “number of times a particular word appears”
x = A site , located in Maizuru , Kyoto
φunigram “A”(x) = 1    φunigram “site”(x) = 1    φunigram “,”(x) = 2
φunigram “located”(x) = 1    φunigram “in”(x) = 1
φunigram “Maizuru”(x) = 1    φunigram “Kyoto”(x) = 1
φunigram “the”(x) = 0    φunigram “temple”(x) = 0
… The rest are all 0
● For convenience, we use feature names (φunigram “A”) instead of feature indexes (φ1)
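One possible sketch of such a feature extractor in Python, counting unigrams into a sparse name→value dictionary (the helper name `create_features` matches the one used in the training pseudocode near the end of this tutorial):

```python
from collections import defaultdict

def create_features(x):
    """Count unigram features, keyed by a readable feature name."""
    phi = defaultdict(int)
    for word in x.split():
        phi["unigram " + word] += 1
    return phi

phi = create_features("A site , located in Maizuru , Kyoto")
print(phi["unigram ,"])    # 2
print(phi["unigram the"])  # 0 (unseen words default to 0)
```

Using a defaultdict means "the rest are all 0" comes for free: any feature not seen in the sentence reads back as 0.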
Calculating the Weighted Sum
x = A site , located in Maizuru , Kyoto

φunigram “A”(x) = 1        wunigram “a” = 0
φunigram “site”(x) = 1     wunigram “site” = -3
φunigram “,”(x) = 2        wunigram “,” = 0
φunigram “located”(x) = 1  wunigram “located” = 0
φunigram “in”(x) = 1       wunigram “in” = 0
φunigram “Maizuru”(x) = 1  wunigram “Maizuru” = 0
φunigram “Kyoto”(x) = 1    wunigram “Kyoto” = 0
φunigram “priest”(x) = 0   wunigram “priest” = 2
φunigram “black”(x) = 0    wunigram “black” = 0

Multiplying each feature value by its weight and summing:
1⋅0 + 1⋅(-3) + 2⋅0 + 1⋅0 + 1⋅0 + 1⋅0 + 1⋅0 + 0⋅2 + 0⋅0 = -3 → No!
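The same weighted sum can be reproduced in a few lines of Python with sparse dictionaries (only the nonzero weights need to be stored):

```python
# Sparse feature vector and weight vector from the example above
phi = {"A": 1, "site": 1, ",": 2, "located": 1, "in": 1, "Maizuru": 1, "Kyoto": 1}
w = {"site": -3, "priest": 2}   # all other weights are 0

# w * φ(x): missing names contribute 0
score = sum(value * w.get(name, 0) for name, value in phi.items())
print(score)   # -3 → "No!"
```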
The Perceptron
● Think of it as a “machine” to calculate a weighted sum and take its sign:
sign(∑i=1..I wi⋅φi(x))
[Figure: the features φ“A” = 1, φ“site” = 1, φ“,” = 2, φ“located” = 1, φ“in” = 1, φ“Maizuru” = 1, φ“Kyoto” = 1, φ“priest” = 0, φ“black” = 0 feed into a single node with weights (0, -3, 0, 0, 0, 0, 0, 2, 0), which outputs sign(-3) = -1]
Problem: Linear Constraint
● Perceptrons cannot achieve high accuracy on non-linear functions
[Figure: four points in an X/O checkerboard pattern (X O / O X) that no single line can separate]
Neural Networks
● Neural networks connect multiple perceptrons together
[Figure: the same features as before feed into several perceptrons, whose outputs in turn feed a final perceptron that outputs -1]
● Motivation: Can express non-linear functions
Example:
● Build two classifiers:
φ(x1) = {-1, 1}    φ(x2) = {1, 1}
φ(x3) = {-1, -1}   φ(x4) = {1, -1}
[Figure: x2 and x3 are O, x1 and x4 are X; two perceptrons y1 and y2 each read φ1, φ2, and a constant bias input of 1, with weight vectors w1 = {1, 1, -1} and w2 = {-1, -1, -1}]
Example:
● These classifiers map the points to a new space
φ(x1) = {-1, 1}    φ(x2) = {1, 1}
φ(x3) = {-1, -1}   φ(x4) = {1, -1}
→ through y1 (weights {1, 1, -1}) and y2 (weights {-1, -1, -1}) →
y(x1) = {-1, -1}   y(x2) = {1, -1}
y(x3) = {-1, 1}    y(x4) = {-1, -1}
[Figure: in the new (y1, y2) space, the O points x2 and x3 land on separate corners, while the X points x1 and x4 collapse onto the same point {-1, -1}]
Example:
● In the new space, the examples are linearly classifiable!
y(x1) = {-1, -1}   y(x2) = {1, -1}
y(x3) = {-1, 1}    y(x4) = {-1, -1}
[Figure: a third perceptron y3 with weights {1, 1, 1} over (y1, y2, bias) separates the O points from the X points]
Example:
● Final neural network:
[Figure: φ1, φ2, and a constant bias input of 1 feed two hidden perceptrons with w1 = {1, 1, -1} and w2 = {-1, -1, -1}; their outputs, plus a bias input of 1, feed a final perceptron with w3 = {1, 1, 1}, producing the output y4]
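The finished network can be checked by hand in a few lines. A minimal sketch, assuming the weight vectors read off the figures above (the last weight in each vector multiplies a constant bias input of 1):

```python
def sign(v):
    return 1 if v >= 0 else -1

def perceptron(w, inputs):
    # The last weight multiplies a constant bias input of 1
    return sign(sum(wi * xi for wi, xi in zip(w, inputs + [1])))

w1, w2, w3 = [1, 1, -1], [-1, -1, -1], [1, 1, 1]

# x2 = {1, 1} and x3 = {-1, -1} come out +1 (O); the others come out -1 (X)
for phi in ([-1, 1], [1, 1], [-1, -1], [1, -1]):
    y1 = perceptron(w1, phi)
    y2 = perceptron(w2, phi)
    y4 = perceptron(w3, [y1, y2])
    print(phi, [y1, y2], y4)
```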
Representing a Neural Network
● Assume the network is fully connected and organized in layers
● Each perceptron has:
● A layer ID
● A weight vector
network = [ (1, w0),
            (1, w1),
            (1, w2),
            (2, w3) ]
[Figure: nodes 0, 1, 2 in Layer 1 each read the input features φ“A” = 1, φ“site” = 1, φ“,” = 2, …; node 3 in Layer 2 reads the outputs of Layer 1]
Neural Network Prediction Process
● Predict one perceptron at a time, using the previous layer's output as input
[Figure, animated over four steps: given the features φ“A” = 1, φ“site” = 1, φ“,” = 2, …, node 0 outputs -1, node 1 outputs 1, node 2 outputs 1, and finally node 3 outputs -1]
Review: Pseudocode for Perceptron Prediction

predict_one(w, phi):
    score = 0
    for each name, value in phi:    # score = w*φ(x)
        if name exists in w:
            score += value * w[name]
    if score >= 0:
        return 1
    else:
        return -1
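This pseudocode translates almost line-for-line into Python. A minimal sketch, assuming φ and w are name→value dictionaries:

```python
def predict_one(w, phi):
    """Weighted sum of sparse features, then the sign function."""
    score = 0
    for name, value in phi.items():   # score = w * φ(x)
        if name in w:
            score += value * w[name]
    return 1 if score >= 0 else -1

# Example using the unigram weights from the earlier slides
w = {"unigram site": -3, "unigram priest": 2}
phi = {"unigram A": 1, "unigram site": 1, "unigram ,": 2}
print(predict_one(w, phi))   # score = -3, so the answer is -1 ("No")
```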
Pseudo-Code for NN Prediction

predict_nn(network, phi):
    y = [ phi, {}, {}, … ]   # activations for each layer
    for each node i:
        layer, weight = network[i]
        # predict the answer with the previous layer's output
        answer = predict_one(weight, y[layer-1])
        # save this answer as a feature for the next layer
        y[layer][i] = answer
    return the answer for the last perceptron
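A runnable sketch of this procedure, tried on the XOR-style example network from earlier. Two details here are illustrative assumptions rather than part of the pseudocode: the constant bias input of 1 is represented as an always-on "bias" feature, and the number of layers is passed in explicitly:

```python
def predict_one(w, phi):
    score = sum(value * w.get(name, 0) for name, value in phi.items())
    return 1 if score >= 0 else -1

def predict_nn(network, phi, num_layers=2):
    # y[l]: activations of layer l as a name -> value dict; layer 0 is the input.
    # A "bias" feature fixed at 1 stands in for the constant bias input.
    y = [phi] + [{"bias": 1} for _ in range(num_layers)]
    answer = None
    for i, (layer, weight) in enumerate(network):
        # predict this node's answer from the previous layer
        answer = predict_one(weight, y[layer - 1])
        # save the answer as a feature for the next layer
        y[layer][i] = answer
    return answer   # the last node's output

# The XOR-style network from the earlier example, as (layer, weights) pairs
network = [
    (1, {"phi1": 1, "phi2": 1, "bias": -1}),
    (1, {"phi1": -1, "phi2": -1, "bias": -1}),
    (2, {0: 1, 1: 1, "bias": 1}),
]
print(predict_nn(network, {"phi1": 1, "phi2": 1, "bias": 1}))   # 1, an "O" point
```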
Neural Network Activation Functions
● The NN described so far uses the step function:
y = sign(w⋅φ(x))
[Plot: sign(w⋅φ(x)) jumps from -1 to +1 at 0]
● The step function is not differentiable → use tanh instead:
y = tanh(w⋅φ(x))
[Plot: tanh(w⋅φ(x)) rises smoothly from -1 to +1]
Python:
    from math import tanh
    tanh(x)
Learning a Perceptron w/ tanh
● First, calculate the error (y' = correct tag, y = system output):
δ = y' - y
● Update each weight with:
w ← w + λ⋅δ⋅φ(x)
● Where λ is the learning rate
● (For the step-function perceptron, δ = -2 or +2 and λ = 1/2)
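A single tanh-perceptron update can be sketched as follows; the learning rate λ = 0.1 and the function name `update_one` are arbitrary choices for illustration:

```python
from math import tanh

def update_one(w, phi, y_prime, lam=0.1):
    """One tanh-perceptron update on sparse features (lam = λ)."""
    y = tanh(sum(value * w.get(name, 0.0) for name, value in phi.items()))
    delta = y_prime - y                  # δ = y' - y
    for name, value in phi.items():      # w ← w + λ·δ·φ(x)
        w[name] = w.get(name, 0.0) + lam * delta * value
    return delta

w = {}
delta = update_one(w, {"unigram priest": 1}, 1)   # correct tag y' = +1
print(round(delta, 4), round(w["unigram priest"], 4))   # 1.0 0.1
```

With all-zero weights the output is tanh(0) = 0, so the error is exactly y' and each active feature's weight moves by λ·y'.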
Problem: Don't Know the Correct Answer!
● For NNs, we only know the correct tag for the last layer
[Figure: the output node has y' = 1 but y = -1; the hidden nodes 0, 1, 2 have outputs y, but their correct tags are y' = ?]
Answer: Back-Propagation
● Pass the error backwards along the network:
∑i δi wj,i
[Figure: node j sends its output through weights w = 0.1, w = 1, w = -0.3 to three later nodes whose errors are δ = -0.9, δ = 0.2, δ = 0.4]
● Also consider the gradient of tanh:
d tanh(φ(x)⋅w) = 1 - tanh(φ(x)⋅w)² = 1 - yj²
[Plot: the tanh curve, whose slope is largest near 0]
● Combine:
δj = (1 - yj²) ∑i δi wj,i
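The gradient identity used above, d/dv tanh(v) = 1 - tanh(v)², is easy to check numerically with a finite difference:

```python
from math import tanh

def numerical_derivative(f, v, eps=1e-6):
    # central finite difference
    return (f(v + eps) - f(v - eps)) / (2 * eps)

for v in (-2.0, -0.5, 0.0, 1.3):
    analytic = 1 - tanh(v) ** 2          # the identity used in back-propagation
    assert abs(analytic - numerical_derivative(tanh, v)) < 1e-6
    print(v, round(analytic, 6))
```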
Back Propagation Code

update_nn(network, phi, y'):
    create array δ
    calculate y using predict_nn
    for each node j in reverse order:
        if j is the last node:
            δj = y' - yj
        else:
            δj = (1 - yj²) ∑i δi wj,i
    for each node j:
        layer, w = network[j]
        for each name, val in y[layer-1]:
            w[name] += λ * δj * val
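A runnable sketch filling in this pseudocode. The tanh forward pass is inlined so the block is self-contained; `num_layers`, `lam`, and the bias-free treatment are illustrative assumptions:

```python
from math import tanh

def update_nn(network, phi, y_prime, num_layers=2, lam=0.1):
    """One back-propagation update; weight dicts in `network` change in place."""
    # Forward pass with tanh, saving every layer's activations
    y = [dict(phi)] + [{} for _ in range(num_layers)]
    out = []
    for j, (layer, w) in enumerate(network):
        v = tanh(sum(val * w.get(name, 0.0) for name, val in y[layer - 1].items()))
        y[layer][j] = v          # this node's output is a feature of the next layer
        out.append(v)
    # Backward pass: δ at the output node, then propagate with the tanh gradient
    delta = [0.0] * len(network)
    for j in reversed(range(len(network))):
        if j == len(network) - 1:
            delta[j] = y_prime - out[j]
        else:
            # sum the errors of the nodes i that read feature j, weighted by w_{j,i}
            err = sum(d_i * w_i.get(j, 0.0) for d_i, (_, w_i) in zip(delta, network))
            delta[j] = (1 - out[j] ** 2) * err
    # Weight update: w ← w + λ·δ_j·(inputs of node j)
    for j, (layer, w) in enumerate(network):
        for name, val in y[layer - 1].items():
            w[name] = w.get(name, 0.0) + lam * delta[j] * val
    return delta

# Tiny two-node chain for illustration: one hidden node feeding one output node
network = [(1, {"x": 1.0}), (2, {0: 1.0})]
update_nn(network, {"x": 1.0}, 1)
```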
Training Process
● For the single perceptron, we initialized the weights to zero
● In an NN: randomly initialize the weights (so that not all perceptrons are identical)

create network
randomize network weights
for I iterations:
    for each labeled pair x, y in the data:
        phi = create_features(x)
        update_nn(network, phi, y)
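The random-initialization step can be sketched as follows for a network with one hidden layer; the weight range [-0.1, 0.1], the fixed seed, and the helper name `init_network` are arbitrary choices for illustration:

```python
import random

random.seed(0)   # fixed seed so the example is reproducible

def init_network(feature_names, hidden_nodes=2):
    """One hidden layer; weights drawn uniformly from [-0.1, 0.1]."""
    network = []
    for _ in range(hidden_nodes):   # Layer 1: each node reads the input features
        network.append((1, {n: random.uniform(-0.1, 0.1) for n in feature_names}))
    # Layer 2: the output node reads the hidden outputs, named by node index
    network.append((2, {i: random.uniform(-0.1, 0.1) for i in range(hidden_nodes)}))
    return network

network = init_network(["unigram site", "unigram priest"])
# The hidden nodes start with different weights, so they can learn different things
print(network[0][1] != network[1][1])   # True
```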
Exercise
Exercise (1)
● Write two programs:
● train-nn: Creates a neural network model
● test-nn: Reads a neural network model
● Test train-nn:
● Input: test/03-train-input.txt
● Use one iteration, one hidden layer, two hidden nodes
● Calculate the updates by hand and make sure they are correct
Exercise (2)
● Train a model on data/titles-en-train.labeled
● Predict the labels of data/titles-en-test.word
● Grade your answers:
● script/grade-prediction.py data-en/titles-en-test.labeled your_answer
● Compare:
● With a single perceptron/SVM classifier
● With different neural network structures
Thank You!