CS626: NLP, Speech and the Web
Pushpak Bhattacharyya, CSE Dept., IIT Bombay
Lectures 30, 31, 32, 33: Recurrent NN, Language Modeling
13th October onwards, 2014
(Guiding paper: Application of Deep Belief Networks for Natural Language Understanding, IEEE Transactions on
Audio, Speech and Language Processing)
13 Oct, 2014 | Pushpak Bhattacharyya: recurrent NN
Harris’s distributional hypothesis
“We group A and B into a substitution set whenever A and B have the same (or partially same) environments X” (Harris, 1981, p. 17)
“The basic concept of word”
The basic concept: each word of the sentence “The basic concept of word is hard to express” is represented by a 0/1 co-occurrence vector over the other words, e.g.:

The      0 1 1 1 1 0 0 0 0
basic    1 0 1 1 1 0 0 0 0
concept  …
of       …
word     …
is       …
hard     …
to       …
express  0 0 0 0 0 0 0 0 0
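The table above can be sketched in code. This is a minimal illustration (not from the lecture) of the Harris-style idea: each word is described by which other words share its environment, here taking the whole sentence as the context window.

```python
# Build simple 0/1 co-occurrence vectors for each word in one sentence.
sentence = "the basic concept of word is hard to express".split()
vocab = sorted(set(sentence))

def cooccurrence_vector(word, tokens, vocab):
    """1 if `other` appears in the same context as `word`, else 0."""
    return [1 if other != word and other in tokens else 0 for other in vocab]

vectors = {w: cooccurrence_vector(w, sentence, vocab) for w in vocab}
# Words sharing environments get similar vectors and are grouped together,
# which is exactly Harris's substitution-set idea.
```

With a real corpus the context would be a fixed window and the entries counts rather than 0/1, but the grouping principle is the same.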
Backpropagation algorithm

• Fully connected feed-forward network
• Pure FF network (no jumping of connections over layers)

[Figure: input layer (n i/p neurons), hidden layers, output layer (m o/p neurons); w_ji is the weight on the connection from neuron i to neuron j.]
General Backpropagation Rule

• General weight updating rule:
  Δw_ji = η δ_j o_i
• Where
  δ_j = (t_j − o_j) o_j (1 − o_j) for the outermost layer
  δ_j = o_j (1 − o_j) Σ_{k ∈ next layer} δ_k w_kj for hidden layers
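The two delta rules above can be written directly as functions. This is a small sketch assuming sigmoid units (the usual choice with this rule); the sample values of η, o_i and t_j are made up for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def output_delta(t, o):
    # delta_j = (t_j - o_j) o_j (1 - o_j) at the output layer
    return (t - o) * o * (1 - o)

def hidden_delta(o_j, deltas_next, w_next_j):
    # delta_j = o_j (1 - o_j) * sum_k delta_k w_kj at a hidden layer;
    # w_next_j[k] is the weight from hidden neuron j to next-layer neuron k
    return o_j * (1 - o_j) * sum(d * w for d, w in zip(deltas_next, w_next_j))

eta = 0.5
o_i, o_j, t_j = 0.8, sigmoid(0.4), 1.0   # illustrative values
d_j = output_delta(t_j, o_j)
dw_ji = eta * d_j * o_i                   # the update applied to w_ji
```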
Recurrent NN
Hopfield net

• Inspired by associative memory, which means memory retrieval is not by address, but by part of the data.
• Consists of N neurons fully connected with symmetric weight strength wij = wji.
• No self connection, so the weight matrix is 0-diagonal and symmetric.
• Each computing element or neuron is a linear threshold element with threshold = 0.
Connection matrix of the network, 0-diagonal and symmetric

[Figure: k × k matrix with rows and columns labelled n1, n2, n3, …, nk; entry (i, j) holds wij; the diagonal is 0.]
Example

w12 = w21 = 5
w13 = w31 = 3
w23 = w32 = 2

At time t = 0: s1(t) = 1, s2(t) = -1, s3(t) = 1
Unstable state: neuron 1 will flip (its net input is 5·(-1) + 3·1 = -2, opposite in sign to s1).
A stable pattern is called an attractor for the net.
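One asynchronous update on this 3-neuron example can be sketched as follows (thresholds 0, sgn of a weighted sum, as defined on the earlier slide):

```python
# Weight matrix of the example: w12 = w21 = 5, w13 = w31 = 3, w23 = w32 = 2.
W = [[0, 5, 3],
     [5, 0, 2],
     [3, 2, 0]]

def sgn(x):
    return 1 if x >= 0 else -1

def update_neuron(s, i, W):
    """Recompute neuron i from the weighted sum of the other neurons."""
    net = sum(W[i][j] * s[j] for j in range(len(s)) if j != i)
    return sgn(net)

s = [1, -1, 1]                    # state at t = 0
new_s1 = update_neuron(s, 0, W)   # net = 5*(-1) + 3*1 = -2, so neuron 1 flips to -1
```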
Concept of Energy

Energy at state s is given by:

E(s) = -(w12 x1 x2 + w13 x1 x3 + … + w1n x1 xn
       + w23 x2 x3 + … + w2n x2 xn
       + …
       + w(n-1)n x(n-1) xn)

i.e. E(s) = -Σ_{i<j} wij xi xj
Relation between weight matrix W and state vector X

For example, in Fig. 1, at time t = 0 the state of the neural network is s(0) = <1, -1, 1>, and the corresponding weight matrix and transposed state vector are:

W = | 0  5  3 |        X^T = |  1 |
    | 5  0  2 |              | -1 |
    | 3  2  0 |              |  1 |

[Fig. 1: the 3-neuron network of the previous example.]
W·X^T gives the inputs to the neurons at the next time instant:

W X^T = | 0  5  3 | |  1 |   | -2 |
        | 5  0  2 | | -1 | = |  7 |
        | 3  2  0 | |  1 |   |  1 |

sgn(W X^T) = (-1, 1, 1)^T ≠ (1, -1, 1)^T

This shows that the n/w will change state.
Theorem

In the asynchronous mode of operation, the energy of the Hopfield net always decreases.

Proof:

E(t1) = -(w12 x1(t1) x2(t1) + w13 x1(t1) x3(t1) + … + w1n x1(t1) xn(t1)
        + w23 x2(t1) x3(t1) + … + w2n x2(t1) xn(t1)
        + …
        + w(n-1)n x(n-1)(t1) xn(t1))
Proof (contd.)

Let neuron 1 change state by summing and comparing. We get the following expression for the energy at t2:

E(t2) = -(w12 x1(t2) x2(t2) + w13 x1(t2) x3(t2) + … + w1n x1(t2) xn(t2)
        + w23 x2(t2) x3(t2) + … + w2n x2(t2) xn(t2)
        + …
        + w(n-1)n x(n-1)(t2) xn(t2))
Proof: note that only neuron 1 changes state

ΔE = E(t2) - E(t1)

Since only neuron 1 changes state, xj(t1) = xj(t2) for j = 2, 3, 4, …, n; every term not involving x1 is identical in E(t1) and E(t2) and cancels, and hence

ΔE = (x1(t1) - x1(t2)) Σ_{j=2}^{n} w1j xj(t1)
Proof (continued)

ΔE = (x1(t1) - x1(t2)) Σ_{j=2}^{n} w1j xj(t1)
          (D)                 (S)

Observations:
• When the state changes from -1 to 1, (S) has to be +ve and (D) is -ve; so ΔE becomes negative.
• When the state changes from 1 to -1, (S) has to be -ve and (D) is +ve; so ΔE becomes negative.

Therefore, energy always decreases with any state change.
The Hopfield net has to “converge” in the asynchronous mode of operation
As the energy E goes on decreasing, it has to hit the bottom, since the weight and the state vector have finite values.
That is, the Hopfield Net has to converge to an energy minimum.
Hence the Hopfield Net reaches stability.
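The theorem can be checked numerically on the earlier 3-neuron example: each asynchronous flip strictly lowers E(s) = -Σ_{i<j} wij xi xj.

```python
# Example weights from the earlier slide: w12 = 5, w13 = 3, w23 = 2.
W = [[0, 5, 3],
     [5, 0, 2],
     [3, 2, 0]]

def energy(s, W):
    n = len(s)
    return -sum(W[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))

def step(s, i, W):
    """One asynchronous update of neuron i; returns the new state."""
    net = sum(W[i][j] * s[j] for j in range(len(s)) if j != i)
    s = list(s)
    s[i] = 1 if net >= 0 else -1
    return s

s0 = [1, -1, 1]
s1 = step(s0, 0, W)   # neuron 1 flips to -1; the energy drops from 4 to 0
```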
Training of Hopfield Net

• Early training rule proposed by Hopfield
• Rule inspired by the concept of electron spin
• Hebb's rule of learning:
  If two neurons i and j have activations xi and xj respectively, then the weight wij between the two neurons is directly proportional to the product xi · xj, i.e.
  wij ∝ xi xj
Hopfield Rule

• Train the Hopfield net for a specific memory behavior
• Store memory elements
• How to store patterns?
Hopfield Rule

To store a pattern <xn, xn-1, …, x3, x2, x1>, make

wij = (1 / (n - 1)) xi xj

• Storing a pattern is equivalent to making that pattern the stable state of the net.
Training of Hopfield Net

Establish that <xn, xn-1, …, x3, x2, x1> is a stable state of the net.

To show the stability of <xn, xn-1, …, x3, x2, x1>, impress this pattern on the net at t = 0.
Training of Hopfield Net

Consider neuron i at t = 1:

ai(1) = sgn(neti(0))
neti(0) = Σ_{j=1, j≠i}^{n} wij xj(0)
Establishing stability

neti(0) = Σ_{j=1, j≠i}^{n} wij xj(0)
        = Σ_{j≠i} (1 / (n-1)) xi(0) xj(0) xj(0)
        = (1 / (n-1)) xi(0) Σ_{j≠i} (xj(0))^2
        = (1 / (n-1)) xi(0) (n-1)
        = xi(0)

Thus xi(1) = sgn(neti(0)) = xi(0): the impressed pattern is stable.
Example

We want <1, -1, 1> as stored memory. Calculate all the wij values:

wAB = 1/(3-1) · 1 · (-1) = -1/2
Similarly wBC = -1/2 and wCA = 1/2

Is <1, -1, 1> stable?

[Fig.: triangle of neurons A, B, C, initially with activations 1, -1, 1; after calculating weight values, edge weights wAB = -0.5, wBC = -0.5, wCA = 0.5.]
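The example can be verified in a few lines: build W by the storage rule wij = xi xj / (n - 1), then check that sgn(W x) reproduces x componentwise.

```python
# Store the example pattern <1, -1, 1> with the Hopfield rule.
x = [1, -1, 1]
n = len(x)
W = [[0 if i == j else x[i] * x[j] / (n - 1) for j in range(n)] for i in range(n)]
# W[0][1] = wAB = -0.5, W[1][2] = wBC = -0.5, W[2][0] = wCA = 0.5

def is_stable(x, W):
    """True iff no neuron would flip, i.e. sgn(net_i) == x_i for all i."""
    for i in range(len(x)):
        net = sum(W[i][j] * x[j] for j in range(len(x)) if j != i)
        if (1 if net >= 0 else -1) != x[i]:
            return False
    return True
```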
Observations
How much deviation can the net tolerate?
What if more than one pattern is to be stored?
Storing k patterns

Let the patterns be:

P1 : <xn, xn-1, …, x3, x2, x1>_1
P2 : <xn, xn-1, …, x3, x2, x1>_2
.
.
.
Pk : <xn, xn-1, …, x3, x2, x1>_k

Generalized Hopfield Rule is:

wij = (1 / (n - 1)) Σ_{p=1}^{k} xi|p xj|p     (xi|p: the ith bit of the pth pattern)
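The generalized rule sums the single-pattern rule over all k patterns. A minimal sketch (the two 5-bit patterns below are made up for illustration):

```python
def hopfield_weights(patterns):
    """w_ij = (1/(n-1)) * sum_p x_i|p * x_j|p, with zero diagonal."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j] / (n - 1)
    return W

patterns = [[1, -1, 1, -1, 1], [1, 1, -1, -1, 1]]
W = hopfield_weights(patterns)   # symmetric and 0-diagonal by construction
```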
Storing k patterns

• Study the stability of <xn, xn-1, …, x3, x2, x1>
• Impress the vector at t = 0 and observe the network dynamics
• Looking at neuron i at t = 1, we have
Examining stability of the qth pattern

xi(1)|q = sgn(neti(0)|q)

neti(0)|q = Σ_{j=1, j≠i}^{n} wij xj(0)|q
          = Σ_{j≠i} [ (1/(n-1)) Σ_{p=1}^{k} xi|p xj|p ] xj(0)|q
          = (1/(n-1)) Σ_{j≠i} [ xi|q xj|q xj(0)|q + Σ_{p=1, p≠q}^{k} xi|p xj|p xj(0)|q ]
          = xi(0)|q + (1/(n-1)) Σ_{j≠i} Σ_{p≠q} xi|p xj|p xj(0)|q
Examining stability of the qth pattern (contd.)

Thus

xi(1) = sgn[ xi(0)|q + Q ],  where
Q = (1/(n-1)) Σ_{j≠i} Σ_{p≠q} xi|p xj|p xj(0)|q

Q, the contribution from the other stored patterns, is small when k << n.
Storing k patterns

• Condition for patterns to be stable on a Hopfield net with n neurons: k << n
• The storage capacity of the Hopfield net is very small
• Hence it is not a practical memory element
Boltzmann M/C
Boltzmann Machine

• Hopfield net with probabilistic neurons
• Energy expression = -Σi Σ_{j>i} wij xi xj, where xi = activation of ith neuron
• Used for optimization
• Central concern is to ensure global minimum
• Based on simulated annealing
Comparative Remarks

Feed forward n/w with BP | Hopfield net | Boltzmann m/c
Mapping device (i/p pattern -> o/p pattern), i.e. classification | Associative memory + optimization device | Constraint satisfaction (mapping + optimization device)
Minimizes total sum square error | Energy | Entropy (Kullback–Leibler divergence)
Comparative Remarks (contd.)

Feed forward n/w with BP | Hopfield net | Boltzmann m/c
Deterministic neurons | Deterministic neurons | Probabilistic neurons
Learning to associate i/p with o/p, i.e. equivalent to a function | Pattern | Probability distribution
Comparative Remarks (contd.)

Feed forward n/w with BP | Hopfield net | Boltzmann m/c
Can get stuck in local minimum (greedy approach) | Local minimum possible | Can come out of local minimum
Credit/blame assignment (consistent with Hebbian rule) | Activation product (consistent with Hebbian rule) | Probability and activation product (consistent with Hebbian rule)
Theory of Boltzmann m/c

For the m/c, computation means the following: at any time instant, make the state of the kth neuron (sk) equal to 1 with probability

1 / (1 + exp(-ΔEk / T))

ΔEk = change in energy of the m/c when the kth neuron changes state
T = temperature, a parameter of the m/c
Theory of Boltzmann m/c (contd.)

[Figure: P(sk = 1) = 1 / (1 + exp(-ΔEk / T)) plotted against ΔEk; the sigmoid passes through 0.5 at ΔEk = 0 and flattens as T increases.]
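The flattening effect of temperature can be seen numerically. A small sketch of the stochastic unit's acceptance probability (the gap ΔE = 2 is a made-up value for illustration):

```python
import math

def p_on(dE, T):
    """P(s_k = 1) = 1 / (1 + exp(-dE / T))."""
    return 1.0 / (1.0 + math.exp(-dE / T))

dE = 2.0
low_T = p_on(dE, 0.5)    # close to 1: nearly deterministic behaviour
high_T = p_on(dE, 50.0)  # close to 0.5: nearly random behaviour
```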
Theory of Boltzmann m/c (contd.)

ΔEk = Ek_final - Ek_initial = (sk_initial - sk_final) · Σ_{j≠k} wkj sj

We observe:
1. The higher the temperature, the lower is P(Sk=1)
2. At T = infinity, P(Sk=1) = P(Sk=0) = 0.5: equal chance of being in state 0 or 1, i.e. completely random behavior
3. If T -> 0, then P(Sk=1) -> 1
4. The derivative is proportional to P(Sk=1) · (1 - P(Sk=1))
Consequence of the form of P(Sk=1)

P(Sα) proportional to exp[-E(Sα) / T]

This probability distribution is called the Boltzmann Distribution. P(Sα) is the probability of the state Sα, where a state is an N-bit vector such as <1, -1, 1, -1, …>.

Local "sigmoid" probabilistic behavior leads to global Boltzmann Distribution behaviour of the n/w.
[Figure: P(Sα) ∝ exp[-E(Sα) / T], probability P plotted against energy E for a given temperature T.]
Ratio of state probabilities

Normalizing,
P(Sα) = exp(-E(Sα) / T) / Σ_{β ∈ all states} exp(-E(Sβ) / T)

P(Sα) / P(Sβ) = exp(-(E(Sα) - E(Sβ)) / T)
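Both identities can be checked on a toy state space. A sketch with made-up energies 0, 1, 2 at T = 1:

```python
import math

def boltzmann(energies, T):
    """Normalized Boltzmann distribution over a finite set of states."""
    weights = [math.exp(-E / T) for E in energies]
    Z = sum(weights)               # the normalizing partition sum
    return [w / Z for w in weights]

energies = [0.0, 1.0, 2.0]
P = boltzmann(energies, T=1.0)
ratio = P[0] / P[1]                # equals exp(-(E0 - E1)/T) = e
```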
Learning a probability distribution

Digression: estimation of a probability distribution Q by another distribution P.

D = deviation = Σ_{sample space} Q ln(Q / P)
D >= 0, which is a required property (just like sum square error >= 0)
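The deviation D (the Kullback–Leibler divergence) is easy to compute directly; the two small distributions below are made up for illustration:

```python
import math

def kl(Q, P):
    """D = sum over the sample space of Q * ln(Q / P); D >= 0, and D = 0 iff Q == P."""
    return sum(q * math.log(q / p) for q, p in zip(Q, P) if q > 0)

Q = [0.5, 0.3, 0.2]
P = [0.4, 0.4, 0.2]
d = kl(Q, P)        # strictly positive since Q != P
same = kl(Q, Q)     # exactly 0
```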
Recurrent n/w and optimization
Problem representation
What is common between:
• Sentence Generation
• Sorting
• Travelling Salesman Problem
Sentence Generation

Given a set of words, place them at appropriate positions in the sentence.

Rows: words (wi), columns: positions (pj), both running 1, 2, 3, …, M.
xij = 1 iff the ith word is in the jth position.
Sorting

Given some numbers, place them at appropriate positions in the ordered list.

Rows: numbers (ni), columns: positions (pj), both running 1, 2, 3, …, M.
xij = 1 iff the ith number is in the jth position.
TSP

Given the cities a traveller must visit, place the cities in the "tour" so that the total distance travelled is minimized.

Rows: cities (ci), columns: positions (pj), both running 1, 2, 3, …, M.
xij = 1 iff the ith city is in the jth position.
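The encoding shared by all three problems is a permutation matrix: an n × n 0/1 matrix whose rows and columns each contain exactly one 1. A small sketch (the function names are my own, not from the lecture):

```python
def encode(order):
    """order[j] = index of the item placed at position j; returns x with x[i][j] = 1
    iff item i occupies position j."""
    n = len(order)
    x = [[0] * n for _ in range(n)]
    for j, i in enumerate(order):
        x[i][j] = 1
    return x

def is_permutation_matrix(x):
    # exactly one 1 per row (each item placed once) and per column (each position filled once)
    return all(sum(row) == 1 for row in x) and \
           all(sum(col) == 1 for col in zip(*x))

x = encode([2, 0, 1])   # item 2 first, then item 0, then item 1
```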
Hopfield Net for Optimization

• Optimization problem: maximizes or minimizes a quantity
• Hopfield net used for optimization:
  Hopfield net and Travelling Salesman Problem
  Hopfield net and Job Scheduling Problem
The essential idea of the correspondence
In optimization problems, we have to minimize a quantity.
The Hopfield net minimizes the energy. THIS IS THE CORRESPONDENCE.
Hopfield net and Travelling Salesman problem

We consider the problem for n = 4 cities. In the given figure, nodes represent cities and edges represent the paths between the cities, with their associated distances.

[Figure: complete graph on cities A, B, C, D with edge distances dAB, dBC, dCD, dDA, dAC, dBD.]
Traveling Salesman Problem
• Goal: come back to city A, visiting each city j = 2 to n (n is the number of cities) exactly once, and minimize the total distance.
• To solve this by Hopfield net we need to decide the architecture: How many neurons? What are the weights?
Constraints decide the parameters
1. For n cities and n positions, establish city to position correspondence, i.e.
Number of neurons = n cities * n positions
2. Each position can take one and only one city
3. Each city can be in exactly one position
4. Total distance should be minimum
Architecture

• n × n matrix where rows denote cities and columns denote positions
• cell(i, j) = 1 if and only if the ith city is in the jth position
• Each cell is a neuron: n^2 neurons, O(n^4) connections

[Figure: the grid, with rows labelled city(i) and columns labelled pos(α).]
Expressions corresponding to constraints

1. Each city in one and only one position, i.e. a row has a single 1:

E1 = (A/2) Σ_{i=1}^{n} Σ_{α=1}^{n} Σ_{β=1, β≠α}^{n} xiα xiβ

• The above equation partially ensures each row has a single 1
• xiα is the 0/1 output of cell (i, α)
Expressions corresponding to constraints (contd.)

2. Each position has a single city, i.e. each column has at most a single 1:

E2 = (B/2) Σ_{α=1}^{n} Σ_{i=1}^{n} Σ_{j=1, j≠i}^{n} xiα xjα
Expressions corresponding to constraints (contd.)

3. All cities MUST be visited once and only once:

E3 = (C/2) [ (Σ_{i=1}^{n} Σ_{α=1}^{n} xiα) - n ]^2
Expressions corresponding to constraints (contd.)

• E1, E2, E3 together ensure that each row has exactly one 1 and each column has exactly one 1.
• Minimizing E1 + E2 + E3 thus enforces a Hamiltonian circuit on the city graph (finding one is itself an NP-complete problem).
Constraint of distance

4. The distance traversed should be minimum:

E4 = (1/2) Σ_{i=1}^{n} Σ_{j=1}^{n} Σ_{α=1}^{n} dij xiα (xj,α+1 + xj,α-1)

dij = distance between city i and city j
Expressions corresponding to constraints (contd.)
We equate constraint energy:EProblem = Enetwork (*)
Where, Eproblem= E1+E2+E3+E4
and Enetwork is the well known energy expression for the Hopfield net
Find the weights from (*).
Finding weights for Hopfield Net applied to TSP
Alternate and more convenient E_{problem}:
E_P = E_1 + E_2
where E_1 is the equation for n cities (each city in one position and each position holding one city) and E_2 is the equation for distance.
Expressions for E1 and E2
E_1 = \frac{A}{2}\left[\sum_{i=1}^{n}\left(\sum_{\alpha=1}^{n} x_{i\alpha} - 1\right)^{2} + \sum_{\alpha=1}^{n}\left(\sum_{i=1}^{n} x_{i\alpha} - 1\right)^{2}\right]

E_2 = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{\alpha=1}^{n} d_{ij}\, x_{i\alpha}\,(x_{j,\alpha+1} + x_{j,\alpha-1})
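The alternate formulation E_P = E_1 + E_2 can be checked numerically. Below is a minimal sketch (the function names e1/e2 and the sample distance matrix d are illustrative assumptions), evaluating both terms for the encoding x[i][a] = 1 iff city i is at position a, with cyclic positions:

```python
# Minimal sketch (illustrative names): evaluate the alternate E_problem = E1 + E2
# for a candidate assignment x, where x[i][a] = 1 iff city i is at position a.

def e1(x, A=1.0):
    # Penalty term: each city in exactly one position, each position holding one city.
    n = len(x)
    rows = sum((sum(x[i][a] for a in range(n)) - 1) ** 2 for i in range(n))
    cols = sum((sum(x[i][a] for i in range(n)) - 1) ** 2 for a in range(n))
    return (A / 2.0) * (rows + cols)

def e2(x, d):
    # Distance term: d[i][j] counts when city j sits at a position adjacent
    # (cyclically, +/- 1) to city i's position.
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(n):
            for a in range(n):
                total += d[i][j] * x[i][a] * (x[j][(a + 1) % n] + x[j][(a - 1) % n])
    return total / 2.0

# Tour 0 -> 1 -> 2 (city i at position i), with a symmetric toy distance matrix:
x = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
d = [[0, 2, 9], [2, 0, 4], [9, 4, 0]]
print(e1(x), e2(x, d))  # 0.0 15.0: valid tour, round-trip length 2 + 4 + 9
```

For a valid tour e1 vanishes and e2 equals the round-trip length; an invalid assignment (say, two cities in one position) makes e1 positive, which is what the network's energy descent penalizes.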
Explanatory example (3 cities)

         pos 1  pos 2  pos 3
city 1:   x11    x12    x13
city 2:   x21    x22    x23
city 3:   x31    x32    x33
[Fig. 1: 3-city graph showing the two possible directions in which the tour can take place]

For the matrix above, x_iα = 1 if and only if the ith city is in position α.
Kinds of weights
Row weights:
w11,12  w11,13  w12,13
w21,22  w21,23  w22,23
w31,32  w31,33  w32,33

Column weights:
w11,21  w11,31  w21,31
w12,22  w12,32  w22,32
w13,23  w13,33  w23,33
Cross weights
w11,22  w11,23  w11,32  w11,33
w12,21  w12,23  w12,31  w12,33
w13,21  w13,22  w13,31  w13,32
w21,32  w21,33  w22,31  w22,33
w23,31  w23,32
Expressions
E_{problem} = E_1 + E_2

E_1 = \frac{A}{2}\big[(x_{11}+x_{12}+x_{13}-1)^2 + (x_{21}+x_{22}+x_{23}-1)^2 + (x_{31}+x_{32}+x_{33}-1)^2
+ (x_{11}+x_{21}+x_{31}-1)^2 + (x_{12}+x_{22}+x_{32}-1)^2 + (x_{13}+x_{23}+x_{33}-1)^2\big]
Expressions (contd.)
E_2 = \frac{1}{2}\big[d_{12}\,x_{11}(x_{22}+x_{23}) + d_{12}\,x_{12}(x_{23}+x_{21}) + d_{12}\,x_{13}(x_{21}+x_{22})
+ d_{13}\,x_{11}(x_{32}+x_{33}) + d_{13}\,x_{12}(x_{33}+x_{31}) + d_{13}\,x_{13}(x_{31}+x_{32}) + \ldots\big]
Enetwork
E_{network} = -\big[w_{11,12}\,x_{11}x_{12} + w_{11,13}\,x_{11}x_{13} + w_{12,13}\,x_{12}x_{13}
+ w_{11,21}\,x_{11}x_{21} + w_{11,22}\,x_{11}x_{22} + w_{11,23}\,x_{11}x_{23}
+ w_{11,31}\,x_{11}x_{31} + w_{11,32}\,x_{11}x_{32} + w_{11,33}\,x_{11}x_{33} + \ldots\big]
Find row weight
To find w11,12 = -(coefficient of x11 x12 in E_problem):
Search for x11 x12 in E_problem.
w11,12 = -A   ...from E1; E2 cannot contribute.
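This coefficient matching can be checked mechanically: for any quadratic energy, the coefficient of a cross term x_p x_q equals E(both on) - E(p on) - E(q on) + E(all off). A small sketch, with illustrative values of A and d:

```python
# Sketch: recover a weight as minus the coefficient of its cross term in
# E_problem, using the identity coeff(x_p x_q) = E(p,q) - E(p) - E(q) + E(0),
# valid for any quadratic form. A and d are illustrative.

A = 5.0
d = [[0, 2, 9], [2, 0, 4], [9, 4, 0]]
n = 3

def energy(on):
    # on: set of (city, position) pairs whose units are 1; everything else 0.
    x = [[1 if (i, a) in on else 0 for a in range(n)] for i in range(n)]
    e1 = (A / 2.0) * (sum((sum(row) - 1) ** 2 for row in x)
                      + sum((sum(x[i][a] for i in range(n)) - 1) ** 2 for a in range(n)))
    e2 = 0.5 * sum(d[i][j] * x[i][a] * (x[j][(a + 1) % n] + x[j][(a - 1) % n])
                   for i in range(n) for j in range(n) for a in range(n))
    return e1 + e2

def weight(p, q):
    return -(energy({p, q}) - energy({p}) - energy({q}) + energy(set()))

print(weight((0, 0), (0, 1)))  # row weight   w11,12 = -A           -> -5.0
print(weight((0, 0), (1, 1)))  # cross weight w11,22 = -(d12+d21)/2 -> -2.0
```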
Find column weight
To find w11,21 = -(coefficient of x11 x21 in E_problem):
Search for the coefficient of x11 x21 in E_problem.
w11,21 = -A   ...from E1; E2 cannot contribute.
Find Cross weights
To find w11,22 = -(coefficient of x11 x22 in E_problem):
Search for x11 x22 in E_problem; E1 cannot contribute.
Coefficient of x11 x22 in E2: (d12 + d21)/2
Therefore, w11,22 = -((d12 + d21)/2)
Find Cross weights
To find w11,33 = -(coefficient of x11 x33 in E_problem):
Search for x11 x33 in E_problem.
w11,33 = -((d13 + d31)/2)
Summary
Row weights = -A
Column weights = -A
Cross weights = -((dij + dji)/2), for units in adjacent positions (α ± 1, cyclically)
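The summary rules translate directly into a weight matrix. A sketch for 3 cities (A and d are illustrative; the zero weight assigned to non-adjacent cross pairs is an assumption implied by E2 containing only adjacent-position terms):

```python
# Sketch: assemble the Hopfield weight matrix from the summary rules.
# Neurons are indexed p = n*city + position; A and d are illustrative.

A = 5.0
d = [[0, 2, 9], [2, 0, 4], [9, 4, 0]]
n = 3

def w(i, a, j, b):
    if (i, a) == (j, b):
        return 0.0                         # no self-connection
    if i == j or a == b:
        return -A                          # row weight / column weight
    if (b - a) % n in (1, n - 1):          # adjacent positions, cyclically
        return -(d[i][j] + d[j][i]) / 2.0  # cross weight
    return 0.0                             # non-adjacent cross pairs (assumed 0)

W = [[w(p // n, p % n, q // n, q % n) for q in range(n * n)] for p in range(n * n)]

assert all(W[p][q] == W[q][p] for p in range(n * n) for q in range(n * n))  # symmetric
print(W[0][1], W[0][4])  # w11,12 = -5.0 (row), w11,22 = -2.0 (cross)
```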
Restricted Boltzmann Machines (RBM)
Lecture 39, 6th Nov, 2014
Restricted Boltzmann Machine (Binary)
[Figure: bipartite network with visible units v_1, v_2, v_3, ..., v_m fully connected to hidden units h_1, h_2, h_3, ..., h_n]
Weights (bidirectional and symmetric):

W = [w_ij], an m x n matrix:
w_11  w_12  ...  w_1n
w_21  w_22  ...  w_2n
 .     .          .
w_m1  w_m2  ...  w_mn

B = (b_1, ..., b_m): biases of the visible units
C = (c_1, ..., c_n): biases of the hidden units
V = (v_1, ..., v_m): visible unit activations
H = (h_1, ..., h_n): hidden unit activations
Energy at a state
Energy E(H, V) at the state <H, V>:

E(H,V) = -V^T W H - B^T V - C^T H = -\sum_{i=1}^{m}\sum_{j=1}^{n} w_{ij} v_i h_j - \sum_{i=1}^{m} b_i v_i - \sum_{j=1}^{n} c_j h_j
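The energy expression can be computed directly; a minimal sketch with small illustrative weights and biases:

```python
# Sketch: RBM energy E(H,V) = -V^T W H - B^T V - C^T H for binary vectors.
# W is m x n (visible x hidden); all numeric values below are illustrative.

def energy(V, H, W, B, C):
    m, n = len(V), len(H)
    vwh = sum(W[i][j] * V[i] * H[j] for i in range(m) for j in range(n))
    bv = sum(B[i] * V[i] for i in range(m))
    ch = sum(C[j] * H[j] for j in range(n))
    return -(vwh + bv + ch)

W = [[1, -1], [2, 0]]        # 2 visible x 2 hidden
B, C = [1, -1], [1, 2]
print(energy([1, 1], [1, 0], W, B, C))  # -((1 + 2) + 0 + 1) = -4
```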
Problem: Name Identification
Let a sentence be denoted by S
Input: Sentence POS tagged with Noun (= 1) or not Noun (= 0)
TajMahal is the most visited site in India
1        0  0   0    0       0    0  1        ('1' for NE, 0 otherwise)

S = w_0 w_1 w_2 ... w_{n-1} w_n
How do the neurons become 0 and 1?
P(h_j = 1 | V) = \frac{1}{1 + e^{-net_j}}, where net_j = \sum_{i=1}^{m} w_{ji} v_i + c_j

Similarly for the visible neurons: P(v_i = 1 | H) = \frac{1}{1 + e^{-net_i}} with net_i = \sum_{j=1}^{n} w_{ij} h_j + b_i
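The sigmoid rule gives conditional probabilities, from which binary states are sampled; a sketch of one block-Gibbs step (the weight layout W[i][j], visible i to hidden j, is an assumption):

```python
import math
import random

# Sketch: conditional activation probabilities and one Gibbs sampling step.
# W[i][j] connects visible unit i to hidden unit j (layout assumed).

def p_hidden(j, V, W, C):
    net = sum(W[i][j] * V[i] for i in range(len(V))) + C[j]
    return 1.0 / (1.0 + math.exp(-net))

def p_visible(i, H, W, B):
    net = sum(W[i][j] * H[j] for j in range(len(H))) + B[i]
    return 1.0 / (1.0 + math.exp(-net))

def gibbs_step(V, W, B, C, rng=random):
    # Sample all hidden units given V, then resample all visible units given H.
    H = [1 if rng.random() < p_hidden(j, V, W, C) else 0 for j in range(len(W[0]))]
    V2 = [1 if rng.random() < p_visible(i, H, W, B) else 0 for i in range(len(W))]
    return V2, H

W = [[1.0, -1.0], [2.0, 0.0]]
B, C = [0.0, 0.0], [0.0, 0.0]
print(p_hidden(0, [1, 1], W, C))  # sigmoid(1 + 2) = sigmoid(3), about 0.953
```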
Input: Delhi is in India
Output: 1 0 0 1

P(H,V) = \frac{e^{-E(H,V)}}{Z}, \quad Z = \sum_{H'}\sum_{V'} e^{-E(H',V')}

where E = energy and Z = partition function.
The probability of a state of the network is given by its energy.
The probability of the state of a single neuron is given by the sigmoid.
The weights and biases should be adjusted so that the desired <H, V> combination is stabilized.
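For a network small enough to enumerate, the partition function Z and P(H, V) can be computed by brute force; a sketch with illustrative parameters:

```python
import math
from itertools import product

# Sketch: brute-force P(H,V) = exp(-E(H,V)) / Z for a tiny binary RBM,
# where Z sums exp(-E) over every joint state <H', V'>. Values illustrative.

def energy(V, H, W, B, C):
    return -(sum(W[i][j] * V[i] * H[j] for i in range(len(V)) for j in range(len(H)))
             + sum(b * v for b, v in zip(B, V))
             + sum(c * h for c, h in zip(C, H)))

def probability(V, H, W, B, C):
    m, n = len(B), len(C)
    Z = sum(math.exp(-energy(Vp, Hp, W, B, C))
            for Vp in product([0, 1], repeat=m)
            for Hp in product([0, 1], repeat=n))
    return math.exp(-energy(V, H, W, B, C)) / Z

W = [[1.0], [-0.5]]            # 2 visible x 1 hidden
B, C = [0.0, 0.0], [0.0]
probs = [probability(V, H, W, B, C)
         for V in product([0, 1], repeat=2)
         for H in product([0, 1], repeat=1)]
print(round(sum(probs), 6))  # a proper distribution: sums to 1.0
```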