Hopfield networks and Boltzmann machines
Geoffrey Hinton et al.
Presented by Tambet Matiisen
18.11.2014
Hopfield network
Binary units, symmetrical connections
http://www.nnwj.de/hopfield-net.html
Energy function
• The global energy:
  E = − Σ_i s_i b_i − Σ_{i<j} s_i s_j w_ij
• The energy gap:
  ∆E_i = E(s_i = 0) − E(s_i = 1) = b_i + Σ_j s_j w_ij
• Update rule:
  s_i = 1 if b_i + Σ_j s_j w_ij ≥ 0, otherwise s_i = 0
http://en.wikipedia.org/wiki/Hopfield_network
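The update rule above can be sketched in Python (a minimal illustration of the threshold rule, not from the slides; the function names are mine):

```python
import numpy as np

def update_unit(s, W, b, i):
    """Binary threshold update for unit i: turn it on exactly when its
    total input b_i + sum_j s_j w_ij is non-negative. With symmetric
    weights and a zero diagonal, this step never increases the energy."""
    s[i] = 1 if b[i] + W[i] @ s >= 0 else 0

def settle(s, W, b, rng=np.random.default_rng(0)):
    """Update units sequentially in random order until no state changes."""
    while True:
        old = s.copy()
        for i in rng.permutation(len(s)):
            update_unit(s, W, b, i)
        if np.array_equal(s, old):
            return s
```

Because each update can only lower the energy or leave it unchanged, `settle` always terminates in a local energy minimum.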
Example
[Figure: a small Hopfield net with connection weights 3, 2, 3, 3, −1, −4 and −1. Starting from the shown binary states, −E = goodness = 3; updating the units marked "?" settles the net into a local minimum with −E = goodness = 4.]
Deeper energy minimum
[Figure: the same net with a different assignment of states, giving −E = goodness = 5, a deeper energy minimum.]
Is updating of a Hopfield network deterministic or non-deterministic?
A. Deterministic
B. Non-deterministic
How to update?
• Nodes must be updated sequentially, usually in randomized order.
• With parallel updating the energy could go up.
• If updates occur in parallel but with random timing, the oscillations are usually destroyed.
[Figure: two units with biases of +5 connected by a weight of −100; with simultaneous updates they oscillate between both-off and both-on.]
Content-addressable memory
• Using energy minima to represent memories gives a content-addressable memory.
  – An item can be accessed by just knowing part of its content.
  – Can fill out missing or corrupted pieces of information.
  – It is robust against hardware damage.
Classical conditioning
http://changecom.wordpress.com/2013/01/03/classical-conditioning/
Storing memories
• Energy landscape is determined by weights!
• If we use activities of −1 and 1:
  ∆w_ij = s_i s_j
• If we use states of 0 and 1:
  ∆w_ij = 4 (s_i − ½)(s_j − ½)
• In both cases the weight is incremented when the two units agree and decremented when they disagree:
  – if s_i = s_j then ∆w_ij = +1
  – if s_i ≠ s_j then ∆w_ij = −1
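For ±1 activities the storage rule above is one-shot Hebbian learning. A small sketch (my own illustration, not from the slides), together with a threshold-update `recall` that demonstrates the content-addressable memory from the earlier slide:

```python
import numpy as np

def store(patterns):
    """One-shot Hebbian storage of +-1 patterns: w_ij += s_i * s_j for
    every stored pattern, keeping the diagonal at zero."""
    W = patterns.T @ patterns
    np.fill_diagonal(W, 0)
    return W

def recall(W, s, steps=10):
    """Content-addressable recall: settle a (possibly corrupted) +-1
    pattern with sequential threshold updates."""
    s = s.copy()
    for _ in range(steps):
        for i in range(len(s)):
            s[i] = 1 if W[i] @ s >= 0 else -1
    return s
```

Storing a single pattern and flipping one of its bits, `recall` restores the original pattern, i.e. the corrupted probe falls back into the energy minimum created for it.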
Demo
• http://www.tarkvaralabor.ee/doodler/
• (choose Algorithm: Hopfield and Initialize)
How many weights did the example have?
A. 100
B. 1000
C. 10000
Storage capacity
• The capacity of a totally connected net with N units is only about 0.15 N memories.
  – With N bits per memory this is only 0.15 N² bits.
• The net has N² weights and biases.
• After storing M memories, each connection weight has an integer value in the range [−M, M].
• So the number of bits required to store the weights and biases is N² log(2M + 1).
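A quick worked version of these counts (the numbers are my illustration, assuming a net of N = 100 units filled to the ~0.15 N capacity limit):

```python
import math

N = 100                 # units, e.g. a 10x10 binary image
M = round(0.15 * N)     # about 15 memories, near the capacity limit

memory_bits = M * N                          # information in the stored memories
weight_bits = N**2 * math.log2(2 * M + 1)    # bits needed for all N^2 weights

print(memory_bits)          # 1500
print(round(weight_bits))   # roughly 49500: far more than the memories hold
```

So storing the weights takes dozens of times more bits than the information content of the memories they encode.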
How many bits are needed to represent weights in the example?
A. 1500
B. 50 000
C. 320 000
Spurious minima
• Each time we memorize a configuration, we hope to create a new energy minimum.
• But what if two minima merge to create a minimum at an intermediate location?
Reverse learning
• Let the net settle from a random initial state and then do unlearning.
• This will get rid of deep, spurious minima and increase memory capacity.
Increasing memory capacity
• Instead of trying to store vectors in one shot, cycle through the training set many times.
• Use the perceptron convergence procedure to train each unit to have the correct state given the states of all the other units in that vector:
  x̂_i = f(Σ_j x_j w_ij),  ∆w_ij = (x_i − x̂_i) x_j
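A sketch of this procedure (my illustration; the slide's formula is garbled, so I assume a standard perceptron-style rule and names of my own):

```python
import numpy as np

def train_perceptron_style(patterns, epochs=20):
    """Cycle through the +-1 training vectors many times. For each
    unit i, threshold its input from all other units and nudge the
    incoming weights by (x_i - prediction) * x_j when the prediction
    is wrong."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for _ in range(epochs):
        for x in patterns:
            for i in range(n):
                pred = 1 if W[i] @ x >= 0 else -1
                W[i] += (x[i] - pred) * x
                W[i, i] = 0
    return (W + W.T) / 2   # symmetrize to keep a valid Hopfield net
```

Unlike the one-shot rule, this error-driven version keeps adjusting weights until every unit agrees with its state in each training vector.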
Hopfield nets with hidden units
• Instead of using the net to store memories, use it to construct interpretations of sensory input.
  – The input is represented by the visible units.
  – The interpretation is represented by the states of the hidden units.
  – The badness of the interpretation is represented by the energy.
[Figure: a layer of hidden units above a layer of visible units.]
3D edges from 2D images
• You can only see one of these 3-D edges at a time because they occlude one another.
[Figure: a 2-D picture, the 2-D lines it contains, and the family of 3-D lines that each 2-D line could represent.]
Noisy networks
• A Hopfield net tries to reduce the energy at each step.
  – This makes it impossible to escape from local minima.
• We can use random noise to escape from poor minima.
  – Start with a lot of noise so it's easy to cross energy barriers.
  – Slowly reduce the noise so that the system ends up in a deep minimum. This is "simulated annealing".
[Figure: an energy landscape with a shallow minimum A, a barrier B and a deeper minimum C.]
Temperature
High temperature transition probabilities:
  p(A→B) = 0.2,  p(B→A) = 0.1
Low temperature transition probabilities:
  p(A→B) = 0.001,  p(B→A) = 0.000001
[Figure: the same two-minimum energy landscape at high and at low temperature.]
Stochastic binary units
• Replace the binary threshold units by binary stochastic units that make biased random decisions.
• The "temperature" controls the amount of noise.
• Raising the noise level is equivalent to decreasing all the energy gaps between configurations.
  p(s_i = 1) = 1 / (1 + e^(−∆E_i / T)),  where T is the temperature
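The stochastic update can be sketched as (my illustration, not from the slides):

```python
import math, random

def stochastic_state(delta_E, T, rng=random.Random(0)):
    """Set a unit to 1 with probability 1 / (1 + exp(-delta_E / T)).

    delta_E is the unit's energy gap and T the temperature. Low T
    approaches the deterministic threshold rule; high T approaches a
    fair coin flip, flattening the energy landscape."""
    p_on = 1.0 / (1.0 + math.exp(-delta_E / T))
    return 1 if rng.random() < p_on else 0
```

At ∆E_i = 0 the unit is a fair coin; at very low temperature a positive gap turns the unit on almost certainly.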
Why do we need stochastic binary units?
A. Because we cannot get rid of inherent noise.
B. Because it helps to escape local minima.
C. Because we want the system to produce randomized results.
Thermal equilibrium
• Thermal equilibrium is a difficult concept!
  – Reaching thermal equilibrium does not mean that the system has settled down into the lowest energy configuration.
  – The thing that settles down is the probability distribution over configurations.
  – This settles to the stationary distribution.
  – Any given system keeps changing its configuration, but the fraction of systems in each configuration does not change.
Modeling binary data
• Given a training set of binary vectors, fit a model that will assign a probability to every possible binary vector.
• The model can be used for generating data with the same distribution as the original data.
• Bayes' rule gives the probability that a particular model (distribution) produced the observed data:
  p(model_i | data) = p(data | model_i) p(model_i) / Σ_j p(data | model_j) p(model_j)
Boltzmann machine
• ...is defined in terms of the energies of joint configurations of the visible and hidden units.
• The probability of a joint configuration:
  p(v, h) ∝ e^(−E(v,h))
• This is the probability of finding the network in that joint configuration after we have updated all of the stochastic binary units many times.
Energy of a joint configuration
−E(v, h) = Σ_{i∈vis} v_i b_i + Σ_{k∈hid} h_k b_k + Σ_{i<j} v_i v_j w_ij + Σ_{i,k} v_i h_k w_ik + Σ_{k<l} h_k h_l w_kl
This is the energy with configuration v on the visible units and h on the hidden units, where v_i is the binary state of unit i in v, b_k is the bias of unit k, w_ik is the weight between visible unit i and hidden unit k, and i < j indexes every non-identical pair of i and j once.
From energies to probabilities
• The probability of a joint configuration over both visible and hidden units depends on the energy of that joint configuration compared with the energy of all other joint configurations:
  p(v, h) = e^(−E(v,h)) / Σ_{u,g} e^(−E(u,g))
  (the denominator is the partition function)
• The probability of a configuration of the visible units is the sum of the probabilities of all the joint configurations that contain it:
  p(v) = Σ_h e^(−E(v,h)) / Σ_{u,g} e^(−E(u,g))
Example
An example of how weights define a distribution: hidden units h1 and h2 are joined by a weight of −1, visible unit v1 is connected to h1 with a weight of +2, and v2 to h2 with a weight of +1.

v1 v2 | h1 h2 |  −E | e^(−E) | p(v,h) | p(v)
 1  1 |  1  1 |   2 |  7.39  |  .186  | 0.466
 1  1 |  1  0 |   2 |  7.39  |  .186  |
 1  1 |  0  1 |   1 |  2.72  |  .069  |
 1  1 |  0  0 |   0 |  1     |  .025  |
 1  0 |  1  1 |   1 |  2.72  |  .069  | 0.305
 1  0 |  1  0 |   2 |  7.39  |  .186  |
 1  0 |  0  1 |   0 |  1     |  .025  |
 1  0 |  0  0 |   0 |  1     |  .025  |
 0  1 |  1  1 |   0 |  1     |  .025  | 0.144
 0  1 |  1  0 |   0 |  1     |  .025  |
 0  1 |  0  1 |   1 |  2.72  |  .069  |
 0  1 |  0  0 |   0 |  1     |  .025  |
 0  0 |  1  1 |  −1 |  0.37  |  .009  | 0.084
 0  0 |  1  0 |   0 |  1     |  .025  |
 0  0 |  0  1 |   0 |  1     |  .025  |
 0  0 |  0  0 |   0 |  1     |  .025  |
Σ e^(−E) = 39.70
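Because the net is tiny, the table can be reproduced by brute-force enumeration (a sketch; the weight placement follows the figure: v1–h1 = +2, v2–h2 = +1, h1–h2 = −1, no biases):

```python
import itertools, math

def neg_energy(v1, v2, h1, h2):
    # -E(v,h) = 2*v1*h1 + 1*v2*h2 - 1*h1*h2
    return 2*v1*h1 + v2*h2 - h1*h2

states = list(itertools.product([0, 1], repeat=4))
Z = sum(math.exp(neg_energy(*s)) for s in states)   # partition function

p_joint = {s: math.exp(neg_energy(*s)) / Z for s in states}
p_v = {}
for (v1, v2, h1, h2), p in p_joint.items():
    p_v[(v1, v2)] = p_v.get((v1, v2), 0.0) + p

print(round(p_v[(1, 1)], 3))   # 0.466, matching the table
```

The exact Z is about 39.69; the slide's 39.70 comes from summing the already-rounded row entries.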
Getting a sample from the model
• We cannot compute the normalizing term (the partition function) because it has exponentially many terms.
• So we use Markov Chain Monte Carlo to get samples from the model, starting from a random global configuration:
  – Keep picking units at random and allowing them to stochastically update their states based on their energy gaps.
  – Run the Markov chain until it reaches its stationary distribution.
• The probability of a global configuration is then related to its energy by the Boltzmann distribution.
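This sampling procedure can be sketched as Gibbs sampling (my illustration; W is the symmetric weight matrix and b the biases):

```python
import math, random
import numpy as np

def sample_model(W, b, steps=1000, T=1.0, rng=random.Random(0)):
    """Start from a random global configuration, then repeatedly pick a
    unit at random and turn it on with probability 1/(1 + exp(-gap/T)),
    where gap = b_i + sum_j s_j w_ij is the unit's energy gap. Long runs
    approximate the stationary (Boltzmann) distribution."""
    n = len(b)
    s = np.array([rng.randint(0, 1) for _ in range(n)])
    for _ in range(steps):
        i = rng.randrange(n)
        gap = b[i] + W[i] @ s - W[i, i] * s[i]
        s[i] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-gap / T)) else 0
    return s
```

Clamping the visible units to a data vector and updating only the hidden units turns the same loop into a sampler for the posterior described on the next slide.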
Getting a sample from the posterior distribution for a given data vector
• The number of possible hidden configurations is exponential, so we need MCMC to sample from the posterior.
  – It is just the same as getting a sample from the model, except that we keep the visible units clamped to the given data vector.
  – Only the hidden units are allowed to change states.
• Samples from the posterior are required for learning the weights. Each hidden configuration is an "explanation" of an observed visible configuration; better explanations have lower energy.
What does a Boltzmann machine really do?
A. Models the probability distribution of input data.
B. Generates samples from the modeled distribution.
C. Learns the probability distribution of input data from samples.
D. All of the above.
E. None of the above.