Page 1: Restricted Boltzmann Machines (mwmak/papers/RBM.pdf)

Restricted Boltzmann Machines Supplementary Notes to EIE4105 (Out of Syllabus)

M.W. Mak

http://www.eie.polyu.edu.hk/~mwmak

References (the equations in this file are taken from):

1. Alex Krizhevsky, “Learning Multiple Layers of Features from Tiny Images,” 2009.

Page 2

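Restricted Boltzmann Machines

• An RBM comprises visible nodes and hidden nodes.

• The operation of an RBM is governed by its energy function

$$E(\mathbf{v},\mathbf{h}) = -\sum_{i=1}^{V}\sum_{j=1}^{H} v_i h_j w_{ij} - \sum_{i=1}^{V} b_i^v v_i - \sum_{j=1}^{H} b_j^h h_j$$

• In the diagram, V = 2 and H = 3.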

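[Figure: an RBM with visible nodes $v_i$ (biases $b_i^v$), hidden nodes $h_j$ (biases $b_j^h$), bias inputs fixed at 1, and weights $w_{ij}$ connecting every visible node to every hidden node]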

Page 3

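Restricted Boltzmann Machines

• Given the energy function, the joint probability of v and h is

$$p(\mathbf{v},\mathbf{h}) = \frac{e^{-E(\mathbf{v},\mathbf{h})}}{\sum_{\mathbf{u}}\sum_{\mathbf{g}} e^{-E(\mathbf{u},\mathbf{g})}}$$

where u and g range over all binary visible and hidden vectors, respectively.

• Intuitively, configurations (v, h) with low (high) energy are assigned high (low) probability.

• Marginalizing over h and over v, respectively, we have

$$p(\mathbf{v}) = \frac{\sum_{\mathbf{g}} e^{-E(\mathbf{v},\mathbf{g})}}{\sum_{\mathbf{u}}\sum_{\mathbf{g}} e^{-E(\mathbf{u},\mathbf{g})}}, \qquad p(\mathbf{h}) = \frac{\sum_{\mathbf{u}} e^{-E(\mathbf{u},\mathbf{h})}}{\sum_{\mathbf{u}}\sum_{\mathbf{g}} e^{-E(\mathbf{u},\mathbf{g})}}$$

• In the above diagram, V = 2 and H = 3, where

$$\mathbf{v} = [v_1,\dots,v_V]^T \quad\text{and}\quad \mathbf{h} = [h_1,\dots,h_H]^T$$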
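For a model as small as the one in the diagram (V = 2, H = 3), the joint and marginal distributions above can be computed exactly by enumerating all 2^(V+H) = 32 configurations. The NumPy sketch below does this; the function names and random parameter values are illustrative assumptions, and brute-force enumeration is only practical for toy sizes.

```python
import itertools
import numpy as np

def energy(v, h, W, bv, bh):
    """BB-RBM energy E(v,h) = -sum_ij v_i h_j w_ij - sum_i b_i^v v_i - sum_j b_j^h h_j."""
    return -(v @ W @ h) - bv @ v - bh @ h

def joint_table(W, bv, bh):
    """Exact p(v, h) for every binary (v, h); W[i, j] = w_ij.

    Only feasible for tiny RBMs: there are 2**(V+H) configurations.
    """
    V, H = W.shape
    vs = [np.array(x, dtype=float) for x in itertools.product([0, 1], repeat=V)]
    hs = [np.array(x, dtype=float) for x in itertools.product([0, 1], repeat=H)]
    # unnormalised probabilities e^{-E(v,h)}, indexed [v_index, h_index]
    unnorm = np.array([[np.exp(-energy(v, h, W, bv, bh)) for h in hs] for v in vs])
    Z = unnorm.sum()  # partition function: sum over all (u, g)
    return vs, hs, unnorm / Z

rng = np.random.default_rng(0)
V, H = 2, 3  # sizes used in the diagram
W = rng.normal(scale=0.5, size=(V, H))
bv = rng.normal(scale=0.1, size=V)
bh = rng.normal(scale=0.1, size=H)

vs, hs, P = joint_table(W, bv, bh)
p_v = P.sum(axis=1)  # marginalise over h
p_h = P.sum(axis=0)  # marginalise over v
print(P.shape)  # (4, 8)
print(p_v, p_h)
```

Because every probability is e^{-E} divided by the same constant, log p(v,h) + E(v,h) is identical for all configurations, which is exactly the "low energy, high probability" statement.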

Page 4

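Restricted Boltzmann Machines

• Conditional probabilities:

$$p(\mathbf{h}|\mathbf{v}) = \frac{e^{-E(\mathbf{v},\mathbf{h})}}{\sum_{\mathbf{g}} e^{-E(\mathbf{v},\mathbf{g})}}, \qquad p(\mathbf{v}|\mathbf{h}) = \frac{e^{-E(\mathbf{v},\mathbf{h})}}{\sum_{\mathbf{u}} e^{-E(\mathbf{u},\mathbf{h})}}$$

• p(v|h) is difficult to evaluate directly because the denominator sums over all 2^V possible visible vectors u.

• However, it is possible to derive a closed-form solution for $p(v_k = 1|\mathbf{h})$, the probability that a single visible node is on.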

Page 5

Restricted Boltzmann Machines

Page 6

Restricted Boltzmann Machines
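The closed form for $p(v_k = 1|\mathbf{h})$ follows from factorising $p(\mathbf{v}|\mathbf{h})$. A compact derivation, assuming the Bernoulli-Bernoulli energy $E(\mathbf{v},\mathbf{h}) = -\sum_i\sum_j v_i h_j w_{ij} - \sum_i b_i^v v_i - \sum_j b_j^h h_j$ and writing $a_i$ (notation introduced here) for the total input to visible node $i$:

```latex
\begin{aligned}
p(\mathbf v\mid\mathbf h)
  &= \frac{e^{\sum_j b^h_j h_j}\,\prod_{i=1}^{V} e^{v_i a_i}}
          {e^{\sum_j b^h_j h_j}\,\sum_{\mathbf u}\prod_{i=1}^{V} e^{u_i a_i}},
  \qquad a_i = \sum_{j=1}^{H} h_j w_{ij} + b^v_i \\
  &= \prod_{i=1}^{V} \frac{e^{v_i a_i}}{1 + e^{a_i}}
  \qquad \text{(the sum over } \mathbf u \text{ factorises since each } u_i \in \{0,1\}) \\
\Rightarrow\quad p(v_k = 1\mid\mathbf h)
  &= \frac{e^{a_k}}{1 + e^{a_k}}
   = \frac{1}{1 + e^{-\left(\sum_{j=1}^{H} h_j w_{kj} + b^v_k\right)}}
\end{aligned}
```

The factor that depends only on h cancels, and the remaining 2^V-term sum collapses into a product of V two-term sums, which is why the denominator is no longer a problem.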

Page 7

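Restricted Boltzmann Machines

• Similarly:

$$p(h_j = 1|\mathbf{v}) = \frac{1}{1 + e^{-\left(\sum_{i=1}^{V} v_i w_{ij} + b_j^h\right)}}$$

• Therefore, given v, the probability that a hidden node is on does not depend on the states of the other hidden nodes: the hidden nodes are conditionally independent.

• This property makes RBM training very efficient, because all hidden nodes can be sampled in parallel in a single step given v (and, by the same argument, all visible nodes given h).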
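The conditional independence can be checked numerically on a toy RBM: the exact p(h|v), obtained by enumerating every hidden vector g, should equal the product of the per-node sigmoids. A minimal NumPy sketch; the helper names and random parameters are illustrative assumptions.

```python
import itertools
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_h_given_v_bruteforce(v, W, bv, bh):
    """Exact p(h|v) by enumerating all 2**H hidden vectors g (W[i, j] = w_ij)."""
    H = W.shape[1]
    hs = np.array(list(itertools.product([0, 1], repeat=H)), dtype=float)
    neg_E = hs @ (W.T @ v) + bv @ v + hs @ bh   # -E(v, g) for every g
    w = np.exp(neg_E - neg_E.max())             # shift for numerical stability
    return hs, w / w.sum()

def p_h_given_v_factorised(v, W, bh):
    """p(h_j = 1|v) for every j, computed independently per hidden node."""
    return sigmoid(v @ W + bh)

rng = np.random.default_rng(1)
W = rng.normal(size=(2, 3))
bv, bh = rng.normal(size=2), rng.normal(size=3)
v = np.array([1.0, 0.0])

hs, p_exact = p_h_given_v_bruteforce(v, W, bv, bh)
q = p_h_given_v_factorised(v, W, bh)
# product of independent Bernoullis: prod_j q_j^{h_j} (1 - q_j)^{1 - h_j}
p_product = np.prod(np.where(hs == 1, q, 1 - q), axis=1)
print(np.max(np.abs(p_exact - p_product)))
```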

Page 8

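Training of RBMs

• Given C training vectors $\{\mathbf{v}^c\}_{c=1}^{C}$, we aim to maximize the log probability

$$\mathcal{L} = \sum_{c=1}^{C} \log p(\mathbf{v}^c)$$

• Using gradient ascent, the weights are moved along

$$\frac{\partial \log p(\mathbf{v}^c)}{\partial w_{ij}} = \sum_{\mathbf{g}} v_i^c g_j \frac{e^{-E(\mathbf{v}^c,\mathbf{g})}}{\sum_{\mathbf{g}'} e^{-E(\mathbf{v}^c,\mathbf{g}')}} - \sum_{\mathbf{u}}\sum_{\mathbf{g}} u_i g_j \frac{e^{-E(\mathbf{u},\mathbf{g})}}{\sum_{\mathbf{u}'}\sum_{\mathbf{g}'} e^{-E(\mathbf{u}',\mathbf{g}')}}$$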

Page 9

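Training of RBMs

• The first term:

$$\begin{aligned}
\sum_{\mathbf{g}} v_i^c g_j \left[\frac{e^{-E(\mathbf{v}^c,\mathbf{g})}}{\sum_{\mathbf{g}'} e^{-E(\mathbf{v}^c,\mathbf{g}')}}\right]
&= \sum_{\mathbf{g}} v_i^c g_j\, p(\mathbf{g}|\mathbf{v}^c) \\
&= v_i^c \langle g_j \rangle_{p(\mathbf{g}|\mathbf{v}^c)} = v_i^c\, p(g_j = 1|\mathbf{v}^c) \\
&= v_i^c\, p(h_j = 1|\mathbf{v}^c)
\end{aligned}$$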

Page 10

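Training of RBMs

• The second term:

$$\sum_{\mathbf{u}}\sum_{\mathbf{g}} u_i g_j \left[\frac{e^{-E(\mathbf{u},\mathbf{g})}}{\sum_{\mathbf{u}'}\sum_{\mathbf{g}'} e^{-E(\mathbf{u}',\mathbf{g}')}}\right] = \langle u_i g_j \rangle_{p(\mathbf{u},\mathbf{g})} = \langle v_i h_j \rangle_{p(\mathbf{v},\mathbf{h})}$$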
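One way to confirm the two terms is to compare their difference with a finite-difference derivative of the exact log p(v^c) on an RBM small enough to enumerate. The sketch below (hypothetical helper names; brute force is only feasible for a handful of nodes) evaluates the first term as v_i^c p(h_j = 1|v^c) and the second as Σ_u p(u) u_i p(h_j = 1|u), which equals ⟨v_i h_j⟩ under p(v, h).

```python
import itertools
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def all_binary(n):
    """All 2**n binary vectors of length n, one per row."""
    return np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)

def log_p_v(v, W, bv, bh):
    """Exact log p(v) of a tiny BB-RBM, W[i, j] = w_ij."""
    us, gs = all_binary(W.shape[0]), all_binary(W.shape[1])

    def neg_energy(x):
        # -E(x, g) for every hidden vector g (rows of gs)
        return gs @ (W.T @ x) + bv @ x + gs @ bh

    log_num = np.logaddexp.reduce(neg_energy(v))  # log sum_g e^{-E(v,g)}
    log_Z = np.logaddexp.reduce(np.concatenate([neg_energy(u) for u in us]))
    return log_num - log_Z

def grad_W_exact(v, W, bv, bh):
    """First term minus second term of d log p(v) / d w_ij."""
    us = all_binary(W.shape[0])
    data_term = np.outer(v, sigmoid(v @ W + bh))           # v_i p(h_j=1|v)
    p_u = np.exp([log_p_v(u, W, bv, bh) for u in us])      # exact marginal p(u)
    model_term = sum(pu * np.outer(u, sigmoid(u @ W + bh)) for pu, u in zip(p_u, us))
    return data_term - model_term

rng = np.random.default_rng(2)
W = rng.normal(scale=0.5, size=(3, 2))
bv, bh = rng.normal(size=3), rng.normal(size=2)
v = np.array([1.0, 0.0, 1.0])

eps = 1e-6
num = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        num[i, j] = (log_p_v(v, Wp, bv, bh) - log_p_v(v, Wm, bv, bh)) / (2 * eps)

print(np.max(np.abs(num - grad_W_exact(v, W, bv, bh))))
```

The model term is the expensive part: it needs p(u) for every visible vector, which is what motivates the approximation that follows.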

Page 11

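Training of RBMs

• The second term is difficult to compute because there is no closed-form solution for p(v, h): its normalizing constant sums over all $2^{V+H}$ configurations (u, g).

• In practice, the second term is approximated by one-step contrastive divergence (CD-1):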

Page 12

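Training of RBMs

1. Assign the visible nodes:
$$v_i \leftarrow v_i^c, \quad i = 1,\dots,V$$

2. Compute the hidden activations $\sum_{i=1}^{V} v_i w_{ij} + b_j^h$ and the corresponding pmf:
$$p(h_j = 1|\mathbf{v}^c) = \frac{1}{1 + e^{-\left(\sum_{i=1}^{V} v_i w_{ij} + b_j^h\right)}}$$

3. Sample each hidden node from its pmf to obtain a binary sample h.

4. Reconstruct v based on the sampled binary h (the reconstruction is the probability $p(v_i = 1|\mathbf{h})$, not a binary sample):
$$v_i^{\text{rec}} = \frac{1}{1 + e^{-\left(\sum_{j=1}^{H} h_j w_{ij} + b_i^v\right)}}$$

5. Compute the hidden-node pmf based on the reconstructed v:
$$p(h_j = 1|\mathbf{v}^{\text{rec}}) = \frac{1}{1 + e^{-\left(\sum_{i=1}^{V} v_i^{\text{rec}} w_{ij} + b_j^h\right)}}$$

6. Compute the approximated expectation (2nd term):
$$\langle v_i h_j \rangle_{p(\mathbf{v},\mathbf{h})} \approx v_i^{\text{rec}}\, p(h_j = 1|\mathbf{v}^{\text{rec}})$$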
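Steps 1 to 6 can be sketched for one binary training vector as below. This is an illustrative NumPy version, not the author's code: the function name `cd1_statistics` and the random initialisation are assumptions, while the use of probabilities (rather than binary samples) in steps 4 and 5 follows the formulas above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_statistics(v_c, W, bv, bh, rng):
    """CD-1 statistics for one binary training vector v_c (steps 1-6).

    W has shape (V, H) with W[i, j] = w_ij. Returns two (V, H) matrices with
    entries v_i^c p(h_j=1|v^c) and v_i^rec p(h_j=1|v^rec).
    """
    v = v_c.astype(float)                            # step 1: clamp visible nodes
    p_h = sigmoid(v @ W + bh)                        # step 2: p(h_j = 1 | v^c)
    h = (rng.random(p_h.shape) < p_h).astype(float)  # step 3: binary sample of h
    v_rec = sigmoid(W @ h + bv)                      # step 4: reconstruction (probabilities)
    p_h_rec = sigmoid(v_rec @ W + bh)                # step 5: p(h_j = 1 | v^rec)
    positive = np.outer(v, p_h)                      # 1st term (data)
    negative = np.outer(v_rec, p_h_rec)              # step 6: approx. 2nd term (model)
    return positive, negative

rng = np.random.default_rng(0)
V, H = 6, 4
W = rng.normal(scale=0.1, size=(V, H))
bv, bh = np.zeros(V), np.zeros(H)
pos, neg = cd1_statistics(np.array([1, 0, 1, 1, 0, 0]), W, bv, bh, rng)
print(pos.shape, neg.shape)  # (6, 4) (6, 4)
```

Only one random draw (step 3) is needed per training vector, which is what makes CD-1 cheap compared with the exact second term.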

Page 13

Training of RBMs

Page 14

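Training of RBMs

• Combining the 1st and 2nd terms, training an RBM using CD-1 amounts to:

$$\Delta w_{ij} = \epsilon_w \left[\langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}}\right] \approx \epsilon_w \sum_{c=1}^{C} \left[v_i^c\, p(h_j = 1|\mathbf{v}^c) - v_i^{\text{rec}}\, p(h_j = 1|\mathbf{v}^{\text{rec}})\right]$$

where $\epsilon_w$ is the learning rate and the superscript "rec" means that v is reconstructed using the c-th training vector as input.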
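Summed over all C training vectors, the update above can be written with two matrix products. A minimal NumPy sketch, assuming a data matrix with one training vector per row; `cd1_weight_update` and `eps_w` are hypothetical names, and only the weights are updated because the update formula above covers only $w_{ij}$.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_weight_update(Vdata, W, bv, bh, eps_w, rng):
    """Delta W of a BB-RBM using CD-1, summed over the C rows of Vdata.

    Vectorised form of eps_w * sum_c [v_i^c p(h_j=1|v^c) - v_i^rec p(h_j=1|v^rec)].
    """
    P = sigmoid(Vdata @ W + bh)                   # (C, H): p(h_j=1|v^c)
    Hs = (rng.random(P.shape) < P).astype(float)  # (C, H): sampled binary h
    Vrec = sigmoid(Hs @ W.T + bv)                 # (C, V): reconstructions v^rec
    Prec = sigmoid(Vrec @ W + bh)                 # (C, H): p(h_j=1|v^rec)
    return eps_w * (Vdata.T @ P - Vrec.T @ Prec)  # (V, H)

# Usage on toy binary data (C = 100 vectors, V = 6, H = 4)
rng = np.random.default_rng(0)
data = (rng.random((100, 6)) < 0.3).astype(float)
W = rng.normal(scale=0.01, size=(6, 4))
bv, bh = np.zeros(6), np.zeros(4)
for epoch in range(20):
    W += cd1_weight_update(data, W, bv, bh, eps_w=0.001, rng=rng)
```

The product `Vdata.T @ P` computes Σ_c v_i^c p(h_j = 1|v^c) for all (i, j) at once, so no explicit loop over training vectors is needed.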

Page 15

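Gaussian-Bernoulli RBMs

• For real-valued rather than binary input, the visible nodes are assumed to follow Gaussian distributions.

• Energy function of the Gaussian-Bernoulli RBM (GB-RBM):

$$E(\mathbf{v},\mathbf{h}) = \sum_{i=1}^{V} \frac{(v_i - b_i^v)^2}{2\sigma_i^2} - \sum_{j=1}^{H} b_j^h h_j - \sum_{i=1}^{V}\sum_{j=1}^{H} \frac{v_i}{\sigma_i} h_j w_{ij}$$

• Energy function of the Bernoulli-Bernoulli RBM (BB-RBM):

$$E(\mathbf{v},\mathbf{h}) = -\sum_{i=1}^{V}\sum_{j=1}^{H} v_i h_j w_{ij} - \sum_{i=1}^{V} b_i^v v_i - \sum_{j=1}^{H} b_j^h h_j$$

• In a GB-RBM, each visible node adds a parabolic offset to the energy function, with the width of the parabola controlled by $\sigma_i$.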

Page 16

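Gaussian-Bernoulli RBMs

• It can be shown that

$$p(\mathbf{v}|\mathbf{h}) = \mathcal{N}(\mathbf{v}; \boldsymbol{\mu}_{v|h}, \boldsymbol{\Sigma}_{v|h})$$

where the i-th element of $\boldsymbol{\mu}_{v|h}$ is $b_i^v + \sigma_i \sum_{j=1}^{H} h_j w_{ij}$ and $\boldsymbol{\Sigma}_{v|h} = \mathrm{diag}(\sigma_1^2,\dots,\sigma_V^2)$.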
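The Gaussian form comes from completing the square in v. A quick numerical confirmation: for fixed h, the difference of −E(v, h) between two visible vectors should equal the difference of the Gaussian log-densities. The NumPy sketch below assumes the GB-RBM energy Σ_i (v_i − b_i^v)²/(2σ_i²) − Σ_j b_j^h h_j − Σ_i Σ_j (v_i/σ_i) h_j w_ij, with W[i, j] = w_ij and a vector `sigma` holding the σ_i; all function names are illustrative.

```python
import numpy as np

def gb_energy(v, h, W, bv, bh, sigma):
    """GB-RBM energy:
    sum_i (v_i - b_i^v)^2 / (2 sigma_i^2) - sum_j b_j^h h_j - sum_ij (v_i / sigma_i) h_j w_ij."""
    return np.sum((v - bv) ** 2 / (2 * sigma ** 2)) - bh @ h - (v / sigma) @ W @ h

def gb_visible_conditional(h, W, bv, sigma):
    """Mean and diagonal variance of the Gaussian p(v|h)."""
    mean = bv + sigma * (W @ h)  # mu_i = b_i^v + sigma_i * sum_j w_ij h_j
    var = sigma ** 2             # Sigma_{v|h} = diag(sigma_i^2)
    return mean, var

def log_gauss_diag(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

rng = np.random.default_rng(0)
V, H = 4, 3
W = rng.normal(size=(V, H))
bv, bh = rng.normal(size=V), rng.normal(size=H)
sigma = rng.uniform(0.5, 2.0, size=V)
h = (rng.random(H) < 0.5).astype(float)

mean, var = gb_visible_conditional(h, W, bv, sigma)
v_sample = rng.normal(mean, np.sqrt(var))  # one draw from p(v|h)
```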

Page 17

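Gaussian-Bernoulli RBMs

• Training of GB-RBMs is similar to that of BB-RBMs.

• 1st term of the derivative of the log-probability (since $-\partial E/\partial w_{ij} = v_i h_j/\sigma_i$):

$$= \frac{1}{\sigma_i}\, v_i^c\, p(h_j = 1|\mathbf{v}^c)$$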

Page 18

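Gaussian-Bernoulli RBMs

• 2nd term of the derivative of the log-probability (approximated by CD-1):

$$\approx \frac{1}{\sigma_i}\, v_i^{\text{rec}}\, p(h_j = 1|\mathbf{v}^{\text{rec}})$$

• Update formula:

$$\Delta w_{ij} = \frac{\epsilon_w}{\sigma_i}\left[\langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}}\right] \approx \frac{\epsilon_w}{\sigma_i} \sum_{c=1}^{C} \left[v_i^c\, p(h_j = 1|\mathbf{v}^c) - v_i^{\text{rec}}\, p(h_j = 1|\mathbf{v}^{\text{rec}})\right]$$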
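A sketch of this update in NumPy, under two stated assumptions: the hidden pmf is p(h_j = 1|v) = sigmoid(Σ_i (v_i/σ_i) w_ij + b_j^h), which follows from the GB-RBM energy, and the reconstruction uses the mean of p(v|h), b_i^v + σ_i Σ_j h_j w_ij, instead of a Gaussian sample (a common choice that these notes do not specify). `gb_cd1_weight_update` and its arguments are hypothetical names.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gb_cd1_weight_update(Vdata, W, bv, bh, sigma, eps_w, rng):
    """CD-1 Delta W for a GB-RBM, summed over the C rows of real-valued Vdata.

    Assumption: the reconstruction is the mean of p(v|h), not a sample.
    """
    P = sigmoid((Vdata / sigma) @ W + bh)         # (C, H): p(h_j=1|v^c)
    Hs = (rng.random(P.shape) < P).astype(float)  # binary hidden samples
    Vrec = bv + sigma * (Hs @ W.T)                # (C, V): mean of p(v|h)
    Prec = sigmoid((Vrec / sigma) @ W + bh)       # (C, H): p(h_j=1|v^rec)
    # (eps_w / sigma_i) * sum_c [v_i^c p(h_j=1|v^c) - v_i^rec p(h_j=1|v^rec)]
    return (eps_w / sigma)[:, None] * (Vdata.T @ P - Vrec.T @ Prec)

# Usage on toy real-valued data (C = 200, V = 5, H = 3)
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 5))
W = rng.normal(scale=0.01, size=(5, 3))
bv, bh, sigma = data.mean(axis=0), np.zeros(3), data.std(axis=0)
for epoch in range(20):
    W += gb_cd1_weight_update(data, W, bv, bh, sigma, eps_w=1e-4, rng=rng)
```

Compared with the BB-RBM version, the only changes are the 1/σ_i scaling of the visible inputs and of the update, and the linear (Gaussian-mean) reconstruction in place of a sigmoid.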

Page 19

Gaussian-Bernoulli RBMs

Page 20

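Bernoulli-Gaussian RBMs

• For real-valued rather than binary output, the hidden nodes are assumed to follow Gaussian distributions.

• Energy function of the Bernoulli-Gaussian RBM (BG-RBM):

$$E(\mathbf{v},\mathbf{h}) = \sum_{j=1}^{H} \frac{(h_j - b_j^h)^2}{2\sigma_j^2} - \sum_{i=1}^{V} b_i^v v_i - \sum_{i=1}^{V}\sum_{j=1}^{H} \frac{v_i h_j}{\sigma_j} w_{ij}$$

• Energy function of the Bernoulli-Bernoulli RBM (BB-RBM):

$$E(\mathbf{v},\mathbf{h}) = -\sum_{i=1}^{V}\sum_{j=1}^{H} v_i h_j w_{ij} - \sum_{i=1}^{V} b_i^v v_i - \sum_{j=1}^{H} b_j^h h_j$$

• In a BG-RBM, each hidden node adds a parabolic offset to the energy function, with the width of the parabola controlled by $\sigma_j$.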

Page 21

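Bernoulli-Gaussian RBMs

• It can be shown that

$$p(v_k = 1|\mathbf{h}) = \frac{1}{1 + e^{-\left(\sum_{j=1}^{H} \frac{h_j w_{kj}}{\sigma_j} + b_k^v\right)}}$$

$$p(\mathbf{h}|\mathbf{v}) = \mathcal{N}(\mathbf{h}; \boldsymbol{\mu}_{h|v}, \boldsymbol{\Sigma}_{h|v})$$

where the j-th element of $\boldsymbol{\mu}_{h|v}$ is $b_j^h + \sigma_j \sum_{i=1}^{V} v_i w_{ij}$ and $\boldsymbol{\Sigma}_{h|v} = \mathrm{diag}(\sigma_1^2,\dots,\sigma_H^2)$.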
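Both BG-RBM conditionals can be evaluated directly and checked against the energy function. The NumPy sketch below assumes the BG-RBM energy Σ_j (h_j − b_j^h)²/(2σ_j²) − Σ_i b_i^v v_i − Σ_i Σ_j (v_i h_j/σ_j) w_ij with W[i, j] = w_ij; `bg_energy`, `bg_conditionals` and `sigma_h` are illustrative names.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bg_energy(v, h, W, bv, bh, sigma_h):
    """BG-RBM energy:
    sum_j (h_j - b_j^h)^2 / (2 sigma_j^2) - sum_i b_i^v v_i - sum_ij v_i (h_j / sigma_j) w_ij."""
    return np.sum((h - bh) ** 2 / (2 * sigma_h ** 2)) - bv @ v - v @ W @ (h / sigma_h)

def bg_conditionals(v, h, W, bv, bh, sigma_h):
    """p(v_k = 1|h) for every k, plus mean and variance of the Gaussian p(h|v)."""
    p_v = sigmoid(W @ (h / sigma_h) + bv)  # sum_j h_j w_kj / sigma_j + b_k^v
    mean_h = bh + sigma_h * (v @ W)        # b_j^h + sigma_j * sum_i v_i w_ij
    var_h = sigma_h ** 2                   # Sigma_{h|v} = diag(sigma_j^2)
    return p_v, mean_h, var_h

rng = np.random.default_rng(3)
V, H = 3, 2
W = rng.normal(size=(V, H))
bv, bh = rng.normal(size=V), rng.normal(size=H)
sigma_h = np.array([0.5, 2.0])
v = np.array([1.0, 0.0, 1.0])
h = rng.normal(size=H)

p_v, mean_h, var_h = bg_conditionals(v, h, W, bv, bh, sigma_h)
h_sample = rng.normal(mean_h, np.sqrt(var_h))  # one draw from p(h|v)
```

Flipping a single visible node changes the energy by exactly the sigmoid's argument, so p(v_k = 1|h) = 1/(1 + e^{E(v_k=1) − E(v_k=0)}), which is a convenient sanity check.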