Chonbuk National University · nlp.chonbuk.ac.kr/PGM2019/slides_jbnu/MRF.pdf · 2019. 5. 1.
TRANSCRIPT
Undirected Graphical Models(Markov random fields)
Seung-Hoon Na
Chonbuk National University
Hopfield networks
• Bidirectional associative memory
Bidirectional associative memory
• In general,
• After some iterations, the BAM reaches a fixed point (x, y):
…
Bidirectional associative memory
• Hebbian learning
– Then,
• For a set of m vector pairs
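As a concrete sketch of Hebbian storage and recall (the vectors and dimensions below are made up for illustration, not taken from the slides), the weight matrix is the sum of outer products of the m stored pairs, and recall alternates forward and backward passes until a fixed point:

```python
import numpy as np

# Hebbian storage for a BAM: W = sum over stored pairs of outer(x_m, y_m).
# The two stored pairs here are illustrative.
x1, y1 = np.array([1, -1, 1]), np.array([1, 1, -1, -1])
x2, y2 = np.array([-1, 1, 1]), np.array([-1, 1, 1, -1])
W = np.outer(x1, y1) + np.outer(x2, y2)      # shape (3, 4)

def recall(x, W, iters=10):
    """Alternate x -> y -> x updates until a fixed point (x, y) is reached.
    (With these patterns no excitation is exactly zero, so np.sign stays
    in {-1, +1}.)"""
    for _ in range(iters):
        y = np.sign(x @ W)                   # forward pass
        x_new = np.sign(W @ y)               # backward pass
        if np.array_equal(x_new, x):         # fixed point reached
            break
        x = x_new
    return x, y

# Recalling from a stored pattern returns its partner:
x_fix, y_fix = recall(x1, W)
```

Starting recall from a stored x recovers the associated y, illustrating the bidirectional associative memory behavior.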
Bidirectional associative memory
– 𝒙₀: initial vector
– the excitation vector:
– (𝒙₀, 𝒚₀): a stable state of the network if
• The energy function becomes smaller (because of the minus sign) if the vector 𝑾𝒚₀ᵀ lies closer to 𝒙₀
Bidirectional associative memory
• The energy function E of BAM:
• Generally, using a threshold-unit for each node
Hopfield networks
• Asynchronous BAM
– Each unit computes its excitation at random times and changes its state to 1 or −1 independently of the others and according to the sign of its total excitation
– For each update, the energy function is decreasing
• Hopfield network
– A special case of an asynchronous BAM: x=y
– Each unit is connected to all other units except itself
– With necessary conditions on the weight matrix W:
• 1) symmetry (𝑾 = 𝑾ᵀ)  2) zero diagonal (𝑤ᵢᵢ = 0)
Hopfield networks
• Network with a non-zero diagonal
• Network with asymmetric connections
– example states: (1, 1, 1) and (−1, −1, −1)
Hopfield networks
• Energy function
Hopfield networks: Example
• A flip-flop network
– Only stable states are (1,−1) and (−1, 1).
Energy function of a flip-flop
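A quick enumeration makes the claim concrete. Assuming the flip-flop is two units joined by a single connection of weight w = −1 (a weight choice consistent with the stated stable states; the slide's exact values are not in the transcript):

```python
from itertools import product

w = -1.0  # assumed connection weight between the two units

def energy(x1, x2):
    # E = -(1/2) * sum_ij w_ij * x_i * x_j with w_12 = w_21 = w
    # reduces to E = -w * x1 * x2 for two units.
    return -w * x1 * x2

# Enumerate the energy of all four states of the flip-flop.
states = {(x1, x2): energy(x1, x2) for x1, x2 in product([1, -1], repeat=2)}
# The two minima are (1, -1) and (-1, 1), matching the slide's stable states.
```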
Hopfield networks: Example
Hopfield networks: Example
Hopfield networks
• A fully connected Ising model with a symmetric weight matrix 𝑾 = 𝑾ᵀ
• Exact inference is intractable
• Iterative conditional modes (ICM): just sets each node to its most likely (lowest energy) state, given all its neighbors
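ICM can be sketched in a few lines on a small Hopfield-style model. The weights below are illustrative, not from the slides; W is symmetric with a zero diagonal, as the earlier slide requires:

```python
import numpy as np

W = np.array([[ 0.,  1., -1.],
              [ 1.,  0., -1.],
              [-1., -1.,  0.]])   # symmetric, zero diagonal (illustrative)
b = np.zeros(3)                   # optional per-node biases

def icm(x, W, b, sweeps=10):
    """Sweep the nodes, setting each to its lowest-energy state
    (the sign of its local field) given all its neighbors."""
    x = x.copy()
    for _ in range(sweeps):
        changed = False
        for i in range(len(x)):
            h = W[i] @ x + b[i]              # local field at node i
            new = 1.0 if h >= 0 else -1.0
            changed = changed or (new != x[i])
            x[i] = new
        if not changed:                      # local minimum of the energy
            break
    return x

x_star = icm(np.array([1., 1., 1.]), W, b)
```

Since each coordinate update can only lower (or keep) the energy, the sweeps terminate at a local minimum; like the asynchronous updates on the earlier slide, ICM gives no guarantee of reaching the global minimum.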
Learning MRFs: Moment matching
• MRF
• Log-likelihood:
• Gradient for 𝜽𝑐 :
Moment matching
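Spelled out: for a log-linear MRF p(𝒚|𝜽) ∝ exp(∑_c 𝜽_cᵀ 𝝓_c(𝒚)), the gradient of the log-likelihood (averaged over the N examples) with respect to 𝜽_c is the difference between empirical and model feature moments:

```latex
\frac{\partial \ell}{\partial \boldsymbol{\theta}_c}
  = \frac{1}{N}\sum_{i=1}^{N} \boldsymbol{\phi}_c\big(\boldsymbol{y}^{(i)}\big)
  - \mathbb{E}_{p(\boldsymbol{y}\mid\boldsymbol{\theta})}\big[\boldsymbol{\phi}_c(\boldsymbol{y})\big]
```

Setting this to zero means the empirical moments equal the model's expected moments, which is exactly the moment-matching condition named on the slide.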
Learning MRFs: Latent variable models
• Log-likelihood:
• Gradient:
• Alternatively, we can use a generalized EM
Approximate methods for learning MRFs
• Pseudo-likelihood
• Stochastic maximum likelihood
• Iterative proportional fitting
• Generalized iterative scaling
Pseudo-likelihood
– P(𝒚) ≈ ∏ᵢ P(yᵢ | MB(yᵢ))  (the representation used by pseudo-likelihood)
– MB(yᵢ): the Markov blanket of yᵢ
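For some models the conditionals P(yᵢ | MB(yᵢ)) have a closed form, which is what makes pseudo-likelihood cheap. A sketch for an Ising-style MRF with ±1 states (the model choice is an assumption for illustration): each full conditional is sigmoid(2·yᵢ·hᵢ), where hᵢ is the local field.

```python
import numpy as np

def log_pseudo_likelihood(Y, W, b):
    """Log pseudo-likelihood of +/-1 data Y (n_samples x n_nodes) under an
    Ising-style MRF with symmetric, zero-diagonal W (so Y @ W excludes the
    node's own state): sum over nodes of log P(y_i | MB(y_i))."""
    H = Y @ W + b                        # local field h_i for every sample/node
    # log sigmoid(z) = -log(1 + exp(-z)), applied elementwise with z = 2*y*h
    return -np.sum(np.log1p(np.exp(-2.0 * Y * H)))
```

Unlike the true likelihood, this objective needs no global partition function Z, only the per-node conditionals.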
Approximate methods for Learning MRFs:
Stochastic maximum likelihood
Approximate the model expectation using MC sampling
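One way to realize this, sketched under assumed details (a small Ising-style MRF over ±1 states, a persistent Gibbs chain for the MC samples, and a plain gradient step):

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_sweep(y, W):
    """One Gibbs sweep over an Ising-style MRF with +/-1 states."""
    for i in range(len(y)):
        p1 = 1.0 / (1.0 + np.exp(-2.0 * (W[i] @ y)))  # P(y_i = +1 | rest)
        y[i] = 1.0 if rng.random() < p1 else -1.0
    return y

def sml_step(Y, W, y_chain, lr=0.1, n_sweeps=5):
    """One stochastic-ML step: empirical moments minus a Monte Carlo
    estimate of the model moments, then a gradient step on W."""
    emp = (Y.T @ Y) / len(Y)                  # empirical E[y_i y_j]
    for _ in range(n_sweeps):
        y_chain = gibbs_sweep(y_chain, W)     # persistent chain
    model = np.outer(y_chain, y_chain)        # one-sample MC estimate
    grad = emp - model
    np.fill_diagonal(grad, 0.0)               # no self-connections
    W = W + lr * (grad + grad.T) / 2.0        # keep W symmetric
    return W, y_chain
```

Keeping the Gibbs chain alive between gradient steps (rather than restarting it) is what distinguishes this persistent variant; more samples per step would reduce the variance of the model-moment estimate.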
Iterative proportional fitting
• MRF in potential form:
  p(𝒚 | 𝜽) = (1/Z) exp(∑_c 𝜽_cᵀ 𝝓_c(𝒚)) = (1/Z) ∏_c ψ_c(𝒚_c) = p(𝒚 | 𝝍),  with Z = ∑_𝒚 ∏_c ψ_c(𝒚_c)
• Log-likelihood (averaged over the N training examples):
  log L = (1/N) ∑ᵢ log p(𝒚⁽ⁱ⁾ | 𝜽) = (1/N) ∑ᵢ ∑_c log ψ_c(𝒚_c⁽ⁱ⁾) − log Z
  – ψ_c(𝒚_c⁽ⁱ⁾): the potential of clique c evaluated on the i-th example
• Gradient with respect to one table entry ψ_c(y_c):
  ∂ log L / ∂ ψ_c(y_c) = (1/N) ∑ᵢ I(𝒚_c⁽ⁱ⁾ = y_c) / ψ_c(y_c) − ∑_{𝒚 ∈ Y(y_c)} ∏_{c′≠c} ψ_{c′}(𝒚_{c′}) / Z
  Using ∏_{c′≠c} ψ_{c′}(𝒚_{c′}) = Z p(𝒚 | 𝝍) / ψ_c(y_c), the second term can be rewritten:
  ∂ log L / ∂ ψ_c(y_c) = (1/N) ∑ᵢ I(𝒚_c⁽ⁱ⁾ = y_c) / ψ_c(y_c) − (1/Z) ∑_{𝒚 ∈ Y(y_c)} Z p(𝒚 | 𝝍) / ψ_c(y_c)
  – Y(y_c): the set of all 𝒚 in which clique c takes the value y_c
Iterative proportional fitting
∂ log L / ∂ ψ_c(y_c) = (1/N) ∑ᵢ I(𝒚_c⁽ⁱ⁾ = y_c) / ψ_c(y_c) − (1/Z) ∑_{𝒚 ∈ Y(y_c)} Z p(𝒚 | 𝝍) / ψ_c(y_c)
(with the log-likelihood averaged over the N examples, as before)
∂ log L / ∂ ψ_c(y_c) = p_emp(y_c) / ψ_c(y_c) − p_model(y_c | 𝝍) / ψ_c(y_c)
Setting the gradient to zero yields a fixed-point algorithm:
ψ_c(y_c) ← ψ_c(y_c) · p_emp(y_c) / p_model(y_c | 𝝍)
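The update can be checked directly on a toy model with a single clique over two binary variables (an assumed example); with one clique, the very first update already makes the model marginal match the empirical one:

```python
import numpy as np

p_emp = np.array([[0.4, 0.1],
                  [0.2, 0.3]])      # empirical marginal of the clique (y1, y2)
psi = np.ones((2, 2))               # initial potential table

def ipf_update(psi, p_emp):
    p_model = psi / psi.sum()       # with a single clique, Z = sum of the table
    return psi * p_emp / p_model    # psi_c <- psi_c * p_emp / p_model

psi = ipf_update(psi, p_emp)
```

With several overlapping cliques, p_model(y_c) instead requires inference (marginalizing the joint), and the updates are cycled over the cliques until convergence.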
Iterative proportional fitting
Learning MRFs: Generalized iterative scaling
Markov Logic Networks
• A Markov logic network 𝐿 is a set of pairs (𝐹𝑖 , 𝑤𝑖), where 𝐹𝑖 is a formula in first-order logic and 𝑤𝑖 is a real number.
• Together with a finite set of constants C = {c₁, c₂, …, c_|C|}, L defines a Markov network M_{L,C}:
– M_{L,C} contains one binary node for each possible grounding of each predicate appearing in L. The value of the node is 1 if the ground atom is true, and 0 otherwise.
– M_{L,C} contains one feature for each possible grounding of each formula Fᵢ in L. The value of this feature is 1 if the ground formula is true, and 0 otherwise. The weight of the feature is the wᵢ associated with Fᵢ in L.
Markov Logic Networks
• The probability distribution of a possible world 𝑥:
– nᵢ(x): the number of true groundings of Fᵢ in x
– x_{i}: the state (truth values) of the atoms appearing in Fᵢ
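A toy instantiation makes P(x) = (1/Z) exp(∑ᵢ wᵢ nᵢ(x)) concrete. The formula and weight below are an assumed example, not from the slides: a single formula Smokes(x) ⇒ Cancer(x) with weight w = 1.5 over constants {A, B}; enumerating all 16 possible worlds gives Z exactly.

```python
from itertools import product
import math

w = 1.5                 # assumed formula weight
consts = ["A", "B"]     # the finite constant set C

def n_true_groundings(world):
    """n(x): number of constants c for which Smokes(c) => Cancer(c) holds."""
    return sum((not world[f"Smokes({c})"]) or world[f"Cancer({c})"]
               for c in consts)

# Enumerate every possible world over the four ground atoms.
atoms = [f"{p}({c})" for p in ("Smokes", "Cancer") for c in consts]
worlds = [dict(zip(atoms, vals)) for vals in product([False, True], repeat=4)]
Z = sum(math.exp(w * n_true_groundings(wd)) for wd in worlds)

def prob(world):
    return math.exp(w * n_true_groundings(world)) / Z
```

Worlds that satisfy more groundings of the weighted formula get exponentially more probability mass; brute-force enumeration of Z is only feasible for such tiny domains, which is why the next slides turn to approximate inference and learning.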
Markov Logic Networks: Example
Markov Logic Networks: Example
Markov Logic Networks
• Inference
Markov Logic Networks
• Learning
– Use pseudo-likelihood to approximate the model probability