Chonbuk National University · nlp.chonbuk.ac.kr/PGM2019/slides_jbnu/MRF.pdf · 2019. 5. 1.
TRANSCRIPT
Undirected Graphical Models(Markov random fields)
Seung-Hoon Na
Chonbuk National University
Hopfield networks
• Bidirectional associative memory
Bidirectional associative memory
• In general,
• After some iterations, the BAM reaches a fixed point (x, y):
…
Bidirectional associative memory
• Hebbian learning
– Then,
• For a set of m vector pairs
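As a concrete sketch of Hebbian storage and recall (the vectors and dimensions below are made up for illustration, not taken from the slides), the weight matrix is the sum of outer products of the m stored pairs, and recall alternates forward and backward passes until a fixed point:

```python
import numpy as np

# Hebbian storage for a BAM: W = sum over stored pairs of outer(x_m, y_m).
# The two stored pairs here are illustrative.
x1, y1 = np.array([1, -1, 1]), np.array([1, 1, -1, -1])
x2, y2 = np.array([-1, 1, 1]), np.array([-1, 1, 1, -1])
W = np.outer(x1, y1) + np.outer(x2, y2)      # shape (3, 4)

def recall(x, W, iters=10):
    """Alternate x -> y -> x updates until a fixed point (x, y) is reached.
    (With these patterns no excitation is exactly zero, so np.sign stays
    in {-1, +1}.)"""
    for _ in range(iters):
        y = np.sign(x @ W)                   # forward pass
        x_new = np.sign(W @ y)               # backward pass
        if np.array_equal(x_new, x):         # fixed point reached
            break
        x = x_new
    return x, y

# Recalling from a stored pattern returns its partner:
x_fix, y_fix = recall(x1, W)
```

Starting recall from a stored x recovers the associated y, illustrating the bidirectional associative memory behavior.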
Bidirectional associative memory
– 𝒙₀: initial vector
– the excitation vector:
– (𝒙₀, 𝒚₀): a stable state of the network if
• The energy function becomes smaller (because of the minus sign) if the vector 𝑾𝒚₀ᵀ lies closer to 𝒙₀
Bidirectional associative memory
• The energy function E of BAM:
• Generally, using a threshold-unit for each node
Hopfield networks
• Asynchronous BAM
– Each unit computes its excitation at random times and changes its state to 1 or −1 independently of the others and according to the sign of its total excitation
– For each update, the energy function is decreasing
• Hopfield network
– A special case of an asynchronous BAM: x=y
– Each unit is connected to all other units except itself
– With necessary conditions on the weight matrix W:
• 1) symmetry (𝑾 = 𝑾ᵀ)  2) zero diagonal (𝑤ᵢᵢ = 0)
Hopfield networks
• Network with a non-zero diagonal
• Network with asymmetric connections
– example states: (1, 1, 1) and (−1, −1, −1)
Hopfield networks
• Energy function
Hopfield networks: Example
• A flip-flop network
– Only stable states are (1,−1) and (−1, 1).
Energy function of a flip-flop
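A quick enumeration makes the claim concrete. Assuming the flip-flop is two units joined by a single connection of weight w = −1 (a weight choice consistent with the stated stable states; the slide's exact values are not in the transcript):

```python
from itertools import product

w = -1.0  # assumed connection weight between the two units

def energy(x1, x2):
    # E = -(1/2) * sum_ij w_ij * x_i * x_j with w_12 = w_21 = w
    # reduces to E = -w * x1 * x2 for two units.
    return -w * x1 * x2

# Enumerate the energy of all four states of the flip-flop.
states = {(x1, x2): energy(x1, x2) for x1, x2 in product([1, -1], repeat=2)}
# The two minima are (1, -1) and (-1, 1), matching the slide's stable states.
```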
Hopfield networks: Example
Hopfield networks: Example
Hopfield networks
• A fully connected Ising model with a symmetric weight matrix 𝑾 = 𝑾ᵀ
• Exact inference is intractable
• Iterative conditional modes (ICM): just sets each node to its most likely (lowest energy) state, given all its neighbors
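ICM can be sketched in a few lines on a small Hopfield-style model. The weights below are illustrative, not from the slides; W is symmetric with a zero diagonal, as the earlier slide requires:

```python
import numpy as np

W = np.array([[ 0.,  1., -1.],
              [ 1.,  0., -1.],
              [-1., -1.,  0.]])   # symmetric, zero diagonal (illustrative)
b = np.zeros(3)                   # optional per-node biases

def icm(x, W, b, sweeps=10):
    """Sweep the nodes, setting each to its lowest-energy state
    (the sign of its local field) given all its neighbors."""
    x = x.copy()
    for _ in range(sweeps):
        changed = False
        for i in range(len(x)):
            h = W[i] @ x + b[i]              # local field at node i
            new = 1.0 if h >= 0 else -1.0
            changed = changed or (new != x[i])
            x[i] = new
        if not changed:                      # local minimum of the energy
            break
    return x

x_star = icm(np.array([1., 1., 1.]), W, b)
```

Since each coordinate update can only lower (or keep) the energy, the sweeps terminate at a local minimum; like the asynchronous updates on the earlier slide, ICM gives no guarantee of reaching the global minimum.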
Learning MRFs: Moment matching
• MRF
• Log-likelihood:
• Gradient for 𝜽𝑐 :
Moment matching
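Spelled out: for a log-linear MRF p(𝒚|𝜽) ∝ exp(∑_c 𝜽_cᵀ 𝝓_c(𝒚)), the gradient of the log-likelihood (averaged over the N examples) with respect to 𝜽_c is the difference between empirical and model feature moments:

```latex
\frac{\partial \ell}{\partial \boldsymbol{\theta}_c}
  = \frac{1}{N}\sum_{i=1}^{N} \boldsymbol{\phi}_c\big(\boldsymbol{y}^{(i)}\big)
  - \mathbb{E}_{p(\boldsymbol{y}\mid\boldsymbol{\theta})}\big[\boldsymbol{\phi}_c(\boldsymbol{y})\big]
```

Setting this to zero means the empirical moments equal the model's expected moments, which is exactly the moment-matching condition named on the slide.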
Learning MRFs: Latent variable models
• Log-likelihood:
• Gradient:
• Alternatively, we can use a generalized EM
Approximate methods for learning MRFs
• Pseudo-likelihood
• Stochastic maximum likelihood
• Iterative proportional fitting
• Generalized iterative scaling
Pseudo-likelihood
– P(𝒚) ≈ ∏ᵢ P(yᵢ | MB(yᵢ))  (the representation used by pseudo-likelihood)
– MB(yᵢ): the Markov blanket of yᵢ
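For some models the conditionals P(yᵢ | MB(yᵢ)) have a closed form, which is what makes pseudo-likelihood cheap. A sketch for an Ising-style MRF with ±1 states (the model choice is an assumption for illustration): each full conditional is sigmoid(2·yᵢ·hᵢ), where hᵢ is the local field.

```python
import numpy as np

def log_pseudo_likelihood(Y, W, b):
    """Log pseudo-likelihood of +/-1 data Y (n_samples x n_nodes) under an
    Ising-style MRF with symmetric, zero-diagonal W (so Y @ W excludes the
    node's own state): sum over nodes of log P(y_i | MB(y_i))."""
    H = Y @ W + b                        # local field h_i for every sample/node
    # log sigmoid(z) = -log(1 + exp(-z)), applied elementwise with z = 2*y*h
    return -np.sum(np.log1p(np.exp(-2.0 * Y * H)))
```

Unlike the true likelihood, this objective needs no global partition function Z, only the per-node conditionals.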
Approximate methods for Learning MRFs:
Stochastic maximum likelihood
Approximate the model expectation using MC sampling
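One way to realize this, sketched under assumed details (a small Ising-style MRF over ±1 states, a persistent Gibbs chain for the MC samples, and a plain gradient step):

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_sweep(y, W):
    """One Gibbs sweep over an Ising-style MRF with +/-1 states."""
    for i in range(len(y)):
        p1 = 1.0 / (1.0 + np.exp(-2.0 * (W[i] @ y)))  # P(y_i = +1 | rest)
        y[i] = 1.0 if rng.random() < p1 else -1.0
    return y

def sml_step(Y, W, y_chain, lr=0.1, n_sweeps=5):
    """One stochastic-ML step: empirical moments minus a Monte Carlo
    estimate of the model moments, then a gradient step on W."""
    emp = (Y.T @ Y) / len(Y)                  # empirical E[y_i y_j]
    for _ in range(n_sweeps):
        y_chain = gibbs_sweep(y_chain, W)     # persistent chain
    model = np.outer(y_chain, y_chain)        # one-sample MC estimate
    grad = emp - model
    np.fill_diagonal(grad, 0.0)               # no self-connections
    W = W + lr * (grad + grad.T) / 2.0        # keep W symmetric
    return W, y_chain
```

Keeping the Gibbs chain alive between gradient steps (rather than restarting it) is what distinguishes this persistent variant; more samples per step would reduce the variance of the model-moment estimate.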
Iterative proportional fitting
• MRF in potential form:
  p(𝒚 | 𝜽) = (1/Z) exp(∑_c 𝜽_cᵀ 𝝓_c(𝒚)) = (1/Z) ∏_c ψ_c(𝒚_c) = p(𝒚 | 𝝍),  with Z = ∑_𝒚 ∏_c ψ_c(𝒚_c)
• Log-likelihood (averaged over the N training examples):
  log L = (1/N) ∑ᵢ log p(𝒚⁽ⁱ⁾ | 𝜽) = (1/N) ∑ᵢ ∑_c log ψ_c(𝒚_c⁽ⁱ⁾) − log Z
  – ψ_c(𝒚_c⁽ⁱ⁾): the potential of clique c evaluated on the i-th example
• Gradient with respect to one table entry ψ_c(y_c):
  ∂ log L / ∂ ψ_c(y_c) = (1/N) ∑ᵢ I(𝒚_c⁽ⁱ⁾ = y_c) / ψ_c(y_c) − ∑_{𝒚 ∈ Y(y_c)} ∏_{c′≠c} ψ_{c′}(𝒚_{c′}) / Z
  Using ∏_{c′≠c} ψ_{c′}(𝒚_{c′}) = Z p(𝒚 | 𝝍) / ψ_c(y_c), the second term can be rewritten:
  ∂ log L / ∂ ψ_c(y_c) = (1/N) ∑ᵢ I(𝒚_c⁽ⁱ⁾ = y_c) / ψ_c(y_c) − (1/Z) ∑_{𝒚 ∈ Y(y_c)} Z p(𝒚 | 𝝍) / ψ_c(y_c)
  – Y(y_c): the set of all 𝒚 in which clique c takes the value y_c
Iterative proportional fitting
∂ log L / ∂ ψ_c(y_c) = (1/N) ∑ᵢ I(𝒚_c⁽ⁱ⁾ = y_c) / ψ_c(y_c) − (1/Z) ∑_{𝒚 ∈ Y(y_c)} Z p(𝒚 | 𝝍) / ψ_c(y_c)
(with the log-likelihood averaged over the N examples, as before)
∂ log L / ∂ ψ_c(y_c) = p_emp(y_c) / ψ_c(y_c) − p_model(y_c | 𝝍) / ψ_c(y_c)
Setting the gradient to zero yields a fixed-point algorithm:
ψ_c(y_c) ← ψ_c(y_c) · p_emp(y_c) / p_model(y_c | 𝝍)
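The update can be checked directly on a toy model with a single clique over two binary variables (an assumed example); with one clique, the very first update already makes the model marginal match the empirical one:

```python
import numpy as np

p_emp = np.array([[0.4, 0.1],
                  [0.2, 0.3]])      # empirical marginal of the clique (y1, y2)
psi = np.ones((2, 2))               # initial potential table

def ipf_update(psi, p_emp):
    p_model = psi / psi.sum()       # with a single clique, Z = sum of the table
    return psi * p_emp / p_model    # psi_c <- psi_c * p_emp / p_model

psi = ipf_update(psi, p_emp)
```

With several overlapping cliques, p_model(y_c) instead requires inference (marginalizing the joint), and the updates are cycled over the cliques until convergence.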
Iterative proportional fitting
Learning MRFs: Generalized iterative scaling
Markov Logic Networks
• A Markov logic network 𝐿 is a set of pairs (𝐹𝑖 , 𝑤𝑖), where 𝐹𝑖 is a formula in first-order logic and 𝑤𝑖 is a real number.
• Together with a finite set of constants C = {c₁, c₂, …, c_|C|}, L defines a Markov network M_{L,C}:
– M_{L,C} contains one binary node for each possible grounding of each predicate appearing in L. The value of the node is 1 if the ground atom is true, and 0 otherwise.
– M_{L,C} contains one feature for each possible grounding of each formula Fᵢ in L. The value of this feature is 1 if the ground formula is true, and 0 otherwise. The weight of the feature is the wᵢ associated with Fᵢ in L.
Markov Logic Networks
• The probability distribution of a possible world 𝑥:
– nᵢ(x): the number of true groundings of Fᵢ in x
– x_{i}: the state (truth values) of the atoms appearing in Fᵢ
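A toy instantiation makes P(x) = (1/Z) exp(∑ᵢ wᵢ nᵢ(x)) concrete. The formula and weight below are an assumed example, not from the slides: a single formula Smokes(x) ⇒ Cancer(x) with weight w = 1.5 over constants {A, B}; enumerating all 16 possible worlds gives Z exactly.

```python
from itertools import product
import math

w = 1.5                 # assumed formula weight
consts = ["A", "B"]     # the finite constant set C

def n_true_groundings(world):
    """n(x): number of constants c for which Smokes(c) => Cancer(c) holds."""
    return sum((not world[f"Smokes({c})"]) or world[f"Cancer({c})"]
               for c in consts)

# Enumerate every possible world over the four ground atoms.
atoms = [f"{p}({c})" for p in ("Smokes", "Cancer") for c in consts]
worlds = [dict(zip(atoms, vals)) for vals in product([False, True], repeat=4)]
Z = sum(math.exp(w * n_true_groundings(wd)) for wd in worlds)

def prob(world):
    return math.exp(w * n_true_groundings(world)) / Z
```

Worlds that satisfy more groundings of the weighted formula get exponentially more probability mass; brute-force enumeration of Z is only feasible for such tiny domains, which is why the next slides turn to approximate inference and learning.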
Markov Logic Networks: Example
Markov Logic Networks: Example
Markov Logic Networks
• Inference
Markov Logic Networks
• Learning
– Use pseudo-likelihood to approximate the model probability