TRANSCRIPT
Graphical models: approximate inference and learning
CA6b, lecture 5
Bayesian Networks
General Factorization
D-separation: Example
Trees
Undirected Tree / Directed Tree / Polytree
Converting Directed to Undirected Graphs (2)
Additional links
Inference on a Chain
Inference in a HMM
E step: belief propagation
[Figure: HMM chain $s_1, \dots, s_{n-1}, s_n, s_{n+1}, \dots, s_N$]
Belief propagation in a HMM
E step: belief propagation
Expectation maximization in a HMM
E step: belief propagation
The Junction Tree Algorithm
• Exact inference on general graphs.
• Works by turning the initial graph into a junction tree and then running a sum-product-like algorithm.
Factor Graphs
Factor Graphs from Undirected Graphs
The Sum-Product Algorithm (6)
The Sum-Product Algorithm (5)
The Sum-Product Algorithm (3)
The Sum-Product Algorithm (7)
Initialization
Sensory observations
Prior expectations
[Figure: message passing on a forest/tree with nodes $x_1, \dots, x_6$ (leaf, stem, root); bottom-up pass (leaves to root) and top-down pass (root to leaves)]
Consequence of failing inhibition in hierarchical inference
Causal model / Pairwise factor graph
Bayesian network and factor graph
Pairwise graphs
Log belief ratio
Log message ratio
Belief propagation and inhibitory loops
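To make the role of the inhibitory term concrete, here is a minimal Python sketch (not the exact model from the circular-inference work cited later): two binary nodes, a "prior" node and a "sensory" node, exchange log-odds messages through a link of assumed reliability w, using the standard pairwise message function f_w(a) = log((e^(w+a)+1)/(e^a+e^w)). In exact belief propagation a node subtracts the message it just received from the target before replying; the illustrative parameter alpha below controls how much of that subtraction fails. With alpha = 0 (intact inhibition) the log belief ratio stays calibrated; with alpha > 0 the same evidence reverberates around the loop and the belief is inflated.

import numpy as np

def f(a, w):
    # log-odds transfer function of a pairwise link with coupling strength w:
    # f_w(a) = log((exp(w + a) + 1) / (exp(a) + exp(w)))
    return np.log((np.exp(w + a) + 1.0) / (np.exp(a) + np.exp(w)))

def sensory_belief(L_prior, L_sens, w=3.0, alpha=0.0, n_iter=20):
    # Two binary nodes (a "prior" node and a "sensory" node) exchange log-odds
    # messages. alpha is the fraction of the returning message that is NOT
    # subtracted before re-sending: alpha = 0 is exact belief propagation
    # (intact inhibition), alpha = 1 lets the evidence reverberate.
    m12 = m21 = 0.0                                  # prior->sensory, sensory->prior
    for _ in range(n_iter):
        m12, m21 = f(L_prior + alpha * m21, w), f(L_sens + alpha * m12, w)
    return L_sens + m12                              # log belief ratio at the sensory node

Lp, Ls = 1.0, 1.0                                    # weak prior, weak sensory evidence
print("intact inhibition  :", sensory_belief(Lp, Ls, alpha=0.0))
print("impaired inhibition:", sensory_belief(Lp, Ls, alpha=1.0))

With these weak-evidence values the impaired condition roughly doubles the log belief ratio, which is the overconfidence pattern discussed in the following slides.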
Tight excitatory/inhibitory balance is both required and sufficient.
Okun and Lampl, Nat Neurosci 2008
Inhibition
Excitation
Lewis et al., Nat Rev Neurosci 2005
Controls vs. schizophrenia
Support for impaired inhibition in schizophrenia
See also: Benes, Neuropsychopharmacology 2010; Uhlhaas and Singer, Nat Rev Neurosci 2010…
GAD67
Circular inference: impaired inhibitory loops
Circular inference and overconfidence
Renaud Jardri, Alexandra Litvinova & Sandrine Duverne
The Fisher Task
Prior
Sensory evidence
Posterior confidence
Mean group responses
Controls vs. schizophrenia patients:
[Plots: confidence vs. log likelihood ratio and confidence vs. log prior ratio, for controls and patients]
Simple Bayes:
[Plots: confidence vs. log likelihood ratio and confidence vs. log prior ratio]
Mean parameter values
[Plot: parameter value (mean + sd) for controls (CTL) vs. patients (SCZ); significance: *, ***; PANSS positive factor]
Inference loops and psychosis
[Scatter plots: non-clinical beliefs (PDI-21 score) vs. strength of loops]
The Junction Tree Algorithm
• Exact inference on general graphs.
• Works by turning the initial graph into a junction tree and then running a sum-product-like algorithm.
• Intractable on graphs with large cliques.
What if exact inference is intractable?
• Loopy belief propagation works in some scenarios.
• Markov chain Monte Carlo (MCMC) sampling methods.
• Variational methods (not covered here).
Loopy Belief Propagation
• Sum-product on general graphs.
• Initial unit messages passed across all links, after which messages are passed around until convergence (not guaranteed!).
• Approximate but tractable for large graphs.
• Sometimes works well, sometimes not at all.
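As a concrete illustration of the recipe above (initialize with unit messages, then iterate until the messages stop changing), here is a minimal Python sketch of loopy sum-product on a toy pairwise graph with a cycle. The three-node graph, the potentials and the iteration budget are made up for the example, not taken from the course.

import numpy as np

# Three binary nodes arranged in a cycle: 0 - 1 - 2 - 0 (illustrative example).
edges = [(0, 1), (1, 2), (2, 0)]
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

phi = np.array([[0.7, 0.3],      # unary potentials phi_i(x_i), one row per node
                [0.4, 0.6],
                [0.5, 0.5]])
psi = np.array([[2.0, 1.0],      # shared pairwise potential psi(x_i, x_j),
                [1.0, 2.0]])     # favouring agreement between neighbours

# Sum-product on a graph with a cycle: start from unit messages and iterate.
msgs = {(i, j): np.ones(2) for i, j in edges}
msgs.update({(j, i): np.ones(2) for i, j in edges})

for _ in range(50):                                   # fixed iteration budget
    new = {}
    for (i, j) in msgs:
        prod = phi[i].copy()
        for k in neighbors[i]:
            if k != j:
                prod *= msgs[(k, i)]                  # collect incoming messages except from j
        m = psi.T @ prod                              # sum over the states of node i
        new[(i, j)] = m / m.sum()                     # normalise for numerical stability
    if max(np.abs(new[e] - msgs[e]).max() for e in msgs) < 1e-8:
        msgs = new
        break                                         # converged (not guaranteed in general!)
    msgs = new

for i in range(3):                                    # approximate marginals (beliefs)
    b = phi[i].copy()
    for k in neighbors[i]:
        b *= msgs[(k, i)]
    print(f"node {i}: belief ~ {b / b.sum()}")

On this small graph the messages happen to converge; on other graphs they may oscillate, which is the "not guaranteed" caveat in the bullets above.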
Neural code for uncertainty: sampling
Alternative neural code for uncertainty: sampling
Berkes et al, Science 2011
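For the sampling alternative (and the MCMC bullet a few slides back), a hedged sketch in the same spirit: a Gibbs sampler for the same kind of toy binary pairwise model. Each variable is repeatedly resampled from its conditional distribution given its neighbours, and the empirical frequencies of the samples approximate the posterior marginals, so uncertainty is carried by the variability of the samples over time rather than by explicit messages. The model and all parameter values are illustrative.

import numpy as np

rng = np.random.default_rng(0)

# Same toy model: 3 binary nodes in a cycle, agreement-favouring couplings.
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
log_phi = np.log(np.array([[0.7, 0.3], [0.4, 0.6], [0.5, 0.5]]))
log_psi = np.log(np.array([[2.0, 1.0], [1.0, 2.0]]))

x = rng.integers(0, 2, size=3)            # random initial state
counts = np.zeros((3, 2))                 # sample counts per node and state

n_burn, n_samples = 500, 5000
for t in range(n_burn + n_samples):
    for i in range(3):
        # conditional p(x_i | neighbours), computed in log space
        logp = log_phi[i].copy()
        for k in neighbors[i]:
            logp += log_psi[:, x[k]]
        p1 = 1.0 / (1.0 + np.exp(logp[0] - logp[1]))
        x[i] = int(rng.random() < p1)
    if t >= n_burn:                       # discard burn-in, then accumulate samples
        counts[np.arange(3), x] += 1

print("sampled marginals:\n", counts / n_samples)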
Learning in graphical models
More generally: learning parameters in latent variable models
Visible variables $x$, hidden variables $h$, joint model $p(x, h \mid \theta)$
$\hat{\theta} = \arg\max_{\theta} p(x \mid \theta)$
$p(x \mid \theta) = \sum_{h} p(x, h \mid \theta)$
The sum over hidden configurations is huge!
Mixture of Gaussians (clustering algorithm)
Data (unsupervised)
Generative model: M possible clusters, each a Gaussian distribution
Parameters
Expectation stage: given the current parameters and the data, what are the expected hidden states? (the "responsibility" of each cluster for each data point)
Maximization stage: given the responsibilities of each cluster, update the parameters to maximize the likelihood of the data.
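A minimal sketch of these two stages for a one-dimensional mixture of Gaussians (illustrative code; the synthetic data, the variable names and the fixed number of iterations are my own choices, not taken from the slides):

import numpy as np

rng = np.random.default_rng(1)
# Synthetic unsupervised data drawn from two Gaussians (for illustration only).
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 200)])

M = 2                                    # number of clusters
pi = np.full(M, 1.0 / M)                 # mixing proportions
mu = rng.choice(x, M)                    # cluster means (initialised from the data)
var = np.full(M, np.var(x))              # cluster variances

def gauss(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

for _ in range(100):
    # Expectation stage: responsibility of each cluster for each data point,
    # given the current parameters.
    r = pi * gauss(x[:, None], mu, var)          # shape (N, M)
    r /= r.sum(axis=1, keepdims=True)

    # Maximization stage: re-estimate the parameters from the responsibilities
    # to increase the likelihood of the data.
    Nk = r.sum(axis=0)
    pi = Nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / Nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk

print("weights:", pi, "means:", mu, "variances:", var)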
Learning in hidden Markov models
[Figures: hidden Markov model with hidden states $x_{t-1}, x_t, x_{t+1}$ ("object present/not", the cause) and observations $s_{t-1}, s_t, s_{t+1}$ ("receptor spike/not"); the forward model gives the sensory likelihood, the inverse model recovers the cause. A unit receiving input spike trains $s_{1t}, s_{2t}$ tracks the log odds $L_t$ of the hidden state over time.]
$\dot{L}_t = -\dfrac{L_t}{\tau} + \sum_i w_i \, s_{it}$   (leak + synaptic input)
Bayesian integration corresponds to leaky integration.
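A hedged sketch of this correspondence: a binary hidden state (object present/not) switches with rates r_on and r_off, a receptor emits spikes at rate q_on or q_off depending on the state, and the log odds L_t of "present" is updated with the exact recursive Bayesian filter. Each update adds a weighted contribution for every input spike plus a small drift between spikes, so it behaves like the leaky integration written above. All parameter values are made up for illustration.

import numpy as np

rng = np.random.default_rng(2)

dt, T = 0.001, 20.0                       # time step (s) and total duration (s)
r_on, r_off = 0.5, 0.5                    # hidden-state switching rates (1/s)
q_on, q_off = 40.0, 10.0                  # receptor firing rate (Hz) when present / absent

n = int(T / dt)
x = np.zeros(n, dtype=int)                # hidden state: object present (1) or not (0)
L = np.zeros(n)                           # log odds of "object present" given the spikes
w_spike = np.log(q_on / q_off)            # weight of one input spike
w_drift = np.log((1 - q_on * dt) / (1 - q_off * dt))   # drift between spikes

for t in range(1, n):
    # simulate the hidden state and the receptor spike train
    if x[t - 1] == 0:
        x[t] = int(rng.random() < r_on * dt)
    else:
        x[t] = int(rng.random() >= r_off * dt)
    spike = int(rng.random() < (q_on if x[t] else q_off) * dt)

    # exact recursive Bayesian update of the log odds
    p1 = 1.0 / (1.0 + np.exp(-L[t - 1]))
    p1_pred = p1 * (1.0 - r_off * dt) + (1.0 - p1) * r_on * dt   # predicted prior
    L[t] = np.log(p1_pred / (1.0 - p1_pred)) + spike * w_spike + (1 - spike) * w_drift

print("fraction of time the sign of L matches the hidden state:",
      np.mean((L > 0) == (x == 1)))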
Expectation maximization in a HMM
[Figure: HMM chain $s_1, \dots, s_{n-1}, s_n, s_{n+1}, \dots, s_N$]
Multiple training sequences: $s^1, s^2, \dots, s^{N_u}$
What are the parameters?
Transition probabilities: $r_{ij} = p(x_n = i \mid x_{n-1} = j)$
Observation probabilities: $q_{ik} = p(s_n = k \mid x_n = i)$
Expectation stage
E step: belief propagation
1s 1ns ns 1ns Ns
Expectation stage
E step: belief propagation
1s 1ns ns 1ns Ns
Expectation stage
E step: belief propagation
1s 1ns ns 1ns Ns
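A hedged sketch of the full EM loop for a discrete HMM in the slide's notation (hidden states x with transition probabilities r_ij, observed symbols s with observation probabilities q_ik): the E step runs belief propagation along the chain (scaled forward-backward) to obtain the expected state occupancies and transitions, and the M step re-estimates r and q from those expectations. For brevity it uses a single training sequence of random symbols; extending it to multiple sequences just sums the expected statistics before the M step.

import numpy as np

rng = np.random.default_rng(3)
M, K = 2, 3                                   # number of hidden states, observation symbols
s = rng.integers(0, K, size=200)              # one observed training sequence (illustrative)

p0 = np.full(M, 1.0 / M)                      # initial state distribution
r = rng.random((M, M)); r /= r.sum(axis=0)    # r[i, j] = p(x_n = i | x_{n-1} = j)
q = rng.random((M, K)); q /= q.sum(axis=1, keepdims=True)   # q[i, k] = p(s_n = k | x_n = i)

N = len(s)
for _ in range(50):
    # E step: belief propagation along the chain (scaled forward-backward)
    alpha = np.zeros((N, M)); beta = np.zeros((N, M)); c = np.zeros(N)
    alpha[0] = p0 * q[:, s[0]]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for n in range(1, N):
        alpha[n] = q[:, s[n]] * (r @ alpha[n - 1])
        c[n] = alpha[n].sum(); alpha[n] /= c[n]
    beta[-1] = 1.0
    for n in range(N - 2, -1, -1):
        beta[n] = r.T @ (q[:, s[n + 1]] * beta[n + 1]) / c[n + 1]
    gamma = alpha * beta                      # p(x_n = i | s_1..N)
    xi = np.zeros((M, M))                     # expected transition counts, indexed [i, j]
    for n in range(1, N):
        xi += (q[:, s[n]] * beta[n])[:, None] * r * alpha[n - 1][None, :] / c[n]

    # M step: re-estimate the parameters from the expected statistics
    p0 = gamma[0] / gamma[0].sum()
    r = xi / xi.sum(axis=0, keepdims=True)
    q = np.array([[gamma[s == k, i].sum() for k in range(K)] for i in range(M)])
    q /= q.sum(axis=1, keepdims=True)

print("learned transition matrix r:\n", r)
print("learned observation matrix q:\n", q)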
Fast adaptation in single neurons
Using “on-line” expectation maximization, a neuron can adapt to the statistics of its input: the observation probabilities $q_{i1}, q_{i0}$ and the transition rates $r_{\mathrm{on}}, r_{\mathrm{off}}$.
Adaptation to temporal statistics? Fairhall et al., 2001