Graphical models: approximate inference and learning, CA6b, lecture 5

TRANSCRIPT

Page 1:

Graphical models: approximate inference and learning

CA6b, lecture 5

Page 2:

Bayesian Networks

General Factorization
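In standard form, a Bayesian network over variables x_1, …, x_K factorizes as

    p(x_1, \dots, x_K) = \prod_{k=1}^{K} p(x_k \mid \mathrm{pa}_k),

where \mathrm{pa}_k denotes the parents of x_k in the directed graph.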

Page 3:

D-separation: Example

Page 4:

Trees

Undirected tree | Directed tree | Polytree

Page 5:

Converting Directed to Undirected Graphs (2)

Additional links: the parents of each node are connected to one another ("moralization"), so that each conditional p(x_k | pa_k) fits inside a single clique potential.

Page 6:

Inference on a Chain

Page 7:

Inference on a Chain

Page 8:

Inference on a Chain
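These three slides develop the standard result for a chain of N discrete variables: the marginal at node n is the product of a message passed forward along the chain and a message passed backward,

    p(x_n) = \frac{1}{Z}\, \mu_\alpha(x_n)\, \mu_\beta(x_n),

    \mu_\alpha(x_n) = \sum_{x_{n-1}} \psi_{n-1,n}(x_{n-1}, x_n)\, \mu_\alpha(x_{n-1}),
    \qquad
    \mu_\beta(x_n) = \sum_{x_{n+1}} \psi_{n,n+1}(x_n, x_{n+1})\, \mu_\beta(x_{n+1}),

so all marginals are computed in O(N K^2) for K states, rather than the O(K^N) of brute-force summation.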

Page 9:

Inference in an HMM

E step: belief propagation

s_1, …, s_{n-1}, s_n, s_{n+1}, …, s_N

Page 10:

Belief propagation in an HMM

E step: belief propagation

s_1, …, s_{n-1}, s_n, s_{n+1}, …, s_N

Page 11:

Expectation maximization in an HMM

E step: belief propagation

s_1, …, s_{n-1}, s_n, s_{n+1}, …, s_N
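As a concrete sketch of the E step these slides describe, here is a minimal forward-backward (belief propagation) pass for a discrete HMM; the function and the toy parameters are illustrative, not taken from the slides.

    import numpy as np

    def forward_backward(obs, T, E, pi):
        # obs: observation indices, shape (N,)
        # T[i, j] = p(state_{n+1} = j | state_n = i)   (transition matrix)
        # E[i, k] = p(obs = k | state = i)             (emission matrix)
        # pi[i]   = p(state_1 = i)                     (initial distribution)
        # Returns gamma[n, i] = p(state_n = i | obs_1..N).
        N, S = len(obs), len(pi)
        alpha = np.zeros((N, S))   # scaled forward messages
        beta = np.zeros((N, S))    # scaled backward messages
        c = np.zeros(N)            # per-step normalizers (avoid underflow)
        alpha[0] = pi * E[:, obs[0]]
        c[0] = alpha[0].sum()
        alpha[0] /= c[0]
        for n in range(1, N):                    # forward pass
            alpha[n] = (alpha[n - 1] @ T) * E[:, obs[n]]
            c[n] = alpha[n].sum()
            alpha[n] /= c[n]
        beta[-1] = 1.0
        for n in range(N - 2, -1, -1):           # backward pass
            beta[n] = T @ (E[:, obs[n + 1]] * beta[n + 1]) / c[n + 1]
        return alpha * beta                      # posterior marginals

    # Toy usage: two hidden states, two symbols (numbers are made up).
    T = np.array([[0.9, 0.1], [0.2, 0.8]])
    E = np.array([[0.8, 0.2], [0.3, 0.7]])
    gamma = forward_backward(np.array([0, 1, 1, 0]), T, E, np.array([0.5, 0.5]))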

Page 12:

The Junction Tree Algorithm

• Exact inference on general graphs.
• Works by turning the initial graph into a junction tree and then running a sum-product-like algorithm.

Page 13:

Factor Graphs
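A factor graph makes the factorization explicit: the joint distribution is written as a product of factors,

    p(\mathbf{x}) = \prod_s f_s(\mathbf{x}_s),

with one variable node per variable x_i, one factor node per factor f_s, and an edge whenever f_s depends on x_i.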

Page 14:

Factor Graphs from Undirected Graphs

Page 15:

The Sum-Product Algorithm (6)

Page 16:

The Sum-Product Algorithm (6)

Page 17:

The Sum-Product Algorithm (6)

Page 18:

The Sum-Product Algorithm (5)

Page 19:

The Sum-Product Algorithm (3)

Page 20:

The Sum-Product Algorithm (7)

Initialization
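The two message types the sum-product algorithm alternates, in Bishop's standard notation:

    \mu_{f \to x}(x) = \sum_{x_1, \dots, x_M} f(x, x_1, \dots, x_M) \prod_{m \in \mathrm{ne}(f) \setminus x} \mu_{x_m \to f}(x_m),
    \qquad
    \mu_{x \to f}(x) = \prod_{l \in \mathrm{ne}(x) \setminus f} \mu_{f_l \to x}(x).

Initialization: a leaf variable node sends \mu_{x \to f}(x) = 1 and a leaf factor node sends \mu_{f \to x}(x) = f(x); the marginal at a variable is the product of all incoming factor messages.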

Page 21: [no transcribed text]

Page 22:

Sensory observations (bottom-up) and prior expectations (top-down)

[Figure: hierarchical inference over a tree of variables x_1 … x_6 with node labels Forest, Tree, Leaf, Stem, Root, Green; sensory observations enter at the leaves and travel bottom-up, prior expectations travel top-down.]

Consequence of failing inhibition in hierarchical inference

Page 23:

Causal model → pairwise factor graph

Bayesian network and factor graph

Page 24:

Causal model → pairwise factor graph

Page 25:

Causal model → pairwise factor graph

Page 26:

Pairwise graphs

Log belief ratio

Log message ratio
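For binary variables, belief propagation can be run entirely on log-odds. A standard pairwise form (the notation here is assumed, matching the Denève & Jardri formulation of pairwise networks):

    B_i = L_i + \sum_{j \in \mathrm{ne}(i)} M_{j \to i},
    \qquad
    M_{i \to j} = f\!\left(B_i - M_{j \to i},\, w_{ij}\right),
    \qquad
    f(b, w) = \log \frac{w\, e^{b} + 1 - w}{(1 - w)\, e^{b} + w},

where B_i is the log belief ratio, L_i the log likelihood ratio of the local evidence, and w_{ij} the coupling strength. The subtraction of M_{j \to i} before a message is sent back is exactly the "inhibition" of the next slides: it stops a node's own influence from being echoed back to it.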

Page 27:

Belief propagation and inhibitory loops

[Figure: message-passing network in which each feedback connection carries a subtractive (-) term that cancels the message previously sent in the other direction.]

Page 28:

Tight excitatory/inhibitory balance is both required and sufficient.

Okun and Lampl, Nat Neurosci 2008

[Figure: recorded inhibition closely tracks excitation.]

Page 29:

Support for impaired inhibition in schizophrenia

[Figure: GAD67 expression, controls vs. schizophrenia (Lewis et al., Nat Rev Neurosci 2005).]

See also: Benes, Neuropsychopharmacology 2010; Uhlhaas and Singer, Nat Rev Neurosci 2010…

Page 30:

Circular inference:

Impaired inhibitory loops

Page 31:

Circular inference and overconfidence:
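Schematically (an illustrative simplification, not the exact model fitted later): with intact inhibition the two cues are counted once, B = L + P; with impaired inhibition messages reverberate through the loops and are counted several times,

    B \approx (1 + a_s)\, L + (1 + a_p)\, P,

where a_s, a_p \ge 0 measure the strength of the sensory and prior loops. Any positive loop strength inflates |B|, i.e. produces overconfidence relative to simple Bayes.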

Page 32:

The Fisher Task

Renaud Jardri, Alexandra Litvinova & Sandrine Duverne

[Figure: the four stages of a trial: prior, sensory evidence, posterior confidence report.]

Page 33:

Mean group responses

Controls vs. patients with schizophrenia:

[Figure: confidence (log posterior ratio, scale -8 to 8) plotted against the log likelihood ratio and against the log prior ratio (each -4 to 4), for both groups.]

Simple Bayes:
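Writing B for confidence (the log posterior ratio), L for the log likelihood ratio and P for the log prior ratio, simple Bayes predicts a straight sum:

    B = L + P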

Page 34:

[Figure: individual-subject fits: confidence plotted against the log likelihood ratio and against the log prior ratio (axes -3 to 3), one pair of panels per subject; controls on the left, patients on the right.]

Page 35:

Mean parameter values

[Figure: fitted parameter values (mean + sd, scale 0.00 to 0.75) for the SCZ and CTL groups; asterisks mark significant group differences (*, ***).]

Page 36:

Inference loops and psychosis

[Figure: PANSS positive factor (patients) and non-clinical beliefs (PDI-21 scores) plotted against the strength of the loops.]

Page 37:

The Junction Tree Algorithm

• Exact inference on general graphs.
• Works by turning the initial graph into a junction tree and then running a sum-product-like algorithm.
• Intractable on graphs with large cliques.

Page 38:

What if exact inference is intractable?

• Loopy belief propagation works in some scenarios.
• Markov chain Monte Carlo (MCMC) sampling methods.
• Variational methods (not covered here).
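As a minimal sketch of the sampling route, here is a Gibbs sampler for a pairwise binary Markov network; the model, parameter names and toy setup are illustrative, not from the slides.

    import numpy as np

    def gibbs_sample(J, h, n_sweeps=1000, rng=None):
        # Pairwise binary MRF over spins x_i in {-1, +1}:
        #   p(x) proportional to exp( sum_{i<j} J[i,j] x_i x_j + sum_i h[i] x_i ),
        # with J symmetric and zero on the diagonal.
        rng = np.random.default_rng() if rng is None else rng
        n = len(h)
        x = rng.choice([-1, 1], size=n)
        samples = []
        for _ in range(n_sweeps):
            for i in range(n):
                # conditional log-odds of x_i = +1 given all other spins
                field = h[i] + J[i] @ x - J[i, i] * x[i]
                p_plus = 1.0 / (1.0 + np.exp(-2.0 * field))
                x[i] = 1 if rng.random() < p_plus else -1
            samples.append(x.copy())
        return np.array(samples)

    # Toy usage: marginals estimated by averaging samples (burn-in ignored).
    J = np.array([[0.0, 0.5], [0.5, 0.0]])
    h = np.array([0.2, -0.1])
    print(gibbs_sample(J, h).mean(axis=0))   # approximates E[x_i]

Averaging samples after burn-in approximates the marginals that exact inference cannot deliver on large loopy graphs.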

Page 39:

Loopy Belief Propagation

• Sum-product on general graphs.
• Initial unit messages are passed across all links, after which messages are passed around until convergence (not guaranteed!).
• Approximate but tractable for large graphs.
• Sometimes works well, sometimes not at all.

Pages 40-44: [no transcribed text]

Page 45:

Neural code for uncertainty: sampling

[Figure: neural activity interpreted as successive samples of the hidden variables.]

Page 46:

Alternative neural code for uncertainty: sampling

Berkes et al., Science 2011

Page 47:

Alternative neural code for uncertainty: sampling

Page 48:

Learning in graphical models

More generally: learning parameters in latent variable models

[Figure: latent-variable model with visible nodes x, hidden nodes h, and unknown parameters \theta.]

    p(x, h \mid \theta)

    \hat{\theta} = \arg\max_\theta\, p(x \mid \theta)

Page 49:

Learning in graphical models

More generally: learning parameters in latent variable models

[Figure: visible x, hidden h, unknown parameters \theta.]

    p(x, h \mid \theta)

    \hat{\theta} = \arg\max_\theta\, p(x \mid \theta),
    \qquad
    p(x \mid \theta) = \sum_h p(x, h \mid \theta)

Page 50:

Learning in graphical models

More generally: learning parameters in latent variable models

[Figure: visible x, hidden h, unknown parameters \theta.]

    p(x, h \mid \theta)

    \hat{\theta} = \arg\max_\theta\, p(x \mid \theta),
    \qquad
    p(x \mid \theta) = \sum_h p(x, h \mid \theta)

The sum over h is huge!
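Expectation maximization sidesteps the huge sum by maximizing a lower bound instead (standard form, not spelled out on the slide):

    \log p(x \mid \theta) \;\ge\; \mathcal{F}(q, \theta) = \sum_h q(h) \log \frac{p(x, h \mid \theta)}{q(h)}.

The E step tightens the bound by setting q(h) = p(h \mid x, \theta_{\mathrm{old}}), which is exactly the inference problem of the first half of the lecture; the M step maximizes \mathcal{F} over \theta with q fixed. Neither step can decrease \log p(x \mid \theta).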

Page 51: [no transcribed text]

Page 52:

Mixture of Gaussians (clustering algorithm)

Data (unsupervised)

Page 53:

Mixture of Gaussians (clustering algorithm)

Data (unsupervised)

Generative model: M possible clusters

Gaussian distribution

Page 54:

Mixture of Gaussians (clustering algorithm)

Data (unsupervised)

Generative model: M possible clusters

Gaussian distribution

Parameters
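In standard form, the generative model is

    p(x) = \sum_{k=1}^{M} \pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k),

with parameters \pi_k (mixing proportions), \mu_k (cluster means) and \Sigma_k (cluster covariances).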

Page 55:

Expectation stage: given the current parameters and the data, what are the expected hidden states?

Responsibility
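The responsibility of cluster k for data point x_n (standard EM for a mixture of Gaussians):

    \gamma(z_{nk}) = \frac{\pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{M} \pi_j\, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}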

Page 56:

Maximization stage: given the responsibilities of each cluster, update the parameters to maximize the likelihood of the data.
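Both stages in one compact sketch, for spherical Gaussians (the isotropic-covariance restriction, initialization and iteration count are illustrative choices, not the slides'):

    import numpy as np

    def em_gmm(X, M, n_iters=50, rng=None):
        # EM for a mixture of M spherical Gaussians. X: data, shape (N, D).
        rng = np.random.default_rng() if rng is None else rng
        N, D = X.shape
        mu = X[rng.choice(N, M, replace=False)]   # init means at random points
        var = np.full(M, X.var())                 # isotropic variances
        mix = np.full(M, 1.0 / M)                 # mixing proportions
        for _ in range(n_iters):
            # E step: responsibilities gamma[n, k] = p(cluster k | x_n)
            d2 = ((X[:, None, :] - mu[None]) ** 2).sum(-1)          # (N, M)
            log_p = np.log(mix) - 0.5 * d2 / var \
                    - 0.5 * D * np.log(2 * np.pi * var)
            log_p -= log_p.max(1, keepdims=True)                    # stability
            gamma = np.exp(log_p)
            gamma /= gamma.sum(1, keepdims=True)
            # M step: weighted maximum-likelihood re-estimates
            Nk = gamma.sum(0)
            mu = (gamma.T @ X) / Nk[:, None]
            d2 = ((X[:, None, :] - mu[None]) ** 2).sum(-1)
            var = (gamma * d2).sum(0) / (D * Nk)
            mix = Nk / N
        return mix, mu, var

The M step is just the weighted maximum-likelihood estimate: \mu_k = \frac{1}{N_k} \sum_n \gamma(z_{nk})\, x_n with N_k = \sum_n \gamma(z_{nk}), and similarly for the variances and \pi_k = N_k / N.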

Page 57:

Learning in hidden Markov models

Hidden states: x_{t-1}, x_t, x_{t+1}

Observations: s_{t-1}, s_t, s_{t+1}

[Figure: the hidden state causes the observations (forward model / sensory likelihood); inference runs the inverse model.]

Page 58:

[Figure: hidden binary state x_{t-dt}, x_t, x_{t+dt} (object present/not) and observed receptor spike trains s^1_t, s^2_t (spike/not), unrolled over time; the log-odds L_t evolves over time.]

Page 59:

[Figure: the log-odds L_t over time.]

    \dot{L}_t = \mathrm{leak}(L_t) + \sum_i w_i\, s_i^t
    (leak term + synaptic input)

Bayesian integration corresponds to leaky integration.
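For intuition, a linear simplification (the slide's exact leak term need not be linear, and \tau here is illustrative):

    L_{t+\Delta t} \approx \left(1 - \frac{\Delta t}{\tau}\right) L_t + \sum_i w_i\, s_i^t.

Each input spike s_i^t adds its log-likelihood-ratio weight w_i to the running log-odds L_t, which otherwise decays toward zero, exactly the behaviour of a leaky integrator neuron.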

Page 60:

Expectation maximization in an HMM

s_1, …, s_{n-1}, s_n, s_{n+1}, …, s_N

Multiple training sequences: s^{(1)}, s^{(2)}, …

What are the parameters?

    r_{ij} = p(x_{n+1} = i \mid x_n = j)  (transition probabilities)

    q_{ik} = p(s_n = k \mid x_n = i)  (observation probabilities)
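Given the E-step posteriors, the M step is a ratio of expected counts (standard Baum-Welch, written with the slide's parameters):

    \hat{r}_{ij} = \frac{\sum_n p(x_{n+1} = i,\, x_n = j \mid s)}{\sum_n p(x_n = j \mid s)},
    \qquad
    \hat{q}_{ik} = \frac{\sum_n p(x_n = i \mid s)\, [s_n = k]}{\sum_n p(x_n = i \mid s)},

with the sums additionally running over the training sequences.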

Page 61:

Expectation stage

E step: belief propagation

s_1, …, s_{n-1}, s_n, s_{n+1}, …, s_N

Page 62:

Expectation stage

E step: belief propagation

s_1, …, s_{n-1}, s_n, s_{n+1}, …, s_N

Page 63:

Expectation stage

E step: belief propagation

s_1, …, s_{n-1}, s_n, s_{n+1}, …, s_N

Page 64:

Using “on-line” expectation maximization, a neuron can adapt to the statistics of its input.

Observation probabilities q_i^1, q_i^0 and transition rates r_{on}, r_{off}
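A common way to make EM "on-line" (a generic recipe; the slide's exact update rule is not transcribed) is to replace the batch sufficient statistics with running averages,

    S_{t+\Delta t} = (1 - \eta)\, S_t + \eta\, s_t,

and to recompute the parameter estimates r and q from S after every observation; the learning rate \eta sets how fast the neuron tracks changes in its input statistics.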

Page 65:

Fast adaptation in single neurons

Adaptation to temporal statistics? Fairhall et al., 2001

Pages 66-70: [no transcribed text]