TRANSCRIPT
[Slide 1]
An introduction to Bayesian inference and model comparison
J. Daunizeau
ICM, Paris, France · TNU, Zurich, Switzerland
[Slide 2]
Overview of the talk
✓ An introduction to probabilistic modelling
✓ Bayesian model comparison
✓ SPM applications
[Slide 4]
Probability theory: basics

Degrees of plausibility desiderata:
- should be represented using real numbers (D1)
- should conform with intuition (D2)
- should be consistent (D3)

• normalization: $\int p(x)\,dx = 1$
• marginalization: $p(x) = \int p(x,y)\,dy$
• conditioning (Bayes rule): $p(x|y) = \dfrac{p(x,y)}{p(y)}$
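These three rules are easiest to check on a small discrete example. A minimal sketch in Python; the 2×2 joint table is invented for illustration:

```python
import numpy as np

# Hypothetical joint distribution p(x, y) over two binary variables,
# laid out as a 2x2 table: rows index x, columns index y.
p_xy = np.array([[0.3, 0.1],
                 [0.2, 0.4]])

# Normalization: the joint must sum to one.
assert np.isclose(p_xy.sum(), 1.0)

# Marginalization: p(x) = sum_y p(x, y), and likewise for p(y).
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

# Conditioning (Bayes rule): p(x|y) = p(x, y) / p(y).
p_x_given_y = p_xy / p_y  # broadcasting divides each column by p(y)

print(p_x, p_y, p_x_given_y[:, 0])  # e.g., p(x | y=0)
```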
[Slide 5]
Deriving the likelihood function

- Model of data with unknown parameters: $y = f(\theta)$, e.g., GLM: $f(\theta) = X\theta$
- But data are noisy: $y = f(\theta) + \varepsilon$
- Assume noise/residuals are 'small': $p(\varepsilon) \propto \exp\!\left(-\tfrac{1}{2}\,\varepsilon^2/\sigma^2\right)$

[figure: Gaussian density of $\varepsilon$, annotated with the tail probability $P(\varepsilon > 4\sigma) \approx 0.05$; curve of $f(\theta)$ against $\theta$]

→ Distribution of the data, given fixed parameters (the likelihood):
$p(y|\theta) \propto \exp\!\left(-\tfrac{1}{2\sigma^2}\,\big(y - f(\theta)\big)^2\right)$
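Under these Gaussian-noise assumptions the GLM log-likelihood can be written down directly. A minimal sketch with an invented design matrix, parameters and noise level:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical GLM: y = X @ theta + eps, with eps ~ N(0, sigma^2 I).
n, p = 50, 2
X = rng.standard_normal((n, p))        # assumed design matrix
theta_true = np.array([1.0, -0.5])
sigma = 0.3
y = X @ theta_true + sigma * rng.standard_normal(n)

def log_likelihood(theta, y, X, sigma):
    """ln p(y | theta): sum of log N(y_i; (X theta)_i, sigma^2)."""
    resid = y - X @ theta
    return (-0.5 * np.sum(resid**2) / sigma**2
            - len(y) * np.log(sigma * np.sqrt(2 * np.pi)))

# The likelihood peaks near the true parameters.
print(log_likelihood(theta_true, y, X, sigma))
print(log_likelihood(np.zeros(2), y, X, sigma))  # worse fit, lower value
```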
[Slide 6]
Forward and inverse problems

forward problem: the likelihood $p(y|\theta, m)$ maps from model to data
inverse problem: the posterior distribution $p(\theta|y, m)$ maps from data back to model
[Slide 7]
Likelihood, priors and the model evidence

Likelihood: $p(y|\theta, m)$
Prior: $p(\theta|m)$
Bayes rule: $p(\theta|y, m) = \dfrac{p(y|\theta, m)\,p(\theta|m)}{p(y|m)}$

[figure: generative model $m$, linking parameters $\theta$ to data $y$]
[Slide 8]
Bayesian model comparison

Principle of parsimony: "plurality should not be assumed without necessity"

"Occam's razor": [figure: model evidence p(y|m) plotted over the space of all data sets; a flexible model y = f(x) spreads its evidence more thinly than a simple one]

Model evidence: $p(y|m) = \int p(y|\theta, m)\,p(\theta|m)\,d\theta$
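The Occam effect in the model evidence can be reproduced numerically: a model with a broad prior must spread p(y|m) over many possible data sets, so it loses to a simpler model when both fit the data. A sketch under assumed Gaussian likelihood and priors, with an invented observation:

```python
import numpy as np

# Invented observed data point and known noise level.
y, sigma = 1.0, 1.0

# Evidence p(y|m) = integral of p(y|theta) p(theta|m) dtheta, on a grid.
theta = np.linspace(-20, 20, 4001)
dtheta = theta[1] - theta[0]
lik = np.exp(-0.5 * (y - theta)**2 / sigma**2) / (sigma * np.sqrt(2 * np.pi))

def evidence(prior_sd):
    prior = (np.exp(-0.5 * theta**2 / prior_sd**2)
             / (prior_sd * np.sqrt(2 * np.pi)))
    return np.sum(lik * prior) * dtheta

# A 'simple' model (tight prior) vs a 'complex' one (broad prior):
print(evidence(1.0))    # higher evidence: y lies in the model's typical range
print(evidence(10.0))   # lower evidence: mass spread over many data sets
```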
[Slide 9]
Hierarchical models

[figure: a multi-level hierarchical model; causality runs down the hierarchy, inference runs back up]
[Slide 10]
Directed acyclic graphs (DAGs)
[Slide 11]
Variational approximations (VB, EM, ReML)
→ VB : maximize the free energy F(q) w.r.t. the approximate posterior q(θ) under some (e.g., mean field, Laplace) simplifying constraint
[figure: approximate marginal posteriors $q(\theta_1)$, $q(\theta_2)$ versus the true marginals $p(\theta_1|y,m)$, $p(\theta_2|y,m)$ and the joint posterior $p(\theta_1, \theta_2|y, m)$]

Free energy:
$F(q) = \big\langle \ln p(y, \theta|m) \big\rangle_q + S(q) = \ln p(y|m) - KL\big[q(\theta);\, p(\theta|y, m)\big]$
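For a conjugate Gaussian model both sides of this decomposition have closed forms, so the identity can be checked directly. A sketch with an assumed toy model (prior, likelihood and test values of q are all invented):

```python
import numpy as np

# Assumed toy model: theta ~ N(0, s0^2), y | theta ~ N(theta, sigma^2).
y, s0, sigma = 2.0, 1.0, 0.5

# Exact posterior and log evidence (standard conjugate Gaussian results).
sp2 = 1.0 / (1.0 / s0**2 + 1.0 / sigma**2)   # posterior variance
mp = sp2 * y / sigma**2                       # posterior mean
log_evidence = (-0.5 * np.log(2 * np.pi * (s0**2 + sigma**2))
                - y**2 / (2 * (s0**2 + sigma**2)))

def free_energy(mq, sq):
    """F(q) = <ln p(y, theta)>_q + S(q) for q = N(mq, sq^2)."""
    e_loglik = (-0.5 * np.log(2 * np.pi * sigma**2)
                - ((y - mq)**2 + sq**2) / (2 * sigma**2))
    e_logprior = (-0.5 * np.log(2 * np.pi * s0**2)
                  - (mq**2 + sq**2) / (2 * s0**2))
    entropy = 0.5 * np.log(2 * np.pi * np.e * sq**2)
    return e_loglik + e_logprior + entropy

def kl_to_posterior(mq, sq):
    """KL[ N(mq, sq^2) || N(mp, sp2) ]."""
    return (np.log(np.sqrt(sp2) / sq)
            + (sq**2 + (mq - mp)**2) / (2 * sp2) - 0.5)

# The identity F(q) = ln p(y|m) - KL[q; p(theta|y,m)] holds for any q ...
for mq, sq in [(0.0, 1.0), (mp, np.sqrt(sp2)), (1.0, 0.2)]:
    assert np.isclose(free_energy(mq, sq),
                      log_evidence - kl_to_posterior(mq, sq))

# ... and F(q) touches ln p(y|m) exactly when q is the true posterior.
print(free_energy(mp, np.sqrt(sp2)), log_evidence)
```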
[Slide 12]
Overview of the talk
✓ An introduction to probabilistic modelling
✓ Bayesian model comparison
✓ SPM applications
[Slide 13]
Frequentist versus Bayesian inference

classical (null) hypothesis testing:
• define the null, e.g.: $H_0: \theta = 0$
• estimate parameters (obtain the test statistic $t \equiv t(Y)$)
• apply the decision rule, i.e.: if $P(t > t^*|H_0) \leq \alpha$, then reject $H_0$

[figure: the null density $p(t|H_0)$, with the tail probability $P(t > t^*|H_0)$ shaded]

Bayesian model comparison:
• define two alternative models, e.g.: $m_0$: $p(\theta|m_0) = 1$ if $\theta = 0$, $0$ otherwise; $m_1$: $p(\theta|m_1) = N(0, \Sigma)$
• apply the decision rule, e.g.: if $\dfrac{P(m_0|y)}{P(m_1|y)} \geq \alpha$, then accept $m_0$

[figure: the model evidences $p(Y|m_0)$ and $p(Y|m_1)$ over the space of all datasets, with the observed data $y$ marked]
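For this particular pair of models the evidences have closed forms: with a scalar observation y = θ + ε and ε ~ N(0, σ²), the point null gives p(y|m0) = N(y; 0, σ²), while marginalizing θ ~ N(0, Σ) under m1 gives p(y|m1) = N(y; 0, σ² + Σ). A minimal sketch of the Bayesian decision rule (all numbers invented):

```python
import numpy as np

def gauss_pdf(y, var):
    return np.exp(-0.5 * y**2 / var) / np.sqrt(2 * np.pi * var)

# Assumed scalar setup: y = theta + eps, eps ~ N(0, sigma2).
sigma2, Sigma = 1.0, 4.0   # noise variance; prior variance under m1
y = 1.5                    # invented observation

# Model evidences: theta is fixed at 0 under m0; under m1 it
# marginalizes out, widening the predictive distribution.
p_y_m0 = gauss_pdf(y, sigma2)
p_y_m1 = gauss_pdf(y, sigma2 + Sigma)

# Posterior model probabilities with a flat model prior P(m0) = P(m1).
post_m0 = p_y_m0 / (p_y_m0 + p_y_m1)
print(post_m0)  # accept m0 if post_m0 / (1 - post_m0) exceeds the threshold
```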
[Slide 14]
Family-level inference

[figure: four candidate models of regions A and B, differing in their connections and in the driving input u; posterior probabilities P(m1|y) = 0.04, P(m2|y) = 0.25, P(m3|y) = 0.7, P(m4|y) = 0.01]

model selection error risk: $P(e|y) = 1 - \max_m P(m|y) = 0.3$
[Slide 15]
Family-level inference

[figure: the same four models, now partitioned into two families (e.g., models without vs with the input u); P(m1|y) = 0.04, P(m2|y) = 0.25, P(m3|y) = 0.7, P(m4|y) = 0.01]

model selection error risk: $P(e|y) = 1 - \max_m P(m|y) = 0.3$

family inference (pool statistical evidence): $P(f|y) = \sum_{m \in f} P(m|y)$

P(f1|y) = 0.05, P(f2|y) = 0.95

family selection error risk: $P(e|y) = 1 - \max_f P(f|y) = 0.05$
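Pooling is just a sum of posterior model probabilities within each family, after which the selection error risk applies at the family level. A sketch using the probabilities from the slide; the partition into families is the one suggested by the figure and is an assumption here:

```python
# Posterior model probabilities from the slide.
post = {"m1": 0.04, "m2": 0.25, "m3": 0.70, "m4": 0.01}

# Assumed partition into two families (e.g., without vs with input u).
families = {"f1": ["m1", "m4"], "f2": ["m2", "m3"]}

# Family posteriors: P(f|y) = sum of P(m|y) over the members of f.
post_fam = {f: sum(post[m] for m in ms) for f, ms in families.items()}
print(post_fam)  # {'f1': 0.05, 'f2': 0.95}

# Selection error risk: P(e|y) = 1 - max posterior.
print(1 - max(post.values()))      # 0.30 at the model level
print(1 - max(post_fam.values()))  # 0.05 at the family level
```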
[Slide 16]
Sampling subjects as marbles in an urn

$r$ = proportion of blue marbles in the urn

$m_i = 1$ → ith marble is blue; $m_i = 0$ → ith marble is purple

→ (binomial) probability of drawing a set of n marbles:
$p(m|r) = \prod_{i=1}^{n} r^{m_i}\,(1-r)^{1-m_i}$

Thus, our belief about the proportion of blue marbles is:
$p(r|m) \propto p(r)\,\prod_{i=1}^{n} r^{m_i}\,(1-r)^{1-m_i}$, with posterior mean $E[r|m] = \dfrac{1}{n+2}\Big(\sum_i m_i + 1\Big)$ under a flat prior $p(r)$.

[figure: urn with proportion r of blue marbles; drawn marbles m1, m2, …, mn]
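With a flat prior on r this posterior is a Beta distribution, which gives the closed-form mean above. A quick check by sampling; the draws are invented:

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented draws: 1 = blue marble, 0 = purple marble.
m = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])
n, k = len(m), int(m.sum())

# Flat prior p(r) = Beta(1, 1); the binomial likelihood gives the
# conjugate posterior p(r|m) = Beta(k + 1, n - k + 1).
a, b = k + 1, n - k + 1
posterior_mean = a / (a + b)   # = (sum(m) + 1) / (n + 2)
print(posterior_mean)          # 8/12 ≈ 0.667 for these draws

# Posterior samples, e.g. for a credible interval on r.
samples = rng.beta(a, b, size=100_000)
print(np.quantile(samples, [0.025, 0.975]))
```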
[Slide 17]
Group-level model comparison

At least, we can measure how likely the ith subject's data are under each model!

$p(r, m|y) \propto p(r)\,\prod_{i=1}^{n} p(y_i|m_i)\,p(m_i|r)$

[figure: hierarchical model — r generates subject-wise model labels m1, m2, …, mn, which generate data y1, y2, …, yn with likelihoods p(y1|m1), p(y2|m2), …, p(yn|mn)]

Our belief about the proportion of models is:
$p(r|y) = \sum_m p(r, m|y)$

Exceedance probability: $\varphi_k = P\big(r_k > r_{k'},\ \forall k' \neq k \,\big|\, y\big)$
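Given a posterior over the model frequencies r, exceedance probabilities are easy to estimate by Monte Carlo. A sketch assuming the posterior is a Dirichlet (as in the random-effects BMS scheme implemented in SPM), with invented concentration parameters:

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed Dirichlet posterior over model frequencies r = (r1, r2, r3),
# with invented concentration parameters (prior + data counts).
alpha = np.array([8.0, 3.5, 1.5])

# Monte Carlo estimate of the exceedance probabilities:
# phi_k = P(r_k > r_k' for all k' != k | y).
samples = rng.dirichlet(alpha, size=200_000)
phi = (np.bincount(samples.argmax(axis=1), minlength=len(alpha))
       / len(samples))
print(phi)  # phi sums to 1; the first model dominates here
```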
[Slide 18]
Overview of the talk
✓ An introduction to probabilistic modelling
✓ Bayesian model comparison
✓ SPM applications
[Slide 19]
[figure: the SPM analysis pipeline — realignment, smoothing, normalisation (against a template), general linear model, Gaussian field theory, statistical inference at p < 0.05 — together with its Bayesian components: segmentation and normalisation, posterior probability maps (PPMs), dynamic causal modelling, multivariate decoding]
[Slide 20]
aMRI segmentation: mixture of Gaussians (MoG) model

- $y_i$: ith voxel value
- $c_i$: ith voxel label (grey matter, white matter, CSF)
- $\lambda$: class frequencies
- $\mu_1, \mu_2, \ldots, \mu_k$: class means
- $\sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2$: class variances
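Inference in this model amounts to computing, per voxel, the posterior probability of each tissue class. A sketch of that step under assumed (fixed) class parameters; all values are invented:

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed class parameters for K = 3 tissue classes (invented values):
lam = np.array([0.3, 0.5, 0.2])    # class frequencies lambda
mu = np.array([0.2, 0.6, 0.9])     # class means
sd = np.array([0.05, 0.08, 0.05])  # class standard deviations

y = rng.uniform(0, 1, size=5)      # invented voxel intensities

# Posterior over labels: p(c_i = k | y_i) ∝ lambda_k N(y_i; mu_k, sd_k^2).
lik = (np.exp(-0.5 * ((y[:, None] - mu) / sd)**2)
       / (sd * np.sqrt(2 * np.pi)))
resp = lam * lik
resp /= resp.sum(axis=1, keepdims=True)
print(resp)  # each row: posterior class probabilities for one voxel
```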
[Slide 21]
Decoding of brain images: recognizing brain states from fMRI

[figure: paradigm — a fixation cross (+) versus a pace-response cue (>>)]

log-evidence of X-Y sparse mappings: effect of lateralization
log-evidence of X-Y bilateral mappings: effect of spatial deployment
[Slide 22]
fMRI time series analysis: spatial priors and model comparison

[figure: hierarchical generative model of the fMRI time series — GLM coefficients, prior variance of the GLM coefficients, prior variance of the data noise, AR coefficients (correlated noise) — fitted under a short-term memory design matrix (X) and a long-term memory design matrix (X)]

PPM: regions best explained by the short-term memory model
PPM: regions best explained by the long-term memory model
[Slide 23]
Dynamic Causal Modelling: network structure identification

[figure: four candidate models m1–m4, each connecting V1, V5 and PPC, driven by the stimulus (stim) and modulated by attention; bar plot of the marginal likelihoods $\ln p(y|m)$ favouring m4; estimated effective synaptic strengths for the best model (m4): 1.25, 0.46, 0.39, 0.26, 0.26, 0.13, 0.10]
[Slide 24]
SPM: frequentist vs Bayesian RFX analysis

[figure: subjects' parameter estimates in four scenarios testing θ = 0, contrasting the classical verdict (p < 0.05 or p > 0.05) with the Bayesian model comparison M0 versus M1]
[Slide 25]
I thank you for your attention.
[Slide 26]
A note on statistical significance: lessons from the Neyman-Pearson lemma

• Neyman-Pearson lemma: the likelihood ratio (or Bayes factor) test

$\Lambda = \dfrac{p(y|H_1)}{p(y|H_0)} \geq u$

is the most powerful test of size $\alpha = P(\Lambda \geq u \,|\, H_0)$ to test the null.

• What is the threshold u above which the Bayes factor test yields a type I error rate of 5%?

[figure: ROC analysis (type I error rate vs 1 − type II error rate) — MVB (Bayes factor): u = 1.09, power = 56%; CCA (F-statistics): F = 2.20, power = 20%]
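That question can be answered by simulating the test statistic under the null: draw data from H0, compute the likelihood ratio, and take its 95th percentile as u. A sketch for a simple Gaussian pair of hypotheses; the means and variance are assumptions, not the slide's MVB/CCA setting:

```python
import numpy as np

rng = np.random.default_rng(4)

def gauss_pdf(y, mean, var):
    return np.exp(-0.5 * (y - mean)**2 / var) / np.sqrt(2 * np.pi * var)

# Assumed simple hypotheses: H0: y ~ N(0, 1), H1: y ~ N(1, 1).
def lam(y):
    """Likelihood ratio Lambda = p(y|H1) / p(y|H0)."""
    return gauss_pdf(y, 1.0, 1.0) / gauss_pdf(y, 0.0, 1.0)

# Null distribution of Lambda; threshold u for a 5% type I error rate.
y0 = rng.normal(0.0, 1.0, size=200_000)
u = np.quantile(lam(y0), 0.95)

# Power of the resulting test: P(Lambda >= u | H1).
y1 = rng.normal(1.0, 1.0, size=200_000)
print(u, np.mean(lam(y1) >= u))
```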