bayesian models for fmri data methods & models for fmri data analysis november 2011 with many...

40
Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly Guillaume Flandin The Reverend Thomas Bayes (1702-1761) Klaas Enno Stephan Translational Neuromodeling Unit (TNU) Institute for Biomedical Engineering, University of Zurich & ETH Zurich Laboratory for Social & Neural Systems Research (SNS), University of Zurich Wellcome Trust Centre for Neuroimaging, University College London

Upload: jared-ray

Post on 29-Jan-2016

217 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

Bayesian models for fMRI data

Methods & models for fMRI data analysis November 2011

With many thanks for slides & images to:

FIL Methods group, particularly Guillaume Flandin

The Reverend Thomas Bayes(1702-1761)

Klaas Enno Stephan

Translational Neuromodeling Unit (TNU)Institute for Biomedical Engineering, University of Zurich & ETH Zurich

Laboratory for Social & Neural Systems Research (SNS), University of Zurich

Wellcome Trust Centre for Neuroimaging, University College London

Page 2: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

Why do I need to learn about Bayesian stats?

Because SPM is getting more and more Bayesian:

• Segmentation & spatial normalisation

• Posterior probability maps (PPMs)– 1st level: specific spatial priors– 2nd level: global spatial priors

• Dynamic Causal Modelling (DCM)

• Bayesian Model Selection (BMS)

• EEG: source reconstruction

Page 3: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

Realignment Smoothing

Normalisation

General linear model

Statistical parametric map (SPM)Image time-series

Parameter estimates

Design matrix

Template

Kernel

Gaussian field theory

p <0.05

Statisticalinference

Bayesian segmentationand normalisation

Bayesian segmentationand normalisation

Spatial priorson activation extent

Spatial priorson activation extent

Posterior probabilitymaps (PPMs)

Posterior probabilitymaps (PPMs)

Dynamic CausalModelling

Dynamic CausalModelling

Page 4: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

p-value: probability of getting the observed data in the effect’s absence. If small, reject null hypothesis that there is no effect.

0

0

: 0

( | )

H

p y H

Limitations:One can never accept the null hypothesisGiven enough data, one can always demonstrate a

significant effectCorrection for multiple comparisons necessary

Solution: infer posterior probability of the effect

Probability of observing the data y, given no effect ( = 0).

)|( yp

Problems of classical (frequentist) statistics

Probability of the effect, given the observed data

Page 5: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

Overview of topics

• Bayes' rule

• Bayesian update rules for Gaussian densities

• Bayesian analyses in SPM– Segmentation & spatial normalisation

– Posterior probability maps (PPMs)• 1st level: specific spatial priors

• 2nd level: global spatial priors

– Bayesian Model Selection (BMS)

Page 6: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

Bayes‘ Theorem

Reverend Thomas Bayes1702 - 1761

“Bayes‘ Theorem describes, how an ideally rational person processes information."

Wikipedia

)(

)()|()|(

yp

pypyP

Likelihood

Prior

Evidence

Posterior

Page 7: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

)(

)()|()|(

yp

pypyP

Given data y and parameters , the conditional probabilities are:

)(

),()|(

yp

ypyp

)(

),()|(

p

ypyp

Eliminating p(y,) gives Bayes’ rule:

Likelihood

Prior

Evidence

Posterior

Bayes’ Theorem

Page 8: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

Bayesian statistics

)()|()|( pypyp posterior likelihood ∙ prior

)|( yp )(p

Bayes theorem allows one to formally incorporate prior knowledge into computing statistical probabilities.

Priors can be of different sorts:empirical, principled or shrinkage priors.

The “posterior” probability of the parameters given the data is an optimal combination of prior knowledge and new data, weighted by their relative precision.

new data prior knowledge

Page 9: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

Bayes in motion - an animation

Page 10: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

yy

Observation of data

likelihood p(y|)

prior distribution p()

likelihood p(y|)

prior distribution p()

Formulation of a generative model

Update of beliefs based upon observations, given a prior state of knowledge

( | ) ( | ) ( )p y p y p

Principles of Bayesian inference

Page 11: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

Likelihood & Prior

Posterior:

Posterior mean = variance-weighted combination of prior mean and data mean

Prior

Likelihood

Posterior

y

Posterior mean & variance of univariate Gaussians

p

),;()(

),;()|(2

2

pp

e

Np

yNyp

),;()|( 2 Nyp

ppe

pe

222

222

11

111

Page 12: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

Likelihood & prior

Posterior:

Prior

Likelihood

Posterior

Same thing – but expressed as precision weighting

p

),;()(

),;()|(1

1

pp

e

Np

yNyp

),;()|( 1 Nyp

ppe

pe

Relative precision weighting

y

Page 13: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

Likelihood & Prior

Posterior

)2(

Relative precision weighting

Prior

Likelihood

Posterior

)2()2()1(

)1()1(

y

Same thing – but explicit hierarchical perspective

)1(

)2()2(

)1()1(

)2()1(

)1()1( )/1,;()|(

Nyp

)/1,;()(

)/1,;()|()2()2()1()1(

)1()1()1(

Np

yNyp

Page 14: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

Bayesian regression: univariate case

Relative precision weighting

Normal densities

exy

ppe

yy

pey

yx

x

222||

22

2

2|

1

11

x

Univariatelinear model

),;()( 2ppNp

),;()|( 2exyNyp

),;()|( 2|| yyNyp

p

y|

Page 15: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

One step if Ce is known.Otherwise iterative estimation with EM.

GeneralLinear Model

Bayesian GLM: multivariate caseNormal densities eXθy

),;()( ppNp Cηθθ

),;()|( eNp CXθyθy

),;()|( || yyNyp Cηθθ

ppeT

yy

peT

y

ηCyCXCη

CXCXC1

||

111|

2

1

Page 16: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

An intuitive example

-10 -5 0 5 10

-10

-5

0

5

10

1

2

PriorLikelihoodPosterior

Page 17: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

Still intuitive

-10 -5 0 5 10

-10

-5

0

5

10

1

2

PriorLikelihoodPosterior

Page 18: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

Less intuitive

-10 -5 0 5 10

-10

-5

0

5

10

1

2

PriorLikelihoodPosterior

Page 19: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

Bayesian analyses in SPM8

• Segmentation & spatial normalisation

• Posterior probability maps (PPMs)– 1st level: specific spatial priors– 2nd level: global spatial priors

• Dynamic Causal Modelling (DCM)

• Bayesian Model Selection (BMS)

• EEG: source reconstruction

Page 20: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

Spatial normalisation: Bayesian regularisation

Deformations consist of a linear combination of smooth basis functions

lowest frequencies of a 3D discrete cosine transform.

Find maximum a posteriori (MAP) estimates: simultaneously minimise

– squared difference between template and source image – squared difference between parameters and their priors

)(log)(log)|(log)|(log yppypyp MAP:

MAP:

Deformation parametersDeformation parameters

“Difference” between template and source image

“Difference” between template and source image

Squared distance between parameters and their expected values

(regularisation)

Squared distance between parameters and their expected values

(regularisation)

Page 21: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

Templateimage

Affine registration.(2 = 472.1)

Non-linearregistration

withoutregularisation.

(2 = 287.3)

Non-linearregistration

usingregularisation.(2 = 302.7)

Without regularisation, the non-linear spatial normalisation can introduce unnecessary warps.

Spatial normalisation: overfitting

Page 22: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

Bayesian segmentation with empirical priors

•Goal: for each voxel, compute probability that it belongs to a particular tissue type, given its intensity

• Likelihood model: Intensities are modelled by a mixture of Gaussian distributions representing different tissue classes (e.g. GM, WM, CSF).

• Priors are obtained from tissue probability maps (segmented images of 151 subjects).

•Goal: for each voxel, compute probability that it belongs to a particular tissue type, given its intensity

• Likelihood model: Intensities are modelled by a mixture of Gaussian distributions representing different tissue classes (e.g. GM, WM, CSF).

• Priors are obtained from tissue probability maps (segmented images of 151 subjects).

Ashburner & Friston 2005, NeuroImage

p (tissue | intensity) p (intensity | tissue) ∙ p (tissue)

Page 23: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

XyGeneral Linear Model:

What are the priors?

),0(~ CNwith

• In “classical” SPM, no priors (= “flat” priors)

• Full Bayes: priors are predefined on a principled or empirical basis

• Empirical Bayes: priors are estimated from the data, assuming a hierarchical generative model PPMs in SPM

Parameters of one level = priors for distribution of parameters at lower levelParameters and hyperparameters at each level can be estimated using EM

Bayesian fMRI analyses

Page 24: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

Hierarchical models and Empirical Bayes

)()()()1(

)2()2()2()1(

)1()1()1(

nnnn X

X

Xy

Hierarchicalmodel

Hierarchicalmodel

ParametricEmpirical

Bayes (PEB)

ParametricEmpirical

Bayes (PEB)

EM = PEB = ReMLEM = PEB = ReML

RestrictedMaximumLikelihood

(ReML)

RestrictedMaximumLikelihood

(ReML)

Single-levelmodel

Single-levelmodel

)()()1(

)()1()1(

)2()1()1(

...

nn

nn

XXXX

Xy

Page 25: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

Posterior Probability Maps (PPMs)

)|( yp )|( yp

Posterior distribution: probability of the effect given the dataPosterior distribution: probability of the effect given the data

Posterior probability map: images of the probability (confidence) that an activation exceeds some specified threshold, given the data y

Posterior probability map: images of the probability (confidence) that an activation exceeds some specified threshold, given the data y

)|( yp

Two thresholds:• activation threshold : percentage of whole brain mean

signal (physiologically relevant size of effect)• probability that voxels must exceed to be displayed (e.g.

95%)

Two thresholds:• activation threshold : percentage of whole brain mean

signal (physiologically relevant size of effect)• probability that voxels must exceed to be displayed (e.g.

95%)

mean: size of effectprecision: variability

mean: size of effectprecision: variability

Page 26: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

PPMs vs. SPMs

Likelihood PriorPosterior

SPMsSPMs

PPMsPPMs

u

)(yft )0|( utp )|( yp

)()|()|( pypyp

Bayesian test:Bayesian test: Classical t-test:Classical t-test:

Page 27: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

2nd level PPMs with global priors

In the absence of evidenceto the contrary, parameters

will shrink to zero.

In the absence of evidenceto the contrary, parameters

will shrink to zero.

)1()1()1( Xy1st level (GLM):

2nd level (shrinkage prior):

),0()( CNp

)2(

)2()2()1(

0

),0()( CNp

)(p

0

Basic idea: use the variance of over voxels as prior variance of at any particular voxel.

2nd level: (2) = average effect over voxels, (2) = voxel-to-voxel variation.

(1) reflects regionally specific effects assume that it sums to zero over all voxels shrinkage prior at the second level variance of this prior is implicitly estimated by estimating (2)

Page 28: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

Shrinkage Priors Small & variable effect Large & variable effect

Small but clear effect Large & clear effect

Page 29: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

2nd level PPMs with global priors

)1( Xy

1st level (GLM):

2nd level (shrinkage prior):

),0()( CNp

)2(0 ),0()( CNp

Once Cε and C are known, we can apply the usual rule for computing the posterior mean & covariance:

yCXCm

CXCXCT

yy

Ty

1||

111|

We are looking for the same effect over multiple voxels

Pooled estimation of C over voxels

voxel-specific

global pooled estimate

Friston & Penny 2003, NeuroImage

Page 30: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

PPMs and multiple comparisons

No need to correct for multiple comparisons:

Thresholding a PPM at 95% confidence: in every voxel, the posterior probability of an activation is 95%.

At most, 5% of the voxels identified could have activations less than .

Independent of the search volume, thresholding a PPM thus puts an upper bound on the false discovery rate.

Page 31: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

PPMs vs.SPMsSPMm

ip

[0, 0, 0]

<

< <

PPM2.06

rest [2.06]

SPMresults:C:\home\spm\analysis_PET

Height threshold P = 0.95

Extent threshold k = 0 voxels

Design matrix1 4 7 10 13 16 19 22

147

1013161922252831343740434649525560

contrast(s)

4

SPMm

ip

[0, 0, 0]

<

< <

SPM{T39.0

}

rest

SPMresults:C:\home\spm\analysis_PET

Height threshold T = 5.50

Extent threshold k = 0 voxels

Design matrix1 4 7 10 13 16 19 22

147

1013161922252831343740434649525560

contrast(s)

3

PPMs: Show activations greater than a given size

PPMs: Show activations greater than a given size

SPMs: Show voxels with non-zero

activations

SPMs: Show voxels with non-zero

activations

Page 32: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

PPMs: pros and cons

• One can infer that a cause did not elicit a response

• Inference is independent of search volume

• SPMs conflate effect-size and effect-variability

• One can infer that a cause did not elicit a response

• Inference is independent of search volume

• SPMs conflate effect-size and effect-variability

DisadvantagesDisadvantagesAdvantagesAdvantages

• Estimating priors over voxels is computationally demanding

• Practical benefits are yet to be established

• Thresholds other than zero require justification

• Estimating priors over voxels is computationally demanding

• Practical benefits are yet to be established

• Thresholds other than zero require justification

Page 33: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

1st level PPMs with local spatial priors

• Neighbouring voxels often not independent

• Spatial dependencies vary across the brain

• But spatial smoothing in SPM is uniform

• Matched filter theorem: SNR maximal when smoothing the data with a kernel which matches the smoothness of the true signal

• Basic idea: estimate regional spatial dependencies from the data and use this as a prior in a PPM regionally specific smoothing markedly increased sensitivity

Contrast map

AR(1) map

Penny et al. 2005, NeuroImage

Page 34: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

b

A

q1 q2

a

W

Y

1

1 2; ,

K

kk

k k

p p

p Ga q q

α

1

11; ,

KTk

k

T T Tk k k

p p

p N

W w

w w 0 S S

u1 u2

l

1

1 2; ,

N

nn

n n

p p

p Ga u u

λ

1

1 1

( ) ( )

( ) ; , ( )

P

pp

Tp p p

p p

p N

A a

a a 0 S S

Y=XW+E

r1 r2

1

1 2

( ) ( )

( ) ( ; , )

P

pp

p p

p p

p Ga r r

β

The generative spatio-temporal model

Penny et al. 2005, NeuroImage

= spatial precision of parameters = observation noise precision = precision of AR coefficients

observation noise

regressioncoefficients

autoregressive parameters

Page 35: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

11,0; SSwNwp T

kTk

Tk

Prior for k-th parameter:

Shrinkage prior

Spatial kernel matrix

Spatial precision: determines the

amount of smoothness

The spatial prior

Different choices possible for spatial kernel matrix S.

Currently used in SPM: Laplacian prior (same as in LORETA)

Page 36: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

Smoothing

Global prior Laplacian Prior

Example: application to event-related fMRI data

Contrast maps for familiar vs. non-familiar faces, obtained with

- smoothing- global spatial prior- Laplacian prior

Page 37: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

Bayesian model selection (BMS)

Given competing hypotheses on structure & functional mechanisms of a system, which model is the best?

For which model m does p(y|m) become maximal?

Which model represents thebest balance between model fit and model complexity?

Pitt & Miyung (2002), TICS

Page 38: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

dmpmypmyp )|(),|()|( Model evidence:

Various approximations, e.g.:- negative free energy- AIC- BIC

Penny et al. (2004) NeuroImage

Bayesian model selection (BMS)

)|(

)|(

2

1

myp

mypBF

Model comparison via Bayes factor:

)|(

)|(),|(),|(

myp

mpmypmyp

Bayes’ rules:

accounts for both accuracy and complexity of the model

allows for inference about structure (generalisability) of the model

Page 39: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

Example: BMS of dynamic causal models

modulation of back-ward or forward connection?

additional drivingeffect of attentionon PPC?

bilinear or nonlinearmodulation offorward connection?

V1 V5stim

PPCM2

attention

V1 V5stim

PPCM1

attention

V1 V5stim

PPCM3attention

V1 V5stim

PPCM4attention

BF = 2966

M2 better than M1

M3 better than M2

BF = 12

M4 better than M3

BF = 23

Stephan et al. (2008) NeuroImage

Page 40: Bayesian models for fMRI data Methods & models for fMRI data analysis November 2011 With many thanks for slides & images to: FIL Methods group, particularly

Thank you