Model-driven statistical analysis of fMRI data

Keith Worsley

Department of Mathematics and Statistics,

Brain Imaging Centre, Montreal Neurological Institute, McGill University

www.math.mcgill.ca/keith

References

• Worsley et al. (2002). A general statistical analysis for fMRI data. NeuroImage, 15:1-15.

• Liao et al. (2002). Estimating the delay of the response in fMRI data. NeuroImage, 16:593-606.

• FMRISTAT: MATLAB package from www.math.mcgill.ca/keith/fmristat

[Figure: first scan of fMRI data; T statistic for hot - warm effect; time series at three voxels against time, seconds (0 to 300), with hot, rest and warm periods marked: highly significant effect, T = 6.59; no significant effect, T = -0.74; drift.]

fMRI data: 120 scans, 3 scans each of hot, rest, warm, rest, hot, rest, …

$T = (\text{hot} - \text{warm effect}) / \mathrm{sd} \sim t_{110}$ if there is no effect

Exploring the data: PCA of time x space

[Figure: temporal components (against frame, 0 to 120) and spatial components (against slice, 0 to 10) for components 1 to 4. Sd and % variance explained: component 1: 105.7, 77.8%; component 2: 26.1, 4.8%; component 3: 15.8, 1.7%; component 4: 14.8, 1.5%.]

1: exclude first frames
2: drift
3: long-range correlation or anatomical effect: remove by converting to % of brain
4: signal?
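A minimal sketch of a PCA of time x space, assuming numpy and a hypothetical array `data` of shape (frames, voxels). It uses the SVD of the voxel-wise mean-centred matrix, which is one standard way to compute these components; the sd convention (singular value / sqrt(frames - 1)) is an assumption and is not claimed to match the figure's numbers.

```python
import numpy as np

def pca_time_space(data, n_components=4):
    """PCA of an fMRI data matrix with frames in rows and voxels in columns.

    Returns temporal components (frames x k), spatial components (k x voxels),
    an sd for each component and the % of variance it explains.
    """
    # Remove the temporal mean at each voxel so components describe fluctuations.
    centred = data - data.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    k = min(n_components, len(s))
    sd = s[:k] / np.sqrt(data.shape[0] - 1)   # one possible sd convention
    pct = 100 * s[:k] ** 2 / np.sum(s ** 2)   # % of total variance explained
    return u[:, :k], vt[:k], sd, pct

# Hypothetical data: 120 frames x 500 voxels with a linear drift plus noise.
rng = np.random.default_rng(0)
frames = np.arange(120)
drift = np.outer(frames / 120.0, rng.normal(size=500))
data = 1000 + 5 * drift + rng.normal(size=(120, 500))
temporal, spatial, sd, pct = pca_time_space(data)
```

In this toy example the drift dominates the first component, which is the kind of pattern the component list above asks you to look for.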

Modeling the data: choices …
• Time domain / frequency domain?
• AR / ARMA / state space models?
• Linear / non-linear time series model?
• Fixed HRF / estimated HRF?
• Voxel / local / global parameters?
• Fixed effects / random effects?
• Frequentist / Bayesian?

Compromise: simple, general, valid, robust, fast statistical analysis

Covariates example: pain perception

• Alternating hot and warm stimuli separated by rest (9 seconds each).
• Hemodynamic response function (HRF): difference of two gamma densities.
• Responses = stimuli * HRF (convolution), sampled every 3 seconds.

[Figure: stimuli, HRF and responses against time, seconds (0 to 350).]
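A sketch of how one covariate could be built, assuming numpy. The HRF is a difference of two gamma densities, each parameterised by its peak time and FWHM; the default values (peaks 5.4 s and 10.8 s, FWHMs 5.2 s and 7.35 s, undershoot ratio 0.35) follow a commonly used Glover (1999) shape and are an assumption, not given in the text above. The block timing (9 s of hot, rest, warm, rest; 120 scans every 3 s) follows the slides.

```python
import math
import numpy as np

def gamma_density(t, peak, fwhm):
    """Gamma density with (approximately) the given peak time and FWHM, in seconds."""
    shape = 8 * math.log(2) * (peak / fwhm) ** 2
    scale = fwhm ** 2 / (8 * math.log(2) * peak)
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    pos = t > 0
    out[pos] = np.exp((shape - 1) * np.log(t[pos]) - t[pos] / scale
                      - math.lgamma(shape) - shape * math.log(scale))
    return out

def hrf(t, peak1=5.4, fwhm1=5.2, peak2=10.8, fwhm2=7.35, ratio=0.35):
    """Difference of two gamma densities (assumed Glover-style parameters)."""
    return gamma_density(t, peak1, fwhm1) - ratio * gamma_density(t, peak2, fwhm2)

def block_covariate(onsets, duration, n_scans, tr=3.0, dt=0.1):
    """Convolve a boxcar stimulus with the HRF and sample it every tr seconds."""
    t = np.arange(0, n_scans * tr, dt)
    stimulus = np.zeros_like(t)
    for onset in onsets:
        stimulus[(t >= onset) & (t < onset + duration)] = 1.0
    kernel = hrf(np.arange(0, 40, dt))
    # Discrete approximation to the convolution integral on a fine time grid.
    response = np.convolve(stimulus, kernel)[: len(t)] * dt
    # Sample the continuous response at the scan acquisition times.
    scan_idx = np.round(np.arange(n_scans) * tr / dt).astype(int)
    return response[scan_idx]

# Hypothetical design: 9 s blocks in the order hot, rest, warm, rest, ... (36 s cycle).
hot = block_covariate(np.arange(0, 360, 36), 9, 120)
warm = block_covariate(np.arange(18, 360, 36), 9, 120)
```

The two columns `hot` and `warm` (plus drift terms) would then form the design matrix of the linear model.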

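Linear model for fMRI time series with AR(p) correlated errors

• Linear model: $Y_t = (\text{stimulus}_t * \text{HRF})\, b + \text{drift}_t\, c + \text{error}_t$
• AR(p) errors: $\text{error}_t = a_1\, \text{error}_{t-1} + \dots + a_p\, \text{error}_{t-p} + \sigma\, \text{WN}_t$

where $\text{WN}_t$ is 'white noise' and $b$, $c$, $a_1, \dots, a_p$, $\sigma$ are unknown parameters.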

First step: estimate the autocorrelation

AR(1) model: $\text{error}_t = a_1\, \text{error}_{t-1} + \sigma\, \text{WN}_t$

• Fit the linear model using least squares.
• $\text{error}_t = Y_t - \text{fitted } Y_t$
• $\hat a_1 = \text{Correlation}(\text{error}_t, \text{error}_{t-1})$
• Estimating the $\text{error}_t$'s changes their correlation structure slightly, so $\hat a_1$ is slightly biased:

[Figure, colour scale -0.1 to 0.3: raw autocorrelation; smoothed 15mm; bias corrected $\hat a_1$. Annotated values: ~ -0.05 and ~ 0.]
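The first step can be sketched as follows, assuming numpy and a simulated series. It fits the linear model by least squares and takes the lag-1 correlation of the residuals; the bias correction and 15mm spatial smoothing shown in the figure are not reproduced here.

```python
import numpy as np

def estimate_ar1(y, X):
    """Raw lag-1 autocorrelation of least-squares residuals (no bias correction)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Correlation(error_t, error_{t-1}); residuals have mean 0 because X has an intercept.
    return np.sum(resid[1:] * resid[:-1]) / np.sum(resid ** 2)

# Hypothetical example: AR(1) errors with a1 = 0.3 around an intercept plus linear drift.
rng = np.random.default_rng(1)
n = 2000
err = np.zeros(n)
for t in range(1, n):
    err[t] = 0.3 * err[t - 1] + rng.normal()
X = np.column_stack([np.ones(n), np.linspace(-1, 1, n)])
y = X @ np.array([100.0, 2.0]) + err
a1_hat = estimate_ar1(y, X)
```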

Second step: pre-whiten, refit the linear model

Pre-whiten: $Y_t^* = Y_t - \hat a_1 Y_{t-1}$ (applying the same filter to each covariate), then refit using least squares:

[Figure, three panels: hot - warm effect, % (-1 to 1); sd of effect, % (0 to 0.25); T = effect / sd, 110 df (-6 to 6), with T > 4.93 (P < 0.05, corrected) marked.]
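A sketch of the second step, assuming numpy and a known $\hat a_1$. Both the data and every covariate are filtered with $Y_t - \hat a_1 Y_{t-1}$, the model is refitted by least squares, and a contrast gives the effect, its sd and T. Dropping the first scan is a simplifying assumption; the design (3 scans each of hot, rest, warm, rest) is a hypothetical stand-in for the convolved covariates.

```python
import numpy as np

def prewhiten_fit(y, X, a1, contrast):
    """Refit a linear model after AR(1) pre-whitening; return effect, sd, T and df."""
    # Y*_t = Y_t - a1 Y_{t-1}; the same filter is applied to every covariate.
    # The first scan is dropped for simplicity (an assumption of this sketch).
    y_star = y[1:] - a1 * y[:-1]
    X_star = X[1:] - a1 * X[:-1]
    beta, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
    resid = y_star - X_star @ beta
    df = len(y_star) - np.linalg.matrix_rank(X_star)
    sigma2 = resid @ resid / df
    c = np.asarray(contrast, dtype=float)
    effect = c @ beta
    sd = np.sqrt(sigma2 * (c @ np.linalg.pinv(X_star.T @ X_star) @ c))
    return effect, sd, effect / sd, df

# Hypothetical data: 120 scans, 3 scans each of hot, rest, warm, rest, ...
rng = np.random.default_rng(2)
n = 120
t = np.arange(n)
hot = ((t % 12) < 3).astype(float)
warm = (((t - 6) % 12) < 3).astype(float)
X = np.column_stack([np.ones(n), hot, warm, t / n])
err = np.zeros(n)
for i in range(1, n):
    err[i] = 0.3 * err[i - 1] + rng.normal()
y = X @ np.array([100.0, 3.0, 1.0, 2.0]) + err
effect, sd, T, df = prewhiten_fit(y, X, 0.3, [0, 1, -1, 0])  # hot - warm
```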

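Higher order AR model? Try AR(3):

[Figure: AR(3) coefficients a1, a2, a3 (colour scale -0.1 to 0.3).]

… has little effect on the T statistics:

[Figure: T statistics (-5 to 5) with no correlation, AR(1), AR(2) and AR(3) errors.]

• No correlation biases T up: ~12% more false positives.
• AR(1) seems to be adequate.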
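One common way to fit a higher-order AR(p) error model is to solve the Yule-Walker equations from the residual autocorrelations. The sketch below assumes numpy and residuals already obtained from the least-squares fit; it omits any bias correction and is not claimed to be FMRISTAT's exact procedure.

```python
import numpy as np

def yule_walker(resid, p):
    """Estimate AR(p) coefficients a_1..a_p from residuals via the Yule-Walker equations."""
    r = np.asarray(resid, dtype=float)
    r = r - r.mean()
    n = len(r)
    # Sample autocovariances at lags 0..p, then autocorrelations.
    acov = np.array([r[: n - k] @ r[k:] / n for k in range(p + 1)])
    rho = acov / acov[0]
    # Toeplitz system R a = (rho_1, ..., rho_p), with R[i, j] = rho_|i-j|.
    R = np.array([[rho[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, rho[1 : p + 1])

# Hypothetical residuals from a true AR(1) process with a1 = 0.3.
rng = np.random.default_rng(3)
n = 5000
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.3 * e[t - 1] + rng.normal()
a_ar1 = yule_walker(e, 1)
a_ar3 = yule_walker(e, 3)
```

For data that really are AR(1), the extra coefficients a2 and a3 come out close to zero, which is the pattern behind the conclusion that AR(1) is adequate.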

Results from 4 runs on the same subject

[Figure, runs 1 to 4: effect, $E_i$ (-1 to 1); sd, $S_i$ (0 to 0.2); T stat, $E_i / S_i$ (-5 to 5).]

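Mixed effects linear model for combining effects from different runs/sessions/subjects:

• $E_i$ = effect for run/session/subject $i$
• $S_i$ = standard error of effect
• Mixed effects model:

$E_i = \text{covariates}_i\, c + S_i\, \text{WN}_i^F + \sigma\, \text{WN}_i^R$

$E_i$ and $S_i$ come from the linear model (Lin. Mod.) fitted to each run; $\text{covariates}_i$ is usually 1, but could add group, treatment, age, sex, ...; $S_i\, \text{WN}_i^F$ is the 'fixed effects' error, due to variability within the same run; $\sigma\, \text{WN}_i^R$ is the random effect, due to variability from run to run. The unknown parameters are $c$ and $\sigma$.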

REML estimation using the EM algorithm

• Slow to converge (10 iterations by default).
• Stable (maintains estimate $\hat\sigma^2 > 0$), but $\hat\sigma^2$ is biased if $\sigma^2$ (random effect) is small, so:
• Re-parameterize the variance model:

$\mathrm{Var}(E_i) = S_i^2 + \sigma^2 = (S_i^2 - \min_j S_j^2) + (\sigma^2 + \min_j S_j^2) = S_i^{*2} + \sigma^{*2}$

$\hat\sigma^2 = \hat\sigma^{*2} - \min_j S_j^2$ (less biased estimate)
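A minimal sketch of the REML EM iterations for the simplest case $\text{covariates}_i = 1$ (a single mean $c$), assuming numpy. The E-step uses the posterior mean and variance of each random effect with the mean integrated out, and the variance model is re-parameterized as above. The starting value is an assumption, and this is not claimed to be MULTISTAT's exact code.

```python
import numpy as np

def reml_em(E, S, n_iter=10, reparam=True):
    """REML EM for E_i = c + S_i WN^F + sigma WN^R with an intercept-only covariate.

    With reparam=True, fits S_i*^2 + sigma*^2 where S_i*^2 = S_i^2 - min_j S_j^2,
    then returns sigma^2 = sigma*^2 - min_j S_j^2.
    """
    E = np.asarray(E, dtype=float)
    S2 = np.asarray(S, dtype=float) ** 2
    shift = S2.min() if reparam else 0.0
    S2_star = S2 - shift
    sigma2 = max(np.var(E, ddof=1), 1e-12)  # positive starting value (assumption)
    for _ in range(n_iter):
        w = 1.0 / (S2_star + sigma2)
        c_hat = np.sum(w * E) / np.sum(w)
        # E-step: posterior mean and variance of each random effect (REML version).
        u_mean = sigma2 * w * (E - c_hat)
        u_var = sigma2 - sigma2 ** 2 * w + sigma2 ** 2 * w ** 2 / np.sum(w)
        # M-step: average expected squared random effect.
        sigma2 = np.mean(u_mean ** 2 + u_var)
    w = 1.0 / (S2_star + sigma2)
    c_hat = np.sum(w * E) / np.sum(w)
    sd_c = np.sqrt(1.0 / np.sum(w))
    return c_hat, sd_c, sigma2 - shift

# Hypothetical effects and standard errors from 4 runs.
c_hat, sd_c, sigma2_hat = reml_em([0.8, 1.1, 0.3, 0.9], [0.12, 0.10, 0.15, 0.11])
# Equal S_i: the REML answer is the sample variance minus S^2.
c_eq, _, sigma2_eq = reml_em([1, 2, 3, 4], [0.1] * 4, n_iter=200)
```

With equal $S_i$ the re-parameterized fixed point is the sample variance $5/3$ minus $S^2 = 0.01$, a convenient check of the iterations.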

Problem: 4 runs, 3 df for random effects sd ...

… very noisy sd:

… and T > 15.96 for P < 0.05 (corrected):

… so no response is detected …

[Figure, runs 1 to 4 and their MULTISTAT combination: effect, $E_i$ (-1 to 1); sd, $S_i$ (0 to 0.2); T stat, $E_i / S_i$ (-5 to 5).]

Solution: Spatial regularization of the sd

• Basic idea: increase df by spatial smoothing (local pooling) of the sd.
• Can't smooth the random effects sd directly: too much anatomical structure.
• Instead, smooth the ratio random effects sd / fixed effects sd, which removes the anatomical structure before smoothing:

$\widehat{\mathrm{sd}} = \mathrm{smooth}\left(\dfrac{\text{random effects sd}}{\text{fixed effects sd}}\right) \times \text{fixed effects sd}$
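A 1D sketch of this regularization, assuming numpy: divide, smooth the ratio with a Gaussian kernel, multiply back. Real data are 3D; the kernel width in voxels (9.5, i.e. 19mm at a hypothetical 2mm voxel size) and the edge renormalisation are assumptions of this sketch.

```python
import numpy as np

def gaussian_kernel(fwhm_vox):
    """1D Gaussian smoothing kernel with the given FWHM in voxels, summing to 1."""
    sigma = fwhm_vox / np.sqrt(8 * np.log(2))
    half = int(np.ceil(3 * sigma))
    x = np.arange(-half, half + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def regularized_sd(random_sd, fixed_sd, fwhm_vox):
    """sd = smooth(random effects sd / fixed effects sd) x fixed effects sd."""
    ratio = random_sd / fixed_sd             # removes the anatomical structure
    k = gaussian_kernel(fwhm_vox)
    # Smooth the ratio, renormalising the kernel weights near the edges.
    smoothed = (np.convolve(ratio, k, mode="same")
                / np.convolve(np.ones_like(ratio), k, mode="same"))
    return smoothed * fixed_sd               # put the anatomical structure back

# Hypothetical profile: structured fixed effects sd, noisy 3 df random effects sd.
rng = np.random.default_rng(4)
x = np.arange(200)
fixed_sd = 0.1 + 0.05 * (np.sin(x / 10) > 0)
random_sd = 1.3 * fixed_sd * np.sqrt(rng.chisquare(3, size=200) / 3)
mixed_sd = regularized_sd(random_sd, fixed_sd, fwhm_vox=9.5)
```

Smoothing the ratio rather than the random effects sd itself means a sharp change in `fixed_sd` is preserved in `mixed_sd`.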

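[Figure: the random effects sd (3 df) is divided by the fixed effects sd (440 df, the average of the $S_i$); the ratio random sd / fixed sd is smoothed (smoothed sd ratio ~1.3, indicating a random effect) and multiplied back by the fixed effects sd to give the mixed effects sd (~100 df). Colour scales: sd 0 to 0.2, ratio 0.5 to 1.5.]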

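Effective df depends on smoothing

$\dfrac{1}{\mathrm{df}_\mathrm{eff}} = \dfrac{1}{\mathrm{df}_\mathrm{ratio}} + \dfrac{1}{\mathrm{df}_\mathrm{fixed}}, \qquad \mathrm{df}_\mathrm{ratio} = \mathrm{df}_\mathrm{random}\left(2\,\dfrac{\mathrm{FWHM}_\mathrm{ratio}^2}{\mathrm{FWHM}_\mathrm{data}^2} + 1\right)^{3/2}$

e.g. $\mathrm{df}_\mathrm{random} = 3$, $\mathrm{df}_\mathrm{fixed} = 4 \times 110 = 440$, $\mathrm{FWHM}_\mathrm{data} = 8$mm: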

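[Figure: $\mathrm{df}_\mathrm{eff}$ (0 to 400) against $\mathrm{FWHM}_\mathrm{ratio}$ (0 to infinity). $\mathrm{FWHM}_\mathrm{ratio} = 0$ gives a random effects analysis, $\mathrm{df}_\mathrm{eff} = 3$; $\mathrm{FWHM}_\mathrm{ratio} = \infty$ gives a fixed effects analysis, $\mathrm{df}_\mathrm{eff} = 440$.]

Target = 100 df: FWHM = 19mm.

Why 100? If out by 50%, the distribution of T is not much affected.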
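The effective df formula can be evaluated directly and inverted for a target df. This plain-Python sketch uses the example values above as defaults; the bisection search is just one convenient way to invert it, relying on $\mathrm{df}_\mathrm{eff}$ increasing with $\mathrm{FWHM}_\mathrm{ratio}$.

```python
def df_effective(fwhm_ratio, fwhm_data=8.0, df_random=3, df_fixed=440):
    """Effective df: 1/df_eff = 1/df_ratio + 1/df_fixed."""
    if fwhm_ratio == float("inf"):
        return float(df_fixed)
    df_ratio = df_random * (2 * (fwhm_ratio / fwhm_data) ** 2 + 1) ** 1.5
    return 1.0 / (1.0 / df_ratio + 1.0 / df_fixed)

def fwhm_for_target_df(target=100.0, lo=0.0, hi=200.0, **kwargs):
    """FWHM_ratio giving the target df, by bisection (df_eff increases with FWHM)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if df_effective(mid, **kwargs) < target:
            lo = mid
        else:
            hi = mid
    return hi

print(round(df_effective(19.0), 1))          # 99.8
print(round(fwhm_for_target_df(100.0), 1))   # 19.0
```

Both printed values agree with the slide: 19mm of smoothing gives about 100 effective df.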


Effect, E i

Sd, S

i

T stat, E i / S i

-1

0

1 MULTISTAT

0

0.1

0.2

-5

0

5

Final result: 19mm smoothing, 100 effective df …

… less noisy sd:

… and T>4.93 for P<0.05 (corrected):

… and now we can detect a response!

P-values assessed for:
• Peaks or local maxima
• Spatial extent of clusters of neighbouring voxels above a pre-chosen threshold (~3)

• Correct for searching over a pre-specified region (usually the whole brain), which depends on:
  – number of voxels in the search region (Bonferroni), or
  – number of resels = volume / FWHM³ in the search region (random field theory)
  – in practice, take the minimum of the two!
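A sketch of the "minimum of the two" rule for a local maximum, using only the standard library. It assumes a Gaussian (rather than T) random field and keeps only the volume (3D) term of the expected Euler characteristic plus the 0D term, ignoring boundary corrections; the search-region sizes in the example are hypothetical.

```python
import math
from statistics import NormalDist

def p_bonferroni(t, n_voxels):
    """Bonferroni P-value for the maximum of n_voxels Gaussian statistics."""
    return n_voxels * (1.0 - NormalDist().cdf(t))

def p_random_field(t, resels):
    """Expected Euler characteristic approximation for a 3D Gaussian field."""
    rho0 = 1.0 - NormalDist().cdf(t)
    rho3 = ((4 * math.log(2)) ** 1.5 / (2 * math.pi) ** 2
            * (t ** 2 - 1) * math.exp(-t ** 2 / 2))
    return rho0 + resels * rho3

def p_corrected(t, n_voxels, resels):
    """In practice, take the minimum of the two corrections (capped at 1)."""
    return min(1.0, p_bonferroni(t, n_voxels), p_random_field(t, resels))

# Hypothetical search region: 100000 voxels, 500 resels.
p_local_max = p_corrected(4.5, n_voxels=100000, resels=500)
```

Bonferroni wins when the data are rough (few voxels per resel); random field theory wins when they are smooth.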

FWHM is spatially varying (non-isotropic)

• fMRI data is smoother in GM than WM
• VBM data is highly non-isotropic
• Has little effect on P-values for local maxima (use 'average' FWHM inside search region), but
• Has a big effect on P-values for spatial extents: smooth regions → big clusters, rough regions → small clusters, so
• Replace cluster volume by cluster resels = volume / FWHM³

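FWHM – the local smoothness of the noise

$\mathrm{FWHM} = \dfrac{(2\log 2)^{1/2} \times \text{voxel size}}{(1 - \text{correlation})^{1/2}}$

where correlation is the correlation of the residuals between adjacent voxels. (If the noise is modeled as white noise smoothed with a Gaussian kernel, this would be its FWHM.)

$\text{resels} = \dfrac{\text{volume}}{\mathrm{FWHM}^3}$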
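A plain-Python sketch of these formulas. For Gaussian-smoothed white noise the exact relation has $-\log(\text{correlation})$ in place of $1 - \text{correlation}$; the form above is its approximation for correlation near 1. Summing voxel volume / FWHM³ over a cluster with spatially varying FWHM is an assumed way of computing cluster resels.

```python
import math

def fwhm_from_correlation(correlation, voxel_size):
    """Local FWHM from the correlation of residuals at adjacent voxels."""
    return math.sqrt(2 * math.log(2)) * voxel_size / math.sqrt(1 - correlation)

def resels(volume, fwhm):
    """Resels = volume / FWHM^3 (volume in mm^3, FWHM in mm)."""
    return volume / fwhm ** 3

def cluster_resels(local_fwhms, voxel_volume):
    """Cluster resels with spatially varying FWHM: sum of voxel volume / local FWHM^3."""
    return sum(voxel_volume / f ** 3 for f in local_fwhms)

# Hypothetical values: 2mm voxels whose residuals correlate at 0.9.
fwhm = fwhm_from_correlation(0.9, 2.0)   # about 7.45 mm
search_resels = resels(1_000_000, fwhm)  # a 1000 cm^3 search region
```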

P-values depend on resels:

[Figure, two panels: P value of local max (0 to 0.1) against resels of search volume (0 to 1000), for local maximum T = 4.5; P value of cluster (0 to 0.1) against resels of cluster (0 to 2), for clusters above t = 3.0 with search volume resels = 500.]

[Figure: FWHM (mm) of scans (110 df); FWHM (mm) of effects (3 df); FWHM of effects (smoothed), all on a 0 to 20mm scale; effects / scans FWHM (smoothed), 0.5 to 1.5. Two clusters are marked: resels = 1.90, P = 0.007; resels = 0.57, P = 0.387.]

Statistical summary: clusters

clus     vol   resel   p-val   (one)
   1   33992   54.22   0       (0)
   2   14150   25.03   0       (0)
   3   12382   20.29   0       (0)
   4    2538    3.12   0.011   (0.001)
   5    2538    2.77   0.016   (0.001)
   6    1577    2.15   0.035   (0.002)
   7    1000    1.43   0.098   (0.006)
   8     500    1.31   0.119   (0.007)
   9    1000    1.07   0.179   (0.011)
  10     385    0.99   0.208   (0.013)
   …

Statistical summary: peaks

clus   peak  p-val  (one)    q-val  (i  j  k)   (x      y      z)
   1  12.72  0      (0)      0      (59 74  1)  ( 10.5  -28.7   24.1)
   1  12.58  0      (0)      0      (60 75  1)  (  8.2  -31     23.7)
   1  11.45  0      (0)      0      (61 73  2)  (  5.9  -25.3   17.5)
   1  11.08  0      (0)      0      (62 66  4)  (  3.5   -6.9    6.3)
   1  10.95  0      (0)      0      (61 70  4)  (  5.9  -16.2    4.8)
   1  10.6   0      (0)      0      (62 69  3)  (  3.5  -15     12.1)
   …
   2   5.07  0.029  (0.004)  0      (48 69 10)  ( 36.3   -7.3  -36.3)
   3   5.06  0.029  (0.004)  0      (73 72  9)  (-22.3  -15.3  -30.5)
   3   5.03  0.033  (0.004)  0      (81 63 10)  (-41      6.6  -34.1)
  13   5.02  0.035  (0.005)  0      (88 72  8)  (-57.4  -16.4  -23.6)
   6   4.91  0.054  (0.007)  0      (42 69  3)  ( 50.4  -15     12.1)
  11   4.91  0.055  (0.007)  0      (69 70  7)  (-12.9  -12.9  -15.9)
   9   4.91  0.055  (0.007)  0      (48 46  5)  ( 36.3   40.5    6.7)
   1   4.85  0.069  (0.008)  0      (52 93  2)  ( 27    -71.6   10.2)
   3   4.82  0.08   (0.009)  0      (79 66  8)  (-36.3   -2.5  -21.4)
   3   4.81  0.082  (0.009)  0      (78 65  8)  (-34     -0.2  -21)
   1   4.8   0.086  (0.01)   0      (62 59  5)  (  3.5   10.4    1.9)
   3   4.77  0.097  (0.011)  0      (82 61 10)  (-43.4   11.2  -33.4)
   1   4.75  0.106  (0.012)  0      (55 71  2)  ( 19.9  -20.7   18.3)
   5   4.73  0.114  (0.012)  0      (67 84  2)  ( -8.2  -50.8   13.5)
   …

[Figure: thresholded T maps, labelled T > 4.86 and T > 4.93 (P < 0.05, corrected).]

Efficiency: optimum block design

[Figure, four panels against stimulus duration (secs, 5 to 20) and interstimulus interval (secs, 5 to 20): sd of hot stimulus and sd of hot-warm, for the magnitude (0 to 0.5) and for the delay (0 to 1, secs). X marks the optimum design; some regions are labelled 'not enough signal'.]

Efficiency: optimum event design

[Figure: sd of effect (secs for delays; 0 to 0.5) against average time between events (secs, 5 to 20), for uniform, random and concentrated event designs; solid lines show magnitudes, dotted lines show delays; some designs are labelled 'not enough signal'.]

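How many subjects?

• Largest portion of variance comes from the last stage, i.e. combining over subjects:

$\dfrac{\mathrm{sd}_\mathrm{run}^2}{n_\mathrm{run}\, n_\mathrm{sess}\, n_\mathrm{subj}} + \dfrac{\mathrm{sd}_\mathrm{sess}^2}{n_\mathrm{sess}\, n_\mathrm{subj}} + \dfrac{\mathrm{sd}_\mathrm{subj}^2}{n_\mathrm{subj}}$

• If you want to optimize total scanner time, take more subjects.
• What you do at early stages doesn't matter very much!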
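The variance formula can be used to compare ways of spending the same scanner time. This plain-Python sketch uses hypothetical variance components and compares 16 runs spread as 4 runs x 2 sessions x 2 subjects against 1 run x 1 session x 16 subjects.

```python
def group_variance(sd_run, sd_sess, sd_subj, n_run, n_sess, n_subj):
    """Variance of the group effect in a run / session / subject design."""
    return (sd_run ** 2 / (n_run * n_sess * n_subj)
            + sd_sess ** 2 / (n_sess * n_subj)
            + sd_subj ** 2 / n_subj)

# Hypothetical variance components; both designs use 16 runs in total.
few_subjects = group_variance(1.0, 0.5, 0.5, n_run=4, n_sess=2, n_subj=2)    # 0.25
many_subjects = group_variance(1.0, 0.5, 0.5, n_run=1, n_sess=1, n_subj=16)  # 0.09375
```

Only the subject term is divided by n_subj alone, so adding subjects reduces every term at once.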


Estimating the delay of the response

• Delay or latency to the peak of the HRF is approximated by a linear combination of two optimally chosen basis functions:

$\mathrm{HRF}(t + \text{shift}) \approx \text{basis}_1(t)\, w_1(\text{shift}) + \text{basis}_2(t)\, w_2(\text{shift})$

• Convolve bases with the stimulus, then add to the linear model.

[Figure, two panels: basis1, basis2 and the HRF against t (seconds, -5 to 25), with the shift and the delay marked; $w_1$, $w_2$ and $w_2 / w_1$ against shift (seconds, -5 to 5).]

• Fit linear model, estimate $w_1$ and $w_2$.
• Equate $w_2 / w_1$ to estimates, then solve for shift (Henson et al., 2002).
• To reduce bias when the magnitude is small, use

$\text{shift} / (1 + 1/T^2)$

where $T = w_1 / \mathrm{Sd}(w_1)$ is the T statistic for the magnitude.
• Shrinks shift to 0 where there is little evidence for a response.
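A plain-Python sketch of the last two steps: invert a tabulated, increasing curve $w_2(\text{shift}) / w_1(\text{shift})$ by linear interpolation, then apply the shrinkage $\text{shift} / (1 + 1/T^2)$. The table (`shifts`, `ratios`) is hypothetical and linear; in practice it would be computed from the two basis functions, which are not reproduced here.

```python
from bisect import bisect_left

def shift_from_ratio(ratio, shifts, ratios):
    """Solve w2/w1 = ratio for the shift by interpolating an increasing table."""
    if not ratios[0] <= ratio <= ratios[-1]:
        raise ValueError("ratio outside the tabulated range")
    i = bisect_left(ratios, ratio)
    if i == 0:
        return shifts[0]
    r0, r1 = ratios[i - 1], ratios[i]
    s0, s1 = shifts[i - 1], shifts[i]
    return s0 + (s1 - s0) * (ratio - r0) / (r1 - r0)

def shrunk_shift(w1, w2, sd_w1, shifts, ratios):
    """Shift estimate shrunk towards 0 when the magnitude T = w1 / Sd(w1) is small."""
    shift = shift_from_ratio(w2 / w1, shifts, ratios)
    T = w1 / sd_w1
    return shift / (1 + 1 / T ** 2)

# Hypothetical lookup table of w2/w1 as a function of shift (seconds).
shifts = [-5.0, -2.5, 0.0, 2.5, 5.0]
ratios = [-2.0, -1.0, 0.0, 1.0, 2.0]
strong = shrunk_shift(1.0, 0.4, 0.25, shifts, ratios)  # T = 4: little shrinkage
weak = shrunk_shift(1.0, 0.4, 1.0, shifts, ratios)     # T = 1: shift halved
```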

[Figure, shift of the hot stimulus: T stat for magnitude; T stat for shift; shift (secs); sd of shift (secs). Annotations: ~1 sec +/- 0.5 sec; T > 4; T ~ 2.]


Combining shifts of the hot stimulus (contours are T stat for magnitude > 4)

[Figure, runs and their MULTISTAT combination: effect, $E_i$ (shift, -4 to 4 secs); sd, $S_i$ (0 to 2); T stat, $E_i / S_i$ (-5 to 5). Final panel: shift (secs) of the hot stimulus where T stat for magnitude > 4.93.]


False Discovery Rate (FDR)

Benjamini and Hochberg (1995), Journal of the Royal Statistical Society
Benjamini and Yekutieli (2001), Annals of Statistics
Genovese et al. (2001), NeuroImage

• FDR controls the expected proportion of false positives amongst the discoveries, whereas

• Bonferroni / random field theory controls the probability of any false positives

• No correction controls the proportion of false positives in the volume

[Figure: signal; signal + Gaussian white noise; noise; with true + and false + marked under three thresholds:
P < 0.05 (uncorrected), T > 1.64: 5% of volume is false +
FDR < 0.05, T > 2.82: 5% of discoveries is false +
P < 0.05 (corrected), T > 4.22: 5% probability of any false +]

• FDR depends on the ordered P-values: $P_1 < P_2 < \dots < P_n$. To control the FDR at $\alpha = 0.05$, find $K = \max\{i : P_i < (i/n)\,\alpha\}$, then threshold the P-values at $P_K$:

  Proportion of true +   1      0.1    0.01   0.001   0.0001
  Threshold T            1.64   2.56   3.28   3.88    4.41

• Bonferroni thresholds the P-values at $\alpha/n$:

  Number of voxels       1      10     100    1000    10000
  Threshold T            1.64   2.58   3.29   3.89    4.42

• Random field theory: resels = volume / FWHM³:

  Number of resels       0      1      10     100     1000
  Threshold T            1.64   2.82   3.46   4.09    4.65
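A plain-Python sketch of the first two rules above. `fdr_threshold` applies $K = \max\{i : P_i < (i/n)\,\alpha\}$ (Benjamini and Hochberg); `bonferroni_t_threshold` converts $\alpha/n$ into a one-sided threshold, assuming a Gaussian rather than a t distribution, which reproduces the Bonferroni row of the table.

```python
from statistics import NormalDist

def fdr_threshold(p_values, alpha=0.05):
    """Benjamini-Hochberg: P_(K) for the largest K with P_(K) < (K/n) alpha, else None."""
    p_sorted = sorted(p_values)
    n = len(p_sorted)
    threshold = None
    for i, p in enumerate(p_sorted, start=1):
        if p < i / n * alpha:
            threshold = p
    return threshold

def bonferroni_t_threshold(n_voxels, alpha=0.05):
    """One-sided Gaussian threshold corresponding to P-values at alpha / n."""
    return NormalDist().inv_cdf(1 - alpha / n_voxels)

p = [0.01, 0.02, 0.03, 0.5]
thr = fdr_threshold(p)                           # 0.03
discoveries = [x for x in p if x <= thr]         # the first three P-values
bonf = [round(bonferroni_t_threshold(n), 2) for n in (1, 10, 100, 1000, 10000)]
```

Note that FDR keeps all P-values up to $P_K$, even if some smaller ones individually fail their own $(i/n)\,\alpha$ comparison.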

Comparison of thresholds

P < 0.05 (uncorrected), T > 1.64: 5% of volume is false +
FDR < 0.05, T > 2.67: 5% of discoveries is false +
P < 0.05 (corrected), T > 4.93: 5% probability of any false +

[Figure: maps thresholded at each level (colour scale -6 to 6).]

Conjunction: minimum $T_i$ > threshold

[Figure, two panels (scale 0 to 1): 'Minimum of $T_i$', for P = 0.05, threshold = 1.82; 'Average of $T_i$', for P = 0.05, threshold = 4.93.]

Efficiency = 82%

Functional connectivity

• Measured by the correlation between residuals at every pair of voxels (6D data!)
• Local maxima are larger than all 12 neighbours
• P-value can be calculated using random field theory
• Good at detecting focal connectivity, but
• PCA of residuals x voxels is better at detecting large regions of co-correlated voxels

[Figure: scatter plots of voxel 2 against voxel 1 for 'activation only' and 'correlation only'; maps of first principal component > threshold and of |correlations| > 0.7, $P < 10^{-10}$ (corrected).]
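A small sketch of the two approaches, assuming numpy and a hypothetical residual matrix (scans x voxels). It computes the correlation between residuals at every pair of voxels, lists pairs with |correlation| > 0.7 (the level shown in the figure), and takes the first principal component of residuals x voxels. It does not search for local maxima over the 12 neighbours or compute random field theory P-values.

```python
import numpy as np

def connectivity(resid, threshold=0.7):
    """Pairwise residual correlations, strongly correlated pairs, and the first PC.

    resid: array of shape (n_scans, n_voxels) of linear-model residuals.
    """
    corr = np.corrcoef(resid, rowvar=False)      # n_voxels x n_voxels
    i, j = np.triu_indices_from(corr, k=1)        # each voxel pair once
    strong = [(a, b, corr[a, b]) for a, b in zip(i, j) if abs(corr[a, b]) > threshold]
    # PCA of residuals x voxels: loadings of the first spatial component.
    centred = resid - resid.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return corr, strong, vt[0]

# Hypothetical residuals: voxels 0 and 1 share a common fluctuation.
rng = np.random.default_rng(5)
common = rng.normal(size=(200, 1))
resid = rng.normal(size=(200, 6))
resid[:, :2] += 3 * common
corr, strong, pc1 = connectivity(resid)
```

With only a handful of voxels the full correlation matrix is cheap; over a whole brain the number of voxel pairs is what makes this 6D search expensive.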
