spm-like statistical analysis of fmri data -...

SPM-Like Statistical Analysis of fMRI Data

Part II: Between-Subject Analysis

Alexis Roche

JIRFNI 2009, Marseille

2

Outline

1. Types of inferences in fMRI

2. Parametric tests

3. Nonparametric tests

4. Robust statistics

5. Mixed-effect models

6. Sphericity correction

3

Outline


2. Parametric tests





4

Group Analysis Goal

● Which functional networks are commonly activated in response to a given task?

5

Processing pipeline

AnatomicalTemplate

Subject 1

Spatialsmoothing

Nonrigidmatching

Subject 2

Spatialsmoothing

Nonrigidmatching

… …

BOLDestimation

Contrast images

BOLDestimation

Group analysis input

JIRFNI 2009, Marseille26/05/0906/10/07

6

Mass univariate one-sample analysis

Subject 1

…

H0: “mean effect

is zero”

Normalized contrast images

Observations in one voxel

Subject 2

Subject 3

…

7

What kind of analysis?

● Random effect

✗ (H0) “the mean effect vanishes in the population of interest”

● Fixed effect

✗ (H0) “the mean effect vanishes in this particular cohort”

8

FFX vs. RFX

Subj. 1

Subj. 2

Subj. 3

Subj. 4

Subj. 5

Subj. 60

Distribution of each subject’s BOLD estimate

Distribution of the mean estimate

σFFX

σRFX

FFX: very significant

RFX: bordeline

Source: T. Nichols


9

Inferences in fMRI

Type Fixed-effect(FFX)

Random-effect(RFX)

Single-subject Multi-subject Multi-subject

Goal Study functionalactivity in one

particular subject

Study functionalactivity in one

particular cohort

Study functionalactivity in apopulation

10

Outline


2. Parametric tests






11

One-sample t statistic

● Divide the sample mean by its standard error

=136.91 , std = 2/n=20.37

t= std

Source: www.statsci.org

Rainfalls in Sydney


12

One-sample parametric t-test

● t follows a Student distribution if the observations are normal with zero mean

● Hence reject the null hypothesis of zero mean if the observed t is too large

0 tobs

pvalue=P t≥t obs∣H 0

Student distribution with n-1 degrees of freedom

Reject H0

pvalue≤

Accept H0

yes

no

: accepted false positive rate


13

General parametric t-test

● The t-test is designed to test a contrasted effect in a general linear model

● Used in

✗ Within-subject analyses

✗ FFX analyses

✗ Other RFX analyses

● Generalization to multidimensional contrasts: F-test

H 0 c t=0, Y=X N 0, I


14

Shortcomings

● The parametric t-test assumes normal observations

● If normality doesn't hold,

✗ Biased specificity (false positive rate)

✗ Lack of sensitivity (true positive rate)

● This is a problem for small samples

✗ The Student distribution holds in the limit of large samples even if normality doesn't


15

Specificity bias

● Samples of size 10 drawn from a uniform law

Ground truthStudentPermutations

Specificity bias in multiple testing

● Parametric multiple comparison correction usually uses the Euler Characteristic (EC) approximation

● Validity requirements

✗ Multivariate normal assumption

✗ High threshold

✗ Smooth data

● May produce severe bias!

FWER≈E u∣H 0


17

Lack of sensitivity

● The t statistic is not robust

✗ One single observation can cause t to collapse

t = Inf t = 0

● Outliers may occur more frequently than predicted by the normal distribution


18

Dealing with outliers

● Invalid approach

✗ Drop outliers and run the parametric t test

t = Inf ; P=0. Invalid because statistical independence is broken.

Sign test gives P=17%. Unequal proportions may be observed by chance!

● Valid approach

✗ Use a robust test statistic combined with a permutation test that works with all datapoints

19

Outline


2. Parametric tests






20

Beyond parametric tests

● Specificity bias calls for nonparametric calibration schemes

✗ Permutation tests

✗ Bootstrap tests

● Lack of sensitivity calls for robust test statistics

● Both approaches require relaxing normality


21

Sign permutation test

Observed t

P-value

Histogram of the simulated statistics

...

Original sample

t = 3.76

One observation permuted

t = 1.36

Two observations

permuted

t = 1.25

All observations

permuted

t =-3.76


22

Familywise error correction

Observed t

Corrected P-value

Histogram of the tmax statistics ...

Original sample

tmax = 4.95

t map

... tmax = 4.63

t mapOne subject permuted

…... tmax = 4.17

t mapTwo subjects permuted

... tmax = 2.93

t mapAll subjects permuted


23

Sign permutation test

● Conservative (exact in a conditional sense)

● Works under mild assumptions

✗ One-sided tests require distribution symmetry (more general than normality)

● Manages multiple comparisons

✗ Alternative to random field theory

Distancewww.madic.org/download/

SnPMwww.sph.umich.edu/ni-stat/SnPM/


24

Other permutation tests

● Sign permutations are suited for one-sample RFX

● Two-sample RFX

✗ Shuffle group labels

● Within-subject or FFX

✗ Shuffle experimental conditions

CamBAwww-bmu.psychiatry.cam.ac.uk/software/

SnPMwww.sph.umich.edu/ni-stat/SnPM/


25

Bootstrap test

● Similar to the permutation test, except it uses another resampling method

Subtract the mean

Draws with replacement

…


26

Bootstrap test

● Approximate (not necessarily conservative)

● Asymptotically exact under weaker assumptions than the permutation test

● Manages multiple comparisons

27

Outline


2. Parametric tests






28

Robust test statistics

● The t statistic has optimal detection power for normally distributed data

● Other statistics may perform better in distributions that tend to produce outliers

✗ Median

✗ Sign statistic

✗ Wilcoxon signed rank

✗ Empirical likelihood ratio

✗ ...


29

ROC curves● Detection power in a normal distribution (sampling

10 subjects)

Student statisticWilcoxon statistic


30

ROC curves● Detection power in a Laplace distribution (sampling

10 subjects)



31

ROC curves● Detection power in a heavy-tailed distribution

(sampling 10 subjects)



32

Median

● Highly robust location estimator

✗ Resistant to contamination by (n-1)/2 outliers

median = 0.27 median = 0.27


33

Sign statistic

● Number of positive observations

● Similar to the median

● Permutation distribution data-independent (Binomial)

t s=Card {i ,i0}

Binomial law for 10 subjects


34

Wilcoxon signed rank

tw=∑irank ∣i∣ signi

10 subjects

● More sensitive than the sign statistic in moderate-tailed distributions

● Permutation distribution is also data-independent


35

Iterated reweighted least squares

● Algorithm sketch

C =∑i wii−2

Weight each observation according to its “closeness” to the current mean estimate

Update the estimate by minimizing the weighted least squares

Start with a robust mean estimate (e.g. the median)

● From there, compute a robust t statistic by analogy with ordinary least squares

● Not evaluated yet within permutation tests


36

Speech activation in three-month-olds

Cluster level:

Pcorr. = 0.34

Cluster level:

Pcorr. = 0.08

Cluster level:

Pcorr. = 0.02

✔ 10 subjects✔ Single threshold tests at P=0.01 (uncorrected)

Parametric t test (SPM) Permutation t test Wilcoxon test

Courtesy of Ghislaine Dehaene

Broca’s area

37

Outline


2. Parametric tests





Mixed effects

● Observed effects have two distinct variability sources

✗ Within-subject (errors in first-level analysis)

✗ Between-subject (intrinsic variability of BOLD responses)

● Both sources can produce outliers

✗ Artifactual outliers

✗ Behavioral outliers

● Mixed-effect statistics are robust to artifactual outliers


39

Two-level linear model

● Accounts for mixed effects

First level

βG

β1

βi

βn

Y1

Yi

Yn

Unknown populationmean effect

Unknown individualeffects Observed fMRI data

Second level

i=X GG Y i=X i ii

● Equivalent to a generalized linear model with heteroscedastic noise


40

Two-level linear model

● Full mixed-effect approach

✗ Estimate βG based on the raw scans

✗ Implemented in FSL, SPM, ...

● Simplification: sequential approach

✗ Run first-level analyses, then estimate βG based on non-exhaustive summary statistics

✗ Implemented in fMRIstat/NiPy, Distance, ...


41

Mixed-effect statistics

● Exploit the first-level variances of the observed effects

● The higher the variance, the less reliable a subject

Subj 1

First-level summary statistics

Subj 2

…

i

Test statistic

std i


42

Expectation-Maximization algorithm

● Estimate the population mean βG and variance σ

G

from the (effect+variance) observations

E-step

Estimate the true effects and their posterior variances,

M-step

Update the population mean and variance estimates,

● Then, form a mixed-effect version of the t statistic

● Many variants exist

i=G

2

G2 i

2i

i2

G2 i

2 G , i2=

G2 i

2

G2 i

2 G=1n∑i

i , G2 =

1n∑i

[i2G−i

2]


43

FIAC’05 dataset✔ 15 subjects✔ Sentence x Speaker interaction✔ Single threshold tests at P=0.01 (uncorrected)

Parametric t test (SPM) Permutation t test Permutation MFX test

Right Superior Temporal Sulcus

Cluster level:

Pcorr. = 0.13

Cluster level:

Pcorr. = 0.09

Cluster level:

Pcorr. = 0.02


44

Two-sample testing

● Mixed-effect t statistic derived similarly to the one-sample case

● Permutation test by label switching


45

Localizer dataset✔ Two-sample test: females > males (10 subjects in each group)✔ Mental calculus task✔ Single threshold tests at P=0.01 (uncorrected)

Parametric t test (SPM) Permutation t test Permutation MFX test

Left Intra-Parietal Sulcus

Cluster level:

Pcorr. = 0.01

Cluster level:

Pcorr. = 0.13

Cluster level:

Pcorr. = 0.09

46

Outline


2. Parametric tests





47

Linear models for group

● Spherical GLM: linear prediction + i.i.d. errors

Y=X , ~N 0,2 In

Xβ

48

The issue of sphericity

“Since the data obtained in many areas of psychological inquiry are not likely to conform to these requirements … researchers using the conventional procedure will erroneously claim treatment effects when none are present, thus filling their literatures with false positive claims.” – Keselman et al. 2001

49

Examples of non-spherical models

● Two-sample analysis under unequal variance

● Mixed effects

● Repeated measures ANOVA

50

Two-sample model1st level:

2nd level:

Controls Blinds

XV

Source: W. Penny

51

One-sample mixed-effect model

Subject 1

Subject 2

...

...

...

...0

σ1

σ

σ2V=2 I ndiag 1 ,

2 ... ,n2

Population

V

1st level:

2nd level:

JIRFNI 2009, Marseille26/05/09

Repeated Measures ANOVA

Contrasted effects

Effect covariances

BaselineDrug 1 Drug 2

Subjects

Placebo

Drug1/Drug2 Drug1/Placebo Drug1/Baseline

Drug2/Placebo Drug2/Baseline

Placebo/Baseline


Non-spherical linear model

Y= X Subject 1Subject 2

Subject nSubject 1Subject 2



Subject n

Drug 1

Drug 2

Placebo

Baseline

Variance matrix

Design matrix


Non-sphericity consequences

● Specificity issue

✗ t (F) statistics deviate from expected Student (Fisher) distributions

● Sensitivity issue

✗ t (F) statistics have sub-optimal detection power


Non-sphericity correction (1)

● Estimate β by ordinary least squares

● Estimate V by sample variance

● Use Satterthwaite's degrees of freedom approximation

DOF≈ traceQ V 2

trace Q V Q V Q=In−X X t X −1 X t


Non-sphericity correction (2)

● Estimate hyperparameters of V by restricted maximum likelihood (ReML)

● Compute usual statistics from V-1/2Y and V-1/2X

V−1 /2Y=V−1 /2 X , ~N 0, InFiltered data Filtered predictors


Global/local correction

● SPM performs global sphericity correction

✗ Implicit assumption that the variance structure is constant across voxels

✗ Repeated measures ANOVA are then inconsistent with one-sample tests


Utilisation dans BrainVisa

spm-like statistical analysis of fmri data -...

Documents