spm-like statistical analysis of fmri data -...
TRANSCRIPT
SPM-Like Statistical Analysis of fMRI Data
Part II: Between-Subject Analysis
Alexis Roche
JIRFNI 2009, Marseille
2
Outline
1. Types of inferences in fMRI
2. Parametric tests
3. Nonparametric tests
4. Robust statistics
5. Mixed-effect models
6. Sphericity correction
3
Outline
1. Types of inferences in fMRI
2. Parametric tests
3. Nonparametric tests
4. Robust statistics
5. Mixed-effect models
6. Sphericity correction
4
Group Analysis Goal
● Which functional networks are commonly activated in response to a given task?
5
Processing pipeline
AnatomicalTemplate
Subject 1
Spatialsmoothing
Nonrigidmatching
Subject 2
Spatialsmoothing
Nonrigidmatching
… …
BOLDestimation
Contrast images
BOLDestimation
Group analysis input
JIRFNI 2009, Marseille26/05/0906/10/07
6
Mass univariate one-sample analysis
Subject 1
…
H0: “mean effect
is zero”
Normalized contrast images
Observations in one voxel
Subject 2
Subject 3
…
7
What kind of analysis?
● Random effect
✗ (H0) “the mean effect vanishes in the population of interest”
● Fixed effect
✗ (H0) “the mean effect vanishes in this particular cohort”
8
FFX vs. RFX
Subj. 1
Subj. 2
Subj. 3
Subj. 4
Subj. 5
Subj. 60
Distribution of each subject’s BOLD estimate
Distribution of the mean estimate
σFFX
σRFX
FFX: very significant
RFX: bordeline
Source: T. Nichols
JIRFNI 2009, Marseille26/05/0906/10/07
9
Inferences in fMRI
Type Fixed-effect(FFX)
Random-effect(RFX)
Single-subject Multi-subject Multi-subject
Goal Study functionalactivity in one
particular subject
Study functionalactivity in one
particular cohort
Study functionalactivity in apopulation
10
Outline
1. Types of inferences in fMRI
2. Parametric tests
3. Nonparametric tests
4. Robust statistics
5. Mixed-effect models
6. Sphericity correction
JIRFNI 2009, Marseille26/05/0906/10/07
11
One-sample t statistic
● Divide the sample mean by its standard error
=136.91 , std = 2/n=20.37
t= std
Source: www.statsci.org
Rainfalls in Sydney
JIRFNI 2009, Marseille26/05/0906/10/07
12
One-sample parametric t-test
● t follows a Student distribution if the observations are normal with zero mean
● Hence reject the null hypothesis of zero mean if the observed t is too large
0 tobs
pvalue=P t≥t obs∣H 0
Student distribution with n-1 degrees of freedom
Reject H0
pvalue≤
Accept H0
yes
no
: accepted false positive rate
JIRFNI 2009, Marseille26/05/0906/10/07
13
General parametric t-test
● The t-test is designed to test a contrasted effect in a general linear model
● Used in
✗ Within-subject analyses
✗ FFX analyses
✗ Other RFX analyses
● Generalization to multidimensional contrasts: F-test
H 0 c t=0, Y=X N 0, I
JIRFNI 2009, Marseille26/05/0906/10/07
14
Shortcomings
● The parametric t-test assumes normal observations
● If normality doesn't hold,
✗ Biased specificity (false positive rate)
✗ Lack of sensitivity (true positive rate)
● This is a problem for small samples
✗ The Student distribution holds in the limit of large samples even if normality doesn't
JIRFNI 2009, Marseille26/05/0906/10/07
15
Specificity bias
● Samples of size 10 drawn from a uniform law
Ground truthStudentPermutations
Specificity bias in multiple testing
● Parametric multiple comparison correction usually uses the Euler Characteristic (EC) approximation
● Validity requirements
✗ Multivariate normal assumption
✗ High threshold
✗ Smooth data
● May produce severe bias!
FWER≈E u∣H 0
JIRFNI 2009, Marseille26/05/0906/10/07
17
Lack of sensitivity
● The t statistic is not robust
✗ One single observation can cause t to collapse
t = Inf t = 0
● Outliers may occur more frequently than predicted by the normal distribution
JIRFNI 2009, Marseille26/05/0906/10/07
18
Dealing with outliers
● Invalid approach
✗ Drop outliers and run the parametric t test
t = Inf ; P=0. Invalid because statistical independence is broken.
Sign test gives P=17%. Unequal proportions may be observed by chance!
● Valid approach
✗ Use a robust test statistic combined with a permutation test that works with all datapoints
19
Outline
1. Types of inferences in fMRI
2. Parametric tests
3. Nonparametric tests
4. Robust statistics
5. Mixed-effect models
6. Sphericity correction
JIRFNI 2009, Marseille26/05/0906/10/07
20
Beyond parametric tests
● Specificity bias calls for nonparametric calibration schemes
✗ Permutation tests
✗ Bootstrap tests
● Lack of sensitivity calls for robust test statistics
● Both approaches require relaxing normality
JIRFNI 2009, Marseille26/05/0906/10/07
21
Sign permutation test
Observed t
P-value
Histogram of the simulated statistics
...
Original sample
t = 3.76
One observation permuted
t = 1.36
Two observations
permuted
t = 1.25
All observations
permuted
t =-3.76
JIRFNI 2009, Marseille26/05/0906/10/07
22
Familywise error correction
Observed t
Corrected P-value
Histogram of the tmax statistics ...
Original sample
tmax = 4.95
t map
... tmax = 4.63
t mapOne subject permuted
…... tmax = 4.17
t mapTwo subjects permuted
... tmax = 2.93
t mapAll subjects permuted
JIRFNI 2009, Marseille26/05/0906/10/07
23
Sign permutation test
● Conservative (exact in a conditional sense)
● Works under mild assumptions
✗ One-sided tests require distribution symmetry (more general than normality)
● Manages multiple comparisons
✗ Alternative to random field theory
Distancewww.madic.org/download/
SnPMwww.sph.umich.edu/ni-stat/SnPM/
JIRFNI 2009, Marseille26/05/0906/10/07
24
Other permutation tests
● Sign permutations are suited for one-sample RFX
● Two-sample RFX
✗ Shuffle group labels
● Within-subject or FFX
✗ Shuffle experimental conditions
CamBAwww-bmu.psychiatry.cam.ac.uk/software/
SnPMwww.sph.umich.edu/ni-stat/SnPM/
JIRFNI 2009, Marseille26/05/0906/10/07
25
Bootstrap test
● Similar to the permutation test, except it uses another resampling method
Subtract the mean
Draws with replacement
…
JIRFNI 2009, Marseille26/05/0906/10/07
26
Bootstrap test
● Approximate (not necessarily conservative)
● Asymptotically exact under weaker assumptions than the permutation test
● Manages multiple comparisons
27
Outline
1. Types of inferences in fMRI
2. Parametric tests
3. Nonparametric tests
4. Robust statistics
5. Mixed-effect models
6. Sphericity correction
JIRFNI 2009, Marseille26/05/0906/10/07
28
Robust test statistics
● The t statistic has optimal detection power for normally distributed data
● Other statistics may perform better in distributions that tend to produce outliers
✗ Median
✗ Sign statistic
✗ Wilcoxon signed rank
✗ Empirical likelihood ratio
✗ ...
JIRFNI 2009, Marseille26/05/0906/10/07
29
ROC curves● Detection power in a normal distribution (sampling
10 subjects)
Student statisticWilcoxon statistic
JIRFNI 2009, Marseille26/05/0906/10/07
30
ROC curves● Detection power in a Laplace distribution (sampling
10 subjects)
Student statisticWilcoxon statistic
JIRFNI 2009, Marseille26/05/0906/10/07
31
ROC curves● Detection power in a heavy-tailed distribution
(sampling 10 subjects)
Student statisticWilcoxon statistic
JIRFNI 2009, Marseille26/05/0906/10/07
32
Median
● Highly robust location estimator
✗ Resistant to contamination by (n-1)/2 outliers
median = 0.27 median = 0.27
JIRFNI 2009, Marseille26/05/0906/10/07
33
Sign statistic
● Number of positive observations
● Similar to the median
● Permutation distribution data-independent (Binomial)
t s=Card {i ,i0}
Binomial law for 10 subjects
JIRFNI 2009, Marseille26/05/0906/10/07
34
Wilcoxon signed rank
tw=∑irank ∣i∣ signi
10 subjects
● More sensitive than the sign statistic in moderate-tailed distributions
● Permutation distribution is also data-independent
JIRFNI 2009, Marseille26/05/0906/10/07
35
Iterated reweighted least squares
● Algorithm sketch
C =∑i wii−2
Weight each observation according to its “closeness” to the current mean estimate
Update the estimate by minimizing the weighted least squares
Start with a robust mean estimate (e.g. the median)
● From there, compute a robust t statistic by analogy with ordinary least squares
● Not evaluated yet within permutation tests
JIRFNI 2009, Marseille26/05/0906/10/07
36
Speech activation in three-month-olds
Cluster level:
Pcorr. = 0.34
Cluster level:
Pcorr. = 0.08
Cluster level:
Pcorr. = 0.02
✔ 10 subjects✔ Single threshold tests at P=0.01 (uncorrected)
Parametric t test (SPM) Permutation t test Wilcoxon test
Courtesy of Ghislaine Dehaene
Broca’s area
37
Outline
1. Types of inferences in fMRI
2. Parametric tests
3. Nonparametric tests
4. Robust statistics
5. Mixed-effect models
6. Sphericity correction
Mixed effects
● Observed effects have two distinct variability sources
✗ Within-subject (errors in first-level analysis)
✗ Between-subject (intrinsic variability of BOLD responses)
● Both sources can produce outliers
✗ Artifactual outliers
✗ Behavioral outliers
● Mixed-effect statistics are robust to artifactual outliers
JIRFNI 2009, Marseille26/05/0906/10/07
39
Two-level linear model
● Accounts for mixed effects
First level
βG
β1
βi
βn
Y1
Yi
Yn
Unknown populationmean effect
Unknown individualeffects Observed fMRI data
Second level
i=X GG Y i=X i ii
● Equivalent to a generalized linear model with heteroscedastic noise
JIRFNI 2009, Marseille26/05/0906/10/07
40
Two-level linear model
● Full mixed-effect approach
✗ Estimate βG based on the raw scans
✗ Implemented in FSL, SPM, ...
● Simplification: sequential approach
✗ Run first-level analyses, then estimate βG based on non-exhaustive summary statistics
✗ Implemented in fMRIstat/NiPy, Distance, ...
JIRFNI 2009, Marseille26/05/0906/10/07
41
Mixed-effect statistics
● Exploit the first-level variances of the observed effects
● The higher the variance, the less reliable a subject
Subj 1
First-level summary statistics
Subj 2
…
i
Test statistic
std i
JIRFNI 2009, Marseille26/05/0906/10/07
42
Expectation-Maximization algorithm
● Estimate the population mean βG and variance σ
G
from the (effect+variance) observations
E-step
Estimate the true effects and their posterior variances,
M-step
Update the population mean and variance estimates,
● Then, form a mixed-effect version of the t statistic
● Many variants exist
i=G
2
G2 i
2i
i2
G2 i
2 G , i2=
G2 i
2
G2 i
2 G=1n∑i
i , G2 =
1n∑i
[i2G−i
2]
JIRFNI 2009, Marseille26/05/0906/10/07
43
FIAC’05 dataset✔ 15 subjects✔ Sentence x Speaker interaction✔ Single threshold tests at P=0.01 (uncorrected)
Parametric t test (SPM) Permutation t test Permutation MFX test
Right Superior Temporal Sulcus
Cluster level:
Pcorr. = 0.13
Cluster level:
Pcorr. = 0.09
Cluster level:
Pcorr. = 0.02
JIRFNI 2009, Marseille26/05/0906/10/07
44
Two-sample testing
● Mixed-effect t statistic derived similarly to the one-sample case
● Permutation test by label switching
JIRFNI 2009, Marseille26/05/0906/10/07
45
Localizer dataset✔ Two-sample test: females > males (10 subjects in each group)✔ Mental calculus task✔ Single threshold tests at P=0.01 (uncorrected)
Parametric t test (SPM) Permutation t test Permutation MFX test
Left Intra-Parietal Sulcus
Cluster level:
Pcorr. = 0.01
Cluster level:
Pcorr. = 0.13
Cluster level:
Pcorr. = 0.09
46
Outline
1. Types of inferences in fMRI
2. Parametric tests
3. Nonparametric tests
4. Robust statistics
5. Mixed-effect models
6. Sphericity correction
47
Linear models for group
● Spherical GLM: linear prediction + i.i.d. errors
Y=X , ~N 0,2 In
Xβ
48
The issue of sphericity
“Since the data obtained in many areas of psychological inquiry are not likely to conform to these requirements … researchers using the conventional procedure will erroneously claim treatment effects when none are present, thus filling their literatures with false positive claims.” – Keselman et al. 2001
49
Examples of non-spherical models
● Two-sample analysis under unequal variance
● Mixed effects
● Repeated measures ANOVA
50
Two-sample model1st level:
2nd level:
Controls Blinds
XV
Source: W. Penny
51
One-sample mixed-effect model
Subject 1
Subject 2
...
...
...
...0
σ1
σ
σ2V=2 I ndiag 1 ,
2 ... ,n2
Population
V
1st level:
2nd level:
JIRFNI 2009, Marseille26/05/09
Repeated Measures ANOVA
Contrasted effects
Effect covariances
BaselineDrug 1 Drug 2
Subjects
Placebo
Drug1/Drug2 Drug1/Placebo Drug1/Baseline
Drug2/Placebo Drug2/Baseline
Placebo/Baseline
JIRFNI 2009, Marseille26/05/09
Non-spherical linear model
Y= X Subject 1Subject 2
Subject nSubject 1Subject 2
Subject nSubject 1Subject 2
Subject nSubject 1Subject 2
Subject n
Drug 1
Drug 2
Placebo
Baseline
Variance matrix
Design matrix
JIRFNI 2009, Marseille26/05/09
Non-sphericity consequences
● Specificity issue
✗ t (F) statistics deviate from expected Student (Fisher) distributions
● Sensitivity issue
✗ t (F) statistics have sub-optimal detection power
JIRFNI 2009, Marseille26/05/09
Non-sphericity correction (1)
● Estimate β by ordinary least squares
● Estimate V by sample variance
● Use Satterthwaite's degrees of freedom approximation
DOF≈ traceQ V 2
trace Q V Q V Q=In−X X t X −1 X t
JIRFNI 2009, Marseille26/05/09
Non-sphericity correction (2)
● Estimate hyperparameters of V by restricted maximum likelihood (ReML)
● Compute usual statistics from V-1/2Y and V-1/2X
V−1 /2Y=V−1 /2 X , ~N 0, InFiltered data Filtered predictors
JIRFNI 2009, Marseille26/05/09
Global/local correction
● SPM performs global sphericity correction
✗ Implicit assumption that the variance structure is constant across voxels
✗ Repeated measures ANOVA are then inconsistent with one-sample tests
JIRFNI 2009, Marseille26/05/09
Utilisation dans BrainVisa
JIRFNI 2009, Marseille26/05/09
Utilisation dans BrainVisa