

MICCAI’09 fMRI data analysis

workshop (Thursday 24th September 2.00 – 5.30 pm):

Statistical modeling and detection issues in

intra- and inter-subject functional MRI data

analysis

proposed by

Bertrand Thirion1,2, Alexis Roche2, Philippe Ciuciu2

and Tom Nichols3

1 INRIA Saclay-Île de France, Parietal Team, NeuroSpin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France

2 Computer-Assisted Neuroimaging Lab, NeuroSpin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France

3 Clinical Imaging Centre, GlaxoSmithKline and FMRIB Centre, University of Oxford, UK


1 Workshop organisation

The fMRI workshop will take place during the MICCAI'09 conference at Imperial College London on September 24th (2:00 pm – 5:30 pm). It has been organized by the following program committee:

– Bertrand Thirion: INRIA research scientist, leader of the Parietal Team (INRIA Saclay – NeuroSpin, CEA Saclay, France).

– Alexis Roche: CEA research scientist, member of the Computer-Assisted Neuroimaging Lab (LNAO4, NeuroSpin, CEA Saclay), France, and currently academic guest at the BIWI Computer Vision Laboratory, ETHZ, Zurich, Switzerland.

– Philippe Ciuciu: CEA research scientist, member of the Computer-Assisted Neuroimaging Lab (LNAO, NeuroSpin, CEA Saclay), France.

– Tom Nichols: Director, Modelling & Genetics, Clinical Imaging Centre, GlaxoSmithKline; Senior Research Fellow, Department of Clinical Neurology, FMRIB Centre, University of Oxford, UK; Adjunct Research Associate Professor, Department of Biostatistics, School of Public Health, University of Michigan, USA.

1.1 Abstract

Functional MRI (fMRI) provides a unique view on brain activity, which is used both for a better understanding of brain functional anatomy and for the assessment of various mental diseases. The analysis of fMRI data entails detection issues, in which it has to be decided whether certain regions show an activity significantly correlated to some variables of interest. This problem can be formulated on an individual dataset (in which case the variable of interest is the experimental paradigm) or on a multi-subject dataset (the variable of interest is then a behavioral, clinical, or genetic factor). Moreover, this problem can be handled as a modeling problem, when addressing the temporal structure of the BOLD response and the various fluctuations observed in fMRI datasets, or when delineating brain regions, especially across individuals, as well as a statistical problem: a typical concern is, for instance, to warrant a certain control over false positives (specificity) for a testing procedure, or to achieve an optimal compromise between sensitivity and specificity by using judicious decision statistics.

While some of these questions may be familiar to the medical imaging community, partly for historical reasons, the neuroimaging community has developed specific contributions to solve these issues, and all the questions mentioned above are still the object of active research. This workshop should be an opportunity to discuss and evaluate several solutions that have been proposed to solve these questions, and to confront different points of view.

4 http://www.lnao.fr


1.2 Preliminary Program

Schedule       Program                                                                                           Speaker
14:00-14:05    Welcome                                                                                           B. Thirion
14:05-14:50    Statistical perspective on fMRI data analysis                                                     T. Nichols
14:50-15:15    Multi-Group Functional MRI Analysis Using Statistical Activation Priors (Paper I)                 D. Bathula
15:15-15:40    Surface-based versus volume-based fMRI group analysis: a case study (Paper VIII)                  A. Tucholka
15:40-15:50    Break
15:50-16:15    CanICA: Model-based extraction of reproducible group-level ICA patterns from fMRI time series (Paper IX)   G. Varoquaux
16:15-16:40    Exploring the temporal quality of fMRI acquisitions (Paper VII)                                   B. Scherrer
16:40-16:50    Flash presentations (poster session)
16:50-17:30    Poster session & discussion

1.3 Panel of reviewers

The organizers are thankful to all reviewers who actively contributed to the timely reviewing process of all paper submissions to this workshop:

– Christian Beckmann
– Philippe Ciuciu
– Rebecca Hutchinson
– Tianzi Jiang
– Pierre Lafaye
– Danial Lashkari
– Gabriele Lohmann
– Vincent Michel
– Bernard Ng
– Gregory Operto
– Will Penny
– Indrayana Rustandi
– Alexis Roche
– Benoît Scherrer
– Lawrence Staib
– Bertrand Thirion
– Gaël Varoquaux
– Mark Woolrich


List of Papers

• Multi-group Functional MRI analysis using statistical activation priors by D. Bathula et al., pp. 5–

• Structural group analysis of brain functional data: assessing result significance by G. Operto et al., pp. 13–

• Nonparametric mean shift functional detection in the functional space for task and resting-state fMRI by J. Cheng et al., pp. 21–

• Adaptive hierarchical Bayesian mixture for sparse regression – an application to brain activity classification by V. Michel et al., pp. 29–

• Random walker based estimation and spatial analysis of probabilistic fMRI activation maps by B. Ng et al., pp. 37–

• Integrating multiple-study multiple-subject fMRI datasets using canonical correlation analysis by I. Rustandi et al., pp. 45–

• Exploring the temporal quality of fMRI acquisitions by B. Scherrer et al., pp. 53–

• Surface-based versus volume-based fMRI group analysis: a case study by A. Tucholka et al., pp. 61–

• CanICA: model-based extraction of reproducible group-level ICA patterns from fMRI time series by G. Varoquaux et al., pp. 69–


Multi-Group Functional MRI Analysis Using Statistical Activation Priors

Deepti R. Bathula1, Lawrence H. Staib1,2,3, Hemant D. Tagare2,3, Xenophon Papademetris1,3, Robert T. Schultz4, and James S. Duncan1,2,3

1 Departments of Biomedical Engineering, 2 Electrical Engineering, 3 Diagnostic Radiology, and 4 Child Study Center,
Yale University, P.O. Box 208042, New Haven, CT 06520, [email protected]

Abstract. Statistical activation priors that learn task-related brain activation patterns from training data have shown great potential for robust and sensitive analysis of individual fMRI data. However, in the context of multiple group fMRI experiments, constructing a training-based prior model needs careful consideration. Specifically, the choice of statistical learning method and the composition of the training set can significantly affect the results. In this paper, we evaluate the performance of PCA and ICA based activation priors when applied to an attention modulation fMRI study with normal and autistic subjects. We also investigate the aptness of these training-based priors to studies with known and unknown subgroups and compare them to standard GLM methods.

1 Introduction

Functional MRI experiments are often performed to infer differences between populations. Consequently, the objective of fMRI group analysis is to extract a good representation of the relationship between brain structure and function across subjects. However, accurate assessment of focal brain activity in individuals is crucial for the success of group analysis.

The functional imaging literature contains a number of methods aimed at reducing false detection of task-related activity resulting from the low signal-to-noise ratio (SNR) in fMRI data. Particularly, strategies that take into account a priori knowledge of functional activations have been very popular. The majority of these prior information guided approaches have incorporated spatial and temporal autocorrelations inherent to fMRI data [1–4] and anatomical information [5, 6]. More recently, inspired by the success enjoyed by statistical shape priors in image segmentation, statistical activation priors [7, 8] have been introduced to fMRI analysis. This latest approach involves learning brain activation patterns (strength, shape and location) from previously conducted multi-subject fMRI studies (training data) to define functionally informed priors for improved analysis of new subjects.


Unlike other spatio-temporal regularization priors, functional activation priors compensate for low SNR by inducing sensitivity to task-related regions of the brain and have been demonstrated to be more robust and sensitive.

In the context of multiple group fMRI experiments, however, two fundamental issues related to training-based prior models need to be addressed:

– In experiments where the classification of subjects is known, is it better to generate a separate prior model for each group or a single prior from the pooled training set?

– In studies where the existence of subgroups or the group membership of subjects is unknown, how well does a prior model generated from the mixed training population perform?

In this paper, we investigate the application of statistical activation models to multi-group fMRI studies. We evaluate two well established statistical learning techniques, Principal Component Analysis (PCA) and Independent Component Analysis (ICA), in capturing variation in functional activation patterns across training samples. The performance of these training-based priors is compared with the standard general linear model (GLM) and smoothing based methods.

2 Description of Method

In this section, we briefly review Bayesian analysis of individual fMRI data using statistical activation priors and the statistical testing used to perform group analysis.

2.1 Individual Analysis

Background – An fMRI scan produces a set of time series {y_v ∈ ℝ^T : v = 1, …, V}, each of which represents the measured BOLD signal over T time samples in one of V voxels. The GLM assumes that the BOLD signal is a linear combination of protocol-dependent components and noise. Assuming Gaussian white noise, the voxel-wise GLM is written as

y_v = X β_v + ε_v,   ε_v ∼ N(0, λ_v⁻¹ I_T)    (1)

where ε_v is the residual, λ_v denotes the noise precision and β_v are the regression coefficients on the columns of the design matrix X containing R regressors. The estimation of the unknown parameters (β) is usually accomplished using a least squares approach that minimizes the residual error (ε).
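For readers who want to experiment with Eq. (1), a minimal NumPy sketch of the voxel-wise least-squares fit follows. The array shapes, variable names and the noise-precision estimate are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fit_glm(Y, X):
    """Ordinary least-squares fit of the voxel-wise GLM y_v = X b_v + e_v.

    Y : (T, V) array of BOLD time series (T time points, V voxels).
    X : (T, R) design matrix with R regressors.
    Returns beta (R, V) and a simple estimate of the noise precision lambda (V,).
    """
    beta, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)   # (R, V)
    resid = Y - X @ beta                                 # (T, V) residuals
    dof = Y.shape[0] - np.linalg.matrix_rank(X)
    sigma2 = (resid ** 2).sum(axis=0) / dof              # residual variance per voxel
    return beta, 1.0 / sigma2

# Hypothetical usage with random data standing in for one fMRI run.
rng = np.random.default_rng(0)
T, V, R = 140, 5000, 2
X = rng.standard_normal((T, R))
Y = rng.standard_normal((T, V))
beta, lam = fit_glm(Y, X)
```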

A Bayesian formulation provides a natural framework to incorporate prior information. Consequently, temporal modeling of the data is implemented using the GLM and is constrained by specifying spatial priors over the regression coefficients. Using the same model for the BOLD signal as in Eq. (1), the conditional probability (or likelihood) of obtaining an fMRI scan, Y, given the activation strength of all voxels, β, can be expressed as:

p(Y|β, λ) = ∏_{v=1}^{V} p(y_v | β_v, λ_v) = ∏_{v=1}^{V} N(y_v; X β_v, λ_v⁻¹ I_T)    (2)


where the equalities are obtained based on the assumption that, given the activation strength vector of each voxel β_v, the fMRI time series of different voxels are conditionally independent. In this framework, spatial coherence is modeled into the activation parameters and not the observed time series.

Statistical Priors – The fundamental idea here is to exploit prior knowledge of task-related brain activation patterns learned from training data to complement the functional time series data of individual subjects. There are several ways in which data patterns can be represented using probabilistic generative models. Distributed representations of linear componential models, such as PCA and ICA, are particularly useful for modeling complex high-dimensional data. Given a set of coregistered and realigned activation strength maps or β-maps resulting from GLM analysis of training images, these techniques extract lower dimensional subspaces that encapsulate key variations in activation patterns. The prior probability densities of activation patterns, p(β), are then efficiently estimated from these low dimensional feature spaces.

PCA provides an estimate of the density in a high-dimensional space using an eigenspace decomposition. It assumes a Gaussian, hence unimodal, distribution of activation patterns. Consequently, PCA tends to bias the posterior estimate towards the mean activation pattern. Furthermore, it tends to capture global variations in activation patterns. ICA, on the other hand, generates source activation patterns that are maximally statistically independent. It does not impose any normality assumptions on the data and describes localized variations in patterns. As fMRI data exhibit inter-subject variability in functional anatomy, ICA has been found to be more suitable for fMRI analysis than PCA. The development of PCA and ICA based functional activation priors has been described comprehensively in [7] and [8] respectively.
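As an illustration of how such subspaces could be learned in practice, the sketch below builds PCA and ICA decompositions of training β-maps with scikit-learn; it is a hedged example under simplifying assumptions, not the density models of [7, 8], and the kurtosis-based ranking only mirrors the ICA component selection mentioned later in the experiments.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

def learn_pca_prior(train_betas, K=8):
    """train_betas: (S, V) array of coregistered beta-maps from S training subjects.
    Returns the mean map and a K-component PCA basis capturing the main variations."""
    pca = PCA(n_components=K).fit(train_betas)
    return pca.mean_, pca.components_, pca.explained_variance_

def learn_ica_prior(train_betas, K=8):
    """Spatially independent source patterns; kurtosis can be used to rank them
    (most kurtotic = most spatially localized)."""
    ica = FastICA(n_components=K, random_state=0)
    sources = ica.fit_transform(train_betas.T).T               # (K, V) spatial source maps
    centred = sources - sources.mean(axis=1, keepdims=True)
    kurt = (centred ** 4).mean(axis=1) / sources.var(axis=1) ** 2 - 3
    return sources[np.argsort(kurt)[::-1]]                     # ranked by kurtosis
```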

MAP Estimation – Given the fMRI time series data Y, the aim is to estimate the distribution of the unknown parameters Θ (including β). Using Bayes' rule, the posterior distribution of the unknown parameters is given by:

p(Θ|Y) ∝ p(Y|Θ) p(Θ) (3)

where p(Y|Θ) represents the data likelihood and p(Θ) is the combined prior distribution of the activation strength parameters and other hyper-parameters. Taking the logarithm, the maximum a posteriori (MAP) estimate of Θ is then given by

Θ_MAP = argmax_Θ p(Θ|Y) = argmax_Θ [ ln p(Y|Θ) + α ln p(Θ) ]    (4)

where α is the weighting factor for the prior term. This equation indicates that the optimum value of the model parameters is given by a tradeoff between the prior information and the data-driven likelihood information.
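To make the tradeoff in Eq. (4) concrete, the sketch below assumes a Gaussian prior β_v ∼ N(μ, Σ) on one voxel's regression coefficients, in which case the MAP estimate under the Gaussian likelihood of Eq. (2) has a closed form. The paper's ICA-based prior is not Gaussian, so this is only an illustrative special case with hypothetical inputs.

```python
import numpy as np

def map_beta(y, X, lam, mu, Sigma, alpha=0.8):
    """MAP estimate of the regression coefficients for one voxel.

    y: (T,) time course; X: (T, R) design matrix; lam: scalar noise precision;
    mu, Sigma: mean and covariance of an assumed Gaussian prior on beta;
    alpha: prior weighting factor as in Eq. (4).
    Setting alpha = 0 recovers the ordinary least-squares / maximum-likelihood fit.
    """
    Sinv = np.linalg.inv(Sigma)
    A = lam * X.T @ X + alpha * Sinv          # curvature of the log-posterior
    b = lam * X.T @ y + alpha * Sinv @ mu     # gradient terms from likelihood and prior
    return np.linalg.solve(A, b)
```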

2.2 Group Analysis

Conventional random-effects analysis (RFX) in fMRI takes as input a set of individual BOLD contrast images to produce a group statistical map, which is further thresholded to a desired significance level.


Both parametric and nonparametric versions of that approach have been proposed in the literature. In this work, Wilcoxon's signed rank test was chosen as the decision statistic to perform group analysis on the mean population effect. It is a nonparametric alternative to Student's t-test when the assumption of normality cannot reasonably be satisfied, and is given by:

t_w = ∑_{i=1}^{n} rank(|β_i|) × sign(β_i)    (5)
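A small NumPy/SciPy sketch of the decision statistic in Eq. (5) follows; it assumes the ranks of the absolute contrasts are taken across subjects within each voxel, which is the usual convention but not spelled out in the text.

```python
import numpy as np
from scipy.stats import rankdata

def wilcoxon_stat(contrasts):
    """contrasts: (N, V) array of per-subject BOLD contrast values at V voxels.
    Returns the signed-rank statistic t_w of Eq. (5) for each voxel."""
    ranks = rankdata(np.abs(contrasts), axis=0)      # rank |beta_i| within each voxel
    return (ranks * np.sign(contrasts)).sum(axis=0)  # sum of signed ranks
```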

3 Experimental Results

We applied the proposed estimation methods to an attention modulation study conducted on 11 healthy adults, 10 normal children and 18 autistic subjects. Imaging was performed with a whole-body Siemens Trio 3T scanner. Functional T2*-weighted images were acquired using a gradient-echo EPI sequence (40 axial slices, 64x64 matrix, voxel size = 3.5 x 3.5 x 3.5 mm³ with no gap, TR/TE = 2320/25 ms). The experimental paradigm consisted of an ON-OFF block design where the subject was shown images of either houses or faces during the ON phase and asked to view a fixation point during the OFF phase. The experiment consisted of 5 runs, each run generating 140 time samples. Data preprocessing involved motion correction and linear detrending but no spatial smoothing. Functional and anatomical images of all subjects were spatially normalized to the standard Talairach template. The healthy adult and children groups were combined to create the control group. This process created control (N1 = 21) and autism (N2 = 18) groups that were more comparable in size.

In this multi-group fMRI experiment, we examine six different estimators. GLM and Smoothed-GLM represent standard fMRI analysis using the general linear model on unfiltered and spatially filtered (FWHM = 6 mm) data respectively. Group-PCA and Group-ICA represent analyses using spatial priors trained on principal and independent components respectively; separate priors were used in analyzing the control and autism groups. Conversely, Mixed-PCA and Mixed-ICA represent analyses using prior activation models trained on the whole data set, controls and patients combined, using PCA and ICA respectively. While the ordering of principal components is based on eigenvalues, kurtosis was used to identify relevant ICA components. The number of retained components (K) for both PCA and ICA was 8 for group-wise priors and 13 for mixed priors.

Activation strength maps for individual subjects were generated by analyzing their data using each of the different estimators mentioned above. The BOLD contrast maps for the effect of interest, face versus house, were produced by applying the contrast weight vector c = [1 −1] to the activation strength vector at each voxel. For validation purposes, the activation strength maps estimated from the entire time series (all five runs) using GLM were considered as reference maps. The performance of the different estimators is then evaluated by their ability to reproduce the reference map from a reduced time series (two runs).


Estimator        Sum-of-Squares Error   Correlation Coefficient
GLM              52.95 ± 14.91          0.68 ± 0.18
Smoothed-GLM     41.94 ± 13.00          0.65 ± 0.20
Group-PCA        28.30 ± 17.63          0.77 ± 0.16
Group-ICA        27.06 ± 15.36          0.79 ± 0.13
Mixed-ICA        24.49 ± 15.97          0.76 ± 0.15
Mixed-PCA        35.30 ± 18.41          0.72 ± 0.13

Table 1: Effectiveness of the different estimators measured using Sum-of-Squares Error (SSE) and Correlation Coefficient (ρ) between the estimated activation strength maps and the reference maps for N = 39 subjects (N = N1 + N2, N1 = 21 controls and N2 = 18 autistic subjects).

Fig. 1: Structural image of a coronal slice with task-related regions highlighted. Colour legend: Blue – Parahippocampal Place Areas (PPA); Red – Fusiform Face Areas (FFA); Yellow – Superior Temporal Sulcus (STS); Aqua – Superior Lingual Gyrus (SLG); Green – Intraparietal Sulcus (IPS).

The estimators using training-based statistical activation priors were tested using the leave-one-out cross-validation technique.

3.1 Quantitative Evaluation

Table 1 provides a quantitative comparison of the six estimators, where the effectiveness is measured using sum-of-squares error (SSE) and correlation coefficient (ρ) between the estimated activation strength maps and the reference map. The values in the table represent the mean ± standard deviation of these quantities obtained by analyzing the same coronal slice (shown in Fig. 1) in all 39 subjects using the leave-one-out technique. As expected, the GLM estimate with no prior information or smoothing has the highest SSE and lowest ρ. Smoothing with a modest 6 mm FWHM Gaussian kernel decreased SSE but also reduced ρ, indicating an underestimate of activation strength. In terms of SSE, all the estimators using statistical activation priors provided statistically significant (p < 0.01, paired t-test) improvement over the GLM and Smoothed-GLM estimators. The increase in ρ is also quite significant considering that only 2-run time series data was used in the estimation.


A comparison of mixed and group-wise models indicates that prior models trained on individual groups were more effective in capturing the details of activation patterns relevant to those groups than mixed group priors. In general, the ICA based priors performed better than their PCA counterparts. Furthermore, the Mixed-PCA prior showed the least improvement among the training-based statistical activation priors.
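To make the two evaluation criteria of Table 1 concrete, the short sketch below computes the sum-of-squares error and correlation coefficient between an estimated activation map and the reference map; the function and variable names are illustrative, not taken from the authors' code.

```python
import numpy as np

def evaluate_map(estimate, reference):
    """Both arguments are 1-D arrays of activation strengths over the analysed slice.
    Returns (SSE, rho) as reported in Table 1 for a single subject."""
    sse = float(((estimate - reference) ** 2).sum())
    rho = float(np.corrcoef(estimate, reference)[0, 1])
    return sse, rho
```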

3.2 Qualitative Comparison

Figure 2 depicts the results of the RFX analysis based on the Wilcoxon's signed rank statistic, where the group statistical maps were thresholded at a significance level of p = 0.01 uncorrected. The top row shows the reference maps for the control and autism groups generated from the time series of all five runs using GLM. For the face versus house contrast, the control group shows significant positive activation in the Fusiform Face Areas (FFA) and negative activation in the Parahippocampal Place Areas (PPA). In contrast, the autism group only shows negative activation in PPA and no activation in FFA. Many prior fMRI studies have also shown the FFA region to be significantly less engaged among persons with autism compared to controls during face perception tasks [9]. Additionally, the control group also exhibits some positive activation in the Superior Temporal Sulcus (STS) and negative activation in the Superior Lingual Gyrus (SLG). These regions are roughly highlighted on a structural scan in Figure 1.

The group activation map obtained from GLM on 2-run data (Fig. 2a) reveals similar activation patterns but much lower in strength and extent, especially in FFA. It also introduces some noisy fragments in the background. Spatial smoothing using a Gaussian filter (Fig. 2b) reduced noise but overestimated the extent of activations in PPA, STS and SLG of the control group. The effect of introducing the Group-ICA prior is shown in Figure 2c. It significantly increased the extent of activation in FFA and improved the strength in PPA of the control group. Although Group-PCA (Fig. 2d) provides a cleaner activation map than Group-ICA, it slightly overestimates activation in PPA and STS and underestimates left FFA activation in the control group. Compared to Group-ICA, Mixed-ICA (Fig. 2e) produces a slightly noisier and fragmented activation pattern in the control group. On the other hand, Mixed-PCA (Fig. 2f) underestimates the strength and overestimates the extent of PPA activation in the control and autism groups respectively. It also elicits a hint of activation in the right FFA of the autism group. Moreover, the Mixed-PCA prior introduces or eliminates activation in SLG against the trend observed in the reference maps of the two groups. As mentioned in the previous section, PCA based priors regress to the mean activation pattern, causing a loss of detail in the estimates of individual subjects. These results ascertain that this effect is more pronounced when the training set contains subjects from dissimilar groups.

Considering activation in FFA as the distinguishing factor between the two groups, Group-ICA followed by Group-PCA provide the closest estimates to the reference maps. The Mixed-ICA prior also produces a noisy but reasonable estimate of the ground truth. Taking into account the SLG activations, these results further establish the tendency of PCA based priors to suppress intersubject differences due to regression to the mean activation pattern.


Fig. 2: Group activation maps based on the Wilcoxon's signed rank statistic (p < 0.01, uncorrected), superimposed on a coronal slice (rows: control and autism groups; columns: reference, (a)–(f)). Top: Reference activation maps (GLM, 5-run) for the control and autism groups. Middle: Normal control group (2-run data). Bottom: Autism group (2-run data). (a) GLM; (b) Smoothed-GLM (FWHM = 6 mm); (c) Group-ICA (K = 8, α = 0.8); (d) Group-PCA (K = 8, α = 0.8); (e) Mixed-ICA (K = 13, α = 0.7); (f) Mixed-PCA (K = 13, α = 0.7). K represents the number of retained principal or independent components and α represents the prior weighting factor. Refer to [8] for more details on these parameters.


From a cognitive point of view, estimators using functionally informed priors provide much stronger evidence of hypoactivation of the FFA in the autism group during face perception.

4 Discussion

The application of statistical activation priors to fMRI data from an attention modulation experiment with normal controls and autistic subjects establishes their superiority in accurately estimating activation when compared to GLM and spatially smoothed analyses. The validation procedure used in this work, which compares estimates from the entire time series with reduced time series estimates, also demonstrates the potential of training-based prior models to reduce acquisition time in test subjects.

In the context of multi-group fMRI experiments, constructing a training-based prior model needs careful consideration. The results of our work suggest that group-wise priors are more effective in capturing the details of activation patterns relevant to those groups, and hence perform better than priors trained on a mixed training population. Furthermore, the results establish the tendency of PCA based priors to regress to the mean activation pattern and the ability of ICA based priors to account for intersubject variability in brain activation patterns. Hence, ICA based functional priors are considered more suitable for studies where the composition of the training samples is unknown.

References

1. Descombes, X., Kruggel, F., von Cramon, Y.: fMRI signal restoration using an edge preserving spatio-temporal Markov random field. NeuroImage 8 (1998) 340–349

2. Salli, E., Aronen, H.J., Savolainen, S., Korvenoja, A., Visa, A.: Contextual clustering for analysis of functional MRI data. IEEE Transactions on Medical Imaging 20(5) (May 2001) 403–414

3. Wang, Y.M., Schultz, R.T., Constable, R.T., Staib, L.H.: Nonlinear estimation and modeling of fMRI data using spatio-temporal support vector regression. Information Processing in Medical Imaging (July 2003) 647–659

4. Flandin, G., Penny, W.D.: Bayesian fMRI data analysis with sparse spatial basis function priors. NeuroImage 34 (2006) 1108–1125

5. Ou, W., Golland, P.: From spatial regularization to anatomical priors in fMRI analysis. Information Processing in Medical Imaging (July 2005) 88–100

6. Kiebel, S., Goebel, R., Friston, K.J.: Anatomically informed basis functions. NeuroImage 11 (2000) 656–667

7. Yang, J., Papademetris, X., Staib, L.H., Duncan, J.S.: Functional brain image analysis using joint function-structure priors. In: Medical Image Computing and Computer Assisted Intervention. (2004) 736–744

8. Bathula, D.R., Tagare, H.D., Staib, L.H., Papademetris, X., Schultz, R.T., Duncan, J.S.: Bayesian analysis of fMRI data with ICA based spatial prior. In: MICCAI. (2008) 246–254

9. Schultz, R.T.: Developmental deficits in social perception in autism: The role of the amygdala and fusiform face area. International Journal of Developmental Neuroscience 23 (2005) 125–141


Structural Group Analysis of Brain Functional Data: Assessing Results Significance

Gregory Operto, Bernard Fertil, Remy Bulot, and Olivier Coulon

Laboratoire LSIS, UMR CNRS 6168, Marseille, [email protected]

Abstract. Structural fMRI group analyses such as [4] perform activation detection at the group level while also providing individual results. This paper presents a method to estimate the significance of the results provided by such analyses. The method is twofold: first, it performs a random sampling of the distribution of the objects of interest directly from the data; then it performs a percentile rank analysis in order to quantify the significance of the detection results. Experiments are presented on two real datasets.

1 Introduction

Functional brain data analyses are typically performed in the prime voxel-based domain of the images. In a similar fashion, group results are generally presented as 3D images depicting supra-threshold clusters, around putative regions that were activated across a pool of subjects. These clusters are then related to cerebral areas so as to drive neuroscientific conclusions. However, the voxels are only the acquisition space and have no anatomical meaning, other than the simple localization provided by spatial normalization. In comparison to this historical and widespread iconic volume-based approach, the framework of structural techniques allows one to deal with representations of data closer to the objects under study than voxels. Analysis is hence driven on objects of interest, which were formerly extracted from the original images using anatomical [1] or functional [2, 3] criteria, and inter-subject matching is therefore performed between these objects, without spatial normalization.

In particular, the method described in [4] allows analyses of fMRI data across a group of subjects directly on the cortical surface, hence in both a structural and surface-based way. Outputs of this kind of analysis are consequently of a different nature to those produced by voxel-based methods: instead of group-level clusters, the results of such a structural surface-based technique are shaped as cortical mesh patches representing activations on each subject's anatomy. Each "blob" corresponds to a putative activation and is characterized by an intensity-related measure, a specific location and extent on the cortical surface, and a certain number of individual occurrences among the group. For instance, depending on the contrast of interest, some activations may be present in every subject, while some other blobs may appear in a small number of subjects due to some upstream artifacts.


This kind of analysis aims at providing an extensive representation of the data content, compared to the "supra-threshold" approach. Nevertheless, it is mandatory to provide a control of the type I error rate (false positives) in order to bring significance to the results. For instance, in the case of univariate statistical methods, this error rate control is performed by means of parametric (e.g. t-test) or non-parametric (e.g. permutation) tests. In this paper, we define a theoretical control of the type I error rate and associate a significance measurement to every set of blobs associated to some unique group-level activation. In section 2 we first recall the nature of the results provided by the surface-based structural analysis defined in [4]. We then present in section 3 a resampling scheme for the distribution of functional activations detected by this structural analysis, followed by a non-parametric rank analysis. Finally, we propose a set of experiments on real functional data in section 4 and discuss the results in section 5.

2 Functional Structural Group Analysis: Background

From the initial fMRI volumes to blob detection, a whole pipeline carries out a sequence of processes for data preparation. A detailed description of this pipeline is given in [4]. It includes extraction of inner cortical surface meshes, projection of functional volumes onto cortical meshes, computation of individual surface-based statistical t-maps and finally construction of surface-based scale-space primal sketches. Scale-space primal sketches are built from each individual surface-based t-map, hence giving each subject a multi-scale hierarchical description [2, 5] of the structure of the current individual t-map, on the cortical surface. These descriptions contain objects, referred to as scale-space blobs, associated to the t-maps' local maxima. Each blob comes with an associated t-value and a specific location and extent on the cortex.

A global comparison graph G is built by embedding the primal sketches of all subjects: graph nodes are blobs, and edges are built between two nodes if those two blobs a) belong to two different subjects and b) are close enough in the common space defined by the surface-based coordinate system defined in [6]. Such a graph typically contains several thousands of nodes, most of which are noise.
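A minimal sketch of how such a comparison graph could be assembled is given below, assuming each blob carries a subject index and surface coordinates in the common coordinate system; the distance threshold `max_dist` is a hypothetical parameter, not the value used in [4].

```python
import numpy as np

def build_comparison_graph(blobs, max_dist):
    """blobs: list of dicts with keys 'subject' and 'coord' (coordinates in the common
    surface-based system). Returns an adjacency list: edges link blobs of different
    subjects that are close enough in the common space."""
    n = len(blobs)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if blobs[i]['subject'] == blobs[j]['subject']:
                continue                                   # condition a): different subjects
            d = np.linalg.norm(np.asarray(blobs[i]['coord']) - np.asarray(blobs[j]['coord']))
            if d < max_dist:                               # condition b): close in the common space
                adj[i].add(j)
                adj[j].add(i)
    return adj
```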

Group analysis is then performed by defining a label field across the graph G. This label field is associated with a model that involves an intra-subject data-driven term based on the blobs' t-values and an inter-subject matching term based on the blobs' anatomical localization. The label field is modeled as a Markov Random Field (MRF), whose optimal realization can be achieved in a Bayesian framework, by minimization of an energy function using a Gibbs sampler with annealing [7]. The system's global energy measures the fit between the label field and the model. Blobs which eventually get the label 'zero' are considered as noise. Blobs with the same positive label represent the same group-level activation and appear at most once in every subject and at least in one subject, pointing out brain areas that showed good activation with relevant inter-subject matching across the group.


Fig. 1. Structural group analysis: an overview of the process (subject 1, subject 2, …, subject N).

Resultingly, a group activation is represented by a set of blobs of interest which forms a connected subgraph in the comparison graph G. Each connected subgraph is characterized by a local energy value, an average t-value and an average intersubject spatial overlap. Local energy is always negative, by definition of the model and because it corresponds to a minimum. The size of these connected subgraphs can vary, and is equal to the number of subjects the activation appears in.

3 Significance assessment: method

Assessing the significance of the positive labels returned by the annealing implies estimating the distribution of the population they belong to, that is, the distribution of connected subgraphs of a specific size in the global graph.

Let l be a positive label given by the analysis to p different blobs forming a cluster, hence characterized by an energy E_l, an average t-value t_l and an average intersubject spatial overlap s_l. The t-value and spatial overlap are valuable information since they are meaningful in terms of neuroscience interpretation and independent of our detection model, unlike the energy. The process has detected this component l on the basis of its interesting profile (associated to a low negative energy) in comparison to the numerous other existing connected subgraphs of the same size p in the global graph.


Hence, with regard to this particular cluster, estimating the distribution of a chosen statistic, such as the average t-value or overlap, over the population of connected subgraphs of the same size is a way to demonstrate its atypical nature. This recalls the typical null-hypothesis tests performed in functional imaging: if we assume that there is no group activation in our graph, what is the probability of observing a value at least as high as the one of our object?

To sample the distribution of p-blob connected subgraphs in the graph, we use a random draw with replacement method, performed as follows: first, a random blob is drawn, then a second one is chosen among its neighbours in the graph. Until size p is reached, additional blobs are randomly drawn from the list containing the neighbours of every previously added blob. In case the first blob is chosen in too small a connected component, it is drawn randomly again. Once a sample of size p has been drawn, a new one is drawn using the same method after replacement (the same sample can be drawn multiple times). For each sample, the corresponding energy, average t-value and average spatial overlap are computed and stored. The whole process iterates until enough samples are acquired.
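The sampling scheme can be sketched as follows, reusing the hypothetical adjacency-list representation introduced above; the retry when the seed blob lies in too small a component mirrors the description in the text, and the code assumes at least one component of size p exists.

```python
import random

def sample_connected_subgraph(adj, p, rng=random):
    """Draw one connected subgraph of p blobs by growing a random neighbourhood.
    adj: adjacency list {node: set(neighbours)}; p: target size."""
    nodes = list(adj)
    while True:
        seed = rng.choice(nodes)
        current = {seed}
        frontier = set(adj[seed])
        while len(current) < p and frontier:
            new = rng.choice(sorted(frontier))      # draw among neighbours of all added blobs
            current.add(new)
            frontier |= adj[new]
            frontier -= current
        if len(current) == p:                        # otherwise the seed sat in too small a component
            return current

def sample_statistics(adj, t_values, p, n_samples=10_000):
    """Sample n_samples subgraphs of size p and record their average t-value."""
    stats = []
    for _ in range(n_samples):
        nodes = sample_connected_subgraph(adj, p)
        stats.append(sum(t_values[i] for i in nodes) / p)
    return stats
```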

In this context, it is interesting to determine whether the sampling process is able to draw random components that are as atypical as those returned by the former structural analysis step, for instance components with negative energy. Since no assumption is made about the distributions, we use a non-parametric rank analysis approach to compare the activated blob sets, in terms of the chosen statistic (t_l or s_l), to the distribution to which they belong. For a given statistic, assessing the percentile ranks of putative activation clusters informs about the (small) proportion of similar samples obtained randomly, and therefore about the significance of these clusters.
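Given the sampled null distribution, the percentile rank of an activated blob set is then straightforward to compute; SciPy's `percentileofscore` is one possible implementation, shown here only as an illustration.

```python
from scipy.stats import percentileofscore

def label_percentile(observed_stat, sampled_stats):
    """Percentile rank of the observed average t-value (or spatial overlap) of a label
    with respect to the sampled distribution of same-size connected subgraphs."""
    return percentileofscore(sampled_stats, observed_stat)
```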

4 Experiments and results

4.1 Data preparation

We ran a set of experiments on fMRI images from a functional localizer protocol, involving 10 subjects in a series of cognitive, motor and perceptual tasks (see [8] for more details on the protocol). Two different contrasts were studied in particular: Right motor – Left motor and Motor – Cognitive. The whole pipeline described in section 2 was applied to the data: extraction of cortical meshes, projection of functional volumes onto meshes, calculation of individual surface-based statistical t-maps and construction of surface-based scale-space primal sketches, in order to generate the appropriate global comparison graph for each contrast. Structural group analysis was then performed on each graph. For each contrast, a set of labels and their corresponding sets of blobs was obtained. For each positive label, the sampling process was applied so as to collect 10⁴ connected subgraphs of the same size within the corresponding group graph. Energy, average t-value and average spatial overlap were computed for each sample. Finally, for each label produced by the structural analysis, the percentile ranks of its average t-value and its average overlap were computed with respect to the distributions produced by the sampling process described above. From those two percentiles we also compute the mean percentile rank as a summary statistic.


Fig. 2. Structural surface-based analysis results for Right motor – Left motor contrast.

Fig. 3. Structural surface-based analysis results for Motor – Cognitive contrast.

4.2 Results of the structural analysis

Right motor – Left motor contrast: The structural analysis returns a set of 3 non-null labels. Two of them appear 10 times, i.e. once per subject, on each side of the central sulcus. One appears twice, in the frontal region.

Motor – Cognitive contrast: The analysis generated a set of 11 non-null labels. In particular, two of them appear 10 times, once per subject, on each side of the central sulcus. One shows 9 individual occurrences, in the temporal lobe. Another label appears 8 times and is located in the supplementary motor area (SMA). Another one has 6 individual occurrences, located in the parietal lobe. See Table 1 for an exhaustive list of these labels and their localization. Figures 2 and 3 give illustrations of these results on inflated cortical meshes with surface-based coordinate grids, each colour corresponding to a particular label.


4.3 Results significance

All the results, percentile ranks and energies, are presented in Table 1. They are ranked according to their mean percentile rank. For the Right motor – Left motor contrast, two labels have a mean percentile rank above 95%, while for the Motor – Cognitive contrast five of them do.

Right motor – Left motor contrast :

label occurr. t s %t %s mean % energy neg localization

84 10 5.44 0.49 99.91 99.98 99.95 -17.77 10 postcentral g.

43 10 5.11 0.51 99.41 99.99 99.7 -17.44 10 precentral g.

13 2 4.04 0.7 86.26 97.71 91.99 -0.1 4 frontal

Motor – Cognitive contrast :

label number t s %t %s mean % energy neg localization

24 10 8.63 0.34 99.93 99.57 99.75 -14.49 0 postcentral gyrus

19 10 7.9 0.5 99.25 99.98 99.62 -18.31 0 precentral g.

37 8 7.19 0.52 95.89 99.96 97.93 -11.37 0 suppl. motor area

47 6 5.86 0.38 91.93 98.11 95.02 -3.75 0 parietal area

94 9 5.66 0.36 90.56 99.73 95.15 -9.02 0 temporal area

13 2 5.61 0.87 85.04 99.62 92.33 -0.35 8 dorsal precentral

60 5 8.39 0.2 99.38 64.58 81.98 -0.62 0 postcentral g.

49 3 9.29 0.19 99.8 49.71 74.75 -0.06 2 precentral g.

40 3 4.85 0.28 74.05 72.32 73.19 0.1 2 intern. face close to occip.

53 8 6.79 0.12 93.51 39.42 66.46 -1.19 0 postcent. close to insula

83 5 5.07 0.14 78.65 40.06 59.35 -0.71 0 superior parietal

Table 1. Significance assessment for labels returned by the analysis: t and s stand for the average t-value and spatial overlap of the label, %t and %s give the percentile ranks of the average t-value and spatial overlap, mean % is the arithmetic mean of the two former percentiles, neg is the count of randomly sampled subgraphs of the same size as the label's subgraph with negative energy, and occurr./number is the number of individual occurrences.

5 Discussion

The Right motor – Left motor contrast shows two main activations, reproducible across the 10 subjects of the study. The Motor – Cognitive contrast shows two such activations as well, with 10 individual occurrences. All these labels present average percentiles higher than 99.5%. They actually correspond to primary motor and sensory activations, located in the primary sensorimotor area (M1-S1), and the neuroscientific value of those results is well known. Some secondary activations are depicted by labels with from 6 to 9 occurrences. Their percentiles are higher than 95%.


Fig. 4. Detailed plots showing distributions of connected subgraphs of different sizes: the percentile ranks of the labels being tested on those distributions are shown with bars (red, also green if two labels are tested on the same distribution). Each distribution was sampled by 10⁴ connected subgraphs. The average t-value (left column) and spatial overlap (right column) were calculated. Two top rows: Right motor – Left motor contrast. Four bottom rows: Motor – Cognitive contrast.

Under 95%, labels show various numbers of occurrences, from 2 to 8. Amongst them, some present high average t-values and low average spatial matching, or inversely. This case is observed when the detection process finds some energy-minimizing configuration that fits poorly with the model, as suggested by its energy value, generally close to zero. Generally, in a group analysis, an activation must occur in a large number of subjects to present some value. Therefore, clusters of 10 blobs are more valuable than those with just a few. This is illustrated by the percentile ranks increasing with cluster size (see Table 1 and Figure 4). Also, the count of randomly sampled clusters with negative energies quickly falls to zero as cluster size increases. This proves the strong outlier nature of these blobs, as well as the ability of the energy minimization process to point them out in the data.


Still, the method allows one to figure out whether a small cluster shows atypical features and should therefore be presented in the analysis results despite its small size, like for instance label 47 in Table 1.

The energy value gives a first idea of the significance of each label, regarding the quality of its fit with the model. However, this value depends entirely on the model. In comparison, we presented here a way of assessing the labels' significance independently from the model, close to neuroscience matters, and allowing a post hoc validation of the analysis results.

6 Conclusion

This paper presents a method to assess the significance of results produced by structural group analyses on the cortical surface as presented in [4]. This significance assessment is done by sampling distributions directly from within the data and performing a percentile rank analysis without any assumption about these distributions. Experiments were performed on two different contrasts from a functional localizer protocol [8]. We showed that it is possible to provide multi-subject blob clusters sorted by their energy (i.e. their fit with the model) but also relevantly quantified by their percentile rank in the appropriate distribution.

References

1. Rivière, D., Mangin, J., Papadopoulos-Orfanos, D., Martinez, J., Frouin, V., Régis, J.: Automatic recognition of cortical sulci of the human brain using a congregation of neural networks. Medical Image Analysis 6(2) (2002) 77–92

2. Coulon, O., Mangin, J.F., Poline, J.B., Zilbovicius, M., Roumenov, D., Samson, Y., Frouin, V., Bloch, I.: Structural group analysis of functional activation maps. NeuroImage 11(6) (2000)

3. Thirion, B., Pinel, P., Tucholka, A., Roche, A., Ciuciu, P., Mangin, J., Poline, J.: Structural analysis of fMRI data revisited: improving the sensitivity and reliability of fMRI group studies. IEEE Trans Med Imaging 26(9) (Sep 2007) 1256–1269

4. Operto, G., Clouchoux, C., Bulot, R., Anton, J., Coulon, O.: Surface-based structural group analysis of fMRI data. In: MICCAI 2008. Volume 11. (2008) 959–966

5. Lindeberg, T., Lidberg, P., Roland, P.E.: Analysis of brain activation patterns using a 3-D scale-space primal sketch. Hum Brain Mapp 7(3) (1999) 166–194

6. Clouchoux, C., Coulon, O., Rivière, D., Cachia, A., Mangin, J.F., Régis, J.: Anatomically constrained surface parameterization for cortical localization. In Duncan, J., Gerig, G., eds.: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2005: 8th International Conference. Volume 3750 of LNCS, Palm Springs, CA, USA, Springer-Verlag Berlin Heidelberg (October 2005) 344–351

7. Geman, S., Geman, D.: Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. on Pattern Anal. Machine Intell. (6) (1984) 721–741

8. Pinel, P., Thirion, B., Meriaux, S., Jobert, A., Serres, J., Le Bihan, D., Poline, J., Dehaene, S.: Fast reproducible identification and large-scale databasing of individual functional cognitive networks. BMC Neurosci 8 (Oct 2007) 91


Nonparametric Mean Shift Functional Detection in the Functional Space for Task and Resting-state fMRI

Jian Cheng1,2, Feng Shi3, Kun Wang1, Ming Song1, Jiefeng Jiang1, Lijuan Xu1, Tianzi Jiang1

1 LIAMA Research Center for Computational Medicine, Institute of Automation, Chinese Academy of Sciences

2 Odyssée Project Team, INRIA Sophia Antipolis – Méditerranée, France 3 Department of Radiology and BRIC, University of North Carolina at Chapel Hill

[email protected]

Abstract. In functional Magnetic Resonance Imaging (fMRI) data analysis, normalization of time series is an important and sometimes necessary preprocessing step in many widely used methods. The space of normalized time series with n time points is a unit sphere, named the functional space. The Riemannian framework on the sphere, including the geodesic, the exponential map, and the logarithmic map, has been well studied in Riemannian geometry. In this paper, by introducing the Riemannian framework in the functional space, we propose a novel nonparametric robust method, namely Mean Shift Functional Detection (MSFD), to explore the functional space. The first merit of MSFD is that it does not need many of the assumptions on the data that are made in many existing methods, e.g. linear addition (GLM, PCA, ICA), uncorrelatedness (PCA), independence (ICA), the number and the shape of clusters (FCM). Second, MSFD takes into account the spatial information and can be seen as a multivariate extension of the functional connectivity analysis method. It is robust and works well for activation detection in task studies even with a biased activation reference. It is also able to find the functional networks in resting-state studies without a user-selected "seed" region. Third, it can enhance the boundary between different functional networks. Experiments were conducted on synthetic and real data to compare the performance of the proposed method with GLM and ICA. The experimental results validated the accuracy and robustness of MSFD, not only for activation detection in task studies but also for functional network exploration in resting-state studies.

1 Introduction

Functional Magnetic Resonance Imaging (fMRI) has become a powerful technique to study spatio-temporal neural activity in the human brain. A large number of methods have been proposed for fMRI data analysis, covering both task and resting-state analysis. In general, the existing methods can be categorized into two families: model-based methods and model-free (or data-driven) methods.

Model-based methods often parametrically fit a prior model to the data by statistical techniques, such as correlation analysis, variance analysis, t-tests, linear regression, and so on. Among them, the General Linear Model (GLM) [1] is a typical one,


in which the false positive ratio can be limited by selecting an appropriate p-value from hypothesis tests. However, it has several critical limitations: (1) a linear addition model is assumed; (2) the residuals are assumed to follow a normal distribution, and so are the parameters; (3) the choice of the variables in the design matrix is sometimes arbitrary; (4) linear convolution is assumed and every voxel shares the same Hemodynamic Response Function (HRF). Another popular approach is the functional connectivity method [2], which needs a user-selected 'seed' region of interest. It takes the mean time course in the seed region as the reference signal in the GLM. However, in most studies it is unclear how to select an appropriate seed region. Most model-based methods, including the GLM and functional connectivity analysis, are univariate methods which assume that every time series is independent of the others.

In model-free methods, effects or components of interest in the data are found based on some specific criteria. For example, Principal Component Analysis (PCA) [3] assumes that the structures of interest in the data are uncorrelated both in the temporal and spatial domains. Independent Component Analysis (ICA), including spatial ICA [4, 5] and temporal ICA [6], assumes that the data is a linear addition mixture of some independent sources. Another kind of model-free method is clustering, including Fuzzy C-Means (FCM) [7], Gaussian mixture models [8], etc. These methods are based on the general physiological fact that the activities within a specific functional system have a certain similarity, which is the basis of functional networks. Clustering methods partition the brain into clusters based on some similarity measure. Model-free methods are multivariate, can gather more information, and make fewer prior assumptions than model-based methods. Moreover, model-free methods can be used for both activation detection in task studies and functional network analysis in resting-state studies, which is difficult for model-based methods. However, there are still some intrinsic assumptions in model-free methods, such as linear addition, uncorrelatedness and independence, which cannot be fully satisfied in the brain. How to determine an appropriate similarity measure and the number of clusters is also an open problem for clustering methods.

In both model-based and model-free methods, normalization of time series is an important and sometimes necessary preprocessing step. Normalization means that the time course is subtracted by its mean (centering) and then divided by its L2 norm (scaling). The space of normalized time series with n time points is a unit sphere, named the functional space by Friston [3]. In fact, the results of most methods, for which the normalization may not be necessary, do not change under the normalization transform; this property could be called "normalization invariance". That means that most of the current fMRI data analysis methods work in the functional space. In this paper, by introducing the Riemannian framework in the functional space, we propose a novel robust nonparametric Mean Shift Functional Detection method (MSFD) to explore the functional space.
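A minimal sketch of the normalization step (centering followed by L2 scaling), which maps each time course onto the functional space, is given below; after this transform the dot product of two rows equals their Pearson correlation.

```python
import numpy as np

def normalize_time_series(Y):
    """Y: (V, T) array of raw time courses. Each row is centred and scaled to unit
    L2 norm, so that it becomes a point on the unit sphere (the functional space)."""
    Yc = Y - Y.mean(axis=1, keepdims=True)                 # centering
    norms = np.linalg.norm(Yc, axis=1, keepdims=True)      # L2 norms
    return Yc / np.where(norms > 0, norms, 1.0)            # scaling (constant rows left at zero)
```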

2 Riemannian Framework in the Functional Space

In fMRI studies, each normalized time course can be considered as a point scattered in the functional space, and this point is called the shape of the time course. Every point x on the sphere is a representative element of an equivalence class of time courses, namely all time courses that map to x after centering and positive scaling. The distribution of these points embodies the functional topography [3]. The closer two points are, the higher their correlation (i.e. functional connectivity) is. In fact, the correlation is the cosine of the angle subtended at the origin (see Fig. 1). Model-based methods use the predefined shape of the HRF (a point on the sphere) to find a group of points that have a close shape. Comparatively, model-free methods analyze the distribution and the structure of these points. PCA and ICA methods try to find some mapping directions to properly interpret the distribution. Clustering methods try to partition these points into functionally homogeneous groups based on certain similarity measures. In summary, the most important issue in fMRI data analysis is how to describe and interpret the distribution and how to measure the degree of "closeness", which needs a rigorous mathematical formulation.

Considering that the sphere is a simple manifold that has been well studied in Riemannian geometry, we adopt the geodesic distance to describe how close two points are. Moreover, tangent space theory can be used to devise algorithms [9]. In particular, assuming that x and y are two points, the geodesic is the part of the great circle between them. Formulae (1), (2) and (3) give the geodesic distance, the exponential map and the logarithmic map. A pictorial representation is shown in Fig. 1.

Fig. 1 Pictorial representation of the functional space. d: the geodesic distance between x and y; v = log_x(y): a vector in the tangent space at x; r: correlation coefficient (r = cos d); ‖·‖: L2 norm.

Geodesic distance: d(x, y) = arccos(⟨x, y⟩), where ⟨x, y⟩ = ∑i=1..n xi yi   (1)

Exponential map: expx(v) = cos(‖v‖) x + sin(‖v‖) v / ‖v‖, where v is a tangent vector at x   (2)

Logarithm map: logx(y) = d(x, y) · (y − ⟨x, y⟩ x) / ‖y − ⟨x, y⟩ x‖   (3)
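The following sketch implements formulae (1)-(3) for unit-norm NumPy vectors; the clipping and small-angle guards are implementation details added here for numerical safety and are not part of the original formulation.

import numpy as np

def geodesic_distance(x, y):
    # Eq. (1): arc length between two points on the unit sphere.
    return np.arccos(np.clip(np.dot(x, y), -1.0, 1.0))

def exp_map(x, v):
    # Eq. (2): move from x along the tangent vector v.
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return x
    return np.cos(nv) * x + np.sin(nv) * (v / nv)

def log_map(x, y):
    # Eq. (3): tangent vector at x pointing towards y, with length d(x, y).
    d = geodesic_distance(x, y)
    u = y - np.dot(x, y) * x          # component of y orthogonal to x
    nu = np.linalg.norm(u)
    if nu < 1e-12:
        return np.zeros_like(x)
    return d * u / nu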

3 Mean Shift Functional Detection (MSFD)

As discussed above, we formulate fMRI data processing as the problem of analyzing the structure of points in the functional space. A commonly accepted physiological fact is that time courses in the same functional area have covariant fluctuations, which forms the basis of functional networks. In other words, they have close shapes (i.e. small geodesic distances). The functional space is a very high dimensional unit sphere; points in it are sparsely scattered and tend to assemble in groups. If we consider these points as observations from a certain distribution, i.e. the probability distribution of the time courses over the whole brain, then hills appear in areas with many points and valleys appear in areas with few points. The idea of MSFD is to find the optimal representative time courses (peaks) and to partition the points according to the peaks of the different hills. Time courses in the same hill have close shapes (i.e. small geodesic distances), and the spatial distribution of the locations of these time courses within the brain can be considered as a functional network (see Fig. 2).

Mean Shift (MS) is a powerful nonparametric clustering method for feature space analysis in computer vision and machine learning [10]. MS iteratively moves points towards the high-density regions of the feature space and thereby clusters the data into groups. Compared with other clustering methods, it is independent of the initialization and can produce an arbitrary number of arbitrarily shaped clusters [10, 12]. The traditional MS method was devised for low dimensional Euclidean spaces (e.g. gray-level or color natural images). Two modified versions, for low dimensional analytic manifolds [11] and for high dimensional Euclidean spaces [12] respectively, have been proposed recently. Considering the requirements of fMRI, we integrate these two versions of MS in the functional space, i.e. a high dimensional manifold. If xi, i = 1, ..., N are points in the functional space, then the adaptive kernel density estimate (KDE) at x with kernel profile k(·) can be expressed as (4). The mean shift vector in the tangent space of x is defined as (5). It can be proved that m(x) is collinear with the gradient of the probability density function (PDF) at x [11]. Here g(·) = −k′(·), and k(·) is chosen as the truncated Gaussian profile, so g(·) is truncated Gaussian as well. hi is the kernel bandwidth at xi, adaptively determined as half of the distance between xi and its k-th nearest neighbor (KNN method). KNN-adaptive MS has been shown to be robust in high dimensional Euclidean space [12]. In this paper, the results of both the synthetic and the real experiments change little when k ranges from 500 to 1500; we therefore choose k = 500 for computational efficiency.

Fig. 2 A distribution of two hills in the functional space. Red hill and blue hill are partitioned by a valley. The black arrows are the movement directions of points. The red and blue arrows denote the locations of time courses in brain. They belong to two networks.

Adaptive KDE: f(x) = (1/N) ∑i=1..N (1/hi^n) k( d(x, xi)^2 / hi^2 )   (4)

Mean Shift Vector: m(x) = [ ∑i=1..N (1/hi^(n+2)) g( d(x, xi)^2 / hi^2 ) logx(xi) ] / [ ∑i=1..N (1/hi^(n+2)) g( d(x, xi)^2 / hi^2 ) ]   (5)
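A simplified sketch of iterating the mean shift update of (4)-(5) on the sphere is given below; for brevity it uses a single KNN bandwidth computed at the current position instead of the per-point adaptive bandwidths hi, and the tolerance and iteration cap are assumptions.

import numpy as np

def mean_shift_iteration(X, k=500, tol=1e-4, max_iter=100):
    """Shift every point on the sphere towards the peak of its hill.

    X : (N, n) array whose rows are normalized time courses (zero mean, unit norm).
    Returns (X_new, dist): the shifted points and the geodesic distance each point moved.
    """
    N = X.shape[0]
    X_new = X.copy()
    dist = np.zeros(N)
    for i in range(N):
        x = X[i].copy()
        for _ in range(max_iter):
            cosd = np.clip(X @ x, -1.0, 1.0)
            d = np.arccos(cosd)                               # geodesic distances to all points
            h = 0.5 * np.sort(d)[k]                           # single KNN bandwidth (simplification)
            w = np.exp(-0.5 * (d / h) ** 2) * (d < 3.0 * h)   # truncated Gaussian profile
            U = X - np.outer(cosd, x)                         # components of each x_j orthogonal to x
            nU = np.maximum(np.linalg.norm(U, axis=1), 1e-12)
            logs = (d / nU)[:, None] * U                      # log_x(x_j) for every j
            m = (w[:, None] * logs).sum(axis=0) / max(w.sum(), 1e-12)   # mean shift vector
            nm = np.linalg.norm(m)
            if nm < tol:
                break
            x = np.cos(nm) * x + np.sin(nm) * (m / nm)        # exponential map update
        X_new[i] = x
        dist[i] = np.arccos(np.clip(np.dot(X[i], x), -1.0, 1.0))
    return X_new, dist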

MSFD can be used not only for activation detection in task studies, but also for functional network analysis in resting-state studies. Both uses are based on the Mean Shift Iteration (MSI) given below. Two images are obtained after MSI. One is a 4D functional image, which contains the time courses after MSI. The other is a 3D dist image for statistical post-processing, which records how far each point has moved from its original position in the functional space. After MSI, the new points gather at the peaks of the hills, which are the optimal representative time courses. Thus, if two points belong to two different groups, the distance between them is enlarged after MSI, which enhances the boundary between different networks (Fig. 2) [13].

Algorithm: Mean Shift Iteration (MSI)

Input: time courses yi, i = 1, ..., N; normalize them to xi, i = 1, ..., N
for i = 1, ..., N
    x ← xi, disti ← 0
    Repeat:
        Determine the bandwidth h (KNN)
        Compute the mean shift vector m(x) in the tangent space of x (Eq. 5)
        x ← expx(m(x))
    Until ‖m(x)‖ < ε (convergence)
Output: save every new xi ← x and disti = d(xi, x), i = 1, ..., N

MSFD for functional network analysis in resting-state study. We can estimate the representative time courses through the following process. First, clusters are generated by grouping together all the new points that are closer than a given small threshold (geodesic distance 0.05, i.e. correlation coefficient = 0.9988). Second, cluster centers are calculated by averaging the new time courses within each cluster. Third, these centers are sorted in descending order according to the number of voxels in the cluster: the more voxels in a cluster, the more believable its representative time course is. For every representative course and every point in the whole brain, a distance index can then be calculated, where d is the distance between the new point and the representative course. Analogous to the GLM, a T value is used to measure how close a point is to the representative course:

T = √(n − 2) · cos d / √(1 − cos^2 d), where n is the length of the time course   (6)

Therefore, for each representative course there is a T map over the points i = 1, ..., M. If the number of iterations is 0, MSI does not start and dist = 0: every point is its own representative time course, since there is just one point in every cluster. Then for every voxel we have a T map, and in that case MSFD is the same as the functional connectivity method. MSFD for activation detection in task study. Because the aim in a task study is to find the activated region rather than all functional networks in the brain, it can be seen as a special case of whole-brain network analysis: we only need to find the right hill (i.e. the activation hill) to which the given point (the expected hemodynamic response) belongs, so the computational cost is much lower than in a resting-state study. In practice, the given hemodynamic response (HR) is usually biased by many factors, such as noise and the assumption that different regions share the same HR. A novel idea to avoid these limitations and make full use of the given biased reference is to consider it as a real time course in the brain and to update it through MSI. We consider the corrected expected HR as the real activation reference for this subject. Obviously, different subjects will have different references. However, these references are close if all subjects perform the same task, which can be seen from the small dist values of all references. The corrected reference is considered as the representative time course of the activation hill. MSI is needed only for points that lie within a certain cone around the reference (e.g. correlation coefficient > 0.05). For the points after MSI, d is the distance between the new point and the corrected reference; analogously, T values (formula 6) can be calculated for these points. Thus we obtain an activation significance map, i.e. a T map, for the reference. It should be noted that if the number of iterations is 0, MSI does not start and the reference is not updated; the T map is then the same as the T map of a GLM with only one regressor, i.e. the reference.

4 Experiments on Synthetic and Real Data

Two synthetic and two real experiments are presented to compare the proposed method with the widely used GLM and ICA. For the synthetic experiments, we use receiver operating characteristic (ROC) analysis [14] to evaluate MSFD, GLM (SPM2, http://www.fil.ion.ucl.ac.uk/spm/) and ICA (GIFT, http://icatb.sourceforge.net/). In the first synthetic experiment, we test the activation detection performance of the three algorithms with an accurate reference. We use a resting-state dataset as null data [15] and preprocess it with SPM2, including slice timing, realignment, EPI template normalization to a 3×3×3 mm voxel size and Gaussian smoothing with 4 mm FWHM. Finally, synthetic data are generated by replacing the resting time courses at given positions (two 10×10×10 boxes, Fig. 3(a)) with a simulated response, generated by mixing the reference signal with white noise at a given contrast-to-noise ratio (CNR = 0.4) [7]. The reference signal is simulated by a boxcar (24 s rest and 26 s stimulation, 7 cycles in total) convolved with the HRF, which is a combination of two γ-functions in SPM2. With an accurate estimate of the reference, GLM and ICA both achieve perfect classification, which is expected because the activation signal generation satisfies the assumptions of GLM (linear addition, white noise) and ICA (spatial independence). MSFD also achieves perfect classification, even without these assumptions.
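For illustration, a small sketch of generating such a boxcar reference convolved with a double-gamma HRF is shown below; the gamma shape parameters are the commonly used canonical values and are an assumption here rather than the exact SPM2 settings.

import numpy as np
from scipy.stats import gamma

def canonical_hrf(tr, duration=32.0):
    # Double-gamma HRF sampled every `tr` seconds (canonical-style shape parameters).
    t = np.arange(0.0, duration, tr)
    h = gamma.pdf(t, a=6.0) - gamma.pdf(t, a=16.0) / 6.0
    return h / h.sum()

def boxcar_reference(tr=2.0, rest=24.0, stim=26.0, cycles=7):
    # Boxcar paradigm (rest / stimulation blocks) convolved with the HRF.
    block = np.concatenate([np.zeros(int(rest / tr)), np.ones(int(stim / tr))])
    paradigm = np.tile(block, cycles)
    return np.convolve(paradigm, canonical_hrf(tr))[: paradigm.size]

# The simulated voxel response is then this reference plus white noise scaled to the desired CNR.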

Fig. 3 ROC analysis of synthetic data. From left to right: (a) Illustration of block position. (b) the real reference, the mean time course of boxes, the biased reference and the corrected reference; (c) ROC curves of three methods with the biased reference.

The second synthetic experiment evaluates the robustness of the methods to a biased reference signal, since the real activation reference cannot be accurately estimated in real experiments. The average time course of the original resting data in the two boxes was added to the reference at CNR = 1.0 to simulate the biased reference; the correlation coefficient between the real reference and the biased reference was 0.723. Fig. 3(b) shows that after MSI the corrected reference is very close to the real reference (correlation coefficient 0.96), which leaves the activation detection result unchanged and validates the robustness of MSFD. The ROC curves of the three methods with the biased reference are given in Fig. 3(c). They reveal that the performance of GLM and ICA relies on the reference estimation. A wrong component was chosen in ICA based on the biased reference, which yielded the lowest performance, even though the components were decomposed independently of the reference.

In the first real experiment, the real fMRI data is the auditory bi-syllabic dataset from the SPM public data (http://www.fil.ion.ucl.ac.uk/spm/data/auditory.html). The preprocessing procedures are similar to those used for the synthetic data. Fig. 4 shows one slice of the results. All three methods found the activated auditory regions. From the color bars of Fig. 4, we can see that MSFD has the highest contrast, because it enhances the boundary between different networks.

In the second real experiment, the resting-state data that served as null data in the synthetic experiments is used to explore the functional networks. We obtain the representative time courses after MSFD, and for each of them we have a T map. Here we show some of these T maps for the default-mode network, hippocampus, sensorimotor and visual areas (Fig. 5). The slice number is shown in each axial image.

Fig. 4 Results of one slice. From left to right: (Left) GLM (SPM2, p<0.0001); (Middle) ICA (GIFT, z> 0.6); (Right) MSFD (T>2).


Fig. 5 From left to right: default-mode network, hippocampus, sensorimotor, visual area.

5 Conclusions

In this paper, we propose a nonparametric Mean Shift Functional Detection (MSFD) method based on a Riemannian framework in the functional space. Compared with other classic methods, GLM and ICA, MSFD assumes neither linear mixing nor independence. It is based on an intrinsic manifold formulation, does not depend on the initialization, and does not assume the number or shape of the clusters. MSFD considers the local spatial information and is a multivariate extension of the functional connectivity method. For every optimal representative time course, MSFD produces a T map, which is analogous to the T map in GLM. It can enhance the boundary between different functional networks, and robustly detect the optimal representative time courses of the data. Its good performance is validated by the experiments on both synthetic and real data. Therefore, it is a data-driven, robust method based on a manifold formulation, which is appropriate both for activation detection in task studies and for functional network analysis in resting-state studies.

References

1. Friston, K.J., Holmes, A.P., Worsley, K.J., Poline, J.B., Frith, C.D., Frackowiak, R.S.J.: Statistical parametric maps in functional imaging: a general linear approach. Human Brain Mapping 2 (1995) 189-210

2. Biswal, B., Zerrin Yetkin, F., Haughton, V.M., Hyde, J.S.: Functional connectivity in the motor cortex of resting human brain using echo-planar MRI. Magnetic Resonance in Medicine 34 (1995) 537-541

3. Friston, K.J.: Eigenimages and multivariate analyses. In: Frackowiak, R.S.J. (ed.): Human Brain Function. 1st edn (1997)

4. McKeown, M.J., Makeig, S., Brown, G.G., Jung, T.P., Kindermann, S.S., Bell, A.J., Sejnowski, T.J.: Analysis of fMRI data by blind separation into independent spatial components. Human Brain Mapping 6 (1998) 160-188

5. Beckmann, C.F., Smith, S.M.: Probabilistic independent component analysis for functional magnetic resonance imaging. IEEE transactions on medical imaging 23 (2004) 137-152

6. Calhoun, V.D., Adali, T., Pearlson, G.D., Pekar, J.J.: Spatial and temporal independent component analysis of functional MRI data containing a pair of task-related waveforms. Human Brain Mapping 13 (2001) 43-53

7. Fadili, M.J., Ruan, S., Bloyet, D., Mazoyer, B.: A multistep Unsupervised Fuzzy Clustering Analysis of fMRI time series. Human Brain Mapping 10 (2000) 160-178

8. Golland, P., Golland, Y., Malach, R.: Detection of spatial activation patterns as unsupervised segmentation of fMRI data. MICCAI 2007, Vol. 4791. Springer, Heidelberg (2007) 110-118

9. Buss, S.R., Fillmore, J.P.: Spherical averages and applications to spherical splines and interpolation. ACM Transactions on Graphics 20 (2001) 95-126

10. Comaniciu, D., Meer, P.: Mean shift: a robust approach toward feature space analysis. Pattern Analysis and Machine Intelligence, IEEE Transactions on 24 (2002) 603-619

11. Subbarao, R., Meer, P.: Nonlinear Mean Shift for Clustering over Analytic Manifolds. Proc. IEEE Conf. on Computer Vision and Pattern Recognition, New York, 1 (2006) 1168–1175

12. Georgescu, B., Shimshoni, I., Meer, P.: Mean shift based clustering in high dimensions: a texture classification example. IEEE International Conference on Computer Vision, Vol.1 (2003) 456-463

13. Cohen, A.L., Fair, D.A., Dosenbach, N.U., Miezin, F.M., Dierker, D., Van Essen, D.C., Schlaggar, B.L., Petersen, S.E.: Defining functional areas in individual human brains using resting functional connectivity MRI. Neuroimage 41 (2008) 45-57

14. Skudlarski, P., Constable, R.T., Gore, J.C.: ROC Analysis of Statistical Methods Used in Functional MRI: Individual Subjects. Neuroimage 9 (1999) 311-329

15. Lu, Y., Jiang, T., Zang, Y.: Region growing method for the analysis of functional MRI data. Neuroimage 20 (2003) 455-465


Adaptive hierarchical Bayesian mixture for sparse regression - An application to brain activity classification

Vincent Michel1,2,5, Evelyn Eger4,5, Christine Keribin2,3, and Bertrand Thirion1,5

1 Parietal team, INRIA Saclay-Ile-de-France, Saclay, France, 2 Universite Paris-Sud 11, Orsay, France,

3 SELECT team, INRIA Saclay-Ile-de-France, France, 4 INSERM U562, Gif/Yvette, France

5 CEA, DSV, I2BM, NeuroSpin, Gif/Yvette, France

Abstract. In this article, we describe a novel method for regularized regression and apply it to the prediction of a behavioural variable from brain activation images. In the context of neuroimaging, regression or classification techniques are often plagued by the curse of dimensionality, due to the extremely high number of voxels and the limited number of activation maps. A commonly-used solution is regularization of the weights used in the parametric prediction function. To solve the difficult issue of choosing the correct amount of regularization in the model, we propose to use a Bayesian framework, but model specification needs a careful design to balance adaptiveness and sparsity. We introduce an adaptive mixture regularization that generalizes previous approaches. Based on a Variational Bayes estimation framework, our algorithm is robust to overfitting and more adaptive than other regularization methods. Results on both simulated and real data show the accuracy of the method in the context of brain activation images.

1 Introduction

A recent trend in neuroimaging [1] consists of inferring behavioral information or cognitive states from brain activation images such as those obtained with functional magnetic resonance imaging (fMRI). It can provide more sensitive analyses than standard statistical parametric mapping procedures [2]. Specifically, it can be used to check the involvement of one or several brain regions in specific cognitive or perceptual functions by evaluating the accuracy of the prediction of a behavioral or cognitive variable of interest (the target) when the classifier is instantiated on that particular brain region. Such an approach is particularly well suited for the investigation of population coding [3]: certain neuronal populations are thought to activate specifically when a certain perceptual or cognitive parameter reaches a given value. Inferring this parameter from the neuronal activity helps to decode the brain system.

The main difficulty in this procedure is the huge dimensionality of the data, with far more features than samples. In this article, the samples (or activation maps) refer to the regression coefficients in the General Linear Model (GLM) analysis (i.e. beta maps), while the features correspond to the voxels. The large number of features leads to overfitting and thus a dramatic decrease in prediction accuracy. One common solution consists in working in the dual space using the kernel trick [4], but in the case of neuroimaging, one may prefer to use explicit loadings on brain regions, hence to work in the primal space. To deal with this dimensionality problem, some regularized regression techniques have been developed, forcing the majority of the features to have zero or close-to-zero loadings, such as Lasso [5] and elastic net [6]; however, these approaches require that the amount of regularization is fixed beforehand, and possibly optimized by cross-validation. By contrast, Bayesian methods (e.g. adaptive ridge regression [7] and Automatic Relevance Determination – ARD [8]) adapt the amount of regularization to the problem at hand. These regularized regression methods have already been used for predicting cognitive states. In [9], a model based on ARD has been proposed for weighting activity patterns in the case of logistic regression, but ARD can overfit in the case of very high dimension. Similarly, in [10] a Bayesian regression approach is used to classify brain states, but the construction relies on ad hoc voxel selection steps, so that there is no proof that the solution is optimal. In summary, Bayesian regression techniques have been developed in two contexts: on the one hand, adaptive ridge regression regularizes all the loadings with the same parameter, which is not well-suited for brain activity where only a few clusters of voxels have task-related activity; on the other hand, ARD regularizes each voxel separately, and is prone to overfitting when the model contains too many regressors.

In this article, we develop an intermediate approach for sparse regularized regression, which assigns each voxel to a class. Regularization is performed in each class separately, leading to a stable and adaptive regularization while avoiding overfit – this approach is thus a compromise between ridge regression and ARD. The algorithm is based on a Variational Bayes (VB) approach which leads to a fast estimation of the weight distributions. The parameter-updating algorithm is no more complex than an Expectation Maximization algorithm, and it iteratively adapts the hyperparameters to the particular problem. Moreover, the VB approach has one important property for model selection: it contains a built-in criterion, the free energy of the model. After introducing our model and the VB approach, we show that the proposed algorithm performs better than reference methods on simulated data, and leads to promising results on real data.

2 Methods

We introduce the following regression model :

y = Φw + ε (1)

where y represents the behavioural data to be fitted (y ∈ Rn) and w the parameters (w ∈ Rm). n is the number of beta maps obtained with a GLM (each image corresponds to one stimulus presentation) and depends on the number of blocks in the paradigm; m is the number of voxels and Φ is the design matrix (Φ ∈ Rn×m, each row is an m-dimensional activation map). The crucial issue here is that n ≪ m, so that estimating w is an ill-posed problem. A solution is to introduce some priors over the parameter distribution.

Priors on regression and automatic relevance determination (ARD) - Regularized regression can be used to solve this ill-posed problem, by imposing a prior on the weights, hence possibly a sparse feature weighting. First, we model the noise with a Gaussian density:

ε ∼ N(0, σ2 In)   (2)

p(σ2) = Γ(−1)(λ1, λ2)   (3)

with two hyperparameters λ1, λ2. Γ(−1) stands for the inverse gamma density. For mathematical convenience, we make use of a conjugate Gaussian prior for the weights, leading to an L2 penalty:

w ∼ N(0, A−1), A = diag(α1, ..., αm)   (4)

p(αi) = Γ(γi1, γi2)   (5)

where αi, i ∈ [1, m] are the precision parameters, and Γ is the gamma density. Two important cases correspond to adaptive ridge regression (α1 = ... = αm) and ARD (αi ≠ αj if i ≠ j). Still, the highly adaptive regularization of ARD can lead to severe overfitting if n ≪ m.
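To make the hierarchy of Eqs. (1)-(5) concrete, the following sketch draws one sample from the generative model; the hyperparameter values and problem sizes are placeholders chosen only for illustration.

import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 500                                # n images, m voxels (illustrative sizes)
lam1, lam2, g1, g2 = 1.0, 1.0, 1.0, 1.0       # placeholder hyperparameters

Phi = rng.standard_normal((n, m))             # design matrix of activation maps
alpha = rng.gamma(shape=g1, scale=1.0 / g2, size=m)       # per-voxel precisions (ARD case);
                                              # adaptive ridge would share one alpha for all voxels
w = rng.normal(0.0, 1.0 / np.sqrt(alpha))     # w ~ N(0, A^{-1}), Eq. (4)
sigma2 = 1.0 / rng.gamma(shape=lam1, scale=1.0 / lam2)    # inverse-gamma noise variance, Eq. (3)
y = Phi @ w + rng.normal(0.0, np.sqrt(sigma2), size=n)    # Eq. (1)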

Mixture model (VBK) - In order to combine the sparsity of ARD with the stability of adaptive ridge regression, we introduce an intermediate representation, in which each voxel i belongs to one class among K indexed by the discrete variable zi. Thus, all the features within a class k ∈ [1, .., K] share the same precision parameter αk. Next, we introduce a prior on z:

p(z) = ∏i=1..m ∏k=1..K πk^ηik, with ηik = 1 if zi = k and ηik = 0 otherwise   (6)

and a Dirichlet prior on πk with hyperparameter δ: p(π) = Dir(δ). The complete generative model is summarized in Fig. 1. This model has no spatial constraints, and thus is not spatially regularized.

Estimation and Selection of the model by Variational Bayes - To select a model among several alternatives, it is natural to keep the model that yields the largest data evidence p(y). Thus, we use the variational approach that provides a closed-form approximation q(θ) of p(θ|y), where q(θ) is in a given family of distributions and θ = [σ2, z, α, w, π] are the parameters of the model. We have:

q(θ) = q(w) ( ∏i=1..m q(zi) ) ( ∏k=1..K q(αk) q(πk) ) q(σ2)   (7)

By using conjugate priors, this variational scheme provides closed-form update rules. We can then decompose log p(y) as the sum of the free energy F

Fig. 1. Generative model of the Bayesian mixture regression approach.

and the Kullback-Leibler divergence between the true posterior p(θ|y) and the variational approximation q(θ):

log p(y) = F(q(θ)) +DKL(q(θ)||p(θ|y)) (8)

F(q) = ∫ q(θ) log [ p(θ, y) / q(θ) ] dθ   (9)

The free energy F is thus a lower bound on log p(y), with equality iff q(θ) = p(θ|y), and inferring the density q of the parameters corresponds to maximizing F, which we do not detail here. Moreover, the free energy is a measure of the quality of the model and can be used in a model-selection scheme, in which case the global, time-consuming cross-validation-based optimization of K is avoided.

Initialization and validation - The initialization is set as in [8], with weakly-informative priors, λ1 = λ2 = γ1 = γ2 = 10−6 and δk = 5, ∀k ∈ [1, ..., K] (see [11]). The initialization of z is performed by using a K-Means on the F-statistics of the features. Since the estimation algorithm converges to a local maximum of F, it is very sensitive to this initialization. The performance of the competing models is assessed using the ratio of explained variance ζ. Let Φl, yl be a learning set, Φt, yt a test set, and ŷt(Φl, yl, Φt) the prediction obtained with a model trained on Φl, yl and tested with Φt.

ζ(Φl, yl, Φt, yt) = [ var(yt) − var(yt − ŷt(Φl, yl, Φt)) ] / var(yt)   (10)

ζ is the amount of variability in the response that can be explained by the model (prediction is perfect if ζ = 1, and worse than chance if ζ < 0).
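A minimal sketch of Eq. (10), assuming yt and its prediction are given as one-dimensional arrays:

import numpy as np

def explained_variance_ratio(y_test, y_pred):
    # Eq. (10): 1 when the prediction is perfect, < 0 when worse than chance.
    return (np.var(y_test) - np.var(y_test - y_pred)) / np.var(y_test)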

3 Experiments and Results

Simulated Data - We have tested our algorithm on a simulated dataset X of n images with squared Regions of Interest (ROIs) R (each defined by a position and a width). We note b the background (i.e. outside the ROIs). The signal in the (i, j) voxel of the kth image is simulated as:

Xi,j,k = ∑r∈R Ir(i, j) αr,k ui,j,k + Ib(i, j) ui,j,k + εi,j,k   (11)

where ui,j,k is a random value from a uniform distribution in [0, 1], εi,j,k a random value from a Gaussian distribution N(0, 1) smoothed with a parameter of 2 voxels to mimic the correlation structure observed in real fMRI datasets, and αr,k ∼ U[0, 1] for ROI r and image k. Ir(i, j) = 1 (resp. Ib) if the (i, j) voxel is in r (resp. b), and Ir(i, j) = 0 (resp. Ib) elsewhere. We simulate the target Y as Yk = ∑r∈R αr,k. We generate a dataset of 250 images, and split it into a learning set of n = 200 images and a validation set of 50 images. The images have a size of 20 × 20, with two non-overlapping ROIs of width 2 pixels. An example is given in Fig. 3. We compare our algorithm with three other methods: a bilinear kernel-based variational ARD regression (also called Relevance Vector Machine, RVM [8]), an elastic net regularization procedure (called Enet [6], with parameters s = 0.5 and λ = 0.1), and a Support Vector Regression procedure (called SVR [12], with a linear kernel and C = 1). The regularization parameters for the RVM, SVR and Enet methods are fixed using an ad hoc calibration (by selecting the ones that yield the best accuracy on simulated datasets).
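A sketch of this simulation following Eq. (11) is given below; the ROI positions and the use of scipy's Gaussian filter for the smoothed noise are illustrative assumptions.

import numpy as np
from scipy.ndimage import gaussian_filter

def simulate_dataset(n_images=250, size=20, rois=((4, 4), (14, 14)), width=2, seed=0):
    """Simulate images X (n_images, size*size) and targets Y following Eq. (11)."""
    rng = np.random.default_rng(seed)
    X = np.empty((n_images, size, size))
    Y = np.empty(n_images)
    for k in range(n_images):
        u = rng.uniform(0.0, 1.0, size=(size, size))                     # u_{i,j,k}
        eps = gaussian_filter(rng.standard_normal((size, size)), sigma=2)  # smoothed noise
        img = u + eps                                                    # background term
        alphas = rng.uniform(0.0, 1.0, size=len(rois))                   # alpha_{r,k}
        for (r0, c0), a in zip(rois, alphas):
            img[r0:r0 + width, c0:c0 + width] = (a * u + eps)[r0:r0 + width, c0:c0 + width]
        X[k] = img
        Y[k] = alphas.sum()                                              # target Y_k
    return X.reshape(n_images, -1), Y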

Results on Simulated Data - In a first experiment, we report the average results obtained for the different methods over 40 tests, with K equal to 1, 2 or 3. See the results in Fig. 3: the VBK algorithm outperforms the other methods for K > 1 (c). Moreover, the VBK method finds very low and stable weights outside the ROIs (a, b), whereas Enet leads to a sparse (many weights are close to zero) but less stable (higher standard deviation) regularized solution. Both RVM and SVR yield a poorly regularized solution: many irrelevant voxels have a significant weight. In a second experiment, we compute the explained variance and the free energy for different models with K ∈ [1, 2, 3, 4, 5] for 20 samples (see Fig. 2). We can see that the free energy is strongly correlated with the explained variance, and yields the same optimum value K = 3. Moreover, the VBK classifier has a standard deviation which increases with K: being more adaptive, the classifier yields less stable predictions.

Fig. 2. Results of the model selection procedure in the simulation experiment. The free energy (red) and the explained variance (blue) are averaged over 20 simulations. They are strongly correlated, with a maximum reached for K = 3. Thus, the free energy of the VBK model can be used for the selection of the model.

Real data - We use a real dataset related to a numerotopy (mental representations of quantities) experiment. During the experiment, ten healthy volunteers view dot patterns with different quantities of dots (ν = 2, 4, 6 and 8; we take y = log(ν)), with 8 repetitions of each stimulus, so that we have a total of n = 32 images by subject. Functional images were acquired on a 3 Tesla MR system with a 12-channel head coil (Siemens Trio TIM) as T2*-weighted echo-planar image (EPI) volumes using a high-resolution EPI sequence. 26 oblique-transverse slices covering parietal and superior parts of the frontal lobes were obtained in interleaved acquisition order with a TR of 2.5 s (FOV 192 mm, fat suppression, TE 30 ms, flip angle 78°, 1.5×1.5×1.5 mm3 voxels). The slice-timing, realignment, co-registration and the fit of the GLM were performed with the SPM5 software. For our analysis we use the resulting activation maps (each image is standardized by removing the mean and dividing by the standard deviation). We select 1000 voxels included in the main ROI, i.e. the Intra-Parietal Sulcus (IPS), which has been manually delineated in all the available datasets prior to fMRI data analysis. We divide this region into 200 parcels with an alternative of Ward's algorithm [13], and we average the values of the beta maps over voxels within each parcel. The signal is thus more stable and the computation faster.

(c)

VBK - K=1 VBK - K=2 VBK - K=3 RVM Enet SVR

average ζ 0.05 0.35 0.39 0.09 0.31 0.16

std. deviation ζ 0.16 0.18 0.13 0.17 0.18 0.17

Fig. 3. Results of the simulation experiment. ROIs are outlined by blue squares. Top - Example ofsimulated data: amplitude of the signal of an image (left), simulated loadings (middle) and F-statistic(right). Mean (a) and standard deviation (b) for the weights found with different methods. The VBKapproach gives weights similar to those of the Enet method, but with more stable estimation outsidethe ROIs. The RVM and SVM approaches lead to non-zero weights outside the ROIs, and weightsestimation is not stable across trials. (c) Ratio and standard deviation of ζ (see Eq.10) for differentmethods averaged on 40 simulations. The VBK algorithm outperforms all the other techniques andyields less variable results (when K > 1).

images by subject. Functional images were acquired on a 3 Tesla MR systemwith 12-channel head coil (Siemens Trio TIM) as T2* weighted echo-planar im-age (EPI) volumes using a high-resolution EPI-sequence. 26 oblique-transverseslices covering parietal and superior parts of frontal lobes were obtained in inter-leaved acquisition order with a TR of 2.5 s (FOV 192 mm, fat suppression, TE30 ms, flip angle 78◦ , 1.5×1.5×1.5 mm3 voxels). The slice-timing, realignment,co-registration and the fit of the GLM have been performed with the SPM5software. For our analysis we use the resulting activation maps (each image isstandardized by removing the mean and dividing by the standard deviation).We select 1000 voxels included in the main ROI, i.e. the Intra-Parietal Sulcus(IPS), which has been manually delineated in all the available datasets prior tofMRI data analysis. We divide this region into 200 parcels with an alternativeof Ward’s algorithm [13], and we average the values of the beta maps over vox-els within each parcel. The signal is thus more stable and the computation faster.

Results on Real data - We compute the explained variance obtained in a leave-one-session-out procedure for different methods and compare the results with the VBK algorithm (K = 3). The averaged results across 10 subjects are given in Fig. 4: the Enet algorithm outperforms the other methods, but yields less stable predictions. The VBK method performs better than ARD but worse than the adaptive Ridge algorithm. More importantly, the VBK algorithm provides maps of probabilistic class membership. We perform a two-class study (the binary case yields more interpretable maps) on real data. Fig. 5(a) shows the average loadings w found by the VBK algorithm (K = 2) across subjects, superimposed on the anatomical image of one subject. Fig. 5(b) and Fig. 5(c) give the average probability of each voxel belonging to the low-weight or high-weight class. We can see that the VBK provides explicit classification maps that help to understand the anatomical organization of discriminant brain activity.

Fig. 4. Results on real data: ratio and standard deviation of ζ (see Eq. 10) for different methods averaged across 10 subjects. The VBK (K = 3) method performs better than ARD and is respectively 2% and 4% below the Ridge and Enet algorithms. The Enet algorithm yields less stable predictions.

              VBK    Ridge   ARD    Enet
Average ζ     39%    41%     17%    43%
Std. dev. ζ   25%    20%     103%   60%

Fig. 5. Results on the real dataset with K = 2. (a) Average weights of the parcels across subjects (between −0.1 - blue - and 0.1 - red), found by the VBK algorithm and superimposed on the anatomical image of one subject. The loadings with low magnitudes are not shown. (b), (c) Average probability (between 0 - yellow - and 0.5 - red) of each voxel to belong to the low-weight class (b) or the high-weight class (c).

4 Discussion

Regularization of voxel loadings significantly increases the generalization ability of the regression. However, this regularization has to be adapted to each particular fMRI dataset, which is done in this article by introducing a Bayesian mixture framework. Our approach performs an adaptive and efficient regularization, and is a compromise between a global regularization (ridge regression), which does not take into account the region-based structure of the information, and ARD, which is subject to overfitting in large dimension.
On simulated data, our approach performs better than other classical methods such as SVR, RVM and Enet. Besides an increase of the explained variance, which shows that the VBK approach extracts more information from the data, the loadings are less noisy and more stable, leading to more interpretable activation maps. The correlation between the free energy and the prediction accuracy confirms that the free energy is a valuable model selection tool that furthermore avoids time-consuming optimization by cross-validation.
Results on real data show that the VBK algorithm gives access to interpretable loading maps which are a powerful tool for understanding brain activity. The VBK algorithm yields less accurate predictions than other regularization methods, which can be explained by the difficulty of initializing the variable z in the study of real data. We expect that an alternative solution based on Gibbs sampling will lead to more accurate predictions.
A future direction of our work is to optimize the spatial model used in our framework (here we simply use a prior parcellation of the search volume) in relationship with the prediction function that is used. In parallel, we will develop non-linear versions (e.g. logistic/probit) of this model for classification problems.

Conclusion - We have presented a multi-class regularization approach that includes adaptive ridge regression and automatic relevance determination as limit cases; the ensuing problem of optimizing the number of classes is easily dealt with in the Variational Bayes framework. Our simulations and real experiments show that our approach is well-suited for neuroimaging, as it yields a powerful framework and also reliable and interpretable feature loadings.

References

1. Cox, D.D., Savoy, R.L.: Functional magnetic resonance imaging (fMRI) "brain reading": detecting and classifying distributed patterns of fMRI activity in human visual cortex. NeuroImage 19(2) (2003) 261–270

2. Kamitani, Y., Tong, F.: Decoding the visual and subjective contents of the human brain. Nature Neuroscience 8(5) (April 2005) 679–685

3. Dayan, P., Abbott, L.F.: Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems. The MIT Press (2001)

4. Aizerman, A., Braverman, E.M., Rozoner, L.I.: Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control 25 (1964) 821–837

5. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 (1996) 267–288

6. Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society B 67 (2005) 301–320

7. Bishop, C.M.: Pattern Recognition and Machine Learning (Information Science and Statistics). Springer (2006)

8. Bishop, C.M., Tipping, M.E.: Variational relevance vector machines. In: UAI '00: Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence (2000)

9. Yamashita, O., Sato, M.-a., Yoshioka, T., Tong, F., Kamitani, Y.: Sparse estimation automatically selects voxels relevant for the decoding of fMRI activity patterns. NeuroImage 42 (2008)

10. Friston, K., Chu, C., Mourao-Miranda, J., Hulme, O., Rees, G., Penny, W., Ashburner, J.: Bayesian decoding of brain images. NeuroImage 39 (2008) 181–205

11. Penny, W., Roberts, S.: Variational Bayes for 1-dimensional mixture models. (2000)

12. Cortes, C., Vapnik, V.: Support vector networks. Machine Learning 20 (1995) 273–297

13. Ward, J.H.: Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association 58(301) (1963) 236–244


Random Walker Based Estimation and Spatial Analysis

of Probabilistic fMRI Activation Maps

Bernard Ng1, Rafeef Abugharbieh1, Ghassan Hamarneh2, Martin J. McKeown3

1Biomedical Signal and Image Computing Lab, Department of Electrical and Computer Engineering, The University of British Columbia

2Medical Image Analysis Lab, Department of Computer Science, Simon Fraser University

3Department of Medicine (Neurology), Pacific Parkinson's Research Center

Abstract. Conventional univariate fMRI analysis typically examines each

voxel in isolation despite the fact that voxel interactions may be another

indication of brain activation. Here, we propose using a graph-theoretical

algorithm called “Random Walker” (RW), to estimate probabilistic activation

maps that encompass both activation effects and functional connectivity. The

RW algorithm has the distinct advantage of providing a unique, globally-optimal closed-form solution for computing the posterior probabilities. To

explore the implications of incorporating functional connectivity, we applied

our previously proposed invariant spatial features to the RW-based probabilistic

activation maps, which detected activation changes in multiple brain regions

that conform well to prior neuroscience knowledge. In contrast, similar analysis

on traditional activation statistics maps, which ignores functional connectivity,

resulted in reduced detection, thus demonstrating benefits for integrating

additional functional attributes into the activation detection procedures.

1 Introduction

Mass univariate analysis has by far been the most widely-used method for analyzing

functional magnetic resonance imaging (fMRI) data. In this approach, each voxel is

independently compared to an expected response to estimate the effects of activation.

As a result, voxel interactions (i.e. functional connectivity) are ignored, even though each voxel is unlikely to be functioning in isolation. In fact, numerous past studies

have shown the presence of task-related functional networks [1] and functional

connectivity has actually been used as the basis for functional segmentation [2]. Thus,

these findings suggest that functional connectivity may be another indication of brain

activation and incorporating this additional information in estimating activation

effects may help combat the inherently low signal-to-noise (SNR) in fMRI signals.

The idea of incorporating functional connectivity in activation detection has, in

fact, been indirectly exploited in other past methods in the form of neighborhood

information. For instance, Descombes et al. proposed modeling activation effect maps

as Markov Random Fields (MRF) to encourage neighboring voxels to have similar labels (e.g. active or non-active) [3]. Others proposed using Bayesian approaches to

directly integrate neighborhood information into the activation effect estimates [4]–[6]. The underlying assumption is that neighboring voxels are likely to be functionally

correlated, hence should exhibit similar activation levels. However, these methods

typically model only correlations between immediate spatial neighbors due to

computational complexity. Yet, considering that spatially-disconnected voxels may

also jointly activate upon stimulus [1], modeling the interactions between spatially-disjointed voxels, in addition to immediate neighbors, may be beneficial.

Another limitation with standard mass univariate analysis is the need for spatial

warping to generate a voxel correspondence across subjects. This correspondence is

required for assessing the amount of inter-subject functional overlap to draw group

inferences. However, due to anatomical and functional inter-subject variability,

spatial warping tends to result in mis-registrations [7] and spatial distortions [8],

which may render voxel-based group analysis inaccurate. To create a subject

correspondence without spatial warping, a popular approach is to specify regions of

interest (ROIs) and examine statistical properties of regional activation [9]. Under this

ROI-based approach, group inferences are made by statistically comparing features

extracted from the ROIs, such as mean activation statistics, across subjects. The

underlying assumption is that the same anatomical ROIs across subjects pertain to

similar brain functions. It is worth noting that this ROI-based approach is not directly

comparable to conventional univariate group analysis, since subject correspondence is

only established at a regional level, which precludes localization of common active

voxels across subjects. Nevertheless, a more hypothesis-driven question of whether

particular brain regions are activated is addressed, which is often the question of

interest [7]. We have thus taken this region-based approach for the current study.

In this paper, we propose to estimate probabilistic activation maps using a graph-theoretic method called "Random Walker" (RW) that enables functional connectivity

to be seamlessly integrated into the activation probability estimates. RW has the

distinct advantage of providing an exact, unique, and globally-optimal closed-form

solution for computing posterior probabilities [10]. Under the RW formulation, voxels

correspond to graph vertices with voxel interactions modeled as edge weights.

Specifically, in the context of fMRI, activation effects are modeled through a

likelihood term with functional connectivity encoded into a weighted graph Laplacian

matrix for regularizing the likelihood. To infer group effects from the RW-based

probabilistic activation maps under the ROI-based approach, characteristic features

will first need to be defined and extracted from the probability maps. However, the

optimal way for summarizing the regional activation probabilities is unclear. For

instance, simply averaging the activation probabilities over an ROI and comparing

these averages across subjects as done in standard ROI-based analyses [9] may be

insufficient. Yet, if we consider the probabilistic activation maps as grayscale images,

we can characterize the spatial distribution of regional activation, as reflected by the

relative magnitude of the probability values, using invariant spatial features as we

have previously proposed [11]. To explore the implications of incorporating

functional connectivity, we contrast the sensitivity of the invariant spatial features

derived from the RW-based probabilistic activation maps and the standard activation

statistics maps (i.e. t-maps) in detecting group activation changes during a visuo-motor tracking experiment.


2 Materials

After obtaining informed consent, fMRI data were collected from 10 healthy subjects

(3 men, 7 women, mean age 57.4 ± 14 years). Each subject used their left hand to

squeeze a bulb with sufficient pressure such that a horizontal bar shown on a screen

was kept within an undulating pathway. The pathway remained straight during

baseline periods, which required a constant pressure to be applied. During the time of

stimulus, the pathway became sinusoidal at a frequency of 0.25 Hz (slow), 0.5 Hz

(medium) or 0.75 Hz (fast) presented in a pseudo-random order. Each session lasted

260 s, alternating between baseline and stimulus of 20 s duration.

Functional MRI was performed on a Philips Gyroscan Intera 3.0 T scanner

(Philips, Best, Netherlands) equipped with a head-coil. T2*-weighted images with

blood oxygen level dependent (BOLD) contrast were acquired using an echo-planar

(EPI) sequence with an echo time of 3.7 ms, a repetition time of 1985 ms, a flip angle

of 90°, an in plane resolution of 128×128 pixels, and a pixel size of 1.9×1.9 mm. Each

volume consisted of 36 axial slices of 3 mm thickness with a 1 mm gap. A 3D T1-weighted image consisting of 170 axial slices was further acquired to facilitate

anatomical localization of activation. Each subject’s fMRI data was pre-processed

using Brain Voyager’s (Brain Innovation B.V.) trilinear interpolation for 3D motion

correction and sinc interpolation for slice timing correction. Further motion correction

was performed using motion corrected independent component analysis (MCICA)

[12]. The voxel time courses were high-pass filtered to account for temporal drifts and

temporally whitened using an autoregressive AR(1) model. No spatial warping or

smoothing was performed. For testing our proposed method, we selected twelve

motor-related ROIs, including the bilateral thalamus (THA), cerebellum (CER),

primary motor cortex (M1), supplementary motor area (SMA), prefrontal cortex

(PFC), and anterior cingulate cortex (ACC). Anatomical delineation of these ROI was

performed by an expert based on anatomical landmarks and guided by a neurological

atlas. The segmented ROIs were resliced at the fMRI resolution and used to extract

the preprocessed voxel time courses within each ROI for subsequent analysis.

3 Methods

3.1 Probabilistic Activation Map Estimation Using Random Walker

Traditional ROI-based fMRI analysis typically draws group inference using features

extracted from t-maps [9]. However, t-maps computed from conventional univariate

approaches only model voxel-specific activation effects but not the functional

interactions between voxels. Considering that such inter-voxel dependencies are also

indications of brain activation, we propose estimating probabilistic activation maps

that integrate activation effects and functional connectivity using RW [10], which

provides an exact, unique closed-form solution for computing the posterior activation

probabilities. Under the original RW formulation, voxels are treated as graph vertices

with voxel interactions modeled as edge weights. The probability that a voxel belongs

to a certain class label is then estimated by computing the probability that a random walker starting at an unlabeled voxel will first reach each pre-labeled seed given the

edge weights which bias the paths. This formulation, however, is not directly

applicable to fMRI analysis since only voxel interactions are considered, which

completely ignores intensity (i.e. activation effects in the context of fMRI). Also, pre-specifying a seed for every functional region within an anatomical ROI may not be

possible. Therefore, we have adopted an extended RW formulation [10], where

posterior activation probabilities are estimated by minimizing an energy functional

that comprises an aspatial term (i.e. likelihood for modeling activation effects) and a

spatial term (i.e. prior for modeling functional connectivity). This formulation is

analogous to graph cuts (GC) approaches, where the aspatial and spatial terms in RW

correspond to the data fidelity and label interaction terms in GC [10]. The key

differences are that RW is non-iterative and minimizes the functional over the space

of real numbers, instead of over discrete labels [10]. Since the relative activation level

of the voxels defines the “texture” of the regional activation patterns, using

probabilistic activation maps consisting of real numbers between [0,1], as estimated

using RW, enables such textural information to be modeled in our spatial analysis.

Simulating all possible paths that a random walker may take is computationally

infeasible. Fortunately, it has been shown [10] that the posterior probabilities ps of the

voxels within an ROI being assigned a label s given data can be estimated by solving:

( L + ∑r=1..Nc Λr ) ps = λs ,   (1)

where λs is a column vector containing the likelihood of the voxels being assigned

label s as estimated based on the t-statistics, Λs is a matrix with λs along its diagonal,

and Nc is the number of class labels. Due to the presently unclear interpretation of

negative t-statistics [13], we restrict our analysis to only positive t-statistics. Hence,

voxels are designated as either active or non-active (i.e. ‘de-active’ is not considered).
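A minimal sketch of solving Eq. (1) once the Laplacian and the label likelihoods are available is given below; the dense solver is an assumption made for clarity, not a statement about the authors' implementation.

import numpy as np

def random_walker_posteriors(L, likelihood):
    """Solve (L + sum_r Lambda^r) p^s = lambda^s (Eq. 1) for every label s.

    L          : (n, n) weighted graph Laplacian of the ROI (see Eq. 2).
    likelihood : (n, n_labels) array whose column s is the likelihood vector lambda^s.
    """
    A = L + np.diag(likelihood.sum(axis=1))      # L plus the sum of the diagonal Lambda^r matrices
    P = np.linalg.solve(A, likelihood)           # one linear solve handles all labels at once
    return P / P.sum(axis=1, keepdims=True)      # guard against numerical drift across labels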

L is a weighted graph Laplacian matrix. To estimate the likelihood λs, we first

compute the t-statistics using a general linear model (GLM) with boxcar functions

convolved with the hemodynamic response as regressors. Two mixtures of

distributions are compared for modeling the t-statistics, namely a mixture of two

Gaussians (GMM) and a mixture of Gamma and Gaussian (GGM) with the Gamma

distribution used to model the activated voxels [6]. Specifying the parameter values of

these distributions a priori, however, can be very challenging. Therefore, we employ

the expectation maximization (EM) algorithm to learn the GMM parameters [14]. As

for estimating the GGM parameters, since no widely-accepted formulation is present,

an iterative method we refer to as the simplified-EM algorithm is employed:

Initialization: Label the voxels as active or non-active based on the GMM posterior

probability estimates. Assuming the active voxels have t-statistics, ti ~

Gamma(α, β), estimate α and β. Equations for estimating α and β can be found

in e.g. [15]. For the non-active voxels, we assume ti ~ N(µ , Σ) and estimate µ

and Σ as the sample mean and covariance.

E-step: Compute Gamma(ti; α, β) and N(ti; µ , Σ) for all voxels and re-label the

voxels based on maximum likelihood.

M-step: Based on the re-assignment, update α, β, µ , and Σ.

Convergence: Repeat the E and M steps until no re-assignment occur. The

resulting α, β, µ , and Σ are used to estimate the likelihood λs.
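A sketch of this simplified-EM procedure for the Gamma-Gaussian mixture is given below; the moment-based Gamma updates are one standard choice and are an assumption here, as is the initial hard labelling passed in by the caller.

import numpy as np
from scipy.stats import gamma, norm

def simplified_em(t, active, max_iter=100):
    """Hard-assignment EM for a Gamma (active) + Gaussian (non-active) mixture of t-statistics."""
    for _ in range(max_iter):
        ta, tn = t[active], t[~active]
        # Update step from the current hard labels
        mean_a, var_a = ta.mean(), ta.var()
        alpha, beta = mean_a ** 2 / var_a, var_a / mean_a      # moment estimates (assumption)
        mu, sigma = tn.mean(), tn.std()
        # Relabel every voxel by maximum likelihood under the two densities
        new_active = gamma.pdf(t, a=alpha, scale=beta) > norm.pdf(t, loc=mu, scale=sigma)
        if np.array_equal(new_active, active):                 # convergence: no re-assignment
            break
        active = new_active
    return active, (alpha, beta, mu, sigma)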


The simplified-EM algorithm is similar to EM except a hard assignment, instead of

posterior probabilities, is used in the E-step. Since RW only requires the likelihood λs

as input, this algorithm is sufficient for our current application. We note that to

decouple amplitude effects from spatial changes, as further discussed in Section 3.2,

we have fitted a separate mixture model for each experimental condition. Specifically,

if the t-statistics are generally larger during the fast condition, for example, we would like to remove this overall amplitude change from our spatial analysis so that any

detected changes in our spatial features will not be due to amplitude effects. This can

be achieved by fitting a separate mixture for each condition so that the centers of the

mixtures can adapt to the amplitude of the t-statistics during the different conditions.

To integrate functional connectivity into ps, we define L as follows:

wij = corr(Ii(t), Ij(t)),  i ≠ j
di = ∑j∈Ni wij
Lij = di if i = j;  Lij = −wij if i ≠ j and j ∈ Ni;  Lij = 0 otherwise ,   (2)

where Ik(t) is the intensity time course of voxel k and Ni is the neighborhood of voxel

i. In this work, we model the ROIs as fully-connected graphs to incorporate both

correlations between immediate neighbors and spatially-disconnected voxels.

However, some voxel correlations may have arisen due to blood flow or residual

movement artifacts that remained after preprocessing. Therefore, we set all wij below

a threshold to 0, where the threshold is determined by examining the distribution of

correlation values between signals generated from AR(1) processes: εk(t+1) = aεk(t) +

σk(t), σk(t) being a white Gaussian noise process and a is set to 0.3 as typically

observed in the residuals of real fMRI data. A threshold of 0.7 is applied to remove

99.7% of the probability density from the null distribution. Note that we have used

correlations instead of differences in t-statistics in (2) to account for cases where

voxels may have similar t-statistics but are only mildly correlated. For example, in a

block design experiment as employed in this study, if voxel i responds during the

beginning of a block, whereas voxel j responds near the end of the block, these voxels

will display similar t-statistics yet will not be highly correlated.
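A sketch of assembling the weighted graph Laplacian of Eq. (2) from the ROI time courses, using the fully-connected correlation graph and the 0.7 threshold described above:

import numpy as np

def roi_graph_laplacian(time_courses, threshold=0.7):
    """Weighted graph Laplacian for one ROI.

    time_courses : (n_voxels, n_timepoints) array of preprocessed voxel time courses.
    Edge weights are pairwise correlations; weights below `threshold` are set to 0
    to suppress noise-driven connections, as described in the text.
    """
    W = np.corrcoef(time_courses)           # fully-connected graph of correlations
    np.fill_diagonal(W, 0.0)                # no self-edges
    W[W < threshold] = 0.0                  # remove weak edges
    D = np.diag(W.sum(axis=1))              # degree matrix d_i = sum_j w_ij
    return D - W                            # L_ij = d_i on the diagonal, -w_ij off-diagonal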

3.2 Group Spatial Analysis

In standard ROI-based analysis, amplitude-based features, such as mean t-statistics,

are often used to summarize the ROI response. Group inferences are then drawn by

statistically comparing the ROI features across subjects. However, simply averaging

the t-statistics neglects potential spatial changes. Therefore, we proposed previously

to use invariant spatial features to characterize the ROI activation patterns [11]. Using

these spatial features enables a compact or focused pattern to be distinguished from a

diffused activation pattern for example. Also, the invariance properties of these

features facilitate comparisons of activation patterns in the subjects’ native space [11].

In this work, we examine activation changes using one such feature, J1, that is specific

for measuring the spatial extent of activation relative to the activation centroid:

J1 = µ200 + µ020 + µ002 ,   (3)

µpqr = ∫∫∫ (xs − x̄s)^p (ys − ȳs)^q (zs − z̄s)^r ρ(xs, ys, zs) dxs dys dzs ,   (4)


where n = p+q+r is the order of the centralized 3D moment, µpqr, ρ(xs,ys,zs) is the

probability of a voxel located at (xs,ys,zs) being active as estimated using RW,

(x̄s, ȳs, z̄s) are the centroid coordinates of ρ(xs,ys,zs), and (xs,ys,zs) are the (x,y,z)

coordinates of the voxels scaled to account for ROI size differences:

xs = x/s,  ys = y/s,  zs = z/s,  s = ∑i∈ROI ( (xi − x̄)^2 + (yi − ȳ)^2 + (zi − z̄)^2 ) ,   (5)

where (x̄, ȳ, z̄) is the anatomical centroid of a given ROI. In [11], we normalized the t-

statistics by the maximum to decouple amplitude effects from spatial changes, which

enables a diffused activation pattern with mild activation effects to be distinguished

from a focused pattern with high activation amplitude. Now, with ρ(xs,ys,zs) being

probability maps bounded between [0,1], this step is no longer required. Also in [11],

we obtained scale invariance by dividing µpqr by µ000^((p+q+r)/3 + 1), which may pool residual

amplitude effects after t-statistics normalization into the spatial analysis. Here, we

instead employ (5), which does not involve ρ(xs,ys,zs), to account for size differences.

For comparisons, we compute J1 with ρ(x_s, y_s, z_s) being t-statistics normalized by the maximum as well as using a sigmoid function:

ρ_norm(x_s, y_s, z_s) = 1 / (1 + exp(−σ · ρ(x_s, y_s, z_s) / ρ_max)) ,    (6)

where ρ_max is the 99th percentile of ρ(x_s, y_s, z_s) and σ is chosen such that outlier voxels with ρ(x_s, y_s, z_s) > ρ_max are saturated to 1. σ is set to 5 in this paper.
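For concreteness, here is a minimal sketch of how J1 could be computed from an ROI probability map, using Eqs. (3)–(5) in discretized form (as reconstructed above) together with the sigmoid normalization of Eq. (6); the input arrays `coords` and `prob` and the helper names are hypothetical, and this is an illustration rather than the authors' implementation.

```python
import numpy as np

def sigmoid_normalize(t_map, sigma=5.0):
    """Eq. (6): squash t-statistics, with rho_max the 99th percentile."""
    rho_max = np.percentile(t_map, 99)
    return 1.0 / (1.0 + np.exp(-sigma * t_map / rho_max))

def j1_feature(coords, prob):
    """J1 of Eq. (3) from voxel coordinates (n_voxels x 3) and activation probabilities."""
    coords = np.asarray(coords, dtype=float)
    prob = np.asarray(prob, dtype=float)

    # Scale factor of Eq. (5) as reconstructed above: sum of squared distances
    # of the ROI voxels to the ROI's anatomical centroid.
    anat_centroid = coords.mean(axis=0)
    s = np.sum((coords - anat_centroid) ** 2)
    scaled = coords / s

    # Centroid of rho over the scaled coordinates.
    w = prob / prob.sum()
    centroid = (w[:, None] * scaled).sum(axis=0)
    d = scaled - centroid

    # Second-order centralized moments mu_200, mu_020, mu_002 (Eq. 4, discretized).
    mu200 = np.sum((d[:, 0] ** 2) * prob)
    mu020 = np.sum((d[:, 1] ** 2) * prob)
    mu002 = np.sum((d[:, 2] ** 2) * prob)
    return mu200 + mu020 + mu002          # J1, Eq. (3)
```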

4 Results and Discussion

Probabilistic activation maps generated using RW are shown in Fig. 1. Compared to

standard t-maps, incorporating functional connectivity resulted in sharper activation

maps with voxels displaying moderate activation effects and low (high) functional

connectivity suppressed (reinforced). Increasing frequency led to a focusing of the

activation pattern in the left thalamus, as is evident by comparing Fig. 1(a) and (b), but

less apparent in (c) and (d). We note that observing fewer voxels recruited at higher

frequency may seem counter-intuitive, but this finding can be explained by the

increase in t-statistics of the recruited voxels during the fast condition (albeit not

significant compared to the slow condition, Fig. 2), thus suggesting that certain brain

regions adapt to higher frequencies by increasing activation at task-specific locations.

[Fig. 1 panels, each with a 0–1 color scale: (a) RW, slow; (b) RW, fast; (c) t-map, slow; (d) t-map, fast.]

Fig. 1. Activation maps of left thalamus of an exemplar subject. RW probabilistic activation

maps appear sharper than the t-maps (normalized using (6)). Increasing frequency resulted in a

focusing of the activation patterns as evident in (a) and (b) but less apparent in (c) and (d).


[Fig. 2 bar charts: |t-values| from 0.00 to 5.00 for each measure (mean t-value, sigmoid, div max, GG, MG, MM) in each ROI (left and right THA, CER, M1, SMA, PFC, ACC).]

Fig. 2. Real fMRI data results. t-values obtained by applying a paired t-test to J1 of the slow and

fast conditions are displayed. Significance is declared at a t-threshold of 2.64 as indicated by

the green dashed line. From left to right, the five bars pertain to using mean t-statistics, J1 with

t-maps normalized using a sigmoid function, dividing by the maximum, proposed probabilistic

activation maps with t-statistics modeled using GGM and GMM. J1 based on t-maps detected

the right thalamus and bilateral cerebellar hemispheres, whereas no significance was found with

mean t-statistics. Using GGM detected the left thalamus and right SMA in addition to the ROIs

found with t-maps, thus suggesting potential benefits for incorporating functional connectivity.

Quantitative results comparing J1 derived from RW probabilistic activation maps

and t-maps are summarized in Fig. 2. Only results contrasting J1 of the slow and fast

conditions using a paired t-test are presented due to space limitation. Significance is

declared at a t-threshold of 2.64, corresponding to a p-value of 0.05 with false

discovery rate (FDR) correction. Traditional mean t-statistics were also examined.

Using mean t-statistics, no significant task-related activation changes were detected

in any of the ROIs. In contrast, applying J1 to normalized t-maps (i.e. divided by the

maximum as proposed in [11]) detected the right thalamus and the bilateral cerebellar

hemispheres, thus again demonstrating the importance of incorporating spatial

information. Tracking tasks such as the one employed in this work are known to

activate the cerebello-thalamo-cortical motor pathway [16]. Thus, detecting the right

thalamus and the bilateral cerebellar hemispheres conforms to prior neuroscience

knowledge. Normalizing the t-maps using a sigmoid function detected the same ROIs

but with higher discriminability in the left cerebellum. This increase is likely due to

outlying t-statistics being saturated to 1 by the sigmoid function. Applying J1 to the

proposed probabilistic activation maps with voxel t-statistics modeled using a GGM

detected the left thalamus and the right SMA, in addition to ROIs found with t-maps.

Examining Fig. 1, incorporating functional connectivity resulted in more discriminant

activation patterns, which may explain the additional ROI detections. These results

are consistent with past findings, since the left thalamus and the right SMA are also

part of the cerebello-thalamo-cortical motor pathway [16]. Modeling the voxel t-

statistics using GMM detected significant activation changes in the same ROIs as

those found with GGM, except for a loss of detection in the right cerebellum, which

fell just below the t-threshold. This suggests that using a Gamma distribution is more

suitable for modeling the t-statistics of the active voxels than using a Gaussian

distribution, as was previously observed [6].
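As a hedged sketch of the group-level test described above, the snippet below applies a paired t-test per ROI followed by a Benjamini-Hochberg FDR correction; `j1_slow` and `j1_fast` are hypothetical (n_subjects × n_ROIs) arrays of J1 values, and the exact correction used by the authors (which corresponds to a t-threshold of 2.64 on their data) may differ in detail.

```python
import numpy as np
from scipy.stats import ttest_rel
from statsmodels.stats.multitest import multipletests

def compare_conditions(j1_slow, j1_fast, alpha=0.05):
    """Paired t-test of J1 between conditions for each ROI, FDR-corrected."""
    t_vals, p_vals = ttest_rel(j1_slow, j1_fast, axis=0)   # one test per ROI
    reject, p_corr, _, _ = multipletests(p_vals, alpha=alpha, method='fdr_bh')
    return t_vals, p_corr, reject
```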


5 Conclusions

In this paper, we propose estimating fMRI probabilistic activation maps using RW,

which guarantees global optimality of the posterior activation probability estimates.

The flexibility of the RW formulation enables functional connectivity information to

be easily integrated into the probabilistic activation maps. Applying invariant spatial

features to the RW-based probabilistic activation maps detected significant activation

changes in multiple ROIs known to be involved in tracking tasks, whereas reduced

sensitivity was observed when similar spatial analysis was performed on traditional t-

maps. Our results thus suggest that incorporating additional functional attributes in

the activation estimation procedures may be a promising direction to explore.

References

1. Rogers, B.P., Morgan, V.L., Newton, A.T., Gore, J.C.: Assessing Functional Connectivity in

the Human Brain by fMRI. Magn. Reson. Imaging. 25, 1347-1357 (2007)

2. Lohmann, G., Bohn, S.: Using Replicator Dynamics for Analyzing fMRI Data of the Human

Brain. Trans. Med. Imaging. 21, 485--492 (2002)

3. Descombes, X., Kruggel, F., von Cramon, D.Y.: Spatio-Temporal fMRI Analysis Using

Markov Random Fields. Trans. Med. Imaging. 17, 1028--1039 (1998)

4. Penny, W.D., Trujillo-Barreto, N.J., Friston, K.J.: Bayesian fMRI Time Series Analysis with

Spatial Priors. NeuroImage. 24, 350--362 (2005)

5. Harrison, L.M., Penny, W.D., Ashburner, J., Trujillo-Barreto, N.J., Friston, K.J.: Diffusion-

based Spatial Priors for Imaging. NeuroImage. 38, 677--695 (2007)

6. Woolrich, M.W., et al.: Mixture Models with Adaptive Spatial Regularization for

Segmentation with an Application to fMRI Data. Trans. Med. Imaging. 24(1), 1--11 (2005)

7. Thirion, B., et al.: Dealing with the Shortcomings of Spatial Normalization: Multi-subject

Parcellation of fMRI Datasets. Hum. Brain Mapp. 27, 678--693 (2006)

8. Ng, B., Abugharbieh, R., McKeown, M.J.: Adverse Effects of Template-based Warping on

Spatial fMRI Analysis. In: Proc. SPIE. 7262, 72621Y (2009)

9. Constable, R.T., et al.: Quantifying and Comparing Region-of-Interest Activation Patterns in

Functional Brain MR Imaging: Methodology Considerations. Magn. Reson. Imaging. 16(3)

289--300 (1998)

10. Grady, L.: Multilabel Random Walker Image Segmentation Using Prior Models. In: Proc.

IEEE Comp. Soc. Conf. Comp. Vision Pattern Recog. 1, 763--770 (2005)

11. Ng, B., Abugharbieh, R., Huang, X., McKeown, M.J.: Spatial Characterization of fMRI

Activation Maps Using Invariant 3-D Moment Descriptors. Trans. Med. Imaging. 28(2),

261-268 (2009)

12. Liao, R., Krolik, J.L., McKeown, M.J.: An Information-theoretic Criterion for Intrasubject

Alignment of fMRI Time Series: Motion Corrected Independent Component Analysis.

Trans. Med. Imaging. 24(1), 29--44 (2005)

13. Harel, N., et al.: Origin of Negative Blood Oxygenation Level-Dependent fMRI Signals. J.

Cereb. Blood Flow Metab. 22(8) 908--917 (2002)

14. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer Science+Business

Media, LLC, New York (2007)

15. Coit, D.W., Jin, T.: Gamma Distribution Parameter Estimation for Field Reliability Data

with Missing Failure Times. IIE Trans. 32, 1161--1166 (2000)

16. Miall, R.C., Reckess, G.Z., Imamizu, H.: The Cerebellum Coordinates Eye and Hand

Tracking Movements. Nat. Neurosci. 4, 638--664 (2001)


Integrating Multiple-Study Multiple-Subject fMRI Datasets Using Canonical Correlation Analysis

Indrayana Rustandi1,2, Marcel Adam Just3 and Tom M. Mitchell1,4

1 Computer Science Department, Carnegie Mellon University, USA
2 Center for the Neural Basis of Cognition, Carnegie Mellon University, USA
3 Department of Psychology, Carnegie Mellon University, USA
4 Machine Learning Department, Carnegie Mellon University, USA

Abstract. We present an approach to integrate multiple fMRI datasets in the context of predictive fMRI data analysis. The approach utilizes canonical correlation analysis (CCA) to find common dimensions among the different datasets, and it does not require that the multiple fMRI datasets be spatially normalized. We apply the approach to the task of predicting brain activations for unseen concrete-noun words using multiple-subject datasets from two related fMRI studies. The proposed approach yields better prediction accuracies than those of an approach where each subject’s data is analyzed separately.

1 Introduction

The predictive style of fMRI data analysis, in which we try to predict some quantity of interest based on some fMRI data, has recently become more widespread (for an overview, see [1]). Nonetheless, most of the predictive approaches for fMRI data analysis have been limited in the sense that they can be applied only individually to a particular subject’s data from a particular fMRI study. A few approaches have been proposed to get around this limitation. The most naive approach is to first register the different subjects’ brains to a canonical spatial coordinate frame, then pool all the data together and treat them as coming from one subject in one study. However, this approach ignores the variability that is likely to exist across subjects both within a particular study and also across studies. This can be corrected using, for instance, hierarchical Bayes techniques, as was proposed in [2]. Even then, these approaches assume that the same voxel after spatial normalization behaves similarly across subjects and studies of interest, even though the spatial normalization process is imperfect and variability in activations across subjects can still exist even after spatial normalization.

We present a new approach to integrate multiple-subject multiple-study fMRI data in the context of predictive fMRI data analysis. Unlike the approaches mentioned above, our approach treats the data for each subject as a distinct set, and it does not require that the disparate datasets be in a common normalized space. We apply our approach to the task of predicting fMRI data for new stimuli, similar to the task described in [3].


2 Methods

2.1 Datasets

We use previously analyzed datasets based on two fMRI studies: the WP (Word-Picture) study [3] and the WO (Word-Only) study [4]. In both studies, each participant was presented with stimuli corresponding to sixty concrete-noun words, which can be grouped into twelve semantic categories. In the WP study, each stimulus consisted of a line-drawing picture and the associated word label, e.g. ”house”. In the WO study, each stimulus consisted only of the word label, without the line drawing. In each trial, the stimulus was presented for three seconds followed by a seven-second period of fixation before the next trial started. The participants were instructed to actively think about the properties of the object described by the stimulus. Each participant went through six runs of the experiment during a single session, where each of the sixty words was presented once in each run. Data from nine (WP) and eleven (WO) right-handed adult participants are available for the analysis, including data from three participants who participated in both studies.

Acquisition and Preprocessing Parameters In both studies, fMRI images were acquired on a Siemens Allegra 3.0T scanner using a gradient echo EPI pulse sequence with TR = 1000 ms. The data were processed using the SPM2 software to correct for slice timing, motion, and linear trend, and then temporally filtered using a 190 s cutoff and spatially normalized into MNI space resampled to 3×3×6 mm3 voxels. The percent signal change relative to the fixation condition was computed at each voxel for each stimulus presentation and then a single fMRI mean image was created for each of the 360 trials (60 words × 6 runs) by taking the mean of the images collected 4 s, 5 s, 6 s, and 7 s after stimulus onset, to account for the delay in the hemodynamic response. To reduce the effect of noise in our analysis, for each participant in each study, we analyze the canonical image for each of the sixty words, obtained by averaging the six images associated with the corresponding word across the six presentations/runs.

2.2 Model

We analyze the datasets in the context of the predictive computational model proposed in [3], shown in figure 1. This model assumes that there are intermediate semantic features—denoted as base features in figure 1—that represent the meaning of each stimulus word, and that underlie the brain activations associated with thinking about that stimulus word. For example, one base feature to describe semantics of arbitrary stimulus nouns might be the frequency with which that noun co-occurs with the verb ”eat” in a large collection of text. Given a particular word and its associated base features, the model assumes that each subject’s brain activations can be modeled as linear combinations of the base features. Finding the mapping from the base features to the fMRI brain activations amounts to learning the coefficients (the β coefficients in figure 1) for the


[Fig. 1 diagram: a word w maps to I base features f; for study s and subject t, the activation of voxel v is modeled as Σ_{i=1..I} β^{(s,t)}_{vi} f_{wi}.]

Fig. 1. The baseline predictive computational model from [3] (I base features)

multivariate linear regression problem with the base features (the f variables in figure 1) as covariates and the brain activations in each voxel as responses.

In the baseline model, there exist different mappings from the base features to the brain activations for the different participants. On the other hand, we might expect some similarity among the mappings for the various subjects in one or more related studies. Incorporating this knowledge into the model can potentially give us better predictive ability by leveraging the similar information available across subjects and studies. In addition, it potentially allows us to better quantify the similarities and differences among the various subjects and studies. Yet, we also would like to avoid the restrictions of the existing methods for integrating multiple fMRI datasets mentioned in section 1. With these in mind, we propose an enhancement to the baseline model, which we call the CCA-mult approach.

In the CCA-mult model, shown in figure 2, we introduce a common abstraction for brain activations for the various subjects, denoted as learned common features in figure 2. The learned common features are essentially the shared low-dimensional representation for the various subjects’ brain activations data, and they are learned based on the regularities present in the various subjects’ brain activations data. In particular, as figure 2 shows, we focus on a linear low-dimensional representation of the brain activations data, i.e. a low-dimensional representation such that the brain activations for each subject can be reconstructed as linear combinations of the features in this representation. Now, instead of having a subject-specific direct mapping from the base features to the brain activations, we have a (linear) subject-independent mapping from the base features to the subject-independent learned features, and then subject-specific mappings from the learned common features to the brain activations. The model parameters are obtained using a two-step process. First, the subject-specific β’s


[Fig. 2 diagram: a word w maps to I base features f, which map to J learned common features g_{wj} = Σ_{i=1..I} α_{ji} f_{wi}; the activation of voxel v for study s and subject t is then Σ_{j=1..J} β^{(s,t)}_{vj} g_{wj}.]

Fig. 2. The model for the CCA-mult approach (I base and J learned common features)

are estimated using canonical correlation analysis, described next. Second, the subject-independent α’s are estimated using multivariate linear regression.

Learning the Common Features To learn the common features across the various subjects’ brain activations data, we use canonical correlation analysis (CCA) [5]. The classical CCA is a multivariate statistical technique to discover correlated components across two datasets. More formally, given two datasets represented as matrices X (DX × N) and Y (DY × N), CCA tries to find the vectors wX (DX × 1) and wY (DY × 1) such that the quantities aX = wX^T X and aY = wY^T Y are maximally correlated. Given this formulation, wX and wY can be found as a solution to a generalized eigenvalue problem. The pair aX and aY are called the first canonical variate, while we call wX and wY the pair of loadings for the first canonical component. By deflating the data matrices with respect to canonical variates already found and reapplying the process, we can find subsequent canonical variates. The classical CCA can be extended to handle more than two datasets [6] [7], and to avoid overfitting, we can also regularize the loadings w·’s similar to what is done in ridge regression [8] [7].

Tying back to the CCA-mult model, we apply CCA to the fMRI datasets, each dataset being a matrix with as many rows as voxels and as many columns as instances/trials. We then take the sample mean of the canonical variates over the different datasets as the learned common features. The loadings w·’s define the inverse mappings from brain activations to common features. Notice that there are no restrictions that all the fMRI datasets have to have the same number of voxels, as long as we can match the instances/trials in those datasets, since CCA accepts data matrices with different numbers of rows (corresponding to voxels),


the constraint being that the matrices have to have the same number of columns (corresponding to instances). As a result, spatial normalization is not necessary.
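To make the construction concrete, the following is a simplified two-dataset sketch of CCA solved as a generalized eigenvalue problem with a ridge-style regularization term; the paper uses a regularized multi-set extension [6][7], so this is an illustrative approximation rather than the method used, and all names are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh

def cca(X, Y, n_components=10, reg=0.5):
    """Two-dataset CCA on voxel-by-trial matrices X (D_X x N) and Y (D_Y x N)."""
    X = X - X.mean(axis=1, keepdims=True)
    Y = Y - Y.mean(axis=1, keepdims=True)
    N = X.shape[1]
    Cxx = X @ X.T / N + reg * np.eye(X.shape[0])   # ridge-style regularization
    Cyy = Y @ Y.T / N + reg * np.eye(Y.shape[0])
    Cxy = X @ Y.T / N
    # Generalized eigenproblem: Cxy Cyy^{-1} Cyx w_X = rho^2 Cxx w_X
    M = Cxy @ np.linalg.solve(Cyy, Cxy.T)
    vals, vecs = eigh(M, Cxx)
    order = np.argsort(vals)[::-1][:n_components]
    Wx = vecs[:, order]
    Wy = np.linalg.solve(Cyy, Cxy.T @ Wx)          # corresponding Y loadings (up to scale)
    # Canonical variates; the learned common features are their mean across datasets.
    Ax, Ay = Wx.T @ X, Wy.T @ Y
    common = 0.5 * (Ax / np.linalg.norm(Ax, axis=1, keepdims=True) +
                    Ay / np.linalg.norm(Ay, axis=1, keepdims=True))
    return common, Wx, Wy
```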

Prediction In the baseline model, after learning the mapping from the base features to the brain activations, we can generate predicted brain activations associated with a new word by using the known base features for that word along with the learned multivariate linear regression coefficients. In the CCA-mult model, given the base features for a new word, we first generate a prediction for the common features for that word by linear regression. Given the predicted common features, we then generate the predicted brain activations for each subject by multiplying the predicted common features with the Moore-Penrose pseudoinverse of the loading matrix for that particular subject, obtained by aggregating that subject’s loadings wsubj across all the canonical components. In essence, the pseudoinverses of the loading matrices correspond to the β’s in figure 2.
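A small sketch of this prediction step under stated assumptions: `F_train`, `G_train`, `W_subj`, and `f_new` are hypothetical arrays holding the training base features, the learned common features, one subject's CCA loading matrix, and a held-out word's base features.

```python
import numpy as np
from numpy.linalg import pinv, lstsq

def predict_activation(F_train, G_train, W_subj, f_new):
    """Predict one subject's voxel activations for a new word."""
    # 1) Linear regression from base features (N_train x I) to common features (N_train x J).
    alpha, *_ = lstsq(F_train, G_train, rcond=None)       # (I x J) coefficients
    g_new = f_new @ alpha                                  # predicted common features (J,)
    # 2) Map common features back to voxels with the pseudoinverse of the
    #    subject's loading matrix W_subj (D_subj x J), i.e. the betas of figure 2.
    return pinv(W_subj.T) @ g_new                          # predicted activations (D_subj,)
```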

Parameters We set the number of learned common features to ten (J = 10), using the first ten canonical variates; the optimal number to use still needs to be investigated. We use the regularized multiple-dataset version of CCA similar to that presented in [7], using 0.5 as the regularization coefficient, where the regularization coefficient ranges from 0 to 1.

3 Experiments

3.1 Setup

To compare the performance of the CCA-mult approach with that of the baseline approach, we ran experiments using the following methods:

1. LR The baseline method shown in figure 1.
2. CCA-mult-subj The CCA-mult method applied to all the subjects in a particular study, separately for the WP and the WO studies.
3. CCA-mult-subj-study The CCA-mult method applied to all the subjects from the two studies combined.

Besides integrating multiple datasets, the CCA-mult approach also performs dimensionality reduction. In order to contrast the contribution of the dataset integration and the dimensionality reduction aspects, we also consider a fourth method (PCA) in which we individually run principal component analysis (PCA) on the fMRI data for each subject, and then perform a linear regression from the base features to each subject’s first ten PCA components, to match the number of dimensions of the CCA-mult variations.

Evaluation To evaluate the predictive ability of these four methods, we use the cross-validation (CV) scheme described in [3]. In this scheme, for each CV fold, we hold out two words out of the sixty words used and train the model


using the data associated with the 58 remaining words (60 choose 2, or 1770 total folds). The trained model is then used to generate the predicted fMRI activations for the two held-out words. We compare the predicted activations with the true observed fMRI activations using the cosine similarity metric as described in [3], obtaining a binary accuracy score for each fold indicating whether the model predicts the fMRI images for the two held-out words well enough to distinguish which held-out word is associated with which of the two held-out images. These results are aggregated across all the folds to obtain an accuracy figure.
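The following sketch illustrates this two-held-out-word scoring, assuming a hypothetical `predict(word_idx)` callable that returns the prediction of a model trained without the two held-out words, and `images` holding the observed canonical images; both names are stand-ins.

```python
import numpy as np
from itertools import combinations

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def leave_two_out_accuracy(images, predict, n_words=60):
    """Fraction of folds where predictions match the held-out images better than the swap."""
    correct = 0
    folds = list(combinations(range(n_words), 2))     # C(60, 2) = 1770 folds
    for i, j in folds:
        pi, pj = predict(i), predict(j)               # model assumed trained on the other 58 words
        good = cosine(pi, images[i]) + cosine(pj, images[j])
        swapped = cosine(pi, images[j]) + cosine(pj, images[i])
        correct += good > swapped
    return correct / len(folds)
```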

Base features In [3], a set of base features derived from the statistics of a large text corpus data was used. In particular, they used co-occurrence counts of the stimulus words with a set of 25 verbs as base features, the counts derived from a collection of English Web pages collected by Google. In this paper, we consider the co-occurrence counts with the following sets of words as base features:

– 25 verbs used in [3] (I = 25)
– 1000 most familiar nouns from the MRC psycholinguistic database1 (I = 1000)
– 1000 most familiar nouns, 1000 most familiar verbs, and 814 most familiar adjectives, also from the MRC psycholinguistic database (I = 2814)

Voxel selection and data processing To decrease the effect of noise, we perform our analysis on a subset of the voxels considered relevant: in each CV fold and for each participant in each study, we rank the voxels based on the stability [3] of their activations for the 58 words used to train the model across the six presentations/runs, and choose the 500 voxels with the highest stability.
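As an illustration, here is a sketch of such stability-based selection; the exact stability score of [3] is not reproduced, and the mean pairwise correlation of a voxel's word profile across runs is used as a stand-in (an assumption), with `data` a hypothetical (runs × words × voxels) array.

```python
import numpy as np

def select_stable_voxels(data, n_select=500):
    """Return indices of the n_select voxels with the most stable word profiles."""
    n_runs, n_words, n_voxels = data.shape
    stability = np.empty(n_voxels)
    for v in range(n_voxels):
        profiles = data[:, :, v]                      # (runs x words) profiles for voxel v
        corr = np.corrcoef(profiles)                  # run-by-run correlation matrix
        iu = np.triu_indices(n_runs, k=1)
        stability[v] = corr[iu].mean()                # mean pairwise correlation across runs
    return np.argsort(stability)[::-1][:n_select]
```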

3.2 Results

Figure 3 shows the average accuracies across all the subjects in each study for all the approaches, across the three sets of base features. As the figure shows, for all three sets of base features, both CCA-mult variations give better accuracies compared to the LR and PCA approaches. On the other hand, the differences in accuracies between the two CCA-mult variations are relatively small.

When using the CCA-mult method, we can also look into what kind of semantic information each of the learned common features represents. We focus on the common feature corresponding to the dominant CCA component for the CCA-mult-subj-study variation. In particular, we can see how this common feature is mapped to the entire brain by regressing it to the full-brain activations. Figure 4 shows the results for one of the participants that took part in both studies. In the WP case, we see significant loading magnitude in the fusiform gyri, highlighted by the pink ellipses, while the WO loadings exhibit significant magnitude around the superior parietal cortex, highlighted by the purple ellipses.

1 http://www.psy.uwa.edu.au/mrcdatabase/uwa_mrc.htm


[Fig. 3 bar charts: prediction accuracy (0.5–1.0) for the WP and WO datasets under each base-feature set (25 verbs, MRC nouns, MRC adjs-nouns-verbs), comparing LR, PCA, CCA-mult-subj and CCA-mult-subj-study.]

Fig. 3. Accuracies of the three sets of base features for the WP and WO datasets.

On the other hand, white ellipses denote some of the areas exhibiting similar loading values in both cases, located among others in the left extrastriate, left pars opercularis, and left precentral gyrus.

Next, we can check whether this common feature represents some semantic dimensions by looking at the distribution of its value across all sixty stimulus words. The top five stimulus words with the most positive value for the first common feature—knife, cat, spoon, key, pliers—roughly represent the ”manipulability” concept, while the top five stimuli with the most negative value for the same feature—apartment, church, closet, house, barn—roughly represent the ”shelter” concept, mirroring some of the findings of [4].

4 Conclusion

We have presented the CCA-mult approach to integrate data from multiple subjects and multiple fMRI studies. The CCA-mult approach does not require that the datasets be spatially normalized. Our results show that by using the lower-dimensional feature space discovered by the CCA-mult method, we obtain better accuracies compared to using the baseline approach from [3] and to using the dataset-specific lower-dimensional feature space discovered through PCA.

The experiments reported in this paper support our thesis that it is possible to train more accurate computational models by integrating training data from multiple subjects participating in multiple related fMRI studies, by incorporating latent variables that capture commonalities across subjects, yet still allow the model to estimate parameters that are specific to each participant and study. Given that many fMRI analyses are limited by the sparsity of training data relative to the complexity of the phenomena to be modeled, it is important to develop models like ours that integrate data from multiple subjects and studies. To that end, one specific direction for future research on our model is to remove its current restriction that the different studies have matched trials (e.g., in our case study, each data set must include the same 60 semantic stimuli). We are currently exploring methods that relax this assumption, to enable training a model from data in which different subjects are presented with different stimuli.


Fig. 4. The full-brain loadings for one participant in both the WP study (left) and the WO study (right). White ellipses denote significant loadings present in both cases, while pink ellipses denote significant loadings present in only the WP case, and purple ellipses denote significant loadings present in only the WO case.

5 Acknowledgments

We thank Vlad Cherkassky for preprocessing the fMRI data and Kai-min Chang for referring us to the MRC database. This research was supported by a grant from the Keck Foundation, an NSF CCBI research grant, and also NSF through TeraGrid resources provided by the Pittsburgh Supercomputing Center.

References

1. Haynes, J.D., Rees, G.: Decoding mental states from brain activity in humans. Nature Reviews Neuroscience 7 (2006) 523–534
2. Rustandi, I.: Classifying Multiple-Subject fMRI Data Using the Hierarchical Gaussian Naive Bayes Classifier. In: 13th Conference on Human Brain Mapping. (2007)
3. Mitchell, T.M., et al.: Predicting human brain activity associated with the meanings of nouns. Science 320 (2008) 1191–1195
4. Just, M.A., et al.: A neurosemantic theory of concrete noun representation based on the underlying brain codes. Under review
5. Hotelling, H.: Relations between two sets of variates. Biometrika 28(3/4) (1936) 321–377
6. Kettenring, J.R.: Canonical analysis of several sets of variables. Biometrika 58(3) (1971) 433–451
7. Hardoon, D.R., et al.: Canonical correlation analysis: An overview with application to learning methods. Neural Computation 16 (2004) 2639–2664
8. Vinod, H.D.: Canonical ridge and econometrics of joint production. Journal of Econometrics 4 (1976) 147–166


Exploring the temporal quality of fMRI acquisitions

B. Scherrer1, O. Commowick1, S. K. Warfield1

Computational Radiology Laboratory, Department of Radiology, Children’s Hospital, 300 Longwood Avenue, Boston, MA, 02115, USA

Abstract. Quality assessment of an fMRI acquisition is essential to ensure the reliability of results in fMRI studies. Major approaches focus on detecting artifacts in the signal but do not take into account the design paradigm. In this paper, we focus on block design experiments and propose to explore the information contained in each block. This allows us to identify poor blocks, which either contain too much artifact or in which the subject failed to perform the task. The evaluation was performed using a finger tapping experiment, from both acquisitions with simulated artifacts and pediatric subject acquisitions. Our approach appears to be a promising way to assess fMRI data quality and, in particular, could provide some information about the subject’s cooperation.

1 Introduction

Functional Magnetic Resonance Imaging (fMRI) aims at detecting brain activity by studying localized changes in the blood oxygenation, the so-called BOLD contrast. Measurements suffer from low resolution and a poor signal-to-noise ratio, making this analysis difficult. A large number of fMRI data processing approaches have been proposed in the literature, and focus mainly on improving the statistical analysis method [1, 2] or hemodynamic response function modeling [3]. However, assessing the data quality is essential to ensure the reliability of results. In this direction, there is a growing interest in Quality Assurance (QA) methods to measure the scanner performance stability, and to ensure approximately equal scanner performances in multicenter studies [4]. Other approaches aim at quantifying the reliability of an activation map over several replicates of the same experimental paradigm [5, 6]. Intra-experiment quality assessment is crucial as well. Approaches have been proposed to estimate and correct for specific artifacts such as motion [7], but are not general. Recently, [8] suggested a more general correction technique for signal artifacts independent of origin and form. However, proposed approaches to date focus on analyzing the signal and do not consider the experiment paradigm. As a consequence, they cannot provide information about how well the subject followed the experimental protocol during the scan. This information, referred to as the subject cooperation quality, is essential to assess the quality of a fMRI experiment, specifically in pediatric imaging.


In this paper, we propose a different way to explore the quality of fMRI acquisitions which could inform, at least in part, about both the presence of artifacts and the subject cooperation quality. More precisely, we propose to analyze the relative quality of each block in a block design paradigm. First, an activation map is computed from the information contained in each separate block. Second, an underlying activation map and performance parameters associated with each block are estimated in an Expectation-Maximization (EM) framework using a STAPLE-like approach [9]. These performance parameters provide a quality measure of each block relative to the estimated underlying ”true” activation map. The evaluation was performed using a finger tapping experiment. Healthy cooperative subjects simulated artifacts during the acquisition such as motion or experimental protocol mistakes. Finger tapping acquisitions of real pediatric subjects were also used for the evaluation. We show here that our approach provides interesting detection capabilities, pointing out the relatively poor blocks compared to the complete acquisition, even with a small number of blocks in the design. Such an analysis, which focuses on the information contained in each block and not on the signal quality, then appears as a promising way to assess the quality of fMRI acquisitions.

2 Method

We assume the fMRI time series to be corrected for motion and slice timing. We consider a block design paradigm composed of NB blocks of SB volumes each. The total number of acquired volumes is then T = NB·SB. Let y = (yt, t = 1, . . . , T) be the complete set of volumes acquired during the fMRI experiment, where yt is the acquired volume at time t.

Block extraction and analysis. We extract each block separately from the data. Let xb be the subpart of y representing the block b ∈ [[1, . . . , NB]], defined by xb = (yt, t ∈ [[1 + (b − 1)SB, bSB]]). We are interested in estimating the information contained in xb by calculating an activation map. To reduce the noise variance, we construct from xb a virtual acquisition x̃b made of Nrepeat successive repetitions of xb. However, such a repetition increases the signal and introduces too much correlation, resulting in too many activations during the statistical analysis. We propose to introduce an additive random Gaussian noise of zero mean and variance σn, N(0, σn), weighted by the parameter γ: x̃b = xb + γN(0, σn). Such an additive noise has the effect of decreasing the signal. From x̃b we then compute a thresholded binary activation map. For simplicity we chose to apply the commonly used GLM estimation procedure as implemented in SPM5, including the realignment and smoothing preprocessing. Other more elaborate methods such as [2] could also be used. From this step we obtain a set B = (Bb, b = 1, . . . , NB) of binary activation maps representative of the information contained in each block.
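A minimal sketch of this virtual-acquisition construction is given below, assuming `x_b` is an array with time as its first axis; the exact handling of the repetition and noise in the authors' implementation may differ.

```python
import numpy as np

def virtual_block(x_b, n_repeat=3, gamma=1.0, sigma_n=1.0, rng=None):
    """Repeat one block n_repeat times and add weighted Gaussian noise of variance sigma_n."""
    rng = np.random.default_rng() if rng is None else rng
    x_tilde = np.concatenate([x_b] * n_repeat, axis=0)    # (S_B * n_repeat, ...) virtual series
    noise = rng.normal(0.0, np.sqrt(sigma_n), size=x_tilde.shape)
    return x_tilde + gamma * noise
```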

Estimation of performance parameters for each block. The STAPLE [9] algorithm was initially developed for segmentation purposes, to compute both


an estimate of the reference true segmentation and performance parameters from a set of segmentations of an image. In this work we propose to apply it to fMRI binary activation maps. Briefly, we consider as an input the set of binary activation maps B constructed from the NB blocks. The goal is to estimate 1) the hidden reference activation map T underlying the set B, and 2) performance parameters θ describing the agreement between T and each Bb. Let D be the N × NB binary decision matrix, describing the binary decisions made for each activation map at each of the N voxels. D is defined by Dib = 1 if the voxel i is activated in Bb, and null elsewhere. Let θ = (p, q) be the parameters characterizing each of the NB activation maps, each element of p = (p1, . . . , pB) (resp. q = (q1, . . . , qB)) being a sensitivity (resp. specificity) parameter. They are defined by pb = P(Dib = 1|Ti = 1) and qb = P(Dib = 0|Ti = 0), and also related to type II and type I errors (resp. β = 1 − pb and α = 1 − qb). The performance level parameters θ = (p, q) can then be estimated by maximizing the complete data log-likelihood:

(p, q) = arg max_{p,q} ln p(D, T | p, q).

With the reference activation map T being unknown, we are in the presence of a missing data problem. The EM algorithm is a general two-step iterative estimation technique for such a problem. During the E-Step, it computes the expected value of the complete data log-likelihood knowing the performance parameters θ(k−1) at the previous iteration k − 1. During the M-Step, it estimates the updated performance parameters θ(k) by maximizing the expected complete data log-likelihood (see [9] for more details). The algorithm iterates until convergence, which is based on the relative change of the performance parameters.

The result is an estimate of the underlying activation map T and an estimate of the sensitivity and specificity parameters pb and qb for each block b. To account for the proportion of activated voxels compared to the high number of voxels in the volume, we then compute the positive and negative predictive values (PPV and NPV), defined by PPVb = P(Ti = 1|Dib = 1) and NPVb = P(Ti = 0|Dib = 0). They are related to the sensitivity and the specificity through the relations:

PPVb = pb·π / (pb·π + (1 − qb)(1 − π))   and   NPVb = (1 − π)·qb / ((1 − pb)·π + (1 − π)·qb),

where π is a global prior describing the proportion of active voxels, computed from B by π = (1/NB) Σ_{b=1}^{NB} B̄b, with B̄b the proportion of active voxels in Bb.

Intuitively, PPVb is the proportion of active voxels in Bb which are also active in the reference activation map T. Similarly, NPVb is the proportion of non-active voxels in Bb which are also non-active in T. These two performance parameters then provide a measure of how the activation map constructed from one block characterizes the estimated reference activation map T. They can be used to assess the relative quality of each block compared to the complete acquisition.
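For illustration, the block-level quality scores can be computed directly from the estimated sensitivities, specificities, and the global prior, as in the sketch below (hypothetical inputs; the EM estimation itself is not shown).

```python
import numpy as np

def block_predictive_values(p, q, B):
    """PPV/NPV per block from sensitivities p, specificities q and binary maps B."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    # Global prior: mean proportion of active voxels across the block maps.
    pi = np.mean([b.mean() for b in B])
    ppv = p * pi / (p * pi + (1 - q) * (1 - pi))
    npv = (1 - pi) * q / ((1 - p) * pi + (1 - pi) * q)
    return ppv, npv
```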


Fig. 1. Hand change delay experiment. Fig. (a) shows the ‘left hand’ (in blue) and ‘right hand’ (in red) activation maps generated from each block (B1, B2, B3, B4) for different coronal slices. Fig. (b) shows the corresponding statistical reference activation map (T) computed by STAPLE. Fig. (c) shows the estimated performance parameters PPV and NPV for each block, and points out that the least reliable block is the third one. Fig. (d) shows the activation map computed with the four blocks, while Fig. (e) shows the activation map computed without the third artifacted block.


Fig. 2. Double task experiment. Fig. (a) shows the PPV and NPV performance parameters, and correctly points out the second block as the least reliable one. Fig. (b) and (c) show respectively a coronal and an axial slice of B2, in which activations in the Wernicke (W) and Broca areas are detected. Fig. (d) shows the activation map computed with the three blocks, and Fig. (e) the activation map computed without the second block.

3 Evaluation

For all experiments we used Nrepeat = 3 repetitions to construct x̃b. Experimentally, the additive noise parameters were set to σn = 1 and γ = (1/30)·ΔI^max, where ΔI^max is the difference of the maximum and the minimum intensities computed from all volumes masked with a rough brain extraction. For all fMRI statistical analysis we used a threshold of p < 10−5. The evaluation was performed using a finger tapping experiment. The task paradigm for each block consisted of 30 seconds of left finger tapping, followed by 30 seconds of right finger tapping. Scanning was performed with a 12-channel head coil on a Siemens 3T scanner. Functional data were acquired by an EPI-BOLD sequence (TR = 3s, TE = 30ms, matrix size = 64x64, slice thickness = 3.75 mm, pixel size = 3.25 mm). First, healthy cooperative subjects simulated artifacts or protocol mistakes during the acquisition. Fig. 1 illustrates the results of an acquisition of 4 blocks, in which the subject added a delay of 15 seconds to actually change from left hand to right hand during the third block. It then corresponds to the protocol: [L-R]-[L-R]-[L-L(15s)/R(15s)]-[L-R], where L corresponds to ‘left hand’, R to ‘right hand’, and [L-R] denotes a normal block. The figures are displayed with the neurological orientation convention, so that ‘left hand’ finger tapping shows activations on the right side of the image in the somato-motor cortex. Fig. 1c shows that the smaller positive and negative predictive values for the third block correctly point out that it is the least representative of the experiment. Fig. 1d shows the activation map computed with a classical GLM procedure on the four blocks. Fig. 1e shows that removing the third block gives more satisfying results than using the four blocks.

Fig. 2 shows a three block experiment in which the subject was asked to speak during the first part of the second block. It corresponds to the protocol [L-R]-


Fig. 3. Motion experiment. Fig. (a) shows the PPV and NPV performance parameters, correctly pointing out the third block as the less reliable one, mainly for ‘left hand’. Fig. (b) shows the activation map computed with the four blocks, and Fig. (c) the activation map computed without the third artifacted block.

Fig. 4. Positive and negative predictive values for three non-artifacted blocks.

[L(speaking)-R]-[L-R], and illustrates the case of an additional task correlated with the protocol. The PPV/NPV correctly point out the second block as the less reliable one (smallest left PPV value, see Fig. 2a). Additionally, we qualitatively observe activations in the Wernicke and Broca areas in the activation map generated for this second block (see Fig. 2b and 2c). However, in this case, removing the artifacted block decreases the final result. This is likely due to the very low number of blocks remaining. Additionally, although the PPV of ‘left hand’ is low for the second block, its sensitivity was good (0.90). This indicates that the task was correctly performed, but additional activations were detected. Blocks should not be removed in these cases.

Fig. 1 and 2 illustrate how our approach provides some information about the subject cooperation. Fig. 3 reports results with a motion artifact. The experiment was composed of four blocks, and the subject was asked to move during the first part of the third block. It corresponds to the protocol [L-R]-[L-R]-[L(motion)-R]-[L-R]. We observe that when including the four blocks, the realignment procedure of SPM5 is not reliable enough, resulting in a poor activation map (Fig. 3b). By removing the third block as suggested by the smallest PPV/NPV performance parameters (Fig. 3a), we obtain a more satisfying activation map (Fig. 3c). Finally, Fig. 4 verifies that for three non-artifacted blocks, PPV and NPV values are approximately homogeneous.

We then evaluated the approach on finger tapping experiments obtained from pediatric imaging. Fig. 5 shows an acquisition for which the first block appears as the less reliable one. Removing it and keeping only two blocks appears to provide a better activation map (Fig. 5e). On Fig. 6, the first block is as well


Fig. 5. Pediatric imaging - I. Fig. (a) shows the activation map computed from each of the three blocks (B1, B2, B3), while Fig. (b) shows the estimated underlying activation map computed with STAPLE. Fig. (c) shows the PPV and NPV performance parameters, pointing out the first block as the less reliable one. Fig. (d) shows the activation map computed with the three blocks, while Fig. (e) shows the activation map without the first block.

Fig. 6. Pediatric imaging - II. Fig. (a) shows the PPV and NPV performance parameters, pointing out the first block as the less reliable one. Fig. (b) shows the activation map computed with the three blocks, while Fig. (c) shows the activation map without the first block.

detected as the worst one. GLM analysis with three blocks or without the first one leads to similar activation maps, with a small advantage for the two-block analysis, which does not show activation in the medial part of the brain.

4 Discussion

We present in this paper an original application of the STAPLE approach to explore the intra-experiment quality of fMRI acquisitions. We compare the relative homogeneity of blocks by estimating performance parameters for each block. Despite its relative simplicity, we show that our approach provides interesting detection capabilities to assess the quality of block design fMRI acquisitions. Particularly, motions or perturbations of the finger tapping protocol were successfully detected. Evaluation on pediatric image acquisitions appeared quantitatively consistent as well. All of the evaluations were performed with the same parameters, which appear experimentally satisfying. However, the impact of the threshold and the additive noise parameters introduced in the analysis of each block should be carefully evaluated. Additionally, repeating the blocks and adding noise may appear as ad-hoc and to introduce discontinuities. In practice,


it turned out to give better results than computing an activation map from each single block. Improved analysis of each block should, however, be considered in the future. It is important to note that detected poor blocks are considered poor relative to the complete experiment. Therefore the task has to be correctly performed without too many artifacts during some blocks to provide valuable information. However, we show good results even with 3-block designs. Although the approach may be seen as statistically problematic for so few blocks, we emphasize that the aim here is not to estimate a real activation pattern. Even if the estimated ”true” activation pattern may not have a real biological meaning, we believe that it still provides some information about the relative homogeneity of the blocks. However, the reason for a non-homogeneity (artifacts or subject cooperation) cannot be determined.

Our approach could be extended to event-related designs for cases in which the inter-trial time is sufficiently long for the hemodynamic response to return to the offset level. It could provide information about the relative quality of each event response. An interesting additional refinement would be to estimate the underlying activation map from more ‘soft’ information such as posterior probability maps. Indeed, the use of binary activation maps prevents propagating activation uncertainties during the estimation, and is likely to lose information. The estimation of performance parameters between blocks then appears as a promising way to assess the fMRI data quality. Additionally, this principle could be extended to the second statistical level analysis. Evaluation of performance parameters for each subject in a group may provide valuable information about how normal the activation pattern of an individual is compared to the group.

References

1. Friston, K.J., Holmes, A.P., Worsley, K.J., Poline, J.B., Frith, C.D.: Statistical parametric maps in functional imaging: a general linear approach. Human Brain Mapping 2(4) (1995) 189–210
2. Penny, W.D., Trujillo-Barreto, N.J., Friston, K.J.: Bayesian fMRI time series analysis with spatial priors. Neuroimage 24(2) (2005) 350–362
3. Woolrich, M.W., Behrens, T.E., Smith, S.M.: Constrained linear basis sets for HRF modelling using Variational Bayes. Neuroimage 21(4) (2004) 1748–1761
4. Stocker, T., Schneider, F., Klein, M., Habel, U., Kellermann, T., Zilles, K., Shah, N.J.: Automated quality assurance routines for fMRI data applied to a multicenter study. Human Brain Mapping 25(2) (2005) 237–246
5. Genovese, C.R., Noll, D.C., Eddy, W.F.: Estimating test-retest reliability in functional MR imaging. I: Statistical methodology. Magn Reson Med 38(3) (1997) 497–507
6. Maitra, R.: Assessing certainty of activation or inactivation in test-retest fMRI studies. Neuroimage 47(1) (2009) 88–97
7. Andersson, J.L., Hutton, C., Ashburner, J., Turner, R., Friston, K.: Modeling geometric deformations in EPI time series. Neuroimage 13(5) (2001) 903–919
8. Diedrichsen, J., Shadmehr, R.: Detecting and adjusting for artifacts in fMRI time series data. Neuroimage 27(3) (2005) 624–634
9. Warfield, S.K., Zou, K.H., Wells, W.M.: Simultaneous truth and performance level estimation (STAPLE): an algorithm for the validation of image segmentation. IEEE Trans Med Imaging 23(7) (2004) 903–921


Surface-based versus volume-based fMRI group analysis: a case study

Alan Tucholka1,2,3, Merlin Keller1,2, Jean-Baptiste Poline1, Alexis Roche1, and Bertrand Thirion2

1 Neurospin, I2BM, CEA, F-91191 Gif-sur-Yvette, France
2 INRIA Saclay-Ile-de-France, Parietal, Saclay, France

3 [email protected]

Abstract. Being able to detect reliably functional activity in a population of subjects is crucial in human brain mapping, both for the understanding of cognitive functions in normal subjects and for the analysis of patient data. The usual approach proceeds by normalizing brain volumes to a common volume-based (3D) template. However, a large part of the data acquired in fMRI aims at localizing cortical activity, and methods working on the cortical surface may provide better inter-subject registration than the standard procedures that process the data in 3D. Nevertheless, few assessments of the performance of surface-based (2D) versus 3D procedures have been shown so far, mostly because inter-subject cortical surface maps are not easily obtained. In this paper we present a systematic comparison of 2D versus 3D group-level inference procedures, by using cluster-level and voxel-level statistics assessed by permutation, in random effects (RFX) and mixed-effects analyses (MFX). We find that, using a voxel-level thresholding, and to some extent, cluster-level thresholding, the surface-based approach generally detects more, but smaller active regions than the corresponding volume-based approach for both RFX and MFX procedures, and that surface-based supra-threshold regions are more reproducible by bootstrap.

1 Introduction

Studying the localisation and variability of brain activity across subjects is certainly one of the most important aspects of neuroimaging data analysis [1]. The detection and the precise localisation of the BOLD signal are therefore crucial. Clearly, these steps are interacting, as the detection of activity across subjects requires first to coregister the subjects’ brains to a common coordinate system. This step is most commonly performed in 3D space, by applying linear and non-linear warpings such that the anatomical and the functional images are coregistered to a common template, often chosen to be the MNI template. The standard approach to activation detection [2] consists in comparing the images from the different subjects on a voxel-by-voxel basis, computing a statistical map to test the presence of an activation in each voxel of the standard space. The ensuing multiple testing problem can be addressed directly at the voxel-level or by testing the presence of activity inside clusters defined above a user-chosen threshold [3].


Spatial normalisation is therefore crucial to the whole procedure. Because a very large part of the functional information originates from the cortex, methods that work on the cortical surface may be more sensitive than those using the 3D data. It is well known that volume-based normalization may introduce inaccuracies in anatomical positioning of functional data, the magnitude of which may be estimated as 1 cm in several cortical regions [4]. Several studies have shown that a coregistration based on the cortical surface may better align the functional signal across subjects [5,6,7,8]. For instance, it is difficult to account for the inter-subject variability of gyri size, shape or position in a 3D referential, and such differences may displace functional activity to a different gyrus.

In this paper, we investigate whether surface-based approaches, that rely on a cortical surface referential, provide better constraints on the position of functional activity, and more precisely, whether this is reflected in state-of-the-art inter-subject statistical procedures. It is important to note that by working on the cortical surface we do not take into account the sub-cortical structures. Following [5], we perform functional analysis on the cortical surface for a group of 25 subjects. The inter-subject analysis relies on matching the subjects’ cortical surfaces [5]. Additionally, we systematically compare the 2D and 3D statistical analysis and provide results on the difference in sensitivity of the two approaches for different tests, for a given control of the type I error. More specifically, we use for the comparison mixed- and random-effects inference at the voxel and at the cluster level [9] and assess their bootstrap reproducibility. These statistical analyses provide the cognitive neuroscientist or clinician with solid information on the sensitivity that can be achieved with surface-based methods.

2 Materials and Method

2.1 Data and pre-processing

Data were acquired from 25 subjects who performed a broad cognitive assessment protocol (named localizer) as described in [10]. This protocol is intended to activate multiple brain regions in a very short time (5 minutes, 128 volumes) with many experimental conditions, allowing the computation of several functional contrasts. Anatomical and functional data were acquired on a 1.5T GE scanner. The functional data were first corrected for EPI distortions using field maps. Next, standard pre-processing (correction of differences in slice timing, motion correction and anatomo-functional co-registration) was performed on all subjects using the SPM5 software.

FreeSurfer [5] was used to segment and reconstruct the cortical surface from the T1 MRI data of each subject, providing the white matter mesh for both hemispheres (note that the aim of this work is not to evaluate the quality of this segmentation). This provides a common spherical coordinate system for each hemisphere in each subject. Pre-processing of the data includes i) segmentation of the white matter on a triangular mesh, ii) detection of the deepest sulci, iii) inflation of the white surface onto a sphere (reference sphere), iv) deformation to match the deepest sulci positions on the template model.


All data are then converted to the standard GIFTI format for further processing (to obtain a node-by-node correspondence of the resampled brains): i) a regular sphere (icosphere) of diameter equal to the reference sphere is created, ii) this sphere is refolded onto the original cortical surface of each subject while preserving node-to-node correspondence of the icosphere mesh between subjects. The resulting gray/white interface mesh is called the resampled mesh.

In Fig. 1, two meshes are presented after affine coregistration of the corresponding T1 images to the MNI template. The cross, which marks the top of the posterior Sylvian Fissure on the left subject, falls at a very different anatomical position on the right subject. The surface-based correspondence (blue balls) is clearly much more accurate.

Fig. 1. Definition of the top of the posterior Sylvian Fissure in two brains in a normalized space (MNI space, affine coregistration). The cross corresponds to the same voxel coordinates. The blue balls correspond to the same node after brain resampling.

An average brain of the 25 subjects was created for visualisation of the results. Each resampled mesh is coregistered to a normalized space (MNI/Talairach space), then an average brain is obtained by computing the mean 3D position of each node across all subjects in that space.
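As an illustration, the averaging step reduces to a per-node mean; the sketch below is not the authors' code and assumes the resampled meshes are available as NumPy arrays of node coordinates (one (n_nodes, 3) array per subject, already expressed in the normalized space).

```python
# Minimal sketch: average-brain mesh as the per-node mean of coregistered
# resampled meshes. `meshes` is assumed to be a list of (n_nodes, 3) arrays
# with node-to-node correspondence across subjects.
import numpy as np

def average_brain(meshes):
    coords = np.stack(meshes, axis=0)      # (n_subjects, n_nodes, 3)
    return coords.mean(axis=0)             # mean 3D position of each node
```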

Functional images were then projected onto the resampled gray/white interface mesh of each subject using the method described in [11]. A General Linear Model (GLM) analysis was applied at each node and activation maps were obtained for the (i) left versus right button presses, (ii) sentence listening versus sentence reading, (iii) computation versus reading and (iv) reading versus passive checkerboard viewing contrasts, which we refer to as motor, auditory, computation and reading.

2.2 Group data modeling

In this work, n = 25 subjects are considered. For each subject i, and at any node of the resampled mesh, let β̂_i be the estimate of the BOLD effect related to some effect of interest (note that this section generalizes easily to any linear combination, i.e. contrast, of effects). β̂_i is distributed around the true effect β_i: β̂_i = β_i + e_i with e_i ~ N(0, s_i²), where the estimation variance s_i² is known from the first-level General Linear Model (GLM). We assume that β_i = β_G + ε_i, where ε_i ~ N(0, σ²) and β_G is the population-level effect. We thus have:

\[
\hat{\beta}_i = \beta_G + \varepsilon'_i, \qquad \varepsilon'_i \sim \mathcal{N}(0,\, \sigma^2 + s_i^2), \qquad (1)
\]


where σ² is the between-subject variance. This is a generalization of the RFX model in [2], which treats all subjects identically: s_i is supposed independent of i, s_i ≡ s, and one considers a pooled variance σ̃² ≡ σ² + s². Both β_G and σ² are then estimated by maximizing the log-likelihood of the model specified in Eq. (1) using the Expectation-Maximization (EM) algorithm of [12]. The following log-likelihood ratios are computed to test the positivity of β_G:

\[
L_{MFX} = \log \frac{\sup_{\sigma^2,\, \beta_G > 0} \prod_{i=1}^{n} \mathcal{N}(\hat{\beta}_i;\, \beta_G,\, \sigma^2 + s_i^2)}{\sup_{\sigma^2} \prod_{i=1}^{n} \mathcal{N}(\hat{\beta}_i;\, 0,\, \sigma^2 + s_i^2)}, \qquad (2)
\]

\[
L_{RFX} = \log \frac{\sup_{\sigma^2,\, \beta_G > 0} \prod_{i=1}^{n} \mathcal{N}(\hat{\beta}_i;\, \beta_G,\, \sigma^2)}{\sup_{\sigma^2} \prod_{i=1}^{n} \mathcal{N}(\hat{\beta}_i;\, 0,\, \sigma^2)}. \qquad (3)
\]

Note that computing L_RFX is equivalent to performing a one-sample t-test.
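For concreteness, here is a minimal numerical sketch of these two statistics at a single node. It is not the authors' implementation of [12]: it uses a simple projected EM step for the constrained maximization in Eq. (2), the RFX statistic is computed directly as a one-sample t statistic, and the function names and simulated inputs are placeholders.

```python
# Minimal sketch (not the authors' code) of the MFX and RFX statistics of
# Eqs. (2)-(3) at a single node, using numpy/scipy. The EM update for the
# constrained alternative simply projects beta_G onto [0, +inf).
import numpy as np
from scipy import stats


def loglik(beta_hat, s2, beta_g, sigma2):
    """Marginal log-likelihood of Eq. (1): beta_hat_i ~ N(beta_g, sigma2 + s2_i)."""
    var = sigma2 + s2
    return np.sum(-0.5 * (np.log(2 * np.pi * var) + (beta_hat - beta_g) ** 2 / var))


def em_mfx(beta_hat, s2, fix_beta_g=None, n_iter=200):
    """EM estimation of (beta_G, sigma2); beta_G is held fixed under the null."""
    beta_g = fix_beta_g if fix_beta_g is not None else max(beta_hat.mean(), 0.0)
    sigma2 = beta_hat.var() + 1e-8
    for _ in range(n_iter):
        # E-step: posterior of the true effects beta_i given current parameters
        post_var = 1.0 / (1.0 / s2 + 1.0 / sigma2)
        post_mean = post_var * (beta_hat / s2 + beta_g / sigma2)
        # M-step (projected onto beta_G >= 0 when estimating the alternative)
        if fix_beta_g is None:
            beta_g = max(post_mean.mean(), 0.0)
        sigma2 = np.mean((post_mean - beta_g) ** 2 + post_var)
    return beta_g, sigma2


def mfx_llr(beta_hat, s2):
    """Log-likelihood ratio of Eq. (2)."""
    bg, v1 = em_mfx(beta_hat, s2)
    _, v0 = em_mfx(beta_hat, s2, fix_beta_g=0.0)
    return loglik(beta_hat, s2, bg, v1) - loglik(beta_hat, s2, 0.0, v0)


def rfx_stat(beta_hat):
    """RFX decision statistic: equivalent to a one-sample t-test (Eq. 3)."""
    return stats.ttest_1samp(beta_hat, 0.0).statistic


rng = np.random.default_rng(0)
beta_hat = rng.normal(0.5, 1.0, size=25)     # simulated first-level effects
s2 = rng.uniform(0.2, 0.8, size=25)          # simulated first-level variances
print(mfx_llr(beta_hat, s2), rfx_stat(beta_hat))
```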

2.3 Statistical calibration

The distribution of the statistics in Eqs. (2-3) under the null hypothesis (β_G = 0) is unknown, but it can be estimated very simply by a randomization procedure in which the statistics are recomputed after a sign swap of the observed effects β̂_i. Under the hypothesis that the distribution of the true effects is symmetric about 0 under the null hypothesis, this procedure yields an exact (possibly conservative) specificity for the test. In order to control the family-wise error rate (FWER), i.e. the probability of detecting at least one false positive over the search domain, we consider the distribution of the maximal statistic under the null hypothesis. For a chosen FWER α, this yields a voxel- or vertex-level corrected threshold.
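The sign-swap calibration of the corrected threshold can be sketched as follows. This is not the authors' code; the `statistic` argument (here a plain one-sample t statistic) and the array shapes are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of sign-swap calibration of the
# voxel-/vertex-level FWER-corrected threshold; `statistic` maps an
# (n_subjects, n_nodes) effect array to one value per node.
import numpy as np


def t_stat(x):
    return x.mean(0) / (x.std(0, ddof=1) / np.sqrt(x.shape[0]))


def fwer_threshold(effects, statistic=t_stat, n_perm=10000, alpha=0.05, seed=0):
    """(1 - alpha) quantile of the permutation distribution of the maximum."""
    rng = np.random.default_rng(seed)
    n_subjects = effects.shape[0]
    max_stats = np.empty(n_perm)
    for p in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=(n_subjects, 1))
        max_stats[p] = statistic(signs * effects).max()
    return np.quantile(max_stats, 1.0 - alpha)


rng = np.random.default_rng(1)
effects = rng.normal(size=(25, 5000))               # simulated group effects
corrected_thr = fwer_threshold(effects, n_perm=1000)
detected = t_stat(effects) > corrected_thr          # voxel-/vertex-level detections
```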

A more sensitive approach to detect extended regions consists in first thresholding the statistical map at a given level (corresponding e.g. to p < 10−3 uncorrected), and then estimating the distribution of the size (area or volume) of the supra-threshold clusters under the null hypothesis. To solve the multiple comparison issue, the size of the maximal cluster is tabulated under the null hypothesis. Once again, the (1 − α) quantile of this simulated distribution yields a cluster-level corrected threshold.
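In the volume-based case the same sign-swap machinery can calibrate a cluster-size threshold; the sketch below (again not from the paper) uses scipy.ndimage.label for connected components and assumes a `statistic` function over a 4D stack of subject effect maps, such as a t statistic. On a mesh, connected components of supra-threshold nodes play the same role.

```python
# Minimal sketch (not from the paper) of cluster-size calibration in the
# volume-based case: connected components via scipy.ndimage.label, sign-swap
# randomization, and tabulation of the maximal supra-threshold cluster size.
import numpy as np
from scipy import ndimage


def max_cluster_size(stat_map, cluster_forming_thr):
    labels, n_labels = ndimage.label(stat_map > cluster_forming_thr)
    return 0 if n_labels == 0 else np.bincount(labels.ravel())[1:].max()


def cluster_size_threshold(effects, statistic, cluster_forming_thr,
                           n_perm=1000, alpha=0.05, seed=0):
    """effects: (n_subjects, x, y, z) array; statistic returns a 3D map."""
    rng = np.random.default_rng(seed)
    n_subjects = effects.shape[0]
    sizes = np.empty(n_perm)
    for p in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=(n_subjects, 1, 1, 1))
        sizes[p] = max_cluster_size(statistic(signs * effects),
                                    cluster_forming_thr)
    return np.quantile(sizes, 1.0 - alpha)
```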

In order to enable the comparison of volume-based and surface-based approaches, we systematically project the clusters obtained from the volume-based approach onto the average surface.

3 Results

To compare the surface-based versus the volume-based group-level analysis, we used four different methods (voxel-level RFX, voxel-level MFX, cluster-level RFX and cluster-level MFX) on the four functional contrasts. The voxel-level analysis finds only a few active voxels with the volume-based method, while the activity map on the surface contains a significant number of active nodes in more brain regions (see e.g. the regions activated for the computation task in the left part of Fig. 2, in particular the bottom part of the pre-central gyrus).

Fig. 2. Left: Surface-based (top line) versus volume-based (bottom line) voxel-level RFX group analysis results for the computation task. Right: Cluster-level RFX group analysis for the computation task on the surface (top line) and in the volume (bottom line).

Contrast   Computation   Reading      Left Motor   Auditory
Method     RFX    MFX    RFX   MFX    RFX   MFX    RFX   MFX
Surface    14     17     6     7      8     6      4     4
Volume     5      0      4     0      4     0      2     2

Table 1. Comparison of the number of clusters found using RFX and MFX methods with the volume-based and surface-based approaches.

The same tendency can be observed with the cluster-level analysis (see Table 1 and Fig. 2), although the effect is more subtle: the surface-based procedures detected many more functional clusters than the volume-based methods, but these clusters are much smaller. This indicates that:

– Many clusters that are merged in 3D are split into different components on the surface (see Fig. 2); this means that several different functional regions are merged into one cluster in volume-based analyses. While these regions are close in Euclidean distance (in the volume space), they are distant once projected onto the surface (e.g. on a different gyrus).

– The cluster-size threshold is much lower for the surface-based approach than for the volume-based approach; it is therefore likely that 3D clusters include more false positives than 2D clusters.

The MFX statistic never detects any activated voxels or clusters in the volume, while the results on the surface are similar to, though weaker than, those obtained with the RFX statistic. The reduced sensitivity with mixed effects is unexpected and may hint at a first-level model mis-specification, hence a defect of the model design. The results in terms of activated area are given in Table 2; the surface-based approach detects wider regions for the computation task, and smaller ones for the reading task, when considering RFX. It is generally more sensitive when using MFX.

Cluster-level
Contrast   Computation     Reading        Left Motor     Auditory
Method     RFX     MFX     RFX    MFX     RFX    MFX     RFX    MFX
Surface    131.6   135.1   26.8   28.9    65.4   63.8    96.6   96.7
Volume     105.1   0       97.6   0       45.3   0       99     109.1

Voxel-level
Contrast   Computation     Reading        Left Motor     Auditory
Method     RFX     MFX     RFX    MFX     RFX    MFX     RFX    MFX
Surface    17.1    15.4    0.5    0.4     23.9   22.7    46.5   46.9
Volume     10.8    0       15     0       15.3   0       46.9   0

Table 2. Comparison of the area (in cm²) of supra-threshold regions for the volume-based (after projection onto the surface) and surface-based approaches, for MFX and RFX statistics, with both cluster-level and voxel-level inference.

Additionally, we studied the bootstrap reproducibility of the supra-threshold regions in the surface- and volume-based procedures: we created P = 1000 surrogate groups and computed how many times a vertex or a voxel had a p-value below 10−3, uncorrected. We systematically found a higher proportion of consistently supra-threshold regions with the surface-based method, as illustrated in Fig. 3. This effect was quantified using the classification index based on this histogram [13], which was consistently higher for the surface-based approach (data not shown).
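A minimal sketch of such a reproducibility map follows; it is not the authors' exact procedure, and it assumes the group data are available as an (n_subjects, n_nodes) array of first-level effects with a plain one-sided RFX t-test as the node-wise statistic.

```python
# Minimal sketch (not the authors' code) of the bootstrap reproducibility map:
# resample subjects with replacement, recompute a one-sided RFX t-test at each
# node, and count how often each node falls below the uncorrected threshold.
import numpy as np
from scipy import stats


def reproducibility_map(effects, n_boot=1000, p_thr=1e-3, seed=0):
    """effects: (n_subjects, n_nodes) array of first-level contrast estimates."""
    rng = np.random.default_rng(seed)
    n_subjects, n_nodes = effects.shape
    counts = np.zeros(n_nodes)
    for _ in range(n_boot):
        sample = effects[rng.integers(0, n_subjects, size=n_subjects)]
        t = sample.mean(0) / (sample.std(0, ddof=1) / np.sqrt(n_subjects))
        p = stats.t.sf(t, df=n_subjects - 1)       # one-sided p-values
        counts += p < p_thr
    return counts / n_boot                          # supra-threshold frequency
```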

Fig. 3. Bootstrap reproducibility of the RFX thresholding procedures for the motor contrast: normalized histogram of the reproducibility map, which indicates how frequently a voxel or vertex has supra-threshold activity, with a threshold corresponding to p < 10−3, uncorrected. The surface-based approach has a much higher proportion of consistently supra-threshold regions.

4 Discussion

Fig. 4. The activity on the pre-central gyrus has also been projected onto the neighbouring gyrus (post-central gyrus).

We have shown in this work that surface-based approaches detect activity in more brain regions than the volume-based approach when voxel-level detection is performed (voxel-level RFX, voxel-level MFX). This gain in sensitivity is probably related to two factors:

– The search domain is smaller (i.e. it consists of the cortex only), while the volume-based approach also tests white matter and cerebro-spinal fluid regions. This effect is visible in the bootstrap reproducibility histogram shown in Fig. 3. The price to pay is that sub-cortical structures (thalamus, basal ganglia, cerebellum, etc.) are not considered in the surface-based representation.

– The coregistration of the data is certainly more accurate, as illustrated in Fig. 1 with an anatomical landmark. Note that this better coregistration could have the effect of reducing the spread of significant regions by reducing the spatial uncertainty on their position.

When considering cluster-level statistics, the gain is more subtle, as discussed in the previous section. More precisely, it can be expected that the finer coregistration afforded by the surface-based approach has a more nuanced effect on the detection of large clusters: volume-based approaches tend to join smaller clusters that are close in Euclidean space but relatively distant in the geodesic (surface-based) representation. It should also be recalled that cluster-level analyses allow only a weak control of false detections: a supra-threshold cluster contains at least one voxel for which the null hypothesis can be rejected, but no conclusion can be drawn about all the voxels or nodes in that cluster.

Moreover, when considering tasks with asymmetric activity, such as reading, the volume-based approach can miss significant activity in a weakly-activated hemisphere, compared to the surface-based approach (data not shown).

However, in surface-based approaches, if the resolution is not fine enough, or if EPI distortions are not perfectly corrected, the functional signal may spill over onto neighbouring gyri, creating spurious additional clusters (as can be seen in Fig. 4). An interesting open question is how a surface coregistration procedure based on geometric features, such as FreeSurfer's, would compare to representations of the surface that take sulcus labelling into account. Some recent experiments have indeed shown that sulcus-based coordinate systems tend to stabilize the position of some functional landmarks [14].

Conclusion Performing fMRI group analysis on the cortical surface instead of the brain volume may benefit the detection of some foci of activity, and thus yield more specific, and possibly more sensitive, analyses than standard volume-based approaches. It may thus reveal sharper contrasts in the functional data and provide more reliable markers of brain functional anatomy. It will also be important to assess whether surface-based analysis better discriminates populations of controls and patients than standard volume-based methods.

References

1. Andrade, A., Kherif, F., Mangin, J.F., Worsley, K., Paradis, A.L., Simon, O., Dehaene, S., Poline, J.B.: Detection of fMRI activation using cortical surface mapping. Hum. Brain Mapp. 12 (2001) 79–93

2. Friston, K.: Statistical parametric mapping. In Frackowiak, R., Friston, K., Frith, C., Dolan, R., Mazziotta, J., eds.: Human Brain Function. Academic Press USA (1997)

3. Hayasaka, S., Nichols, T.: Validating cluster size inference: random field and permutation methods. NeuroImage 20(4) (2003) 2343–2356

4. Stiers, P., Peeters, R., Lagae, L., Hecke, P.V., Sunaert, S.: Mapping multiple visual areas in the human brain with a short fMRI sequence. NeuroImage 29(1) (Jan 2006) 74–89

5. Fischl, B., Sereno, M.I., Dale, A.M.: Cortical surface-based analysis. II: Inflation, flattening, and a surface-based coordinate system. NeuroImage 9(2) (Feb 1999) 195–207

6. Argall, B.D., Saad, Z.S., Beauchamp, M.S.: Simplified intersubject averaging on the cortical surface using SUMA. Hum Brain Mapp 27(1) (Jan 2006) 14–27

7. Desai, R., Liebenthal, E., Possing, E.T., Waldron, E., Binder, J.R.: Volumetric vs. surface-based alignment for localization of auditory cortex activation. NeuroImage 26(4) (2005) 1019–1029

8. Anticevic, A., Dierker, D.L., Gillespie, S.K., Repovs, G., Csernansky, J.G., Essen, D.C.V., Barch, D.M.: Comparing surface-based and volume-based analyses of functional neuroimaging data in patients with schizophrenia. NeuroImage 41(3) (2008) 835–848

9. Meriaux, S., Roche, A., Dehaene-Lambertz, G., Thirion, B., Poline, J.B.: Combined permutation test and mixed-effect model for group average analysis in fMRI. Hum Brain Mapp 27(5) (May 2006) 402–410

10. Pinel, P., Thirion, B., Meriaux, S., Jobert, A., Serres, J., Bihan, D.L., Poline, J.B., Dehaene, S.: Fast reproducible identification and large-scale databasing of individual functional cognitive networks. BMC Neurosci 8 (2007) 91

11. Operto, G., Bulot, R., et al.: Projection of fMRI data onto the cortical surface using anatomically-informed convolution kernels. NeuroImage 39(1) (Jan 2008) 127–135

12. Keller, M., Roche, A.: Increased sensitivity in fMRI group analysis using mixed-effect modeling. ISBI 2008 (May 2008) 548–551

13. Thirion, B., Pinel, P., Meriaux, S., Roche, A., Dehaene, S., Poline, J.B.: Analysis of a large fMRI cohort: Statistical and methodological issues for group analyses. NeuroImage 35(1) (2007) 105–120

14. Tucholka, A., Thirion, B., Pinel, P., Poline, J.B., Mangin, J.F.: Triangulating cortical functional networks with anatomical landmarks. ISBI 2008 (May 2008) 612–615


CanICA: Model-based extraction of reproducible group-level ICA patterns from fMRI time series

G. Varoquaux1, S. Sadaghiani2, J.B. Poline2, B. Thirion1

1 INRIA, Saclay-Île de France, Saclay, France, 2 CEA/Neurospin, Saclay, France

Abstract. Spatial Independent Component Analysis (ICA) is an increasingly used data-driven method to analyze functional Magnetic Resonance Imaging (fMRI) data. To date, it has been used to extract meaningful patterns without prior information. However, ICA is not robust to mild data variation and remains a parameter-sensitive algorithm. The validity of the extracted patterns is hard to establish, as is the significance of differences between patterns extracted from different groups of subjects. We start from a generative model of the fMRI group data to introduce a probabilistic ICA pattern-extraction algorithm, called CanICA (Canonical ICA). Thanks to an explicit noise model and canonical correlation analysis, our method is auto-calibrated and identifies the group-reproducible data subspace before performing ICA. We compare our method to state-of-the-art multi-subject fMRI ICA methods and show that the features extracted are more reproducible.

1 Introduction

Resting-state fMRI is expected to give insight into the intrinsic structure of the brain and its networks. In addition, such protocols can easily be applied to impaired subjects and can thus yield useful biomarkers for understanding the mechanisms of brain diseases and for diagnosis. Spatial ICA has been the most successful method for identifying meaningful patterns in resting-state fMRI data without prior knowledge. The use of the resulting patterns is widespread in cognitive neuroscience, as they are usually well-contrasted, separate different underlying physiological, physical, and cognitive processes, and bring to light relevant long-range cognitive networks.

However, validation of the resulting individual patterns suffers from the lack of testable hypotheses. As a result, cognitive studies seldom rely on automatic analysis, and relevant ICA maps are cherry-picked by eye to separate them from noise-induced patterns. Probabilistic ICA models have been used to provide pattern-level noise-rejection criteria [1] or a likelihood for the model [2], but have yet to provide adequate auto-calibration and pattern-significance testing. The lack of reproducibility is detrimental to group analysis: various, often non-overlapping, patterns have been published [3, 4] and the statistical frameworks for comparison or inference on ICA maps have to be further developed.

Funding from the INRIA-INSERM collaboration. The fMRI dataset was acquired in the context of the SPONTACT ANR project.


To allow for group analysis, it is important to extract from a group of subjects the ICA maps that are well-represented in the group. Various strategies have been adopted: group ICA [5] concatenates individual time series; tensor ICA [6] estimates ICA maps across subjects with different loadings per subject; NEDICA [4] merges ICA maps by hierarchical clustering.

In this paper, we present a novel model and method, which we dub CanICA, to extract only the reproducible ICA maps from group data. The strength of this method lies in the elimination of the components that are not reproducible across subjects, using model-informed statistical testing and canonical correlation analysis (CCA). We compare the reproducibility of features extracted by our method to features extracted using tensor ICA and group ICA, but could not compare to NEDICA since it in effect suppresses between-subject variability.

2 Methods

2.1 Generative model: from group-level patterns to observations

At the group level, we describe intrinsic brain activity by a set of spatial patterns B corresponding to networks common to the group. We give a generative model to account for inter-subject variability and observation noise.

The activity recorded for each subject s can be described by a set of subject-specific spatial patterns P_s, which are a combination of the group-level patterns B with loadings given by Λ_s, plus additional subject variability denoted as a residual matrix R_s. If we write the spatial patterns B, P_s and R_s as n_patterns × n_voxels matrices, then for each subject s, P_s = Λ_s B + R_s. In other words, at the group level, considering the group of patterns (vertically concatenated matrices) P = {P_s}, R = {R_s}, and Λ = {Λ_s}, s = 1 . . . S,

P = Λ B + R. (1)

For paradigm-free acquisitions, there is no specific time course set by an external stimulus or task, and at each acquisition-frame time point a mixture of different processes, described by different patterns, is observed. The observed fMRI data is a mixture of these patterns confounded by observation noise: let Y_s be the resulting spatial images in the BOLD MRI sequence for subject s (an n_frames × n_voxels matrix), E_s the observation noise, and W_s a loading matrix such that:

Y_s = W_s P_s + E_s. (2)
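The two-level structure of Eqs. (1)-(2) can be made concrete with a toy simulation. The sketch below is purely illustrative and is not the authors' code; the array sizes and noise levels are arbitrary assumptions, much smaller than the real data.

```python
# Toy simulation (not the authors' code) of the generative model:
# group patterns B, subject patterns P_s = Lambda_s B + R_s (Eq. 1),
# observed time series Y_s = W_s P_s + E_s (Eq. 2). Sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_patterns, n_voxels, n_frames = 12, 10, 2000, 200

B = rng.normal(size=(n_patterns, n_voxels))               # group-level patterns
subject_data = []
for s in range(n_subjects):
    Lambda_s = rng.normal(size=(n_patterns, n_patterns))  # subject loadings
    R_s = 0.3 * rng.normal(size=(n_patterns, n_voxels))   # subject variability
    P_s = Lambda_s @ B + R_s                               # Eq. (1), one subject
    W_s = rng.normal(size=(n_frames, n_patterns))          # temporal loadings
    E_s = rng.normal(size=(n_frames, n_voxels))            # observation noise
    subject_data.append(W_s @ P_s + E_s)                   # Eq. (2)
```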

2.2 Estimating group-reproducible patterns from fMRI data

Noise rejection using the generative model. Starting from the fMRI image sequences {Y_s, s = 1...S}, slice-timing interpolated and registered to the MNI152 template, we separate reproducible patterns from noise by estimating successively each step of the above hierarchical model.


First, we separate observation noise E_s from subject-specific patterns P_s (Eq. 2) through principal component analysis (PCA). The principal components explaining most of the variance for a given subject's data set form the patterns of interest, while the tail of the spectrum is considered as observation noise. Using a singular value decomposition (SVD), Y_s = U_s Σ_s V_s^T. The first n right singular vectors (the leading rows of V_s^T) constitute the "whitened" patterns P_s that we retain, and the residual constitutes the observation noise: E_s = Y_s − (U_s Σ_s V_s^T)_{1...n}, where the subscript denotes truncation to the first n components.

As PCA can be interpreted as latent-variable separation under the assumption of normally-distributed individual random variates, we model observation noise as normally distributed. Following [7], we set the number n of significant PCA components retained by drawing a sample null-hypothesis dataset using a random normal matrix and comparing the bootstrap stability of PCA patterns for the measured signal and the null-hypothesis sample. Unlike the information-based criteria used in previous methods [1, 5] for order selection, such as approximations of model evidence or BIC, the selected number of significant components does not increase when noise sources are artificially added [7], and thus does not diverge with long fMRI time series. This is important to avoid extracting group-reproducible patterns from observation noise.
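A minimal sketch of the per-subject PCA step follows; it is not the authors' code, and the selection of n itself (bootstrap comparison against a random normal matrix, following [7]) is omitted and assumed given.

```python
# Minimal sketch (not the authors' code) of the per-subject PCA step, which
# splits Y_s into n retained "whitened" patterns P_s and a residual E_s.
import numpy as np


def pca_noise_split(Y_s, n):
    """Y_s: (n_frames, n_voxels) subject time series; returns (P_s, E_s)."""
    U, S, Vt = np.linalg.svd(Y_s, full_matrices=False)
    P_s = Vt[:n]                                   # whitened subject patterns
    E_s = Y_s - (U[:, :n] * S[:n]) @ Vt[:n]        # residual = observation noise
    return P_s, E_s
```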

To identify a stable-component subspace across subjects, i.e. to estimate Eq. (1), we use a generalization of canonical correlation analysis (CCA). CCA is used to identify common subspaces between two different datasets. While there is no unique generalization to multiple datasets, an SVD of the whitened and concatenated datasets can be used, and it is equivalent to standard CCA in the two-dataset case [8]. Given P = {P_s}, the SVD yields P = Υ Z Θ^T, where Θ^T forms the canonical variables and Z the canonical correlations, which provide a measure of between-subject reproducibility. The estimate of the inter-subject reproducible components B is given by the vectors of Θ^T for which the corresponding canonical correlation in Z is above a significance threshold. Λ is identified as the corresponding loading vectors of Υ Z. For a given number of selected components, this estimator minimizes the sum of squares of the residual R = P − Λ B.

The significance threshold on the canonical correlations is set by sampling a bootstrap distribution of the maximum canonical correlation using E_s, the subject observation noise identified previously, instead of P_s. Selected canonical variables have a probability p < 0.05 of being generated by the noise.
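A minimal sketch of this multi-set CCA step follows; it is not the authors' implementation. It treats the singular values of the vertically concatenated whitened patterns as the canonical correlations, and the `noise_patterns` argument and the simple row-resampling bootstrap used to calibrate the threshold are assumptions.

```python
# Minimal sketch (not the authors' code) of the multi-set CCA step: an SVD of
# the vertically concatenated whitened subject patterns, with the selection
# threshold calibrated on noise-derived surrogates.
import numpy as np


def group_cca(subject_patterns, noise_patterns, n_boot=100, alpha=0.05, seed=0):
    """subject_patterns, noise_patterns: lists of (n, n_voxels) arrays."""
    P = np.vstack(subject_patterns)
    Upsilon, Z, Theta_t = np.linalg.svd(P, full_matrices=False)
    # Null distribution of the maximal "canonical correlation" (largest
    # singular value) obtained from resampled noise patterns.
    rng = np.random.default_rng(seed)
    max_null = np.empty(n_boot)
    for b in range(n_boot):
        resampled = [E[rng.integers(0, E.shape[0], size=E.shape[0])]
                     for E in noise_patterns]
        max_null[b] = np.linalg.svd(np.vstack(resampled), compute_uv=False)[0]
    threshold = np.quantile(max_null, 1.0 - alpha)
    keep = Z > threshold
    B = Theta_t[keep]                       # reproducible group-level patterns
    Lambda = Upsilon[:, keep] * Z[keep]     # corresponding loadings
    return B, Lambda
```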

Identifying independent features in the group-level patterns. The selected group-level components B form reproducible patterns of task-free activation, but they represent a mixture of various processes and are difficult to interpret for lack of a distinguishable shape standing out. We perform source separation using spatial ICA on this subspace. From the patterns B we estimate a mixing matrix M and group-level independent components A using the FastICA algorithm [9]: B = M A. FastICA is an optimization algorithm which successively extracts patterns with maximally non-Gaussian marginal distributions. This separation corresponds to identifying the maximum-contrast patterns in the subspace of interest. These spatial patterns correspond to minimally-dependent processes and contain identifiable physiological, physical, or neuronal components.
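If scikit-learn is available, the source-separation step can be sketched as follows; this is a stand-in, not the authors' implementation, and it assumes B is the (n_components, n_voxels) reproducible subspace produced by the previous step.

```python
# Minimal sketch (not the authors' implementation) of spatial ICA on the
# reproducible subspace B, using scikit-learn's FastICA.
import numpy as np
from sklearn.decomposition import FastICA


def separate_sources(B, seed=0):
    """Estimate B ~ M A with spatially independent rows in A."""
    ica = FastICA(random_state=seed)
    # scikit-learn treats rows as observations, so voxels are the samples here
    # and each reproducible pattern is one mixed signal.
    A = ica.fit_transform(B.T).T     # (n_components, n_voxels) independent maps
    M = ica.mixing_                  # mixing matrix such that B ~ M A (+ mean)
    return M, A
```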

Consistent with the FastICA model, the main regions forming the nodes of the functional networks within the resulting patterns are the regions whose values fall in the non-Gaussian tails of the histogram. Following [10], we model the non-interesting voxels as normally distributed and estimate the null distribution from the central part of the histogram. The voxels of interest are selected using an uncorrected p-value of 10−3. We use a specificity criterion rather than a false discovery rate as it yields more stable results, especially on the very-long-tailed distributions that we encounter. As the total number of voxels in the brain is about 40 000, we expect no more than 40 false positives, which corresponds to a small false-discovery rate.
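A crude stand-in for this empirical-null thresholding of a single ICA map is sketched below; it is not the authors' code, the central-quantile width is an arbitrary assumption, and the simple truncated-sample Gaussian fit ignores the truncation bias that a full implementation of [10] would correct for.

```python
# Crude sketch (not the authors' code) of empirical-null thresholding of one
# ICA map: fit a Gaussian to the central part of the value histogram and keep
# voxels whose uncorrected two-sided p-value is below the threshold.
import numpy as np
from scipy import stats


def empirical_null_threshold(ica_map, p_thr=1e-3, central_quantile=0.8):
    values = ica_map.ravel()
    lo, hi = np.quantile(values, [(1 - central_quantile) / 2,
                                  (1 + central_quantile) / 2])
    core = values[(values >= lo) & (values <= hi)]      # center of the histogram
    mu, sigma = core.mean(), core.std()                 # Gaussian null parameters
    p = 2 * stats.norm.sf(np.abs(values - mu) / sigma)  # two-sided p-values
    return (p < p_thr).reshape(ica_map.shape)           # selected voxels
```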

2.3 Model validation for inter-subject generalization

The validation criteria for an ICA decomposition are unclear, as this algorithm is not based on a testable hypothesis. The use of ICA is motivated by the fact that the patterns extracted from the fMRI data display meaningful features in relation to our knowledge of functional neuroanatomy. These features should be comparable between subjects, and thus generalize to new subjects.

To test the reproducibility of the results across subjects, we split our group of subjects in two and learn ICA maps from each sub-group: this yields A_1 and A_2. We compare the overlap of thresholded maps and reorder one set to match maps by maximum overlap. Reproducibility can be quantified by studying the cross-correlation matrix C = A_1^T A_2. For unit-normed components, C_{i,j} is 1 if and only if (A_1)_i and (A_2)_j are identical.

We define two measures of overall stability and reproducibility of the maps. First, a measure of the overlap of the subspaces selected in both groups is given by the energy of the matrix: E = tr(C^T C). To compare this quantity for different subspace sizes, we normalize it by the minimum dimension of the subspaces, d = min(rank A_1, rank A_2): e = (1/d) tr(C^T C). e quantifies the reproducibility of the subspace spanned by the maps. For e = 1, the two groups of maps span the same subspace, although individual independent components may differ. Second, we use an overall measure of reproducibility for the maps: the normalized trace of the reordered cross-correlation matrix C, t = (1/d) tr(C). Indeed, after A_2 has been reordered to maximize matching with A_1, the diagonal coefficients of C give the overlap between matched components. Finally, the maximum value of each row and column of C expresses the best match of each component learned in one group on the set learned in the other. We plot its histogram. This indicator accounts for components of one group matching multiple components of the other.
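These measures are straightforward to compute. The sketch below is not the authors' code: it assumes maps are stored as (n_components, n_voxels) arrays with unit-normed rows, and it uses the Hungarian algorithm on absolute correlations as one possible reading of "reorder by maximum overlap" (absolute values also handle the sign ambiguity of ICA).

```python
# Minimal sketch (not the authors' code) of the half-split reproducibility
# measures e and t from the cross-correlation matrix of two sets of ICA maps.
import numpy as np
from scipy.optimize import linear_sum_assignment


def reproducibility_measures(A1, A2):
    A1 = A1 / np.linalg.norm(A1, axis=1, keepdims=True)
    A2 = A2 / np.linalg.norm(A2, axis=1, keepdims=True)
    C = A1 @ A2.T                                    # cross-correlation matrix
    d = min(A1.shape[0], A2.shape[0])
    e = np.trace(C.T @ C) / d                        # subspace-overlap measure
    rows, cols = linear_sum_assignment(-np.abs(C))   # component matching
    t = np.abs(C[rows, cols]).sum() / d              # reordered, normalized trace
    return e, t
```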

3 Results

3.1 Extracted group-level patterns of interest

Twelve healthy volunteers were scanned after giving informed consent. 820 EPI volumes were acquired every TR = 1.5 s at a 3 mm isotropic resolution, during a rest period of 20 minutes. CanICA identified 50 non-observation-noise principal components at the subject level (Eq. 2) and a subspace of 42 reproducible patterns at the group level (Eq. 1), which matches the numbers commonly hand-selected by users in current ICA studies. On these long sequences, model-evidence-based methods such as those used in [1] select more than 300 components. Extracted maps can be classified by eye into neuronal components, cerebro-spinal fluid (CSF) induced fluctuations, and movement-related patterns (Fig. 1). The empirical-null-based thresholding yields the best results for neuronal components (see Fig. 2). An interesting side result of these maps is that measurement artifacts such as movement or CSF noise form reproducible patterns across subjects.

Fig. 1. ICA maps: (a) a neuronal component (default mode network), (b) a ventricular component, (c) and (d) physiological noise and motion components.

3.2 Inter-group reproducibility

We performed 38 analyses on paired groups of 6 different subjects. Out of the 76 groups, 2 yielded 20 stable components, 19 yielded 21 components, 36 yielded 22 components, 17 yielded 23 components, and the last 2 yielded 24 components. We compare our method with tensor ICA [6], running the analysis using the MELODIC software on each group, and with group ICA, using the GIFT ICA toolbox (http://icatb.sourceforge.net/). To avoid bias from the selected-subspace dimension, we run the tensor ICA and group ICA analyses specifying 23 components.

We perform the cross-correlation analysis on the non-thresholded ICA maps, but also use each implementation's thresholding algorithm to separate the features of interest. For group ICA, the thresholding is done on the t-statistic maps generated by the algorithm. As these maps have low amplitude, thresholding at |t| > 3 leaves very few selected voxels; we use |t| > 2, which yields the same number of selected voxels as the two other methods.

On non-thresholded maps, CanICA and group ICA perform similarly, whereas tensor ICA selects a slightly less stable subspace and thus yields less reproducible ICA maps (see Fig. 3 and Tab. 1). Thresholding the maps does not significantly change the subspace stability (e) and map reproducibility (t) for CanICA, but decreases performance for tensor ICA and drastically affects stability and reproducibility for group ICA.


Fig. 2. (a) A neuronal ICA map and (b) its histogram. The null distribution, shown in red on the histogram, is estimated from the center of the histogram. (c) Corresponding thresholded map; only voxels with p < 5·10−4 (uncorrected, two-sided test) are kept.

Fig. 3. (a) Typical cross-correlation matrices between non-thresholded ICA maps learned on two half-splits of the total 12 subjects, for group ICA, tensor ICA, and CanICA. (b) Histogram of the maximal overlap (component match) for each component from one half-split to a component from the other half. (c) and (d) Corresponding cross-correlation matrices and histogram for thresholded ICA maps.

           Non-thresholded maps                    Thresholded maps
    Group ICA    Tensor ICA   CanICA       Group ICA     Tensor ICA   CanICA
e   0.58 (0.04)  0.47 (0.06)  0.55 (0.05)  0.03 (0.004)  0.31 (0.03)  0.52 (0.05)
t   0.53 (0.04)  0.36 (0.03)  0.53 (0.05)  0.10 (0.01)   0.35 (0.02)  0.53 (0.04)

Table 1. Reproducibility measures e and t for group ICA, tensor ICA and CanICA, calculated on the half-split cross-correlation matrices, both for non-thresholded and thresholded maps. Numbers in parentheses give the standard deviation of the mean.


4 Discussion

Importance of capturing inter-subject variability. The close correspondence between the overlap of the selected subspaces (e) and the independent-component matching quality (t) suggests that identifying the reproducible signal is a key step towards identifying stable independent components. Our method explains only a small fraction of the total signal variance (less than 50% for all 12 subjects). This fraction is selected both on noise-rejection criteria, with well-specified noise models, and to best take into account inter-subject variability, through CCA. Indeed, CCA selects linear combinations of subject patterns that have the highest canonical correlations. As the individual subject components are whitened, each one can contribute no more than 1 to the canonical correlation. Thus high canonical correlations ensure representation of a large fraction of the group. In addition, the algorithm minimizes the sum of squares of the total subject-variability residual R. The stability of the subspace selected by the GIFT implementation of group ICA can be explained for similar reasons.

We believe that subject variability is not accounted for as well by tensor ICA because the variability noise is estimated during the tensorial ICA step, for which statistical significance is hard to establish. The combination of subject-specific components present in the final estimated independent components is not guaranteed to reflect multiple-subject contributions, and in practice the corresponding subject-loading vectors are often unbalanced across subjects.

Residual ICA-pattern instability. ICA extracts maps by successive optimizations of linear combinations to maximize negentropy. It is very flexible because, unlike PCA, it does not require orthogonality of the corresponding time courses. However, even with the careful noise reduction performed by CanICA, small signal perturbations can lead to different patterns. As an example, CanICA run on two eleven-subject groups, differing only by one subject, can yield patterns in which the plausible neuronal activation clusters differ (Fig. 4). One should thus be careful when inferring cognitive networks using ICA and, when possible, do significance testing using other criteria, such as seed-voxel correlation analysis.

Fig. 4. Two ICA patterns corresponding to Fig. 2, estimated from two sub-groups of 11 subjects with 10 common subjects. On the left, (a), an activation cluster in the left superior parietal lobe is clearly visible; whereas on the right, (b), the corresponding cluster does not stand out from noise.


5 Conclusion

We have presented a novel blind pattern-extraction algorithm for fMRI data. Our method, CanICA, is auto-calibrated and extracts the significant patterns according to a noise model. From these patterns, reproducible and meaningful features could be extracted. An important aspect of our method, which is specifically designed to perform group analysis, is that the selected features are more reproducible than with other group-level ICA methods, because it identifies a significantly-reproducible signal subspace and extracts localized features with a criterion consistent with the ICA algorithm.

CanICA is numerically efficient, as it relies solely on well-optimized linear algebra routines and performs the ICA optimization loop on a small number of components. Performance is important to scale to long fMRI time series, high-resolution data, or large groups. In addition, as the group-level pattern extraction (CCA and ICA) is very fast (a few minutes for our data on a 2 GHz Intel Core Duo), cross-validation is feasible. ICA is an unstable algorithm with no intrinsic significance testing, but we have shown that cross-validation can be used to establish the validity of group-level maps.

References

1. Beckmann, C.F., Smith, S.M.: Probabilistic independent component analysis for functional magnetic resonance imaging. IEEE Trans Med Imaging 23(2) (2004) 137–152

2. Guo, Y., Pagnoni, G.: A unified framework for group independent component analysis for multi-subject fMRI data. NeuroImage 42(3) (2008) 1078–1093

3. Damoiseaux, J.S., Rombouts, S.A.R.B., Barkhof, F., Scheltens, P., Stam, C.J., Smith, S.M., Beckmann, C.F.: Consistent resting-state networks across healthy subjects. Proc Natl Acad Sci U S A 103(37) (2006) 13848–13853

4. Perlbarg, V., Marrelec, G., Doyon, J., Pelegrini-Issac, M., Lehericy, S., Benali, H.: NEDICA: Detection of group functional networks in fMRI using spatial independent component analysis. In: Proc. ISBI. (2008) 1247–1250

5. Calhoun, V.D., Adali, T., Pearlson, G.D., Pekar, J.J.: A method for making group inferences from functional MRI data using independent component analysis. Hum Brain Mapp 14(3) (2001) 140–151

6. Beckmann, C.F., Smith, S.M.: Tensorial extensions of independent component analysis for multisubject FMRI analysis. NeuroImage 25(1) (2005) 294–311

7. Mei, L., Figl, M., Rueckert, D., Darzi, A., Edwards, P.: Statistical shape modelling: How many modes should be retained? CVPRW (2008) 1–8

8. Kettenring, J.R.: Canonical analysis of several sets of variables. Biometrika 58(3) (1971) 433–451

9. Hyvärinen, A., Oja, E.: Independent component analysis: algorithms and applications. Neural Networks 13(4-5) (2000) 411–430

10. Schwartzman, A., Dougherty, R.F., Lee, J., Ghahremani, D., Taylor, J.E.: Empirical null and false discovery rate analysis in neuroimaging. NeuroImage 44(1) (2009) 71–82



Author index

Author                   Paper index(es) & page reference
Deepti Bathula           [Paper I – pp. 5]
Lawrence Staib           [Paper I – pp. 5]
Hemant Tagare            [Paper I – pp. 5]
Xenophon Papademetris    [Paper I – pp. 5]
Robert Schultz           [Paper I – pp. 5]
James Duncan             [Paper I – pp. 5]
Gregory Operto           [Paper II – pp. 13]
Bernard Fertil           [Paper II – pp. 13]
Remy Bulot               [Paper II – pp. 13]
Olivier Coulon           [Paper II – pp. 13]
Jian Cheng               [Paper III – pp. 21]
Feng Shi                 [Paper III – pp. 21]
Kun Wang                 [Paper III – pp. 21]
Ming Song                [Paper III – pp. 21]
Jiefeng Jiang            [Paper III – pp. 21]
Lijuan Xu                [Paper III – pp. 21]
Tianzi Jiang             [Paper III – pp. 21]
Vincent Michel           [Paper IV – pp. 29]
Evelyn Eger              [Paper IV – pp. 29]
Christine Keribin        [Paper IV – pp. 29]
Bertrand Thirion         [Papers IV/VIII/IX – pp. 29/61/69]
Bernard Ng               [Paper V – pp. 37]
Rafeef Abugharbieh       [Paper V – pp. 37]
Ghassan Hamarnech        [Paper V – pp. 37]
Martin McKeown           [Paper V – pp. 37]
Indrayana Rustandi       [Paper VI – pp. 45]
Marcel Just              [Paper VI – pp. 45]
Tom Mitchell             [Paper VI – pp. 45]
Benoît Scherrer          [Paper VII – pp. 53]
Olivier Commowick        [Paper VII – pp. 53]
S.K. Warfield            [Paper VII – pp. 53]
Alan Tucholka            [Paper VIII – pp. 61]
Merlin Keller            [Paper VIII – pp. 61]
Jean-Baptiste Poline     [Papers VIII/IX – pp. 61/69]
Alexis Roche             [Paper VIII – pp. 61]
Gael Varoquaux           [Paper IX – pp. 69]
Sepideh Sadaghiani       [Paper IX – pp. 69]