Introduction to classifiers for multivariate decoding of fMRI data
Evelyn Eger
MMN 15/12/08
Two directions of inference
1) Forward modelling: Psychological variable → Data (p-value)
2) Decoding: Data → Psychological variable (prediction accuracy)
Two directions of inference
Inverse inference (decoding) is of special interest e.g. for brain–computer interfaces, automated diagnosis, etc.
In other cases the two are in principle interchangeable: both demonstrate a statistical dependency between experimental variable and data
In many paradigms applying decoding to fMRI, the direction of inference is not central for the interpretation (see Haynes & Rees, 2006; Kriegeskorte & Bandettini, 2007 for reviews)
Efficient, powerful methods based on decoding exist for pattern-based (multivariate) applications
Univariate versus multivariate
Univariate analysis:
effects are analysed for a single dependent variable, e.g., t-test, F-test, ANOVA
Special case: “mass-univariate” analysis in brain imaging, where we test effects in a large number of voxels treated as independent
Multivariate analysis:
effects are analysed for multiple dependent variables, e.g., Hotelling's T-square test, Wilks' Lambda, MANOVA
Why go multivariate in brain imaging?
Discrimination can be improved with higher dimensions
Significance of individual voxels not required
[Figure: response distributions for stimulus conditions 1 and 2 in one vs. two voxel dimensions; adapted from Haynes et al., 2006]
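The point can be made concrete with a toy example (hypothetical values, not from the talk): two voxels whose responses overlap heavily between conditions when viewed one at a time, but whose joint pattern discriminates every trial.

```python
import numpy as np

# Hypothetical responses of two voxels under two stimulus conditions.
cond1 = np.array([[1., 2.], [2., 3.], [3., 4.]])   # voxel2 = voxel1 + 1
cond2 = np.array([[2., 1.], [3., 2.], [4., 3.]])   # voxel2 = voxel1 - 1

# Univariate view: the single-voxel response ranges overlap.
overlap_v1 = cond1[:, 0].max() > cond2[:, 0].min()   # True

# Multivariate view: the contrast w = (-1, 1), i.e. voxel2 minus voxel1,
# discriminates every trial perfectly.
w = np.array([-1., 1.])
proj1, proj2 = cond1 @ w, cond2 @ w                  # [1 1 1] vs [-1 -1 -1]
print(overlap_v1, proj1, proj2)
```

Neither voxel is individually significant here, yet the two-dimensional pattern separates the conditions without error.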
Linear classification (in 2D space)
[Figure: data points in voxel 1 × voxel 2 space, separated by a line with weight vector w and offset b]
Set of points xi with labels yi ∈ {1, −1}
separated by a hyperplane y = wTx + b
so that yi(wTxi + b) ≥ 1
For N dimensions, the separating hyperplane has N − 1 dimensions
Linear classification (in 2D space)
[Figure: new data points in voxel 1 × voxel 2 space with the previously learned decision boundary]
New data are projected onto the previously learned hyperplane
Assignment to classes yi ∈ {1, −1} → prediction accuracy
Which hyperplane to choose?
Difference between means: w ∝ m2 − m1
Corresponding to a classifier based on Euclidean distance / correlation
[Figure: class means m1 and m2 with w along their difference]
Examples: difference between means
[Figure: from Haxby et al., 2001]
Used to demonstrate distinct multi-voxel activity patterns for object categories in ventral visual cortex (Haxby et al., 2001), and in other recent studies of object representation, e.g., position tolerance (Schwarzlose et al., 2008) and perceived shape similarity (Op de Beeck et al., 2008)
Difference between means: w ∝ m2 − m1
Corresponding to a classifier based on Euclidean distance / correlation
Does not take variances/covariances into account
[Figure: class means m1 and m2 with w along their difference]
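The difference-between-means classifier above can be sketched in a few lines (toy data are hypothetical): the weight vector is m2 − m1 and the decision boundary sits halfway between the class means, which is equivalent to nearest-mean (Euclidean-distance) assignment.

```python
import numpy as np

# Minimal sketch of the difference-between-means classifier: w = m2 - m1,
# with the boundary at the midpoint between the class means.
def train_mean_classifier(X1, X2):
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    w = m2 - m1
    b = -w @ (m1 + m2) / 2          # boundary at the midpoint of the means
    return w, b

def predict(w, b, X):
    return np.where(X @ w + b > 0, 1, -1)   # class 2 -> +1, class 1 -> -1

# Hypothetical toy patterns (rows = trials, columns = voxels):
X1 = np.array([[0., 0.], [1., 0.], [0., 1.]])   # class 1
X2 = np.array([[3., 3.], [4., 3.], [3., 4.]])   # class 2
w, b = train_mean_classifier(X1, X2)
print(predict(w, b, X1), predict(w, b, X2))     # [-1 -1 -1] [1 1 1]
```

Note that the covariance of the data never enters: that is exactly the limitation the next slide addresses.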
Fisher's linear discriminant
w ∝ S⁻¹(m2 − m1)
S: covariance matrix
Distance measure: Mahalanobis distance
[Figure: class means m1 and m2 with the discriminant direction w]
Examples: Fisher's linear discriminant
Decoding of conscious and unconscious stimulus orientation from early visual cortex activity (Haynes & Rees, 2005)
Discrimination of individual faces in anterior inferotemporal cortex (Kriegeskorte et al., 2007)
From Haynes & Rees, 2006 review
From Kriegeskorte et al., 2007
Fisher's linear discriminant
w ∝ S⁻¹(m2 − m1)
S: covariance matrix
Distance measure: Mahalanobis distance
Curse of dimensionality: S is not invertible when the dimensionality exceeds the number of data points
[Figure: class means m1 and m2 with the discriminant direction w]
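A minimal sketch of Fisher's linear discriminant as defined above, on hypothetical toy data. A pseudo-inverse stands in for S⁻¹ so the sketch still runs in the singular (curse-of-dimensionality) case; in practice a regularised (shrinkage) estimate of S is the more principled fix.

```python
import numpy as np

# Fisher's linear discriminant: w ~ S^-1 (m2 - m1), S = pooled covariance.
def fisher_discriminant(X1, X2):
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S = (np.cov(X1, rowvar=False) + np.cov(X2, rowvar=False)) / 2
    w = np.linalg.pinv(S) @ (m2 - m1)   # pinv: also works when S is singular
    b = -w @ (m1 + m2) / 2
    return w, b

# Hypothetical toy data with correlated voxel noise:
rng = np.random.default_rng(0)
noise = rng.normal(size=(20, 2)) @ np.array([[1., .8], [0., .6]])
X1, X2 = noise[:10], noise[10:] + np.array([4., 2.])
w, b = fisher_discriminant(X1, X2)
acc = np.mean(np.concatenate([X1 @ w + b < 0, X2 @ w + b > 0]))
print(acc)
```

Because w is whitened by S⁻¹, the resulting decision rule corresponds to Mahalanobis rather than Euclidean distance to the class means.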
Support vector machines
“Hard-margin” classifier:
w: weighted linear combination of support vectors, minimising ||w||²/2 subject to yi(wTxi + b) ≥ 1, i = 1 : N
[Figure: maximum-margin hyperplane; the circled points on the margins are the support vectors]
Support vector machines
“Soft-margin” classifier:
w: weighted linear combination of support vectors, minimising ||w||²/2 + C∑ξi subject to yi(wTxi + b) ≥ 1 − ξi, i = 1 : N, ξi ≥ 0
C: regularisation parameter (trade-off between largest margin and fewest misclassifications)
[Figure: soft-margin hyperplane with slack variables ξ for points inside the margin, and support vectors]
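The soft-margin objective above can be rewritten as regularised hinge loss and minimised directly; the sketch below does this with plain full-batch subgradient descent on hypothetical toy data. It is illustrative only — the studies cited in this talk used dedicated packages such as SVM-light, not a toy solver like this.

```python
import numpy as np

# Subgradient descent on the soft-margin SVM objective
#   ||w||^2 / 2 + C * sum_i max(0, 1 - y_i (w.x_i + b)).
def linear_svm(X, y, C=1.0, lr=0.01, epochs=500):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1                 # trials with slack xi_i > 0
        grad_w = w - C * (y[viol][:, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

# Hypothetical toy problem: two separable clusters of voxel patterns.
X = np.array([[0., 0.], [1., 0.], [0., 1.], [3., 3.], [4., 3.], [3., 4.]])
y = np.array([-1, -1, -1, 1, 1, 1])
w, b = linear_svm(X, y)
print(np.sign(X @ w + b))                          # ideally matches y
```

Raising C penalises slack more heavily (fewer misclassifications, smaller margin); lowering it favours a larger margin at the cost of more margin violations.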
Examples: SVM
Decoding of attended orientation and motion direction from early visual cortex activity (Kamitani & Tong, 2005, 2006)
From Kamitani & Tong, 2005
Support vector machines
Non-linear classifier:
Use of non-linear kernel functions
Potential for overfitting, especially when few training examples are available
Hardly used in fMRI
[Figure: non-linearly separable data with a curved decision boundary and support vectors]
Comparison of classifier performance
From Cox & Savoy, 2003
Analysis work flow
1) ROI definition
2) Data extraction
3) Training
4) Test
[Figure: workflow schematic showing ROI definition, data extraction for conditions 1 and 2, pattern classifier training (object discrimination, same size) and test (size generalisation, 1 step)]
ROI definition – voxel selection
Regions of interest have to be defined by an orthogonal contrast (e.g., in an object exemplar discrimination experiment: an LOC localiser session, all stimuli vs. baseline, etc.)
If a further voxel selection is performed based on the contrast of interest, it has to use training data only to avoid bias
Other criteria for voxel selection (e.g., “reproducibility” of voxelwise responses to different conditions in separate sessions, Grill-Spector et al., 2006, Nat Neurosci) can also be biased
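The training-data-only rule can be sketched as follows; both helpers are illustrative stand-ins (a simple mean-difference ranking for selection, a nearest-mean classifier for prediction), not any particular study's method.

```python
import numpy as np

# Bias-free voxel selection: the selection statistic uses TRAINING data only.
def select_voxels(X_train, y_train, k):
    """Pick the k voxels with the largest absolute mean difference
    between conditions, computed on training data only."""
    d = np.abs(X_train[y_train == 1].mean(0) - X_train[y_train == -1].mean(0))
    return np.argsort(d)[-k:]

def fold_accuracy(X_train, y_train, X_test, y_test, k=10):
    vox = select_voxels(X_train, y_train, k)     # test data never touched here
    m1 = X_train[y_train == -1][:, vox].mean(0)
    m2 = X_train[y_train == 1][:, vox].mean(0)
    w = m2 - m1
    pred = np.where(X_test[:, vox] @ w > w @ (m1 + m2) / 2, 1, -1)
    return np.mean(pred == y_test)
```

Selecting voxels on the full data set (training plus test) before this step would leak the contrast of interest into the test and inflate accuracy above chance even for pure noise.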
Data extraction
Which data to use for classification?
No general rule; different studies have used beta images or raw EPI images
Ideally, as many images as possible for optimal classification performance
In typical neuroimaging studies there is a trade-off between the number of images and their individual signal-to-noise ratio
Fewer but less noisy images are sometimes preferable (when using SVM)
Crossvalidation (training – test)
Classifier performance always has to be tested on independent data
Split-half crossvalidation (often used in studies employing correlation): one half of the data for training, the other for test
Leave-one-out crossvalidation (common with other classifiers): e.g., all but one session for training, the remaining session for test
Leave-one-out crossvalidation
[Figure: N-fold leave-one-out scheme: for blocks 1 : N, all but one pattern per condition (conditions 1 and 2) train the SVM pattern classifier, and the left-out patterns are used for test]
Crossvalidation (training – test)
Classifier performance always has to be tested on independent data
Split-half crossvalidation (often used in studies employing correlation): one half of the data for training, the other for test
Leave-one-out crossvalidation (common with other classifiers): e.g., all but one session for training, the remaining session for test
Importantly, “leave-one-out” should mean leaving one image of each condition out (i.e., all of one session), to avoid biases due to session effects and unequal prior probabilities (with SVM)
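The leave-one-session-out scheme described above can be sketched in a short loop: in each fold the classifier is trained on all sessions but one and tested on all patterns of the held-out session (one image per condition), so session effects cannot leak into training and the classes stay balanced. The nearest-mean classifier here is an illustrative stand-in for any classifier, e.g. an SVM.

```python
import numpy as np

# Stand-in classifier: nearest mean on the training data.
def nearest_mean(X_tr, y_tr, X_te):
    m1, m2 = X_tr[y_tr == -1].mean(0), X_tr[y_tr == 1].mean(0)
    w = m2 - m1
    return np.where(X_te @ w > w @ (m1 + m2) / 2, 1, -1)

# Leave-one-session-out crossvalidation: one fold per session, the whole
# held-out session (one image of each condition) serves as test set.
def leave_one_session_out(X, y, session, classify=nearest_mean):
    accs = []
    for s in np.unique(session):
        train, test = session != s, session == s
        pred = classify(X[train], y[train], X[test])
        accs.append(np.mean(pred == y[test]))
    return np.mean(accs)
```

Because every fold holds out exactly one image per condition, the training set always contains equal numbers of both classes, which matters for classifiers sensitive to unequal prior probabilities (such as SVM).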
Implementations
General SVM implementations exist in different languages:
Matlab: SVM toolbox (University of Southampton, UK) http://www.isis.ecs.soton.ac.uk/resources/svminfo
SVM toolbox (TU Graz, Austria) http://ida.first.fraunhofer.de/~anton/software.html
C: SVM-light http://svmlight.joachims.org
Python or R
Multi-Voxel Pattern Analysis (MVPA) toolbox for fMRI data, developed at Princeton University (beta version, Matlab and Python) http://www.csbmb.princeton.edu/mvpa
Appendix: Distance measures
Given an m-by-n data matrix X, treated as m (1-by-n) row vectors x1, x2, ..., xm, the various distances between the vectors xr and xs are defined as:
Euclidean distance:
d²rs = (xr − xs)(xr − xs)′
Standardised Euclidean distance:
d²rs = (xr − xs)D⁻¹(xr − xs)′
D: diagonal matrix with diagonal elements given by the variance of each variable over the m objects
Mahalanobis distance:
d²rs = (xr − xs)S⁻¹(xr − xs)′
S: sample covariance matrix
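The three squared distances can be written directly from the definitions above, for two rows xr, xs of a data matrix X:

```python
import numpy as np

# Squared Euclidean distance: d'd
def euclidean_sq(xr, xs):
    d = xr - xs
    return d @ d

# Squared standardised Euclidean distance: d D^-1 d',
# D = diagonal matrix of per-variable variances over the m rows of X.
def std_euclidean_sq(xr, xs, X):
    d = xr - xs
    return d @ (d / X.var(axis=0, ddof=1))

# Squared Mahalanobis distance: d S^-1 d', S = sample covariance matrix of X.
def mahalanobis_sq(xr, xs, X):
    d = xr - xs
    return d @ np.linalg.inv(np.cov(X, rowvar=False)) @ d
```

When the sample covariance S is diagonal, the standardised Euclidean and Mahalanobis distances agree; when all variances are additionally 1, all three coincide.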