Introduction to classifiers for multivariate decoding of fMRI data Evelyn Eger MMN 15/12/08

Page 1: Introduction to classifiers for multivariate decoding of fMRI data

Introduction to classifiers for multivariate decoding of fMRI data

Evelyn Eger

MMN 15/12/08

Page 2: Introduction to classifiers for multivariate decoding of fMRI data

Two directions of inference

1) Forward modelling: Psychological variable → Data (p-value)

2) Decoding: Data → Psychological variable (prediction accuracy)

Page 3: Introduction to classifiers for multivariate decoding of fMRI data

Two directions of inference

Inverse inference (decoding) is of special interest, e.g., for brain-computer interfaces, automated diagnosis, etc.

In other cases the two directions are in principle interchangeable: both demonstrate a statistical dependency between experimental variable and data

In many paradigms applying decoding to fMRI, the direction of inference is not central to the interpretation (e.g., Haynes & Rees, 2006; Kriegeskorte & Bandettini, 2007, for reviews)

Efficient, powerful methods based on decoding exist for pattern-based (multivariate) applications

Page 4: Introduction to classifiers for multivariate decoding of fMRI data

Univariate versus multivariate

Univariate analysis:

Effects are analysed for a single dependent variable, e.g., t-test, F-test, ANOVA

Special case: "mass-univariate" analysis in brain imaging: we test effects in a large number of voxels treated as independent

Multivariate analysis:

Effects are analysed for multiple dependent variables, e.g., Hotelling's T-squared test, Wilks' lambda, MANOVA

Page 5: Introduction to classifiers for multivariate decoding of fMRI data

Why go multivariate in brain imaging

[Figure: responses to stimulus conditions 1 and 2; adapted from Haynes et al. 2006]

Discrimination can be improved with higher dimensions

Significance of individual voxels not required

Page 6: Introduction to classifiers for multivariate decoding of fMRI data

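Linear classification (in 2D space)

[Figure: two-dimensional plot with axes Voxel 1 and Voxel 2, showing the weight vector w and offset b of the separating hyperplane]

Set of points $x_i$ with labels $y_i \in \{1, -1\}$

separated by a hyperplane $y = w^T x + b$

so that $y_i (w^T x_i + b) \geq 1$

In N dimensions, the hyperplane has N-1 dimensions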

Page 7: Introduction to classifiers for multivariate decoding of fMRI data

Linear classification (in 2D space)

[Figure: two-dimensional plot with axes Voxel 1 and Voxel 2]

New data are projected onto the previously learned hyperplane

Assignment to classes $y_i \in \{1, -1\}$ gives the prediction accuracy

Which hyperplane to choose?
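Whichever hyperplane is chosen, the projection and class-assignment steps on this slide can be sketched as follows. This is a minimal illustration using NumPy; the values of w and b, the test patterns, and the function name `predict` are all made up for the example and do not come from the slides.

```python
import numpy as np

# Hypothetical hyperplane parameters for two voxels (illustrative values only).
w = np.array([1.0, -1.0])   # weight vector, normal to the hyperplane
b = 0.0                     # offset

def predict(X, w, b):
    """Assign each row of X to class +1 or -1 from the sign of w^T x + b."""
    scores = X @ w + b
    return np.where(scores >= 0, 1, -1)

# New (test) patterns: rows are patterns, columns are voxels.
X_new = np.array([[2.0, 0.5], [0.2, 1.5], [1.0, 0.1], [0.3, 0.9]])
y_true = np.array([1, -1, 1, -1])

y_pred = predict(X_new, w, b)
accuracy = np.mean(y_pred == y_true)  # fraction of correctly assigned patterns
```

The prediction accuracy is simply the proportion of held-out patterns whose predicted label matches the true condition.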

Page 8: Introduction to classifiers for multivariate decoding of fMRI data

Difference between means

$w \propto m_2 - m_1$

Corresponding to a classifier based on Euclidean distance / correlation

[Figure: two classes with means m1 and m2]

Page 9: Introduction to classifiers for multivariate decoding of fMRI data

Examples: difference between means

[Figure from Haxby et al., 2001]

Used to demonstrate distinct multi-voxel activity patterns for object categories in ventral visual cortex (Haxby et al., 2001) and in other recent studies on object representation, e.g., position tolerance (Schwarzlose et al., 2008) and perceived shape similarity (Op de Beeck et al., 2008)

Page 10: Introduction to classifiers for multivariate decoding of fMRI data

Difference between means

$w \propto m_2 - m_1$

Corresponding to a classifier based on Euclidean distance / correlation

Not taking into account variances/covariances

[Figure: two classes with means m1 and m2]
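A difference-between-means classifier can be sketched in a few lines. The sketch below assumes NumPy, labels coded as -1 and +1, and a hyperplane placed through the midpoint between the two class means (which makes it equivalent to assigning each pattern to the nearer mean in Euclidean distance). The function name and toy data are hypothetical.

```python
import numpy as np

def fit_mean_difference(X, y):
    """Return (w, b) with w = m2 - m1 and the hyperplane through the midpoint of the means."""
    m1 = X[y == -1].mean(axis=0)
    m2 = X[y == 1].mean(axis=0)
    w = m2 - m1
    b = -0.5 * (w @ (m1 + m2))   # w^T x + b = 0 at the midpoint (m1 + m2) / 2
    return w, b

# Toy training patterns (rows) for two voxels (columns); values are illustrative.
X_train = np.array([[0.0, 0.0], [2.0, 0.0], [3.0, 3.0], [5.0, 3.0]])
y_train = np.array([-1, -1, 1, 1])

w, b = fit_mean_difference(X_train, y_train)

# Classify two new patterns by the sign of the projection.
X_new = np.array([[4.0, 4.0], [0.0, 1.0]])
labels = np.where(X_new @ w + b >= 0, 1, -1)
```

Because only the means enter the weight vector, voxels with very different noise levels are weighted alike, which is the limitation noted on the slide.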

Page 11: Introduction to classifiers for multivariate decoding of fMRI data

Fisher's linear discriminant

$w \propto S^{-1}(m_2 - m_1)$

S: covariance matrix

Distance measure: Mahalanobis distance

[Figure: two classes with means m1 and m2]

Page 12: Introduction to classifiers for multivariate decoding of fMRI data

Examples: Fisher's linear discriminant

Decoding of conscious and unconscious stimulus orientation from early visual cortex activity (Haynes & Rees, 2005)

Discrimination of individual faces in anterior inferotemporal cortex (Kriegeskorte et al., 2007)

[Figures from Haynes & Rees, 2006 (review) and Kriegeskorte et al., 2007]

Page 13: Introduction to classifiers for multivariate decoding of fMRI data

Fisher's linear discriminant

$w \propto S^{-1}(m_2 - m_1)$

S: covariance matrix

Distance measure: Mahalanobis distance

Curse of dimensionality:

S is not invertible when the dimensionality exceeds the number of data points

[Figure: two classes with means m1 and m2]
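The invertibility problem can be made concrete with a small sketch. It assumes NumPy, random toy data with more voxels than patterns, and a pooled within-class covariance estimate. The shrinkage towards a scaled identity matrix is one common workaround and is an assumption added here, not something the slides prescribe; the function names are hypothetical.

```python
import numpy as np

def pooled_covariance(X, y):
    """Pooled within-class covariance matrix S (labels coded as -1 / +1)."""
    X1, X2 = X[y == -1], X[y == 1]
    centred = np.vstack([X1 - X1.mean(axis=0), X2 - X2.mean(axis=0)])
    return centred.T @ centred / (len(X) - 2)

def fisher_weights(X, y, shrinkage):
    """w = S_reg^{-1} (m2 - m1), with S shrunk towards a scaled identity (assumed remedy)."""
    S = pooled_covariance(X, y)
    p = S.shape[0]
    S_reg = (1 - shrinkage) * S + shrinkage * (np.trace(S) / p) * np.eye(p)
    m1, m2 = X[y == -1].mean(axis=0), X[y == 1].mean(axis=0)
    return np.linalg.solve(S_reg, m2 - m1)

rng = np.random.default_rng(0)
n_patterns, n_voxels = 10, 50            # more voxels than patterns
X = rng.normal(size=(n_patterns, n_voxels))
y = np.array([-1] * 5 + [1] * 5)

rank = np.linalg.matrix_rank(pooled_covariance(X, y))  # below n_voxels: S is singular
w = fisher_weights(X, y, shrinkage=0.1)                # regularised S is invertible
```

With 10 patterns and 50 voxels the covariance estimate is rank-deficient, so the plain inverse does not exist; some form of regularisation or dimensionality reduction is needed before Fisher's discriminant can be applied.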

Page 14: Introduction to classifiers for multivariate decoding of fMRI data

Support vector machines

“hard-margin” classifier

w: weighted linear combination of support vectors

minimising $\frac{1}{2}\|w\|^2$ subject to $y_i (w^T x_i + b) \geq 1$, $i = 1, \dots, N$

[Figure: maximum-margin hyperplane with three support vectors on the margin]

Page 15: Introduction to classifiers for multivariate decoding of fMRI data

Support vector machines

“soft-margin” classifier

w: weighted linear combination of support vectors

minimising $\frac{1}{2}\|w\|^2 + C \sum_i \xi_i$

subject to $y_i (w^T x_i + b) \geq 1 - \xi_i$, $\xi_i \geq 0$, $i = 1, \dots, N$

C: regularisation parameter (trade-off between largest margin and fewest misclassifications)

[Figure: hyperplane with three support vectors and two points with slack ξ inside the margin]
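To illustrate the soft-margin objective, the sketch below minimises the equivalent unconstrained form, $\frac{1}{2}\|w\|^2 + C \sum_i \max(0, 1 - y_i(w^T x_i + b))$, by plain subgradient descent in NumPy. This is only a didactic stand-in: dedicated SVM packages solve the problem with specialised quadratic-programming solvers, and the function name, learning rate, number of epochs, and toy data here are all assumptions.

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, n_epochs=2000):
    """Subgradient descent on 0.5*||w||^2 + C * sum(max(0, 1 - y_i (w^T x_i + b)))."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                      # patterns with non-zero slack
        grad_w = w - C * (y[viol][:, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w = w - lr * grad_w
        b = b - lr * grad_b
    return w, b

# Toy, linearly separable patterns for two voxels (illustrative values only).
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                    [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
y_train = np.array([-1, -1, -1, 1, 1, 1])

w, b = train_linear_svm(X_train, y_train, C=1.0)
train_pred = np.where(X_train @ w + b >= 0, 1, -1)
train_acc = np.mean(train_pred == y_train)
```

A larger C penalises slack more heavily and approaches the hard-margin solution; a smaller C tolerates more margin violations in exchange for a wider margin.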

Page 16: Introduction to classifiers for multivariate decoding of fMRI data

Examples: SVM

Decoding of attended orientation and motion direction from early visual cortex activity (Kamitani & Tong, 2005, 2006)

[Figure from Kamitani & Tong, 2005]

Page 17: Introduction to classifiers for multivariate decoding of fMRI data

Support vector machines

Non-linear classifier

[Figure: non-linear decision boundary with four support vectors]

Use of non-linear kernel functions

Risk of overfitting, especially when few training examples are available

Hardly used in fMRI

Page 18: Introduction to classifiers for multivariate decoding of fMRI data

Comparison of classifier performance

From Cox & Savoy, 2003

Page 19: Introduction to classifiers for multivariate decoding of fMRI data

Analysis work flow

1) ROI definition

2) Data extraction

3) Training

4) Test

[Figure: patterns for Condition 1, Condition 2, ... are passed to a pattern classifier; example tests shown are object discrimination (same size) and size generalisation (1 step)]

Page 21: Introduction to classifiers for multivariate decoding of fMRI data

ROI definition – voxel selection

Regions of interest have to be defined by an orthogonal contrast (e.g., in an object exemplar discrimination experiment: LOC localiser session, all stimuli vs baseline, etc.)

If a further voxel selection is performed based on the contrast of interest, it has to be done on the training data only, to avoid bias

Other criteria for voxel selection (e.g., "reproducibility" of the voxelwise response to different conditions in separate sessions; Grill-Spector et al., 2006, Nat Neurosci) can also be biased
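The rule that contrast-based voxel selection must use training data only can be sketched as follows. The example assumes NumPy, labels coded as -1 and +1, and a simple standardised mean difference as the selection score; the score, the function name `select_voxels`, and the simulated data are illustrative choices, not the procedure of any cited study.

```python
import numpy as np

def select_voxels(X_train, y_train, n_keep):
    """Rank voxels by |mean difference| / pooled SD, computed on training data only."""
    a, b = X_train[y_train == 1], X_train[y_train == -1]
    diff = a.mean(axis=0) - b.mean(axis=0)
    pooled_sd = np.sqrt((a.var(axis=0, ddof=1) + b.var(axis=0, ddof=1)) / 2.0)
    score = np.abs(diff) / (pooled_sd + 1e-12)
    return np.argsort(score)[::-1][:n_keep]     # indices of the highest-scoring voxels

rng = np.random.default_rng(0)
n_voxels = 30
y_train = np.array([1, -1] * 10)
X_train = rng.normal(size=(20, n_voxels))
X_train[y_train == 1, 3] += 8.0                 # simulated informative voxels
X_train[y_train == 1, 7] += 8.0
X_test = rng.normal(size=(4, n_voxels))         # held-out patterns, never used for selection

keep = select_voxels(X_train, y_train, n_keep=2)
X_train_sel = X_train[:, keep]
X_test_sel = X_test[:, keep]                    # the same voxel indices are applied to test data
```

Within crossvalidation, the selection step would be repeated inside each fold, so that the left-out data never influence which voxels are used.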

Page 23: Introduction to classifiers for multivariate decoding of fMRI data

Data extraction

Which data to use for classification?

No general rule: different studies have used beta images or raw EPI images

Ideally as many images as possible for optimal classification performance

In typical neuroimaging studies, there is a trade-off between the number of images and their individual signal-to-noise ratio

Fewer but less noisy images are sometimes preferable (when using SVM)

Page 25: Introduction to classifiers for multivariate decoding of fMRI data

Crossvalidation (Training – test)

Classifier performance always has to be tested on independent data

Split-half crossvalidation (often used in studies employing correlation): one half of the data for training, the other for test

Leave-one-out crossvalidation (common with other classifiers): e.g., all but one session for training, the remaining session for test
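A split-half scheme with a correlation-based classifier can be sketched as below. The sketch assumes NumPy and that each half of the data has already been reduced to one mean pattern per condition; each pattern from the second half is assigned to the condition whose first-half pattern it correlates with most strongly. The function name and the toy patterns are hypothetical, and this is not claimed to reproduce any specific study's procedure.

```python
import numpy as np

def split_half_correlation_classify(half_a, half_b):
    """half_a, half_b: arrays of shape (n_conditions, n_voxels).

    Returns the predicted condition index for each row of half_b and the
    cross-correlation matrix r, where r[i, j] = corr(half_a[i], half_b[j]).
    """
    n = half_a.shape[0]
    r = np.corrcoef(half_a, half_b)[:n, n:]
    return r.argmax(axis=0), r

# Toy mean patterns for two conditions over four voxels (illustrative values).
half_a = np.array([[1.0, 2.0, 3.0, 4.0],
                   [4.0, 3.0, 2.0, 1.0]])
half_b = np.array([[1.1, 2.2, 2.9, 4.1],
                   [3.9, 3.2, 1.8, 1.1]])

pred, r = split_half_correlation_classify(half_a, half_b)
accuracy = np.mean(pred == np.arange(half_a.shape[0]))
```

Classification is correct when the within-condition correlation across halves exceeds the between-condition correlations.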

Page 26: Introduction to classifiers for multivariate decoding of fMRI data

Leave-one-out crossvalidation

Leave one out with N-fold cross-validation

[Figure: patterns for Condition 1 and Condition 2 from blocks 1 to N; the SVM pattern classifier is trained on all but one pattern per condition and tested on the left-out patterns]

Page 28: Introduction to classifiers for multivariate decoding of fMRI data

Crossvalidation (Training – test)

Classifier performance always has to be tested on independent data

Split-half crossvalidation (often used in studies employing correlation): one half of the data for training, the other for test

Leave-one-out crossvalidation (common with other classifiers): e.g., all but one session for training, the remaining session for test

Importantly, "leave-one-out" should mean leaving out one image of each condition (all from one session), to avoid biases due to session effects and unequal prior probabilities (with SVM)
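The leave-one-session-out scheme described above can be sketched as a small fold generator. It assumes NumPy and one session label per pattern; the function name and the toy session and condition labels are hypothetical.

```python
import numpy as np

def leave_one_session_out(sessions):
    """Yield (train_idx, test_idx) pairs, holding out all patterns of one session at a time."""
    sessions = np.asarray(sessions)
    for s in np.unique(sessions):
        test_idx = np.flatnonzero(sessions == s)
        train_idx = np.flatnonzero(sessions != s)
        yield train_idx, test_idx

# Toy design: three sessions, one pattern per condition in each session.
sessions = np.array([1, 1, 2, 2, 3, 3])
labels = np.array([1, -1, 1, -1, 1, -1])

folds = list(leave_one_session_out(sessions))
for train_idx, test_idx in folds:
    # Each test fold contains exactly one pattern of each condition, so the
    # training set keeps equal numbers of patterns per class.
    held_out_labels = labels[test_idx]
```

Holding out one pattern of each condition from the same session keeps the class proportions in the training set balanced across folds.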

Page 29: Introduction to classifiers for multivariate decoding of fMRI data

Implementations

General SVM implementations exist in different languages:

Matlab:

SVM toolbox (University of Southampton, UK): http://www.isis.ecs.soton.ac.uk/resources/svminfo

SVM toolbox (TU Graz, Austria): http://ida.first.fraunhofer.de/~anton/software.html

C:

SVM-light: http://svmlight.joachims.org

Python or R

Multi-Voxel Pattern Analysis (MVPA) toolbox for fMRI data, developed at Princeton University (beta version; Matlab, Python): http://www.csbmb.princeton.edu/mvpa

Page 30: Introduction to classifiers for multivariate decoding of fMRI data

Appendix: Distance measures

Given an m-by-n data matrix X, which is treated as m (1-by-n) row vectors $x_1, x_2, \dots, x_m$, the various distances between the vectors $x_r$ and $x_s$ are defined as:

Euclidean distance:

$D_{rs}^2 = (x_r - x_s)(x_r - x_s)^\top$

Standardised Euclidean distance:

$D_{rs}^2 = (x_r - x_s) D^{-1} (x_r - x_s)^\top$

D: diagonal matrix whose $i$-th diagonal element is the variance of the variable $X_i$ over the m objects

Mahalanobis distance:

$D_{rs}^2 = (x_r - x_s) S^{-1} (x_r - x_s)^\top$

S: sample covariance matrix
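The three squared distances can be computed directly from these definitions. The sketch assumes NumPy, variances and covariances estimated with the usual n-1 denominator, and more objects than variables so that S is invertible; the function name and the toy matrix are hypothetical.

```python
import numpy as np

def squared_distances(X, r, s):
    """Squared Euclidean, standardised Euclidean and Mahalanobis distances between rows r and s."""
    d = X[r] - X[s]
    euclid = d @ d
    variances = X.var(axis=0, ddof=1)           # diagonal of D
    std_euclid = d @ (d / variances)
    S = np.cov(X, rowvar=False)                 # sample covariance matrix
    mahal = d @ np.linalg.solve(S, d)           # d S^{-1} d^T without forming the inverse
    return euclid, std_euclid, mahal

# Toy data: four objects (rows), two uncorrelated variables (columns).
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
euclid, std_euclid, mahal = squared_distances(X, 0, 3)
```

In this toy matrix the two variables are uncorrelated, so S is diagonal and the Mahalanobis distance coincides with the standardised Euclidean distance; the two differ only when the off-diagonal covariances are non-zero.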