Introduction to classifiers for multivariate decoding of fMRI data
Evelyn Eger
MMN 15/12/08
Two directions of inference
1) Forward modelling: Psychological variable → Data (p-value)
2) Decoding: Data → Psychological variable (prediction accuracy)
Two directions of inference
Inverse inference (decoding) is of special interest e.g. for brain–computer interfaces, automated diagnosis, etc.
In other cases the two are in principle interchangeable: both demonstrate a statistical dependency between experimental variable and data
In many paradigms applying decoding to fMRI, the direction of inference is not central for the interpretation (see Haynes & Rees, 2006; Kriegeskorte & Bandettini, 2007 for reviews)
Efficient, powerful methods based on decoding exist for pattern-based (multivariate) applications
Univariate versus multivariate
Univariate analysis:
effects are analysed for a single dependent variable, e.g., t-test, F-test, ANOVA
Special case: “mass-univariate” analysis in brain imaging, where we test effects in a large number of voxels treated as independent
Multivariate analysis:
effects are analysed for multiple dependent variables, e.g., Hotelling's T-square test, Wilks' Lambda, MANOVA
Why go multivariate in brain imaging?
Discrimination can be improved with higher dimensions
Significance of individual voxels not required
[Figure: response distributions for stimulus conditions 1 and 2 in one vs. two voxel dimensions; adapted from Haynes et al., 2006]
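The point can be made concrete with a toy example (hypothetical values, not from the talk): two voxels whose responses overlap heavily between conditions when viewed one at a time, but whose joint pattern discriminates every trial.

```python
import numpy as np

# Hypothetical responses of two voxels under two stimulus conditions.
cond1 = np.array([[1., 2.], [2., 3.], [3., 4.]])   # voxel2 = voxel1 + 1
cond2 = np.array([[2., 1.], [3., 2.], [4., 3.]])   # voxel2 = voxel1 - 1

# Univariate view: the single-voxel response ranges overlap.
overlap_v1 = cond1[:, 0].max() > cond2[:, 0].min()   # True

# Multivariate view: the contrast w = (-1, 1), i.e. voxel2 minus voxel1,
# discriminates every trial perfectly.
w = np.array([-1., 1.])
proj1, proj2 = cond1 @ w, cond2 @ w                  # [1 1 1] vs [-1 -1 -1]
print(overlap_v1, proj1, proj2)
```

Neither voxel is individually significant here, yet the two-dimensional pattern separates the conditions without error.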
Linear classification (in 2D space)
[Figure: data points in voxel 1 × voxel 2 space, separated by a line with weight vector w and offset b]
Set of points xi with labels yi ∈ {1, −1}
separated by a hyperplane y = wTx + b
so that yi(wTxi + b) ≥ 1
For N dimensions, the separating hyperplane has N − 1 dimensions
Linear classification (in 2D space)
[Figure: new data points in voxel 1 × voxel 2 space with the previously learned decision boundary]
New data are projected onto the previously learned hyperplane
Assignment to classes yi ∈ {1, −1} → prediction accuracy
Which hyperplane to choose?
Difference between means: w ∝ m2 − m1
Corresponding to a classifier based on Euclidean distance / correlation
[Figure: class means m1 and m2 with w along their difference]
Examples: difference between means
[Figure: from Haxby et al., 2001]
Used to demonstrate distinct multi-voxel activity patterns for object categories in ventral visual cortex (Haxby et al., 2001), and in other recent studies of object representation, e.g., position tolerance (Schwarzlose et al., 2008) and perceived shape similarity (Op de Beeck et al., 2008)
Difference between means: w ∝ m2 − m1
Corresponding to a classifier based on Euclidean distance / correlation
Does not take variances/covariances into account
[Figure: class means m1 and m2 with w along their difference]
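The difference-between-means classifier above can be sketched in a few lines (toy data are hypothetical): the weight vector is m2 − m1 and the decision boundary sits halfway between the class means, which is equivalent to nearest-mean (Euclidean-distance) assignment.

```python
import numpy as np

# Minimal sketch of the difference-between-means classifier: w = m2 - m1,
# with the boundary at the midpoint between the class means.
def train_mean_classifier(X1, X2):
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    w = m2 - m1
    b = -w @ (m1 + m2) / 2          # boundary at the midpoint of the means
    return w, b

def predict(w, b, X):
    return np.where(X @ w + b > 0, 1, -1)   # class 2 -> +1, class 1 -> -1

# Hypothetical toy patterns (rows = trials, columns = voxels):
X1 = np.array([[0., 0.], [1., 0.], [0., 1.]])   # class 1
X2 = np.array([[3., 3.], [4., 3.], [3., 4.]])   # class 2
w, b = train_mean_classifier(X1, X2)
print(predict(w, b, X1), predict(w, b, X2))     # [-1 -1 -1] [1 1 1]
```

Note that the covariance of the data never enters: that is exactly the limitation the next slide addresses.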
Fisher's linear discriminant
w ∝ S⁻¹(m2 − m1)
S: covariance matrix
Distance measure: Mahalanobis distance
[Figure: class means m1 and m2 with the discriminant direction w]
Examples: Fisher's linear discriminant
Decoding of conscious and unconscious stimulus orientation from early visual cortex activity (Haynes & Rees, 2005)
Discrimination of individual faces in anterior inferotemporal cortex (Kriegeskorte et al., 2007)
From Haynes & Rees, 2006 review
From Kriegeskorte et al., 2007
Fisher's linear discriminant
w ∝ S⁻¹(m2 − m1)
S: covariance matrix
Distance measure: Mahalanobis distance
Curse of dimensionality: S is not invertible when the dimensionality exceeds the number of data points
[Figure: class means m1 and m2 with the discriminant direction w]
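A minimal sketch of Fisher's linear discriminant as defined above, on hypothetical toy data. A pseudo-inverse stands in for S⁻¹ so the sketch still runs in the singular (curse-of-dimensionality) case; in practice a regularised (shrinkage) estimate of S is the more principled fix.

```python
import numpy as np

# Fisher's linear discriminant: w ~ S^-1 (m2 - m1), S = pooled covariance.
def fisher_discriminant(X1, X2):
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S = (np.cov(X1, rowvar=False) + np.cov(X2, rowvar=False)) / 2
    w = np.linalg.pinv(S) @ (m2 - m1)   # pinv: also works when S is singular
    b = -w @ (m1 + m2) / 2
    return w, b

# Hypothetical toy data with correlated voxel noise:
rng = np.random.default_rng(0)
noise = rng.normal(size=(20, 2)) @ np.array([[1., .8], [0., .6]])
X1, X2 = noise[:10], noise[10:] + np.array([4., 2.])
w, b = fisher_discriminant(X1, X2)
acc = np.mean(np.concatenate([X1 @ w + b < 0, X2 @ w + b > 0]))
print(acc)
```

Because w is whitened by S⁻¹, the resulting decision rule corresponds to Mahalanobis rather than Euclidean distance to the class means.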
Support vector machines
“Hard-margin” classifier:
w: weighted linear combination of support vectors, minimising ||w||²/2 subject to yi(wTxi + b) ≥ 1, i = 1 : N
[Figure: maximum-margin hyperplane; the circled points on the margins are the support vectors]
Support vector machines
“Soft-margin” classifier:
w: weighted linear combination of support vectors, minimising ||w||²/2 + C∑ξi subject to yi(wTxi + b) ≥ 1 − ξi, i = 1 : N, ξi ≥ 0
C: regularisation parameter (trade-off between largest margin and fewest misclassifications)
[Figure: soft-margin hyperplane with slack variables ξ for points inside the margin, and support vectors]
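The soft-margin objective above can be rewritten as regularised hinge loss and minimised directly; the sketch below does this with plain full-batch subgradient descent on hypothetical toy data. It is illustrative only — the studies cited in this talk used dedicated packages such as SVM-light, not a toy solver like this.

```python
import numpy as np

# Subgradient descent on the soft-margin SVM objective
#   ||w||^2 / 2 + C * sum_i max(0, 1 - y_i (w.x_i + b)).
def linear_svm(X, y, C=1.0, lr=0.01, epochs=500):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1                 # trials with slack xi_i > 0
        grad_w = w - C * (y[viol][:, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

# Hypothetical toy problem: two separable clusters of voxel patterns.
X = np.array([[0., 0.], [1., 0.], [0., 1.], [3., 3.], [4., 3.], [3., 4.]])
y = np.array([-1, -1, -1, 1, 1, 1])
w, b = linear_svm(X, y)
print(np.sign(X @ w + b))                          # ideally matches y
```

Raising C penalises slack more heavily (fewer misclassifications, smaller margin); lowering it favours a larger margin at the cost of more margin violations.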
Examples: SVM
Decoding of attended orientation and motion direction from early visual cortex activity (Kamitani & Tong, 2005, 2006)
From Kamitani & Tong, 2005
Support vector machines
Non-linear classifier:
Use of non-linear kernel functions
Potential for overfitting, especially when few training examples are available
Hardly used in fMRI
[Figure: non-linearly separable data with a curved decision boundary and support vectors]
Comparison of classifier performance
From Cox & Savoy, 2003
Analysis work flow
1) ROI definition
2) Data extraction
3) Training
4) Test
[Figure: workflow schematic showing ROI definition, data extraction for conditions 1 and 2, pattern classifier training (object discrimination, same size) and test (size generalisation, 1 step)]
ROI definition – voxel selection
Regions of interest have to be defined by an orthogonal contrast (e.g., in an object exemplar discrimination experiment: an LOC localiser session, all stimuli vs. baseline, etc.)
If a further voxel selection is performed based on the contrast of interest, it has to use training data only to avoid bias
Other criteria for voxel selection (e.g., “reproducibility” of voxelwise responses to different conditions in separate sessions, Grill-Spector et al., 2006, Nat Neurosci) can also be biased
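The training-data-only rule can be sketched as follows; both helpers are illustrative stand-ins (a simple mean-difference ranking for selection, a nearest-mean classifier for prediction), not any particular study's method.

```python
import numpy as np

# Bias-free voxel selection: the selection statistic uses TRAINING data only.
def select_voxels(X_train, y_train, k):
    """Pick the k voxels with the largest absolute mean difference
    between conditions, computed on training data only."""
    d = np.abs(X_train[y_train == 1].mean(0) - X_train[y_train == -1].mean(0))
    return np.argsort(d)[-k:]

def fold_accuracy(X_train, y_train, X_test, y_test, k=10):
    vox = select_voxels(X_train, y_train, k)     # test data never touched here
    m1 = X_train[y_train == -1][:, vox].mean(0)
    m2 = X_train[y_train == 1][:, vox].mean(0)
    w = m2 - m1
    pred = np.where(X_test[:, vox] @ w > w @ (m1 + m2) / 2, 1, -1)
    return np.mean(pred == y_test)
```

Selecting voxels on the full data set (training plus test) before this step would leak the contrast of interest into the test and inflate accuracy above chance even for pure noise.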
Data extraction
Which data to use for classification?
No general rule; different studies have used beta images or raw EPI images
Ideally, as many images as possible for optimal classification performance
In typical neuroimaging studies there is a trade-off between the number of images and their individual signal-to-noise ratio
Fewer but less noisy images are sometimes preferable (when using SVM)
Crossvalidation (training – test)
Classifier performance always has to be tested on independent data
Split-half crossvalidation (often used in studies employing correlation): one half of the data for training, the other for test
Leave-one-out crossvalidation (common with other classifiers): e.g., all but one session for training, the remaining session for test
Leave-one-out crossvalidation
[Figure: N-fold leave-one-out scheme: for blocks 1 : N, all but one pattern per condition (conditions 1 and 2) train the SVM pattern classifier, and the left-out patterns are used for test]
Crossvalidation (training – test)
Classifier performance always has to be tested on independent data
Split-half crossvalidation (often used in studies employing correlation): one half of the data for training, the other for test
Leave-one-out crossvalidation (common with other classifiers): e.g., all but one session for training, the remaining session for test
Importantly, “leave-one-out” should mean leaving one image of each condition out (i.e., all of one session), to avoid biases due to session effects and unequal prior probabilities (with SVM)
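The leave-one-session-out scheme described above can be sketched in a short loop: in each fold the classifier is trained on all sessions but one and tested on all patterns of the held-out session (one image per condition), so session effects cannot leak into training and the classes stay balanced. The nearest-mean classifier here is an illustrative stand-in for any classifier, e.g. an SVM.

```python
import numpy as np

# Stand-in classifier: nearest mean on the training data.
def nearest_mean(X_tr, y_tr, X_te):
    m1, m2 = X_tr[y_tr == -1].mean(0), X_tr[y_tr == 1].mean(0)
    w = m2 - m1
    return np.where(X_te @ w > w @ (m1 + m2) / 2, 1, -1)

# Leave-one-session-out crossvalidation: one fold per session, the whole
# held-out session (one image of each condition) serves as test set.
def leave_one_session_out(X, y, session, classify=nearest_mean):
    accs = []
    for s in np.unique(session):
        train, test = session != s, session == s
        pred = classify(X[train], y[train], X[test])
        accs.append(np.mean(pred == y[test]))
    return np.mean(accs)
```

Because every fold holds out exactly one image per condition, the training set always contains equal numbers of both classes, which matters for classifiers sensitive to unequal prior probabilities (such as SVM).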
Implementations
General SVM implementations exist in different languages:
Matlab: SVM toolbox (University of Southampton, UK) http://www.isis.ecs.soton.ac.uk/resources/svminfo
SVM toolbox (TU Graz, Austria) http://ida.first.fraunhofer.de/~anton/software.html
C: SVM-light http://svmlight.joachims.org
Python or R
Multi-Voxel Pattern Analysis (MVPA) toolbox for fMRI data, developed at Princeton University (beta version, Matlab and Python) http://www.csbmb.princeton.edu/mvpa
Appendix: Distance measures
Given an m-by-n data matrix X, treated as m (1-by-n) row vectors x1, x2, ..., xm, the various distances between the vectors xr and xs are defined as:
Euclidean distance:
d²rs = (xr − xs)(xr − xs)′
Standardised Euclidean distance:
d²rs = (xr − xs)D⁻¹(xr − xs)′
D: diagonal matrix with diagonal elements given by the variance of each variable over the m objects
Mahalanobis distance:
d²rs = (xr − xs)S⁻¹(xr − xs)′
S: sample covariance matrix
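The three squared distances can be written directly from the definitions above, for two rows xr, xs of a data matrix X:

```python
import numpy as np

# Squared Euclidean distance: d'd
def euclidean_sq(xr, xs):
    d = xr - xs
    return d @ d

# Squared standardised Euclidean distance: d D^-1 d',
# D = diagonal matrix of per-variable variances over the m rows of X.
def std_euclidean_sq(xr, xs, X):
    d = xr - xs
    return d @ (d / X.var(axis=0, ddof=1))

# Squared Mahalanobis distance: d S^-1 d', S = sample covariance matrix of X.
def mahalanobis_sq(xr, xs, X):
    d = xr - xs
    return d @ np.linalg.inv(np.cov(X, rowvar=False)) @ d
```

When the sample covariance S is diagonal, the standardised Euclidean and Mahalanobis distances agree; when all variances are additionally 1, all three coincide.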