Multi-Variate/Voxel Pattern Analysis (MVPA)
Some slides adapted from those of Jonas Kaplan, Dipanjan Chakraborty, and Jinwei Gu
[Figure: activity in six voxels (V1-V6) for two conditions. The average activity across voxels is the same in both conditions (not significant), and no individual voxel differs significantly, but the voxel-wise patterns are clearly different.]
Univariate analysis (e.g., fusiform face area): Is the average activity significantly different?
Multivariate pattern analysis: Is the pattern of activity predictably different?
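To make the distinction concrete, here is a toy numerical example in Matlab (hypothetical values, not from the slides): two conditions with identical mean activity across six voxels but opposite patterns.

% Hypothetical voxel patterns (V1-V6) with equal means but opposite patterns
A = [1 0 1 0 1 0];   % condition A
B = [0 1 0 1 0 1];   % condition B
mean(A) - mean(B)    % 0: the univariate average cannot tell the conditions apart
corrcoef(A, B)       % correlation of -1: the patterns are maximally different

A univariate test on the regional average finds nothing, while any pattern classifier separates these two conditions trivially.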
[Figure: a classifier receives the activity pattern across voxels (V1-V6) on each testing trial and guesses which stimulus produced it. Comparing the classifier's guesses against the actual stimuli yields a performance score, e.g., 75% correct.]
[Figure: the classifier is first trained on labeled training trials (each an activity pattern paired with the stimulus that produced it), then evaluated on held-out testing trials.]
Cross-validation paradigm:
[Figure: data from 8 runs. On each cross-validation step, one run is held out as the testing run and the classifier is trained on the other 7; each step yields a performance score (Performance 1-8), and overall performance is the average across all 8 steps.]
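The paradigm in the figure is leave-one-run-out cross-validation. A minimal Matlab sketch, assuming patterns (one row per trial), labels (a numeric label per trial), runNum (each trial's run number), and a hypothetical trainAndTest(trainData, trainLabels, testData) function standing in for whichever classifier you use:

runs = unique(runNum);                 % e.g., 1:8
perf = zeros(numel(runs), 1);
for r = 1:numel(runs)
    test  = (runNum == runs(r));       % hold one run out for testing
    train = ~test;                     % train on all remaining runs
    guesses = trainAndTest(patterns(train,:), labels(train), patterns(test,:));
    perf(r) = mean(guesses == labels(test));   % performance on this step
end
overallPerformance = mean(perf)        % average across cross-validation steps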
What is the input to the classifier?
§ Raw fMRI data (the individual TRs from each task block)
§ Averaged fMRI data (the TRs within each block averaged into one pattern per block)
§ Beta values from a GLM analysis
[Figure: a timeline of TRs alternating between TASK A and TASK B blocks; for the averaged option, the TRs within each block are collapsed into a single AVG pattern.]
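For the averaged-data option, a minimal sketch (tcData is an assumed TR-by-voxel matrix and blockNum an assumed vector giving each TR's block number; both names are hypothetical):

blocks = unique(blockNum);
avgPatterns = zeros(numel(blocks), size(tcData, 2));
for b = 1:numel(blocks)
    % Average all TRs belonging to this block into a single pattern
    avgPatterns(b,:) = mean(tcData(blockNum == blocks(b), :), 1);
end

Each row of avgPatterns then serves as one (less noisy) input pattern for the classifier.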
Classifier algorithms
• Nearest neighbor (e.g., Haxby)
• Support Vector Machines (SVM)
• Neural Networks (e.g., backpropagation)
• Linear Discriminant Analysis (LDA)
• Gaussian Naive Bayes (GNB)
• Sparse Multinomial Logistic Regression (SMLR)
• ...
Support Vector Machines
[Figure: two classes of points, labeled +1 and -1, in the (x1, x2) plane, with several candidate linear boundaries.]
§ How would you classify these points using a linear discriminant function in order to minimize the error rate?
§ There are an infinite number of answers!
§ Which one is the best?
Support Vector Machines
[Figure: the maximum-margin boundary between the +1 and -1 classes in the (x1, x2) plane, with a "safe zone" (the margin) on either side. The data points lying on the margin (x+, x+, x-) are the support vectors.]
§ The linear discriminant function (classifier) with the maximum margin is the best.
§ The margin is defined as the width by which the boundary could be increased before hitting a data point.
§ Why is it the best? It is robust to outliers and thus has strong generalization ability.
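In standard notation (the usual textbook formulation; the slides only state the idea), choosing the maximum-margin classifier means solving

\[ \min_{\mathbf{w},\,b} \; \tfrac{1}{2}\|\mathbf{w}\|^2 \quad \text{subject to } y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 \text{ for all } i, \]

where \(y_i \in \{+1, -1\}\) are the class labels; the resulting margin width is \(2/\|\mathbf{w}\|\), so minimizing \(\|\mathbf{w}\|\) maximizes the margin.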
Support Vector Machines
§ What if the data are not linearly separable? (noisy data, outliers, etc.)
§ Slack variables ξi can be added to allow misclassification of difficult or noisy data points.
§ Maximize the margin while minimizing the sum of the slack variables ξi weighted by a cost parameter C.
[Figure: two points on the wrong side of the margin, with slack variables ξ1 and ξ2 measuring how far each falls on the wrong side.]
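Adding the slack variables turns this into the standard soft-margin objective (again textbook notation, not from the slides):

\[ \min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \; \tfrac{1}{2}\|\mathbf{w}\|^2 + C \sum_i \xi_i \quad \text{subject to } y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \]

where a large cost parameter C penalizes misclassifications heavily (narrower margin, fewer errors) and a small C tolerates more errors in exchange for a wider margin.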
Non-linear SVMs
§ Datasets that are linearly separable with some noise work out great.
§ But what are we going to do if the dataset is just too hard?
§ How about mapping the data to a higher-dimensional space?
[Figure: 1-D data along the x axis that is not linearly separable becomes separable after mapping each point x to (x, x2).]
This slide is courtesy of www.iro.umontreal.ca/~pift6080/documents/papers/svm_tutorial.ppt
Non-linear SVMs: Feature Space
§ General idea: the original input space can be mapped to some higher-dimensional feature space where the training set is separable:
Φ: x → φ(x)
§ You don't usually need to do this for fMRI, because the input space is already high-dimensional (a large number of voxels).
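A concrete instance of the mapping idea (an illustrative example, not from the slides): the 1-D points x = -1, 0, 1 with labels +1, -1, +1 cannot be separated by any threshold on the line, but under

\[ \varphi(x) = (x, x^2) \]

they become (-1, 1), (0, 0), and (1, 1), and the line \(x_2 = \tfrac{1}{2}\) separates the two classes perfectly.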
Support Vector Machines: Summary
• Binary classifier
• Draws a hyperplane to separate the categories, maximizing the margin between classes
• Works quickly on large feature sets (lots of voxels)
• Adding slack variables and cost parameter allows for some misclassification
• Linear version is usually sufficient (very little advantage to nonlinear SVM with lots of features and few stimuli)
How To MVPA
§ Write your own code
§ Libsvm/Liblinear
§ 3dsvm
§ BrainVoyager
§ Princeton MVPA toolbox
§ PRoNTo
§ PyMVPA
§ Matlab Bioinformatics toolbox
§ ...
Write Your Own
§ Many MVPA analyses are easy to code yourself.
§ Haxby-style correlation analysis:
§ Run a GLM and store the activation pattern (betas) for each condition in the odd runs and in the even runs (Haxby subtracted the mean activation across conditions).
§ Compute all pairwise correlations between even- and odd-run patterns.
§ Predict house(odd) vs. house(even) > house(odd) vs. face(even)
§ Predict face(odd) vs. face(even) > house(odd) vs. face(even)
§ ...
§ Accuracy = correct predictions / total predictions (this is not really predicting a specific category; see the sketch after this list).
§ Or store betas for each block and classify each test block according to the most similar training block (1-nearest neighbor).
§ Or compute the difference between within- and between-category correlations.
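A minimal sketch of the correlation approach, assuming oddBetas and evenBetas are condition-by-voxel matrices of mean-subtracted GLM betas (hypothetical variable names):

nCond = size(oddBetas, 1);
R = corrcoef([oddBetas; evenBetas]');   % correlate every pattern with every other
R = R(1:nCond, nCond+1:end);            % rows: odd-run patterns; cols: even-run patterns
correct = 0; total = 0;
for i = 1:nCond
    for j = [1:i-1, i+1:nCond]
        % Count a correct "prediction" when the within-category correlation
        % exceeds the between-category correlation
        correct = correct + (R(i,i) > R(i,j));
        total = total + 1;
    end
end
accuracy = correct / total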
Carp et al. (2011, NeuroImage)
[Figure: correlation values (-0.5 to 1.0) and neural distinctiveness (0.0 to 1.0), each plotted as a function of the number of voxels (2 to 512) for young and old adults; asterisks mark significant differences.]
Libsvm/Liblinear
§ Probably the most widely used and powerful tools for SVM
§ Set of C++ and Java libraries
§ Interfaces for Matlab, Python, R, Perl, Common Lisp, ...
§ Cross-validation for model selection (e.g., choosing the C parameter)
§ Variety of different SVM formulations
§ Efficient multi-class classification
§ Libsvm includes general tools for SVM (including non-linear)
§ Liblinear is a very efficient implementation for linear SVM
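For example, with libsvm's Matlab interface (the svmtrain/svmpredict functions distributed with libsvm, which shadow the Bioinformatics Toolbox functions of the same name; data and CorrectLabels as in the Matlab demos below, with train/test as logical index vectors):

model = svmtrain(CorrectLabels(train), data(train,:), '-t 0 -c 1');
    % -t 0 selects a linear kernel; -c sets the cost parameter C
guesses = svmpredict(CorrectLabels(test), data(test,:), model);
cvAcc = svmtrain(CorrectLabels, data, '-t 0 -c 1 -v 5');
    % -v 5 returns 5-fold cross-validated accuracy (useful for choosing C)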
J Park et al. (2010, J Neurosci)
[Figure: classification accuracy (0.5 to 1.0) for young vs. old adults, shown as (A) neural specificity of visual activity and (B) neural specificity of motor activity.]
3dsvm
§ Command-line program and plugin for AFNI
§ Built around the SVM-light package
§ Features:
§ Reading AFNI-supported binary image formats
§ Masking of variables (brain voxels)
§ Censoring training samples
§ Visualizing alphas as time series and linear weight vectors as functional overlays
§ Classifying multiple categories
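A typical train-then-test invocation looks roughly like this (dataset, label, and model names are placeholders, and the options shown are from memory; run 3dsvm -help to confirm them on your AFNI installation):

3dsvm -trainvol train_runs+orig -trainlabels labels.1D -mask roi_mask+orig -model my_model
3dsvm -testvol test_runs+orig -model my_model+orig -predictions pred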
BrainVoyager
§ BrainVoyager QX 2.0 includes MVPA tools:
§ SVM within selected ROIs or whole brain
§ Multivariate searchlight
§ Recursive feature elimination for feature selection
Princeton MVPA Toolbox
§ Set of Matlab tools specifically designed for MVPA
§ Import, export, and visualization of data (AFNI, DICOM, ANALYZE, NIFTI, BrainVoyager)
§ Preprocessing (z-scoring time series)
§ Feature selection (ANOVA)
§ Classification and cross-validation
§ Backprop is the default classifier right now, but others are included
§ Relatively easy to get it working with libsvm/liblinear
§ Easy to install and use if you know Matlab
§ Reasonable tutorials & manual; user community
PRoNTo
§ MVPA toolbox built by John Ashburner & others at UCL
§ Five main modules:
§ Data & Design
§ Prepare feature set
§ Specify and run model
§ Compute weights
§ Display results
§ Accepts NIFTI files
§ Graphical user interface
§ Classifiers: Support Vector Machine, Gaussian Process Classifier, Random Forest
§ Regression models: Kernel Ridge Regression and Relevance Vector Regression
PyMVPA
§ Multivariate Pattern Analysis in Python
§ Provides high-level abstraction of typical processing steps
§ Implementations of the most popular algorithms and interfaces to lots of tools (e.g., libsvm)
§ May be the most powerful and flexible of all the tools
§ Reasonable tutorial and manual; user community
§ Kind of a hassle to install (at least on Macs)
§ Requires knowledge of the Python programming language
Matlab Bioinformatics Toolbox
§ Cross-validation:
§ crossvalind – Generate cross-validation indices
§ classperf – Create & update a classifier performance object
§ Nearest-neighbor:
§ knnclassify – Classify data using the nearest-neighbor method
§ SVM:
§ svmtrain – Train a support vector machine classifier
§ svmclassify – Classify using a support vector machine
§ This is what I'm most familiar with, but it wasn't built for imaging data (doesn't read NIFTI, doesn't display brain pictures, ...), so you need other tools for that (e.g., SPM)
Matlab Demos
§ Assuming the following two variables are already defined:
§ data: matrix containing the patterns to be classified as rows (each column might be a value from a different voxel)
§ CorrectLabels: vector with the correct label for each pattern (the number of entries must equal the number of rows in data)
Cross-validated kNN Classifier in Matlab
k = 1;       % How many neighbors for kNN classifier (usually odd)
Nfolds = 5;  % How many folds (divisions of data) for cross-validation
indices = crossvalind('Kfold', size(data,1), Nfolds);
% Randomly divide data into folds. indices is a vector where each entry
% is an integer between 1 and Nfolds, indicating which fold each datapoint
% belongs to
cp = classperf(CorrectLabels);  % Initialize a classifier performance object
for i = 1:Nfolds
    test = (indices == i); train = ~test;
    % test indicates data to be tested in this fold. train indicates data
    % used in training for this fold.
    classes = knnclassify(data(test,:), data(train,:), CorrectLabels(train), k);
    % Perform kNN classification on this fold's test data based on training data
    classperf(cp, classes, test);
    % Update the cp object with the classification results from this fold
end
cp.CorrectRate  % Output the average correct classification rate
Cross-validated SVM Classifier in Matlab
Nfolds = 5;  % How many folds (divisions of data) for cross-validation
indices = crossvalind('Kfold', size(data,1), Nfolds);
% Randomly divide data into folds. indices is a vector where each entry
% is an integer between 1 and Nfolds, indicating which fold each datapoint
% belongs to
cp = classperf(CorrectLabels);  % Initialize a classifier performance object
for i = 1:Nfolds
    test = (indices == i); train = ~test;
    % test indicates data to be tested in this fold. train indicates data
    % used in training for this fold.
    svmStruct = svmtrain(data(train,:), CorrectLabels(train));
    % Train a support vector machine on this fold's training data
    classes = svmclassify(svmStruct, data(test,:));
    % Use the trained SVM to classify this fold's test data
    classperf(cp, classes, test);
    % Update the cp object with the classification results from this fold
end
cp.CorrectRate  % Output the average correct classification rate