
Multi-Variate/Voxel Pattern Analysis (MVPA)

Some slides adapted from those of Jonas Kaplan, Dipanjan Chakraborty, and Jinwei Gu

[Figure: two conditions, each with activity in voxels V1–V6. The across-voxel averages do not differ (not significant), and no individual voxel is significant, but the patterns across voxels are clearly different.]

Univariate analysis (e.g., fusiform face area): Is the average activation significantly different between conditions?

Multivariate pattern analysis: Is the condition predictable from the pattern of activation?
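To make the contrast concrete, here is a tiny made-up numerical illustration (the voxel values are invented for this example, not taken from the slides): two conditions can have identical mean activation across voxels while their voxel-wise patterns are easily told apart.

% Hypothetical activity for voxels V1-V6 under two conditions (made-up numbers)
condA = [1 3 1 3 1 3];
condB = [3 1 3 1 3 1];

mean(condA)              % 2  -> the averages are identical...
mean(condB)              % 2
r = corrcoef(condA, condB);
r(1,2)                   % -1 -> ...but the patterns are perfectly anti-correlated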

[Figure: each testing trial yields an activity pattern (values across voxels V1–V6). The classifier receives each unlabeled pattern and produces a guess, which is compared with the true stimulus; e.g., Performance: 75% correct.]

Cross-validation paradigm:

[Figure: the classifier is first trained on labeled activity patterns from the training trials, then guesses the stimulus for each held-out testing trial. With the data divided into runs 1–8, each cross-validation step uses one run as the testing run and the remaining runs as training runs; the resulting fold accuracies (Performance 1–8) are averaged to give the overall performance.]

Norman et al. (2006, TICS)
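A minimal leave-one-run-out loop sketched in Matlab, assuming (these variable names are not from the slides) that each row of data is one trial's pattern, labels is a numeric vector of stimulus labels, and runs gives the run number (1–8) of each trial; knnclassify from the Bioinformatics Toolbox, used later in this deck, stands in for whatever classifier you prefer:

nRuns = 8;                         % runs 1-8, as in the diagram
perf = zeros(nRuns, 1);            % one performance value per cross-validation step

for r = 1:nRuns
    test  = (runs == r);           % this run is the testing run
    train = ~test;                 % all remaining runs are training runs

    % Any classifier could go here; 1-nearest neighbor is used as a stand-in
    guess = knnclassify(data(test,:), data(train,:), labels(train), 1);

    perf(r) = mean(guess == labels(test));   % fold accuracy ("Performance r")
end

overallPerformance = mean(perf)    % average across the 8 cross-validation steps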

What is the input to the classifier?

§ Raw fMRI data (the individual TRs acquired during the TASK A and TASK B blocks)
§ Averaged fMRI data (the TRs within each block averaged into a single pattern, AVG; see the sketch below)
§ Beta values from a GLM analysis
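One way to build the "averaged fMRI data" input, sketched in Matlab. Assume tr_data holds one row per TR and block holds a block number for each TR (both names are assumptions for this sketch, not from the slides):

blockIDs = unique(block);                        % one entry per task block
avg_data = zeros(numel(blockIDs), size(tr_data,2));

for b = 1:numel(blockIDs)
    % Average all TRs belonging to this block into one pattern (one row)
    avg_data(b,:) = mean(tr_data(block == blockIDs(b), :), 1);
end
% avg_data now has one averaged pattern per block, ready for the classifier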

Classifier algorithms
•  Nearest neighbor (e.g., Haxby)
•  Support Vector Machines (SVM)
•  Neural Networks (e.g., backpropagation)
•  Linear Discriminant Analysis (LDA)
•  Gaussian Naive Bayes (GNB)
•  Sparse Multinomial Logistic Regression (SMLR)
•  …

Nearest Neighbor

[Figures: 1-nearest-neighbor and 3-nearest-neighbor classification examples]

Haxby et al. (2001, Science)
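A correlation-based 1-nearest-neighbor classifier in the spirit of Haxby's approach can be sketched in a few lines of Matlab; trainData/trainLabels and testData/testLabels are assumed to hold one pattern per row and one numeric label per row (the names are placeholders, and corr requires the Statistics Toolbox):

nTest = size(testData, 1);
guess = zeros(nTest, 1);

for t = 1:nTest
    % Correlate this test pattern with every training pattern
    r = corr(testData(t,:)', trainData');   % 1 x nTrain vector of correlations
    [~, best] = max(r);                     % index of the most similar training pattern
    guess(t) = trainLabels(best);           % inherit its label (1-nearest neighbor)
end

accuracy = mean(guess == testLabels)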

Support Vector Machines

§ How would you classify these points using a linear discriminant function in order to minimize the error rate?

[Figure: two classes of points in the (x1, x2) plane; filled symbols denote +1, open symbols denote −1]

n  Infinite number of answers!

n  Which one is the best?

Support Vector Machines

n  The linear discriminant function (classifier) with the maximum margin is the best

n  Margin is defined as the width by which the boundary could be increased before hitting a data point

n  Why is it the best?
q  Robust to outliers and thus strong generalization ability

[Figure: the maximum-margin boundary in the (x1, x2) plane; the margin ("safe zone") is bounded by the support vectors x+, x+, and x−]

Support Vector Machines

§ What if the data are not linearly separable? (noisy data, outliers, etc.)

n  Slack variables ξi can be added to allow misclassification of difficult or noisy data points (e.g., ξ1 and ξ2 in the figure)

n  Maximize the margin while minimizing the sum of the slack variables ξi, weighted by the cost parameter C
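For reference, the standard soft-margin objective behind these two bullets (not written out in the deck) is:

\min_{w,\,b,\,\xi} \;\; \tfrac{1}{2}\lVert w \rVert^2 + C \sum_i \xi_i
\quad \text{subject to} \quad y_i\,(w \cdot x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0

where the margin being maximized is 2/\lVert w \rVert and the cost parameter C trades off margin width against the total slack.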

Non-linear SVMs

n  Datasets that are linearly separable (with some noise) work out great

n  But what are we going to do if the dataset is just too hard?

n  How about… mapping the data to a higher-dimensional space?

[Figure: 1-D data on the x axis that cannot be separated by a single threshold become linearly separable after mapping each x to (x, x²)]

This slide is courtesy of www.iro.umontreal.ca/~pift6080/documents/papers/svm_tutorial.ppt

Non-linear SVMs: Feature Space

n  General idea: the original input space can be mapped to some higher-dimensional feature space where the training set is separable:

Φ: x → φ(x)

n  Don't usually need to do this for fMRI because the space is already high-dimensional (large number of voxels)
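Purely as a toy illustration of the mapping idea (the numbers below are made up, and for fMRI this step is usually unnecessary), 1-D data that no single threshold can separate become linearly separable after mapping x → (x, x²):

% Made-up 1-D data: class +1 lies far from zero, class -1 lies near zero
x = [-3 -2 -1 1 2 3 -0.5 0 0.5]';
y = [ 1  1  1 1 1 1   -1 -1  -1]';

% Quadratic feature map: phi(x) = [x, x.^2]
phi = [x, x.^2];

% A horizontal line in feature space (x.^2 = 0.75) now separates the classes,
% which corresponds to a non-linear boundary in the original 1-D space
separable = all(phi(y==1, 2) > 0.75) && all(phi(y==-1, 2) < 0.75)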

Support Vector Machines: Summary

•  Binary classifier

•  Draws a hyperplane to separate the categories, maximizing the margin between classes

•  Works quickly on large feature sets (lots of voxels)

•  Adding slack variables and a cost parameter allows for some misclassification

•  Linear version is usually sufficient (very little advantage to nonlinear SVM with lots of features and few stimuli)
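If you are using the Matlab Bioinformatics Toolbox functions shown later in this deck, the cost and kernel options map onto svmtrain name-value arguments roughly as follows (a sketch; trainData and trainLabels are assumed variables):

% Linear SVM (usually sufficient for fMRI) with cost parameter C
C = 1;
svmLinear = svmtrain(trainData, trainLabels, ...
                     'kernel_function', 'linear', ...
                     'boxconstraint', C);

% Non-linear (RBF) SVM, rarely worth it with many voxels and few stimuli
svmRBF = svmtrain(trainData, trainLabels, ...
                  'kernel_function', 'rbf', ...
                  'boxconstraint', C);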

How To MVPA
§ Write your own code
§ Libsvm/Liblinear
§ 3dsvm
§ BrainVoyager
§ Princeton MVPA toolbox
§ PRoNTo
§ PyMVPA
§ Matlab Bioinformatics toolbox
§ …

Write Your Own

§ Many MVPA analyses are easy to implement yourself

§ Haxby:
§  Run a GLM and store the activation pattern (betas) for each condition in the odd runs and in the even runs (Haxby subtracted the mean activation across conditions)
§  Compute all pairwise correlations between even and odd runs
§  Predict: house(odd) vs. house(even) > house(odd) vs. face(even)
§  Predict: face(odd) vs. face(even) > house(odd) vs. face(even)
§  …
§  Accuracy = correct predictions / total predictions (not really predicting a specific category)

§  Or store betas for each block and classify each test block based on the most similar training block (1-nearest neighbor)

§  Or compute the difference between within- and between-category correlations (see the sketch below)
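The within- versus between-category correlation idea in the last bullet could be sketched like this in Matlab, assuming oddBetas and evenBetas each hold one row per condition, in the same condition order (the variable names are placeholders; corr requires the Statistics Toolbox):

nCond = size(oddBetas, 1);
R = corr(oddBetas', evenBetas');      % nCond x nCond matrix of odd-even pattern correlations

within  = mean(diag(R));              % same condition in odd vs. even runs
between = mean(R(~eye(nCond)));       % different conditions across runs

distinctiveness = within - between    % positive if patterns carry condition information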

Carp et al. (2011, NeuroImage)

[Figure: correlation and neural distinctiveness as a function of the number of voxels (2, 4, 8, …, 512) for young and old adults (panels A and B)]

Libsvm/Liblinear

§ Probably the most widely used and powerful tools for SVM
§ Set of C++ and Java libraries
§  Interfaces for Matlab, Python, R, Perl, Common Lisp, …
§ Cross-validation for model selection (e.g., choosing the C parameter)
§ Variety of different SVM formulations
§ Efficient multi-class classification
§ Libsvm includes general tools for SVM (including non-linear)
§ Liblinear is a very efficient implementation for linear SVM
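A minimal sketch of libsvm's own Matlab interface (note that its svmtrain takes the labels first and shadows the Bioinformatics Toolbox function of the same name; trainData, trainLabels, testData, and testLabels are assumed variables holding double matrices/column vectors):

% '-t 0' selects a linear kernel, '-c 1' sets the cost parameter C
model = svmtrain(trainLabels, trainData, '-t 0 -c 1');

% Predicted labels for the test set; acc(1) is the classification accuracy in percent
[predicted, acc, ~] = svmpredict(testLabels, testData, model);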

J Park et al. (2010, J Neurosci)

[Figure: classification accuracy, neural specificity of visual activity (panel A), and neural specificity of motor activity (panel B), each on a 0.5–1.0 scale, compared between young and old adults]

3dsvm

§ Command-line program and plugin for AFNI
§ Built around the SVM-light package

§ Features:
§ Reading AFNI-supported binary image formats
§ Masking of variables (brain voxels)
§ Censoring training samples
§  Visualizing alphas as time series and linear weight vectors as functional overlays
§ Classifying multiple categories

BrainVoyager

§ BrainVoyager QX 2.0 includes MVPA tools
§  SVM within selected ROIs or whole brain
§ Multivariate searchlight
§ Recursive feature elimination for feature selection

Princeton MVPA Toolbox

§ Set of Matlab tools specifically designed for MVPA
§  Import, export, and visualization of data
§  AFNI, DICOM, ANALYZE, NIFTI, BrainVoyager
§  Preprocessing (z-scoring time series)
§  Feature selection (ANOVA)

§ Classification and cross-validation
§  Backprop is the default classifier right now, but others are included
§  And it is relatively easy to get it working with libsvm/liblinear

§ Easy to install and use if you know Matlab
§ Reasonable tutorials & manual; user community

PRoNTo

§ MVPA toolbox built by John Ashburner & others at UCL
§ Five main modules:
§  Data & Design
§  Prepare feature set
§  Specify and run model
§  Compute weights
§  Display results

§ Accepts NIFTI files
§ Graphical user interface
§ Classifiers: Support Vector Machine, Gaussian Process Classifier, Random Forest
§ Regression models: Kernel Ridge Regression and Relevance Vector Regression

PyMVPA

§ Multivariate Pattern Analysis in Python
§ Provides high-level abstraction of typical processing steps
§  Implementations of the most popular algorithms and interfaces to lots of tools (e.g., libsvm)
§ May be the most powerful and flexible of all the tools
§ Reasonable tutorial and manual; user community

§ Kind of a hassle to install (at least on Macs)
§ Requires knowledge of the Python programming language

Matlab Bioinformatics Toolbox

§ Cross-validation:
§  crossvalind – Generate cross-validation indices
§  classperf – Create & update a classifier performance object

§ Nearest-neighbor:
§  knnclassify – Classify data using the nearest-neighbor method

§ SVM:
§  svmtrain – Train a support vector machine classifier
§  svmclassify – Classify using a support vector machine

§ This is what I'm most familiar with, but it wasn't built for imaging data (doesn't read in NIFTI, doesn't display brain pictures, …), so you need to use other tools for that (e.g., SPM)

Matlab Demos

§ Assuming the following two variables are already defined:

§ data : matrix containing the patterns to be classified as rows
§  Each column might be a value from a different voxel

§ CorrectLabels : vector with the correct label for each pattern
§  The number of entries must equal the number of rows in data

Cross-validated kNN Classifier in Matlab

k = 1;       % How many neighbors for kNN classifier (usually odd)
Nfolds = 5;  % How many folds (divisions of data) for cross-validation

indices = crossvalind('Kfold', size(data,1), Nfolds);
% Randomly divide data into folds. indices is a vector where each entry
% is an integer between 1 and Nfolds, indicating which fold each datapoint
% belongs to

cp = classperf(CorrectLabels);  % Initialize a classifier performance object

for i = 1:Nfolds
    test = (indices == i);
    train = ~test;
    % test indicates data to be tested in this fold. train indicates data
    % used in training for this fold.

    classes = knnclassify(data(test,:), data(train,:), CorrectLabels(train), k);
    % Perform kNN classification on this fold's test data based on training data

    classperf(cp, classes, test);
    % Update the CP object with the classification results from this fold
end

cp.CorrectRate  % Output the average correct classification rate

Cross-validated SVM Classifier in Matlab

Nfolds = 5;  % How many folds (divisions of data) for cross-validation

indices = crossvalind('Kfold', size(data,1), Nfolds);
% Randomly divide data into folds. indices is a vector where each entry
% is an integer between 1 and Nfolds, indicating which fold each datapoint
% belongs to

cp = classperf(CorrectLabels);  % Initialize a classifier performance object

for i = 1:Nfolds
    test = (indices == i);
    train = ~test;
    % test indicates data to be tested in this fold. train indicates data
    % used in training for this fold.

    svmStruct = svmtrain(data(train,:), CorrectLabels(train));
    % Train a support vector machine on this fold's training data

    classes = svmclassify(svmStruct, data(test,:));
    % Use the trained SVM to classify this fold's test data

    classperf(cp, classes, test);
    % Update the CP object with the classification results from this fold
end

cp.CorrectRate  % Output the average correct classification rate