beyond brain blobs - francisco pereira
TRANSCRIPT
Beyond Brain Blobs: machine learning classifiers as instruments for analyzing fMRI data
Francisco Pereira, Computer Science Department and Center for the Neural Basis of Cognition
Tom Mitchell, Geoff Gordon, Machine Learning Department, Carnegie Mellon University
[fMRI data from Marcel Just and collaborators, Center for Cognitive Brain Imaging, CMU, and James Haxby, Department of Psychology, Princeton University]
a very brief introduction to fMRI
MRI: magnetic resonance imaging
A 3D grid of volume elements (voxels)
How does time come into the picture?
a very brief introduction to fMRI
Functional MRI:
neuronal activity consumes oxygen
increased blood flow brings more oxygen
the increase in oxygenated haemoglobin affects the MRI signal for a few seconds
a very brief introduction to fMRI
a few seconds of signal in each voxel
a very brief introduction to fMRI
With anatomical information, O(10K) voxels/image
Typical rate of acquisition is 1 3D image/sec
Overall duration: hundreds to thousands of images
…
fMRI analysis
A typical experiment is designed to have the subject perform:
a task of interest (e.g. read a word)
a control task (e.g. read a nonsense word)
[figure: reference time series over time, alternating between the task and control experimental conditions]
fMRI analysis
The goal is to find voxels whose time series match the reference
[figure: the reference time series compared with a voxel time series, both alternating between task and control over time]
fMRI analysis
This is done for each voxel in the brain:
yields an image with the matching score for each voxel
that image is thresholded, leaving only significant matches
statistical parametric map (SPM)
a.k.a. BRAIN BLOBS
fMRI analysis
SPM as an instrument:
identifies voxels that are more active in task than in control
tests statistical significance of what was identified
it provides a view of the data that answers “which voxels are more active in task than in control images?”
fMRI analysis
“Brain Activation During Viewing of Erotic Film Excerpts under Influence of Alcohol”
“In order to examine this issue, functional MRI was performed in a group of young, healthy, right-handed males. Subjects viewed erotic film excerpts alternating with emotionally neutral excerpts in a standard block-design paradigm.”
if you can only test for location, experimental hypotheses will be formulated in terms of location
ever finer contrasts
fMRI analysis
What else could be missing?
voxel interactions
very small/unreliable differences between conditions
making sense of many task conditions
fMRI analysis
subjects see gratings in one of 8 orientations
[figure: voxel responses across orientations]
voxels in visual cortex respond similarly to different orientations
[Kamitani & Tong, 2005]
yet, voxels can be combined to predict the orientation of the grating being seen!
what questions can we ask?
univariate: Is the activity of voxel v sensitive to an experimental condition (e.g. meaningful word vs nonsense word)?
vs
multivariate: Can a voxel set S = {v1, ..., vn} be used to predict the experimental condition?
what questions should we ask?
Can we predict?
Can we say what in the image is related to what we are trying to predict, and how?
Can we use prior knowledge or new hypotheses to make better predictions?
classifiers on fMRI
We can predict! [Mitchell et al 2004, Haynes 2006, Norman 2006]
is the subject seeing a sentence or a picture?
which of several categories of words or pictures is a subject seeing?
is the subject reading an ambiguous sentence?
will the subject answer correctly?
what is the orientation of a visual grating stimulus?
is there a face/music/tools/… in a film clip being seen?
what is the subject perceiving?
is the subject concealing information?
outline
Three case studies on dissecting classifiers:
can we predict?
can we say what in the data helps predict?
Support Vector Decomposition Machines:
can we test hypotheses / incorporate constraints?
classification concepts
Goal: for each example, learn to predict the value of its label
[figure: the data as a matrix: rows are examples, columns are voxels (features); each example has a label (tools or buildings)]
classification concepts
[figure: a Prediction Model is learnt from the training set and training labels, then applied to the test set to produce predicted labels; comparing these with the true test labels gives the accuracy (or lack thereof…)]
classification concepts
[figure: the same pipeline, with a voxel selection step applied to the training set before the model is learnt]
Voxels have information!
classification concepts
cross-validation:
[figure: the data is split into folds; each fold in turn serves as test data while the rest is training data, giving Prediction Models 1, 2, 3 and Accuracies 1, 2, 3, which are averaged into an overall accuracy]
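The cross-validation loop just described can be sketched in a few lines. The round-robin fold split, the toy nearest-mean classifier and the one-feature data below are illustrative assumptions for the sketch, not part of the original experiments:

```python
def train_nearest_mean(xs, ys):
    """Toy classifier (an illustrative stand-in for GNB/logistic regression):
    predict the class whose training-set mean is closest to the example."""
    means = {}
    for label in set(ys):
        vals = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(vals) / len(vals)
    return lambda x: min(means, key=lambda c: abs(x - means[c]))

def cross_validate(xs, ys, train_fn, n_folds=3):
    """Split the examples into folds; train a prediction model on all but
    one fold, score it on the held-out fold, and average the accuracies."""
    folds = [[] for _ in range(n_folds)]
    for i, pair in enumerate(zip(xs, ys)):
        folds[i % n_folds].append(pair)              # round-robin split
    accuracies = []
    for held_out in range(n_folds):
        train = [p for f, fold in enumerate(folds) if f != held_out for p in fold]
        test = folds[held_out]
        model = train_fn([x for x, _ in train], [y for _, y in train])
        correct = sum(1 for x, y in test if model(x) == y)
        accuracies.append(correct / len(test))
    return sum(accuracies) / n_folds                 # average accuracy

# toy 1-feature data: "tools" examples near 0, "buildings" near 1
xs = [0.0, 1.0, 0.1, 1.1, 0.2, 0.9]
ys = ["tools", "buildings"] * 3
acc = cross_validate(xs, ys, train_nearest_mean)
```

Because the round-robin split puts one example of each class in every fold, this is also a simple version of the leave-one-of-each-class-out scheme used later in the talk.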
classification concepts
Linear Discriminants:
simple, less prone to overfitting
various kinds: Gaussian Naive Bayes / Logistic Regression / Linear SVM
differ on how weights are chosen

predict “tools” if weight1 × voxel1 + weight2 × voxel2 + … + weightn × voxeln + weight0 > 0, otherwise “buildings”
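The decision rule above can be sketched directly; the weight and bias values in the usage example are made up for illustration:

```python
def linear_discriminant(weights, weight0):
    """Predict 'tools' if weight1*voxel1 + ... + weightn*voxeln + weight0 > 0,
    otherwise 'buildings'. GNB, logistic regression and linear SVM all
    produce classifiers of this form; they differ only in how the
    weights are chosen."""
    def predict(voxels):
        score = sum(w * v for w, v in zip(weights, voxels)) + weight0
        return "tools" if score > 0 else "buildings"
    return predict

# hypothetical 2-voxel classifier
clf = linear_discriminant([1.0, -0.5], -0.2)
label = clf([1.0, 0.4])   # score = 1.0 - 0.2 - 0.2 = 0.6 > 0
```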
classification concepts
Few examples (10s-100s)
Many voxels (10K-100K)
Noise:
the scanner (mostly dealt with by preprocessing)
the brain/body (large blood vessels, breathing)
the subject (motion, distraction)
the subject (habituation, different strategies)
the subject (vision, use of language, attention)
experiments
3 studies designed to:
elicit mental representations of semantic categories
try to understand how those map to brain activation
experiments
The features are voxels
Linear discriminant classifiers: Gaussian Naive Bayes / Logistic Regression / Linear SVM
Leave-one-of-each-class-out cross-validation
Best subject results shown (consistent across subjects)
2 categories experiment
Subjects read concrete nouns in 2 categories: words are either tools or buildings
task: see a word / think about it for 3 sec, 8 sec pause afterwards
e.g. “hammer”, “saw”, “palace”, “hut”
Classification task: predict the category
Example: average 3D image of middle 4 secs of a trial
42 examples of each noun category
10K-20K features
2 categories linear discriminants
It’s possible to predict category using all the voxels
GNB weights (accuracy 65%)
L2 Logistic Regression weights (accuracy 74%); correlation 0.8 between the two weight maps
2 categories voxel accuracy map
What is each voxel contributing?
GNB weights (accuracy 0.65)
[figure: map of voxelwise prediction accuracy]
[Kriegeskorte 2002]:
Examine the information inside a small region: train a classifier for each voxel together with its neighbours (a voxel “searchlight”)
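The searchlight idea can be sketched as follows. Here `accuracy_fn` stands in for actually training and cross-validating a classifier on the chosen voxels, and the coordinates, radius and region-size scoring are illustrative assumptions:

```python
def searchlight_map(coords, accuracy_fn, radius=1):
    """For each voxel, collect the voxel together with its neighbours
    (all voxels within `radius` in each dimension) and record the
    accuracy of a classifier trained on that small region."""
    scores = []
    for x, y, z in coords:
        region = [j for j, (a, b, c) in enumerate(coords)
                  if abs(a - x) <= radius and abs(b - y) <= radius
                  and abs(c - z) <= radius]
        scores.append(accuracy_fn(region))
    return scores

# stand-in scorer: region size instead of a trained classifier's accuracy
coords = [(0, 0, 0), (1, 0, 0), (3, 0, 0)]
scores = searchlight_map(coords, lambda region: len(region))
```

In a real analysis `accuracy_fn` would run the cross-validation loop from earlier on just the voxels in `region`, producing one accuracy per searchlight centre.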
2 categories voxel searchlight map
[figure: map of single-voxel prediction accuracy vs map of voxel searchlight prediction accuracy]
experiments – voxel selection
Scoring methods for voxel selection:
activation (different from zero in at least one class)
accuracy (training set cross-validation accuracy of a voxel)
searchlight accuracy (same, but accuracy of voxel + neighbours)
weight range (training set logistic reg. weights across classes)
Filter voxel selection: in each fold, rank voxels by their score according to a method and pick the top 10, top 20, top 40, etc.
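A sketch of the filter step, assuming the per-voxel scores have already been computed (by any of the four methods) on the training set of the fold; the score values below are made up:

```python
def filter_select(scores, k):
    """Rank voxels by score (higher is better) and keep the top k indices.
    The scores must come from the training set of the current fold only,
    so that test data never influences the selection."""
    ranked = sorted(range(len(scores)), key=lambda v: scores[v], reverse=True)
    return ranked[:k]

top = filter_select([0.2, 0.9, 0.5, 0.7], k=2)
```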
8 categories experiment
Stimuli are photographs of objects in 8 categories: faces, houses, cats, bottles, scissors, shoes, chairs, scrambled
block: a series of photographs of the same category, one each 2 sec
[Haxby 2001]
Classification task: predict the category
Example: average 3D image in a 24 second block
12 examples of each category
O(20K) features
8 categories experiment
Peak accuracy selecting 200 voxels with the 4 methods:

                       GNB   Log.Reg.  Fold Overlap
activation             85%   88%       0.06
accuracy               86%   90%       0.26
searchlight accuracy   84%   88%       0.20
weight range           93%   92%       0.38
all cortex voxels      35%   43%

overlap = #voxels selected on all folds / #voxels selected on any fold
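The fold-overlap measure can be computed directly from the per-fold selections, represented as sets of voxel indices (the example sets below are made up):

```python
def fold_overlap(selections):
    """overlap = #voxels selected on all folds / #voxels selected on any fold.
    `selections` is a list of sets of voxel indices, one set per fold."""
    on_all = set.intersection(*selections)   # voxels kept in every fold
    on_any = set.union(*selections)          # voxels kept in at least one fold
    return len(on_all) / len(on_any)

# three hypothetical folds agreeing on voxels 2 and 3
overlap = fold_overlap([{1, 2, 3}, {2, 3, 4}, {2, 3, 5}])
```

An overlap near 1 means the method selects essentially the same voxels in every fold; an overlap near 0 means the selections barely agree, even if each fold's accuracy is high.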
8 categories experiment
What are the selected voxels doing?
[figure: matrix of selected voxels × categories (face, house, cat, bottle, scissors, shoe, chair, scramble), showing each voxel's mean value in examples of each category]
cluster voxels
10 exemplar experiment
Subjects read concrete nouns in 2 categories: words are either tools or buildings
task: see a word / think about it for 3 sec, 8 sec pause afterwards
subjects do the same task with drawings

Classification task: predict the exemplar
Example: average 3D image of middle 4 secs of a trial
6 examples of each exemplar
10 exemplar experiment
Peak accuracy selecting 400 voxels with the 4 methods:

                       GNB   Log.Reg.  Fold Overlap
activation             70%   58%       0.09
accuracy               72%   70%       0.01
searchlight accuracy   90%   92%       0.26
weight range           72%   45%       0.03
all cortex voxels      23%   22%
Clustering of voxel activity harder to interpret
What makes searchlight accuracy better here?
10 exemplar experiment
[figure: searchlight-selected voxels for picture stimuli in subject 1 and subject 2, shown as voxel correlation maps]
classifier experiment conclusions
What do we care about?
prediction accuracy
describing what was learnt intelligibly: location, voxel behaviour reduced to a few classes, voxel groupings
reproducibility [Strother 2002]: within subject / across subjects
consistency with prior knowledge (mostly location)

Information is redundant (present in many locations / different voxels) and local (voxels do more than correlate with their neighbours).
Voxels are a pain
outline
Three case studies on dissecting classifiers:
can we predict?
can we say what in the data helps predict?
Support Vector Decomposition Machines:
can we test hypotheses / incorporate constraints?
low-dimensional spatial decompositions
[figure: an example image written as a × component1 + b × component2 + c × component3 + d × component4; the components (or eigenimages) form a basis, and (a, b, c, d) are coordinates in that basis]
low-dimensional spatial decompositions
…
=
=
x
n examples X
m voxels
= xZ W l components
l-dimensionalrepresentation of data
introduction : classification experiments : SVDM
m voxels
n examples
l components
low-dimensional spatial decompositions
Why express an image in terms of a basis of images?
the basis has orthogonal (SVD) or independent (ICA) components
the components minimize reconstruction error
basis images capture spatial patterns of activity over many voxels
low-dimensional coordinates give a succinct representation of the data
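The reconstruction-error criterion can be made concrete with plain lists. The tiny X, Z and W below are hypothetical; in a real analysis Z and W would come from SVD or ICA of the data matrix:

```python
def reconstruct(Z, W):
    """X_hat = Z x W: each example (a row of coordinates in Z) is a
    weighted sum of the basis images (the rows of W)."""
    return [[sum(z_ik * W[k][j] for k, z_ik in enumerate(row))
             for j in range(len(W[0]))] for row in Z]

def reconstruction_error(X, Z, W):
    """Squared Frobenius norm of X - Z x W: the quantity SVD minimizes
    for a given number of components l."""
    X_hat = reconstruct(Z, W)
    return sum((X[i][j] - X_hat[i][j]) ** 2
               for i in range(len(X)) for j in range(len(X[0])))

# one example, two voxels, two basis images (here the identity basis)
err = reconstruction_error([[2.0, 4.0]], [[2.0, 3.0]],
                           [[1.0, 0.0], [0.0, 1.0]])
```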
classify from new features?
low-dimensional spatial decompositions
However:
all the activation is captured; it will be split among basis images by the method's criteria
minimizes reconstruction error rather than classification error
support vector decomposition machine (SVDM)
Learning a linear SVM based on a low-dimensional representation
+
Learning an informative low-dimensional representation
SVDM as a spatial decomposition
X = Z × W, as before, but the l-dimensional representation Z now provides new features to classify from with a linear discriminant
SVDM notation
X: n examples × m features
Y: n examples × k classification problems (e.g. tools vs buildings, and word vs picture)
Learnt: X = Z × W, with Z (n examples × l components) and W (l components × m features)
Predictions: Y = sign(Z × Θ), with Θ (l components × k classification problems)
SVDM optimization problem
Find Z, W and Θ that minimize

reconstruction error + classification error:

  ||X - Z W||^2 + D Σ_i Σ_j h( Y_ij · (Z Θ)_ij )

where h(u) = max(0, 1 - u) is the hinge loss, i ranges over examples and j over classification problems (an example has multiple labels, one per problem), and D is the weight of the hinge loss term relative to the reconstruction error, subject to norm constraints on the factors.
SVDM parameters
L - # of components: try a reasonable range (e.g. 1-15)
D is the parameter that trades off the two errors:
D smaller: better reconstruction of X, more training set errors
D larger: fewer training set errors at the expense of reconstructing X
in practice, cross-validate and pick the smallest D with very few or no errors
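The trade-off D controls can be seen by evaluating the objective directly for candidate Z, W, Θ and D. The tiny matrices below are made up, labels are in {-1, +1}, and the norm constraints are omitted from this sketch:

```python
def matmul(A, B):
    """Plain-list matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def svdm_objective(X, Y, Z, W, Theta, D):
    """Reconstruction error ||X - Z W||^2 plus D times the hinge loss of
    the linear classifiers Z Theta against the labels Y in {-1, +1}."""
    X_hat = matmul(Z, W)
    recon = sum((x - xh) ** 2
                for rx, rxh in zip(X, X_hat) for x, xh in zip(rx, rxh))
    scores = matmul(Z, Theta)
    hinge = sum(max(0.0, 1.0 - y * s)
                for ry, rs in zip(Y, scores) for y, s in zip(ry, rs))
    return recon + D * hinge

# perfect reconstruction, margin 0.5 on the single label, D = 2
value = svdm_objective([[1.0, 0.0]], [[1]], [[1.0]], [[1.0, 0.0]], [[0.5]], 2.0)
```

Increasing D makes the hinge term dominate, so the optimizer trades reconstruction of X for fewer training set classification errors, exactly as described above.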
SVDM classification performance
2 category task described earlier
Beats GNB and SVM based on SVD or ICA
Learns informative components (usable by another classifier)
SVDM linear discriminant
[figure: learnt weight maps: GNB (accuracy 65%), linear SVM (74%), SVDM (79%)]
[figure: learnt weight maps: GNB (accuracy 65%), linear SVM (74%), SVDM (80%)]
SVDM work in progress
Multi-class: learn components to represent subsets of the classes
Multi-subject: Θ and Z shared across subjects, with per-subject bases: X1 = Z × W1, X2 = Z × W2
Constraints:
component smoothness / sparsity
voxel behaviour (e.g. active in few classes)
hypothesis-driven component sharing
Conclusions
We can predict
We can say what in the image is related to what we are trying to predict, and how (sometimes)

We are working on making it easy to add constraints or build prediction models that incorporate hypotheses