beyond brain blobs - francisco pereira
TRANSCRIPT
Beyond Brain Blobs: machine learning classifiers as instruments for analyzing fMRI data
Francisco Pereira, Computer Science Department and Center for the Neural Basis of Cognition
Tom Mitchell, Geoff Gordon, Machine Learning Department, Carnegie Mellon University
[fMRI data from Marcel Just and collaborators, Center for Cognitive Brain Imaging, CMU, and James Haxby, Department of Psychology, Princeton University]
a very brief introduction to fMRI
MRI: magnetic resonance imaging
A 3D grid of volume elements (voxels)
How does time come into the picture?
a very brief introduction to fMRI
Functional MRI:
neuronal activity consumes oxygen
increased blood flow brings more oxygen
the increase in oxygenated haemoglobin affects the MRI signal for a few seconds
a very brief introduction to fMRI
a few seconds of signal in each voxel
a very brief introduction to fMRI
With anatomical information, O(10K) voxels/image
Typical rate of acquisition is 1 3D image/sec
Overall duration: hundreds to thousands of images
…
fMRI analysis
A typical experiment is designed to have the subject perform:
a task of interest (e.g. read a word)
a control task (e.g. read a nonsense word)
[figure: reference time series over time, alternating between the task and control experimental conditions]
fMRI analysis
The goal is to find voxels whose time series match the reference
[figure: the reference time series compared with a voxel time series, both alternating between task and control over time]
fMRI analysis
This is done for each voxel in the brain:
yields an image with the matching score for each voxel
that image is thresholded, leaving only significant matches
statistical parametric map (SPM)
a.k.a. BRAIN BLOBS
fMRI analysis
SPM as an instrument:
identifies voxels that are more active in task than in control
tests statistical significance of what was identified
it provides a view of the data that answers “which voxels are more active in task than in control images?”
fMRI analysis
“Brain Activation During Viewing of Erotic Film Excerpts under Influence of Alcohol”
“In order to examine this issue, functional MRI was performed in a group of young, healthy, right-handed males. Subjects viewed erotic film excerpts alternating with emotionally neutral excerpts in a standard block-design paradigm.”
if you can only test for location, experimental hypotheses will be formulated in terms of location
ever finer contrasts
fMRI analysis
What else could be missing?
voxel interactions
very small/unreliable differences between conditions
making sense of many task conditions
fMRI analysis
subjects see gratings in one of 8 orientations
[figure: voxel responses across orientations]
voxels in visual cortex respond similarly to different orientations
[Kamitani & Tong, 2005]
yet, voxels can be combined to predict the orientation of the grating being seen!
what questions can we ask?
univariate: Is the activity of voxel v sensitive to an experimental condition (e.g. meaningful word vs nonsense word)?
vs
multivariate: Can a voxel set S = {v1, ..., vn} be used to predict the experimental condition?
what questions should we ask?
Can we predict?
Can we say what in the image is related to what we are trying to predict, and how?
Can we use prior knowledge or new hypotheses to make better predictions?
classifiers on fMRI
We can predict! [Mitchell et al 2004, Haynes 2006, Norman 2006]
is the subject seeing a sentence or a picture?
which of several categories of words or pictures is a subject seeing?
is the subject reading an ambiguous sentence?
will the subject answer correctly?
what is the orientation of a visual grating stimulus?
is there a face/music/tools/… in a film clip being seen?
what is the subject perceiving?
is the subject concealing information?
outline
Three case studies on dissecting classifiers:
can we predict?
can we say what in the data helps predict?
Support Vector Decomposition Machines:
can we test hypotheses / incorporate constraints?
classification concepts
Goal: for each example, learn to predict the value of its label
[figure: the data as a matrix: rows are examples, columns are voxels (features); each example has a label (tools or buildings)]
classification concepts
[figure: a Prediction Model is learnt from the training set and training labels, then applied to the test set to produce predicted labels; comparing these with the true test labels gives the accuracy (or lack thereof…)]
classification concepts
[figure: the same pipeline, with a voxel selection step applied to the training set before the model is learnt]
Voxels have information!
classification concepts
cross-validation:
[figure: the data is split into folds; each fold in turn serves as test data while the rest is training data, giving Prediction Models 1, 2, 3 and Accuracies 1, 2, 3, which are averaged into an overall accuracy]
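The cross-validation loop just described can be sketched in a few lines. The round-robin fold split, the toy nearest-mean classifier and the one-feature data below are illustrative assumptions for the sketch, not part of the original experiments:

```python
def train_nearest_mean(xs, ys):
    """Toy classifier (an illustrative stand-in for GNB/logistic regression):
    predict the class whose training-set mean is closest to the example."""
    means = {}
    for label in set(ys):
        vals = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(vals) / len(vals)
    return lambda x: min(means, key=lambda c: abs(x - means[c]))

def cross_validate(xs, ys, train_fn, n_folds=3):
    """Split the examples into folds; train a prediction model on all but
    one fold, score it on the held-out fold, and average the accuracies."""
    folds = [[] for _ in range(n_folds)]
    for i, pair in enumerate(zip(xs, ys)):
        folds[i % n_folds].append(pair)              # round-robin split
    accuracies = []
    for held_out in range(n_folds):
        train = [p for f, fold in enumerate(folds) if f != held_out for p in fold]
        test = folds[held_out]
        model = train_fn([x for x, _ in train], [y for _, y in train])
        correct = sum(1 for x, y in test if model(x) == y)
        accuracies.append(correct / len(test))
    return sum(accuracies) / n_folds                 # average accuracy

# toy 1-feature data: "tools" examples near 0, "buildings" near 1
xs = [0.0, 1.0, 0.1, 1.1, 0.2, 0.9]
ys = ["tools", "buildings"] * 3
acc = cross_validate(xs, ys, train_nearest_mean)
```

Because the round-robin split puts one example of each class in every fold, this is also a simple version of the leave-one-of-each-class-out scheme used later in the talk.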
classification concepts
Linear Discriminants:
simple, less prone to overfitting
various kinds: Gaussian Naive Bayes / Logistic Regression / Linear SVM
differ on how weights are chosen

predict “tools” if weight1 × voxel1 + weight2 × voxel2 + … + weightn × voxeln + weight0 > 0, otherwise “buildings”
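The decision rule above can be sketched directly; the weight and bias values in the usage example are made up for illustration:

```python
def linear_discriminant(weights, weight0):
    """Predict 'tools' if weight1*voxel1 + ... + weightn*voxeln + weight0 > 0,
    otherwise 'buildings'. GNB, logistic regression and linear SVM all
    produce classifiers of this form; they differ only in how the
    weights are chosen."""
    def predict(voxels):
        score = sum(w * v for w, v in zip(weights, voxels)) + weight0
        return "tools" if score > 0 else "buildings"
    return predict

# hypothetical 2-voxel classifier
clf = linear_discriminant([1.0, -0.5], -0.2)
label = clf([1.0, 0.4])   # score = 1.0 - 0.2 - 0.2 = 0.6 > 0
```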
classification concepts
Few examples (10s-100s)
Many voxels (10K-100K)
Noise:
the scanner (mostly dealt with by preprocessing)
the brain/body (large blood vessels, breathing)
the subject (motion, distraction)
the subject (habituation, different strategies)
the subject (vision, use of language, attention)
experiments
3 studies designed to:
elicit mental representations of semantic categories
try to understand how those map to brain activation
experiments
The features are voxels
Linear discriminant classifiers: Gaussian Naive Bayes / Logistic Regression / Linear SVM
Leave-one-of-each-class-out cross-validation
Best subject results shown (consistent across subjects)
2 categories experiment
Subjects read concrete nouns in 2 categories: words are either tools or buildings
task: see a word / think about it for 3 sec, 8 sec pause afterwards
e.g. “hammer”, “saw”, “palace”, “hut”
Classification task: predict the category
Example: average 3D image of middle 4 secs of a trial
42 examples of each noun category
10K-20K features
2 categories linear discriminants
It’s possible to predict category using all the voxels
GNB weights (accuracy 65%)
L2 Logistic Regression weights (accuracy 74%); correlation 0.8 between the two weight maps
2 categories voxel accuracy map
What is each voxel contributing?
GNB weights (accuracy 0.65)
[figure: map of voxelwise prediction accuracy]
[Kriegeskorte 2002]:
Examine the information inside a small region: train a classifier for each voxel together with its neighbours (a voxel “searchlight”)
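The searchlight idea can be sketched as follows. Here `accuracy_fn` stands in for actually training and cross-validating a classifier on the chosen voxels, and the coordinates, radius and region-size scoring are illustrative assumptions:

```python
def searchlight_map(coords, accuracy_fn, radius=1):
    """For each voxel, collect the voxel together with its neighbours
    (all voxels within `radius` in each dimension) and record the
    accuracy of a classifier trained on that small region."""
    scores = []
    for x, y, z in coords:
        region = [j for j, (a, b, c) in enumerate(coords)
                  if abs(a - x) <= radius and abs(b - y) <= radius
                  and abs(c - z) <= radius]
        scores.append(accuracy_fn(region))
    return scores

# stand-in scorer: region size instead of a trained classifier's accuracy
coords = [(0, 0, 0), (1, 0, 0), (3, 0, 0)]
scores = searchlight_map(coords, lambda region: len(region))
```

In a real analysis `accuracy_fn` would run the cross-validation loop from earlier on just the voxels in `region`, producing one accuracy per searchlight centre.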
2 categories voxel searchlight map
[figure: map of single-voxel prediction accuracy vs map of voxel searchlight prediction accuracy]
experiments – voxel selection
Scoring methods for voxel selection:
activation (different from zero in at least one class)
accuracy (training set cross-validation accuracy of a voxel)
searchlight accuracy (same, but accuracy of voxel + neighbours)
weight range (training set logistic reg. weights across classes)
Filter voxel selection: in each fold, rank voxels by their score according to a method and pick the top 10, top 20, top 40, etc.
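A sketch of the filter step, assuming the per-voxel scores have already been computed (by any of the four methods) on the training set of the fold; the score values below are made up:

```python
def filter_select(scores, k):
    """Rank voxels by score (higher is better) and keep the top k indices.
    The scores must come from the training set of the current fold only,
    so that test data never influences the selection."""
    ranked = sorted(range(len(scores)), key=lambda v: scores[v], reverse=True)
    return ranked[:k]

top = filter_select([0.2, 0.9, 0.5, 0.7], k=2)
```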
8 categories experiment
Stimuli are photographs of objects in 8 categories: faces, houses, cats, bottles, scissors, shoes, chairs, scrambled
block: a series of photographs of the same category, one each 2 sec
[Haxby 2001]
Classification task: predict the category
Example: average 3D image in a 24 second block
12 examples of each category
O(20K) features
8 categories experiment
Peak accuracy selecting 200 voxels with the 4 methods:

                       GNB   Log.Reg.  Fold Overlap
activation             85%   88%       0.06
accuracy               86%   90%       0.26
searchlight accuracy   84%   88%       0.20
weight range           93%   92%       0.38
all cortex voxels      35%   43%

overlap = #voxels selected on all folds / #voxels selected on any fold
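The fold-overlap measure can be computed directly from the per-fold selections, represented as sets of voxel indices (the example sets below are made up):

```python
def fold_overlap(selections):
    """overlap = #voxels selected on all folds / #voxels selected on any fold.
    `selections` is a list of sets of voxel indices, one set per fold."""
    on_all = set.intersection(*selections)   # voxels kept in every fold
    on_any = set.union(*selections)          # voxels kept in at least one fold
    return len(on_all) / len(on_any)

# three hypothetical folds agreeing on voxels 2 and 3
overlap = fold_overlap([{1, 2, 3}, {2, 3, 4}, {2, 3, 5}])
```

An overlap near 1 means the method selects essentially the same voxels in every fold; an overlap near 0 means the selections barely agree, even if each fold's accuracy is high.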
8 categories experiment
What are the selected voxels doing?
[figure: matrix of selected voxels × categories (face, house, cat, bottle, scissors, shoe, chair, scramble), showing each voxel's mean value in examples of each category]
cluster voxels
10 exemplar experiment
Subjects read concrete nouns in 2 categories: words are either tools or buildings
task: see a word / think about it for 3 sec, 8 sec pause afterwards
subjects do the same task with drawings

Classification task: predict the exemplar
Example: average 3D image of middle 4 secs of a trial
6 examples of each exemplar
10 exemplar experiment
Peak accuracy selecting 400 voxels with the 4 methods:

                       GNB   Log.Reg.  Fold Overlap
activation             70%   58%       0.09
accuracy               72%   70%       0.01
searchlight accuracy   90%   92%       0.26
weight range           72%   45%       0.03
all cortex voxels      23%   22%
Clustering of voxel activity harder to interpret
What makes searchlight accuracy better here?
10 exemplar experiment
[figure: searchlight-selected voxels for picture stimuli in subject 1 and subject 2, shown as voxel correlation maps]
classifier experiment conclusions
What do we care about?
prediction accuracy
describing what was learnt intelligibly: location, voxel behaviour reduced to a few classes, voxel groupings
reproducibility [Strother 2002]: within subject / across subjects
consistency with prior knowledge (mostly location)

Information is redundant (present in many locations / different voxels) and local (voxels do more than correlate with their neighbours).
Voxels are a pain
outline
Three case studies on dissecting classifiers:
can we predict?
can we say what in the data helps predict?
Support Vector Decomposition Machines:
can we test hypotheses / incorporate constraints?
low-dimensional spatial decompositions
[figure: an example image written as a × component1 + b × component2 + c × component3 + d × component4; the components (or eigenimages) form a basis, and (a, b, c, d) are coordinates in that basis]
low-dimensional spatial decompositions
…
=
=
x
n examples X
m voxels
= xZ W l components
l-dimensionalrepresentation of data
introduction : classification experiments : SVDM
m voxels
n examples
l components
low-dimensional spatial decompositions
Why express an image in terms of a basis of images?
the basis has orthogonal (SVD) or independent (ICA) components
the components minimize reconstruction error
basis images capture spatial patterns of activity over many voxels
low-dimensional coordinates give a succinct representation of the data
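The reconstruction-error criterion can be made concrete with plain lists. The tiny X, Z and W below are hypothetical; in a real analysis Z and W would come from SVD or ICA of the data matrix:

```python
def reconstruct(Z, W):
    """X_hat = Z x W: each example (a row of coordinates in Z) is a
    weighted sum of the basis images (the rows of W)."""
    return [[sum(z_ik * W[k][j] for k, z_ik in enumerate(row))
             for j in range(len(W[0]))] for row in Z]

def reconstruction_error(X, Z, W):
    """Squared Frobenius norm of X - Z x W: the quantity SVD minimizes
    for a given number of components l."""
    X_hat = reconstruct(Z, W)
    return sum((X[i][j] - X_hat[i][j]) ** 2
               for i in range(len(X)) for j in range(len(X[0])))

# one example, two voxels, two basis images (here the identity basis)
err = reconstruction_error([[2.0, 4.0]], [[2.0, 3.0]],
                           [[1.0, 0.0], [0.0, 1.0]])
```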
classify from new features?
low-dimensional spatial decompositions
However:
all the activation is captured; it will be split among basis images by the method's criteria
minimizes reconstruction error rather than classification error
support vector decomposition machine (SVDM)
Learning a linear SVM based on a low-dimensional representation
+
Learning an informative low-dimensional representation
SVDM as a spatial decomposition
X = Z × W, as before, but the l-dimensional representation Z now provides new features to classify from with a linear discriminant
SVDM notation
X: n examples × m features
Y: n examples × k classification problems (e.g. tools vs buildings, and word vs picture)
Learnt: X = Z × W, with Z (n examples × l components) and W (l components × m features)
Predictions: Y = sign(Z × Θ), with Θ (l components × k classification problems)
SVDM optimization problem
Find Z, W and Θ that minimize

reconstruction error + classification error:

  ||X - Z W||^2 + D Σ_i Σ_j h( Y_ij · (Z Θ)_ij )

where h(u) = max(0, 1 - u) is the hinge loss, i ranges over examples and j over classification problems (an example has multiple labels, one per problem), and D is the weight of the hinge loss term relative to the reconstruction error, subject to norm constraints on the factors.
SVDM parameters
L - # of components: try a reasonable range (e.g. 1-15)
D is the parameter that trades off the two errors:
D smaller: better reconstruction of X, more training set errors
D larger: fewer training set errors at the expense of reconstructing X
in practice, cross-validate and pick the smallest D with very few or no errors
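The trade-off D controls can be seen by evaluating the objective directly for candidate Z, W, Θ and D. The tiny matrices below are made up, labels are in {-1, +1}, and the norm constraints are omitted from this sketch:

```python
def matmul(A, B):
    """Plain-list matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def svdm_objective(X, Y, Z, W, Theta, D):
    """Reconstruction error ||X - Z W||^2 plus D times the hinge loss of
    the linear classifiers Z Theta against the labels Y in {-1, +1}."""
    X_hat = matmul(Z, W)
    recon = sum((x - xh) ** 2
                for rx, rxh in zip(X, X_hat) for x, xh in zip(rx, rxh))
    scores = matmul(Z, Theta)
    hinge = sum(max(0.0, 1.0 - y * s)
                for ry, rs in zip(Y, scores) for y, s in zip(ry, rs))
    return recon + D * hinge

# perfect reconstruction, margin 0.5 on the single label, D = 2
value = svdm_objective([[1.0, 0.0]], [[1]], [[1.0]], [[1.0, 0.0]], [[0.5]], 2.0)
```

Increasing D makes the hinge term dominate, so the optimizer trades reconstruction of X for fewer training set classification errors, exactly as described above.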
SVDM classification performance
2 category task described earlier
Beats GNB and SVM based on SVD or ICA
Learns informative components (usable by another classifier)
SVDM linear discriminant
[figure: learnt weight maps: GNB (accuracy 65%), linear SVM (74%), SVDM (79%)]
[figure: learnt weight maps: GNB (accuracy 65%), linear SVM (74%), SVDM (80%)]
SVDM work in progress
Multi-class: learn components to represent subsets of the classes
Multi-subject: Θ and Z shared across subjects, with per-subject bases: X1 = Z × W1, X2 = Z × W2
Constraints:
component smoothness / sparsity
voxel behaviour (e.g. active in few classes)
hypothesis-driven component sharing
Conclusions
We can predict
We can say what in the image is related to what we are trying to predict, and how (sometimes)

We are working on making it easy to add constraints or build prediction models that incorporate hypotheses