Multivariate Pattern Classification

Thomas Wolbers, Space and Aging Laboratory, Centre for Cognitive and Neural Systems

SPM Course Edinburgh 2010
Outline

- WHY PATTERN CLASSIFICATION?
- PROCESSING STREAM
- PREPROCESSING / FEATURE REDUCTION
- CLASSIFICATION
- EVALUATING RESULTS
- APPLICATIONS
Why pattern classification?

The GLM:  y = X β + ε   (data = design matrix × parameters + error)

GLM: separate model fitting for each voxel → mass-univariate analysis!
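The mass-univariate model can be sketched numerically: one ordinary-least-squares fit of the same design matrix to every voxel's time course independently. Everything below (design, noise level, array sizes) is an illustrative assumption, not data from the course:

```python
import numpy as np

rng = np.random.default_rng(9)
n_scans, n_voxels = 100, 6

# design matrix: two regressors of interest plus a constant term
X = np.c_[rng.standard_normal((n_scans, 2)), np.ones(n_scans)]
beta_true = rng.standard_normal((3, n_voxels))
Y = X @ beta_true + 0.1 * rng.standard_normal((n_scans, n_voxels))

# mass-univariate: each column of Y (each voxel) is fit independently
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(beta_hat, beta_true, atol=0.1))
```

Because `lstsq` treats each column of `Y` separately, this single call is exactly the "separate model fit per voxel" the slide describes.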
Why pattern classification?
Key idea behind pattern classification

- GLM analysis relies exclusively on the information contained in the time course of individual voxels
- Multivariate analyses take advantage of the information contained in activity patterns across space, from multiple voxels
- Cognitive/sensorimotor states are expressed in the brain as distributed patterns of brain activity
Why pattern classification?
Advantages of multivariate pattern classification

- increase in sensitivity: weak information in single voxels is accumulated across many voxels
- multiple regions/voxels may only carry information about brain states when jointly analyzed
- can prevent information loss due to spatial smoothing (but see Op de Beeck, 2009; Kamitani & Sawahata, 2010)
- can preserve temporal resolution instead of characterizing the average response across many trials
Binocular rivalry

Can spontaneous changes in conscious experience be decoded from fMRI signals in early visual cortex?

Haynes & Rees (2005). Current Biology
Processing stream
1. Acquire fMRI data while the subject is viewing blue and red gratings
2. Preprocess the fMRI data
3. Select relevant features (i.e. voxels)
4. Convert each fMRI volume into a vector that reflects the pattern of activity across voxels at that point in time
5. Label fMRI patterns according to whether the subject was perceiving blue vs. red (adjusting for hemodynamic lag)
6. Train a classifier to discriminate between blue patterns and red patterns
7. Apply the trained classifier to new fMRI patterns (not presented during training)
8. Cross-validation
9. Statistical inference
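The steps above can be sketched end-to-end, e.g. with scikit-learn on synthetic data; the signal strength, array sizes and the choice of a linear SVM are illustrative assumptions, not details of the Haynes & Rees study:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Steps 1-4: in a real study these pattern vectors come from preprocessed
# fMRI volumes; here we simulate 80 scans x 50 voxels of noise.
n_scans, n_voxels = 80, 50
X = rng.standard_normal((n_scans, n_voxels))

# Step 5: label each pattern (0 = "red" percept, 1 = "blue" percept) and
# plant a weak class-dependent signal in the first 10 voxels.
y = np.repeat([0, 1], n_scans // 2)
X[y == 1, :10] += 0.8

# Steps 6-8: train a linear classifier and cross-validate it.
acc = cross_val_score(LinearSVC(), X, y, cv=5).mean()
print(f"mean cross-validated accuracy: {acc:.2f}")
```

Step 9 (statistical inference on the resulting accuracy) is covered later in the slides.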
Preprocessing
1. (Slice timing +) realignment (SPM, FSL, …)
2. High-pass filtering / detrending
   - remove linear (and quadratic) trends (i.e. scanner drift)
   - remove low-frequency artifacts (i.e. biosignals)
3. Z-scoring
   - remove baseline shifts between scanning runs
   - reduce the impact of outliers
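Z-scoring within runs can be sketched with NumPy; the two-run dataset below is synthetic and the baseline shift between runs is exaggerated for illustration:

```python
import numpy as np

def zscore_per_run(data, run_labels):
    """Z-score each voxel's time course within each scanning run:
    removes baseline shifts between runs, reduces the impact of outliers."""
    out = np.empty_like(data, dtype=float)
    for run in np.unique(run_labels):
        block = data[run_labels == run]
        out[run_labels == run] = (block - block.mean(axis=0)) / block.std(axis=0)
    return out

rng = np.random.default_rng(1)
# two voxels, 20 scans, with a large baseline shift between the two runs
data = np.r_[rng.normal(0.0, 1.0, (10, 2)), rng.normal(100.0, 1.0, (10, 2))]
runs = np.repeat([0, 1], 10)
z = zscore_per_run(data, runs)
```

After this step each run has mean 0 and standard deviation 1 per voxel, so the between-run offset no longer dominates the classifier.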
Feature reduction
The problem
- fMRI data are typically sparse, high-dimensional and noisy
- classification is sensitive to the information content in all voxels
- many uninformative voxels = poor classification (i.e. due to overfitting)

[Figure: classification performance as a function of the number of features]
Solution 1: Feature selection
- select the subset with the most informative features
- original features remain unchanged
Feature selection
'External' solutions
- anatomical regions of interest
- independent functional localizer (Haynes & Rees: retinotopic mapping to identify early visual areas)
- searchlight classification: define a region of interest (i.e. a sphere) and move it across the search volume → exploratory analysis

'Internal' univariate solutions
- activation vs. baseline (t-test)
- mean difference between conditions (ANOVA)
- single voxel classification accuracy
Feature selection
Peeking #1 (ANOVA and classification only)
- testing a trained classifier needs to be performed on independent test datasets
- if the entire dataset is used for feature selection, classification estimates become overly optimistic
- → nested cross-validation!
Pereira et al. (2009)
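One way to avoid peeking, sketched here with scikit-learn, is to place the selector inside a pipeline so that the ANOVA is refit on the training folds of each split; the data are pure synthetic noise and k = 20 is an arbitrary choice:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

rng = np.random.default_rng(2)
X = rng.standard_normal((60, 500))   # few scans, many voxels, pure noise
y = np.repeat([0, 1], 30)

# Selecting voxels on ALL data and then cross-validating would leak test
# information into the selection step ("peeking"). Inside a pipeline, the
# ANOVA selector is refit on the training folds only.
clf = make_pipeline(SelectKBest(f_classif, k=20), LinearSVC())
acc = cross_val_score(clf, X, y, cv=5).mean()
print(f"accuracy on pure noise: {acc:.2f}")
```

On noise data the nested estimate stays near chance; selecting the 20 "best" voxels on the full dataset first would instead inflate it.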
Feature extraction
Solution 1: Feature selection
- select a subset from all available features
- original features remain unchanged

Solution 2: Feature extraction
- create new features as a function of existing features
- linear functions (PCA, ICA, …)
- nonlinear functions during classification (i.e. hidden units in a neural network)
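Linear feature extraction with PCA can be sketched directly from an SVD; this is a minimal illustration on synthetic data, not a full implementation:

```python
import numpy as np

def pca_extract(X, n_components):
    """New features = linear functions of the original voxels:
    scores on the leading principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

rng = np.random.default_rng(8)
X = rng.standard_normal((60, 500))   # 60 scans, 500 voxels
Z = pca_extract(X, 10)
print(Z.shape)                       # (60, 10)
```

The 500 voxel features are replaced by 10 uncorrelated components, ordered by explained variance.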
Classification
Linear classification

[Figure: training data (volumes at t1, t2, t4, …, t32) and independent test data (volume at t25) plotted in voxel 1 × voxel 2 space, separated by a hyperplane]

Our task: find a hyperplane that separates both conditions.
Classification
Linear classification

Decision function:  y = f(x) = w1·x1 + w2·x2 + … + wn·xn + b

- if y < 0, predict red; if y > 0, predict blue
- prediction = linear function of features
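The decision function can be written out directly; the weights, offset and test patterns below are hypothetical numbers for a two-voxel example:

```python
import numpy as np

def predict(x, w, b):
    """Decision function y = f(x) = w1*x1 + w2*x2 + ... + wn*xn + b."""
    y = np.dot(w, x) + b
    return "blue" if y > 0 else "red"

# hypothetical weights and offset for a two-voxel pattern
w = np.array([0.45, 0.89])
b = -1.0
print(predict(np.array([2.0, 1.5]), w, b))   # blue  (y = 1.235 > 0)
print(predict(np.array([0.1, 0.2]), w, b))   # red   (y = -0.777 < 0)
```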
Classification
Linear classification
- project the data on a new axis that maximizes the class separability
- the hyperplane is orthogonal to the best projection axis
Classification
Simplest approach: Fisher Linear Discriminant (FLD)

FLD classifies by projecting the training set on the axis that is defined by the difference between the centres of mass of both classes, corrected by the within-class scatter.

Separation is maximised for:  w = (m1 − m2) / (cov_class1 + cov_class2)
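A sketch of the FLD weights on synthetic two-voxel data, reading the slide's formula in matrix form (dividing by the summed class covariances becomes solving against the within-class scatter):

```python
import numpy as np

def fld_weights(X1, X2):
    """Project on the axis through the class means, corrected by the
    within-class scatter (matrix form of w = (m1 - m2)/(cov1 + cov2))."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    Sw = np.cov(X1, rowvar=False) + np.cov(X2, rowvar=False)
    return np.linalg.solve(Sw, m1 - m2)

rng = np.random.default_rng(3)
X1 = rng.standard_normal((40, 2)) + [1.5, 0.0]   # class 1
X2 = rng.standard_normal((40, 2)) + [-1.5, 0.0]  # class 2
w = fld_weights(X1, X2)

# classify by projecting onto w and thresholding at the midpoint
threshold = w @ (X1.mean(axis=0) + X2.mean(axis=0)) / 2
acc = np.mean(np.r_[X1 @ w > threshold, X2 @ w <= threshold])
print(f"training accuracy: {acc:.2f}")
```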
Classification
Linear classification

[Figure: weight vector w, orthogonal to the hyperplane, in voxel 1 × voxel 2 space]

y = w·x + b: the hyperplane is defined by the weight vector w and the offset b
Classification
How to interpret the weight vector?

Weight vector (discriminating volume): w = [0.45 0.89]

The value of each voxel in the weight vector indicates its importance in discriminating between the two classes (i.e. cognitive states).
Classification
Support Vector Machine (SVM)
Which of the linear separators is the optimal one?
Classification
Support Vector Machine (SVM)
SVM = maximum margin classifier
[Figure: maximum margin and support vectors in voxel 1 × voxel 2 space]

If classes have overlapping distributions, SVMs are modified to account for misclassification errors by introducing additional slack variables.
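A soft-margin SVM sketch with scikit-learn on synthetic overlapping classes; the penalty parameter C sets the cost of the slack variables (C = 1.0 is an arbitrary choice here):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = np.r_[rng.standard_normal((40, 2)) + 1.0,    # class "blue"
          rng.standard_normal((40, 2)) - 1.0]    # class "red", overlapping
y = np.repeat([1, 0], 40)

# C controls the slack penalty: a small C tolerates more misclassified
# training points in exchange for a wider margin.
svm = SVC(kernel="linear", C=1.0).fit(X, y)
print("number of support vectors:", svm.n_support_.sum())
print("training accuracy:", svm.score(X, y))
```

Only the support vectors (the points on or inside the margin) determine the hyperplane; the remaining training points could be removed without changing it.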
Classification
Linear classifiers
- Fisher Linear Discriminant
- Support Vector Machine (SVM)
- Logistic Regression
- Gaussian Naive Bayes
- …

Nonlinear classifiers
- SVM with kernel
- Neural Networks
- …

How to choose the right classifier?
Classification
Situation 1: scans ↓, features ↑ (i.e. whole brain data)
- FLD unsuitable: depends on reliable estimation of the covariance matrix
- GNB inferior to SVM and LR: the latter come with regularisation terms that help weigh down the effects of noisy and highly correlated features
Cox & Savoy (2003). NeuroImage
Classification
Situation 2: scans ↓, features ↓ (i.e. feature selection or feature extraction)
- GNB, SVM and LR: often similar performance
- SVM originally designed for two-class problems only
- SVM for multiclass problems: multiple binary comparisons, voting scheme to identify classes
- accuracy of SVM increases faster than GNB when the number of scans increases
- see Mitchell et al. (2005) for further comparisons between different classifiers
Classification
Peeking #2
- classifier performance = unbiased estimate of classification accuracy: how well would the classifier label a new example randomly drawn from the same distribution?
- testing a trained classifier needs to be performed on a dataset the classifier has never seen before
- if the entire dataset is used for training a classifier, classification estimates become overly optimistic

Solution: leave-one-out cross-validation
Classification
Cross-validation
- standard approach: leave-one-out cross-validation
- split the dataset into n folds (i.e. runs)
- train the classifier on folds 1 to n−1
- test the trained classifier on fold n
- rerun training/testing while withholding a different fold
- repeat the procedure until each fold has been withheld once
- classification accuracy is usually computed as the mean accuracy across folds

[Figure: training set vs. test set assignment across folds]
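The fold loop can be sketched generically; the nearest-class-mean rule below is only a toy stand-in for the classifiers discussed above, and all data are synthetic:

```python
import numpy as np

def leave_one_run_out(X, y, runs, fit, predict):
    """Withhold each run once: train on the remaining runs, test on it."""
    accs = []
    for run in np.unique(runs):
        train, test = runs != run, runs == run
        model = fit(X[train], y[train])
        accs.append(np.mean(predict(model, X[test]) == y[test]))
    return np.mean(accs)            # mean accuracy across folds

# toy stand-in classifier: nearest class mean
def fit(X, y):
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(model, X):
    classes = np.array(list(model))
    dists = np.stack([np.linalg.norm(X - m, axis=1) for m in model.values()])
    return classes[dists.argmin(axis=0)]

rng = np.random.default_rng(5)
runs = np.repeat(np.arange(4), 20)         # 4 runs x 20 scans
y = np.tile(np.repeat([0, 1], 10), 4)
X = rng.standard_normal((80, 30))
X[y == 1, :5] += 1.0                       # weak signal in 5 voxels
acc = leave_one_run_out(X, y, runs, fit, predict)
print(f"mean accuracy: {acc:.2f}")
```

Splitting by run (rather than by arbitrary scans) keeps the temporally correlated scans of one run together, so the test fold stays genuinely independent.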
Evaluating results
Independent test data
- classification accuracy = unbiased estimate of the true accuracy of the classifier
- question: what is the probability of obtaining 57% accuracy under the null hypothesis (no information about the variable of interest in my data)?
- binary classification: the p-value can be calculated under a binomial distribution with N trials (i.e. 100) and P probability of success (i.e. 0.5)
- Matlab: p = 1 − binocdf(X, N, P) = 0.067 (hmm…), where X = number of correctly labeled examples (i.e. 57)

Can I publish my data with 57% classification accuracy in Science or Nature?
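A stdlib Python equivalent of the Matlab expression above (1 − binocdf computes the upper binomial tail):

```python
from math import comb

def binomial_p(x, n, p=0.5):
    """P(more than x successes) under Binomial(n, p): the upper tail,
    i.e. Matlab's 1 - binocdf(x, n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1, n + 1))

# 57 of 100 test patterns correctly labeled, chance level 0.5
print(f"p = {binomial_p(57, 100):.3f}")    # p = 0.067
```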
Evaluating results
Nonparametric approaches

Permutation tests (i.e. Polyn et al., 2005):
- create a null distribution of performance values by repeatedly generating scrambled versions of the classifier output
- MVPA: wavelet-based scrambling technique (Bullmore et al., 2004)
- can accommodate non-independent data

Bootstrapping
- estimate the variance and distribution of a statistic (i.e. voxel weights)
- multiple iterations of data resampling by drawing with replacement from the dataset

Multiclass problems: accuracy can be painful
- average rank of the correct label
- average of all pairwise comparisons
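A minimal permutation-test sketch: the observed accuracy is compared against accuracies of the same predictions under shuffled labels. The 57%-correct "classifier output" is simulated, and plain shuffling assumes independent examples (fMRI time series would need e.g. the wavelet-based scrambling mentioned above):

```python
import numpy as np

rng = np.random.default_rng(6)
y_true = np.repeat([0, 1], 50)
y_pred = y_true.copy()
flip = rng.choice(100, size=43, replace=False)   # classifier gets 57% correct
y_pred[flip] = 1 - y_pred[flip]
observed = np.mean(y_pred == y_true)             # 0.57

# Null distribution: score the same predictions against repeatedly
# shuffled labels.
null = np.array([np.mean(y_pred == rng.permutation(y_true))
                 for _ in range(2000)])
p = (np.sum(null >= observed) + 1) / (len(null) + 1)
print(f"observed accuracy = {observed:.2f}, permutation p = {p:.3f}")
```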
Getting results
Design considerations
- acquire as many training examples as possible → the classifier needs to be able to "see through the noise"
- averaging consecutive TRs can help to reduce the impact of noise (but may also eliminate natural, informative variation)
- alternative to averaging: use beta weights from a GLM analysis (i.e. based on FIR or HRF) → requires many runs/trials
- avoid using consecutive scans for training a classifier → lots of highly similar datapoints do not give new information
- acquire as many test examples as possible → increases the power of the significance test
- balance conditions → if not, the classifier may tend to focus on the predominant condition
Applications
Pattern discrimination
Question 1: do the selected fMRI data contain information about a variable of interest (i.e. the conscious percept in Haynes & Rees)?

Pattern localization
Question 2: where in the brain is information about the variable of interest represented?
- the weight vector contains information on the importance of each voxel for differentiating between classes
Applications
Pattern localization - Space
Polyn et al. (2005), Science.
Applications
Pattern localization - Space

Searchlight analysis: classification/cross-validation is performed on a voxel and its (spherical) neighbourhood
- the classification accuracy is assigned to the centre voxel
- the searchlight is moved across the entire dataset to obtain accuracy estimates for each voxel
- can be used for feature selection or to generate a brain map of p-values

Hassabis et al. (2009), Current Biology.
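A deliberately simplified searchlight sketch over a 1-D voxel array with a split-half test; real implementations (e.g. Kriegeskorte et al., 2006) use spherical 3-D neighbourhoods and full cross-validation, and all numbers here are synthetic:

```python
import numpy as np

def searchlight_1d(X, y, radius=2):
    """Toy searchlight: a nearest-class-mean rule is trained on the first
    half of the scans and tested on the second, and the accuracy is
    assigned to the centre voxel of each window."""
    n_scans, n_vox = X.shape
    half = n_scans // 2
    acc = np.zeros(n_vox)
    for v in range(n_vox):
        sl = X[:, max(0, v - radius): v + radius + 1]   # neighbourhood
        m0 = sl[:half][y[:half] == 0].mean(axis=0)
        m1 = sl[:half][y[:half] == 1].mean(axis=0)
        d0 = np.linalg.norm(sl[half:] - m0, axis=1)
        d1 = np.linalg.norm(sl[half:] - m1, axis=1)
        acc[v] = np.mean((d1 < d0) == (y[half:] == 1))
    return acc

rng = np.random.default_rng(7)
y = np.tile([0, 1], 50)                  # interleaved classes, 100 scans
X = rng.standard_normal((100, 40))
X[y == 1, 18:22] += 1.2                  # informative cluster near voxel 20
acc = searchlight_1d(X, y)
print("most informative centre voxel:", int(acc.argmax()))
```

The accuracy map peaks at the centre of the informative cluster, which is exactly how the searchlight turns classification into a localization tool.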
Applications
Pattern localization - Time

Question 3: when does the brain represent information about different classes?

Motor intention:
Soon et al. (2008), Nature Neuroscience.
Applications
Pattern characterization
Question 4: how are stimulus classes represented in the brain?
- goal: characterizing the relationship between stimulus classes and BOLD patterns
- Kay et al. (2008): training of a receptive field model for each voxel in V1, V2 and V3, based on location, spatial frequency and orientation (1750 natural images)
- subsequent classification of completely new stimuli (120 natural images)
Useful literature

Haynes JD, Rees G (2006) Decoding mental states from brain activity in humans. Nat Rev Neurosci 7:523-534.
Formisano E, De Martino F, Valente G (2008) Multivariate analysis of fMRI time series: classification and regression of brain responses using machine learning. Magn Reson Imaging 26(7):921-34.
Kriegeskorte N, Goebel R, Bandettini P (2006) Information-based functional brain mapping. Proc Natl Acad Sci U S A 103:3863-3868.
Mitchell TM, et al. (2004) Learning to Decode Cognitive States from Brain Images. Machine Learning 57:145-175.
Norman KA, Polyn SM, Detre GJ, Haxby JV (2006) Beyond mind-reading: multi-voxel pattern analysis of fMRI data. Trends Cogn Sci 10:424-430.
O'Toole et al. (2007) Theoretical, statistical, and practical perspectives on pattern-based classification approaches to the analysis of functional neuroimaging data. J Cogn Neurosci 19(11):1735-52.
Pereira F, Mitchell TM, Botvinick M (2009) Machine Learning Classifiers and fMRI: a tutorial overview. Neuroimage 45(1 Suppl):S199-209.