![Page 2: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/2.jpg)
What can you say about the figure?
[Figure: scatter plot, axes signal T and signal C]
• ≈1500 subjects
• Two measurements per subject
![Page 3: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/3.jpg)
[Figure: scatter plot, axes signal T and signal C]
![Page 4: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/4.jpg)
[Figure: scatter plot, axes signal T and signal C, with groups labeled CC, CT, and TT]
![Page 5: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/5.jpg)
Cluster Analysis
• Seeks rules to group data
– Large between-cluster difference
– Small within-cluster difference
• Exploratory
• Aims to understand/learn the unknown substructure of multivariate data
![Page 6: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/6.jpg)
Cluster Analysis vs Classification
Cluster analysis:
• Data are unlabeled
• The number of clusters is unknown
• “Unsupervised” learning
• Goal: find unknown structures
Classification:
• The labels for training data are known
• The number of classes is known
• “Supervised” learning
• Goal: allocate new observations, whose labels are unknown, to one of the known classes
![Page 7: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/7.jpg)
The Iris Data
• Introduced to statistics by R. A. Fisher (1936), using measurements collected by Edgar Anderson
• A famous data set that has been widely used in textbooks
• Four features:
– sepal length in cm
– sepal width in cm
– petal length in cm
– petal width in cm
![Page 8: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/8.jpg)
The Iris Data
• Three types:
– Setosa
– Versicolor
– Virginica
![Page 9: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/9.jpg)
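The Iris Data

Iris Setosa

| | Sepal L. | Sepal W. | Petal L. | Petal W. |
|---|---|---|---|---|
| [1,] | 5.1 | 3.5 | 1.4 | 0.2 |
| [2,] | 4.9 | 3.0 | 1.4 | 0.2 |
| [3,] | 4.7 | 3.2 | 1.3 | 0.2 |
| [4,] | 4.6 | 3.1 | 1.5 | 0.2 |
| [5,] | 5.0 | 3.6 | 1.4 | 0.2 |
| [6,] | 5.4 | 3.9 | 1.7 | 0.4 |
| [7,] | 4.6 | 3.4 | 1.4 | 0.3 |
| [8,] | 5.0 | 3.4 | 1.5 | 0.2 |
| [9,] | 4.4 | 2.9 | 1.4 | 0.2 |
| … | … | … | … | … |
| [45,] | 5.1 | 3.8 | 1.9 | 0.4 |
| [46,] | 4.8 | 3.0 | 1.4 | 0.3 |
| [47,] | 5.1 | 3.8 | 1.6 | 0.2 |
| [48,] | 4.6 | 3.2 | 1.4 | 0.2 |
| [49,] | 5.3 | 3.7 | 1.5 | 0.2 |
| [50,] | 5.0 | 3.3 | 1.4 | 0.2 |

Iris Virginica

| | Sepal L. | Sepal W. | Petal L. | Petal W. |
|---|---|---|---|---|
| [1,] | 6.3 | 3.3 | 6.0 | 2.5 |
| [2,] | 5.8 | 2.7 | 5.1 | 1.9 |
| [3,] | 7.1 | 3.0 | 5.9 | 2.1 |
| [4,] | 6.3 | 2.9 | 5.6 | 1.8 |
| [5,] | 6.5 | 3.0 | 5.8 | 2.2 |
| [6,] | 7.6 | 3.0 | 6.6 | 2.1 |
| [7,] | 4.9 | 2.5 | 4.5 | 1.7 |
| [8,] | 7.3 | 2.9 | 6.3 | 1.8 |
| [9,] | 6.7 | 2.5 | 5.8 | 1.8 |
| … | … | … | … | … |
| [45,] | 6.7 | 3.3 | 5.7 | 2.5 |
| [46,] | 6.7 | 3.0 | 5.2 | 2.3 |
| [47,] | 6.3 | 2.5 | 5.0 | 1.9 |
| [48,] | 6.5 | 3.0 | 5.2 | 2.0 |
| [49,] | 6.2 | 3.4 | 5.4 | 2.3 |
| [50,] | 5.9 | 3.0 | 5.1 | 1.8 |

Iris Versicolor

| | Sepal L. | Sepal W. | Petal L. | Petal W. |
|---|---|---|---|---|
| [1,] | 7.0 | 3.2 | 4.7 | 1.4 |
| [2,] | 6.4 | 3.2 | 4.5 | 1.5 |
| [3,] | 6.9 | 3.1 | 4.9 | 1.5 |
| [4,] | 5.5 | 2.3 | 4.0 | 1.3 |
| [5,] | 6.5 | 2.8 | 4.6 | 1.5 |
| [6,] | 5.7 | 2.8 | 4.5 | 1.3 |
| [7,] | 6.3 | 3.3 | 4.7 | 1.6 |
| [8,] | 4.9 | 2.4 | 3.3 | 1.0 |
| [9,] | 6.6 | 2.9 | 4.6 | 1.3 |
| … | … | … | … | … |
| [45,] | 5.6 | 2.7 | 4.2 | 1.3 |
| [46,] | 5.7 | 3.0 | 4.2 | 1.2 |
| [47,] | 5.7 | 2.9 | 4.2 | 1.3 |
| [48,] | 6.2 | 2.9 | 4.3 | 1.3 |
| [49,] | 5.1 | 2.5 | 3.0 | 1.1 |
| [50,] | 5.7 | 2.8 | 4.1 | 1.3 |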
![Page 10: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/10.jpg)
The Iris Data
[Figure: Sepal L., Sepal W., Petal L., and Petal W. for Setosa, Versicolor, and Virginica]
![Page 11: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/11.jpg)
Clustering Methods
• Model-free:
– Nonhierarchical clustering. K-means.
– Hierarchical clustering. Based on similarity measures
• Model-based clustering
![Page 12: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/12.jpg)
Model-Free Clustering
Nonhierarchical Clustering: K-Means
![Page 13: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/13.jpg)
K-Means
• Assign each observation to the cluster with the nearest mean
• “Nearest” is usually defined based on Euclidean distance
![Page 14: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/14.jpg)
K-Means: Algorithm
• Step 0: Preprocess data. Standardize data if appropriate
• Step 1: Partition the observations into K initial clusters.
• Step 2
– 2.a (update step): Calculate the centroids.
– 2.b (assignment step): Assign each observation to its nearest cluster.
• Repeat step 2 until the assignments no longer change
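The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names `centroid` and `kmeans`, the optional `init_labels` argument, and the toy points are all assumptions made for the example, and Step 0 (standardization) is left to the caller.

```python
import math
import random

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def kmeans(points, k, init_labels=None, seed=0, max_iter=100):
    """K-means following Steps 1 and 2; returns (labels, centroids)."""
    n = len(points)
    if init_labels is None:
        # Step 1: random initial partition; every cluster is non-empty if n >= k.
        labels = [i % k for i in range(n)]
        random.Random(seed).shuffle(labels)
    else:
        # A user-supplied partition must use every label 0..k-1 at least once.
        labels = list(init_labels)
    centroids = [None] * k
    for _ in range(max_iter):
        # Step 2.a (update step): recompute each centroid.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # a cluster that lost all its points keeps its old centroid
                centroids[j] = centroid(members)
        # Step 2.b (assignment step): nearest centroid in Euclidean distance.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # no assignment changed: stop
            break
        labels = new_labels
    return labels, centroids

# Toy data (hypothetical): two well-separated groups, deliberately mixed at the start.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2, init_labels=[0, 1, 0, 1, 0, 1])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Passing `init_labels` makes the run reproducible; leaving it out gives the random initial partition of Step 1.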
![Page 15: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/15.jpg)
[Figure from “An Introduction to Statistical Learning”]
![Page 16: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/16.jpg)
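Remarks
• Until convergence, each step decreases the within-cluster sum of squares objective; a step can never increase it
• The algorithm converges within a finite number of steps, but possibly only to a local minimum
• Therefore, run it from several different random initial values and keep the solution with the smallest objective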
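These remarks can be checked numerically. The sketch below records the within-cluster sum of squares after every pass and repeats K-means from several random starts. The helper names (`wcss`, `kmeans_run`, `best_of`), the choice of random observations as initial centroids, and the toy points are assumptions for illustration only.

```python
import math
import random

def wcss(points, labels, centroids):
    """Within-cluster sum of squares for a given assignment."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

def kmeans_run(points, k, rng, max_iter=100):
    """One K-means run started from k randomly chosen observations.

    Returns (labels, history); history holds the objective after each pass.
    """
    centroids = [list(p) for p in rng.sample(points, k)]
    labels, history = None, []
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
        history.append(wcss(points, labels, centroids))
    return labels, history

def best_of(points, k, n_starts=10, seed=0):
    """Run K-means n_starts times; keep the run with the smallest final objective."""
    rng = random.Random(seed)
    runs = [kmeans_run(points, k, rng) for _ in range(n_starts)]
    best = min(runs, key=lambda run: run[1][-1])
    return best, runs

# Toy data (hypothetical): three small groups in the plane.
points = [(0.0, 0.0), (0.5, 0.2), (0.2, 0.6),
          (5.0, 5.0), (5.4, 4.8), (4.9, 5.5),
          (0.0, 5.0), (0.3, 5.3), (0.6, 4.7)]
best, runs = best_of(points, k=3)
for run_labels, history in runs:
    print(round(history[-1], 3), run_labels)
```

Each printed line is one start; runs that end with a larger objective than the best one have stopped at a local minimum.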
![Page 17: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/17.jpg)
Different Initial Values
[Figure from “An Introduction to Statistical Learning”]
![Page 18: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/18.jpg)
Example: Cluster Analysis of Iris Data (Petal L & W)
• Pretend that the iris types of the observations are unknown => cluster analysis
• As an example, and for illustration purposes, we will use petal length and width
• Choose K = 3
• K-means
![Page 19: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/19.jpg)
K-Means Clustering: Iris (Petal L & W)
Note: the animation in the figure does not work properly on a Mac.
![Page 20: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/20.jpg)
K-Means Clustering: Iris (Petal L & W)
![Page 21: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/21.jpg)
K-Means Clustering: Iris (Petal L & W)
[Figure: nine panels, Iteration = 1 through Iteration = 9, each with legend Setosa, Versicolor, Virginica]
![Page 22: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/22.jpg)
K-Means Clustering: Iris (Petal L & W)
[Figure legend: Setosa, Versicolor, Virginica]
Note: the animation in the figure does not work properly on a Mac.
![Page 23: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/23.jpg)
Model-Free Clustering: Hierarchical Clustering
![Page 24: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/24.jpg)
Hierarchical Clustering
• The number of clusters does not need to be specified in advance
• Gives a tree-based representation of observations: a dendrogram
• Each leaf represents an observation
• Leaves that are similar to each other are fused into branches
• Leaves/branches that are similar to each other are fused into branches
• …
![Page 25: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/25.jpg)
[Figure: groups labeled Setosa, Virginica, and Versicolor]
![Page 26: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/26.jpg)
Hierarchical Clustering
• To grow a tree, we need to define dissimilarities (distances) between leaves/branches
– Two leaves: easy. One can use a dissimilarity measure
– A leaf and a branch: there are different options
– Two branches: similar to “a leaf and a branch”, there are different options
![Page 27: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/27.jpg)
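Distance between Two Branches/Clusters
• Single linkage: the smallest distance between a member of one cluster and a member of the other
• Complete linkage: the largest such distance
• Average linkage: the average of all such pairwise distances
• Many other options!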
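The three linkages named above differ only in how they summarize the pairwise distances between two clusters. The sketch below assumes Euclidean distance as the dissimilarity measure and uses a deliberately naive bottom-up merge loop; the function names and the toy points are illustrative, not taken from the slides.

```python
import math
from itertools import product

def pairwise(a, b):
    """All Euclidean distances between a point of cluster a and a point of cluster b."""
    return [math.dist(p, q) for p, q in product(a, b)]

def single_linkage(a, b):
    return min(pairwise(a, b))      # closest pair

def complete_linkage(a, b):
    return max(pairwise(a, b))      # farthest pair

def average_linkage(a, b):
    d = pairwise(a, b)
    return sum(d) / len(d)          # mean over all pairs

def agglomerate(points, linkage, n_clusters):
    """Fuse the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[p] for p in points]  # every leaf starts as its own cluster
    while len(clusters) > n_clusters:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        del clusters[j]               # j > i, so index i is still valid
        clusters[i] = merged
    return clusters

left = [(0.0, 0.0), (1.0, 0.0)]
right = [(4.0, 0.0), (6.0, 0.0)]
print(single_linkage(left, right))    # 3.0
print(complete_linkage(left, right))  # 6.0
print(average_linkage(left, right))   # 4.5
print(agglomerate(left + right, single_linkage, 2))
```

Recording the linkage value at each fusion would give the heights needed to draw a dendrogram.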
![Page 28: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/28.jpg)
Model-Based Clustering
![Page 29: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/29.jpg)
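Model-Based Clustering: Mixture Model
• Consider a random variable X.
• We say it follows a mixture of K distributions if its distribution can be represented using K distributions:
$$f(x) = \sum_{k=1}^{K} p_k f_k(x),$$
where f_k is the density of the k-th distribution
• The weights p_k, k = 1, …, K, are nonnegative numbers and they add up to 1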
![Page 30: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/30.jpg)
Cluster Analysis Based on Mixture Model
• I present a frequentist version
– Choose an appropriate model. E.g., a Gaussian mixture model with K = 2 clusters
– Write down the likelihood function
– Find the maximum likelihood estimate of the parameters
– Calculate Pr(cluster k | observation x_i) for i = 1, …, n, k = 1, 2
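For the Gaussian case with K = 2, the likelihood and the final probabilities can be written out as follows. The notation is an assumption made for this sketch: φ(x; μ, σ²) denotes the normal density, θ collects all parameters, and hats mark maximum likelihood estimates; only the weights p_k appear in the slides.

```latex
% Likelihood of n independent observations under a two-component Gaussian mixture
L(\theta) = \prod_{i=1}^{n} \sum_{k=1}^{2} p_k \, \phi(x_i; \mu_k, \sigma_k^2),
\qquad p_1 + p_2 = 1 .

% Probability that observation x_i belongs to cluster k (Bayes' rule at the MLE)
\Pr(\text{cluster } k \mid x_i)
  = \frac{\hat{p}_k \, \phi(x_i; \hat{\mu}_k, \hat{\sigma}_k^2)}
         {\sum_{j=1}^{2} \hat{p}_j \, \phi(x_i; \hat{\mu}_j, \hat{\sigma}_j^2)} .
```

An observation is then typically assigned to the cluster with the larger probability.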
![Page 31: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/31.jpg)
The Maximum Likelihood Estimate (MLE) of the Parameters
• An easy-to-implement algorithm to find the MLEs is the Expectation-Maximization (EM) algorithm
• Initialize parameters
• E step: calculate the “conditional” expectation.
– “conditional” means conditional on the current estimate of the parameters
– This step involves calculating Pr(cluster k | observation i, current estimate of the parameters), k = 1, …, K, i = 1, …, n
– This step is similar to the assignment step in a K-means algorithm
![Page 32: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/32.jpg)
The Maximum Likelihood Estimate (MLE) of the Parameters
• The M step: find the set of parameter values that maximizes the conditional expectation calculated in the E step. This step updates the parameter values
• Repeat the E and M steps until convergence
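For a one-dimensional Gaussian mixture the M step has a closed form. In the sketch below, γ_ik is assumed notation for the E-step probability Pr(cluster k | observation i, current estimate of the parameters); the symbols μ_k and σ_k for the component means and standard deviations are likewise assumptions.

```latex
% M step: weighted versions of the usual proportion, mean and variance estimates
p_k^{\text{new}} = \frac{1}{n} \sum_{i=1}^{n} \gamma_{ik},
\qquad
\mu_k^{\text{new}} = \frac{\sum_{i=1}^{n} \gamma_{ik} \, x_i}{\sum_{i=1}^{n} \gamma_{ik}},
\qquad
\left(\sigma_k^{\text{new}}\right)^2
  = \frac{\sum_{i=1}^{n} \gamma_{ik} \left(x_i - \mu_k^{\text{new}}\right)^2}
         {\sum_{i=1}^{n} \gamma_{ik}} .
```

If every γ_ik were exactly 0 or 1, the mean update would reduce to the centroid update of K-means.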
![Page 33: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/33.jpg)
EM vs K-Means
EM:
• Step 1: initialization
• E step: Calculate conditional probabilities
• M step: Find optimal values for parameters
• Repeat the E and M steps until convergence
• Allows clusters to have different shapes
K-Means:
• Step 1: initialization
• Step 2a: guess cluster membership
• Step 2b: find cluster centers
• Repeat 2a-2b until convergence
![Page 34: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/34.jpg)
Example: Gaussian Mixture Model
• Observed data (simulated from two normal distributions)
– 0.37 1.18 0.16 2.60 1.33 0.18 1.49 1.74 3.58 2.69 4.51 3.39 2.38 0.79 4.12 2.96 2.98 3.94 3.82 3.59
• Assuming K = 2
• Parameters: μ1, μ0, σ1, σ0, p
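A minimal EM sketch for these 20 observations is given below. Several choices are assumptions made for illustration and are not from the slides: p is taken to be the weight of cluster 1, the starting values use the sample extremes and the overall standard deviation, and the loop stops when the log-likelihood gain is tiny. No safeguard against a variance collapsing to zero is included.

```python
import math

# Observed data from the slide above.
data = [0.37, 1.18, 0.16, 2.60, 1.33, 0.18, 1.49, 1.74, 3.58, 2.69,
        4.51, 3.39, 2.38, 0.79, 4.12, 2.96, 2.98, 3.94, 3.82, 3.59]

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def log_likelihood(xs, p, mu0, sigma0, mu1, sigma1):
    """Log-likelihood of the two-component mixture (p = weight of cluster 1)."""
    return sum(
        math.log((1 - p) * normal_pdf(x, mu0, sigma0) + p * normal_pdf(x, mu1, sigma1))
        for x in xs
    )

def em_two_gaussians(xs, max_iter=500, tol=1e-8):
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    # Crude starting values (an assumption): extremes as means, overall SD.
    p, mu0, mu1, sigma0, sigma1 = 0.5, min(xs), max(xs), sd, sd
    history = [log_likelihood(xs, p, mu0, sigma0, mu1, sigma1)]
    for _ in range(max_iter):
        # E step: w[i] = Pr(cluster 1 | x_i, current estimates).
        w = []
        for x in xs:
            a = p * normal_pdf(x, mu1, sigma1)
            b = (1 - p) * normal_pdf(x, mu0, sigma0)
            w.append(a / (a + b))
        # M step: weighted proportion, means and standard deviations.
        n1 = sum(w)
        n0 = n - n1
        p = n1 / n
        mu1 = sum(wi * x for wi, x in zip(w, xs)) / n1
        mu0 = sum((1 - wi) * x for wi, x in zip(w, xs)) / n0
        sigma1 = math.sqrt(sum(wi * (x - mu1) ** 2 for wi, x in zip(w, xs)) / n1)
        sigma0 = math.sqrt(sum((1 - wi) * (x - mu0) ** 2 for wi, x in zip(w, xs)) / n0)
        history.append(log_likelihood(xs, p, mu0, sigma0, mu1, sigma1))
        if history[-1] - history[-2] < tol:  # negligible improvement: stop
            break
    estimates = {"p": p, "mu0": mu0, "sigma0": sigma0, "mu1": mu1, "sigma1": sigma1}
    return estimates, w, history  # w comes from the last E step performed

estimates, w, history = em_two_gaussians(data)
print(estimates)
print([round(wi, 2) for wi in w])
```

The list `history` should never decrease, which is a convenient check that the E and M steps are coded correctly.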
![Page 35: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/35.jpg)
Example: simulated data
Group 1 ~ N(1, 1)
Group 2 ~ N(3, 1)
Note: the animation in the figure does not work properly on a Mac.
![Page 36: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/36.jpg)
Example: Cluster Analysis of Iris Data Using Petal Length
[Figure legend: Setosa, Versicolor]
Note: the animation in the figure does not work properly on a Mac.
![Page 37: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/37.jpg)
R Package: MCLUST
• Developed by Adrian Raftery and colleagues
• Gaussian mixture model
• EM
• Clustering, classification, density estimation
• Please try it out!
![Page 38: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/38.jpg)
Clustering Analysis For Multidimensional Data
![Page 39: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/39.jpg)
Multidimensional Data
• Human faces, images
• 3D objects
• Text documents
• Brain imaging
![Page 40: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/40.jpg)
Whole Brain Connectivity
[Figure: subjects Sub1, Sub2, Sub3, Sub4; conditions task1, task2, task3, rest1, rest2, rest3]
![Page 41: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/41.jpg)
Brain Connectivity vs Fingerprint
[Figure: axis label Subject ID]
![Page 42: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/42.jpg)
[Figure: conditions task1, task2, task3, rest1, rest2, rest3]
![Page 43: An Introduction to Cluster Analysis · Cluster Analysis Based on Mixture Model • I present a frequentist version – Choose an appropriate model. E.g., A Gaussian mixture model](https://reader033.vdocument.in/reader033/viewer/2022050200/5f5405bd9a64c7534779d9a8/html5/thumbnails/43.jpg)
Some Technical Details
?