Gaussian Mixture Model classification of Multi-Color Fluorescence In Situ Hybridization (M-FISH) Images
Amin Fazel
2006
Department of Computer Science and Electrical Engineering, University of Missouri – Kansas City
Motivation and Goals
• Chromosomes store genetic information
• Chromosome images can indicate genetic disease, cancer, radiation damage, etc.
• Research goals:
  – Locate and classify each chromosome in an image
  – Locate chromosome abnormalities
Karyotyping
• 46 human chromosomes form 24 types
  – 22 different pairs
  – 2 sex chromosomes, X and Y
• Grouped and ordered by length
[Figures: banding patterns; karyotype]
Multi-spectral Chromosome Imaging
• Multiplex Fluorescence In-Situ Hybridization (M-FISH) [1996]
• Five color dyes (fluorophores)
• Each human chromosome type absorbs a unique combination of the dyes
• 32 (2^5) possible combinations of dyes distinguish the 24 human chromosome types
[Figure: M-FISH image of a healthy male]
M-FISH Images
• 6th dye (DAPI) binds to all chromosomes
[Figures: DAPI channel (6th dye); M-FISH image (5 dyes)]
M-FISH Images
• Images of each dye are obtained with an appropriate optical filter
• Each pixel is a six-dimensional vector
• Each vector element gives the contribution of one dye at that pixel
• The chromosomal origin is distinguishable at a single pixel (unless chromosomes overlap)
• It is unnecessary to estimate length, relative centromere position, or banding pattern
Bayesian Classification
• Based on probability theory
  – A feature vector is denoted as $\mathbf{x} = [x_1, x_2, \ldots, x_D]^T$
  – $D$ is the dimension of the vector
• The probability that a feature vector $\mathbf{x}$ belongs to class $w_k$ is $p(w_k \mid \mathbf{x})$, and this posterior probability can be computed via

$$p(w_k \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid c_k)\, P(c_k)}{p(\mathbf{x})}$$

and

$$p(\mathbf{x}) = \sum_{i=1}^{k} p(\mathbf{x} \mid c_i)\, P(c_i)$$

where $p(\mathbf{x} \mid c_k)$ is the probability density function of class $w_k$ and $P(c_k)$ is the prior probability.
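As a concrete illustration (not from the original slides), here is a minimal NumPy sketch of this Bayes-rule computation; the names bayes_posteriors, class_pdfs, and priors are illustrative, and the class-conditional densities are assumed to be given as callables:

```python
import numpy as np

def bayes_posteriors(x, class_pdfs, priors):
    """Posterior p(w_k | x) for each class via Bayes' rule.

    class_pdfs: list of callables, class_pdfs[k](x) = p(x | c_k)
    priors:     array of prior probabilities P(c_k), summing to 1
    """
    likelihoods = np.array([pdf(x) for pdf in class_pdfs])
    joint = likelihoods * priors   # p(x | c_k) * P(c_k)
    return joint / joint.sum()     # divide by p(x), the sum of the joints
```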
Gaussian Probability Density Function
• In the D-dimensional space:

$$N(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{D/2}\,|\boldsymbol{\Sigma}|^{1/2}}\, e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu})}$$

• $\boldsymbol{\mu}$ is the mean vector
• $\boldsymbol{\Sigma}$ is the covariance matrix
  – The Gaussian distribution carries the assumption that the class model is truly a model of one basic class (a single, unimodal cluster)
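A minimal sketch of evaluating this density with NumPy, assuming a full covariance matrix; gaussian_pdf is an illustrative name and no input validation is attempted:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Evaluate N(x; mu, Sigma) in D dimensions."""
    d = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
    quad = diff @ np.linalg.solve(sigma, diff)  # (x-mu)^T Sigma^{-1} (x-mu)
    return np.exp(-0.5 * quad) / norm
```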
Gaussian Mixture Model (GMM)
• A GMM is a set of several Gaussians which try to represent groups / clusters of data
  – they therefore represent different subclasses inside one class
  – The PDF is defined as a weighted sum of Gaussians:

$$p(\mathbf{x}; \theta) = \sum_{c=1}^{C} \pi_c \, N(\mathbf{x}; \boldsymbol{\mu}_c, \boldsymbol{\Sigma}_c)$$
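Building on the gaussian_pdf sketch above, the weighted sum can be written directly; weights, means, and covs are illustrative names for the $\pi_c$, $\boldsymbol{\mu}_c$, and $\boldsymbol{\Sigma}_c$ of one mixture:

```python
def gmm_pdf(x, weights, means, covs):
    """p(x) = sum_c pi_c * N(x; mu_c, Sigma_c), reusing gaussian_pdf above."""
    return sum(w * gaussian_pdf(x, mu, cov)
               for w, mu, cov in zip(weights, means, covs))
```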
Gaussian Mixture Models
Equations for GMMs:

$$p(\mathbf{x}; \theta) = \sum_{c=1}^{C} \pi_c \, N(\mathbf{x}; \mu_c, \sigma_c)$$

One-dimensional case:

$$N(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2 / 2\sigma^2}$$

Multi-dimensional case: $\mu$ becomes the mean vector $\boldsymbol{\mu}$, $\sigma$ becomes the covariance matrix $\boldsymbol{\Sigma}$:

$$N(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{D/2}\,|\boldsymbol{\Sigma}|^{1/2}}\, e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu})}$$

Assume $\boldsymbol{\Sigma}$ is a diagonal matrix; the determinant and inverse then simplify (shown here for a 3-dimensional example):

$$|\boldsymbol{\Sigma}| = \prod_{i=1}^{n} \sigma_{ii}^2, \qquad \boldsymbol{\Sigma}^{-1} = \begin{pmatrix} \frac{1}{\sigma_{11}^2} & 0 & 0 \\ 0 & \frac{1}{\sigma_{22}^2} & 0 \\ 0 & 0 & \frac{1}{\sigma_{33}^2} \end{pmatrix}$$
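The diagonal case is worth showing in code because the determinant and inverse collapse to elementwise operations; this sketch (illustrative names, no validation) mirrors the simplification above:

```python
import numpy as np

def gaussian_pdf_diag(x, mu, var):
    """N(x; mu, Sigma) when Sigma = diag(var): |Sigma| is the product of
    the variances and Sigma^{-1} acts as elementwise division by var."""
    d = len(mu)
    diff = x - mu
    quad = np.sum(diff ** 2 / var)  # diagonal quadratic form
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.prod(var))
    return np.exp(-0.5 * quad) / norm
```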
GMM
• A Gaussian Mixture Model (GMM) is characterized by:
  – the number of components
  – the means and covariance matrices of the Gaussian components
  – the weight (height) of each component
GMM
• The GMM has the same dimension as the feature space (here, a 6-dimensional GMM)
• For visualization purposes, 2-dimensional GMMs are shown below:
[Figures: two 2-D GMM likelihood surfaces; axes: value1 and value2, height: likelihood]
GMM
• These parameters are tuned using an iterative procedure called Expectation Maximization (EM)
• The EM algorithm iteratively updates each Gaussian's parameters and the conditional probabilities so as to increase the likelihood of the data
GMM Training Flow Chart (1)
• Initialize the Gaussian means μ_i using the K-means clustering algorithm
• Initialize the covariance matrices to the distance to the nearest cluster
• Initialize the weights to 1 / C so that all Gaussians are equally likely
• K-means clustering (a minimal sketch follows this list):
  1. Initialization: choose code words at random or by maximum distance
  2. Search: for each training vector, find the closest code word and assign the training vector to that cell
  3. Centroid update: for each cell, compute the centroid of that cell; the new code word is the centroid
  4. Repeat (2)-(3) until the average distance falls below a threshold
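Here is the promised K-means sketch (illustrative names; random initialization only, and empty-cluster handling is omitted for brevity):

```python
import numpy as np

def kmeans(data, C, n_iter=100, tol=1e-4):
    """K-means clustering, used here to initialize the GMM means."""
    rng = np.random.default_rng(0)
    # 1. Initialization: pick C random training vectors as code words
    centroids = data[rng.choice(len(data), size=C, replace=False)]
    prev_avg = np.inf
    for _ in range(n_iter):
        # 2. Search: assign each training vector to the closest code word
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Centroid update: the new code word is the centroid of each cell
        centroids = np.array([data[labels == c].mean(axis=0) for c in range(C)])
        # 4. Repeat until the average distance stops improving
        avg = dists.min(axis=1).mean()
        if prev_avg - avg < tol:
            break
        prev_avg = avg
    return centroids, labels
```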
GMM Training Flow Chart (2)
E step: compute the conditional expectation of the complete log-likelihood (that is, evaluate the posterior probabilities that relate each cluster to each data point), assuming the current cluster parameters to be correct.

M step: find the cluster parameters that maximize the likelihood of the data, assuming the current data distribution (the E-step posteriors) is correct.
E step (posterior probability of component $c$ for sample $\mathbf{x}_n$, under the current parameters $\theta^{(i)}$):

$$w_{n,c} = \frac{\pi_c^{(i)}\, p(\mathbf{x}_n \mid c; \theta^{(i)})}{\sum_{j=1}^{C} \pi_j^{(i)}\, p(\mathbf{x}_n \mid j; \theta^{(i)})}$$

M step (updated weights, means, and covariances):

$$\pi_c^{(i+1)} = \frac{1}{N} \sum_{n=1}^{N} w_{n,c}$$

$$\boldsymbol{\mu}_c^{(i+1)} = \frac{\sum_{n=1}^{N} w_{n,c}\, \mathbf{x}_n}{\sum_{n=1}^{N} w_{n,c}}$$

$$\boldsymbol{\Sigma}_c^{(i+1)} = \frac{\sum_{n=1}^{N} w_{n,c}\, (\mathbf{x}_n - \boldsymbol{\mu}_c^{(i+1)})(\mathbf{x}_n - \boldsymbol{\mu}_c^{(i+1)})^T}{\sum_{n=1}^{N} w_{n,c}}$$
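A single EM update in NumPy, reusing gaussian_pdf from the earlier sketch; the array w plays the role of $w_{n,c}$ and all names are illustrative:

```python
import numpy as np

def em_step(data, weights, means, covs):
    """One EM update for a GMM; w[n, c] corresponds to w_{n,c} above."""
    N, C = len(data), len(weights)
    # E step: posterior of each component c for each sample x_n
    w = np.array([[weights[c] * gaussian_pdf(x, means[c], covs[c])
                   for c in range(C)] for x in data])
    w /= w.sum(axis=1, keepdims=True)
    # M step: re-estimate weights, means, and covariances
    nc = w.sum(axis=0)                       # effective count per component
    new_weights = nc / N
    new_means = (w.T @ data) / nc[:, None]
    new_covs = []
    for c in range(C):
        diff = data - new_means[c]
        new_covs.append((w[:, c, None] * diff).T @ diff / nc[c])
    return new_weights, new_means, np.array(new_covs)
```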
GMM Training Flow Chart (3)
• Recompute $w_{n,c}$ using the new weights, means, and covariances. Stop training if
  – $w_{n+1,c} - w_{n,c} <$ threshold
• or the number of epochs reaches the specified value; otherwise, continue the iterative updates (a sketch of the full loop follows).
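Putting the pieces together, a sketch of the full training loop built on the kmeans and em_step sketches above. Two labeled deviations from the slides: the covariances are initialized from each cluster's sample covariance (plus a small ridge), rather than the slides' distance-based rule, and the stopping test compares the mixture weights between epochs:

```python
import numpy as np

def train_gmm(data, C, max_epochs=100, threshold=1e-5):
    """K-means initialization followed by EM updates until convergence."""
    D = data.shape[1]
    means, labels = kmeans(data, C)
    # Assumed deviation from the slides: per-cluster sample covariance
    # (with a small ridge for numerical stability)
    covs = np.array([np.cov(data[labels == c].T) + 1e-6 * np.eye(D)
                     for c in range(C)])
    weights = np.full(C, 1.0 / C)            # all Gaussians equally likely
    for _ in range(max_epochs):
        new_weights, new_means, new_covs = em_step(data, weights, means, covs)
        # Stop if the epoch-to-epoch change in the weights is small
        converged = np.abs(new_weights - weights).max() < threshold
        weights, means, covs = new_weights, new_means, new_covs
        if converged:
            break
    return weights, means, covs
```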
GMM Test Flow Chart
• Present each input pattern $\mathbf{x}$ and compute the confidence for each class $k$:

$$P(c_k)\, P(\mathbf{x} \mid c_k; \theta)$$

• where $P(c_k)$ is the prior probability of class $c_k$, estimated by counting the number of training patterns in each class
• Classify pattern $\mathbf{x}$ as the class with the highest confidence (sketch below).
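A sketch of this decision rule, assuming one GMM per class (each a tuple of weights, means, and covariances for gmm_pdf above); class_gmms and priors are illustrative names:

```python
import numpy as np

def classify(x, class_gmms, priors):
    """Return the class with the highest confidence P(c_k) * p(x | c_k)."""
    scores = [prior * gmm_pdf(x, *gmm)       # gmm = (weights, means, covs)
              for gmm, prior in zip(class_gmms, priors)]
    return int(np.argmax(scores))
```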
Results
[Figure: training input data]
Results
[Tables: classification correctness per true label, using one Gaussian vs. two Gaussians per class]
Thanks for your patience!