l15:microarray analysis (classification) the biological problem two conditions that need to be...
Post on 20-Dec-2015
214 views
TRANSCRIPT
![Page 1: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/1.jpg)
L15:Microarray analysis (Classification)
![Page 2: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/2.jpg)
The Biological Problem
• Two conditions that need to be differentiated, (Have different treatments).
• EX: ALL (Acute Lymphocytic Leukemia) & AML (Acute Myelogenous Leukima)
• Possibly, the set of genes over-expressed are different in the two conditions
![Page 3: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/3.jpg)
Geometric formulation
• Each sample is a vector with dimension equal to the number of genes.
• We have two classes of vectors (AML, ALL), and would like to separate them, if possible, with a hyperplane.
![Page 4: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/4.jpg)
Basic geometry
• What is ||x||2 ?
• What is x/||x||• Dot product?
x=(x1,x2)
y
€
xT y = x1y1 + x2y2
= || x ||⋅ || y || cosθx cosθy + || x ||⋅ || y || sin(θx )sin(θy )|| x ||⋅ || y || cos(θx −θy )
![Page 5: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/5.jpg)
Dot Product
• Let be a unit vector.– |||| = 1
• Recall that Tx = ||x|| cos
• What is Tx if x is orthogonal (perpendicular) to ?
x
Tx = ||x|| cos
![Page 6: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/6.jpg)
Hyperplane
• How can we define a hyperplane L?
• Find the unit vector that is perpendicular (normal to the hyperplane)
![Page 7: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/7.jpg)
Points on the hyperplane
• Consider a hyperplane L defined by unit vector , and distance 0 from the origin
• Notes;– For all x L, xT must be the
same, xT = 0
– For any two points x1, x2,• (x1- x2)T =0
• Therefore, given a vector , and an offset 0, the hyperplane is the set of all points – {x : xT = 0}
x1
x2
![Page 8: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/8.jpg)
Hyperplane properties
• Given an arbitrary point x, what is the distance from x to the plane L?
• D(x,L) = (Tx - 0)• When are points x1 and x2
on different sides of the hyperplane?
x0
![Page 9: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/9.jpg)
Hyperplane properties
• Given an arbitrary point x, what is the distance from x to the plane L?– D(x,L) = (Tx - 0)
• When are points x1 and x2 on different sides of the hyperplane?
• Ans: If D(x1,L)* D(x2,L) < 0
x0
![Page 10: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/10.jpg)
Separating by a hyperplane• Input: A training set of +ve
& -ve examples• Recall that a hyperplane is
represented by – {x:-0+1x1+2x2=0} or – (in higher dimensions) {x:
Tx-0=0}• Goal: Find a hyperplane
that ‘separates’ the two classes.
• Classification: A new point x is +ve if it lies on the +ve side of the hyperplane (D(x,L)> 0) , -ve otherwise.
x2
x1
+
-
![Page 11: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/11.jpg)
Hyperplane separation
• What happens if we have many choices of a hyperplane?– We try to maximize the
distance of the points from the hyperplane.
• What happens if the classes are not separable by a hyperplane?– We define a function
based on the amount of mis-classification, and try to minimize it
![Page 12: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/12.jpg)
Error in classification
• Sample Function: sum of distances of all misclassified points – Let yi=-1 for +ve example i,
yi=+1 otherwise.
• The best hyperplane is one that minimizes D(,0)
• Other definitions are also possible.
x2
x1
+
-
€
D(β,β 0) = y i x iTβ − β 0( )
i∈M
∑
![Page 13: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/13.jpg)
Restating Classification
• The (supervised) classification problem can now be reformulated as an optimization problem.
• Goal: Find the hyperplane (,0), that optimizes the objective D(,0).
• No efficient algorithm is known for this problem, but a simple generic optimization can be applied.
• Start with a randomly chosen (,0)
• Move to a neighboring (’,’0) if D(’,’0)< D(,0)
![Page 14: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/14.jpg)
Gradient Descent
• The function D() defines the error.
• We follow an iterative refinement. In each step, refine so the error is reduced.
• Gradient descent is an approach to such iterative refinement.
D()
€
← + ρ⋅D'(β )
D’()
![Page 15: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/15.jpg)
Rosenblatt’s perceptron learning algorithm
€
D(β,β 0) = y i x iTβ − β 0( )
i∈M
∑
∂D(β,β 0)
∂β= y ix i
i∈M
∑
∂D(β,β 0)
∂β 0
= y i
i∈M
∑
⇒ Update rule : β
β 0
⎛
⎝ ⎜
⎞
⎠ ⎟=
β
β 0
⎛
⎝ ⎜
⎞
⎠ ⎟+ ρ
y ix i
y i
⎛
⎝ ⎜
⎞
⎠ ⎟
i
∑
![Page 16: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/16.jpg)
Classification based on perceptron learning
• Use Rosenblatt’s algorithm to compute the hyperplane L=(,0).
• Assign x to class 1 if D(x,L)= Tx-0 >= 0, and to class 2 otherwise.
x
0
![Page 17: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/17.jpg)
Perceptron learning
• If many solutions are possible, it does no choose between solutions
• If data is not linearly separable, it does not terminate, and it is hard to detect.
• Time of convergence is not well understood
![Page 18: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/18.jpg)
Linear Discriminant analysis
• Provides an alternative approach to classification with a linear function.
• Project all points, including the means, onto vector .
• We want to choose such that– Difference of projected
means is large.– Variance within group is
small
x2
x1
+
-
![Page 19: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/19.jpg)
Choosing the right
x2
x1
+
- 2
x2
x1
+
-
1
1 is a better choice than 2 as the variance within a group is small, and difference of means is large.
• How do we compute the best ?
![Page 20: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/20.jpg)
Linear Discriminant analysis
€
Maxβ
(difference of projected means)
(sum of projected variance)
• Fisher Criterion
![Page 21: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/21.jpg)
LDA cont’d
x2
x1
+
-
• What is the projection of a point x onto ?– Ans: Tx
• What is the distance between projected means?
x
€
˜ m 1 − ˜ m 22
= β T m1 − m2( )2€
˜ m 1€
˜ m 2
![Page 22: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/22.jpg)
LDA Cont’d
€
| ˜ m 1 − ˜ m 2 |2 = β T (m1 − m2)2
= β T (m1 − m2)(m1 − m2)T β
= β T SBβ
scatter within sample : ˜ s 12 + ˜ s 2
2
where, ˜ s 12 = (y − ˜ m 1
y
∑ )2 = (β T (x − m1)x∈D1
∑ )2 = β T S1β
˜ s 12 + ˜ s 2
2 = β T (S1 + S2)β = β T Swβ
€
maxβ
β T SBβ
β T SwβFisher Criterion
€
T
€
(m1 − m2)
€
(m1 − m2)T
€
€
€
T
€
SB
![Page 23: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/23.jpg)
LDA
€
Let maxβ
β T SBβ
β T Swβ= λ
Then, β T SBβ − λSwβ( ) = 0
⇒ λSwβ = SBβ
⇒ λβ = Sw−1SBβ
⇒ β = Sw−1(m1 − m2)
Therefore, a simple computation (Matrix inverse) is sufficient to compute the ‘best’ separating hyperplane
![Page 24: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/24.jpg)
Maximum Likelihood discrimination
• Consider the simple case of single dimensional data.
• Compute a distribution of the values in each class.
values
Pr
![Page 25: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/25.jpg)
Maximum Likelihood discrimination
• Suppose we knew the distribution of points in each class i.– We can compute Pr(x|i) for
all classes i, and take the maximum
• The true distribution is not known, so usually, we assume that it is Gaussian
![Page 26: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/26.jpg)
ML discrimination
• Suppose all the points were in 1 dimension, and all classes were normally distributed.
€
Pr(ωi | x) =Pr(x |ωi)Pr(ωi)
Pr(x |ω j )Pr(ω j )j∑
gi(x) = ln Pr(x |ωi)( ) + ln Pr(ωi)( )
≅−(x − μ i)
2
2σ i2
− ln(σ i) + ln Pr(ωi)( )
Choose argmini
(x − μ i)2
2σ i2
+ ln(σ i) − ln Pr(ωi)( ) ⎛
⎝ ⎜
⎞
⎠ ⎟
1 2x
![Page 27: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/27.jpg)
ML discrimination (multi-dimensional case)
€
p(x |ωi) =1
2π( )d
2 Σ1
2
exp −1
2x −m( )
TΣ−1 x −m( )
⎛
⎝ ⎜
⎞
⎠ ⎟
gi(x) = ln p(x |ωi)( ) + lnP(ωi)
Compute argmaxi gi(x)
Not part of the syllabus.
![Page 28: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/28.jpg)
Supervised classification summary
• Most techniques for supervised classification are based on the notion of a separating hyperplane.
• The ‘optimal’ separation can be computed using various combinatorial (perceptron), algebraic (LDA), or statistical (ML) analyses.
![Page 29: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/29.jpg)
Dimensionality reduction
• Many genes have highly correlated expression profiles.
• By discarding some of the genes, we can greatly reduce the dimensionality of the problem.
• There are other, more principled ways to do such dimensionality reduction.
![Page 30: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/30.jpg)
Principle Components Analysis
• Consider the expression values of 2 genes over 6 samples.
• Clearly, the expression of the two genes is highly correlated.
• Projecting all the genes on a single line could explain most of the data.
• This is a generalization of “discarding the gene”.
![Page 31: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/31.jpg)
PCA
• Suppose all of the data were to be reduced by projecting to a single line from the mean.
• How do we select the line ?
m
![Page 32: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/32.jpg)
PCA cont’d
• Let each point xk map to x’k. We want to mimimize the error
• Observation 1: Each point xk maps to x’k = m + T(xk-m)€
xk − x 'k2
k
∑ m
xk
x’k
![Page 33: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/33.jpg)
PCA: motivating example
• Consider the expression values of 2 genes over 6 samples.
• Clearly, the expression of g1 is not informative, and it suffices to look at g2 values.
• Dimensionality can be reduced by discarding the gene g1
g1
g2
![Page 34: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/34.jpg)
PCA: Ex2
• Consider the expression values of 2 genes over 6 samples.
• Clearly, the expression of the two genes is highly correlated.
• Projecting all the genes on a single line could explain most of the data.
![Page 35: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/35.jpg)
PCA
• Suppose all of the data were to be reduced by projecting to a single line from the mean.
• How do we select the line ?
m
![Page 36: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/36.jpg)
PCA cont’d
• Let each point xk map to x’k=m+ak. We want to minimize the error
• Observation 1: Each point xk maps to x’k = m + T(xk-m)– (ak= T(xk-m))€
xk − x 'k2
k
∑ m
xk
x’k
![Page 37: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/37.jpg)
Proof of Observation 1
€
minakxk − x'k
2
= minakxk − m + m − x'k
2
= minakxk − m
2+ m − x'k
2− 2(x'k −m)T (xk − m)
= minakxk − m
2+ ak
2β Tβ − 2akβT (xk − m)
= minakxk − m
2+ ak
2 − 2akβT (xk − m)
€
2ak − 2β T (xk − m) = 0
ak = β T (xk − m)
⇒ ak2 = akβ
T (xk − m)
⇒ xk − x 'k2
= xk − m2
− β T (xk − m)(xk − m)T β
Differentiating w.r.t ak
![Page 38: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/38.jpg)
Minimizing PCA Error
• To minimize error, we must maximize TS• By definition, = TS implies that is an eigenvalue, and
the corresponding eigenvector.• Therefore, we must choose the eigenvector
corresponding to the largest eigenvalue.
€
xk − x 'kk
∑2
= C − β T
k
∑ (xk − m)(xk − m)T β = C − β T Sβ
![Page 39: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/39.jpg)
PCA
• The single best dimension is given by the eigenvector of the largest eigenvalue of S
• The best k dimensions can be obtained by the eigenvectors {1, 2, …, k} corresponding to the k largest eigenvalues.
• To obtain the k dimensional surface, take BTM
BT
1T
M
![Page 40: L15:Microarray analysis (Classification) The Biological Problem Two conditions that need to be differentiated, (Have different treatments). EX: ALL (Acute](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d415503460f94a1b328/html5/thumbnails/40.jpg)