Machine learning in post-genomics
Jean-Philippe Vert (Jean-Philippe.Vert@ensmp.fr)
Mines ParisTech / Institut Curie / Inserm
Atelier Statistique de la SFDS, Paris, November 27, 2008.
Jean-Philippe Vert (ParisTech) Machine learning in post-genomics SFDS 2008 1 / 166
Tissue classification from microarray data
Goal
Design a classifier to automatically assign a class to future samples from their expression profile
Interpret biologically the differences between the classes
Supervised sequence classification
Data (training)
Secreted proteins:
MASKATLLLAFTLLFATCIARHQQRQQQQNQCQLQNIEA...
MARSSLFTFLCLAVFINGCLSQIEQQSPWEFQGSEVW...
MALHTVLIMLSLLPMLEAQNPEHANITIGEPITNETLGWL...
Non-secreted proteins:
MAPPSVFAEVPQAQPVLVFKLIADFREDPDPRKVNLGVG...
MAHTLGLTQPNSTEPHKISFTAKEIDVIEWKGDILVVG...
MSISESYAKEIKTAFRQFTDFPIEGEQFEDFLPIIGNP...
Goal
Build a classifier to predict whether new proteins are secreted or not.
Ligand-Based Virtual Screening and QSAR
[Figure: six example molecules, three labeled active and three labeled inactive]
NCI AIDS screen results (from http://cactus.nci.nih.gov).
Outline
1 Pattern recognition and regression
2 Support vector machines
3 Classification of CGH data
4 Classification of expression data
5 Classification of biological sequences
6 Virtual screening and QSAR
7 Conclusion
Pattern recognition, aka supervised classification
Regression
From Hastie et al. (2001), The Elements of Statistical Learning
Formalization
Input
X the space of patterns (typically, X = R^p)
Y the space of responses or labels
Regression: Y = R
Pattern recognition: Y = {−1, 1}
S = ((x1, y1), ..., (xn, yn)) a training set in (X × Y)^n
Output
A function f : X → Y to predict the output associated with any new pattern x ∈ X by f(x)
Examples
Least-squares regression
Nearest neighbors
Decision trees
Neural networks
Logistic regression
PLS
SVM
Outline
1 Pattern recognition and regression
Empirical risk minimization
Feature selection
Shrinkage methods
Probabilistic formalism
Risk
P an (unknown) distribution on X × Y.
Observation: Sn = (Xi, Yi), i = 1, ..., n, i.i.d. random variables distributed according to P.
Loss function ℓ(f(x), y) ∈ R, small when f(x) is a good predictor for y.
Risk: R(f) = E ℓ(f(X), Y).
Estimator fn : X → Y.
Goal: small risk R(fn).
Loss for regression
Square loss: ℓ(f(x), y) = (f(x) − y)²
ε-insensitive loss: ℓ(f(x), y) = (|f(x) − y| − ε)₊
Huber loss: mixed quadratic/linear
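These three losses can be written in a few lines; a minimal numpy sketch (function names and the default ε and δ values are illustrative choices, not from the talk):

```python
import numpy as np

def square_loss(pred, y):
    # l(f(x), y) = (f(x) - y)^2
    return (pred - y) ** 2

def eps_insensitive_loss(pred, y, eps=0.1):
    # l(f(x), y) = max(|f(x) - y| - eps, 0): residuals smaller than eps cost nothing
    return np.maximum(np.abs(pred - y) - eps, 0.0)

def huber_loss(pred, y, delta=1.0):
    # quadratic for residuals below delta, linear beyond (robust to outliers)
    r = np.abs(pred - y)
    return np.where(r <= delta, 0.5 * r ** 2, delta * (r - 0.5 * delta))
```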
Loss for pattern recognition
Large margin classifiers
For pattern recognition, Y = {−1, 1}.
Estimate a function f : X → R.
The margin of the function f for a pair (x, y) is yf(x).
The loss function is usually a decreasing function of the margin: ℓ(f(x), y) = φ(yf(x)).
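To make the margin picture concrete, here is a small sketch (my own illustration, not from the slides) of common choices of φ, each a decreasing function of the margin m = yf(x):

```python
import numpy as np

# Large-margin losses as decreasing functions phi of the margin m = y * f(x).
def hinge(m):
    # phi(m) = max(0, 1 - m), the SVM loss
    return np.maximum(0.0, 1.0 - m)

def logistic(m):
    # phi(m) = log(1 + exp(-m)), the logistic regression loss
    return np.log1p(np.exp(-m))

def zero_one(m):
    # phi(m) = 1 if the sign is wrong, else 0 (the non-convex reference loss)
    return (np.asarray(m) <= 0).astype(float)
```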
Empirical risk minimization (ERM)
ERM estimator
F a class of candidate functions (e.g., linear functions)
The empirical risk is:
Rn(f) = (1/n) ∑_{i=1}^n ℓ(f(Xi), Yi).
The ERM estimator on the functional class F is the solution (when it exists) of:
fn = arg min_{f ∈ F} Rn(f).
Example: least squares linear regression
X = R^p, Y = R
X the n × p matrix of patterns, y the n × 1 vector of outputs
Linear estimator:
fβ(x) = β0 + ∑_{i=1}^p xi βi
ERM estimator for the square loss:
min_{β ∈ R^{p+1}} Rn(fβ) = ∑_{i=1}^n (fβ(xi) − yi)² = (y − Xβ)ᵀ(y − Xβ)   (1)
Explicit solution:
β = (XᵀX)⁻¹ Xᵀ y.
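The explicit solution can be checked numerically; the following sketch (toy data of my own choosing, intercept β0 omitted for brevity) solves the normal equations:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true                      # noiseless, so ERM recovers beta exactly

# Normal equations: beta = (X^T X)^{-1} X^T y
# (np.linalg.solve is preferred to forming the inverse explicitly)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```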
Example: pattern recognition with the hinge loss
X = R^p, Y = {−1, 1}
Linear estimator:
fβ(x) = sign(xᵀβ)
ERM estimator for the hinge loss:
min_{β ∈ R^p} Rn(fβ) = ∑_{i=1}^n max(0, 1 − yi f(xi))
Equivalent to the linear program:
min ∑_{i=1}^n ξi subject to ξi ≥ 0, ξi ≥ 1 − yi xiᵀβ   (2)
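Rather than solving the linear program, the same objective can also be minimized by subgradient descent; a toy sketch on a separable dataset (the data, learning rate, and iteration count are illustrative choices of my own):

```python
import numpy as np

# Toy linearly separable data
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

beta = np.zeros(2)
lr = 0.1
for _ in range(100):
    margins = y * (X @ beta)
    viol = margins < 1                 # points inside the margin contribute to the loss
    if not viol.any():
        break                          # hinge loss is 0: done
    # subgradient step on sum_i max(0, 1 - y_i x_i^T beta)
    beta += lr * (y[viol, None] * X[viol]).sum(axis=0)

pred = np.sign(X @ beta)
```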
Other ERM methods : convex optimization
For other losses, there is generally no explicit analytical formula for the solution.
However, if the loss function is convex in f, then we end up with a convex optimization problem that can usually be solved efficiently.
Limits of ERM
Unfortunately, the ERM estimator can be:
ill-posed
not statistically consistent (i.e., bad accuracy)
This is particularly the case in high dimension...
ERM is ill-posed in high dimension
Suppose n < p.
Then XᵀX is not invertible, so the least-squares estimator (XᵀX)⁻¹ Xᵀ y is not defined.
More precisely, there is an infinite number of solutions that minimize the empirical risk to 0.
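This degeneracy is easy to demonstrate numerically; in the sketch below (dimensions chosen for illustration), the minimum-norm solution and a second solution shifted along the null space of X both drive the empirical risk to 0:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 3, 10                           # fewer samples than dimensions
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# X^T X is p x p but has rank at most n < p, so it is singular
assert np.linalg.matrix_rank(X.T @ X) <= n

# The minimum-norm least-squares solution fits the data exactly...
beta = np.linalg.pinv(X) @ y
# ...and so does beta plus any vector in the null space of X
e1 = np.eye(p)[0]
null_dir = e1 - np.linalg.pinv(X) @ X @ e1   # component of e1 orthogonal to the rows of X
beta2 = beta + null_dir
```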
ERM is not consistent
From the law of large numbers, for any f ∈ F, the empirical risk converges to the true risk as the sample size increases:
∀f ∈ F, Rn(f) → R(f) as n → ∞
This suggests that minimizing Rn(f) should give a good estimator of the minimizer of R(f), but...
Unfortunately it is not so simple! Vapnik in particular showed that this is only true if the "capacity" of F is not too large.
Solution
Restrict the space of hypotheses
A solution for working in high dimension is to restrict the space of functions F over which ERM is applied:
min_{f ∈ F} Rn(f)
We will focus on linear functions f(x) = xᵀβ, and put various constraints on β:
Restrict the number of non-zero components (feature selection)
Restrict the size of β, for some norm (shrinkage methods)
The bias / variance trade-off
When F is small, the ERM principle is efficient at finding a good solution within F, i.e.:
R(fn) ∼ inf_{f ∈ F} R(f)
We say that the variance is small.
When F is large, the best solution in F is close to the best solution possible:
inf_{f ∈ F} R(f) ∼ inf_f R(f)
We say that the bias is small.
A good estimator should have a small bias and a small variance.
It is therefore important to put prior knowledge into the design of F, making it as small as possible (small variance) while making sure it contains good functions (small bias).
Motivation
In feature selection, we look for a linear function f(x) = xᵀβ where only a limited number of coefficients of β are non-zero.
Motivations
Accuracy: by restricting F, we increase the bias but decrease the variance. This should be helpful in particular in high dimension, where bias is low and variance is large.
Interpretation: with a large number of predictors, we often would like to determine a smaller subset that exhibits the strongest effects.
Of course, this is particularly relevant if we believe that good sparse predictors exist (prior knowledge).
Best subset selection
In best subset selection, we must solve the problem:
min R(fβ) s.t. ‖β‖₀ ≤ k
for k = 1, ..., p.
The state of the art is branch-and-bound optimization, known as leaps and bounds for least squares (Furnival and Wilson, 1974).
This is in general an NP-hard problem, feasible for p up to about 30 or 40.
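For very small p, the problem can even be solved by brute force; a sketch (the helper name and toy data are my own) that enumerates all supports of size k and keeps the one with the smallest residual sum of squares:

```python
import itertools
import numpy as np

def best_subset(X, y, k):
    """Exhaustive best-subset search over all C(p, k) supports (feasible only for small p)."""
    best, best_rss = None, np.inf
    for S in itertools.combinations(range(X.shape[1]), k):
        Xs = X[:, S]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        rss = ((y - Xs @ beta) ** 2).sum()
        if rss < best_rss:
            best, best_rss = S, rss
    return best, best_rss

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 5))
y = 2.0 * X[:, 1] - X[:, 3]            # only features 1 and 3 matter
support, rss = best_subset(X, y, 2)
```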
Efficient feature selection
To work with more variables, we must use different methods. The state of the art splits into:
Filter methods: the predictors are preprocessed and ranked from the most relevant to the least relevant. The subsets are then obtained from this list, starting from the top.
Wrapper methods: the feature selection is iterative, and uses the ERM algorithm in the inner loop.
Embedded methods: the feature selection is part of the ERM algorithm itself (see the shrinkage estimators later).
Filter methods
Associate a score S(i) to each feature i, then rank the features by decreasing score.
Many scores/criteria can be used:
Loss of the ERM trained on a single feature
Statistical tests (Fisher, t-test)
Other performance criteria of the ERM restricted to a single feature (AUC, ...)
Information-theoretic criteria (mutual information, ...)
Pros
Simple, scalable, good empirical success
Cons
Selection of redundant features
Some variables that are useless alone can become useful together
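A minimal filter-method sketch (the score and data are my own illustrations; the score is a t-statistic-like criterion, one of the options listed above):

```python
import numpy as np

def filter_scores(X, y):
    """Score each feature by the absolute difference of class means,
    scaled by the standard error (a t-statistic-like filter score)."""
    pos, neg = X[y == 1], X[y == -1]
    num = np.abs(pos.mean(axis=0) - neg.mean(axis=0))
    den = np.sqrt(pos.var(axis=0) / len(pos) + neg.var(axis=0) / len(neg)) + 1e-12
    return num / den

rng = np.random.default_rng(0)
y = np.array([1] * 20 + [-1] * 20)
X = rng.standard_normal((40, 5))
X[:, 2] += 2.0 * y                     # feature 2 is the only informative one
ranking = np.argsort(filter_scores(X, y))[::-1]   # features, most relevant first
```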
Wrapper methods
The idea
A greedy approach to
min Rn(fβ) s.t. ‖β‖₀ ≤ k
For a given set of selected features, we know how to minimize Rn(f).
We iteratively try to find a good set of features, adding/removing the features that contribute most to decreasing the risk (using ERM as an inner loop).
Two flavors of wrapper methods
Forward stepwise selection
Start from no features
Sequentially add into the model the feature that most improves the fit
Backward stepwise selection (if n > p)
Start from all features
Sequentially remove from the model the feature that least degrades the fit
Other variants
Hybrid stepwise selection strategies that consider both forward and backward moves at each stage, and make the "best" move
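Forward stepwise selection can be sketched in a few lines with least squares as the inner ERM step (the function name and toy data are my own illustrations):

```python
import numpy as np

def forward_stepwise(X, y, k):
    """Greedily add the feature that most reduces the residual sum of squares."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        def rss_with(j):
            # least-squares fit on the current selection plus candidate j
            Xs = X[:, selected + [j]]
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            return ((y - Xs @ beta) ** 2).sum()
        best_j = min(remaining, key=rss_with)
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 6))
y = 3.0 * X[:, 0] + X[:, 4]            # only features 0 and 4 matter
selected = forward_stepwise(X, y, 2)
```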
The idea
The following problem is NP-hard:
min R(fβ) s.t. ‖β‖₀ ≤ k
As a proxy, we can consider the more general problem:
min R(fβ) s.t. Ω(β) ≤ γ
where Ω(β) is a penalty function.
Motivation
Accuracy: as for feature selection, we reduce F and hence reduce the variance.
Inclusion of prior knowledge: Ω(β) is the place to put your prior knowledge to reduce the bias.
Computational efficiency: if R(f) and Ω(β) are convex, then we obtain a convex optimization problem that can often be solved exactly and efficiently. It is then equivalent to:
min R(fβ) + λΩ(β)
Ridge regression

Take Ω(β) = Σ_{i=1}^p β_i² = ‖β‖₂².
Constrained least-squares, in penalized form:

    min_{β∈R^{p+1}} R_n(f_β) = Σ_{i=1}^n (f_β(x_i) − y_i)² + λ Σ_{i=1}^p β_i²
                             = (y − Xβ)ᵀ(y − Xβ) + λβᵀβ.    (3)

Explicit solution:

    β = (XᵀX + λI)⁻¹ Xᵀy.
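To make the closed-form solution concrete, here is a minimal pure-Python sketch of ridge regression in the one-dimensional case (a single feature, no intercept), where (XᵀX + λI)⁻¹Xᵀy reduces to a scalar; the function name and data are illustrative, not from the talk:

```python
def ridge_1d(x, y, lam):
    """Closed-form ridge estimate for one feature: (x.x + lam)^-1 (x.y)."""
    xtx = sum(xi * xi for xi in x)
    xty = sum(xi * yi for xi, yi in zip(x, y))
    return xty / (xtx + lam)

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]          # y = 2x exactly
print(ridge_1d(x, y, 0.0))   # 2.0: lambda = 0 recovers the least-squares slope
print(ridge_1d(x, y, 14.0))  # 1.0: the penalty shrinks the slope toward 0
```

Increasing λ shrinks the estimate continuously toward 0, which is the variance-reduction effect discussed above.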
Ridge regression example
LASSO regression

Take Ω(β) = Σ_{i=1}^p |β_i| = ‖β‖₁.
Constrained least-squares, in penalized form:

    min_{β∈R^{p+1}} R_n(f_β) = Σ_{i=1}^n (f_β(x_i) − y_i)² + λ Σ_{i=1}^p |β_i|    (4)

No explicit solution, but this is just a quadratic program.
LARS (Efron et al., 2004) provides a fast algorithm to compute the solution for all λ's simultaneously (regularization path).
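In the one-dimensional case the LASSO does have a closed form, the soft-thresholding rule β = sign(xᵀy) · max(|xᵀy| − λ/2, 0) / xᵀx, which shows where the exact zeros come from; a small sketch (function name and data are ours, not from the talk):

```python
import math

def lasso_1d(x, y, lam):
    """Soft-thresholding solution of min_b sum_i (y_i - b x_i)^2 + lam |b|."""
    xtx = sum(xi * xi for xi in x)
    xty = sum(xi * yi for xi, yi in zip(x, y))
    mag = max(abs(xty) - lam / 2.0, 0.0)
    return math.copysign(mag, xty) / xtx

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
print(lasso_1d(x, y, 0.0))   # 2.0: no penalty, least-squares slope
print(lasso_1d(x, y, 28.0))  # 1.0: partial shrinkage
print(lasso_1d(x, y, 56.0))  # 0.0: the coefficient is set exactly to zero
```

Unlike ridge, a large enough λ drives the coefficient exactly to zero, which is why the ℓ₁ penalty performs feature selection.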
LASSO regression example
Why LASSO leads to sparse solutions
Summary

ERM is a popular induction principle, which underlies many algorithms for regression and pattern recognition.
In high dimension we must be careful.
Constrained ERM provides a coherent and convenient framework:

    min_f R_n(f)  s.t.  Ω(f) ≤ γ

A strong constraint (γ small) reduces the variance but increases the bias.
The key idea to learn in high dimension is to use prior knowledge to design Ω(f) so as to ensure a small bias.
Outline
1 Pattern recognition and regression
2 Support vector machines
3 Classification of CGH data
4 Classification of expression data
5 Classification of biological sequences
6 Virtual screening and QSAR
7 Conclusion
Motivation
SVM is just a particular constrained ERM algorithm.
It became extremely popular in many applied fields over the last 10 years.
It considerably extends the hypothesis space F beyond linear functions, thanks to the use of positive definite kernels.
It also extends most linear methods to structured objects, e.g., strings and graphs.
Linear SVM for pattern recognition

[Figure: the hinge loss ℓ(f(x), y) as a function of the margin yf(x), with a hinge at 1]

X = R^p, Y = {−1, 1}
Linear classifiers:

    f_β(x) = xᵀβ.

The loss function is the hinge loss:

    φ_hinge(u) = max(1 − u, 0) = 0 if u ≥ 1, 1 − u otherwise.
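The hinge loss is straightforward to write down; a one-line sketch:

```python
def hinge(u):
    """phi_hinge(u) = max(1 - u, 0): zero for margins u >= 1, linear below."""
    return max(1.0 - u, 0.0)

print(hinge(2.0))   # 0.0: correctly classified with margin >= 1, no loss
print(hinge(0.0))   # 1.0: on the decision boundary
print(hinge(-1.0))  # 2.0: confidently misclassified
```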
Linear SVM for pattern recognition

SVM solves the problem:

    min_{f_β∈F} (1/n) Σ_{i=1}^n φ_hinge(y_i f_β(x_i))  s.t.  ‖β‖₂² ≤ γ.

Equivalently:

    min_{f_β∈F} (1/n) Σ_{i=1}^n φ_hinge(y_i f_β(x_i)) + λ‖β‖₂².
Dual problem

This is a convex optimization problem. It is equivalent to the following dual problem (a good exercise to derive it):

    max_{α∈R^n} 2 Σ_{i=1}^n α_i y_i − Σ_{i,j=1}^n α_i α_j x_iᵀx_j,

subject to:

    0 ≤ y_i α_i ≤ 1/(2λn), for i = 1, …, n.

If α solves this problem, we recover the solution of the primal problem by:

    f_β(x) = Σ_{i=1}^n α_i x_iᵀx.
f_β(x) = Σ_{i=1}^n α_i x_iᵀx

[Figure: the linear decision function with level sets f(x) = −1, f(x) = 0 and f(x) = +1; points away from the margin have α = 0, points on the margin have 0 < αy < 1/(2λn), and points violating the margin have αy = 1/(2λn)]
Support vectors

Consequence of the KKT conditions
The training points with α_i ≠ 0 are called support vectors.
Only support vectors matter for the classification of new points:

    ∀x ∈ X, f(x) = Σ_{i=1}^n α_i x_iᵀx = Σ_{i∈SV} α_i x_iᵀx,

where SV is the set of support vectors.

Consequences
The solution is sparse in α, leading to fast training algorithms (decomposition methods).
The classification of a new point only involves kernel evaluations with support vectors (fast).
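The sparse expansion over support vectors translates directly into code; a sketch with a linear kernel (the support set, weights and query point are made up for illustration):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def predict(x, support_vectors, alphas):
    """f(x) = sum over support vectors of alpha_i <x_i, x>; all other
    training points have alpha_i = 0 and can be discarded after training."""
    return sum(a * dot(sv, x) for sv, a in zip(support_vectors, alphas))

support_vectors = [[1.0, 0.0], [0.0, 1.0]]
alphas = [1.0, -1.0]
print(predict([2.0, 0.5], support_vectors, alphas))  # 2.0 - 0.5 = 1.5 -> class +1
```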
An important remark

Training an SVM means finding the α ∈ R^n which solves:

    max_{α∈R^n} 2 Σ_{i=1}^n α_i y_i − Σ_{i,j=1}^n α_i α_j x_iᵀx_j,

subject to:

    0 ≤ y_i α_i ≤ 1/(2λn), for i = 1, …, n.

The prediction for a new point x is the sign of

    f(x) = Σ_{i=1}^n α_i x_iᵀx.
Kernels

Define the kernel function:

    K(x, x′) = xᵀx′.

Training an SVM means finding the α ∈ R^n which solves:

    max_{α∈R^n} 2 Σ_{i=1}^n α_i y_i − Σ_{i,j=1}^n α_i α_j K(x_i, x_j),

subject to:

    0 ≤ y_i α_i ≤ 1/(2λn), for i = 1, …, n.

The prediction for a new point x is the sign of

    f(x) = Σ_{i=1}^n α_i K(x_i, x).
Extension

Let X be any set, and

    Φ : X → H

an embedding into a Hilbert space (H = R^p with p finite or infinite).
Then we can train and use an SVM implicitly in H if we are able to compute the kernel:

    K(x, x′) = Φ(x)ᵀΦ(x′).
Example: polynomial kernel

[Figure: mapping the plane R² (coordinates x₁, x₂) into R³ makes the two classes linearly separable]

For x = (x₁, x₂)ᵀ ∈ R², let Φ(x) = (x₁², √2 x₁x₂, x₂²) ∈ R³:

    K(x, x′) = x₁²x′₁² + 2x₁x₂x′₁x′₂ + x₂²x′₂²
             = (x₁x′₁ + x₂x′₂)²
             = (xᵀx′)².
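The identity above is easy to check numerically: the explicit feature map Φ and the kernel (xᵀx′)² give the same value, up to floating-point error (the data points are arbitrary):

```python
import math

def phi(x):
    """Explicit feature map R^2 -> R^3 for the degree-2 polynomial kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def k_poly2(x, xp):
    """K(x, x') = (x^T x')^2, computed without the feature map."""
    return (x[0] * xp[0] + x[1] * xp[1]) ** 2

x, xp = (1.0, 2.0), (3.0, -1.0)
lhs = sum(a * b for a, b in zip(phi(x), phi(xp)))  # <Phi(x), Phi(x')>
rhs = k_poly2(x, xp)                               # (1*3 + 2*(-1))^2 = 1.0
print(lhs, rhs)  # both ~= 1.0
```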
Positive Definite (p.d.) Kernels

Definition
A positive definite (p.d.) kernel on the set X is a function K : X × X → R that is symmetric:

    ∀(x, x′) ∈ X², K(x, x′) = K(x′, x),

and which satisfies, for all N ∈ N, (x₁, x₂, …, x_N) ∈ X^N and (a₁, a₂, …, a_N) ∈ R^N:

    Σ_{i=1}^N Σ_{j=1}^N a_i a_j K(x_i, x_j) ≥ 0.
Characterization of inner products

Theorem (Aronszajn, 1950)
K is a p.d. kernel on the set X if and only if there exists a Hilbert space H and a mapping

    Φ : X → H,

such that, for any x, x′ in X:

    K(x, x′) = ⟨Φ(x), Φ(x′)⟩_H.

[Figure: the feature map Φ from the input space X to the feature space]
Examples

Classical kernels for vectors (X = R^p) include:
The linear kernel:

    K_lin(x, x′) = xᵀx′.

The polynomial kernel:

    K_poly(x, x′) = (xᵀx′ + a)^d.

The Gaussian RBF kernel:

    K_Gaussian(x, x′) = exp(−‖x − x′‖² / (2σ²)).
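The three classical kernels are a few lines each; a self-contained sketch (the parameter defaults are ours):

```python
import math

def k_lin(x, xp):
    return sum(a * b for a, b in zip(x, xp))

def k_poly(x, xp, a=1.0, d=2):
    return (k_lin(x, xp) + a) ** d

def k_gaussian(x, xp, sigma=1.0):
    sq = sum((a - b) ** 2 for a, b in zip(x, xp))
    return math.exp(-sq / (2.0 * sigma ** 2))

x, xp = [1.0, 0.0], [0.0, 1.0]
print(k_lin(x, xp))       # 0.0 (orthogonal vectors)
print(k_poly(x, xp))      # (0 + 1)^2 = 1.0
print(k_gaussian(x, xp))  # exp(-1) ~= 0.368
```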
Summary
Kernels allow linear methods to be applied in a much larger space (F increases, the bias decreases) without changing the algorithm.
This generalizes to any ERM constrained by the Euclidean norm (kernel ridge regression, ...).
Kernels allow nonlinear functions to be inferred.
They allow working with non-vector spaces (see later: strings, graphs, ...).
Prior knowledge can be included in the kernel.
Outline
1 Pattern recognition and regression
2 Support vector machines
3 Classification of CGH data
4 Classification of expression data
5 Classification of biological sequences
6 Virtual screening and QSAR
7 Conclusion
Motivation
Comparative genomic hybridization (CGH) data measure the DNA copy number along the genome.
They are very useful, in particular in cancer research.
Can we classify CGH arrays for diagnosis or prognosis purposes?
Prior knowledge
Let x be a CGH profile.
We focus on linear classifiers, i.e., on the sign of:

    f(x) = xᵀβ.

We expect β to be:
    sparse: only a few positions should be discriminative
    piecewise constant: within a region, all probes should contribute equally
Example: CGH array classification

A solution (Rapaport et al., 2008):

    Ω_fusedlasso(β) = Σ_i |β_i| + Σ_{i∼j} |β_i − β_j|.

Good performance on diagnosis for bladder cancer, and on prognosis for melanoma.
More interpretable classifiers.
[Figure: classifier weights (in [−1, 1]) plotted along BAC positions (0–4000); the fused lasso solution is sparse and piecewise constant]
Outline
1 Pattern recognition and regression
2 Support vector machines
3 Classification of CGH data
4 Classification of expression data
5 Classification of biological sequences
6 Virtual screening and QSAR
7 Conclusion
Outline
4 Classification of expression data
    Motivation
    Using gene networks as prior knowledge
    Application
Tissue profiling with DNA chips
Data
Gene expression measures for more than 10k genes
Measured typically on less than 100 samples of two (or more) different classes (e.g., different tumors)
Tissue classification from microarray data
Goal
Design a classifier to automatically assign a class to future samples from their expression profile
Interpret biologically the differences between the classes
Linear classifiers

The approach
Each sample is represented by a vector x = (x₁, …, x_p), where p > 10⁵ is the number of probes.
Classification: given the set of labeled samples, learn a linear decision function:

    f_β(x) = Σ_{i=1}^p β_i x_i + β₀,

that is positive for one class and negative for the other.
Interpretation: the weight β_i quantifies the influence of gene i on the classification.
We must use prior knowledge for this "small n, large p" problem.
Outline
4 Classification of expression data
    Motivation
    Using gene networks as prior knowledge
    Application
Gene networks

[Figure: a gene network whose modules include glycan biosynthesis; protein kinases; DNA and RNA polymerase subunits; glycolysis/gluconeogenesis; sulfur metabolism; porphyrin and chlorophyll metabolism; riboflavin metabolism; folate biosynthesis; biosynthesis of steroids and ergosterol metabolism; lysine biosynthesis; phenylalanine, tyrosine and tryptophan biosynthesis; purine metabolism; oxidative phosphorylation and the TCA cycle; nitrogen and asparagine metabolism]
Gene network interpretation
Motivation
Basic biological functions usually involve the coordinated action of several proteins:
    Formation of protein complexes
    Activation of metabolic, signalling or regulatory pathways
Many pathways and protein-protein interactions are already known.
Hypothesis: the weights of the classifier should be "coherent" with respect to this prior knowledge.
The idea
1 Use the gene network to extract the "important information" in gene expression profiles by Fourier analysis on the graph
2 Learn a linear classifier on the smooth components
Notations

[Figure: a graph on vertices 1–5 with edges 1–3, 2–3, 3–4 and 4–5]

    A = [ 0 0 1 0 0
          0 0 1 0 0
          1 1 0 1 0
          0 0 1 0 1
          0 0 0 1 0 ]

    D = diag(1, 1, 3, 2, 1)
Graph Laplacian

Definition
The Laplacian of the graph is the matrix L = D − A.

[Figure: the same graph on vertices 1–5]

    L = D − A = [  1  0 −1  0  0
                   0  1 −1  0  0
                  −1 −1  3 −1  0
                   0  0 −1  2 −1
                   0  0  0 −1  1 ]
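Building L = D − A from an adjacency matrix takes a few lines; a sketch on the 5-vertex example above (the function name is ours):

```python
def laplacian(A):
    """L = D - A, where D is the diagonal matrix of vertex degrees."""
    n = len(A)
    return [[(sum(A[i]) if i == j else 0) - A[i][j] for j in range(n)]
            for i in range(n)]

A = [[0, 0, 1, 0, 0],
     [0, 0, 1, 0, 0],
     [1, 1, 0, 1, 0],
     [0, 0, 1, 0, 1],
     [0, 0, 0, 1, 0]]
L = laplacian(A)
print(L[2])  # [-1, -1, 3, -1, 0]: vertex 3 has degree 3
```

Every row of L sums to zero, which reflects that constant vectors are in the kernel of L.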
Properties of the Laplacian

Lemma
Let L = D − A be the Laplacian of the graph:
For any f : X → R,

    fᵀLf = Σ_{i∼j} (f(x_i) − f(x_j))²

L is a symmetric positive semi-definite matrix.
0 is an eigenvalue, with multiplicity equal to the number of connected components.
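The quadratic-form identity fᵀLf = Σ_{i∼j}(f(x_i) − f(x_j))² can be checked numerically on the example graph (the test vector f is arbitrary):

```python
A = [[0, 0, 1, 0, 0],
     [0, 0, 1, 0, 0],
     [1, 1, 0, 1, 0],
     [0, 0, 1, 0, 1],
     [0, 0, 0, 1, 0]]
n = len(A)
L = [[(sum(A[i]) if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]

f = [1.0, 2.0, 3.0, 4.0, 5.0]
quad = sum(f[i] * L[i][j] * f[j] for i in range(n) for j in range(n))
edges = [(i, j) for i in range(n) for j in range(i + 1, n) if A[i][j]]
smoothness = sum((f[i] - f[j]) ** 2 for i, j in edges)
print(quad, smoothness)  # both 7.0: f^T L f equals the sum of squared differences
```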
Proof: link between Ω(f) and L

    Σ_{i∼j} (f(x_i) − f(x_j))² = Σ_{i∼j} ( f(x_i)² + f(x_j)² − 2 f(x_i) f(x_j) )
                               = Σ_{i=1}^m D_{i,i} f(x_i)² − 2 Σ_{i∼j} f(x_i) f(x_j)
                               = fᵀDf − fᵀAf
                               = fᵀLf
Proof: eigenstructure of L

L is symmetric because A and D are symmetric.
For any f ∈ R^m, fᵀLf ≥ 0, therefore the (real-valued) eigenvalues of L are ≥ 0: L is positive semi-definite.
f is an eigenvector associated with the eigenvalue 0
    iff fᵀLf = 0,
    iff Σ_{i∼j} (f(x_i) − f(x_j))² = 0,
    iff f(x_i) = f(x_j) whenever i ∼ j,
    iff f is constant (when the graph is connected).
Fourier basis

Definition
The eigenvectors e₁, …, e_n of L, with eigenvalues 0 = λ₁ ≤ … ≤ λ_n, form a basis called the Fourier basis.
For any f : V → R, the Fourier transform of f is the vector f̂ ∈ R^n defined by:

    f̂_i = fᵀe_i, i = 1, …, n.

The inverse Fourier formula holds:

    f = Σ_{i=1}^n f̂_i e_i.
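On the two-vertex graph with a single edge, the Laplacian is L = [[1, −1], [−1, 1]], whose eigenpairs are known in closed form (λ = 0 with e₁ = (1, 1)/√2, and λ = 2 with e₂ = (1, −1)/√2), so the Fourier transform and its inverse can be checked by hand; a sketch:

```python
import math

s = 1.0 / math.sqrt(2.0)
basis = [([s, s], 0.0), ([s, -s], 2.0)]  # (eigenvector, eigenvalue) pairs of L

f = [3.0, 1.0]
fhat = [sum(a * b for a, b in zip(f, e)) for e, lam in basis]  # fhat_i = f^T e_i
recon = [sum(fh * e[i] for fh, (e, lam) in zip(fhat, basis)) for i in range(2)]
print(fhat)   # [2.828..., 1.414...]: (f1 + f2)/sqrt(2) and (f1 - f2)/sqrt(2)
print(recon)  # [3.0, 1.0] up to rounding: the inverse formula recovers f
```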
Fourier basis

[Figure: Laplacian eigenvectors of a graph, ordered by eigenvalue λ = 0, 0.5, 1, 2.3 and 4.2]
Smoothing operator

Definition
Let φ : R⁺ → R⁺ be non-increasing.
A smoothing operator S_φ transforms a function f : V → R into a smoothed version:

    S_φ(f) = Σ_{i=1}^n f̂_i φ(λ_i) e_i.
Smoothing operators

Examples
Identity operator (S_φ(f) = f):

    φ(λ) = 1, ∀λ

Low-pass filter:

    φ(λ) = 1 if λ ≤ λ*, 0 otherwise.

Attenuation of high frequencies:

    φ(λ) = exp(−βλ).
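A sketch of S_φ with the high-frequency attenuation filter φ(λ) = exp(−βλ), again on the two-vertex graph with one edge, whose Laplacian eigenpairs are known in closed form (the data are ours):

```python
import math

s = 1.0 / math.sqrt(2.0)
basis = [([s, s], 0.0), ([s, -s], 2.0)]  # eigenpairs of L = [[1, -1], [-1, 1]]

def smooth(f, beta):
    """S_phi(f) = sum_i fhat_i * exp(-beta * lambda_i) * e_i."""
    out = [0.0] * len(f)
    for e, lam in basis:
        fhat = sum(a * b for a, b in zip(f, e))
        w = math.exp(-beta * lam)
        for i in range(len(f)):
            out[i] += fhat * w * e[i]
    return out

print(smooth([3.0, 1.0], 0.0))   # beta = 0 is the identity: ~[3.0, 1.0]
print(smooth([3.0, 1.0], 50.0))  # strong smoothing: both values -> the mean 2.0
```

The λ = 0 component (the mean over connected vertices) always passes through, while the high-frequency component is damped as β grows.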
Supervised classification and regression
Working with smoothed profiles
Classical methods for linear classification and regression with a ridge penalty solve:

min_{β∈R^p} (1/n) Σ_{i=1}^n l(β^T f_i, y_i) + λ β^T β .

Applying these algorithms to the smoothed profiles means solving:

min_{β∈R^p} (1/n) Σ_{i=1}^n l(β^T Sφ(f_i), y_i) + λ β^T β .
![Page 103: Jean-Philippe Vert Jean-Philippe.Vert@ensmpcbio.mines-paristech.fr/~jvert/talks/081127sfds/sfds.pdfMachine learning in post-genomic Jean-Philippe Vert Jean-Philippe.Vert@ensmp.fr Mines](https://reader033.vdocument.in/reader033/viewer/2022041908/5e6576b13a79fb76607f7a1f/html5/thumbnails/103.jpg)
Smooth solution
Lemma
This is equivalent to:

min_{v∈R^p} (1/n) Σ_{i=1}^n l(v^T f_i, y_i) + λ Σ_{i=1}^p v_i² / φ(λ_i)² ,

hence the linear classifier v is smooth.

Proof
Let v = Σ_{i=1}^p φ(λ_i) e_i e_i^T β. Then

β^T Sφ(f_i) = β^T Σ_{j=1}^p (f̂_i)_j φ(λ_j) e_j = f_i^T v .

In the eigenbasis, v_i = φ(λ_i) β_i, hence β^T β = Σ_{i=1}^p v_i² / φ(λ_i)² .
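The lemma can be checked numerically; this sketch assumes a random undirected graph and the attenuation filter φ(λ) = exp(−0.3λ), both illustrative choices:

```python
import numpy as np

# Check: with v = sum_i phi(lam_i) e_i e_i^T beta, the loss terms agree
# (beta^T S_phi(f) = f^T v) and the penalties agree
# (beta^T beta = sum_i v_hat_i^2 / phi(lam_i)^2).
rng = np.random.default_rng(0)
p = 6
A = (rng.random((p, p)) < 0.5).astype(float)
A = np.triu(A, 1); A = A + A.T                  # random undirected graph
L = np.diag(A.sum(axis=1)) - A
lam, E = np.linalg.eigh(L)

phi = np.exp(-0.3 * lam)                        # strictly positive filter
beta = rng.standard_normal(p)
f = rng.standard_normal(p)

S_f = E @ (phi * (E.T @ f))                     # S_phi(f)
v = E @ (phi * (E.T @ beta))                    # v as in the proof
v_hat = E.T @ v                                 # coordinates of v in the eigenbasis

assert np.isclose(beta @ S_f, f @ v)
assert np.isclose(beta @ beta, np.sum(v_hat**2 / phi**2))
```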
![Page 105: Jean-Philippe Vert Jean-Philippe.Vert@ensmpcbio.mines-paristech.fr/~jvert/talks/081127sfds/sfds.pdfMachine learning in post-genomic Jean-Philippe Vert Jean-Philippe.Vert@ensmp.fr Mines](https://reader033.vdocument.in/reader033/viewer/2022041908/5e6576b13a79fb76607f7a1f/html5/thumbnails/105.jpg)
Kernel methods
Smoothing kernel
Kernel methods (SVM, kernel ridge regression...) only need the inner product between smoothed profiles:

K(f, g) = Sφ(f)^T Sφ(g)
        = Σ_{i=1}^p f̂_i ĝ_i φ(λ_i)²
        = f^T ( Σ_{i=1}^p φ(λ_i)² e_i e_i^T ) g
        = f^T Kφ g ,        (5)

with

Kφ = Σ_{i=1}^p φ(λ_i)² e_i e_i^T .
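A short sketch verifying the identity Sφ(f)^T Sφ(g) = f^T Kφ g, on an assumed toy path graph with an illustrative filter:

```python
import numpy as np

# Build K_phi = sum_i phi(lam_i)^2 e_i e_i^T and compare both sides of (5).
n = 4
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A
lam, E = np.linalg.eigh(L)
phi = np.exp(-lam)                          # assumed filter phi(l) = exp(-l)

K_phi = E @ np.diag(phi**2) @ E.T           # smoothing kernel matrix
f = np.array([1.0, 0.0, -1.0, 2.0])
g = np.array([0.5, 1.0, 0.0, -1.0])
Sf = E @ (phi * (E.T @ f))                  # smoothed profiles
Sg = E @ (phi * (E.T @ g))
assert np.isclose(Sf @ Sg, f @ K_phi @ g)
```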
![Page 106: Jean-Philippe Vert Jean-Philippe.Vert@ensmpcbio.mines-paristech.fr/~jvert/talks/081127sfds/sfds.pdfMachine learning in post-genomic Jean-Philippe Vert Jean-Philippe.Vert@ensmp.fr Mines](https://reader033.vdocument.in/reader033/viewer/2022041908/5e6576b13a79fb76607f7a1f/html5/thumbnails/106.jpg)
Examples
For φ(λ) = exp(−tλ), we recover the diffusion kernel:

Kφ = exp_M(−2tL) ,

where exp_M denotes the matrix exponential. For φ(λ) = 1/√(1 + λ), we obtain

Kφ = (L + I)^{−1} ,

and the penalization is:

Σ_{i=1}^p v_i² / φ(λ_i)² = v^T (L + I) v = ‖v‖₂² + Σ_{i∼j} (v_i − v_j)² .
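Both examples can be verified on a small graph; this sketch assumes a triangle graph and computes the matrix exponential through the eigendecomposition of the (symmetric) Laplacian:

```python
import numpy as np

# Triangle graph: check K_phi = expm(-2tL) for phi(l)=exp(-t*l), and
# K_phi = (L+I)^{-1} plus the penalty identity for phi(l)=1/sqrt(1+l).
A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], float)
L = np.diag(A.sum(axis=1)) - A
lam, E = np.linalg.eigh(L)

t = 0.7
K_diff = E @ np.diag(np.exp(-lam * t) ** 2) @ E.T   # phi(l)^2 = exp(-2tl)
expm = E @ np.diag(np.exp(-2 * t * lam)) @ E.T      # expm(-2tL) via eigendecomposition
assert np.allclose(K_diff, expm)

K_reg = E @ np.diag(1.0 / (1.0 + lam)) @ E.T        # phi(l)^2 = 1/(1+l)
assert np.allclose(K_reg, np.linalg.inv(L + np.eye(3)))

v = np.array([1.0, -2.0, 0.5])
edges = [(0, 1), (0, 2), (1, 2)]
penalty = v @ v + sum((v[i] - v[j]) ** 2 for i, j in edges)
assert np.isclose(v @ (L + np.eye(3)) @ v, penalty)
```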
![Page 108: Jean-Philippe Vert Jean-Philippe.Vert@ensmpcbio.mines-paristech.fr/~jvert/talks/081127sfds/sfds.pdfMachine learning in post-genomic Jean-Philippe Vert Jean-Philippe.Vert@ensmp.fr Mines](https://reader033.vdocument.in/reader033/viewer/2022041908/5e6576b13a79fb76607f7a1f/html5/thumbnails/108.jpg)
Outline
4 Classification of expression data
Motivation
Using gene networks as prior knowledge
Application
![Page 109: Jean-Philippe Vert Jean-Philippe.Vert@ensmpcbio.mines-paristech.fr/~jvert/talks/081127sfds/sfds.pdfMachine learning in post-genomic Jean-Philippe Vert Jean-Philippe.Vert@ensmp.fr Mines](https://reader033.vdocument.in/reader033/viewer/2022041908/5e6576b13a79fb76607f7a1f/html5/thumbnails/109.jpg)
Data
Expression
Study the effect of low irradiation doses on the yeast: 12 non-irradiated vs 6 irradiated samples. Which pathways are involved in the response at the transcriptomic level?

Graph
KEGG database of metabolic pathways. Two genes are connected if they code for enzymes that catalyze successive reactions in a pathway (metabolic gene network): 737 genes, 4694 edges.
![Page 110: Jean-Philippe Vert Jean-Philippe.Vert@ensmpcbio.mines-paristech.fr/~jvert/talks/081127sfds/sfds.pdfMachine learning in post-genomic Jean-Philippe Vert Jean-Philippe.Vert@ensmp.fr Mines](https://reader033.vdocument.in/reader033/viewer/2022041908/5e6576b13a79fb76607f7a1f/html5/thumbnails/110.jpg)
Classification performance
Spectral analysis of gene expression profiles using gene networks
[Fig. 2 panels — principal component annotations. a) PC1: pyruvate metabolism, glucose metabolism, carbohydrate metabolism, ergosterol biosynthesis; PC2: trehalose biosynthesis, carboxylic acid metabolism; PC3: KEGG glycolysis. b) PC1: protein kinases cluster, RNA and DNA polymerase subunits; PC2: Glycolysis/Gluconeogenesis, citrate cycle (TCA cycle), CO2 fixation; PC3: N-Glycan biosynthesis, glycerophospholipid metabolism, alanine and aspartate metabolism, riboflavin metabolism.]

Fig. 2. PCA plots of the initial expression profiles (a) and the transformed profiles using network topology (80% of the eigenvalues removed) (b). The green squares are non-irradiated samples and the red rhombuses are irradiated samples. Individual sample labels are shown together with GO and KEGG annotations associated with each principal component.

[Fig. 3 plots — LOO hinge loss / error as a function of β (left) and of the percentage of eigenvalues removed (right).]

Fig. 3. Performance of the supervised classification when changing the metric with the function φexp(λ) = exp(−βλ) for different values of β (left picture), or the function φthres(λ) = 1(λ < λ0) for different values of λ0 (i.e., keeping only a fraction of the smallest eigenvalues, right picture). The performance is estimated from the number of misclassifications in a leave-one-out error.

shift). The reconstruction of this from our data with no prior input of this knowledge strongly confirms the relevance of our analysis method. It also shows that analysing expression in terms of the global up- or down-regulation of entire pathways as defined, for example, by KEGG, could mislead, as there are many antagonist processes that take place inside pathways. Representing KEGG as a large network helps keep the biochemical relationships between genes without the constraints of pathway limits.
![Page 111: Jean-Philippe Vert Jean-Philippe.Vert@ensmpcbio.mines-paristech.fr/~jvert/talks/081127sfds/sfds.pdfMachine learning in post-genomic Jean-Philippe Vert Jean-Philippe.Vert@ensmp.fr Mines](https://reader033.vdocument.in/reader033/viewer/2022041908/5e6576b13a79fb76607f7a1f/html5/thumbnails/111.jpg)
Classifier (Rapaport et al.)

[Fig. 4 network annotations: N-Glycan biosynthesis; protein kinases; DNA and RNA polymerase subunits; Glycolysis/Gluconeogenesis; sulfur metabolism; porphyrin and chlorophyll metabolism; riboflavin metabolism; folate biosynthesis; biosynthesis of steroids, ergosterol metabolism; lysine biosynthesis; phenylalanine, tyrosine and tryptophan biosynthesis; purine metabolism; oxidative phosphorylation, TCA cycle; nitrogen, asparagine metabolism.]

Fig. 4. Global connection map of KEGG with mapped coefficients of the decision function obtained by applying a customary linear SVM (left) and using high-frequency eigenvalue attenuation (80% of high-frequency eigenvalues have been removed) (right). Spectral filtering divided the whole network into modules having coordinated responses, with the activation of low-frequency eigenmodes being determined by microarray data. Positive coefficients are marked in red, negative coefficients in green, and the intensity of the colour reflects the absolute values of the coefficients. Rhombuses highlight proteins participating in the Glycolysis/Gluconeogenesis KEGG pathway. Some other parts of the network are annotated, including big highly connected clusters corresponding to protein kinases and DNA and RNA polymerase subunits.
5 DISCUSSION

Our algorithm groups predictor variables according to highly connected "modules" of the global gene network. We assume that the genes within a tightly connected network module are likely to contribute similarly to the prediction function because of the interactions between the genes. This motivates the filtering of gene expression profiles to remove the noisy high-frequency modes of the network.

Such grouping of variables is a very useful feature of the resulting classification function, because the function becomes meaningful for interpreting and suggesting biological factors that cause the class separation. This allows classifications based on functions, pathways and network modules rather than on individual genes, which can lead to a more robust behaviour of the classifier in independent tests and to equal if not better classification results. Our results on the dataset we analysed show only a slight improvement, although this may be due to its limited size. Therefore we are currently extending our work to larger data sets.

An important remark to bear in mind when analyzing pictures such as Fig. 4 and 5 is that the colors represent the weights of the classifier, and not gene expression levels. There is of course a relationship between the classifier weights and the typical expression levels of genes in irradiated and non-irradiated samples: irradiated samples tend to have expression profiles positively correlated with the classifier, while non-irradiated samples tend to be negatively correlated. Roughly speaking, the classifier tries to find a smooth function that has this property. If more samples were available, a better non-smooth classifier might be learned by the algorithm, but constraining the smoothness of the classifier is a way to reduce the complexity of the learning problem when a limited number of samples are available. This means in particular that the pictures provide virtually no information regarding the over-
![Page 112: Jean-Philippe Vert Jean-Philippe.Vert@ensmpcbio.mines-paristech.fr/~jvert/talks/081127sfds/sfds.pdfMachine learning in post-genomic Jean-Philippe Vert Jean-Philippe.Vert@ensmp.fr Mines](https://reader033.vdocument.in/reader033/viewer/2022041908/5e6576b13a79fb76607f7a1f/html5/thumbnails/112.jpg)
Classifier

Spectral analysis of gene expression profiles using gene networks

Fig. 5. The glycolysis/gluconeogenesis pathways of KEGG with mapped coefficients of the decision function obtained by applying a customary linear SVM (a) and using high-frequency eigenvalue attenuation (b). The pathways are mutually exclusive in a cell, as clearly highlighted by our algorithm.
or under-expression of individual genes, which is the cost to pay to obtain instead an interpretation in terms of more global pathways. Constraining the classifier to rely on just a few genes would have a similar effect of reducing the complexity of the problem, but would lead to a more difficult interpretation in terms of pathways.

An advantage of our approach over other pathway-based clustering methods is that we consider the network modules that naturally appear from spectral analysis rather than a historically defined separation of the network into pathways. Thus, pathway cross-talk is taken into account, which is difficult to do using other approaches. It can however be noticed that the implicit decomposition into pathways that we obtain is biased by the very incomplete knowledge of the network, and that certain regions of the network are better understood, leading to a higher connection concentration.

Like most approaches aiming at comparing expression data with gene networks such as KEGG, the scope of this work is limited by two important constraints. First, the gene network we use is only a convenient but rough approximation to describe complex biochemical processes; second, the transcriptional analysis of a sample cannot give any information regarding post-transcriptional regulation and modifications. Nevertheless, we believe that our basic assumptions remain valid, in that we assume that the expression of the genes belonging to the same metabolic pathway module is coordinately regulated. Our interpretation of the results supports this assumption.

Another important caveat is that we simplify the network description as an undirected graph of interactions. Although this would seem to be relevant for simplifying the description of metabolic networks, real gene regulation networks are influenced by the direction, sign and importance of the interaction. Although the incorporation of weights into the Laplacian (equation 1) is straightforward and allows the extension of the approach to weighted undirected graphs, the incorporation of directions and signs to represent signalling or regulatory pathways requires more work but could lead to important advances for the interpretation of microarray data in cancer studies, for example.

ACKNOWLEDGMENTS

This work was supported by the grant ACI-IMPBIO-2004-47 of the French Ministry for Research and New Technologies.
![Page 113: Jean-Philippe Vert Jean-Philippe.Vert@ensmpcbio.mines-paristech.fr/~jvert/talks/081127sfds/sfds.pdfMachine learning in post-genomic Jean-Philippe Vert Jean-Philippe.Vert@ensmp.fr Mines](https://reader033.vdocument.in/reader033/viewer/2022041908/5e6576b13a79fb76607f7a1f/html5/thumbnails/113.jpg)
Outline
1 Pattern recognition and regression
2 Support vector machines
3 Classification of CGH data
4 Classification of expression data
5 Classification of biological sequences
6 Virtual screening and QSAR
7 Conclusion
![Page 114: Jean-Philippe Vert Jean-Philippe.Vert@ensmpcbio.mines-paristech.fr/~jvert/talks/081127sfds/sfds.pdfMachine learning in post-genomic Jean-Philippe Vert Jean-Philippe.Vert@ensmp.fr Mines](https://reader033.vdocument.in/reader033/viewer/2022041908/5e6576b13a79fb76607f7a1f/html5/thumbnails/114.jpg)
Outline
5 Classification of biological sequences
Motivation
Feature space approach
Using generative models
Derive from a similarity measure
Application: remote homology detection
![Page 115: Jean-Philippe Vert Jean-Philippe.Vert@ensmpcbio.mines-paristech.fr/~jvert/talks/081127sfds/sfds.pdfMachine learning in post-genomic Jean-Philippe Vert Jean-Philippe.Vert@ensmp.fr Mines](https://reader033.vdocument.in/reader033/viewer/2022041908/5e6576b13a79fb76607f7a1f/html5/thumbnails/115.jpg)
Proteins
A : Alanine    V : Valine    L : Leucine
F : Phenylalanine    P : Proline    M : Methionine
E : Glutamic acid    K : Lysine    R : Arginine
T : Threonine    C : Cysteine    N : Asparagine
H : Histidine    Y : Tyrosine    W : Tryptophan
I : Isoleucine    S : Serine    Q : Glutamine
D : Aspartic acid    G : Glycine
![Page 116: Jean-Philippe Vert Jean-Philippe.Vert@ensmpcbio.mines-paristech.fr/~jvert/talks/081127sfds/sfds.pdfMachine learning in post-genomic Jean-Philippe Vert Jean-Philippe.Vert@ensmp.fr Mines](https://reader033.vdocument.in/reader033/viewer/2022041908/5e6576b13a79fb76607f7a1f/html5/thumbnails/116.jpg)
Challenges with protein sequences
A protein sequence can be seen as a variable-length sequence over the 20-letter alphabet of amino acids, e.g., insulin:
FVNQHLCGSHLVEALYLVCGERGFFYTPKA
These sequences are produced at a fast rate (as a result of the sequencing programs).
Need for algorithms to compare, classify and analyze these sequences.
Applications: classification into functional or structural classes, prediction of cellular localization and interactions, ...
![Page 117: Jean-Philippe Vert Jean-Philippe.Vert@ensmpcbio.mines-paristech.fr/~jvert/talks/081127sfds/sfds.pdfMachine learning in post-genomic Jean-Philippe Vert Jean-Philippe.Vert@ensmp.fr Mines](https://reader033.vdocument.in/reader033/viewer/2022041908/5e6576b13a79fb76607f7a1f/html5/thumbnails/117.jpg)
Example: supervised sequence classification
Data (training)
Secreted proteins:
MASKATLLLAFTLLFATCIARHQQRQQQQNQCQLQNIEA...
MARSSLFTFLCLAVFINGCLSQIEQQSPWEFQGSEVW...
MALHTVLIMLSLLPMLEAQNPEHANITIGEPITNETLGWL...
...

Non-secreted proteins:
MAPPSVFAEVPQAQPVLVFKLIADFREDPDPRKVNLGVG...
MAHTLGLTQPNSTEPHKISFTAKEIDVIEWKGDILVVG...
MSISESYAKEIKTAFRQFTDFPIEGEQFEDFLPIIGNP...
...

Goal
Build a classifier to predict whether new proteins are secreted or not.
![Page 118: Jean-Philippe Vert Jean-Philippe.Vert@ensmpcbio.mines-paristech.fr/~jvert/talks/081127sfds/sfds.pdfMachine learning in post-genomic Jean-Philippe Vert Jean-Philippe.Vert@ensmp.fr Mines](https://reader033.vdocument.in/reader033/viewer/2022041908/5e6576b13a79fb76607f7a1f/html5/thumbnails/118.jpg)
Supervised classification with vector embedding
The idea
Map each string x ∈ X to a vector Φ(x) ∈ R^p.
Train a classifier for vectors on the images Φ(x1), . . . , Φ(xn) of the training set (nearest neighbor, linear perceptron, logistic regression, support vector machine...).

[Diagram: sequences in X (maskat..., marssl..., malhtv..., mappsv..., mahtlg..., msises...) mapped by Φ to points in the feature space F.]
![Page 119: Jean-Philippe Vert Jean-Philippe.Vert@ensmpcbio.mines-paristech.fr/~jvert/talks/081127sfds/sfds.pdfMachine learning in post-genomic Jean-Philippe Vert Jean-Philippe.Vert@ensmp.fr Mines](https://reader033.vdocument.in/reader033/viewer/2022041908/5e6576b13a79fb76607f7a1f/html5/thumbnails/119.jpg)
Example: support vector machine
[Diagram: sequences in X mapped by Φ to points in the feature space F.]

SVM algorithm

f(x) = sign( Σ_{i=1}^n α_i y_i Φ(x_i)^T Φ(x) ) ,

where α_1, . . . , α_n solve, under the constraints 0 ≤ α_i ≤ C:

min_α ( (1/2) Σ_{i=1}^n Σ_{j=1}^n α_i α_j y_i y_j Φ(x_i)^T Φ(x_j) − Σ_{i=1}^n α_i ) .
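A hedged sketch of this dual problem on assumed toy data, solved by projected gradient descent (a simple stand-in for a real QP solver; no bias term, matching the formulation above):

```python
import numpy as np

# Toy linearly separable data; Phi is the identity (linear kernel).
X = np.array([[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -2], [-2, -3]], float)
y = np.array([1, 1, 1, -1, -1, -1], float)
C = 10.0
K = X @ X.T                                   # Phi(x_i)^T Phi(x_j)
Q = (y[:, None] * y[None, :]) * K             # Hessian of the dual objective

alpha = np.zeros(len(y))
eta = 0.01                                    # step size, < 2 / lambda_max(Q) here
for _ in range(2000):
    grad = Q @ alpha - 1.0                    # gradient of (1/2)a^T Q a - 1^T a
    alpha = np.clip(alpha - eta * grad, 0.0, C)   # project back onto [0, C]^n

def f(x):
    return np.sign(np.sum(alpha * y * (X @ x)))

assert all(f(X[i]) == y[i] for i in range(len(y)))
```

The projection onto the box [0, C]^n handles the constraints; a production implementation would use a dedicated solver (SMO or similar) instead.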
![Page 120: Jean-Philippe Vert Jean-Philippe.Vert@ensmpcbio.mines-paristech.fr/~jvert/talks/081127sfds/sfds.pdfMachine learning in post-genomic Jean-Philippe Vert Jean-Philippe.Vert@ensmp.fr Mines](https://reader033.vdocument.in/reader033/viewer/2022041908/5e6576b13a79fb76607f7a1f/html5/thumbnails/120.jpg)
Explicit vector embedding
[Diagram: sequences in X mapped by Φ to points in the feature space F.]

Difficulties
How to define the mapping Φ : X → R^p?
No obvious vector embedding for strings in general.
How to include prior knowledge about the strings (grammar, probabilistic model...)?
![Page 121: Jean-Philippe Vert Jean-Philippe.Vert@ensmpcbio.mines-paristech.fr/~jvert/talks/081127sfds/sfds.pdfMachine learning in post-genomic Jean-Philippe Vert Jean-Philippe.Vert@ensmp.fr Mines](https://reader033.vdocument.in/reader033/viewer/2022041908/5e6576b13a79fb76607f7a1f/html5/thumbnails/121.jpg)
Implicit vector embedding with kernels
The kernel trick
Many algorithms just require inner products of the embeddings. We call it a kernel between strings:

K(x, x′) := Φ(x)^T Φ(x′)

Kernels for protein sequences
Kernel methods have been widely investigated since Jaakkola et al.'s seminal paper (1998). What is a good kernel? It should be:
mathematically valid (symmetric, p.d. or c.p.d.)
fast to compute
adapted to the problem (give good performance)
![Page 123: Jean-Philippe Vert Jean-Philippe.Vert@ensmpcbio.mines-paristech.fr/~jvert/talks/081127sfds/sfds.pdfMachine learning in post-genomic Jean-Philippe Vert Jean-Philippe.Vert@ensmp.fr Mines](https://reader033.vdocument.in/reader033/viewer/2022041908/5e6576b13a79fb76607f7a1f/html5/thumbnails/123.jpg)
Kernel engineering for protein sequences
Define a (possibly high-dimensional) feature space of interest:
Physico-chemical kernels
Spectrum, mismatch, substring kernels
Pairwise, motif kernels

Derive a kernel from a generative model:
Fisher kernel
Mutual information kernel
Marginalized kernel

Derive a kernel from a similarity measure:
Local alignment kernel
![Page 126: Jean-Philippe Vert Jean-Philippe.Vert@ensmpcbio.mines-paristech.fr/~jvert/talks/081127sfds/sfds.pdfMachine learning in post-genomic Jean-Philippe Vert Jean-Philippe.Vert@ensmp.fr Mines](https://reader033.vdocument.in/reader033/viewer/2022041908/5e6576b13a79fb76607f7a1f/html5/thumbnails/126.jpg)
Outline
5 Classification of biological sequences
Motivation
Feature space approach
Using generative models
Derive from a similarity measure
Application: remote homology detection
![Page 127: Jean-Philippe Vert Jean-Philippe.Vert@ensmpcbio.mines-paristech.fr/~jvert/talks/081127sfds/sfds.pdfMachine learning in post-genomic Jean-Philippe Vert Jean-Philippe.Vert@ensmp.fr Mines](https://reader033.vdocument.in/reader033/viewer/2022041908/5e6576b13a79fb76607f7a1f/html5/thumbnails/127.jpg)
Vector embedding for strings
The idea
Represent each sequence x by a fixed-length numerical vector Φ(x) ∈ R^p. How to perform this embedding?

Physico-chemical kernel
Extract relevant features, such as:
the length of the sequence
time-series analysis of numerical physico-chemical properties of the amino acids along the sequence (e.g., polarity, hydrophobicity), using for example Fourier transforms (Wang et al., 2004) or autocorrelation functions (Zhang et al., 2003):

r_j = (1/(n − j)) Σ_{i=1}^{n−j} h_i h_{i+j}
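A small sketch of the autocorrelation features; the hydrophobicity values follow the Kyte-Doolittle scale (for a subset of residues), and the sequence is an arbitrary example:

```python
# Autocorrelation features r_j = 1/(n-j) * sum_i h_i * h_{i+j} of a
# physico-chemical profile h along the sequence.
hydrophobicity = {"A": 1.8, "V": 4.2, "L": 3.8, "F": 2.8, "G": -0.4,
                  "S": -0.8, "T": -0.7, "K": -3.9, "E": -3.5, "Q": -3.5}

def autocorrelation(seq, j, scale):
    h = [scale[a] for a in seq]
    n = len(h)
    return sum(h[i] * h[i + j] for i in range(n - j)) / (n - j)

# A few lags give a fixed-length feature vector regardless of sequence length.
features = [autocorrelation("AVLFGSTKE", j, hydrophobicity) for j in (1, 2, 3)]
assert len(features) == 3
```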
![Page 129: Jean-Philippe Vert Jean-Philippe.Vert@ensmpcbio.mines-paristech.fr/~jvert/talks/081127sfds/sfds.pdfMachine learning in post-genomic Jean-Philippe Vert Jean-Philippe.Vert@ensmp.fr Mines](https://reader033.vdocument.in/reader033/viewer/2022041908/5e6576b13a79fb76607f7a1f/html5/thumbnails/129.jpg)
Substring indexation
The approach
Alternatively, index the feature space by fixed-length strings, i.e.,

Φ(x) = (Φ_u(x))_{u ∈ A^k}

where Φ_u(x) can be:
the number of occurrences of u in x (without gaps): spectrum kernel (Leslie et al., 2002)
the number of occurrences of u in x up to m mismatches (without gaps): mismatch kernel (Leslie et al., 2004)
the number of occurrences of u in x allowing gaps, with a weight decaying exponentially with the number of gaps: substring kernel (Lodhi et al., 2002)
![Page 130: Jean-Philippe Vert Jean-Philippe.Vert@ensmpcbio.mines-paristech.fr/~jvert/talks/081127sfds/sfds.pdfMachine learning in post-genomic Jean-Philippe Vert Jean-Philippe.Vert@ensmp.fr Mines](https://reader033.vdocument.in/reader033/viewer/2022041908/5e6576b13a79fb76607f7a1f/html5/thumbnails/130.jpg)
Example: spectrum kernel
The 3-spectrum of

x = CGGSLIAMMWFGV

is:

(CGG, GGS, GSL, SLI, LIA, IAM, AMM, MMW, MWF, WFG, FGV) .

Let Φ_u(x) denote the number of occurrences of u in x. The k-spectrum kernel is:

K(x, x′) := Σ_{u ∈ A^k} Φ_u(x) Φ_u(x′) .

This is formally a sum over |A|^k terms, but at most |x| − k + 1 terms are non-zero in Φ(x).
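A minimal implementation sketch of the k-spectrum kernel using a hash table of k-mer counts (not the optimized suffix-tree version):

```python
from collections import Counter

# Count k-mers with a Counter; the inner product only touches shared k-mers,
# exploiting the sparsity of Phi(x).
def spectrum(x, k):
    return Counter(x[i:i + k] for i in range(len(x) - k + 1))

def spectrum_kernel(x, xp, k):
    sx, sxp = spectrum(x, k), spectrum(xp, k)
    return sum(c * sxp[u] for u, c in sx.items())

x = "CGGSLIAMMWFGV"
assert len(list(spectrum(x, 3))) == len(x) - 3 + 1   # 11 distinct 3-mers here
assert spectrum_kernel(x, x, 3) == 11                # each 3-mer occurs once in x
assert spectrum_kernel(x, "AMMW", 3) == 2            # shares AMM and MMW
```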
![Page 131: Jean-Philippe Vert Jean-Philippe.Vert@ensmpcbio.mines-paristech.fr/~jvert/talks/081127sfds/sfds.pdfMachine learning in post-genomic Jean-Philippe Vert Jean-Philippe.Vert@ensmp.fr Mines](https://reader033.vdocument.in/reader033/viewer/2022041908/5e6576b13a79fb76607f7a1f/html5/thumbnails/131.jpg)
Substring indexation in practice
Implementation in O(|x| + |x′|) in memory and time for the spectrum and mismatch kernels (with suffix trees).
Implementation in O(|x| × |x′|) in memory and time for the substring kernels.
The feature space has high dimension (|A|^k), so learning requires regularized methods (such as SVM).
![Page 132: Jean-Philippe Vert Jean-Philippe.Vert@ensmpcbio.mines-paristech.fr/~jvert/talks/081127sfds/sfds.pdfMachine learning in post-genomic Jean-Philippe Vert Jean-Philippe.Vert@ensmp.fr Mines](https://reader033.vdocument.in/reader033/viewer/2022041908/5e6576b13a79fb76607f7a1f/html5/thumbnails/132.jpg)
Dictionary-based indexation
The approach
Choose a dictionary of sequences D = (x1, x2, . . . , xn).
Choose a measure of similarity s(x, x′).
Define the mapping Φ_D(x) = (s(x, x_i))_{x_i ∈ D}.

Examples
This includes:
Motif kernels (Logan et al., 2001): the dictionary is a library of motifs, the similarity function is a matching function.
Pairwise kernel (Liao & Noble, 2003): the dictionary is the training set, the similarity is a classical measure of similarity between sequences.
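A sketch of the dictionary-based mapping Φ_D; `difflib.SequenceMatcher.ratio()` is an assumed stand-in for a biologically meaningful similarity such as an alignment score:

```python
from difflib import SequenceMatcher

# Phi_D(x) = (s(x, x_i)) for x_i in the dictionary D: the sequence is
# represented by its vector of similarities to the dictionary entries.
def phi_D(x, dictionary, s):
    return [s(x, xi) for xi in dictionary]

sim = lambda a, b: SequenceMatcher(None, a, b).ratio()
D = ["MASKATLLLAF", "MAPPSVFAEVP", "MARSSLFTFLC"]   # illustrative dictionary
embedding = phi_D("MASKATLLLAV", D, sim)

assert len(embedding) == len(D)
assert embedding[0] == max(embedding)   # most similar to its near-duplicate
```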
![Page 134: Jean-Philippe Vert Jean-Philippe.Vert@ensmpcbio.mines-paristech.fr/~jvert/talks/081127sfds/sfds.pdfMachine learning in post-genomic Jean-Philippe Vert Jean-Philippe.Vert@ensmp.fr Mines](https://reader033.vdocument.in/reader033/viewer/2022041908/5e6576b13a79fb76607f7a1f/html5/thumbnails/134.jpg)
Outline
5 Classification of biological sequences
Motivation
Feature space approach
Using generative models
Derive from a similarity measure
Application: remote homology detection
![Page 135: Jean-Philippe Vert Jean-Philippe.Vert@ensmpcbio.mines-paristech.fr/~jvert/talks/081127sfds/sfds.pdfMachine learning in post-genomic Jean-Philippe Vert Jean-Philippe.Vert@ensmp.fr Mines](https://reader033.vdocument.in/reader033/viewer/2022041908/5e6576b13a79fb76607f7a1f/html5/thumbnails/135.jpg)
Probabilistic models for sequences
Probabilistic modeling of biological sequences is older than kernel design. Important models include HMMs for protein sequences and SCFGs for RNA sequences.

Parametric model
A model is a family of distributions

{P_θ : θ ∈ Θ ⊂ R^m} ⊂ M_1^+(X)
![Page 136: Jean-Philippe Vert Jean-Philippe.Vert@ensmpcbio.mines-paristech.fr/~jvert/talks/081127sfds/sfds.pdfMachine learning in post-genomic Jean-Philippe Vert Jean-Philippe.Vert@ensmp.fr Mines](https://reader033.vdocument.in/reader033/viewer/2022041908/5e6576b13a79fb76607f7a1f/html5/thumbnails/136.jpg)
Fisher kernel
Definition
Fix a parameter θ0 ∈ Θ (e.g., by maximum likelihood over a training set of sequences)
For each sequence x, compute the Fisher score vector:

Φθ0(x) = ∇θ log Pθ(x)|θ=θ0 .

Form the kernel (Jaakkola et al., 1998):

K(x, x′) = Φθ0(x)⊤ I(θ0)⁻¹ Φθ0(x′) ,

where I(θ0) = Eθ0[Φθ0(x)Φθ0(x)⊤] is the Fisher information matrix.
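As an illustration not taken from the slides, the Fisher score of an i.i.d. Bernoulli (coin-toss) model has a closed form, ∂/∂θ log Pθ(x) = k/θ − (n − k)/(1 − θ) with k ones among n tosses; the sketch below also replaces I(θ0) by the identity matrix, a common approximation in practice.

```python
import numpy as np

def fisher_score(x, theta0):
    # d/dtheta log P_theta(x) for an i.i.d. Bernoulli model:
    # k ones contribute 1/theta each, (n - k) zeros contribute -1/(1 - theta)
    x = np.asarray(x)
    k, n = x.sum(), x.size
    return np.array([k / theta0 - (n - k) / (1.0 - theta0)])

def fisher_kernel(x, xp, theta0):
    # I(theta0) approximated by the identity matrix
    return float(fisher_score(x, theta0) @ fisher_score(xp, theta0))

print(fisher_score([0, 0, 1], 0.5))                 # [-2.]
print(fisher_kernel([0, 0, 1], [1, 0, 1, 0], 0.5))  # 0.0 (x' is balanced at theta0 = 0.5)
```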
Fisher kernel properties
The Fisher score describes how each parameter contributes to the process of generating a particular example.
The Fisher kernel is invariant under change of parametrization of the model.
A kernel classifier employing the Fisher kernel derived from a model that contains the label as a latent variable is, asymptotically, at least as good a classifier as the MAP labelling based on the model (under several assumptions).
Fisher kernel in practice
Φθ0(x) can be computed explicitly for many models (e.g., HMMs).
I(θ0) is often replaced by the identity matrix.
Several different models (i.e., different θ0) can be trained and combined.
Feature vectors are explicitly computed.
Mutual information kernels
Definition
Choose a prior w(dθ) on the measurable set Θ.
Form the kernel (Seeger, 2002):

K(x, x′) = ∫θ∈Θ Pθ(x) Pθ(x′) w(dθ) .

No explicit computation of a finite-dimensional feature vector.
K(x, x′) = ⟨φ(x), φ(x′)⟩L2(w) with

φ(x) = (Pθ(x))θ∈Θ .
Example: coin toss
Let Pθ(X = 1) = θ and Pθ(X = 0) = 1 − θ be a model for random coin toss, with θ ∈ [0, 1].
Let dθ be the Lebesgue measure on [0, 1].
The mutual information kernel between x = 001 and x′ = 1010 is:

Pθ(x) = θ(1 − θ)² ,
Pθ(x′) = θ²(1 − θ)² ,

K(x, x′) = ∫₀¹ θ³(1 − θ)⁴ dθ = 3! 4! / 8! = 1/280 .
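The integral above is a Beta function, so its value can be checked directly; the snippet below is a sanity check, not taken from the slides.

```python
from math import factorial

def beta_integral(a, b):
    # Beta function: ∫_0^1 t^(a-1) (1-t)^(b-1) dt = (a-1)! (b-1)! / (a+b-1)!
    return factorial(a - 1) * factorial(b - 1) / factorial(a + b - 1)

# K(001, 1010) = ∫ theta^3 (1-theta)^4 dtheta = B(4, 5) = 3!4!/8!
K = beta_integral(4, 5)
print(K == 1 / 280)  # True
```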
Context-tree model
Definition
A context-tree model is a variable-memory Markov chain:

PD,θ(x) = PD,θ(x1 . . . xD) ∏i=D+1..n PD,θ(xi | xi−D . . . xi−1)

D is a suffix tree.
θ ∈ ΣD is a set of conditional probabilities (multinomials).
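As a sketch (not from the slides) of how such a model assigns probabilities, the helper below evaluates a variable-memory chain from a dictionary of suffix contexts; the back-off loop that finds the longest matching suffix and the `init` term for the first symbols are illustrative assumptions.

```python
def context_tree_prob(x, cond, init):
    # cond maps a context string (a leaf of the suffix tree) to a dict of
    # next-symbol probabilities; init gives the probability of the prefix.
    D = max(len(c) for c in cond)
    p = init(x[:D])
    for i in range(D, len(x)):
        ctx = x[i - D:i]
        while ctx not in cond:  # back off to the longest matching suffix
            ctx = ctx[1:]
        p *= cond[ctx][x[i]]
    return p

# Toy order-1 chain: P("AAB") = P("A") * P(A|A) * P(B|A)
cond = {"A": {"A": 0.5, "B": 0.5}, "B": {"A": 0.9, "B": 0.1}}
print(context_tree_prob("AAB", cond, lambda s: 0.5))  # 0.125
```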
Context-tree model: example
P(AABACBACC) = P(AAB)θAB(A)θA(C)θC(B)θACB(A)θA(C)θC(A) .
The context-tree kernel
Theorem (Cuturi et al., 2004)
For particular choices of priors, the context-tree kernel:

K(x, x′) = ∑D ∫θ∈ΣD PD,θ(x) PD,θ(x′) w(dθ|D) π(D)

can be computed in O(|x| + |x′|) with a variant of the Context-Tree Weighting algorithm.
This is a valid mutual information kernel.
The similarity is related to an information-theoretical measure of mutual information between strings.
Marginalized kernels
Definition
For any observed data x ∈ X, let a latent variable y ∈ Y be associated probabilistically through a conditional probability Px(dy).
Let KZ be a kernel for the complete data z = (x, y).
Then the following kernel is a valid kernel on X, called a marginalized kernel (Kin et al., 2002):

KX(x, x′) := EPx(dy)×Px′(dy′) KZ(z, z′) = ∫∫ KZ((x, y), (x′, y′)) Px(dy) Px′(dy′) .
Marginalized kernels: proof of positive definiteness
KZ is p.d. on Z. Therefore there exists a Hilbert space H and ΦZ : Z → H such that:

KZ(z, z′) = ⟨ΦZ(z), ΦZ(z′)⟩H .

Marginalizing therefore gives:

KX(x, x′) = EPx(dy)×Px′(dy′) KZ(z, z′)
          = EPx(dy)×Px′(dy′) ⟨ΦZ(z), ΦZ(z′)⟩H
          = ⟨EPx(dy) ΦZ(z), EPx′(dy′) ΦZ(z′)⟩H ,

therefore KX is p.d. on X.
Example: HMM for normal/biased coin toss
[HMM diagram: start state S and end state E; hidden states N and B with self-transition probability 0.85, switching probability 0.1, exit probability 0.05, and initial probabilities 0.5/0.5]

Normal (N) and biased (B) coins (not observed).
Observed outputs are 0/1 with probabilities:

π(0|N) = 1 − π(1|N) = 0.5,
π(0|B) = 1 − π(1|B) = 0.8.

Example of realization (complete data):

NNNNNBBBBBBBBBNNNNNNNNNNNBBBBBB
1001011101111010010111001111011
1-spectrum kernel on complete data
If both x ∈ A∗ and y ∈ S∗ were observed, we might rather use the 1-spectrum kernel on the complete data z = (x, y):

KZ(z, z′) = ∑(a,s)∈A×S na,s(z) na,s(z′) ,

where na,s(x, y) for a = 0, 1 and s = N, B is the number of occurrences of s in y which emit a in x.
Example:

z = 1001011101111010010111001111011,
z′ = 0011010110011111011010111101100101,

KZ(z, z′) = ∑(a,s) na,s(z) na,s(z′) = 7 × 15 + 9 × 12 + 13 × 6 + 2 × 1 = 293.
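The counts na,s(z) are straightforward to compute when the complete data is available. The sketch below uses a tiny made-up example (the state sequences behind the slide's numbers are not shown, so they are not reproduced here).

```python
from collections import Counter

def pair_counts(x, y):
    # n_{a,s}(z): how often hidden state s emits symbol a in z = (x, y)
    return Counter(zip(x, y))

def k_complete(z, zp):
    c, cp = pair_counts(*z), pair_counts(*zp)
    return sum(c[k] * cp[k] for k in c.keys() | cp.keys())

z  = ("1001", "NNBB")   # toy complete data, states chosen for illustration
zp = ("0110", "NBBN")
print(k_complete(z, zp))  # 4: shared pairs (0,N) give 1*2, (1,B) give 1*2
```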
1-spectrum marginalized kernel on observed data
The marginalized kernel for observed data is:

KX(x, x′) = ∑y,y′∈S∗ KZ((x, y), (x′, y′)) P(y|x) P(y′|x′) = ∑(a,s)∈A×S Φa,s(x) Φa,s(x′) ,

with

Φa,s(x) = ∑y∈S∗ P(y|x) na,s(x, y) .
Computation of the 1-spectrum marginalized kernel
Φa,s(x) = ∑y∈S∗ P(y|x) na,s(x, y)
        = ∑y∈S∗ P(y|x) ∑i=1..n δ(xi, a) δ(yi, s)
        = ∑i=1..n δ(xi, a) ∑y∈S∗ P(y|x) δ(yi, s)
        = ∑i=1..n δ(xi, a) P(yi = s|x) ,

and P(yi = s|x) can be computed efficiently by the forward-backward algorithm!
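A minimal sketch of this computation for the two-state coin HMM, using the emission probabilities π(0|N) = 0.5 and π(0|B) = 0.8 from the earlier slide; the uniform initial distribution and the 0.9/0.1 stay/switch transition probabilities below are simplifying assumptions.

```python
import numpy as np

def posteriors(x, pi0, A, E):
    # P(y_i = s | x) via the scaled forward-backward algorithm
    n, S = len(x), len(pi0)
    alpha = np.zeros((n, S)); beta = np.zeros((n, S)); c = np.zeros(n)
    alpha[0] = pi0 * E[:, x[0]]
    c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for i in range(1, n):
        alpha[i] = (alpha[i - 1] @ A) * E[:, x[i]]
        c[i] = alpha[i].sum(); alpha[i] /= c[i]
    beta[-1] = 1.0
    for i in range(n - 2, -1, -1):
        beta[i] = (A @ (E[:, x[i + 1]] * beta[i + 1])) / c[i + 1]
    g = alpha * beta
    return g / g.sum(axis=1, keepdims=True)

def phi(x, pi0, A, E):
    # Phi_{a,s}(x) = sum_i delta(x_i, a) P(y_i = s | x)
    g = posteriors(x, pi0, A, E)
    return {(a, s): g[[i for i, xi in enumerate(x) if xi == a], s].sum()
            for a in (0, 1) for s in (0, 1)}

pi0 = np.array([0.5, 0.5])               # assumed initial distribution
A = np.array([[0.9, 0.1], [0.1, 0.9]])   # assumed stay/switch probabilities
E = np.array([[0.5, 0.5], [0.8, 0.2]])   # E[s, a] = pi(a|s); state 0 = N, 1 = B
x = [1, 0, 0, 1, 0, 1, 1, 1]
f = phi(x, pi0, A, E)
print(sum(f.values()))  # the features sum to len(x) = 8
```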
HMM example (DNA)
HMM example (protein)
SCFG for RNA sequences
SCFG rules
S → SS, S → aSa, S → aS, S → a

Marginalized kernel (Kin et al., 2002)
Feature: number of occurrences of each (base, state) combination.
Marginalization using the classical inside/outside algorithm.
Marginalized kernels in practice
Examples
Spectrum kernel on the hidden states of an HMM for protein sequences (Tsuda et al., 2002)
Kernels for RNA sequences based on SCFG (Kin et al., 2002)
Kernels for graphs based on random walks on graphs (Kashima et al., 2004)
Kernels for multiple alignments based on phylogenetic models (Vert et al., 2005)
Marginalized kernels: example
[Scatter plot in the plane of the first two principal components, PC1 and PC2]
A set of 74 human tRNA sequences is analyzed using a kernel for sequences (the second-order marginalized kernel based on SCFG). This set of tRNAs contains three classes, called Ala-AGC (white circles), Asn-GTT (black circles) and Cys-GCA (plus symbols) (from Tsuda et al., 2003).
Outline
5 Classification of biological sequences
Motivation
Feature space approach
Using generative models
Derive from a similarity measure
Application: remote homology detection
Sequence alignment
Motivation
How to compare two sequences?
x1 = CGGSLIAMMWFGVx2 = CLIVMMNRLMWFGV
Find a good alignment:
CGGSLIAMM----WFGV|...|||||....||||C---LIVMMNRLMWFGV
Alignment score
In order to quantify the relevance of an alignment π, define:

a substitution matrix S ∈ RA×A
a gap penalty function g : N → R

Any alignment is then scored as follows:

CGGSLIAMM----WFGV
|...|||||....||||
C---LIVMMNRLMWFGV

sS,g(π) = S(C, C) + S(L, L) + S(I, I) + S(A, V) + 2 S(M, M) + S(W, W) + S(F, F) + S(G, G) + S(V, V) − g(3) − g(4)
Local alignment kernel
Smith-Waterman score
The widely-used Smith-Waterman local alignment score is defined by:

SWS,g(x, y) := maxπ∈Π(x,y) sS,g(π) .

It is symmetric, but not positive definite...

LA kernel
The local alignment kernel:

K(β)LA(x, y) = ∑π∈Π(x,y) exp(β sS,g(x, y, π)) ,

is symmetric positive definite (Vert et al., 2004).
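For reference, a sketch of the Smith-Waterman score itself, with a simple linear gap penalty g(n) = d·n and a toy substitution matrix (both illustrative choices, not the slides'); the LA kernel replaces the max below by a β-weighted sum of exp(β · score) over all alignments.

```python
def smith_waterman(x, y, S, d):
    # H[i][j]: best local alignment score ending at x[i-1], y[j-1]
    H = [[0.0] * (len(y) + 1) for _ in range(len(x) + 1)]
    best = 0.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            H[i][j] = max(0.0,                                      # restart
                          H[i - 1][j - 1] + S(x[i - 1], y[j - 1]),  # match/mismatch
                          H[i - 1][j] - d,                          # gap in y
                          H[i][j - 1] - d)                          # gap in x
            best = max(best, H[i][j])
    return best

S = lambda a, b: 2.0 if a == b else -1.0   # toy substitution matrix
print(smith_waterman("AAA", "AAA", S, 1.0))   # 6.0: three matches
print(smith_waterman("CGGSLIAMMWFGV", "CLIVMMNRLMWFGV", S, 1.0))
```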
LA kernel is p.d.: proof
If K1 and K2 are p.d. kernels for strings, then their convolution defined by:

K1 ⋆ K2(x, y) := ∑x1x2=x, y1y2=y K1(x1, y1) K2(x2, y2)

is also p.d. (Haussler, 1999).
The LA kernel is p.d. because it is a convolution kernel (Haussler, 1999):

K(β)LA = ∑n=0..∞ K0 ⋆ (K(β)a ⋆ K(β)g)(n−1) ⋆ K(β)a ⋆ K0 ,

where K0, Ka and Kg are three basic p.d. kernels (Vert et al., 2004).
LA kernel in practice
Implementation by dynamic programming in O(|x| × |x′|).

[Transducer diagram: begin state B, match state M, end state E, and insertion states X1, X2, Y1, Y2; match transitions a:b carry weight m(a, b), gap transitions carry the gap-opening weight D and gap-extension weight E]

In practice, values are too large (exponential scale), so taking the logarithm is a safer choice (but it is not p.d. anymore!).
Outline
5 Classification of biological sequences
Motivation
Feature space approach
Using generative models
Derive from a similarity measure
Application: remote homology detection
Remote homology
[Diagram: along a sequence-similarity axis, close homologs, the "twilight zone", and non-homologs]

Homologs have common ancestors.
Structures and functions are more conserved than sequences.
Remote homologs cannot be detected by direct sequence comparison.
SCOP database
[SCOP hierarchy diagram: Fold ⊃ Superfamily ⊃ Family; members of the same family are close homologs, while members of the same superfamily but different families are remote homologs]
A benchmark experiment
Goal: recognize directly the superfamily.
Training: for a sequence of interest, positive examples come from the same superfamily, but different families. Negatives come from other superfamilies.
Test: predict the superfamily.
Difference in performance
[Plot: number of families with given performance (0–60) as a function of ROC50 (0–1), comparing SVM-LA, SVM-pairwise, SVM-Mismatch and SVM-Fisher]
Performance on the SCOP superfamily recognition benchmark (from Vert et al., 2004).
Outline
1 Pattern recognition and regression
2 Support vector machines
3 Classification of CGH data
4 Classification of expression data
5 Classification of biological sequences
6 Virtual screening and QSAR
7 Conclusion
Outline
6 Virtual screening and QSAR
Motivation
2D Kernel
3D Pharmacophore Kernel
Ligand-Based Virtual Screening and QSAR
[Six example molecules, labeled active or inactive]
NCI AIDS screen results (from http://cactus.nci.nih.gov).
More formally...
Objective
Build models to predict biochemical properties Y of small molecules from their structures X, using a training set of (X, Y) pairs.

Structures X (e.g., C15H14ClN3O3)
Properties Y: binding to a therapeutic target, pharmacokinetics (ADME), toxicity...
Classical approaches
Two steps
1 Map each molecule to a vector of fixed dimension using molecular descriptors:
  global properties of the molecules (mass, logP...)
  2D and 3D descriptors (substructures, fragments, ...)
2 Apply an algorithm for regression or pattern recognition: PLS, ANN, ...

Example: 2D structural keys
[Diagram: a molecule matched against a panel of substructure keys]
Which descriptors?
[Diagram: a molecule and candidate substructure descriptors]

Difficulties
Many descriptors are needed to characterize various features (in particular for 2D and 3D descriptors).
But too many descriptors are harmful for memory storage, computation speed, and statistical estimation.
Kernels
Definition
Let Φ(x) = (Φ1(x), . . . , Φp(x)) be a vector representation of the molecule x.
The kernel between two molecules is defined by:

K(x, x′) = Φ(x)⊤Φ(x′) = ∑i=1..p Φi(x) Φi(x′) .

[Diagram: the map φ sends molecules in X to a feature space H]
Making kernels for molecules
Strategy 1: use well-known molecular descriptors to represent molecules m as vectors Φ(m), and then use kernels for vectors, e.g.:

K(m1, m2) = Φ(m1)⊤Φ(m2).

Strategy 2: invent new kernels to do things you cannot do with strategy 1, such as using an infinite number of descriptors. We will now see two examples of this strategy, extending 2D and 3D molecular descriptors.
Summary
The problem
Regression and pattern recognition over molecules.
Classical vector representation is both statistically and computationally challenging.

The kernel approach
By defining a kernel for molecules we can work implicitly in large (potentially infinite!) dimensions:
Allows to consider a large number of potentially important features.
No need to store explicitly the vectors (no problem of memory storage or hash clashes) thanks to the kernel trick.
Use of regularized statistical algorithms (SVM, kernel PLS, kernel perceptron...) to handle the statistical problem of large dimension.
Outline
6 Virtual screening and QSAR
Motivation
2D Kernel
3D Pharmacophore Kernel
Motivation: 2D Fingerprints
Features
A vector indexed by a large set of molecular fragments.

[Diagram: a molecule and linear fragments extracted from it, e.g. C-N, C-C-O, N-C-C-C, ...]

Pros
Many features.
Easy to detect.

Cons
Too many features?
Hashing =⇒ clashes.
SVM approach
[Diagram: a molecule and its fragments]

Let Φ(x) be the vector of fragment counts:
Long fragments lead to large dimensions: SVM can learn in high dimension.
Φ(x) is too long to be stored, and hashes induce clashes: SVMs do not need Φ(x), they just need the kernel

K(x, x′) = Φ(x)⊤Φ(x′) .
2D fingerprint kernel
Definition
For any d > 0, let φd(x) be the vector of counts of all fragments of length d:

φ1(x) = (#(C), #(O), #(N), ...)⊤
φ2(x) = (#(C-C), #(C=O), #(C-N), ...)⊤ etc.

The 2D fingerprint kernel is defined, for λ < 1, by

K2D(x, x′) = ∑d=1..∞ λd φd(x)⊤ φd(x′) .

This is an inner product in the space of 2D fingerprints of infinite length.
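A naive sketch of a truncated version of this kernel (summing d up to a chosen dmax rather than infinity), where fragments are taken to be labeled walks in the molecular graph; the graph encoding, λ and dmax below are illustrative assumptions.

```python
from collections import Counter

def fragment_counts(adj, labels, d):
    # counts of labeled walks visiting d atoms
    walks = [[v] for v in adj]
    for _ in range(d - 1):
        walks = [w + [u] for w in walks for u in adj[w[-1]]]
    return Counter(tuple(labels[v] for v in w) for w in walks)

def k2d_truncated(mol1, mol2, lam=0.1, dmax=6):
    total = 0.0
    for d in range(1, dmax + 1):
        c1, c2 = fragment_counts(*mol1, d), fragment_counts(*mol2, d)
        total += lam ** d * sum(c1[f] * c2[f] for f in c1)
    return total

# Toy C-O "molecule" against itself: 2 matching length-1 fragments
# (C and O) and 2 matching length-2 fragments (C-O and O-C)
mol = ({0: [1], 1: [0]}, {0: "C", 1: "O"})
print(k2d_truncated(mol, mol, lam=0.5, dmax=2))  # 0.5*2 + 0.25*2 = 1.5
```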
2D kernel computation
Theorem
The 2D fingerprint kernel between two molecules x and x′ can be computed with a worst-case complexity O((|x| × |x′|)³) (much faster in practice).

Remarks
The complexity is not related to the length of the fragments considered (although faster computations are possible if the length is limited).
Solves the problem of clashes and memory storage.
Allows to work with infinite-length fingerprints without computing them!
2D kernel computation trick
Rephrase the kernel computation as counting the number of walks on a graph: the product graph G1 × G2.

[Figure: two labelled graphs G1 (nodes 1–4) and G2 (nodes a–e), and their product graph G1 × G2 whose nodes are the label-matched pairs (1a, 2b, 1b, 2a, 1d, 2d, 3c, 4c, 3e, 4e).]

The infinite sum of walk counts can be factorized:

λA + λ²A² + λ³A³ + ⋯ = (I − λA)⁻¹ − I.
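This closed form can be sketched in a few lines. The construction below is a minimal illustration under simplifying assumptions (node-labelled undirected graphs given as adjacency matrices, edge labels ignored, and the function names are illustrative, not from the talk):

```python
import numpy as np

def product_graph_adjacency(A1, labels1, A2, labels2):
    """Adjacency matrix of the labelled product graph G1 x G2.

    Nodes are pairs (i, j) of label-matched vertices; two pairs are
    adjacent iff both coordinates are adjacent in their own graph."""
    pairs = [(i, j)
             for i in range(len(labels1))
             for j in range(len(labels2))
             if labels1[i] == labels2[j]]
    A = np.zeros((len(pairs), len(pairs)))
    for a, (i, j) in enumerate(pairs):
        for b, (k, l) in enumerate(pairs):
            A[a, b] = A1[i, k] * A2[j, l]
    return A

def walk_kernel(A1, labels1, A2, labels2, lam=0.1):
    """Geometric walk kernel: sum over n >= 1 of lam**n times the
    number of common labelled walks of length n, computed in closed
    form as 1^T ((I - lam*A)^(-1) - I) 1 on the product graph."""
    A = product_graph_adjacency(A1, labels1, A2, labels2)
    if A.size == 0:
        return 0.0  # no label-matched vertex pairs, hence no common walks
    M = np.linalg.inv(np.eye(len(A)) - lam * A) - np.eye(len(A))
    return float(M.sum())
```

Note that λ must be small enough for the geometric series to converge (the spectral radius of λA must be below 1), which is what makes the matrix inverse well defined.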
Extensions 1: label enrichment
Atom relabeling with the Morgan index

[Figure: a molecule relabeled three times — no Morgan indices (all atoms labeled 1), order-1 indices, order-2 indices.]

- Compromise between fingerprint and structural-key features.
- Other relabeling schemes are possible.
- Faster computation with more labels (fewer matches implies a smaller product graph).
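One common version of the Morgan relabeling can be sketched as follows (the initialization at 1 and the sum-of-neighbours update are an assumption consistent with the figure, which shows all-ones at order 0 and vertex degrees at order 1):

```python
def morgan_indices(adjacency, order):
    """Iterative Morgan relabeling: start from index 1 at every atom,
    then repeatedly replace each atom's index by the sum of its
    neighbours' indices.

    `adjacency` is a list of neighbour lists; returns the list of
    order-`order` indices."""
    idx = [1] * len(adjacency)
    for _ in range(order):
        idx = [sum(idx[j] for j in neigh) for neigh in adjacency]
    return idx
```

For the walk kernel, each atom label is then augmented with its Morgan index, which splits the label classes and shrinks the product graph.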
Extension 2: Non-tottering walk kernel
Tottering walks
A tottering walk is a walk w = v1 … vn with v_i = v_{i+2} for some i.

[Figure: a tottering walk vs. a non-tottering walk on a molecular graph.]

- Tottering walks seem irrelevant for many applications.
- Focusing on non-tottering walks is a way to get closer to the path kernel (e.g., equivalent on trees).
Computation of the non-tottering walk kernel (Mahé et al., 2005)

- Second-order Markov random walk to prevent tottering walks
- Written as a first-order Markov random walk on an augmented graph
- Normal walk kernel on the augmented graph (which is always a directed graph)
Extensions 3: tree-like fragments
[Figure: examples of tree-like fragments (rooted subtrees over C, N and O atoms) extracted from a molecule.]
Experiments
MUTAG dataset
- aromatic/hetero-aromatic compounds
- high mutagenic activity / no mutagenic activity, assayed in Salmonella typhimurium
- 188 compounds: 125 + / 63 −

Results: 10-fold cross-validation accuracy

Method      Accuracy
Progol1     81.4%
2D kernel   91.2%
Subtree kernels
[Figure: AUC (76–86%) as a function of λ for the branch-based, non-tottering subtree kernel, for tree depths h = 2 to 10.]
AUC as a function of the branching factor for different tree depths (from Mahé et al., 2007).
2D Subtree vs walk kernels
[Figure: AUC of the walk and subtree kernels on each of 60 cancer cell lines (NCI-60 panel, e.g. CCRF-CEM, HL-60(TB), K-562, MOLT-4, MCF7, …).]
Screening of inhibitors for 60 cancer cell lines.
Outline
6 Virtual screening and QSAR
  - Motivation
  - 2D Kernel
  - 3D Pharmacophore Kernel
Space of pharmacophore
3-point pharmacophores

[Figure: two molecules, each with a highlighted triplet of atoms and the three inter-atom distances d1, d2, d3.]

A set of 3 atoms and 3 inter-atom distances:

T = ((x1, x2, x3), (d1, d2, d3)),  xi ∈ atom types, di ∈ ℝ
3D fingerprint kernel
Pharmacophore fingerprint
1 Discretize the space of pharmacophores T (e.g., 6 atoms or groups of atoms, 6–7 distance bins) into a finite set Td.
2 Count the number of occurrences φt(x) of each pharmacophore bin t in a given molecule x, to form a pharmacophore fingerprint.

3D kernel
A simple 3D kernel is the inner product of pharmacophore fingerprints:

K(x, x′) = Σ_{t ∈ Td} φt(x) φt(x′).
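As a sketch (the names `fingerprint` and `fingerprint_kernel` and the simple fixed-width distance binning are assumptions for illustration):

```python
from collections import Counter

def fingerprint(pharmacophores, bin_width=1.0):
    """Pharmacophore fingerprint: map each 3-point pharmacophore
    (three atom types, three distances) to a discrete bin and count
    the occurrences of each bin."""
    counts = Counter()
    for types, dists in pharmacophores:
        key = (tuple(types), tuple(int(d / bin_width) for d in dists))
        counts[key] += 1
    return counts

def fingerprint_kernel(fp_x, fp_y):
    """Inner product of two fingerprints: sum over common bins of the
    product of the counts."""
    return sum(c * fp_y[t] for t, c in fp_x.items())
```

The `Counter` representation keeps the fingerprint sparse: only bins that actually occur in a molecule are stored, which matters because the full space Td is large.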
Discretization of the pharmacophore space
Common issues
1 If the bins are too large, then they are not specific enough.
2 If the bins are too small, then they are too specific.

In all cases, the arbitrary position of boundaries between bins affects the comparison:

[Figure: three pharmacophores x1, x2, x3 around a bin boundary.]

d(x1, x3) < d(x1, x2), BUT bin(x1) = bin(x2) ≠ bin(x3)
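A one-dimensional toy version of this boundary effect (the function `bin_of` and the bin width are illustrative assumptions):

```python
def bin_of(x, width=1.0):
    """Index of the fixed-width bin containing x."""
    return int(x // width)

# x1 and x3 are close (distance 0.2) but land in different bins,
# while x1 and x2 are far apart (distance 0.8) yet share a bin.
x1, x2, x3 = 0.9, 0.1, 1.1
```

Any hard binning scheme produces such inversions near the boundaries, which motivates replacing the indicator of equal bins by a smooth kernel between pharmacophores.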
Kernels between pharmacophores
A small trick

K(x, y) = Σ_{t ∈ Td} φt(x) φt(y)
        = Σ_{t ∈ Td} ( Σ_{px ∈ P(x)} 1(bin(px) = t) ) ( Σ_{py ∈ P(y)} 1(bin(py) = t) )
        = Σ_{px ∈ P(x)} Σ_{py ∈ P(y)} 1(bin(px) = bin(py))

General pharmacophore kernel

K(x, y) = Σ_{px ∈ P(x)} Σ_{py ∈ P(y)} KP(px, py)
New pharmacophore kernels
Discretizing the pharmacophore space is equivalent to taking the following kernel between individual pharmacophores:

KP(p1, p2) = 1(bin(p1) = bin(p2))

For general kernels, there is no need for discretization! For example, if d(p1, p2) is a Euclidean distance between pharmacophores, take:

KP(p1, p2) = exp(−γ d(p1, p2)).
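A sketch of the general pharmacophore kernel with this exponential choice of KP (requiring the atom triplets to match exactly and measuring d as the Euclidean distance between the distance vectors are assumptions about the base kernel):

```python
import math

def pharmacophore_kernel(P_x, P_y, gamma=1.0):
    """General pharmacophore kernel: sum of a base kernel K_P over all
    pairs of 3-point pharmacophores, here exp(-gamma * d) with d the
    Euclidean distance between the (d1, d2, d3) vectors."""
    total = 0.0
    for types_x, d_x in P_x:
        for types_y, d_y in P_y:
            if types_x != types_y:
                continue  # different atom triplets do not match
            total += math.exp(-gamma * math.dist(d_x, d_y))
    return total
```

No bin boundaries appear anywhere: two nearby pharmacophores always contribute a similarity close to 1, however the space would have been discretized.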
Experiments
4 public datasets
- BZR: ligands for the benzodiazepine receptor
- COX: cyclooxygenase-2 inhibitors
- DHFR: dihydrofolate reductase inhibitors
- ER: estrogen receptor ligands

        TRAIN        TEST
        Pos   Neg    Pos   Neg
BZR      94    87     63    62
COX      87    91     61    64
DHFR     84   149     42   118
ER      110   156     70   110
Experiments
Results (accuracy)

Kernel              BZR   COX   DHFR   ER
2D (Tanimoto)       71.2  63.0  76.9   77.1
3D fingerprint      75.4  67.0  76.9   78.6
3D not discretized  76.4  69.8  81.9   79.8
Outline
1 Pattern recognition and regression
2 Support vector machines
3 Classification of CGH data
4 Classification of expression data
5 Classification of biological sequences
6 Virtual screening and QSAR
7 Conclusion
Summary
- Genomic data are often high-dimensional and structured
- Inference is possible with the constrained ERM principle
- Prior knowledge can be included in the constraint, e.g.:
  - sparsity-inducing priors
  - Euclidean balls with kernels
- The final performance depends a lot on the prior constraint when few examples are available ⟹ it is a good place to put some effort