TRANSCRIPT
Supervised inference of biological networks and
Classification of gene expression data with gene networks
Jean-Philippe Vert, [email protected]
Centre for Computational Biology, Ecole des Mines de Paris, ParisTech
Institute for Infocomm Research, Singapore, February 14th, 2007
J.-P. Vert (Ecole des Mines) Supervised network inference 1 / 32
Outline
1 Supervised inference of biological networks from heterogeneous genomic data
2 Using gene networks for gene expression data classification
Motivation
Data: gene expression, gene sequence, protein localization, ...
Graph: protein-protein interactions, metabolic pathways, signaling pathways, ...
Strategies
Unsupervised approaches: the graph is completely unknown. Model-based approaches: Bayes nets, dynamical systems, ...; similarity-based approaches: connect similar nodes.
Supervised approaches: part of the graph is known in advance. Either prior knowledge in model-based approaches, or statistical / machine learning approaches: learn from the known subnetwork a rule that can predict edges from genomic data.
Genomic Data
Data representation as distances: we assume that each type of data (expression, sequences, ...) defines a (negative definite) distance between genes. Many such distances exist (cf. kernel methods). Data integration is easily obtained by summing the distances to obtain an "integrated" distance.
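A minimal sketch of this integration step (variable and function names are hypothetical, not the talk's code): each data source contributes an (n, n) distance matrix over the same n genes, and the matrices are summed entrywise.

```python
import numpy as np

def integrate_distances(distance_matrices):
    """Combine heterogeneous data sources by summing, entrywise, the
    (n, n) distance matrix each source defines between the n genes."""
    D = np.zeros_like(distance_matrices[0], dtype=float)
    for D_source in distance_matrices:
        D += D_source
    return D
```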
Method 1: Direct similarity-based prediction
Motivation: "connect similar genes". Connect a and b if d(a, b) is below a threshold. This is an unsupervised approach (no use of the known subnetwork).
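The thresholding rule can be sketched as follows (a hypothetical helper, not the talk's code):

```python
import numpy as np

def predict_edges_by_threshold(D, t):
    """Method 1: connect genes a and b whenever d(a, b) < t.
    D is an (n, n) symmetric distance matrix; returns the list of
    predicted undirected edges (a, b) with a < b."""
    n = D.shape[0]
    return [(a, b) for a in range(n) for b in range(a + 1, n) if D[a, b] < t]
```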
Method 2: Metric learning
Motivation: use the known subnetwork to refine the distance measure before applying the similarity-based method. Based on kernel CCA (Yamanishi et al., 2004) or kernel metric learning (V. and Yamanishi, 2005).
Metric learning example
Kernel metric learning (V. and Yamanishi, 2005). Criterion: connected points should be near each other after mapping to a new d-dimensional Euclidean space. Add regularization to deal with high dimensions. Mapping f(x) = (f1(x), ..., fd(x)) with:
$$f_i = \arg\min_{\substack{f \perp f_1,\dots,f_{i-1} \\ \mathrm{var}(f)=1}} \sum_{i \sim j} \big(f(x_i) - f(x_j)\big)^2 + \lambda \|f\|_K^2 .$$
Interpolates between (kernel) PCA (λ = ∞) and graph embedding (λ = 0). Equivalent to a generalized eigenvalue problem.
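Under the assumption that each component is a kernel expansion f = Kc (a standard representer-style form, my assumption rather than something stated in the talk), the criterion above reduces to a generalized eigenvalue problem; a numpy-only sketch with hypothetical names:

```python
import numpy as np

def kernel_metric_embedding(K, edges, lam=1.0, d=2):
    """Sketch of graph-regularized kernel embedding (assumption: f = K @ c).

    K     : (n, n) kernel matrix on the genes
    edges : list of (i, j) edges of the known subnetwork
    lam   : regularization weight lambda
    d     : target dimension
    """
    n = K.shape[0]
    # Graph Laplacian of the known subnetwork: sum_{i~j}(f_i - f_j)^2 = f^T L f
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1.0; L[j, j] += 1.0
        L[i, j] -= 1.0; L[j, i] -= 1.0
    # With f = K c, the criterion becomes c^T (K L K + lam K) c under the
    # unit-variance constraint c^T K^2 c = 1: a generalized eigenproblem.
    A = K @ L @ K + lam * K
    B = K @ K + 1e-9 * np.eye(n)          # jitter keeps B positive definite
    w, V = np.linalg.eigh(B)
    B_inv_half = V @ np.diag(1.0 / np.sqrt(w)) @ V.T   # whitening by B^{-1/2}
    vals, U = np.linalg.eigh(B_inv_half @ A @ B_inv_half)
    C = B_inv_half @ U[:, :d]             # d smallest generalized eigenvalues
    return K @ C                          # new coordinates of each gene
```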
Metric learning example
Kernel CCA (Yamanishi et al., 2004). Criterion: find a subspace where the graph distance and the genomic data distance match. Formulated as a search for correlated directions (kernel trick).
[Figure: a small gene graph (g1, ..., g8) is embedded via a diffusion kernel on the graph side and a kernel on the genomic data side, then kernel CCA finds a common space where the two embeddings match.]
Method 3: Matrix completion
Motivation: fill missing entries in the adjacency matrix directly, by making it similar to (a variant of) the data matrix. Method: EM algorithm based on the information geometry of positive semidefinite matrices (Kato et al., 2005).
Method 4: Supervised binary classification
A pair can be connected (+1) or not connected (-1). Use the known network as a training set for an SVM that will predict whether a new pair is connected or not. Example: SVM with the tensor product pairwise kernel (Ben-Hur and Noble, 2006):
$$K_{\mathrm{TPPK}}\big((x_1, x_2), (x_3, x_4)\big) = K(x_1, x_3)\,K(x_2, x_4) + K(x_1, x_4)\,K(x_2, x_3) .$$
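A direct transcription of the TPPK formula, assuming a precomputed base kernel matrix K on individual genes (function name hypothetical):

```python
import numpy as np

def tppk(K, pair1, pair2):
    """Tensor product pairwise kernel (Ben-Hur and Noble, 2006):
    K_TPPK((x1,x2),(x3,x4)) = K(x1,x3)K(x2,x4) + K(x1,x4)K(x2,x3).

    K            : (n, n) base kernel matrix on individual genes
    pair1, pair2 : index pairs (i, j) and (k, l) of candidate edges
    """
    i, j = pair1
    k, l = pair2
    return K[i, k] * K[j, l] + K[i, l] * K[j, k]
```

By construction the value is invariant to the order within each pair and to swapping the two pairs, as an edge kernel must be.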
Method 5: Local predictions
Motivation: define specific models for each target node to discriminate between its neighbors and the others. Treat each node independently from the others, then combine the predictions for ranking candidate edges.
[Figure: around a target node, known neighbors are labeled +1, known non-neighbors -1, and "?" marks candidate nodes to be scored by the local model.]
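A sketch of the local scheme, with a nearest-centroid rule standing in for the per-node SVM (my substitution, for brevity; names hypothetical):

```python
import numpy as np

def local_edge_scores(X, adj, target):
    """Local approach for one target node: genes with a known edge to
    `target` form the +1 class, known non-neighbors the -1 class.

    X      : (n, p) genomic feature matrix (one row per gene)
    adj    : (n, n) 0/1 adjacency matrix of the KNOWN subnetwork
    target : index of the node whose neighborhood we model
    Returns a score per gene; higher = more likely connected to target.
    """
    n = X.shape[0]
    others = np.arange(n) != target
    pos = X[others & (adj[target] == 1)]   # known neighbors: +1
    neg = X[others & (adj[target] == 0)]   # known non-neighbors: -1
    mu_pos, mu_neg = pos.mean(axis=0), neg.mean(axis=0)
    # Nearest-centroid score; rank candidate edges (target, u) by it,
    # then pool the rankings produced by every target node.
    return np.linalg.norm(X - mu_neg, axis=1) - np.linalg.norm(X - mu_pos, axis=1)
```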
Local predictions: pros and cons
Pros: allows very different models for nearby nodes on the graph; it is faster to train n models with n examples each than one model with n² examples.
Cons: few positive examples are available for some nodes.
Experiments
Network: metabolic network (668 vertices, 2782 edges); protein-protein interaction network (984 vertices, 2438 edges).
Data (yeast): gene expression (157 experiments), phylogenetic profiles (145 organisms), cellular localization (23 intracellular locations), yeast two-hybrid data (2438 interactions among 984 proteins).
Method: 5-fold cross-validation; predict edges between the test set and the training set.
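The evaluation protocol can be sketched as follows (hypothetical helper, not the talk's code): nodes are split into 5 folds, and for each fold only the edges joining a test node to a training node are scored.

```python
import random

def node_folds(n_nodes, k=5, seed=0):
    """Split NODES into k folds; for each fold, yield the training nodes,
    the test nodes, and the candidate edges with one endpoint in each
    (the edges to be predicted in the talk's 5-fold CV setup)."""
    nodes = list(range(n_nodes))
    random.Random(seed).shuffle(nodes)
    folds = [nodes[i::k] for i in range(k)]
    for test in folds:
        train = [v for v in nodes if v not in test]
        test_edges = [(u, v) for u in test for v in train]
        yield train, test, test_edges
```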
Results: protein-protein interaction
[Figure: ratio of true positives vs. ratio of false positives (ROC), and false discovery rate vs. ratio of true positives, for the Direct, kML, kCCA, em, local, and Pkernel methods.]
Results: metabolic gene network
[Figure: ratio of true positives vs. ratio of false positives (ROC), and false discovery rate vs. ratio of true positives, for the Direct, kML, kCCA, em, local, and Pkernel methods.]
Results: effect of data integration
[Figure: ROC and false-discovery-rate curves for each data type: exp (expression), loc (localization), phy (phylogenetic profiles), y2h (yeast two-hybrid), and int (integrated).]
Local SVM, protein-protein interaction network.
Results: effect of data integration
[Figure: ROC and false-discovery-rate curves for each data type: exp (expression), loc (localization), phy (phylogenetic profiles), and int (integrated).]
Local SVM, metabolic gene network.
Conclusion
Summary: a variety of methods have been investigated recently. Some reach interesting performance on the benchmarks: the local SVM retrieves 45% of all true edges of the metabolic gene network at a FDR below 50%.
Valid for any network, but a non-mechanistic model. Future work: experimental validation, improved data integration, semi-local approaches, ...
Outline
1 Supervised inference of biological networks from heterogeneous genomic data
2 Using gene networks for gene expression data classification
Tumor classification from microarray data
Data available: gene expression measures for more than 10k genes, measured on fewer than 100 samples of two (or more) different classes (e.g., different tumors).
Goal: design a classifier to automatically assign a class to future samples from their expression profile, and interpret biologically the differences between the classes.
Linear classifiers
The approach: each sample is represented by a vector x = (x_1, ..., x_p), where p > 10^5 is the number of probes. Classification: given the set of labeled samples, learn a linear decision function

$$f(x) = \sum_{i=1}^{p} \beta_i x_i + \beta_0 ,$$

that is positive for one class and negative for the other. Interpretation: the weight β_i quantifies the influence of gene i on the classification.
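The decision function is a plain dot product plus an offset; a minimal sketch (function name hypothetical):

```python
import numpy as np

def linear_decision(x, beta, beta0=0.0):
    """Linear decision function f(x) = sum_i beta_i * x_i + beta_0.
    The sign of f(x) gives the predicted class."""
    return float(np.dot(beta, x) + beta0)
```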
Linear classifiers
Pitfalls: no robust estimation procedure exists for 100 samples in 10^5 dimensions! It is necessary to reduce the complexity of the problem with prior knowledge.
Example 1: Norm constraints
The approach: a common method in statistics for learning with few samples in high dimension is to constrain the norm of β, e.g.:

Euclidean norm (support vector machines, ridge regression): $\|\beta\|_2^2 = \sum_{i=1}^p \beta_i^2$
L1-norm (lasso regression): $\|\beta\|_1 = \sum_{i=1}^p |\beta_i|$

Pros: good performance in classification.
Cons: limited interpretation (small weights); no prior biological knowledge.
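The Euclidean-norm case has a closed form in its penalized (ridge) formulation; a minimal sketch, not the talk's exact SVM formulation (names hypothetical):

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Ridge: beta = argmin ||X beta - y||^2 + lam ||beta||_2^2,
    with closed-form solution (X^T X + lam I)^{-1} X^T y.
    Works even when p >> n, where unpenalized least squares fails."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```

Increasing lam shrinks the weight vector toward zero, which is exactly the norm constraint at work.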
Example 2: Feature selection
The approach: constrain most weights to be 0, i.e., select a few genes (< 20) whose expression is enough for classification. Interpretation is then about the selected genes.
Pros: good performance in classification; useful for biomarker selection; apparently easy interpretation.
Cons: the gene selection process is usually not robust; wrong interpretation is the rule (too much correlation between genes).
Pathway interpretation
Motivation: basic biological functions are usually expressed in terms of pathways (metabolic, signaling, regulatory) rather than single genes, and many pathways are already known. How can this prior knowledge be used to constrain the weights so that the interpretation is at the level of pathways?
Pathway-derived norm constraint
One solution (Rapaport et al., 2007): let the set of pathways be represented by an undirected graph, and consider the pathway-derived norm

$$\Omega(\beta) = \sum_{i \sim j} \big(\beta_i - \beta_j\big)^2 .$$

Constrain Ω(β) instead of $\|\beta\|_2^2$.
Remark: this is equivalent to an SVM with a particular kernel.
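The pathway-derived penalty is the quadratic form of the gene network's graph Laplacian; a small sketch (hypothetical helper names) that also verifies the identity Ω(β) = βᵀLβ:

```python
import numpy as np

def pathway_penalty(beta, edges):
    """Omega(beta) = sum over gene-network edges i~j of (beta_i - beta_j)^2:
    small when connected genes get similar weights."""
    return sum((beta[i] - beta[j]) ** 2 for i, j in edges)

def graph_laplacian(n, edges):
    """The same penalty as a quadratic form: Omega(beta) = beta^T L beta,
    with L the combinatorial Laplacian of the gene network."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1.0; L[j, j] += 1.0
        L[i, j] -= 1.0; L[j, i] -= 1.0
    return L
```

The Laplacian form is what makes the remark above work: the penalty defines a kernel, so a standard SVM solver can be reused.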
Pathway interpretation
[Figure: classifier weights projected onto the yeast metabolic network; labeled regions include N-glycan biosynthesis, protein kinases, DNA and RNA polymerase subunits, glycolysis/gluconeogenesis, sulfur metabolism, porphyrin and chlorophyll metabolism, riboflavin metabolism, folate biosynthesis, biosynthesis of steroids and ergosterol metabolism, lysine biosynthesis, phenylalanine/tyrosine/tryptophan biosynthesis, purine metabolism, oxidative phosphorylation and the TCA cycle, and nitrogen/asparagine metabolism.]
Bad example: the graph is the complete known metabolic network of the budding yeast (from the KEGG database). We project the classifier weights learned by an SVM. Good classification accuracy, but no possible interpretation!
Pathway interpretation
Good example: the graph is the complete known metabolic network of the budding yeast (from the KEGG database). We project the classifier weights learned by a spectral SVM. Good classification accuracy, and a good interpretation!
Conclusion
Use the gene graph to encode prior knowledge about the classifier. Prior knowledge is always needed to classify few examples in large dimensions (sometimes implicitly). Future work: validation of the method on more data, other formulations, directed graphs, ...
Acknowledgements
Supervised graph inference: Yoshihiro Yamanishi, Minoru Kanehisa (Univ. Kyoto): kCCA, kML; Kevin Bleakley, Gerard Biau (Univ. Montpellier): local SVM.
Classification of microarray data: Franck Rapaport, Emmanuel Barillot, Andrei Zynoviev, Marie Dutreix (Curie Institute).