Machine Learning for Functional Genomics II
Matt Hibbs
http://cbfg.jax.org
Functional Genomics
• Identify the roles played by genes/proteins
Sealfon et al., 2006.
Promise of Computational Functional Genomics
Data & Existing Knowledge → Computational Approaches → Predictions → Laboratory Experiments
Computational Solutions
• Machine learning & data mining
– Use existing data to make new predictions
• Similarity search algorithms
• Bayesian networks
• Support vector machines
• etc.
– Validate predictions with follow-up lab work
• Visualization & exploratory analysis
– Seeing and interacting with data is important
– Show data so that questions can be answered
• Scalability, incorporating statistics, etc.
Bayesian Networks
• Encode dependence relationships between observed and unobserved events
[Example network relating "Raining?" to "Jim brought umbrella," "Cloudy this morning," and "Rain in forecast"]
Bayesian Network Overview
• Graphical representation of relationships
– Probabilistic information flows from data to concepts
Bayesian Network Overview
Bayes' Rule: P(A|B) ∝ P(A) P(B|A)

P(FR | CE=yes, AP=yes, Y2H=yes)
  = α P(FR) P(CE=yes|FR) Σ_PI P(PI|FR) P(AP=yes|PI) P(Y2H=yes|PI)

P(FR=yes) + P(FR=no) = 0.0105α + 0.0216α = 1
⇒ P(FR=yes) = 0.327 (up from the prior of 0.10)
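The query above can be sketched in code. The CPT dictionaries in the test usage are hypothetical placeholders; only the prior of 0.10 and the unnormalized terms 0.0105α and 0.0216α come from the slide.

```python
def posterior_fr(p_fr, p_ce_given_fr, p_pi_given_fr, p_ap_given_pi, p_y2h_given_pi):
    """P(FR=yes | CE=yes, AP=yes, Y2H=yes), marginalizing the hidden
    physical-interaction node PI exactly as in the sum above."""
    unnorm = {}
    for fr in (True, False):
        # sum over the hidden node PI
        s = sum(p_pi_given_fr[(pi, fr)] * p_ap_given_pi[pi] * p_y2h_given_pi[pi]
                for pi in (True, False))
        unnorm[fr] = p_fr[fr] * p_ce_given_fr[fr] * s
    # normalizing out alpha
    return unnorm[True] / (unnorm[True] + unnorm[False])

# The slide's unnormalized terms were 0.0105*alpha (FR=yes) and
# 0.0216*alpha (FR=no); normalizing gives the quoted posterior:
print(round(0.0105 / (0.0105 + 0.0216), 3))  # 0.327
```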
Naïve Bayes
• No internal hidden nodes
• Greatly simplifies the problem, reducing computational complexity and run time
• Imposes an independence assumption
Naïve Bayes
Bayes' Rule: P(A|B) ∝ P(A) P(B|A)

P(FR | D1, D2, D3, D4) = α P(FR) P(D1|FR) P(D2|FR) P(D3|FR) P(D4|FR)

• Assumes that all measures are independent given FR
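The naïve Bayes product above can be sketched directly; products are taken in log space, since multiplying hundreds of per-dataset likelihoods would underflow. The likelihood numbers in the test are hypothetical.

```python
import math

def naive_bayes_posterior(prior_fr, likelihoods):
    """P(FR=yes | D1..Dn) under the naive independence assumption.

    likelihoods: list of (P(Di|FR=yes), P(Di|FR=no)) for the observed bins.
    """
    log_yes = math.log(prior_fr)
    log_no = math.log(1.0 - prior_fr)
    for p_yes, p_no in likelihoods:
        log_yes += math.log(p_yes)
        log_no += math.log(p_no)
    # normalize out alpha (stably, by subtracting the max log term)
    m = max(log_yes, log_no)
    yes, no = math.exp(log_yes - m), math.exp(log_no - m)
    return yes / (yes + no)
```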
Learning Naïve Bayes Nets

          FR = yes   FR = no
counts         100       900
prob.          0.1       0.9

FR     # D1 = yes   # D1 = no   P(D1 = yes | FR)
yes            70          30               0.70
no            300         600               0.33
…
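Learning the CPTs is just the fractions shown in the table, reproduced here with the slide's own counts (no smoothing applied):

```python
def cpt_from_counts(n_obs_yes, n_obs_no):
    """P(D1=yes | FR) as a simple fraction of counts."""
    return n_obs_yes / (n_obs_yes + n_obs_no)

prior = 100 / (100 + 900)                        # P(FR=yes) = 0.1
p_d1_given_fr_yes = cpt_from_counts(70, 30)      # 0.70
p_d1_given_fr_no = cpt_from_counts(300, 600)     # 0.33...
```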
Steps for Bayesian network integration
• Construct a gold standard
• Convert data to pair-wise format
• Count positive/negative pairs in each dataset
• Create CPTs to define the Bayes net
• Inference to calculate all pair-wise probabilities
• Evaluate performance
• Predict functions given the network
Gold Standard Construction
• Gene Ontology annotations used to define known functional relationships
– Threshold for positive relationships
– Threshold for negative relationships
Myers et al., 2006
Gold Standard Used For Training
[Global gold standard network showing positive and negative relationships]
Gene-Gene Scores
• Binary data
– PPI, co-localization, synthetic lethality
– Can use binary scores
– Can use profiles to generate scores (dot product)
• Continuous data
– Profile distance metrics
• Binning results
– Converts everything to the discrete case
Distance Metrics
• Choice of distance measure is important for quantifying relationships in datasets
• Pair-wise metrics compare vectors of numbers
– e.g. genes x and y, each with n measurements
• Euclidean distance, Pearson correlation, Spearman correlation
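The formulas for the three metrics were shown as figures; these are the standard definitions, sketched in plain Python (the Spearman version here ignores tied ranks for simplicity):

```python
import math

def euclidean(x, y):
    """Euclidean distance: sqrt(sum_i (x_i - y_i)^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    return pearson(ranks(x), ranks(y))
```

Spearman is robust to monotone but non-linear relationships, which is why it can behave quite differently from Pearson on microarray data.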
Sensible Binning
• The commonly used Pearson correlation yields greatly different distributions of correlation from dataset to dataset
• These differences complicate comparisons
[Histograms of Pearson correlations between all pairs of genes: DeRisi et al., 1997; Primig et al., 2000]
Sensible Binning
• A Fisher Z-transform followed by Z-scoring equalizes the distributions
• Increases comparability between datasets
[Histograms of Z-scores between all pairs of genes]
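The transform described above can be sketched in a few lines; the correlation values in the example are hypothetical:

```python
import math

def fisher_z(r):
    """Fisher Z-transform of a correlation r: 0.5 * ln((1+r)/(1-r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def zscore(values):
    """Standardize values to mean 0, (population) standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

# All pair-wise correlations from one dataset (hypothetical numbers):
corrs = [0.9, 0.1, -0.3, 0.5, 0.0]
comparable = zscore([fisher_z(r) for r in corrs])
```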
Pre-calculation and Storage
• Pair-wise distances only need to be calculated once, even if using different binnings
• A typical mouse microarray covers ~5-20k genes
• ~16M pair-wise distances
• ~50-700 MB of storage for one dataset
• ~800 datasets in GEO
• ~200 GB for all datasets
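The pair count is simply n choose 2, which grows quadratically in the number of genes; the ~16M figure quoted above corresponds to roughly 5,700 genes (at 20k genes it would be closer to 200M):

```python
def pair_count(n_genes):
    """Number of unordered gene pairs: n choose 2."""
    return n_genes * (n_genes - 1) // 2

print(pair_count(5700))  # 16242150, close to the ~16M on the slide
```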
Counting & Learning
• Conceptually straightforward
• Counting
– Just look at each pair in each dataset, see which bin it falls into, and increment a counter
– But… you need to do this ~16M times per dataset
– "Dumb" parallelization works well, since each dataset is independent
• Learning CPTs
– Fractions based on counts
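The counting loop above can be sketched as follows; the data structures (a score dictionary per dataset and a set of gold-standard positive pairs) are assumptions about the representation, not the talk's actual implementation:

```python
import bisect
from collections import defaultdict

def count_pairs(pair_scores, gold_positive, bin_edges):
    """Count gold-standard positive/negative pairs in each score bin.

    pair_scores: {(gene_a, gene_b): score} for one dataset
    gold_positive: set of pairs labeled functionally related
    bin_edges: sorted interior bin edges; bisect picks the bin index
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [n_positive, n_negative]
    for pair, score in pair_scores.items():
        b = bisect.bisect_right(bin_edges, score)
        counts[b][0 if pair in gold_positive else 1] += 1
    return counts
```

Since each dataset is counted independently, this loop parallelizes trivially across datasets, as the slide notes.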
Inference
• Also fairly straightforward
– For all pairs of genes…
• For each dataset:
– Look up the value from the pre-calculated distances
– Determine the bin and its probability from the CPT
– Multiply that probability into the product
• Do this for FR=yes and FR=no
• Normalize out α
• Store the result
• ~1.5 GB result file
Evaluation Metrics
• TPs, FPs, TNs, FNs
• Agnostic to pairs not appearing in the standard
• ROC curves: Sensitivity-Specificity
• PR curves: Precision-Recall
Precision Recall Curves
• Predictions are ordered by confidence
• Precision = TP / (TP + FP)
• Recall = TP / (TP + FN)
[PR curve plot; both axes range from 0 to 1]
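A PR curve is traced by walking the ordered predictions and recomputing precision and recall at each step, as sketched here:

```python
def precision_recall_points(scores, labels):
    """Precision and recall at each threshold, walking predictions in
    descending score order. labels: True for gold-standard positives."""
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    total_pos = sum(labels)
    tp = fp = 0
    points = []
    for _, is_pos in ranked:
        if is_pos:
            tp += 1
        else:
            fp += 1
        points.append((tp / (tp + fp), tp / total_pos))  # (precision, recall)
    return points
```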
Summary Statistics
• AUC – area under the (ROC) curve
– equivalent to the Mann-Whitney U statistic
• Average precision – average of the precisions calculated at each true positive
– a quantized version of the area under the precision-recall curve (AUPRC)
• Precision @ n% recall
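Average precision, as defined above, is simply the mean of the precision values observed at the rank of each true positive:

```python
def average_precision(scores, labels):
    """Mean of the precision values at the rank of each true positive."""
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    tp = 0
    precisions = []
    for rank, (_, is_pos) in enumerate(ranked, start=1):
        if is_pos:
            tp += 1
            precisions.append(tp / rank)
    return sum(precisions) / len(precisions)
```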
Cross Validation
Graph Analysis for Predictions
• c_i = confidence that gene g_i belongs to the function
• S = set of genes in the function
• G = set of all genes
• w_i,j = weight of the edge between genes i and j
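The scoring formula itself was in a figure. One common formulation with these ingredients (an assumption here, not necessarily the slide's exact definition) scores gene g_i by the weight of its edges into the function set S relative to all of its edges, c_i = Σ_{j∈S} w_ij / Σ_{j∈G} w_ij:

```python
def function_confidence(weights, gene, function_set, all_genes):
    """Confidence that `gene` belongs to a function: weighted edges into
    the function set S, normalized by the gene's total edge weight.
    (Hypothetical formulation; the slide's exact formula was a figure.)"""
    def w(a, b):
        return weights.get((a, b), weights.get((b, a), 0.0))
    in_set = sum(w(gene, j) for j in function_set if j != gene)
    total = sum(w(gene, j) for j in all_genes if j != gene)
    return in_set / total if total else 0.0
```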
Steps for Our Evaluation
• Construct a gold standard
• Convert data to pair-wise format
• Count positive/negative pairs in each dataset
• Create CPTs to define the Bayes net
• Inference to calculate all pair-wise probabilities
• Evaluate performance
• Predict functions given the network
Bayesian Network Integration
• Gene expression: gene expression datasets 1 … N
• Physical interactions: yeast two-hybrid dataset 1, co-precipitation dataset 1
• Genetic interactions: synthetic lethality dataset, synthetic rescue dataset
• Other: transcription factor binding sites, localization, curated literature
Myers et al., 2005; Huttenhower et al., 2006; Guan et al., 2008
• Data integration via a Bayesian network
• User-selected query focuses the search
• Probabilistic, weighted networks of gene function
• Results displayed (e.g., new genes predicted to interact with known mitochondrial genes)
Basic Approach Applied Several Times
Myers et al., 2005; 2007
Guan et al., 2008
Huttenhower et al., 2007
Huttenhower et al., 2009
Limitations and Improvements
• Original work was designed for yeast and a general notion of "functionally related"
– Ignores the reality that some genes are related only under certain conditions
– Treats multi-cellular organisms as big single-celled organisms
• Increased specificity can be used to improve results
– The 2nd iteration of bioPIXIE included biological processes in its gold standards
– Currently working on a 2nd-generation mouseNET to account for tissue and developmental stage
General mouseNET Approach
Global Gold Standard
[Network of positive and negative relationships]
Specific Gold Standards
• Not all datasets capture all functional relationships
– Process/pathway specific
• Functionally related genes aren't always functionally related
– Tissue specific
– Developmental stage specific
Specific Gold Standard Construction
[Global gold standard filtered down to a specific gold standard: positive and negative relationships]
Tissue/Stage Gold Standards
• Based on data from GXD
• Cross-reference Theiler stages with the mammalian anatomy hierarchy
• 729 total intersections
– ranging from 50 to ~3500 genes
– not including post-natal stages
Initial Computational Evaluations
Preliminary Results
• Running 4-fold cross validation using tissue/stage-specific GO-based gold standards
[Training and test evaluation plots]
Preliminary Results
• Accounting for developmental stage helps
[Training and test evaluation plots]
Preliminary Results
• Many specific tissue/stage combinations are overfitting
[Training and test evaluation plots]
Preliminary Results
• Folds were randomly generated and are biased; need to balance positives and negatives
New Visualization Interface
• Graphle
Simple Things, Long Times
• No single step is too complicated
• Mostly O(G²D): quadratic in the number of genes G, linear in datasets D
• ~16M pairs × 800 × 4
• Evaluating one fold takes ~7 hours
• So far have results for ~200 tissue/stages
– Should have taken ~3 days on the cluster
– Actually took ~15 days
Bayesian network utility
• Bayesian networks are a powerful tool
• Currently improving on the existing MouseNET project by incorporating tissue/stage information
• Preliminary results are promising, though the standards may be too limited
• A multiple-stage process may be useful:
– predict tissue/stage-specific expression
– use these predictions in functional gold standards
– use a continuous gold standard?
Computational Solutions
• Machine learning & data mining
– Use existing data to make new predictions
• Similarity search algorithms
• Bayesian networks
• Support vector machines
• etc.
– Validate predictions with follow-up lab work
• Visualization & exploratory analysis
– Seeing and interacting with data is important
– Show data so that questions can be answered
• Scalability, incorporating statistics, etc.
From Relationships to Phenotypes
• Use the outputs of Bayesian data integration as inputs to a phenotype prediction problem
• For each gene, the vector of relationship probabilities is used as its feature vector
• Use a Support Vector Machine (SVM) to classify genes involved in a phenotype vs. not involved
• Process repeated for hundreds of phenotypes
SVM Methodology
• Every feature vector is thought of as a point in space
• Points nearer to each other tend to belong to the same class
• In our case, we have a ~20k-dimensional space where each point is a gene, and its location is determined by the relationship probabilities
[Matrix: ~20k genes × ~20k probabilities]
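To make the idea concrete, here is a tiny from-scratch linear SVM trained by stochastic sub-gradient descent on the hinge loss (Pegasos-style, no bias term). This is an illustrative sketch only; the talk's actual experiments used SVMlight, not this code, and the 2-D toy data stand in for the ~20k-dimensional gene vectors.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Minimal linear SVM: stochastic sub-gradient descent on the
    hinge loss with L2 regularization (Pegasos-style, no bias term).
    X: list of feature vectors; y: labels in {+1, -1}."""
    random.seed(0)
    w = [0.0] * len(X[0])
    t = 0
    idx = list(range(len(X)))
    for _ in range(epochs):
        random.shuffle(idx)
        for i in idx:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 shrinkage step
            if margin < 1:                            # margin violated
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1
```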
[Slides 55-61: SVM Methodology figure sequence]
61
Software for SVM• SVMlight & SVMperf from Cornell• http://svmlight.joachims.org/• Several simple kernels implemented,
can write additional code to use custom kernels
• “perf” version maximizes different statistics (AUC, precision, etc.)
![Page 62: Machine Learning for Functional Genomics II](https://reader035.vdocument.in/reader035/viewer/2022062323/56815249550346895dc086b6/html5/thumbnails/62.jpg)
62
Phenotype Predictions• Using the MGI phenotype info as a
starting point, predicted genes for ~1150 phenotypes
![Page 63: Machine Learning for Functional Genomics II](https://reader035.vdocument.in/reader035/viewer/2022062323/56815249550346895dc086b6/html5/thumbnails/63.jpg)
63
Phenotype Predictions• Every gene with at least one allele
annotated considered “involved” with the phenotype
![Page 64: Machine Learning for Functional Genomics II](https://reader035.vdocument.in/reader035/viewer/2022062323/56815249550346895dc086b6/html5/thumbnails/64.jpg)
64
Phenotype Predictions• Selected phenotypes with >30
annotations, <500 annotations, non-identical
• SVM trained for each phenotype• Classification predictions created for
all tested phenotypes• Can assess prediction performance
computationally
![Page 65: Machine Learning for Functional Genomics II](https://reader035.vdocument.in/reader035/viewer/2022062323/56815249550346895dc086b6/html5/thumbnails/65.jpg)
65
Evaluation Metrics• TPs, FPs, TNs, FNs• Agnostic to pairs not appearing in
standard
• ROC curves: Sensitivity-Specificity (TPR-FPR)
• PR curves: Precision-Recall
![Page 66: Machine Learning for Functional Genomics II](https://reader035.vdocument.in/reader035/viewer/2022062323/56815249550346895dc086b6/html5/thumbnails/66.jpg)
66
PR-curves vs. ROC curves• ROC gives you credit for correctly
predicting negatives• For function/phenotype predictions,
we realistically are only concerned with positives
• Further, we care most about high confidence positives
• PR-curves better at showing this
![Page 67: Machine Learning for Functional Genomics II](https://reader035.vdocument.in/reader035/viewer/2022062323/56815249550346895dc086b6/html5/thumbnails/67.jpg)
67
Performance Measurements• On average, 10 fold improvement
over random–Median - ~7.5 fold over random–Max ~100, Min ~1
• Some phenotypeswe can predictwell, others, not somuch
![Page 68: Machine Learning for Functional Genomics II](https://reader035.vdocument.in/reader035/viewer/2022062323/56815249550346895dc086b6/html5/thumbnails/68.jpg)
68
Some Top Phenotypes• Arrested B cell differentiation• Abnormal joint morphology• Abnormal cell cycle checkpoint• Decreased circulating hormone level• Abnormal liver development• …
![Page 69: Machine Learning for Functional Genomics II](https://reader035.vdocument.in/reader035/viewer/2022062323/56815249550346895dc086b6/html5/thumbnails/69.jpg)
69
Some Bottom Phenotypes• hepatoma• head bobbing• disheveled coat• necrosis• increased glycogen level• lethargy• …
![Page 70: Machine Learning for Functional Genomics II](https://reader035.vdocument.in/reader035/viewer/2022062323/56815249550346895dc086b6/html5/thumbnails/70.jpg)
70
PR vs. ROC
AUC=0.63~45 fold improvement AUC=0.63
![Page 71: Machine Learning for Functional Genomics II](https://reader035.vdocument.in/reader035/viewer/2022062323/56815249550346895dc086b6/html5/thumbnails/71.jpg)
71
PR vs. ROC
AUC=0.70~3 fold improvement
![Page 72: Machine Learning for Functional Genomics II](https://reader035.vdocument.in/reader035/viewer/2022062323/56815249550346895dc086b6/html5/thumbnails/72.jpg)
72
Some Interesting Phenotypes
![Page 73: Machine Learning for Functional Genomics II](https://reader035.vdocument.in/reader035/viewer/2022062323/56815249550346895dc086b6/html5/thumbnails/73.jpg)
73
Laboratory Evaluation• Computational evaluation helpful, but
not the real goal• Cheryl has been kindly testing two
predictions related to bone phenotypes– Timp2 and Abcg8– Timp2-/- female mice have decreased
bone density, and possible morphological defects
– Abcg8-/- male mice have increased bone density
![Page 74: Machine Learning for Functional Genomics II](https://reader035.vdocument.in/reader035/viewer/2022062323/56815249550346895dc086b6/html5/thumbnails/74.jpg)
74
Timp2 Preliminary Results
Timp2-/-, 5 days old
![Page 75: Machine Learning for Functional Genomics II](https://reader035.vdocument.in/reader035/viewer/2022062323/56815249550346895dc086b6/html5/thumbnails/75.jpg)
75
Results are complementary to Quantitative Genetics
![Page 76: Machine Learning for Functional Genomics II](https://reader035.vdocument.in/reader035/viewer/2022062323/56815249550346895dc086b6/html5/thumbnails/76.jpg)
76
Conclusions & Plans• Bayes nets and SVMs are powerful
tools• Careful construction of Training Sets
(Gold Standard) is key• Computational evaluations need to
be appropriate to the problem context
• Laboratory evaluations are critical• Complementary approaches are good
![Page 77: Machine Learning for Functional Genomics II](https://reader035.vdocument.in/reader035/viewer/2022062323/56815249550346895dc086b6/html5/thumbnails/77.jpg)
77
Acknowledgements
• Hibbs Lab– Karen Dowell– Tongjun Gu– Al Simons
• Olga Troyanskaya Lab– Patrick Bradley– Maria Chikina– Yuanfang Guan
• Chad Myers• David Hess• Florian Markowetz• Edo Airoldi• Curtis Huttenhower
• Kai Li Lab– Grant Wallace
• Amy Caudy
• Maitreya Dunham
• Botstein, Kruglyak, Broach, Rose labs
• Kyuson Yun
• Carol Bult