Post on 30-Dec-2015
Choosing where to look next in a mutation sequence space:
Active Learning of informative p53 cancer rescue mutants
Sam Danziger
Institute for Genomics and Bioinformatics
Department of Biomedical Engineering
University of California, Irvine
www.SamDanziger.com
Rainer Brachmann, Department of Medicine
Richard Lathrop, Department of Computer Science
Jue Zeng, Department of Medicine
University of California, Irvine
Outline
• Overview: Computer Guided Discovery
• Problem: Cancer and p53
• Results: Best Active Learning
• Next: Future Experiments
Computer Guided Discovery of “Active” Mutant Proteins
[Figure: the set of Known Mutants within the larger space of Other Possible Mutants]
• Starting Point: A biomedically important protein with some known mutants.
• Problem: Find novel mutant proteins with an “Active” phenotype.
• Naive Solution: Make and test all other possible mutants in the wet lab.
Why Use Computers?
• Spiral Galaxy M101 (image: http://hubblesite.org/): ~10^9 stars.
• Known Mutants: ~10^2
• How Many Mutants are There? ~10^11, assuming up to 5 mutations in 200 residues.
A Better Solution: Active Learning
Pick the best unknown mutants to know.
[Figure: the Active Learning cycle]
• Known: Examples 1, 2, 3, …, N form the Training Set.
• Unknown: Examples N+1, N+2, N+3, N+4, …, M.
• Train the Classifier on the Training Set.
• Choose an Example to Label from the Unknown examples.
• Add the New Example to the Training Set.
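The cycle on this slide can be sketched as a short loop. This is only an illustration: the function name `active_learning` and the `train`, `select` and `oracle` callables are hypothetical placeholders (the slides do not name a classifier or a labeling interface), with `oracle` standing in for the wet-lab test that supplies the true label.

```python
import numpy as np

def active_learning(X_known, y_known, X_pool, train, select, oracle, n_rounds):
    """Repeat: train the classifier, choose one unknown example, label it, add it.

    train(X, y)        -> model                  (hypothetical classifier trainer)
    select(model, C)   -> position in C to pick  (the active learning method)
    oracle(pool_index) -> label                  (stand-in for the wet-lab assay)
    """
    X_known = np.asarray(X_known, dtype=float)
    y_known = np.asarray(y_known)
    X_pool = np.asarray(X_pool, dtype=float)
    remaining = list(range(len(X_pool)))  # indices of still-unknown examples
    chosen = []
    for _ in range(n_rounds):
        if not remaining:
            break
        model = train(X_known, y_known)           # Train the Classifier
        pick = select(model, X_pool[remaining])   # Choose an Example to Label
        idx = remaining.pop(pick)
        label = oracle(idx)
        X_known = np.vstack([X_known, X_pool[idx]])  # Add the New Example
        y_known = np.append(y_known, label)          # To Training Set
        chosen.append(idx)
    return X_known, y_known, chosen
```

Keeping `select` as a parameter lets each method in the deck (Minimum Marginal Hyperplane, Maximum Curiosity, and so on) plug into the same loop.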
An Example of Active Learning: Minimum Marginal Hyperplane
Should unknown Mutant 1 or Mutant 2 be added to the training set?
[Figure: a hyperplane separating the ACTIVE region (Known Active examples) from the INACTIVE region (Known Inactive examples), with Unknown Mutant 1 and Unknown Mutant 2 plotted]
Select Mutant 2.
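A minimal sketch of margin-based selection, assuming a linear decision boundary w·x + b = 0 from an already-trained classifier. The function name `min_margin_index`, the weights and the toy coordinates are all hypothetical; the slides give neither the classifier nor the features. The unknown example closest to the hyperplane is the one the classifier is least sure about, so it is the one chosen.

```python
import numpy as np

def min_margin_index(w, b, X_pool):
    """Return the index of the unknown example closest to the hyperplane w.x + b = 0."""
    w = np.asarray(w, dtype=float)
    X_pool = np.asarray(X_pool, dtype=float)
    distances = np.abs(X_pool @ w + b) / np.linalg.norm(w)
    return int(np.argmin(distances))

# Toy illustration: the first point lies far from the boundary, the second near it.
pool = [[3.0, 1.0], [0.5, 2.0]]
choice = min_margin_index(w=[1.0, 0.0], b=0.0, X_pool=pool)
```

Replacing `argmin` with `argmax` would pick the example farthest from the boundary instead, which is one plausible reading of the Maximum Marginal Hyperplane variation listed later in the deck.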
Another Example: Maximum Curiosity
Should Mutant 1 or Mutant 2 be added to the training set?
[Figure: a Cross-validator is run on the Training Set alone and on four augmented sets: Training Set + Mutant 1 (Active), Training Set + Mutant 1 (Inactive), Training Set + Mutant 2 (Active), Training Set + Mutant 2 (Inactive). Change in correlation coefficient, as shown on the slide: .0411 (highlighted), .0276, -.6014, .0309]
Select Mutant 1.
A Third Example: Entropic Tradeoff
[Figure: Known Active, Known Inactive and Unclassified examples plotted across the ACTIVE and INACTIVE regions; several Unclassified examples are marked OK and one is marked Selected]
Which is the Best Active Learning Method?
TYPE I: Select mutants that most improve the classifier if correctly predicted.
• Maximum Curiosity
• Composite Classifier
• Improved Composite Classifier
TYPE II: Select mutants that most improve the classifier.
• Additive Curiosity
• Additive Bayesian Surprise
TYPE III: Common methods taken from the literature.
• Minimum Marginal Hyperplane
• Maximum Entropy
TYPE IV: Variations on methods from the literature.
• Maximum Marginal Hyperplane
• Minimum Entropy
• Entropic Tradeoff
TYPE C: Controls
• Non-iterated Prediction
• Predict All Inactive
• Random (30 trials)
The Problem: p53 and Cancer
• p53 mutations occur in ~50% of human cancers.
• Tumor Suppressor Protein.
• Receives upstream signals indicating cellular stress.
• Acts as a transcription factor in the cancer suppression pathway.
[Figure: p53 core domain bound to DNA. Image generated with UCSF Chimera.]
Cho, Y., Gorina, S., Jeffrey, P.D., Pavletich, N.P. Crystal structure of a p53 tumor suppressor-DNA complex: understanding tumorigenic mutations. Science v265 pp.346-355, 1994
The p53 Cancer Pathway
[Figure: pathway diagram. David W. Meek: http://www.dundee.ac.uk/biomedres/meek.htm]
[Figure: p53 domain map from N terminus to C terminus: Transactivation (1-42), Core domain for DNA binding (102-292), Tetramerization (324-355)]
The Concept of “Cancer Rescue”: Second-site Suppressor Mutations
[Figure: p53 positions 175, 245, 248, 249, 273 and 282 marked, along with 235+240. Cancer mutation prevalence data from the IARC p53 database: http://www-p53.iarc.fr/]
Ultimate Goal
Inactive p53 Cancer Mutant + Engineered Small Molecule Drug = Functionally Active Rescued p53
Advance medical practice by revealing p53 mutant functional properties across p53’s mutation sequence space.
Intermediate Goal
Find novel p53 Cancer Rescue Mutants.
Immediate Goal
Evaluating Cancer Rescue Mutants in the Wet Lab
• A yeast containing an inactive p53 cancer mutant will not grow (INACTIVE).
• A yeast containing an active p53 cancer rescue mutant will grow (ACTIVE).
Baroni, T.E., Wang, T., Qian, H., Dearth, L.R., Truong, L.N., Zeng, J., Denes, A.E., Chen, S.W. and Brachmann, R.K. (2004) A global suppressor motif for p53 cancer mutants. Proc Natl Acad Sci U S A, 101, 4930-5.
In Vitro Phenotype
In a Nutshell
Cancer Rescue Mutants
[Figure: a cycle of Knowledge, Model and Experiment around the goal "Find all p53 cancer rescue mutants"]
• Build classifiers of putative p53 cancer rescue mutants.
• Use Active Learning to select the p53 mutants that will be the most informative.
• Test the predictions in-vitro.
The Active Learning Tradeoff: How Fast Does It Learn?
The Active Learning Tradeoff: How Accurate On The Chosen?
204 Predicts 57
Type | Method | Accuracy | Correlation Coefficient | Student-T
I | Maximum Curiosity | 77.19% +/- 5.61% | .5255 | 0.00%
I | Composite Classifier | 70.18% +/- 6.11% | .4447 | 100.0%
I | Improved Composite Classifier | 71.93% +/- 6.00% | .4637 | 100.0%
II | Additive Curiosity | 73.68% +/- 5.88% | .3857 | 99.81%
II | Additive Bayesian Surprise | 73.68% +/- 5.88% | .4342 | 99.81%
III | Minimum Marginal Hyperplane | 64.91% +/- 6.38% | .2845 | 100.0%
III | Maximum Entropy | 64.91% +/- 6.38% | .2845 | 100.0%
IV | Maximum Marginal Hyperplane | 78.95% +/- 5.45% | .3699 | 90.42%
IV | Minimum Entropy | 77.19% +/- 5.61% | .3406 | 0.00%
IV | Entropic Tradeoff | 80.70% +/- 5.27% | .4860 | 99.89%
C | Non-iterated Prediction | 56.14% +/- 6.63% | .2530 | 100.0%
C | Predict All Inactive | 80.70% +/- 5.27% | .0000 | 99.89%
C | Random (30 trials) | 74.39% +/- 3.87% | .3550 +/- .0992 | 99.24% +/- 2.89%
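The slides do not say how the +/- values on the single-run accuracies were computed. As an observation only, they are numerically consistent with a binomial standard error sqrt(p(1-p)/(n-1)) over the n = 57 predicted mutants. The sketch below reproduces the Maximum Curiosity row under that assumption; the function name `accuracy_with_error` is hypothetical, and the count of 44 correct predictions is inferred from 77.19% of 57, not stated in the deck.

```python
import math

def accuracy_with_error(n_correct, n_total):
    """Return (accuracy %, standard error %) assuming se = sqrt(p(1-p)/(n-1))."""
    p = n_correct / n_total
    se = math.sqrt(p * (1.0 - p) / (n_total - 1))
    return 100.0 * p, 100.0 * se

acc, err = accuracy_with_error(44, 57)   # 44 of 57 is an inferred count
print(f"{acc:.2f}% +/- {err:.2f}%")      # prints "77.19% +/- 5.61%"
```

The same formula gives 80.70% +/- 5.27% for 46 of 57 and 64.91% +/- 6.38% for 37 of 57, matching the corresponding rows.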
The Tradeoff
[Figure: methods plotted with "How Fast Does It Learn?" on the horizontal axis and "How Accurate On The Chosen?" on the vertical axis; Maximum Curiosity, Entropic Tradeoff and Minimum Marginal Hyperplane are labeled]
• Sum? Length + Width
• Geometric Distance?
• Area? Length * Width
Solution: Average Score of All Three Metrics
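The three candidate ways of combining the two axes can be written down directly. How the slides turn each metric into a "score" before averaging is not stated, so the rank-based scoring below (1 for the worst method, n for the best, under each metric) is purely an assumption for illustration, as are the function names and the toy inputs.

```python
import numpy as np

def combined_metrics(speed, accuracy):
    """Combine 'how fast it learns' and 'how accurate on the chosen' three ways."""
    speed = np.asarray(speed, dtype=float)
    accuracy = np.asarray(accuracy, dtype=float)
    return {
        "sum": speed + accuracy,                 # Length + Width
        "distance": np.hypot(speed, accuracy),   # geometric distance from the origin
        "area": speed * accuracy,                # Length * Width
    }

def average_rank_score(speed, accuracy):
    """Average, over the three metrics, of a rank score (assumed scoring rule)."""
    metrics = combined_metrics(speed, accuracy)
    # argsort of argsort gives each method's 0-based rank; add 1 so worst = 1.
    scores = [np.argsort(np.argsort(m)) + 1 for m in metrics.values()]
    return np.mean(scores, axis=0)

# Hypothetical values for three methods, not taken from the slides.
print(average_rank_score(speed=[3.0, 1.0, 2.0], accuracy=[4.0, 1.0, 1.0]))
```

Averaging over all three metrics avoids committing to any one of them, since a sum, a distance and an area can order the same points differently.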
The Overall Best
Rank | Method | Average Score
1 | Maximum Curiosity | 6.11
2 | Entropic Tradeoff | 5.56
3 | Random (30 trials) | 5.50
4 | Minimum Entropy | 4.44
5 | Maximum Marginal Hyperplane | 3.22
6 | Maximum Entropy | 3.22
7 | Additive Bayesian Surprise | 2.89
8 | Minimum Marginal Hyperplane | 2.33
9 | Additive Curiosity | 1.89
How Fast Does It Learn? The Three Previous Examples
How Accurate On The Chosen? The Three Previous Examples
204 Predicts 57
Type | Method | Accuracy | Correlation Coefficient | Student-T
I | Maximum Curiosity | 77.19% +/- 5.61% | .5255 | 0.00%
III | Minimum Marginal Hyperplane | 64.91% +/- 6.38% | .2845 | 100.0%
IV | Entropic Tradeoff | 80.70% +/- 5.27% | .4860 | 99.89%
C | Non-iterated Prediction | 56.14% +/- 6.63% | .2530 | 100.0%
C | Predict All Inactive | 80.70% +/- 5.27% | .0000 | 99.89%
C | Random (30 trials) | 74.39% +/- 3.87% | .3550 +/- .0992 | 99.24% +/- 2.89%
Why Does Random Do So Well?
Tong, S. and D. Koller (2002). "Support vector machine active learning with applications to text classification." The Journal of Machine Learning Research 2: 45-66.
Very Few Examples
Exploring New p53 Regions
• Each new p53 region potentially introduces new rescue mechanisms.
• New pools of mutants restart the Active Learning problem.
[Figure: p53 Core Domain from N to C, with regions 113-124 and 281-289 and positions 175, 245, 248, 273 and 282 marked]
Most Interesting or Most Interesting Active?
Which Finds More Active Cancer Rescue Mutants?
[Figure: two strategies, "Select The Most Interesting" and "Select The Most Interesting Active", each shown expanding from the Known Mutants over Iterations 1, 2 and 3]
Conclusion
[Figure: a cycle of Theory, Experiment and Knowledge around the goal "Find Cancer Rescue Mutants"]
Acknowledgments
• Baldi Lab: Pierre Baldi, Jonathan Chen, Hiroto Saigo, S. Joshua Swamidass
• Brachmann Lab: Rainer Brachmann, Jue Zeng
• Lathrop Lab: Richard Lathrop, Gabe Moothart
• Leuke Lab: Ying Wang
• Luo Lab: Ray Luo, Qiang Lu
Funding: National Institutes of Health (p53: CA112560), UCI Office of Research and Graduate Studies, UCI Institute for Genomics and Bioinformatics (BIT: LM007443), US Department of Energy (DOE)
Questions?
[Figure: a cycle of Theory, Experiment and Knowledge around the goal "Find Cancer Rescue Mutants"]
Most Interesting Region
Scan the p53 core domain to find the most interesting region.
Create All Single Point Mutations in a Region in-vitro?
• CODA*: Assemble p53 using thermodynamically optimized oligonucleotides.
• Allow all possible mutations within a region.
• Assemble mutated region with cancer mutants to look for rescue mutants.
*http://www.codagenomics.com/
Knowledge Representation: Homology Modeling
Modeling done using Amber™ with zinc ion characteristics tuned by Dr. Qiang Lu working in Dr. Ray Luo’s lab.
1. Take a wild type crystal structure of the protein in question.
2. Substitute one or more amino acids to mutate the protein.
3. Apply simulated physical laws to determine an energy function.
4. Minimize the energy of the new mutant protein.
Knowledge Representation: Features
Simulated Structure -> String of Numbers
• 1d: Sequence Mutation Features
• s1d: Sequence Similarity Features
• 2d: Surface Map Features
• 3d: Atomic Position Features
• 4d: “Time Dependent” Stability Information
What is Machine Learning?
Training: Set the parameters (W) with n features.
[Figure: an m x n feature matrix with rows Example 1, Example 2, …, Example m holding features F11, F12, …, F1n through Fm1, …, Fmn, paired with known labels Class 1, Class 2, …, Class m, used to set the weights W1, W2, …, Wn]
Testing: Use the parameters (W) to predict unclassified examples.
[Figure: an Unknown example with features F11, F12, …, F1n combined with W1, W2, …, Wn to give a Prediction]
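The training and testing steps on this slide can be illustrated with a deliberately simple linear model. This is a sketch under assumptions: the slides do not say which learner sets W, so ordinary least squares is used here as a stand-in, and the names `train_weights` and `predict`, the extra bias weight, and the toy data are all hypothetical.

```python
import numpy as np

def train_weights(F, classes):
    """Training: fit W (n feature weights plus one bias) on an m x n feature matrix."""
    F = np.asarray(F, dtype=float)
    # Encode class 1 (e.g. Active) as +1 and class 0 (e.g. Inactive) as -1.
    targets = np.where(np.asarray(classes) == 1, 1.0, -1.0)
    F_bias = np.hstack([F, np.ones((F.shape[0], 1))])  # constant column for the bias
    W, *_ = np.linalg.lstsq(F_bias, targets, rcond=None)
    return W

def predict(W, f_unknown):
    """Testing: apply W to the features of one unclassified example."""
    f = np.append(np.asarray(f_unknown, dtype=float), 1.0)
    return 1 if f @ W > 0 else 0

# Toy data: m = 4 examples, n = 1 feature.
W = train_weights([[0.0], [1.0], [3.0], [4.0]], [0, 0, 1, 1])
label = predict(W, [5.0])
```

Any classifier that maps a feature vector to a class would fit the same two-step picture; the least-squares fit is only the shortest one to write out.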
Modeling: How To Use It
• Biology: Make a protein and test it in-vitro. PRO: Real. CON: Slow.
• Computer Generated Structure: Predict a protein structure in-silico. PRO: Fast. CON: Inaccurate, what does it tell us?
• Machine Learning: Use Homology Modeling to guide biological research.
Maximum Curiosity
• Start with a training set of examples with known classes and an unclassed test set.
• Choose a mutant from the test set that has not been considered yet. Assume the chosen mutant is “Active” or “Inactive”.
• Crossvalidate the training set with the chosen mutant and record the correlation coefficient.
[Figure: a cycle of Knowledge, Model and Experiment around the goal "Find the Mutants that Most Improve the Training Set"]
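The steps above can be sketched as follows. Several pieces are assumptions made only to keep the example runnable: a nearest-centroid rule stands in for the unnamed classifier, leave-one-out is used as the cross-validation scheme, the Matthews correlation coefficient is used as the correlation coefficient, and each candidate is scored by the larger of its two changes (assumed Active, assumed Inactive). None of these choices is stated in the slides, and all function names are hypothetical.

```python
import numpy as np

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for labels in {0, 1}; 0.0 when undefined."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return 0.0 if denom == 0 else float(tp * tn - fp * fn) / denom

def predict_centroid(X_train, y_train, x):
    """Stand-in classifier: return the label of the nearest class centroid."""
    present = np.unique(y_train)
    dists = [np.linalg.norm(x - X_train[y_train == c].mean(axis=0)) for c in present]
    return int(present[int(np.argmin(dists))])

def loo_cv_mcc(X, y):
    """Leave-one-out cross-validated correlation coefficient of the stand-in classifier."""
    preds = np.empty_like(y)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        preds[i] = predict_centroid(X[keep], y[keep], X[i])
    return mcc(y, preds)

def curiosity_scores(X_train, y_train, X_pool):
    """Score each unclassed mutant by its best change in cross-validated correlation."""
    base = loo_cv_mcc(X_train, y_train)
    scores = []
    for x in X_pool:
        changes = []
        for assumed in (1, 0):  # assume the chosen mutant is Active, then Inactive
            X_aug = np.vstack([X_train, x])
            y_aug = np.append(y_train, assumed)
            changes.append(loo_cv_mcc(X_aug, y_aug) - base)
        scores.append(max(changes))
    return np.array(scores)
```

The mutant to send to the wet lab would then be `int(np.argmax(curiosity_scores(X_train, y_train, X_pool)))`. The cost is one full cross-validation per candidate per assumed class, which is why the size of the candidate pool matters.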
Primary Collaborators
• Dr. Rainer Brachmann, School of Medicine
• Dr. Richard Lathrop, School of Information and Computer Science
• Jue Zeng, School of Medicine