![Page 1: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/1.jpg)
Computational Diagnostics
A new research group at the
Max Planck Institute for Molecular Genetics,
Berlin
![Page 2: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/2.jpg)
Will the patient respond to this drug?
![Page 3: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/3.jpg)
computational diagnostics
A simple solution for simple problems
Find all genes that are induced at least x-fold and use them to predict clinical outcomes
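The "at least x-fold" rule can be sketched in a few lines. This is only an illustration: the function name, the toy numbers, and the choice to compare mean raw intensities between two conditions are assumptions, not part of the slides.

```python
import numpy as np

def genes_induced_at_least(expr_treated, expr_control, x):
    """Return indices of genes whose mean expression ratio is >= x."""
    # Rows are genes, columns are replicate samples; values are assumed to be
    # positive intensities on the raw (not log) scale.
    mean_treated = expr_treated.mean(axis=1)
    mean_control = expr_control.mean(axis=1)
    fold_change = mean_treated / mean_control
    return np.flatnonzero(fold_change >= x)

# Toy data: 4 genes, 3 replicates per condition (made-up numbers).
control = np.array([[100., 110., 90.],
                    [50., 55., 45.],
                    [200., 190., 210.],
                    [10., 12., 8.]])
treated = np.array([[310., 290., 300.],
                    [52., 48., 50.],
                    [195., 205., 200.],
                    [40., 44., 36.]])

induced = genes_induced_at_least(treated, control, x=2.0)
print(induced)  # genes 0 and 3 are induced at least 2-fold
```

The rule ignores variability between replicates, which is one reason it only suits simple problems.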
![Page 4: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/4.jpg)
Statistical Modeling
Experimental Design, Quality Control, Scaling, Normalization, Dimension Reduction, Predictive Classification, Quantifying the Evidence, Identifying the Evidence
![Page 5: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/5.jpg)
Computational Infrastructure and more Data
Databases, Automatic Uploading, Standard Analysis Protocols, Analysis Software, Query Language, Understanding the disease, Designing a small Diagnostic Chip
![Page 6: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/6.jpg)
Clinical Practice
Large Patient Databases complemented by expression profiles monitoring the Epidemiology of the disease
![Page 7: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/7.jpg)
Breast Cancer, Expression Profiles and
Binary Regression in 7000 Dimensions
Rainer Spang, Harry Zuzan, Carrie Blanchette, Erich Huang, Holly Dressman, Jeff Marks,
Joe Nevins, Mike West
Duke Medical Center & Duke University
![Page 8: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/8.jpg)
Estrogen Receptor Status
• 7000 genes
• 49 breast tumors
• 25 ER+
• 24 ER-
![Page 9: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/9.jpg)
Tumor – Chip – 7000 Numbers
![Page 10: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/10.jpg)
We Assume That the Following Steps Are Done:
• Choosing the patients
• Doing the surgery
• Handling the tissues
• Preparing mRNA
• Hybridizing the chips
• Image analysis
• Excluding low quality data
• Normalization
• Scaling
![Page 11: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/11.jpg)
![Page 12: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/12.jpg)
How Much Evidence Is There?
• "I am 80% sure": the probability that the patient has xxx, given the profile, is 0.8
• "I know it": the probability is 1
• "It was a guess": the probability is 0.5
![Page 13: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/13.jpg)
Given: 7000 numbers
Wanted: the probability that the tumor is ER+, e.g. 89%
![Page 14: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/14.jpg)
7000 Numbers Are More Numbers Than We Need
Predict ER status based on the expression levels of super-genes
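The slides do not say how the super-genes are built. One common construction, assumed here, is a singular value decomposition of the gene-centered expression matrix, which yields at most as many factors as there are tumors. The random matrix below is only a placeholder for real data, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_tumors = 7000, 49
X = rng.normal(size=(n_genes, n_tumors))  # placeholder for real expression data

# Center each gene across tumors, then factor Xc = A @ diag(d) @ F.
Xc = X - X.mean(axis=1, keepdims=True)
A, d, F = np.linalg.svd(Xc, full_matrices=False)

# Each row of F is one super-gene: a linear combination of gene expression
# levels, evaluated on the 49 tumors. The rows are mutually orthonormal.
supergenes = F
print(supergenes.shape)  # (49, 49)
```

The 7000 numbers per tumor are thereby replaced by 49 super-gene values per tumor.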
![Page 15: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/15.jpg)
![Page 16: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/16.jpg)
Overfitting: We Cannot Identify a Model
• There are many different models that assign high probabilities for ER+ tumors and low probabilities for ER- tumors in the training set
• For a new patient we find among these models some that support that she is ER+ and others that predict she is ER-
• ???
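A small numerical sketch of this dilemma, under simplifying assumptions: a linear score whose sign stands in for the predicted class, random numbers in place of real profiles, and hypothetical variable names. Both weight vectors below reproduce every training label exactly, yet they disagree on the new patient.

```python
import numpy as np

rng = np.random.default_rng(1)
n_tumors, n_genes = 49, 7000
X = rng.normal(size=(n_tumors, n_genes))            # training profiles (rows)
y = np.where(np.arange(n_tumors) < 25, 1.0, -1.0)   # 25 ER+ (+1), 24 ER- (-1)

# Minimum-norm weights that reproduce the training labels exactly.
P = np.linalg.pinv(X)
w0 = P @ y

# The part of a new profile that is orthogonal to every training profile.
x_new = rng.normal(size=n_genes)
v = x_new - P @ (X @ x_new)

# Shift w0 along v so that the new patient scores exactly +1 or exactly -1.
# The training scores do not change because X @ v is zero.
base = x_new @ w0
w_pos = w0 + (+1.0 - base) / (x_new @ v) * v
w_neg = w0 + (-1.0 - base) / (x_new @ v) * v

print(np.allclose(X @ w_pos, y), np.allclose(X @ w_neg, y))
print(x_new @ w_pos, x_new @ w_neg)
```

With 7000 weights and only 49 labeled tumors, the training data alone cannot choose between `w_pos` and `w_neg`.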
![Page 17: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/17.jpg)
Given the Few Profiles With Known Diagnosis:
• The uncertainty about the right model is high
• The variance of the model weights is large
• The likelihood landscape is flat
• We need additional model assumptions to solve the problem
![Page 18: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/18.jpg)
Informative Priors
Likelihood × Prior ∝ Posterior
![Page 19: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/19.jpg)
If the Prior Is Chosen Badly:
• We cannot reproduce the diagnosis of the training profiles any more
• We still cannot identify the model
• The diagnosis is driven mostly by the additional assumptions and not by the data
![Page 20: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/20.jpg)
The Prior Needs to Be Designed in 49 Dimensions
• Shape?
• Center?
• Orientation?
• Not too narrow ... not too wide
![Page 21: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/21.jpg)
Shape
Multidimensional normal, for simplicity
![Page 22: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/22.jpg)
Center
Assumptions on the model correspond to assumptions on the diagnosis
![Page 23: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/23.jpg)
Orientation
Orthogonal super-genes!
![Page 24: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/24.jpg)
Not Too Narrow ... Not Too Wide
Auto-adjusting model
Scales are hyperparameters with their own priors
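Putting the four design choices together, the model could be written roughly as follows. This is a sketch under assumptions: the symbols are notation introduced here (β_k is the weight of super-gene k, f_ki its value in tumor i, τ_k its scale), the probit link Φ is taken from the summary slide, and the zero center is one neutral choice that the slides do not state explicitly. The form of the prior on the scales is left open.

```latex
P(y_i = 1 \mid \beta) = \Phi\!\left( \sum_{k=1}^{49} \beta_k f_{ki} \right),
\qquad
\beta_k \mid \tau_k \sim \mathcal{N}(0, \tau_k^2),
\qquad
\tau_k^2 \sim p(\tau_k^2)
```

With all β_k at the prior center, every tumor gets probability Φ(0) = 0.5, so the center itself favors neither diagnosis.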
![Page 25: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/25.jpg)
What are the additional assumptions that came in with the prior?
• The model cannot be dominated by only a few super-genes (genes!)
• The diagnosis is done based on global changes in the expression profiles influenced by many genes
• The assumptions are neutral with respect to the individual diagnosis
![Page 26: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/26.jpg)
![Page 27: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/27.jpg)
Which Genes Have Driven the Prediction?

| Gene | Weight |
| --- | --- |
| nuclear factor 3 alpha | 0.853 |
| cysteine rich heart protein | 0.842 |
| estrogen receptor | 0.840 |
| intestinal trefoil factor | 0.840 |
| x box binding protein 1 | 0.835 |
| gata 3 | 0.818 |
| ps 2 | 0.818 |
| liv1 | 0.812 |
| ... many, many more ... | ... |
![Page 28: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/28.jpg)
Cysteine Rich Heart Protein
![Page 29: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/29.jpg)
Summary ... so far
• We have solved a relatively simple computational diagnostics problem (ER-status in human breast cancers)
• Probit model
• Overfitting is a problem
• Additional model assumptions do the trick
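A minimal sketch of how a prior pins down a probit model, under strong simplifications: one fixed scale `tau` shared by all super-genes (the slides let the scales adapt through their own priors), a MAP point estimate by plain gradient ascent instead of a full posterior, and simulated data. All function and variable names are hypothetical.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)

def probit_map(F, y, tau=1.0, steps=2000, lr=0.05):
    """MAP weights for a probit model with independent N(0, tau^2) priors.

    F: (n_samples, n_factors) matrix of super-gene values, y: 0/1 labels.
    """
    beta = np.zeros(F.shape[1])
    for _ in range(steps):
        eta = F @ beta
        p = np.clip(norm_cdf(eta), 1e-10, 1 - 1e-10)
        # Gradient of the log-likelihood plus gradient of the log-prior.
        score = norm_pdf(eta) * (y - p) / (p * (1.0 - p))
        beta += lr * (F.T @ score - beta / tau**2)
    return beta

# Simulated stand-in for the tumor data: 500 genes, 49 tumors.
rng = np.random.default_rng(2)
n_genes, n_tumors = 500, 49
y = (np.arange(n_tumors) < 25).astype(float)   # 25 ER+, 24 ER-
X = rng.normal(size=(n_genes, n_tumors))
X[:50] += 1.5 * (2 * y - 1)                    # 50 genes carry signal
Xc = X - X.mean(axis=1, keepdims=True)
_, _, Ft = np.linalg.svd(Xc, full_matrices=False)
F = Ft.T                                       # tumors x orthonormal super-genes

beta = probit_map(F, y, tau=1.0)
p_train = norm_cdf(F @ beta)
print(p_train.round(2))
```

The log-posterior is concave, so the weights are now unique. The training tumors still fall on the correct side of 0.5, but their probabilities are pulled toward 0.5 instead of being pushed to 0 or 1.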
![Page 30: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/30.jpg)
A Common Problem With Expression Profiles
• We do not have enough samples to answer a certain question
• A possible strategy: introduce additional model assumptions
![Page 31: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/31.jpg)
Differential Expression I
Setup: two conditions (healthy vs. sick), some replicates, 10 000 genes
Which genes are up- or down-regulated?
The most basic question
Good because it is a hypothesis-free approach
![Page 32: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/32.jpg)
Differential Expression II
10 000 degrees of freedom
A very bad multiple testing problem
It is possible in principle, but might require many replications depending on signal-to-noise ratios
SAM: regularized t-statistic + permutation-based false positive rates
Hard to improve the analysis because it is a hypothesis-free approach
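The two ingredients named for SAM can be illustrated as follows. This is a simplified sketch in the spirit of SAM, not the published procedure: the constant `s0`, the threshold, and the simulated data are arbitrary choices made for illustration, and the function names are hypothetical.

```python
import numpy as np

def regularized_t(x, labels, s0):
    """d_g = (mean_1 - mean_0) / (s_g + s0) for every gene (row) g."""
    a, b = x[:, labels == 1], x[:, labels == 0]
    na, nb = a.shape[1], b.shape[1]
    diff = a.mean(axis=1) - b.mean(axis=1)
    pooled_var = (a.var(axis=1, ddof=1) * (na - 1) +
                  b.var(axis=1, ddof=1) * (nb - 1)) / (na + nb - 2)
    s = np.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return diff / (s + s0)

def permutation_fdr(x, labels, s0, threshold, n_perm=100, seed=0):
    """Estimate the false discovery rate among genes with |d| >= threshold."""
    rng = np.random.default_rng(seed)
    observed = np.abs(regularized_t(x, labels, s0))
    n_called = int((observed >= threshold).sum())
    # Shuffling the sample labels shows how many genes pass by chance alone.
    null_counts = [
        (np.abs(regularized_t(x, rng.permutation(labels), s0)) >= threshold).sum()
        for _ in range(n_perm)
    ]
    expected_false = float(np.mean(null_counts))
    return n_called, min(expected_false / max(n_called, 1), 1.0)

# Simulated experiment: 10 000 genes, 5 replicates per condition.
rng = np.random.default_rng(0)
n_genes, n_rep = 10_000, 5
labels = np.array([0] * n_rep + [1] * n_rep)
x = rng.normal(size=(n_genes, 2 * n_rep))
x[:200, labels == 1] += 3.0   # 200 truly up-regulated genes

n_called, fdr = permutation_fdr(x, labels, s0=0.3, threshold=2.5)
print(n_called, round(fdr, 2))
```

The added constant `s0` keeps genes with tiny variance estimates from producing huge statistics, and the permutations replace a distributional assumption with the data themselves.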
![Page 33: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/33.jpg)
Clustering of Genes
• Setup: many different conditions (time series, multiple knock-outs)
• 100% exploratory analysis
• Essentially, it is rearranging the data
• Good for finding hypotheses but not for verifying them
![Page 34: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/34.jpg)
Clustering of Profiles (Patients)
• Maybe we can find new disease types or refine existing ones
• Completely different results when different sets of genes are used
• No predictive analysis
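The dependence on the chosen gene set can be illustrated with simulated data. In this sketch, splitting patients by the sign of the leading principal component stands in for a real clustering algorithm, and the two gene sets, group labels, and function names are invented for the example.

```python
import numpy as np

def two_group_split(x):
    """Split samples (columns) by the sign of the leading principal component."""
    xc = x - x.mean(axis=1, keepdims=True)
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return (vt[0] > 0).astype(int)

def agreement(a, b):
    """Fraction of samples grouped the same way, ignoring label names."""
    same = float(np.mean(a == b))
    return max(same, 1.0 - same)

rng = np.random.default_rng(3)
n_patients = 40
group_1 = (np.arange(n_patients) < 20).astype(int)   # e.g. a disease subtype
group_2 = (np.arange(n_patients) % 2).astype(int)    # e.g. an unrelated factor

# Gene set A responds to group_1, gene set B responds to group_2.
genes_a = rng.normal(size=(100, n_patients)) + 2.0 * group_1
genes_b = rng.normal(size=(100, n_patients)) + 2.0 * group_2

split_a = two_group_split(genes_a)
split_b = two_group_split(genes_b)
print(agreement(split_a, group_1), agreement(split_b, group_2))
print(agreement(split_a, split_b))
```

Each gene set yields a clean two-group structure, but the two structures agree no better than chance, so the clustering alone cannot say which grouping is the clinically relevant one.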
![Page 35: Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin](https://reader036.vdocument.in/reader036/viewer/2022062304/56649f355503460f94c53d79/html5/thumbnails/35.jpg)
Think About Data Analysis Ahead of Time
Collect possible questions on the data
Which of them are easy? (Biologists and bioinformaticians might have different takes on that.)
Compare: number of samples vs. degrees of freedom
It is possible to compensate for a lack of data with model assumptions: which assumptions make sense?
More complex questions can be the easier ones if they allow for an appropriate model