The Genomics Revolution and Human Health
Michael Snyder
August 15, 2013
Conflicts: Personalis, Genapsys, Illumina
Health Is a Product of Genome + Environment
Exposome
Health
Genome
Impact of Genomics on Medicine
• Understand and Treat Disease
  – Cancer
  – Mystery diseases
• Pharmacogenomics
  – Determining drug side effects and doses
• Managing Health Care in Healthy Individuals
Personalized Omics Profiling: Combine Genomic and Other Omic Information
Genomic, Transcriptomic, Proteomic, Metabolomic
1. Predict risk
2. Diagnose
3. Monitor
4. Treat &
5. Understand
Disease States
GGTTCCAAAAGTTTATTGGATGCCGTTTCAGTACATTTATCGTTTGCTTTGGATGCCCTAATTAAAAGTGACCCTTTCAAACTGAAATTCATGATACACCAATGGATATCCTTAGTCGATAAAATTTGCGAGTACTTTCAAAGCCAAATGAAATTATCTATGGTAGACAAAACATTGACCAATTTCATATCGATCCTCCTGAATTTATTGGCGTTAGACACAGTTGGTATATTTCAAGTGACAAGGACAATTACTTGGACCGTAATAGATTTTTTGAGGCTCAGCAAAAAAGAAAATGGAAATTAATTTTGAAGTGCCATTGA….
Personal “Omics” Profiling (POP)
Personal Omics Profile:
Genome
Transcriptome (mRNA, miRNA, isoforms, edits)
Proteome
Metabolome
Autoantibody-ome
Microbiome
Cytokines
Epigenome
Initially 40K molecules/measurements
Now billions!
Personal Omics Profile: 40 months; 61 time points; 6 viral infections
Chen et al., Cell 2012
Accurate Genome Sequencing
3.3M high-confidence SNVs, 217K indels, and 3K SVs (2 or more platforms)
(Plus low confidence)
Whole Genome Sequencing
• Complete Genomics: 35 b paired ends (150X)
• Illumina: 100 b paired ends (120X)
Exome Sequencing
• Nimblegen
• Illumina
• Agilent
[Venn diagram, Complete Genomics (CG) vs. Illumina: 3.30M (89%); 100K (2%); 345K (9%)]
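A cross-platform comparison like the one on this slide can be sketched with plain set operations. The snippet below is only an illustration: the function name, the tuple layout (chromosome, position, ref, alt), and the toy variants are assumptions, not the pipeline used for the talk.

```python
def compare_call_sets(calls_a, calls_b):
    """Count variants shared by two call sets and specific to each one.

    Each call is a hashable key, here (chrom, pos, ref, alt); this layout is
    an assumption made for the example.
    """
    a, b = set(calls_a), set(calls_b)
    counts = {
        "shared": len(a & b),   # called on both platforms
        "only_a": len(a - b),   # specific to platform A
        "only_b": len(b - a),   # specific to platform B
    }
    total = len(a | b)
    # Percentages are relative to the union of both call sets.
    percents = {k: (100.0 * v / total if total else 0.0) for k, v in counts.items()}
    return counts, percents


# Toy data, invented for illustration only.
platform_a = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "T")}
platform_b = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr3", 10, "T", "C")}

counts, percents = compare_call_sets(platform_a, platform_b)
print(counts)
print(percents)
```

Real comparisons also have to normalize variant representation (especially for indels) before keys can be matched; that step is omitted here.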
Genome Phasing: Assign Variants to Parental Chromosomes
Moleculo Technology: ~6-10 kb sequences, 6X coverage
Local phasing + population data = highly phased blocks
Percent SNPs phased: 98.2%
Switch accuracy: 99.9%+
Moleculo: Volodymyr Kuleshov, Michael Kertesz
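The slide does not define switch accuracy. A minimal sketch, assuming the common definition (the fraction of adjacent heterozygous site pairs whose relative phase matches a reference haplotype), might look like this. The function name and the 0/1 string encoding are assumptions for illustration.

```python
def switch_accuracy(truth, predicted):
    """Fraction of adjacent site pairs phased consistently with the truth.

    `truth` and `predicted` give the allele (0 or 1) carried by one haplotype
    at each heterozygous site, in genomic order, within one phased block.
    """
    if len(truth) != len(predicted):
        raise ValueError("haplotypes must cover the same sites")
    # True where the predicted haplotype agrees with the truth at a site,
    # False where it carries the other parental allele.
    agree = [t == p for t, p in zip(truth, predicted)]
    pairs = len(agree) - 1
    if pairs < 1:
        return 1.0
    # A switch error is a change of agreement state between neighbouring sites.
    switches = sum(1 for i in range(pairs) if agree[i] != agree[i + 1])
    return 1.0 - switches / pairs


# Toy block: the prediction flips to the other parent from the fifth site on.
truth = "0101100110"
predicted = "0101011001"
print(switch_accuracy(truth, predicted))
```

A prediction that is the exact complement of the truth scores 1.0, because only the relative phase between neighbouring sites is evaluated.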
I. Highly Penetrant Variants: Mendelian Disease Risk Pipeline
Rick Dewey & Euan Ashley
[Figure: variant-filtering pipeline]
All variants: ~3.5M
Rare/novel variants (<5%)
Coding: nonsynonymous (1320); synonymous (mRNA stability, tRNA rate)
Non-coding: miRNA, splice, UTR; miRNA targets, seed sequence
SIFT, PP2: damaging (234)
OMIM/curated Mendelian disease: (51)
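A filtering funnel of this kind can be sketched as a chain of list filters. This is a simplified illustration, not the pipeline from the talk: it assumes the filters run strictly in sequence, and the `Variant` fields, the function name, and the toy data are all hypothetical. The boolean flags stand in for real SIFT/PolyPhen-2 predictions and OMIM lookups.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    allele_frequency: float        # population frequency; 0.0 for novel variants
    effect: str                    # e.g. "nonsynonymous", "synonymous", "noncoding"
    predicted_damaging: bool       # stand-in for a SIFT / PolyPhen-2 call
    in_curated_disease_gene: bool  # stand-in for an OMIM / curated lookup


def mendelian_risk_funnel(variants, max_frequency=0.05):
    """Apply the filters in order and report how many variants survive each."""
    rare = [v for v in variants if v.allele_frequency < max_frequency]
    nonsynonymous = [v for v in rare if v.effect == "nonsynonymous"]
    damaging = [v for v in nonsynonymous if v.predicted_damaging]
    curated = [v for v in damaging if v.in_curated_disease_gene]
    stage_counts = {
        "all": len(variants),
        "rare_or_novel": len(rare),
        "nonsynonymous": len(nonsynonymous),
        "damaging": len(damaging),
        "curated_disease": len(curated),
    }
    return stage_counts, curated


# Toy variants with placeholder gene names, invented for illustration.
toy_variants = [
    Variant("GENE_A", 0.00, "nonsynonymous", True, True),
    Variant("GENE_B", 0.01, "nonsynonymous", True, False),
    Variant("GENE_C", 0.02, "nonsynonymous", False, True),
    Variant("GENE_D", 0.03, "synonymous", False, False),
    Variant("GENE_E", 0.30, "nonsynonymous", True, True),
]

stage_counts, candidates = mendelian_risk_funnel(toy_variants)
print(stage_counts)
print([v.gene for v in candidates])
```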
Rare Variants in Disease Genes (51 Total)
Missense
• ALAD, ABCC2, ACADVL, ADAMTS13, AGRN, BAAT, CDS1, CHD7, COL4A3, CTSD, DGCR2, DLD, DYSF, EPCAM, FGFR1OP, FKRP, GAA, GNAI2, HSPB1, IGKC, ITPR1, MED12, MKS1, NTRK1, PCM1, PKD1, PLEKHG5, PMS2, PRSS1, PTCH2, SERPINA1, SETX, SYNE1, TERT, TTN, VWF, ZFPM2, PNPLA2
Nonsense
• PRAMEF2, PLCXD2, NUP54, RP1L1, PIK3C2G, NDE1, GGN, CYP2A7, IGKC
Not Rare But Important
• KCNJ11, KLF4, GCKR …
Callouts: High Cholesterol; Aplastic Anemia
Integrate Over Many Markers: Complex Disease
Predict Type 2 Diabetes
[Risk scale: 0% to 100%]
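One common way to integrate many markers into a single risk estimate is to start from a pre-test probability and multiply the odds by a likelihood ratio for each risk allele carried. The sketch below assumes the markers act independently and uses invented likelihood ratios; it is an illustration of the idea, not necessarily the method behind this slide.

```python
import math


def post_test_probability(pre_test_probability, likelihood_ratios):
    """Update a baseline disease probability with per-marker likelihood ratios.

    Assumes the markers are independent, so their likelihood ratios multiply.
    """
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre-test probability must be strictly between 0 and 1")
    if any(lr <= 0 for lr in likelihood_ratios):
        raise ValueError("likelihood ratios must be positive")
    # Work in log-odds so that many small effects can be added stably.
    log_odds = math.log(pre_test_probability / (1.0 - pre_test_probability))
    log_odds += sum(math.log(lr) for lr in likelihood_ratios)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical numbers: 20% baseline risk and four markers.
baseline = 0.20
marker_lrs = [1.5, 1.2, 0.9, 1.3]
risk = post_test_probability(baseline, marker_lrs)
print(f"{risk:.1%}")
```

Ratios above 1 raise the estimate and ratios below 1 lower it, which is how a single number on a 0% to 100% scale can summarize many small effects.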
GLUCOSE LEVELS
HRV infection (day 0-21)
RSV infection (day 289-311)
Lifestyle change (day 380-current)
Day number   329   369   476   532   546   602
HbA1c (%)    6.4   6.7   4.9   5.4   5.3   4.7
Dynamical Outcomes for Integrated Analysis of Proteome, Transcriptome, Metabolome
George Mias; RSV, 18 days
Platelet Plug Formation
Glucose Regulation of Insulin Secretion
The Future?
Genomic Sequencing
1. Predict risk
2. Early Diagnose
3. Monitor
4. Treat
Omes and Other Information: Home Sensors
http://www.baby-connect.com/
Study of 10 Healthy People: 5 Asian, 5 European
Dewey, Grove, Pan, Ashley, Quertermous et al.
- Median 5 reportable disease risk associations (ACMG) per individual (range 2-6)
- 3 follow-up diagnostic tests (range 0-10)
- Cost $362-$1427 per individual
- 54 minutes per variant
Many Unaddressed Challenges
1) Accuracy and coverage
2) Interpretation
3) Interpreting non-protein coding regions
4) DNA Methylation
5) Sample size
6) Exposome
1) Accurate Genome Sequences and Coverage
Single Nucleotide Variants Getting Better.
Indels and Structural Variants Need Work!
SNV Comparison
• Complete Genomics: 35 b paired ends (150X)
• Illumina: 100 b paired ends (120X)
[Venn diagram, Complete Genomics vs. Illumina: 3.30M (89%); 100K (2%); 345K (9%)]
Ti/Tv = 1.68; 17/18 Sanger
Ti/Tv = 2.14; 20/20 Sanger
Ti/Tv = 1.40; 2/15 Sanger
31 disease-associated SNPs
3 disease-associated SNPs
Hugo Lam, Michael Clark, Rui Chen
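The Ti/Tv values quoted on this slide are transition/transversion ratios. As a reminder of how that statistic is computed, here is a minimal sketch; the function names and toy SNV list are illustrative assumptions.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """A transition swaps purine for purine (A<->G) or pyrimidine for pyrimidine (C<->T)."""
    pair = {ref, alt}
    return ref != alt and (pair <= PURINES or pair <= PYRIMIDINES)


def ti_tv_ratio(snvs):
    """Compute the transition/transversion ratio for (ref, alt) base pairs."""
    transitions = transversions = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref not in "ACGT" or alt not in "ACGT" or len(ref) != 1 or len(alt) != 1:
            raise ValueError(f"not a single-base substitution: {ref}>{alt}")
        if ref == alt:
            continue  # not a variant; ignore
        if is_transition(ref, alt):
            transitions += 1
        else:
            transversions += 1
    return transitions / transversions if transversions else float("inf")


# Toy call set: three transitions and two transversions.
toy_snvs = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "G")]
print(ti_tv_ratio(toy_snvs))
```

Because there are twice as many possible transversions as transitions, purely random errors push the ratio toward 0.5, so a call set whose ratio is well below the genome-wide norm is often taken as a sign of false positives.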
Sequencing Accuracy: Sequencing the Same Genome Twice
Personalis
146,100 SNPs (3.7%)
Exome-seq and WGS-specific detection: 45X WGS vs 80X Exome
Clark et al. 2011 Nature Biotech
Overall Statistics for Finishing Medically Interesting Genes: ACE
ACE v1 = thick lines; TruSeq Exome (10G) = thin lines
Normal Exome ~2,000
Custom Exome (ACE) ~2,000
Exons Covered by ACE, Missed by Standard Exome
2. Genome Interpretation
1) Search for disease-causing mutations (highly penetrant)
GCKR (high lipids); TERT (aplastic anemia)
Missense variants: ALAD, ABCC2, ACADVL, ADAMTS13, AGRN, BAAT, CDS1, CHD7, COL4A3, CTSD, DGCR2, DLD, DYSF, EPCAM, FGFR1OP, FKRP, GAA, GNAI2, HSPB1, IGKC, ITPR1, MED12, MKS1, NTRK1, PCM1, PKD1, PLEKHG5, PMS2, PRSS1, PTCH2, SERPINA1, SETX, SYNE1, TERT, TTN, VWF, ZFPM2, PNPLA2
2) Sum over multiple common risk alleles to predict risk
[Risk scale: 0% to 100%] Ashley, Butte et al.
Missing Regulatory Variation
88% of Disease Variants Lie Outside of Genes!
Two approaches:
1) Mapping transcription factor binding in different people.
2) RegulomeDB: Assembling regulatory information from the ENCODE Project and other sources.
Damaging Variation in an Individual
[Figure: a gene and its regulatory region; protein-coding and non-coding sequence]
Coding variants and regulatory variants
CAPN1: Protective against Alzheimer’s
3. Incorporate Methylation Data
Possible Phenotypic Consequences of Differentially Methylated Regions?
4. Sample Size—Need to Reduce
5. Other Data Types: Sensors
AliveCor measures ECG
Moves app
Conclusions
1) Personal genome sequencing is here. The medical interpretation is difficult.
2) Genome sequencing can predict disease risk that can be monitored with other omics information.
3) Integrated analysis can provide a detailed physiological perspective for what is occurring.
4) Every person’s complex disease profile is different and following many components longitudinally may provide valuable information.
5) You are responsible for your own health.
Data at: snyderome.stanford.edu
Acknowledgements
The Personal Omics Profiling Project: Rui Chen, George Mias, Hugo Lam, Jennifer Li-Pook-Than, Lihua Jiang, Konrad Karczewski, Michael Clark, Maeve O’Huallachain, Manoj Hariharan, Yong Cheng, Suganthi Bali, Sara Hillenmeyer, Rajini Haraksingh, Elana Miriami, Lukas Habegger, Rong Chen, Joel Dudley, Frederick Dewey, Shin Lin, Teri Klein, Russ Altman, Atul Butte, Euan Ashley, Tom Quertermous, Mark Gerstein, Kari Nadeau, Hua Tang, Phyllis Snyder
Human Regulatory Variation: Maya Kasowski, Fabian Grubert, Alex Urban, Alexej A, Chris Heffelfinger, Manoj Hariharan, Akwasi Asabere, Lukas Habegger, Joel Rozowsky, Mark Gerstein, Sebastian Waszak, Jan Korbel (EMBL, Heidelberg)
RegulomeDB: Alan Boyle, Manoj Hariharan, Yong Cheng, Eurie Hong, Mike Cherry
Methylome: Dan Xie, Volodymyr Kuleshov, Rui Chen, Dmitry Pushkarev, Konrad Karczewski, Alan Boyle, Tim Blauwkamp, Michael Kertesz
6. Big Data Handling and Storage
Personal Omics Profile:
Genome (1 TB)
Transcriptome (0.7 TB) (mRNA, miRNA, isoforms, edits)
Proteome (0.02 TB)
Metabolome (0.02 TB)
Autoantibody-ome
Microbiome (3 TB)
Cytokines
Epigenome (2 TB)
Total = 5.74 TB/sample + 1 TB genome
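The storage total on this slide can be checked with a few lines of arithmetic. The sketch below reads the slide as saying the genome is stored once (1 TB) while the other listed data types are stored per sample; that reading, along with the function and variable names, is an assumption.

```python
# Per-sample sizes in TB, taken from the slide (data types without a listed
# size are left out).
PER_SAMPLE_TB = {
    "transcriptome": 0.7,
    "proteome": 0.02,
    "metabolome": 0.02,
    "microbiome": 3.0,
    "epigenome": 2.0,
}
GENOME_TB = 1.0  # assumed to be stored once per person


def total_storage_tb(n_samples):
    """Estimated storage for one person profiled at `n_samples` time points."""
    return GENOME_TB + n_samples * sum(PER_SAMPLE_TB.values())


per_sample_tb = round(sum(PER_SAMPLE_TB.values()), 2)
print(per_sample_tb)                      # per-sample total, matches the slide
print(round(total_storage_tb(1), 2))      # one sample plus the genome
```

Under this reading, storage grows linearly with the number of time points, so the per-sample term dominates quickly in a longitudinal study.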