Translational Bioinformatics 2012: The Year in Review

Russ B. Altman, MD, PhD

Stanford University

Goals

• Provide an overview of the scientific trends and publications in translational bioinformatics

• Create a “snapshot” of what seems to be important in March 2012 for the amusement of future generations.

• Marvel at the progress made and the opportunities ahead.

Process

1. Follow literature through the year

2. Solicit nominations from colleagues

3. Search key journals

4. Stress out a bit.

5. Select papers to highlight in ~2-3 slides

Caveats

• Translational bioinformatics = informatics methods that link biological entities (genes, proteins, small molecules) to clinical entities (diseases, symptoms, drugs), or vice versa.

• Considered the last ~14 months (to Jan 2012)

• Focused on human biology and clinical implications: molecules, clinical data, informatics.

• NOTE: Amazing biological papers with straightforward informatics generally not included (genome sequencing for rare diseases, disease analyses with high-throughput data).

Final list

• ~100 finalist papers (will make list available)

• 25 presented here (briefly!), plus 14 “shout outs”

• Apologies to the many I missed. Mistakes are mine.

• This talk and semi-finalist bibliography will be made available on the conference website and my blog on rbaltman.wordpress.com

• TOPICS: systems medicine, finding & defining phenotypes, biomarkers, genomic infrastructure, drug adverse events & interactions, drug repurposing

Thanks!

• Bruce Aronow

• Atul Butte

• Phil Bourne

• Andrea Califano

• Lisa Cannon-Albright

• Josh Denny

• Joel Dudley

• Larry Fagan

• Guy Fernald

• Carol Friedman

• Yael Garten

• Mark Gerstein

• Maureen Hillenmeyer

• George Hripcsak

• Larry Hunter

• Peter Kang

• Rachel Karchin

• Konrad Karczewski

• Hiroaki Kitano

• Ron Kostoff

• Alain Laederach

• Jennifer Lahti

• Tianyun Liu

• Yves Lussier

• Dan Masys

• Alex Morgan

• Stephen Montgomery

• Peter O’Donnell

• Lucila Ohno-Machado

• Raul Rabadan

• Predrag Radcovic

• Soumya Raychaudhuri

• Neil Sarkar

• Nigam Shah

• Ted Shortliffe

• Mike Snyder

• Nick Tatonetti

• Peter Tarczy-Hornoch

• Olga Troyanskaya

• Alfonso Valencia

• Liping Wei

• Jeff Williamson

• Jonathan Wren

• Hong Yu

• Qunying Xie

“ISCB public policy statement on open access to scientific and technical research literature” (Lathrop et al, PLoS Comp Bio)

• Goal: Influence policy by supporting open access to scientific literature (and block attempts by for-profit publishers to roll back open access rules)

• Conclusion: (1) essential to have access for mining, (2) existing models show success, (3) will enable novel tools, (4) supplementary data should be freely available, (5) cost recovery is necessary, (6) details will matter but should not distract, (7) neutral on funding policy, (8) cost is small compared to alternative.

Systems medicine

“Three-dimensional reconstruction of protein networks provides insight into human genetic disease” (Wang et al, Nat. Biotech.)

• Goal: Understand molecular mechanisms underlying human disease.

• Method: Create interactome of genes, mutations and associated disorders, in context of protein-protein interactions.

• Result: In-frame mutations occur at protein interfaces & disease specificity depends on location within an interface.

• Conclusion: Predict 292 genes for 694 diseases.

Figure legend: nodes = proteins; edges = interactions; colored nodes = disease-associated proteins.

“Protein networks as logic functions in development and cancer” (Dutkowski et al, PLoS Comp Bio)

• Goal: Understand how protein modules combine protein functions to create output signals.

• Method: Network-Guided Forests to identify predictive modules and logic functions that connect module to component genes.

• Result: Modules implement complex logic, not simple linear models.

• Conclusion: Genetic effects of cancer genes are not additive, but engage in nontrivial combinatorial logic.

“Reverse engineering of TLX oncogenic transcriptional networks identifies RUNX1 as a tumor suppressor in T-ALL” (Gatta et al, Nat. Medicine)

• Goal: Use transcriptional data to study the pathogenesis of T-cell ALL (T-ALL) and understand its regulation.

• Method: Network structure analysis of relationships gathered from expression analysis.

• Result: TLX1 and TLX3 are key regulators. RUNX1 is a tumor suppressor and shows high rates of loss-of-function mutations in T-ALL subjects.

• Conclusion: Network analyses can identify key cancer players.

“Computational modeling of pancreatic cancer reveals kinetics of metastasis suggesting optimum treatment strategies” (Haeno et al, Cell)

• Goal: Math model of pancreatic cancer progression and impact of different drug dosing regimens.

• Method: Differential equation modeling of cell growth, death and effects of drugs.

• Result: Therapies that reduce growth rate of cells early look superior to upfront resection strategies.

• Conclusion: Math modeling of cancer progression can yield insight into detailed risks/benefits of different treatment strategies.
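To make the modeling idea concrete, here is a toy two-compartment growth model integrated with forward Euler. It is a sketch under stated assumptions, not the authors' model: primary cells P grow at a net rate (division minus death), a fraction `seed_rate` of divisions seeds metastatic cells M, and a drug is assumed to act only by lowering the division rate. The function name, parameters and values are all hypothetical.

```python
def simulate_tumor(p0, division, death, seed_rate, drug_effect=0.0,
                   t_end=10.0, dt=0.001):
    """Toy model (hypothetical): returns (primary, metastatic) cell counts at t_end."""
    # Assumption: the drug lowers the per-cell division rate by drug_effect.
    effective_division = max(division - drug_effect, 0.0)
    net = effective_division - death
    p, m = float(p0), 0.0
    for _ in range(int(round(t_end / dt))):
        dp = net * p                                        # primary tumor growth
        dm = seed_rate * effective_division * p + net * m   # seeding plus metastatic growth
        p += dp * dt
        m += dm * dt
    return p, m

untreated = simulate_tumor(1e6, division=0.5, death=0.2, seed_rate=1e-4)
treated = simulate_tumor(1e6, division=0.5, death=0.2, seed_rate=1e-4, drug_effect=0.2)
print(untreated, treated)
```

Comparing the two runs shows the qualitative point on the slide: slowing growth early shrinks both the primary tumor and the metastatic burden that accumulates before any intervention such as resection.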

“Extracting a cellular hierarchy from high-dimensional cytometry data with SPADE” (Qiu et al, Nature Biotech)

• Goal: Understand cellular heterogeneity in single cell measurements of hematopoietic cells.

• Method: Spanning-tree progression analysis of density-normalized events (SPADE) to subcluster and connect cell groupings.

• Result: Found hierarchies of related phenotypes recapitulating hematopoiesis.

• Conclusion: Cytometry data allows separation, characterization and definition of relationships between single cells.
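SPADE's final step links clusters of cells into a tree. A minimal sketch of that one step, assuming cells have already been clustered and each cluster is summarized by a centroid in marker space, is Prim's minimum spanning tree over Euclidean distances between centroids. Density-dependent downsampling and upsampling are omitted, and the centroid values are invented.

```python
import math

def minimum_spanning_tree(centroids):
    """Prim's algorithm on the complete Euclidean graph of cluster centroids.

    Returns (parent, child) index pairs; n clusters give n - 1 edges.
    """
    n = len(centroids)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge weight linking each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the closest node not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u))
        # Relax distances from the newly added node.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(centroids[u], centroids[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

# Hypothetical 2-marker centroids for four clusters lying on a line.
print(minimum_spanning_tree([(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (6.0, 0.0)]))
# [(0, 1), (1, 2), (2, 3)]
```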

“Computational design of proteins targeting the conserved stem region of influenza hemagglutinin” (Fleishman et al, Science)

• Goal: Design proteins for diagnostic and therapeutic purposes.

• Method: Use protein structure prediction algorithms (Rosetta-derived) to design amino acid surfaces that bind target proteins.

• Result: Developed two proteins that bind a conserved patch on the hemagglutinin stem of the 1918 H1N1 pandemic virus.

• Conclusion: Designed proteins using knowledge-based methods can achieve high binding specificity.

Finding & Defining Phenotypes

“Using electronic patient records to discover disease correlations and stratify patient cohorts” (Roque et al, PLoS Comp Bio)

• Goal: Mine phenotype descriptions from EMR and connect to genetic networks.

• Method: Extract information from free text to classify patients and find disease co-occurrence. Use OMIM to map diseases to genes.

• Result: Large set of disease correlations, associated with genes.

• Conclusion: EMR can identify new phenotype “syndromes” with genetic hooks.
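One simple way to quantify disease co-occurrence across a patient cohort is the relative risk of two codes appearing together versus what independence would predict, RR = C_ab · N / (C_a · C_b). The slide does not say which statistic Roque et al used, so treat this as an illustrative assumption; the function name and patient data are invented.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_relative_risk(patients):
    """patients: iterable of sets of disease codes (one set per patient).

    Returns {(a, b): RR} for every pair of codes seen together at least once.
    """
    patients = list(patients)
    n = len(patients)
    single = Counter()
    pair = Counter()
    for codes in patients:
        single.update(codes)
        pair.update(combinations(sorted(codes), 2))
    # RR > 1 means the pair co-occurs more often than independence predicts.
    return {(a, b): c * n / (single[a] * single[b]) for (a, b), c in pair.items()}

cohort = [{"T2D", "obesity"}, {"T2D", "obesity"}, {"T2D"}, {"asthma"}]
print(cooccurrence_relative_risk(cohort))
```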

“Enabling enrichment analysis with the human disease ontology” (LePendu et al, J Biomed Inf)

• Goal: Enable enrichment analyses with ontologies that do not have manually curated annotation sets.

• Method: Use GO annotations as a filter to associate diseases with genes in 44,000 PubMed abstracts.

• Result: Can associate disease terms with 30% of the genome, and reproduce known associations of aging genes.

• Conclusion: Extension of enrichment analysis to other ontologies can use GO curation as a “seed.”
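Enrichment analysis of this kind typically asks whether a gene list contains more genes annotated to a term than chance would predict. A common choice is a one-sided hypergeometric test; the sketch below assumes that test (the slide does not say which statistic the authors used), and the function name and numbers are hypothetical.

```python
from math import comb

def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric test: P(X >= k).

    N: genes in the background, K: background genes annotated to the term,
    n: genes in the query list, k: query genes annotated to the term.
    """
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)) / total

# Hypothetical: 8 of 40 query genes carry a disease term that annotates
# 200 of 20,000 background genes.
print(enrichment_p_value(k=8, n=40, K=200, N=20000))
```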

“Detecting novel associations in large data sets” (Reshef et al, Science)

• Goal: Find interesting (nonlinear) relationships between pairs of variables in large data sets.

• Method: The maximal information coefficient (MIC) captures a wide range of association types.

• Result: Applied with good results to global health, gene expression, baseball and gut microbiota data.

• Conclusion: Useful tool for finding associations in large data sets, e.g.
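Roughly, MIC is the highest normalized mutual information I(X;Y) / log min(n_x, n_y) attainable over grids with n_x · n_y ≤ B(n) cells, where Reshef et al use B(n) = n^0.6 by default. The published algorithm optimizes bin boundaries with dynamic programming; the sketch below only tries equal-frequency bins on both axes, so it searches a subset of grids and can only underestimate the maximum MIC targets. All function names are hypothetical; this is not a substitute for the authors' implementation.

```python
import math
import random
from collections import Counter

def _equal_frequency_bins(values, k):
    """Bin index (0..k-1) per value, splitting sorted ranks into k near-equal groups."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins

def _mutual_information(a, b):
    """Plug-in mutual information (nats) between two discrete label lists."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (count_a[i] * count_b[j]))
               for (i, j), c in count_ab.items())

def approx_mic(x, y, alpha=0.6):
    """MIC-like score: max normalized MI over equal-frequency grids with nx * ny <= n**alpha."""
    budget = len(x) ** alpha
    best = 0.0
    for nx in range(2, int(budget // 2) + 1):
        for ny in range(2, int(budget // nx) + 1):
            mi = _mutual_information(_equal_frequency_bins(x, nx),
                                     _equal_frequency_bins(y, ny))
            best = max(best, mi / math.log(min(nx, ny)))
    return best

xs = [i / 10 for i in range(200)]
print(approx_mic(xs, [math.sin(v) for v in xs]))    # nonlinear, noise-free
rng = random.Random(0)
print(approx_mic(xs, [rng.random() for _ in xs]))   # independent noise
```

A noise-free monotone relationship scores 1.0 under this approximation, while independent noise scores near 0.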

“Toward Precision Medicine: Building a Knowledge Network for Biomedical Research and a New Taxonomy of Disease” (National Academies Report)

• Goal: Explore the feasibility and need for a “new taxonomy” (NT) of human health based on molecular biology.

• Conclusion: (1) a NT will lead to better health care, (2) the time is right, (3) NT should be developed, (4) a knowledge network of disease would enable NT, (5) new models for population-based research will enable NT, (6) redirection of resources could facilitate development.

Biomarkers

“Efficient replication of over 180 genetic associations with self-reported medical data” (Tung et al, PLoS ONE)

• Goal: Assess whether self-reported phenotypes are adequate for discovery.

• Method: Attempt to replicate known genetic associations using self-reported data from 23andMe customers.

• Result: 180 (70%) of associations replicated.

• Conclusion: Self-reported phenotypes have lower precision but still allow discovery.

“Rare de novo variants associated with autism implicate a large functional network of genes involved in formation and function of synapses” (Gilman et al, Cell)

• Goal: Identify complex networks underlying common human phenotypes.

• Method: Network-based analysis of genetic associations (NETBAG) to identify genes affected by rare de novo CNVs in autism.

• Result: Perturbed synaptogenesis is associated with autism phenotype.

• Conclusion: Networks help with analysis of rare variation.

Figure legend: orange = known brain function.

Genomic infrastructure

“The mystery of missing heritability: genetic interactions create phantom heritability” (Zuk et al, PNAS)

• Goal: Understand why so much heritability remains unexplained by GWAS.

• Method: Explore the role of genetic interactions with quantitative modeling.

• Result: Assuming traits are additive leads to overestimates of heritability, and epistasis is common.

• Conclusion: Less heritability is missing than is commonly assumed.

Figure legend: phantom heritability (π) vs. apparent heritability (h²).
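For reference, the quantities in the figure can be written as below, following our reading of Zuk et al. (notation assumed): h²_pop is the apparent heritability estimated from population data under an additive model, h²_all is the heritability explained by all causal variants, and h²_known is the portion explained by variants found so far.

```latex
\pi_{\text{missing}} = 1 - \frac{h^2_{\text{known}}}{h^2_{\text{pop}}},
\qquad
\pi_{\text{phantom}} = 1 - \frac{h^2_{\text{all}}}{h^2_{\text{pop}}}
```

For example, with h²_pop = 0.8 and h²_all = 0.4, π_phantom = 1 - 0.4/0.8 = 0.5: even if every causal variant were found (h²_known = h²_all), half of the apparent heritability would still look "missing", because it was never there under the additive assumption.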

“Performance of mutation pathogenicity prediction methods on missense variants” (Thusberg et al, Human Mutation)

• Goal: Compare methods for predicting deleterious variants in protein sequences.

• Method: 40,000 pathogenic and neutral variants tested vs. 9 methods.

• Result: Matthews correlation coefficients ranged from 0.19 to 0.65; SNPs&GO and MutPred performed best.

• Conclusion: General-purpose predictors still have limited capabilities.
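The Result line reports performance as the Matthews correlation coefficient, MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), which ranges from -1 to 1 and stays informative when pathogenic and neutral classes are unbalanced. A minimal sketch with made-up counts:

```python
import math

def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts (range -1..1)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: define MCC as 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Hypothetical counts for a predictor scored on 200 variants.
print(matthews_cc(tp=70, tn=80, fp=20, fn=30))
```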

“A probabilistic disease-gene finder for personal genomes” (Yandell et al, Genome Res)

• Goal: Find disease-causing variants in whole genome sequences.

• Method: Bayesian prioritization of coding and non-coding variants that combines several sequence features: the Variant Annotation, Analysis & Search Tool (VAAST).

• Result: Demonstrated the ability to detect key genes in small cohorts, as well as in common multigenic diseases.

• Conclusion: Information integration for finding rare variants can be successful.

“Technical desiderata for the integration of genomic data into electronic health records” (Masys et al, J. Biomed Inf.)

• Goal: Understand how genomic data differs from other health data in the medical record.

• Conclusion: (1) Maintain separation of primary data and observations, (2) Support lossless compression, (3) Link observations to lab methods, (4) Compactly represent clinical actionability, (5) Support human and machine-readable formats, (6) Anticipate changes in our understanding of variation, (7) Support both clinical care and discovery science.

“Genomics and privacy: implications of the new reality of closed data for the field” (Greenbaum et al, PLoS Comp Bio)

• Goal: Examine state of genomic privacy in context of emerging privacy concerns.

• Conclusions: (1) Changing ability to interpret genomes makes it a moving target, (2) Methods needed to divide genome into segments for analysis, (3) Modification of informed consent required, (4) Cloud computing may help control access, (5) Education challenges in analyzing personal genomes.

Drug Adverse Events & Interactions

“Structure-based discovery of prescription drugs that interact with the norepinephrine transporter, NET” (Schlessinger et al, PNAS)

• Goal: Find prescription drugs that interact with NET.

• Method: Model the 3D structure of NET and virtually screen 6,536 small molecules.

• Result: 10 of 18 high-scoring molecules tested inhibited NET.

• Conclusion: Virtual screening against a modeled structure can provide valuable pharmacological info.

“Predicting adverse drug reactions using publicly available PubChem BioAssay Data” (Pouliot et al, Clin Pharm & Ther.)

• Goal: Develop method to predict adverse reactions to drugs based on bioassay data.

• Method: Build regression models that relate performance in bioassays to adverse events.

• Result: Across 19 organ classes, predictors for 9 performed well in cross-validation.

• Conclusion: Bioassay data can be used to predict ADRs and may shed light on mechanism.

“Predicting adverse drug events using pharmacological network models” (Cami et al, Science Trans Med)

• Goal: Create predictor for drug AEs based on training data from known drug-AE relations.

• Method: Build individual regressions based on network connectivity features, ATC codes, AE codes, drug properties.

• Result: AUROC of 0.87 (42% sensitivity at 95% specificity).

• Conclusion: These network models can be used to predict AEs before drugs are released.
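Both numbers in the Result line come from the same ranked predictions. AUROC is the probability that a randomly chosen true drug-AE pair scores above a randomly chosen negative pair (ties count half), and "42% sensitivity at 95% specificity" is the recall at the threshold that keeps the false-positive rate at or below 5%. A small pure-Python sketch with invented scores; the O(n·m) pairwise loop favors clarity over speed, and the function names are hypothetical.

```python
def auroc(pos_scores, neg_scores):
    """Probability a positive outscores a negative (Mann-Whitney form; ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def sensitivity_at_specificity(pos_scores, neg_scores, min_specificity=0.95):
    """Best sensitivity over thresholds t (predict positive if score >= t) meeting the specificity floor."""
    best = 0.0
    for t in sorted(set(pos_scores) | set(neg_scores)):
        specificity = sum(q < t for q in neg_scores) / len(neg_scores)
        if specificity >= min_specificity:
            sensitivity = sum(p >= t for p in pos_scores) / len(pos_scores)
            best = max(best, sensitivity)
    return best

pos = [0.4, 0.9]
neg = [0.1, 0.5]
print(auroc(pos, neg), sensitivity_at_specificity(pos, neg))  # 0.75 0.5
```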

“Detecting drug interactions from adverse-event reports: interaction between paroxetine and pravastatin increases blood glucose levels” (Tatonetti et al, Clin Pharm & Ther.)

• Goal: Develop a method for detecting latent signals of drug-drug interactions.

• Method: Learn pattern for hyperglycemia on single drugs, apply to pairs of drugs.

• Result: Paroxetine & Pravastatin with strong signal, seen in 3 EMRs, validated in mouse model.

• Conclusion: Despite 0 reports in FDA-AERS, there is a strong latent signal for hyperglycemia with the paroxetine-pravastatin combination.

Drug Repurposing

“Prediction of drug combinations by integrating molecular and pharmacological data” (Zhao et al, PLoS Comp Bio)

• Goal: Predict effective drug combinations with molecular & pharmacological data.

• Method: Look at approved drug combinations for specific patterns of features (targets, indications), and use these to predict new combinations.

• Result: 69% of predictions have literature support.

• Conclusion: This approach can help look for drug combinations that are likely to be effective.

“PREDICT: a method for inferring novel drug indications with application to personalized medicine” (Gottlieb et al, Mol Sys Biol)

• Goal: Find novel uses for existing drugs.

• Method: Use drug-drug and disease-disease similarities to predict new drug-disease pairs.

• Result: Validated by assessing overlap with drugs currently in clinical trials; cross-validation performance of 0.90.

• Conclusion: Disease-specific signatures can predict new drug indications with high cross-validation performance.
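As we understand the approach, one PREDICT feature scores a candidate (drug, disease) pair by its best match to a known indication: the maximum, over known pairs (d', s'), of the geometric mean sqrt(sim(drug, d') · sim(disease, s')). Several such features, one per similarity measure, then feed a classifier. The sketch computes a single feature; the similarity functions, helper names and data are hypothetical placeholders.

```python
import math

def predict_feature(drug, disease, known_pairs, drug_sim, disease_sim):
    """Max geometric-mean similarity of (drug, disease) to any other known indication."""
    best = 0.0
    for known_drug, known_disease in known_pairs:
        if (known_drug, known_disease) == (drug, disease):
            continue  # a known pair should not vouch for itself
        score = math.sqrt(drug_sim(drug, known_drug) * disease_sim(disease, known_disease))
        best = max(best, score)
    return best

def table_similarity(table):
    """Symmetric similarity from a dict keyed by unordered pairs (self-similarity = 1)."""
    def sim(a, b):
        return 1.0 if a == b else table.get(frozenset((a, b)), 0.0)
    return sim

# Invented similarities and a single known indication.
drug_sim = table_similarity({frozenset(("drugA", "drugB")): 0.8})
disease_sim = table_similarity({frozenset(("diseaseX", "diseaseY")): 0.5})
known = [("drugA", "diseaseX")]
print(predict_feature("drugB", "diseaseY", known, drug_sim, disease_sim))
```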

“Discovery and preclinical validation of drug indications using compendia of public gene expression data” (Sirota et al, Science Trans Med)

• Goal: Predict novel drug uses

• Method: Compare molecular (expression) signatures of drugs and diseases and find complementary pairs, where a drug's signature reverses a disease's signature.

• Result: New indications for 164 drugs, some experimentally validated.

• Conclusion: A computational method for suggesting drug repurposing.

(Dudley et al, in the same issue, used this method to identify the anticonvulsant topiramate as a candidate therapy for inflammatory bowel disease (IBD) and showed experimental validation.)

Shout outs...

“A systematic survey of loss-of-function variants in human protein-coding genes.” MacArthur et al, Science

“PASTE: patient-centered SMS text tagging in a medication management system.” Stenner et al, JAMIA

“Discovering disease associations by integrating electronic clinical data and medical literature.” Holmes et al, PLoS ONE

“Variants near FOXE1 are associated with hypothyroidism and other thyroid conditions: using electronic medical records for genome- and phenome-wide studies.” Denny et al, AJHG

“BioNOT: a searchable database of biomedical negated sentences.” Agarwal et al, BMC Bioinformatics

“The impact of risk information exposure on women’s beliefs about direct-to-consumer genetic testing for BRCA mutations.” Gray et al, Clinical Genetics

“Mapping clinical phenotype data elements to standardized metadata repositories and controlled terminologies: the eMERGE Network experience.” Pathak et al, JAMIA

Shout outs...

“Evidence for hitchhiking of deleterious mutations within the human genome.” Chun & Fay, PLoS Genetics

“The write position.” Wren et al, EMBO Reports

“Phased whole-genome genetic risk in a family quartet using a major allele reference sequence.” Dewey et al, PLoS Genetics

“A quantitative analysis of adverse events and ‘overwarning’ in drug labeling.” Duke et al, Arch. Internal Medicine

“Making a definitive diagnosis: successful clinical application of whole genome sequencing in a child with intractable inflammatory bowel disease.” Worthey et al, Genetics in Medicine

“Global analysis of disease-related DNA sequence variation in 10 healthy individuals: implications for whole genome-based clinical diagnostics.” Moore et al, Genetics in Medicine

“Enterotypes of the human gut microbiome.” Arumugam et al, Nature

2011 Crystal ball...

• Consumer sequencing (vs. genotyping) will emerge

• Cloud computing will contribute to major biomedical discovery

• Informatics applications to stem cell science will increase

• Important discoveries from text mining

• Population-based data mining will yield important biomedical insights

• Systems modeling will suggest useful polypharmacy

• Immune genomics will emerge as powerful data

2012 Crystal ball...

• Cloud computing will contribute to major biomedical discovery

• Informatics applications to stem cell science will increase

• Immune genomics will emerge as powerful data

• Flow cytometry informatics will grow

• Molecular & expression data will combine for drug repurposing

• Exome sequencing will persist longer than expected

• Progress in interpreting non-coding DNA variations