Bio-IT 2010 Genome Commons



Page 1: Bio-IT 2010 Genome Commons

1

Toward Meaningful Whole-Genome Interpretation with Open Access Tools From the Genome Commons

BioIT World Expo
2010-04-22

Reece Hart, Ph.D.
Chief Scientist, Genome Commons
QB3 / Center for Computational Biology
UC Berkeley
[email protected]

2010-04-22 11:43

Page 2: Bio-IT 2010 Genome Commons

2

What did we learn from their genomes?

Not much.

Page 3: Bio-IT 2010 Genome Commons

3

Can we agree to disagree? Probably not.

Heart Attack Risk Prediction
from Experimental Man, DE Duncan

Gene            Marker      Risk Allele  Genotype  Risk  Company
CELSR2/PSRC1    rs599839    G            AG        0.86  23andMe
CDKN2A/CDKN2B?  rs10116277  T            GT        1     deCODEme
CDKN2A/CDKN2B?  rs1333049   C            CC        1.72  deCODEme
MTHFD1L         rs6922269   A            AA        1.53  Navigenics
CDKN2A/CDKN2B?  rs2383207   G            GG        1.22  Navigenics

Page 4: Bio-IT 2010 Genome Commons

4

Trouble for direct-to-consumer testing.

http://blog.navigenics.com/articles/comments/an_open_letter_to_nature/

Page 5: Bio-IT 2010 Genome Commons

5

There's lots of good news, too.

➢ Disease diagnosis & prognosis

➢ Drug dosing and side effects

➢ Disease variant/gene identification

➢ Technological advances

Page 6: Bio-IT 2010 Genome Commons

6

The Genome Commons seeks to build open access, open source tools that maximize the predictive, preventative, and personalized value of genomic data.

● Technical – organize data and streamline tools

● Scientific – improve predictive accuracy

● Clinical – engage clinicians and counselors

● ELSI – address ineluctable ethical, legal, and social dilemmas

Page 7: Bio-IT 2010 Genome Commons

7

Collect data in one place.

Page 8: Bio-IT 2010 Genome Commons

8

Database isolation impedes effective use.

Isolated data sources: OMIM, GeneTests/GeneReviews, NHGRI GWAS, PharmGKB, dbSNP, Literature, LSDBs

LSDBs: 1177 Locus-Specific Databases for 935 genes. Some genes have multiple LSDBs.

Source: http://www.hgvs.org/dblist/glsdb.html on Oct 15.

Data are studied, compiled, and stored gene-wise. That makes sense for collection, but not for genome-wide use.

Page 9: Bio-IT 2010 Genome Commons

9

GCdb will be a repository of variants and traits.

Genome Commons Database: variants, phenotypes

dbSNP

LSDBs

GeneTests

PharmGKB

GO

ICD-10

UMLS

Automated bulkloading of structured data

OMIM from dbSNP

Curated, high-quality, and traceable association data

➢ Genotypes in standard coordinates

➢ Phenotype ontologies

➢ Associations with likelihood, confidence, evidence, and severity

➢ Up-to-date

➢ Quality-controlled

➢ Open access

➢ Based on Unison
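
The association attributes named on this slide (likelihood, confidence, evidence, severity) suggest a record shape along the following lines. This is only an illustrative sketch: the class names, field names, value scales, and the placeholder record are all assumptions, not the actual GCdb schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Variant:
    """A genotype in standard coordinates (field names are hypothetical)."""
    chromosome: str
    position: int   # 1-based position on a reference assembly
    reference: str
    alternate: str


@dataclass
class Association:
    """One variant-phenotype link carrying the attributes listed on the slide."""
    variant: Variant
    phenotype_term: str   # identifier from a phenotype ontology
    likelihood: float     # hypothetical scale: 0.0 to 1.0
    confidence: str       # hypothetical levels: "low", "medium", "high"
    severity: str         # hypothetical free-text or coded severity
    evidence: List[str] = field(default_factory=list)  # sources backing the link

    def is_traceable(self) -> bool:
        """An association is traceable if at least one evidence source is recorded."""
        return len(self.evidence) > 0


# Placeholder record; none of these values are real data.
example = Association(
    variant=Variant("1", 12345, "A", "G"),
    phenotype_term="EXAMPLE:0001",
    likelihood=0.5,
    confidence="low",
    severity="unknown",
    evidence=["placeholder-source"],
)
```

Keeping the evidence list on the record itself is one way to make each association traceable back to its sources.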

Page 10: Bio-IT 2010 Genome Commons

10

Make genomic data usable and useful.

Page 11: Bio-IT 2010 Genome Commons

11

The Navigator will integrate data and tools.

Genome Commons Navigator: Facile user interfaces for basic research, clinical application, drug development, epidemiology, and other uses.

Inputs: Genotypes (e.g., by hybridization); Whole Genome/Exome Sequences

➢ Assembler/Aligner and Variant Caller: Assemble genome sequence and call variants (separately or jointly)

➢ Remapper: Align variants to specified genome

➢ Imputer: Infer variants in LD with typed markers

➢ Variants: Phased, aligned variants, from genotyping, imputation, or sequencing

➢ Annotator: Identify variants with known phenotypic impact

➢ Impact Predictor: Infer effect of unclassified genetic variants

➢ Variant Annotation Integrator: Integrate and reconcile all classified variants into a comprehensive report

Resources: Genome Commons Database; External Data and Tools
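
One way to picture how the components on this slide fit together is as a chain of stages that each take and return a list of variants. The sketch below is hypothetical throughout: the record layout, the `KNOWN_IMPACTS` lookup standing in for the Genome Commons Database, and the placeholder strings are assumptions made for illustration, and only three of the named components are shown.

```python
from typing import Callable, Dict, List, Optional

Variant = Dict[str, Optional[str]]            # hypothetical record: {"id": ..., "impact": ...}
Stage = Callable[[List[Variant]], List[Variant]]

# Hypothetical stand-in for a lookup against the Genome Commons Database.
KNOWN_IMPACTS = {"var-1": "known impact (placeholder)"}


def annotator(variants: List[Variant]) -> List[Variant]:
    """Attach a known phenotypic impact where the lookup has one."""
    return [{**v, "impact": KNOWN_IMPACTS.get(v["id"], v.get("impact"))} for v in variants]


def impact_predictor(variants: List[Variant]) -> List[Variant]:
    """Fill in a placeholder prediction for variants that are still unclassified."""
    return [{**v, "impact": v.get("impact") or "predicted (placeholder)"} for v in variants]


def run_pipeline(variants: List[Variant], stages: List[Stage]) -> List[Variant]:
    """Apply each stage in order, feeding its output to the next stage."""
    for stage in stages:
        variants = stage(variants)
    return variants


def integrate(variants: List[Variant]) -> str:
    """Combine all classified variants into a single plain-text report."""
    ordered = sorted(variants, key=lambda v: v["id"])
    return "\n".join(f'{v["id"]}: {v["impact"]}' for v in ordered)


report = integrate(
    run_pipeline(
        [{"id": "var-2", "impact": None}, {"id": "var-1", "impact": None}],
        [annotator, impact_predictor],
    )
)
```

Because every stage shares the same signature, external tools could in principle be swapped in or reordered without changing `run_pipeline`.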

Page 12: Bio-IT 2010 Genome Commons

12

Improve variant impact predictions.

Page 13: Bio-IT 2010 Genome Commons

13

CAGI – Critical Assessment of Genome Interpretation
A community assessment of the state-of-the-art in phenotype prediction.

➢ Follow the successful CASP framework
● Solicit unpublished data
● Collect blind predictions from participants
● Assess against revealed annotations, mechanisms, and phenotypes

➢ Prediction Domains:
● Molecular phenotype
● Cellular phenotype
● Organismal phenotype

With John Moult & Steven Brenner

Page 14: Bio-IT 2010 Genome Commons

14

MTHFR and Methylation

5,10-Methylene tetrahydrofolate (TH4) is required for the synthesis of nucleic acids, while 5-methyl TH4 is required for the formation of methionine from homocysteine. Methionine, in the form of S-adenosylmethionine, is required for many biological methylation reactions, including DNA methylation. Methylene TH4 reductase is a flavin-dependent enzyme required to catalyze the reduction of 5,10-methylene TH4 to 5-methyl TH4.

Linus Pauling Institute
http://lpi.oregonstate.edu

Diagram labels: met13, fol3, exogenous folate

Page 15: Bio-IT 2010 Genome Commons

15

Sequencing 18 Genes of Folate Pathway
Guthrie-Spot Sequencing Protocol

➢ 250 NTD children and 250 case matched controls

➢ Protocol
● 2mm punch
● Isolate genomic DNA
● Amplification
● Purification
● Sequencing by JGI

➢ Variant calls of 238 exons in 18 genes
● Analysis
● Curate
● QC

Jasper Rine

Page 16: Bio-IT 2010 Genome Commons

16

MTHFR variants exhibit 3 classes of effects.

[Figure: S. cerevisiae growth with MTHFR knock-in mutants. Growth curves plot OD600 (0 to 0.7) against time (0 to 60 hours) at folinic acid concentrations of 50 µg/ml and 25 µg/ml. Panel labels: MTHFR, met13, R134C, M110I, D223N, R519C.]

Severely Impaired, e.g., R134C

Folate Remedial, e.g., M110I, D223N

No Effect, e.g., R519C

Jasper Rine

Page 17: Bio-IT 2010 Genome Commons

17

Step 1: Collect predictions.

mutation  Team 1  Team 2
M110I  No Effect  Remediable
R134C  Impaired  Remediable
D223N  Remediable
R519C  No Effect  No Effect

Page 18: Bio-IT 2010 Genome Commons

18

Step 2: Assess predictions.

mutation  Team 1  Team 2  Experiment
M110I  No Effect  Remediable  Remediable
R134C  Impaired  Remediable  Impaired
D223N  Remediable  Remediable
R519C  No Effect  No Effect  No Effect
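
The assessment step amounts to comparing each team's blind predictions with the experimental class. A minimal sketch follows, assuming a simple fraction-correct score, which is an illustrative choice and not necessarily the metric CAGI would use. D223N is left out because the transcript lists only one team prediction for it and does not show which team made it; the function and variable names are hypothetical.

```python
from typing import Dict

# Experimental classes for the mutations with complete rows on the slide.
EXPERIMENT: Dict[str, str] = {
    "M110I": "Remediable",
    "R134C": "Impaired",
    "R519C": "No Effect",
}

# Blind predictions from the slide (D223N omitted, see the note above).
PREDICTIONS: Dict[str, Dict[str, str]] = {
    "Team 1": {"M110I": "No Effect", "R134C": "Impaired", "R519C": "No Effect"},
    "Team 2": {"M110I": "Remediable", "R134C": "Remediable", "R519C": "No Effect"},
}


def score(predicted: Dict[str, str], observed: Dict[str, str]) -> float:
    """Fraction of experimentally assessed mutations that were predicted correctly."""
    hits = sum(predicted.get(mutation) == cls for mutation, cls in observed.items())
    return hits / len(observed)


scores = {team: score(preds, EXPERIMENT) for team, preds in PREDICTIONS.items()}

for team, value in scores.items():
    print(f"{team}: {value:.2f}")
```

On these three mutations each team gets two of three right, but they miss different variants: Team 1 misses M110I and Team 2 misses R134C.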

Page 19: Bio-IT 2010 Genome Commons

19

Step 3: Celebrate and learn.
It's not whether you win or lose...

mutation  Team 1  Team 2  Experiment
M110I  No Effect  Remediable  Remediable
R134C  Impaired  Remediable  Impaired
D223N  Remediable  Remediable
R519C  No Effect  No Effect  No Effect

Page 20: Bio-IT 2010 Genome Commons

20

Be clinically relevant.

Page 21: Bio-IT 2010 Genome Commons

21

Sequencing identifies clinically important associations.

Concurrence among cases

Intersection among databases

Page 22: Bio-IT 2010 Genome Commons

22

Do it ethically.

Page 23: Bio-IT 2010 Genome Commons

23

A few ineluctable ethical issues.

➢ How to fairly acknowledge aggregated data?

➢ Should scientifically suggestive results be used for clinical care?

➢ What is the balance between openness and preventing misinterpretation?

➢ What happens to confidentiality agreements during bankruptcy?

➢ How do we balance personal privacy with opportunities for public health advances?

Bernard Lo

Page 24: Bio-IT 2010 Genome Commons

24

Robert Nussbaum, Jasper Rine, Bernie Lo, Steven Brenner

The Genome Commons

Page 25: Bio-IT 2010 Genome Commons

25

Nature. 2008 Mar 13;452(7184):151.