bio-it 2010 genome commons
TRANSCRIPT
1
Toward Meaningful Whole-Genome Interpretation with Open Access Tools From the Genome Commons
BioIT World Expo
2010-04-22
Reece Hart, Ph.D.
Chief Scientist, Genome Commons
QB3 / Center for Computational Biology
UC [email protected]
2010-04-22 11:43
2
What did we learn from their genomes?
Not much.
3
Can we agree to disagree? Probably not.
Heart Attack Risk Prediction
from Experimental Man, DE Duncan
Gene            Marker      Risk Allele  Genotype  Risk  Company
CELSR2/PSRC1    rs599839    G            AG        0.86  23andme
CDKN2A/CDKN2B?  rs10116277  T            GT        1     deCodeMe
CDKN2A/CDKN2B?  rs1333049   C            CC        1.72  deCodeMe
MTHFD1L         rs6922269   A            AA        1.53  Navigenics
CDKN2A/CDKN2B?  rs2383207   G            GG        1.22  Navigenics
4
Trouble for direct-to-consumer testing.
http://blog.navigenics.com/articles/comments/an_open_letter_to_nature/
5
There's lots of good news, too.
➢ Disease diagnosis & prognosis
➢ Drug dosing and side effects
➢ Disease variant/gene identification
➢ Technological advances
6
The Genome Commons seeks to build
open access, open source tools that
maximize the predictive, preventative,
and personalized value of genomic data.
● Technical – organize data and streamline tools
● Scientific – improve predictive accuracy
● Clinical – engage clinicians and counselors
● ELSI – address ineluctable ethical, legal, and social dilemmas
7
Collect data in one place.
8
Database isolation impedes effective use.
935 genes, 1177 Locus-Specific Databases (LSDBs). Some genes have multiple LSDBs.
Source: http://www.hgvs.org/dblist/glsdb.html on Oct 15.
OMIM, GeneTests/GeneReviews, NHGRI GWAS, PharmGKB, dbSNP, LSDBs, Literature
Data are studied, compiled, and stored gene-wise. That makes sense for collection, but not for genome-wide use.
9
GCdb will be a repository of variants and traits.
variants, phenotypes
Genome Commons Database
dbSNP
LSDBs
GeneTests
PharmGKB
GO
ICD-10
UMLS
⋮
Automated bulkloading of structured data
OMIM from dbSNP
Curated, high-quality, and traceable association data
➢ Genotypes in standard coordinates
➢ Phenotype ontologies
➢ Associations with likelihood, confidence, evidence, and severity
➢ Up-to-date
➢ Quality-controlled
➢ Open access
➢ Based on Unison
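As an illustration of what a GCdb association record might hold, here is a minimal sketch in Python. The slide names only the ingredients (genotypes in standard coordinates, phenotype ontology terms, and associations carrying likelihood, confidence, evidence, and severity); every class name, field name, and value below is a hypothetical placeholder, not the actual GCdb or Unison schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Variant:
    """A genotype expressed in standard coordinates (field names are assumptions)."""
    assembly: str  # reference assembly label
    chrom: str
    pos: int       # 1-based position
    ref: str
    alt: str


@dataclass(frozen=True)
class Phenotype:
    """A trait identified by a term from a phenotype ontology."""
    ontology: str
    term_id: str
    label: str


@dataclass
class Association:
    """Links one variant to one phenotype, with the attributes named on the slide."""
    variant: Variant
    phenotype: Phenotype
    likelihood: float
    confidence: float
    severity: str
    evidence: List[str] = field(default_factory=list)  # e.g. literature references
    source: str = ""                                   # originating database


def is_traceable(assoc: Association) -> bool:
    """Treat an association as traceable if it names a source and cites evidence."""
    return bool(assoc.source) and len(assoc.evidence) > 0


# Placeholder values only; they do not describe a real variant or trait.
example = Association(
    variant=Variant("example-assembly", "1", 12345, "C", "T"),
    phenotype=Phenotype("example-ontology", "EX:0001", "example trait"),
    likelihood=0.7,
    confidence=0.9,
    severity="moderate",
    evidence=["example-reference"],
    source="example-LSDB",
)
```

Keeping source and evidence on each association is one way to support the "curated, high-quality, and traceable" goal: a record with no source or no evidence can be flagged before it reaches a report.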
10
Make genomic data usable and useful.
11
Genome Commons Navigator
Facile user interfaces for basic research, clinical application, drug development, epidemiology, and other uses.
Inputs: Whole Genome/Exome Sequences; Genotypes (e.g., by hybridization)
Assembler/Aligner and Variant Caller: Assemble genome sequence and call variants (separately or jointly)
Remapper: Align variants to specified genome
Imputer: Infer variants in LD with typed markers
Variants: Phased, aligned variants, from genotyping, imputation, or sequencing
Annotator: Identify variants with known phenotypic impact
Impact Predictor: Infer effect of unclassified genetic variants
Variant Annotation Integrator: Integrate and reconcile all classified variants into a comprehensive report
Genome Commons Database
The Navigator will integrate data and tools.
External Data and Tools
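The downstream half of the Navigator (annotate known variants, predict the impact of unclassified ones, integrate everything into one report) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the dictionary standing in for the Genome Commons Database, the variant identifiers, and the placeholder prediction are all hypothetical. The upstream stages (assembly, variant calling, remapping, imputation) are left out.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for the Genome Commons Database: variant id -> known impact.
KNOWN_IMPACTS: Dict[str, str] = {"var_a": "known trait association"}


def annotate(variant_ids: List[str], known: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Annotator: look up variants with known phenotypic impact (None if unclassified)."""
    return {vid: known.get(vid) for vid in variant_ids}


def predict_impact(variant_id: str) -> str:
    """Impact Predictor: placeholder for a real prediction method."""
    return "predicted: uncertain"


def integrate(annotations: Dict[str, Optional[str]]) -> List[Dict[str, str]]:
    """Integrator: reconcile annotated and predicted variants into one report."""
    report = []
    for vid in sorted(annotations):
        impact = annotations[vid]
        if impact is not None:
            report.append({"variant": vid, "impact": impact, "basis": "annotation"})
        else:
            report.append({"variant": vid, "impact": predict_impact(vid), "basis": "prediction"})
    return report


def run_navigator(variant_ids: List[str], known: Dict[str, str] = KNOWN_IMPACTS) -> List[Dict[str, str]]:
    """Run the annotate -> predict -> integrate steps on a list of variant ids."""
    return integrate(annotate(variant_ids, known))


report = run_navigator(["var_b", "var_a"])
```

Recording the basis of each entry (annotation versus prediction) keeps curated knowledge distinguishable from inferred effects in the final report.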
12
Improve variant impact predictions.
13
CAGI – Critical Assessment of Genome Interpretation
A community assessment of the state-of-the-art in phenotype prediction.
With John Moult & Steven Brenner
➢ Follow the successful CASP framework
● Solicit unpublished data
● Collect blind predictions from participants
● Assess against revealed annotations, mechanisms, and phenotypes
➢ Prediction Domains:
● Molecular phenotype
● Cellular phenotype
● Organismal phenotype
14
MTHFR and Methylation
5,10-Methylene tetrahydrofolate (TH4) is required for the synthesis of nucleic acids, while 5-methyl TH4 is required for the formation of methionine from homocysteine. Methionine, in the form of S-adenosylmethionine, is required for many biological methylation reactions, including DNA methylation. Methylene TH4 reductase is a flavin-dependent enzyme required to catalyze the reduction of 5,10-methylene TH4 to 5-methyl TH4.
Linus Pauling Institute
http://lpi.oregonstate.edu
[Diagram labels: met13, fol3, exogenous folate]
15
Sequencing 18 Genes of Folate Pathway
Guthrie-Spot Sequencing Protocol
➢ 250 NTD children and 250 case matched controls
➢ Protocol
● 2mm punch
● Isolate genomic DNA
● Amplification
● Purification
● Sequencing by JGI
➢ Variant calls of 238 exons in 18 genes
● Analysis
● Curate
● QC
Jasper Rine
16
MTHFR variants exhibit 3 classes of effects.
[Figure: S. cerevisiae growth with MTHFR knock-in mutants. Six panels of OD600 vs. time (0 to 60 hours) at folinic acid concentrations of 50 µg/ml and 25 µg/ml. Panel labels: MTHFR R134C; met13; MTHFR M110I, D223N; MTHFR R519C.]
Severely Impaired, e.g., R134C
Folate Remedial, e.g., M110I, D223N
No Effect, e.g., R519C
Jasper Rine
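The reasoning behind the three classes can be sketched as a simple decision rule: a variant that grows normally even at the lower folinic acid concentration has no effect, one that grows only at the higher concentration is folate remedial, and one that grows poorly at both is severely impaired. The function below is a hypothetical illustration of that logic. The relative-growth inputs, the 0.5 threshold, and the example numbers are assumptions, not values read from the slide's growth curves.

```python
def classify_variant(growth_low_folate: float,
                     growth_high_folate: float,
                     threshold: float = 0.5) -> str:
    """Classify an MTHFR variant from growth relative to wild type (1.0 = wild-type growth).

    growth_low_folate:  relative growth at the lower folinic acid concentration
    growth_high_folate: relative growth at the higher folinic acid concentration
    threshold:          assumed cutoff separating impaired from normal growth
    """
    if growth_low_folate >= threshold:
        return "No Effect"
    if growth_high_folate >= threshold:
        return "Folate Remedial"
    return "Severely Impaired"


# Made-up inputs chosen only to exercise each branch.
examples = {
    "grows at both concentrations": classify_variant(0.9, 1.0),
    "rescued by more folate": classify_variant(0.2, 0.9),
    "poor growth at both": classify_variant(0.1, 0.2),
}
```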
17
Step 1: Collect predictions.
mutation  Team 1      Team 2
M110I     No Effect   Remediable
R134C     Impaired    Remediable
D223N     Remediable
R519C     No Effect   No Effect
18
Step 2: Assess predictions.
mutation  Team 1      Team 2      Experiment
M110I     No Effect   Remediable  Remediable
R134C     Impaired    Remediable  Impaired
D223N     Remediable              Remediable
R519C     No Effect   No Effect   No Effect
19
Step 3: Celebrate and learn.
It's not whether you win or lose...
mutation  Team 1      Team 2      Experiment
M110I     No Effect   Remediable  Remediable
R134C     Impaired    Remediable  Impaired
D223N     Remediable              Remediable
R519C     No Effect   No Effect   No Effect
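The collect-then-assess steps can be sketched as a small scoring routine that compares each team's blind predictions with the experimental result. This is a minimal illustration, not CAGI's actual assessment method: the function name and the simple count of exact matches are assumptions. D223N is left out of the comparison because only one prediction is listed for it.

```python
from typing import Dict, Tuple

# Experimental classes from the table above.
EXPERIMENT: Dict[str, str] = {
    "M110I": "Remediable",
    "R134C": "Impaired",
    "D223N": "Remediable",
    "R519C": "No Effect",
}

# Blind predictions for the mutations that both teams predicted.
PREDICTIONS: Dict[str, Dict[str, str]] = {
    "Team 1": {"M110I": "No Effect", "R134C": "Impaired", "R519C": "No Effect"},
    "Team 2": {"M110I": "Remediable", "R134C": "Remediable", "R519C": "No Effect"},
}


def score(predicted: Dict[str, str], observed: Dict[str, str]) -> Tuple[int, int]:
    """Return (correct, assessed), counting exact class matches on predicted mutations."""
    assessed = [m for m in predicted if m in observed]
    correct = sum(1 for m in assessed if predicted[m] == observed[m])
    return correct, len(assessed)


results = {team: score(preds, EXPERIMENT) for team, preds in PREDICTIONS.items()}
for team, (correct, assessed) in results.items():
    print(f"{team}: {correct}/{assessed} correct")
```

On these three mutations each team matches the experiment twice, but on different variants, which is the kind of detail an assessment can use to learn where each method works.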
20
Be clinically relevant.
21
Sequencing identifies clinically important associations.
Concurrence among cases
Intersection among databases
22
Do it ethically.
23
A few ineluctable ethical issues.
➢ How to fairly acknowledge aggregated data?
➢ Should scientifically suggestive results be used for clinical care?
➢ What is the balance between openness and preventing misinterpretation?
➢ What happens to confidentiality agreements during bankruptcy?
➢ How do we balance personal privacy with opportunities for public health advances?
Bernard Lo
24
Robert Nussbaum, Jasper Rine, Bernie Lo, Steven Brenner
The Genome Commons
25
Nature. 2008 Mar 13;452(7184):151.