haendel clingenetics.3.14.14
Expanding the Clinical Phenotype Space with Semantics and Model Systems
Melissa Haendel
March 14th, 2014
Updates in Clinical Genetics 2014
Outline
Issues in candidate prioritization
Computational techniques for comparing phenotypes
Undiagnosed Disease Program semantic phenotyping
Minimum phenotype requirements
Tools leveraging phenotypes
The Challenge: Interpretation of Disease Candidates
What’s in the box? How are candidates identified? How do they compare?
[Diagram: a box marked "?"]
Prioritized candidates, models, functional validation: M1, M2, M3, M4, ...
Phenotypes: P1, P2, P3, …
Genotype info: G1, G2, G3, G4, …
Pathogenicity, frequency, protein interactions, gene expression, gene networks, epigenomics, metabolomics….
Candidate gene prioritization
GENOTYPE | PHENOTYPE
B6.Cg-Alms1foz/foz/J | increased weight, adipose tissue volume, glucose homeostasis altered
ALMS1(NM_015120.4)[c.10775delC] + [-] | obesity, diabetes mellitus, insulin resistance
kcnj11c14/c14; insrt143/+(AB) | increased food intake, hyperglycemia, insulin resistance
Models recapitulate various phenotypic aspects of disease
OMIM Query | # Records
“large bone” | 785
“enlarged bone” | 156
“big bone” | 16
“huge bones” | 4
“massive bones” | 28
“hyperplastic bones” | 12
“hyperplastic bone” | 40
“bone hyperplasia” | 134
“increased bone growth” | 612
Searching for phenotypes using text alone is insufficient
Problem: Clinical and model phenotypes are described differently
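The nine OMIM queries above all describe roughly the same observation, yet each returns a different record count. A minimal Python sketch of the alternative, mapping every phrase to one term identifier before searching, might look like the following. The identifier `EX:0000001`, the synonym table, and the toy records are made up for illustration; none of them come from OMIM or from a real ontology.

```python
# Hypothetical synonym table: every phrase maps to one made-up term ID.
SYNONYMS = {
    "large bone": "EX:0000001",
    "enlarged bone": "EX:0000001",
    "big bone": "EX:0000001",
    "bone hyperplasia": "EX:0000001",
    "increased bone growth": "EX:0000001",
}

# Toy records: free-text description plus the term a curator assigned.
RECORDS = [
    {"id": "rec1", "text": "patient with enlarged bone", "term": "EX:0000001"},
    {"id": "rec2", "text": "bone hyperplasia noted", "term": "EX:0000001"},
    {"id": "rec3", "text": "increased bone growth", "term": "EX:0000001"},
]


def text_search(query):
    """Return IDs of records whose free text contains the query string."""
    return [r["id"] for r in RECORDS if query in r["text"]]


def term_search(query):
    """Normalize the query to a term ID, then match on the annotations."""
    term = SYNONYMS.get(query)
    if term is None:
        return []
    return [r["id"] for r in RECORDS if r["term"] == term]


text_hits = text_search("big bone")  # no record uses this exact wording
term_hits = term_search("big bone")  # every record shares the same term
```

In this toy setup the string search misses every record, while the term-based search retrieves all three regardless of the wording used.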
“Expanding” the phenotypic coverage of the human genome
[Bar chart: % human coding genes (0% to 100%); labels: OMIM, OMIM+GWAS, GWAS; categories: Ortholog only, Human+Ortholog, Human only]
Five model organisms (mouse, zebrafish, fly, yeast, rat) provide almost 80% phenotypic coverage of the human genome
How can we take advantage of this model organism phenotype data?
Outline
Issues in candidate prioritization
Computational techniques for comparing phenotypes
Undiagnosed Disease Program semantic phenotyping
Minimum phenotype requirements
Tools leveraging phenotypes
Using ontologies to compare phenotypes across species
Washington, N. L., Haendel, M. A., Mungall, C. J., Ashburner, M., Westerfield, M., & Lewis, S. E. (2009). Linking Human Diseases to Animal Models Using Ontology-Based Phenotype Annotation. PLoS Biol, 7(11). doi:10.1371/journal.pbio.1000247
What is an ontology?
A set of logically defined, inter-related terms used to annotate data
Use of common or logically related terms across databases enables integration
Relationships between terms allow annotations to be grouped in scientifically meaningful ways
Reasoning software enables computation of inferred knowledge
Groups of annotations can be compared using semantic similarity algorithms
An ontology provides the logical basis of classification
Any sense organ that functions in the detection of smell is an olfactory sense organ
olfactory sense organ = sense organ that is capable_of some detection of smell
nose: sense organ; capable_of some detection of smell
Classified result: nose is_a olfactory sense organ
=> These are necessary and sufficient conditions
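The classification step can be imitated with plain set operations. The sketch below is a toy subset check, not an OWL reasoner: a class counts as an inferred superclass when the asserted parents and restrictions of a term include everything in that class's definition. The dictionary layout and the function name are assumptions made for illustration; the class and relation names are the ones on the slide.

```python
# What is asserted about "nose": a named parent plus (relation, filler) pairs.
ASSERTED = {
    "nose": {
        "parents": {"sense organ"},
        "restrictions": {("capable_of", "detection of smell")},
    },
}

# Defined classes: meeting every condition is sufficient for membership,
# and every member must meet them (necessary).
DEFINED = {
    "olfactory sense organ": {
        "parents": {"sense organ"},
        "restrictions": {("capable_of", "detection of smell")},
    },
}


def infer_superclasses(name):
    """Return the defined classes whose conditions `name` satisfies."""
    facts = ASSERTED[name]
    return sorted(
        cls
        for cls, definition in DEFINED.items()
        if definition["parents"] <= facts["parents"]
        and definition["restrictions"] <= facts["restrictions"]
    )


inferred = infer_superclasses("nose")
```

Nothing in `ASSERTED` says that a nose is an olfactory sense organ; that placement falls out of the definition alone.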
Classifying
Representing phenotypes
Human Phenotype Ontology
Used to annotate:
• Patients
• Disorders
• Genotypes
• Genes
• Sequence variants
In human
Köhler et al. The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res. 2014 Jan 1;42(1):D966-74.
Mammalian Phenotype Ontology
Smith et al. (2005). The Mammalian Phenotype Ontology as a tool for annotating, analyzing and comparing phenotypic information. Genome Biol, 6(1). doi:10.1186/gb-2004-6-1-r7
Used to annotate and query:
• Genotypes
• Alleles
• Genes
In mice
Post-composed models of phenotype annotation
Entity | Quality
Anatomy: head | Small size
Anatomy: heart | Edematous
Anatomy: ventral mandibular arch | Thick
Gene Ontology: swim bladder inflation | Arrested
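A post-composed annotation can be held as a small record that keeps the entity and the quality separate until query time. The sketch below is one possible shape, not the schema of any particular database: the class name `EQAnnotation` and its fields are hypothetical, and each entity is paired with the quality listed in the same position on the slide, which is an assumption about the original layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EQAnnotation:
    """One phenotype statement composed from an entity and a quality."""
    entity_source: str  # ontology the entity term comes from
    entity: str
    quality: str

    def label(self):
        """Human-readable rendering of the composed phenotype."""
        return f"{self.entity}: {self.quality.lower()}"


# Pairings assumed from the order of the two lists on the slide.
annotations = [
    EQAnnotation("Anatomy", "head", "Small size"),
    EQAnnotation("Anatomy", "heart", "Edematous"),
    EQAnnotation("Anatomy", "ventral mandibular arch", "Thick"),
    EQAnnotation("Gene Ontology", "swim bladder inflation", "Arrested"),
]

# Because the parts stay separate, annotations can be grouped by either one.
by_quality = {a.quality: a.entity for a in annotations}
anatomy_entities = [a.entity for a in annotations if a.entity_source == "Anatomy"]
```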
A human phenotype example
Abnormality of the eye
Vitreous hemorrhage
Abnormal eye morphology
Abnormality of the cardiovascular system
Abnormal eye physiology
Hemorrhage of the eye
Internal hemorrhage
Abnormality of the globe
Abnormality of blood circulation
Problem: Data silos
[Figure: lung-related terms held separately in four resources: Mammalian Phenotype, Mouse Anatomy, FMA, Human development. Relations in the legend: is_a (SubClassOf), part_of, develops_from, surrounded_by.]
Mammalian Phenotype terms shown: abnormal respiratory system morphology, abnormal lung morphology, abnormal pulmonary acinus morphology, abnormal pulmonary alveolus morphology
Anatomy and development terms shown: organ system, respiratory system, lower respiratory tract, lung, lung alveolus, alveolar sac, pulmonary acinus, lobular organ, parenchymatous organ, solid organ, pleural sac, thoracic cavity organ, thoracic cavity, lung bud, respiratory primordium, pharyngeal region
Solution: bridging semantics
Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative multi-species anatomy ontology. Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5
[Figure: species-neutral anatomy classes bridging species-specific terms. Relations in the legend: is_a (taxon equivalent), is_a (SubClassOf), part_of, develops_from, capable_of, only_in_taxon.]
Terms shown: anatomical structure, organ, organ part, respiration organ, lung, lung bud, lung primordium, respiratory primordium, foregut, endoderm, endoderm of foregut, alveolus, alveolus of lung, alveolar sac, pulmonary acinus, swim bladder
Linked terms: FMA:lung, MA:lung, MA:lung alveolus, FMA:pulmonary alveolus, EHDAA:lung bud, GO:respiratory gaseous exchange, NCBITaxon:Mammalia, NCBITaxon:Actinopterygii
Köhler et al. (2014) Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research F1000Research 2014, 2:30
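The bridging idea can be sketched as a lookup from species-specific terms to a shared class, after which terms from different resources become comparable. The mapping below is read off the term names in the bridging figure and is an assumption about how they pair up; a real pipeline would load these links from the ontology files. The function names are hypothetical.

```python
# Species-specific term -> shared class (assumed pairings from the figure).
TAXON_EQUIVALENT = {
    "FMA:lung": "lung",
    "MA:lung": "lung",
    "FMA:pulmonary alveolus": "alveolus of lung",
    "MA:lung alveolus": "alveolus of lung",
    "EHDAA:lung bud": "lung bud",
}


def bridge(term):
    """Map a species-specific term to its shared class, if one is known."""
    return TAXON_EQUIVALENT.get(term, term)


def same_structure(term_a, term_b):
    """True when both terms resolve to the same shared class."""
    return bridge(term_a) == bridge(term_b)


human_vs_mouse_lung = same_structure("FMA:lung", "MA:lung")
lung_vs_alveolus = same_structure("FMA:lung", "MA:lung alveolus")
```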
Phenotype representation requires more than “phenotype ontologies”
Disease: type II diabetes mellitus (DOID:9352); disease & phenotype data
Gene Ontology: glucose metabolism (GO:0006006); gene/protein function data
Chemical: glucose (CHEBI:17234), pyruvate (CHEBI:15361); metabolomics, toxicogenomics data
Cell: pancreatic beta cell (CL:0000169); transcriptomic data
OWLsim: Phenotype similarity across patients or organisms
https://code.google.com/p/owltools/wiki/OwlSim
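To give a feel for what a phenotype similarity score measures, here is a simplified Jaccard comparison over ancestor sets. It is a stand-in for illustration and not OWLSim's actual algorithm or API. The term names are borrowed from the human phenotype example earlier in the deck, but the parent links between them are assumed for this sketch and have not been checked against an HPO release.

```python
# Assumed is_a links (child -> parents); illustrative only.
PARENTS = {
    "Vitreous hemorrhage": {"Hemorrhage of the eye"},
    "Hemorrhage of the eye": {"Abnormal eye physiology", "Internal hemorrhage"},
    "Abnormal eye physiology": {"Abnormality of the eye"},
    "Abnormal eye morphology": {"Abnormality of the eye"},
    "Internal hemorrhage": {"Abnormality of blood circulation"},
    "Abnormality of blood circulation": {"Abnormality of the cardiovascular system"},
}


def closure(terms):
    """Return the given terms together with all of their ancestors."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(PARENTS.get(term, ()))
    return seen


def jaccard(profile_a, profile_b):
    """Shared ancestors divided by all ancestors of the two profiles."""
    a, b = closure(profile_a), closure(profile_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


patient = {"Vitreous hemorrhage"}
model = {"Abnormal eye morphology"}
score = jaccard(patient, model)
```

The two profiles share no term directly, but they still receive a non-zero score because both roll up to "Abnormality of the eye" in the assumed hierarchy.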
Outline
Issues in candidate prioritization
Computational techniques for comparing phenotypes
Undiagnosed Disease Program semantic phenotyping
Minimum phenotype requirements
Tools leveraging phenotypes
General exome analysis
Single Exome
Remove off-target and common variants, filter on predicted deleteriousness, candidate gene strategies
Prioritize based on known genes, allele frequency, and pathogenicity
Homozygous recessive, X-linked, De novo (if trio)
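The filtering steps listed above can be sketched as a single pass over annotated variants. Everything here is hypothetical: the `Variant` fields, the gene names, and the 0.5 pathogenicity cutoff are placeholders, and the 1% allele-frequency threshold is borrowed from the benchmarking slide later in the deck. Inheritance-model filters are left out to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    on_target: bool       # falls inside the exome capture region
    maf: float            # minor allele frequency, 0.0 to 1.0
    pathogenicity: float  # predicted deleteriousness, 0.0 to 1.0


def filter_variants(variants, max_maf=0.01, min_pathogenicity=0.5):
    """Drop off-target, common, and predicted-benign variants, then rank."""
    kept = [
        v for v in variants
        if v.on_target and v.maf <= max_maf and v.pathogenicity >= min_pathogenicity
    ]
    # Most deleterious first; rarer variants break ties.
    return sorted(kept, key=lambda v: (-v.pathogenicity, v.maf))


# Toy input with made-up gene names.
variants = [
    Variant("GENE_A", True, 0.0, 0.95),
    Variant("GENE_B", True, 0.05, 0.90),   # too common
    Variant("GENE_C", False, 0.0, 0.99),   # off-target
    Variant("GENE_D", True, 0.001, 0.10),  # predicted benign
    Variant("GENE_E", True, 0.002, 0.70),
]
candidates = filter_variants(variants)
```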
Undiagnosed Disease Program exome analysis
Family exome data
Prioritize based on alignment quality, allele frequency, predicted deleterious, and PubMed
Filter using SNP chip data, Mendelian models of inheritance, and population frequency
exome analysis
Recessive, De novo filters
Remove off-target, common variants, and variants not in known disease causing genes
Zemojtel et al., manuscript submitted
http://compbio.charite.de/PhenIX/
Remove off-target and common variants
Recessive, De novo filters
https://www.sanger.ac.uk/resources/databases/exomiser/
Robinson et al. http://genome.cshlp.org/content/early/2013/10/25/gr.160325.113.abstract
Exomiser exome analysis
Current UDP analysis with semantic phenotyping
Family Exome Data
CombinedScore
Phenotype Data
Filter using SNP chip data, Mendelian models of inheritance, and population frequency
Benchmarking
1092 unaffected exomes
28,516 disease associated variants
100,000 simulated exomes
Annotate variants
Remove off-target, synonymous, and common (>1% MAF) variants (plus optional inheritance model filtering)
Prioritize based on combined score
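The slides do not say how the combined score is computed, so the sketch below simply averages a variant score and a phenotype score to show why combining the two can reorder candidates. The averaging rule, the gene names, and the scores are all assumptions for illustration and should not be read as Exomiser's scoring function.

```python
def combined_score(variant_score, phenotype_score):
    """Assumed combination rule: plain mean of the two scores (0 to 1)."""
    return (variant_score + phenotype_score) / 2.0


# Made-up candidates: gene -> (variant score, phenotype score).
candidates = {
    "GENE_A": (0.9, 0.2),
    "GENE_B": (0.7, 0.8),
    "GENE_C": (0.4, 0.9),
}

# Ranking on variant evidence alone.
by_variant = sorted(candidates, key=lambda g: candidates[g][0], reverse=True)

# Ranking on the combined score.
by_combined = sorted(
    candidates,
    key=lambda g: combined_score(*candidates[g]),
    reverse=True,
)
```

With these toy numbers, `GENE_A` leads on variant evidence alone but drops to last once phenotype relevance is taken into account.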
[Bar chart: % exomes with disease gene as top hit (0 to 100), by inheritance model: All diseases, Autosomal Dominant, Autosomal Recessive (hom), Autosomal Recessive (compound het); series: Variant, Phenotypic relevance, PHIVE]
Phenotype and variant data synergistically improve exome interpretation
Results
Correct gene as top-scoring hit in 68.3% of exomes, out of an average of 272 post-filtering candidate genes
Improvement of between 1.8 and 5.1 fold in the percentage of candidate genes correctly ranked in first place compared to just using pathogenicity and frequency data
Shows utility of structured phenotype data for computational analysis
UDP Experiment
UDP Diploid Aligned Cohort
VCF file
18 families
Phenotype profiles
Mendelian filtered files (per family)
Mendelian Filters
Exomiser
PhenIX
Phenotype only
VCF files with phenotype and variant scores (per family)
Top de novo candidates for patient 2543
Patient | Exomiser | Phenotype only | PhenIX
UDP2543 | STIM1, CYP2D6, MUC5B | ITGA7, PLEC, STIM1, PTGS1, TTN | STIM1, RB1, DLEC1, CHRNB4, MUC5B, REPIN1, NBPF8, GPRIN3, TMEFF1, FLT3LG, OSM, FZD10, MUC12
Gene | Variant | MAF (ESP or 1000g) | Consequence | Predicted pathogenicity: SIFT, PolyPhen, MutTaster (0-1)
STIM1 | chr11:g.4045175A>T [0/1] | 0% | p.I115T | 1
UDP2543: phenotypic similarity
Columns: Patient; Stim1 het mouse; OMIM:612783 (IMMUNODEFICIENCY 10) - hom STIM1 mutations; OMIM:160565 (MYOPATHY, TUBULAR AGGREGATE) - het STIM1 mutations
Impaired platelet aggregation
abnormal platelet activation
Thrombocytopenia
Thrombocytopenia decreased platelet cell number
Thrombocytopenia
Myopathy Myopathy Myopathy
Generalized hypotonia
Muscular hypotonia Proximal muscle weakness
Petechiae increased bleeding time
Autoimmune hemolytic anemia
Delayed gross motor development
Epistaxis increased bleeding time
Gower sign
STIM1
Proposed workflow for undiagnosed diseases
What constitutes an adequate phenotype annotation for an undiagnosed patient?
Defining minimum phenotype standard:
1. Is the annotation specificity similar to or better than the corpus of available phenotype data?
2. Is the number of annotations/patient similar or better?
3. How does the ontology and annotation set differ across anatomical systems in terms of granularity? Does this change specificity requirements for phenotypic profiles?
4. How does use of NOT annotations help further specify the uniqueness of an undiagnosed patient?
5. How do onset, temporal ordering, and severity affect specificity?
UDP phenotype annotation metrics
UDP annotations have a similar information content (IC) and a larger average number of annotations per disease/patient
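Information content is commonly defined as the negative log of how often a term is used in an annotation corpus, so rarely used (more specific) terms score higher. The sketch below uses that standard definition with base-2 logs; the corpus size, the term names, and the counts are invented, and the slides do not state which log base or corpus the UDP metrics used.

```python
import math


def information_content(term_count, corpus_size):
    """IC = -log2(fraction of corpus entries annotated with the term)."""
    return -math.log2(term_count / corpus_size)


def mean_ic(profile, counts, corpus_size):
    """Average IC over the terms in one patient or disease profile."""
    values = [information_content(counts[t], corpus_size) for t in profile]
    return sum(values) / len(values)


# Hypothetical corpus of 1024 annotated entries.
CORPUS_SIZE = 1024
COUNTS = {
    "broad term": 512,    # used by half of all entries
    "specific term": 4,   # used by very few entries
}

ic_broad = information_content(COUNTS["broad term"], CORPUS_SIZE)
ic_specific = information_content(COUNTS["specific term"], CORPUS_SIZE)
profile_ic = mean_ic(["broad term", "specific term"], COUNTS, CORPUS_SIZE)
```

A profile made mostly of broad terms ends up with a low mean IC, which is one way to flag annotations that are too unspecific to discriminate between candidates.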
Anatomical annotation distribution in the corpus
Nervous system, skeletal system, and immune system are highest => these categories require greater specificity and numbers of annotations
Annotation specificity meter
What about common traits, like blue eyes or acne?
Making the patient phenotype profiles as good as can be
Request category | Count | Examples
Total requests from UDP | 614 |
Number of requests assigned to HPO terms | 423 | Chronic limb pain -> limb pain
Number of terms that need consideration by UDP | 145 | Expressive language -> delay? Increase? Abnormal?
Number of requests that belong in other parts of the patient record | 68 | Abnormal aCGH 12q21.1-12q.2 (662 kb duplication) paternal origin -> move to genotype information portion of the record
It is a community effort to contribute requests to the ontologies and quality profiling helps make our tools work better for everyone
Limitations and ongoing work
Adding negation to the algorithm
Temporal ordering of phenotypes
Leveraging severity, expressivity, and penetrance data
Additional tools leveraging structured phenotype data
Monarch phenotype data
Species | Source | Unique genotypes/variants | Disease/phenotype associations
Mouse | MGI | 53,573 | 406,618
Zebrafish | ZFIN | 14,703 | 75,698
C. elegans | WormBase | 116,106 | 411,154
Fruit fly | FlyBase | 98,596 | 265,329
Human | OMIM | 26,372 | 27,798
Human | Orphanet | 2,872 | 5,095
Human | ClinVar | 62,437 | 178,424
Also in the system: Rat; IMPC; GO annotations; Coriell cell lines; OMIA; MPD; Yeast; CTD; GWAS; Panther, Homologene orthologs; BioGrid interactions; Drugbank; AutDB; Allen Brain …157 sources to date
Coming soon: Animal QTLs for pig, cattle, chicken, sheep, trout, dog, horse
ModelCompare: How do the models recapitulate the disease?
Late-onset Parkinson’s Phenotypes Mouse Phenotypes
[Network figure: mouse genes Slc6a3, Dbh, Slc18a2, Uchl1, Uchl3, Snca, Mfn2, Cox8a, Th; group labels Tyrosine metabolism and Cx IV]
Late-onset Parkinson’s phenotypes (subset): Bradykinesia, Depression, Dysphagia, Lewy bodies
Network phenotype distribution
Phenotypes in common: Abnormal gait, ataxia, paralysis, Bradykinesia, Abnormal locomotion, Abnormality of central motor function
Finding collaborators for functional validation
Patient phenotype profile
Phenotyping experts
Exome Walker: Network based exploration of phenotypically similar diseases
http://compbio.charite.de/ExomeWalker/
Walking the interactome for prioritization of candidate disease genes. Am J Hum Genet. 2008 Apr;82(4):949-58. doi: 10.1016/j.ajhg.2008.02.013.
Bare Lymphocyte Syndrome Type 1 Protein-Interaction Network
Exploits vicinity in the protein interaction network between phenotypically related diseases and uses this to rank exome candidates
Large boost in rankings of candidate genes using 250 disease gene-families
Prototype version online, manuscript in preparation
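One common way to score vicinity in an interaction network is a random walk with restart from known disease genes: candidates that are reached often by the walker are ranked higher. The sketch below is a small pure-Python version on a made-up four-node network. The node names, the restart probability of 0.7, and the fixed iteration count are arbitrary choices for illustration, and the code is not taken from ExomeWalker.

```python
def random_walk_with_restart(edges, seeds, restart=0.7, iterations=50):
    """Return visit probabilities for every node of an undirected network."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    nodes = sorted(neighbors)

    # The walker starts on, and restarts at, the seed genes.
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    prob = dict(start)

    for _ in range(iterations):
        nxt = {n: restart * start[n] for n in nodes}
        for n in nodes:
            # Probability not used for restarting spreads evenly to neighbors.
            share = (1.0 - restart) * prob[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        prob = nxt
    return prob


# Hypothetical chain: SEED - CAND_NEAR - MID - CAND_FAR
edges = [("SEED", "CAND_NEAR"), ("CAND_NEAR", "MID"), ("MID", "CAND_FAR")]
scores = random_walk_with_restart(edges, seeds={"SEED"})
ranked = sorted(["CAND_NEAR", "CAND_FAR"], key=scores.get, reverse=True)
```

The candidate adjacent to the seed gene outranks the one three steps away, which is the behavior the ranking relies on.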
PhenoViz: Integrate all human, mouse, and fish data to understand CNVs
Desktop application for differential diagnostics in CNVs
Explain manifestations of CNV diseases based on genes contained in CNV
E.g., supravalvular aortic stenosis in Williams syndrome can be explained by haploinsufficiency for elastin
Double the number of explanations using model data
Doelken, Köhler, et al. (2013) Dis Model Mech 6:358-72
Conclusions
Cross-species phenotype data can be used to compute semantic similarity
Structured phenotype data for rare and undiagnosed disease patients can aid candidate evaluation
We are experimenting with these methods for UDP patient phenotypes to aid candidate prioritization, identify models, explore mechanisms, and find collaborators
Acknowledgments
NIH-UDP: William Bone, Murat Sincan, David Adams, Amanda Links, David Draper, Neal Boerkoel, Cyndi Tifft, Bill Gahl
OHSU: Nicole Vasilesky, Matt Brush
Lawrence Berkeley: Nicole Washington, Suzanna Lewis, Chris Mungall
UCSD: Amarnath Gupta, Jeff Grethe, Anita Bandrowski, Maryann Martone
U of Pitt: Chuck Boromeo, Jeremy Espino, Harry Hochheiser
Sanger: Anika Oehlrich, Jules Jacobson, Damian Smedley
Toronto: Marta Girdea, Sergiu Dumitriu, Mike Brudno
JAX: Cynthia Smith
Charité: Sebastian Kohler, Sandra Doelken, Sebastian Bauer, Peter Robinson
Funding:
NIH Office of Director: 1R24OD011883
NIH-UDP: HHSN268201300036C