haendel clingenetics.3.14.14

Post on 10-May-2015


Expanding the Clinical Phenotype Space with Semantics and Model Systems

Melissa Haendel

March 14th, 2014

Updates in Clinical Genetics 2014

Outline

Issues in candidate prioritization

Computational techniques for comparing phenotypes

Undiagnosed Disease Program semantic phenotyping

Minimum phenotype requirements

Tools leveraging phenotypes

The Challenge: Interpretation of Disease Candidates

What’s in the box? How are candidates identified? How do they compare?

[Diagram labels: Phenotypes (P1, P2, P3); Genotype info (G1, G2, G3, G4); Prioritized Candidates, Models (M1, M2, M3, M4, ...), functional validation]

Pathogenicity, frequency, protein interactions, gene expression, gene networks, epigenomics, metabolomics, ...

Candidate gene prioritization

GENOTYPE | PHENOTYPE
ALMS1(NM_015120.4)[c.10775delC] + [-] | obesity, diabetes mellitus, insulin resistance
B6.Cg-Alms1foz/foz/J | increased weight, adipose tissue volume, glucose homeostasis altered
kcnj11c14/c14; insrt143/+(AB) | increased food intake, hyperglycemia, insulin resistance

Models recapitulate various phenotypic aspects of disease

OMIM query | # records
“large bone” | 785
“enlarged bone” | 156
“big bone” | 16
“huge bones” | 4
“massive bones” | 28
“hyperplastic bones” | 12
“hyperplastic bone” | 40
“bone hyperplasia” | 134
“increased bone growth” | 612

Searching for phenotypes using text alone is insufficient

Problem: Clinical and model phenotypes are described differently

“Expanding” the phenotypic coverage of the human genome

[Bar chart: % human coding genes (0% to 100%) for OMIM, OMIM+GWAS, and GWAS; legend: Ortholog only, Human+Ortholog, Human only]

Five model organisms (mouse, zebrafish, fly, yeast, rat) provide almost 80% phenotypic coverage of the human genome

How can we take advantage of this model organism phenotype data?

Outline

Issues in candidate prioritization

Computational techniques for comparing phenotypes

Undiagnosed Disease Program semantic phenotyping

Minimum phenotype requirements

Tools leveraging phenotypes

Using ontologies to compare phenotypes across species

Washington, N. L., Haendel, M. A., Mungall, C. J., Ashburner, M., Westerfield, M., & Lewis, S. E. (2009). Linking Human Diseases to Animal Models Using Ontology-Based Phenotype Annotation. PLoS Biol, 7(11). doi:10.1371/journal.pbio.1000247

What is an ontology?

A set of logically defined, inter-related terms used to annotate data

Use of common or logically related terms across databases enables integration

Relationships between terms allow annotations to be grouped in scientifically meaningful ways

Reasoning software enables computation of inferred knowledge

Groups of annotations can be compared using semantic similarity algorithms

An ontology provides the logical basis of classification

Any sense organ that functions in the detection of smell is an olfactory sense organ

olfactory sense organ: sense organ, capable_of some detection of smell

nose: sense organ, capable_of some detection of smell

[Diagram: nose is classified under olfactory sense organ]

=> These are necessary and sufficient conditions
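The classification step above can be sketched in a few lines of Python. This is a toy illustration only, not how an OWL reasoner is implemented: the dictionaries `DEFINITIONS` and `ASSERTED`, the helper `infer_classes`, and the "eye" entry are assumptions made up for the example.

```python
# Toy classifier: a defined class is a parent class plus required relations.
# All structures and names are illustrative, not read from a real ontology.
DEFINITIONS = {
    "olfactory sense organ": {
        "parent": "sense organ",
        "relations": {("capable_of", "detection of smell")},
    },
}

ASSERTED = {
    "nose": {
        "parents": {"sense organ"},
        "relations": {("capable_of", "detection of smell")},
    },
    # Hypothetical contrast case, not from the slides.
    "eye": {
        "parents": {"sense organ"},
        "relations": {("capable_of", "detection of light")},
    },
}


def infer_classes(term):
    """Return the defined classes whose conditions the term fully satisfies."""
    facts = ASSERTED[term]
    inferred = set()
    for name, definition in DEFINITIONS.items():
        has_parent = definition["parent"] in facts["parents"]
        has_relations = definition["relations"] <= facts["relations"]
        if has_parent and has_relations:
            inferred.add(name)
    return inferred


print(infer_classes("nose"))  # {'olfactory sense organ'}
print(infer_classes("eye"))   # set()
```

Because the conditions are both necessary and sufficient, anything that meets them is placed under the defined class without a curator asserting it by hand.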

Classifying

Representing phenotypes

Human Phenotype Ontology

Used to annotate, in human:
• Patients
• Disorders
• Genotypes
• Genes
• Sequence variants

Köhler et al. The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res. 2014 Jan 1;42(1):D966-74.

Mammalian Phenotype Ontology

Smith et al. (2005). The Mammalian Phenotype Ontology as a tool for annotating, analyzing and comparing phenotypic information. Genome Biol, 6(1). doi:10.1186/gb-2004-6-1-r7

Used to annotate and query, in mice:
• Genotypes
• Alleles
• Genes

Post-composed models of phenotype annotation

Entity | Quality
Anatomy: head | Small size
Anatomy: heart | Edematous
Anatomy: ventral mandibular arch | Thick
Gene Ontology: swim bladder inflation | Arrested
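A post-composed annotation pairs an entity term with a quality term at annotation time. The sketch below shows one possible in-memory representation; the class name `EQAnnotation` and its fields are assumptions for illustration, and the entity/quality pairing follows the order in which the terms appear on the slide.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EQAnnotation:
    """One Entity + Quality phenotype statement (hypothetical structure)."""
    entity_source: str  # ontology the entity comes from
    entity: str
    quality: str

    def label(self):
        return f"{self.entity}: {self.quality}"


annotations = [
    EQAnnotation("Anatomy", "head", "small size"),
    EQAnnotation("Anatomy", "heart", "edematous"),
    EQAnnotation("Anatomy", "ventral mandibular arch", "thick"),
    EQAnnotation("Gene Ontology", "swim bladder inflation", "arrested"),
]


def group_by_source(items):
    """Group annotation labels by the ontology that supplies the entity."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item.entity_source].append(item.label())
    return dict(grouped)


for source, labels in group_by_source(annotations).items():
    print(source, labels)
```

Keeping entity and quality as separate fields means the same quality vocabulary can be reused across anatomical structures and biological processes.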

A human phenotype example

Abnormality of the eye

Vitreous hemorrhage

Abnormal eye morphology

Abnormality of the cardiovascular system

Abnormal eye physiology

Hemorrhage of the eye

Internal hemorrhage

Abnormality of the globe

Abnormality of blood circulation

Problem: Data silos

[Diagram: lung-related terms held in separate ontologies (Mammalian Phenotype, Mouse Anatomy, FMA, Human development)]

Phenotype terms: abnormal respiratory system morphology; abnormal lung morphology; abnormal pulmonary acinus morphology; abnormal pulmonary alveolus morphology

Anatomy and development terms: lung; lung alveolus; alveolar sac; pulmonary acinus; lower respiratory tract; respiratory system; organ system; lobular organ; parenchymatous organ; solid organ; pleural sac; thoracic cavity organ; thoracic cavity; lung bud; respiratory primordium; pharyngeal region

Relations: develops_from; part_of; is_a (SubClassOf); surrounded_by

Solution: bridging semantics

Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative multi-species anatomy ontology. Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5

[Diagram: species-neutral terms linked to species-specific terms]

Unprefixed terms: anatomical structure; organ; organ part; respiration organ; lung; lung bud; lung primordium; respiratory primordium; foregut; endoderm; endoderm of foregut; alveolus; alveolus of lung; alveolar sac; pulmonary acinus; swim bladder

Prefixed terms: FMA:lung; MA:lung; MA:lung alveolus; FMA:pulmonary alveolus; EHDAA:lung bud; GO:respiratory gaseous exchange; NCBITaxon:Mammalia; NCBITaxon:Actinopterygii

Relations: is_a (SubClassOf); is_a (taxon equivalent); part_of; develops_from; capable_of; only_in_taxon

Köhler et al. (2014) Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research F1000Research 2014, 2:30
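The bridging idea can be sketched as a lookup from species-specific anatomy terms to a shared, species-neutral class. The mapping below is a hand-written assumption based on the term labels in the diagram, and the records are hypothetical; a real pipeline would read these links from the ontology files rather than from a dictionary.

```python
from collections import defaultdict

# Assumed links from species-specific terms to a shared class (illustrative).
BRIDGE = {
    "FMA:lung": "lung",
    "MA:lung": "lung",
    "FMA:pulmonary alveolus": "alveolus of lung",
    "MA:lung alveolus": "alveolus of lung",
}


def same_structure(term_a, term_b):
    """True if both terms map to the same shared class."""
    shared_a = BRIDGE.get(term_a)
    shared_b = BRIDGE.get(term_b)
    return shared_a is not None and shared_a == shared_b


def group_by_shared_class(records):
    """Group (source, term) records from different silos by shared class."""
    grouped = defaultdict(list)
    for source, term in records:
        shared = BRIDGE.get(term)
        if shared is not None:
            grouped[shared].append(source)
    return dict(grouped)


# Hypothetical records from two otherwise unconnected databases.
records = [
    ("human database", "FMA:lung"),
    ("mouse database", "MA:lung"),
    ("mouse database", "MA:lung alveolus"),
]

print(same_structure("FMA:lung", "MA:lung"))
print(group_by_shared_class(records))
```

Once both databases resolve to the same shared class, a query for "lung" reaches human and mouse data at once.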

Phenotype representation requires more than “phenotype ontologies”

Ontology | Example term | Data
Gene Ontology | glucose metabolism (GO:0006006) | Gene/protein function data
Chemical | glucose (CHEBI:17234); pyruvate (CHEBI:15361) | Metabolomics, toxicogenomics data
Disease | type II diabetes mellitus (DOID:9352) | Disease & phenotype data
Cell | pancreatic beta cell (CL:0000169) | transcriptomic data

OWLsim: Phenotype similarity across patients or organisms

https://code.google.com/p/owltools/wiki/OwlSim
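One common way to score the similarity of two phenotype profiles is the Jaccard index over their ancestor closures. The sketch below is a minimal illustration of that measure, not OWLSim's code: the `PARENTS` hierarchy is a toy arrangement of terms assumed for the example and is not the actual Human Phenotype Ontology structure.

```python
# Toy is_a hierarchy (child -> parents); assumed for illustration only.
PARENTS = {
    "Vitreous hemorrhage": ["Hemorrhage of the eye"],
    "Hemorrhage of the eye": ["Abnormal eye physiology", "Internal hemorrhage"],
    "Abnormal eye physiology": ["Abnormality of the eye"],
    "Abnormal eye morphology": ["Abnormality of the eye"],
    "Internal hemorrhage": ["Abnormality of blood circulation"],
    "Abnormality of blood circulation": ["Abnormality of the cardiovascular system"],
    "Abnormality of the eye": [],
    "Abnormality of the cardiovascular system": [],
}


def closure(profile):
    """Return every term in the profile plus all of its ancestors."""
    seen = set()
    stack = list(profile)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(PARENTS[term])
    return seen


def jaccard_similarity(profile_a, profile_b):
    """Shared ancestors divided by all ancestors of either profile."""
    a, b = closure(profile_a), closure(profile_b)
    return len(a & b) / len(a | b)


patient = {"Vitreous hemorrhage"}
model = {"Abnormal eye morphology"}
print(jaccard_similarity(patient, model))    # 0.125
print(jaccard_similarity(patient, patient))  # 1.0
```

The two profiles share no exact term, yet they still score above zero because both fall under "Abnormality of the eye". A text match on the term names would have found nothing.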

Outline

Issues in candidate prioritization

Computational techniques for comparing phenotypes

Undiagnosed Disease Program semantic phenotyping

Minimum phenotype requirements

Tools leveraging phenotypes

General exome analysis

Single Exome

Remove off-target and common variants, filter on predicted deleteriousness, candidate gene strategies

Prioritize based on known genes, allele frequency, and pathogenicity

Homozygous recessive, X-linked, De novo (if trio)

Undiagnosed Disease Program exome analysis

Family exome data

Prioritize based on alignment quality, allele frequency, predicted deleterious, and PubMed

Filter using SNP chip data, Mendelian models of inheritance, and population frequency

PhenIX exome analysis

Recessive, de novo filters

Remove off-target, common variants, and variants not in known disease-causing genes

Zemojtel et al., manuscript submitted

http://compbio.charite.de/PhenIX/

Exomiser exome analysis

Remove off-target and common variants

Recessive, de novo filters

https://www.sanger.ac.uk/resources/databases/exomiser/

Robinson et al. http://genome.cshlp.org/content/early/2013/10/25/gr.160325.113.abstract
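The recessive and de novo filters can be illustrated on trio genotypes. This is a simplified sketch under stated assumptions: genotypes are unphased strings such as "0/1", the gene names are placeholders, and X-linked and compound heterozygous models are left out. It is not the filtering code used by Exomiser or PhenIX.

```python
def has_alt(genotype):
    """True if a genotype string such as '0/1' carries the alternate allele."""
    return "1" in genotype.split("/")


def inheritance_model(child, mother, father):
    """Classify one variant in a trio; return a model name or None."""
    if has_alt(child) and not has_alt(mother) and not has_alt(father):
        return "de novo"
    if child == "1/1" and mother == "0/1" and father == "0/1":
        return "homozygous recessive"
    return None


# Hypothetical trio calls; gene names are placeholders.
trio_variants = [
    {"gene": "GENE_A", "child": "0/1", "mother": "0/0", "father": "0/0"},
    {"gene": "GENE_B", "child": "1/1", "mother": "0/1", "father": "0/1"},
    {"gene": "GENE_C", "child": "0/1", "mother": "0/1", "father": "0/0"},
]

kept = {}
for variant in trio_variants:
    model = inheritance_model(
        variant["child"], variant["mother"], variant["father"]
    )
    if model is not None:
        kept[variant["gene"]] = model

print(kept)  # {'GENE_A': 'de novo', 'GENE_B': 'homozygous recessive'}
```

GENE_C is dropped because the heterozygous call is inherited from the mother, so it fits neither model.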

Current UDP analysis with semantic phenotyping

Family Exome Data

Combined Score

Phenotype Data

Filter using SNP chip data, Mendelian models of inheritance, and population frequency

Benchmarking

1092 unaffected exomes

28,516 disease-associated variants

100,000 simulated exomes

Annotate variants

Remove off-target, synonymous, and common (>1% MAF) variants (plus optional inheritance model filtering)

Prioritize based on combined score
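The filter-then-rank steps can be sketched as follows. The slides do not give the combined score formula, so the plain average used here is an assumption for illustration, as are the `Variant` fields, the gene names, and the scores.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    on_target: bool
    synonymous: bool
    maf: float              # minor allele frequency as a fraction
    variant_score: float    # predicted pathogenicity, 0 to 1
    phenotype_score: float  # phenotypic relevance, 0 to 1


def passes_filters(variant, max_maf=0.01):
    """Keep on-target, non-synonymous variants that are not common (>1% MAF)."""
    return (
        variant.on_target
        and not variant.synonymous
        and variant.maf <= max_maf
    )


def combined_score(variant):
    """ASSUMPTION: simple mean of the two scores; the real formula may differ."""
    return (variant.variant_score + variant.phenotype_score) / 2


def prioritize(variants):
    """Filter, then rank the survivors by combined score, best first."""
    kept = [v for v in variants if passes_filters(v)]
    return sorted(kept, key=combined_score, reverse=True)


# Hypothetical input; genes and scores are made up.
variants = [
    Variant("GENE_B", True, False, 0.001, 1.0, 0.2),
    Variant("GENE_A", True, False, 0.0, 0.9, 0.8),
    Variant("GENE_C", True, False, 0.05, 1.0, 1.0),  # common
    Variant("GENE_D", True, True, 0.0, 0.1, 0.9),    # synonymous
    Variant("GENE_E", False, False, 0.0, 0.9, 0.9),  # off-target
]

print([v.gene for v in prioritize(variants)])  # ['GENE_A', 'GENE_B']
```

GENE_B has the higher pathogenicity score, but GENE_A ranks first once phenotypic relevance is taken into account.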

[Bar chart: % exomes with disease gene as top hit (0 to 100) for All diseases, Autosomal Dominant, Autosomal Recessive (hom), and Autosomal Recessive (compound het); series: Variant, Phenotypic relevance, PHIVE]

Phenotype and variant data synergistically improve exome interpretation

Results

Correct gene as top scoring hit in 68.3% of exomes out of an average of 272 post-filtering candidate genes

Improvement of between 1.8 and 5.1 fold in the percentage of candidate genes correctly ranked in first place compared to just using pathogenicity and frequency data

Shows utility of structured phenotype data for computational analysis

UDP Experiment

[Workflow diagram labels]

UDP Diploid Aligned Cohort VCF file (18 families)

Phenotype profiles

Mendelian Filters

Mendelian filtered files (per family)

Exomiser, PhenIX, Phenotype only

VCF files with phenotype and variant scores (per family)

Top de novo candidates for patient 2543

Patient: UDP2543
Exomiser: STIM1, CYP2D6, MUC5B
Phenotype only: ITGA7, PLEC, STIM1, PTGS1, TTN
PhenIX: STIM1, RB1, DLEC1, CHRNB4, MUC5B, REPIN1, NBPF8, GPRIN3, TMEFF1, FLT3LG, OSM, FZD10, MUC12

Gene: STIM1
Variant: chr11:g.4045175A>T [0/1]
MAF (ESP or 1000g): 0%
Consequence: p.I115T
Predicted pathogenicity: SIFT, PolyPhen, MutTaster (0-1): 1

UDP2543: phenotypic similarity

Columns: Patient | Stim1 het mouse | OMIM:612783 (IMMUNODEFICIENCY 10), hom STIM1 mutations | OMIM:160565 (MYOPATHY, TUBULAR AGGREGATE), het STIM1 mutations

Impaired platelet aggregation

abnormal platelet activation

Thrombocytopenia

Thrombocytopenia decreased platelet cell number

Thrombocytopenia

Myopathy Myopathy Myopathy

Generalized hypotonia

Muscular hypotonia Proximal muscle weakness

Petechiae increased bleeding time

Autoimmune hemolytic anemia

Delayed gross motor development

Epistaxis increased bleeding time

Gower sign

STIM1

Proposed workflow for undiagnosed diseases

What constitutes an adequate phenotype annotation for an undiagnosed patient?

Defining minimum phenotype standard:

1. Is the annotation specificity similar to or better than the corpus of available phenotype data?

2. Is the number of annotations/patient similar or better?

3. How does the ontology and annotation set differ across anatomical systems in terms of granularity? Does this change specificity requirements for phenotypic profiles?

4. How does use of NOT annotations help further specify the uniqueness of an undiagnosed patient?

5. How do onset, temporal ordering, and severity affect specificity?

UDP phenotype annotation metrics

UDP annotations have a similar information content (IC) and a larger average number of annotations per disease/patient
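Information content is conventionally computed as the negative log of how often a term is used in an annotation corpus, so rarer (more specific) terms score higher. The sketch below assumes that definition; the term names, counts, and corpus size are invented for illustration and are not UDP or HPO figures.

```python
import math


def information_content(term_counts, corpus_size):
    """IC(t) = -log2(annotations with t / total annotated items)."""
    return {
        term: -math.log2(count / corpus_size)
        for term, count in term_counts.items()
    }


def mean_ic(profile, ic):
    """Average IC over the terms in one patient or disease profile."""
    return sum(ic[term] for term in profile) / len(profile)


# Hypothetical corpus of 16 annotated diseases.
counts = {
    "general term": 8,    # used by half the corpus
    "specific term": 2,
    "very specific term": 1,
}
ic = information_content(counts, corpus_size=16)

print(ic["general term"])        # 1.0
print(ic["very specific term"])  # 4.0
print(mean_ic(["specific term", "very specific term"], ic))  # 3.5
```

Comparing the mean IC and the number of terms per profile against the same statistics for the reference corpus gives a rough check on whether a patient profile is specific enough.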

Anatomical annotation distribution in the corpus

The nervous, skeletal, and immune systems have the most annotations => these categories require greater specificity and numbers of annotations

Annotation specificity meter

What about common traits, like blue eyes or acne?

Making the patient phenotype profiles as good as can be

Total requests from UDP: 614

Category | Count | Example
Number of requests assigned to HPO terms | 423 | Chronic limb pain -> limb pain
Number of terms that need consideration by UDP | 145 | Expressive language -> delay? Increase? Abnormal?
Number of requests that belong in other parts of the patient record | 68 | Abnormal aCGH 12q21.1-12q.2 (662 kb duplication) paternal origin -> move to genotype information portion of the record

Contributing requests to the ontologies is a community effort, and quality profiling helps make our tools work better for everyone

Limitations and ongoing work

Adding negation to the algorithm

Temporal ordering of phenotypes

Leveraging severity, expressivity, and penetrance data

Additional tools leveraging structured phenotype data

The Monarch system

http://monarchinitiative.org

Monarch phenotype data

Species | Source | Unique genotypes/variants | Disease/phenotype associations
Mouse | MGI | 53,573 | 406,618
Zebrafish | ZFIN | 14,703 | 75,698
C. elegans | Wormbase | 116,106 | 411,154
Fruit fly | Flybase | 98,596 | 265,329
Human | OMIM | 26,372 | 27,798
Human | Orphanet | 2,872 | 5,095
Human | ClinVar | 62,437 | 178,424

Also in the system: Rat; IMPC; GO annotations; Coriell cell lines; OMIA; MPD; Yeast; CTD; GWAS; Panther, Homologene orthologs; BioGrid interactions; Drugbank; AutDB; Allen Brain …157 sources to date

Coming soon: Animal QTLs for pig, cattle, chicken, sheep, trout, dog, horse

ModelCompare: How do the models recapitulate the disease?

[Network diagram shown in two builds; labels listed once]

Late-onset Parkinson’s Phenotypes (subset): Bradykinesia, Depression, Dysphagia, Lewy bodies

Mouse Phenotypes

Network nodes: Slc6a3, Dbh, Slc18a2, Uchl1, Uchl3, Snca, Mfn2, Cox8a, Th, Cx IV, Tyrosine metabolism

Network phenotype distribution: Abnormal gait, ataxia, paralysis, Bradykinesia, Abnormal locomotion, Abnormality of central motor function

Phenotypes in common

Finding collaborators for functional validation

Patient phenotype profile

Phenotyping experts

Exome Walker: Network based exploration of phenotypically similar diseases

http://compbio.charite.de/ExomeWalker/

Walking the interactome for prioritization of candidate disease genes. Am J Hum Genet. 2008 Apr;82(4):949-58. doi: 10.1016/j.ajhg.2008.02.013.

Bare Lymphocyte Syndrome Type 1 Protein-Interaction Network

Exploits vicinity in the protein interaction network between phenotypically related diseases and uses this to rank exome candidates

Large boost in rankings of candidate genes using 250 disease gene-families

Prototype version online, manuscript in preparation
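Network vicinity of this kind is often scored with a random walk with restart, the approach described in the "Walking the interactome" paper cited above. The sketch below is a minimal pure-Python version on a made-up five-protein chain; the network, seed choice, and restart probability are assumptions, and this is not the ExomeWalker implementation.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, iterations=100):
    """Score each node by how often a walker restarting at the seeds visits it."""
    nodes = sorted(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(start)
    for _ in range(iterations):
        # With probability `restart` the walker jumps back to a seed node.
        updated = {n: restart * start[n] for n in nodes}
        # Otherwise it moves to a uniformly chosen neighbour.
        for n in nodes:
            neighbours = adjacency[n]
            if not neighbours:
                continue
            share = (1.0 - restart) * scores[n] / len(neighbours)
            for m in neighbours:
                updated[m] += share
        scores = updated
    return scores


# Hypothetical interaction chain: P1 - P2 - P3 - P4 - P5.
network = {
    "P1": ["P2"],
    "P2": ["P1", "P3"],
    "P3": ["P2", "P4"],
    "P4": ["P3", "P5"],
    "P5": ["P4"],
}

# P1 stands in for a gene already linked to a phenotypically similar disease.
scores = random_walk_with_restart(network, seeds={"P1"})
candidates = ["P5", "P3"]
print(sorted(candidates, key=scores.get, reverse=True))  # ['P3', 'P5']
```

The candidate closer to the known disease gene in the network receives the higher score, which is the sense in which vicinity is used to rank exome candidates.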

PhenoViz: Integrate all human, mouse, and fish data to understand CNVs

Desktop application for differential diagnostics in CNVs

Explain manifestations of CNV diseases based on genes contained in CNV

E.g., supravalvular aortic stenosis in Williams syndrome can be explained by haploinsufficiency for elastin

Double the number of explanations using model data

Doelken, Köhler, et al. (2013) Dis Model Mech 6:358-72

Conclusions

Cross-species phenotype data can be used to compute semantic similarity

Structured phenotype data for rare and undiagnosed disease patients can aid candidate evaluation

We are experimenting with these methods for UDP patient phenotypes to aid candidate prioritization, identify models, explore mechanisms, and find collaborators

Acknowledgments

NIH-UDP: William Bone, Murat Sincan, David Adams, Amanda Links, David Draper, Neal Boerkoel, Cyndi Tifft, Bill Gahl

OHSU: Nicole Vasilesky, Matt Brush

Lawrence Berkeley: Nicole Washington, Suzanna Lewis, Chris Mungall

UCSD: Amarnath Gupta, Jeff Grethe, Anita Bandrowski, Maryann Martone

U of Pitt: Chuck Boromeo, Jeremy Espino, Harry Hochheiser

Sanger: Anika Oehlrich, Jules Jacobson, Damian Smedley

Toronto: Marta Girdea, Sergiu Dumitriu, Mike Brudno

JAX: Cynthia Smith

Charité: Sebastian Kohler, Sandra Doelken, Sebastian Bauer, Peter Robinson

Funding: NIH Office of Director: 1R24OD011883; NIH-UDP: HHSN268201300036C
