Global Phenotypic Data Sharing Standards to Maximize Diagnostic Discovery

Melissa Haendel, PhD and Sebastian Köhler, PhD

RD-Action workshop, April 26th and 27th, Brussels


Page 1: Global phenotypic data sharing standards to maximize diagnostic discovery

Global Phenotypic Data Sharing Standards to Maximize Diagnostic Discovery

Melissa Haendel, PhD and Sebastian Köhler, PhD

RD-Action workshop, April 26th and 27th, Brussels

Page 2: Global phenotypic data sharing standards to maximize diagnostic discovery

Talk outline

About HPO

Semantic similarity

Leveraging basic research data

Exome analysis and disease discovery

HPO-based tools

Phenotype data standards for exchange

Page 3: Global phenotypic data sharing standards to maximize diagnostic discovery

What do we mean by phenotype?

= Phenotypic abnormality = clinical feature

A constellation/pattern of clinical features defines a disease:

[Disease X]... is a rare developmental disorder defined by the combination of aplasia cutis congenita of the scalp vertex and terminal transverse limb defects. In addition, vascular anomalies such as cutis marmorata telangiectatica ... are recurrently seen.

(Yes, this is a simplification)

Page 4: Global phenotypic data sharing standards to maximize diagnostic discovery

Starting point: OMIM

Clinical Synopsis (CS) section

Free-text phenotypic description

Very expressive

Online Mendelian Inheritance in Man database

Page 5: Global phenotypic data sharing standards to maximize diagnostic discovery

(Un)Controlled Vocabularies

Not designed to be easily machine interpretable

Spelling problems, acronyms, etc.

Homonyms:

... fibrillation ...

fibrillation (= ventricular fibrillation) ≠ fibrillation (= muscle fibrillation)

Page 6: Global phenotypic data sharing standards to maximize diagnostic discovery

Why you should care

OMIM Query              Number of Results
large bones             264
large bone              785
enlarged bones          87
enlarged bone           156
big bones               16
huge bones              4
massive bones           28
hyperplastic bones      12
hyperplastic bone       40
bone hyperplasia        134
increased bone growth   612

Page 7: Global phenotypic data sharing standards to maximize diagnostic discovery

Motivation

HPO started in 2008

Goal: computer-interpretable clinical features!

Reliable information extraction from databases based on clinical features

Compute similarity between diseases based on clinical features

Compute similarity between patients based on clinical features

Compute similarity between patients and diseases based on clinical features

Interoperability with basic research to improve diagnostic discovery

Easy to use

Freely available

Page 8: Global phenotypic data sharing standards to maximize diagnostic discovery

The Human Phenotype Ontology

(HPO)

Description of phenotypic abnormalities (or clinical features) in humans

[Figure: excerpt of the HPO hierarchy. Nodes: phenotypic abnormality; abnormality of the nervous system; abnormality of the central nervous system; cerebral inclusion bodies; neurofibrillary tangles; abnormality of movement; incoordination; ataxia; gait disturbance; gait ataxia. One node carries the callout "This is a term". Clinical Synopsis (CS) examples shown: CS of OMIM:0815 and CS of OMIM:1234, with the entries "Neurofibrillary tangles may be present" and "Paired helical filaments"]

Page 9: Global phenotypic data sharing standards to maximize diagnostic discovery

The Human Phenotype Ontology (HPO)

Synonyms merged into one term

Textual definitions for each term

id: HP:0002185
name: Neurofibrillary tangles
def: Pathological protein aggregates formed by hyperphosphorylation of a microtubule-associated protein known as tau, causing it to aggregate in an insoluble form. [HPO:sdoelken]
synonym: Neurofibrillary tangles may be present EXACT []
synonym: Paired helical filaments EXACT []

[Figure: excerpt of the HPO hierarchy. Nodes: phenotypic abnormality; abnormality of the nervous system; abnormality of the central nervous system; cerebral inclusion bodies; neurofibrillary tangles; abnormality of movement; incoordination; ataxia; gait disturbance; gait ataxia]
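The idea of merging synonyms into one term can be sketched in a few lines of Python. This is a minimal illustration, not HPO tooling: it assumes the OBO flat-file convention of quoted synonym strings, and `parse_stanza` and `build_synonym_index` are hypothetical helper names introduced here.

```python
import re

# An OBO-style stanza; the term ID, name and synonyms are taken from the slide.
# Quoting the synonym strings follows the OBO flat-file convention (assumed).
STANZA = '''\
[Term]
id: HP:0002185
name: Neurofibrillary tangles
synonym: "Neurofibrillary tangles may be present" EXACT []
synonym: "Paired helical filaments" EXACT []
'''

SYNONYM_RE = re.compile(r'^synonym:\s+"(?P<text>[^"]+)"\s+(?P<scope>\w+)')


def parse_stanza(text):
    """Parse one [Term] stanza into a dict with id, name and synonyms."""
    term = {"id": None, "name": None, "synonyms": []}
    for line in text.splitlines():
        if line.startswith("id: "):
            term["id"] = line[len("id: "):].strip()
        elif line.startswith("name: "):
            term["name"] = line[len("name: "):].strip()
        else:
            match = SYNONYM_RE.match(line)
            if match:
                term["synonyms"].append(match.group("text"))
    return term


def build_synonym_index(terms):
    """Map every lower-cased name or synonym to its term ID."""
    index = {}
    for term in terms:
        for label in [term["name"], *term["synonyms"]]:
            index[label.lower()] = term["id"]
    return index


index = build_synonym_index([parse_stanza(STANZA)])
```

With such an index, the free-text entries "Neurofibrillary tangles may be present" and "Paired helical filaments" both resolve to the single identifier HP:0002185, which is what makes the descriptions comparable by machine.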

Page 10: Global phenotypic data sharing standards to maximize diagnostic discovery

The Human Phenotype Ontology

(HPO)

Semantic relations (‘subclass of’, ‘is a’)

From top to bottom, terms get more specific

[Figure: excerpt of the HPO hierarchy in which every edge is labeled "is a". Nodes: phenotypic abnormality; abnormality of the nervous system; abnormality of the central nervous system; cerebral inclusion bodies; neurofibrillary tangles; abnormality of movement; incoordination; ataxia; gait disturbance; gait ataxia]
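Because "is a" is transitive, a patient annotated with a specific term is implicitly annotated with every more general term above it. The sketch below walks the hierarchy upward. The node labels come from the slide, but the exact edges are an assumption made for illustration, since the transcript loses the arrows of the figure.

```python
# Child -> parents "is a" edges. Node labels are from the slide; the edges
# themselves are assumed for illustration and may differ from the real HPO.
IS_A = {
    "gait ataxia": ["ataxia", "gait disturbance"],
    "ataxia": ["incoordination"],
    "incoordination": ["abnormality of movement"],
    "gait disturbance": ["abnormality of movement"],
    "abnormality of movement": ["abnormality of the nervous system"],
    "neurofibrillary tangles": ["cerebral inclusion bodies"],
    "cerebral inclusion bodies": ["abnormality of the central nervous system"],
    "abnormality of the central nervous system": ["abnormality of the nervous system"],
    "abnormality of the nervous system": ["phenotypic abnormality"],
    "phenotypic abnormality": [],
}


def ancestors(term, is_a=IS_A):
    """Return every term reachable from `term` by following "is a" edges."""
    seen = set()
    stack = list(is_a[term])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a[parent])
    return seen


gait_ataxia_ancestors = ancestors("gait ataxia")
```

A term with two parents, such as "gait ataxia" in this toy graph, inherits from both branches, which is why the structure is a directed acyclic graph rather than a simple tree.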

Page 11: Global phenotypic data sharing standards to maximize diagnostic discovery

Computable phenotype definitions of disease

HPO Terms are used to annotate (describe) diseases

E.g. neurofibrillary tangles is used to annotate Alzheimer Disease:

Orphanet + Monarch:

~124,000 annotations of 7,700 rare diseases from OMIM, Orphanet, DECIPHER

~133,000 annotations of 3,145 common diseases

Köhler et al. https://doi.org/10.1093/nar/gkw1039

OMIM:0815, OMIM:1234

Neurofibrillary tangles may be present

Paired helical filaments

Page 12: Global phenotypic data sharing standards to maximize diagnostic discovery

Why HPO? Existing clinical vocabularies don’t adequately cover phenotypic descriptions

Winnenburg and Bodenreider, 2014

[Bar chart: percent coverage (0 to 100) for HPO, UMLS, SNOMED CT, CHV, MedDRA, MeSH, NCIT, ICD10, OMIM]

LDDB (✓)

Orphanet (✓) (Use HPO directly)

MedDRA (✓)

UMLS (completely incorporated)

Page 13: Global phenotypic data sharing standards to maximize diagnostic discovery

Community contribution

Multiple HPO-specific workshops

Constant discussions via Tracker-System and E-Mail

We try our best to acknowledge contributors:

+ microattributions

Page 14: Global phenotypic data sharing standards to maximize diagnostic discovery

Contributing to and extending HPO

Page 15: Global phenotypic data sharing standards to maximize diagnostic discovery

HPO language translations

We need your help! http://bit.ly/hpo-translations

Translation of labels, synonyms, and text definitions

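[Figure: translation progress by language. Languages shown: Italian, Spanish, Russian, French, German, English layperson, Japanese, Chinese. Values shown, in extraction order: 100%, 11%, 12%, 100%, 19%, 19%, near 100%, 20%]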

Page 16: Global phenotypic data sharing standards to maximize diagnostic discovery

Adoption of HPO

Public-facing databases using HPO to annotate patients

Tools ingesting HPO-annotated data:

Köhler et al. https://doi.org/10.1093/nar/gkw1039

Page 17: Global phenotypic data sharing standards to maximize diagnostic discovery

Why HPO is a successful standard

One language shared by “all”

Synonyms “map” to one concept (HPO term)

Contains terms that no other ontology has

Comes with disease annotations! (Not just “Yet another clinical terminology”)

Simple, qualitative phenotyping, deviation (abnormal, abnormal increase, abnormal decrease, ...) to ease analysis

Documented, traceable editing

Open science community project with diverse contributors

Constantly improved and extended, examples:

Layperson version for patients

Language translations

Opposite-relations between terms

Page 18: Global phenotypic data sharing standards to maximize diagnostic discovery

Talk outline

About HPO

Semantic similarity

Leveraging basic research data

Exome analysis and disease discovery

HPO-based tools

Phenotype data standards for exchange

Page 19: Global phenotypic data sharing standards to maximize diagnostic discovery

A disease can be described algorithmically as a collection of phenotypes

Patient: Splenomegaly, Nasal speech

Disease X: Increased spleen size, Nasal voice

Differential diagnosis with matching phenotype concepts is already good

Splenomegaly and Increased spleen size are synonyms in HPO, i.e. map to the same term

Nasal speech and Nasal voice are synonyms in HPO, i.e. map to the same term

Page 20: Global phenotypic data sharing standards to maximize diagnostic discovery

A disease can be described algorithmically as a collection of phenotypes

Patient: Splenomegaly, Oral motor hypotonia

Disease X: Ruptured spleen, Decreased muscle mass

Differential diagnosis with similar but non-matching phenotypes is difficult

Page 21: Global phenotypic data sharing standards to maximize diagnostic discovery

Similarity between two terms

[Figure: pairwise term comparisons. Oral motor hypotonia vs. Muscular hypotonia of the trunk, joined at Abnormal muscle tone. Oral motor hypotonia vs. Abnormality of calvarial morphology, joined at Phenotypic abnormality. Legend: high scoring match, medium scoring match, very low scoring match]

Score: Measured by Information Content
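One widely used way to score two terms by information content is Resnik similarity: take the most informative ancestor the two terms share, where the information content of a term is the negative log of how often it (or a descendant) is used in annotations. The sketch below is an assumption about the kind of measure meant on the slide, not a statement of the exact formula used by any HPO tool. The hierarchy is reduced to the terms named on the slide, and the annotation counts are invented for illustration.

```python
import math

# Toy "is a" hierarchy (child -> parents). Labels are from the slide; the real
# HPO has more intermediate levels.
PARENTS = {
    "Phenotypic abnormality": [],
    "Abnormal muscle tone": ["Phenotypic abnormality"],
    "Oral motor hypotonia": ["Abnormal muscle tone"],
    "Muscular hypotonia of the trunk": ["Abnormal muscle tone"],
    "Abnormality of calvarial morphology": ["Phenotypic abnormality"],
}

# Hypothetical counts: diseases annotated with the term or any descendant.
ANNOTATED = {
    "Phenotypic abnormality": 1000,
    "Abnormal muscle tone": 100,
    "Oral motor hypotonia": 5,
    "Muscular hypotonia of the trunk": 20,
    "Abnormality of calvarial morphology": 50,
}
TOTAL_DISEASES = 1000


def ancestors_and_self(term):
    """Return the term together with all of its ancestors."""
    result = {term}
    for parent in PARENTS[term]:
        result |= ancestors_and_self(parent)
    return result


def information_content(term):
    """IC = -log(p), written as log(1/p): rare terms are more informative."""
    return math.log(TOTAL_DISEASES / ANNOTATED[term])


def resnik(term_a, term_b):
    """IC of the most informative ancestor shared by both terms."""
    common = ancestors_and_self(term_a) & ancestors_and_self(term_b)
    return max(information_content(t) for t in common)


high = resnik("Oral motor hypotonia", "Oral motor hypotonia")
medium = resnik("Oral motor hypotonia", "Muscular hypotonia of the trunk")
very_low = resnik("Oral motor hypotonia", "Abnormality of calvarial morphology")
```

In this toy setup an exact match scores highest, two kinds of hypotonia score in the middle because they meet at "Abnormal muscle tone", and two terms that only share the root score zero, since the root carries no information.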

Page 22: Global phenotypic data sharing standards to maximize diagnostic discovery

Comparing phenotype profiles

E.g. Patient-to-Disease comparison

[Figure: the patient's profile compared term by term with Disease A and with Disease B. Legend: high scoring match, medium scoring match, very low scoring match]

Patient’s phenotypes are more similar to Disease A

Orphamizer would rank Disease A before Disease B

Score: Measured by Information Content
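Term-level scores can be aggregated into a profile-level score, for example by averaging each patient term's best match in the disease profile and then sorting diseases by that average. The sketch below illustrates this "best match average" idea. It is not a description of Orphamizer's actual algorithm, and the similarity values and disease profiles are made up for the example.

```python
# Hypothetical term-to-term similarity scores (symmetric); in practice these
# would come from an information-content measure over the ontology.
SIMILARITY = {
    frozenset({"Splenomegaly"}): 4.0,  # exact match of a term with itself
    frozenset({"Oral motor hypotonia", "Muscular hypotonia of the trunk"}): 2.3,
    frozenset({"Splenomegaly", "Ruptured spleen"}): 1.5,
}


def term_similarity(term_a, term_b):
    """Look up a pairwise score; unknown pairs count as unrelated (0.0)."""
    return SIMILARITY.get(frozenset((term_a, term_b)), 0.0)


def best_match_average(query, target):
    """Average, over the query terms, of the best score found in the target."""
    best = [max(term_similarity(q, t) for t in target) for q in query]
    return sum(best) / len(best)


def rank(patient, diseases):
    """Return (disease, score) pairs, most similar first."""
    scored = [(name, best_match_average(patient, terms))
              for name, terms in diseases.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


patient = ["Splenomegaly", "Oral motor hypotonia"]
diseases = {
    "Disease B": ["Ruptured spleen", "Abnormality of calvarial morphology"],
    "Disease A": ["Splenomegaly", "Muscular hypotonia of the trunk"],
}
ranking = rank(patient, diseases)
```

Here Disease A ends up ahead of Disease B because both patient terms find a good partner in its profile, even though only one of them is an exact match.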

Page 23: Global phenotypic data sharing standards to maximize diagnostic discovery

Orphamizer

Page 24: Global phenotypic data sharing standards to maximize diagnostic discovery

Talk outline

About HPO

Semantic similarity

Leveraging basic research data

Exome analysis and disease discovery

HPO-based visualization tools

Phenotype data standards for exchange

Page 25: Global phenotypic data sharing standards to maximize diagnostic discovery

The genome is sequenced, but we still don’t know very much about what it does

OMIM: 3,398 Mendelian diseases with no known genetic basis

ClinVar: at least 120,000* variants with no known pathogenicity

*This is more than twice what it was in 2016!

Page 26: Global phenotypic data sharing standards to maximize diagnostic discovery

Adding other species’ data helps fill knowledge gaps in the human genome

Page 27: Global phenotypic data sharing standards to maximize diagnostic discovery

More species = more coverage

19,008 human protein-coding genes in ExAC DB, as per Lek et al. Nature 2016

9,739 of 19,008 genes (51%)

14,779 of 19,008 genes (78%)

Combined: 2,195 + 7,544 + 7,235 = 16,974 of 19,008 genes (union of coverage in any species) = 89%

Even inclusion of just four species boosts phenotypic coverage of genes by 38% (51% to 89%)

Mungall et al Nucleic Acids Research bit.ly/monarch-nar-2016
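The figures on this slide are consistent with a two-set Venn diagram, which can be checked with a few lines of arithmetic. The reading below is an inference from the numbers, not something the transcript states: 2,195 genes with phenotype data in human only, 7,544 in both, and 7,235 in other species only.

```python
TOTAL_GENES = 19_008   # human protein-coding genes (from the slide)

# Assumed reading of the three Venn segments (inferred from the sums).
HUMAN_ONLY = 2_195
BOTH = 7_544
OTHER_SPECIES_ONLY = 7_235

human = HUMAN_ONLY + BOTH                        # genes covered by human data
other = BOTH + OTHER_SPECIES_ONLY                # genes covered by other species
union = HUMAN_ONLY + BOTH + OTHER_SPECIES_ONLY   # covered in any species


def percent(count):
    """Share of all protein-coding genes, rounded to a whole percent."""
    return round(100 * count / TOTAL_GENES)


gain = percent(union) - percent(human)   # boost in percentage points
```

Under this reading the sums reproduce every number on the slide: 9,739 (51%), 14,779 (78%), 16,974 (89%), and a gain of 38 percentage points.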

Page 28: Global phenotypic data sharing standards to maximize diagnostic discovery

Ulcerated paws

Palmoplantar hyperkeratosis

Thick hand skin

Image credits:

"HandsEBS" by James Heilman, MD - Own work. Licensed under CC BY-SA 3.0 via Commons –https://commons.wikimedia.org/wiki/File:HandsEBS.JPG#/media/File:HandsEBS.JPG

http://www.guinealynx.info/pododermatitis.html

Page 29: Global phenotypic data sharing standards to maximize diagnostic discovery

Challenge: Each database uses their own vocabulary/ontology

Vocabularies/ontologies: MP, HP

Databases: MGI, HPOA

Page 30: Global phenotypic data sharing standards to maximize diagnostic discovery

Challenge: Each database uses their own phenotype vocabulary/ontology

Vocabularies/ontologies: ZFA, MP, DPO, WPO, HP, OMIA, VT, FYPO, APO, SNOMED, NCIT, …

Databases: WB, PB, FB, OMIA, MGI, RGD, ZFIN, SGD, HPOA, EHR, IMPC, OMIM, QTLdb

Page 31: Global phenotypic data sharing standards to maximize diagnostic discovery

Can we help machines understand phenotype terms?

“Palmoplantar hyperkeratosis”

Human phenotype

I have absolutely no idea what that means

Page 32: Global phenotypic data sharing standards to maximize diagnostic discovery

Decomposition of complex concepts using species-neutral terms

Mungall, C. J., Gkoutos, G., Smith, C., Haendel, M., Lewis, S., & Ashburner, M. (2010). Integrating phenotype ontologies across multiple species. Genome Biology, 11(1), R2. doi:10.1186/gb-2010-11-1-r2

“Palmoplantar hyperkeratosis” (human phenotype) = increased (PATO) + keratinization (GO) + stratum corneum layer of skin (Uberon) + autopod (Uberon)

Species-neutral ontologies, homologous concepts
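A decomposed phenotype can be held in a small data structure, which makes the cross-species comparison concrete: two phenotypes from different species are related when they share species-neutral building blocks. The sketch below is only an illustration. The human decomposition follows the slide, the model-organism phenotype is hypothetical, and the overlap measure (Jaccard) is a stand-in for the ontology reasoning real tools use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalDefinition:
    """A phenotype broken into species-neutral parts."""
    quality: str          # PATO, e.g. "increased"
    process: str          # GO, e.g. "keratinization"
    anatomy: frozenset    # Uberon structures involved


# Decomposition of "Palmoplantar hyperkeratosis" as shown on the slide.
HUMAN_PHENOTYPE = LogicalDefinition(
    quality="increased",
    process="keratinization",
    anatomy=frozenset({"stratum corneum layer of skin", "autopod"}),
)

# Hypothetical model-organism phenotype, invented for this example.
MODEL_PHENOTYPE = LogicalDefinition(
    quality="increased",
    process="keratinization",
    anatomy=frozenset({"autopod"}),
)


def components(definition):
    """Flatten a definition into a set of (ontology, class label) pairs."""
    parts = {("PATO", definition.quality), ("GO", definition.process)}
    parts |= {("Uberon", structure) for structure in definition.anatomy}
    return parts


def overlap(def_a, def_b):
    """Jaccard overlap of the species-neutral components (0.0 to 1.0)."""
    a, b = components(def_a), components(def_b)
    return len(a & b) / len(a | b)


score = overlap(HUMAN_PHENOTYPE, MODEL_PHENOTYPE)
```

Neither phenotype label mentions the other species' vocabulary, yet the shared PATO, GO and Uberon classes make the relationship computable.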

Page 33: Global phenotypic data sharing standards to maximize diagnostic discovery

How can anatomy be “species-neutral”?

Page 34: Global phenotypic data sharing standards to maximize diagnostic discovery

HPO Interoperability and annotations

[Figure: HPO terms linked to classes from other ontologies. HPO terms: Hyposmia; Abnormality of globe location; Abnormal eye morphology; Deeply set eyes; Motor neuron atrophy. Linked classes: sensory perception of smell; eyeball of camera-type eye; motor neuron (CL)]

34,571 annotations in 22 species

157,534 phenotype annotations

2,150 phenotype annotations

11,813 phenotype terms

127,125 rare disease - phenotype annotations

136,268 common disease - phenotype annotations

http://bit.ly/hpo-paper

Page 35: Global phenotypic data sharing standards to maximize diagnostic discovery

Which phenotypic profile is most similar?

Model X

Patient

Disease Y

Page 36: Global phenotypic data sharing standards to maximize diagnostic discovery

Model X

Patient

Disease Y

www.owlsim.org

Fuzzy-phenotype matching

Page 37: Global phenotypic data sharing standards to maximize diagnostic discovery

But what about the diseases? How to choose which ones? What is their provenance?

Page 38: Global phenotypic data sharing standards to maximize diagnostic discovery

A dynamic nosology

Challenge: can we rapidly synergize multiple knowledge sources into a dynamic ontology?

classic clinical phenotype-oriented disease classification and molecular sources

Knowledge-based approaches

Logical Definition OWL Ontology Merging

Bayesian OWL Ontology Merging

Data driven

Phenotype and functional ontology networks

Mungall, C. J., et al. (2016). k-BOOM. bioRxiv, 048843. doi:10.1101/048843

Page 39: Global phenotypic data sharing standards to maximize diagnostic discovery

4 disease resources plus mappings: Hemolytic anemia

[Figure legend: DOID (blue); OMIM (brown); MESH (grey); ORDO/Orphanet (yellow); SubClassOf (solid line); Xref (dashed grey line)]

Page 40: Global phenotypic data sharing standards to maximize diagnostic discovery

Coherent disease classification =>

Orphanet

https://github.com/monarch-initiative/monarch-disease-ontology

Columns: “Ontology”, Classes (before, after merge), SubClass axioms, Xrefs

Inputs:

DOID 6878 6012 7082 36656

MESH (D) 11314 4152 19036

OMIM (D) 7783 7783 0 31242

Orphanet (D) 8740 4683 15182 20326

OMIA 4833 4833 3120 355

DC 209 208 310 316

Medic 0 8630 3435

Output:

Merged 39757 27617 44837

Page 41: Global phenotypic data sharing standards to maximize diagnostic discovery

Talk outline

About HPO

Semantic similarity

Leveraging basic research data

Exome analysis and disease discovery

HPO-based tools

Phenotype data standards for exchange

Page 42: Global phenotypic data sharing standards to maximize diagnostic discovery

Prevailing clinical genomic pipelines leverage only a tiny fraction of the available data

[Figure: data sources feeding POSSIBLE DISEASES and then DIAGNOSIS & TREATMENT. Boxes: PATIENT EXOME/GENOME; PATIENT CLINICAL PHENOTYPES; PUBLIC GENOMIC DATA; PUBLIC CLINICAL PHENOTYPE, DISEASE DATA; PATIENT ENVIRONMENT; PUBLIC ENVIRONMENT, DISEASE DATA; PATIENT OMICS PHENOTYPES; PUBLIC OMICS PHENOTYPES, CORRELATIONS. Legend: under-utilized data]

Page 43: Global phenotypic data sharing standards to maximize diagnostic discovery

Phenotypic profile matching

Page 44: Global phenotypic data sharing standards to maximize diagnostic discovery

Combining G2P data for variant prioritization

Whole exome

Remove off-target and common variants

Variant score from allele freq and pathogenicity

Phenotype score from phenotypic similarity

PHIVE score to give final candidates

Mendelian filters
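The steps listed on this slide can be sketched as a small scoring pipeline. This is a simplified illustration under stated assumptions, not Exomiser's implementation: the frequency cutoff, the variant score formula and the plain average used to combine scores are all invented here, the gene names and values are made up, and the Mendelian inheritance filters are left out.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    allele_freq: float     # population allele frequency, 0..1
    pathogenicity: float   # predicted pathogenicity, 0..1
    on_target: bool = True


MAX_FREQ = 0.01  # hypothetical cutoff for "common" variants


def variant_score(variant):
    """Assumed formula: pathogenicity, discounted as frequency nears the cutoff."""
    return variant.pathogenicity * (1.0 - variant.allele_freq / MAX_FREQ)


def prioritize(variants, phenotype_scores):
    """Filter, score and rank genes; phenotype_scores maps gene -> 0..1."""
    best = {}
    for v in variants:
        # Step 1: remove off-target and common variants.
        if not v.on_target or v.allele_freq > MAX_FREQ:
            continue
        # Steps 2-4: combine variant and phenotype evidence (simple average).
        combined = (variant_score(v) + phenotype_scores.get(v.gene, 0.0)) / 2
        best[v.gene] = max(combined, best.get(v.gene, 0.0))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


variants = [
    Variant("GENE_A", allele_freq=0.0001, pathogenicity=0.90),
    Variant("GENE_B", allele_freq=0.0, pathogenicity=0.95),
    Variant("GENE_C", allele_freq=0.2, pathogenicity=1.0),    # too common
    Variant("GENE_D", allele_freq=0.0, pathogenicity=1.0, on_target=False),
]
# Hypothetical phenotypic similarity between the patient and each gene's
# known human or model-organism phenotypes.
phenotype_scores = {"GENE_A": 0.9, "GENE_B": 0.2}

ranking = prioritize(variants, phenotype_scores)
```

On variant evidence alone GENE_B would come first. Adding the phenotype score moves GENE_A to the top, which is the effect the phenotype-aware approach is after.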

Page 45: Global phenotypic data sharing standards to maximize diagnostic discovery

Exomiser results for UDP diagnosed patients

Inclusion of phenotype data improves variant prioritization

In 60% of first 1000 genomes at GEL, Exomiser predicts top candidate

In 86% of cases, Exomiser predicts within top 5

Page 46: Global phenotypic data sharing standards to maximize diagnostic discovery

Example case solved by Exomiser

[Figure: comparison with rows "Phenotypic profile" and "Genes". Gene entries shown: heterozygous missense mutation in STIM-1; N/A; Stim1Sax/Sax]

Ranked STIM-1 variant maximally pathogenic based on cross-species G2P data, in the absence of traditional data sources

http://bit.ly/exomiser

Page 47: Global phenotypic data sharing standards to maximize diagnostic discovery

Deep phenotyping and “fuzzy” matching algorithms improve diagnostics

4.9% exomes with dual molecular diagnoses, differentiated with deep phenotyping

Page 48: Global phenotypic data sharing standards to maximize diagnostic discovery

Talk outline

About HPO

Semantic similarity

Leveraging basic research data

Exome analysis and disease discovery

HPO-based tools

Phenotype data standards for exchange

Page 49: Global phenotypic data sharing standards to maximize diagnostic discovery

How much phenotyping is enough?

Enlarged ears (2)

Dark hair (6)

Female (4)

Male (4)

Blue skin (1)

Pointy ears (1)

Hair absent on head (1)

Horns present (1)

Hair present on head (7)

Enlarged lip (2)

Increased skin pigmentation (3)

bit.ly/annotationsufficiency
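The counts on this slide show why some features are worth more than others: a feature carried by one individual narrows the search far more than one carried by almost everyone. A small worked example, assuming the counts refer to a group of 8 individuals (inferred from Female (4) + Male (4); the slide does not state the total):

```python
import math

# Feature counts as listed on the slide.
COUNTS = {
    "Enlarged ears": 2,
    "Dark hair": 6,
    "Female": 4,
    "Male": 4,
    "Blue skin": 1,
    "Pointy ears": 1,
    "Hair absent on head": 1,
    "Horns present": 1,
    "Hair present on head": 7,
    "Enlarged lip": 2,
    "Increased skin pigmentation": 3,
}
GROUP_SIZE = 8  # assumption: Female (4) + Male (4)


def information_content(feature):
    """Bits of information gained by observing the feature."""
    return math.log2(GROUP_SIZE / COUNTS[feature])


# Most informative features first.
ranked = sorted(COUNTS, key=information_content, reverse=True)
```

Under that assumption "Blue skin" is worth 3 bits and identifies one individual outright, "Female" is worth 1 bit, and "Hair present on head" is worth about 0.19 bits, so recording it adds very little.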

Page 50: Global phenotypic data sharing standards to maximize diagnostic discovery

Phenotype matching visualization widget


bit.ly/monarch-nar-2016

Page 51: Global phenotypic data sharing standards to maximize diagnostic discovery

Matchmaker Exchange for patients, diseases, and model organisms to aid diagnosis and mechanistic discovery

www.monarchinitiative.org

http://bit.ly/Monarch-MME

Goal: Get clinical sites & public databases to provide standardized phenotype data

Page 52: Global phenotypic data sharing standards to maximize diagnostic discovery

Talk outline

About HPO

Semantic similarity

Leveraging basic research data

Exome analysis and disease discovery

HPO-based tools

Phenotype data standards for exchange

Page 53: Global phenotypic data sharing standards to maximize diagnostic discovery

Genes + Environment = Phenotypes

Biology central dogma

Standards for exchanging data must be up to these challenges.

Page 54: Global phenotypic data sharing standards to maximize diagnostic discovery

Genes + Environment = Phenotypes

Computable encodings are essential

Base pairs

Variant notation (e.g. HGVS)

SNOMED-CT

Medical procedure coding

Environment Ontology

@ontowonka

Page 55: Global phenotypic data sharing standards to maximize diagnostic discovery

Genes, Environment, Phenotypes

Exchange formats shown: VCF, GFF, BED, PXF

Standard exchange mechanisms exist for genes … but for phenotypes? Environment?

Page 56: Global phenotypic data sharing standards to maximize diagnostic discovery

Introducing PhenoPackets

A packet of phenotype data to be used anywhere, written by anyone

http://phenopackets.org

Page 57: Global phenotypic data sharing standards to maximize diagnostic discovery

What does a phenopacket look like?

Alacrima Sleep Apnea Microcephaly

phenotype_profile:
  - entity: "patient16"
    phenotype:
      types:
        - id: "HP:0000522"
          label: "Alacrima"
      onset:
        description: "at birth"
        types:
          - id: "HP:0003577"
            label: "Congenital onset"
    evidence:
      - types:
          - id: "ECO:0000033"
            label: "Traceable Author Statement"
        source:
          - id: "PMID:"

Clinical labs Public databases Journals
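A record like the one on this slide can be assembled and sanity-checked programmatically. The sketch below is an illustration only: `ontology_class` and `make_phenopacket` are hypothetical helpers, not part of any Phenopackets library, the nesting of the fields is an assumption based on the slide, and the output is written as JSON because YAML 1.2 is designed as a superset of JSON and the Python standard library has no YAML writer.

```python
import json
import re

# A compact identifier such as "HP:0000522": prefix, colon, local ID.
CURIE_RE = re.compile(r"^[A-Za-z][A-Za-z0-9_]*:\S+$")


def ontology_class(curie, label):
    """Build an {id, label} pair, rejecting IDs that are not prefix:local."""
    if not CURIE_RE.match(curie):
        raise ValueError(f"not a compact ontology identifier: {curie!r}")
    return {"id": curie, "label": label}


def make_phenopacket(entity, phenotype, onset, onset_description,
                     evidence, source_id):
    """Assemble one phenotype_profile entry in the layout shown on the slide.

    `phenotype`, `onset` and `evidence` are (id, label) tuples; `source_id`
    is supplied by the caller (for example a PMID or a URL).
    """
    return {
        "phenotype_profile": [{
            "entity": entity,
            "phenotype": {
                "types": [ontology_class(*phenotype)],
                "onset": {
                    "description": onset_description,
                    "types": [ontology_class(*onset)],
                },
            },
            "evidence": [{
                "types": [ontology_class(*evidence)],
                "source": [{"id": source_id}],
            }],
        }]
    }


packet = make_phenopacket(
    entity="patient16",
    phenotype=("HP:0000522", "Alacrima"),
    onset=("HP:0003577", "Congenital onset"),
    onset_description="at birth",
    evidence=("ECO:0000033", "Traceable Author Statement"),
    source_id="PMID:",  # left incomplete, exactly as on the slide
)
serialized = json.dumps(packet, indent=2)
```

Checking identifiers at construction time catches a common mistake, which is putting a free-text label where an ontology ID belongs.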

Page 58: Global phenotypic data sharing standards to maximize diagnostic discovery

What about patients? Can they phenotype themselves?

Page 59: Global phenotypic data sharing standards to maximize diagnostic discovery
Page 60: Global phenotypic data sharing standards to maximize diagnostic discovery

HPO for Patients

http://bit.ly/hpo-biocuration

6,200 plain language terms for patients, families, and non-experts

New software application being developed for patients

Page 61: Global phenotypic data sharing standards to maximize diagnostic discovery

Layperson HPO + Phenopackets

Dry eyes Stops breathing during sleep Small head

phenotype_profile:
  - entity: "Grace"
    phenotype:
      types:
        - id: "HP:0000522"
          label: "Alacrima"
      onset:
        description: "at birth"
        types:
          - id: "HP:0003577"
            label: "Congenital onset"
    evidence:
      - types:
          - id: "ECO:0000033"
            label: "Traceable Author Statement"
        source:
          - id: "https://twitter.com/examplepatient/status/123456789"

• Patient registries

• Social media

Page 62: Global phenotypic data sharing standards to maximize diagnostic discovery

Journals are now requiring HPO terms

Robinson, P. N., Mungall, C. J., & Haendel, M. (2015). Capturing phenotypes for precision medicine. Molecular Case Studies, 1(1), a000372. doi:10.1101/mcs.a000372

Each phenopacket can be shared via DOI in any repository outside paywall (e.g. Figshare, Zenodo, etc.)

Each article can be associated with a phenopacket

Page 63: Global phenotypic data sharing standards to maximize diagnostic discovery

Community “curate-athons” for HPO

Cardiovascular curate-athon at Stanford. About 20 cardiologists (surgeons, pediatric, etc.), four ontologists, and three clinical curators met for two days.

Abnormal Complex

Voltage to be added to all waves: increased, decreased, fluctuating (alternans)

Duration to be added to all waves: increased, decreased

P wave: notching, axis

QRS: fractionation, axis (right/left/extreme)

Q wave

R wave

S wave

R’ wave

S’ wave (abnormal only)

J wave (can be normal variant)

Epsilon wave (abnormal only)

Osborne wave (abnormal only)

Terminal slur wave (can be normal variant)

Delta wave (abnormal only)

Added 100s of clinically relevant cardiophysiology phenotypes to HPO, new exome analysis possible

Page 64: Global phenotypic data sharing standards to maximize diagnostic discovery

Summary

The Human Phenotype Ontology is a robust standard describing phenotypic abnormalities FOR the community, FROM the community for deep phenotyping rare disease patients

Model organism data can fill gaps in our knowledge and aid mechanistic exploration of disease candidates

Tools that leverage the Human Phenotype Ontology can be used to prioritize coding and noncoding variants for WES and WGS and CNVs

Patients can provide self-phenotyping information as partners in the deep phenotyping process

Phenopackets is a FAIR-based GA4GH exchange standard for facilitating distributed phenotype data sharing for clinics, labs, patients, and journals

Page 65: Global phenotypic data sharing standards to maximize diagnostic discovery

Acknowledgements

Orphanet: Ana Rath, Annie Olry, Marc Hanauer, Halima Lourghi

Lawrence Berkeley: Chris Mungall, Suzanna Lewis, Jeremy Nguyen, Seth Carbon

RENCI: Jim Balhoff

OHSU: Matt Brush, Kent Shefchek, Julie McMurry, Tom Conlin, Nicole Vasilevsky, Dan Keith

Genomics England/Queen Mary: Damian Smedley, Jules Jacobson

Jackson Laboratory: Peter Robinson, Leigh Carmody

Garvan: Tudor Groza, Craig McNamara

Hipbi / NeuroCure: Dominik Seelow, Markus Schülke-Gerstenfeld

Charite: Dominik Seelow, Tomasz Zemojtel

With special thanks to Julie McMurry for excellent graphic design

Page 66: Global phenotypic data sharing standards to maximize diagnostic discovery

www.monarchinitiative.org

Funding: NIH Office of Director: 2R24OD011883; NHGRI UDP: HHSN268201300036C, HSN268201400093P; NCATS: UDN U01TR001395, Biomedical Data Translator: 1OT3TR002019; E-RARE 2015: Hipbi-RD 01GM1608