ga4gh phenotype ontologies task team update
Melissa Haendel, @ontowonka, @monarchinit
Phenotype Ontologies (and a bit of G2P) task team
Genes Environment Phenotypes
Determinants of Health are Diverse
Physical environment, Chemical exposures, Treatments, Smoking, Alcohol
Education, Health services, Income, Social status, Stress, Employment, Working conditions, Microbiome, Pathogens
Clinical observations, Laboratory tests, Patient reported outcomes, Child development, Biometrics, Behaviors
Sleep, Exercise, Screen time, Diet
Genomic endowment, Epigenetics, Gene expression, Gene regulation
Ontologies can help make (some of) this computable
Genes + Environment = Phenotypes
But it's not just the types of things
… the relationships and their evidence must also be captured
G-P or D (disease)
• causes
• contributes to
• is risk factor for
• protects against
• correlates with
• is marker for
• modulates
• involved in
• increases susceptibility to
G-G (kind of)
• regulates
• negatively regulates (inhibits)
• positively regulates (activates)
• directly regulates
• interacts with
• co-localizes with
• co-expressed with
P/D - P/D
• part of
• results in
• co-occurs with
• correlates with
• hallmark of (P->D)
E-P
• contributes to (E->P)
• influences (E->P)
• exacerbates (E->P)
• manifest in (P->E)
G-E (kind of)
• expressed in
• expressed during
• contains
• inactivated by
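One way to make these relationship types computable is to store each assertion as a typed record and check the relation against the pairing it is listed under. The sketch below is only illustrative: the `Association` class, the `ALLOWED` table, the single-letter kind codes, and the placeholder entity names are assumptions, and only a few of the relations above are included.

```python
from dataclasses import dataclass

# Relation labels copied from the lists above (subset only).
# Keys are (subject kind, object kind); the letter codes are shorthand.
ALLOWED = {
    ("G", "P"): {"causes", "contributes to", "is risk factor for", "protects against"},
    ("G", "G"): {"regulates", "interacts with", "co-expressed with"},
    ("E", "P"): {"contributes to", "influences", "exacerbates"},
    ("G", "E"): {"expressed in", "expressed during", "inactivated by"},
}


@dataclass(frozen=True)
class Association:
    subject: str          # placeholder identifier, not a real record
    subject_kind: str     # "G", "E" or "P"
    relation: str
    obj: str
    obj_kind: str
    evidence: tuple = ()  # references to supporting evidence, if any


def is_valid(assoc: Association) -> bool:
    """True when the relation is listed for this subject/object pairing."""
    allowed = ALLOWED.get((assoc.subject_kind, assoc.obj_kind), set())
    return assoc.relation in allowed


ok = Association("gene:X", "G", "causes", "phenotype:Y", "P")
bad = Association("exposure:Z", "E", "regulates", "phenotype:Y", "P")
print(is_valid(ok), is_valid(bad))
```

Keeping the evidence on the record itself reflects the point that a relationship without its evidence is only half captured.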
The Human Phenotype Ontology
11,813 phenotype terms
127,125 rare disease-phenotype annotations
136,268 common disease-phenotype annotations
bit.ly/hpo-paper
Peter Robinson, Sebastian Koehler, Chris Mungall
Other clinical vocabularies don’t adequately
cover phenotypic descriptions
Winnenburg and Bodenreider, 2014
[Figure: bar chart. Y axis: percent coverage (0 to 100). X axis: HPO, UMLS, SNOMED CT, CHV, MedDRA, MeSH, NCIT, ICD10, OMIM]
=> HPO is now in the UMLS
Precision fuzzy phenotype matching
DOI: 10.1126/scitranslmed.3009262
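As a rough illustration of what "fuzzy" matching means here: two profiles can match partially when their terms share ancestors in the ontology, even if no term is identical. The sketch below uses a plain Jaccard score over ancestor-closed term sets, which is a simplification chosen for brevity and not the method of the cited paper. The `PARENTS` hierarchy is a made-up toy, not real HPO structure.

```python
# Toy hierarchy (hypothetical): term -> list of parent terms.
PARENTS = {
    "enlarged ears": ["abnormal ear"],
    "pointy ears": ["abnormal ear"],
    "abnormal ear": ["abnormal head"],
    "enlarged lip": ["abnormal head"],
    "abnormal head": [],
}


def ancestors(term, parents):
    """Return the term together with all of its ancestors."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def closure(profile, parents):
    """Expand a profile (iterable of terms) with every ancestor."""
    expanded = set()
    for term in profile:
        expanded |= ancestors(term, parents)
    return expanded


def jaccard(profile_a, profile_b, parents=PARENTS):
    """Shared terms divided by all terms, after ancestor expansion."""
    a, b = closure(profile_a, parents), closure(profile_b, parents)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


print(jaccard({"enlarged ears"}, {"pointy ears"}))   # siblings under "abnormal ear"
print(jaccard({"enlarged ears"}, {"enlarged lip"}))  # related only via "abnormal head"
```

In this toy, the two ear phenotypes score higher than ear versus lip because they share a more specific ancestor.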
How much phenotyping is enough?
Enlarged ears (2), Dark hair (6), Female (4), Male (4)
Blue skin (1), Pointy ears (1)
Hair absent on head (1), Horns present (1)
Hair present on head (7)
Enlarged lip (2)
Increased skin pigmentation (3)
bit.ly/annotationsufficiency
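The counts in parentheses suggest why some phenotypes say more than others: a feature seen in one individual narrows things down far more than one seen in nearly all. A minimal sketch of that idea, using information content (negative log2 of frequency), follows. The cohort size of 8 is an assumption inferred from Female (4) plus Male (4), and this is not the annotation sufficiency score itself, just the intuition behind it.

```python
import math

COHORT_SIZE = 8  # assumption: Female (4) + Male (4)

# Counts as listed on the slide.
COUNTS = {
    "Blue skin": 1,
    "Pointy ears": 1,
    "Horns present": 1,
    "Hair absent on head": 1,
    "Enlarged ears": 2,
    "Enlarged lip": 2,
    "Increased skin pigmentation": 3,
    "Female": 4,
    "Male": 4,
    "Dark hair": 6,
    "Hair present on head": 7,
}


def information_content(count, total=COHORT_SIZE):
    """Rarer phenotypes get a higher score: -log2(count / total)."""
    return -math.log2(count / total)


def profile_specificity(profile):
    """Sum of information content over the phenotypes in a profile."""
    return sum(information_content(COUNTS[term]) for term in profile)


rare = ["Blue skin", "Horns present"]
common = ["Hair present on head", "Dark hair"]
print(profile_specificity(rare), profile_specificity(common))
```

Under these assumptions, two rare features carry more discriminating power than two common ones, which is one way to frame "how much phenotyping is enough".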
Matchmaker Exchange for patients, diseases, and model
organisms
Computational matching of rare disease patients across clinical & public sources
Find models and experts for functional validation
bit.ly/mme-matchbox
patientarchive.org
bit.ly/exomiser-2017
Plain language synonyms for computable
phenotypes
Layperson-HPO driven phenotyping tool
https://www.pcori.org/research-results/2017/realization-standard-care-rare-diseases-using-patient-engaged-phenotyping
Catherine Brownstein, Ingrid Holm
NCI Thesaurus is the de facto cancer
vocabulary standard
Required for drug trials by FDA, but not interoperable with other vocabulary standards
[Diagram: resources linked to the NCIt]
Sequence Ontology
Uberon Anatomy Ontology
Genotype Ontology
MONDO Disease Ontology
Human Phenotype Ontology
NCBI Gene
Reactome
NCBITaxon
Protein Ontology
ChEBI chemical entities ontology
UNII chemical substance registry
Cell Ontology
Ontology of Biomedical Investigations
Gene Ontology (GO-BP)
Gene Ontology (GO-CC)
UniProt
Tailoring the NCIt for computational
interoperability
https://github.com/NCI-Thesaurus
ICD-O and Oncotree slims available too: https://github.com/NCI-Thesaurus/thesaurus-obo-edition/wiki/Downloads
Lobular Breast Carcinoma = 'Breast Adenocarcinoma'
  and (Disease_Has_Normal_Tissue_Origin some 'Terminal Ductal Lobular Unit')
  and (Disease_Has_Normal_Cell_Origin some 'Terminal Ductal Lobular Unit Cell')
  and (Disease_Has_Abnormal_Cell some 'Lobular Carcinoma Cell')
  and (Disease_May_Have_Cytogenetic_Abnormality some 'Loss of Chromosome 16q')
  and (Disease_Excludes_Abnormal_Cell some 'Ductal Carcinoma Cell')
  and (Disease_Excludes_Finding some 'Mixed Cellular Population')
  and (Disease_Mapped_To_Gene some 'CDH1 Gene')
  and (Disease_May_Have_Molecular_Abnormality some 'Loss of E-cadherin Expression')
  and (Disease_May_Have_Molecular_Abnormality some 'CDH1 Gene Inactivation')
Jim Balhoff, Sherri DeCorronado, Gilberto Fragoso, Nicole Vasilevsky, Paula Carrio Caro, Matt Brush, Chris Mungall
Variant Pathogenicity Interpretations
Pathogenic?
Benign?
DSC2:c.631-2A>G
Right Ventricular Cardiomyopathy
Complications to variant interpretation:
Pathogenicity evidence is complex, diverse, indirect, conflicting
Siloed curation guidelines
High stakes (Applied directly to care)
Improving Rigor and Consistency of
Variant Interpretation
2015 ACMG-AMP Variant Interpretation Guidelines
• 28 ‘criteria’ re: evidence types, strength
• Framework for combining criteria outcomes
ClinGen Variant Curation Interface (VCI) and DMWG
• Data model and curation for variant evidence and provenance
SEPIO: Scientific Evidence and Provenance Information Ontology
• Computable model for representation and analysis of evidence and provenance
Merged Disease Classification
• Harmonized disease classification for algorithmic use and pathogenicity assignment
SEPIO: Scientific Evidence and Provenance Information Ontology
Matt Brush, Selina Dwight, Larry Babb, Chris Bizon, Bradford Powell, Tristan Nelson, Bob Freimuth, Chris Mungall
[Diagram: evidence items :e1, :e2, :e3, :e4, :e5 linked to two competing claims]
Evidence types: co-localization evidence, functional complementation evidence, microscopy evidence, imaging evidence, co-immunoprecipitation evidence
Claims: :claim1 “pathogenic”, :claim2 “benign”
Algorithms can leverage semantics of SEPIO models to compute quantitative metrics of evidence quality, quantity, diversity, and concordance, supporting automated evaluation of claims.
Evidence-Based Computational
Evaluation of Claims
https://github.com/monarch-initiative/SEPIO-ontology/wiki
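To make the idea of evidence metrics concrete, the sketch below computes quantity, diversity, and concordance for each claim from a flat list of evidence items. It is a toy under stated assumptions: the `Evidence` tuple, the `summarize` function, and the assignment of evidence types and claims to :e1 through :e5 are invented for illustration and do not reflect the SEPIO model's actual structure or the original diagram's links.

```python
from collections import namedtuple

# Hypothetical record: which claim an evidence item supports, and its type.
Evidence = namedtuple("Evidence", ["id", "type", "supports"])

# The pairing of items, types and claims below is made up for the example.
ITEMS = [
    Evidence("e1", "co-localization evidence", "claim1"),
    Evidence("e2", "microscopy evidence", "claim1"),
    Evidence("e3", "imaging evidence", "claim1"),
    Evidence("e4", "functional complementation evidence", "claim2"),
    Evidence("e5", "co-immunoprecipitation evidence", "claim1"),
]


def summarize(claim, items=ITEMS):
    """Simple per-claim metrics over a list of evidence items."""
    supporting = [e for e in items if e.supports == claim]
    return {
        # quantity: how many items support the claim
        "quantity": len(supporting),
        # diversity: how many distinct evidence types are represented
        "diversity": len({e.type for e in supporting}),
        # concordance: share of all evidence that points to this claim
        "concordance": len(supporting) / len(items) if items else 0.0,
    }


for claim in ("claim1", "claim2"):
    print(claim, summarize(claim))
```

A real evaluation would also weigh evidence quality and strength, which this sketch leaves out.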
Disease 1 Disease 2
Data Standards Ontologies Data Standards Ontologies Data Standards Ontologies
Genes Environment Phenotypes
How do all these ontologies fit into our
notion of disease?
FHIR
METADATA, EVIDENCE
Defining disease and clinical pathogenicity:
A lumping and splitting problem
source IDs
split/merge
manage resolution & provenance
MONDO: Unified Disease Ontology
SEPIO: Scientific Evidence and Provenance Information
One disease or two? What does the evidence favor?
One disease or two? How do we manage identifiers, hierarchy?
Legend: OMIM (brown), MESH (grey), ORDO/Orphanet (yellow), SubClassOf (solid line), Xref (dashed grey line)
Hemolytic anemia mappings across resources
Each nosology is different, and they map to each other inconsistently, which leads to poor interoperability and computability
New integrated nosology
http://bit.ly/Monarch-Disease
http://purl.obolibrary.org/obo/mondo/pre/mondo.owl
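A first approximation of the merge step can be sketched as grouping identifiers that are connected by cross-references. The example below uses a small union-find for that. It is deliberately naive: it treats every xref as an equivalence, which is exactly the assumption that inconsistent mappings break, so a real integration would need curation and evidence on top. The function name and all identifiers are placeholders, not real OMIM, MESH or Orphanet records.

```python
def merge_by_xref(xrefs):
    """Group identifiers connected by xref pairs (transitively)."""
    parent = {}

    def find(x):
        # Register unseen identifiers, then walk up to the group root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for left, right in xrefs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    groups = {}
    for identifier in list(parent):
        groups.setdefault(find(identifier), set()).add(identifier)
    return sorted(sorted(group) for group in groups.values())


# Placeholder identifiers for illustration only.
XREFS = [
    ("OMIM:A", "MESH:B"),
    ("MESH:B", "ORDO:C"),
    ("OMIM:D", "ORDO:E"),
]
print(merge_by_xref(XREFS))
```

Each resulting group is a candidate merged disease; deciding whether a group is really one disease or two is the lumping and splitting problem described earlier.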
Genes Environment Phenotypes
Standard exchange formats exist for genes (VCF, GFF, BED) …
but for phenotypes? Environment? (PXF)
http://phenopackets.org
New Funding: Forums for Phenomics!
What does a phenopacket look like?
Alacrima
Sleep Apnea
Microcephaly
phenotype_profile:
  - entity: "patient16"
    phenotype:
      types:
        - id: "HP:0000522"
          label: "Alacrima"
      onset:
        description: "at birth"
        types:
          - id: "HP:0003577"
            label: "Congenital onset"
    evidence:
      - types:
          - id: "ECO:0000033"
            label: "Traceable Author Statement"
        source:
          - id: "PMID:"
Clinical labs
Public databases
Journals
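Because a phenopacket is structured data, simple checks can be automated. The sketch below holds the example above as a Python dictionary and walks it to find `id` values that are not complete CURIEs (prefix, colon, local part). The dictionary mirrors the slide's fields; the helper names and the regular expression are assumptions, not part of any phenopackets tooling.

```python
import re

# prefix, colon, non-empty local part (an assumed, simplified CURIE pattern)
CURIE = re.compile(r"^[A-Za-z]+:\S+$")

PACKET = {
    "phenotype_profile": [
        {
            "entity": "patient16",
            "phenotype": {
                "types": [{"id": "HP:0000522", "label": "Alacrima"}],
                "onset": {
                    "description": "at birth",
                    "types": [{"id": "HP:0003577", "label": "Congenital onset"}],
                },
            },
            "evidence": [
                {
                    "types": [{"id": "ECO:0000033", "label": "Traceable Author Statement"}],
                    # Left incomplete, exactly as on the slide.
                    "source": [{"id": "PMID:"}],
                }
            ],
        }
    ]
}


def iter_ids(node):
    """Yield every string stored under an "id" key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "id" and isinstance(value, str):
                yield value
            else:
                yield from iter_ids(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_ids(item)


def incomplete_ids(packet):
    """Return the id values that do not look like complete CURIEs."""
    return [value for value in iter_ids(packet) if not CURIE.match(value)]


print(incomplete_ids(PACKET))
```

Here the only flagged value is the source identifier, which the slide leaves without a number.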
Layperson HPO + Phenopackets
Dry eyes
Stops breathing during sleep
Small head
phenotype_profile:
  - entity: "Grace"
    phenotype:
      types:
        - id: "HP:0000522"
          label: "Alacrima"
      onset:
        description: "at birth"
        types:
          - id: "HP:0003577"
            label: "Congenital onset"
    evidence:
      - types:
          - id: "ECO:0000033"
            label: "Traceable Author Statement"
        source:
          - id: "https://twitter.com/examplepatient/status/123456789"
• Patient registries
• Social media
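The layperson layer can be pictured as a synonym table sitting in front of the clinical terms. The sketch below maps the three plain-language phrases from this slide to the clinical labels shown on the earlier phenopacket slide, on the assumption that the two lists correspond in order. The table and function names are hypothetical, and a real tool would draw synonyms from the ontology rather than a hand-written dictionary.

```python
# Plain-language phrase -> clinical label, paired in slide order (assumed).
LAY_TO_CLINICAL = {
    "dry eyes": "Alacrima",
    "stops breathing during sleep": "Sleep Apnea",
    "small head": "Microcephaly",
}


def to_clinical(phrase):
    """Look up a lay phrase, ignoring case and extra whitespace."""
    normalized = " ".join(phrase.lower().split())
    return LAY_TO_CLINICAL.get(normalized)


for text in ("Dry eyes", "  Small   head ", "itchy nose"):
    print(repr(text), "->", to_clinical(text))
```

Unknown phrases return `None`, so they can be routed to a curator instead of being guessed.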
What’s next? Challenges for this workstream
• Figure out how ontologies, metadata, eHealth and exchange standards all fit together in this workstream
• Further harmonize existing disease and phenotype ontologies and standards
• Define exchange of structured phenotype data in different contexts: clinical, basic research, patients, databases, journals
• Get structured G2P data (data about the biology of the patient) into and out of the EHR
• Demonstrate standardization success across the driver projects
Discuss!