phenotype terminologies in use for genotype-phenotype databases: a common core for standardisation...
DESCRIPTION
The community needs terminology standards in order to achieve interoperability between databases that are intended for clinical research and include descriptions of phenotypes. This is crucial for interpreting genomic rearrangements as well as future high-throughput sequence data. The aim of our work was to promote a core terminology of phenotypes interoperable with all the terminologies in use. Relevant terminologies used by different communities to describe phenomes were cross-referenced: PhenoDB (2846 terms), London Dysmorphology Database (LDDB; 1318 terms), Orphanet (1243 terms), Human Phenotype Ontology (9895 terms, 22/08/2012), Elements of Morphology (AJMG; 423 terms), ICD10 (1230 terms), as well as medical terminologies in use: UMLS (7,957,179 distinct concept terms), SNOMED CT (>311,000 concepts), MeSH (26,853 concepts) and MedDRA (69,389 concepts). We established a strategy to compare them and find commonalities and differences, using ONAGUI as a tool to pick up exact matches. The non-exact matches were verified manually by an expert. A core terminology of 2,300 terms was derived and analysed by a panel of experts (International Consortium for Human Phenotype Terminologies – ICHPT). The resulting consensual terminology will be freely available on a dedicated website (www.ichpt.org), and mappings with other terminologies will be provided in order to ease interoperability between databases without disturbing the habits of the different groups of users.
TRANSCRIPT
www.orpha.net
Phenotype terminologies in use for genotype-phenotype databases:
A common core for standardisation and interoperability
HVP5 – UNESCO, Paris – 22 May 2014
Aymé S., Rath A., Chanas L., Hamosh A., Robinson P.N. & The International Consortium for Human Phenotype Terminologies
Do you mean?
elementsofmorphology.nih.gov
Long narrow head
dolichocephaly
scaphocephaly
Different resources, different terminologies
(e)HR:
• SNOMED CT
• Others?
• Free text
Mutation/patient registries, databases:
• HPO
• LDDB
• PhenoDB
• Elements of Morphology
• Others? Free text?
Tools for diagnosis:
• HPO
• LDDB
• Orphanet
Levels of granularity
Disorders
• Purpose: coding diagnoses (e.g. medical records)
Clinical manifestations
• Purpose: describing patients, genotype-phenotype correlations, … (e.g. assistance-to-diagnosis tools, research databases)
Specialised terms
• Fit the particular needs of a disease-focused database/registry (e.g. Phe values in PKU and related disorders)
For phenotype annotations, interoperability between terminologies is needed at the clinical manifestations level.
Interoperability based on mappings
Syntactic:
• The terms are identical
• Can be done by machines
Semantic:
• The concepts are identical
• Should be done by humans
Structural:
• The comprehension of the concepts is identical
• Impossible to maintain
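The syntactic level above is the one that can be automated. A minimal sketch in Python, assuming each terminology is available as a mapping from identifier to label; the identifiers, the `normalise` helper and the normalisation rules are illustrative assumptions, not part of the project's tooling:

```python
import re

def normalise(label: str) -> str:
    """Lower-case the label and replace runs of punctuation/whitespace with one space."""
    cleaned = re.sub(r"[^a-z0-9]+", " ", label.lower())
    return cleaned.strip()

def syntactic_matches(terms_a: dict, terms_b: dict) -> list:
    """Return (id_a, id_b) pairs whose normalised labels are identical."""
    index_b = {}
    for term_id, label in terms_b.items():
        index_b.setdefault(normalise(label), []).append(term_id)
    pairs = []
    for term_id, label in terms_a.items():
        for other_id in index_b.get(normalise(label), []):
            pairs.append((term_id, other_id))
    return pairs

# Hypothetical identifiers, for illustration only.
terminology_a = {"A:1": "Dolichocephaly", "A:2": "Long, narrow head"}
terminology_b = {"B:7": "dolichocephaly", "B:8": "Scaphocephaly"}

print(syntactic_matches(terminology_a, terminology_b))
```

Only the identically spelled labels are paired here. Deciding whether "Long, narrow head" denotes the same concept as one of the other labels is a semantic question, which is why that level is left to human experts.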
Phenotype terminology project
Aims:
• Map commonly used clinical terminologies (Orphanet, LDDB, HPO, Elements of Morphology, PhenoDB, UMLS, SNOMED CT, MeSH, MedDRA): automatic mapping, expert validation, detection and correction of inconsistencies
• Find common terms in the terminologies
• Produce a core terminology: a common denominator allowing phenotypic data to be shared/exchanged between databases, mapped to every single terminology
Overview of project progress
Sept 2012: start of mappings (Orphanet)
EUGT2 – EUCERD workshop (Paris, September 2012)
• Constitution of the International Consortium for Human Phenotype Terminologies
ICHPT workshop (ASHG, Boston, October 2013)
• Selection of 2,300 core terms
PhenoDB
HPO
Orphanet
LDDB
Elements of Morphology
POSSUM
SNOMED CT (IHTSDO)
DECIPHER
IRDiRC
Step 1: mapping terminologies
Orphanet: 1357 terms (Orphanet database, version 2008)
LDDB: 1348 dysmorphological terms (Installation CD)
Elements of Morphology: 423 terms (retrieved manually from
publication AJMG, January 2009)
HPO: 9895 terms (downloaded from BioPortal, OBO format, 30/08/12)
PhenoDB: 2846 terms (provided in OBO format, 02/05/2012)
UMLS: version 2012AA (integrating MeSH, MedDRA, SNOMED CT)
Tools
OnaGUI (INSERM U729): ontology alignment tool
• Works with files in OWL format
• I-Sub algorithm: detects syntactic similarity
• Graphical interface to check automatic mappings and manually add new ones
MetaMap (National Library of Medicine): a tool to map biomedical text to the UMLS Metathesaurus
Perl scripts: format conversion, launching MetaMap, comparison of results…
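To illustrate how a syntactic-similarity step can propose candidate mappings for later expert review, here is a minimal Python sketch. It does not implement I-Sub and does not call OnaGUI: it swaps in the standard library's `difflib.SequenceMatcher` ratio as a stand-in measure, and the `threshold` value and the example labels are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

def similarity(label_a: str, label_b: str) -> float:
    """Case-insensitive similarity in [0, 1] (stand-in measure, not I-Sub)."""
    return SequenceMatcher(None, label_a.lower(), label_b.lower()).ratio()

def candidate_mappings(labels_a, labels_b, threshold=0.85):
    """List label pairs scoring at or above the threshold, best first."""
    candidates = []
    for label_a in labels_a:
        for label_b in labels_b:
            score = similarity(label_a, label_b)
            if score >= threshold:
                candidates.append((label_a, label_b, round(score, 2)))
    return sorted(candidates, key=lambda item: -item[2])

# Hypothetical labels, for illustration only.
for pair in candidate_mappings(["Macrocephaly"], ["Macrocephalus", "Short neck"]):
    print(pair)
```

A purely string-based score also rates near-homographs such as "Macrocephaly" and "Microcephaly" as highly similar, so candidates produced this way still need the manual check provided by the graphical interface.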
Comparison of mappings and deduction
Perl script to compare all the mappings and infer mappings of non-Orphanet terminologies
E.g.: Orphanet ID XX mapped to YY in HPO and ZZ in LDDB -> deduction: YY and ZZ should probably map
Retrieve HPO mappings versus UMLS, MeSH
First figures:
            LDDB        El. Morpho  PhenoDB     HPO         UMLS…
Orphanet    E: 1062     E: 416      E: 978      E: 2228     E: 6948
LDDB                    D: 275      D: 533      D: 1123     D: 2678
El. Morpho                          D: 177      D: 716      D: 409
PhenoDB                                         D: 1045     D: 3268
HPO                                                         D: 6307+4800
UMLS…
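The deduction rule (Orphanet ID XX mapped to YY in HPO and ZZ in LDDB, so YY and ZZ should probably map) can be sketched as follows. The original work used Perl scripts that are not shown; this Python version is an illustrative reconstruction under the assumption that validated Orphanet mappings are available as (Orphanet ID, terminology, term ID) triples. The function name and data layout are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def deduce_mappings(orphanet_mappings):
    """Infer candidate mappings between non-Orphanet terminologies.

    orphanet_mappings: iterable of (orphanet_id, terminology, term_id).
    Returns a sorted list of ((terminology, term_id), (terminology, term_id))
    pairs that share the same Orphanet term as pivot.
    """
    by_pivot = defaultdict(set)
    for orphanet_id, terminology, term_id in orphanet_mappings:
        by_pivot[orphanet_id].add((terminology, term_id))

    candidates = set()
    for targets in by_pivot.values():
        for left, right in combinations(sorted(targets), 2):
            if left[0] != right[0]:  # only pair terms from different terminologies
                candidates.add((left, right))
    return sorted(candidates)

# Placeholder identifiers taken from the example on the slide.
mappings = [
    ("XX", "HPO", "YY"),
    ("XX", "LDDB", "ZZ"),
    ("WW", "HPO", "VV"),
]
print(deduce_mappings(mappings))
# [(('HPO', 'YY'), ('LDDB', 'ZZ'))]
```

The output is a list of candidates only: as the slide says, the deduced terms "should probably map", so each pair still goes to an expert for validation.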
Mapping of non-Orphanet terminologies
Automatic and inferred mappings were checked by experts
• Using OnaGUI for all, except UMLS
Automatic I-Sub: 7.0 + deduction
• MetaMap + deduction + HPO mappings
Figures:
            El. Morpho          PhenoDB             HPO                 UMLS…
LDDB        D: 257+23 added     D: 528, 92%E        D: 1105, 87%E       D: 2654, 83%E
                                A: 674, 38%E        A: 2084, 23%E       A: 11731
El. Morpho                      D: 174, 50%E        D: 393, 93%E        D: 405, 84%E
                                A: 189, 74%E        A: 436, 16%E        A: 1248
PhenoDB                                             D: 1018, 91%E       D: 3222, 82%E
                                                    A: 4168, 6%E        A: 18776
HPO                                                                     D: 7389
                                                                        A: 65535
UMLS…
First list of common terms
Present in at least 3 terminologies
Definition of rules for nomenclature
Addition of terms present in each terminology as synonyms
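The selection rule above (present in at least 3 terminologies, with each terminology's own term kept as a synonym) can be sketched in Python. The concept identifiers, the terminology names T1 to T4 and the assignment of labels to them are hypothetical; the nomenclature rules for choosing a preferred term are not specified here, so the sketch only collects synonyms.

```python
def first_common_list(concepts, min_terminologies=3):
    """Keep concepts mapped in at least `min_terminologies` terminologies.

    concepts: {concept_id: {terminology_name: label}}.
    Returns {concept_id: sorted list of distinct labels, kept as synonyms}.
    """
    common = {}
    for concept_id, labels_by_terminology in concepts.items():
        if len(labels_by_terminology) >= min_terminologies:
            common[concept_id] = sorted(set(labels_by_terminology.values()))
    return common

# Hypothetical mapped concepts, for illustration only.
concepts = {
    "c1": {"T1": "dolichocephaly", "T2": "long narrow head", "T3": "scaphocephaly"},
    "c2": {"T1": "term found twice", "T4": "term found twice"},
}
print(first_common_list(concepts))
```

Here "c1" is retained with its three labels as synonyms, while "c2" is dropped because it appears in only two terminologies.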
Next steps
Cleaning-up: around 2,300 terms as a result
Re-do mappings
• In order to provide exact matches
Revision process by the group
Addition of definitions
• Elements of Morphology
• HPO
• New definitions
Release on a dedicated website, hosted by
• Visualisation
• Download
Talking to each other
Long narrow head
dolichocephaly
scaphocephaly
Thank you for your attention