www.orpha.net
Phenotype terminologies in use for genotype-phenotype databases: A common core for standardisation and interoperability
HVP5 – UNESCO, Paris – 22 May 2014
Aymé S., Rath A., Chanas L., Hamosh A., Robinson P.N. & The International Consortium for Human Phenotype Terminologies

Upload: human-variome-project

Posted on 10-May-2015

DESCRIPTION

The community needs terminology standards in order to achieve interoperability between databases that are intended for clinical research and include descriptions of phenotypes. This is crucial for interpreting genomic rearrangements as well as future high-throughput sequencing data. The aim of our work was to promote a core terminology of phenotypes that is interoperable with all the terminologies in use. Relevant terminologies used by different communities to describe phenomes were cross-referenced: PhenoDB (2846 terms), London Dysmorphology Database (LDDB; 1318 terms), Orphanet (1243 terms), Human Phenotype Ontology (9895 terms, 22/08/2012), Elements of Morphology (AJMG; 423 terms), ICD-10 (1230 terms), as well as general medical terminologies: UMLS (7,957,179 distinct concept terms), SNOMED CT (>311,000 concepts), MeSH (26,853 concepts) and MedDRA (69,389 concepts). We established a strategy to compare them and find commonalities and differences, using OnaGUI as a tool to pick up exact matches. The non-exact matches were verified manually by an expert. A core terminology of 2,300 terms was derived and analysed by a panel of experts (International Consortium for Human Phenotype Terminologies, ICHPT). The resulting consensus terminology will be freely available on a dedicated website (www.ichpt.org), together with mappings to other terminologies, to ease interoperability between databases without disrupting the habits of the different groups of users.

TRANSCRIPT

Page 1: Phenotype terminologies in use for genotype-phenotype databases: a common core for standardisation and interoperability - Ana Rath

Page 2

Do you mean?

elementsofmorphology.nih.gov

Long narrow head
dolichocephaly
scaphocephaly

Page 3

Different resources, different terminologies

(Electronic) health records, (e)HR: SNOMED CT
Others?
Free text

Mutation/patient registries, databases:
HPO
LDDB
PhenoDB
Elements of Morphology
Others? Free text?

Tools for diagnosis:
HPO
LDDB
Orphanet

Page 4

Levels of granularity

Disorders
• Purpose: coding diagnoses (e.g. in medical records)

Clinical manifestations
• Purpose: describing patients, genotype-phenotype correlations, … (e.g. diagnostic-assistance tools, research databases)

Specialised terms
• Fit the particular needs of a disease-focused database/registry (e.g. Phe values in PKU and related disorders)

For phenotype annotations, interoperability between terminologies is needed at the clinical manifestations level.

Page 5

Interoperability based on mappings

Syntactic: the terms are identical
• Can be done by machines

Semantic: the concepts are identical
• Should be done by humans

Structural: the comprehension of the concepts is identical
• Impossible to maintain
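The split above (machines for syntactic matches, humans for semantic ones) can be illustrated with a minimal Python sketch. It assumes each terminology is available as a dictionary from term ID to label; the helper names `normalise` and `syntactic_map` and all IDs are hypothetical. Labels count as equivalent only when they are identical after case, accent and punctuation normalisation; every other term is queued for expert review rather than guessed.

```python
from __future__ import annotations

import re
import unicodedata


def normalise(label: str) -> str:
    """Return a case-, accent- and punctuation-insensitive form of a label."""
    text = unicodedata.normalize("NFKD", label)
    # Drop combining marks left by NFKD (e.g. the accent in "é").
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Collapse every run of non-alphanumeric characters into one space.
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()


def syntactic_map(source: dict[str, str], target: dict[str, str]):
    """Split source terms into machine-made matches and a review queue.

    `source` and `target` map term IDs to labels (hypothetical format).
    Only identical normalised labels count as a match; everything else
    is left for a human expert to judge.
    """
    index: dict[str, list[str]] = {}
    for term_id, label in target.items():
        index.setdefault(normalise(label), []).append(term_id)

    matches: dict[str, list[str]] = {}
    for_review: list[str] = []
    for term_id, label in source.items():
        hits = index.get(normalise(label))
        if hits:
            matches[term_id] = hits
        else:
            for_review.append(term_id)
    return matches, for_review
```

For example, with source `{"A:1": "Dolichocephaly", "A:2": "Long narrow head"}` and target `{"B:10": "dolichocephaly", "B:11": "Scaphocephaly"}`, `syntactic_map` pairs `A:1` with `B:10` and leaves `A:2` for review, even though a clinician may well judge it to be the same concept.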

Page 6

Phenotype terminology project

Aims:
• Map commonly used clinical terminologies (Orphanet, LDDB, HPO, Elements of Morphology, PhenoDB, UMLS, SNOMED CT, MeSH, MedDRA): automatic mapping, expert validation, detection and correction of inconsistencies
• Find common terms in the terminologies
• Produce a core terminology: a common denominator allowing phenotypic data to be shared and exchanged between databases, mapped to every single terminology
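The idea of a core terminology mapped to every single terminology can be sketched as a pivot: a term from one terminology is looked up in the core, then re-expressed in another. This Python sketch assumes the core is a dictionary from a core concept ID to `{terminology: term ID}`; `EXAMPLE_CORE`, `build_index`, `translate` and all IDs are hypothetical placeholders, not ICHPT identifiers. Terms without an equivalent are reported rather than silently dropped.

```python
from __future__ import annotations

# Hypothetical core terminology: core concept ID -> {terminology: term ID}.
EXAMPLE_CORE: dict[str, dict[str, str]] = {
    "core-1": {"HPO": "hpo-a", "LDDB": "lddb-a", "PhenoDB": "pdb-a"},
    "core-2": {"HPO": "hpo-b"},
}


def build_index(core: dict[str, dict[str, str]]) -> dict[tuple[str, str], str]:
    """Invert the core so that (terminology, term ID) leads to a core ID."""
    index: dict[tuple[str, str], str] = {}
    for core_id, xrefs in core.items():
        for terminology, term_id in xrefs.items():
            index[(terminology, term_id)] = core_id
    return index


def translate(terms: list[str], source: str, target: str,
              core: dict[str, dict[str, str]]) -> tuple[list[str], list[str]]:
    """Re-express annotations coded in `source` using `target` term IDs.

    Returns (translated IDs, source IDs with no equivalent in `target`).
    """
    index = build_index(core)
    translated: list[str] = []
    unmapped: list[str] = []
    for term_id in terms:
        core_id = index.get((source, term_id))
        target_id = core[core_id].get(target) if core_id is not None else None
        if target_id is None:
            unmapped.append(term_id)
        else:
            translated.append(target_id)
    return translated, unmapped
```

With this toy core, `translate(["lddb-a", "lddb-z"], "LDDB", "HPO", EXAMPLE_CORE)` returns `(["hpo-a"], ["lddb-z"])`: each database keeps its own codes, and only the exchange goes through the core.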

Page 7

Overview of project progress

Sept 2012: start of mappings (Orphanet)

EUGT2 – EUCERD workshop (Paris, September 2012)
• Constitution of the International Consortium for Human Phenotype Terminologies

ICHPT workshop (ASHG, Boston, October 2013)
• Selection of 2,300 core terms

PhenoDB, HPO, Orphanet, LDDB, Elements of Morphology, POSSUM, SNOMED CT (IHTSDO), DECIPHER, IRDiRC

Page 8

Step 1: mapping terminologies

Orphanet: 1357 terms (Orphanet database, version 2008)
LDDB: 1348 dysmorphological terms (installation CD)
Elements of Morphology: 423 terms (retrieved manually from the AJMG publication, January 2009)
HPO: 9895 terms (downloaded from BioPortal, OBO format, 30/08/12)
PhenoDB: 2846 terms (provided in OBO format, 02/05/2012)
UMLS: version 2012AA (integrating MeSH, MedDRA, SNOMED CT)
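HPO and PhenoDB were obtained in OBO format. As a rough illustration of how term lists can be pulled from such files, here is a deliberately small OBO reader in Python. It is a sketch under simplifying assumptions (one tag per line, obsolete terms skipped, relations ignored), not the parser used in the project, and the name `read_obo_terms` is hypothetical; for real work a dedicated OBO library is the safer choice. Whether the counts above include obsolete terms is not stated, so counts from this reader need not match them.

```python
from __future__ import annotations

from typing import Iterable


def read_obo_terms(lines: Iterable[str]) -> dict[str, dict]:
    """Collect id, name and synonyms from the [Term] stanzas of an OBO file.

    Simplifications: [Typedef] and other stanzas are ignored, terms flagged
    `is_obsolete: true` are skipped, and relations such as is_a are not kept.
    """
    terms: dict[str, dict] = {}
    current: dict | None = None

    def flush() -> None:
        # Store the stanza just finished if it is a live term with an ID.
        if current and "id" in current and not current.get("obsolete"):
            terms[current["id"]] = current

    for raw in lines:
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            flush()
            current = {"synonyms": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue  # header lines, blank lines, non-Term stanzas
        tag, _, value = line.partition(":")
        value = value.strip()
        if tag == "id":
            current["id"] = value
        elif tag == "name":
            current["name"] = value
        elif tag == "synonym" and value.startswith('"'):
            # synonym: "text" SCOPE [xrefs] -> keep only the quoted text
            current["synonyms"].append(value.split('"')[1])
        elif tag == "is_obsolete" and value == "true":
            current["obsolete"] = True
    flush()
    return terms
```

Called on the lines of an opened file, `len(read_obo_terms(f))` gives a count of live terms, and the collected synonyms can feed the matching steps.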

Page 9

Tools

OnaGUI (INSERM U729): ontology alignment tool
• Works with files in OWL format
• I-Sub algorithm: detects syntactic similarity
• Graphical interface to check automatic mappings and manually add new ones

MetaMap (National Library of Medicine): a tool to map biomedical text to the UMLS Metathesaurus

Perl scripts: format conversion, launching MetaMap, comparison of results…
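I-Sub is the string-similarity measure behind OnaGUI's automatic proposals. The Python function below is a simplified re-implementation written from the published description of I-Sub: a commonality score from repeatedly removed common substrings, a difference penalty on the unmatched remainder, and a small common-prefix bonus. It is not OnaGUI's code, it normalises labels more crudely, and its scores may not be directly comparable with the I-Sub setting reported for the automatic mappings. The names `isub` and `_normalise` are our own.

```python
from difflib import SequenceMatcher


def _normalise(label: str) -> str:
    # Lowercase and drop '.', '_' and ' ' before comparing the strings.
    return "".join(ch for ch in label.lower() if ch not in "._ ")


def isub(a: str, b: str, min_len: int = 3, p: float = 0.6) -> float:
    """Simplified I-Sub string similarity, scaled to [0, 1]."""
    s1, s2 = _normalise(a), _normalise(b)
    len1, len2 = len(s1), len(s2)
    if len1 == 0 and len2 == 0:
        return 1.0
    if len1 == 0 or len2 == 0:
        return 0.0

    # Commonality: repeatedly remove the longest common substring,
    # counting only substrings of at least `min_len` characters.
    common = 0
    while s1 and s2:
        m = SequenceMatcher(None, s1, s2, autojunk=False).find_longest_match(
            0, len(s1), 0, len(s2))
        if m.size < min_len:
            break
        common += m.size
        s1 = s1[:m.a] + s1[m.a + m.size:]
        s2 = s2[:m.b] + s2[m.b + m.size:]
    comm = 2 * common / (len1 + len2)

    # Difference: Hamacher product of the unmatched fractions (parameter p).
    u1 = (len1 - common) / len1
    u2 = (len2 - common) / len2
    diff = (u1 * u2) / (p + (1 - p) * (u1 + u2 - u1 * u2))

    # Winkler-style bonus for a shared prefix of up to 4 characters.
    prefix = 0
    for x, y in zip(_normalise(a), _normalise(b)):
        if x != y or prefix == 4:
            break
        prefix += 1
    winkler = prefix * 0.1 * (1 - comm)

    # The raw score lies in [-1, 1]; shift it to [0, 1].
    return (comm - diff + winkler + 1) / 2
```

With this sketch, `isub("scaphocephaly", "dolichocephaly")` comes out around 0.77, while `isub("Long narrow head", "dolichocephaly")` is close to 0. High string similarity does not establish that two concepts are the same, and a differently worded label can score near zero, which is why automatic candidates still need expert checking.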

Page 10

Comparison of mappings and deduction
• Perl script to compare all the mappings and infer mappings between non-Orphanet terminologies
E.g.: Orphanet ID XX mapped to YY in HPO and ZZ in LDDB -> deduction: YY and ZZ should probably map to each other
• Retrieve existing HPO mappings to UMLS and MeSH

First figures:

             LDDB      El. Morpho   PhenoDB    HPO        UMLS…
Orphanet     E: 1062   E: 416       E: 978     E: 2228    E: 6948
LDDB                   D: 275       D: 533     D: 1123    D: 2678
El. Morpho                          D: 177     D: 716     D: 409
PhenoDB                                        D: 1045    D: 3268
HPO                                                       D: 6307+4800
UMLS…
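The deduction step can be sketched as follows, assuming the Orphanet mappings are held as a dictionary from an Orphanet term ID to `{terminology: [term IDs]}`. Every pair of targets sharing an Orphanet term becomes a candidate mapping between the two non-Orphanet terminologies, and the supporting Orphanet IDs are kept as evidence for the expert. This is a Python illustration of the inference described on the slide, not the project's Perl script; `deduce_candidates` and the IDs are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import combinations


def deduce_candidates(orphanet_maps: dict[str, dict[str, list[str]]]):
    """Infer probable mappings between non-Orphanet terminologies.

    `orphanet_maps`: Orphanet term ID -> {terminology: [mapped term IDs]}.
    If Orphanet XX maps to YY in HPO and to ZZ in LDDB, the pair (YY, ZZ) is
    proposed as a candidate HPO-LDDB mapping for an expert to confirm.
    Returns {(terminology1, terminology2): {(id1, id2): {Orphanet IDs}}}.
    """
    evidence: dict = defaultdict(lambda: defaultdict(set))
    for orpha_id, targets in orphanet_maps.items():
        # Sorting fixes the order of each terminology pair, e.g. ("HPO", "LDDB").
        for (t1, ids1), (t2, ids2) in combinations(sorted(targets.items()), 2):
            for a in ids1:
                for b in ids2:
                    evidence[(t1, t2)][(a, b)].add(orpha_id)
    # Convert the nested defaultdicts into plain dicts for the caller.
    return {pair: dict(found) for pair, found in evidence.items()}
```

With `{"ORPHA-XX": {"HPO": ["YY"], "LDDB": ["ZZ"]}}` as input, the result maps `("HPO", "LDDB")` to `{("YY", "ZZ"): {"ORPHA-XX"}}`. Keeping the evidence lets a reviewer see how many Orphanet terms support each deduced pair.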

Page 11

Mapping of non-Orphanet terminologies

Automatic and inferred mappings were checked by experts
• Using OnaGUI for all, except UMLS

• Automatic I-Sub: 7.0 + deduction
• MetaMap + deduction + HPO mappings

Figures:
LDDB / El. Morpho: D: 257 + 23 added
LDDB / PhenoDB: D: 528, 92% E; A: 674, 38% E
LDDB / HPO: D: 1105, 87% E; A: 2084, 23% E
LDDB / UMLS…: D: 2654, 83% E; A: 11731
El. Morpho / PhenoDB: D: 174, 50% E; A: 189, 74% E
El. Morpho / HPO: D: 393, 93% E; A: 436, 16% E
El. Morpho / UMLS…: D: 405, 84% E; A: 1248
PhenoDB / HPO: D: 1018, 91% E; A: 4168, 6% E
PhenoDB / UMLS…: D: 3222, 82% E; A: 18776
HPO / UMLS…: D: 7389; A: 65535

Page 12

First list of common terms

Present in at least 3 terminologies

Definition of rules for nomenclature

Addition of terms present in each terminology as synonyms
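Selecting terms present in at least 3 terminologies and folding the remaining labels in as synonyms can be sketched in Python as below. The input format (clusters of mutually mapped `(terminology, term_id, label)` tuples) is an assumption. Because the nomenclature rules defined by the group are not given here, the preferred label is picked from a hypothetical priority list, `LABEL_PRIORITY`, that only stands in for those rules; `select_common_terms` is likewise our own name.

```python
from __future__ import annotations

# Hypothetical stand-in for the group's nomenclature rules: the label of the
# highest-ranked terminology in a cluster becomes the preferred term.
LABEL_PRIORITY = ["Elements of Morphology", "HPO", "PhenoDB", "LDDB", "Orphanet"]


def select_common_terms(clusters: list[list[tuple[str, str, str]]],
                        min_terminologies: int = 3) -> list[dict]:
    """Keep clusters of mapped terms that occur in enough terminologies.

    Each cluster is a list of (terminology, term_id, label) tuples already
    mapped to one another. Each retained cluster yields a preferred label,
    the other distinct labels as synonyms, and its cross-references.
    """
    rank = {name: i for i, name in enumerate(LABEL_PRIORITY)}
    core = []
    for cluster in clusters:
        if len({terminology for terminology, _, _ in cluster}) < min_terminologies:
            continue
        ordered = sorted(cluster, key=lambda m: rank.get(m[0], len(rank)))
        preferred = ordered[0][2]
        seen = {preferred.lower()}
        synonyms = []
        for _, _, label in ordered:
            if label.lower() not in seen:  # case-insensitive de-duplication
                seen.add(label.lower())
                synonyms.append(label)
        xrefs = sorted((terminology, term_id) for terminology, term_id, _ in cluster)
        core.append({"label": preferred, "synonyms": synonyms, "xrefs": xrefs})
    return core
```

A cluster linking "Dolichocephaly" in HPO and Elements of Morphology with "Long narrow head" in LDDB passes the threshold and yields "Dolichocephaly" with "Long narrow head" as a synonym; a cluster found in only two terminologies is left out.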

Page 13

Next steps

Cleaning-up: around 2,300 terms as a result

Re-do mappings
• In order to provide exact matches

Revision process by the group

Addition of definitions
• Elements of Morphology
• HPO
• New definitions

Release on a dedicated website, hosted by
• Visualisation
• Download

Page 14

Talking to each other

Long narrow head
dolichocephaly
scaphocephaly

Page 15

Thank you for your attention