Use of Uberon in the Bgee database: how to deal with a complex, large, dynamic ontology?
DESCRIPTION
Presentation of the methods used to simplify the display of the Uberon ontology, and to keep annotations to it up to date. Presented at the Biocuration 2013 conference.
TRANSCRIPT
Use of Uberon in the Bgee database:
How to deal with a complex, large, dynamic ontology?
Frederic Bastian
Biocuration 2013
© 2013 SIB
A biocurator nightmare?
Ontologies now regularly include thousands of terms.
Complex relations are used, e.g., “transitively proximally connected to”.
Curators are expected to provide complex annotations, e.g.: post-composition of terms.
=> How can we simplify the use of complex ontologies?
The Bgee database
Description of anatomy and development
Expression data
Homology
http://bgee.unil.ch
The Bgee database
http://tinyurl.com/bgee12-hoxa5a
…
Use of anatomical ontologies in Bgee
Several species-specific ontologies were used:
• ZFA
• XAO
• FBbt
• EMAPA, MA
• EHDAA, EV
=> Adding new species is difficult
=> Inconsistent anatomical descriptions, different formalisms adopted, etc.
Homology relations between anatomical ontologies
To perform automated comparisons:
• We built groups of homologous organs
• We organized these groups into an ontology
VHOG:0000157 brain
EHDAA:2629 brain
EHDAA:300 brain
EHDAA:830 future brain
EMAPA:16089 future brain
EMAPA:16894 brain
EV:0100164 brain
MA:0000168 brain
XAO:0000010 brain
ZFA:0000008 brain
ZFA:0000146 presumptive brain
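A group of homologous organs such as the one above can be sketched as a simple mapping from a group identifier to its species-specific members. This is only an illustrative sketch: the identifiers are those listed on the slide, while the names `VHOG_GROUPS`, `TERM_TO_GROUP`, and `are_homologous` are hypothetical and do not describe how Bgee actually stores the data.

```python
# Group of homologous organs, keyed by vHOG identifier.
# Member identifiers are taken from the slide; the structure is an assumption.
VHOG_GROUPS = {
    "VHOG:0000157": {  # brain
        "EHDAA:2629", "EHDAA:300", "EHDAA:830",
        "EMAPA:16089", "EMAPA:16894",
        "EV:0100164", "MA:0000168", "XAO:0000010",
        "ZFA:0000008", "ZFA:0000146",
    },
}

# Reverse index: species-specific term -> group it belongs to.
TERM_TO_GROUP = {
    term: group for group, terms in VHOG_GROUPS.items() for term in terms
}


def are_homologous(term_a, term_b):
    """Return True if both terms belong to the same homology group."""
    group_a = TERM_TO_GROUP.get(term_a)
    return group_a is not None and group_a == TERM_TO_GROUP.get(term_b)


# Zebrafish brain versus adult mouse brain.
print(are_homologous("ZFA:0000008", "MA:0000168"))
```

With such an index, an automated comparison between two species reduces to a lookup on the group identifier rather than on the species-specific terms.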
=> vHOG ontology
vHOG, a multispecies vertebrate ontology of homologous organs groups
Bioinformatics (2012) 28(7): 1017-1020.
To add a species:
• All groups need to be re-evaluated
• The graph structure needs to be updated
=> Not maintainable in the long run
And then came Uberon …
[Figure: fruit fly FBbt ‘tibia’ and human FMA ‘tibia’ shown alongside UBERON: tibia and UBERON: bone. Edge labels: is_a, part_of, only_in_taxon. Taxa: Vertebrata, Drosophila melanogaster, Homo sapiens.]
Uberon also provides a composite ontology:
It merges in terms from species-specific ontologies when they are not present in Uberon.
=> Allows importing data from Model Organism Databases.
.... is_a UBERON:0003059 ! presomitic mesoderm
devf UBERON:0002329 ! somite
is_a ZFA:0000073 ! somite 5 (zebrafish)
is_a ZFA:0000982 ! somite 6 (zebrafish)
is_a EHDAA2:0001853 ! somite 05 (embryonic human)
is_a EHDAA2:0001854 ! somite 06 (embryonic human)
And then came Uberon … BUT
Uberon is complex:
• About 22 000 terms in the composite ontology
• Use of advanced constructs, supported only in OWL
• Use of high-level abstract terms for interoperability
• Frequently updated, highly responsive
• Structure changes whenever an imported species-specific ontology changes => updated even more often
Uberon cannot be easily browsed
First step: ontology simplification
1. Simplification of the relations
Keep only is_a, part_of, develops_from.
Map all relations to their ancestors, e.g.:
develops_directly_from => develops_from
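The relation mapping can be sketched as a walk up a relation hierarchy until one of the three kept relations is reached. This is a minimal sketch under stated assumptions: only the `develops_directly_from => develops_from` mapping comes from the slide, `adjacent_to` is a hypothetical relation with no kept ancestor, and dropping such relations is an assumption rather than something the slide states.

```python
# Relations kept in the simplified view (from the slide).
KEPT_RELATIONS = {"is_a", "part_of", "develops_from"}

# Hypothetical hierarchy: relation -> its more general parent relation.
# Only the develops_directly_from entry is taken from the slide.
RELATION_PARENT = {
    "develops_directly_from": "develops_from",
    "adjacent_to": None,  # illustrative relation with no kept ancestor
}


def simplify_relation(relation):
    """Return the closest kept ancestor of a relation, or None if there is none."""
    seen = set()
    while relation is not None and relation not in seen:
        if relation in KEPT_RELATIONS:
            return relation
        seen.add(relation)
        relation = RELATION_PARENT.get(relation)
    return None


def simplify_edges(edges):
    """Map each (child, relation, parent) edge to a kept relation.

    Edges with no kept ancestor are dropped (an assumption), and edges that
    become identical after mapping are emitted once.
    """
    simplified = []
    for child, relation, parent in edges:
        mapped = simplify_relation(relation)
        edge = (child, mapped, parent)
        if mapped is not None and edge not in simplified:
            simplified.append(edge)
    return simplified


edges = [
    ("A", "develops_directly_from", "B"),
    ("A", "develops_from", "B"),
    ("A", "adjacent_to", "C"),
]
print(simplify_edges(edges))
```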
2. Removal of redundant relations
A is_a B; B is_a C;
=> A is_a C is redundant.
However, for this step we treat part_of and is_a relations as equivalent.
A part_of B; B is_a C
=> A part_of C and A is_a C are both considered redundant
This removes almost all “is_a anatomical entity” relations
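The redundancy rule above amounts to a transitive reduction over a graph in which is_a and part_of edges are not distinguished. The following is a minimal sketch, assuming an acyclic graph given as `(child, relation, parent)` triples; the function name `remove_redundant` is hypothetical and this is not presented as the Bgee implementation.

```python
def remove_redundant(edges):
    """Drop every edge child -> parent for which a longer path also exists.

    is_a and part_of are treated as equivalent: the relation label is
    ignored when looking for alternative paths.
    """
    parents = {}
    for child, _relation, parent in edges:
        parents.setdefault(child, set()).add(parent)

    def reachable_indirectly(start, target):
        # Start from the other direct parents of `start`, so any hit on
        # `target` is reached through at least one intermediate term.
        stack = [p for p in parents.get(start, ()) if p != target]
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            for grandparent in parents.get(node, ()):
                if grandparent == target:
                    return True
                stack.append(grandparent)
        return False

    return {
        (child, relation, parent)
        for child, relation, parent in edges
        if not reachable_indirectly(child, parent)
    }


edges = {
    ("A", "part_of", "B"),
    ("B", "is_a", "C"),
    ("A", "is_a", "C"),      # redundant: A part_of B, B is_a C
    ("A", "part_of", "C"),   # redundant for the same reason
}
print(sorted(remove_redundant(edges)))
```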
3. Removal of relations to upper_level terms
upper_level subset: “abstract upper-level terms not directly useful for analysis”
Almost all terms useful for analysis also sit directly under “upper_level” terms, which is confusing when browsing.
=> Remove relations to “upper_level” terms, unless this would leave the term orphaned
[Term]
id: MA:0000747
name: lymph organ (mouse)
is_a: UBERON:0001062 ! anatomical entity
relationship: part_of UBERON:0002465 ! lymphoid system
[Term]
id: UBERON:0007502
name: epithelial plexus
is_a: UBERON:0000480 ! anatomical group
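The rule can be sketched on the two terms shown in this step. This is an illustrative sketch only: it assumes that both UBERON:0001062 (anatomical entity) and UBERON:0000480 (anatomical group) belong to the upper_level subset, and the function name `drop_upper_level_relations` is hypothetical.

```python
# Assumed members of the upper_level subset, for illustration only.
UPPER_LEVEL = {"UBERON:0001062", "UBERON:0000480"}


def drop_upper_level_relations(edges, upper_level):
    """Remove relations to upper-level terms unless the child would be orphaned."""
    by_child = {}
    for edge in edges:
        by_child.setdefault(edge[0], []).append(edge)

    kept = []
    for child_edges in by_child.values():
        others = [e for e in child_edges if e[2] not in upper_level]
        # If the term has at least one other parent, keep only those
        # relations; otherwise keep everything so it stays attached.
        kept.extend(others if others else child_edges)
    return kept


edges = [
    ("MA:0000747", "is_a", "UBERON:0001062"),     # lymph organ -> anatomical entity
    ("MA:0000747", "part_of", "UBERON:0002465"),  # lymph organ -> lymphoid system
    ("UBERON:0007502", "is_a", "UBERON:0000480"), # epithelial plexus -> anatomical group
]
for edge in drop_upper_level_relations(edges, UPPER_LEVEL):
    print(edge)
```

Under these assumptions, the lymph organ keeps only its part_of relation to the lymphoid system, while the epithelial plexus keeps its single is_a relation because removing it would leave the term without any parent.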
4. Generate species-specific versions
To further simplify the “composite-metazoan” ontology, we generate a version for each species used in Bgee.
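One plausible way to derive a species-specific version is to filter terms by taxon constraints such as only_in_taxon. The slides do not say how the versions are generated, so everything in this sketch is an assumption: the toy term labels, the constraint table, and the function names `is_valid_in` and `species_view` are all hypothetical.

```python
def is_valid_in(term, parents, only_in_taxon, lineage):
    """Check a term and all of its ancestors against the species lineage."""
    stack, seen = [term], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        taxon = only_in_taxon.get(node)
        if taxon is not None and taxon not in lineage:
            return False  # constrained to a taxon the species is not part of
        stack.extend(parents.get(node, ()))
    return True


def species_view(terms, parents, only_in_taxon, lineage):
    """Return the terms that remain valid for one species."""
    return [t for t in terms if is_valid_in(t, parents, only_in_taxon, lineage)]


# Toy data, not taken from Uberon.
terms = ["anatomical structure", "bone", "tibia", "fly-specific structure"]
parents = {"tibia": {"bone"}, "bone": {"anatomical structure"}}
only_in_taxon = {
    "bone": "Vertebrata",
    "fly-specific structure": "Drosophila melanogaster",
}

human_lineage = {"Homo sapiens", "Vertebrata"}
fly_lineage = {"Drosophila melanogaster"}

print(species_view(terms, parents, only_in_taxon, human_lineage))
print(species_view(terms, parents, only_in_taxon, fly_lineage))
```

Checking ancestors as well as the term itself means a constraint placed on a general term is inherited by its descendants in this toy model.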
Second step: track ontology changes
1. Store annotation status
- “Perfect” annotation: would not need to be refined as long as the term used is not obsoleted.
- “Missing granularity” annotation: a term is missing in the ontology, e.g., vastus lateralis.
If a new child is later added to the term, the annotation should be refined.
2. Track ontology changes
- Compare the versions used between two annotation cycles.
- If a term used in a “missing granularity” annotation has new children, refine the annotation.
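The comparison between two ontology versions can be sketched as a diff of direct children, restricted to terms used in “missing granularity” annotations, plus a check for obsoleted terms. This is a minimal sketch under stated assumptions: the annotation records, the toy term labels, and the function names `children_of` and `annotations_to_review` are hypothetical.

```python
from collections import defaultdict


def children_of(edges):
    """Index (child, parent) pairs as parent -> set of direct children."""
    index = defaultdict(set)
    for child, parent in edges:
        index[parent].add(child)
    return index


def annotations_to_review(old_edges, new_edges, obsolete_terms, annotations):
    """List annotations that need curator attention after an ontology update."""
    old_children = children_of(old_edges)
    new_children = children_of(new_edges)
    review = []
    for ann in annotations:
        term = ann["term"]
        if term in obsolete_terms:
            # Any annotation, even a "perfect" one, must be revisited.
            review.append((ann["id"], "term obsoleted", []))
        elif ann["status"] == "missing granularity":
            added = sorted(
                new_children.get(term, set()) - old_children.get(term, set())
            )
            if added:
                review.append((ann["id"], "new children", added))
    return review


# Toy ontology versions: a child term is added between two annotation cycles.
old_edges = [("vastus medialis", "quadriceps femoris")]
new_edges = old_edges + [("vastus lateralis", "quadriceps femoris")]
annotations = [
    {"id": 1, "term": "quadriceps femoris", "status": "missing granularity"},
    {"id": 2, "term": "brain", "status": "perfect"},
]
print(annotations_to_review(old_edges, new_edges, set(), annotations))
```

Only annotation 1 is flagged here, so the curator can ignore the “perfect” annotation entirely.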
Conclusion 1/2
To manage a complex, frequently updated ontology:
1. Provide a formal version for the reasoning, and a simplified view for the end-user.
2. Store annotation status, to focus only on annotations which need to be updated.
Conclusion 2/2
Major update of Bgee coming in fall 2013:
- All expression data annotations are being transferred to Uberon.
- All homology information is being transferred from vHOG to Uberon, using an external file.
And also:
- Besides present/absent calls, Bgee will include: overexpression calls; biologically significant expression.
- Revamped interfaces, webservices, APIs, …
Advertisement! Other Bgee-related work
Poster 145:
Average rank IQR: a new improved method for Affymetrix microarray quality control for meta-analyses and database curation.
Marta Rosikiewicz
Database biocuration virtual issue:
Uncovering hidden duplicated content in public transcriptomics data
Marta Rosikiewicz, Aurélie Comte, Anne Niknejad, Marc Robinson-Rechavi, and Frederic B. Bastian
Database Vol. 2013, bat010; doi:10.1093/database/bat010
Thank You
Marta Rosikiewicz
Sébastien Moretti
Komal Sanjeev
Anne Niknejad
Aurélie Comte
Mathieu Seppey
Marc Robinson-Rechavi
And also:
• Melissa Haendel
• Chris Mungall