Use of Ontologies in the Life Sciences: BioPAX
Graciela Gonzalez, PhD
(some slides adapted from presentations available at www.biopax.org)
Definition of an Ontology
• Conceptualization of a domain of interest: concepts, relations, attributes, constraints, objects, values…
• An ontology is a specification of a conceptualization (formal notation, documentation)
• A variety of forms, but includes a vocabulary of terms and some specification of the meaning of the terms
Ontologies – Key Aspects
• Focus on semantics!
• Accurately model a complex domain
• Capture semantic nuances
• Rigorously define what each field means
• Adhere to those definitions!
Ontologies – Key Aspects
Ontologies are for people and computers:
• People browse the ontology to learn it
• It encodes the definition of a concept so that the computer “understands” it
• “Understands” = automated reasoning with concept definitions
• Is concept A more general than concept B?
• Is X an instance of concept A?
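The two reasoning questions above can be sketched in a few lines of Python. This is only a toy illustration, assuming a single-parent is-a hierarchy stored in a dictionary. The names `PARENT`, `INSTANCE_OF`, `ancestors`, `is_more_general` and `is_instance` are hypothetical and do not come from any ontology tool.

```python
# Toy is-a hierarchy: child concept -> parent concept (illustrative only).
PARENT = {
    "PhysicalInteraction": "Interaction",
    "Interaction": "Entity",
    "Pathway": "Entity",
}

# Asserted instance -> its most specific concept (hypothetical data).
INSTANCE_OF = {"glycolysis": "Pathway"}


def ancestors(concept):
    """Yield every concept strictly more general than `concept`."""
    while concept in PARENT:
        concept = PARENT[concept]
        yield concept


def is_more_general(a, b):
    """Is concept `a` more general than concept `b`?"""
    return a in ancestors(b)


def is_instance(x, a):
    """Is `x` an instance of concept `a`, directly or by inheritance?"""
    direct = INSTANCE_OF.get(x)
    return direct is not None and (direct == a or is_more_general(a, direct))
```

A real reasoner works from formal concept definitions rather than a hand-written parent table, but the questions it answers have this shape.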
Components of an Ontology
• Concepts (Class, Set, Type, Predicate)
  ex: Gene, Reaction, Macromolecule
• Taxonomy of concepts (generalization/specialization hierarchies)
  ex: a physical interaction is an interaction
• Relations and Attributes; Domains (values allowed for an attribute)
  ex: a feature location consists of a sequence location
• Constraints and other meta-information about relations
  ex: a pathway has at least one interaction
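As a rough sketch of how these components fit together, the Python below models two concepts, one taxonomy link, one relation with a restricted domain, and one constraint. The class and method names (`Interaction`, `PhysicalInteraction`, `Pathway`, `violations`) are chosen for illustration and are not taken from a real ontology language.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Interaction:
    """A concept (class) in the ontology."""
    name: str


@dataclass
class PhysicalInteraction(Interaction):
    """Taxonomy: a physical interaction is an interaction."""


@dataclass
class Pathway:
    name: str
    # Relation whose allowed values (its domain, in the slide's sense)
    # are Interaction objects.
    interactions: List[Interaction] = field(default_factory=list)

    def violations(self):
        """Return a list of constraint violations for this pathway."""
        problems = []
        # Constraint: a pathway has at least one interaction.
        if not self.interactions:
            problems.append("a pathway has at least one interaction")
        # Domain check: every value must be an Interaction.
        for item in self.interactions:
            if not isinstance(item, Interaction):
                problems.append(f"{item!r} is not an Interaction")
        return problems
```

Calling `Pathway("empty").violations()` reports the broken constraint, while a pathway holding a `PhysicalInteraction` passes, because the taxonomy makes it an `Interaction`.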
Ontologies in Bioinformatics
Biological DBs need a good ontology AND a good mapping (implementation) of it: this prevents errors in data entry and interpretation
Provide a common framework for multidatabase queries
Provide a controlled vocabulary, such as for genome annotation
Support information extraction
BioPAX: Biological PAthway eXchange
A data exchange ontology and format for biological pathway integration, aggregation and inference
Open source, ongoing
BioPAX Goals
Include support for these pathway types:
• Metabolic pathways
• Signaling pathways
• Protein-protein interactions
• Genetic regulatory pathways
Note: representing pathways is nothing new
The problem
• 200+ pathway databases of different kinds (http://www.pathguide.org/)
• Rich data, different ontologies
• Nightmare for integration and data exchange
Ontologies reflect “real life”
A typical pathway would be decomposed into:
A single pathway instance, which would contain several pathway steps, which would each contain one or more interactions occurring between physical entity participants, which each point to one physical entity.
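The nesting described above can be sketched with plain Python dataclasses. This is an assumed, simplified layout for illustration: the class names mirror the wording of the slide, not the actual BioPAX class definitions, and the example pathway is made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PhysicalEntity:
    name: str


@dataclass
class Participant:
    # Each participant points to exactly one physical entity.
    entity: PhysicalEntity


@dataclass
class Interaction:
    name: str
    participants: List[Participant] = field(default_factory=list)


@dataclass
class PathwayStep:
    interactions: List[Interaction] = field(default_factory=list)


@dataclass
class Pathway:
    name: str
    steps: List[PathwayStep] = field(default_factory=list)

    def entity_names(self):
        """Walk pathway -> steps -> interactions -> participants -> entity."""
        names = []
        for step in self.steps:
            for interaction in step.interactions:
                for participant in interaction.participants:
                    if participant.entity.name not in names:
                        names.append(participant.entity.name)
        return names


# Hypothetical one-step pathway with a single interaction.
mek1, erk1 = PhysicalEntity("MEK1"), PhysicalEntity("ERK1")
step = PathwayStep([Interaction("phosphorylation", [Participant(mek1), Participant(erk1)])])
pathway = Pathway("example pathway", [step])
```

Keeping the participant separate from the physical entity lets the same entity take part in several interactions without being duplicated.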
BioPAX vs other ontologies
Conceptual framework based upon existing DB schemas, allowing wide range of detail, multiple levels of abstraction
Uses (refers to) existing ontologies to provide supplemental annotations where appropriate:
• Cellular location: GO Component
• Cell type: Cell.obo
• Organism: NCBI taxon DB
Incorporates other standards where appropriate
Interoperates with existing standards
BioPAX & other Exchange Formats
[Figure: scope of BioPAX compared with PSI-MI 2, SBML and CellML. Format groups: DB exchange formats; simulation model exchange formats. Data types shown: genetic interactions; molecular interactions (Pro:Pro, All:All); interaction networks (molecular, non-molecular; Pro:Pro, TF:Gene, genetic); regulatory pathways (low detail to high detail); metabolic pathways (low detail to high detail); biochemical reactions; small molecules (low detail to high detail); rate formulas.]
BioPAX level 1
Capturing data at different resolutions:
• Metabolic pathway data has a high level of detail
• Molecular interactions have less
• Ex: no causal or temporal aspects of interactions
BioPAX Level 2 captures molecular binding interactions at a relatively high level in the ontology class hierarchy
This reflects the fact that any given binding interaction may be a low-resolution (or more abstract) view of a more specific type of interaction.
Example
A signaling database would likely capture the interaction between MEK1 and ERK1 as a catalysis event (MEK1 catalyzes the phosphorylation of ERK1).
A molecular interaction database would likely store the interaction using a simpler abstraction, such as a protein-protein interaction.
BioPAX Level 2 supports both of these representations.
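One way to picture the two representations is as a general class and a more specific subclass. The sketch below is a simplified stand-in: `PhysicalInteraction`, `Catalysis` and `same_pair` are illustrative names, and the constructor arguments are assumptions, not the actual BioPAX Level 2 class definitions.

```python
class PhysicalInteraction:
    """Low-resolution view: some entities interact physically."""

    def __init__(self, participants):
        self.participants = list(participants)


class Catalysis(PhysicalInteraction):
    """Higher-resolution view: a controller catalyzes a named process."""

    def __init__(self, controller, substrate, process):
        super().__init__([controller, substrate])
        self.controller = controller
        self.process = process


# Record as a molecular interaction database might store it.
ppi = PhysicalInteraction(["MEK1", "ERK1"])
# Record as a signaling database might store it.
cat = Catalysis("MEK1", "ERK1", "phosphorylation of ERK1")


def same_pair(a, b):
    """True when two records involve the same set of participants."""
    return set(a.participants) == set(b.participants)
```

Because the detailed record is also an instance of the general class, a query written against the general class sees both records, and `same_pair(ppi, cat)` identifies them as two views of the same MEK1 and ERK1 pair.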
Aggregation, Integration, Inference with BioPAX
1. Aggregation: represent multiple kinds of pathway databases
• metabolic
• molecular interactions
• signal transduction
• gene regulatory
2. Integration: special constructs designed for integration
• DB References
• XRefs (Publication, Unification, Relationship)
• Synonyms
3. Inference: OWL DL, to enable reasoning
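To illustrate the integration point, the sketch below groups records from two databases that carry the same unification xref, on the assumption that such an xref marks two records as describing the same entity. The record layout, the function name `group_by_unification_xref`, and the database name and identifiers (`ExampleDB`, `ID_0001`, `ID_0002`) are placeholders invented for this example.

```python
from collections import defaultdict


def group_by_unification_xref(records):
    """Group records that share a (database, identifier) unification xref."""
    groups = defaultdict(list)
    for record in records:
        for xref in record["unification_xrefs"]:
            groups[xref].append(record["name"])
    # Keep only xrefs that link two or more records.
    return {xref: names for xref, names in groups.items() if len(names) > 1}


# Hypothetical records from two source databases; identifiers are placeholders.
records = [
    {"name": "MEK1", "source": "db_a", "unification_xrefs": [("ExampleDB", "ID_0001")]},
    {"name": "MAP2K1", "source": "db_b", "unification_xrefs": [("ExampleDB", "ID_0001")]},
    {"name": "ERK1", "source": "db_a", "unification_xrefs": [("ExampleDB", "ID_0002")]},
]
```

Here the two records named MEK1 and MAP2K1 end up in one group, which is the kind of match that synonyms alone would not reliably give.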
BioPAX Ontology: Top Level
• Pathway: a set of interactions. E.g. Glycolysis, MAPK, Apoptosis
• Interaction: a set of entities and some relationship between them. E.g. Reaction, Molecular Association, Catalysis
• Physical Entity: a building block of simple interactions. E.g. Small molecule, Protein, DNA, RNA
[Diagram: Entity, Pathway, Interaction, Physical Entity. Legend: Subclass (is a), Contains (has a)]
BioPAX Ontology: Physical Entities
[Diagram: PhysicalEntity with subclasses Complex, RNA, Protein, Small Molecule]
• This class serves as the super-class for all physical entities, although its current set of subclasses is limited to molecules.
• This list may be expanded to include photon, environment, cell and cellular component in later levels of BioPAX.