ontologygsk
TRANSCRIPT
![Page 1: OntologyGSK](https://reader031.vdocument.in/reader031/viewer/2022021213/577d27b21a28ab4e1ea493ed/html5/thumbnails/1.jpg)
8/6/2019 OntologyGSK
http://slidepdf.com/reader/full/ontologygsk 1/57
Ontology
From Wikipedia, the free encyclopedia
In philosophy, ontology (from the Greek ὄν, genitive ὄντος: of being (part. of εἶναι: to be) and -λογία: science, study, theory) is the most fundamental branch of metaphysics. It
studies being or existence and their basic categories and relationships, to determine what entities and what types of entities exist. Ontology thus has strong implications for
conceptions of reality.
Some philosophers, notably of the Platonic school, contend that all nouns refer to entities.
Other philosophers contend that some nouns do not name entities but provide a kind of shorthand way of referring to a collection (of either objects or events). In this latter view,
mind, instead of referring to an entity, refers to a collection of mental events experienced
by a person; society refers to a collection of persons with some shared characteristics, and
geometry refers to a collection of a specific kind of intellectual activity. Any ontology must give an account of which words refer to entities, which do not, why, and what
categories result. When one applies this process to nouns such as electrons, energy, contract, happiness, time, truth, causality, and god, ontology becomes fundamental to many branches of philosophy.
Contents
• 1 Some basic questions
• 2 Concepts
• 3 Early history of ontology
• 4 Subject, relationship, object
• 5 Body and environment
• 6 Being
• 7 Social science
• 8 Prominent ontologists
• 9 See also
• 10 External links
Some basic questions
Ontology has one basic question: "What actually exists?" Different philosophers provide
different answers to this question.
One common approach is to divide the extant entities into groups called "categories". However, these lists of categories are also quite different from one another. It is in this
latter sense that ontology is applied to such fields as theology, library science and
artificial intelligence.
Further examples of ontological questions include:
• What is existence?
• Is existence a property?
• Why does something exist rather than nothing?
• What constitutes the identity of an object?
• What is a physical object?
• What features are the essential, as opposed to merely accidental, attributes of a given object?
• Can one give an account of what it means to say that a physical object exists?
• What are an object's properties or relations and how are they related to the object itself?
• When does an object go out of existence, as opposed to merely changing?
Concepts
Quintessential ontological concepts include:
• Universals
• Substance
Early history of ontology
The concept of ontology is generally thought to have originated in early Greece and occupied Plato and Aristotle. While the etymology is Greek, the oldest extant record of
the word itself is the Latin form ontologia, which appeared in 1606, in the work Ogdoas
Scholastica by Jacob Lorhard (Lorhardus) and in 1613 in the Lexicon philosophicum by Rudolf Göckel (Goclenius). The first occurrence in English of "ontology" as recorded by the OED appears in Bailey’s dictionary of 1721, which defines ontology as ‘an Account
of being in the Abstract’. However, its appearance in a dictionary indicates it was in use
already at that time. It is likely the word was first used in its Latin form by philosophers based on the Latin roots, which themselves are based on the Greek.
Students of Aristotle first used the word 'metaphysica' (literally "after the physical") to
refer to the work their teacher described as "the science of being qua being". The word 'qua' means 'in the capacity of'. According to this theory, then, ontology is the science of being inasmuch as it is being, or the study of beings insofar as they exist. Take anything
you can find in the world, and look at it, not as a puppy or a slice of pizza or a folding chair or a president, but just as something that is. More precisely, ontology concerns determining what categories of being are fundamental and asks whether, and in what
sense, the items in those categories can be said to "be".
Ontological questions have also been raised and debated by thinkers in the ancient
civilizations of India and China, in some cases perhaps predating the Greek thinkers who have become associated with the concept.
Subject, relationship, object
"What exists", "What is", "What am I", "What is describing this to me", all exemplify
questions about being, and highlight the most basic problems in ontology: finding a subject, a relationship, and an object to talk about. During the Enlightenment the view of
René Descartes that "cogito ergo sum" ("I think therefore I am") had generally prevailed, although Descartes himself did not believe the question worthy of any deep investigation. However, Descartes was very religious in his philosophy, and indeed argued that "cogito
ergo sum" proved the existence of God. Later theorists would note the existence of the
"Cartesian Other" (asking "who is reading that sentence about thinking and being?")
and generally concluded that it must be God.
This answer, however, became increasingly unsatisfactory in the 20th century as the
philosophy of mathematics and the philosophy of science and even particle physics
explored some of the most fundamental barriers to knowledge about being. Sociological theorists, most notably George Herbert Mead and Erving Goffman, saw the Cartesian
Other as a "Generalized Other," the imaginary audience that individuals use when thinking about the self. The Cartesian Other was also used by Freud, who saw the
superego as an abstract regulatory force.
Body and environment
Schools of subjectivism, objectivism and relativism existed at various times in the 20th
century, and the postmodernists and body philosophers tried to reframe all these
questions in terms of bodies taking some specific action in an environment. This relied to a great degree on insights derived from scientific research into animals taking instinctive
action in natural and artificial settings, as studied by biology, ecology, and cognitive science.
The processes by which bodies related to environments became of great concern, and the
idea of being itself became difficult to really define. What did people mean when they
said "A is B", "A must be B", "A was B"...? Some linguists advocated dropping the verb
"to be" from the English language, leaving "E Prime", supposedly less prone to bad abstractions. Others, mostly philosophers, tried to dig into the word and its usage.
Heidegger attempted to distinguish being and existence.
Being
Existentialism regards being as a fundamental central concept. It is anything that can be said to 'be' in various senses of the word 'be'. The verb "to be" has many different meanings
and can therefore be rather ambiguous; accordingly, there are many different ways of being. In systems theory, 'being' corresponds to the 'system state', and systems engineering (not system administration) is the engineering-grade ontology, which identifies for
architects the existence of systems and defines their boundaries.
Social science
Social scientists adopt one of four main ontological approaches: realism (the idea that
facts are out there just waiting to be discovered), empiricism (the idea that we can observe the world and evaluate those observations in relation to facts), positivism (which
focuses on the observations themselves, attentive more to claims about facts than to facts themselves), and post-modernism (which holds that facts are fluid and elusive, so that we should focus only on our observational claims).
Prominent ontologists
• Aquinas
• Aristotle
• Martin Heidegger
• Heraclitus
• Edmund Husserl
• Roman Ingarden
• Immanuel Kant
• Gottfried Leibniz
• Parmenides
• Plato
• W. V. Quine
• Gilbert Ryle
• Jean-Paul Sartre
• Baruch Spinoza
• Alfred North Whitehead
• Charles Taylor
• Ludwig Wittgenstein
External links
• Aristotle's definition of a science of Being qua Being: ancient and modern interpretations
• Buffalo Ontology Site
• Building a Sensor Ontology: A Practical Approach Leveraging ISO and OGC Models
• Example General Ontology
• National Center for Ontological Research
• National Center for Biomedical Ontology
• Notes on the history of Ontology
• Ontology. A resource guide for philosophers
• Applied Ontology. An interdisciplinary journal on ontological analysis and conceptual modeling
• Laboratory for Applied Ontology
• Clay Shirky: Ontology is Overrated
• W3C Semantic Web
• WikiVentory on WikiPedia Meta
• The ontology of quantum fields: entity and quality
Open Biomedical Ontologies
Open Biomedical Ontologies (formerly Open Biological Ontologies) is an effort to create
controlled vocabularies for shared use across different biological and medical domains. As of 2006, OBO forms part of the resources of the U.S. National Center for Biomedical
Ontology, where it will form a central element of the NCBO's BioPortal.
Contents
• 1 OBO Foundry
• 2 Related Projects
• 3 OBO and Semantic Web
• 4 External links
OBO Foundry
The OBO Ontology library forms the basis of the OBO Foundry, a collaborative experiment involving a group of ontology developers who have agreed in advance to the
adoption of a growing set of principles specifying best practices in ontology
development. These principles are designed to foster interoperability of ontologies within the broader OBO framework, and also to ensure a gradual improvement of quality and
formal rigor in ontologies, in ways designed to meet the increasing needs of data and
information integration in the biomedical domain.
Related Projects
Ontology Lookup Service
The Ontology Lookup Service is a spin-off of the PRIDE project, which required a
centralized query interface for ontology and controlled vocabulary lookup. While many
of the ontologies queryable by the OLS are available online, each has its own query interface and output format. The OLS provides a web service interface to query multiple
ontologies from a single location with a unified output format.
Gene Ontology Consortium
The goal of the Gene Ontology (GO) Consortium is to produce a controlled vocabulary that can be applied to all organisms even as knowledge of gene and protein roles in cells
is accumulating and changing. GO provides three structured networks of defined terms to
describe gene product attributes.
Sequence Ontology
The Sequence Ontology (SO) is a part of the Gene Ontology project and the aim is to
develop an ontology suitable for describing biological sequences. It is a joint effort by genome annotation centres, including WormBase, the Berkeley Drosophila Genome
Project, FlyBase, the Mouse Genome Informatics group, and the Sanger Institute.
Generic Model Organism Databases
The Generic Model Organism Project (GMOD) is a joint effort by the model organism
system databases WormBase, FlyBase, MGI, SGD, Gramene, Rat Genome Database, EcoCyc, and TAIR to develop reusable components suitable for creating new community
databases of biology.
Standards and Ontologies for Functional Genomics
SOFG is both a meeting and a website; it aims to bring together biologists, bioinformaticians, and computer scientists who are developing and using standards and
ontologies with an emphasis on describing high-throughput functional genomics
experiments.
MGED
The Microarray Gene Expression Data (MGED) Society is an international organisation of biologists, computer scientists, and data analysts that aims to facilitate the sharing of
microarray data generated by functional genomics and proteomics experiments.
Ontology for Biomedical Investigations
The Ontology for Biomedical Investigations (OBI) is an open access, integrated ontology
for the description of biological and clinical investigations. OBI provides a model for the
design of an investigation, the protocols and instrumentation used, the materials used, the data generated and the type of analysis performed on it. The project is being developed as
part of the OBO Foundry and as such adheres to all the principles therein such as
orthogonal coverage (i.e. clear delineation from other foundry member ontologies) and
the use of a common formal language. In OBI the common formal language used is the Web Ontology Language (OWL).
Plant Ontology Consortium
The Plant Ontology Consortium (POC) aims to develop, curate and share structured
controlled vocabularies (ontologies) that describe plant structures and growth/developmental stages. Through this effort, the project aims to facilitate cross-database
querying by fostering consistent use of these vocabularies in the annotation of
tissue and/or growth stage specific expression of genes, proteins and phenotypes.
OBO and Semantic Web
OBO2OWL - Layer Cakes and Roundtrip Transformations
OBO2OWL offers a migration path for biomedical ontologies: lossless roundtrip
transformations between Open Biomedical Ontologies (OBO) format and OWL. It contains a methodical examination of each of the constructs of OBO and a layer cake for OBO,
similar to the Semantic Web stack. Project page: Morphster Project.
External links
• Open Biomedical Ontologies (OBO)
• The OBO Foundry
• Morphster ATOL Project at The University of Texas at Austin
• Ontology browser for most of the Open Biological Ontologies at BRENDA website
• OBO Relation Ontology
http://www.cs.man.ac.uk/~stevensr/onto
Ontology-based Knowledge Representation for Bioinformatics
Robert Stevens, Carole A. Goble and Sean Bechhofer
Department of Computer Science and School of Biological Sciences
University of Manchester
Oxford Road
Manchester
M13 9PL
Abstract:
Much of biology works by applying prior knowledge (`what is known') to an unknown entity, rather than the application of a set of axioms that will elicit knowledge. In
addition, the complex biological data stored in bioinformatics databases often requires the
addition of knowledge to specify and constrain the values held in that database. One way of capturing knowledge within bioinformatics applications and databases is the use of
ontologies. An ontology is the concrete form of a conceptualisation of a community's
knowledge of a domain.
This paper aims to introduce the reader to the use of ontologies within bioinformatics. A description of the type of knowledge held in an ontology will be given. The paper will be
illustrated throughout with examples taken from bioinformatics and molecular biology,
and a survey of current biological ontologies will be presented. From this it will be seen that the use to which the ontology is put largely determines the content of the ontology.
Finally, the paper will describe the process of building an ontology, introducing the
reader to the techniques and methods currently in use and the open research questions in ontology development.
Introduction
Biologists need knowledge in order to perform their work. A biologist will often use
some pre-existing item of knowledge to make inferences about the item under investigation. The most common example of this within molecular biology is the use of
sequence comparison to infer the function of a novel protein sequence. The reasoning is
that if a sequence of unknown function is highly similar to a sequence of known function, then it is probable that the novel sequence also has that function. So, rather than using a
rule, law or equation to find the function of a protein, a biologist uses the knowledge that
a similar sequence has a known function to make a judgment about the function of the
new sequence. This is why it is sometimes said that biology is a `knowledge based', rather than an `axiom based' discipline [1].
Modern biologists also need knowledge for communication. Biology is a data-rich
discipline, and its data are available as a fund of knowledge from which biologists generate further knowledge. This knowledge is stored in many hundreds of databases and many of these
databases need to be used in concert during an investigation. Knowledge is vital in two
respects during this process. First, when using more than one data store or
analysis tool, a biologist needs to be sure that knowledge within one resource can be reliably compared to another. A prime example is the differing uses of the term `gene'
within the community. In one database, gene may be defined as `the coding region of
DNA'; in another as `DNA fragment that can be transcribed and translated into a protein' and `DNA region of biological interest with a name and that carries a genetic trait or
phenotype' in a third [2]. Being able to conform to a common definition or reason about
the differences between definitions, in order to reconcile databases, would be
advantageous. The second need for knowledge is to define and constrain data within a resource. Biological data can be very complex; not only in the type of data stored, but in
the richness and constraints working upon relationships between those data. When
designing a database it is useful to be able to describe what values can be specified for which attributes under which conditions. This is the encapsulation of biological
knowledge within database schema.
It is impossible for a single biologist to deal with all the domain knowledge. The arrival of whole genomes and the knowledge they contain only exacerbates the situation. There
is, therefore, a need to create systems that can apply the knowledge in the heads of
domain experts to biological data. It is not envisaged that such systems could ever
perform better than human experts; however, they could play a crucial role in helping the processing of data to the point where human experts could again apply their knowledge
sensibly. This then raises numerous questions, in particular regarding how knowledge can
be captured in ways that make it available and useful within computer applications.
This briefing is about the use of such knowledge within bioinformatics applications. Knowledge can be captured and made available to both machines and humans by an
ontology. The premise for the need for ontologies within bioinformatics is the need to
make knowledge available to that community and its applications. This paper will only be
a brief introduction and will not be a complete guide to the philosophy, building and use
of an ontology. It does, however, aim to provide the foundations.
Section 2 gives the definitions of ontology and related terms. In Section 3, we will
describe the uses to which ontologies can be put, and then in Section 4 we will describe some current bioinformatics and molecular biology ontologies and how they are used.
Section 5 will describe the processes of conceptualisation and specification, or building of, an ontology. Finally, Section 6 draws together the main themes of the paper and
explores the future of ontologies in the bioinformatics domain.
What is an Ontology?
Ontology is the study or concern about what kinds of things exist - what entities or `things' there are in the universe [3]. The computer science view of ontology is somewhat
narrower, where an ontology is the working model of entities and interactions either
generically (e.g. the Cyc ontology [4]) or in some particular domain of knowledge or practice, such as molecular biology or bioinformatics. The following definition is given
in [5]:
`An ontology may take a variety of forms, but necessarily it will include a vocabulary of
terms, and some specification of their meaning. This includes definitions and an
indication of how concepts are inter-related which collectively impose a structure on the domain and constrain the possible interpretations of terms.'
Gruber defines an ontology as `the specification of conceptualisations, used to help
programs and humans share knowledge' [6]. The conceptualisation is the couching of knowledge about the world in terms of entities (things, the relationships they hold and the
constraints between them). The specification is the representation of this
conceptualisation in a concrete form. One step in this specification is the encoding of the conceptualisation in a knowledge representation language. The goal is to create an
agreed-upon vocabulary and semantic structure for exchanging information about that
domain. The specification or encoding of an ontology will be explored in Section 5.
The main components of an ontology are concepts, relations, instances and axioms. A concept represents a set or class of entities or `things' within a domain. Protein is a
concept within the domain of molecular biology. Concepts fall into two kinds:
1. primitive concepts are those which only have necessary conditions (in terms of their properties) for membership of the class. For example, a globular protein is a
kind of protein with a hydrophobic core, so all globular proteins must have a
hydrophobic core, but there could be other things that have a hydrophobic core
that are not globular proteins.
2. defined concepts are those whose description is both necessary and sufficient for a
thing to be a member of the class. For example, Eukaryotic cells are kinds of cells that have a nucleus. Not only does every eukaryotic cell have a nucleus, every nucleus-containing cell is eukaryotic.
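The difference between the two kinds of concept can be sketched in a few lines of Python. This is only an illustrative toy model, not how any ontology language implements classification: the `Thing` class, the property strings and the function names are all assumptions introduced for the example. A defined concept can be recognised from properties alone, whereas a primitive concept needs an explicit assertion, and its necessary conditions can only be used to reject candidates.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Thing:
    """Hypothetical toy entity: observed properties plus explicitly asserted concepts."""
    properties: set[str]
    asserted: set[str] = field(default_factory=set)

# Necessary conditions only (primitive concept).
GLOBULAR_NECESSARY = {"is_protein", "has_hydrophobic_core"}
# Necessary and sufficient conditions (defined concept).
EUKARYOTIC_DEFINITION = {"is_cell", "has_nucleus"}

def is_globular_protein(thing: Thing) -> bool:
    # Membership must be asserted; the necessary conditions act only as a check.
    return "GlobularProtein" in thing.asserted and GLOBULAR_NECESSARY <= thing.properties

def is_eukaryotic_cell(thing: Thing) -> bool:
    # Satisfying the definition is enough; no assertion is required.
    return EUKARYOTIC_DEFINITION <= thing.properties

myoglobin = Thing({"is_protein", "has_hydrophobic_core"}, {"GlobularProtein"})
unlabelled = Thing({"is_protein", "has_hydrophobic_core"})  # meets the conditions, not asserted
yeast_cell = Thing({"is_cell", "has_nucleus"})
```

Here `unlabelled` satisfies every necessary condition yet is not classified as a globular protein, while `yeast_cell` is recognised as eukaryotic purely from its description.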
Relations describe the interactions between concepts or a concept's properties. Relations
also fall into two broad kinds:
1. Taxonomies that organize concepts into sub- super-concept tree structures. The
most common forms of these are
o Specialisation relationships commonly known as the ‘is a kind of’
relationship. For example, an Enzyme is a kind of Protein, which in turn
is a kind of Macromolecule.
o Partitive relationships describe concepts that are part of other concepts -
Protein hasComponent ModificationSite.
2. Associative relationships that relate concepts across tree structures. Commonly
found examples include the following:
o Nominative relationships describe the names of concepts - Protein
hasAccessionNumber AccessionNumber (in the context of
bioinformatics) and Gene hasName GeneName.
o Locative relationships describe the location of one concept with respect to
another - Chromosome hasSubcellularLocation Nucleus.
o Associative relationships that represent, for example, the functions and
processes a concept has or is involved in, and other properties of the
concept - Protein hasFunction Receptor, Protein
isAssociatedWithProcess Transcription and Protein
hasOrganismClassification Species.
o Many other types of relationships exist, such as `causative' relationships,
that are described in [7,8].
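One simple way to picture these relationships is as a list of (subject, relation, object) triples. The sketch below assumes that representation; the relation name `isKindOf` and the helper functions are made up for the illustration, while the other concept and relation names are taken from the examples above.

```python
# Each statement is a (subject, relation, object) triple.
# "isKindOf" is an assumed label for the 'is a kind of' relationship.
TRIPLES = [
    ("Enzyme", "isKindOf", "Protein"),
    ("Protein", "isKindOf", "Macromolecule"),
    ("Protein", "hasAccessionNumber", "AccessionNumber"),
    ("Gene", "hasName", "GeneName"),
    ("Chromosome", "hasSubcellularLocation", "Nucleus"),
    ("Protein", "hasFunction", "Receptor"),
]

def objects(subject, relation):
    """Return every object linked to `subject` by `relation`."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]

def ancestors(concept):
    """Walk the taxonomy upwards and collect all super-concepts."""
    found = []
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for parent in objects(current, "isKindOf"):
            if parent not in found:
                found.append(parent)
                frontier.append(parent)
    return found

enzyme_supers = ancestors("Enzyme")            # taxonomy lookup
locations = objects("Chromosome", "hasSubcellularLocation")  # associative lookup
```

The taxonomy is queried by following one relation repeatedly, whereas the associative relationships are looked up directly across the tree.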
The relations, like concepts, can be organised into taxonomies. For example, hasName
can be subdivided into hasGeneName, hasProteinName and hasDiseaseName. Relations
also have properties that capture further knowledge about the relationships between concepts. These include, but are not restricted to:
• whether it is universally necessary that a relationship must hold on a concept. For example, when describing a protein database, we might want to say that Protein
hasAccessionNumber AccessionNumber holds universally, i.e., for all proteins.
• whether a relationship can optionally hold on a concept. For example, we might
want to say that Enzyme hascofactor Cofactor only describes the
possibility that enzymes have a cofactor, as not all enzymes do have a cofactor.
• whether the concept a relationship links to is restricted to certain kinds of
concepts. For example, Protein hasFunction Receptor restricts the
hasFunction relation to only link to concepts that are kinds of receptors. Protein
hasFunction says that Protein has a function but does not restrict as to what kind
of concept the function might be.
• the cardinality of the relationship. For example, a particular AccessionNumber is the accession number of only one Protein, but one Chromosome may have many
Genes.
• whether the relationship is transitive, for example if Protein
isAssociatedWithProcess Transcription and Transcription
isAssociatedWithProcess GeneExpression, then Protein
isAssociatedWithProcess GeneExpression. The taxonomy relations always
have this property.
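Two of these properties, transitivity and cardinality, lend themselves to a short sketch. The code below is a minimal illustration under stated assumptions: relations are modelled as sets or lists of (subject, object) pairs, the function names are invented for the example, and the accession identifiers `ACC1` and `ACC2` are placeholders rather than real database entries.

```python
def transitive_closure(pairs):
    """Add (a, d) whenever (a, b) and (b, d) are present, until nothing changes."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def violates_max_one(pairs):
    """Return subjects linked to more than one object (cardinality of at most one)."""
    seen = {}
    for subject, obj in pairs:
        seen.setdefault(subject, set()).add(obj)
    return sorted(s for s, objs in seen.items() if len(objs) > 1)

# isAssociatedWithProcess, treated as transitive as in the example above.
associated = {("Protein", "Transcription"), ("Transcription", "GeneExpression")}
closure = transitive_closure(associated)

# Placeholder data: ACC1 wrongly points at two proteins.
accession_of = [("ACC1", "ProteinA"), ("ACC1", "ProteinB"), ("ACC2", "ProteinC")]
offenders = violates_max_one(accession_of)
```

The closure contains the inferred pair linking Protein to GeneExpression, and the cardinality check flags the accession number that is attached to more than one protein.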
Once this conceptualisation has been made concrete (see Section 5) an ontology has been
produced.
Instances are the `things' represented by a concept - a human cytochrome C is an instance of the concept Protein. Strictly speaking, an ontology should not contain any instances,
because it is supposed to be a conceptualisation of the domain. The combination of an ontology with associated instances is what is known as a knowledge base. However,
deciding whether something is a concept or an instance is difficult, and often depends on the application [9]. For example, Atom is a concept and `potassium' is an instance of that
concept. It could be argued that Potassium is a concept representing the different
instances of potassium and its isotopes etc. This is a well known and open question in knowledge management research.
Finally, axioms are used to constrain values for classes or instances. In this sense the
properties of relations are kinds of axioms. Axioms also, however, include more general rules, such as nucleic acids shorter than 20 residues are oligonucleotides.
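The general rule quoted above can be written as a one-line check. This is a hedged sketch only: the function and constant names are assumptions, the sequence is represented as a plain string with one character per residue, and the threshold of 20 comes from the example in the text rather than from any formal definition.

```python
OLIGO_MAX_RESIDUES = 20  # threshold taken from the example rule in the text

def classify_nucleic_acid(sequence: str) -> str:
    """Apply the axiom: nucleic acids shorter than 20 residues are oligonucleotides."""
    if len(sequence) < OLIGO_MAX_RESIDUES:
        return "Oligonucleotide"
    return "NucleicAcid"

short_label = classify_nucleic_acid("ACGT" * 4)   # 16 residues
long_label = classify_nucleic_acid("A" * 20)      # exactly 20 residues
```

Because the rule says "shorter than", a sequence of exactly 20 residues is not classified as an oligonucleotide.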
Applications and Types of Bio-Ontologies
A common ideal for an ontology is that it should be re-usable [6]. This ambition
distinguishes an ontology from a database schema, even though both are conceptualisations. For example, a database schema is intended to satisfy only one
application, but an ontology could be re-used in many applications. However, an
ontology is only re-usable when it is to be used for the same purpose for which it was developed. Not all ontologies have the same intended purpose and may have parts that
are re-usable and other parts that are not. They will also vary in their coverage and level
of detail.
We can divide ontology use into three types:
1. Domain-oriented, which are either domain specific (e.g. E. coli) or domain
generalisations (e.g. gene function or ribosomes);
2. Task-oriented, which are either task specific (e.g. annotation analysis) or task
generalisations (e.g. problem solving);
3. Generic, which capture common high level concepts, such as Physical,
Abstract, Structure and Substance. This can be especially useful when trying
to re-use an ontology, as it allows concepts to be correctly or more reliably placed. It can also be important when generating or analysing natural language
expressions using an ontology. Generic ontologies are also known as `upper
ontologies', `core ontologies' or `reference ontologies'.
Most bio-ontologies will have a mixture of all three of these types in their ontology. A
well-formed ontology will be built in a modular way using a mixture of generic domain, generic task and application ontologies. Its parts will be clearly defined so that they can
be re-used. A less well-formed ontology will have these distinctions blurred, making re-use and modification more difficult. The measure of how well the dependencies in an
ontology have been separated is known as its ontological commitment. Other measures
for the quality of an ontology include its clarity, consistency, completeness and conciseness [6].
Ontologies are used in a wide range of application scenarios [10]:
1. A community reference - neutral authoring. The knowledge is authored in a
single language, and converted into a different form for use in multiple target systems. Benefits include knowledge re-use, improved maintainability and long term knowledge retention;
2. Either defining database schema or defining a common vocabulary for database
annotation - ontology as specification. Describing a protein entry as `mitochondrial double stranded DNA binding protein' will ensure that a common
vocabulary is available for description, sharing and posing questions (see item 4
in this list). Benefits include documentation, maintenance, reliability, sharing and
knowledge re-use;
3. Providing common access to information. Information must be shared but isexpressed using unfamiliar vocabulary. The ontology helps to render the
information intelligible by providing a shared understanding of the terms or
mapping between the terms. Benefits include interoperability, and more effectiveuse and re-use of knowledge resources;
4. Ontology-based search by forming queries over databases. An ontology is used
for searching an information repository. For example, when searching databases for `mitochondrial double stranded DNA binding proteins', all and only those
proteins will be found, as the exact terms for searching can be used. Whether the
user of the terms can be sure of their meaning depends on how the knowledge in
the ontology has been represented. For example, is it explicit whether `mitochondrial' applies to the `DNA' or to the `binding protein'?
Queries can be refined by following relationships within the ontology, for
example, following relationships to find those processes in which proteins of certain functions act and gathering the associated proteins. Moving up and down the `is a kind of' hierarchy within the ontology can also be used to refine queries.
For example, `DNA binding protein' can be specialised to `single stranded DNA binding
protein' by moving down the hierarchy when the former gathers too many answers. Benefits include more effective access and hence more effective use and
re-use of knowledge resources;
5. Understanding database annotation and technical literature. These ontologies are
designed to support natural language processing (NLP): they capture not only domain knowledge but also how that knowledge is related to linguistic structures such as
grammar and lexicons.
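The query refinement described in item 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `IS_A` hierarchy, the entry identifiers `P1` to `P4` and the helper names `descendants` and `search` are hypothetical and are not taken from any of the systems discussed here.

```python
from collections import defaultdict

# Hypothetical 'is a kind of' hierarchy, stored as child -> parent.
IS_A = {
    "DNA binding protein": "binding protein",
    "single stranded DNA binding protein": "DNA binding protein",
    "double stranded DNA binding protein": "DNA binding protein",
    "binding protein": "protein",
}

# Hypothetical database entries, each annotated with one ontology term.
ANNOTATIONS = {
    "P1": "single stranded DNA binding protein",
    "P2": "double stranded DNA binding protein",
    "P3": "DNA binding protein",
    "P4": "binding protein",
}


def descendants(term):
    """Return the term itself plus every term below it in the hierarchy."""
    children = defaultdict(list)
    for child, parent in IS_A.items():
        children[parent].append(child)
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(children[current])
    return found


def search(term):
    """Return the ids of entries annotated with the term or a more specific one."""
    wanted = descendants(term)
    return sorted(entry for entry, t in ANNOTATIONS.items() if t in wanted)


broad = search("DNA binding protein")                   # gathers three entries
narrow = search("single stranded DNA binding protein")  # moving down the hierarchy
```

Moving down the hierarchy shrinks the answer set, and moving up (for example to `binding protein`) widens it again.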
Although some methodologies are beginning to emerge that compare the structure and
role of various ontologies [11], none has appeared that compares the content of one ontology with another for a specific domain.
A Survey of Current Bio-Ontologies
The use of ontology within bioinformatics is relatively recent and consequently few
bio-ontologies exist. In this section, a representative sample of existing bio-ontologies will be reviewed. This survey has been restricted to those ontologies most
pertinent to current trends in bioinformatics and molecular biology, rather than the wider
field of biology. Biology is rich in taxonomies, such as the Enzyme Classification [12] and species taxonomies. Being taxonomies, they only use a subsumption hierarchy. The
ontologies reviewed here tend to be richer in their use of relationships, hence their
inclusion, but this is not to denigrate the usefulness of taxonomies to many applications.
The ontologies reviewed are as follows:
• The RiboWeb ontology http://smi-web.stanford.edu/projects/helix/riboweb.html;
• The EcoCyc ontology http://ecocyc.PangeaSystems.com/ecocyc/ecocyc.html;
• The Schulze-Kremer ontology for molecular biology (MBO) http://igd.rz-berlin.mpg.de/~www/oe/mbo.html;
• The Gene Ontology (GO) http://genome-www.stanford.edu/GO/;
• The TAMBIS Ontology (TaO) http://img.cs.man.ac.uk/tambis.
The content, in terms of scope, concepts and relationships, as well as the use of each ontology will be presented. In the section on building an ontology, these ontologies will
be revisited, as they also illustrate the variety of ontology building styles. Table 1
summarises these bio-ontologies with respect to organisation, structure, purpose and
content.
The RiboWeb Ontology
RiboWeb [13,14] is a resource whose primary aim is to facilitate the construction of
three-dimensional models of ribosomal components and compare the results to existing
studies. The knowledge that RiboWeb uses to perform these tasks is captured in four ontologies: the physical-thing ontology; the data ontology; the publication ontology and
the methods ontology. The physical-thing ontology describes ribosomal components and
associated `physical things'. It has three principal conceptualisations: Molecules,
Molecule-Ensembles and Molecule-Parts. The first describes covalently bonded
molecules and includes the main biological macromolecules. Molecule-Ensembles
captures non-covalently bonded collections of molecules, such as enzyme complexes. Molecule-Parts holds knowledge about regions of molecules that do not exist
independently, but need to be talked about by biologists. These would include amino acid
side chains and the 3' and 5' ends of nucleic acid molecules. The data ontology captures knowledge about experimental detail as well as data on the structure of physical-things.
The methods ontology contains information about techniques for analysing data. It holds knowledge of which techniques can be applied to which data, as well as the inputs and outputs of each method.
Instances are added to RiboWeb that correspond to these concepts. For example, a
peer-reviewed article describes the three-dimensional structure of the 30S
ribosomal subunit. This means linked instances need to be created in the publication, data and physical-thing ontologies. A user may want to see if this structure is consistent with
others captured within RiboWeb [14]. The constraints described within RiboWeb can
highlight conflicts with current knowledge to the biologist.
The EcoCyc Ontology
EcoCyc, like RiboWeb, uses an ontology to describe the richness and complexity of a
domain and the constraints acting within that domain, to specify a database schema [15].
EcoCyc is presented to biologists using an encyclopaedia metaphor. It covers E. coli genes, metabolism, regulation and signal transduction, which a biologist can explore and
use to visualise information [16]. The knowledge base currently describes 4391 E. coli
genes, 695 enzymes encoded by a subset of these genes, 904 metabolic reactions and the organisation of these reactions into 129 metabolic pathways. EcoCyc uses the
classification of gene product function from Riley [17] as part of this description.
Scientists can visualise the layout of genes within the E. coli chromosome, or of an individual biochemical reaction, or of a complete biochemical pathway (with compound
structures displayed).
EcoCyc's use of an ontology to define a database schema has the advantages of
expressivity and the ability to evolve quickly to account for the rapid schema changes needed for biological information [15]. The user is not aware of this use of an ontology, except
that the constraints expressed in the knowledge captured mean that the complexity of the
data held is captured precisely. In EcoCyc, for example, the concept of Gene is
represented by a class with various attributes that link through to other concepts:
Polypeptide product, Gene name, synonyms, identifiers used in other databases
etc. The representation system can be used to impose constraints on which concepts and instances may appear in the places described within the system.
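As a rough illustration of a schema whose constraints limit which instances may fill an attribute, the following Python sketch models a Gene class with a typed product attribute. It assumes nothing about EcoCyc's actual representation: the class names, attribute names and the `try_make_gene` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Polypeptide:
    name: str


@dataclass
class Gene:
    # Attribute names loosely follow the text; they are assumptions,
    # not EcoCyc's real slot names.
    name: str
    product: Optional[Polypeptide] = None
    synonyms: List[str] = field(default_factory=list)
    external_ids: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Constraint 1: every gene must be named.
        if not self.name:
            raise ValueError("a Gene needs a name")
        # Constraint 2: only Polypeptide instances may fill the product attribute.
        if self.product is not None and not isinstance(self.product, Polypeptide):
            raise TypeError("the product attribute only accepts Polypeptide instances")


def try_make_gene(**attributes):
    """Return a Gene, or the constraint violation that prevented its creation."""
    try:
        return Gene(**attributes)
    except (TypeError, ValueError) as err:
        return err


valid = try_make_gene(name="exampleGene", product=Polypeptide("examplePolypeptide"))
rejected = try_make_gene(name="exampleGene", product="free text, not a Polypeptide")
```

Here the schema, rather than the application code that loads the data, rejects the ill-formed entry.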
The Ontology for Molecular Biology
The Ontology for Molecular Biology (MBO) is an attempt to provide clarity and
communication within the molecular biology database community [2]. The use of MBO
would avoid `semantic confusion', such as that which arises with the use of the concept of Gene (see Section 1). Schulze-Kremer claims `By adhering to a commonly agreeable
ontology, uncertainty and misunderstanding about the semantic relations between
database entries from different databases can be eliminated.' This would mean that either the different databases agreed to the common MBO definition (and changed their
annotations accordingly) or inferences about the differences between each database's
conceptualisation of `gene' could be made in terms of the MBO. In either case, attempts could then be made to reconcile or interoperate between the databases.
The MBO contains concepts and relationships that are required to describe biological
objects, experimental procedures and computational aspects of molecular biology [2]. It
is very wide ranging and has over 1200 nodes representing both concepts and instances. In the conceptual part of the MBO, the primary relationship used is the `is a kind of'
relationship. The MBO has an organising, upper-level ontology. The root concept
`Being' divides into `object' and `event'. `Object', for instance, is subdivided into `physical-' and `abstract-' object. This helps give a precise classification for lower level
concepts - so, `physics object' is an `abstract object' and `DNA' a `physical-object'. MBO
defines a linkage map from GDB as the chain `DBObject, MappingObject, Map,
LinkageMap', in which each concept is a sub-concept of the one before it.
The actual biological content of the MBO is currently relatively small, ending at quite
large grained concepts such as Protein, Gene, and Chromosome. The framework,
however, exists for extending the MBO much further into the biological domain.
The TAMBIS Ontology
TAMBIS (Transparent Access to Multiple Bioinformatics Information Sources) uses an
ontology to enable biologists to ask questions over multiple external databases using a
common query interface [1]. The TAMBIS ontology (TaO) [19] describes a wide range of bioinformatics tasks and resources, and has a central role within the TAMBIS system.
An interesting difference between the TaO and some of the other ontologies reviewed
here is that the TaO does not contain any instances. The TaO only contains knowledge about bioinformatics and molecular biology concepts and their relationships - the
instances they represent still reside in the external databases. As concepts represent
instances, a concept can act as a question. The concept Receptor Protein represents the
instances of proteins with a receptor function and gathering these instances is answering
that question.
The TaO is a dynamic ontology, in that it can grow without the need for either conceptualising or encoding new knowledge. In contrast, the other ontologies described here are static - developers must intervene and encode new conceptualisations to form new
concepts. The TaO uses rules within the ontology to govern which concepts can be joined
to another concept via relationships, to form new concepts. Thus the TaO places great
emphasis on relations. A user can form a complex, multi-source query, using relationships, in the following manner. Starting with the concept Protein, the TaO is
consulted as to which relationships can be used to join Protein to other concepts.
Amongst many, the following two are offered: is homologous to Protein and
hasAccessionNumber AccessionNumber. Initially, the original Protein is extended to
give a new concept, Protein is homologous to Protein (the concept `Protein
homologue'); then the second Protein is extended with hasAccessionNumber
AccessionNumber. The resulting concept (`Protein homologue of Protein with Accession
Number') describes proteins which are homologous to a protein with a particular accession number. This concept can be used as a source-independent query containing no
information on how to answer such a query. The rest of the TAMBIS system takes this
conceptual query and processes it into an executable program against the external
sources [20].
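The way rules govern which concepts may be joined can be sketched as follows. This is a minimal Python illustration, not the TaO's actual description logic machinery: the `RULES` table and the `offered`, `extend` and `describe` helpers are hypothetical, and each expression here carries only one relationship.

```python
# Hypothetical rules: (concept, relationship) -> the concept allowed to fill it.
RULES = {
    ("Protein", "is homologous to"): "Protein",
    ("Protein", "hasAccessionNumber"): "AccessionNumber",
}


def offered(concept):
    """List the relationships that the rules allow for a concept."""
    return sorted(rel for (c, rel) in RULES if c == concept)


def extend(expr, relationship, filler):
    """Join filler to expr via relationship, provided the rules allow it.

    expr and filler are either a concept name or a nested
    (base, relationship, filler) tuple built by an earlier call.
    """
    base = expr if isinstance(expr, str) else expr[0]
    filler_base = filler if isinstance(filler, str) else filler[0]
    if RULES.get((base, relationship)) != filler_base:
        raise ValueError(f"{base} cannot be joined to {filler_base} via {relationship!r}")
    return (base, relationship, filler)


def describe(expr):
    """Render a composed concept as readable text."""
    if isinstance(expr, str):
        return expr
    base, rel, filler = expr
    return f"{base} which {rel} ({describe(filler)})"


# Build the inner concept first, then the homologue query around it.
with_accession = extend("Protein", "hasAccessionNumber", "AccessionNumber")
query = extend("Protein", "is homologous to", with_accession)
```

The resulting `query` value says nothing about which database answers it, which mirrors the source-independent character of the conceptual query described above.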
The TaO is available in two forms - a small model that concentrates on proteins and a
larger scale model that includes nucleic acids. The small TaO, with 250 concepts and 60
relationships, describes proteins and enzymes, as well as their motifs, secondary and
tertiary structure, functions and processes. There is also supporting material on subcellular structure and chemicals, including cofactors. Motifs extend to detail such as
the principal modification sites; function and process to broad classifications such as Hormone and Receptor, and Apoptosis and Lactation; structure extends to detail such as gross architecture - for example, SevenPropellor. Important relationships include is
component of, has name, has function and is homologous to, as well as many more. The
larger model, with 1500 concepts, broadens these areas to include concepts pertinent to nucleic acid, its children and genes.
The Gene Ontology
The Gene Ontology (GO), like the MBO, has database annotation as its main purpose. GO, however, has grown up from within a group of databases, rather than being proposed
from outside. GO's scope is also narrower; instead of attempting to describe the whole of
molecular biology captured in the community's databases, GO seeks to capture information about the role of gene products within an organism. The classification of
gene function by Riley [17] has a similar scope, but for E. coli only. GO was initially
created to reflect Drosophila gene function via the Flybase database [18], but has
expanded to encompass mouse, yeast and gene expression databases, and is expected to expand further. Thus, the main use of GO is as a controlled vocabulary for conceptual
annotation of gene product function, process and location in databases.
GO lacks any upper-level organising ontology. It is essentially composed of three hierarchies, representing the function of a gene product, the process in which it takes
part, and cellular location and structure. GO contains a wide range of concepts, and
provides a rich level of detail in its three hierarchies. It uses the `is a kind of' and `is part
of' relationships to describe the role of gene products. It currently has over 5000 concepts within the ontology.
GO defines a fine level of conceptual detail: Double stranded DNA binding proteins;
Transcription factors; cytosolic chaperones; muscle motor protein; learning and memory;
blood coagulation; male genital morphogenesis; ventral pattern formation; and many
pathways, transport and signal transduction systems. GO uses multiple inheritance in the `is a kind of' hierarchy in forming some of the concepts and there is some use of an `is
part of' relationship. Many of the relationships held by concepts, however, remain implicit in GO. For example, the concept `succinate (cytosol) to fumarate
(mitochondrion) transporter' implicitly holds properties about location and orientation in
the mitochondrial membrane etc.
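A structure that combines multiple inheritance with two kinds of relationship can be sketched as a small directed graph. The fragment below is a hypothetical Python illustration only: the terms, edges and the `ancestors` helper are assumptions made for the example and do not reproduce actual GO content.

```python
# Hypothetical fragment: term -> list of (relationship, parent) edges.
# A term may have several parents, so this is a graph rather than a tree.
EDGES = {
    "DNA binding": [("is_a", "nucleic acid binding")],
    "double stranded DNA binding": [("is_a", "DNA binding")],
    "transcription factor": [
        ("is_a", "DNA binding"),
        ("is_a", "transcription regulator"),
    ],
    "mitochondrial membrane": [("part_of", "mitochondrion")],
    "mitochondrion": [("part_of", "cell")],
}


def ancestors(term, relationships=("is_a", "part_of")):
    """Collect every term reachable upwards through the chosen relationships."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for rel, parent in EDGES.get(current, []):
            if rel in relationships and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Two 'is a kind of' parents are both followed.
tf_ancestors = ancestors("transcription factor")
# Restricting the walk to 'is part of' edges gives the containment chain.
membrane_parts = ancestors("mitochondrial membrane", relationships=("part_of",))
```

Keeping the relationship type on each edge lets a query choose whether to follow kind-of links, part-of links, or both.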
Summary
There are two important messages from this brief survey of bio-ontologies: the first is
that ontologies are being used within the community to provide knowledge input to databases and applications. The second message is that all these ontologies are very different and specific to their intended use. TaO is an ontology of bioinformatics
tasks and so contains such concepts as AccessionNumber and ProteinId,
which are not part of the world of molecular biology. The TaO could not be
substituted for EcoCyc's ontology. GO is an ontology of gene product function and
RiboWeb represents knowledge of ribosomal subunit structure, data and methodologies. As GO is used for database annotation, it holds a fine level of detail
whereas the TaO is quite shallow, but precision is gained during query formulation
by joining concepts together. Even if one ontology could be developed, individual
applications would only use a subset, leading to a requirement for highly modular ontologies with minimised dependencies and assumptions between them. That
ontology use influences the content and nature of the knowledge captured within an
ontology is not a contradiction of the knowledge-holding ability of ontologies. Not only does the purpose determine the scope and granularity to which the same
knowledge is represented in different ontologies, but conceptualisations may differ
without one being incorrect. For example, TaO describes that DNA may be translated to protein. This is wrong in molecular biological terms, but is a feature of
bioinformatics - so conceptualisations of the same domain may differ. A constraint
that is necessary for one application may not be needed for another;
this simply changes what knowledge is captured or how it is captured, not the knowledge itself.
Building an Ontology
Although there is some collective experience in developing and using ontologies, there is
no field of ontological engineering comparable to knowledge engineering. In particular, as yet, there are no standardised methodologies for building ontologies. Such a
methodology would include a set of stages that occur when building ontologies,
guidelines and principles to assist in the different stages, and an ontology life-cycle which indicates the relationships among stages [21]. The best known ontology
construction guidelines were developed by Gruber [6], to encourage the development of
more re-usable ontologies. Recently, there has been increased effort in trying to develop a
comprehensive ontology methodology (e.g. [22,23,21]). A survey is given in [24].
The Development Lifecycle
Methodologies broadly divide into those that are stage-based (e.g. TOVE [21]) and those that rely on iterative evolving prototypes (e.g. Methontology [25]). These are in fact complementary techniques. Most distinguish between an informal stage, where the
ontology is sketched out using either natural language descriptions or some diagram
technique, and a formal stage where the ontology is encoded in a formal knowledge
representation language that is machine computable. As an ontology should ideally be communicated to people and unambiguously interpreted by software, the informal
representation helps the former and the formal the latter.
Figures 1 and 2 represent a skeletal methodology and life-cycle for building ontologies,
inspired by the software engineering V-process model [26]. The left side of the V charts
the processes in building an ontology and the right side charts the guidelines, principles and evaluation used to `quality assure' the ontology. The overall process, however, moves
through a life-cycle, as depicted in Figure 2.
Figure 1: The V-model inspired methodology for building ontologies.
Figure 2: The ontology building life-cycle.
The stages in the V-process model and life-cycle are:
Identify purpose and scope:
developing a requirements specification for the ontology by identifying the
intended scope and purpose of the ontology. A well-characterised requirements
specification is important to the design, evaluation and re-use of an ontology. It can be seen from Section 4 that the use to which an ontology is put has a great
effect on the content and style of that ontology.
Knowledge Acquisition:
the process of acquiring domain knowledge from which the ontology will be built. Sources span the complete range of knowledge holders: specialist biologists;
database metadata; standard text books; research papers and other ontologies.
Motivating scenarios are collected and informal competency questions formed [21] - these are informal questions that the ontology must be able to answer and
will be used to check that the ontology is fit for purpose. The EcoCyc and
RiboWeb ontologies had the bulk of their knowledge gathered from the research literature on E. coli metabolism and ribosomal structure respectively. In the
former case this was a huge volume of material, which took many years to
process. The TaO, being built to query databases, extracted a large part of its
knowledge from database documentation. Standard texts also contributed to the knowledge of core molecular biology.
Conceptualisation:
identifying the key concepts that exist in the domain, their properties and the relationships that hold between them; identifying natural language terms to refer
to such concepts, relations and attributes; and structuring domain knowledge into
explicit conceptual models. This is the process touched upon in Section 2, where the concepts and relationships describing the domain are captured. The ontology
is usually described using some informal terminology. Gruber [6] suggests
writing lists of the concepts to be contained within the ontology and exploring
other ontologies to re-use all or part of their conceptualisations and terminologies. At this stage it is important to bear the results of the first step, that of
requirements gathering, in mind.
Integrating:
using or specialising an existing ontology: a task frequently hindered by the inadequate documentation of existing ontologies, notably their implicit
assumptions. Using a generic ontology, such as MBO or [27,28], gives a deeper definition of the concepts in the chosen domain.
Encoding:
representing the conceptualisation in some formal language, e.g. frames, object
models or logic. This includes the creation of formal competency questions in terms of the terminological specification language chosen (usually first order
logic). The representation of ontologies is explored further below.
Documentation:
informal and formal complete definitions, assumptions and examples are essential
to promote the appropriate use and re-use of an ontology. Documentation is
important for defining, more expansively than is possible within the ontology, the exact meaning of terms within the ontology.
Evaluation:
determining the appropriateness of an ontology for its intended application. Evaluation is done pragmatically, by assessing the competency of the ontology to
satisfy the requirements of its application, including determining the consistency,
completeness and conciseness of an ontology [25]. Conciseness implies an
absence of redundancy in the definitions of an ontology and an appropriate granularity. For example, an ontology that modelled protein molecules at the
atomic resolution when the amino acid level would suffice would not be
considered concise.
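Competency questions lend themselves to being run as checks during evaluation. The sketch below is one possible way to do this in Python, under stated assumptions: the ontology is reduced to hypothetical (subject, relationship, object) triples, `holds` follows only a single `is a kind of` parent, and the questions themselves are invented for illustration.

```python
# Hypothetical ontology content as (subject, relationship, object) triples.
TRIPLES = {
    ("Enzyme", "is a kind of", "Protein"),
    ("Protein", "has name", "ProteinName"),
    ("Enzyme", "catalyses", "Reaction"),
}


def holds(subject, relationship, obj):
    """True if the triple is asserted directly or inherited via 'is a kind of'.

    Simplification: only one parent per concept is followed.
    """
    current, visited = subject, set()
    while current is not None and current not in visited:
        visited.add(current)
        if (current, relationship, obj) in TRIPLES:
            return True
        parents = [o for s, r, o in TRIPLES if s == current and r == "is a kind of"]
        current = parents[0] if parents else None
    return False


# Informal questions paired with the triple that must hold to answer them.
COMPETENCY_QUESTIONS = [
    ("Do enzymes have names?", ("Enzyme", "has name", "ProteinName")),
    ("Do enzymes catalyse reactions?", ("Enzyme", "catalyses", "Reaction")),
    ("Do proteins have a cellular location?",
     ("Protein", "has location", "CellularLocation")),
]


def evaluate(questions):
    """Map each question to whether the ontology can answer it."""
    return {text: holds(*triple) for text, triple in questions}


report = evaluate(COMPETENCY_QUESTIONS)
unanswered = [question for question, ok in report.items() if not ok]
```

An unanswered question points at a gap in scope or completeness to be addressed in the next pass through the life-cycle.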
Vocabularies support the creation of purely hand-crafted ontologies with simple tree-like
inheritance structures. The Gene Ontology, for example, has a hierarchical structure
which is asserted - the position of each concept and its relation with others in the ontology is completely determined by the modeller or ontologist. Each entry or concept
in the GO has a name, an identifier and other optional pieces of information such as
synonyms, references to external databases and so on.
Although this provides great flexibility, the lack of any structure in the representation can lead to difficulties with maintenance or preserving consistency, and there are usually no
formally defined semantics. The single inheritance provided by a tree structure (each
concept has only one parent in the is-a hierarchy) can also prove limiting. Maintaining multiple inheritance hierarchies, however, is an arduous task - the hand-crafting of single
inheritance hierarchies is a difficult enough exercise.
A frame-based system provides greater structure. Frame-based systems are based around
the notion of frames or classes which represent collections of instances (the concepts of
the ontology). Each frame has an associated collection of slots or attributes which can be filled by values or other frames. In particular, frames can have a kind-of slot which
allows the assertion of a frame taxonomy. This hierarchy can then be used for inheritance of slots, allowing a sparse representation. As well as frames representing concepts, a
frame-based representation may also contain instance frames, which represent particular
instances.
Frame-based systems have been used extensively in the KR world, particularly for applications in natural language processing. The best known frame system is
Ontolingua [31]. Both EcoCyc and RiboWeb use a frame representation. EcoCyc has a
frame, amongst others, called `Gene', representing the concept Gene. This frame has slots
describing relationships to other concepts, such as Polypeptide product, gene name, synonyms and so on. Frames are popular because frame-based modelling is similar to
object-based modelling and is intuitive for many users.
The semantics of frame systems are defined by the OKBC standard [32], although this is a little unclear in places. For example, it is not always clear how to interpret an assertion
that a slot is filled with a particular value. Does this mean that all instances of the frame
must have this particular attribute taking this value? Or does the value represent possible fillers for the slot for each instance? For example, we might want to say that the frame Gene has a slot saying `all genes must have a GeneName', but it is only a possibility that
Genes `have a Polypeptide Product' (some, after all, produce tRNAs).
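The distinction between a slot that every instance must fill and one that is merely possible can be made explicit in the representation. The following Python sketch shows one way to do so, together with slot inheritance along a kind-of link. It is not OKBC: the `FRAMES` layout, the `BiologicalObject` parent frame and the `required`/`optional` markers are assumptions made for illustration.

```python
# Hypothetical frames: each has a kind_of parent and slots marked
# 'required' (all instances must fill it) or 'optional' (may be filled).
FRAMES = {
    "BiologicalObject": {"kind_of": None, "slots": {"synonyms": "optional"}},
    "Gene": {
        "kind_of": "BiologicalObject",
        "slots": {"gene_name": "required", "polypeptide_product": "optional"},
    },
}


def slots_of(frame):
    """Collect a frame's slots, including those inherited along kind_of."""
    chain = []
    while frame is not None:
        chain.append(frame)
        frame = FRAMES[frame]["kind_of"]
    merged = {}
    for name in reversed(chain):  # ancestors first, so children can override
        merged.update(FRAMES[name]["slots"])
    return merged


def missing_required(frame, instance):
    """Return the required slots that an instance frame leaves unfilled."""
    return sorted(
        slot
        for slot, mode in slots_of(frame).items()
        if mode == "required" and slot not in instance
    )


# A gene with no polypeptide product (for example one producing a tRNA) is valid.
trna_gene = {"gene_name": "exampleTrnaGene"}
unnamed_gene = {"polypeptide_product": "examplePolypeptide"}

ok = missing_required("Gene", trna_gene)
problems = missing_required("Gene", unnamed_gene)
```

Marking the mode on each slot removes the ambiguity the text describes, at the cost of going beyond what a plain slot-value assertion states.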
An alternative to frames is logic, notably Description Logics (DLs) [33,34]. DLs describe
knowledge in terms of concepts and relations that are used to automatically derive
classification taxonomies. A major characteristic of a DL is that concepts are defined in
terms of descriptions using other roles and concepts. For instance, in the TaO, the concept Enzyme was not simply asserted by the ontologist. Instead, a composite concept
was made from Protein and Reaction, joined with the relation `catalyses' - to make the
concept Protein which catalyses Reaction. Thus someone viewing the ontology
tools are really essential for maintaining complex ontologies that are necessary for
capturing knowledge within the biology domain. Other tools support the collaborative
development of ontologies over the web (e.g. WebOnto [40]). A survey of tools can be found in [41].
Discussion
This briefing has introduced the need for and use of ontology within the bioinformatics community. The need for ontologies arises from the need to be able to cope with the size
and complexity of biological knowledge and data. Ontologies enable knowledge to be
used within systems for communication, specification and other processing tasks (see
Section 3).
Several bio-ontologies have already been used within the community. Those reviewed in
Section 4 demonstrate a wide range of scopes and granularities. Most have in common
some core features of molecular biology, such as Gene, Protein and related BiologicalFunction and BiologicalProcess, but differ widely in both the content
and articulation of their knowledge. This is primarily due to the wide range of tasks to
which the ontologies are put. Both RiboWeb and EcoCyc use part of their ontology to define the structure and content of their databases, but as the databases are as different as
ribosomal subunit structure and E. coli metabolism, the ontologies are also necessarily
different. Even the common areas, such as macromolecule, differ widely between
ontologies, but without any of the ontologies being incorrect.
Bio-ontologies are currently being used for communication of knowledge, as well as
database schema definition, query formulation and annotation. When the use of
conceptual annotation grows we can expect to see a concomitant change in database retrieval. This will become much more precise and complete than is currently possible with natural language based annotations. Annotation by ontologies should also allow the
relationships describing functions, process and components etc. of retrieved entries to be
explored with ease.
There are a number of open issues to be addressed in the use of ontology within the
bioinformatics community:
Knowledge based reasoning
This briefing started with a description of how biology research is often driven by
the use of knowledge, especially by determination of function by sequence similarity. Only RiboWeb, of the ontologies described, approaches this kind of
use. It can be expected that the use of ontology to assist in analysis will grow
further. This will be made easier by the conceptual annotation of the primary databases - a collection of similar sequences returned by a search could be
clustered within an ontology of protein function and features. Such clustering
should be able to help with the analysis of similarity search results and other bioinformatics analyses.
Re-use vs Specific
Currently there is little re-use of bio-ontologies - this is partly because of
the diversity of their representational form, the explicitness of their semantics and the range of applications they address. OIL moves us further
forward to a common representational language. As the number of bio-ontologies
increases, it will be interesting to see whether there is a growth in the re-use of ontology. The use of ontology in annotation could drive this process, as well as
that of ontology in analysis. An open issue in ontology re-use is the evolution of
the source ontology once it has been re-used in another ontology. If the original ontology changes, should the changes be reflected where it is re-used and how
would this evolution be managed?
Tools and Libraries
The frame-based Protégé ontology development tool [42] is currently being adapted to represent ontologies in OIL, so that we can build and deliver frame-based
ontologies whilst gaining from the reasoning services offered by a DL. This
may be less important with small local ontologies designed by one expert, but
becomes important for large, collaboratively developed ontologies that are intended to be re-used and shared. Libraries of ontologies, such as those held by
WebOnto and Ontolingua, must be developed if re-use is to be promoted.
Methodologies for constructing ontologies
The process of building an ontology, as described in Section 5, is a high-cost
process. The reality is that the construction of ontologies is an art rather than a
science. Methodologies (supported by tools) are essential to: help the developer spot a concept; modularise their ontologies; avoid problems such as over-elaboration
(when should I stop elaborating the ontology?); ensure relevance
(when is a concept relevant for an application?) and verify the ontology for its fitness for purpose and its re-usability (if any).
If the application genuinely needs an ontology and that ontology will be long lived, then
the investment may well be worthwhile. Like many technologies, in a discipline such as
bioinformatics, it is the community effort that is important in making the use of that technology productive.
Acknowledgements: Robert Stevens is supported by a grant from the BBSRC/EPSRC
under the bioinformatics initiative (34/BIO12090); Sean Bechhofer is supported by a
grant from the EPSRC under the DIM initiative (GR/M/75426).
Bibliography
1
P.G. Baker, A. Brass, S. Bechhofer, C. Goble, N. Paton, and R. Stevens.
TAMBIS: Transparent Access to Multiple Bioinformatics Information Sources. An Overview.
In Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology (ISMB'98), pages 25-34, Menlo Park, California, June 28-July 1 1998. AAAI Press.
2
S. Schulze-Kremer.
Ontologies for Molecular Biology.
In Proceedings of the Third Pacific Symposium on Biocomputing, pages 693-704. AAAI Press, 1998.
3
S. Blackburn.
The Oxford Dictionary of Philosophy.
Oxford University Press, 1996.
4
D. B. Lenat.
Cyc: A Large-Scale Investment in Knowledge Infrastructure.
Communications of the ACM, 38(11):32-38, 1995.
5
M. Uschold, M. King, S. Moralee, and Y. Zorgios.
The Enterprise Ontology.
The Knowledge Engineering Review, 13(1):31-89, 1998. Special Issue on Putting Ontologies to Use.
6
T.R. Gruber.
Towards Principles for the Design of Ontologies Used for Knowledge Sharing.
In Nicola Guarino and Roberto Poli, editors, International Workshop on Formal Ontology, Padova, Italy, 1993.
Available as technical report KSL-93-04, Knowledge Systems Laboratory, Stanford University: ftp.ksl.ftanford.edu/pub/KSL_Reports/KSL-983-04.ps.
7
M. Winston, R. Chaffin, and D. Herrmann.
A Taxonomy of Part-Whole Relations.
Cognitive Science, 11:417-444, 1987.
8
J.J. Odell.
Six Different Kinds of Aggregation, pages 139-149.
Cambridge University Press, 1998.
9
R. J. Brachman, D. L. McGuinness, P. F. Patel-Schneider, L. A. Resnick, and A. Borgida.
Living with Classic: When and How to Use a KL-ONE-like Language.
In J. Sowa, editor, Principles of Semantic Networks: Explorations in the Representation of Knowledge, pages 401-456. Morgan Kaufmann, 1991.
10
R. Jasper and M. Uschold.
A Framework for Understanding and Classifying Ontology Applications.
In Twelfth Workshop on Knowledge Acquisition Modeling and Management KAW'99, 1999.
Published on-line http://sern.ucalgary.ca/KSI/KAW/KAW99/ .
11
Nicola Guarino and C. Welty.
Identity, Unity, and Individuality: Towards a Formal Toolkit for Ontological Analysis.
In W. Horn, editor, Proceedings of ECAI-2000: The European Conference on Artificial Intelligence, Amsterdam, August 2000. IOS Press.
12
International Union of Biochemistry.
Enzyme Nomenclature 1984: Recommendations of the Nomenclature Committee of the International Union of Biochemistry on the Nomenclature and Classification of Enzyme-Catalyzed Reactions.
Academic Press (for the International Union of Biochemistry), Orlando, FL, 1984.
13
R.O. Chen, R. Felciano, and R.B. Altman.
RiboWeb: Linking Structural Computations to a Knowledge Base of Published Experimental Data.
In Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology, pages 84-87. AAAI Press, 1997.
14
R. Altman, M. Bada, X.J. Chai, M. Whirl Carillo, R.O. Chen, and N.F. Abernethy.
RiboWeb: An Ontology-Based System for Collaborative Molecular Biology.
IEEE Intelligent Systems, 14(5):68-76, 1999.
15
P. Karp and S. Paley.
Integrated Access to Metabolic and Genomic Data.
Journal of Computational Biology, 3(1):191-212, 1996.
16
P. Karp, M. Riley, S. Paley, A. Pellegrini-Toole, and M. Krummenacker.
EcoCyc: Electronic Encyclopedia of E. coli Genes and Metabolism.
Nucleic Acids Research, 27(1):55-58, 1999.
17
M. Riley.
Functions of the gene products of Escherichia coli.
Microbiological Reviews, 57:862-952, 1993.
18
The FlyBase Consortium.
The FlyBase database of Drosophila Genome Projects and Community Literature.
Nucleic Acids Research, 27(1):85-88, 1999. (http://flybase.bio.indiana.edu/).
19
P.G. Baker, C.A. Goble, S. Bechhofer, N.W. Paton, R. Stevens, and A. Brass.
An Ontology for Bioinformatics Applications.
Bioinformatics, 15(6):510-520, 1999.
20
N.W. Paton, R.D. Stevens, P.G. Baker, C.A. Goble, S. Bechhofer, and A. Brass.
Query Processing in the TAMBIS Bioinformatics Source Integration System.
In Z.M. Ozsoyoglo et al., editors, Proc. 11th Int. Conf. on Scientific and Statistical Database Management (SSDBM), pages 138-147, Los Alamitos, California, July 1999. IEEE Press.
21
M. Uschold and M. Gruninger.
Ontologies: Principles, Methods and Applications.
Knowledge Engineering Review, 11(2):93-113, June 1996.
22
M. Fernandez, A. Gomez-Perez, and N. Juristo.
METHONTOLOGY: From Ontological Art to Ontological Engineering.
In Workshop on Knowledge Engineering: Spring Symposium Series (AAAI'97), pages 33-40, Menlo Park, Ca, 1997. AAAI Press.
23
M. Gruninger and M. S. Fox.
Methodology for the Design and Evaluation of Ontologies.
In IJCAI Workshop on Basic Ontological Issues in Knowledge Sharing, 1995.
Published on-line by Ceur Publication http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-18/ .
24
D.M. Jones, T.J.M. Bench-Capon, and P.R.S. Visser.
Methodologies for Ontology Development.
In J. Cuena, editor, Proc. ITi and KNOWS Conference of the 15th IFIP World Computer Congress, pages 62-75, London, UK, 1998. Chapman and Hall Ltd.
25
A. Gomez-Perez.
Some Ideas and Examples to Evaluate Ontologies.
Technical Report KSL-94-65, Knowledge Systems Laboratory, Stanford, 1994.
26
M.A. Ould.
Strategies for Software Engineering: The Management of Risk and Quality.
Chichester: Wiley, 1990. (Wiley series in software engineering practice.)
27
A.L. Rector, J.E. Rogers, and P. Pole.
The Galen high level ontology.
Studies in Health Technology and Informatics, 34:174-178, 1996.
28
J. Sowa.
Top-level ontological categories.
International Journal of Human-Computer Studies, 43(5/6):669-686, 1995.
29
D.A. Duce and G.A. Ringland.
Approaches to Knowledge Representation: An Introduction.
Knowledge-Based and Expert Systems Series. John Wiley, Chichester, 1988.
30
I. Horrocks, D. Fensel, J. Broekstra, M. Crubezy, S. Decker, M. Erdmann, W. Grosso, C. Goble, F. Van Harmelen, M. Klein, M. Musen, S. Staab, and R. Studer.
The ontology interchange language oil: The grease between ontologies.
http://www.cs.vu.nl/~dieter/oil.
31
A. Farquhar, R. Fikes, and J.P. Rice.
The ontolingua server: A tool for collaborative ontology construction.
Journal of Human-Computer Studies, 46:707-728, 1997.
32
V.K. Chaudhri, A. Farquhar, R. Fikes, P.D. Karp, and J.P. Rice.
OKBC: A programmatic foundation for knowledge base interoperability.
In Proc of 15th National Conf on AI (AAAI-98) and the 10th Conf on Innovative Applications of AI (IAAI-98), pages 600-607, Menlo Park, Ca, 1998. AAAI Press.
33
A. Borgida.
Description Logics in Data Management.
IEEE Trans Knowledge and Data Engineering, 7(5):671-782, 1995.
34
W. A. Woods and J. G. Schmolze.
The KL-ONE Family.
Computers Math. Applic., 23(2-5):133-177, 1992.
35
S. Bechhofer and C.A. Goble.
Delivering Terminological Services.
AI*IA Notizie, Periodico dell'Associazione Italiana per l'intelligenza Artificiale, 12(1), March 1999.
36
I. Horrocks.
Using an Expressive Description Logic: FaCT or Fiction?
In A.G. Cohn, L.K. Schubert, and S.C. Shapiro, editors, Principles of Knowledge Representation and Reasoning: Proceedings of the Sixth International Conference (KR'98). Morgan Kaufmann Publishers, San Francisco, CA, 1998.
37
A. L. Rector, S. K. Bechhofer, C. A. Goble, I. Horrocks, W. A. Nowlan, and W. D. Solomon.
The GRAIL Concept Modelling Language for Medical Terminology.
Artificial Intelligence in Medicine, 9:139-171, 1996.
38
J.E. Rogers, W.D. Solomon, A.L. Rector, P.M. Pole, P. Zanstra, and E. van der Haring.
Rubrics to Dissections to GRAIL to Classifications.
In Medical Informatics Europe '97, pages 241-245, Amsterdam, 1997. IOS Press Vol 43.
39
P.D. Karp, V.K. Chaudhri, and S.M. Paley.
A Collaborative Environment for Authoring Large Knowledge Bases.
Journal of Intelligent Information Systems, 13(3):155-194, 1999.
40
J. Domingue.
Tadzebao and WebOnto: Discussing, Browsing, and Editing Ontologies on the Web.
In 11th Knowledge Acquisition for Knowledge-Based Systems Workshop, 1998.
41
A. J. Duineveld, R. Stoter, M.R. Weiden, B. Kenepa, and V. R. Benjamins.
Wondertools? A Comparative Study of Ontological Engineering Tools.
In Twelfth Workshop on Knowledge Acquisition, Modeling and Management, 1999.
Published on-line http://sern.ucalgary.ca/KSI/KAW/KAW99/ .
42
W. E. Grosso, H. Eriksson, R. W. Fergerson, J. H. Gennari, S. W. Tu, and M. A. Musen.
Knowledge Modeling at the Millennium (The Design and Evolution of Protégé-2000).
Technical Report SMI-1999-0801, Stanford Medical Informatics (SMI), Stanford University School of Medicine, 1999.
[Online] at: http://www-smi.stanford.edu/pubs/SMI_Reports/SMI-1999-0801.pdf.
SBO
From Wikipedia, the free encyclopedia
SBO is the Systems Biology Ontology project, another cornerstone of the BioModels.net
effort. The goal of SBO is to develop controlled vocabularies and ontologies tailored
specifically for the kinds of problems being faced in systems biology, especially in the context of computational modeling.
Contents
• 1 Motivation
• 2 Structure
• 3 Resources
• 4 SBO and SBML
• 5 Organization of SBO development
• 6 Funding for SBO
• 7 External references
Motivation
The rise of Systems Biology, seeking to comprehend biological processes as a whole, highlighted the need not only to develop corresponding quantitative models, but also to
create standards allowing their exchange and integration. This concern drove the
community to design common data formats such as SBML and CellML. SBML is now largely accepted and used in the field. However, as important as the definition of a
common syntax is, it is also necessary to make clear the semantics of models. SBO is an
attempt to provide the means of annotating models with terms that indicate the intended
semantics of an important subset of models in common use in computational systems biology.
Structure
SBO is currently made up of five different vocabularies: quantitative parameters
(catalytic constant, thermodynamic temperature ...), participant type (substrate, product,
catalyst ...), modelling frameworks (discrete, continuous ...), mathematical expressions and events.
Resources
The Systems Biology Ontology Browser
To curate and maintain SBO, a dedicated resource has been developed; the public
interface of the SBO browser can be accessed at http://www.ebi.ac.uk/sbo. A relational
database management system (MySQL) at the back end is accessed through a web interface based on Java Server Pages (JSP) and JavaBeans. Its content is encoded in
UTF-8, and therefore supports a large set of characters in the definitions of terms.
Distributed curation is made possible by a custom-tailored locking system allowing concurrent access. This system allows continuous update of the ontology with
immediate availability and suppresses merging problems.
Several export formats (OBO flat file, SBO-XML and OWL) are generated daily or on request and can be downloaded from the web interface.
To allow programmatic access to the resource, Web Services have been implemented
based on Apache Axis for the communication layer and Castor for the validation. The
libraries, full documentation, samples and tutorial are available online.
The SourceForge project can be accessed at http://sourceforge.net/projects/sbo/.
SBO and SBML
SBML Level 2 Version 2 provides a mechanism to annotate model components with SBO terms, thereby increasing the semantics of the model beyond the sole topology of
interaction and mathematical expression. Simulation tools can check the consistency of a rate law, convert reactions from one modelling framework to another (e.g., continuous to
discrete), or distinguish between identical mathematical expressions based on different
assumptions (e.g., Henri-Michaelis-Menten vs. Briggs-Haldane). Other tools such as SBMLmerge can use the SBO annotation to integrate individual models into a larger one.
The use of SBO is not restricted to the development of models. Resources providing
quantitative experimental information such as SABIO Reaction Kinetics will be able to
annotate the parameters (what do they mean exactly, how were they calculated) and
determine relationships between them.
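As a rough illustration of how a tool might read such annotations, the sketch below collects sboTerm attributes from a stripped-down, SBML-like fragment. The fragment omits the SBML namespace and most required structure, the parameter names are invented, and the SBO identifiers are placeholders that have not been checked against the ontology; only the sboTerm attribute itself is taken from SBML Level 2 Version 2.

```python
import xml.etree.ElementTree as ET

# Simplified, SBML-like fragment (assumption: not a complete or valid SBML file).
# The SBO identifiers are placeholders, not verified SBO terms.
FRAGMENT = """
<model id="example">
  <listOfParameters>
    <parameter id="Km" value="0.5" sboTerm="SBO:0000001"/>
    <parameter id="kcat" value="2.0" sboTerm="SBO:0000002"/>
    <parameter id="unlabelled" value="1.0"/>
  </listOfParameters>
</model>
"""


def sbo_annotations(xml_text):
    """Map each element id to its sboTerm, skipping elements without one."""
    root = ET.fromstring(xml_text)
    return {
        element.get("id"): element.get("sboTerm")
        for element in root.iter()
        if element.get("sboTerm") is not None
    }


annotations = sbo_annotations(FRAGMENT)
print(annotations)
```

A consumer such as a simulation or merging tool could then look up each collected identifier in SBO to decide what a parameter means, rather than relying on its name.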
Organization of SBO development
SBO is built in collaboration by the Computational Neurobiology Group (Nicolas Le
Novère, EMBL-EBI, United Kingdom) and the SBML Team (Michael Hucka, Caltech,
USA).
Funding for SBO
SBO has benefited from the funds of the European Molecular Biology Laboratory and the
National Institute of General Medical Sciences.
External references
• www.biomodels.net
• The Systems Biology Markup Language (SBML): A Medium for Representation and Exchange of Biochemical Network Models
• CellML: its future, present and past.
![Page 42: OntologyGSK](https://reader031.vdocument.in/reader031/viewer/2022021213/577d27b21a28ab4e1ea493ed/html5/thumbnails/42.jpg)
8/6/2019 OntologyGSK
http://slidepdf.com/reader/full/ontologygsk 42/57
the Gene Ontology
An Introduction to the Gene Ontology
• What does the Gene Ontology Consortium do?
• Terms in the Gene Ontology
• Species-specific terms
• Obsolete terms
• The Ontologies
• Cellular component
• Biological process
• Molecular function
• Ontology structure
• Topology
• Term-Term Relationships
• Relationship Transitivity
• What GO is NOT
• Annotation and tools
• Downloads
• Beyond GO
• Cross-products
• Mappings to other classification systems
• Contributing to GO
What does the Gene Ontology Consortium do?
Biologists currently waste a lot of time and effort in searching for all of the available information about each small area of research. This is hampered further by the wide
variations in terminology that may be in common usage at any given time, which inhibit
effective searching by both computers and people. For example, if you were searching for new targets for antibiotics, you might want to find all the gene products that are involved
in bacterial protein synthesis, and that have significantly different sequences or structures
from those in humans. If one database describes these molecules as being involved in
'translation', whereas another uses the phrase 'protein synthesis', it will be difficult for you, and even harder for a computer, to find functionally equivalent terms.
The Gene Ontology (GO) project is a collaborative effort to address the need for
consistent descriptions of gene products in different databases. The project began as a collaboration between three model organism databases, FlyBase (Drosophila), the
Saccharomyces Genome Database (SGD) and the Mouse Genome Database (MGD),
in 1998. Since then, the GO Consortium has grown to include many databases, including
several of the world's major repositories for plant, animal and microbial genomes. See the
GO Consortium page for a full list of member organizations.
The GO project has developed three structured controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular
components and molecular functions in a species-independent manner. There are three separate aspects to this effort: first, the development and maintenance of the ontologies
themselves; second, the annotation of gene products, which entails making associations between the ontologies and the genes and gene products in the collaborating databases;
and third, the development of tools that facilitate the creation, maintenance and use of
ontologies.
The use of GO terms by collaborating databases facilitates uniform queries across them. The controlled vocabularies are structured so that they can be queried at different levels:
for example, you can use GO to find all the gene products in the mouse genome that are
involved in signal transduction, or you can zoom in on all the receptor tyrosine kinases.
This structure also allows annotators to assign properties to genes or gene products at different levels, depending on the depth of knowledge about that entity.
Terms in the Gene Ontology
The building blocks of the Gene Ontology are the terms, so what makes up a GO term?
Each entry in GO has a unique numerical identifier of the form GO:nnnnnnn, and a term
name, e.g. cell, fibroblast growth factor receptor binding or signal transduction. Each
term is also assigned to one of the three ontologies, molecular function, cellular component or biological process.
The majority of terms have a textual definition, with references stating the source of the
definition. If any clarification of the definition or remarks about term usage are required,
these are held in a separate comments field.
Many GO terms have synonyms; GO uses 'synonym' in a loose sense, as the names within the synonyms field may not mean exactly the same as the term they are attached
to. Instead, a GO synonym may be broader or narrower than the term string; it may be a
related phrase; it may be alternative wording, spelling or use a different system of
nomenclature; or it may be a true synonym. This flexibility allows GO synonyms to serve as valuable search aids, as well as being useful for applications such as text mining and
semantic matching. The relationship of the synonym to the term is recorded within the GO file.
The scope of the Gene Ontology overlaps with a number of other databases, and in cases
where a GO term is identical in meaning to an object in another database, a database
cross reference is added to the term. These cross references can also be downloaded from
the mappings to GO page.
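The parts of a term described above can be pictured as a small record type. The following is only a sketch: the class name, field names and the four synonym scope labels are assumptions made for illustration, not the layout of the GO file, and the synonym shown is a made-up placeholder. Only the identifier pattern, the three ontology names and the chromosome term come from the text.

```python
import re
from dataclasses import dataclass, field

GO_ID = re.compile(r"^GO:\d{7}$")  # the GO:nnnnnnn form described above
ONTOLOGIES = {"molecular function", "cellular component", "biological process"}
SYNONYM_SCOPES = {"exact", "broad", "narrow", "related"}  # assumed labels


@dataclass
class GoTerm:
    """Hypothetical in-memory view of one GO term."""
    go_id: str
    name: str
    ontology: str
    definition: str = ""
    comment: str = ""
    synonyms: dict = field(default_factory=dict)  # synonym text -> scope
    xrefs: list = field(default_factory=list)     # database cross references

    def __post_init__(self):
        # Reject records that break the constraints stated in the text.
        if not GO_ID.match(self.go_id):
            raise ValueError(f"not a GO identifier: {self.go_id!r}")
        if self.ontology not in ONTOLOGIES:
            raise ValueError(f"unknown ontology: {self.ontology!r}")
        for scope in self.synonyms.values():
            if scope not in SYNONYM_SCOPES:
                raise ValueError(f"unknown synonym scope: {scope!r}")


term = GoTerm(
    go_id="GO:0005694",
    name="chromosome",
    ontology="cellular component",
    synonyms={"placeholder alternative wording": "related"},
)
print(term.go_id, term.name)
```

Recording a scope with each synonym keeps the loose sense of 'synonym' usable for search while still letting stricter applications filter for true synonyms.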
Species-specific terms
The Gene Ontology aims to provide a controlled vocabulary that can be used to describe any organism; nevertheless, many functions, processes and components are not common to all life forms. The convention is to include any term that can apply to more than one
taxonomic class of organism. To specify the class of organisms to which a term is
applicable, GO uses the designator sensu, 'in the sense of'; for example, trichome differentiation (sensu Magnoliophyta) represents the differentiation of plant hair cells
(trichomes).
Obsolete terms
Occasionally, a term is found that is outside the scope of GO, is misleadingly named or
defined, or describes a concept that would be better represented in another way. Rather than delete the term, it is deprecated or made obsolete. The term and ID still exist in the GO database, but the term is marked as obsolete, and a comment is often added, giving a
reason for the obsoletion. A replacement term is usually also suggested.
The Ontologies
The three organizing principles of GO are cellular component, biological process and molecular function. A gene product might be associated with or located in one or more
cellular components; it is active in one or more biological processes, during which it
performs one or more molecular functions. For example, the gene product cytochrome c can be described by the molecular function term oxidoreductase activity, the biological
process terms oxidative phosphorylation and induction of cell death, and the cellular
component terms mitochondrial matrix and mitochondrial inner membrane.
Cellular component
A cellular component is just that, a component of a cell, but with the proviso that it is part of some larger object; this may be an anatomical structure (e.g. rough endoplasmic
reticulum or nucleus) or a gene product group (e.g. ribosome, proteasome or a protein
dimer). See the documentation on the cellular component ontology for more details.
Biological process
A biological process is a series of events accomplished by one or more ordered assemblies
of molecular functions. Examples of broad biological process terms are cellular physiological process or signal transduction. Examples of more specific terms are
pyrimidine metabolic process or alpha-glucoside transport. It can be difficult to
distinguish between a biological process and a molecular function, but the general rule is
that a process must have more than one distinct step.
A biological process is not equivalent to a pathway; at present, GO does not try to
represent the dynamics or dependencies that would be required to fully describe a pathway.
Further information can be found in the process ontology documentation.
Molecular function
Molecular function describes activities, such as catalytic or binding activities, that occur at the molecular level. GO molecular function terms represent activities rather than the
entities (molecules or complexes) that perform the actions, and do not specify where or
when, or in what context, the action takes place. Molecular functions generally
correspond to activities that can be performed by individual gene products, but some activities are performed by assembled complexes of gene products. Examples of broad
functional terms are catalytic activity, transporter activity, or binding; examples of
narrower functional terms are adenylate cyclase activity or Toll receptor binding.
It is easy to confuse a gene product name with its molecular function, and for that reason
many GO molecular functions are appended with the word "activity". The documentation
on gene products explains this confusion in more depth. The documentation on the
function ontology explains more about GO functions and the rules governing them.
Ontology structure
Topology
The ontologies are structured as directed acyclic graphs, which are similar to hierarchies
but differ in that a more specialized term (child) can be related to more than one less
specialized term (parent). For example, the biological process term hexose biosynthetic process has two parents, hexose metabolic process and monosaccharide biosynthetic
process. This is because biosynthetic process is a type of metabolic process and a hexose
is a type of monosaccharide. When any gene involved in hexose biosynthetic process is
annotated to this term, it is automatically annotated to both hexose metabolic process and monosaccharide biosynthetic process.
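The propagation of annotations up a directed acyclic graph can be sketched in a few lines. This is a minimal illustration, not how any GO tool is implemented: the dictionary layout, the function names and the gene name geneX are all assumptions, and only the three hexose terms come from the example above.

```python
# child term -> list of parent terms (a term may have several parents)
PARENTS = {
    "hexose biosynthetic process": [
        "hexose metabolic process",
        "monosaccharide biosynthetic process",
    ],
    "hexose metabolic process": [],
    "monosaccharide biosynthetic process": [],
}


def ancestors(term, parents):
    """Return every term reachable by following parent links from `term`."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def propagate(annotations, parents):
    """Extend each gene's direct annotations with all ancestor terms."""
    result = {}
    for gene, terms in annotations.items():
        inherited = set(terms)
        for term in terms:
            inherited |= ancestors(term, parents)
        result[gene] = inherited
    return result


# geneX is a hypothetical gene annotated directly to the most specific term.
full = propagate({"geneX": {"hexose biosynthetic process"}}, PARENTS)
print(sorted(full["geneX"]))
```

Because the `seen` set prevents revisiting a term, a term reachable through several parents is still collected only once.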
Term-Term Relationships
GO terms can be linked by five types of relationships: is_a, part_of, regulates,
positively_regulates and negatively_regulates.
is_a
The is_a relationship is a simple class-subclass relationship, where A is_a B means that A
is a subclass of B; for example, nuclear chromosome is_a chromosome.
GO:0043232 : intracellular non-membrane-bound organelle
[i] GO:0005694 : chromosome
---[i] GO:0000228 : nuclear chromosome
part_of
The part_of relationship is slightly more complex; C part_of D means that whenever C is
present, it is always a part of D, but C does not always have to be present. An example
would be periplasmic flagellum part_of periplasmic space:
GO:0044464 : cell part
[i] GO:0042995 : cell projection
---[i] GO:0019861 : flagellum
------[i] GO:0009288 : flagellin-based flagellum
---------[i] GO:0055040 : periplasmic flagellum
[i] GO:0042597 : periplasmic space
---[p] GO:0055040 : periplasmic flagellum
When a periplasmic flagellum is present, it is always part_of a periplasmic space.
However, every periplasmic space does not necessarily have a periplasmic flagellum.
regulates, positively_regulates and negatively_regulates
The regulates, positively_regulates and negatively_regulates relationships describe interactions between biological processes and other biological processes, molecular
functions or biological qualities. When a biological process E regulates a function or a
process F, it modulates the occurrence of F. If F is a biological quality, then E modulates the value of F. An example of the regulation of a biological process would be the term
regulation of transcription. When regulation of transcription occurs, it always alters the
rate, extent or frequency at which a gene is transcribed.
Relationship Transitivity
is_a and part_of
The is_a and part_of relationships are transitive, which means that the relationships are propagated from child terms to parent terms. An example of is_a transitivity is shown
in the nuclear chromosome example previously used:
GO:0043232 : intracellular non-membrane-bound organelle
[i] GO:0005694 : chromosome
---[i] GO:0000228 : nuclear chromosome
All nuclear chromosomes must be intracellular non-membrane-bound organelles.
An example of part_of transitivity is shown below:
GO:0048869 : cellular developmental process
[i] GO:0030154 : cell differentiation
---[p] GO:0048468 : cell development
------[p] GO:0000904 : cellular morphogenesis during differentiation
Every occurrence of cellular morphogenesis during differentiation must be a part of an
occurrence of cell differentiation.
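Transitivity of a single relationship type amounts to taking a transitive closure over its edges. The sketch below does this separately for is_a and for part_of, using only the term pairs from the two examples above. The function name and the edge representation are assumptions, and the sketch deliberately does not combine the two relationship types with each other.

```python
# (child, parent) pairs taken from the examples above
IS_A = {
    ("nuclear chromosome", "chromosome"),
    ("chromosome", "intracellular non-membrane-bound organelle"),
}
PART_OF = {
    ("cellular morphogenesis during differentiation", "cell development"),
    ("cell development", "cell differentiation"),
}


def closure(edges):
    """Transitive closure of one relationship type, as a set of pairs."""
    inferred = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(inferred):
            for c, d in list(inferred):
                # a -> b and b -> d together imply a -> d
                if b == c and (a, d) not in inferred:
                    inferred.add((a, d))
                    changed = True
    return inferred


is_a_closure = closure(IS_A)
part_of_closure = closure(PART_OF)
print(len(is_a_closure), len(part_of_closure))
```

The repeated pairwise scan is quadratic per pass and is chosen for readability; a real ontology with tens of thousands of terms would call for a graph traversal instead.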
regulates, positively_regulates and negatively_regulates
The regulates relationships are transitive over both the part_of and is_a relationships.
GO:0010467 : gene expression
[r] GO:0010468 : regulation of gene expression
---[i] GO:0045449 : regulation of transcription
[p] GO:0006350 : transcription
---[r] GO:0045449 : regulation of transcription
part_of transitivity: If process Y exists in the GO biological process ontology and it is a part_of child of process X then any process that regulates process Y also regulates
process X.
In the example above, regulation of transcription regulates transcription which is part_of
gene expression. Therefore, regulation of transcription also regulates gene expression.
is_a transitivity: If process B exists in the GO biological process ontology and it is an
is_a child of process A then any process that regulates process B also regulates process
A.
In the example above, regulation of transcription is_a form of regulation of gene expression, which regulates gene expression. Therefore, regulation of transcription also
regulates gene expression.
Transitivity of regulates
The regulates relationship is transitive over both the is_a and part_of relationships.
is_a transitivity: If process B exists in the GO biological process ontology and it is an
is_a child of process A then any process that regulates process B also regulates process
A. For example:
GO:0016049 : cell growth
[i] GO:0042815 : bipolar cell growth
---[r] GO:0051516 : regulation of bipolar cell growth
Due to is_a transitivity, we can say that any process that regulates bipolar cell growth
also regulates cell growth.
part_of transitivity: If process Y exists in the GO biological process ontology and it is a
part_of child of process X then any process that regulates process Y also regulates process X.
GO:0001754 : eye photoreceptor cell differentiation
[p] GO:0042462 : eye photoreceptor cell development
---[r] GO:0042478 : regulation of eye photoreceptor cell development
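The two rules for regulates can be read as a single inference step: whenever a process regulates a target, it also regulates every is_a or part_of parent of that target. The sketch below applies that step repeatedly to term pairs taken from the examples in this section. The function name and data layout are assumptions for illustration.

```python
# (regulator, target) and (child, parent) pairs from the examples above
REGULATES = {
    ("regulation of transcription", "transcription"),
    ("regulation of bipolar cell growth", "bipolar cell growth"),
    ("regulation of eye photoreceptor cell development",
     "eye photoreceptor cell development"),
}
IS_A = {("bipolar cell growth", "cell growth")}
PART_OF = {
    ("transcription", "gene expression"),
    ("eye photoreceptor cell development",
     "eye photoreceptor cell differentiation"),
}


def inferred_regulates(regulates, is_a, part_of):
    """Extend regulates pairs upwards over is_a and part_of parents."""
    parents = {}
    for child, parent in is_a | part_of:
        parents.setdefault(child, set()).add(parent)

    result = set(regulates)
    frontier = list(regulates)
    while frontier:
        regulator, target = frontier.pop()
        for parent in parents.get(target, ()):
            pair = (regulator, parent)
            if pair not in result:
                result.add(pair)
                frontier.append(pair)  # keep climbing from the new target
    return result


all_regulates = inferred_regulates(REGULATES, IS_A, PART_OF)
for regulator, target in sorted(all_regulates):
    print(regulator, "regulates", target)
```

Pushing each newly inferred pair back onto the frontier is what lets the inference climb through several levels of parents rather than stopping after one.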
Every GO term must obey the true path rule: if the child term describes the gene product,
then all its parent terms must also apply to that gene product.
What GO is NOT
It is important to clearly state the scope of GO, and what it does and does not cover. The ontologies section explains the domains covered by GO; the following areas are outside
the scope of GO, and terms in these domains would not appear in the ontologies.
• Gene products: e.g. cytochrome c is not in the ontologies, but attributes of cytochrome c, such as oxidoreductase activity, are.
• Processes, functions or components that are unique to mutants or diseases: e.g.
oncogenesis is not a valid GO term because causing cancer is not the normal function of any gene.
• Attributes of sequence such as intron/exon parameters: these are not attributes of
gene products and will be described in a separate sequence ontology (see the
OBO website for more information).
• Protein domains or structural features.
• Protein-protein interactions.
• Environment, evolution and expression.
• Anatomical or histological features above the level of cellular components,
including cell types.
GO is not a database of gene sequences, nor a catalog of gene products. Rather, GO
describes how gene products behave in a cellular context.
GO is not a dictated standard, mandating nomenclature across databases. Groups participate because of self-interest, and cooperate to arrive at a consensus.
GO is not a way to unify biological databases (i.e. GO is not a 'federated solution').
Sharing vocabulary is a step towards unification, but is not, in itself, sufficient. Reasons
for this include the following:
• Knowledge changes and updates lag behind.
• Individual curators evaluate data differently. While we can agree to use the word
'kinase', we must also agree to support this by stating how and why we use
'kinase', and consistently apply it. Only in this way can we hope to compare gene products and determine whether they are related.
• GO does not attempt to describe every aspect of biology; its scope is limited to
the domains described above.
Annotation and tools
How do the terms in GO become associated with their appropriate gene products?
Collaborating databases annotate their genes or gene products with GO terms, providing
references and indicating what kind of evidence is available to support the annotations. More information can be found in the GO Annotation Guide.
If you browse any of the contributing databases, you'll find that each gene or gene product has a list of associated GO terms. Each database also publishes downloadable
files containing these associations; these can be downloaded from the GO annotations page. You can browse the ontologies using a range of web-based browsers. A full list of
these, and other tools for analyzing gene function using GO, is available on the GO Tools
section.
In addition, the GO consortium has prepared GO slims, 'slimmed down' versions of the ontologies that allow you to annotate genomes or sets of gene products to gain a
high-level view of gene functions. Using GO slims you can, for example, work out what
proportion of a genome is involved in signal transduction, biosynthesis or reproduction.
See the GO Slim Guide for more information.
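As a rough illustration of the proportion calculation described above, the following Python sketch maps each gene's annotations up to a small slim set and reports the fraction of genes per slim term. All data here are made up for the example: the gene names, the two "hypothetical ..." child terms, and the function names are assumptions, and real GO slims are distributed as files rather than built this way.

```python
from collections import Counter

# Hypothetical child -> parents map; a real ontology has many more levels.
PARENTS = {
    "hypothetical kinase cascade": ["signal transduction"],
    "hypothetical lipid biosynthesis": ["biosynthesis"],
}

# Slim terms taken from the categories named in the text.
SLIM = {"signal transduction", "biosynthesis", "reproduction"}

# Hypothetical direct annotations: gene -> annotated terms.
ANNOTATIONS = {
    "geneA": ["hypothetical kinase cascade"],
    "geneB": ["hypothetical lipid biosynthesis", "signal transduction"],
    "geneC": [],
    "geneD": ["hypothetical kinase cascade"],
}


def ancestors_and_self(term, parents):
    """Collect `term` and every term reachable by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def slim_proportions(annotations, parents, slim):
    """Fraction of genes that map to each slim term (each gene counted once)."""
    counts = Counter()
    for terms in annotations.values():
        hits = set()
        for term in terms:
            hits |= ancestors_and_self(term, parents) & slim
        counts.update(hits)
    total = len(annotations)
    return {term: counts[term] / total for term in slim}


proportions = slim_proportions(ANNOTATIONS, PARENTS, SLIM)
print(proportions["signal transduction"])  # 0.75
```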
Downloads
All data from the GO project is freely available. You can download the ontology data in a
number of different formats, including XML and MySQL, from the GO Downloads page.
For more information on the syntax of these formats, see the GO File Format Guide.
If you need lists of the genes or gene products that have been associated with a particular
GO term, the Current Annotations table tracks the number of annotations and provides links to the gene association files for each of the collaborating databases.
Beyond GO
GO allows us to annotate genes and their products with a limited set of attributes. For
example, GO does not allow us to describe genes in terms of which cells or tissues they're
expressed in, which developmental stages they're expressed at, or their involvement in disease. It is not necessary for GO to do these things because other ontologies are being
developed for these purposes. The GO consortium supports the development of other ontologies and makes its tools for editing and curating ontologies freely available. A list of freely available ontologies that are relevant to genomics and proteomics and are
structured similarly to GO can be found at the Open Biomedical Ontologies website. A
larger list, which includes the ontologies listed at OBO and also other controlled
vocabularies that do not fulfill the OBO criteria, is available at the Ontology Working Group section of the Microarray Gene Expression Data (MGED) Network site.
Cross-products
The existence of several ontologies will also allow us to create 'cross-products' that
maximize the utility of each ontology while avoiding redundancy. For example, by combining the developmental terms in the GO process ontology with a second ontology that describes Drosophila anatomical structures, we could create an ontology of fly
development. We could repeat this process for other organisms without having to clutter
up GO with large numbers of species-specific terms. Similarly, we could create an ontology of biosynthetic pathways by combining the biosynthesis terms in the GO
process ontology with a chemical ontology.
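The idea of a cross-product can be pictured as pairing every term from one vocabulary with every term from another. The Python sketch below does only that; the term lists and the naming pattern ("structure process") are assumptions chosen for illustration and do not reflect how any actual cross-product ontology is built or named.

```python
from itertools import product


def cross_product(process_terms, anatomy_terms):
    """Pair every process term with every anatomical term to form composite names."""
    return [
        f"{anatomy} {process}"
        for process, anatomy in product(process_terms, anatomy_terms)
    ]


# Hypothetical inputs: generic process terms and fly anatomical structures.
composites = cross_product(["development", "morphogenesis"], ["wing", "eye"])
print(composites)
# ['wing development', 'eye development', 'wing morphogenesis', 'eye morphogenesis']
```

Because the composite terms are generated from two independent lists, neither source vocabulary has to store the species-specific combinations itself.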
Mappings to other classification systems
GO is not the only attempt to build structured controlled vocabularies for genome
annotation, nor is it the only such series of catalogs in current use. The GO project
provides mappings between GO and these other systems, although we caution that these mappings are neither complete nor exact and should only be used as a guide.
Contributing to GO
The GO project is constantly evolving, and we welcome feedback from all users. If you
need a new term or definition, or would like to suggest that we reorganize a section of
one of the ontologies, please do so through the GO curator requests tracker. Any errors or omissions in annotations should be reported to the GO annotation mailing list.
Any other questions or suggestions should be addressed to the GO helpdesk.
Last modified Tuesday, 25-Mar-2008 09:40:13 PDT
Copyright © 1999-Tuesday, 20-May-2008 20:51:29 PDT the Gene Ontology
File Format Guide
The GO File Format Guide documents the structure and syntax of the GO files available on the GO website, to assist users who need to read, write parsers for, or create these
files.
See also the GO annotation file format guide for the format used in the gene association files.
• Anatomy of a GO Term
• Ontology Flat File Formats
• GO RDF-XML Format
• OWL Format
• OBO-XML Format
• MySQL Format
• FASTA Format
• Mappings to Other Classification Systems
Anatomy of a GO Term
Terms and unique identifiers
The structure of a GO term is very simple. At its bare minimum, each GO entry consists of a term name (e.g. cell) and a unique, zero-padded seven-digit identifier (or accession
number) prefixed by GO: (e.g. GO:0005623), which is used as a unique identifier and
database cross-reference. The same number range is used across all three ontologies. The
numeric portion of a GO ID does not have any 'meaning' or relation to the position of the term in the ontologies; instead, ranges of GO IDs are assigned to specific groups or
individual curators, so a GO ID can be used to trace who added a term.
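The identifier layout described above (the prefix GO: followed by seven zero-padded digits) is easy to check or generate mechanically. The following is a minimal Python sketch; the function and constant names are illustrative and are not part of any GO tool.

```python
import re

# "GO:" followed by exactly seven digits, e.g. GO:0005623.
GO_ID_PATTERN = re.compile(r"GO:\d{7}")


def is_valid_go_id(text):
    """Return True if `text` has the shape of a GO identifier."""
    return GO_ID_PATTERN.fullmatch(text) is not None


def format_go_id(number):
    """Build a GO identifier from its numeric portion, zero-padded to seven digits."""
    return f"GO:{number:07d}"


print(is_valid_go_id("GO:0005623"))  # True
print(is_valid_go_id("GO:5623"))     # False
print(format_go_id(5623))            # GO:0005623
```

Note that this only checks the shape of the string; it says nothing about whether the ID is actually assigned to a term.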
Secondary IDs
Terms may have one or more secondary IDs, alternate IDs that refer to the term. Secondary IDs come about when two or more terms are identical in meaning, and are
merged into a single term. All term IDs are preserved so that no information (for
example, annotations to the merged IDs) is lost. More information on the protocols involved can be found in the documentation on term merges.
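One way to picture how secondary IDs keep old annotations usable is a lookup table that resolves any ID, primary or secondary, to the surviving primary ID. The Python sketch below assumes a simple dictionary layout; the secondary ID `GO:9999999` is invented purely for the example, and the function names are hypothetical.

```python
def build_id_index(terms):
    """Map every primary and secondary ID to its primary ID.

    `terms` maps a primary ID to the list of its secondary IDs.
    """
    index = {}
    for primary_id, secondary_ids in terms.items():
        index[primary_id] = primary_id
        for secondary_id in secondary_ids:
            index[secondary_id] = primary_id
    return index


# GO:0005623 is the example ID from the text; GO:9999999 is a made-up secondary ID.
TERMS = {"GO:0005623": ["GO:9999999"]}
INDEX = build_id_index(TERMS)

print(INDEX["GO:9999999"])  # GO:0005623
```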
Synonyms
Any term may, but does not need to, include one or more synonyms (e.g. type I programmed cell death is a synonym of apoptosis). Synonyms are assigned a relationship
to the primary term string; see the documentation on synonyms for more information.
Database cross-references
Another optional extra is one or more general database cross-references (dbxrefs), which
refer to an identical object in another database. For instance, the molecular function term
retinal isomerase activity has the database cross reference EC:5.2.1.3, which is the accession number of this enzyme activity in the Enzyme Commission database. A
complete list of the database cross-references and database abbreviations used by GO
is available.
Definition and Comment
GO terms should be equipped with a text definition, which includes an indication of the
source of the definition. Terms may also have a comment, which gives more information
about the term and its usage.
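The parts of a GO term listed in this section can be summarized as a small record type. The Python dataclass below is only a sketch: the class and field names are assumptions made for illustration and do not correspond to any official GO schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GOTerm:
    """Illustrative container for the pieces of a GO term described above."""
    go_id: str                                               # e.g. "GO:0005623"
    name: str                                                # e.g. "cell"
    secondary_ids: List[str] = field(default_factory=list)   # IDs of merged terms
    synonyms: List[str] = field(default_factory=list)        # optional
    dbxrefs: List[str] = field(default_factory=list)         # optional, e.g. "EC:5.2.1.3"
    definition: Optional[str] = None                         # text definition
    definition_source: Optional[str] = None                  # where the definition came from
    comment: Optional[str] = None                            # optional usage notes


# Only the ID and the name are required; everything else defaults to empty.
cell = GOTerm(go_id="GO:0005623", name="cell")
print(cell.synonyms)  # []
```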
Ontology Flat File Formats
There are two types of ontology flat file format, the older GO flat file format and the
newer OBO flat file format. The GO flat file format is now deprecated but will continue
to be provided alongside the new format.
See also the Java OBO parser guide, which gives details of the OBO parser implemented
as part of OBO-Edit, and how to use it.
GO RDF-XML Format
The GO RDF-XML version of GO, which includes all three ontologies and the
definitions, can be downloaded from the GO database archive. The document type definition (DTD) is available from the GO FTP site.
The GO RDF-XML file is built from the flat files and the gene association files on a
monthly basis.
Here's a GO RDF-XML snapshot (with some lines wrapped for legibility):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE go:go>
<go:go xmlns:go="xml-dtd/go.dtd#"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <go:version timestamp="Wed May 9 23:55:02 2001" />
  <rdf:RDF>
    <go:term rdf:about="go#GO:0003673">
      <go:accession>GO:0003673</go:accession>
      <go:name>Gene_Ontology</go:name>
      <go:definition></go:definition>
    </go:term>
    <go:term rdf:about="go#GO:0003674">
      <go:accession>GO:0003674</go:accession>
      <go:name>molecular_function</go:name>
      <go:definition>The action characteristic of a gene product.</go:definition>
      <go:part-of rdf:resource="go#GO:0003673" />
      <go:dbxref>
        <go:database_symbol>go</go:database_symbol>
        <go:reference>curators</go:reference>
      </go:dbxref>
    </go:term>
    <go:term rdf:about="go#GO:0016209">
      <go:accession>GO:0016209</go:accession>
      <go:name>antioxidant</go:name>
      <go:definition></go:definition>
      <go:isa rdf:resource="go#GO:0003674" />
      <go:association>
        <go:evidence evidence_code="ISS">
          <go:dbxref>
            <go:database_symbol>fb</go:database_symbol>
            <go:reference>fbrf0105495</go:reference>
          </go:dbxref>
        </go:evidence>
        <go:gene_product>
          <go:name>CG7217</go:name>
          <go:dbxref>
            <go:database_symbol>fb</go:database_symbol>
            <go:reference>FBgn0038570</go:reference>
          </go:dbxref>
        </go:gene_product>
      </go:association>
      <go:association>
        <go:evidence evidence_code="ISS">
          <go:dbxref>
            <go:database_symbol>fb</go:database_symbol>
            <go:reference>fbrf0105495</go:reference>
          </go:dbxref>
        </go:evidence>
        <go:gene_product>
          <go:name>Jafrac1</go:name>
          <go:dbxref>
            <go:database_symbol>fb</go:database_symbol>
            <go:reference>FBgn0040309</go:reference>
          </go:dbxref>
        </go:gene_product>
      </go:association>
    </go:term>
  </rdf:RDF>
</go:go>
The basic unit of the GO RDF-XML database is GO:termid. Owing to limitations of the
XML id and idref attributes (for instance, multiple parentage cannot be represented), the
linking mechanism is RDF. RDF provides a much more flexible system for representing trees. To follow the links, note that term molecular function ; GO:0003674 has the
attribute
rdf:about="go#GO:0003674"
This is roughly equivalent to
id="go#GO:0003674"
In RDF, unique URLs are used as ids to make them universally unique. Now, note that term
antioxidant activity ; GO:0016209 has the tag
<go:isa rdf:resource="go#GO:0003674" />
This shows that its parent is molecular function ; GO:0003674. This tag represents the
relationship "GO:0016209 isa GO:0003674" or, in plain English, "antioxidant is a
molecular function". The other type of parentage relationship is go:part-of. molecular
function ; GO:0003674 has the tag
<go:part-of rdf:resource="go#GO:0003673" />
This shows the relationship "molecular function is part of the Gene Ontology".
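The parent links described above can be read out of the RDF-XML with a standard XML parser. The sketch below uses Python's built-in `xml.etree.ElementTree` on a trimmed copy of the snapshot (only two terms, without the XML declaration and DOCTYPE); the `parent_links` function name and the tuple layout it returns are choices made for this example, not part of any GO tooling.

```python
import xml.etree.ElementTree as ET

# Namespace URIs exactly as declared in the snapshot.
GO_NS = "xml-dtd/go.dtd#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Trimmed excerpt of the snapshot, reduced to the parentage tags.
SNIPPET = """\
<go:go xmlns:go="xml-dtd/go.dtd#"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:RDF>
    <go:term rdf:about="go#GO:0003674">
      <go:accession>GO:0003674</go:accession>
      <go:part-of rdf:resource="go#GO:0003673" />
    </go:term>
    <go:term rdf:about="go#GO:0016209">
      <go:accession>GO:0016209</go:accession>
      <go:isa rdf:resource="go#GO:0003674" />
    </go:term>
  </rdf:RDF>
</go:go>
"""


def parent_links(xml_text):
    """Return (child ID, relationship, parent ID) for every isa / part-of tag."""
    root = ET.fromstring(xml_text)
    links = []
    for term in root.iter(f"{{{GO_NS}}}term"):
        child = term.findtext(f"{{{GO_NS}}}accession")
        for relationship in ("isa", "part-of"):
            for link in term.findall(f"{{{GO_NS}}}{relationship}"):
                resource = link.get(f"{{{RDF_NS}}}resource")
                # The parent ID is the part of rdf:resource after the '#'.
                links.append((child, relationship, resource.split("#", 1)[1]))
    return links


for link in parent_links(SNIPPET):
    print(link)
# ('GO:0003674', 'part-of', 'GO:0003673')
# ('GO:0016209', 'isa', 'GO:0003674')
```

Because each term may carry several isa or part-of tags, the same loop would also pick up multiple parents for a single term.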
go_YYYYMM-schema-html
Designed for viewing with a web browser; does not contain full documentation.
Further documentation on the GO database can be found in the GO database guide.
FASTA Format
There is a FASTA version of the gene products in the database available from the
database archives.
Mappings to Other Classification Systems
Mappings of GO have been made to many other classification systems; a full list is
available on the Mappings to GO page. The syntax of these files is as follows:
The source of the external file is given in the line beginning !Uses:
!Uses:http://www.tigr.org/docs/tigr-scripts/egad_scripts/role_reports.spl, 15 aug 2000.
The line syntax for mappings is
external database:term identifier (id/name) > GO:GO term name ; GO:id
For example:
TIGR_role:11030 73 Amino acid biosynthesis Glutamate family > GO:glutamine family amino-acid biosynthesis ; GO:0009084
all on a single line. The relationship between terms from external systems to GO terms
can also be one to many, and these should just be added with a further >. For example:
MultiFun:1.5.1.18 Isoleucine/valine > GO:isoleucine biosynthesis ; GO:0009097 > GO:valine biosynthesis ; GO:0009099
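The line syntax shown above can be split apart with a few string operations. This Python sketch assumes that ">" never occurs inside a term name and that lines starting with "!" (such as the !Uses: line) are header lines to be skipped; `parse_mapping_line` and `parse_mapping_file` are hypothetical names, not functions from any GO software.

```python
def parse_mapping_line(line):
    """Split one mapping line into (external term, [(GO term name, GO ID), ...])."""
    external, *targets = [part.strip() for part in line.split(">")]
    mappings = []
    for target in targets:
        # Each target has the form "GO:term name ; GO:id".
        name_part, _, go_id = target.rpartition(";")
        name = name_part.strip()
        if name.startswith("GO:"):
            name = name[3:]
        mappings.append((name, go_id.strip()))
    return external, mappings


def parse_mapping_file(lines):
    """Parse all mapping lines, skipping blank lines and '!' header lines."""
    return [
        parse_mapping_line(line)
        for line in lines
        if line.strip() and not line.startswith("!")
    ]


example = (
    "MultiFun:1.5.1.18 Isoleucine/valine > GO:isoleucine biosynthesis ; "
    "GO:0009097 > GO:valine biosynthesis ; GO:0009099"
)
external, mappings = parse_mapping_line(example)
print(external)  # MultiFun:1.5.1.18 Isoleucine/valine
print(mappings)  # [('isoleucine biosynthesis', 'GO:0009097'), ('valine biosynthesis', 'GO:0009099')]
```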
If no equivalent GO term exists for a term from another classification system, GO:.
should be added as a mapping. For example: