ontologygsk


Upload: raazia-mir

Post on 07-Apr-2018


Page 1: OntologyGSK

8/6/2019 OntologyGSK

http://slidepdf.com/reader/full/ontologygsk 1/57

Ontology

From Wikipedia, the free encyclopedia

In philosophy, ontology (from the Greek ὄν, genitive ὄντος: of being (participle of εἶναι: to be) and -λογία: science, study, theory) is the most fundamental branch of metaphysics. It studies being or existence and their basic categories and relationships, to determine what entities and what types of entities exist. Ontology thus has strong implications for conceptions of reality.

Some philosophers, notably of the Platonic school, contend that all nouns refer to entities. Other philosophers contend that some nouns do not name entities but provide a kind of shorthand way of referring to a collection (of either objects or events). In this latter view, mind, instead of referring to an entity, refers to a collection of mental events experienced by a person; society refers to a collection of persons with some shared characteristics; and geometry refers to a collection of a specific kind of intellectual activity. Any ontology must give an account of which words refer to entities, which do not, why, and what categories result. When one applies this process to nouns such as electrons, energy, contract, happiness, time, truth, causality, and god, ontology becomes fundamental to many branches of philosophy.

Contents

• 1 Some basic questions

• 2 Concepts

• 3 Early history of ontology

• 4 Subject, relationship, object

• 5 Body and environment

• 6 Being

• 7 Social science

• 8 Prominent ontologists

• 9 See also

• 10 External links

Some basic questions

Ontology has one basic question: "What actually exists?" Different philosophers provide different answers to this question.

One common approach is to divide the extant entities into groups called "categories". However, these lists of categories are also quite different from one another. It is in this latter sense that ontology is applied to such fields as theology, library science and artificial intelligence.


Further examples of ontological questions include:

• What is existence?

• Is existence a property?

• Why does something exist rather than nothing?

• What constitutes the identity of an object?

• What is a physical object?

• What features are the essential, as opposed to merely accidental, attributes of a given object?

• Can one give an account of what it means to say that a physical object exists?

• What are an object's properties or relations and how are they related to the object itself?

• When does an object go out of existence, as opposed to merely changing?

Concepts

Quintessential ontological concepts include:

• Universals

• Substance

Early history of ontology

The concept of ontology is generally thought to have originated in early Greece and occupied Plato and Aristotle. While the etymology is Greek, the oldest extant record of the word itself is the Latin form ontologia, which appeared in 1606 in the work Ogdoas Scholastica by Jacob Lorhard (Lorhardus) and in 1613 in the Lexicon philosophicum by Rudolf Göckel (Goclenius). The first occurrence in English of "ontology" as recorded by the OED appears in Bailey’s dictionary of 1721, which defines ontology as ‘an Account of being in the Abstract’. However, its appearance in a dictionary indicates it was in use already at that time. It is likely the word was first used in its Latin form by philosophers based on the Latin roots, which themselves are based on the Greek.

Students of Aristotle first used the word 'metaphysica' (literally "after the physical") to refer to the work their teacher described as "the science of being qua being". The word 'qua' means 'in the capacity of'. According to this theory, then, ontology is the science of being inasmuch as it is being, or the study of beings insofar as they exist. Take anything you can find in the world, and look at it, not as a puppy or a slice of pizza or a folding chair or a president, but just as something that is. More precisely, ontology concerns determining what categories of being are fundamental and asks whether, and in what sense, the items in those categories can be said to "be".

Ontological questions have also been raised and debated by thinkers in the ancient civilizations of India and China, in some cases perhaps predating the Greek thinkers who have become associated with the concept.


Subject, relationship, object

"What exists", "What is", "What am I", "What is describing this to me", all exemplify

questions about being, and highlight the most basic problems in ontology: finding a subject, a relationship, and an object to talk about. During the Enlightenment the view of René Descartes that "cogito ergo sum" ("I think therefore I am") had generally prevailed, although Descartes himself did not believe the question worthy of any deep investigation. However, Descartes was very religious in his philosophy, and indeed argued that "cogito ergo sum" proved the existence of God. Later theorists would note the existence of the "Cartesian Other" (asking "who is reading that sentence about thinking and being?") and generally concluded that it must be God.

This answer, however, became increasingly unsatisfactory in the 20th century as the philosophy of mathematics and the philosophy of science and even particle physics explored some of the most fundamental barriers to knowledge about being. Sociological theorists, most notably George Herbert Mead and Erving Goffman, saw the Cartesian Other as a "Generalized Other," the imaginary audience that individuals use when thinking about the self. The Cartesian Other was also used by Freud, who saw the superego as an abstract regulatory force.

Body and environment

Schools of subjectivism, objectivism and relativism existed at various times in the 20th century, and the postmodernists and body philosophers tried to reframe all these questions in terms of bodies taking some specific action in an environment. This relied to a great degree on insights derived from scientific research into animals taking instinctive action in natural and artificial settings, as studied by biology, ecology, and cognitive science.

The processes by which bodies related to environments became of great concern, and the idea of being itself became difficult to really define. What did people mean when they said "A is B", "A must be B", "A was B"...? Some linguists advocated dropping the verb "to be" from the English language, leaving "E Prime", supposedly less prone to bad abstractions. Others, mostly philosophers, tried to dig into the word and its usage. Heidegger attempted to distinguish being and existence.

Being

Existentialism regards being as a fundamental central concept. It is anything that can be said to 'be' in various senses of the word 'be'. The verb "to be" has many different meanings and can therefore be rather ambiguous. Because "to be" has so many different meanings, there are, accordingly, many different ways of being. In systems theory, 'being' corresponds to the 'system state', and systems engineering (not system administration) is the engineering-grade ontology, which identifies for architects the existence of systems and defines their boundaries.


Social science

Social scientists adopt one of four main ontological approaches: realism (the idea that facts are out there just waiting to be discovered), empiricism (the idea that we can observe the world and evaluate those observations in relation to facts), positivism (which focuses on the observations themselves, attentive more to claims about facts than to facts themselves), and post-modernism (which holds that facts are fluid and elusive, so that we should focus only on our observational claims).

Prominent ontologists

• Aquinas

• Aristotle

• Martin Heidegger 

• Heraclitus

• Edmund Husserl

• Roman Ingarden

• Immanuel Kant

• Gottfried Leibniz

• Parmenides

• Plato

• W. V. Quine

• Gilbert Ryle

• Jean-Paul Sartre

• Baruch Spinoza

• Alfred North Whitehead

• Charles Taylor

• Ludwig Wittgenstein

External links

• Aristotle's definition of a science of Being qua Being: ancient and modern interpretations

• Buffalo Ontology Site

• Building a Sensor Ontology: A Practical Approach Leveraging ISO and OGC Models

• Example General Ontology

• National Center for Ontological Research

• National Center for Biomedical Ontology

• Notes on the history of Ontology

• Ontology. A resource guide for philosophers

• Applied Ontology. An interdisciplinary journal on ontological analysis and conceptual modeling

• Laboratory for Applied Ontology

• Clay Shirky: Ontology is Overrated


• W3C Semantic Web

• WikiVentory on WikiPedia Meta

• The ontology of quantum fields: entity and quality


Open Biomedical Ontologies

Open Biomedical Ontologies (formerly Open Biological Ontologies) is an effort to create controlled vocabularies for shared use across different biological and medical domains. As of 2006, OBO forms part of the resources of the U.S. National Center for Biomedical Ontology, where it will form a central element of the NCBO's BioPortal.

Contents

• 1 OBO Foundry

• 2 Related Projects

• 3 OBO and Semantic Web

• 4 External links

OBO Foundry

The OBO Ontology library forms the basis of the OBO Foundry, a collaborative experiment involving a group of ontology developers who have agreed in advance to the adoption of a growing set of principles specifying best practices in ontology development. These principles are designed to foster interoperability of ontologies within the broader OBO framework, and also to ensure a gradual improvement of quality and formal rigor in ontologies, in ways designed to meet the increasing needs of data and information integration in the biomedical domain.

Related Projects

Ontology Lookup Service

The Ontology Lookup Service (OLS) is a spin-off of the PRIDE project, which required a centralized query interface for ontology and controlled vocabulary lookup. While many of the ontologies queryable by the OLS are available online, each has its own query interface and output format. The OLS provides a web service interface to query multiple ontologies from a single location with a unified output format.

Gene Ontology Consortium

The goal of the Gene Ontology (GO) consortium is to produce a controlled vocabulary that can be applied to all organisms even as knowledge of gene and protein roles in cells is accumulating and changing. GO provides three structured networks of defined terms to describe gene product attributes.


Sequence Ontology

The Sequence Ontology (SO) is a part of the Gene Ontology project and the aim is to develop an ontology suitable for describing biological sequences. It is a joint effort by genome annotation centres, including WormBase, the Berkeley Drosophila Genome Project, FlyBase, the Mouse Genome Informatics group, and the Sanger Institute.

Generic Model Organism Databases

The Generic Model Organism Project (GMOD) is a joint effort by the model organism system databases WormBase, FlyBase, MGI, SGD, Gramene, Rat Genome Database, EcoCyc, and TAIR to develop reusable components suitable for creating new community databases of biology.

Standards and Ontologies for Functional Genomics

SOFG is both a meeting and a website; it aims to bring together biologists, bioinformaticians, and computer scientists who are developing and using standards and

ontologies with an emphasis on describing high-throughput functional genomics

experiments.

MGED

The Microarray Gene Expression Data (MGED) Society is an international organisation of biologists, computer scientists, and data analysts that aims to facilitate the sharing of microarray data generated by functional genomics and proteomics experiments.

Ontology for Biomedical Investigations

The Ontology for Biomedical Investigations (OBI) is an open access, integrated ontology for the description of biological and clinical investigations. OBI provides a model for the design of an investigation, the protocols and instrumentation used, the materials used, the data generated and the type of analysis performed on it. The project is being developed as part of the OBO Foundry and as such adheres to all the principles therein, such as orthogonal coverage (i.e. clear delineation from other foundry member ontologies) and the use of a common formal language. In OBI the common formal language used is the Web Ontology Language (OWL).

Plant Ontology Consortium

The Plant Ontology Consortium (POC) aims to develop, curate and share structured controlled vocabularies (ontologies) that describe plant structures and growth/developmental stages. Through this effort, the project aims to facilitate cross-database querying by fostering consistent use of these vocabularies in the annotation of tissue and/or growth stage specific expression of genes, proteins and phenotypes.


OBO and Semantic Web

OBO2OWL - Layer Cakes and Roundtrip Transformations

As a migration path for biomedical ontologies, this is a solution for lossless roundtrip transformations between Open Biomedical Ontologies (OBO) format and OWL. It contains a methodical examination of each of the constructs of OBO and a layer cake for OBO, similar to the Semantic Web stack. Project page: Morphster Project.

External links

• Open Biomedical Ontologies (OBO)

• The OBO Foundry

• Morphster ATOL Project at The University of Texas at Austin

• Ontology browser for most of the Open Biological Ontologies at BRENDA website

• OBO Relation Ontology


http://www.cs.man.ac.uk/~stevensr/onto

Ontology-based Knowledge Representation for Bioinformatics

Robert Stevens , Carole A. Goble and Sean Bechhofer

Department of Computer Science and School of Biological Sciences

University of Manchester

Oxford Road

Manchester

M13 9PL

robert.stevens carole [email protected]

Abstract:

Much of biology works by applying prior knowledge (`what is known') to an unknown entity, rather than the application of a set of axioms that will elicit knowledge. In addition, the complex biological data stored in bioinformatics databases often requires the addition of knowledge to specify and constrain the values held in that database. One way of capturing knowledge within bioinformatics applications and databases is the use of ontologies. An ontology is the concrete form of a conceptualisation of a community's knowledge of a domain.

This paper aims to introduce the reader to the use of ontologies within bioinformatics. A description of the type of knowledge held in an ontology will be given. The paper will be illustrated throughout with examples taken from bioinformatics and molecular biology, and a survey of current biological ontologies will be presented. From this it will be seen that the use to which the ontology is put largely determines the content of the ontology. Finally, the paper will describe the process of building an ontology, introducing the reader to the techniques and methods currently in use and the open research questions in ontology development.


Introduction

Biologists need knowledge in order to perform their work. A biologist will often use some pre-existing item of knowledge to make inferences about the item under investigation. The most common example of this within molecular biology is the use of sequence comparison to infer the function of a novel protein sequence. The reasoning is that if a sequence of unknown function is highly similar to a sequence of known function, then it is probable that the novel sequence also has that function. So, rather than using a rule, law or equation to find the function of a protein, a biologist uses the knowledge that a similar sequence has a known function to make a judgment about the function of the new sequence. This is why it is sometimes said that biology is a `knowledge based', rather than an `axiom based', discipline [1].

Modern biologists also need knowledge for communication. Biology is a data-rich discipline, and its data form a fund of knowledge from which biologists generate further knowledge. This knowledge is stored in many hundreds of databases, and many of these databases need to be used in concert during an investigation. Knowledge is vital in two respects during this process. First, when using more than one data store or analysis tool, a biologist needs to be sure that knowledge within one resource can be reliably compared to another. A prime example is the differing uses of the term `gene' within the community. In one database, gene may be defined as `the coding region of DNA'; in another as `DNA fragment that can be transcribed and translated into a protein'; and in a third as `DNA region of biological interest with a name and that carries a genetic trait or phenotype' [2]. Being able to conform to a common definition, or to reason about the differences between definitions in order to reconcile databases, would be advantageous. The second need for knowledge is to define and constrain data within a resource. Biological data can be very complex, not only in the type of data stored, but in the richness of, and constraints upon, the relationships between those data. When designing a database it is useful to be able to describe what values can be specified for which attributes under which conditions. This is the encapsulation of biological knowledge within a database schema.

It is impossible for a single biologist to deal with all the domain knowledge. The arrival of whole genomes and the knowledge they contain only exacerbates the situation. There is, therefore, a need to create systems that can apply the knowledge in the heads of domain experts to biological data. It is not envisaged that such systems could ever perform better than human experts; however, they could play a crucial role in helping to process data to the point where human experts could again apply their knowledge sensibly. This then raises numerous questions, in particular regarding how knowledge can be captured in ways that make it available and useful within computer applications.

This briefing is about the use of such knowledge within bioinformatics applications. Knowledge can be captured and made available to both machines and humans by an ontology. The premise for the need for ontologies within bioinformatics is the need to make knowledge available to that community and its applications. This paper will only be a brief introduction and will not be a complete guide to the philosophy, building and use of an ontology. It does, however, aim to provide the foundations.

Section 2 gives the definitions of ontology and related terms. In Section 3, we will describe the uses to which ontologies can be put, and then in Section 4 we will describe some current bioinformatics and molecular biology ontologies and how they are used. Section 5 will describe the processes of conceptualisation and specification, or building of, an ontology. Finally, Section 6 draws together the main themes of the paper and explores the future of ontologies in the bioinformatics domain.


What is an Ontology?

Ontology is the study or concern about what kinds of things exist - what entities or `things' there are in the universe [3]. The computer science view of ontology is somewhat narrower, where an ontology is the working model of entities and interactions either generically (e.g. the Cyc ontology [4]) or in some particular domain of knowledge or practice, such as molecular biology or bioinformatics. The following definition is given in [5]:

`An ontology may take a variety of forms, but necessarily it will include a vocabulary of terms, and some specification of their meaning. This includes definitions and an indication of how concepts are inter-related which collectively impose a structure on the domain and constrain the possible interpretations of terms.'

Gruber defines an ontology as `the specification of conceptualisations, used to help programs and humans share knowledge' [6]. The conceptualisation is the couching of knowledge about the world in terms of entities (things, the relationships they hold and the constraints between them). The specification is the representation of this conceptualisation in a concrete form. One step in this specification is the encoding of the conceptualisation in a knowledge representation language. The goal is to create an agreed-upon vocabulary and semantic structure for exchanging information about that domain. The specification or encoding of an ontology will be explored in Section 5.

The main components of an ontology are concepts, relations, instances and axioms. A concept represents a set or class of entities or `things' within a domain. Protein is a concept within the domain of molecular biology. Concepts fall into two kinds:

1. primitive concepts are those which only have necessary conditions (in terms of their properties) for membership of the class. For example, a globular protein is a kind of protein with a hydrophobic core, so all globular proteins must have a hydrophobic core, but there could be other things that have a hydrophobic core that are not globular proteins.

2. defined concepts are those whose description is both necessary and sufficient for a thing to be a member of the class. For example, Eukaryotic cells are kinds of cells that have a nucleus. Not only does every eukaryotic cell have a nucleus, every nucleus-containing cell is eukaryotic.
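The difference between the two kinds of concept can be sketched in Python. This is only an illustration under stated assumptions: the `Thing` record, the feature strings and both function names are hypothetical and do not come from any ontology toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thing:
    """Hypothetical record: a broad kind plus the features observed for it."""
    kind: str
    features: frozenset = frozenset()


def may_be_globular_protein(thing: Thing) -> bool:
    # Primitive concept: a hydrophobic core is necessary but NOT sufficient,
    # so True only means "not ruled out"; False means "definitely not".
    return "hydrophobic_core" in thing.features


def is_eukaryotic_cell(thing: Thing) -> bool:
    # Defined concept: "cell with a nucleus" is necessary AND sufficient,
    # so the description alone decides membership.
    return thing.kind == "cell" and "nucleus" in thing.features


# Illustrative instances (assumed for this sketch, not taken from the paper).
micelle = Thing("lipid_assembly", frozenset({"hydrophobic_core"}))
yeast_cell = Thing("cell", frozenset({"nucleus"}))
bacterial_cell = Thing("cell")

print(may_be_globular_protein(micelle))  # passes the necessary test, yet is no protein
print(is_eukaryotic_cell(yeast_cell))
print(is_eukaryotic_cell(bacterial_cell))
```

A reasoner can therefore classify things automatically under a defined concept, whereas for a primitive concept it can only exclude candidates; membership itself has to be asserted.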

 Relations describe the interactions between concepts or a concept's properties. Relations

also fall into two broad kinds:


1. Taxonomies that organize concepts into sub- and super-concept tree structures. The most common forms of these are

   o Specialisation relationships, commonly known as the ‘is a kind of’ relationship. For example, an Enzyme is a kind of Protein, which in turn is a kind of Macromolecule.

   o Partitive relationships describe concepts that are part of other concepts - Protein hasComponent ModificationSite.

2. Associative relationships that relate concepts across tree structures. Commonly found examples include the following:

   o Nominative relationships describe the names of concepts - Protein hasAccessionNumber AccessionNumber (in the context of bioinformatics) and Gene hasName GeneName.

   o Locative relationships describe the location of one concept with respect to another - Chromosome hasSubcellularLocation Nucleus.

   o Associative relationships that represent, for example, the functions, processes a concept has or is involved in, and other properties of the concept - Protein hasFunction Receptor, Protein isAssociatedWithProcess Transcription and Protein hasOrganismClassification Species.

   o Many other types of relationships exist, such as `causative' relationships, that are described in [7,8].
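One simple way to hold the relationships listed above is as (subject, relation, object) triples. The sketch below is an assumption made for illustration, not the representation used by the authors; the relation name `isKindOf` and the helper names are hypothetical, while the other names are taken from the examples in the text.

```python
# (subject, relation, object) triples built from the examples in the text.
TRIPLES = [
    ("Enzyme", "isKindOf", "Protein"),            # specialisation
    ("Protein", "isKindOf", "Macromolecule"),     # specialisation
    ("Protein", "hasComponent", "ModificationSite"),  # partitive
    ("Protein", "hasAccessionNumber", "AccessionNumber"),  # nominative
    ("Gene", "hasName", "GeneName"),              # nominative
    ("Chromosome", "hasSubcellularLocation", "Nucleus"),  # locative
    ("Protein", "hasFunction", "Receptor"),
    ("Protein", "isAssociatedWithProcess", "Transcription"),
    ("Protein", "hasOrganismClassification", "Species"),
]

# Relations that build the taxonomy; everything else is treated as associative.
TAXONOMIC_RELATIONS = {"isKindOf", "hasComponent"}


def relations_of(concept):
    """Return the (relation, object) pairs whose subject is `concept`."""
    return [(rel, obj) for subj, rel, obj in TRIPLES if subj == concept]


def split_by_kind(triples):
    """Separate taxonomic triples from associative ones."""
    taxonomic = [t for t in triples if t[1] in TAXONOMIC_RELATIONS]
    associative = [t for t in triples if t[1] not in TAXONOMIC_RELATIONS]
    return taxonomic, associative


taxonomic, associative = split_by_kind(TRIPLES)
print(relations_of("Gene"))
print(len(taxonomic), len(associative))
```

Keeping the relation name as data rather than as code makes it straightforward to add further relation types, such as the causative relationships mentioned above, without changing the helpers.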

The relations, like concepts, can be organised into taxonomies. For example, hasName can be subdivided into hasGeneName, hasProteinName and hasDiseaseName. Relations also have properties that capture further knowledge about the relationships between concepts. These include, but are not restricted to:

• whether it is universally necessary that a relationship must hold on a concept. For example, when describing a protein database, we might want to say that Protein hasAccessionNumber AccessionNumber holds universally, i.e., for all proteins.

• whether a relationship can optionally hold on a concept. For example, we might want to describe that Enzyme hasCofactor Cofactor only describes the possibility that enzymes have a cofactor, as not all enzymes do have a cofactor.

• whether the concept a relationship links to is restricted to certain kinds of concepts. For example, Protein hasFunction Receptor restricts the hasFunction relation to only link to concepts that are kinds of receptors. Protein hasFunction says that Protein has a function but does not restrict what kind of concept the function might be.

• the cardinality of the relationship. For example, a particular AccessionNumber is the accession number of only one Protein, but one Chromosome may have many Genes.

• whether the relationship is transitive. For example, if Protein isAssociatedWithProcess Transcription and Transcription isAssociatedWithProcess GeneExpression, then Protein isAssociatedWithProcess GeneExpression. The taxonomy relations always have this property.
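Two of these properties, transitivity and cardinality, lend themselves to a short worked sketch. The function names and the accession strings `ACC1` and `ACC2` are hypothetical placeholders chosen for illustration; only the concept and relation names come from the text.

```python
def transitive_closure(pairs):
    """Return all (a, c) pairs implied by chaining (a, b) and (b, c)."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def shared_accessions(assignments):
    """Return accession numbers attached to more than one protein.

    Models the cardinality rule that an accession number belongs to
    exactly one protein; a non-empty result signals a violation.
    """
    owners = {}
    for protein, accession in assignments:
        owners.setdefault(accession, set()).add(protein)
    return sorted(acc for acc, proteins in owners.items() if len(proteins) > 1)


# isAssociatedWithProcess pairs from the text.
associated = {("Protein", "Transcription"), ("Transcription", "GeneExpression")}
print(sorted(transitive_closure(associated)))

# Hypothetical protein/accession assignments; ACC1 is deliberately duplicated.
print(shared_accessions([("p1", "ACC1"), ("p2", "ACC1"), ("p3", "ACC2")]))
```

The closure loop is quadratic per pass and is meant only to show the inference; a real reasoner would use a more efficient algorithm.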


Once this conceptualisation has been made concrete (see Section 5) an ontology has been

 produced.

Instances are the `things' represented by a concept - a human cytochrome C is an instance of the concept Protein. Strictly speaking, an ontology should not contain any instances, because it is supposed to be a conceptualisation of the domain. The combination of an ontology with associated instances is what is known as a knowledge base. However, deciding whether something is a concept or an instance is difficult, and often depends on the application [9]. For example, Atom is a concept and `potassium' is an instance of that concept. It could be argued that Potassium is a concept representing the different instances of potassium and its isotopes etc. This is a well known and open question in knowledge management research.

Finally, axioms are used to constrain values for classes or instances. In this sense the properties of relations are kinds of axioms. Axioms also, however, include more general rules, such as nucleic acids shorter than 20 residues are oligonucleotides.
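The oligonucleotide rule can be written as a small executable check. This is a minimal sketch assuming that a nucleic acid is given as a plain string with one character per residue; the function and constant names are hypothetical.

```python
# Threshold quoted in the text: fewer than 20 residues.
OLIGONUCLEOTIDE_MAX_RESIDUES = 20


def classify_nucleic_acid(sequence: str) -> str:
    """Apply the axiom: nucleic acids shorter than 20 residues are oligonucleotides."""
    if len(sequence) < OLIGONUCLEOTIDE_MAX_RESIDUES:
        return "Oligonucleotide"
    return "NucleicAcid"


print(classify_nucleic_acid("ACGTACGT"))
print(classify_nucleic_acid("ACGT" * 10))
```

In an ontology language the same rule would be stated declaratively rather than as a function, so that a reasoner can apply it to any instance.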


Applications and Types of Bio-Ontologies

A common ideal for an ontology is that it should be re-usable [6]. This ambition distinguishes an ontology from a database schema, even though both are conceptualisations. For example, a database schema is intended to satisfy only one application, but an ontology could be re-used in many applications. However, an ontology is only re-usable when it is to be used for the same purpose for which it was developed. Not all ontologies have the same intended purpose, and they may have parts that are re-usable and other parts that are not. They will also vary in their coverage and level of detail.

We can divide ontology use into three types:

1. Domain-oriented, which are either domain specific (e.g. E. coli) or domain generalisations (e.g. gene function or ribosomes);

2. Task-oriented, which are either task specific (e.g. annotation analysis) or task generalisations (e.g. problem solving);

3. Generic, which capture common high level concepts, such as Physical, Abstract, Structure and Substance. This can be especially useful when trying to re-use an ontology, as it allows concepts to be correctly or more reliably placed. It can also be important when generating or analysing natural language expressions using an ontology. Generic ontologies are also known as `upper ontologies', `core ontologies' or `reference ontologies'.

Most bio-ontologies will have a mixture of all three of these types. A well-formed ontology will be built in a modular way using a mixture of generic domain, generic task and application ontologies. Its parts will be clearly defined so that they can be re-used. A less well-formed ontology will have these distinctions blurred, making re-use and modification more difficult. The measure of how well the dependencies in an ontology have been separated is known as its ontological commitment. Other measures for the quality of an ontology include its clarity, consistency, completeness and conciseness [6].

Ontologies are used in a wide range of application scenarios [10]:

1. A community reference - neutral authoring. The knowledge is authored in a single language, and converted into a different form for use in multiple target systems. Benefits include knowledge re-use, improved maintainability and long-term knowledge retention;

2. Either defining database schema or defining a common vocabulary for database annotation - ontology as specification. Describing a protein entry as `mitochondrial double stranded DNA binding protein' will ensure that a common vocabulary is available for description, sharing and posing questions (see item 4 in this list). Benefits include documentation, maintenance, reliability, sharing and knowledge re-use;

3. Providing common access to information. Information must be shared but is expressed using unfamiliar vocabulary. The ontology helps to render the information intelligible by providing a shared understanding of the terms or mapping between the terms. Benefits include interoperability, and more effective use and re-use of knowledge resources;

4. Ontology-based search by forming queries over databases. An ontology is used for searching an information repository. For example, when searching databases for `mitochondrial double stranded DNA binding proteins', all and only those proteins will be found, as the exact terms for searching can be used. Whether the user of the terms can be sure of their meaning depends on how the knowledge in the ontology has been represented. For example, is it explicit whether `mitochondrial' applies to the `DNA' or the `binding protein'?

Queries can be refined by following relationships within the ontology, for example, following relationships to find those processes in which proteins of certain functions act and gathering the associated proteins. Moving up and down the `is a kind of' hierarchy within the ontology can also be used to refine queries. For example, `DNA binding protein' can be specialised to `single stranded DNA binding protein' by moving down the hierarchy when the former gathers too many answers. Benefits include more effective access and hence more effective use and re-use of knowledge resources;

5. Understanding database annotation and technical literature. These ontologies are designed to support natural language processing (NLP); they link not only domain knowledge but also how that knowledge is related to linguistic structures such as grammar and lexicons.
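The query refinement described in item 4 can be pictured with a small sketch. It is only an illustration under stated assumptions: the hierarchy fragment, the entry identifiers P1 to P4 and the function names are hypothetical and are not taken from any of the ontologies or databases discussed here. A query for a concept gathers the entries annotated with that concept or with any of its descendants in the `is a kind of' hierarchy, so moving down the hierarchy narrows the answer.

```python
# Hypothetical fragment of an `is a kind of' hierarchy (child -> parent).
IS_A = {
    "DNA binding protein": "protein",
    "single stranded DNA binding protein": "DNA binding protein",
    "double stranded DNA binding protein": "DNA binding protein",
    "mitochondrial double stranded DNA binding protein":
        "double stranded DNA binding protein",
}

# Hypothetical database entries, each annotated with one concept.
ANNOTATIONS = {
    "P1": "single stranded DNA binding protein",
    "P2": "double stranded DNA binding protein",
    "P3": "mitochondrial double stranded DNA binding protein",
    "P4": "protein",
}


def children_of(concept):
    """Return the direct specialisations of a concept, sorted by name."""
    return sorted(child for child, parent in IS_A.items() if parent == concept)


def descendants(concept):
    """Return every concept below the given one in the hierarchy."""
    found = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for child in children_of(current):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def search(concept):
    """Gather entries annotated with the concept or any of its descendants."""
    wanted = descendants(concept) | {concept}
    return sorted(entry for entry, c in ANNOTATIONS.items() if c in wanted)


broad = search("DNA binding protein")
narrow = search("single stranded DNA binding protein")
print(broad, narrow)
```

In this toy data the broad query gathers three entries and the specialised query gathers one; `children_of` lists the specialisations a user could be offered when a query returns too many answers.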

Although some methodologies are beginning to emerge that compare the structure and role of various ontologies [11], none have appeared that compare the content of one ontology with another for a specific domain.


A Survey of Current Bio-Ontologies

The use of ontology within bioinformatics is relatively recent and consequently few bio-ontologies exist. In this section, a representative sample of existing bio-ontologies will be reviewed. This survey has been restricted to those ontologies most pertinent to current trends in bioinformatics and molecular biology, rather than the wider field of biology. Biology is rich in taxonomies, such as the Enzyme Classification [12] and species taxonomies. Being taxonomies, they only use a subsumption hierarchy. The ontologies reviewed here tend to be richer in their use of relationships, hence their inclusion, but this is not to denigrate the usefulness of taxonomies to many applications.

The ontologies reviewed are as follows:

• The RiboWeb ontology http://smi-web.stanford.edu/projects/helix/riboweb.html;
• The EcoCyc ontology http://ecocyc.PangeaSystems.com/ecocyc/ecocyc.html;
• The Schulze-Kremer ontology for molecular biology (MBO) http://igd.rz-berlin.mpg.de/~www/oe/mbo.html;
• The Gene Ontology (GO) http://genome-www.stanford.edu/GO/;
• The TAMBIS Ontology (TaO) http://img.cs.man.ac.uk/tambis.

The content, in terms of scope, concepts and relationships, as well as the use of each ontology will be presented. In the section on building an ontology, these ontologies will be revisited, as they also illustrate the variety of ontology building styles. Table 1 summarises these bio-ontologies with respect to organisation, structure, purpose and content.


The RiboWeb Ontology

RiboWeb [13,14] is a resource whose primary aim is to facilitate the construction of three-dimensional models of ribosomal components and compare the results to existing studies. The knowledge that RiboWeb uses to perform these tasks is captured in four ontologies: the physical-thing ontology; the data ontology; the publication ontology and the methods ontology. The physical-thing ontology describes ribosomal components and associated `physical things'. It has three principal conceptualisations: Molecules, Molecule-Ensembles and Molecule-Parts. The first describes covalently bonded molecules and includes the main biological macromolecules. Molecule-Ensembles captures non-covalently bonded collections of molecules, such as enzyme complexes. The molecule-part ontology holds knowledge about regions of molecules that do not exist independently, but need to be talked about by biologists. These would include amino acid side chains and the 3' and 5' ends of nucleic acid molecules. The data ontology captures knowledge about experimental detail as well as data on the structure of physical-things. The methods ontology contains information about techniques for analysing data. It holds knowledge of which techniques can be applied to which data, as well as the inputs and outputs of each method.

Instances are added to RiboWeb that correspond to these concepts. For example, suppose a peer-reviewed publication describes the three-dimensional structure of the 30S ribosomal subunit. This means linked instances need to be created in the publication, data and physical-thing ontologies. A user may want to see if this structure is consistent with others captured within RiboWeb [14]. The constraints described within RiboWeb can highlight conflicts with current knowledge to the biologist.


The EcoCyc Ontology

EcoCyc, like RiboWeb, uses an ontology to describe the richness and complexity of a domain and the constraints acting within that domain, to specify a database schema [15]. EcoCyc is presented to biologists using an encyclopaedia metaphor. It covers E. coli genes, metabolism, regulation and signal transduction, which a biologist can explore and use to visualise information [16]. The knowledge base currently describes 4391 E. coli genes, 695 enzymes encoded by a subset of these genes, 904 metabolic reactions and the organisation of these reactions into 129 metabolic pathways. EcoCyc uses the classification of gene product function from Riley [17] as part of this description. Scientists can visualise the layout of genes within the E. coli chromosome, or of an individual biochemical reaction, or of a complete biochemical pathway (with compound structures displayed).

EcoCyc's use of an ontology to define a database schema has the advantages of expressivity and the ability to evolve quickly to account for the rapid schema changes needed for biological information [15]. The user is not aware of this use of an ontology, except that the constraints expressed in the knowledge captured mean that the complexity of the data held is captured precisely. In EcoCyc, for example, the concept of Gene is represented by a class with various attributes that link through to other concepts: Polypeptide product, Gene name, synonyms, identifiers used in other databases, etc. The representation system can be used to impose constraints on those concepts and instances which may appear in the places described within the system.


The Ontology for Molecular Biology

The Ontology for Molecular Biology (MBO) is an attempt to provide clarity and communication within the molecular biology database community [2]. The use of MBO would avoid `semantic confusion', such as that which arises with the use of the concept of Gene (see Section 1). Schulze-Kremer claims `By adhering to a commonly agreeable ontology, uncertainty and misunderstanding about the semantic relations between database entries from different databases can be eliminated.' This would mean that either the different databases agreed to the common MBO definition (and changed their annotations accordingly) or inferences about the differences between each database's conceptualisation of `gene' could be made in terms of the MBO. In either case, attempts could then be made to reconcile or interoperate between the databases.

The MBO contains concepts and relationships that are required to describe biological objects, experimental procedures and computational aspects of molecular biology [2]. It is very wide ranging and has over 1200 nodes representing both concepts and instances. In the conceptual part of the MBO, the primary relationship used is the `is a kind of' relationship. The MBO has an organising, upper-level ontology. The root concept `Being' divides into `object' and `event'. `Object', for instance, is subdivided into `physical-' and `abstract-' object. This helps give a precise classification for lower level concepts - so, `physics object' is an `abstract object' and `DNA' a `physical-object'. MBO defines a linkage map from GDB in the following way: `DBObject > MappingObject > Map > LinkageMap' (the > represents the sub-concept relationship).

The actual biological content of the MBO is currently relatively small, ending at quite

large grained concepts such as Protein, Gene, and Chromosome. The framework,

however, exists for extending the MBO much further into the biological domain.


The TAMBIS Ontology

TAMBIS (Transparent Access to Multiple Bioinformatics Information Sources) uses an ontology to enable biologists to ask questions over multiple external databases using a common query interface [1]. The TAMBIS ontology (TaO) [19] describes a wide range of bioinformatics tasks and resources, and has a central role within the TAMBIS system.

An interesting difference between the TaO and some of the other ontologies reviewed here is that the TaO does not contain any instances. The TaO only contains knowledge about bioinformatics and molecular biology concepts and their relationships - the instances they represent still reside in the external databases. As concepts represent instances, a concept can act as a question. The concept Receptor Protein represents the instances of proteins with a receptor function, and gathering these instances is answering that question.

The TaO is a dynamic ontology, in that it can grow without the need for either conceptualising or encoding new knowledge. In contrast, the other ontologies described here are static - developers must intervene and encode new conceptualisations to form new concepts. The TaO uses rules within the ontology to govern which concepts can be joined to another concept via relationships to form new concepts. Thus the TaO places great emphasis on relations. A user can form a complex, multi-source query, using relationships, in the following manner. Starting with the concept Protein, the TaO is consulted as to which relationships can be used to join Protein to other concepts. Amongst many, the following two are offered: is homologous to Protein and hasAccessionNumber AccessionNumber. Initially, the original Protein is extended to give a new concept Protein is homologous to Protein (the concept Protein homologue); then the second `protein' is extended with hasAccessionNumber AccessionNumber. The resulting concept (`Protein homologue of Protein with Accession Number') describes proteins which are homologous to a protein with a particular accession number. This concept can be used as a source independent query containing no information on how to answer such a query. The rest of the TAMBIS system takes this conceptual query and processes it to an executable program against the external sources [20].
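The way rules govern which concepts may be joined can be sketched in a few lines. This is not the TaO's actual representation or the TAMBIS implementation; the rule table, the `Concept` class and the `offered` and `join` functions are hypothetical names introduced purely for illustration, and only the two relationships named in the text are included.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical rule table: (subject concept, relationship) -> permitted object concept.
RULES = {
    ("Protein", "is homologous to"): "Protein",
    ("Protein", "hasAccessionNumber"): "AccessionNumber",
}


@dataclass(frozen=True)
class Concept:
    """A base concept, optionally restricted by one relationship to another concept."""
    base: str
    relation: Optional[str] = None
    filler: Optional["Concept"] = None

    def describe(self) -> str:
        if self.relation is None or self.filler is None:
            return self.base
        return f"{self.base} [{self.relation} {self.filler.describe()}]"


def offered(concept: Concept) -> list:
    """List the relationships the rules allow for this concept."""
    return sorted(rel for (subject, rel) in RULES if subject == concept.base)


def join(concept: Concept, relation: str, filler: Concept) -> Concept:
    """Form a new concept, refusing any join the rules do not permit."""
    if concept.relation is not None:
        raise ValueError("this sketch supports one restriction per concept")
    if RULES.get((concept.base, relation)) != filler.base:
        raise ValueError(
            f"{concept.base} cannot be joined by '{relation}' to {filler.base}"
        )
    return Concept(concept.base, relation, filler)


# Build the example from the text, innermost concept first.
identified = join(Concept("Protein"), "hasAccessionNumber", Concept("AccessionNumber"))
query = join(Concept("Protein"), "is homologous to", identified)
print(query.describe())
```

The resulting object is a purely conceptual description: it says nothing about which sources hold the data or how to retrieve it, which matches the source-independent character of the query described above.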

The TaO is available in two forms - a small model that concentrates on proteins and a larger scale model that includes nucleic acids. The small TaO, with 250 concepts and 60 relationships, describes proteins and enzymes, as well as their motifs, secondary and tertiary structure, functions and processes. There is also supporting material on subcellular structure and chemicals, including cofactors. Motifs extend to detail such as the principal modification sites; function and process to broad classifications such as Hormone and Receptor, and Apoptosis and Lactation; structure extends to detail such as gross architecture - for example, SevenPropellor. Important relationships include is component of, has name, has function and is homologous to, as well as many more. The larger model, with 1500 concepts, broadens these areas to include concepts pertinent to nucleic acid, its children and genes.


The Gene Ontology

The Gene Ontology (GO), like the MBO, has database annotation as its main purpose. GO, however, has grown up from within a group of databases, rather than being proposed from outside. GO's scope is also narrower; instead of attempting to describe the whole of molecular biology captured in the community's databases, GO seeks to capture information about the role of gene products within an organism. The classification of gene function by Riley [17] has a similar scope, but for E. coli only. GO was initially created to reflect Drosophila gene function via the Flybase database [18], but has expanded to encompass mouse, yeast and gene expression databases, and is expected to expand further. Thus, the main use of GO is as a controlled vocabulary for conceptual annotation of gene product function, process and location in databases.

GO lacks any upper-level organising ontology. It is essentially composed of three hierarchies, representing the function of a gene product, the process in which it takes part, and its cellular location and structure. GO contains a wide range of concepts, and provides a rich level of detail in its three hierarchies. It uses the `is a kind of' and `is part of' relationships to describe the role of gene products. It currently has over 5000 concepts within the ontology.

GO defines a fine level of conceptual detail: double stranded DNA binding proteins; transcription factors; cytosolic chaperones; muscle motor protein; learning and memory; blood coagulation; male genital morphogenesis; ventral pattern formation; and many pathways, transport and signal transduction systems. GO uses multiple inheritance in the `is a kind of' hierarchy in forming some of the concepts and there is some use of an `is part of' relationship. Many of the relationships held by concepts, however, remain implicit in GO. For example, the concept `succinate (cytosol) to fumarate (mitochondrion) transporter' implicitly holds properties about location and orientation in the mitochondrial membrane etc.
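A hierarchy with multiple inheritance and two kinds of link, as described above, is a directed acyclic graph rather than a tree. The sketch below shows one way such a structure might be traversed. The concepts and links are hypothetical examples chosen for illustration; they are not GO's actual terms, identifiers or file format.

```python
# Hypothetical concepts: each maps to a list of (relationship, parent) links.
# A concept may have several parents, so the structure is a graph, not a tree.
LINKS = {
    "double stranded DNA binding protein": [("is a kind of", "DNA binding protein")],
    "DNA binding protein": [("is a kind of", "nucleic acid binding protein")],
    "transcription factor": [
        ("is a kind of", "DNA binding protein"),
        ("is a kind of", "transcription regulator"),
    ],
    "mitochondrial membrane": [("is part of", "mitochondrion")],
    "mitochondrion": [("is part of", "cell")],
}


def ancestors(concept, relations=("is a kind of", "is part of")):
    """Collect every concept reachable upwards through the chosen relationships."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for relation, parent in LINKS.get(current, []):
            if relation in relations and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("transcription factor")))
print(sorted(ancestors("mitochondrial membrane", relations=("is part of",))))
```

Restricting the traversal to one relationship keeps the two readings apart: following only `is part of' from the membrane concept yields containing structures, while following only `is a kind of' yields nothing in this toy data.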

Summary

There are two important messages from this brief survey of bio-ontologies: the first is that ontologies are being used within the community to provide knowledge input to databases and applications. The second message is that all these ontologies are very different and specific to their intended use. TaO is an ontology of bioinformatics tasks and so contains such concepts as AccessionNumber and ProteinId, which are not part of the world of molecular biology. The TaO could not be substituted for EcoCyc's ontology. GO is an ontology of gene product function and RiboWeb represents knowledge of ribosomal subunit structure, data and methodologies. As GO is used for database annotation, it holds a fine level of detail


whereas the TaO is quite shallow, but precision is gained during query formulation by joining concepts together. Even if one ontology could be developed, individual applications would only use a subset, leading to a requirement for highly modular ontologies with minimised dependencies and assumptions between them. That ontology use influences the content and nature of the knowledge captured within an ontology is not a contradiction of the knowledge holding ability of ontologies. Not only does the purpose determine the scope and granularity to which the same knowledge is represented in different ontologies, but conceptualisations may differ without one being incorrect. For example, TaO describes that DNA may be translated to protein. This is wrong in molecular biological terms, but is a feature of bioinformatics - so conceptualisations of the same domain may differ. A constraint that is necessary for one application may not be needed for another; this simply changes what knowledge is captured or how it is captured, it does not change the knowledge itself.


Building an Ontology

Although there is some collective experience in developing and using ontologies, there is no field of ontological engineering comparable to knowledge engineering. In particular, as yet, there are no standardised methodologies for building ontologies. Such a methodology would include a set of stages that occur when building ontologies, guidelines and principles to assist in the different stages, and an ontology life-cycle which indicates the relationships among stages [21]. The most well known ontology construction guidelines were developed by Gruber [6], to encourage the development of more re-usable ontologies. Recently, there has been increased effort in trying to develop a comprehensive ontology methodology (e.g. [22,23,21]). A survey is given in [24].

The Development Lifecycle

Methodologies broadly divide into those that are stage-based (e.g. TOVE [21]) and those that rely on iterative evolving prototypes (e.g. Methontology [25]). These are in fact complementary techniques. Most distinguish between an informal stage, where the ontology is sketched out using either natural language descriptions or some diagram technique, and a formal stage, where the ontology is encoded in a formal, machine-computable knowledge representation language. As an ontology should ideally be communicated to people and unambiguously interpreted by software, the informal representation helps the former and the formal the latter.

Figures 1 and 2 represent a skeletal methodology and life-cycle for building ontologies, inspired by the software engineering V-process model [26]. The left side of the V charts the processes in building an ontology and the right side charts the guidelines, principles and evaluation used to `quality assure' the ontology. The overall process, however, moves through a life-cycle, as depicted in Figure 2.


Figure 1: The V-model inspired methodology for building ontologies.


Figure 2: The ontology building life-cycle.

The stages in the V-process model and life-cycle are:

Identify purpose and scope:

developing a requirements specification for the ontology by identifying the intended scope and purpose of the ontology. A well-characterised requirements specification is important to the design, evaluation and re-use of an ontology. It can be seen from Section 4 that the use to which an ontology is put has a great effect on the content and style of that ontology.

Knowledge Acquisition:

the process of acquiring domain knowledge from which the ontology will be built. Sources span the complete range of knowledge holders: specialist biologists; database metadata; standard text books; research papers and other ontologies. Motivating scenarios are collected and informal competency questions formed [21] - these are informal questions that the ontology must be able to answer and will be used to check that the ontology is fit for purpose. The EcoCyc and RiboWeb ontologies had the bulk of their knowledge gathered from the research literature on E. coli metabolism and ribosomal structure respectively. In the


former case this was a huge volume of material, which took many years to process. The TaO, being built to query databases, extracted a large part of its knowledge from database documentation. Standard texts also contributed to the knowledge of core molecular biology.

Conceptualisation:

identifying the key concepts that exist in the domain, their properties and the relationships that hold between them; identifying natural language terms to refer to such concepts, relations and attributes; and structuring domain knowledge into explicit conceptual models. This is the process touched upon in Section 2, where the concepts and relationships describing the domain are captured. The ontology is usually described using some informal terminology. Gruber [6] suggests writing lists of the concepts to be contained within the ontology and exploring other ontologies to re-use all or part of their conceptualisations and terminologies. At this stage it is important to bear in mind the results of the first step, that of requirements gathering.

Integrating:

use or specialise an existing ontology: a task frequently hindered by the inadequate documentation of existing ontologies, notably their implicit assumptions. Using a generic ontology, such as MBO or those in [27,28], gives a deeper definition of the concepts in the chosen domain.

Encoding:

representing the conceptualisation in some formal language, e.g. frames, object models or logic. This includes the creation of formal competency questions in terms of the terminological specification language chosen (usually first order logic). The representation of ontologies is explored further below.

Documentation:

informal and formal complete definitions, assumptions and examples are essential to promote the appropriate use and re-use of an ontology. Documentation is important for defining, more expansively than is possible within the ontology, the exact meaning of terms within the ontology.

Evaluation:

determining the appropriateness of an ontology for its intended application. Evaluation is done pragmatically, by assessing the competency of the ontology to satisfy the requirements of its application, including determining the consistency, completeness and conciseness of an ontology [25]. Conciseness implies an absence of redundancy in the definitions of an ontology and an appropriate granularity. For example, an ontology that modelled protein molecules at atomic resolution when the amino acid level would suffice would not be considered concise.
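The role that competency questions play across these stages can be pictured as a set of checks run against the encoded ontology. The sketch below is an assumption-laden illustration rather than any published methodology's tooling: the toy ontology, the relationship names and the three questions are all hypothetical.

```python
# Hypothetical toy ontology: concept -> set of (relationship, target) pairs.
ONTOLOGY = {
    "Enzyme": {("is a kind of", "Protein"), ("catalyses", "Reaction")},
    "Protein": {("is a kind of", "Macromolecule")},
}


def holds(subject, relationship, target):
    """True if the ontology asserts this exact link."""
    return (relationship, target) in ONTOLOGY.get(subject, set())


def has_relationship(subject, relationship):
    """True if the concept has any link of the given kind."""
    return any(rel == relationship for rel, _ in ONTOLOGY.get(subject, set()))


# Each informal question is paired with a check the ontology must pass.
COMPETENCY_QUESTIONS = [
    ("Is an enzyme a kind of protein?",
     lambda: holds("Enzyme", "is a kind of", "Protein")),
    ("Does an enzyme catalyse a reaction?",
     lambda: holds("Enzyme", "catalyses", "Reaction")),
    ("Can a protein be given a cellular location?",
     lambda: has_relationship("Protein", "is located in")),
]


def evaluate(questions):
    """Run every check and report which questions the ontology can answer."""
    return {text: check() for text, check in questions}


RESULTS = evaluate(COMPETENCY_QUESTIONS)
UNANSWERED = [text for text, ok in RESULTS.items() if not ok]
print(UNANSWERED)
```

Here the third question fails because the toy ontology holds no location relationship, which is the kind of gap in completeness that evaluation against the requirements is meant to expose.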


Vocabularies support the creation of purely hand-crafted ontologies with simple tree-like inheritance structures. The Gene Ontology, for example, has a hierarchical structure which is asserted - the position of each concept and its relation with others in the ontology is completely determined by the modeller or ontologist. Each entry or concept in the GO has a name, an identifier and other optional pieces of information such as synonyms, references to external databases and so on.

Although this provides great flexibility, the lack of any structure in the representation can lead to difficulties with maintenance or preserving consistency, and there are usually no formally defined semantics. The single inheritance provided by a tree structure (each concept has only one parent in the is-a hierarchy) can also prove limiting. Maintaining multiple inheritance hierarchies, however, is an arduous task - the hand-crafting of single inheritance hierarchies is a difficult enough exercise.

A frame-based system provides greater structure. Frame-based systems are based around the notion of frames or classes which represent collections of instances (the concepts of the ontology). Each frame has an associated collection of slots or attributes which can be filled by values or other frames. In particular, frames can have a kind-of slot which allows the assertion of a frame taxonomy. This hierarchy can then be used for inheritance of slots, allowing a sparse representation. As well as frames representing concepts, a frame-based representation may also contain instance frames, which represent particular instances.
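Slot inheritance through a kind-of link can be sketched as follows. This is a minimal illustration, not the representation used by Ontolingua, EcoCyc or RiboWeb; the `Frame` class, the frame names and the slot names are hypothetical, and instance frames are left out for brevity.

```python
class Frame:
    """A class frame with local slots and an optional kind-of parent."""

    def __init__(self, name, kind_of=None, slots=None):
        self.name = name
        self.kind_of = kind_of          # parent frame, or None at the top
        self.slots = dict(slots or {})  # only the slots stated locally

    def get(self, slot):
        """Look up a slot locally, then up the kind-of chain."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.kind_of
        raise KeyError(f"{self.name} has no slot '{slot}'")


# Hypothetical taxonomy: each frame states only what is new at its level.
macromolecule = Frame("Macromolecule", slots={"polymer": True})
protein = Frame("Protein", kind_of=macromolecule, slots={"monomer": "amino acid"})
enzyme = Frame("Enzyme", kind_of=protein, slots={"catalyses": "Reaction"})

print(enzyme.get("monomer"), enzyme.get("polymer"))
```

The representation stays sparse because `enzyme` stores a single slot of its own and inherits the rest, while a slot stated lower down, such as `catalyses`, is not visible from `protein`.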

Frame-based systems have been used extensively in the KR world, particularly for applications in natural language processing. The most well known frame system is Ontolingua [31]. Both EcoCyc and RiboWeb use a frame representation. EcoCyc has a frame, amongst others, called `Gene', representing the concept Gene. This frame has slots describing relationships to other concepts, such as Polypeptide product, gene name, synonyms and so on. Frames are popular because frame-based modelling is similar to object-based modelling and is intuitive for many users.

The semantics of frame systems are defined by the OKBC standard [32], although this is a little unclear in places. For example, it is not always clear how to interpret an assertion that a slot is filled with a particular value. Does this mean that all instances of the frame must have this particular attribute taking this value? Or does the value represent possible fillers for the slot for each instance? For example, we might want to say that the frame Gene has a slot saying `all genes must have a GeneName', but it is only a possibility that Genes `have a Polypeptide Product' (some, after all, produce tRNAs).
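One way to remove this ambiguity is to state, for every slot, whether it is required of all instances or merely permitted. The sketch below makes that distinction explicit. It is not how OKBC or any particular frame system defines slot semantics; the `Slot` class, the `required` flag and the example values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    name: str
    required: bool  # True: every instance must fill it; False: an instance may


# The two readings from the text, made explicit for a hypothetical Gene frame.
GENE_SLOTS = [
    Slot("GeneName", required=True),
    Slot("PolypeptideProduct", required=False),
]


def violations(instance, slots):
    """Return the names of required slots that the instance leaves unfilled."""
    return [s.name for s in slots if s.required and instance.get(s.name) is None]


# Hypothetical instances: a protein-coding gene, a tRNA gene, an unnamed entry.
coding_gene = {"GeneName": "example-coding-gene", "PolypeptideProduct": "example-protein"}
trna_gene = {"GeneName": "example-tRNA-gene"}
unnamed = {"PolypeptideProduct": "example-protein"}

print(violations(coding_gene, GENE_SLOTS))
print(violations(trna_gene, GENE_SLOTS))
print(violations(unnamed, GENE_SLOTS))
```

Under this reading the tRNA gene is valid even though it has no polypeptide product, whereas the entry with no name is flagged.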

An alternative to frames is logic, notably Description Logics (DLs) [33,34]. DLs describe knowledge in terms of concepts and relations that are used to automatically derive classification taxonomies. A major characteristic of a DL is that concepts are defined in terms of descriptions using other roles and concepts. For instance, in the TaO, the concept Enzyme was not simply asserted by the ontologist. Instead, a composite concept was made from Protein and Reaction, joined with the relation `catalyses' - to make the concept Protein which catalyses Reaction. Thus someone viewing the ontology


tools are really essential for maintaining complex ontologies that are necessary for capturing knowledge within the biology domain. Other tools support the collaborative development of ontologies over the web (e.g. WebOnto [40]). A survey of tools can be found in [41].

Discussion

This briefing has introduced the need for and use of ontology within the bioinformatics community. The need for ontologies arises from the need to be able to cope with the size and complexity of biological knowledge and data. Ontologies enable knowledge to be used within systems for communication, specification and other processing tasks (see Section 3).

Several bio-ontologies have already been used within the community. Those reviewed in Section 4 demonstrate a wide range of scopes and granularities. Most have in common some core features of molecular biology, such as Gene, Protein and related BiologicalFunction and BiologicalProcess, but differ widely in both the content and articulation of their knowledge. This is primarily due to the wide range of tasks to which the ontologies are put. Both RiboWeb and EcoCyc use part of their ontology to define the structure and content of their databases, but as the databases are as different as ribosomal subunit structure and E. coli metabolism, the ontologies are also necessarily different. Even the common areas, such as macromolecule, differ widely between ontologies, but without any of the ontologies being incorrect.

Bio-ontologies are currently being used for communication of knowledge, as well as database schema definition, query formulation and annotation. When the use of conceptual annotation grows we can expect to see a concomitant change in database retrieval. This will become much more precise and complete than is currently possible with natural language based annotations. Annotation by ontologies should also allow the relationships describing functions, process and components etc. of retrieved entries to be explored with ease.

There are a number of open issues to be addressed in the use of ontology within the bioinformatics community:

Knowledge based reasoning

This briefing started with a description of how biology research is often driven by the use of knowledge, especially by determination of function by sequence similarity. Only RiboWeb, of the ontologies described, approaches this kind of use. It can be expected that the use of ontology to assist in analysis will grow further. This will be made easier by the conceptual annotation of the primary databases - a collection of similar sequences returned by a search could be clustered within an ontology of protein function and features. Such clustering should be able to help with the analysis of similarity search results and other bioinformatics analyses.


Re-use vs Specific

Currently there is little re-use of bio-ontologies - this is partly because of difficulties in the diversity of their representational form, the explicitness of their semantics and the range of applications they address. OIL moves us further forward to a common representational language. As the number of bio-ontologies increases, it will be interesting to see whether there is a growth in the re-use of ontology. The use of ontology in annotation could drive this process, as well as that of ontology in analysis. An open issue in ontology re-use is the evolution of the source ontology once it has been re-used in another ontology. If the original ontology changes, should the changes be reflected where it is re-used and how would this evolution be managed?

Tools and Libraries

The frame-based Protégé ontology development tool [42] is currently being adapted to represent ontologies in OIL, so that we can build and deliver frame-based ontologies whilst gaining from the reasoning services offered by a DL. This may be less important with small local ontologies designed by one expert, but becomes important for large, collaboratively developed ontologies that are intended to be re-used and shared. Libraries of ontologies, such as those held by WebOnto and Ontolingua, must be developed if re-use is to be promoted.

Methodologies for constructing ontologies

The process of building an ontology, as described in Section 5, is a high-cost process. The reality is that the construction of ontologies is an art rather than a science. Methodologies (supported by tools) are essential to: help the developer spot a concept; modularise their ontologies; avoid problems such as over-elaboration (when should I stop elaborating the ontology?); ensure relevance (when is a concept relevant for an application?) and verify the ontology for its fitness for purpose and its re-usability (if any).

If the application genuinely needs an ontology and that ontology will be long lived, then the investment may well be worthwhile. Like many technologies, in a discipline such as bioinformatics, it is the community effort that is important in making the use of that technology productive.

Acknowledgements: Robert Stevens is supported by a grant from the BBSRC/EPSRC

under the bioinformatics initiative (34/BIO12090); Sean Bechhofer is supported by a

grant from the EPSRC under the DIM initiative (GR/M/75426).

Bibliography

1
P.G. Baker, A. Brass, S. Bechhofer, C. Goble, N. Paton, and R. Stevens.
TAMBIS: Transparent Access to Multiple Bioinformatics Information Sources. An Overview.
In Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology (ISMB'98), pages 25-34, Menlo Park, California, June 28-July 1 1998. AAAI Press.


[2] S. Schulze-Kremer. Ontologies for Molecular Biology. In Proceedings of the Third Pacific Symposium on Biocomputing, pages 693-704. AAAI Press, 1998.

[3] S. Blackburn. The Oxford Dictionary of Philosophy. Oxford University Press, 1996.

[4] D. B. Lenat. Cyc: A Large-Scale Investment in Knowledge Infrastructure. Communications of the ACM, 38(11):32-38, 1995.

[5] M. Uschold, M. King, S. Moralee, and Y. Zorgios. The Enterprise Ontology. The Knowledge Engineering Review, 13(1):31-89, 1998. Special Issue on Putting Ontologies to Use.

[6] T.R. Gruber. Towards Principles for the Design of Ontologies Used for Knowledge Sharing. In Roberto Poli and Nicola Guarino, editors, International Workshop on Formal Ontology, Padova, Italy, 1993. Available as technical report KSL-93-04, Knowledge Systems Laboratory, Stanford University: ftp.ksl.stanford.edu/pub/KSL_Reports/KSL-983-04.ps.

[7] M. Winston, R. Chaffin, and D. Herrmann. A Taxonomy of Part-Whole Relations. Cognitive Science, 11:417-444, 1987.

[8] J.J. Odell. Six Different Kinds of Aggregation, pages 139-149. Cambridge University Press, 1998.

[9] R. J. Brachman, D. L. McGuinness, P. F. Patel-Schneider, L. A. Resnick, and A. Borgida. Living with Classic: When and How to Use a KL-ONE-like Language. In J. Sowa, editor, Principles of Semantic Networks: Explorations in the representation of knowledge, pages 401-456. Morgan Kaufmann, 1991.

[10] R. Jasper and M. Uschold. A Framework for Understanding and Classifying Ontology Applications. In Twelfth Workshop on Knowledge Acquisition Modeling and Management KAW'99, 1999. Published on-line http://sern.ucalgary.ca/KSI/KAW/KAW99/ .


[11] Nicola Guarino and C. Welty. Identity, Unity, and Individuality: Towards a Formal Toolkit for Ontological Analysis. In W. Horn, editor, Proceedings of ECAI-2000: The European Conference on Artificial Intelligence, Amsterdam, August 2000. IOS Press.

[12] International Union of Biochemistry. Enzyme Nomenclature 1984: Recommendations of the Nomenclature Committee of the International Union of Biochemistry on the Nomenclature and Classification of Enzyme-Catalyzed Reactions. Academic Press (for the International Union of Biochemistry), Orlando, FL, 1984.

[13] R.O. Chen, R. Felciano, and R.B. Altman. RiboWeb: Linking Structural Computations to a Knowledge Base of Published Experimental Data. In Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology, pages 84-87. AAAI Press, 1997.

[14] R. Altman, M. Bada, X.J. Chai, M. Whirl Carillo, R.O. Chen, and N.F. Abernethy. RiboWeb: An Ontology-Based System for Collaborative Molecular Biology. IEEE Intelligent Systems, 14(5):68-76, 1999.

[15] P. Karp and S. Paley. Integrated Access to Metabolic and Genomic Data. Journal of Computational Biology, 3(1):191-212, 1996.

[16] P. Karp, M. Riley, S. Paley, A. Pellegrini-Toole, and M. Krummenacker. EcoCyc: Electronic Encyclopedia of E. coli Genes and Metabolism. Nucleic Acids Research, 27(1):55-58, 1999.

[17] M. Riley. Functions of the gene products of Escherichia coli. Microbiological Reviews, 57:862-952, 1993.

[18] The FlyBase Consortium. The FlyBase database of Drosophila Genome Projects and Community Literature. Nucleic Acids Research, 27(1):85-88, 1999. (http://flybase.bio.indiana.edu/).

[19] P.G. Baker, C.A. Goble, S. Bechhofer, N.W. Paton, R. Stevens, and A. Brass. An Ontology for Bioinformatics Applications. Bioinformatics, 15(6):510-520, 1999.


[20] N.W. Paton, R.D. Stevens, P.G. Baker, C.A. Goble, S. Bechhofer, and A. Brass. Query Processing in the TAMBIS Bioinformatics Source Integration System. In Z.M. Ozsoyoglo et al., editors, Proc. 11th Int. Conf. on Scientific and Statistical Database Management (SSDBM), pages 138-147, Los Alamitos, California, July 1999. IEEE Press.

[21] M. Uschold and M. Gruninger. Ontologies: Principles, Methods and Applications. Knowledge Engineering Review, 11(2):93-113, June 1996.

[22] M. Fernandez, A. Gomez-Perez, and N. Juristo. METHODONTOLOGY: From Ontological Art to Ontological Engineering. In Workshop on Knowledge Engineering: Spring Symposium Series (AAAI'97), pages 33-40, Menlo Park, Ca, 1997. AAAI Press.

[23] M. Gruninger and M. S. Fox. Methodology for the Design and Evaluation of Ontologies. In IJCAI Workshop on Basic Ontological Issues in Knowledge Sharing, 1995. Published on-line by Ceur Publication http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-18/ .

[24] D.M. Jones, T.J.M. Bench-Capon, and P.R.S. Visser. Methodologies for Ontology Development. In J. Cuena, editor, Proc. ITi and KNOWS Conference of the 15th IFIP World Computer Congress, pages 62-75, London, UK, 1998. Chapman and Hall Ltd.

[25] A. Gomez-Perez. Some Ideas and Examples to Evaluate Ontologies. Technical Report KSL-94-65, Knowledge Systems Laboratory, Stanford, 1994.

[26] M.A. Ould. Strategies for Software Engineering: The Management of Risk and Quality. Wiley, Chichester, 1990. (Wiley series in software engineering practice.)

[27] A.L. Rector, J.E. Rogers, and P. Pole. The Galen high level ontology. Studies in Health Technology and Informatics, 34:174-178, 1996.

[28] J. Sowa. Top-level ontological categories. International Journal of Human-Computer Studies, 43(5/6):669-686, 1995.

[29] D.A. Duce and G.A. Ringland. Approaches to Knowledge Representation: An Introduction. Knowledge-Based and Expert Systems Series. John Wiley, Chichester, 1988.

[30] I. Horrocks, D. Fensel, J. Broekstra, M. Crubezy, S. Decker, M. Erdmann, W. Grosso, C. Goble, F. Van Harmelen, M. Klein, M. Musen, S. Staab, and R. Studer. The ontology interchange language oil: The grease between ontologies. http://www.cs.vu.nl/~dieter/oil.

[31] A. Farquhar, R. Fikes, and J.P. Rice. The ontolingua server: A tool for collaborative ontology construction. Journal of Human-Computer Studies, 46:707-728, 1997.

[32] V.K. Chaudhri, A. Farquhar, R. Fikes, P.D. Karp, and J.P. Rice. OKBC: A programmatic foundation for knowledge base interoperability. In Proc of 15th National Conf on AI (AAAI-98) and the 10th Conf on Innovative Applications of AI (IAAI-98), pages 600-607, Menlo Park, Ca, 1998. AAAI Press.

[33] A. Borgida. Description Logics in Data Management. IEEE Trans Knowledge and Data Engineering, 7(5):671-782, 1995.

[34] W. A. Woods and J. G. Schmolze. The KL-ONE Family. Computers Math. Applic., 23(2-5):133-177, 1992.

[35] S. Bechhofer and C.A. Goble. Delivering Terminological Services. AI*IA Notizie, Periodico dell'Associazione Italiana per l'intelligenza Artificiale, 12(1), March 1999.

[36] I. Horrocks. Using an Expressive Description Logic: FaCT or Fiction? In A.G. Cohn, L.K. Schubert, and S.C. Shapiro, editors, Principles of Knowledge Representation and Reasoning: Proceedings of the Sixth International Conference (KR'98). Morgan Kaufmann Publishers, San Francisco, CA, 1998.

[37] A. L. Rector, S. K. Bechhofer, C. A. Goble, I. Horrocks, W. A. Nowlan, and W. D. Solomon. The GRAIL Concept Modelling Language for Medical Terminology. Artificial Intelligence in Medicine, 9:139-171, 1996.

[38] J.E. Rogers, W.D. Solomon, A.L. Rector, P.M. Pole, P. Zanstra, and E. van der Haring. Rubrics to Dissections to GRAIL to Classifications. In Medical Informatics Europe '97, pages 241-245, Amsterdam, 1997. IOS Press Vol 43.

[39] P.D. Karp, V.K. Chaudhri, and S.M. Paley. A Collaborative Environment for Authoring Large Knowledge Bases. Journal of Intelligent Information Systems, 13(3):155-194, 1999.

[40] J. Domingue. Tadzebao and WebOnto: Discussing, Browsing, and Editing Ontologies on the Web. In 11th Knowledge Acquisition for Knowledge-Based Systems Workshop, 1998.

[41] A. J. Duineveld, R. Stoter, M.R. Weiden, B. Kenepa, and V. R. Benjamins. Wondertools? A Comparative Study of Ontological Engineering Tools. In Twelfth Workshop on Knowledge Acquisition, Modeling and Management, 1999. Published on-line http://sern.ucalgary.ca/KSI/KAW/KAW99/ .

[42] W. E. Grosso, H. Eriksson, R. W. Fergerson, J. H. Gennari, S. W. Tu, and M. A. Musen. Knowledge Modeling at the Millennium (The Design and Evolution of Protégé-2000). Technical Report SMI-1999-0801, Stanford Medical Informatics (SMI), Stanford University School of Medicine, 1999. [Online] at: http://www-smi.stanford.edu/pubs/SMI_Reports/SMI-1999-0801.pdf.


SBO

From Wikipedia, the free encyclopedia


SBO is the Systems Biology Ontology project, another cornerstone of the BioModels.net effort. The goal of SBO is to develop controlled vocabularies and ontologies tailored specifically for the kinds of problems being faced in systems biology, especially in the context of computational modeling.

Contents

• 1 Motivation
• 2 Structure
• 3 Resources
• 4 SBO and SBML
• 5 Organization of SBO development
• 6 Funding for SBO
• 7 External references

Motivation

The rise of Systems Biology, seeking to comprehend biological processes as a whole, highlighted the need not only to develop corresponding quantitative models, but also to create standards allowing their exchange and integration. This concern drove the community to design common data formats such as SBML and CellML. SBML is now largely accepted and used in the field. However, as important as the definition of a common syntax is, it is also necessary to make clear the semantics of models. SBO is an attempt to provide the means of annotating models with terms that indicate the intended semantics of an important subset of models in common use in computational systems biology.

Structure

SBO is currently made up of five different vocabularies: quantitative parameters (catalytic constant, thermodynamic temperature...), participant type (substrate, product, catalyst...), modelling frameworks (discrete, continuous...), mathematical expressions and events.

Resources

The Systems Biology Ontology Browser

To curate and maintain SBO, a dedicated resource has been developed, and the public interface of the SBO browser can be accessed at http://www.ebi.ac.uk/sbo. A relational database management system (MySQL) at the back-end is accessed through a web interface based on Java Server Pages (JSP) and JavaBeans. Its content is encoded in UTF-8, therefore supporting a large set of characters in the definitions of terms. Distributed curation is made possible by a custom-tailored locking system that allows concurrent access. This system allows continuous updating of the ontology with immediate availability, and suppresses merging problems.

Several export formats (OBO flat file, SBO-XML and OWL) are generated daily or on request and can be downloaded from the web interface.

To allow programmatic access to the resource, Web Services have been implemented, based on Apache Axis for the communication layer and Castor for the validation. The libraries, full documentation, samples and tutorial are available online.

The SourceForge project can be accessed at http://sourceforge.net/projects/sbo/.

SBO and SBML

SBML Level 2 Version 2 provides a mechanism to annotate model components with SBO terms, thereby enriching the semantics of the model beyond the sole topology of interaction and mathematical expression. Simulation tools can check the consistency of a rate law, convert reactions from one modelling framework to another (e.g., continuous to discrete), or distinguish between identical mathematical expressions based on different assumptions (e.g., Henri-Michaelis-Menten vs. Briggs-Haldane). Other tools such as SBMLmerge can use the SBO annotation to integrate individual models into a larger one.

The use of SBO is not restricted to the development of models. Resources providing quantitative experimental information such as SABIO Reaction Kinetics will be able to annotate the parameters (what do they mean exactly, how were they calculated) and determine relationships between them.
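As a rough illustration of how a tool might use such annotations to tell apart two reactions whose mathematics look identical, the sketch below reads an `sboTerm` attribute from each kinetic law in a tiny SBML fragment. It assumes the annotation is carried as an `sboTerm` attribute on `kineticLaw`; the model, the function name `rate_law_annotations`, and the identifier-to-label table are illustrative placeholders, and the SBO identifiers in particular should be checked against the SBO browser before being relied on.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for SBML Level 2 Version 2 documents.
SBML_NS = "http://www.sbml.org/sbml/level2/version2"

# Illustrative lookup table; identifiers and labels are placeholders
# to be verified against the SBO browser.
SBO_LABELS = {
    "SBO:0000029": "Henri-Michaelis-Menten rate law",
    "SBO:0000031": "Briggs-Haldane rate law",
}

# Toy model: two reactions that differ only in their SBO annotation.
MODEL = f"""<sbml xmlns="{SBML_NS}" level="2" version="2">
  <model id="demo">
    <listOfReactions>
      <reaction id="r1"><kineticLaw sboTerm="SBO:0000029"/></reaction>
      <reaction id="r2"><kineticLaw sboTerm="SBO:0000031"/></reaction>
    </listOfReactions>
  </model>
</sbml>"""


def rate_law_annotations(sbml_text):
    """Map each reaction id to the label of its annotated rate law."""
    root = ET.fromstring(sbml_text)
    result = {}
    for reaction in root.iter(f"{{{SBML_NS}}}reaction"):
        law = reaction.find(f"{{{SBML_NS}}}kineticLaw")
        term = law.get("sboTerm") if law is not None else None
        result[reaction.get("id")] = SBO_LABELS.get(term, "unannotated")
    return result


annotations = rate_law_annotations(MODEL)
```

Here the two reactions are distinguished purely by their annotation, which is the kind of information a rate-law expression alone does not carry.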

Organization of SBO development

SBO is built in collaboration by the Computational Neurobiology Group (Nicolas Le Novère, EMBL-EBI, United Kingdom) and the SBML Team (Michael Hucka, Caltech, USA).

Funding for SBO

SBO has benefited from the funds of the European Molecular Biology Laboratory and the National Institute of General Medical Sciences.

External references

• www.biomodels.net

• [1] The Systems Biology Markup Language (SBML): A Medium for Representation and Exchange of Biochemical Network Models

• [2] CellML: its future, present and past.


the Gene Ontology


An Introduction to the Gene Ontology

• What does the Gene Ontology Consortium do?
• Terms in the Gene Ontology
  • Species-specific terms
  • Obsolete terms
• The Ontologies
  • Cellular component
  • Biological process
  • Molecular function
• Ontology structure
  • Topology
  • Term-Term Relationships
  • Relationship Transitivity
• What GO is NOT
• Annotation and tools
• Downloads
• Beyond GO
  • Cross-products
  • Mappings to other classification systems
• Contributing to GO

What does the Gene Ontology Consortium do?

Biologists currently waste a lot of time and effort in searching for all of the available information about each small area of research. This is hampered further by the wide variations in terminology that may be common usage at any given time, which inhibit effective searching by both computers and people. For example, if you were searching for new targets for antibiotics, you might want to find all the gene products that are involved in bacterial protein synthesis, and that have significantly different sequences or structures from those in humans. If one database describes these molecules as being involved in 'translation', whereas another uses the phrase 'protein synthesis', it will be difficult for you - and even harder for a computer - to find functionally equivalent terms.

The Gene Ontology (GO) project is a collaborative effort to address the need for consistent descriptions of gene products in different databases. The project began as a collaboration between three model organism databases, FlyBase (Drosophila), the Saccharomyces Genome Database (SGD) and the Mouse Genome Database (MGD), in 1998. Since then, the GO Consortium has grown to include many databases, including several of the world's major repositories for plant, animal and microbial genomes. See the GO Consortium page for a full list of member organizations.

The GO project has developed three structured controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components and molecular functions in a species-independent manner. There are three separate aspects to this effort: first, the development and maintenance of the ontologies themselves; second, the annotation of gene products, which entails making associations between the ontologies and the genes and gene products in the collaborating databases; and third, development of tools that facilitate the creation, maintenance and use of ontologies.

The use of GO terms by collaborating databases facilitates uniform queries across them. The controlled vocabularies are structured so that they can be queried at different levels: for example, you can use GO to find all the gene products in the mouse genome that are involved in signal transduction, or you can zoom in on all the receptor tyrosine kinases. This structure also allows annotators to assign properties to genes or gene products at different levels, depending on the depth of knowledge about that entity.


Terms in the Gene Ontology

The building blocks of the Gene Ontology are the terms, so what makes up a GO term?

Each entry in GO has a unique numerical identifier of the form GO:nnnnnnn, and a term

name, e.g. cell, fibroblast growth factor receptor binding or signal transduction. Each

term is also assigned to one of the three ontologies, molecular function, cellular component or biological process.
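The shape of a GO entry described above can be captured in a small record type. The following is a minimal sketch, not a representation used by the GO project itself: the class name `GOTerm`, its field names, and the validation rules are assumptions made for illustration, based only on the `GO:nnnnnnn` identifier pattern and the three ontology names given in the text.

```python
import re
from dataclasses import dataclass

# Identifier pattern from the text: "GO:" followed by seven digits.
GO_ID_PATTERN = re.compile(r"GO:\d{7}")

# The three ontologies a term can be assigned to.
ONTOLOGIES = {"molecular function", "cellular component", "biological process"}


@dataclass(frozen=True)
class GOTerm:
    """Hypothetical record for one GO entry: identifier, name, ontology."""
    go_id: str
    name: str
    ontology: str

    def __post_init__(self):
        # Reject identifiers that do not match GO:nnnnnnn.
        if not GO_ID_PATTERN.fullmatch(self.go_id):
            raise ValueError(f"malformed GO identifier: {self.go_id!r}")
        # Reject ontology names outside the three listed above.
        if self.ontology not in ONTOLOGIES:
            raise ValueError(f"unknown ontology: {self.ontology!r}")


chromosome = GOTerm("GO:0005694", "chromosome", "cellular component")
```

Validating at construction time means a malformed identifier is caught where it enters the program rather than later, during a lookup.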

The majority of terms have a textual definition, with references stating the source of the

definition. If any clarification of the definition or remarks about term usage are required,

these are held in a separate comments field.

Many GO terms have synonyms; GO uses 'synonym' in a loose sense, as the names within the synonyms field may not mean exactly the same as the term they are attached to. Instead, a GO synonym may be broader or narrower than the term string; it may be a related phrase; it may be alternative wording, spelling or use a different system of nomenclature; or it may be a true synonym. This flexibility allows GO synonyms to serve as valuable search aids, as well as being useful for applications such as text mining and semantic matching. The relationship of the synonym to the term is recorded within the GO file.
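To show why recording the synonym's relationship to the term matters for searching, here is a small sketch of a lookup index that keeps that relationship alongside each synonym. The scope labels (`exact`, `broad`, `narrow`, `related`), the sample data, and the function names are assumptions chosen for illustration; in particular, the scopes assigned to the sample synonyms are hypothetical.

```python
from collections import defaultdict

# (term id, term name, [(synonym text, scope)]).
# Sample data for illustration only; scope assignments are hypothetical.
TERMS = [
    ("GO:0006412", "translation", [
        ("protein synthesis", "exact"),
        ("protein formation", "related"),
    ]),
]


def build_index(terms):
    """Index term names and synonyms (lower-cased) to (term id, scope) pairs."""
    index = defaultdict(list)
    for go_id, name, synonyms in terms:
        index[name.lower()].append((go_id, "name"))
        for text, scope in synonyms:
            index[text.lower()].append((go_id, scope))
    return index


def search(index, query, scopes=("name", "exact")):
    """Return term ids whose name or synonym matches, limited to the given scopes."""
    return [go_id for go_id, scope in index.get(query.lower(), []) if scope in scopes]


index = build_index(TERMS)
strict_hits = search(index, "Protein Synthesis")
loose_hits = search(index, "protein formation", scopes=("name", "exact", "related"))
```

A strict search can then restrict itself to names and true synonyms, while a text-mining application might deliberately widen the accepted scopes.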

The scope of the Gene Ontology overlaps with a number of other databases, and in cases where a GO term is identical in meaning to an object in another database, a database cross reference is added to the term. These cross references can also be downloaded from the mappings to GO page.

Species-specific terms

The Gene Ontology aims to provide a controlled vocabulary that can be used to describe any organism; nevertheless, many functions, processes and components are not common to all life forms. The convention is to include any term that can apply to more than one taxonomic class of organism. To specify the class of organisms to which a term is applicable, GO uses the designator sensu, 'in the sense of'; for example, trichome differentiation (sensu Magnoliophyta) represents the differentiation of plant hair cells (trichomes).

Obsolete terms

Occasionally, a term is found that is outside the scope of GO, is misleadingly named or defined, or describes a concept that would be better represented in another way. Rather than delete the term, it is deprecated or made obsolete. The term and ID still exist in the GO database, but the term is marked as obsolete, and a comment is often added, giving a reason for the obsoletion. A replacement term is usually also suggested.
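Because obsolete terms stay in the database with a suggested replacement, client code can follow that suggestion instead of failing on an old identifier. The sketch below assumes a simple in-memory layout; the field names (`obsolete`, `comment`, `replaced_by`), the identifiers and the function `resolve` are all hypothetical and are not taken from any GO file format.

```python
# Hypothetical records; identifiers, names and field names are placeholders.
TERMS = {
    "GO:9000001": {"name": "old term", "obsolete": True,
                   "comment": "misleadingly named", "replaced_by": "GO:9000002"},
    "GO:9000002": {"name": "new term", "obsolete": False},
    "GO:9000003": {"name": "retired term", "obsolete": True,
                   "comment": "outside the scope of GO", "replaced_by": None},
}


def resolve(go_id, terms):
    """Follow replacement suggestions until a current term is reached.

    Returns the current term id, or None when an obsolete term has no
    suggested replacement.
    """
    seen = set()
    while True:
        if go_id in seen:
            raise ValueError(f"replacement cycle at {go_id}")
        seen.add(go_id)
        term = terms[go_id]
        if not term["obsolete"]:
            return go_id
        replacement = term.get("replaced_by")
        if replacement is None:
            return None
        go_id = replacement


current = resolve("GO:9000001", TERMS)
```

Returning `None` for a term without a replacement leaves the decision to the caller, who can surface the stored comment explaining why the term was retired.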


The Ontologies

The three organizing principles of GO are cellular component, biological process and molecular function. A gene product might be associated with or located in one or more cellular components; it is active in one or more biological processes, during which it performs one or more molecular functions. For example, the gene product cytochrome c can be described by the molecular function term oxidoreductase activity, the biological process terms oxidative phosphorylation and induction of cell death, and the cellular component terms mitochondrial matrix and mitochondrial inner membrane.

Cellular component

A cellular component is just that, a component of a cell, but with the proviso that it is part of some larger object; this may be an anatomical structure (e.g. rough endoplasmic reticulum or nucleus) or a gene product group (e.g. ribosome, proteasome or a protein dimer). See the documentation on the cellular component ontology for more details.

Biological process

A biological process is a series of events accomplished by one or more ordered assemblies of molecular functions. Examples of broad biological process terms are cellular physiological process or signal transduction. Examples of more specific terms are pyrimidine metabolic process or alpha-glucoside transport. It can be difficult to distinguish between a biological process and a molecular function, but the general rule is that a process must have more than one distinct step.

A biological process is not equivalent to a pathway; at present, GO does not try to

represent the dynamics or dependencies that would be required to fully describe a pathway.

Further information can be found in the process ontology documentation.

Molecular function

Molecular function describes activities, such as catalytic or binding activities, that occur at the molecular level. GO molecular function terms represent activities rather than the entities (molecules or complexes) that perform the actions, and do not specify where or when, or in what context, the action takes place. Molecular functions generally correspond to activities that can be performed by individual gene products, but some activities are performed by assembled complexes of gene products. Examples of broad functional terms are catalytic activity, transporter activity, or binding; examples of narrower functional terms are adenylate cyclase activity or Toll receptor binding.

It is easy to confuse a gene product name with its molecular function, and for that reason

many GO molecular functions are appended with the word "activity". The documentation

on gene products explains this confusion in more depth. The documentation on the

function ontology explains more about GO functions and the rules governing them.


Ontology structure

Topology

The ontologies are structured as directed acyclic graphs, which are similar to hierarchies but differ in that a more specialized term (child) can be related to more than one less specialized term (parent). For example, the biological process term hexose biosynthetic process has two parents, hexose metabolic process and monosaccharide biosynthetic process. This is because biosynthetic process is a type of metabolic process and a hexose is a type of monosaccharide. When any gene involved in hexose biosynthetic process is annotated to this term, it is automatically annotated to both hexose metabolic process and monosaccharide biosynthetic process.
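The hexose example can be sketched as a child-to-parents mapping with a traversal that collects every ancestor. This is only an illustration: term names stand in for GO identifiers, the shared grandparent `monosaccharide metabolic process` is an assumption added to show two paths converging, and `geneX` and the function names are hypothetical.

```python
# Child -> parents. The first entry follows the example in the text; the
# shared grandparent is an assumption added to show converging paths.
PARENTS = {
    "hexose biosynthetic process": [
        "hexose metabolic process",
        "monosaccharide biosynthetic process",
    ],
    "hexose metabolic process": ["monosaccharide metabolic process"],
    "monosaccharide biosynthetic process": ["monosaccharide metabolic process"],
    "monosaccharide metabolic process": [],
}


def ancestors(term, parents):
    """Collect every term reachable by following parent links upward."""
    found = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in found:  # a DAG can reach a node by several paths
                found.add(parent)
                stack.append(parent)
    return found


def propagate(annotations, parents):
    """Extend each gene's direct annotations with all ancestor terms."""
    return {
        gene: set(terms) | {a for t in terms for a in ancestors(t, parents)}
        for gene, terms in annotations.items()
    }


direct = {"geneX": {"hexose biosynthetic process"}}
implied = propagate(direct, PARENTS)
```

The `found` set is what distinguishes this from a tree walk: in a directed acyclic graph the same ancestor can be reached along more than one path, and it should be reported once.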

Term-Term Relationships

GO terms can be linked by five types of relationships: is_a, part_of, regulates,

 positively_regulates and negatively_regulates.


is_a

The is_a relationship is a simple class-subclass relationship, where A is_a B means that A

is a subclass of B; for example, nuclear chromosome is_a chromosome.

GO:0043232 : intracellular non-membrane-bound organelle
[i] GO:0005694 : chromosome
---[i] GO:0000228 : nuclear chromosome

part_of 

The part_of relationship is slightly more complex; C part_of D means that whenever C is

 present, it is always a part of D, but C does not always have to be present. An example

would be periplasmic flagellum part_of periplasmic space:

GO:0044464 : cell part
[i] GO:0042995 : cell projection
---[i] GO:0019861 : flagellum
------[i] GO:0009288 : flagellin-based flagellum
---------[i] GO:0055040 : periplasmic flagellum
[i] GO:0042597 : periplasmic space
---[p] GO:0055040 : periplasmic flagellum

When a periplasmic flagellum is present, it is always part_of a periplasmic space. However, not every periplasmic space necessarily has a periplasmic flagellum.

regulates, positively_regulates and negatively_regulates

The regulates, positively_regulates and negatively_regulates relationships describe interactions between biological processes and other biological processes, molecular functions or biological qualities. When a biological process E regulates a function or a process F, it modulates the occurrence of F. If F is a biological quality, then E modulates the value of F. An example of the regulation of a biological process would be the term regulation of transcription. When regulation of transcription occurs, it always alters the rate, extent or frequency at which a gene is transcribed.

Relationship Transitivity

is_a and part_of 

The is_a and part_of relationships are transitive, which means that the relationships are propagated from child terms to parent terms. An example of is_a transitivity is shown in the nuclear chromosome example previously used:


GO:0043232 : intracellular non-membrane-bound organelle
[i] GO:0005694 : chromosome
---[i] GO:0000228 : nuclear chromosome

All nuclear chromosomes must be intracellular non-membrane-bound organelles.

An example of part_of transitivity is shown below:

GO:0048869 : cellular developmental process
[i] GO:0030154 : cell differentiation
---[p] GO:0048468 : cell development
------[p] GO:0000904 : cellular morphogenesis during differentiation

Every occurrence of cellular morphogenesis during differentiation must be a part of an

occurrence of cell differentiation.
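The two transitivity examples can be reproduced with a small closure computation over labelled edges. This sketch handles chains made of a single relationship type only (is_a followed by is_a, or part_of followed by part_of), which is all the examples above require; how mixed chains should combine is not covered. The edge list uses the identifiers from the two diagrams, while the function name and the data layout are assumptions.

```python
# (child, relationship, parent) edges taken from the two diagrams above.
EDGES = [
    ("GO:0000228", "is_a", "GO:0005694"),     # nuclear chromosome -> chromosome
    ("GO:0005694", "is_a", "GO:0043232"),     # chromosome -> organelle
    ("GO:0000904", "part_of", "GO:0048468"),  # morphogenesis -> cell development
    ("GO:0048468", "part_of", "GO:0030154"),  # cell development -> differentiation
]


def closure(edges, relation):
    """Return all (descendant, ancestor) pairs implied by one transitive relation."""
    direct = {}
    for child, rel, parent in edges:
        if rel == relation:
            direct.setdefault(child, set()).add(parent)
    inferred = set()
    for start in direct:
        stack = list(direct[start])
        while stack:
            node = stack.pop()
            if (start, node) not in inferred:
                inferred.add((start, node))
                stack.extend(direct.get(node, ()))
    return inferred


is_a_pairs = closure(EDGES, "is_a")
part_of_pairs = closure(EDGES, "part_of")
```

The pair (`GO:0000228`, `GO:0043232`) in `is_a_pairs` corresponds to the statement that all nuclear chromosomes are intracellular non-membrane-bound organelles, and (`GO:0000904`, `GO:0030154`) in `part_of_pairs` to the statement about cellular morphogenesis during differentiation.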

regulates, positively_regulates and negatively_regulates

The regulates relationships are transitive over both the part_of and is_a relationships.

GO:0010467 : gene expression
[r] GO:0010468 : regulation of gene expression
---[i] GO:0045449 : regulation of transcription
[p] GO:0006350 : transcription
---[r] GO:0045449 : regulation of transcription

 part_of transitivity: If process Y exists in the GO biological process ontology and it is a part_of child of process X then any process that regulates process Y also regulates

 process X.

In the example above, regulation of transcription regulates transcription which is part_of 

gene expression. Therefore, regulation of transcription also regulates gene expression.

is_a transitivity: If process B exists in the GO biological process ontology and it is an

is_a child of process A then any process that regulates process B also regulates process

A.

In the example above, regulation of transcription is_a form of regulation of gene expression, which regulates gene expression. Therefore, regulation of transcription also regulates gene expression.

Transitivity of regulates

The regulates relationship is transitive over both the is_a and part_of relationships.


is_a transitivity: If process B exists in the GO biological process ontology and it is an

is_a child of process A then any process that regulates process B also regulates process

A. For example:

GO:0016049 : cell growth
[i] GO:0042815 : bipolar cell growth
---[r] GO:0051516 : regulation of bipolar cell growth

Due to is_a transitivity, we can say that any process that regulates bipolar cell growth

also regulates cell growth.

 part_of transitivity: If process Y exists in the GO biological process ontology and it is a

 part_of child of process X then any process that regulates process Y also regulates process X.

GO:0001754 : eye photoreceptor cell differentiation
[p] GO:0042462 : eye photoreceptor cell development
---[r] GO:0042478 : regulation of eye photoreceptor cell development
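The two rules for regulates can be sketched together: start from the process that is directly regulated, then walk upward through its is_a and part_of parents. The identifiers come from the two diagrams above; the dictionary layout and the function name `regulated_targets` are assumptions made for illustration.

```python
# Child -> parents reached through is_a or part_of (from the diagrams above).
UP = {
    "GO:0042815": {"GO:0016049"},  # bipolar cell growth is_a cell growth
    "GO:0042462": {"GO:0001754"},  # cell development part_of cell differentiation
}

# Regulating process -> the process it directly regulates.
REGULATES = {
    "GO:0051516": "GO:0042815",
    "GO:0042478": "GO:0042462",
}


def regulated_targets(regulator, regulates, up):
    """Return the directly regulated process plus all of its is_a/part_of ancestors."""
    targets = set()
    stack = [regulates[regulator]]
    while stack:
        node = stack.pop()
        if node not in targets:
            targets.add(node)
            stack.extend(up.get(node, ()))
    return targets


growth_targets = regulated_targets("GO:0051516", REGULATES, UP)
eye_targets = regulated_targets("GO:0042478", REGULATES, UP)
```

Under these rules, regulation of bipolar cell growth is reported as regulating cell growth, and regulation of eye photoreceptor cell development as regulating eye photoreceptor cell differentiation.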

Every GO term must obey the true path rule: if the child term describes the gene product,

then all its parent terms must also apply to that gene product.


What GO is NOT

It is important to clearly state the scope of GO, and what it does and does not cover. The ontologies section explains the domains covered by GO; the following areas are outside the scope of GO, and terms in these domains would not appear in the ontologies.

• Gene products: e.g. cytochrome c is not in the ontologies, but attributes of cytochrome c, such as oxidoreductase activity, are.

• Processes, functions or components that are unique to mutants or diseases: e.g. oncogenesis is not a valid GO term because causing cancer is not the normal function of any gene.

• Attributes of sequence such as intron/exon parameters: these are not attributes of gene products and will be described in a separate sequence ontology (see the OBO website for more information).

• Protein domains or structural features.

• Protein-protein interactions.

• Environment, evolution and expression.

• Anatomical or histological features above the level of cellular components, including cell types.


GO is not a database of gene sequences, nor a catalog of gene products. Rather, GO

describes how gene products behave in a cellular context.

GO is not a dictated standard, mandating nomenclature across databases. Groups participate because of self-interest, and cooperate to arrive at a consensus.

GO is not a way to unify biological databases (i.e. GO is not a 'federated solution').

Sharing vocabulary is a step towards unification, but is not, in itself, sufficient. Reasons

for this include the following:

• Knowledge changes and updates lag behind.

• Individual curators evaluate data differently. While we can agree to use the word

'kinase', we must also agree to support this by stating how and why we use

'kinase', and consistently apply it. Only in this way can we hope to compare gene products and determine whether they are related.

• GO does not attempt to describe every aspect of biology; its scope is limited to

the domains described above.


Annotation and tools

How do the terms in GO become associated with their appropriate gene products?

Collaborating databases annotate their genes or gene products with GO terms, providing references and indicating what kind of evidence is available to support the annotations. More information can be found in the GO Annotation Guide.

If you browse any of the contributing databases, you'll find that each gene or gene product has a list of associated GO terms. Each database also publishes downloadable files containing these associations; these can be downloaded from the GO annotations page. You can browse the ontologies using a range of web-based browsers. A full list of these, and other tools for analyzing gene function using GO, is available in the GO Tools section.

In addition, the GO consortium has prepared GO slims, 'slimmed down' versions of the ontologies that allow you to annotate genomes or sets of gene products to gain a high-level view of gene functions. Using GO slims you can, for example, work out what proportion of a genome is involved in signal transduction, biosynthesis or reproduction.

See the GO Slim Guide for more information.
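The proportion calculation mentioned above can be sketched as follows: each gene's specific annotations are mapped up to whichever slim terms lie among their ancestors, and genes are then counted per slim term. Everything in this toy example is hypothetical, including the term names, the parent links, the gene names and the function names; it only illustrates the idea of summarising annotations at a coarser level.

```python
# Toy ontology fragment (child -> parents); names and links are hypothetical.
PARENTS = {
    "hexose biosynthetic process": {"biosynthesis"},
    "insulin receptor signaling": {"signal transduction"},
}

# The high-level terms that make up the slim.
SLIM = {"signal transduction", "biosynthesis", "reproduction"}

# Hypothetical direct annotations for a four-gene "genome".
ANNOTATIONS = {
    "gene1": {"hexose biosynthetic process"},
    "gene2": {"insulin receptor signaling"},
    "gene3": {"hexose biosynthetic process", "insulin receptor signaling"},
    "gene4": set(),
}


def with_ancestors(term, parents):
    """Return the term together with everything reachable through parent links."""
    seen = set()
    stack = [term]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def slim_proportions(annotations, parents, slim):
    """Fraction of genes whose annotations map to each slim term."""
    counts = {s: 0 for s in slim}
    for terms in annotations.values():
        hits = set()
        for term in terms:
            hits |= with_ancestors(term, parents) & slim
        for s in hits:  # count each gene at most once per slim term
            counts[s] += 1
    total = len(annotations)
    return {s: counts[s] / total for s in slim}


proportions = slim_proportions(ANNOTATIONS, PARENTS, SLIM)
```

In this toy data set, half of the genes map to biosynthesis, half to signal transduction and none to reproduction.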


Downloads


All data from the GO project is freely available. You can download the ontology data in a

number of different formats, including XML and MySQL, from the GO Downloads page.

For more information on the syntax of these formats, see the GO File Format Guide. 

If you need lists of the genes or gene products that have been associated with a particular GO term, the Current Annotations table tracks the number of annotations and provides links to the gene association files for each of the collaborating databases.


Beyond GO

GO allows us to annotate genes and their products with a limited set of attributes. For 

example, GO does not allow us to describe genes in terms of which cells or tissues they're

expressed in, which developmental stages they're expressed at, or their involvement in disease. It is not necessary for GO to do these things because other ontologies are being

developed for these purposes. The GO consortium supports the development of other ontologies and makes its tools for editing and curating ontologies freely available. A list of freely available ontologies that are relevant to genomics and proteomics and are structured similarly to GO can be found at the Open Biomedical Ontologies website. A larger list, which includes the ontologies listed at OBO and also other controlled vocabularies that do not fulfill the OBO criteria, is available at the Ontology Working Group section of the Microarray Gene Expression Data (MGED) Network site.

Cross-products

The existence of several ontologies will also allow us to create 'cross-products' that

maximize the utility of each ontology while avoiding redundancy. For example, by combining the developmental terms in the GO process ontology with a second ontology that describes Drosophila anatomical structures, we could create an ontology of fly development. We could repeat this process for other organisms without having to clutter up GO with large numbers of species-specific terms. Similarly, we could create an ontology of biosynthetic pathways by combining the biosynthesis terms in the GO process ontology with a chemical ontology.

Mappings to other classification systems

GO is not the only attempt to build structured controlled vocabularies for genome

annotation, nor is it the only such series of catalogs in current use. The GO project provides mappings between GO and these other systems, although we caution that these mappings are neither complete nor exact and should only be used as a guide.


Contributing to GO


The GO project is constantly evolving, and we welcome feedback from all users. If you

need a new term or definition, or would like to suggest that we reorganize a section of 

one of the ontologies, please do so through the GO curator requests tracker. Any errors or omissions in annotations should be reported to the GO annotation mailing list.

Any other questions or suggestions should be addressed to the GO helpdesk.


Last modified Tuesday, 25-Mar-2008 09:40:13 PDT


Copyright © 1999-Tuesday, 20-May-2008 20:51:29 PDT the Gene Ontology 

File Format Guide

The GO File Format Guide documents the structure and syntax of the GO files available on the GO website, to assist users who need to read, write parsers for, or create these files.

See also the GO annotation file format guide for the format used in the gene association files.

• Anatomy of a GO Term

• Ontology Flat File Formats

• GO RDF-XML Format

• OWL Format

• OBO-XML Format

• MySQL Format

• FASTA Format

• Mappings to Other Classification Systems

Anatomy of a GO Term

Terms and unique identifiers

The structure of a GO term is very simple. At its bare minimum, each GO entry consists of a term name (e.g. cell) and a unique, zero-padded seven-digit identifier (or accession number) prefixed by GO: (e.g. GO:0005623), which is used as a unique identifier and database cross-reference. The same number range is used across all three ontologies. The numeric portion of a GO ID does not have any 'meaning' or relation to the position of the term in the ontologies; instead, ranges of GO IDs are assigned to specific groups or individual curators, so a GO ID can be used to trace who added a term.
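As a small illustration of the identifier layout described above, the following Python sketch checks and builds GO IDs. The pattern is taken directly from the description (GO: followed by seven digits); the function names are hypothetical and not part of any GO software.

```python
import re

# "GO:" followed by exactly seven ASCII digits, as described in the text.
GO_ID_PATTERN = re.compile(r"GO:([0-9]{7})")


def is_valid_go_id(text):
    """Return True if text is a well-formed GO ID such as GO:0005623."""
    return GO_ID_PATTERN.fullmatch(text) is not None


def format_go_id(number):
    """Build a GO ID from its numeric portion, zero-padded to seven digits."""
    if not 0 <= number <= 9_999_999:
        raise ValueError("numeric portion must fit in seven digits")
    return f"GO:{number:07d}"


def numeric_part(go_id):
    """Return the numeric portion of a GO ID as an int."""
    match = GO_ID_PATTERN.fullmatch(go_id)
    if match is None:
        raise ValueError(f"not a GO ID: {go_id!r}")
    return int(match.group(1))
```

A check like this only confirms the shape of an ID; it says nothing about whether the term exists in the ontologies.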

Secondary IDs

Terms may have one or more secondary IDs, alternate IDs that refer to the term. Secondary IDs come about when two or more terms are identical in meaning, and are merged into a single term. All term IDs are preserved so that no information (for example, annotations to the merged IDs) is lost. More information on the protocols involved can be found in the documentation on term merges.
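One way a consumer of the data might honour secondary IDs is to keep a lookup from each secondary ID to the primary ID of the surviving term. The sketch below assumes such a table can be built from the term records; the IDs used are made-up placeholders, not real merge records, and the function names are hypothetical.

```python
def build_secondary_index(primary_to_secondary):
    """Map every secondary ID to the primary ID of the term it was merged into."""
    index = {}
    for primary_id, secondary_ids in primary_to_secondary.items():
        for secondary_id in secondary_ids:
            index[secondary_id] = primary_id
    return index


def resolve(go_id, index):
    """Return the primary ID for go_id, or go_id itself if it is not secondary."""
    return index.get(go_id, go_id)


# Placeholder IDs for illustration only.
merged_terms = {"GO:9999901": ["GO:9999902", "GO:9999903"]}
index = build_secondary_index(merged_terms)
```

With this table, an annotation made to a merged ID can still be attributed to the surviving term.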

Synonyms

Any term may, but does not need to, include one or more synonyms (e.g. type I programmed cell death is a synonym of apoptosis). Synonyms are assigned a relationship

to the primary term string; see the documentation on synonyms for more information.

Database cross-references

Another optional extra is one or more general database cross-references (dbxrefs), which

refer to an identical object in another database. For instance, the molecular function term

retinal isomerase activity has the database cross reference EC:5.2.1.3, which is the accession number of this enzyme activity in the Enzyme Commission database. A complete list of database cross-references and database abbreviations used by GO is available.

Definition and Comment

GO terms should be equipped with a text definition, which includes an indication of the

source of the definition. Terms may also have a comment, which gives more information

about the term and its usage.


Ontology Flat File Formats

There are two types of ontology flat file format, the older GO flat file format and the newer OBO flat file format. The GO flat file format is now deprecated but will continue to be provided alongside the new format.

See also the Java OBO parser guide, which gives details of the OBO parser implemented

as part of OBO-Edit, and how to use it.


GO RDF-XML Format

The GO RDF-XML version of GO, which includes all three ontologies and the definitions, can be downloaded from the GO database archive. The document type definition (DTD) is available from the GO FTP site.

The GO RDF-XML file is built from the flat files and the gene association files on a

monthly basis.

Here's a GO RDF-XML snapshot (with some lines wrapped for legibility):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE go:go>
<go:go xmlns:go="xml-dtd/go.dtd#"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <go:version timestamp="Wed May 9 23:55:02 2001" />
  <rdf:RDF>
    <go:term rdf:about="go#GO:0003673">
      <go:accession>GO:0003673</go:accession>
      <go:name>Gene_Ontology</go:name>
      <go:definition></go:definition>
    </go:term>
    <go:term rdf:about="go#GO:0003674">
      <go:accession>GO:0003674</go:accession>
      <go:name>molecular_function</go:name>
      <go:definition>The action characteristic of a gene product.</go:definition>
      <go:part-of rdf:resource="go#GO:0003673" />
      <go:dbxref>
        <go:database_symbol>go</go:database_symbol>
        <go:reference>curators</go:reference>
      </go:dbxref>
    </go:term>
    <go:term rdf:about="go#GO:0016209">
      <go:accession>GO:0016209</go:accession>
      <go:name>antioxidant</go:name>
      <go:definition></go:definition>
      <go:isa rdf:resource="go#GO:0003674" />
      <go:association>
        <go:evidence evidence_code="ISS">
          <go:dbxref>
            <go:database_symbol>fb</go:database_symbol>
            <go:reference>fbrf0105495</go:reference>
          </go:dbxref>
        </go:evidence>
        <go:gene_product>
          <go:name>CG7217</go:name>
          <go:dbxref>
            <go:database_symbol>fb</go:database_symbol>
            <go:reference>FBgn0038570</go:reference>
          </go:dbxref>
        </go:gene_product>
      </go:association>
      <go:association>
        <go:evidence evidence_code="ISS">
          <go:dbxref>
            <go:database_symbol>fb</go:database_symbol>
            <go:reference>fbrf0105495</go:reference>
          </go:dbxref>
        </go:evidence>
        <go:gene_product>
          <go:name>Jafrac1</go:name>
          <go:dbxref>
            <go:database_symbol>fb</go:database_symbol>
            <go:reference>FBgn0040309</go:reference>
          </go:dbxref>
        </go:gene_product>
      </go:association>
    </go:term>
  </rdf:RDF>
</go:go>

The basic unit of the GO RDF-XML database is GO:termid. Owing to limitations of the

XML id and idref attributes (for instance, multiple parentage cannot be represented), the

linking mechanism is RDF. RDF provides a much more flexible system for representing trees. To follow the links, note that the term molecular function ; GO:0003674 has the

attribute

rdf:about="go#GO:0003674" 

This is roughly equivalent to

id="go#GO:0003674" 

In RDF, unique URLs are used as ids to make them universally unique. Now, note that the term

antioxidant activity ; GO:0016209 has the tag

<go:isa rdf:resource="go#GO:0003674" />

This shows that its parent is molecular function ; GO:0003674. This tag represents the

relationship "GO:0016209 isa GO:0003674" or, in plain English, "antioxidant is a

molecular function". The other type of parentage relationship is go:part-of. molecular function ; GO:0003674 has the tag

<go:part-of rdf:resource="go#GO:0003673" />

This shows the relationship "molecular function is part of the Gene Ontology".
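The link-following described above can be tried out with Python's standard xml.etree.ElementTree module. This is a minimal sketch, not an official GO parser: the embedded sample is a trimmed copy of the snapshot, the go namespace URI is copied as it appears there (a real file may declare a longer URI), and the function name parent_links is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the snapshot; the go one may be abbreviated.
GO_NS = "xml-dtd/go.dtd#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<go:go xmlns:go="xml-dtd/go.dtd#"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:RDF>
    <go:term rdf:about="go#GO:0003673">
      <go:accession>GO:0003673</go:accession>
      <go:name>Gene_Ontology</go:name>
    </go:term>
    <go:term rdf:about="go#GO:0003674">
      <go:accession>GO:0003674</go:accession>
      <go:name>molecular_function</go:name>
      <go:part-of rdf:resource="go#GO:0003673" />
    </go:term>
    <go:term rdf:about="go#GO:0016209">
      <go:accession>GO:0016209</go:accession>
      <go:name>antioxidant</go:name>
      <go:isa rdf:resource="go#GO:0003674" />
    </go:term>
  </rdf:RDF>
</go:go>
"""


def parent_links(xml_bytes):
    """Return (child accession, relationship, parent accession) triples."""
    root = ET.fromstring(xml_bytes)
    links = []
    for term in root.iter(f"{{{GO_NS}}}term"):
        child = term.findtext(f"{{{GO_NS}}}accession")
        for relationship in ("isa", "part-of"):
            for link in term.findall(f"{{{GO_NS}}}{relationship}"):
                resource = link.get(f"{{{RDF_NS}}}resource")
                # The resource looks like "go#GO:0003674"; keep the part after '#'.
                parent = resource.split("#", 1)[-1]
                links.append((child, relationship, parent))
    return links


links = parent_links(SAMPLE)
```

Because each term may carry several go:isa or go:part-of tags, the same child can appear in more than one triple, which is how multiple parentage is expressed.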


go_YYYYMM-schema-html

Designed for viewing with a web browser; does not contain full documentation.

Further documentation on the GO database can be found in the GO database guide. 


FASTA Format

There is a FASTA version of the gene products in the database available from the

database archives. 


Mappings to Other Classification Systems

Mappings of GO have been made to many other classification systems; a full list is

available on the Mappings to GO page. The syntax of these files is as follows:

The source of the external file is given in the line beginning !Uses: 

!Uses:http://www.tigr.org/docs/tigr-scripts/egad_scripts/role_reports.spl, 15 aug 2000. 

The line syntax for mappings is

external database:term identifier (id/name) > GO:GO term name ; GO:id

For example:

TIGR_role:11030 73 Amino acid biosynthesis Glutamate family > GO:glutamine family amino-acid biosynthesis ; GO:0009084

all on a single line. The relationship between terms from external systems to GO terms

can also be one to many, and these should just be added with a further >. For example:

MultiFun:1.5.1.18 Isoleucine/valine > GO:isoleucine biosynthesis ; GO:0009097 > GO:valine biosynthesis ; GO:0009099
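The line syntax above is simple enough to read with a short Python function. The sketch below is an illustration under stated assumptions rather than a reference parser: it assumes that external term names never contain a > character, it silently ignores any target that is not of the form GO:name ; GO:id, and the function name parse_mapping_line is hypothetical.

```python
import re

# One GO target: "GO:<term name> ; GO:<seven-digit id>".
TARGET = re.compile(r"GO:(?P<name>.+?)\s*;\s*(?P<id>GO:[0-9]{7})")


def parse_mapping_line(line):
    """Split one mapping line into the external term and its GO targets.

    Returns None for blank lines and for header lines beginning with '!'.
    """
    line = line.strip()
    if not line or line.startswith("!"):
        return None
    # Assumption: '>' only ever separates the external term from GO targets.
    external, *targets = [part.strip() for part in line.split(">")]
    database, _, external_term = external.partition(":")
    go_terms = []
    for target in targets:
        match = TARGET.fullmatch(target)
        if match:
            go_terms.append((match.group("name"), match.group("id")))
    return {
        "database": database,
        "external_term": external_term,
        "go_terms": go_terms,
    }


one_to_many = parse_mapping_line(
    "MultiFun:1.5.1.18 Isoleucine/valine > GO:isoleucine biosynthesis ; "
    "GO:0009097 > GO:valine biosynthesis ; GO:0009099"
)
```

Returning the GO targets as a list keeps one-to-one and one-to-many mappings in the same shape.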

If no equivalent GO term exists for a term from another classification system, GO:.

should be added as a mapping. For example: