What is an Ontology
• An ontology is a set of terms, relationships and definitions that capture the knowledge of a certain domain. (common ontology ≠ common knowledge)
• Terms represent a controlled vocabulary, and define the concepts of a domain.
• Terms are linked by relationships, which constitute a semantic network.
• Ontologies augment natural language annotations and can be more easily processed computationally. (An ontology becomes the language of the domain it describes, used for communication, coordination and collaboration.)
Why We Need Ontology in Bioinformatics
• Biologists need knowledge in order to perform their work.
  • Example: sequence comparison to infer function.
• Biologists need knowledge for communication, but such knowledge may be represented in different ways.
  • Different uses of the term "gene":
    • The coding region of DNA
    • A DNA fragment that can be transcribed and translated into a protein
    • A DNA region of biological interest that has a name and carries a genetic trait or phenotype
The Gene Ontology (GO)
• Provides structured vocabularies for describing gene products in the domain of molecular biology.
• Enables a common understanding across model organisms and between databases.
• Consists of three structurally unlinked hierarchies (molecular function, biological process and cellular component).
• Two types of relationships between terms:
• is-a: subclass.
• part-of: physical part of, or subprocess of.
Why Gene Ontology?
• Without structured vocabularies, different sources can refer to the same concept using different terms (e.g., cdc54 in yeast is MCM4 in mouse).
• What is a well-known shorthand in one research community is gibberish in another. Contributions by one research community may not be recognized by others.
• Without coordination, research work may be duplicated.
• The goal of the Gene Ontology Consortium is to produce a controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.
Three GO Hierarchies
• Molecular function: elemental activity/task (what)
(e.g., DNA-binding, polymerase, transcription factor) (what a gene does at the biochemical level)
• Biological process: goal or objective (why)
(e.g., mitosis, DNA replication, cell cycle control) (A broad biological perspective – not currently a pathway)
• Cellular component: location within cellular structures and macromolecular complex (where)
(e.g., nucleus, ribosome, pre-replication complex)
(Each GO hierarchy has a directed acyclic graph (DAG) structure: a child term may have more than one parent term.)
(Gene Ontology information can be accessed at http://www.geneontology.org/)
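The DAG structure can be illustrated with a minimal Python sketch. The two is_a parents of GO:0000001 are taken from the example term shown later in these slides; the shared grandparent `TOY:0000001` and the `ancestors` helper are hypothetical and exist only to show that a term reachable through several parents is visited once.

```python
from collections import deque

# child -> list of (relationship, parent).
# The GO:0000001 edges come from the example [Term] stanza in these slides;
# the TOY:0000001 node is a made-up placeholder, not a real GO term.
PARENTS = {
    "GO:0000001": [("is_a", "GO:0048308"), ("is_a", "GO:0048311")],
    "GO:0048308": [("is_a", "TOY:0000001")],  # hypothetical edge
    "GO:0048311": [("is_a", "TOY:0000001")],  # hypothetical edge
    "TOY:0000001": [],                        # toy root
}


def ancestors(term, parents=PARENTS, relations=("is_a", "part_of")):
    """Return every term reachable from `term` by following parent edges."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for relation, parent in parents.get(current, []):
            # Follow only the requested relationship types, each parent once.
            if relation in relations and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    print(sorted(ancestors("GO:0000001")))
```

Because a gene annotated with a term is implicitly annotated with all of that term's ancestors, a traversal like this is usually the first step before counting genes per term.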
Example: Gene Ontology Hierarchy
[Figure: part of the biological process hierarchy drawn as a graph. Each edge is labeled i (is-a) or P (part-of). Terms shown:]
• Biological process (GO:0008150)
• Behavior (GO:0007610)
• Cellular process (GO:0009987)
• Development (GO:0007275)
• Physiological (GO:0007582)
• Communication (GO:0007154)
• Cell growth (GO:0008151)
• Cell death (GO:0008219)
• Cell aging (GO:0007569)
• Programmed (GO:0012501)
• Apoptosis (GO:0006915)
• Induction (GO:0012502)
• Autophagic cell death (GO:0048102)
• HS response (GO:0009626)
Gene Annotation Using GO Terms
• Association of GO terms with gene products based on evidence from literature reference or computational analysis.
• The creation of GO and the association of GO terms with gene products (gene annotation) are two independent operations.
• A gene can be associated with one or more GO terms (gene categories), and one category normally has many genes (many-to-many relationship between genes and GO terms)
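The many-to-many relationship can be sketched in Python as two dictionaries that are inverses of each other. AAC1 and GO:0005743 come from the yeast annotation example later in these slides; `GENE_A`, `GENE_B` and the `invert` helper are hypothetical names used only for illustration.

```python
from collections import defaultdict

# gene -> set of GO term IDs (one gene may carry several terms).
gene_to_terms = {
    "AAC1": {"GO:0005743"},                  # from the yeast example in these slides
    "GENE_A": {"GO:0000001", "GO:0005743"},  # hypothetical gene
    "GENE_B": {"GO:0000001"},                # hypothetical gene
}


def invert(mapping):
    """Turn gene -> terms into term -> genes (one term may cover many genes)."""
    term_to_genes = defaultdict(set)
    for gene, terms in mapping.items():
        for term in terms:
            term_to_genes[term].add(gene)
    return dict(term_to_genes)


if __name__ == "__main__":
    for term, genes in sorted(invert(gene_to_terms).items()):
        print(term, sorted(genes))
```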
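Gene Product Associations to an Ontology
[Figure: gene products from mouse, fly and yeast are associated with ontology terms. Fields shown in the figure:]
• GO ID, DB ID, Evidence code, Reference Citation, NOT
• ID, Term, Definition, Ontology, Synonyms
• Is-a | Part-of, Node1 ID, Node2 ID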
Example: Part of Molecular Function
Example: Part of Biological Process
Example: Part of Cellular Component
Genes of a Biological Process Tend to Be Co-Regulated
[Figure: table with columns "Gene Names" and "Biological Process"]
Use Gene Ontology (GO) to Annotate Genes
• GO URL: http://www.geneontology.org/
• Two concepts:
• Gene Ontology: Provides structured vocabularies for describing gene products in the domain of molecular biology (all species share the same gene ontology)
• Annotations: Association of GO terms with gene products based on evidence from literature reference or computational analysis (each species has a separate annotation file)
The Gene Ontology (GO)
• GO file: http://www.geneontology.org/ontology/gene_ontology.obo
• An example of a GO term (the notes in parentheses are explanations, not part of the file):
[Term]
id: GO:0000001 (a unique ID for the GO term)
name: mitochondrion inheritance (the name of the GO term)
namespace: biological_process (the GO hierarchy the term belongs to)
def: "The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton." [PMID:10873824, PMID:11389764, SGD:mcc] (a detailed description of the GO term)
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution
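A stanza like the one above can be read with a few lines of Python. This is a minimal sketch that handles only the tags shown on this slide (`id`, `name`, `namespace`, `def`, `is_a`); the `parse_term_stanza` helper is a hypothetical name, and a real OBO file has more tags and edge cases than this covers.

```python
STANZA = '''
[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
def: "The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton." [PMID:10873824, PMID:11389764, SGD:mcc]
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution
'''


def parse_term_stanza(text):
    """Parse one [Term] stanza into a dict; is_a values are collected in a list."""
    term = {"is_a": []}
    for line in text.strip().splitlines():
        line = line.strip()
        if not line or line == "[Term]":
            continue
        # Split at the first ": " so colons inside GO IDs are left alone.
        tag, _, value = line.partition(": ")
        if tag == "is_a":
            # Keep only the parent ID; the text after "!" is a readable comment.
            term["is_a"].append(value.split("!")[0].strip())
        else:
            term[tag] = value
    return term


if __name__ == "__main__":
    parsed = parse_term_stanza(STANZA)
    print(parsed["id"], parsed["name"], parsed["is_a"])
```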
Gene Annotation Using GO Terms
• http://www.geneontology.org/GO.current.annotations.shtml
• Select the annotation file for a particular species
• An example of an annotation entry for yeast
SGD   S000004660   AAC1   GO:0005743   SGD_REF:S000050955|PMID:2167309   TAS   C   ADP/ATP translocator   YMR056C   gene   taxon:4932
“AAC1” is the gene name
“GO:0005743” is the GO ID; it links to the corresponding entry in the ontology file
“SGD_REF:S000050955|PMID:2167309” is where this annotation comes from
“TAS” is the evidence code (traceable author statement)
“C” means this annotation belongs to the “cellular component” namespace
“ADP/ATP translocator” is a brief description of this annotation
“YMR056C” is another name for this gene
“taxon:4932” means this is a yeast gene
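An annotation entry like this can be split into named fields in Python. The sketch below assumes the file is tab-separated with the columns in the standard GO annotation file order, and that the qualifier and with/from columns are empty for this entry; the column names in `GAF_COLUMNS` and the `parse_annotation` helper are labels chosen here for illustration.

```python
# Assumed column order for the first 13 fields of a GO annotation file line.
GAF_COLUMNS = [
    "db", "db_object_id", "symbol", "qualifier", "go_id", "reference",
    "evidence", "with_from", "aspect", "name", "synonym", "object_type", "taxon",
]

# One-letter aspect codes and the GO hierarchy each one stands for.
ASPECTS = {"F": "molecular_function", "P": "biological_process", "C": "cellular_component"}

# The yeast entry from the slide, rebuilt with tabs and two empty columns.
LINE = (
    "SGD\tS000004660\tAAC1\t\tGO:0005743\tSGD_REF:S000050955|PMID:2167309\t"
    "TAS\t\tC\tADP/ATP translocator\tYMR056C\tgene\ttaxon:4932"
)


def parse_annotation(line):
    """Split one annotation line on tabs and label the first 13 fields."""
    fields = line.rstrip("\n").split("\t")
    # zip stops at the shorter sequence, so any trailing columns are ignored.
    return dict(zip(GAF_COLUMNS, fields))


if __name__ == "__main__":
    entry = parse_annotation(LINE)
    print(entry["symbol"], entry["go_id"], entry["evidence"], ASPECTS[entry["aspect"]])
```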
Gene Annotation Using GO Terms
Given a list of genes L from a specific species Sj:
1) go to http://www.geneontology.org/GO.current.annotations.shtml
2) select and download the annotation file Fj for Sj
For each gene Gi in list L:
3) find the annotation entry Ek for Gi in Fj
4) find the GO term ID from entry Ek
5) go to http://www.geneontology.org/ontology/gene_ontology.obo
6) find the GO term in the ontology file; the GO term provides a more detailed annotation for this gene
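Steps 3 to 6 can be sketched in Python once both files have been loaded. The sketch assumes the annotation file has already been reduced to (gene symbol, GO ID) pairs and the ontology file to a GO ID to name dictionary; `annotate_genes`, `GENE_X` and the toy data are hypothetical, with only the AAC1 entry and the GO:0000001 term taken from these slides.

```python
# Toy stand-ins for the downloaded files.
ANNOTATIONS = [
    ("AAC1", "GO:0005743"),    # from the yeast annotation example
    ("GENE_X", "GO:0000001"),  # hypothetical gene
]
ONTOLOGY = {
    "GO:0000001": "mitochondrion inheritance",  # from the example [Term] stanza
}


def annotate_genes(genes, annotations, ontology):
    """For each gene, list its (GO ID, term name) pairs."""
    result = {}
    for gene in genes:
        # Steps 3-4: find the gene's annotation entries and their GO IDs.
        go_ids = [go_id for symbol, go_id in annotations if symbol == gene]
        # Steps 5-6: look each GO ID up in the ontology.
        result[gene] = [(go_id, ontology.get(go_id, "<not in toy ontology>")) for go_id in go_ids]
    return result


if __name__ == "__main__":
    for gene, terms in annotate_genes(["AAC1", "GENE_X", "GENE_Y"], ANNOTATIONS, ONTOLOGY).items():
        print(gene, terms)
```

A gene may have no entry at all (here `GENE_Y`), in which case its list is simply empty.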
Use of GO to Annotate Genes
Notations
Total number of genes in the data set : N
Total number of genes assigned to term T: M
Number of genes in the list: n
Number of genes in the list and assigned to term T: m
Problem: Given a list of n genes, are they significantly associated with a specific GO term T?
Solution: Calculate the p-Value.
How to Assess Overrepresentation of a GO Term?
Genes on an array:
Total number of genes (N): 2,285
Number of genes – cell cycle (M): 161
Genes in a cluster:
Number of genes in the cluster (n): 147
Number of genes – cell cycle (m): 25
Is the GO term (i.e., cell cycle) significantly overrepresented in the cluster?
Hyper-geometric Distribution
Given that M of the N genes in the data set are associated with term T, if n genes are drawn at random from the data set, what is the probability that exactly m of the n selected genes are associated with T?

$$\Pr(m \mid N, M, n) = \frac{\binom{M}{m}\binom{N-M}{n-m}}{\binom{N}{n}}$$
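The formula can be evaluated directly with Python's exact integer binomials. This is a minimal sketch; the function name `hypergeom_pmf` and the small parameter values in the check are chosen here for illustration and are not from the slides.

```python
from math import comb


def hypergeom_pmf(m, N, M, n):
    """Pr(m | N, M, n): probability that exactly m of n drawn genes carry term T."""
    # Outside the feasible range the probability is zero.
    if m < max(0, n - (N - M)) or m > min(M, n):
        return 0.0
    return comb(M, m) * comb(N - M, n - m) / comb(N, n)


if __name__ == "__main__":
    N, M, n = 20, 7, 5  # small illustrative values
    probs = [hypergeom_pmf(m, N, M, n) for m in range(n + 1)]
    # The probabilities over all feasible m add up to 1,
    # and the mean of the distribution is n * M / N.
    print(sum(probs))
    print(sum(m * p for m, p in enumerate(probs)), n * M / N)
```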
P-Value
Based on the hypergeometric distribution, the probability that a random list of n genes contains m or more genes associated with T is obtained by summing the probabilities of the list containing m, m+1, …, min(M, n) such genes. The p-value of over-representation is therefore:

$$p = \sum_{i=m}^{\min(M,n)} \frac{\binom{M}{i}\binom{N-M}{n-i}}{\binom{N}{n}}$$
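As a rough sketch, the p-value can be computed in Python for the cell cycle example from the earlier slide (N = 2,285, M = 161, n = 147, m = 25). The function name `enrichment_p_value` is hypothetical; the sum is done in exact integer arithmetic and divided once at the end.

```python
from math import comb


def enrichment_p_value(m, N, M, n):
    """Probability of seeing m or more term-T genes in a random list of n genes."""
    upper = min(M, n)
    # Sum the numerators for i = m .. min(M, n) exactly, then divide once.
    numerator = sum(comb(M, i) * comb(N - M, n - i) for i in range(m, upper + 1))
    return numerator / comb(N, n)


if __name__ == "__main__":
    # Numbers from the cell cycle cluster example in these slides.
    N, M, n, m = 2285, 161, 147, 25
    print("expected by chance:", n * M / N)
    print("p-value:", enrichment_p_value(m, N, M, n))
```

With about 10 cell cycle genes expected by chance and 25 observed, the p-value comes out very small, which is what "significantly overrepresented" means here. When many GO terms are tested at once, a multiple-testing correction is normally applied on top of this.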
MAPPFinder
• A tool for mapping gene expression data to the GO hierarchies.
• Part of the free software package GenMAPP.
• Available at http://www.genmapp.org/.
(Doniger et al., 2003)
MAPPFinder Sample Output
(Doniger et al., 2003)
GoMiner
(Zeeberg et al., 2003)
• A client-server application using Java (data on the server side).
• Available at http://discover.nci.nih.gov/gominer/.
Onto-Express
• A web application for GO-based microarray data analysis (http://vortex.cs.wayne.edu/Projects.html).
• The input to Onto-Express is a list of Affymetrix probe IDs, GenBank sequence accessions or UniGene cluster IDs.
• Part of the integrated Onto-Tools, including:
– Onto-Compare: compares commercial arrays.
– Onto-Design: helps with array design (probe selection).
– Onto-Translate: provides mappings between different IDs.
[Figure: sample output table with columns p, GO and # genes, for genes linked to poor breast cancer outcome]