Other biological databases and ontologies

Upload: marja

Post on 09-Jan-2016


TRANSCRIPT

Page 1: Other biological databases and ontologies

Other biological databases and ontologies

Page 2: Other biological databases and ontologies

Biological systems

• Taxonomic data

• Literature

• Protein folding and 3D structure

• Small molecules

• Pathways and networks

• Protein families and domains

• Whole genome data

• Sequence data

• Ontologies - GO

Page 3: Other biological databases and ontologies

Ontologies

• An ontology is a formal specification of terms and the relationships between them - widely used in biology and bioinformatics (e.g. taxonomy)

• The relationships are important and are represented as graphs

• Ontology terms should have definitions

• Ontologies are machine-readable

• They are needed for ordering and comparing large data sets

Page 4: Other biological databases and ontologies

What’s in a name?

• What is a cell?

Page 5: Other biological databases and ontologies

What’s in a name?

• What is a cell?

Page 6: Other biological databases and ontologies

Ambiguities in naming

• The same concept can be described using different names, e.g.:
– Glucose synthesis
– Glucose biosynthesis
– Glucose formation
– Glucose anabolism
– Gluconeogenesis

• All refer to the process of making glucose

• This makes it difficult to compare the information

• Solution: use ontologies and data standards
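
One way to picture the proposed solution is a lookup that maps every free-text name onto a single controlled identifier. The following is a minimal Python sketch, not part of any GO tool: the synonym table is hand-written for illustration and the `normalise` helper is hypothetical. GO:0006094 is the GO identifier for gluconeogenesis.

```python
# Hypothetical synonym table: many free-text names, one controlled term.
SYNONYMS = {
    "glucose synthesis": "GO:0006094",
    "glucose biosynthesis": "GO:0006094",
    "glucose formation": "GO:0006094",
    "glucose anabolism": "GO:0006094",
    "gluconeogenesis": "GO:0006094",
}


def normalise(name):
    """Return the controlled term ID for a free-text name, or None if unknown."""
    return SYNONYMS.get(name.strip().lower())


# Three differently worded labels collapse onto one identifier,
# so records that use them become directly comparable.
labels = ["Glucose synthesis", "gluconeogenesis ", "Glucose anabolism"]
ids = {normalise(label) for label in labels}
print(ids)
```

Once every record carries the identifier rather than the free-text name, comparison becomes a plain equality check.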

Page 7: Other biological databases and ontologies

Gene Ontology (GO)

http://www.geneontology.org

• Controlled vocabulary/ontology

• Introduced to provide a standardised way of annotating gene products (http://www.geneontology.org)

• Used for functional annotation of genes or proteins

Page 8: Other biological databases and ontologies

GO ontologies

• Molecular function:
– tasks performed by a gene product - e.g. G-protein coupled receptor

• Biological process:
– broad biological goals accomplished by one or more gene products - e.g. G-protein signaling pathway

• Cellular component:
– part(s) of a cell of which a gene product is a component; includes the extracellular environment of cells - e.g. nucleus, membrane etc.

Page 9: Other biological databases and ontologies

GO term examples

• GO terms are arranged in a directed acyclic graph (DAG)

• Relationships between terms (e.g. is_a, part_of)
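
Because the terms form a DAG rather than a tree, a term can have more than one parent, and annotating to a term implies all of its ancestors. A minimal Python sketch of that upward walk follows; the term names and edges are illustrative only and are not an extract of the real GO graph.

```python
# Toy DAG: child term -> list of (relationship, parent term).
# Term names and edges are illustrative, not taken from the real GO graph.
PARENTS = {
    "biological process": [],
    "cell communication": [("is_a", "biological process")],
    "signal transduction": [("part_of", "cell communication")],
    "response to stimulus": [("is_a", "biological process")],
    "GPCR signaling pathway": [
        ("is_a", "signal transduction"),
        ("part_of", "response to stimulus"),
    ],
}


def ancestors(term):
    """Collect every term reachable by following parent links upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relationship, parent in PARENTS[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("GPCR signaling pathway")))
```

The `seen` set matters here: in a DAG the same ancestor can be reached along several paths, and it should be counted only once.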

Page 10: Other biological databases and ontologies

How to annotate to GO

• See if the gene product is annotated already, e.g. by a model organism database (MOD) or GOA

• Manual annotation - needs evidence codes

• Blast2GO

• Using GO mapping files (e.g. InterPro, EC, Swiss-Prot keyword)

Page 11: Other biological databases and ontologies

Multiple GO terms

Process mappings:

- Cell communication (IPR2GO)

- GPCR pathways (SPKW2GO)

- GPCR pathways (IDA)

Select the most manual annotation first, then the most specific
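
The selection rule on this slide can be sketched as a two-part sort key. This is a minimal Python illustration under stated assumptions: IDA is treated as a manually assigned evidence code and IPR2GO and SPKW2GO as electronic mappings, and the depth values are made up to stand in for how specific each term is.

```python
# Each candidate: (GO term name, source of the annotation, depth in the DAG).
# Depth values are invented for illustration; deeper means more specific.
candidates = [
    ("cell communication", "IPR2GO", 2),
    ("GPCR pathways", "SPKW2GO", 5),
    ("GPCR pathways", "IDA", 5),
]

# Assumption: IDA is a manual evidence code, while IPR2GO and SPKW2GO
# are electronic mappings.
MANUAL_SOURCES = {"IDA"}


def rank(candidate):
    """Sort key: manual annotations first, then greater depth."""
    _term, source, depth = candidate
    # False sorts before True, so manual sources come first.
    return (source not in MANUAL_SOURCES, -depth)


best = min(candidates, key=rank)
print(best)
```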

Page 12: Other biological databases and ontologies

Finding existing GO annotation

• Small-scale - QuickGO or AmiGO browsers

• Large-scale:
– GOA FTP site
  • GOA proteomes (>25% coverage)
  • GOA human, mouse, rat, cow, zebrafish, Arabidopsis, etc.
  • GOA UniProt
– Proteome Analysis
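
For the large-scale route, annotation files are usually processed with a short script. The sketch below assumes the files use the tab-delimited GAF layout (column 2 = object ID, column 5 = GO ID, column 7 = evidence code, column 9 = aspect); the two records are made up and truncated to nine columns, and `read_annotations` is a hypothetical helper name.

```python
import csv
import io

# Two made-up records in the assumed tab-delimited GAF layout.
# Lines starting with "!" are comments.
SAMPLE = (
    "!gaf-version: 2.2\n"
    "UniProtKB\tX00001\tGENE1\t\tGO:0005634\tPMID:0\tIDA\t\tC\n"
    "UniProtKB\tX00001\tGENE1\t\tGO:0006094\tGO_REF:0\tIEA\t\tP\n"
)


def read_annotations(handle):
    """Yield (object_id, go_id, evidence, aspect) for each annotation line."""
    rows = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in rows:
        if not row or row[0].startswith("!"):
            continue
        yield row[1], row[4], row[6], row[8]


annotations = list(read_annotations(io.StringIO(SAMPLE)))

# Keep everything except IEA (inferred from electronic annotation).
manual = [a for a in annotations if a[2] != "IEA"]
print(manual)
```

With a real download, `io.StringIO(SAMPLE)` would be replaced by an open file handle.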

Page 13: Other biological databases and ontologies

Searching GOA in QuickGO

• http://www.ebi.ac.uk/ego

Page 14: Other biological databases and ontologies

Uses of GO annotation

Analysis of high-throughput data

• Microarray data analysis

• Proteomics data analysis

• GO classification

Larkin JE et al, Physiol Genomics, 2004

Cunliffe HE et al, Cancer Res, 2003

Page 15: Other biological databases and ontologies

Open Biomedical Ontologies (OBO)

http://obo.sourceforge.net

• Central web location for accessing well-structured controlled vocabularies (CVs) and ontologies for use in the biological and medical sciences.

• Provides a simple format for ontologies that encodes terms, relationships between terms and definitions of terms (not all OBO ontologies use this format, however).
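
To give a feel for that format, here is a minimal Python sketch that reads `[Term]` stanzas with `id`, `name`, `def` and `is_a` tags. The two terms use a made-up `EX:` prefix, the parser is hypothetical, and it ignores most of what a full OBO reader would have to handle (escapes, other stanza types, header tags).

```python
SAMPLE = """\
[Term]
id: EX:0000001
name: parent term
def: "An illustrative root term." []

[Term]
id: EX:0000002
name: child term
def: "An illustrative child term." []
is_a: EX:0000001 ! parent term
"""


def parse_terms(text):
    """Parse [Term] stanzas into a dict keyed by term ID."""
    terms = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("["):
            # Start a new term, or skip stanzas of any other type.
            current = {"is_a": []} if line == "[Term]" else None
            continue
        if current is None or ": " not in line:
            continue
        tag, value = line.split(": ", 1)
        value = value.split(" ! ")[0]  # drop the trailing "! comment"
        if tag == "is_a":
            current["is_a"].append(value)
        else:
            current[tag] = value
        if tag == "id":
            terms[value] = current
    return terms


terms = parse_terms(SAMPLE)
print(terms["EX:0000002"]["name"], "is_a", terms["EX:0000002"]["is_a"])
```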

Page 16: Other biological databases and ontologies

Scope of OBO

• Anatomy
• Animal natural history and life history
• Chemical
• Development
• Ethology
• Evidence codes
• Experimental conditions
• Genomic and proteomic
• Metabolomics
• OBO relationship types
• Phenotype
• Taxonomic classification
• Vocabularies

Page 17: Other biological databases and ontologies

Other Biological Databases

• Transcription factor binding sites - TRANSFAC

• Protein structure databases - PDB, SCOP, CATH

• Protein family databases - Pfam, PRINTS, PROSITE etc.

• Chemicals and small molecules - ChEBI

• Gene expression databases - GEO, ArrayExpress

• Metabolic pathways - Reactome, KEGG

• Genome databases - Ensembl, FlyBase, WormBase etc.

Page 18: Other biological databases and ontologies

Transcription factor binding sites

• TRANSFAC - database of eukaryotic transcription factors: http://www.gene-regulation.com/pub/databases.html#transfac

• TESS - Transcription Element Search System - for predicting transcription factor binding sites, uses TRANSFAC: http://www.cbi.upenn.edu/tess

• TFsearch - for searching transcription factor binding sites: http://www.cbrc.jp/research/db/TFSEARCH.html

Page 19: Other biological databases and ontologies

Protein structure databases

• Main resource is Protein Data Bank (PDB): http://www.rcsb.org/pdb/

• Repository for solved structures

• Can search by PDB code

• Structural family databases based on PDB - SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/) and CATH (http://www.biochem.ucl.ac.uk/bsm/cath/)

• Predicted structures in SWISS-MODEL (http://swissmodel.expasy.org//SWISS-MODEL.html)

Page 20: Other biological databases and ontologies

Searching MSD

http://www.ebi.ac.uk/msd - Search by PDB code

Page 21: Other biological databases and ontologies

Link to CATH

Page 22: Other biological databases and ontologies

Protein family databases

• Databases that produce signatures for identifying protein families or domains

• Used for functional classification of proteins

• E.g. Pfam, PROSITE, PRINTS, SMART, TIGRFAMs etc.

• Integrated into a single resource, InterPro (http://www.ebi.ac.uk/interpro)
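
As an illustration of what a signature can look like, the sketch below translates a PROSITE-style pattern into a Python regular expression and scans a sequence with it. It is a simplified, hypothetical converter and not how InterPro or its member databases run their searches: it covers only the basic syntax (`x`, `[...]`, `{...}`, `(n)` or `(n,m)` repeats, `<` and `>` anchors), and the test sequence is invented. The pattern `N-{P}-[ST]-{P}` is the PROSITE N-glycosylation site pattern.

```python
import re


def prosite_to_regex(pattern):
    """Translate a basic PROSITE-style pattern into a regular expression."""
    pattern = pattern.rstrip(".")
    parts = []
    for element in pattern.split("-"):
        # {P} means "any residue except P".
        element = element.replace("{", "[^").replace("}", "]")
        # (2,4) is a repeat count, written {2,4} in a regex.
        element = element.replace("(", "{").replace(")", "}")
        # Lowercase x stands for any residue.
        element = element.replace("x", ".")
        # < and > anchor the pattern to the N- and C-terminus.
        element = element.replace("<", "^").replace(">", "$")
        parts.append(element)
    return "".join(parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}.")
match = re.search(regex, "MANGSAK")  # invented test sequence
print(regex, match.group() if match else None)
```

Profile- and HMM-based signatures, such as those in Pfam, are statistical models and cannot be reduced to a regular expression in this way.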

Page 23: Other biological databases and ontologies

InterProScan sequence search

Stand-alone version available

Page 24: Other biological databases and ontologies

Results for protein acc

Page 25: Other biological databases and ontologies

Example InterPro entry

Page 26: Other biological databases and ontologies

Chemicals and small molecules

• Chemical abstracts - http://www.cas.org/

• ChEBI - http://www.ebi.ac.uk/chebi

• KEGG - part of it includes chemicals: http://www.genome.jp/kegg

• ChemIDplus - chemicals cited in NLM databases: http://chem2.sis.nlm.nih.gov/chemidplus/chemidlite.jsp

• MSD-Chem - ligands and chemicals in MSD

Page 27: Other biological databases and ontologies

ChEBI example entry

Page 28: Other biological databases and ontologies

Hierarchy for chemicals

Page 29: Other biological databases and ontologies

Gene expression databases

• NCBI Gene Expression Omnibus (GEO) http://www.ncbi.nlm.nih.gov/geo/

• ArrayExpress http://www.ebi.ac.uk/arrayexpress

• Stanford Microarray Database http://genome-www5.stanford.edu/

• Can usually search for experiments or particular expression profiles

Page 30: Other biological databases and ontologies

GEO search page

Page 31: Other biological databases and ontologies

Profiles search results

Page 32: Other biological databases and ontologies

Specific entry and experiment info

Page 33: Other biological databases and ontologies

ArrayExpress search results

Page 34: Other biological databases and ontologies

Metabolic Pathways

• PATHGUIDE - lists >200 pathway resources

• KEGG (Kyoto Encyclopedia of Genes and Genomes): http://www.genome.jp/kegg - includes:
– Database of chemicals, genes and networks (metabolic, regulatory etc.)
– Well-curated and quite specific

• EcoCyc (Encyclopedia of E. coli K12 genes and metabolism): http://ecocyc.org - curation of the entire genome

• Reactome - curated biological pathways: http://www.reactome.org/

• GenMAPP - pathways contributed by users

Page 35: Other biological databases and ontologies

Pathway in Reactome

Page 36: Other biological databases and ontologies

Example of a pathway in BioCyc

Page 37: Other biological databases and ontologies

Protein-protein interaction databases

• Protein-protein interaction databases store pairwise interactions or complexes

• IntAct http://www.ebi.ac.uk/intact

• DIP (Database of Interacting Proteins) http://dip.doe-mbi.ucla.edu/

• BIND (Biomolecular Interaction Network Database) http://submit.bind.ca:8080/bind/

Page 38: Other biological databases and ontologies

Protein-protein interactions

Page 39: Other biological databases and ontologies

Genome browsers

• Integrate sequence & functional data for a genome

• Ensembl - genome browser for major eukaryotic genomes, e.g. human, mouse etc.: http://www.ensembl.org

• UCSC browser - http://genome.ucsc.edu/

• FlyBase - Drosophila genome database: http://www.ebi.ac.uk/flybase

• WormBase - C. elegans: http://www.wormbase.org

• PlasmoDB - Plasmodium (malaria): http://plasmodb.org

• Etc.

Page 40: Other biological databases and ontologies

Ensembl genome browser

Page 41: Other biological databases and ontologies

Ensembl gene view 1

Page 42: Other biological databases and ontologies

Ensembl gene view 2

Page 43: Other biological databases and ontologies

Gene within context on chromosome