Other biological databases

Post on 27-Jan-2016

[Figure: Biological systems and related data types (taxonomic data; literature; protein folding and 3D structure; small molecules; pathways and networks; protein families and domains; whole genome data; sequence data; ontologies, e.g. GO)]

Other Biological Databases

• Transcription factor binding sites – TRANSFAC

• Protein structure databases – PDB, SCOP, CATH

• Protein family databases – Pfam, Prints, PROSITE etc.

• Chemicals and small molecules – ChEBI

• Gene expression databases – GEO, ArrayExpress

• Metabolic pathways – Reactome, KEGG

• Genome databases – Ensembl, FlyBase, WormBase etc.

• Human genetics-related databases – HapMap, dbSNP

Transcription factor binding sites

• TRANSFAC – database of eukaryotic transcription factors: http://www.gene-regulation.com/pub/databases.html#transfac

• TESS – Transcription Element Search System, for predicting transcription factor binding sites; uses TRANSFAC: http://www.cbi.upenn.edu/tess

• TFsearch – for searching transcription factor binding sites: http://www.cbrc.jp/research/db/TFSEARCH.html

Protein structure databases

• Main resource is Protein Data Bank (PDB): http://www.rcsb.org/pdb/

• Contains the spatial coordinates of macromolecule atoms whose 3D structure has been obtained by X-ray or NMR studies

• Proteins represent more than 90% of available structures (others are DNA, RNA, sugars, viruses, protein/DNA complexes…)

• Can search by PDB code
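To make "spatial coordinates of macromolecule atoms" concrete, here is a minimal sketch of reading ATOM records from PDB-format text. It relies on the fixed column layout of the PDB flat-file format. The helper `format_atom`, the `Atom` type and the coordinates are illustrative assumptions, not data from a real entry.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    residue: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def format_atom(serial, name, residue, chain, res_seq, x, y, z):
    """Build a fixed-column ATOM record (hypothetical helper for this demo)."""
    return (f"ATOM  {serial:>5} {name:<4} {residue:>3} {chain}{res_seq:>4}"
            f"    {x:8.3f}{y:8.3f}{z:8.3f}")


def parse_atoms(pdb_text):
    """Extract ATOM records using the fixed column positions of the format."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip HEADER, HETATM, TER and other record types
        atoms.append(Atom(
            serial=int(line[6:11]),
            name=line[12:16].strip(),
            residue=line[17:20].strip(),
            chain=line[21],
            res_seq=int(line[22:26]),
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
        ))
    return atoms


# Made-up coordinates for two atoms of one residue; not from a real structure.
sample = "\n".join([
    "HEADER    DEMO STRUCTURE",
    format_atom(1, " N", "MET", "A", 1, 27.340, 24.430, 2.614),
    format_atom(2, " CA", "MET", "A", 1, 26.266, 25.413, 2.842),
])

atoms = parse_atoms(sample)
for atom in atoms:
    print(atom.name, atom.residue, atom.chain, atom.x, atom.y, atom.z)
```

Slicing by column rather than splitting on whitespace matters because adjacent fields in real files can touch when values are wide.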

Searching MSD

http://www.ebi.ac.uk/msd – search by PDB code

Protein structure-related databases

• Structural family databases based on PDB – SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/) and CATH (http://www.biochem.ucl.ac.uk/bsm/cath/)

• Predicted structures in SWISS-MODEL (http://swissmodel.expasy.org//SWISS-MODEL.html)

Protein family databases

• Databases that produce signatures for identifying protein families or domains

• Used for functional classification of proteins

• E.g. Pfam, PROSITE, Prints, SMART, TIGRFAMs etc.

• Integrated into single resource InterPro (http://www.ebi.ac.uk/interpro)
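As an illustration of what a "signature" can look like, the sketch below translates a PROSITE-style pattern into a regular expression and scans a sequence with it. It covers only the basic pattern syntax (single residues, `x`, `[...]`, `{...}`, repeat counts and the `<`/`>` anchors). The function names and the test sequence are assumptions made for this example. Profile- or HMM-based signatures, such as those in Pfam, work differently.

```python
import re

_ELEMENT = re.compile(
    r"(<?)"                              # optional N-terminal anchor
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})"   # any / one residue / allowed set / forbidden set
    r"(?:\((\d+)(?:,(\d+))?\))?"         # optional repeat: (n) or (n,m)
    r"(>?)"                              # optional C-terminal anchor
)


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    pieces = []
    for element in pattern.rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, low, high, end = match.groups()
        if core == "x":
            core = "."                          # x matches any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # {P} means "anything but P"
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        pieces.append(("^" if start else "") + core + repeat + ("$" if end else ""))
    return "".join(pieces)


def find_sites(pattern, sequence):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() for m in regex.finditer(sequence)]


# N-glycosylation site pattern; the sequence is invented for the demo.
print(prosite_to_regex("N-{P}-[ST]-{P}"))
print(find_sites("N-{P}-[ST]-{P}", "MKNGSAANPTQ"))
```

The lookahead wrapper in `find_sites` lets overlapping sites be reported, which a plain `finditer` would skip.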

InterProScan sequence search

Stand-alone version available

InterPro text search

Search keyword, protein acc or InterPro acc

Results for protein acc

Example InterPro entry

Chemicals and small molecules

• Chemical Abstracts – http://www.cas.org/

• ChEBI – http://www.ebi.ac.uk/chebi

• KEGG – part of it includes chemicals: http://www.genome.jp/kegg

• ChemIDplus – chemicals cited in NLM databases: http://chem2.sis.nlm.nih.gov/chemidplus/chemidlite.jsp

• MSD-Chem – ligands and chemicals in MSD

ChEBI example entry

Hierarchy for chemicals

Gene expression databases

• NCBI Gene Expression Omnibus (GEO) http://www.ncbi.nlm.nih.gov/geo/

• ArrayExpress http://www.ebi.ac.uk/arrayexpress/

• Stanford microarray database http://genome-www5.stanford.edu/

• Can usually search for experiments or particular expression profiles

GEO search page

Profiles search results

Specific entry and experiment info

ArrayExpress search results

What does the data look like?

• Info on experiment, array used, etc.

• Raw or processed tab-delimited file containing spots and their intensities (Cy3/Cy5 ratios) across different samples

• Files with metadata, e.g. sample info, annotation and coordinates of each spot on the array
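A small sketch of working with such a tab-delimited file: it reads per-spot Cy5 and Cy3 intensities and computes a log2 ratio for each gene. The column names (`spot_id`, `gene`, `cy5`, `cy3`) and the values are assumptions for illustration. Real exports differ between platforms and usually need background correction and normalisation first.

```python
import csv
import io
import math

# Hypothetical two-channel export: one row per spot, one intensity per dye.
raw = "\n".join([
    "spot_id\tgene\tcy5\tcy3",
    "1\tgeneA\t800\t200",
    "2\tgeneB\t150\t600",
    "3\tgeneC\t500\t500",
])


def log_ratios(text):
    """Return {gene: log2(Cy5 / Cy3)} from tab-delimited text with a header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {
        row["gene"]: math.log2(float(row["cy5"]) / float(row["cy3"]))
        for row in reader
    }


ratios = log_ratios(raw)
for gene, value in ratios.items():
    print(f"{gene}\t{value:+.2f}")
```

A positive value means the Cy5-labelled sample has the stronger signal for that spot, a negative value the Cy3-labelled one, and zero means no difference.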

Proteomics: SWISS-2DPAGE

Enzymes and metabolic pathways

• Contain information describing enzymes, biochemical reactions and metabolic pathways

• ENZYME and BRENDA: nomenclature databases that store information on enzyme names and reactions

• IntEnz: Integrated relational Enzyme database

Enzyme nomenclature

• E.C. (Enzyme Commission) numbers assigned based on reactions they catalyze

• Hierarchy, high-level groups:

– EC 1 – Oxidoreductases

– EC 2 – Transferases

– EC 3 – Hydrolases

– EC 4 – Lyases

– EC 5 – Isomerases

– EC 6 – Ligases
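The hierarchy can be read straight off an EC number, since its first field is the top-level class. The sketch below splits a four-level EC number and looks up the class from the six groups listed above. The function names are illustrative, and partial numbers with `-` placeholders are accepted as an assumption about typical usage.

```python
# Top-level classes as listed on the slide.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def split_ec(ec_number):
    """Split 'EC 1.1.1.1' (prefix optional) into its four levels as strings."""
    text = ec_number.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    levels = text.split(".")
    if len(levels) != 4:
        raise ValueError(f"expected four levels, got {ec_number!r}")
    return levels


def ec_class(ec_number):
    """Return the name of the top-level class for an EC number."""
    top = int(split_ec(ec_number)[0])
    if top not in EC_CLASSES:
        raise ValueError(f"unknown top-level class: {top}")
    return EC_CLASSES[top]


print(ec_class("EC 1.1.1.1"))   # alcohol dehydrogenase
print(ec_class("3.4.-.-"))      # partially specified number
```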

EC example

Metabolic pathway databases

• PATHGUIDE – index of >200 pathway resources

• KEGG (Kyoto Encyclopedia of Genes and Genomes): http://www.genome.jp/kegg – includes:

– Database of chemicals, genes and networks (metabolic, regulatory etc.)

– Well-curated and quite specific

• EcoCyc (Encyclopedia of E. coli K12 genes and metabolism): http://ecocyc.org – curation of the entire genome

• Reactome – curated biological pathways: http://www.reactome.org/

• GenMAPP – pathways contributed by users

http://www.genome.ad.jp/kegg

Different pathways in different species -> comparison

Pathway in Reactome

Example of a pathway in BioCyc

Protein-protein interaction databases

• Protein-protein interaction databases store pairwise interactions or complexes

• Can get 1 to more than 20,000 interactions per publication

• IntAct http://www.ebi.ac.uk/intact

• DIP (Database of Interacting Proteins) http://dip.doe-mbi.ucla.edu/

• BIND (Biomolecular Interaction Network Database) http://submit.bind.ca:8080/bind/
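Pairwise interactions map naturally onto an undirected graph. The sketch below builds an adjacency map from a list of pairs and ranks proteins by their number of distinct partners. The protein identifiers are invented placeholders, not records from IntAct, DIP or BIND.

```python
from collections import defaultdict


def build_network(pairs):
    """Build an undirected adjacency map {protein: set of partners}."""
    network = defaultdict(set)
    for a, b in pairs:
        network[a].add(b)
        network[b].add(a)
    return network


# Hypothetical pairwise interactions; the last pair repeats the first in
# reverse order, as can happen when two publications report the same pair.
pairs = [
    ("P1", "P2"),
    ("P1", "P3"),
    ("P2", "P3"),
    ("P3", "P4"),
    ("P2", "P1"),
]

network = build_network(pairs)
degrees = {protein: len(partners) for protein, partners in network.items()}

# Rank by number of partners, then by name for a stable order.
for protein in sorted(degrees, key=lambda p: (-degrees[p], p)):
    print(protein, degrees[protein], sorted(network[protein]))
```

Using sets means a pair reported more than once, in either order, is counted only once.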

Protein-protein interactions in IntAct

Integrated functional interactions in STRING

Genome browsers

• Integrate sequence & functional data for a genome

• Ensembl – genome browser for major eukaryotic genomes, e.g. human, mouse etc.: http://www.ensembl.org

• UCSC browser – http://genome.ucsc.edu/

• FlyBase – Drosophila genome database: http://www.ebi.ac.uk/flybase

• WormBase – C. elegans: http://www.wormbase.org

• PlasmoDB – Plasmodium (malaria): http://plasmodb.org

• Etc.

Ensembl genome browser

Ensembl gene view 1

Ensembl gene view 2

Gene within context on chromosome

Human genetics databases

• GeneCards (http://www.genecards.org/)

• HapMap (http://hapmap.ncbi.nlm.nih.gov/)

• OMIM http://www.ncbi.nlm.nih.gov/omim

• HGDP Human Genome Diversity Project (http://hagsc.org/hgdp/files.html)

Most of the databases are disease- or gene-centric, e.g. p53

Mutation/polymorphism databases

dbSNP http://www.ncbi.nlm.nih.gov/SNP/

Repository of all known mutations (human and other organisms)

Where to find the databases

• Table of addresses for major databases and tools

• Nucleic Acids Research Database issue – January each year

• Nucleic Acids Research Software issue – new

• Expasy list of tools: http://ca.expasy.org/links.html

Large scale data retrieval

• Programmatic access to many databases

• MySQL access to some

• BioMart access – public and private

• FTP sites – large data downloads
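As one possible example of programmatic access, the sketch below queries the Ensembl REST service for a gene by symbol, using only the Python standard library. The slides do not name a specific interface, so the choice of service is an assumption. The `lookup/symbol` endpoint and the JSON content type follow the public Ensembl REST documentation as I understand it, and should be checked against the current documentation before use. The function names are illustrative.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumed base URL of the public Ensembl REST service.
ENSEMBL_REST = "https://rest.ensembl.org"


def lookup_url(species, symbol):
    """Build the URL for looking up a gene by symbol in a given species."""
    return f"{ENSEMBL_REST}/lookup/symbol/{quote(species)}/{quote(symbol)}"


def lookup_gene(species, symbol, timeout=30):
    """Fetch the gene record as a dict. Requires network access."""
    request = Request(
        lookup_url(species, symbol),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


print(lookup_url("homo_sapiens", "BRCA2"))
```

Calling `lookup_gene("homo_sapiens", "BRCA2")` would perform the request itself. For bulk retrieval, the FTP and BioMart routes listed above are usually more appropriate than many individual requests.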

Other tutorials

• http://www.ensembl.org/info/website/tutorials/index.html

• http://www.ebi.ac.uk/training/online/

• http://www.ebi.ac.uk/2can/home.html
