Other biological databases

Upload: osborne-preston

Post on 18-Dec-2015

Page 1: Other biological databases. Biological systems Taxonomic data Literature Protein folding and 3D structure Small molecules Pathways and networks Biological

Other biological databases

Biological systems

Taxonomic data

Literature

Protein folding and 3D structure

Small molecules

Pathways and networks

Biological systems

Protein families and domains

Whole genome data

Sequence data

Ontologies: GO

Other Biological Databases

• Transcription factor binding sites: TRANSFAC

• Protein structure databases: PDB, SCOP, CATH

• Protein family databases: Pfam, Prints, PROSITE etc.

• Chemicals and small molecules: ChEBI

• Gene expression databases: GEO, ArrayExpress

• Metabolic pathways: Reactome, KEGG

• Genome databases: Ensembl, FlyBase, WormBase etc.

• Human genetics-related databases: HapMap, dbSNP

Transcription factor binding sites

• TRANSFAC, a database of eukaryotic transcription factors: http://www.gene-regulation.com/pub/databases.html#transfac

• TESS (Transcription Element Search System), for predicting transcription factor binding sites; uses TRANSFAC: http://www.cbi.upenn.edu/tess

• TFsearch, for searching transcription factor binding sites: http://www.cbrc.jp/research/db/TFSEARCH.html

Protein structure databases

• Main resource is Protein Data Bank (PDB): http://www.rcsb.org/pdb/

• Contains the spatial coordinates of macromolecule atoms whose 3D structure has been obtained by X-ray or NMR studies

• Proteins represent more than 90% of available structures (others are DNA, RNA, sugars, viruses, protein/DNA complexes…)

• Can search by PDB code
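
The spatial coordinates mentioned above are stored in PDB-format files as fixed-width ATOM records. The following is a minimal sketch of reading one such record, assuming the classic column layout of the PDB file format (atom name in columns 13-16, residue name in 18-20, chain in 22, residue number in 23-26, x/y/z in 31-54). The function name `parse_atom_line` and the sample values are illustrative only and do not come from a real entry.

```python
def parse_atom_line(line):
    """Extract a few fields from one fixed-width ATOM/HETATM record."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return {
        "atom_name": line[12:16].strip(),
        "residue": line[17:20].strip(),
        "chain": line[21],
        "residue_number": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


# Made-up sample record, assembled field by field so the columns are explicit.
SAMPLE = "".join([
    "ATOM  ",    # columns 1-6: record name
    "    1",     # columns 7-11: atom serial number
    " ",         # column 12: unused
    " N  ",      # columns 13-16: atom name
    " ",         # column 17: alternate location indicator
    "MET",       # columns 18-20: residue name
    " ",         # column 21: unused
    "A",         # column 22: chain identifier
    "   1",      # columns 23-26: residue sequence number
    "    ",      # columns 27-30: insertion code and padding
    "  27.340",  # columns 31-38: x coordinate
    "  24.430",  # columns 39-46: y coordinate
    "   2.614",  # columns 47-54: z coordinate
])

atom = parse_atom_line(SAMPLE)
print(atom["residue"], atom["chain"], atom["x"], atom["y"], atom["z"])
```

For real work an established parser is usually a better choice than slicing columns by hand; the sketch only shows what kind of data a PDB entry holds.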

Searching MSD (Macromolecular Structure Database)

http://www.ebi.ac.uk/msd - search by PDB code

Protein structure-related databases

• Structural family databases based on PDB: SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/) and CATH (http://www.biochem.ucl.ac.uk/bsm/cath/)

• Predicted structures in SWISS-MODEL (http://swissmodel.expasy.org//SWISS-MODEL.html)

Protein family databases

• Databases that produce signatures for identifying protein families or domains

• Used for functional classification of proteins

• E.g. Pfam, PROSITE, Prints, SMART, TIGRFAMs etc.

• Integrated into a single resource, InterPro (http://www.ebi.ac.uk/interpro)

InterProScan sequence search

Stand-alone version available

InterPro text search

Search by keyword, protein accession or InterPro accession

Results for protein accession

Example InterPro entry

Chemicals and small molecules

• Chemical Abstracts: http://www.cas.org/

• ChEBI: http://www.ebi.ac.uk/chebi

• KEGG (part of it includes chemicals): http://www.genome.jp/kegg

• ChemIDplus (chemicals cited in NLM databases): http://chem2.sis.nlm.nih.gov/chemidplus/chemidlite.jsp

• MSD-Chem: ligands and chemicals in MSD

ChEBI example entry

Hierarchy for chemicals

Gene expression databases

• NCBI Gene Expression Omnibus (GEO) http://www.ncbi.nlm.nih.gov/geo/

• ArrayExpress http://www.ebi.ac.uk/arrayexpress/

• Stanford microarray database http://genome-www5.stanford.edu/

• Can usually search for experiments or particular expression profiles

GEO search page

Profiles search results

Specific entry and experiment info

ArrayExpress search results

What does the data look like?

• Info on experiment, array used, etc.

• Raw or processed tab-delimited file containing spots and their intensities (Cy3/Cy5 ratios) across different samples

• Files with metadata, e.g. sample info, annotation and coordinates of each spot on the array
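
As an illustration of the tab-delimited files described above, here is a minimal sketch that reads per-spot intensities and computes a log2 ratio for each spot. The column names `spot_id`, `cy3` and `cy5` and the values are hypothetical; real GEO or ArrayExpress files use platform-specific headers that would need to be mapped first.

```python
import csv
import io
import math


def log_ratios(handle):
    """Return {spot_id: log2(Cy5/Cy3)} from a tab-delimited table."""
    reader = csv.DictReader(handle, delimiter="\t")
    ratios = {}
    for row in reader:
        cy3 = float(row["cy3"])
        cy5 = float(row["cy5"])
        if cy3 <= 0 or cy5 <= 0:
            continue  # skip spots whose ratio is undefined
        ratios[row["spot_id"]] = math.log2(cy5 / cy3)
    return ratios


# Made-up example data standing in for a downloaded file.
EXAMPLE = (
    "spot_id\tcy3\tcy5\n"
    "spot_1\t100\t400\n"
    "spot_2\t250\t250\n"
    "spot_3\t800\t100\n"
)

ratios = log_ratios(io.StringIO(EXAMPLE))
for spot, value in ratios.items():
    print(spot, value)
```

The log transform is a common convention because it makes up- and down-regulation symmetric around zero; background correction and normalization are deliberately left out of this sketch.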

Proteomics: SWISS-2DPAGE

Enzymes and metabolic pathways

• Contain information describing enzymes, biochemical reactions and metabolic pathways

• ENZYME and BRENDA: nomenclature databases that store information on enzyme names and reactions

• IntEnz: Integrated relational Enzyme database

Enzyme nomenclature

• E.C. (Enzyme Commission) numbers are assigned based on the reactions the enzymes catalyze

• Hierarchy, high-level groups:
  - EC 1: Oxidoreductases
  - EC 2: Transferases
  - EC 3: Hydrolases
  - EC 4: Lyases
  - EC 5: Isomerases
  - EC 6: Ligases
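
Because the first field of an EC number encodes the high-level group, the hierarchy above can be looked up directly. This is a small sketch covering only the six classes listed on this slide; the function name `ec_class` is an illustrative choice, not part of any database API.

```python
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def ec_class(ec_number):
    """Map an EC number such as 'EC 1.1.1.1' to its high-level group."""
    text = ec_number.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()  # drop the optional 'EC' prefix
    first_field = text.split(".")[0]
    return EC_CLASSES[int(first_field)]  # KeyError for classes not listed here


print(ec_class("EC 1.1.1.1"))  # alcohol dehydrogenase
print(ec_class("3.4.21.4"))    # trypsin
```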

EC example

Metabolic pathway databases

• PATHGUIDE: >200 pathways

• KEGG (Kyoto Encyclopedia of Genes and Genomes): http://www.genome.jp/kegg - includes:
  - Database of chemicals, genes and networks (metabolic, regulatory etc.)
  - Well-curated and quite specific

• EcoCyc (Encyclopedia of E. coli K12 genes and metabolism): http://ecocyc.org - curation of the entire genome

• Reactome, curated biological pathways: http://www.reactome.org/

• GenMAPP: pathways contributed by users

http://www.genome.ad.jp/kegg

The same pathway can differ between species, which allows comparison

Pathway in Reactome

Example of a pathway in BioCyc

Protein-protein interaction databases

• Protein-protein interaction databases store pairwise interactions or complexes

• Can get 1 to more than 20,000 interactions per publication

• IntAct: http://www.ebi.ac.uk/intact

• DIP (Database of Interacting Proteins): http://dip.doe-mbi.ucla.edu/

• BIND (Biomolecular Interaction Network Database): http://submit.bind.ca:8080/bind/
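
Pairwise interactions of the kind these databases store map naturally onto an undirected graph. The sketch below builds a simple adjacency structure from a list of pairs; the protein identifiers are placeholders, and the function name `build_network` is an illustrative assumption rather than anything provided by IntAct, DIP or BIND.

```python
from collections import defaultdict


def build_network(pairs):
    """Build an undirected adjacency map from pairwise interactions."""
    network = defaultdict(set)
    for a, b in pairs:
        network[a].add(b)
        network[b].add(a)  # interactions are symmetric
    return network


# Placeholder identifiers standing in for database accessions.
PAIRS = [
    ("protA", "protB"),
    ("protA", "protC"),
    ("protC", "protD"),
]

network = build_network(PAIRS)
partners = sorted(network["protA"])
print(partners)
```

Complexes with more than two members would need a different representation (for example, a set of members per complex), which this sketch does not cover.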

Protein-protein interactions in IntAct

Integrated functional interactions in STRING

Genome browsers

• Integrate sequence & functional data for a genome

• Ensembl, genome browser for major eukaryotic genomes, e.g. human, mouse etc.: http://www.ensembl.org

• UCSC browser: http://genome.ucsc.edu/

• FlyBase, Drosophila genome database: http://www.ebi.ac.uk/flybase

• WormBase, C. elegans: http://www.wormbase.org

• PlasmoDB, Plasmodium (malaria): http://plasmodb.org

• Etc.

Ensembl genome browser

Ensembl gene view 1

Ensembl gene view 2

Gene within context on chromosome

Human genetics databases

• GeneCards: http://www.genecards.org/

• HapMap: http://hapmap.ncbi.nlm.nih.gov/

• OMIM: http://www.ncbi.nlm.nih.gov/omim

• HGDP (Human Genome Diversity Project): http://hagsc.org/hgdp/files.html

Mutation/polymorphism databases

Most of the databases are disease- or gene-centric, e.g. p53

dbSNP: http://www.ncbi.nlm.nih.gov/SNP/

Repository of all known mutations (human and other organisms)

Where to find the databases

• Table of addresses for major databases and tools

• Nucleic Acids Research Database issue, published each January

• Nucleic Acids Research Software issue (new)

• Expasy list of tools: http://ca.expasy.org/links.html

Large scale data retrieval

• Programmatic access to many databases

• MySQL access to some

• BioMart access –public and private

• FTP sites –large data downloads
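
To give a flavour of the BioMart route listed above, the following sketch builds the kind of XML query document that BioMart services accept. It only constructs the query text and sends nothing over the network. The dataset, filter and attribute names (`hsapiens_gene_ensembl`, `chromosome_name`, `ensembl_gene_id`, `external_gene_name`) are assumptions based on Ensembl's BioMart and may differ on other marts or releases; the helper name `build_biomart_query` is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes):
    """Return a BioMart-style XML query as a string (nothing is sent)."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Assumed Ensembl BioMart names; check the target mart before relying on them.
xml_query = build_biomart_query(
    "hsapiens_gene_ensembl",
    {"chromosome_name": "21"},
    ["ensembl_gene_id", "external_gene_name"],
)
print(xml_query)
```

In practice the resulting XML is typically submitted as the `query` parameter of a mart's web service, and the response comes back as tab-separated text; for very large downloads the FTP sites are usually the more practical option.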

Other tutorials

• http://www.ensembl.org/info/website/tutorials/index.html

• http://www.ebi.ac.uk/training/online/

• http://www.ebi.ac.uk/2can/home.html