Biological databases
DESCRIPTION
Biological databases. International genome sequencing and protein structure determination. Protein Data Bank (PDB). Sequence data = strings of letters. Nucleotides (bases): Adenine (A), Cytosine (C), Guanine (G), Thymine (T). Triplet codons, genetic code.
TRANSCRIPT
Sequence data = strings of letters
Nucleotides (bases)
Adenine (A)
Cytosine (C)
Guanine (G)
Thymine (T)
triplet codons
genetic code
20 amino acids (A, L, V, S etc.)
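The idea that sequence data are plain strings read in triplet codons can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `translate` and the example sequence are assumptions, and the codon table is deliberately partial (the full standard genetic code has 64 codons).

```python
# Partial codon table for illustration; the standard genetic code has 64 entries.
CODON_TABLE = {
    "ATG": "M", "GCT": "A", "CTG": "L", "GTT": "V", "TCT": "S",
    "TAA": "*", "TAG": "*", "TGA": "*",  # stop codons
}

def translate(dna: str) -> str:
    """Translate a DNA coding sequence codon by codon.

    Codons missing from the partial table become 'X'; translation stops
    at the first stop codon; trailing bases that do not fill a codon are ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("ATGGCTCTGGTTTCTTAA"))  # MALVS
```

The one-letter output uses the same amino acid codes (A, L, V, S) listed on the slide.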
Data types
primary data
secondary data
tertiary data
sequence
DNA
amino acid
DMPVERILEALAVE…
primary database
secondary protein structure
“motifs”: regular expressions, blocks, profiles, fingerprints
e.g., alpha-helices, beta-strands
secondary db
domains, folding units
tertiary protein structure
tertiary db
atomic co-ordinates
interaction data
binary protein-protein interactions/networks
pathways and functional networks
interaction db
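As an illustration of a secondary-database "motif" expressed as a regular expression, the sketch below scans a protein sequence for the PROSITE-style pattern N-{P}-[ST]-{P}, commonly cited for N-glycosylation sites. The function name `find_motif` and the test sequence are made up for this example; real secondary databases use their own pattern syntax and search tools.

```python
import re

# PROSITE-style pattern N-{P}-[ST]-{P} rewritten as a Python regular expression:
# N, then any residue except P, then S or T, then any residue except P.
MOTIF = re.compile(r"N[^P][ST][^P]")

def find_motif(sequence: str):
    """Return (1-based start position, matched text) for each non-overlapping match."""
    return [(m.start() + 1, m.group()) for m in MOTIF.finditer(sequence)]

seq = "MKNGSALVNPTQ"  # made-up sequence, not taken from any database
print(find_motif(seq))  # [(3, 'NGSA')]
```

The second N in the sequence is followed by P, so the pattern rejects it.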
Primary biological databases
• Nucleic acid
EMBL
GenBank
DDBJ (DNA Data Bank of Japan)
• Protein
PIR
MIPS
SWISS-PROT
TrEMBL
NRL-3D
International nucleotide data banks
EMBL (Europe): EMBL, EBI
GenBank (USA): NLM, NCBI
DDBJ (Japan): NIG, CIB
International Advisory Meeting
Collaborative Meeting
TrEMBL NRDB
Other primary protein databases
• TrEMBL (translated EMBL) in SWISS-PROT format
rapid access to sequence data from genome projects
computer-annotated supplement to SWISS-PROT
translations of all coding sequences (CDS) in EMBL
• SP-TrEMBL
Other primary protein databases
The Protein Information Resource (PIR)
• integrated system of protein sequence databases and derived related databases, e.g., alignment databases
• rapid searching, comparison, and pattern matching of protein sequences
• retrieval of descriptive, bibliographic, feature, and concurrent cross-reference information
• aims to be comprehensive and consistently annotated
PIR: related databases
NRL-3D Sequence-Structure Database
• produced by PIR from sequence and annotation information extracted from three-dimensional structures in the Protein Data Bank (PDB)
• allows keyword and similarity searches
Two other useful sites
INFOBIOGEN - The Public Catalog of Databases
http://www.infobiogen.fr/services/dbcat/
KEGG - Kyoto Encyclopedia of Genes and Genomes
http://www.genome.ad.jp/kegg/
Kyoto Encyclopedia of Genes and Genomes (KEGG) is an effort to computerize current knowledge of molecular and cellular biology in terms of the information pathways that consist of interacting molecules or genes and to provide links from the gene catalogs produced by genome sequencing projects.
Sequence Retrieval System (SRS)
Database browser that allows users to
• retrieve
• link
• access
entries from all interconnected resources.
Users can formulate queries across a range of different database types.
Guide to Protein Databases:
http://www.biochem.ucl.ac.uk/~robert/bioinf/lecture1/index.html
http://www.biochem.ucl.ac.uk/~robert/bioinf/lecture2/index.html
With thanks to Dr Roman Laskowski.
Biomolecule-ligand interactions
• SRS: Enzymes, reactions and metabolic pathway databases
• Receptor-ligand database searches
relibase.ebi.ac.uk/
Interaction databases
Yeast model
• YPD - http://www.incyte.com/sequence/proteome
• proteome database of model organism
• 6142 proteins: 3430 known, 804 similarity, 1908 unknown
• data on protein interaction maps
• derived from literature and experiment
• Curagen - http://curatools.curagen.com
• Curagen - Yeast two-hybrid screen data
• 957 putative interactions of 1004 yeast proteins
• Uetz et al., 2000 - Nature 403 p623-630
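Binary protein-protein interaction data of this kind is naturally stored as an undirected graph. The sketch below builds one from a list of pairs; the function name `build_network` and the protein names P1 to P5 are hypothetical placeholders, not entries from YPD or the Curagen data set.

```python
from collections import defaultdict

def build_network(pairs):
    """Build an undirected interaction network: protein -> set of partners."""
    network = defaultdict(set)
    for a, b in pairs:
        network[a].add(b)
        network[b].add(a)
    return network

# Hypothetical binary interactions (e.g. positive two-hybrid results).
pairs = [("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("P4", "P5")]
net = build_network(pairs)

print(sorted(net["P1"]))  # ['P2', 'P3']
print(len(net))           # 5
```

Storing partners in sets keeps duplicate reports of the same interaction from inflating the counts.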
KEGG
http://www.genome.ad.jp/kegg/
• Search database for metabolic and regulatory pathways
• Compute KEGG: Generate possible reaction pathways between two compounds
http://www.genome.ad.jp/
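To give a feel for what "generate possible reaction pathways between two compounds" involves, here is a minimal sketch using breadth-first search over a toy reaction graph. It is not KEGG's actual algorithm or API; the function name `find_pathway`, the graph layout, and the compound names A to E are assumptions made for illustration.

```python
from collections import deque

def find_pathway(reactions, start, goal):
    """Return one shortest chain of compounds from start to goal, or None.

    `reactions` maps a substrate to the list of products reachable in one step.
    """
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for product in reactions.get(path[-1], []):
            if product not in visited:
                visited.add(product)
                queue.append(path + [product])
    return None

# Toy reaction graph with hypothetical compounds.
reactions = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(find_pathway(reactions, "A", "E"))  # ['A', 'B', 'D', 'E']
```

Breadth-first search returns a pathway with the fewest reaction steps; enumerating all possible pathways, as the slide describes, would need a different traversal.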