Biological databases

Upload: melchior-magda

Post on 31-Dec-2015


International genome sequencing and protein structure determination

Protein Data Bank (PDB)

Sequence data = strings of letters

Nucleotides (bases)

Adenine (A)

Cytosine (C)

Guanine (G)

Thymine (T)

triplet codons

genetic code

20 amino acids (A, L, V, S etc.)
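The step from triplet codons to amino acids can be sketched in a few lines of Python. This is only an illustration, assuming the standard genetic code; the names `CODON_TABLE` and `translate` are hypothetical helpers, not part of any database tool mentioned in these slides.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G ordering.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 triplet codons to a one-letter amino acid code.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCTTGTAA"))  # prints MAL
```

The table has 64 entries but only 20 distinct amino acids, which is why several codons map to the same letter.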

Three-dimensional protein structure = atomic coordinates in 3D space
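To make "atomic coordinates in 3D space" concrete, here is a minimal sketch of how a structure might be held in memory: one record per atom with x, y, z values (in PDB entries these are given in ångströms). The `Atom` type and the coordinate values are made up for illustration.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    name: str
    x: float
    y: float
    z: float


def distance(a: Atom, b: Atom) -> float:
    """Euclidean distance between two atoms."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


# Hypothetical coordinates, chosen only so the arithmetic is easy to check.
n = Atom("N", 0.0, 0.0, 0.0)
ca = Atom("CA", 1.0, 2.0, 2.0)
print(distance(n, ca))  # 3.0
```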

Conversion into metric

Protein folding

Data types

primary data

secondary data

tertiary data

sequence

DNA

amino acid

DMPVERILEALAVE…

primary database

secondary protein structure, e.g., alpha-helices, beta-strands

“motifs”: regular expressions, blocks, profiles, fingerprints

secondary db

domains, folding units

tertiary protein structure

tertiary db

atomic co-ordinates

interaction data

binary protein-protein interactions/networks

pathways and functional networks

interaction db

Primary biological databases

• Nucleic acid

EMBL

GenBank

DDBJ (DNA Data Bank of Japan)

• Protein

PIR

MIPS

SWISS-PROT

TrEMBL

NRL-3D

International nucleotide data banks

EMBL (Europe): EMBL, EBI

GenBank (USA): NLM, NCBI

DDBJ (Japan): NIG, CIB

International Advisory Meeting

Collaborative Meeting

TrEMBL NRDB

GenBank file format
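A GenBank flat-file entry is plain text: top-level keywords such as LOCUS, DEFINITION and ACCESSION start in the first column, the sequence follows the ORIGIN line, and the entry ends with `//`. The sketch below reads those pieces from a toy record. The record contents and the `parse_genbank` helper are hypothetical, and continuation lines and the feature table are deliberately ignored.

```python
# Hypothetical toy record; not a real GenBank entry.
SAMPLE = """\
LOCUS       TOY0001                   12 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical toy record for illustration.
ACCESSION   TOY0001
ORIGIN
        1 atggccttgt aa
//
"""


def parse_genbank(text):
    """Collect top-level keyword lines and the sequence of one entry."""
    fields = {}
    sequence = []
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-entry marker
            break
        if in_origin:
            # Sequence lines: drop the leading position number and the spaces.
            sequence.append("".join(line.split()[1:]))
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line[:1].strip():
            # A keyword in column 1; indented continuation lines are skipped.
            keyword, _, value = line.partition(" ")
            fields[keyword] = value.strip()
    fields["sequence"] = "".join(sequence).upper()
    return fields


record = parse_genbank(SAMPLE)
print(record["ACCESSION"], record["sequence"])
```

For real work an established parser is the safer choice; this only shows how the layout maps onto fields.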


Swiss-Prot

SWISS-PROT file format
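SWISS-PROT entries are also line-oriented text: each line starts with a two-letter code (ID, AC, DE, OS, SQ and so on), sequence lines are indented, and `//` closes the entry. A minimal sketch of reading that layout follows. The entry name, accession and description are invented, the sequence is the fragment shown earlier in these slides, and `parse_swissprot` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical toy entry; identifiers are placeholders, not real accessions.
SAMPLE = """\
ID   TOY_YEAST     STANDARD;      PRT;    14 AA.
AC   X00000;
DE   Hypothetical toy protein.
OS   Saccharomyces cerevisiae.
SQ   SEQUENCE   14 AA;
     DMPVERILEA LAVE
//
"""


def parse_swissprot(text):
    """Group lines by their two-letter code and join the sequence lines."""
    lines_by_code = defaultdict(list)
    sequence = []
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-entry marker
            break
        if not line.strip():
            continue
        code, content = line[:2], line[5:]
        if code == "  ":
            # Indented lines after SQ carry the sequence in blocks of ten.
            sequence.append(content.replace(" ", ""))
        else:
            lines_by_code[code].append(content.strip())
    return dict(lines_by_code), "".join(sequence)


fields, seq = parse_swissprot(SAMPLE)
print(fields["AC"], seq)
```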


Other primary protein databases

• TrEMBL (translated EMBL), in SWISS-PROT format

• rapid access to sequence data from genome projects

• computer-annotated supplement to SWISS-PROT

• translations of all coding sequences (CDS) in EMBL

• SP-TrEMBL

Other primary protein databases

The Protein Information Resource (PIR)

• integrated system of protein sequence databases and derived related databases, e.g., alignment databases

• rapid searching, comparison, and pattern matching of protein sequences

• retrieval of descriptive, bibliographic, feature, and concurrent cross-reference information

• aims to be comprehensive and consistently annotated
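Pattern matching of protein sequences can be illustrated with regular expressions, one of the motif descriptions listed under secondary data. The sketch below converts a small subset of PROSITE-style pattern syntax into a Python regex and scans a sequence with it. It is not how PIR implements its searches; `prosite_to_regex` and the test sequence are hypothetical.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a subset of PROSITE-style syntax to a Python regex.

    Handles: x (any residue), [ABC] (one of), {ABC} (none of),
    and repeat counts such as x(2) or x(2,4).
    """
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = m.group(1), m.group(2)
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"
        parts.append(core + ("{" + repeat + "}" if repeat else ""))
    return "".join(parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}")
print(regex)  # N[^P][ST][^P]

# Hypothetical sequence: the N at index 2 matches, the N at index 7 does not
# because it is followed by P.
hits = [m.start() for m in re.finditer(regex, "AANGSAANPSA")]
print(hits)  # [2]
```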

PIR: related databases

NRL-3D Sequence-Structure Database

• produced by PIR from sequence and annotation information extracted from three-dimensional structures in the Protein Data Bank (PDB)

• allows keyword and similarity searches

Two other useful sites

INFOBIOGEN: The Public Catalog of Databases

http://www.infobiogen.fr/services/dbcat/

KEGG: Kyoto Encyclopedia of Genes and Genomes

http://www.genome.ad.jp/kegg/

The Kyoto Encyclopedia of Genes and Genomes (KEGG) is an effort to computerize current knowledge of molecular and cellular biology in terms of the information pathways that consist of interacting molecules or genes, and to provide links from the gene catalogs produced by genome sequencing projects.

Sequence Retrieval System (SRS)

Database browser that allows users to

• retrieve

• link

• access

entries from all interconnected resources.

Users can formulate queries across a range of different database types.

Guide to Protein Databases:

http://www.biochem.ucl.ac.uk/~robert/bioinf/lecture1/index.html

http://www.biochem.ucl.ac.uk/~robert/bioinf/lecture2/index.html

With thanks to Dr Roman Laskowski.

Interaction databases

Biomolecule-ligand interactions

• SRS: Enzymes, reactions and metabolic pathway databases 

• Receptor-ligand database searches

relibase.ebi.ac.uk/

Interaction databases

Yeast model

• YPD - http://www.incyte.com/sequence/proteome

• proteome database of model organism

• 6142 proteins: 3430 known, 804 similarity, 1908 unknown

• data on protein interaction maps

• derived from literature and experiment

• Curagen - http://curatools.curagen.com

• Curagen - yeast two-hybrid screen data

• 957 putative interactions of 1004 yeast proteins

• Uetz et al., 2000 - Nature 403 p623-630

http://www.hgmp.mrc.ac.uk/GenomeWeb/prot-interaction.html
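Binary protein-protein interaction data, such as the pairs reported by a two-hybrid screen, fit naturally into an undirected graph. The sketch below stores each pair in an adjacency map. The protein identifiers P1 to P5 are placeholders, not real yeast proteins, and `build_network` is a hypothetical helper.

```python
from collections import defaultdict


def build_network(pairs):
    """Build an undirected interaction network from binary pairs."""
    network = defaultdict(set)
    for a, b in pairs:
        network[a].add(b)
        network[b].add(a)
    return network


# Placeholder interaction pairs for illustration only.
pairs = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
network = build_network(pairs)

print(sorted(network["P2"]))  # ['P1', 'P3']
print(len(network))           # 5
```

With this layout, the interaction partners of a protein are a single dictionary lookup, and the number of keys is the number of proteins with at least one reported interaction.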

Protein-Protein Interaction Databases

Protein-Protein Interactions

Biocarta

DIP

KEGG

KEGG

http://www.genome.ad.jp/kegg/

• Search database for metabolic and regulatory pathways

• Compute KEGG: generate possible reaction pathways between two compounds
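Finding a possible reaction pathway between two compounds can be viewed as a path search in a graph whose nodes are compounds and whose edges are reactions. The following is a minimal breadth-first search over a tiny hand-written graph. It is a sketch of the general idea, not KEGG's own method; `REACTIONS` and `find_pathway` are hypothetical names, and the graph covers only a few early steps of glucose metabolism.

```python
from collections import deque

# Toy reaction graph: compound -> compounds it can be converted into.
REACTIONS = {
    "glucose": ["glucose 6-phosphate"],
    "glucose 6-phosphate": ["fructose 6-phosphate", "6-phosphogluconolactone"],
    "fructose 6-phosphate": ["fructose 1,6-bisphosphate"],
}


def find_pathway(graph, start, goal):
    """Return one shortest chain of compounds from start to goal, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


print(find_pathway(REACTIONS, "glucose", "fructose 1,6-bisphosphate"))
```

Breadth-first search returns a pathway with the fewest reaction steps; listing all possible pathways would need a different traversal.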

http://www.genome.ad.jp/

Metabolic pathways

Signal transduction pathways

(species-specific, Homo sapiens shown)

Biocarta pathway database

http://www.biocarta.com