
UNIT-V
Databases in Bioinformatics

R. KAVITHA, M.PHARM
LECTURER, DEPARTMENT OF PHARMACEUTICS
SRM COLLEGE OF PHARMACY
SRM UNIVERSITY



Databases in Bioinformatics

• Why?
• The different types of databases
• Database language: identifiers
• Nucleotide sequence databases
• Protein sequence databases
• 3D structure databases
• Ontologies

Biological databases: Why?

• Make biological data available to scientists
– Consolidation of data (gather data from different sources)
– Provide access to large datasets that cannot be published explicitly (genome, …)
• Make biological data available in computer-readable format
– Make data accessible for automated analysis

Bioinformatics: “a collective term for data compilation, organisation, analysis and dissemination”

The different types of Databases in Bioinformatics

1) Data:

Type of data:
• nucleotide sequences
• protein sequences
• 3D structures
• gene expression data
• metabolic pathways
• ….

Data entry and quality control:
• data deposited directly
• curators add and update data
• treatment of erroneous data: removed, or marked
• error checking
• consistency, updates
• ….

Primary, or derived data:
• Primary databases: direct experimental results
• Secondary databases: result of analysis on primary databases
• Consolidation of many databases
• …

The different types of Databases in Bioinformatics

2) Database:

Organisation:
• Flat files
• Relational databases
• Object-oriented databases
• ….

Curators:
• Large, public institution (EMBL, NCBI)
• Quasi-academic institute (Swiss Institute of Bioinformatics, TIGR, …)
• Academic group or scientist
• Commercial company

Availability:
• Publicly available, no restriction
• Available, but with copyright
• Accessible, but not downloadable
• Academic, but not freely available
• Commercial
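To make the "flat files" organisation concrete, here is a minimal Python sketch of reading one flat-file record. It assumes a layout of two-letter line codes followed by data, with `//` closing the record, loosely modelled on EMBL/Swiss-Prot style files. The record itself, the `parse_flat_record` function, and the field values are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical, abridged record (NOT a real database entry).
# Assumed layout: two-letter line code, three spaces, then the data.
RECORD = """\
ID   EXAMPLE_ENTRY
AC   X00000;
DE   Illustrative protein for a parsing demo.
SQ   SEQUENCE
     MKTAYIAKQR QISFVKSHFS
//
"""


def parse_flat_record(text):
    """Collect the lines of one record into a dict keyed by line code."""
    fields = defaultdict(list)
    current = None
    for line in text.splitlines():
        if line.startswith("//"):      # end-of-record marker
            break
        code = line[:2].strip()
        value = line[5:].strip()
        if code:                       # a line code starts (or repeats) a field
            current = code
            if code == "SQ":           # header line of the sequence block
                continue
        elif current is None:          # continuation line before any field
            continue
        fields[current].append(value)
    record = {key: " ".join(values) for key, values in fields.items()}
    # Sequence lines are grouped in blocks of ten; drop the spaces.
    record["SQ"] = record.get("SQ", "").replace(" ", "")
    return record


record = parse_flat_record(RECORD)
print(record["ID"], record["AC"], len(record["SQ"]))
```

A relational database would store the same information as rows in linked tables; the flat file keeps everything for one entry together as plain text, which is why simple line-by-line parsing like this is usually enough.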

Identifiers and Accession numbers

• Identifier: a string of letters and digits that is generally “understandable”
– Example: TPIS_CHICK (triose phosphate isomerase from chicken, Gallus gallus) in SwissProt
– The identifier can change (at the curator's discretion)
• Accession code: a string of letters and digits that uniquely identifies an entry in its database
– The accession number for TPIS_CHICK in SwissProt is P00940
– The accession number should not change!
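The difference between the two kinds of name can be checked mechanically. The Python sketch below tells an accession number from an identifier by its shape. The accession pattern follows the regular expression given in the UniProt documentation; the identifier pattern (protein mnemonic, underscore, species mnemonic, each up to five characters) is a simplifying assumption, and the function names are made up for this example.

```python
import re

# Accession pattern as given in the UniProt documentation.
ACCESSION_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)

# Assumed shape of a SwissProt identifier: PROTEIN_SPECIES mnemonics.
IDENTIFIER_RE = re.compile(r"^([A-Z0-9]{1,5})_([A-Z0-9]{1,5})$")


def classify(token):
    """Return 'accession', 'identifier' or 'unknown' for a database name."""
    if ACCESSION_RE.match(token):
        return "accession"
    if IDENTIFIER_RE.match(token):
        return "identifier"
    return "unknown"


def split_identifier(token):
    """Split an identifier into (protein mnemonic, species mnemonic)."""
    match = IDENTIFIER_RE.match(token)
    if match is None:
        raise ValueError(f"not an identifier: {token!r}")
    return match.group(1), match.group(2)


print(classify("TPIS_CHICK"), split_identifier("TPIS_CHICK"))
print(classify("P00940"))
```

Because the identifier can change while the accession number should not, scripts and cross-references are generally better keyed on the accession number.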

Nucleotide Sequence Databases

• 3 main databases
– EMBL: www.ebi.ac.uk/embl
– GenBank: www.ncbi.nlm.nih.gov/GenBank
– DDBJ: www.ddbj.nig.ac.jp

The 3 databases are synchronized on a daily basis, and the accession numbers are consistent.

There are no legal restrictions on the use of these databases. However, they contain some patented sequences.
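Because the accession numbers are consistent, the same accession can be used to request a record from more than one of these databases. The Python sketch below only builds the request URLs and does not contact any server. The endpoint patterns (NCBI E-utilities `efetch` for GenBank, the ENA browser API for EMBL data) are given as understood at the time of writing and should be checked against the current documentation; `fasta_urls` is a name made up for this example, and `U49845` is used simply as a sample accession.

```python
from urllib.parse import urlencode

# Endpoint patterns are assumptions to verify against current documentation.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
ENA_FASTA = "https://www.ebi.ac.uk/ena/browser/api/fasta/"


def fasta_urls(accession):
    """Build FASTA download URLs for one accession (no network access)."""
    query = urlencode(
        {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"}
    )
    return {
        "GenBank": f"{EFETCH}?{query}",
        "EMBL": f"{ENA_FASTA}{accession}",
    }


for source, url in fasta_urls("U49845").items():
    print(source, url)
```

The only part that varies between the two URLs is the service address; the accession itself is passed unchanged to both.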

Protein Sequence Databases

One of the first biological sequence databases was probably the book "Atlas of Protein Sequence and Structure" by Margaret Dayhoff and colleagues, first published in 1965. It contained the protein sequences determined at the time, and new editions of the book were published until 1978. It became the foundation of the PIR database.

Protein Information Resource: http://pir.georgetown.edu/

Protein Sequence Databases

http://www.expasy.ch/sprot/

The SWISS-PROT database has some legal restrictions: the entries are copyrighted, but freely accessible to academic researchers. Commercial companies must purchase a license from SIB.

SwissProt: Statistics

[Figures: size of SwissProt; amino acid composition]

Biomolecule Structure Database

• PDB: http://www.rcsb.org
• SCOP: http://scop.berkeley.edu
• CATH: http://biochem.ucl.ac.uk/bsm/CATH
• ASTRAL: http://astral.berkeley.edu
• HOMSTRAD: http://www-cryst.bioc.cam.ac.uk/data/align/
• Interfaces to PDB:
– PDB at a glance: http://cmm.info.nih.gov/modeling/pdb_at_a_glance.html
– Molecules to go: http://molbio.info.nih.gov/cgi-bin/pdb/
– EBI interface: http://www.ebi.ac.uk/msd/
– PDBSum: http://www.ebi.ac.uk/thornton-srv/databases/pdbsum

The Gene Ontology (GO)

• GO paper: Creating the Gene Ontology Resource: Design and Implementation. Genome Research (2001) 11:1425-1433
• The GO Website: http://www.geneontology.org
• Application of GO: The Gene Ontology Annotation (GOA) project: implementation of GO in SWISS-PROT, TrEMBL, and InterPro. Genome Res. 2003 Apr;13(4):662-72.

GO Goals

From Genome Res 2001 Aug;11(8):1425-33

Gene Ontology (GO)

• Three levels of annotation:
– Molecular function: what a gene product does at the biochemical level
– Biological process: a broad biological perspective; not currently a pathway (no dynamics or dependencies)
– Cellular component: location within cellular structures (e.g. Golgi apparatus) and macromolecular complexes (e.g. ribosome)

Structure of GO

Example from molecular function:

• Parents: Transmembrane receptor; Protein tyrosine kinase
• Child: Transmembrane receptor protein tyrosine kinase
• Relationship: the child is linked to each of the two parents by Is_a
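The Is_a example on this slide shows that a GO term may have more than one parent, so the ontology forms a graph rather than a simple tree. The Python sketch below stores the slide's three terms as a mapping from child to parents and collects all ancestors of a term. The extra root term `molecular function` placed above both parents is an assumption added so that the traversal has more than one level; the term names are used as plain strings, without GO identifiers.

```python
# Child -> list of parents, following the Is_a links on the slide.
# The "molecular function" root is an assumption added for illustration.
IS_A = {
    "transmembrane receptor protein tyrosine kinase": [
        "transmembrane receptor",
        "protein tyrosine kinase",
    ],
    "transmembrane receptor": ["molecular function"],
    "protein tyrosine kinase": ["molecular function"],
    "molecular function": [],
}


def ancestors(term, is_a):
    """Return every term reachable from `term` by following Is_a links."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:          # a term can be reached by two paths
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


child = "transmembrane receptor protein tyrosine kinase"
print(sorted(ancestors(child, IS_A)))
```

The `seen` set matters because of the multiple parents: the root is reachable through both parent terms and should be reported only once.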


Searching for papers…

http://scholar.google.com