Features of Biological Databases
CHARU SHARMA, B.Sc. (H) BOTANY, 3rd YEAR
Biological Database
It is a collection of data that is structured, searchable, updated periodically and cross-referenced.
It stores biological data in electronic form.
Purpose:
- Systematization of data
- Availability of biological data
- Analysis of computed biological data
HISTORY
- Insulin was the first protein to be sequenced; it is composed of 51 amino acids.
- The sequence was published in the “Atlas of Protein Sequence and Structure” in 1965 by Margaret Dayhoff. This became the basis for the PIR database.
- The first nucleic acid sequenced was yeast tRNA, composed of 77 nucleotides.
- The first free-living organism whose genome was sequenced was the bacterium Haemophilus influenzae, in 1995, by Craig Venter.
Features of Biological Databases
1. Heterogeneity
2. High volume data
3. Uncertainty
4. Data curation
5. Data integration
6. Data sharing
7. Dynamics
1. Data Heterogeneity
Availability of diverse and complex data types.
Data types:
- Sequence: nucleotide, protein.
- Graph: data indicating relationships among entities can be captured as a graph. It includes pathway data, genetic maps and structural taxonomy.
- High-dimensional data: data generated from microarray experiments that involve thousands of genes and hundreds of experimental conditions.
- Shapes: 3D molecular structural data. Example: docking.
- Temporal data: for studying the dynamics of any biological system. Example: developmental biology.
- Patterns: patterns lying within the genome that characterize biological entities. Example: regulatory sequence (promoter).
- Scalar and vector fields
- Extracted feature data: numerical data obtained from one or a combination of the above-mentioned data types.
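As a rough illustration of this heterogeneity, the sketch below holds several of the listed data types in plain Python structures. Every value, gene name and function name is a made-up toy example rather than a real database record, and the pathway is only a simplified fragment.

```python
# Illustrative only: toy values, not real biological records.

# Sequence data: a nucleotide string (hypothetical fragment).
nucleotide_seq = "ATGGCCATTGTAATGGGCCGC"

# Graph data: a simplified pathway fragment as an adjacency list.
pathway = {
    "glucose": ["glucose-6-phosphate"],
    "glucose-6-phosphate": ["fructose-6-phosphate"],
    "fructose-6-phosphate": [],
}

# High-dimensional data: expression matrix, rows = genes, columns = conditions.
expression = {
    "geneA": [2.1, 0.4, 3.3],
    "geneB": [0.0, 1.8, 0.9],
}

# Shape data: atom name plus 3D coordinates (x, y, z).
atoms = [("N", 0.0, 0.0, 0.0), ("CA", 1.46, 0.0, 0.0)]

# Temporal data: (time in hours, measurement) pairs.
time_series = [(0, 1.0), (6, 1.4), (12, 2.2)]


def gc_content(seq: str) -> float:
    """Extracted feature: fraction of G and C bases in a sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def reachable(graph: dict, start: str) -> set:
    """Nodes reachable from start by following edges depth-first."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen
```

Here `gc_content` stands in for "extracted feature data" (a number computed from a sequence), and `reachable` shows the kind of question a graph representation makes easy to ask.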
2. High volume data
In addition to being highly heterogeneous, biological data are voluminous, as needed to support comprehensive investigations in various fields and directions.
3. Uncertainty
Biological data carry a great deal of uncertainty, as they represent biological phenomena that are observed and assumed.
4. Data curation
Biological data are collected from various sources across different structural and functional boundaries.
There are always chances of missing links.
To fill these, the data is analyzed and curated via automated methods.
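One way to picture an automated curation pass is a check that flags records with empty fields or cross-references pointing nowhere. The sketch below is only an assumption about what such a check might look like; the record layout, field names and identifiers are hypothetical.

```python
# Hypothetical records: field names and identifiers are made up for illustration.
records = [
    {"id": "REC1", "sequence": "ATGC", "organism": "yeast", "xref": "REC2"},
    {"id": "REC2", "sequence": "", "organism": "yeast", "xref": None},
    {"id": "REC3", "sequence": "GGCA", "organism": None, "xref": "REC9"},
]

REQUIRED_FIELDS = ("sequence", "organism")


def find_gaps(records):
    """Return {record id: [problems]} for records that need a curator's attention."""
    known_ids = {r["id"] for r in records}
    gaps = {}
    for r in records:
        # A required field counts as missing when it is absent, None or empty.
        problems = [f"missing {f}" for f in REQUIRED_FIELDS if not r.get(f)]
        xref = r.get("xref")
        # A cross-reference to an unknown record is a "missing link".
        if xref is not None and xref not in known_ids:
            problems.append(f"dangling xref {xref}")
        if problems:
            gaps[r["id"]] = problems
    return gaps
```

A real curation pipeline would go further and try to fill the gaps; this sketch only locates them.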
5. Data integration
After years of research across different structural and functional scales, data are collected from laboratories worldwide, integrated through a database and made available for use.
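Integration of this kind can be sketched as merging records from several sources on a shared key. The example below is a minimal sketch under assumptions: the key name `gene`, the two "lab" sources and the rule that the first value seen wins are all hypothetical choices, not how any particular database works.

```python
def integrate(*sources):
    """Merge lists of records keyed by 'gene'; later sources only fill in missing fields."""
    merged = {}
    for source in sources:
        for record in source:
            entry = merged.setdefault(record["gene"], {})
            for field, value in record.items():
                # Keep the first value seen for a field; never overwrite it.
                entry.setdefault(field, value)
    return merged


# Two hypothetical laboratories contributing partial information.
lab_a = [{"gene": "geneA", "sequence": "ATGC"}]
lab_b = [
    {"gene": "geneA", "function": "kinase"},
    {"gene": "geneB", "sequence": "GGTA"},
]
combined = integrate(lab_a, lab_b)
```

After the merge, `combined["geneA"]` carries both the sequence from the first source and the function from the second.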
6. Data sharing
Biological data is shared via databases.
Purpose:
- For the scientific community’s inspection
- For cross verification
- To prevent repetition and to validate data
7. Dynamics
New data is generated every day in laboratories.
Sometimes this new data contradicts the old data.
So, it is necessary to develop new organizational database schemes to incorporate new data.
CLASSIFICATION
Classification of biological databases
- Data type
- Maintainer status
- Data access
- Data source
- Database design
- Organism
1. Data type
- Sequence database
  a. Nucleotide database: GenBank, EMBL-Bank
  b. Protein database: Swiss-Prot, PIR
- Structure database: PDB, NDB, DALI, MSD
- Microarray database: ArrayExpress, MIAME
- Chemical database: PubChem
- Pathway database: KEGG, BioSilico
- Enzyme database: ExPASy, REBASE
- Disease database: OMIM, OMIA
- Literature database: PubMed, Scopus
2. Maintainer status
- NCBI, EMBL
- Academic group or scientist
- Commercial company
3. Data access
- Publicly available
- Available with copyright
- Browsing only, accessible but not downloadable
- Academic but not freely available
- Restricted
4. Data source
a) Primary database (archival)
Original data submission by the researcher occurs.
Examples:
- Nucleotide: GenBank, EMBL, DDBJ
- Protein: UniProt
- Structure: PDB
- Literature: Medline (PubMed)
b) Secondary database (curated)
- Results of analysis of primary databases.
- Curated either manually or by automated methods.
Examples: Prosite, Pfam, RefSeq
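The relationship between the two kinds of database can be sketched as follows: primary entries hold submitted sequences, and a secondary resource stores the results of analysing them. This hedged example scans toy protein sequences for the PROSITE-style N-glycosylation pattern N-{P}-[ST]-{P}, rewritten as a regular expression. The entry identifiers and sequences are invented, and real secondary databases use far richer methods than a single pattern.

```python
import re

# PROSITE-style pattern N-{P}-[ST]-{P} as a regular expression.
# The lookahead lets overlapping matches be reported as well.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def annotate(primary_entries):
    """Build derived annotations: {entry id: [(1-based position, matched motif), ...]}."""
    return {
        entry_id: [(m.start() + 1, m.group(1)) for m in N_GLYCOSYLATION.finditer(seq)]
        for entry_id, seq in primary_entries.items()
    }


# Hypothetical "primary" entries: identifier -> protein sequence.
primary = {"P_demo1": "MKNGSAAN", "P_demo2": "MNPSA"}

# The derived "secondary" annotations.
secondary = annotate(primary)
```

In this toy case the first entry gets one motif hit and the second gets none, because a proline follows its asparagine.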
5. Database design
- Flat files
- Relational database (SQL)
- Object-oriented database
- Exchange/publication technologies (FTP, HTML, SOAP, CORBA, XML)
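To make the difference between a flat file and a relational design concrete, the sketch below parses FASTA-style flat-file text and loads it into an in-memory SQLite table. The sequences, the table name `sequences` and its columns are assumptions chosen for illustration, not the schema of any real database.

```python
import sqlite3

# A tiny FASTA-style flat file with hypothetical entries.
FLAT_FILE = """>seq1 hypothetical entry one
ATGGCC
ATTGTA
>seq2 hypothetical entry two
GGGCCC
"""


def parse_fasta(text):
    """Yield (identifier, description, sequence) tuples from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield (*header, "".join(chunks))
            ident, _, desc = line[1:].partition(" ")
            header, chunks = (ident, desc), []
        else:
            chunks.append(line)
    if header is not None:
        yield (*header, "".join(chunks))


# Relational design: the same records as rows that SQL can query.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequences (id TEXT PRIMARY KEY, description TEXT, sequence TEXT)"
)
conn.executemany("INSERT INTO sequences VALUES (?, ?, ?)", parse_fasta(FLAT_FILE))
rows = conn.execute(
    "SELECT id, LENGTH(sequence) FROM sequences ORDER BY id"
).fetchall()
```

The flat file has to be read from top to bottom to find anything, whereas the table can be filtered, sorted and joined with ordinary SQL.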
6. Organism
- Bacteria
- Virus
- Human
THANK YOU