Designing Biological Databases
How do you solve a problem like a biological database?
(BNF 216 - Database Modeling and Design for Bioinformatics)
Arjei Balandra, Software Developer
National Telehealth Center, University of the Philippines – Manila
http://bumblebest.net
Database
• A database is a set of data that has a regular structure and that is organized in such a way that a computer can easily find the desired information.
– The Linux Information Project
(http://www.linfo.org/database.html)
Biological Database
• Biological databases are libraries of life sciences information collected from scientific experiments, published literature, high-throughput experiment technology, and computational analyses.
- Wikipedia (en.wikipedia.org/wiki/Biological_database)
NCBI - GenBank
European Nucleotide Archive – EMBL-EBI
DDBJ – DNA Data Bank Of Japan
Why a Database?
• Data-intensive techniques such as high-throughput screening and gene expression experiments demand methods to correlate large and diverse datasets.
• Databases integrate information from a variety of sources, allowing faster and more powerful searches.
Tip #1: DO A “GOOD” DATABASE DESIGN
Good Database Design
• Provides easy access to previous results.
• Supports both expert- and machine-guided searches for novel correlations in data.
Bad Database Design
• Obscures the correlations the user is searching for.
• Makes it difficult for biologists to fit their data into the database or to find previously stored data, resulting in user contempt.
• Is ‘brittle’ (breaks easily when requirements change).
Tip #2: LEARN FROM EXISTING LITERATURE
• Generalizations
• Incorporate existing schema into the database design
• Use existing structures for common data
Generalizations
aMAZE (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC308873/figure/gkh139f2/)
Tip #3: RESPECT THE UNIQUE NEEDS OF BIOLOGISTS (AND USERS)
Business rules
• constraints
– based on data derived from the real-world entities
– specific to the needs of the organization.
What do they need?
– Free-text comments
– User-specific categories
Dealing with Business Rules
User-Specific Categories
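One possible way to support free-text comments and user-specific categories is sketched below with Python's built-in sqlite3 module. The table and column names (gene, category, gene_category, owner) and the sample rows are assumptions made for illustration; they do not come from the original slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gene (
    gene_id INTEGER PRIMARY KEY,
    symbol  TEXT NOT NULL UNIQUE,
    comment TEXT                      -- free-text notes, optional
);
CREATE TABLE category (
    category_id INTEGER PRIMARY KEY,
    owner       TEXT NOT NULL,        -- the user who defined the category
    name        TEXT NOT NULL,
    UNIQUE (owner, name)              -- each user has a private namespace
);
CREATE TABLE gene_category (
    gene_id     INTEGER NOT NULL REFERENCES gene(gene_id),
    category_id INTEGER NOT NULL REFERENCES category(category_id),
    PRIMARY KEY (gene_id, category_id)
);
""")

# Hypothetical sample data: one gene, one category owned by user "alice".
conn.execute("INSERT INTO gene (symbol, comment) VALUES (?, ?)",
             ("BRCA1", "Re-check expression in the next run"))
conn.execute("INSERT INTO category (owner, name) VALUES (?, ?)",
             ("alice", "follow-up"))
conn.execute("INSERT INTO gene_category (gene_id, category_id) VALUES (1, 1)")
conn.commit()


def genes_in_category(conn, owner, name):
    """Return the gene symbols that a given user filed under a category."""
    rows = conn.execute(
        """
        SELECT g.symbol
        FROM gene g
        JOIN gene_category gc ON gc.gene_id = g.gene_id
        JOIN category c ON c.category_id = gc.category_id
        WHERE c.owner = ? AND c.name = ?
        ORDER BY g.symbol
        """,
        (owner, name),
    )
    return [row[0] for row in rows]


print(genes_in_category(conn, "alice", "follow-up"))
```

Keeping categories in their own table, keyed by owner, lets each user invent labels without changing the schema, while the comment column keeps unstructured notes next to the structured data.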
Tip #4: DESIGN THE DATABASE BEFORE BUILDING IT
Tip #5: USE THE DATABASE TO ENFORCE DATA INTEGRITY
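As a minimal sketch of this tip, the example below lets the database itself reject bad rows through NOT NULL, CHECK, and FOREIGN KEY constraints, using Python's sqlite3 module. The schema (organism, sequence), the allowed alphabet ACGTN, and the helper try_insert are assumptions chosen for illustration, not part of the original slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE organism (
    organism_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE sequence (
    sequence_id INTEGER PRIMARY KEY,
    organism_id INTEGER NOT NULL REFERENCES organism(organism_id),
    -- Non-empty, and no character outside the assumed DNA alphabet.
    residues    TEXT NOT NULL
                CHECK (length(residues) > 0
                       AND residues NOT GLOB '*[^ACGTN]*')
);
""")


def try_insert(sql, params):
    """Run one INSERT; return False if the database rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


insert_seq = "INSERT INTO sequence (organism_id, residues) VALUES (?, ?)"
results = {
    "organism": try_insert("INSERT INTO organism (name) VALUES (?)",
                           ("Homo sapiens",)),
    "valid sequence": try_insert(insert_seq, (1, "ACGTN")),
    "bad character": try_insert(insert_seq, (1, "ACGU")),
    "unknown organism": try_insert(insert_seq, (99, "ACGT")),
    "empty sequence": try_insert(insert_seq, (1, "")),
}
for label, accepted in results.items():
    print(f"{label}: {'accepted' if accepted else 'rejected'}")
```

Because the rules live in the schema, every application that writes to the database is held to them, not only the ones that remember to validate.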
Normalization
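To make the idea concrete, here is a small sketch of normalizing a flat table in which the organism name is repeated on every gene row. The flat rows and the table names (organism, gene) are illustrative assumptions; the slides do not specify a schema.

```python
import sqlite3

# Flat, denormalized input: the organism name is repeated for each gene.
flat_rows = [
    ("BRCA1", "Homo sapiens", 9606),
    ("TP53", "Homo sapiens", 9606),
    ("Trp53", "Mus musculus", 10090),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organism (
    taxon_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE
);
CREATE TABLE gene (
    gene_id  INTEGER PRIMARY KEY,
    symbol   TEXT NOT NULL,
    taxon_id INTEGER NOT NULL REFERENCES organism(taxon_id),
    UNIQUE (symbol, taxon_id)
);
""")

for symbol, organism_name, taxon_id in flat_rows:
    # Store each organism once; later duplicates are skipped.
    conn.execute(
        "INSERT OR IGNORE INTO organism (taxon_id, name) VALUES (?, ?)",
        (taxon_id, organism_name),
    )
    conn.execute(
        "INSERT INTO gene (symbol, taxon_id) VALUES (?, ?)",
        (symbol, taxon_id),
    )
conn.commit()

organism_count = conn.execute("SELECT COUNT(*) FROM organism").fetchone()[0]

# The original flat view can still be recovered with a join.
joined = conn.execute(
    """
    SELECT g.symbol, o.name
    FROM gene g
    JOIN organism o USING (taxon_id)
    ORDER BY g.symbol
    """
).fetchall()
print(organism_count, joined)
```

After the split, correcting an organism name means updating one row rather than every gene that refers to it.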
Tip #6: KEEP THE DATABASE SCOPE MANAGEABLE
• In biology, one size does not fit all
• Focus on a subset of biology (e.g., genes, proteins)
• For large subsets, tackle them one at a time
• Inclusive
Tip #7: LISTEN TO THE PEOPLE WHO HAVE TO WRITE AND USE THE INTERFACE
• Databases are successful only when people use them
Users know what they want and need
+ Developers know what they can do
+ Designers know what must be done
---------------------------------------------------------
= Collaborative approach to develop a successful database
Tip #8: TEST THE DESIGN WITH REALISTIC DATA
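A rough sketch of what such a test could look like: fill a candidate table with synthetic sequences at a realistic volume and time a representative query. The table name, accession format, row count, and length range (200 to 2000 bases) are assumptions; where possible, a sample of real data is a better test than random data.

```python
import random
import sqlite3
import time


def random_dna(rng, length):
    """Return a random DNA string of the given length."""
    return "".join(rng.choices("ACGT", k=length))


def build_test_db(n_rows, seed=0):
    """Create an in-memory database filled with synthetic sequences."""
    rng = random.Random(seed)  # seeded so test runs are repeatable
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sequence ("
        " accession TEXT PRIMARY KEY,"
        " residues  TEXT NOT NULL)"
    )
    rows = (
        (f"TEST{i:06d}", random_dna(rng, rng.randint(200, 2000)))
        for i in range(n_rows)
    )
    with conn:  # one transaction for the whole bulk load
        conn.executemany("INSERT INTO sequence VALUES (?, ?)", rows)
    return conn


conn = build_test_db(2000)

# Time a representative query against the loaded data.
start = time.perf_counter()
longest = conn.execute(
    "SELECT accession, length(residues) FROM sequence"
    " ORDER BY length(residues) DESC LIMIT 1"
).fetchone()
elapsed = time.perf_counter() - start
print(f"longest entry: {longest}, query time: {elapsed:.4f} s")
```

Raising n_rows toward the expected production size shows early whether the design and its indexes hold up, before real users depend on it.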
Tip #9: MAKE THE DATABASE STRUCTURE UNDERSTANDABLE AND EASY TO MAINTAIN
THANK YOU!
REPLACE(quote, "pagmamahal", "data"); -- "pagmamahal" is Tagalog for "love"
quote
References
• The Linux Information Project (http://www.linfo.org/database.html)
• Nelson, M.R., Reisinger, S.J., Henry, S. (2003). Designing databases to store biological information. BIOSILICO, Vol. 1, No. 4
• Wikipedia (en.wikipedia.org/wiki/Biological_database)
• Lemer, C., Antezana, E., Couche, F., Fays, F., Santolaria, X., Janky, R., … Wodak, S. J. (2004). The aMAZE LightBench: a web interface to a relational database of cellular processes. Nucleic Acids Research, 32(Database issue), D443–D448. doi:10.1093/nar/gkh139