designing biological databases

Post on 17-Jul-2015


How do you solve a problem like a biological database?

(BNF 216 - Database Modeling and Design for Bioinformatics)

Arjei Balandra, Software Developer

National Telehealth Center, University of the Philippines – Manila

http://bumblebest.net

Database

• A database is a set of data that has a regular structure and that is organized in such a way that a computer can easily find the desired information.

– The Linux Information Project

(http://www.linfo.org/database.html)

Biological Database

• Biological databases are libraries of life sciences information collected from scientific experiments, published literature, high-throughput experiment technology, and computational analyses.

– Wikipedia (en.wikipedia.org/wiki/Biological_database)

NCBI - GenBank

European Nucleotide Archive – EMBL-EBI

DDBJ – DNA Data Bank Of Japan

Why Database?

• Data-intensive techniques such as high-throughput screening and gene expression experiments demand methods to correlate large and diverse datasets.

• Databases integrate information from a variety of sources allowing faster and more powerful searches.

DO A “GOOD” DATABASE DESIGN

Tip #1:

Good Database Design

• Provides easy access to previous results.

• Supports both expert- and machine-guided searches for novel correlations in data.

Bad Database Design

• Obfuscates the correlations for which the user is searching.

• Makes it difficult for biologists to fit their data into the database or to find previously stored data, resulting in user contempt.

• ‘brittle’

LEARN FROM EXISTING LITERATURE

Tip #2:

• Generalizations

• Incorporate existing schema into the database design

• Use existing structures for common data

Generalizations

aMAZE (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC308873/figure/gkh139f2/)

RESPECT THE UNIQUE NEEDS OF BIOLOGISTS (AND USERS)

Tip #3:

Business rules

• constraints

– based on data derived from the real-world entities

– specific to the needs of the organization.

What do they need?

– Use free-text Comments

– Create user-specific categories

Dealing with Business Rules

User-Specific Categories
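
One way to picture free-text comments and user-specific categories is a small schema like the sketch below. It uses Python's built-in sqlite3 module; every table, column, and sample value (`sample`, `category`, `sample_category`, the user `maria`) is a hypothetical illustration, not something taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign-key checks off by default
conn.executescript("""
CREATE TABLE sample (
    sample_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    comment   TEXT                 -- free-text notes from the biologist
);
CREATE TABLE category (
    category_id INTEGER PRIMARY KEY,
    owner       TEXT NOT NULL,     -- the user who defined this category
    label       TEXT NOT NULL,
    UNIQUE (owner, label)          -- each user keeps a private set of labels
);
CREATE TABLE sample_category (
    sample_id   INTEGER NOT NULL REFERENCES sample(sample_id),
    category_id INTEGER NOT NULL REFERENCES category(category_id),
    PRIMARY KEY (sample_id, category_id)
);
""")

# Hypothetical data: one sample, one comment, one user-defined category.
conn.execute(
    "INSERT INTO sample (sample_id, name, comment) VALUES (?, ?, ?)",
    (1, "liver biopsy A", "slightly degraded; rerun if possible"),
)
conn.execute(
    "INSERT INTO category (category_id, owner, label) VALUES (?, ?, ?)",
    (1, "maria", "needs rerun"),
)
conn.execute("INSERT INTO sample_category (sample_id, category_id) VALUES (1, 1)")


def samples_in_category(conn, owner, label):
    """Return the names of samples that `owner` filed under `label`."""
    rows = conn.execute(
        """
        SELECT s.name
        FROM sample AS s
        JOIN sample_category AS sc ON sc.sample_id = s.sample_id
        JOIN category AS c ON c.category_id = sc.category_id
        WHERE c.owner = ? AND c.label = ?
        ORDER BY s.name
        """,
        (owner, label),
    ).fetchall()
    return [name for (name,) in rows]


print(samples_in_category(conn, "maria", "needs rerun"))
```

Keeping categories in their own table lets each user add labels without a schema change, while the comment column leaves room for observations that fit no category.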

DESIGN THE DATABASE BEFORE BUILDING IT

Tip #4:

USE THE DATABASE TO ENFORCE DATA INTEGRITY

Tip #5:
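
As a rough illustration of letting the database itself reject bad data, the sketch below declares NOT NULL, UNIQUE, CHECK, and foreign-key constraints in SQLite through Python's sqlite3 module. The `organism` and `gene` tables, the `try_insert` helper, and all values (including the made-up symbol `GENE1` and its coordinates) are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign-key checks off by default
conn.executescript("""
CREATE TABLE organism (
    organism_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE gene (
    gene_id     INTEGER PRIMARY KEY,
    symbol      TEXT NOT NULL,
    organism_id INTEGER NOT NULL REFERENCES organism(organism_id),
    strand      TEXT NOT NULL CHECK (strand IN ('+', '-')),
    start_pos   INTEGER NOT NULL CHECK (start_pos >= 1),
    end_pos     INTEGER NOT NULL,
    CHECK (end_pos >= start_pos),
    UNIQUE (organism_id, symbol)
);
""")

GENE_SQL = (
    "INSERT INTO gene (symbol, organism_id, strand, start_pos, end_pos) "
    "VALUES (?, ?, ?, ?, ?)"
)


def try_insert(sql, params):
    """Run one INSERT; return True if accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "organism": try_insert(
        "INSERT INTO organism (organism_id, name) VALUES (?, ?)", (1, "Organism X")
    ),
    "valid gene": try_insert(GENE_SQL, ("GENE1", 1, "+", 100, 200)),
    "bad strand": try_insert(GENE_SQL, ("GENE2", 1, "?", 100, 200)),
    "unknown organism": try_insert(GENE_SQL, ("GENE3", 99, "+", 100, 200)),
    "end before start": try_insert(GENE_SQL, ("GENE4", 1, "-", 200, 100)),
    "duplicate symbol": try_insert(GENE_SQL, ("GENE1", 1, "-", 300, 400)),
}
for label, accepted in results.items():
    print(f"{label}: {'accepted' if accepted else 'rejected'}")
```

Because the rules live in the schema, every application that writes to the database is held to them, not just the one that happens to validate its input.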

Normalization
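
A minimal sketch of what normalization buys, assuming a flat list of records in which the organism's kingdom is repeated on every gene row. The data, table names, and column names below are invented for illustration. The flat rows are split into an `organism` table and a `gene` table so that each fact is stored once.

```python
import sqlite3

# Hypothetical unnormalized input: (gene symbol, organism name, kingdom).
# The kingdom depends only on the organism, yet it is repeated per gene.
flat_rows = [
    ("geneA", "Organism X", "Bacteria"),
    ("geneB", "Organism X", "Bacteria"),
    ("geneC", "Organism Y", "Fungi"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organism (
    organism_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE,
    kingdom     TEXT NOT NULL
);
CREATE TABLE gene (
    gene_id     INTEGER PRIMARY KEY,
    symbol      TEXT NOT NULL,
    organism_id INTEGER NOT NULL REFERENCES organism(organism_id)
);
""")

for symbol, organism, kingdom in flat_rows:
    # Store each organism once; later rows reuse the existing record.
    conn.execute(
        "INSERT OR IGNORE INTO organism (name, kingdom) VALUES (?, ?)",
        (organism, kingdom),
    )
    (organism_id,) = conn.execute(
        "SELECT organism_id FROM organism WHERE name = ?", (organism,)
    ).fetchone()
    conn.execute(
        "INSERT INTO gene (symbol, organism_id) VALUES (?, ?)",
        (symbol, organism_id),
    )

# A correction now touches one row instead of every gene of that organism.
conn.execute("UPDATE organism SET kingdom = 'Archaea' WHERE name = 'Organism X'")

organism_count = conn.execute("SELECT COUNT(*) FROM organism").fetchone()[0]
rebuilt = conn.execute(
    """
    SELECT g.symbol, o.name, o.kingdom
    FROM gene AS g
    JOIN organism AS o ON o.organism_id = g.organism_id
    ORDER BY g.symbol
    """
).fetchall()
print(organism_count, rebuilt)
```

The join rebuilds the original flat view on demand, so nothing is lost by splitting the tables, and the two gene rows for the same organism can no longer disagree about its kingdom.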


KEEP THE DATABASE SCOPE MANAGEABLE

Tip #6:

• In Biology, one size does not fit all

• Focus on a subset of Biology (e.g., genes, proteins)

• For large subsets, tackle them one at a time

• Inclusive

Keep the database scope manageable

LISTEN TO THE PEOPLE WHO HAVE TO WRITE AND USE THE INTERFACE

Tip #7:

• Databases are successful only when people use them

Users know what they want and need

+ Developers know what they can do

+ Designers know what must be done

---------------------------------------------------------

= Collaborative approach to develop a successful database

TEST THE DESIGN WITH REALISTIC DATA

Tip #8:
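
The sketch below shows one way such a test might look: load a synthetic but realistically varied set of sequences into a candidate table and see which design assumptions break. Everything here is hypothetical, including the `dna_sequence` table, the 1000-character limit being tested, and the chosen length distribution. In practice the test set would come from real experiments.

```python
import random
import sqlite3

random.seed(0)  # reproducible synthetic data


def make_sequence(length):
    """Build a random DNA string of the requested length."""
    return "".join(random.choice("ACGT") for _ in range(length))


# Hypothetical test set: many short sequences plus a few very long ones,
# because real collections rarely have a uniform size.
lengths = [random.randint(50, 500) for _ in range(200)] + [5_000, 50_000]
test_set = [(f"seq{i}", make_sequence(n)) for i, n in enumerate(lengths)]

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE dna_sequence (
    name     TEXT PRIMARY KEY,
    residues TEXT NOT NULL
             CHECK (length(residues) <= 1000)  -- design assumption under test
)
""")

rejected = []
for name, residues in test_set:
    try:
        conn.execute(
            "INSERT INTO dna_sequence (name, residues) VALUES (?, ?)",
            (name, residues),
        )
    except sqlite3.IntegrityError:
        rejected.append(name)  # the schema refused a realistic record

loaded = conn.execute("SELECT COUNT(*) FROM dna_sequence").fetchone()[0]
print(f"loaded {loaded} sequences, rejected {len(rejected)}: {rejected}")
```

A test set made only of short toy sequences would have passed silently. The long records are what expose the length limit before real users run into it.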

MAKE THE DATABASE STRUCTURE UNDERSTANDABLE AND EASY TO MAINTAIN

Tip #9:

THANK YOU!

REPLACE(quote, "pagmamahal", "data");

quote

References

• The Linux Information Project (http://www.linfo.org/database.html)

• Nelson, M.R., Reisinger, S.J., Henry, S. (2003). Designing databases to store biological information. BIOSILICO, Vol. 1, No. 4

• Wikipedia (en.wikipedia.org/wiki/Biological_database)

• Lemer, C., Antezana, E., Couche, F., Fays, F., Santolaria, X., Janky, R., … Wodak, S. J. (2004). The aMAZE LightBench: a web interface to a relational database of cellular processes. Nucleic Acids Research, 32(Database issue), D443–D448. doi:10.1093/nar/gkh139
