biological databases: challenges in organization and usability

Post on 10-May-2015

Biological databases: Challenges in organization and usability

Lars Juhl Jensen

Ph.D.

postdoc

staff scientist

group leader

cofounder

challenges

buzzword du jour

big data

semantic web

cognitive computing

Underpants Gnomes

elephant in the room

heterogeneous data

many databases

different formats

different identifiers

variable quality

difficult to interpret

organization

identifier mapping

pick a reference

map all else to that

hard work
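
The "pick a reference, map all else to that" idea can be sketched in a few lines. This is only an illustration: the alias table, the `REF:` identifiers, and the function names are made up for the example and are not taken from any of the resources in the talk. The sketch refuses to map ambiguous aliases rather than guessing, which is one reason the real task is hard work.

```python
from collections import defaultdict


def build_alias_index(alias_rows):
    """Map each lower-cased alias to the set of reference identifiers it may denote."""
    index = defaultdict(set)
    for reference_id, alias in alias_rows:
        index[alias.lower()].add(reference_id)
    return index


def to_reference(identifier, index):
    """Return the single reference identifier for an alias, or None if unknown or ambiguous."""
    candidates = index.get(identifier.lower(), set())
    if len(candidates) == 1:
        return next(iter(candidates))
    return None


# Hypothetical alias table: (reference identifier, alias) pairs.
ALIAS_ROWS = [
    ("REF:0001", "ACC_A1"),
    ("REF:0001", "GeneA"),
    ("REF:0002", "GeneB"),
    ("REF:0003", "p34"),  # the same alias is used for two different entries
    ("REF:0004", "p34"),
]

index = build_alias_index(ALIAS_ROWS)
mapped_gene_a = to_reference("genea", index)   # case-insensitive lookup
mapped_p34 = to_reference("p34", index)        # ambiguous, so not mapped
mapped_unknown = to_reference("XYZ1", index)   # not in the table
```

Once every imported data set has passed through a step like `to_reference`, records from different databases can be joined on the reference identifier alone.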

database import

automatic updating

separate parsers

error checking

formats change
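
A minimal sketch of one such parser, assuming a hypothetical tab-separated format with a three-column header. The column names, the `FormatError` class, and the sample data are invented for illustration. The point is that the parser fails loudly when the format changes and collects malformed rows instead of importing them silently.

```python
import csv
import io

# Assumed layout of the hypothetical source file.
EXPECTED_HEADER = ["protein_a", "protein_b", "score"]


class FormatError(ValueError):
    """Raised when the file no longer looks like the format the parser was written for."""


def parse_interactions(handle):
    """Parse a tab-separated file; return (valid records, rejected rows with line numbers)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        raise FormatError(f"unexpected header: {header!r}")
    records, rejected = [], []
    for line_number, row in enumerate(reader, start=2):
        try:
            protein_a, protein_b, raw_score = row  # wrong column count raises ValueError
            score = float(raw_score)               # non-numeric score raises ValueError
        except ValueError:
            rejected.append((line_number, row))
            continue
        records.append((protein_a, protein_b, score))
    return records, rejected


SAMPLE = "protein_a\tprotein_b\tscore\nA1\tB1\t0.9\nA2\tB2\tn/a\nA3\tB3\n"
records, rejected = parse_interactions(io.StringIO(SAMPLE))
```

In an automatic update, a raised `FormatError` or an unusually long `rejected` list would be the signal to keep the previous import and inspect the source by hand.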

unstructured data

text mining

dictionary-based methods

co-occurrence statistics

steep learning curve
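
The two ingredients named above can be illustrated with a toy example. The dictionary, the entity identifiers, and the documents are invented, and the observed-over-expected ratio is just one simple co-occurrence statistic chosen for the sketch; it is not presented as the scoring scheme used in the resources discussed later.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical dictionary: lower-cased name or synonym -> entity identifier.
DICTIONARY = {
    "alpha kinase": "E1",
    "ak1": "E1",
    "beta cyclin": "E2",
    "gamma factor": "E3",
}


def tag(text, dictionary):
    """Return the set of entity identifiers whose names occur as whole words in the text."""
    lowered = text.lower()
    found = set()
    for name, entity in dictionary.items():
        if re.search(r"(?<!\w)" + re.escape(name) + r"(?!\w)", lowered):
            found.add(entity)
    return found


def cooccurrence_counts(documents, dictionary):
    """Count in how many documents each entity and each entity pair is mentioned."""
    singles, pairs = Counter(), Counter()
    for text in documents:
        entities = sorted(tag(text, dictionary))
        singles.update(entities)
        pairs.update(combinations(entities, 2))
    return singles, pairs


def observed_over_expected(pair, singles, pairs, n_documents):
    """Ratio of observed co-mentions to the number expected if mentions were independent."""
    a, b = pair
    expected = singles[a] * singles[b] / n_documents
    return pairs[pair] / expected


DOCS = [
    "Alpha kinase binds beta cyclin.",
    "Beta cyclin activates alpha kinase (AK1).",
    "Gamma factor is unrelated.",
    "Alpha kinase and gamma factor were both measured.",
]
singles, pairs = cooccurrence_counts(DOCS, DICTIONARY)
ratio_e1_e2 = observed_over_expected(("E1", "E2"), singles, pairs, len(DOCS))
ratio_e1_e3 = observed_over_expected(("E1", "E3"), singles, pairs, len(DOCS))
```

Here the pair E1 and E2 is co-mentioned more often than chance would predict (ratio above 1), while E1 and E3 falls below 1.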

quality assessment

high error rates

don’t filter it

score it

von Mering et al., Nucleic Acids Research, 2005

calibrate vs. gold standard

von Mering et al., Nucleic Acids Research, 2005

control error rate

improves comparability

helps interpretation
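
"Score it, then calibrate vs. gold standard" can be sketched as follows. This is a simplified illustration, not the procedure from von Mering et al.: raw scores are binned, and for each bin the fraction of benchmarked pairs that the gold standard calls true is used as the calibrated score. The pairs, raw scores, bin edges, and gold-standard sets are all invented.

```python
import bisect


def calibrate(scored_pairs, gold_positive, gold_negative, bin_edges):
    """Per raw-score bin [edge_i, edge_i+1), return the fraction of gold-standard pairs that are true.

    Pairs not covered by the gold standard are ignored; empty bins yield None.
    """
    n_bins = len(bin_edges) - 1
    hits = [0] * n_bins
    totals = [0] * n_bins
    for pair, raw in scored_pairs:
        if pair not in gold_positive and pair not in gold_negative:
            continue  # cannot be benchmarked
        b = bisect.bisect_right(bin_edges, raw) - 1
        if not 0 <= b < n_bins:
            continue  # raw score outside the binned range
        totals[b] += 1
        if pair in gold_positive:
            hits[b] += 1
    return [h / t if t else None for h, t in zip(hits, totals)]


# Hypothetical evidence: (pair, raw score from some method).
SCORED = [
    (("A", "B"), 25), (("A", "C"), 22), (("B", "C"), 12), (("C", "D"), 15),
    (("D", "E"), 3), (("E", "F"), 4), (("F", "G"), 27),
]
GOLD_POSITIVE = {("A", "B"), ("A", "C"), ("B", "C")}
GOLD_NEGATIVE = {("C", "D"), ("D", "E"), ("E", "F")}

calibrated = calibrate(SCORED, GOLD_POSITIVE, GOLD_NEGATIVE, [0, 10, 20, 30])
```

Because every evidence type ends up on the same scale (estimated probability of being correct), scores from different sources become comparable, and a user can choose a cutoff that corresponds to an acceptable error rate instead of relying on a fixed filter.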

usability

for bioinformaticians

common identifiers

common format

cannot ask for more

for biologists

web interfaces

unified information portal

nobody will use it

focused resources

STRING

protein associations

computational predictions

Korbel et al., Nature Biotechnology, 2004

experimental data

Jensen & Bork, Science, 2008

curated knowledge

Letunic & Bork, Trends in Biochemical Sciences, 2008

text mining

>10 km

general approach

COMPARTMENTS

TISSUES

DISEASES

visualization

quick overview

protein networks

string-db.org

subcellular localization

compartments.jensenlab.org

tissue expression

tissues.jensenlab.org

access to more details

tables are boring

summary

common identifiers

quality scores

focused resources

visualization
