data and text mining

Posted on 10-May-2015


Data and Text Mining

Lars Juhl Jensen

sequence analysis

protein networks

de Lichtenberg, Jensen et al., Science, 2005

adverse drug reactions

Campillos, Kuhn et al., Science, 2008

group leader

cofounder

data mining

proteomics

text mining

biomedical literature

electronic health records

protein networks

guilt by association

STRING

Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011

computational predictions

gene fusion

Korbel et al., Nature Biotechnology, 2004

gene neighborhood

operons

Korbel et al., Nature Biotechnology, 2004
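The gene neighborhood idea can be illustrated with a minimal sketch: two gene families that repeatedly sit side by side on the same strand (operon-like) in several genomes are candidates for a functional link. The function name, the input layout, and the `min_genomes` threshold are assumptions made for this illustration, not the method used in the cited work.

```python
from collections import Counter

def conserved_neighbors(genomes, min_genomes=2):
    """Count, per pair of gene families, the genomes in which the two
    families are adjacent on the same strand (hypothetical helper)."""
    support = Counter()
    for genes in genomes.values():
        seen = set()  # count each pair at most once per genome
        for (fam1, strand1), (fam2, strand2) in zip(genes, genes[1:]):
            if strand1 == strand2 and fam1 != fam2:
                seen.add(tuple(sorted((fam1, fam2))))
        support.update(seen)
    # keep only pairs whose adjacency is conserved in enough genomes
    return {pair: n for pair, n in support.items() if n >= min_genomes}

# Toy gene orders: (gene family, strand), listed in chromosomal order.
genomes = {
    "genome1": [("famA", "+"), ("famB", "+"), ("famC", "-")],
    "genome2": [("famD", "-"), ("famB", "-"), ("famA", "-")],
    "genome3": [("famA", "+"), ("famC", "+"), ("famB", "+")],
}
neighbors = conserved_neighbors(genomes)
```

In this toy input only famA and famB are adjacent on the same strand in more than one genome, so they are the only pair retained.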

bidirectional promoters

Korbel et al., Nature Biotechnology, 2004

phylogenetic profiles

Korbel et al., Nature Biotechnology, 2004
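A phylogenetic profile records in which genomes a gene is present. Genes with similar profiles tend to be gained and lost together, which hints at a shared function. The sketch below compares profiles with Jaccard similarity; the similarity measure, the threshold, and all names are assumptions chosen for illustration.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two sets of genomes."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_profiles(profiles, min_similarity=0.75):
    """Return (gene1, gene2, similarity) for pairs of genes whose
    presence/absence profiles are at least min_similarity alike."""
    pairs = []
    for g1, g2 in combinations(sorted(profiles), 2):
        s = jaccard(profiles[g1], profiles[g2])
        if s >= min_similarity:
            pairs.append((g1, g2, s))
    # most similar pairs first; ties keep alphabetical order
    return sorted(pairs, key=lambda p: -p[2])

# Toy profiles: the set of species in which each gene is found.
profiles = {
    "geneA": frozenset({"sp1", "sp2", "sp4"}),
    "geneB": frozenset({"sp1", "sp2", "sp4"}),
    "geneC": frozenset({"sp3", "sp5"}),
    "geneD": frozenset({"sp1", "sp2", "sp3", "sp4"}),
}
linked = similar_profiles(profiles)
```

Here geneA and geneB have identical profiles, while geneD shares three of its four species with each of them and just reaches the threshold.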

a real example

Cell

Cellulosomes

Cellulose

experimental data

gene coexpression

protein interactions

Jensen & Bork, Science, 2008

genetic interactions

Beyer et al., Nature Reviews Genetics, 2007

curated knowledge

complexes

pathways

Letunic & Bork, Trends in Biochemical Sciences, 2008

many databases

different formats

different identifiers

variable quality

not comparable

not same species

hard work

quality scores

von Mering et al., Nucleic Acids Research, 2005

calibrate vs. gold standard

von Mering et al., Nucleic Acids Research, 2005
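Calibrating against a gold standard makes raw scores from different sources comparable: each raw score is replaced by the fraction of benchmarkable pairs with a similar raw score that the gold standard confirms. The following is a simplified binned sketch. The bin edges, the toy gold standard, and the function name are assumptions; the cited paper may well fit smooth calibration curves instead.

```python
from bisect import bisect_right

def calibrate(scored_pairs, gold_positive, gold_negative, bin_edges):
    """For each raw-score bin, return the fraction of gold-standard pairs
    that are positives (None if the bin holds no benchmarkable pair)."""
    counts = [[0, 0] for _ in range(len(bin_edges) - 1)]  # [positives, total]
    for pair, raw in scored_pairs.items():
        if pair in gold_positive:
            hit = 1
        elif pair in gold_negative:
            hit = 0
        else:
            continue  # pair not covered by the gold standard
        i = bisect_right(bin_edges, raw) - 1
        if 0 <= i < len(counts):
            counts[i][0] += hit
            counts[i][1] += 1
    return [pos / total if total else None for pos, total in counts]

def pair(a, b):
    """Unordered protein pair."""
    return frozenset((a, b))

# Toy raw scores from one hypothetical evidence source.
scored = {
    pair("A", "B"): 0.9, pair("A", "C"): 0.8, pair("C", "D"): 0.85,
    pair("B", "C"): 0.2, pair("A", "D"): 0.3, pair("B", "D"): 0.1,
}
gold_positive = {pair("A", "B"), pair("A", "C"), pair("B", "D")}
gold_negative = {pair("B", "C"), pair("A", "D"), pair("C", "D")}
calibrated = calibrate(scored, gold_positive, gold_negative, [0.0, 0.5, 1.01])
```

In the toy data one of three low-scoring pairs and two of three high-scoring pairs are confirmed, so the two bins map to probabilities of one third and two thirds.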

homology-based transfer

Franceschini et al., Nucleic Acids Research, 2013

missing most of the data

text mining

>10 km

too much to read

computer

as smart as a dog

teach it specific tricks

named entity recognition

comprehensive lexicon

CDC2

cyclin dependent kinase 1

expansion rules

hCdc2

CDC2

flexible matching

cyclin-dependent kinase 1

cyclin dependent kinase 1

“black list”

SDS
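The named entity recognition steps above (lexicon, expansion rules, flexible matching, black list) can be sketched as a small dictionary-based tagger. This is a toy illustration, not the actual tagger: the identifiers `CDK1` and `SDS_GENE`, the single "h" prefix rule, and the normalization are assumptions. SDS is assumed to be blacklisted because it usually abbreviates a laboratory chemical rather than a gene name.

```python
import re

def normalize(name):
    """Flexible matching: ignore case, treat hyphens and spaces alike."""
    return re.sub(r"[\s\-]+", " ", name.lower()).strip()

def build_dictionary(lexicon, blacklist):
    """Map every normalized name variant to its identifier."""
    blocked = {normalize(n) for n in blacklist}
    dictionary = {}
    for identifier, names in lexicon.items():
        for name in names:
            if normalize(name) in blocked:
                continue  # black list: too ambiguous to tag
            variants = {name}
            if " " not in name:
                # assumed expansion rule: species prefix, e.g. CDC2 -> hCDC2
                variants.add("h" + name)
            for variant in variants:
                dictionary[normalize(variant)] = identifier
    return dictionary

def tag(text, dictionary, max_words=4):
    """Greedy longest-match tagging over windows of up to max_words tokens."""
    tokens = [(m.start(), m.end())
              for m in re.finditer(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", text)]
    matches, i = [], 0
    while i < len(tokens):
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            start, end = tokens[i][0], tokens[i + n - 1][1]
            identifier = dictionary.get(normalize(text[start:end]))
            if identifier:
                matches.append((text[start:end], identifier))
                i += n
                break
        else:
            i += 1  # no name starts at this token
    return matches

lexicon = {"CDK1": ["CDC2", "cyclin dependent kinase 1"], "SDS_GENE": ["SDS"]}
dictionary = build_dictionary(lexicon, blacklist=["SDS"])
text = "hCdc2 (cyclin-dependent kinase 1) was run on an SDS gel."
found = tag(text, dictionary)
```

The tagger finds both hCdc2 and the hyphenated full name and maps them to the same identifier, while the blacklisted SDS is left untagged.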

augmented browsing

Reflect

browser add-on

real-time text mining

Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009

O’Donoghue et al., Journal of Web Semantics, 2010

information extraction

co-mentioning

within documents

within paragraphs

within sentences
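Co-mentioning at the three levels above can be turned into a pair score by giving more credit the closer together two entities are mentioned. The sketch below assumes the text has already been tagged, so each sentence is a set of entity identifiers. The weights are arbitrary placeholders, not the values used in practice.

```python
from collections import Counter
from itertools import combinations

# Assumed, arbitrary weights: closer co-mentions earn more credit.
WEIGHTS = {"document": 1.0, "paragraph": 2.0, "sentence": 4.0}

def pairs(entities):
    """All unordered pairs of distinct entities, as sorted tuples."""
    return {tuple(sorted(p)) for p in combinations(set(entities), 2)}

def comention_scores(documents, weights=WEIGHTS):
    """Sum, over documents, the weights of every level at which a pair
    is co-mentioned. A document is a list of paragraphs, a paragraph a
    list of sentences, and a sentence a set of entity identifiers."""
    scores = Counter()
    for document in documents:
        sentence_pairs, paragraph_pairs = set(), set()
        document_entities = set()
        for paragraph in document:
            paragraph_entities = set()
            for sentence in paragraph:
                sentence_pairs |= pairs(sentence)
                paragraph_entities |= set(sentence)
            paragraph_pairs |= pairs(paragraph_entities)
            document_entities |= paragraph_entities
        for pair in pairs(document_entities):
            scores[pair] += weights["document"]
            if pair in paragraph_pairs:
                scores[pair] += weights["paragraph"]
            if pair in sentence_pairs:
                scores[pair] += weights["sentence"]
    return scores

# One toy document: two paragraphs, the first with two sentences.
documents = [[[{"A", "B"}, {"C"}], [{"D"}]]]
scores = comention_scores(documents)
```

A and B share a sentence and score highest, A and C share only a paragraph, and A and D share only the document.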

text corpus

~22 million abstracts

no access

millions of full-text articles

localization and disease

general approach

COMPARTMENTS

TISSUES

DISEASES

curated knowledge

experimental data

text mining

computational predictions

common identifiers

quality scores

visualization

compartments.jensenlab.org

tissues.jensenlab.org

dissemination

web interfaces

web services

diseases.jensenlab.org

bulk download

Acknowledgments

STRING

Christian von Mering

Damian Szklarczyk

Michael Kuhn

Manuel Stark

Samuel Chaffron

Chris Creevey

Jean Muller

Tobias Doerks

Philippe Julien

Alexander Roth

Milan Simonovic

Jan Korbel

Berend Snel

Martijn Huynen

Peer Bork

Text mining

Sune Frankild

Evangelos Pafilis

Kalliopi Tsafou

Alberto Santos

Janos Binder

Heiko Horn

Michael Kuhn

Nigel Brown

Reinhardt Schneider

Sean O’Donoghue
