one tagger, many uses: simple text-mining strategies for biomedicine

Posted on 28-Jan-2018

Lars Juhl Jensen (@larsjuhljensen)

One tagger, many uses: Simple text-mining strategies for biomedicine

>10 km

too much to read

computer

as smart as a dog

teach it specific tricks

named entity recognition

dictionary

genes / proteins

chemical compounds

diseases

organisms

environments

not comprehensive

expansion rules

prefixes and suffixes

curated blacklist

SDS
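The dictionary approach above can be sketched in a few lines. This is a minimal illustration, not the C++ tagger itself: the dictionary entries, identifiers, affix lists, and the way the expansion rule is applied are all assumptions made for the example. It shows the three ingredients named on the slides: a dictionary lookup, a prefix/suffix rule, and a curated blacklist that suppresses ambiguous names such as SDS.

```python
import re

# Hypothetical toy dictionary: lowercased name -> (entity type, identifier).
# The identifiers are made up for illustration.
DICTIONARY = {
    "tp53": ("gene", "GENE:1"),
    "sds": ("gene", "GENE:2"),
    "aspirin": ("chemical", "CHEM:1"),
    "asthma": ("disease", "DIS:1"),
}

# Curated blacklist: names present in the dictionary that usually mean
# something else in running text (SDS is also a common detergent).
BLACKLIST = {"sds"}

# Assumed affix rule: strip these and retry the lookup.
PREFIXES = ("anti-",)
SUFFIXES = ("s",)

TOKEN = re.compile(r"[A-Za-z0-9][A-Za-z0-9\-]*")


def variants(token):
    """Yield the lowercased token, then versions with one affix stripped."""
    lowered = token.lower()
    yield lowered
    for prefix in PREFIXES:
        if lowered.startswith(prefix):
            yield lowered[len(prefix):]
    for suffix in SUFFIXES:
        if lowered.endswith(suffix) and len(lowered) > len(suffix):
            yield lowered[:-len(suffix)]


def tag(text):
    """Return (start, end, entity type, identifier) for each dictionary hit."""
    matches = []
    for m in TOKEN.finditer(text):
        for candidate in variants(m.group()):
            if candidate in DICTIONARY and candidate not in BLACKLIST:
                entity_type, identifier = DICTIONARY[candidate]
                matches.append((m.start(), m.end(), entity_type, identifier))
                break
    return matches


print(tag("Aspirin and TP53 in SDS buffer"))
# [(0, 7, 'chemical', 'CHEM:1'), (12, 16, 'gene', 'GENE:1')]
```

Note that "SDS" is in the dictionary but produces no match, which is the job of the blacklist.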

software

C++ tagger

>1000 abstracts / second

inherently thread-safe

70–80% recall

80–90% precision
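For readers unfamiliar with the two figures above, here is a small sketch of how recall and precision are computed from a set of predicted mentions and a manually annotated gold standard. The function name and the example data are hypothetical.

```python
def precision_recall(predicted, gold):
    """Compare predicted mentions against gold-standard mentions.

    Precision: fraction of predicted mentions that are correct.
    Recall: fraction of gold-standard mentions that were found.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical mentions, each identified by (document, start, end).
gold = {("doc1", 0, 4), ("doc1", 10, 14), ("doc2", 3, 9), ("doc2", 20, 25), ("doc3", 1, 5)}
predicted = {("doc1", 0, 4), ("doc1", 10, 14), ("doc2", 3, 9), ("doc3", 7, 9)}
precision, recall = precision_recall(predicted, gold)
```

In this toy case three of four predictions are correct and three of five gold mentions are found.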

open source: bitbucket.org/larsjuhljensen/tagger/

Python module

Docker: hub.docker.com/r/larsjuhljensen/tagger/

web service: tagger.jensenlab.org

Extract: extract.jensenlab.org

community resources

STRING

string-db.org

functional associations

DISEASES

disease–gene associations

Cytoscape

curated knowledge

experimental data

computational predictions

co-occurrence text mining

Medline abstracts
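Co-occurrence text mining can be illustrated with a simple counting sketch. The slides do not describe the scoring scheme used by the resources, so the observed-over-expected ratio below is a stand-in chosen for illustration, and the entity names are made up. Each abstract is assumed to have already been reduced to the set of entities tagged in it.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count entity and entity-pair occurrences across tagged abstracts.

    documents: iterable of collections of entity identifiers, one per abstract.
    Returns (number of documents, single-entity counts, pair counts).
    """
    single = Counter()
    pair = Counter()
    n = 0
    for entities in documents:
        entities = set(entities)
        n += 1
        single.update(entities)
        pair.update(frozenset(p) for p in combinations(sorted(entities), 2))
    return n, single, pair


def observed_over_expected(a, b, n, single, pair):
    """Ratio of observed co-mentions to those expected if a and b were independent."""
    observed = pair[frozenset((a, b))]
    expected = single[a] * single[b] / n
    return observed / expected if expected else 0.0


# Hypothetical tagged abstracts.
docs = [{"TP53", "cancer"}, {"TP53", "cancer"}, {"TP53"}, {"BRCA1", "cancer"}]
n, single, pair = cooccurrence_counts(docs)
ratio = observed_over_expected("BRCA1", "cancer", n, single, pair)
```

A ratio above 1 means the pair is mentioned together more often than chance would predict from the individual counts.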

only abstracts

<1 km

access restrictions

are abstracts sufficient?

15 million full-text articles

Westergaard et al., BioRxiv, 2017

~50% more associations

electronic health records

Jensen et al., Nature Reviews Genetics, 2012

in Danish

dictionary

drugs

adverse events

in Danish

named entity recognition

temporal correlations

[Figure labels: Drug introduction; Drug discontinuation; Adverse event; Negative modifier; Indication; Pre-existing condition; Adverse drug reaction; Possible adverse drug reaction; ADR of additional drug; Identification start]

Eriksson et al., Drug Safety, 2014
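The temporal idea can be sketched as a comparison between the date of an adverse event mention and the period in which the patient received the drug. This is a deliberately simplified illustration, not the procedure from Eriksson et al.: the function, its rules, and the "after discontinuation" label are assumptions, and it ignores indications, negative modifiers, and additional drugs.

```python
from datetime import date


def classify_event(event_date, drug_start, drug_stop):
    """Classify an adverse event mention relative to a drug exposure window.

    Assumed rules for illustration only:
    - before drug introduction: pre-existing condition
    - between introduction and discontinuation: possible adverse drug reaction
    - later: after discontinuation (label introduced for this sketch)
    """
    if event_date < drug_start:
        return "pre-existing condition"
    if event_date <= drug_stop:
        return "possible adverse drug reaction"
    return "after discontinuation"


# Hypothetical exposure window and event dates.
start, stop = date(2010, 3, 1), date(2010, 3, 20)
labels = [
    classify_event(date(2010, 2, 25), start, stop),
    classify_event(date(2010, 3, 5), start, stop),
    classify_event(date(2010, 4, 1), start, stop),
]
```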

find novel associations

summary

broadly applicable

keep it simple

free tools

Acknowledgments

Evangelos Pafilis, Sune Pletscher-Frankild, Nadezhda Doncheva, Damian Szklarczyk, Michael Kuhn, Robert Eriksson, Peter Bjødstrup Jensen, John “Scooter” Morris, Christian von Mering, Peer Bork, Christos Arvanitidis, Søren Brunak
