One tagger, many uses: simple text-mining strategies for biomedicine
Posted on 28-Jan-2018
Lars Juhl Jensen (@larsjuhljensen)
One tagger, many uses: Simple text-mining strategies for biomedicine
>10 km
too much to read
computer
as smart as a dog
teach it specific tricks
named entity recognition
dictionary
genes / proteins
chemical compounds
diseases
organisms
environments
not comprehensive
expansion rules
prefixes and suffixes
curated blacklist
SDS
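The dictionary approach on the preceding slides can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of the C++ tagger: the dictionary entries, the placeholder identifiers, the plural-stripping expansion rule, and the function names `lookup` and `tag` are all invented for the example. The only detail taken from the slides is that SDS sits on the blacklist.

```python
import re

# Hypothetical toy dictionary: lowercase name -> (entity type, identifier).
# The identifiers are placeholders, not real database accessions.
DICTIONARY = {
    "tp53": ("gene", "GENE:1"),
    "insulin": ("gene", "GENE:2"),
    "sds": ("gene", "GENE:3"),
    "aspirin": ("chemical", "CHEM:1"),
    "diabetes": ("disease", "DIS:1"),
}

# Names that are in the dictionary but should never be tagged.
BLACKLIST = {"sds"}

# Assumed expansion rule: also try the token with a plural suffix removed.
SUFFIXES = ("s",)

TOKEN = re.compile(r"[A-Za-z0-9]+")


def lookup(token):
    """Return the dictionary entry for a token, or None if there is no match."""
    key = token.lower()
    if key in BLACKLIST:
        return None
    if key in DICTIONARY:
        return DICTIONARY[key]
    for suffix in SUFFIXES:
        if key.endswith(suffix):
            stem = key[: -len(suffix)]
            if stem and stem not in BLACKLIST and stem in DICTIONARY:
                return DICTIONARY[stem]
    return None


def tag(text):
    """Return (start, end, matched text, entity type, identifier) per hit."""
    hits = []
    for match in TOKEN.finditer(text):
        entry = lookup(match.group())
        if entry is not None:
            hits.append((match.start(), match.end(), match.group(), *entry))
    return hits


hits = tag("TP53 and insulins were run on SDS gels.")
```

In this sketch `hits` contains TP53 and, through the suffix rule, "insulins", while the blacklisted SDS is skipped even though it is in the dictionary.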
software
C++ tagger
>1000 abstracts / second
inherently thread-safe
70–80% recall
80–90% precision
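For reference, recall and precision can be computed from a set of predicted mentions and a gold-standard set as below. The function name and the example mentions are assumptions made for illustration; they are not the benchmark data behind the figures on the slides.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for predicted versus gold-standard items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical data: 10 gold mentions, 9 predictions, 8 of them correct.
gold = {f"mention{i}" for i in range(10)}
predicted = {f"mention{i}" for i in range(8)} | {"spurious"}
precision, recall = precision_recall(predicted, gold)
```

With these made-up sets, precision is 8/9 and recall is 8/10.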
open source: bitbucket.org/larsjuhljensen/tagger/
Python module
Docker: hub.docker.com/r/larsjuhljensen/tagger/
web service: tagger.jensenlab.org
Extract: extract.jensenlab.org
community resources
STRING
string-db.org
functional associations
DISEASES
disease–gene associations
Cytoscape
curated knowledge
experimental data
computational predictions
co-occurrence text mining
Medline abstracts
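Co-occurrence text mining can be sketched as counting how many abstracts mention both members of an entity pair. The slides do not describe a scoring scheme, so this example stops at raw document-level counts; the function name and the tagged abstracts are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for each unordered entity pair, the documents mentioning both."""
    counts = Counter()
    for entities in documents:
        # Deduplicate within a document and sort so each pair has one key.
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return counts


# Hypothetical tagger output: one list of entity identifiers per abstract.
abstracts = [
    ["GENE:1", "DIS:1"],
    ["GENE:1", "DIS:1", "GENE:2", "GENE:1"],
    ["GENE:2", "DIS:1"],
]
counts = cooccurrence_counts(abstracts)
```

Pairs that co-occur more often than their individual frequencies would suggest are candidate associations; turning the counts into a score is left out of this sketch.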
only abstracts
<1 km
access restrictions
are abstracts sufficient?
15 million full-text articles
Westergaard et al., BioRxiv, 2017
~50% more associations
electronic health records
Jensen et al., Nature Reviews Genetics, 2012
in Danish
dictionary
drugs
adverse events
in Danish
named entity recognition
temporal correlations
[Figure: adverse event mentions relative to drug introduction and drug discontinuation. Labels: adverse event, negative modifier, indication, pre-existing condition, adverse drug reaction, possible adverse drug reaction, ADR of additional drug, identification start.]
Eriksson et al., Drug Safety, 2014
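The idea of using timing to separate possible adverse drug reactions from other mentions can be illustrated with a deliberately simplified rule set. This is not the method of Eriksson et al.: the function name, the rules, and the label "after discontinuation" are assumptions, and only some of the categories named on the slide are covered.

```python
from datetime import date


def classify_mention(event_date, drug_start, drug_stop=None, negated=False):
    """Label an adverse-event mention by its timing relative to one drug.

    Assumed, simplified rules for illustration only.
    """
    if negated:
        # A negative modifier ("no nausea") means the event did not occur.
        return "negated"
    if event_date < drug_start:
        # Mentioned before the drug was introduced.
        return "pre-existing condition"
    if drug_stop is not None and event_date > drug_stop:
        # Mentioned after the drug was discontinued.
        return "after discontinuation"
    return "possible adverse drug reaction"


start, stop = date(2010, 3, 1), date(2010, 6, 1)
before = classify_mention(date(2010, 2, 1), start, stop)
during = classify_mention(date(2010, 4, 1), start, stop)
```

Here `before` is labeled a pre-existing condition and `during` a possible adverse drug reaction.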
find novel associations
summary
broadly applicable
keep it simple
free tools
Acknowledgments
Evangelos Pafilis
Sune Pletscher-Frankild
Nadezhda Doncheva
Damian Szklarczyk
Michael Kuhn
Robert Eriksson
Peter Bjødstrup Jensen
John “Scooter” Morris
Christian von Mering
Peer Bork
Christos Arvanitidis
Søren Brunak