One tagger, many uses: Simple text-mining strategies for biomedicine

Post on 28-Jan-2018

Page 1: One tagger, many uses: Simple text-mining strategies for biomedicine

Lars Juhl Jensen
@larsjuhljensen

One tagger, many uses
Simple text-mining strategies for biomedicine

>10 km

too much to read

computer

as smart as a dog

teach it specific tricks

named entity recognition

dictionary

genes / proteins

chemical compounds

diseases

organisms

environments
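
The dictionary approach on the preceding slides can be illustrated with a minimal sketch. This is not the C++ tagger's implementation: the function name `tag`, the toy dictionary, and the placeholder identifiers are all assumptions made for illustration. It matches the longest dictionary name starting at each word, case-insensitively.

```python
import re

# Toy dictionary: lowercase name -> (entity type, identifier).
# Identifiers are made-up placeholders, not real database accessions.
DICTIONARY = {
    "tp53": ("gene/protein", "GENE:0001"),
    "aspirin": ("chemical compound", "CHEM:0001"),
    "breast cancer": ("disease", "DIS:0001"),
    "escherichia coli": ("organism", "ORG:0001"),
    "hot spring": ("environment", "ENV:0001"),
}

MAX_WORDS = max(len(name.split()) for name in DICTIONARY)
TOKEN = re.compile(r"\w+")


def tag(text):
    """Return (start, end, matched text, type, identifier) for each hit."""
    tokens = list(TOKEN.finditer(text))
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so multi-word names win.
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            start, end = tokens[i].start(), tokens[i + n - 1].end()
            candidate = " ".join(t.group().lower() for t in tokens[i:i + n])
            if candidate in DICTIONARY:
                matches.append((start, end, text[start:end]) + DICTIONARY[candidate])
                i += n
                break
        else:
            i += 1  # no dictionary name starts at this word
    return matches


for hit in tag("TP53 is mutated in breast cancer."):
    print(hit)
```

Because every hit carries an identifier as well as a type, the same matching loop can serve all five entity types simply by loading a different dictionary.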

not comprehensive

expansion rules

prefixes and suffixes
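
The slides do not list the actual expansion rules, so the following is only a hypothetical sketch of the idea: generate orthographic variants of a dictionary name, and tolerate a known prefix or suffix around it. The functions `expand` and `strip_affixes` and the affix lists are assumptions, not the tagger's real rules.

```python
import re


def expand(name):
    """Generate orthographic variants of a dictionary name (hypothetical rule)."""
    variants = {name}
    # Vary the separator between letters and digits: "IL-2" -> "IL2", "IL 2".
    m = re.fullmatch(r"([A-Za-z]+)[- ]?(\d+)", name)
    if m:
        letters, digits = m.groups()
        variants |= {letters + sep + digits for sep in ("", "-", " ")}
    return variants


# Hypothetical affixes a matcher might tolerate around a known name.
PREFIXES = ("anti-",)
SUFFIXES = ("s",)


def strip_affixes(token):
    """Yield the token, then versions with one known prefix or suffix removed."""
    yield token
    for p in PREFIXES:
        if token.startswith(p):
            yield token[len(p):]
    for s in SUFFIXES:
        if token.endswith(s) and len(token) > len(s):
            yield token[:-len(s)]


print(sorted(expand("IL-2")))
print(list(strip_affixes("kinases")))
```

Each stripped candidate would then be looked up in the dictionary, which is one way a fixed name list can still catch variants that were never entered explicitly.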

curated blacklist

SDS
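
A blacklist can be sketched as a simple filter applied after dictionary lookup. The example assumes that "SDS" is on the slide as a name that is in the dictionary but too ambiguous to tag (it is also a common abbreviation for a laboratory detergent); the dictionary, the blacklist contents, and the function `tag_tokens` are illustrative assumptions.

```python
# Toy dictionary and blacklist; all entries are illustrative assumptions.
DICTIONARY = {"sds": "gene/protein", "tp53": "gene/protein"}
BLACKLIST = {"sds"}  # names judged too ambiguous to tag


def tag_tokens(tokens):
    """Return (token, type) pairs for dictionary hits that are not blacklisted."""
    hits = []
    for token in tokens:
        key = token.lower()
        if key in DICTIONARY and key not in BLACKLIST:
            hits.append((token, DICTIONARY[key]))
    return hits


print(tag_tokens("Samples were run on SDS gels to detect TP53".split()))
```

Keeping the blacklist separate from the dictionary means the dictionary can be regenerated from its sources without losing the manual curation.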

software

C++ tagger

>1000 abstracts / second

inherently thread-safe

70–80% recall

80–90% precision
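
For readers unfamiliar with the two measures, here is how recall and precision are computed when tagger output is compared with a manually annotated gold standard. The function name and the toy mention sets are assumptions; the numbers are made up and are not the benchmark behind the figures on the slides.

```python
def precision_recall(predicted, gold):
    """Precision = correct / predicted; recall = correct / gold."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy example: mentions are identified by made-up integer IDs.
gold = {2, 3, 4, 5, 6}       # what annotators marked
predicted = {1, 2, 3, 4}     # what the tagger returned
print(precision_recall(predicted, gold))
```

A blacklist mainly raises precision by removing false hits, while expansion rules mainly raise recall by catching more variants.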

open source
bitbucket.org/larsjuhljensen/tagger/

Python module

Docker
hub.docker.com/r/larsjuhljensen/tagger/

web service
tagger.jensenlab.org

Extract
extract.jensenlab.org

community resources

STRING

string-db.org

functional associations

DISEASES

disease–gene associations

Cytoscape

curated knowledge

experimental data

computational predictions

co-occurrence text mining

Medline abstracts
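
Co-occurrence text mining over abstracts can be sketched as counting how many abstracts mention two tagged entities together. This shows raw counts only; the scoring actually used by the resources named on the slides is not described here and is not reproduced. The function name and the toy input are assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(tagged_abstracts):
    """Count in how many abstracts each unordered pair of entities co-occurs."""
    counts = Counter()
    for entities in tagged_abstracts:
        # De-duplicate within an abstract and sort so (a, b) == (b, a).
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return counts


# Toy input: one list of tagged entity names per abstract.
abstracts = [
    ["TP53", "MDM2"],
    ["TP53", "MDM2", "breast cancer"],
    ["TP53"],
]
for pair, n in cooccurrence_counts(abstracts).most_common():
    print(pair, n)
```

Pairs that co-occur more often than their individual frequencies would suggest are candidates for an association, which is why the tagger's output is a natural input to this step.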

only abstracts

<1 km

access restrictions

are abstracts sufficient?

15 million full-text articles

Westergaard et al., BioRxiv, 2017

~50% more associations

electronic health records

Jensen et al., Nature Reviews Genetics, 2012

in Danish

dictionary

drugs

adverse events

in Danish

named entity recognition

temporal correlations

[Figure: timeline marked with drug introduction, drug discontinuation, and identification start. Event labels: adverse event, negative modifier, indication, pre-existing condition, adverse drug reaction, possible adverse drug reaction, ADR of additional drug.]

Eriksson et al., Drug Safety, 2014
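
The temporal reasoning suggested by the figure labels can be sketched as a rule on when an event is mentioned relative to one drug's exposure window. This is a simplified assumption, not the published method: the function `classify_event`, the three categories, and the dates are illustrative, and the sketch ignores negative modifiers and additional drugs.

```python
from datetime import date


def classify_event(event_day, drug_start, drug_stop):
    """Label an event mention by its timing relative to one drug exposure."""
    if event_day < drug_start:
        # Recorded before the drug was given, so the drug cannot be the cause.
        return "pre-existing condition or indication"
    if event_day <= drug_stop:
        return "possible adverse drug reaction"
    return "after discontinuation"


start, stop = date(2010, 3, 1), date(2010, 3, 20)  # made-up exposure window
for day in (date(2010, 2, 25), date(2010, 3, 5), date(2010, 4, 1)):
    print(day, classify_event(day, start, stop))
```

A rule of this kind needs only tagged drug and event mentions with timestamps, which is what the dictionary tagger provides when run on dated clinical notes.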

find novel associations

summary

broadly applicable

keep it simple

free tools

Acknowledgments

Evangelos Pafilis
Sune Pletscher-Frankild
Nadezhda Doncheva
Damian Szklarczyk
Michael Kuhn
Robert Eriksson
Peter Bjødstrup Jensen
John “Scooter” Morris
Christian von Mering
Peer Bork
Christos Arvanitidis
Søren Brunak
