Text mining, machine learning, NLP and all that (in 10 minutes)

text mining, machine learning, NLP and all that (in 10 minutes) Byron C Wallace Brown Center for Evidence Based Medicine #CochraneTech


Post on 17-Dec-2014


DESCRIPTION

Byron C Wallace, from #CochraneTech Symposium, Québec 2013

TRANSCRIPT

Page 1: Text mining, machine learning, NLP and all that (in 10 minutes)

text mining, machine learning, NLP and all that (in 10 minutes)

Byron C Wallace
Brown Center for Evidence Based Medicine

#CochraneTech

Page 2

why do we need this stuff?

[Bastian et al, PLoS Medicine 2010]

Page 3

why do we need this stuff?

[Bastian et al, PLoS Medicine 2010]

eleven systematic reviews. every day.

Page 4

PubMed growth

[http://altmetrics.org/wp-content/uploads/2010/10/medline-articles-by-year-lg.png]

Page 5

what can we automate

Page 6

what can we automate

Page 7

what can we automate?

Page 8

abstracts from PubMed search

doctor conducting review

manually screened abstracts

SVM

how does this work?

Page 9

SVMs
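To make the idea concrete, here is a minimal sketch of a linear SVM trained by subgradient descent on the regularized hinge loss. It is an illustration only, not the implementation used in the work presented here; the function names, the learning rate, the regularization strength, and the toy word-count vectors are all assumptions.

```python
def decision_score(w, b, x):
    """Signed score of x relative to the separating hyperplane."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, epochs=200, lr=0.1, reg=0.01):
    """Fit (w, b) by subgradient descent on the L2-regularized hinge loss.

    X is a list of feature vectors; y holds labels in {+1, -1},
    with +1 meaning "relevant" in this toy setting.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * decision_score(w, b, xi)
            # The regularization term shrinks the weights at every step.
            w = [wj * (1 - lr * reg) for wj in w]
            # Only examples inside the margin (or misclassified) push the weights.
            if margin < 1:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(w, b, x):
    """Return +1 (relevant) or -1 (irrelevant)."""
    return 1 if decision_score(w, b, x) >= 0 else -1


# Hypothetical counts of two words per abstract: [word_a, word_b].
X = [[2, 0], [1, 0], [0, 2], [0, 1]]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
```

On this separable toy data the learned weight for the first word ends up positive and the weight for the second negative, so abstracts dominated by the first word are predicted relevant.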

Page 10

bag of words
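A bag-of-words representation turns each abstract into a vector of word counts and throws away word order. The sketch below shows one simple way to do that; the tokenizer, the helper names, and the two example abstracts are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents):
    """Map each distinct token to a column index, in alphabetical order."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(text, vocabulary):
    """Return a count vector; words outside the vocabulary are ignored."""
    vector = [0] * len(vocabulary)
    for tok, count in Counter(tokenize(text)).items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = count
    return vector


# Hypothetical abstracts, shortened to a few words each.
abstracts = ["Randomized trial of aspirin", "Aspirin trial in mice"]
vocabulary = build_vocabulary(abstracts)
vectors = [vectorize(a, vocabulary) for a in abstracts]
```

Each position in a vector corresponds to one vocabulary word, so a classifier such as an SVM can learn a weight per word.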

Page 11

special considerations for the case of systematic reviews

• class imbalance
  – far fewer relevant than irrelevant abstracts
  – asymmetric costs: sensitivity more important than specificity

• reviewer time is scarce and expensive
  – better models, fewer labels: active learning and dual supervision
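Two of these considerations can be sketched in a few lines. The first function weights the rare relevant class more heavily, using a common inverse-frequency heuristic; the second is uncertainty sampling, one simple form of active learning, which asks the reviewer to label the abstracts the current model is least sure about. Both are illustrative assumptions, not necessarily the strategies used in the work presented here, and the labels and scores are made up.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    return {
        label: n_samples / (len(counts) * count)
        for label, count in counts.items()
    }


def select_most_uncertain(scores, k):
    """Indices of the k abstracts whose classifier score is closest to 0."""
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i]))
    return ranked[:k]


# Hypothetical screening labels: 1 relevant abstract among 10.
labels = [1] + [-1] * 9
weights = inverse_frequency_weights(labels)

# Hypothetical classifier scores for five unlabeled abstracts.
scores = [2.3, -0.1, 0.4, -1.7, 0.05]
to_label_next = select_most_uncertain(scores, k=2)
```

Here the relevant class gets a weight of 5.0 against roughly 0.56 for the irrelevant class, so a missed relevant abstract costs the model about nine times as much as a false alarm, and the reviewer is asked about abstracts 4 and 1 first.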

Page 12

how do we do?

we can achieve 100% sensitivity while substantially reducing workload

“Towards Modernizing the Systematic Review Pipeline: Efficient Updating via Data Mining” Genetics in Medicine 2012
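As a worked example of what these two quantities mean, the sketch below computes sensitivity and workload reduction for a set of classifier scores, after picking the highest threshold that still keeps every known relevant abstract. The scores and labels are invented, the helper names are assumptions, and this is not the evaluation procedure from the cited paper. A threshold chosen this way on labeled data does not guarantee 100% sensitivity on unseen abstracts.

```python
def threshold_for_full_sensitivity(scores, labels):
    """Largest threshold that keeps every relevant item (score >= threshold)."""
    return min(s for s, label in zip(scores, labels) if label == 1)


def sensitivity(labels, predictions):
    """Fraction of truly relevant abstracts that were predicted relevant."""
    true_pos = sum(1 for l, p in zip(labels, predictions) if l == 1 and p == 1)
    false_neg = sum(1 for l, p in zip(labels, predictions) if l == 1 and p == 0)
    return true_pos / (true_pos + false_neg)


def workload_reduction(predictions):
    """Fraction of abstracts predicted irrelevant, so not screened by hand."""
    return predictions.count(0) / len(predictions)


# Hypothetical scores and true labels (1 = relevant, 0 = irrelevant).
scores = [0.9, 0.2, -0.5, -1.2, 0.4, -0.8, -2.0, -0.3]
labels = [1, 0, 0, 0, 1, 0, 0, 1]

threshold = threshold_for_full_sensitivity(scores, labels)
predictions = [1 if s >= threshold else 0 for s in scores]
```

With these numbers the threshold is -0.3, all three relevant abstracts are kept (sensitivity 1.0), and four of the eight abstracts fall below the threshold, a workload reduction of 0.5.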

Page 13

beyond citation screening

Page 14

beyond citation screening