2014 stsi research_meeting_mturk_pdf

Post on 17-Jul-2015

Crowdsourcing for information extraction:

(dynamic assembly of expert “humans”)

Benjamin Good The Scripps Research Institute

@bgood

High level goal: improve access to published knowledge

Articles added to PubMed per year: >100/hour

Thanks to Suzi Lewis from GO for smoothie

Example use: What diseases are treated with curcumin (turmeric)?

Data is in there, just hard to get

Kilicoglu H, Shin D, Fiszman M, Rosemblat G, Rindflesch TC. SemMedDB: a PubMed-scale repository of biomedical semantic predications. Bioinformatics. 2012 Dec 1;28(23):3158-60. doi: 10.1093/bioinformatics/bts591. Epub 2012 Oct 8.

70,364,020 subject-predicate-object relations

NLM tool

24 million abstracts

Example: What diseases are treated with curcumin (turmeric)?

478 results

select * from PREDICATION_AGGREGATE where s_name = 'Curcumin' and predicate = 'TREATS'

Turmeric, the miracle spice!

Example: What diseases are treated with curcumin (turmeric)?

478 results

select * from PREDICATION_AGGREGATE where s_name = 'Curcumin' and predicate = 'TREATS'
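To make the "easy to access" point concrete, here is a minimal sketch of running the same kind of query from Python. It uses an in-memory SQLite table as a stand-in for SemMedDB; the `o_name` and `pmid` columns, the toy rows, and the helper name `treated_by` are assumptions for illustration only, not taken from the slides.

```python
import sqlite3

# Stand-in for SemMedDB: a tiny in-memory table with made-up rows.
# Only the table name, s_name, and predicate come from the slide's query;
# o_name and pmid are assumed column names.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PREDICATION_AGGREGATE "
    "(s_name TEXT, predicate TEXT, o_name TEXT, pmid INTEGER)"
)
toy_rows = [
    ("Curcumin", "TREATS", "Toy disease A", 1),
    ("Curcumin", "TREATS", "Toy disease B", 2),
    ("Curcumin", "INHIBITS", "Toy protein C", 3),
    ("Capsaicin", "TREATS", "Toy disease A", 4),
]
conn.executemany("INSERT INTO PREDICATION_AGGREGATE VALUES (?, ?, ?, ?)", toy_rows)


def treated_by(conn, subject):
    """Return the object names of all TREATS predications for one subject."""
    cur = conn.execute(
        "SELECT o_name FROM PREDICATION_AGGREGATE "
        "WHERE s_name = ? AND predicate = 'TREATS' ORDER BY o_name",
        (subject,),
    )
    return [row[0] for row in cur.fetchall()]


curcumin_targets = treated_by(conn, "Curcumin")
print(curcumin_targets)
```

The query returns whatever the extraction tool stored, so it says nothing about whether each predication is actually correct.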

Data is easy to access, but is it all in there? Is it correct?

More about Curcumin…

?!?! Effect of curcumin on cholesterol gall-stone induction.

Influence of dietary capsaicin and curcumin during experimental induction of cholesterol gallstone in mice.

Spice bioactive compounds, capsaicin and curcumin, were both individually and in combination examined for antilithogenic potential during experimental induction of cholesterol gallstones in mice.

The diet that contained capsaicin, curcumin, or their combination reduced the incidence of cholesterol gallstones by 50%, 66%, and 56%, respectively.

Facts of life in NLP

• False Positives and False Negatives always present

• Human annotators remain the gold standard

• There are not nearly enough professional human annotators to process every document published

Observations

• There are about 2.92 billion Internet users

• Lots of them can read English

• Most of these would not have gotten that causal relation wrong for curcumin…

Source: http://www.statista.com/statistics/273018/number-of-internet-users-worldwide/

Hypothesis

• We can generate the equivalent of professional annotators by incentivizing, guiding, and aggregating the labor of large numbers of non-professionals

Zhai 2013, Aroyo 2013, Burger 2014, Mortensen 2014, Good 2015

Information Extraction

1. Find mentions of high level concepts in text

2. Map mentions to specific terms in ontologies

3. Identify relationships between concepts

Microtask Crowdsourcing

• Distribute discrete units of work (aka “human intelligence tasks” or HITs) to many workers in parallel who are paid to solve them.

Reported 500,000 registered workers in 2011 [1]

[1] Paritosh P, Ipeirotis P, Cooper M, Suri S: The computer is the new sewing machine: benefits and perils of crowdsourcing. WWW '11 2011:325–326.

AMT, how it works

(Diagram: Requester, Tasks, Amazon, Workers)

Requester: for each task, specify:
• a qualification test
• how many workers per task
• how much we will pay per task
• a Web form for completing the task

Interact directly with Amazon system

Amazon manages:
• parallel execution of jobs
• worker access to tasks via qualification tests
• payments
• task advertising

How well can AMT workers, in aggregate, reproduce a gold standard disease mention corpus within the text of PubMed abstracts?

Corpus used for comparison

NCBI Disease corpus
• 793 PubMed abstracts (100 development, 593 training, 100 test)
• 12 expert annotators (2 annotate each abstract)
• 6,900 “disease” mentions

Doğan, Rezarta, and Zhiyong Lu. "An improved corpus of disease mentions in PubMed citations." Proceedings of the 2012 Workshop on Biomedical Natural Language Processing. Association for Computational Linguistics, 2012.

“Disease”

Phrase is a disease IF:
• it can be mapped to a unique UMLS Metathesaurus concept in one of these semantic types
• and it contains information helpful to physicians

Disease mentions

• Specific Disease: “Diastrophic dysplasia”
• Disease Class: “Cancers”
• Composite Mention: “prostatic , skin , and lung cancer”
• Modifier: ..the “familial breast cancer” gene , BRCA2..

Experiment

Identify the disease mentions in 593 abstracts from the NCBI disease corpus

• 6 cents per HIT

• HIT = annotate one abstract from PubMed

• First HIT = survey, next 4 = training, then real

• 10% of the remaining HITs are gold standard tests

• 15 workers annotate each abstract
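The HIT sequence described above (survey, then 4 training HITs, then real work with about 10% gold standard tests mixed in) can be sketched as a small function. This is only an illustration: the name `next_hit_kind` is hypothetical, and choosing gold tests at random with probability 0.10 is an assumption, since the slides do not say how the tests were interleaved.

```python
import random


def next_hit_kind(completed, rng, gold_rate=0.10):
    """Return the kind of HIT a worker sees after `completed` finished HITs.

    HIT 1 is the survey, HITs 2-5 are training, and every later HIT is
    either a hidden gold standard test or a real annotation task.
    Random interleaving at `gold_rate` is an assumption of this sketch.
    """
    if completed == 0:
        return "survey"
    if completed <= 4:
        return "training"
    return "gold_test" if rng.random() < gold_rate else "real"


rng = random.Random(0)
first_five = [next_hit_kind(i, rng) for i in range(5)]
later = [next_hit_kind(i, rng) for i in range(5, 1005)]
gold_fraction = later.count("gold_test") / len(later)
print(first_five)
print(f"gold test fraction after training: {gold_fraction:.2f}")
```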

Instructions

• Task: You will be presented with text from the biomedical literature which we believe may help resolve some important medical questions. The task is to highlight words and phrases in that text which are diseases, disease groups, or symptoms of diseases. This work will help advance research in cancer and many other diseases!

• Highlight all diseases and disease abbreviations
  • “...are associated with Huntington disease ( HD )... HD patients received...”
  • “The Wiskott-Aldrich syndrome ( WAS ) , an X-linked immunodeficiency…”

• Highlight the longest span of text specific to a disease
  • “... contains the insulin-dependent diabetes mellitus locus …” and not just ‘diabetes’.

• Highlight disease conjunctions as single, long spans.
  • “... a significant fraction of familial breast and ovarian cancer patients…”

• Highlight symptoms - physical results of having a disease
  • “XFE progeroid syndrome can cause dwarfism, cachexia, and microcephaly. Patients often display learning disabilities, hearing loss, and visual impairment.”

Qualification task: Q1

Select all and only the terms that should be highlighted for each text segment:

1. “Myotonic dystrophy ( DM ) is associated with a ( CTG ) n trinucleotide repeat expansion in the 3-untranslated region of a protein kinase-encoding gene , DMPK , which maps to chromosome 19q13 . 3 . ”

• Myotonic

• dystrophy

• Myotonic dystrophy

• DM

• CTG

• trinucleotide repeat expansion

• kinase-encoding gene

• DMPK

Qualification task: Q2

2. “Germline mutations in BRCA1 are responsible for most cases of inherited breast and ovarian cancer . However , the function of the BRCA1 protein has remained elusive . As a regulated secretory protein , BRCA1 appears to function by a mechanism not previously described for tumour suppressor gene products.”

• Germline mutations

• BRCA1

• breast

• ovarian cancer

• inherited breast and ovarian cancer

• cancer

• tumour

• tumour suppressor

Qualification task: Q3

3. “We report about Dr . Kniest , who first described the condition in 1952 , and his patient , who , at the age of 50 years is severely handicapped with short stature , restricted joint mobility , and blindness but is mentally alert and leads an active life . This is in accordance with molecular findings in other patients with Kniest dysplasia and…”

• age of 50 years

• severely handicapped

• short

• short stature

• restricted joint mobility

• blindness

• mentally alert

• molecular findings

• Kniest dysplasia

• dysplasia

Qualification task results

• Experiment ran for 9 days
• 346 workers attempted the qualification test
• 145 (42%) passed

(Chart label: Passing threshold)

Worker demographics: gender

(Bar chart: fraction of workers by gender, female and male; axis from 0 to 0.7)

First HIT was a survey

Age

(Bar chart: fraction of workers by age group: 18-21, 21-35, 36-45, 46 and greater; axis from 0 to 0.8)

Occupation

(Bar chart: fraction of workers by self-reported occupation; axis from 0 to 0.25. Categories: Unemployed, Student, Technical, Science, Computer, Business, Education, Programmer, Art, Retired, Labor, Finance, Legal, Attorney, Team Leader, Human Resources, stay at home mom, Biological Sciences, Bussiness [sic], Caretaker, Administrative Assistant, microbiology graduate student, Transportation Industry, sales, Hardware, Homemaker, manufacturing, Chemical Sciences, mom, Web Assessor, Licensed Practical Nurse, customer service rep)

Education

(Bar chart: fraction of workers by education level; axis from 0 to 0.3. Categories: Some high school, Finished high school, Some community college, Finished community college, Some 4-year college, Finished 4-year college, Some masters program, Finished masters program, Some PhD program, Finished PhD program)

Why?

Tagging interface

Click to see instructions

Highlight mentions

Feedback interface:

• Game-like learning signal

• Either see gold standard data or data from other workers

Results: quantity, cost

• 9 days

• 589 abstracts, each annotated by 15 different workers (8,835 tasks completed)

• Overhead cost: survey + 4 training HITs

• total cost: $630.96
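As a worked check, the reported total can be reproduced from the per-HIT price under two assumptions that are not stated on this slide: a 10% Amazon commission on the 6-cent reward (which matches the $0.066 per-task figure used later in the deck), and all 145 qualified workers each completing the 5 overhead HITs (1 survey + 4 training). The sketch below is one decomposition that happens to match, not a confirmed breakdown.

```python
from decimal import Decimal

REWARD_PER_HIT = Decimal("0.06")   # from the slides: 6 cents per HIT
FEE_RATE = Decimal("0.10")         # ASSUMPTION: 10% platform commission
cost_per_hit = REWARD_PER_HIT * (1 + FEE_RATE)

ABSTRACTS = 589                    # from the slides
WORKERS_PER_ABSTRACT = 15          # from the slides
QUALIFIED_WORKERS = 145            # from the qualification results
OVERHEAD_HITS_PER_WORKER = 5       # ASSUMPTION: 1 survey + 4 training, all paid

annotation_hits = ABSTRACTS * WORKERS_PER_ABSTRACT
overhead_hits = QUALIFIED_WORKERS * OVERHEAD_HITS_PER_WORKER

annotation_cost = annotation_hits * cost_per_hit
overhead_cost = overhead_hits * cost_per_hit
total_cost = annotation_cost + overhead_cost

print(f"annotation HITs: {annotation_hits}, cost ${annotation_cost:.2f}")
print(f"overhead HITs:   {overhead_hits}, cost ${overhead_cost:.2f}")
print(f"total:           ${total_cost:.2f}")
```

Under these assumptions the overhead is a small share of the total, and the sum matches the reported $630.96.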

Worker contributions

Worker quality

AMT, how it really works

Requester

Tasks

Amazon

Aggregation function

Workers

http://www.thesheepmarket.com/

Increase precision with voting

Only text highlighted by at least K workers is kept. The slide shows the same passage four times, for K=1 (1 or more votes), K=2, K=3, and K=4; the highlighting that distinguishes the four versions is not preserved in this transcript:

“This molecule inhibits the growth of a broad panel of cancer cell lines, and is particularly efficacious in leukemia cells, including orthotopic leukemia preclinical models as well as in ex vivo acute myeloid leukemia (AML) and chronic lymphocytic leukemia (CLL) patient tumor samples. Thus, inhibition of CDK9 may represent an interesting approach as a cancer therapeutic target especially in hematologic malignancies.”

Aggregation function
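A vote-threshold aggregation function of this kind can be sketched in a few lines. This is a minimal illustration, not the code used in the study: spans are represented as hypothetical `(start, end)` character offsets, and two workers are assumed to agree only when their spans match exactly.

```python
from collections import Counter


def aggregate_spans(worker_spans, k):
    """Keep every span that at least `k` workers highlighted.

    `worker_spans` holds one collection of (start, end) spans per worker.
    Exact-match voting is an assumption of this sketch.
    """
    votes = Counter()
    for spans in worker_spans:
        votes.update(set(spans))  # each worker votes at most once per span
    return {span for span, n in votes.items() if n >= k}


# Four hypothetical workers annotating the same abstract.
worker_spans = [
    {(0, 8), (20, 28)},
    {(0, 8)},
    {(0, 8), (40, 46)},
    {(0, 8), (20, 28)},
]

for k in range(1, 5):
    print(k, sorted(aggregate_spans(worker_spans, k)))
```

Raising K discards spans that only one or two workers marked, which trades recall for precision.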

Results: 589 abstracts compared to gold standard

F = 0.87, k = 6
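For readers unfamiliar with the F measure, here is a minimal sketch of how precision, recall, and F can be computed for a set of predicted spans against a gold standard. The exact-match comparison, the balanced (F1) form, and the toy spans are assumptions; the slides do not state which matching criterion was used.

```python
def precision_recall_f(predicted, gold):
    """Exact-match precision, recall, and balanced F for two span sets."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


# Toy example: 2 of 3 predicted spans are correct, 2 of 4 gold spans are found.
gold = {(0, 8), (20, 28), (40, 46), (60, 70)}
predicted = {(0, 8), (20, 28), (50, 55)}

p, r, f = precision_recall_f(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```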

Inter-Annotator agreement among experts, NCBI Disease corpus

Average level of agreement between expert annotators (stage 1): 0.76

Professionals achieve equivalent agreement (0.87) only after reviewing each other’s annotations.

Doğan, Rezarta, and Zhiyong Lu. "An improved corpus of disease mentions in PubMed citations." Proceedings of the 2012 Workshop on Biomedical Natural Language Processing. Association for Computational Linguistics, 2012.

In aggregate, our worker ensemble is faster, cheaper and more accurate than a single expert annotator for this task

• Experts had consistency (F) with other experts = 0.76.

• Only after viewing each other’s annotations did experts reach 0.87 consistency

• The turker ensemble had consistency with the finalized standard = 0.87 (with access to much less information)

We are not alone

• Mortensen et al (2014). 25 workers, 2¢/task = 1 biomedical ontology expert. “Using the wisdom of the crowds to find critical errors in biomedical ontologies: a study of SNOMED CT”. JAMIA

• Burger et al (2014). 5 workers, 7¢/task = 1 expert curator. “Hybrid curation of gene–mutation relations combining automated extraction and crowdsourcing”. Database.

• Zhai et al (2013). 5 workers, 3¢/task = 1 expert curator. “Web 2.0-Based Crowdsourcing for High-Quality Gold Standard Development in Clinical Natural Language Processing”. J Med Internet Res

• .. more (e.g. IBM research “Crowd Watson” project by Aroyo and Welty)

To do list

• Machine learning experiment on TopCoder

• Citizen Science (volunteer) implementation of this

• New tasks

mturk -> machine learning

• The main purpose of building this particular corpus was to train a disease tagging algorithm.

Next Steps with Disease Corpus

• We have assembled a new 1,000 document corpus

• (took 6 days)

• Simply adding it to the training data didn’t help

• Execute TopCoder contest to produce a better algorithm.

Could we just do them all?

• we peaked at a rate of 500 abstracts processed per day (assuming 5 workers/doc)

• 284 workers contributing in a span of 6 days

• at 1 million/year we would need to get to 2,700/day to do them all

• $0.066*5*1000000 = $330,000
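The scale-up arithmetic above can be checked with a short script. The figures come from the slide; the 365-day year and the reading of $0.066 as the all-in cost per HIT are assumptions of this sketch.

```python
from decimal import Decimal

ABSTRACTS_PER_YEAR = 1_000_000      # from the slide
WORKERS_PER_ABSTRACT = 5            # from the slide
COST_PER_HIT = Decimal("0.066")     # from the slide, taken as all-in cost
PEAK_ABSTRACTS_PER_DAY = 500        # from the slide

# Throughput needed to keep up, assuming work is spread over 365 days.
required_per_day = ABSTRACTS_PER_YEAR / 365
speedup_needed = required_per_day / PEAK_ABSTRACTS_PER_DAY

# One HIT per worker per abstract.
annual_cost = COST_PER_HIT * WORKERS_PER_ABSTRACT * ABSTRACTS_PER_YEAR

print(f"required abstracts/day: {required_per_day:.0f}")
print(f"speed-up over peak rate: {speedup_needed:.1f}x")
print(f"annual cost: ${annual_cost:,.2f}")
```

The required rate comes out at roughly 2,740 abstracts per day, which the slide rounds to 2,700; that is between five and six times the observed peak.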

Moving towards $0/task and many more workers

• mark2cure.org

• A citizen science portal for volunteers to do the same stuff

• first experiment will recapitulate results from AMT

Information Extraction

1. Find mentions of high level concepts in text

2. Map mentions to specific terms in ontologies

3. Identify relationships between concepts

?!?! Effect of curcumin on cholesterol gall-stone induction.

Influence of dietary capsaicin and curcumin during experimental induction of cholesterol gallstone in mice.

Spice bioactive compounds, capsaicin and curcumin, were both individually and in combination examined for antilithogenic potential during experimental induction of cholesterol gallstones in mice.

70,364,020 subject-predicate-object relations

Thanks

Max Nanis, Andrew Su, Ginger Tsueng, Chunlei Wu

Mechanical Turk Workers!

@bgood

Could do well with far fewer workers..