2014 stsi research_meeting_mturk_pdf
Crowdsourcing for information extraction:
(dynamic assembly of expert “humans”)
Benjamin Good The Scripps Research Institute
[email protected] @bgood
High level goal: improve access to published knowledge
[Chart: articles added to PubMed per year; >100/hour]
Thanks to Suzi Lewis from GO for smoothie
Kilicoglu H, Shin D, Fiszman M, Rosemblat G, Rindflesch TC. SemMedDB: a PubMed-scale repository of biomedical semantic predications. Bioinformatics. 2012 Dec 1;28(23):3158-60. doi: 10.1093/bioinformatics/bts591. Epub 2012 Oct 8.
70,364,020 subject-predicate-object relations
NLM tool
24 million abstracts
Example
What diseases are treated with curcumin (turmeric)?
478 results
select * from PREDICATION_AGGREGATE where s_name = 'Curcumin' and predicate = 'TREATS'
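A minimal sketch of how the query above could be run and summarized from Python. It uses a toy in-memory SQLite table, not the real SemMedDB database. The rows are invented placeholders, and the `o_name` column (the object of each predication) is an assumption about the table layout. Grouping by object shows how many predications support each "treated" disease, which is a first step toward spotting weakly supported claims.

```python
import sqlite3

# Toy stand-in for SemMedDB: invented rows, hypothetical o_name column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PREDICATION_AGGREGATE (s_name TEXT, predicate TEXT, o_name TEXT)"
)
rows = [
    ("Curcumin", "TREATS", "Disease A"),
    ("Curcumin", "TREATS", "Disease A"),
    ("Curcumin", "TREATS", "Disease B"),
    ("Curcumin", "CAUSES", "Disease C"),
    ("Capsaicin", "TREATS", "Disease A"),
]
conn.executemany("INSERT INTO PREDICATION_AGGREGATE VALUES (?, ?, ?)", rows)

# Same filter as the slide's query, but counting predications per object.
query = """
    SELECT o_name, COUNT(*) AS n
    FROM PREDICATION_AGGREGATE
    WHERE s_name = ? AND predicate = ?
    GROUP BY o_name
    ORDER BY n DESC, o_name
"""
treated = conn.execute(query, ("Curcumin", "TREATS")).fetchall()
for disease, n in treated:
    print(disease, n)
```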
Data is easy to access, but is it all in there? Is it correct?
?!?! Effect on curcumin on cholesterol gall-stone induction.
Influence of dietary capsaicin and curcumin during experimental induction of cholesterol gallstone in mice.
Spice bioactive compounds, capsaicin and curcumin, were both individually and in combination examined for antilithogenic potential during experimental induction of cholesterol gallstones in mice.
The diet that contained capsaicin, curcumin, or their combination reduced the incidence of cholesterol gallstones by 50%, 66%, and 56%, respectively.
Facts of life in NLP
• False Positives and False Negatives always present
• Human annotators remain the gold standard
• There are not nearly enough professional human annotators to process every document published
Observations
• There are about 2.92 billion Internet users
• Lots of them can read English
• Most of these would not have gotten that causal relation wrong for curcumin…
http://www.statista.com/statistics/273018/number-of-internet-users-worldwide/
Hypothesis
• We can generate the equivalent of professional annotators by incentivizing, guiding, and aggregating the labor of large numbers of non-professionals
Zhai 2013, Aroyo 2013, Burger 2014, Mortensen 2014, Good 2015
Information Extraction
1. Find mentions of high level concepts in text
2. Map mentions to specific terms in ontologies
3. Identify relationships between concepts
Microtask Crowdsourcing
• Distribute discrete units of work (aka “human intelligence tasks” or HITs) to many workers in parallel who are paid to solve them.
Reported 500,000 registered workers in 2011 [1]
[1] Paritosh P, Ipeirotis P, Cooper M, Suri S: The computer is the new sewing machine: benefits and perils of crowdsourcing. WWW '11 2011:325–326.
AMT, how it works
[Diagram: Requester, Tasks, Amazon, Workers]
Requester: for each task, specify:
• a qualification test
• how many workers per task
• how much we will pay per task
• a Web form for completing the task
Interact directly with Amazon system
Amazon manages:
• parallel execution of jobs
• worker access to tasks via qualification tests
• payments
• task advertising
How well can AMT workers, in aggregate, reproduce a gold standard disease mention corpus within the text of PubMed abstracts?
Corpus used for comparison
NCBI Disease corpus
• 793 PubMed abstracts (100 development, 593 training, 100 test)
• 12 expert annotators (2 annotate each abstract)
• 6,900 “disease” mentions
Doğan, Rezarta, and Zhiyong Lu. "An improved corpus of disease mentions in PubMed citations." Proceedings of the 2012 Workshop on Biomedical Natural Language Processing. Association for Computational Linguistics, 2012.
“Disease”
Phrase is a disease IF:
• it can be mapped to a unique UMLS Metathesaurus concept in one of these semantic types
• and it contains information helpful to physicians
Disease mentions
• Specific Disease: “Diastrophic dysplasia”
• Disease Class: “Cancers”
• Composite Mention: “prostatic , skin , and lung cancer”
• Modifier: ..the “familial breast cancer” gene , BRCA2..
Experiment
Identify the disease mentions in 593 abstracts from the NCBI disease corpus
• 6 cents per HIT
• HIT = annotate one abstract from PubMed
• First HIT = survey, next 4 = training, then real tasks
• 10% of the remaining HITs are gold standard tests
• 15 workers annotate each abstract
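The HIT sequence described above can be sketched as follows. This is only an illustration: the slides do not say how gold standard tests were interleaved, so the sketch assumes each HIT after training is independently a gold test with probability 0.10. The function name `hit_kind` and the labels it returns are hypothetical.

```python
import random

def hit_kind(position, rng):
    """Classify the HIT a worker sees at a 0-based position in their sequence."""
    if position == 0:
        return "survey"      # first HIT is the demographic survey
    if position <= 4:
        return "training"    # next 4 HITs are training
    # Assumption: later HITs are gold tests independently with probability 0.10.
    return "gold_test" if rng.random() < 0.10 else "real"

rng = random.Random(0)  # fixed seed so the sketch is repeatable
schedule = [hit_kind(i, rng) for i in range(1005)]
gold_fraction = schedule[5:].count("gold_test") / len(schedule[5:])
print(schedule[:6], round(gold_fraction, 3))
```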
Instructions
• Task: You will be presented with text from the biomedical literature which we believe may help resolve some important medical questions. The task is to highlight words and phrases in that text which are diseases, disease groups, or symptoms of diseases. This work will help advance research in cancer and many other diseases!
• Highlight all diseases and disease abbreviations
  • “...are associated with Huntington disease ( HD )... HD patients received...”
  • “The Wiskott-Aldrich syndrome ( WAS ) , an X-linked immunodeficiency…”
• Highlight the longest span of text specific to a disease
  • “... contains the insulin-dependent diabetes mellitus locus …”
  • and not just ‘diabetes’.
• Highlight disease conjunctions as single, long spans.
  • “... a significant fraction of familial breast and ovarian cancer patients…”
• Highlight symptoms - physical results of having a disease
  • “XFE progeroid syndrome can cause dwarfism, cachexia, and microcephaly. Patients often display learning disabilities, hearing loss, and visual impairment.”
Qualification task: Q1
Select all and only the terms that should be highlighted for each text segment:
1. “Myotonic dystrophy ( DM ) is associated with a ( CTG ) n trinucleotide repeat expansion in the 3-untranslated region of a protein kinase-encoding gene , DMPK , which maps to chromosome 19q13 . 3 . ”
• Myotonic
• dystrophy
• Myotonic dystrophy
• DM
• CTG
• trinucleotide repeat expansion
• kinase-encoding gene
• DMPK
Qualification task: Q2
2. “Germline mutations in BRCA1 are responsible for most cases of inherited breast and ovarian cancer . However , the function of the BRCA1 protein has remained elusive . As a regulated secretory protein , BRCA1 appears to function by a mechanism not previously described for tumour suppressor gene products.”
• Germline mutations
• BRCA1
• breast
• ovarian cancer
• inherited breast and ovarian cancer
• cancer
• tumour
• tumour suppressor
Qualification task: Q3
3. “We report about Dr . Kniest , who first described the condition in 1952 , and his patient , who , at the age of 50 years is severely handicapped with short stature , restricted joint mobility , and blindness but is mentally alert and leads an active life . This is in accordance with molecular findings in other patients with Kniest dysplasia and…”
• age of 50 years
• severely handicapped
• short
• short stature
• restricted joint mobility
• blindness
• mentally alert
• molecular findings
• Kniest dysplasia
• dysplasia
Qualification task results
• Experiment ran for 9 days
• 346 workers attempted the qualification test
• 145 (42%) passed
[Figure label: Passing threshold]
Worker demographics: gender
[Bar chart: fraction of workers by gender (female, male)]
First HIT was a survey
Age
[Bar chart: fraction of workers by age group (18-21, 21-35, 36-45, 46 and greater)]
Occupation
[Bar chart: fraction of workers by self-reported occupation. Categories: Unemployed, Student, Technical, Science, Computer, Business, Education, Programmer, Art, Retired, Labor, Finance, Legal, Attorney, Team Leader, Human Resources, stay at home mom, Biological Sciences, Caretaker, Administrative Assistant, microbiology graduate student, Transportation Industry, sales, Hardware, Homemaker, manufacturing, Chemical Sciences, mom, Web Assessor, Licensed Practical Nurse, customer service rep]
Education
[Bar chart: fraction of workers by education level. Categories: Some high school, Finished high school, Some community college, Finished community college, Some 4-year college, Finished 4-year college, Some masters program, Finished masters program, Some PhD program, Finished PhD program]
Feedback interface:
• Game-like learning signal
• Either see gold standard data or data from other workers
Results: quantity, cost
• 9 days
• 589 abstracts annotated by 15 different workers (8,835 tasks completed)
• 4 hits for training + survey overhead cost
• total cost: $630.96
AMT, how it really works
Requester
Tasks
Amazon
Aggregation function
Workers
http://www.thesheepmarket.com/
Increase precision with voting
[The same passage is shown four times, highlighting only the spans that received at least K worker votes: K=1 (1 or more votes), K=2, K=3, K=4]
This molecule inhibits the growth of a broad panel of cancer cell lines, and is particularly efficacious in leukemia cells, including orthotopic leukemia preclinical models as well as in ex vivo acute myeloid leukemia (AML) and chronic lymphocytic leukemia (CLL) patient tumor samples. Thus, inhibition of CDK9 may represent an interesting approach as a cancer therapeutic target especially in hematologic malignancies.
Aggregation function
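A minimal sketch of a vote-threshold aggregation function of this kind. It assumes each worker's annotation is a list of (start, end) character offsets and that votes are counted on exact span matches. The slides do not specify the matching rule, so both choices are assumptions, and `aggregate_spans` is a hypothetical name.

```python
from collections import Counter

def aggregate_spans(worker_annotations, k):
    """Return the spans highlighted by at least k distinct workers."""
    votes = Counter()
    for spans in worker_annotations:
        # set() ensures each worker votes at most once per span
        votes.update(set(spans))
    return {span for span, n in votes.items() if n >= k}

# Toy annotations from three workers (character offsets are invented).
annotations = [
    [(0, 8), (20, 28)],
    [(0, 8)],
    [(0, 8), (20, 28), (40, 45)],
]
for k in (1, 2, 3):
    print(k, sorted(aggregate_spans(annotations, k)))
```

Raising K keeps only the spans that more workers agree on, which is how voting trades recall for precision.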
Inter-Annotator agreement among experts, NCBI Disease corpus
Doğan, Rezarta, and Zhiyong Lu. "An improved corpus of disease mentions in PubMed citations." Proceedings of the 2012 Workshop on Biomedical Natural Language Processing. Association for Computational Linguistics, 2012.
[Chart: expert agreement, 0.76 rising to 0.87]
Average level of agreement between expert annotators (stage 1): 0.76
Professionals achieve equivalent agreement only after reviewing each other’s annotations.
In aggregate, our worker ensemble is faster, cheaper and more accurate than a single expert annotator for this task
• Experts had consistency (F) with other experts = 0.76.
• Only after viewing each other’s annotations did experts reach 0.87 consistency
• The Turker ensemble had consistency with the finalized standard = 0.87 (with access to much less information)
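The consistency (F) figures above are F-measures. A minimal sketch of how such a score could be computed for one set of spans against a reference set, assuming exact-match comparison of (start, end) spans. The matching criterion actually used is not stated in the slides, and the example spans are invented.

```python
def precision_recall_f(predicted, gold):
    """Exact-match precision, recall and F-measure between two span sets."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f

# Invented example: the crowd finds 2 of the 4 reference spans, with no errors.
crowd = {(0, 8), (20, 28)}
reference = {(0, 8), (20, 28), (40, 45), (60, 70)}
p, r, f = precision_recall_f(crowd, reference)
print(p, r, round(f, 3))
```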
We are not alone
• Mortensen et al (2014). 25 workers, 2¢/task = 1 biomedical ontology expert. “Using the wisdom of the crowds to find critical errors in biomedical ontologies: a study of SNOMED CT”. JAMIA
• Burger et al (2014). 5 workers, 7¢/task = 1 expert curator. “Hybrid curation of gene–mutation relations combining automated extraction and crowdsourcing”. Database.
• Zhai et al (2013). 5 workers, 3¢/task = 1 expert curator. “Web 2.0-Based Crowdsourcing for High-Quality Gold Standard Development in Clinical Natural Language Processing”. J Med Internet Res
• ... more (e.g. IBM Research “Crowd Watson” project by Aroyo and Welty)
To do list
• Machine learning experiment on TopCoder
• Citizen Science (volunteer) implementation of this
• New tasks
mturk -> machine learning
• The main purpose of building this particular corpus was to train a disease tagging algorithm.
Next Steps with Disease Corpus
• We have assembled a new 1,000 document corpus
• (took 6 days)
• Simply adding it to the training data didn’t help
• Execute TopCoder contest to produce a better algorithm.
Could we just do them all?
• We peaked at a rate of 500 abstracts processed per day (assuming 5 workers/doc)
• 284 workers contributed over a span of 6 days
• At 1 million abstracts/year, we would need to reach about 2,700/day to do them all
• $0.066*5*1000000 = $330,000
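The arithmetic behind these estimates, as a short sketch. All input figures come from the bullets above. The variable names are illustrative, and the scale-up factor is a derived number that does not appear on the slide.

```python
COST_PER_TASK = 0.066        # dollars per HIT, as used on the slide
WORKERS_PER_DOC = 5
DOCS_PER_YEAR = 1_000_000
PEAK_DOCS_PER_DAY = 500      # peak observed throughput

# Throughput needed to keep up with one year of new abstracts.
docs_per_day_needed = DOCS_PER_YEAR / 365

# Yearly cost if every abstract is annotated by 5 workers.
annual_cost = COST_PER_TASK * WORKERS_PER_DOC * DOCS_PER_YEAR

# How far the observed peak rate falls short of the required rate.
scale_up = docs_per_day_needed / PEAK_DOCS_PER_DAY

print(round(docs_per_day_needed), round(annual_cost), round(scale_up, 1))
```

The required rate works out to roughly 2,740 abstracts per day, which the slide rounds to 2,700, so throughput would need to grow by a factor of about 5.5 over the observed peak.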
Moving towards $0/task and many more workers
• mark2cure.org
• A citizen science portal for volunteers to do the same stuff
• first experiment will recapitulate results from AMT
Information Extraction
1. Find mentions of high level concepts in text
2. Map mentions to specific terms in ontologies
3. Identify relationships between concepts
?!?! Effect on curcumin on cholesterol gall-stone induction.
Influence of dietary capsaicin and curcumin during experimental induction of cholesterol gallstone in mice.
Spice bioactive compounds, capsaicin and curcumin, were both individually and in combination examined for antilithogenic potential during experimental induction of cholesterol gallstones in mice.
70,364,020 subject-predicate-object relations
Thanks
Max Nanis, Andrew Su, Ginger Tsueng, Chunlei Wu
Mechanical Turk Workers!
@bgood [email protected]