Mark2Cure: a crowdsourcing platform for biomedical literature annotation


DESCRIPTION

Poster about mark2cure.org

TRANSCRIPT

Mark2Cure: a crowdsourcing platform for biomedical literature annotation


Benjamin M Good, Max Nanis, Andrew I Su
The Scripps Research Institute, La Jolla, California, USA

ABSTRACT

Identifying concepts and relationships in biomedical text enables knowledge to be applied in computational analyses, such as gene set enrichment evaluations, that would otherwise be impossible. As such, there is a long and fruitful history of BioNLP projects that apply natural language processing to address this challenge. However, the state of the art in BioNLP still leaves much room for improvement in terms of precision, recall and the complexity of knowledge structures that can be extracted automatically. Expert curators are still vital to the process of knowledge extraction but are in short supply.

Recent studies have shown that workers on microtasking platforms such as Amazon’s Mechanical Turk (AMT) can, in aggregate, generate high-quality annotations of biomedical text. In addition, several recent volunteer-based citizen science projects have demonstrated the public’s strong desire and ability to participate in the scientific process even without any financial incentives. Based on these observations, the Mark2Cure initiative is developing a Web interface for engaging large groups of people in the process of manual literature annotation. The system will support both microtask workers and volunteers. These workers will be directed by scientific leaders from the community to help accomplish ‘quests’ associated with specific knowledge extraction problems. In particular, we are working with patient advocacy groups such as the Chordoma Foundation to identify motivated volunteers and to develop focused knowledge extraction challenges. We are currently evaluating the first prototype of the annotation interface using the AMT platform.

REFERENCES

1. Zhai, Haijun, et al. "Web 2.0-based crowdsourcing for high-quality gold standard development in clinical natural language processing." Journal of Medical Internet Research 15.4 (2013).

2. Doğan, Rezarta Islamaj, and Zhiyong Lu. "An improved corpus of disease mentions in PubMed citations." Proceedings of the 2012 Workshop on Biomedical Natural Language Processing. Association for Computational Linguistics, 2012.

FUNDING

We acknowledge support from the National Institute of General Medical Sciences (GM089820 and GM083924).

CONTACT

Benjamin Good: [email protected]
Andrew Su: [email protected]

Challenge

Goal: structure all knowledge published as text on the same day it appears in PubMed, with expert-human-level precision and recall.

[Figure: number of articles added to PubMed, on a scale from 0 to 1,000,000.]

Approach: Citizen Science

Idea: people are very effective processors of text, even in areas where they aren't experts [1]. Numerous experiments have shown the public's desire to contribute to science. Let's give them an opportunity to help annotate the biomedical literature.

Proof of Concept Experiment with AMT (work in progress)

Can non-experts annotate disease occurrences in text better than machines? To what degree can we reproduce the NCBI disease corpus [2]?

The NCBI disease corpus [2]:
•  6900 disease mentions in 793 PubMed abstracts
•  developed by a team of 12 annotators
•  covers all sentences in each PubMed abstract
•  disease mentions are categorized into Specific Disease, Disease Class, Composite Mention and Modifier categories

We use AMT to test the concept before attempting to motivate a citizen science movement: testing on the 100-abstract "development set", 5 workers per abstract, $0.06 per completed abstract.

Objectives for Annotators (examples)

•  Highlight all diseases and disease abbreviations: "...are associated with Huntington disease ( HD )... HD patients received..."; "The Wiskott-Aldrich syndrome ( WAS ) ..."
•  Highlight the longest span of text specific to a disease: "... contains the insulin-dependent diabetes mellitus locus ...", and not just "diabetes"; "...was initially detected in four of 33 colorectal cancer families..."
•  Highlight disease conjunctions as single, long spans: "...the life expectancy of Duchenne and Becker muscular dystrophy patients..."; "... a significant fraction of familial breast and ovarian cancer , but undergoes..."
•  Highlight symptoms, the physical results of having a disease: "XFE progeroid syndrome can cause dwarfism, cachexia, and microcephaly. Patients often display learning disabilities, hearing loss, and visual impairment."
•  Highlight all occurrences of disease terms: "Women who carry a mutation in the BRCA1 gene have an 80% risk of breast cancer by the age of 70. Individuals who have rare alleles of the VNTR also have an increased risk of breast cancer ( 2-4 )."
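One way to make the objectives above concrete: each highlight can be recorded as a character-offset span within its abstract. The sketch below is illustrative only; the Annotation record and its field names are assumptions, not the actual Mark2Cure data model.

    import collections

    # Illustrative record for one highlighted mention: the PubMed id of
    # the abstract plus the offsets and text of the highlighted span.
    Annotation = collections.namedtuple("Annotation", ["pmid", "start", "end", "text"])

    abstract = "...the life expectancy of Duchenne and Becker muscular dystrophy patients..."

    # Per the objectives, a disease conjunction is one long span, not two short ones.
    span = "Duchenne and Becker muscular dystrophy"
    start = abstract.find(span)
    ann = Annotation(pmid="example", start=start, end=start + len(span), text=span)
    print(ann)  # Annotation(pmid='example', start=26, end=64, text='Duchenne and Becker muscular dystrophy')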

RESULTS, 2 experiments

Exp. 1 results:

[Figure: precision, recall, and F as a function of the number of votes per annotation (1 to 5); all three scores range from 0 to 1.]
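The figure reflects a vote threshold: an annotation is accepted only if at least k of the 5 workers on that abstract marked the same span. Below is a minimal sketch of that aggregation and scoring, assuming exact span matching against the gold standard; the function names and toy data are illustrative, not the actual Mark2Cure pipeline.

    from collections import Counter

    def aggregate(worker_spans, min_votes):
        """Keep spans marked by at least min_votes of the workers."""
        votes = Counter(span for spans in worker_spans for span in spans)
        return {span for span, n in votes.items() if n >= min_votes}

    def precision_recall_f(predicted, gold):
        tp = len(predicted & gold)                      # exact span matches
        p = tp / len(predicted) if predicted else 0.0
        r = tp / len(gold) if gold else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f

    # Toy data: one abstract, two gold spans, five workers' span sets.
    gold = {("pmid1", 10, 25), ("pmid1", 40, 55)}
    workers = [
        {("pmid1", 10, 25), ("pmid1", 40, 55)},
        {("pmid1", 10, 25)},
        {("pmid1", 10, 25), ("pmid1", 60, 70)},
        {("pmid1", 40, 55)},
        {("pmid1", 10, 25)},
    ]
    for k in range(1, 6):  # sweep the minimum-vote threshold, as in the figure
        print(k, precision_recall_f(aggregate(workers, k), gold))

As the threshold k rises, precision tends to rise and recall tends to fall, which is the trade-off the figure reports.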

Exp. 2 changes (worker instructions and interface):
•  Expanded instructions with more examples
•  Minor interface changes (selecting one term automatically selects all other occurrences)

Exp. 2 produced nearly identical results.

Costs:
•  one week each ($30)
•  one month of Turk-specific developer time...

RESULTS, Comparison to concept recognition tools

Consistency(A,B) = 2 * 100 * (N shared annotations) / (N(A) + N(B))

[Figure: consistency with the NCBI gold standard on the development corpus (scale 0-70), comparing mturk experiment 1 (minimum 3 votes per annotation), mturk experiment 2 (minimum 3 votes per annotation), the NCBO annotator (Human Disease Ontology), and the NCBI conditional random field trained on the AZ corpus (only "all" reported).]

AMT workers performed better than the conditional random field trained on the AZ corpus.
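For concreteness, here is a minimal sketch of the consistency score defined above, assuming both annotation sets are represented as sets of exact (pmid, start, end) spans so that "shared annotations" means identical spans; the poster does not specify how partial overlaps are handled. So defined, the score is the Dice coefficient scaled to 0-100.

    def consistency(a, b):
        """2 * 100 * |A shared B| / (|A| + |B|) over two sets of spans."""
        if not a and not b:
            return 100.0
        return 2 * 100 * len(a & b) / (len(a) + len(b))

    a = {("pmid1", 10, 25), ("pmid1", 40, 55), ("pmid1", 60, 70)}
    b = {("pmid1", 10, 25), ("pmid1", 40, 55)}
    print(consistency(a, b))  # 2 * 100 * 2 / (3 + 2) = 80.0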

Next Steps

•  Continued refinement of the annotation interface with AMT
•  Experiment to compare AMT results versus volunteers
•  Collaborations with disease groups such as the Chordoma Foundation to prime the flow of citizen scientist annotators

We are hiring! Looking for postdocs and programmers interested in crowdsourcing and bioinformatics. Contact [email protected].