Tong Shu Li: Bio-Ontologies 2015 presentation
Creating structured biomedical knowledge networks via crowdsourcing
Tong Shu Li
Su Lab, The Scripps Research Institute
Bio-Ontologies SIG, ISMB 2015
2015-07-10
Knowledge networks allow for result interpretation
Bainbridge 2011
Network creation process
Relationship extraction subproblems
Crowdsourcing introduction
• Members of the public perform small tasks for small amounts of money
• Tasks are usually difficult for computers
• Workers contribute as a way of earning supplemental income
• Useful source of labor for academics and companies
Crowdsourcing-driven biocuration
• Goal: replicate work done by PhD biocurators with members of the crowd
• Advantages:
  • Scalability
  • Faster results at a lower cost
  • Well suited for non-automatable tasks where an expert is not necessary
Crowdsourcing relies on gold standards for validation
• Crowdsourcing methods need to be validated with gold standards
• Gold standard: EU-ADR corpus [1]
  • "Positive": known relationship
  • "Speculative": uncertain relationship
  • "Negative": known lack of relationship
  • "False": no claim of relationship
  • Sentence-bound relationships
  • 300 abstracts annotated with relationships between genes, diseases, and drugs
[1] van Mulligen et al. (2012) J. Biomed. Inform. 45: 879
Platform interface for relation annotation
Crowd agreement with the EU-ADR corpus
• Strict agreement with EU-ADR: 71.67% (43/60 sentences)
• Agreement after combining speculative and positive: 76.67%
• 10 judgements per sentence
• 10 cents per judgement
• Time to complete: 2 hours
• Total cost: $182.21 USD
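The strict and merged agreement figures can be illustrated with a small calculation. This is a sketch under stated assumptions, not the study's code: it assumes each sentence's crowd label is the plurality of its judgements, and that "combining speculative and positive" means relabeling speculative as positive in both the crowd votes and the gold label before comparing. The label strings and the names `majority_label` and `agreement` are hypothetical.

```python
from collections import Counter

# Hypothetical label strings for two of the four EU-ADR relation categories.
POSITIVE = "positive"
SPECULATIVE = "speculative"


def majority_label(votes):
    """Return the most common label among one sentence's crowd judgements."""
    # On a tie, Counter.most_common keeps first-seen order (assumed tie-break).
    return Counter(votes).most_common(1)[0][0]


def agreement(crowd_votes, gold, merge_speculative=False):
    """Fraction of sentences whose crowd majority label equals the gold label.

    crowd_votes: one list of judgements per sentence
    gold: gold labels, aligned with crowd_votes
    merge_speculative: if True, treat "speculative" as "positive" in both the
    crowd votes and the gold label (assumed interpretation of the slide).
    """
    def norm(label):
        if merge_speculative and label == SPECULATIVE:
            return POSITIVE
        return label

    matches = 0
    for votes, gold_label in zip(crowd_votes, gold):
        # Relabel before voting so speculative and positive votes pool together.
        crowd_label = majority_label([norm(v) for v in votes])
        if crowd_label == norm(gold_label):
            matches += 1
    return matches / len(gold)


# Hypothetical toy data: 3 sentences, 10 judgements each.
crowd = [
    [POSITIVE] * 6 + [SPECULATIVE] * 4,  # gold: speculative
    ["negative"] * 7 + ["false"] * 3,    # gold: negative
    ["false"] * 6 + [POSITIVE] * 4,      # gold: false
]
gold = [SPECULATIVE, "negative", "false"]
print(agreement(crowd, gold))                          # 0.6666666666666666
print(agreement(crowd, gold, merge_speculative=True))  # 1.0
```

With the slide's counts, 43 matching sentences out of 60 gives 43/60 ≈ 0.7167, the reported 71.67% strict agreement. Merging after voting instead of before would be an equally plausible reading and can give different numbers.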
Variability of gold standards
[Figure; axes: number of experts who chose that relationship type; percent of raw EU-ADR relations]
Crowd agreement as a proxy for clarity
[Figure; axis: percent of crowd that chose the published EU-ADR answer]
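One way to compute the quantity plotted on this slide, sketched under assumptions: for each sentence, take the share of crowd judgements that match the published EU-ADR label, then count sentences in 10% bins. The helper names and the binning scheme are hypothetical. Integer arithmetic is used for the bin index so that boundary cases such as exactly 70% are not shifted by floating-point rounding.

```python
from collections import Counter


def fraction_choosing_gold(votes, gold_label):
    """Share of one sentence's judgements that match the published gold label."""
    return sum(v == gold_label for v in votes) / len(votes)


def clarity_histogram(crowd_votes, gold, n_bins=10):
    """Count sentences per bin of 'percent of crowd that chose the gold answer'.

    Bin b covers [b/n_bins, (b+1)/n_bins); a share of exactly 100% goes in the top bin.
    """
    counts = Counter()
    for votes, gold_label in zip(crowd_votes, gold):
        matching = sum(v == gold_label for v in votes)
        # Integer arithmetic avoids floating-point drift at bin edges.
        b = min(matching * n_bins // len(votes), n_bins - 1)
        counts[b] += 1
    return [counts[b] for b in range(n_bins)]


# Hypothetical toy data: 3 sentences, 10 judgements each, gold label "a".
crowd = [["a"] * 10, ["a"] * 5 + ["b"] * 5, ["b"] * 10]
gold = ["a", "a", "a"]
print([fraction_choosing_gold(v, g) for v, g in zip(crowd, gold)])  # [1.0, 0.5, 0.0]
print(clarity_histogram(crowd, gold))  # [1, 0, 0, 0, 0, 1, 0, 0, 0, 1]
```

Sentences in the low bins are the ones where most workers disagreed with the published answer, which is what makes this share usable as a rough proxy for how clearly a sentence states its relationship.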
Crowd agreement and accuracy probability
[Figure; axes: percent crowd agreement for the top choice; percent of annotations that agreed with EU-ADR]
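The relationship between crowd consensus and correctness can be estimated by grouping sentences by how strongly the crowd agreed on its top choice, then measuring how often that top choice matches EU-ADR within each group. This is a hedged sketch: the 10% bins, the tie-breaking behaviour of `Counter.most_common`, and the function name `accuracy_by_agreement` are assumptions rather than details from the talk.

```python
from collections import Counter, defaultdict


def accuracy_by_agreement(crowd_votes, gold, n_bins=10):
    """Return {bin: accuracy} for sentences grouped by top-choice agreement.

    Agreement is the share of judgements won by the most common label; bin b
    covers [b/n_bins, (b+1)/n_bins), with 100% placed in the top bin. Accuracy
    is the fraction of sentences in the bin whose top label matches gold.
    Only non-empty bins are returned.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for votes, gold_label in zip(crowd_votes, gold):
        top_label, top_count = Counter(votes).most_common(1)[0]
        b = min(top_count * n_bins // len(votes), n_bins - 1)
        totals[b] += 1
        if top_label == gold_label:
            correct[b] += 1
    return {b: correct[b] / totals[b] for b in sorted(totals)}


# Hypothetical toy data: 4 sentences, 10 judgements each, gold label "a".
crowd = [
    ["a"] * 9 + ["b"],                  # 90% agree on "a"
    ["b"] * 9 + ["a"],                  # 90% agree on "b"
    ["a"] * 5 + ["b"] * 3 + ["c"] * 2,  # 50% agree on "a"
    ["c"] * 6 + ["a"] * 4,              # 60% agree on "c"
]
gold = ["a", "a", "a", "a"]
print(accuracy_by_agreement(crowd, gold))  # {5: 1.0, 6: 0.0, 9: 0.5}
```

If higher agreement bins show higher accuracy on real data, crowd agreement can serve as a rough estimate of how likely a crowd answer is to be correct.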
Abstract-level relationship extraction
Preliminary results
• AUC of 0.904
• Max F-score of 0.791 (0.773 precision, 0.809 recall)
• Max F-score achieved at a voting score of 0.407
• 4.5 hours and $54.72 USD to annotate 30 abstracts
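The F-score reported here is the harmonic mean of precision and recall: 2 × 0.773 × 0.809 / (0.773 + 0.809) ≈ 0.791, which matches the slide. The sketch below also shows one way a maximum F-score with its voting-score threshold, and a rank-based AUC, could be computed. It assumes each candidate relation has a voting score (for example, the fraction of workers who marked it as present) and a binary gold label; the function names and toy data are hypothetical, not the study's pipeline.

```python
def f_score(precision, recall):
    """F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores, labels):
    """Return (threshold, precision, recall, F1) for the threshold with maximum F1.

    A candidate relation is predicted present when its voting score >= threshold.
    Every distinct score is tried as a threshold.
    """
    n_pos = sum(labels)
    best = (None, 0.0, 0.0, 0.0)
    for t in sorted(set(scores)):
        predicted = [s >= t for s in scores]
        tp = sum(p and y for p, y in zip(predicted, labels))
        fp = sum(p and not y for p, y in zip(predicted, labels))
        precision = tp / (tp + fp)  # tp + fp >= 1 because score t itself passes
        recall = tp / n_pos if n_pos else 0.0
        f = f_score(precision, recall)
        if f > best[3]:
            best = (t, precision, recall, f)
    return best


def roc_auc(scores, labels):
    """Rank-based AUC: chance a gold-positive outscores a gold-negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(round(f_score(0.773, 0.809), 3))  # 0.791, as reported on the slide

# Hypothetical toy data: voting scores and gold labels for five candidate relations.
scores = [0.9, 0.8, 0.5, 0.3, 0.1]
labels = [True, True, False, True, False]
print(best_threshold(scores, labels))
print(roc_auc(scores, labels))
```

Sweeping every distinct score as a threshold is enough here, because precision and recall only change when the threshold crosses an observed score.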
Conclusion and next steps
• Gold standards are variable and imperfect
• Binary agreement may hide interesting information
• Expert and crowd agreement can be used to measure gold standard consistency
• Ambiguous portions of a gold standard may need to be treated differently during evaluations
• Integration with machine learning methods
  • Data generation
  • Feature extraction
• Semantically typed relationships
Acknowledgements
• Dr. Andrew Su
• Dr. Benjamin Good
• Dr. Laura Furlong
• Dr. Zhiyong Lu
• The Su Lab
• We’re hiring!
EU-ADR relationship examples
• Positive
  • For exposure levels within standard recommended guidelines, radioisotopes are far more likely to play a role in the occurrence of spontaneous abortions than X-rays.
• Speculative
  • Information from the SITE Cohort Study should clarify whether use of these immunosuppressive drugs for ocular inflammation increases the risk of mortality and fatal cancer.
• Negative
  • We found no evidence of impaired control of the carbohydrate and lipid metabolism or aggravation of vascular lesions during the two years an etonogestrel implant was used by diabetic women.
• False
  • The frequency of PONV did not correlate to the amounts of alfentanil, propofol, postoperative antiemetics consumed, or to female gender, non-smoking status, and history of PONV or motion sickness.
Data for all 244 drug-disease sentences
Crowd agreement and accuracy probability
[Figure; axes: percent of annotations that agreed with EU-ADR; percent crowd agreement for the top choice]