Search in Medical Text
Sarvnaz Karimi
National ICT Australia (NICTA), The University of Melbourne
1 / 51
“What makes medical doctors use computers?”
2 / 51
Medicine and Computer Science
Data:                     Users:
biomedical literature     biomedical researchers, medical doctors/students, curators
clinical records          hospital staff, medical doctors
medical social media      drug companies, health authorities
3 / 51
Challenges for Computer Scientists
Data:                     Challenges:
biomedical literature     creation of systematic reviews, experts searching in the literature
clinical records          search in medical records
medical social media      discovery of drug side-effects
4 / 51
1 Systematic Reviews:
A Complex Search Episode for Evidence-Based Policy and Practice
5 / 51
A long-term smoker with chronic obstructive airways disease (COPD) who has recently quit smoking has breathing difficulties. What are the suitable non-drug therapies to improve the patient’s breathing?
(example by Prof. Paul Glasziou)
6 / 51
Is adjunctive vitamin A effective in children diagnosed with non-measles pneumonia?
(Cochrane collaboration)
7 / 51
A clinician applying research to practice needs to know:
What? interventions match the patient’s conditions
What? quality of evidence and applicability
What? duration, dosage, ...
8 / 51
Growth of medical scientific literature archive (MEDLINE)
9 / 51
Evidence-Based Medicine (EBM)
[Figure: evidence pyramid ordered by quality of evidence, highest first. Filtered information: Systematic Reviews, Critically Appraised Topics, Critically Appraised Individual Articles. Unfiltered information: Randomized Controlled Trials (RCTs), Cohort Studies, Case-controlled Studies, Background Information/Expert Opinion.]
EBM applies the best available evidence to clinical decision-making.
10 / 51
A sample systematic review
Title: Vitamin A for non-measles pneumonia in children
Main question: Is adjunctive vitamin A effective in children diagnosed with non-measles pneumonia?
Inclusion criteria: Only parallel-arm, randomized controlled trials (RCTs) and quasi-RCTs, in which children (younger than 15 years of age) with non-measles pneumonia were treated with adjunctive vitamin A, were included...
Methods: We searched The Cochrane Library, Cochrane Central Register of Controlled Trials (CENTRAL 2010, issue 3) which contains the Acute Respiratory Infections Group’s Specialised...
Main results: Six trials involving 1740 children were included. There was no significant reduction in mortality...
11 / 51
Systematic reviewing process
1. Define a clear review question and develop criteria for including studies
2. Search
3. Selecting studies and collecting data
4. Analysing the data and undertaking meta-analysis
5. Presenting the results, interpreting the findings, and drawing conclusions
Result: systematic review
12 / 51
A sample MEDLINE query
1. exp vitamin A/
2. vitamin A.mp
3. retinol.mp
4. exp dietary supplements/
5. or/1-4
6. exp pneumonia/
7. pneumonia$.mp
8. exp pneumonia, bacterial/
9. exp pneumonia, lipid/
10. exp pneumonia, mycoplasma/
...
14. exp pneumonia, viral/
15. exp respiratory tract infections/
16. acute adj respiratory.mp
17. respiratory adj infection.mp
18. respiratory adj disease.mp
19. or/6-18
20. 5 and 19
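The numbered lines in the query build on each other: `or/1-4` takes the union of the results of lines 1 to 4, and `5 and 19` intersects two earlier lines. A minimal Python sketch of that set logic follows. It assumes each search line has already returned a set of document IDs; the IDs and the `combine` helper are hypothetical illustrations, not part of any MEDLINE interface.

```python
import re

def combine(expr, results):
    """Evaluate 'or/1-4', 'and/1-3', or '5 and 19' against earlier line results.

    results maps a query line number to the set of document IDs it retrieved.
    """
    expr = expr.strip().lower()
    # Range form: or/1-4 means line 1 OR line 2 OR line 3 OR line 4.
    m = re.fullmatch(r"(or|and)/(\d+)-(\d+)", expr)
    if m:
        op, lo, hi = m.group(1), int(m.group(2)), int(m.group(3))
        sets = [results[i] for i in range(lo, hi + 1)]
    else:
        # Pairwise form: 5 and 19
        m = re.fullmatch(r"(\d+)\s+(or|and)\s+(\d+)", expr)
        if m is None:
            raise ValueError(f"unsupported expression: {expr!r}")
        op = m.group(2)
        sets = [results[int(m.group(1))], results[int(m.group(3))]]
    if op == "or":
        return set().union(*sets)
    return set.intersection(*sets)

# Hypothetical document IDs retrieved by each search line.
results = {1: {"d1", "d2"}, 2: {"d2", "d3"}, 3: {"d4"}, 4: set()}
results[5] = combine("or/1-4", results)
results[19] = {"d2", "d4", "d9"}  # stands in for the output of or/6-18
results[20] = combine("5 and 19", results)
print(sorted(results[20]))
```

The final line keeps only documents that match both the intervention block (line 5) and the condition block (line 19), which is why a single missing synonym in either block can silently drop relevant studies.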
13 / 51
Scale of evidence inclusion
Boolean query output, title & abstract (4,000-10,000)
Documents to be read in full text (500-2,000)
To be actually included in the review (10-100)
14 / 51
Where can we help?
Our contributions on introducing ranked retrieval are published in:
* S. Karimi, S. Pohl, F. Scholer, L. Cavedon, J. Zobel, Boolean versus Ranked Querying for Biomedical Systematic Reviews, BMC Medical Informatics and Decision Making, Vol 10, Number 58, 2010
* D. Martinez, S. Karimi, L. Cavedon, T. Baldwin, Facilitating Biomedical Systematic Reviews Using Ranked Text Retrieval and Classification, ADCS 2008, December 2008
15 / 51
To assist in query formulation for an initial search strategy
Suggesting key terms and synonyms, e.g. neoplasm for cancer
Bag-of-words to Boolean: suggesting structure for the specified query terms. Template queries already exist for limited inclusion criteria.
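One way to picture the two suggestions together is a helper that expands each term with synonyms and then joins the groups into a Boolean query. The sketch below is only an illustration: the `SYNONYMS` table and the `to_boolean` function are hypothetical, and a real system would presumably draw synonyms from a curated thesaurus rather than a hand-written dictionary.

```python
# Hypothetical synonym table used only for this illustration.
SYNONYMS = {
    "cancer": ["neoplasm"],
    "vitamin a": ["retinol"],
}

def to_boolean(terms):
    """Turn a bag of query terms into an AND of OR-groups of synonyms."""
    groups = []
    for term in terms:
        variants = [term] + SYNONYMS.get(term.lower(), [])
        # Quote multi-word phrases so they stay together in the query string.
        quoted = [f'"{v}"' if " " in v else v for v in variants]
        group = " OR ".join(quoted)
        groups.append(f"({group})" if len(quoted) > 1 else group)
    return " AND ".join(groups)

print(to_boolean(["cancer", "children"]))
```

Synonyms are joined with OR inside a group and the groups are joined with AND, mirroring the structure of the sample MEDLINE query shown earlier in the talk.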
16 / 51
Consistency verification
Automatic verification against inclusion criteria
Automatic self-consistency verification: If a reviewer selects one document, but later chooses to ignore a similar one, the system should flag this possible inconsistency.
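A self-consistency check of this kind could be sketched as a pairwise comparison of screening decisions. The example below assumes word-overlap (Jaccard) similarity and an arbitrary threshold of 0.6; both are illustrative choices, and the document texts and IDs are made up.

```python
def jaccard(a, b):
    """Word-overlap similarity between two texts, from 0.0 to 1.0."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def flag_inconsistent(decisions, threshold=0.6):
    """Return pairs of document IDs judged differently despite similar text.

    decisions maps a document ID to (abstract text, included?).
    """
    flagged = []
    ids = sorted(decisions)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            text_a, inc_a = decisions[a]
            text_b, inc_b = decisions[b]
            if inc_a != inc_b and jaccard(text_a, text_b) >= threshold:
                flagged.append((a, b))
    return flagged

# Hypothetical screening decisions.
decisions = {
    "doc1": ("vitamin a for pneumonia in children", True),
    "doc2": ("vitamin a for severe pneumonia in children", False),
    "doc3": ("gait analysis after ankle injuries", False),
}
print(flag_inconsistent(decisions))
```

Only the pair with near-identical text and opposite decisions is reported, so the reviewer is asked to look again rather than being overruled.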
17 / 51
Dynamic relevance feedback
Document selection process is currently paper-based.
A dynamic relevance feedback approach that is active during the document selection process could rank the remaining documents based on estimated importance.
Dynamic relevance feedback might identify additional documents that exist in the collection but were missed by the initial search strategy.
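To make the idea concrete, here is a small term-voting sketch in the spirit of Rocchio-style feedback. It is not the method from the cited papers: terms from included documents vote a candidate up, terms from excluded documents vote it down, and all texts and IDs are invented for the example.

```python
from collections import Counter

def tokens(text):
    return text.lower().split()

def rerank(remaining, included, excluded):
    """Order unscreened documents by similarity to the decisions made so far."""
    weights = Counter()
    for text in included:
        weights.update(set(tokens(text)))      # +1 per included doc containing the term
    for text in excluded:
        weights.subtract(set(tokens(text)))    # -1 per excluded doc containing the term

    def score(doc_id):
        return sum(weights[t] for t in set(tokens(remaining[doc_id])))

    # Highest score first; ties are broken by document ID for a stable order.
    return sorted(remaining, key=lambda d: (-score(d), d))

# Hypothetical unscreened documents and reviewer decisions.
remaining = {
    "d7": "retinol supplementation in childhood pneumonia",
    "d8": "ankle injuries and gait analysis",
    "d9": "vitamin a and pneumonia mortality",
}
included = ["vitamin a for pneumonia in children"]
excluded = ["gait analysis in adults"]
print(rerank(remaining, included, excluded))
```

Re-running `rerank` after every decision is what would make the feedback dynamic: the ordering of the unread pile shifts as the reviewer works through it.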
18 / 51
Analysis and Meta-analysis
There are tools that assist in analysing already extracted numerical data from one or multiple studies, but the input to these tools must first be extracted manually from text. Automatic information extraction can save hours.
19 / 51
Review update
Updating the review with new evidence so that it remains relevant.
Year 2005: Treatment X works.
Year 2010: Treatment Y is preferred over X.
20 / 51
Literature survey is hard!
21 / 51
2 User-Study:
Medical Expert’s Search Behavior
22 / 51
Subject: Library needs your help
Volunteers needed
Study: Improving Tools for Searching Medical Literature (Alfred Health Ethics Committee approved)
Dear All, I am writing to you as a participant in a Library training class at the Ian Potter Library in 2010... probably realise that systems for online searching are often complex and not that easy to use... The Ian Potter Library is participating in a study together with NICTA (University of Melbourne) and RMIT, looking at ways of improving search tools for medical literature (see attached). We need volunteers..
Volunteers required - Improving tools for searching medical literature
This study aims to improve quality of search results in the biomedical domain. The research team needs participants with (bio)medical background, especially medical students/researchers, to carry out search tasks using search tools. The session will take about 40 minutes. All participants receive movie vouchers. Alfred Hospital HREC number 22/10. Further information...
23 / 51
Why user study?
What should a biomedical search engine look like?
What are the needs of specific users of biomedical search tools?
• Users’ behaviour, searching and querying style, ...
Which one of our proposed systems is more effective?
24 / 51
Subjects
Experts: educational background in biomedical sciences and related domains.
Non-experts: absolutely no education or working experience in biomedical domains.
We recruited 46 experts, of whom 2 were assigned to a pilot study and 6 did not finish the tasks. We also recruited 9 non-experts.
25 / 51
User study format
Subjects were asked to imagine that they should write a short report about each given topic. Their goal was to carry out searches to find useful articles that they would want to read in order to prepare their report.
Each subject was asked to complete the following:
1. Opening questionnaire
2. Search phase, consisting of six tasks. For each task:
   • Pre-task questionnaire to establish prior familiarity with topic
   • Search for useful documents
   • Post-task questionnaire about search experience
3. Closing questionnaire
26 / 51
Tasks assigned to the subjects
1 exercise therapy for cystic fibrosis
2 families and grief in the ICU
3 cognitive behaviour therapy for postnatal depression
4 vitamin D and dementia
5 ankle injuries and gait analysis
6 prevention of type 2 diabetes in developing countries
These topics were previously referred to health librarians in the Ian Potter Library of Alfred Hospital in Melbourne, by either students or staff.
27 / 51
Search systems and interfaces
System A: A Boolean retrieval system similar to PubMed. Results were ordered by date. Very complicated multi-line Boolean querying was supported.
System B: A combination of ranked and Boolean system. Both ranked and Boolean querying were supported. If a query was Boolean, the output was ranked based on the keywords.
System C: Topic modelling based system. The output of each query was topic modelled (LDA) and then ranked under each topic.
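System B's behaviour for Boolean queries, filtering first and then ranking the output by the query keywords, can be sketched as follows. The slides do not say which ranking function was used, so raw keyword frequency is an assumption made here purely for illustration, as are the sample documents.

```python
def boolean_then_rank(docs, required_terms):
    """Keep documents containing every required term, then rank by term frequency."""
    terms = [t.lower() for t in required_terms]
    scored = []
    for doc_id, text in docs.items():
        words = text.lower().split()
        counts = [words.count(t) for t in terms]
        if all(counts):  # Boolean AND: every term must occur at least once
            scored.append((sum(counts), doc_id))
    # Higher keyword count first; document ID breaks ties.
    return [doc_id for score, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]

# Hypothetical documents for task 4 (vitamin D and dementia).
docs = {
    "a": "vitamin d and dementia risk",
    "b": "dementia care and dementia outcomes with vitamin d",
    "c": "vitamin d and bone health",
}
print(boolean_then_rank(docs, ["vitamin", "dementia"]))
```

The Boolean filter decides what is in the result set, exactly as in System A; only the order differs, with keyword-heavy documents shown before others instead of the newest first.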
28 / 51
Preferred system and difficulty of using the systems
Only a slight difference between A and B (not-ranked and ranked), but C (topic-modelled) was significantly less liked.
Between A and B, the ranked results of system B were slightly but significantly better liked.
System C (topic modelling) was rated hardest to work with.
29 / 51
Topic Familiarity and its effect on querying
               Tasks  Queries     Ranked     Boolean    Complex   Total query
                      entered     queries    queries    Boolean   terms
Not familiar   147    438 (3.0)   154 (1.0)  271 (1.8)  13 (0.1)  1840 (13.2)
Familiar       71     204 (3.0)   51 (0.7)   148 (2.1)  5 (0.1)   960 (15.9)
Very familiar  10     14 (1.4)    5 (0.5)    9 (0.9)    0 (0.0)   92 (9.2)
p-value               0.0172      0.0184     0.0334     0.6511    0.001
* The table shows the sum for each category, with the mean indicated in parentheses.
The number of queries entered varied with the subjects' level of topic familiarity.
More queries were entered for topics that subjects were not familiar with.
The number of ranked or Boolean queries employed by searchers varied significantly with the level of familiarity.
For very familiar topics, users employ fewer query terms.
30 / 51
Familiarity, visited result pages, and documents selected as relevant
               Tasks  Result pages  Items
                      viewed        saved
Not familiar   147    494 (3.4)     999 (6.8)
Familiar       71     253 (3.6)     425 (6.0)
Very familiar  10     28 (2.8)      57 (5.7)
p-value               0.2801        0.0535
No significant relationship was found between prior familiarity and the number of result pages viewed.
The number of items saved (relevant) did not vary significantly with topic familiarity.
31 / 51
Familiarity based on the pre-task questionnaire and difficulty based on the post-task questionnaire
                    Difficulty
              Easy  Medium  Hard  Total
Not familiar  78    44      25    147
Familiar      44    21      16    81
Total         122   65      41
p-value = 0.7678
No relation was confirmed between familiarity and perceived difficulty of working with the systems. In other words, being familiar did NOT make the task easier or harder.
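The slide does not name the statistical test behind this p-value. Assuming it was a Pearson chi-square test of independence on the 2x3 table of counts (78, 44, 25 and 44, 21, 16), the calculation can be sketched with the standard library alone, because with 2 degrees of freedom the chi-square survival function reduces to exp(-x/2). The "Familiar" row appears to pool the familiar and very familiar tasks (71 + 10 = 81), which is also an inference rather than something the slide states.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

# Rows: not familiar, familiar. Columns: easy, medium, hard.
table = [[78, 44, 25], [44, 21, 16]]
chi2, dof = chi_square_independence(table)
# Valid only because dof == 2: the survival function is then exp(-chi2 / 2).
p_value = math.exp(-chi2 / 2)
print(dof, round(chi2, 3), round(p_value, 4))
```

Under these assumptions the computed p-value lands very close to the reported 0.7678, which supports reading the result as "no evidence of an association" rather than as proof that familiarity has no effect.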
32 / 51
3 Drug Side-Effects:
What Do Patient Forums Reveal?
33 / 51
Drug side-effect
A drug side-effect is an effect (positive or negative) that is secondary to the one intended.
Some side-effects are severe, such as organ failure, high blood sugar, stroke, heart disease, and neuropathy, and some are mild, such as nausea and dizziness.
Adverse side-effects that are unknown claim many lives each year.
34 / 51
Side-effect discovery
[Figure: side-effect discovery through clinical trials with volunteers.]
35 / 51
Post-marketing Surveillance
Clinical trials are expensive, sometimes out-dated, time consuming, and often small-scale.
Professionals and drug users can report side-effects, mostly severe ones, on official websites.
Patient social networks and forums, such as DailyStrength and AskaPatient, collect feedback directly from drug consumers.
Data in such forums may be of questionable reliability, but it provides indications of real side-effects, both mild and severe.
36 / 51
A new era in side-effect discovery
[Figure: the clinical-trial process (volunteers, clinical trials) extended with a feedback and update loop.]
37 / 51
Trade-off in using data from social media
Advantages:
large amount of data
data generated by a large variety of people who share information through personal blogs and public forums.
Disadvantages: (Medical social data is difficult to access and process)
data is scattered over multiple sources.
availability of useful resources is limited (ownership).
data often contains noise (informal language, or misspelled specialised terms), so traditional methods for pre-processing such as POS tagging, chunking, and sentence segmentation may not work well.
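One common way to soften the misspelling problem before any tagging or extraction is to map noisy words onto a known vocabulary. The sketch below uses `difflib.get_close_matches` from the Python standard library; the tiny `LEXICON` and the 0.8 cutoff are assumptions chosen for the example, not values from the study.

```python
import difflib

# Hypothetical lexicon of side-effect terms; a real one would be far larger.
LEXICON = ["nausea", "dizziness", "vomiting", "drowsiness", "neuropathy"]

def normalise(word, cutoff=0.8):
    """Map a possibly misspelled word to its closest lexicon entry, if any."""
    matches = difflib.get_close_matches(word.lower(), LEXICON, n=1, cutoff=cutoff)
    return matches[0] if matches else word

print([normalise(w) for w in ["nausia", "dizzyness", "rash"]])
```

Words with no sufficiently close lexicon entry are returned unchanged, so ordinary vocabulary is left alone; a cutoff that is too low would start rewriting correct words into unrelated terms.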
38 / 51
What you may see in a medical forum
User A, Post 1: Side effects from MedicineX therapy?
  . . . Since taking MedicineX for about 3 years, some time in the last year or so I began to experience significant ringing in the ears. . . .
User B, Post 2: Re: Side effects from MedicineX therapy?
  I haven never heard about it. But I had nausea, vomiting and fever.
User C, Post 3: Re: Side effects from MedicineX therapy?
  It is not true at all. MedicineX is one of medicines which have least side-effects. In fact, my heart related symptoms became better.
User D, Post 4: Re:Re: Side effects from MedicineX therapy?
  I didn’t have nausea or vomiting but had a skin rash for a few days.
User E, Post 5: Warning!!! BLOOD CLOTHS IN MY LUNG!!!
  After using MedicineX for 3.5 years, my doctor found a blood cloths in my lung . . .
User A, Post 6: Thank you
  thx. My doctor told me my ear ringing was not MedicineX but . . .
39 / 51
Everybody’s different
All previous studies are focused on extracting mentions of adverse effects and mostly ignore the contributing factors that are patient-dependent.
We are interested in extracting both adverse and beneficial side-effects along with background information on the patients that could contribute to their positive or negative experience.
This is particularly interesting because clinical trials do not cover all possible patient conditions.
40 / 51
Entities to be extracted
Entity                        Example
Disease                       “After 3 years of having Ativan keep the anxiety in check, ...”
Symptom                       “My heart was racing and ..”
Drug                          “I must be addicted to Xanax”
Duration                      “Began taking 5 mg daily(broke the 10mg pill in half) for 4 weeks”
Dosage                        “Began taking 5 mg daily ...”
Frequency                     “Began taking 5 mg daily ..”
Positive side-effect          “I’m taking this for my back pain but it has been reducing my stress as well.”
Negative side-effect          “Sometimes causes drowsiness.”
Lack of negative side-effect  “I feel dizzy and low but no vomiting.”
Lack of positive side-effect  “I was feeling even more energetic initially but it doesnt work like that any more”
Positive outcome              “No apparent side effects thus far and results have been very effective for the pain.”
Negative outcome              “Problem is you build up a tolerance and eventually the drug quits working as has been my case.”
Gender of patient             “I was prescribed this for anxiety when my teenage daughter was driving my wife and I into”
Age                           “I’m in my forties”
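For the more regular entity types (dosage, frequency, duration), a rule-based baseline is easy to picture. The regular expressions below are a rough sketch written for this example, not the extraction method used in the project; they cover only a handful of units and frequency words.

```python
import re

DOSAGE = re.compile(r"\b\d+(?:\.\d+)?\s*(?:mg|mcg|ml|g)\b", re.IGNORECASE)
FREQUENCY = re.compile(r"\b(?:daily|weekly|monthly|twice a day|once a day)\b", re.IGNORECASE)
DURATION = re.compile(r"\bfor\s+(\d+(?:\.\d+)?\s+(?:day|week|month|year)s?)\b", re.IGNORECASE)

def extract(sentence):
    """Pull dosage, frequency, and duration mentions out of one sentence."""
    return {
        "dosage": DOSAGE.findall(sentence),
        "frequency": FREQUENCY.findall(sentence),
        "duration": DURATION.findall(sentence),
    }

post = "Began taking 5 mg daily (broke the 10mg pill in half) for 4 weeks"
print(extract(post))
```

Patterns like these handle well-formed mentions but say nothing about entities such as positive or negative side-effects, which depend on context and are the reason an annotated collection is needed.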
41 / 51
Relations to be extracted
Relation          Description
Drug-Drug         If a patient explicitly mentions that taking two named drugs together had any effect or no effect, then the two drugs are annotated by a positive, negative, or no effect relation.
                  Example: “[MedicineA] ... was fine till I started taking [MedicineB] as well...”
Dosage-Frequency  The frequency at which a dosage is taken is annotated by a for relation.
Dosage-Duration   The length of intake for a specific dosage is annotated with a for relation.
Drug-Dosage       The dosage at which a drug is taken is annotated with a taken relation.
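The relation scheme can be read as a small typed data structure: each relation links two entities, and only certain labels are valid for a given pair of entity types. The classes below are a hypothetical representation written for illustration; the slides do not describe the annotation tool or its storage format.

```python
from dataclasses import dataclass

# Allowed labels per (source type, target type), following the relations listed above.
ALLOWED = {
    ("Drug", "Drug"): {"positive", "negative", "no effect"},
    ("Dosage", "Frequency"): {"for"},
    ("Dosage", "Duration"): {"for"},
    ("Drug", "Dosage"): {"taken"},
}

@dataclass(frozen=True)
class Entity:
    kind: str  # e.g. "Drug", "Dosage"
    text: str  # the span as written in the post

@dataclass(frozen=True)
class Relation:
    source: Entity
    target: Entity
    label: str

    def __post_init__(self):
        # Reject labels that the scheme does not define for this pair of types.
        labels = ALLOWED.get((self.source.kind, self.target.kind), set())
        if self.label not in labels:
            raise ValueError(
                f"{self.label!r} not allowed between "
                f"{self.source.kind} and {self.target.kind}"
            )

drug = Entity("Drug", "MedicineX")
dose = Entity("Dosage", "5 mg")
print(Relation(drug, dose, "taken"))
```

Validating labels at construction time catches annotation slips, such as a taken relation between two drugs, before they reach any downstream experiment.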
42 / 51
Data
We gathered data for ten different drugs from two different forums: AskaPatient [2] and eHealth Forum [3].
A total of 5,996 posts (40,871 sentences) was collected.
We only relied on free-text comments in each post.
Annotation by two annotators is ongoing.
[2] http://www.askapatient.com/
[3] http://ehealthforum.com/
43 / 51
A Survey: What do people think about medicine and social media?
44 / 51
Who participated?
         # Participants  Gender  Age Range     Education
Group A  83              61% M   2% under 21   57% G
                         39% F   83% 21-39     35% B
                                 15% above 40  8% under
Group B  379             42% M   7% under 21   20% G
                         57% F   69% 21-39     7% B
                                 24% above 40  73% under
All      462
Group A: survey posted on Facebook, e-health forum, and Yahoo health forum
Group B: Amazon Mechanical Turkers
M: male, F: female; B: bachelor degree, G: graduate degree
45 / 51
How healthy were our participants? Do they trust their doctors?
         # Participants  Healthy  Trust Doctors
Group A  83              45% V    56% V
                         43% M    34% M
                         12% not  10% little/none
Group B  379             53% V    68% V
                         41% M    26% M
                         6% not   12% little/none
Group A: survey posted on Facebook, e-health forum, and Yahoo health forum
Group B: Amazon Mechanical Turkers
M: Moderately, V: Very
46 / 51
Do people use medical social networks, forums, blogs, or medical information on the Internet? Do people share their experiences with drug side effects?
         Generic Social  Medical Social  Int. search  Trust Int.  Share
Group A  83% M to E      24% yes         48% M to E   38% well    4% M to E
         13% S           76% no          47% S        51% little  17% S
         4% N                            5% N         11% none    79% N
Group B  79% M to E      21% yes         56% M to E   50% well    31% M to E
         14% S           79% no          39% S        38% little  30% S
         7% N                            5% N         12% none    54% N
Group A: survey posted on Facebook, e-health forum, and Yahoo health forum
Group B: Amazon Mechanical Turkers
N: never, S: sometimes, M: moderately often, E: extremely often
Not so healthy people share more than very healthy people (53% vs 38%).
47 / 51
What’s next
We propose finding patterns of side-effect reporting, using both heuristics and automatically extracted rules. The outcome can be used in enriching side-effect ontologies.
One of our contributions will be providing the research community with a rich annotated collection that is large enough for experimentation and diverse in the types of drugs and annotated concepts.
The existing literature does not provide a comparison of previous approaches, mainly due to the lack of a standard and publicly accessible dataset. We intend to conduct a comprehensive comparison of existing methods as well as our own techniques.
48 / 51
Summary
There are many areas in medicine and health which can benefit from more effective search in text:
Techniques used for extensive search in biomedical literature for answering focused clinical questions (systematic reviewing) are still way behind the state-of-the-art search technology.
Domain experts search differently from laymen, and biomedical search engines should accommodate these differences.
Analysing medical social media is one method of capturing previously undiscovered drug side-effects.
49 / 51
Expert Subjects (Opening questionnaire)
Category          Response               Number of Subjects
Gender            female                 27 (71%)
                  male                   11 (29%)
Position          allied health          12 (32%)
                  biomedical researcher  11 (29%)
                  medical student        9 (24%)
                  health librarian       3 (8%)
                  nurse                  1 (3%)
Search tool used  PubMed                 33 (87%)
                  Google Scholar         31 (82%)
                  Ovid                   22 (58%)
                  EBSCO                  15 (40%)
                  Other                  7 (18%)
Satisfaction      very satisfied         3 (8%)
                  satisfied              30 (79%)
                  borderline             5 (13%)
                  unsatisfied            0 (0%)
50 / 51
Expert Subjects (Cont.)
Category           Response                     Number of Subjects
Search tool usage  daily                        7 (18%)
                   weekly                       14 (37%)
                   monthly                      13 (34%)
                   rarely                       4 (10%)
Database used      Medline                      30 (79%)
                   Journals@Ovid Full Text      17 (45%)
                   CINAHL                       15 (35%)
                   Cochrane Systematic Reviews  12 (32%)
                   PsycINFO                     11 (29%)
                   EMBASE                       6 (16%)
                   AMED                         5 (13%)
                   Other                        12 (32%)
51 / 51