Rapid information retrieval by creating a parallel implementation
of Medline
Bob Badgett
Dept of Medicine
UTHSC San Antonio
1/2006
As Mark Twain reportedly put it, "Be careful about reading health books; you may die of a misprint"
Johnson T. NEJM 1998
Only errors that led to proximate adverse event
Discharges have 12% adverse events
The most common diagnosis in primary care is…
• Questions occur in 1/3 of visits
  – We pursue answers to 55% of their questions
  – Find answers to 70% (with difficulty in 40%)
  – Result is only 40% of their questions being answered (guessing in 60%)
• The “diagnosis of information failure” occurs in about 20% of patients
  – Twice as common as the most frequent single primary care diagnosis
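The percentages above chain together. A minimal sketch of the arithmetic, assuming the 70% success rate applies only to the questions that are pursued and that every question left unanswered counts as an information failure (both are readings of the slide, not stated on it):

```python
# Figures quoted on the slide.
QUESTION_RATE = 1 / 3   # visits that raise a question
PURSUED = 0.55          # questions for which an answer is pursued
FOUND = 0.70            # pursued questions that get answered


def answered_fraction(pursued=PURSUED, found=FOUND):
    """Share of all questions that end up answered (assumes the rates multiply)."""
    return pursued * found


def failure_rate_per_visit(question_rate=QUESTION_RATE):
    """Share of visits with a question that goes unanswered."""
    return question_rate * (1 - answered_fraction())


if __name__ == "__main__":
    print(f"answered: {answered_fraction():.1%}")            # about 40%
    print(f"information failure: {failure_rate_per_visit():.1%}")  # about 20%
```

Under these assumptions, 0.55 × 0.70 gives roughly 40% answered, and one third of the remaining 60% gives roughly 20% of visits.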
MEDLINE searching is misery when in a hurry
  – 30 minutes to search
  – 50% of clinical searches by experts fail
  – Compared to librarians, clinicians find
    • 50% fewer relevant articles
    • 50% more irrelevant articles
Doctors have two minutes available
Current search engine
• http://SUMSearch.uthscsa.edu
  – Live searching of MEDLINE
  – Iterative searching
  – 400 - 500 queries per day
  – Internationally recognized
    • Review: equivalent to PubMed
  – Basis of current grant proposals
    • NLM in collaboration with American College of Physicians, Thomson-MicroMedex, others
Current method
• http://sumsearch.uthscsa.edu
  – Externally searches MEDLINE via PubMed
  – PubMed’s publicly stated limit is one search every 8 seconds
  – We do ~6 per query
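To see why the stated limit matters, here is a minimal sketch of a client that spaces searches 8 seconds apart. The `Throttle` class and `seconds_to_issue_all` helper are hypothetical names for illustration; this is not SUMSearch's actual code.

```python
import time

PUBMED_MIN_INTERVAL_S = 8.0  # publicly stated limit quoted on the slide
SEARCHES_PER_QUERY = 6       # approximate figure from the slide


def seconds_to_issue_all(searches=SEARCHES_PER_QUERY, interval=PUBMED_MIN_INTERVAL_S):
    """Delay before the last search can start if searches are spaced by `interval`."""
    return max(searches - 1, 0) * interval


class Throttle:
    """Hypothetical helper: blocks so calls are at least `min_interval` apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Sleep if needed, then return the time at which the call may proceed."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        return now


if __name__ == "__main__":
    print(seconds_to_issue_all())  # 40.0
```

With 6 searches per query, strictly honoring the limit means the last search cannot even start until 40 seconds after the first, far beyond the two minutes a doctor has once reading time is counted.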
Users of proposal
• Department of Medicine, UTHSC San Antonio
  – Bob Badgett
• School of Health Information Sciences, UTHSC Houston
  – Elmer Bernstam
Knowledge management – 1. Vastness
[Figure: growth by year. Axes: Year; Articles not letters (0 to 12,000,000); Pages (0 to 3,000). Series: MEDLINE articles not letters; Harrison's page. Annotations: USPSTF 1 – 1989: 60 topics; USPSTF 2 – 1996: 70 topics; USPSTF 3 – 2000-2003: >80 topics; Prevention: 7.4 hours/day; Rx: increasing # of meds]
Knowledge management – Vast & complex
• Articles come
  – 13 million citations
  – Half million added per year
  – MEDLINE’s doubling time is 15 years
• Articles go
  – 1/3 of research eventually refuted/attenuated
    • JAMA. 2005. PMID: 16014596
  – Original studies - T1/2 = 45 years
    • Ann Intern Med. 2002. PMID: 12069563
  – Practice guidelines – T1/2 = 6 years
    • JAMA. 2001. PMID: 11572738
• Some articles never should have been
  – 25 of 33 streptokinase studies may not have been needed. PMID: 1614465
• But there is more…
Knowledge management – Misinformation
• Manuscript reviewers prefer manuscripts they agree with
  – J Lab Clin Med. 1994. PMID: 8051481
• Quality of reviews and textbooks
  – Original author misquoted in 15% of references
  – Errors in citation of references - 25%
    • BMJ 1985. PMID: 3931753
• Biases that hinder dissemination
  – Publication bias against negative studies
    • BMJ 1998. PMID: 98113104
• Industry sponsored research
• Media coverage of unpublished articles
  – 1/3 never published
Proposed search engine
http://medinformatics.uthscsa.edu/grant-public/
Overall strategy
• Search ‘systematic textbook’
  – PIER (American College of Physicians)
• Depending on query
  – National Guidelines Clearinghouse
  – FDA
  – CDC
  – Others
• In case nothing found (20%?)
  – Evidence is too subtle or recent
  – MEDLINE
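The strategy above reads as a tiered lookup with MEDLINE as the fallback. A minimal sketch, assuming each source is wrapped in a function that takes a query string and returns a list of hits; `tiered_search` and its parameters are hypothetical, and how a query selects its topical sources is left as a caller-supplied predicate because the slide does not say:

```python
from typing import Callable, List, Sequence, Tuple

Searcher = Callable[[str], List[str]]
# A topical source is consulted only when its predicate accepts the query.
TopicalSource = Tuple[Callable[[str], bool], Searcher]


def tiered_search(
    query: str,
    textbook: Searcher,
    topical: Sequence[TopicalSource],
    medline: Searcher,
) -> List[str]:
    """Search the systematic textbook, then query-dependent sources,
    and fall back to MEDLINE only if nothing was found."""
    results = list(textbook(query))
    for applies, search in topical:
        if applies(query):
            results.extend(search(query))
    if not results:
        # Evidence too subtle or too recent for the curated sources.
        results = list(medline(query))
    return results
```

The point of the ordering is that the expensive MEDLINE search runs only for the minority of queries (the slide guesses 20%) that the curated sources cannot answer.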
MEDLINE the data
• 15 million records in XML
  – Currently 52 GB
  – Growing at 6 GB per year
• Updated weekly
• Its thesaurus, MeSH, has about 23,000 descriptors and is updated yearly
• The UMLS meta-thesaurus has 5 million concept names
MEDLINE Strategy
Four categories, each searched in 3-4 iterations with increasingly restrictive limits:
• Original studies
• Systematic reviews
• Practice guidelines
• Other types
12 searches per query
Need subsecond response per search
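A minimal sketch of how the four categories could be searched concurrently against a local copy of MEDLINE. The `search` callable, the limit names, and the stopping rule (stop once the hit count drops to `max_hits` or fewer) are all assumptions for illustration; the slides do not specify them.

```python
from concurrent.futures import ThreadPoolExecutor

CATEGORIES = [
    "original studies",
    "systematic reviews",
    "practice guidelines",
    "other types",
]


def refine(search, query, category, limits, max_hits):
    """Re-run one category with increasingly restrictive limits
    until the result set is small enough (assumed stopping rule)."""
    hits = []
    for limit in limits:
        hits = search(query, category, limit)
        if len(hits) <= max_hits:
            break
    return hits


def search_all(search, query, limits, max_hits=50):
    """Run the four category searches in parallel threads."""
    with ThreadPoolExecutor(max_workers=len(CATEGORIES)) as pool:
        futures = {
            category: pool.submit(refine, search, query, category, limits, max_hits)
            for category in CATEGORIES
        }
        return {category: future.result() for category, future in futures.items()}
```

Iterations within a category stay sequential because each one depends on the previous hit count, so with three iterations the 12 searches take about as long as 3. That is why each individual search needs to be fast.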