![Page 1: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/1.jpg)
Medical Information Retrieval Challenges in a Webbed World
William Hersh, M.D.
Associate Professor and Chief
Division of Medical Informatics and Outcomes Research
Oregon Health Sciences University
![Page 2: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/2.jpg)
Overview
Describe current information retrieval technology
Summarize IR research activities and their results
Discuss the implications of the World Wide Web (WWW) for IR
![Page 3: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/3.jpg)
Overview of IR Process
[Diagram: Queries and Documents are each expressed in an Indexing language and meet at the Search engine; the query side is labeled Retrieval, the document side Indexing]
![Page 4: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/4.jpg)
What is the field of IR?
Concerned with creation, storage, organization, and retrieval of computer-based information
“IR” has traditionally focused on retrieval of information from heterogeneous textual databases
Recent expansion to multimedia and integration with “traditional” databases
![Page 5: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/5.jpg)
Why is IR pertinent to health care?
Growth of knowledge has long surpassed human memory capabilities
Clinicians have frequent and unmet information needs
Primary literature on a given topic can be scattered and hard to synthesize
Non-primary literature sources are often neither comprehensive nor systematic
![Page 6: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/6.jpg)
Further reading
Hersh WR, Information Retrieval: A Health Care Perspective, Springer-Verlag, 1996
Hersh WR, Hickam DH, How well do physicians use electronic information retrieval systems? A framework for investigation and systematic review, Journal of the American Medical Association, 1998, 280: 1347-1352
![Page 7: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/7.jpg)
IR state of the art
Databases
Indexing
Retrieval
Evaluation
![Page 8: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/8.jpg)
Databases
Bibliographic
– References to journal literature
– Used in initial IR systems
– Most famous example is MEDLINE
» Nearly 9 million references to peer-reviewed literature dating back to 1966
» Covers about 3,000 journals, mostly English-based
» About 300,000 new references added yearly
» Maintained by National Library of Medicine
![Page 9: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/9.jpg)
Databases (cont.)
Full-text
– Journal literature has been available for over a decade in text-only form and at high cost
– Last decade has seen increasing growth of CD-ROM market
– New “evidence-based” resources are becoming available, e.g., Best Evidence, Cochrane
Hypertext
– Information linked in non-linear fashion
![Page 10: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/10.jpg)
Indexing
Two major types:
– Human indexing with controlled vocabulary
» MEDLINE uses the 18,000-term Medical Subject Headings (MeSH) vocabulary
– Computer assignment of all words in record
» Often a stop word list to remove common words (e.g., the, and, which) is used
» Some systems “stem” words to root form (e.g., coughs to cough)
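As a rough illustration of word indexing with a stop list and stemming, here is a minimal Python sketch. The stop list and the suffix-stripping rule are simplified assumptions made for this example (real systems use longer stop lists and proper stemming algorithms), and the names `STOP_WORDS`, `naive_stem`, and `index_terms` are hypothetical.

```python
import re

# Illustrative stop list (an assumption); real systems use much longer lists.
STOP_WORDS = {"the", "and", "which", "a", "of", "in", "to", "is"}


def naive_stem(word):
    """Strip a few common English suffixes (a crude stand-in for a real stemmer)."""
    for suffix in ("ing", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Lowercase, tokenize, drop stop words, and stem what remains."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


print(index_terms("The patient coughs"))  # ['patient', 'cough']
```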
![Page 11: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/11.jpg)
Limitations of indexing approaches
Human indexing
– Inconsistency
– Inadequate indexing vocabulary
Word indexing
– Synonymy - e.g., cancer and carcinoma
– Polysemy - e.g., lead
– Granularity - e.g., antibiotics, penicillin
– Focus
![Page 12: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/12.jpg)
Retrieval
Traditional approach: indexing terms connected by AND, OR
Most bibliographic systems allow searching on both vocabulary and text words
Proximity operators require words to be within a certain range
Some systems hide Boolean operators
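To make the AND/OR distinction concrete, the following is a minimal sketch of Boolean retrieval over an inverted index. The three tiny documents and the function names are invented for illustration; real bibliographic systems also search controlled-vocabulary fields and support proximity operators, which this sketch omits.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search_and(index, terms):
    """Documents containing every term (Boolean AND narrows the result)."""
    sets = [index.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()


def search_or(index, terms):
    """Documents containing at least one term (Boolean OR broadens the result)."""
    return set().union(*(index.get(t, set()) for t in terms))


# Hypothetical mini-collection.
docs = {
    1: "hypertension treatment with diuretics",
    2: "diuretics and renal function",
    3: "hypertension in pregnancy",
}
index = build_index(docs)
and_result = search_and(index, ["hypertension", "diuretics"])  # only document 1
or_result = search_or(index, ["hypertension", "diuretics"])    # documents 1, 2, and 3
```

AND can only shrink the result set and OR can only grow it, which is the point novices tend to reverse.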
![Page 13: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/13.jpg)
Limitations of retrieval approaches
Novices confuse ANDs and ORs
Complex user interfaces dissuade busy users
Returned documents displayed in arbitrary or, at best, reverse chronological order
![Page 14: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/14.jpg)
An alternative approach to indexing and retrieval
Called vector-space, word-statistical, automated retrieval…
Developed by Salton in the 1960s, but because it works best for end users, it did not achieve commercial prominence until the 1990s
Based on notion of finding similarity in words between user’s query and document
Used in Knowledge Finder (Aries) and most Web search engines
![Page 15: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/15.jpg)
Word-statistical indexing
Indexing done of all words (though nothing precludes use of MeSH or other terms)
After stop word filtering and stemming, each word in each document is assigned a weight based on the product IDF × TF:
– Inverse document frequency of term i
» $\mathrm{IDF}_i = \log\left(\dfrac{\#\ \text{documents}}{\#\ \text{documents with term } i}\right) + 1$
– Term frequency of term i in document j
» $\mathrm{TF}_{ij} = \log(\text{frequency of term } i \text{ in document } j) + 1$
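The weighting above can be sketched in a few lines of Python. This is only an illustration: the slide does not state the base of the logarithm, so the natural log is assumed here, and the function name `tf_idf_weights` and the token-list input format are hypothetical.

```python
import math
from collections import Counter


def tf_idf_weights(docs):
    """Return one {term: weight} dict per document, with weight = IDF * TF.

    `docs` is a list of token lists, already stop-word filtered and stemmed.
    Natural log is an assumption; the source does not give the base.
    """
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for c in counts for term in c)
    weights = []
    for c in counts:
        w = {}
        for term, freq in c.items():
            idf = math.log(n_docs / doc_freq[term]) + 1
            tf = math.log(freq) + 1
            w[term] = idf * tf
        weights.append(w)
    return weights


weights = tf_idf_weights([["cough", "cough", "fever"], ["fever"]])
```

In this toy collection, "fever" appears in every document, so it gets the minimum weight of 1, while the rarer and repeated "cough" is weighted higher.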
![Page 16: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/16.jpg)
Word-statistical retrieval
Queries entered in natural language, subject to same stop list and stemming
Each document gets a score based on sum of weights for each query term in the document
Results are sorted and presented to user (relevance ranking)
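A minimal sketch of this scoring and ranking step follows. The per-document term weights are made-up numbers standing in for precomputed IDF*TF values, and counting each query term once (even if repeated in the query) is a simplifying assumption of this example.

```python
def rank(query_terms, doc_weights):
    """Score each document by summing the weights of the query terms it contains."""
    scores = {}
    for doc_id, weights in doc_weights.items():
        score = sum(weights.get(term, 0.0) for term in set(query_terms))
        if score > 0:
            scores[doc_id] = score
    # Highest-scoring documents first (relevance ranking).
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical precomputed term weights per document.
doc_weights = {
    "doc1": {"hypertension": 2.1, "diuretic": 1.7},
    "doc2": {"diuretic": 1.7, "renal": 2.4},
    "doc3": {"pregnancy": 2.4},
}
ranking = rank(["hypertension", "diuretic"], doc_weights)
```

Here `doc1` matches both query terms and ranks first, `doc2` matches one and ranks second, and `doc3` matches none and is not returned.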
![Page 17: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/17.jpg)
This approach allows other features:
Relevance feedback
– After user designates relevant documents, query modified
Query expansion
– Same but using top-ranked documents without user relevance designations
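One simple way to modify a query from feedback documents is to add their most frequent new terms. The sketch below assumes that scheme purely for illustration; the slides do not say how the query is modified, and practical systems typically reweight terms as well as add them. The function name and parameters are hypothetical.

```python
from collections import Counter


def expand_query(query_terms, feedback_docs, n_new_terms=2):
    """Add the most frequent new terms from the feedback documents to the query.

    `feedback_docs` is a list of token lists. For relevance feedback these are
    documents the user marked relevant; for query expansion they are simply the
    top-ranked documents from the first search.
    """
    counts = Counter(
        term for doc in feedback_docs for term in doc if term not in query_terms
    )
    new_terms = [term for term, _ in counts.most_common(n_new_terms)]
    return list(query_terms) + new_terms


expanded = expand_query(
    ["hypertension"],
    [["hypertension", "diuretic", "thiazide"], ["diuretic", "hypertension"]],
    n_new_terms=1,
)
```

The same function serves both features; only the source of the feedback documents differs.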
![Page 18: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/18.jpg)
Evaluation
What questions to ask?
– Is system used?
– Are users satisfied?
– Do they find relevant information?
– Do they complete their desired task?
Most research has focused on retrieval of relevant documents
![Page 19: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/19.jpg)
Relevance-based measures
$$\text{Recall} = \frac{\#\ \text{retrieved and relevant documents}}{\#\ \text{relevant documents in collection}}$$

$$\text{Precision} = \frac{\#\ \text{retrieved and relevant documents}}{\#\ \text{retrieved documents in search}}$$
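As a worked example, recall divides the number of relevant documents retrieved by all relevant documents in the collection, while precision divides it by all documents the search retrieved. The sketch below computes both from sets of document ids; the ids and the function name are invented for illustration, and returning 0.0 for an empty denominator is a convention chosen here.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for one search."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # retrieved AND relevant
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# The search returned 4 documents; the collection holds 5 relevant ones; 2 overlap.
recall, precision = recall_precision({1, 2, 3, 4}, {2, 4, 5, 6, 7})
# recall = 2/5 = 0.4, precision = 2/4 = 0.5
```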
![Page 20: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/20.jpg)
Comments about recall and precision
There tends to be a trade-off between the two
“Relevance” can be a slippery notion
It is unclear whether they correlate with a user’s success in using an IR system
The proliferation of standard test collections leads to a great deal of research that excludes real users
![Page 21: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/21.jpg)
How well do users search? (Haynes et al., Annals of Internal Medicine, 1990)

| Searcher | Recall | Precision |
|---|---|---|
| Novice | 27% | 38% |
| Experienced clinician | 48% | 48% |
| Librarian | 49% | 57% |
![Page 22: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/22.jpg)
More searching results (Hersh et al., Bull Med Libr Assoc, 1994)

| Searcher | Search type | Retrieved | Recall | Precision |
|---|---|---|---|---|
| Novice physicians | Knowledge Finder | 88.8 | 68.2 | 14.7 |
| Novice physicians | KF top 15 | 14.6 | 31.2 | 24.8 |
| Librarians | Full MEDLINE | 18.0 | 37.1 | 36.1 |
| Librarians | Text words only | 17.0 | 31.5 | 31.9 |
| Exp. physicians | Full MEDLINE | 10.9 | 26.6 | 34.9 |
| Exp. physicians | Text words only | 14.8 | 30.6 | 31.4 |
![Page 23: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/23.jpg)
Other results
Little overlap among retrieval sets
– Searchers tend to find similar quantities of disparate relevant documents
Novice searchers are satisfied with results
– Adequate information or ignorant bliss?
![Page 24: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/24.jpg)
New approaches to evaluation
Changing the research questions
– How well can clinical users answer questions?
– What factors are associated with success?
» Demographics, experience, cognitive factors, and searching mechanics?
Ongoing study funded by NLM
Challenges
– Appropriate questions, database, sample size, etc.
![Page 25: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/25.jpg)
IR research directions
Enhancing word-statistical approaches
Linguistic approaches
Enhancing conventional indexing and retrieval
![Page 26: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/26.jpg)
Enhancing word-statistical approaches
Passage retrieval
– Giving weight to documents that have sections mapping closely to the query
Use of phrases
– High, blood, and pressure have more meaning when occurring near each other
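Both ideas reward query words that occur close together. A rough sketch, assuming fixed-size word windows as "passages" (one of several possible definitions) and a hypothetical function name, scores a document by the best window it contains:

```python
def best_passage_score(doc_tokens, query_terms, window=5):
    """Largest number of distinct query terms found in any window of `window` tokens."""
    query = set(query_terms)
    best = 0
    for start in range(max(1, len(doc_tokens) - window + 1)):
        passage = doc_tokens[start : start + window]
        best = max(best, len(query.intersection(passage)))
    return best


query = ["high", "blood", "pressure"]
near = "patients with high blood pressure were treated".split()
far = "high fever was noted while blood loss lowered the pressure".split()

near_score = best_passage_score(near, query)  # all three words in one window
far_score = best_passage_score(far, query)    # the words are spread out
```

Both documents contain all three query words, but only the first has them near each other, so it receives the higher passage score.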
![Page 27: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/27.jpg)
Linguistic approaches
Syntactic approaches
– Conceptual matter tends to occur in noun phrases
Semantic approaches
– Can we overcome problems of synonymy, polysemy, granularity, etc.?
![Page 28: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/28.jpg)
Identifying semantics in documents
SAPHIRE (Hersh and Hickam, 1995)
– Direct mapping of text to terms in large controlled vocabulary (UMLS Metathesaurus)
– Works best when exact terms and synonyms present, less well when terms vague or synonyms non-standard
MEDSPACE (Schatz, 1997)
– Large-scale processing to uncover underlying related terms and literatures
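The general idea of mapping free text to controlled-vocabulary concepts can be sketched with a toy synonym table and greedy longest-phrase matching. This is not SAPHIRE's actual algorithm: the vocabulary, the concept ids (`C1`, `C2`), and the matching strategy are all invented for illustration and are not real UMLS content.

```python
# Toy vocabulary: concept id -> preferred term and synonyms (all invented).
VOCABULARY = {
    "C1": ["hypertension", "high blood pressure"],
    "C2": ["neoplasm", "cancer", "carcinoma"],
}

# Reverse lookup from each phrase to its concept id.
PHRASE_TO_CONCEPT = {
    phrase: cid for cid, phrases in VOCABULARY.items() for phrase in phrases
}
MAX_PHRASE_WORDS = max(len(p.split()) for p in PHRASE_TO_CONCEPT)


def map_to_concepts(text):
    """Greedy longest-match of word sequences in `text` against the vocabulary."""
    words = text.lower().split()
    concepts, i = [], 0
    while i < len(words):
        for n in range(min(MAX_PHRASE_WORDS, len(words) - i), 0, -1):
            phrase = " ".join(words[i : i + n])
            if phrase in PHRASE_TO_CONCEPT:
                concepts.append(PHRASE_TO_CONCEPT[phrase])
                i += n
                break
        else:
            i += 1  # no vocabulary phrase starts at this word
    return concepts


mapped = map_to_concepts("High blood pressure and carcinoma")
unmapped = map_to_concepts("elevated bp")
```

The first text maps cleanly because its wording matches listed synonyms; the second returns nothing because "elevated bp" is a non-standard synonym the vocabulary does not contain, which mirrors the limitation noted above.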
![Page 29: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/29.jpg)
Enhancing conventional systems
Better content
– Evidence-based resources
» More informative abstracts, e.g., Best Evidence
» Systematic reviews, e.g., Cochrane Database of Systematic Reviews
» Critically-Appraised Topics (CATs)
Better indexing
– NLM’s MedIndex system provides expert assistance to indexers
![Page 30: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/30.jpg)
Enhancing conventional systems (cont.)
Better retrieval
– NLM’s Internet Grateful Med looks for common searching mistakes (e.g., excessive ANDs) and informs searcher
Better vocabularies
– NLM’s UMLS Project adds terminology from other vocabularies
![Page 31: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/31.jpg)
IR and the World Wide Web
Indexing and retrieval approaches
Implications for scientific publishing
Implications for health care
Limitations
![Page 32: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/32.jpg)
Indexing and retrieval on the Web
Web crawlers
– Index everything they find
– Examples: Alta Vista, InfoSeek, Lycos
– Problems: non-discriminating, word only
Filtering and/or classifying
– Sites filtered and/or classified based on criteria
– Examples: Yahoo, CliniWeb, OMNI
– Problems: maintenance, intended audience
![Page 33: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/33.jpg)
Implications for scientific publishing
Peer-review process
– Imperfect but best means for controlling quality in publications
Responsibility
– Increased anonymity of Web enhances ability for misrepresentation
Liability
– Who is liable for inaccurate information?
![Page 34: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/34.jpg)
Implications for health care
Informativeness vs. marketing
– There is potential conflict between providing information and self-promotion
Patient empowerment
– Absolutely important but much potential for damage from misinformation
![Page 35: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/35.jpg)
Much medical information is on the Web
“Free” information from government agencies, medical schools, and advocacy groups is easy to access and use
“Best” information from traditional medical publishers still costly and fragmented
Some well-known launching pads
– Medical Matrix: www.medmatrix.org/
– CliniWeb: www.ohsu.edu/cliniweb/
![Page 36: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/36.jpg)
Limitations of the Web (Hersh, ACPJC, 1996)
Difficult to find information - a diversity of different search engines, each with its own benefits and limitations
Everyone can be a publisher - Good for democratic society, less so for scientific and professional fields
Misrepresentation and fraud - Web can amplify misinformation and allow easy fraud
![Page 37: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/37.jpg)
Some have expressed concern about free information on Web
Silberg et al. (JAMA, 1997) suggested standards for health information on Web
– Authorship - names, affiliations, and credentials
– Attribution - references, sources, and (where appropriate) copyright
– Disclosure - potential and real conflicts of interest
– Currency - dates content posted and updated
![Page 38: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/38.jpg)
But applicability and quality of Web content are poor
Hersh, Gorman, and Sacherek, JAMA, 1998
Searched on 50 questions generated by clinicians
Less than 10% of pages relevant, none for half of queries
Low percentage of JAMA quality indicators
![Page 39: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/39.jpg)
Final thoughts
We are on the threshold of an exciting new era in communications and information dissemination
– Integrity of information and responsibility for it must be maintained
– It should augment and not substitute for human communication
![Page 40: Medical Information Retrieval Challenges in a Webbed World William Hersh, M.D. Associate Professor and Chief Division of Medical Informatics and Outcomes](https://reader035.vdocument.in/reader035/viewer/2022062803/56649f315503460f94c4d67c/html5/thumbnails/40.jpg)
References Cited:
1. Hersh, W., Information Retrieval: A Health Care Perspective. 1996, New York: Springer-Verlag.
2. Hersh, W. and D. Hickam, How well do physicians use electronic information retrieval systems? A framework for investigation and systematic review. Journal of the American Medical Association, 1998. 280: 1347-1352.
3. Haynes, R., et al., Online access to MEDLINE in clinical settings. Annals of Internal Medicine, 1990. 112: 78-84.
4. Hersh, W. and D. Hickam, The use of a multi-application computer workstation in a clinical setting. Bulletin of the Medical Library Association, 1994. 82: 382-389.
5. Hersh, W. and D. Hickam, Information retrieval in medicine: the SAPHIRE experience. Journal of the American Society for Information Science, 1995. 46: 743-747.
6. Schatz, B., Information retrieval in digital libraries: bringing search to the net. Science, 1997. 275: 327-334.
7. Hersh, W., Evidence-based medicine and the Internet. ACP Journal Club, 1996. 5(4): A12-A14.
8. Silberg, W., G. Lundberg, and R. Musacchio, Assessing, controlling, and assuring the quality of medical information on the Internet: caveat lector et viewor - let the reader and viewer beware. Journal of the American Medical Association, 1997. 277: 1244-1245.
9. Hersh, W., P. Gorman, and L. Sacherek, Applicability and quality of information for answering clinical questions on the Web. Journal of the American Medical Association, 1998. 280: 1307-1308.
URLs:
Division of Medical Informatics & Outcomes Research: www.ohsu.edu/bicc-informatics/
CliniWeb: www.ohsu.edu/cliniweb/
SAPHIRE International: www.ohsu.edu/cliniweb/saphint/