A Knowledge-based Approach to Retrieve Scenario Specific Free-text in a Medical Digital Library
Wesley W. Chu, Computer Science Dept., UCLA, wwc@cs.ucla.edu
NIH Program Project Grant
A five-year, $10M joint interdisciplinary project between Medical School and CS faculty:
• Project 1: teleradiology infrastructure
• Project 2: neuroradiology workstation
• Project 3: multimedia information architecture
• Project 4: natural language processing for medical reports
• Project 5: medical digital library
Project 5 Personnel
Graduate students: Victor Z. Liu, Wenlei Mao, Qinghua Zou
Consultants: Hooshang Kangaloo, M.D.; Denise Aberle, M.D.
Project leader: Wesley W. Chu
Data in a Medical Digital Library
• Structured data (patient lab data, demographic data, …): CoBase
• Images (X-rays, MRI, CT scans): KMeD
• Free-text: patient reports, teaching files, literature, news articles
System Overview
[Diagram: the Medical Digital Library (MDL) holds patient reports, medical literature, teaching materials, and news articles. It accepts either an ad-hoc query or a patient report for content correlation, and returns query results.]
A Sample Patient Report
…Tissue Source: LUNG (FINE NEEDLE ASPIRATION) (LEFT LOWER LOBE) … FINAL DIAGNOSIS:
- LUNG NODULE, LEFT LOWER LOBE (FINE NEEDLE ASPIRATION):
- LUNG CANCER, SMALL CELL, STAGE II.
…
Scenario Specific Retrieval
Patient report: …Tissue Source: LUNG (FINE NEEDLE ASPIRATION) (LEFT LOWER LOBE) … FINAL DIAGNOSIS: - LUNG NODULE, LEFT LOWER LOBE (FINE NEEDLE ASPIRATION): - LUNG CANCER, SMALL CELL, STAGE II. …
• Treatment-related articles: how to treat the disease?
• Diagnosis-related articles: how to diagnose the disease?
Challenge I: Indexing
Extract the domain-specific key concepts in the free text for indexing.
• Free-text: Lung cancer, small cell, stage II
• Concept term in the knowledge source: stage II small cell lung cancer
Conventional methods use NLP, which is not scalable and cannot adapt to the various forms of word permutation.
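To make the word-permutation problem concrete, here is a minimal Python sketch (not the authors' code) that compares exact phrase matching with an order-insensitive, set-of-words comparison. The tokenizer `normalize` is a hypothetical helper that simply lowercases the text and splits on punctuation and spaces.

```python
import re


def normalize(text: str) -> frozenset:
    """Lowercase the text, split on non-alphanumeric characters, and return
    the set of words, discarding word order and punctuation."""
    return frozenset(t for t in re.split(r"[^a-z0-9]+", text.lower()) if t)


free_text = "Lung cancer, small cell, stage II"
concept_term = "stage II small cell lung cancer"

# Exact phrase matching fails: the words appear in a different order.
exact_match = concept_term.lower() in free_text.lower()

# Order-insensitive matching succeeds: every word of the concept term
# occurs somewhere in the free text.
bag_match = normalize(concept_term) <= normalize(free_text)

print(exact_match, bag_match)  # False True
```

Treating a concept term as an unordered set of words is what lets any permutation of those words in the report match the knowledge-source entry.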
Challenge II: Terms used in the query are too general
Expand the general terms in the query into the specific terms used in the documents.
• Query: lung cancer, diagnosis options (does not match the document)
• Document: … the effectiveness of chest x-ray and bronchography on patients with lung cancer …
• Expanded query: lung cancer, chest x-ray, bronchography, … (matches the document)
Challenge III: Mismatch between terms used in the query and the documents
Example query: … lung cancer, …
• Document 1: … lung carcinoma …
• Document 2: … lung neoplasm …
• Document 3: … anti-cancer drug combinations …
None of the three documents contains the literal query term "lung cancer".
Challenge I: Indexing
Challenge II: Terms in the query are too general
Challenge III: Mismatch between terms in the query and the documents
IndexFinder: Extracting domain-specific key concepts
Technique:
• Permute words from the text to generate concept candidates.
• Use the knowledge base to select the valid candidates.
Problem: valid candidates may be irrelevant for domain-specific indexing.
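The permute-and-validate step can be sketched in Python as below, assuming a hypothetical toy knowledge base keyed by unordered word sets (real IndexFinder uses UMLS). Because the keys are sets, enumerating word combinations is enough to cover every permutation. The output also shows the problem on the slide: "cell" is a valid concept but not a useful index term here.

```python
import re
from itertools import combinations

# Hypothetical toy knowledge base. Each concept is keyed by the set of its
# words, so every permutation of those words maps to the same entry.
KB = {
    frozenset({"lung"}): "lung",
    frozenset({"cell"}): "cell",
    frozenset({"lung", "cancer"}): "lung cancer",
    frozenset({"small", "cell", "lung", "cancer"}): "small cell lung cancer",
    frozenset({"stage", "ii", "small", "cell", "lung", "cancer"}):
        "stage II small cell lung cancer",
}


def candidate_concepts(sentence: str, max_len: int = 6) -> set:
    """Generate word combinations from one sentence and keep those that
    the knowledge base recognizes as concepts."""
    words = sorted({t for t in re.split(r"[^a-z0-9]+", sentence.lower()) if t})
    found = set()
    for n in range(1, min(max_len, len(words)) + 1):
        # KB keys are unordered sets, so enumerating combinations
        # covers every permutation of the same words.
        for combo in combinations(words, n):
            concept = KB.get(frozenset(combo))
            if concept is not None:
                found.add(concept)
    return found


print(sorted(candidate_concepts("Lung cancer, small cell, stage II")))
# ['cell', 'lung', 'lung cancer', 'small cell lung cancer', 'stage II small cell lung cancer']
```

This brute-force enumeration grows exponentially with sentence length; the hypothetical `max_len` cap and running it one sentence at a time keep it bounded.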
Eliminating irrelevant concepts
• Syntactic filter: limit the permutation of words to within a sentence.
• Semantic filter:
  - Use the semantic type (e.g. body part, disease, treatment, diagnosis) to filter out irrelevant concepts.
  - Use ISA relationships to filter out general concepts and keep the specific ones.
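A minimal Python sketch of the two semantic filters, assuming hypothetical semantic-type labels and ISA links for a toy vocabulary (the real system would read both from UMLS): first keep only wanted semantic types, then drop any concept that is an ISA ancestor of another surviving concept.

```python
# Hypothetical semantic types and ISA links (child -> parent) for a toy KB.
SEMANTIC_TYPE = {
    "cell": "Cell",
    "lung": "Body Part",
    "lung cancer": "Disease",
    "small cell lung cancer": "Disease",
    "stage II small cell lung cancer": "Disease",
}
ISA = {
    "small cell lung cancer": "lung cancer",
    "stage II small cell lung cancer": "small cell lung cancer",
}


def ancestors(concept: str) -> set:
    """Follow ISA links upward and collect every more general concept."""
    found = set()
    while concept in ISA and ISA[concept] not in found:
        concept = ISA[concept]
        found.add(concept)
    return found


def filter_concepts(candidates: set, wanted_types: set) -> set:
    # Semantic filter: keep only concepts whose semantic type is of interest.
    kept = {c for c in candidates if SEMANTIC_TYPE.get(c) in wanted_types}
    # ISA filter: drop concepts that are ancestors of another kept concept,
    # leaving only the most specific ones.
    general = set()
    for c in kept:
        general |= ancestors(c)
    return kept - general


candidates = {"cell", "lung", "lung cancer", "small cell lung cancer",
              "stage II small cell lung cancer"}
print(sorted(filter_concepts(candidates, {"Body Part", "Disease"})))
# ['lung', 'stage II small cell lung cancer']
```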
IndexFinder Performance
• Two orders of magnitude faster than conventional approaches:
  - No NLP
  - The knowledge base (UMLS) and index files reside in main memory
  - Time complexity is linear in the number of distinct words in the text
• Preliminary evaluation:
  - IndexFinder generates 4% more concepts than conventional approaches (using a single noun phrase)
  - All concepts are relevant
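The slide does not spell out how the linear bound is reached. One plausible way, shown here as an assumption and not as IndexFinder's confirmed algorithm, is an in-memory inverted index from words to concepts: each distinct word is looked up once, and a concept is reported when all of its words have been counted. The knowledge-base contents below are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical in-memory knowledge base: concept -> its distinct words.
CONCEPT_WORDS = {
    "lung cancer": {"lung", "cancer"},
    "lung carcinoma": {"lung", "carcinoma"},
    "small cell lung cancer": {"small", "cell", "lung", "cancer"},
    "stage II small cell lung cancer": {"stage", "ii", "small", "cell", "lung", "cancer"},
}

# Inverted index, built once and kept in memory: word -> concepts using it.
WORD_INDEX = defaultdict(list)
for concept, words in CONCEPT_WORDS.items():
    for word in words:
        WORD_INDEX[word].append(concept)


def index_concepts(text: str) -> set:
    distinct = {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t}
    hits = defaultdict(int)
    # A single pass over the distinct words; each lookup touches only the
    # concepts that contain that word.
    for word in distinct:
        for concept in WORD_INDEX.get(word, ()):
            hits[concept] += 1
    # A concept is present when every one of its words occurs in the text,
    # in any order.
    return {c for c, n in hits.items() if n == len(CONCEPT_WORDS[c])}


print(sorted(index_concepts("Lung cancer, small cell, stage II")))
# ['lung cancer', 'small cell lung cancer', 'stage II small cell lung cancer']
```

The work is proportional to the total length of the posting lists touched, which stays linear in the number of distinct words as long as each word appears in a bounded number of concepts.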
Query Expansion (QE)
Queries of the following form benefit from expansion:
<key concept> + <general supporting concept(s)>, e.g. lung cancer + diagnosis options
expands into
<key concept> + <specific supporting concept(s)>, e.g. lung cancer + chest x-ray, bronchography
Traditional QE
• Appends all terms that statistically co-occur with the key terms in the query
• Not semantically focused
Original query: lung cancer, diagnosis options
Expanded query: lung cancer, radiotherapy, chemotherapy, antineoplastic agents, survival rate
The added terms concern treatment and outcome, not diagnosis.
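A minimal Python sketch of co-occurrence-based expansion over a hypothetical toy corpus, where each document is reduced to its set of index terms. It ranks candidate terms purely by how often they co-occur with a query term, so a diagnosis query picks up the most frequent neighbors, which here are treatment terms.

```python
from collections import Counter
from itertools import combinations


def expand_by_cooccurrence(query_terms, docs, top_k=2):
    """Traditional QE sketch: append the terms that most often co-occur
    with any query term, with no regard to what they mean."""
    scores = Counter()
    for terms in docs:
        for a, b in combinations(sorted(terms), 2):
            if a in query_terms and b not in query_terms:
                scores[b] += 1
            if b in query_terms and a not in query_terms:
                scores[a] += 1
    return list(query_terms) + [t for t, _ in scores.most_common(top_k)]


# Hypothetical toy corpus: each document is reduced to its set of index terms.
docs = [
    {"lung cancer", "radiotherapy", "chemotherapy"},
    {"lung cancer", "chemotherapy", "survival rate"},
    {"lung cancer", "radiotherapy", "chemotherapy"},
    {"lung cancer", "chest x-ray"},
]
print(expand_by_cooccurrence(["lung cancer", "diagnosis options"], docs))
# ['lung cancer', 'diagnosis options', 'chemotherapy', 'radiotherapy']
```

The diagnostic term "chest x-ray" co-occurs with "lung cancer" too, but it is outranked by more frequent treatment terms, which is the lack of semantic focus described above.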
Knowledge-based QE
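Knowledge source: UMLS (by the NLM), consisting of a Semantic Network and a Metathesaurus.
• Semantic Network: semantic types such as Disease or Syndrome, Diagnostic Procedure, Sign or Symptom, Pharmacologic Substance, Body Parts, and Injury or Poisoning, connected by relations such as "diagnoses" (Diagnostic Procedure diagnoses Disease or Syndrome).
• Metathesaurus: the concepts; each semantic type covers a class of concepts.
Example: chest x-ray (Diagnostic Procedure) diagnoses lung cancer (Disease or Syndrome), so chest x-ray is a specific supporting concept for the key concept lung cancer.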
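The knowledge-based expansion can be sketched in Python as below. The semantic types, type-level relations, the mapping from general terms to relations, and the candidate pool are all hypothetical toy stand-ins for UMLS data; the idea shown is only that a candidate is kept when its semantic type is linked to the key concept's type by the relation the scenario calls for.

```python
# Hypothetical fragments standing in for the UMLS Metathesaurus
# (concept -> semantic type) and Semantic Network (type-level relations).
SEMANTIC_TYPE = {
    "lung cancer": "Disease or Syndrome",
    "chest x-ray": "Diagnostic Procedure",
    "bronchography": "Diagnostic Procedure",
    "chemotherapy": "Therapeutic or Preventive Procedure",
    "cisplatin": "Pharmacologic Substance",
}
TYPE_RELATIONS = {
    ("Diagnostic Procedure", "diagnoses", "Disease or Syndrome"),
    ("Therapeutic or Preventive Procedure", "treats", "Disease or Syndrome"),
    ("Pharmacologic Substance", "treats", "Disease or Syndrome"),
}
# Hypothetical mapping from a general supporting term to a relation.
SCENARIO_RELATION = {"diagnosis options": "diagnoses", "treatment": "treats"}


def kb_expand(key_concept, general_term, candidates):
    """Replace a general supporting term with the candidate concepts whose
    semantic type is linked to the key concept's type by the scenario's
    relation."""
    relation = SCENARIO_RELATION[general_term]
    key_type = SEMANTIC_TYPE[key_concept]
    specific = [c for c in candidates
                if (SEMANTIC_TYPE.get(c), relation, key_type) in TYPE_RELATIONS]
    return [key_concept] + specific


pool = ["chemotherapy", "chest x-ray", "cisplatin", "bronchography"]
print(kb_expand("lung cancer", "diagnosis options", pool))
# ['lung cancer', 'chest x-ray', 'bronchography']
print(kb_expand("lung cancer", "treatment", pool))
# ['lung cancer', 'chemotherapy', 'cisplatin']
```

Unlike frequency-based expansion, the same candidate pool yields different expansions depending on the scenario.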
Phrase-based Vector Space Model (VSM)
Query: … lung cancer, …
• Document 1: … lung carcinoma …
• Document 2: … lung neoplasm …
• Document 3: … anti-cancer drug combinations …
A knowledge source resolves two of the three mismatches: lung carcinoma is a synonym of lung cancer (√), and lung neoplasm is linked to lung cancer by a parent_of relation (√). The link between lung cancer and anti-cancer drug combinations is missing from the knowledge source (?).
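A minimal Python sketch of knowledge-source matching alone, using a hypothetical synonym table and parent_of link. It reproduces the outcome above: Documents 1 and 2 match through the knowledge source, and Document 3 does not because no link exists.

```python
# Hypothetical knowledge-source fragment: synonyms resolve to one preferred
# concept name, and parent_of links a broader concept to a narrower one.
PREFERRED = {"lung carcinoma": "lung cancer"}
PARENT_OF = {("lung neoplasm", "lung cancer")}


def concept_of(term: str) -> str:
    return PREFERRED.get(term, term)


def kb_match(query_term: str, doc_term: str) -> bool:
    q, d = concept_of(query_term), concept_of(doc_term)
    return q == d or (d, q) in PARENT_OF or (q, d) in PARENT_OF


docs = {
    "Document 1": "lung carcinoma",
    "Document 2": "lung neoplasm",
    "Document 3": "anti-cancer drug combinations",
}
print({name: kb_match("lung cancer", term) for name, term in docs.items()})
# {'Document 1': True, 'Document 2': True, 'Document 3': False}
```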
Phrase-based VSM Examples
Each phrase is represented by its concept ID and its word stems.
• Query: "lung cancer …"
  Phrases: [(C0242379); "lung" "cancer"] …
• Document: "anti-cancer drug combinations …"
  Phrases: [(C0003393); "anti" "cancer" "drug" "combin"] …
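To illustrate why adding word stems helps, here is a Python sketch that treats concept IDs and stems as separate vector dimensions and compares phrases with plain term-frequency cosine similarity. The weighting is an assumption for illustration, not necessarily the one used in the original phrase-based VSM, and stemming is assumed to have been done upstream (the stems are copied from the example above).

```python
import math
from collections import Counter


def phrase_vector(phrases):
    """Map [(concept_id, [stem, ...]), ...] to a sparse term-frequency
    vector whose dimensions are concept IDs and word stems."""
    vec = Counter()
    for cui, stems in phrases:
        vec["concept:" + cui] += 1
        for stem in stems:
            vec["stem:" + stem] += 1
    return vec


def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def concepts_only(vec):
    return Counter({k: n for k, n in vec.items() if k.startswith("concept:")})


query = phrase_vector([("C0242379", ["lung", "cancer"])])
doc = phrase_vector([("C0003393", ["anti", "cancer", "drug", "combin"])])

print(round(cosine(query, doc), 3))                      # 0.258 (shared stem "cancer")
print(cosine(concepts_only(query), concepts_only(doc)))  # 0.0 (different concept IDs)
```

With concept IDs alone the two phrases share nothing; the shared stem "cancer" gives the document a nonzero score even though the knowledge source has no link between the two concepts.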
[Figure: Retrieval Effectiveness Comparison (Corpus: OHSUMED, KB: UMLS). Precision-recall plot; x-axis: recall (0 to 1); y-axis: average precision over 105 queries (0 to 1). Legend entries include "Stems". Annotations: 16% (100 queries) vs. 5% (50 queries).]
Application: Query Answering via Templates
Sample templates: "<disease>, treatment", "<disease>, diagnosis"
Example with the template "<disease>, treatment":
• Query: lung cancer, treatment
• IndexFinder extracts the key concept: lung cancer
• Query expansion: lung cancer, radiotherapy, chemotherapy, cisplatin
• Phrase-based VSM retrieves the relevant documents
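The template pipeline can be wired together as in the Python sketch below. Each component is a deliberately simplified, hypothetical stand-in (substring lookup for IndexFinder, a fixed table for query expansion, term-count scoring for the phrase-based VSM), and the documents are invented toy strings; the point is only the order in which the three stages are composed.

```python
# Hypothetical toy stand-ins for the three components and the collection.
KEY_CONCEPTS = {"lung cancer"}
SPECIFIC_TERMS = {  # (key concept, scenario) -> specific supporting concepts
    ("lung cancer", "treatment"): ["radiotherapy", "chemotherapy", "cisplatin"],
    ("lung cancer", "diagnosis"): ["chest x-ray", "bronchography"],
}
DOCS = {
    "d1": "cisplatin and radiotherapy for lung cancer",
    "d2": "chest x-ray screening in lung cancer",
    "d3": "hip fracture rehabilitation",
}


def find_key_concepts(text):
    """Stand-in for IndexFinder: exact substring lookup only."""
    return [c for c in KEY_CONCEPTS if c in text.lower()]


def expand(concept, scenario):
    """Stand-in for knowledge-based query expansion."""
    return [concept] + SPECIFIC_TERMS.get((concept, scenario), [])


def rank(terms):
    """Stand-in for the phrase-based VSM: score by number of matching terms."""
    scores = {d: sum(t in body for t in terms) for d, body in DOCS.items()}
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    return [d for d, s in ranked if s > 0]


def answer(template, disease):
    query = template.replace("<disease>", disease)
    scenario = template.split(",")[1].strip()
    results = []
    for concept in find_key_concepts(query):
        for doc_id in rank(expand(concept, scenario)):
            if doc_id not in results:
                results.append(doc_id)
    return results


print(answer("<disease>, treatment", "lung cancer"))  # ['d1', 'd2']
print(answer("<disease>, diagnosis", "lung cancer"))  # ['d2', 'd1']
```

Feeding the key concepts of a patient report into `answer` in place of a filled-in template would give the scenario-specific content-correlation variant of the same pipeline.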
Applications (cont'd): Scenario-specific content correlation
• Input: a patient report, from which IndexFinder extracts the key concepts
• Scenario selection from the query templates, e.g. treatment, diagnosis, etc.
• Query expansion, after which the phrase-based VSM retrieves the relevant documents
Conclusion
• The knowledge-based (UMLS) approach provides scenario-specific retrieval of medical free text.
• IndexFinder: uses word permutation together with syntactic and semantic filtering to extract domain-specific key concepts from free text for indexing.
• Knowledge-based query expansion: transforms the general terms in the query into the scenario-specific terms used in the documents, giving the query a higher probability of matching the relevant documents.
• Phrase-based indexing: represents documents as phrases (a concept and its word stems) to improve retrieval effectiveness.
Acknowledgement
This research is supported in part by NIC/NIH Grant #4442511-33780.
Demo: http://fargo.cs.ucla.edu/umls/search.aspx
Test texts:
• Technically successful left lower lobe nodule biopsy.
• Preliminary localization CT images again demonstrate a left lower lobe nodule adjacent to the posterior segmental bronchus.
• CT scans obtained during biopsy demonstrate the coaxial cannula adjacent to the proximal aspect of the nodule.
• Surrounding pulmonary parenchymal hemorrhage as a result of the biopsy is also noted.
• There may be a tiny left apical air collection in the pleural space lateral to the apical bulla.
• Formal cytologic evaluation of the withdrawn specimen is pending at this time, although abnormal appearing "spindle" cells were identified during on-site cytopathologic evaluation of specimen adequacy.