domain specific multi-stage query language for medical ...bhalla/modeldrivenver2.pdf · use web...
TRANSCRIPT
to Advance Knowledge for Humanity
S. Bhalla
Model Driven Querying
1
Domain Specific Multi-stage Query Language
for Medical Document Repositories
9/23/2013 VLDB Phd Workshop 2013
to Advance Knowledge for Humanity
Introduction
Specialized Domains
Biomedical, agriculture, medical/healthcare
Require Effective search and query mechanisms
Insufficient Search-engine like Key-words based searches
Medical domain
Complex
Example,
Medical professionals Specific technical articles (particular topic )
General public General information (disease or medicine).
How to retrieve medical query related information
9/23/2013 2 VLDB Phd Workshop 2013
Query Information about ”general AIDS information�” medical search tool, such as
PubMed
Result 1000s of documents ⊆ Different aspects of AIDS are displayed Such as , treatment, drug therapy, transmission, diagnosis, and history
to Advance Knowledge for Humanity
9/23/2013 3 VLDB Phd Workshop 2013
Introduction (1)
Medical Information
Knowledge Evolved over 10s of years
Contains Well defined terms and processes
Available on the Web
Patient Specific Information
Knowledge-based Information
Medical Literature
Web Documents
Patient-encounter
Recordings
to Advance Knowledge for Humanity
Complexity Knowledge-based Resources
Heterogeneous End-user groups
Patients, researchers, doctors and other experts
Variation Information Requirements
Patient-treatment, self -diagnosis, general health information
Structure Medical Documents
Scientific papers, encyclopedias and other literature
Unique, well-defined
9/23/2013 VLDB Phd Workshop 2013 4
Introduction (2)
to Advance Knowledge for Humanity
Introduction (3): Specialized Documents
Case of medical encyclopedias
Comprehensive medical guide Patients and clinicians
Authoritative source NLM (National Library of Medicine)
Paper based resources Electronic format
Frequently referred Medical domain users
Example, MedlinePlus, WebMD, ADAMS,
Merriam-Webster Medical Dictionary
9/23/2013 VLDB Phd Workshop 2013 5
to Advance Knowledge for Humanity
Introduction (4): Why Query
External knowledge base Clinicians
Evidence based medicine
During different stages of point-of-care
Assessment plan of treatment
Patient diagnosis
Improve Quality of Care
Authoritative information required
Self Diagnosis
During Early appearance of symptoms
Personal Knowledge Patients and their relatives
9/23/2013 VLDB Phd Workshop 2013 6
to Advance Knowledge for Humanity
The Underlying Structure
9/23/2013 VLDB Phd Workshop 2013 7
The Hierarchical Structure
Topic of the Document
Subtopics
Miscellaneous/Related
Content
Subtopic 1 Subtopic 2 Subtopic n Content topic 1 Content topic 2 Content topic n
Content Content Content Content Content Content
Flow of Contents Organized as stages of point-of-care
to Advance Knowledge for Humanity
9/23/2013 8 VLDB Phd Workshop 2013
Introduction (5): The End Users Variable
a. Demographical Characteristics
b. Tasks/Purpose
c. Computer/Domain Expertise
Practitioners and Researchers
Well-versed
Domain knowledge and terminologies
Require
Precise, complete, accurate and timely results
Patients and their relatives
NOT Well-versed
Domain knowledge and terminologies
Require
General information
Healthcare Workers
Specialized
Researchers
Patients, their relatives
to Advance Knowledge for Humanity
Evidence-based Queries
Intent: Diagnostic
Raised by: Clinicians/Experts
Target resources: Online Medical Repositories (e.g. medical encyclopedia)
Example: “Cases where helicobacter bacteria causes peptic ulcer”
Hypothesis-directed Queries
Intent: Non-diagnostic
Raised by: Novice users/patients
Target resources: Online Medical Repositories (e.g. medical encyclopedia)
Example: “Treatment in case of high fever and dizziness”
9/23/2013 VLDB Phd Workshop 2013 9
The Medical Queries
to Advance Knowledge for Humanity
The Query Flows
9/23/2013 VLDB Phd Workshop 2013 10
Occurrence Evidence-based and hypothesis-directed queries
Represent Stages of information seeking
Comprise Varying levels of query complexity
1. 2. 3. 4.
to Advance Knowledge for Humanity
9/23/2013 VLDB Phd Workshop 2013 11
Query: Find chances of "Cancer Risk" in patients showing symptom "Sleep Deprivation"
and have been exposed to "Radiation" (but not "Environmental Toxins" and does not have
"Genetic Disorder") .
Help Needed
The Research Gap
Results Large in number, irrelevant
Failure Keyword search, domain-specific search tools
Require Precise and easy-to-use database style query methods
Key steps:
1. Schema understandable by users
2. Identify Resources to query
3. Identify Granularity of results
Healthcare Expert
Paper-based resources
to Advance Knowledge for Humanity
9/23/2013 VLDB Phd Workshop 2013 12
Aim: Effective Online Medical Information
Transform Document Repository User-Level Schema
Enable High-level Query Language
Target Audience Skilled and semi-skilled users
Utilize Query capabilities of a database query language
Assist Domain Experts Using Query language
Facilitate
In-depth Queries
Granular Results
Bridging the Gap
to Advance Knowledge for Humanity
Querying the New Way
9/23/2013 VLDB Phd Workshop 2013 13
User-level
Schema
High-level Query
language
Traditional
Method
Proposed
Method
Resource
Resource
Search
Method
Query
Method
Medical
Expert
Medical
Expert
Results returned
- Lack specificity
- Long list of full documents
- Trustworthiness of resources unknown
Results returned
- Specific, granular
- Segments of documents query criteria
- Trustworthy/Authoritative sources only
to Advance Knowledge for Humanity
9/23/2013 14 VLDB Phd Workshop 2013
Proposed Approach
to Advance Knowledge for Humanity
9/23/2013 VLDB Phd Workshop 2013 15
Key Features
User-Level Schema
Universal , concept-level schema
Attributes
Understandable Domain experts and novice users
Query-able
Granular results
Multi-stage Query Language
Map multi-stage diagnostic process Step-by-step Query Flow
Interactive Querying View Results Add concept
Continuous query refinement
Supported Queries Simple, Medium, Complex, Recursive
to Advance Knowledge for Humanity
Outline
Two-step framework
Offline Process Create User-level schema
Online Process Enable Multi-stage Query Language
Offline Process
Use Web segmentation algorithm, Domain concepts
Result Automatic creation of a User-level schema
Online Process
Enable Multi-stage Query Language
Use User-level schema
Results Granular, segment-level, context-based
9/23/2013 VLDB Phd Workshop 2013 16
to Advance Knowledge for Humanity
9/23/2013 VLDB Phd Workshop 2013 17
The Method
Two-step Framework
to Advance Knowledge for Humanity
18 9/23/2013
VLDB Phd Workshop 2013
Data Model
to Advance Knowledge for Humanity
Data Model
Tree Structured Repository
9/23/2013 VLDB Phd Workshop 2013 19
H1
f1
f2
Example: MedlinePlus Medical Encyclopedia
to Advance Knowledge for Humanity
Data Model (1)
9/23/2013 VLDB Phd Workshop 2013 20
to Advance Knowledge for Humanity
Data Model (2):The Schema
9/23/2013 VLDB Phd Workshop 2013 21
Attributes Diagnostic concepts/terms
Understandable by expert and novice users
Do not change frequently
to Advance Knowledge for Humanity
Data Model (3): A XML Document
9/23/2013 VLDB Phd Workshop 2013 22
Example: MedlinePlus Medical Encyclopedia
Title Causes Symptoms
Treatment
Document corresponding to “Aarskog Syndrome”
to Advance Knowledge for Humanity
Data Model (4): Query Effort
Query: Find if "Oxygen therapy" work for the treatment of "Chronic Respiratory
Failure" and symptoms are "Lethargy" OR "Shortness of breath”.
9/23/2013 VLDB Phd Workshop 2013 23
Advanced search: MedlinePlus Proposed Method
SELECT attribute = “Disease_name”
WHERE
Attribute “Disease_name” = “Chronic
Respiratory Failure”
AND
Attribute “Treatment” = “Oxygen therapy”
AND
Attribute “Symptoms” =“Lethargy”
OR
Attribute “Symptoms” = “Shortness of breath”
Easy-to-Use Not Possible
Result segment
Context of user-query
to Advance Knowledge for Humanity
Data Model (5): Granular Results
9/23/2013 VLDB Phd Workshop 2013 24
Queried
Attributes/Segments
Query Results
Context Granular
Each result is a segment, combination of
Concept/context in query
Item of concern (content enclosed in a
segment)
to Advance Knowledge for Humanity
An Example
Query: Find other symptoms where “chronic kidney failure” is
caused by “anemia”
Queried segment Symptoms
Segments in Query Causes = “anemia” and Disease_name
= “Chronic kidney failure”
Result Segment Symptoms
Context disease_name = “chronic kidney failure” & causes
= “anemia” (SYMPTOM - segment)
9/23/2013 VLDB Phd Workshop 2013 25
to Advance Knowledge for Humanity
9/23/2013 VLDB Phd Workshop 2013 26
Next Step Multi-stage Query Language
to Advance Knowledge for Humanity
Proposed Query Language (1)
XQBE Medical document repositories
Create queries Drag and drop interface
Query : “Find cases where a person is inflicted with “peptic ulcer” due to
“helicobacter pylon bacteria”
9/23/2013 VLDB Phd Workshop 2013 27
Attributes understandable by end
users
1. Case = disease_name
Value = ??
2. Due to = Causes
Value = “helicobacter pylon bacteria”
3. Inflicted with = Symptoms
Value = “peptic ulcer”
Query Effort Minimal learning curve
Computer-expertise not required
to Advance Knowledge for Humanity
Multi-stage Query-by-Concept
Concept Query-able attribute
Topic, sub-topic, medical concept
Query Effort
Dynamic selection of attributes
No computer expertise
Query Process
9/23/2013 VLDB Phd Workshop 2013 28
Proposed Query Language (2)
An Example: Cases where fever is caused due
to infliction of Pneumonia and Tuberculosis
to Advance Knowledge for Humanity
Another Example
Query: Find cases where 3 clinical concepts (“cough”, “no
sore-throat”, and “had no sterol injection”) occur in context of
symptoms occur along with a concept having sub-key (i.e. “non
sterol injection at the left side”)
Possible XQBE on Specialized Medical Repositories
Possible Multi-stage Query-by-concept Query Language
9/23/2013 VLDB Phd Workshop 2013 29
to Advance Knowledge for Humanity
Evaluation Plan
9/23/2013 VLDB Phd Workshop 2013 30
Data Sets
MedlinePlus document repository
Health topics (900+) , encyclopedia (4000+), drugs (12000+)
Set of Queries
50 test queries (multi-staged)
Using diseases and medication etc.
Quantitative Studies
Evaluation Metrics
Accuracy of segment extraction (schema creation) Precision and Recall
Reduction in search space
Qualitative Studies
Usability Studies
Actual End-users
Query Performance
to Advance Knowledge for Humanity
Initial Achievements
HTML XML as per proposed model
XQuery on XML
Integration with XQBE
Query by concept Enumeration using paper and
pencil
9/23/2013 VLDB Phd Workshop 2013 31
to Advance Knowledge for Humanity
32 9/23/2013 VLDB Phd Workshop 2013
Challenges
Scalability Similarly structured repositories
List Query operations needed
Implementation Above query operations
Query Language User Interface
to Advance Knowledge for Humanity
33 9/23/2013 VLDB Phd Workshop 2013
Related Work
Domain-specific Information Retrieval
Similarity and popularity based models Insufficient for domain experts
“Information granulation” needs to be considered in huge document repositories
Form-based Query Interfaces
Easy-to-use
Limited access to the database
Complex queries large number of forms
Varying medical concepts large number of fields in forms
Beyond single page web search results
Provide granular results for user’s search
Return segments from multiple or related web documents as results
High-level Graphical Query Languages
Easy-to-use and understand
Little or no programming effort required by the user
Common languages QBE, XQBE
to Advance Knowledge for Humanity
Summary and Conclusions
34 9/23/2013 VLDB Phd Workshop 2013
Proposed Multi-stage Query Language
1. Aim Effective online medical resources
2. Key feature User-Level Schema
3. Facilitates Granular/Context-based Results
4. Support Healthcare Experts
5. Minimize Learning curve for novice users
6. Reduce Dependency on general-purpose search engines
Provide Web user level activity no or little programming effort
Healthcare experts
to Advance Knowledge for Humanity
References (1)
35 9/23/2013 VLDB Phd Workshop 2013
[1] D. Braga, A. Campi, and S. Ceri. Xqbe (xquery by example): A visual interface to the standard xml query language. ACM
Trans. Database Syst., 30(2):398–443, June 2005.
[2] D. Cai, S. Yu, J.-R. Wen, and W.-Y. Ma. Extracting content structure for web pages based on visual representation. In
Proceedings of the 5th APWeb, pages 406–417. Springer-Verlag, 2003.
[3] M.-A. Cartright, R. W. White, and E. Horvitz. Intentions and attention in exploratory health search. In Proceedings of the 34th
Intl. ACM SIGIR conference, pages 65–74, New York, NY, USA, 2011.ACM.
[4] S. Cohen, Y. Kanza, Y. Kogan, W. Nutt, Y. Sagiv, and A. Serebrenik. Equix-a search and query language for xml. Journal of
the American Society for Information Science and Technology, 53:2002, 2000.
[5] S. M. Freire, E. Sundvall, D. Karlsson, and P. Lambrix. Performance of XML Databases for Epidemiological Queries in
Archetype-Based EHRs. In Proceedings Scandinavian Conference on Health Informatics 2012, volume 70 of Linkping Electronic
Conference Proceedings, pages 51–57. Linkping University Electronic Press, 2012.
[6] M. Gschwandtner, M. Kritz, and C. Boyer. Requirements of the health professional research. In Technical Report D8.1.2.
Khresmoi Project, 2011.
[7] A. Hanbury. Medical information retrieval, an instance of domain. In SIGIR'12. ACM, August 2012.
[8] S. Hunt, J. J. Cimino, and D. E. Koziol. A comparison of clinicians’s access to online knowledge resources using two types of
information retrieval applications in an academic hospital setting. J Med Libr Assoc, 101(1):26–31, 2013.
[9] http://www.who.int/classifications/icd/en/, 2011.
[10] M. Jayapandian and H. V. Jagadish. Automating the design and construction of query forms. ICDE, page 125, 2006.
[11] F. Li and H. V. Jagadish. Usability, databases, and hci. IEEE Data Eng. Bull., 35(3):37–45, 2012. [12] http://loinc.org/, 2011.
[13] A. Marian and W. Wang. Flexible querying of personal information. IEEE Data Eng. Bull., 32(2):20–27, 2009.
[14] http://www.nlm.nih.gov/bsd/pmresources.html, 2011.
[15] http://www.nlm.nih.gov/medlineplus/, 2009.
[16] http://www.linkedin.com/groups/ Choice-OpenEHR-persistence-layer-144276.S.208531138?qid=208adbca-fc26-4ada-bf02-
7efe5a9e5661&trk=group_most_recent_rich-0-b-ttl&goback=%2Egmr_144276, 2013.
to Advance Knowledge for Humanity
References (2) [17] http://www.ncbi.nlm.nih.gov/pubmed, 2011.
[18] S. A. Rahman, S. Bhalla, and T. Hashimoto. Query-by-object interface for information requirement elicitation in m-commerce. Int.
J. Hum. Comput. Interaction, 20(2):135–160, 2006.
[19] X. Y. Raymond, Y. Lau, D. Song, X. Li, and J. Ma. Toward a semantic granularity model for domain-specific information retrieval.
ACM Trans. On Information Systems., 29(3), July 2011.
[20] S. Sachdeva and S. Bhalla. Implementing high-level query language interfaces for archetype-based electronic health records
database. In COMAD, 2009.
[21] http://www.ihtsdo.org/snomed-ct/, 2011.
[22] R. Varadarajan, V. Hristidis, and T. Li. Beyond single-page web search results. IEEE Transactions on Knowledge and Data
Engineering, 20(3):411–424, 2008.
[23] A. Yasir, M. Kumara Swamy, P. Krishna Reddy, and S. Bhalla. Enhanced query-by-object approach for information requirement
elicitation in large databases. In Big Data Analytics, volume 7678 of Lecture Notes in Computer Science, pages 26–41. Springer, 2012.
9/23/2013 VLDB Phd Workshop 2013 36
to Advance Knowledge for Humanity
Questions
37 9/23/2013 VLDB Phd Workshop 2013