domain-specific multi-stage query language for medical document repositories
TRANSCRIPT
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
Aastha MadaanUniversity of Aizu
1
Domain Specific Multi-stage Query Language Domain Specific Multi-stage Query Language for Medical Document Repositoriesfor Medical Document Repositories
05/01/23 VLDB Phd Workshop 2013
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
IntroductionIntroduction Specialized Domains Biomedical, agriculture, medical/healthcare
Insufficient Search Engine-like Keywords based search
Require Effective search and query mechanisms
Medical domain
Medical professionals Specific technical articles (particular topic/sub-topics )General public General information (disease or medicine).
How to retrieve medical information effectively
05/01/23 2VLDB Phd Workshop 2013
Query ”general AIDS information” � medical search tool, such as PubMed Result 1000s of documents Different aspects of AIDS⊆ Such as , treatment, drug therapy, transmission, diagnosis, and history
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
05/01/23 3VLDB Phd Workshop 2013
Introduction (1)Introduction (1) Medical Information
Knowledge Evolved over 10s of yearsContains Well defined terms and processes
Available on the WebPatient Specific InformationKnowledge-based Information
Medical Literature
Web Documents
Patient-encounterRecordings
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
Complexity Knowledge-based Resources
Heterogeneous End-user groups Patients, researchers, doctors and other experts
Variation Information Requirements Patient-treatment, self -diagnosis, general health information
Structure Medical Documents Scientific papers, encyclopedias and other literature Unique, well-defined
05/01/23 VLDB Phd Workshop 2013 4
Introduction (2) Introduction (2)
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
Introduction (3): Specialized DocumentsIntroduction (3): Specialized DocumentsCase of medical encyclopedias
Comprehensive medical guide Patients and clinicians
Authoritative source NLM (National Library of Medicine)
Paper based resources Electronic format
Examples MedlinePlus, WebMD, ADAMS, Merriam-Webster Medical Dictionary
05/01/23 VLDB Phd Workshop 2013 5
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
Introduction (4): Why QueryIntroduction (4): Why Query External knowledge base Clinicians
Evidence based medicine
During different stages of point-of-care
Patient Assessment plan
Treatment based on Patient diagnosis
Improve Quality of Care
Authoritative information required
Self Diagnosis Patients and their relatives During Early appearance of symptoms/post-checkups Enhance Personal Knowledge
05/01/23 VLDB Phd Workshop 2013 6
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
The Underlying StructureThe Underlying Structure
05/01/23 VLDB Phd Workshop 2013 7
The Hierarchical StructureTopic of the Document
SubtopicsMiscellaneous/Related
Content
Subtopic 1 Subtopic 2 Subtopic n Content topic 1 Content topic 2 Content topic n
Content Content Content Content Content Content
Flow of Contents Organized stages of point-of-care
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
05/01/23 8VLDB Phd Workshop 2013
Introduction (5): End UsersIntroduction (5): End Users Variable
a. Demographical Characteristicsb. Tasks/Purposec. Computer/Domain Expertise
Practitioners and Researchers Well-versed
Domain knowledge and terminologies Require
Precise, complete, accurate and timely results
Patients and their relatives NOT Well-versed
Domain knowledge and terminologies Require
General information
Healthcare Workers
SpecializedResearchers
Patients, their relatives
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
Evidence-based Queries Intent: Diagnostic Raised by: Clinicians/Experts Target resources: Online Medical Repositories (e.g. medical encyclopedia) Example: “Cases where helicobacter bacteria causes peptic ulcer”
Hypothesis-directed Queries Intent: Non-diagnostic Raised by: Novice users/patients Target resources: Online Medical Repositories (e.g. medical encyclopedia) Example: “Treatment in case of high fever and dizziness”
05/01/23 VLDB Phd Workshop 2013 9
Medical QueriesMedical Queries
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
Query FlowsQuery Flows
05/01/23 VLDB Phd Workshop 2013 10
Occurrence Evidence-based and hypothesis-directed queries Represent Stages of information seeking Comprise Varying levels of query complexity
1. 2. 3. 4.
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
05/01/23 VLDB Phd Workshop 2013 11
Query: Find chances of "Cancer Risk" in patients showing symptom "Sleep Deprivation" and have been exposed to "Radiation" (but not "Environmental Toxins" and does not have "Genetic Disorder") .
Help Needed
Research GapResearch Gap
Results Large in number, irrelevantFailure Keyword search, domain-specific search toolsRequire Precise and easy-to-use database style query methods
Key steps:
1. Schema understandable by users 2. Identify Resources to query3. Identify Granularity of results
Healthcare Expert
Paper-based resources
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
05/01/23 VLDB Phd Workshop 2013 12
Aim: Query Online Medical Information Effectively
Transform Document Repository User-Level Schema
Enable High-level Query Language
Target Audience Skilled and semi-skilled users
Utilize Query capabilities of a database query language
Facilitate In-depth Queries and Granular Results
Bridging the GapBridging the Gap
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
Query the New WayQuery the New Way
05/01/23 VLDB Phd Workshop 2013 13
User-levelUser-levelSchemaSchema
High-level Query High-level Query language language
Traditional Method
Proposed Method
ResourceResource
KeywordSearch
Query Method
Medical Expert
Medical Expert
Results -Lack specificity-Long list of full documents-Trustworthiness of resources unknown
Results-Specific, granular-Segments of documents query criteria-Trustworthy/Authoritative sources only
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
05/01/23 14VLDB Phd Workshop 2013
Proposed ApproachProposed Approach
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
05/01/23 VLDB Phd Workshop 2013 15
Key Features Key Features User-Level Schema Offline Process Universal , concept-level schema Attributes
Understandable Domain experts and novice users Provide Granular, context-based results Use Web segmentation algorithm, Domain concepts
Multi-stage Query Language Online Process Map multi-stage diagnostic process Step-by-step Query Flow Interactive Querying Continuous query refinement
View Results Add concept View Results
Support Simple, Medium, Complex, Recursive Queries Use User-level schema
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
05/01/23 VLDB Phd Workshop 2013 16
OutlineOutlineTwo-step Framework
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
1705/01/23 VLDB Phd Workshop 2013
Data ModelData Model
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
Data ModelData ModelTree Structured Repository
05/01/23 VLDB Phd Workshop 2013 18
H1H1
f1f1
f2f2
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
Data Model (1)Data Model (1)
05/01/23 VLDB Phd Workshop 2013 19
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
Data Model (2): SchemaData Model (2): Schema
05/01/23 VLDB Phd Workshop 2013 20
Attributes Diagnostic concepts/terms Stages of point-of-care
Do not change frequently
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
Data Model (3): A XML DocumentData Model (3): A XML Document
05/01/23 VLDB Phd Workshop 2013 21
Title CausesSymptoms
Treatment
Document corresponding to “Aarskog Syndrome”
MedlinePlus Medical Encyclopedia
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
Data Model (4): Query EffortData Model (4): Query Effort Query: Find if "Oxygen therapy" work for the treatment of "Chronic
Respiratory Failure" and symptoms are "Lethargy" OR "Shortness of breath”.
05/01/23 VLDB Phd Workshop 2013 22
Advanced keyword search Proposed MethodSELECT attribute = “Treatment” WHEREAttribute “Disease_name” = “Chronic Respiratory Failure”ANDAttribute “Treatment” = “Oxygen therapy”ANDAttribute “Symptoms” =“Lethargy”OR Attribute “Symptoms” = “Shortness of breath”
Easy-to-UseNot PossibleResult segment
Context of user-query
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
Data Model (5): Granular ResultsData Model (5): Granular Results
05/01/23 VLDB Phd Workshop 2013 23
Queried Attributes/Segments Query Results
Context Granular
Each result is a segment, combination of Concept/context in query Item of concern (content enclosed in a segment)
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
An ExampleAn ExampleQuery: Find other symptoms where “chronic kidney failure” is caused by “anemia”
Queried segment Symptoms
Segments in Query Causes = “anemia” and Disease_name = “Chronic kidney failure”
Result Segment Symptoms
Context disease_name = “chronic kidney failure” & causes = “anemia”
05/01/23 VLDB Phd Workshop 2013 24
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
05/01/23 VLDB Phd Workshop 2013 25
Next Step Next Step Multi-stage Query Language Multi-stage Query Language
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
Proposed Query Language (1)Proposed Query Language (1) XQBE [Braga, 2005] User level schema
Create queries Drag and drop interface
Query : “Find cases where a person is inflicted with “peptic ulcer” due to “helicobacter pylon bacteria”
05/01/23 VLDB Phd Workshop 2013 26
Attributes understandable by end users
1. Case = disease_name Value = ??
2. Due to = Causes Value = “helicobacter pylon bacteria”
3. Inflicted with = Symptoms Value = “peptic ulcer”
Query Effort Minimal learning curve Computer-expertise not required
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
Multi-stage Query-by-Concept Concept Query-able attribute Topic, sub-topic, medical concept
Query Effort Dynamic selection of attributes No computer expertise
Query Process
05/01/23 VLDB Phd Workshop 2013 27
Proposed Query Language (2)Proposed Query Language (2)
An Example: Cases where fever is caused due to infliction of Pneumonia and
Tuberculosis
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
Another ExampleAnother Example Query: Find cases where 3 clinical concepts (“cough”, “no
sore-throat”, and “no sterol injection”) occur in context of symptoms along with a sub-concept (“non sterol injection at the left side”)
XQBE on Specialized Medical Repositories
Multi-stage Query-by-concept Query Language
05/01/23 VLDB Phd Workshop 2013 28
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
Evaluation Plan Evaluation Plan
05/01/23 VLDB Phd Workshop 2013 29
Data Sets Document repository
MedlinePlus Health topics (900+) , encyclopedia (4000+), drugs (12000+) Set of Queries
50 test queries (multi-staged) Using literature survey and consultation with medical users
Quantitative Studies Evaluation Metrics
Accuracy of segment extraction (schema creation) Precision and Recall Reduction in search space Query Results
Qualitative Studies Usability Studies
Actual End-users Query Performance
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
Initial AchievementsInitial Achievements
HTML documents XML schema as per proposed model
XQuery on XML
Integration with XQBE
Query by concept Enumeration using paper and pencil
05/01/23 VLDB Phd Workshop 2013 30
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
3105/01/23 VLDB Phd Workshop 2013
ChallengesChallenges
Scalability Schema extraction of similar repositories
Understand and Implement Query operations needed
Understand User characteristics to be considered
User Interface Query Language
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
3205/01/23 VLDB Phd Workshop 2013
Related WorkRelated Work Domain-specific Information Retrieval [Yan, 2011]
Similarity and popularity based models Insufficient for domain experts “Information granulation” needs to be considered in huge document repositories
Form-based Query Interfaces [Jayapandian, 2009] Easy-to-use Limited access to the database Complex queries large number of forms Varying medical concepts large number of fields in forms
Beyond single page web search results [Varadarajan, 2008] Provide granular results for user’s search Return segments from multiple or related web documents as results
High-level Graphical Query Languages [Braga, 2005] Easy-to-use and understand Little or no programming effort required by the user Common languages QBE, XQBE
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
Summary and Conclusions Summary and Conclusions
3305/01/23 VLDB Phd Workshop 2013
Proposed Multi-stage Query Language
1. Aim Making Online medical information usable
2. Transformation User-Level Schema
3. Facilitates Granular/Context-based Results
4. Support Healthcare Experts
5. Minimize Learning curve for novice users
6. Reduce Dependency on keyword based searches
Provide Web user level activity Healthcare experts no or little programming effort
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
References (1)References (1)
3405/01/23 VLDB Phd Workshop 2013
[1] D. Braga, A. Campi, and S. Ceri. Xqbe (xquery by example): A visual interface to the standard xml query language. ACM Trans. Database Syst., 30(2):398–443, June 2005.[2] D. Cai, S. Yu, J.-R. Wen, and W.-Y. Ma. Extracting content structure for web pages based on visual representation. In Proceedings of the 5th APWeb, pages 406–417. Springer-Verlag, 2003.[3] M.-A. Cartright, R. W. White, and E. Horvitz. Intentions and attention in exploratory health search. In Proceedings of the 34th Intl. ACM SIGIR conference, pages 65–74, New York, NY, USA, 2011.ACM.[4] S. Cohen, Y. Kanza, Y. Kogan, W. Nutt, Y. Sagiv, and A. Serebrenik. Equix-a search and query language for xml. Journal of the American Society for Information Science and Technology, 53:2002, 2000.[5] S. M. Freire, E. Sundvall, D. Karlsson, and P. Lambrix. Performance of XML Databases for Epidemiological Queries in Archetype-Based EHRs. In Proceedings Scandinavian Conference on Health Informatics 2012 , volume 70 of Linkping ElectronicConference Proceedings, pages 51–57. Linkping University Electronic Press, 2012.[6] M. Gschwandtner, M. Kritz, and C. Boyer. Requirements of the health professional research. In Technical Report D8.1.2. Khresmoi Project, 2011.[7] A. Hanbury. Medical information retrieval, an instance of domain. In SIGIR'12. ACM, August 2012.[8] S. Hunt, J. J. Cimino, and D. E. Koziol. A comparison of clinicians’s access to online knowledge resources using two types of information retrieval applications in an academic hospital setting. J Med Libr Assoc, 101(1):26–31, 2013.[9] http://www.who.int/classifications/icd/en/, 2011.[10] M. Jayapandian and H. V. Jagadish. Automating the design and construction of query forms. ICDE, page 125, 2006.[11] F. Li and H. V. Jagadish. Usability, databases, and hci. IEEE Data Eng. Bull., 35(3):37–45, 2012. [12] http://loinc.org/, 2011.[13] A. Marian and W. Wang. Flexible querying of personal information. IEEE Data Eng. Bull., 32(2):20–27, 2009.[14] http://www.nlm.nih.gov/bsd/pmresources.html, 2011.[15] http://www.nlm.nih.gov/medlineplus/, 2009.[16] http://www.linkedin.com/groups/ Choice-OpenEHR-persistence-layer-144276.S.208531138?qid=208adbca-fc26-4ada-bf02-7efe5a9e5661&trk=group_most_recent_rich-0-b-ttl&goback=%2Egmr_144276, 2013.
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
References (2)References (2)[17] http://www.ncbi.nlm.nih.gov/pubmed, 2011.[18] S. A. Rahman, S. Bhalla, and T. Hashimoto. Query-by-object interface for information requirement elicitation in m-commerce. Int. J. Hum. Comput. Interaction, 20(2):135–160, 2006.[19] X. Y. Raymond, Y. Lau, D. Song, X. Li, and J. Ma. Toward a semantic granularity model for domain-specific information retrieval. ACM Trans. On Information Systems., 29(3), July 2011.[20] S. Sachdeva and S. Bhalla. Implementing high-level query language interfaces for archetype-based electronic health records database. In COMAD, 2009.[21] http://www.ihtsdo.org/snomed-ct/, 2011.[22] R. Varadarajan, V. Hristidis, and T. Li. Beyond single-page web search results. IEEE Transactions on Knowledge and Data Engineering, 20(3):411–424, 2008.[23] A. Yasir, M. Kumara Swamy, P. Krishna Reddy, and S. Bhalla. Enhanced query-by-object approach for information requirement elicitation in large databases. In Big Data Analytics, volume 7678 of Lecture Notes in Computer Science, pages 26–41. Springer, 2012.[24] M. Jayapandian, H.V. Jagadish : Automating the Design and Construction of Query Forms. IEEE Trans. Knowl. Data Eng. 21(10) :1389-1402 (2009).
05/01/23 VLDB Phd Workshop 2013 35
to Advance Knowledge for Humanityto Advance Knowledge for Humanity
QuestionsQuestions
3605/01/23 VLDB Phd Workshop 2013