Domain-specific Multi-stage Query Language for Medical Document Repositories

Aastha Madaan, University of Aizu ("To Advance Knowledge for Humanity")
VLDB PhD Workshop 2013



Page 2: Introduction

Specialized domains: biomedical, agriculture, medical/healthcare
Insufficient: search-engine-like, keyword-based search
Require: effective search and query mechanisms

Medical domain
  Medical professionals: specific technical articles (particular topics/sub-topics)
  General public: general information (about a disease or medicine)

How to retrieve medical information effectively?

Example: the query "general AIDS information", sent to a medical search tool such as PubMed, returns thousands of documents on different aspects of AIDS, such as treatment, drug therapy, transmission, diagnosis, and history.

Page 3: Introduction (1)

Medical information
  Knowledge: evolved over tens of years
  Contains: well-defined terms and processes
  Available on the Web: patient-specific information, knowledge-based information
  Sources shown: medical literature, Web documents, patient-encounter recordings

Page 4: Introduction (2)

Complexity: knowledge-based resources
Heterogeneous: end-user groups (patients, researchers, doctors and other experts)
Variation: information requirements (patient treatment, self-diagnosis, general health information)
Structure: medical documents (scientific papers, encyclopedias and other literature); unique, well-defined

Page 5: Introduction (3): Specialized Documents

Case of medical encyclopedias
  Comprehensive medical guide: patients and clinicians
  Authoritative source: NLM (National Library of Medicine)
  Paper-based resources: electronic format
  Examples: MedlinePlus, WebMD, ADAMS, Merriam-Webster Medical Dictionary

Page 6: Introduction (4): Why Query

External knowledge base

Clinicians: evidence-based medicine
  During: different stages of point-of-care
  Patient assessment plan
  Treatment based on patient diagnosis
  Improve quality of care
  Authoritative information required

Self-diagnosis: patients and their relatives
  During: early appearance of symptoms, post-checkups
  Enhance personal knowledge

Page 7: The Underlying Structure

The hierarchical structure
  Topic of the document
    Subtopics: Subtopic 1, Subtopic 2, ..., Subtopic n (each with its content)
    Miscellaneous/related content: Content topic 1, Content topic 2, ..., Content topic n (each with its content)

Flow of contents: organized stages of point-of-care
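
The hierarchy on this slide can be sketched as a small tree type. This is a minimal illustration, not the author's implementation: the `Node` class, the `paths` helper, and all labels and text below are assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of the document hierarchy: a topic, a subtopic, or content."""
    label: str
    text: str = ""                       # content text; empty for pure headings
    children: List["Node"] = field(default_factory=list)


def paths(node, prefix=()):
    """Yield (path, text) for every node that carries content text."""
    here = prefix + (node.label,)
    if node.text:
        yield here, node.text
    for child in node.children:
        yield from paths(child, here)


# Hypothetical placeholder document; labels and text are illustrative only.
doc = Node("Topic of the document", children=[
    Node("Subtopic 1", children=[Node("Content", "text of subtopic 1")]),
    Node("Subtopic 2", children=[Node("Content", "text of subtopic 2")]),
    Node("Miscellaneous/Related", children=[
        Node("Content topic 1", children=[Node("Content", "related text")]),
    ]),
])

for path, text in paths(doc):
    print(" > ".join(path), "=", text)
```

Walking the tree with the full path attached is one way to keep the topic and subtopic available as context for each piece of content.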

Page 8: Introduction (5): End Users

Variable
  a. Demographic characteristics
  b. Tasks/purpose
  c. Computer/domain expertise

Practitioners and researchers
  Well-versed: domain knowledge and terminologies
  Require: precise, complete, accurate and timely results

Patients and their relatives
  Not well-versed: domain knowledge and terminologies
  Require: general information

User groups shown: healthcare workers, specialized researchers, patients and their relatives

Page 9: Medical Queries

Evidence-based queries
  Intent: diagnostic
  Raised by: clinicians/experts
  Target resources: online medical repositories (e.g. medical encyclopedia)
  Example: "Cases where helicobacter bacteria causes peptic ulcer"

Hypothesis-directed queries
  Intent: non-diagnostic
  Raised by: novice users/patients
  Target resources: online medical repositories (e.g. medical encyclopedia)
  Example: "Treatment in case of high fever and dizziness"

Page 10: Query Flows

Occurrence: evidence-based and hypothesis-directed queries
Represent: stages of information seeking
Comprise: varying levels of query complexity

Page 11: Research Gap

Query: Find the chance of "Cancer Risk" in patients who show the symptom "Sleep Deprivation" and have been exposed to "Radiation" (but not to "Environmental Toxins", and who do not have a "Genetic Disorder").

Figure labels: healthcare expert, paper-based resources, help needed

Results: large in number, irrelevant
Failure: keyword search, domain-specific search tools
Require: precise and easy-to-use database-style query methods

Key steps:
1. Schema understandable by users
2. Identify resources to query
3. Identify granularity of results

Page 12: Bridging the Gap

Aim: query online medical information effectively
Transform: document repository into a user-level schema
Enable: high-level query language
Target audience: skilled and semi-skilled users
Utilize: query capabilities of a database query language
Facilitate: in-depth queries and granular results

Page 13: Query the New Way

Traditional method
  Query method: a medical expert runs a keyword search on the resource
  Results: lack specificity; long list of full documents; trustworthiness of resources unknown

Proposed method
  Query method: a medical expert uses a high-level query language over a user-level schema of the resource
  Results: specific, granular; segments of documents matching the query criteria; trustworthy/authoritative sources only

Page 14: Proposed Approach

Page 15: Key Features

User-level schema
  Offline process
  Universal, concept-level schema
  Attributes
    Understandable: domain experts and novice users
    Provide: granular, context-based results
    Use: Web segmentation algorithm, domain concepts

Multi-stage query language
  Online process
  Map: multi-stage diagnostic process
  Step-by-step query flow
  Interactive querying: continuous query refinement (view results, add concept, view results)
  Support: simple, medium, complex, recursive queries
  Use: user-level schema
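
The "view results, add concept, view results" loop can be illustrated with a short sketch. It is only one possible reading of the slide: the `MultiStageQuery` class, the attribute names, and the records are hypothetical and stand in for segments indexed by the user-level schema.

```python
# Hypothetical records keyed by user-level schema attributes (illustrative only).
RECORDS = [
    {"disease_name": "Disease A", "symptoms": {"fever", "cough"}, "causes": {"infection"}},
    {"disease_name": "Disease B", "symptoms": {"fever"}, "causes": {"allergy"}},
    {"disease_name": "Disease C", "symptoms": {"cough"}, "causes": {"infection"}},
]


class MultiStageQuery:
    """Keeps the current result set and narrows it one concept at a time."""

    def __init__(self, records):
        self.results = list(records)
        self.stages = []                 # (attribute, value) pairs added so far

    def add_concept(self, attribute, value):
        """Add one concept, refine the results, and return them for viewing."""
        self.stages.append((attribute, value))
        self.results = [r for r in self.results if value in r[attribute]]
        return self.results


query = MultiStageQuery(RECORDS)
stage1 = query.add_concept("symptoms", "fever")      # view results
stage2 = query.add_concept("causes", "infection")    # add concept, view again
print([r["disease_name"] for r in stage1])
print([r["disease_name"] for r in stage2])
```

Each stage filters the previous stage's results rather than the whole repository, which is what makes the refinement step-by-step.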

Page 16: Outline

Two-step framework

Page 17: Data Model

Page 18: Data Model

Tree-structured repository (figure labels: H1, f1, f2)

Page 19: Data Model (1)

Page 20: Data Model (2): Schema

Attributes: diagnostic concepts/terms, stages of point-of-care
Do not change frequently

Page 21: Data Model (3): An XML Document

Document corresponding to "Aarskog Syndrome" (MedlinePlus Medical Encyclopedia)
Segments shown: Title, Causes, Symptoms, Treatment
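
As a rough illustration of a document stored as XML segments, the sketch below parses a small document with Python's standard `xml.etree.ElementTree`. The element names and placeholder text are assumptions made for this example; the slide names only the segments (Title, Causes, Symptoms, Treatment), not the actual schema or content.

```python
import xml.etree.ElementTree as ET

# Assumed layout: one element per segment named on the slide. The element
# names and the placeholder text are illustrative, not the author's schema.
XML_DOC = """
<document>
  <title>Aarskog Syndrome</title>
  <causes>placeholder text for the Causes segment</causes>
  <symptoms>placeholder text for the Symptoms segment</symptoms>
  <treatment>placeholder text for the Treatment segment</treatment>
</document>
"""

root = ET.fromstring(XML_DOC.strip())

# Collect every segment as attribute name -> text.
segments = {child.tag: child.text.strip() for child in root}

print(segments["title"])
print(sorted(segments))
```

With segments addressable by name, a query can return a single segment instead of the whole document.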

Page 22: Data Model (4): Query Effort

Query: Find whether "Oxygen therapy" works for the treatment of "Chronic Respiratory Failure" when the symptoms are "Lethargy" OR "Shortness of breath".

Advanced keyword search: not possible

Proposed method: easy to use
  SELECT attribute = "Treatment"
  WHERE attribute "Disease_name" = "Chronic Respiratory Failure"
  AND attribute "Treatment" = "Oxygen therapy"
  AND attribute "Symptoms" = "Lethargy"
  OR attribute "Symptoms" = "Shortness of breath"

Result: segment, with the context of the user query
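
The conditions of this query can be checked against records with a few lines of Python. This is a sketch under stated assumptions, not the proposed language itself: the records are hypothetical, `has`, `matches`, and `select` are helper names invented here, and the OR is assumed to group only the two symptom conditions, as the natural-language query suggests.

```python
# Hypothetical records; attribute names follow the slide, values are illustrative.
RECORDS = [
    {"Disease_name": "Chronic Respiratory Failure",
     "Treatment": ["Oxygen therapy"],
     "Symptoms": ["Lethargy", "Other symptom"]},
    {"Disease_name": "Chronic Respiratory Failure",
     "Treatment": ["Oxygen therapy"],
     "Symptoms": ["Other symptom"]},
    {"Disease_name": "Another disease",
     "Treatment": ["Oxygen therapy"],
     "Symptoms": ["Shortness of breath"]},
]


def has(record, attribute, value):
    """True if the attribute holds the value (single string or list)."""
    held = record[attribute]
    return value == held if isinstance(held, str) else value in held


def matches(record):
    # Assumption: the OR groups only the two symptom conditions.
    return (
        has(record, "Disease_name", "Chronic Respiratory Failure")
        and has(record, "Treatment", "Oxygen therapy")
        and (has(record, "Symptoms", "Lethargy")
             or has(record, "Symptoms", "Shortness of breath"))
    )


def select(records, attribute, predicate):
    """Return the selected attribute of every record that satisfies predicate."""
    return [r[attribute] for r in records if predicate(r)]


result = select(RECORDS, "Treatment", matches)
print(result)
```

Only the first record satisfies all conditions; the second lacks both symptoms and the third is a different disease.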

Page 23: Data Model (5): Granular Results

Queried attributes/segments produce the query results
Results are: context-based, granular

Each result is a segment, a combination of:
  Concept/context in the query
  Item of concern (content enclosed in a segment)

Page 24: An Example

Query: Find other symptoms where "chronic kidney failure" is caused by "anemia"

Queried segment: Symptoms
Segments in query: Causes = "anemia" and Disease_name = "Chronic kidney failure"
Result segment: Symptoms
Context: disease_name = "chronic kidney failure" & causes = "anemia"
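
The shape of a granular result (a segment plus the context it was found in) can be sketched as follows. The `Result` type, the `query` function, and the document contents are hypothetical; only the attribute names and the two conditions come from the example above.

```python
from typing import NamedTuple


class Result(NamedTuple):
    segment: str      # name of the queried segment
    content: str      # item of concern: content enclosed in that segment
    context: dict     # conditions from the query that the document met


# Hypothetical documents; the segment text is a placeholder, not medical content.
DOCS = [
    {"disease_name": "chronic kidney failure",
     "causes": "anemia",
     "symptoms": "placeholder text of the Symptoms segment"},
    {"disease_name": "chronic kidney failure",
     "causes": "other cause",
     "symptoms": "placeholder text of another Symptoms segment"},
]


def query(docs, queried_segment, **conditions):
    """Return one Result per document whose segments equal all conditions."""
    return [
        Result(queried_segment, d[queried_segment], dict(conditions))
        for d in docs
        if all(d.get(k) == v for k, v in conditions.items())
    ]


results = query(DOCS, "symptoms",
                disease_name="chronic kidney failure", causes="anemia")
for r in results:
    print(r.segment, "|", r.context, "|", r.content)
```

Returning the context with each segment lets the user see why that segment was selected without opening the full document.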

Page 25: Next Step

Multi-stage query language

Page 26: Proposed Query Language (1)

XQBE [Braga, 2005]: user-level schema
Create queries: drag-and-drop interface

Query: Find cases where a person is inflicted with "peptic ulcer" due to "helicobacter pylori bacteria"

Attributes understandable by end users:
1. Case = disease_name, value = ??
2. Due to = Causes, value = "helicobacter pylori bacteria"
3. Inflicted with = Symptoms, value = "peptic ulcer"

Query effort: minimal learning curve; computer expertise not required
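
The mapping from the user's phrases to schema attributes can be written down as a lookup table. The sketch below is an assumption about how such a translation might look; `PHRASE_TO_ATTRIBUTE` and `to_conditions` are names invented for the example, and `None` is used here to stand for the unknown value ("??").

```python
# Mapping of user phrases to schema attributes, as listed on the slide.
PHRASE_TO_ATTRIBUTE = {
    "case": "disease_name",
    "due to": "Causes",
    "inflicted with": "Symptoms",
}


def to_conditions(phrases):
    """Translate {user phrase: value} into {schema attribute: value}.

    A value of None marks the attribute the user wants returned ("??").
    """
    return {PHRASE_TO_ATTRIBUTE[p.lower()]: v for p, v in phrases.items()}


conditions = to_conditions({
    "Case": None,
    "Due to": "helicobacter pylori bacteria",
    "Inflicted with": "peptic ulcer",
})
wanted = [a for a, v in conditions.items() if v is None]
filters = {a: v for a, v in conditions.items() if v is not None}
print(wanted, filters)
```

Separating the wanted attribute from the filter conditions mirrors the split between the queried segment and the segments in the query.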

Page 27: Proposed Query Language (2)

Multi-stage query-by-concept
  Concept: query-able attribute (topic, sub-topic, medical concept)
  Query effort: dynamic selection of attributes; no computer expertise required
  Query process

An example: cases where fever is caused by infliction of pneumonia and tuberculosis

Page 28: Another Example

Query: Find cases where 3 clinical concepts ("cough", "no sore-throat", and "no sterol injection") occur in the context of symptoms, along with a sub-concept ("non sterol injection at the left side")

Shown in: XQBE on specialized medical repositories; multi-stage query-by-concept query language

Page 29: Evaluation Plan

Data sets
  Document repository: MedlinePlus health topics (900+), encyclopedia (4000+), drugs (12000+)
  Set of queries: 50 test queries (multi-staged), drawn from a literature survey and consultation with medical users

Quantitative studies
  Evaluation metrics: accuracy of segment extraction (schema creation), precision and recall, reduction in search space, query results

Qualitative studies
  Usability studies: actual end users, query performance
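
Precision and recall, named among the evaluation metrics, follow their standard definitions: precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved. The sketch below computes both; the segment identifiers are hypothetical and do not come from the planned evaluation.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for sets of retrieved and relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical segment identifiers, for illustration only.
retrieved = {"s1", "s2", "s3", "s4"}
relevant = {"s2", "s3", "s5"}
p, r = precision_recall(retrieved, relevant)
print(p, r)
```

Here two of the four retrieved segments are relevant and two of the three relevant segments are retrieved.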

Page 30: Initial Achievements

HTML documents: XML schema as per the proposed model
XQuery on XML
Integration with XQBE
Query by concept: enumeration using paper and pencil

Page 31: Challenges

Scalability: schema extraction of similar repositories
Understand and implement: query operations needed
Understand: user characteristics to be considered
User interface: query language

Page 32: Related Work

Domain-specific information retrieval [Yan, 2011]
  Similarity- and popularity-based models
  Insufficient for domain experts
  "Information granulation" needs to be considered in huge document repositories

Form-based query interfaces [Jayapandian, 2009]
  Easy to use
  Limited access to the database
  Complex queries: large number of forms
  Varying medical concepts: large number of fields in forms

Beyond single-page web search results [Varadarajan, 2008]
  Provide granular results for the user's search
  Return segments from multiple or related web documents as results

High-level graphical query languages [Braga, 2005]
  Easy to use and understand
  Little or no programming effort required by the user
  Common languages: QBE, XQBE

Page 33: Summary and Conclusions

Proposed multi-stage query language
1. Aim: making online medical information usable
2. Transformation: user-level schema
3. Facilitates: granular/context-based results
4. Support: healthcare experts
5. Minimize: learning curve for novice users
6. Reduce: dependency on keyword-based searches

Provide: Web-user-level activity; healthcare experts need little or no programming effort

Page 34: References (1)

[1] D. Braga, A. Campi, and S. Ceri. XQBE (XQuery By Example): A visual interface to the standard XML query language. ACM Trans. Database Syst., 30(2):398–443, June 2005.
[2] D. Cai, S. Yu, J.-R. Wen, and W.-Y. Ma. Extracting content structure for web pages based on visual representation. In Proceedings of the 5th APWeb, pages 406–417. Springer-Verlag, 2003.
[3] M.-A. Cartright, R. W. White, and E. Horvitz. Intentions and attention in exploratory health search. In Proceedings of the 34th Intl. ACM SIGIR Conference, pages 65–74, New York, NY, USA, 2011. ACM.
[4] S. Cohen, Y. Kanza, Y. Kogan, W. Nutt, Y. Sagiv, and A. Serebrenik. EquiX: A search and query language for XML. Journal of the American Society for Information Science and Technology, 53:2002, 2000.
[5] S. M. Freire, E. Sundvall, D. Karlsson, and P. Lambrix. Performance of XML databases for epidemiological queries in archetype-based EHRs. In Proceedings of the Scandinavian Conference on Health Informatics 2012, volume 70 of Linköping Electronic Conference Proceedings, pages 51–57. Linköping University Electronic Press, 2012.
[6] M. Gschwandtner, M. Kritz, and C. Boyer. Requirements of the health professional research. Technical Report D8.1.2, Khresmoi Project, 2011.
[7] A. Hanbury. Medical information retrieval, an instance of domain. In SIGIR'12. ACM, August 2012.
[8] S. Hunt, J. J. Cimino, and D. E. Koziol. A comparison of clinicians' access to online knowledge resources using two types of information retrieval applications in an academic hospital setting. J Med Libr Assoc, 101(1):26–31, 2013.
[9] http://www.who.int/classifications/icd/en/, 2011.
[10] M. Jayapandian and H. V. Jagadish. Automating the design and construction of query forms. ICDE, page 125, 2006.
[11] F. Li and H. V. Jagadish. Usability, databases, and HCI. IEEE Data Eng. Bull., 35(3):37–45, 2012.
[12] http://loinc.org/, 2011.
[13] A. Marian and W. Wang. Flexible querying of personal information. IEEE Data Eng. Bull., 32(2):20–27, 2009.
[14] http://www.nlm.nih.gov/bsd/pmresources.html, 2011.
[15] http://www.nlm.nih.gov/medlineplus/, 2009.
[16] http://www.linkedin.com/groups/Choice-OpenEHR-persistence-layer-144276.S.208531138?qid=208adbca-fc26-4ada-bf02-7efe5a9e5661&trk=group_most_recent_rich-0-b-ttl&goback=%2Egmr_144276, 2013.

Page 35: References (2)

[17] http://www.ncbi.nlm.nih.gov/pubmed, 2011.
[18] S. A. Rahman, S. Bhalla, and T. Hashimoto. Query-by-object interface for information requirement elicitation in m-commerce. Int. J. Hum. Comput. Interaction, 20(2):135–160, 2006.
[19] X. Yan, R. Y. K. Lau, D. Song, X. Li, and J. Ma. Toward a semantic granularity model for domain-specific information retrieval. ACM Trans. on Information Systems, 29(3), July 2011.
[20] S. Sachdeva and S. Bhalla. Implementing high-level query language interfaces for archetype-based electronic health records database. In COMAD, 2009.
[21] http://www.ihtsdo.org/snomed-ct/, 2011.
[22] R. Varadarajan, V. Hristidis, and T. Li. Beyond single-page web search results. IEEE Transactions on Knowledge and Data Engineering, 20(3):411–424, 2008.
[23] A. Yasir, M. Kumara Swamy, P. Krishna Reddy, and S. Bhalla. Enhanced query-by-object approach for information requirement elicitation in large databases. In Big Data Analytics, volume 7678 of Lecture Notes in Computer Science, pages 26–41. Springer, 2012.
[24] M. Jayapandian and H. V. Jagadish. Automating the design and construction of query forms. IEEE Trans. Knowl. Data Eng., 21(10):1389–1402, 2009.

Page 36: Questions