semantic web technologies as a framework for clinical informatics

Post on 13-Dec-2014

Semantic Web Technologies as a Framework for Clinical Informatics

Chimezie Ogbuji (CCF)

Chris Pierce (CCF)

Chris Deaton (Cycorp)

Semantic Technology Conference, 16 June 2009

Me

I work in the Heart and Vascular Institute at the Cleveland Clinic

We store and query patient populations as an RDF dataset

Ph.D. student at Case Western Reserve University

Researching medical informatics methodology

Outline

Relevant methods in clinical informatics

Traditional challenges in cohort identification

RDF dataset and managing patient populations

Our cohort identification system

Challenges with current standards

What is Informatics?

The science of information:
• Gathering
• Analysis
• Representation

Scientific method:
• Body of techniques for investigating phenomena, acquiring new knowledge, or correcting and integrating previous knowledge.

Bioinformatics

Bioinformatics:
• Discipline of gathering, analyzing, and representing the structure and function of genes and proteins and correlating these to disease and population variation.

Medical Informatics

Medical informatics:
• Discipline of gathering, analyzing, and representing longitudinal patient studies in health and disease while providing decision support or predictive tools to assist in the diagnosis and prognosis of clinical patient care.

Cohort Studies

Longitudinal study:
• Research study that involves repeated observations of the same items over long periods of time.

Cohort:
• Group of subjects, most often humans from a given population, characterized by the experience of an event in a particular time span.

Retrospective Cohort Studies

Observational clinical study:
• A longitudinal study that looks back in time

Dependent on curated patient record content

We primarily do observational studies from our cardiothoracic patient registry

Reasoning Methods in Biomedical Informatics

Areas of Applied Ontology

Controlled vocabulary standards and management

Reporting and export of patient record content for analysis and aggregation

Population-based research
• Identification of cohorts

Challenges in Traditional Cohort Identification

Domain-specific criteria are conceived by researchers who dialog with DBA(s)
• DBAs translate this into joins, aggregation, text matching, etc.

Mostly an exercise in navigation of data structure

Organization of content cannot easily evolve

Patient Records

Computer-based Patient Record:
• An electronic patient record that resides in a system specifically designed to support users through availability of complete and accurate data, practitioner reminders and alerts, clinical decision support systems, links to bodies of medical knowledge, and other aids.

Patient Records Cont.

Longitudinal patient record:
• Patient records from different times, providers, and sites of care that are linked to form a lifelong view of a patient’s health care experience or a single patient record system with the same characteristics.

RDF Datasets

• “A SPARQL query is executed against an RDF Dataset which represents a collection of graphs. An RDF Dataset comprises one graph, the default graph, which does not have a name, and zero or more named graphs, where each named graph is identified by an IRI.”

RDF Datasets Cont.

Similar to a document collection in XPath 2.0

The GRAPH operator can be used to scope query patterns to a particular graph or to range over all named graphs
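
The dataset structure and the two uses of GRAPH can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: the `Dataset` class, its method names, and the `ex:` IRIs are hypothetical, and `None` stands in for a query variable. It is not how a SPARQL engine is implemented.

```python
from typing import Dict, Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


class Dataset:
    """Toy RDF dataset: one unnamed default graph plus named graphs keyed by IRI."""

    def __init__(self) -> None:
        self.default_graph: Set[Triple] = set()
        self.named_graphs: Dict[str, Set[Triple]] = {}

    def add(self, triple: Triple, graph: Optional[str] = None) -> None:
        """Add a triple to the named graph `graph`, or to the default graph if None."""
        if graph is None:
            self.default_graph.add(triple)
        else:
            self.named_graphs.setdefault(graph, set()).add(triple)

    def graph_match(
        self, pattern: Pattern, graph: Optional[str] = None
    ) -> Iterator[Tuple[str, Triple]]:
        """Mimic GRAPH: a fixed IRI scopes matching to one named graph,
        while None (a variable) ranges over every named graph."""
        names = [graph] if graph is not None else sorted(self.named_graphs)
        for name in names:
            for triple in sorted(self.named_graphs.get(name, ())):
                # None in the pattern is a wildcard position.
                if all(p is None or p == t for p, t in zip(pattern, triple)):
                    yield name, triple


# Hypothetical data: two patient record graphs.
ds = Dataset()
ds.add(("ex:rec1", "rdf:type", "ptrec:PatientRecord"), graph="ex:rec1")
ds.add(("ex:rec2", "rdf:type", "ptrec:PatientRecord"), graph="ex:rec2")
ds.add(("ex:rec2", "dnode:contains", "ex:patient2"), graph="ex:rec2")

# GRAPH ?g { ?s rdf:type ptrec:PatientRecord }: ranges over all named graphs.
records = [g for g, _ in ds.graph_match((None, "rdf:type", "ptrec:PatientRecord"))]

# GRAPH ex:rec2 { ?s ?p ?o }: scoped to a single named graph.
scoped = list(ds.graph_match((None, None, None), graph="ex:rec2"))
```

Here `records` lists the IRI of every named graph that matched, and `scoped` holds only triples from `ex:rec2`.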

SPARQL & Cohort Identification

One named graph per patient record (a patient record graph)

Each patient record graph is allocated a URI

No significant cross-graph statements
• Beyond cohort identification, most processing happens within a single patient record graph

Use of Named Graphs

In our vocabulary, there are instances of PatientRecord, Operation, Patient, etc.

PatientRecord resources share a URI with their containing graph

GRAPH operator can be used to optimize the search space

Use of Named Graphs Cont.

Easy to parallelize computation and optimal for cohort querying
• Constraints in the first part of the query are cross-graph, while those in the second part are intra-graph
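
A minimal sketch of this two-phase pattern, assuming one named graph per patient record: a cross-graph pass picks candidate record graphs, then each record is processed independently, which is what makes the work easy to parallelize. The `ex:` IRIs, the sample values, and both function names are hypothetical. The `ptrec:` terms are borrowed from the SPARQL query later in the slides.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical dataset: graph IRI -> set of (subject, predicate, object) triples.
dataset = {
    "ex:rec1": {("ex:p1", "ptrec:hasSex", "ptrec:Sex_female"),
                ("ex:echo1", "ptrec:hasLeftAtriumDiameter", 4.1)},
    "ex:rec2": {("ex:p2", "ptrec:hasSex", "ptrec:Sex_male"),
                ("ex:echo2", "ptrec:hasLeftAtriumDiameter", 4.0)},
    "ex:rec3": {("ex:p3", "ptrec:hasSex", "ptrec:Sex_female"),
                ("ex:echo3", "ptrec:hasLeftAtriumDiameter", 3.5)},
}


def select_records(dataset, predicate, obj):
    """Phase 1 (cross-graph): IRIs of graphs holding a triple with this predicate and object."""
    return sorted(
        iri for iri, triples in dataset.items()
        if any(p == predicate and o == obj for _, p, o in triples)
    )


def max_left_atrium_diameter(triples):
    """Phase 2 (intra-graph): largest left atrium diameter in one record, or None."""
    values = [o for _, p, o in triples if p == "ptrec:hasLeftAtriumDiameter"]
    return max(values) if values else None


# Phase 1 touches every graph; phase 2 only needs one graph per task.
candidates = select_records(dataset, "ptrec:hasSex", "ptrec:Sex_female")
with ThreadPoolExecutor() as pool:
    diameters = list(
        pool.map(lambda iri: max_left_atrium_diameter(dataset[iri]), candidates)
    )

cohort = [iri for iri, d in zip(candidates, diameters) if d is not None and d > 3.8]
```

Because no statement spans two record graphs, the phase 2 tasks share no state and can run in any order.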

Patient Record Ontology

3974+ OWL Classes, 171 Object properties, and 217 Datatype properties

Diseases, findings, symptoms, medication, procedures, etc…

SHOIN(D) expressiveness
• OWL-DL

Ontology: Diagnoses

Ontology: Coronary Anatomy

Ontology: Pathogens

Ontologies: Family History

Integration with Cyc KB

Patient record ontology is aligned to Cyc common sense ontology

Lexical metadata are added to facilitate natural language processing

Cyc SKSI protocol was extended to support SPARQL

Semantic Research Assistant

Cyc-based medical expert system for cohort identification

Natural-language driven interface composes logical queries

Queries are generated against a SPARQL Protocol service

Leverages ontology alignment
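
For readers unfamiliar with the SPARQL Protocol, a query is typically sent over HTTP, for example as a GET request carrying the query text in a `query` parameter. The sketch below only builds such a request and does not send it. The endpoint URL and the sample query are hypothetical and are not taken from the system described here.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; the real service address is not given in the slides.
ENDPOINT = "http://example.org/sparql"


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL Protocol query-via-GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard XML serialization of SELECT results.
    return Request(url, headers={"Accept": "application/sparql-results+xml"})


# Illustrative query: list named graphs that contain at least one triple.
query = "SELECT ?record WHERE { GRAPH ?record { ?s ?p ?o } } LIMIT 10"
request = build_sparql_request(ENDPOINT, query)
```

Passing `request` to `urllib.request.urlopen` would submit it. A generated cohort query would take the place of the sample `query` string.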

“Semantic Interface”

OWL serves as the schema for a cohort’s SPARQL protocol service

SPARQL is the query interlingua

The Cyc KB’s common sense ontology and NLP capabilities shield the researcher from SPARQL, RDF, and OWL

Screenshots

CycL Query

(thereExists ?ID
  (thereExists ?PATIENT
    (and
      (cCFhasLeftAtriumDiameter ?CATH-OR-ECHO ?DISTANCE)
      (patientTreated ?CATH-OR-ECHO ?PATIENT)
      (cCFCCFID ?PATIENT ?ID)
      (isa ?CATH-OR-ECHO Echocardiogram)
      (patientTreated ?CATH-OR-ECHO ?PATIENT)
      (or
        (and (patientSex ?PATIENT MaleHuman)
             (greaterThan ?DISTANCE ((Centi Meter) 4.2)))
        (and (patientSex ?PATIENT FemaleHuman)
             (greaterThan ?DISTANCE ((Centi Meter) 3.8))))
      (temporallyBetween-Inclusive ?CATH-OR-ECHO
        (MonthFn January (YearFn 2008))
        (DayFn 15 (MonthFn March (YearFn 2008)))))))

SPARQL Queries

SELECT ?VAR0 ?VAR1 ?VAR2 ?VAR3 ?VAR4 ?VAR5 ?VAR6
WHERE {
  ?VAR0 ptrec:hasSex ptrec:Sex_female .
  ?VAR0 a ptrec:Patient .
  ?VAR1 dnode:contains ?VAR0 .
  ?VAR1 a ptrec:PatientRecord .
  ?VAR1 dnode:contains ?VAR2 .
  ?VAR2 a ptrec:Event_evaluation_echocardiogram .
  ?VAR2 ptrec:hasLeftAtriumDiameter ?VAR3 .
  FILTER (?VAR3 > xsd:float(3.8))
  ?VAR2 dnode:contains ?VAR4 .
  ?VAR4 a ptrec:EventStartDate .
  ?VAR4 ptrec:hasDateTimeMax ?VAR5 .
  FILTER (?VAR5 > xsd:dateTime("2007-12-31T23:59:59"))
  FILTER (xsd:dateTime("2008-03-16T00:00:00") > ?VAR5)
  ?VAR0 ptrec:hasCCFID ?VAR6 .
}
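
Read together, the CycL query and this SPARQL query select patients whose echocardiogram shows a left atrium diameter above a sex-specific threshold within a date window. The SPARQL shown covers only the female branch. A plain restatement of those criteria is sketched below. The function and constant names are hypothetical, and reading "January 2008" as starting on 1 January is an assumption, chosen because it matches the SPARQL date filters.

```python
from datetime import date

# Thresholds in centimetres, taken from the CycL query.
THRESHOLD_CM = {"male": 4.2, "female": 3.8}

# Inclusive window; the 1 January start is an assumption consistent
# with the SPARQL filter (> 2007-12-31T23:59:59).
WINDOW_START = date(2008, 1, 1)
WINDOW_END = date(2008, 3, 15)


def in_cohort(sex, left_atrium_diameter_cm, echo_date):
    """Return True if one echocardiogram satisfies the cohort criteria."""
    threshold = THRESHOLD_CM.get(sex)
    if threshold is None:
        return False  # the query only has male and female branches
    return (
        left_atrium_diameter_cm > threshold  # strictly greater, as in greaterThan
        and WINDOW_START <= echo_date <= WINDOW_END
    )
```

For instance, `in_cohort("female", 3.9, date(2008, 3, 15))` holds, while the same measurement dated 16 March 2008 falls outside the window.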

Challenges

Representing negation in SPARQL is painfully cumbersome
• Patients who had X but not Y

No equivalent of SQL’s IN operator
• Find patients who had a diagnosis of myocardial infarction, renal failure, or atrial fibrillation
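
Both gaps have well-known workarounds in the 2008 SPARQL recommendation: negation is written as an OPTIONAL pattern followed by `FILTER (!bound(...))`, and an IN-style test becomes a chain of `||` comparisons. The sketch below generates both idioms as query text. The helper names, the `ptrec:hasDiagnosis` and `ptrec:hasOperation` properties, and the diagnosis class names are hypothetical, and PREFIX declarations are omitted because the namespace IRIs are not given in the slides. (SPARQL 1.1, published after this talk, added NOT EXISTS, MINUS, and IN.)

```python
def one_of(variable, values):
    """Emulate SQL's IN with a FILTER over a chain of || comparisons."""
    return "FILTER (" + " || ".join(f"{variable} = {v}" for v in values) + ")"


def without(pattern, variable):
    """Emulate negation: make the pattern OPTIONAL, then require its variable to be unbound."""
    return f"OPTIONAL {{ {pattern} }} FILTER (!bound({variable}))"


# Hypothetical class names standing in for the three diagnoses on the slide.
diagnoses = [
    "ptrec:MyocardialInfarction",
    "ptrec:RenalFailure",
    "ptrec:AtrialFibrillation",
]

# Patients with any of the diagnoses (X) who have no recorded operation (not Y).
query = "\n".join([
    "SELECT ?patient WHERE {",
    "  ?patient ptrec:hasDiagnosis ?dx .",
    "  ?dx a ?dxType .",
    "  " + one_of("?dxType", diagnoses),
    "  " + without("?patient ptrec:hasOperation ?op .", "?op"),
    "}",
])
```

The generated text grows with each alternative and each negated pattern, which is the verbosity these two bullets describe.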

Challenges Cont.

SPARQL specification doesn’t allow matching blank nodes by name

No sufficient, readily-available medical record ontologies
• We created our own

Protocol doesn’t easily support a way to abort running queries

Questions?

Case Study: A Semantic Web Content Repository for Clinical Research

http://www.w3.org/2001/sw/sweo/public/UseCases/ClevelandClinic/

Email ogbujic@ccf.org for (updated) copy of slides
