tools for next generation of cms: xml, rdf, & grddl

Post on 12-Jan-2016

Tools for Next Generation of CMS: XML, RDF, & GRDDL

Chimezie Ogbuji (chee-meh)

Cleveland Clinic Foundation

Cardiothoracic Surgery Research

ogbujic@ccf.org / chimezie@gmail.com

Background (CT Research Roadmap)

A large, relational registry for Cardiothoracic procedures

Relatively small research department with very little software engineering experience

Traditional CMS and DBMS were insufficient

Initiated a large effort to convert to a metadata-driven XML / RDF repository (SemanticDB)

Need to replace a productive, integrated research pipeline

Data entry, clinical Q&A, patient follow-up, concurrent study management, ...

100+ research papers per year

Background (Institute of Medicine Proposal)

The Computer-Based Patient Record: An Essential Technology for Health Care ISBN: 0309055326

Old but very relevant set of requirements by the IOM (still unfulfilled).

A comprehensive attempt to address all the requirements: technological, clinical, procedural, etc.

Can be (completely) addressed with Semantic Web architecture, document processing, and “Web 2.0” architecture.

CPR: Functional Requirements

Uniform, extensible record content

(Standard) record formats

System performance

Linkages

Intelligence

Reporting capabilities

Security

Multi-views

Accessibility

Definitions: KR / CMS

What is Knowledge Representation (KR)?

What is a Knowledge Base (KB)?

A database system which facilitates deductive reasoning over a KR

Commonly called Rule-based Systems

What are Expert Systems?

What is a Content Management System (CMS)?

Knowledge Representation

Older ideas at corners, newer ideas along sides (Credit: Conrad Barski, M.D.)

Content Management System: The What

The terms CMS and Content Repository are essentially interchangeable

Modern content repositories are best characterized by JSR 170 / 283

“.. a high-level information management system that is a superset of traditional data repositories”

Integrated support for the XPath data model is the most prominent feature (native document management)

Content Repository Feature Set

Modern CMS standards cover document management effectively

Read/write access

Versioning

Event monitoring

Document-level access control

Concurrent access

Cross-linking

Profiles and Document Types

Anatomy of a JSR 170 Implementation

Jackrabbit

Component-based

Content Applications

Content Repository API

Implementation

Knowledge Bases and CMS

What of the requirements that Expert Systems meet?

Document management and knowledge management systems are historically isolated from each other

XML & RDF are contemporary manifestations of these methodologies

They have remained as isolated as their predecessors

They typically coincide only with regard to syntax

XML & RDF: Eating and Having your Cake

Classic example of where the document-oriented approach falls short: Modern EHR cannot facilitate dynamic research

Unified infrastructure for document and knowledge management is needed

One of the earliest examples: 4Suite Server version 0.10.0 (December 2000)

Current state of the art (GRDDL): Gleaning Resource Descriptions from Dialects of Language

GRDDL: The Elevator Pitch

Provides a way to normalize RDF concrete syntaxes

The problem:

Many RDF concrete syntaxes (RDF/XML, TriX, RDFa, ...)

The authoritative concrete syntax is not without issues

The solution:

Define mappings from XML dialects to RDF graphs

Use Turing-complete XML pipelines

English as a second language analogy
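The idea of mapping an XML dialect to an RDF graph can be sketched in a few lines. GRDDL transformations are usually written in XSLT; the sketch below swaps in a plain Python function so that it runs with the standard library alone. The <patient> dialect, the example.org URIs, and the glean function are all hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML dialect for a patient record (not taken from the slides).
RECORD_XML = """
<patient id="p1">
  <name>Jane Doe</name>
  <procedure code="CABG"/>
</patient>
"""

BASE = "http://example.org/"  # assumed base URI for every minted identifier


def glean(xml_text):
    """Map the hypothetical <patient> dialect to a set of (s, p, o) triples."""
    root = ET.fromstring(xml_text)
    subject = BASE + "patient/" + root.get("id")
    # One triple for the patient's name (a literal value).
    triples = {(subject, BASE + "name", root.findtext("name"))}
    # One triple per procedure element (a URI value).
    for proc in root.findall("procedure"):
        obj = BASE + "procedure/" + proc.get("code")
        triples.add((subject, BASE + "underwent", obj))
    return triples


for triple in sorted(glean(RECORD_XML)):
    print(triple)
```

The same mapping could target any XML dialect; only the function body changes, while consumers of the resulting triples stay the same.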

The GRDDL Picture

GRDDL: The Components

Faithful Rendition

“By specifying a GRDDL transformation, the author of a document states that the transformation will provide a faithful rendition in RDF of information (or some portion of the information) expressed through the XML dialect used in the source document.”

Various mechanisms for nominating transformations: a specific XML attribute, XML Namespaces, HTML Profiles, and XHTML links

GRDDL-aware agents compute GRDDL results (RDF graphs)
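As an illustration of the first nomination mechanism, a GRDDL-aware agent can read the grddl:transformation attribute on the root element of an XML document; its value is a whitespace-separated list of transformation URIs. The sketch below assumes a made-up source document and made-up stylesheet URIs, and it only discovers the transformations; it does not run them.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the GRDDL specification.
GRDDL_NS = "http://www.w3.org/2003/g/data-view#"

# Hypothetical source document; the stylesheet URIs are placeholders.
SOURCE = """
<record xmlns:grddl="http://www.w3.org/2003/g/data-view#"
        grddl:transformation="http://example.org/record2rdf.xsl
                              http://example.org/extra.xsl">
  <title>Example</title>
</record>
"""


def nominated_transformations(xml_text):
    """Return the transformation URIs nominated on the root element."""
    root = ET.fromstring(xml_text)
    # ElementTree exposes namespaced attributes as "{namespace}localname".
    value = root.get("{%s}transformation" % GRDDL_NS, "")
    return value.split()


print(nominated_transformations(SOURCE))
```

A full agent would also consult namespace documents, HTML profiles, and XHTML links, then apply each transformation and merge the resulting graphs.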

The CMS Alternative: “Dual Representation”

Persist XML in synchrony with its faithful rendition

Changes to the XML trigger calculation and storage of corresponding RDF

“Dual Representation”

Implemented by 4Suite Server Document Definitions

The basis of how we capture patient records with maximum syntactic and semantic expressivity
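The write path of dual representation can be sketched as a store that recomputes the RDF whenever the XML changes. This is a minimal in-memory illustration, not the 4Suite implementation: the DualRepresentationStore class, the title_definition mapping, and the example.org URI are hypothetical, and a Python callable stands in for the document definition.

```python
import xml.etree.ElementTree as ET

DC_TITLE = "http://purl.org/dc/elements/1.1/title"


def title_definition(uri, xml_text):
    """Hypothetical document definition: extract the title as one triple."""
    title = ET.fromstring(xml_text).findtext("title")
    return {(uri, DC_TITLE, title)} if title is not None else set()


class DualRepresentationStore:
    """Keeps each XML document in step with the RDF derived from it."""

    def __init__(self, definition):
        self.definition = definition  # callable: (uri, xml text) -> triples
        self.documents = {}           # uri -> XML source
        self.graphs = {}              # uri -> set of (s, p, o) triples

    def put(self, uri, xml_text):
        # Compute the RDF first so ill-formed XML is rejected before storing.
        graph = self.definition(uri, xml_text)
        self.documents[uri] = xml_text
        self.graphs[uri] = graph      # replaces any previously stored graph

    def delete(self, uri):
        # Removing the document removes its graph as well.
        self.documents.pop(uri, None)
        self.graphs.pop(uri, None)


store = DualRepresentationStore(title_definition)
store.put("http://example.org/doc/1", "<doc><title>Draft</title></doc>")
store.put("http://example.org/doc/1", "<doc><title>Final</title></doc>")
print(store.graphs["http://example.org/doc/1"])
```

Because the graph is replaced on every write, the stored RDF never lags behind the XML it was derived from.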

Document Definition

The document definition is the mapping

Usually an XSLT document

Content Repository Architecture

Overlap between Content Repository APIs

Dual Representation: Advantages

Maximum expressiveness and versatility of content

Unified naming convention and access control (more on this later)

Uniform, concrete RDF syntaxes

For systems which speak XML fluently (XForms, POX over HTTP, WS-*, etc.)

Cheap support for XML & RDF content negotiation

Use of RDF as a semantic index for XML

Document Definition: Similarities

GRDDL

RDDL

Resource Directory Description Language

Human-readable descriptive material about a target

A directory of individual resources related to a target

Nature and Purpose

Schema, stylesheet, etc.

Lives at a namespace URI

WXS's targetNamespace

Common theme is a set of definitions for a document or a class of documents

Registering a Document to a Class

Namespace registration works well for the web (preferred approach of W3C TAG)

What if you don't control the content served from the namespace of an existing vocabulary?

Atom, Docbook, etc.

A CMS is better suited for a 'closed' / 'controlled' approach

Persist membership metadata in the CMS

SemanticDB and Dual Representation

Document and Graph Granularity

Tying documents to graphs normalizes the content granularity

Documents and their RDF graphs can be treated uniformly:

Naming convention

Targeted querying

Access control management

JSR Fine-Grained Control

'Controlled' Naming Convention

Controlled Naming Convention: Continued

RDF Dataset (from SPARQL): A collection of named graphs

The RDF is stored in a graph with the same URI as the XML source document

When RDF is used as the primary cross-document 'index' you can:

SELECT ?graph WHERE { GRAPH ?graph { ... } }

document($graph)/.. XPath ..

The space compromise (of dual representation) can be further mitigated by only extracting a minimal RDF graph
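The two-step lookup described above (find the matching graphs, then navigate into the documents that share their names) can be sketched without a SPARQL engine. In this illustration the dataset is a plain dictionary from graph URI to triples, the documents and URIs are made up, and ElementTree's limited XPath subset stands in for a full XPath processor.

```python
import xml.etree.ElementTree as ET

EX = "http://example.org/"  # assumed namespace for the illustration

# XML documents keyed by URI (hypothetical content).
documents = {
    EX + "doc/1": "<op><procedure>CABG</procedure><surgeon>Smith</surgeon></op>",
    EX + "doc/2": "<op><procedure>AVR</procedure><surgeon>Jones</surgeon></op>",
}

# Named graphs keyed by the same URI as their XML source document.
dataset = {
    EX + "doc/1": {(EX + "doc/1", EX + "procedure", "CABG")},
    EX + "doc/2": {(EX + "doc/2", EX + "procedure", "AVR")},
}


def graphs_matching(dataset, predicate, obj):
    """Step 1: names of the graphs holding a triple with this predicate and object."""
    return sorted(
        name
        for name, triples in dataset.items()
        if any(p == predicate and o == obj for _, p, o in triples)
    )


def select_from_documents(documents, graph_names, path):
    """Step 2: evaluate a path against each document named by a graph."""
    results = []
    for name in graph_names:
        root = ET.fromstring(documents[name])
        results.extend(element.text for element in root.findall(path))
    return results


hits = graphs_matching(dataset, EX + "procedure", "CABG")
print(select_from_documents(documents, hits, "./surgeon"))
```

Only the predicates needed for cross-document lookup have to be extracted into the graphs; everything else can stay in the XML and be reached by path.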

Uniform Access Control for XML/RDF CMS

Traditionally, Access Control Lists are associated with an object

Example: a file or directory in a filesystem

Assign document / graph ACLs to a single URI

Certain users / groups can query the RDF but cannot read the XML

De-identification of EHR: HIPAA

The 4Suite repository supports unified XML/RDF ACL
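A minimal sketch of a single ACL covering both representations follows. It is not the 4Suite ACL model: the action names query_rdf and read_xml, the group names, and the URI are assumptions made for the illustration.

```python
# One ACL entry per URI; the same URI names the XML document and its RDF graph.
acl = {
    "http://example.org/doc/1": {
        "query_rdf": {"researchers", "clinicians"},  # may query the graph
        "read_xml": {"clinicians"},                  # may read the source XML
    }
}


def allowed(acl, uri, group, action):
    """Return True if the group may perform the action on the resource."""
    return group in acl.get(uri, {}).get(action, set())


# Researchers can query the (de-identified) RDF but cannot read the XML.
print(allowed(acl, "http://example.org/doc/1", "researchers", "query_rdf"))
print(allowed(acl, "http://example.org/doc/1", "researchers", "read_xml"))
```

Keeping both permissions under one URI means a policy change touches a single entry, rather than separate rules in a document store and a triple store.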

Going Forward

The SPARQL RDF dataset needs to be generalized

There is a long list of representation problems solved by a formal named graph specification

RDF graphs need to be first-class objects in CMS

Build a common Content Repository API for XML / RDF on the JSR 170 / 283 foundation

Where do the 4Suite Repository API and JSR 170 / 283 overlap?

How do we generalize Document Definitions?

A Proposal for XML/RDF CMS

Primary Takeaways

We need to stop thinking of XML & RDF as mutually exclusive solutions to similar problems

CMS standards are needed for the next generation of semantic / rich web applications

These standards can preemptively level the landscape of toolkits in this space

References

D. Nuescheler et al, JSR 170: Content Repository for Java http://jcp.org/en/jsr/detail?id=170

D. Connolly, Gleaning Resource Descriptions from Dialects of Language http://www.w3.org/TR/grddl/

J. Borden, T. Bray, Resource Directory Description Language http://www.rddl.org/

E. Prud'hommeaux, A. Seaborne, SPARQL Query Language for RDF http://www.w3.org/TR/rdf-sparql-query/

Fourthought Inc., 4Suite http://4Suite.org
