Stanford '12: Intro to Ontology-Based Data Access for RDBMSs through Query Rewriting
DESCRIPTION
Seminar on Ontology-Based Data Access for RDBMSs through query rewriting at Stanford's BMIR lab, 2012.
TRANSCRIPT
ONTOLOGY BASED DATA ACCESS Architecture, Techniques and Systems
Mariano Rodríguez-Muro, KRDB Research Group
Free University of Bozen-Bolzano
BMIR, Stanford, February 2012
ONTOLOGIES Reasoning and Data
OBDA: Architecture, Techniques and Systems
Ontologies
• A formal conceptualization of a domain of interest
• They come in many different languages: RDFS, OBO, OWL 2, SWRL, etc.
• Uses:
  • Documentation
  • Knowledge exchange
  • Discovering new knowledge
• Ontologies + Data…
Instance reasoning
• Instance reasoning:
  • Infer new information about the data
  • Detect inconsistent data
  • Use inferred information for complex queries (e.g., SPARQL)
• Queries:
  • Is :person/mariano an instance of :Mammal?
  • Retrieve all instances of :Mammal
  • SELECT ?x ?y WHERE { ?x a :Mammal; :hasAncestor ?y. ?y a :Mammal }
• Requirements:
  • Fast execution
  • Efficient resource management
  • Big data, big ontologies
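The instance-reasoning tasks above (inferring new types, answering :Mammal membership questions, detecting inconsistent data) can be illustrated with naive forward chaining over a toy class hierarchy. This is a minimal sketch, not how any particular reasoner works: the classes Person, Animal and Plant, the disjointness axiom and the individual :thing/x are hypothetical, and only subclass and disjointness axioms are handled.

```python
# Hypothetical toy TBox: (subclass, superclass) axioms and disjoint class pairs.
SUBCLASS = {("Person", "Mammal"), ("Mammal", "Animal")}
DISJOINT = {("Animal", "Plant")}

def superclasses(cls):
    """All classes implied by cls (reflexive-transitive closure of SUBCLASS)."""
    result, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for sub, sup in SUBCLASS:
            if sub == current and sup not in result:
                result.add(sup)
                stack.append(sup)
    return result

def infer_types(abox):
    """Map each individual to its asserted and inferred classes."""
    types = {}
    for individual, cls in abox:
        types.setdefault(individual, set()).update(superclasses(cls))
    return types

def inconsistent(types):
    """Individuals that end up in two disjoint classes."""
    return {ind for ind, classes in types.items()
            for a, b in DISJOINT if a in classes and b in classes}

abox = [(":person/mariano", "Person"), (":thing/x", "Mammal"), (":thing/x", "Plant")]
types = infer_types(abox)
print("Mammal" in types[":person/mariano"])  # True
print(sorted(ind for ind, cs in types.items() if "Mammal" in cs))
print(inconsistent(types))  # {':thing/x'}
```

Note that this approach needs every inferred fact in memory, which is exactly the cost the requirements above (big data, big ontologies) push against.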
The usual workflow
[Diagram: application code converts the Source into Triples; the Triples and the Ontology are the Reasoner's inputs; the Application communicates with the Reasoner.]
Problems with this approach
• Software complexity
• Duplication
• Data refreshing
• Data structure is lost (PKEYS, FOREIGN KEYS, information about the import procedure)
OBDA Models and Architecture
OBDA as an Architecture
[Diagram: the Ontology and the OBDA Model are the Reasoner's inputs; the Reasoner is in direct communication with the Source; the Application communicates with the Reasoner.]
OBDA Models: Sources and Mappings
“A formal specification of the relationship between data in a data source and the vocabulary of the ontology”
[Diagram: an OBDA Model consists of a Source declaration and a set of mappings.]
Mapping
“A tuple of 2 queries, one over the source and one over the ontology, with the same signature. Intuitively, a mapping associates the data specified by q_s with the answers for q_o.”
q_s ⊆ q_o
SELECT patient_id AS id FROM condition WHERE c_id = 3333 ⊆ CardiacArrestPatient(?id) → q(?id)
The source row id = 23 yields the assertion: <23> rdf:type :CardiacArrestPatient
Example OBDA model
SELECT patient_id AS id FROM condition WHERE c_id = 3333 ⊆ CardiacArrestPatient(?id) → q(?id)
SELECT id, name, age, ssn FROM patient ⊆ Patient(?id) ∧ name(?id, ?name) ∧ age(?id, ?age) ∧ ssn(?id, ?ssn) → q(?id, ?name, ?age, ?ssn)
Table: patient
id [PKEY] | name | age | ssn
12345 | John | 37 | xxx-999
… | … | … | …

Table: condition
patient_id [FKEY] | c_id [FKEY]
12345 | 3333
… | …
Example OBDA model
<12345> rdf:type :Patient .
<12345> :name "John" .
<12345> :age "37" .
<12345> :ssn "xxx-999" .
<12345> rdf:type :CardiacArrestPatient .
…
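The materialized assertions above, i.e. mat(D, M), can be reproduced with a small Python sketch over an in-memory SQLite copy of the two sample tables. Assumptions: the ontology side of each mapping is written as a Python function that turns one source row into triples (a simplification of q_o), and the condition mapping selects patient_id AS id, since the condition table has no id column.

```python
import sqlite3

# Sample source: only the rows shown on the slide.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, ssn TEXT);
CREATE TABLE condition (patient_id INTEGER REFERENCES patient(id), c_id INTEGER);
INSERT INTO patient VALUES (12345, 'John', 37, 'xxx-999');
INSERT INTO condition VALUES (12345, 3333);
""")

# Each mapping pairs a source query q_s with a template standing in for q_o
# that turns one answer row into triples.
MAPPINGS = [
    ("SELECT patient_id AS id FROM condition WHERE c_id = 3333",
     lambda r: [(f"<{r['id']}>", "rdf:type", ":CardiacArrestPatient")]),
    ("SELECT id, name, age, ssn FROM patient",
     lambda r: [(f"<{r['id']}>", "rdf:type", ":Patient"),
                (f"<{r['id']}>", ":name", f'"{r["name"]}"'),
                (f"<{r['id']}>", ":age", f'"{r["age"]}"'),
                (f"<{r['id']}>", ":ssn", f'"{r["ssn"]}"')]),
]

def materialize(conn, mappings):
    """mat(D, M): run every source query and instantiate its ontology template."""
    conn.row_factory = sqlite3.Row  # rows accessible by column name
    triples = set()
    for source_query, template in mappings:
        for row in conn.execute(source_query):
            triples.update(template(row))
    return triples

for triple in sorted(materialize(db, MAPPINGS)):
    print(*triple, ".")
```

This is the "materialize everything" view; query rewriting aims to give the same answers without ever building this set.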
The Pay-off
• At least:
  • The source is documented
  • Data handling can be done automatically (by the reasoner)
  • Reduced cost of application development and maintenance
  • The reasoner can analyze source and mappings to minimize the cost of inference
• The sweet spot:
  • On-the-fly data access
  • Reasoning by query rewriting
  • Exploitation of efficient engines
QUERY REWRITING
Query Rewriting in a Nutshell
• Given a query Q, a TBox T and an OBDA model <D, M>, compute a query Q′ such that:
answer(Q, T, mat(D, M)) = answer(Q′, D)
where mat(D, M) is the set of assertions obtained by “materializing” the mappings M over D into ABox assertions (assertional triples).
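For the simplest case, atomic class queries under a TBox of subclass axioms, the equation can be checked with a small Python sketch: answering Q after saturating the assertions with the TBox gives the same result as evaluating the rewritten union Q′ over the stored data with no reasoning at all. Assumptions: the mappings are trivial, so the stored assertions stand in for mat(D, M), and the individuals 777 and c1 are hypothetical.

```python
# TBox from the running example: (subclass, superclass) axioms only.
TBOX = [("CardiacArrestPatient", "Patient"), ("CardiacArrest", "HeartCondition")]

# Stored (individual, class) assertions; trivial mappings assumed, so this is mat(D, M).
STORED = {("12345", "CardiacArrestPatient"), ("777", "Patient"), ("c1", "CardiacArrest")}

def subclasses(cls):
    """All classes A with A ⊑* cls under TBOX (including cls itself)."""
    found, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in TBOX:
            if sup == current and sub not in found:
                found.add(sub)
                frontier.append(sub)
    return found

def answer_with_reasoning(cls, assertions):
    """answer(Q, T, mat(D, M)): saturate the assertions forward, then look up cls."""
    saturated = set(assertions)
    changed = True
    while changed:
        changed = False
        for ind, c in list(saturated):
            for sub, sup in TBOX:
                if sub == c and (ind, sup) not in saturated:
                    saturated.add((ind, sup))
                    changed = True
    return {ind for ind, c in saturated if c == cls}

def rewrite(cls):
    """Q' for the atomic query cls(?x): a union with one atom per subclass."""
    return subclasses(cls)

def answer_without_reasoning(union, assertions):
    """answer(Q', D): plain lookup over the stored data, TBox not used."""
    return {ind for ind, c in assertions if c in union}

for query in ["Patient", "HeartCondition", "CardiacArrestPatient"]:
    q_prime = rewrite(query)
    assert answer_with_reasoning(query, STORED) == answer_without_reasoning(q_prime, STORED)
    print(query, "->", sorted(q_prime), sorted(answer_without_reasoning(q_prime, STORED)))
```

The rewriting depends only on the query and the TBox, which is what lets the data stay in the RDBMS.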
Example OBDA model
SELECT patient_id AS id FROM condition WHERE c_id = 3333 ⤳ CardiacArrestPatient(?id) → q(?id)
SELECT id, name, age, ssn FROM patient ⤳ Patient(?id) ∧ name(?id, ?name) ∧ age(?id, ?age) ∧ ssn(?id, ?ssn) → q(?id, ?name, ?age, ?ssn)
Query Rewriting: An example
Ontology (TBox)
SubClassOf(:CardiacArrest :HeartCondition)
SubClassOf(:CardiacArrestPatient :Patient)
SubClassOf(:CardiacArrestPatient ObjectSomeValuesFrom(:affectedBy :CardiacArrest))
Query (SPARQL)
SELECT ?p ?name ?ssn WHERE {
  ?p a :Patient; :name ?name; :ssn ?ssn; :age ?age;
     :affectedBy [ a :HeartCondition ] .
  FILTER (?age >= 21 && ?age <= 50)
}
Query Rewriting: An example
Rewritten query
SELECT ?p ?name ?ssn WHERE {
  { ?p a :Patient; :name ?name; :ssn ?ssn; :age ?age;
       :affectedBy [ a :HeartCondition ] .
    FILTER (?age >= 21 && ?age <= 50) }
  UNION
  { ?p a :Patient; :name ?name; :ssn ?ssn; :age ?age;
       :affectedBy [ a :CardiacArrest ] .
    FILTER (?age >= 21 && ?age <= 50) }
  UNION
  { ?p a :Patient; :name ?name; :ssn ?ssn; :age ?age; a :CardiacArrestPatient .
    FILTER (?age >= 21 && ?age <= 50) }
  UNION …
}
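The branches shown can be derived mechanically. The Python sketch below applies two simplified rewriting steps until no new query appears: replace B(x) by A(x) for an axiom A ⊑ B, and replace R(x, y) ∧ C(y) by A(x) for an axiom A ⊑ ∃R.C when y is not an answer variable and occurs nowhere else. It is a toy in the spirit of PerfectRef-style rewriting for DL-Lite, not Quest's rewriter; the :name, :ssn and :age atoms and the FILTER are left out because no axiom mentions them.

```python
# Concept inclusions A ⊑ B from the example TBox.
SUBCLASS = [("CardiacArrest", "HeartCondition"),
            ("CardiacArrestPatient", "Patient")]
# Qualified existentials A ⊑ ∃R.C, encoded as (A, R, C).
EXISTS = [("CardiacArrestPatient", "affectedBy", "CardiacArrest")]

# Query body as a set of atoms: (Concept, var) or (Role, var1, var2).
QUERY = frozenset({("Patient", "p"), ("affectedBy", "p", "y"), ("HeartCondition", "y")})
ANSWER_VARS = {"p"}

def rewrite_step(query, answer_vars):
    """Yield every query obtained from `query` by one rewriting step."""
    for atom in query:
        if len(atom) == 2:  # concept atom B(x): specialize via A ⊑ B
            name, x = atom
            for sub, sup in SUBCLASS:
                if sup == name:
                    yield (query - {atom}) | {(sub, x)}
    for sub, role, filler in EXISTS:
        for atom in query:
            if len(atom) == 3 and atom[0] == role:
                _, x, y = atom
                filler_atom = (filler, y)
                if y in answer_vars or filler_atom not in query:
                    continue
                # y must occur only in R(x, y) and C(y)
                occurrences = sum(v == y for a in query for v in a[1:])
                if occurrences == 2:
                    yield (query - {atom, filler_atom}) | {(sub, x)}

def rewrite(query, answer_vars):
    """Apply rewrite_step to a fixpoint; the result is the set of UNION branches."""
    result, frontier = {query}, [query]
    while frontier:
        current = frontier.pop()
        for new_query in rewrite_step(current, answer_vars):
            if new_query not in result:
                result.add(new_query)
                frontier.append(new_query)
    return result

branches = rewrite(QUERY, ANSWER_VARS)
for branch in sorted(sorted(b) for b in branches):
    print(" AND ".join(f"{a[0]}({', '.join(a[1:])})" for a in branch))
print(len(branches), "branches")  # 6 branches
```

The branch Patient(p) AND CardiacArrestPatient(p) corresponds to the third UNION branch above. Some generated branches are redundant: for example, CardiacArrestPatient(p) AND affectedBy(p, y) AND CardiacArrest(y) adds no answers beyond CardiacArrestPatient(p) alone, which is the kind of redundancy an efficient rewriter has to detect.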
Query Rewriting: An Example
SQL query
SELECT tp.id AS p, tp.name AS name, tp.ssn AS ssn FROM patient tp JOIN condition tc ON tp.id = tc.patient_id WHERE tc.c_id = 3333 AND tp.age >= 21 AND tp.age <= 50
Answer:
?p | ?name | ?ssn
12345 | John | xxx-999
“Fast execution even in the presence of millions of assertions”
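The SQL above can be run against an in-memory SQLite copy of the sample tables using Python's standard sqlite3 module. The second patient (67890, Ann, aged 70) is hypothetical, added only to show the age filter excluding a row that does match the condition code.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, ssn TEXT);
CREATE TABLE condition (patient_id INTEGER REFERENCES patient(id), c_id INTEGER);
INSERT INTO patient VALUES (12345, 'John', 37, 'xxx-999');
INSERT INTO condition VALUES (12345, 3333);
-- Hypothetical extra patient with the same condition, outside the age range.
INSERT INTO patient VALUES (67890, 'Ann', 70, 'yyy-000');
INSERT INTO condition VALUES (67890, 3333);
""")

# The rewritten SPARQL query, unfolded through the mappings into one SQL query.
SQL = """
SELECT tp.id AS p, tp.name AS name, tp.ssn AS ssn
FROM patient tp JOIN condition tc ON tp.id = tc.patient_id
WHERE tc.c_id = 3333 AND tp.age >= 21 AND tp.age <= 50
"""
rows = db.execute(SQL).fetchall()
print(rows)  # [(12345, 'John', 'xxx-999')]
```

All the work is a single join plus range filters, which the database engine can serve from its indexes.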
That simple?
• Warning: query rewritings can easily grow exponentially.
• Effective query rewriting requires:
  • A highly efficient rewriting algorithm that is able to detect redundancy
  • Highly efficient SQL generation:
    • Detect redundant SQL (w.r.t. constraints and mappings)
    • Optimize individual SQL queries (w.r.t. constraints and mappings)
    • Generate optimal SQL (w.r.t. the database engine)
    • Deal with the impedance mismatch (URIs and literals vs. data values)
  • Database engine tuning (indexing, buffers, disk, etc.)
• Effective query rewriting gives you:
  • Fast system initialization
  • Small footprint
  • Fast query execution
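The warning about exponential growth follows from simple counting: if each atom of a conjunctive query can be replaced independently by any of its subclasses, the number of UNION branches is the product of the alternatives per atom. This Python sketch uses a hypothetical hierarchy in which every class Ci has itself plus k subclasses, and performs no redundancy elimination.

```python
from itertools import product

def naive_rewriting(atoms, subclasses):
    """Expand each atom C(x) into every A(x) with A ⊑* C, independently.
    Returns all UNION branches; no redundancy elimination is attempted."""
    alternatives = [[(a, var) for a in subclasses[cls]] for cls, var in atoms]
    return [list(branch) for branch in product(*alternatives)]

def hierarchy(classes, k):
    """Hypothetical hierarchy: every class has itself plus k subclasses."""
    return {c: [c] + [f"{c}_sub{i}" for i in range(k)] for c in classes}

k = 4
for n in range(1, 5):
    classes = [f"C{i}" for i in range(n)]
    query = [(c, "x") for c in classes]
    branches = naive_rewriting(query, hierarchy(classes, k))
    print(n, "atoms:", len(branches), "branches")  # (k + 1) ** n: 5, 25, 125, 625
```

With realistic hierarchies (hundreds of subclasses per class) the naive union quickly becomes unusable, which is why redundancy detection and SQL-level optimization appear on the list above.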
Efficient Languages (for pure query rewriting)
• RDFS, DL-Lite, OWL 2 QL
• Datalog+-
• DL-Lite/OWL 2 QL/Datalog+- fragments of SWRL
Promising Languages (for combined approaches)
• EL++ and OWL 2 EL
• OWL-Horst and OWL 2 RL
• SWRL with limited recursion
SYSTEMS OBDALib, OBDA Plugin for Protégé 4
OBDA as an Architecture
[Diagram: the Ontology and the OBDA Model are the Reasoner's inputs; the Reasoner communicates with the Source and with the Application.]
OBDALib
A Java library for:
• OBDA Model creation and manipulation
• OBDA Model persistence
• Interfaces for OBDA-capable reasoners
• SQL parsing and Datalog translation
• RDBMS metadata extraction libraries
• OBDA model materialization
In the near future:
• Automatic OBDA model generation (compatible with W3C’s RDB2RDF direct mapping)
• Support for W3C’s R2RML syntax
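The W3C Direct Mapping mentioned above derives RDF mechanically from a relational schema: one class per table, one subject IRI per row built from its primary key, one property per column, and a "ref-" property per foreign key. The Python sketch below is a simplified reading of that specification over SQLite, not OBDALib's implementation. It assumes single-column foreign keys that name their target column, skips IRI percent-encoding and literal datatypes, and uses the hypothetical base IRI http://example.com/base/.

```python
import sqlite3

BASE = "http://example.com/base/"  # hypothetical base IRI

def direct_mapping(conn, base=BASE):
    """Simplified Direct Mapping: no percent-encoding, untyped literals,
    single-column foreign keys, NULLs skipped, blank nodes for PK-less tables."""
    triples = []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for table in tables:  # table names come from the trusted schema itself
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
        names = [c[1] for c in columns]
        pk = [c[1] for c in sorted(columns, key=lambda c: c[5]) if c[5]]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        fks = {fk[3]: (fk[2], fk[4])
               for fk in conn.execute(f"PRAGMA foreign_key_list({table})")}
        for i, row in enumerate(conn.execute(f"SELECT * FROM {table}")):
            values = dict(zip(names, row))
            if pk:
                key = ";".join(f"{k}={values[k]}" for k in pk)
                subject = f"<{base}{table}/{key}>"
            else:
                subject = f"_:{table}{i}"  # no primary key: blank node
            triples.append((subject, "rdf:type", f"<{base}{table}>"))
            for col, val in values.items():
                if val is None:
                    continue
                triples.append((subject, f"<{base}{table}#{col}>", f'"{val}"'))
                if col in fks:
                    ref_table, ref_col = fks[col]
                    triples.append((subject, f"<{base}{table}#ref-{col}>",
                                    f"<{base}{ref_table}/{ref_col}={val}>"))
    return triples

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, ssn TEXT);
CREATE TABLE condition (patient_id INTEGER REFERENCES patient(id), c_id INTEGER);
INSERT INTO patient VALUES (12345, 'John', 37, 'xxx-999');
INSERT INTO condition VALUES (12345, 3333);
""")
for s, p, o in direct_mapping(db):
    print(s, p, o, ".")
```

A generated model like this only mirrors the schema; mapping it onto a domain ontology such as the :Patient vocabulary still needs hand-written mappings like the ones in the example OBDA model.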
OBDA Plugin for Protégé 4
“A plugin to write and test OBDA models and to interact with OBDA-capable reasoners”
OBDA Model tab and tools
OBDA Model synch
An EditorKitHook plugin to:
• Associate an OBDA model with the editor environment
• Synchronize OBDA models with OBDA-capable reasoners
DataQuery Tab
SYSTEMS Quest
Quest
An OBDA-capable reasoner focused on fast and efficient query answering over very large ontologies and volumes of data. Features:
• Support for RDFS, OWL 2 QL and DL-Lite
• SPARQL
• On-the-fly reasoning based on query rewriting
• Read-only “Virtual OBDA” mode
• Read/Write “Triple-store” mode
• Generation of highly optimized SQL
• OWLAPI 3 and Protégé support
Quest in virtual mode
[Diagram: the Ontology and the OBDA Model are Quest's inputs; Quest accesses the Source via JDBC (MySQL, PostgreSQL, DB2 and Oracle); the Application talks to Quest.]
Data integration with Quest in virtual mode
[Diagram: the Ontology and the OBDA Model are Quest's inputs; Quest accesses a Database Federator (e.g., Teiid) via JDBC; the Application talks to Quest.]
Read/Write triple-store mode
[Diagram: the Ontology and the Triples are Quest's inputs; Quest keeps the data in a JDBC Storage accessed via JDBC; the Application talks to Quest.]
Storage is based on the Semantic Index technique (ISWC11, KR12).
A technique based on computing a “smart index” that allows hierarchy inferences to be retrieved by means of interval queries (FAST SQL!)
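The interval idea can be illustrated with a small Python and SQLite sketch. Under the simplifying assumption of a tree-shaped class hierarchy (the published technique handles general hierarchies, which this toy does not), numbering classes in depth-first preorder makes every class and its subclasses occupy one contiguous range of numbers. Each assertion is stored once with the number of its asserted class, and retrieving all instances of a class, inferred ones included, becomes one BETWEEN query over an indexed integer column. The hierarchy, individuals, table and column names are hypothetical.

```python
import sqlite3

# Hypothetical tree-shaped hierarchy, child -> parent.
PARENT = {"Mammal": "Animal", "Bird": "Animal", "Person": "Mammal", "Dog": "Mammal"}

def semantic_index(parent):
    """Depth-first preorder numbering; each class C gets the interval
    [index(C), last index among C's descendants]."""
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    roots = sorted(set(parent.values()) - set(parent))
    index, interval = {}, {}
    counter = 0

    def visit(cls):
        nonlocal counter
        index[cls] = counter
        counter += 1
        for child in sorted(children.get(cls, [])):
            visit(child)
        interval[cls] = (index[cls], counter - 1)

    for root in roots:
        visit(root)
    return index, interval

index, interval = semantic_index(PARENT)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE class_assertion (individual TEXT, idx INTEGER)")
db.execute("CREATE INDEX by_idx ON class_assertion (idx)")
# Each assertion is stored once, with the index of its asserted class only.
db.executemany("INSERT INTO class_assertion VALUES (?, ?)",
               [(":person/mariano", index["Person"]),
                (":dog/rex", index["Dog"]),
                (":bird/tweety", index["Bird"])])

def instances(cls):
    """All asserted and inferred instances of cls with a single range query."""
    lo, hi = interval[cls]
    rows = db.execute("SELECT individual FROM class_assertion "
                      "WHERE idx BETWEEN ? AND ? ORDER BY individual", (lo, hi))
    return [r[0] for r in rows]

print(interval["Mammal"], instances("Mammal"))  # (2, 4) [':dog/rex', ':person/mariano']
```

Nothing is copied when the hierarchy is used: the closure lives in the interval arithmetic, not in extra rows.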
Performance in triple-store mode: Resource Index Experiments
• Input:
  • Ontology: the asserted is-a relations in obs_relation (for all RI ontologies)
  • Data: the annotations for ClinicalTrials.gov
  • Queries, e.g.:
    SELECT ?x WHERE { ?x a :DNA_Repair_Gene; a :Antigen_Gene; a :Cancer_Gene. }
Performance in triple-store mode: Resource Index Experiments
• System setup costs:
  • Resource Index workflow:
    • Ontology closure: X ?
    • CT annotation closure: 7 days (naïve), 40 mins (optimized)
    • Space requirements for CT: 16 GB + isa-closure: 70 GB
  • Using a naïve implementation of Quest’s reasoning technique for the RI:
    • Ontology closure: 5 mins
    • CT annotation closure: none
    • Space requirements for CT: 16 GB
• Execution speed: roughly the same
• Potential to eliminate all _isa_annotation_tables and the closure of relation_isa.
DEMO
CONCLUSIONS
Summary
• OBDA as an architecture
  • Benefits: reduced software complexity, optimization and on-the-fly query answering
• Basis of query rewriting in OBDA
• Introduced:
  • OBDALib
  • OBDA Plugin for Protégé
  • Quest
• Briefly mentioned the performance advantages of Quest’s reasoning technique
Where to go now?
• Resource Index overhauling?
• Demos?
• More detail on the techniques?
• More details on the systems?
• Development and plugins for Protégé
• Projects?!
• You call it :)
THANK YOU