Query Translation for Data Sources with Heterogeneous Content Semantics

Iowa State University Department of Computer Science Artificial Intelligence Research Laboratory Query Translation for Data Sources with Heterogeneous Content Semantics Jie Bao Department of Computer Science Iowa State University [email protected] May 5, 2006

Upload: jie-bao

Post on 19-Jan-2015


Page 1: Query Translation for Data Sources with Heterogeneous Content Semantics

Iowa State University, Department of Computer Science, Artificial Intelligence Research Laboratory

Query Translation for Data Sources with Heterogeneous Content Semantics

Jie Bao
Department of Computer Science
Iowa State University
[email protected]

May 5, 2006

Page 2: Query Translation for Data Sources with Heterogeneous Content Semantics

Outline

• Ontology-Extended Data Sources (OEDS)
• Query Translation for OEDS with Heterogeneous Data Content Semantics
• The INDUS Implementation
• Summary

Page 3: Query Translation for Data Sources with Heterogeneous Content Semantics

Data Semantics

Even if you have the data, do you really understand it?

[Figure: example from a health database for lorises, with value labels: Environmental Stress, Tiredness, Unwellness, Normal, Hear Something, Fear, Social Stress, Social Play]

Page 4: Query Translation for Data Sources with Heterogeneous Content Semantics

Bridging the Semantic Gap

• Interpretations of data are always context-specific, so semantic gaps are common:
– Between data sources of the same domain
– Between the data provider and a data user
– Between different data users of the same data source

• Ontologies can make explicit the usually implicit assumptions about the “meaning” of data.

Page 5: Query Translation for Data Sources with Heterogeneous Content Semantics

Example: Academic Department

[Figure: schema and data set. Schema: Student -> RegisterFor -> Classes -> OfferedBy -> Instructors]

Ontological Commitment

Data Schema Ontology
• Students and Instructors are People

Data Content Ontologies
• Classes:Duration's values are time in minutes
• Student status “2ndYear” implies “Undergrad”

We will focus on data content ontologies in this work.

Page 6: Query Translation for Data Sources with Heterogeneous Content Semantics

Data Content Ontologies

Data Users’ Ontologies
– Jane’s ontology: Classes:Duration : Minutes
– Bob’s ontology: Classes:Duration : Hours

Data Provider’s Ontology
– [ AVH (Attribute Value Hierarchy) ]
– Classes:Duration : Minutes [ Unit Scale ]

Page 7: Query Translation for Data Sources with Heterogeneous Content Semantics

Ontology-Extended Data Sources

• Ontology-extended data sources (OEDS) make explicit the otherwise implicit ontologies associated with the data sources.

• Ontologies can be specified by data providers or data users, representing their local points of view.

[Figure: an OEDS. A data source (relational, RDF, …) with schema S and data set D is extended with a data schema ontology OS and a data content ontology OD.]
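As a rough illustration of the structure on this slide, an OEDS can be pictured as a data source bundled with its two ontologies. The Python sketch below is hypothetical: the class name `OntologyExtendedDataSource`, its fields, and the helper `content_type` are assumptions made for illustration and are not taken from INDUS.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyExtendedDataSource:
    """Hypothetical container for S, D, O_S and O_D (names are assumptions)."""
    schema: dict                                           # S: attribute -> plain type name
    data: list                                             # D: tuples stored as dicts
    schema_ontology: dict = field(default_factory=dict)    # O_S: class -> superclass
    content_ontology: dict = field(default_factory=dict)   # O_D: attribute -> content type


classes = OntologyExtendedDataSource(
    schema={"Title": "String", "Duration": "Integer"},
    data=[{"Title": "AI", "Duration": 50}, {"Title": "Seminar", "Duration": 30}],
    schema_ontology={"Student": "People", "Instructor": "People"},
    content_ontology={"Duration": "MinuteDuration"},
)


def content_type(source: OntologyExtendedDataSource, attribute: str) -> str:
    """Return the content-ontology type of an attribute, else its plain schema type."""
    return source.content_ontology.get(attribute, source.schema[attribute])
```

Here the `Duration` column is still an `Integer` at the schema level, but its content ontology records that the numbers mean minutes.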

Page 8: Query Translation for Data Sources with Heterogeneous Content Semantics

Data Content Ontology as Data Type

• Common data types: String, Integer, Float…

• Unit Scales
– e.g. MinuteDuration, HourDuration

• Hierarchies as Partial-Order Ontologies (PO)
– Partial ordering (<=): a transitive, reflexive and anti-symmetric relation.
– PO operators: = (equal to), < (below), > (above), >= (above or equal to), <= (below or equal to), ≠ (not equal to)
– e.g. StudentStatus
  • Undergrad <= StudentStatus
  • Undergrad >= 1st_Year
  • 2nd_Year <= Undergrad
  • …

• They can be easily implemented as extensions to many RDBMSs: Oracle, PostgreSQL…
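A minimal sketch of a partial-order ontology and its operators, written in Python rather than as an RDBMS extension. It assumes the hierarchy is a tree stored as child-to-parent edges; the class name `PartialOrderOntology` and the `Grad` entry are illustrative assumptions.

```python
class PartialOrderOntology:
    """A hierarchy stored as child -> parent edges (assumed acyclic, tree-shaped)."""

    def __init__(self, parent_of):
        self.parent_of = dict(parent_of)

    def leq(self, a, b):
        """a <= b: a is b, or b is an ancestor of a (reflexive and transitive)."""
        while a is not None:
            if a == b:
                return True
            a = self.parent_of.get(a)  # walk one level up; None at the root
        return False

    def lt(self, a, b):
        """a < b: strictly below."""
        return a != b and self.leq(a, b)

    def geq(self, a, b):
        """a >= b: above or equal to."""
        return self.leq(b, a)

    def gt(self, a, b):
        """a > b: strictly above."""
        return self.lt(b, a)


student_status = PartialOrderOntology({
    "Undergrad": "StudentStatus",
    "Grad": "StudentStatus",
    "1st_Year": "Undergrad",
    "2nd_Year": "Undergrad",
})
```

With this hierarchy, `student_status.leq("2nd_Year", "Undergrad")` holds, while `1st_Year` and `Grad` are incomparable in both directions.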

Page 9: Query Translation for Data Sources with Heterogeneous Content Semantics

Outline

• Ontology-Extended Data Sources (OEDS)
• Query Translation for OEDS with Heterogeneous Data Content Semantics
• The INDUS Implementation
• Summary

Page 10: Query Translation for Data Sources with Heterogeneous Content Semantics

Ontology-Extended Query

Bob’s query: How many regular classes (classes whose duration, in hours, is longer than half an hour) are taken by students with status `Masters'?

However, this query cannot be directly understood by the data source due to semantic gaps:

• The data provider’s ontology has no equivalent concept for “Masters”.

• Class duration as recorded in the data source is in minutes.

Page 11: Query Translation for Data Sources with Heterogeneous Content Semantics

Query Translation

• Query translation is the process of transforming a query that uses one ontology into a query that uses another ontology
– usually from a user ontology to the data provider’s ontology

• The tuples that match a given query q: {q(t)}

• A translation q -> q’ is
– Sound, if {q’(t)} ⊆ {q(t)} (all retrieved results are needed)
– Complete, if {q(t)} ⊆ {q’(t)} (all needed results are retrieved)
– Exact, if {q(t)} = {q’(t)} (sound and complete)
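The three definitions can be checked mechanically on a concrete set of tuples. The sketch below is only an illustration: the function names `answers` and `classify_translation` are hypothetical, and the check holds only for the sample tuples supplied, not for every possible data set.

```python
def answers(query, tuples):
    """{q(t)}: the set of tuples for which the query predicate holds."""
    return {t for t in tuples if query(t)}


def classify_translation(q, q_prime, tuples):
    """Compare a query q and its translation q' on a given collection of tuples."""
    original = answers(q, tuples)
    translated = answers(q_prime, tuples)
    sound = translated <= original      # {q'(t)} is a subset of {q(t)}
    complete = original <= translated   # {q(t)} is a subset of {q'(t)}
    if sound and complete:
        return "exact"
    if sound:
        return "sound"
    if complete:
        return "complete"
    return "neither"


sample = range(10)  # hashable stand-ins for tuples


def q(a):
    """The original query: 3 <= a <= 6."""
    return 3 <= a <= 6


verdict_narrow = classify_translation(q, lambda a: 4 <= a <= 5, sample)
verdict_wide = classify_translation(q, lambda a: 2 <= a <= 7, sample)
verdict_same = classify_translation(q, lambda a: a in (3, 4, 5, 6), sample)
```

A narrower translated query returns only needed tuples (sound), a wider one returns every needed tuple plus extras (complete), and an equivalent one is exact.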

Page 12: Query Translation for Data Sources with Heterogeneous Content Semantics

Translation with Conversion Function

• A conversion function f: O1 -> O2 establishes one-to-one correspondences between terms in the two ontologies
– O1:t and O2:f(O1:t) are semantically equivalent

• Example:
– State2Code: {Iowa -> IA, Delaware -> DE, …}
– H2M: y = x*60 (HourDuration to MinuteDuration)

• With conversion functions, exact translation can be made by term substitution
– Duration > HourDuration:0.5 -> Duration > MinuteDuration:30
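Term substitution with a conversion function can be sketched as follows. The helper `translate_term` and the tuple representation of a query condition are assumptions for illustration; keeping the comparison operator unchanged is only valid because H2M is strictly increasing.

```python
# State2Code as a lookup table (only the two pairs given on the slide).
STATE2CODE = {"Iowa": "IA", "Delaware": "DE"}


def h2m(hours):
    """H2M: convert an HourDuration value to a MinuteDuration value."""
    return hours * 60


def translate_term(attribute, op, value, convert):
    """Rewrite `attribute op value` by substituting the converted term.

    The operator is kept as-is, which assumes `convert` preserves order
    (true for H2M) or that `op` is an equality test.
    """
    return (attribute, op, convert(value))


duration_condition = translate_term("Duration", ">", 0.5, h2m)
state_condition = translate_term("State", "=", "Iowa", STATE2CODE.__getitem__)
```

The first call rewrites Bob's half-hour bound into the provider's minutes; the second shows the same substitution for a table-driven conversion.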

Page 13: Query Translation for Data Sources with Heterogeneous Content Semantics

Translation with Interoperation Constraints (1)

• In many cases, a one-to-one term correspondence does not exist
– Float:3.5 has no correspondence in Integer
– GradStatus:Masters has no correspondence in StudentStatus

• Therefore, exact translation is not always possible.

• However, we may still build a sound or a complete translation with the help of Interoperation Constraints (IC)

Page 14: Query Translation for Data Sources with Heterogeneous Content Semantics

Translation with Interoperation Constraints(2)

• IC between Float and Integer
– Float:x <= Integer:ceiling(x) (ceiling)
– Float:x >= Integer:floor(x) (floor)

• Translation rules
– Sound translation: A < Float:x -> A < Integer:floor(x), A > Float:x -> A > Integer:ceiling(x)
– Complete translation: A < Float:x -> A < Integer:ceiling(x), A > Float:x -> A > Integer:floor(x)

• Example
– Sound translation: A < Float:3.5 -> A < Integer:3, A > Float:3.5 -> A > Integer:4
– Complete translation: A < Float:3.5 -> A < Integer:4, A > Float:3.5 -> A > Integer:3

The translation depends on both the terms and the operators in question.
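The Float-to-Integer rules can be sketched as a small function that picks floor or ceiling depending on the operator and on whether a sound or a complete translation is wanted. The function name `translate_float_bound` and the `mode` parameter are hypothetical; only the `<` and `>` operators from the slide are handled.

```python
import math


def translate_float_bound(op, x, mode):
    """Translate `A op Float:x` into `A op Integer:n`.

    mode="sound" narrows the bound, so every retrieved value satisfies the
    original condition; mode="complete" widens it, so no needed value is lost.
    """
    if op not in ("<", ">"):
        raise ValueError("only '<' and '>' are covered in this sketch")
    if mode not in ("sound", "complete"):
        raise ValueError("mode must be 'sound' or 'complete'")
    narrow = mode == "sound"
    if op == "<":
        n = math.floor(x) if narrow else math.ceil(x)
    else:
        n = math.ceil(x) if narrow else math.floor(x)
    return (op, n)
```

For `x = 3.5` this reproduces the four cases in the example: the sound bounds are 3 and 4, and the complete bounds are 4 and 3.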

Page 15: Query Translation for Data Sources with Heterogeneous Content Semantics

Translation with Interoperation Constraints(3)

• IC between Partial-order Ontologies
– INTO (<=): GradStatus:"Masters" <= StudentStatus:"Grad"
– ONTO (>=): GradStatus:"Masters" >= StudentStatus:"Master of Science"
– EQUIV (=): GradStatus:"Ph.D" = StudentStatus:"Doctor of Philosophy"

Page 16: Query Translation for Data Sources with Heterogeneous Content Semantics

Translation Rules for PO

Example

• Sound translation: Status <= GradStatus:"Masters" -> Status <= StudentStatus:"Master of Science" (IC: GradStatus:"Masters" >= StudentStatus:"Master of Science")

• Complete translation: Status <= GradStatus:"Masters" -> Status <= StudentStatus:"Grad" (IC: GradStatus:"Masters" <= StudentStatus:"Grad")
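The example can be sketched in code under one stated assumption: the query operator is "below or equal to" (`<=`), since the full rule table of the slide is not preserved in this transcript. The names `ICS` and `translate_below` are hypothetical. For a `<=` query, a provider term below the user term (ONTO or EQUIV) gives a sound translation, and a provider term above it (INTO or EQUIV) gives a complete one.

```python
# Interoperation constraints as (user term, relation, provider term),
# read as GradStatus:user <relation> StudentStatus:provider.
ICS = [
    ("Masters", "<=", "Grad"),                # INTO
    ("Masters", ">=", "Master of Science"),   # ONTO
    ("Ph.D", "=", "Doctor of Philosophy"),    # EQUIV
]


def translate_below(user_term, mode, ics=ICS):
    """Candidate provider terms p for rewriting `Status <= user_term` as `Status <= p`.

    sound:    user_term >= p, so everything below p is also below user_term.
    complete: user_term <= p, so everything below user_term is also below p.
    """
    wanted = {"sound": (">=", "="), "complete": ("<=", "=")}[mode]
    return [p for (u, rel, p) in ics if u == user_term and rel in wanted]
```

When an EQUIV constraint exists, the same provider term appears for both modes, which corresponds to an exact translation.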

Page 17: Query Translation for Data Sources with Heterogeneous Content Semantics

A Query Translation Algorithm

Page 18: Query Translation for Data Sources with Heterogeneous Content Semantics

Outline

• Ontology-Extended Data Sources (OEDS)
• Query Translation for OEDS with Heterogeneous Data Content Semantics
• The INDUS Implementation
• Summary

Page 19: Query Translation for Data Sources with Heterogeneous Content Semantics

Ontology-based information integration in INDUS

Page 20: Query Translation for Data Sources with Heterogeneous Content Semantics

Query processing in INDUS

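[Figure: query processing pipeline in INDUS. Stages: Query Formulation, Local Rewriting, Query Decomposition, Query Translation, Remote Rewriting, Query Execution, Inverse Translation, Result Composition. A query posed in the local schema and local ontology (SV, OV) is decomposed into sub-queries Q1 … Qn, translated through mappings M1 … Mn into the remote ontologies O1 … On and remote schemas S1 … Sn, and executed as SQL on data sources D1 … Dn. The results r1 … rn are translated back into the local ontology and composed into the final result.]

Handling both schema heterogeneity and data content heterogeneity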

Page 21: Query Translation for Data Sources with Heterogeneous Content Semantics

INDUS: Ontology Editor

Page 22: Query Translation for Data Sources with Heterogeneous Content Semantics

INDUS: Schema Editor

Page 23: Query Translation for Data Sources with Heterogeneous Content Semantics

INDUS: Mapping Editor

Page 24: Query Translation for Data Sources with Heterogeneous Content Semantics

INDUS: Query Editor

Page 25: Query Translation for Data Sources with Heterogeneous Content Semantics

Outline

• Ontology-Extended Data Sources (OEDS)
• Query Translation for OEDS with Heterogeneous Data Content Semantics
• The INDUS Implementation
• Summary

Page 26: Query Translation for Data Sources with Heterogeneous Content Semantics

Related work

• Extensive work on semantic data integration; see survey papers [Hull, 1997; Wache et al., 2001; Levy, 2000]

• Query translation with schema ontologies
– OBSERVER: [Mena et al., 2000]
– SIRUP: [Ziegler and Dittrich, 2004]

• Query translation with data content ontologies
– BUSTER: [Wache and Stuckenschmidt, 2001]
– COIN: [Goh et al., 1999]
– Both only address term substitution, i.e. translation with conversion functions.
– HOME & Ontology-extended relational algebra: [Bonatti et al., 2003]. It allows data types to be hierarchies, but only with “below” (<=) operations on hierarchies.

Page 27: Query Translation for Data Sources with Heterogeneous Content Semantics

Conclusions

• In this study, we:
– Argued for the need to make explicit the ontological commitments behind data content semantics, in addition to data schema semantics
– Formulated the problem of translating queries w.r.t. context-specific data content ontologies
– Described an algorithm for semantics-preserving translation of an ontology-extended query

• Future Work:
– Improve the scalability of the translation process
– Improve the expressiveness of supported ontologies

Page 28: Query Translation for Data Sources with Heterogeneous Content Semantics

Thank you!

Questions?