
Semantic Integration of XML

Using a RDF Global Mediator

Feihong Hsu

Masters Thesis Defense

March 10, 2004

Masters Defense, March 2004 2

Why Semantic Integration? (1)

• XML documents are scattered throughout

the web—but there is a lot of

heterogeneity in terms of their schemas!

• We want to use the information contained

within them, but we don’t have the time to

translate each and every document to

“our” format.


Why Semantic Integration? (2)

• We would like to be able to take advantage of common data among documents.

• We would like to be able to ask higher-level queries.


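The Semantic Web Stack – Where We Fit In

[Figure: the Semantic Web layer stack, listed here from bottom to top]

• Unicode, URI

• XML + NS + XML Schema

• RDF + RDF Schema

• Ontology Vocabulary

• Logic

• Proof

• Trust

Digital Signature appears alongside the stack. Side labels in the figure: self-described document, data, rules.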


Key Problems in Semantic

Integration

• Schematic Heterogeneity

• Semantic Heterogeneity

• Semantic Relationships

• Object Identity


Schematic Heterogeneity

<films>
  <film title="21 Grams">
    <actor name="B. del Toro"/>
  </film>
  <film title="Traffic">
    <actor name="B. del Toro"/>
  </film>
</films>

<actors>
  <actor name="B. del Toro">
    <films>
      <film title="21 Grams"/>
      <film title="Traffic"/>
    </films>
  </actor>
</actors>

Documents can contain the same element and

attribute names but have different nested

structures.
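One way to see that the two layouts carry the same facts is to flatten both into (film, actor) pairs. The sketch below uses Python's standard xml.etree.ElementTree; the function names are illustrative assumptions and are not part of the framework described in these slides.

```python
import xml.etree.ElementTree as ET

FILM_CENTRIC = """
<films>
  <film title="21 Grams"><actor name="B. del Toro"/></film>
  <film title="Traffic"><actor name="B. del Toro"/></film>
</films>
"""

ACTOR_CENTRIC = """
<actors>
  <actor name="B. del Toro">
    <films>
      <film title="21 Grams"/>
      <film title="Traffic"/>
    </films>
  </actor>
</actors>
"""

def pairs_from_films(xml_text):
    """Collect (film title, actor name) pairs from the film-centric layout."""
    root = ET.fromstring(xml_text)
    return {(film.get("title"), actor.get("name"))
            for film in root.findall("film")
            for actor in film.findall("actor")}

def pairs_from_actors(xml_text):
    """Collect the same pairs from the actor-centric layout."""
    root = ET.fromstring(xml_text)
    return {(film.get("title"), actor.get("name"))
            for actor in root.findall("actor")
            for film in actor.findall("films/film")}

# Different nesting, identical facts.
same_facts = pairs_from_films(FILM_CENTRIC) == pairs_from_actors(ACTOR_CENTRIC)
```

Each layout needs its own traversal code, which is exactly the cost that a shared conceptual model is meant to remove.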


Semantic Heterogeneity (1)

<workers>
  <worker>
    <name>Feihong Hsu</name>
    <job>Janitor</job>
    <comp>90000</comp>
  </worker>
</workers>

<employees>
  <employee>
    <name first="Feihong" last="Hsu"/>
    <role>Janitor</role>
    <salary>90000</salary>
  </employee>
</employees>

Documents can have the same semantics but

have different names for elements and attributes.
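As a rough illustration, the two vocabularies can be reconciled by normalizing each record into one shared shape. This is a hand-written sketch with hypothetical field names (name, role, salary); it is not the mapping structure used by the mediator.

```python
import xml.etree.ElementTree as ET

WORKER = ET.fromstring(
    "<worker><name>Feihong Hsu</name><job>Janitor</job><comp>90000</comp></worker>"
)
EMPLOYEE = ET.fromstring(
    '<employee><name first="Feihong" last="Hsu"/>'
    "<role>Janitor</role><salary>90000</salary></employee>"
)

def normalize_worker(elem):
    """Map the <worker> vocabulary onto the shared field names."""
    return {
        "name": elem.findtext("name"),
        "role": elem.findtext("job"),
        "salary": int(elem.findtext("comp")),
    }

def normalize_employee(elem):
    """Map the <employee> vocabulary onto the shared field names."""
    name = elem.find("name")
    return {
        "name": f'{name.get("first")} {name.get("last")}',
        "role": elem.findtext("role"),
        "salary": int(elem.findtext("salary")),
    }
```

Both functions yield the same dictionary for the snippets above, even though no element name other than name is shared.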


Semantic Heterogeneity (2)

<stars>
  <star name="Ava Gardner">
    <born>1922-12-24</born>
    <died>1990-01-25</died>
  </star>
</stars>

<stars>
  <star name="Betelgeuse">
    <distance>425 light years</distance>
    <luminosity from="40000" to="100000"/>
  </star>
</stars>

Documents can have the same names for

elements and attributes but have different

semantics.


Semantic Relationships (1)

<trucks>
  <truck model="Ram SR-10">
    <manuf>Dodge</manuf>
    <msrp>$45,000</msrp>
  </truck>
</trucks>

<cars>
  <car model="Miata MX-5">
    <manuf>Mazda</manuf>
    <msrp>$22,388</msrp>
  </car>
</cars>

What if you wanted to do a search for information

involving Automobiles (a hypernym of Car and

Truck)?
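A small sketch of what such a search needs: a class hierarchy in which Car and Truck both point to Automobile, and a lookup that walks up that hierarchy. The dictionary layout and function names are assumptions made for illustration only.

```python
# Hypothetical class hierarchy: child class -> parent class.
SUBCLASS_OF = {"Car": "Automobile", "Truck": "Automobile"}

# Records taken from the two XML snippets, tagged with their local class.
RECORDS = [
    {"type": "Truck", "model": "Ram SR-10", "manuf": "Dodge"},
    {"type": "Car", "model": "Miata MX-5", "manuf": "Mazda"},
]

def is_a(cls, ancestor):
    """Return True if cls equals ancestor or is a (transitive) subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False

def find_models(cls):
    """List the models of every record whose class falls under cls."""
    return [r["model"] for r in RECORDS if is_a(r["type"], cls)]
```

Asking for Automobile returns both vehicles, while asking for Car returns only the Miata.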


Semantic Relationships (2)

<cars>
  <car model="Miata MX-5">
    <manuf>Mazda</manuf>
    <doors>2</doors>
  </car>
  <car model="EuroVan MV">
    <manuf>VW</manuf>
    <doors>4</doors>
  </car>
</cars>

What if you wanted to

do a search for

Coupes?

(Coupe is a hyponym

of Car—it’s a Car that

has only 2 doors.)
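The hyponym case can be sketched as a class defined by a condition on existing data: no element is labeled Coupe, so membership has to be derived from the door count. The function name below is hypothetical, and the sketch uses Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

CARS = """
<cars>
  <car model="Miata MX-5"><manuf>Mazda</manuf><doors>2</doors></car>
  <car model="EuroVan MV"><manuf>VW</manuf><doors>4</doors></car>
</cars>
"""

def find_coupes(xml_text):
    """Return the models of all cars that satisfy the Coupe condition (2 doors)."""
    root = ET.fromstring(xml_text)
    return [car.get("model")
            for car in root.findall("car")
            if int(car.findtext("doors")) == 2]
```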


Object Identity

<employees>

<employee>

<name>B. Banner</name>

<dept>Physics</dept>

<salary>40000</salary>

</employee>

</employees>

<scientists>

<scientist>

<name>B. Banner</name>

<area>Physics</area>

<degree>PhD</degree>

</scientist>

</scientists>

How do we figure out that the two XML

snippets describe the same person?


Architecture of the Semantic Integration Framework (1)

[Figure: a three-tier diagram. The Global Mediator sits at the top, above XML Repositories 1, 2, and 3; the repositories in turn sit above XML Docs 1 through 7.]


Architecture of the Semantic

Integration Framework (2)

Layers:

• RDF Global Mediator – provides view of

the data as a conceptual model

• XML Repository – provides view of

homogeneous documents as a single

document

• XML Local Data Source – provides the

actual information


RDF Global Mediator

• Simulates an RDF repository

• Accepts RDQL queries, and returns RDQL

result tables

• Provides a global ontology in the form of

RDF Schema

• Keeps track of mappings between the

global ontology and the local schema

through mapping structures


Mapping Structures

• Owned by the RDF Global Mediator

• Bridge the gap between the global

ontology (RDFS) and the local schemas

(XMLS)

• Not a separate layer because they are

static data structures

• Currently have to be constructed by hand


XML Repository

• Simulates a single, monolithic XML

document

• Accepts XQuery expressions

• Returns DOM trees

• Handles distributed XQuery processing

• Provides a schema for its local data

sources


XML Local Data Source

• Does not simulate anything; it’s the source

of the data

• Can run XQuery expressions on it

• Results of XQuery (DOM tree) sent back

to XML Repository

• Conforms to the schema of its XML

Repository


Semantic Integration Process

• Query Translation:

RDQL → XQuery → Distributed XQuery

• Result Transformation:

DOM Tree → Merged DOM tree →

RDQL Result Table + RDF Model


Query Translation

[Figure: Global Mediator → XML Repository → XML Local Source (multiple)]

1. Accept RDQL query

2. Translate to XQuery

3. Process distributed XQuery for multiple local sources

Data flow: RDQL query → XQuery → XQuery


Anatomy of an RDQL Query

Clauses:

• SELECT – list of variables for output

• WHERE – RDF subgraph constraints

• AND – boolean expression constraints

• USING – namespace prefixes


A Sample RDQL Query

SELECT ?title, ?pub
WHERE (?book dc:title ?title),
      (?book dc:creator ?author),
      (?book dc:publisher ?pub),
      (?author foaf:name ?name)
AND ?name eq "Neil Gaiman"
USING dc FOR <http://purl.org/dc/elements/1.1/>,
      foaf FOR <http://xmlns.com/foaf/0.1/>
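To make the roles of the clauses concrete, here is a minimal sketch of how a WHERE clause can be evaluated as triple-pattern matching over an in-memory graph, with the AND filter and SELECT projection applied afterwards. It is not an RDQL engine; the graph data (Book A, Publisher X, and so on) and all function names are made up for illustration.

```python
# Hypothetical RDF graph as (subject, predicate, object) tuples.
GRAPH = [
    ("book1", "dc:title", "Book A"),
    ("book1", "dc:creator", "author1"),
    ("book1", "dc:publisher", "Publisher X"),
    ("author1", "foaf:name", "Neil Gaiman"),
    ("book2", "dc:title", "Book B"),
    ("book2", "dc:creator", "author2"),
    ("book2", "dc:publisher", "Publisher Y"),
    ("author2", "foaf:name", "Someone Else"),
]

# WHERE clause: each pattern is a triple whose "?x" terms are variables.
WHERE = [
    ("?book", "dc:title", "?title"),
    ("?book", "dc:creator", "?author"),
    ("?book", "dc:publisher", "?pub"),
    ("?author", "foaf:name", "?name"),
]

def match_pattern(pattern, triple, binding):
    """Extend binding so that pattern matches triple; return None on conflict."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None      # constant term does not match
    return extended

def solve(patterns, graph):
    """Return every variable binding that satisfies all patterns."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

# AND clause as a filter, SELECT clause as a projection.
rows = [(b["?title"], b["?pub"])
        for b in solve(WHERE, GRAPH)
        if b["?name"] == "Neil Gaiman"]
```

The USING clause has no counterpart here because the sketch compares prefixed names as plain strings.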


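Graph of WHERE clause

[Figure: the WHERE clause of the sample query drawn as a graph]

?book --dc:title--> ?title

?book --dc:creator--> ?author

?book --dc:publisher--> ?pub

?author --foaf:name--> ?name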


Anatomy of an XQuery Expression

Clauses:

• let – bind variables

• for – bind variables, iterate over nodes

• where – boolean expression constraints

• return – list of variables to output


A Sample XQuery Expression

let $authors := doc("authors.xml")/authors
for $author in $authors/author,
    $name in $author/@name
for $book in $author/book
for $title in $book/@title,
    $pub in $book/@publisher
where $name = "Neil Gaiman"
return ($title, $pub)
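For readers less familiar with FLWOR expressions, the same iteration can be written as nested loops. The sketch below uses Python's standard xml.etree.ElementTree and assumes that authors.xml holds author elements (with a name attribute) directly under authors, each containing book elements; the inline document and its values are placeholders, since the slides do not show the file.

```python
import xml.etree.ElementTree as ET

# Placeholder stand-in for authors.xml (structure assumed, values invented).
AUTHORS_XML = """
<authors>
  <author name="Neil Gaiman">
    <book title="Book A" publisher="Publisher X"/>
  </author>
  <author name="Someone Else">
    <book title="Book B" publisher="Publisher Y"/>
  </author>
</authors>
"""

def titles_and_publishers(xml_text, wanted_name):
    """Nested-loop equivalent of the let/for/where/return expression."""
    authors = ET.fromstring(xml_text)              # let $authors := .../authors
    results = []
    for author in authors.findall("author"):       # for $author, $name
        name = author.get("name")
        for book in author.findall("book"):        # for $book
            title = book.get("title")              # for $title, $pub
            pub = book.get("publisher")
            if title is None or pub is None:
                continue                           # a missing attribute yields no tuple
            if name == wanted_name:                # where $name = ...
                results.append((title, pub))       # return ($title, $pub)
    return results
```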


Tree of For Clauses

authors
  author
    book
      @title
      @publisher


Mapping of Clauses

Conceptually, the algorithm proceeds by mapping each RDQL clause to its equivalent XQuery clause(s):

• SELECT → return

• WHERE → for (multiple)

• AND → where

• USING → [none]


Mapping the WHERE Clause to For

Clauses

• Need to map a graph to a tree

• Can break down an RDF graph into triples

• Can break down an XML tree into path

expressions

• Map triples to path expressions using a

pattern-matching technique!


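Break RDF Graph into Triples

[Figure: an RDF graph with nodes A, B, and C and edges x, y, and z, shown next to the triples it breaks down into]

(A, x, B)

(B, y, C)

(A, z, C)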


Break XML Tree into Path Expressions

[Figure: an XML tree shown next to the path expressions it breaks down into]

A
  B
    C
    D
  E

A/B/C

A/B/D

A/E
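The decomposition of a tree into path expressions can be sketched as a recursive walk that emits one path per leaf. This is an illustrative helper written with Python's standard xml.etree.ElementTree, not code from the thesis; the function name is an assumption.

```python
import xml.etree.ElementTree as ET

def leaf_paths(elem, prefix=""):
    """Return the root-to-leaf path of every leaf element under elem."""
    path = f"{prefix}/{elem.tag}" if prefix else elem.tag
    children = list(elem)
    if not children:
        return [path]
    paths = []
    for child in children:
        paths.extend(leaf_paths(child, path))
    return paths

# The tree from the slide: A has children B and E; B has children C and D.
TREE = ET.fromstring("<A><B><C/><D/></B><E/></A>")
PATHS = leaf_paths(TREE)
```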


Pattern Matching with the Mapping

Structure

• We want to map triples to path

expressions

• But we must respect the class hierarchy

and the property hierarchy

• Therefore, do sub-triple matching


What Is a Sub-Triple?

• A sub-triple is a specialization of another

triple.

• So (A, b, C) is a sub-triple of (X, y, Z) iff

– A is subclass of X

– b is subproperty of y

– C is subclass of Z

• Example: (Painter, paints, Painting) is a sub-triple of (Artist, creates, Work).
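The definition translates directly into a check over the class and property hierarchies. In this sketch the hierarchies are plain child-to-parent dictionaries, and a term is treated as a specialization of itself (as with rdfs:subClassOf and rdfs:subPropertyOf in RDF Schema); the data structures and function names are assumptions for illustration.

```python
# Hypothetical hierarchies: child -> parent.
SUBCLASS_OF = {"Painter": "Artist", "Painting": "Work"}
SUBPROPERTY_OF = {"paints": "creates"}

def specializes(term, general, parent_of):
    """True if term equals general or reaches it by following parent links."""
    while term is not None:
        if term == general:
            return True
        term = parent_of.get(term)
    return False

def is_sub_triple(triple, general):
    """True if triple (A, b, C) is a sub-triple of general (X, y, Z)."""
    (a, b, c), (x, y, z) = triple, general
    return (specializes(a, x, SUBCLASS_OF)
            and specializes(b, y, SUBPROPERTY_OF)
            and specializes(c, z, SUBCLASS_OF))
```

With these hierarchies, (Painter, paints, Painting) is a sub-triple of (Artist, creates, Work), but not the other way around.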


Pattern Matching Example (1)

SELECT ?name, ?title, ?year
WHERE (?artist foaf:name ?name),
      (?artist art:creates ?work),
      (?work art:finishedIn ?year),
      (?work art:title ?title)
USING art FOR <http://example.org/art/>,
      foaf FOR <http://xmlns.com/foaf/0.1/>


Pattern Matching Example (2)

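[Figure: the query graph, with each variable annotated by its resolved type]

?artist (art:Artist) --foaf:name--> ?name (rdfs:Literal)

?artist (art:Artist) --art:creates--> ?work (art:Work)

?work (art:Work) --art:title--> ?title (rdfs:Literal)

?work (art:Work) --art:finishedIn--> ?year (rdfs:Literal)

Perform type resolution using the global ontology (in RDF Schema)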


Pattern Matching Example (3)

Path expression                     Triple
/novelists/novelist/book/@year      (art:Book, art:finishedIn, rdfs:Literal)
/novelists/novelist/book/@title     (art:Book, art:title, rdfs:Literal)
/novelists/novelist/@name           (art:Novelist, foaf:name, rdfs:Literal)
/novelists/novelist/book            (art:Novelist, art:writes, art:Book)
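The lookup step can be sketched as follows: each typed triple from the query is compared against the entries of the mapping structure, and an entry matches when its triple is a sub-triple of the query triple. The hierarchy used here (Novelist under Artist, Book under Work, writes under creates) is an assumption inferred from the example, and the table layout and function names are illustrative only.

```python
# Assumed fragments of the global ontology: child -> parent.
SUBCLASS_OF = {"art:Novelist": "art:Artist", "art:Book": "art:Work"}
SUBPROPERTY_OF = {"art:writes": "art:creates"}

# Mapping structure entries: (triple, path expression).
MAPPING = [
    (("art:Book", "art:finishedIn", "rdfs:Literal"), "/novelists/novelist/book/@year"),
    (("art:Book", "art:title", "rdfs:Literal"), "/novelists/novelist/book/@title"),
    (("art:Novelist", "foaf:name", "rdfs:Literal"), "/novelists/novelist/@name"),
    (("art:Novelist", "art:writes", "art:Book"), "/novelists/novelist/book"),
]

def specializes(term, general, parent_of):
    """True if term equals general or reaches it by following parent links."""
    while term is not None:
        if term == general:
            return True
        term = parent_of.get(term)
    return False

def paths_for(query_triple):
    """Return the paths whose mapped triple is a sub-triple of query_triple."""
    x, y, z = query_triple
    return [path for (a, b, c), path in MAPPING
            if specializes(a, x, SUBCLASS_OF)
            and specializes(b, y, SUBPROPERTY_OF)
            and specializes(c, z, SUBCLASS_OF)]

# The four typed triples of the example query.
QUERY = [
    ("art:Artist", "foaf:name", "rdfs:Literal"),
    ("art:Artist", "art:creates", "art:Work"),
    ("art:Work", "art:finishedIn", "rdfs:Literal"),
    ("art:Work", "art:title", "rdfs:Literal"),
]
MATCHED = {triple: paths_for(triple) for triple in QUERY}
```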


Pattern Matching Example (4)

novelists
  novelist
    book
      @title
      @year


Pattern Matching Example (5)

let $novelists := doc("novelists.xml")/novelists

for $novelist in $novelists/novelist,

$name in $novelist/@name

for $book in $novelist/book,

$title in $book/@title,

$year in $book/@year

return ($name, $title, $year)


Result Transformation

• XQuery produces DOM trees

• XML repository merges DOM trees from

each local source

• Merged DOM tree is converted to RDF

graph

• RDF graph from each XML repository is

added to the result RDF graph


Converting DOM Tree to RDF

Model

• We can reuse the Mapping Structure

• Map path expressions to RDF triples

• Based on same principles as query

translation
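A rough sketch of the reverse direction: walking an XML result tree and emitting RDF triples according to a mapping from elements and attributes to classes and properties. The three lookup tables below are a simplified, hypothetical stand-in for the mapping structure, the blank-node naming scheme is invented, and the input document uses placeholder values.

```python
import itertools
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for the mapping structure.
CLASSES = {"novelist": "art:Novelist", "book": "art:Book"}
ATTR_PROPS = {
    "novelist": {"name": "foaf:name"},
    "book": {"title": "art:title", "year": "art:finishedIn"},
}
CHILD_PROPS = {("novelist", "book"): "art:writes"}

def dom_to_triples(root):
    """Convert the children of a container element into RDF triples."""
    ids = itertools.count()
    triples = []

    def visit(elem):
        node = f"_:b{next(ids)}"  # invented blank-node identifier
        triples.append((node, "rdf:type", CLASSES[elem.tag]))
        for attr, prop in ATTR_PROPS.get(elem.tag, {}).items():
            if attr in elem.attrib:
                triples.append((node, prop, elem.attrib[attr]))
        for child in elem:
            prop = CHILD_PROPS.get((elem.tag, child.tag))
            if prop is not None:
                triples.append((node, prop, visit(child)))
        return node

    for child in root:  # the root element is treated as a plain container
        if child.tag in CLASSES:
            visit(child)
    return triples

RESULT = ET.fromstring(
    '<novelists><novelist name="Author A">'
    '<book title="Book A" year="2001"/></novelist></novelists>'
)
TRIPLES = dom_to_triples(RESULT)
```

Elements and attributes that have no entry in the tables are skipped, which mirrors the idea that only mapped paths contribute to the RDF model.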


Advantages (1)

• Layered architecture provides

modularization, separation of concerns

• Query translation is fast

• Takes advantage of high-level query

languages


Advantages (2)

• Provides end-to-end solution

• Extensible, more layers can be added on

top

• Uses currently-available languages and

tools


Disadvantages

• Result transformation is slow

• Cannot deal with semantic relationships

that are not length-one paths

• Mapping structure limits the types of

schemas that can be handled


Related Work (1)

• Camillo, Heuser, & Mello:

– Global ontology uses ER variant

– CXPath to XPath

– Mapping views

• Amann, Beeri, Fundulaki, & Scholl:

– Global ontology is generic ontology model

– OQL to XQuery

– Mapping rules


Related Work (2)

• Lakshmanan & Sadri:

– Global ontology is generic ontology model

– XQuery to XQuery

– Mapping catalog

• Patel-Schneider & Siméon:

– Global ontology is RDF Schema

– XQuery to XQuery

– Mapping rules


Future Work

• Complete the implementation that deals

with conversion of XML data to RDF

• Use a tree regular expression structure for

the mapping structure instead of a table

• Add the OWL layer on top of the current

framework


What the OWL Layer Would Give

Us

• OWL has more ways to express axioms,

such as disjoint, union, etc.

• OWL properties can be symmetric,

transitive, functional, etc.

• OWL has the sameAs property (sameIndividualAs in earlier drafts), which gives us a means to make statements about object identity


What Is It Good For? Potential

Applications

• Publishing Framework

• Sensor Network

• Software Agents

• Multimedia Integration