Post on 16-Jan-2016

Inexact Querying of XML

XML Data May be Irregular

• Relational data is regular and organized. XML may be very different.
– Data is incomplete: Missing values of attributes in elements
– Data has structural variations: Relationships between elements are represented differently in different parts of the document
– Data has ontology variations: Different labels are used to describe nodes of the same type
• (Note: In some of the upcoming slides, we have labels on edges instead of on nodes.)

Incomplete Data

[Figure: the Movie Database tree. Edge labels: Movie, Film, T.V. Series, Actor, Title, Name, Year. Values: Dune, Star Wars, Twin Peaks, Léon, Magnolia, Kyle MacLachlan, Natalie Portman, Harrison Ford, Mark Hamill, 1977, 1984. Callouts: “The movie has a year attribute” on one movie and “The year of the movie is missing” on another.]

Variations in Structure

[Figure: the same Movie Database tree. Callouts on nodes 11, 14, 21, and 29: “Movie below Actor” and “Actor below Movie”.]

Ontology Variations

[Figure: the same Movie Database tree. Callouts: “A movie label” and “A film label”.]

• The description of the schema is large (e.g., a DTD of XML)
– It is difficult to use the schema when formulating queries
• Data is contributed by many users in a variety of designs
– The query should deal with different structures of data
• The structure of the database is changed frequently
– Queries should be rewritten frequently
• Need to allow the user to write an “approximate query” and have the query processor deal with it

The Problem

• In many different domains, we are given the option to query some source of information
• Usually, the user only gets results if the query can be completely answered (satisfied)
• In many domains, this is not appropriate, e.g.,
– The user is not familiar with the database
– The database does not contain complete information
– There is a mismatch between the ontology of the user and that of the database

Example 1

City: Be'er Sheva   Area code: 03
The selected city does not appear in the selected area code!

Boarding: Haifa – Technion   Alighting: Eilat
There is no direct line connecting the selected points

Boarding:   Alighting: Eilat

Course details: Databases
No matching courses were found

What Do Users Need?

• Users need a way to get interesting partial answers to their queries, especially if a complete answer does not exist
• These partial answers should contain maximal information
• Problem:
– It is easy to define when an answer satisfies a query
– Hard to say when an answer that does not satisfy a query is of interest
– Hard to say which incomplete answers are better than others

Modeling a Database and a Query

• It is useful to model both databases and queries as labeled directed graphs
– Clean mathematical modeling!
– Captures the essentials of XPath, XQuery
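A minimal sketch of such a model, assuming labels sit on edges and atomic values sit on leaf nodes. The class name `LabeledGraph`, its methods, and the node ids are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class LabeledGraph:
    """A directed graph whose edges carry labels."""

    def __init__(self):
        # node -> list of (edge_label, target_node)
        self.edges = defaultdict(list)
        # node -> atomic value stored at that node (leaves only)
        self.values = {}

    def add_edge(self, source, label, target):
        self.edges[source].append((label, target))

    def children(self, node, label):
        """Targets reachable from `node` over one edge with the given label."""
        return [t for (l, t) in self.edges[node] if l == label]


# Hypothetical fragment of a university database; node ids are arbitrary.
db = LabeledGraph()
db.add_edge("root", "University", "u")
db.add_edge("u", "Name", "n1")
db.values["n1"] = "Technion"
db.add_edge("u", "Dept", "d1")
db.add_edge("d1", "Name", "n2")
db.values["n2"] = "Biology"

# Follow the label path Dept, Name starting from the university node.
dept_names = [
    db.values[n]
    for d in db.children("u", "Dept")
    for n in db.children(d, "Name")
]
print(dept_names)
```

A query can be stored in the same structure, so matching a query against a database becomes a question about mapping one labeled graph into another.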

University Database

[Figure: the University database as a labeled graph. Edge labels: University, Name, Dept, Faculty, Professor, Lecturer, Teaches. Values: Technion, Computer Science, Biology, Chana Israeli, Avi Levy, Databases, Bioinformatics, Molecular Biology.]

Query

[Figure: query graph with the labels University, Dept, Faculty, Name.]

• Exact answers are defined by exact matchings, i.e., subgraph homomorphisms
• This query asks for the names of all faculty members (of any type)

How would you write this in XPath?
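One possible answer, sketched with Python's standard `xml.etree.ElementTree`, which supports a limited subset of XPath. The XML document below is a hypothetical rendering of the University database; in particular, which faculty member belongs to which department is an assumption. The wildcard step `*` stands for "of any type" (Professor, Lecturer, and so on).

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of the University database. The assignment of
# faculty members to departments is assumed for illustration.
DOC = """<University>
  <Name>Technion</Name>
  <Dept>
    <Name>Computer Science</Name>
    <Faculty>
      <Professor>
        <Name>Chana Israeli</Name>
        <Teaches>Databases</Teaches>
        <Teaches>Bioinformatics</Teaches>
      </Professor>
    </Faculty>
  </Dept>
  <Dept>
    <Name>Biology</Name>
    <Faculty>
      <Lecturer>
        <Name>Avi Levy</Name>
        <Teaches>Molecular Biology</Teaches>
      </Lecturer>
    </Faculty>
  </Dept>
</University>"""

root = ET.fromstring(DOC)  # root is the <University> element

# Full XPath: /University/Dept/Faculty/*/Name
# ElementTree evaluates paths relative to an element, so the leading
# /University step is implied by starting at the root.
faculty_names = [e.text for e in root.findall("Dept/Faculty/*/Name")]
print(faculty_names)  # ['Chana Israeli', 'Avi Levy']
```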

Exact Answers

[Figure: the University database graph shown together with the query graph (University, Dept, Faculty, Name).]

Slightly More Complex Query

[Figure: query graph with the labels University, Dept, Faculty, Name, and the value Biology.]

• Returns faculty members only from the Biology Department
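In XPath, the extra condition can be written as a predicate on the Dept step. A minimal sketch, again using `xml.etree.ElementTree` and a hypothetical, shortened XML rendering of the database (the department assignments are assumed):

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened rendering of the University database.
DOC = """<University>
  <Dept>
    <Name>Computer Science</Name>
    <Faculty><Professor><Name>Chana Israeli</Name></Professor></Faculty>
  </Dept>
  <Dept>
    <Name>Biology</Name>
    <Faculty><Lecturer><Name>Avi Levy</Name></Lecturer></Faculty>
  </Dept>
</University>"""

root = ET.fromstring(DOC)

# Full XPath: /University/Dept[Name='Biology']/Faculty/*/Name
# The predicate keeps only Dept elements that have a Name child
# whose text is exactly "Biology".
biology_faculty = [
    e.text for e in root.findall("Dept[Name='Biology']/Faculty/*/Name")
]
print(biology_faculty)  # ['Avi Levy']
```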

Exact Answers Are Not Always Useful

• Problems with exact answers:
– labels are not always known
– content may be unknown, misspelled, etc.
– structure may be unknown, or may vary from one representation to another
– we may actually want to perform a search, since the query is a vague hypothesis
– they do not allow users to get partial/vague answers where none better exist

Manually Adding Inexactness

• One can use language constructs in order to get more flexible queries
• Example: Suppose we want to find courses, with the teachers that teach them, but we don’t know which hierarchy exists in the database:
– for each teacher, there is a list of courses or
– for each course, there is a list of teachers
– or both…

[Figure: University database in which Course appears below Teacher. Edge labels: University, Name, Dept, Faculty, Teacher, Course. Values: Technion, Computer Science, Biology, Chana Israeli, Avi Levy, Databases, Bioinformatics, Molecular Biology. Query needed: Teacher above Course.]

[Figure: University database in which Teacher appears below Course. Edge labels: University, Name, Dept, Faculty, Course, Teacher. Values: Technion, Computer Science, Biology, Bioinformatics, Molecular Biology, Chana Israeli, Avi Levy. Query needed: Course above Teacher.]

Manually Adding Inexactness (cont.)

• If we don’t know the hierarchy, we need:
[Figure: the query “Teacher above Course” Union the query “Course above Teacher”.]

Manually Adding Inexactness (cont.)

• If we don’t know the hierarchy, we need:
[Figure: the query “Teacher above Course” Union the query “Course above Teacher”.]
• If we don’t know what exactly the labels are, we might need:
[Figure: the same union of two queries, with “Teacher or Lecturer or Professor” in place of Teacher and “Course or Seminar or Lab” in place of Course.]
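To make the burden concrete, here is a rough sketch of what the user would have to write by hand: a union over both hierarchies and over every label alternative. The XML document, the assumption that teachers and courses each have a Name child, and the function name `teacher_course_pairs` are hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import product

TEACHER_LABELS = ["Teacher", "Lecturer", "Professor"]
COURSE_LABELS = ["Course", "Seminar", "Lab"]

# Hypothetical document that mixes both hierarchies and several labels.
DOC = """<University>
  <Dept>
    <Teacher>
      <Name>Chana Israeli</Name>
      <Course><Name>Databases</Name></Course>
    </Teacher>
  </Dept>
  <Dept>
    <Seminar>
      <Name>Molecular Biology</Name>
      <Lecturer><Name>Avi Levy</Name></Lecturer>
    </Seminar>
  </Dept>
</University>"""


def teacher_course_pairs(root):
    """Union of parent/child matches over both hierarchies and all labels."""
    pairs = set()
    for t_label, c_label in product(TEACHER_LABELS, COURSE_LABELS):
        # Hierarchy 1: the course is listed below the teacher.
        for teacher in root.iter(t_label):
            for course in teacher.findall(c_label):
                pairs.add((teacher.findtext("Name"), course.findtext("Name")))
        # Hierarchy 2: the teacher is listed below the course.
        for course in root.iter(c_label):
            for teacher in course.findall(t_label):
                pairs.add((teacher.findtext("Name"), course.findtext("Name")))
    return pairs


pairs = teacher_course_pairs(ET.fromstring(DOC))
print(sorted(pairs))
```

With three alternatives per label and two hierarchies, this already amounts to 18 exact queries, and every new label or structural variant multiplies the count.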

Help!

Intuition

• Users write regular queries, stating what they are looking for
• The query processor uses a built-in strategy to find answers that exactly satisfy the query or inexactly satisfy the query
• Burden is on the query processor, not on the user

Inexact Answers

• Many different definitions have been given
– For each definition, query processing algorithms have been defined
• Examples:
– Allow some of the nodes of the query to be unmatched
– Allow edges in the query to be matched to paths in the database
– Allow nodes to be matched to nodes with labels that have a similar meaning
• Be careful so that answers are meaningful!

Allow Unmatched Nodes: Bezeq Query

[Figure: query graph with the labels Name, City, Area Code, Phone Number and the values Shmulevitz (Name), Be'er Sheva (City), 03 (Area Code).]
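A minimal sketch of this relaxation, assuming a flat directory of records: instead of requiring every query condition to hold, return the records that satisfy the largest number of conditions. The records, phone numbers, and the function name `best_partial_matches` are hypothetical, and this is only one of several possible definitions.

```python
def best_partial_matches(records, query):
    """Return (records satisfying the most query conditions, that count)."""

    def score(record):
        # Number of query conditions this record satisfies.
        return sum(1 for key, value in query.items() if record.get(key) == value)

    best = max((score(r) for r in records), default=0)
    if best == 0:
        # Nothing matches at all, so there is no meaningful partial answer.
        return [], 0
    return [r for r in records if score(r) == best], best


# Hypothetical directory entries.
records = [
    {"Name": "Shmulevitz", "City": "Be'er Sheva", "Area Code": "08",
     "Phone Number": "555-0101"},
    {"Name": "Cohen", "City": "Tel Aviv", "Area Code": "03",
     "Phone Number": "555-0102"},
]

query = {"Name": "Shmulevitz", "City": "Be'er Sheva", "Area Code": "03"}

answers, matched = best_partial_matches(records, query)
print(matched, [r["Phone Number"] for r in answers])
```

Here no record satisfies all three conditions, but the first record satisfies two of them, leaving only the Area Code node unmatched.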

Matching Edges to Paths: Egged Query

[Figure: query graph with the labels Source and Destination and the values Technion-Haifa and Eilat.]
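A minimal sketch of this relaxation, assuming the database is a graph of direct bus lines: the single Source to Destination edge of the query is allowed to match a path of several edges. The line table and the function name `find_path` are hypothetical and do not describe real routes.

```python
from collections import deque


def find_path(lines, source, destination):
    """Breadth-first search; returns a shortest list of stops, or None."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == destination:
            return path
        for next_stop in lines.get(path[-1], []):
            if next_stop not in visited:
                visited.add(next_stop)
                queue.append(path + [next_stop])
    return None


# Hypothetical direct lines: stop -> stops reachable without changing buses.
lines = {
    "Technion-Haifa": ["Tel Aviv"],
    "Tel Aviv": ["Be'er Sheva"],
    "Be'er Sheva": ["Eilat"],
}

# Exact matching: the query edge must be a single database edge.
direct = "Eilat" in lines.get("Technion-Haifa", [])

# Relaxed matching: the query edge may match a whole path.
route = find_path(lines, "Technion-Haifa", "Eilat")
print(direct, route)
```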

Similar Meaning Labels

[Figure: query graph with the labels Course, Name, Details and the value Databases.]
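A minimal sketch of this relaxation, assuming a hand-written table of labels that are treated as having a similar meaning. The table `SIMILAR_LABELS`, the course records, and the function names are hypothetical; a real system might derive such a table from a thesaurus or an ontology.

```python
# Hypothetical groups of labels considered to have a similar meaning.
SIMILAR_LABELS = [
    {"Name", "Details", "Title"},
    {"Course", "Seminar", "Lab"},
]


def similar(label):
    """All labels treated as equivalent to `label` (always includes itself)."""
    result = {label}
    for group in SIMILAR_LABELS:
        if label in group:
            result |= group
    return result


def matches(record, label, value):
    """True if the record has `value` under `label` or under a similar label."""
    return any(record.get(l) == value for l in similar(label))


# Hypothetical course records that use the label Name, not Details.
courses = [{"Name": "Databases"}, {"Name": "Molecular Biology"}]

# Exact matching on the label Details finds nothing.
exact = [c for c in courses if c.get("Details") == "Databases"]

# Relaxed matching also accepts the similar label Name.
relaxed = [c for c in courses if matches(c, "Details", "Databases")]
print(exact, relaxed)
```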

Other Types of Inexactness

• Many other definitions have been given, e.g.,
– allow permutations of nodes in the query
– allow child nodes to be promoted
– interconnection
• Summary: Inexactness basically means that we relax some of the query requirements!
