ICDT 2005 An Abstract Framework for Generating Maximal Answers to Queries Sara Cohen, Yehoshua Sagiv

ICDT 2005

An Abstract Framework for Generating Maximal Answers to Queries

Sara Cohen, Yehoshua Sagiv

Motivation

Queries and Databases

Answers and Semantics

Graph Properties

The Problem

In many different domains, we are given the option to query some source of information

Usually, the user only gets results if the query can be completely answered (satisfied)

In many domains, this is not appropriate, e.g.,
  The user is not familiar with the database
  The database does not contain complete information
  There is a mismatch between the ontology of the user and that of the database
  The query is a “search” that is not expected to be correct

Search for papers by “Smith” that appeared in ICDT 2004

Sorry, no matching record found

Search for buses from “Haifa-Technion” to “Ben Gurion Airport”

There is no direct bus line between the required destinations

Search for buses to “Ben Gurion Airport”

Must choose From and To

What Do Users Need?

Users need a way to get interesting partial answers to their queries, especially if a complete answer does not exist

These partial answers should contain maximal information

Main Problems:
  What should be the semantics of partial answers?
  How can all partial answers be efficiently computed?

Previous Work

Many solutions have been given for the main problems; the solutions differ according to the problem domain

Examples:
  Full disjunctions: Galindo-Legaria (94), Rajaraman, Ullman (96), Kanza, Sagiv (03)
  Queries with incomplete answers over semistructured data: Kanza, Nutt, Sagiv (99)
  FleXPath: Amer-Yahia, Lakshmanan, Pandit (04)
  Interconnections: Cohen, Kanza, Sagiv (03)

Our Contribution

In the past, for each semantics considered, the query evaluation problem had to be studied anew.

In this paper, we:
  Present a general framework for defining semantics for partial answers
    The framework is general enough to cover most previously studied semantics
    The query evaluation problem can be solved once within this framework and reused for new semantics
    The results improve upon previous evaluation algorithms
  Present the relationship between this problem and the maximal P-subgraph problem

Motivation

Queries and Databases

Answers and Semantics

Graph Properties

Databases

Databases are modeled as data graphs: (V, E, r, lV, lE)
  r: Can have a designated root
  lV: Labels on the vertices
  lE: Labels on the edges

Note:
  Nodes correspond to data items
  Even databases that do not have an inherent graph structure can be modeled as graphs, e.g., relational databases
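
As a rough illustration of the (V, E, r, lV, lE) model, the sketch below stores a data graph in a small Python class and turns relational tuples into labeled nodes. The class name `DataGraph`, the helper `relational_to_data_graph`, the node ids such as `c1`, and the choice to leave the edge set empty for relational data are all assumptions made for this example, not definitions from the talk. The tuples are the ones used in the deck's relational example.

```python
from dataclasses import dataclass, field


@dataclass
class DataGraph:
    """A data graph (V, E, r, lV, lE). All field names are illustrative."""
    vertices: set = field(default_factory=set)
    edges: set = field(default_factory=set)             # pairs (u, v)
    root: object = None                                 # optional designated root r
    vertex_labels: dict = field(default_factory=dict)   # lV
    edge_labels: dict = field(default_factory=dict)     # lE


def relational_to_data_graph(relations):
    """Create one node per tuple, labeled (relation letter, tuple).

    Assumption: no edges are created; edge constraints of a query are
    evaluated directly on pairs of node labels.
    """
    graph = DataGraph()
    for name, rows in relations.items():
        for i, row in enumerate(rows, start=1):
            node = f"{name[0].lower()}{i}"               # e.g. "c1"
            graph.vertices.add(node)
            graph.vertex_labels[node] = (name[0], row)
    return graph


relations = {
    "Climates": [("Canada", "diverse"), ("UK", "temperate"), ("USA", "temperate")],
    "Accommodations": [("UK", "London", "Plaza"),
                       ("Canada", "Montreal", "Hilton"),
                       ("Canada", "Toronto", "Ramada")],
    "Sites": [("UK", "London", "Buckingham"), ("USA", "NY", "Metropolitan")],
}
graph = relational_to_data_graph(relations)
print(sorted(graph.vertices))
```
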

XML as a Data Graph

[Figure: an XML document drawn as a data graph. Element labels: University, Name, Dept, Faculty, Professor, Lecturer, Teaches. Text values: Technion, Computer Science, Biology, Chana Israeli, Avi Levy, Databases, Bioinformatics, Molecular Biology.]

Relational Database as a Data Graph

Climates

Country   Climate
Canada    diverse
UK        temperate
USA       temperate

Accommodations

Country   City       Hotel
UK        London     Plaza
Canada    Montreal   Hilton
Canada    Toronto    Ramada

Sites

Country   City     Site
UK        London   Buckingham
USA       NY       Metropolitan

Relational Database as a Data Graph

(C, (Canada, diverse))
(C, (UK, temperate))
(C, (USA, temperate))
(A, (UK, London, Plaza))
(A, (Canada, Montreal, Hilton))
(A, (Canada, Toronto, Ramada))
(S, (UK, London, Buckingham))
(S, (USA, NY, Metropolitan))

Queries

Queries are modeled as query graphs: (V, E, r, CV, CE, s)
  r: Can have a designated root
  CV: Vertex constraints on the vertices (basically, a boolean function on vertices)
  CE: Edge constraints on the edges (basically, a boolean function on pairs of vertices)
  s: A structural constraint, one of the letters C, R, N (defines the required structure of answers, i.e., connected, rooted or none)

Note: Nodes correspond to query variables
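
One possible way to hold a query graph (V, E, r, CV, CE, s) in code is sketched below, with constraints stored as plain boolean functions. The class `QueryGraph`, its field names, and the label shape `(relation letter, tuple)` are assumptions for illustration only. The example instance encodes the deck's three-way join over the relations C, A and S.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Optional, Set, Tuple


@dataclass
class QueryGraph:
    """A query graph (V, E, r, CV, CE, s). All names here are illustrative."""
    vertices: Set[Hashable]
    edges: Set[Tuple[Hashable, Hashable]]
    vertex_constraints: Dict[Hashable, Callable[[object], bool]]               # CV
    edge_constraints: Dict[Tuple[Hashable, Hashable], Callable[[object, object], bool]]  # CE
    structural: str = "N"              # s: "C" (connected), "R" (rooted) or "N" (none)
    root: Optional[Hashable] = None    # r

    def __post_init__(self):
        if self.structural not in {"C", "R", "N"}:
            raise ValueError("structural constraint must be one of C, R, N")
        if self.structural == "R" and self.root is None:
            raise ValueError("a rooted query needs a designated root")


# Database labels are assumed to look like ("C", ("UK", "temperate")).
join_query = QueryGraph(
    vertices={"q1", "q2", "q3"},
    edges={("q1", "q2"), ("q1", "q3"), ("q2", "q3")},
    vertex_constraints={
        "q1": lambda label: label[0] == "C",    # Belongs to: C
        "q2": lambda label: label[0] == "A",    # Belongs to: A
        "q3": lambda label: label[0] == "S",    # Belongs to: S
    },
    edge_constraints={
        ("q1", "q2"): lambda c, a: c[1][0] == a[1][0],   # C.Country = A.Country
        ("q1", "q3"): lambda c, s: c[1][0] == s[1][0],   # C.Country = S.Country
        ("q2", "q3"): lambda a, s: a[1][:2] == s[1][:2], # same Country and City
    },
    structural="C",
)
```
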

XML Query as a Graph

Returns faculty members from the Biology Department

[Figure: query graph]

Vertex constraints:
  = University
  = Dept and ContainsText(Biology)
  = Faculty
  = Name

Edge constraints:
  Is Descendant
  Is GrandChild
  Is Child

Structural Constraint: Rooted

Join Query as a Graph

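[Figure: query graph with nodes q1, q2, q3 over the relations C, A, S]

Vertex constraints:
  q1: Belongs to: C
  q2: Belongs to: A
  q3: Belongs to: S

Edge constraints:
  C.Country = A.Country
  C.Country = S.Country
  A.Country = S.Country and A.City = S.City

Structural Constraint: Connected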

Motivation

Queries and Databases

Answers and Semantics

Graph Properties

Assignment Graphs

Assignment graphs are used to compactly represent assignments of query nodes to database nodes

Basically, the assignment graph for Q and D, written Q × D, has:
  Node (q, d) for each pair q ∈ Q and d ∈ D such that d satisfies the constraint on q
  Edge ((q, d), (q', d')) if there is an edge (q, q') in Q and (d, d') satisfies the constraint on (q, q')
  May also have a root (details omitted)
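
The construction above can be sketched directly in Python. This is a minimal, quadratic-time illustration rather than the authors' implementation; the function name `assignment_graph`, the dictionary-based inputs, and the assumption that the ids c1..c3, a1..a3, s1..s2 follow the order in which the tuples are listed in the deck are all choices made for this example. Roots are ignored, as on the slide.

```python
from itertools import product


def assignment_graph(q_edges, db_nodes, vertex_ok, edge_ok):
    """Build the node and edge sets of the assignment graph.

    q_edges:   iterable of query edges (q, q')
    db_nodes:  dict mapping a database node id to its label
    vertex_ok: dict q -> boolean function on a database label
    edge_ok:   dict (q, q') -> boolean function on a pair of database labels
    """
    # Node (q, d) whenever d satisfies the vertex constraint on q.
    nodes = {(q, d) for q, d in product(vertex_ok, db_nodes)
             if vertex_ok[q](db_nodes[d])}
    # Edge ((q, d), (q', d')) whenever (d, d') satisfies the constraint on (q, q').
    edges = set()
    for q1, q2 in q_edges:
        for (qa, d1), (qb, d2) in product(nodes, repeat=2):
            if (qa, qb) == (q1, q2) and edge_ok[(q1, q2)](db_nodes[d1], db_nodes[d2]):
                edges.add(((q1, d1), (q2, d2)))
    return nodes, edges


# Node ids are assumed to follow the listing order of the tuples.
db_nodes = {
    "c1": ("C", ("Canada", "diverse")),
    "c2": ("C", ("UK", "temperate")),
    "c3": ("C", ("USA", "temperate")),
    "a1": ("A", ("UK", "London", "Plaza")),
    "a2": ("A", ("Canada", "Montreal", "Hilton")),
    "a3": ("A", ("Canada", "Toronto", "Ramada")),
    "s1": ("S", ("UK", "London", "Buckingham")),
    "s2": ("S", ("USA", "NY", "Metropolitan")),
}
vertex_ok = {
    "q1": lambda label: label[0] == "C",
    "q2": lambda label: label[0] == "A",
    "q3": lambda label: label[0] == "S",
}
edge_ok = {
    ("q1", "q2"): lambda c, a: c[1][0] == a[1][0],
    ("q1", "q3"): lambda c, s: c[1][0] == s[1][0],
    ("q2", "q3"): lambda a, s: a[1][0] == s[1][0] and a[1][1] == s[1][1],
}

nodes, edges = assignment_graph(edge_ok.keys(), db_nodes, vertex_ok, edge_ok)
for edge in sorted(edges):
    print(edge)
```
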

[Figure: the assignment graph for the join query, built step by step over the relational data graph]

Query nodes:
  q1: Belongs to: C
  q2: Belongs to: A
  q3: Belongs to: S

Query edge constraints:
  C.Country = A.Country
  C.Country = S.Country
  A.Country = S.Country and A.City = S.City

Database nodes, labeled c1, c2, c3, a1, a2, a3, s1, s2 in the figure:
  (C, (Canada, diverse))
  (C, (UK, temperate))
  (C, (USA, temperate))
  (A, (UK, London, Plaza))
  (A, (Canada, Montreal, Hilton))
  (A, (Canada, Toronto, Ramada))
  (S, (UK, London, Buckingham))
  (S, (USA, NY, Metropolitan))

Assignment graph nodes:
  (q1, c1), (q1, c2), (q1, c3)
  (q2, a1), (q2, a2), (q2, a3)
  (q3, s1), (q3, s2)

Partial Assignment

A partial assignment is any subgraph of Q × D that does not contain two different nodes (q, d) and (q, d')
  Otherwise, it would map the node q to two different database nodes

Can distinguish special types of partial assignments:
  Vertex complete: every query node must appear in the partial assignment
  Edge complete: every edge constraint between query variables in the partial assignment holds
  Structurally consistent: the partial assignment satisfies the query’s structural constraint
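
The three properties can be written as small predicates over a set of assignment-graph nodes. The sketch below is illustrative only: it treats a partial assignment as the subgraph induced by its node set, handles only the "connected" structural constraint (rooted queries are left out), and hard-codes the six assignment-graph edges that the join example yields under the assumed node ids c1..s2. All function and constant names are invented for this example.

```python
Q_NODES = {"q1", "q2", "q3"}
Q_EDGES = [("q1", "q2"), ("q1", "q3"), ("q2", "q3")]

# Edges of the assignment graph for the join example (assumed node ids).
AG_EDGES = {
    (("q1", "c1"), ("q2", "a2")), (("q1", "c1"), ("q2", "a3")),
    (("q1", "c2"), ("q2", "a1")), (("q1", "c2"), ("q3", "s1")),
    (("q1", "c3"), ("q3", "s2")), (("q2", "a1"), ("q3", "s1")),
}


def is_partial_assignment(nodes):
    """No query node may be mapped to two different database nodes."""
    query_nodes = [q for q, _ in nodes]
    return len(query_nodes) == len(set(query_nodes))


def vertex_complete(nodes):
    """Every query node appears in the partial assignment."""
    return {q for q, _ in nodes} == Q_NODES


def edge_complete(nodes):
    """Every query edge between assigned query nodes is matched.

    Assumes `nodes` is already a partial assignment.
    """
    assigned = dict(nodes)                      # q -> d
    return all(((q1, assigned[q1]), (q2, assigned[q2])) in AG_EDGES
               for q1, q2 in Q_EDGES
               if q1 in assigned and q2 in assigned)


def connected(nodes):
    """Structural constraint C: the induced subgraph is connected.

    The empty node set is treated as connected by convention.
    """
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in nodes - seen:
            if (u, v) in AG_EDGES or (v, u) in AG_EDGES:
                seen.add(v)
                stack.append(v)
    return seen == nodes


full = {("q1", "c2"), ("q2", "a1"), ("q3", "s1")}
partial = {("q1", "c1"), ("q2", "a2")}
mismatch = {("q1", "c3"), ("q2", "a1")}
for candidate in (full, partial, mismatch):
    print(sorted(candidate), vertex_complete(candidate),
          edge_complete(candidate), connected(candidate))
```
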

Example

[Figure: three partial assignments highlighted in the assignment graph with nodes (q1, c1), (q1, c2), (q1, c3), (q2, a1), (q2, a2), (q2, a3), (q3, s1), (q3, s2); each is annotated with the properties Vertex Complete, Edge Complete, Structurally Consistent]

Semantics

All partial assignments for Q over D that satisfy the vertex and edge constraints are encoded in Q × D

A semantics defines which subgraphs of the assignment graph (i.e., which partial assignments) are in fact answers, e.g.,
  S_ves allows all partial assignments that are vertex complete, edge complete and structurally consistent
  S_es allows all partial assignments that are edge complete and structurally consistent
  S_s allows all partial assignments that are structurally consistent

Usually, we are only interested in maximal partial assignments

Example: Join

[Figure: assignment graph with nodes (q1, c1), (q1, c2), (q1, c3), (q2, a1), (q2, a2), (q2, a3), (q3, s1), (q3, s2)]

Using semantics S_ves we get the natural join

Example: Join “becomes” a Full Disjunction

[Figure: assignment graph with nodes (q1, c1), (q1, c2), (q1, c3), (q2, a1), (q2, a2), (q2, a3), (q3, s1), (q3, s2)]

Using semantics S_es we get the full disjunction
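
To make the two join examples concrete, the brute-force sketch below enumerates every partial assignment of the example assignment graph, filters by a semantics, and keeps only the maximal ones. It is an exponential-time illustration, not the efficient algorithm discussed in the talk. The names `SEMANTICS` and `maximal_answers`, the keys "ves", "es" and "s", and the node ids c1..s2 with their six edges are assumptions of this example.

```python
from itertools import combinations

AG_NODES = [("q1", "c1"), ("q1", "c2"), ("q1", "c3"),
            ("q2", "a1"), ("q2", "a2"), ("q2", "a3"),
            ("q3", "s1"), ("q3", "s2")]
AG_EDGES = {frozenset(e) for e in [
    (("q1", "c1"), ("q2", "a2")), (("q1", "c1"), ("q2", "a3")),
    (("q1", "c2"), ("q2", "a1")), (("q1", "c2"), ("q3", "s1")),
    (("q1", "c3"), ("q3", "s2")), (("q2", "a1"), ("q3", "s1")),
]}
Q_NODES = {"q1", "q2", "q3"}
Q_EDGES = {frozenset(e) for e in [("q1", "q2"), ("q1", "q3"), ("q2", "q3")]}


def is_partial_assignment(s):
    return len({q for q, _ in s}) == len(s)


def vertex_complete(s):
    return {q for q, _ in s} == Q_NODES


def edge_complete(s):
    # Every query edge between two assigned query nodes must be matched.
    return all(frozenset((u, v)) in AG_EDGES
               for u, v in combinations(s, 2)
               if frozenset((u[0], v[0])) in Q_EDGES)


def connected(s):
    s = set(s)
    start = next(iter(s))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in s - seen:
            if frozenset((u, v)) in AG_EDGES:
                seen.add(v)
                stack.append(v)
    return seen == s


SEMANTICS = {
    "ves": lambda s: vertex_complete(s) and edge_complete(s) and connected(s),
    "es": lambda s: edge_complete(s) and connected(s),
    "s": connected,
}


def maximal_answers(semantics):
    """All maximal partial assignments accepted by the given semantics."""
    accepts = SEMANTICS[semantics]
    answers = [frozenset(c)
               for k in range(1, len(AG_NODES) + 1)
               for c in combinations(AG_NODES, k)
               if is_partial_assignment(c) and accepts(c)]
    return {a for a in answers if not any(a < b for b in answers)}


for name in ("ves", "es"):
    print(name, sorted(sorted(a) for a in maximal_answers(name)))
```

Under these assumptions, "ves" yields the single complete match of the natural join, while "es" additionally keeps the partial matches that cannot be extended, which is what a full disjunction returns.
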

Other Examples

Queries with incomplete answers over semistructured data: Kanza, Nutt, Sagiv (PODS 99)
  Weak semantics modeled by S_es; Or-semantics modeled by S_s

FleXPath: Amer-Yahia, Lakshmanan, Pandit (SIGMOD 04)
  Modeled by S_es

Interconnections: Cohen, Kanza, Sagiv (03)
  Complete interconnection can be modeled by S_es; Reachable interconnection can be modeled by S_s

Motivation

Queries and Databases

Answers and Semantics

Graph Properties

Semantics are a type of Graph Property

A graph property P is a set of graphs, e.g.,
  is a clique
  is a bipartite graph

A semantics defines a set of graphs, for every Q, D (these graphs are subgraphs of Q × D)

Therefore, semantics are a type of graph property

Hereditary Graph Properties and their Variants

There are several interesting types of graph properties that have been studied in graph theory

A graph property P is hereditary if every induced subgraph of a graph in P is also in P (e.g., is a clique, is a forest)

A graph property P is connected-hereditary if every connected induced subgraph of a graph in P is also in P (e.g., is a tree)

Can define rooted-hereditary similarly
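
The difference between hereditary and connected-hereditary can be checked by brute force on a single small graph. The sketch below is only a spot check on one graph instance, not a proof about a property; the helper names and the edge representation (frozensets of two vertices) are assumptions of this example. It shows that a path, which is a tree, has a disconnected induced subgraph that is not a tree, while all of its connected induced subgraphs are trees.

```python
from itertools import combinations


def induced_edges(edges, keep):
    """Edges of the subgraph induced by the vertex set `keep`."""
    return {e for e in edges if e <= keep}


def is_connected(vertices, edges):
    vertices = set(vertices)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in vertices - seen:
            if frozenset((u, v)) in edges:
                seen.add(v)
                stack.append(v)
    return seen == vertices


def is_clique(vertices, edges):
    return all(frozenset(pair) in edges for pair in combinations(vertices, 2))


def is_tree(vertices, edges):
    return is_connected(vertices, edges) and len(edges) == len(vertices) - 1


def induced_subgraphs_satisfy(vertices, edges, prop, connected_only=False):
    """Check `prop` on every nonempty induced subgraph of one graph.

    With connected_only=True, disconnected induced subgraphs are skipped.
    """
    for k in range(1, len(vertices) + 1):
        for keep in combinations(vertices, k):
            keep = frozenset(keep)
            sub = induced_edges(edges, keep)
            if connected_only and not is_connected(keep, sub):
                continue
            if not prop(keep, sub):
                return False
    return True


path_vertices = {"a", "b", "c"}
path_edges = {frozenset(("a", "b")), frozenset(("b", "c"))}
triangle_edges = path_edges | {frozenset(("a", "c"))}

print(induced_subgraphs_satisfy(path_vertices, path_edges, is_tree))
print(induced_subgraphs_satisfy(path_vertices, path_edges, is_tree, connected_only=True))
print(induced_subgraphs_satisfy(path_vertices, triangle_edges, is_clique))
```
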

Semantics are usually Hereditary

Most semantics for partial answers considered in the past are hereditary (in some sense), i.e., subgraphs of a partial answer are also partial answers

Many semantics require connectivity of results (e.g., full disjunctions)

Some require answers to be rooted (e.g., FleXPath)

Maximal P-Subgraph Problem

Given a graph property P and a graph G, the maximal P-subgraph problem is:
  Find all maximal induced subgraphs of G that have property P

Therefore, the problem of finding all maximal answers for a query over a database, under a given semantics, is a special case of the maximal P-subgraph problem
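
As a baseline for what the problem asks, the sketch below enumerates maximal P-subgraphs by trying every vertex subset. It is deliberately naive and exponential in the size of G, so it is not one of the efficient algorithms referred to in the talk; the function name `maximal_p_subgraphs`, the predicate signature, and the example graph are assumptions of this illustration.

```python
from itertools import combinations


def maximal_p_subgraphs(vertices, edges, has_property):
    """Return the vertex sets of all maximal induced subgraphs with property P.

    edges:        set of frozensets {u, v}
    has_property: function (vertex set, induced edge set) -> bool
    """
    vertices = list(vertices)
    satisfying = []
    for k in range(1, len(vertices) + 1):
        for keep in combinations(vertices, k):
            keep = frozenset(keep)
            sub = {e for e in edges if e <= keep}
            if has_property(keep, sub):
                satisfying.append(keep)
    # Keep only sets that are not strictly contained in another satisfying set.
    return {s for s in satisfying if not any(s < t for t in satisfying)}


def is_clique(vertices, edges):
    return all(frozenset(pair) in edges for pair in combinations(vertices, 2))


# Example graph: a triangle a-b-c with a pendant edge c-d.
vertices = ["a", "b", "c", "d"]
edges = {frozenset(e) for e in [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]}

result = maximal_p_subgraphs(vertices, edges, is_clique)
print(sorted(sorted(s) for s in result))
```
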

Efficient Query Evaluation

There are efficient algorithms that find all maximal P-subgraphs for hereditary, connected-hereditary and rooted-hereditary properties
  Efficient in terms of the input and the output (i.e., incremental polynomial time)

Use these algorithms to find maximal query answers, e.g., to find full disjunctions, weak answers, or-answers, etc.
  Improves upon previous results

Conclusion

Presented an abstract framework
  Can model many different types of queries, databases and semantics in the framework

Semantics in the framework are graph properties

Solve the maximal P-subgraph problem once and reuse it to find maximal query answers

Future Work

It is convenient to define ranking functions and return answers in ranking order

How/when can this be done in our framework?

Note: From the modeling it is immediately apparent that ranking cannot always be performed efficiently
  The problem of finding a maximal P-subgraph of size k is NP-complete for hereditary and connected-hereditary graph properties (Yannakakis, STOC 78)

Thank you!

Questions?