Upload: deasia-ballinger

Post on 31-Mar-2015

CS848: Topics in Databases: Foundations of Query Optimization

Topics covered

Introduction to description logic: Single column QL

The ALC family of dialects

Terminologies

Language extensions

Single column QL

D ::= THING | C

Q ::= D as x
    | (empty x)
    | (THING as x minus C as x)
    | (from Q1, Q2)
    | (elim x from x.A = y, elim y from y = x, Q)
    | (x.Pf1 = x.Pf2)
    | (THING as x minus x.Pf1 = x.Pf2)
    | (elim x x.R = y)
    | (THING as x minus elim x from x.R = y,
          elim y from y = x, THING as x minus Q)
    |

Initial analysis

The language L2 consists of all formulae of FOPC with equality and constant functions that use at most two distinct variables.

Theorem: The satisfiability problem for L2 is NEXPTIME-complete.

Corollary: The query containment problem for single column QL is decidable for queries that are attribute free.

New syntax (cont’d)

D ::= THING | C
    | ⊥    ⇔    (empty x)

Q ::= D as x
    | (THING as x minus C as x)
    | (from Q1, Q2)
    | (elim x from x.A = y, elim y from y = x, Q)
    | (x.Pf1 = x.Pf2)
    | (THING as x minus x.Pf1 = x.Pf2)
    | (elim x x.R = y)
    | (THING as x minus elim x from x.R = y,
          elim y from y = x, THING as x minus Q)
    |

New syntax (cont’d)

D ::= THING | C | ⊥
    | ¬C    ⇔    (THING as x minus C as x)

Q ::= D as x
    | (from Q1, Q2)
    | (elim x from x.A = y, elim y from y = x, Q)
    | (x.Pf1 = x.Pf2)
    | (THING as x minus x.Pf1 = x.Pf2)
    | (elim x x.R = y)
    | (THING as x minus elim x from x.R = y,
          elim y from y = x, THING as x minus Q)
    |

New syntax (cont’d)

D ::= THING | C | ⊥ | ¬C
    | C1 ⊓ C2    ⇔    (from Q1, Q2)

Q ::= D as x
    | (elim x from x.A = y, elim y from y = x, Q)
    | (x.Pf1 = x.Pf2)
    | (THING as x minus x.Pf1 = x.Pf2)
    | (elim x x.R = y)
    | (THING as x minus elim x from x.R = y,
          elim y from y = x, THING as x minus Q)
    |

New syntax (cont’d)

D ::= THING | C | ⊥ | ¬C | C1 ⊓ C2
    | ∀A.D    ⇔    (elim x from x.A = y, elim y from y = x, Q)

Q ::= D as x
    | (x.Pf1 = x.Pf2)
    | (THING as x minus x.Pf1 = x.Pf2)
    | (elim x x.R = y)
    | (THING as x minus elim x from x.R = y,
          elim y from y = x, THING as x minus Q)
    |

New syntax (cont’d)

D ::= THING | C | ⊥ | ¬C | C1 ⊓ C2
    | ∀A.D
    | Pf1 = Pf2    ⇔    (x.Pf1 = x.Pf2)

Q ::= D as x
    | (THING as x minus x.Pf1 = x.Pf2)
    | (elim x x.R = y)
    | (THING as x minus elim x from x.R = y,
          elim y from y = x, THING as x minus Q)
    |

New syntax (cont’d)

D ::= THING | C | ⊥ | ¬C | C1 ⊓ C2
    | ∀A.D | Pf1 = Pf2
    | Pf1 ≠ Pf2    ⇔    (THING as x minus x.Pf1 = x.Pf2)

Q ::= D as x
    | (elim x x.R = y)
    | (THING as x minus elim x from x.R = y,
          elim y from y = x, THING as x minus Q)
    |

New syntax (cont’d)

D ::= THING | C | ⊥ | ¬C | C1 ⊓ C2
    | ∀A.D | Pf1 = Pf2
    | Pf1 ≠ Pf2
    | ∃R.THING    ⇔    (elim x x.R = y)

Q ::= D as x
    | (THING as x minus elim x from x.R = y,
          elim y from y = x, THING as x minus Q)
    |

New syntax (cont’d)

D ::= THING | C | ⊥ | ¬C | C1 ⊓ C2
    | ∀A.D | Pf1 = Pf2
    | Pf1 ≠ Pf2
    | ∃R.THING
    | ∀R.D    ⇔    (THING as x minus elim x from x.R = y,
                        elim y from y = x, THING as x minus Q)

Q ::= D as x
    |

New syntax (cont’d)

Q ::= D as x |

D ::= THING | C | ⊥ | ¬C | C1 ⊓ C2
    | ∀A.D | Pf1 = Pf2
    | Pf1 ≠ Pf2
    | ∃R.THING | ∀R.D | (D)

New syntax (cont’d)

Q ::= D as x |

D ::= ⊤ | C | ⊥ | ¬C | C1 ⊓ C2
    | ∀A.D | Pf1 = Pf2
    | Pf1 ≠ Pf2
    | ∃R.⊤ | ∀R.D | (D)

Concept dependencies

On terminology and notation: We call an instance of the language generated by D for a given DL a concept. A concept inclusion dependency C for a given DL is written

D1 ⊑ D2

and corresponds to the query containment dependency

(D1 as x) ⊑ (D2 as x).

A concept definition C for a given DL is written

C ≡ D

and corresponds to the query equivalence dependency

(C as x) ≡ (D as x).

CLASSIC† (our first DL)

                                       (syntax)      (semantics)
D ::= (universal concept)              ⊤             Δ
    | (primitive concept)              C             (C)^I
    | (bottom concept)                 ⊥             ∅
    | (atomic negation)                ¬C            Δ − (C)^I
    | (intersection)                   D1 ⊓ D2       (D1)^I ∩ (D2)^I
    | (attribute value restriction)    ∀A.D          {e : (A)^I(e) ∈ (D)^I}
    | (path agreement)                 Pf1 = Pf2     {e : (Pf1)^I(e) = (Pf2)^I(e)}
    | (path disagreement)              Pf1 ≠ Pf2     {e : (Pf1)^I(e) ≠ (Pf2)^I(e)}
    | (existential quantification)     ∃R.D          {e1 : ∃e2 : (e1, e2) ∈ (R)^I ∧ e2 ∈ (D)^I}
    | (role value restriction)         ∀R.D          {e1 : ∀(e1, e2) ∈ (R)^I : e2 ∈ (D)^I}
    |                                  (D)

Here Δ stands for the domain of I.

†[Borgida and Patel-Schneider, 1994]
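The semantics column can be read as a recursive evaluator over a finite database. The following is a minimal sketch only: the tuple encoding of concepts, the dictionary layout of the interpretation, and the names `follow`, `ext` and `I_EXAMPLE` are all assumptions made for illustration, not part of the slides. Missing attribute values are treated as "undefined", so an agreement or disagreement holds only when both paths are defined.

```python
from typing import Any, Dict, Optional, Set, Tuple

Concept = Tuple[Any, ...]  # hypothetical encoding, e.g. ("and", d1, d2)

def follow(interp: Dict[str, Any], pf: Tuple[str, ...], e: Any) -> Optional[Any]:
    """Follow attribute path pf from e; None if some attribute value is missing."""
    for a in pf:
        if e is None:
            return None
        e = interp["attrs"].get(a, {}).get(e)
    return e

def ext(d: Concept, interp: Dict[str, Any]) -> Set[Any]:
    """Return (d)^I, the set of individuals of interp that belong to concept d."""
    dom = set(interp["domain"])
    tag = d[0]
    if tag == "top":
        return dom
    if tag == "bot":
        return set()
    if tag == "prim":
        return set(interp["concepts"].get(d[1], set()))
    if tag == "not":                      # complement with respect to the domain
        return dom - ext(d[1], interp)
    if tag == "and":
        return ext(d[1], interp) & ext(d[2], interp)
    if tag == "all_attr":                 # attribute value restriction
        target = ext(d[2], interp)
        return {e for e in dom if follow(interp, (d[1],), e) in target}
    if tag in ("agree", "disagree"):      # path agreement / disagreement
        out = set()
        for e in dom:
            v1, v2 = follow(interp, d[1], e), follow(interp, d[2], e)
            if v1 is not None and v2 is not None and (v1 == v2) == (tag == "agree"):
                out.add(e)
        return out
    succ = lambda e: {y for (x, y) in interp["roles"].get(d[1], set()) if x == e}
    if tag == "some":                     # existential quantification
        target = ext(d[2], interp)
        return {e for e in dom if succ(e) & target}
    if tag == "all":                      # role value restriction
        target = ext(d[2], interp)
        return {e for e in dom if succ(e) <= target}
    raise ValueError(f"unknown constructor: {tag}")

# A made-up three-element database used only to exercise the evaluator.
I_EXAMPLE = {
    "domain": {1, 2, 3},
    "concepts": {"EMP": {1, 2}},
    "attrs": {"b": {1: 1, 2: 3, 3: 2}},
    "roles": {"hasChild": {(1, 2), (1, 3)}},
}
```

The empty path `()` plays the role of `id`, so `("agree", (), ("b",))` encodes the agreement id = b.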

Concept dependencies (cont’d)

The concept inclusion problem for a given DL is to determine if a concept inclusion dependency in the DL, D1 ⊑ D2, is an axiom; that is, to determine if (D1)^I ⊆ (D2)^I for every database I.

Theorem: The concept inclusion problem for CLASSIC is solvable in low order polynomial time.

An efficient decision procedure

Theorem: The following procedure decides if C = (D1 ⊑ D2) is an axiom for CLASSIC, and can be implemented in low order polynomial time.

1. Create a partial database I1 consisting of a single individual e in concept D1. Perform a simple chase of I1 to obtain a partial database I2.

2. Return true if the domain of I2 is empty, or if the tuple

⟨x : e, cnt : 1⟩

occurs in ⟦D2 as x⟧(I2)†; otherwise return false.

†Use forced semantics for agreements and disagreements.

The simple chase

n : {D1 ⊓ D2} ∪ L    ⟹    n : {D1, D2} ∪ L

n1 : {∀A.D} ∪ L    ⟹    n1 : L --A--> n2 : {D}

n1 : {∃R.D} ∪ L    ⟹    n1 : L --R--> n2 : {D}

The simple chase (cont’d)

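n1 : {∀R.D} ∪ L1 --R--> n2 : L2    ⟹    n1 : L1 --R--> n2 : {D} ∪ L2

n : {A1.A2. ... .Ar = B1.B2. ... .Bs} ∪ L    ⟹    n : L --A1--> u1 : ∅ --A2--> ... --Ar--> ur : ∅
                                                   n : L --B1--> v1 : ∅ --B2--> ... --Bs--> vs : ∅

[Diagram residue: the arc drawn between ur and vs did not survive extraction.]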

The simple chase (cont’d)

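n : {A1.A2. ... .Ar ≠ B1.B2. ... .Bs} ∪ L    ⟹    n : L --A1--> u1 : ∅ --A2--> ... --Ar--> ur : ∅
                                                   n : L --B1--> v1 : ∅ --B2--> ... --Bs--> vs : ∅

[Diagram residue: the arc drawn between ur and vs did not survive extraction.]

[Diagram: node w : L with A-labelled arcs to u : L1 and v : L2, shown before and after the rule; the arc that the rule adds is not recoverable from the extracted text.]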

The simple chase (cont’d)

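n1 : L1    n2 : L2    ⟹    n1 : L1 ∪ L2    n2 : L1 ∪ L2

[Diagram residue: the arc joining n1 and n2 did not survive extraction.]

[Diagram: nodes n1 : L1, n2 : L2, n3 : L3, shown before and after the rule; arcs not recoverable.]

[Diagram: nodes u : L1, v : L3, w : L2, x : L4 with two A-labelled arcs, shown before and after the rule; remaining arcs not recoverable.]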

The simple chase (cont’d)

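[Diagram: node w : L with A-labelled arcs to u : L1 and v : L2 is rewritten so that w is labelled {⊥}; the arc between u and v that triggers the rule is not recoverable.]

[Diagram: nodes u : L1, v : L3, w : L2, x : L4 with two A-labelled arcs, shown before and after the rule; remaining arcs not recoverable.]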

The simple chase (cont’d)

n : {⊥} ∪ L
    or
m : L1    n : L2              ⟹    (remove all nodes and incident arcs)
    or
n : {C, ¬C} ∪ L

[Diagram residue: the arcs drawn between m and n did not survive extraction.]

Evaluating agreements and disagreements

Note that agreements and disagreements can navigate missing attribute values. In such cases, assume a forced semantics. In particular, a node n satisfies an agreement iff the agreement has the form

Pf1.Pf = Pf2.Pf

where (Pf1)^I(n) and (Pf2)^I(n) are defined and lead to nodes connected by an equality arc; n satisfies a disagreement iff it has the form

Pf1 ≠ Pf2

where (Pf1)^I(n) and (Pf2)^I(n) are defined and lead to nodes connected by an inequality arc.

Example

Observation: The chase decision procedure for CLASSIC can be implemented in O(n log n) time, where n is the length of the component descriptions.

select e from EMP as e where e = e.b.b.b and e = e.b.b.b.b.b

(from (EMP as x), (from (x = x.b.b.b), (x = x.b.b.b.b.b)))

≡ (EMP ⊓ (id = b.b.b) ⊓ (id = b.b.b.b.b) as x)

EMP ⊓ (id = b.b.b) ⊓ (id = b.b.b.b.b)

≡ EMP ⊓ (id = id.b)

(EMP ⊓ (id = b) as x)

select e from EMP as e where e = e.b
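Why the two agreements collapse to one: if b³(e) = e and b⁵(e) = e, then b(e) = b(b⁵(e)) = b⁶(e) = b³(b³(e)) = e, and the converse is immediate. The sketch below is only a brute-force sanity check of that equivalence over every total function b on small domains; it is not the chase procedure itself, and the names `power` and `agreement_collapses` are hypothetical.

```python
from itertools import product

def power(b, e, k):
    """Apply the function b (a tuple mapping index -> value) k times to e."""
    for _ in range(k):
        e = b[e]
    return e

def agreement_collapses(max_size=4):
    """Check (e = e.b.b.b and e = e.b.b.b.b.b) <=> (e = e.b) on all small cases."""
    for n in range(1, max_size + 1):
        # every total function b : {0..n-1} -> {0..n-1}
        for b in product(range(n), repeat=n):
            for e in range(n):
                lhs = power(b, e, 3) == e and power(b, e, 5) == e
                rhs = power(b, e, 1) == e
                if lhs != rhs:
                    return False
    return True
```

The check assumes b is a total attribute; with missing values the forced semantics of the previous slide applies instead.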

The ALC family of DLs

                                               (syntax)     (semantics)
D ::= (primitive concept)                      C            (C)^I
    | (universal concept)                      ⊤            Δ
    | (bottom concept)                         ⊥            ∅
    | (atomic negation)                        ¬C           Δ − (C)^I
    | (intersection)                           D1 ⊓ D2      (D1)^I ∩ (D2)^I
    | (role value restriction)                 ∀R.D         {e1 : ∀(e1, e2) ∈ (R)^I : e2 ∈ (D)^I}
    | (limited existential quantification)     ∃R.⊤         {e1 : ∃e2 : (e1, e2) ∈ (R)^I}
    | (union)                                  D1 ⊔ D2      (D1)^I ∪ (D2)^I
    | (full existential quantification)        ∃R.D         {e1 : ∃e2 : (e1, e2) ∈ (R)^I ∧ e2 ∈ (D)^I}
    | (quantified number restriction)          (≥ n R)      {e1 : |{e2 : (e1, e2) ∈ (R)^I}| ≥ n}
    | (quantified number restriction)          (≤ n R)      {e1 : n ≥ |{e2 : (e1, e2) ∈ (R)^I}|}
    | (full negation)                          ¬D           Δ − (D)^I

Here Δ stands for the domain of I.

The ALC family of DLs (cont’d)

                  FL0   FL–   AL    ALN
D ::= C            ✓     ✓     ✓     ✓
    | ⊤                  ✓     ✓     ✓
    | ⊥                  ✓     ✓     ✓
    | ¬C                       ✓     ✓
    | D1 ⊓ D2      ✓     ✓     ✓     ✓
    | ∀R.D         ✓     ✓     ✓     ✓
    | ∃R.⊤               ✓     ✓     ✓
    | D1 ⊔ D2
    | ∃R.D
    | (≥ n R)                        ✓
    | (≤ n R)                        ✓
    | ¬D

The ALC family of DLs (cont’d)

                  ALU   ALE   ALUE  ALC   ALCN
D ::= C            ✓     ✓     ✓     ✓     ✓
    | ⊤            ✓     ✓     ✓     ✓     ✓
    | ⊥            ✓     ✓     ✓     ✓     ✓
    | ¬C           ✓     ✓     ✓     ✓     ✓
    | D1 ⊓ D2      ✓     ✓     ✓     ✓     ✓
    | ∀R.D         ✓     ✓     ✓     ✓     ✓
    | ∃R.⊤         ✓     ✓     ✓     ✓     ✓
    | D1 ⊔ D2      ✓           ✓     ±     ✓
    | ∃R.D               ✓     ✓     ±     ✓
    | (≥ n R)                              ✓
    | (≤ n R)                              ✓
    | ¬D                       ±     ✓     ✓

Some complexity results

Theorem: The concept inclusion problems for ALC and ALCN are PSPACE-complete.

A consistency problem for a given set of concepts is to determine if there exists a database that interprets a given member of the set as nonempty.

Observation: The consistency problem for ALC (resp. ALCN ) coincides with the concept inclusion problem for ALC (resp. ALCN ). In particular,

D1 ⊑ D2

is an axiom iff the concept (D1 ⊓ ¬D2)

is not consistent.

Testing consistency in ALC

Theorem: The following procedure decides if a given concept D in ALC is consistent.

1. Create a singleton set S1 = {I} of partial databases in which I consists of a single individual e in concept D. Perform a union generalized chase of S1 to obtain a set of partial databases S2 = {I1, … , In}.

2. Return true if the domain of any database in S2 is nonempty; otherwise return false.

Union generalized chase

Repeatedly do the following to a given set of partial databases S until no changes occur.

1. Apply the simple chase augmented with the negation rule to a member of S.

2. If S contains a partial database I that in turn contains a node n with the form on the left below, then replace I with two partial databases I1 and I2 in S in which the labeling of node n is revised to the forms on the right below.

e : {D1 ⊔ D2} ∪ L    ⟹    e : {D1} ∪ L    |    e : {D2} ∪ L

(old node n in I)          (new node n in I1)   (new node n in I2)
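A compact way to experiment with this procedure is a recursive, label-based tableau. Note the swap: instead of maintaining an explicit set S of partial databases, the sketch below explores one branch at a time and checks each role successor in isolation, which is adequate for ALC concepts without a terminology. It assumes the input is already in negation normal form, and the tuple encoding and the name `consistent` are assumptions made for illustration.

```python
def consistent(label):
    """Return True if the conjunction of the NNF concepts in label is satisfiable."""
    label = set(label)

    # Intersection rule: add both conjuncts until nothing changes.
    changed = True
    while changed:
        changed = False
        for d in list(label):
            if d[0] == "and" and not {d[1], d[2]} <= label:
                label |= {d[1], d[2]}
                changed = True

    # Clash: bottom, or a primitive concept together with its negation.
    if ("bot",) in label:
        return False
    if any(d[0] == "not" and d[1] in label for d in label):
        return False

    # Union rule: branch on the first disjunction with no disjunct present.
    for d in label:
        if d[0] == "or" and d[1] not in label and d[2] not in label:
            return consistent(label | {d[1]}) or consistent(label | {d[2]})

    # Existential rule: each R-successor gets the filler plus all
    # value restrictions on R found in the current label.
    for d in label:
        if d[0] == "some":
            succ = {d[2]} | {a[2] for a in label if a[0] == "all" and a[1] == d[1]}
            if not consistent(succ):
                return False
    return True
```

By the observation on the previous slide, D1 ⊑ D2 can then be tested by calling `consistent` on the negation normal form of D1 ⊓ ¬D2 and negating the answer.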

The negation rule

Exhaustively apply the following rewrites to the concept labeling for any given node:†

¬⊤ ⇒ ⊥
¬⊥ ⇒ ⊤
¬¬D ⇒ D
¬(D1 ⊓ D2) ⇒ (¬D1) ⊔ (¬D2)
¬∀A.D ⇒ ∀A.¬D
¬∀R.D ⇒ ∃R.¬D
¬∃R.D ⇒ ∀R.¬D
¬(D1 ⊔ D2) ⇒ (¬D1) ⊓ (¬D2)

†Obtains negation normal form for concept descriptions.
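The rewrites above translate directly into a recursive function that pushes negation inward. This is a sketch under an assumed tuple encoding of concepts (`"all_attr"` for ∀A.D, `"all"` and `"some"` for roles); the function name `nnf` is hypothetical.

```python
def nnf(d):
    """Return the negation normal form of concept d (nested tuples)."""
    tag = d[0]
    if tag == "not":
        inner = d[1]
        t = inner[0]
        if t == "top":
            return ("bot",)
        if t == "bot":
            return ("top",)
        if t == "not":                       # double negation
            return nnf(inner[1])
        if t == "and":                       # De Morgan
            return ("or", nnf(("not", inner[1])), nnf(("not", inner[2])))
        if t == "or":                        # De Morgan
            return ("and", nnf(("not", inner[1])), nnf(("not", inner[2])))
        if t == "all_attr":                  # attributes: negation moves inside
            return ("all_attr", inner[1], nnf(("not", inner[2])))
        if t == "all":
            return ("some", inner[1], nnf(("not", inner[2])))
        if t == "some":
            return ("all", inner[1], nnf(("not", inner[2])))
        return d                             # negated primitive concept stays
    if tag in ("and", "or"):
        return (tag, nnf(d[1]), nnf(d[2]))
    if tag in ("all", "some", "all_attr"):
        return (tag, d[1], nnf(d[2]))
    return d                                 # top, bot, primitive concept
```

For example, ¬(A ⊓ ∀R.¬B) becomes (¬A) ⊔ ∃R.B.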

A general membership problem

A database schema T that consists of concept dependencies in which no primitive concept occurs more than once on the left-hand-side of a concept definition is called a terminology.

The membership problem for a DL dialect is to determine, given a set {C1, … , Cn, C} of concept dependencies in the DL, if {C1, … , Cn} ⊨ C; that is, if every database I that models each Ci also models C.

Theorem: The membership problem for CLASSIC is undecidable.

Theorem: The membership problem for ALCN is DEXPTIME-complete.

Varieties of terminologies

A terminology T with only concept definitions is definitional.

For each C1 ≡ D occurring in a terminology T and each primitive concept C2 occurring in D, C1 has a direct use of C2. The use relation is the transitive closure of direct use.

T is cyclic iff there exists an atomic concept in T that has a use of itself.

T is acyclic iff it is definitional and is not cyclic.

An acyclic terminology in ALC

WOMAN ≡ PERSON ⊓ FEMALE

MAN ≡ PERSON ⊓ ¬WOMAN

MOTHER ≡ WOMAN ⊓ ∃hasChild.PERSON

FATHER ≡ MAN ⊓ ∃hasChild.PERSON

PARENT ≡ FATHER ⊔ MOTHER

GRANDMOTHER ≡ MOTHER ⊓ ∃hasChild.PARENT

MOTHERWITHMANYCHILDREN ≡ MOTHER ⊓ ≥ 3 hasChild

MOTHERWITHOUTDAUGHTER ≡ MOTHER ⊓ ∀hasChild.¬WOMAN

WIFE ≡ WOMAN ⊓ ∃hasHusband.MAN
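That this terminology is acyclic can be checked mechanically from the definitions of direct use and use. In the sketch below each definition is reduced by hand to the set of primitive concepts on its right-hand side; the dictionary `FAMILY` and the functions `uses` and `is_cyclic` are illustrative names, not part of the slides.

```python
# Direct use: defined concept -> primitive concepts occurring in its definition.
FAMILY = {
    "WOMAN": {"PERSON", "FEMALE"},
    "MAN": {"PERSON", "WOMAN"},
    "MOTHER": {"WOMAN", "PERSON"},
    "FATHER": {"MAN", "PERSON"},
    "PARENT": {"FATHER", "MOTHER"},
    "GRANDMOTHER": {"MOTHER", "PARENT"},
    "MOTHERWITHMANYCHILDREN": {"MOTHER"},
    "MOTHERWITHOUTDAUGHTER": {"MOTHER", "WOMAN"},
    "WIFE": {"WOMAN", "MAN"},
}

def uses(direct):
    """Transitive closure of the direct-use relation."""
    closure = {c: set(ds) for c, ds in direct.items()}
    changed = True
    while changed:
        changed = False
        for c in closure:
            reachable = set()
            for d in closure[c]:
                reachable |= closure.get(d, set())
            if not reachable <= closure[c]:
                closure[c] |= reachable
                changed = True
    return closure

def is_cyclic(direct):
    """A terminology is cyclic iff some concept has a use of itself."""
    return any(c in used for c, used in uses(direct).items())
```

Here `is_cyclic(FAMILY)` is false, whereas a made-up definition such as HUMAN ≡ ANIMAL ⊓ ∀hasParent.HUMAN, encoded as `{"HUMAN": {"ANIMAL", "HUMAN"}}`, is reported as cyclic.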

More complexity results

Theorem: The membership problem for FL0 with acyclic terminologies is CoNP-complete.

Theorem: The membership problem for ALC with acyclic terminologies is PSPACE-complete.

The DL ALCF extends ALC with agreements and disagreements of path functions.

Theorem: The concept inclusion problem for ALCF is PSPACE-complete.

Theorem: The membership problem for ALCF with acyclic terminologies is NEXPTIME-complete.

Blocking

Theorem: The membership problem for ALCN is DEXPTIME-complete.

The membership problem for ALCN can be solved by a refinement of the consistency checking algorithm for concepts in ALC. There are two important tricks to note.

1. Each concept dependency occurring in the terminology, e.g. D1 ⊑ D2, is internalized to each new node by adding a corresponding concept, e.g. (¬D1 ⊔ D2), to the node’s label.

2. To ensure termination, no chasing is performed on blocked nodes. A node is blocked if its concepts are included in an older node.
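The two tricks can be sketched in isolation as follows. This is not a full ALCN procedure: it only shows how a node label might be seeded and how the subset test for blocking might look, under an assumed tuple encoding of concepts. The names `internalize`, `new_node_label` and `is_blocked` are hypothetical, and the concepts produced would still need to be brought into negation normal form before chasing.

```python
def internalize(inclusions):
    """Turn each inclusion (D1, D2), read as D1 ⊑ D2, into the concept ¬D1 ⊔ D2."""
    return {("or", ("not", d1), d2) for d1, d2 in inclusions}

def new_node_label(seed, inclusions):
    """Label for a freshly created node: its own concepts plus every internalized inclusion."""
    return set(seed) | internalize(inclusions)

def is_blocked(label, older_labels):
    """A node is blocked if its concepts are included in the label of some older node."""
    return any(set(label) <= set(old) for old in older_labels)
```

With an inclusion such as A ⊑ ∃R.A, every R-successor receives the same internalized concept, so a successor's label is soon contained in an ancestor's label and chasing stops there.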

Language extensions

Role constructors

Role value maps

Uniqueness constraints