cse 544 relational calculus - university of washington · announcements • guest lecture on...

71
CSE 544 Relational Calculus Lecture #2 January 11 th , 2011 1 Dan Suciu -- 544, Winter 2011

Upload: phungthuan

Post on 28-Aug-2019

214 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

CSE 544 Relational Calculus

Lecture #2 January 11th, 2011

1 Dan Suciu -- 544, Winter 2011

Page 2: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Announcements

•  HW1 is posted

•  First paper review due next week

•  Friday: project groups are due

Dan Suciu -- 544, Winter 2011 2

Page 3: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Announcements

•  Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases

•  Need to makeup one lecture this/next week. When ? –  Friday 1/14, 12:30-2pm ?

–  Wednesday, 1/19, morning (9…12) ? –  Friday, 1/21, late morning (11...1pm) ?

Dan Suciu -- 544, Winter 2011 3

Page 4: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Today’s Agenda

Relational calculus: •  Domain Relational Calculus •  Non-recursive datalog (a reasonable

abstraction of SQL) •  Relational algebra

4 Dan Suciu -- 544, Winter 2011

They are equivalent and why we care

Page 5: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Domain Relational Calculus

Given: •  A vocabulary: R1, …, Rk

•  An arity, ar(Ri), for each i=1,…,k •  An infinite supply of variables x1, x2, x3,

… •  Constants: c1, c2, c3, ...

Dan Suciu -- 544, Winter 2011 5

Page 6: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Domain Relational Calculus

Dan Suciu -- 544, Winter 2011 6

t ::= x | a /* variable or constant */

ϕ ::= R(t1, ..., tk) | ti = tj

| ϕ ∧ ϕ’ | ϕ ∨ ϕ’ | ¬ϕ’

| ∀x.ϕ | ∃x.ϕ

Terms t and Formulas ϕ are:

A query Q is any formula ϕ, usually written as:

Q = { x | ϕ }

where x = (x1,x2,…) are the free variables in ϕ.

Page 7: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Example: Querying a Graph

1

2

4

3

R encodes a graph

1 2 2 1 2 3 1 4 3 4

R=

{x | ∃y.R(x,y)}

{x | ∃y.∃z.∃u.(R(x,y) ∧ R(y,z) ∧ R(z,u))

{(x,y) | ∀z.(R(x,z) R(y,z)}

What do these queries return ?

Page 8: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Example: Querying a Graph

1

2

4

3

1 2 2 1 2 3 1 4 3 4

R=

{x | ∃y.R(x,y)}

{x | ∃y.∃z.∃u.(R(x,y) ∧ R(y,z) ∧ R(z,u))

{(x,y) | ∀z.(R(x,z) R(y,z)}

Nodes that have at least one child: {1,2,3}

Nodes that have a great-grand-child: {1,2}

Every child of x is a child of y: {(1,1),(2,2),(3,1),(3,3),(4,1), (4,2), (4,3)}

R encodes a graph What do these queries return ?

Page 9: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Semantics of a Formula

•  A database instance (or a model) is D = (D, R1

D, …, RkD)

– D = a set, called domain, or universe – Ri

D ⊆ D × D × ... × D, (ar(Ri) times) i = 1,...,k

•  A valuation is a function s : {x1, x2, ...} D •  The semantics of a formula ϕ says when

D ⊨ ϕ[s] holds

Dan Suciu -- 544, Winter 2011 9

The domain D is not part of the database, yet we need it to define the semantics. Where ?

Page 10: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Semantics of D ⊨ ϕ[s]

Dan Suciu -- 544, Winter 2011 10

D ⊨ (R(t1, ..., tn)) [s]

D ⊨ (t= t’) [s]

If (s(t1), ..., s(tn)) ∈ RD

If s(t) = s(t’)

D ⊨ (ϕ ∧ ϕ’) [s]

D ⊨ (ϕ ∨ ϕ’) [s]

D ⊨ (¬ϕ’) [s]

If D ⊨ ϕ[s] and D ⊨ (ϕ’) [s]

If D ⊨ ϕ[s] or D ⊨ ϕ’[s]

If it is not the case D ⊨ ϕ[s]

Page 11: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Semantics of D ⊨ ϕ[s]

Dan Suciu -- 544, Winter 2011 11

D ⊨ (∀x.ϕ) [s]

D ⊨ (∃x.ϕ) [s]

If for every constant a in D, D ⊨ ϕ[sa/x]

If for some constant a in D, D ⊨ ϕ[sa/x]

Notation: sa/x = the valuation s modified to map x to a

Page 12: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Semantics of D ⊨ ϕ[s]

Dan Suciu -- 544, Winter 2011 12

D ⊨ (∀x.ϕ) [s]

D ⊨ (∃x.ϕ) [s]

If for every constant a in D, D ⊨ ϕ[sa/x]

If for some constant a in D, D ⊨ ϕ[sa/x]

Notation: sa/x = the valuation s modified to map x to a

Note: here we need the domain D

Page 13: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Semantics of a Query

Dan Suciu -- 544, Winter 2011 13

The semantics of a query Q = { x | ϕ } is

Q(D) = { s(x) | for all s such that D ⊨ ϕ[s] }

Page 14: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

The drinkers-bars-beers example

14

Likes(drinker, beer) Frequents(drinker, bar) Serves(bar, beer)

x: ∃y. ∃z. Frequents(x, y) ∧ Serves(y,z) ∧ Likes(x,z)

What does this query compute ?

Page 15: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

The drinkers-bars-beers example

15

Find drinkers that frequent some bar that serves some beer they like.

Likes(drinker, beer) Frequents(drinker, bar) Serves(bar, beer)

x: ∃y. ∃z. Frequents(x, y) ∧ Serves(y,z) ∧ Likes(x,z)

What does this query compute ?

Page 16: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

The drinkers-bars-beers example Likes(drinker, beer) Frequents(drinker, bar) Serves(bar, beer)

x: ∀y. Frequents(x, y)⇒ (∃z. Serves(y,z)∧Likes(x,z))

What does this query compute ?

Page 17: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

The drinkers-bars-beers example

Find drinkers that frequent only bars that serves some beer they like.

Likes(drinker, beer) Frequents(drinker, bar) Serves(bar, beer)

x: ∀y. Frequents(x, y)⇒ (∃z. Serves(y,z)∧Likes(x,z))

What does this query compute ?

Page 18: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

The drinkers-bars-beers example Likes(drinker, beer) Frequents(drinker, bar) Serves(bar, beer)

x: ∃y. Frequents(x, y)∧∀z.(Serves(y,z) ⇒ Likes(x,z))

What does this query compute ?

Page 19: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

The drinkers-bars-beers example

Find drinkers that frequent some bar that serves only beers they like.

Likes(drinker, beer) Frequents(drinker, bar) Serves(bar, beer)

x: ∃y. Frequents(x, y)∧∀z.(Serves(y,z) ⇒ Likes(x,z))

What does this query compute ?

Page 20: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

The drinkers-bars-beers example

Dan Suciu -- 544, Winter 2011 20

Likes(drinker, beer) Frequents(drinker, bar) Serves(bar, beer)

x: ∀y.Frequents(x, y)⇒ ∀z.(Serves(y,z) ⇒ Likes(x,z))

What does this query compute ?

Page 21: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

The drinkers-bars-beers example

Dan Suciu -- 544, Winter 2011 21

Find drinkers that frequent only bars that serves only beer they like.

Likes(drinker, beer) Frequents(drinker, bar) Serves(bar, beer) What does this query compute ?

x: ∀y.Frequents(x, y)⇒ ∀z.(Serves(y,z) ⇒ Likes(x,z))

Page 22: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Summary of Relational Calculus

•  Same as First Order Logic •  See book for the two variants:

– Domain relational calculus (what we discussed) – Tuple relational calculus

•  This is a powerful, concise language

Dan Suciu -- 544, Winter 2011 22

Page 23: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Datalog Rule

Li = a literal, or atom, or subgoal; can be one of:

•  R(x1, x2, …) = relational atom •  u = v, or u ≠ v, where u, v = variables or constants S = a new symbol

x = (x1, x2, …) are head variables

Dan Suciu -- 544, Winter 2011 23

Q(x ) :- L1, L2, …, Lk

Head Body

Q(x) :- Likes(x,y),Likes(x,z),y≠z Example:

Page 24: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Semantics of Datalog Rule

Dan Suciu -- 544, Winter 2011 24

Q = {x | ∃y1.∃y2… L1∧L2 ∧…∧Lk} Q(x ) :- L1, L2, …, Lk

y1, y2,… are all non-head variables

Q(D) :- { s(x) | s = valuation s.t. D ⊨ L1[s],…,D ⊨ Lk[s] }

The semantics is:

Note: a datalog rule is also called a conjunctive query

Page 25: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Non-Recursive Datalog

Dan Suciu -- 544, Winter 2011 25

S1(x1) :- body1 S2(x2) :- body2 S3(x3) :- body3 . . .

Each symbol Sk may appear in bodyk+1, bodyk+2, …, but may not appear in body1,…,bodyk.

Example:

A(x,y) :- R(x,z),R(z,y) B(x,y) :- A(x,z),A(z,y) C(x,y) :- B(x,z),B(z,y) Q(x,y) :- C(x,z),C(z,y)

What does Q compute ?

Page 26: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Non-Recursive Datalog

Dan Suciu -- 544, Winter 2011 26

S1(x1) :- body1 S2(x2) :- body2 S3(x3) :- body3 . . .

Each symbol Sk may appear in bodyk+1, bodyk+2, …, but may not appear in body1,…,bodyk.

Example:

A(x,y) :- R(x,z),R(z,y) B(x,y) :- A(x,z),A(z,y) C(x,y) :- B(x,z),B(z,y) Q(x,y) :- C(x,z),C(z,y)

What does Q compute ?

A: all pairs (x,y) connected by a path of length 16.

Page 27: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Comparison

Dan Suciu -- 544, Winter 2011 27

Is non-recursive datalog equivalent to the Relational Calculus ?

Page 28: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Comparison

Dan Suciu -- 544, Winter 2011 28

Is non-recursive datalog equivalent to the Relational Calculus ?

What about writing this query in non-recursive datalog ?

Q = {x | ∃y. Frequents(x,y) ∧ Serves(y,’Bud’) ∨ ∃z. Frequents(x,z) ∧ Serves(z,’Miller’) }

Page 29: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Comparison

Dan Suciu -- 544, Winter 2011 29

Is non-recursive datalog equivalent to the Relational Calculus ?

What about writing this query in non-recursive datalog ?

Q(x) :- Frequents(x,y), Serves(y,’Bud’) Q(x) :- Frequents(x,z), Serves(z,’Miller’)

Q = {x | ∃y. Frequents(x,y) ∧ Serves(y,’Bud’) ∨ ∃z. Frequents(x,z) ∧ Serves(z,’Miller’) }

Page 30: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Comparison

Dan Suciu -- 544, Winter 2011 30

Is non-recursive datalog equivalent to the Relational Calculus ?

Q = {x | ∃y. Frequents(x,y) ∧ ¬∃z.Likes(x,z)∧Serves(y,z)

Now what about writing this query in non-recursive datalog ?

Page 31: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Comparison

Dan Suciu -- 544, Winter 2011 31

Is non-recursive datalog equivalent to the Relational Calculus ?

Now what about writing this query in non-recursive datalog ?

Impossible ! Proof on next slides.

Q = {x | ∃y. Frequents(x,y) ∧ ¬∃z.Likes(x,z)∧Serves(y,z)

Page 32: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Monotone Queries

•  Given two database instances D = (D, R1, …, Rk) and D’ = (D’, R’1, …, R’k) we write D⊆ D’ if D⊆D’, R1⊆R’1,…, Rk⊆R’k.

•  In other words, D’ is obtained by adding tuples to D.

•  We say that Q is monotone if whenever D⊆ D’, Q(D)⊆ Q(D’).

Dan Suciu -- 544, Winter 2011 32

Page 33: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Monotone Queries

Dan Suciu -- 544, Winter 2011 33

Fact. Every datalog rule is monotone. Why ?

Fact. Every datalog program is monotone. Why ?

Fact. This query is not monotone: Q = {x | ∃y. Likes(x,y) ∧ ¬∃z.Frequents(x,z)∧Serves(z,y)

Why ?

Page 34: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Monotone Queries

Dan Suciu -- 544, Winter 2011 34

Fact. Every datalog rule is monotone. Why ?

Fact. Every datalog program is monotone. Why ?

Fact. This query is not monotone: Q = {x | ∃y. Likes(x,y) ∧ ¬∃z.Frequents(x,z)∧Serves(z,y)

Recall the semantics: Q(D) = { s(x) | D ⊨ L1[s],…,D ⊨ Lk[s] } If D ⊨ Li[s] then D’ ⊨ Li[s] hence Q(D)⊆ Q(D’).

Why ?

Page 35: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Monotone Queries

Dan Suciu -- 544, Winter 2011 35

Fact. Every datalog rule is monotone. Why ?

Fact. Every datalog program is monotone. Why ?

Fact. This query is not monotone: Q = {x | ∃y. Likes(x,y) ∧ ¬∃z.Frequents(x,z)∧Serves(z,y)

Recall the semantics: Q(D) = { s(x) | D ⊨ L1[s],…,D ⊨ Lk[s] } If D ⊨ Li[s] then D’ ⊨ Li[s] hence Q(D)⊆ Q(D’).

By induction on the number of rules

Why ?

Page 36: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Monotone Queries

Dan Suciu -- 544, Winter 2011 36

Fact. Every datalog rule is monotone. Why ?

Fact. Every datalog program is monotone. Why ?

Fact. This query is not monotone: Q = {x | ∃y. Likes(x,y) ∧ ¬∃z.Frequents(x,z)∧Serves(z,y)

Recall the semantics: Q(D) = { s(x) | D ⊨ L1[s],…,D ⊨ Lk[s] } If D ⊨ Li[s] then D’ ⊨ Li[s] hence Q(D)⊆ Q(D’).

By induction on the number of rules

By adding a tuple to Frequents we may remove some answer

Why ?

Page 37: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Monotone Queries

Dan Suciu -- 544, Winter 2011 37

t ::= x | a /* variable or constant */

ϕ ::= R(t1, ..., tk) | ti = tj

| ϕ ∧ ϕ’ | ϕ ∨ ϕ’ | ¬ϕ’

| ∀x.ϕ | ∃x.ϕ What fragment of the relational calculus is monotone ?

Recall the relational calculus:

Page 38: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Monotone Queries

Dan Suciu -- 544, Winter 2011 38

t ::= x | a /* variable or constant */

ϕ ::= R(t1, ..., tk) | ti = tj

| ϕ ∧ ϕ’ | ϕ ∨ ϕ’ | ¬ϕ’

| ∀x.ϕ | ∃x.ϕ What fragment of the relational calculus is monotone ?

t ::= x | a /* variable or constant */

ϕ ::= R(t1, ..., tk) | ti = tj

| ϕ ∧ ϕ’ | ϕ ∨ ϕ’ | ∃x.ϕ

Recall the relational calculus:

Page 39: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Non-Recursive Datalog¬

Li = may also be ¬R(x1, x2, …)

•  R(x1, x2, …) = relational atom •  u = v, or u ≠ v, where u, v = variables or constants

Dan Suciu -- 544, Winter 2011 39

Q(x ) :- L1, L2, …, Lk

Page 40: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Non-Recursive Datalog¬

Dan Suciu -- 544, Winter 2011 40

Q = {x | ∃y. Likes(x,y) ∧ ¬∃z.Frequents(x,z)∧Serves(z,y)

Example:

Page 41: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Non-Recursive Datalog¬

Dan Suciu -- 544, Winter 2011 41

FS(x,y) :- Frequents(x,z),Serves(z,y) Q(x) :- Likes(x,y), ¬FS(x,y)

Q = {x | ∃y. Likes(x,y) ∧ ¬∃z.Frequents(x,z)∧Serves(z,y)

Example:

Page 42: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Rel. Calculus non-rec. Datalog¬

Dan Suciu -- 544, Winter 2011 42

Theorem. Every query in the Relational Calculus can be translated to non-recursive Datalog¬

Page 43: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Rel. Calculus non-rec. Datalog¬

Dan Suciu -- 544, Winter 2011 43

Proof

First remove ∀ by using De Morgan: ∀x.ϕ = ¬∃x.¬ϕ

Then, inductively, for each subformula ϕ create a datalog rule F(x):-body that computes Q = { x | ϕ }

Page 44: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Rel. Calculus non-rec. Datalog¬

Dan Suciu -- 544, Winter 2011 44

R(t1, ..., tk) F(x) :- R(t1, ..., tk)

Page 45: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Rel. Calculus non-rec. Datalog¬

Dan Suciu -- 544, Winter 2011 45

? ? ? ϕ1 ∧ ϕ2

F1(x1) :- … /* for ϕ1 */ F2(x2) :- … /* for ϕ2 */

Page 46: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Rel. Calculus non-rec. Datalog¬

Dan Suciu -- 544, Winter 2011 46

F(x) :- F1(x1),F2(x2) ϕ1 ∧ ϕ2

F1(x1) :- … /* for ϕ1 */ F2(x2) :- … /* for ϕ2 */

Page 47: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Rel. Calculus non-rec. Datalog¬

Dan Suciu -- 544, Winter 2011 47

ϕ1 ∨ ϕ2 ? ? ?

F1(x1) :- … /* for ϕ1 */ F2(x2) :- … /* for ϕ2 */

Page 48: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Rel. Calculus non-rec. Datalog¬

Dan Suciu -- 544, Winter 2011 48

ϕ1 ∨ ϕ2 F(x) :- F1(x1)

F(x) :- F2(x2)

F1(x1) :- … /* for ϕ1 */ F2(x2) :- … /* for ϕ2 */

Page 49: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Rel. Calculus non-rec. Datalog¬

Dan Suciu -- 544, Winter 2011 49

∃x.ϕ1 ? ? ?

F1(x) :- … /* for ϕ1 */

Page 50: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Rel. Calculus non-rec. Datalog¬

Dan Suciu -- 544, Winter 2011 50

∃x.ϕ1 F(y) :- F1(x)

F1(x) :- … /* for ϕ1 */

where y = x – {x}

Page 51: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Rel. Calculus non-rec. Datalog¬

Dan Suciu -- 544, Winter 2011 51

¬ϕ1 F(x) :- ¬F1(x)

F1(x) :- … /* for ϕ1 */

End of Proof

Page 52: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Unsafe Queries

Dan Suciu -- 544, Winter 2011 52

Q(x) :- x=y

Q(x) :- ¬R(x,y)

What’s wrong ?

Page 53: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Unsafe Queries

Dan Suciu -- 544, Winter 2011 53

Definition. A datalog¬ rule S :- L1, …, Lk is called safe, if every variable x appears in at least one positive relational atom.

Q(x) :- T(x), S(y), x=y

Q(x) :- T(x), S(y), ¬R(x,y)

Examples

We require that all rules of a non-recursive datalog¬ program to be safe.

Page 54: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Unsafe Queries

Dan Suciu -- 544, Winter 2011 54

But what about Relational Calculus ? What does it mean for a query Q to be safe ?

Q = {x | ∀y. Frequents(x,y)}

Q = {(x,y) | Faculty(x) ∨ Student(y)}

Are these queries safe ?

Page 55: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Unsafe Queries

Dan Suciu -- 544, Winter 2011 55

Definition. A query Q is domain independent, if for any two database instances D, D’, such that they have the same relations R1, …,Rk but possibly different domains D, D’, Q(D) = Q(D’).

A safe datalog¬ rule is domain independent (obvious). Does the converse hold ?

Note: domain independent queries are also called safe queries, somewhat confusingly.

Page 56: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Unsafe Queries

Dan Suciu -- 544, Winter 2011 56

Q = {x | ∀y. Frequents(x,y)}

Q = {(x,y) | Faculty(x) ∨ Student(y)}

Explain why these queries are not domain independent:

Page 57: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Unsafe Queries

57

Theorem. The problem: “given a relational query Q decide whether Q is safe” is undecidable.

Bummer !...

Why doesn’t the following work ? Translate Q to a datalog¬ program, then check that each rule is safe. Is this a decision procedure for domain independence ?

Page 58: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Active Domain Semantics

Dan Suciu -- 544, Winter 2011 58

Definition. The active domain ADom of a database instance D = (D, R1

D, …, RkD),

is the set of all constants in R1D, …, Rk

D

ADom(x) :- Frequents(x,y)

ADom(y) :- Frequents(x,y) ADom(x) :- Likes(x,y)

ADom(y) :- Likes(x,y) ADom(x) :- Serves(x,y)

ADom(y) :- Serves(x,y)

The active domain can be computed by a boring relational query:

Page 59: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Active Domain Semantics

Dan Suciu -- 544, Winter 2011 59

D ⊨ (∀x.ϕ) [s]

D ⊨ (∃x.ϕ) [s]

If for every constant a in ADom, D ⊨ ϕ[sa/x]

If for some constant a in ADom, D ⊨ ϕ[sa/x]

Make the following two changes to standard semantics:

Page 60: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Active Domain Semantics

Dan Suciu -- 544, Winter 2011 60

Theorem. If Q is domain-independent, then its standard semantics coincides with the active domain semantics

Why ?

Page 61: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Active Domain Semantics

Dan Suciu -- 544, Winter 2011 61

Theorem. If Q is domain-independent, then its standard semantics coincides with the active domain semantics

Why ?

Let D = (D, R1D, …, Rk

D), be a database instance. Denote D’ = (ADom, R1

D, …, RkD)

Since Q is d.i. we have Q(D) = Q(D’) Q(D’) is the same as the active-domain semantics on D

Page 62: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Rel. Calculus non-rec. Datalog¬

Dan Suciu -- 544, Winter 2011 62

¬ϕ1 F(x) :- ¬F1(x)

This rule is unsafe ! How do we make it safe ?

If Q is domain independent, then we want to translate it into a safe non-rec. datalog¬ program

Page 63: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Rel. Calculus non-rec. Datalog¬

Dan Suciu -- 544, Winter 2011 63

¬ϕ1

F(x) :- Adom(x1), Adom(x2),…, Adom(xk),¬F1(x)

Where x = (x1, x2, … xk)

If Q is domain independent, then we want to translate it into a safe non-rec. datalog¬ program

Page 64: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Discussion

•  Non-recursive datalog¬ is a simple, friendly, very popular notation for queries

•  Naturally compositional: sequence of rules •  Exposes a hierarchy of queries

– Single rule datalog = conjunctive queries – Non-rec. datalog program = monotone-RC – Non-rec. datalog¬ program = RC

•  No quantifiers ! Because all are ∃

Dan Suciu -- 544, Winter 2011 64

Page 65: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Why We Care

Dan Suciu -- 544, Winter 2011 65

Q(x) :- R(..), S(..), T(..)

Subqueries in the FROM clause

SQL is non-recursive datalog¬ !

SELECT x FROM R, S, T WHERE …

Q1(x) :- … Q2(y) :- … Q(z) :- Q1(x) , Q2(y)

Q(x) :- R(..), S(..), ¬T(..) Subquery in the WHERE clause

Page 66: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Why We Care •  When we use a database system we need to

write some complex queries – The Masochists: Find drinkers that frequent only

bars that server no beer that they like •  Write it first in Relational Calculus

– Easy exercise •  Translate it to Non-recursive datalog¬

– This is subtle, but we have seen it in detail •  Translate Non-recursive datalog¬ to SQL

– Easy exercise

Dan Suciu -- 544, Winter 2011 66

Page 67: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

67

Relational Algebra The Basic Five operators: •  Union: ∪ •  Difference: - •  Selection: σ •  Projection: Π •  Join: ⨝

Plus: cartesian product ×, renaming ρ

Dan Suciu -- CSEP544 Fall 2010 We will study RA in detail when we discuss Query Execution

Page 68: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Complex RA Expressions

Dan Suciu -- CSEP544 Fall 2010 68

Person x Purchase y Person z Product u

σname=fred σname=gizmo

Π pid Π ssn

y.seller-ssn=z.ssn

y.pid=u.pid

x.ssn=y.buyer-ssn

Π u.name

Page 69: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Equivalence Theorem

69

Theorem. Every domain independent query in RC can be translated to safe non-recursive Datalog¬

Done that.

Theorem. Every safe non-recursive Datalog¬ can be translated to Relational Algebra.

Easy exercise – to do at home

Theorem. Relational Algebra expressions can be translated to a domain independent RC query

Easy exercise – to do at home

Page 70: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Discussion •  What is the monotone fragment of RA ?

– 

•  What are the safe queries in RA ? – 

•  Where do we use RA (applications) ? –  – 

Dan Suciu -- 544, Winter 2011 70

Page 71: CSE 544 Relational Calculus - University of Washington · Announcements • Guest lecture on Tuesday, 1/18, by Bill Howe SQLShare: Smart Services for Ad Hoc Databases • Need to

Discussion •  What is the monotone fragment of RA ?

– No difference: ∪, σ, Π, ⨝

•  What are the safe queries in RA ? – All RA queries are safe

•  Where do we use RA (applications) ? – Translating SQL (WHAT HOW) – Some new query languages (Pig-Latin)

Dan Suciu -- 544, Winter 2011 71