cse544

30
1 CSE544 Wednesday, March 29, 2006

Upload: mariko

Post on 06-Feb-2016

28 views

Category:

Documents


0 download

DESCRIPTION

CSE544. Wednesday, March 29, 2006. Theory. Relational databases invented by a theoretician (Codd) Fundamental principle: separate the WHAT from the HOW - data independence WHAT: First Order Logic (FO) HOW: Relational algebra (RA). FO Syntax. Given: A vocabulary: R 1 , …, R k - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: CSE544

1

CSE544

Wednesday, March 29, 2006

Page 2: CSE544

2

Theory

• Relational databases invented by a theoretician (Codd)

• Fundamental principle: separate the WHAT from the HOW - data independence

• WHAT: First Order Logic (FO)• HOW: Relational algebra (RA)

Page 3: CSE544

3

Given:

• A vocabulary: R1, …, Rk

• An arity, ar(Ri), for each i=1,…,k

• An infinite supply of variables x1, x2, x3, …

• Constants: c1, c2, c3, ...

FO Syntax

Page 4: CSE544

4

• Terms (t) and FO formulas () are:

FO Syntax

t ::= x | c

::= R(t1, ..., tar(R)) | ti = tj

| ’ | ’ | ’

| x. | x.

t ::= x | c

::= R(t1, ..., tar(R)) | ti = tj

| ’ | ’ | ’

| x. | x.

Page 5: CSE544

5

FO Examples

1

2

4

3

Most interesting case:Vocabulary = one binary relation R (encodes a graph)

1 2

2 1

2 3

1 4

3 4

R=

Page 6: CSE544

6

FO Sentences

• Does there exists a loop in the graph ?

• Are there paths of length >2 ?

• Is there a “sink” node ?

x.R(x,x) x.R(x,x)

x.y.z.u.(R(x,y) R(y,z) R(z,u)) x.y.z.u.(R(x,y) R(y,z) R(z,u))

x.y.R(x,y) x.y.R(x,y)

Page 7: CSE544

7

Semantics

• Given a vocabulary R1, …, Rk

• A model is D = (D, R1D, …, Rk

D)

– D = a set, called domain, or universe

– RiD D D ... D, (ar(Ri) times)

i = 1,...,k

Page 8: CSE544

8

Semantics

• Given:– A model D = (D, R1

D, ..., RkD)

– A formula

– A substitution s : {x1, x2, ...} D

• We define next the relation:

meaning “D satisfies with s”

D |= [s]D |= [s]

Page 9: CSE544

9

Semantics

D |= (R(t1, ..., tn)) [s]D |= (R(t1, ..., tn)) [s]

D |= (t= t’) [s]D |= (t= t’) [s]

If (s(t1), ..., s(tn)) RD

If s(t) = s(t’)

Page 10: CSE544

10

Semantics

D |= ( ’) [s]D |= ( ’) [s]

D |= ( ’) [s]D |= ( ’) [s]

D |= (’) [s]D |= (’) [s]

If D |= ()[s] and D |= (’) [s]

If D |= ()[s] or D |= (’) [s]

If not D |= ()[s]

Page 11: CSE544

11

First Order Logic: Semantics

D |= (x.) [s]D |= (x.) [s]

D |= (x.) [s]D |= (x.) [s]

If for all s’ s.t. s(y) = s’(y) for allvariables y other than x, D |= ()[s’]

If for some s’ s.t. s(y) = s’(y) for allvariables y other than x, D |= ()[s’]

Page 12: CSE544

12

FO and Databases

• FO:a sentence is true in D if D |=

• Databases:a formula with free variables x1, ..., xn defines the query:

(D) = {(s(x1), ..., s(xn)) | D |= [s]}

Page 13: CSE544

13

FO Queries

• Find all nodes connected by a path of length 2:

• Find all nodes without outgoing edges:

(x,y) u.(R(x,u) R(u,y))(x,y) u.(R(x,u) R(u,y))

(x) u.(R(u,x) y.R(x,y)(x) u.(R(u,x) y.R(x,y)

These are open formulas

Page 14: CSE544

14

In Class

• Retrieve all nodes with at least two children

• A node x is more important than y if every child of y is also a child of x. Retrieve all ‘most important nodes’ in the graph

Page 15: CSE544

15

FO in Databases

FO Databases

Vocabulary:

R1, ..., Rn

Database schema:

R1, ..., Rn

Model:D = (D, R1

D, …, RkD)

Database instance:D = (D, R1

D, …, RkD)

Sentences are

true or falseFormulas compute

queries

Page 16: CSE544

16

FO Semantics

• In FO we express WHAT we want

• Sometimes it’s even unclear HOW to get it

• See accompanying slides on FO semantics– They explain HOW to get it, but it’s impractical

Page 17: CSE544

17

Relational Algebra

• An algebra over relations

• Five operators:, -, , ,

• Meaning:R1 R2 = set union

R1 - R2 = set difference

R1 R2 = cartesian product

c(R) = subset of tuples satisfying condition c

a(R) = projection on the attributes in a

R1 R2 = set union

R1 - R2 = set difference

R1 R2 = cartesian product

c(R) = subset of tuples satisfying condition c

a(R) = projection on the attributes in a

Page 18: CSE544

18

FO RA

(x) R(x,x)(x) R(x,x) 1(1=2(R))1(1=2(R))

(x,y) z.u.(R(x,z) R(z,u) R(u,y))

(x,y) z.u.(R(x,z) R(z,u) R(u,y)) 16(2=34=5 (R R R))16(2=34=5 (R R R))

(x) y.R(x,y)(x) y.R(x,y) ?

16 ((R join2=1 R) join4=1 R))16 ((R join2=1 R) join4=1 R))

WHAT HOW

Page 19: CSE544

19

FO v.s. RA

Theorem. Every query in RA can be expressed in FO

Proof

This shows how to go from HOW to WHATnot very interesting

What about the converse ?

Page 20: CSE544

20

The Drinkers/Beers Example

• Vocabulary:

• Find all drinkers that frequent some bar that serve some beer that they like:

Likes(drinker,beer), Serves(bar,beer), Frequents(drinker,bar)Likes(drinker,beer), Serves(bar,beer), Frequents(drinker,bar)

(d) ba. be.(F(d,ba) L(d,be))(d) ba. be.(F(d,ba) L(d,be))

Page 21: CSE544

21

Lots of Fun Examples (in class)

• Find drinkers that frequent some bar that serves only beer they like

• Find drinkers that frequent only bars that serve some beer they like

• Find drinkers that frequent only bars that serve only beer they like

Page 22: CSE544

22

Unsafe FO Queries

• Find all nodes that are not in the graph:

what’s wrong ?

x)z.R(z,y)y.R(x,q(x) ¬∃∧¬∃=

Page 23: CSE544

23

Unsafe FO Queries

• Find all nodes that are connected to “everything”:

what’s wrong ?

y)y.R(x,q(x) ∀=

Page 24: CSE544

24

Unsafe FO Queries

• Find all pairs of employees or offices:

what’s wrong ?

• We don’t want such queries !

Office(y)Emp(x)y)q(x, ∨=

Page 25: CSE544

25

Safe Queries

A model D = (D, R1D, …, Rk

D)

• In FO:– both D and R1

D, …, RkD may be infinite

• In databases:– D may infinite (int, string, etc)– R1

D, …, RkD are always finite

– We call this a finite model

Page 26: CSE544

26

Safe Queries

is a finite query if for every finite model D, (D) is finite

is safe, or domain independent, if for every two models D, D’ having the same relations:

D = (D, R1D, …, Rk

D), D’ = (D’, R1D, …, Rk

D)

we have (D) = (D’)

• If is safe then it is also finite (why ?)

• Note: book has different but equivalent definition

Page 27: CSE544

27

Safe Queries

• Definition. Given D = (D, R1D, …, Rk

D), the active domain is Da = the set of all constants in R1

D, …, RkD

• Example. Given a graph D = (D, R)Da = { x | y.R(x,y) z.R(z,x)}

• Property. If a query is safe, it suffices to range quantifiers only over the active domain (why ?)

• Hence we can compute safe queries

Page 28: CSE544

28

Safe Queries

• The safe relational calculus consists only of safe queries. However:

• Theorem It is undecidable if a given a FO query is safe.

• Need to write only safe queries, but how do we know how which queries are safe ?

• Work around: write them in an obviously safe way– Range restricted queries - formally defined in [AHU]

Page 29: CSE544

29

FO v.s. RA

Theorem. Every safe query in FO can be expressed in RA

Proof

From WHAT to HOWthis is really interesting and motivated the relational model

Page 30: CSE544

30

Limited Expressive Power

• Vocabulary: binary relation R

• The following queries cannot be expressed in FO:

• Transitive closure: x.y. there exists x1, ..., xn s.t.

R(x,x1) R(x1,x2) ... R(xn-1,xn) R(xn,y)

• Parity: the number of edges in R is even