Dr. A.I. Cristea, http://www.dcs.warwick.ac.uk/~acristea/, CS 319: Theory of Databases

Upload: alejandro-mcdowell

Post on 28-Mar-2015


Dr. A.I. Cristea

http://www.dcs.warwick.ac.uk/~acristea/

CS 319: Theory of Databases


… previous

Armstrong axioms


Content
1. Generalities DB
2. Integrity constraints (FD revisited)
3. LLJ, DP and applications
4. Relational Algebra (revisited)
5. Query optimisation
6. Temporal Data
7. The Askew Wall
8. Tuple calculus
9. Domain calculus
10. Query equivalence


Lossless Join Decomposition

• Lossless Join Definition:
  – Let {R1, R2} be a decomposition of R (meaning that R1 ∪ R2 = R); the decomposition is lossless if for every legal instance r of R:
      r = π_R1(r) ⋈ π_R2(r)
• What is wrong with the following decomposition?
  – R = {A,B,C} and F = {A → B, C → B} and we replace R by {R1, R2} where R1 = {A,B} and R2 = {C,B}.
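The question above can be explored on a concrete instance. The following is a minimal sketch, assuming a tiny hand-made instance r that satisfies both A → B and C → B; the helper names project and natural_join are illustrative and not part of the lecture material.

```python
def project(rows, attrs):
    """Projection: keep only attrs, dropping duplicate tuples."""
    return {tuple((a, row[a]) for a in attrs) for row in map(dict, rows)}

def natural_join(left, right):
    """Natural join of two sets of tuples of (attribute, value) pairs."""
    joined = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            # Tuples match when they agree on every shared attribute.
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                joined.add(tuple(sorted({**ld, **rd}.items())))
    return joined

# Assumed instance: A -> B and C -> B both hold (every tuple has B = 0).
r = {(("A", 1), ("B", 0), ("C", 1)),
     (("A", 2), ("B", 0), ("C", 2))}

r1 = project(r, ["A", "B"])      # instance of R1 = {A,B}
r2 = project(r, ["B", "C"])      # instance of R2 = {C,B}
rejoined = natural_join(r1, r2)
spurious = rejoined - r          # tuples that were never in r

print(len(r), len(rejoined), len(spurious))
```

On this instance the join returns tuples that were not in r, so the original relation cannot be recovered from the two projections.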


Sufficient Condition for Lossless Join

• Lossless Join means:
  – Let {R1, R2} be a decomposition of R (meaning that R1 ∪ R2 = R);
• Prove that for all legal instances r: r ⊆ π_R1(r) ⋈ π_R2(r)
• Prove that this decomposition is lossless if R1 ∩ R2 → R1 or R1 ∩ R2 → R2
  – Can you give an example of a lossless join decomposition (instance) when neither R1 ∩ R2 → R1 nor R1 ∩ R2 → R2 hold?
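The sufficient condition can be tested mechanically with an attribute closure. The sketch below assumes FDs are given as (left side, right side) pairs of Python sets; closure and satisfies_sufficient_condition are illustrative names, not from the slides.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def satisfies_sufficient_condition(r1, r2, fds):
    """True if R1 ∩ R2 -> R1 or R1 ∩ R2 -> R2 follows from fds."""
    reach = closure(r1 & r2, fds)
    return r1 <= reach or r2 <= reach

F1 = [({"A"}, {"B"}), ({"C"}, {"B"})]
F2 = [({"A"}, {"B"}), ({"B"}, {"C"})]

# R1 = {A,B}, R2 = {C,B} under F1: the common attribute B determines nothing.
print(satisfies_sufficient_condition({"A", "B"}, {"C", "B"}, F1))  # False
# R1 = {A,B}, R2 = {B,C} under F2: B -> C, so B determines R2.
print(satisfies_sufficient_condition({"A", "B"}, {"B", "C"}, F2))  # True
```

A result of False only says the condition fails; a particular instance may still happen to join losslessly, which is what the last exercise asks for.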


Boyce-Codd Normal Form (BCNF)

• A relation scheme R is in BCNF if (and only if) for every non-trivial fd X → Y ∈ F+, X is a superkey (for R).
• A database scheme D = {R1, ..., Rn} is in BCNF if (and only if) ∀ i ∈ {1,...,n}: Ri is in BCNF.
• Let R = {A,B,C} and F = {A → B, C → B} and let us decompose R into {R1, R2} where R1 = {A,B} and R2 = {C,B}. Is this decomposition in BCNF? Is this the “best” decomposition in BCNF? (Can you find a better one?)


BCNF Decomposition Algorithm

result := {R};
done := false;
compute F+;
while (not done) do
    if (there is a schema Ri in result that is not in BCNF)
    then begin
        let α → β be a nontrivial functional dependency that holds on Ri
            such that α → Ri is not in F+, and α ∩ β = ∅;
        result := (result – Ri) ∪ (Ri – β) ∪ (α, β);
    end
    else done := true;
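The loop above can be sketched in Python. This is one possible reading, not the lecturer's implementation: instead of computing F+, it finds a violating α → β on Ri by testing the closure of every proper subset α of Ri, which is exponential and only suitable for small schemas. The names closure, bcnf_violation and bcnf_decompose are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violation(schema, fds):
    """Return (alpha, beta) violating BCNF on schema, or None."""
    attrs = sorted(schema)
    for size in range(1, len(attrs)):
        for alpha in map(set, combinations(attrs, size)):
            reach = closure(alpha, fds)
            beta = (reach & schema) - alpha      # alpha ∩ beta = ∅ by construction
            if beta and not schema <= reach:     # nontrivial, alpha not a superkey of Ri
                return alpha, beta
    return None

def bcnf_decompose(schema, fds):
    result = [set(schema)]
    done = False
    while not done:
        for ri in result:
            violation = bcnf_violation(ri, fds)
            if violation:
                alpha, beta = violation
                result.remove(ri)
                result += [ri - beta, alpha | beta]
                break                            # rescan the updated result
        else:
            done = True
    return result

F = [({"A"}, {"B"}), ({"C"}, {"B"})]
print(bcnf_decompose({"A", "B", "C"}, F))
```

For R = {A,B,C} with F = {A → B, C → B} this sketch splits on A → B and yields the schemas {A,C} and {A,B}; a different scan order could split on C → B instead.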


Dependencies in a decomposition

• Which dependencies hold in R1 and R2?
  – R = {A,B,C} and F = {A → B, B → C} and we replace R by {R1, R2} where R1 = {A,B} and R2 = {B,C}.
  – R = {A,B,C} and F = {A → B, C → B} and we replace R by {R1, R2} where R1 = {A,B} and R2 = {A,C}.
  – R = {A,B,C} and F = {A → B, B → C} and we replace R by {R1, R2} where R1 = {A,B} and R2 = {A,C}.
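One way to answer these three exercises is to project F onto each sub-schema using attribute closures. The sketch below is illustrative (project_fds is not a name from the slides); it lists, for every non-empty X ⊆ Ri, the attributes of Ri that X determines beyond itself, so the output is not a minimal cover.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def project_fds(schema, fds):
    """Map each X ⊆ schema to (X+ ∩ schema) - X, when that set is non-empty."""
    projected = {}
    attrs = sorted(schema)
    for size in range(1, len(attrs) + 1):
        for lhs in combinations(attrs, size):
            rhs = (closure(lhs, fds) & schema) - set(lhs)
            if rhs:
                projected["".join(lhs)] = "".join(sorted(rhs))
    return projected

chain = [({"A"}, {"B"}), ({"B"}, {"C"})]   # F = {A -> B, B -> C}
fork = [({"A"}, {"B"}), ({"C"}, {"B"})]    # F = {A -> B, C -> B}

case1 = (project_fds({"A", "B"}, chain), project_fds({"B", "C"}, chain))
case2 = (project_fds({"A", "B"}, fork), project_fds({"A", "C"}, fork))
case3 = (project_fds({"A", "B"}, chain), project_fds({"A", "C"}, chain))
print(case1, case2, case3)
```

In the second and third cases one of the original dependencies (C → B, respectively B → C) cannot be checked on either sub-schema alone.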


Third Normal Form (3NF)

• Third Normal Form
  – Informal Presentation
  – Example and Discussion
  – Formal Definition
• 3NF Decomposition Algorithm
  – Principle and Properties
    • Lossless-join, dependency-preserving decomposition into 3NF
  – Proof of Correctness
  – Example of 3NF Decomposition
• Third Normal Form and Boyce-Codd Normal Form


Informal Presentation

• Motivation
  – There are some situations where
    • BCNF decomposition is not dependency preserving, and
    • Efficient checking for FD violation on updates is important
  – Solution
    • Define a weaker normal form, called Third Normal Form
      – FDs can be checked on individual relations without computing a join
      – There is always a lossless-join, dependency-preserving decomposition into 3NF


Informal Presentation

• Motivation
  – Sometimes a relational schema and its FDs are not in BCNF but one does not want to decompose it further
  – Example:
    • Relation Bookings with attributes:
      – title, the name of the performance
      – theater, the name of the theater where the performance is being shown
      – city, where the theater is located
    • FDs are: theater → city; title, city → theater
    • Is there a BCNF violation?

Bookings(title, theater, city)
theater → city
title city → theater

Yes: (theater → city), because theater is not a superkey
Note: keys here are: (title, city) and (theater, title)

BCNF Decomposition:
Bookings1(theater, city)
Bookings2(title, theater)
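The keys quoted for Bookings can be checked by brute force. The following is a small sketch under the assumption that FDs are given as pairs of Python sets; candidate_keys is an illustrative helper that tries attribute subsets in order of increasing size.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure covers schema."""
    keys = []
    attrs = sorted(schema)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue                      # contains a smaller key: not minimal
            if schema <= closure(candidate, fds):
                keys.append(candidate)
    return keys

bookings = {"title", "theater", "city"}
fds = [({"theater"}, {"city"}), ({"title", "city"}, {"theater"})]

keys = candidate_keys(bookings, fds)
theater_is_superkey = bookings <= closure({"theater"}, fds)
print(keys, theater_is_superkey)
```

The sketch finds the two keys (title, city) and (theater, title), and confirms that theater alone is not a superkey, which is why theater → city violates BCNF.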


Informal Presentation

• Motivation
  – Decomposition to get to BCNF may not always be desirable
    • BCNF decomposition is not dependency preserving, and
    • Efficient checking for FD violation on updates is important
  – 3NF relaxes BCNF to allow relations that cannot be decomposed into BCNF relations without losing the ability to check each FD
• Informal Definition of 3NF
  – A relation R is in third normal form if:
      Whenever A → B is a nontrivial FD:
      either A is a superkey (as for BCNF) or B is a member of some candidate key

Bookings(title, theater, city)
theater → city
title city → theater


Informal Presentation

• Informal Definition of 3NF
  – A relation R is in third normal form if:
      Whenever A → B is a nontrivial FD:
      either A is a superkey or B is a member of some candidate key
  – The difference between BCNF and 3NF:
    • “B is a member of some candidate key”
  – Previous example schema is in 3NF
    • Candidate keys here are: (title, city) and (theater, title)
    • Theater is not a superkey but city is a member of a candidate key
    • What is the problem with this schema?

Bookings(title, theater, city)
theater → city
title city → theater


Informal Presentation

• Informal Definition of 3NF
  – Previous example schema is in 3NF
    • What is the problem with this schema?
      – The schema contains redundant information

Bookings(title, theater, city)
theater → city
title city → theater

Title                      Theater       City
Cats                       Broadway      New York
Phantom of the Opera       Imperial      London
Cats                       New Theater   London
Beethoven’s 5th Symphony   Imperial      London


Formal Definition 3NF

• Definition
  – A relation schema R is in third normal form (3NF) if
    • for all functional dependencies in F+ of the form α → β, where α ⊆ R and β ⊆ R, at least one of the following holds:
      – α → β is a trivial functional dependency (β ⊆ α)
      – α contains a key for R
      – every B in β – α is part of some candidate key of R
      (the first two are the BCNF conditions)
• BCNF and 3NF
  – A BCNF relation is in 3NF
  – A 3NF relation is not necessarily in BCNF
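The three conditions translate directly into a checker. The sketch below relies on the standard shortcut of testing only the FDs listed in F rather than all of F+, which is enough when testing the original schema R; the function names are illustrative, and candidate keys are found by brute force.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure covers schema."""
    keys = []
    attrs = sorted(schema)
    for size in range(1, len(attrs) + 1):
        for combo in map(set, combinations(attrs, size)):
            if not any(k <= combo for k in keys) and schema <= closure(combo, fds):
                keys.append(combo)
    return keys

def is_bcnf(schema, fds):
    """Every FD in fds is trivial or has a superkey on its left side."""
    return all(b <= a or schema <= closure(a, fds) for a, b in fds)

def is_3nf(schema, fds):
    """Like is_bcnf, but also accepts FDs whose new attributes are all prime."""
    prime = set().union(*candidate_keys(schema, fds))
    return all(b <= a or schema <= closure(a, fds) or (b - a) <= prime
               for a, b in fds)

bookings = {"title", "theater", "city"}
booking_fds = [({"theater"}, {"city"}), ({"title", "city"}, {"theater"})]
chain_fds = [({"A"}, {"B"}), ({"B"}, {"C"})]

print(is_bcnf(bookings, booking_fds), is_3nf(bookings, booking_fds))  # False True
print(is_3nf({"A", "B", "C"}, chain_fds))                             # False
```

Bookings passes only because of the third condition (city is part of the key (title, city)), whereas in the chain A → B → C the attribute C belongs to no candidate key.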


Formal Presentation

• Example
  – Consider the two relational schemas
    • R1 = (cust-num, name, house-num, street, city, state)
        cust-num → name, house-num, street, city, state
    • R2 = (house-num, street, city, state, zip)
        house-num, street, city, state → zip
        zip → state
  – Are these relations in 3NF?


Formal Presentation

• Example in 3NF?
  – For R1
    • The only nontrivial functional dependencies in F+ are those with cust-num as a member of the left side of the FD
    • As cust-num is a superkey of R1, these functional dependencies satisfy the second condition for 3NF

R1 = (cust-num, name, house-num, street, city, state)
cust-num → name, house-num, street, city, state

Three conditions for 3NF:
• α → β is a trivial functional dependency (β ⊆ α)
• α contains a key for R
• Every B in β – α is part of some candidate key of R


Formal Presentation

• Example in 3NF?
  – For R2
    • There are two kinds of nontrivial functional dependencies in F+:
      – Those with (house-num, street, city, state) as a subset of the left-hand side of the FD: as (house-num, street, city, state) is a superkey for R2, these functional dependencies satisfy the second condition for 3NF
      – Those of the form α ∪ {zip} → β ∪ {state}, where β ⊆ α ∪ {zip}. For any such functional dependency:
          (β ∪ {state}) – (α ∪ {zip}) = {state} (or = ∅)
        Because state is part of a candidate key of R2, such functional dependencies satisfy the third condition for 3NF

R2 = (house-num, street, city, state, zip)
house-num, street, city, state → zip
zip → state

Three conditions for 3NF:
• α → β is a trivial functional dependency
• α contains a key for R
• Every B in β – α is part of some candidate key of R


Decomposition into 3NF

• Principles
  – Input/Output
    • Input
      – A set of functional dependencies F
      – A relation schema R
    • Output
      – A lossless-join, dependency-preserving decomposition in 3NF
  – Canonical Cover
    • The set of dependencies Fc in the algorithm is a canonical cover of the functional dependencies


Fc definition

A canonical cover Fc for F is a set of dependencies for which:

• Fc <=> F
• no fd in Fc is superfluous
• no fd in Fc contains extraneous attrs
• each left side of fd in Fc is unique


Extraneous attribute A in α→β in R

• A ∈ α; F => (F – {α→β}) ∪ {(α – A)→β}
• A ∈ β; (F – {α→β}) ∪ {α→(β – A)} => F

• Computed via attribute closures
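Both tests reduce to one attribute closure each. The sketch below uses the university example that appears later in these slides (F = {L → I, TR → L, TI → R, LS → G, TS → R, TRI → LR}); the helper names fd, extraneous_in_lhs and extraneous_in_rhs are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def fd(lhs, rhs):
    """Build an FD from two strings of single-letter attributes."""
    return (frozenset(lhs), frozenset(rhs))

def extraneous_in_lhs(attr, alpha, beta, fds):
    """A ∈ α is extraneous if F already implies (α - A) -> β."""
    return beta <= closure(alpha - {attr}, fds)

def extraneous_in_rhs(attr, alpha, beta, fds):
    """A ∈ β is extraneous if F with α -> β weakened to α -> (β - A) still implies α -> A."""
    weakened = [f for f in fds if f != (alpha, beta)] + [(alpha, beta - {attr})]
    return attr in closure(alpha, weakened)

F = [fd("L", "I"), fd("TR", "L"), fd("TI", "R"),
     fd("LS", "G"), fd("TS", "R"), fd("TRI", "LR")]
alpha, beta = fd("TRI", "LR")

print(extraneous_in_lhs("I", alpha, beta, F))   # TR alone already gives L and R
print(extraneous_in_lhs("T", alpha, beta, F))   # RI alone determines nothing new
print(extraneous_in_rhs("L", alpha, beta, F))   # L still follows via TR -> L
```

Note that the left-side test computes the closure under the unmodified F, while the right-side test must first weaken the dependency being examined.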


Fc computation algorithm

Fc = F

Repeat
    apply union rule (right side of fd)
    find fd with extraneous attrs (left/right side) and delete these attrs
Until Fc doesn’t change
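The loop above might be sketched as follows. This is an illustrative version, not the lecturer's code: it removes one extraneous attribute per pass, re-applies the union rule, and drops any dependency whose right side becomes empty. Since a set of FDs can have several canonical covers, the result depends on the scan order chosen here.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def fd(lhs, rhs):
    """Build an FD from two strings of single-letter attributes."""
    return (frozenset(lhs), frozenset(rhs))

def remove_one_extraneous(fc):
    """Return fc with one extraneous attribute removed, or None if there is none."""
    for i, (a, b) in enumerate(fc):
        for attr in sorted(a):                       # left side
            if b <= closure(a - {attr}, fc):
                return fc[:i] + [(a - {attr}, b)] + fc[i + 1:]
        for attr in sorted(b):                       # right side
            weakened = fc[:i] + [(a, b - {attr})] + fc[i + 1:]
            if attr in closure(a, weakened):
                return weakened
    return None

def canonical_cover(fds):
    fc = [(frozenset(a), frozenset(b)) for a, b in fds]
    while True:
        merged = {}                                  # union rule: one FD per left side
        for a, b in fc:
            merged[a] = merged.get(a, frozenset()) | b
        fc = [(a, b) for a, b in merged.items() if b]
        reduced = remove_one_extraneous(fc)
        if reduced is None:
            return set(fc)
        fc = reduced

F = [fd("L", "I"), fd("TR", "L"), fd("TI", "R"),
     fd("LS", "G"), fd("TS", "R"), fd("TRI", "LR")]
Fc = canonical_cover(F)
print(sorted(("".join(sorted(a)), "".join(sorted(b))) for a, b in Fc))
```

On the university example used later in these slides, the dependency TRI → LR is first reduced to TR → LR, merged with TR → L, and then loses the trivial attribute R, leaving the five dependencies of Fc.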


Decomposition into 3NF

• Principles
  – The algorithm takes a set of dependencies and adds one schema at a time, instead of decomposing the initial schema repeatedly
  – The result is not uniquely defined since
    • A set of functional dependencies can have more than one canonical cover
    • In some cases, the result of the algorithm depends on the order in which it considers the dependencies in Fc (minor bug in the algorithm, see later)


Decomposition into 3NF

• Decomposition
  – Given: relation R, set F of functional dependencies
  – Find: decomposition of R into a set of 3NF relations Ri
  – Algorithm (sketch, real algorithm on next slides):
      (1) Eliminate redundant fd, resulting in a canonical cover Fc of F
      (2) Create a relation Ri = XY for each FD X → Y in Fc
      (3) If the key K of R does not occur in any relation Ri, create one more relation Ri = K
  – Decomposition produces a lossless join and preserves dependencies
  – Prove!


Decomposition Algorithm into 3NF

Let Fc be the canonical cover of F;
j = 0;
for each dependency α → β in Fc
    if none of the schemes in Ri (i = 1, 2, …, j) contains αβ then
        j = j + 1;
        Rj = αβ;
    end-if
    if any of the schemes in Ri (i = 1, 2, …, j-1) is contained in Rj then
        remove Ri
    end-if
end-for
if none of the schemes Ri (i = 1, 2, …, j) contains a candidate key for R then
    j = j + 1;
    Rj = any candidate key for R;
end-if
return (R1, R2, …, Rj)
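A compact Python rendering of this synthesis algorithm is sketched below, assuming the canonical cover Fc has already been computed. It is slightly restructured: contained schemas are removed only at the moment a new schema is added. "Contains a candidate key" is tested as "is a superkey of R", and find_candidate_key (an illustrative helper) shrinks R to one key by dropping attributes greedily.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_candidate_key(schema, fds):
    """Shrink the full schema to one candidate key."""
    key = set(schema)
    for attr in sorted(schema):
        if schema <= closure(key - {attr}, fds):
            key.discard(attr)
    return key

def synthesize_3nf(schema, fc):
    result = []
    for alpha, beta in fc:
        rj = set(alpha) | set(beta)
        if not any(rj <= ri for ri in result):
            # Drop earlier schemas that the new one contains, then add it.
            result = [ri for ri in result if not ri <= rj]
            result.append(rj)
    # A schema contains a candidate key exactly when it is a superkey of R.
    if not any(schema <= closure(ri, fc) for ri in result):
        result.append(find_candidate_key(schema, fc))
    return result

# Canonical cover of the university example from the following slides.
Fc = [(set("L"), set("I")), (set("TR"), set("L")), (set("TI"), set("R")),
      (set("TS"), set("R")), (set("LS"), set("G"))]
university = synthesize_3nf(set("LITRSG"), Fc)

# Small case where the key step fires: R = {A,B,C}, Fc = {A -> B}.
small = synthesize_3nf(set("ABC"), [(set("A"), set("B"))])
print(university, small)
```

For the university schema no extra relation is needed because (S, T, R) already contains the key ST; in the small case the key {A, C} has to be added as its own relation.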


Decomposition into 3NF

• Example
  – Semester database of a university
  – Relational schema R = (L, I, T, R, S, G)
  – Attributes
    • L: Lecture       R: Room
    • I: Instructor    S: Student
    • G: Grade         T: Time
  – Functional Dependencies
    • L → I, TR → L, TI → R, LS → G, TS → R, TRI → LR


Decomposition into 3NF

• Example
  – R = (L, I, T, R, S, G)
  – F: {L → I, TR → L, TI → R, LS → G, TS → R, TRI → LR}
  – Are all FDs necessary? No!
    • TR → L, TI → R then TRI → LR
  – Canonical cover of F
    • Fc = {L → I, TR → L, TI → R, TS → R, LS → G}
  – Key: (ST)
  – Key attributes: S, T

(1) Eliminate redundant FD, resulting in a canonical cover Fc of F
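The two claims on this slide (TRI → LR is redundant, and ST is a key) can be verified with attribute closures. A minimal sketch, with closure and fd as illustrative helper names:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def fd(lhs, rhs):
    """Build an FD from two strings of single-letter attributes."""
    return (frozenset(lhs), frozenset(rhs))

schema = set("LITRSG")
F = [fd("L", "I"), fd("TR", "L"), fd("TI", "R"),
     fd("LS", "G"), fd("TS", "R"), fd("TRI", "LR")]

# TRI -> LR is redundant if LR ⊆ (TRI)+ computed without it.
others = [f for f in F if f != fd("TRI", "LR")]
tri_lr_redundant = set("LR") <= closure("TRI", others)

# ST is a key if (ST)+ = R while neither S+ nor T+ covers R.
st_is_superkey = closure("ST", F) == schema
st_is_minimal = closure("S", F) != schema and closure("T", F) != schema

print(tri_lr_redundant, st_is_superkey, st_is_minimal)
```

The closure of ST grows as S, T, then R (TS → R), L (TR → L), I (L → I) and G (LS → G), which covers all six attributes.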


Decomposition into 3NF

• Example
  – R = (L, I, T, R, S, G)
  – Fc = {L → I, TR → L, TI → R, TS → R, LS → G}
  – Key attributes: S, T
  – Decomposition in 3NF
    • R1 = (L, I)       R2 = (T, R, L)
    • R3 = (T, I, R)    R4 = (L, S, G)
    • R5 = (S, T, R)

(2) Create a relation Ri = XA for each FD X → A in Fc
(3) If the key K of R does not occur in any relation Ri, create one more relation Ri = K. Here the key ST already occurs in R5, so no extra relation is needed.


Decomposition into 3NF

• 3NF Decomposition Algorithm
  – Proof of Correctness

The 3NF decomposition algorithm produces a lossless-join, dependency-preserving decomposition into 3NF. Three properties have to be shown:
1. Dependency preserving
2. Lossless join
3. 3NF


Proof: Decomposition into 3NF is dependency preserving

• 3NF Decomposition Algorithm

  – Decomposition is dependency preserving
    • The 3NF decomposition algorithm is dependency preserving since there is a relation for every FD in Fc.


Proof: Decomposition into 3NF is a lossless join

• 3NF Decomposition Algorithm

  – Decomposition is lossless join
    • Lossless join decomposition
      – A decomposition {R1, R2} is a lossless-join decomposition if R1 ∩ R2 → R1 or R1 ∩ R2 → R2
    • Idea:
      – A candidate key (K) is in one of the relations Ri in the decomposition (the last step of the algorithm guarantees this)
      – The closure of the candidate key under Fc must contain all attributes in R (definition of candidate key)
      – Follow the steps of the attribute closure algorithm (Fig. 7.9) to show that the sufficient lossless join condition is satisfied for K+.


Proof: Decomposition into 3NF is actually 3NF!

• 3NF Decomposition Algorithm
  – Decomposition into 3NF
    • Claim
      – If a relation Ri is in the decomposition generated by the synthesis algorithm, then Ri is in 3NF
    • Idea
      – To test for 3NF, it is sufficient to consider the functional dependencies whose right-hand side is a single attribute
      – Therefore, to see that Ri is in 3NF, we must show that any functional dependency that holds in Ri satisfies the definition of 3NF


Proof: Decomposition into 3NF is actually 3NF!

• 3NF Decomposition Algorithm
  – Decomposition into 3NF
    • Demonstration
      – Assume α → β is the dependency that generated Ri in the algorithm
      – Let γ → B be any nontrivial functional dependency that holds in Ri
      – B must be in α or β, since B is in Ri and α → β generated Ri
      – Let us consider two possible cases
        » B is in β but not α
        » B is in α but not β


Proof: 3NF Decomposition is 3NF!

• 3NF Decomposition Algorithm
  – Decomposition into 3NF
    • Demonstration
      – B is in β but not in α: the left-hand side γ of the FD γ → B must be a superkey (why?)
        » The second condition of 3NF is satisfied
      – B is in α but not in β: α is a candidate key
        » The third alternative in the definition of 3NF is satisfied
        » Note: we cannot show that γ is a superkey. This shows exactly why the third alternative is present in the definition of 3NF

Three conditions for 3NF:
• α → β is a trivial functional dependency
• α contains a key for R
• Every B in β – α is part of some candidate key of R


Decomposition into 3NF

• B is in β: Assume γ is not a superkey
  Then α must contain some attribute not in γ
  1. Since γ → B is in F+, it must be derivable from Fc by using attribute closure on γ
  2. Attribute closure cannot have used α → β: if it had been used, α must be contained in the attribute closure of γ, which is not possible, since we assumed γ is not a superkey
  3. Now, using α → (β – {B}) and γ → B, we can derive α → B (since γ ⊆ αβ, and B ∉ γ since γ → B is non-trivial)
  4. Then, B is extraneous in the right-hand side of α → β, which is not possible since α → β is in Fc (contradiction!)
  5. Thus, if B is in β then γ must be a superkey

γ → B: FD that holds in Ri
α → β: FD that was used to generate Ri


Comparison of BCNF and 3NF

• BCNF or 3NF?
  – Relations in BCNF and 3NF
    • Relations in BCNF: no repetition of information
    • Relations in 3NF: problem of repetition of information
  – Decomposition in BCNF and in 3NF
    • It is always possible to decompose a relation into relations in 3NF and
      – the decomposition is lossless
      – dependencies are preserved
    • It is always possible to decompose a relation into relations in BCNF and
      – the decomposition is lossless
      – the information is not repeated


Compare BCNF and 3NF

• To summarize
  – Design Goals
    • Goal for a relational database design is:
      – BCNF (no redundant information)
      – Lossless join
      – Dependency preservation
    • If we cannot achieve this, we accept:
      – 3NF (possible repetition of information)
      – Lossless join
      – Dependency preservation


Summary

• We have learned:
  – LLJ
  – DP
  – BCNF + algorithm
  – 3rd NF + algorithm


… to follow

Relational Algebra, revisited


Questions?