databases -normalization iteaching.csse.uwa.edu.au/units/cits2232/lectures/db-norm1.pdf · for...

24
Databases -Normalization I (GF Royle, N Spadaccini 2006-2010) Databases - Normalization I 1 / 24

Upload: trinhhanh

Post on 05-Jul-2018

221 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Databases -Normalization Iteaching.csse.uwa.edu.au/units/CITS2232/lectures/db-norm1.pdf · For example, consider the sort of ... Normalization I 12 / 24. Decomposition Lossless-join

Databases -Normalization I

(GF Royle, N Spadaccini 2006-2010) Databases - Normalization I 1 / 24

Page 2: Databases -Normalization Iteaching.csse.uwa.edu.au/units/CITS2232/lectures/db-norm1.pdf · For example, consider the sort of ... Normalization I 12 / 24. Decomposition Lossless-join

This lecture

This lecture introduces normal forms, decomposition andnormalization.

We will explore problems that arise from poorly designed schema, andintroduce "decomposition".

(GF Royle, N Spadaccini 2006-2010) Databases - Normalization I 2 / 24

Page 3: Databases -Normalization Iteaching.csse.uwa.edu.au/units/CITS2232/lectures/db-norm1.pdf · For example, consider the sort of ... Normalization I 12 / 24. Decomposition Lossless-join

Redundancy

Redundancy

One of the main reasons for using relational tables for data is to avoidthe problems caused by redundant storage of data.For example, consider the sort of general information that is storedabout a student:

Student NumberNameAddressDate of Birth

A number of different parts of the university may keep differentadditional items of data regarding students, such as grades, financialinformation and so on.

(GF Royle, N Spadaccini 2006-2010) Databases - Normalization I 3 / 24

Page 4: Databases -Normalization Iteaching.csse.uwa.edu.au/units/CITS2232/lectures/db-norm1.pdf · For example, consider the sort of ... Normalization I 12 / 24. Decomposition Lossless-join

Redundancy

Repeating Data

Suppose that the marks are kept in the following format:

Student Number Name Unit Code Mark14058428 John Smith CITS 3240 7214058428 John Smith CITS 1200 6814058428 John Smith CITS 2200 7715712381 Jill Tan CITS 1200 8815712381 Jill Tan CITS 2200 82

Then this table contains redundant data, because the student’s nameis needlessly repeated.

If the financial system also stores student numbers and names, thenthere is redundancy between tables as well as within tables.

(GF Royle, N Spadaccini 2006-2010) Databases - Normalization I 4 / 24

Page 5: Databases -Normalization Iteaching.csse.uwa.edu.au/units/CITS2232/lectures/db-norm1.pdf · For example, consider the sort of ... Normalization I 12 / 24. Decomposition Lossless-join

Redundancy

Problems with redundancy

Apart from unnecessary storage, redundancy leads to some moresignificant problems:

Update AnomaliesIf one copy of a data item is updated — for example, a studentchanges his or her name, then the database becomesinconsistent unless every copy is updated.Insertion AnomaliesA new data item, for example a new mark for a student, cannot beentered without adding some other, potentially unnecessary,information such as the student’s name.Deletion AnomaliesIt may not be possible to delete some data without losing other,unrelated data, as well (an example is on the next slide).

(GF Royle, N Spadaccini 2006-2010) Databases - Normalization I 5 / 24

Page 6: Databases -Normalization Iteaching.csse.uwa.edu.au/units/CITS2232/lectures/db-norm1.pdf · For example, consider the sort of ... Normalization I 12 / 24. Decomposition Lossless-join

Redundancy

Deletion AnomaliesA deletion anomaly occurs when a table storing redundant informationbecomes a proxy for storing that information properly.

For example, suppose that a company pays fixed hourly ratesaccording to the level of an employee.

Name Level RateSmith 10 25.00Jones 8 20.50Tan 10 25.00

White 9 22.00...

......

This table contains the employee data, but also the assocationbetween level of an employee and the rate they are paid.

(GF Royle, N Spadaccini 2006-2010) Databases - Normalization I 6 / 24

Page 7: Databases -Normalization Iteaching.csse.uwa.edu.au/units/CITS2232/lectures/db-norm1.pdf · For example, consider the sort of ... Normalization I 12 / 24. Decomposition Lossless-join

Redundancy

What if Jones leaves?If Jones happens to be the only employee currently at level 8, and heleaves and is deleted from the database, then the more generalinformation that “The hourly rate for Level 8 is $20.50” is also lost.

In this situation the right approach is to keep a separate table thatrelates levels with hourly rates, and to remove the “rate” informationfrom the employee table.

Level Rate...

...8 20.509 22.00

10 25.00...

...

(GF Royle, N Spadaccini 2006-2010) Databases - Normalization I 7 / 24

Page 8: Databases -Normalization Iteaching.csse.uwa.edu.au/units/CITS2232/lectures/db-norm1.pdf · For example, consider the sort of ... Normalization I 12 / 24. Decomposition Lossless-join

Redundancy

Separating the student tablesThe redundancy problems with the student information can also beresolved by creating a separate table with just the basic studentinformation

Student Number Name14058428 John Smith15712381 Jill Tan

Student Number Unit Code Mark14058428 CITS 3240 7214058428 CITS 1200 6814058428 CITS 2200 7715712381 CITS 1200 8815712381 CITS 2200 82

What is the algorithm for deciding how to split a relation?

(GF Royle, N Spadaccini 2006-2010) Databases - Normalization I 8 / 24

Page 9: Databases -Normalization Iteaching.csse.uwa.edu.au/units/CITS2232/lectures/db-norm1.pdf · For example, consider the sort of ... Normalization I 12 / 24. Decomposition Lossless-join

Decomposition

Decomposition

Both of these examples were solved by replacing the originalredundant table by two tables each containing a subset of the originalfields. This leads to the following definition:

A decomposition of a relational schema R consists of replacing therelational schema by two (or more) relational schemas each containinga subset of the attributes of R and that together contain all of theattributes of R.

Data stored under the original schema is projected onto each of theschemas in the decomposition.

(GF Royle, N Spadaccini 2006-2010) Databases - Normalization I 9 / 24

Page 10: Databases -Normalization Iteaching.csse.uwa.edu.au/units/CITS2232/lectures/db-norm1.pdf · For example, consider the sort of ... Normalization I 12 / 24. Decomposition Lossless-join

Decomposition

Decomposition - formally

Given a relation R(A1,A2, . . . ,An) we may decompose R into therelations S(B1,B2, . . . ,Bm) and T(C1,C2, . . . ,Ck) such that:

1 {A1,A2, . . . ,An} = {B1,B2, . . . ,Bm} ∪ {C1,C2, . . . ,Ck}

2 S = πB1,B2,...,Bm(R)

3 T = πC1,C2,...,Ck(R)

(GF Royle, N Spadaccini 2006-2010) Databases - Normalization I 10 / 24

Page 11: Databases -Normalization Iteaching.csse.uwa.edu.au/units/CITS2232/lectures/db-norm1.pdf · For example, consider the sort of ... Normalization I 12 / 24. Decomposition Lossless-join

Decomposition

Example

Suppose that R is the original “Student Number / Name / Unit Code /Mark” schema above — we’ll abbreviate this to

R = SNUM

(S =Student Number, N = Name, U = Unit Code, M = Mark).

Then the decomposition suggested above would decompose R into

R1 = SUM R2 = SN

(GF Royle, N Spadaccini 2006-2010) Databases - Normalization I 11 / 24

Page 12: Databases -Normalization Iteaching.csse.uwa.edu.au/units/CITS2232/lectures/db-norm1.pdf · For example, consider the sort of ... Normalization I 12 / 24. Decomposition Lossless-join

Decomposition

Storing and Recovering Data

Any data stored in the original schema is stored in the decomposedschema(s) by storing its projections onto R1 and R2.

We can then recover the original data by performing a join of thedecomposed relations. In this situation we have the property that forevery legal instance r of R

r = πR1(r) ./ πR2(r)

In other words, the original relation is the join of the decomposedrelations.

(GF Royle, N Spadaccini 2006-2010) Databases - Normalization I 12 / 24

Page 13: Databases -Normalization Iteaching.csse.uwa.edu.au/units/CITS2232/lectures/db-norm1.pdf · For example, consider the sort of ... Normalization I 12 / 24. Decomposition Lossless-join

Decomposition

Lossless-join decomposition

The property that any instance of the original relation can be recoveredas the join of the decomposed relations is called lossless-joindecomposition.

A decomposition of a relational schema R into R1 and R2 is lossless-joinif and only if the set of attributes in R1 ∩ R2 contains a key for R1 or R2.

For our example above, R1 ∩ R2 is the single attribute S (studentnumber) which is a key for R2 = SN and hence the decomposition islossless-join.

(GF Royle, N Spadaccini 2006-2010) Databases - Normalization I 13 / 24

Page 14: Databases -Normalization Iteaching.csse.uwa.edu.au/units/CITS2232/lectures/db-norm1.pdf · For example, consider the sort of ... Normalization I 12 / 24. Decomposition Lossless-join

Decomposition

Other decompositions

In general, an arbitrary decomposition of a schema will not be losslessjoin.

A B Ca1 b1 c1a2 b2 c2a3 b1 c3

Instance r

A Ba1 b1a2 b2a3 b1

InstanceR1 = πAB(r)

B Cb1 c1b2 c2b1 c3

InstanceR2 = πBC(r)

The attributes in R1 ∩ R2 is B which is a key for neither AB nor BC, sothe condition for lossless join is not met.

(GF Royle, N Spadaccini 2006-2010) Databases - Normalization I 14 / 24

Page 15: Databases -Normalization Iteaching.csse.uwa.edu.au/units/CITS2232/lectures/db-norm1.pdf · For example, consider the sort of ... Normalization I 12 / 24. Decomposition Lossless-join

Decomposition

Lossy join

Now consider the join πAB(r) ./ πBC(r)

A B Ca1 b1 c1a1 b1 c3a2 b2 c2a3 b1 c3a3 b1 c1

This contains two tuples that were not in the original relation —because b1 is associated with both a1 and a3 in the first relation, and c1and c3 in the second.

(GF Royle, N Spadaccini 2006-2010) Databases - Normalization I 15 / 24

Page 16: Databases -Normalization Iteaching.csse.uwa.edu.au/units/CITS2232/lectures/db-norm1.pdf · For example, consider the sort of ... Normalization I 12 / 24. Decomposition Lossless-join

Decomposition

Problems with decomposition

Some types of redundancy in (or between) relations can be resolvedby decomposition.

However decomposition introduces its own problems, in particular thefact that queries over the decomposed schemas now require joins; ifsuch queries are very common then the deterioration in performancemay be more severe than the original problems due to redundancy.

To make informed decisions about whether to decompose or notrequires a formal understanding about the types of redundancy andwhich can be resolved through decomposition — this is the theory offunctional dependencies.

(GF Royle, N Spadaccini 2006-2010) Databases - Normalization I 16 / 24

Page 17: Databases -Normalization Iteaching.csse.uwa.edu.au/units/CITS2232/lectures/db-norm1.pdf · For example, consider the sort of ... Normalization I 12 / 24. Decomposition Lossless-join

Functional Dependencies

Functional dependencies

A functional dependency (an FD) is a generalization of the concept ofa key in a relation.

Suppose that X and Y are two subsets of the attributes of a relationsuch that whenever two tuples have the same values on X, then theyalso have the same values on Y. In this situation we say that Xdetermines Y and write

X → Y.

Note that this condition must hold for any legal instance of the relation.

(GF Royle, N Spadaccini 2006-2010) Databases - Normalization I 17 / 24

Page 18: Databases -Normalization Iteaching.csse.uwa.edu.au/units/CITS2232/lectures/db-norm1.pdf · For example, consider the sort of ... Normalization I 12 / 24. Decomposition Lossless-join

Functional Dependencies

Keys

The obvious functional dependencies come from the keys of a relation.

For example, in the student-number / name relation SN we have theobvious functional dependency

S→ N

meaning that the student number determines the name of the student.

Obviously S determines S and so

S→ SN

which is just another way of saying that the student-number is a key forthe whole relation.

(GF Royle, N Spadaccini 2006-2010) Databases - Normalization I 18 / 24

Page 19: Databases -Normalization Iteaching.csse.uwa.edu.au/units/CITS2232/lectures/db-norm1.pdf · For example, consider the sort of ... Normalization I 12 / 24. Decomposition Lossless-join

Functional Dependencies

Superkeys

A key is a minimal set of attributes that determines all of the remainingattributes of a relation.

For example, in the SNUM relation above, the pair SU is a key becausethe student number and unit code determine both the name and themark, or in symbols

SU → SNUM.

(It is clear that no legal instance of the relation can have two tuples withthe same student number and unit code, but different names or marks.)

Any superset of a key is called a superkey — it determines all of theremaining attributes, but is not minimal.

(GF Royle, N Spadaccini 2006-2010) Databases - Normalization I 19 / 24

Page 20: Databases -Normalization Iteaching.csse.uwa.edu.au/units/CITS2232/lectures/db-norm1.pdf · For example, consider the sort of ... Normalization I 12 / 24. Decomposition Lossless-join

Functional Dependencies

Example

Consider a relation R with four attributes ABCD, and suppose that thefollowing functional dependencies hold:

ABC→ D D→ A

What are the keys for R?

The first FD shows us that ABC is a key for this relation, but this is notthe only key for this relation.

The second FD shows us that D determines A, and so BCDdetermines all the attributes and hence is a (candidate) key.

(GF Royle, N Spadaccini 2006-2010) Databases - Normalization I 20 / 24

Page 21: Databases -Normalization Iteaching.csse.uwa.edu.au/units/CITS2232/lectures/db-norm1.pdf · For example, consider the sort of ... Normalization I 12 / 24. Decomposition Lossless-join

Functional Dependencies

Reasoning about FDs

Often some functional dependencies will be immediately obvious fromthe semantics of a relation, but others may follow as a consequence ofthese initial ones.

For example, if R is a relation with FDs A→ B and B→ C, then itfollows that

A→ C

as well.

The set of all FDs implied by a set F of FDs is called the closure of Fand is denoted F+. In order to consider the various normal forms it isimportant to be able to calculate F+ given an initial set F of FDs.

(GF Royle, N Spadaccini 2006-2010) Databases - Normalization I 21 / 24

Page 22: Databases -Normalization Iteaching.csse.uwa.edu.au/units/CITS2232/lectures/db-norm1.pdf · For example, consider the sort of ... Normalization I 12 / 24. Decomposition Lossless-join

Functional Dependencies

Armstrong’s Axioms

Armstrong’s Axioms is a set of three rules that can be repeatedlyapplied to a set of FDs:

Reflexivity: If Y ⊆ X then X → Y.Augmentation: If X → Y then XZ → YZ for any Z.Transitivity: If X → Y and Y → Z then X → Z.

We can augment these with two rules that follow directly fromArmstrong’s Axioms.

Union: If X → Y and X → Z then X → YZ.Decomposition: If X → YZ, then X → Y and X → Z.

(GF Royle, N Spadaccini 2006-2010) Databases - Normalization I 22 / 24

Page 23: Databases -Normalization Iteaching.csse.uwa.edu.au/units/CITS2232/lectures/db-norm1.pdf · For example, consider the sort of ... Normalization I 12 / 24. Decomposition Lossless-join

Functional Dependencies

Sound and complete

The key point about Armstrong’s axioms is that they are both soundand complete. That is, if we start with a set F of FDs then:

Repeated application of Armstrong’s axioms to F generates onlyFDs in F+.Any FD in F+ can be obtained by repeated application ofArmstrong’s axioms to F.

(GF Royle, N Spadaccini 2006-2010) Databases - Normalization I 23 / 24

Page 24: Databases -Normalization Iteaching.csse.uwa.edu.au/units/CITS2232/lectures/db-norm1.pdf · For example, consider the sort of ... Normalization I 12 / 24. Decomposition Lossless-join

Functional Dependencies

Example

Consider a relation with attributes ABC and let

F = {A→ B,B→ C}

Then from transitivity we get A→ C, by augmentation we get AC→ BCand by union we get A→ BC.

FDs that arise from reflexivity such as

AB→ B

are known as trivial dependencies.

(GF Royle, N Spadaccini 2006-2010) Databases - Normalization I 24 / 24