normalization

Murali Mani Normalization


Post on 15-Mar-2016

Page 1: Normalization

Murali Mani

Normalization

Page 2: Normalization

What and Why Normalization?

• To remove potential redundancy in design.
    • Redundancy causes several anomalies: insert, delete and update.
• Normalization uses the concept of dependencies.
    • Functional Dependencies
• Idea used: Decomposition
    • Break R (A, B, C, D) into R1 (A, B) and R2 (B, C, D).
• Use decomposition judiciously.

Page 3: Normalization

Insert Anomaly

Student
sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p2       ER

Note: We cannot insert a professor who has no students.

Insert Anomaly: We are not able to insert a “valid” value (or values).

Page 4: Normalization

Delete Anomaly

Student
sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p2       ER

Note: We cannot delete a student who is the only student of a professor without also losing that professor.

Note: In both cases, the minimum cardinality of Professor in the corresponding ER schema is 0.

ER schema (diagram): Student (sNumber, sName) participates in HasAdvisor (years) with cardinality (1,1); Professor (pNumber, pName) participates with cardinality (0,1).

Delete Anomaly: We are not able to perform a delete without losing some “valid” information.

Page 5: Normalization

Update Anomaly

Student
sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p1       MM

Note: To update the name of a professor, we have to update multiple tuples. Update anomalies are due to redundancy.

ER schema (diagram): Student (sNumber, sName) participates in HasAdvisor (years) with cardinality (1,1); Professor (pNumber, pName) participates with cardinality (0,*).

Note: The maximum cardinality of Professor in the corresponding ER schema is *.

Update Anomaly: To update a value, we have to update multiple rows.

Page 6: Normalization

Keys: Revisited

• A key for a relation R (a1, a2, …, an) is a set of attributes K that uniquely determines the values of all attributes of R.
• A key K is minimal: no proper subset of K is a key.
• A superkey need not be minimal.
• Prime attribute: an attribute that belongs to some key.

Page 7: Normalization

Keys: Example

Student
sNumber  sName  address
1        Dave   144FL
2        Greg   320FL

Primary key: <sNumber>
Candidate key: <sName>
Some superkeys: {<sNumber, address>, <sName>, <sNumber>, <sNumber, sName>, <sNumber, sName, address>}
Prime attributes: {sNumber, sName}

Page 8: Normalization

Functional Dependencies (FDs)

Student
sNumber  sName  address
1        Dave   144FL
2        Greg   320FL

Suppose we have the FD sName → address.
• For any two rows in the Student relation with the same value for sName, the value for address must be the same.
• I.e., there is a function from sName to address.

Note:
• We will assume no null values.
• Any key (primary or candidate) or superkey of a relation R functionally determines all attributes of R.
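The definition above can be checked against a concrete instance. The following is a minimal sketch, assuming a relation is represented as a list of Python dicts; the function name `fd_holds` and the extra third row are hypothetical, introduced only for illustration. Note that an instance can only refute an FD: an FD is a constraint on all legal instances, so passing the check on one instance does not prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs is not violated by this instance.

    rows: list of dicts (one per tuple); lhs, rhs: lists of attribute names.
    """
    seen = {}  # maps each lhs value combination to the rhs values first seen with it
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        if seen.setdefault(left, right) != right:
            return False  # same lhs values, different rhs values
    return True


# The Student instance from the slide.
student = [
    {"sNumber": 1, "sName": "Dave", "address": "144FL"},
    {"sNumber": 2, "sName": "Greg", "address": "320FL"},
]
holds = fd_holds(student, ["sName"], ["address"])

# Hypothetical extra row: a second Dave at another address violates the FD.
student_bad = student + [{"sNumber": 3, "sName": "Dave", "address": "200FL"}]
broken = fd_holds(student_bad, ["sName"], ["address"])
```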

Page 9: Normalization

Properties of FDs

Let A, B, C, Z be sets of attributes.

• Reflexive (also called trivial FD): if B ⊆ A, then A → B
• Transitive: if A → B and B → C, then A → C
• Augmentation: if A → B, then AZ → BZ
• Union: if A → B and A → C, then A → BC
• Decomposition: if A → BC, then A → B and A → C

Page 10: Normalization

Inferring FDs

Why? Suppose we have a relation R (A, B, C) with functional dependencies A → B, B → C, C → A. What is a key for R? Should we split R into multiple relations?

We can infer A → ABC, B → ABC, C → ABC. Hence A, B and C are all keys.

Page 11: Normalization

Algorithm for inference of FDs

Computing the closure of a set of attributes {A1, A2, …, An}, denoted {A1, A2, …, An}+:

1. Let X = {A1, A2, …, An}.
2. If there exists an FD B1, B2, …, Bm → C such that every Bi ∈ X, then X = X ∪ {C}.
3. Repeat step 2 until no more attributes can be added.
4. {A1, A2, …, An}+ = X.
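The four steps above can be sketched in Python as follows. This is a minimal illustration, assuming FDs are given as a list of (left side, right side) pairs and that attribute names are single characters so a string such as "CD" stands for the set {C, D}; the function name `closure` and the sample FDs are choices made here, not taken from the slides.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under a list of FDs.

    fds: list of (lhs, rhs) pairs, each side an iterable of attribute names.
    """
    x = set(attrs)                      # step 1
    changed = True
    while changed:                      # step 3: repeat until nothing is added
        changed = False
        for lhs, rhs in fds:
            # step 2: every attribute on the left side is already in X
            if set(lhs) <= x and not set(rhs) <= x:
                x |= set(rhs)
                changed = True
    return x                            # step 4


# Illustrative FDs: A -> B, B -> C, CD -> E
fds = [("A", "B"), ("B", "C"), ("CD", "E")]
a_plus = closure("A", fds)      # E is not reachable from A alone
ad_plus = closure("AD", fds)    # with D available, CD -> E fires
```

To test whether an FD X → Y follows from the given FDs, compute the closure of X and check that it contains Y.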

Page 12: Normalization

Inferring FDs: Example 1

Given R (A, B, C) and FDs A → B, B → C, C → A, what are the possible keys for R?

Compute the closure of attributes:
{A}+ = {A, B, C}
{B}+ = {A, B, C}
{C}+ = {A, B, C}

So the keys for R are <A>, <B>, <C>.
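The same reasoning can be automated by trying attribute subsets in order of increasing size and keeping those whose closure is the whole relation. The sketch below is a brute-force illustration (exponential in the number of attributes), assuming single-character attribute names; `closure` and `candidate_keys` are hypothetical helper names.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of a set of attributes under FDs given as (lhs, rhs) pairs."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= x and not set(rhs) <= x:
                x |= set(rhs)
                changed = True
    return x


def candidate_keys(all_attrs, fds):
    """Return all minimal attribute sets whose closure is all_attrs."""
    all_attrs = set(all_attrs)
    keys = []
    # Smaller subsets are tried first, so any superset of a key found
    # earlier is a non-minimal superkey and can be skipped.
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue
            if closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return keys


# R (A, B, C) with A -> B, B -> C, C -> A
keys = candidate_keys("ABC", [("A", "B"), ("B", "C"), ("C", "A")])
```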

Page 13: Normalization

Inferring FDs: Example 2

Consider R (A, B, C, D, E) with FDs A → B, B → C, CD → E. Does A → E hold?

Let us compute {A}+:
{A}+ = {A, B, C}

E is not in {A}+, so A → E does not hold.

Page 14: Normalization

Decomposing Relations

StudentProf
sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p2       MM

FD: pNumber → pName

Student
sNumber  sName  pNumber
s1       Dave   p1
s2       Greg   p2

Professor
pNumber  pName
p1       MM
p2       MM

Page 15: Normalization

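Decomposition: Lossless Join Property

Generating spurious tuples

Student
sNumber  sName  pName
s1       Dave   MM
s2       Greg   MM

Professor
pNumber  pName
p1       MM
p2       MM

StudentProf (natural join of Student and Professor on pName)
sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s1       Dave   p2       MM
s2       Greg   p1       MM
s2       Greg   p2       MM

The rows (s1, p2) and (s2, p1) are spurious: they were not in the original StudentProf relation.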
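The spurious tuples can be reproduced with a small natural-join routine. This is a sketch under the assumption that relations are lists of dicts; `natural_join` is a hypothetical helper written for this illustration, and the "original" StudentProf instance is the two-row table from the Decomposing Relations slide.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    result = []
    for t in r:
        for u in s:
            shared = t.keys() & u.keys()           # common attribute names
            if all(t[a] == u[a] for a in shared):
                merged = {**t, **u}
                if merged not in result:           # relations are sets of tuples
                    result.append(merged)
    return result


original = [
    {"sNumber": "s1", "sName": "Dave", "pNumber": "p1", "pName": "MM"},
    {"sNumber": "s2", "sName": "Greg", "pNumber": "p2", "pName": "MM"},
]
professor = [
    {"pNumber": "p1", "pName": "MM"},
    {"pNumber": "p2", "pName": "MM"},
]

# Lossy decomposition: Student keeps pName, so the join is on pName.
student_by_name = [
    {"sNumber": "s1", "sName": "Dave", "pName": "MM"},
    {"sNumber": "s2", "sName": "Greg", "pName": "MM"},
]
joined = natural_join(student_by_name, professor)
spurious = [t for t in joined if t not in original]

# Lossless decomposition: Student keeps pNumber, the key of Professor.
student_by_number = [
    {"sNumber": "s1", "sName": "Dave", "pNumber": "p1"},
    {"sNumber": "s2", "sName": "Greg", "pNumber": "p2"},
]
lossless = natural_join(student_by_number, professor)
```

Joining on pNumber recovers exactly the original rows because pNumber is a key of Professor, whereas pName is not.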

Page 16: Normalization

Normalization Step

• Consider a relation R with set of attributes AR.
• Consider an FD A → B (such that no other attribute in (AR – A – B) is functionally determined by A).
• If A is not a superkey for R, we may decompose R as:
    • Create R’ (AR – B)
    • Create R’’ with attributes A ∪ B
    • Key for R’’ = A
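One way to sketch this step in code is to take B to be everything A determines apart from A itself (its closure minus A), which satisfies the condition that no other attribute outside A and B is determined by A. The helper names `closure` and `decompose_step` are hypothetical; the example uses the StudentProf relation and the FD pNumber → pName from the Decomposing Relations slide.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under FDs given as (lhs, rhs) pairs."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= x and not set(rhs) <= x:
                x |= set(rhs)
                changed = True
    return x


def decompose_step(all_attrs, fds, lhs):
    """Split R on the FD lhs -> (closure of lhs minus lhs).

    Returns (attributes of R', attributes of R''), or None when lhs is a
    superkey or determines nothing beyond itself.
    """
    all_attrs = set(all_attrs)
    a = set(lhs)
    a_plus = closure(a, fds) & all_attrs
    if a_plus == all_attrs:
        return None                  # A is a superkey: no decomposition needed
    b = a_plus - a                   # every attribute A determines
    if not b:
        return None                  # only trivial FDs on A
    return all_attrs - b, a | b      # R' (AR - B), R'' (A and B), key of R'' is A


attrs = ["sNumber", "sName", "pNumber", "pName"]
fds = [(["pNumber"], ["pName"])]
r1, r2 = decompose_step(attrs, fds, ["pNumber"])
```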

Page 17: Normalization

Normal Forms: BCNF

Boyce-Codd Normal Form (BCNF): For every non-trivial FD X → a in R, X is a superkey of R.

Page 18: Normalization

BCNF example

SCI (student, course, instructor)
FDs: student, course → instructor
     instructor → course

Decomposition:
SI (student, instructor)
Instructor (instructor, course)
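The BCNF condition can be tested mechanically with attribute closures. The sketch below checks only the FDs that are listed for a relation, which is enough for the relation they were stated on; checking a decomposed relation in general also requires projecting the original FDs onto it, which this illustration assumes has been done by hand. The names `closure` and `bcnf_violations` are hypothetical.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under FDs given as (lhs, rhs) pairs."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= x and not set(rhs) <= x:
                x |= set(rhs)
                changed = True
    return x


def bcnf_violations(all_attrs, fds):
    """Return the listed non-trivial FDs whose left side is not a superkey."""
    all_attrs = set(all_attrs)
    bad = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue                          # trivial FD, always allowed
        if closure(lhs, fds) != all_attrs:    # left side is not a superkey
            bad.append((lhs, rhs))
    return bad


sci_fds = [
    (["student", "course"], ["instructor"]),
    (["instructor"], ["course"]),
]
sci_bad = bcnf_violations(["student", "course", "instructor"], sci_fds)

# The decomposed relations, with the FDs that apply to each (projected by hand).
si_bad = bcnf_violations(["student", "instructor"], [])
instructor_bad = bcnf_violations(
    ["instructor", "course"], [(["instructor"], ["course"])]
)
```

Here only instructor → course is reported for SCI, and neither SI nor Instructor has a violation.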

Page 19: Normalization

Dependency Preservation

• We might want to ensure that all specified FDs are captured.
• A BCNF decomposition does not necessarily preserve FDs.
• Decomposition into 2NF or 3NF can always preserve FDs.

SI
student  instructor
Dave     MM
Dave     ER

Instructor
instructor  course
MM          DB 1
ER          DB 1

SCI (from SI and Instructor)
student  instructor  course
Dave     MM          DB 1
Dave     ER          DB 1

SCI violates the FD student, course → instructor.

Page 20: Normalization

Normal Forms: 3NF

Third Normal Form (3NF): For every non-trivial FD X → a in R, either a is a prime attribute or X is a superkey of R.

Page 21: Normalization

3NF - example

Lot (propNo, county, lotNum, area, price, taxRate)
Candidate key: <county, lotNum>
FDs: county → taxRate
     area → price

Decomposition:
Lot (propNo, county, lotNum, area, price)
County (county, taxRate)

Page 22: Normalization

3NF - example

Lot (propNo, county, lotNum, area, price)
County (county, taxRate)

Candidate key for Lot: <county, lotNum>

FDs: county → taxRate
     area → price

Decomposition:
Lot (propNo, county, lotNum, area)
County (county, taxRate)
Area (area, price)
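The 3NF test on the Lot example can be sketched as below. Two assumptions are made for the illustration: the stated candidate key <county, lotNum> is encoded as an FD that determines all other attributes, and the set of prime attributes is passed in explicitly as {county, lotNum}. The names `closure` and `violations_3nf` are hypothetical.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under FDs given as (lhs, rhs) pairs."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= x and not set(rhs) <= x:
                x |= set(rhs)
                changed = True
    return x


def violations_3nf(all_attrs, fds, prime_attrs):
    """Return (lhs, attribute) pairs from the listed FDs that break 3NF."""
    all_attrs, prime = set(all_attrs), set(prime_attrs)
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) == all_attrs:
            continue                          # X is a superkey
        for a in rhs:
            if a in lhs or a in prime:
                continue                      # trivial, or a is prime
            bad.append((tuple(lhs), a))
    return bad


prime = ["county", "lotNum"]

# Original Lot; the first FD encodes the candidate key (an assumption).
lot = ["propNo", "county", "lotNum", "area", "price", "taxRate"]
lot_fds = [
    (["county", "lotNum"], ["propNo", "area", "price", "taxRate"]),
    (["county"], ["taxRate"]),
    (["area"], ["price"]),
]
before = violations_3nf(lot, lot_fds, prime)

# Final Lot after moving taxRate to County and price to Area.
lot_final = ["propNo", "county", "lotNum", "area"]
lot_final_fds = [(["county", "lotNum"], ["propNo", "area"])]
after = violations_3nf(lot_final, lot_final_fds, prime)
```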

Page 23: Normalization

Extreme Example

Consider relation R (A, B, C, D) with primary key (A, B, C), and FDs B → D and C → D. R violates 3NF. Decomposing it, we get 3 relations:

R1 (A, B, C), R2 (B, D), R3 (C, D)

Let us consider an instance where we need these 3 relations and how we do a natural join ⋈.

R1
A   B   C
a1  b1  c1
a2  b2  c1
a3  b1  c2

R2
B   D
b1  d1
b2  d2

R3
C   D
c1  d1
c2  d2

R1 ⋈ R2: violates C → D
A   B   C   D
a1  b1  c1  d1
a2  b2  c1  d2
a3  b1  c2  d1

R1 ⋈ R2 ⋈ R3: no FD is violated
A   B   C   D
a1  b1  c1  d1
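The two joins and the FD checks in this example can be replayed in code. This is a sketch assuming relations are lists of dicts; `natural_join` and `fd_holds` are hypothetical helpers written for the illustration.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    result = []
    for t in r:
        for u in s:
            if all(t[a] == u[a] for a in t.keys() & u.keys()):
                merged = {**t, **u}
                if merged not in result:
                    result.append(merged)
    return result


def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


r1 = [
    {"A": "a1", "B": "b1", "C": "c1"},
    {"A": "a2", "B": "b2", "C": "c1"},
    {"A": "a3", "B": "b1", "C": "c2"},
]
r2 = [{"B": "b1", "D": "d1"}, {"B": "b2", "D": "d2"}]
r3 = [{"C": "c1", "D": "d1"}, {"C": "c2", "D": "d2"}]

r12 = natural_join(r1, r2)        # joined on B
r123 = natural_join(r12, r3)      # joined on both C and D

c_to_d_in_r12 = fd_holds(r12, ["C"], ["D"])
b_to_d_in_r123 = fd_holds(r123, ["B"], ["D"])
c_to_d_in_r123 = fd_holds(r123, ["C"], ["D"])
```

Only the row that agrees with both R2 and R3 on D survives the second join, which is why both FDs hold in the final result.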

Page 24: Normalization

Define Foreign Keys?

• Consider the normalization step: given a relation R with set of attributes AR and an FD A → B where A is not a key for R, we decompose R as:
    • Create R’ (AR – B)
    • Create R’’ with attributes A ∪ B
    • Key for R’’ = A
• We can also define a foreign key: R’ (A) references R’’ (A).
• Question: What is the key for R’?

Page 25: Normalization

How does Normalization Help?

ER schema (diagram): Employee (ssn, name, lot) participates in WorksFor with cardinality (1, 1); Dept (did, dName) participates with cardinality (0, *).

Employee (ssn, name, lot, dept)
Dept (did, dName)
FK: Employee (dept) REFERENCES Dept (did)

Suppose employees of a dept are in the same lot.
FD: dept → lot

Decomposing:
Employee (ssn, name, dept)
Dept (did, dName)
DeptLot (dept, lot)

(or)
Employee (ssn, name, dept)
Dept (did, dName, lot)

ER schema after normalization (diagram): Employee (ssn, name) participates in WorksFor with cardinality (1, 1); Dept (did, dName, lot) participates with cardinality (0, *).
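The second decomposition, Employee (ssn, name, dept) and Dept (did, dName, lot), can be tried out with Python's built-in sqlite3 module. This is a sketch only: the column types and the sample rows (Ann, Bob, Sales, lot7) are hypothetical, and SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite leaves FK checks off by default

conn.execute("CREATE TABLE Dept (did TEXT PRIMARY KEY, dName TEXT, lot TEXT)")
conn.execute(
    "CREATE TABLE Employee ("
    " ssn TEXT PRIMARY KEY,"
    " name TEXT,"
    " dept TEXT NOT NULL REFERENCES Dept (did))"
)

# Hypothetical sample data: two employees in the same department.
conn.execute("INSERT INTO Dept VALUES ('d1', 'Sales', 'lot7')")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [("111", "Ann", "d1"), ("222", "Bob", "d1")],
)

# The lot is stored once per department, so moving the department
# touches one row instead of one row per employee.
rows_updated = conn.execute(
    "UPDATE Dept SET lot = 'lot9' WHERE did = 'd1'"
).rowcount

lots = conn.execute(
    "SELECT e.name, d.lot FROM Employee e JOIN Dept d ON e.dept = d.did"
    " ORDER BY e.name"
).fetchall()

# The foreign key rejects an employee whose department does not exist.
try:
    conn.execute("INSERT INTO Employee VALUES ('333', 'Cy', 'd9')")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
```

With lot stored in Dept, the FD dept → lot is enforced by the primary key of Dept, and the update anomaly from the single-table design does not arise.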