lecture 10 normalization. introduction relations derived from er model may be ‘faulty’ –...

Lecture 10 Normalization

Upload: prosper-phillips

Post on 19-Jan-2016



Introduction

• Relations derived from ER model may be ‘faulty’
– Subjective process.
– May cause data redundancy, and insert/delete/update anomalies.

• We use some mathematical (semantic?) properties of relations to
– locate these faults and
– fix them

• Process is called Normalization.


Normalization (contd.)

• Relational database schema = set of relations

• Relation = set of attributes

• How we group the attributes into relations is important.


Normalization (contd.)

• Too many attributes in a relation
– Waste space
– Anomalies
  • Insert anomaly
  • Delete anomaly
  • Update anomaly

• Decomposing the relation into a set of relations that are too small may lose
– the loss-less join property
– the dependency preserving property


Data Redundancy

• Major aim of relational database design is
– to group attributes into relations to minimize data redundancy, and
– to reduce file storage space required by base relations.

• Data redundancy is undesirable because of the following anomalies
– ‘Insert’ anomalies
– ‘Delete’ anomalies
– ‘Update’ anomalies


Anomalies

Too many attributes… For example,

LECTURER (id, name, address, salary, department, building)


Anomalies (contd.)

Insertion Anomaly…

1. Inserting a new lecturer into the LECTURER table
– Department information is repeated, so we must ensure that the correct department information is inserted each time.

LECTURER (id, name, address, salary, department, building)


Anomalies (contd.)

• Inserting a department with no lecturers
– Impossible, because null values are not allowed for id.


Anomalies (contd.)

Deletion Anomalies…

• Deleting the last lecturer from the department will lose information about the department.

LECTURER (id, name, address, salary, department, building)


Anomalies (contd.)

Updating Anomalies…

• Updating the department’s building needs to be done for all lecturers working for that department.

LECTURER (id, name, address, salary, department, building)
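The three anomalies can be reproduced on a tiny instance of LECTURER. The following is a minimal sketch using Python's built-in sqlite3 module; the sample rows, department names and building names are made up for illustration and do not come from the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lecturer ("
    " id TEXT NOT NULL PRIMARY KEY, name TEXT, address TEXT,"
    " salary REAL, department TEXT, building TEXT)"
)
# Hypothetical sample rows: two lecturers in the same department.
conn.executemany(
    "INSERT INTO lecturer VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("L1", "Perera", "Kandy", 50000, "Computing", "Block A"),
        ("L2", "Silva", "Galle", 52000, "Computing", "Block A"),
    ],
)

# Insert anomaly: a department with no lecturer would need a null id,
# which the primary key rejects.
try:
    conn.execute(
        "INSERT INTO lecturer VALUES (NULL, NULL, NULL, NULL, 'Physics', 'Block B')"
    )
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# Update anomaly: changing the building in only one row leaves the
# department recorded with two different buildings.
conn.execute("UPDATE lecturer SET building = 'Block C' WHERE id = 'L1'")
buildings = {
    row[0]
    for row in conn.execute(
        "SELECT building FROM lecturer WHERE department = 'Computing'"
    )
}

# Delete anomaly: removing every lecturer also removes the department.
conn.execute("DELETE FROM lecturer")
departments_left = conn.execute(
    "SELECT COUNT(DISTINCT department) FROM lecturer"
).fetchone()[0]

print(insert_rejected, sorted(buildings), departments_left)
```

After the single-row update the department is stored with two buildings at once, which is exactly the inconsistency that redundancy makes possible.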


Decomposition of Relations

• The Staff and Branch relations, which are obtained by decomposing StaffBranch, do not suffer from these anomalies.

• Two important properties of decomposition
– Lossless-join property enables us to find any instance of the original relation from corresponding instances in the smaller relations.
– Dependency preservation property enables us to enforce a constraint on the original relation by enforcing some constraint on each of the smaller relations.


Loss-less join property

Decomposing the relation into smaller relations…

• Loss-less join property: without it, we might lose information when we decompose relations…


Loss-less join property (contd.)

For example,

Relation S:

S   P   D
S1  P1  D1
S2  P2  D2
S3  P1  D3

R1, the projection of S on (S, P):

S   P
S1  P1
S2  P2
S3  P1

R2, the projection of S on (P, D):

P   D
P1  D1
P2  D2
P1  D3


Loss-less join property (contd.)

Joining them together, we get spurious tuples…

R1 ⋈ R2:

S   P   D
S1  P1  D1
S1  P1  D3  (spurious)
S2  P2  D2
S3  P1  D1  (spurious)
S3  P1  D3
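The spurious tuples can be checked mechanically. The sketch below projects the relation S from the example onto (S, P) and (P, D) and joins the projections back together; the helper names `project` and `natural_join` are illustrative only.

```python
def project(rows, attrs):
    """Project rows (dicts) onto attrs, dropping duplicate tuples."""
    seen = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in seen:
            seen.append(t)
    return seen


def natural_join(r1, r2):
    """Join two lists of dict rows on every attribute name they share."""
    joined = []
    for a in r1:
        for b in r2:
            common = a.keys() & b.keys()
            if all(a[k] == b[k] for k in common):
                joined.append({**a, **b})
    return joined


s = [
    {"S": "S1", "P": "P1", "D": "D1"},
    {"S": "S2", "P": "P2", "D": "D2"},
    {"S": "S3", "P": "P1", "D": "D3"},
]
r1 = project(s, ["S", "P"])
r2 = project(s, ["P", "D"])

joined = natural_join(r1, r2)
spurious = [t for t in joined if t not in s]

print(len(joined))  # 5 tuples, although S has only 3
print(spurious)     # the two tuples that were never in S
```

The join is on P alone, and P1 appears with two different S values and two different D values, so every combination is produced.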


The Process of Normalization

• Formal technique for analyzing a relation based on its primary key and functional dependencies between its attributes.

• Often executed as a series of steps. Each step corresponds to a specific normal form, which has known properties.

• As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.

• Given a relation, use the following cycle
– Find out what normal form it is in
– Transform the relation to the next higher form by decomposing it to form simpler relations
– You may need to refine the relation further if decomposition resulted in undesirable properties

Normalization is based on Functional Dependencies


Functional dependency

• A functional dependency, denoted by X → Y, is read as
– X functionally determines Y
– Y is functionally dependent on X

• Here X and Y are sets of attributes in relation R, and X → Y specifies the following constraint:
Let t1 and t2 be tuples of relation R for any given instance.
Whenever t1[X] = t2[X] then t1[Y] = t2[Y],
where ti[X] represents the values for X in tuple ti.


Functional dependency (contd.)

TEACH

STUDENT   COURSE             TEACHER
Narayana  Database           ABC
Sumith    Database           ABC
Nalin     Operating Systems  Samantha
Kamal     Mathematics        Chandrika
Janith    Database           ABC
Ranil     Operating Systems  Samantha
Saman     Mathematics        Chandrika
Ruwan     Database           ABC

TEACHER → COURSE
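The definition of a functional dependency translates directly into a check over one instance. The sketch below applies it to the TEACH rows; the function name `fd_holds` is illustrative. Note that an instance can only show that a dependency is violated; whether it holds for every legal instance is a design decision about the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the dependency lhs -> rhs holds in this instance."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen and returns the
        # stored value afterwards, so a mismatch means t1[X] = t2[X]
        # but t1[Y] != t2[Y].
        if seen.setdefault(x, y) != y:
            return False
    return True


columns = ("STUDENT", "COURSE", "TEACHER")
teach = [
    dict(zip(columns, t))
    for t in [
        ("Narayana", "Database", "ABC"),
        ("Sumith", "Database", "ABC"),
        ("Nalin", "Operating Systems", "Samantha"),
        ("Kamal", "Mathematics", "Chandrika"),
        ("Janith", "Database", "ABC"),
        ("Ranil", "Operating Systems", "Samantha"),
        ("Saman", "Mathematics", "Chandrika"),
        ("Ruwan", "Database", "ABC"),
    ]
]

print(fd_holds(teach, ["TEACHER"], ["COURSE"]))  # True
print(fd_holds(teach, ["COURSE"], ["STUDENT"]))  # False
```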


Functional Dependency

• Diagrammatic representation: [figure: dependency diagram]

• Determinant of a functional dependency refers to the attribute or group of attributes on the left-hand side of the arrow.

• If the determinant is a minimal set of attributes that maintains the functional dependency, then we call it a full functional dependency.


Key Terms

Review of some terms…

• Superkey: A set of attributes S in relation R that can be used to identify each tuple uniquely.

• Key: A key is a superkey with the additional property that removing any attribute from it leaves a set that no longer satisfies the key condition.


Key Terms

• Candidate Key: Each key of a relation is called a candidate key.

• Primary Key: One of the candidate keys is chosen to be the primary key.

• Prime Attribute: An attribute which is a member of some candidate key.

• Nonprime Attribute: An attribute which is not prime.


Unnormalized Form (UNF)

• A table that contains one or more repeating groups.

• To create an unnormalized table:
– Transform data from information source (e.g. form) into table format with columns and rows.

Name          Address                                 Phone
Sally Singer  123 Broadway New York, NY, 11234        (111) 222-3345
Jason Jumper  456 Jolly Jumper St. Trenton NJ, 11547  (222) 334-5566

Example 1 – address and name fields are composite


Another example of UNF

Rep ID  Representative    Client 1  Time 1  Client 2  Time 2  Client 3     Time 3
TS-89   Gilroy Gladstone  US Corp.  14 hrs  Taggarts  26 hrs  Kilroy Inc.  9 hrs
RK-56   Mary Mayhem       Italiana  67 hrs  Linkers   2 hrs

Example 2 – repeating columns for each client & composite name field


UNF to 1NF

• Remove repeating group by:
– entering appropriate data into the empty columns of rows containing repeating data (‘flattening’ the table).

Or by
– placing repeating data along with a copy of the original key attribute(s) into a separate relation.


Normalization (contd.)

1st Normal Form

• A relation R is in first normal form (1NF) if domains of all attributes in the relation are atomic (simple & indivisible).

• Avoid multivalued & composite attributes.


Normalization (contd.)

For example…

DEPARTMENT (DNAME, DNUMBER, DMGRSSN, DLOCATIONS)

• The DEPARTMENT relation is not in 1NF
• How do we bring it into 1NF?

DEPARTMENT

DNAME           DNUMBER  DMGRSSN    DLOCATIONS
Research        5        333445555  {Mathara, Kandy, Metro}
Administration  4        987654321  {Malabe}
Headquarters    1        888665555  {Metro}


Normalization (contd.)

• Solution 1: Repeat the same info, with one tuple per location (redundancy + new key attribute).

• Solution 2: If max number of locations is known, create a column for each location (may have lots of null values).

• Solution 3: Create a separate DEPT_LOCATIONS relation with foreign key.


Normalization (contd.)

Solution 1:

• Expand the key so that there will be a separate tuple in the original DEPARTMENT relation for each location of the DEPARTMENT.

• This solution has the disadvantage of introducing redundancy in the relation.

DEPARTMENT

DNAME           DNUMBER  DMGRSSN    DLOCATION
Research        5        333445555  Mathara
Research        5        333445555  Kandy
Research        5        333445555  Metro
Administration  4        987654321  Malabe
Headquarters    1        888665555  Metro

1NF relation with redundancy


Normalization (contd.)

Solution 2:

DEPARTMENT

DNAME           DNUMBER  DMGRSSN    DLOC1    DLOC2  DLOC3
Research        5        333445555  Mathara  Kandy  Metro
Administration  4        987654321  Malabe   Null   Null
Headquarters    1        888665555  Metro    Null   Null

• Need to know max number of locations.
• Create a column for each location.
• May have lots of null values.


Normalization (contd.)

Solution 3:

• Remove the attribute DLOCATION and place it in a separate relation DEPT_LOCATIONS along with the primary key DNUMBER of DEPARTMENT.

• The PK of DEPT_LOCATIONS is the combination {DNUMBER, DLOCATION}.

• This decomposes the non-1NF relation into two 1NF relations.

DEPARTMENT

DNAME           DNUMBER  DMGRSSN
Research        5        333445555
Administration  4        987654321
Headquarters    1        888665555

DEPT_LOCATIONS

DNUMBER  DLOCATION
1        Metro
4        Malabe
5        Mathara
5        Kandy
5        Metro
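The three ways of bringing DEPARTMENT into 1NF can each be derived from the set-valued table in a few lines. This is a sketch only: relations are modelled as plain Python lists of tuples, and the constant `MAX_LOCS` is an assumed upper bound on the number of locations.

```python
# The non-1NF DEPARTMENT relation: the last field is a set of locations.
department = [
    ("Research", 5, "333445555", ["Mathara", "Kandy", "Metro"]),
    ("Administration", 4, "987654321", ["Malabe"]),
    ("Headquarters", 1, "888665555", ["Metro"]),
]

# One tuple per (department, location); the key grows to
# {DNUMBER, DLOCATION} and DNAME/DMGRSSN are repeated.
one_tuple_per_location = [
    (name, num, mgr, loc)
    for name, num, mgr, locs in department
    for loc in locs
]

# A fixed number of location columns, padded with None (null).
MAX_LOCS = 3  # assumed maximum number of locations per department
one_column_per_location = [
    (name, num, mgr, *(locs + [None] * (MAX_LOCS - len(locs))))
    for name, num, mgr, locs in department
]

# Two relations linked by DNUMBER: no redundancy and no nulls.
dept = [(name, num, mgr) for name, num, mgr, _ in department]
dept_locations = [
    (num, loc) for _, num, _, locs in department for loc in locs
]

print(len(one_tuple_per_location))  # 5 rows for 3 departments
print(one_column_per_location[1])
print(sorted(dept_locations))
```

Only the two-relation form avoids both the repeated manager data and the null padding, which is why it is usually preferred.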


Normalization (contd.)

• A functional dependency X → Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold (i.e. (X – {A}) → Y does not hold).

TEACH

STUDENT  COURSE             TEACHER   CAMPUS
Narayan  Database           ABC       ATI-Kandy
Smith    Database           XYZ       ATI-Jaffna
Nalin    Operating Systems  Samantha  ATI-Jaffna
Kamal    Operating Systems  ABC       ATI-Kandy
Janith   Database           ABC       ATI-Kandy
Ranil    Operating Systems  Samantha  ATI-Jaffna
Saman    Operating Systems  ABC       ATI-Kandy
Ruwan    Database           XYZ       ATI-Jaffna

{Teacher, Campus} → Course


Normalization (contd.)

2nd Normal Form:

• A relation R is in second normal form (2NF) if no nonprime attribute A in R is partially dependent on any key of R.

TEACHER   CAMPUS      COURSE             ADDRESS
ABC       ATI-Kandy   Database           16 Keppetipola Rd, Kandy
XYZ       ATI-Jaffna  Database           665/2 Beach Rd, Gurunagar
ABC       ATI-Kandy   Operating Systems  16 Keppetipola Rd, Kandy
Samantha  ATI-Jaffna  Operating Systems  665/2 Beach Rd, Gurunagar

Example: Not in 2NF


Normalization (contd.)

• Lossless join decomposition:
– Decomposition of R into X and Y is lossless-join w.r.t. a set of FDs F if, for every instance r that satisfies F:
  π_X(r) ⋈ π_Y(r) = r

Theorem:

This condition holds if the attributes common to X and Y contain a key for either X or Y.
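The theorem gives a test that needs only the functional dependencies, not the data: compute the closure of the common attributes and see whether it covers X or Y. The sketch below is one way to write it. The dependency CAMPUS → ADDRESS is read off the 2NF example, and the dependencies S → P and S → D for the earlier S/P/D example are an assumption chosen to be consistent with the instance shown there.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def common_attrs_contain_key(x, y, fds):
    """True if X ∩ Y is a superkey of X or of Y, i.e. contains a key."""
    c = closure(x & y, fds)
    return x <= c or y <= c


# Decomposition used in the 2NF example (CAMPUS -> ADDRESS).
fds_teach = [({"CAMPUS"}, {"ADDRESS"})]
x = {"TEACHER", "CAMPUS", "COURSE"}
y = {"CAMPUS", "ADDRESS"}
print(common_attrs_contain_key(x, y, fds_teach))  # True

# Decomposition that produced spurious tuples (assumed FDs: S -> P, S -> D).
fds_spd = [({"S"}, {"P"}), ({"S"}, {"D"})]
print(common_attrs_contain_key({"S", "P"}, {"P", "D"}, fds_spd))  # False
```

In the second case the common attribute P determines nothing else, which matches the spurious tuples obtained when R1 and R2 were joined.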


Removing partial dependency

• Place the attributes that create the partial dependency in a separate table.

• Make sure that the new table's primary key is left in the original table.


Normalization (contd.)

TEACHER   CAMPUS      COURSE             ADDRESS
ABC       ATI-Kandy   Database           16 Keppetipola Rd, Kandy
XYZ       ATI-Jaffna  Database           665/2 Beach Rd, Gurunagar
ABC       ATI-Kandy   Operating Systems  16 Keppetipola Rd, Kandy
Samantha  ATI-Jaffna  Operating Systems  665/2 Beach Rd, Gurunagar


Normalization (contd.)

CAMPUS      ADDRESS
ATI-Kandy   16 Keppetipola Rd, Kandy
ATI-Jaffna  665/2 Beach Rd, Gurunagar

TEACHER   CAMPUS      COURSE
ABC       ATI-Kandy   Database
XYZ       ATI-Jaffna  Database
ABC       ATI-Kandy   Operating Systems
Samantha  ATI-Jaffna  Operating Systems

Example: After normalization into 2NF


Another Example

EMP_PROJ (SSN, PNUM, HOURS, ENAME, PNAME, LOC)

[figure: dependency diagram with arrows FD1, FD2 and FD3]


Normalization (contd.)

EMP_PROJ decomposed into EP1, EP2 and EP3:

(SSN, ENAME)
(SSN, PNUM, HOURS)
(PNUM, PNAME, PLOC)
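Partial dependencies can be found by taking the closure of each proper subset of the key and seeing which nonprime attributes it reaches. The sketch below does this for EMP_PROJ. The arrows of the dependency diagram are not available in this transcript, so the three dependencies used here are an assumption inferred from the decomposition (SSN, ENAME), (SSN, PNUM, HOURS), (PNUM, PNAME, PLOC).

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, nonprime, fds):
    """List (key_subset, attribute) pairs where a nonprime attribute is
    determined by a proper subset of the key."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds)
            for attr in sorted(nonprime):
                if attr in determined:
                    found.append((subset, attr))
    return found


# Assumed dependencies for EMP_PROJ (inferred, not stated in the slides).
fds = [
    ({"SSN", "PNUM"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUM"}, {"PNAME", "PLOC"}),
]
key = {"SSN", "PNUM"}
nonprime = {"HOURS", "ENAME", "PNAME", "PLOC"}

violations = partial_dependencies(key, nonprime, fds)
print(violations)
```

HOURS does not appear in the result because it needs the whole key, so it stays with (SSN, PNUM) while the other attributes move to their own relations.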


Normalization (contd.)

3rd Normal Form:

• A relation R is in 3rd normal form (3NF) if
– R is in 2NF, and
– No nonprime attribute is transitively dependent on any key.


Transitive dependency

• Attribute is dependent on another attribute that is not part of the primary key.
• Requires the decomposition of the table containing the transitive dependency.

EMP_DEPT (ENAME, SSN, BDATE, ADD, DNUM, DNAME, DMGR)


Removing transitive dependency

• Place the attributes that create the transitive dependency in a separate table.

• Make sure that the new table's primary key attribute is the foreign key in the original table.


EMP_DEPT (ENAME, SSN, BDATE, ADD, DNUM, DNAME, DMGR)

is decomposed into

(ENAME, SSN, BDATE, ADD, DNUM)
(DNUM, DNAME, DMGR)


Is the table in 3NF? Why?

Original table

(INV_NUM, INV_DATE, INV_AMOUNT, CUS_NUM, CUS_ADDRESS, CUS_PHONE)

Transitive Dependencies

[figure: dependency arrows on the original table]

Remove the Transitive dependency

New Tables

(INV_NUM, INV_DATE, INV_AMOUNT, CUS_NUM)
(CUS_NUM, CUS_ADDRESS, CUS_PHONE)
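One possible way to declare the two new tables is sketched below with Python's built-in sqlite3 module. The table names `invoice` and `customer`, the column types and the sample rows are assumptions made for illustration; only the column names come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE customer (
        cus_num     INTEGER PRIMARY KEY,
        cus_address TEXT,
        cus_phone   TEXT
    );
    CREATE TABLE invoice (
        inv_num    INTEGER PRIMARY KEY,
        inv_date   TEXT,
        inv_amount REAL,
        cus_num    INTEGER NOT NULL REFERENCES customer (cus_num)
    );
    """
)

# Hypothetical sample data: one customer with two invoices.
conn.execute("INSERT INTO customer VALUES (1, '12 Main St', '555-0100')")
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?, ?, ?)",
    [(100, "2016-01-05", 250.0, 1), (101, "2016-01-19", 80.0, 1)],
)

# The address is stored once, so a single-row update reaches every invoice.
conn.execute("UPDATE customer SET cus_address = '34 Lake Rd' WHERE cus_num = 1")
addresses = [
    row[0]
    for row in conn.execute(
        "SELECT c.cus_address FROM invoice i"
        " JOIN customer c ON c.cus_num = i.cus_num ORDER BY i.inv_num"
    )
]

# The foreign key rejects an invoice for a customer that does not exist.
try:
    conn.execute("INSERT INTO invoice VALUES (102, '2016-02-01', 10.0, 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(addresses, fk_rejected)
```

CUS_NUM stays in the invoice table as a foreign key, which is what keeps the decomposition joinable back to the original table.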



Normalization (contd.)

(PROPERTY_ID, COUNTY_NAME, LOT#, AREA, PRICE, TAX_RATE)

Keys: PROPERTY_ID, (COUNTY_NAME, LOT#)

[figure: dependency diagram with arrows FD1, FD2, FD3 and FD4]
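Candidate keys, and from them the prime and nonprime attributes, can be computed from a set of functional dependencies by brute force over attribute subsets. The sketch below does so for the property relation. The arrows FD1 to FD4 are not recoverable from this transcript, so the four dependencies in the code are an assumption, chosen so that they agree with the two keys listed on the slide.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(s, fds) >= attrs:
                keys.append(s)
    return keys


attrs = {"PROPERTY_ID", "COUNTY_NAME", "LOT#", "AREA", "PRICE", "TAX_RATE"}
# Assumed dependencies (hypothetical stand-ins for FD1..FD4).
fds = [
    ({"PROPERTY_ID"}, {"COUNTY_NAME", "LOT#", "AREA", "PRICE", "TAX_RATE"}),
    ({"COUNTY_NAME", "LOT#"}, {"PROPERTY_ID", "AREA", "PRICE", "TAX_RATE"}),
    ({"COUNTY_NAME"}, {"TAX_RATE"}),
    ({"AREA"}, {"PRICE"}),
]

keys = candidate_keys(attrs, fds)
prime = set().union(*keys)
nonprime = attrs - prime

print(keys)
print(sorted(nonprime))
```

Under these assumed dependencies TAX_RATE depends on COUNTY_NAME alone, which is part of a key, and PRICE depends on the nonprime attribute AREA, so the relation would need both a 2NF and a 3NF step.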


Purchase order

PO (PO-NO, PO-DATE, EMP-CODE, SUPPNO, SUPP-NAME)

Supplier name is a non-key field that depends on another non-key field (i.e. the supplier no.), in addition to depending on the key, the purchase order no.


Deals with the relationship between non-key fields

A non-key field cannot be a fact about another non-key field

Supplier

Purchase order


Normalization (contd.)

• A relation can always be decomposed into 1NF, 2NF & 3NF relations in a way that preserves the lossless join property.


Your Turn !!

Dependency diagram

C1  C2  C3  C4  C5

Identify the dependencies shown in the above diagram

C1 → C2: partial dependency
C4 → C5: transitive dependency
C1, C3 → C2, C4, C5: functional dependency


• Create a database whose tables are at least in 2NF, showing the dependency diagrams for each table.

Table 1 (C1, C2)
Primary key: C1
Foreign key: None
Normal form: 3NF

Table 2 (C1, C3, C4, C5)
Primary key: C1 + C3
Foreign key: C1 (to Table 1)
Normal form: 2NF, because the table exhibits the transitive dependency C4 → C5


Create a database whose tables are at least in 3NF

Table 1 (C1, C2)
Primary key: C1
Foreign key: None
Normal form: 3NF

Table 2 (C1, C3, C4)
Primary key: C1 + C3
Foreign key: C1 (to Table 1), C4 (to Table 3)
Normal form: 3NF

Table 3 (C4, C5)
Primary key: C4
Foreign key: None
Normal form: 3NF
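The normal forms claimed in this exercise can be verified with a small 3NF test: for every nontrivial dependency X → A that holds inside a table, either X is a superkey of that table or A is a prime attribute. The sketch below uses the three dependencies identified in the exercise; the function names are illustrative, and the brute-force enumeration of subsets is only practical for small relations like this one.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def projected_fds(attrs, fds):
    """Dependencies X -> A implied by fds with X and A inside attrs."""
    out = []
    for size in range(1, len(attrs)):
        for combo in combinations(sorted(attrs), size):
            x = set(combo)
            for a in (closure(x, fds) & attrs) - x:
                out.append((x, {a}))
    return out


def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure covers attrs."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= attrs:
                keys.append(s)
    return keys


def is_3nf(attrs, fds):
    """3NF test: each X -> A has X a superkey or A a prime attribute."""
    local = projected_fds(attrs, fds)
    prime = set().union(*candidate_keys(attrs, local))
    for x, rhs in local:
        if closure(x, local) >= attrs or rhs <= prime:
            continue
        return False
    return True


fds = [
    ({"C1"}, {"C2"}),
    ({"C4"}, {"C5"}),
    ({"C1", "C3"}, {"C2", "C4", "C5"}),
]

print(is_3nf({"C1", "C2", "C3", "C4", "C5"}, fds))  # original table
print(is_3nf({"C1", "C3", "C4", "C5"}, fds))        # 2NF-only Table 2
print(all(is_3nf(t, fds) for t in ({"C1", "C2"}, {"C1", "C3", "C4"}, {"C4", "C5"})))
```

The first two calls report False and the last reports True, matching the normal forms listed for each table.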


Normalization (contd.)

• Denormalization…

Sometimes, for performance reasons, a database designer may leave a relation in a lower normal form. This process is known as denormalization.


Normalization

• Normalization complete

• Any questions ???


Normalization Flow

UNF
  ↓ Remove repeating groups
1NF
  ↓ Remove partial dependencies
2NF
  ↓ Remove transitive dependencies
3NF
  ↓
More normalized forms


Your Turn! Student Results Table

Course Code  Course Title      Student Code  Student Name  Date of Birth  Tutor Code  Tutor Name  Grade  Result
SYA          Systems Analysis  A2345         Smith         20/08/69       1746        Jones       A      Dist.
                               A7423         Barker        03/04/59       1746        Jones       C      Pass
                               B3472         Green         23/02/70       1330        Jarvis      D      Pass
                               A3472         Harris        17/07/69       1746        Jones       F      Fail
                               B9843         Green         10/11/68       1330        Jarvis      B      Merit
COB          COBOL             A7423         Barker        03/04/69       1520        Hooper      E      Fail
                               A4217         Morris        17/01/68       1520        Hooper      B      Merit
                               B8238         Carter        09/12/69       1520        Hooper      C      Pass
PAS          Pascal            A4217         Morris        17/01/68       1520        Hooper      A      Dist.
                               B9843         Green         10/11/68       1283        Trotter     B      Merit
                               A3393         White         30/09/69       1283        Trotter     E      Fail
                               A4247         Cross         25/12/69       1520        Hooper      C      Pass


• List the functional dependencies and normalise the data to 3NF.


Conclusion

• Quality of the relations derived from ER models is unknown.

• Normalization is a systematic process of either assessing or converting these relations into progressively stricter normal forms.

• Advanced normal forms such as Boyce-Codd normal form (BCNF), 4NF and 5NF exist.