chapter 5 normalization

56
Chapter 5 Normalization Dr. Chuck Lillie CSC 3800 Database Management Systems Time: 10:00 to 11:15 Meeting Days: MF Location: Oxendine 1202 Textbook: Databases Illuminated, Author: Catherine M. Ricardo, 2004, Jones & Bartlett Publishers Spring 2011

Upload: naif

Post on 19-Feb-2016

81 views

Category:

Documents


1 download

DESCRIPTION

CSC 3800 Database Management Systems. Spring 2011. Time: 10:00 to 11:15. Meeting Days: MF. Location: Oxendine 1202. Textbook : Databases Illuminated , Author: Catherine M. Ricardo, 2004, Jones & Bartlett Publishers. Chapter 5 Normalization. Dr. Chuck Lillie. Objectives of Normalization. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Chapter 5 Normalization

Chapter 5Normalization

Dr. Chuck Lillie

CSC 3800 Database Management Systems

Time: 10:00 to 11:15 Meeting Days: MF Location: Oxendine 1202

Textbook: Databases Illuminated, Author: Catherine M. Ricardo, 2004, Jones & Bartlett Publishers

Spring 2011

Page 2: Chapter 5 Normalization

Objectives of Objectives of NormalizationNormalizationDevelop a good description of the

data, its relationships and constraints

Produce a stable set of relations that

Is a faithful model of the enterprise Is highly flexible Reduces redundancy-saves space and

reduces inconsistency in data Is free of update, insertion and deletion

anomalies

Page 3: Chapter 5 Normalization

AnomaliesAnomaliesAn anomaly is an inconsistent,

incomplete, or contradictory state of the database

Insertion anomaly – user is unable to insert a new record when it should be possible to do so

Deletion anomaly – when a record is deleted, other information that is tied to it is also deleted

Update anomaly –a record is updated, but other appearances of the same items are not updated

Page 4: Chapter 5 Normalization

ART103A S1001 Smith F101 MWF9 H221 A ART103A S1010 Burns F101 MWF9 H221 ART103A S1006 Lee F101 MWF9 H221 B CSC201A S1003 Jones F105 TUTHF10 M110 A CSC201A S1006 Lee F105 TUTHF10 M110 C HST205A S1001 Smith F202 MWF11 H221

courseNo stuId stuLastName facId schedule room grade

Figure 5.1 The NewClass Table

Anomaly Examples: NewClass Anomaly Examples: NewClass TableTable

NewClass(courseNo, stuId, stuLastName, fID, schedule, room, grade)

• Update anomaly: If schedule of ART103A is updated in first record, and not in second and third – inconsistent data

• Deletion anomaly: If record of student S1001 is deleted, information about HST205A class is lost also

• Insertion anomaly: It is not possible to add a new class, for MTH101A , even if its teacher, schedule, and room are known, unless there is a student registered for it, because the key contains stuId

Page 5: Chapter 5 Normalization

Normal FormsNormal FormsFirst normal form -1NFSecond normal form-2NFThird normal form-3NFBoyce-Codd normal form-BCNFFourth normal form-4NFFifth normal form-5NFDomain/Key normal form-DKNF

Each is contained within the previous form – each has stricter rules than the previous form

Design Object: put schema in highest normal form that is practical and appropriate for the data in the database

Page 6: Chapter 5 Normalization

Normal Forms (cont.)Normal Forms (cont.)

DKNF

5NF4NF

BCNF3NF2NF1NF

All Relations

Page 7: Chapter 5 Normalization

Functional DependencyFunctional DependencyA functional dependency (FD) is a type of

relationship between attributesIf A and B are sets of attributes of relation R,

say B is functionally dependent on A if each A value in R has associated with it exactly one value of B in R.

Alternatively, if two tuples have the same A values, they must also have the same B values

Write A→B, read A functionally determines B, or B functionally dependent on A

FD is actually a many-to-one relationship between A and B

Page 8: Chapter 5 Normalization

Example of FDsExample of FDs

Let R be NewStudent(stuId, lastName, major, credits, status, socSecNo)FDs in R include{stuId}→{lastName}, but not the reverse{stuId} →{lastName, major, credits, status, socSecNo,

stuId} {socSecNo} →{stuId, lastName, major, credits, status,

socSecNo}{credits}→{status}, but not {status}→{credits}

Stuid lastName major credits status socSecNoS1001 Smith History 90 Senior 100429500S1003 Jones Math 95 Senior 010124567S1006 Lee

CSC 15

Freshman

088520876

S1010

Burns

Art 63

Junior

099320985

S1060 Jones CSC 25 Freshman 064624738

Figure 5.3 NewStudent Table (assume each student has only one mjaor)

Page 9: Chapter 5 Normalization

Trivial Functional Trivial Functional DependencyDependencyThe FD X→Y is trivial if set {Y} is a

subset of set {X}

Examples: If A and B are attributes of R,{A}→{A}{A,B} →{A}{A,B} →{B}{A,B} →{A,B}are all trivial FDs

Page 10: Chapter 5 Normalization

KeysKeysSuperkey – functionally determines all

attributes in a relation◦ Superkeys of NewStudent: {stuId}, {stuId,

lastName}, {stuId, any other attribute}, {socSecNo, any other attribute}

Candidate key - superkey that is a minimal identifier (no extraneous attributes)◦ Candidate keys of NewStudent: {stuId},

{socSecNo}◦ If no two students are permitted to have th

same combination of name and major values, {lastName, major} would be a candidate key

◦ A relation may have several candidate keys

Page 11: Chapter 5 Normalization

Keys (cont)Keys (cont)Primary key - candidate key actually

used to identify tuples in a relation◦ Has no null values◦ Values are unique in database◦ {stuId} is primay key

Choose because some international students may not have social security number, and there are some security issues with social security numbers.

Should also enforce uniqueness and no-null rule for candidate keys

Page 12: Chapter 5 Normalization

First Normal FormFirst Normal FormA relation is in 1NF if and only if

every attribute is single-valued for each tuple◦Each cell of the table has only one

value in it◦Domains of attributes are atomic:

no sets, lists, repeating fields or groups allowed in domains

Page 13: Chapter 5 Normalization

Counter-Example for 1NFCounter-Example for 1NF

NewStu(StuId, lastName, major, credits, status, socSecNo) – Assume students can have more than one major

The major attribute is not single-valued for each tuple

Stuid lastName major credits status socSecNoS1001 Smith History 90 Senior 100429500S1003 Jones Math 95 Senior 010124567S1006 Lee

Math 15

Freshman

088520876 CSC

S1010

Burns

English 63

Junior

099320985Art

S1060 Jones CSC 25 Freshman 064624738Figure 5.4(a) NewStu Table (Assume students can have double majors)

Page 14: Chapter 5 Normalization

Ensuring 1NFEnsuring 1NFBest solution: For each multi-valued

attribute, create a new table, in which you place the key of the original table and the multi-valued attribute. Keep the original table, with its key

Ex. NewStu2(stuId, lastName, credits,status, socSecNo) Majors(stuId, major)Stuid lastName credits status socSecNo

S1001 Smith 90 Senior 100429500 S1003 Jones 95 Senior 010124567 S1006 Lee

15

Freshman

088520876

S1010

Burns

63

Junior

099320985

S1060 Jones 25 Freshman 064624738stuId major

S1001 HistoryS1003 MathS1006 CSCS1006 MathS1010 ArtS1010 EnglishS1060 CSC

Majors

NewStu2

Figure 5.4 (b) NewStu2 Table and Majors Table

Page 15: Chapter 5 Normalization

Another method for 1NFAnother method for 1NF“Flatten” the original table by making

the multi-valued attribute part of the keyStudent(stuId, lastName, major, credits,

status, socSecNo)

Stuid lastName major credits status socSecNoS1001 Smith History 90 Senior 100429500S1003 Jones Math 95 Senior 010124567S1006 Lee

CSC 15

Freshman

088520876

S1006 Lee

Math 15

Freshman

088520876

S1010

Burns

Art 63

Junior

099320985

S1010

Burns

English

63

Junior

099320985

S1060 Jones CSC 25 Freshman 064624738

Figure 5.4(d) NewStu Table Rewritten in 1NF, with{StuId, major} as primary key

Page 16: Chapter 5 Normalization

Yet Another MethodYet Another MethodIf the number of repeats is limited, make additional columns for multiple values

Student(stuId, lastName, major1, major2, credits, status, socSecNo)Stuid lastName Major major2 Credits Status socSecNo

S1001 Smith History 9 0 Senior 100429500S1003 Jones Math 95 Senior 010124567S1006 Lee

CSC

Math 15

Freshman

088520876

S1010

Burns

Art

English 63

Junior

099320985

S1060 Jones CSC 25 Freshman 064624738

Figure 5.4(c) NewStu3 Table with two attributes for major

Page 17: Chapter 5 Normalization

Full Functional Full Functional DependencyDependencyIn relation R, set of attributes B is fully functionally dependent on set of attributes A of R if B is functionally dependent on A but not functionally dependent on any proper subset of A

This means every attribute in A is needed to functionally determine B

Page 18: Chapter 5 Normalization

Partial Functional Dependency Partial Functional Dependency ExampleExample

NewClass(courseNo, stuId, stuLastName, facId, schedule, room, grade)

FDs:{courseNo,stuId} → {lastName}{courseNo,stuId} →{facId}{courseNo,stuId} →{schedule}{courseNo,stuId} →{room}{courseNo,stuId} →{grade}courseNo → facId **partial FDcourseNo → schedule **partial FDcourseNo →room ** partial FDstuId → lastName ** partial FD …plus trivial FDs that are partial…

Page 19: Chapter 5 Normalization

Second Normal FormSecond Normal FormA relation is in second normal form (2NF) if it is in first normal form and all the non-key attributes are fully functionally dependent on the key.

No non-key attribute is FD on just part of the key

If key has only one attribute, and R is 1NF, R is automatically 2NF

Page 20: Chapter 5 Normalization

Converting to 2NFConverting to 2NFIdentify each partial FDRemove the attributes that depend on

each of the determinants so identified Place these determinants in separate

relations along with their dependent attributes

In original relation keep the composite key and any attributes that are fully functionally dependent on all of it

Even if the composite key has no dependent attributes, keep that relation to connect logically the others

Page 21: Chapter 5 Normalization

2NF Example2NF ExampleNewClass(courseNo, stuId, stuLastName, facId, schedule, room,

grade )

FDs grouped by determinant:{courseNo} → {courseNo,facId, schedule, room}{stuId} → {stuId, lastName}{courseNo,stuId} → {courseNo, stuId, facId, schedule, room, lastName, grade}

Create tables grouped by determinants:Course(courseNo,facId, schedule, room)Stu(stuId, lastName)

Keep relation with original composite key, with attributes FD on it, if anyNewStu2( courseNo, stuId, grade)

Page 22: Chapter 5 Normalization

2NF Example2NF ExamplecourseNo

stuId stuLastName

facId schedule

room grade

ART103A S1001

Smith F101 MWF9 H221 A

ART103A S1010

Burns F101 MWF9 H221

ART103A S1006

Lee F101 MWF9 H221 B

CSC201A S1003

Jones F105 TUTHF10 M110 A

CSC201A S1006

Lee F105 TUTHF10 M110 C

HST205A S1001

Smith F202 MWF11 H221

First Normal Form Relation

courseNo

stuId grade

ART103A S1001

A

ART103A S1010

ART103A S1006

B

CSC201A S1003

A

CSC201A S1006

C

HST205A S1001

stuId stuLastName

S1001

Smith

S1010

Burns

S1006

Lee

S1003

Jones

courseNo

facId schedule

room

ART103A F101 MWF9 H221 CSC201A F105 TUTHF10 M110 HST205A F202 MWF11 H221

Register Stu Class2

Second Normal Form Relations

Page 23: Chapter 5 Normalization

Transitive DependencyTransitive DependencyIf A, B, and C are attributes of relation

R, such that A → B, and B → C, then C is transitively dependent on A

Example:NewStudent (stuId, lastName, major, credits, status)FD:credits→status

By transitivity:stuId→credits credits→status implies stuId→status

Transitive dependencies cause update, insertion, deletion anomalies.

Page 24: Chapter 5 Normalization

Third Normal FormThird Normal Form A relation is in third normal form

(3NF) if whenever a non-trivial functional dependency X→A exists, then either X is a superkey or A is a member of some candidate key

To be 3NF, relation must be 2NF and have no transitive dependencies

No non-key attribute determines another non-key attribute. Here key includes “candidate key”

Page 25: Chapter 5 Normalization

Making a relation 3NFMaking a relation 3NFFor example,NewStudent (stuId, lastName, major, credits, status)with FD credits→status

Remove the dependent attribute, status, from the relation

Create a new table with the dependent attribute and its determinant, credits

Keep the determinant in the original table

NewStu2 (stuId, lastName, major, credits)Stats (credits, status)

Page 26: Chapter 5 Normalization

Example Transitive Example Transitive DependencyDependencyStuid lastNam

eMajor Credits Status

S1001 Smith History 90 Senior S1003 Jones Math 95 Senior S1006 Lee

CSC

15

Freshman

S1010

Burns

Art

63

Junior

S1060 Jones CSC 25 Freshman

Stuid lastName

Major Credits

S1001 Smith History 9 0S1003 Jones Math 95S1006 Lee

CSC

15

S1010

Burns

Art

63

S1060 Jones CSC 25

Credits Status15

Freshman

25 Freshman

63

Junior

90 Senior 95 Senior

NewStudent

NewStu2 Stats

Removed Transitive Dependency

Transitive Dependency

Page 27: Chapter 5 Normalization

Boyce-Codd Normal FormBoyce-Codd Normal FormA relation is in Boyce/Codd Normal

Form (BCNF) if whenever a non-trivial functional dependency X→A exists, then X is a superkey

Stricter than 3NF, which allows A to be part of a candidate key

If there is just one single candidate key, the forms are equivalent

Page 28: Chapter 5 Normalization

ExampleExampleNewFac (facName, dept, office, rank, dateHired)

FDs:office → deptfacName,dept → office, rank, dateHired facName,office → dept, rank, dateHired

NewFac is not BCNF because office is not a superkey To make it BCNF, remove the dependent attributes to a new

relation, with the determinant as the key Project intoFac1 (office, dept)Fac2 (facName, office, rank, dateHired)

Note we have lost a functional dependency in Fac2 – no longer able to see that {facName, dept} is a determinant, since they are in different relations

Page 29: Chapter 5 Normalization

Example Boyce-Codd Normal Example Boyce-Codd Normal FormFormfacNam

edept office rank dateHire

dAdams Art A101 Professor 1975Byrne Math M201 Assistant 2000Davis Art A101 Associate 1992Gordon Math M201 Professor 1982Hughes Mth M203 Associate 1990Smith CSC C101 Professor 1980Smith History H102 Associate 1990Tanaka CSC C101 Instructor 2001Vaughn CSC C101 Associate 1995

office

dept

A101 ArtC101 CSCC105 CSCH102 HistoryM201 MathM203 Math

facName

office rank dateHired

Adams A101 Professor 1975Byrne M201 Assistant 2000Davis A101 Associate 1992Gordon M201 Professor 1982Hughes M203 Associate 1990Smith C101 Professor 1980Smith H102 Associate 1990Tanaka C101 Instructor 2001Vaughn C101 Associate 1995

Faculty

Fac1 Fac2

Page 30: Chapter 5 Normalization

Normalization ExampleNormalization ExampleRelation that stores information about projects

in large business◦ Work (projName, projMgr, empId, hours, empName,

budget, startDate, salary, empMgr, empDept, rating)prijName projMgr empId hours empName budget startDate salary empMgr empDept ratingJupiter Smith E101 25 Jones 100000 01/15/04 60000 Levine 10 9

Jupiter Smith E105 40 Adams 100000 01/15/04 55000 Jones 12

Jupiter Smith E110 10 Rivera 100000 01/15/04 43000 Levine 10 8Maxima Lee E101 15 Jones 200000 03/01/04 60000 Levine 10Maxima Lee E110 30 Rivera 200000 03/01/04 43000 Levine 10Maxima Lee E120 15 Tanaka 200000 03/01/04 45000 Jones 15

Page 31: Chapter 5 Normalization

Normalization Example Normalization Example (cont)(cont)1. Each project has a unique name.

2. Although project names are unique, names of employees and managers are not.

3. Each project has one manager, whose name is stored in projMgr.

4. Many employees can be assigned to work on each project, and an employee can be assigned to more than one project. The attribute hours tells the number of hours per week a particular employee is assigned to work on a particular project.

5. budget stores the amount budgeted for a project, and startDate gives the starting date for a project.

6. salary gives the annual salary of an employee.7. empMgr gives the name of the employee’s manager, who might

not be the same as the project manager.8. empDept gives the employee’s department. Department names

are unique. The employee’s manager is the manager of the employee’s department.

9. rating gives the employee’s rating for a particular project. The project manager assigns the rating at the end of the employee’s work on the project.

Page 32: Chapter 5 Normalization

Normalization Example Normalization Example (cont)(cont)

Functional dependencies◦ projName projMgr, budget, startDate◦ empId empName, salary, empMgr, empDept◦ projName, empId hours, rating◦ empDept empMgr◦ empMgr does not functionally determine empDept

since people's names were not unique (different managers may have same name and manage different departments or a manager may manage more than one department

◦ projMgr does not determine projNamePrimary Key

◦ projName, empId since every member depends on that combination

Page 33: Chapter 5 Normalization

Normalization Example Normalization Example (cont)(cont)First Normal Form

◦With the primary key each cell is single valued, Work in 1NF

Second Normal Form◦Pratial dependencies

projName projMgr, budget, startDate empId empName, salary, empMgr, empDept

◦Transform to Proj (projName, projMgr, budget, startDate) Emp (empId, empName, salary, empMgr,

empDept) Work1 (projName, empId, hours, rating)

Page 34: Chapter 5 Normalization

Normalization Example Normalization Example (cont)(cont)

prijName projMgr budget startDateJupiter Smith 100000 01/15/04

Maxima Lee 200000 03/01/04

ProjprijName empId hours ratingJupiter E101 25 9

Jupiter E105 40

Jupiter E110 10 8Maxima E101 15Maxima E110 30Maxima E120 15

Work1

empId empName salary empMgr empDeptE101 Jones 60000 Levine 10

E105 Adams 55000 Jones 12

E110 Rivera 43000 Levine 10E101 Jones 60000 Levine 10E110 Rivera 43000 Levine 10E120 Tanaka 45000 Jones 15

Emp

Second Normal Form

Page 35: Chapter 5 Normalization

Normalization Example Normalization Example (cont)(cont)Third Normal Form

◦Proj in 3NF – no non-key atrribute functionally determines another non-key attribute

◦Work1 in 3NF – no transitive dependency involving hours or rating

◦Emp not in 3NF – transitive dependency empDept empMgr and empDept is not a

superkey, nor is empMgr part of a candidate key

Need two relations Emp1 (empId, empName, salary, empDept) Dep (empDept, empMgr)

Page 36: Chapter 5 Normalization

Normalization Example Normalization Example (cont)(cont)

empDept empMgr10 Levine

12 Jones

15 Jones

DeptempId empName salary empDeptE101 Jones 60000 10

E105 Adams 55000 12

E110 Rivera 43000 10E120 Tanaka 45000 15

Emp1

Third Normal Form

prijName projMgr budget startDateJupiter Smith 100000 01/15/04

Maxima Lee 200000 03/01/04

ProjprijName empId hours ratingJupiter E101 25 9

Jupiter E105 40

Jupiter E110 10 8Maxima E101 15Maxima E110 30Maxima E120 15

Work1

This is also BCNF since the only determinant in each relation is the primary key

Page 37: Chapter 5 Normalization

Properties of Properties of DecompositionsDecompositionsStarting with a universal relation that

contains all the attributes, we can decompose into relations by projection

A decomposition of a relation R is a set of relations {R1,R2,...,Rn} such that each Ri is a subset of R and the union of all of the Ri is R.

Desirable properties of decompositions◦ Attribute preservation - every attribute is in

some relation◦ Dependency preservation - see previous

example◦ Lossless decomposition - discussed later

Page 38: Chapter 5 Normalization

Dependency PreservationDependency PreservationIf R is decomposed into {R1,R2,

…,Rn,} so that for each functional dependency X→Y all the attributes in X Y appear in the same relation, Ri, then all FDs are preserved

Allows DBMS to check each FD constraint by checking just one table for each

Page 39: Chapter 5 Normalization

ExampleExampleNewFac (facName, dept, office, rank, dateHired)

FDs:office → deptfacName,dept → office, rank, dateHired facName,office → dept, rank, dateHired

NewFac is not BCNF because office is not a superkey To make it BCNF, remove the dependent attributes to a new relation,

with the determinant as the key Project intoFac1 (office, dept)Fac2 (facName, office, rank, dateHired)

Note we have lost a functional dependency in Fac2 – no longer able to see that {facName, dept} is a determinant, since they are in different relations

Sometimes more important to maintain functional dependencies that it is to get the relation in BCNF

Page 40: Chapter 5 Normalization

Multi-valued DependencyMulti-valued DependencyIn R(A,B,C) if each A values has

associated with it a set of B values and a set of C values such that the B and C values are independent of each other, then A multi-determines B and A multi-determines C

Multi-valued dependencies occur in pairsExample: JointAppoint(facId, dept,

committee) assuming a faculty member can belong to more than one department and belong to more than one committee

Table must list all combinations of values of department and committee for each facId

Page 41: Chapter 5 Normalization

4NF4NFA table is 4NF if it is BCNF and has no multi-

valued dependenciesExample: remove MVDs in JointAppoint

◦ A faculty member can be a member of more than one department

◦ A faculty member can be a member of more than one committee

◦ facId > dept because the set of dept associated with facId is independent of the set of committee associated with facId (the faculty member’s department is independent of the faculty member’s committee)

The new relationsAppoint1(facId,dept)Appoint2(facId,committee)

Page 42: Chapter 5 Normalization

Lossless DecompositionLossless DecompositionA decomposition of R into {R1,

R2,....,Rn} is lossless if the natural join of R1, R2,...,Rn produces exactly the relation R

No spurious tuples are created when the projections are joined.

Always possible to find a BCNF decomposition that is lossless

Page 43: Chapter 5 Normalization

Example of Lossy Example of Lossy ProjectionProjectionOriginal EmpRoleProj table:

EmpName role projNameSmith designer NileSmith programmer AmazonSmith designer AmazonJones designer Amazon

Project into Table a Table bEmpName role role projNameSmith designer designer NileSmith programmer programmer AmazonJones designer designer Amazon

Joining Table a and Table b producesEmpName role projNameSmith designer NileSmith designer AmazonSmith programmer AmazonJones designer Nile spurious tupleJones designer Amazon

Page 44: Chapter 5 Normalization

Lossless ProjectionsLossless ProjectionsLossless property guaranteed if for each pair of

relations that will be joined, the set of common attributes is a superkey of one of the relations

Binary decomposition of R into {R1,R2} lossless iff one of these holds

R1 ∩ R2 → R1 - R2or R1 ∩ R2 → R2 - R1

For n relations in decomposition, test by general Algorithm to Test for Lossless Join, Section 5.6.3

If projection is done by successive binary projections, can apply binary decomposition test repeatedly

Page 45: Chapter 5 Normalization

Algorithm to Test for Algorithm to Test for Lossless JoinLossless Join Given a relation R(A1,A2,…An), a set of functional dependencies, F, and a decomposition of R into Relations R1, R2, …Rm, to determine whether the decomposition has a lossless join◦ Construct an m by n table, S, with a column for each of the n

attributes in R and a row for each of the m relations in the decomposition

◦ For each cell S(I,j) of S, If the attribute for the column, Aj, is in the relation for the row, Ri, then

place the symbol a(j) in the cell else place the symbol b(I,j)there◦ Repeat the following process until no more changes can be

made to Sc for each FD X Y in F For all rows in S that have the same symbols in the columns

corresponding to the attributes of X, make the symbols for the columns that represent attributes of Y equal by the following rule: If any row has an a value,. A(j), then set the value of that column in all the other rows equal to

a(j) If no row ahs an a value, then pick any one of the b values, say b(I,j), and set all the other rows

equal to b(I,j)

◦ If, after all possible changes have been made to S, a row is made up entirely of a symbols, a(1, a(2, …,a(n), then the join is lossless. If there is no such row, the join is lossy.

Page 46: Chapter 5 Normalization

5NF and DKNF5NF and DKNFA relation is 5NF if there are no

remaining non-trivial lossless projections

A relation is in Domain-Key Normal Form (DKNF) if every constraint is a logical consequence of domain constraints or key constraints◦Every constraint means the normal

constraints (functional dependencies, etc.) as well as general constraints (specific to the data, e.g., first digit of stuId indicate student status (1 for <30 hours, 2 for 30 to 59, …))

Page 47: Chapter 5 Normalization

De-normalizationDe-normalizationWhen to stop the normalization

process◦When applications require too many

joins◦When you cannot get a non-loss

decomposition that preserves dependencies

Page 48: Chapter 5 Normalization

Inference Rules for FDsInference Rules for FDsArmstrong’s Axioms

◦ Reflexivity If B is a subset of A, then A → B..◦ Augmentation If A → B, then AC → BC.◦ Transitivity If A → B and B → C, then A → C

Additional rules that follow:◦ Additivity If A → B and A → C, then A → BC◦ Projectivity If A → BC, then A → B and A → C◦ Pseudotransitivity If A → B and CB → D, then

AC → D

Page 49: Chapter 5 Normalization

Closure of Set of FDsClosure of Set of FDsIf F is a set of functional

dependencies for a relation R, then the set of all functional dependencies that can be derived from F, F+, is called the closure of F

Could compute closure by applying Armstrong’s Axioms repeatedly

Very complex and take time

Page 50: Chapter 5 Normalization

Closure of an AttributeClosure of an AttributeIf A is an attribute or set of attributes of

relation R, all the attributes in R that are functionally dependent on A in R form the closure of A, A+

Computed by Closure Algorithm for A, Section 5.7.3result ← A;while (result changes) do

for each functional dependency B → C in F if B is contained in result then result ←

result C;end;A+ ← result;

Page 51: Chapter 5 Normalization

Uses of Attribute ClosureUses of Attribute ClosureCan determine if A is a superkey-

if every attribute in R functionally dependent on A

Can determine whether a given FD X→Y is in the closure of the set of FDs. (Find X+, see if it includes Y)

Page 52: Chapter 5 Normalization

Redundant FDs and Redundant FDs and CoversCovers

Given a set of FDs, can determine if any of them is redundant, i.e. can be derived from the remaining FDs, by a simple algorithm – see Section 5.7.4

If a relation R has two sets of FDs, F and G◦then F is a cover for G if every FD in G

is also in F+

◦F and G are equivalent if F is a cover for G and G is a cover for F (i.e. F+ = G+)

Page 53: Chapter 5 Normalization

Minimal Set of FDsMinimal Set of FDsSet of FDs, F is minimal if

◦The right side of every FD in F has a single attribute (called standard or canonical form)

◦No attribute in the left side of any FD is extraneous

◦F has no redundant FDs

Page 54: Chapter 5 Normalization

Minimal Cover for Set of Minimal Cover for Set of FDsFDsA minimal cover for a set of FDs

is a cover such that no proper subset of itself is also a cover

A set of FDs may have several minimal covers

See Algorithm for Finding a Minimal Cover, Section 5.7.7

Page 55: Chapter 5 Normalization

Decomposition Algorithm for Decomposition Algorithm for BCNFBCNFCan always find a lossless

decomposition that is BCNF◦ Find a FD that is a violation of BCNF and

remove it by decomposition process◦ Repeat this process until all violations are

removed◦ See algorithm, Section 5.7.8

No need to go through 1NF, 2NF, 3NF process

Not always possible to preserve all FDs

Page 56: Chapter 5 Normalization

Synthesis Algorithm for Synthesis Algorithm for 3NF3NFCan always find 3NF decomposition

that is lossless and that preserves all FDs

3NF Algorithm uses synthesis◦Begin with universal relation and set of

FDs,G◦Find a minimal cover for G◦Combine FDs that have the same

determinant◦Include a relation with a key of R◦See algorithm, Section 5.7.7