chapter 12 further normalization i 1nf, 2nf, 3nf, bcnf

29
Chapter 12 Further Normalization I 1NF, 2NF, 3NF, BCNF

Upload: jeremy-mitchell

Post on 02-Jan-2016

346 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Chapter 12 Further Normalization I 1NF, 2NF, 3NF, BCNF

Chapter 12

Further Normalization I

1NF, 2NF, 3NF, BCNF

Page 2: Chapter 12 Further Normalization I 1NF, 2NF, 3NF, BCNF

Topics in this Chapter

• Nonloss Decomposition and Functional Dependencies

• First, Second, and Third Normal Forms• Dependency Preservation• Boyce/Codd Normal Form• A Note on Relation-Valued Attributes

Page 3: Chapter 12 Further Normalization I 1NF, 2NF, 3NF, BCNF

Normalization and Database Design 

• The “normal forms represent stages in achieving a more desirable design.

 (“More desirable” means being more robust, having greater integrity.)

 • First normal form ( 1NF ) is what we

achieved by specifying that relations contain single valued attributes only (each tuple has exactly one value for each attribute).

 • So, relations are always in (at least) 1NF. 

Page 4: Chapter 12 Further Normalization I 1NF, 2NF, 3NF, BCNF

Normalization and Database Design 

• Additional constraints that produce “further normalization” lead to one of the other designations ( 2NF, 3NF, etc.)

  • Each “higher” normal form (2nd, 3rd, etc.)

includes the previous ones—i.e., to be in “third normal form” means that the data is also in 2nd and in 1st.

Page 5: Chapter 12 Further Normalization I 1NF, 2NF, 3NF, BCNF

Normalization

• Normalized and 1 NF are the same thing; Frequently “normalized” is used to refer (incorrectly) to 3NF

• Normalization helps control redundancy• Normalization is reversible; i.e. nonloss,

or information preserving• Six normal forms are discussed: 1

through 5, and Boyce-Codd Normal Form (BCNF), which is an improvement on 3NF

Page 6: Chapter 12 Further Normalization I 1NF, 2NF, 3NF, BCNF

First Normal Form

• A relvar is in 1NF if and only if in every legal value of that relvar, every tuple contains exactly one value for each attribute

• In this way, relvars are always in 1NF• A relvar in 1NF may display functional

dependencies other than those emanating from the primary key

• Such non-primary-key dependencies promote a miasma of update anomolies

Page 7: Chapter 12 Further Normalization I 1NF, 2NF, 3NF, BCNF

S+------+-------+--------+--------+| snum | sname | status | city |+------+-------+--------+--------+| S1 | Smith | 20 | London || S2 | Jones | 10 | Paris || S3 | Blake | 30 | Paris || S4 | Clark | 20 | London || S5 | Adams | 30 | Athens |+------+-------+--------+--------+

SP+------+------+------+| snum | pnum | qty |+------+------+------+| S1 | P1 | 300 || S1 | P2 | 200 || S1 | P3 | 400 || S1 | P4 | 200 || S1 | P5 | 100 || S1 | P6 | 100 || S2 | P1 | 300 || S2 | P2 | 400 || S3 | P2 | 200 || S4 | P2 | 200 || S4 | P4 | 300 || S4 | P5 | 400 |+------+------+------+

P+------+-------+-------+--------+--------+| pnum | pname | color | weight | city |+------+-------+-------+--------+--------+| P1 | Nut | Red | 12.0 | London || P2 | Bolt | Green | 17.0 | Paris || P3 | Screw | Blue | 17.0 | Rome || P4 | Screw | Red | 14.0 | London || P5 | Cam | Blue | 12.0 | Paris || P6 | Cog | Red | 19.0 | London |+------+-------+-------+--------+--------+

The suppliers and parts database

Just “looks” right

--because it is.

Satisfies all normal forms.

Page 8: Chapter 12 Further Normalization I 1NF, 2NF, 3NF, BCNF

+------+--------+------+------+| snum | scity | pnum | qty |+------+--------+------+------+| S1 | London | P1 | 300 || S1 | London | P2 | 200 || S2 | Paris | P1 | 300 || S2 | Paris | P2 | 400 || S3 | Paris | P2 | 200 || S4 | London | P2 | 200 || S4 | London | P4 | 300 || S4 | London | P5 | 400 |+------+--------+------+------+

The table “SCP”

recording supplier city in SCP rather than in S

redundancy!

update problems:

how to change S4’s city (in three places)

how to record the city of a new supplier for whom there are no shipments?primary key

Page 9: Chapter 12 Further Normalization I 1NF, 2NF, 3NF, BCNF

Second Normal Form

• A relation violates 2NF if a non-key field is a fact about a subset of a key.

 

 • A relation satisfies 2NF (is in 2NF) if it is in

1NF and every non-key attribute is irreducibly dependent on the primary key.

(i.e., dependent on the entire primary key)

Page 10: Chapter 12 Further Normalization I 1NF, 2NF, 3NF, BCNF

Second Normal Form

• A relvar is in 2NF if and only if it is in 1NF and every nonkey attribute is irreducibly dependent on the primary key

• (Assumes only one candidate key)• A relvar in 2NF is less susceptible to

update anomalies, but may still exhibit transitive dependencies

• Both attributes in a transitive dependency are irreducibly implied by the primary key, and each implies the other

Page 11: Chapter 12 Further Normalization I 1NF, 2NF, 3NF, BCNF

+--------+-----------+--------+------------+| Emp_Id | Emp_Name | Dept# | DeptName | +--------+-----------+--------+------------|| A001 | Johnson | 10 | Accounting | | A023 | Chung | 10 | Accounting | | C085 | Allen | 10 | Accounting || B120 | Gomez | 20 | Sales || B211 | Davis | 20 | Sales || A227 | Greenberg | 40 | Production || C340 | Brown | 40 | Production || C389 | Lopez | 40 | Production || C395 | Clark | 40 | Production || A502 | Edwards | 20 | Sales || A616 | Scott | 40 | Production || A700 | Sanyo | 60 | Delivery || A722 | Adams | 20 | Sales |+--------+-----------+--------+------------+

The table “Employees”

In 1NF, but not good

REDUNDANCY!

And

update problems:

change name of a department? (multiple updates required)

eliminate employee Sanyo? (what is the name of Dept 60?)

Page 12: Chapter 12 Further Normalization I 1NF, 2NF, 3NF, BCNF

Update Anomalies

• “Update anomalies” include three operations:• An INSERT anomaly occurs when the user

wishes to record a subordinate fact that is not dependent on the primary key (e.g., recording a supplier location before the supplier supplies a part)

• A DELETE anomaly, conversely, may delete the location inadvertently

• An UPDATE anomaly occurs when many updates are required to record a simple fact

Page 13: Chapter 12 Further Normalization I 1NF, 2NF, 3NF, BCNF

+--------+-----------+--------+------------+| Emp_Id | Emp_Name | Dept# | DeptName | +--------+-----------+--------+------------|| A001 | Johnson | 10 | Accounting | | A023 | Chung | 10 | Accounting | | C085 | Allen | 10 | Accounting || B120 | Gomez | 20 | Sales || B211 | Davis | 20 | Sales || A227 | Greenberg | 40 | Production || C340 | Brown | 40 | Production || C389 | Lopez | 40 | Production || C395 | Clark | 40 | Production || A502 | Edwards | 20 | Sales || A616 | Scott | 40 | Production || A700 | Sanyo | 60 | Delivery || A722 | Adams | 20 | Sales |+--------+-----------+--------+------------+

The table “Employees”

a transitive dependency

Emp_Id Dept#

Dept# DeptName

Emp_Id transitively determines DeptName

Emp_Id DeptName

Page 14: Chapter 12 Further Normalization I 1NF, 2NF, 3NF, BCNF

Mutually independent keys

Non-key attributes are “mutually independent” if no such key is functionally dependent on any combination of the others (assuming only one candidate key).

 

Mutually independent => no transitive dependencies, such as

 

Emp# → Dept# Dept# → DeptName

 

Page 15: Chapter 12 Further Normalization I 1NF, 2NF, 3NF, BCNF

Third Normal Form

• A relation violates 3NF if some non-key attribute is a fact about another non-key attribute.

 • A relation is in 3NF if it is in 2NF and the non-key

attributes are mutually independent.

 • A relation satisfies 3NF if it is in 2NF (and

therefore also in 1NF) and every attribute is either part of the key or provides a fact about the key (all of it) and nothing else.

Page 16: Chapter 12 Further Normalization I 1NF, 2NF, 3NF, BCNF

Third Normal Form

• A relvar is in 3NF if and only if it is in 2NF and every nonkey attribute is nontransitively dependent on the primary key

• (Assumes only one candidate key)• The process of normalization is a series of

projections that eliminate complex functional dependencies

• Such projections must be able to be recombined via JOIN to form the original relvar

Page 17: Chapter 12 Further Normalization I 1NF, 2NF, 3NF, BCNF

Third Normal Form

 

A table is in 3NF if every column is either the key, or part of the key, or a fact about

 

the key, the whole key, and nothing but the key.

 

Page 18: Chapter 12 Further Normalization I 1NF, 2NF, 3NF, BCNF

Third Normal Form

• A relvar is in 3NF if and only if the nonkey attributes are both mutually independent and irreducibly dependent on the primary key

• A relvar is in 3NF if and only if, for all time, each tuple consists of a primary key value that identifies some entity, together with a set of zero or more mutually independent attribute values that describe that entity in some way

Page 19: Chapter 12 Further Normalization I 1NF, 2NF, 3NF, BCNF

Nonloss Decomposition and Functional Dependencies

• Normalization uses a process of projection to decompose relvars

• Recomposition is a process of joins• The decomposition of relvar R into

projections R1…Rn is nonloss if R = the join of R1…Rn

• The normalization procedure can be seen as a method for eliminating functional dependencies that do not emanate from a candidate key

Page 20: Chapter 12 Further Normalization I 1NF, 2NF, 3NF, BCNF

+------+--------+------+------+| snum | scity | pnum | qty |+------+--------+------+------+| S1 | London | P1 | 300 || S1 | London | P2 | 200 || S2 | Paris | P1 | 300 || S2 | Paris | P2 | 400 || S3 | Paris | P2 | 200 || S4 | London | P2 | 200 || S4 | London | P4 | 300 || S4 | London | P5 | 400 |+------+--------+------+------+

“SP”+------+--------+| snum | scity |+------+--------+

+------+------+------+| snum | pnum | qty |+------+------+------+

decompose by projection

the decomposition is lossless since a join of the two tables reproduces the original

The table “SCP”

“S”

Page 21: Chapter 12 Further Normalization I 1NF, 2NF, 3NF, BCNF

+--------+-----------+--------+------------+| Emp_Id | Emp_Name | Dept# | DeptName | +--------+-----------+--------+------------|| A001 | Johnson | 10 | Accounting | | A023 | Chung | 10 | Accounting | | C085 | Allen | 10 | Accounting || B120 | Gomez | 20 | Sales || B211 | Davis | 20 | Sales || A227 | Greenberg | 40 | Production || C340 | Brown | 40 | Production || C389 | Lopez | 40 | Production || C395 | Clark | 40 | Production || A502 | Edwards | 20 | Sales || A616 | Scott | 40 | Production || A700 | Sanyo | 60 | Delivery |+--------+-----------+--------+------------+

“Employees”

+--------+-----------+--------+| Emp_Id | Emp_Name | Dept# | +--------+-----------+--------+

+--------+------------+| Dept# | DeptName | +--------+------------|

decompose by projection

the decomposition is lossless since a join of the two tables reproduces the original

“Departments”“Employees”

Page 22: Chapter 12 Further Normalization I 1NF, 2NF, 3NF, BCNF

Dependency Preservation

• Dependency preservation refers to a specific case of nonloss decomposition, such that the normalized relvars are independent of each other

• Some nonloss decompositions do not exhibit dependency preservation

• Example: decompose supplier, city, status where supplier implies city and status, and city and status imply each other

Page 23: Chapter 12 Further Normalization I 1NF, 2NF, 3NF, BCNF

Dependency Preservation

• Dependency is preserved in this projection:

• SC {S#, CITY}• CS {CITY, STATUS}• Dependency is not preserved in this one:• SC {S#, CITY}• CS {S#, STATUS}• Although the second is nonloss, you still

cannot update them independently

Page 24: Chapter 12 Further Normalization I 1NF, 2NF, 3NF, BCNF

+------+--------+------+------+| snum | sname | pnum | qty |+------+--------+------+------+| S1 | Smith | P1 | 300 || S1 | Smith | P2 | 200 || S2 | Jones | P1 | 300 || S2 | Jones | P2 | 400 || S3 | Blake | P2 | 200 || S4 | Clark | P2 | 200 || S4 | Clark | P4 | 300 || S4 | Clark | P5 | 400 |+------+--------+------+------+

The table “SSP”

again, assume unique supplier names

+------+------+| snum | pnum | +------+------++--------+------+| sname | pnum |+--------+------+

candidate key:

candidate key:

qty

qty

snum snamebut:

obviously bad (redundancy, etc.) but satisfies 3NF:

every attribute is key, part of the key, or about key, whole key, nothing but key

violates BCNF

Page 25: Chapter 12 Further Normalization I 1NF, 2NF, 3NF, BCNF

Boyce/Codd Normal Form

• BCNF refers to decompositions involving relvars with more than one candidate key, where the candidate keys are composite and overlapping

• A relvar is in BCNF if and only if every nontrivial, left- irreducible FD has a candidate key as its determinant

• That is, a relvar is in BCNF if and only if every determinant is a candidate key

Page 26: Chapter 12 Further Normalization I 1NF, 2NF, 3NF, BCNF

+------+--------+------+------+| snum | sname | pnum | qty |+------+--------+------+------+| S1 | Smith | P1 | 300 || S1 | Smith | P2 | 200 || S2 | Jones | P1 | 300 || S2 | Jones | P2 | 400 || S3 | Blake | P2 | 200 || S4 | Clark | P2 | 200 || S4 | Clark | P4 | 300 || S4 | Clark | P5 | 400 |+------+--------+------+------+

“SP”+------+--------+| snum | sname |+------+--------+

+------+------+------+| snum | pnum | qty |+------+------+------+

decompose by projection

the decomposition is lossless since a join of the two tables reproduces the original

The table “SSP”

“S”

again, assume unique supplier names

Page 27: Chapter 12 Further Normalization I 1NF, 2NF, 3NF, BCNF

+--------+-----------+--------+------------+| Emp_Id | Emp_Name | Skill | Language | +--------+-----------+--------+------------|| A001 | Johnson | Cook | English | | A001 | Johnson | Cook | French | | A001 | Johnson | Cook | Spanish | | A001 | Johnson | Type | English | | A001 | Johnson | Type | French | | A001 | Johnson | Type | Spanish | | B211 | Davis | Weld | English || B211 | Davis | Weld | German | | B211 | Davis | Type | English | | B211 | Davis | Type | German |

etc.

+--------+-----------+--------+------------+

“Employees”

In BCNF, but lots of redundancy (violates 4NF)

(multi-valued dependencies)

again, the solution is projection--a skills table and a language table

Page 28: Chapter 12 Further Normalization I 1NF, 2NF, 3NF, BCNF

The Normalization Process

• The “normal forms” are simply formalisms for describing problems that usually are apparent and that can cause obvious problems.

• They are usually apparent in the form of redundancies, and common sense says to remove them.

• Removal is a process of projecting the offending (proposed) table into two or more tables (in a lossless way).

Page 29: Chapter 12 Further Normalization I 1NF, 2NF, 3NF, BCNF

Relation-Valued Attributes

• A relation may include attributes whose values are relations

• Traditionally this would be seen to violate 1NF, which was held to prohibit repeating groups

• Now they are theoretically sound, but in practice you should avoid them because they have complicated predicates