chapter 8: normal forms based on functional dependencies data modeling and database design

70
Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Upload: nancy-lisby

Post on 14-Dec-2015

227 views

Category:

Documents


4 download

TRANSCRIPT

Page 1: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8:Normal Forms Based on Functional

Dependencies

Data Modeling and Database Design

Page 2: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 2

Normalization

• Normalization is a technique that facilitates systematic validation of the participation of attributes in a relation schema from a perspective of data redundancy.

• Normal Forms (NFs) provide a stepwise progression towards attaining the goal of a fully normalized relation schema that is guaranteed to be free of data redundancies that cause update anomalies from a functional dependency perspective.

Page 3: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 3

Desirable Versus Undesirable FDs

• Desirable FDs in a relation schema R are those where the determinant is a candidate key of R.

• Undesirable FDs in a relation schema R are those where the determinant is not a candidate key of R.– That is, the FDs will cause data redundancy and

the consequent modification anomalies in R.

Page 4: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 4

Normal Forms: An Overview

• A relation schema is said to be in a particular normal form if it satisfies certain prescribed criteria for that normal form.

• First normal form (1NF) reflects one of the

properties of a relation schema – i.e., by definition a relation schema is in 1NF.

• The normal forms associated with functional dependencies are second (2NF), third (3NF), and Boyce-Codd (BCNF) normal forms.

Page 5: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Normal Forms: An Overview (continued)

• The violations of each of these normal forms signal the presence of a specific type of ‘undesirable’ FD.– violation of a normal form, can be interpreted

as equivalent to an inadvertent mixing up of entity types belonging to two different entity classes in a single entity type.

Page 6: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 6

Normal Forms: The History

• E.F. Codd first proposed the 1NF, 2NF, and 3NF in 1972.

• Later it was discovered that under certain conditions (i.e., FDs) a relation schema in 3NF continues to have data redundancies causing modification anomalies.

• A revised, stronger definition of the 3NF was then proposed by Boyce and Codd in 1974, which came to be known as Boyce-Codd normal form (BCNF).

Page 7: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 7

First Normal Form (1NF)

• First normal form (1NF) imposes conditions so that a base relation that is physically stored as a file does not contain records with a variable number of fields.

• This is accomplished by prohibiting multi-valued attributes, composite attributes, and combinations thereof in a relation schema.

• Such a constraint, in effect, prevents relations from containing other relations.

• In essence, 1NF, by definition, requires that the domain of an attribute must include only atomic values and that the value of an attribute in a relation’s tuple must be a single value from the domain of that attribute.

Page 8: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 8

ALBUM

Album_no Artist_nm Price Stock

BS123 Britney Spears 17.95 1000 JT111 Justin Timberlake 17.95 1200

BTL007 {John Lennon, Paul McCartney, George Harrison, Ringo Star} 23.95 MJ100 Michael Jackson 17.95 JM456 John Mayer 16.95 1000 JM151 John Mayer 16.95 1000 MX789 Madonna 11.95 500 DJM237 {John Denver, Michael Jackson, Madonna} 11.95 2000 DR711 Diana Ross 12.95 1000 PM137 Paul McCartney 19.95

1NF Violation – An Example

Note 1: Album_no is the primary key of ALBUM

Note 2: Artist_nm is a multi-valued attribute causing a first normal form violation.

Page 9: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 9

NEW_ALBUM Album_no Artist_nm Price Stock

BS123 Britney Spears 17.95 1000 JT111 Justin Timberlake 17.95 1200

BTL007 John Lennon 23.95 BTL007 Paul McCartney 23.95 BTL007 George Harrison 23.95 BTL007 Ringo Star 23.95 MJ100 Michael Jackson 17.95 JM456 John Mayer 16.95 1000 JM151 John Mayer 16.95 1000 MX789 Madonna 11.95 500 DJM237 John Denver 11.95 2000 DJM237 Michael Jackson 11.95 2000 DJM237 Madonna 11.95 2000 DR711 Diana Ross 12.95 1000 PM137 Paul McCartney 19.95

Note 1: Artist_nm is no longer a multi-valued attribute Note 2: {Album_no, Artist_nm} is the primary key of ALBUM thus rendering Artist_nm a single-valued attribute and achieving 1NF in NEW_ALBUM.

Resolution of 1NF Violation

Page 10: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 10

Second Normal Form (2NF)

• A relation schema R is in 2NF if every non-prime attribute in R is fully functionally dependent on the primary key of R.

• This means a non-prime attribute is not functionally dependent on a proper subset of the primary key of R.

• The Second Normal Form (2NF) is based on a concept known as full functional dependency.

• A functional dependency of the form Z A is a ‘full functional dependency’ if and only if no proper subset of Z functionally determines A.

• In other words, if Z A and X A, and X is a proper subset of Z, then Z does not fully functionally determine A, i.e., Z A is not a full functional dependency; it is a partial dependency.

Page 11: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 11

Attribute APrimary

Key

Partial dependency (Violation of 2NF in R1)

Attribute ZAttribute Y

R1

Violation of 2NF

Note: The primary key and the attributes A, Y, and Z can be atomic or composite. A is a prime attribute, whereas Y and Z are non-prime attributes. In order for a partial dependency to exist here, Attribute A must be a proper subset of the primary key of R1.

Page 12: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 12

NEW_ALBUM Album_no Artist_nm Price Stock

BS123 Britney Spears 17.95 1000 JT111 Justin Timberlake 17.95 1200

BTL007 John Lennon 23.95 BTL007 Paul McCartney 23.95 BTL007 George Harrison 23.95 BTL007 Ringo Star 23.95 MJ100 Michael Jackson 17.95 JM456 John Mayer 16.95 1000 JM151 John Mayer 16.95 1000 MX789 Madonna 11.95 500 DJM237 John Denver 11.95 2000 DJM237 Michael Jackson 11.95 2000 DJM237 Madonna 11.95 2000 DR711 Diana Ross 12.95 1000 PM137 Paul McCartney 19.95

2NF Violation - An Example

What is the cause of the 2NF violation in NEW_ALBUM?

Page 13: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 13

2NF Violation Explained

• F: fd1: Album_no Price; fd2: Album_no Stock

• F+: F;fd12: Album_no {Price, Stock};fd12x: {Album_no, Artist_nm} {Price, Stock}

• Fc = F• Candidate Key of NEW_ALBUM:

(Album_no, Artist_nm);• Primary Key:

(Album_no, Artist_nm)• fd1 and fd2 (fd12) violate 2NF in NEW_ALBUM

Page 14: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 14

Modification Anomalies Resulting from 2NF Violation

• Suppose we want to change the value of Price or Stock of Album_no BTL007 in NEW_ALBUM

– Multipe tuples require update and failure to update some will change the semantics of the scenario

update anomaly

• Suppose we want to add a new tuple (Album_no: XY111, Price: 17.95, and Stock: 100) to NEW_ALBUM

– Unless we also know the artist(s) the insert is not possible since Artist_nm is part of the primary key of NEW_ALBUM

insertion anomaly

• Suppose we want to delete Album_no BTL007 from NEW_ALBUM

– Entails deletion of multiple tuples deletion anomaly

Page 15: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 15

Resolution of 2NF Violation

• The resolution of 2NF violation is a two-step process that decomposes the target relation schema with the undesirable FDs into multiple relation schemas such that the undesirable FDs are rendered desirable.

1. Pull out the undesirable FD(s) from the target relation schema as separate relation schema(s).

2. Retain the determinant of the pulled-out relation schema as an attribute(s) in the leftover target relation schema to facilitate reconstruction of the original target relation schema.

Page 16: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 16

ALBUM_INFO ALBUM_ARTIST Album_no Price Stock Album_no Artist_nm

BS123 17.95 1000 BS123 Britney Spears JT111 17.95 1200 JT111 Justin Timberlake

BTL007 23.95 BTL007 John Lennon MJ100 17.95 BTL007 Paul McCartney JM456 16.95 1000 BTL007 George Harrison JM151 16.95 1000 BTL007 Ringo Star MX789 11.95 500 MJ100 Michael Jackson DJM237 11.95 2000 JM456 John Mayer DR711 12.95 1000 JM151 John Mayer PM137 19.95 MX789 Madonna

DJM237 John Denver DJM237 Michael Jackson DJM237 Madonna DR711 Diana Ross PM137 Paul McCartney

Resolution of 2NF Violation Demonstrated

Note: Album_no Price Album_no Stock

Note: No non-trivial FD present

Page 17: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 17

Third Normal Form – 3NF

• A relation schema R is in 3NF if it is in 2NF and no non-prime attribute is functionally dependent on another non-prime attribute in R.

• The Third Normal Form (3NF) is based on the concept of transitive dependency.

– Given a relation schema R (X, A, B) where • X, A, and B are pair-wise disjoint atomic or

composite attributes, • X is the primary key of R, and • A and B are non-prime attributesIf A B (or B A) in R, then B (or A) is said to

be ‘transitively dependent’ on X, the primary key of R.

Page 18: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 18

Attribute XPrimary

Key

Transitive dependency (Violation of 3NF in R2)

Attribute ZAttribute Y

R2

Violation of 3NF

Note: X, Y, and Z can be atomic or composite attributes. Each is a non-prime attribute.

Page 19: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 19

Transitive dependency (Violation of 3NF in R3)

Attribute YAttribute A Attribute ZPrimary Key

Attribute X

R3

Another Possible Violation of 3NF

Note: The primary key and the attributes A, X, Y, and Z can be atomic or composite. A is a prime attribute, whereas X, Y, and Z are non-prime attributes. The fact that the non-prime attribute Y is functionally dependent on the non-prime attribute X constitutes the third normal form violation.

Page 20: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 20

FLIGHT Flight# Origin Destination Mileage DL507 Seattle Denver 1537 DL123 Chicago Dallas 1058 DL723 Boston St. Louis 1214 DL577 Denver Los Angeles 1100

DL5219 Minneapolis St. Louis 580 DL357 Chicago Dallas 1058 DL555 Denver Houston 1100

DL5237 Cleveland St. Louis 580 DL5271 Chicago Cleveland 300

3NF Violation – An Example

fd1: Flight# Origin;fd2: Flight# Destination;fd3: (Origin, Destination) Mileage

Question: What is it about the following fds that allows FLIGHT to be free of a second normal form violation but contain a third normal form violation?

Page 21: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 21

3NF Violation Explained

Given R (Flight#, Origin, Destination) and F prevailing over R where

• F: fd1: Flight# Origin;fd2: Flight# Destination;fd3: {Origin, Destination} Mileage

• F+: F;fd12: Flight# {Origin, Destination};fd3x: Flight# Mileage; fd123: Flight# {Origin, Destination, Mileage}

• Fc = F• Candidate Key of FLIGHT: Flight#• Primary Key of FLIGHT:Flight#• A 2NF violation requires presence of partial dependency

– Note: when the primary key is an atomic attribute, a partial dependency is impossible.

• fd3 violates 3NF in FLIGHT.

Page 22: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

• Suppose we want to add the information

(Origin: Cincinnati, Destination: Houston, and Mileage: 1100) to FLIGHT

– It is not possible to do so without this route assigned to a Flight #

insertion anomaly

• Suppose Flight# DL507 is removed from FLIGHT

– The information that Seattle to Denver is 1537 miles is inadvertently lost

deletion anomaly

Modification Anomalies Resulting from 3NF Violation

Page 23: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 23

Resolution of 3NF Violation

• The resolution of a 3NF violation is accomplished by applying the same two-step process used earlier to resolve the 2NF violation. The two-step process is:

– Pull out the undesirable FD(s) from the target

relation schema as a separate relation schema.

– Retain the determinant of the pulled-out relation schema as an attribute(s) in the leftover target relation schema to facilitate reconstruction of the original target relation schema.

Page 24: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 24

FLIGHT DISTANCE

Flight# Origin Destination Origin Destination Mileage DL507 Seattle Denver Seattle Denver 1537 DL123 Chicago Dallas Chicago Dallas 1058 DL723 Boston St. Louis Boston St. Louis 1214 DL577 Denver Los Angeles Denver Los Angeles 1100

DL5219 Minneapolis St. Louis Minneapolis St. Louis 580 DL357 Chicago Dallas Denver Houston 1100 DL555 Denver Houston Cleveland St. Louis 580

DL5237 Cleveland St. Louis Chicago Cleveland 300 DL5271 Chicago Cleveland

Resolution 3NF Violation Demonstrated

Note: fd1: Flight# Origin fd2: Flight# Destination

Note: fd3: (Origin, Destination) Mileage

Page 25: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 25

Boyce-Codd Normal Form (BCNF)

• Data redundancies and the consequent update anomalies due to functional dependencies may persist even after a relation schema R is normalized to 3NF

– Then, R is said to violate BCNF

• Condition conducive to modification anomalies in a 3NF Relation Schema– if a relation schema has at least two candidate keys, – if any two candidate keys are composite attributes, and – if there is an attribute overlap between the two

candidate keys • Presence of above condition need not necessarily trigger

modification anomalies or BCNF violation

Page 26: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 26

BCNF Defined

• A relation schema R is in BCNF if for every non-trivial functional dependency in R, the determinant is a superkey of R– Remember:

An FD in R is trivial if and only if the dependent is a subset of the determinant.

• Note: By this definition, violation of 2NF or 3NF also imply violation of BCNF

Page 27: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 27

Attribute APrimary Key

Transitive dependency (Violation of BCNF in R4)[Note: R4 is in 3NF]

Attribute Z

R4

Attribute Y

Violation of BCNF

Note: The primary key and the attributes A, Y, and Z can be atomic or composite. A is a prime attribute, whereas Y and Z are non-prime attributes. The fact that Y is a determinant here but not a candidate key of R4 constitutes the Boyce-Codd Normal Form violation.

Page 28: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 28

STU_SUB Stu# Subject Teacher Ap_score

IH123 Chemistry Raturi 4 IH123 English Stephan 4 IH235 History Walker 5 IH357 English Campbell 4 IH571 Chemistry Raturi 3 IH235 English Campbell 4

BCNF Violation – An Example

Question: What is it about the following fds that allows STU_SUB to be free of second and third normal form violations but contain a BCNF violation?

fd1: (Stu#, Subject) Teacherfd2: (Stu#, Subject) Ap_scorefd3: Teacher Subject

Page 29: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 29

BCNF Violation Explained

• F: fd1: {Stu#, Subject} Teacher;fd2: {Stu#, Subject} Ap_score;fd3: Teacher Subject

• Candidate Keys of STU_SUB are:(Stu#, Subject); (Stu#, Teacher)

• Primary Key of STU_SUB: (Stu#, Subject) [Chosen for this example]

• fd3 violates BCNF since Teacher is not a candidate key of STU_SUB.

• Note: In the absence of fd3, the overlapping composite keys do not violate BCNF

Page 30: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 30

• Suppose we want to add a new Teacher for a Subject (e.g., Teacher: ‘Salter’, Subject: ‘English’) – The addition is not possible without a Stu#

associated with this insertion since Stu# is part of the primary key of STU_SUB

insertion anomaly

• Suppose we want to replace Campbell with Smith– Multiple tuples require update and failure to

update even one changes the semantics of the scenario

update anomaly

Modification Anomalies Resulting from BCNF Violation

Page 31: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 31

Resolution of a BCNF Violation

• The two-step process is: – Pull out the undesirable FD(s) from the

target relation schema as a separate relation schema.

– Retain the determinant of the pulled-out relation schema as an attribute(s) in the leftover target relation schema to facilitate reconstruction of the original target relation schema.

Page 32: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 32

TEACH_SUB STU_AP Teacher Subject Stu# Teacher Ap_score

Raturi Chemistry IH123 Raturi 4 Stephan English IH123 Stephan 4 Walker History IH235 Walker 5 Campbell English IH357 Campbell 4 IH571 Raturi 3 IH235 Campbell 4

Resolution of BCNF Violation Demonstrated

Note: Teacher Subject

Note: (Stu#, Teacher) Ap_score

Page 33: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 33

Possible Side Effects of Normalization

• Attribute Preservation– i.e., no attribute should be lost in the decomposition

process.

• Dependency Preservation– i.e., any decomposition should continue to preserve

the minimal cover of F.

• Lossless-Join (Non-additive or Non-loss Join) Property– i.e., the decomposition should be strictly reversible in

that the reversal should strictly yield the original target relation in tact with no loss of tuples or addition of spurious tuples.

Page 34: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Dependency Preservation Explained

• Any decomposition, D, of a relation schema, R, should continue to preserve the minimal cover (Fc) of F prevailing over R

– i.e., the union of the FDs that hold on individual relation schemas of D should be a cover for F

• Each FD in Fc should either directly appear in single relation schemas in D or be inferable from the FDs that appear in single relation schemas in D

• If there is a need to join two or more relation schemas in D to ascertain an FD in Fc, then D is not a dependency-preserving solution

• Failure to preserve specified FDs amounts to failure to honor the semantics of the specified business rules

Page 35: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 35

FLI GHT Flight# Origin Destination Mileage DL507 Seattle Denver 1537 DL123 Chicago Dallas 1058 DL723 Boston St. Louis 1214 DL577 Denver Los Angeles 1100

DL5219 Minneapolis St. Louis 580 DL357 Chicago Dallas 1058 DL555 Denver Houston 1100

DL5237 Cleveland St. Louis 580 DL5271 Chicago Cleveland 300

Dependency Preservation: An Example

F: fd1: Flight# Origin;fd2: Flight# Destination;fd3: {Origin, Destination} Mileage

F+: F;fd12: Flight# {Origin, Destination};fd3x: Flight# Mileage; fd123: Flight# {Origin, Destination, Mileage}

FLIGHT is in violation of 3NF

Page 36: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 36

Dependency-Preserving Solution

• D1 [R1, R2] (A 3NF Solution to R) where

– R1: FLIGHT (Flight#, Origin, Destination);

– R2: DISTANCE (Origin, Destination, Mileage)

• Is D1 attribute-preserving? Yes, since the union of all attributes in the decomposition D1 is exactly the same as the attributes in R

• Is D1 dependency-preserving?Yes, since fd1 and fd2 appear explicitly in FLIGHT (R1) and fd3 appears explicitly in DISTANCE (R2).

Page 37: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 37

FLI GHT DISTANCE Flight# Origin Destination Origin Destination Mileage DL507 Seattle Denver Seattle Denver 1537 DL123 Chicago Dallas Chicago Dallas 1058 DL723 Boston St. Louis Boston St. Louis 1214 DL577 Denver Los Angeles Denver Los Angeles 1100

DL5219 Minneapolis St. Louis Minneapolis St. Louis 580 DL357 Chicago Dallas Denver Houston 1100 DL555 Denver Houston Cleveland St. Louis 580

DL5237 Cleveland St. Louis Chicago Cleveland 300 DL5271 Chicago Cleveland Seattle Denver 1300 Illegal DDLL111111 SSeeaattttllee DDeennvveerr Legal

D1: A Dependency-Preserving Decomposition Demonstrated

Note: The shaded tuple in FLIGHT is a “legal” insertion. The Struck-out tuple in DISTANCE is an “illegal” insertion because a tuple with (Origin: Seattle, Destination: Denver) already exists in DISTANCE; and {Origin, Destination} is the primary key of DISTANCE. Observe that the addition of this tuple, if legal, will mean that the distance between the same two cities has two values thus failing to preserve fd3: {Origin, Destination} Mileage

Suppose we want to add a new flight (Flight#: DL111, Origin: Seattle, Destination: Denver, Mileage: 1300)

Page 38: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Three Possible 3NF Solutions for R

Given R (Flight#, Origin, Destination) and F prevailing over R where

• F: fd1: Flight# Origin;fd2: Flight# Destination;fd3: {Origin, Destination} Mileage

Solutions

• D1: R1: FLIGHT (Flight#, Origin, Destination);R2: DISTANCE (Origin, Destination, Mileage)

Note: D1 is a dependency-preserving solution

• D2: R1a: FLIGHT_A (Flight#, Origin, Destination);R2a: DISTANCE_A (Flight#, Mileage)

• D3: R1b: FLIGHT_B (Flight#, Mileage); R2b: DISTANCE_B (Origin, Destination, Mileage)

Page 39: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 39

Evaluation of the 3NF Solution D2 for Dependency Preservation

• D2: R1a: FLIGHT_A (Flight#, Origin, Destination);

R2a: DISTANCE_A (Flight#, Mileage)

• Is this decompositions attribute preserving? Yes, since the union of all attributes in D2 is exactly the same as the attributes in R.

• Is the decomposition D2 dependency preserving?No. It is impossible to deduce – fd3: (Origin, Destination) Mileage from D2

i.e., fd3 is not preserved in the solution D2

Page 40: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 40

FLI GHT_A DISTANCE_A Flight# Origin Destination Flight# Mileage DL507 Seattle Denver DL507 1537 DL123 Chicago Dallas DL123 1058 DL723 Boston St. Louis DL723 1214 DL577 Denver Los Angeles DL577 1100

DL5219 Minneapolis St. Louis DL5219 580 DL357 Chicago Dallas DL357 1058 DL555 Denver Houston DL555 1100

DL5237 Cleveland St. Louis DL5237 580 DL5271 Chicago Cleveland DL5271 300 DDLL111111 SSeeaattttllee DDeennvveerr Legal DL111 1300 Legal

D2: A Non-Dependency-Preserving Decomposition

Suppose we want to add a new flight (Flight#: DL111, Origin: Seattle, Destination: Denver, Mileage: 1300)

Note: Clearly, fd3: {Origin, Destination} Mileage is not preserved in in this decomposition since fd3 cannot be verified from either FLIGHT_A or DISTANCE_A. The consequence is demonstrated by the addition of tupes to FLIGHT_A and DESTINATION_A as desired. The shaded tuples in FLIGHT_A and DISTANCE_A are “legal” insertions. A join of the two tables reveals that the distance between the two cities Seattle and Denver has two values ( 1300 and 1537) thus proving the failure to preserve fd3 – i.e., the database has been contaminated.

Page 41: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 41

• D3: R1b: FLIGHT_B (Flight#, Mileage); R2b: DISTANCE_B (Origin, Destination, Mileage)

• Is the decomposition D3 attribute preserving? Yes, since the union of all attributes in D3 is exactly the same as the attributes in R

• Is the decomposition D3 dependency preserving?

No. It is impossible to derive

fd1: Flight# Origin and

fd2: Flight# Destination

from D3

i.e., fd1 and fd2 are not preserved in the solution D3

Evaluation of the 3NF Solution D3 for Dependency Preservation

Page 42: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Lossless-Join Explained

• A decomposition of a relation schema R should be strictly (losslessly) reversible.

• The term “loss” in “lossless” implies loss of information – not necessarily loss of tuples– generation of spurious tuples resulting from the natural join of the

projections of R entails loss of information

• Lossless-join property is always predicated on:– a set of FDs (F) prevailing over R– the premise that the join attributes in the decomposition are non-null

values

• Lossless Join is also referred to as Non-additive or Non-loss Join

Page 43: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Test of Lossless-Join Property

• A binary decomposition D: {R1, R2} of a relation schema, R, is a lossless-join decomposition with respect to a set of FDs, F that holds on R,

if and only if F+ contains:– either the FD (R1 ∩ R2) R1– or the FD (R1 ∩ R2) R2

• In other words, if the attribute(s) common to R1 and R2 contain a candidate key of either R1 or R2, then {R1, R2} is a lossless-join decomposition of R

Page 44: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 44

FLI GHT Flight# Origin Destination Mileage DL507 Seattle Denver 1537 DL123 Chicago Dallas 1058 DL723 Boston St. Louis 1214 DL577 Denver Los Angeles 1100

DL5219 Minneapolis St. Louis 580 DL357 Chicago Dallas 1058 DL555 Denver Houston 1100

DL5237 Cleveland St. Louis 580 DL5271 Chicago Cleveland 300

Lossless-Join: An Example

F: fd1: Flight# Origin;fd2: Flight# Destination;fd3: {Origin, Destination} Mileage

F+: F;fd12: Flight# {Origin, Destination};fd3x: Flight# Mileage; fd123: Flight# {Origin, Destination, Mileage}

FLIGHT is in violation of 3NF

Page 45: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Three Possible 3NF Solutions for R

Given R (Flight#, Origin, Destination) and F prevailing over R where

• F: fd1: Flight# Origin;fd2: Flight# Destination;fd3: {Origin, Destination} Mileage

Solutions

• D1: R1: FLIGHT (Flight#, Origin, Destination);R2: DISTANCE (Origin, Destination, Mileage)

Note: D1 is a dependency-preserving solution

• D2: R1a: FLIGHT_A (Flight#, Origin, Destination);R2a: DISTANCE_A (Flight#, Mileage)

• D3: R1b: FLIGHT_B (Flight#, Mileage); R2b: DISTANCE_B (Origin, Destination, Mileage)

Page 46: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 46

1< - - - R2 - - - > n D: R1: FLIGHT (Flight#, Origin, Destination); R2: DISTANCE ( Origin, Destination, Mileage) 1 R 1 FLIGHT DI STANCE {FLI GHT * DI STANCE}

Flight# Origin Destination Origin Destination Mileage Flight# Origin Destination Mileage DL507 Seattle Denver Seattle Denver 1537 DL507 Seattle Denver 1537 DL123 Chicago Dallas Chicago Dallas 1058 DL123 Chicago Dallas 1058 DL723 Boston St. Louis Boston St. Louis 1214 DL723 Boston St. Louis 1214 DL577 Denver Los Angeles Denver Los Angeles 1100 DL577 Denver Los Angeles 1100 DL5219 Minneapolis St. Louis Minneapolis St. Louis 580 DL5219 Minneapolis St. Louis 580 DL357 Chicago Dallas Denver Houston 1100 DL357 Chicago Dallas 1058 DL555 Denver Houston Cleveland St. Louis 580 DL555 Denver Houston 1100 DL5237 Cleveland St. Louis Chicago Cleveland 300 DL5237 Cleveland St. Louis 580 DL5271 Chicago Cleveland DL5271 Chicago Cleveland 300

D1: A 3NF Lossless-Join Solution

D1: R1: FLIGHT (Flight#, Origin, Destination);R2: DISTANCE (Origin, Destination, Mileage)

Note 1: Natural join of R1 and R2 strictly (losslessly) yields the original R

Note 2: D1 is also a dependency-preserving solution

Note 1: Natural join of R1 and R2 strictly (losslessly) yields the original R

Page 47: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 47

1 R2 n D: R1a: FLIGHT_A (Flight#, Origin, Destination); R2a: DISTANCE_A ( Flight, Mileage) 1 R 1 FLIGHT_A DI STANCE_A {FLIGHT_A * DI STANCE_A}

Flight# Origin Destination Flight Mileage Flight# Origin Destination Mileage DL507 Seattle Denver DL507 1537 DL507 Seattle Denver 1537 DL123 Chicago Dallas DL123 1058 DL123 Chicago Dallas 1058 DL723 Boston St. Louis DL723 1214 DL723 Boston St. Louis 1214 DL577 Denver Los Angeles DL577 1100 DL577 Denver Los Angeles 1100 DL5219 Minneapolis St. Louis DL5219 580 DL5219 Minneapolis St. Louis 580 DL357 Chicago Dallas DL357 1058 DL357 Chicago Dallas 1058 DL555 Denver Houston DL555 1100 DL555 Denver Houston 1100 DL5237 Cleveland St. Louis DL5237 580 DL5237 Cleveland St. Louis 580 DL5271 Chicago Cleveland DL5271 300 DL5271 Chicago Cleveland 300

D2: A 3NF Lossless-Join Solution

D2: R1a: FLIGHT_A (Flight#, Origin, Destination);R2a: DISTANCE_A (Flight#, Mileage)

Note 1: Natural join of R1a and R2a strictly (losslessly) yields the original R

Note 2: D2, however, is not a dependency-preserving decomposition since{Origin, Destination} Mileage is not preserved in this solution

Page 48: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 48

1 R2 n D: R1b: FLIGHT_B (Origin, Destination, Mileage); R2b: DISTANCE_A ( Flight, Mileage) 1 R 1 FLIGHT_B DI STANCE_B {FLIGHT_B * DI STANCE_B}

Origin Destination Mileage Flight Mileage Flight# Origin Destination Mileage Seattle Denver 1537 DL507 1537 DL507 Seattle Denver 1537

Chicago Dallas 1058 DL123 1058 DL123 Chicago Dallas 1058 Boston St. Louis 1214 DL723 1214 DL723 Boston St. Louis 1214 Denver Los Angeles 1100 DL577 1100 DL577 Denver Los Angeles 1100

Minneapolis St. Louis 580 DL5219 580 DL577 Denver Houston 1100 Denver Houston 1100 DL357 1058 DL5219 Minneapolis St. Louis 580

Cleveland St. Louis 580 DL555 1100 DL5219 Cleveland St. Louis 580 Chicago Cleveland 300 DL5237 580 DL357 Chicago Dallas 1058

DL5271 300 DL555 Denver Los Angeles 1100 DL555 Denver Houston 1100 DL5237 Minneapolis St. Louis 580 DL5237 Cleveland St. Louis 580 DL5271 Chicago Cleveland 300

Spurious tuples causing a loss-join decomposition are marked by shading

D3: A 3NF Loss-Join Solution

D3: R1b: FLIGHT_B (Flight#, Mileage); R2b: DISTANCE_B (Origin, Destination, Mileage)

Note 1: Natural join of R1b and R2b does not strictly (losslessly) yield theoriginal R

Note 2: D3 is not a dependency-preserving decomposition either sinceFlight {Origin, Destination} is not preserved in this solution

Page 49: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Normal Forms in a Nutshell

1NF 2NF 3NF BCNFModification Anomalies Yes Yes Yes No(due to FDs)

Lossless Join N/A Yes Yes YesDecomposition

Dependency Yes Yes Yes Yes when*Preservationon decomposition

* BCNF design that result from 2NF and 3NF resolutions requiring no further normalization.

Page 50: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Normalization: Solution Options

• Option 1:BCNF Lossless join Dependency preservation

The above is achieved only when a relational schema is in 3NF and there are no BCNF violations in the relational schema, because then the 3NF relational schema is also in BCNF

If the above design cannot be achieved, one may have to settle for:

• Option 2:3NF Lossless join Dependency preservation

• Option 1+: {presently, the best option} Option 1 plus a materialized view for each unpreserved FD in the minimal cover of F

Page 51: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Materialized View Defined

• A Materialized view (also known as a snapshot), despite the similarity in name is not a view. – Like a view, it is also derived via the evaluation of a specified

relational expression; – but, unlike a view a ‘materialized view’ is stored in the

database and refreshed on every update in the source relations from where the materialized view is generated

– i.e., it is maintained current by the DBMS

• Materialized views are used to:– freeze data as of a certain moment without preventing

updates to continue on the ‘real’ data– Note: Some application (e.g., accounting) often require

‘data’ as of a particular point in time (e.g., end of accounting cycle).

– sometimes it is desirable to save large amount of data resulting from complex queries from a certain period of time, again, without locking out updates on the source relations.

Page 52: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 52

Summary Notes on Normalization

• Normalization eliminates data redundancies that cause modification anomalies by appropriately decomposing the target relation schema.

– Undesirable FDs in the target relation schema are not discarded, but are rendered “desirable” in the decomposed relational schema.

• There is no specific merit in resolving 2NF violation before 3NF violation – the ordering is only historical.

• BCNF subsume 2NF and 3NF in that 2NF and 3NF violations are also BCNF violations.

– Likewise, a relation schema in BCNF is also in 2NF and 3NF.

• A BCNF solution that also has the lossless join property and is dependency preserving may not be achievable at all times.

– In this case a lossless join is preferred and– Dependency preservation is accomplished via application programs

or materialized views

Page 53: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 53

STOCK Store

Product

Price Quantity Location Discount Sq_ft Manager

15 Refrigerator 1850 120 Houston 5% 2300 Metzger 15 Dishwasher 600 150 Houston 5% 2300 Metzger 13 Dishwasher 600 180 Tulsa 10% 1700 Metzger 14 Refrigerator 1850 150 Tulsa 5% 1900 Schott 14 Television 1400 280 Tulsa 10% 1900 Schott 14 Humidifier 55 30 Tulsa 1900 Schott 17 Television 1400 10 Memphis 2300 Creech 17 Vacuum Cleaner 300 150 Memphis 5% 2300 Creech 17 Dishwasher 600 150 Memphis 5% 2300 Creech 11 Computer 180 Houston 10% 2300 Creech 11 Refrigerator 1850 120 Houston 5% 2300 Creech 11 Lawn Mower 300 Houston 2300 Creech

Figure 7.1c An instance of the relation schema, STOCK

Example Revised

Page 54: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 54

Sq_ft ManagerStore Location DiscountProduct Price QuantitySTOCK

Given URS STOCK (Store, Location, Sq_ft, Manager, Product, Price, Quantity, Discount) and F [fd1, fd2, fd3, fd4, fd5, fd6] where

fd1: Store Location; fd2: Store Sq_ftfd3: Store Manager fd4: Product Pricefd5: {Store, Product} Quantityfd6: Quantity Discount

fd1, fd2, fd3, and fd4 violate 2NF and fd6 violates 3NF.

Example Revised

Page 55: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 55

Decomposition of URS STOCK

• R1: STORE_LOC (Store, Location);• R2: STORE_SIZE (Store, Sq_ft);• R3: STORE_MGR (Store, Manager);• R4: PRODUCT (Product, Price);• R5: INVENTORY (Store, Product, Quantity,

Discount)

• R5 violates 3NF therefore…

R5a: DISC_STRUCTURE (Quantity, Discount);

R5b: INVENTORY (Store, Product, Quantity)

Page 56: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 56

A Parsimonious Consolidation of the Decomposition

• R123: STORE (Store, Location, Sq_ft, Manager)

• R4: PRODUCT (Product, Price);

• R5a: DISC_STRUCTURE (Quantity, Discount);

• R5b: INVENTORY (Store, Product, Quantity)

Page 57: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Denormalization

• Entails combining relations to enhance query efficiency

• Reintroduces data redundancies eliminated by normalization

• General misunderstanding:

Denormalization always improves data retrieval performance

Formal definition:

Replacing a set of often normalized relation schemas D {R1, R2, . . . Rn} by their join Rsuch that projection R over the set of attributes of R1, R2, . . . Rn respectively is guaranteed to yield the original set D.

Page 58: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 58

Review: Keys

• Primary key• Candidate key• Superkey • Foreign key• Alternate key• Surrogate key• Partial Key

Page 59: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 59

Figure 8.8b Dependency diagram for relation schema URS1

Sq_ft ManagerStore Location Product PriceURS1 Customer Address VendorBranch Type

Given F [fd1, fd2, fd3, fd4, fd5, fd6, fd7, fd8] that prevails over URS1 wherefd1: {Store, Branch} Location;fd2: Customer Address;fd3: Vendor Product;fd4: {Store, Branch} Sq_ft;fd5: Product Price;fd6: {Store, Branch} Manager;fd7: Manager {Store, Branch};fd8: Store Type

A Normalization Exercise

Page 60: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 60

Normalization

• Step 1: Identify the candidate keys of URS1, given F (Store, Branch, Customer, Vendor) and (Manager, Customer, Vendor)

• Step 2: Choose a primary key (Manager, Customer, Vendor)

• Step 3: Identify violations of normal forms for every FD in F

• Step 4: Resolve 2NF and 3NF violations and decompose

• Step 5: Resolve BCNF violations and decompose

Page 61: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Chapter 8 – Normal Forms Based on Functional Dependencies 61

Normalization (continued)

• Step 6:– Is the decomposition attribute preserving?– Is the decomposition dependency preserving?– Does the decomposition exhibit lossless-join

property?

• Cross-check the decomposition!

Page 62: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Reverse Engineering

• Common practice in hardware development intended to ‘discover’ how a particular product works

• Software engineering, over decades, has been preoccupied with developing “clearly understood” new systems

• A major concern of practitioners has been upgrading “poorly understood” old systems

• The criticality of the issue acknowledged only in the 1990s (CACM, May 1994)

• A working Definition of reverse enginering: working backward in a systematic manner

• Goal: Understand how existing software works in a system development environment

Page 63: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Reverse Engineering in a Database Environment

• Precursor to migration from legacy systems• Shed light on poorly documented (often, operational)

database systems

• In Data modeling:– Reverse engineering within the forward

engineering process– To understand if/how the original (conceptual)

schema is flawed so that occurrence of such flaws can be preemptively stopped

– Reverse engineer a normalized relational schema to discover conceptual modeling errors

Page 64: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Heuristic to Reverse Engineer a Relational Schema to ER Diagram

• Step 1 Translate the normalized relational schema to an information-preserving logical schema based on available information

• Step 2 Transform the information-preserving logical schema to a design-specific ER diagram – either coarse or fine granularity or a hybrid thereof depending on the available information

• Step 3 Abstract the design-specific ER diagram up to a presentation layer ERD

Page 65: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Reverse Engineering a Relation Schema Step 1

• Denote alternate keys as unique [Q]; Enclose composite attributes in braces [ ]

• Indicate the parent label on top of the foreign key along with the parent side structural constraints of the relationship type– default value for min and max when information not available are

0 an n respectively (see Section 6.7.2 of chapter 6 for grammar)

• Indicate child side structural constraints of the relationship type right below the foreign key along with the deletion rule– default value for min when information not available is 0 and

default deletion rule is R except when the cardinality ratio is 1:1; then, leave unfilled if rule not specified (see Section 6.7.2 of chapter 6 for grammar)

Page 66: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Reverse Engineering a Relation Schema Step 2

a. Map each logical scheme to an entity type

b. Represent the foreign key attribute(s) in a logical scheme by a relationship type in the ERDNote: The foreign key attribute is removed from the referencing entity type (Child) unless the attribute plays some other role also in the entity type

c. Map attributes of individual logical schemes to corresponding entity types in the ERD

Page 67: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Step 2aMap each logical scheme to an entity type

– If the primary key of a logical scheme Lx is a proper subset of the primary key of another logical scheme Ly, map Ly as a weak entity type in the ERD with Lx as its identifying parent

– If the primary key of a logical scheme Lx is a concatenation of the primary keys of multiple logical schemes Ly, Lz, etc., map Lx as a gerund entity type in the ERD with Ly, Lz, etc. as its identifying parents

– All other logical schemes are mapped as base entity types

Page 68: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Step 2bMap the foreign key attribute(s) to a relationship type

– Establish the relationship type

– connect the relationship type to the parent (referenced) and child (referencing) entity typesif the child is a weak entity type then the relationship type is mapped as an identifying relationship type

– Map the (min, max) to the appropriate edge of the relationship type

Page 69: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Step 2cMap attributes to entity types in the ERD

– Map atomic and composite attributes

– Underline unique identifiers

(The primary key and alternate keys are unique identifiers of the corresponding entity type)

– Partial key of a weak entity is denoted by a dotted underline

Page 70: Chapter 8: Normal Forms Based on Functional Dependencies Data Modeling and Database Design

Reverse Engineering a Relation Schema Step 3

• Transform gerund entity types to n-way relationship types. Attributes of the gerund entity type remain as attributes of the relationship type

• Any relationship with a gerund entity type is transformed to a relationship type with the cluster entity type that the gerund represents

• A weak entity type not participating in any relationship other than the identifying relationship is transformed to a multi-valued (atomic/composite) attribute of the parent entity type