unit 5: normal forms based on functional dependencies it238: data modeling and database design...

45
Unit 5: Normal Forms Based on Functional Dependencies IT238: Data Modeling and Database Design Instructor: Qing Yan, M.D., Ph.D.

Upload: lucinda-burke

Post on 03-Jan-2016

222 views

Category:

Documents


2 download

TRANSCRIPT

Unit 5:Normal Forms Based on Functional

Dependencies

IT238: Data Modeling and Database Design

Instructor: Qing Yan, M.D., Ph.D.

Week 3 Review: Assignment

Chapter 8 – Normal Forms Based on Functional Dependencies 2

Chapter 3 – Entity-Relationship Modeling 3

Notation

Figure 3.2 Summary of Presentation Layer ER diagram notationFrom: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Chapter 3 – Entity-Relationship Modeling 4

Inclusive Arc Exclusive ArcNoninclusive Arc

R

R

m n Cardinality Ratio of m:n Chen's Notation

E1 Optionally Related To E2(Partial Participation of E1 in R)

E2 Mandatorily Related to E1(Total Participation of E2 in R)

[Existence Dependency of E2 in R]

E1 E2

E1 E2

RE1 E2 Cardinality Ratio of m:n Crow's Foot Notation

Figure 3.2 Summary of Presentation Layer ER diagram notation (continued)

Notation (continued)

Chapter 3 – Entity-Relationship Modeling 5

The Synthesis Approach to Generate an Initial Set of Entity Types and Attributes

• List all discernable data elements and treat them as attributes

• Group these attributes based on apparent commonalities

• Designate each cluster of attributes as an entity type• Review leftover data elements and investigate the

possibility that some of them serve as links among the entity types previously identified

• Designate these links as relationship types

From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Chapter 3 – Entity-Relationship Modeling 6

The Analysis Approach to Generate an Initial Set of Entity Types and Attributes

• Begin by searching for things that can be labeled by singular nouns and call these things entity types

• Gather properties that appear to belong to individual entity types and label them as attributes of a particular entity type

• Be sensitive throughout the process to the identification of relationships among the various entity types

From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Week 3 Assignment Rubric

Chapter 8 – Normal Forms Based on Functional Dependencies 7

Week 3 Assignment Answer

Chapter 8 – Normal Forms Based on Functional Dependencies 8

Week 4 Key Concepts

• Relational schema A set of relation schemas in a relational data model.

• Relation state A specific occurrence of a relation (or should we say relation schema).

• Superkey A set of one or more attributes, which taken collectively, uniquely identifies a tuple of a relation.

• Superset A set S1 is a superset of another set S2 if every element in S2 is in S1. S1 may have elements that are not in S2.

Chapter 8 – Normal Forms Based on Functional Dependencies 9From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Week 4 Key Concepts

• Candidate key A superkey with no proper subsets that are superkeys.

• Uniqueness property The prohibition of the presence of identical values for the collection of attribute(s) that constitute a unique identifier.

• Primary key Chosen from among the candidate keys to serve the role of uniquely identifying tuples of a relation schema.

Chapter 8 – Normal Forms Based on Functional Dependencies 10From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Week 4 Assignment

• 1. What is a relation schema? What is the difference between a relation, a relation schema, and a relational schema?

• A relation schema: the basic structure of a relation. It consists of a name, and a set of related attributes.

• When sets of values are assigned to the attributes that make up a relation schema, a relation is created. A relation is a relation schema with data.

• A relational schema is a set of relation schemas. A relational schema is also called a relational data model.

Chapter 8 – Normal Forms Based on Functional Dependencies 11From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Unit 5 Objectives

• Apply the process of normalization to resolve data redundancy problems in a relation schema

• Demonstrate the need for dependency preservation during the normalization process

Chapter 1 – Database Systems: Architecture and Components12

Unit 5 To-Do List

• Complete the reading Textbook and Web

• Participate in the discussion board30 points

• Attend the introductory seminar or complete FLA quiz20 points

• Complete the unit assignment50 points

Chapter 1 – Database Systems: Architecture and Components13

Unit 5 Assignment Rubric

Chapter 8 – Normal Forms Based on Functional Dependencies 14

Key Concepts

• Normalization – The process of rendering undesirable functional

dependencies desirable. – A technique to facilitate systematic validation of

the participation of attributes in a relation schema from a perspective of data redundancy.

– First, second, third and Boyce-Codd normal forms.

Chapter 1 – Database Systems: Architecture and Components15

From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Key Concepts

Normalization in the database design process– To render undesirable functional dependencies

desirable.

Chapter 8 – Normal Forms Based on Functional Dependencies 16From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Key Concepts

• Functional dependencies – the source– technical translations of user-specified business

rules expressed as constraints in a relation schema.

Chapter 8 – Normal Forms Based on Functional Dependencies 17From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Key Concepts

• First normal form A relation schema R where the attributes comprising the schema are atomic and single-valued.

• Second normal form A relation schema R where every non-prime attribute in R is fully functionally dependent on the primary key of R.

• Third normal form A relation schema R where no non-prime attribute is functionally dependent on another non-prime attribute in R.

Chapter 1 – Database Systems: Architecture and Components18

From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Key Concepts

• Boyce-Codd normal form A relation schema R such that for every non-trivial functional dependency in R, the determinant is a superkey of R.

Chapter 1 – Database Systems: Architecture and Components19

From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Key Concepts

• Reverse engineering The process of going from a lower level of abstraction (e.g., a normalized relational schema) to a higher level of abstraction (e.g., a Design-specific ER diagram).

Chapter 8 – Normal Forms Based on Functional Dependencies 20From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Key Concepts

• The difference between a desirable and an undesirable functional dependency

– A desirable functional dependency is any functional dependency in a relation schema R where the determinant is a candidate key of R.

– Undesirable functional dependencies in a relation schema occur when the determinant of the functional dependency is not a candidate key of R.

– Functional dependencies of this nature are the seeds of data redundancies that lead to modification anomalies.

Chapter 8 – Normal Forms Based on Functional Dependencies 21From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Chapter 8 – Normal Forms Based on Functional Dependencies 22

Normalization

• Normalization is a technique that facilitates systematic validation of the participation of attributes in a relation schema from a perspective of data redundancy.

• Normal Forms (NFs) provide a stepwise progression towards attaining the goal of a fully normalized relation schema that is guaranteed to be free of data redundancies that cause update anomalies from a functional dependency perspective.

From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Chapter 8 – Normal Forms Based on Functional Dependencies 23

Desirable Versus Undesirable FDs

• Desirable FDs in a relation schema R are those where the determinant is a candidate key of R.

• Undesirable FDs in a relation schema R are those where the determinant is not a candidate key of R.– That is, the FDs will cause data redundancy and

the consequent modification anomalies in R.

From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Chapter 8 – Normal Forms Based on Functional Dependencies 24

Normal Forms: An Overview

• A relation schema is said to be in a particular normal form if it satisfies certain prescribed criteria for that normal form.

• First normal form (1NF) reflects one of the

properties of a relation schema – i.e., by definition a relation schema is in 1NF.

• The normal forms associated with functional dependencies are second (2NF), third (3NF), and Boyce-Codd (BCNF) normal forms.

From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Normal Forms: An Overview (continued)

• The violations of each of these normal forms signal the presence of a specific type of ‘undesirable’ FD.– violation of a normal form, can be interpreted

as equivalent to an inadvertent mixing up of entity types belonging to two different entity classes in a single entity type.

From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Chapter 8 – Normal Forms Based on Functional Dependencies 26

First Normal Form (1NF)

• First normal form (1NF) imposes conditions so that a base relation that is physically stored as a file does not contain records with a variable number of fields.

• This is accomplished by prohibiting multi-valued attributes, composite attributes, and combinations thereof in a relation schema.

• Such a constraint, in effect, prevents relations from containing other relations.

• In essence, 1NF, by definition, requires that the domain of an attribute must include only atomic values and that the value of an attribute in a relation’s tuple must be a single value from the domain of that attribute.

From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Chapter 8 – Normal Forms Based on Functional Dependencies 27

ALBUM

Album_no Artist_nm Price Stock

BS123 Britney Spears 17.95 1000 JT111 Justin Timberlake 17.95 1200

BTL007 {John Lennon, Paul McCartney, George Harrison, Ringo Star} 23.95 MJ100 Michael Jackson 17.95 JM456 John Mayer 16.95 1000 JM151 John Mayer 16.95 1000 MX789 Madonna 11.95 500 DJM237 {John Denver, Michael Jackson, Madonna} 11.95 2000 DR711 Diana Ross 12.95 1000 PM137 Paul McCartney 19.95

1NF Violation – An Example

Note 1: Album_no is the primary key of ALBUM

Note 2: Artist_nm is a multi-valued attribute causing a first normal form violation.

From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Chapter 8 – Normal Forms Based on Functional Dependencies 28

NEW_ALBUM Album_no Artist_nm Price Stock

BS123 Britney Spears 17.95 1000 JT111 Justin Timberlake 17.95 1200

BTL007 John Lennon 23.95 BTL007 Paul McCartney 23.95 BTL007 George Harrison 23.95 BTL007 Ringo Star 23.95 MJ100 Michael Jackson 17.95 JM456 John Mayer 16.95 1000 JM151 John Mayer 16.95 1000 MX789 Madonna 11.95 500 DJM237 John Denver 11.95 2000 DJM237 Michael Jackson 11.95 2000 DJM237 Madonna 11.95 2000 DR711 Diana Ross 12.95 1000 PM137 Paul McCartney 19.95

Note 1: Artist_nm is no longer a multi-valued attribute Note 2: {Album_no, Artist_nm} is the primary key of ALBUM thus rendering Artist_nm a single-valued attribute and achieving 1NF in NEW_ALBUM.

Resolution of 1NF Violation

From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Chapter 8 – Normal Forms Based on Functional Dependencies 29

Second Normal Form (2NF)

• A relation schema R is in 2NF if every non-prime attribute in R is fully functionally dependent on the primary key of R.

• This means a non-prime attribute is not functionally dependent on a proper subset of the primary key of R.

• The Second Normal Form (2NF) is based on a concept known as full functional dependency.

• A functional dependency of the form Z A is a ‘full functional dependency’ if and only if no proper subset of Z functionally determines A.

• In other words, if Z A and X A, and X is a proper subset of Z, then Z does not fully functionally determine A, i.e., Z A is not a full functional dependency; it is a partial dependency.

From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Chapter 8 – Normal Forms Based on Functional Dependencies 30

Attribute APrimary

Key

Partial dependency (Violation of 2NF in R1)

Attribute ZAttribute Y

R1

Violation of 2NF

Note: The primary key and the attributes A, Y, and Z can be atomic or composite. A is a prime attribute, whereas Y and Z are non-prime attributes. In order for a partial dependency to exist here, Attribute A must be a proper subset of the primary key of R1.

From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Chapter 8 – Normal Forms Based on Functional Dependencies 31

NEW_ALBUM Album_no Artist_nm Price Stock

BS123 Britney Spears 17.95 1000 JT111 Justin Timberlake 17.95 1200

BTL007 John Lennon 23.95 BTL007 Paul McCartney 23.95 BTL007 George Harrison 23.95 BTL007 Ringo Star 23.95 MJ100 Michael Jackson 17.95 JM456 John Mayer 16.95 1000 JM151 John Mayer 16.95 1000 MX789 Madonna 11.95 500 DJM237 John Denver 11.95 2000 DJM237 Michael Jackson 11.95 2000 DJM237 Madonna 11.95 2000 DR711 Diana Ross 12.95 1000 PM137 Paul McCartney 19.95

2NF Violation - An Example

What is the cause of the 2NF violation in NEW_ALBUM?

From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Chapter 8 – Normal Forms Based on Functional Dependencies 32

2NF Violation Explained

• F: fd1: Album_no Price; fd2: Album_no Stock

• F+: F;fd12: Album_no {Price, Stock};fd12x: {Album_no, Artist_nm} {Price, Stock}

• Fc = F• Candidate Key of NEW_ALBUM:

(Album_no, Artist_nm);• Primary Key:

(Album_no, Artist_nm)• fd1 and fd2 (fd12) violate 2NF in NEW_ALBUM

From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Chapter 8 – Normal Forms Based on Functional Dependencies 33

Modification Anomalies Resulting from 2NF Violation

• Suppose we want to change the value of Price or Stock of Album_no BTL007 in NEW_ALBUM

– Multipe tuples require update and failure to update some will change the semantics of the scenario

update anomaly

• Suppose we want to add a new tuple (Album_no: XY111, Price: 17.95, and Stock: 100) to NEW_ALBUM

– Unless we also know the artist(s) the insert is not possible since Artist_nm is part of the primary key of NEW_ALBUM

insertion anomaly

• Suppose we want to delete Album_no BTL007 from NEW_ALBUM

– Entails deletion of multiple tuples deletion anomaly

From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Chapter 8 – Normal Forms Based on Functional Dependencies 34

Resolution of 2NF Violation

• The resolution of 2NF violation is a two-step process that decomposes the target relation schema with the undesirable FDs into multiple relation schemas such that the undesirable FDs are rendered desirable.

1. Pull out the undesirable FD(s) from the target relation schema as separate relation schema(s).

2. Retain the determinant of the pulled-out relation schema as an attribute(s) in the leftover target relation schema to facilitate reconstruction of the original target relation schema.

From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Chapter 8 – Normal Forms Based on Functional Dependencies 35

ALBUM_INFO ALBUM_ARTIST Album_no Price Stock Album_no Artist_nm

BS123 17.95 1000 BS123 Britney Spears JT111 17.95 1200 JT111 Justin Timberlake

BTL007 23.95 BTL007 John Lennon MJ100 17.95 BTL007 Paul McCartney JM456 16.95 1000 BTL007 George Harrison JM151 16.95 1000 BTL007 Ringo Star MX789 11.95 500 MJ100 Michael Jackson DJM237 11.95 2000 JM456 John Mayer DR711 12.95 1000 JM151 John Mayer PM137 19.95 MX789 Madonna

DJM237 John Denver DJM237 Michael Jackson DJM237 Madonna DR711 Diana Ross PM137 Paul McCartney

Resolution of 2NF Violation Demonstrated

Note: Album_no Price Album_no Stock

Note: No non-trivial FD present

From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Chapter 8 – Normal Forms Based on Functional Dependencies 36

Third Normal Form – 3NF

• A relation schema R is in 3NF if it is in 2NF and no non-prime attribute is functionally dependent on another non-prime attribute in R.

• The Third Normal Form (3NF) is based on the concept of transitive dependency.

– Given a relation schema R (X, A, B) where • X, A, and B are pair-wise disjoint atomic or

composite attributes, • X is the primary key of R, and • A and B are non-prime attributesIf A B (or B A) in R, then B (or A) is said to

be ‘transitively dependent’ on X, the primary key of R.

From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Chapter 8 – Normal Forms Based on Functional Dependencies 37

BCNF Defined

• A relation schema R is in BCNF if for every non-trivial functional dependency in R, the determinant is a superkey of R– Remember:

An FD in R is trivial if and only if the dependent is a subset of the determinant.

• Note: By this definition, violation of 2NF or 3NF also imply violation of BCNF

From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Chapter 8 – Normal Forms Based on Functional Dependencies 38

Possible Side Effects of Normalization

• Attribute Preservation– i.e., no attribute should be lost in the decomposition

process.

• Dependency Preservation– i.e., any decomposition should continue to preserve

the minimal cover of F.

• Lossless-Join (Non-additive or Non-loss Join) Property– i.e., the decomposition should be strictly reversible in

that the reversal should strictly yield the original target relation in tact with no loss of tuples or addition of spurious tuples.

From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Lossless-Join Explained

• A decomposition of a relation schema R should be strictly (losslessly) reversible.

• The term “loss” in “lossless” implies loss of information – not necessarily loss of tuples– generation of spurious tuples resulting from the natural join of the

projections of R entails loss of information

• Lossless-join property is always predicated on:– a set of FDs (F) prevailing over R– the premise that the join attributes in the decomposition are non-null

values

• Lossless Join is also referred to as Non-additive or Non-loss Join

From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Normal Forms in a Nutshell

1NF 2NF 3NF BCNFModification Anomalies Yes Yes Yes No(due to FDs)

Lossless Join N/A Yes Yes YesDecomposition

Dependency Yes Yes Yes Yes when*Preservationon decomposition

* BCNF design that result from 2NF and 3NF resolutions requiring no further normalization.From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Chapter 8 – Normal Forms Based on Functional Dependencies 41

Summary Notes on Normalization

• Normalization eliminates data redundancies that cause modification anomalies by appropriately decomposing the target relation schema.

– Undesirable FDs in the target relation schema are not discarded, but are rendered “desirable” in the decomposed relational schema.

• There is no specific merit in resolving 2NF violation before 3NF violation – the ordering is only historical.

• BCNF subsume 2NF and 3NF in that 2NF and 3NF violations are also BCNF violations.

– Likewise, a relation schema in BCNF is also in 2NF and 3NF.

• A BCNF solution that also has the lossless join property and is dependency preserving may not be achievable at all times.

– In this case a lossless join is preferred and– Dependency preservation is accomplished via application programs

or materialized viewsFrom: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Chapter 8 – Normal Forms Based on Functional Dependencies 42

Normalization

• Step 1: Identify the candidate keys of URS1, given F (Store, Branch, Customer, Vendor) and (Manager, Customer, Vendor)

• Step 2: Choose a primary key (Manager, Customer, Vendor)

• Step 3: Identify violations of normal forms for every FD in F

• Step 4: Resolve 2NF and 3NF violations and decompose

• Step 5: Resolve BCNF violations and decompose

From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Chapter 8 – Normal Forms Based on Functional Dependencies 43

Normalization (continued)

• Step 6:– Is the decomposition attribute preserving?– Is the decomposition dependency preserving?– Does the decomposition exhibit lossless-join

property?

• Cross-check the decomposition!

From: Umanath, N.S., & Scamell, R. (2007). Data Modeling and Database Design. Boston: Thomson Course Technology.

Summary

• The process of normalization to resolve data redundancy problems in a relation schema

• The need for dependency preservation during the normalization process

Q & A

• Questions?