5 1 orange coast college business division cs/cis department fall 2004 cis 182 introduction to...

73
5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original Presentations Database Systems: Design, Implementation, and Management, Sixth Edition, Rob and Coronel, 2004

Upload: evangeline-mccoy

Post on 27-Dec-2015

215 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

5

1

Orange Coast CollegeBusiness Division

CS/CIS DepartmentFall 2004 CIS 182

Introduction to Database Concepts

InstructorDr. Martha Malaty

Text & Original PresentationsDatabase Systems: Design, Implementation, and

Management, Sixth Edition, Rob and Coronel, 2004

Page 2: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

5

2

Chapter 5

Normalization of Database Tables

Database Systems: Design, Implementation, and Management,

Sixth Edition, Rob and Coronel

Page 3: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

3

In this chapter, you will learn:

• What normalization is and what role it plays in the database design process

• About the normal forms 1NF, 2NF, 3NF, BCNF, and 4NF

• How normal forms can be transformed from lower normal forms to higher normal forms

• That normalization and ER modeling are used concurrently to produce a good database design

• That some situations require denormalization to generate information efficiently

Page 4: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

4

Database Tables & Normalization

• Good table structure is essential for good DB design• Normalization is the way to reach good table

structure• Normalization is the process for assigning attributes

to entities through evaluating and correcting table structures– Reduces data redundancies– Helps eliminate data anomalies– Produces controlled redundancies to link tables– Works through sequence of stages (normal forms)

Page 5: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

5

Database Tables & Normalization

• Normalization works through a series of stages called normal forms:

• First normal form (1NF)• Second normal form (2NF)• Third normal form (3NF)• Fourth normal form (4NF)• . . .

• 2NF is better than 1NF; 3NF is better than 2NF; …• For most business database design purposes, 3NF is

highest we need to go in the normalization process• Highest level of normalization is not always most

desirable

Page 6: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

6

Tutorials Websites On Normalization

• What do you need to know about

• Webopedia

• Rules of data normalization

• 3NF Tutorial

• ServerWatch

• VBMySQL.com

Page 7: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

7

Unnormalized vs. Denormalized

• Unnormalized data:

– No ER model is developed when creating the

database

• Denormalized data– Starting with normalized data, adding redundancy for

better performance

• We cannot denormalize unnormalized data

Page 8: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

8

The Need for Normalization

• Example: A company manages building projects– Charges its clients by billing hours spent on each

contract– Hourly billing rate is dependent on employee’s

position– Periodically, a report is generated that contains

information displayed in Table 5.1– See RPT_FORMAT table in ConstructCo.mdb

Page 9: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

9

A Sample Report Layout

Page 10: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

10

A Table in the Report Format

Page 11: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

11

Example: Construction company

• Three major problems:– PROJ_NUM contains repeating groups

– Table entries invite data inconsistencies

– Table displays data anomalies

Page 12: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

12

Construction Company Problems

• PROJ_NUM contains “repeating groups”– Several data entries with NULL entries indicating that all entries

belong to the same PROJ_NUM– PRO_NUM is intended to be primary key– Primary key can’t be NULL

• Relational table can’t contain repeating groups

Repeating groups

Page 13: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

13

Construction Company Problems

• Table entries invite data inconsistencies

– Elect. Engineer could mistakenly be entered in another

abbreviated form (e.g. Elect.Eng, EE, …)

Possible inconsistency

Page 14: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

14

Construction Company Problems• Table displays data anomalies

– Update anomalies• Modifying JOB_CLASS needs several alterations

– Insertion anomalies• New employee must be assigned project

– Deletion anomalies• If employee is deleted, other vital data might be lost

Page 15: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

15

Construction Company Problems

• Structure of data set in Fig. 5.1 (

ConstructCo.mdb) does not handle data very well• The table structure appears to work; report is

generated with ease• Unfortunately, the report may yield different results,

depending on what data anomaly has occurred

Page 16: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

16

Levels of Normalizations

1NF

2NF

3NF

BCNF

4NF

5NF

Un-normalized)

Page 17: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

17

Conversion to First Normal Form

• Repeating group– Derives its name from the fact that a group of

multiple (related) entries can exist for any single key attribute occurrence

• Relational table must not contain repeating groups

• Normalizing the table structure will reduce these data redundancies

Page 18: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

18

Conversion to First Normal Form

• Normalization is three-step procedure

1. Eliminate Repeating Groups

2. Identify the Primary Key

3. Identify all Dependencies

Page 19: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

19

Step 1: Eliminate Repeating Groups

• Present data in a tabular format, where each cell has a single value and there are no repeating groups

• Eliminate repeating groups by eliminating nulls, making sure that each repeating group attribute contains an appropriate data value

Page 20: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

20

Data Organization: 1NF

• See DATA_ORG_INF

Page 21: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

21

Step 2: Identify the Primary Key

• Primary key must uniquely identify attribute

values• New key must be composed• What would be the primary key for the

DATA_ORG_INF table?

Page 22: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

22

Step 3: Identify all Dependencies

• Dependencies can be depicted with the help

of a “Dependency Diagram”– Depicts all dependencies found within a given

table structure– Helpful in getting bird’s-eye view of all

relationships among a table’s attributes– Makes it much less likely that an important

dependency will be overlooked

Page 23: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

23

Dependency Diagram: 1NF

Page 24: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

24

First Normal Form

• Tabular format in which:– All key attributes are defined– There are no repeating groups in the table– All attributes are dependent on primary key

• All relational tables satisfy 1NF requirements• Some tables contain “Partial dependencies”

– Dependencies based on only part of the primary key

– Sometimes used for performance reasons, but should be used with caution

– Still subject to data redundancies

Page 25: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

25

Conversion to Second Normal Form

• Relational database design can be

improved by converting the database into

second normal form (2NF)• Two steps

1. Identify All key components

2. Identify the dependent attributes

Page 26: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

26

Step 1: Identify All Key Components

• Write each key component on separate line, and then write the original (composite) key on the last line

• Each component will become the key in a new table

• Example:– Proj_Num – Emp_Num – Proj_Num, Emp_Num

Page 27: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

27

Step 2: Identify Dependent Attributes

• Determine which attributes are dependent on which other attributes

• At this point, most anomalies have been eliminated

• Example:– Proj_Num Proj_name– Emp_Num Emp_Name, Job_Class, Chg_Hour– Proj_Num, Emp_Num Hours

Page 28: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

28

Second Normal Form (2NF) Conversion Results

Page 29: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

29

Second Normal Form

• Table is in second normal form (2NF) if:– It is in 1NF and – It includes no partial dependencies:

• No attribute is dependent on only a portion of

the primary key

Page 30: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

30

Conversion to Third Normal Form

• Three steps1. Identify each new determinant

2. Identify the dependent attributes

3. Remove the dependent attributes from transitive

dependencies

Page 31: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

31

Step 1: Identify New Determinants

• Determinant – Any attribute whose value determines other

values within a row• For every transitive dependency, write its

determinant as a PK for a new table• Example

– Job_Class

Page 32: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

32

Step 2: Identify Dependent Attributes

• Identify the attributes dependent on each

determinant identified in Step 1 and identify

the dependency• Name the table to reflect its contents and

function• Example

– Job_Class Chg_Hour

Page 33: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

33

Step 3: Remove Dependent Attributes from Transitive Dependencies

• Eliminate all dependent attributes in transitive relationship(s) from each table that has such a transitive relationship

• Draw a new dependency diagram to show all tables defined in Steps 1–3

• Check new tables and modified tables from Step 3 to make sure that each has a determinant and does not contain inappropriate dependencies

• Example:– Proj_Num Proj_name– Emp_Num Emp_Name, Job_Class– Proj_Num, Emp_Num Hours– Job_Class Chg_Hour

Page 34: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

34

Third Normal Form (3NF) Conversion Results

Page 35: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

35

Third Normal Form

• A table is in third normal form (3NF) if:– It is in 2NF and – It contains no transitive dependencies

Page 36: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

36

Improving the Design

• Table structures are cleaned up to eliminate

the troublesome initial partial and transitive

dependencies• Normalization cannot, by itself, be relied on

to make good designs• It is valuable because its use helps eliminate

data redundancies

Page 37: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

37

Improving the Design (continued)

• The following changes were (or should be) made: – PK assignment– Naming conventions– Attribute atomicity– Adding attributes – Adding relationships– Refining PKs– Maintaining historical accuracy– Using derived attributes

Page 38: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

38

PK assignment

• Addition of JOB_CODE attribute greatly decreases the likelihood of referential integrity violation

• Example: – One would write a job class as “DB Designer” and

another as “Database Designer” although both refer to the same class

• Problem: The new key produces transitive dependency– JOB_CODE -> JOB_CLASS, CHG_HOUR– JOB_CLASS -> CHG_HOUR

• The effect of transitive dependency is less problematic than integrity violation

Page 39: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

39

Naming conventions

• Adhere to naming convention by having the table

name as the first prefix of the attribute name

• Example:

– CHG_HOUR is changed to JOB_CHG_HOUR

– JOB_CLASS should be changed to

JOB_DESCRIPTION to better describe the entity

– HOURS is changed to ASSIGNED_HOURS to

associate it with the ASSIGN table

Page 40: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

40

Attribute atomicity

• Atomic attribute:

– Attribute that cannot be further divided

– Gains querying flexibility

• Example: EMP_NAME is not atomic

– Use EMP_LNAME, EMP_FNAME, EMP_INITIAL

instead

Page 41: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

41

Adding Attributes

• Adding attributes that are needed in the real-world

environment

• Example: EMP_HIREDATE can be used to track

employee’s “job longevity” to award bonuses to long-

term employees & other morale enhancing measures

Page 42: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

42

Adding Relationships

• To avoid unnecessary data duplication• Example:

– To be able to supply detailed information about each project’s manager, use EMP_NUM as a FK in PROJECT table

Page 43: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

43

Refining PKs

• If an employee makes two or more entries for the same project,

replace EMP_NUM & PROJ_NUM as a combined key, by a

system-assigned, surrogate key, ASIGN_NUM, in the ASSIGN

table

• Surrogate key:

– At the implementation level, a system-defined attribute that

is created and managed by the DBMS.

– It is automatically incremented for each row.

– In Oracle for example we use the "sequence" object for

surrogate keys.

• EMP_NUM & PROJ_NUM can still be used as FK’s

Page 44: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

44

Maintaining historical accuracy

• Writing the job charge per hour in ASSIGN table is

crucial to maintain historical accuracy, in case that the

job charge per hour changes over time

Page 45: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

45

Using derived attributes

• Derived attributes can be calculated when needed

• Storing a derived attribute makes it easier to write

application SW to produce the desired results and

saves reporting time

• ASSIGN_CHARGE is the result of multiplying

ASSIGN_HOURS by ASSIGN_CHG_HOUR

Page 46: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

46

The Completed Database• See Ch05_ConstructCo.mdb

Page 47: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

47

The Completed Database (continued)• See Ch05_ConstructCo.mdb

Page 48: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

48

The Completed Database (continued)• See Ch05_ConstructCo.mdb

Page 49: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

49

Limitations on System-Assigned Keys

• System-assigned (surrogate) primary key may not prevent confusing entries

• Example:– JOB_CODE attribute was designed to be a PK of JOB

table– This does not prevent duplicating existing records with

different PK values– Data entries in Table 5.2 are inappropriate because they

duplicate existing records– Yet there has been no violation of either entity integrity or

referential integrity

Page 50: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

50

The Boyce-Codd Normal Form (BCNF)

• Google search• About.com

– A relation is in Boyce-Codd Normal Form (BCNF) if every determinant is a candidate key.

• Candidate key– Has same characteristics as primary key, but for some reason,

not chosen to be primary key• If a table contains only one candidate key, the 3NF and the

BCNF are equivalent• BCNF can be violated only if the table contains more than one

candidate key• Most designers consider the BCNF as a special case of 3NF

Page 51: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

51

The Boyce-Codd Normal Form (BCNF) (continued)

• A table is in 3NF if it is in 2NF and there are no transitive dependencies

• A table can be in 3NF and not be in BCNF– A transitive dependency exists when one nonprime

attribute is dependent on another nonprime attribute– A nonkey attribute is the determinant of a key

attribute• How to convert a 3NF into BCNF?

– Decompose the tables according to the functional dependencies

Page 52: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

52

A Table That is in 3NF but not in BCNF

• Here:– No partial dependencies– No transitive dependencies– But a nonkey attribute determines a key attribute

• A + B C, D• C B

Page 53: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

53

Decomposition to BCNF

• To convert the table into BCNF– First change the PK into A + C; partial dependency is

generated– Next Decompose the tables to remove partial

dependency

Page 54: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

54

Sample Data for a BCNF Conversion

• Each CLASS_CODE identifies a class uniquely

• A student can take many classes

• A staff member can teach many classes– STU_ID + STAFF_ID CLASS_CODE, ENROLL_GRADE

– CLASS_CODE STAFF_ID

Page 55: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

55

Sample Data for a BCNF Conversion

• STU_ID + STAFF_ID CLASS_CODE, ENROLL_GRADE• CLASS_CODE STAFF_ID• Already in 3NF but has Some anomalies

– Update anomalies• If a different staff member is assigned to teach class 32456, 2 rows will

require updating– Deletion anomalies

• If student 135 drops class 28458, we loose information about who taught that class

• Solution– Decompose table structure– Resulting tables are in 3NF & BCNF

Page 56: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

56

Another BCNF Decomposition

Page 57: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

57

Normalization and Database Design

• Normalization should be part of design process

• Make sure that proposed entities meet required

normal form before table structures are created

• Many real-world databases have been improperly

designed or burdened with anomalies if improperly

modified during course of time

• You may be asked to redesign and modify existing

databases

Page 58: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

58

Normalization and Database Design (continued)

• ER diagram

– Provides the big picture, or macro view, of an

organization’s data requirements and operations– Created through an iterative process

• Identifying relevant entities, their attributes and their

relationship• Use results to identify additional entities and attributes

Page 59: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

59

Normalization and Database Design (continued)

• Normalization procedures – Focus on the characteristics of specific entities– A micro view of the entities within the ER diagram– May yield additional entities/attributes to be added to the

ERD

• Difficult to separate normalization process from ER modeling process

• Two techniques should be used concurrently

Page 60: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

60

Example: Contracting Company

• Operations/Business Rules:– Company manages many projects– Each project requires services of many employees– An employee may be assigned many projects– Some employees might not have any assigned

projects– Every employee has a job classification that

determines the hourly billing rate– Many employees can have the same job

classification

Page 61: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

61

Example: Contracting Company

• The Initial ERD– PROJECT is in 3NF– EMPLOYEE: Contains transitive dependency

• JOB_DESCRIPTION defines the job’s charge per hour• Classifications determine billing rate

Page 62: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

62

Example: Contracting Company

• The Modified ERD; removing transitive dependency• Question:

– Which type of relationship exist between EMPLOYEE & PROJECT?

Page 63: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

63

The Incorrect Representation of a M:N Relationship

• M:N between EMPLOYEE & PROJECT cannot be implemented– Need further table decomposition

Page 64: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

64

Final (Implementable) ERD for a Contracting Company

• After table decomposition

Page 65: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

65

The Implemented Database for the Contracting Company

• See Ch05_ConstructCo.mdb

Page 66: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

66

Higher-Level Normal Forms

• In some databases, multiple multivalued attributes exist

• Fourth Normal Form (4NF)– Table is in 3NF

– No multiple sets of multivalued dependencies

• Fifth Normal Form (5NF, Domain Key NF)– Not likely in business environment

Page 67: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

67

Tables with Multivalued Dependencies

• The tables contain two sets of independent multi-valued dependencies– ORG_CODE &

ASSIGN_NUM may have different values for the same employee

– One employee can have many service entries and many assignment to different organizations

– There is no viable candidate key

• EMP_NUM values are not unique, nor any other combination, since some of them contain nulls

• Version 3 is in 3NF, but has many redundancies

Page 68: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

68

Fourth Normal Form

• Table is in fourth normal form (4NF) if– It is in 3NF or BCNF– Has no multiple sets of multivalued dependencies

• 4NF is largely academic if tables conform to the following two rules:– All attributes are dependent on primary key but

independent of each other– No row contains two or more multivalued facts about

an entity

Page 69: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

69

A Set of Tables in 4NF

Page 70: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

70

Denormalization

• Creation of normalized relations is important database design goal

• Processing requirements should also be a goal• If tables decomposed to conform to normalization

requirements– Number of database tables expands

• Joining larger number of tables takes additional disk input/output (I/O) operations and processing logic – Reduces system speed

• Conflicts among design efficiency, information requirements, and processing speed are often resolved through compromises that may include denormalization

Page 71: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

71

Denormalization (continued)

• Unnormalized tables in a production database tend to have these defects:– Data updates are less efficient because

programs that read and update tables must deal with larger tables

– Indexing is much more cumbersome– Unnormalized tables yield no simple strategies

for creating virtual tables known as views• Use denormalization cautiously • Understand why—under some circumstances

—unnormalized tables are a better choice

Page 72: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

72

Summary

• Normalization is a table design technique aimed at minimizing data redundancies

• First three normal forms (1NF, 2NF, and 3NF) are most commonly encountered

• Normalization is an important part—but only a part—of the design process

• Continue the iterative ER process until all entities and their attributes are defined and all equivalent tables are in 3NF

Page 73: 5 1 Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original

Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

5

73

Summary (continued)

• A table in 3NF may contain multivalued dependencies that produce either numerous null values or redundant data

• It may be necessary to convert a 3NF table to the fourth normal form (4NF) by– splitting such a table to remove multivalued

dependencies• Tables are sometimes denormalized to yield

less I/O which increases processing speed