5 1 orange coast college business division cs/cis department fall 2004 cis 182 introduction to...
TRANSCRIPT
5
1
Orange Coast CollegeBusiness Division
CS/CIS DepartmentFall 2004 CIS 182
Introduction to Database Concepts
InstructorDr. Martha Malaty
Text & Original PresentationsDatabase Systems: Design, Implementation, and
Management, Sixth Edition, Rob and Coronel, 2004
5
2
Chapter 5
Normalization of Database Tables
Database Systems: Design, Implementation, and Management,
Sixth Edition, Rob and Coronel
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
3
In this chapter, you will learn:
• What normalization is and what role it plays in the database design process
• About the normal forms 1NF, 2NF, 3NF, BCNF, and 4NF
• How normal forms can be transformed from lower normal forms to higher normal forms
• That normalization and ER modeling are used concurrently to produce a good database design
• That some situations require denormalization to generate information efficiently
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
4
Database Tables & Normalization
• Good table structure is essential for good DB design• Normalization is the way to reach good table
structure• Normalization is the process for assigning attributes
to entities through evaluating and correcting table structures– Reduces data redundancies– Helps eliminate data anomalies– Produces controlled redundancies to link tables– Works through sequence of stages (normal forms)
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
5
Database Tables & Normalization
• Normalization works through a series of stages called normal forms:
• First normal form (1NF)• Second normal form (2NF)• Third normal form (3NF)• Fourth normal form (4NF)• . . .
• 2NF is better than 1NF; 3NF is better than 2NF; …• For most business database design purposes, 3NF is
highest we need to go in the normalization process• Highest level of normalization is not always most
desirable
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
6
Tutorials Websites On Normalization
• What do you need to know about
• Webopedia
• Rules of data normalization
• 3NF Tutorial
• ServerWatch
• VBMySQL.com
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
7
Unnormalized vs. Denormalized
• Unnormalized data:
– No ER model is developed when creating the
database
• Denormalized data– Starting with normalized data, adding redundancy for
better performance
• We cannot denormalize unnormalized data
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
8
The Need for Normalization
• Example: A company manages building projects– Charges its clients by billing hours spent on each
contract– Hourly billing rate is dependent on employee’s
position– Periodically, a report is generated that contains
information displayed in Table 5.1– See RPT_FORMAT table in ConstructCo.mdb
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
9
A Sample Report Layout
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
10
A Table in the Report Format
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
11
Example: Construction company
• Three major problems:– PROJ_NUM contains repeating groups
– Table entries invite data inconsistencies
– Table displays data anomalies
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
12
Construction Company Problems
• PROJ_NUM contains “repeating groups”– Several data entries with NULL entries indicating that all entries
belong to the same PROJ_NUM– PRO_NUM is intended to be primary key– Primary key can’t be NULL
• Relational table can’t contain repeating groups
Repeating groups
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
13
Construction Company Problems
• Table entries invite data inconsistencies
– Elect. Engineer could mistakenly be entered in another
abbreviated form (e.g. Elect.Eng, EE, …)
Possible inconsistency
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
14
Construction Company Problems• Table displays data anomalies
– Update anomalies• Modifying JOB_CLASS needs several alterations
– Insertion anomalies• New employee must be assigned project
– Deletion anomalies• If employee is deleted, other vital data might be lost
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
15
Construction Company Problems
• Structure of data set in Fig. 5.1 (
ConstructCo.mdb) does not handle data very well• The table structure appears to work; report is
generated with ease• Unfortunately, the report may yield different results,
depending on what data anomaly has occurred
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
16
Levels of Normalizations
1NF
2NF
3NF
BCNF
4NF
5NF
Un-normalized)
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
17
Conversion to First Normal Form
• Repeating group– Derives its name from the fact that a group of
multiple (related) entries can exist for any single key attribute occurrence
• Relational table must not contain repeating groups
• Normalizing the table structure will reduce these data redundancies
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
18
Conversion to First Normal Form
• Normalization is three-step procedure
1. Eliminate Repeating Groups
2. Identify the Primary Key
3. Identify all Dependencies
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
19
Step 1: Eliminate Repeating Groups
• Present data in a tabular format, where each cell has a single value and there are no repeating groups
• Eliminate repeating groups by eliminating nulls, making sure that each repeating group attribute contains an appropriate data value
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
20
Data Organization: 1NF
• See DATA_ORG_INF
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
21
Step 2: Identify the Primary Key
• Primary key must uniquely identify attribute
values• New key must be composed• What would be the primary key for the
DATA_ORG_INF table?
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
22
Step 3: Identify all Dependencies
• Dependencies can be depicted with the help
of a “Dependency Diagram”– Depicts all dependencies found within a given
table structure– Helpful in getting bird’s-eye view of all
relationships among a table’s attributes– Makes it much less likely that an important
dependency will be overlooked
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
23
Dependency Diagram: 1NF
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
24
First Normal Form
• Tabular format in which:– All key attributes are defined– There are no repeating groups in the table– All attributes are dependent on primary key
• All relational tables satisfy 1NF requirements• Some tables contain “Partial dependencies”
– Dependencies based on only part of the primary key
– Sometimes used for performance reasons, but should be used with caution
– Still subject to data redundancies
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
25
Conversion to Second Normal Form
• Relational database design can be
improved by converting the database into
second normal form (2NF)• Two steps
1. Identify All key components
2. Identify the dependent attributes
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
26
Step 1: Identify All Key Components
• Write each key component on separate line, and then write the original (composite) key on the last line
• Each component will become the key in a new table
• Example:– Proj_Num – Emp_Num – Proj_Num, Emp_Num
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
27
Step 2: Identify Dependent Attributes
• Determine which attributes are dependent on which other attributes
• At this point, most anomalies have been eliminated
• Example:– Proj_Num Proj_name– Emp_Num Emp_Name, Job_Class, Chg_Hour– Proj_Num, Emp_Num Hours
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
28
Second Normal Form (2NF) Conversion Results
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
29
Second Normal Form
• Table is in second normal form (2NF) if:– It is in 1NF and – It includes no partial dependencies:
• No attribute is dependent on only a portion of
the primary key
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
30
Conversion to Third Normal Form
• Three steps1. Identify each new determinant
2. Identify the dependent attributes
3. Remove the dependent attributes from transitive
dependencies
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
31
Step 1: Identify New Determinants
• Determinant – Any attribute whose value determines other
values within a row• For every transitive dependency, write its
determinant as a PK for a new table• Example
– Job_Class
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
32
Step 2: Identify Dependent Attributes
• Identify the attributes dependent on each
determinant identified in Step 1 and identify
the dependency• Name the table to reflect its contents and
function• Example
– Job_Class Chg_Hour
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
33
Step 3: Remove Dependent Attributes from Transitive Dependencies
• Eliminate all dependent attributes in transitive relationship(s) from each table that has such a transitive relationship
• Draw a new dependency diagram to show all tables defined in Steps 1–3
• Check new tables and modified tables from Step 3 to make sure that each has a determinant and does not contain inappropriate dependencies
• Example:– Proj_Num Proj_name– Emp_Num Emp_Name, Job_Class– Proj_Num, Emp_Num Hours– Job_Class Chg_Hour
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
34
Third Normal Form (3NF) Conversion Results
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
35
Third Normal Form
• A table is in third normal form (3NF) if:– It is in 2NF and – It contains no transitive dependencies
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
36
Improving the Design
• Table structures are cleaned up to eliminate
the troublesome initial partial and transitive
dependencies• Normalization cannot, by itself, be relied on
to make good designs• It is valuable because its use helps eliminate
data redundancies
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
37
Improving the Design (continued)
• The following changes were (or should be) made: – PK assignment– Naming conventions– Attribute atomicity– Adding attributes – Adding relationships– Refining PKs– Maintaining historical accuracy– Using derived attributes
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
38
PK assignment
• Addition of JOB_CODE attribute greatly decreases the likelihood of referential integrity violation
• Example: – One would write a job class as “DB Designer” and
another as “Database Designer” although both refer to the same class
• Problem: The new key produces transitive dependency– JOB_CODE -> JOB_CLASS, CHG_HOUR– JOB_CLASS -> CHG_HOUR
• The effect of transitive dependency is less problematic than integrity violation
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
39
Naming conventions
• Adhere to naming convention by having the table
name as the first prefix of the attribute name
• Example:
– CHG_HOUR is changed to JOB_CHG_HOUR
– JOB_CLASS should be changed to
JOB_DESCRIPTION to better describe the entity
– HOURS is changed to ASSIGNED_HOURS to
associate it with the ASSIGN table
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
40
Attribute atomicity
• Atomic attribute:
– Attribute that cannot be further divided
– Gains querying flexibility
• Example: EMP_NAME is not atomic
– Use EMP_LNAME, EMP_FNAME, EMP_INITIAL
instead
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
41
Adding Attributes
• Adding attributes that are needed in the real-world
environment
• Example: EMP_HIREDATE can be used to track
employee’s “job longevity” to award bonuses to long-
term employees & other morale enhancing measures
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
42
Adding Relationships
• To avoid unnecessary data duplication• Example:
– To be able to supply detailed information about each project’s manager, use EMP_NUM as a FK in PROJECT table
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
43
Refining PKs
• If an employee makes two or more entries for the same project,
replace EMP_NUM & PROJ_NUM as a combined key, by a
system-assigned, surrogate key, ASIGN_NUM, in the ASSIGN
table
• Surrogate key:
– At the implementation level, a system-defined attribute that
is created and managed by the DBMS.
– It is automatically incremented for each row.
– In Oracle for example we use the "sequence" object for
surrogate keys.
• EMP_NUM & PROJ_NUM can still be used as FK’s
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
44
Maintaining historical accuracy
• Writing the job charge per hour in ASSIGN table is
crucial to maintain historical accuracy, in case that the
job charge per hour changes over time
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
45
Using derived attributes
• Derived attributes can be calculated when needed
• Storing a derived attribute makes it easier to write
application SW to produce the desired results and
saves reporting time
• ASSIGN_CHARGE is the result of multiplying
ASSIGN_HOURS by ASSIGN_CHG_HOUR
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
46
The Completed Database• See Ch05_ConstructCo.mdb
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
47
The Completed Database (continued)• See Ch05_ConstructCo.mdb
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
48
The Completed Database (continued)• See Ch05_ConstructCo.mdb
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
49
Limitations on System-Assigned Keys
• System-assigned (surrogate) primary key may not prevent confusing entries
• Example:– JOB_CODE attribute was designed to be a PK of JOB
table– This does not prevent duplicating existing records with
different PK values– Data entries in Table 5.2 are inappropriate because they
duplicate existing records– Yet there has been no violation of either entity integrity or
referential integrity
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
50
The Boyce-Codd Normal Form (BCNF)
• Google search• About.com
– A relation is in Boyce-Codd Normal Form (BCNF) if every determinant is a candidate key.
• Candidate key– Has same characteristics as primary key, but for some reason,
not chosen to be primary key• If a table contains only one candidate key, the 3NF and the
BCNF are equivalent• BCNF can be violated only if the table contains more than one
candidate key• Most designers consider the BCNF as a special case of 3NF
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
51
The Boyce-Codd Normal Form (BCNF) (continued)
• A table is in 3NF if it is in 2NF and there are no transitive dependencies
• A table can be in 3NF and not be in BCNF– A transitive dependency exists when one nonprime
attribute is dependent on another nonprime attribute– A nonkey attribute is the determinant of a key
attribute• How to convert a 3NF into BCNF?
– Decompose the tables according to the functional dependencies
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
52
A Table That is in 3NF but not in BCNF
• Here:– No partial dependencies– No transitive dependencies– But a nonkey attribute determines a key attribute
• A + B C, D• C B
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
53
Decomposition to BCNF
• To convert the table into BCNF– First change the PK into A + C; partial dependency is
generated– Next Decompose the tables to remove partial
dependency
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
54
Sample Data for a BCNF Conversion
• Each CLASS_CODE identifies a class uniquely
• A student can take many classes
• A staff member can teach many classes– STU_ID + STAFF_ID CLASS_CODE, ENROLL_GRADE
– CLASS_CODE STAFF_ID
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
55
Sample Data for a BCNF Conversion
• STU_ID + STAFF_ID CLASS_CODE, ENROLL_GRADE• CLASS_CODE STAFF_ID• Already in 3NF but has Some anomalies
– Update anomalies• If a different staff member is assigned to teach class 32456, 2 rows will
require updating– Deletion anomalies
• If student 135 drops class 28458, we loose information about who taught that class
• Solution– Decompose table structure– Resulting tables are in 3NF & BCNF
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
56
Another BCNF Decomposition
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
57
Normalization and Database Design
• Normalization should be part of design process
• Make sure that proposed entities meet required
normal form before table structures are created
• Many real-world databases have been improperly
designed or burdened with anomalies if improperly
modified during course of time
• You may be asked to redesign and modify existing
databases
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
58
Normalization and Database Design (continued)
• ER diagram
– Provides the big picture, or macro view, of an
organization’s data requirements and operations– Created through an iterative process
• Identifying relevant entities, their attributes and their
relationship• Use results to identify additional entities and attributes
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
59
Normalization and Database Design (continued)
• Normalization procedures – Focus on the characteristics of specific entities– A micro view of the entities within the ER diagram– May yield additional entities/attributes to be added to the
ERD
• Difficult to separate normalization process from ER modeling process
• Two techniques should be used concurrently
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
60
Example: Contracting Company
• Operations/Business Rules:– Company manages many projects– Each project requires services of many employees– An employee may be assigned many projects– Some employees might not have any assigned
projects– Every employee has a job classification that
determines the hourly billing rate– Many employees can have the same job
classification
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
61
Example: Contracting Company
• The Initial ERD– PROJECT is in 3NF– EMPLOYEE: Contains transitive dependency
• JOB_DESCRIPTION defines the job’s charge per hour• Classifications determine billing rate
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
62
Example: Contracting Company
• The Modified ERD; removing transitive dependency• Question:
– Which type of relationship exist between EMPLOYEE & PROJECT?
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
63
The Incorrect Representation of a M:N Relationship
• M:N between EMPLOYEE & PROJECT cannot be implemented– Need further table decomposition
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
64
Final (Implementable) ERD for a Contracting Company
• After table decomposition
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
65
The Implemented Database for the Contracting Company
• See Ch05_ConstructCo.mdb
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
66
Higher-Level Normal Forms
• In some databases, multiple multivalued attributes exist
• Fourth Normal Form (4NF)– Table is in 3NF
– No multiple sets of multivalued dependencies
• Fifth Normal Form (5NF, Domain Key NF)– Not likely in business environment
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
67
Tables with Multivalued Dependencies
• The tables contain two sets of independent multi-valued dependencies– ORG_CODE &
ASSIGN_NUM may have different values for the same employee
– One employee can have many service entries and many assignment to different organizations
– There is no viable candidate key
• EMP_NUM values are not unique, nor any other combination, since some of them contain nulls
• Version 3 is in 3NF, but has many redundancies
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
68
Fourth Normal Form
• Table is in fourth normal form (4NF) if– It is in 3NF or BCNF– Has no multiple sets of multivalued dependencies
• 4NF is largely academic if tables conform to the following two rules:– All attributes are dependent on primary key but
independent of each other– No row contains two or more multivalued facts about
an entity
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
69
A Set of Tables in 4NF
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
70
Denormalization
• Creation of normalized relations is important database design goal
• Processing requirements should also be a goal• If tables decomposed to conform to normalization
requirements– Number of database tables expands
• Joining larger number of tables takes additional disk input/output (I/O) operations and processing logic – Reduces system speed
• Conflicts among design efficiency, information requirements, and processing speed are often resolved through compromises that may include denormalization
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
71
Denormalization (continued)
• Unnormalized tables in a production database tend to have these defects:– Data updates are less efficient because
programs that read and update tables must deal with larger tables
– Indexing is much more cumbersome– Unnormalized tables yield no simple strategies
for creating virtual tables known as views• Use denormalization cautiously • Understand why—under some circumstances
—unnormalized tables are a better choice
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
72
Summary
• Normalization is a table design technique aimed at minimizing data redundancies
• First three normal forms (1NF, 2NF, and 3NF) are most commonly encountered
• Normalization is an important part—but only a part—of the design process
• Continue the iterative ER process until all entities and their attributes are defined and all equivalent tables are in 3NF
Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
5
73
Summary (continued)
• A table in 3NF may contain multivalued dependencies that produce either numerous null values or redundant data
• It may be necessary to convert a 3NF table to the fourth normal form (4NF) by– splitting such a table to remove multivalued
dependencies• Tables are sometimes denormalized to yield
less I/O which increases processing speed