Normalization
• Well structured relations and anomalies
• Normalization
• First normal form (1NF)
• Functional dependence
• Partial functional dependency
• Second normal form (2NF)
• Transitive functional dependency
• Third normal form (3NF)
• Practical consideration
A well-structured relation:
• Contains a minimum amount of redundancy
• Allows users to modify, insert and delete the rows in a table without errors or inconsistencies
EMPLOYEE1
EMP ID  NAME              DEPT        SALARY
100     Margaret Simpson  Marketing   42000
140     Allen Beeton      Accounting  39000
234     Christina Lucero  Finance     53000
356     Lorenzo Davis     Sales       45000
209     Susan Martin      Accounting  46000
- EMPLOYEE1 is a well-structured relation.
- Any modification to an employee’s data, such as a change in salary, is confined to one row of the table.
A well structured relation
EMP ID  NAME              DEPT        SALARY  COURSE  DATE
100     Margaret Simpson  Marketing   42000   101     5/9/94
100     Margaret Simpson  Marketing   42000   103     5/3/93
140     Allen Beeton      Accounting  39000   333     8/3/95
140     Allen Beeton      Accounting  39000   342     5/3/94
234     Christina Lucero  Finance     53000   111     3/5/95
356     Lorenzo Davis     Sales       45000   244     4/6/95
209     Susan Martin      Accounting  46000   235     4/2/97
- This table has a considerable amount of redundancy,
e.g. EMP ID, NAME, DEPT and SALARY appear in two separate rows for some employees.
- If the salary of one of those employees changes, we must record this information in two or more rows.
- Therefore, this is not a well-structured relation.
EMPLOYEE2
Is this a well structured relation?
• Redundancies in a table may result in errors and inconsistencies (called anomalies) when a user attempts to update the data in the table
• Three types of anomalies
– Insertion anomaly
– Deletion anomaly
– Modification anomaly
Why minimize redundancies?
• If we want to add a new employee to EMPLOYEE2, the user must supply values for both EMPID and COURSE.
• This is because primary key values cannot be NULL.
• In reality, the user should be able to enter employee data without supplying course data.
Insertion anomaly
Deletion anomaly
• If the data for employee number 234 is deleted, we will also lose the information that this employee completed course 111.
• In fact, because employee 234 is the only employee who completed course 111, we lose information about that course altogether.
Modification anomaly
• Suppose that employee number 100 gets a salary increase. We must record this increase in each of the rows for that employee.
• Otherwise, the data will be inconsistent.
• Normalization is a process for converting complex data structures into simple, stable data structures (E. F. Codd, 1970).
• The objectives of the normalization process are:
– to eliminate certain kinds of data redundancy, and
– to avoid certain anomalies.
• Normalization is accomplished in stages.
• A normal form is a state of a relation that can be determined by applying simple rules regarding dependencies (or relationships between attributes)
Normalization
• Every attribute in each record contains only one value, i.e. a table contains NO REPEATING GROUPS!
• By definition, a relation is already (at least) in 1NF.
• A table with repeating groups is converted to a relation in first normal form by extending the data in each column to fill the cells that are empty because of the repeating group structure.
First Normal Form (1NF)
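The fill-down rule just described can be sketched in Python. This is a minimal illustration, not part of the original material: it assumes the unnormalized table is held as a list of rows in which `None` marks a cell left empty by the repeating group, and `to_1nf` is a hypothetical helper name. The sample rows are the STUDENT data used in this section.

```python
from typing import Optional

# Hypothetical representation: each row is a list of cell values, None = empty cell.
Row = list[Optional[str]]


def to_1nf(rows: list[Row], fill_columns: list[int]) -> list[Row]:
    """Fill empty cells in fill_columns with the nearest non-empty value above."""
    result: list[Row] = []
    last_seen: dict[int, Optional[str]] = {}
    for row in rows:
        new_row = list(row)  # copy so the input table is not modified
        for col in fill_columns:
            if new_row[col] is None:
                new_row[col] = last_seen.get(col)  # extend the value downwards
            else:
                last_seen[col] = new_row[col]  # remember it for the following rows
        result.append(new_row)
    return result


# Unnormalized STUDENT table: Student_ID, Sname, GPA, CourseID, Cname, InstructorID, Iname.
# CourseID..Iname form the repeating group, so the student columns are empty on follow-up rows.
student = [
    ["111223333", "Tom Smith", "3.4", "MIS310", "Intro MIS", "123456789", "Shaw"],
    [None, None, None, "MIS325", "Database Mgt", "234567891", "Kim"],
    ["222334444", "Mary Jones", "3.5", "MIS373", "Data Comm", "345678901", "Barua"],
    [None, None, None, "MIS333", "Computer Sys", "456789123", "Lee"],
    [None, None, None, "MIS325", "Database Mgt", "234567891", "Kim"],
]

for row in to_1nf(student, fill_columns=[0, 1, 2]):
    print(row)
```

The sketch only works when the rows belonging to one student are adjacent, which is how a table with repeating groups is normally laid out.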
Student_ID  Sname       GPA  CourseID  Cname         InstructorID  Iname
111223333   Tom Smith   3.4  MIS310    Intro MIS     123456789     Shaw
                             MIS325    Database Mgt  234567891     Kim
222334444   Mary Jones  3.5  MIS373    Data Comm     345678901     Barua
                             MIS333    Computer Sys  456789123     Lee
                             MIS325    Database Mgt  234567891     Kim
1NF Normalization
Student_ID  Sname       GPA  CourseID  Cname         InstructorID  Iname
111223333   Tom Smith   3.4  MIS310    Intro MIS     123456789     Shaw
111223333   Tom Smith   3.4  MIS325    Database Mgt  234567891     Kim
222334444   Mary Jones  3.5  MIS373    Data Comm     345678901     Barua
222334444   Mary Jones  3.5  MIS333    Computer Sys  456789123     Lee
222334444   Mary Jones  3.5  MIS325    Database Mgt  234567891     Kim
Student (Student_ID, Sname, GPA, CourseID, Cname, InstructorID, Iname)
Functional dependence
• A functional dependency is a particular relationship between two attributes (or sets of attributes).
• For any relation R, attribute B is functionally dependent on attribute A if, in every valid instance of R, each value of A uniquely determines the value of B.
• Represented as A -> B
• Normalization is based on the analysis of functional dependence.
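Because an FD A -> B rules out two rows that agree on A but disagree on B, sample data can be searched for counterexamples. The Python sketch below assumes rows are dictionaries keyed by attribute name; `violates_fd` is a hypothetical helper. A `True` result refutes the FD, but a `False` result does not prove it holds, only that this sample contains no counterexample.

```python
def violates_fd(rows: list[dict[str, str]], lhs: list[str], rhs: list[str]) -> bool:
    """Return True if two rows agree on lhs but differ on rhs (a counterexample to lhs -> rhs)."""
    seen: dict[tuple[str, ...], tuple[str, ...]] = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[a] for a in rhs)
        if determinant in seen and seen[determinant] != dependent:
            return True  # same A value, different B value
        seen.setdefault(determinant, dependent)
    return False


# A few EMPLOYEE2 rows, reduced to three columns for brevity.
employee2 = [
    {"EMPID": "100", "NAME": "Margaret Simpson", "COURSE": "101"},
    {"EMPID": "100", "NAME": "Margaret Simpson", "COURSE": "103"},
    {"EMPID": "140", "NAME": "Allen Beeton", "COURSE": "333"},
    {"EMPID": "140", "NAME": "Allen Beeton", "COURSE": "342"},
]

print(violates_fd(employee2, ["EMPID"], ["NAME"]))    # False: no counterexample in this sample
print(violates_fd(employee2, ["EMPID"], ["COURSE"]))  # True: EMPID 100 appears with 101 and 103
```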
Examples of functional dependency
SSN -> NAME, ADDRESS, BIRTHDATE
A person’s name, address and birthdate are functionally dependent on that person’s social security number.
VIN -> MAKE, MODEL, COLOR
The make, model and color of a vehicle are functionally dependent on the vehicle identification number.
ISBN -> TITLE
The title of a book is functionally dependent on the book’s international standard book number (ISBN).
• The attribute (or set of attributes) on the left-hand side of the arrow in a functional dependency is called a determinant.
e.g. SSN, VIN and ISBN are determinants.
Important!
• Instances (or sample data) in a relation do not prove
that a functional dependency exists.
• Only knowledge of the problem domain is a reliable
method for identifying a functional dependency
Determinant
EMPLOYEE2 = (EMPID, NAME, DEPT, SALARY, COURSE, DATE COMPLETED)
• Functional dependencies:
EMPID -> NAME, DEPT, SALARY
EMPID, COURSE -> DATE COMPLETED
• Therefore, the only candidate key (and hence the primary key) is the combination of EMPID and COURSE.
EMPLOYEE2 (EMPID, NAME, DEPT, SALARY, COURSE, DATE COMPLETED)
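Whether {EMPID, COURSE} is really the only candidate key can be checked mechanically with attribute closure: the closure of a set X is every attribute that X determines under the given FDs, and X is a candidate key when its closure is the whole relation and no proper subset of X has that property. A minimal Python sketch follows; `fd`, `closure` and `candidate_keys` are hypothetical helper names, and DATE COMPLETED is written `DATE_COMPLETED` so names can be split on commas. The brute-force search is exponential in the number of attributes, which is acceptable for a six-attribute example.

```python
from itertools import combinations

FD = tuple[frozenset[str], frozenset[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from comma-separated attribute names."""
    return frozenset(lhs.split(",")), frozenset(rhs.split(","))


def closure(attrs: set[str], fds: list[FD]) -> set[str]:
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once the whole determinant is known, its dependents are known too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes: set[str], fds: list[FD]) -> list[set[str]]:
    """Brute-force search for minimal attribute sets whose closure is the whole relation."""
    keys: list[set[str]] = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            cand = set(combo)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key: a superkey, but not minimal
            if closure(cand, fds) == attributes:
                keys.append(cand)
    return keys


EMPLOYEE2 = {"EMPID", "NAME", "DEPT", "SALARY", "COURSE", "DATE_COMPLETED"}
FDS = [fd("EMPID", "NAME,DEPT,SALARY"), fd("EMPID,COURSE", "DATE_COMPLETED")]

print(sorted(closure({"EMPID"}, FDS)))                       # ['DEPT', 'EMPID', 'NAME', 'SALARY']
print([sorted(k) for k in candidate_keys(EMPLOYEE2, FDS)])  # [['COURSE', 'EMPID']]
```

The first line of output also illustrates why EMPID is a determinant but not a candidate key: its closure misses COURSE and DATE_COMPLETED.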
• A composite key is a primary key that contains more than
one attribute.
• EMPID is a determinant but not a candidate key.
• A candidate key is always a determinant
• But a determinant is not always a candidate key.
Partial functional dependency
• A functional dependency A -> B is a partial dependency if B is functionally dependent on A and also functionally dependent on some proper subset of A.
• We need to check for partial dependencies only when the primary key is a composite key.
EMPLOYEE2 = (EMPID, NAME, DEPT, SALARY, COURSE, DATE COMPLETED)
The functional dependencies are:
EMPID, COURSE -> DATE COMPLETED
EMPID -> NAME, DEPT, SALARY
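Following the definition above, partial dependencies can be found by computing the closure of every proper subset of the composite key and keeping the nonkey attributes it reaches. A minimal Python sketch, assuming the FDs listed for EMPLOYEE2 (with DATE COMPLETED written `DATE_COMPLETED`); `fd`, `closure` and `partial_dependencies` are hypothetical helper names.

```python
from itertools import combinations

FD = tuple[frozenset[str], frozenset[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from comma-separated attribute names."""
    return frozenset(lhs.split(",")), frozenset(rhs.split(","))


def closure(attrs: set[str], fds: list[FD]) -> set[str]:
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key: set[str], nonkey: set[str], fds: list[FD]) -> dict[str, list[str]]:
    """Map each nonkey attribute to the smallest proper subset of the key that determines it."""
    found: dict[str, list[str]] = {}
    for size in range(1, len(key)):  # proper subsets only: sizes 1 .. len(key) - 1
        for subset in combinations(sorted(key), size):
            for attr in sorted(closure(set(subset), fds) & nonkey):
                found.setdefault(attr, list(subset))
    return found


FDS = [fd("EMPID,COURSE", "DATE_COMPLETED"), fd("EMPID", "NAME,DEPT,SALARY")]
key = {"EMPID", "COURSE"}
nonkey = {"NAME", "DEPT", "SALARY", "DATE_COMPLETED"}

print(partial_dependencies(key, nonkey, FDS))
# {'DEPT': ['EMPID'], 'NAME': ['EMPID'], 'SALARY': ['EMPID']}
```

DATE_COMPLETED does not appear in the result because it needs the full key. With a single-attribute key the loop has no proper subsets to try, which matches the point that only composite keys need this check.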
Second Normal Form (2NF)
• A relation is in second normal form if:
- it is in first normal form, and
- no nonkey attribute is functionally dependent on part (but not all) of the primary key, i.e. there is no partial functional dependency.
• A relation in 1NF is in 2NF if any one of the following conditions holds:
- The primary key consists of only one attribute,
- No nonkey attributes exist, or
- Every nonkey attribute is functionally dependent on the full set of primary key attributes.
Problems created by partial functional dependencies?
• Insertion anomaly
– To insert a row, we must provide values for both EMPID and COURSE
• Deletion anomaly
– If we delete a row for an employee, we lose the information that the
employee completed a course on a particular date
• Modification anomaly
– If an employee’s salary changes, we must record this change in multiple
rows (if the employee completed more than one course)
Removing partial dependencies
If a relation is not in 2NF, it can be further normalized into a number of 2NF
relations in which nonkey attributes are associated only with the part of the
primary key on which they are fully functionally dependent.
EMPLOYEE2 = (EMPID, NAME, DEPT, SALARY, COURSE, DATE COMPLETED)
EMPID, COURSE -> DATE COMPLETED and EMPID -> NAME, DEPT, SALARY
EMPLOYEE (EMPID,NAME,DEPT,SALARY)
EMPCOURSE (EMPID, COURSE, DATE COMPLETED)
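The decomposition can be checked on the sample EMPLOYEE2 rows: project them onto EMPLOYEE and EMPCOURSE, then join the projections on EMPID and compare with the original. In this Python sketch, sets of tuples stand in for relations (an illustrative assumption), and `project` is a hypothetical helper. A matching result only confirms this sample; the general guarantee comes from EMPID being the primary key of EMPLOYEE.

```python
COLUMNS = ("EMPID", "NAME", "DEPT", "SALARY", "COURSE", "DATE_COMPLETED")
EMPLOYEE2_ROWS = {
    (100, "Margaret Simpson", "Marketing", 42000, 101, "5/9/94"),
    (100, "Margaret Simpson", "Marketing", 42000, 103, "5/3/93"),
    (140, "Allen Beeton", "Accounting", 39000, 333, "8/3/95"),
    (140, "Allen Beeton", "Accounting", 39000, 342, "5/3/94"),
    (234, "Christina Lucero", "Finance", 53000, 111, "3/5/95"),
    (356, "Lorenzo Davis", "Sales", 45000, 244, "4/6/95"),
    (209, "Susan Martin", "Accounting", 46000, 235, "4/2/97"),
}


def project(rows: set[tuple], keep: list[str]) -> set[tuple]:
    """Relational projection onto the named columns; the set removes duplicate rows."""
    idx = [COLUMNS.index(c) for c in keep]
    return {tuple(r[i] for i in idx) for r in rows}


employee = project(EMPLOYEE2_ROWS, ["EMPID", "NAME", "DEPT", "SALARY"])
empcourse = project(EMPLOYEE2_ROWS, ["EMPID", "COURSE", "DATE_COMPLETED"])

# Natural join on EMPID (the primary key of EMPLOYEE) to rebuild the original rows.
employee_by_id = {e[0]: e for e in employee}
rejoined = {employee_by_id[ec[0]] + ec[1:] for ec in empcourse}

print(len(employee), len(empcourse))  # 5 7
print(rejoined == EMPLOYEE2_ROWS)     # True: no rows lost or invented
# A salary change for employee 100 now touches one EMPLOYEE row instead of two EMPLOYEE2 rows.
print(sum(1 for e in employee if e[0] == 100))  # 1
```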
Student (Student_ID, Sname, GPA, CourseID, Cname, InstructorID, Iname)
2NF
Student (Student_ID, Sname, GPA)
Course (CourseID, Cname, InstructorID, Iname)
StudentCourse (Student_ID, CourseID)
Student_ID  Sname       GPA
111223333   Tom Smith   3.4
222334444   Mary Jones  3.5
CourseID  Cname         InstructorID  Iname
MIS310    Intro MIS     123456789     Shaw
MIS325    Database Mgt  234567891     Kim
MIS333    Computer Sys  456789123     Lee
MIS373    Data Comm     345678901     Barua

Student_ID  CourseID
111223333   MIS310
111223333   MIS325
222334444   MIS373
222334444   MIS333
222334444   MIS325
Transitive dependency
• A functional dependency between two (or more) nonkey attributes.
• Formally: if X -> Y and Y -> Z hold in R, where Y is a set of attributes that is not a subset of the primary key of R, then X -> Z also holds, and Z is transitively dependent on X through Y.
E.g.
STUDENT NUMBER -> MAJOR and MAJOR -> ADVISOR,
then STUDENT NUMBER -> ADVISOR
Transitive dependency
Pseudotransitivity Rule:
• If X -> Y and YZ -> W, then XZ -> W
e.g.
STUDENT NUMBER -> MAJOR and
MAJOR, CLASS -> ADVISOR,
then STUDENT NUMBER, CLASS -> ADVISOR
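Both rules can be verified with attribute closure: an FD X -> Y follows from a set of FDs exactly when Y is contained in the closure of X. A minimal Python sketch, with `fd`, `closure` and `implies` as hypothetical helper names and STUDENT NUMBER written `STUDENT_NUMBER`.

```python
FD = tuple[frozenset[str], frozenset[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from comma-separated attribute names, e.g. fd("MAJOR,CLASS", "ADVISOR")."""
    return frozenset(lhs.split(",")), frozenset(rhs.split(","))


def closure(attrs: set[str], fds: list[FD]) -> set[str]:
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds: list[FD], lhs: str, rhs: str) -> bool:
    """X -> Y follows from fds exactly when Y is contained in the closure of X."""
    x, y = fd(lhs, rhs)
    return y <= closure(set(x), fds)


transitive = [fd("STUDENT_NUMBER", "MAJOR"), fd("MAJOR", "ADVISOR")]
pseudo = [fd("STUDENT_NUMBER", "MAJOR"), fd("MAJOR,CLASS", "ADVISOR")]

print(implies(transitive, "STUDENT_NUMBER", "ADVISOR"))    # True (transitivity)
print(implies(pseudo, "STUDENT_NUMBER,CLASS", "ADVISOR"))  # True (pseudotransitivity)
print(implies(pseudo, "STUDENT_NUMBER", "ADVISOR"))        # False: CLASS is also needed
```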
Third Normal Form (3NF)
• 3NF is intended to eliminate the anomalies caused by the presence of transitive dependencies in a relation.
• A relation is in 3NF if it is in second normal form and no transitive dependencies exist.
• 3NF normalization: for each functional dependency that causes a transitive dependency, the nonkey attributes it connects become a new relation.
Sales
CUST_NO  NAME       SALESPERSON  REGION
5435     Anderson   Smith        South
7546     Bancroft   Hicks        West
3435     Hobbs      Smith        South
6577     Tucker     Hernandez    East
3545     Eckersley  Hicks        West
7878     Arnold     Faulb        North
SALES (CUST_NO, NAME, SALESPERSON, REGION)
Functional dependencies:
CUST_NO -> NAME, SALESPERSON, REGION
SALESPERSON -> REGION (Each salesperson is assigned to
a unique region)
• Insertion anomaly: A new salesperson, Robinson, assigned to the North region cannot be entered until a customer has been assigned to Robinson.
• Deletion anomaly: If customer number 6577 is deleted from the relation, we lose the information that Hernandez is assigned to the East region.
• Modification anomaly: If salesperson Smith is reassigned to the East region, every row for Smith's customers (two rows here) must be changed to reflect that fact.
Anomalies with Sales
Removing transitive dependencies
The transitive dependency can be removed by decomposing SALES into two relations:
SALES1 (CUST_NO, NAME, SALESPERSON)
SPERSON (SALESPERSON, REGION)
The determinant of the transitive dependency in SALES (i.e. SALESPERSON) becomes the primary key of SPERSON and a foreign key in SALES1.
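The effect of this decomposition on the three SALES anomalies can be sketched with Python dictionaries standing in for SALES1 and SPERSON, each keyed by its primary key. This is an illustrative assumption; in a DBMS the two relations would be tables with primary key and foreign key constraints. `region_of` is a hypothetical helper that follows the SALESPERSON foreign key.

```python
# SALES rows: (CUST_NO, NAME, SALESPERSON, REGION)
SALES = [
    (5435, "Anderson", "Smith", "South"),
    (7546, "Bancroft", "Hicks", "West"),
    (3435, "Hobbs", "Smith", "South"),
    (6577, "Tucker", "Hernandez", "East"),
    (3545, "Eckersley", "Hicks", "West"),
    (7878, "Arnold", "Faulb", "North"),
]

# SALES1 (CUST_NO, NAME, SALESPERSON), keyed by CUST_NO
sales1 = {cust_no: (name, sp) for cust_no, name, sp, _ in SALES}
# SPERSON (SALESPERSON, REGION), keyed by SALESPERSON; collapsing rows this way is
# safe only because SALESPERSON -> REGION holds (each salesperson has one region).
sperson = {sp: region for _, _, sp, region in SALES}


def region_of(cust_no: int) -> str:
    """Recover a customer's REGION by following the SALESPERSON foreign key."""
    _, sp = sales1[cust_no]
    return sperson[sp]


sperson["Robinson"] = "North"  # insertion: a salesperson with no customers yet
sperson["Smith"] = "East"      # modification: one entry changes, not one per customer
del sales1[6577]               # deletion: Hernandez's region survives in SPERSON

print(len(sales1), len(sperson))         # 5 5
print(region_of(5435), region_of(3435))  # East East
print(sperson["Hernandez"])              # East
```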
• Transitive dependency can occur between sets of
attributes in a relation.
E.g.
SHIPMENT (SNUM, ORIGIN, DESTINATION, DISTANCE)
Functional dependencies:
SNUM -> ORIGIN, DESTINATION, DISTANCE
ORIGIN, DESTINATION -> DISTANCE
Transitive dependency between sets of attributes
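A rough mechanical check for the problems discussed so far: for each FD whose determinant is not a superkey and which determines nonkey attributes, classify it as partial (the determinant is a proper subset of the primary key) or transitive (otherwise). This Python sketch follows the slides' single-primary-key definitions rather than the general 3NF definition over all candidate keys; `fd`, `closure` and `classify_violations` are hypothetical helper names.

```python
FD = tuple[frozenset[str], frozenset[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from comma-separated attribute names."""
    return frozenset(lhs.split(",")), frozenset(rhs.split(","))


def closure(attrs: set[str], fds: list[FD]) -> set[str]:
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def classify_violations(attributes: set[str], key: set[str],
                        fds: list[FD]) -> list[tuple[str, list[str], list[str]]]:
    """List FDs that break 2NF (partial) or 3NF (transitive) w.r.t. the given primary key."""
    nonkey = attributes - key
    problems = []
    for lhs, rhs in fds:
        if closure(set(lhs), fds) >= attributes:
            continue  # determinant is a superkey, so this FD causes no 2NF/3NF problem
        dependents = (rhs - lhs) & nonkey
        if not dependents:
            continue
        kind = "partial" if lhs < key else "transitive"
        problems.append((kind, sorted(lhs), sorted(dependents)))
    return problems


SALES = {"CUST_NO", "NAME", "SALESPERSON", "REGION"}
sales_fds = [fd("CUST_NO", "NAME,SALESPERSON,REGION"), fd("SALESPERSON", "REGION")]

SHIPMENT = {"SNUM", "ORIGIN", "DESTINATION", "DISTANCE"}
shipment_fds = [fd("SNUM", "ORIGIN,DESTINATION,DISTANCE"), fd("ORIGIN,DESTINATION", "DISTANCE")]

print(classify_violations(SALES, {"CUST_NO"}, sales_fds))
# [('transitive', ['SALESPERSON'], ['REGION'])]
print(classify_violations(SHIPMENT, {"SNUM"}, shipment_fds))
# [('transitive', ['DESTINATION', 'ORIGIN'], ['DISTANCE'])]
```

For a relation already in 1NF, an empty result means it has neither partial nor transitive dependencies in the sense used in these slides.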
Identify the insertion anomaly
Identify the deletion anomaly
Identify the modification anomaly
SNUM  ORIGIN       DESTINATION  DISTANCE
409   Seattle      Denver       1537
618   Chicago      Dallas       1058
723   Boston       Atlanta      1214
824   Denver       Los Angeles  1150
629   Minneapolis  St. Louis    587

SNUM  ORIGIN       DESTINATION
409   Seattle      Denver
618   Chicago      Dallas
723   Boston       Atlanta
824   Denver       Los Angeles
629   Minneapolis  St. Louis

ORIGIN       DESTINATION  DISTANCE
Seattle      Denver       1537
Chicago      Dallas       1058
Boston       Atlanta      1214
Denver       Los Angeles  1150
Minneapolis  St. Louis    587

SHIPMENT1 (SNUM, ORIGIN, DESTINATION)
OD_DISTANCE (ORIGIN, DESTINATION, DISTANCE)
Relations in 3NF
CourseID  Cname         InstructorID  Iname
MIS310    Intro MIS     123456789     Shaw
MIS325    Database Mgt  234567891     Kim
MIS333    Computer Sys  456789123     Lee
MIS373    Data Comm     345678901     Barua
Course (CourseID, Cname, InstructorID, Iname)
3NF
Course (CourseID, Cname, InstructorID)
Instructor (InstructorID, Iname)
CourseID  Cname         InstructorID
MIS310    Intro MIS     123456789
MIS325    Database Mgt  234567891
MIS333    Computer Sys  456789123
MIS373    Data Comm     345678901

InstructorID  Iname
123456789     Shaw
234567891     Kim
456789123     Lee
345678901     Barua
ER Model and Third Normal Form (3NF)
• In general, if we have a “good” ER model and convert it to relation schemes according to the transformation rules, the resulting relations are usually in 3NF.
Additional Normal Forms
• Relations in third normal form are sufficient for most
practical database applications
• However, 3NF does not guarantee that all anomalies have
been removed.
• There are additional normal forms to remove them:
– Boyce-Codd Normal Form
– Fourth Normal Form
– Fifth Normal Form
– Domain-Key Normal Form
Steps in Normalization
Table with repeating groups
↓ Remove repeating groups
First normal form (1NF)
↓ Remove partial dependencies
Second normal form (2NF)
↓ Remove transitive dependencies
Third normal form (3NF)
↓ Remove remaining anomalies resulting from functional dependencies
Boyce-Codd normal form (BCNF)
↓ Remove multivalued dependencies
Fourth normal form (4NF)
↓ Remove remaining anomalies
Fifth normal form (5NF)