Normalization Well structured relations and anomalies • Normalization First normal form (1NF) Functional dependence Partial functional dependency Second normal form (2NF) Transitive functional dependency Third normal form (3NF) Practical consideration

Upload: amos-harmon

Post on 02-Jan-2016


Normalization

• Well structured relations and anomalies

• Normalization

• First normal form (1NF)

• Functional dependence

• Partial functional dependency

• Second normal form (2NF)

• Transitive functional dependency

• Third normal form (3NF)

• Practical consideration

A well structured relation

• Contains a minimum amount of redundancy

• Allows users to modify, insert and delete the rows in a table without errors or inconsistencies

EMPLOYEE1

EMP ID  NAME              DEPT        SALARY
100     Margaret Simpson  Marketing   42000
140     Allen Beeton      Accounting  39000
234     Christina Lucero  Finance     53000
356     Lorenzo Davis     Sales       45000
209     Susan Martin      Accounting  46000

- EMPLOYEE1 is a well structured relation.

- Any modification to an employee’s data, such as a change in salary, is confined to one row of the table.

Is this a well structured relation?

EMPLOYEE2

EMP ID  NAME              DEPT        SALARY  COURSE  DATE
100     Margaret Simpson  Marketing   42000   101     5/9/94
100     Margaret Simpson  Marketing   42000   103     5/3/93
140     Allen Beeton      Accounting  39000   333     8/3/95
140     Allen Beeton      Accounting  39000   342     5/3/94
234     Christina Lucero  Finance     53000   111     3/5/95
356     Lorenzo Davis     Sales       45000   244     4/6/95
209     Susan Martin      Accounting  46000   235     4/2/97

- This table has a considerable amount of redundancy, e.g. EMP ID, NAME, DEPT and SALARY are repeated in two separate rows for some employees.

- If the salary of one of those employees changes, we must record the change in two or more rows.

- Therefore, this is not a well structured relation.

Why minimize redundancies?

• Redundancies in a table may result in errors and inconsistencies (called anomalies) when a user attempts to update the data in the table

• Three types of anomalies

– Insertion anomaly

– Deletion anomaly

– Modification anomaly

Insertion anomaly

• To add a new employee to EMPLOYEE2, the user must supply values for both EMPID and COURSE.

• This is because no attribute of the primary key (EMPID, COURSE) can be NULL.

• In reality, a user should be able to enter employee data without supplying course data.

Deletion anomaly

• If the data for employee number 234 is deleted, we also lose the information that this employee completed course 111.

• In fact, because no other row mentions course 111, we lose information about the course altogether.

Modification anomaly

• Suppose that employee number 100 gets a salary increase. We must record this increase in each of the rows for that employee.

• Otherwise the data will be inconsistent.
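The three anomalies can be reproduced on a small copy of EMPLOYEE2. The following is a minimal sketch using Python's built-in sqlite3 module; the column names, the subset of rows loaded, and the new employee 500 are assumptions made for illustration, not part of the original slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# EMPLOYEE2 with the composite primary key (emp_id, course).
# NOT NULL is spelled out because SQLite does not imply it for this kind of key.
conn.execute("""
    CREATE TABLE employee2 (
        emp_id         INTEGER NOT NULL,
        name           TEXT,
        dept           TEXT,
        salary         INTEGER,
        course         INTEGER NOT NULL,
        date_completed TEXT,
        PRIMARY KEY (emp_id, course)
    )""")
conn.executemany("INSERT INTO employee2 VALUES (?, ?, ?, ?, ?, ?)", [
    (100, "Margaret Simpson", "Marketing", 42000, 101, "5/9/94"),
    (100, "Margaret Simpson", "Marketing", 42000, 103, "5/3/93"),
    (234, "Christina Lucero", "Finance", 53000, 111, "3/5/95"),
])

# Insertion anomaly: a (hypothetical) new employee with no course is rejected.
try:
    conn.execute(
        "INSERT INTO employee2 VALUES (500, 'New Hire', 'Sales', 40000, NULL, NULL)")
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# Modification anomaly: updating only one of employee 100's rows
# leaves two different salaries on record for the same person.
conn.execute(
    "UPDATE employee2 SET salary = 45000 WHERE emp_id = 100 AND course = 101")
salaries_100 = sorted(row[0] for row in conn.execute(
    "SELECT DISTINCT salary FROM employee2 WHERE emp_id = 100"))

# Deletion anomaly: deleting employee 234 removes every trace of course 111.
conn.execute("DELETE FROM employee2 WHERE emp_id = 234")
course_111_rows = conn.execute(
    "SELECT COUNT(*) FROM employee2 WHERE course = 111").fetchone()[0]
```

After running the sketch, insert_rejected is true, salaries_100 holds two conflicting values, and course_111_rows is zero.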

Normalization

• Normalization is a process for converting complex data structures into simple, stable data structures (E. F. Codd, 1970)

• The objectives of the normalization process are:

– to eliminate certain kinds of data redundancy,

– to avoid certain anomalies.

• Normalization is accomplished in stages.

• A normal form is a state of a relation that can be determined by applying simple rules regarding dependencies (or relationships between attributes)

First Normal Form (1NF)

• Every attribute in each record contains only one value, i.e. a table contains NO REPEATING GROUPS!

• A relation is, by definition, already in (at least) 1NF

• A table with repeating groups is converted to a relation in first normal form by extending the data in each column to fill the cells that are empty because of the repeating-group structure.

1NF Normalization

Table with repeating groups:

Student_ID  Sname       GPA  CourseID  Cname         InstructorID  Iname
111223333   Tom Smith   3.4  MIS310    Intro MIS     123456789     Shaw
                             MIS325    Database Mgt  234567891     Kim
222334444   Mary Jones  3.5  MIS373    Data Comm     345678901     Barua
                             MIS333    Computer Sys  456789123     Lee
                             MIS325    Database Mgt  234567891     Kim

The same data in 1NF:

Student_ID  Sname       GPA  CourseID  Cname         InstructorID  Iname
111223333   Tom Smith   3.4  MIS310    Intro MIS     123456789     Shaw
111223333   Tom Smith   3.4  MIS325    Database Mgt  234567891     Kim
222334444   Mary Jones  3.5  MIS373    Data Comm     345678901     Barua
222334444   Mary Jones  3.5  MIS333    Computer Sys  456789123     Lee
222334444   Mary Jones  3.5  MIS325    Database Mgt  234567891     Kim

Student(Student_ID, Sname, GPA, CourseID, Cname, InstructorID, Iname)
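The conversion to 1NF amounts to repeating the student columns for every entry of the repeating group. A minimal Python sketch follows; the nested-list representation of the unnormalized table and the name to_1nf are assumptions chosen for illustration.

```python
# Hypothetical in-memory form of the unnormalized table: each student
# carries a repeating group of (CourseID, Cname, InstructorID, Iname).
unnormalized = [
    ("111223333", "Tom Smith", 3.4, [
        ("MIS310", "Intro MIS", "123456789", "Shaw"),
        ("MIS325", "Database Mgt", "234567891", "Kim"),
    ]),
    ("222334444", "Mary Jones", 3.5, [
        ("MIS373", "Data Comm", "345678901", "Barua"),
        ("MIS333", "Computer Sys", "456789123", "Lee"),
        ("MIS325", "Database Mgt", "234567891", "Kim"),
    ]),
]


def to_1nf(table):
    """Emit one flat row per course, copying the student columns into each."""
    return [
        (student_id, sname, gpa, *course)
        for student_id, sname, gpa, courses in table
        for course in courses
    ]


student_1nf = to_1nf(unnormalized)
```

Every resulting row has exactly seven single-valued attributes, which is what 1NF requires; the redundancy in Sname and GPA is the price, and later normal forms remove it.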

Functional dependence

• A functional dependency is a particular relationship between two attributes

• For any relation R, the attribute B is functionally dependent on the attribute A if, for every valid instance of A, that value of A uniquely determines the value of B.

• Represented as A -> B

• Normalization is based on the analysis of functional dependence

Examples of functional dependency

SSN -> NAME, ADDRESS, BIRTHDATE
A person’s name, address and birthdate are functionally dependent on that person’s social security number.

VIN -> MAKE, MODEL, COLOR
The make, model and color of a vehicle are functionally dependent on the vehicle identification number.

ISBN -> TITLE
The title of a book is functionally dependent on the book’s international standard book number (ISBN).

Determinant

• The attribute on the left hand side of the arrow in a functional dependency is called a determinant.

e.g. SSN, VIN, ISBN are determinants

Important!

• Instances (or sample data) in a relation do not prove that a functional dependency exists.

• Only knowledge of the problem domain is a reliable method for identifying a functional dependency
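Sample data can refute a functional dependency but never prove one. The sketch below, with the hypothetical helper violates_fd and a few EMPLOYEE2 rows written as dictionaries, illustrates this: it only reports whether the given rows contradict a proposed dependency.

```python
def violates_fd(rows, lhs, rhs):
    """Return True if two rows agree on the lhs attributes but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for this lhs value and compare.
        if seen.setdefault(key, value) != value:
            return True
    return False


employee2 = [
    {"EMPID": 100, "NAME": "Margaret Simpson", "DEPT": "Marketing", "SALARY": 42000, "COURSE": 101},
    {"EMPID": 100, "NAME": "Margaret Simpson", "DEPT": "Marketing", "SALARY": 42000, "COURSE": 103},
    {"EMPID": 140, "NAME": "Allen Beeton", "DEPT": "Accounting", "SALARY": 39000, "COURSE": 333},
    {"EMPID": 209, "NAME": "Susan Martin", "DEPT": "Accounting", "SALARY": 46000, "COURSE": 235},
]

# Refuted by the data: two Accounting employees earn different salaries.
dept_salary_refuted = violates_fd(employee2, ["DEPT"], ["SALARY"])

# Not refuted, and also true in the problem domain.
empid_refuted = violates_fd(employee2, ["EMPID"], ["NAME", "DEPT", "SALARY"])

# Not refuted by these rows either, yet two employees could share a name,
# so the data alone does not establish NAME -> DEPT.
name_dept_refuted = violates_fd(employee2, ["NAME"], ["DEPT"])
```

The last check is the important one: the absence of a counterexample in the sample says nothing about whether NAME -> DEPT holds in general.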

EMPLOYEE2 (EMPID, NAME, DEPT, SALARY, COURSE, DATE COMPLETED)

• Functional dependencies:

EMPID -> NAME, DEPT, SALARY

EMPID, COURSE -> DATE COMPLETED

• Therefore the only candidate key (and hence primary key) is the combination of EMPID and COURSE
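One way to check this conclusion is to compute the attribute closure of a proposed key: the set of all attributes it determines under the given dependencies. The sketch below is an assumption-laden illustration (the function name closure and the set-based encoding of the dependencies are not from the slides).

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs.

    fds is a list of (determinant, dependents) pairs, each a set of names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD once its whole determinant is already in the result.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


ALL_ATTRS = {"EMPID", "NAME", "DEPT", "SALARY", "COURSE", "DATE_COMPLETED"}
FDS = [
    ({"EMPID"}, {"NAME", "DEPT", "SALARY"}),
    ({"EMPID", "COURSE"}, {"DATE_COMPLETED"}),
]

empid_closure = closure({"EMPID"}, FDS)             # misses COURSE and DATE_COMPLETED
course_closure = closure({"COURSE"}, FDS)           # determines nothing else
key_closure = closure({"EMPID", "COURSE"}, FDS)     # covers the whole relation
```

Only the pair (EMPID, COURSE) has a closure equal to all attributes, and neither attribute alone does, so the pair is a minimal (candidate) key.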

EMPLOYEE2 (EMPID, NAME, DEPT, SALARY, COURSE, DATE COMPLETED)

• A composite key is a primary key that contains more than one attribute.

• EMPID is a determinant but not a candidate key.

• A candidate key is always a determinant

• But a determinant is not always a candidate key.

Partial functional dependency

• A functional dependency A -> B is a partial dependency if B is functionally dependent on A and also functionally dependent on some proper subset of A.

• Partial dependencies need to be checked only when the key is composite.

EMPLOYEE2 (EMPID, NAME, DEPT, SALARY, COURSE, DATE COMPLETED)

The functional dependencies are:

EMPID, COURSE -> DATE COMPLETED

EMPID -> NAME, DEPT, SALARY

NAME, DEPT and SALARY depend on EMPID alone, which is a proper subset of the key (EMPID, COURSE), so they are partially dependent on the key.

Second Normal Form (2NF)

• A relation is in second normal form if:

- It is in first normal form, and

- every nonkey attribute is fully functionally dependent on the primary key, i.e. no nonkey attribute depends on only part of the key (no partial functional dependency).

• A relation in 1NF is in 2NF if any one of the following holds:

- The primary key consists of only one attribute,

- No nonkey attributes exist, or

- Every nonkey attribute is functionally dependent on the full set of primary key attributes.
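The 2NF test can be sketched as a search for dependencies whose determinant is a proper subset of the key. The helper partial_dependencies below is hypothetical, and it inspects only the dependencies that are listed explicitly, not ones that could be derived from them.

```python
def partial_dependencies(key, fds):
    """List the given FDs in which part of the key determines nonkey attributes."""
    key = frozenset(key)
    found = []
    for lhs, rhs in fds:
        nonkey = set(rhs) - key
        # "<" on sets means proper subset: part of the key, but not all of it.
        if set(lhs) < key and nonkey:
            found.append((set(lhs), nonkey))
    return found


employee2_fds = [
    ({"EMPID"}, {"NAME", "DEPT", "SALARY"}),
    ({"EMPID", "COURSE"}, {"DATE_COMPLETED"}),
]
employee2_partial = partial_dependencies({"EMPID", "COURSE"}, employee2_fds)

# After splitting off the employee attributes, neither relation has one left.
employee_partial = partial_dependencies({"EMPID"}, employee2_fds[:1])
empcourse_partial = partial_dependencies({"EMPID", "COURSE"}, employee2_fds[1:])
```

A non-empty result means the relation is not in 2NF; a single-attribute key can never produce one, which matches the first condition in the list above.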

Problems created by partial functional dependencies?

• Insertion anomaly

– To insert a row, we must provide values for both EMPID and COURSE

• Deletion anomaly

– If we delete a row for an employee, we lose the information that the employee completed a course on a particular date

• Modification anomaly

– If an employee’s salary changes, we must record this change in multiple rows (if the employee completed more than one course)

Removing partial dependencies

If a relation is not in 2NF, it can be further normalized into a number of 2NF relations in which nonkey attributes are associated only with the part of the primary key on which they are fully functionally dependent.

EMPLOYEE2 (EMPID, NAME, DEPT, SALARY, COURSE, DATE COMPLETED)

EMPID, COURSE -> DATE COMPLETED and EMPID -> NAME, DEPT, SALARY

EMPLOYEE (EMPID, NAME, DEPT, SALARY)

EMPCOURSE (EMPID, COURSE, DATE COMPLETED)
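The decomposition is a pair of projections, and joining them back on EMPID recovers the original rows. A minimal Python sketch, with tuples standing in for rows (the tuple layout is an assumption for illustration):

```python
# EMPLOYEE2 rows: (EMPID, NAME, DEPT, SALARY, COURSE, DATE COMPLETED)
employee2 = [
    (100, "Margaret Simpson", "Marketing", 42000, 101, "5/9/94"),
    (100, "Margaret Simpson", "Marketing", 42000, 103, "5/3/93"),
    (140, "Allen Beeton", "Accounting", 39000, 333, "8/3/95"),
    (140, "Allen Beeton", "Accounting", 39000, 342, "5/3/94"),
    (234, "Christina Lucero", "Finance", 53000, 111, "3/5/95"),
    (356, "Lorenzo Davis", "Sales", 45000, 244, "4/6/95"),
    (209, "Susan Martin", "Accounting", 46000, 235, "4/2/97"),
]

# Project onto the two 2NF schemas; sets drop the duplicated employee rows.
employee = {row[:4] for row in employee2}                    # EMPLOYEE
empcourse = {(row[0], row[4], row[5]) for row in employee2}  # EMPCOURSE

# Natural join on EMPID rebuilds the original rows, so nothing was lost.
rejoined = {
    emp + (course, date)
    for emp in employee
    for emp_id, course, date in empcourse
    if emp[0] == emp_id
}
```

EMPLOYEE now stores each salary once (five rows instead of seven), while the join shows the decomposition loses no information.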

2NF

Student(Student_ID, Sname, GPA, CourseID, Cname, InstructorID, Iname)

Student_ID  Sname       GPA  CourseID  Cname         InstructorID  Iname
111223333   Tom Smith   3.4  MIS310    Intro MIS     123456789     Shaw
111223333   Tom Smith   3.4  MIS325    Database Mgt  234567891     Kim
222334444   Mary Jones  3.5  MIS373    Data Comm     345678901     Barua
222334444   Mary Jones  3.5  MIS333    Computer Sys  456789123     Lee
222334444   Mary Jones  3.5  MIS325    Database Mgt  234567891     Kim

is decomposed into three relations in 2NF:

Student(Student_ID, Sname, GPA)

Student_ID  Sname       GPA
111223333   Tom Smith   3.4
222334444   Mary Jones  3.5

Course(CourseID, Cname, InstructorID, Iname)

CourseID  Cname         InstructorID  Iname
MIS310    Intro MIS     123456789     Shaw
MIS325    Database Mgt  234567891     Kim
MIS333    Computer Sys  456789123     Lee
MIS373    Data Comm     345678901     Barua

StudentCourse(Student_ID, CourseID)

Student_ID  CourseID
111223333   MIS310
111223333   MIS325
222334444   MIS373
222334444   MIS333
222334444   MIS325

Transitive dependency

• A functional dependency between two (or more) nonkey attributes.

• X -> Z is a transitive dependency in R if there is a set of attributes Y, not a subset of the primary key of R, such that both X -> Y and Y -> Z hold, i.e. if X -> Y and Y -> Z, then X -> Z.

E.g.

STUDENT NUMBER -> MAJOR and MAJOR -> ADVISOR

then STUDENT NUMBER -> ADVISOR

Transitive dependency

Pseudotransitivity Rule:

• If X -> Y and YZ -> W, then XZ -> W

e.g.

STUDENT NUMBER -> MAJOR and

MAJOR, CLASS -> ADVISOR,

then STUDENT NUMBER, CLASS -> ADVISOR
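The pseudotransitivity rule is not an extra assumption; it follows from two of Armstrong's inference rules (augmentation and transitivity, which the slides do not state by name). A short derivation:

```latex
\begin{align*}
X \to Y
  &\;\Longrightarrow\; XZ \to YZ
  && \text{(augmentation: add } Z \text{ to both sides)} \\
XZ \to YZ \ \text{and}\ YZ \to W
  &\;\Longrightarrow\; XZ \to W
  && \text{(transitivity)}
\end{align*}
```

With X = STUDENT NUMBER, Y = MAJOR, Z = CLASS and W = ADVISOR, this yields exactly the example above.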

Third Normal Form (3NF)

• 3NF eliminates the anomalies caused by the presence of transitive dependencies in a relation.

• A relation is in 3NF if it is in second normal form and no transitive dependencies exist.

• 3NF normalization: for each functional dependency that causes a transitive dependency, the nonkey attributes it connects are moved into a new relation.

Sales

CUST_NO  NAME       SALESPERSON  REGION
5435     Anderson   Smith        South
7546     Bancroft   Hicks        West
3435     Hobbs      Smith        South
6577     Tucker     Hernandez    East
3545     Eckersley  Hicks        West
7878     Arnold     Faulb        North

SALES(CUST_NO, NAME, SALESPERSON, REGION)

Functional dependencies:

CUST_NO -> NAME, SALESPERSON, REGION

SALESPERSON -> REGION (Each salesperson is assigned to a unique region)

Anomalies with Sales

• Insertion anomaly: A new salesperson Robinson assigned to the North region cannot be entered until a customer has been assigned to that salesperson.

• Deletion anomaly: If customer number 6577 is deleted from the relation, we lose the information that Hernandez is assigned to the East region.

• Modification anomaly: If salesperson Smith is reassigned to the East region, several rows must be changed to reflect that fact.

Removing transitive dependencies

The transitive dependencies can be removed by decomposing SALES into two relations:

SALES1 (CUST_NO, NAME, SALESPERSON)

SPERSON (SALESPERSON, REGION)

The determinant of the transitive dependency in SALES (i.e. SALESPERSON) becomes the primary key of SPERSON and a foreign key in SALES1.
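The effect of the decomposition on the three anomalies can be checked with a small schema. This is a sketch using Python's built-in sqlite3 module; the lowercase table and column names and the NOT NULL constraints are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
    CREATE TABLE sperson (
        salesperson TEXT PRIMARY KEY,
        region      TEXT NOT NULL
    );
    CREATE TABLE sales1 (
        cust_no     INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        salesperson TEXT NOT NULL REFERENCES sperson(salesperson)
    );
""")
conn.executemany("INSERT INTO sperson VALUES (?, ?)", [
    ("Smith", "South"), ("Hicks", "West"), ("Hernandez", "East"), ("Faulb", "North"),
])
conn.executemany("INSERT INTO sales1 VALUES (?, ?, ?)", [
    (5435, "Anderson", "Smith"), (7546, "Bancroft", "Hicks"),
    (3435, "Hobbs", "Smith"), (6577, "Tucker", "Hernandez"),
    (3545, "Eckersley", "Hicks"), (7878, "Arnold", "Faulb"),
])

# Insertion: Robinson can be recorded before any customer is assigned.
conn.execute("INSERT INTO sperson VALUES ('Robinson', 'North')")

# Deletion: removing customer 6577 keeps Hernandez's region on record.
conn.execute("DELETE FROM sales1 WHERE cust_no = 6577")
hernandez_region = conn.execute(
    "SELECT region FROM sperson WHERE salesperson = 'Hernandez'").fetchone()[0]

# Modification: reassigning Smith changes exactly one row.
rows_changed = conn.execute(
    "UPDATE sperson SET region = 'East' WHERE salesperson = 'Smith'").rowcount

# Joining the relations shows both of Smith's customers in the new region.
smith_regions = [row[0] for row in conn.execute(
    "SELECT p.region FROM sales1 s "
    "JOIN sperson p ON p.salesperson = s.salesperson "
    "WHERE s.salesperson = 'Smith'")]
```

The region of a salesperson is stored in one place, so each of the three operations touches a single row and leaves the rest of the data consistent.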

Transitive dependency between sets of attributes

• Transitive dependency can occur between sets of attributes in a relation.

E.g.

SHIPMENT (SNUM, ORIGIN, DESTINATION, DISTANCE)

Functional dependencies:

SNUM -> ORIGIN, DESTINATION, DISTANCE

ORIGIN, DESTINATION -> DISTANCE

SNUM  ORIGIN       DESTINATION  DISTANCE
409   Seattle      Denver       1537
618   Chicago      Dallas       1058
723   Boston       Atlanta      1214
824   Denver       Los Angeles  1150
629   Minneapolis  St. Louis    587

Identify the insertion anomaly

Identify the deletion anomaly

Identify the modification anomaly

Relations in 3NF

SHIPMENT1 (SNUM, ORIGIN, DESTINATION)

SNUM  ORIGIN       DESTINATION
409   Seattle      Denver
618   Chicago      Dallas
723   Boston       Atlanta
824   Denver       Los Angeles
629   Minneapolis  St. Louis

OD_DISTANCE (ORIGIN, DESTINATION, DISTANCE)

ORIGIN       DESTINATION  DISTANCE
Seattle      Denver       1537
Chicago      Dallas       1058
Boston       Atlanta      1214
Denver       Los Angeles  1150
Minneapolis  St. Louis    587
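Because the determinant here is a pair of attributes, the second relation is keyed by (ORIGIN, DESTINATION). A brief Python sketch, assuming dictionaries as stand-ins for the two relations and a hypothetical helper distance_for:

```python
# SHIPMENT1: SNUM -> (ORIGIN, DESTINATION)
shipment1 = {
    409: ("Seattle", "Denver"),
    618: ("Chicago", "Dallas"),
    723: ("Boston", "Atlanta"),
    824: ("Denver", "Los Angeles"),
    629: ("Minneapolis", "St. Louis"),
}

# OD_DISTANCE: (ORIGIN, DESTINATION) -> DISTANCE, keyed by the composite determinant
od_distance = {
    ("Seattle", "Denver"): 1537,
    ("Chicago", "Dallas"): 1058,
    ("Boston", "Atlanta"): 1214,
    ("Denver", "Los Angeles"): 1150,
    ("Minneapolis", "St. Louis"): 587,
}


def distance_for(snum):
    """Follow SNUM -> (ORIGIN, DESTINATION) -> DISTANCE across both relations."""
    return od_distance[shipment1[snum]]
```

The two-step lookup in distance_for mirrors the transitive dependency that the decomposition made explicit.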

3NF

Course(CourseID, Cname, InstructorID, Iname)

CourseID  Cname         InstructorID  Iname
MIS310    Intro MIS     123456789     Shaw
MIS325    Database Mgt  234567891     Kim
MIS333    Computer Sys  456789123     Lee
MIS373    Data Comm     345678901     Barua

is decomposed into two relations in 3NF:

Course(CourseID, Cname, InstructorID)

CourseID  Cname         InstructorID
MIS310    Intro MIS     123456789
MIS325    Database Mgt  234567891
MIS333    Computer Sys  456789123
MIS373    Data Comm     345678901

Instructor(InstructorID, Iname)

InstructorID  Iname
123456789     Shaw
234567891     Kim
456789123     Lee
345678901     Barua

ER Model and Third Normal Form (3NF)

• In general, if we have a “good” ER model and convert it to relation schemes according to the transformation rules, the resulting relations are in 3NF.

Additional Normal Forms

• Relations in third normal form are sufficient for most practical database applications

• However, 3NF does not guarantee that all anomalies have been removed.

• There are additional normal forms to remove the remaining anomalies:

Boyce-Codd Normal Form

Fourth Normal Form

Fifth Normal Form

Domain Key Normal Form

Steps in Normalization

Table with repeating groups
  ↓ Remove repeating groups
First normal form (1NF)
  ↓ Remove partial dependencies
Second normal form (2NF)
  ↓ Remove transitive dependencies
Third normal form (3NF)
  ↓ Remove remaining anomalies resulting from functional dependencies
Boyce-Codd normal form (BCNF)
  ↓ Remove multivalued dependencies
Fourth normal form (4NF)
  ↓ Remove remaining anomalies
Fifth normal form (5NF)