CHAPTER 2: RELATIONAL DATA MODEL PART 3
DFC2033 DATABASE SYSTEM
Learning Outcome
¨ Define normalization.
¨ Explain the importance of normalization in databases.
¨ Define functional dependencies (FD).
¨ Describe the various types of normal forms:
a. First normal form (1NF)
b. Second normal form (2NF)
c. Third normal form (3NF)
¨ Define Boyce-Codd Normal Form (BCNF).
Introduction to Normalization
¨ Normalization is the process of decomposing relations that have anomalies to produce smaller, well-structured relations.
¨ Normalization is a process for assigning attributes to entities. It helps determine whether the chosen entities, attributes and primary keys are appropriate for the system.
Normalization of Database Tables
¨ By normalizing tables:
¤ Redundancy can be reduced
¤ Anomalies can be eliminated
¨ The normalization process is divided into levels called normal forms (NF). The normal forms covered in this subject are:
¤ 1NF
¤ 2NF
¤ 3NF
¤ Boyce-Codd Normal Form (BCNF)
Importance of normalization
¨ Improves system performance and accuracy.
¨ Supports the integrity and consistency of the data.
¨ Saves space, minimizes redundancy and eliminates anomalies.
¨ The goal of normalization is to create a set of relational tables that are free of redundant data and that can be consistently and correctly modified.
Anomalies
¨ Problems that occur when information is inserted, deleted or updated.
¨ Three types of update anomalies:
¤ Insertion anomalies
¤ Deletion anomalies
¤ Modification anomalies
Insertion Anomalies
¨ To insert details of new members of staff into the StaffBranch relation, we must include the details of the branch at which the staff are to be located.
¨ For example, to insert the details of a new staff member located at branch B007, we must enter the correct details of branch B007 so that they are consistent with the values for branch B007 in other tuples of the StaffBranch relation.
¨ The Staff and Branch relations in Figure 1 do not suffer from this potential inconsistency, because we enter only the appropriate branch number for each staff member in the Staff relation. The details of branch B007 are recorded in the database once, as a single tuple in the Branch relation.
¨ The second problem with the StaffBranch relation arises when inserting a new branch that currently has no members of staff. We would have to enter nulls into the staff attributes, such as staffNo. However, staffNo is the primary key of the StaffBranch relation, and entering a null into it violates entity integrity, so it is not allowed.
¨ Therefore, we cannot enter a tuple for a new branch into the StaffBranch relation until that branch has at least one member of staff.
¨ The relations in Figure 1 avoid this problem because branch details are entered separately from staff details.
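The insertion anomaly can be reproduced with a small sketch using Python's built-in sqlite3 module. The table mirrors the StaffBranch relation; branch B009 and its address "Johor" are hypothetical values. staffNo is declared `INT PRIMARY KEY NOT NULL` rather than `INTEGER PRIMARY KEY`, because SQLite treats the latter as a rowid alias and would silently assign a key instead of rejecting the null.

```python
import sqlite3

# In-memory database with one table holding both staff and branch facts.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE StaffBranch (
           staffNo  INT PRIMARY KEY NOT NULL,  -- entity integrity: no null keys
           sName    TEXT,
           position TEXT,
           salary   INT,
           branchNo TEXT,
           bAddress TEXT
       )"""
)


def insert_branch_without_staff(branch_no, address):
    """Try to record a branch that has no staff yet; return True if accepted."""
    try:
        conn.execute(
            "INSERT INTO StaffBranch (staffNo, branchNo, bAddress) VALUES (NULL, ?, ?)",
            (branch_no, address),
        )
        return True
    except sqlite3.IntegrityError as err:
        # SQLite reports a NOT NULL constraint failure on staffNo.
        print("Rejected:", err)
        return False


# Hypothetical new branch with no staff members yet.
accepted = insert_branch_without_staff("B009", "Johor")
```

With the two-table design, the same branch would simply be inserted into Branch, with no staff attributes involved.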
Deletion Anomalies
¨ If we delete a tuple from the StaffBranch relation that represents the last member of staff located at a branch, the details about the branch are also lost from the database.
¨ For example, if we delete the details of staff number 400 (Kumar) from the StaffBranch relation, the details of branch B007 are lost from the database.
¨ The design of the relations in Figure 1 avoids this problem because branch tuples are stored separately from staff tuples. If we delete staff number 400 from the Staff relation, the details of branch B007 remain unaffected in the Branch relation.
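A minimal sketch of the deletion anomaly in plain Python, with each relation held as a list of tuples. The rows are those of the StaffBranch table in this chapter; `known_branches` is a hypothetical helper that collects the branch facts that can still be read from a set of rows.

```python
# StaffBranch tuples: (staffNo, sName, position, salary, branchNo, bAddress)
staff_branch = [
    (100, "Ahmad", "Manager", 30000, "B005", "Penang"),
    (200, "Sally", "Assistant", 12000, "B003", "Kelantan"),
    (300, "Zaidi", "Supervisor", 18000, "B003", "Kelantan"),
    (400, "Kumar", "Assistant", 9000, "B007", "Seremban"),
    (500, "Desmond", "Manager", 24000, "B003", "Kelantan"),
    (600, "Mei Lin", "Assistant", 9000, "B005", "Penang"),
]


def known_branches(rows):
    """Map each branchNo still present in rows to its bAddress."""
    return {row[4]: row[5] for row in rows}


# Single-table design: deleting Kumar (staffNo 400) also removes the only record of B007.
after_delete = [row for row in staff_branch if row[0] != 400]

# Two-table design (Figure 1): branch facts are kept in a separate relation.
staff = [row[:5] for row in staff_branch]      # staffNo .. branchNo
branch = known_branches(staff_branch)          # branchNo -> bAddress
staff_after_delete = [row for row in staff if row[0] != 400]
```

After the deletion, `known_branches(after_delete)` no longer contains B007, while `branch` still maps B007 to Seremban.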
Modification Anomalies
¨ If we want to change the value of one of the attributes of a particular branch in the StaffBranch relation, for example the address of branch number B003, we must update the tuples of all staff located at that branch.
¨ If the modification is not carried out on all the appropriate tuples of StaffBranch relation, the database will become inconsistent.
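The modification anomaly can be simulated by an update that reaches only some of the affected tuples. A minimal sketch in plain Python; `set_address`, its `limit` parameter and the new address "Kota Bharu" are hypothetical, chosen only to mimic an incomplete update of branch B003.

```python
# StaffBranch tuples: (staffNo, sName, position, salary, branchNo, bAddress)
staff_branch = [
    (100, "Ahmad", "Manager", 30000, "B005", "Penang"),
    (200, "Sally", "Assistant", 12000, "B003", "Kelantan"),
    (300, "Zaidi", "Supervisor", 18000, "B003", "Kelantan"),
    (400, "Kumar", "Assistant", 9000, "B007", "Seremban"),
    (500, "Desmond", "Manager", 24000, "B003", "Kelantan"),
    (600, "Mei Lin", "Assistant", 9000, "B005", "Penang"),
]


def set_address(rows, branch_no, new_address, limit=None):
    """Return a copy of rows with bAddress set to new_address for branch_no.

    limit (hypothetical) caps how many matching tuples are changed,
    simulating an update that misses some of them."""
    changed = 0
    result = []
    for row in rows:
        if row[4] == branch_no and (limit is None or changed < limit):
            row = row[:5] + (new_address,)
            changed += 1
        result.append(row)
    return result


def addresses_for(rows, branch_no):
    """All bAddress values stored for one branch; more than one means inconsistency."""
    return {row[5] for row in rows if row[4] == branch_no}


partial = set_address(staff_branch, "B003", "Kota Bharu", limit=1)
complete = set_address(staff_branch, "B003", "Kota Bharu")
```

After the partial update, B003 has two different addresses, which is exactly the inconsistency described above; only the complete update leaves a single value.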
Functional Dependency
¨ A constraint between two attributes or two sets of attributes.
¨ For any relation R, attribute B is functionally dependent on attribute A if, for every valid instance of A, that value of A uniquely determines the value of B.
¨ The functional dependency of B on A is represented by an arrow: A → B. An attribute may be functionally dependent on two (or more) attributes rather than on a single attribute.
¨ Three types of functional dependency:
¤ Full functional dependency
¤ Partial functional dependency
¤ Transitive functional dependency
Functional Dependencies
¨ Determinant
¤ The attribute or group of attributes on the left-hand side of the arrow of a functional dependency.
¨ Examples:
¤ SSN → Name, Address, Birthdate
¤ VIN → Make, Model, Color
¤ ISBN → Title, First_Author_Name
Full Functional Dependency
¨ Full Functional Dependency
¤ Indicates that if A and B are attributes of a relation, B is fully functionally dependent on A if B is functionally dependent on A, but not on any proper subset of A.
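The instance-based check extends to full dependency: test the FD itself, then test every proper subset of the left-hand side. A minimal sketch, assuming rows are dicts; it borrows the supplier-part data from the normalization example in this chapter, where qty needs both s_id and part_id while city needs only s_id. `fd_holds` and `is_full_fd` are hypothetical helpers, and one instance can still only refute a dependency, not prove it.

```python
from itertools import combinations

# Supplier-part rows (1NF) as dicts keyed by attribute name.
COLUMNS = ("s_id", "status", "city", "part_id", "qty")
rows = [dict(zip(COLUMNS, t)) for t in [
    ("S1", 20, "London", "P1", 300), ("S1", 20, "London", "P2", 200),
    ("S1", 20, "London", "P3", 400), ("S1", 20, "London", "P4", 200),
    ("S1", 20, "London", "P5", 100), ("S1", 20, "London", "P6", 100),
    ("S2", 10, "Paris", "P1", 300), ("S2", 10, "Paris", "P2", 400),
    ("S3", 10, "Paris", "P2", 200), ("S4", 20, "London", "P2", 200),
    ("S4", 20, "London", "P4", 300), ("S4", 20, "London", "P5", 500),
]]


def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on lhs but differ on rhs (instance check only)."""
    seen = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[a] for a in rhs)
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


def is_full_fd(rows, lhs, rhs):
    """True if lhs -> rhs holds and no proper subset of lhs also determines rhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    for size in range(len(lhs)):  # sizes 0 .. len(lhs) - 1 are the proper subsets
        for subset in combinations(lhs, size):
            if fd_holds(rows, subset, rhs):
                return False
    return True
```

Here (s_id, part_id) → qty is full, whereas (s_id, part_id) → city is not, because s_id alone already determines city.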
staffNo sName position salary branchNo bAddress
100 Ahmad Manager 30000 B005 Penang
200 Sally Assistant 12000 B003 Kelantan
300 Zaidi Supervisor 18000 B003 Kelantan
400 Kumar Assistant 9000 B007 Seremban
500 Desmond Manager 24000 B003 Kelantan
600 Mei Lin Assistant 9000 B005 Penang
FD: staffNo → sName, position, salary, branchNo, bAddress
Table: StaffBranch
The relation should be split because bAddress is functionally dependent on branchNo rather than directly on staffNo:
staffNo → sName, position, salary, branchNo
branchNo → bAddress
staffNo sName position salary branchNo
100 Ahmad Manager 30000 B005
200 Sally Assistant 12000 B003
300 Zaidi Supervisor 18000 B003
400 Kumar Assistant 9000 B007
500 Desmond Manager 24000 B003
600 Mei Lin Assistant 9000 B005
branchNo bAddress
B005 Penang
B003 Kelantan
B007 Seremban
Figure 1 : Staff and Branch relations
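Splitting StaffBranch into the Staff and Branch relations of Figure 1 is a pair of projections, and a natural join on branchNo should give the original rows back. A minimal sketch, assuming rows are dicts; `project` and `natural_join` are illustrative helpers, not a library API.

```python
COLUMNS = ("staffNo", "sName", "position", "salary", "branchNo", "bAddress")
staff_branch = [dict(zip(COLUMNS, t)) for t in [
    (100, "Ahmad", "Manager", 30000, "B005", "Penang"),
    (200, "Sally", "Assistant", 12000, "B003", "Kelantan"),
    (300, "Zaidi", "Supervisor", 18000, "B003", "Kelantan"),
    (400, "Kumar", "Assistant", 9000, "B007", "Seremban"),
    (500, "Desmond", "Manager", 24000, "B003", "Kelantan"),
    (600, "Mei Lin", "Assistant", 9000, "B005", "Penang"),
]]


def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    seen = []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.append(values)
    return [dict(zip(attrs, values)) for values in seen]


def natural_join(r, s):
    """Join two non-empty relations on the attributes they share."""
    common = [a for a in r[0] if a in s[0]]
    return [{**x, **y} for x in r for y in s if all(x[a] == y[a] for a in common)]


staff = project(staff_branch, ("staffNo", "sName", "position", "salary", "branchNo"))
branch = project(staff_branch, ("branchNo", "bAddress"))
rejoined = natural_join(staff, branch)
```

Branch ends up with three tuples, one per branch, so each address is stored once; joining back reproduces all six StaffBranch rows.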
Partial Dependency
¨ Occurs when an attribute is functionally dependent on only a part of a multi-attribute key (a key that is made up of more than one field).
¨ A table with only a single-attribute primary key cannot exhibit partial dependency.
Stud_ID StudName Course_ID Course_Title
10 Ali F3038 Database System
20 Abu B2009 Discrete Math
20 Abu F3038 Database System
40 Alia B2009 Discrete Math
FD: Stud_ID, Course_ID → StudName, Course_Title
Remove the partial dependency:
Stud_ID, Course_ID → StudName
Course_ID → Course_Title
Student-Course
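Partial dependencies in the Student-Course table can be found by testing each non-key attribute against each proper, non-empty part of the composite key (Stud_ID, Course_ID). A minimal sketch; `fd_holds` and `partial_dependencies` are hypothetical helpers, and the check only reflects these four rows. It reports Course_ID → Course_Title and also Stud_ID → StudName; either one is removed the same way, by moving the attribute into a table keyed by the part of the key that determines it.

```python
from itertools import combinations

COLUMNS = ("Stud_ID", "StudName", "Course_ID", "Course_Title")
rows = [dict(zip(COLUMNS, t)) for t in [
    (10, "Ali", "F3038", "Database System"),
    (20, "Abu", "B2009", "Discrete Math"),
    (20, "Abu", "F3038", "Database System"),
    (40, "Alia", "B2009", "Discrete Math"),
]]


def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on lhs but differ on rhs (instance check only)."""
    seen = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[a] for a in rhs)
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


def partial_dependencies(rows, key, non_key):
    """List (key_part, attribute) pairs where the attribute depends on a
    proper, non-empty subset of the composite key."""
    found = []
    for attr in non_key:
        for size in range(1, len(key)):
            for subset in combinations(key, size):
                if fd_holds(rows, subset, (attr,)):
                    found.append((subset, attr))
    return found


partials = partial_dependencies(rows, ("Stud_ID", "Course_ID"), ("StudName", "Course_Title"))
```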
Course_ID Course_Title
F3038 Database System
B2009 Discrete Math
Stud_ID StudName Course_ID
10 Ali F3038
20 Abu B2009
20 Abu F3038
40 Alia B2009
Transitive Dependency
¨ Occurs when an attribute is functionally dependent on another non-key attribute. For example, if A → B and B → C, then A → C. That is, if B depends on A, and C depends on B, then C depends on A. This is called transitive dependency.
¨ Refer to the StaffBranch example in Full Functional Dependency.
¨ FD: staffNo → sName, position, salary, branchNo, bAddress
¤ staffNo (A) → branchNo (B)
¤ branchNo (B) → bAddress (C)
¤ therefore staffNo (A) → bAddress (C)
¨ Removing the transitive dependency gives:
¤ staffNo → sName, position, salary, branchNo
¤ branchNo → bAddress
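Applying transitivity repeatedly is what the standard attribute-closure algorithm does. A minimal sketch, assuming FDs are given as (left-hand side, right-hand side) pairs of Python sets; `closure` is a hypothetical helper name. Starting from staffNo, it reaches branchNo and then bAddress, which is the derived dependency staffNo → bAddress.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs.

    fds is a list of (lhs, rhs) pairs of sets. Repeatedly applying any FD
    whose left-hand side is already covered chains A -> B and B -> C
    into A -> C."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# FDs for StaffBranch after separating the transitive step.
fds = [
    ({"staffNo"}, {"sName", "position", "salary", "branchNo"}),
    ({"branchNo"}, {"bAddress"}),
]

staff_no_closure = closure({"staffNo"}, fds)
```

The closure of branchNo, by contrast, is only {branchNo, bAddress}, so branchNo determines bAddress without being a key of StaffBranch.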
Normalization
Table with multi-valued attributes
→ remove repeating groups / multi-valued attributes → First Normal Form
→ remove partial dependencies → Second Normal Form
→ remove transitive dependencies → Third Normal Form
Basic Normal Form
¨ Scenario : A company obtains parts from a number of suppliers. Each supplier is located in one city. A city can have more than one supplier located there and each city has a status code associated with it. Each supplier may provide many parts. The company creates a simple relational table to store this information that can be expressed in relational notation as:
SUPPLIER-PART(s_id, status, city, p_id, qty)
Unnormalized Form (UNF)
¨ A table that contains one or more repeating groups.
s_id  status  city    part_id  quantity
S1    20      London  P1       300
                      P2       200
                      P3       400
                      P4       200
                      P5       100
                      P6       100
S2    10      Paris   P1       300
                      P2       400
S3    10      Paris   P2       200
S4    20      London  P2       200
                      P4       300
                      P5       500
SUPPLIER (s_id, status, city, p_id, qty)
First Normal Form
¨ Def : A relation in which the intersection of each row and column contains one and only one value.
s_id status city part_id quantity
S1 20 London P1 300
S1 20 London P2 200
S1 20 London P3 400
S1 20 London P4 200
S1 20 London P5 100
S1 20 London P6 100
S2 10 Paris P1 300
S2 10 Paris P2 400
S3 10 Paris P2 200
S4 20 London P2 200
S4 20 London P4 300
S4 20 London P5 500
SUPPLIER (s_id, p_id, status, city, qty)
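Moving from the unnormalized table to 1NF means repeating the supplier attributes once for each element of the repeating group. A minimal sketch, assuming the repeating group is stored as a Python list of (part_id, quantity) pairs per supplier; `to_1nf` is a hypothetical helper name.

```python
# Unnormalized form: each supplier carries a repeating group of (part_id, quantity) pairs.
unf = [
    ("S1", 20, "London", [("P1", 300), ("P2", 200), ("P3", 400),
                          ("P4", 200), ("P5", 100), ("P6", 100)]),
    ("S2", 10, "Paris", [("P1", 300), ("P2", 400)]),
    ("S3", 10, "Paris", [("P2", 200)]),
    ("S4", 20, "London", [("P2", 200), ("P4", 300), ("P5", 500)]),
]


def to_1nf(rows):
    """Return one flat (s_id, status, city, part_id, qty) tuple per group element."""
    return [
        (s_id, status, city, part_id, qty)
        for s_id, status, city, parts in rows
        for part_id, qty in parts
    ]


first_nf = to_1nf(unf)
```

Every cell of `first_nf` now holds a single value, and the pair (s_id, part_id) identifies each of the twelve rows.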
First Normal Form (1NF)
¨ A relation in first normal form can still contain redundant data.
¨ Redundancy causes problems called update anomalies.
Second Normal Form (2NF)
¨ Def: A relation that is in first normal form and in which every non-candidate-key attribute is fully functionally dependent on every candidate key.
¨ A table that is in first normal form and in which every non-primary-key attribute is fully dependent on the primary key.
¨ Involves removing partial dependencies.
1NF to 2NF
1NF: Supplier (s_id, part_id, city, status, quantity)
FD:
s_id, part_id → city, status, quantity
2NF FD:
s_id → city, status
s_id, part_id → quantity
Supplier (s_id, city, status)
Part (s_id, part_id, quantity)
Second Normal Form
s_id status city
S1 20 London
S2 10 Paris
S3 10 Paris
S4 20 London
s_id part_id quantity
S1 P1 300
S1 P2 200
S1 P3 400
S1 P4 200
S1 P5 100
S1 P6 100
S2 P1 300
S2 P2 400
S3 P2 200
S4 P2 200
S4 P4 300
S4 P5 500
Supplier (s_id, city, status)
Part (s_id, part_id, quantity)
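The 2NF step is again a pair of projections. A minimal sketch, assuming the 1NF rows are (s_id, status, city, part_id, qty) tuples: Supplier keeps one row per supplier, Part keeps the composite key with its quantity, and joining the two on s_id rebuilds the 1NF table. The redundancy removed is visible in the data: London appears nine times in the 1NF table but only twice in Supplier.

```python
# 1NF SUPPLIER rows: (s_id, status, city, part_id, qty)
first_nf = [
    ("S1", 20, "London", "P1", 300), ("S1", 20, "London", "P2", 200),
    ("S1", 20, "London", "P3", 400), ("S1", 20, "London", "P4", 200),
    ("S1", 20, "London", "P5", 100), ("S1", 20, "London", "P6", 100),
    ("S2", 10, "Paris", "P1", 300), ("S2", 10, "Paris", "P2", 400),
    ("S3", 10, "Paris", "P2", 200), ("S4", 20, "London", "P2", 200),
    ("S4", 20, "London", "P4", 300), ("S4", 20, "London", "P5", 500),
]

# s_id -> city, status: project onto (s_id, city, status), dropping duplicates.
supplier = sorted({(s_id, city, status) for s_id, status, city, _, _ in first_nf})
# s_id, part_id -> quantity: quantity stays with the full composite key.
part = [(s_id, part_id, qty) for s_id, _, _, part_id, qty in first_nf]

# Join on s_id to confirm that no information was lost.
rebuilt = [
    (s_id, status, city, part_id, qty)
    for s_id, city, status in supplier
    for p_s_id, part_id, qty in part
    if s_id == p_s_id
]
```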
Third Normal Form (3NF)
¨ Def: A relation that is in first and second normal form and in which no non-candidate-key attribute is transitively dependent on any candidate key.
¨ All columns in a relational table are dependent only upon the primary key.
¨ Involves removing transitive dependencies.
2NF to 3NF
2NF FD:
s_id, part_id → quantity
s_id → city, status
Part is already in 3NF: its only non-key column, quantity, is fully dependent on the primary key (s_id, part_id).
s_id status city
S1 20 London
S2 10 Paris
S3 10 Paris
S4 20 London
Transitive dependency: city is determined both by the primary key s_id and by the non-key column status.
Supplier
2NF: s_id → city, status
Transitive dependency:
s_id (A) → status (B)
status (B) → city (C)
therefore s_id (A) → city (C)
Supplier (s_id, status, city) is decomposed into:
Supplier_status
s_id status
S1 20
S2 10
S3 10
S4 20
Status_city
status city
10 Paris
20 London
3NF relations
Part (s_id, part_id, quantity)
Supplier_status (s_id, status)
Status_city (status, city)
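The split of Supplier into Supplier_status and Status_city can be sketched the same way, following the dependency status → city used above. A minimal sketch, assuming rows are (s_id, status, city) tuples; rejoining on status recovers the 2NF Supplier rows, which confirms losslessness for this instance of the data only.

```python
# 2NF Supplier rows: (s_id, status, city)
supplier = [("S1", 20, "London"), ("S2", 10, "Paris"),
            ("S3", 10, "Paris"), ("S4", 20, "London")]

# Supplier_status keeps the key and the attribute it determines directly.
supplier_status = [(s_id, status) for s_id, status, _ in supplier]
# Status_city stores each status -> city fact once.
status_city = sorted({(status, city) for _, status, city in supplier})

# Natural join on status to rebuild Supplier.
rejoined = sorted(
    (s_id, status, city)
    for s_id, status in supplier_status
    for st, city in status_city
    if status == st
)
```

Status_city holds two tuples instead of repeating each city four times, and each status appears in it exactly once.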
Boyce-Codd Normal Form (BCNF)
¨ Def: A relation is in BCNF if and only if every determinant is a candidate key.
¨ An advanced normal form.
¨ Based on the concept of determinants.
¨ BCNF is often treated as an extension of 3NF: it is stricter than 3NF but less strict than 4NF.
¨ However, a table may be in 3NF but not in BCNF.
BCNF
¨ An advanced version of 3NF that deals with relational tables that have:
¤ Multiple candidate keys
¤ Composite candidate keys
¤ Candidate keys that overlap
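The BCNF definition can be checked mechanically. A minimal sketch, assuming the FDs for each relation are listed explicitly as pairs of Python sets: a determinant passes when its attribute closure covers every attribute of the relation, which is the usual formal test for a superkey (a minimal such determinant is a candidate key). `closure` and `bcnf_violations` are hypothetical helper names. Applied to the chapter's StaffBranch relation, branchNo is reported as a determinant that is not a key, whereas the Branch relation of Figure 1 passes.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose determinant is not a superkey."""
    everything = set(attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != everything
    ]


staff_branch_attrs = {"staffNo", "sName", "position", "salary", "branchNo", "bAddress"}
staff_branch_fds = [
    ({"staffNo"}, {"sName", "position", "salary", "branchNo"}),
    ({"branchNo"}, {"bAddress"}),
]

branch_attrs = {"branchNo", "bAddress"}
branch_fds = [({"branchNo"}, {"bAddress"})]
```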