7/28/2019 Normalization Notes by Mahendra Patil
http://slidepdf.com/reader/full/normalization-notes-by-mahendra-patil 1/18
Normalization in DBMS, notes prepared by Mahendra Patil
Normalization:
A technique for producing a set of tables with desirable properties that support the requirements of a user or company.
Process of decomposing relations with anomalies to produce smaller, well-structured relations.
Normalization is a process for deciding which attributes should be grouped together in a relation.
Used to validate and improve a logical design so that it satisfies certain constraints and avoids unnecessary duplication of data.
Objective of Normalization:
The basic objectives of normalization are:
1) To reduce redundancy, which means that information is to be stored only once.
2) To reduce the file storage space required by base tables.
3) To reduce the inconsistency caused by redundancy.
4) To make it feasible to represent any relation in the database.
5) To free relations from undesirable insertion, update, and deletion anomalies.
Properties of Normalized Relations:
a. No data value should be duplicated in different rows unnecessarily.
b. A value must be specified (and required) for every attribute in a row.
c. Each relation should be self-contained. In other words, if a row from a relation is
deleted, important information should not be accidentally lost.
d. When a row is added to a relation, other relations in the database should not be
affected.
e. A value of an attribute in a tuple may be changed independent of other tuples in the
relation and other relations.
Data redundancy and update anomalies:
Problems associated with data redundancy are illustrated by comparing the Staff and
Branch tables with the StaffBranch table.
Fig: StaffBranch Table
StaffBranch table has redundant data; the details of a branch are repeated for every member of staff.
In contrast, the branch information appears only once for each branch in the
Branch table and only the branch number (branchNo) is repeated in the Staff table,
to represent where each member of staff is located.
Tables that contain redundant information may suffer from update anomalies.
Types of update anomalies include:
1) insertion
2) deletion
3) modification (update)
1) Insert Anomalies: Try to insert details for a new member of staff into StaffBranch.
You also need to insert branch details that are consistent with existing details for
the same branch.
It is hard to maintain data consistency with StaffBranch.
2) Delete Anomalies:
Try to delete details for a member of staff from StaffBranch.
You also lose branch details in that row (tuple).
3) Update Anomalies:
Try to update the value of one of the attributes of a branch.
You also need to update that information in all the rows about the same branch.
Decomposition of Relations :
Two important properties of decomposition:
Lossless-join property enables us to find any instance of original relation from
corresponding instances in the smaller relations.
Dependency preservation property enables us to enforce a constraint on
original relation by enforcing some constraint on each of the smaller relations.
Staff and Branch relations, which are obtained by decomposing StaffBranch, do not suffer from these anomalies.
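The lossless-join property and the deletion anomaly can be sketched on a small instance. The rows below are hypothetical (they are not the contents of the StaffBranch figure), and the helper names `project`, `natural_join` and `as_set` are made up for this illustration.

```python
# Hypothetical StaffBranch rows; the values are illustrative only.
staff_branch = [
    {"staffNo": "S1", "name": "Ann", "branchNo": "B1", "branchAddress": "12 High St"},
    {"staffNo": "S2", "name": "Bob", "branchNo": "B1", "branchAddress": "12 High St"},
    {"staffNo": "S3", "name": "Cy", "branchNo": "B2", "branchAddress": "5 Mill Rd"},
]

def project(rows, attrs):
    """Project rows onto attrs and drop duplicate tuples."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result

def natural_join(left, right):
    """Join two relations on every attribute name they share."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in shared):
                joined.append({**l_row, **r_row})
    return joined

def as_set(rows):
    """Order-independent view of a relation, used for comparisons."""
    return {tuple(sorted(row.items())) for row in rows}

staff = project(staff_branch, ["staffNo", "name", "branchNo"])
branch = project(staff_branch, ["branchNo", "branchAddress"])

# Lossless join: joining the projections gives back the original instance.
lossless = as_set(natural_join(staff, branch)) == as_set(staff_branch)

# Deletion anomaly: removing S3 from StaffBranch also removes branch B2,
# while the separate Branch relation still records B2.
remaining = [r for r in staff_branch if r["staffNo"] != "S3"]
b2_lost_in_staff_branch = all(r["branchNo"] != "B2" for r in remaining)
b2_kept_in_branch = any(r["branchNo"] == "B2" for r in branch)

print(lossless, b2_lost_in_staff_branch, b2_kept_in_branch)
```

In this sketch each branch address is stored once in `branch`, so changing it touches a single row instead of one row per member of staff.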
Steps in Normalisation:
First normal form: Any multivalued attributes (repeating groups) have been removed
Second normal form: Any partial functional dependencies have been removed
Third normal form: Any transitive dependencies have been removed
Boyce/Codd normal form: Any remaining anomalies that result from functional
dependencies have been removed
Fourth normal form: Any multivalued dependencies have been removed
Fifth normal form: Any remaining anomalies have been removed
In practice, normalization usually stops at third normal form.
Following Fig shows process:
Relationship of Normal Forms:
The Process of Normalization :
Given a relation, use the following cycle
1. Find out what normal form it is in.
2. Transform the relation to the next higher form by decomposing it to form simpler
relations
3. You may need to refine the relation further if decomposition resulted in
undesirable properties
First normal form (1NF):
A relation is in 1NF if and only if all underlying domains contain atomic values only.
Or
A table in which the intersection of every column and record contains only one value.
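Steps from UNF to 1NF:
1. Nominate an attribute or group of attributes to act as the key for the unnormalized table.
2. Identify the repeating group(s) in the unnormalized table that repeat for the key attribute(s).
3. Remove the repeating group(s), either by flattening the table so that each row holds a single value per column, or by placing the repeating data in a separate relation along with a copy of the key attribute(s).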
Fig: Branch table is not in 1NF
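As a rough sketch of moving a repeating group into its own relation, assume a hypothetical unnormalized Branch table in which each branch carries a list of telephone numbers. The column names, the data, and the helpers `split_repeating_group` and `is_atomic` are assumptions for illustration; the original figure is not reproduced here.

```python
# Hypothetical unnormalized rows: telNos is a repeating group.
branch_unf = [
    {"branchNo": "B001", "branchAddress": "8 Jefferson Way",
     "telNos": ["555-0101", "555-0102"]},
    {"branchNo": "B002", "branchAddress": "City Center Plaza",
     "telNos": ["555-0201"]},
]

def split_repeating_group(rows, key, group, item_name):
    """Move a repeating group into its own relation with a copy of `key`."""
    base = [{k: v for k, v in row.items() if k != group} for row in rows]
    detail = [
        {key: row[key], item_name: value}
        for row in rows
        for value in row[group]
    ]
    return base, detail

def is_atomic(rows):
    """True if no cell holds a list, set, tuple, or dict."""
    return all(
        not isinstance(value, (list, set, tuple, dict))
        for row in rows
        for value in row.values()
    )

branch, branch_tel = split_repeating_group(
    branch_unf, key="branchNo", group="telNos", item_name="telNo"
)

print(is_atomic(branch_unf))                     # the list column is not atomic
print(is_atomic(branch), is_atomic(branch_tel))  # both results are atomic
print(branch_tel)
```

The other route in step form, flattening, would instead repeat `branchAddress` on every row, which is why the separate relation is often the less redundant choice.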
Second normal form (2NF) :
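A relation is in 2NF if it is in 1NF and every non-key attribute is fully functionally dependent on the primary key of the relation.
2NF is only a concern for tables with composite primary keys; a 1NF relation whose primary key is a single attribute is automatically in 2NF.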
Functional dependency :
Describes the relationship between attributes in a relation or columns in a table.
If A and B are columns of table R, B is functionally dependent on A if each value of A in R is associated with exactly one value of B in R. It is represented by A -> B. We are interested in finding such functional dependencies among database relations.
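The definition can be tested against a concrete set of rows. The sketch below uses hypothetical data and a made-up helper `fd_holds`. Note that a functional dependency is a constraint on all legal instances, so a sample can refute a dependency but cannot prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Check whether lhs -> rhs is satisfied by the given rows."""
    seen = {}
    for row in rows:
        a_value = tuple(row[a] for a in lhs)
        b_value = tuple(row[b] for b in rhs)
        # setdefault stores the first B-value seen for this A-value and
        # returns the stored one on later visits.
        if seen.setdefault(a_value, b_value) != b_value:
            return False
    return True

# Hypothetical rows, for illustration only.
rows = [
    {"staffNo": "S1", "branchNo": "B1", "branchAddress": "12 High St"},
    {"staffNo": "S2", "branchNo": "B1", "branchAddress": "12 High St"},
    {"staffNo": "S3", "branchNo": "B2", "branchAddress": "5 Mill Rd"},
]

print(fd_holds(rows, ["branchNo"], ["branchAddress"]))  # True
print(fd_holds(rows, ["branchNo"], ["staffNo"]))        # False
```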
• Determinant of a functional dependency refers to attribute or group of attributes
on left-hand side of the arrow.
• A functional dependency A -> B is a full functional dependency if B depends on the whole of A, that is, removing any attribute from A means the dependency no longer holds.
1NF to 2NF :
Steps:
1. Identify primary key for the 1NF relation.
2. Identify functional dependencies in the relation.
3. If partial dependencies exist on the primary key, remove them by placing the partially dependent attributes in a new relation along with a copy of their determinant.
For ex:
Fig: TempStaffAllocation table is not in 2NF
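The 1NF to 2NF steps can be sketched in code once the functional dependencies are written down. The schema and dependencies below are assumptions made for this illustration (the figure itself is not reproduced in these notes), and `to_2nf` is a hypothetical helper. It only looks at dependencies whose determinant lies inside the primary key and assumes the primary key is the only candidate key.

```python
# Assumed primary key and functional dependencies (determinant, dependents).
primary_key = ("staffNo", "branchNo")
fds = [
    (("staffNo", "branchNo"), ("hoursPerWeek",)),
    (("staffNo",), ("name", "position")),
    (("branchNo",), ("branchAddress",)),
]

def to_2nf(primary_key, fds):
    """Group non-key attributes by the part of the key they depend on.

    Each determinant becomes the key of one relation that holds a copy of
    the determinant plus the attributes it determines.
    """
    key = set(primary_key)
    grouped = {}
    for determinant, dependents in fds:
        non_key = [a for a in dependents if a not in key]
        if not non_key or not set(determinant) <= key:
            continue  # dependencies outside the key are a 3NF concern
        grouped.setdefault(tuple(determinant), []).extend(non_key)
    return {det: list(det) + attrs for det, attrs in grouped.items()}

relations = to_2nf(primary_key, fds)
for determinant, attributes in relations.items():
    kind = "full" if set(determinant) == set(primary_key) else "partial"
    print(kind, determinant, attributes)
```

Under these assumptions the sketch yields three relations: one keyed on (staffNo, branchNo) that keeps hoursPerWeek, one keyed on staffNo, and one keyed on branchNo.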
Third normal form (3NF) :
A relation R is in third normal form if it is in 2NF and every non-key attribute of R is non-transitively dependent on the primary key of R.
For example, consider a table with A, B, and C. If B is functionally dependent on A (A -> B) and C is functionally dependent on B (B -> C), then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).
If a transitive dependency exists on the primary key, the table is not in 3NF.
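The A, B, C example can be checked mechanically with an attribute closure. This is a minimal sketch; `closure` and `transitive_dependencies` are names chosen for the illustration, and the check assumes the primary key is the only candidate key.

```python
def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def transitive_dependencies(primary_key, fds):
    """Find FDs X -> Y where the key determines X, X does not determine
    the key, X is not part of the key, and Y adds non-key attributes."""
    key = set(primary_key)
    key_closure = closure(key, fds)
    found = []
    for lhs, rhs in fds:
        via = set(lhs)
        dependents = set(rhs) - via - key
        reachable = via <= key_closure
        determines_key = key <= closure(via, fds)
        if reachable and not determines_key and not via <= key and dependents:
            found.append((tuple(lhs), tuple(sorted(dependents))))
    return found

# The example from the text: A -> B and B -> C, with primary key A.
fds = [(("A",), ("B",)), (("B",), ("C",))]

print(sorted(closure({"A"}, fds)))           # ['A', 'B', 'C']
print(transitive_dependencies(("A",), fds))  # [(('B',), ('C',))]
```

The reported dependency B -> C is the one that would move to a new relation (B, C), leaving (A, B) behind.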
2NF to 3NF :
Steps:
1. Identify the primary key in the 2NF relation.
2. Identify functional dependencies in the relation.
3. If transitive dependencies exist on the primary key, remove them by placing them in a new relation along with a copy of their determinant.
For ex:
Fig: StaffBranch table is not in 3NF
Fig: Converting the StaffBranch table to 3NF
Boyce/Codd Normal Form (BCNF):
A relation is in BCNF ⇔ every determinant is a candidate key.
A determinant is an attribute, possibly composite, on which some other attribute is fully functionally dependent.
For ex: Consider a relation SJT (Student-Subject-Teacher relation)
S      J        T
Smith  Math     Prof. White
Smith  Physics  Prof. Green
Jones  Math     Prof. White
Jones  Physics  Prof. Brown
1. For each subject (J), each student (S) of that subject is taught by only one teacher (T):
FD: S,J -> T
2. Each teacher (T) teaches only one subject (J):
FD: T -> J
3. Each subject (J) is taught by several teachers, so J does not functionally determine T.
There exists a relation SJT with attributes S (student), J (subject) and T (teacher).
The meaning of an SJT tuple is that the specified student is taught the specified subject by the specified teacher. There are two determinants in the functional dependencies: (S, J) and T.
Anomalies in update: If the fact that Jones studies physics is deleted, the fact that Professor Brown teaches physics is also lost. This is because T is a determinant but not a candidate key.
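The anomaly can be reproduced on the four SJT rows given above. In this sketch the helpers `fd_holds` and `is_unique` are hypothetical names, and both only test the sample instance rather than the schema.

```python
sjt = [
    ("Smith", "Math", "Prof. White"),
    ("Smith", "Physics", "Prof. Green"),
    ("Jones", "Math", "Prof. White"),
    ("Jones", "Physics", "Prof. Brown"),
]
S, J, T = 0, 1, 2  # column positions

def fd_holds(rows, lhs, rhs):
    """Check whether columns lhs determine columns rhs in these rows."""
    seen = {}
    for row in rows:
        left = tuple(row[i] for i in lhs)
        right = tuple(row[i] for i in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True

def is_unique(rows, cols):
    """True if no two rows agree on cols (a key test on this instance)."""
    values = [tuple(row[i] for i in cols) for row in rows]
    return len(values) == len(set(values))

t_is_determinant = fd_holds(sjt, [T], [J])  # T -> J holds
t_is_key = is_unique(sjt, [T])              # Prof. White appears twice

# Deletion anomaly: drop the fact that Jones studies Physics.
after_delete = [row for row in sjt if row[:2] != ("Jones", "Physics")]
brown_still_known = any(row[T] == "Prof. Brown" for row in after_delete)

# A separate (T, J) relation keeps the teacher-subject fact on its own.
tj = {(row[T], row[J]) for row in sjt}
brown_in_tj = ("Prof. Brown", "Physics") in tj

print(t_is_determinant, t_is_key, brown_still_known, brown_in_tj)
```

The output shows a determinant (T) that is not a key, which is exactly the situation BCNF rules out.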
Fig: relation ST Fig: relation TJ
Relations ST (S, T) and TJ (T, J) are in BCNF because all determinants are candidate keys. The shared attribute T is the key of TJ, so joining ST and TJ on T reconstructs SJT.
BCNF vs 3NF:
It should be noted that most relations that are in 3NF are also in BCNF. Infrequently, a 3NF relation is not in BCNF, and this happens only if
(a) the candidate keys in the relation are composite keys (that is, they are not single attributes),
(b) there is more than one candidate key in the relation, and
(c) the keys are not disjoint, that is, some attributes in the keys are common.
BCNF differs from 3NF only when there is more than one candidate key and the keys are composite and overlapping.
• BCNF: For every functional dependency X -> Y in a set F of functional dependencies over relation R, either:
– Y is a subset of X, or
– X is a superkey of R
• 3NF: For every functional dependency X -> Y in a set F of functional dependencies over relation R, either:
– Y is a subset of X, or
– X is a superkey of R, or
– Y is a subset of K for some key K of R
For Example:
Consider a 3NF schema which is not in BCNF:
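Client, Office -> Client, Office, Account
Account -> Office

Account  Client  Office
A        Joe     1
B        Mary    1
A        John    1
C        Joe     2

3NF has some redundancy; BCNF does not.
Unfortunately, a BCNF decomposition is not always dependency preserving, whereas a dependency-preserving 3NF decomposition always exists. Here, after decomposing into the two relations below, Client, Office -> Account can no longer be checked within a single relation.

Account  Office
A        1
B        1
C        2

Account  Client
A        Joe
B        Mary
A        John
C        Joe
(no non-trivial FDs)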
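The two definitions can be applied to the Client/Office/Account dependencies directly. The sketch below is one possible way to do it; `closure`, `candidate_keys` and `violations` are hypothetical helpers, and the key search is brute force, which is only reasonable for a handful of attributes.

```python
from itertools import combinations

ATTRS = {"Client", "Office", "Account"}
FDS = [
    (("Client", "Office"), ("Account",)),
    (("Account",), ("Office",)),
]

def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure covers the whole relation."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            is_superkey = closure(combo, fds) == set(attrs)
            is_minimal = not any(set(k) <= set(combo) for k in keys)
            if is_superkey and is_minimal:
                keys.append(combo)
    return keys

def violations(attrs, fds, third_normal_form):
    """FDs that break BCNF, or 3NF when third_normal_form is True."""
    keys = candidate_keys(attrs, fds)
    bad = []
    for lhs, rhs in fds:
        extra = set(rhs) - set(lhs)  # the non-trivial part of Y
        if not extra or closure(lhs, fds) == set(attrs):
            continue  # trivial FD, or X is a superkey
        if third_normal_form and any(extra <= set(k) for k in keys):
            continue  # Y lies inside some key: allowed by 3NF only
        bad.append((lhs, rhs))
    return bad

print(candidate_keys(ATTRS, FDS))
print(violations(ATTRS, FDS, third_normal_form=True))   # no 3NF violation
print(violations(ATTRS, FDS, third_normal_form=False))  # Account -> Office
```

The two candidate keys found, (Account, Client) and (Client, Office), are composite and overlap, which matches conditions (a) to (c) for a 3NF relation that is not in BCNF.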
Multi-valued Dependency:
Given a relation R with attributes A, B and C, the multi-valued dependency R.A →→ R.B holds ⇔ the set of B-values matching a given (A-value, C-value) pair in R depends only on the A-value and is independent of the C-value.
Fourth Normal Form (4NF):
A relation is in 4NF ⇔ whenever there exists a non-trivial multi-valued dependency (MVD), say A →→ B, all attributes are also functionally dependent on A, i.e. A → X for every attribute X of the relation.
For Ex: Relation CTX (not in 4NF)
Course   Teacher      Text
Physics  Prof. Green  Basic Mechanics
Physics  Prof. Green  Principles of Optics
Physics  Prof. Brown  Basic Mechanics
Physics  Prof. Brown  Principles of Optics
Physics  Prof. Black  Basic Mechanics
Physics  Prof. Black  Principles of Optics
Math     Prof. White  Modern Algebra
Math     Prof. White  Projective Geometry
A tuple (C, T, X) appears in CTX ⇔ course C can be taught by teacher T and uses X as a reference. For a given course, all possible combinations of teacher and text appear; that is, CTX satisfies the constraint: if tuples (C, T1, X1) and (C, T2, X2) both appear, then tuples (C, T1, X2) and (C, T2, X1) both appear also.
CTX contains redundancy.
CTX is in BCNF, as it is all key and there are no other functional determinants.
But CTX is not in 4NF, as it involves an MVD that is not an FD at all, let alone an FD in which the determinant is a candidate key.
Anomalies in insert: For example, to add the information that the physics course uses a new text called Advanced Mechanics, it is necessary to create three new tuples, one for each of the three teachers.
Fig: Relation CT Fig: Relation CX
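The CTX rows above are enough to check the multi-valued dependency and the decomposition by hand or in code. The following is a minimal sketch; `course_mvd_holds` is a hypothetical helper that tests Course →→ Teacher on this instance only.

```python
ctx = [
    ("Physics", "Prof. Green", "Basic Mechanics"),
    ("Physics", "Prof. Green", "Principles of Optics"),
    ("Physics", "Prof. Brown", "Basic Mechanics"),
    ("Physics", "Prof. Brown", "Principles of Optics"),
    ("Physics", "Prof. Black", "Basic Mechanics"),
    ("Physics", "Prof. Black", "Principles of Optics"),
    ("Math", "Prof. White", "Modern Algebra"),
    ("Math", "Prof. White", "Projective Geometry"),
]

def course_mvd_holds(rows):
    """Course ->> Teacher: for any two rows of the same course, the teacher
    of one combined with the text of the other must also be a row."""
    present = set(rows)
    return all(
        (c1, t1, x2) in present
        for (c1, t1, _x1) in rows
        for (c2, _t2, x2) in rows
        if c1 == c2
    )

# Projections CT (course, teacher) and CX (course, text).
ct = sorted({(c, t) for c, t, _ in ctx})
cx = sorted({(c, x) for c, _, x in ctx})

# Joining the projections on course gives back CTX.
rejoined = {(c, t, x) for c, t in ct for c2, x in cx if c == c2}

# Adding one new Physics text costs one row in CX, but one row per
# Physics teacher in CTX.
physics_teachers = sum(1 for c, _ in ct if c == "Physics")

print(course_mvd_holds(ctx), rejoined == set(ctx))
print(len(ctx), len(ct), len(cx), physics_teachers)
```

Dropping any single Physics row from `ctx` makes `course_mvd_holds` return False, which is the redundancy the MVD forces on the combined table.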
4NF is an improvement over BCNF, in that it eliminates another form of undesirable structure.
Fifth Normal Form (5NF)/ Projection-Join Normal form:
Join dependency: relation R satisfies the JD (X, Y, …, Z) ⇔ it is the join of its projections on X, Y, …, Z, where X, Y, …, Z are subsets of the set of attributes of R.
A relation is in 5NF/PJNF (projection-join normal form) ⇔ every join dependency in R is implied by the candidate keys of R.
5NF is the ultimate normal form with respect to projection and join.
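A join dependency can be illustrated with a small three-attribute relation. The supplier/part/project rows below are hypothetical and are not taken from these notes. The sketch shows an instance that equals the join of its three binary projections but not the join of only two of them.

```python
# Hypothetical rows (supplier, part, project), for illustration only.
spj = {
    ("S1", "P1", "J2"),
    ("S1", "P2", "J1"),
    ("S2", "P1", "J1"),
    ("S1", "P1", "J1"),
}

# The three binary projections.
sp = {(s, p) for s, p, _ in spj}
pj = {(p, j) for _, p, j in spj}
js = {(j, s) for s, _, j in spj}

# Joining only SP and PJ (on part) produces a spurious tuple.
two_way = {(s, p, j) for s, p in sp for p2, j in pj if p == p2}

# Joining the result with JS (on project and supplier) removes it again.
three_way = {(s, p, j) for s, p, j in two_way if (j, s) in js}

print(two_way == spj)    # False
print(two_way - spj)     # {('S2', 'P1', 'J2')}
print(three_way == spj)  # True
```

If this equality were required of every legal instance, the relation would satisfy a join dependency over (S, P), (P, J) and (J, S). Since the relation is all key, that dependency is not implied by its candidate key, so under that assumption the relation would not be in 5NF.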
For Ex:
Summary:
• Relations are categorized into normal forms based on which modification anomalies or other problems they are subject to: