DCS210 Course Manual: A Gentle Introduction to Database Normalization 2014 © IACC, ABU Zaria
PAGE 1 Downloaded from http://www.auwalgene.com/mystudents/lecturenotes
FREE, NOT FOR SALE!
Abridged Lecture Notes For
DCS210 Introduction to Database Management (II)
prepared and delivered by
Adamu Auwal Gene MCPN @IACC, Ahmadu Bello University, Zaria – Nigeria
Last Updated: March, 2014
A Gentle Introduction to
Database Normalization
Abridged Lecture Notes For
DCS210 Introduction to Database Management (II)
Diploma in Computer Science
Year II, Semester II
Prepared and Delivered
By
Adamu Auwal Gene MCPN Chartered Information Technology Professional
@Iya Abubakar Computer Centre, Ahmadu Bello University, Zaria – Nigeria
Last Updated: March, 2014
CONTENTS
INTRODUCTION
JUST SOME FEW RULES PLEASE, BEFORE WE START
SECTION I: BASIC CONCEPTS OF NORMALIZATION
1.1: So, what is normalization?
1.2: Anomalies
1.2.1: Insert anomaly
1.2.2: Delete anomaly
1.2.3: Update anomaly
1.3: Purpose or goals of normalization
1.4: Advantages and disadvantages of normalization
1.5: Functional Dependencies (FD)
1.6: Partial Dependencies
1.7: Transitive Dependencies
1.8: Normal Forms (NF)
1.8.1: First Normal Form (1NF)
1.8.2: Second Normal Form (2NF)
1.8.3: Third Normal Form (3NF)
1.8.4: Other Normal Forms
1.9: Class Exercises: Can You?
1.10: Extra Credit: Can You?
SECTION II: NORMALIZATION BY EXAMPLE
2.1: Case-Study Introduction
2.2: Conversion to First Normal Form
2.2.1: Step 1: Eliminate the repeating groups
2.2.2: Step 2: Identify the primary key
2.2.3: Step 3: Identify all dependencies
2.3: Conversion to Second Normal Form
2.3.1: Step 1: Identify all the key components
2.3.2: Step 2: Identify the dependent attributes
2.4: Conversion to Third Normal Form
2.4.1: Step 1: Identify each new determinant
2.4.2: Step 2: Identify the dependent attributes
2.4.3: Step 3: Remove the dependent attributes from transitive dependencies
2.5: Improving the Design
SECTION III: NORMALIZATION EXTRA CREDITS (CAN YOU?)
3.1: Extra Credit 1: Worked Example
3.2: Extra Credit 2: Worked Example
3.3: Extra Credit 3: Worked Example
SECTION IV: USEFUL DATABASE TERMS AND DEFINITIONS
INTRODUCTION
WELCOME TO the second part of your course on Introduction to Database
Management. Of course you have been learning about and working with
databases since last semester here at IACC; and the term "normalization"
is not new to you – we have referred to it more than a dozen times in last
semester's course, DCS209. You also might have heard phrases or sentences like "the
database is not [correctly] normalized" or "My database is now in BCNF" and so on. All
these terms may sound somewhat academic or intimidating but trust me, you
need not be scared about normalization anymore once we finish this course manual
(and especially when you attend my lectures and labs punctually, consistently and
attentively).
In this course manual, you will be introduced to the basic concept of database
normalization, taking a brief look at the most common normal forms. Your future
explorations of the principles of database design and implementation will provide
more in-depth principles and practices of the normalization process.
Please be reminded that this manual covers only one topic in the whole of your DCS210
syllabus, so be sure to go online and download all the other manuals for DCS210 and
more useful resources at my website, which can be accessed anytime at
http://www.auwalgene.com/mystudents/lecturenotes
IMPORTANT NOTE: It is YOUR PERSONAL RESPONSIBILITY to download, print and
bind all course materials as advised. Your C.A. and exam questions will always be
set from what has been covered in these manuals and in the class. As a result,
your final scores in this course will depend largely on how promptly you
get and study all the materials, as well as how regularly and attentively you
participate in class and lab sessions.
Finally, I take no responsibility for any spelling or grammatical errors found in this
manual. My written English is probably none too good, so I won't take offence at any
corrections from any Grammar Nazi out there: I am fluent in programming languages,
not the English language!
Best regards, and happy database normalization!
M-Auwal Gene mcpn @IACC, ABU Zaria March, 2014
JUST SOME FEW RULES PLEASE, BEFORE WE START
1: Attendance Policy: Please note
that all students are expected to
attend every class and lab session on
time. Punctuality is expected, and is
part of your cumulative continuous
assessment. In case of unexpected
events that make it impossible for
any student to attend class or lab
sessions, such students should
contact me (or any other Instructor
in charge) via phone call or send an
SMS text message briefly explaining
why they would not be in the class or
lab.
2: Extra Credit: Occasionally there
are opportunities for students to earn
extra credits for exceptionally
excellent work or enthusiastic
attitude towards study in this
course. There is no guarantee that
there will be extra credit
opportunities every time; but
whenever the opportunity arises, all
students will have an equal chance of
earning those extra credits.
Maximum extra credit obtainable by
any student is 5 points (out of 100).
3: Assignments: To evaluate
students’ learning progress, one or
more take-home assignments shall be
given to students at the end of every
class or lab session. Those
assignments will mostly be based on
current topics being discussed; but
may also sometimes include work
outside of the current topic.
4: Make-Up and Late Policy: All
assignments that are handed in late
will be docked 2% per day that they
are late, unless arrangements have
been made at least 48 hours before
the due date. The term “LATE” refers
to all assignments that are turned in
after the class or lab time on the
assignment’s due date. Please note
that I am not responsible for you not
having your personal laptop, or not
having Internet access, or not having
access to the lab computers to enable
you to do your assignments. You will
normally be given freedom to do all
practical assignments in the lab if you
properly approach the Centre’s
Operations Manager or any of the Lab
Support Staff on duty.
5: Grading Policy: The following
grading policy shall apply during this
course (both theory and practical
labs are covered):
Please note that every student’s
grade totally depends on what he or
she has achieved during the course:
the grades will be earned, not given!
6: Lab Etiquette: Since we are a
large class in a large lab, let us all
faithfully follow these four simple
rules in order to make life easy for
everyone:
i. Be punctual. Coming in late
disrupts your fellow students.
If you are going to be late for a
lab session, perhaps you should
not bother coming to the lab, as
you might not be able to catch
up anyway.
ii. Do not leave the lab early
unless it is an emergency.
iii. No texting, phone calls or
Internet browsing during class
or lab sessions.
iv. Kindly turn off cell phones, and
Internet access. If your phone
rings during a class or lab
session or you are seen
browsing the ’Net during a
class or lab session, you shall
be penalized – and your
penalty is to provide snacks
and drinks for the Instructor at
the next class or lab session.
SECTION I: BASIC CONCEPTS OF NORMALIZATION
1.1: So, what is normalization?
It will be great if we begin from the beginning, right? So we are going to start by
trying to understand clearly what normalization actually is. In database
management systems, normalization is a logical database design method. In
simple English, it is a process of systematically breaking a complex database table
into simpler ones so as to efficiently organize data in that database.
Put in another way, we can say normalization is a process in which database tables
are systematically examined for anomalies and, when any are detected, those
anomalies are removed by splitting the table into two or more new, related tables.
If you like, you may also say normalization is a process for evaluating and
correcting table structures to minimize data redundancies, thereby helping to
eliminate data anomalies. It helps us evaluate table structures and produce
good tables.
In essence, normalization is the process of eliminating “bad” dependencies by
splitting up tables and linking them with foreign keys.
It is a formal process of decomposing relations with anomalies to produce
smaller, well-structured and stable relations (tables). Primarily, it is a tool to
validate and improve a logical design so that it satisfies certain constraints that
avoid unnecessary duplication of data.
Normalization is a very important part of the database development process; as it
is often during normalization that database designers get their first real look into
how the data are going to interact in the database.
Finding problems with the database structure at this stage is strongly preferred
to finding problems later on in the development process after so much work has
been done (wrongly).
In a short while, we shall understand more about normalization and try it out
ourselves; but for now, let us find out why we need normalization in database
design and what are the pros and cons of normalization.
Fig. 1: Normalization Concept Map
1.2: Anomalies
In relational database design, we not only want to create a structure that stores
all of the data, but we also want to do it in a way that minimizes potential errors
when we work with the data. The default language for accessing data from a
relational database is SQL. You will recall that SQL can be used to manipulate data
in the following ways: insert new data, delete unwanted data, and update existing
data. Correspondingly, in an un-normalized design, there are three problems that can occur
when we work with the data:
1.2.1: Insert anomaly: This refers to the situation when it is impossible to insert
certain types of data into the database or, if data must be inserted anyway, then
insertion of new rows forces the user to create duplicate data. In short, an insert
anomaly occurs when extra data beyond the desired data must be added to a
table.
1.2.2: Delete anomaly: The deletion of data leads to unintended loss of additional
data, data that we had wished to preserve.
1.2.3: Update anomaly: This refers to the situation where updating the value of a
column leads to database inconsistencies (i.e., because data is duplicated,
changing data in one row forces the same change in other rows; otherwise the
unchanged rows will hold values that are inconsistent with the updated one). It occurs when it is
necessary to change multiple rows to modify ONLY a single fact.
To address the three problems above, we go through the process of
normalization. When we go through the normalization process, we increase the
number of tables in the database, while decreasing the amount of data stored in
each table. There are several different levels of database normalization as you
will learn later.
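To make the three anomalies concrete, here is a minimal sketch using Python's built-in sqlite3 module. Please note that the table EnrolmentFlat, its columns and every row in it are made up purely for illustration; they are not taken from any table in this manual.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical un-normalized table: lecturer details repeat on every row.
conn.execute("""
    CREATE TABLE EnrolmentFlat (
        reg_num      TEXT,
        course_code  TEXT,
        lecturer     TEXT,
        lecturer_tel TEXT
    )
""")
conn.executemany(
    "INSERT INTO EnrolmentFlat VALUES (?, ?, ?, ?)",
    [
        ("S-001", "C-101", "Lecturer A", "111"),
        ("S-002", "C-101", "Lecturer A", "111"),
        ("S-003", "C-102", "Lecturer B", "222"),
    ],
)

# Update anomaly: one fact (Lecturer A's phone) lives in two rows.
# Changing only one of them leaves the table contradicting itself.
conn.execute(
    "UPDATE EnrolmentFlat SET lecturer_tel = '999' WHERE reg_num = 'S-001'"
)
phones_for_a = conn.execute(
    "SELECT COUNT(DISTINCT lecturer_tel) FROM EnrolmentFlat "
    "WHERE lecturer = 'Lecturer A'"
).fetchone()[0]

# Delete anomaly: removing the only student on C-102 also erases
# everything we knew about Lecturer B.
conn.execute("DELETE FROM EnrolmentFlat WHERE reg_num = 'S-003'")
rows_for_b = conn.execute(
    "SELECT COUNT(*) FROM EnrolmentFlat WHERE lecturer = 'Lecturer B'"
).fetchone()[0]

# Insert anomaly: a new lecturer with no students yet can only be stored
# by leaving the student columns empty (NULL).
conn.execute(
    "INSERT INTO EnrolmentFlat VALUES (NULL, NULL, 'Lecturer C', '333')"
)
null_students = conn.execute(
    "SELECT COUNT(*) FROM EnrolmentFlat WHERE reg_num IS NULL"
).fetchone()[0]

print(phones_for_a, rows_for_b, null_students)
```

All three problems come from the same cause: facts about lecturers are stored in a table whose rows are really about enrolments.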
1.3: Purpose or goals of normalization
There are two important goals or
reasons for normalization:
i. to eliminate redundant data
(i.e. ensuring that the same
data is not stored in more than
one table). This improves
consistency.
ii. to ensure that data
dependencies make sense (only
storing related data in a table).
Some additional goals of
normalization include:
iii. to avoid or minimize anomalies
(i.e. insertion, deletion and
update anomalies).
iv. to provide maximum flexibility
to meet future information
needs by keeping tables
corresponding to object types
in their simplified forms.
v. to produce a clearer and
more readable data model.
1.4: Advantages and disadvantages of normalization
Advantages:
i. Reduce data redundancy & space required
ii. Enhance data consistency
iii. Enforce data integrity
iv. Reduce update cost
v. Provide maximum flexibility in responding to ad hoc queries
vi. Allow the use of parallelism
vii. Can reduce the total number of rows per block.
Disadvantages:
i. Many complex queries will be slower because joins have to be performed to
retrieve relevant data from several normalized tables
ii. Programmers/users have to understand the underlying data model of a
database application in order to perform proper joins among several tables
iii. The formulation of multiple-level queries is a very daunting, non-trivial
task.
Now, before we go any further, let us
recall that the relational model we
have been studying since last
semester (DCS209) consists of the
elements: relations (or tables), which
are made up of attributes (or
columns). We have learnt that:
A relation or table is a set of
attributes or columns with values
for each attribute such that:
1. Each attribute (column) value must
be atomic (i.e. a single value only).
2. All values for a given attribute
(column) must be of the same data
type.
3. Each attribute (column) name must
be unique.
4. The sequence of attributes
(columns) is insignificant.
5. No two tuples (rows) in a table
should be identical.
6. The sequence of the tuples (rows) is
insignificant.
You will recall also, that from our
discussion of E-R Modeling, we
know that an entity typically
corresponds to a relation (or
table, if you like) and that the
entity’s attributes become
columns of the table.
We also discussed how, depending
on the relationships between
entities, copies of attributes (the
identifiers) could be placed in
related tables, where they become
foreign keys.
From here, if we remember all our
fundamental discussions very well as
summarized above, then we can
begin to delve into normalization by
looking first at how to identify
functional dependencies within
relations or tables. But if we still have
issues with our fundamental
concepts of relational databases,
please refer back to your DCS209
notes and have a thorough revision,
then come and join us when you have
gotten those fundamental concepts
clear.
1.5: Functional Dependencies (FD)
A functional dependency or FD for
short describes a relationship
between attributes within a single
relation. That is, functional
dependency is about the relationship
between the columns of a table. So if
you have a table with two or more
columns, we say one column is
functionally dependent on another if
we can use the value of one column
to determine the value of another. A
simple example will make this clear:
Fig. 2
Say we have a StudentsTable relation as shown in Fig. 2. We can
find the name of any student if we know his/her registration number, by writing
a SELECT query like this:
SELECT student_name, gender, points_earned
FROM StudentsTable WHERE reg_num = 'D14-005';
And that should return AUWAL
MUKHTAR GENE in the query result.
With this example, we say the
student_name attribute (or column)
of the StudentsTable relation is
functionally dependent on the
reg_num attribute (or column)
because reg_num can be used to
uniquely determine the value of
student_name.
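If you would like to try the query yourself, the following is a rough sketch that runs it against a small in-memory SQLite database from Python. Only the registration number 'D14-005' and the name AUWAL MUKHTAR GENE come from the example above; the column types, the gender and points_earned values, and the second row are assumptions made up for illustration, since the contents of Fig. 2 are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE StudentsTable (
        reg_num       TEXT PRIMARY KEY,
        student_name  TEXT,
        gender        TEXT,
        points_earned INTEGER
    )
""")
conn.executemany(
    "INSERT INTO StudentsTable VALUES (?, ?, ?, ?)",
    [
        ("D14-005", "AUWAL MUKHTAR GENE", "M", 70),    # gender and points are made up
        ("D14-006", "HYPOTHETICAL STUDENT", "F", 65),  # entirely made-up row
    ],
)

# The query from the notes: reg_num picks out exactly one row.
result = conn.execute(
    "SELECT student_name, gender, points_earned "
    "FROM StudentsTable WHERE reg_num = 'D14-005'"
).fetchall()

print(result)
```

Because reg_num is declared as the primary key in this sketch, the query can never return more than one row, which is exactly what "reg_num uniquely determines student_name" means in practice.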
That was easy for you to grasp, I
hope. Now, there are standard
conventions used to
communicate functional
dependency notations.
Generally, the arrow symbol → is
used to indicate a functional
dependency. So, we may have
something like: X → Y, which is read
or interpreted as "X functionally
determines Y" or, reading in the
reverse, we say "Y is functionally
dependent on X". So, for our
preceding reg_num and
student_name example above, we
may write reg_num →
student_name.
NOTE:
The attributes listed on the left hand side of the → symbol are called determinants.
One can also read A → B as, “A determines B”. Or more specifically: "Given
a value for A, we can uniquely determine one value for B".
A key (maybe a primary key, for example) functionally determines a tuple (row). So one functional dependency that can always be written is:
The Key → All other attributes
Not all determinants are keys, however!
Functional dependency requires that the value for a certain set of attributes
uniquely determines the value for another set of attributes.
A functional dependency is a generalization of the notion of a key.
There is a great deal of mathematical theory behind normalization, but
we shall not concern ourselves with all of that in this introductory course.
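The idea that "given a value for A, we can uniquely determine one value for B" can be checked mechanically on a set of rows. Below is a small sketch of such a check in Python. The function name fd_holds and the sample rows are hypothetical, invented only for this illustration.

```python
def fd_holds(rows, determinant, dependent):
    """Return True if determinant -> dependent holds in the given rows.

    rows is a list of dicts; determinant and dependent are lists of
    column names.
    """
    seen = {}
    for row in rows:
        left = tuple(row[col] for col in determinant)
        right = tuple(row[col] for col in dependent)
        # setdefault remembers the first right-hand value for each left-hand
        # value; a different value later on breaks the dependency.
        if seen.setdefault(left, right) != right:
            return False
    return True


# Hypothetical sample rows (not taken from Fig. 2).
students = [
    {"reg_num": "R1", "student_name": "NAME ONE", "gender": "M"},
    {"reg_num": "R2", "student_name": "NAME TWO", "gender": "F"},
    {"reg_num": "R3", "student_name": "NAME THREE", "gender": "M"},
]

key_determines_name = fd_holds(students, ["reg_num"], ["student_name"])
gender_determines_name = fd_holds(students, ["gender"], ["student_name"])
print(key_determines_name, gender_determines_name)
```

One caution: a check like this can only show that a dependency fails on the rows you have. A functional dependency is a rule about all the data the table is ever allowed to hold, so passing the check on a sample does not prove the rule.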
1.6: Partial Dependencies
In a database, a partial dependency occurs when an attribute is dependent on only
part of the primary key, as opposed to the primary key in its entirety. This can
happen only when the primary key is a composite key. Having a partial dependency violates
the second normal form. To remove these dependencies, the partially dependent
attributes must be moved into separate tables.
Example: Your instructor will come up with one or more examples in class to
illustrate partial dependencies. Please be present and attentive!
1.7: Transitive Dependencies
Transitive dependencies occur when
there is an indirect relationship that
causes a functional dependency.
Third Normal Form usually deals
with transitive dependencies. This
means if we have a primary key A
and non-key attributes B and C, where
C is directly dependent on B and
B is directly dependent on A, then C
can be considered transitively
dependent on A.
For example, "A → C" is a transitive dependency when it is true only because both
"A → B" and "B → C" are true.
Another way to look at it is a bit like a
stepping stone across a river. If we
consider the primary key A to be the
far bank of the river and our non-key
attribute C to be our current location,
in order to get to A, our primary key,
we need to step on a stepping stone
B, another non-key attribute, to help
us get there. Of course we could jump
directly from C to A, but it is easier,
and we are less likely to fall in, if we
use our stepping stone B. Therefore
current location C is transitively
dependent on A through our stepping
stone B (see Fig. 2).
GOOD TO KNOW
From a structural point of view,
2NF is better than 1NF, and 3NF is
better than 2NF. For most business
database design purposes, 3NF is
as high as we need to go in the
normalization process. And some
very specialized applications may
require normalization beyond 4NF.
Note: A transitive dependency can
occur only in a relation that has three
or more attributes.
Fig. 2: Transitive dependency illustrated.
1.8: Normal Forms (NF)
Normalization works through a series of stages called normal forms.
There are quite a number of normal forms as
listed below:
First Normal form (1NF)
Second normal form (2NF)
Third normal form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
Domain-key normal form (DKNF)
Although normalization is a very important database design ingredient, you
should not assume that the highest level of normalization is always the most
desirable. Generally, the higher the normal form, the more SQL joins are required
to produce a specified output and the more slowly the database system responds
to end-user demands. A successful design must, therefore, always consider end-
user demand for fast performance. So, you will occasionally be expected to
"denormalize" some portions of a database design in order to meet performance
requirements.
1.8.1: First Normal Form (1NF)
A database relation (table) is in first normal form (1NF) if and only if it satisfies
the following two key conditions:
i. Contains only atomic values
ii. There are no repeating groups or duplicate rows
(and according to some authors, an additional requirement is that entries in any
given column should be of the same kind.)
Explanation 1:
In relational database parlance, an "atomic value" is a value that cannot be
divided. For example, in a table that has [RowID], [StudentNames], [Gender] and
[ContactAddress] as its fields, the [StudentNames] field may contain values like
"Muhammad-Auwal Gene", "Aremu Oluwakemi Juliet" and so on; while the
[ContactAddress] field may contain values like "No. 18, Usman Akilu Street,
Kaduna" or "23/25, Mora Road, Tudun Wada, Zaria, Kaduna State".
Fig. 3
In the examples above, the contents of both the [StudentNames] and
[ContactAddress] fields are not atomic, because the [StudentNames] field can be
broken into surname and other_names (or first_name, middle_name and
surname); while the [ContactAddress] field can be broken into house_number,
street_name, city_name and state. So, such a table is not in 1NF because the
values are not atomic.
Explanation 2:
In relational database speak, a "repeating group" means that a table contains
two or more columns that are closely related. For example, a table that records
data on a book and its author(s) with the following columns: [Book ID], [Author
1], [Author 2], [Author 3] is not in 1NF because [Author 1], [Author 2], and
[Author 3] are all repeating the same attribute.
Important Note: The requirement that there be no duplicated rows in the table
means that the table should have a key (although the key might be made up of
more than one column – even, possibly, of all the columns).
Question: So, how do we correct a table that is not in 1NF to become 1NF? Let's
discuss in class.
1NF Example: Consider the following example:
This table is not in first normal form because the [PhoneNumbers] column is
allowed to contain multiple values. For example, the second row includes values
"08032126160" and "07067430539".
To bring this table to first normal form, we split the table into two tables and now
we have the resulting tables as follows:
Now first normal form is satisfied, as every column in each table holds just one
value per row. Note that we had to split the [ContactName] field in the MyContactBasics table
too!
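The split described above can be sketched in Python with sqlite3 as follows. The table name MyContactBasics and the two phone numbers come from the example; everything else is an assumption made for illustration, including the second table's name (MyContactPhones), all column names, the contact's name and ID, and the idea that the original field separated the numbers with a comma.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyContactBasics (
        contact_id  INTEGER PRIMARY KEY,
        surname     TEXT,
        other_names TEXT
    );
    -- Hypothetical second table: one row per phone number.
    CREATE TABLE MyContactPhones (
        contact_id   INTEGER REFERENCES MyContactBasics(contact_id),
        phone_number TEXT,
        PRIMARY KEY (contact_id, phone_number)
    );
""")

# Un-normalized input: one row whose phone field packs two values together.
raw = [(2, "Surname", "Other Names", "08032126160, 07067430539")]

for contact_id, surname, other_names, phones in raw:
    conn.execute(
        "INSERT INTO MyContactBasics VALUES (?, ?, ?)",
        (contact_id, surname, other_names),
    )
    # Each phone number becomes its own row, so every value is atomic.
    for phone in phones.split(","):
        conn.execute(
            "INSERT INTO MyContactPhones VALUES (?, ?)",
            (contact_id, phone.strip()),
        )

phone_rows = conn.execute(
    "SELECT contact_id, phone_number FROM MyContactPhones "
    "ORDER BY phone_number"
).fetchall()
print(phone_rows)
```

With this layout a contact can have one phone number, five, or none at all, without changing the table structure.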
1.8.2: Second Normal Form (2NF)
A database relation (table) is in 2NF if it meets the criteria for 1NF and if all non-key
attributes are fully functionally dependent on the primary key.
Note: Since a partial dependency occurs when a non-key attribute is dependent
on only a part of the (composite) key, the definition of 2NF is sometimes phrased
as, "A table is in 2NF if it is in 1NF and if it has no partial dependencies."
Note also that any table with a primary key that is composed of a single
attribute (column) is automatically in second normal form.
Explanation:
In a table, if attribute B is functionally dependent on A, but is not functionally
dependent on any proper subset of A, then B is considered fully functionally
dependent on A. Hence, in a 2NF table, no non-key attribute can be
dependent on a proper subset of the primary key. Note that if the primary key is not a
composite key, all non-key attributes are always fully functionally dependent on
the primary key. A table that is in 1st normal form and has a single-column
primary key is automatically in 2nd normal form.
2NF Example: Consider the following example:
This table has a composite primary key [Customer ID, Store ID]. The non-key
attribute is [Purchase Location]. In this case, [Purchase Location] only depends
on [Store ID], which is only part of the primary key. Therefore, this table does
not satisfy second normal form.
To bring this table to second normal form, we break the table into two tables, and
now we have the following:
What we have done is to remove the partial functional dependency that we
initially had. Now, in the table TABLE_STORE, the column [Purchase Location] is
fully dependent on the primary key of that table, which is [Store ID].
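Here is a rough sketch of the 2NF split in Python with sqlite3. The name TABLE_STORE and the three attributes come from the example above; the name TABLE_PURCHASE for the other table, the lower-case column names and all the rows are assumptions invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE_STORE (
        store_id          INTEGER PRIMARY KEY,
        purchase_location TEXT
    );
    -- Hypothetical name for the table that keeps the composite key.
    CREATE TABLE TABLE_PURCHASE (
        customer_id INTEGER,
        store_id    INTEGER REFERENCES TABLE_STORE(store_id),
        PRIMARY KEY (customer_id, store_id)
    );
    INSERT INTO TABLE_STORE VALUES (1, 'Location A'), (2, 'Location B');
    INSERT INTO TABLE_PURCHASE VALUES (10, 1), (11, 1), (10, 2);
""")

# Each location is now stored once, so correcting it touches a single row.
cur = conn.execute(
    "UPDATE TABLE_STORE SET purchase_location = 'Location A2' "
    "WHERE store_id = 1"
)
rows_changed = cur.rowcount

# The original single-table view is recovered with a join.
joined = conn.execute("""
    SELECT p.customer_id, p.store_id, s.purchase_location
    FROM TABLE_PURCHASE AS p
    JOIN TABLE_STORE AS s ON s.store_id = p.store_id
    ORDER BY p.customer_id, p.store_id
""").fetchall()
print(rows_changed, joined)
```

Notice the trade-off mentioned under the disadvantages of normalization: the update became cheaper, but reading the combined data now needs a join.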
1.8.3: Third Normal Form (3NF)
A database relation (table) is in 3NF if it meets the criteria for 2NF and if it has no
transitive dependencies (i.e. if no non-key attribute depends
on another non-key attribute).
Remember: By "transitive functional dependency", we mean we have the
following relationships in the table: B is functionally dependent on A, and C is
functionally dependent on B. In this case, C is transitively dependent on A via B.
3NF Example: Consider the following example:
In the table above, [Book ID] determines [Genre ID], and [Genre ID] determines
[Genre Type]. Therefore, [Book ID] determines [Genre Type] via [Genre ID], so
we have a transitive functional dependency, and this structure does not satisfy
third normal form.
To bring this table to third normal form, we split the table into two as follows:
Now all non-key attributes are fully functionally dependent only on the primary
key. In TABLE_BOOK, both [Genre ID] and [Price] are only dependent on [Book
ID]. In TABLE_GENRE, [Genre Type] is only dependent on [Genre ID].
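The two resulting tables can be sketched in Python with sqlite3 as shown below. The table names TABLE_BOOK and TABLE_GENRE and their attributes come from the example; the lower-case column names, the data types and all the rows are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE_GENRE (
        genre_id   INTEGER PRIMARY KEY,
        genre_type TEXT
    );
    CREATE TABLE TABLE_BOOK (
        book_id  INTEGER PRIMARY KEY,
        genre_id INTEGER REFERENCES TABLE_GENRE(genre_id),
        price    REAL
    );
    INSERT INTO TABLE_GENRE VALUES (1, 'Genre X'), (2, 'Genre Y');
    INSERT INTO TABLE_BOOK VALUES (100, 1, 25.0), (101, 1, 30.0), (102, 2, 15.0);
""")

# A book's genre type can still be found, but only by stepping through
# genre_id, which is the "stepping stone" between the two tables.
book_genres = conn.execute("""
    SELECT b.book_id, g.genre_type
    FROM TABLE_BOOK AS b
    JOIN TABLE_GENRE AS g ON g.genre_id = b.genre_id
    ORDER BY b.book_id
""").fetchall()
print(book_genres)
```

Each genre type is now written down once in TABLE_GENRE, no matter how many books belong to that genre.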
1.8.4: Other Normal Forms
Apart from the first three normal forms (1NF to 3NF) you have learnt about so far,
there are a number of other normal forms which you are advised to investigate. In
particular, the following four additional normal forms are recommended for your
further study:
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
Domain-Key Normal Form (DKNF)
1.9: Class Exercises: Can You?
The table below contains a number of functional dependency expressions
without interpretations, as well as interpretations without the FD expressions.
Write out the full expression or interpretation in the blank spaces provided:
SN  FD EXPRESSION                               YOUR INTERPRETATION
--  ------------------------------------------  --------------------
01  Student_ID → Student_Major                  ____________________
02  Student_ID, CourseCode, Semester → Grade    ____________________
03  ____________________                        Employee_Number functionally determines Current_Salary
04  Row_ID → Movie_Title, Main_Actor            ____________________
05  ____________________                        Country_Name is functionally dependent on the Lga_ID
06  ____________________                        Lecture_Room, NumberOfStudents and Lecturer (delivering the lecture) are functionally dependent on the Course_Code and Course_Section
07  Car_Type, Maker_ID → Car_Price              ____________________
08  ____________________                        Given a value for Car_ID, we can uniquely determine one value for Date_Sold, Price_Sold, Sales_Person, Buyer_ID
1.10: Extra Credit: Can You?
Consider R(empno, ename, deptno) with the following instance of R.
EMPNO ENAME DEPTNO
---------- ---------- ----------
7876 ADAMS 209
7499 TUNDE 301
7698 BUKAR 301
7600 BUKAR 405
7782 EMEKA 100
7902 LAWAL 209
7900 JAMES 301
7566 JAMIL 209
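One way to explore an instance like this is to test candidate functional dependencies mechanically. The sketch below is only an illustration: the helper name fd_holds is hypothetical, and the rows are copied from the instance above. Keep in mind that a single instance can disprove a dependency, but it can never prove that the dependency holds for every possible instance.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the dependency lhs -> rhs is not violated by these rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a key is remembered; any later
        # disagreement for the same key violates the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


columns = ("empno", "ename", "deptno")
R = [dict(zip(columns, t)) for t in [
    (7876, "ADAMS", 209), (7499, "TUNDE", 301), (7698, "BUKAR", 301),
    (7600, "BUKAR", 405), (7782, "EMEKA", 100), (7902, "LAWAL", 209),
    (7900, "JAMES", 301), (7566, "JAMIL", 209),
]]

print(fd_holds(R, ["empno"], ["ename"]))    # every empno appears once
print(fd_holds(R, ["ename"], ["deptno"]))   # BUKAR appears with 301 and 405
print(fd_holds(R, ["deptno"], ["ename"]))   # 209 appears with several names
```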
SECTION II: NORMALIZATION BY EXAMPLE
Before we begin, may I acknowledge that the bulk of this example on
normalization was adapted from the following file on the internet:
http://opencourseware.kfupm.edu.sa/colleges/cim/acctmis/mis311/files%5CChapter_5-_Data_Normalization_Topic_1_-Database_Tables_and_Normalization.pdf
2.1: Case-Study Introduction
To illustrate the normalization
process, we will examine a simple
business application. In this case we
will explore the simplified database
activities of a construction company
that manages several building
projects. Each project has its own
project number, name, employees
assigned to it and so on. Each
employee has an employee number,
name, and job classification such as
engineer or computer technician.
The company charges its clients by
billing the hours spent on each
contract. The hourly billing rate is
dependent on the employee’s
position. Periodically, a report is
generated that contains the
information displayed in Table 2.1.
Table 2.1: A sample report layout
The Total Charge in Table 2.1 is a derived attribute and, at this point, is not
stored in this table. Now, the easiest short-cut to generate the required report
might seem to be to have a table whose contents correspond to the reporting
requirements. (See Fig. 2.1)
Fig. 2.1: A table in the report format.
Clearly, the structure of the data set in Fig. 2.1 does not handle data very well for
the following reasons:
1. The project number (PROJ_NUM) is apparently intended to be a primary key,
but it contains nulls.
2. The table entries invite data inconsistencies. For example, the JOB_CLASS value
“Elect.Engineer” might be entered as “Elect.Eng.” in some cases, “El. Eng” or
“EE” in others.
3. The table displays data redundancies. These data redundancies yield the
following anomalies:
a. Update anomalies. Modifying the JOB_CLASS for employee number 105
requires (potentially) many alterations, one for each EMP_NUM = 105.
b. Insertion anomalies. Just to complete a row definition, a new employee
must be assigned to a project. If the employee is not yet assigned, a
phantom project must be created to complete the employee data entry.
c. Deletion anomalies. If employee 103 quits, deletions must be made for
every entry in which EMP_NUM = 103. Such deletions will result in losing
other vital data on project assignments from the database.
NOTE: We note that in spite of these structural deficiencies, the table structure
appears to work; the report is generated with ease. Unfortunately, the report
may yield different results, depending on what data anomaly has occurred.
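The update anomaly described above is easy to reproduce in a few lines of Python. The rows below are invented for illustration (the actual contents of Fig. 2.1 are not reproduced here), and the job titles are assumptions.

```python
# Hypothetical rows in the flat report format: (PROJ_NUM, EMP_NUM, JOB_CLASS)
flat = [
    (15, 105, "Database Designer"),
    (18, 105, "Database Designer"),
    (22, 105, "Database Designer"),
    (15, 101, "Systems Analyst"),
]

# An update that changes employee 105's job class on project 15 only,
# missing the other two rows that also describe employee 105.
updated = [
    (proj, emp, "DB Architect" if (emp == 105 and proj == 15) else job)
    for proj, emp, job in flat
]

# Collect the distinct JOB_CLASS values now recorded for each employee.
jobs = {}
for _, emp, job in updated:
    jobs.setdefault(emp, set()).add(job)

# Employees whose job class now depends on which row you happen to read.
inconsistent = sorted(emp for emp, values in jobs.items() if len(values) > 1)
print(inconsistent)
```

A report built on such a table gives a different answer for employee 105 depending on the project row it reads, which is the kind of inconsistency normalization is meant to prevent.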
2.2: Conversion to First Normal Form
Fig. 2.1 contains what are known as
repeating groups. A repeating group
derives its name from the fact that a
group of multiple (related) entries
can exist for any single key attribute
occurrence.
A good relational table must not
contain repeating groups. The
existence of repeating groups
provides evidence that the
RPT_FORMAT table in Fig. 2.1 fails to
meet even the lowest normal form
requirements, thus reflecting data
redundancies.
Normalizing the table structure will
reduce these data redundancies. If
repeating groups do exist, they must
be eliminated by making sure that
each row defines a single entity. In
addition, the dependencies must be
identified to diagnose the normal
form. The normalization process
starts with a simple three-step
procedure.
2.2.1: Step 1: Eliminate the repeating groups
Start by presenting the data in a tabular format, where each cell has a single
value and there are no repeating groups. To eliminate the repeating groups,
eliminate the nulls by making
sure that each repeating group
attribute contains an appropriate
data value. This change converts
the RPT_FORMAT table in Fig. 2.1
to the DATA_ORG_1NF table in
Fig. 2.2.
Fig. 2.2: Data organization: first normal form.
2.2.2: Step 2: Identify the primary key
The layout in Fig. 2.2 represents
much more than a mere cosmetic
change. Even a casual observer will
note that PROJ_NUM is not an adequate
primary key because the project
number does not uniquely identify all
the remaining entity (row) attributes.
For example, the PROJ_NUM value 15
can identify any one of five
employees. To maintain a proper
primary key that will uniquely
identify any attribute value, the new
key must be composed of a
combination of PROJ_NUM and EMP_NUM.
2.2.3: Step 3: Identify all dependencies
Dependencies can be depicted with
the help of a diagram as shown in Fig.
2.3. Because such a diagram depicts
all the dependencies found within a
given table structure, it is known as a
dependency diagram. Dependency
diagrams are very helpful in getting a
bird’s-eye view of all the
relationships among a table’s
attributes, and their use makes it
much less likely that you might
overlook an important dependency.
Fig. 2.3: A dependency diagram: first normal form (1NF).
Notice the following dependency diagram features from Fig. 2.3:
1. The primary key attributes are bold, underlined, and shaded in a different
colour.
2. The arrows above the attributes indicate all desirable dependencies, that is,
dependencies that are based on the primary key. In this case, note that the
entity’s attributes are dependent on the combination of PROJ_NUM and
EMP_NUM.
3. The arrows below the dependency diagram indicate less-desirable
dependencies. Two types of such dependencies exist:
a. Partial dependencies. Dependencies based on only a part of a
composite primary key are called partial dependencies.
b. Transitive dependencies. A transitive dependency is a dependency of
one nonprime attribute on another nonprime attribute. The problem
with transitive dependencies is that they still yield data anomalies.
The first normal form (1NF) describes the tabular format, shown in Fig. 2.2, in
which three conditions are satisfied:
All the key attributes are defined.
There are no repeating groups in the table.
All attributes are dependent on the primary key.
All relational tables satisfy the 1NF requirements. The problem with the 1NF
table structure shown in Fig. 2.3 is that it contains partial dependencies and a
transitive dependency.
2.3: Conversion to Second Normal Form
The rule of conversion from 1NF format to 2NF format is: eliminate all
partial dependencies from the 1NF format. The conversion from 1NF to
2NF format is done in two steps:
2.3.1: Step 1: Identify all the key components
Fortunately, the relational database
design can be improved easily by
converting the database into a format
known as the second normal form
(2NF). The 1NF-to-2NF conversion is
simple: Starting with the 1NF format
displayed in Fig. 2.3, you do the
following activity:
Eliminate partial dependencies from
the 1NF format in Fig. 2.3. This step
will result in producing three tables
from the original table shown in Fig.
2.2.
From Fig. 2.3, two partial dependencies exist:
1. PROJ_NAME depends on PROJ_NUM, and
2. EMP_NAME, JOB_CLASS, and CHG_HOUR depend on EMP_NUM.
To eliminate the existing two partial dependencies, write each component on a
separate line, and then write the original (composite) key on the last line.
PROJ_NUM
EMP_NUM
PROJ_NUM EMP_NUM
Each component will become the key in a new table. The original table is now
divided into three tables: PROJECT, EMPLOYEE, and ASSIGN.
2.3.2: Step 2: Identify the dependent attributes
Determine which attributes are dependent on which other attributes. The three
new tables, PROJECT, EMPLOYEE and ASSIGN, are described by:
PROJECT (PROJ_NUM, PROJ_NAME)
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)
ASSIGN (PROJ_NUM, EMP_NUM, ASSIGN_HOURS)
The results of steps 1 and 2 are displayed in Fig. 2.4. At this point, most of the
anomalies have been eliminated.
Fig. 2.4: Second normal form (2NF) conversion results.
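The three 2NF tables can be written down as a schema using Python's sqlite3 module. The table and column names come from the text; the data types, the NOT NULL constraints and the sample values are assumptions, so treat this as a sketch rather than the definitive design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE PROJECT (
        PROJ_NUM  INTEGER PRIMARY KEY,
        PROJ_NAME TEXT NOT NULL
    );
    CREATE TABLE EMPLOYEE (
        EMP_NUM   INTEGER PRIMARY KEY,
        EMP_NAME  TEXT NOT NULL,
        JOB_CLASS TEXT NOT NULL,
        CHG_HOUR  REAL NOT NULL      -- still depends on JOB_CLASS (transitive)
    );
    CREATE TABLE ASSIGN (
        PROJ_NUM     INTEGER NOT NULL REFERENCES PROJECT(PROJ_NUM),
        EMP_NUM      INTEGER NOT NULL REFERENCES EMPLOYEE(EMP_NUM),
        ASSIGN_HOURS REAL NOT NULL,
        PRIMARY KEY (PROJ_NUM, EMP_NUM)
    );
""")

# The insertion anomaly is gone: an employee can exist without any project.
conn.execute("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)",
             (120, "New Hire", "Programmer", 35.75))   # made-up values
unassigned = conn.execute("""
    SELECT EMP_NUM FROM EMPLOYEE
    WHERE EMP_NUM NOT IN (SELECT EMP_NUM FROM ASSIGN)
""").fetchall()

# An assignment to a project that does not exist is rejected.
try:
    conn.execute("INSERT INTO ASSIGN VALUES (?, ?, ?)", (99, 120, 4.0))
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(unassigned, fk_rejected)
```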
You will recall that a table is in second normal form (2NF) if the following two
key conditions are satisfied:
It is in 1NF.
It includes no partial dependencies; that is, no attribute is dependent on only
a portion of the primary key.
Because a partial dependency can
exist only if a table’s primary key is
composed of several attributes, a
table whose primary key consists of
only a single attribute is
automatically in 2NF if it is in 1NF.
In the ASSIGN table, the attribute
ASSIGN_HOURS depends on both key
attributes of the composite primary
key EMP_NUM and PROJ_NUM. However,
Fig. 2.4 still shows a transitive
dependency as CHG_HOUR depends on
JOB_CLASS. This transitive
dependency can generate anomalies.
2.4: Conversion to Third Normal Form
The rule of conversion from 2NF format to 3NF format is: eliminate all
transitive dependencies from the 2NF format.
The data anomalies created by the database organization shown in Fig. 2.4
are easily eliminated by completing the following three steps:
2.4.1: Step 1: Identify each new determinant
For every transitive dependency, write its determinant as a PK for a new table. (A
determinant is any attribute whose value determines other values within a
row.) If you have three different transitive dependencies, you will have three
different determinants. Fig. 2.4 shows only one case of transitive dependency.
Therefore, write the determinant for this transitive dependency:
JOB_CLASS
2.4.2: Step 2: Identify the dependent attributes
Identify the attributes that are dependent on each determinant identified in Step
1 and identify the dependency. In this case, you write: JOB_CLASS → CHG_HOUR
Name the table to reflect its contents and function. In this case, JOB seems
appropriate.
2.4.3: Step 3: Remove the dependent attributes from transitive dependencies
Eliminate all the dependent attributes in the transitive relationship(s) from
each of the tables shown in Fig. 2.4 that have such a transitive relationship.
Draw a new dependency diagram to show all the tables defined in Steps 1-3.
Check the new tables as well as the tables modified in Step 3 to make sure
that each table has a determinant and that no table contains inappropriate
dependencies (partial or transitive).
When Steps 1-3 have been completed, the resulting tables are as shown in Fig.
2.5.
Fig. 2.5: Third normal form (3NF) conversion results.
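Steps 1 to 3 can be traced on a handful of rows in plain Python. The employee names, job classes and rates below are invented for illustration; only the attribute names and the JOB table name come from the text.

```python
# Hypothetical 2NF EMPLOYEE rows: (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)
employee_2nf = [
    (101, "A. Bello", "Systems Analyst", 96.75),
    (102, "C. Okafor", "Programmer", 35.75),
    (103, "F. Musa", "Programmer", 35.75),
]

# Steps 1 and 2: the determinant JOB_CLASS becomes the key of a new JOB table,
# and CHG_HOUR is the attribute that depends on it.
job = {}
for _, _, job_class, chg_hour in employee_2nf:
    # If one class ever appeared with two rates, the data would already be inconsistent.
    if job.setdefault(job_class, chg_hour) != chg_hour:
        raise ValueError(f"conflicting rates for {job_class}")

# Step 3: drop the dependent attribute CHG_HOUR from EMPLOYEE,
# keeping JOB_CLASS there as a foreign key into JOB.
employee_3nf = [(num, name, job_class) for num, name, job_class, _ in employee_2nf]

print(job)
print(employee_3nf)
```

The hourly rate for each job class is now stored once, so changing a rate no longer means editing every employee who holds that job.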
Recall that a table is in third normal form (3NF) if the following two conditions
are satisfied:
It is in 2NF
It contains no transitive dependencies
2.5: Improving the Design
The table structures have now been refined to eliminate the troublesome partial
and transitive dependencies. However, normalization cannot, by itself, be relied on
to produce good designs; its value lies in helping to eliminate data
redundancies. The design was therefore improved further in the following areas:
PK Assignment
Naming Conventions
Attribute Atomicity
Adding Attributes
Adding Relationships
Refining PKs
Maintaining Historical Accuracy
Using Derived Attributes
The enhancements are shown in the tables and dependency diagrams in Fig. 2.6.
Fig. 2.6: The completed database
2.6: Limitations on System-Assigned Keys
A system-assigned primary key may not prevent confusing entries. The data
entries in Table 2.2 are inappropriate because they duplicate existing
records, yet there has been no violation of either entity integrity or
referential integrity.
Table 2.2: Duplicate entries in the JOB table.
This “multiple duplicate records” problem was created when the JOB_CODE was
added to become the PK. In any case, if JOB_CODE is to be the PK, we still must
ensure the existence of unique values in the JOB_DESCRIPTION through the use of a
unique index.
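The effect of such a unique index can be sketched with sqlite3. The column types, the index name JOB_DESCRIPTION_UQ and the sample value are assumptions; AUTOINCREMENT stands in here for whatever system-assigned key mechanism your DBMS provides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE JOB (
        JOB_CODE        INTEGER PRIMARY KEY AUTOINCREMENT,  -- system-assigned key
        JOB_DESCRIPTION TEXT NOT NULL
    );
    -- Hypothetical index name; it forces every description to be unique.
    CREATE UNIQUE INDEX JOB_DESCRIPTION_UQ ON JOB (JOB_DESCRIPTION);
""")
conn.execute("INSERT INTO JOB (JOB_DESCRIPTION) VALUES (?)", ("Programmer",))

try:
    # Without the unique index this would succeed and simply get a fresh JOB_CODE.
    conn.execute("INSERT INTO JOB (JOB_DESCRIPTION) VALUES (?)", ("Programmer",))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

job_count = conn.execute("SELECT COUNT(*) FROM JOB").fetchone()[0]
print(duplicate_rejected, job_count)
```

Note that the index only catches exact duplicates; variant spellings of the same job would still slip through.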
Although our design meets the vital entity and referential requirements, there are
still some concerns the designer must address. The JOB_CODE attribute was
created and designated to be the JOB table’s primary key to ensure entity
integrity in the JOB table. The DBMS may be used to have the system assign the
PK values. However it is useful to remember that the JOB_CODE does not prevent
us from making the entries in the JOB table shown in Table 2.2.
It is worth repeating that database design often involves trade-offs and the
exercise of professional judgment. In a real-world environment, we must strike a
balance between design integrity and flexibility.
SECTION III: NORMALIZATION EXTRA CREDITS (CAN YOU?)
3.1: Extra Credit 1: Worked Example
Examine the table shown below:
branchNo  branchAddress                         telNos
--------  ------------------------------------  ----------------------------------------
B001      8 Jefferson Way, Portland, OR 97201   503-555-3618, 503-555-2727, 503-555-6534
B002      City Center Plaza, Seattle, WA 98122  206-555-6756, 206-555-8836
B003      14 – 8th Avenue, New York, NY 10012   212-371-3000
B004      16 – 14th Avenue, Seattle, WA 98128   206-555-3131, 206-555-4112
(a) Why is this table not in 1NF?
(b) Describe and illustrate the process of normalizing the data shown in this
table to third normal form (3NF).
(c) Identify the primary, alternate and foreign keys in your 3NF relations.
Answers:
NOTE: There is an alternative approach to altering the original Branch table –
columns can be added to the original Branch table to hold the individual values
for each telephone number, e.g. telNo1, telNo2 and telNo3.
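In contrast to the extra-columns alternative in the note, the repeating telNos values can be moved into a table of their own, one telephone number per row. The Python sketch below does this for the data shown in the question. The names branch and branch_tel are hypothetical, and using telNo as the key of the new table assumes that no two branches share a number.

```python
# Un-normalized rows: (branchNo, branchAddress, telNos)
branch_unf = [
    ("B001", "8 Jefferson Way, Portland, OR 97201",
     "503-555-3618, 503-555-2727, 503-555-6534"),
    ("B002", "City Center Plaza, Seattle, WA 98122",
     "206-555-6756, 206-555-8836"),
    ("B003", "14 – 8th Avenue, New York, NY 10012", "212-371-3000"),
    ("B004", "16 – 14th Avenue, Seattle, WA 98128",
     "206-555-3131, 206-555-4112"),
]

# Branch keeps one row per branch; branchNo is its primary key.
branch = [(no, addr) for no, addr, _ in branch_unf]

# Each telephone number gets its own row; branchNo acts as a foreign key.
branch_tel = [
    (tel.strip(), no)
    for no, _, tels in branch_unf
    for tel in tels.split(",")
]

for row in branch_tel:
    print(row)
```

Unlike fixed telNo1, telNo2 and telNo3 columns, this layout places no limit on how many numbers a branch may have and leaves no empty cells for branches with fewer.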
3.2: Extra Credit 2: Worked Example
Given the dependency diagram shown in the following figure, identify and discuss
each of the indicated dependencies:
Answers:
C1 → C2 represents a partial dependency, because C2 depends only on C1,
rather than on the entire primary key composed of C1 and C3.
C4 → C5 represents a transitive dependency, because C5 depends on an
attribute (C4) that is not part of a primary key.
C1, C3 → C2, C4, C5 represents a functional dependency, because C2, C4,
and C5 depend on the primary key composed of C1 and C3.
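The reasoning in these answers can be expressed as a small rule of thumb in Python. The function classify is a hypothetical helper and a deliberate simplification: it looks only at how the determinant (left-hand side) relates to the primary key, which is enough for diagrams like this one.

```python
def classify(lhs, key):
    """Classify a dependency by comparing its determinant with the primary key."""
    lhs, key = set(lhs), set(key)
    if lhs == key:
        return "full"        # determinant is the whole primary key
    if lhs < key:
        return "partial"     # determinant is a proper subset of the key
    if not (lhs & key):
        return "transitive"  # determinant is made of non-key attributes only
    return "other"           # mixed cases are outside this simple sketch


primary_key = ["C1", "C3"]
print(classify(["C1"], primary_key))        # C1 -> C2
print(classify(["C4"], primary_key))        # C4 -> C5
print(classify(["C1", "C3"], primary_key))  # C1, C3 -> C2, C4, C5
```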
3.3: Extra Credit 3: Worked Example
Given the health report card data below, work out the un-normalized, first, second
and third normal forms for the given data card (see next page, please).
HEALTH HISTORY REPORT
PET ID  PET NAME  PET TYPE  PET AGE  OWNER      VISIT DATE   PROCEDURE
246     ROVER     DOG       12       SAM COOK   JAN 13/2002  01 - RABIES VACCINATION
                                                MAR 27/2002  10 - EXAMINE and TREAT WOUND
                                                APR 02/2002  05 - HEART WORM TEST
298     SPOT      DOG       2        TERRY KIM  JAN 21/2002  08 - TETANUS VACCINATION
                                                MAR 10/2002  05 - HEART WORM TEST
341     MORRIS    CAT       4        SAM COOK   JAN 23/2001  01 - RABIES VACCINATION
                                                JAN 13/2002  01 - RABIES VACCINATION
519     TWEEDY    BIRD      2        TERRY KIM  APR 30/2002  20 - ANNUAL CHECK UP
                                                APR 30/2002  12 - EYE WASH
Your answers here, please:
UNF:
1NF:
2NF:
3NF:
WORKED ANSWERS:
UNF:
Pet [ pet_id, pet_name, pet_type, pet_age, owner, ( visitdate, procedure_no,
procedure_name ) ]
1NF:
Pet [ pet_id, pet_name, pet_type, pet_age, owner ]
Pet_Visit [ pet_id, visitdate, procedure_no, procedure_name ]
Note that a pet may undergo the same procedure on more than one date; therefore
visitdate is included as part of the key.
2NF:
Pet [ pet_id, pet_name, pet_type, pet_age, owner ]
Pet_Visit [ pet_id, visitdate, procedure_no ]
Procedure [ procedure_no, procedure_name ]
3NF:
same as 2NF
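The worked answer can be checked against the report data with a short Python sketch. The rows are transcribed from the health history report; the variable names and the choice to keep procedure numbers as strings are assumptions made for illustration.

```python
# UNF: one record per pet, with a repeating group of visits.
unf = [
    (246, "ROVER", "DOG", 12, "SAM COOK", [
        ("JAN 13/2002", "01", "RABIES VACCINATION"),
        ("MAR 27/2002", "10", "EXAMINE and TREAT WOUND"),
        ("APR 02/2002", "05", "HEART WORM TEST")]),
    (298, "SPOT", "DOG", 2, "TERRY KIM", [
        ("JAN 21/2002", "08", "TETANUS VACCINATION"),
        ("MAR 10/2002", "05", "HEART WORM TEST")]),
    (341, "MORRIS", "CAT", 4, "SAM COOK", [
        ("JAN 23/2001", "01", "RABIES VACCINATION"),
        ("JAN 13/2002", "01", "RABIES VACCINATION")]),
    (519, "TWEEDY", "BIRD", 2, "TERRY KIM", [
        ("APR 30/2002", "20", "ANNUAL CHECK UP"),
        ("APR 30/2002", "12", "EYE WASH")]),
]

# Pet [ pet_id, pet_name, pet_type, pet_age, owner ]
pet = [(pid, name, kind, age, owner) for pid, name, kind, age, owner, _ in unf]

# Pet_Visit [ pet_id, visitdate, procedure_no ]
pet_visit = [(pid, date, proc_no)
             for pid, *_, visits in unf
             for date, proc_no, _ in visits]

# Procedure [ procedure_no, procedure_name ]
procedure = {proc_no: proc_name
             for *_, visits in unf
             for _, proc_no, proc_name in visits}

# All three columns of Pet_Visit are needed to identify a row uniquely:
# TWEEDY has two procedures on the same date, MORRIS the same procedure twice.
print(len(pet_visit), len(set(pet_visit)))
print(len({(pid, date) for pid, date, _ in pet_visit}))
print(len({(pid, proc_no) for pid, _, proc_no in pet_visit}))
```

The last three lines compare the number of visit rows with the number of distinct values for each candidate key, which shows why neither (pet_id, visitdate) nor (pet_id, procedure_no) alone is sufficient.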
SECTION IV: USEFUL DATABASE TERMS AND DEFINITIONS
Terms and definitions are very important in any topic. The normalization process
has its share of terms. Here are a few that you should try to understand as you
continue your exploration of the interesting world of database management
systems even outside the scope of this manual:
WORD/TERM DEFINITION
Anomaly A deviation from the common rule, type, arrangement, or form; or an incongruity or inconsistency.
Attribute Describes a column in a table. It comes from relational algebra, where a column is an attribute and a row is a tuple.
Binary relationship A reciprocal set of relations between two things. In databases, the two things are tables, views (semi-permanent result sets), or temporary result sets.
Candidate key A unique key that you may choose as a primary key.
Column Describes a vertical element in a table. It comes from spreadsheets, where a column defines the vertical axis of data. A column is a single element of a data structure that is found in every row. Columns always have a value in a data structure when the structure constrains its creation to demand one. Databases let you allow or disallow a null value when you create a table (or structure).
Composite key A key that is made up of two or more columns. It is possible that this term can be applied to many different keys, and that it is interchangeable with a compound key. You will see compound key used more frequently.
Compound key A key that is made up of two or more columns. It is possible that this term can be applied to many different keys, and that it is interchangeable with a composite key. This is typically the more widely used word.
Data structure Describes the definition of a type of data, like an integer or string and the collection of a group of various pre-defined data types into a group. The latter is the best corollary to a record in a file system, or a row in a database. You can make a numerically indexed array of any base data type (often described as scalar or primitive), or compound data type, which effectively creates a 2-dimensional structure known as a database table.
Field Describes a column in a table. It comes from file systems, which predate databases. A field is either a positionally fixed or delimited element in a list of values. Fields are always found but may be null or empty values.
File system Describes the use of files as a data repository, where each file contains rows of data organized as data structures. Procedural programming languages access the files based on the programmer’s knowledge of their definitions, which is normally maintained in a definitional file or document.
Foreign key A key that maps to a value in a primary key list, where the list exists in the same or another table.
Functional dependency Means that the value of an attribute (column) is determined by the value of another attribute or set of attributes. The determining attribute is often a single-column natural key chosen as a primary key, while a determining set of attributes is a compound natural key, likewise chosen as a primary key. You write the functional dependency as A → B, read as "A determines B". Typically, the non-key columns in a row are functionally dependent on the primary key of that row. A foreign key column is also functionally dependent on a primary key in the current table or another table.
Key A column that contains a value that identifies or helps in conjunction with other key columns to identify a row as unique.
Many-to-many A non-specific relationship between two tables, where one row in one table may map to one to many rows in the other and vice versa. You map these two tables by using a third table that holds a foreign key from both in the same row. The third table is known as an association or translation table. Both of the original tables have a one to many relationship to the association table, and both relationships resolve through the association table.
N-ary relationship A non-specific relationship between three or more tables, where one row in one table may have a many-to-many-to-many relationship between one or both of the other tables. You map these three or more tables by using another table that holds a foreign key from all of them in the same row. The other table is known as an association or translation table. Typically, all of the original tables have a one-to-many relationship to the association table, and all relationships resolve through the association table.
Natural key A unique key that identifies a row of data, or instance of data. A natural key is automatically a candidate key that you may choose as a primary key. All other columns in the table should enjoy a direct and full functional dependency on the natural key. If you adopt a surrogate key for joins, the surrogate key plus the natural key should become a unique index to speed searches through the table.
Nominated key A unique key that you may choose as a primary key, and it is also known as a candidate key. The only subtle difference that I’ve found is that some people use nominated to indicate the candidate key they’ve tentatively chosen before making a final decision.
Non-key A column that contains a descriptive value that doesn’t identify or help identify a row as unique but provides a characteristic to a row of data. All non-key columns should have a full functional dependency on the natural key, or primary key.
Non-specific relationship A logical reciprocal set of relations between two things where no row in either set has a possible intersection with the other. A non-specific relationship is also known as a binary many-to-many relationship. These are logical relationships that convert to two physical relationships known as specific relationships. Specific relationships are either one-to-one or one-to-many binary relationships. Non-specific relationships are resolved by two one-to-many relationships and an association set. The association set holds rows of foreign keys that point respectively to both sets. Each row in the association table lets you resolve the relationship between a row in one and a row in the other through an INNER JOIN.
Object instance An object instance is a data set inserted into a defined object type. This can occur at runtime, or in the context of databases through an INSERT statement. An Oracle database may contain nested object instances when a column relies on an object type, which are known as standalone objects.
Object type An object type in the context of a database is a data structure, or the definition of a table. Definitions of tables are stored in the database catalog and built upon pre-existing data types. Some databases support User-Defined Types (UDT). Where UDTs are available, the data structure may use them when they’re defined before the object type. Object types are a generalization of tables and user-defined types in an Oracle database.
One-to-many A specific relationship between two tables, where one row in one table maps to one to many rows in another table. You map these two tables by using the primary key of one table as a foreign key in the other. This makes the table that holds the foreign key functionally dependent on the primary key in the other table. The one side of the relationship is always the independent row, and it always donates a copy of its primary key to the dependent row.
One-to-one A specific relationship between two tables, where one row in one table maps to one and only one row in another table. You map these two tables by using the primary key of one table as a foreign key in the other. This makes the table that holds the foreign key functionally dependent on the primary key in the other table. While a one-to-one relationship allows you to choose either as the independent row, it is important that you identify the business relationship of the two tables and make the primary task element the independent row. The independent row donates a copy of its primary key to the dependent row.
Partial dependency A partial dependency exists when the primary key is a compound key of two or more columns, and one or more columns depends on less than all of the columns in the compound primary key.
Primary key A candidate key that you chose to serve as the primary key.
Record Describes a horizontal element in a table. A record is a row of data, or an instance of a defined data structure. As such, it is a row inside a file.
Row Describes a horizontal element in a table. It comes from spreadsheets, where a row defines the horizontal axis of data. A row is also an instance of the data structure defined by a table definition, and the nested array of a structure inside an ordinary array.
Specific relationship A reciprocal set of relations between two things where one row in a result set finds one row in another result set. Another example is where one row in a result set finds many matches in another result set. These binary relationships are respectively one-to-one and one-to-many. Specified relationships have equijoin or non-equijoin resolution. The first matches values, like the process in a nested loop, and the second matches values through a range or inequality relationship. Equijoins typically have a primary and foreign key, and the one-side holds the primary key while the many side holds a foreign key. In the specialized case of a one-to-one relationship, you must choose which table holds the primary key that becomes a functional dependency as a foreign key in the other.
Superkey A column or set of columns that uniquely identifies each row in a table. A superkey may contain extra columns that are not needed for uniqueness; a candidate key is a superkey with no such extra columns.
Surrogate key A key that identifies uniqueness for rows, like an automatic numbering sequence. It is often considered a superior solution to a natural key for joins, because you can create a unique index by using the surrogate key column followed by the natural key column(s). If you discover more about the domain later and need to add a column to the natural key, you need only drop the index and recreate it with the new list of columns.
Transitive dependency A dependency in which a non-key column depends on another non-key column, which in turn depends on the primary key of the table. It may exist in tables with three or more columns that are in second normal form.
Tuple Describes a row in a table. It comes from relational algebra, where a column is an attribute and a row is a tuple.
Unique key A column or set of columns that uniquely identifies a row of data.
User-defined type A data type defined by the user in a schema (Oracle) or database (MySQL and Microsoft SQL Server).
WE’RE DONE FOR NOW, GOOD BYE!
WELL, that will be all in this gentle introduction to database
normalization. I hope you found it both useful and enjoyable, even
though we had to skip many of the "noisy" parts that deal with the
bewildering math and equations of normalization. In the next manuals for this
course, we shall learn about database programming with VB6 and you will
eventually build a complete data-driven desktop application to manage students'
records at IACC, ABU Zaria. See you then...
Download more resources at: http://www.auwalgene.com/mystudents/lecturenotes
Connect with me on Facebook at: http://www.facebook.com/auwalgene3
Drop me a comment or two on my website: http://www.auwalgene.com/comments
Tell me something via e-mail at: [email protected]
Send SMS text messages to my mobile phone at +234 (0) 8032126160
Thank you for reading, and happy database normalization!