normalization 2003 319b database systems normal forms wilhelm steinbuss room g1.25, ext. 4041...
TRANSCRIPT
normalization 2003
319B Database Systems: Normal Forms
Wilhelm Steinbuss
Room G1.25, ext. [email protected]
Introduction
• First develop an ER model
• Map it into a (logical) relational database design
• Verify that the resulting design does not violate any of the normalization principles
1NF 2NF 3NF BCNF 4NF 5NF ..
Why Normalization?
Assume your logical design contains the following table (the project table):
emp# proj# dept# mgr# dept_name percentage
There are many anomalies with this design:
Anomalies
• Insert anomaly:
a new department cannot be recorded unless there is an employee in it
• Delete anomaly:
the last employee of a department cannot be deleted; otherwise the information about the department disappears
• Update anomaly:
the name of a department is repeated once for each employee, so a name change must be applied to every copy
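The update and delete anomalies can be made concrete with a few rows. The following is a minimal sketch; the row values and the helper name dept_names are hypothetical and only serve as illustration.

```python
# Hypothetical rows of the project table:
# (emp#, proj#, dept#, mgr#, dept_name, percentage)
project = [
    (1, 10, 5, 100, "Sales", 50),
    (1, 11, 5, 100, "Sales", 50),
    (2, 10, 5, 100, "Sales", 100),
    (3, 12, 6, 200, "Research", 100),
]

def dept_names(rows):
    """Map each dept# to the set of names stored for it."""
    names = {}
    for _emp, _proj, dept, _mgr, name, _pct in rows:
        names.setdefault(dept, set()).add(name)
    return names

# Update anomaly: renaming department 5 in only one row leaves two names.
renamed_row = project[0][:4] + ("Marketing",) + project[0][5:]
partially_updated = [renamed_row] + project[1:]
inconsistent = dept_names(partially_updated)

# Delete anomaly: removing employee 3 removes all knowledge of department 6.
after_delete = [row for row in project if row[0] != 3]
remaining_depts = set(dept_names(after_delete))
```

After the partial update, department 5 carries two different names; after the delete, department 6 is no longer known at all.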
1NF
A relational variable is in 1NF if and only if
every legal value of that relational variable
contains exactly one value for each attribute.
(A relational variable with strict typing is always in 1NF.)
1NF (cont.)
Example: (a relational variable not in 1NF)
person
p#   name    ...   language_skills
1    McGee   ...   French,Dutch,English
:    :       ...   :
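One common way to bring such a relational variable into 1NF is to move the repeated values into a relation of their own. The sketch below assumes that the skills are stored as a comma-separated string; the function name to_1nf is hypothetical, and the columns elided on the slide are left out.

```python
# Hypothetical person rows: (p#, name, language_skills).
person = [(1, "McGee", "French,Dutch,English")]

def to_1nf(rows):
    """Split the multi-valued language_skills column into one row per skill."""
    persons = [(p_no, name) for p_no, name, _skills in rows]
    skills = [
        (p_no, skill.strip())
        for p_no, _name, skill_list in rows
        for skill in skill_list.split(",")
    ]
    return persons, skills

persons, skills = to_1nf(person)
```

Each tuple of skills now holds exactly one value per attribute.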
2NF
Example: (project is in 1NF, but with anomalies)
emp# proj# dept# dept_name mgr# percentage
2NF (cont.)
A relational variable is in 2NF if and only if
it is in 1NF and every non-key attribute
depends on the whole key, not just on a part of it.
Example project
emp# proj# percentage
emp# mgr# dept# dept_name
Normalization step
Let Z be a key for R{A1,...,An}; if X → Y,
X a proper subset of Z and Y ∩ Z = {}, then
R can be losslessly decomposed into R1 and R2:
R1{X ∪ Y} and R2{{A1,...,An} - Y}
If R1 or R2 is not in 2NF, repeat the step
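The step can be sketched on the schema level. The FD {emp#} → {mgr#, dept#, dept_name} used below is an assumption read off the decomposition in the project example; the function name decompose_step is hypothetical.

```python
def decompose_step(attributes, key, x, y):
    """Split a schema into R1 = X ∪ Y and R2 = attributes - Y.

    Requires X to be a proper subset of the key and Y to be disjoint from it.
    """
    if not x < key or y & key:
        raise ValueError("X must be a proper subset of the key, Y disjoint from it")
    r1 = [a for a in attributes if a in x or a in y]
    r2 = [a for a in attributes if a not in y]
    return r1, r2

project = ["emp#", "proj#", "dept#", "dept_name", "mgr#", "percentage"]
key = {"emp#", "proj#"}

# Assumed partial dependency: emp# alone determines the department data.
r1, r2 = decompose_step(project, key, {"emp#"}, {"mgr#", "dept#", "dept_name"})
```

Here r1 holds the employee and department attributes and r2 keeps emp#, proj# and percentage, which matches the two relational variables of the example.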
Lossless decomposition
Theorem 1: Let X, Y, Z be sets of attributes for R and S a set of FDs; then
R = R{X ∪ Y} ⋈ R{X ∪ Z} ⇔ X → Y ∈ S+ or X → Z ∈ S+
Proof: "⇐": Let (x,y,z) be a shorthand for {X:x,Y:y,Z:z}. We first show that R ⊆ R{X ∪ Y} ⋈ R{X ∪ Z}.
Let (x,y,z) ∈ R; then (x,y) ∈ R{X ∪ Y} and (x,z) ∈ R{X ∪ Z}, and so (x,y,z) ∈ R{X ∪ Y} ⋈ R{X ∪ Z}.
Next we show R ⊇ R{X ∪ Y} ⋈ R{X ∪ Z}. Let (x,y,z) be an element of the right-hand side; in order to generate this element we need (x,y) ∈ R{X ∪ Y} and (x,z) ∈ R{X ∪ Z}, and therefore
Lossless decomposition (cont.)
(x,y',z) ∈ R for some y', in order to generate (x,z) ∈ R{X ∪ Z};
therefore (x,y') and (x,y) ∈ R{X ∪ Y}, and y' = y because
X → Y; therefore (x,y,z) ∈ R.
"⇒": Let us assume that neither X → Y nor X → Z is valid. Then at least an
A ∈ Y and a B ∈ Z exist with neither X → {A} nor X → {B}; so
A, B ∉ X+ (Lemma 2.3 FD). Now we choose r = (x,y1,z1) and
s = (x,y2,z2) as in Lemma 2.4 FD; then r|X = s|X, but they
differ at least at the position for A (within the Y attributes), so
r|Y = y1 ≠ y2 = s|Y (the same for Z).
Hence (x,y1,z2) ∈ R{X ∪ Y} ⋈ R{X ∪ Z}, but (x,y1,z2) ∉ R.
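Both directions of Theorem 1 can be observed on small relational values. The sketch below is an illustration under assumptions: relations are modelled as Python sets of tuples with a separate attribute list, and the sample values and function names are hypothetical.

```python
def project_onto(rel, attrs, onto):
    """Project a set of tuples with attribute list attrs onto the list onto."""
    idx = [attrs.index(a) for a in onto]
    return {tuple(t[i] for i in idx) for t in rel}

def natural_join(r1, attrs1, r2, attrs2):
    """Natural join of two relations; returns (tuples, attribute list)."""
    common = [a for a in attrs1 if a in attrs2]
    extra = [a for a in attrs2 if a not in attrs1]
    joined = set()
    for t1 in r1:
        for t2 in r2:
            if all(t1[attrs1.index(a)] == t2[attrs2.index(a)] for a in common):
                joined.add(t1 + tuple(t2[attrs2.index(a)] for a in extra))
    return joined, attrs1 + extra

def is_lossless(rel, attrs, attrs1, attrs2):
    """True if joining the two projections reproduces rel exactly."""
    joined, joined_attrs = natural_join(
        project_onto(rel, attrs, attrs1), attrs1,
        project_onto(rel, attrs, attrs2), attrs2,
    )
    return project_onto(joined, joined_attrs, attrs) == rel

attrs = ["X", "Y", "Z"]
with_fd = {(1, "a", "p"), (1, "a", "q"), (2, "b", "p")}   # X -> Y holds
without_fd = {(1, "a", "p"), (1, "b", "q")}               # neither FD holds

lossless = is_lossless(with_fd, attrs, ["X", "Y"], ["X", "Z"])
lossy = not is_lossless(without_fd, attrs, ["X", "Y"], ["X", "Z"])
```

For without_fd the join produces the additional tuple (1, "a", "q"), which plays the role of (x,y1,z2) in the proof.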
3NF
Example: the first relational variable (EMP)
in the 2NF decomposition still has anomalies:
emp# mgr# dept# dept_name
3NF (cont.)
A relational variable is in 3NF if and only if it
is in 2NF and every non-key attribute is
non-transitively dependent on the primary key.
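A transitive dependency can be looked for in sample data. The sketch below uses hypothetical rows for the emp relational variable and a hypothetical helper fd_holds. Note that a check on sample rows can only show that an FD is not violated by these rows; whether it holds for the relational variable is a design decision.

```python
def fd_holds(rows, lhs, rhs):
    """True if rows that agree on lhs always agree on rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True

# Hypothetical sample rows for emp{emp#, mgr#, dept#, dept_name}.
emp = [
    {"emp#": 1, "mgr#": 100, "dept#": 5, "dept_name": "Sales"},
    {"emp#": 2, "mgr#": 100, "dept#": 5, "dept_name": "Sales"},
    {"emp#": 3, "mgr#": 200, "dept#": 6, "dept_name": "Research"},
]

key_to_dept = fd_holds(emp, ["emp#"], ["dept#"])
dept_to_name = fd_holds(emp, ["dept#"], ["dept_name"])
dept_to_key = fd_holds(emp, ["dept#"], ["emp#"])

# dept_name depends on the key only via dept#, which is not itself a key.
transitive = key_to_dept and dept_to_name and not dept_to_key
```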
Example project
emp# proj# percentage
emp# mgr# dept#
dept# dept_name
Boyce-Codd Normal Form (BCNF)
So far we have focused on FDs X → Y with:
X key and Y non-key attributes
or
X and Y non-key attributes; but what
about:
X non-key attributes and Y key attributes?
Example
Example: a course relational variable
with FDs:
{stud#,course#} → {teacher#}
{teacher#} → {course#}
student# course# teacher#
Example (cont.)
course is in 3NF with key {stud#,course#}
(why?), but it has anomalies: e.g. if we delete
the last tuple for a student in course A
taught by teacher B, we lose the
information that B teaches A. The reason is that
{teacher#} → {course#} holds and {teacher#} isn't
a (super)key.
Example (cont.)
The situation is:
1. There are two (or more) candidate keys
2. The candidate keys are composite and
3. They overlap (i.e. have at least one
attribute in common)
(What is the second candidate key?)
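Candidate keys of a small schema can be enumerated by brute force: a set of attributes is a candidate key if its closure under the FDs covers all attributes and no proper subset has this property. The following is a sketch with hypothetical function names; it is exponential in the number of attributes and only meant for examples of this size.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attributes functionally determined by attrs under the FDs (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Enumerate all minimal attribute sets whose closure is the whole schema."""
    attributes = sorted(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys

course_fds = [
    ({"stud#", "course#"}, {"teacher#"}),
    ({"teacher#"}, {"course#"}),
]
keys = candidate_keys({"stud#", "course#", "teacher#"}, course_fds)
```

Printing keys for the course FDs lists both candidate keys, so the question above can be checked against it.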
BCNF
A relational variable is in BCNF if and only if
whenever X → A holds and A is not in X,
X is a superkey.
BCNF (cont.)
More informally: each attribute must represent a fact about the entity identified by the key, the whole key and nothing but the key.
Or
If we assign the attributes in an ER Diagram to the suitable entity types then the resulting relational variables are in BCNF
Example course
teacher# course#
What is the key?
student# teacher#
Normalization Step
Let R{A1,...,An}; if X → Y (X, Y ⊆ {A1,...,An})
and X is not a superkey, then R can be losslessly
decomposed into R1 and R2:
R1{X ∪ Y} and R2{{A1,...,An} - Y}
If R1 or R2 is not in BCNF, repeat the step
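Repeating the step until no violation remains can be sketched as a recursive function. This is a minimal illustration, not a definitive implementation: the names are hypothetical, violations are searched by brute force over attribute subsets, and the FDs of a sub-schema are obtained by restricting closures under the full FD set to that sub-schema.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attributes functionally determined by attrs under the FDs (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_violation(attributes, fds):
    """Return (X, Y) with a nontrivial X -> Y where X is no superkey, else None."""
    attrs = sorted(attributes)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            x = set(combo)
            determined = closure(x, fds) & set(attrs)
            y = determined - x
            if y and determined != set(attrs):
                return x, y
    return None

def bcnf_decompose(attributes, fds):
    """Apply the normalization step until every schema is in BCNF."""
    violation = find_violation(attributes, fds)
    if violation is None:
        return [set(attributes)]
    x, y = violation
    r1 = x | y                 # R1{X ∪ Y}
    r2 = set(attributes) - y   # R2{{A1,...,An} - Y}
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)

course_fds = [
    ({"stud#", "course#"}, {"teacher#"}),
    ({"teacher#"}, {"course#"}),
]
schemas = bcnf_decompose({"stud#", "course#", "teacher#"}, course_fds)
```

For the course example the first violation found is {teacher#} → {course#}, and the result consists of the schemas {teacher#, course#} and {stud#, teacher#}.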
Exercise bookings
The relational variable bookings:
title: the name of a movie
theater: the name of a theater where the movie is being shown
city: the city where the theater is located
with FDs
{theater} → {city}
{title,city} → {theater} (only for the sake of the example)
Find the two candidate keys (prove that they are keys!) and decompose
bookings into relational variables which are in BCNF
Exercise events
The relational variable events:
event_type: type of the event (e.g. sport)
date: date of the event
event#: the number of a specific event of that type
with FDs
{event_type,date} → {event#} (for each event_type only one event of this type per day)
{event#} → {event_type}
With the (candidate) key {event_type,date}, events is not in BCNF;
decompose it into relational variables which are in BCNF
Summary
In BCNF the only (interesting) determinants
are the (candidate) keys; together with
Theorem 1 that is the end of the normalization
process depending on FDs (because there are
no more interesting lossless decompositions)
4NF
Suppose that, instead of two m:n relationships, we choose
an associative entity type:
[ER diagrams not reproduced]
Example article
article_name       colour   size
T-shirt sunshine   green    M
T-shirt sunshine   red      M
T-shirt sunshine   green    L
T-shirt sunshine   red      L
T-shirt sunshine   green    S
T-shirt sunshine   red      S
Example article (cont.)
If the article_name and an arbitrarily chosen
value for size are known, then the set of valid
values for colour is known (e.g. given
'T-shirt sunshine' with size = 'M', then
colour = {'green','red'}; the same is true for
size = 'S' and size = 'L')
Multivalued Dependency
Let X, Y and Z be a decomposition of the attributes
of a relational variable R{X ∪ Y ∪ Z} and R a relational
value for R{X ∪ Y ∪ Z}. Let Yxz := {y : (x,y,z) ∈ R}
X →→ Y (i.e. X multidetermines Y)
if and only if
Yxz = Yxz* for each z, z* whenever Yxz ≠ {} and Yxz* ≠ {}
Note: X → Y is a special case of X →→ Y where Yxz contains exactly one element
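The definition can be checked directly on a relational value. The sketch below assumes that the value is given as a set of (x, y, z) triples, here with X = article_name, Y = colour and Z = size from the article example; the function names are hypothetical.

```python
def y_sets(rel):
    """Group the Y values by (x, z): Yxz = {y : (x, y, z) in rel}."""
    groups = {}
    for x, y, z in rel:
        groups.setdefault((x, z), set()).add(y)
    return groups

def multidetermines(rel):
    """Check X ->> Y: all non-empty Yxz for the same x must be equal."""
    by_x = {}
    for (x, _z), ys in y_sets(rel).items():
        by_x.setdefault(x, []).append(ys)
    return all(ys == groups[0] for groups in by_x.values() for ys in groups)

article = {
    ("T-shirt sunshine", "green", "M"),
    ("T-shirt sunshine", "red", "M"),
    ("T-shirt sunshine", "green", "L"),
    ("T-shirt sunshine", "red", "L"),
    ("T-shirt sunshine", "green", "S"),
    ("T-shirt sunshine", "red", "S"),
}

holds = multidetermines(article)

# Without the red T-shirt in size S the colour sets differ between sizes.
reduced = article - {("T-shirt sunshine", "red", "S")}
holds_reduced = multidetermines(reduced)
```

Only non-empty sets Yxz appear as groups, which corresponds to the "whenever" condition of the definition.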
4NF
A relational variable is in 4NF if and only if
X is a superkey for every nontrivial X →→ Y
Note: Because each FD is a multivalued dependency,
this also implies BCNF
Complementary rule
Theorem 2: X →→ Y ⇔ X →→ Z
Conclusion from:
Lemma 3: X →→ Y ⇔ (if (x,y,z) ∈ R and
(x,y*,z*) ∈ R, then (x,y*,z) ∈ R and (x,y,z*) ∈ R)
"⇒": Let (x,y,z) ∈ R and Yxz* ≠ {}; then (x,y,z*) ∈ R
because Yxz = Yxz* by definition of X →→ Y.
Starting with (x,y*,z*), we get (x,y*,z) ∈ R
Lemma 3 (cont.)
"⇐": Let y* ∈ Yxz*, i.e. (x,y*,z*) ∈ R, and by
prerequisite (x,y*,z) ∈ R ⇒ y* ∈ Yxz,
i.e. Yxz* ⊆ Yxz
Starting with y ∈ Yxz, i.e. (x,y,z) ∈ R, and by
prerequisite (x,y,z*) ∈ R ⇒ y ∈ Yxz*,
i.e. Yxz ⊆ Yxz*
⇒ Yxz = Yxz* ⇒ X →→ Y by definition
Decomposition
Theorem 4: Let X, Y and Z be a decomposition of the attributes of a relational variable R{X ∪ Y ∪ Z}. Then
R = R{X ∪ Y} ⋈ R{X ∪ Z} ⇔ X →→ Y
"⇒": Let (x,y,z), (x,y*,z*) ∈ R; there is a representation (x,y,z) = (x,y) ⋈ (x,z) and (x,y*,z*) = (x,y*) ⋈ (x,z*); but then also (x,y,z*) = (x,y) ⋈ (x,z*) ∈ R and (x,y*,z) = (x,y*) ⋈ (x,z) ∈ R ⇒ X →→ Y by Lemma 3
Decomposition (cont.)
"⇐": For R ⊆ R{X ∪ Y} ⋈ R{X ∪ Z} see
proof of Theorem 1; we have to show "⊇":
Let t ∈ R{X ∪ Y} ⋈ R{X ∪ Z}; then there
are t1, t2 ∈ R with t = t1|X ∪ Y ⋈ t2|X ∪ Z
with t1 = (x,y,z) and t2 = (x,y*,z*);
then t = (x,y,z*) or t = (x,y*,z) ⇒ t ∈ R by
Lemma 3 and X →→ Y
Normalization Step
Let X, Y and Z be a decomposition of the attributes of a relational variable R{X ∪ Y ∪ Z} and X →→ Y.
Then R{X ∪ Y ∪ Z} can be losslessly decomposed:
R = R{X ∪ Y} ⋈ R{X ∪ Z}
If R{X ∪ Y} or R{X ∪ Z} is not in 4NF, repeat the step
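Applied to the article example, the step splits the relational value into an (article_name, colour) part and an (article_name, size) part. The sketch below rebuilds the value by joining the two parts on article_name; the function name decompose_and_join is hypothetical.

```python
def decompose_and_join(rel):
    """Project (name, colour, size) triples onto two parts and join them again."""
    colours = {(name, colour) for name, colour, _size in rel}
    sizes = {(name, size) for name, _colour, size in rel}
    return {
        (name, colour, size)
        for name, colour in colours
        for other_name, size in sizes
        if name == other_name
    }

article = {
    ("T-shirt sunshine", "green", "M"),
    ("T-shirt sunshine", "red", "M"),
    ("T-shirt sunshine", "green", "L"),
    ("T-shirt sunshine", "red", "L"),
    ("T-shirt sunshine", "green", "S"),
    ("T-shirt sunshine", "red", "S"),
}

# article_name ->> colour holds, so the decomposition is lossless.
lossless = decompose_and_join(article) == article

# If the multivalued dependency is violated, the join adds a spurious tuple.
reduced = article - {("T-shirt sunshine", "red", "S")}
spurious = decompose_and_join(reduced) - reduced
```

For the reduced value the join brings back the removed tuple, so the decomposition is only lossless while the multivalued dependency holds.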
Summary
In our example we get the two (original) m:n
relationships; so an unnecessarily designed
n-ary relationship results in a relational
variable which violates 4NF.
4NF marks the end of lossless
decomposition into two relational variables.