Copyright © 2003-2012 Curt Hill
Schema Refinement III
4th NF and 5th NF
Now what?
• An example
• Consider a table that contains courses, instructors and textbooks
• There may be multiple instructors for multiple sections of the class
• There may be multiple textbooks as well
• Both instructors and textbooks come from a set of possibilities
Course/Instructor/Book
Dept  Number  Instructor  Book
CIS   385     221         Smith & Boss
CIS   385     221         Noble
CIS   385     403         Smith & Boss
CIS   385     403         Noble
• Key is entire tuple
• Each instructor uses two books for the course
• There is a redundancy
Commentary
• There is redundancy that we should deal with
• The table is in BCNF
  – No examination of FDs will help us
• The two instructors and two textbooks are both determined by the course department and number
• This is an example of a MultiValued Dependency
Commentary Again
• First normal form disallows repeating groups
• A repeating group is often a set
• A MultiValued Dependency is a set depending on an item
• Examples:
  – People working on many projects
  – Each of these people has many dependents
Examples
• In this example the course determines a set of instructors
• The course also determines a set of textbooks
• These two sets are independent
• If the sets are large we get plenty of redundancy and yet are still in BCNF
  – This happens when every book is connected to every instructor connected to the course
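The redundancy can be made concrete with a small sketch. It assumes the instructor numbers and book names from the slide's table; the helper names `combined_rows` and `split_rows` are made up for illustration. The combined table needs one row per instructor/book pair, while two separate tables need one row per instructor plus one row per book.

```python
from itertools import product

# Illustrative data taken from the slide's table.
instructors = ["221", "403"]
books = ["Smith & Boss", "Noble"]

# Combined table: every instructor paired with every book for CIS 385.
combined = [("CIS", "385", i, b) for i, b in product(instructors, books)]

# Separate tables: one row per instructor, one row per book.
course_instructor = [("CIS", "385", i) for i in instructors]
course_book = [("CIS", "385", b) for b in books]


def combined_rows(m, n):
    """Rows in the combined table for m instructors and n books."""
    return m * n


def split_rows(m, n):
    """Total rows in the two separate tables."""
    return m + n
```

With two instructors and two books both layouts happen to need four rows. The gap opens as the sets grow: five instructors and six books would need 30 rows combined but only 11 rows split.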
MultiValue Dependency
• An MVD determines a set of values rather than a single value
• Notation is two arrows
• Dept, Number →→ Instructor and
• Dept, Number →→ Book
• The correct decomposition is splitting teacher from book
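One way to test whether an MVD holds in a given set of rows is the usual "swap" check: whenever two rows agree on the left side, the row that takes its right-side values from the first and all remaining values from the second must also be present. The function below is a minimal sketch of that check; the name `mvd_holds` and the tuple-based representation are assumptions, not something the slides define.

```python
def mvd_holds(rows, attrs, x, y):
    """Check whether the MVD x ->> y holds in `rows` (a collection of tuples).

    attrs names the columns in order; x and y are lists of attribute names.
    """
    rows = set(rows)
    pos = {a: i for i, a in enumerate(attrs)}

    def same_x(t1, t2):
        return all(t1[pos[a]] == t2[pos[a]] for a in x)

    for t1 in rows:
        for t2 in rows:
            if not same_x(t1, t2):
                continue
            # Take the X and Y values from t1 and everything else from t2.
            mixed = tuple(
                t1[pos[a]] if (a in x or a in y) else t2[pos[a]]
                for a in attrs
            )
            if mixed not in rows:
                return False
    return True


ATTRS = ("Dept", "Number", "Instructor", "Book")
COURSE = [
    ("CIS", "385", "221", "Smith & Boss"),
    ("CIS", "385", "221", "Noble"),
    ("CIS", "385", "403", "Smith & Boss"),
    ("CIS", "385", "403", "Noble"),
]
```

On the four-row course table both MVDs hold. Dropping any one row breaks them, because an instructor would then be missing one of the course's books.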
Course/Instructor/Book
Dept  Number  Instructor  Book
CIS   385     221         Smith & Boss
CIS   385     221         Noble
CIS   385     403         Smith & Boss
CIS   385     403         Noble
Project into
Dept  Num  Instruct
CIS   385  221
CIS   385  403
Dept  Num  Book
CIS   385  Smith & Boss
CIS   385  Noble
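The projection can be checked to be lossless by joining the two smaller tables back together. This is a sketch only: `project` and `natural_join` are hypothetical helpers written for this example, with relations held as sets of tuples and attribute names as tuples of strings.

```python
def project(rows, attrs, keep):
    """Project rows onto the attributes in `keep`, dropping duplicates."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(t[i] for i in idx) for t in rows}


def natural_join(r1, attrs1, r2, attrs2):
    """Natural join of two relations; returns (rows, attribute names)."""
    common = [a for a in attrs1 if a in attrs2]
    extra = [a for a in attrs2 if a not in attrs1]
    out = set()
    for t1 in r1:
        for t2 in r2:
            if all(t1[attrs1.index(a)] == t2[attrs2.index(a)] for a in common):
                out.add(tuple(t1) + tuple(t2[attrs2.index(a)] for a in extra))
    return out, tuple(attrs1) + tuple(extra)


ATTRS = ("Dept", "Number", "Instructor", "Book")
COURSE = {
    ("CIS", "385", "221", "Smith & Boss"),
    ("CIS", "385", "221", "Noble"),
    ("CIS", "385", "403", "Smith & Boss"),
    ("CIS", "385", "403", "Noble"),
}

INSTR_ATTRS = ("Dept", "Number", "Instructor")
BOOK_ATTRS = ("Dept", "Number", "Book")
course_instructor = project(COURSE, ATTRS, INSTR_ATTRS)
course_book = project(COURSE, ATTRS, BOOK_ATTRS)

rejoined, rejoined_attrs = natural_join(
    course_instructor, INSTR_ATTRS, course_book, BOOK_ATTRS
)
```

Each projection has two rows, and joining them on Dept and Number gives back exactly the original four rows, so nothing is lost and nothing spurious appears.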
Fourth Normal Form
• The above two tables are in 4th NF
• A table is in 4th NF if and only if
  – The table is in BCNF
  – All MVDs are now FDs
• If there are no MVDs then BCNF is also 4NF
Another View of 4th NF
• If a relation is in 4th NF then for each MVD X →→ A one of the following must hold
• The MVD is trivial
  – A is part of X or
  – XA is the whole relation
• X is a superkey
Is this 4th NF?
Dept  Number  Instructor  Book
CIS   385     221         Smith & Boss
CIS   385     221         Noble
CIS   385     403         Smith & Boss
CIS   385     403         Noble
• There are two MVDs
  – Dept, Number →→ Instructor
  – Dept, Number →→ Book
• Trivial MVDs? - No
• Dept, Number superkey? - No
• So the table is not in 4th NF
Is this 4th NF?
Dept  Num  Instruct
CIS   385  221
CIS   385  403
• There is one MVD
  – Dept, Num →→ Instructor
• Trivial MVD?
  – Yes, this is the whole relation
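The two "Is this 4th NF?" checks follow the same recipe, which can be sketched as a function: for each stated MVD, ask whether it is trivial or whether its left side is a superkey. The name `violates_4nf` and the input format (lists of attribute names, with candidate keys supplied by the caller) are assumptions made for this example.

```python
def violates_4nf(attrs, keys, mvds):
    """Return the MVDs (x, a) that are neither trivial nor keyed by a superkey.

    attrs: all attribute names of the relation
    keys:  list of candidate keys, each a list of attribute names
    mvds:  list of (x, a) pairs, each a list of attribute names
    """
    all_attrs = set(attrs)
    bad = []
    for x, a in mvds:
        xs, as_ = set(x), set(a)
        # Trivial: A is part of X, or X and A together are the whole relation.
        trivial = as_ <= xs or (xs | as_) == all_attrs
        # X is a superkey if it contains some candidate key.
        superkey = any(set(k) <= xs for k in keys)
        if not (trivial or superkey):
            bad.append((sorted(xs), sorted(as_)))
    return bad


# The four-column course table: the key is the entire tuple.
big = violates_4nf(
    ["Dept", "Number", "Instructor", "Book"],
    [["Dept", "Number", "Instructor", "Book"]],
    [(["Dept", "Number"], ["Instructor"]), (["Dept", "Number"], ["Book"])],
)

# The projected Dept/Num/Instructor table.
small = violates_4nf(
    ["Dept", "Num", "Instructor"],
    [["Dept", "Num", "Instructor"]],
    [(["Dept", "Num"], ["Instructor"])],
)
```

An empty result means no listed MVD violates the condition. The function only judges the MVDs it is given; finding which MVDs hold is a separate step.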
Decomposability
• A strange thing happens:
• There are relations that cannot be lossless join decomposed into two relations
• But they can be decomposed into a larger number of relations
• The following example shows a relation that can be decomposed into three but not two
Example
Table A
S  P  J
1  1  2
1  2  1
2  1  1
1  1  1
What about this?
• What is the key?
  – Entire tuple
  – Must be in 4th NF
• What MVDs?
  – S →→ P
  – S →→ J
  – P →→ J
  – Among others
Decomposition
• In the next slide we will see the table decomposed into tables of two fields
• However, no two of them can be joined into the original without extra rows
• All three of them can be joined into the original
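Example Decomposed
A (the original table)
S  P  J
1  1  2
1  2  1
2  1  1
1  1  1
B, C and D (the three two-column projections of A)
S  J
1  2
1  1
2  1
S  P
1  1
1  2
2  1
P  J
1  2
2  1
1  1
Join of the S P and S J projections (one extra row, 1 2 2)
S  P  J
1  1  2
1  2  2
1  2  1
2  1  1
1  1  1
Join of all three projections (same as A)
S  P  J
1  1  2
1  2  1
2  1  1
1  1  1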
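The decomposition can be replayed in a few lines. This sketch hard-codes the S, P, J table from the slide as a set of tuples; the variable names are chosen only for this example. It joins the two-column projections pairwise and then all three together.

```python
# Table A from the slide, columns S, P, J.
A = {(1, 1, 2), (1, 2, 1), (2, 1, 1), (1, 1, 1)}

# The three two-column projections.
SP = {(s, p) for s, p, j in A}
SJ = {(s, j) for s, p, j in A}
PJ = {(p, j) for s, p, j in A}

# Pairwise natural joins, each on the one shared column.
sp_sj = {(s, p, j) for s, p in SP for s2, j in SJ if s == s2}
sp_pj = {(s, p, j) for s, p in SP for p2, j in PJ if p == p2}
sj_pj = {(s, p, j) for s, j in SJ for p, j2 in PJ if j == j2}

# Joining the third projection onto a pairwise join filters the extra row out.
all_three = {(s, p, j) for s, p, j in sp_sj if (p, j) in PJ}
```

Every pairwise join contains the four rows of A plus one spurious row, and only the three-way join returns exactly A.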
What Just Happened?
• A could not be lossless join decomposed into any two of {B, C, D}
  – Decomposing into just two must break an MVD
• It could be lossless join decomposed into all three
• There is a join dependency between A and {B, C, D}
• There is no join dependency between any of
  – A and {B, C}
  – A and {B, D}
  – A and {C, D}
Join Dependencies
• A Join Dependency {R1,R2,…RN} holds over R if R1,R2,…RN is a lossless join decomposition of R
  – In other words, joining R1,R2,…RN gives R
• Notation: {R1,R2,…RN}
• A JD is a generalization of MVDs
• In the previous example, the MVDs S →→ P, S →→ J, P →→ J may be expressed as the join dependency {B,C,D}
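The definition translates directly into a test on a concrete table: project onto each component, join the projections, and compare with the original. The sketch below assumes relations are sets of tuples and represents intermediate rows as frozensets of (attribute, value) pairs so that column order does not matter; `jd_holds` is a hypothetical name.

```python
from functools import reduce


def jd_holds(rows, attrs, components):
    """Check whether joining the projections onto `components` gives back `rows`."""
    rows = {tuple(t) for t in rows}

    def project(keep):
        return {frozenset((a, t[attrs.index(a)]) for a in keep) for t in rows}

    def join(r1, r2):
        out = set()
        for t1 in r1:
            d1 = dict(t1)
            for t2 in r2:
                d2 = dict(t2)
                # Rows combine only if they agree on every shared attribute.
                if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                    out.add(frozenset({**d1, **d2}.items()))
        return out

    joined = reduce(join, [project(c) for c in components])
    original = {frozenset(zip(attrs, t)) for t in rows}
    return joined == original


SPJ_ATTRS = ("S", "P", "J")
SPJ = {(1, 1, 2), (1, 2, 1), (2, 1, 1), (1, 1, 1)}
```

For the S, P, J table the check succeeds with all three two-column components and fails with only two of them. It tests one instance of the table, whereas a dependency is a statement about every legal instance.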
Trivial Join Decompositions
• The join dependency {R1,R2,…RN} on R is trivial iff
  – At least one of R1,R2,…RN is the set of all attributes of R
  – In other words, there is a relation equivalent to R in the decomposition
• Joining R to any decomposition of R or its join reproduces the original
Implied Join Dependencies
• Suppose the join dependency {R1,R2,…RN} holds on R
• This Join Dependency is Implied by the Candidate Key(s) iff
  – Each relation R1,R2,…RN is a superkey for R
Fifth Normal Form
• 5th NF is also known as: Projection Join Normal Form (PJNF)
• A relation R is in 5th NF if and only if every non-trivial join dependency that is satisfied by R is implied by the candidate key(s) of R
Is this in 5th NF?
S  P  J
1  1  2
1  2  1
2  1  1
1  1  1
• There is a non-trivial join decomposition, {B,C,D}
  – None of these are A
• This decomposition is not implied by the only candidate key, SPJ
  – None of these contain SPJ
• No – not in 5NF
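The reasoning on this slide can be sketched as two small predicates. They follow the slides' own criteria (a JD is trivial if some component is the whole relation, and implied by the keys if every component is a superkey); the function names and the list-of-attribute-names input format are assumptions for illustration.

```python
def jd_is_trivial(attrs, components):
    """Trivial: at least one component is the set of all attributes."""
    return any(set(c) == set(attrs) for c in components)


def jd_implied_by_keys(keys, components):
    """Slides' criterion: every component contains some candidate key."""
    return all(any(set(k) <= set(c) for k in keys) for c in components)


def jd_violates_5nf(attrs, keys, components):
    """A non-trivial JD that is not implied by the candidate keys."""
    return not jd_is_trivial(attrs, components) and not jd_implied_by_keys(
        keys, components
    )


SPJ_ATTRS = ["S", "P", "J"]
SPJ_KEYS = [["S", "P", "J"]]  # the only candidate key is the entire tuple
SPJ_JD = [["S", "P"], ["S", "J"], ["P", "J"]]
```

For the S, P, J table the three-way JD is neither trivial nor implied by the key, which matches the slide's answer. The predicates judge a JD that is already known to hold and do nothing to discover JDs.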
Is 5th NF the Ultimate?
• It is the ultimate that can be obtained with just projections
  – The guaranteed best in terms of a lack of anomalies that can be removed by projections
• Hence the name Projection Join Normal Form
• However, there may be some anomalies that cannot be eliminated with just projections
JDs and FDs
• FDs and MVDs have a set of inference rules
  – This allows us to reason about them
• JDs lack this set
• Thus finding JDs and using them to move to 5th NF has its problems
• We do have one tool
3NF and 5NF
• If a relation is in 3rd NF and each of its keys is atomic (a single attribute), then the relation is also in 5th NF
  – The same may be said of BCNF
• There may be 5th NF relations that do not have atomic keys
• When we can apply this we can determine the table is in 5th NF without any consideration of JDs
Denormalization
• The argument against making everything 5th NF:
  – Lots of separate relations
  – These relations become separate files
  – This means lots of I/O
• Since SQL cannot separate a relation from a file, the argument has some merit
Conclusion
• MVDs are much less common than FDs
• Thus tables that are in BCNF are very often in 5NF because there are no MVDs
• MVDs are also harder to observe and reason about
• Thus 3NF and BCNF are the most common normal forms