c20.0046: database management systems lecture #5
DESCRIPTION
C20.0046: Database Management Systems Lecture #5. Matthew P. Johnson Stern School of Business, NYU Spring, 2004. Agenda. Last time: relational model Homework 1 is up, due next Tuesday This time: Functional dependencies Keys and superkeys in terms of FDs Finding keys for relations - PowerPoint PPT PresentationTRANSCRIPT
M.P. Johnson, DBMS, Stern/NYU, Sp2004
1
C20.0046: Database Management SystemsLecture #5
Matthew P. JohnsonStern School of Business, NYUSpring, 2004
M.P. Johnson, DBMS, Stern/NYU, Sp2004
2
Agenda Last time: relational model Homework 1 is up, due next Tuesday This time:1. Functional dependencies
Keys and superkeys in terms of FDs Finding keys for relations
2. Rules for combining FDs Next time: anomalies & normalization
M.P. Johnson, DBMS, Stern/NYU, Sp2004
3
Review: converting inheritance
Movies
Cartoons Murder-Mysteries
isa isaVoices
Weapon
stars
length title year
Lion King
Component
M.P. Johnson, DBMS, Stern/NYU, Sp2004
4
Review: converting inheritance OO-style:
Nulls-style:
Title Year length
Star Wars 1980 120
Title Year length Murder-Weapon
Scream 1988 110 Knife
Title Year length
Lion King 1990 115
Title Year length Murder-Weapon
Roger Rabbit 1988 110 Knife
1. Plain movies 3. Murder mysteries
2. Cartoons 4. Cartoon murder-mysteries
Title Year length Murder-Weapon
Star Wars 1980 120 NULLLion King 1993 130 NULLScream 1988 110 Knife
Roger Rabbit 1990 115 Knife
M.P. Johnson, DBMS, Stern/NYU, Sp2004
5
Review: converting inheritance E/R-style
Root entity set: Movies(title, year, length)
1301993Lion King198819901980
Year
110115120
length
Roger RabbitScream
Star Wars
Title
Knife1990R. Rabbit1988
YearKnife
murderWeaponScream
TitleSubclass: MurderMysteries(title, year, murderWeapon)
Subclass: Cartoons(title, year)
1993Lion King1990
YearRoger Rabbit
Title
M.P. Johnson, DBMS, Stern/NYU, Sp2004
6
E/R-style & quasi-redundancy Name and year of Roger Rabbit were listed in three
different rows (in different tables) Suppose title changes (“Roger” “Roget”)
must change all three places Q: Is this redundancy? A: No!
name and year are independent multiple movies may have same name
Real redundancy reqs. depenency two rows agree on SSN must agree on rest
conflicting hair colors in these rows is an error two rows agree on movie title may still disagree
conflicting years may be correct – or may not be Better: introduce “movie-id” key att
M.P. Johnson, DBMS, Stern/NYU, Sp2004
7
Combined isa/weak example Exercise 3.3.1
Convert from E/R to R, by E/R, OO and nulls
courses
Lab-courses
Depts
Computer-allocation
room
number
givenBy
name chair
isa
M.P. Johnson, DBMS, Stern/NYU, Sp2004
8
Next topic: Functional dependencies (3.4) FDs are constraints
part of the schema can’t tell from particular relation instances FD may hold for some instances “accidentally”
Finding all FDs is part of DB design Used in normalization
M.P. Johnson, DBMS, Stern/NYU, Sp2004
9
Functional dependencies Definition:
Notation: Read: Ai functionally determines Bj
If two tuples agree on the attributes
A1, A2, …, An
then they must also agree on the attributes
B1, B2, …, Bm
A1, A2, …, An B1, B2, …, Bm
M.P. Johnson, DBMS, Stern/NYU, Sp2004
10
Typical Examples of FDs Product
name price, manufacturer
Person ssn name, age father’s/husband’s-name last-name zipcode state phone state (notwithstanding inter-state area codes)
Company name stockprice, president symbol name name symbol
M.P. Johnson, DBMS, Stern/NYU, Sp2004
11
To check A B, erase all other columns; for each rows t1, t2
i.e., check if remaining relation is many-one no “divergences” i.e., if AB is a well-defined function thus, functional dependency
Functional dependenciesBm...B1Am...A1
t1
t2
if t 1, t 2 agree here then t 1, t 2 agree here
M.P. Johnson, DBMS, Stern/NYU, Sp2004
12
FDs ExampleProduct(name, category, color, department, price)
name colorcategory departmentcolor, category price
Consider these FDs:
What do they say ?
M.P. Johnson, DBMS, Stern/NYU, Sp2004
13
FDs ExampleFDs are constraints:• On some instances they hold• On others they don’t
name category color department price
Gizmo Gadget Green Toys 49
Tweaker Gadget Green Toys 99
Does this instance satisfy all the FDs ?
name colorcategory departmentcolor, category price
M.P. Johnson, DBMS, Stern/NYU, Sp2004
14
FDs Example
name category color department price
Gizmo Gadget Green Toys 49
Tweaker Gadget Black Toys 99
Gizmo Stationary Green Office-supp. 59
What about this one ?
name colorcategory departmentcolor, category price
M.P. Johnson, DBMS, Stern/NYU, Sp2004
15
Q: Is PositionPhone an FD here?
A: It is for this instance, but no, presumably not in general
Others FDs? EmpID Name, Phone, Position but Phone Position
Recognizing FDs
EmpID Name Phone PositionE0045 Smith 1234 ClerkE1847 John 9876 SalesrepE1111 Smith 9876 SalesrepE9999 Mary 1234 Lawyer
M.P. Johnson, DBMS, Stern/NYU, Sp2004
16
Keys of relations {A1A2A3…An} is a key for relation R if
1. A1A2A3…An functionally determine all other attributes
Usual notation: A1A2A3…An B1B2…Bk
rels = sets distinct rows can’t agree on all Ai
2. A1A2A3…An is minimal No proper subset of A1A2A3…An functionally determines
all other attributes of R Primary key: chosen if there are several
possible keys
M.P. Johnson, DBMS, Stern/NYU, Sp2004
17
Keys example Relation: Student(Name, Address, DoB,
Email, Credits) Which (/why) of the following are keys?
Name, Address Name, SSN Email, SSN Email SSN
NB: minimal != smallest
M.P. Johnson, DBMS, Stern/NYU, Sp2004
18
Superkeys A set of attributes that contains a key Satisfies first condition:
functionally determines every other attribute in the relation
Might not satisfy the second condition: minimality may be possible to peel away some attributes
from the superkey keys are superkeys
key are special case of superkey superkey set is superset of key set
name;ssn is a superkey but not a key
M.P. Johnson, DBMS, Stern/NYU, Sp2004
19
Discovering keys for relations Relation entity set
Key of relation = (minimized) key of entity set Relation binary relationship
Many-many: union of keys of both entity sets Many(M)-one(O): only key of M (Why?) One-one: key of either entity set (but not both!)
M.P. Johnson, DBMS, Stern/NYU, Sp2004
20
Example – entity sets Key of entity set = (minimized) key of relation
Student(Name, Address, DoB, SSN, Email, Credits)
Student
Name
Address
DoB
SSN
Credits
M.P. Johnson, DBMS, Stern/NYU, Sp2004
21
Example – many-many Many-many key: union of both ES keys
Student Enrolls Course
SSN Credits CourseID Name
Enrolls(SSN,CourseID)
M.P. Johnson, DBMS, Stern/NYU, Sp2004
22
Example – many-one Key of many ES but not of one ES
keys from both would be non-minimal
Course MeetsIn Room
CourseID Name RoomNo Capacity
MeetsIn(CourseID,RoomNo)
M.P. Johnson, DBMS, Stern/NYU, Sp2004
23
Example – one-one Keys of both ESs included in relation Key is key of either ES (but not both!)
Husbands Married Wives
SSN Name SSN Name
Married(HSSN, WSSN) or
Married(HSSN, WSSN)
M.P. Johnson, DBMS, Stern/NYU, Sp2004
24
Discovering keys: multiway Multiway relationships:
Multiple ways – may not be obvious R:F,G,HE is many-one E’s key is included
but not part of key Recall that relship atts are implicitly many-one
Course Enrolls Student
CourseID Name SSN NameSection
RoomNo CapacityEnrolls(CourseID,SSN,RoomNo)
M.P. Johnson, DBMS, Stern/NYU, Sp2004
25
Example Exercise 3.4.2 Relation relating particles in a box to
locations and velocitiesInPosition(id,x,y,z,vx,vy,vz)
Q: What FDs hold? Q: What are the keys?
M.P. Johnson, DBMS, Stern/NYU, Sp2004
26
Combining FDs (3.5)If some FDs are satisfied, thenothers are satisfied too
If all these FDs are true:name colorcategory departmentcolor, category price
Then this FD also holds: name, category price
Why?
M.P. Johnson, DBMS, Stern/NYU, Sp2004
27
Rules for FDs Reasoning about FDs: given a set of FDs,
infer other FDs – useful E.g. A B, B C A C Definitions: for FD-sets S and T
T follows from S if all relation-instances satisfying S also satisfy T.
S and T are equivalent if the sets of relation-instances satisfying S and T are the same.
I.e., S and T are equivalent if S follows from T, and T follows from S.
M.P. Johnson, DBMS, Stern/NYU, Sp2004
28
Splitting & combining FDsSplitting rule:
Combining rule:Note: doesn’t apply to the left side
Q: Does it apply to the left side?
A1A2…An B1B2…Bm
A1, A2, …, An B1
A1, A2, …, An B2
. . . . .A1, A2, …, An Bm
Bm...B1Am...A1
t1
t2
M.P. Johnson, DBMS, Stern/NYU, Sp2004
29
Reflexive rule: trivial FDs
FD A1A2…An B1B2…Bk may be Trivial: Bs are a subset of As Nontrivial: >=1 of the Bs is not among the As Completely nontrivial: none of the Bs is among the As
Trivial elimination rule: Eliminate common attributes from Bs, to get an equivalent
completely nontrivial FD
A1, A2, …, An Ai with i in 1..n is a trivial FD
A1 … An
t
t’
M.P. Johnson, DBMS, Stern/NYU, Sp2004
30
Transitive ruleIf
and
then
A1, A2, …, An B1, B2, …, Bm
B1, B2, …, Bm C1, C2, …, Cp
A1, A2, …, An C1, C2, …, Cp
A1 … Am B1 … Bm C1 ... Cp
t
t’
M.P. Johnson, DBMS, Stern/NYU, Sp2004
31
Augmentation rule
If
then
A1, A2, …, An B
A1, A2, …, An , C B, C, for any C
A1 … Am B1 … Bm C1 ... Cp
t
t’
M.P. Johnson, DBMS, Stern/NYU, Sp2004
32
Rules summary1. A B AC B (by definition)2. Separation/Combination3. Reflexive4. Augmentation5. Transitivity Last 3 called Armstrong’s Axioms Complete: entire closure follows from these Sound: no other FDs follow from these
M.P. Johnson, DBMS, Stern/NYU, Sp2004
33
Inferring FDs exampleStart from the following FDs:
Infer the following FDs:
1. name color2. category department3. color, category price
Inferred FD Which Ruledid we apply?
4. name, category name
5. name, category color
6. name, category category
7. name, category color, category
8. name, category price
Reflexive rule
Transitivity(4,1)
Reflexive rule
combine(5,6) or Aug(1)
Transitivity(3,7)
M.P. Johnson, DBMS, Stern/NYU, Sp2004
34
Problem: infer ALL FDsGiven a set of FDs, infer all possible FDs
How to proceed? Try all possible FDs, apply all rules
E.g. R(A, B, C, D): how many FDs are possible? Drop trivial FDs, drop augmented FDs
Still way too many Better: use the Closure Algorithm (next)
M.P. Johnson, DBMS, Stern/NYU, Sp2004
35
Closure of a set of AttributesGiven a set of attributes A1, …, An
The closure, {A1, …, An}+ = {B in Atts: A1, …, An B}
name colorcategory departmentcolor, category price
Example:
Closures: {name}+ = {name, color} {name, category}+ = {name, category, color, department, price} {color}+ = {color}
M.P. Johnson, DBMS, Stern/NYU, Sp2004
36
Closure Algorithm
Start with X={A1, …, An}.
Repeat:
if B1, …, Bn C is a FD and B1, …, Bn are all in X then add C to X.
until X didn’t change
{name, category}+ = {name, category, color, department, price}
name colorcategory departmentcolor, category price
Example:
M.P. Johnson, DBMS, Stern/NYU, Sp2004
37
Example
Compute {A,B}+ X = {A, B, }
Compute {A, F}+ X = {A, F, }
R(A,B,C,D,E,F) A, B CA, D EB DA, F B
In class:
M.P. Johnson, DBMS, Stern/NYU, Sp2004
38
More definitions Given FDs versus Derived FDs
Given FDs: stated initially for the relation Derived FDs: those that are inferred using the
rules or through closure Basis: A set of FDs from which we can infer
all other FDs for the relation Minimal basis: a minimal set of FDs (Can’t
discard any of them) No proper subset of the minimal basis can derive
the complete set of FDs
M.P. Johnson, DBMS, Stern/NYU, Sp2004
39
Problem: find all FDs Given a database instance Find all FD’s satisfied by that instance
Useful if we don’t get enough information from our users: need to reverse engineer a data instance
Q: How long does this take? A: Some time for each subset of atts Q: How many subsets?
powerset exponential time in worst-case But can often be smarter…
M.P. Johnson, DBMS, Stern/NYU, Sp2004
40
Using Closure to Infer ALL FDs A, B C
A, D BB D
Example:
Step 1: Compute X+, for every X:
A+ = A, B+ = BD, C+ = C, D+ = DAB+ = ABCD, AC+ = AC, AD+ = ABCDABC+ = ABD+ = ACD+ = ABCD (no need to compute– why?)BCD+ = BCD, ABCD+ = ABCD
Step 2: Enumerate all FD’s X Y, s.t. Y X+ and XY = :
AB CD, ADBC, ABC D, ABD C, ACD B
M.P. Johnson, DBMS, Stern/NYU, Sp2004
41
Example R(A,B,C) Each of three determines other two Q: What are the FDs?
Closure of singleton sets Closure of doubletons
Q: What are the keys? Q: What are the minimal bases?
M.P. Johnson, DBMS, Stern/NYU, Sp2004
42
Computing Keys Rephrasing of definition of key: X is a key if
X+ = all attributes X is minimal
Note: there can be many minimal keys ! Example: R(A,B,C), ABC, BCA
Minimal keys: AB and BC