C20.0046: Database Management Systems, Lecture #5



M.P. Johnson, DBMS, Stern/NYU, Sp2004

C20.0046: Database Management Systems
Lecture #5

Matthew P. Johnson
Stern School of Business, NYU
Spring, 2004

Agenda
Last time: relational model
Homework 1 is up, due next Tuesday
This time:
1. Functional dependencies
  Keys and superkeys in terms of FDs
  Finding keys for relations
2. Rules for combining FDs
Next time: anomalies & normalization

Review: converting inheritance

[Figure: E/R diagram of the Movies hierarchy. Labels: Movies (title, year, length); isa subclasses Cartoons and Murder-Mysteries; Voices; stars; Weapon; Lion King; Component.]

Review: converting inheritance

OO-style:

1. Plain movies
  Title      Year  length
  Star Wars  1980  120

2. Cartoons
  Title      Year  length
  Lion King  1990  115

3. Murder mysteries
  Title   Year  length  Murder-Weapon
  Scream  1988  110     Knife

4. Cartoon murder-mysteries
  Title         Year  length  Murder-Weapon
  Roger Rabbit  1988  110     Knife

Nulls-style:

  Title         Year  length  Murder-Weapon
  Star Wars     1980  120     NULL
  Lion King     1993  130     NULL
  Scream        1988  110     Knife
  Roger Rabbit  1990  115     Knife

Review: converting inheritance
E/R-style

Root entity set: Movies(title, year, length)

  Title         Year  length
  Star Wars     1980  120
  Scream        1988  110
  Roger Rabbit  1990  115
  Lion King     1993  130

Subclass: MurderMysteries(title, year, murderWeapon)

  Title      Year  murderWeapon
  Scream     1988  Knife
  R. Rabbit  1990  Knife

Subclass: Cartoons(title, year)

  Title         Year
  Roger Rabbit  1990
  Lion King     1993

E/R-style & quasi-redundancy
Name and year of Roger Rabbit were listed in three different rows (in different tables)
Suppose title changes (“Roger” → “Roget”)
  must change all three places
Q: Is this redundancy?
A: No!
  name and year are independent
  multiple movies may have same name
Real redundancy requires dependency
  two rows agree on SSN ⇒ must agree on rest
    conflicting hair colors in these rows is an error
  two rows agree on movie title ⇒ may still disagree
    conflicting years may be correct, or may not be
Better: introduce “movie-id” key attribute

Combined isa/weak example
Exercise 3.3.1
Convert from E/R to relations, by the E/R, OO and nulls styles

[Figure: E/R diagram with entity sets courses, Lab-courses and Depts; relationships givenBy and isa; attribute labels room, number, name, chair and Computer-allocation.]

Next topic: Functional dependencies (3.4)
FDs are constraints
  part of the schema
  can’t tell from particular relation instances
  FD may hold for some instances “accidentally”
Finding all FDs is part of DB design
Used in normalization

Functional dependencies
Definition:
  If two tuples agree on the attributes
    A1, A2, …, An
  then they must also agree on the attributes
    B1, B2, …, Bm
Notation: A1, A2, …, An → B1, B2, …, Bm
Read: the Ai functionally determine the Bj

Typical Examples of FDs
Product
  name → price, manufacturer
Person
  ssn → name, age
  father’s/husband’s-name → last-name
  zipcode → state
  phone → state (notwithstanding inter-state area codes)
Company
  name → stockprice, president
  symbol → name
  name → symbol

Functional dependencies
To check A → B, erase all other columns; for each pair of rows t1, t2:
  if t1, t2 agree on A1 ... Am, then t1, t2 must agree on B1 ... Bm
i.e., check if remaining relation is many-one
  no “divergences”
  i.e., if A → B is a well-defined function
  thus, functional dependency

[Figure: rows t1 and t2 over columns A1 ... Am and B1 ... Bm.]

FDs Example
Product(name, category, color, department, price)

Consider these FDs:
  name → color
  category → department
  color, category → price

What do they say?

FDs Example
FDs are constraints:
• On some instances they hold
• On others they don’t

  name → color
  category → department
  color, category → price

  name     category  color  department  price
  Gizmo    Gadget    Green  Toys        49
  Tweaker  Gadget    Green  Toys        99

Does this instance satisfy all the FDs?

FDs Example

  name → color
  category → department
  color, category → price

  name     category    color  department    price
  Gizmo    Gadget      Green  Toys          49
  Tweaker  Gadget      Black  Toys          99
  Gizmo    Stationery  Green  Office-supp.  59

What about this one?
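The pairwise check described on the preceding slides (two rows that agree on the left side must agree on the right side) can be sketched in Python. This is only an illustration: `fd_holds`, `make_rows`, and the dictionary-per-row representation are assumptions made for the sketch, not part of the lecture. The rows are the two Product instances from the slides.

```python
def fd_holds(rows, lhs, rhs):
    """Check lhs -> rhs on ONE instance by comparing every pair of rows."""
    for i, t1 in enumerate(rows):
        for t2 in rows[i + 1:]:
            agree_left = all(t1[a] == t2[a] for a in lhs)
            agree_right = all(t1[b] == t2[b] for b in rhs)
            if agree_left and not agree_right:
                return False  # t1 and t2 "diverge": the FD is violated
    return True

COLUMNS = ("name", "category", "color", "department", "price")

def make_rows(*tuples):
    """Hypothetical helper: turn plain tuples into dicts keyed by column name."""
    return [dict(zip(COLUMNS, t)) for t in tuples]

FDS = [
    (("name",), ("color",)),
    (("category",), ("department",)),
    (("color", "category"), ("price",)),
]

first = make_rows(
    ("Gizmo", "Gadget", "Green", "Toys", 49),
    ("Tweaker", "Gadget", "Green", "Toys", 99),
)
second = make_rows(
    ("Gizmo", "Gadget", "Green", "Toys", 49),
    ("Tweaker", "Gadget", "Black", "Toys", 99),
    ("Gizmo", "Stationery", "Green", "Office-supp.", 59),
)

for label, rows in (("first", first), ("second", second)):
    for lhs, rhs in FDS:
        print(label, ", ".join(lhs), "->", ", ".join(rhs), fd_holds(rows, lhs, rhs))
```

A `True` here only says that the instance does not violate the FD. Whether the FD belongs in the schema is a design decision that no single instance can settle.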

Recognizing FDs

  EmpID  Name   Phone  Position
  E0045  Smith  1234   Clerk
  E1847  John   9876   Salesrep
  E1111  Smith  9876   Salesrep
  E9999  Mary   1234   Lawyer

Q: Is Position → Phone an FD here?
A: It is for this instance, but no, presumably not in general

Other FDs?
  EmpID → Name, Phone, Position
  but not Phone → Position

Keys of relations
{A1A2A3…An} is a key for relation R if
1. A1A2A3…An functionally determine all other attributes
  Usual notation: A1A2A3…An → B1B2…Bk
  rels = sets ⇒ distinct rows can’t agree on all Ai
2. A1A2A3…An is minimal
  No proper subset of A1A2A3…An functionally determines all other attributes of R
Primary key: chosen if there are several possible keys

Keys example
Relation: Student(Name, Address, DoB, SSN, Email, Credits)
Which of the following are keys, and why?
  Name, Address
  Name, SSN
  Email, SSN
  Email
  SSN
NB: minimal != smallest

Superkeys
A set of attributes that contains a key
Satisfies first condition: functionally determines every other attribute in the relation
Might not satisfy the second condition: minimality
  may be possible to peel away some attributes from the superkey
Keys are superkeys
  keys are a special case of superkeys
  superkey set is superset of key set
{name, ssn} is a superkey but not a key

Discovering keys for relations
Relation from entity set
  Key of relation = (minimized) key of entity set
Relation from binary relationship
  Many-many: union of keys of both entity sets
  Many(M)-one(O): only key of M (Why?)
  One-one: key of either entity set (but not both!)

Example – entity sets
Key of entity set = (minimized) key of relation
Student(Name, Address, DoB, SSN, Email, Credits)

[Figure: E/R diagram of entity set Student with attributes Name, Address, DoB, SSN, Email, Credits.]

Example – many-many
Many-many key: union of both ES keys

[Figure: E/R diagram. Student and Course joined by the relationship Enrolls; attribute labels SSN, Credits, CourseID, Name.]

Enrolls(SSN,CourseID)

Example – many-one
Key of many ES but not of one ES
  keys from both would be non-minimal

[Figure: E/R diagram. Course and Room joined by the relationship MeetsIn; attribute labels CourseID, Name, RoomNo, Capacity.]

MeetsIn(CourseID,RoomNo), with key CourseID

Example – one-one
Keys of both ESs included in relation
Key is key of either ES (but not both!)

[Figure: E/R diagram. Husbands and Wives joined by the relationship Married; each has attributes SSN and Name.]

Married(HSSN, WSSN) with key HSSN, or
Married(HSSN, WSSN) with key WSSN

Discovering keys: multiway
Multiway relationships:
  Multiple ways: may not be obvious
  R: F, G, H → E is many-one ⇒ E’s key is included but not part of key
  Recall that relship atts are implicitly many-one

[Figure: E/R diagram. Course, Student and Section joined by the relationship Enrolls; attribute labels CourseID, Name, SSN, Name, RoomNo, Capacity.]

Enrolls(CourseID,SSN,RoomNo)

Example
Exercise 3.4.2
Relation relating particles in a box to locations and velocities
InPosition(id,x,y,z,vx,vy,vz)
Q: What FDs hold?
Q: What are the keys?

Combining FDs (3.5)
If some FDs are satisfied, then others are satisfied too

If all these FDs are true:
  name → color
  category → department
  color, category → price

Then this FD also holds:
  name, category → price

Why?

Rules for FDs
Reasoning about FDs: given a set of FDs, infer other FDs (useful)
  E.g. A → B, B → C ⇒ A → C
Definitions: for FD-sets S and T
  T follows from S if all relation-instances satisfying S also satisfy T.
  S and T are equivalent if the sets of relation-instances satisfying S and T are the same.
  I.e., S and T are equivalent if S follows from T, and T follows from S.

Splitting & combining FDs
Splitting rule: from
  A1A2…An → B1B2…Bm
infer
  A1, A2, …, An → B1
  A1, A2, …, An → B2
  . . . . .
  A1, A2, …, An → Bm
Combining rule: the same inference in the other direction
Q: Does it apply to the left side?
Note: doesn’t apply to the left side

[Figure: rows t1 and t2 over columns A1 ... Am and B1 ... Bm.]

Reflexive rule: trivial FDs
  A1, A2, …, An → Ai, with i in 1..n, is a trivial FD
FD A1A2…An → B1B2…Bk may be
  Trivial: Bs are a subset of As
  Nontrivial: >=1 of the Bs is not among the As
  Completely nontrivial: none of the Bs is among the As
Trivial elimination rule:
  Eliminate common attributes from Bs, to get an equivalent completely nontrivial FD

[Figure: rows t and t’ over columns A1 … An.]

Transitive rule
If
  A1, A2, …, An → B1, B2, …, Bm
and
  B1, B2, …, Bm → C1, C2, …, Cp
then
  A1, A2, …, An → C1, C2, …, Cp

[Figure: rows t and t’ over columns A1 … An, B1 … Bm, C1 ... Cp.]

Augmentation rule
If
  A1, A2, …, An → B
then
  A1, A2, …, An, C → B, C, for any C

[Figure: rows t and t’ over columns A1 … An, B1 … Bm, C1 ... Cp.]

Rules summary
1. A → B ⇒ AC → B (by definition)
2. Separation/Combination
3. Reflexive
4. Augmentation
5. Transitivity
Last 3 called Armstrong’s Axioms
  Complete: entire closure follows from these
  Sound: no other FDs follow from these

Inferring FDs example
Start from the following FDs:
  1. name → color
  2. category → department
  3. color, category → price

Infer the following FDs:

  Inferred FD                           Which rule did we apply?
  4. name, category → name              Reflexive rule
  5. name, category → color             Transitivity(4,1)
  6. name, category → category          Reflexive rule
  7. name, category → color, category   combine(5,6) or Aug(1)
  8. name, category → price             Transitivity(3,7)

Problem: infer ALL FDs
Given a set of FDs, infer all possible FDs

How to proceed?
Try all possible FDs, apply all rules
  E.g. R(A, B, C, D): how many FDs are possible?
Drop trivial FDs, drop augmented FDs
  Still way too many
Better: use the Closure Algorithm (next)
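To get a feel for "way too many", the candidate FDs over R(A, B, C, D) can simply be counted. The sketch below rests on one assumption that the slide does not state: both sides of an FD must be nonempty. `nonempty_subsets` and `count_candidate_fds` are names invented for this illustration.

```python
from itertools import combinations

def nonempty_subsets(attrs):
    """Yield every nonempty subset of attrs as a frozenset."""
    attrs = sorted(attrs)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            yield frozenset(combo)

def count_candidate_fds(attrs):
    """Count syntactically possible FDs X -> Y (assumption: X and Y nonempty).

    Returns (all candidates, candidates whose two sides are disjoint),
    i.e. the second number counts the completely nontrivial candidates.
    """
    subsets = list(nonempty_subsets(attrs))
    total = 0
    disjoint = 0
    for lhs in subsets:
        for rhs in subsets:
            total += 1
            if not (lhs & rhs):
                disjoint += 1
    return total, disjoint

total, disjoint = count_candidate_fds("ABCD")
print(total, disjoint)
```

With n attributes the first count is (2^n - 1)^2, so it grows exponentially, which is the motivation for computing closures instead.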

Closure of a set of Attributes
Given a set of attributes A1, …, An
The closure, {A1, …, An}+ = {B in Atts: A1, …, An → B}

Example:
  name → color
  category → department
  color, category → price

Closures:
  {name}+ = {name, color}
  {name, category}+ = {name, category, color, department, price}
  {color}+ = {color}

Closure Algorithm
Start with X = {A1, …, An}.
Repeat:
  if B1, …, Bn → C is an FD and B1, …, Bn are all in X
  then add C to X
until X didn’t change

Example:
  name → color
  category → department
  color, category → price

{name, category}+ = {name, category, color, department, price}
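The closure loop translates almost line for line into Python. This is a minimal sketch: the function name `closure` and the encoding of an FD as a `(lhs, rhs)` pair of attribute tuples are assumptions made here for illustration.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs; each side is an iterable of
    attribute names.
    """
    result = set(attrs)           # Start with X = {A1, ..., An}
    changed = True
    while changed:                # Repeat ... until X didn't change
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already in X, add the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

PRODUCT_FDS = [
    (("name",), ("color",)),
    (("category",), ("department",)),
    (("color", "category"), ("price",)),
]

print(sorted(closure({"name", "category"}, PRODUCT_FDS)))
# prints ['category', 'color', 'department', 'name', 'price']
```

Each pass over the FD list either adds at least one attribute or stops the loop, so the number of passes is bounded by the number of attributes.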

Example
In class:
R(A,B,C,D,E,F)
  A, B → C
  A, D → E
  B → D
  A, F → B

Compute {A, B}+: X = {A, B, }
Compute {A, F}+: X = {A, F, }

More definitions
Given FDs versus Derived FDs
  Given FDs: stated initially for the relation
  Derived FDs: those that are inferred using the rules or through closure
Basis: a set of FDs from which we can infer all other FDs for the relation
Minimal basis: a minimal set of FDs (can’t discard any of them)
  No proper subset of the minimal basis can derive the complete set of FDs

Problem: find all FDs
Given a database instance
Find all FDs satisfied by that instance
  Useful if we don’t get enough information from our users: need to reverse engineer a data instance
Q: How long does this take?
A: Some time for each subset of atts
Q: How many subsets?
  powerset ⇒ exponential time in worst-case
But can often be smarter…
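A brute-force version of this search can be sketched as follows, using the employee instance from the "Recognizing FDs" slide as data. `fd_holds` and `discover_fds` are hypothetical names. The sketch tests only single-attribute right-hand sides, which is enough because of the combining rule, and it makes no attempt at the "smarter" pruning mentioned above.

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """True if rows that agree on `lhs` always agree on `rhs` (one pass, via a dict)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores `value` the first time `key` is seen and returns
        # the stored value afterwards, so a mismatch means a violation.
        if seen.setdefault(key, value) != value:
            return False
    return True

def discover_fds(rows, attrs):
    """Test every X -> A with A not in X; exponential in len(attrs)."""
    found = []
    for size in range(1, len(attrs)):
        for lhs in combinations(attrs, size):
            for a in attrs:
                if a not in lhs and fd_holds(rows, lhs, (a,)):
                    found.append((lhs, a))
    return found

ATTRS = ("EmpID", "Name", "Phone", "Position")
EMPLOYEES = [
    dict(zip(ATTRS, values)) for values in [
        ("E0045", "Smith", "1234", "Clerk"),
        ("E1847", "John", "9876", "Salesrep"),
        ("E1111", "Smith", "9876", "Salesrep"),
        ("E9999", "Mary", "1234", "Lawyer"),
    ]
]

fds = discover_fds(EMPLOYEES, ATTRS)
for lhs, a in fds:
    if len(lhs) == 1:
        print(lhs[0], "->", a)
```

Everything this reports holds only for the given rows. Position → Phone shows up, for example, even though the slides argue it is presumably not an FD of the schema.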

Using Closure to Infer ALL FDs
Example:
  A, B → C
  A, D → B
  B → D

Step 1: Compute X+, for every X:
  A+ = A, B+ = BD, C+ = C, D+ = D
  AB+ = ABCD, AC+ = AC, AD+ = ABCD
  ABC+ = ABD+ = ACD+ = ABCD (no need to compute; why?)
  BCD+ = BCD, ABCD+ = ABCD

Step 2: Enumerate all FDs X → Y, s.t. Y ⊆ X+ and X ∩ Y = ∅:
  AB → CD, AD → BC, ABC → D, ABD → C, ACD → B
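The two steps can be sketched in Python for this example. Attributes are single letters, so an attribute set is written as a string such as "AB". `closure` and `implied_fds` are names chosen for this sketch, and for each X only the largest right-hand side (X+ minus X) is reported.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implied_fds(all_attrs, fds):
    """Map each nonempty X to the largest Y with Y ⊆ X+ and X ∩ Y = ∅ (if any)."""
    out = {}
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            rhs = closure(combo, fds) - set(combo)   # Step 1, then drop X itself
            if rhs:
                out["".join(combo)] = "".join(sorted(rhs))
    return out

FDS = [("AB", "C"), ("AD", "B"), ("B", "D")]
result = implied_fds("ABCD", FDS)
for lhs, rhs in result.items():
    print(lhs, "->", rhs)
```

Besides the five FDs listed on the slide, this also reports B → D, which is one of the given FDs, and BC → D, which is an augmentation of it.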

Example
R(A,B,C)
Each of three determines other two
Q: What are the FDs?
  Closure of singleton sets
  Closure of doubletons
Q: What are the keys?
Q: What are the minimal bases?

Computing Keys
Rephrasing of definition of key: X is a key if
  X+ = all attributes
  X is minimal
Note: there can be many minimal keys!
Example: R(A,B,C), AB → C, BC → A
  Minimal keys: AB and BC
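The rephrased definition suggests a direct, if exponential, search: try attribute sets in order of increasing size and keep those whose closure is everything and which contain no smaller key. The sketch below assumes that approach; `candidate_keys` is a hypothetical name, and the `closure` helper is included so the block runs on its own.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(all_attrs, fds):
    """Return all keys (minimal X with X+ = all attributes) as sorted tuples."""
    all_attrs = set(all_attrs)
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            x = set(combo)
            if any(k <= x for k in keys):
                continue          # contains a smaller key: superkey, not minimal
            if closure(x, fds) == all_attrs:
                keys.append(x)
    return [tuple(sorted(k)) for k in keys]

print(candidate_keys("ABC", [("AB", "C"), ("BC", "A")]))
# prints [('A', 'B'), ('B', 'C')]
```

Because smaller sets are tried first, any set that contains an already-found key can be skipped. That skip is what enforces the minimality condition.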

Examples of Keys
Product(name, price, category, color)
  name, category → price
  category → color
Keys are: {name, category}

Enrollment(student, address, course, room, time)
  student → address
  room, time → course
  student, course → room, time
Keys are: [in class]