c20.0046: database management systems lecture #8

Page 1: C20.0046: Database Management Systems Lecture #8

M.P. Johnson, DBMS, Stern/NYU, Spring 2005

C20.0046: Database Management Systems Lecture #8

Matthew P. Johnson

Stern School of Business, NYU

Spring, 2005

Page 2: C20.0046: Database Management Systems Lecture #8

Roadmap

Want to remove redundancy/anomalies
Convert to BCNF
  Find FDs (closure algorithm)
  Check if each FD A → B is OK: it is if A contains a key
  If not, decompose into R1(A,B), R2(A,rest)
Because A → B, this will be lossless
  Could check by joining R1 and R2
  Would get no rows not in original
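The closure step and the "does A contain a key" check in the roadmap can be sketched in a few lines of Python. This is only an illustration: the function names and the lowercase attribute names are assumptions, and the check looks only at the FDs it is given rather than at everything they imply.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, where fds is a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire the FD when its whole left side is already in the closure.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the nontrivial FDs in fds whose left side is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not set(schema) <= closure(lhs, fds)]


# Name/phone relation from the lecture (attribute names are illustrative).
SCHEMA = {"name", "ssn", "address", "phone"}
FDS = [({"ssn"}, {"name", "address"})]

print(closure({"ssn"}, FDS))           # ssn alone does not reach phone
print(closure({"ssn", "phone"}, FDS))  # reaches every attribute, so it is a key
print(bcnf_violations(SCHEMA, FDS))    # the ssn FD is reported
```

An FD passes the check when its left side's closure covers the whole schema, which is the same as saying the left side contains a key.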

Page 3: C20.0046: Database Management Systems Lecture #8

Normal Forms

First Normal Form = all attributes are atomic
  As opposed to set-valued
  Assumed all along

Second Normal Form (2NF)

Third Normal Form (3NF)

Boyce Codd Normal Form (BCNF)

Fourth Normal Form (4NF)

Fifth Normal Form (5NF)

Page 4: C20.0046: Database Management Systems Lecture #8

Decomposition example

Original relation:

Name | SSN | Mailing-address | Phone
Michael | 123 | NY | 212-111-1111
Michael | 123 | NY | 917-111-1111
Hilary | 456 | DC | 202-222-2222
Hilary | 456 | DC | 914-222-2222
Bill | 789 | Chappaqua | 914-222-2222
Bill | 789 | Chappaqua | 212-333-3333

Break the relation into two:

Name | SSN | Mailing-address
Michael | 123 | NY
Hilary | 456 | DC
Bill | 789 | Chappaqua

SSN | Phone
123 | 212-111-1111
123 | 917-111-1111
456 | 202-222-2222
456 | 914-222-2222
789 | 914-222-2222
789 | 212-333-3333

The anomalies are gone
  No more redundant data
  Easy for Bill to move
  Okay for Bill to lose all phones

Page 5: C20.0046: Database Management Systems Lecture #8

Boyce-Codd Normal Form

Name/phone example is not BCNF:
  {ssn, phone} is key
  FD: ssn → name, mailing-address holds
  Violates BCNF: ssn is not a superkey
Its decomposition is BCNF
  Only superkeys → anything else

Name | SSN | Mailing-address | Phone
Michael | 123 | NY | 212-111-1111
Michael | 123 | NY | 917-111-1111

Name | SSN | Mailing-address
Michael | 123 | NY

SSN | PhoneNumber
123 | 212-111-1111
123 | 917-111-1111

Page 6: C20.0046: Database Management Systems Lecture #8

BCNF motivation

Two big ideas:
  Only a key field can determine other fields
  Key values are unique ⇒ no FD-caused redundancy

Slogan: “Every FD must contain the key, the whole key and nothing but the key.”

More accurate: “Every FD must contain (on the left) a key, a whole key, and maybe other fields.”

Page 7: C20.0046: Database Management Systems Lecture #8

Design examples

Consider situation:
  Entities: Parts, Suppliers, Departments
  Relationship: Contracts(P,S,D,id,quant)
Draw E/R
New rule: no department can buy multiple parts from the same supplier (why?)
Translate to FD
Normalize

Page 8: C20.0046: Database Management Systems Lecture #8

Design examples

Consider situation:
  Entities: Emp(ssn,name,lot), Dept(id,name,budg)
  Relationship: Works(E,D,since)
Draw E/R
New info: in each dept, everyone parks in same lot
Translate to FD
Normalize

Page 9: C20.0046: Database Management Systems Lecture #8

BCNF Decomposition

Larger example: multiple decompositions
{Title, Year, Studio, President, Pres-Address}
FDs:
  Title, Year → Studio
  Studio → President
  President → Pres-Address
  ⇒ Studio → President, Pres-Address (why?)
No many-many this time
Problem cause: transitive FDs:
  Title, Year → Studio → President

Page 10: C20.0046: Database Management Systems Lecture #8

BCNF Decomposition

Illegal: As → Bs, where As don’t include key
Decompose: Studio → President, Pres-Address
  As = {studio}
  Bs = {president, pres-address}
  Cs = {title, year}
Result:
  1. Studios(studio, president, pres-address)
  2. Movies(studio, title, year)
Is (2) in BCNF?
Is (1) in BCNF?
  Key: Studio
  FD: President → Pres-Address
  Q: Does president → studio?
  If so, president is a key
  But if not, it violates BCNF

Page 11: C20.0046: Database Management Systems Lecture #8

BCNF Decomposition

Studios(studio, president, pres-address)
Illegal: As → Bs, where As don’t include key
Decompose: President → Pres-Address
  As = {president}
  Bs = {pres-address}
  Cs = {studio}
{Studio, President, Pres-Address} becomes
  {President, Pres-Address}
  {Studio, President}
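The repeated "find a bad As → Bs, split into R1(As,Bs) and R2(As,rest)" step can be written as a short recursive sketch. The helper names below are assumptions, not part of the lecture, and the search tries every subset of attributes as As, so it is only practical for slide-sized schemas. It may pick violations in a different order than the slides do; for this example it ends with the same three relations.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(schema, fds):
    """Return (As, Bs) violating BCNF inside schema, or None."""
    attrs = sorted(schema)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            lhs = frozenset(combo)
            reach = closure(lhs, fds) & schema  # what As determines within schema
            # Violation: As determines something new but is not a superkey.
            if reach != lhs and reach != schema:
                return lhs, reach - lhs
    return None


def bcnf_decompose(schema, fds):
    """Split schema until no relation has a BCNF violation."""
    schema = frozenset(schema)
    violation = find_violation(schema, fds)
    if violation is None:
        return [schema]
    lhs, rhs = violation
    # R1(As, Bs) and R2(As, rest), as on the slides.
    return bcnf_decompose(lhs | rhs, fds) + bcnf_decompose(schema - rhs, fds)


FDS = [
    (frozenset({"title", "year"}), frozenset({"studio"})),
    (frozenset({"studio"}), frozenset({"president"})),
    (frozenset({"president"}), frozenset({"pres_address"})),
]
MOVIES = {"title", "year", "studio", "president", "pres_address"}

for relation in bcnf_decompose(MOVIES, FDS):
    print(sorted(relation))
```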

Page 12: C20.0046: Database Management Systems Lecture #8

Decomposition algorithm example

R(N,O,R,P)
F = {N → O, O → R, R → N}
Key: N,P
Violations of BCNF: N → O, O → R, N → OR
  Which kinds of violations are these?
Pick N → OR (on board)
  Can we rejoin? (on board)
What happens if we pick N → O instead?
  Can we rejoin? (on board)

Name Office Residence Phone

George Pres. WH 202-…

George Pres. WH 486-…

Dick VP NO 202-…

Dick VP NO 307-…

Page 13: C20.0046: Database Management Systems Lecture #8

BCNF and two-att relations

Must a two-attribute relation be in BCNF?
  Case 1: there are no non-trivial FDs
  Case 2: A → B but not B → A
  Case 3: B → A but not A → B
  Case 4: Both A → B and B → A
Note that relations may have multiple keys
  BCNF requires a key on the left, not all keys

Page 14: C20.0046: Database Management Systems Lecture #8

Lossless BCNF decomposition

Consider simple relation: R(A,B,C)
  Only FD: A → B (assume C ↛ A)
  Key: A,C
  Also goes through if C → A
BCNF violation: no key on the left
Thus: Decomposition to BCNF: create R1(A,B) and R2(A,C)
Could this be lossy?
  We will join R1 and R2 on A to find out

Page 15: C20.0046: Database Management Systems Lecture #8

Lossless BCNF decomposition

Suppose R contains the rows: (b,a,c) and (b’,a,c’)
In projection onto (B,A):
  (b,a,c) ↦ (b,a), (b’,a,c’) ↦ (b’,a)
In projection onto (A,C):
  (b,a,c) ↦ (a,c), (b’,a,c’) ↦ (a,c’)
In joining, (b’,a) and (a,c) become (b’,a,c), and (b,a) and (a,c’) become (b,a,c’)
Q: Is/must/can this be correct?
A: Yes! A → B, so b = b’
  So this was lossless
We assumed C ↛ A, but argument also goes through when C → A
Moral: BCNF decomp alg really is lossless
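The argument can be tried on a tiny instance. The sketch below uses made-up values and stores rows in (A, B, C) order rather than the (B, A, C) order of the slide. It projects R onto (A,B) and (A,C), rejoins on A, and compares the result with the original, once when A → B holds and once when it does not.

```python
def project(rows):
    """Split R(A, B, C) rows into R1(A, B) and R2(A, C)."""
    return {(a, b) for a, b, c in rows}, {(a, c) for a, b, c in rows}


def join_on_a(r1, r2):
    """Natural join of R1(A, B) and R2(A, C) on A."""
    return {(a, b, c) for a, b in r1 for a2, c in r2 if a == a2}


# A → B holds: both rows with A = "a" carry the same B value.
good = {("a", "b", "c"), ("a", "b", "c2")}
# A → B fails: same A, different B values.
bad = {("a", "b", "c"), ("a", "b2", "c2")}

print(join_on_a(*project(good)) == good)  # the join reproduces the original
print(join_on_a(*project(bad)) - bad)     # extra rows that were never in R
```

When the FD fails, the join mixes B values from one row with C values from the other, which is exactly the case the proof rules out.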

Page 16: C20.0046: Database Management Systems Lecture #8

BCNF summary

BCNF decomposition is lossless
  Can reproduce original by joining
Saw: Every 2-attribute relation is in BCNF
Final set of decomposed relations might be different
  Depends on order of bad FDs chosen
Saw: But all results will be in BCNF

Page 17: C20.0046: Database Management Systems Lecture #8

A problem with BCNF

Relation: R(Title, Theater, Neighborhood)
FDs:
  Title, N’hood → Theater
    Assume a movie shouldn’t play twice in same neighborhood
  Theater → N’hood
Keys:
  {Title, N’hood}
  {Theater, Title}

Title | Theater | N’hood
Aviator | Angelica | Village
Life Aquatic | Angelica | Village

Page 18: C20.0046: Database Management Systems Lecture #8

A problem with BCNF

BCNF violation: Theater → N’hood
Decompose:
  {Theater, N’hood}
  {Theater, Title}
Resulting relations:

R1:
Theater | N’hood
Angelica | Village

R2:
Theater | Title
Angelica | Aviator
Angelica | Life Aquatic

Page 19: C20.0046: Database Management Systems Lecture #8

Problem - continued

Suppose we add new rows to R1 and R2:

R1:
Theater | N’hood
Angelica | Village
Film Forum | Village

R2:
Theater | Title
Angelica | Life Aquatic
Angelica | Aviator
Film Forum | Life Aquatic

Their join (R’):
Theater | Title | N’hood
Angelica | Life Aquatic | Village
Angelica | Aviator | Village
Film Forum | Life Aquatic | Village

R1 and R2 could not enforce the FD Title, N’hood → Theater
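A small check makes the lost dependency concrete. The sketch below is illustrative only: the `satisfies_fd` helper and the lowercase column names are assumptions. It confirms that the FD local to R1 holds, while the join of R1 and R2 breaks Title, N’hood → Theater.

```python
def satisfies_fd(rows, lhs, rhs):
    """True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True


r1 = [{"theater": "Angelica", "nhood": "Village"},
      {"theater": "Film Forum", "nhood": "Village"}]
r2 = [{"theater": "Angelica", "title": "Life Aquatic"},
      {"theater": "Angelica", "title": "Aviator"},
      {"theater": "Film Forum", "title": "Life Aquatic"}]

# Join R1 and R2 on theater.
joined = [{**a, **b} for a in r1 for b in r2 if a["theater"] == b["theater"]]

print(satisfies_fd(r1, ["theater"], ["nhood"]))               # local FD holds
print(satisfies_fd(joined, ["title", "nhood"], ["theater"]))  # lost FD is violated
```

Neither R1 nor R2 contains all three attributes, so the violation only becomes visible after the join.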

Page 20: C20.0046: Database Management Systems Lecture #8

Third normal form: motivation

There are some situations in which
  BCNF is not dependency-preserving, and
  Efficient checking for FD violation on updates is important
In these cases BCNF is too severe a req.
Solution: define a weaker normal form, called Third Normal Form
  in which FDs can be checked on individual relations without performing a join (no inter-relational FDs)
  to which relations can be converted, preserving both data and FDs

Page 21: C20.0046: Database Management Systems Lecture #8

Third Normal Form

BCNF decomposition is not dependency-preserving!
We now define the (weaker) Third Normal Form
  Turns out: this example was already in 3NF

A relation R is in 3rd normal form if:
  For every nontrivial dependency A1, A2, ..., An → B for R, {A1, A2, ..., An} is a superkey for R, or B is part of a key, i.e., B is prime

Tradeoff:
  BCNF = no FD anomalies, but may lose some FDs
  3NF = keeps all FDs, but may have some anomalies

Page 22: C20.0046: Database Management Systems Lecture #8

BCNF: vices and virtues

Be clear on the problem just described v. the arg. that BCNF decomp is data-lossless
BCNF decomp does not lose data
  Resulting relations can be rejoined to obtain the original
But: it can lose dependencies
  After decomp, now legal to add rows whose corresponding rows would be illegal in (rejoined) original

Page 23: C20.0046: Database Management Systems Lecture #8

Recap: goals of normalization

When we decompose a relation R with FDs F into R1..Rn we want:

1. Lossless-join decomposition: no data lost

2. No/little redundancy: the relations Ri should be in either BCNF or at least 3NF

3. Dependency preservation: if Fi is the set of dependencies in F+ that include only attributes in Ri, then F is the “sum” of the FDs of the new relations:
  (F1 ∪ F2 ∪ … ∪ Fn)+ = F+
  Otherwise checking updates for violation of FDs may require computing joins, which is expensive

Page 24: C20.0046: Database Management Systems Lecture #8

Dependency preservation

Saw that last req. didn’t hold in movie-theater example
Did it hold in R(N,O,R,P) example?
  (on board)

Page 25: C20.0046: Database Management Systems Lecture #8

Testing for 3NF

For each dependency X → Y, use attribute closure to check if X is a superkey
If X is not a superkey, verify that each attribute in Y is prime
  This test is rather more expensive, since it involves finding candidate keys
  Testing for 3NF is NP-complete
  Interestingly, decomposition into 3NF can be done in polynomial time
  Testing for 3NF is harder than decomposing into 3NF!
Optimization: need to check only FDs in F, need not check all FDs in F+ (why?)
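The test above can be sketched directly, assuming a brute-force search for candidate keys, which is where the exponential cost shows up. The function names are illustrative. The relation and FDs used are R = (J, K, L) with JK → L and L → K, and only the FDs in F are checked, as the optimization suggests.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure covers the schema."""
    keys = []
    attrs = sorted(schema)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            candidate = frozenset(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) >= schema:
                keys.append(candidate)
    return keys


def is_bcnf(schema, fds):
    """Every nontrivial FD in fds has a superkey on the left."""
    return all(rhs <= lhs or closure(lhs, fds) >= schema for lhs, rhs in fds)


def is_3nf(schema, fds):
    """Like is_bcnf, but also accepts FDs whose new right-side attributes are prime."""
    prime = set().union(*candidate_keys(schema, fds))
    return all(closure(lhs, fds) >= schema or (rhs - lhs) <= prime
               for lhs, rhs in fds)


R = {"J", "K", "L"}
F = [({"J", "K"}, {"L"}), ({"L"}, {"K"})]

print([sorted(key) for key in candidate_keys(R, F)])
print(is_3nf(R, F), is_bcnf(R, F))
```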

Page 26: C20.0046: Database Management Systems Lecture #8

3NF Example

R = (J, K, L)
F = (JK → L, L → K)
Two candidate keys: JK and JL
R is in 3NF
  JK → L: JK is a superkey
  L → K: K is prime
BCNF decomposition yields R1 = (L,K), R2 = (L,J)
  Testing for JK → L requires a join
There is some redundancy in R

Page 27: C20.0046: Database Management Systems Lecture #8

BCNF and 3NF Comparison

Example of problems due to redundancy in 3NF
  R = (J, K, L)
  F = (JK → L, L → K)
A schema that is in 3NF but not BCNF has the problems of:
  redundancy (e.g., the relationship between l1 and k1)
  need to use null values (if allowed!), e.g. to represent the relationship between l2 and k2 when there is no corresponding value for attribute J

J K L

j1 k1 l1

j2 k1 l1

j3 k1 l1

NULL k2 l2

Page 28: C20.0046: Database Management Systems Lecture #8

Comparison of BCNF and 3NF

It is always possible to decompose a relation into relations in 3NF such that:
  the decomposition is lossless
  the dependencies are preserved
It is always possible to decompose a relation into relations in BCNF such that:
  the decomposition is lossless
  but it may not be possible to preserve dependencies
  But may eliminate more redundancy

Page 29: C20.0046: Database Management Systems Lecture #8

The Normal Forms (so far)

1NF: every attribute has an atomic value
2NF: no longer used
3NF: for each FD X → Y either
  it is trivial, or
  X is a superkey, or
  Y is a part of some key
BCNF: 3NF with the third option disallowed

Page 30: C20.0046: Database Management Systems Lecture #8

Distinguishing examples

1NF but not 2NF: R(Name, SSN, Mailing-address, Phone)
  Key: SSN, Phone
  Partial: ssn → name, address
3NF but not BCNF: R(Title, Theater, N’hood)
  Title, N’hood → Theater
  Prime-on-right: Theater → N’hood

Page 31: C20.0046: Database Management Systems Lecture #8

Design Goals

Goal for a relational database design is:
  No redundancy
  Lossless Join
  Dependency Preservation
If we cannot achieve this, we accept one of
  dependency loss
  use of more expensive inter-relational methods to preserve dependencies
  data redundancy due to use of 3NF
Interesting: SQL does not provide a direct way of specifying FDs other than superkeys
  can specify FDs using assertions, but they are expensive to test

Page 32: C20.0046: Database Management Systems Lecture #8

3NF

3NF means we may have anomalies
Example: TEACH(student, teacher, subject)
  student, subject → teacher (students not allowed in the same subject with two teachers)
  teacher → subject (each teacher teaches one subject)
  Subject is prime, so this is 3NF
But we have anomalies:
  Insertion: cannot insert a teacher until we have a student taking his subject
If we convert to BCNF, we lose student, subject → teacher

Page 33: C20.0046: Database Management Systems Lecture #8

BCNF and over-normalization

What is the problem?
Schema overload: trying to capture two meanings:
  1) subject X can be taught by teacher Y
  2) student Z takes subject W from teacher V
What to do?
  3NF has anomalies, normalizing to BCNF loses FDs
One soln: keep the 3NF TEACH and another (BCNF) relation SUBJECT-TAUGHT(teacher, subject)
  Still (more!) redundancy, but no more insert and delete anomalies

Page 34: C20.0046: Database Management Systems Lecture #8

Normalization Review

Q: What’s required for BCNF?

Q: How do we fix a non-BCNF relation?

Q: What’s the loophole for 3NF?

Page 35: C20.0046: Database Management Systems Lecture #8

Normalization Review

Q: If As → Bs violates BCNF, what do we do?

Q: In this case, could the decomposition be lossy?

Q: How do we combine two relations?

Q: Can BCNF decomp. lose FDs?

Q: Can 3NF decomp. lose FDs?

Page 36: C20.0046: Database Management Systems Lecture #8

New topic: MVDs

Consider this relation
  People ~ their jobs ~ their residences
  Person-address/city: many-many
  Person-job: many-many
  Address/city-job: independent

Name | SSN | Streets | Citys | Jobs
Michael | 123 | 111 East 60th Street | New York | Mayor
Michael | 123 | 222 Brompton Road | London | Mayor
Michael | 123 | 111 East 60th Street | New York | CEO
Michael | 123 | 222 Brompton Road | London | CEO
Hilary | 456 | 333 Some Street | Chappaqua | Senator
Hilary | 789 | 444 Embassy Row | Washington | Senator
Hilary | 456 | 333 Some Street | Chappaqua | First Lady
Hilary | 456 | 444 Embassy Row | Washington | First Lady
Hilary | 789 | 333 Some Street | Chappaqua | Lawyer
Hilary | 456 | 444 Embassy Row | Washington | Lawyer

Page 37: C20.0046: Database Management Systems Lecture #8

Redundancy in BCNF

Lots of redundancy!
Key? All fields
  None determined by others!
Non-trivial FDs? None!
In BCNF? Yes!

Name Streets Citys Jobs

Michael 111 East 60th Street New York Mayor

Michael 222 Brompton Road London Mayor

Michael 111 East 60th Street New York CEO

Michael 222 Brompton Road London CEO

Hilary 333 Some Street Chappaqua Senator

Hilary 444 Embassy Row Washington Senator

Hilary 333 Some Street Chappaqua First Lady

Hilary 444 Embassy Row Washington First Lady

Hilary 333 Some Street Chappaqua Lawyer

Hilary 444 Embassy Row Washington Lawyer

Now what?
New concept, leading to another normal form: Multivalued dependencies

Page 38: C20.0046: Database Management Systems Lecture #8

MVD definition

As →→ Bs if, when As are held fixed, values in Bs are independent of values in rest

More precisely: if t1 and t3 agree on As, we then can find t2 such that
  t1, t2, t3 agree on As
  t2, t1 agree on Bs
  t2, t3 agree on Cs

[Diagram: tuples t1, t2, t3, each split into columns As | Bs | Cs, with matching values linked]

Page 39: C20.0046: Database Management Systems Lecture #8

MVD example

Claim: name →→ streets, cities
If true: can pick arbitrary t1, t3 and find a t2
We pick: first and last of Hilary’s tuples:

   | Name | Streets | Citys | Jobs
t1 | Hilary | 333 Some Street | Chappaqua | Senator
t3 | Hilary | 444 Embassy Row | Washington | Lawyer

Now: if true, can find another Hilary row with street/address of t1 and job of t3

t2 | Hilary | 333 Some Street | Chappaqua | Lawyer

Page 40: C20.0046: Database Management Systems Lecture #8

MVD example

Now: if true, can find another Hilary row with street/address of t1 and job of t3

Sure enough:

t2: Hilary 333 Some Street Chappaqua Lawyer

Name Streets Citys Jobs

Michael 111 East 60th Street New York Mayor

Michael 222 Brompton Road London Mayor

Michael 111 East 60th Street New York CEO

Michael 222 Brompton Road London CEO

Hilary 333 Some Street Chappaqua Senator

Hilary 444 Embassy Row Washington Senator

Hilary 333 Some Street Chappaqua First Lady

Hilary 444 Embassy Row Washington First Lady

Hilary 333 Some Street Chappaqua Lawyer (t2)

Hilary 444 Embassy Row Washington Lawyer

Page 41: C20.0046: Database Management Systems Lecture #8

MVD rules

No splitting rule:
  In the example, name →→ streets, cities
  Do we have name →→ streets?
  No: 444 Embassy Row doesn’t go with Chappaqua
NB: City doesn’t determine street: could have >1 house
  But city, street aren’t independent

   | Name | Streets | Citys | Jobs
t1 | Hilary | 333 Some Street | Chappaqua | Senator
t3 | Hilary | 444 Embassy Row | Washington | Lawyer
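The t1/t2/t3 definition translates almost word for word into a check over a table. The sketch below is only an illustration: `satisfies_mvd`, `people_rows`, and the lowercase column names are assumptions. It rebuilds the ten-row People table from the lecture, then tests both the claimed MVD and the split version.

```python
def satisfies_mvd(rows, lhs, rhs):
    """Check As →→ Bs using the t1/t2/t3 definition; rows are dicts."""
    table = {frozenset(row.items()) for row in rows}
    for t1 in rows:
        for t3 in rows:
            if all(t1[a] == t3[a] for a in lhs):
                # Required t2: Bs taken from t1, everything else from t3.
                t2 = {**t3, **{b: t1[b] for b in rhs}}
                if frozenset(t2.items()) not in table:
                    return False
    return True


def people_rows():
    """Every residence of a person paired with every job of that person."""
    homes = {"Michael": [("111 East 60th Street", "New York"),
                         ("222 Brompton Road", "London")],
             "Hilary": [("333 Some Street", "Chappaqua"),
                        ("444 Embassy Row", "Washington")]}
    jobs = {"Michael": ["Mayor", "CEO"],
            "Hilary": ["Senator", "First Lady", "Lawyer"]}
    return [{"name": n, "street": s, "city": c, "job": j}
            for n in homes for s, c in homes[n] for j in jobs[n]]


PEOPLE = people_rows()
print(len(PEOPLE))
print(satisfies_mvd(PEOPLE, ["name"], ["street", "city"]))  # the claimed MVD
print(satisfies_mvd(PEOPLE, ["name"], ["street"]))          # splitting fails
```

The second check fails because swapping only the street produces a row such as 333 Some Street in Washington, which is not in the table.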

Page 42: C20.0046: Database Management Systems Lecture #8

MVD rules

Trivial dependencies:
  As →→ Bs iff As →→ Bs with any Ai added to (or removed from) the right side
Transitive rule:
  As →→ Bs, Bs →→ Cs ⇒ As →→ Cs
Complementation rule:
  As →→ Bs ⇒ As →→ rest
  Intuition: if each value in Bs is assoc’ed w/each value in rest, then each value of rest is assoc’ed w/each value in Bs

Name Streets Citys Jobs

Michael 111 East 60th Street New York Mayor

Michael 222 Brompton Road London Mayor

Michael 111 East 60th Street New York CEO

Michael 222 Brompton Road London CEO

Page 43: C20.0046: Database Management Systems Lecture #8

MVDs and FDs

MVD is a generalization of FD
Every FD is an MVD
Pf: Suppose As → Bs
  Pick t1, t3 that agree on As.
  Must find a t2. Let t2 be t3.
  Then
    1) t2 agrees on As with both
    2) t2 agrees on Bs with t1 (why?)
    3) t2 agrees on rest with t3 (why?)
  QED

Page 44: C20.0046: Database Management Systems Lecture #8

Fourth Normal Form

4NF: like BCNF, but with MVDs not FDs
An MVD As →→ Bs is nontrivial if
  No Bs are As
  Some attributes left over (why?)
4NF: for every nontrivial MVD As →→ Bs, As is a superkey
In example name →→ streets, cities, but name isn’t a superkey

Name Streets Citys Jobs

Hilary 333 Some Street Chappaqua Senator

Hilary 444 Embassy Row Washington Lawyer

Page 45: C20.0046: Database Management Systems Lecture #8

Decomposition to 4NF

Again, analogous to BCNF
If we can find As →→ Bs for R where As isn’t a superkey, replace R with R1(As,Bs) and R2(As,rest)
Running example: name →→ streets, cities
People(name,streets,cities,jobs) becomes
  Residences(name,street,city) and
  Employment(name,job)
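As a quick sanity check of this split, the sketch below projects a People table onto Residences and Employment and joins them back on name. Rows are plain tuples in (name, street, city, job) order; that ordering and the variable names are illustrative choices, not something from the lecture.

```python
homes = {"Michael": [("111 East 60th Street", "New York"),
                     ("222 Brompton Road", "London")],
         "Hilary": [("333 Some Street", "Chappaqua"),
                    ("444 Embassy Row", "Washington")]}
jobs = {"Michael": ["Mayor", "CEO"],
        "Hilary": ["Senator", "First Lady", "Lawyer"]}

# People(name, street, city, job): every residence with every job.
people = {(n, s, c, j) for n in homes for s, c in homes[n] for j in jobs[n]}

# Project onto the two 4NF relations.
residences = {(n, s, c) for n, s, c, j in people}
employment = {(n, j) for n, s, c, j in people}

# Natural join on name.
rejoined = {(n, s, c, j)
            for n, s, c in residences
            for n2, j in employment if n == n2}

print(len(people), len(residences), len(employment))
print(rejoined == people)
```

The two small relations hold nine rows in place of ten, and the saving grows as people gain more residences and jobs, since the original stores their full cross product.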

Page 46: C20.0046: Database Management Systems Lecture #8

4NF: another construal

In nontrivial As →→ Bs, As must be superkey
After df of 4NF, text says: “That is, … every nontrivial MVD is really a FD with a superkey on the left” (p123).
We know: FDs are* MVDs but not vice versa
So: Why does this follow? Is it true?
Yes. As is a superkey ⇒ As → everything ⇒ As → Bs ⇒ the MVD is an FD
Two kinds of MVDs: FDs and “true” MVDs
  4NF eliminates exactly the true ones

* The typo swapping these was fixed.

Page 47: C20.0046: Database Management Systems Lecture #8

Summary of normal forms

Guaranteed to | 3NF | BCNF | 4NF
Eliminate FD redundancy | Mostly | Yes | Yes
Eliminate MVD redundancy | No | No | Yes
Preserve FDs | Yes | No | No
Preserve MVDs | No | No | No

Page 48: C20.0046: Database Management Systems Lecture #8

Next topic: relational algebra

Set operations: union, intersection, difference
Projection, selection
Cartesian Product
Joins: natural joins, theta joins
Combining operations to form queries
Dependent and independent operations

Page 49: C20.0046: Database Management Systems Lecture #8

What is relational algebra?

An algebra for relations
“High-school” algebra is an algebra for numbers
Formalism for constructing expressions
  Operations
  Operands: Variables, Constants, expressions
Expressions:
  Vars & constants
  Operators applied to expressions

Algebra | Vars/consts | Operators
High-school | Numbers | + * - / etc.
Relational | Relations (= sets of tuples) | union, intersection, join, etc.

Page 50: C20.0046: Database Management Systems Lecture #8

Why do we care about relational algebra?

Why construct expressions on relations?
The exprs are the form that questions about the data take
  The relations these exprs cash out to are the answers to our questions
First proof of RDBMS/RA concept: System R (1979)
Modern implementation of RA: SQL

Page 51: C20.0046: Database Management Systems Lecture #8

Relation operators

Five basic operators:
  Union: ∪
  Difference: −
  Selection: σ
  Projection: π
  Cartesian Product: ×
Derived/auxiliary operators:
  Intersection (∩), complement
  Joins (natural, equijoin, theta join, semijoin)
  Renaming: ρ

Page 52: C20.0046: Database Management Systems Lecture #8

Operators

Relations are sets ⇒ have set-theoretic ops
  Venn diagrams
Union: R1 ∪ R2
  Example: ActiveEmployees ∪ RetiredEmployees
Difference: R1 – R2
  Example: AllEmployees – RetiredEmployees = ActiveEmployees

Page 53: C20.0046: Database Management Systems Lecture #8

Set operations - example

R:
Name | Address | Gender | Birthdate
Fisher | 123 Maple | F | 9/9/99
Hamill | 456 Oak | M | 8/8/88

S:
Name | Address | Gender | Birthdate
Fisher | 123 Maple | F | 9/9/99
Ford | 345 Palm | M | 7/7/77

R ∪ S:
Name | Address | Gender | Birthdate
Fisher | 123 Maple | F | 9/9/99
Hamill | 456 Oak | M | 8/8/88
Ford | 345 Palm | M | 7/7/77

Page 54: C20.0046: Database Management Systems Lecture #8

Set operations - example

R:
Name | Address | Gender | Birthdate
Fisher | 123 Maple | F | 9/9/99
Hamill | 456 Oak | M | 8/8/88

S:
Name | Address | Gender | Birthdate
Fisher | 123 Maple | F | 9/9/99
Ford | 345 Palm | M | 7/7/77

R - S:
Name | Address | Gender | Birthdate
Hamill | 456 Oak | M | 8/8/88

Page 55: C20.0046: Database Management Systems Lecture #8

Operators

Intersection: R1 ∩ R2
  Example: UnionizedEmployees ∩ RetiredEmployees
Intersection can be derived from ∪ and –
  R1 ∩ R2 = R1 – (R1 – R2)
  R1 ∩ R2 = -(-R1 ∪ -R2) (allowed?)
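Python's built-in sets make it easy to try these identities on the R and S tables used in the set-operation examples. This is only a sketch with tuples standing in for rows; the variable names are illustrative.

```python
# R and S from the set-operation examples, one tuple per row.
R = {("Fisher", "123 Maple", "F", "9/9/99"),
     ("Hamill", "456 Oak", "M", "8/8/88")}
S = {("Fisher", "123 Maple", "F", "9/9/99"),
     ("Ford", "345 Palm", "M", "7/7/77")}

union = R | S
difference = R - S
# Intersection written using difference only: R1 ∩ R2 = R1 – (R1 – R2).
derived_intersection = R - (R - S)

print(len(union))
print(difference)
print(derived_intersection == R & S)
```

The complement-based form is harder to try directly, because the complement of a relation needs a universe of all possible rows to be taken against.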

Page 56: C20.0046: Database Management Systems Lecture #8

Set operations - example

R:
Name | Address | Gender | Birthdate
Fisher | 123 Maple | F | 9/9/99
Hamill | 456 Oak | M | 8/8/88

S:
Name | Address | Gender | Birthdate
Fisher | 123 Maple | F | 9/9/99
Ford | 345 Palm | M | 7/7/77

R ∩ S:
Name | Address | Gender | Birthdate
Fisher | 123 Maple | F | 9/9/99

Page 57: C20.0046: Database Management Systems Lecture #8

Operators

Selection
  Selects all tuples satisfying a condition
  Notation: σ_c(R)
Examples
  σ_{salary > 100000}(Employee)
  σ_{name = “Smith”}(Employee)
The condition c can have
  comparison ops: =, <, ≤, >, ≥, <>
  boolean ops: and, or

Page 58: C20.0046: Database Management Systems Lecture #8

Selection example

Select the movies at Angelica: σ_{Theater=“Angelica”}(Showings)

Showings:
Theater | Title | N’hood
Angelica | City of God | Village
Angelica | Fog of War | Village
Film Forum | City of God | Village

Result:
Theater | Title | N’hood
Angelica | City of God | Village
Angelica | Fog of War | Village

Page 59: C20.0046: Database Management Systems Lecture #8

Operators

Projection: op we used for decomposition
  Eliminates columns, then removes duplicates
  Notation: π_{A1,…,An}(R)

Page 60: C20.0046: Database Management Systems Lecture #8

Operators

Cartesian Product
  Cross product
  Each tuple in R1 combines w/each tuple in R2
  Notation: R1 × R2
  If R1, R2 fields overlap, include both and disambiguate: R1.A, R2.A
Fairly rare in practice; used to express joins
Q: Where does the name come from?
Q: If R1 has n1 rows and R2 has n2, how large is R1 × R2?

Page 61: C20.0046: Database Management Systems Lecture #8

Cartesian product example

Hillary-addresses:
Street | City
333 Some Street | Chappaqua
444 Embassy Row | Washington

Hillary-jobs:
Job
Senator
First Lady
Lawyer

Hillary-addresses × Hillary-jobs:
Street | City | Job
333 Some Street | Chappaqua | Senator
444 Embassy Row | Washington | Senator
333 Some Street | Chappaqua | First Lady
444 Embassy Row | Washington | First Lady
333 Some Street | Chappaqua | Lawyer
444 Embassy Row | Washington | Lawyer

Page 62: C20.0046: Database Management Systems Lecture #8

Operators

Natural join: our join up to now
  But always merging shared attributes
  Notation: R1 ⋈ R2
Meaning:
  R1 ⋈ R2 = π_{every att once}(σ_{shared atts =}(R1 × R2))
I.e., first compute the cross product R1 × R2
Next, select the rows in which shared fields agree
Finally, project onto the union of R1 and R2’s fields (remove duplicates)

Page 63: C20.0046: Database Management Systems Lecture #8

Natural join example

Addresses:
Name | Street | City
Hilary | 333 Some Street | Chappaqua
Hilary | 444 Embassy Row | Washington

Jobs:
Name | Job
Hilary | Senator
Hilary | First Lady
Hilary | Lawyer

Addresses ⋈ Jobs:
Name | Street | City | Job
Hilary | 333 Some Street | Chappaqua | Senator
Hilary | 444 Embassy Row | Washington | Senator
Hilary | 333 Some Street | Chappaqua | First Lady
Hilary | 444 Embassy Row | Washington | First Lady
Hilary | 333 Some Street | Chappaqua | Lawyer
Hilary | 444 Embassy Row | Washington | Lawyer
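The cross product, select, project recipe for the natural join can be sketched with rows as Python dicts. The function name and lowercase keys are assumptions made for illustration. Merging two dicts keeps each shared attribute once, which plays the role of the final projection.

```python
def natural_join(r1, r2):
    """Natural join of two lists of dict rows."""
    result = []
    for t1 in r1:
        for t2 in r2:                            # cross product
            shared = t1.keys() & t2.keys()
            if all(t1[a] == t2[a] for a in shared):  # shared fields agree
                merged = {**t1, **t2}            # every attribute once
                if merged not in result:         # remove duplicates
                    result.append(merged)
    return result


addresses = [
    {"name": "Hilary", "street": "333 Some Street", "city": "Chappaqua"},
    {"name": "Hilary", "street": "444 Embassy Row", "city": "Washington"},
]
jobs = [
    {"name": "Hilary", "job": "Senator"},
    {"name": "Hilary", "job": "First Lady"},
    {"name": "Hilary", "job": "Lawyer"},
]

for row in natural_join(addresses, jobs):
    print(row)
```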

Page 64: C20.0046: Database Management Systems Lecture #8

Natural Join

R:
A | B
X | Y
X | Z
Y | Z
Z | V

S:
B | C
Z | U
V | W
Z | V

R ⋈ S = ?

Unpaired tuples called dangling

Page 65: C20.0046: Database Management Systems Lecture #8

Natural Join

Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S?

Given R(A, B, C), S(D, E), what is R ⋈ S?

Given R(A, B), S(A, B), what is R ⋈ S?

Page 66: C20.0046: Database Management Systems Lecture #8

Theta Join

Like natural join, but
  includes only rows that satisfy arbitrary condition
  does not project away shared attributes
R1 ⋈_θ R2 = σ_θ(R1 × R2)
Here θ can be any condition
If the condition is always satisfied, then theta join becomes the Cartesian product

Page 67: C20.0046: Database Management Systems Lecture #8

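Theta-join example

U:
A | B | C
1 | 2 | 3
6 | 7 | 8
9 | 7 | 8

V:
B | C | D
2 | 3 | 4
2 | 3 | 5
7 | 8 | 10

U ⋈_{A<D} V:
A | U.B | U.C | V.B | V.C | D
1 | 2 | 3 | 2 | 3 | 4
1 | 2 | 3 | 2 | 3 | 5
1 | 2 | 3 | 7 | 8 | 10
6 | 7 | 8 | 7 | 8 | 10
9 | 7 | 8 | 7 | 8 | 10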
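The same result can be reproduced with a few lines of Python, treating a theta join as a filter over the cross product. The `theta_join` helper is an illustrative name; rows are tuples in the column order of U(A, B, C) and V(B, C, D).

```python
U = [(1, 2, 3), (6, 7, 8), (9, 7, 8)]    # U(A, B, C)
V = [(2, 3, 4), (2, 3, 5), (7, 8, 10)]   # V(B, C, D)


def theta_join(r1, r2, theta):
    """Keep the concatenated pairs from r1 × r2 that satisfy theta."""
    return [t1 + t2 for t1 in r1 for t2 in r2 if theta(t1, t2)]


# Condition A < D: A is column 0 of U, D is column 2 of V.
result = theta_join(U, V, lambda u, v: u[0] < v[2])
for row in result:
    print(row)

# With a condition that is always true, nothing is filtered out.
print(len(theta_join(U, V, lambda u, v: True)))
```

Shared columns B and C are kept from both sides, which matches the U.B, U.C, V.B, V.C headers in the example.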

Page 68: C20.0046: Database Management Systems Lecture #8

Equijoin

A theta join where θ is an equality
R1 ⋈_{A=B} R2 = σ_{A=B}(R1 × R2)
  σ = lower-case sigma
Example:
  Employee ⋈_{SSN=SSN} Dependents
Most useful join in practice

Page 69: C20.0046: Database Management Systems Lecture #8

Semijoin

R ⋉ S = π_{atts of R}(R ⋈ S)
Q: What does this mean?
  Natural join of R and S;
  Then project onto R’s atts
A: The rows of R for which at least one row in S agrees on shared atts
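A direct way to read the semijoin is "keep each row of R that has some partner in S". The sketch below follows that reading rather than computing the full join first. The data is entirely hypothetical, and the dependent name column is called `dname` so that `ssn` is the only shared attribute.

```python
def semijoin(r, s):
    """Rows of r that agree with at least one row of s on shared attributes."""
    result = []
    for t in r:
        for u in s:
            shared = t.keys() & u.keys()
            if all(t[a] == u[a] for a in shared):
                if t not in result:
                    result.append(t)
                break  # one matching partner is enough
    return result


# Hypothetical sample data for Employee(ssn, name) and Dependents(dssn, dname, ssn).
employees = [{"ssn": 1, "name": "Ann"},
             {"ssn": 2, "name": "Bob"},
             {"ssn": 3, "name": "Cy"}]
dependents = [{"dssn": 10, "dname": "Dee", "ssn": 1},
              {"dssn": 11, "dname": "Eli", "ssn": 1},
              {"dssn": 12, "dname": "Flo", "ssn": 3}]

print(semijoin(employees, dependents))
```

An employee with two dependents still appears once, which is the effect of projecting onto R's attributes and removing duplicates.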

Page 70: C20.0046: Database Management Systems Lecture #8

Semijoin example

Employee:
SSN | Name
. . . | . . .

Dependents:
DSSN | Dname | SSN
. . . | . . . | . . .

Employee ⋉ Dependents = {employees who have dependents}

Page 71: C20.0046: Database Management Systems Lecture #8

Renaming

Changes the schema, not the instance
Notation: ρ_{B1,…,Bn}(R)
  ρ is spelled “rho”, pronounced “row”
Example:
  Employee(ssn, name)
  ρ_{…(social, name)}(Employee)
  Or just: ρ_{…}(Employee)

Page 72: C20.0046: Database Management Systems Lecture #8

Complex RA Expressions

Q: How long was Star Wars (1977)?

Strategy: find the row with Star Wars; then project the length field

Title Year Length inColor Studio Prdcr#

Star Wars 1977 124 True Fox 12345

M.Ducks 1991 104 True Disney 67890

W.World 1992 95 True Paramount 99999

Page 73: C20.0046: Database Management Systems Lecture #8

Combining operations

Schema: Movies(Title, year, length, filmType, studioName)

Query: select titles and years of movies by Fox that are at least 100 minutes long.

Title Year Length Filmtype Studio

Star wars 1977 124 Color Fox

Mighty ducks 1991 104 Color Disney

Wayne’s world 1992 85 Color Paramount

Page 74: C20.0046: Database Management Systems Lecture #8

Complex RA Expressions

Reps(ssn, name, etc.)
Clients(ssn, name, rssn)
Q: Find George’s client names
  π_{Clients.name}(σ_{Reps.name=George}(σ_{Reps.ssn=rssn}(Reps × Clients)))
Or:
  π_{Clients.name}(σ_{Reps.name=George and Reps.ssn=rssn}(Reps × Clients))
Or:
  π_{Clients.name}(σ_{Reps.name=George}(Reps × Clients) ∩ σ_{Reps.ssn=rssn}(Reps × Clients))
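The three formulations can be checked against each other on a small made-up instance. Everything below except the relation and attribute names Reps, Clients, ssn, name, and rssn is hypothetical, including the sample rows and helper names. Each pair in `product` stands for one row of Reps × Clients.

```python
# Hypothetical sample data.
reps = [{"ssn": 1, "name": "George"}, {"ssn": 2, "name": "Martha"}]
clients = [{"ssn": 7, "name": "Acme", "rssn": 1},
           {"ssn": 8, "name": "Bolt", "rssn": 2},
           {"ssn": 9, "name": "Cord", "rssn": 1}]

product = [(r, c) for r in reps for c in clients]  # Reps × Clients


def select(rows, cond):
    """Selection: keep the (rep, client) pairs satisfying cond."""
    return [row for row in rows if cond(*row)]


def client_names(rows):
    """Projection onto Clients.name, as a set so duplicates vanish."""
    return {c["name"] for _, c in rows}


def is_george(r, c):
    return r["name"] == "George"


def is_rep_of(r, c):
    return r["ssn"] == c["rssn"]


# Nested selections.
v1 = client_names(select(select(product, is_rep_of), is_george))
# One selection with "and".
v2 = client_names(select(product, lambda r, c: is_george(r, c) and is_rep_of(r, c)))
# Intersection of two selections.
georges = select(product, is_george)
matches = select(product, is_rep_of)
v3 = client_names([row for row in georges if row in matches])

print(v1, v2, v3)
```

All three agree because nesting selections, combining conditions with "and", and intersecting selections over the same input keep exactly the same rows.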

Page 75: C20.0046: Database Management Systems Lecture #8

For next time

Finish chapter 5
Come to office hours!

Page 76: C20.0046: Database Management Systems Lecture #8

BCNF Review

Q: What’s required for BCNF?

Q: What’s the slogan for BCNF?

Q: Who are B & C?

Q: What are the two types of violations?

Page 77: C20.0046: Database Management Systems Lecture #8

BCNF Review

Q: How do we fix a non-BCNF relation?

Q: If As → Bs violates BCNF, what do we do?

Q: In this case, could the decomposition be lossy?

Q: Under what circumstances could a decomposition be lossy?

Q: How do we combine two relations?