er diagrams (concluded), schema refinement, and normalization zachary g. ives university of...

39
ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October 6, 2005 me slide content courtesy of Susan Davidson & Raghu Ramakrishnan

Upload: kathlyn-hines

Post on 04-Jan-2016

225 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

ER Diagrams (Concluded),Schema Refinement, and Normalization

Zachary G. IvesUniversity of Pennsylvania

CIS 550 – Database & Information Systems

October 6, 2005

Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan

Page 2: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

2

Examples of ER Diagrams

Please interpret these ER diagrams:

COURSESSTUDENTS Takes

COURSESSTUDENTS Takes

STUDENTS COURSESTakes

Page 3: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

3

Converting ER Relationship Sets to Tables: 1:n Relationships

CREATE TABLE Teaches( fid INTEGER, serno CHAR(15), semester CHAR(4), PRIMARY KEY (serno), FOREIGN KEY (fid) REFERENCES PROFESSORS, FOREIGN KEY (serno) REFERENCES Teaches)CREATE TABLE Teaches_Course( serno INTEGER, subj VARCHAR(30), cid CHAR(15), fid CHAR(15), when CHAR(4), PRIMARY KEY (serno), FOREIGN KEY (fid) REFERENCES PROFESSORS)

• “1” entity = key of relationship set:

• Or embed relationship in “many” entity set:

COURSES

PROFESSORS

Teaches

Page 4: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

4

1:1 Relationships

If you borrow money or have credit, you might get:

What are the table options?

CreditReport Borrower

delinquent?

ssn

namedebt

Describesrid

Page 5: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

5

ISA Relationships: Subclassing(Structurally)

Inheritance states that one entity is a “special kind” of another entity: “subclass” should be member of “base class”

name

ISA

Peopleid

Employees salary

Page 6: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

6

But How Does this Translateinto the Relational Model?

Compare these options: Two tables, disjoint tuples Two tables, disjoint attributes One table with NULLs Object-relational databases (allow subclassing

of tables)

Page 7: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

7

Weak Entities

A weak entity can only be identified uniquely using the primary key of another (owner) entity. Owner and weak entity sets in a one-to-many

relationship set, 1 owner : many weak entities Weak entity set must have total

participation

People Feeds Pets

ssn name weeklyCost name species

Page 8: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

8

Translating Weak Entity Sets

Weak entity set and identifying relationship set are translated into a single table; when the owner entity is deleted, all owned weak entities must also be deleted

CREATE TABLE Feed_Pets ( name VARCHAR(20), species INTEGER, weeklyCost REAL, ssn CHAR(11) NOT NULL, PRIMARY KEY (pname, ssn), FOREIGN KEY (ssn) REFERENCES Employees, ON DELETE CASCADE)

Page 9: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

9

N-ary Relationships

Relationship sets can relate an arbitrary number of entity sets:

Student Project

Advisor

IndepStudy

Page 10: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

10

Summary of ER Diagrams

One of the primary ways of designing logical schemas

CASE tools exist built around ER (e.g. ERWin, PowerBuilder, etc.) Translate the design automatically into DDL,

XML, UML, etc. Use a slightly different notation that is better

suited to graphical displays Some tools support constraints beyond what ER

diagrams can capture Can you get different ER diagrams from the

same data?

Page 11: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

11

Schema Refinement & Design Theory

ER Diagrams give us a start in logical schema design

Sometimes need to refine our designs further There’s a system and theory for this Focus is on redundancy of data

Causes update, insertion, deletion anomalies

Page 12: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

12

Not All Designs are Equally Good

Why is this a poor schema design?

And why is this one better?

Stuff(sid, name, serno, subj, cid, exp-grade)

Student(sid, name)Course(serno, cid)Subject(cid, subj)Takes(sid, serno, exp-grade)

Page 13: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

13

Focus on the Bad Design

Certain items (e.g., name) get repeated Some information requires that a student be

enrolled (e.g., courses) due to the key

sid

name

serno

subj

cid

exp-grade

1 Sam 570103

AI 520

B

23 Nitin 550103

DB 550

A

45 Jill 505103

OS 505

A

1 Sam 505103

OS 505

C

Page 14: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

14

Functional DependenciesDescribe “Key-Like” Relationships

A key is a set of attributes where:If keys match, then the tuples match

A functional dependency (FD) is a generalization:If an attribute set determines another, written X ! Y

then if two tuples agree on attribute set X, they must agree on X:

sid ! name

What other FDs are there in this data? FDs are independent of our schema design

choice

Page 15: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

15

Formal Definition of FD’s

Def. Given a relation schema R and subsets X, Y of R:An instance r of R satisfies FD X Y if,

for any two tuples t1, t2 2 r, t1[X ] = t2[X] implies t1[Y] = t2[Y]

For an FD to hold for schema R, it must hold for every possible instance of r

(Can a DBMS verify this? Can we determine this by looking at an instance?)

Page 16: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

16

General Thoughts on Good Schemas

We want all attributes in every tuple to be determined by the tuple’s key attributes, i.e. part of a superkey (for key X Y, a superkey is a “non-minimal” X)What does this say about redundancy?

But: What about tuples that don’t have keys (other

than the entire value)? What about the fact that every attribute

determines itself?

Page 17: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

17

Armstrong’s Axioms: Inferring FDs

Some FDs exist due to others; can compute using Armstrong’s axioms:

Reflexivity: If Y X then X Y (trivial dependencies)

name, sid name

Augmentation: If X Y then XW YWserno subj so serno, exp-grade subj, exp-grade

Transitivity: If X Y and Y Z then X Zserno cid and cid subj

so serno subj

Page 18: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

18

Armstrong’s Axioms Lead to…

Union: If X Y and X Z then X YZ

Pseudotransitivity: If X Y and WY Z then XW Z

Decomposition: If X Y and Z Y then X Z

Let’s prove these from Armstrong’s Axioms

Page 19: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

19

Closure of a Set of FD’s

Defn. Let F be a set of FD’s. Its closure, F+, is the set of all FD’s:

{X Y | X Y is derivable from F by Armstrong’s Axioms}

Which of the following are in the closure of our Student-Course FD’s?name name

cid subj

serno subj

cid, sid subj

cid sid

Page 20: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

20

Attribute Closures: Is SomethingDependent on X?

Defn. The closure of an attribute set X, X+, is:

X+ = {Y | X Y F +} This answers the question “is Y determined

(transitively) by X?”; compute X+ by:

Does sid, serno subj, exp-grade?

closure := X;repeat until no change {

if there is an FD U V in F such that U is in closure then add V to closure}

Page 21: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

21

Equivalence of FD sets

Defn. Two sets of FD’s, F and G, are equivalent if

their closures are equivalent, F + = G +

e.g., these two sets are equivalent: {XY Z, X Y} and {X Z, X Y}

F + contains a huge number of FD’s (exponential in the size of the schema)

Would like to have smallest “representative” FD set

Page 22: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

22

Minimal Cover

Defn. A FD set F is minimal if:1. Every FD in F is of the form X A,

where A is a single attribute2. For no X A in F is:

F – {X A } equivalent to F3. For no X A in F and Z X is: F – {X A } {Z A } equivalent to FDefn. F is a minimum cover for G if F is minimal

and is equivalent to G.e.g.,

{X Z, X Y} is a minimal cover for{XY Z, X Z, X Y}

in a sense,each FD is“essential”to the cover

we expresseach FD insimplest form

Page 23: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

23

More on Closures

If F is a set of FD’s and X Y F + then for some attribute A Y, X A F +

Proof by counterexample. Assume otherwise and let Y = {A1,..., An} Since we assume X A1, ..., X An are in F +

then X A1 ... An is in F + by union rule,

hence, X Y is in F + which is a contradiction

Page 24: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

24

Why Armstrong’s Axioms?Why are Armstrong’s axioms (or an

equivalent rule set) appropriate for FD’s? They are: Consistent: any relation satisfying FD’s in F

will satisfy those in F +

Complete: if an FD X Y cannot be derived by Armstrong’s axioms from F, then there exists some relational instance satisfying F but not X Y

In other words, Armstrong’s axioms derive all the FD’s that should hold

Page 25: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

25

Proving Consistency

We prove that the axioms’ definitions must be true for any instance, e.g.:

For augmentation (if X Y then XW YW):

If an instance satisfies X Y, then: For any tuples t1, t2 r,

if t1[X] = t2[X] then t1[Y] = t2[Y] by defn.

If, additionally, it is given that t1[W] = t2[W], then t1[YW] = t2[YW]

Page 26: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

26

Proving Completeness

Suppose X Y F + and define a relational instance r that satisfies F + but not X Y: Then for some attribute A Y, X A F +

Let some pair of tuples in r agree on X+ but disagree everywhere else:

x1 x2 ... xn a1,1 v1 v2 ... vm w1,1 w2,1...x1 x2 ... xn a1,2 v1 v2 ... vm w1,2 w2,2...

X A X+ – X R – X+ – {A}

Page 27: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

27

Proof of Completeness cont’d Clearly this relation fails to satisfy X A and X Y.

We also have to check that it satisfies any FD in F + .

The tuples agree on only X + . Thus the only FD’s that might be violated are of the form X’ Y’ where X’ X+ and Y’ contains attributes in R – X+ – {A}.

But if X’ Y’ F+ and X’ X+ then Y’ X+ (reflexivity and augmentation). Therefore X’ Y’ is satisfied.

Page 28: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

28

Decomposition

Consider our original “bad” attribute set

We could decompose it into

But this decomposition loses information about the relationship between students and courses. Why?

Stuff(sid, name, serno, subj, cid, exp-grade)

Student(sid, name)Course(serno, cid)Subject(cid, subj)

Page 29: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

29

Lossless Join Decomposition

R1, … Rk is a lossless join decomposition of R w.r.t. an FD set F if for every instance r of R that satisfies F,

R1(r) ⋈ ... ⋈ Rk(r) = r

Consider:

What if we decompose on (sid, name) and (serno, subj, cid, exp-grade)?

sid

name serno subj

cid exp-grade

1 Sam 570103

AI 570 B

23 Nitin 550103

DB 550 A

Page 30: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

30

Testing for Lossless Join

R1, R2 is a lossless join decomposition of R with respect to F iff at least one of the following dependencies is in F+

(R1 R2) R1 – R2

(R1 R2) R2 – R1

So for the FD set:sid nameserno cid, exp-gradecid subj

Is (sid, name) and (serno, subj, cid, exp-grade) a lossless decomposition?

Page 31: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

31

Dependency Preservation

Ensures we can “easily” check whether a FD X Y is violated during an update to a database:

The projection of an FD set F onto a set of attributes Z, FZ is

{X Y | X Y F +, X Y Z}i.e., it is those FDs local to Z’s attributes

A decomposition R1, …, Rk is dependency preserving if F + = (FR1 ... FRk)+

The decomposition hasn’t “lost” any essential FD’s, so we can check without doing a join

Page 32: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

32

Example of Lossless and Dependency-Preserving Decompositions

Given relation scheme R(name, street, city, st, zip, item, price)

And FD set name street, citystreet, city ststreet, city zipname, item price

Consider the decomposition R1(name, street, city, st, zip) and R2(name, item, price) Is it lossless? Is it dependency preserving?

What if we replaced the first FD by name, street city?

Page 33: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

33

Another Example

Given scheme: R(sid, fid, subj)and FD set: fid subj

sid, subj fidConsider the decomposition

R1(sid, fid) and R2(fid, subj)

Is it lossless? Is it dependency preserving?

Page 34: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

34

FD’s and Keys

Ideally, we want a design s.t. for each nontrivial dependency X Y, X is a superkey for some relation schema in R We just saw that this isn’t always possible

Hence we have two kinds of normal forms

Page 35: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

35

Two Important Normal Forms

Boyce-Codd Normal Form (BCNF). For every relation scheme R and for every X A that holds over R,

either A X (it is trivial) ,oror X is a superkey for R

Third Normal Form (3NF). For every relation scheme R and for every X A that holds over R,

either A X (it is trivial), or X is a superkey for R, or A is a member of some key for R

Page 36: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

36

Normal Forms Compared

BCNF is preferable, but sometimes in conflict with the goal of dependency preservation It’s strictly stronger than 3NF

Let’s see algorithms to obtain: A BCNF lossless join decomposition A 3NF lossless join, dependency preserving

decomposition

Page 37: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

37

BCNF Decomposition Algorithm(from Korth et al.; our book gives recursive version)

result := {R}compute F+while there is a schema Ri in result that is not in BCNF{

let A B be a nontrivial FD on Ri

s.t. A Ri is not in F+ and A and B are disjoint

result:= (result – Ri) {(Ri - B), (A,B)}}

Page 38: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

38

3NF Decomposition Algorithmby Phil Bernstein, now @ MS Research

Let F be a minimal coveri:=0for each FD A B in F { if none of the schemas Rj, 1 j i, contains AB { increment i Ri := (A, B) }}if no schema Rj, 1 j i contains a candidate key for R { increment i Ri := any candidate key for R}return (R1, …, Ri)

Build dep.-preservingdecomp.

Ensurelosslessdecomp.

Page 39: ER Diagrams (Concluded), Schema Refinement, and Normalization Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October

39

Summary

We can always decompose into 3NF and get: Lossless join Dependency preservation

But with BCNF we are only guaranteed lossless joins

BCNF is stronger than 3NF: every BCNF schema is also in 3NF

The BCNF algorithm is nondeterministic, so there is not a unique decomposition for a given schema R