Post on 14-Dec-2015


1

NORMALIZATION

Joe Meehean

2

REDUNDANCIES

Repeated data in the database
- wastes space
- can cause modification anomalies
  - unexpected side effects when changing data
  - makes building software on top of the DB difficult

Normalization
- the process of removing redundancies

3

MODIFICATION ANOMALIES

Insert anomaly
- extra data must be known to insert a row into a table

Update anomaly
- must change multiple rows to modify a single fact

Deletion anomaly
- deleting a row causes other data to be deleted
- deletes more data than is necessary or desired

4

BAD COLLEGE DATABASE

All data in one table

StdNo  First Name  Last Name  Offer No  Term    Year  Grade  Course No  Course Descr.
S1     Phil        Park       O1        Fall    2011  C-     C1         DB
S1     Phil        Park       O2        Fall    2011  B+     C2         OS
S2     Emily       Blem       O3        Spring  2012  A+     C3         PL
S2     Emily       Blem       O2        Fall    2011  B+     C2         OS
S3     Roger       Cook       O4        Spring  2014  ---    C1         DB

5

BAD COLLEGE DATABASE

Insert anomaly
- adding Rush Daniels as a student requires knowing which offerings Rush is enrolled in
- cannot add Rush as a student until he enrolls

StdNo  First Name  Last Name  Offer No  Term    Year  Grade  Course No  Course Descr.
S1     Phil        Park       O1        Fall    2011  C-     C1         DB
S1     Phil        Park       O2        Fall    2011  B+     C2         OS
S2     Emily       Blem       O3        Spring  2012  A+     C3         PL
S2     Emily       Blem       O2        Fall    2011  B+     C2         OS
S3     Roger       Cook       O4        Spring  2014  ---    C1         DB

6

BAD COLLEGE DATABASE

Update anomaly
- if Emily changes her name to Emma, multiple rows must be changed

StdNo  First Name  Last Name  Offer No  Term    Year  Grade  Course No  Course Descr.
S1     Phil        Park       O1        Fall    2011  C-     C1         DB
S1     Phil        Park       O2        Fall    2011  B+     C2         OS
S2     Emily       Blem       O3        Spring  2012  A+     C3         PL
S2     Emily       Blem       O2        Fall    2011  B+     C2         OS
S3     Roger       Cook       O4        Spring  2014  ---    C1         DB

7

BAD COLLEGE DATABASE

Delete anomaly
- if Roger drops out of college and we delete him, we also delete the fact that there is an offering of DB in the spring

StdNo  First Name  Last Name  Offer No  Term    Year  Grade  Course No  Course Descr.
S1     Phil        Park       O1        Fall    2011  C-     C1         DB
S1     Phil        Park       O2        Fall    2011  B+     C2         OS
S2     Emily       Blem       O3        Spring  2012  A+     C3         PL
S2     Emily       Blem       O2        Fall    2011  B+     C2         OS
S3     Roger       Cook       O4        Spring  2014  ---    C1         DB
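The update and delete anomalies can be seen directly on a small in-memory copy of the single-table design. This is only an illustrative sketch: the tuple layout and variable names are assumptions, not part of the slides.

```python
# Illustrative copy of the one-table design (column order is an assumption):
# StdNo, First, Last, OfferNo, Term, Year, Grade, CourseNo, CourseDescr
rows = [
    ("S1", "Phil", "Park", "O1", "Fall", 2011, "C-", "C1", "DB"),
    ("S1", "Phil", "Park", "O2", "Fall", 2011, "B+", "C2", "OS"),
    ("S2", "Emily", "Blem", "O3", "Spring", 2012, "A+", "C3", "PL"),
    ("S2", "Emily", "Blem", "O2", "Fall", 2011, "B+", "C2", "OS"),
    ("S3", "Roger", "Cook", "O4", "Spring", 2014, None, "C1", "DB"),
]

# Update anomaly: one fact (Emily's first name) is stored in several rows,
# so renaming her means touching every one of them.
rows_to_touch = [r for r in rows if r[0] == "S2"]

# Delete anomaly: removing Roger's row also removes the only record of O4.
offerings_before = {r[3] for r in rows}
offerings_after = {r[3] for r in rows if r[0] != "S3"}
lost_offerings = offerings_before - offerings_after

print(len(rows_to_touch))  # number of rows a single rename must change
print(lost_offerings)      # offerings that vanish along with Roger
```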

8

FUNCTIONAL DEPENDENCIES (FDS)

Constraint between 2 or more columns

Represented by →

X determines Y (X → Y) if there exists at most 1 value of Y for each value of X
- like a mathematical function f(x) = y
- the left-hand side (LHS) is called the determinant
- e.g., StdNo determines student first name
  - StdNo → First Name
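The "at most 1 value of Y for each value of X" rule can be tested against sample rows. The sketch below is a hypothetical helper (`fd_holds` and the sample data are not from the slides). Note that sample data can only refute an FD; whether an FD really holds is a statement about the business rules, not about the rows currently stored.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if each combination of lhs values maps to one rhs combination."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        val = tuple(row[c] for c in rhs)
        # setdefault stores the first value seen for this key and returns it;
        # a different value later means the FD is violated.
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical sample rows, keyed by column name.
sample = [
    {"StdNo": "S1", "FirstName": "Phil", "OfferNo": "O1"},
    {"StdNo": "S1", "FirstName": "Phil", "OfferNo": "O2"},
    {"StdNo": "S2", "FirstName": "Emily", "OfferNo": "O3"},
]

print(fd_holds(sample, ["StdNo"], ["FirstName"]))  # StdNo -> FirstName
print(fd_holds(sample, ["StdNo"], ["OfferNo"]))    # StdNo -> OfferNo
```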

9

ORGANIZING FDS

Make a list
- can condense the list by listing all dependent columns for a given determinant
- e.g., StdNo → First Name, Last Name

Determinants should be minimal
- least # of columns required to determine the values of other columns
- e.g., StdNo, First Name → Last Name is not minimal, because StdNo alone determines Last Name

10

BAD COLLEGE DATABASE

StdNo → First Name, Last Name
OfferNo → Term, Year, Course No, Course Descr.
StdNo, OfferNo → Grade

StdNo  First Name  Last Name  Offer No  Term    Year  Grade  Course No  Course Descr.
S1     Phil        Park       O1        Fall    2011  C-     C1         DB
S1     Phil        Park       O2        Fall    2011  B+     C2         OS
S2     Emily       Blem       O3        Spring  2012  A+     C3         PL
S2     Emily       Blem       O2        Fall    2011  B+     C2         OS
S3     Roger       Cook       O4        Spring  2014  ---    C1         DB

11

IDENTIFYING FDS

From business narrative
- look for words like "unique"
  - e.g., “Each student has a unique student number, a first name, and a last name.”
- look for 1-M relationships
  - child (M-side) is the determinant (LHS)
  - e.g., “Faculty teach many offerings.”
  - e.g., Offer No → Faculty Id

12

IDENTIFYING FDS

From relational tables
- FDs where the determinant (LHS) is not the PK or a candidate key
  - recall: a candidate key is the column(s) that uniquely identify a row
  - e.g., Zip → State
- Combined PKs
  - does 1 column determine the values of some other columns?
  - e.g., StdNo → First Name, Last Name

13

QUESTIONS?

14

NORMAL FORMS

Normalization
- removes redundancies in tables
- removes modification anomalies
- makes data easier to modify

Normal form
- rules about which functional dependencies (FDs) are allowed
- each successive normal form rules out more FDs

15

NORMAL FORMS

1NF

2NF

3NF/BCNF

16

1ST NORMAL FORM

All relational tables are already in 1NF by definition

17

2ND NORMAL FORM

Key columns
- columns that are part of (or all of) a candidate key
- recall: a candidate key is a key that uniquely identifies a row

Non-key columns
- columns that are not part of a candidate key

18

2ND NORMAL FORM

A table is in 2NF if each non-key column depends on
- whole candidate keys
- NOT on a proper subset of any candidate key
- check the functional dependencies (FDs)

A 2NF violation
- an FD where part of a key determines a non-key column

19

2ND NORMAL FORM

2NF violations
- StdNo → First Name, Last Name
- OfferNo → Term, Year, Course No, Course Descr.

StdNo  First Name  Last Name  Offer No  Term    Year  Grade  Course No  Course Descr.
S1     Phil        Park       O1        Spring  2012  --     C1         PL
S1     Phil        Park       O2        Fall    2011  B+     C2         DB
S2     Emily       Blem       O3        Spring  2012  --     C3         OS
S2     Emily       Blem       O2        Fall    2011  B+     C2         DB
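The 2NF check (part of a key determining a non-key column) can be sketched mechanically once the FDs and candidate keys are written down. The function name and the tuple encoding of FDs below are assumptions made for illustration.

```python
def second_nf_violations(fds, candidate_keys):
    """Return FDs whose LHS is a proper subset of a candidate key
    and whose RHS contains at least one non-key column."""
    key_columns = set().union(*candidate_keys)
    violations = []
    for lhs, rhs in fds:
        is_partial_key = any(set(lhs) < set(key) for key in candidate_keys)
        has_non_key_rhs = bool(set(rhs) - key_columns)
        if is_partial_key and has_non_key_rhs:
            violations.append((lhs, rhs))
    return violations


# FDs of the one-table college design, written as (LHS, RHS) tuples.
fds = [
    (("StdNo",), ("FirstName", "LastName")),
    (("OfferNo",), ("Term", "Year", "CourseNo", "CourseDescr")),
    (("StdNo", "OfferNo"), ("Grade",)),
]
candidate_keys = [("StdNo", "OfferNo")]

violations = second_nf_violations(fds, candidate_keys)
for lhs, rhs in violations:
    print(", ".join(lhs), "->", ", ".join(rhs))
```

The FD with the whole key on the left (StdNo, OfferNo → Grade) is not reported, because its determinant is not a proper subset of the key.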

20

3RD NORMAL FORM

A table is in 3NF if it is in 2NF AND each non-key column depends only on candidate keys
- NOT on other non-key columns
- e.g., Course No → Course Descr. is not allowed

3NF violation
- a non-key column on the right-hand side (RHS) AND
- anything other than a candidate key on the LHS

21

3RD NORMAL FORM

3NF prohibits transitive dependencies

Transitive dependencies
- if A → B & B → C, then A → C
- e.g., Offer No → Course No & Course No → Course Descr., then Offer No → Course Descr.
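Transitive dependencies can be derived mechanically by computing the closure of a set of columns: keep applying FDs until nothing new is added. The sketch below uses the standard attribute-closure idea; the function name and the encoding of FDs as pairs of sets are assumptions.

```python
def closure(attrs, fds):
    """Return every column determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know all of the LHS, we also know the RHS.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


fds = [
    ({"OfferNo"}, {"CourseNo"}),
    ({"CourseNo"}, {"CourseDescr"}),
]

# OfferNo -> CourseDescr is never listed, but it follows transitively.
print(sorted(closure({"OfferNo"}, fds)))
```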

22

COMBINED 2NF & 3NF

A table is in 3NF if each non-key column depends on
- all candidate keys
- whole candidate keys
- and nothing but candidate keys

23

3RD NORMAL FORM

2NF violations
- StdNo → First Name, Last Name
- OfferNo → Term, Year, Course No, Course Descr.

3NF violations
- CourseNo → Course Descr.
- OfferNo → Course Descr.

StdNo  First Name  Last Name  Offer No  Term    Year  Grade  Course No  Course Descr.
S1     Phil        Park       O1        Spring  2012  --     C1         PL
S1     Phil        Park       O2        Fall    2011  B+     C2         DB
S2     Emily       Blem       O3        Spring  2012  --     C3         OS
S2     Emily       Blem       O2        Fall    2011  B+     C2         DB

24

BOYCE-CODD NORMAL FORM (BCNF)

Revised, simpler version of 3NF

Covers additional special cases

A table is in BCNF if every determinant is a candidate key

Violations are easy to detect
- determinant (LHS) is not a candidate key
- e.g., StdNo → Last Name
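The BCNF test can be sketched as: for every FD, check whether its determinant determines all columns of the table. The code below tests for a superkey, which coincides with "candidate key" when the determinants are already minimal. The helper names and FD encoding are assumptions.

```python
def closure(attrs, fds):
    """Return every column determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(columns, fds):
    """Return FDs whose determinant does not determine every column."""
    all_columns = set(columns)
    return [(lhs, rhs) for lhs, rhs in fds if closure(lhs, fds) != all_columns]


columns = ["StdNo", "FirstName", "LastName", "OfferNo", "Term",
           "Year", "Grade", "CourseNo", "CourseDescr"]
fds = [
    (("StdNo",), ("FirstName", "LastName")),
    (("OfferNo",), ("Term", "Year", "CourseNo")),
    (("CourseNo",), ("CourseDescr",)),
    (("StdNo", "OfferNo"), ("Grade",)),
]

violations = bcnf_violations(columns, fds)
for lhs, rhs in violations:
    print(", ".join(lhs), "->", ", ".join(rhs))
```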

25

BOYCE-CODD NORMAL FORM (BCNF)

Excludes 2 redundancies that 3NF does not
1. part of a key determines part of a key
2. a non-key determines part of a key

26

BOYCE-CODD NORMAL FORM (BCNF)

StdNo  OfferNo  Email          EnrGrade
S1     O1       blem@fake.edu  3.5
S1     O2       blem@fake.edu  3.6
S2     O1       rush@fake.edu  3.8
S2     O3       rush@fake.edu  3.5

BCNF violations
- Email → StdNo

27

SIMPLE SYNTHESIS (BCNF)

Convert tables into BCNF
1. Eliminate extraneous columns from the LHS of FDs
2. Remove derived (transitive) FDs
3. Arrange FDs into groups by determinant
4. For each FD group, make a table with the determinant as primary key
5. Merge tables where one table includes all columns of the other table
   - choose the PK of one of the tables to be the PK of the new table
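Steps 2 to 4 of the synthesis can be sketched in a few lines. This is a simplified illustration, not a full implementation: it assumes step 1 has already been done (determinants are minimal), it skips the merge in step 5, and the function names and the encoding of each FD as (LHS tuple, single RHS column) are assumptions.

```python
def closure(attrs, fds):
    """Columns determined by attrs; each FD is (lhs_tuple, rhs_column)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result


def remove_derived(fds):
    """Step 2: drop each FD that the remaining FDs already imply."""
    kept = list(fds)
    for fd in fds:
        others = [f for f in kept if f != fd]
        lhs, rhs = fd
        if rhs in closure(lhs, others):
            kept = others
    return kept


def synthesize(fds):
    """Steps 3 and 4: group FDs by determinant, one table per group."""
    groups = {}
    for lhs, rhs in remove_derived(fds):
        groups.setdefault(lhs, []).append(rhs)
    return {lhs: list(lhs) + cols for lhs, cols in groups.items()}


fds = [
    (("StdNo",), "FirstName"), (("StdNo",), "LastName"),
    (("OfferNo",), "Term"), (("OfferNo",), "Year"),
    (("OfferNo",), "CourseNo"), (("OfferNo",), "CourseDescr"),
    (("StdNo", "OfferNo"), "Grade"),
    (("CourseNo",), "CourseDescr"),
]

tables = synthesize(fds)
for key, cols in tables.items():
    print("PK", key, "columns", cols)
```

On the college FDs, the derived FD OfferNo → CourseDescr is dropped and four tables result, one per determinant.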

28

BAD COLLEGE DATABASE (1)

StdNo → First Name
StdNo → Last Name
OfferNo → Term
OfferNo → Year
OfferNo → Course No
OfferNo → Course Descr.
StdNo, OfferNo → Grade
Course No → Course Descr.

StdNo  First Name  Last Name  Offer No  Term    Year  Grade  Course No  Course Descr.
S1     Phil        Park       O1        Spring  2012  --     C1         PL
S1     Phil        Park       O2        Fall    2011  B+     C2         DB
S2     Emily       Blem       O3        Spring  2012  --     C3         OS
S2     Emily       Blem       O2        Fall    2011  B+     C2         DB

29

BAD COLLEGE DATABASE (2)

StdNo → First Name
StdNo → Last Name
OfferNo → Term
OfferNo → Year
OfferNo → Course No
OfferNo → Course Descr.
StdNo, OfferNo → Grade
Course No → Course Descr.

StdNo  First Name  Last Name  Offer No  Term    Year  Grade  Course No  Course Descr.
S1     Phil        Park       O1        Spring  2012  --     C1         PL
S1     Phil        Park       O2        Fall    2011  B+     C2         DB
S2     Emily       Blem       O3        Spring  2012  --     C3         OS
S2     Emily       Blem       O2        Fall    2011  B+     C2         DB

30

BAD COLLEGE DATABASE (3)

StdNo → First Name, Last Name
OfferNo → Term, Year, Course No
StdNo, OfferNo → Grade
Course No → Course Descr.

StdNo  First Name  Last Name  Offer No  Term    Year  Grade  Course No  Course Descr.
S1     Phil        Park       O1        Spring  2012  --     C1         PL
S1     Phil        Park       O2        Fall    2011  B+     C2         DB
S2     Emily       Blem       O3        Spring  2012  --     C3         OS
S2     Emily       Blem       O2        Fall    2011  B+     C2         DB

31

BAD COLLEGE DATABASE (4)

StdNo  First Name  Last Name
S1     Phil        Park
S2     Emily       Blem

Offer No  Term    Year  Course No
O1        Spring  2012  C1
O2        Fall    2011  C2
O3        Spring  2012  C3

StdNo  OfferNo  Grade
S1     O1       --
S1     O2       B+
S2     O3       --
S2     O2       B+

Course No  Course Descr.
C1         PL
C2         DB
C3         OS

32

BAD COLLEGE DATABASE (5)

StdNo  First Name  Last Name
S1     Phil        Park
S2     Emily       Blem

Offer No  Term    Year  Course No
O1        Spring  2012  C1
O2        Fall    2011  C2
O3        Spring  2012  C3

StdNo  OfferNo  Grade
S1     O1       --
S1     O2       B+
S2     O3       --
S2     O2       B+

Course No  Course Descr.
C1         PL
C2         DB
C3         OS
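The four tables can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the table names (Student, Course, Offering, Enroll) and column types are invented for the example, and a missing grade is stored as NULL. It shows that a name change now touches a single row and that a join rebuilds the original wide table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student  (StdNo TEXT PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE Course   (CourseNo TEXT PRIMARY KEY, CourseDescr TEXT);
CREATE TABLE Offering (OfferNo TEXT PRIMARY KEY, Term TEXT, Year INTEGER,
                       CourseNo TEXT REFERENCES Course(CourseNo));
CREATE TABLE Enroll   (StdNo TEXT REFERENCES Student(StdNo),
                       OfferNo TEXT REFERENCES Offering(OfferNo),
                       Grade TEXT,
                       PRIMARY KEY (StdNo, OfferNo));
""")
conn.executemany("INSERT INTO Student VALUES (?, ?, ?)",
                 [("S1", "Phil", "Park"), ("S2", "Emily", "Blem")])
conn.executemany("INSERT INTO Course VALUES (?, ?)",
                 [("C1", "PL"), ("C2", "DB"), ("C3", "OS")])
conn.executemany("INSERT INTO Offering VALUES (?, ?, ?, ?)",
                 [("O1", "Spring", 2012, "C1"), ("O2", "Fall", 2011, "C2"),
                  ("O3", "Spring", 2012, "C3")])
conn.executemany("INSERT INTO Enroll VALUES (?, ?, ?)",
                 [("S1", "O1", None), ("S1", "O2", "B+"),
                  ("S2", "O3", None), ("S2", "O2", "B+")])

# No update anomaly: the first name is stored once, so one row changes.
changed = conn.execute(
    "UPDATE Student SET FirstName = 'Emma' WHERE StdNo = 'S2'").rowcount

# Joining the four tables rebuilds the original wide rows.
wide = conn.execute("""
    SELECT s.StdNo, s.FirstName, s.LastName, o.OfferNo, o.Term, o.Year,
           e.Grade, c.CourseNo, c.CourseDescr
    FROM Enroll e
    JOIN Student s  ON s.StdNo = e.StdNo
    JOIN Offering o ON o.OfferNo = e.OfferNo
    JOIN Course c   ON c.CourseNo = o.CourseNo
    ORDER BY s.StdNo, o.OfferNo
""").fetchall()

print(changed)
for row in wide:
    print(row)
```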

33

IMPORTANCE OF NORMAL FORM VIOLATIONS

We have the BCNF synthesis process
- we can just make BCNF tables
- why do we care about detecting NF violations?

A DBA has 2 jobs
- make new databases
- maintain old ones

Making new DBs requires using the BCNF synthesis process

Maintaining old DBs requires detecting NF violations
- perhaps made by other employees
- detecting violations narrows the scope of a DB redesign

34

QUESTIONS?

35

4TH NORMAL FORM (4NF)

M-way relationships
- associative entity types (weak entities)
- multiple associations
- primary key made of FKs from 3 or more tables
- often represent important documents
  - glue multiple things together
  - e.g., invoice
- can sometimes contain redundancies

36

4TH NORMAL FORM (4NF)

[Diagram: entity types Student (StdNo, Name), Offering (OfferNo, Location), and Textbook (TextNo, TextTitle), connected through Enroll]

37

4TH NORMAL FORM (4NF)

Enroll Table

StdNo  OfferNo  TextNo
S1     O1       T1
S1     O2       T2
S1     O1       T2
S1     O2       T3

38

MULTIVALUED DEPENDENCIES (MVDS)

Given table R with columns X, Y, and Z
- X →→ Y
  - each X maps to a set of Ys (between 1 and M)
- X →→ Z
  - each X maps to a set of Zs (between 1 and M)
- Y & Z are independent
  - knowing Y doesn’t tell you anything about Z and vice versa

  - neither Y →→ Z nor Y → Z holds
  - neither Z →→ Y nor Z → Y holds
  - also Y,V →→ Z does not hold, unless V →→ Z

Every FD is an MVD
- not every MVD is an FD

39

TRIVIAL MVDS

MVD X →→ Y is trivial if
- Y is a subset of X, OR
- X and Y are the only columns in the table, OR
- X → Y and X → Z

e.g., has-job table (Employee#, Position#)
- E# →→ P#

e.g., offering table (Course Number, Section #, Faculty ID)
- C#, S# →→ #S

40

MULTIVALUED DEPENDENCIES (MVDS)

Non-trivial MVDs manifest as redundancies in tables
- there exist rows where X and Y are the same but Z is different
- e.g., enroll table
  - O# →→ S#
  - O# →→ T#
  - S# independent of T#
  - if Emily drops 242 it doesn’t change the textbooks

OfferNo  StudentNo  TextNo
CS242A   Phil
CS242A   Emily
CS242A              Drozdek
CS242A              Weiss

41

MULTIVALUED DEPENDENCIES (MVDS)

Non-trivial MVDs manifest as redundancies in tables
- there exist rows where X and Y are the same but Z is different
- e.g., enroll table
  - O# →→ S#
  - O# →→ T#
  - S# independent of T#
  - if Emily drops 242 it doesn’t change the textbooks

OfferNo  StudentNo  TextNo
CS242A   Phil       Weiss
CS242A   Emily      Drozdek
CS242A   Phil       Drozdek
CS242A   Emily      Weiss
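For a three-column table, X →→ Y (with Z the remaining column) means that for each X value the stored rows are exactly every combination of its Y values with its Z values. The sketch below checks that on the enroll rows; `mvd_holds` and the index-based row access are hypothetical choices for the example.

```python
from itertools import product


def mvd_holds(rows, x, y, z):
    """Check X ->> Y in a table whose only columns are X, Y and Z.

    x, y and z are column positions within each row tuple.
    """
    groups = {}
    for row in rows:
        groups.setdefault(row[x], set()).add((row[y], row[z]))
    # For each X value, the (Y, Z) pairs must be the full cross product.
    return all(
        pairs == set(product({p[0] for p in pairs}, {p[1] for p in pairs}))
        for pairs in groups.values()
    )


enroll = [
    ("CS242A", "Phil", "Weiss"),
    ("CS242A", "Emily", "Drozdek"),
    ("CS242A", "Phil", "Drozdek"),
    ("CS242A", "Emily", "Weiss"),
]

print(mvd_holds(enroll, 0, 1, 2))      # all four combinations are present
print(mvd_holds(enroll[:3], 0, 1, 2))  # (Emily, Weiss) is missing
```

The redundancy is visible in the data: two students and two textbooks need four rows, even though only four independent facts (two enrollments, two textbook assignments) are being recorded.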

42

4TH NORMAL FORM (4NF)

4th normal form
- table is in BCNF AND
- all MVDs are trivial

Detecting a violation
- are there any MVDs?
- are those MVDs non-trivial?

43

4TH NORMAL FORM (4NF)

Resolving violations
- X →→ Y
- X →→ Z

X   Y   Z
X1  Y1  Z1
X1  Y2  Z2
X1  Y2  Z1
X1  Y1  Z2

X   Y
X1  Y1
X1  Y2

X   Z
X1  Z1
X1  Z2
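The split can be checked in a few lines: project the three-column table onto (X, Y) and (X, Z), then join the two projections on X and compare with the original. The variable names are illustrative only.

```python
# The original three-column table, as a set of (X, Y, Z) rows.
rows = {
    ("X1", "Y1", "Z1"),
    ("X1", "Y2", "Z2"),
    ("X1", "Y2", "Z1"),
    ("X1", "Y1", "Z2"),
}

# Project onto (X, Y) and (X, Z); set semantics remove duplicate rows.
xy = {(x, y) for x, y, _ in rows}
xz = {(x, z) for x, _, z in rows}

# Natural join of the two projections on X.
rejoined = {(x, y, z) for x, y in xy for x2, z in xz if x == x2}

print(sorted(xy))
print(sorted(xz))
print(rejoined == rows)  # the split loses no information
```

Because X →→ Y and X →→ Z hold, the join returns exactly the original rows, so the two smaller tables carry the same information without the repeated combinations.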

44

MORE EXAMPLES

Student  Offering  Grade
Phil     CS242A    A
Phil     CS370A    B
Emily    CS242A    B
Emily    CS370A    A

S →→ O & S →→ G ?
O →→ G & O →→ S ?
G →→ S & G →→ O ?

45

MORE EXAMPLES

Student  Offering  Grade
Phil     CS242A    A
Phil     CS370A    B
Emily    CS242A    B
Emily    CS370A    A

S →→ O & S →→ G ?
- Offering and Grade not independent

O →→ G & O →→ S ?
- Grade and Student not independent

G →→ S & G →→ O ?
- Student and Offering not independent

46

MORE EXAMPLES

B →→ E & B →→ C
- Is this a trivial MVD?

Bank Branch  Employee  Customer
B3           Ann       Ted
B3           Terry     Alfred
B3           Ann       Alfred
B3           Terry     Ted

47

MORE EXAMPLES

B →→ E & B →→ C
- Is this a trivial MVD?
  - E is not a subset of B & C is not a subset of B
  - B and E are not the only columns in the table
  - B → E & B → C do not hold
- NO!!!

Bank Branch  Employee  Customer
B3           Ann       Ted
B3           Terry     Alfred
B3           Ann       Alfred
B3           Terry     Ted

48

MORE EXAMPLES

Bank Branch  Employee  Customer
B3           Ann       Ted
B3           Terry     Alfred
B3           Ann       Alfred
B3           Terry     Ted

Bank Branch  Employee
B3           Ann
B3           Terry

Bank Branch  Customer
B3           Ted
B3           Alfred

49

QUESTIONS?

50

QUIZ BREAK!!!

Part#  PQty  PDesc
P1     2     5mm bolt
P2     4     10mm nut
P3     2     5mm wrench
P4     4     8mm washer

PQty →→ PDesc & PQty →→ Part# ?

51

QUIZ BREAK!!!

Loc #  Item            Managers
L1     XBox 360 250GB  Cindy
L1     Garmin GPS      Aaron
L1     XBox 360 250GB  Aaron
L1     Garmin GPS      Cindy

52

EXTRA 4NF SLIDES

53

4TH NORMAL FORM (4NF)

Relationship independence
- 2 relationships are independent if one cannot be derived from the other
- knowing one relationship tells you nothing about the other

54

4TH NORMAL FORM (4NF)

Enroll Table

StdNo  OfferNo  TextNo
S1     O1       T1
S1     O2       T2
S1     O1       T2
S1     O2       T3

3 relationships
- StdNo -- OfferNo
- StdNo -- TextNo
- OfferNo -- TextNo

55

4TH NORMAL FORM (4NF)

StdNo -- OfferNo cannot be derived from the other 2
- StdNo -- TextNo & TextNo -- OfferNo
- the same textbook can be used for 2 offerings

OfferNo -- TextNo cannot be derived from the other 2
- OfferNo -- StdNo & StdNo -- TextNo
- students use many textbooks, not all related to this offering

StdNo -- TextNo can be derived
- StdNo -- OfferNo & OfferNo -- TextNo
- the offering number gives the set of texts a student needs

56

4TH NORMAL FORM (4NF)

Multivalued Dependencies (MVDs)
- each X can map to a set of Ys and a set of Zs
- generalization of functional dependencies
  - each X maps to one Y
  - each X maps to one Z
- represented by X →→ Y|Z
- every FD is an MVD
  - known as a trivial MVD
- not every MVD is an FD

57

4TH NORMAL FORM (4NF)

M-way tables sometimes introduce MVDs
- X →→ Y
- X →→ Z
- X →→ Y|Z
- Y and Z are independent
  - relationship X--Y is independent of relationship X--Z

Not all M-way tables produce MVDs

58

4TH NORMAL FORM (4NF)

MVD Table Redundancies
- assume X1 maps to Y1 & Y2 and X1 maps to Z1 & Z2

X   Y   Z
X1  Y1
X1  Y2
X1      Z1
X1      Z2

59

4TH NORMAL FORM (4NF)

Need to fill in the rest of the table

X   Y   Z
X1  Y1  Z1
X1  Y2  Z2
X1  Y2  Z1
X1  Y1  Z2

60

4TH NORMAL FORM (4NF)

Rows below the line exist because relationship Y--Z can be derived from relationships X--Y & X--Z

Rows below the line are redundant

X   Y   Z
X1  Y1  Z1
X1  Y2  Z2
----------
X1  Y2  Z1
X1  Y1  Z2

61

4TH NORMAL FORM (4NF)

Enroll Table

OfferNo  StdNo  TextNo
O1       S1     T1
O1       S2     T2
---------------------
O1       S2     T1
O1       S1     T2

OfferNo →→ StdNo|TextNo
- offerings map to many students
- offerings can have many textbooks

Rows below the line are redundant

62

4TH NORMAL FORM (4NF)

4NF definition
- tables cannot contain any non-trivial MVDs

Resolving 4NF violations
- for each table with a non-trivial MVD, split the 3-column table into two 2-column tables
- A,B,C goes to A,B & A,C

StdNo  OfferNo
S1     O1
S1     O2

OfferNo  TextNo
O1       T1
O1       T2
O2       T1
O2       T3
