normalization - texas southern...


Upload: trinhkien

Post on 13-Mar-2018


Normalization - Texas Southern University (cs.tsu.edu/ghemri/CS681/ClassNotes/Normalization_slides.pdf)

Normalization

We discuss four normal forms: first, second, third, and Boyce-Codd normal forms (1NF, 2NF, 3NF, and BCNF).

Normalization is a process that “improves” a database design by generating relations that are of higher normal forms.

The objective of normalization: “to create relations where every dependency is on the key, the whole key, and nothing but the key”.

There is a sequence to normal forms:
1NF is considered the weakest,
2NF is stronger than 1NF,
3NF is stronger than 2NF, and
BCNF is considered the strongest.

Also,
any relation that is in BCNF is in 3NF;
any relation in 3NF is in 2NF; and
any relation in 2NF is in 1NF.

Prime and non-prime Attributes

• In the relational model of databases, a candidate key of a relation is a minimal superkey for that relation.

• The constituent attributes of a candidate key are called prime attributes or key attributes.

• Conversely, an attribute that does not occur in ANY candidate key is called a non-prime attribute or non-key attribute.

• 2NF and 3NF both involve the concepts of key and non-key attributes.

March 2017

[Figure: nested sets, with BCNF inside 3NF, 3NF inside 2NF, and 2NF inside 1NF. A relation in BCNF is also in 3NF; a relation in 3NF is also in 2NF; a relation in 2NF is also in 1NF.]

We consider a relation in BCNF to be fully normalized.

The benefit of higher normal forms is that update semantics for the affected data are simplified. This means that applications required to maintain the database are simpler.

A design that has a lower normal form than another design has more redundancy. Uncontrolled redundancy can lead to data integrity problems.

First we introduce the concept of functional dependency.

Functional Dependencies

We say an attribute, B, has a functional dependency on another attribute, A, if any two records that have the same value for A must also have the same value for B. We illustrate this as:

A → B

Example: Suppose we keep track of employee email addresses, and we only track one email address for each employee. Suppose each employee is identified by their unique employee number. We say there is a functional dependency of email address on employee number:

employee number → email address

EmpNum  EmpEmail           EmpFname  EmpLname
123     [email protected]  John      Doe
456     [email protected]  Peter     Smith
555     [email protected]  Alan      Lee
633     [email protected]  Peter     Doe
787     [email protected]  Alan      Lee

If EmpNum is the PK then the FDs:
EmpNum → EmpEmail
EmpNum → EmpFname
EmpNum → EmpLname
must exist.

3 different ways you might see FDs depicted:

EmpNum → EmpEmail
EmpNum → EmpFname
EmpNum → EmpLname

[Diagram: EmpNum linked to EmpEmail, EmpFname, and EmpLname]

[Diagram: the relation heading EmpNum, EmpEmail, EmpFname, EmpLname annotated with the same dependencies]

Functional Dependency

Teacher  Course           Text
Smith    Data Structures  Bartram
Smith    Data Management  Martin
Hall     Compilers        Hoffman
Brown    Data Structures  Horowitz

Text → Course
{Teacher, Course} → Text

Course is NOT FD on Teacher
Text is NOT FD on Teacher
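The four statements about the Teacher/Course/Text table can be checked mechanically. The following is a minimal sketch, not part of the original slides; the helper name `fd_holds` is hypothetical. Note that sample rows can only refute a dependency: a dependency that holds in these four rows is consistent with the data, not proven for every possible row.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs.

    rows is a list of dicts; lhs and rhs are tuples of attribute names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it;
        # any later row with the same key must carry the same value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Rows copied from the Teacher/Course/Text table.
teaching = [
    {"Teacher": "Smith", "Course": "Data Structures", "Text": "Bartram"},
    {"Teacher": "Smith", "Course": "Data Management", "Text": "Martin"},
    {"Teacher": "Hall", "Course": "Compilers", "Text": "Hoffman"},
    {"Teacher": "Brown", "Course": "Data Structures", "Text": "Horowitz"},
]

print(fd_holds(teaching, ("Text",), ("Course",)))            # True
print(fd_holds(teaching, ("Teacher", "Course"), ("Text",)))  # True
print(fd_holds(teaching, ("Teacher",), ("Course",)))         # False
print(fd_holds(teaching, ("Teacher",), ("Text",)))           # False
```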

Full Functional Dependency

The attribute B is fully functionally dependent on the attribute A if each value of A determines one and only one value of B and, when A is made up of several attributes, no proper subset of A does the same.

• Example: PROJ_NUM → PROJ_NAME

In this case, the attribute PROJ_NUM is known as the determinant attribute and the attribute PROJ_NAME is known as the dependent attribute.

Determinant

Functional Dependency

EmpNum → EmpEmail

The attribute on the LHS is known as the determinant.

• EmpNum is a determinant of EmpEmail

Attribute A determines attribute B (that is, B is functionally dependent on A) if all of the rows in the table that agree in value for attribute A also agree in value for attribute B.

Fully functional dependency & composite key

• A composite or multipart key A is a combination of two or more columns in a table that can be used to uniquely identify each row in the table.

• If attribute B is functionally dependent on a composite key A but not on any subset of that composite key, the attribute B is fully functionally dependent on A.

Transitive dependency

Consider attributes A, B, and C, where

A → B and B → C.

Functional dependencies are transitive, which means that we also have the functional dependency A → C.

We say that C is transitively dependent on A through B.

EmpNum  EmpEmail  DeptNum  DeptName

DeptName is transitively dependent on EmpNum via DeptNum:

EmpNum → DeptNum
DeptNum → DeptName
EmpNum → DeptName
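Transitivity is what makes it possible to derive every attribute a given set determines. A minimal sketch of the usual attribute-closure loop follows, assuming dependencies are written as pairs of Python sets; the function name `closure` is illustrative and not from the slides.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs.

    fds is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already determine all of lhs, we also determine rhs.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# Only the two direct dependencies are listed; EmpNum -> DeptName is derived.
fds = [
    ({"EmpNum"}, {"DeptNum"}),
    ({"DeptNum"}, {"DeptName"}),
]

print(sorted(closure({"EmpNum"}, fds)))   # ['DeptName', 'DeptNum', 'EmpNum']
print(sorted(closure({"DeptNum"}, fds)))  # ['DeptName', 'DeptNum']
```

DeptName appears in the closure of EmpNum even though no listed dependency connects them directly, which is the transitive dependency described above.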

Partial dependency

A partial dependency exists when an attribute B is functionally dependent on an attribute A, and A is a component of a multipart candidate key.

InvNum  LineNum  Qty  InvDate

Candidate keys: {InvNum, LineNum}

InvDate is partially dependent on {InvNum, LineNum}, as InvNum is a determinant of InvDate and InvNum is part of a candidate key.
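The definition can be turned into a small check. This sketch assumes the relation has a single candidate key, so "non-key attribute" simply means "not in that key"; the function name `partial_dependencies` and the set-based encoding of dependencies are assumptions made for illustration.

```python
def partial_dependencies(candidate_key, fds):
    """List FDs whose determinant is a proper subset of the candidate key
    and which determine at least one attribute outside the key."""
    key = frozenset(candidate_key)
    found = []
    for lhs, rhs in fds:
        dependents = set(rhs) - key  # only attributes outside the key matter
        if frozenset(lhs) < key and dependents:
            found.append((sorted(lhs), sorted(dependents)))
    return found


inv_line_key = {"InvNum", "LineNum"}
inv_line_fds = [
    ({"InvNum", "LineNum"}, {"Qty"}),  # full dependency on the whole key
    ({"InvNum"}, {"InvDate"}),         # determinant is only part of the key
]

print(partial_dependencies(inv_line_key, inv_line_fds))
# [(['InvNum'], ['InvDate'])]
```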

First Normal Form

We say a relation is in 1NF if all values stored in the relation are single-valued and atomic.

1NF places restrictions on the structure of relations. Values must be simple.

First normal form (1NF)

A relation (table) should be “flat”.

• the domain of an attribute must include only atomic (simple, indivisible) values and

• the value of any attribute in a tuple must be a single value from the domain of that attribute.

• The only attribute values permitted by 1NF are single atomic (or indivisible) values.

• (No relation within relation)

The following is not in 1NF:

EmpNum  EmpPhone  EmpDegrees
123     233-9876
333     233-1231  BA, BSc, PhD
679     233-1231  BSc, MSc

EmpDegrees is a multi-valued field:
employee 679 has two degrees: BSc and MSc
employee 333 has three degrees: BA, BSc, PhD

To obtain 1NF relations we must, without loss of information, replace the above with two relations (see next slide).

Employee

EmpNum  EmpPhone
123     233-9876
333     233-1231
679     233-1231

EmployeeDegree

EmpNum  EmpDegree
333     BA
333     BSc
333     PhD
679     BSc
679     MSc

A join between Employee and EmployeeDegree will produce the information we saw before.
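A short sketch of the split and the join back, using the rows from the slides. The list-of-degrees representation and the helper name `rebuild` are assumptions made for illustration. One detail worth noticing: employee 123 has no degrees, so the rebuild has to behave like an outer join (keep every Employee row) for that employee to survive.

```python
# The non-1NF rows, with the multi-valued EmpDegrees field held as a list.
unnormalized = [
    {"EmpNum": 123, "EmpPhone": "233-9876", "EmpDegrees": []},
    {"EmpNum": 333, "EmpPhone": "233-1231", "EmpDegrees": ["BA", "BSc", "PhD"]},
    {"EmpNum": 679, "EmpPhone": "233-1231", "EmpDegrees": ["BSc", "MSc"]},
]

# Decompose: one row per employee, and one row per (employee, degree) pair.
employee = [(r["EmpNum"], r["EmpPhone"]) for r in unnormalized]
employee_degree = [(r["EmpNum"], d) for r in unnormalized for d in r["EmpDegrees"]]


def rebuild(employee, employee_degree):
    """Outer-join style rebuild: every Employee row is kept, with or without degrees."""
    degrees = {}
    for emp_num, degree in employee_degree:
        degrees.setdefault(emp_num, []).append(degree)
    return [
        {"EmpNum": num, "EmpPhone": phone, "EmpDegrees": degrees.get(num, [])}
        for num, phone in employee
    ]


print(employee_degree)
# [(333, 'BA'), (333, 'BSc'), (333, 'PhD'), (679, 'BSc'), (679, 'MSc')]
print(rebuild(employee, employee_degree) == unnormalized)  # True
```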

Boyce-Codd Normal Form

BCNF is defined very simply: a relation is in BCNF if it is in 1NF and if every determinant is a candidate key.

Usually BCNF is the target normalization.

Second Normal Form

A relation is in 2NF if it is in 1NF, and every non-key attribute is fully dependent on each candidate key. (That is, we don’t have any partial functional dependency.)

A relation schema R is in 2NF if every nonprime attribute A in R is fully functionally dependent on the primary key of R.

• Relations that are not in BCNF have data redundancies
• A relation in 2NF will not have any partial dependencies

Consider this InvLine table (in 1NF):

LineNum  ProdNum  Qty  InvNum  InvDate

{InvNum, LineNum} → ProdNum, Qty
InvNum → InvDate

InvLine is not 2NF since there is a partial dependency of InvDate on InvNum.

Since there is a determinant that is not a candidate key, InvLine is not BCNF.

Qty is a non-key attribute, and it is dependent on InvNum, but also on ProdNum and LineNum.

InvLine is only in 1NF.

InvLine

LineNum  ProdNum  Qty  InvNum  InvDate

The above relation has redundancies: the invoice date is repeated on each invoice line.

We can improve the database by decomposing the relation into two relations:

LineNum  ProdNum  Qty  InvNum

InvNum  InvDate

Question: What is the highest normal form for these relations? 2NF? 3NF? BCNF?

Is the following relation in 2NF?

inv_no line_no prod_no prod_desc qty

Is the following relation in 2NF?

EmployeeDept

ename  ssn  bdate  address  dnumber  dname

Third Normal Form

• A relation R is in 3NF if the relation is in 1NF and all determinants of non-key attributes are candidate keys. That is, for any functional dependency X → A that holds in R, either (a) X is a candidate key of R, or (b) A is a prime attribute.

• This definition of 3NF differs from BCNF only in the specification of non-key attributes: 3NF is weaker than BCNF. (BCNF requires all determinants to be candidate keys.)

• A relation in 3NF will not have any transitive dependencies of a non-key attribute on a candidate key through another non-key attribute.

Consider this Employee relation:

EmpNum  EmpName  DeptNum  DeptName

Candidate keys are? …

EmpName, DeptNum, and DeptName are non-key attributes.

DeptNum determines DeptName, a non-key attribute, and DeptNum is not a candidate key.

Is the relation in 3NF? … no
Is the relation in 2NF? … yes
Is the relation in BCNF? … no

EmpNum  EmpName  DeptNum  DeptName

We correct the situation by decomposing the original relation into two 3NF relations. Note the decomposition is lossless.

EmpNum  EmpName  DeptNum

DeptNum  DeptName

Verify these two relations are in 3NF.

student_no  course_no  instr_no

Instructor teaches one course only.
Student takes a course and has one instructor.

In 3NF, but not in BCNF:

{student_no, course_no} → instr_no
instr_no → course_no

Since we have instr_no → course_no, but instr_no is not a candidate key, the relation is not BCNF.
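The claim "in 3NF, but not in BCNF" can be reproduced with a brute-force sketch. It assumes the two listed dependencies are the only ones that hold, tests only those listed dependencies, and treats "candidate key" in the BCNF test as "superkey" (a determinant whose closure is the whole relation). All function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by attrs under the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Brute force: minimal attribute sets whose closure is the whole relation."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            combo = set(combo)
            if any(k <= combo for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == attributes:
                keys.append(combo)
    return keys


def normal_form_report(attributes, fds):
    """Return (candidate keys, in_3nf, in_bcnf) for the listed dependencies."""
    attributes = set(attributes)
    keys = candidate_keys(attributes, fds)
    prime = set().union(*keys)
    in_3nf = in_bcnf = True
    for lhs, rhs in fds:
        if closure(lhs, fds) == attributes:
            continue  # determinant is a superkey: acceptable in both forms
        for attr in rhs - lhs:  # skip the trivial part of the dependency
            in_bcnf = False
            if attr not in prime:
                in_3nf = False  # non-prime attribute with a non-key determinant
    return keys, in_3nf, in_bcnf


attributes = {"student_no", "course_no", "instr_no"}
fds = [
    ({"student_no", "course_no"}, {"instr_no"}),
    ({"instr_no"}, {"course_no"}),
]
keys, in_3nf, in_bcnf = normal_form_report(attributes, fds)

print([sorted(k) for k in keys])
# [['course_no', 'student_no'], ['instr_no', 'student_no']]
print(in_3nf, in_bcnf)  # True False
```

The relation has two overlapping candidate keys, so every attribute is prime. That is why instr_no → course_no is tolerated by 3NF (course_no is prime) yet violates BCNF (instr_no is not a key).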

student_no  course_no  instr_no

is decomposed into two relations:

student_no  instr_no

{student_no, instr_no} → student_no
{student_no, instr_no} → instr_no

course_no  instr_no

instr_no → course_no

Is the following relation in 3NF? BCNF?

EmployeeDept

ename ssn bdate address dnumber dname

EmployeeDept

ename  ssn  bdate  address  dnumber  dname

Not 3NF nor BCNF, since dnumber is not a candidate key and we have:

dnumber → dname.

InvLine (original)

LineNum  ProdNum  Qty  InvNum  InvDate

2NF

LineNum  ProdNum  Qty  InvNum

InvNum  InvDate

Question: What is the highest normal form for these relations? 2NF? 3NF? BCNF?

InvNum  ProdNum  Qty

Conversion to First Normal Form

• A relational table must not contain repeating groups.

• A repeating group derives its name from the fact that a group of multiple entries of the same type can exist for any single key attribute occurrence.

• If repeating groups do exist, they must be eliminated by making sure that each row defines a single entity.

• 1NF starts with a simple three-step procedure:

Step 1: Eliminate the Repeating Groups:

• Represent the data in a tabular format, where each cell has a single value and there are no repeating groups.

• To eliminate repeating groups: eliminate the NULLs by making sure that each repeating group contains an appropriate data value.

1NF violation

PROJ_NUM  PROJ_NAME     EMP_NUM  EMP_NAME              JOB_CLASS             CHG_HOUR  HOURS
15        Evergreen     103      June E. Arbough       Elect. Engineer       84.5      23.8
                        101      John G. News          Database Designer     105       19.4
                        105      Alice K. Johnson      Database Designer     105       35.7
                        106      William Smithfield    Programmer            35.75     12.6
                        102      David H. Senior       Systems Analyst       96.75     23.6
18        Amber Wave    114      Annelisse Jones       Application Designer  48.1      24.6
                        118      James J. Frommer      General Support       18.36     45.3
                        104      Anne K. Romoras       Systems Analyst       96.75     32.4
                        112      Darlene M. Smithson   DSS Analyst           45.95     44
22        Rolling Tide  105      Alice K. Johnson      Database Designer     105       64.7
                        104      Anne K. Romoras       Systems Analyst       96.75     48.4
                        113      Delbert K. Joenbrood  Application Designer  48.1      23.6
                        111      Geoff B. Wabash       Clerical Support      26.87     22
                        106      William Smithfield    Programmer            35.75     12.8
25        Starflight    107      Maria D. Alonzo       Programmer            35.75     24.6
                        115      Travis B. Bawangi     Systems Analyst       96.75     45.8
                        101      John G. News          Database Designer     105       56.3
                        114      Annelisse Jones       Application Designer  48.1      33.1
                        108      Ralph B. Washington   Systems Analyst       96.5      23.6
                        118      James J. Frommer      General Support       18.36     30.5
                        112      Darlene M. Smithson   DSS Analyst           45.95     41.4

Step 1: Eliminating Repetitions

PROJ_NUM  PROJ_NAME     EMP_NUM  EMP_NAME              JOB_CLASS             CHG_HOUR  HOURS
15        Evergreen     103      June E. Arbough       Elect. Engineer       84.5      23.8
15        Evergreen     101      John G. News          Database Designer     105       19.4
15        Evergreen     105      Alice K. Johnson      Database Designer     105       35.7
15        Evergreen     106      William Smithfield    Programmer            35.75     12.6
15        Evergreen     102      David H. Senior       Systems Analyst       96.75     23.6
18        Amber Wave    114      Annelisse Jones       Application Designer  48.1      24.6
18        Amber Wave    118      James J. Frommer      General Support       18.36     45.3
18        Amber Wave    104      Anne K. Romoras       Systems Analyst       96.75     32.4
18        Amber Wave    112      Darlene M. Smithson   DSS Analyst           45.95     44
22        Rolling Tide  105      Alice K. Johnson      Database Designer     105       64.7
22        Rolling Tide  104      Anne K. Romoras       Systems Analyst       96.75     48.4
22        Rolling Tide  113      Delbert K. Joenbrood  Application Designer  48.1      23.6
22        Rolling Tide  111      Geoff B. Wabash       Clerical Support      26.87     22
22        Rolling Tide  106      William Smithfield    Programmer            35.75     12.8
25        Starflight    107      Maria D. Alonzo       Programmer            35.75     24.6
25        Starflight    115      Travis B. Bawangi     Systems Analyst       96.75     45.8
25        Starflight    101      John G. News          Database Designer     105       56.3
25        Starflight    114      Annelisse Jones       Application Designer  48.1      33.1
25        Starflight    108      Ralph B. Washington   Systems Analyst       96.5      23.6
25        Starflight    118      James J. Frommer      General Support       18.36     30.5
25        Starflight    112      Darlene M. Smithson   DSS Analyst           45.95     41.4
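Filling the blank PROJ_NUM and PROJ_NAME cells is a plain "fill down" operation. A minimal sketch follows, assuming blank cells are represented as `None`; only a few rows and columns from the table are reproduced, and the helper name `fill_down` is hypothetical.

```python
def fill_down(rows, columns):
    """Copy the most recent non-blank value of each listed column into blank cells.

    Assumes the first row is complete; a blank first row raises KeyError.
    """
    last = {}
    filled = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        for col in columns:
            if new_row.get(col) is None:
                new_row[col] = last[col]
            else:
                last[col] = new_row[col]
        filled.append(new_row)
    return filled


# A few rows from the report, with repeating-group blanks written as None.
report = [
    {"PROJ_NUM": 15, "PROJ_NAME": "Evergreen", "EMP_NUM": 103, "HOURS": 23.8},
    {"PROJ_NUM": None, "PROJ_NAME": None, "EMP_NUM": 101, "HOURS": 19.4},
    {"PROJ_NUM": None, "PROJ_NAME": None, "EMP_NUM": 105, "HOURS": 35.7},
    {"PROJ_NUM": 18, "PROJ_NAME": "Amber Wave", "EMP_NUM": 114, "HOURS": 24.6},
    {"PROJ_NUM": None, "PROJ_NAME": None, "EMP_NUM": 118, "HOURS": 45.3},
]

flat = fill_down(report, ["PROJ_NUM", "PROJ_NAME"])
for row in flat:
    print(row["PROJ_NUM"], row["PROJ_NAME"], row["EMP_NUM"], row["HOURS"])
```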

Step 2: Identify the Primary Key:

• A proper Primary Key must uniquely identify every row, and therefore every attribute value in that row.

• PROJ_NUM alone does not: the value 15 identifies any one of 5 employees.

• EMP_NUM alone can also identify multiple rows, since one employee can work in more than one project.

• In this case, the only primary key possible is a combination of PROJ_NUM and EMP_NUM.

PROJECT(PROJ_NUM, EMP_NUM, PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS)
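The reasoning in Step 2 can be checked against the data. The sketch below uses the (PROJ_NUM, EMP_NUM) pairs from the 1NF table; the helper name `is_unique` is an assumption. Uniqueness in sample data does not prove that a column set is a key, but a duplicate is enough to rule one out.

```python
def is_unique(rows, columns):
    """True if no two rows share the same combination of values in columns."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False
        seen.add(key)
    return True


# (PROJ_NUM, EMP_NUM) pairs copied from the 1NF table.
pairs = [
    (15, 103), (15, 101), (15, 105), (15, 106), (15, 102),
    (18, 114), (18, 118), (18, 104), (18, 112),
    (22, 105), (22, 104), (22, 113), (22, 111), (22, 106),
    (25, 107), (25, 115), (25, 101), (25, 114), (25, 108), (25, 118), (25, 112),
]
rows = [{"PROJ_NUM": p, "EMP_NUM": e} for p, e in pairs]

print(is_unique(rows, ["PROJ_NUM"]))             # False
print(is_unique(rows, ["EMP_NUM"]))              # False
print(is_unique(rows, ["PROJ_NUM", "EMP_NUM"]))  # True
```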

Step 3: Identify all dependencies:

• (PROJ_NUM, EMP_NUM) → PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS

• Additional dependencies:

• PROJ_NUM → PROJ_NAME

• EMP_NUM → EMP_NAME, JOB_CLASS, CHG_HOUR

• JOB_CLASS → CHG_HOUR

The last dependency exists between two nonprime attributes, which signals a transitive dependency.

Conversion to Second Normal Form

• Conversion to 2NF only occurs when the 1NF has a composite primary key.

• If the 1NF table has a single-attribute primary key, then the table is automatically 2NF.

• 2NF starts with a simple two-step procedure:

Step 1: Make new tables to Eliminate Partial Dependencies

• For each component of the primary key that acts as a determinant in a partial dependency, create a new table with a copy of that component as the primary key.

• It is also important that the determinant attributes remain in the original table, because they will be the foreign keys that relate the new tables to the original one.

PROJ_NUM    EMP_NUM

Step 2: Reassign Corresponding Dependent Attributes

• Determine all attributes that are dependent in the partial dependencies. These are removed from the original table and placed in the new table with their determinant.

• Any attributes that are not dependent in a partial dependency will remain in the original table.

PROJ_NUM → PROJ_NAME

PROJ_NUM  PROJ_NAME
15        Evergreen
18        Amber Wave
22        Rolling Tide
25        Starflight

Step 1: 2NF

EMP_NUM → EMP_NAME, JOB_CLASS, CHG_HOUR

EMP_NUM  EMP_NAME              JOB_CLASS             CHG_HOUR
103      June E. Arbough       Elect. Engineer       84.5
101      John G. News          Database Designer     105
105      Alice K. Johnson      Database Designer     105
106      William Smithfield    Programmer            35.75
102      David H. Senior       Systems Analyst       96.75
114      Annelisse Jones       Application Designer  48.1
118      James J. Frommer      General Support       18.36
104      Anne K. Romoras       Systems Analyst       96.75
112      Darlene M. Smithson   DSS Analyst           45.95
105      Alice K. Johnson      Database Designer     105
104      Anne K. Romoras       Systems Analyst       96.75
113      Delbert K. Joenbrood  Application Designer  48.1
111      Geoff B. Wabash       Clerical Support      26.87
106      William Smithfield    Programmer            35.75
107      Maria D. Alonzo       Programmer            35.75
115      Travis B. Bawangi     Systems Analyst       96.75
101      John G. News          Database Designer     105
114      Annelisse Jones       Application Designer  48.1
108      Ralph B. Washington   Systems Analyst       96.5
118      James J. Frommer      General Support       18.36
112      Darlene M. Smithson   DSS Analyst           45.95

ASSIGNMENT(PROJ_NUM, EMP_NUM, ASSIGN_HOURS)

PROJ_NUM  EMP_NUM  ASSIGN_HOURS
15        103      23.8
15        101      19.4
15        105      35.7
15        106      12.6
15        102      23.6
18        114      24.6
18        118      45.3
18        104      32.4
18        112      44
22        105      64.7
22        104      48.4
22        113      23.6
22        111      22
22        106      12.8
25        107      24.6
25        115      45.8
25        101      56.3
25        114      33.1
25        108      23.6
25        118      30.5
25        112      41.4

2NF Form

Now, we have 3 tables:

• PROJECT(PROJ_NUM, PROJ_NAME)

• EMPLOYEE(EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)

• ASSIGNMENT(PROJ_NUM, EMP_NUM, ASSIGN_HOURS)
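The three 2NF tables are projections of the 1NF table onto the attributes of each dependency. A minimal sketch follows, using a five-row subset of the 1NF data; the helper name `project` is hypothetical. The slides rename HOURS to ASSIGN_HOURS in ASSIGNMENT, while the sketch keeps the original column name for the projection.

```python
COLUMNS = ("PROJ_NUM", "PROJ_NAME", "EMP_NUM", "EMP_NAME", "JOB_CLASS", "CHG_HOUR", "HOURS")

# A subset of the 1NF rows (not the full table).
rows_1nf = [
    (15, "Evergreen", 101, "John G. News", "Database Designer", 105, 19.4),
    (15, "Evergreen", 105, "Alice K. Johnson", "Database Designer", 105, 35.7),
    (22, "Rolling Tide", 105, "Alice K. Johnson", "Database Designer", 105, 64.7),
    (25, "Starflight", 101, "John G. News", "Database Designer", 105, 56.3),
    (25, "Starflight", 114, "Annelisse Jones", "Application Designer", 48.1, 33.1),
]


def project(rows, columns, keep):
    """Relational projection: keep the named columns and drop duplicate rows."""
    positions = [columns.index(c) for c in keep]
    result = []
    for row in rows:
        projected = tuple(row[i] for i in positions)
        if projected not in result:
            result.append(projected)
    return result


project_table = project(rows_1nf, COLUMNS, ("PROJ_NUM", "PROJ_NAME"))
employee_table = project(rows_1nf, COLUMNS, ("EMP_NUM", "EMP_NAME", "JOB_CLASS", "CHG_HOUR"))
assignment_table = project(rows_1nf, COLUMNS, ("PROJ_NUM", "EMP_NUM", "HOURS"))

print(project_table)          # [(15, 'Evergreen'), (22, 'Rolling Tide'), (25, 'Starflight')]
print(len(employee_table))    # 3
print(len(assignment_table))  # 5
```

Because the projection drops duplicates, each project name and each employee's job data is stored once, which removes the redundancy the partial dependencies caused.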

Conversion to Third Normal Form

• Step 1: Make new tables to eliminate transitive dependencies.

• For every transitive dependency, write a copy of its determinant as a primary key for a new table.

• It is also important that the determinant remains in the original table to serve as a foreign key.

JOB_CLASS → CHG_HOUR

Determinant (primary key of the new table): JOB_CLASS

Step 2: 3NF

• Identify the attributes that are dependent on each determinant and place them in the new tables with their determinant and remove them from their original table.

In our example:

1. Move CHG_HOUR to new table

2. Remove CHG_HOUR from EMPLOYEE

3. Now: EMP_NUM → EMP_NAME, JOB_CLASS

Step 2: 3NF

JOB_CLASS             CHG_HOUR
General Support       18.36
Clerical Support      26.87
Programmer            35.75
DSS Analyst           45.95
Application Designer  48.1
Elect. Engineer       84.5
Systems Analyst       96.5
Systems Analyst       96.75
Database Designer     105
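As transcribed, this table lists Systems Analyst with two different rates, so the dependency JOB_CLASS → CHG_HOUR does not hold in the sample data until that conflict is resolved. A small sketch for spotting such conflicts follows; the helper name `fd_conflicts` is hypothetical, and the rows are copied from the table as it appears in this transcript.

```python
from collections import defaultdict


def fd_conflicts(pairs):
    """Group dependent values by determinant; return determinants with more than one."""
    values = defaultdict(set)
    for determinant, dependent in pairs:
        values[determinant].add(dependent)
    return {d: sorted(v) for d, v in values.items() if len(v) > 1}


# (JOB_CLASS, CHG_HOUR) rows as listed in the table.
job_rows = [
    ("General Support", 18.36),
    ("Clerical Support", 26.87),
    ("Programmer", 35.75),
    ("DSS Analyst", 45.95),
    ("Application Designer", 48.1),
    ("Elect. Engineer", 84.5),
    ("Systems Analyst", 96.5),
    ("Systems Analyst", 96.75),
    ("Database Designer", 105),
]

print(fd_conflicts(job_rows))  # {'Systems Analyst': [96.5, 96.75]}
```

This is the kind of integrity problem that redundancy allows: while CHG_HOUR was stored on every employee row, two rows for the same job class could disagree.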

BCNF by making JOB_CLASS a key

JOB_CLASS             CHG_HOUR
Elect. Engineer       84.5
Database Designer     105
Programmer            35.75
Systems Analyst       96.75
Application Designer  48.1
General Support       18.36
DSS Analyst           45.95
Clerical Support      26.87
Mechanical Engineer   67.9
Civil Engineer        55.78
Bio Technician        34.55

Step 2: 3NF

EMP_NUM  EMP_NAME              JOB_CLASS
103      June E. Arbough       Elect. Engineer
101      John G. News          Database Designer
105      Alice K. Johnson      Database Designer
106      William Smithfield    Programmer
102      David H. Senior       Systems Analyst
114      Annelisse Jones       Application Designer
118      James J. Frommer      General Support
104      Anne K. Romoras       Systems Analyst
112      Darlene M. Smithson   DSS Analyst
113      Delbert K. Joenbrood  Application Designer
111      Geoff B. Wabash       Clerical Support
107      Maria D. Alonzo       Programmer
115      Travis B. Bawangi     Systems Analyst
108      Ralph B. Washington   Systems Analyst

BCNF conversion

• So now our design becomes:

• PROJECT(PROJ_NUM, PROJ_NAME)

• EMPLOYEE(EMP_NUM, EMP_NAME, JOB_CLASS)

• JOB(JOB_CLASS, CHG_HOUR)

• ASSIGNMENT(PROJ_NUM, EMP_NUM, ASSIGN_HOURS)

• Performance tip: Introduce a surrogate key to act as a numeric key in the JOB relation.

JOB_ID  JOB_CLASS             CHG_HOUR
500     Elect. Engineer       84.5
501     Database Designer     105
502     Programmer            35.75
503     Systems Analyst       96.75
504     Application Designer  48.1
505     General Support       18.36
506     DSS Analyst           45.95
507     Clerical Support      26.87
508     Mechanical Engineer   67.9
509     Civil Engineer        55.78
510     Bio Technician        34.55

After BCNF conversion

• So now our design becomes:

• PROJECT(PROJ_NUM, PROJ_NAME)

• EMPLOYEE(EMP_NUM, EMP_NAME, JOB_ID)

• JOB(JOB_ID, JOB_CLASS, CHG_HOUR)

• ASSIGNMENT(PROJ_NUM, EMP_NUM, ASSIGN_HOURS)
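One way to try out the final design is to create it in an in-memory SQLite database through Python's standard `sqlite3` module. This is a sketch under assumptions: the column types, the NOT NULL constraints, and the UNIQUE constraint on JOB_CLASS are choices made here and are not specified in the slides. The sample values come from the tables above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE PROJECT (
    PROJ_NUM  INTEGER PRIMARY KEY,
    PROJ_NAME TEXT NOT NULL
);
CREATE TABLE JOB (
    JOB_ID    INTEGER PRIMARY KEY,       -- surrogate key
    JOB_CLASS TEXT NOT NULL UNIQUE,      -- assumption: still a candidate key
    CHG_HOUR  REAL NOT NULL
);
CREATE TABLE EMPLOYEE (
    EMP_NUM  INTEGER PRIMARY KEY,
    EMP_NAME TEXT NOT NULL,
    JOB_ID   INTEGER NOT NULL REFERENCES JOB (JOB_ID)
);
CREATE TABLE ASSIGNMENT (
    PROJ_NUM     INTEGER NOT NULL REFERENCES PROJECT (PROJ_NUM),
    EMP_NUM      INTEGER NOT NULL REFERENCES EMPLOYEE (EMP_NUM),
    ASSIGN_HOURS REAL NOT NULL,
    PRIMARY KEY (PROJ_NUM, EMP_NUM)      -- composite key from Step 2
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript(SCHEMA)

conn.execute("INSERT INTO PROJECT VALUES (15, 'Evergreen')")
conn.execute("INSERT INTO JOB VALUES (501, 'Database Designer', 105)")
conn.execute("INSERT INTO EMPLOYEE VALUES (101, 'John G. News', 501)")
conn.execute("INSERT INTO ASSIGNMENT VALUES (15, 101, 19.4)")

# Joining the four tables rebuilds one row of the original 1NF table.
QUERY = """
SELECT p.PROJ_NAME, e.EMP_NAME, j.JOB_CLASS, j.CHG_HOUR, a.ASSIGN_HOURS
FROM ASSIGNMENT a
JOIN PROJECT p ON p.PROJ_NUM = a.PROJ_NUM
JOIN EMPLOYEE e ON e.EMP_NUM = a.EMP_NUM
JOIN JOB j ON j.JOB_ID = e.JOB_ID
"""
rows = conn.execute(QUERY).fetchall()
print(rows)
```

Each fact is now stored once: changing the hourly charge for a job class is a single-row update in JOB rather than an update to every matching employee row.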

Normal Form: Test and Remedy (Normalization)

First (1NF)
Test: Relation should have no multivalued attributes or nested relations.
Remedy: Form new relations for each multivalued attribute or nested relation.

Second (2NF)
Test: For relations where the primary key contains multiple attributes, no nonkey attribute should be functionally dependent on a part of the primary key.
Remedy: Decompose and set up a new relation for each partial key with its dependent attribute(s). Make sure to keep a relation with the original primary key and any attributes that are fully functionally dependent on it.

Third (3NF)
Test: Relation should not have a nonkey attribute functionally determined by another nonkey attribute (or by a set of nonkey attributes). That is, there should be no transitive dependency of a nonkey attribute on the primary key.
Remedy: Decompose and set up a relation that includes the nonkey attribute(s) that functionally determine(s) other nonkey attribute(s).

BCNF
Test: Every determinant should be a candidate key.
Remedy: Decompose so that each determinant that is not a candidate key becomes the key of a new relation.