unit iii dbms

1

CHAPTER 8NORMALIZATION OF RELATIONAL SCHEMA

Normalization of a Relation Schema

Normalization of a Relation Schema refers to the process of:-

(a) Identifying those data dependencies in the schema, which would cause anomalies during Insert, Update and Delete of data.

(b) And decomposing the schema into a set of sub-schemas, on the basis of

dependencies, causing anomalies.

The resulting schemas would permit representation of the intended information, while maintaining the data redundancies to a minimal.

The process of normalization can be well understood only after understanding the concept of various types of data dependencies occurring in a database.

Concept of Functional Dependencies (FDs)

Let there be a Relation Schema R, comprising sets of attributes , & i.e. R, R & R. A Functional Dependency (read as “ determines ”) is said to be holding on the Schema R, if for every legal relation r(R) and for every tuple- pair {t1,t2} r, if t1[] = t2[] then it must satisfy t1[] = t2[]. This means that if any two tuples of relation r(R) agree on the values of , then the two tuples must agree on the values of .

, the “Left Side” of FD is called “Determinant”; and , the “Right Side” of the FD is called “Dependent”.

Example: The statement that “Knowing the Registration-number of a Student, we can determine his/her Name” implies FD Registration-number Name. Here, the Registration-number is the Determinant of the FD and Name is the Dependent.

FDs holding on a Schema R

A set of FDs F is said to be holding on a schema R, if all FDs in F are satisfied by every legal relation r(R).

Suppose, there is a Relational Schema “Student” having the following FDs holding on it:-

{Roll_No} Registration_No, Branch, Section, Name, Father’s-name, Address, DOB{Registarion_No} Roll_No, Branch, Section, Name, Father’s-name, Address, DOB{Name, Father_Name, DOB, Address} Roll_No, Registration_No, Section, Branch

2

Trivial Functional Dependency

An FD is called Trivial, if . It is called trivial, because such an FD would be satisfied by any relation r(R). Since, if two tuples of a relation agree on the values of , the tuples will definitely agree on the values of , since is a subset of .

For Example, {Roll_No, Name} Name represents a trivial FD.

Extraneous Attributes on the “Left Side” of an FD

Suppose an FD is holding on a Schema R. An attribute A (A ) is said to be extraneous, if FD ( - A) also holds on R. This means that attribute A is not required in to determine the value of . The subset ( - A) is sufficient to determine the value of .

Left-Irreducible FD

An FD , holding on a schema R, is said to be left-irreducible, if there exists no proper subset of , which can determine i.e. there exists no attribute A (A ) such that ( - A) holds on R.

Super Key (SK) of a Relation Schema

Suppose FD K R holds on schema R, where K R, then K forms a Super Key (SK) of Schema R. Super-set of a SK will also form a SK.

Example: Suppose, we have a Schema Student (Roll_No, Registration_No, Branch, Section, Name, Fathers-name, Address, DOB) and suppose the value of {Roll-no} is distinct and also the value of {Registration_No} is distinct for each student. Thus, {Roll-no, Class} will form a Super-Key of Schema Student. Similarly, {Roll-no, Class, Name} will also form a Super-Key of Student. Similarly, there can be many Super Keys of Student.

Candidate Key (CK) of a Relation Schema R

Since superset of a Super Key of R will also be its Super Key. This implies that a Super Key may be containing some extraneous attributes. Suppose K is a Super Key of R and E is the complete set of extraneous attributes contained in K, then (K-E) will form a minimal Super Key of R. No proper subset of (K-E) will form a Super Key of R. This minimal Super Key is called a Candidate Key of R. Alternatively, it can be stated that if K R forms a left-irreducible FD holding on R, then K is called a Candidate Key of R.

3

For example, {Roll_No} and {Registration_No} will form Candidate Keys of schema “Student”. In addition, {Name, Fathers_Name, Address, DOB} also forms another Candidate Key.

Primary-Key

A Relation Schema R may have more than one Candidate Keys. One of the Candidate Keys is chosen as primary means to identify tuples uniquely in a relation r(R). This designated candidate key is called a Primary Key of R. Out of the three candidate keys in the above example, we may select {Roll-No} as Primary Key of the Schema Student. Prime Attributes or Key Attributes of a Relational Schema An attribute A of Relational Schema R (AR) is said to be a Prime Attribute or Key Attribute, if it forms part of any of its Candidate Keys. Let {K1, K2, …. Kn} be the complete set of candidate keys of R. Then the set of Prime Attributes of R = K’ = K1 K2 …. Kn

Non-Prime Attributes of a Relational Schema The subset of R, which does not form part of any of its Candidate Keys, is called the set its Non-Prime Attributes or Non-Key Attributes.

Logically implied FDs

Suppose F is the set of FDs holding on a Relation Schema R. There may be some FDs that can be logically inferred from F. These inferred FDs will also hold on every legal relation r(R). The set of such FDs, inferred from set F, is said to logically implied by F.

Rules for the Inference of FDs

Suppose , , and are subsets of attributes of a Relation Schema R i.e. R and R, R and R

Armstrong’s Rules The inference rules 1..3 are Armstrong’s Rules:- Rule 1 (Reflexivity Rule) If then holds on R.

Rule 2 (Augmentation Rule) If holds on R, then holds on R.

Rule 3 (Transitivity Rule) If and hold on R, then holds on R.

Additional Rules of Inference

4

Rule 4 (Union Rule) If and hold on R, then holds on R.

Rule 5 (Decomposition Rule) If holds on R, then and hold on R.

Rule 6 (Pseudo-Transitivity Rule) If and hold on R, then holds on R.

Trivial FD An FD is said to be trivial, if i.e. if the determinant is a superset of dependent.

Example: The Functional Dependency conveyed by the statement, “ Knowing the Name and Address of a Person, we can determine his Address “ i.e. (Name, Address) Address is obviously trivial.

Closure of FD Set

Suppose F is a set of FDs that holds on a Schema R, then the Closure of F, denoted by F +, is the complete set of FDs, that includes F and all the FDs that are logically implied (inferred) by F. Any legal relation r(R) that satisfies F will also satisfy F+.

Algorithm to determine Closure F + of an FD Set F

F+ = F;Repeat

Save-F+ = F+;To each FD f1 F+, apply Armstrong’s Rule of Reflexivity; and add the FDs so inferred to F+;To each FD f1 F+, apply Armstrong’s Rule of Augmentation; and add the FDs so inferred to F+;To each such pair of FDs as { } F+, apply the Armstrong’s Rule of Transitivity, and add FD to F+;

Until (F+ = Save-F+);

Cover of an FD Set

An FD set G is said to be the cover of another FD set F, if F G+ i.e. all the FDs that are there in F are also there in the Closure of set G.

Equivalent Sets

Two FD sets F and G are said to be equivalent sets, if both form cover of each other i.e F G+ and G F+, which implies F+ = G+. This means that two sets F and G are equivalent, if their closures are equal.

5

Extraneous FDs in a set

An FD f F is said to be extraneous, if its exclusion from F does not affect the Closure of F i.e. {F-f}+ = F+. Such FDs in a set are logically implied by other FDs in the set.

Example:-

Suppose we have an FD set F :{ , and } then is said to be extraneous FD, since it can be inferred by applying Armstrong’s Transitivity Rule to the other two FDs in the set. The extraneous FD can be eliminated from the set, without affecting Closure of the set.

Extraneous attributes in the determinant of an FD

Suppose, we have an FD , that holds on a schema R. An attribute A is said to be extraneous, if ( - A) also holds on R.

Left-irreducible FDs

The left-side on an FD (i.e. its determinant) is said to be irreducible, if it does not contain any extraneous attributes. Such FDs are known as left-irreducible FDs.

If R is a left-irreducible FD holding on the schema R, then forms a candidate Key of R.

Canonical Cover of an FD Set

An FD set Fc is said to be Minimal Cover or Canonical Cover of FD set F, iff F Fc+ and

it satisfies the following three conditions:-

(a) Each FD in Fc is in a Canonical Form i.e. has only one attribute on its right side.

(b) No FD in Fc has any extraneous attributes on its left side i.e. all the FDs in Fc

are left-irreducible.

(c) None of the FDs in Fc is extraneous i.e. no FD in Fc is logically implied by the other FDs in the set Fc . This implies than no FD can be removed from the set Fc without changing its closure.

How does Canonical Cover of an FD set help to reduce the DBMS overheads?

Suppose F is the set of FDs holding on a schema R, then a relation r(R) would be legal only if satisfies all the FDs in set F. Now, to determine whether a relation r(R) is legal or not, DBMS has to check for the satisfaction of all the FDs in set F. On the other hand, if

6

we determine a minimal Cover Fc of F, then we have to check for the satisfaction of a much smaller set of FDs, since, a relation r(R) that satisfies FD set Fc will also satisfy FD set F, since both are equivalent sets. This will reduce DBMS overheads.

Can an FD Set have more than one Canonical Covers?

Yes, an FD set F can have more than one Canonical Covers, but all of those sets would be equivalent to each other; and in turn equivalent to F.

Example:

Determine Canonical Cover of an FD Set {A BCD, B CDA, C ABD}

Step 1: Covert the FDs to their canonical form i.e. by equivalent sets of FDs, having only single attributes on their right side Fc : { A B, A C, A D, B C, B D, B A, C A, C B, C D}

Step2: Remove extraneous attributes from the left side of all FDs. Here, all FDs have only one attribute on its left side, thus cannot contain any extraneous attribute.Fc : { A B, A C, A D, B C, B D, B A, C A, C B, C D}

Step3: Remove extraneous FDs from the set. A B is implied by A C, C B; so A B can be eliminated from the set.So, now Fc:{A C, A D, B C, B D, B A, C A, C B, C D}

A D is implied by A C and C D; so A C can now be eliminated.So, now Fc:{A C, B C, B D, B A, C A, C B, C D}

B C is implied by B A and A C; so B C can now be eliminatedSo, now Fc:{A C, B D, B A, C A, C B, C D}

C D is implied by C B and B D; so C D can now be eliminatedSo, now Fc:{A C, B D, B A, C A, C B}

C A is implied by C B and B A; so C A can now be eliminatedSo, now Fc:{A C, B D, B A, C B}

No more FDs can be eliminated; so {A C, B D, B A, C B}forms the Canonical Cover of F.

In the beginning of Step3 above, had we eliminated A C, since it was implied by A B and B C. Then, the Canonical Cover would have been different. So, the Canonical Cover of an FD Set need not be unique.

Closure of an Attribute Set under F

7

Suppose is a sub-set of Schema R i.e. R. And suppose F is the set of FDs holding on Schema R. Then, the Closure of Attribute Set , denoted by +, is the complete set of attributes that can be determined by under the FD set F.

Algorithm to determine Closure of an Attribute Set under a set of FDs

Let schema R and F be the set of FDs holding on schema R.The closure of i.e. + under F can be determined as follows:-

+ = ;Repeat

Save-+ = +;For each FD ( ) F if + then + = + ;

Until (Save-+ == + ) ;The Concept of “Attribute Set Closure” can be used to determine the following:-

(a) Whether an Attribute Set is a Super Key of Relation Schema R (where R)

Determine + under F.If + equals R, then is a Super-Key of R.

(b) Whether holds on R(where R and R)

Determine + under F.If +, then holds on R.

(c) To determine Closure of the FD Set F i.e F +

F+ := F;For each FD in F+

Begin Determine + under F+;For each +

holds; Thus include it in F+

i.e. F+ := F+ { };End;

Loss-Less-Join Decomposition of a Relation Schema RDecomposition of a relation r(R) into r1(R1) and r2(R2) (such that R1 R2 = R) is said to be a loss-less-join decomposition, if it satisfies r1 * r2 = r i.e. natural join of r1 and r2

should generate r, with no tuples eliminated and with no new tuples added. Such a decomposition is also called Non-Additive Decomposition.

Example:-

8

Case IConsider the following relation r on schema R (A,B,C) and its decomposition into r1

and r2 . r(R)

A B C

A1 B1 C1

A2 B2 C1

A1 B1 C2

A3 B2 C3

A1 B1 C3

A2 B2 C4

r1(R1)

A BA1 B1

A2 B2

A3 B2

r2(R2)A C

A1 C1

A2 C1

A1 C2

A3 C3

A1 C3

A2 C4

r1 * r2A B C

A1 B1 C1

A1 B1 C2

A1 B1 C3

A2 B2 C1

A2 B2 C4

A3 B2 C3

It is a loss-less-join-decomposition, since r1 * r2 = r

9

Case IINow consider the following decomposition of r into r1 and r2 .

r(R)A B C

A1 B1 C1

A2 B2 C1

A1 B2 C2

A3 B2 C3

A1 B1 C3

A2 B1 C4

r1 (R1)A BA1 B1

A2 B2

A1 B2

A3 B2

A2 B1

r2 (R2)A C

A1 C1

A2 C1

A1 C2

A3 C3

A1 C3

A2 C4

r1 * r2A B C

A1 B1 C1

A1 B1 C2

A1 B1 C3

A2 B2 C1

A2 B2 C4

A1 B2 C1

A1 B2 C2

A1 B2 C3

A3 B2 C3

A2 B1 C1

10

A2 B1 C4

It is NOT a loss-less-join-decomposition, since r1 * r2 r Here r1 * r2 contains five additional tuples (shown in italics and underlined), which are not found in r.

Necessary Condition for a Decomposition to be loss-less –join decomposition

A decomposition of Relation Schema R into R1 and R2 (such that R1 R2 =R) will be a Loss-Less-Join (Non-Additive) Decomposition, if the common attributes of R1 and R2

(i.e. R1 R2) form candidate key of either R1 or R2 or both.i.e. R1 R2 R1

ORR1 R2 R2

In the above example, in Case I, FD A B holds on the Schema R which can be ascertained from the data represented in relation r of Case I. Thus, the common attribute of r1 and r2 i.e. {A} forms a Primary key of both R1 and R2. That is why the decomposition is a loss-less-join decomposition; whereas in Case II, such a condition does not hold; and that is why the decomposition is a Lossy (Additive) decomposition.Why is it desirable that a decomposition should be dependency-preserving:-

For any decomposition, it is mandatory that it should be a loss-less-join decomposition to ensure consistency of database; and also desirable (not mandatory) that it should be dependency-preserving, for the following reason:-

If each FD in F+ is preserved in at least one of the decompositions, then the satisfaction of each FD can be verified in a single relation itself; else it would require natural-join of more than one relations to verify some of the FDs, which are not preserved in the decomposition. A natural join operation to ascertain compliance of FDs would be too costly in terms of execution time and memory requirements. But, sometimes while normalizing a schema, it may not be possible to preserve all the FDs. But, ensuring that a decomposition is a loss-less-join decomposition is mandatory, since no compromise is possible on consistency of a database.

So, dependency-preservation is a desirable criteria, NOT A MANDATORY ONE. Whereas, ensuring Loss-Less-Join (Non-Additive Join) Decomposition is a mandatory criteria.

Algorithm to determine whether a Decomposition *( R1, R2, R3, -----,Rn) of R is a Dependency-Preserving Decomposition or not.

Let F be the set of FDs holding on R.

Compute Minimal Cover Fc of F.For Each FD in Fc

11

If Ri (For 1< i < n ) Then it is Dependency-Preserving Decomposition Else it is Non-Dependency-Preserving Decomposition.

NORMALIZATION

First Normal Form (1 NF) A Schema R is said to be in first normal form (1 NF) if all its attributes have only atomic domains i.e. domains of all attributes have only indivisible values. Alternatively, it can be stated that for a Relation Schema R to be in 1 NF, all its attributes should be “simple” and “single-valued”. Each field in each tuple of that relation must have only one value from the respective domain or a “NULL” value.

Let there be a relation schema EMP (E#, E_Name, Salary, Tel_No)An employee in EMP may have none/one/more-than-one Tel_No.

Then a Table (Relation) may be represented as :-

UN-NORMALIZED TABLE

E# E_Name Salary Tel_No

001 Ajay 200000 {9810222777, 2449227, 9422230230}

002 Vijay 50000 {NULL}

003 Ram 100000 {9810345567}

The Field Tel_No in a tuple has a set of values. Such a Table is said to be Un-normalized and it is not in First Normal Form.

The above table can be transformed to First Normal Form as follows:-

NORMALIZED TABLE

E# E_Name Salary Tel_No

001 Ajay 200000 9810222777001 Ajay 200000 2449227001 Ajay 200000 9422230230002 Vijay 50000 NULL

003 Ram 100000 9810345567

12

A tuple in the Un-Normalized Table is replaced by as many tuples in the Normalized Table as the number of telephones owned by the respective employee. Such a table is called Normalized Table or Flat Table. It is in First Normal Form and has a lot of data redundancy, which will be eliminated by further normalization of the Table.

Full Functional Dependency Let there be a Relational Schema R with a Candidate Key K and a non-prime attribute A (K R, AR). The Functional Dependency K A is said to be a “Full Functional Dependency”, if attribute A cannot be determined by any proper subset of K i.e. there does not exists any K1 K for which K1 A holds.

Partial Functional Dependency Let there be a Relational Schema R with a Candidate Key K and a non-prime attribute A (K R, AR). The Functional Dependency K A is said to be a “Partial Functional Dependency”, if attribute A can be determined by a proper subset of K i.e. there exists K1 K for which K1 A holds.

Let R1 (A, B, C, D, E) be a Relation Schema with all its attributes (A..E) having only atomic domains; and let F: {AB C, A D, D E} be the set of FDs holding on it.

By the Armstrong’s Rule of Transitivity {A D, D E} implies A E. So, A E also holds on R.

Since, all attributes of R1 have only atomic domains, it is in 1 NF.

Since {A, B}+ = ABCDE, {A,B} forms the a candidate key of this schema R; and this is the only candidate key of R. So, A and B are the prime attributes of R and all other attributes i.e. C, D and E are non-prime attributes.

The Non-prime attribute C is determined only by the full candidate key, since AB C. holds on R. So, C is said to be fully functionally dependent on the candidate key and the FD AB C is called a Full FD or Complete FD of R. However, the non-prime attributes D and E are determined by A alone, which is a proper subset of the candidate key {A, B}. Such a dependency is called partial functional dependency; and such FD causes certain Insert/Delete/Update anomalies, as demonstrated in the following example:-

Example:-

Consider a Schema SP1 (S#, P#, Sname, Scity, Status, Pname, Qty)

Where S# :Supplier Id number (Unique)

13

P# :Part Id Number (Unique)Sname :Supplier NameScity :Supplier CityStatus :Supplier Status, which depends on ScityPname: Part NameQty :Quantity of a Part (P#) to be supplied by a Supplier (S#)

Suppose, the following set of FDs holds on the Schema SP1:-S# Sname, ScityScity Status, P# Pname,{S#,P#} Qty

An instance of a relation, defined on Schema SP1:-

S# P# Sname Scity Status Pname QtyS1 P1 Avia Mumbai 10 Aero-engine 5S1 P2 Avia Mumbai 10 Generator 5S2 P1 Aero Delhi 20 Aero-engine 2S2 P3 Aero Delhi 20 Altimeter 5S3 P2 Air-supp Mumbai 10 Generator 10S3 P3 Air-supp Mumbai 10 Altimeter 20

This relation, being in 1 NF, has the following Insert/Delete/Update-anomalies:-

(a) Insertion Anomalies:-

(i) Information about a supplier, like Sname, Scity can be inserted only when the supplier is supplying at least one part.

(ii) Information about a Part like its Pname can be inserted only when the part is being supplied by at least one supplier.

(iii) Information about a City like its Status can be inserted only when there is at least one supplier from that city and the supplier is supplying at least one part.

(b) Deletion Anomalies:-

(i) If a supplier is supplying only one part, and that supply is concluded. Then the tuple relating to that supply will be deleted. With the deletion of that tuple, we would lose complete information about the supplier i.e. its Name and City.

14

(ii) Suppose, a part is being supplied by only one supplier. On completion of that supply, when the related tuple is deleted, we would lose the information about the Name of that part.

(iii) Suppose, a city has only one supplier and that supplier is supplying only one part. On deletion of the tuple of that particular supply, we would lose the information about the Status of that City.

(c ) Update Anomalies There is a lot of unwanted data redundancy like:-

(i) Information about the Sname and Scity of a particular supplier will be appearing as many times as the number of parts being supplied by that supplier.

(ii) Information about the name of a particular part will be appearing as many times as the number of supplies relating to that part.

(iii) Information about the Status of a City will be appearing as many times as the number of supplies from the suppliers of that City.

The unwanted data redundancy will need the redundant information to be updated at multiple places. Like when a Supplier moves from one city to other city, this has to be changed in multiple tuples. An inconsistent update or a partial update will cause database inconsistency.

Second Normal Form (2 NF)

A Relation Schema R is said to be in Second Normal Form (2 NF), iff:-

(a) It is in 1 NF and

(b) Each non-prime attribute of R is fully functionally determined by the candidate keys of R, that is, R does not involve any partial functional dependencies.

The above relation schema R1 is not in 2 NF, since it involves a partial functional dependency A DE.

Decomposing a 1 NF Schema into 2 NF Schemas

Using Heath’s Theorem, R1 can be loss-less decomposed into R21 and R22.

R21 (A, D, E) Primary Key {A}A D, D EBy transitivity A E also holds on R21.

15

R22 (A, B, C) Primary Key {A,B}Foreign Key {A} references R21

AB C

The decomposition of R1 into R21 and R22 is Loss-less Join Decomposition since:-

R21 R22 R21

Since, R21 and R22 involve no partial functional dependencies, both are in 2 NF.Now, Let us consider Schema SP1

{S#,P#}is a candidate key of SP1 and this is the only candidate key of SP1

So, S# and P# are prime attributes and all other attributes are non-prime.Since S# Scity and Scity Status; so S# Status also holds.

SP1 involves the following partial FDs:- S# Sname, Scity, Status AND P# Pname

Since, SP1 has partial FDs, so it is not in 2 NF.

SP1 can be decomposed into 2 NF Schemas S, P and SP2 in two steps as shown below:-

Step 1 Decompose SP1 into S and SP11, on the basis of partial FD S# Sname, Scity, Status

S (S#, Sname, Scity, Status) Primary Key {S#}S# Sname, S# Scity, Scity StatusBy Transitivity, S# Status

SP11( S#, P#, Pname, Qty) Primary Key {S#, P#}P# Pname{S#, P#} Qty

The above decomposition is a loss-less-join decomposition, since:-S SP11 = {S#} S since S# Sname, Scity, Status

The Schema S has no partial FD and is so in 2NF.However, SP11 has a partial FD P# Pname and it is still not in 2NF.

Step 2 Decompose SP11 into P and SP2, on the basis of partial FD P# Pname

P (P#, Pname) Primary Key {P#}

16

P# Pname

SP2 (S#, P#, Qty) Primary Key {S#, P#}{S#,P#} Qty Foreign Key {S#} references S

Foreign Key {P#} references PThe above decomposition is a loss-less-join decomposition, since:-P SP2 = {P#} P Since P# PnameBoth P and SP11 have no partial FDs and are in 2NF.Also, SP11 has no partial FD P# and it is still not in 2NF.

So, the 2NF decomposition of SP1 is:-S (S#, Sname, Scity, Status) Primary Key {S#}

S# Sname, S# Scity, Scity StatusBy Transitivity, S# Status

P (P#, Pname) Primary Key {P#}P# Pname

SP2 (S#, P#, Qty) Primary Key {S#, P#}{S#,P#} Qty Foreign Key {S#} references S

Foreign Key {P#} references P

The projections of SP1 over S, P and SP2 are:-S

S# Sname Scity StatusS1 Avia Mumbai 10S2 Aero Delhi 20S3 Air-supp Mumbai 10

PP# Pname

P1 Aero-engineP2 GeneratorP3 Altimeter

SP2S# P# Qty

S1 P1 5S1 P2 5S2 P1 2S2 P3 5S3 P2 10S3 P3 20

The partial dependency related problems have been resolved, as explained below:-

17

Insert:Sname and Scity of a Supplier can now be inserted into table S, even when it is not supplying even one part.

Pname of a part can now be inserted into table P, even when it is not being supplied by any supplier.

Delete:When information about a supply is deleted from table SP2, we do not lose any information about Sname, Scity or Pname.

Update

Information about Sname & Scity of a supplier now appears only in one tuple in table S.

Information about Pname of a particular part now appears only in one tuple in table P.

However, some anomalies still remain in 2 NF Schema. For example, the Status of city can be inserted in table S, only when at least one supplier is available in that city. Also, if the only supplier from a city ceases to exist, we lose the information about the status of that city. And if multiple suppliers exist in city, it status would appear in multiple tuples of table S. These anomalies will be eliminated in the next normal form.

In R21, the non-prime attribute E is dependent on the candidate key A, through another non-prime attribute D i.e. A D, D E A E.

Such a Functional Dependency is called a Transitive Dependency.

Transitive Functional Dependency This refers to a situation wherein a non-prime attribute of a Relation Schema R is dependent on its candidate key, through another non-prime attribute of R. Such a FD also causes some Insert/Delete/Update anomalies.

Third Normal Form (3NF)

A Relation Schema R is said to be in 3 NF, iff:-

(a) It is in 2 NF and

(b) Each non-prime attribute of R is non-transitively dependent on its candidate keys i.e. R does not involve any Transitive Dependencies.

18

Thus, R22 is in 3 NF but R21 is not, since it involves a Transitive Dependency i.e A D, D E A E.

Decomposition of a 2 NF Schema into 3 NF Schemas

R21 can be loss-less decomposed into R21 and R22.

R31 (D, E) Primary Key {D}D E

R32 (A, D) Primary Key {A}A D Foreign Key {D} references R31.

The decomposition of R21 into R31 and R32 is Loss-less Join Decomposition since:- R31 R32 R31

R31 and R32 do not involve any Transitive Dependency; and are thus in 3 NF.

3 NF decomposition of R1:-

R31 (D, E) Primary Key {D}D E

R31 (A, D) Primary Key {A}A D Foreign Key {D} references R31.

R22 (A, B, C) Primary Key {A, B}Foreign Key {A} references R31

AB C

Let us now consider decompositions of SP1

The Schemas P and SP2 do not involve any Transitive Dependencies; and are thus already in 3NF. But, S has a Transitive Dependency i.e. S# Scity, Scity Status S# Status

This Transitive Dependency of Schema S can be eliminated by decomposing it on the basis of FD Scity Status

STS (Scity, Status) Primary Key {Scity}Scity Status

SUPP (S#, Sname, Scity) Primary Key {S#}Foreign Key {Scity} references STS

S# Sname, Scity

Now, STS and SUPP do not have any Transitive Dependency; so both are in 3 NF.

The decomposition is loss-less, since:-

19

STS SUPP = {Scity} STS, since Scity Status

Thus, the 3NF Decomposition of SP1 is:-

STS (Scity, Status) Primary Key {Scity}Scity Status

SUPP (S#, Sname, Scity) Primary Key {S#}Foreign Key {Scity} references STS

S# Sname, Scity

P (P#, Pname) Primary Key {P#}P# Pname

SP2 (S#, P#, Qty) Primary Key {S#, P#}{S#,P#} Qty Foreign Key {S#} references SUPP

Foreign Key {P#} references P

The projections of S over STS and SUPP are:-

SUPP

S# Sname ScityS1 Avia MumbaiS2 Aero DelhiS3 Air-supp Mumbai

STS

Scity StatusMumbai 10Delhi 20

Now, all update anomalies have been resolved. Status of a particular city now appears in only one tuple in table STS. Also, information about status of a city can now be inserted irrespective of whether any supplier exists in that city or not.

BOYCE CODD NORMAL FORM (BCNF)

20

An alternate definition of a Relation Schema R to be in 3NF is as follows:-

3 NF A Relation Schema R is said to be in 3 NF, if each FD α→β holding on R satisfies one of the following three conditions:-

(a) It is a trivial FDOR (b) α is a Super Key of ROR (c) Each attribute in the set (β – α ) is a prime attribute.

A Relation Schema in Third Normal Form (3 NF) may still be riddled with some anomalies, under the situations when a schema has multiple candidate keys, which may be composite and overlapping. Any relation, under such schema, may have some data redundancies that would cause some update anomalies. For example, the schema:-

SP (S#, Sname, P#, Qty) with FDs: {S#,P#} Qty

{Sname, P#} Qty S# SnameSname S#

holding on it.

The relation schema SP has two candidate keys i.e {S#, P#} and {Sname, P#}. Both the candidate keys are composite and have one common attribute i.e. P#.

Set of Prime Attributes: {S#, Sname , P# }Set of Non-Prime Attributes: { Qty}

The only non-key attribute i.e. Qty is non-transitively and fully dependent on both the candidate keys. Thus, the schema SP is free of any partial dependencies or transitive dependencies and is thus in Third Normal Form (3NF).

This can also verified from the fact that each FD satisfies one of the necessary conditions for SP to be in 3 NF.

Despite being in 3NF, any legal relation under the schema will have some data redundancies, for example, the name of a particular supplier i.e. Sname will be repeated as many times as the number of supplies being made by that supplier.

Thus, there is need to have a normal form, stronger than 3NF. The necessary solution is provided by Boyce Codd Normal Form (BCNF).

BCNF A Relation Schema R is said to be in BCNF, if all non-trivial left-irreducible FDs that hold on R have only candidate keys as determinants. Alternately, we

21

can state that a Relation Schema R would be in BCNF if each FD α→β holding on R satisfies one of the following three conditions:-

(a) It is a trivial FDOR (b) α is a Super Key of R

The above two conditions, for BCNF, are same as the first two conditions of 3 NF. Thus, if a schema is in BCNF, it must also be in 3NF. However, the third condition of 3 NF is missing against the BCNF criteria, indicating that BCNF is more restrictive as compared to 3NF. So, it is possible that a schema may be in 3NF but not in BCNF. Thus, BCNF is a stronger normal form than 3NF.

Going by the definition of BCNF, SP is in 3 NF but not in BCNF since the two FDs i.e S# Sname, Sname S#, are neither trivial and nor have Super Keys on their Left Side.

Alternating it can be stated that a Relational Schema R will be in BCNF if each non-trivial left-irreducible FD α→β, holding on R, has only Candidate Key on its left side i.e. α must be a Candidate Key of R.

Decomposition of SP into a BCNF schema

SP can be decomposed into BCNF Schemas, on the basis of the FDs that violate BCNF i.e. S# Sname and Sname S#. The resulting BCNF decompositions of SP will be:-

S (S#, Sname) Primary Key {S#} or {Sname} S# Sname, Sname S#

SP1 (S#, P#, Qty) Primary Key {S#, P#}Foreign Key {S#} references S

{S#, P#} Qty

OR

S (S#, Sname) Primary Key {S#} OR {Sname} S# Sname, Sname S#

SP2 (Sname, P#, Qty) Primary Key {S#, P#}Foreign Key {Sname} references S

{S#, P#} Qty

It can be verified that the decompositions are loss-less-join decompositions.

Example:- Decompose the following Schema into a set of BCNF Schemas.

22

SP ( S#, Sname, P#, Pname, Scity, Status, Qty )

The set of FDs holding on SP:-

S# Sname, Scity Sname S#, Scity Scity StatusP# Pname{S#,P#} Qty {Sname, P#} Qty

The candidate keys of SP are :- {S#,P#}, {Sname, P#}

It can be verified that all the FDs indicated above are non-trivial and left-irreducible; and the following FDs do not have Candidate Key on left side:-

S# Sname, Scity Sname S#, Scity Scity StatusP# Pname Thus SP is not in BCNF.

Is BCNF a stronger normal form than 3NF ?

Yes, a relation in 3NF may not be in BCNF. But, a relation in BCNF will definitely be in 3NF also; since BCNF is more restrictive as compared to 3 NF. Like in the above example, SP is in 3NF but not in BCNF. However, its decompositions SP3, P, STS and SUPP are all in BCNF; and are also in 3NF. Thus, BCNF is a stronger normal form than 3NF.

In fact, we can state that a relation schema in BCNF will be free of all those data anomalies that can be eliminated on the basis of functional dependencies (FDs).

Example:-

Using ABU’s Algorithm, determine whether the following decomposition of SP (S#, Sname, Scity, Status, P#, Pname, Price, Qty) is a loss-less-join decomposition?

Decomposition:-CS (Scity, Status)SUPP(S#, Sname, Scity)PART (P#, Pname, Price)SPN (S#, P#, Qty)

23

FDs Holding on SP:-

S# Sname, ScityScity StatusP# Pname, Price{S#, P#} Qty

Since, S# Scity and Scity Status, thus S# StatusTherefore, S# Sname, Scity, StatusABU’s AlgorithmStep 1 and Step 2: Make a matrix of size 4 x 8 and initialize it.

S# Sname Scity Status P# Pname Price Qty 0 1 2 3 4 5 6 7

CS 0 b00 b01 a2 a3 b04 b05 b06 b07

SUPP 1 a0 a1 a2 b13 b14 b15 b16 b17

PART 2 b20 b21 b22 b23 a4 a5 a6 b17

SPN 3 a0 b31 b32 b33 a4 b35 b36 a7

Step 3:

Applying the FD Scity Status, rows 0 and 1 match on the value of Scity, so force these two rows to match on the value of Status. Thus replace b13 in row 1 by a3.


CS 0 b00 b01 a2 a3 b04 b05 b06 b07

SUPP 1 a0 a1 a2 a3 b14 b15 b16 b17

PART 2 b20 b21 b22 b23 a4 a5 a6 b17

SPN 3 a0 b31 b32 b33 a4 b35 b36 a7

Now, applying the FD P# Pname, Price , rows 2 and 3 match on the value of P#, so force these two rows to match on the value of Pname and Price. Thus replace b 35 in row 3 by a5 and replace b36 in row 3 by a6


CS 0 b00 b01 a2 a3 b04 b05 b06 b07

SUPP 1 a0 a1 a2 a3 b14 b15 b16 b17

PART 2 b20 b21 b22 b23 a4 a5 a6 b17

SPN 3 a0 b31 b32 b33 a4 a5 a6 a7

Now, applying the FD S# Sname, Scity, Staus , rows 1 and 3 match on the value of S#, so force these two rows to match on the value of Sname, Scity and Status. Thus replace b31 in row 3 by a1; replace b32 in row 3 by a2; and replace b33 in row 3 by a3


CS 0 b00 b01 a2 a3 b04 b05 b06 b07

24

SUPP 1 a0 a1 a2 a3 b14 b15 b16 b17

PART 2 b20 b21 b22 b23 a4 a5 a6 b17

SPN 3 a0 a1 a2 a3 a4 a5 a6 a7

Step 4 The row 3 contains only “a” values; therefore the above decomposition is a loss-less-join decomposition of SP.

Multi-Valued Dependencies (MVDs and 4 NF)

Definition of MVD

A relation schema R (,,) is said to have multi-valued dependencies ( multi-determines ) and ( multi-determines ), if and only if for every legal relation r(R) and for each tuple-pair {t1, t2} r that satisfies t1[] = t2[], t1 [] t2 [] and t1 [] t2 [], there exists a tuple-pair {t3, t4} r that satisfies t3[] = t4[]= t1[] = t2[] and t3[] = t1[] & t4[] = t2[] and t3[] = t2[] & t4[] = t1[]. Non-Trivial MVDs always occur in pairs, like and ; both can be jointly denoted as .

Trivial MVD

An MVD holding on a schema R is said to be trivial, iff:-(a)

or(b) = R

Non-Trivial MVDs occur only in pairs.

MVDs and a Loss-less Join Decomposition

Fagin’s Theorem

25

If a relation schema R (, , ) has a MVD holding on it, then it can be loss-less decomposed into R1 ( ,) and R2 (,). Inference Rules for MVDs

1. Complementation Rule If holds on a Schema R where (R, R) then R- ( ) will also hold on R.

This rule implies that all non-trivial MVDs will occur only in pairs.

2. Transitivity Rule If and then (-)

3. Union Rule If , , then , - ,

– ,

4. Augmentation Rule If , then where

5. Replication Rule If , then .

6. Coalescence Rule If and Where and = 0Then .

7. Mixed Transitivity Rule

If , , then (-)

Problem Let R = (A, B, C, G, H, I) be a relation schema, with the following set of dependencies holding on it:-

D = { A B, BHI, CGH}

Find whether the following dependencies are members of D+ :-A CGHIA HIB HA CGA H

Sol:

Since A B, so by complementation A (R – A – B) CGHI

Since A B and B HI so by transitivity A HI – B HI

26

Since B HI CG H where H HI and HI CG = 0Therefore, by Coalescence Rule, B H

Since, A CGHI and A HI therefore by Union Rule A CGHI- HI CG

Since A HI CG H where H HI and HI CG = 0Therefore, by Coalescence Rule, A H

Problem Consider a Relational Schema R (A, B, C, D, E). Let the set of MVDs holding on R be { A BC, B CD and E AD}. Determine its loss-less 4NF decomposition.

SolutionR (A, B, C, D, E)M = {A BC, B CD, E AD }

Since, MVD A BC holds on R, it will also satisfy A (R-BC) – A i.e. A DEThus, by Fagin’s Theorem, R can be loss-less decomposed into:-R1 (A, B , C) R2 (A, D, E).

Similarly, since MVDs B CD and E AD hold on R, it can be proved that the following are also loss-less & 4 NF decompositions of R :-

R1 (B, C, D) R2 (B, A, E)

ORR1 (E, A, D) R2 (E, B, C)

27

How to eliminate these data anomalies?

These anomalies are due to the non-trivial Multi Valued Dependencies (MVDs) holding on the schema CTX.

Multi Valued Dependency (MVD)

Let there be a relation schema R (,,). It is said to have MVD from to (denoted as ) and from to (denoted as ), if and only if for every legal relation r(R) it satisfies the following:-

(a) The set of -values, matching a given {-value, -value} pair, are dependent only on the -value and are completely independent of the -value.

And

(b) The set of -values, matching a given {-value, -value} pair, are dependent only on -value and are independent of -value.

Alternately, we can state that a relation schema R (,,) is said to have multi-valued dependencies and (both denoted by ), if and only if for every legal relation r(R), and for a tuple pair {t1 , t2 } r t1[] = t2[], there exists a tuple-pair {t3 , t4 } r, which satisfy:-

t1[] = t2[]=t3[] = t4[] and t1[] = t3[], t2[] = t4[],and t1[] = t4[], t2[] = t3[],

28

As per this definition, the relation ctx satisfies the MVD Course TeacherText

Since, {OS, Galvin} {Ravi, Vivek} and {OS, Dietel} {Ravi, Vivek}So, the set of Teachers, teaching a particular Course, is dependent only on the Course taught and is completely independent of the Texts followed for the particular Course.

Similarly, {OS, Ravi} {Galvin, Dietel} and {OS, Vivek} (Galvin, Dietel}So, the set of Texts, followed for a Course, depends only on the Course taught and is independent of the Teacher teaching it.

Trivial MVD An MVD , holding on a relation schema R, is said to be trivial, if:-

(a) or

(b) = R

Such MVDs are termed to be trivial, since these are satisfied by every relation on a schema R.

MVD is a generalization of FD An FD implies an MVD , wherein the set of values, matching a given value, will be a singleton set.

Fagin’s Theorem If a relation schema R (,,) satisfies an MVD then R can be loss-less decomposed into schemas R1 (,) and R2 (,). This implies that any legal relation r, on the schema R (,,), will be equal to equi-join of its projections on (,) and (,); that is r = ,(r) * ,(r)

Fourth Normal Form (4 NF)

A relation schema R is said to be in 4 NF, if and only if every MVD holding on R satisfies either the following two conditions:-

(a) It is trivial MVD or(b) is a Super Key of R

Now, the relation CTX has non-trivial MVDs Course Teacher and Course Text. These are not trivial MVDs. Also, Course is not Super Key of CTX, since CTX is an “All Key” Relation Schema. Thus, CTX is in BCNF; but not in 4 NF.

Non-loss Decomposition, based on MVDs

As per Fagin’s Theorem, a relation schema R (,,) satisfying MVDs can be loss-less decomposed into schemas R1 (,) and R2 (,). So, CTX can be decomposed into CT and CX.

29

ctCOURSE TEACHEROS RaviOS VivekCO RamCO Shyam

cxCOURSE TEXTOS GalvinOS DietelCO HamacherCO M-mano

As evident, ct * cx = ctx

The relations ct and cx are not satisfying any non-trivial MVDs; thus both CT & CX are in 4 NF.The relations are free of the data redundancies indicated above, which existed in ctx. The information of a Teacher teaching a particular Course is now represented only in one tuple in ct and the information regarding a Text being followed for a particular Course is represented only at one place in cx.

A relation schema R in BCNF, not having any non-trivial MVDs holding on it, will be in 4 NF. But, it may still have some data anomalies, for example consider a schema CTX 4, with the following constraints:-

(a) A Course may be taught by a number of Teachers.

(b) A number of Text books may be followed for a Course.

(c) The set of Texts, followed for a Course, depend not only on the Course but also on the Teacher teaching it. It means that each teacher teaching a particular course may follow different sets of text books; the sets may be overlapping.

(d) If a Teacher T1, teaching a Course C1, does not follow a Text X1, which is being followed by another Teacher T2 to teach the course C1, then T1 must not follow X1 for any other Course, which he may be teaching.

ctx4

COURSE TEACHER TEXTOS Ravi GalvinOS Vivek DietelOS Ravi DietelCO Ram HamacherCO Shyam M-mano

30

CO Shyam Hamacher

So, the set of TEXT-values that occur matching a given {COURSE-value, TEACHER-value} pair in ctx4 depends not only on COURSE but also on TEACHER. So, it does not satisfy the MVD COURSE TEACHERTEXT. So, the schema CTX4 does not have any non-trivial MVDs holding on it. So, it is in 4 NF. But ctx 4 still has some data redundancies, like the information about a teacher teaching a course appears as many times, as the number of text books followed by that teacher for that Course. Since, CTX4 does not satisfy MVD COURSE TEACHERTEXT, it can be verified that ct * cx ctx4


cxCOURSE TEXTOS GalvinOS DietelCO HamacherCO M-mano

ct * cxCOURSE TEACHER TEXT

OS Ravi GalvinOS Ravi DietelOS Vivek GalvinOS Vivek DietelCO Ram HamacherCO Ram M-manoCO Shyam M-manoCO Shyam Hamacher

As verified above, ct * cx ctx4. ct * cx has two spurious tuples, which do not exist in ctx4. So, the decomposition of CTX4 into CT and CX is not a loss-less (non-additive) decomposition.

31

It may be feasible to eliminate these data redundancies of ctx4, on the basis of another type of dependency, called Join Dependency (JD).Example (Employee-Project-Department)

Suppose an organization has a set of Employees, a set of Departments and a set of Projects which are being progressed at various departments with the following constraints holding on the system:-

(a) Each Department could be working on many Projects.(b) Each Project could be getting progressed at many Departments.(c) Each employee could be working on many Projects in many Departments.(d) The set of Departments in which an Employee is Working is determined by

the Employee alone and is completely independent of the set of Projects on which that Employee is working.

(e) The set of Projects on which an Employee is working is determined by the Employee alone and is completely independent of the set of Departments in which the Employee is working.

A Sample Database, created on a schema with the above constraints, will be:-

EPDE# P# D#E1 P1 D3

E1 P2 D1

E1 P1 D1

E1 P2 D3

E4 P3 D2

E4 P1 D3

E4 P3 D3

E4 P1 D2

The above table does not have any non-trivial FD; and is thus in BCNF. However, it has a lot of data redundancies; like the fact that Employee E1 is working on project P1

is reflected in two tuples. Similarly, there are many redundancies.

In the above table, we have:-{E1 , P1} {D3, D1}{E1 , P2} {D3, D1}

This implies that the set {D3, D1}is determined by E1 alone and does not change when P# is changed from P1 to P2.

However when E# is changed from E1 to E4, the set of D#s changes as indicated below:-{E4 , P3} {D2, D3}

32

{E4 , P1} {D2, D3}

Thus, the schema for this table satisfies E# P# and E# D#. This pair of MVDs is non-trivial. Thus, EPD is not in 4NF. It can be loss-less decomposed into EP(E#, P#) and ED(E# , D#) as shown below:-

EPE# P#E1 P1

E1 P2

E4 P3

E4 P1

EDE# D#E1 D3

E1 D1

E4 D2

E4 D3

It can be verified that EP*ED = EPD. Also, both EP and ED are free of the data redundancies. Both EP and ED do not have any non-trivial MVD and are thus in 4NF.

Join Dependency (JD) A Relation Schema R is said to have a Join Dependency *(R1, R2,…., Rn), if and only if any legal relation r(R) is equal to equi-join of its projections on R1, R2,…., Rn.

i.e r = R1 (r) * R2 (r) * ……..* Rn (r)

Trivial Join Dependency

A JD *( R1, R2,…., Rn) of a relation schema R is said to be trivial, if one of the projections (R1…Rn )is equal to R itself.

An MVD is also a JD

An MVD on a relation schema R is also a JD *(, ). This implies that a legal relation r(R ) can be loss-less decomposed into its projections and i.e.r = (r) (r)

The relation schema CTX4 has a Join Dependency * (CT, TX, XC) where C: Course, T: Teacher and X: Text , which can be verified as follows:-

33


txTEACHER TEXTRavi GalvinVivek DietelVivek GalvinRam HamacherShyam M-manoShyam Hamacher

xcTEXT COURSEGalvin OSDietel OSHamacher COM-mano CO

ct * tx * xc

COURSE TEACHER TEXTOS Ravi GalvinOS Ravi MilanOS Vivek GalvinCO Ram HamacherCO Shyam M-manoCO Shyam Hamacher

Thus, ct * tx * xc = ctx4, thus CTX4 has a Join Dependency *(CT, TX, XC)So, CTX4 can be loss-less decomposed into its projections on CT, TX and XC, which are free of any data redundancies that existed in the relation CTX4.

Non-Trivial JD

A JD *(R1, R2, ….Rn) of relation schema R is said to be trivial iff one of the projections in JD is equal to R itself. Such a JD hold on each schema.

34

Fifth Normal Form (5 NF)

A relation schema R is said to be in 5 NF, if and only if any non-trivial Join Dependency holding on R, is implied by its Candidate Keys.

The relation CTX4 has a JD *(CT,TX,XC) which is not implied by its Candidate Key {C,T,X}. Thus, CTX4 is in 4 NF but not in 5 NF.

The relations CT, TX and XC do not have any non-trivial Join Dependencies, and are thus in 5 NF.

Assuming that the Text Books followed for a Course are dependent not only on the Course but also on the Teacher teaching it, Join Dependency will hold on CTX4, only if the following is satisfied:-

“If a Teacher T1 teaching a Course C1, does not follow a Text Book X1, which is being followed by another Teachers teaching Course C1, then T1 must not follow X1

for any other Course also that he may be teaching.” Only then the JD *(CT,TX,XC) will hold on CTX4

A relation schema in 5 NF but still having some data redundanciesExample: A schema CTX5, with the following constraints:-

1. A Course may be taught by any number of Teachers and a Teacher may teach any number of Courses.

2. A number of Text books may be followed for a Course and a Text book may be followed for any number of courses.

3. A Teacher T1, teaching a Course C1, may not follow a Text Book X1, which is being followed by another teacher teaching the Course C1, but T1 may follow X1 while teaching another Course say C2.

A relation under schema CTX5:-

ctx5COURSE TEACHER TEXT

OS Ravi GalvinOS Vivek MilanOS Ravi MilanCO Vivek HamacherCO Ram HamacherCO Vivek Galvin

If we have its projections on ct, tx and xc :-

35

ctCOURSE TEACHEROS RaviOS VivekCO VivekCO Ram

txTEACHER TEXTRavi GalvinVivek MilanRavi MilanVivek HamacherRam HamacherVivek Galvin

xc

TEXT COURSEGalvin OSMilan OSHamacher COGalvin CO

ct * tx * xc

COURSE TEACHER TEXTOS Ravi GalvinOS Ravi MilanOS Vivek GalvinOS Vivek MilanCO Vivek HamacherCO Vivek GalvinCO Ram Hamacher

ct * tx * xc has an additional tuple i.e. OS, Vivek, Galvin, which does not exist CTX5. Thus, decomposition of CTX5 into its projections on CT,TX and XC is not a loss-less (NON-ADDITIVE) decomposition. The natural join of CT, TX, XC contains some spurious tuples. Thus, CTX5 does not have any JD and. thus, it is in 5 NF.

CTX5, though in 5 NF, still has some data anomalies; like information that ‘Galvin is the text-book for OS’ is represented twice. These data anomalies cannot be eliminated on the basis of Functional Dependencies, Multi-Valued Dependencies or Join Dependencies.

unit iii dbms

Documents

relation schema r

relation schema normalization

schema student roll

relational schema student

trivial fd

fd registrationnumber

superkey of schema student

super key sk of schema