NORMALIZATION FIRST NORMAL FORM (1NF): A relation R is in 1NF if all attributes have atomic value = one value for an attribute = no repeating groups = no multivalued attributes = no composite attributes

Upload: julie-laura-logan

Post on 16-Dec-2015

NORMALIZATION

FIRST NORMAL FORM (1NF):

A relation R is in 1NF if all attributes have atomic value

= one value for an attribute

= no repeating groups

= no multivalued attributes

= no composite attributes

Example

Non-1NF

EMP (E#, ENAME, SKILL). Here SKILL is a multi-valued attribute.

EMP (E#, ENAME, SKILL1, SKILL2, SKILL3, SKILL4, ....). Here SKILL is represented as a repeating group.

NON-1NF --> 1NF

There are two methods of converting a non-1NF relation into a 1NF relation. Method 1 maps the multi-valued (or repeating-group) attribute out into another table, while Method 2 keeps the attribute in the same table, stores one value per tuple, and uses a composite PK.

Method 1: Conversion to 1NF

1. Create a new relation for the repeating group, and add the key of the original relation to it.

2. Remove the repeating-group attributes from the original relation.

Example

SKILL (E#, SKILL)

EMP (E#, ENAME)

Note the composite PK (E#, SKILL) of the SKILL relation.
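Method 1 above can be sketched in a few lines of Python. This is only an illustration: the employee names and skills are made-up sample data, and `split_multivalued` is a hypothetical helper name, not something from the slides.

```python
# Non-1NF input: SKILL holds a list of values (a multi-valued attribute).
# The rows below are invented sample data.
emp_non_1nf = [
    {"E#": 1, "ENAME": "Smith", "SKILL": ["SQL", "Java"]},
    {"E#": 2, "ENAME": "Wong", "SKILL": ["Python"]},
]


def split_multivalued(rows, key, multi_attr):
    """Method 1: move the multi-valued attribute into its own relation."""
    # Step 2 of the method: the original relation without the multi-valued attribute.
    base = [{a: v for a, v in row.items() if a != multi_attr} for row in rows]
    # Step 1 of the method: a new relation holding the original key
    # plus one atomic value per tuple.
    detail = [
        {key: row[key], multi_attr: value}
        for row in rows
        for value in row[multi_attr]
    ]
    return base, detail


emp, skill = split_multivalued(emp_non_1nf, "E#", "SKILL")
```

Here `emp` corresponds to EMP (E#, ENAME) and `skill` to SKILL (E#, SKILL); each (E#, SKILL) pair appears once, which is why the pair serves as the composite PK.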

Method 2: We can also flatten the table as follows:

EMP (E#, SKILL, ENAME)

- Elmasri's book uses this method.

- This method stores each value of the repeating group in a separate tuple, so ENAME is repeated for every skill of an employee.

- Note the composite PK (E#, SKILL).
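A minimal sketch of Method 2 in Python, again with invented sample rows and a hypothetical helper name (`flatten`):

```python
# Non-1NF input with a multi-valued SKILL attribute (invented sample data).
emp_non_1nf = [
    {"E#": 1, "ENAME": "Smith", "SKILL": ["SQL", "Java"]},
    {"E#": 2, "ENAME": "Wong", "SKILL": ["Python"]},
]


def flatten(rows, multi_attr):
    """Method 2: emit one tuple per value of the multi-valued attribute."""
    return [
        {**row, multi_attr: value}  # copy the row, replacing the list by one value
        for row in rows
        for value in row[multi_attr]
    ]


emp_flat = flatten(emp_non_1nf, "SKILL")
```

In `emp_flat` the employee with two skills occupies two tuples and the name is stored twice, which is the redundancy that Method 1 avoids.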

What are the advantages and disadvantages of the two methods? Consider these non-1NF relations:

DEPT (D#, DNAME, DMGRSSN, DLOC), where DLOC is a multi-valued attribute.

EMP_PROJ (SSN, ENAME, PROJS(PNUMBER, HOURS)), where PROJS is a composite attribute.

Most relational database systems assume that your relations are in 1NF.

SECOND NORMAL FORM (2NF)

A relation R is in 2NF if

(a) R is in 1NF, and

(b) each nonkey attribute is fully functionally dependent on the whole key of R, not on just a part of it.

Example

INVENTORY (WH, PART, QTY, WH_ADDR)

WH, PART --> QTY

WH --> WH_ADDR (This FD violates 2NF, since WH is only a part of the key.)

Key: WH+PART
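The 2NF test on INVENTORY can be sketched as a check for partial dependencies. The function below is a simplified illustration: it assumes a single key and inspects only the FDs as listed (not FDs implied by them), and the name `partial_dependencies` is hypothetical.

```python
def partial_dependencies(key, fds):
    """Return the listed FDs whose LHS is a proper subset of the key
    and whose RHS contains at least one nonkey attribute."""
    key = frozenset(key)
    violations = []
    for lhs, rhs in fds:
        lhs, rhs = frozenset(lhs), frozenset(rhs)
        nonkey_rhs = rhs - key
        if lhs < key and nonkey_rhs:  # '<' on sets means proper subset
            violations.append((sorted(lhs), sorted(nonkey_rhs)))
    return violations


inventory_key = {"WH", "PART"}
inventory_fds = [
    ({"WH", "PART"}, {"QTY"}),
    ({"WH"}, {"WH_ADDR"}),
]
violations = partial_dependencies(inventory_key, inventory_fds)
```

For INVENTORY, `violations` contains only WH --> WH_ADDR; the FD WH, PART --> QTY uses the whole key and is not reported.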

Problems of non-2NF (update anomalies):

• The warehouse address is repeated for every part stored in the warehouse

• If the address changes, multiple tuples must be updated

• If a warehouse holds no parts, its address cannot be stored

2NF Decomposition:

• Create a separate relation for each partial dependency (PD).

• Remove the RHS of the PD from the original relation.

• The non-2NF relation above can be transformed into the following 2NF relations.

INVENTORY(WH, PART, QTY)

WAREHOUSE(WH, WH_ADDR)
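The decomposition above can be illustrated by projecting sample tuples onto the two new schemas and joining them back. The warehouse rows are invented for illustration, and `project` and `natural_join` are small hypothetical helpers rather than a real library API.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples."""
    result = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result


def natural_join(left, right):
    """Join tuples that agree on all attributes they have in common."""
    joined = []
    for l in left:
        for r in right:
            common = l.keys() & r.keys()
            if all(l[a] == r[a] for a in common):
                joined.append({**l, **r})
    return joined


# Invented sample data for the non-2NF INVENTORY relation.
inventory_non_2nf = [
    {"WH": "W1", "PART": "P1", "QTY": 10, "WH_ADDR": "12 Main St"},
    {"WH": "W1", "PART": "P2", "QTY": 5, "WH_ADDR": "12 Main St"},
    {"WH": "W2", "PART": "P1", "QTY": 7, "WH_ADDR": "9 Oak Ave"},
]

inventory = project(inventory_non_2nf, ["WH", "PART", "QTY"])
warehouse = project(inventory_non_2nf, ["WH", "WH_ADDR"])
rejoined = natural_join(inventory, warehouse)
```

Each warehouse address is now stored once in `warehouse`, and joining the two relations on WH reproduces the original tuples, so no information is lost for this sample.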

Example

• NOTE that a 2NF violation can occur only when the key is composite.

• EMP_PROJ (SSN, P#, HOURS, ENAME, PNAME, PLOC)

SSN, P# --> HOURS

SSN --> ENAME (* Violates 2NF; SSN is a part of the key *)

P# --> PNAME, PLOC (* Violates 2NF; P# is a part of the key *)

2NF decomposition

R1 (SSN, P#, HOURS)

R2 (SSN, ENAME)

R3 (P#, PNAME, PLOC)

THIRD NORMAL FORM (3NF)

A relation R is in 3NF if

a) it is in 2NF and

b) it has no transitive dependencies.

That is, each nonkey attribute must be functionally dependent on the key and nothing else. If you have any FD whose LHS is not a PK (or CK), then R is not in 3NF.

Example

WORK (EMP#, DEPT, LOC)

KEY: EMP#

FD                    2NF   3NF
(1) EMP# --> DEPT     Y     Y
(2) DEPT --> LOC      Y     N

WORK is in 2NF, but not in 3NF because of FD (2).
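The transitive dependency in WORK can be made visible with an attribute-closure computation. This is a standard textbook procedure sketched here under the assumption that FDs are given as (LHS, RHS) pairs of attribute sets; the function name `closure` is chosen for illustration.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its whole LHS is already in the closure.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


work_fds = [
    ({"EMP#"}, {"DEPT"}),  # FD (1)
    ({"DEPT"}, {"LOC"}),   # FD (2)
]
emp_closure = closure({"EMP#"}, work_fds)
dept_closure = closure({"DEPT"}, work_fds)
```

`emp_closure` contains all three attributes, so EMP# is a key, but LOC is reached only through DEPT. `dept_closure` lacks EMP#, so DEPT is not a key, and FD (2) is the dependency that breaks 3NF.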

Problem of Non-3NF

• Dept. location is repeated for every employee

• If the location is changed, needs multiple updates

• If you forget to change all records, can cause inconsistency

• If a dept. has no employees, can't keep dept location

3NF DECOMPOSITION

Algorithm for a given minimal cover:

1) Combine the RHS of FDs that have a common LHS.

2) Create a separate table for each FD.

3) Check for lossless decomposition: check whether a CK of the original relation appears in any of the decomposed relations. If not, add a table consisting of a CK.
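The three steps can be sketched as follows. This is an illustrative sketch, not a definitive implementation: it assumes the input is already a minimal cover, that the caller supplies one candidate key, and it tests "contains a CK" by checking whether a table's attribute closure covers all attributes. The names `closure` and `synthesize_3nf` are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def synthesize_3nf(all_attrs, minimal_cover, candidate_key):
    # Step 1: combine the RHS of FDs that share an LHS.
    grouped = {}
    for lhs, rhs in minimal_cover:
        grouped.setdefault(frozenset(lhs), set()).update(rhs)
    # Step 2: one table per combined FD (LHS attributes plus RHS attributes).
    tables = [set(lhs) | rhs for lhs, rhs in grouped.items()]
    # Step 3: a table contains a key exactly when its closure is all attributes.
    if not any(closure(t, minimal_cover) >= set(all_attrs) for t in tables):
        tables.append(set(candidate_key))
    return tables


emp_proj_attrs = {"SSN", "P#", "HOURS", "ENAME", "PNAME", "PLOC"}
emp_proj_cover = [
    ({"SSN", "P#"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"P#"}, {"PNAME"}),
    ({"P#"}, {"PLOC"}),
]
tables = synthesize_3nf(emp_proj_attrs, emp_proj_cover, {"SSN", "P#"})
```

Applied to the EMP_PROJ FDs from the 2NF section, the two FDs with LHS P# are merged in step 1, and the result is the same three tables R1, R2, and R3. The first table already contains the key (SSN, P#), so step 3 adds nothing.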

Example

• R1 (EMP#, DEPT), R2 (DEPT, LOC)

• The original relation WORK is not in 3NF, but R1 and R2 are in 3NF.

• Note that the LHS of each FD becomes the PK of the corresponding decomposed table.

• The 3NF definition used above is an informal one used by many industry designers. Some DB textbooks, including Elmasri's book, use a more rigorous definition, shown below.

Formal Def. of 3NF

A relation R is in 3NF if, for every nontrivial FD X --> A in R,

(1) X is a superkey, or

(2) A is a prime attribute

(where X and A could each be a set of attributes).

In other words, every attribute that is not a prime attribute must be determined only by superkeys.
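The formal definition can be turned into a small checker. The sketch below is an illustration under stated assumptions: candidate keys are found by brute force over attribute subsets (fine for slide-sized schemas only), only the FDs as listed are tested, and the function names are hypothetical. The STREET/CITY/ZIP relation is a commonly used textbook-style example that is not part of these slides.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Brute-force search for minimal attribute sets whose closure is attrs."""
    attrs = set(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            subset = set(combo)
            # Smaller keys are found first, so any superset of one is not minimal.
            if closure(subset, fds) == attrs and not any(k <= subset for k in keys):
                keys.append(subset)
    return keys


def violations_3nf(attrs, fds):
    """List (X, A) pairs where X is not a superkey and A is not prime."""
    attrs = set(attrs)
    prime = set().union(*candidate_keys(attrs, fds))
    bad = []
    for lhs, rhs in fds:
        lhs = set(lhs)
        is_superkey = closure(lhs, fds) == attrs
        for a in sorted(set(rhs) - lhs):  # skip trivial parts of the FD
            if not is_superkey and a not in prime:
                bad.append((sorted(lhs), a))
    return bad


work_bad = violations_3nf(
    {"EMP#", "DEPT", "LOC"},
    [({"EMP#"}, {"DEPT"}), ({"DEPT"}, {"LOC"})],
)
addr_bad = violations_3nf(
    {"STREET", "CITY", "ZIP"},
    [({"STREET", "CITY"}, {"ZIP"}), ({"ZIP"}, {"CITY"})],
)
```

For WORK, `work_bad` reports DEPT --> LOC. For the address relation, ZIP is not a superkey, but CITY belongs to the candidate key (STREET, CITY), so condition (2) applies and `addr_bad` is empty.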

• The only difference between the informal definition and the formal definition is the second condition in the formal definition. That is, the formal definition allows a transitive dependency whose RHS is a prime attribute, where a prime attribute is an attribute that belongs to some candidate key. The difference between the two definitions is minor, and many real-world DB designers just use the informal definition. The formal definition of 3NF is shown here for your reference.

SUMMARY OF NORMALIZATION

- As we go to higher normal forms, we create more relations.

- Each higher normal form removes a certain type of dependency that causes redundancy.

- As a relation is brought to a higher normal form:
  - We have more relations
  - Queries need more joins
  - More join processing is required
  - More referential integrity constraints need to be maintained
  - Thus the schema becomes more complicated and performance decreases.

So, many real-world DB designers stop at 3NF, which removes typical redundancy reasonably well and still maintains performance. So, strive to achieve 3NF in your real-world RDB!