Normalization

Upload: holly-rodgers

Post on 17-Jan-2016


What is normalization?

Proposed by Codd in 1972

Takes a relation through a series of steps to certify whether it satisfies a certain normal form

Initially Codd proposed three normal forms

Boyce-Codd normal form (BCNF) was introduced later by Boyce and Codd

These normal forms are based on functional dependencies between the attributes of a relation

Later 4th and 5th normal forms were introduced based on multi-valued dependencies and join dependencies

Normalization is the process of efficiently organizing data in a database

There are two goals of the normalization process:

Eliminating redundant data (for example, storing the same data in more than one table)

Ensuring data dependencies make sense (only storing related data in a table)

Both goals reduce the amount of space a database consumes and ensure that data is logically stored

Through normalization we want to design for our relational database a set of files that:

Contain all the data necessary for the purposes that the database is to serve

Have as little redundancy as possible

Accommodate multiple values for types of data that require them

Permit efficient updates of the data in the database

Avoid the danger of losing data unknowingly

Normalization Avoids

Duplication of Data: The same data is listed in multiple lines of the database.

Insert Anomaly: A record about an entity cannot be inserted into the table without first inserting information about another entity. Example: cannot enter a customer without a sales order.

Delete Anomaly: A record cannot be deleted without deleting a record about a related entity. Example: cannot delete a sales order without deleting all of the customer's information.

Update Anomaly: Cannot update information without changing information in many places. Example: to update customer information, it must be updated for each sales order the customer has placed.
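The update and insert anomalies can be made concrete with a small Python sketch. The rows, addresses and helper names below are hypothetical and only illustrate the idea; the field names are borrowed from the ORDERS field list that appears later in this transcript.

```python
# Hypothetical unnormalized rows: the customer's address is repeated
# on every sales order placed by that customer.
orders = [
    {"SalesOrderNo": 1, "CustomerNo": 10, "CustomerAdd": "12 Old Street"},
    {"SalesOrderNo": 2, "CustomerNo": 10, "CustomerAdd": "12 Old Street"},
    {"SalesOrderNo": 3, "CustomerNo": 20, "CustomerAdd": "7 Hill Road"},
]


def update_address_flat(rows, customer_no, new_address):
    """Update anomaly: every order row of the customer must be changed."""
    touched = 0
    for row in rows:
        if row["CustomerNo"] == customer_no:
            row["CustomerAdd"] = new_address
            touched += 1
    return touched


# Normalized alternative (also hypothetical): one row per customer,
# and sales orders only refer to the customer number.
customers = {10: "12 Old Street", 20: "7 Hill Road"}
sales_orders = [(1, 10), (2, 10), (3, 20)]  # (SalesOrderNo, CustomerNo)


def update_address_normalized(customer_table, customer_no, new_address):
    """The address is stored once, so a single row changes."""
    customer_table[customer_no] = new_address
    return 1


flat_rows_touched = update_address_flat(orders, 10, "5 New Avenue")
normalized_rows_touched = update_address_normalized(customers, 10, "5 New Avenue")

# Insert anomaly avoided: a customer can exist without any sales order.
customers[30] = "1 Lake View"

print(flat_rows_touched, normalized_rows_touched)  # 2 1
```

In the flat layout the number of rows touched grows with the number of orders; in the split layout it stays at one.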

The Normal Forms

Guidelines for ensuring that databases are normalized

Numbered from 1 through 5

1NF, 2NF, 3NF, 4NF and 5NF

In practical applications we often see the first three normal forms, occasionally the 4th normal form, and only rarely the 5th normal form

Normalization is a three-stage process:

After the first stage, the data is said to be in first normal form

After the second, it is in second normal form

After the third, it is in third normal form

Before Normalization

Begin with a list of all of the fields that must appear in the database. Think of this as one big table.

Do not include computed fields

One place to begin getting this information is from a printed document used by the system.

Additional attributes besides those for the entities described on the document can be added to the database.

ORDERS (SalesOrderNo, Date, CustomerNo, CustomerName, CustomerAdd, ClerkNo, ClerkName, ItemNo, Description, Qty, UnitPrice)

Some definitions:

Functional Dependency: The value of one attribute in a table is determined entirely by the value of another attribute or set of attributes (in a normalized table, the primary key).

Partial Dependency: A type of functional dependency where an attribute is functionally dependent on only part of the primary key (the primary key must be a composite key).

Transitive Dependency: A type of functional dependency where an attribute is functionally dependent on an attribute other than the primary key, which in turn depends on the primary key. Thus its value is only indirectly determined by the primary key.
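A functional dependency can be tested against sample data: it fails as soon as two rows agree on the determining attributes but differ on the dependent ones. The sketch below assumes a few hypothetical ORDERS rows (the values are invented) and a helper named `fd_holds`. Sample data can only refute a dependency; it cannot prove that the dependency holds for every possible state of the table.

```python
def fd_holds(rows, determinant, dependent):
    """Return True if determinant --> dependent holds in the given rows."""
    seen = {}
    for row in rows:
        lhs = tuple(row[a] for a in determinant)
        rhs = tuple(row[a] for a in dependent)
        # The first row with this lhs fixes the expected rhs value.
        if seen.setdefault(lhs, rhs) != rhs:
            return False
    return True


# Hypothetical sample rows; composite key assumed to be (SalesOrderNo, ItemNo).
rows = [
    {"SalesOrderNo": 1, "ItemNo": "A1", "CustomerNo": 10,
     "CustomerName": "Ann", "Description": "Bolt", "Qty": 2},
    {"SalesOrderNo": 1, "ItemNo": "B2", "CustomerNo": 10,
     "CustomerName": "Ann", "Description": "Nut", "Qty": 5},
    {"SalesOrderNo": 2, "ItemNo": "A1", "CustomerNo": 20,
     "CustomerName": "Bob", "Description": "Bolt", "Qty": 1},
]

# Determined by the whole key.
full = fd_holds(rows, ["SalesOrderNo", "ItemNo"], ["Qty"])
# Partial dependency: Description depends on ItemNo alone.
partial = fd_holds(rows, ["ItemNo"], ["Description"])
# Transitive dependency: CustomerName depends on the non-key CustomerNo.
transitive = fd_holds(rows, ["CustomerNo"], ["CustomerName"])
# Not a dependency: the same item appears with different quantities.
not_fd = fd_holds(rows, ["ItemNo"], ["Qty"])

print(full, partial, transitive, not_fd)  # True True True False
```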

First Normal Form

Disallows multi-valued attributes, composite attributes and complex attributes

Domain of an attribute must include only atomic values (simple and indivisible)

Disallows relations within relations or relations as attribute values within tuples

Example 1:

DNAME           DNUMBER  DMGRENO    DLOCATIONS
Research        5        333445555  {Bangalore, New Delhi, Hyderabad}
Administration  4        987654321  {Chennai}
Headquarters    1        888665555  {Hyderabad}

DLOCATIONS is not an atomic attribute. There are two ways to look at it:

The domain of DLOCATIONS contains atomic values, but some tuples can have a set of these values

The domain of DLOCATIONS contains sets of values (nonatomic)

Techniques to achieve 1NF

Remove the attribute DLOCATIONS that violates 1NF and place it in a separate relation DEPT_LOCATIONS

DEPARTMENT

DNAME           DNUMBER  DMGRENO
Research        5        333445555
Administration  4        987654321
Headquarters    1        888665555

DEPT_LOCATIONS

DNUMBER  DLOCATION
5        Bangalore
5        New Delhi
5        Hyderabad
4        Chennai
1        Hyderabad

Expand the key so that there will be a separate tuple in the original DEPARTMENT for each location. Disadvantage: introduces redundancy in the relation

DNAME           DNUMBER  DMGRENO    DLOCATIONS
Research        5        333445555  Bangalore
Research        5        333445555  New Delhi
Research        5        333445555  Hyderabad
Administration  4        987654321  Chennai
Headquarters    1        888665555  Hyderabad

If the maximum number of values is known for the attribute, replace the attribute by that number of atomic attributes. Disadvantage: introduces null values

DNAME           DNUMBER  DMGRENO    DLOCATION1  DLOCATION2  DLOCATION3
Research        5        333445555  Bangalore   New Delhi   Hyderabad
Administration  4        987654321  Chennai
Headquarters    1        888665555  Hyderabad

Example 2:

EMP_PROJ ( ENO, ENAME, {PROJS ( PNUMBER, HOURS ) } )

ENO is the primary key and PNUMBER is the partial key of the nested relation
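Returning to Example 1, the first technique (moving the multi-valued attribute into its own relation) can be sketched in Python as follows. The DEPARTMENT data is taken from the example; the helper name `split_multivalued` and the use of dictionaries and sets to model tuples are assumptions made for illustration.

```python
# DEPARTMENT from Example 1, with DLOCATIONS modeled as a Python set.
department = [
    {"DNAME": "Research", "DNUMBER": 5, "DMGRENO": "333445555",
     "DLOCATIONS": {"Bangalore", "New Delhi", "Hyderabad"}},
    {"DNAME": "Administration", "DNUMBER": 4, "DMGRENO": "987654321",
     "DLOCATIONS": {"Chennai"}},
    {"DNAME": "Headquarters", "DNUMBER": 1, "DMGRENO": "888665555",
     "DLOCATIONS": {"Hyderabad"}},
]


def split_multivalued(rows, key, attribute, new_name):
    """Move a set-valued attribute into a separate relation keyed by `key`."""
    # The base relation keeps every attribute except the multi-valued one.
    base = [{k: v for k, v in row.items() if k != attribute} for row in rows]
    # The new relation gets one tuple per (key value, single value) pair.
    detail = [
        {key: row[key], new_name: value}
        for row in rows
        for value in sorted(row[attribute])
    ]
    return base, detail


department_1nf, dept_locations = split_multivalued(
    department, "DNUMBER", "DLOCATIONS", "DLOCATION"
)

for row in dept_locations:
    print(row["DNUMBER"], row["DLOCATION"])
```

The same helper could be applied to the nested PROJS relation of Example 2 once its tuples are modeled in a similar way.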

Example 3:

PERSON ( IDNO, ENAME, ADDRESS, AGE, PROFESSION, {CAR_LIC}, {PHONE} )

Second Normal Form

The relation should be in first normal form

Based on full functional dependency

A functional dependency X --> Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more

That is, for any attribute A in X, (X - {A}) does not functionally determine Y

A functional dependency X --> Y is a partial dependency if some attribute A in X can be removed and the dependency still holds, that is, for some A in X, (X - {A}) --> Y
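The two definitions above can be checked mechanically with an attribute closure. The sketch below is one possible way to do it, not a reference implementation. It uses the employee-project relation (ENO, PNO, HOURS, ENAME, PNAME, PLOCATION) of Example 1 in this section; the dependency list is an assumption read off from the decomposition shown for that example.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under a list of (lhs, rhs) dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_full_dependency(x, y, fds):
    """X --> Y is full if it holds and no X - {A} still determines Y."""
    x, y = set(x), set(y)
    if not y <= closure(x, fds):
        return False  # X --> Y does not hold at all
    # Removing one attribute at a time is enough: if a smaller subset
    # determined Y, the larger set X - {A} containing it would as well.
    return all(not y <= closure(x - {a}, fds) for a in x)


# Assumed dependencies for the employee-project relation.
fds = [
    (("ENO", "PNO"), ("HOURS",)),
    (("ENO",), ("ENAME",)),
    (("PNO",), ("PNAME", "PLOCATION")),
]

hours_is_full = is_full_dependency(("ENO", "PNO"), ("HOURS",), fds)
ename_is_full = is_full_dependency(("ENO", "PNO"), ("ENAME",), fds)

print(hours_is_full, ename_is_full)  # True False
```

Here {ENO, PNO} --> ENAME holds but is only a partial dependency, because ENO alone already determines ENAME.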

Example 1:

ENO  PNO  HOURS  ENAME  PNAME  PLOCATION

A relation R is in 2NF if every non-prime attribute A in R is fully functionally dependent on the primary key of R

If the primary key contains a single attribute, the test need not be applied at all

ENAME  ENO  DOB  ADDRESS  DNUMBER  DNAME  DMGRENO

If the relation is not in 2NF, it can be second normalized into a number of 2NF relations in which non-prime attributes are associated only with the part of the primary key on which they are fully functionally dependent

ENO  PNO  HOURS  ENAME  PNAME  PLOCATION

is decomposed into

ENO  PNO  HOURS

ENO  ENAME

PNO  PNAME  PLOCATION

Third Normal Form

Relation should be in second normal form

Based on transitive dependency

A functional dependency X --> Y in a relation R is a transitive dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any key of R, and both X --> Z and Z --> Y hold

Example: the dependencies ENO --> DNUMBER and DNUMBER --> DMGRENO hold, and DNUMBER is neither a key nor a subset of a key, so ENO --> DMGRENO is a transitive dependency
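A minimal sketch of how such chains could be found is given below. It is deliberately simplified: it only handles dependencies with a single attribute on the left-hand side and only follows chains that are listed directly. The dependencies ENO --> DNUMBER and DNUMBER --> DMGRENO come from the text; the remaining ones, and the function name, are assumptions for illustration.

```python
def transitive_dependencies(fds, key):
    """Return (x, z, y) triples where x --> z and z --> y, z outside the key."""
    found = []
    for x, determined in fds.items():
        for z in determined:
            if z in key:
                continue  # z is part of the key, so the chain does not count
            for y in fds.get(z, ()):
                found.append((x, z, y))
    return found


# Single-attribute dependencies: determinant -> list of dependent attributes.
fds = {
    "ENO": ["ENAME", "DOB", "ADDRESS", "DNUMBER"],
    "DNUMBER": ["DNAME", "DMGRENO"],
}
key = {"ENO"}

chains = transitive_dependencies(fds, key)
for x, z, y in chains:
    print(f"{x} --> {z} --> {y}")
```

Each reported chain names the intermediate attribute, which is the attribute that would become the key of a new relation during decomposition.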

A relation is in 3NF if it satisfies 2NF and no non-prime attribute of R is transitively dependent on the primary key

ENAME  ENO  DOB  ADDRESS  DNUMBER  DNAME  DMGRENO

is decomposed into

ENAME  ENO  DOB  ADDRESS  DNUMBER

DNUMBER  DNAME  DMGRENO

In what normal form is this relation?

GRADES (StudentID, Course#, Semester#, Grade)

Suppose you are given a relation R = (A, B, C, D, E) with the following functional dependencies: {CE --> D, D --> B, C --> A}.

a. Find all candidate keys.

b. Identify the best normal form that R satisfies (1NF, 2NF, 3NF)

What is normalization?

A relational database is basically composed of tables that contain related data. The process of organizing this data is called normalization

What is 1NF (Normal Form)?

The domain of an attribute must include only atomic (simple, indivisible) values.

What is 2NF?

A relation schema R is in 2NF if it is in 1NF and every non-prime attribute A in R is fully functionally dependent on the primary key.

What is 3NF?

A relation schema R is in 3NF if it is in 2NF and for every FD X --> A either of the following is true:

X is a superkey of R.

A is a prime attribute of R.

In other words, every non-prime attribute is non-transitively dependent on the primary key.

NORMAL FORM: 1NF

TEST: Relation should have no non-atomic attributes or nested relations

REMEDY: Form new relation for each non-atomic attribute or nested relation

NORMAL FORM: 2NF

TEST: For relations where primary key contains multiple attributes, no non-key attribute should be functionally dependent on a part of the primary key

REMEDY: Decompose and set up a new relation for each partial key with its dependent attribute(s). Make sure to keep a relation with the original primary key and any attributes that are fully functionally dependent on it

NORMAL FORM: 3NF

TEST: Relation should not have a non-key attribute functionally determined by another non-key attribute. There should be no transitive dependency of a non-key attribute on the primary key

REMEDY: Decompose and set up a relation that includes the non-key attribute(s) that functionally determine other non-key attribute(s)

Conceptual Design:

Patient ( NID, Name, Age, {ContactDetails ( Address, {Telephone} )}, Ward, WardInCharge, WardLocation )

Convert this relation into 1st Normal Form, 2nd Normal Form, 3rd Normal Form

Boyce-Codd Normal Form (BCNF)

Simpler form of 3NF

Stricter than 3NF

Every relation in BCNF is also in 3NF

Relation in 3NF is not necessarily in BCNF

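A relation schema R is in BCNF if whenever a non-trivial functional dependency X --> A holds in relation R, then X is a superkey of R

Equivalently, for every functional dependency X --> A that holds in R, at least one of the following is true:

X --> A is trivial (i.e., A is a subset of X)

X is a superkey for R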

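Example:

R = (A, B, C)

F = {A --> B, B --> C}

Key = {A}

R is not in BCNF (B --> C holds, but B is not a superkey)

Decomposition: R1 = (A, B), R2 = (B, C)

R1 and R2 are in BCNF

The decomposition is a lossless-join decomposition

The decomposition is dependency preserving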
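The claims in this example can be checked with a short Python sketch built on attribute closure. The function names are assumptions, and the dependencies that apply to R1 and R2 are supplied by hand rather than computed, since projecting dependencies onto a sub-relation is outside the scope of this sketch. The lossless-join test used here is the standard one for a decomposition into two relations: the common attributes must determine all of R1 or all of R2.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under a list of (lhs, rhs) dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Non-trivial dependencies whose left side is not a superkey."""
    relation = set(relation)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and not relation <= closure(lhs, fds)
    ]


def is_lossless_binary(r1, r2, fds):
    """Two-relation test: common attributes determine all of r1 or r2."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure


F = [(("A",), ("B",)), (("B",), ("C",))]

violations_r = bcnf_violations(("A", "B", "C"), F)
violations_r1 = bcnf_violations(("A", "B"), [(("A",), ("B",))])
violations_r2 = bcnf_violations(("B", "C"), [(("B",), ("C",))])
lossless = is_lossless_binary(("A", "B"), ("B", "C"), F)

print(violations_r)   # [(('B',), ('C',))]
print(violations_r1, violations_r2, lossless)  # [] [] True
```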

Example:

PROPERTY_ID  LOCATION  PROVINCE  AREA  PRICE  TAX_RATE

Example:

Patient No  Patient Name  Appointment Id  Time   Doctor
1           John          0               09:00  Zorro
2           Kerr          0               09:00  Killer
3           Adam          1               10:00  Zorro
4           Robert        0               13:00  Killer
5           Zane          1               14:00  Zorro

Patno --> PatName

Patno, appNo --> Time, doctor

Time --> appNo

Example:

Grade_report ( StudNo, StudName, Grade ( Major, Advisor, Grade ( CourseNo, Ctitle, InstrucName, InstructLocn, Grade ) ) )