fit1004 database topic 6: normalisation - pravin … · fit1004 database topic 6: normalisation ......

www.infotech.monash.edu.au/FIT1004/

FIT1004 Database Topic 6: Normalisation

Learning Objectives:
• Understand the purpose of normalisation
• Understand the problems associated with redundant data
• Identify various types of update anomalies such as insertion, deletion, and modification anomalies
• Recognise the appropriateness or quality of the design of relations
• Identify various types of functional dependencies between attributes
• Understand how functional dependencies can be used to group attributes into relations that are in a known normal form
• Identify the most commonly used normal forms, namely 1NF, 2NF and 3NF
• Perform normalisation
• Understand various ways to refine 3NF relations to achieve better database design
• Produce an ER diagram from the derived set of 3NF relations

References:
• Rob, P. & Coronel, C., Database Systems, 6th Edition, Chapt. 5, p. 182 – 221; 7th Edition, Chapt. 5, p. 147 – 174

2

Where are we?

Introduction to Database Systems The Relational Model

Conceptual Design Logical Design Normalisation

Database Lifecycle Physical Design

SQL (DML) SQL (DDL & DCL) Implementation Transaction Management

Database Administration

Data Warehousing & Data Mining

3

Normalisation

• Normalisation is a technique for producing a set of relations with desirable properties, given the data requirements of an enterprise:

  – Developed by E.F. Codd (1972)
  – Often performed as a series of tests on a relation to determine whether it satisfies or violates the requirements of a given normal form

• Four most commonly used normal forms are:
  – First (1NF),
  – Second (2NF),
  – Third (3NF) – normally sufficient point, and
  – Boyce-Codd (BCNF)
  – 4NF, …. etc (required by some very specialised applications)

• Based on functional dependencies among the attributes of a relation

• Major aim of relational database design is to group attributes into relations to minimise data redundancy and reduce file storage space required by base relations

4

Why Normalisation is required

Note * signifies Project Leader

5

Problems with table in Figure 5.1

• PROJ_NUM intended to be primary key, but it contains nulls
• JOB_CLASS invites entry errors, eg. Elec. Eng. vs Elect. Engineer vs E.E.
• Project relation has redundant data
  – details of a charge per hour are repeated for every occurrence of job class
  – every time an employee is assigned to a project, emp name is repeated
• Relations that contain redundant information may potentially suffer from update anomalies
  – Types of update anomalies include:
    > Insertion
      – Insert a new employee only if they are assigned to a project
    > Deletion
      – Delete the last employee assigned to a project?
      – Delete the last employee of a particular job class?
    > Modification
      – Update a job class hourly rate - need to update multiple rows
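The three anomalies can be reproduced on a small in-memory table. This is only a sketch: the rows and values are invented sample data (not the contents of Figure 5.1), and the helper names are hypothetical. The column names follow the attribute names used in these slides.

```python
# Hypothetical sample rows held in one flat table; all values are invented.
rows = [
    {"proj_num": 15, "emp_num": 101, "emp_name": "Ng",
     "job_class": "Elect. Engineer", "chg_hour": 84.50},
    {"proj_num": 15, "emp_num": 102, "emp_name": "Rao",
     "job_class": "Systems Analyst", "chg_hour": 96.75},
    {"proj_num": 18, "emp_num": 101, "emp_name": "Ng",
     "job_class": "Elect. Engineer", "chg_hour": 84.50},
]


def update_rate(table, job_class, new_rate):
    """Change the hourly rate of a job class; return how many rows were touched."""
    touched = 0
    for row in table:
        if row["job_class"] == job_class:
            row["chg_hour"] = new_rate
            touched += 1
    return touched


def known_job_classes(table):
    """Job classes whose rate can still be looked up in the table."""
    return {row["job_class"] for row in table}


# Modification anomaly: one fact (the Elect. Engineer rate) is stored twice.
rows_touched = update_rate(rows, "Elect. Engineer", 90.00)

# Deletion anomaly: removing the only Systems Analyst also loses that rate.
after_delete = [r for r in rows if r["emp_num"] != 102]
lost = known_job_classes(rows) - known_job_classes(after_delete)

# Insertion anomaly: an employee without a project has no value for proj_num,
# and an attribute that is part of the key cannot be null.
unassigned = {"proj_num": None, "emp_num": 103, "emp_name": "Lee",
              "job_class": "Systems Analyst", "chg_hour": 96.75}
insert_blocked = unassigned["proj_num"] is None
```

Here `rows_touched` counts the duplicated copies of one rate, and `lost` holds the job class that disappears together with its last employee.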

6

Functional Dependence

• An attribute B is FUNCTIONALLY DEPENDENT on another attribute A, if a value of A determines a single value of B at any one time.
  – A → B
  – EMP# → EMP_NAME
  – CUSTNUMB → CUSTNAME
  – ORDER-NUMBER → ORDER-DATE
    > ORDER-NUMBER - independent variable, also known as the DETERMINANT
    > ORDER-DATE - dependent variable

• TOTAL DEPENDENCY – attribute A determines B AND attribute B determines A
    > EMPLOYEE-NUMBER ↔ TAX-FILE-NUMBER
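A functional dependency can be tested against sample data with a few lines of code. The sketch below assumes rows are stored as dictionaries; the function name `holds` and the sample orders are hypothetical. A check like this can only show that a dependency is violated by the data at hand. Whether a dependency really holds is a statement about the business rules, not about one set of rows.

```python
def holds(rows, determinant, dependent):
    """Return True if each determinant value maps to one dependent value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Invented sample data for illustration.
orders = [
    {"order_number": 1, "order_date": "2024-03-01"},
    {"order_number": 2, "order_date": "2024-03-01"},
    {"order_number": 3, "order_date": "2024-03-02"},
]

number_determines_date = holds(orders, ["order_number"], ["order_date"])
date_determines_number = holds(orders, ["order_date"], ["order_number"])
```

In this sample the order number determines the order date, but not the other way round, because two orders share a date. A total dependency would require both checks to succeed.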

7

Functional Dependence

• FULL DEPENDENCY – occurs when an attribute depends on a determinant of AT LEAST TWO attributes taken together, and not on any subset of them
  – ORDER-NUMBER, PART-NUMBER → QTY-ORDERED
  – lack of full dependence for multiple attribute key = partial dependence

• TRANSITIVE DEPENDENCY
  – occurs when Y depends on X, and Z depends on Y - thus Z also depends on X
    > X → Y → Z
      – INVOICE-NUMB → CUSTOMER-NUMB → CUSTOMER-NAME

• Dependencies are depicted with the help of a Dependency Diagram
• NORMALISATION - SIMPLY 'COMMON SENSE'
• Converts a table into tables of progressively smaller degree and cardinality until an optimum level of decomposition is reached - little or no data redundancy exists
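Partial and transitive dependencies can be found mechanically once the functional dependencies are written down. The sketch below uses the standard attribute-closure computation. The function names are hypothetical, and placing ORDER-DATE in the same relation as QTY-ORDERED is an assumption made only to produce a partial dependency.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, fds):
    """Map each proper subset of `key` to the non-key attributes it determines."""
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            extra = closure(subset, fds) - set(key)
            if extra:
                found[subset] = extra
    return found


# Each dependency is written as (determinant, dependent attributes).
order_fds = [
    (("order_number", "part_number"), ("qty_ordered",)),
    (("order_number",), ("order_date",)),
]
invoice_fds = [
    (("invoice_numb",), ("customer_numb",)),
    (("customer_numb",), ("customer_name",)),
]

partials = partial_dependencies({"order_number", "part_number"}, order_fds)
reachable = closure({"invoice_numb"}, invoice_fds)
```

`partials` reports that `order_date` depends on `order_number` alone (a partial dependency), while `reachable` shows that `customer_name` is determined by `invoice_numb` only through `customer_numb` (a transitive dependency).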

8

First Normal Form

• Positive results from normalisation -
  – amount of space needed to store data will be lower
  – table can be updated with greater efficiency
  – description of database will be straightforward

• Unnormalised form (UNF) – raw data from table/form/grid
• UNF: PROJECT (proj_num, proj_name (emp_num, emp_name, ….))
  – Figure 5.1 consists of a set of projects with each project having a set of project-employee details (model 1)

• FIRST NORMAL FORM (part of formal definition of relation)
  – A TABLE IS IN FIRST NORMAL FORM (1NF) IF -
    > it is a valid table (in particular no repeating groups)
    > a unique key has been identified for each row
    > all attributes are functionally dependent on all or part of the key
  – 1NF: PROJECT (proj_num, proj_name)
  – 1NF: ASSIGN (proj_num, emp_num, emp_name, job_class, chg_hour, assign_hours)

9

UNF to 1NF transformation

• Identify the repeating group(s), if any, in the unnormalised relation
• Move from UNF to 1NF by removing the repeating group, along with the PK of the main relation
• Important property of normalisation decomposition
  – Lossless-join property enables us to find any instance of the original relation from corresponding instances in the smaller relations
  – hence must extract PK of main relation
• Determine PK of new relations created
  – extracted repeating group will normally have a composite PK including the main relation's PK
    > but NOT always, PK of main relation may simply act as a FK
      – INSURED (comp_code, comp_name (insured_id, insured_name, ..))
        » COMPANY (comp_code, comp_name)
        » INSURED (insured_id, comp_code, insured_name, ..)
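The UNF to 1NF step for the PROJECT example can be sketched as follows. The nested dictionaries stand in for the unnormalised relation; the project names, employee names, hours and the function name `to_1nf` are all invented for illustration, and only some of the employee attributes are shown.

```python
# UNF: each project carries a repeating group of employee details.
unf_projects = [
    {"proj_num": 15, "proj_name": "Alpha", "employees": [
        {"emp_num": 101, "emp_name": "Ng", "assign_hours": 23.8},
        {"emp_num": 102, "emp_name": "Rao", "assign_hours": 19.4},
    ]},
    {"proj_num": 18, "proj_name": "Beta", "employees": [
        {"emp_num": 101, "emp_name": "Ng", "assign_hours": 12.6},
    ]},
]


def to_1nf(projects):
    """Split off the repeating group, copying the main relation's PK into it."""
    project, assign = [], []
    for p in projects:
        project.append({"proj_num": p["proj_num"], "proj_name": p["proj_name"]})
        for e in p["employees"]:
            # proj_num travels with each extracted row so the two
            # relations can be joined back together.
            assign.append({"proj_num": p["proj_num"], **e})
    return project, assign


project, assign = to_1nf(unf_projects)

# The composite (proj_num, emp_num) identifies each ASSIGN row.
assign_keys = [(r["proj_num"], r["emp_num"]) for r in assign]
```

In this example neither `proj_num` nor `emp_num` is unique on its own in `assign`, so the extracted relation needs the composite key, which is the normal case described above.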

10

First Normal Form continued

• An alternative way (model 2) of looking at this scenario
  – Present data in tabular format, where each cell has a single value and there are no repeating groups
  – Eliminate repeating groups, eliminate nulls by making sure that each repeating group attribute contains an appropriate data value
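One way to read "eliminate nulls" in model 2 is to copy the project details down into every row that belongs to the same project. The sketch below assumes that a null in `proj_num` or `proj_name` means "same as the row above" and that the first row is fully populated; both are assumptions about the layout of the figure, and the data and function name are invented.

```python
# Flat table with blank (None) cells where the project details were not repeated.
flat_rows = [
    {"proj_num": 15, "proj_name": "Alpha", "emp_num": 101},
    {"proj_num": None, "proj_name": None, "emp_num": 102},
    {"proj_num": 18, "proj_name": "Beta", "emp_num": 101},
]


def fill_down(rows, attrs):
    """Return copies of `rows` with nulls in `attrs` replaced by the last value seen."""
    last_seen = {}
    filled = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows are left unchanged
        for attr in attrs:
            if new_row[attr] is None:
                new_row[attr] = last_seen[attr]
            else:
                last_seen[attr] = new_row[attr]
        filled.append(new_row)
    return filled


rows_1nf = fill_down(flat_rows, ["proj_num", "proj_name"])
```

After this step every row has a value for `proj_num`, so the composite (proj_num, emp_num) can serve as the key.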

11

Model 2: Dependency Diagram (1NF)

12

1NF to 2NF

• A RELATION IS IN 2NF IF -
  – it is in 1NF, and
  – all non key attributes are functionally dependent on the entire key
    > ie. no partial dependencies exist
• Model 1: move from 1NF to 2NF by removing partial dependencies
  – 1NF: PROJECT (proj_num, proj_name)
  – 1NF: ASSIGN (proj_num, emp_num, emp_name, job_class, chg_hour, assign_hours)
• 1NF: PROJECT (proj_num, proj_name)
  – already in 2NF: only one attribute in PK, thus there CANNOT be any partial dependencies
    > 2NF: PROJECT (proj_num, proj_name)
• 1NF: ASSIGN (proj_num, emp_num, emp_name, job_class, chg_hour, assign_hours)
  – becomes
    > 2NF: EMPLOYEE (emp_num, emp_name, job_class, chg_hour)
    > 2NF: ASSIGN (proj_num, emp_num, assign_hours)
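The split of ASSIGN into EMPLOYEE and ASSIGN can be checked for the lossless-join property on sample data. The sketch below uses hand-written projection and natural join helpers (hypothetical names) and invented rows. It demonstrates the property for this one sample rather than proving it in general.

```python
def project(rows, attrs):
    """Relational projection: keep `attrs` and drop duplicate tuples."""
    result = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result


def natural_join(left, right):
    """Join rows that agree on every attribute the two relations share."""
    joined = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in set(l) & set(r)):
                joined.append({**l, **r})
    return joined


def as_set(rows):
    """Order-independent view of a relation, for comparison."""
    return {frozenset(r.items()) for r in rows}


cols = ["proj_num", "emp_num", "emp_name", "job_class", "chg_hour", "assign_hours"]
data = [  # invented sample rows for 1NF ASSIGN
    (15, 101, "Ng", "Elect. Engineer", 84.50, 23.8),
    (15, 102, "Rao", "Systems Analyst", 96.75, 19.4),
    (18, 101, "Ng", "Elect. Engineer", 84.50, 12.6),
    (18, 103, "Lee", "Systems Analyst", 96.75, 5.0),
]
assign_1nf = [dict(zip(cols, values)) for values in data]

# 2NF decomposition from the slide.
employee_2nf = project(assign_1nf, ["emp_num", "emp_name", "job_class", "chg_hour"])
assign_2nf = project(assign_1nf, ["proj_num", "emp_num", "assign_hours"])

# Joining on the shared attribute emp_num rebuilds the original rows.
lossless = as_set(natural_join(assign_2nf, employee_2nf)) == as_set(assign_1nf)
```

Employee 101 appears twice in `assign_1nf` but only once in `employee_2nf`, which is the redundancy the 2NF step removes.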

13

2NF Conversion Results (Model 1 & 2)

Note: Models 1 & 2 are now equivalent

14

2NF to 3NF

• A RELATION IS IN 3NF IF -
  – it is in 2NF, and
  – all transitive dependencies have been removed - check for a non key attribute dependent on another non key attribute
• Move from 2NF to 3NF by removing transitive dependencies
  – 2NF: PROJECT (proj_num, proj_name)
  – 2NF: EMPLOYEE (emp_num, emp_name, job_class, chg_hour)
  – 2NF: ASSIGN (proj_num, emp_num, assign_hours)
• PROJECT and ASSIGN already in 3NF
  – 3NF: PROJECT (proj_num, proj_name)
  – 3NF: ASSIGN (proj_num, emp_num, assign_hours)
• 2NF: EMPLOYEE (emp_num, emp_name, job_class, chg_hour)
  – 3NF: EMPLOYEE (emp_num, emp_name, job_class)
  – 3NF: JOB (job_class, chg_hour)
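The effect of the 3NF step on the modification anomaly can be seen in a short sketch. The rows are invented, and representing JOB as a dictionary keyed by `job_class` is simply a convenient stand-in for a relation whose PK is `job_class`.

```python
# Invented sample rows for 2NF EMPLOYEE: chg_hour depends on job_class,
# which in turn depends on emp_num (a transitive dependency).
employee_2nf = [
    {"emp_num": 101, "emp_name": "Ng", "job_class": "Elect. Engineer", "chg_hour": 84.50},
    {"emp_num": 102, "emp_name": "Rao", "job_class": "Systems Analyst", "chg_hour": 96.75},
    {"emp_num": 103, "emp_name": "Lee", "job_class": "Systems Analyst", "chg_hour": 96.75},
]

# 3NF: move job_class -> chg_hour into its own relation.
job = {}           # JOB (job_class, chg_hour), keyed by job_class
employee_3nf = []  # EMPLOYEE (emp_num, emp_name, job_class)
for row in employee_2nf:
    job[row["job_class"]] = row["chg_hour"]
    employee_3nf.append({k: row[k] for k in ("emp_num", "emp_name", "job_class")})

# The rate for a job class is now changed in exactly one place.
job["Systems Analyst"] = 99.00

# Looking the rate up through job_class gives every analyst the new rate.
rates = {e["emp_num"]: job[e["job_class"]] for e in employee_3nf}
```

Both Systems Analysts pick up the new rate from the single JOB entry, so the two copies can no longer disagree.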

15

3NF Conversion Results

16

Improving the Design

• To improve the design of the database the following changes could be made:
  – PK assignment
  – Naming conventions
  – Attribute atomicity
  – Adding attributes
  – Adding relationships
  – Refining PKs
  – Maintaining historical accuracy
  – Using derived attributes

17

Improving the Design continued

• Returning to Table 5.1 (slide 4)
  – Data loss – who is the project leader?
    > modify project (R&C approach)
      – 3NF: PROJECT (proj_num, proj_name, emp_num)
    > Alternative, add emp_num at UNF
    > Do not use synonyms when naming attributes – always use the same name for the same attribute, eg. do not rename emp_num in PROJECT to leader_num
  – JOB (job_class, chg_hour)
    > Job_class is a string, eg. Systems Analyst
      – Redundant data with associated issues, poor PK
      – Better to create job code
    > modify job (R&C approach)
      – 3NF: JOB (job_code, job_description, job_chg_hour)
    > Alternative, make changes at UNF

18

Completed Database

19

Completed Database continued

20

Entire Process UNF to 3NF

• UNF
  – PROJECT (proj_num, proj_name, emp_num (emp_num, emp_name, job_code, job_description, job_chg_hour, assign_hours))
• 1NF – remove repeating group and identify PK
  – PROJECT (proj_num, proj_name, emp_num)
  – ASSIGN (proj_num, emp_num, emp_name, job_code, job_description, job_chg_hour, assign_hours)
• 2NF – remove partial dependencies
  – PROJECT (proj_num, proj_name, emp_num)
  – EMPLOYEE (emp_num, emp_name, job_code, job_description, job_chg_hour)
  – ASSIGN (proj_num, emp_num, assign_hours)
• 3NF – remove transitive dependencies
  – PROJECT (proj_num, proj_name, emp_num)
  – EMPLOYEE (emp_num, emp_name, job_code)
  – ASSIGN (proj_num, emp_num, assign_hours)
  – JOB (job_code, job_description, job_chg_hour)
• Note R&C show some further 'suggested' improvements
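The four 3NF relations can be written as table definitions and tried out with Python's built-in `sqlite3` module. This is a sketch only: the column types, the NOT NULL choices and the sample rows are assumptions, and the keys are those implied by the decomposition above (single-attribute PKs for JOB, EMPLOYEE and PROJECT, a composite PK for ASSIGN).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE job (
    job_code        TEXT PRIMARY KEY,
    job_description TEXT NOT NULL,
    job_chg_hour    REAL NOT NULL
);
CREATE TABLE employee (
    emp_num  INTEGER PRIMARY KEY,
    emp_name TEXT NOT NULL,
    job_code TEXT NOT NULL REFERENCES job (job_code)
);
CREATE TABLE project (
    proj_num  INTEGER PRIMARY KEY,
    proj_name TEXT NOT NULL,
    emp_num   INTEGER REFERENCES employee (emp_num)  -- project leader
);
CREATE TABLE assign (
    proj_num     INTEGER NOT NULL REFERENCES project (proj_num),
    emp_num      INTEGER NOT NULL REFERENCES employee (emp_num),
    assign_hours REAL,
    PRIMARY KEY (proj_num, emp_num)
);
""")

# Invented sample rows, inserted parent-first so the FKs are satisfied.
conn.execute("INSERT INTO job VALUES ('SA', 'Systems Analyst', 96.75)")
conn.execute("INSERT INTO employee VALUES (102, 'Rao', 'SA')")
conn.execute("INSERT INTO project VALUES (15, 'Alpha', 102)")
conn.execute("INSERT INTO assign VALUES (15, 102, 19.4)")


def try_insert(sql):
    """Return True if the statement succeeds, False on a constraint violation."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


orphan_ok = try_insert("INSERT INTO assign VALUES (15, 999, 1.0)")     # unknown employee
duplicate_ok = try_insert("INSERT INTO assign VALUES (15, 102, 2.0)")  # repeated PK
assign_count = conn.execute("SELECT COUNT(*) FROM assign").fetchone()[0]
```

Both extra inserts are rejected: the first refers to an employee that does not exist, and the second repeats an existing (proj_num, emp_num) pair.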

21

Normalisation presented as a Conceptual ERD

22

Normalisation presented as a Logical ERD

23

Normalisation and Database Design

• Normalisation should be part of design process
• Make sure that proposed entities meet required normal form before table structures are created
• ER diagram
  – Provides the big picture, or macro view, of an organization’s data requirements and operations
  – Created through an iterative process
    > Identifying relevant entities, their attributes and their relationships
    > Use results to identify additional entities and attributes
• Normalisation procedures
  – Focus on the characteristics of specific entities
  – A micro view of the entities within the ER diagram
• Difficult to separate normalisation process from ER modeling process
• Two techniques should be used concurrently

24

Normalisation and ER Diagrams

ER Diagramming
• Top down approach
• Fast
• Examine requirements
• Business knowledge

Normalisation
• Bottom up approach
• Very slow
• Examine existing data
• Mathematically based

• Top down create - bottom up checking
• Accuracy
• Greater understanding of the data

25

Summary

• This lecture
  – Understand the purpose of normalisation
  – Understand the problems associated with redundant data
  – Identify various types of update anomalies such as insertion, deletion, and modification anomalies
  – Recognise the appropriateness or quality of the design of relations
  – Identify various types of functional dependencies between attributes
  – Understand how functional dependencies can be used to group attributes into relations that are in a known normal form
  – Identify the most commonly used normal forms, namely 1NF, 2NF and 3NF
  – Perform normalisation
  – Understand various ways to refine 3NF relations to achieve better database design
  – Produce an ER diagram from the derived set of 3NF relations
• Next lecture
  – Structured Query Language (SQL) - DML
