Data Modelling 2
TRANSCRIPT
-
8/2/2019 Data Modelling 2
Data Modelling 2
Normalisation
By Haik Richards
-
Normalisation versus E-R modelling
E-R modelling is a top-down methodology, i.e. look at entities, then attributes, analyse associations and then construct tables.
Normalisation is a bottom-up methodology, i.e. the analyst looks at the structure of tables already being used in an enterprise, and applies normalisation (a methodology) to the tables in order to improve the structure.
-
Normalisation simple example
Question: What should the analyst do with the above table?
-
Normalised Tables
i.e. data stored in a table should be about a single entity
-
Normalisation
Often results in splitting a table into smaller tables,
therefore referred to as the recognise and split method.
Question: What sort of unwanted/undesirable consequences might there be if the employee table was not split?
-
Anomalies
Anomalies refer to unexpected or unwanted effects that poorly constructed tables can produce.
3 common types of anomalies:-
Delete anomaly
Update anomaly
Insert anomaly
-
Delete anomaly
If Tom leaves the company, and we delete that row of the table, what undesirable effect will happen?
We will lose the fact that we have a Sales Department.
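A minimal sketch of the delete anomaly using SQLite from Python. The slide's table image is not reproduced in this transcript, so the single-table design and column names (ename, dept) are assumptions:

```python
import sqlite3

# Hypothetical unnormalised table: employee and department
# facts stored together in one relation.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (ename TEXT, dept TEXT)")
con.executemany("INSERT INTO employee VALUES (?, ?)",
                [("Tom", "Sales"), ("Ann", "Accounts")])

# Tom is the only Sales employee; deleting his row also deletes
# the only evidence that a Sales department exists.
con.execute("DELETE FROM employee WHERE ename = 'Tom'")
depts = [d for (d,) in con.execute("SELECT DISTINCT dept FROM employee")]
print(depts)  # ['Accounts'] -- Sales has vanished
```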
-
Update anomaly
If the name of the department Human Resources changes to Human Division, how many rows will we have to change?
...And what if we don't change all the rows? Our database will be in an inconsistent state.
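The same assumed single-table design shows the update anomaly: a rename that misses even one row leaves two spellings of the same department in the database.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (ename TEXT, dept TEXT)")
con.executemany("INSERT INTO employee VALUES (?, ?)",
                [("Ann", "Human Resources"),
                 ("Bob", "Human Resources"),
                 ("Cal", "Human Resources")])

# The department name is repeated on every employee row, so a
# rename must touch all of them. This UPDATE only reaches one
# row, leaving the database in an inconsistent state.
con.execute("UPDATE employee SET dept = 'Human Division' "
            "WHERE ename = 'Ann'")
spellings = {d for (d,) in con.execute("SELECT DISTINCT dept FROM employee")}
print(spellings)  # two names now coexist for one department
```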
-
Insert anomaly
What if we now have a new department called Marketing, but we have not yet assigned any employees to it?
We will be forced to introduce nulls for many fields. Nulls are undesirable because they could mean anything.
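And the insert anomaly, again with the assumed column names: recording a department with no staff forces a null into the employee field.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (ename TEXT, dept TEXT)")

# Marketing has no staff yet, so the only way to record it in
# this design is a row with NULL in the employee column.
con.execute("INSERT INTO employee (ename, dept) VALUES (NULL, 'Marketing')")
row = con.execute("SELECT ename, dept FROM employee").fetchone()
print(row)  # (None, 'Marketing')
```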
-
Normalisation
So do organisations have such poorly constructed tables?
Yes. They are rife, everywhere.
We used simple common sense to normalise the above table.
Is there a more precise method of doing normalisation? Yes.
-
Normal Forms
The process of normalisation requires the analyst to work progressively through a series of normal forms:-
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
Most analysts will work through to 3NF. Up to 3NF most anomalies will be eliminated.
-
A worked example up to 3NF
A typical order form
Might be stored in the following unnormalised table:
-
Normalising the table
To begin the process of normalisation we list the column headings in a vertical format as follows:
-
First Normal Form (1NF)
remove repeating groups
Notice that for each order in the system there may be a number of products referred to by the order. We therefore split this information off into its own group, making sure that we maintain the relationship between the information in the group through the common attribute 'ono'. This is illustrated in the table below:
Note: The lower relation has a composite key (two attributes acting as primary key).
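The 1NF split can be sketched in SQL via Python's sqlite3. The order form itself is not reproduced in this transcript, so the column names (ono, odate, cno, cname, pno, pdesc, qty) are assumptions based on the attributes the later slides mention:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Order-level attributes stay in one relation, keyed by ono.
con.execute("""CREATE TABLE orders (
    ono   INTEGER PRIMARY KEY,
    odate TEXT,
    cno   INTEGER,
    cname TEXT)""")

# The repeating group of product lines is split off; 'ono'
# preserves the link, and (ono, pno) form the composite key.
con.execute("""CREATE TABLE order_line (
    ono   INTEGER,
    pno   INTEGER,
    pdesc TEXT,
    qty   INTEGER,
    PRIMARY KEY (ono, pno))""")

con.execute("INSERT INTO order_line VALUES (101, 1, 'Widget', 5)")
```

The composite primary key means the same product cannot appear twice on one order; a second insert of (101, 1, ...) would raise an IntegrityError.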
-
We now have 2 tables
...but duplication and anomalies still exist!
-
Second Normal Form (2NF)
remove partial dependencies
Every non-key attribute must depend on the key, the whole key, and nothing but the key (so help me Codd).
In our 1NF column, we have only one group with a composite key - the second group. It is only this group that we therefore need to check to see if there are any partial dependencies.
i.e. for each attribute in this group we check to see whether it depends on the whole composite key or just part of it. If it depends on just part of the composite key, we must split it off into another table.
-
Second Normal Form (2NF)
Only qty depends on the whole key. The other attributes depend on pno and are split off.
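Sketched with the same assumed column names, the 2NF step moves the pno-dependent attribute pdesc out of the order-line relation, leaving only qty with the composite key:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# qty depends on the whole key (ono, pno), so it stays here.
con.execute("""CREATE TABLE order_line (
    ono INTEGER, pno INTEGER, qty INTEGER,
    PRIMARY KEY (ono, pno))""")

# pdesc depends on pno alone (a partial dependency), so it is
# split off into its own relation with pno as the key.
con.execute("CREATE TABLE product (pno INTEGER PRIMARY KEY, pdesc TEXT)")

con.execute("INSERT INTO product VALUES (1, 'Widget')")
con.executemany("INSERT INTO order_line VALUES (?, ?, ?)",
                [(101, 1, 5), (102, 1, 2)])

# The description is now stored once, however many orders refer to it.
rows = con.execute("""SELECT ol.ono, p.pdesc, ol.qty
                      FROM order_line ol
                      JOIN product p ON ol.pno = p.pno
                      ORDER BY ol.ono""").fetchall()
print(rows)  # [(101, 'Widget', 5), (102, 'Widget', 2)]
```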
-
We now have 3 tables:-
...But duplication and anomalies still exist!
-
Third Normal Form (3NF)
- no transitive dependencies
Note that in the 2NF column, there is a transitive dependency between cno and cname, i.e. cno can be used as a key for the attribute cname. We can therefore split off the customer information into its own group. This can be seen below.
-
Therefore, we now have 4 tables
We have now removed most (maybe 99%) of the anomalies.
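Under the same assumed column names, the four 3NF relations, and a JOIN that reassembles one line of the original order form, might look like:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customer   (cno INTEGER PRIMARY KEY, cname TEXT);
CREATE TABLE orders     (ono INTEGER PRIMARY KEY, odate TEXT,
                         cno INTEGER REFERENCES customer);
CREATE TABLE product    (pno INTEGER PRIMARY KEY, pdesc TEXT);
CREATE TABLE order_line (ono INTEGER REFERENCES orders,
                         pno INTEGER REFERENCES product,
                         qty INTEGER,
                         PRIMARY KEY (ono, pno));

INSERT INTO customer   VALUES (7, 'Smith');
INSERT INTO orders     VALUES (101, '2019-02-08', 7);
INSERT INTO product    VALUES (1, 'Widget');
INSERT INTO order_line VALUES (101, 1, 5);
""")

# Joining the four tables reassembles one order-form line.
form = con.execute("""SELECT o.ono, o.odate, c.cname, p.pdesc, ol.qty
                      FROM orders o
                      JOIN customer c    ON o.cno = c.cno
                      JOIN order_line ol ON ol.ono = o.ono
                      JOIN product p     ON p.pno = ol.pno""").fetchone()
print(form)  # (101, '2019-02-08', 'Smith', 'Widget', 5)
```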
-
Example 2
Normalise the following drug card to 3NF:
The data is stored in the following un-normalised table:-
-
And here is the above normalised to 3NF
-
Entity Relationship Modelling &
Normalisation
Most times, analysts will construct a data model using E-R modelling. This provides a good first stab at design. It is also a good starting point for data modelling as E-R models (being diagrammatic) are relatively easy for end-users to understand.
Normalisation will then be used to validate the correctness of the E-R model. To do this, each entity in the E-R model will be checked by going through the normalisation process.
i.e. E-R modelling and normalisation are seen as complementary methods.
-
De-normalisation:
Normalisation splits database information into multiple tables.
To retrieve complete information from multiple tables requires the use of the JOIN operation in SQL.
Joins produce an overhead on processing power, and very large joins can make retrieval times deteriorate. Therefore... it is sometimes decided to de-normalise relations in order to improve access time for queries.
De-normalisation is the process of combining data from 2 or more normalised tables into a single table.
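A sketch of de-normalisation in SQLite (table and column names hypothetical): the joined result is materialised into a single table, so later queries avoid the JOIN at the price of repeating data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE department (dno INTEGER PRIMARY KEY, dname TEXT);
CREATE TABLE employee   (eno INTEGER PRIMARY KEY, ename TEXT,
                         dno INTEGER REFERENCES department);
INSERT INTO department VALUES (1, 'Sales');
INSERT INTO employee   VALUES (10, 'Tom', 1);
""")

# Combine the two normalised tables into one de-normalised table.
# Queries on it no longer pay for a join, but the department name
# is now repeated on every employee row (and anomalies can return).
con.execute("""CREATE TABLE emp_dept AS
               SELECT e.eno, e.ename, d.dname
               FROM employee e JOIN department d ON e.dno = d.dno""")
row = con.execute("SELECT eno, ename, dname FROM emp_dept").fetchone()
print(row)  # (10, 'Tom', 'Sales')
```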
-
Derived Data
Data that can be computed (calculated) should not be included in a normalised table. eg the total number of employees in a department should not be an attribute of a department table.
Why?
Can create inconsistencies
Requires extra storage
Can be computed using count(*) so why store it?
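For example, rather than storing a headcount on the department row, it can be computed on demand (table and column names hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (ename TEXT, dept TEXT)")
con.executemany("INSERT INTO employee VALUES (?, ?)",
                [("Tom", "Sales"), ("Ann", "Sales"), ("Bob", "Accounts")])

# The headcount is derived data: computing it with count(*) means
# it can never drift out of step with the employee rows themselves.
(n_sales,) = con.execute(
    "SELECT count(*) FROM employee WHERE dept = 'Sales'").fetchone()
print(n_sales)  # 2
```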
-
A case for storing derived data
Computations can take a long time to do on large volumes of data, increasing query response times.
Some computations are very complex - beyond the ability of most managers who need access to summary statistics, aggregated values etc. eg total sales for 1st quarter 2007 in North West region for all washing machine components.
Do managers/end-users know SQL?
To meet the needs of managers, data warehouses are being used.
Data warehouses contain the summary statistics managers need for decision making.
Data warehouse software provides graphical user interfaces that do not require managers to know SQL.
Data in a Data Warehouse is not normalised.
-
Self Assessment Exercise
Represent the following Staff Allocation Sheet as an un-normalised table, and then normalise it to 3NF
The solution will be provided on Blackboard next week.
-
Further Reading
If you are interested in reading a short article on higher normal forms (above 3NF) visit:-
http://www.databasejournal.com/sqletc/article.php/1442971
...and here is an article on de-normalisation:-
http://www.objectarchitects.de/ObjectArchitects/orpatterns/Performance/Denormalization/CraigMullinsGuidelines/i001fe02.htm