Data Modelling 2
TRANSCRIPT
-
8/2/2019 Data Modelling 2
Data Modelling 2
Normalisation
By Haik Richards
-
Normalisation versus E-R modelling
E-R modelling is a top-down methodology, i.e. look at entities, then attributes, analyse associations and then construct tables.
Normalisation is a bottom-up methodology, i.e. the analyst looks at the structure of tables already being used in an enterprise, and applies normalisation (a methodology) to the tables in order to improve the structure.
-
Normalisation simple example
Question: What should the analyst do with the above table?
-
Normalised Tables
i.e. data stored in a table should be about a single entity
-
Normalisation
Often results in splitting a table into smaller tables,
therefore referred to as the recognise and split method.
Question: What sort of unwanted/undesirable consequences might there be if the employee table was not split?
-
Anomalies
Anomalies refer to unexpected or unwanted effects that poorly constructed tables can produce.
3 common types of anomalies:-
Delete anomaly
Update anomaly
Insert anomaly
-
Delete anomaly
If Tom leaves the company, and we delete that row of the table, what undesirable effect will happen?
We will lose the fact that we have a Sales Department.
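A minimal sketch of the delete anomaly using SQLite from Python. The slide's table image is not reproduced in this transcript, so the single-table design and column names (ename, dept) are assumptions:

```python
import sqlite3

# Hypothetical unnormalised table: employee and department
# facts stored together in one relation.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (ename TEXT, dept TEXT)")
con.executemany("INSERT INTO employee VALUES (?, ?)",
                [("Tom", "Sales"), ("Ann", "Accounts")])

# Tom is the only Sales employee; deleting his row also deletes
# the only evidence that a Sales department exists.
con.execute("DELETE FROM employee WHERE ename = 'Tom'")
depts = [d for (d,) in con.execute("SELECT DISTINCT dept FROM employee")]
print(depts)  # ['Accounts'] -- Sales has vanished
```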
-
Update anomaly
If the name of the department Human Resources changes to Human Division, how many rows will we have to change?
...And what if we don't change all the rows? Our database will be in an inconsistent state.
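The same assumed single-table design shows the update anomaly: a rename that misses even one row leaves two spellings of the same department in the database.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (ename TEXT, dept TEXT)")
con.executemany("INSERT INTO employee VALUES (?, ?)",
                [("Ann", "Human Resources"),
                 ("Bob", "Human Resources"),
                 ("Cal", "Human Resources")])

# The department name is repeated on every employee row, so a
# rename must touch all of them. This UPDATE only reaches one
# row, leaving the database in an inconsistent state.
con.execute("UPDATE employee SET dept = 'Human Division' "
            "WHERE ename = 'Ann'")
spellings = {d for (d,) in con.execute("SELECT DISTINCT dept FROM employee")}
print(spellings)  # two names now coexist for one department
```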
-
Insert anomaly
What if we now have a new department called Marketing, but we have not yet assigned any employees to it?
We will be forced to introduce nulls for many fields. Nulls are undesirable because they could mean anything.
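And the insert anomaly, again with the assumed column names: recording a department with no staff forces a null into the employee field.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (ename TEXT, dept TEXT)")

# Marketing has no staff yet, so the only way to record it in
# this design is a row with NULL in the employee column.
con.execute("INSERT INTO employee (ename, dept) VALUES (NULL, 'Marketing')")
row = con.execute("SELECT ename, dept FROM employee").fetchone()
print(row)  # (None, 'Marketing')
```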
-
Normalisation
So do organisations have such poorly constructed tables?
Yes. They are rife, everywhere.
We used simple common sense to normalise the above table.
Is there a more precise method of doing normalisation? Yes.
-
Normal Forms
The process of normalisation requires the analyst to work progressively through a series of normal forms:-
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
Most analysts will work through to 3NF. Up to 3NF most anomalies will be eliminated.
-
A worked example up to 3NF
A typical order form
Might be stored in the following unnormalised table:
-
Normalising the table
To begin the process of normalisation we list the column headings in a vertical format as follows:
-
First Normal Form (1NF)
remove repeating groups
Notice that for each order in the system there may be a number of products referred to by the order. We therefore split this information off into its own group, making sure that we maintain the relationship between the information in the group through the common attribute 'ono'. This is illustrated in the table below:
Note: The lower relation has a composite key (two attributes acting as primary key).
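The 1NF split can be sketched in SQL via Python's sqlite3. The order form itself is not reproduced in this transcript, so the column names (ono, odate, cno, cname, pno, pdesc, qty) are assumptions based on the attributes the later slides mention:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Order-level attributes stay in one relation, keyed by ono.
con.execute("""CREATE TABLE orders (
    ono   INTEGER PRIMARY KEY,
    odate TEXT,
    cno   INTEGER,
    cname TEXT)""")

# The repeating group of product lines is split off; 'ono'
# preserves the link, and (ono, pno) form the composite key.
con.execute("""CREATE TABLE order_line (
    ono   INTEGER,
    pno   INTEGER,
    pdesc TEXT,
    qty   INTEGER,
    PRIMARY KEY (ono, pno))""")

con.execute("INSERT INTO order_line VALUES (101, 1, 'Widget', 5)")
```

The composite primary key means the same product cannot appear twice on one order; a second insert of (101, 1, ...) would raise an IntegrityError.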
-
We now have 2 tables
...but duplication and anomalies still exist!
-
Second Normal Form (2NF)
remove partial dependencies
Every non-key attribute must depend on the key, the whole key, and nothing but the key (so help me Codd).
In our 1NF column, we have only one group with a composite key - the second group. It is only this group that we therefore need to check to see if there are any partial dependencies.
i.e. for each attribute in this group we check to see whether it depends on the whole composite key or just part of it. If it depends on just part of the composite key, we must split it off into another table.
-
Second Normal Form (2NF)
Only qty depends on the whole key. The other attributes depend on pno and are split off.
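Sketched with the same assumed column names, the 2NF step moves the pno-dependent attribute pdesc out of the order-line relation, leaving only qty with the composite key:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# qty depends on the whole key (ono, pno), so it stays here.
con.execute("""CREATE TABLE order_line (
    ono INTEGER, pno INTEGER, qty INTEGER,
    PRIMARY KEY (ono, pno))""")

# pdesc depends on pno alone (a partial dependency), so it is
# split off into its own relation with pno as the key.
con.execute("CREATE TABLE product (pno INTEGER PRIMARY KEY, pdesc TEXT)")

con.execute("INSERT INTO product VALUES (1, 'Widget')")
con.executemany("INSERT INTO order_line VALUES (?, ?, ?)",
                [(101, 1, 5), (102, 1, 2)])

# The description is now stored once, however many orders refer to it.
rows = con.execute("""SELECT ol.ono, p.pdesc, ol.qty
                      FROM order_line ol
                      JOIN product p ON ol.pno = p.pno
                      ORDER BY ol.ono""").fetchall()
print(rows)  # [(101, 'Widget', 5), (102, 'Widget', 2)]
```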
-
We now have 3 tables:-
...But duplication and anomalies still exist!
-
Third Normal Form (3NF)
- no transitive dependencies
Note that in the 2NF column, there is a transitive dependency between cno and cname, i.e. cno can be used as a key for the attribute cname. We can therefore split off the customer information into its own group. This can be seen below.
-
Therefore, we now have 4 tables
We have now removed most (maybe 99%) of the anomalies.
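Under the same assumed column names, the four 3NF relations, and a JOIN that reassembles one line of the original order form, might look like:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customer   (cno INTEGER PRIMARY KEY, cname TEXT);
CREATE TABLE orders     (ono INTEGER PRIMARY KEY, odate TEXT,
                         cno INTEGER REFERENCES customer);
CREATE TABLE product    (pno INTEGER PRIMARY KEY, pdesc TEXT);
CREATE TABLE order_line (ono INTEGER REFERENCES orders,
                         pno INTEGER REFERENCES product,
                         qty INTEGER,
                         PRIMARY KEY (ono, pno));

INSERT INTO customer   VALUES (7, 'Smith');
INSERT INTO orders     VALUES (101, '2019-02-08', 7);
INSERT INTO product    VALUES (1, 'Widget');
INSERT INTO order_line VALUES (101, 1, 5);
""")

# Joining the four tables reassembles one order-form line.
form = con.execute("""SELECT o.ono, o.odate, c.cname, p.pdesc, ol.qty
                      FROM orders o
                      JOIN customer c    ON o.cno = c.cno
                      JOIN order_line ol ON ol.ono = o.ono
                      JOIN product p     ON p.pno = ol.pno""").fetchone()
print(form)  # (101, '2019-02-08', 'Smith', 'Widget', 5)
```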
-
Example 2
Normalise the following drug card to 3NF:
The data is stored in the following un-normalised table:-
-
And here is the above normalised to 3NF
-
Entity Relationship Modelling &
Normalisation
Most times, analysts will construct a data model using E-R modelling. This provides a good first stab at design. It is also a good starting point for data modelling as E-R models (being diagrammatic) are relatively easy for end-users to understand.
Normalisation will then be used to validate the correctness of the E-R model. To do this, each entity in the E-R model will be checked by going through the normalisation process.
i.e. E-R modelling and normalisation are seen as complementary methods.
-
De-normalisation:
Normalisation splits database information into multiple tables.
To retrieve complete information from multiple tables requires the use of the JOIN operation in SQL.
Joins produce an overhead on processing power, and very large joins can make retrieval times deteriorate. Therefore... it is sometimes decided to de-normalise relations in order to improve access time for queries.
De-normalisation is the process of combining data from 2 or more normalised tables into a single table.
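A sketch of de-normalisation in SQLite (table and column names hypothetical): the joined result is materialised into a single table, so later queries avoid the JOIN at the price of repeating data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE department (dno INTEGER PRIMARY KEY, dname TEXT);
CREATE TABLE employee   (eno INTEGER PRIMARY KEY, ename TEXT,
                         dno INTEGER REFERENCES department);
INSERT INTO department VALUES (1, 'Sales');
INSERT INTO employee   VALUES (10, 'Tom', 1);
""")

# Combine the two normalised tables into one de-normalised table.
# Queries on it no longer pay for a join, but the department name
# is now repeated on every employee row (and anomalies can return).
con.execute("""CREATE TABLE emp_dept AS
               SELECT e.eno, e.ename, d.dname
               FROM employee e JOIN department d ON e.dno = d.dno""")
row = con.execute("SELECT eno, ename, dname FROM emp_dept").fetchone()
print(row)  # (10, 'Tom', 'Sales')
```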
-
Derived Data
Data that can be computed (calculated) should not be included in a normalised table. eg the total number of employees in a department should not be an attribute of a department table.
Why?
Can create inconsistencies
Requires extra storage
Can be computed using count(*) so why store it?
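For example, rather than storing a headcount on the department row, it can be computed on demand (table and column names hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (ename TEXT, dept TEXT)")
con.executemany("INSERT INTO employee VALUES (?, ?)",
                [("Tom", "Sales"), ("Ann", "Sales"), ("Bob", "Accounts")])

# The headcount is derived data: computing it with count(*) means
# it can never drift out of step with the employee rows themselves.
(n_sales,) = con.execute(
    "SELECT count(*) FROM employee WHERE dept = 'Sales'").fetchone()
print(n_sales)  # 2
```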
-
A case for storing derived data
Computations can take a long time to do on large volumes of data, increasing query response times.
Some computations are very complex - beyond the ability of most managers who need access to summary statistics, aggregated values etc. eg total sales for 1st quarter 2007 in North West region for all washing machine components.
Do managers/end-users know SQL?
To meet the needs of managers, data warehouses are being used.
Data warehouses contain the summary statistics managers need for decision making.
Data warehouse software provides graphical user interfaces that do not require managers to know SQL.
Data in a Data Warehouse is not normalised.
-
Self Assessment Exercise
Represent the following Staff Allocation Sheet as an un-normalised table, and then normalise it to 3NF
The solution will be provided on Blackboard next week.
-
Further Reading
If you are interested in reading a short article on higher normal forms (above 3NF) visit:-
http://www.databasejournal.com/sqletc/article.php/1442971
...and here is an article on de-normalisation:-
http://www.objectarchitects.de/ObjectArchitects/orpatterns/Performance/Denormalization/CraigMullinsGuidelines/i001fe02.htm