Data Modelling 2



    Data Modelling 2

    Normalisation

    By Haik Richards


    Normalisation versus E-R modelling

    E-R modelling is a top-down methodology, i.e. look at entities, then attributes, analyse associations and then construct tables.

    Normalisation is a bottom-up methodology, i.e. the analyst looks at the structure of tables already being used in an enterprise, and applies normalisation (a methodology) to the tables in order to improve the structure.


    Normalisation simple example

    Question: What should the analyst do with the above table?


    Normalised Tables

    i.e. data stored in a table should be about a single entity


    Normalisation

    Often results in splitting a table into smaller tables. It is therefore referred to as the 'recognise and split' method.

    Question: What sort of unwanted/undesirable consequences might there be if the employee table was not split?
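    A minimal SQL sketch of such a split (the table and column names emp, dept, eno, ename, dno and dname are assumptions for illustration; they are not taken from the slides):

        -- Unnormalised: employee and department facts mixed in one table
        CREATE TABLE emp_unnormalised (
            eno     INT,            -- employee number
            ename   VARCHAR(50),    -- employee name
            dno     INT,            -- department number
            dname   VARCHAR(50)     -- department name, repeated on every employee row
        );

        -- After "recognise and split": one table per entity
        CREATE TABLE dept (
            dno     INT PRIMARY KEY,
            dname   VARCHAR(50)
        );

        CREATE TABLE emp (
            eno     INT PRIMARY KEY,
            ename   VARCHAR(50),
            dno     INT REFERENCES dept (dno)   -- each employee points at one department
        );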


    Anomalies

    Anomalies refer to unexpected or unwanted effects that poorly constructed tables can produce.

    3 common types of anomalies:-

    Delete anomaly

    Update anomaly

    Insert anomaly


    Delete anomaly

    If Tom leaves the company and we delete that row of the table, what undesirable effect will happen?

    We will lose the fact that we have a Sales Department.
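    For example, against the hypothetical emp_unnormalised table sketched earlier (Tom assumed to be the only Sales employee), the delete anomaly is a single statement:

        -- Tom is the only employee recorded for Sales, so this DELETE also
        -- removes the only row that says a Sales department exists
        DELETE FROM emp_unnormalised
        WHERE  ename = 'Tom';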


    Update anomaly

    If the name of the department Human Resources changes to Human Division, how many rows will we have to change?

    ...And what if we don't change all the rows? Our database will be in an inconsistent state.
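    Again using the hypothetical unnormalised table, the rename has to touch every matching row, whereas the normalised dept table needs only one:

        -- Unnormalised: every employee row mentioning the department must change
        UPDATE emp_unnormalised
        SET    dname = 'Human Division'
        WHERE  dname = 'Human Resources';

        -- Normalised: a single row in dept holds the department name
        UPDATE dept
        SET    dname = 'Human Division'
        WHERE  dname = 'Human Resources';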


    Insert anomaly

    What if we now have a new department called Marketing, but we have not yet assigned any employees to it?

    We will be forced to introduce nulls for many fields. Nulls are undesirable because they could mean anything.
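    A sketch of the same point (department number 60 is an assumed value): in the unnormalised table the employee columns can only be filled with nulls, while the normalised design lets the department exist on its own:

        -- Unnormalised: no employees yet, so the employee columns are NULL
        INSERT INTO emp_unnormalised (eno, ename, dno, dname)
        VALUES (NULL, NULL, 60, 'Marketing');

        -- Normalised: the new department is simply a row in dept
        INSERT INTO dept (dno, dname)
        VALUES (60, 'Marketing');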


    Normalisation

    So do organisations have such poorly constructed tables?

    Yes. They are rife, everywhere.

    We used simple common sense to normalise the above table.

    Is there a more precise method of doing normalisation? Yes.


    Normal Forms

    The process of normalisation requires the analyst to work progressively through a series of normal forms:-

    First Normal Form (1NF)
    Second Normal Form (2NF)
    Third Normal Form (3NF)
    Boyce-Codd Normal Form (BCNF)
    Fourth Normal Form (4NF)
    Fifth Normal Form (5NF)

    Most analysts will work through to 3NF. Up to 3NF most anomalies will be eliminated.


    A worked example up to 3NF

    A typical order form might be stored in the following unnormalised table:


    Normalising the table

    To begin the process of normalisation we list the column headings in a vertical format as follows:


    First Normal Form (1NF)

    remove repeating groups

    Notice that for each order in the system there may be a number of products referred to by the order. We therefore split this information off into its own group, making sure that we maintain the relationship between the information in the group through the common attribute 'ono'. This is illustrated in the table below:

    Note: The lower relation has a composite key (two attributes acting as primary key).
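    A SQL sketch of the 1NF result. Only ono, pno and qty are named in the slides; the remaining column names (odate, cno, cname, pdesc, price) are assumed for illustration:

        -- 1NF: the repeating product group is split into its own table,
        -- linked back to the order through the common attribute ono
        CREATE TABLE orders (
            ono     INT PRIMARY KEY,
            odate   DATE,
            cno     INT,
            cname   VARCHAR(50)
        );

        CREATE TABLE order_line (
            ono     INT,
            pno     INT,
            pdesc   VARCHAR(50),
            price   DECIMAL(8,2),
            qty     INT,
            PRIMARY KEY (ono, pno)   -- composite key: two attributes acting as primary key
        );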


    We now have 2 tables

    ...but duplication and anomalies still exist!


    Second Normal Form (2NF)

    remove partial dependencies

    every non-key attribute must depend on the key, the whole key, and nothing but the key (so help me Codd)

    In our 1NF column, we have only one group with a composite key - the second group. It is therefore only this group that we need to check to see if there are any partial dependencies, i.e. for each attribute in this group we check to see whether it depends on the whole composite key or just part of it. If it depends on just part of the composite key, we must split it off into another table.


    Second Normal Form (2NF)

    Only qty depends on the whole key. The other attributes depend on pno and are split off.
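    Continuing the sketch, 2NF moves the attributes that depend only on pno into a product table, so order_line keeps just the attribute that depends on the whole composite key:

        -- 2NF: pdesc and price depend only on pno, so they move to product
        CREATE TABLE product (
            pno     INT PRIMARY KEY,
            pdesc   VARCHAR(50),
            price   DECIMAL(8,2)
        );

        -- order_line (replacing the 1NF version) now holds only qty
        CREATE TABLE order_line (
            ono     INT,
            pno     INT REFERENCES product (pno),
            qty     INT,
            PRIMARY KEY (ono, pno)
        );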


    We now have 3 tables:-

    ...but duplication and anomalies still exist!


    Third Normal Form (3NF)

    - no transitive dependencies

    Note that in the 2NF column, there is a transitive dependency between cno and cname, i.e. cno can be used as a key for the attribute cname. We can therefore split off the customer information into its own group. This can be seen below.
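    And the 3NF step in the same sketch: cname depends on cno rather than directly on ono, so the customer information becomes its own table:

        -- 3NF: cno -> cname is a transitive dependency, so customer is split off
        CREATE TABLE customer (
            cno     INT PRIMARY KEY,
            cname   VARCHAR(50)
        );

        -- orders (replacing the earlier version) keeps only the customer number
        CREATE TABLE orders (
            ono     INT PRIMARY KEY,
            odate   DATE,
            cno     INT REFERENCES customer (cno)
        );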


    Therefore, we now have 4 tables

    We have now removed most (maybe 99%) of the anomalies.


    Example 2

    Normalise the following drug card to 3NF:

    The data is stored in the following un-normalised table:-


    And here is the above normalised to 3NF:


    Entity Relationship Modelling & Normalisation

    Most times, analysts will construct a data model using E-R modelling. This provides a good first stab at design. It is also a good starting point for data modelling as E-R models (being diagrammatic) are relatively easy for end-users to understand.

    Normalisation will then be used to validate the correctness of the E-R model. To do this, each entity in the E-R model will be checked by going through the normalisation process.

    i.e. E-R modelling and normalisation are seen as complementary methods.


    De-normalisation

    Normalisation splits database information into multiple tables.

    To retrieve complete information from multiple tables requires the use of the JOIN operation in SQL.

    Joins produce an overhead on processing power, and very large joins can make retrieval times deteriorate. Therefore it is sometimes decided to de-normalise relations in order to improve access time for queries.

    De-normalisation is the process of combining data from 2 or more normalised tables into a single table.
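    For instance, reassembling a complete order from the four normalised tables in the earlier sketch needs a chain of joins (the order number 1001 is an assumed value); a de-normalised design would pre-combine some of these columns into one wider table to avoid this work at query time:

        -- Rebuilding one order's details from the normalised tables
        SELECT o.ono, o.odate, c.cname, p.pdesc, ol.qty, p.price
        FROM   orders     o
        JOIN   customer   c  ON c.cno = o.cno
        JOIN   order_line ol ON ol.ono = o.ono
        JOIN   product    p  ON p.pno = ol.pno
        WHERE  o.ono = 1001;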


    Derived Data

    Data that can be computed (calculated) should not be included in a normalised table, e.g. the total number of employees in a department should not be an attribute of a department table.

    Why?

    Can create inconsistencies

    Requires extra storage

    Can be computed using count(*), so why store it?
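    For example, with the hypothetical emp table from the earlier sketch (department number 10 assumed), the figure can simply be computed on demand:

        -- Number of employees in a given department, computed with count(*)
        SELECT COUNT(*) AS emp_count
        FROM   emp
        WHERE  dno = 10;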


    A case for storing derived data

    Computations can take a long time to do on large volumes of data, reducing query response times.

    Some computations are very complex - beyond the ability of most managers who need access to summary statistics, aggregated values etc, e.g. total sales for 1st quarter 2007 in the North West region for all washing machine components. Do managers/end-users know SQL?

    To meet the needs of managers, data warehouses are being used. Data warehouses contain the summary statistics managers need for decision making.

    Data warehouse software provides graphical user interfaces that do not require managers to know SQL.

    Data in a Data Warehouse is not normalised.
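    As a rough illustration of the kind of summary involved, the quarterly figure mentioned above would be an aggregate query along these lines (the sales table and all of its columns are hypothetical):

        -- Total sales for the 1st quarter of 2007 in the North West region
        SELECT SUM(amount) AS total_sales
        FROM   sales
        WHERE  region        = 'North West'
          AND  product_group = 'Washing machine components'
          AND  sale_date BETWEEN DATE '2007-01-01' AND DATE '2007-03-31';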


    Self Assessment Exercise

    Represent the following Staff Allocation Sheet as an un-normalised table, and then normalise it to 3NF.

    The solution will be provided on Blackboard next week.


    Further Reading

    If you are interested in reading a short article on higher normal forms (above 3NF) visit:-

    http://www.databasejournal.com/sqletc/article.php/1442971

    ...and here is an article on de-normalisation:-

    http://www.objectarchitects.de/ObjectArchitects/orpatterns/Performance/Denormalization/CraigMullinsGuidelines/i001fe02.htm
