A2 Computing Normalisation


Upload: robj

Post on 12-Nov-2014


CPT 5 Course document.


Normalisation

Entity-relationship modelling is a useful technique for arriving at a set of entities that can be modelled by relations in relational database theory such that:

All possible relationships between the data are allowed for

Unnecessary duplication of data is avoided

Altering data is not unnecessarily time-consuming

Altering data does not lead to inconsistencies

The set of entities that satisfies the above requirements is said to be a normalised set of entities. Inserting foreign keys to model the relationships then turns this set into a normalised set of relations in relational database theory.

A normalised set of relations contains no redundant data.

In this section a second technique for arriving at a set of normalised relations is described. This technique, called normalisation, is an alternative to E-R modelling. It also provides a means of checking the completeness, accuracy and consistency of an E-R model.

We start by considering a simple example for which the data requirements are as follows:

A college offers a number of different courses. Students enrol to study one or more courses, up to a maximum of three. A particular course is taught by one lecturer only. When students enrol, they are allocated a unique student enrolment number, and their name, date of birth, gender and the courses they enrol for are recorded. Each course has a course title and a unique course code. Lecturers are assigned a unique staff number and have their name recorded.

Instead of identifying the entities and their relationships, we immediately write down a single relation called Student consisting of every attribute identified in the data requirements. To give flesh to this task, we also construct the table equivalent of this relation and populate it with some example data. The primary key is EnrolNo.


Relation Student

Student(EnrolNo, StudentName, DateOfBirth, Gender, CourseCode, CourseTitle, StaffNo, LecturerName)

Table Student

EnrolNo  StudentName  DateOfBirth  Gender  CourseCode  CourseTitle        StaffNo  LecturerName
15898    Bond, K      12/4/79      M       AQA0643     A Level Computing  1234     Mead, C
24298    Smith, K     15/6/79      F       UCL0675     A Level Maths      5678     Davies, D
                                           AQA0643     A Level Computing  1234     Mead, C
                                           AQA0432     A Level IT         1234     Mead, C
10598    Roberts, C   20/2/80      M       EDE0187     A Level Art        9123     Milsom, C
13497    Nixon, T     28/9/79      F       UOC0987     A Level French     4567     Crapper, T

There are some problems with the way data is currently stored in this table. For example, K Bond is registered for only one course at present, but he might register for more, and there is no space for these extra courses in his row of the table.

Some problems that arise with the present table are:

How much space should be allowed for a student studying more than one course?

What should happen if one student is enrolled for more courses than the space permits?

How should we determine, for example, all the students who are on a particular course?

The lecturer C Mead has his name stored three times when once would be enough, wasting storage space.

These problems are typical of a relation, such as Student, that is in an un-normalised state.

Repeating Group

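These problems arise because the course details for each student (CourseCode, CourseTitle, StaffNo and LecturerName) are held in a group of attributes known as a repeating group: Smith, for example, has three sets of course values in a single row.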

The first step of the normalisation process is to remove all repeating groups. The relation/table is then said to be in first normal form (1NF).
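Removing a repeating group can be sketched in Python. This is a minimal illustration that assumes each un-normalised row is held as a dict whose hypothetical `courses` key stores the repeating group as a list of (CourseCode, CourseTitle, StaffNo, LecturerName) tuples; the function name `to_first_normal_form` is also made up for the example. Only the Bond and Smith rows of table Student are included.

```python
from typing import Any

# Un-normalised rows from table Student: the "courses" list is the repeating group.
UNNORMALISED = [
    {"EnrolNo": 15898, "StudentName": "Bond, K", "DateOfBirth": "12/4/79", "Gender": "M",
     "courses": [("AQA0643", "A Level Computing", 1234, "Mead, C")]},
    {"EnrolNo": 24298, "StudentName": "Smith, K", "DateOfBirth": "15/6/79", "Gender": "F",
     "courses": [("UCL0675", "A Level Maths", 5678, "Davies, D"),
                 ("AQA0643", "A Level Computing", 1234, "Mead, C"),
                 ("AQA0432", "A Level IT", 1234, "Mead, C")]},
]


def to_first_normal_form(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical helper: copy the student attributes into one flat row per course."""
    flat = []
    for row in rows:
        student = {k: v for k, v in row.items() if k != "courses"}
        for code, title, staff_no, lecturer in row["courses"]:
            flat.append({**student, "CourseCode": code, "CourseTitle": title,
                         "StaffNo": staff_no, "LecturerName": lecturer})
    return flat


if __name__ == "__main__":
    for r in to_first_normal_form(UNNORMALISED):
        print(r["EnrolNo"], r["StudentName"], r["CourseCode"], r["LecturerName"])
```

Each student attribute is simply copied into every course row, which is why EnrolNo stops being unique once the repeating group has been removed.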

A repeating group is indicated in the relation form of the table by drawing a line over the attributes in the group; here the attributes in the repeating group are CourseCode, CourseTitle, StaffNo and LecturerName:

Student(EnrolNo, StudentName, DateOfBirth, Gender, CourseCode, CourseTitle, StaffNo, LecturerName)

First Normal Form (1NF)

The first normal form of the table is shown below:

EnrolNo  StudentName  DateOfBirth  Gender  CourseCode  CourseTitle        StaffNo  LecturerName
15898    Bond, K      12/4/79      M       AQA0643     A Level Computing  1234     Mead, C
24298    Smith, K     15/6/79      F       UCL0675     A Level Maths      5678     Davies, D
24298    Smith, K     15/6/79      F       AQA0643     A Level Computing  1234     Mead, C
24298    Smith, K     15/6/79      F       AQA0432     A Level IT         1234     Mead, C
10598    Roberts, C   20/2/80      M       EDE0187     A Level Art        9123     Milsom, C
13497    Nixon, T     28/9/79      F       UOC0987     A Level French     4567     Crapper, T

EnrolNo on its own no longer satisfies the criterion of uniqueness (24298 now appears in three rows), so a new primary key must be found. EnrolNo together with CourseCode becomes the new primary key. The relation in first normal form is shown below:

Student(EnrolNo, CourseCode, StudentName, DateOfBirth, Gender, CourseTitle, StaffNo, LecturerName)
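Whether a chosen set of attributes can serve as the primary key can be checked against the example data by looking for repeated values. The sketch below is illustrative only: `is_unique_in_sample` is a hypothetical helper, and sample data can show that a key is not unique but cannot prove that it always will be; that guarantee has to come from the data requirements.

```python
# (EnrolNo, CourseCode) pairs from the 1NF table Student.
ROWS_1NF = [
    {"EnrolNo": 15898, "CourseCode": "AQA0643"},
    {"EnrolNo": 24298, "CourseCode": "UCL0675"},
    {"EnrolNo": 24298, "CourseCode": "AQA0643"},
    {"EnrolNo": 24298, "CourseCode": "AQA0432"},
    {"EnrolNo": 10598, "CourseCode": "EDE0187"},
    {"EnrolNo": 13497, "CourseCode": "UOC0987"},
]


def is_unique_in_sample(rows, attrs):
    """Return True if no two rows share the same values for the attributes in attrs."""
    seen = set()
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values in seen:
            return False
        seen.add(values)
    return True


if __name__ == "__main__":
    print(is_unique_in_sample(ROWS_1NF, ["EnrolNo"]))                # False: 24298 repeats
    print(is_unique_in_sample(ROWS_1NF, ["CourseCode"]))             # False: AQA0643 repeats
    print(is_unique_in_sample(ROWS_1NF, ["EnrolNo", "CourseCode"]))  # True
```

Neither attribute is unique on its own, but the combination is, which matches the choice of (EnrolNo, CourseCode) as the new primary key.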


Second Normal Form (2NF)

The next stage is to move any non-primary key attributes that depend on only part of the primary key into separate relations. This step transforms a relation that is in 1NF into two or more relations that are in 2NF. Note that a relation that is in 1NF may already be in 2NF if the condition for 2NF is also satisfied.
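A partial dependency can be tested against the example data with a small function that looks for two rows agreeing on the determining attributes but disagreeing on the dependent ones. This is a sketch: `fd_holds` is a hypothetical name, and a check over sample rows can only refute a dependency, never prove it, so the dependencies themselves still come from the data requirements.

```python
COLUMNS = ("EnrolNo", "StudentName", "DateOfBirth", "Gender",
           "CourseCode", "CourseTitle", "StaffNo", "LecturerName")
DATA = [  # the 1NF table Student
    (15898, "Bond, K", "12/4/79", "M", "AQA0643", "A Level Computing", 1234, "Mead, C"),
    (24298, "Smith, K", "15/6/79", "F", "UCL0675", "A Level Maths", 5678, "Davies, D"),
    (24298, "Smith, K", "15/6/79", "F", "AQA0643", "A Level Computing", 1234, "Mead, C"),
    (24298, "Smith, K", "15/6/79", "F", "AQA0432", "A Level IT", 1234, "Mead, C"),
    (10598, "Roberts, C", "20/2/80", "M", "EDE0187", "A Level Art", 9123, "Milsom, C"),
    (13497, "Nixon, T", "28/9/79", "F", "UOC0987", "A Level French", 4567, "Crapper, T"),
]
ROWS = [dict(zip(COLUMNS, values)) for values in DATA]


def fd_holds(rows, lhs, rhs):
    """Return False if the rows contradict lhs -> rhs; True only means 'not contradicted'."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


if __name__ == "__main__":
    # Dependencies on part of the key (EnrolNo, CourseCode):
    print(fd_holds(ROWS, ["EnrolNo"], ["StudentName", "DateOfBirth", "Gender"]))       # True
    print(fd_holds(ROWS, ["CourseCode"], ["CourseTitle", "StaffNo", "LecturerName"]))  # True
    # A dependency that the data contradicts (Smith studies three courses):
    print(fd_holds(ROWS, ["EnrolNo"], ["CourseCode"]))                                 # False
```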

The attributes StudentName, DateOfBirth and Gender depend on EnrolNo alone, not on the whole of the new primary key (EnrolNo, CourseCode). The attributes CourseTitle, StaffNo and LecturerName depend on CourseCode alone, not on (EnrolNo, CourseCode).

Therefore, we create two new relations called Course and StudentCourse leaving a reduced Student as shown below:

Course(CourseCode, CourseTitle, StaffNo, LecturerName)

StudentCourse(EnrolNo, CourseCode)

Student(EnrolNo, StudentName, DateOfBirth, Gender)
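Splitting the 1NF relation into Course, StudentCourse and Student amounts to taking projections: keeping only some columns and discarding duplicate rows. A minimal sketch, assuming the 1NF rows are held as Python dicts; `project` is a hypothetical helper, not part of any library.

```python
COLUMNS = ("EnrolNo", "StudentName", "DateOfBirth", "Gender",
           "CourseCode", "CourseTitle", "StaffNo", "LecturerName")
DATA = [  # the 1NF table Student
    (15898, "Bond, K", "12/4/79", "M", "AQA0643", "A Level Computing", 1234, "Mead, C"),
    (24298, "Smith, K", "15/6/79", "F", "UCL0675", "A Level Maths", 5678, "Davies, D"),
    (24298, "Smith, K", "15/6/79", "F", "AQA0643", "A Level Computing", 1234, "Mead, C"),
    (24298, "Smith, K", "15/6/79", "F", "AQA0432", "A Level IT", 1234, "Mead, C"),
    (10598, "Roberts, C", "20/2/80", "M", "EDE0187", "A Level Art", 9123, "Milsom, C"),
    (13497, "Nixon, T", "28/9/79", "F", "UOC0987", "A Level French", 4567, "Crapper, T"),
]
ROWS = [dict(zip(COLUMNS, values)) for values in DATA]


def project(rows, attrs):
    """Keep only the attributes in attrs and discard duplicate tuples (result is sorted)."""
    return sorted({tuple(row[a] for a in attrs) for row in rows})


COURSE = project(ROWS, ["CourseCode", "CourseTitle", "StaffNo", "LecturerName"])
STUDENT_COURSE = project(ROWS, ["EnrolNo", "CourseCode"])
STUDENT = project(ROWS, ["EnrolNo", "StudentName", "DateOfBirth", "Gender"])

if __name__ == "__main__":
    print(len(COURSE), len(STUDENT_COURSE), len(STUDENT))  # 5 6 4
```

The three rows for Smith collapse to one Student row, and the two rows for A Level Computing collapse to one Course row; only StudentCourse keeps one row per enrolment.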

Third Normal Form (3NF)

The next stage is to move into separate relations any non-primary key attributes that do not depend directly and solely on the primary key. This step transforms a relation that is in 2NF into two or more relations that are in 3NF. Note that a relation that is in 2NF may already be in 3NF if the condition for 3NF is also satisfied.
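The 3NF condition can be probed in the same way on the 2NF relation Course. The sketch below (hypothetical helper `fd_holds`; CourseTitle omitted for brevity) shows that CourseCode determines StaffNo and StaffNo determines LecturerName, while StaffNo does not determine CourseCode, because lecturer 1234 teaches two courses. StaffNo is therefore not a key of Course, so LecturerName depends on the key only through StaffNo. As before, sample rows can only refute a dependency, not prove it.

```python
# Rows of the 2NF relation Course built from the example data.
COURSE = [
    {"CourseCode": "AQA0643", "StaffNo": 1234, "LecturerName": "Mead, C"},
    {"CourseCode": "UCL0675", "StaffNo": 5678, "LecturerName": "Davies, D"},
    {"CourseCode": "AQA0432", "StaffNo": 1234, "LecturerName": "Mead, C"},
    {"CourseCode": "EDE0187", "StaffNo": 9123, "LecturerName": "Milsom, C"},
    {"CourseCode": "UOC0987", "StaffNo": 4567, "LecturerName": "Crapper, T"},
]


def fd_holds(rows, lhs, rhs):
    """Return False if the rows contradict lhs -> rhs; True only means 'not contradicted'."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


if __name__ == "__main__":
    print(fd_holds(COURSE, ["CourseCode"], ["StaffNo"]))    # True
    print(fd_holds(COURSE, ["StaffNo"], ["LecturerName"]))  # True
    print(fd_holds(COURSE, ["StaffNo"], ["CourseCode"]))    # False: 1234 teaches two courses
```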

LecturerName depends on CourseCode, the primary key of relation Course, but only indirectly: CourseCode determines StaffNo, and StaffNo in turn determines LecturerName.

Therefore, we create a new relation called Staff. LecturerName is moved out of relation Course into Staff, together with a copy of the attribute StaffNo, as shown below. A copy of StaffNo must be left behind in relation Course; otherwise it would not be possible to determine the lecturer assigned to teach a particular course.

Course(CourseCode, CourseTitle, StaffNo)

Staff(StaffNo, LecturerName)

StudentCourse(EnrolNo, CourseCode)

Student(EnrolNo, StudentName, DateOfBirth, Gender)

Remember, a particular course is taught by just one lecturer.
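Decomposing Course into Course and Staff is again a pair of projections, and keeping StaffNo in both relations is what allows the original rows to be recovered by joining on StaffNo. A minimal sketch, with hypothetical helpers `project` and `natural_join` operating on Python dicts; the final comparison checks that no course or lecturer information is lost for this example data.

```python
# 2NF relation Course (example data).
COURSE = [
    {"CourseCode": "AQA0643", "CourseTitle": "A Level Computing", "StaffNo": 1234, "LecturerName": "Mead, C"},
    {"CourseCode": "UCL0675", "CourseTitle": "A Level Maths", "StaffNo": 5678, "LecturerName": "Davies, D"},
    {"CourseCode": "AQA0432", "CourseTitle": "A Level IT", "StaffNo": 1234, "LecturerName": "Mead, C"},
    {"CourseCode": "EDE0187", "CourseTitle": "A Level Art", "StaffNo": 9123, "LecturerName": "Milsom, C"},
    {"CourseCode": "UOC0987", "CourseTitle": "A Level French", "StaffNo": 4567, "LecturerName": "Crapper, T"},
]


def project(rows, attrs):
    """Keep only the attributes in attrs, dropping duplicate rows (first occurrence kept)."""
    result, seen = [], set()
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


def natural_join(left, right, on):
    """Combine every pair of rows that agree on the shared attribute `on`."""
    return [{**lrow, **rrow} for lrow in left for rrow in right if lrow[on] == rrow[on]]


COURSE_3NF = project(COURSE, ["CourseCode", "CourseTitle", "StaffNo"])
STAFF = project(COURSE, ["StaffNo", "LecturerName"])
REJOINED = natural_join(COURSE_3NF, STAFF, "StaffNo")

if __name__ == "__main__":
    print(STAFF)  # four rows; "Mead, C" is now stored only once
    print(sorted(REJOINED, key=lambda r: r["CourseCode"])
          == sorted(COURSE, key=lambda r: r["CourseCode"]))  # True
```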


We have arrived at a set of tables in which there is no unnecessary duplication, i.e. no redundancy. Updating the database is straightforward and avoids potentially inconsistent results. For example, if a lecturer changes surname, say on marriage, the change is made in one place only. In the un-normalised table, changing the name of C Mead requires changes in three places. If, by accident, only two of the three places were changed, the database would hold inconsistent data about C Mead. The changes would also take longer to make in the un-normalised database.
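The update problem can be demonstrated with Python's built-in sqlite3 module. This is a sketch under stated assumptions: StudentFlat is a hypothetical table holding only the course-related columns of the flattened Student rows (student columns and the Roberts and Nixon rows are omitted for brevity), Staff and Course follow the 3NF relations, and the new surname "Jones, C" is invented for the example.

```python
import sqlite3

NEW_NAME = "Jones, C"  # hypothetical new surname for lecturer 1234 (C Mead)

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when this is on
conn.executescript("""
    CREATE TABLE StudentFlat (EnrolNo INTEGER, CourseCode TEXT, StaffNo INTEGER,
                              LecturerName TEXT, PRIMARY KEY (EnrolNo, CourseCode));
    CREATE TABLE Staff (StaffNo INTEGER PRIMARY KEY, LecturerName TEXT);
    CREATE TABLE Course (CourseCode TEXT PRIMARY KEY, CourseTitle TEXT,
                         StaffNo INTEGER REFERENCES Staff(StaffNo));
""")
conn.executemany("INSERT INTO StudentFlat VALUES (?, ?, ?, ?)", [
    (15898, "AQA0643", 1234, "Mead, C"),
    (24298, "UCL0675", 5678, "Davies, D"),
    (24298, "AQA0643", 1234, "Mead, C"),
    (24298, "AQA0432", 1234, "Mead, C"),
])
conn.executemany("INSERT INTO Staff VALUES (?, ?)", [(1234, "Mead, C"), (5678, "Davies, D")])
conn.executemany("INSERT INTO Course VALUES (?, ?, ?)", [
    ("AQA0643", "A Level Computing", 1234),
    ("UCL0675", "A Level Maths", 5678),
    ("AQA0432", "A Level IT", 1234),
])

# Flat design: the name is stored in three rows. Simulate an update that
# accidentally misses Smith's A Level IT row.
flat_changed = conn.execute(
    "UPDATE StudentFlat SET LecturerName = ? "
    "WHERE StaffNo = ? AND NOT (EnrolNo = ? AND CourseCode = ?)",
    (NEW_NAME, 1234, 24298, "AQA0432"),
).rowcount
flat_names = {n for (n,) in conn.execute(
    "SELECT DISTINCT LecturerName FROM StudentFlat WHERE StaffNo = 1234")}

# 3NF design: the name is stored once, so exactly one row changes.
staff_changed = conn.execute(
    "UPDATE Staff SET LecturerName = ? WHERE StaffNo = ?", (NEW_NAME, 1234)).rowcount
course_names = {n for (n,) in conn.execute(
    "SELECT DISTINCT s.LecturerName FROM Course c JOIN Staff s ON c.StaffNo = s.StaffNo "
    "WHERE c.StaffNo = 1234")}

print(flat_changed, sorted(flat_names))    # 2 ['Jones, C', 'Mead, C']
print(staff_changed, sorted(course_names))  # 1 ['Jones, C']
```

In the flat table the missed row leaves two different names for staff number 1234; in the 3NF tables a single-row update changes the name seen for every course C Mead teaches.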

Question 6.1

SalesmanId  CustomerId  SalesmanName
1           1, 2        Archer, F
2           3, 4        Dent, A
3           5, 6        Rogers, K

(a) Explain why this table is un-normalised.

(b) Normalise this table into 3NF.

Question 6.2

PatientId  GPId  GPName
1          1     Biggs, F
2          1     Biggs, F
3          2     Smith, K
4          2     Smith, K
5          3     Timms, B
6          3     Timms, B

The table contains redundant data: the repeated GPName values in rows 2, 4 and 6 may be deleted without loss of information. For example, PatientId 2 in row 2 is associated with GPId 1, and the GPName for GPId 1 can be looked up in row 1.

Normalise the data in the table into 3NF.

Question 6.3

LecturerNo  StudentNo  StudentName
1           1          Ainsley, F
2           2          Carter, M
3           3          Fellows, K
4           4          Minns, A
5           5          Potts, B
6           6          Ainsley, F

(a) Does this table contain any redundant information?

(b) Is this table in 3NF? Explain your answer.

Question 6.4

Subject    StaffNo  StaffName
Computing  1        Abbot, F
Geology    2        Martin, M
Maths      3        Frear, K
Maths      4        Martin, A

(a) Is there any redundancy in this table?

(b) Is this table in 3NF? Explain your answer.

Hint for Question 6.3: this is a trick question. Do not assume that LecturerNo can be duplicated.


Question 6.5

PatientNo  PatientName  WardName     WardType
1456       Smith, J     Nightingale  Orthopaedic
1461       Berry, N     Barnard      Cardiac
1468       Thomas, V    Barnard      Cardiac
1472       Harley, D    Guttman      Orthopaedic
1478       Alton, L     Barnard      Cardiac
1483       Noggs, C     Spens        Geriatric
1497       Smith, B     Nightingale  Orthopaedic

(a) In what ways does this table contain redundant data?

(b) Split this table into three separate tables, ensuring that each table contains no redundant data. Select and indicate a primary key for each new table.

(c) Is each of the new tables in 3NF? Explain your answer.

Database Server

Define and explain the operation of a database server.
