topic 05 & 06 : the relational model

Er. Pradip Kharbuja Topic 5 & 6 The Relational Model

Upload: pradip-kharbuja

Post on 18-Jan-2015


Page 1: Topic 05 & 06 : The Relational Model

Er. Pradip Kharbuja

Topic 5 & 6: The Relational Model

Page 2: Topic 05 & 06 : The Relational Model

Terminology

1. Relation

A relation is a table with columns and rows.

2. Attribute

The columns in a relation are known as attributes.

3. Domain

A domain or attribute domain is the set of allowable values for one or more attributes.

4. Tuple

A tuple is a row of a relation. Tuples are also called records.

Page 3: Topic 05 & 06 : The Relational Model

Terminology

5. Degree

The degree of a relation is the number of attributes it has. For example, if the department table has 3 attributes, it has degree three.

6. Cardinality

The cardinality of a relation is the number of tuples it contains.

7. Relational Database

A relational database is a collection of normalized relations with distinct relation names. Its relations are appropriately structured, with no repeating groups. The process of structuring them this way is known as normalization.

Page 4: Topic 05 & 06 : The Relational Model

Student Table

Student ID | First Name | Last Name | Course Code
S334 | Dave | Watson | COMP
S765 | Jagpal | Jutley | COMP
S783 | Cynthia | Kodogo | HIST
S111 | Walace | Antigone | LIT

Cardinality: 4 tuples

Degree: 4 attributes
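
As a rough illustration, the Student table can be held in Python as a header plus a set of tuples, and the degree and cardinality read straight off it. The variable names are chosen for this sketch only.

```python
# Student relation from the slide: a header plus a set of tuples.
attributes = ("Student ID", "First Name", "Last Name", "Course Code")
student = {
    ("S334", "Dave", "Watson", "COMP"),
    ("S765", "Jagpal", "Jutley", "COMP"),
    ("S783", "Cynthia", "Kodogo", "HIST"),
    ("S111", "Walace", "Antigone", "LIT"),
}

degree = len(attributes)    # number of attributes (columns)
cardinality = len(student)  # number of tuples (rows)

print("degree:", degree, "cardinality:", cardinality)
```

A set is used because a relation has no duplicate tuples and no tuple order.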

Page 5: Topic 05 & 06 : The Relational Model

Alternative Terminology

Formal Term | Alternative 1 | Alternative 2
Relation | Table | File
Tuple | Row | Record
Attribute | Column | Field

Page 6: Topic 05 & 06 : The Relational Model

Background to Relational Model

Proposed by E.F. Codd in 1970 in his seminal paper “A Relational Model of Data for Large Shared Data Banks”.

In the relational model of a database, all data is represented in terms of tuples, grouped into relations.

A database organized in terms of the relational model is a relational database.

The purpose of the relational model is to provide a declarative method for specifying data and queries: users directly state what information the database contains and what information they want from it, and let the database management system software take care of describing data structures for storing the data and retrieval procedures for answering queries.

Page 7: Topic 05 & 06 : The Relational Model

RDBMS

A relational database management system (RDBMS) is a database management system (DBMS) that is based on the relational model as introduced by E.F. Codd.

Dominates the market in databases

Many popular databases currently in use are based on the relational database model. e.g. Oracle, MySQL, Microsoft SQL Server, etc.

Second generation of DBMSs

The first generation of database technology started in the 60's and continued into the 70's.

Page 8: Topic 05 & 06 : The Relational Model

Do not confuse relations with relationships in ER models.

Page 9: Topic 05 & 06 : The Relational Model

Objectives of Relational Model

1. To allow a high degree of data independence.

2. To reduce the redundancy of relations. Relations should be normalized.

3. To enable the expansion of set-orientated data manipulation languages.

Page 10: Topic 05 & 06 : The Relational Model

Data Independence

Application programs must not be affected by modifications to the internal data representation, particularly by changes to file structure, record orderings, access paths, or using different storage devices.

Page 11: Topic 05 & 06 : The Relational Model

Normalized Relation

Codd's paper introduced the concept of normalized relation, i.e. relations having no repeating groups.

The process of normalization is about structuring the data so as to minimize redundancy and duplication.

Ideally, an item of data should be stored only in one place.

In practice, there is some duplication due to the use of foreign keys.

Page 12: Topic 05 & 06 : The Relational Model

Set-Orientated Data Manipulation Languages

SQL is based on set theory from Mathematics.

The relational model uses languages such as relational algebra (based on set theory) and relational calculus (based on predicate logic) to express data manipulation.

e.g. UNION, Cartesian Product, Intersection, etc.
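
The operations named above can be tried out on small relations modelled as Python sets of tuples. This is only a sketch: the sample rows and variable names are made up for illustration.

```python
# Two union-compatible relations (same attributes), modelled as sets of tuples.
# The rows are made-up sample data.
comp_students = {("S334", "Dave"), ("S765", "Jagpal")}
hist_students = {("S783", "Cynthia"), ("S334", "Dave")}

union = comp_students | hist_students         # tuples in either relation
intersection = comp_students & hist_students  # tuples in both relations

courses = {("COMP",), ("HIST",)}
# Cartesian product: every tuple of one relation paired with every tuple of the other.
product = {s + c for s in comp_students for c in courses}

print(len(union), intersection, len(product))
```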

Page 13: Topic 05 & 06 : The Relational Model

Presentation

1. System R

2. INGRES

3. Peterlee Relational Test Vehicle(PRTV)

Page 14: Topic 05 & 06 : The Relational Model

History – Practical Developments - 1

System R. Developed at IBM's San Jose laboratory in the late 1970s. It involved some of the key people in the early development of databases, such as Codd and Boyce.

System R was the first implementation of SQL.

It influenced the development of commercial database systems such as DB2, SQL/DS and Oracle.

It also covered other aspects such as transaction management, concurrency control, recovery techniques, query optimization, data security, data integrity and user interfaces.

It was also the first system to demonstrate that a relational database management system could provide good transaction processing performance.

Page 15: Topic 05 & 06 : The Relational Model

History – Practical Developments - 2

INGRES. Developed at the University of California, Berkeley, in the late 1970s.

INGRES stands for Interactive Graphics Retrieval System.

It was used to investigate the concepts of the relational model.

It is a commercially supported, open-source SQL relational database management system.

Ingres spawned a number of commercial database applications, including Sybase and Microsoft SQL Server.

Postgres (Post Ingres), a project which started in the mid-1980s, later evolved into PostgreSQL.

Page 16: Topic 05 & 06 : The Relational Model

History - Practical Developments - 3

Peterlee Relational Test Vehicle (PRTV). Developed at IBM UK in 1976.

It was the first relational database able to handle large volumes of data in terms of both rows and columns.

It was a relational query system with powerful query facilities, but very limited update facility and no simultaneous multiuser facility.

Page 17: Topic 05 & 06 : The Relational Model

Properties of a Relation

It has a name which is unique within the relational schema.

The values of an attribute all come from the same domain, e.g. the department_name column should not contain values other than department names.

Each cell of a relation contains exactly one value.

Each attribute has a name.

Each tuple is unique.

The order of attributes is insignificant.

The order of tuples is insignificant.

Page 18: Topic 05 & 06 : The Relational Model

Activity - Is This a Relation?

Student Name | Modules | Course
Guy Smith | Med1 Medieval History 1 | History
 | Med2 Medieval History 2 |
 | TCE Twentieth Century |
Sarah Anusiem, 12 New Street, Lagos | OS Operating Systems | Computing
 | NET Networks |

Page 19: Topic 05 & 06 : The Relational Model

Activity - Is This a Relation?

It has a name which is unique within the relational schema - No

Each cell of a relation contains exactly one value - No

Each attribute has a name – YES

Each tuple is unique - YES

The order of attributes is insignificant – YES

The order of tuples is insignificant - YES

Page 20: Topic 05 & 06 : The Relational Model

Now a Relation

Student Name | Address | Modules | Course
Guy Smith | NULL | Med1 Medieval History 1 | History
Guy Smith | NULL | Med2 Medieval History 2 | History
Guy Smith | NULL | TCE Twentieth Century | History
Sarah Anusiem | 12 New Street, Lagos | OS Operating Systems | Computing
Sarah Anusiem | 12 New Street, Lagos | NET Networks | Computing

Page 21: Topic 05 & 06 : The Relational Model

Class Record

Class Code | Instrument Taught | Teachers | No of Instruments Rented
2 | Saxophone | Marcus Smith | 10
6 | Trumpet | Ajay Singh | 20
 | | Sonny Muller |
7 | Guitar | Farhad Khan | 10
9 | Guitar | Farhad Khan | 23
 | | Tommy Jones |
1 | Drums | Tommy Jones | 5

Page 22: Topic 05 & 06 : The Relational Model

Class Record – A Relation

Class Code | Instrument Taught | Teachers | No of Instruments Rented
2 | Saxophone | Marcus Smith | 10
6 | Trumpet | Ajay Singh | 20
6 | Trumpet | Sonny Muller | 20
7 | Guitar | Farhad Khan | 10
9 | Guitar | Farhad Khan | 23
9 | Guitar | Tommy Jones | 23
1 | Drums | Tommy Jones | 5
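
The step from the Class Record to the relation can be sketched in Python: each class whose Teachers cell holds several names is expanded into one tuple per teacher. The list-of-lists representation is an assumption made for this example.

```python
# Class records as on the first slide: the Teachers cell may hold several values.
class_records = [
    (2, "Saxophone", ["Marcus Smith"], 10),
    (6, "Trumpet", ["Ajay Singh", "Sonny Muller"], 20),
    (7, "Guitar", ["Farhad Khan"], 10),
    (9, "Guitar", ["Farhad Khan", "Tommy Jones"], 23),
    (1, "Drums", ["Tommy Jones"], 5),
]

# One tuple per (class, teacher) pair, so every cell holds exactly one value.
relation = [
    (code, instrument, teacher, rented)
    for code, instrument, teachers, rented in class_records
    for teacher in teachers
]

for row in relation:
    print(row)
```

The result has seven tuples, matching the relation above, and it repeats the class code, instrument and rental count for every extra teacher.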

Page 23: Topic 05 & 06 : The Relational Model

Problem in Previous Solution

There is a lot of repetition, for example of the name, address and course.

Also note that where an address is not known, there is no data and this column is NULL.

In order to overcome the problem of repetition, the relation is split into three. This should reduce repetition to a minimum.

Only certain attributes are repeated and these are foreign keys that are linking the data in one relation with the data in another.

Page 24: Topic 05 & 06 : The Relational Model

Normalized Relation

Students (primary key: StudentID)

StudentID | Name | Address | Course
1 | Guy Smith | NULL | History
2 | Sarah Anusiem | 12 New Street Lagos | Computing

StudentModules (primary key: StudentID + ModuleCode; each is also a foreign key)

StudentID | ModuleCode
1 | Med1
1 | Med2
1 | TCE
2 | OS
2 | Net

Modules (primary key: ModuleCode)

ModuleCode | Name
Med1 | Medieval History 1
OS | Operating Systems
Med2 | Medieval History 2
Net | Networks
TCE | Twentieth Century History
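
A minimal sketch of the three relations using Python's built-in sqlite3 module, assuming SQLite as the engine (the slides do not name one at this point). The column types are assumptions, and the enrolment rows use Med2 for Guy Smith's second module so that every ModuleCode exists in Modules. The final query joins the tables back into the single wide relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Students (
    StudentID INTEGER PRIMARY KEY,
    Name      TEXT NOT NULL,
    Address   TEXT,              -- may be NULL when unknown
    Course    TEXT NOT NULL
);
CREATE TABLE Modules (
    ModuleCode TEXT PRIMARY KEY,
    Name       TEXT NOT NULL
);
CREATE TABLE StudentModules (
    StudentID  INTEGER NOT NULL REFERENCES Students(StudentID),
    ModuleCode TEXT    NOT NULL REFERENCES Modules(ModuleCode),
    PRIMARY KEY (StudentID, ModuleCode)
);
INSERT INTO Students VALUES (1, 'Guy Smith', NULL, 'History'),
                            (2, 'Sarah Anusiem', '12 New Street Lagos', 'Computing');
INSERT INTO Modules VALUES ('Med1', 'Medieval History 1'), ('Med2', 'Medieval History 2'),
                           ('TCE', 'Twentieth Century History'),
                           ('OS', 'Operating Systems'), ('Net', 'Networks');
INSERT INTO StudentModules VALUES (1, 'Med1'), (1, 'Med2'), (1, 'TCE'), (2, 'OS'), (2, 'Net');
""")

# Join the three relations back into one row per student and module.
rows = conn.execute("""
    SELECT s.Name, s.Address, m.ModuleCode, m.Name, s.Course
    FROM StudentModules sm
    JOIN Students s ON s.StudentID = sm.StudentID
    JOIN Modules  m ON m.ModuleCode = sm.ModuleCode
    ORDER BY s.StudentID, m.ModuleCode
""").fetchall()

for row in rows:
    print(row)
```

Each student name, address and module title is now stored once; only the key values repeat.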

Page 25: Topic 05 & 06 : The Relational Model

Normalization

The process of moving from data that is not in relational form to a relation is known as normalization.

It is the process of organizing data to minimize redundancy.

In normalization, we divide a database table into two or more tables and create relationships between them.

Page 26: Topic 05 & 06 : The Relational Model

Why Normalization?

For Data integrity

To make optimized queries on the normalized tables that produce fast, efficient results.

To increase the performance of the database

Page 27: Topic 05 & 06 : The Relational Model

Types of Normal Forms

1. First Normal Form(1NF)

2. Second Normal Form (2NF)

3. Third Normal Form (3NF)

4. Boyce-Codd Normal Form (BCNF)

5. Fourth Normal Form (4NF)

6. Fifth Normal Form (5NF)

Page 28: Topic 05 & 06 : The Relational Model

Relational Integrity Constraints

Relational integrity constraints are used to ensure accuracy and consistency of data in a relational database.

They are the rules that exist within the model to make sure that the database is made up of valid relations.

Types

1. Null integrity

2. Entity integrity

3. Referential integrity

4. General constraints

Page 29: Topic 05 & 06 : The Relational Model

1. Null Integrity

A Null rule is a rule defined on a column that allows or disallows a null (the absence of a value) in that column.

Nulls represent values of an attribute that are unknown. Note that this does NOT mean blank or zero.

Since null means unknown, it is NOT possible to say that an attribute with a value of null is equal to another attribute with a value of null.
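
The behaviour of null comparisons can be checked with a few one-line queries. This sketch uses Python's built-in sqlite3 module (an assumption; the slides use a different database engine), where an unknown result comes back to Python as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULL = NULL evaluates to NULL (unknown), not true; Python receives None.
equal = conn.execute("SELECT NULL = NULL").fetchone()[0]

# IS NULL is the test to use for missing values.
is_null = conn.execute("SELECT NULL IS NULL").fetchone()[0]

# NULL is neither zero nor an empty string: both comparisons are also unknown.
vs_zero, vs_blank = conn.execute("SELECT NULL = 0, NULL = ''").fetchone()

print(equal, is_null, vs_zero, vs_blank)
```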

Page 30: Topic 05 & 06 : The Relational Model

1. Null Integrity

sp_help tbl_student;

Page 31: Topic 05 & 06 : The Relational Model

1. Null Integrity

This query will produce an error because there is already a NULL in student_id. So, delete the row where student_id is NULL, then try the query again.
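
The query from the slide is not part of this transcript. As a stand-in, the sketch below reproduces the same situation in SQLite through Python's sqlite3 module: SQLite cannot add NOT NULL to an existing column, so the rows are copied into a second table that has the rule. The table tbl_student_strict and the sample names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_student (student_id INTEGER, name TEXT);
INSERT INTO tbl_student VALUES (1, 'Ram'), (NULL, 'Sita');   -- made-up rows
CREATE TABLE tbl_student_strict (student_id INTEGER NOT NULL, name TEXT);
""")

def copy_rows():
    """Copy every row into the NOT NULL table; False if a NULL is rejected."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO tbl_student_strict SELECT * FROM tbl_student")
        return True
    except sqlite3.IntegrityError:
        return False

first_try = copy_rows()   # rejected: one student_id is NULL
with conn:
    conn.execute("DELETE FROM tbl_student WHERE student_id IS NULL")
second_try = copy_rows()  # accepted once the NULL row is gone

print(first_try, second_try)
```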

Page 32: Topic 05 & 06 : The Relational Model

1. Null Integrity

sp_help tbl_student;

Now, try the following query again. See what will happen.

Page 33: Topic 05 & 06 : The Relational Model

2. Entity Integrity

This rule is about making sure that each tuple (or row) in a relation is unique.

Entity integrity is an integrity rule which states that every table must have a primary key and that the column or columns chosen to be the primary key should be unique and not null.

Why can an attribute that is part of a primary key not be null? Why would a null potentially violate uniqueness?

Answer: A null value, being unknown, might be the same as the value in the primary key of another tuple.

Page 34: Topic 05 & 06 : The Relational Model

Creating Primary Key

Page 35: Topic 05 & 06 : The Relational Model

Creating Primary Key

OR
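
The screenshots for these two slides are not in the transcript. The sketch below shows two common ways of declaring a primary key, on the column itself or as a separate table constraint, using SQLite through Python's sqlite3 module. The tables, columns and the constraint name pk_marks are assumptions for illustration, not the slide's original statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Option 1: declare the primary key on the column itself.
CREATE TABLE tbl_student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT
);
-- Option 2: declare it as a separate, named table constraint
-- (needed when the key has more than one column).
CREATE TABLE tbl_marks (
    student_id INTEGER NOT NULL,
    subject_id INTEGER NOT NULL,
    marks      INTEGER,
    CONSTRAINT pk_marks PRIMARY KEY (student_id, subject_id)
);
INSERT INTO tbl_student VALUES (1, 'Ram');
""")

try:
    conn.execute("INSERT INTO tbl_student VALUES (1, 'Sita')")  # duplicate key value
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
```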

Page 36: Topic 05 & 06 : The Relational Model

3. Referential Integrity

The referential integrity constraint is specified between two relations and is used to maintain the consistency among tuples in the two relations.

Referential integrity means if a foreign key is pointing to a record in another table, then that record must exist.

If the foreign key points to a record that doesn't exist, referential integrity is broken.

Referential integrity rule does not imply a foreign key cannot be null.
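
A small sketch of these rules in SQLite via Python's sqlite3 module. The department and employee tables are hypothetical examples. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, department_name TEXT);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    dept_id INTEGER REFERENCES department(dept_id)  -- nullable foreign key
);
INSERT INTO department VALUES (10, 'Sales');
INSERT INTO employee VALUES (1, 10);     -- points at an existing department
INSERT INTO employee VALUES (2, NULL);   -- no department yet: still allowed
""")

try:
    conn.execute("INSERT INTO employee VALUES (3, 99)")  # department 99 does not exist
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(orphan_rejected, employee_count)
```

The row with a NULL foreign key is accepted, while the row pointing at a missing department is rejected.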

Page 37: Topic 05 & 06 : The Relational Model

4. General Constraints

Customized rules specified by the users or database administrators.

It is also called a business rule: a statement that defines or constrains some aspect of the business. It is intended to control the behavior of the business.

E.g.: age >= 18 AND age <= 60

It is implemented using CHECK Constraint.

Page 38: Topic 05 & 06 : The Relational Model

CHECK Constraint

Ensures that the value in a column meets a specific condition.

Enforce domain integrity by limiting the values that are accepted by column(s).

Multiple CHECK constraints can apply to a single column.

Page 39: Topic 05 & 06 : The Relational Model

CHECK Constraint
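
The screenshot for this slide is not in the transcript. As an illustration, the age rule from the General Constraints slide can be written as a CHECK constraint; this sketch uses SQLite through Python's sqlite3 module, and the employee table and the constraint name chk_age are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY,
    age    INTEGER CONSTRAINT chk_age CHECK (age >= 18 AND age <= 60)
)
""")

def try_insert(emp_id, age):
    """Return True if the row is accepted, False if the CHECK constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO employee VALUES (?, ?)", (emp_id, age))
        return True
    except sqlite3.IntegrityError:
        return False

results = [try_insert(1, 25), try_insert(2, 17), try_insert(3, 61)]
print(results)
```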

Page 40: Topic 05 & 06 : The Relational Model

Relational Keys

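1. Super Key: An attribute, or set of attributes, that uniquely identifies a tuple within a relation.

For example, for the entity Employee = {EID, Name, Address, Age, Salary, Phone No}, possible super keys include <EID>, <Phone No, Name> and <EID, Name>.

2. Candidate Key: A minimal super key, that is, a super key from which no attribute can be removed without losing uniqueness. <EID> is a candidate key; <EID, Name> is not, because <EID> alone is already unique. <Phone No, Name> is also a candidate key if neither Phone No nor Name identifies an employee on its own. It is called a ‘candidate key’ because it is a candidate to become the primary key.

3. Composite Key: A key, such as a primary key, made up of more than one attribute.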
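
The difference between a super key and a candidate key can be sketched in Python. The employee rows below are made up, and only three of the attributes are used. A check against sample rows can only show that a set of attributes is not a key; whether it really is a key depends on the rules of the business, not on one sample.

```python
from itertools import combinations

# Made-up employee rows: two employees share a name and two share a phone number.
attrs = ("EID", "Name", "Phone No")
employees = [
    ("E1", "Ram", "111"),
    ("E2", "Ram", "222"),
    ("E3", "Sita", "111"),
]

def is_superkey(key):
    """True if no two rows agree on every attribute in `key`."""
    idx = [attrs.index(a) for a in key]
    projected = [tuple(row[i] for i in idx) for row in employees]
    return len(set(projected)) == len(employees)

def is_candidate_key(key):
    """A super key is a candidate key if no proper subset is itself a super key."""
    if not is_superkey(key):
        return False
    return not any(
        is_superkey(subset)
        for size in range(1, len(key))
        for subset in combinations(key, size)
    )

for key in [("EID",), ("EID", "Name"), ("Phone No", "Name"), ("Name",)]:
    print(key, is_superkey(key), is_candidate_key(key))
```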

Page 41: Topic 05 & 06 : The Relational Model

Functional Dependency

Students

Student ID | First Name | Surname
9901 | John | Dacus
9902 | Satpal | Singh
9922 | Jagpal | Singh
9911 | John | Smith

• For any Student ID, there is one first name and one surname. So, First Name and Surname are functionally dependent on Student ID. We can also say Student ID functionally determines First Name and Surname.

• Student ID -> First Name, but not the reverse

• Student ID -> Surname

Page 42: Topic 05 & 06 : The Relational Model

Functional Dependency

A functional dependency is a constraint that describes the relationship between attributes in a relation.

If A and B are attributes of relation R, B is said to be functionally dependent on A (denoted A → B), if each value of A is associated with exactly one value of B.

A → B means B is functionally dependent on A or A functionally determines B.
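
The definition can be tested against the Students rows from the previous slide. This is a sketch only: the function name holds is chosen for this example, and a check on sample rows can disprove a dependency but cannot prove it for all future data.

```python
attrs = ("Student ID", "First Name", "Surname")
students = [
    ("9901", "John", "Dacus"),
    ("9902", "Satpal", "Singh"),
    ("9922", "Jagpal", "Singh"),
    ("9911", "John", "Smith"),
]

def holds(lhs, rhs, rows=students):
    """True if every value of `lhs` is paired with exactly one value of `rhs` in `rows`."""
    a, b = attrs.index(lhs), attrs.index(rhs)
    seen = {}
    for row in rows:
        # setdefault stores the first rhs value seen for this lhs value.
        if seen.setdefault(row[a], row[b]) != row[b]:
            return False
    return True

print(holds("Student ID", "First Name"))  # Student ID -> First Name
print(holds("First Name", "Student ID"))  # the reverse: John has two IDs
```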

Page 43: Topic 05 & 06 : The Relational Model

Partial Dependency

A functional dependency A→B is a partial dependency if some attribute can be removed from A and the dependency still holds.

It occurs when a non-key attribute is determined by a part, but not the whole, of a composite primary key.

student_id | subject_id | marks
1 | 1 | 100
2 | 1 | 80
1 | 2 | 85

Here marks depends on the whole composite key (student_id, subject_id), so it is fully, not partially, dependent on the key.
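
To see a partial dependency next to a full one, the sketch below adds a hypothetical subject_name column to the marks table; that column and its values are not on the slide. The code tests whether a proper subset of the composite key already determines an attribute.

```python
from itertools import combinations

# The marks table from the slide plus a hypothetical subject_name column.
attrs = ("student_id", "subject_id", "subject_name", "marks")
rows = [
    (1, 1, "Maths", 100),
    (2, 1, "Maths", 80),
    (1, 2, "Physics", 85),
]
key = ("student_id", "subject_id")

def determines(lhs, rhs):
    """True if rows that agree on all `lhs` attributes also agree on `rhs`."""
    li = [attrs.index(a) for a in lhs]
    r = attrs.index(rhs)
    seen = {}
    for row in rows:
        k = tuple(row[i] for i in li)
        if seen.setdefault(k, row[r]) != row[r]:
            return False
    return True

def is_partial(rhs):
    """True if some proper, non-empty subset of the key already determines `rhs`."""
    return any(
        determines(subset, rhs)
        for size in range(1, len(key))
        for subset in combinations(key, size)
    )

print(is_partial("subject_name"), is_partial("marks"))
```

In this sample subject_name depends on subject_id alone, while marks needs both parts of the key.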

Page 44: Topic 05 & 06 : The Relational Model

Transitive Dependency

Three attributes A, B, and C connected in such a way that A→B and B→C. In other words A→C. If we know the value of A, we can determine B, which we can use in turn to determine C. This kind of functional dependency is known as transitive dependency.

e.g. The functional dependency {Book} → {Author Nationality} applies; that is, if we know the book, we know the author's nationality. Furthermore:

{Book} → {Author}

{Author} does not → {Book}

{Author} → {Author Nationality}

Therefore {Book} → {Author Nationality} is a transitive dependency.
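
The chain can be written as two lookups in Python. The three book rows are sample data chosen for this sketch.

```python
# Book -> Author and Author -> Author Nationality, stored as two lookups.
book_author = {
    "Twenty Thousand Leagues Under the Sea": "Jules Verne",
    "Journey to the Center of the Earth": "Jules Verne",
    "Leaves of Grass": "Walt Whitman",
}
author_nationality = {"Jules Verne": "French", "Walt Whitman": "American"}

def nationality_of(book):
    """Follow Book -> Author -> Author Nationality."""
    return author_nationality[book_author[book]]

# Author does not determine Book: one author maps to several books.
books_by_author = {}
for book, author in book_author.items():
    books_by_author.setdefault(author, []).append(book)

print(nationality_of("Leaves of Grass"), len(books_by_author["Jules Verne"]))
```

Storing the nationality once per author, rather than once per book, is what removing the transitive dependency amounts to.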

Page 45: Topic 05 & 06 : The Relational Model

Anomalies

1. Insert Anomalies

2. Update Anomalies

3. Delete Anomalies

Page 46: Topic 05 & 06 : The Relational Model

Activity: Anomalies

Student ID | Student Name | Activity | Fee
9901 | Binay | Basketball | 200
9902 | Shyam | Football | 300
9922 | Sitaram | Cricket | 500
9811 | Prashant | Football | 300

• What information do we lose if Binay quits Basketball?

• We would lose the price of ‘Basketball’.

• This is the deletion anomaly that occurs when relations are not fully normalized.

• It happens when you delete some information and lose valuable related information at the same time.

Page 47: Topic 05 & 06 : The Relational Model

Insert Anomalies

Suppose we want to record a new activity that no one has taken yet. Can we insert this information?

We cannot do so; we need a student ID because the student ID is part of the primary key and therefore cannot be null.

This is an insert anomaly.

Page 48: Topic 05 & 06 : The Relational Model

Update Anomalies

If we wanted to change the cost of football to ‘500’, we would have to do it for every tuple where someone was playing football.

Any change made to your data will require you to scan all records to make the change. This is called the update anomaly.
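
One possible way to avoid all three anomalies is to store each fee once, in its own table. The sketch below does this in SQLite through Python's sqlite3 module; the table names and the new ‘Swimming’ activity with its fee are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activity (name TEXT PRIMARY KEY, fee INTEGER NOT NULL);
CREATE TABLE student  (student_id INTEGER PRIMARY KEY, student_name TEXT NOT NULL);
CREATE TABLE participation (
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    activity   TEXT    NOT NULL REFERENCES activity(name),
    PRIMARY KEY (student_id, activity)
);
INSERT INTO activity VALUES ('Basketball', 200), ('Football', 300), ('Cricket', 500);
INSERT INTO student VALUES (9901, 'Binay'), (9902, 'Shyam'), (9922, 'Sitaram'), (9811, 'Prashant');
INSERT INTO participation VALUES (9901, 'Basketball'), (9902, 'Football'),
                                 (9922, 'Cricket'), (9811, 'Football');
""")

with conn:
    # Delete: Binay quits Basketball, but the fee stays in the activity table.
    conn.execute("DELETE FROM participation WHERE student_id = 9901")
    # Insert: a new activity can be recorded before anyone takes it.
    conn.execute("INSERT INTO activity VALUES ('Swimming', 250)")
    # Update: the Football fee is stored once, so only one row changes.
    changed = conn.execute("UPDATE activity SET fee = 500 WHERE name = 'Football'").rowcount

basketball_fee = conn.execute("SELECT fee FROM activity WHERE name = 'Basketball'").fetchone()[0]
print(basketball_fee, changed)
```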

Page 49: Topic 05 & 06 : The Relational Model

Normal Forms: Review

Un-normalized – There are multivalued attributes or repeating groups

1 NF – No multivalued attributes or repeating groups.

2 NF – 1 NF plus no partial dependencies

3 NF – 2 NF plus no transitive dependencies

Page 50: Topic 05 & 06 : The Relational Model

Billing System

Bill No.: 1078

Date: 2013-12-20

Customer Code: C100

Customer Name: Ram Shrestha

ItemCode | ItemName | Rate | Qty | Amount
1 | Copy | 20 | 10 | 200
2 | Book | 200 | 8 | 1600
3 | Pen | 10 | 3 | 30

Page 51: Topic 05 & 06 : The Relational Model

UNF (Un-Normalized Form)

• The first step is to identify which attributes belong to the repeating group.

• Those attributes where there is one occurrence are marked with a ‘1’.

• Those attributes where there is a repeating group are marked with a ‘2’.

• The tentative primary key is also underlined. In this case it is BillNo.

UNF | UNF Level
BillNo | 1
Date | 1
CustomerCode | 1
CustomerName | 1
ItemCode | 2
ItemName | 2
Rate | 2
Qty | 2
Amount | 2

Page 52: Topic 05 & 06 : The Relational Model

First Normal Form(1NF)

Remove Repeating Group Information

BillNo, Date, CustomerCode, CustomerName

BillNo, ItemCode, ItemName, Rate, Qty, Amount

Page 53: Topic 05 & 06 : The Relational Model

Second Normal Form (2NF)

Remove Partial Key Dependencies

Identify the attributes that are dependent on only one part of the primary key (composite key) and separate them.

BillNo, Date, CustomerCode, CustomerName

BillNo, ItemCode, Qty, Amount

ItemCode, ItemName, Rate

Page 54: Topic 05 & 06 : The Relational Model

Third Normal Form (3NF)

Remove Non-Key Dependencies or Transitive Dependencies

Identify the attributes that are functionally dependent on non-key attributes or identify the attributes that are not functionally dependent on primary key.

Here CustomerName is dependent on CustomerCode, not on BillNo. Amount is also dropped because it can be derived as Rate × Qty.

BillNo, Date, CustomerCode

BillNo, ItemCode, Qty

ItemCode, ItemName, Rate

CustomerCode, CustomerName
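
The four relations can be sketched as SQLite tables through Python's sqlite3 module and loaded with the data from bill 1078. The table names (Bill, BillItem, Item, Customer) and column types are assumptions, since the slide lists attributes only. The query rebuilds the bill lines and computes each amount as Rate × Qty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerCode TEXT PRIMARY KEY, CustomerName TEXT NOT NULL);
CREATE TABLE Item     (ItemCode INTEGER PRIMARY KEY, ItemName TEXT NOT NULL, Rate INTEGER NOT NULL);
CREATE TABLE Bill (
    BillNo       INTEGER PRIMARY KEY,
    Date         TEXT NOT NULL,
    CustomerCode TEXT NOT NULL REFERENCES Customer(CustomerCode)
);
CREATE TABLE BillItem (
    BillNo   INTEGER NOT NULL REFERENCES Bill(BillNo),
    ItemCode INTEGER NOT NULL REFERENCES Item(ItemCode),
    Qty      INTEGER NOT NULL,
    PRIMARY KEY (BillNo, ItemCode)
);
INSERT INTO Customer VALUES ('C100', 'Ram Shrestha');
INSERT INTO Item VALUES (1, 'Copy', 20), (2, 'Book', 200), (3, 'Pen', 10);
INSERT INTO Bill VALUES (1078, '2013-12-20', 'C100');
INSERT INTO BillItem VALUES (1078, 1, 10), (1078, 2, 8), (1078, 3, 3);
""")

# Rebuild the printed bill lines; Amount is derived rather than stored.
bill_lines = conn.execute("""
    SELECT i.ItemCode, i.ItemName, i.Rate, bi.Qty, i.Rate * bi.Qty AS Amount
    FROM BillItem bi JOIN Item i ON i.ItemCode = bi.ItemCode
    WHERE bi.BillNo = 1078
    ORDER BY i.ItemCode
""").fetchall()
total = sum(line[4] for line in bill_lines)

for line in bill_lines:
    print(line)
print("total:", total)
```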

Page 55: Topic 05 & 06 : The Relational Model

The Document - Example

Student Number: 1078654X

Student Name: David Green

Course Code: G105

Course Title: BA Business Computing

Module Code | Module Title | Number of Credits | Grade Point | Result Code | Result
BUS119 | Business Operations | 20 | 10 | P | Pass
COM110 | Introduction to Computing | 20 | 8 | P | Pass
COM112 | Application Building | 20 | 3 | RE | Refer Exam
COM114 | Software Engineering | 20 | 2 | DC | Defer Coursework
COM118 | Computer Law | 10 | 9 | P | Pass
COM120 | Systems Analysis | 20 | 3 | RCE | Refer coursework and Exam
COM122 | HCI | 10 | 7 | P | Pass

Page 56: Topic 05 & 06 : The Relational Model

UNF

• The first step is to identify which attributes belong to the repeating group.

• Those attributes where there is one occurrence are marked with a ‘1’.

• Those attributes where there is a repeating group are marked with a ‘2’.

• The tentative primary key is also underlined. In this case it is student number.

UNF | UNF Level
Student Number | 1
Student Name | 1
Course Code | 1
Course Title | 1
Module Code | 2
Module Title | 2
No. of Credits | 2
Grade Point | 2
Result Code | 2
Result | 2

Page 57: Topic 05 & 06 : The Relational Model

First Normal Form(1NF)

Remove Repeating Group Information

Student Number, Student Name, Course Code, Course Title

Student Number, Module Code, Module Title, No. of Credits, Grade Point, Result Code, Result

Page 58: Topic 05 & 06 : The Relational Model

Second Normal Form (2NF)

Remove Partial Key Dependencies

Student Number, Student Name, Course Code, Course Title

Module Code, Module Title, No. of Credits

Student Number, Module Code, Grade Point, Result Code, Result

Page 59: Topic 05 & 06 : The Relational Model

Third Normal Form (3NF)

Remove Non-Key Dependencies or Transitive Dependencies

Student Number, Student Name, Course Code

Module Code, Module Title, No. of Credits

Student Number, Module Code, Grade Point, Result Code

Course Code, Course Title

Result Code, Result

Page 60: Topic 05 & 06 : The Relational Model

Activity 1

Page 61: Topic 05 & 06 : The Relational Model

Activity 2

Page 62: Topic 05 & 06 : The Relational Model

Activity 3

Page 63: Topic 05 & 06 : The Relational Model

ANY QUESTIONS?
