pmit-6102 advanced database systems

22
PMIT-6102 Advanced Database Systems By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University

Upload: mariam-luna

Post on 01-Jan-2016

60 views

Category:

Documents


2 download

DESCRIPTION

PMIT-6102 Advanced Database Systems. By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University. Lecture 04 Relational Database Design Normalization. Outline. Overview of Relational DBMS Normalization. Normalization. - PowerPoint PPT Presentation

TRANSCRIPT

PMIT-6102Advanced Database Systems

By-Jesmin Akhter

Assistant Professor, IIT, Jahangirnagar University

Lecture 04 Relational Database Design

Normalization

Outline Overview of Relational DBMS

Normalization

The aim of normalization is to eliminate various anomalies (or undesirable aspects) of a relation in order to obtain “better” relations.

The following four problems might exist in a relation scheme: Repetition anomaly Update anomaly Insertion anomaly Deletion anomaly

Slide 4

Normalization

Repetition Anomaly

The NAME,TITLE, SAL attribute values are repeated for each project that the employee is involved in. Waste of space Complicates updates Contrary to the spirit of databases

ENO

EMP

ENAME TITLE SAL

J. Doe Elect. Eng. 40000M. Smith 34000M. Smith

AnalystAnalyst 34000

A. Lee Mech. Eng. 27000A. Lee Mech. Eng. 27000J. Miller Programmer 24000B. Casey Syst. Anal. 34000L. Chu Elect. Eng. 40000R. Davis Mech. Eng. 27000

E1E2E2E3E3E4E5E6E7E8 J. Jones Syst. Anal. 34000

24

PNO RESP DUR

P1 Manager 12P1 AnalystP2 Analyst 6P3 Consultant 10P4 Engineer 48P2 Programmer 18P2 Manager 24P4 Manager 48P3 Engineer 36P3 Manager 40

Update Anomaly

If any attribute of project (say SAL of an employee) is updated, multiple tuples have to be updated to reflect the change.

ENO

EMP

ENAME TITLE SAL

J. Doe Elect. Eng. 40000M. Smith 34000M. Smith

AnalystAnalyst 34000

A. Lee Mech. Eng. 27000A. Lee Mech. Eng. 27000J. Miller Programmer 24000B. Casey Syst. Anal. 34000

L. Chu Elect. Eng. 40000

R. Davis Mech. Eng. 27000

E1E2E2E3E3E4E5

E6

E7E8 J. Jones Syst. Anal. 34000

24

PNO RESP DUR

P1 Manager 12P1 AnalystP2 Analyst 6P3 Consultant 10P4 Engineer 48P2 Programmer 18P2 Manager 24P4 Manager 48

P3 Engineer 36

P3 Manager 40

Insertion Anomaly

It may not be possible to store information about a new project until an employee is assigned to it.

ENO

EMP

ENAME TITLE SAL

J. Doe Elect. Eng. 40000M. Smith 34000

M. Smith

Analyst

Analyst 34000A. Lee Mech. Eng. 27000

A. Lee Mech. Eng. 27000J. Miller Programmer 24000

B. Casey Syst. Anal. 34000

L. Chu Elect. Eng. 40000

R. Davis Mech. Eng. 27000

E1E2

E2E3

E3E4

E5

E6

E7

E8 J. Jones Syst. Anal. 34000

24

PNO RESP DUR

P1 Manager 12P1 Analyst

P2 Analyst 6

P3 Consultant 10

P4 Engineer 48P2 Programmer 18

P2 Manager 24

P4 Manager 48

P3 Engineer 36

P3 Manager 40

Deletion Anomaly If an engineer, who is the only employee on a project,

leaves the company, his personal information cannot be deleted, or the information about that project is lost.

May have to delete many tuples.

ENO

EMP

ENAME TITLE SAL

J. Doe Elect. Eng. 40000M. Smith 34000M. Smith

AnalystAnalyst 34000

A. Lee Mech. Eng. 27000A. Lee Mech. Eng. 27000J. Miller Programmer 24000B. Casey Syst. Anal. 34000L. Chu Elect. Eng. 40000

R. Davis Mech. Eng. 27000

E1E2E2E3E3E4E5E6

E7E8 J. Jones Syst. Anal. 34000

24

PNO RESP DUR

P1 Manager 12P1 AnalystP2 Analyst 6P3 Consultant 10P4 Engineer 48P2 Programmer 18P2 Manager 24P4 Manager 48P3 Engineer 36

P3 Manager 40

What to do? Take each relation individually and “improve” it in terms

of the desired characteristics Normal forms

o Atomic values (1NF)o Can be defined according to keys and dependencies.o Functional Dependencies ( 2NF, 3NF, BCNF)o Multivalued dependencies (4NF) o Projection-join dependencies (5NF)

Normalizationo Normalization is a process of concept separation which applies a

top-down methodology for producing a schema by subsequent refinements and decompositions.

o Do not combine unrelated sets of facts in one table; each relation should contain an independent set of facts.

o Universal relation assumption

Normalization Issues How do we decompose a schema into a desirable

normal form? What criteria should the decomposed schemas follow

in order to preserve the semantics of the original schema? Reconstructability: recover the original relation no spurious

joins Lossless decomposition: no information loss Dependency preservation: the constraints (i.e., dependencies)

that hold on the original relation should be enforceable by means of the constraints (i.e., dependencies) defined on the decomposed relations.

A Lossy Decomposition

Example of Lossless-Join Decomposition

Lossless join decomposition Decomposition of R = (A, B, C)

R1 = (A, B) R2 = (B, C)

A B

1

2

A

B

1

2

r B,C(r)

A (r) B (r)A B

1

2

C

A

B

B

1

2

C

A

B

C

A

B

A,B(r)

Unnormalized

(UDF)

First normal form

(1NF)

Remove repeating groups

Second normal form

(2NF)

Remove partial dependencies

Third normal form

(3NF)

Remove transitive dependencies

Boyce-Codd normal

form (BCNF)

Remove remaining functional

dependency anomalies

Fourth normal form

(4NF)

Remove multivalued dependencies

Fifth normal form

(5NF)

Remove remaining anomalies

Stages of Normalization

Repeating Groups

A repeating group is an attribute (or set of attributes) that can have more than one value for a primary key value.

staffNo job dept dname city contact Number

SL10 Salesman 10 Sales Stratford 018111777, 018111888, 079311122

SA51 Manager 20 Accounts Barking 017111777

DS40 Clerk 20 Accounts Barking Null

OS45 Clerk 30 Operations Barking 079311555

Example We have the following relation that contains staff and department details and a list of telephone contact numbers for each member of staff.

Repeating Groups are not allowed in a relational design, since all attributes have to be ‘atomic’ - i.e., there can only be one value per cell in a table!

Multivalued Attributes (or repeating groups): non-key attributes or groups of non-key attributes the values of which are not uniquely identified by (directly or indirectly) (not functionally dependent on) the value of the Primary Key (or its part).

STUDENT

Stud_ID Name Course_ID Units

101 Lennon MSI 250 3.00

101 Lennon MSI 415 3.00

125 Johnson MSI 331 3.00

Stud_ID Name Course_ID Units101 Lennon MSI 250, MSI 415 3.00

125 Johnson MSI 331 3.00

Repeating Groups

STUDENT

Functional Dependency

Formal Definition: Attribute B is functionally dependant upon attribute A (or a collection of attributes) if a value of A determines a single value of attribute B at any one time.

Formal Notation: A B This should be read as ‘A determines B’ or ‘B is functionally dependant on A’. A is called the determinant and B is called the object of the determinant.

staffNo job dept dname

SL10 Salesman 10 Sales

SA51 Manager 20 Accounts

DS40 Clerk 20 Accounts

OS45 Clerk 30 Operations

Example:

staffNo job

staffNo dept

staffNo dname

dept dname

Functional Dependencies

Functional Dependency

Full Functional Dependency: Only of relevance with composite determinants. This is the situation when it is necessary to use all the attributes of the composite determinant to identify its object uniquely.

order# line# qty price

A001 001 10 200

A002 001 20 400

A002 002 20 800

A004 001 15 300

Example:

(Order#, line#) qty

(Order#, line#) price

Full Functional Dependencies

Compound Determinants: If more than one attribute is necessary to determine another attribute in an entity, then such a determinant is termed a composite determinant.

Functional Dependency

Partial Functional Dependency: This is the situation that exists if it is necessary to only use a subset of the attributes of the composite determinant to identify its object uniquely.

(student#, unit#) grade

Full Functional Dependencies

unit# room

Partial Functional Dependencies

Repetition of data!

student# unit# room grade

9900100 A01 TH224 2

9900010 A01 TH224 14

9901011 A02 JS075 3

9900001 A01 TH224 16

Partial Dependency – when an non-key attribute is determined by a part, but not the whole, of a COMPOSITE primary key.

CUSTOMER

Cust_ID Name Order_ID

101 AT&T 1234

101 AT&T 156

125 Cisco 1250

Partial Dependency

Functional Dependency

Transitive DependencyDefinition: A transitive dependency exists when there is an intermediate functional dependency.

Formal Notation: If A B and B C, then it can be stated that the following transitive dependency exists: A B C

staffNo dept

dept dname

staffNo dept dname

Transitive Dependencies

Repetition of data!

staffNo job dept dname

SL10 Salesman 10 Sales

SA51 Manager 20 Accounts

DS40 Clerk 20 Accounts

OS45 Clerk 30 Operations

Example:

Transitive Dependency – when a non-key attribute determines another non-key attribute.

EMPLOYEE

Emp_ID F_Name L_Name Dept_ID Dept_Name

111 Mary Jones 1 Acct

122 Sarah Smith 2 Mktg

Transitive Dependency

Transitive Dependency

Thank You