
ISOM

MIS415 Module 1b Relational Model and Normalization

Arijit Sengupta

Structure of this quarter

[Figure: MIS415 course roadmap]
Stages: 0. Intro, 1. Design, 2. Querying, 3. Applications, 4. Advanced Topics
Topics: Database Fundamentals, Relational Model, Normalization, Conceptual Modeling, Query Languages, Advanced SQL, Transaction Management, Java DB Applications – JDBC, Data Mining
Audience labels (as they appear in the figure): Newbie, Users, Professionals, Designers, Developers

Today’s Buzzwords

• Relational Model
• Superkey, Candidate Key, Primary Key and Foreign Key
• Entity Integrity Rule
• Referential Integrity Rule
• Normalization
• First, Second, Third, and Boyce-Codd Normal Forms
• Unnormalization

Objectives of this lecture

• Understand the Relational Model and its properties
• Understand the notion of keys
• Understand the use and importance of referential integrity
• Provide an alternative way to design relations using semantics rather than concepts
• Take an existing “flat file” design and create a relational design from it through the process of Normalization
• Identify sources of problems (or anomalies) within a given relational design
• Argue about improvements to designs created by others

Relational Data Model

• Originally proposed by Codd in 1970
• Based on mathematical set theory

ID   Name   Age  Address      GPA
S1   Jose   21   Stoned Hill  3.1
S2   Alice  18   BigHead      3.2
S3   Lin    32   Done-Audy    2.9
S4   Joyce  20   Atlanta      3.7
S5   Sunil  27   Mare-iota    3.2

Figure labels: Relation (the whole table), Attribute Names (the header row), Attributes (the columns), Attribute Values (the cells), Tuples (the rows)

Relation: Properties

• A relation is a set of tuples
• A tuple is a set of attribute-value properties (relations)
    - Ordering of attributes is immaterial
    - Ordering of Tuples is immaterial
• Tuples are distinct from one another
• Attributes contain atomic values only

Emp#  Name                   Address
E1    'Jose' 'M.' 'Smith'    '3413 Main Street', 'Atlanta', 'GA'
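The set-based properties above can be tried out in a few lines of Python. This is only a sketch under one modeling assumption: a tuple is represented as a frozenset of (attribute, value) pairs and a relation as a set of such tuples. The helper name make_tuple is made up for this illustration.

```python
# A relation modeled as a set of tuples; each tuple is a frozenset of
# (attribute, value) pairs, so attribute order cannot matter.
def make_tuple(**attrs):
    return frozenset(attrs.items())

t1 = make_tuple(ID="S1", Name="Jose", Age=21)
t1_reordered = make_tuple(Age=21, ID="S1", Name="Jose")
t2 = make_tuple(ID="S2", Name="Alice", Age=18)

student = {t1, t2}
# Same tuples listed in another order, with t1 given twice.
student_other_order = {t2, t1_reordered, t1}

print(t1 == t1_reordered)               # ordering of attributes is immaterial
print(student == student_other_order)   # ordering of tuples is immaterial
print(len(student_other_order))         # duplicates collapse: tuples are distinct
```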

Attributes

• Attribute name
    - Attribute names are unique within a relation
• Attribute domain
    - Set of all possible values an attribute may take
    - What are Domain (GPA), Domain (name), Domain (DateOfBirth) and Domain (year)?
• Number of attributes: degree of the relation

Tuples

• Aggregation of attribute values
    S1 = (s1, ‘Jose’, 21, ‘Stoned Hill’, 3.1)
    S2 = (s2, ‘Alice’, 18, ‘BigHead’, 3.2)
• Cardinality: Number of tuples in a relation
• What is the difference between the cardinality and the degree?

ID   Name   Age  Address      GPA
S1   Jose   21   Stoned Hill  3.1
S2   Alice  18   BigHead      3.2
S3   Lin    32   Done-Audy    2.9
S4   Joyce  20   Atlanta      3.7
S5   Sunil  27   Mare-iota    3.2
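A small Python sketch of the two counts for the table above. Degree and cardinality both happen to be 5 here, so the sketch also adds one hypothetical tuple (S6, invented for this illustration) to show that only the cardinality moves.

```python
attribute_names = ("ID", "Name", "Age", "Address", "GPA")
student = [
    ("S1", "Jose", 21, "Stoned Hill", 3.1),
    ("S2", "Alice", 18, "BigHead", 3.2),
    ("S3", "Lin", 32, "Done-Audy", 2.9),
    ("S4", "Joyce", 20, "Atlanta", 3.7),
    ("S5", "Sunil", 27, "Mare-iota", 3.2),
]

degree = len(attribute_names)   # number of attributes (columns)
cardinality = len(student)      # number of tuples (rows)
print(degree, cardinality)

# Hypothetical extra student, not part of the slide's table.
student.append(("S6", "Maria", 22, "Boston", 3.5))
print(len(attribute_names), len(student))   # degree unchanged, cardinality grows
```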

Primary Keys

• Superkey: SK, a subset of attributes of R, satisfying uniqueness, that is, no two tuples have the same combination of values for these attributes.
• Candidate Key: K, a superkey SK, satisfying minimality, that is, no component of K can be eliminated without destroying the uniqueness property.
• Primary Key: PK, the selected Candidate key, K.
• Can a primary key be composed of multiple attributes?
• Can a relation have multiple primary keys?
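The uniqueness and minimality tests can be sketched in Python as below. The registration rows are hypothetical sample data, and the check runs against one instance only: a real key is a statement about every legal instance, so a sample can rule a key out but cannot confirm it.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """Uniqueness: no two rows agree on all of attrs."""
    seen = set()
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values in seen:
            return False
        seen.add(values)
    return True

def candidate_keys(rows, all_attrs):
    """Minimal superkeys of this instance, found smallest first."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            if any(set(k) <= set(attrs) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys

# Hypothetical registration data: one row per (student, course).
attrs = ("SID", "Course", "Grade")
rows = [
    {"SID": "S1", "Course": "C1", "Grade": "A"},
    {"SID": "S1", "Course": "C2", "Grade": "B"},
    {"SID": "S2", "Course": "C1", "Grade": "A"},
    {"SID": "S1", "Course": "C3", "Grade": "A"},
]

print(is_superkey(rows, attrs))      # the full attribute set is a superkey
print(candidate_keys(rows, attrs))   # only the composite (SID, Course) is minimal
```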

Keys - example

Disk: (ISBN#, Artist_name, Album_name, Year, Producer, Genre, time, price)

• Superkeys?
• Candidate keys?
• Primary key?

Entity Integrity Rule

• The primary key of a base relation cannot contain a NULL value.
• Enforcement of the rule:
    - An update which results in a NULL value in the primary key must be rejected.
• Are the following ok?

Course  Section  Meets  Enrolled
201     1        MW     20
201     NULL     TTh    25
NULL    NULL     MWF    18

(Figure label: “Primary Key”, presumably marking the Course and Section columns)
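A minimal Python sketch of the enforcement check for the three rows above. It assumes the primary key is (Course, Section), which the transcript does not state outright, and uses None for NULL.

```python
# Assumption: the primary key of the table is (Course, Section).
PRIMARY_KEY = ("Course", "Section")

rows = [
    {"Course": "201", "Section": 1, "Meets": "MW", "Enrolled": 20},
    {"Course": "201", "Section": None, "Meets": "TTh", "Enrolled": 25},
    {"Course": None, "Section": None, "Meets": "MWF", "Enrolled": 18},
]

def violates_entity_integrity(row, pk=PRIMARY_KEY):
    """True if any primary key attribute of the row is NULL (None)."""
    return any(row[a] is None for a in pk)

for row in rows:
    verdict = "rejected" if violates_entity_integrity(row) else "ok"
    print(row["Course"], row["Section"], verdict)
```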

Foreign Key

    Physician (ID, Name, …)        Patient (ID, Name, PhysID*, …)
    Club (ID, Name, …)             Player (ID, Name, ?*, …)
    Order (OrdID, Date, …, ?*)     Customer (ID, Name, …, ?*)
    Dept (DeptID, Name, …, ?*)     Employee (EID, Name, …, ?*)

• Attribute(s) of one relation that reference(s) the PK of another relation
• FK may or may not be (a part of) the PK of this relation

    Course (CourseID, Name, …, ?*)   Class (ClassID, Meets, …, ?*)
    Student (SID, Name, …, ?*)       Registration (?)

• Can an FK refer to a part of the PK of another relation?
• Can an FK refer to a PK of the same relation?

Foreign Key ..

• FK and referenced PK may have different names
• The values of FK must draw from the value set of PK
• How do we define the Domain of an FK?
• Can an FK have a NULL value?
• What can we enforce with PKs and FKs?

[Figure: diagram relating the Primary Key and the Foreign Key; labels: Domain, Value Set, Domain]

Referential Integrity Rule

• If FK is the foreign key of a relation R2, which matches the primary key PK of the relation R1, then:
    - the FK value must match the PK value in some tuple of R1, or
    - the FK value may be NULL, but only if the FK is not (a part of) the PK of R2.
• Enforcement of the Rule
    - An update on either a referenced PK or an FK must satisfy the rule. Otherwise, the operation is rejected.
• Which operation on the primary key may violate this rule?
• Which operation on the foreign key may violate this rule?

Referential Integrity Enforcement

• If an operation violates referential integrity:
    - Restrict: reject the operation
    - Cascade: try to propagate the operation to all dependent FK values; if that is not possible, reject the operation
    - Nullify (or Default): set all dependent FK values to NULL (or a default value); if that is not possible, reject the operation
• Cases for each of the above situations?
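The three policies can be observed with Python's built-in sqlite3 module. This is a sketch rather than the course's Oracle setup: the dept and employee tables and their rows are invented for the illustration, and SQLite spells Nullify as ON DELETE SET NULL and enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

def delete_parent(on_delete):
    """Delete a referenced row under the given ON DELETE policy and
    return (outcome, remaining child rows)."""
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")  # FK enforcement is off by default in SQLite
    db.execute("CREATE TABLE dept (deptid INTEGER PRIMARY KEY, name TEXT)")
    db.execute(
        "CREATE TABLE employee (eid INTEGER PRIMARY KEY, name TEXT, "
        f"deptid INTEGER REFERENCES dept(deptid) ON DELETE {on_delete})"
    )
    db.execute("INSERT INTO dept VALUES (10, 'Sales')")
    db.execute("INSERT INTO employee VALUES (1, 'Ann', 10)")
    try:
        db.execute("DELETE FROM dept WHERE deptid = 10")
        outcome = "deleted"
    except sqlite3.IntegrityError:
        outcome = "rejected"
    children = db.execute("SELECT eid, deptid FROM employee").fetchall()
    db.close()
    return outcome, children

for policy in ("RESTRICT", "CASCADE", "SET NULL"):
    print(policy, delete_parent(policy))
```

Under RESTRICT the delete is refused and the employee row is untouched, under CASCADE the employee row disappears with its department, and under SET NULL the employee survives with a NULL deptid.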

Creating Relations

create table STUDENT (
    ID char (11) not null primary key,
    Name char(30) not null,
    age int,
    GPA number (2,1));

create table COURSE (
    courseno char (6) not null primary key,
    coursename char(30) not null,
    credithours number (2,1));

create table REGISTRATION (
    ID references STUDENT (ID) on delete cascade,
    CourseNum references COURSE (courseno),
    primary key (ID, CourseNum) );

Try it!

/* Create the computer table */
Create table test_computer(cid int not null, make varchar2(20),
    model varchar2(20), primary key(cid));

/* Create the part table */
Create table test_part(pid int not null, ptype varchar2(20),
    pdesc varchar2(20), comp int, primary key(pid),
    foreign key(comp) references test_computer(cid) on delete cascade);

/* Insert values into computer */
Insert into test_computer values (111, 'Dell', 'Inspiron');
Insert into test_computer values (222, 'HP', 'TC4400');

/* Insert values into part. The original statements omitted the fourth
   column (comp), so the computer IDs 111 and 222 below are assumed. */
Insert into test_part values(1111, 'Memory', '512MB', 111);
insert into test_part values(2222, 'CD', 'Multimedia drive', 222);

What happens when you delete computer 111?
What happens when you delete computer 222?
What happens when you delete part 2222?

Purpose of Normalization

• The benefit of using a database that has a suitable set of relations is that the database will:
    - be easier for the user to access and maintain;
    - take up minimal storage space on the computer.

© Pearson Education Limited 1995, 2005

Why Normalization?

• Poor Relation Design causes Anomalies
    - Insertion anomalies: Insertion of some piece of information cannot be performed unless other irrelevant information is added to it.
    - Update anomalies: Update of a single piece of information requires updates to multiple tuples.
    - Deletion anomalies: Deletion of a piece of information removes other unrelated but necessary information.
• Normalization improves the design to remove these anomalies

Why Normalization?

• Benefits
    - contain minimum amount of redundancy
    - allow users to insert, delete and modify tuples in the relation without errors or inconsistencies
    - improve quality of information in the database
    - decrease storage space for the database
• Costs
    - may contribute to performance problems
    - may require more storage in some cases

Data Redundancy and Update Anomalies

• StaffBranch relation has redundant data; the details of a branch are repeated for every member of staff.
• In contrast, the branch information appears only once for each branch in the Branch relation and only the branch number (branchNo) is repeated in the Staff relation, to represent where each member of staff is located.

Lossless-join and Dependency Preservation Properties

• Two important properties of decomposition.
    - Lossless-join property enables us to find any instance of the original relation from corresponding instances in the smaller relations.
    - Dependency preservation property enables us to enforce a constraint on the original relation by enforcing some constraint on each of the smaller relations.
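The lossless-join property can be checked on a small instance in Python. This is a sketch with hypothetical StaffBranch values and hand-rolled project and natural_join helpers (neither comes from the slides). Splitting on branchNo, which determines bAddress, rebuilds the original exactly; a split where the shared attribute determines neither side produces spurious tuples.

```python
def row(**attrs):
    return frozenset(attrs.items())

def project(relation, attrs):
    """Projection onto attrs; duplicates vanish because the result is a set."""
    return {frozenset((a, dict(t)[a]) for a in attrs) for t in relation}

def natural_join(left, right):
    """Pair up tuples that agree on every attribute the two sides share."""
    joined = set()
    for lt in left:
        for rt in right:
            ld, rd = dict(lt), dict(rt)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                joined.add(frozenset({**ld, **rd}.items()))
    return joined

# Hypothetical instance of StaffBranch (values invented for illustration).
staff_branch = {
    row(staffNo="SL21", sName="John", branchNo="B005", bAddress="22 Deer Rd"),
    row(staffNo="SG37", sName="Ann", branchNo="B003", bAddress="163 Main St"),
    row(staffNo="SG14", sName="David", branchNo="B003", bAddress="163 Main St"),
}

# Lossless: the shared attribute branchNo determines bAddress.
staff = project(staff_branch, ("staffNo", "sName", "branchNo"))
branch = project(staff_branch, ("branchNo", "bAddress"))
print(natural_join(staff, branch) == staff_branch)

# Lossy: branchNo determines neither staffNo nor sName.
part_a = project(staff_branch, ("staffNo", "branchNo"))
part_b = project(staff_branch, ("sName", "branchNo", "bAddress"))
lossy = natural_join(part_a, part_b)
print(len(staff_branch), len(lossy))   # the join invents extra tuples
```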

Normal Forms

[Figure: progression of normal forms]
1. Unnormalized Relation (NF2)
2. First Normal Form (1NF): only atomic attributes
3. Second Normal Form (2NF): remove nonkey dependency
4. Third Normal Form (3NF): remove transitive dependency
5. Higher Order Forms (BCNF and beyond)
    - Dependency preservation: BCNF
    - Remove Multi-valued Dependencies: 4NF
    - Remove Join Dependencies: 5NF

The Basis of Normalization

• Functional Dependency (FD)
    - Consider two attributes, X and Y, and two arbitrary tuples r1 and r2 of a relation R.
• Y is functionally dependent on X iff:
    value of X in r1 = value of X in r2
    implies
    value of Y in r1 = value of Y in r2
• Also stated as: R.X → R.Y or X → Y
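The definition translates almost directly into a Python check over one relation instance. This is a sketch with hypothetical rows; keep in mind that an instance can refute an FD but cannot prove it, since an FD is a claim about every legal instance.

```python
def fd_holds(rows, x, y):
    """X -> Y on this instance: rows that agree on X must agree on Y."""
    seen = {}
    for row in rows:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        # Remember the first Y seen for each X; any later mismatch breaks the FD.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True

# Hypothetical rows (values invented for illustration).
rows = [
    {"staffNo": "SL21", "branchNo": "B005", "bAddress": "22 Deer Rd"},
    {"staffNo": "SG37", "branchNo": "B003", "bAddress": "163 Main St"},
    {"staffNo": "SG14", "branchNo": "B003", "bAddress": "163 Main St"},
]

print(fd_holds(rows, ["branchNo"], ["bAddress"]))   # consistent with branchNo -> bAddress
print(fd_holds(rows, ["branchNo"], ["staffNo"]))    # refuted: B003 has two staff members
```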

An Example Functional Dependency

[Figure]

Properties of FDs

• If R.X → R.Y or X → Y
    - X is called the determinant of Y.
    - X may or may not be the key attribute of R.
    - A FD changes with its semantic meaning
        • Name → Address?
    - X and Y may be composite
    - X and Y may be mutually dependent on each other
        • Husband → Wife, Wife → Husband
    - The same Y value may occur in multiple tuples
        • Course# → Text

Fully Functional Dependencies

• When is X → Y a FFD?
    - When Y is not functionally dependent on any proper subset of X
• X → Y is a fully functional dependency (FFD)
    ( SID, Course# ) → Name?     ( SID, Course# ) → Grade?
    ( SID, Name ) → Major?       ( SID, Name ) → SID?
• By default, the term FD refers to FFD
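A sketch of the FFD test in Python, again over a single hypothetical instance: X → Y is full when it holds and no non-empty proper subset of X already determines Y. The registration rows are invented for the illustration.

```python
from itertools import combinations

def fd_holds(rows, x, y):
    """X -> Y on this instance: rows that agree on X must agree on Y."""
    seen = {}
    for row in rows:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True

def is_full_fd(rows, x, y):
    """True if X -> Y holds and no non-empty proper subset of X determines Y."""
    if not fd_holds(rows, x, y):
        return False
    for size in range(1, len(x)):
        for subset in combinations(x, size):
            if fd_holds(rows, subset, y):
                return False  # Y depends on only part of X: a partial dependency
    return True

# Hypothetical registration rows.
rows = [
    {"SID": "S1", "Course": "C1", "Name": "Jose", "Grade": "A"},
    {"SID": "S1", "Course": "C2", "Name": "Jose", "Grade": "B"},
    {"SID": "S2", "Course": "C1", "Name": "Alice", "Grade": "B"},
    {"SID": "S2", "Course": "C2", "Name": "Alice", "Grade": "B"},
]

print(is_full_fd(rows, ("SID", "Course"), ("Grade",)))   # needs both SID and Course
print(is_full_fd(rows, ("SID", "Course"), ("Name",)))    # SID alone already gives Name
```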

Transitive Dependencies

• Given attributes X, Y, and Z of a relation R,
• Z is transitively dependent on X (X → Z) iff X → Y and Y → Z
• For example:
    SID → Dept, SID → Major, Dept → School, Major → Dept
• Do you see any Transitive Functional Dependencies?
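One way to explore the example is to compute an attribute closure: keep applying FDs until nothing new is added. The Python sketch below uses the four FDs listed above; the closure function and the (lhs, rhs) set representation are choices made for this illustration, not notation from the slides.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an FD once its whole left-hand side is already reachable.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [
    ({"SID"}, {"Dept"}),
    ({"SID"}, {"Major"}),
    ({"Dept"}, {"School"}),
    ({"Major"}, {"Dept"}),
]

print(sorted(closure({"SID"}, fds)))     # School is reached only through Dept
print(sorted(closure({"Major"}, fds)))   # Major reaches School through Dept
print(sorted(closure({"Dept"}, fds)))
```

School never appears on the right-hand side of an FD whose determinant is SID or Major, so its presence in those closures is purely transitive.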

Example Transitive Dependency

• Consider functional dependencies in the StaffBranch relation (see the Data Redundancy and Update Anomalies slide).

    staffNo → sName, position, salary, branchNo, bAddress
    branchNo → bAddress

• Transitive dependency: bAddress is transitively dependent on staffNo via branchNo, since staffNo → branchNo and branchNo → bAddress.

First Normal Form

• DEFINITION
    - A relation R is in first normal form (1NF) if and only if all underlying domains contain atomic values only.
• Translation
    - To be in first normal form the table must not contain any repeating attributes.
• Implication
    - Are all ‘relations’ in First Normal Form (1NF)?

Second Normal Form

• DEFINITION
    - A relation R is in second normal form (2NF) if and only if it is in 1NF and every nonkey attribute is dependent on the full primary key.
• Translation
    - A table is in second normal form if there are no partial dependencies.
• Implication
    - What kinds of primary keys may lead to a violation of the Second Normal Form (2NF)?

Third Normal Form

• DEFINITION
    - A relation R is in third normal form (3NF) if and only if it is in 2NF and every nonkey attribute is non-transitively dependent on the primary key.
• Translation
    - A table is in Third Normal Form if every non-key attribute is determined by the key, and nothing else.
• Implication
    - How many total attributes must the relation have for a possible violation of the Third Normal Form (3NF)?

The Normalization Process

1. Flatten the table completely (no composite columns), repeating values if necessary (1NF)
2. Find “all” FDs (well, as many as you can possibly detect). Ensure all FDs are “Full” FDs and expand transitive dependencies completely
3. Derive the Primary Key of the table using the FDs
4. Find Partial Dependencies and decompose the relation using them (2NF)
5. Find Transitive dependencies and decompose using them (3NF)
6. Remember: this is not a deterministic method. The result depends on the order in which FDs are chosen, so the same relation with the same set of FDs can lead to different decompositions!

Example - Identifying a set of functional dependencies for the StaffBranch relation

• With sufficient information available, identify the functional dependencies for the StaffBranch relation as:

    staffNo → sName, position, salary, branchNo, bAddress
    branchNo → bAddress
    bAddress → branchNo
    branchNo, position → salary
    bAddress, position → salary

Identifying the Primary Key for a Relation using Functional Dependencies

• The main purpose of identifying a set of functional dependencies for a relation is to specify the set of integrity constraints that must hold on the relation.
• An important integrity constraint to consider first is the identification of candidate keys, one of which is selected to be the primary key for the relation.

Example - Identify Primary Key for StaffBranch Relation

• StaffBranch relation has five functional dependencies (see the earlier example identifying the functional dependencies for StaffBranch).
• The determinants are staffNo, branchNo, bAddress, (branchNo, position), and (bAddress, position).
• To identify all candidate key(s), identify the attribute (or group of attributes) that uniquely identifies each tuple in this relation.

Example - Identifying Primary Key for StaffBranch Relation

• All attributes that are not part of a candidate key should be functionally dependent on the key.
• The only candidate key, and therefore the primary key, for the StaffBranch relation is staffNo, as all other attributes of the relation are functionally dependent on staffNo.
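The same conclusion can be reached mechanically: a set of attributes is a candidate key when its closure under the FDs covers the whole relation and no smaller key sits inside it. The Python sketch below encodes the five StaffBranch FDs; the closure and candidate_keys helpers are written for this illustration, and the brute-force search is only practical for small relations.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes determined by attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            if any(k <= set(combo) for k in keys):
                continue  # already contains a smaller key, so not minimal
            if closure(combo, fds) == set(attributes):
                keys.append(set(combo))
    return keys

attributes = {"staffNo", "sName", "position", "salary", "branchNo", "bAddress"}
fds = [
    ({"staffNo"}, {"sName", "position", "salary", "branchNo", "bAddress"}),
    ({"branchNo"}, {"bAddress"}),
    ({"bAddress"}, {"branchNo"}),
    ({"branchNo", "position"}, {"salary"}),
    ({"bAddress", "position"}, {"salary"}),
]

print(candidate_keys(attributes, fds))
```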

The Process of Normalization

[Figure]

In-class Exercise – Normalize this:

The Normalization Tree

Pno, Pname, Eno, Ename, Jclass, Chghr, Hrsbilled, Leader
    split using Eno → Ename, Jclass, Chghr
    Eno, Ename, Jclass, Chghr
        split using Jclass → Chghr
        Eno, Ename, Jclass*
        Jclass, Chghr
    Pno, Pname, Eno*, Hrsbilled, Leader
        split using Pno → Pname, Leader
        Pno*, Eno*, Hrsbilled
        Pno, Pname, Leader
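The three FDs in the tree can be sorted into partial and transitive dependencies with a short Python sketch. It assumes the primary key of the flat table is (Pno, Eno), with Hrsbilled depending on the whole key; the tree suggests this but does not state it. The classify helper is a simplification for this example: a determinant that is a proper part of the key is reported as partial, and one made up only of nonkey attributes as transitive.

```python
# Assumption: the primary key of the flat table is (Pno, Eno).
KEY = {"Pno", "Eno"}

# Determinant -> dependent attributes, as given in the tree.
fds = [
    ({"Eno"}, {"Ename", "Jclass", "Chghr"}),
    ({"Pno"}, {"Pname", "Leader"}),
    ({"Jclass"}, {"Chghr"}),
]

def classify(lhs, key=KEY):
    """Label an FD by how its determinant relates to the primary key."""
    if lhs == key:
        return "full"
    if lhs < key:
        return "partial"      # part of the key: removed when moving to 2NF
    if lhs.isdisjoint(key):
        return "transitive"   # nonkey determinant: removed when moving to 3NF
    return "mixed"

for lhs, rhs in fds:
    print(sorted(lhs), "->", sorted(rhs), ":", classify(lhs))
```

The two partial dependencies account for the two splits on Eno and Pno in the tree, and the transitive one for the final split on Jclass.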

Boyce-Codd Normal Form (BCNF)

• Update anomalies occur in a 3NF relation R if
    - R has multiple candidate keys,
    - Those candidate keys are composite, and
    - The candidate keys are overlapped.

    Computer-Lab (SID, Account, Class, Hours)

• A relation R is in BCNF iff every determinant is a candidate key.
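A BCNF test can be sketched in Python with the StaffBranch FDs from earlier in the deck. The sketch checks that each determinant is a superkey (its closure covers the relation); for full FDs that is the same as being a candidate key. The helper names and the two-attribute Branch relation used for contrast are choices made for this illustration.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Non-trivial FDs whose determinant does not determine the whole relation."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != attributes
    ]

staff_branch_attrs = {"staffNo", "sName", "position", "salary", "branchNo", "bAddress"}
staff_branch_fds = [
    ({"staffNo"}, {"sName", "position", "salary", "branchNo", "bAddress"}),
    ({"branchNo"}, {"bAddress"}),
    ({"bAddress"}, {"branchNo"}),
    ({"branchNo", "position"}, {"salary"}),
    ({"bAddress", "position"}, {"salary"}),
]

# A two-attribute Branch relation, as produced by decomposing on branchNo.
branch_attrs = {"branchNo", "bAddress"}
branch_fds = [({"branchNo"}, {"bAddress"}), ({"bAddress"}, {"branchNo"})]

print(len(bcnf_violations(staff_branch_attrs, staff_branch_fds)))   # StaffBranch is not in BCNF
print(bcnf_violations(branch_attrs, branch_fds))                    # Branch has no violations
```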

Higher Normal Forms

• Fourth Normal Form
    - Multivalued Dependencies (Fagin 1977)
• Fifth Normal Form
    - Join Dependencies (Fagin 1979)
• Other Dependencies
    - Inclusion Dependencies (Casanova 1981)
    - Template Dependencies (Sadri 1982)
    - Domain-Key Normal Form (Fagin 1981)