ISOM
MIS415 Module 1b: Relational Model and Normalization
Arijit Sengupta
Structure of this quarter
MIS415
Stages: 0. Intro, 1. Design, 2. Querying, 3. Applications, 4. Advanced Topics
Topics: Database Fundamentals, Relational Model, Normalization, Conceptual Modeling, Query Languages, Advanced SQL, Transaction Management, Java DB Applications (JDBC), Data Mining
Audiences: Newbie, Users, Designers, Developers, Professionals
Today’s Buzzwords
• Relational Model
• Superkey, Candidate Key, Primary Key and Foreign Key
• Entity Integrity Rule
• Referential Integrity Rule
• Normalization
• First, Second, Third, and Boyce-Codd Normal Forms
• Unnormalization
Objectives of this lecture
• Understand the Relational Model and its properties
• Understand the notion of keys
• Understand the use and importance of referential integrity
• Provide an alternative way to design relations using semantics rather than concepts
• Take an existing “flat file” design and create a relational design from it through the process of Normalization
• Identify sources of problems (or anomalies) within a given relational design
• Argue about improvements to designs created by others
Relational Data Model
• Originally proposed by Codd in 1970
• Based on mathematical set theory
Relation (header row: Attribute Names; columns: Attributes; rows: Tuples; cells: Attribute Values)
ID  Name   Age  Address      GPA
S1  Jose   21   Stoned Hill  3.1
S2  Alice  18   BigHead      3.2
S3  Lin    32   Done-Audy    2.9
S4  Joyce  20   Atlanta      3.7
S5  Sunil  27   Mare-iota    3.2
Relation: Properties
• A relation is a set of tuples
• A tuple is a set of attribute-value pairs
  - Ordering of attributes is immaterial
  - Ordering of tuples is immaterial
• Tuples are distinct from one another
• Attributes contain atomic values only
Counterexample (Name and Address are not atomic):
Emp#  Name                   Address
E1    'Jose', 'M.', 'Smith'  '3413 Main Street', 'Atlanta', 'GA'
Attributes
• Attribute name
  - Attribute names are unique within a relation
• Attribute domain
  - Set of all possible values an attribute may take
  - Domain (GPA) =   Domain (name) =   Domain (DateOfBirth) =   Domain (year)
• Number of attributes: degree of the relation
Tuples
• Aggregation of attribute values
  S1 = (s1, ‘Jose’, 21, ‘Stoned Hill’, 3.1)
  S2 = (s2, ‘Alice’, 18, ‘BigHead’, 3.2)
• Cardinality: Number of tuples in a relation
• What is the difference between the cardinality and the degree?
ID  Name   Age  Address      GPA
S1  Jose   21   Stoned Hill  3.1
S2  Alice  18   BigHead      3.2
S3  Lin    32   Done-Audy    2.9
S4  Joyce  20   Atlanta      3.7
S5  Sunil  27   Mare-iota    3.2
Primary Keys
• Superkey: SK, a subset of attributes of R, satisfying Uniqueness, that is, no two tuples have the same combination of values for these attributes
• Candidate Key: K, a superkey SK, satisfying minimality, that is, no component of K can be eliminated without destroying the uniqueness property.
• Primary Key: PK, the selected Candidate key, K.
• Can a primary key be composed of multiple attributes?
• Can a relation have multiple primary keys?
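The uniqueness and minimality tests can be sketched in Python against the five student tuples from the example relation. The function names are illustrative only, and checking a single instance can only show that a set of attributes is consistent with being a key; a real key is a statement about every legal instance of the relation.

```python
from itertools import combinations

# Student tuples from the example relation (ID, Name, Age, Address, GPA).
ATTRS = ("ID", "Name", "Age", "Address", "GPA")
STUDENTS = [
    ("S1", "Jose", 21, "Stoned Hill", 3.1),
    ("S2", "Alice", 18, "BigHead", 3.2),
    ("S3", "Lin", 32, "Done-Audy", 2.9),
    ("S4", "Joyce", 20, "Atlanta", 3.7),
    ("S5", "Sunil", 27, "Mare-iota", 3.2),
]

def is_superkey(attrs, rows=STUDENTS):
    """Uniqueness: no two tuples agree on every attribute in attrs."""
    idx = [ATTRS.index(a) for a in attrs]
    projected = [tuple(row[i] for i in idx) for row in rows]
    return len(set(projected)) == len(projected)

def is_candidate_key(attrs, rows=STUDENTS):
    """Minimality: a superkey with no proper non-empty subset that is a superkey."""
    if not is_superkey(attrs, rows):
        return False
    return not any(
        is_superkey(subset, rows)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

print(is_superkey(["ID", "Name"]))       # unique, but not minimal
print(is_candidate_key(["ID", "Name"]))
print(is_candidate_key(["ID"]))
print(is_superkey(["GPA"]))              # the GPA value 3.2 appears twice
```

Note that Name alone is also unique in these five rows, which is exactly why a key must be chosen from the meaning of the data rather than from one sample.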
Keys - example
• Superkeys?
• Candidate keys?
• Primary key?
Disk: (ISBN#, Artist_name, Album_name, Year, Producer, Genre, time, price)
Entity Integrity Rule
• The primary key of a base relation cannot contain a NULL value.
• Enforcement of the rule:
  - An update which results in a NULL value in the primary key must be rejected.
• Are the following ok?
Course  Section  Meets  Enrolled
201     1        MW     20
201     NULL     TTh    25
NULL    NULL     MWF    18
[Figure label: Primary Key]
Foreign Key
Physician (ID, Name, …) Patient (ID, Name, PhysID*, …)
Club (ID, Name, …) Player (ID, Name, ?*, …)
Order (OrdID, Date, …, ?*) Customer (ID, Name, …, ?*)
Dept (DeptID, Name, …, ?*) Employee (EID, Name, …, ?*)
• Attribute(s) of one relation that reference(s) the PK of another relation
• FK may or may not be (a part of) the PK of this relation
Course (CourseID, Name, …, ?*) Class (ClassID, Meets, …, ?*)
Student (SID, Name, …, ?*) Registration (?)
• Can an FK refer to a part of the PK of another relation?
• Can an FK refer to a PK of the same relation?
Foreign Key ..
• FK and referenced PK may have different names
• The values of FK must draw from the value set of PK
• How do we define the Domain of an FK?
• Can an FK have a NULL value?
• What can we enforce with PKs and FKs?
[Figure: Domain and Value Set of a Primary Key and a Foreign Key]
Referential Integrity Rule
• If FK is the foreign key of a relation R2, which matches the primary key PK of the relation R1, then:
  - the FK value must match the PK value in some tuple of R1, or
  - the FK value may be NULL, but only if the FK is not (a part of) the PK of R2.
• Enforcement of the Rule
  - An update on either a referenced PK or an FK must satisfy the rule. Otherwise, the operation is rejected.
• Which operation on the primary key may violate this rule?
• Which operation on the foreign key may violate this rule?
Referential Integrity Enforcement
• If an operation violates referential integrity:
  - Restrict: reject the operation
  - Cascade: try to propagate the operation to all dependent FK values; if that is not possible, reject the operation
  - Nullify (or Default): set all dependent FK values to NULL (or a default value); if that is not possible, reject the operation
• Cases for each of the above situations?
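As one possible illustration of Restrict and Nullify, the sketch below uses Python's built-in `sqlite3` module as a stand-in for the course database. The `dept` and `employee` tables loosely follow the Dept/Employee pair from the Foreign Key slide; the `project` table, the column names, and all row values are assumptions made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE dept (deptid INTEGER PRIMARY KEY, name TEXT);
    -- Nullify: employees survive their department, with the FK set to NULL
    CREATE TABLE employee (
        eid INTEGER PRIMARY KEY,
        name TEXT,
        deptid INTEGER REFERENCES dept(deptid) ON DELETE SET NULL
    );
    -- Restrict: a department with projects cannot be deleted (hypothetical table)
    CREATE TABLE project (
        pid INTEGER PRIMARY KEY,
        deptid INTEGER NOT NULL REFERENCES dept(deptid) ON DELETE RESTRICT
    );
    INSERT INTO dept VALUES (10, 'Sales'), (20, 'Research');
    INSERT INTO employee VALUES (1, 'Ann', 10);
    INSERT INTO project VALUES (100, 20);
""")

# Nullify: deleting dept 10 sets the dependent employee FK to NULL.
conn.execute("DELETE FROM dept WHERE deptid = 10")
emp_dept = conn.execute("SELECT deptid FROM employee WHERE eid = 1").fetchone()[0]

# Restrict: deleting dept 20 is rejected while a project still references it.
try:
    conn.execute("DELETE FROM dept WHERE deptid = 20")
    restrict_rejected = False
except sqlite3.IntegrityError:
    restrict_rejected = True

remaining = [row[0] for row in conn.execute("SELECT deptid FROM dept ORDER BY deptid")]
print(emp_dept, restrict_rejected, remaining)
```

The action is chosen per foreign key, so the same parent table can be protected by one child relation and freely deletable with respect to another.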
Creating Relations
create table STUDENT (
    ID char(11) not null primary key,
    Name char(30) not null,
    age int,
    GPA number(2,1)
);
create table COURSE (
    courseno char(6) not null primary key,
    coursename char(30) not null,
    credithours number(2,1)
);
create table REGISTRATION (
    ID references STUDENT (ID) on delete cascade,
    CourseNum references COURSE (courseno),
    primary key (ID, CourseNum)
);
Try it!
/* Create the computer table */
Create table test_computer(cid int not null, make varchar2(20),
model varchar2(20), primary key(cid));
/* Create the part table */
Create table test_part(pid int not null, ptype varchar2(20),
pdesc varchar2(20), comp int, primary key(pid),
foreign key(comp) references test_computer(cid) on delete cascade);
/* Insert values into computer */
Insert into test_computer values (111, 'Dell', 'Inspiron');
Insert into test_computer values (222, 'HP', 'TC4400');
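/* Insert values into part */
/* test_part has four columns; the comp values below are assumed, since the original statements supplied only three values */
Insert into test_part values (1111, 'Memory', '512MB', 111);
Insert into test_part values (2222, 'CD', 'Multimedia drive', 222);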
What happens when you delete computer 111?
What happens when you delete computer 222?
What happens when you delete part 2222?
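One way to try the exercise without an Oracle account is the Python sketch below, which replays the statements in an in-memory SQLite database. It is a stand-in rather than the course environment: `varchar2` is written as `VARCHAR`, foreign keys have to be switched on explicitly, and the `comp` values linking part 1111 to computer 111 and part 2222 to computer 222 are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE test_computer (cid INT NOT NULL, make VARCHAR(20),
        model VARCHAR(20), PRIMARY KEY (cid));
    CREATE TABLE test_part (pid INT NOT NULL, ptype VARCHAR(20),
        pdesc VARCHAR(20), comp INT, PRIMARY KEY (pid),
        FOREIGN KEY (comp) REFERENCES test_computer (cid) ON DELETE CASCADE);
    INSERT INTO test_computer VALUES (111, 'Dell', 'Inspiron');
    INSERT INTO test_computer VALUES (222, 'HP', 'TC4400');
    -- comp values (last column) are assumed for illustration
    INSERT INTO test_part VALUES (1111, 'Memory', '512MB', 111);
    INSERT INTO test_part VALUES (2222, 'CD', 'Multimedia drive', 222);
""")

def part_ids():
    return [row[0] for row in conn.execute("SELECT pid FROM test_part ORDER BY pid")]

def computer_ids():
    return [row[0] for row in conn.execute("SELECT cid FROM test_computer ORDER BY cid")]

# Deleting a referenced computer cascades to the parts that point at it.
conn.execute("DELETE FROM test_computer WHERE cid = 111")
parts_after_computer_delete = part_ids()

# Deleting a part (the referencing side) never touches the computer table.
conn.execute("DELETE FROM test_part WHERE pid = 2222")
computers_after_part_delete = computer_ids()

print(parts_after_computer_delete, computers_after_part_delete)
```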
Purpose of Normalization
• The benefit of using a database that has a suitable set of relations is that the database will:
  - be easier for the user to access and maintain;
  - take up minimal storage space on the computer.
© Pearson Education Limited 1995, 2005
Why Normalization?
• Poor Relation Design causes Anomalies
  - Insertion anomalies: Insertion of some piece of information cannot be performed unless other irrelevant information is added to it.
  - Update anomalies: Update of a single piece of information requires updates to multiple tuples.
  - Deletion anomalies: Deletion of a piece of information removes other unrelated but necessary information.
• Normalization improves the design to remove these anomalies
Why Normalization?
• Benefits
  - contain minimum amount of redundancy
  - allow users to insert, delete and modify tuples in the relation without errors or inconsistencies
  - improve quality of information in the database
  - decrease storage space for the database
• Costs
  - may contribute to performance problems
  - may require more storage in some cases
Data Redundancy and Update Anomalies
• StaffBranch relation has redundant data; the details of a branch are repeated for every member of staff.
• In contrast, the branch information appears only once for each branch in the Branch relation and only the branch number (branchNo) is repeated in the Staff relation, to represent where each member of staff is located.
Lossless-join and Dependency Preservation Properties
• Two important properties of decomposition.
  - Lossless-join property enables us to find any instance of the original relation from corresponding instances in the smaller relations.
  - Dependency preservation property enables us to enforce a constraint on the original relation by enforcing some constraint on each of the smaller relations.
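The lossless-join property can be checked on a concrete instance by projecting a relation onto two smaller schemas and joining the projections back. The Python sketch below does this for a cut-down StaffBranch relation; the four attributes kept, the three rows, and the helper names are all assumptions chosen for illustration.

```python
# Hypothetical StaffBranch rows: (staffNo, sName, branchNo, bAddress)
ATTRS = ("staffNo", "sName", "branchNo", "bAddress")
ROWS = {
    ("S1", "Ann", "B1", "Main St"),
    ("S2", "Bob", "B1", "Main St"),
    ("S3", "Ann", "B2", "Oak Ave"),
}

def project(rows, attrs, keep):
    """Project rows (tuples laid out as attrs) onto the attributes in keep."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(r[i] for i in idx) for r in rows}

def natural_join(rows1, attrs1, rows2, attrs2):
    """Join on all shared attribute names; returns (rows, attribute order)."""
    common = [a for a in attrs1 if a in attrs2]
    extra = [a for a in attrs2 if a not in attrs1]
    joined = set()
    for r1 in rows1:
        for r2 in rows2:
            if all(r1[attrs1.index(a)] == r2[attrs2.index(a)] for a in common):
                joined.add(r1 + tuple(r2[attrs2.index(a)] for a in extra))
    return joined, tuple(attrs1) + tuple(extra)

def rejoin(keep1, keep2):
    """Decompose ROWS into two projections and join them back together."""
    joined, out_attrs = natural_join(project(ROWS, ATTRS, keep1), keep1,
                                     project(ROWS, ATTRS, keep2), keep2)
    return project(joined, out_attrs, ATTRS)

# Splitting on branchNo (which determines bAddress) recovers the original rows.
lossless = rejoin(("staffNo", "sName", "branchNo"), ("branchNo", "bAddress"))
# Splitting on sName (which determines nothing) produces spurious tuples.
lossy = rejoin(("staffNo", "sName"), ("sName", "branchNo", "bAddress"))

print(lossless == ROWS, len(lossy) - len(ROWS))
```

A lossy join never drops tuples; it adds ones that were not in the original relation, so the information lost is the knowledge of which combinations were real.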
Normal Forms
Unnormalized Relation (NF2)
  | Only atomic attributes
First Normal Form (1NF)
  | Remove nonkey dependency
Second Normal Form (2NF)
  | Remove transitive dependency
Third Normal Form (3NF)
  |
Higher Order Forms
  - Dependency preservation: BCNF
  - Remove Multi-valued Dependencies: 4NF
  - Remove Join Dependencies: 5NF
The Basis of Normalization
• Functional Dependency (FD)
  - Consider two attributes, X and Y, and two arbitrary tuples r1 and r2 of a relation R.
• Y is functionally dependent on X iff:
    value of X in r1 = value of X in r2
  implies
    value of Y in r1 = value of Y in r2
• Also stated as: R.X → R.Y or X → Y
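The definition translates almost word for word into a check over pairs of tuples. The Python sketch below tests it against the student rows used earlier; `holds` is an illustrative name, and a dependency that holds in one instance is only a candidate FD, because a real FD is a rule about every legal instance.

```python
# Three attributes of the example student relation, as dictionaries.
STUDENTS = [
    {"ID": "S1", "Name": "Jose", "GPA": 3.1},
    {"ID": "S2", "Name": "Alice", "GPA": 3.2},
    {"ID": "S3", "Name": "Lin", "GPA": 2.9},
    {"ID": "S4", "Name": "Joyce", "GPA": 3.7},
    {"ID": "S5", "Name": "Sunil", "GPA": 3.2},
]

def holds(rows, x, y):
    """True if X -> Y holds in rows: equal X values always imply equal Y values."""
    seen = {}  # X value -> the Y value first observed with it
    for row in rows:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        if seen.setdefault(x_val, y_val) != y_val:
            return False  # two tuples agree on X but differ on Y
    return True

print(holds(STUDENTS, ["ID"], ["Name"]))   # every ID appears once
print(holds(STUDENTS, ["GPA"], ["Name"]))  # GPA 3.2 maps to Alice and to Sunil
```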
An Example Functional Dependency
Properties of FDs
• If R.X → R.Y or X → Y
  - X is called the determinant of Y.
  - X may or may not be the key attribute of R.
  - An FD changes with its semantic meaning
    • Name → Address?
  - X and Y may be composite
  - X and Y may be mutually dependent on each other
    • Husband → Wife, Wife → Husband
  - The same Y value may occur in multiple tuples
    • Course# → Text
Fully Functional Dependencies
• When is X → Y a FFD?
  - When Y is not functionally dependent on any proper subset of X
• X → Y is a fully functional dependency (FFD)
  - ( SID, Course# ) → Name?
  - ( SID, Course# ) → Grade?
  - ( SID, Name ) → Major?
  - ( SID, Name ) → SID?
• By default, the term FD refers to FFD
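The first two questions can be explored with a small Python sketch that tests whether any proper subset of the determinant already determines the dependent attribute. The registration rows are invented for illustration, `Course` stands in for Course#, and the check is instance-based, so it can refute a full dependency but cannot prove one in general.

```python
from itertools import combinations

# Hypothetical registration data; two students each take two courses.
ROWS = [
    {"SID": "S1", "Course": "C1", "Name": "Jose", "Grade": "A"},
    {"SID": "S1", "Course": "C2", "Name": "Jose", "Grade": "B"},
    {"SID": "S2", "Course": "C1", "Name": "Alice", "Grade": "B"},
    {"SID": "S2", "Course": "C2", "Name": "Alice", "Grade": "A"},
]

def holds(rows, x, y):
    """True if X -> Y holds in rows."""
    seen = {}
    for row in rows:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True

def is_full_fd(rows, x, y):
    """X -> Y holds, and no proper non-empty subset of X determines Y."""
    if not holds(rows, x, y):
        return False
    return not any(
        holds(rows, subset, y)
        for size in range(1, len(x))
        for subset in combinations(x, size)
    )

print(is_full_fd(ROWS, ("SID", "Course"), ("Grade",)))  # needs both attributes
print(is_full_fd(ROWS, ("SID", "Course"), ("Name",)))   # SID alone is enough
```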
Transitive Dependencies
• Given attributes X, Y, and Z of a relation R,
• Z is transitively dependent on X (X → Z) iff X → Y and Y → Z
• For example: SID → Dept, SID → Major, Dept → School, Major → Dept
• Do you see any Transitive Functional Dependencies?
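One way to answer the question mechanically is to chain the four example FDs pairwise, as in the Python sketch below. It handles single-attribute FDs only, and it adds the usual textbook side condition that Y must not determine X back; both the function name and that extra condition are assumptions beyond what the slide states.

```python
# The four FDs from the example, written as (determinant, dependent) pairs.
FDS = [("SID", "Dept"), ("SID", "Major"), ("Dept", "School"), ("Major", "Dept")]

def transitive_chains(fds):
    """Return every (X, Y, Z) such that X -> Y and Y -> Z, with Z different from X."""
    chains = []
    for x, y in fds:
        for y2, z in fds:
            # Skip chains where Y -> X also holds: X and Y are then equivalent.
            if y == y2 and z != x and (y, x) not in fds:
                chains.append((x, y, z))
    return sorted(chains)

chains = transitive_chains(FDS)
for x, y, z in chains:
    print(f"{x} -> {z} via {y}")
```

Each chain found is a dependency that third normal form would later ask to be moved into its own relation.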
Example Transitive Dependency
• Consider functional dependencies in the StaffBranch relation (see Slide 19).
staffNo → sName, position, salary, branchNo, bAddress
branchNo → bAddress
• Transitive dependency, branchNo → bAddress exists on staffNo via branchNo.
First Normal Form
• DEFINITION
  - A relation R is in first normal form (1NF) if and only if all underlying domains contain atomic values only.
• Translation
  - To be in first normal form the table must not contain any repeating attributes.
• Implication
  - Are all ‘relations’ in First Normal Form (1NF) ?
Second Normal Form
• DEFINITION
  - A relation R is in second normal form (2NF) if and only if it is in 1NF and every nonkey attribute is dependent on the full primary key.
• Translation
  - A table is in second normal form if there are no partial dependencies.
• Implication
  - What kinds of primary keys may lead to a violation of the Second Normal Form (2NF) ?
Third Normal Form
• DEFINITION
  - A relation R is in third normal form (3NF) if and only if it is in 2NF and every nonkey attribute is non-transitively dependent on the primary key.
• Translation
  - A table is in Third Normal Form if every non-key attribute is determined by the key, and nothing else.
• Implication
  - How many total attributes must the relation have for a possible violation of the Third Normal Form (3NF) ?
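The 2NF and 3NF definitions can be read as a rule for sorting each FD by where its determinant sits relative to the primary key. The Python sketch below applies that rule to a hypothetical Registration relation with key (SID, Course); the relation, its FDs, and the labels are assumptions, and the rule is deliberately simplified: it presumes a single candidate key and nonkey attributes on the right-hand side.

```python
# Hypothetical relation: Registration(SID, Course, Grade, Name, Major, Dept)
KEY = frozenset({"SID", "Course"})
FDS = [
    (frozenset({"SID", "Course"}), "Grade"),
    (frozenset({"SID"}), "Name"),
    (frozenset({"SID"}), "Major"),
    (frozenset({"Major"}), "Dept"),
]

def classify(lhs, key=KEY):
    """Label an FD by comparing its determinant with the primary key."""
    if lhs >= key:
        return "full key"                           # allowed in 2NF and 3NF
    if lhs < key:
        return "partial (violates 2NF)"             # only part of the key is needed
    return "nonkey determinant (violates 3NF)"      # source of a transitive dependency

report = {(tuple(sorted(lhs)), rhs): classify(lhs) for lhs, rhs in FDS}
for (lhs, rhs), label in report.items():
    print(", ".join(lhs), "->", rhs, ":", label)
```

Under these assumptions a relation with a single-attribute key can never produce the "partial" label, which is one way to approach the 2NF implication question.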
The Normalization Process
1. Flatten the table completely (no composite columns), repeating values if necessary (1NF)
2. Find “all” FDs (well, as many as you can possibly detect). Ensure all FDs are “Full” FDs and expand transitive dependencies completely
3. Derive the Primary Key of the table using the FDs
4. Find Partial Dependencies and decompose relation using them (2NF)
5. Find Transitive dependencies and decompose using them (3NF)
6. Remember – this is not a deterministic method – depends on the order in which FDs are chosen, so same Relation, same set of FDs can lead to different decompositions!
Example - Identifying a set of functional dependencies for the StaffBranch relation
• With sufficient information available, identify the functional dependencies for the StaffBranch relation as:
staffNo → sName, position, salary, branchNo, bAddress
branchNo → bAddress
bAddress → branchNo
branchNo, position → salary
bAddress, position → salary
Identifying the Primary Key for a Relation using Functional Dependencies
• Main purpose of identifying a set of functional dependencies for a relation is to specify the set of integrity constraints that must hold on a relation.
• An important integrity constraint to consider first is the identification of candidate keys, one of which is selected to be the primary key for the relation.
Example - Identify Primary Key for StaffBranch Relation
• StaffBranch relation has five functional dependencies (see Slide 31).
• The determinants are staffNo, branchNo, bAddress, (branchNo, position), and (bAddress, position).
• To identify all candidate key(s), identify the attribute (or group of attributes) that uniquely identifies each tuple in this relation.
Example - Identifying Primary Key for StaffBranch Relation
• All attributes that are not part of a candidate key should be functionally dependent on the key.
• The only candidate key, and therefore the primary key, for the StaffBranch relation is staffNo, as all other attributes of the relation are functionally dependent on staffNo.
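This conclusion can be reproduced with the standard attribute-closure technique: a set of attributes is a candidate key when its closure under the FDs is the whole relation and no smaller key is contained in it. The Python sketch below applies a brute-force search to the five StaffBranch dependencies; the function names are illustrative, and the exhaustive search is only practical for small relations.

```python
from itertools import combinations

ATTRS = frozenset({"staffNo", "sName", "position", "salary", "branchNo", "bAddress"})
# The five functional dependencies identified for StaffBranch.
FDS = [
    ({"staffNo"}, {"sName", "position", "salary", "branchNo", "bAddress"}),
    ({"branchNo"}, {"bAddress"}),
    ({"bAddress"}, {"branchNo"}),
    ({"branchNo", "position"}, {"salary"}),
    ({"bAddress", "position"}, {"salary"}),
]

def closure(attrs, fds=FDS):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attrs=ATTRS, fds=FDS):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attrs) + 1):          # smallest sets first
        for combo in combinations(sorted(attrs), size):
            combo = set(combo)
            if any(key <= combo for key in keys):
                continue                            # contains a key, so not minimal
            if closure(combo, fds) == attrs:
                keys.append(combo)
    return keys

print(candidate_keys())
print(sorted(closure({"branchNo", "position"})))
```

No dependency has staffNo on its right-hand side, so every key must contain it, and staffNo on its own already determines everything else.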
The Process of Normalization
In-class Exercise – Normalize this:
The Normalization Tree
Pno, Pname, Eno, Ename, Jclass, Chghr, Hrsbilled, Leader
  split on Eno → Ename, Jclass, Chghr
  - Eno, Ename, Jclass, Chghr
      split on Jclass → Chghr
      - Eno, Ename, Jclass*
      - Jclass, Chghr
  - Pno, Pname, Eno*, Hrsbilled, Leader
      split on Pno → Pname, Leader
      - Pno*, Eno*, Hrsbilled
      - Pno, Pname, Leader
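Each split in the tree follows the same pattern: the attributes of the chosen FD move to a new relation, and the determinant stays behind as a foreign key. The Python sketch below replays the three splits with a small helper; `decompose` is an illustrative name, and the order in which the FDs are applied is one of several possible choices.

```python
def decompose(relation, lhs, rhs):
    """Split relation on the FD lhs -> rhs.

    Returns (new relation holding lhs + rhs, original relation without rhs).
    """
    lhs, rhs = list(lhs), list(rhs)
    split_off = lhs + [a for a in rhs if a not in lhs]
    remainder = [a for a in relation if a not in rhs]
    return split_off, remainder

R = ["Pno", "Pname", "Eno", "Ename", "Jclass", "Chghr", "Hrsbilled", "Leader"]

# Partial dependencies on the key (Pno, Eno) are removed first (2NF).
employee, rest = decompose(R, ["Eno"], ["Ename", "Jclass", "Chghr"])
project, hours = decompose(rest, ["Pno"], ["Pname", "Leader"])
# Then the transitive dependency inside the employee relation (3NF).
jobclass, employee = decompose(employee, ["Jclass"], ["Chghr"])

for name, rel in [("hours", hours), ("project", project),
                  ("employee", employee), ("jobclass", jobclass)]:
    print(name, rel)
```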
Boyce-Codd Normal Form (BCNF)
• Update anomalies occur in a 3NF relation R if
  - R has multiple candidate keys,
  - those candidate keys are composite, and
  - the candidate keys overlap.
Computer-Lab (SID, Account, Class, Hours)
• A relation R is in BCNF iff every determinant is a candidate key.
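The BCNF condition can be tested with attribute closures: a nontrivial FD violates it when the closure of its determinant does not cover the whole relation. Since the FDs of Computer-Lab are not listed here, the Python sketch below reuses the StaffBranch dependencies from the earlier example and compares them with a Branch relation split off from it; the function names and the Branch schema are assumptions.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attrs, fds):
    """Determinants of nontrivial FDs that do not determine the whole relation."""
    return [
        tuple(sorted(lhs))
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= attrs
    ]

STAFF_BRANCH = {"staffNo", "sName", "position", "salary", "branchNo", "bAddress"}
STAFF_BRANCH_FDS = [
    ({"staffNo"}, {"sName", "position", "salary", "branchNo", "bAddress"}),
    ({"branchNo"}, {"bAddress"}),
    ({"bAddress"}, {"branchNo"}),
    ({"branchNo", "position"}, {"salary"}),
    ({"bAddress", "position"}, {"salary"}),
]

# Hypothetical relation obtained by splitting branch details out of StaffBranch.
BRANCH = {"branchNo", "bAddress"}
BRANCH_FDS = [({"branchNo"}, {"bAddress"}), ({"bAddress"}, {"branchNo"})]

staff_branch_violations = bcnf_violations(STAFF_BRANCH, STAFF_BRANCH_FDS)
branch_violations = bcnf_violations(BRANCH, BRANCH_FDS)
print(staff_branch_violations)
print(branch_violations)
```

In Branch both determinants are candidate keys, so the list of violations is empty even though the relation has two FDs.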
Higher Normal Forms
• Fourth Normal Form
  - Multivalued Dependencies (Fagin 1977)
• Fifth Normal Form
  - Join Dependencies (Fagin 1979)
• Other Dependencies
  - Inclusion Dependencies (Casanova 1981)
  - Template Dependencies (Sadri 1982)
  - Domain-Key Normal Form (Fagin 1981)