6th database design
7/30/2019 6th Database Design
Database Design
Requirements Analysis: user needs; what must the database do?
Conceptual Design: high-level description (often done with the ER model)
Logical Design: translate the ER model into the DBMS data model (relational model)
Schema Refinement (NOW): consistency, normalization
Physical Design: indexes, disk layout
Security Design: who accesses what
Good Database Design
no redundancy of FACT (!)
no inconsistency
no insertion, deletion or update anomalies
no information loss
no dependency loss
Informal Design Guidelines for Relational Databases
1. Semantics of the Relation Attributes
2. Redundant Information in Tuples and Update Anomalies
3. Null Values in Tuples
4. Spurious Tuples
1: Semantics of the Relation Attributes
GUIDELINE 1: Informally, each tuple in a relation should represent one entity or relationship instance. (Applies to individual relations and their attributes.)
o Attributes of different entities (EMPLOYEEs, DEPARTMENTs,
PROJECTs) should not be mixed in the same relation
o Only foreign keys should be used to refer to other entities
o Entity and relationship attributes should be kept apart as much as possible.
Design a schema that can be explained easily relation by relation. The semantics of
attributes should be easy to interpret.
2: Redundant Information in Tuples and Update Anomalies
Information is stored redundantly
o Wastes storage
o Causes problems with update anomalies
   Insertion anomalies
   Deletion anomalies
   Modification anomalies
Consider the relation:
EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
Insertion anomalies
Cannot insert a project unless an employee is assigned to it.
Deletion anomalies
a. When a project is deleted, it will result in deleting all the employees who
work on that project.
b. Alternately, if an employee is the sole employee on a project, deleting that
employee would result in deleting the corresponding project.
Modification anomalies
Changing the name of project number P1 from Billing to Customer-Accounting
may cause this update to be made for all 100 employees working on project P1.
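The modification and deletion anomalies can be made concrete with a small sketch. Only the EMP_PROJ schema comes from the notes; the sample rows and the helper name rename_project are hypothetical.

```python
# EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours) held as a list of dicts.
# All data values below are made up for illustration.
emp_proj = [
    {"emp": "E1", "proj": "P1", "ename": "Smith", "pname": "Billing", "hours": 10},
    {"emp": "E2", "proj": "P1", "ename": "Wong", "pname": "Billing", "hours": 20},
    {"emp": "E1", "proj": "P2", "ename": "Smith", "pname": "Payroll", "hours": 5},
]


def rename_project(rows, proj, new_name):
    """Rename a project in place; return how many tuples had to be touched."""
    touched = 0
    for row in rows:
        if row["proj"] == proj:
            row["pname"] = new_name
            touched += 1
    return touched


# Modification anomaly: one fact (the name of P1) is stored once per assignment,
# so a single rename has to update every one of those tuples.
touched = rename_project(emp_proj, "P1", "Customer-Accounting")

# Deletion anomaly: removing the only assignment on P2 also removes the
# only record that project P2 exists.
remaining = [r for r in emp_proj if not (r["emp"] == "E1" and r["proj"] == "P2")]
projects_left = {r["proj"] for r in remaining}
```

With 100 employees on P1 the same rename would touch 100 tuples, and missing any one of them would leave the relation inconsistent.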
GUIDELINE 2:
Design a schema that does not suffer from the insertion, deletion and
update anomalies.
If there are any anomalies present, then note them so that applications can
be made to take them into account.
3: Null Values in Tuples
GUIDELINE 3:
Relations should be designed such that their tuples will have as few NULL
values as possible
Attributes that are NULL frequently could be placed in separate relations
(with the primary key)
Reasons for nulls:
Attribute not applicable or invalid
Attribute value unknown (may exist)
Value known to exist, but unavailable
4: Spurious Tuples
Bad designs for a relational database may result in erroneous results for certain
JOIN operations
The "lossless join" property is used to guarantee meaningful results for join
operations
GUIDELINE 4:
The relations should be designed to satisfy the lossless join condition.
No spurious tuples should be generated by doing a natural-join of any
relations.
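A minimal sketch of how spurious tuples arise, assuming a relation is held as a list of dicts. The helpers project and natural_join and the sample data are hypothetical; they only illustrate a decomposition that violates the lossless join condition.

```python
def project(rows, attrs):
    """Projection onto attrs with duplicate tuples removed."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(seen)]


def natural_join(left, right):
    """Natural join on all attribute names the two inputs share."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in common)
    ]


emp_proj = [
    {"emp": "E1", "proj": "P1", "hours": 10},
    {"emp": "E2", "proj": "P2", "hours": 10},
]

# Bad split: the only shared attribute is hours, which identifies nothing.
r1 = project(emp_proj, ["emp", "hours"])
r2 = project(emp_proj, ["hours", "proj"])
rejoined = natural_join(r1, r2)

original = {(r["emp"], r["proj"], r["hours"]) for r in emp_proj}
result = {(r["emp"], r["proj"], r["hours"]) for r in rejoined}
spurious = result - original  # tuples that were never in emp_proj
```

The join returns every original tuple plus two that pair E1 with P2 and E2 with P1. No tuple is lost; the information that is lost is which pairings were real.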
Normalization:
The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations.
Normalization is used to design a set of relation schemas that is optimal from the point of view of database updating.
Normalization starts from a universal relation schema.
1NF
Attributes must be atomic:
they can be chars, ints, strings
they can't be
1. tuples
2. sets
3. relations
4. composite
5. multivalued
Considered to be part of the definition of relation
Unnormalised Relations
Name       PaperList
SWETHA     EENADU, HINDU, DC
PRASANNA   EENADU, VAARTHA, HINDU
This is not ideal. Each person is associated with an unspecified number of papers. The items in the PaperList column do not have a consistent form. Generally, an RDBMS can't cope with relations like this. Each entry in a table needs to have a single data item in it.
This is an unnormalised relation.
All RDBMSs require relations not to be like this, that is, not to have multiple values in any column (i.e. no repeating groups).
Name       PaperList
SWETHA     EENADU
SWETHA     HINDU
SWETHA     DC
PRASANNA   HINDU
PRASANNA   EENADU
PRASANNA   VAARTHA
This clearly contains the same information, and it has the property that we sought. It is in First Normal Form (1NF).
A relation is in 1NF if no entry consists of more than one value (i.e. it does not have repeating groups).
So this will be the first requirement in designing our databases.
Obtaining 1NF
1NF is obtained by
o splitting composite attributes
o splitting the relation and propagating the primary key to remove multivalued attributes
There are three approaches to removing repeating groups from unnormalized tables:
1. Remove the repeating groups by entering appropriate data in the empty columns of rows containing the repeating data.
2. Remove the repeating group by placing the repeating data, along with a copy of the original key attribute(s), in a separate relation. A primary key is identified for the new relation.
3. Find the maximum possible number of values for the multivalued attribute and add that many attributes to the relation.
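The first approach (one row per repeated value) can be sketched as follows, using the Name/PaperList data from the notes. The function name to_1nf and the dict representation are assumptions made for illustration.

```python
# Unnormalised form: each name maps to a repeating group of papers.
unnormalised = {
    "SWETHA": ["EENADU", "HINDU", "DC"],
    "PRASANNA": ["EENADU", "VAARTHA", "HINDU"],
}


def to_1nf(table):
    """Return one (Name, Paper) tuple per value in each repeating group."""
    return [(name, paper) for name, papers in table.items() for paper in papers]


papers_1nf = to_1nf(unnormalised)
```

Every entry in papers_1nf now holds a single value, at the cost of repeating the name once per paper.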
Example:-
The DEPARTMENT schema is not in 1NF because DLOCATION is not a single
valued attribute.
The relation should be split into two relations. A new relation
DEPT_LOCATIONS is created and the primary key of DEPARTMENT,
DNUMBER, becomes an attribute of the new relation. The primary key of this
relation is {DNUMBER, DLOCATION}
Alternative solution: Leave the DLOCATION attribute as it is. Instead, we have
one tuple for each location of a DEPARTMENT. Then, the relation is in 1NF, but
redundancy exists.
A super key of a relation schema R = {A1, A2, ..., An} is a set of attributes S that is a subset of R with the property that no two distinct tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].
A key K is a super key with the additional property that removal of any attribute from K will cause K not to be a super key any more.
If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
A prime attribute must be a member of some candidate key.
A nonprime attribute is not a prime attribute; that is, it is not a member of any candidate key.
Functional Dependencies (FDs)
Definition of FD
Inference Rules for FDs
Equivalence of Sets of FDs
Minimal Sets of FDs
Functional dependency describes the relationship between attributes in a relation. For example, if A and B are attributes of relation R, B is functionally dependent on A (denoted A -> B) if each value of A is associated with exactly one value of B. (A and B may each consist of one or more attributes.)
Trivial functional dependency means that the right-hand side is a subset (not necessarily a proper subset) of the left-hand side.
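The definition of A -> B can be checked against one relation state with a short sketch. The function fd_holds and the sample rows are hypothetical. Note that a state can only refute a dependency: an FD is a constraint on every legal state, so passing this check on one state does not prove that the FD holds in the schema.

```python
def fd_holds(rows, lhs, rhs):
    """True if no two tuples in rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Made-up state of EMP_PROJ.
rows = [
    {"emp": "E1", "proj": "P1", "pname": "Billing", "hours": 10},
    {"emp": "E2", "proj": "P1", "pname": "Billing", "hours": 20},
    {"emp": "E1", "proj": "P2", "pname": "Payroll", "hours": 5},
]

proj_determines_name = fd_holds(rows, ["proj"], ["pname"])
emp_determines_hours = fd_holds(rows, ["emp"], ["hours"])
```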
Main characteristics of functional dependencies in normalization:
o have a one-to-one relationship between the attribute(s) on the left- and right-hand side of a dependency;
o hold for all time;
o are nontrivial.
The set of all functional dependencies that are implied by a given set of functional dependencies X is called the closure of X, written X+. A set of inference rules is needed to compute X+ from X.
Inference Rules (RATPUP)
1. Reflexivity: If B is a subset of A, then A -> B
2. Augmentation: If A -> B, then A,C -> B,C
3. Transitivity: If A -> B and B -> C, then A -> C
4. Projection: If A -> B,C then A -> B and A -> C
5. Union: If A -> B and A -> C, then A -> B,C
6. Pseudotransitivity: If A -> B and B,C -> D, then A,C -> D
Example:-
F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER},
DNUMBER -> {DNAME, DMGRSSN}}
From F of the above example we can infer:
SSN -> {DNAME, DMGRSSN},
SSN -> SSN,
DNUMBER -> DNAME
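In practice the inferred dependencies are usually checked with the attribute closure rather than by listing the whole closure of F: X -> Y is implied by F exactly when Y is contained in the closure of the attribute set X. The sketch below uses that swapped-in technique on the example F; the function names closure and implies are assumptions.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD whenever its whole left-hand side is already reached.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs follows from fds."""
    return set(rhs) <= closure(lhs, fds)


F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]

ssn_closure = closure({"SSN"}, F)
```

Here ssn_closure contains every attribute mentioned in F, which is why SSN -> {DNAME, DMGRSSN} can be inferred (transitivity), while DNUMBER -> ENAME cannot.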
Full functional dependency indicates that if A and B are attributes of a relation, B is fully functionally dependent on A if B is functionally dependent on A, but not on any proper subset of A.
A functional dependency A -> B is a partial dependency if there is some attribute that can be removed from A and the dependency still holds.
2NF
Second normal form (2NF) is a relation that is in first normal form and in which every non-key attribute is fully functionally dependent on the key.
The normalization of 1NF relations to 2NF involves the removal of partial dependencies. If a partial dependency exists, we remove the functionally dependent attributes from the relation by placing them in a new relation along with a copy of their determinant.
Obtaining 2NF
o If a nonprime attribute is dependent only on a proper part of a key, then we take the given attribute together with a copy of the key attributes that determine it and place them in a new relation
o We can bundle all attributes determined by the same subset of the key as a unit
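The procedure can be sketched for EMP_PROJ. The dependencies listed in fds are an assumption read off the earlier example (the notes do not state them explicitly), and the sketch assumes the relation has a single candidate key. The function name to_2nf is hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def to_2nf(relation, key, fds):
    """Split off nonprime attributes that depend on a proper part of key.

    Returns a list of attribute sets; assumes key is the only candidate key.
    """
    remaining = set(relation)
    result = []
    # Smaller parts of the key first, so each attribute goes with the
    # smallest part of the key that determines it.
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            part = set(part)
            dependents = (closure(part, fds) - set(key)) & remaining
            if dependents:
                result.append(part | dependents)  # copy of determinant + dependents
                remaining -= dependents
    result.append(remaining)
    return result


# Assumed dependencies for EMP_PROJ(EMP, PROJ, ENAME, PNAME, HOURS).
fds = [
    ({"EMP"}, {"ENAME"}),
    ({"PROJ"}, {"PNAME"}),
    ({"EMP", "PROJ"}, {"HOURS"}),
]
schemas = to_2nf({"EMP", "PROJ", "ENAME", "PNAME", "HOURS"}, {"EMP", "PROJ"}, fds)
```

The result keeps HOURS with the full key and moves ENAME and PNAME to relations keyed by EMP and PROJ respectively.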
Transitive dependency
A condition where A, B, and C are attributes of a relation such that if A -> B and B -> C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).
Third normal form (3NF)
A relation that is in first and second normal form, and in which no non-primary-key attribute is transitively dependent on the primary key.
The normalization of 2NF relations to 3NF involves the removal of transitive dependencies by placing the attribute(s) in a new relation along with a copy of the determinant.
3NF
R is in 3NF if and only if, for every nontrivial functional dependency X -> A that holds in R,
o X is a superkey of R, or
o A is a key (prime) attribute of R
3NF: Alternative Definition
R is in 3NF if every nonprime attribute of R is
o fully functionally dependent on every key of R, and
o nontransitively dependent on every key of R.
Obtaining 3NF
o Split off the attributes in the FD that causes trouble and move them to a new relation together with a copy of the determinant, so there are two relations for each such FD
o The determinant of the FD remains in the original relation
Boyce-Codd normal form (BCNF)
A relation is in BCNF if and only if every determinant is a candidate key.
The difference between 3NF and BCNF is that for a functional dependency A -> B, 3NF allows this dependency in a relation if B is a key (prime) attribute and A is not a super key, whereas BCNF insists that for this dependency to remain in a relation, A must be a super key.
BCNF
R is in Boyce-Codd Normal Form iff, for every nontrivial functional dependency X -> A that holds in R, X is a superkey of R
o more restrictive than 3NF
o preferable: has fewer anomalies
Obtaining BCNF
As usual, split the schema to move the attributes of the troublesome FD to another relation, leaving its determinant in the original so they remain connected
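The difference between the 3NF and BCNF conditions can be sketched as a check over a given set of FDs. The relation R(STUDENT, COURSE, INSTRUCTOR) and its dependencies are a hypothetical example, not taken from the notes, and candidate_keys is a brute-force helper suitable only for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure is attrs (brute force)."""
    attrs = set(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            combo = set(combo)
            if closure(combo, fds) == attrs and not any(k <= combo for k in keys):
                keys.append(combo)
    return keys


def violations(attrs, fds):
    """Return (bcnf, third_nf): the FDs in fds that break each normal form."""
    prime = set().union(*candidate_keys(attrs, fds))
    bcnf, third_nf = [], []
    for lhs, rhs in fds:
        if rhs <= lhs or closure(lhs, fds) == set(attrs):
            continue  # trivial FD, or lhs is a superkey: allowed by both forms
        bcnf.append((lhs, rhs))
        if not (rhs - lhs) <= prime:
            third_nf.append((lhs, rhs))  # a nonprime attribute is determined
    return bcnf, third_nf


R = {"STUDENT", "COURSE", "INSTRUCTOR"}
fds = [
    ({"STUDENT", "COURSE"}, {"INSTRUCTOR"}),
    ({"INSTRUCTOR"}, {"COURSE"}),
]
bcnf_violations, third_nf_violations = violations(R, fds)
```

INSTRUCTOR -> COURSE is reported as a BCNF violation because INSTRUCTOR is not a superkey, but not as a 3NF violation because COURSE is a prime attribute. This is exactly the case that 3NF tolerates and BCNF rejects.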
Decomposition:
The process of decomposing the universal relation schema R into a set of relation schemas D = {R1, R2, ..., Rm} that will become the relational database schema by using the functional dependencies.
Attribute preservation condition:
Each attribute in R will appear in at least one relation schema Ri in the decomposition so that no attributes are lost.
Dependency Preservation Property of a Decomposition:
Definition: Given a set of dependencies F on R, the projection of F on Ri, denoted by $\pi_{R_i}(F)$ where Ri is a subset of R, is the set of dependencies X -> Y in F+ such that the attributes in X U Y are all contained in Ri.
Hence, the projection of F on each relation schema Ri in the decomposition D is the set of functional dependencies in F+, the closure of F, such that all their left- and right-hand-side attributes are in Ri.
Dependency Preservation Property:
A decomposition D = {R1, R2, ..., Rm} of R is dependency-preserving with respect to F if the union of the projections of F on each Ri in D is equivalent to F; that is
$(\pi_{R_1}(F) \cup \dots \cup \pi_{R_m}(F))^+ = F^+$
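Dependency preservation can be tested without listing F+ or the projections explicitly. The sketch below uses the standard iterative test: for each X -> Y in F, grow X by repeatedly taking closures restricted to each Ri, then check that Y was reached. The function names are assumptions, and the second example relation is hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves_dependencies(decomposition, fds):
    """True if every FD in fds follows from the projections onto the Ri."""
    for lhs, rhs in fds:
        reached = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                # What the projection of F onto ri lets us derive from reached.
                gained = closure(reached & ri, fds) & ri
                if not gained <= reached:
                    reached |= gained
                    changed = True
        if not rhs <= reached:
            return False
    return True


# The example F from the notes, split along DNUMBER.
F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
good = [{"SSN", "ENAME", "BDATE", "ADDRESS", "DNUMBER"}, {"DNUMBER", "DNAME", "DMGRSSN"}]

# Hypothetical relation where a BCNF split loses a dependency.
G = [({"STUDENT", "COURSE"}, {"INSTRUCTOR"}), ({"INSTRUCTOR"}, {"COURSE"})]
lossy_for_fds = [{"INSTRUCTOR", "COURSE"}, {"INSTRUCTOR", "STUDENT"}]
```

In the second case {STUDENT, COURSE} -> INSTRUCTOR can no longer be checked inside any single relation, so the decomposition is not dependency-preserving.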
Lossless (Non-additive) Join Property of a Decomposition:
Definition: Lossless join property: a decomposition D = {R1, R2, ..., Rm} of R has the lossless (nonadditive) join property with respect to the set of dependencies F on R if, for every relation state r of R that satisfies F, the following holds, where * is the natural join of all the relations in D:
$*(\pi_{R_1}(r), \dots, \pi_{R_m}(r)) = r$
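For a decomposition into exactly two relations there is a simple test: {R1, R2} is lossless with respect to F if and only if the common attributes R1 ∩ R2 functionally determine either R1 - R2 or R2 - R1. The sketch below covers only this binary case (a decomposition into more relations needs a more general algorithm); the function name lossless_binary and the second decomposition are assumptions.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """Binary lossless-join test: (r1 & r2) must determine r1 - r2 or r2 - r1."""
    reach = closure(r1 & r2, fds)
    return (r1 - r2) <= reach or (r2 - r1) <= reach


F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]

# Split on DNUMBER, which determines everything on the department side.
split_on_dnumber = lossless_binary(
    {"SSN", "ENAME", "BDATE", "ADDRESS", "DNUMBER"},
    {"DNUMBER", "DNAME", "DMGRSSN"},
    F,
)

# Split on ENAME, which determines nothing: joining would add spurious tuples.
split_on_ename = lossless_binary({"SSN", "ENAME"}, {"ENAME", "DNUMBER"}, F)
```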
Multi-valued dependency (MVD)
Represents a dependency between attributes (for example, A, B and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.
A multi-valued dependency can be further defined as being trivial or nontrivial. An MVD A ->> B in relation R is defined as being trivial if
o B is a subset of A, or
o A U B = R
An MVD is defined as being nontrivial if neither of the above two conditions is satisfied.
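The independence of the B values and the C values can be checked on one relation state: whenever two tuples agree on A, the tuple that takes its B values from the first and its remaining values from the second must also be present. The function mvd_holds and the sample relation (employee, project, dependent) are hypothetical.

```python
def mvd_holds(rows, attrs, a, b):
    """Check A ->> B on one relation state given as a list of dicts."""
    state = {tuple(row[x] for x in attrs) for row in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[x] == t2[x] for x in a):
                # A and B values from t1, all remaining values from t2.
                mixed = {**t2, **{x: t1[x] for x in list(a) + list(b)}}
                if tuple(mixed[x] for x in attrs) not in state:
                    return False
    return True


attrs = ["ENAME", "PNAME", "DNAME"]
rows = [
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "John"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "John"},
]

holds_on_full_state = mvd_holds(rows, attrs, ["ENAME"], ["PNAME"])
holds_with_tuple_missing = mvd_holds(rows[:3], attrs, ["ENAME"], ["PNAME"])
```

With all four combinations present, ENAME ->> PNAME holds; dropping one combination breaks the independence and the check fails.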
Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies.
A relation schema R is in 4NF with respect to a set of dependencies F (that includes functional dependencies and multivalued dependencies) if, for every nontrivial multivalued dependency X ->> Y in F+, X is a superkey for R.
Definition:
A join dependency (JD), denoted by JD(R1, R2, ..., Rn), specified on relation schema R, specifies a constraint on the states r of R.
The constraint states that every legal state r of R should have a non-additive join decomposition into R1, R2, ..., Rn; that is, for every such r we have
$*(\pi_{R_1}(r), \pi_{R_2}(r), \dots, \pi_{R_n}(r)) = r$
Note: an MVD is a special case of a JD where n = 2.
A join dependency JD(R1, R2, ..., Rn), specified on relation schema R, is a trivial JD if one of the relation schemas Ri in JD(R1, R2, ..., Rn) is equal to R.
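The JD condition can be tested directly on a relation state: project onto each Ri, join the projections, and compare with the original. The helper names and the supplier/part/project data (attributes S, P, J) are hypothetical; as with FDs, one state can refute a JD but cannot prove it for the schema.

```python
def project(rows, attrs):
    """Projection as a set of tuples, each tuple a frozenset of (attr, value)."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}


def natural_join(left, right):
    """Natural join of two sets of frozenset-encoded tuples."""
    joined = set()
    for l in left:
        for r in right:
            merged = dict(l)
            # setdefault adds new attributes and exposes clashes on shared ones.
            if all(merged.setdefault(a, v) == v for a, v in r):
                joined.add(frozenset(merged.items()))
    return joined


def jd_holds(rows, schemas):
    """True if joining the projections onto schemas gives back rows exactly."""
    state = {frozenset(row.items()) for row in rows}
    result = project(rows, schemas[0])
    for schema in schemas[1:]:
        result = natural_join(result, project(rows, schema))
    return result == state


supply = [
    {"S": "s1", "P": "p1", "J": "j2"},
    {"S": "s1", "P": "p2", "J": "j1"},
    {"S": "s2", "P": "p1", "J": "j1"},
    {"S": "s1", "P": "p1", "J": "j1"},
]

three_way = jd_holds(supply, [["S", "P"], ["P", "J"], ["J", "S"]])
two_way = jd_holds(supply, [["S", "P"], ["P", "J"]])
```

This state satisfies the three-way JD but not the two-way one: joining only two projections produces a spurious tuple, which the third projection filters out.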
Fifth normal form (5NF)
Definition:
A relation schema R is in fifth normal form (5NF) (or Project-Join Normal Form (PJNF)) with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R.
Each normal form is strictly stronger than the previous one:
Every 2NF relation is in 1NF
Every 3NF relation is in 2NF
Every BCNF relation is in 3NF
Every 4NF relation is in BCNF
Every 5NF relation is in 4NF
Diagrammatic notation of normal forms:-
Normalization
A technique for producing a set of relations with desirable properties, given the data requirements of an enterprise.
UNF is a table that contains one or more repeating groups
1NF is a relation in which the intersection of each row and column contains one and only one value
2NF is a relation that is in 1NF and in which every non-primary-key attribute is fully functionally dependent on the primary key
3NF is a relation that is in 1NF and 2NF and in which no non-primary-key attribute is transitively dependent on the primary key
BCNF is a relation in which every determinant is a candidate key
4NF is a relation that is in BCNF and contains no nontrivial multi-valued dependency
5NF is a relation in which every nontrivial join dependency JD(R1, ..., Rn) has each Ri as a superkey
DBMS ARCHITECTURES:-
Centralized DBMS:
Combines everything into a single system, including DBMS software, hardware, application programs, and user interface processing software.
Users can still connect through a remote terminal; however, all processing is done at the centralized site.
Basic 2-tier Client-Server Architectures
Specialized Servers with Specialized functions
Print server
File server
DBMS server
Web server
Email server
Clients can access the specialized servers as needed
Clients
Provide appropriate interfaces through a client software module to access and utilize the various server resources.
Clients may be diskless machines, or PCs or workstations with disks with only the client software installed.
Connected to the servers via some form of a network (LAN: local area network, wireless network, etc.).
DBMS Server
Provides database query and transaction services to the clients
Relational DBMS servers are often called SQL servers, query servers, or transaction servers
Applications running on clients utilize an Application Program Interface (API) to access server databases via a standard interface such as:
ODBC: Open Database Connectivity standard
JDBC: for Java programming access
Client and server must install appropriate client module and server module software for ODBC or JDBC
1. A client program may connect to several DBMSs, sometimes called the data sources.
2. In general, data sources can be files or other non-DBMS software that manages data.
3. Other variations of clients are possible: e.g., in some object DBMSs, more functionality is transferred to clients, including data dictionary functions, optimization and recovery across multiple servers, etc.
Three Tier Client-Server Architecture
Common for Web applications
Intermediate Layer called Application Server or Web Server:
Stores the web connectivity software and the business logic part of the application used to access the corresponding data from the database server
Acts like a conduit for sending partially processed data between the database server and the client.
Three-tier Architecture Can Enhance Security:
Database server only accessible via middle tier
Clients cannot directly access database server
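The separation of the three tiers can be sketched in a few lines. This is only an illustration under loud assumptions: an in-memory SQLite database stands in for the database server (SQLite is embedded, not a real client-server DBMS), Python's database API plays the role that ODBC or JDBC would play, and the table, class and function names are hypothetical.

```python
import sqlite3


def make_database():
    """Data tier: an in-memory SQLite database stands in for the DBMS server."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE department (dnumber INTEGER PRIMARY KEY, dname TEXT)")
    conn.executemany(
        "INSERT INTO department VALUES (?, ?)", [(1, "Research"), (2, "Sales")]
    )
    conn.commit()
    return conn


class ApplicationServer:
    """Middle tier: holds the only database connection and the business logic."""

    def __init__(self, conn):
        self._conn = conn

    def department_name(self, dnumber):
        # Business rule: validate input before it reaches the database.
        if not isinstance(dnumber, int):
            raise ValueError("dnumber must be an integer")
        row = self._conn.execute(
            "SELECT dname FROM department WHERE dnumber = ?", (dnumber,)
        ).fetchone()
        return row[0] if row else None


def client(server, dnumber):
    """Client tier: talks only to the middle tier and formats the answer."""
    name = server.department_name(dnumber)
    return f"Department {dnumber}: {name or 'not found'}"


server = ApplicationServer(make_database())
```

The client function never sees a connection or any SQL, which is the security property described above: the database is reachable only through the middle tier.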