6th database design

7/30/2019 6th Database Design


Database Design

Requirements Analysis - user needs; what must the database do?

Conceptual Design - high-level description (often done with the ER model)

Logical Design - translate the ER model into the DBMS data model (relational model)

Schema Refinement (the current topic) - consistency, normalization

Physical Design - indexes, disk layout

Security Design - who accesses what

Good Database Design

no redundancy of FACT (!)

no inconsistency

no insertion, deletion or update anomalies

no information loss

no dependency loss

Informal Design Guidelines for Relational Databases

1. Semantics of the Relation Attributes

2. Redundant Information in Tuples and Update Anomalies

3. Null Values in Tuples

4. Spurious Tuples

1: Semantics of the Relation Attributes

GUIDELINE 1: Informally, each tuple in a relation should represent one entity or relationship instance. (Applies to individual relations and their attributes.)

o Attributes of different entities (EMPLOYEEs, DEPARTMENTs, PROJECTs) should not be mixed in the same relation

o Only foreign keys should be used to refer to other entities

o Entity and relationship attributes should be kept apart as much as possible.


Design a schema that can be explained easily relation by relation. The semantics of attributes should be easy to interpret.

2: Redundant Information in Tuples and Update Anomalies

Information is stored redundantly:

o Wastes storage

o Causes problems with update anomalies

Insertion anomalies

Deletion anomalies

Modification anomalies

Consider the relation:

EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)

Insertion anomalies

Cannot insert a project unless an employee is assigned to it.

Deletion anomalies

a. When a project is deleted, it will result in deleting all the employees who work on that project.

b. Alternately, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project.

Modification anomalies

Changing the name of project number P1 from Billing to Customer-Accounting may cause this update to be made for all 100 employees working on project P1.

GUIDELINE 2:

Design a schema that does not suffer from insertion, deletion and update anomalies.

If any anomalies are present, note them so that applications can be made to take them into account.
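As a rough illustration of Guideline 2, the sketch below splits EMP_PROJ into three smaller relations so that each fact is stored once. The sample rows, the employee names and the relation names EMPLOYEE, PROJECT and WORKS_ON are assumptions made for this example; they do not come from the notes.

```python
# Hypothetical sample rows for EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours).
emp_proj = [
    ("E1", "P1", "Asha", "Billing", 10),
    ("E2", "P1", "Ravi", "Billing", 20),
    ("E2", "P2", "Ravi", "Payroll", 5),
]


def decompose(rows):
    """Split EMP_PROJ into EMPLOYEE, PROJECT and WORKS_ON (assumed names)."""
    employee = {emp: ename for emp, _, ename, _, _ in rows}
    project = {proj: pname for _, proj, _, pname, _ in rows}
    works_on = {(emp, proj): hours for emp, proj, _, _, hours in rows}
    return employee, project, works_on


employee, project, works_on = decompose(emp_proj)

# Modification: renaming project P1 is one update, not one per assigned employee.
project["P1"] = "Customer-Accounting"

# Insertion: a project can exist before any employee is assigned to it.
project["P3"] = "Audit"

# Deletion: removing an assignment keeps both the employee and the project.
del works_on[("E2", "P2")]
```

After these three operations, project P2 and employee E2 are still recorded even though the assignment between them is gone.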

3: Null Values in Tuples

GUIDELINE 3:

Relations should be designed such that their tuples will have as few NULL values as possible.

Attributes that are frequently NULL could be placed in separate relations (with the primary key).

Reasons for nulls:


Attribute not applicable or invalid

Attribute value unknown (may exist)

Value known to exist, but unavailable

4: Spurious Tuples

Bad designs for a relational database may result in erroneous results for certain JOIN operations.

The "lossless join" property is used to guarantee meaningful results for join operations.

GUIDELINE 4:

The relations should be designed to satisfy the lossless join condition.

No spurious tuples should be generated by doing a natural join of any relations.
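To make the idea of spurious tuples concrete, here is a small sketch that projects a relation onto two badly chosen attribute sets and joins the projections back together. The data and the attribute names Ename, Pname and Plocation are hypothetical, and the helper functions are simple stand-ins for the relational operators rather than any DBMS API.

```python
def project(rows, attrs):
    """Projection: keep only attrs and drop duplicate tuples."""
    seen = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in seen:
            seen.append(t)
    return seen


def natural_join(left, right):
    """Natural join: combine tuples that agree on all shared attributes."""
    out = []
    for l in left:
        for r in right:
            shared = set(l) & set(r)
            if all(l[a] == r[a] for a in shared):
                out.append({**l, **r})
    return out


# Hypothetical state: each employee works on one project at one location.
r = [
    {"Ename": "Asha", "Pname": "Billing", "Plocation": "Hyderabad"},
    {"Ename": "Ravi", "Pname": "Payroll", "Plocation": "Hyderabad"},
]

# Bad decomposition: the shared attribute Plocation is not a key of either part.
r1 = project(r, ["Ename", "Plocation"])
r2 = project(r, ["Pname", "Plocation"])

joined = natural_join(r1, r2)
spurious = [t for t in joined if t not in r]
```

Here the join returns more tuples than the original relation held; the extra ones (for example Asha paired with Payroll) are the spurious tuples that Guideline 4 warns about.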

Normalization:

The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations

Normalization is used to design a set of relation schemas that is optimal from the point of view of database updating

Normalization starts from a universal relation schema

1NF

Attributes must be atomic:

they can be chars, ints, strings

they can't be

1. tuples

2. sets

3. relations

4. composite

5. multivalued

Considered to be part of the definition of a relation

Unnormalised Relations

Name       PaperList
SWETHA     EENADU, HINDU, DC
PRASANNA   EENADU, VAARTHA, HINDU

This is not ideal. Each person is associated with an unspecified number of papers. The items in the PaperList column do not have a consistent form. Generally, an RDBMS can't cope with relations like this. Each entry in a table needs to have a single data item in it.


This is an unnormalised relation.

All RDBMSs require relations not to be like this, i.e. not to have multiple values in any column (no repeating groups).

Name       PaperList
SWETHA     EENADU
SWETHA     HINDU
SWETHA     DC
PRASANNA   HINDU
PRASANNA   EENADU
PRASANNA   VAARTHA

This clearly contains the same information, and it has the property that we sought. It is in First Normal Form (1NF).

A relation is in 1NF if no entry consists of more than one value (i.e. it does not have repeating groups).
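The step from the unnormalised table to the 1NF table can be sketched in a few lines. This assumes, purely for illustration, that the repeating group arrives as a comma-separated string; the function name to_1nf is made up for this example.

```python
# The unnormalised relation from the notes, with PaperList as one string per person.
unnormalised = [
    ("SWETHA", "EENADU, HINDU,DC"),
    ("PRASANNA", "EENADU,VAARTHA,HINDU"),
]


def to_1nf(rows):
    """Emit one (Name, Paper) tuple per paper so every entry holds a single value."""
    result = []
    for name, paper_list in rows:
        for paper in paper_list.split(","):
            result.append((name, paper.strip()))
    return result


first_nf = to_1nf(unnormalised)
```

The result has six tuples, one per (person, paper) pair, and {Name, Paper} together identify each tuple.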

So this will be the first requirement in designing our databases:

Obtaining 1NF

1NF is obtained by:

o Splitting composite attributes

o Splitting the relation and propagating the primary key to remove multivalued attributes

There are three approaches to removing repeating groups from unnormalized tables:

1. Remove the repeating groups by entering appropriate data in the empty columns of rows containing the repeating data.

2. Remove the repeating group by placing the repeating data, along with a copy of the original key attribute(s), in a separate relation. A primary key is identified for the new relation.

3. Find the maximum possible number of values for the multivalued attribute and add that many attributes to the relation.


Example:-

The DEPARTMENT schema is not in 1NF because DLOCATION is not a single-valued attribute.

The relation should be split into two relations. A new relation DEPT_LOCATIONS is created and the primary key of DEPARTMENT, DNUMBER, becomes an attribute of the new relation. The primary key of this relation is {DNUMBER, DLOCATION}.

Alternative solution: Leave the DLOCATION attribute as it is, but have one tuple for each location of a DEPARTMENT. Then the relation is in 1NF, but redundancy exists.


A super key of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two distinct tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].

A key K is a super key with the additional property that removal of any attribute from K will cause K not to be a super key any more.

If a relation schema has more than one key, each is called a candidate key.

One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.

A prime attribute must be a member of some candidate key.

A nonprime attribute is not a prime attribute; that is, it is not a member of any candidate key.

Functional Dependencies (FDs)

Definition of FD

Inference Rules for FDs

Equivalence of Sets of FDs

Minimal Sets of FDs

Functional dependency describes the relationship between attributes in a relation.

For example, if A and B are attributes of relation R, B is functionally dependent on A (denoted A → B) if each value of A is associated with exactly one value of B. (A and B may each consist of one or more attributes.)
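The definition can be checked against a particular relation state with a short sketch like the one below. The rows and attribute names are hypothetical. Note that a single state can only refute an FD: an FD is a constraint on every legal state, so passing this check on sample data does not prove the dependency holds.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val the first time key is seen, then returns the stored value.
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical sample state (attribute names are assumptions for this sketch).
rows = [
    {"Emp": "E1", "Proj": "P1", "Ename": "Asha", "Hours": 10},
    {"Emp": "E1", "Proj": "P2", "Ename": "Asha", "Hours": 5},
    {"Emp": "E2", "Proj": "P1", "Ename": "Ravi", "Hours": 10},
]

emp_determines_name = fd_holds(rows, ["Emp"], ["Ename"])
emp_determines_hours = fd_holds(rows, ["Emp"], ["Hours"])
pair_determines_hours = fd_holds(rows, ["Emp", "Proj"], ["Hours"])
```

In this state Emp → Ename is not violated, Emp → Hours is violated (E1 has both 10 and 5 hours), and {Emp, Proj} → Hours is not violated.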

Trivial functional dependency means that the right-hand side is a subset (not necessarily a proper subset) of the left-hand side.

Main characteristics of functional dependencies in normalization:

o Have a one-to-one relationship between attribute(s) on the left- and right-hand side of a dependency;

o hold for all time;

o are nontrivial.

The set of all functional dependencies that are implied by a given set of functional dependencies X is called the closure of X, written X+. A set of inference rules is needed to compute X+ from X.

Inference Rules (RATPUP)

1. Reflexivity: If B is a subset of A, then A → B

2. Augmentation: If A → B, then A,C → B,C

3. Transitivity: If A → B and B → C, then A → C

4. Projection: If A → B,C, then A → B and A → C

5. Union: If A → B and A → C, then A → B,C

6. Pseudotransitivity: If A → B and B,C → D, then A,C → D

Example:-

F = {SSN → {ENAME, BDATE, ADDRESS, DNUMBER},

DNUMBER → {DNAME, DMGRSSN}}

From F in the above example we can infer:

SSN → {DNAME, DMGRSSN},

SSN → SSN,

DNUMBER → DNAME
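One common way to test whether an FD follows from F is to compute an attribute closure instead of enumerating the whole closure of F. The notes do not name this technique; the sketch below is an assumption about how one might mechanize the inferences in the example, and the function names closure and implies are chosen for this illustration.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD once its whole left-hand side is already determined.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """lhs -> rhs follows from fds when rhs lies inside the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)


# F from the example, written as (left-hand side, right-hand side) pairs.
F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]

ssn_to_dept = implies(F, {"SSN"}, {"DNAME", "DMGRSSN"})
ssn_to_ssn = implies(F, {"SSN"}, {"SSN"})
dnumber_to_dname = implies(F, {"DNUMBER"}, {"DNAME"})
dnumber_to_ssn = implies(F, {"DNUMBER"}, {"SSN"})
```

The first three results correspond to the three inferred dependencies listed above; the last shows a dependency that does not follow from F.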

Full functional dependency indicates that if A and B are attributes of a relation, B is fully functionally dependent on A if B is functionally dependent on A, but not on any proper subset of A.

A functional dependency A → B is a partial dependency if there is some attribute that can be removed from A and the dependency still holds.


2NF

Second normal form (2NF) is a relation that is in first normal form and in which every non-key attribute is fully functionally dependent on the key.

The normalization of 1NF relations to 2NF involves the removal of partial dependencies. If a partial dependency exists, we remove the functionally dependent attributes from the relation by placing them in a new relation along with a copy of their determinant.

Obtaining 2NF

o If a nonprime attribute is dependent only on a proper part of a key, then we take the given attribute as well as the key attributes that determine it and move them all to a new relation

o We can bundle all attributes determined by the same subset of the key as a unit
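The procedure above can be sketched as a search for partial dependencies. This is a simplified illustration that assumes the relation has a single candidate key, so the nonprime attributes are exactly those outside that key. The FDs listed for EMP_PROJ are assumptions about its intended meaning, and the helper names are made up for this example.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, attributes, fds):
    """Map each proper subset of the key to the nonprime attributes it determines."""
    nonprime = set(attributes) - set(key)
    found = {}
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            determined = closure(part, fds) & nonprime
            if determined:
                found[part] = determined
    return found


# EMP_PROJ with its assumed FDs and the single key {Emp#, Proj#}.
attributes = {"Emp#", "Proj#", "Ename", "Pname", "No_hours"}
fds = [
    ({"Emp#"}, {"Ename"}),
    ({"Proj#"}, {"Pname"}),
    ({"Emp#", "Proj#"}, {"No_hours"}),
]

violations = partial_dependencies({"Emp#", "Proj#"}, attributes, fds)
```

Each entry in violations suggests one new relation: the key part plus the attributes it determines (for example Emp# with Ename), leaving No_hours with the full key.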

Transitive dependency

A condition where A, B, and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).

Third normal form (3NF)

A relation that is in first and second normal form, and in which no non-primary-key attribute is transitively dependent on the primary key.

The normalization of 2NF relations to 3NF involves the removal of transitive dependencies by placing the attribute(s) in a new relation along with a copy of the determinant.

3NF

R is in 3NF if and only if, for every nontrivial FD X → A that holds in R, either

o X is a superkey of R, or

o A is a key (prime) attribute of R

3NF: Alternative Definition

R is in 3NF if every nonprime attribute of R is

o fully functionally dependent on every key of R, and

o nontransitively dependent on every key of R.

Obtaining 3NF

Split off the attributes in the FD that causes trouble and move them to a new relation, so there are two relations for each such FD.

The determinant of the FD remains in the original relation.

Boyce-Codd normal form (BCNF)

A relation is in BCNF if and only if every determinant is a candidate key.

The difference between 3NF and BCNF is that, for a functional dependency A → B, 3NF allows this dependency in a relation if B is a key attribute and A is not a super key, whereas BCNF insists that for this dependency to remain in a relation, A must be a super key.

BCNF

R is in Boyce-Codd Normal Form iff, for every nontrivial FD X → A that holds in R, X is a superkey of R.

BCNF is more restrictive than 3NF and is preferable because it has fewer anomalies.

Obtaining BCNF

As usual, split the schema to move the attributes of the troublesome FD to another relation, leaving its determinant in the original so they remain connected.
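A BCNF check can be sketched by testing whether the determinant of each listed FD is a superkey. The example relation (Student, Course, Instructor) and its FDs are a hypothetical textbook-style case, not taken from the notes, and the sketch only inspects the FDs it is given for the relation as a whole.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """List the nontrivial FDs whose determinant is not a superkey."""
    all_attrs = set(attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and not all_attrs <= closure(lhs, fds)
    ]


# Hypothetical relation: each instructor teaches exactly one course.
attributes = {"Student", "Course", "Instructor"}
fds = [
    ({"Student", "Course"}, {"Instructor"}),
    ({"Instructor"}, {"Course"}),
]

violations = bcnf_violations(attributes, fds)
```

Instructor → Course is reported because Instructor alone is not a superkey. The relation is still in 3NF, since Course is a member of the candidate key {Student, Course}, which is exactly the case where 3NF and BCNF differ.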


Decomposition:

The process of decomposing the universal relation schema R into a set of relation schemas D = {R1, R2, ..., Rm} that will become the relational database schema by using the functional dependencies.

Attribute preservation condition:

Each attribute in R will appear in at least one relation schema Ri in the decomposition so that no attributes are lost.

Dependency Preservation Property of a Decomposition:

Definition: Given a set of dependencies F on R, the projection of F on Ri, denoted by π_Ri(F) where Ri is a subset of R, is the set of dependencies X → Y in F+ such that the attributes in X ∪ Y are all contained in Ri.

Hence, the projection of F on each relation schema Ri in the decomposition D is the set of functional dependencies in F+, the closure of F, such that all their left- and right-hand-side attributes are in Ri.

Dependency Preservation Property:

A decomposition D = {R1, R2, ..., Rm} of R is dependency-preserving with respect to F if the union of the projections of F on each Ri in D is equivalent to F; that is

(π_R1(F) ∪ ... ∪ π_Rm(F))+ = F+

Lossless (Non-additive) Join Property of a Decomposition:

Definition: Lossless join property: a decomposition D = {R1, R2, ..., Rm} of R has the lossless (nonadditive) join property with respect to the set of dependencies F on R if, for every relation state r of R that satisfies F, the following holds, where * is the natural join of all the relations in D:

* (π_R1(r), ..., π_Rm(r)) = r
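For a decomposition into exactly two relations there is a well-known shortcut: the join is lossless when the common attributes functionally determine all of R1 or all of R2. The notes do not state this test, so treat the sketch below as an added illustration; it covers only the binary case with functional dependencies, and the simplified attribute set is an assumption.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Binary test: the shared attributes must determine all of r1 or all of r2."""
    reach = closure(set(r1) & set(r2), fds)
    return set(r1) <= reach or set(r2) <= reach


# Simplified version of the earlier example, R = {SSN, ENAME, DNUMBER, DNAME}.
fds = [
    ({"SSN"}, {"ENAME", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME"}),
]

good = is_lossless_binary({"SSN", "ENAME", "DNUMBER"}, {"DNUMBER", "DNAME"}, fds)
bad = is_lossless_binary({"SSN", "ENAME"}, {"ENAME", "DNUMBER", "DNAME"}, fds)
```

The first decomposition shares DNUMBER, which determines everything in the second relation, so it is lossless. The second shares only ENAME, which determines nothing else, so joining the parts can produce spurious tuples.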

Multi-valued dependency (MVD)

Represents a dependency between attributes (for example, A, B and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.

A multi-valued dependency can be further defined as being trivial or nontrivial. An MVD A →→ B in relation R is defined as being trivial if

o B is a subset of A, or

o A ∪ B = R

An MVD is defined as being nontrivial if neither of the above two conditions is satisfied.

Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains

no nontrivial multi-valued dependencies.

A relation schema R is in 4NF with respect to a set of dependencies F (that includes functional dependencies and multivalued dependencies) if, for every nontrivial multivalued dependency X →→ Y in F+, X is a superkey for R.
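The independence condition behind an MVD can be tested on a given relation state, as in the sketch below. The Hobby attribute and the rows are hypothetical, the function assumes a nonempty state with X and Y disjoint, and, as with FDs, a sample state can refute an MVD but cannot prove it for all legal states.

```python
from itertools import product


def mvd_holds(rows, x, y):
    """Check X ->> Y on one state: within each X-group, Y and the rest combine freely."""
    z = [a for a in rows[0] if a not in x and a not in y]
    groups = {}
    for row in rows:
        xv = tuple(row[a] for a in x)
        yv = tuple(row[a] for a in y)
        zv = tuple(row[a] for a in z)
        groups.setdefault(xv, set()).add((yv, zv))
    for pairs in groups.values():
        ys = {yv for yv, _ in pairs}
        zs = {zv for _, zv in pairs}
        # Independence means every Y-value appears with every Z-value.
        if pairs != set(product(ys, zs)):
            return False
    return True


# Hypothetical state: papers read and hobbies are independent facts about a person.
rows = [
    {"Name": "SWETHA", "Paper": "EENADU", "Hobby": "Chess"},
    {"Name": "SWETHA", "Paper": "HINDU", "Hobby": "Chess"},
    {"Name": "SWETHA", "Paper": "EENADU", "Hobby": "Music"},
    {"Name": "SWETHA", "Paper": "HINDU", "Hobby": "Music"},
]

holds_full = mvd_holds(rows, ["Name"], ["Paper"])
holds_missing = mvd_holds(rows[:3], ["Name"], ["Paper"])
```

With all four rows the MVD Name →→ Paper is satisfied; dropping one combination breaks it. Since Name is not a superkey here, a relation constrained by this MVD would not be in 4NF and would be split into (Name, Paper) and (Name, Hobby).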

Definition:

A join dependency (JD), denoted by JD(R1, R2, ..., Rn), specified on relation schema R, specifies a constraint on the states r of R.

The constraint states that every legal state r of R should have a non-additive join decomposition into R1, R2, ..., Rn; that is, for every such r we have

* (π_R1(r), π_R2(r), ..., π_Rn(r)) = r

Note: an MVD is a special case of a JD where n = 2.

A join dependency JD(R1, R2, ..., Rn), specified on relation schema R, is a trivial JD if one of the relation schemas Ri in JD(R1, R2, ..., Rn) is equal to R.

Fifth normal form (5NF)

Definition:

A relation schema R is in fifth normal form (5NF) (or Project-Join Normal Form (PJNF)) with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R.

Each normal form is strictly stronger than the previous one:

Every 2NF relation is in 1NF

Every 3NF relation is in 2NF

Every BCNF relation is in 3NF

Every 4NF relation is in BCNF



Diagrammatic notation of normal forms:-

Normalization

A technique for producing a set of relations with desirable

properties, given the data requirements of an enterprise

UNF is a table that contains one or more repeating groups

1NF is a relation in which the intersection of each row and column contains one and only one value

2NF is a relation that is in 1NF and in which every non-primary-key attribute is fully functionally dependent on the primary key

3NF is a relation that is in 1NF and 2NF and in which no non-primary-key attribute is transitively dependent on the primary key

BCNF is a relation in which every determinant is a candidate key

4NF is a relation that is in BCNF and contains no nontrivial multi-valued dependency

5NF is a relation that contains no nontrivial join dependency other than those implied by its candidate keys


DBMS ARCHITECTURES:-

Centralized DBMS:

Combines everything into a single system, including DBMS software, hardware, application programs, and user interface processing software.

Users can still connect through a remote terminal; however, all processing is done at the centralized site.


Basic 2-tier Client-Server Architectures

Specialized Servers with Specialized functions

Print server

File server

DBMS server

Web server

Email server

Clients can access the specialized servers as needed

Clients

Provide appropriate interfaces through a client software module to access and utilize the various server resources.

Clients may be diskless machines, or PCs or workstations with disks and only the client software installed.

Connected to the servers via some form of a network (LAN: local area network, wireless network, etc.).

DBMS Server

Provides database query and transaction services to the clients

Relational DBMS servers are often called SQL servers, query servers, or transaction servers

Applications running on clients utilize an Application Program Interface (API) to access server databases via a standard interface such as:

ODBC: Open Database Connectivity standard

JDBC: for Java programming access

Client and server must install appropriate client module and server module software for ODBC or JDBC

1. A client program may connect to several DBMSs, sometimes called the data sources.

2. In general, data sources can be files or other non-DBMS software that manages data.

3. Other variations of clients are possible: e.g., in some object DBMSs, more functionality is transferred to clients including data dictionary functions, optimization and recovery across multiple servers, etc.

Three Tier Client-Server Architecture

Common for Web applications

Intermediate layer called Application Server or Web Server:

Stores the web connectivity software and the business logic part of the application used to access the corresponding data from the database server

Acts like a conduit for sending partially processed data between the database server and the client.

Three-tier Architecture Can Enhance Security:

Database server only accessible via middle tier

Clients cannot directly access database server
