6th database design

Upload: gautam-dematti

Post on 14-Apr-2018


  • 7/30/2019 6th Database Design

    1/17

    Database Design

    Requirements Analysis: user needs; what must the database do?

    Conceptual Design: high-level description (often done with the ER model)

    Logical Design: translate the ER model into the DBMS data model (relational model)

    Schema Refinement (the current step): consistency, normalization

    Physical Design: indexes, disk layout

    Security Design: who accesses what

    Good Database Design

    no redundancy of facts (!)

    no inconsistency

    no insertion, deletion or update anomalies

    no information loss

    no dependency loss

    Informal Design Guidelines for Relational Databases

    1. Semantics of the Relation Attributes

    2. Redundant Information in Tuples and Update Anomalies

    3. Null Values in Tuples

    4. Spurious Tuples

    1: Semantics of the Relation Attributes

    GUIDELINE 1: Informally, each tuple in a relation should represent one entity or relationship instance. (Applies to individual relations and their attributes.)

    o Attributes of different entities (EMPLOYEEs, DEPARTMENTs, PROJECTs) should not be mixed in the same relation

    o Only foreign keys should be used to refer to other entities

    o Entity and relationship attributes should be kept apart as much as possible.


    Design a schema that can be explained easily relation by relation. The semantics of

    attributes should be easy to interpret.

    2: Redundant Information in Tuples and Update Anomalies

    Storing information redundantly

    o Wastes storage

    o Causes problems with update anomalies

    Insertion anomalies

    Deletion anomalies

    Modification anomalies

    Consider the relation:

    EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)

    Insertion anomalies

    Cannot insert a project unless an employee is assigned to it.

    Deletion anomalies

    a. When a project is deleted, it will result in deleting all the employees who

    work on that project.

    b. Alternately, if an employee is the sole employee on a project, deleting that

    employee would result in deleting the corresponding project.

    Modification anomalies

    Changing the name of project number P1 from Billing to Customer-Accounting

    may cause this update to be made for all 100 employees working on project P1.
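The three anomalies can be reproduced on a tiny in-memory copy of EMP_PROJ. This is only a sketch: the rows, the employee names and the helper functions `rename_project`, `delete_employee` and `project_names` are invented for illustration and are not part of the notes.

```python
# EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours) held as a list of dicts.
# All rows are made-up sample data.
emp_proj = [
    {"emp": "E1", "proj": "P1", "ename": "SWETHA", "pname": "Billing", "hours": 10},
    {"emp": "E2", "proj": "P1", "ename": "PRASANNA", "pname": "Billing", "hours": 20},
    {"emp": "E2", "proj": "P2", "ename": "PRASANNA", "pname": "Payroll", "hours": 5},
]


def rename_project(rows, proj, new_name):
    """Modification anomaly: every row that mentions the project must be touched."""
    touched = 0
    for row in rows:
        if row["proj"] == proj:
            row["pname"] = new_name
            touched += 1
    return touched


def delete_employee(rows, emp):
    """Deletion anomaly: removing an employee can erase a project as a side effect."""
    return [row for row in rows if row["emp"] != emp]


def project_names(rows):
    """The projects that the relation still knows about."""
    return {row["proj"]: row["pname"] for row in rows}


rows_touched = rename_project(emp_proj, "P1", "Customer-Accounting")
after_delete = delete_employee(emp_proj, "E2")
lost_projects = set(project_names(emp_proj)) - set(project_names(after_delete))

print(rows_touched)   # 2: one update per employee on P1
print(lost_projects)  # {'P2'}: the project vanished with its only employee
```

The insertion anomaly shows up in the same structure: a project with no employee has no value for the `emp` part of the key, so there is no row in which to store it.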

    GUIDELINE 2:

    Design a schema that does not suffer from the insertion, deletion and

    update anomalies.

    If there are any anomalies present, then note them so that applications can

    be made to take them into account.

    3: Null Values in Tuples

    GUIDELINE 3:

    Relations should be designed such that their tuples will have as few NULL

    values as possible

    Attributes that are NULL frequently could be placed in separate relations

    (with the primary key)

    Reasons for nulls:


    Attribute not applicable or invalid

    Attribute value unknown (may exist)

    Value known to exist, but unavailable

    4: Spurious Tuples

    Bad designs for a relational database may result in erroneous results for certain

    JOIN operations

    The "lossless join" property is used to guarantee meaningful results for join

    operations

    GUIDELINE 4:

    The relations should be designed to satisfy the lossless join condition.

    No spurious tuples should be generated by doing a natural-join of any

    relations.
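A small sketch of how spurious tuples arise. The relation, its rows and the helpers `project` and `natural_join` are assumptions made for this example. The relation is split on an attribute (`hours`) that is not a key of either part, so the natural join returns more tuples than the original held.

```python
def project(rows, attrs):
    """Relational projection with duplicate elimination."""
    result = []
    for row in rows:
        picked = {a: row[a] for a in attrs}
        if picked not in result:
            result.append(picked)
    return result


def natural_join(left, right):
    """Join two relations on every attribute name they share."""
    result = []
    for l in left:
        for r in right:
            common = set(l) & set(r)
            if all(l[a] == r[a] for a in common):
                merged = {**l, **r}
                if merged not in result:
                    result.append(merged)
    return result


# Made-up sample state.
r = [
    {"emp": "E1", "proj": "P1", "hours": 10},
    {"emp": "E2", "proj": "P2", "hours": 10},
]

# Bad design: the only shared attribute, hours, identifies nothing.
r1 = project(r, ["emp", "hours"])
r2 = project(r, ["proj", "hours"])

joined = natural_join(r1, r2)
spurious = [t for t in joined if t not in r]
print(len(joined), len(spurious))  # 4 2
```

The join contains the two original tuples plus two that were never stored (E1 on P2, E2 on P1). No tuple was lost; the information about which combinations were real is what was lost.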

    Normalization:

    The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations

    Normalization is used to design a set of relation schemas that is optimal from the point of view of database updating

    Normalization starts from a universal relation schema

    1NF

    Attributes must be atomic: they can be chars, ints, or strings. They can't be

    1. tuples

    2. sets

    3. relations

    4. composite

    5. multivalued

    Atomicity is considered to be part of the definition of a relation.

    Unnormalised Relations

    Name        PaperList

    SWETHA      EENADU, HINDU, DC

    PRASANNA    EENADU, VAARTHA, HINDU

    This is not ideal. Each person is associated with an unspecified number of papers. The items in the PaperList column do not have a consistent form. Generally, an RDBMS can't cope with relations like this. Each entry in a table needs to have a single data item in it.


    This is an unnormalised relation.

    All RDBMSs require relations not to be like this: a column must not hold multiple values (i.e. no repeating groups)

    Name        PaperList

    SWETHA      EENADU

    SWETHA      HINDU

    SWETHA      DC

    PRASANNA    HINDU

    PRASANNA    EENADU

    PRASANNA    VAARTHA

    This clearly contains the same information, and it has the property that we sought. It is in First Normal Form (1NF).

    A relation is in 1NF if no entry consists of more than one value (i.e. it does not have repeating groups)

    So this will be the first requirement in designing our databases:

    Obtaining 1NF

    1NF is obtained by

    o splitting composite attributes

    o splitting the relation and propagating the primary key to remove multivalued attributes

    There are three approaches to removing repeating groups from unnormalized tables:

    1. Remove the repeating groups by entering appropriate data in the empty columns of rows containing the repeating data.

    2. Remove the repeating group by placing the repeating data, along with a copy of the original key attribute(s), in a separate relation. A primary key is identified for the new relation.

    3. Find the maximum possible number of values for the multivalued attribute and add that many attributes to the relation.
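As a rough illustration, here are the first and third approaches applied to the Name/PaperList table from earlier. The function names `flatten` and `pad_columns` are made up for this sketch. Because that table has no other attributes, the second approach (a separate relation keyed on Name plus paper) would produce the same rows as `flatten`.

```python
# The unnormalised relation: each name carries a repeating group of papers.
unnormalised = [
    ("SWETHA", ["EENADU", "HINDU", "DC"]),
    ("PRASANNA", ["EENADU", "VAARTHA", "HINDU"]),
]


def flatten(rows):
    """Approach 1: one row per value, repeating the name in every row."""
    return [(name, paper) for name, papers in rows for paper in papers]


def pad_columns(rows):
    """Approach 3: one column per possible value, padded with None (NULL)."""
    width = max(len(papers) for _, papers in rows)
    return [
        (name, *papers, *[None] * (width - len(papers)))
        for name, papers in rows
    ]


first_nf = flatten(unnormalised)
wide = pad_columns(unnormalised)

print(len(first_nf))  # 6 rows, each holding a single paper
print(wide[0])        # ('SWETHA', 'EENADU', 'HINDU', 'DC')
```

The padded form fixes the number of papers per person at design time and introduces NULLs for anyone with fewer, which is why the flattened or split forms are usually preferred.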


    Example:-

    The DEPARTMENT schema is not in 1NF because DLOCATION is not a single

    valued attribute.

    The relation should be split into two relations. A new relation

    DEPT_LOCATIONS is created and the primary key of DEPARTMENT,

    DNUMBER, becomes an attribute of the new relation. The primary key of this

    relation is {DNUMBER, DLOCATION}

    Alternative solution: Leave the DLOCATION attribute as it is. Instead, we have

    one tuple for each location of a DEPARTMENT. Then, the relation is in 1NF, but

    redundancy exists.


    A super key of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two distinct tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S]

    A key K is a super key with the additional property that removal of any attribute

    from K will cause K not to be a super key any more.

    If a relation schema has more than one key, each is called a candidate key.

    One of the candidate keys is arbitrarily designated to be the primary key,

    and the others are called secondary keys.

    A prime attribute must be a member of some candidate key

    A nonprime attribute is not a prime attribute; that is, it is not a member of any candidate key
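The definitions above can be tried out on a single relation state. Keep in mind that a key is a property of every legal state, so one state can only rule a set of attributes out, never prove it is a key. The helpers `is_superkey` and `candidate_keys` and the sample rows are assumptions made for this sketch.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows of this state agree on all of attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows):
    """Minimal superkeys of this one state, smallest first."""
    attrs = sorted(rows[0])
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(rows, combo):
                keys.append(combo)
    return keys


# Made-up state of DEPT_LOCATIONS(DNUMBER, DLOCATION).
dept_locations = [
    {"dnumber": 1, "dlocation": "Houston"},
    {"dnumber": 4, "dlocation": "Stafford"},
    {"dnumber": 5, "dlocation": "Bellaire"},
    {"dnumber": 5, "dlocation": "Houston"},
]

print(is_superkey(dept_locations, ["dnumber"]))  # False: department 5 appears twice
print(candidate_keys(dept_locations))            # [('dlocation', 'dnumber')]
```

Every attribute that appears in a returned key is prime for this state; the rest are nonprime.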

    Functional Dependencies (FDs)

    Definition of FD

    Inference Rules for FDs

    Equivalence of Sets of FDs

    Minimal Sets of FDs

    Functional dependency describes the relationship between attributes in a relation.

    For example, if A and B are attributes of relation R, B is functionally dependent on A (denoted A → B) if each value of A is associated with exactly one value of B. (A and B may each consist of one or more attributes.)

    A trivial functional dependency is one whose right-hand side is a subset (not necessarily a proper subset) of the left-hand side.
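A minimal sketch of the definition, checked against one relation state. The function names `fd_holds` and `is_trivial` and the sample rows are invented for illustration. As with keys, a single state can refute a dependency but cannot establish that it holds for all time.

```python
def fd_holds(rows, lhs, rhs):
    """True if, in this state, equal lhs values always come with equal rhs values."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs seen for this lhs and returns it afterwards.
        if seen.setdefault(left, right) != right:
            return False
    return True


def is_trivial(lhs, rhs):
    """A dependency is trivial when its right-hand side is contained in its left."""
    return set(rhs) <= set(lhs)


# Made-up state of EMP_PROJ.
emp_proj = [
    {"emp": "E1", "proj": "P1", "pname": "Billing", "hours": 10},
    {"emp": "E2", "proj": "P1", "pname": "Billing", "hours": 20},
    {"emp": "E2", "proj": "P2", "pname": "Payroll", "hours": 5},
]

print(fd_holds(emp_proj, ["proj"], ["pname"]))         # True
print(fd_holds(emp_proj, ["proj"], ["hours"]))         # False: P1 has 10 and 20
print(is_trivial(["emp", "proj"], ["proj"]))           # True
```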

    Main characteristics of functional dependencies in normalization


    o have a one-to-one relationship between the attribute(s) on the left- and right-hand side of a dependency;

    o hold for all time;

    o are nontrivial.

    The set of all functional dependencies that are implied by a given set of functional dependencies X is called the closure of X, written X+. A set of inference rules is needed to compute X+ from X.

    Inference Rules (RATPUP)

    1. Reflexivity: If B is a subset of A, then A → B

    2. Augmentation: If A → B, then A,C → B,C

    3. Transitivity: If A → B and B → C, then A → C

    4. Projection: If A → B,C, then A → B and A → C

    5. Union: If A → B and A → C, then A → B,C

    6. Pseudotransitivity: If A → B and B,C → D, then A,C → D (the related composition rule: if A → B and C → D, then A,C → B,D)

    Example:-

    F = {SSN → {ENAME, BDATE, ADDRESS, DNUMBER},

    DNUMBER → {DNAME, DMGRSSN}}

    From F of the above example we can infer:

    SSN → {DNAME, DMGRSSN},

    SSN → SSN,

    DNUMBER → DNAME
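In practice the inference rules are rarely applied by hand. A common shortcut is the attribute closure: a dependency with left side X and right side Y follows from F exactly when Y lies inside the set of attributes reachable from X. The sketch below uses that shortcut on the example F; the function names `closure` and `implies` are assumptions of this example.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire a dependency once its whole left side has been reached.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if the dependency lhs -> rhs can be inferred from fds."""
    return set(rhs) <= closure(lhs, fds)


# F from the example, as (left side, right side) pairs.
F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]

print(implies(F, {"SSN"}, {"DNAME", "DMGRSSN"}))  # True (transitivity)
print(implies(F, {"SSN"}, {"SSN"}))               # True (reflexivity)
print(implies(F, {"DNUMBER"}, {"DNAME"}))         # True (projection)
print(implies(F, {"DNUMBER"}, {"SSN"}))           # False
```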

    Full functional dependency: if A and B are attributes of a relation, B is fully functionally dependent on A if B is functionally dependent on A, but not on any proper subset of A.

    A functional dependency A → B is a partial dependency if some attribute can be removed from A and the dependency still holds.


    2NF

    Second normal form (2NF): a relation that is in first normal form and in which every non-key attribute is fully functionally dependent on the key.

    The normalization of 1NF relations to 2NF involves the removal of partial dependencies. If a partial dependency exists, we remove the functionally dependent attributes from the relation by placing them in a new relation along with a copy of their determinant.

    Obtaining 2NF

    o If a nonprime attribute is dependent only on a proper part of a key, then we take the given attribute as well as the key attributes that determine it and move them all to a new relation

    o We can bundle all attributes determined by the same subset of the key as a unit
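The two steps can be sketched on EMP_PROJ. This is a simplified illustration, not a general algorithm: it assumes the relation has a single candidate key of two attributes, and the dependencies listed for EMP_PROJ are an assumption read off the attribute names. The helper names `closure`, `partial_dependencies` and `to_2nf` are invented here.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, fds, attributes):
    """Map each proper part of the key to the nonprime attributes it determines."""
    nonprime = set(attributes) - set(key)
    found = {}
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            dependents = closure(part, fds) & nonprime
            if dependents:
                found[part] = dependents
    return found


def to_2nf(key, fds, attributes):
    """Move each bundle of partially dependent attributes out with its determinant."""
    parts = partial_dependencies(key, fds, attributes)
    moved = set().union(*parts.values())
    relations = [set(part) | deps for part, deps in parts.items()]
    relations.append(set(attributes) - moved)  # what stays with the full key
    return relations


attributes = {"emp", "proj", "ename", "pname", "hours"}
key = {"emp", "proj"}
fds = [
    ({"emp"}, {"ename"}),
    ({"proj"}, {"pname"}),
    ({"emp", "proj"}, {"hours"}),
]

decomposed = to_2nf(key, fds, attributes)
print(decomposed)
```

For this input the result is three relations: employee number with name, project number with name, and the original key with the hours worked.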

    Transitive dependency

    A condition where A, B, and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).

    Third normal form (3NF)

    A relation that is in first and second normal form, and in which no non-primary-key attribute is transitively dependent on the primary key.

    The normalization of 2NF relations to 3NF involves the removal of transitive

    dependencies by placing the attribute(s) in a new relation along with a copy of the

    determinant

    3NF

    R is in 3NF if and only if, for every nontrivial FD X → A that holds in R,

    o X is a superkey of R, or

    o A is a key (prime) attribute of R
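This test can be run mechanically. The sketch below applies it to a relation holding all seven attributes of the earlier SSN/DNUMBER example; calling that relation EMP_DEPT is an assumption, as are the helper names. It checks only the dependencies listed in F (split into single right-hand attributes), which is the usual way to test a relation against its own dependency set.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure covers the whole relation."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            if any(k <= set(combo) for k in keys):
                continue  # a smaller key is already inside this set
            if closure(combo, fds) >= attributes:
                keys.append(set(combo))
    return keys


def violations_3nf(attributes, fds):
    """Pairs (X, A) where X is not a superkey and A is not a prime attribute."""
    prime = set().union(*candidate_keys(attributes, fds))
    bad = []
    for lhs, rhs in fds:
        for a in sorted(set(rhs) - set(lhs)):  # ignore the trivial part
            if not closure(lhs, fds) >= set(attributes) and a not in prime:
                bad.append((set(lhs), a))
    return bad


EMP_DEPT = {"SSN", "ENAME", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]

print(candidate_keys(EMP_DEPT, F))  # [{'SSN'}]
print(violations_3nf(EMP_DEPT, F))
```

Both department attributes are reported: DNUMBER is not a superkey, and neither DNAME nor DMGRSSN is prime, so they depend on SSN only transitively.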

    3NF: Alternative Definition

    R is in 3NF if every nonprime attribute of R is

    fully functionally dependent on every key of R, and


    nontransitively dependent on every key of R.

    Obtaining 3NF

    Split off the attributes in the FD that causes trouble and move them, so there are two relations for each such FD

    The determinant of the FD remains in the original relation

    Boyce-Codd normal form (BCNF)

    A relation is in BCNF if and only if every determinant is a candidate key.

    The difference between 3NF and BCNF is that for a functional dependency A → B, 3NF allows this dependency in a relation if B is a key attribute and A is not a super key,


    whereas BCNF insists that for this dependency to remain in a relation, A must be a super key.

    BCNF

    R is in Boyce-Codd Normal Form iff, for every nontrivial FD X → A that holds in R, X is a superkey of R

    o more restrictive than 3NF

    o preferable: has fewer anomalies

    Obtaining BCNF

    As usual, split the schema to move the attributes of the troublesome FD to another relation, leaving its determinant in the original so they remain connected
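A sketch of one such split. The relation used here (student, course, instructor) and its two dependencies are a standard textbook-style illustration chosen for this example, not something taken from these notes; the helper names are also invented. The relation is in 3NF, because course is a prime attribute, but not in BCNF, because instructor is a determinant that is not a superkey.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violation(attributes, fds):
    """First nontrivial dependency whose determinant is not a superkey, else None."""
    for lhs, rhs in fds:
        dependents = set(rhs) - set(lhs)
        if dependents and not closure(lhs, fds) >= set(attributes):
            return set(lhs), dependents
    return None


def split_on(attributes, violation):
    """Move the dependents out with a copy of the determinant; keep the rest."""
    lhs, dependents = violation
    return lhs | dependents, set(attributes) - dependents


R = {"student", "course", "instructor"}
fds = [
    ({"student", "course"}, {"instructor"}),
    ({"instructor"}, {"course"}),
]

violation = bcnf_violation(R, fds)
print(violation)               # ({'instructor'}, {'course'})
print(split_on(R, violation))  # instructor+course, and student+instructor
```

After this split the first dependency (student and course determine instructor) no longer fits inside either relation, so reaching BCNF here costs dependency preservation.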


    Decomposition:

    The process of decomposing the universal relation schema R into a set of relation schemas D = {R1, R2, ..., Rm} that will become the relational database schema by using the functional dependencies.

    Attribute preservation condition:

    Each attribute in R will appear in at least one relation schema Ri in the decomposition so that no attributes are lost.

    Dependency Preservation Property of a Decomposition:

    Definition: Given a set of dependencies F on R, the projection of F on Ri, denoted by π_Ri(F) where Ri is a subset of R, is the set of dependencies X → Y in F+ such that the attributes in X ∪ Y are all contained in Ri.

    Hence, the projection of F on each relation schema Ri in the decomposition D is the set of functional dependencies in F+, the closure of F, such that all their left- and right-hand-side attributes are in Ri.

    Dependency Preservation Property:

    A decomposition D = {R1, R2, ..., Rm} of R is dependency-preserving with respect to F if the union of the projections of F on each Ri in D is equivalent to F; that is

    (π_R1(F) ∪ ... ∪ π_Rm(F))+ = F+
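Computing every projection of F explicitly is expensive, so a common way to test this property grows the closure of each left-hand side using only what is visible inside one Ri at a time. The sketch below follows that idea on the SSN/DNUMBER example from the functional-dependency section; the function names and the two decompositions tried are assumptions of this example.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves_dependencies(decomposition, fds):
    """True if every dependency in fds follows from the projections onto the parts."""
    for lhs, rhs in fds:
        reached = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                # Only attributes inside ri may be used and gained in this step.
                gained = closure(reached & ri, fds) & ri
                if not gained <= reached:
                    reached |= gained
                    changed = True
        if not set(rhs) <= reached:
            return False
    return True


F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]

good = [{"SSN", "ENAME", "BDATE", "ADDRESS", "DNUMBER"},
        {"DNUMBER", "DNAME", "DMGRSSN"}]
bad = [{"SSN", "ENAME", "BDATE", "ADDRESS", "DNAME", "DMGRSSN"},
       {"SSN", "DNUMBER"}]

print(preserves_dependencies(good, F))  # True
print(preserves_dependencies(bad, F))   # False
```

In the second decomposition DNUMBER and the department attributes never appear in the same relation, so the dependency from DNUMBER to DNAME and DMGRSSN cannot be checked without a join.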

    Lossless (Non-additive) Join Property of a Decomposition:

    Definition: Lossless join property: a decomposition D = {R1, R2, ..., Rm} of R has the lossless (nonadditive) join property with respect to the set


    of dependencies F on R if, for every relation state r of R that satisfies F, the following holds, where * is the natural join of all the relations in D:

    * (π_R1(r), ..., π_Rm(r)) = r
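For a decomposition into exactly two relations there is a simple test based on functional dependencies alone: the split is lossless when the shared attributes determine everything that is only in R1, or everything that is only in R2. The sketch below applies that binary test to the SSN/DNUMBER example from the functional-dependency section; it does not cover decompositions into three or more relations, and the function names are assumptions.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """Binary lossless-join test: the common attributes determine one whole side."""
    reach = closure(r1 & r2, fds)
    return (r1 - r2) <= reach or (r2 - r1) <= reach


F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]

# Split on DNUMBER, which determines every attribute of the second relation.
print(lossless_binary(
    {"SSN", "ENAME", "BDATE", "ADDRESS", "DNUMBER"},
    {"DNUMBER", "DNAME", "DMGRSSN"},
    F,
))  # True

# Split on ENAME, which determines nothing, so the join can add tuples.
print(lossless_binary(
    {"SSN", "ENAME"},
    {"ENAME", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"},
    F,
))  # False
```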

    Multi-valued dependency (MVD)

    Represents a dependency between attributes (for example, A, B and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.

    A multi-valued dependency can be further defined as being trivial or nontrivial. An MVD A →→ B in relation R is defined as being trivial if

    o B is a subset of A, or

    o A ∪ B = R

    An MVD is defined as being nontrivial if neither of the above two conditions is satisfied.
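One way to make the independence requirement concrete: whenever two tuples agree on A, the tuple that takes its B values from the first and everything else from the second must also be present. The sketch below checks this on one relation state; the function names and the sample rows are invented for illustration.

```python
def mvd_holds(rows, attributes, lhs, rhs):
    """Check the multi-valued dependency lhs ->> rhs on one relation state."""
    present = {tuple(row[a] for a in attributes) for row in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in lhs):
                # lhs and rhs values from t1, all remaining values from t2.
                mixed = {**t2, **{a: t1[a] for a in list(lhs) + list(rhs)}}
                if tuple(mixed[a] for a in attributes) not in present:
                    return False
    return True


def is_trivial_mvd(attributes, lhs, rhs):
    """Trivial if rhs is inside lhs, or lhs and rhs together cover the relation."""
    return set(rhs) <= set(lhs) or set(lhs) | set(rhs) == set(attributes)


attributes = ["A", "B", "C"]
# For a1, the B values {b1, b2} and the C values {c1, c2} occur in every combination.
independent = [
    {"A": "a1", "B": "b1", "C": "c1"},
    {"A": "a1", "B": "b1", "C": "c2"},
    {"A": "a1", "B": "b2", "C": "c1"},
    {"A": "a1", "B": "b2", "C": "c2"},
]
incomplete = independent[:-1]  # drop one combination

print(mvd_holds(independent, attributes, ["A"], ["B"]))  # True
print(mvd_holds(incomplete, attributes, ["A"], ["B"]))   # False
print(is_trivial_mvd(attributes, ["A"], ["B", "C"]))     # True
```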

    Fourth normal form (4NF)

    A relation that is in Boyce-Codd normal form and contains

    no nontrivial multi-valued dependencies.

    A relation schema R is in 4NF with respect to a set of dependencies F (that includes functional dependencies and multivalued dependencies) if, for every nontrivial multivalued dependency X →→ Y in F+, X is a superkey for R.

    Definition:

    A join dependency (JD), denoted by JD(R1, R2, ..., Rn), specified on relation schema R, specifies a constraint on the states r of R.

    The constraint states that every legal state r of R should have a non-additive join decomposition into R1, R2, ..., Rn; that is, for every such r we have

    * (π_R1(r), π_R2(r), ..., π_Rn(r)) = r

    Note: an MVD is a special case of a JD where n = 2.

    A join dependency JD(R1, R2, ..., Rn), specified on relation schema R, is a trivial JD if one of the relation schemas Ri in JD(R1, R2, ..., Rn) is equal to R.

    Fifth normal form (5NF)

    Definition:

    A relation schema R is in fifth normal form (5NF) (or Project-Join Normal Form (PJNF)) with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R.

    Each normal form is strictly stronger than the previous one

    Every 2NF relation is in 1NF

    Every 3NF relation is in 2NF

    Every BCNF relation is in 3NF

    Every 4NF relation is in BCNF


    Every 5NF relation is in 4NF

    Diagrammatic notation of normal forms:-

    Normalization

    A technique for producing a set of relations with desirable

    properties, given the data requirements of an enterprise

    UNF is a table that contains one or more repeating groups

    1NF is a relation in which the intersection of each row and column contains one and only one value

    2NF is a relation that is in 1NF and in which every non-primary-key attribute is fully functionally dependent on the primary key

    3NF is a relation that is in 1NF and 2NF and in which no non-primary-key attribute is transitively dependent on the primary key

    BCNF is a relation in which every determinant is a candidate key

    4NF is a relation that is in BCNF and contains no nontrivial multi-valued dependency

    5NF is a relation in which every nontrivial join dependency has only superkeys as its components


    DBMS ARCHITECTURES:-

    Centralized DBMS:

    Combines everything into a single system, including DBMS software, hardware, application programs, and user interface processing software.

    Users can still connect through a remote terminal; however, all processing is done at the centralized site.


    Basic 2-tier Client-Server Architectures

    Specialized Servers with Specialized functions

    Print server

    File server

    DBMS server

    Web server

    Email server

    Clients can access the specialized servers as needed

    Clients

    Provide appropriate interfaces through a client software module to access and utilize the various server resources.

    Clients may be diskless machines or PCs or Workstations with disks with only the

    client software installed.


    Connected to the servers via some form of a network.

    (LAN: local area network, wireless network, etc.)

    DBMS Server

    Provides database query and transaction services to the clients

    Relational DBMS servers are often called SQL servers, query servers, or transaction servers

    Applications running on clients utilize an Application Program Interface (API) to access server databases via a standard interface such as:

    ODBC: Open Database Connectivity standard

    JDBC: for Java programming access

    Client and server must install appropriate client module and server module software for ODBC or JDBC

    1. A client program may connect to several DBMSs, sometimes called the data sources.

    2. In general, data sources can be files or other non-DBMS software that manages data.

    3. Other variations of clients are possible: e.g., in some object DBMSs, more functionality is transferred to clients, including data dictionary functions, optimization and recovery across multiple servers, etc.

    Three Tier Client-Server Architecture

    Common for Web applications

    Intermediate Layer called Application Server or Web Server:

    Stores the web connectivity software and the business logic part of the application used to access the corresponding data from the database server

    Acts like a conduit for sending partially processed data between the

    database server and the client.

    Three-tier Architecture Can Enhance Security:

    Database server only accessible via middle tier

    Clients cannot directly access database server
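The separation of tiers can be mimicked in a single process. The sketch below is only an analogy: a real deployment runs each tier as its own program, usually on separate machines, whereas here an in-memory SQLite database stands in for the database server. The class names, the `project` table and its one row are all invented for illustration.

```python
import sqlite3


class DatabaseTier:
    """Stand-in for the database server (an in-memory SQLite database)."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE project (pnumber TEXT PRIMARY KEY, pname TEXT)"
        )
        self._conn.execute("INSERT INTO project VALUES ('P1', 'Billing')")

    def query(self, sql, params=()):
        return self._conn.execute(sql, params).fetchall()


class ApplicationTier:
    """Middle tier: holds the business logic and the only database handle."""

    def __init__(self, database):
        self._database = database

    def project_name(self, pnumber):
        # Parameterised query: the client never supplies SQL text itself.
        rows = self._database.query(
            "SELECT pname FROM project WHERE pnumber = ?", (pnumber,)
        )
        return rows[0][0] if rows else None


def client(app, pnumber):
    """Client tier: talks to the application tier only."""
    name = app.project_name(pnumber)
    return f"{pnumber}: {name}" if name else f"{pnumber}: not found"


app = ApplicationTier(DatabaseTier())
print(client(app, "P1"))  # P1: Billing
print(client(app, "P9"))  # P9: not found
```

The client function receives only the application-tier object, which mirrors the point that clients reach the data through the middle tier rather than through a direct database connection.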
