Lecture 12: Designing Databases (COSC4406: Software Engineering)


Upload: norma-gaines

Post on 31-Dec-2015


TRANSCRIPT

Page 1: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Lecture 12
Designing Databases

COSC4406: Software Engineering

Page 2: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Learning Objectives

Define each of the following database terms
– Relation
– Primary key
– Normalization
– Functional dependency
– Foreign key
– Referential integrity
– Field
– Data type
– Null value
– Denormalization
– File organization
– Index
– Secondary key

Page 3: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Learning Objectives

Discuss the role of designing databases in the analysis and design of an information system

Learn how to transform an Entity-Relationship (ER) Diagram into an equivalent set of well-structured relations

Learn how to merge normalized relations from separate user views into a consolidated set of well-structured relations

Page 4: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Learning Objectives

Explain choices of storage formats for database fields

Learn how to transform well-structured relations into efficient database tables

Discuss use of different types of file organizations to store database files

Discuss indexes and their purpose

Page 5: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Purpose of Database Design

Structure the data in stable structures, called normalized tables
– Not likely to change over time
– Minimal redundancy

Develop a logical database design that reflects actual data requirements

Develop a logical database design from which a physical database design can be developed

Page 6: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Purpose of Database Design

Translate a relational database model into a technical file and database design that balances several performance factors

Choose data storage technologies that will efficiently, accurately and securely process database activities

Page 7: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Process of Database Design

Logical Design
– Based upon the conceptual data model
– Four key steps
  1. Develop a logical data model for each known user interface for the application using normalization principles
  2. Combine normalized data requirements from all user interfaces into one consolidated logical database model
  3. Translate the conceptual E-R data model for the application into normalized data requirements
  4. Compare the consolidated logical database design with the translated E-R model and produce one final logical database model for the application

Page 8: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Process of Database Design

Physical Design
– Based upon results of logical database design
– Key decisions
  1. Choosing storage format for each attribute from the logical database model
  2. Grouping attributes from the logical database model into physical records
  3. Arranging related records in secondary memory (hard disks and magnetic tapes) so that records can be stored, retrieved and updated rapidly
  4. Selecting media and structures for storing data to make access more efficient

Page 9: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Deliverables and Outcomes

Logical database design must account for every data element on a system input or output

Normalized relations are the primary deliverable

Physical database design results in converting relations into files

Page 10: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Relational Database Model

Data represented as a set of related tables or relations

Relation
– A named, two-dimensional table of data. Each relation consists of a set of named columns and an arbitrary number of unnamed rows
– Properties
  – Entries in cells are simple
  – Entries in columns are from the same set of values
  – Each row is unique
  – The sequence of columns can be interchanged without changing the meaning or use of the relation
  – The rows may be interchanged or stored in any sequence

Page 11: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Relational Database Model

Well-Structured Relation
– A relation that contains a minimum amount of redundancy and allows users to insert, modify and delete the rows without errors or inconsistencies

Page 12: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Normalization

The process of converting complex data structures into simple, stable data structures

Second Normal Form (2NF)
– Each nonprimary key attribute is identified by the whole key (called full functional dependency)

Page 13: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Normalization

Third Normal Form (3NF)
– Nonprimary key attributes do not depend on each other (such dependencies are called transitive dependencies)

The result of normalization is that every nonprimary key attribute depends upon the whole primary key and nothing but the primary key

Page 14: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Functional Dependencies and Primary Keys

Functional Dependency
– A particular relationship between two attributes. For a given relation, attribute B is functionally dependent on attribute A if, for every valid value of A, that value of A uniquely determines the value of B
– Instances (or sample data) in a relation do not prove the existence of a functional dependency
– Knowledge of the problem domain is the most reliable method for identifying functional dependencies

Primary Key
– An attribute (or combination of attributes) whose value is unique across all occurrences of a relation
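The point that sample data can contradict a functional dependency but never prove one can be sketched in a few lines of Python. This is only an illustration: the function name `violates_fd` and the employee rows are hypothetical and do not come from the lecture.

```python
def violates_fd(rows, a, b):
    """Return True if some value of attribute a appears with two different values of b."""
    seen = {}
    for row in rows:
        key, value = row[a], row[b]
        if key in seen and seen[key] != value:
            return True  # one A value with two B values: A -> B cannot hold
        seen[key] = value
    return False  # no counterexample in this sample; this does NOT prove A -> B


# Hypothetical sample rows
employees = [
    {"emp_id": 1, "name": "Ann", "dept": "Sales"},
    {"emp_id": 2, "name": "Bob", "dept": "Sales"},
    {"emp_id": 3, "name": "Ann", "dept": "IT"},
]

print(violates_fd(employees, "emp_id", "name"))  # no counterexample found
print(violates_fd(employees, "name", "dept"))    # "Ann" appears with two departments
```

A `False` result only means the sample is consistent with the dependency. Whether emp_id really determines name has to come from knowledge of the problem domain.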

Page 15: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Functional Dependencies and Primary Keys

Second Normal Form (2NF)
– A relation is in second normal form (2NF) if any of the following conditions apply:
  – The primary key consists of only one attribute
  – No nonprimary key attributes exist in the relation
  – Every nonprimary key attribute is functionally dependent on the full set of primary key attributes

Page 16: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Functional Dependencies and Primary Keys

Conversion to second normal form (2NF)
– To convert a relation into 2NF, decompose the relation into new relations using the attributes, called determinants, that determine other attributes
– The determinants become the primary key of the new relation
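A minimal sketch of such a decomposition, assuming a hypothetical order-line relation whose primary key is (order_id, product_id) and in which product_name depends only on product_id. The relation, attribute names and rows are invented for illustration.

```python
# Rows of (order_id, product_id, quantity, product_name).
# The primary key is (order_id, product_id), but product_name depends only on
# product_id, which is a partial dependency.
order_lines = [
    (100, "P1", 2, "Widget"),
    (100, "P2", 1, "Gadget"),
    (101, "P1", 5, "Widget"),
]

# New relation: the determinant product_id becomes its primary key.
product = {pid: name for (_, pid, _, name) in order_lines}

# Remaining relation: keeps the full key and the attribute that depends on all of it.
order_line = {(oid, pid): qty for (oid, pid, qty, _) in order_lines}

print(product)
print(order_line)
```

After the split, each product name is stored once, so renaming a product touches one row instead of every order line that mentions it.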

Page 17: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Functional Dependencies and Primary Keys

Third Normal Form (3NF)
– A relation is in third normal form (3NF) if it is in second normal form (2NF) and there are no functional (transitive) dependencies between two (or more) nonprimary key attributes
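Removing a transitive dependency can be sketched the same way. The example assumes a hypothetical sales relation keyed by customer_id in which each salesperson works in exactly one region, so customer_id determines salesperson and salesperson determines region. All names and rows are illustrative.

```python
# Rows of (customer_id, customer_name, salesperson, region), key customer_id.
# Assumption: each salesperson covers one region, so region depends on the
# nonkey attribute salesperson (a transitive dependency).
sales = [
    (8023, "Anderson", "Smith", "South"),
    (9167, "Bancroft", "Hicks", "West"),
    (7924, "Hobbs", "Smith", "South"),
]

# First relation keeps the attributes that depend directly on the key.
sales1 = {cid: (name, sp) for (cid, name, sp, _) in sales}

# Second relation is keyed by the nonkey determinant, salesperson.
sperson = {sp: region for (_, _, sp, region) in sales}

# Joining the two relations on salesperson recovers the original rows.
rejoined = sorted((cid, name, sp, sperson[sp]) for cid, (name, sp) in sales1.items())
print(rejoined == sorted(sales))
```

In the first relation, salesperson now acts as a foreign key into the second.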

Page 18: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Functional Dependencies and Primary Keys

Foreign Key
– An attribute that appears as a nonprimary key attribute in one relation and as a primary key attribute (or part of a primary key) in another relation

Referential Integrity
– An integrity constraint specifying that the value (or existence) of an attribute in one relation depends on the value (or existence) of the same attribute in another relation
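As a hedged illustration of a foreign key and referential integrity, the sketch below uses Python's standard `sqlite3` module with an in-memory database. The customer and orders tables are hypothetical. Note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(customer_id))"
)
conn.execute("INSERT INTO customer VALUES (1, 'Ann')")
conn.execute("INSERT INTO orders VALUES (10, 1)")  # parent row exists: accepted

try:
    conn.execute("INSERT INTO orders VALUES (11, 99)")  # there is no customer 99
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)
```

Here customer_id is the primary key of customer and a nonprimary key attribute of orders. The one side (customer) has migrated to the many side (orders).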

Page 19: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Transforming E-R Diagrams into Relations

It is useful to transform the conceptual data model into a set of normalized relations

Steps
– Represent entities
– Represent relationships
– Normalize the relations
– Merge the relations

Page 20: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Transforming E-R Diagrams into Relations

Represent Entities
– Each regular entity is transformed into a relation
– The identifier of the entity type becomes the primary key of the corresponding relation
– The primary key must satisfy the following two conditions
  a. The value of the key must uniquely identify every row in the relation
  b. The key should be nonredundant

Page 21: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Transforming E-R Diagrams into Relations

Represent Relationships
– Binary 1:N Relationships
  – Add the primary key attribute (or attributes) of the entity on the one side of the relationship as a foreign key in the relation on the many side
  – The one side migrates to the many side
– Binary or Unary 1:1
  – Three possible options
    a. Add the primary key of A as a foreign key of B
    b. Add the primary key of B as a foreign key of A
    c. Both of the above

Page 22: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Transforming E-R Diagrams into Relations

Represent Relationships (continued)
– Binary and Higher M:N relationships
  – Create another relation and include primary keys of all relations as primary key of new relation
– Unary 1:N Relationships
  – Relationship between instances of a single entity type
  – Utilize a recursive foreign key
    – A foreign key in a relation that references the primary key values of that same relation
– Unary M:N Relationships
  – Create a separate relation
  – Primary key of new relation is a composite of two attributes that both take their values from the same primary key
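Two of these rules can be sketched with Python's standard `sqlite3` module: a recursive foreign key for a unary 1:N relationship, and a separate relation with a composite primary key for a binary M:N relationship. The employee, student, course and enrollment tables are hypothetical examples, not taken from the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Unary 1:N: manager_id is a recursive foreign key into the same relation.
CREATE TABLE employee (
    emp_id     INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    manager_id INTEGER REFERENCES employee(emp_id)
);
-- Binary M:N: a separate relation whose primary key combines both keys.
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE course  (course_id  TEXT PRIMARY KEY, title TEXT);
CREATE TABLE enrollment (
    student_id INTEGER REFERENCES student(student_id),
    course_id  TEXT    REFERENCES course(course_id),
    PRIMARY KEY (student_id, course_id)
);
INSERT INTO employee VALUES (1, 'Lee', NULL), (2, 'Kim', 1), (3, 'Roy', 1);
INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO course VALUES ('C1', 'Databases'), ('C2', 'Networks');
INSERT INTO enrollment VALUES (1, 'C1'), (1, 'C2'), (2, 'C1');
""")

# Self-join: find the employees who report to Lee.
reports = conn.execute(
    "SELECT e.name FROM employee e JOIN employee m ON e.manager_id = m.emp_id "
    "WHERE m.name = 'Lee' ORDER BY e.name"
).fetchall()
print(reports)
```

A unary M:N relationship would follow the enrollment pattern, except that both key columns of the new relation would reference the same primary key.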


Page 24: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Transforming E-R Diagrams into Relations

Merging Relations (View Integration)
– Purpose is to remove redundant relations
– View Integration Problems
  – Synonyms
    – Two different names used for the same attribute
    – When merging, get agreement from users on a single, standard name
  – Homonyms
    – A single attribute name that is used for two or more different attributes
    – Resolved by creating a new name
  – Dependencies between nonkeys
    – Dependencies may be created as a result of view integration
    – In order to resolve, the new relation must be normalized

Page 25: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Physical File and Database Design

The following information is required
– Normalized relations, including volume estimates
– Definitions of each attribute
– Descriptions of where and when data are used, entered, retrieved, deleted and updated (including frequencies)
– Expectations or requirements for response time and data integrity
– Descriptions of the technologies used for implementing the files and database

Page 26: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Designing Fields

Field
– The smallest unit of named application data recognized by system software
– Each attribute from each relation will be represented as one or more fields

Choosing data types
– Data Type
  – A coding scheme recognized by system software for representing organizational data
– Four objectives
  – Minimize storage space
  – Represent all possible values of the field
  – Improve data integrity of the field
  – Support all data manipulations desired on the field
– Calculated fields
  – A field that can be derived from other database fields

Page 27: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Methods of Controlling Data Integrity

Default Value
– A value a field will assume unless an explicit value is entered for that field

Range Control
– Limits the range of values that can be entered into the field

Referential Integrity
– An integrity constraint specifying that the value (or existence) of an attribute in one relation depends on the value (or existence) of the same attribute in another relation

Null Value
– A special field value, distinct from 0, blank, or any other value, that indicates that the value for the field is missing or otherwise unknown
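Default values, range control and null values can be illustrated with a short sketch using Python's standard `sqlite3` module. The product table, its columns and the price limits are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product ("
    " product_id INTEGER PRIMARY KEY,"
    " status TEXT NOT NULL DEFAULT 'active',"             # default value
    " price REAL CHECK (price >= 0 AND price <= 10000),"  # range control
    " discontinued_on TEXT)"                               # may be NULL (unknown)
)

# No status or discontinued_on supplied: the default and NULL are used.
conn.execute("INSERT INTO product (product_id, price) VALUES (1, 19.99)")
row = conn.execute(
    "SELECT status, discontinued_on FROM product WHERE product_id = 1"
).fetchone()
print(row)  # NULL is returned to Python as None

try:
    conn.execute("INSERT INTO product (product_id, price) VALUES (2, -5)")
    range_enforced = False
except sqlite3.IntegrityError:
    range_enforced = True
print(range_enforced)
```

The null in discontinued_on is not the same as an empty string or zero. It records that no value is known for that row.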

Page 28: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Designing Physical Tables

Relational database is a set of related tables

Physical Table
– A named set of rows and columns that specifies the fields in each row of the table

Design Goals
– Efficient use of secondary storage (disk space)
  – Disks are divided into units that can be read in one machine operation
  – Space is used most efficiently when the physical length of a table row divides close to evenly into the length of the storage unit
– Efficient data processing
  – Data are most efficiently processed when stored next to each other in secondary memory

Page 29: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Designing Physical Tables

Denormalization
– The process of splitting or combining normalized relations into physical tables based on affinity of use of rows and fields
– Partitioning
  – Capability to split a table into separate sections
  – Oracle 8i implements three types
    – Range
    – Hash
    – Composite
– Optimizes certain operations at the expense of others
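The idea behind range and hash partitioning can be sketched in plain Python. This is a conceptual illustration only and does not show how Oracle implements partitioning. The function names, partition labels and year boundaries are hypothetical.

```python
import zlib


def range_partition(order_year):
    """Range partitioning: a row goes to the partition whose range contains its value."""
    if order_year < 2000:
        return "p_before_2000"
    if order_year < 2010:
        return "p_2000s"
    return "p_2010_onward"


def hash_partition(customer_id, n_partitions=4):
    """Hash partitioning: a hash of the key spreads rows across the partitions."""
    return zlib.crc32(str(customer_id).encode()) % n_partitions


print(range_partition(2005))
print(hash_partition(42))
```

A composite scheme would apply one rule and then the other, for example choosing a range partition by year and then a hash subpartition by customer. Range partitions suit queries over a span of values, while hash partitions aim at an even spread of rows.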

Page 30: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Designing Physical Tables

Denormalization
– Three common situations where denormalization may be used
  1. Two entities with a one-to-one relationship
  2. A many-to-many relationship with nonkey attributes
  3. Reference data

Page 31: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Designing Physical Tables

Arranging Table Rows
– Physical File
  – A named set of table rows stored in a contiguous section of secondary memory
– Each table may be a physical file, or the whole database may be one file, depending on the database management software used

Page 32: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Designing Physical Tables

File Organization
– A technique for physically arranging the records of a file
– Objectives for choosing file organization
  1. Fast data retrieval
  2. High throughput for processing transactions
  3. Efficient use of storage space
  4. Protection from failures or data loss
  5. Minimizing need for reorganization
  6. Accommodating growth
  7. Security from unauthorized use

Page 33: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Designing Physical Tables

Types of File Organization
– Sequential
  – The rows in the file are stored in sequence according to a primary key value
  – Updating and adding records may require rewriting the file
  – Deleting records results in wasted space
– Indexed
  – The rows are stored either sequentially or nonsequentially and an index is created that allows software to locate individual rows
  – Index
    – A table used to determine the location of rows in a file that satisfy some condition
  – Secondary Index
    – Index based upon a combination of fields for which more than one row may have the same combination of values
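The difference between a unique index and a secondary index can be sketched with ordinary Python dictionaries. This is a simplified in-memory model under stated assumptions: the rows list stands in for a file, list positions stand in for row locations, and the field names are hypothetical.

```python
from collections import defaultdict

# Rows stored in arrival order (nonsequential with respect to any key).
rows = [
    {"emp_id": 30, "dept": "IT"},
    {"emp_id": 10, "dept": "Sales"},
    {"emp_id": 20, "dept": "IT"},
]

# Unique (primary key) index: each key value maps to exactly one row position.
pk_index = {row["emp_id"]: pos for pos, row in enumerate(rows)}

# Secondary index: one key value may map to several row positions.
dept_index = defaultdict(list)
for pos, row in enumerate(rows):
    dept_index[row["dept"]].append(pos)

print(rows[pk_index[20]])                                 # direct lookup by key
print([rows[pos]["emp_id"] for pos in dept_index["IT"]])  # all rows in one department
```

Both lookups go straight to the matching positions without scanning every row, which is the purpose of an index.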

Page 34: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Designing Physical Tables

Guidelines for choosing indexes
– Specify a unique index for the primary key of each table
– Specify an index for foreign keys
– Specify an index for nonkey fields that are referenced in qualification, sorting and grouping commands for the purpose of retrieving data

Hashed File Organization
– The address for each row is determined using an algorithm
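A hashed file organization can be sketched as follows, assuming a small fixed number of buckets and using `zlib.crc32` as the address algorithm. Both choices are illustrative. Real systems use their own hashing and collision-handling schemes.

```python
import zlib

N_BUCKETS = 8  # hypothetical number of storage addresses
buckets = [[] for _ in range(N_BUCKETS)]


def address(key):
    """Compute a storage address from the key using a hashing algorithm."""
    return zlib.crc32(str(key).encode()) % N_BUCKETS


def store(key, row):
    """Place the row in the bucket that its key hashes to."""
    buckets[address(key)].append((key, row))


def fetch(key):
    """Examine only the bucket for this key; collisions are resolved by scanning it."""
    for stored_key, row in buckets[address(key)]:
        if stored_key == key:
            return row
    return None


store(1001, {"name": "Ann"})
store(1002, {"name": "Bob"})
print(fetch(1002))
```

Retrieval by exact key is fast because the address is computed rather than searched for, but rows are not kept in key order, so sequential processing by key is not supported directly.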


Page 36: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Designing Controls for Files

Backup Techniques
– Periodic backup of files
– Transaction log or audit trail
– Change log

Data Security Techniques
– Coding or encrypting
– User account management
– Prohibiting users from working directly with the data. Users work with a copy which updates the files only after validation checks

Page 37: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Summary

Key Terms
– Relation
– Primary key
– Normalization
– Functional dependency
– Foreign key
– Referential integrity
– Field
– Data type
– Denormalization
– File organization
– Index
– Secondary key

Page 38: Lecture 12 Designing Databases 12.1 COSC4406: Software Engineering

Summary

Transforming E-R diagram into well-structured relations

View integration

Storage formats for database fields

Efficient database table design
– Efficient use of secondary storage
– Data processing speed

File organization

Indexes