
1

Satya © 2004

Database Design - Satya

2

"Normalization is a logical concept, performance is determined at the physical level. Therefore, it is impossible to denormalize for performance."

Fabian Pascal, co-founder and editor of Database Debunkings (dbdebunk.com)

Architect’s buzZ word

3


“Denormalization, if necessary, should be done at the level of stored files, not at the level of base relvars.”

Chris J. Date, one of the most respected database experts in the computer industry; author of An Introduction to Database Systems.

“Denormalization is not ‘good for performance’; it is good for the performance of specific applications.”

Architect’s buzZ word

4


• Presenting concepts, not syntax.
• Presenting “How” and “What”, not “Why”, in the RDBMS.

Background

5


• Introduction

Agenda

6


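[Figure: timeline of information milestones: cave paintings and bone tools (40,000 BCE), writing (3500 BCE), paper (105), printing (1450), electricity and telephone (1870), transistor (1947), computing (1950), the Internet (DARPA, late 1960s) and the web (1993), reaching 1½ billion gigabytes of information in 1999]

Source: UC Berkeley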

Ready for the data eXplosion?

7


[Figure: the same timeline extended with yearly information volumes doubling each year: 3 billion gigabytes (2000), 6 billion (2001), 12 billion (2002) and 24 billion (2003)]

Source: UC Berkeley

The coming content - “Big Bang”

8


• Terabytes of data
  – A common corporate expression
  – Petabytes (10^15 bytes) and exabytes (10^18 bytes) are fast approaching
• 2-3 exabytes = total volume of all information generated worldwide annually
  – Structure is needed to handle large volumes of data efficiently.

Source: 2001 - IBM Informix Conference, Las Vegas.

Future data size

9


• An organized store of information
  – Flat files
  – Hierarchical databases
  – Network databases
  – Relational databases
  – Object relational databases
  – Object databases

Examples:
  – Adabas, FileMaker
  – IBM’s Information Management System (IMS), used in the Apollo Moon landing program
  – GE’s Integrated Data Store (IDS)
  – Oracle, Db2, Sybase, Microsoft SQL Server, Postgres
  – Oracle 9
  – Cloudscape

Database – the solution

10


• Project identification and selection
• Project initiation and planning
• Analysis
• Logical design
• Physical design
• Implementation
• Maintenance

Database development activities shown alongside these phases: enterprise modeling, conceptual data modeling

Database development activities during the systems development life cycle (SDLC)

11


Database Application Lifecycle

• Database planning
• Systems definition
• Requirements analysis
• DB design
  – Conceptual design
  – Logical design
  – Physical design
• DBMS selection
• Application design
• Prototyping
• Implementation
• Data loading / migration
• Testing
• Maintenance

12


Conceptual design (DBMS independent)
• Data analysis and requirements: determine end-user views, outputs, and transaction processing requirements.
• Entity relationship modeling and normalization: define entities, attributes and relationships; draw ER diagrams; normalize tables.
• Data model verification: identify main processes and insert, update and delete rules; validate reports, queries, views, integrity, sharing and security.
• Distributed database design: define the location of tables, access requirements and fragmentation strategy.

DBMS software selection

Logical design (DBMS dependent): translate the conceptual model into definitions for tables, views and so on.

Physical design (hardware dependent): define storage structures and access paths for optimum performance.

Database design flow

13


• Entity-Relationship (ER) data modeling
  – A graphical technique for understanding and organizing the data independently of the eventual database implementation
• Normalization
  – An algorithmic process for evaluating the quality of a database design; most applicable to relational database designs
• Types of models
  – Models (of databases or anything else) can be built at different levels of abstraction
  – For databases (following the text):
    • Conceptual (logical): ER models (represent semantics)
    • Internal: for the chosen DBMS
    • External: the way users see the data
    • Physical: for the actual physical storage

Design Approach

14


ER Modeling

15


• The concepts upon which ER models are built are:
  – Entities (or, more correctly, entity types), also called “relvars”, “base relvars” or “relations”; at the physical implementation level, called tables
  – Relationships (between entities)
  – Attributes (of entities and relationships)

Entity-Relationship Basics: ER Modeling Concepts

16


• An entity is “a person, place, event, or thing for which we intend to collect data”.
• Normally a database will contain data about groups of similar entities (e.g. students, subjects, licenses, aircraft or whatever).
• These groups of similar entities are referred to as entity types, but this is often shortened to just “entity” or “entities”.

Entities & Entity types

17


• Entity types are conventionally named in the singular.
• Attributes are represented on ER diagrams as ellipses attached to the relevant entity type symbol.
• There are other notations as well (e.g. a list of attributes next to the entity type symbol), but they are conceptually equivalent.

[Figure: entity type student with attributes studentNumber, Name, DOB, Address and Gender]

Entity types & Attributes

18


• A relationship is an association between entity types.
• Relationships are represented by diamond-shaped symbols on ER diagrams.
• A descriptive name is placed inside the relationship symbol.

[Figure: relationship “enrolls in” connecting the entity types student and subject]

Relationships

19


• Entity type names are usually nouns.
• Relationship names are usually, though not always, verbs (or verb phrases).
• Most relationships are binary (i.e. they connect two entity types), like “enrolls in”.
• Other types of relationships are possible.

Relationship & entity

20


Degree of a Relationship

• The degree of a relationship is the number of entity types that it connects:
  – One: unary
  – Two: binary
  – Three: ternary
• Relationships of degree higher than three are rare.

[Figure: unary relationship (employee supervises employee); binary relationship (student enrolls in subject); ternary relationship sale connecting vendor, purchaser and item, shown next to an alternative model with three binary sale relationships]

21


Relationship Connectivity (Cardinality)

• Relationships can have different connectivities:
  – one-to-one (1:1)
  – one-to-many (1:N)
  – many-to-many (M:N)
• Connectivity is indicated on the ER diagram by placing an appropriate symbol on each “leg” of the relationship.

[Figure: connectivity symbols on the relationships employee supervises employee (supervisor), student enrolls in subject, and lecturer teaches subject]

22


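[Figure: entity types E and F connected by relationship R, drawn for each of the three connectivities below]

One-to-one relationship: min-card(E, R) = 0, max-card(E, R) = 1; min-card(F, R) = 0, max-card(F, R) = 1

Many-to-one relationship: min-card(E, R) = 0, max-card(E, R) = N; min-card(F, R) = 1, max-card(F, R) = 1

Many-to-many relationship: min-card(E, R) = 0, max-card(E, R) = N; min-card(F, R) = 0, max-card(F, R) = N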
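The min-card and max-card values above can be read off a sample relationship instance. The sketch below is minimal and makes two assumptions: R is held as a list of (e, f) pairs, and both entity sets are listed explicitly. The function `card` and the department/employee data are hypothetical. A sample instance can only show that the data is consistent with a cardinality constraint; the constraint itself comes from the business rules.

```python
from collections import Counter

def card(entities, pairs, side):
    """Return (min-card, max-card) of one entity set in relationship R.

    entities: every instance of the entity set (E or F)
    pairs:    the relationship instances of R as (e, f) tuples
    side:     0 to count participation of E, 1 to count participation of F
    """
    counts = Counter(p[side] for p in pairs)
    # Entities that never appear in R participate zero times.
    per_entity = [counts.get(x, 0) for x in entities]
    return min(per_entity), max(per_entity)

# Hypothetical sample: departments (E) and employees (F).
# Each employee works in exactly one department; a department may have none.
E = ["sales", "research", "legal"]
F = ["ann", "bob", "cy"]
R = [("sales", "ann"), ("sales", "bob"), ("research", "cy")]

print("E:", card(E, R, 0))  # (0, 2): legal has no employees, sales has two
print("F:", card(F, R, 1))  # (1, 1): every employee has exactly one department
```

These results match the many-to-one pattern: min-card(E, R) = 0, max-card(E, R) = N, and min-card(F, R) = max-card(F, R) = 1.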

23


Relationship Participation

• Entity types connected by a relationship can have two kinds of “participation” in it:
  – Partial (or optional)
  – Total (or mandatory)
• “Total” means that every entity instance must be connected (through the relationship) to an instance of the other participating entity type(s).
• “Partial” means not total.

[Figure: 1:1 relationship “Head of” between staff and department]

24


Key Attribute(s)

• There will normally be one, or perhaps several, attributes that will be unique for every entity instance. Example:
  – Every student will have a unique student number.
• Such an attribute (or combination) is called a key.
• If the key for an entity set consists of two or more attributes in combination, it is called a concatenated key.
• Key attribute(s) are underlined on the ER diagram.

[Figure: entity type person with key attribute Number and attributes Name, DOB, Address, Gender, Qualification and Age]

25


Derived, Multi-valued attributes

• Sometimes it is useful to have, on the ER diagram, attributes that can be derived from other attributes. Example:
  – An attribute Age can be derived from an attribute DOB and the current date.
• Derived attributes can be indicated on the ER diagram by using a dashed ellipse and connecting line to the relevant entity type.
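A derived attribute such as Age is computed on demand rather than stored. The sketch below is minimal and assumes age counts whole years. It also takes an explicit reference date, standing in for "the current date", so the result is reproducible. The function name `age_on` is hypothetical.

```python
from datetime import date

def age_on(dob: date, today: date) -> int:
    """Derive Age from DOB: whole years elapsed on the given date."""
    years = today.year - dob.year
    # Subtract one if this year's birthday has not been reached yet.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years

print(age_on(date(1980, 6, 15), date(2004, 6, 14)))  # 23
print(age_on(date(1980, 6, 15), date(2004, 6, 15)))  # 24
```

Under this rule, someone born on 29 February reaches the next year of age on 1 March in non-leap years.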

26


Relationship attributes

• A relationship is an association between entity sets.
• Relationships can also have attributes.
• An attribute of a relationship is drawn attached to the relationship diamond.
• Usually only M:N relationships have attributes.

[Figure: M:N relationship supervises on employee, with the relationship attribute Task]

27


Strong & Weak entities/entity types

•Sometimes the instances of one entity type depend, for their unique identification, on their relationship to the instances of another entity type

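[Figure: strong entity type building (attribute Name) and weak entity type room (attribute Number), connected by the relationship “consists of”]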

28


Supertypes & Subtypes

• Sometimes notionally different entity types are really specializations of a more general entity type. Example:
  – Trucks, cars, motorcycles, buses and taxis are all motor vehicles.
  – Some attributes are common to all; others are specific to one group.
• This kind of situation can be dealt with using a generalization hierarchy (or supertype/subtype hierarchy).
• The attributes that are common belong to the supertype.
• The attributes that are specific are attached to the relevant subtype.

29


Supertypes & Subtypes

[Figure: supertype motor vehicle with disjoint (d) subtypes truck, car and bus; Registration and Seats appear as attributes, and each subtype carries its own specific attributes]

30


Supertypes & Subtypes

[Figure: supertype employee (attributes TFN, DOB, Gender, Address) with overlapping (o) subtypes safety officer, engineer and pilot, each carrying its own specific attributes]

31


Three-schema architecture for database development

External level (individual user views): external schemas, for example in COBOL

    01 EMPC.
       02 EMPNO  PIC X(6).
       02 DEPTNO PIC X(4).

and in XML Schema

    <xsd:element name="Emp">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="Eno" type="xsd:string"/>
          <xsd:element name="Dno" type="xsd:string"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>

Conceptual level (community user view): the conceptual schema, a neutral view

    EMPLOYEE
      EMPLOYEE_NUMBER   CHARACTER(6)
      DEPARTMENT_NUMBER CHARACTER(4)
      SALARY            NUMERIC(5)

Internal level (storage view): the internal schema

    STORED_EMP BYTES=20
      PREFIX TYPE=BYTE(6),  OFFSET=0
      EMP#   TYPE=BYTE(6),  OFFSET=6,  INDEX=EMPX
      DEPT#  TYPE=BYTE(4),  OFFSET=12
      PAY    TYPE=FULLWORD, OFFSET=16
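The internal schema describes a 20-byte stored record that the DBMS maps to and from the conceptual EMPLOYEE record. The sketch below uses Python's `struct` module and rests on three assumptions: FULLWORD means a 4-byte big-endian signed integer (as on IBM mainframes), character fields are padded with spaces, and the prefix bytes are zero. The prefix content, the helper names and the sample values are hypothetical, and the EMPX index is not modeled. Because applications see only the conceptual record, the byte layout could change without affecting them (physical data independence).

```python
import struct

# Internal level: PREFIX BYTE(6) @0, EMP# BYTE(6) @6, DEPT# BYTE(4) @12,
# PAY FULLWORD @16. ">" means big-endian with no alignment padding.
STORED_EMP = struct.Struct(">6s6s4si")

def to_stored(employee_number: str, department_number: str, salary: int) -> bytes:
    """Map a conceptual EMPLOYEE record onto the 20-byte stored record."""
    prefix = b"\x00" * 6  # hypothetical control bytes
    return STORED_EMP.pack(
        prefix,
        employee_number.encode("ascii").ljust(6),   # CHARACTER(6)
        department_number.encode("ascii").ljust(4),  # CHARACTER(4)
        salary,                                      # NUMERIC(5) fits a fullword
    )

def from_stored(raw: bytes) -> dict:
    """Map the stored record back to the conceptual view (prefix dropped)."""
    _prefix, emp, dept, pay = STORED_EMP.unpack(raw)
    return {
        "EMPLOYEE_NUMBER": emp.decode("ascii").rstrip(),
        "DEPARTMENT_NUMBER": dept.decode("ascii").rstrip(),
        "SALARY": pay,
    }

raw = to_stored("E00123", "D042", 45000)
print(STORED_EMP.size)   # 20
print(from_stored(raw))
```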

32


Normalization

33


Levels of normalization

5NF relvars

4NF relvars

BCNF relvars

3NF relvars

2NF relvars

1NF relvars (normalized entities)

34


Normalization - Keys

Superkey: a set of one or more attributes that, taken collectively, allows us to identify an entity uniquely.

Candidate key: a superkey that is irreducible, i.e. no proper subset of it is also a superkey.

Primary key: one candidate key, chosen arbitrarily from the set of candidate keys, to be used in an index for that table.

Source: Database Modeling & Design – Toby J. Teorey
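Whether a set of attributes is a superkey can be tested with the attribute closure. A set X is a superkey exactly when every attribute is functionally dependent on X. The sketch below is minimal and assumes functional dependencies (FDs) are given as (left side, right side) pairs of sets. The function names and the Student example with its FDs are hypothetical. Enumerating subsets is exponential, so this approach suits only small teaching examples.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever lhs is already inside the closure.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Irreducible superkeys, found by testing subsets from smallest upward."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            # A superset of a key already found is a superkey but reducible.
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys

# Hypothetical relvar Student with two ways of identifying a student.
ATTRS = {"StudentNumber", "Email", "Name", "DOB"}
FDS = [
    ({"StudentNumber"}, {"Email", "Name", "DOB"}),
    ({"Email"}, {"StudentNumber"}),
]

print(candidate_keys(ATTRS, FDS))  # [{'Email'}, {'StudentNumber'}]
```

Either candidate key could serve as the primary key. The other remains an alternate key.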

35


Normalization – 1nf

Sources: Administration Guide: Planning (DB2); Database Systems (C.J. Date)

First normal form (1NF):

Defn: A relvar is in 1NF if and only if, in every legal value of that relvar, every tuple contains exactly one value for each attribute.

Explanation: At each row and column position in the table, there exists one value, never a set of values.

Essence: Every row/column intersection holds a single, atomic value.

Violation of 1NF: Employee (EID#, Name, SkillSet, Address1, Address2)

• SkillSet stores comma-separated values (e.g. C, VisualBasic, Oracle).
• Address1 and Address2 form a repeating group: how many more addresses can be stored in this fashion?
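The usual repair for the SkillSet violation is to move the repeating group into its own relvar, with one value per row. The sketch below works over in-memory rows and assumes a hypothetical EmployeeSkill(EID#, Skill) relvar with hypothetical sample data. Address1/Address2 could be handled the same way with a separate address relvar.

```python
# A 1NF violation: SkillSet packs several values into one column.
employees = [
    {"EID#": 1, "Name": "Ravi", "SkillSet": "C, VisualBasic, Oracle"},
    {"EID#": 2, "Name": "Mei", "SkillSet": "Oracle"},
]

def to_1nf(rows):
    """Split the repeating group into an Employee table and an EmployeeSkill
    table that holds exactly one (EID#, Skill) pair per row."""
    employee = [{"EID#": r["EID#"], "Name": r["Name"]} for r in rows]
    employee_skill = [
        {"EID#": r["EID#"], "Skill": skill.strip()}
        for r in rows
        for skill in r["SkillSet"].split(",")
        if skill.strip()  # ignore empty entries such as a trailing comma
    ]
    return employee, employee_skill

employee, employee_skill = to_1nf(employees)
for row in employee_skill:
    print(row)
```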

36


Normalization – 2nf

Sources: Administration Guide: Planning (DB2); Database Systems (C.J. Date)

Second normal form (2NF): (assuming one candidate key, which we assume is the primary key)

Defn: A relvar is in 2NF if and only if it is in 1NF and every nonkey attribute is irreducibly dependent on the primary key.

Explanation: Each column that is not part of the key depends on the whole key, not on just part of it.

Essence: All non-keys must depend on the entire key value.

Violation of 2NF: WarehouseParts (PART#, WAREHOUSE#, Qty, WHAddr), with primary key {PART#, WAREHOUSE#}

• {PART#, WAREHOUSE#} → Qty (depends on the whole key)
• WAREHOUSE# → WHAddr (depends on only part of the key, which violates 2NF)
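An FD can be tested against sample rows: X → Y holds in an instance when rows that agree on X also agree on Y. Sample data can refute an FD but never prove it, because FDs come from business rules. The sketch below is minimal and assumes rows are Python dicts. The helpers `fd_holds` and `project` and the warehouse data are hypothetical. The sketch confirms the partial dependency WAREHOUSE# → WHAddr, then projects WHAddr into its own relvar keyed by WAREHOUSE# alone.

```python
def fd_holds(rows, lhs, rhs):
    """True if, in this instance, equal lhs values always come with equal rhs values."""
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        val = tuple(r[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    out = []
    for r in rows:
        t = {a: r[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out

# Hypothetical WarehouseParts instance; the key is {PART#, WAREHOUSE#}.
warehouse_parts = [
    {"PART#": "P1", "WAREHOUSE#": "W1", "Qty": 100, "WHAddr": "1 Dock Rd"},
    {"PART#": "P2", "WAREHOUSE#": "W1", "Qty": 40, "WHAddr": "1 Dock Rd"},
    {"PART#": "P1", "WAREHOUSE#": "W2", "Qty": 75, "WHAddr": "9 Port St"},
]

print(fd_holds(warehouse_parts, ["WAREHOUSE#"], ["WHAddr"]))  # True: partial dependency
print(fd_holds(warehouse_parts, ["PART#"], ["Qty"]))          # False: Qty needs the whole key

# 2NF decomposition: WHAddr moves to a relvar keyed by WAREHOUSE# alone.
stock = project(warehouse_parts, ["PART#", "WAREHOUSE#", "Qty"])
warehouse = project(warehouse_parts, ["WAREHOUSE#", "WHAddr"])
print(warehouse)
```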

37


Normalization – 3nf

Sources: Administration Guide: Planning (DB2); Database Systems (C.J. Date)

Third normal form (3NF): (assuming one candidate key, which we assume is the primary key)

Defn: A relvar is in 3NF if and only if it is in 2NF and every nonkey attribute is nontransitively dependent on the primary key.

Note: “No transitive dependencies” implies no mutual dependencies.

Explanation: Each column that is not part of the key depends on the key directly, not through another non-key column.

Essence: All non-keys must depend “only” on the key value and on no other non-key.

Violation of 3NF: Emp_Dept (EID#, FirstName, LastName, WorkDept, DeptName)

• EID# → WorkDept and WorkDept → DeptName, so DeptName depends on the key only transitively, through the non-key WorkDept.
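Decomposing Emp_Dept into Emp(EID#, FirstName, LastName, WorkDept) and Dept(WorkDept, DeptName) removes the transitive dependency. The sketch below is minimal and assumes dict rows and hypothetical data. It performs both projections, then checks that a natural join on WorkDept rebuilds the original rows, i.e. that the decomposition is lossless for this instance. The helper names are hypothetical.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    out = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out

def natural_join(left, right):
    """Join two lists of dict rows on all attributes they share."""
    if not left or not right:
        return []
    common = set(left[0]) & set(right[0])
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in common)
    ]

# Hypothetical Emp_Dept instance: EID# -> WorkDept -> DeptName.
emp_dept = [
    {"EID#": 1, "FirstName": "Ann", "LastName": "Lee", "WorkDept": "D1", "DeptName": "Sales"},
    {"EID#": 2, "FirstName": "Bob", "LastName": "Kim", "WorkDept": "D1", "DeptName": "Sales"},
    {"EID#": 3, "FirstName": "Cy", "LastName": "Roy", "WorkDept": "D2", "DeptName": "Research"},
]

emp = project(emp_dept, ["EID#", "FirstName", "LastName", "WorkDept"])
dept = project(emp_dept, ["WorkDept", "DeptName"])  # "Sales" is now stored once

print(natural_join(emp, dept) == emp_dept)  # True
```

In Emp_Dept, renaming the Sales department means updating every employee row in it. After decomposition, it is a single-row update in Dept. Losslessness in general follows from WorkDept being the key of Dept, not from this one instance.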

38


Normalization - bcnf

Boyce/Codd normal form (BCNF): (assuming a composite candidate key as the primary key)

Defn: A relvar is in BCNF if and only if every nontrivial, left-irreducible FD has a candidate key as its determinant.

Explanation: Every attribute set that determines another attribute, whether key or non-key, must itself be a candidate key. A relvar in 3NF can fail BCNF only if it has two or more composite candidate keys that overlap.

Essence: Every determinant must be a candidate key.

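Violation of BCNF: HotelRoom (HNo#, Room#, RoomType)

RoomType ↛ Room#, but RoomType → HNo#: RoomType determines part of the composite key {HNo#, Room#} without being a candidate key itself.

Source: Database Systems – C.J. Date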
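BCNF can be checked mechanically with the attribute closure: any nontrivial FD whose determinant's closure is not the whole heading is a violation. The sketch below is minimal and assumes the FDs {HNo#, Room#} → RoomType and RoomType → HNo# (each room type belongs to a single hotel). These FDs and the helper names are assumptions for illustration. Under them the candidate keys are {HNo#, Room#} and {Room#, RoomType}. HotelRoom is therefore in 3NF, because HNo# belongs to a candidate key, but not in BCNF.

```python
def closure(attrs, fds):
    """Attribute closure: all attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Nontrivial FDs whose determinant is not a superkey of the relvar."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != set(attributes)
    ]

# Assumed FDs for HotelRoom (illustrative, not taken from the source).
HOTEL_ROOM = {"HNo#", "Room#", "RoomType"}
FDS = [
    ({"HNo#", "Room#"}, {"RoomType"}),
    ({"RoomType"}, {"HNo#"}),  # each room type belongs to one hotel
]

print(bcnf_violations(HOTEL_ROOM, FDS))  # [({'RoomType'}, {'HNo#'})]
```

The usual BCNF decomposition into (RoomType, HNo#) and (Room#, RoomType) removes the violation. The cost is that {HNo#, Room#} → RoomType can no longer be enforced by a key within a single relvar.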

39


Normalization – 4nf

Sources: Administration Guide: Planning (DB2); Database Systems (C.J. Date)

Fourth normal form (4NF):

Defn: Relvar R is in 4NF if and only if, whenever there exist subsets A and B of the attributes of R such that the nontrivial MVD $A \twoheadrightarrow B$ is satisfied, all attributes of R are also functionally dependent on A.

Explanation: No row contains two or more independent multi-valued facts about an entity.

Essence: Two independent multi-valued facts about an entity should not be stored in the same relvar.

Violation of 4NF: Emp_Skill (EID#, SkillName#, Language#). An employee's skills and languages are independent multi-valued facts, i.e. $\mathrm{EID\#} \twoheadrightarrow \mathrm{SkillName\#} \mid \mathrm{Language\#}$.
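A multivalued dependency (MVD) from A to B can be tested on sample rows. For each A value, the combinations of B values and remaining values must form a full cross product. The sketch below is minimal and assumes dict rows and a hypothetical Emp_Skill instance in which employee 1 has two skills and two languages. EID# alone is not a superkey, so the MVD that holds here means Emp_Skill is not in 4NF. Splitting it into Emp_Skill(EID#, SkillName#) and Emp_Language(EID#, Language#) removes the redundancy.

```python
from collections import defaultdict

def mvd_holds(rows, a, b):
    """Test the MVD a ->> b on an instance of dict rows.

    For every value of a, the observed (b, rest) pairs must be the full
    cross product of that group's b values and rest values.
    """
    if not rows:
        return True
    rest = [x for x in rows[0] if x not in a and x not in b]
    groups = defaultdict(set)
    for r in rows:
        groups[tuple(r[x] for x in a)].add(
            (tuple(r[x] for x in b), tuple(r[x] for x in rest))
        )
    for pairs in groups.values():
        b_values = {p[0] for p in pairs}
        rest_values = {p[1] for p in pairs}
        # pairs is always a subset of the cross product, so compare sizes.
        if len(pairs) != len(b_values) * len(rest_values):
            return False
    return True

# Hypothetical Emp_Skill instance: employee 1 has 2 skills and 2 languages,
# so all 2 x 2 = 4 combinations must be stored.
emp_skill = [
    {"EID#": 1, "SkillName#": "C", "Language#": "English"},
    {"EID#": 1, "SkillName#": "C", "Language#": "French"},
    {"EID#": 1, "SkillName#": "Oracle", "Language#": "English"},
    {"EID#": 1, "SkillName#": "Oracle", "Language#": "French"},
]

print(mvd_holds(emp_skill, ["EID#"], ["SkillName#"]))      # True
print(mvd_holds(emp_skill[:3], ["EID#"], ["SkillName#"]))  # False: a combination is missing
```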

40


Normalization – 5nf (pjnf)

Source: Database Systems – C.J. Date

Fifth normal form (5NF):

Defn: Relvar R is in 5NF (also called projection-join normal form) if and only if every nontrivial join dependency that holds for R is implied by the candidate keys of R.

Explanation: A relvar in 5NF cannot be decomposed losslessly into smaller projections, except in ways that already follow from its candidate keys. For example, R{A, B, C} satisfies the JD $*\{AB, AC\}$ if and only if the MVDs $A \twoheadrightarrow B$ and $A \twoheadrightarrow C$ hold in R:

$$A \twoheadrightarrow B \mid C \iff {*}\{AB, AC\}$$

Essence: Every join dependency must follow from the candidate keys.

Violation of 5NF:
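The slide leaves the 5NF violation blank. One common illustration is a supplier-part-project relvar SPJ(S#, P#, J#) under a cyclic business rule. Such a relvar equals the join of its three binary projections, but not the join of any two of them. The sketch below is minimal and assumes a hypothetical SPJ instance. Its `jd_holds` function tests a join dependency by rebuilding the join of the projections over the values that occur in the relvar. The only candidate key of SPJ is all three attributes, so a JD that holds here is not implied by the keys, and such an SPJ is not in 5NF.

```python
from itertools import product

def jd_holds(rel, attrs, components):
    """True if rel equals the natural join of its projections onto components.

    rel:        a set of tuples, one value per attribute in attrs
    components: attribute groups that together cover every attribute
    """
    def proj(t, comp):
        return tuple(t[attrs.index(a)] for a in comp)

    projections = [{proj(t, c) for t in rel} for c in components]
    # The join can only use values that occur in rel (its active domain).
    domains = [{t[i] for t in rel} for i in range(len(attrs))]
    joined = {
        t for t in product(*domains)
        if all(proj(t, c) in p for c, p in zip(components, projections))
    }
    return joined == set(rel)

# Hypothetical SPJ instance: suppliers S#, parts P#, projects J#.
ATTRS = ("S#", "P#", "J#")
SPJ = {("S1", "P1", "J2"), ("S1", "P2", "J1"), ("S2", "P1", "J1"), ("S1", "P1", "J1")}
CYCLIC = [("S#", "P#"), ("P#", "J#"), ("J#", "S#")]

print(jd_holds(SPJ, ATTRS, CYCLIC))                        # True
print(jd_holds(SPJ, ATTRS, [("S#", "P#"), ("P#", "J#")]))  # False: spurious rows
```

Deleting ("S1", "P1", "J1") alone makes the JD fail. A 5NF design with three binary relvars (SP, PJ, JS) avoids this anomaly.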

41


Normalization – Others

Domain Key normal form (DK/NF):

Defn: A relvar R is said to be in DK/NF if and only if every constraint on R is a logical consequence of the domain constraints and key constraints that apply to R.

Principle of Orthogonal Design (a digression): e.g. SA holds suppliers located in Paris, and SB holds suppliers not in Paris or with status 30. A supplier in Paris with status 30 satisfies both predicates, so the same row could be present in both SA and SB, giving rise to update anomalies.

SX(S#, Sname, Status), SY(S#, Sname, City)

Here SX and SY overlap on (S#, Sname), so the same supplier name is recorded twice. The principle is especially relevant to distributed database design.

Source: Database Systems – C.J. Date

42


Denormalization - Types

1. Collapsing tables
   a. Two entities in an M:N relationship: to avoid frequent joins.
   b. Two entities in a 1:1 relationship: to avoid updates to two separate entities.
2. Reference data in a 1:M relationship (add redundant columns): when large composite or derived keys are used, they can be added to the child entity in a 1:M relationship as a foreign key, again to avoid certain join operations.
3. Entities with the most detailed data: when MVDs or a temporal design are in place, summarized data about the MVD attribute or the temporal dimension (e.g. months) can be stored.
4. Derived attributes: when an attribute is a function of another, it can be better to store the derived attribute as well (e.g. SearchName: for y = f(x), store both x and y in R).
5. Splitting tables (horizontal / vertical splitting)

Source: Denormalization effects on Performance of RDBMS, G. Lawrence Sanders, Seungkyoon Shin, State University of New York, Buffalo

43


Denormalization – Criteria

Criteria
• General application performance requirements, as indicated by business needs
• On-line response time requirements for application queries, updates and processes
• Minimum number of data access paths
• Minimum amount of storage

Source: Database Modeling & Design – Toby J. Teorey

44


Denormalization – Alternatives

Alternatives
• Application performance criteria
• Future application development and maintenance considerations
• Volatility of application requirements
• Relations between transactions and relations of entities involved
• Transaction type (update/query, OLTP/OLAP)
• Transaction frequency
• Access paths needed by each transaction
• Number of rows accessed by each transaction
• Number of pages/blocks accessed by each transaction
• Cardinality of each relation

Source: Database Modeling & Design – Toby J. Teorey

45


Data Modeling

46


Diagramming Notations

Notations
• Bachman notation
• Chen ERD
• Database Model Diagram

47


Diagramming Notations

48


Diagramming Notations: Database Model Diagram

49


Diagramming Notations

ER Source Model

50


Diagramming Notations - IDEF1X Notation

Attribute and Primary Key Syntax

[Figure: an entity box labelled Entity-name/Entity-number, with the primary-key attributes listed above a dividing line and the remaining attributes ([Attribute-Name] ...) below it]

Relationship Cardinality
• zero, one or more: no symbol
• one or more: P
• zero or one: Z
• exactly n: n
• from n to m: n-m
• reference to note (n) where cardinality is specified: (n)

Source: IDEF1X Formalization, 1993, Robert G. Brown (The Database Design Group)

51


Diagramming Notations - IDEF1X Notation

Identifying Relationship

[Figure: parent Entity-A (primary key Key-Attribute-A) joined by Relationship Name to child Entity-B, whose primary key is Key-Attribute-B plus Key-Attribute-A (FK)]

* The Parent Entity in an Identifying Relationship may be an Identifier-Independent Entity (as shown) or an Identifier-Dependent Entity, depending upon other relationships.

** The Child Entity in an Identifying Relationship is always an Identifier-Dependent Entity.

Mandatory Non-Identifying Relationship

[Figure: parent Entity-A (primary key Key-Attribute-A) joined by Relationship Name to child Entity-B, whose primary key is Key-Attribute-B and which carries Key-Attribute-A (FK) as a non-key attribute]

* The Parent Entity in a Mandatory Non-Identifying Relationship may be an Identifier-Independent Entity (as shown) or an Identifier-Dependent Entity, depending upon other relationships.

** The Child Entity in a Mandatory Non-Identifying Relationship will be an Identifier-Independent Entity unless the entity is also a Child Entity in some Identifying Relationship.

Source: IDEF1X Formalization, 1993, Robert G. Brown (The Database Design Group)

52


Diagramming Notations - IDEF1X Notation

Optional Non-Identifying Relationship

[Figure: parent Entity-A (primary key Key-Attribute-A) joined by Relationship Name to child Entity-B, whose primary key is Key-Attribute-B and which carries Key-Attribute-A (FK) as an optional non-key attribute]

* The Parent Entity in an Optional Non-Identifying Relationship may be an Identifier-Independent Entity (as shown) or an Identifier-Dependent Entity, depending upon other relationships.

** The Child Entity in an Optional Non-Identifying Relationship will be an Identifier-Independent Entity unless the entity is also a Child Entity in some Identifying Relationship.

Source: IDEF1X Formalization, 1993, Robert G. Brown (The Database Design Group)

53


Diagramming Notations - IDEF1X Notation

[Figure: domain hierarchy with the base domain Frequency and typed domains Radio Frequency, Audio Frequency, Ultra High Frequency (UHF), Very High Frequency (VHF), High Frequency (HF), Ultra-Sonic, Sonic and Sub-Sonic]

Domain Hierarchy

Source: IDEF1X Formalization, 1993, Robert G. Brown (The Database Design Group)

54


Diagramming Notations - IDEF1X Notation

Team Organization

[Figure: IDEF1X modeling team roles: Project Manager, Modeler, Source, Expert, and Acceptance Review Committee]

Source: IDEF1X Formalization, 1993, Robert G. Brown (The Database Design Group)

55


Diagramming Notations - IDEF1X Diagram

[Figure: IDEF1X metamodel, with entities including idef1xObject, domain, baseDomain, typedDomain, aliasDomain, entity, aliasEntity, view, viewEntity, viewEntityAttribute, primaryKeyAttribute, alternateKey, alternateKeyAttribute, cluster, category, connectionRelationship and connectionForeignKeyAttribute, linked by relationships such as “appears in”, “contains”, “is parent in / is child in”, “is generic in” and “is discriminator for”]

IDEF1X Diagram

Source: IDEF1X Formalization, 1993, Robert G. Brown (The Database Design Group)

56


Diagramming Guidelines

• Identify layout conventions
• Analyze information requirements for attributes
• Model attributes
• Identify multi-valued attributes
• Validate attributes
• Identify common and derived data
• Understand the use of domains
• Identify the components of a data warehouse

Source: Adding Detail to the Diagram - Annette Scott, Oracle

57


Lay Out the ER Diagram

• Neat and tidy

• Unambiguous text

• Memorable patterns


Source: Adding Detail to the Diagram - Annette Scott, Oracle

58


Layout Guidelines

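Dead Crows Fly East!

(Crow's feet should point up, like a dead crow's, or to the left, like a crow flying east, so the entities at the "many" end of a relationship sit above or to the left.)

Source: Adding Detail to the Diagram - Annette Scott, Oracle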

59


Attributes

Badge Number - Identifies an employee

Name - Qualifies an employee

Payroll category (weekly or salaried) - Classifies an employee

Date of birth - Quantifies an employee

Employment status (active, leave, terminated) - Expresses the status of an employee

Source: Adding Detail to the Diagram - Annette Scott, Oracle

60


Finding Attributes

Is this attribute really needed?

Beware of obsolete requirements from previous systems.

Beware of derived data.

Source: Adding Detail to the Diagram - Annette Scott, Oracle

61


Attribute Diagramming Conventions

EMPLOYEE

badge num
first name
last name
payroll num
date of birth
employment status

• Inside the entity's soft box

• Singular

• Lowercase


Source: Adding Detail to the Diagram - Annette Scott, Oracle

62


Meaningful Components

PERSON (name) becomes PERSON (last name, first name)

ITEM (code) becomes ITEM (type, vendor, num)

Break down aggregate attributes into their meaningful components.

Source: Adding Detail to the Diagram - Annette Scott, Oracle

63


Verify for Single Value

Can an attribute have more than one value for an instance of the entity?

RENTAL (transaction date, total amount paid, item)

Yes, more than one item may be rented at a time. An entity is missing:

RENTAL (transaction date, total amount paid) and RENTAL ITEM (item num)

Source: Adding Detail to the Diagram - Annette Scott, Oracle

64


Attributes Which have Attributes

Does information need to be stored about any of the attributes?

TITLE (product code, title, description, review details)

Yes, about the review details. An entity is missing:

TITLE (product code, title, description) and REVIEW (author, comment, date recorded)

Source: Adding Detail to the Diagram - Annette Scott, Oracle

65


Finding Common or Derived Data

• Count
• Total
• Maximum, Minimum, Average
• Calculation


Derived attributes are redundant and can lead to inconsistent values

[Figure: a column of values 12, 08, 30 and 22 with a stored total of 72]

Source: Adding Detail to the Diagram - Annette Scott, Oracle

66


Attribute Optionality

Mandatory Attributes
• A value must be stored for each entity instance
• Tagged with *

Optional Attributes
• A value may be stored for each entity instance
• Tagged with o

EMPLOYEE
* badge num
* first name
* last name
o title
o weight

Source: Adding Detail to the Diagram - Annette Scott, Oracle

67


Attribute Details and Volumes

Attribute: * Engine Size
• Format type: Number
• Maximum length: 4
• Average length: 4
• Decimal places: 1
• Unit of measure: cc
• Allowable values: 900, 1000, 1500, 1800, 2000
• Volume: Initial 100%

Source: Adding Detail to the Diagram - Annette Scott, Oracle

68


Using a Domain

[Figure: a shared domain AUDIO with values Mono, Stereo and Surround (coded MON, STE, SUR), applied to the sound attributes of Movie, Game and Audio]

Source: Adding Detail to the Diagram - Annette Scott, Oracle

69


Data Warehousing

Reference data

Metadata

Load management

Warehouse management

Query management

Fact data

Summary data

Source: Adding Detail to the Diagram - Annette Scott, Oracle

70


Database Design Techniques

71


Three approaches can be followed.

Sentence Analysis:
• Ask the business user to tell ‘their story’.
• The resulting sentences serve as the basic constituents of the tasks and processes performed in the IS to be supported.
• Extract data requirements from those sentences.

Document Analysis:
• Analyze documents, including transactions and reports.
• Sources include interview results, observation results, policies and procedures, outputs of existing systems (reports and screens), inputs to existing systems (forms and screens), and database/file specifications of existing systems.

Event Analysis:
• Identify and describe what happens (the events), who is involved (actors and business resources), and what responses are required (following the Zachman interrogatives).

Design Techniques

Sources: 1) Logical Data Modeling - Salvatore T. March, 2) IDEF1x.doc

72


Sentence Analysis

• Salespeople service Customers.
• Customers place Orders through a Salesperson.
• Freight is determined when an Order is Shipped.
• Salespeople are paid commission based on their commission rate and Invoiced sales.
• Each Salesperson has a number, name, and address.
• Each Customer has a number and a bill-to-address.

• Identify subjects, verb phrases and objects.
• Specific instances must be generalized.
• If subject and object are both entities, then the verb phrase represents a relationship.
• If the subject is an entity but the object is a fact about that entity, then the object is an attribute and the verb phrase explains the meaning of the attribute.

Source: Logical Data Modeling - Salvatore T. March

73


Document Analysis

    INVOICE
    Sample Company, Inc.                     Number: 157289
    111 Any Street                           Date:   10/02/90
    Anytown, USA

    Bill To:                                 Customer Number: 0361
    Local Grocery Store                      Salesperson:     4531 – Joe Smith
    132 Local Street                         Customer PO:     3291
    Localtown, USA                           Terms:           Net 30
                                             FOB Point:       Anytown

    Line  Product  Product      Unit of   Quantity                Unit
    No.   Number   Description  Sale      Order  Ship  Backord    Price  Discount  Extension
    1     2157     Cheerios     Carton       40    40        0    50.00     5 %     1900.00
    2     2283     Oat Rings    Each        300   200      100     2.00     0 %      400.00
    3     0579     Corn Flakes  Carton       30    30        0    40.00    10 %     1080.00

                                                        Order Gross          4380.00
                                                        Tax at 6 %            262.80
                                                        Freight                50.00
                                                                            --------
                                                        Order Net            4692.80

• Each heading can be an entity, an attribute or a derived attribute.
• Relationships need to be defined from careful analysis only.

Source: Logical Data Modeling - Salvatore T. March
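The extensions and totals on the invoice are derived attributes, so they can be recomputed from the stored facts. The sketch below is minimal and rests on three assumptions: Extension = quantity shipped × unit price, less the discount (this formula reproduces each printed extension); tax is 6 % of the gross; and freight is added afterwards. The variable names are hypothetical. Recomputing gives an Order Gross of 3380.00 rather than the printed 4380.00, while the printed tax and net are consistent with 4380.00. This illustrates how stored derived values can become inconsistent.

```python
from decimal import Decimal

# Line items from the sample invoice: (quantity shipped, unit price, discount %).
LINES = [
    (40, Decimal("50.00"), 5),   # 2157 Cheerios
    (200, Decimal("2.00"), 0),   # 2283 Oat Rings (300 ordered, 100 backordered)
    (30, Decimal("40.00"), 10),  # 0579 Corn Flakes
]
TAX_RATE = Decimal("0.06")
FREIGHT = Decimal("50.00")
CENT = Decimal("0.01")

def extension(qty, unit_price, discount_pct):
    """Derived attribute: shipped quantity x unit price, less the discount."""
    return (qty * unit_price * (100 - discount_pct) / 100).quantize(CENT)

extensions = [extension(*line) for line in LINES]
gross = sum(extensions, Decimal("0"))
tax = (gross * TAX_RATE).quantize(CENT)
net = gross + tax + FREIGHT

print(extensions)       # same values as the printed Extension column
print(gross, tax, net)  # 3380.00 202.80 3632.80
```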

74


Document Analysis – Data Flow Diagram (DFD)

Source: Logical Data Modeling - Salvatore T. March

Examples of DFDs: Level 0, Level 1, Level 2, etc.

75


Event Analysis

• Define an entity for each event, and identify the associated actor.

• Hence, Place Order, Ship Order, Invoice Order, and Pay Invoice are all entities.

Source: Logical Data Modeling - Salvatore T. March

76


Design Evaluation

• Each entity must be uniquely identified.

• Attributes are associated with entities (not relationships), and each entity must have one and only one value for each of its attributes (otherwise an additional entity must be created).

• Relationships associate a pair of entities or associate an entity with itself (only binary relationships are allowed, but relationships can be recursive).

• Many-to-many relationships are not allowed.

• Subtypes are identified when the minimum degree of a relationship descriptor is zero or when an attribute does not apply to all instances of an entity.

Source: Logical Data Modeling - Salvatore T. March

77


Further Reading

• CASE*Method: Entity Relationship Modeling by Richard Barker is an excellent introduction to ER modeling.

• Handbook of Relational Database Design by Candace Fleming and Barbara von Halle goes step by step into the nuts and bolts, all the way to the physical side.

• Practical Issues in Database Management by Fabian Pascal introduces many of the perennial tough problems in data modeling. It will help assure the new data modeler that there is more to data modeling than what is supported by current commercial implementations of SQL and relational database management products.

78


Concurrency

79


Benchmarking

80


Dependability Estimation

Mean time to failure (MTTF):

Mean time to repair (MTTR):

Availability:

$$\mathrm{MTBF}_i = \mathrm{MTTF}_i + \mathrm{MTTR}_i, \qquad A_i = \frac{\mathrm{MTTF}_i}{\mathrm{MTBF}_i}$$

Reliability:

Mean transaction time:
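The availability formula can be evaluated per component i. The sketch below is minimal and assumes MTTF and MTTR are measured in hours. The component names and figures are hypothetical.

```python
def mtbf(mttf_hours, mttr_hours):
    """Mean time between failures: MTBF_i = MTTF_i + MTTR_i."""
    return mttf_hours + mttr_hours

def availability(mttf_hours, mttr_hours):
    """Fraction of time component i is up: A_i = MTTF_i / MTBF_i."""
    return mttf_hours / mtbf(mttf_hours, mttr_hours)

# Hypothetical components i with (MTTF_i, MTTR_i) in hours.
COMPONENTS = {"disk": (99_990.0, 10.0), "server": (4_996.0, 4.0)}

for name, (mttf, mttr) in COMPONENTS.items():
    print(f"{name}: MTBF = {mtbf(mttf, mttr):.0f} h, A = {availability(mttf, mttr):.4f}")
# disk: MTBF = 100000 h, A = 0.9999
# server: MTBF = 5000 h, A = 0.9992
```

Shortening the repair time raises availability just as lengthening the time to failure does, because both enter through MTBF_i.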

81


Data Warehousing - Concepts

82


Database types

83


• Relational
• Network
• Hierarchical
• Object-Oriented
• Spatial-Geographic
• Multimedia
• Temporal
• Text
• Active
• Real Time

84


Database – J2EE

85


Database - .NET