distributed database systems


Upload: shelly-golden

Post on 02-Jan-2016



TRANSCRIPT

Page 1: Distributed Database Systems

1

Distributed Database Systems

Page 2: Distributed Database Systems

2

A Distributed Database on a Geographically Dispersed Network

Page 3: Distributed Database Systems

3

A Distributed Database on a Local Network

Page 4: Distributed Database Systems

4

A Multi-Processor System

Page 5: Distributed Database Systems

5

Types of Accesses to a Distributed Database

Page 6: Distributed Database Systems

6

Distributed Access Plan

1) At site 1: Send sites 2 and 3 the supplier number SN.

2) At sites 2 and 3: Execute in parallel, upon receipt of the supplier number, the following program:

Find all PARTS records having SUP# = SN; send result to site 1.

3) At site 1: Merge results from sites 2 and 3; output the result.
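The three steps of the access plan can be sketched in Python. This is a minimal illustration, not the system's actual mechanism: the in-memory `SITE_PARTS` data, the function names, and the use of threads to stand in for parallel remote sites are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical PARTS fragments held at sites 2 and 3 (illustrative data only).
SITE_PARTS = {
    2: [{"part": "P1", "sup": "S1"}, {"part": "P2", "sup": "S2"}],
    3: [{"part": "P3", "sup": "S1"}, {"part": "P4", "sup": "S3"}],
}

def find_parts_at_site(site, sn):
    """Step 2: the program each remote site runs on receipt of SN."""
    return [rec for rec in SITE_PARTS[site] if rec["sup"] == sn]

def access_plan(sn):
    """Steps 1 and 3, as run at site 1."""
    sites = sorted(SITE_PARTS)
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        # Step 1: send SN to sites 2 and 3; the sites work in parallel.
        partials = list(pool.map(lambda s: find_parts_at_site(s, sn), sites))
    # Step 3: merge the partial results into one answer.
    return [rec for partial in partials for rec in partial]

result = access_plan("S1")
```

Here `result` holds the PARTS records for supplier `S1` gathered from both sites, in site order.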

Page 7: Distributed Database Systems

7

Page 8: Distributed Database Systems

8

Components of a Commercial DDBMS

Page 9: Distributed Database Systems

9

Data Distribution

Problem: Choose a unit of the logical database to use for assignment to data modules.

Possibilities:

Relations: Distribution issues will influence logical database design.
Columns: Distribution issues will influence logical database design.
Rows: Too many; directories become too large.
Data Items: Too many; directories become too large.

Page 10: Distributed Database Systems

10

Data Distribution

Fragments: Logically defined rectangular subsets of relations.

[Figure: Relation 1 and Relation 2, each partitioned into fragments. Labels shown: Fragment 1, Fragment 2, Fragment 3 and Fragment 1, Fragment 2.]

Page 11: Distributed Database Systems

11

Data Distribution

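Logical definition of fragments:

Name    Age   $     Job-Title   Supervisor   Dept.
Jones   35    32K   Salesman    Black        A

[Figure: the relation is divided into Fragment 1, Fragment 2, and Fragment 3; the boundaries shown are defined by the predicates $ > 30K and $ < 30K.]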

Page 12: Distributed Database Systems

12

Data Distribution

[Figure: Assignment of Fragments to Datamodules. Fragments F1, F2, F3 of the Personnel relation and fragments F1, F2 of the Inventory relation are assigned to datamodules DM1, DM2, and DM3.]

Page 13: Distributed Database Systems

13

Data Distribution

Advantages of fragments as units of distribution:

Very flexible in size and definition.
Distribution choices are largely independent of logical design.
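A fragment, as a logically defined rectangular subset of a relation, can be sketched as a row predicate combined with a column list. The sketch below is illustrative only: the `fragment` helper, the second employee row, and the chosen column subsets are assumptions, and the exact fragment boundaries drawn on the slides are not recoverable from the transcript.

```python
# One row comes from the slides (Jones); the second is hypothetical.
EMPLOYEES = [
    {"name": "Jones", "age": 35, "salary": 32_000, "job": "Salesman",
     "supervisor": "Black", "dept": "A"},
    {"name": "Smith", "age": 41, "salary": 28_000, "job": "Clerk",
     "supervisor": "Black", "dept": "A"},
]

def fragment(rows, predicate, columns):
    """Rectangular subset: rows satisfying `predicate`, projected on `columns`."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]

# Fragments defined logically by the salary predicates from the slide.
high_paid = fragment(EMPLOYEES, lambda r: r["salary"] > 30_000,
                     ["name", "age", "salary"])
low_paid = fragment(EMPLOYEES, lambda r: r["salary"] < 30_000,
                    ["name", "age", "salary"])
```

Note that the two predicates as written leave a salary of exactly 30K in neither fragment; a real definition would need to cover that boundary.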

Page 14: Distributed Database Systems

14

System Considerations

Reliable Network
Pipelining

Logical Data Items

Database Operations: Read, Write

Transactions: Read Set, Write Set

Atomic: “All or Nothing” Effect

Page 15: Distributed Database Systems

15

System Considerations (cont’d)

Each site in the DDBMS has one or both of the following software modules:

Transaction Manager (TM)
Data Manager (DM)

TMs:
Read, parse, and optimize user queries
Handle all interface with the user

DMs:
Maintain the physical database
Perform actual reads and writes

Page 16: Distributed Database Systems

16

System Considerations (cont’d)

[Figure: transactions enter the system at TMs; the TMs are connected to DMs, and each DM holds its own data.]

TMs communicate only with DMs.

DMs communicate only with TMs.

Page 17: Distributed Database Systems

17

Transaction Execution

Transaction   TM's Action
Begin         Set up temporary workspace.
Read (X)      Select a DM which stores X; send a message to this DM requesting X; place X in workspace.
Read (X)      No action necessary; X is already in workspace.
Write (X)     Change the value of X.
Read (X)      No action necessary.
End           Send a pre-commit to each DM that stores a copy of X; await acknowledgements; send commit message.
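The TM actions listed above can be sketched as a small Python class. This is a simplified illustration under stated assumptions: the `DataManager` and `TransactionManager` classes and their method names are hypothetical, messages are modelled as direct method calls, and failure handling (a DM that does not acknowledge) is reduced to skipping the commit.

```python
class DataManager:
    """Hypothetical DM: holds a copy of some data items."""
    def __init__(self, data):
        self.data = dict(data)
        self.pending = {}

    def read(self, item):
        return self.data[item]

    def pre_commit(self, updates):
        self.pending = dict(updates)   # stage the new values
        return True                    # acknowledgement

    def commit(self):
        self.data.update(self.pending)
        self.pending = {}


class TransactionManager:
    """Hypothetical TM following the Begin / Read / Write / End table."""
    def __init__(self, dms):
        self.dms = dms

    def begin(self):
        self.workspace = {}            # temporary workspace
        self.written = set()

    def read(self, item):
        if item not in self.workspace: # first read: fetch from a DM storing it
            dm = next(d for d in self.dms if item in d.data)
            self.workspace[item] = dm.read(item)
        return self.workspace[item]    # later reads: already in workspace

    def write(self, item, value):
        self.workspace[item] = value   # change happens in the workspace only
        self.written.add(item)

    def end(self):
        # Pre-commit to every DM that stores a copy of a written item.
        targets = []
        for dm in self.dms:
            updates = {k: self.workspace[k] for k in self.written if k in dm.data}
            if updates:
                targets.append((dm, updates))
        acks = [dm.pre_commit(updates) for dm, updates in targets]
        if all(acks):                  # all acknowledged: send commit
            for dm, _ in targets:
                dm.commit()
        return all(acks)


dm1 = DataManager({"X": 1})
dm2 = DataManager({"X": 1, "Y": 5})
tm = TransactionManager([dm1, dm2])
tm.begin()
tm.write("X", tm.read("X") + 1)
committed = tm.end()
```

After `end()` returns, both copies of `X` hold the new value, while `Y` is untouched.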

Page 18: Distributed Database Systems

18

Optimal File Allocation In A Distributed Database System

Given a number of computers that process common information files, how can we:

allocate the files optimally so that the allocation yields minimum overall operating costs (storage and communication)?
meet access time requirements for each file?
not exceed the storage capacity of each computer?

Note: A file may be viewed as a segment.

Page 19: Distributed Database Systems

19

System Parameters

n computers

m files
Size of each file
Usage distribution for each file at each computer
Frequency of modification of each file at each computer during usage
Access time requirement for each file at each computer

Storage capacity of each computer

Cost of storage per unit file length per computer

Cost of transmission per unit file length per second per pair of computers

Page 20: Distributed Database Systems

20

Model

COSTS

Total Cost = Storage Costs + Transmission Costs

TC = CS + CT

Transmission Costs = Costs for Retrievals + Cost for Updates

CT = CTR + CTU

CONSTRAINTS

Each file must be stored in at least one computer.
The storage capacity of each computer must not be exceeded.
The probability of exceeding the required access time for each file must be less than a specified bound.
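The cost structure and the first two constraints can be sketched as an exhaustive search over allocations. This is not the deck's mathematical model, which is not in the transcript. It is a sketch under explicit assumptions: a retrieval is served from the cheapest copy, an update is sent to every copy, costs scale with file size, and the access-time constraint is omitted because it needs a delay model that is not given here. All function and parameter names are hypothetical.

```python
from itertools import combinations, product

def total_cost(alloc, size, cap, store_cost, link_cost, queries, updates):
    """Return TC = CS + CTR + CTU, or None if a constraint is violated.

    alloc[j] is the set of computers holding file j (assumed representation).
    link_cost[i][k] is the cost per unit length from i to k, with link_cost[i][i] == 0.
    """
    n = len(cap)
    used = [0] * n
    cs = ctr = ctu = 0.0
    for j, sites in enumerate(alloc):
        if not sites:                      # each file must be stored somewhere
            return None
        for i in sites:
            used[i] += size[j]
            cs += store_cost[i] * size[j]
        for i in range(n):
            # Assumed retrieval rule: fetch from the cheapest copy.
            ctr += queries[i][j] * size[j] * min(link_cost[i][k] for k in sites)
            # Assumed update rule: propagate to every copy.
            ctu += updates[i][j] * size[j] * sum(link_cost[i][k] for k in sites)
    if any(u > c for u, c in zip(used, cap)):
        return None                        # storage capacity exceeded
    return cs + ctr + ctu

def best_allocation(size, cap, store_cost, link_cost, queries, updates):
    """Brute-force search; only practical for very small n and m."""
    n, m = len(cap), len(size)
    subsets = [frozenset(c) for r in range(1, n + 1)
               for c in combinations(range(n), r)]
    best = None
    for alloc in product(subsets, repeat=m):
        cost = total_cost(alloc, size, cap, store_cost, link_cost, queries, updates)
        if cost is not None and (best is None or cost < best[0]):
            best = (cost, alloc)
    return best
```

For example, with two computers, one file queried only from computer 0, and an expensive link, `best_allocation([1], [1, 1], [1, 1], [[0, 10], [10, 0]], [[5], [0]], [[0], [0]])` places the single copy at computer 0.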

Page 21: Distributed Database Systems

21

Mathematical Representation Model

Page 22: Distributed Database Systems

22

Page 23: Distributed Database Systems

23

Page 24: Distributed Database Systems

24

Page 25: Distributed Database Systems

25

Page 26: Distributed Database Systems

26

Page 27: Distributed Database Systems

27

Transmission Paths Between Each Pair of Computers

Page 28: Distributed Database Systems

28

Page 29: Distributed Database Systems

29

Reliability Constraint

Assuming processors and channels each have identical reliability:

$a_p$ = availability of the processor

$a_c$ = availability of the channel

$r_j$ = number of redundant copies of the jth file

$A_j$ = availability of the jth file

$A_j = a_p \left[ 1 - (1 - a_c a_p)^{r_j} \right]$

For example, with $a_p = 0.98$ and $a_c = 0.99$:

$A_j = 0.951$ for $r_j = 1$

$A_j = 0.979$ for $r_j = 2$
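The availability formula is easy to check numerically. The short sketch below reproduces the two example values; the function names are hypothetical, and `min_copies` is an added convenience that is not part of the slides.

```python
def file_availability(a_p, a_c, r):
    """A_j = a_p * [1 - (1 - a_c * a_p) ** r] for r redundant copies."""
    return a_p * (1 - (1 - a_c * a_p) ** r)

def min_copies(a_p, a_c, target, max_copies=100):
    """Smallest r with A_j >= target, or None if not reached within max_copies.

    A_j approaches a_p from below as r grows, so targets at or above a_p
    are never met.
    """
    for r in range(1, max_copies + 1):
        if file_availability(a_p, a_c, r) >= target:
            return r
    return None

a1 = round(file_availability(0.98, 0.99, 1), 3)   # 0.951
a2 = round(file_availability(0.98, 0.99, 2), 3)   # 0.979
```

Because the leading factor $a_p$ bounds the result, adding copies beyond the second brings little further gain in this example.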

Page 30: Distributed Database Systems

30

Page 31: Distributed Database Systems

31

File Directory for Distributed Databases

Page 32: Distributed Database Systems

32

[Figure: Overview of the Directory Manager. Components shown within the DDBMS: User Transaction, Transaction Manager, Directory Manager, Database Manager, Directory, Database Fragment, and a link to other nodes. Legend: High-Level Request, Standard Database Call, Physical Access Call, Non-Local Request.]

Page 33: Distributed Database Systems

33

Content of Directory

Global description

Fragmentation description

Allocation description

Mappings to local names

Access method description

Statistics on the database

Consistency information

Page 34: Distributed Database Systems

34

Content of a Directory System

Physical (Static)

Location (Site, Copy #, Disk, Page);

Creator;

Creation Date;

Version of the File;

Size;

Code Format;

Date of Last Update;

Logical (Dynamic)

File Status (R, W)

Number of Backlog Jobs;

Site Availability;

Resource Requirement;

Processing Cost;

Communication Cost;

Translation Cost;

Security

(File, User, C);

C=Read/Write;

Read Only;

Write Only;

Operation

Compression ratio (Logical Operation Query Data Value);

Query Access Optimizer;

Statistical Data Gathering;

Protocols

Page 35: Distributed Database Systems

35

The Functional Objectives of Integrated Dictionary/Directory

To support the control of data resources
  Maintaining data independence, security, and integrity

To support applications development
  Offering standardized data definitions and usage characteristics
  Established program entities, DDL

To provide independence of directory data elements
  Different hardware and software environments
  Changes in these environments

Page 36: Distributed Database Systems

36

Possible Data Types In IDD

Data names, definitions, formats and sizes.

Integrity constraints, authorization tables, and usage statistics for transaction management.

Schemas and sub-schemas.

Description of standardized transactions and reports.

Characteristics of hardware, such as processors, lines, and terminals.

Description of users.

The IDD must support the maintenance of relationships between various entities, such as associations between:

Authorization tables and data
Users and transactions
Reports

The IDD supplies version control.

Page 37: Distributed Database Systems

37

[Figure 1: Two entities connected by a relationship; attributes are attached to the elements of the diagram.]

Page 38: Distributed Database Systems

38

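[Figure 2: Entity Payroll Record (Maximum Length: 400 Characters) is connected by relationship Contains (Created 820708) to entity Social Security Number (Length: 9 Characters). The two entities carry Created attributes with values 820114 and 820519; a Comments attribute is also shown.]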

Page 39: Distributed Database Systems

39

Table 1

Schema Model Level: typical meta-entity-types.
Schema Level: typical entity-types, relationship-types, and attribute-types.
Dictionary Level: typical entities, relationships, and attributes.

Schema Model Level   Schema Level              Dictionary Level
Entity-Type          Element                   Social-Security-Number, Agency-Name
                     Record                    Employee Record, Payroll Record
                     Document                  Form 1040, FIPS Guideline
Relationship-Type    Record-Contains-Element   Payroll-Record-Contains-Employee-Name
Attribute-Type       Length                    9 Characters
                     Creator                   ADP Division

Page 40: Distributed Database Systems

40

Classes of Directory

Centralized Directory
  Single Master Directory
  Extended Centralized Directory
  Multiple Master Directory

Local Directory

Distributed Directory

Page 41: Distributed Database Systems

41

Page 42: Distributed Database Systems

42

Page 43: Distributed Database Systems

43

Page 44: Distributed Database Systems

44

Page 45: Distributed Database Systems

45

Page 46: Distributed Database Systems

46

Causes For Directory Update

Changing the description or structure of the user database.

Moving user database entities from one node to another.

Changing the description of a user or node.

Changing a user view.

Changing a network node’s status.

Page 47: Distributed Database Systems

47

Specific Drawbacks with Globally Replicated Directories

1) Additional remote activity to maintain directory coherence.

2) Difficulty of posting directory changes to a down site.

3) Difficulty of integrating a new site.

4) Storage of directory entries where they are not referenced.

5) Blurred responsibility for maintaining the directory.

Page 48: Distributed Database Systems

48

Performance Measure

Operating Cost/Unit Time = Communication Cost (Query + Update) + Storage Cost + Code Translation Cost (Query + Update)

Response Time

Page 49: Distributed Database Systems

49

Operating Cost for the Centralized Directory System

Page 50: Distributed Database Systems

50

Page 51: Distributed Database Systems

51

Cost Trade-offs of Directory Systems

Assume:

Communication cost is much greater than storage cost
No translation cost
All computers have the same directory update rate

Then the cost trade-off point is at the directory update rate:

P(C,EC) = 2/(N - 1)
P(C,D) = 2/(N - 1)
P(L,D) = 1

Page 52: Distributed Database Systems

52

Page 53: Distributed Database Systems

53

Directory Design Alternatives

Type: Centralized
Description: Single master directory
Advantages: Simplicity; ease of update
Disadvantages: Transmission costs and delays

Type: Extended Centralized
Description: Variation of the centralized case in which the directory information is permanently appended in the local node once it is obtained from the master directory
Advantages: Reduces transmission costs and delays
Disadvantages: Coordinating updates of local directories; knowledge of appended directories

Type: Multiple Master
Description: Variation of the centralized case in which redundant copies of the master directory exist
Advantages: Reduces transmission costs and delays
Disadvantages: Storage requirements; coordinating update of redundant copies

Type: Distributed Master
Description: Master at every node
Advantages: Fail-soft characteristics; fast response
Disadvantages: Storage costs; transmission costs for updates to the directory

Type: Localized
Description: Local directory at each node without replication
Advantages: Simple update procedure
Disadvantages: Transmission costs for non-local queries

Page 54: Distributed Database Systems

54

The Distributed Ingres Dictionary/Directory Contains Four Types of Data:

Relation name and location

Information for parsing queries (domain names, formats, etc.)

Performance information (number of tuples, storage structures, etc.)

Consistency information (protection, integrity constraints, etc.; does not include control data for concurrency control and synchronization)

Page 55: Distributed Database Systems

55

SDD-1 Dictionary/Directory

The directory itself is defined and maintained like any other user data. It can be logically fragmented, distributed, and replicated across the distributed DBMS’s.

A directory locator (a small highly static file of directory fragment locations) is kept at every site and is used by the TMs and DMs to plan and control transactions and to help ensure DB integrity and consistency across concurrent accesses of data elements.

The transaction modules are capable of caching remotely accessed directory data for subsequent usage. This facility is provided on the presumption that DB operations will exhibit the locality-of-reference characteristic.
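The combination of a small directory locator and cached remote directory data can be sketched as follows. This is an illustration, not SDD-1's implementation: the `DirectoryClient` class, its fields, and the dictionary layout are hypothetical, and cache invalidation on directory updates is deliberately left out.

```python
class DirectoryClient:
    """Hypothetical per-site directory access with caching of remote entries."""
    def __init__(self, site, locator, fragments):
        self.site = site
        self.locator = locator       # directory fragment name -> site storing it
        self.fragments = fragments   # site -> {fragment name -> {relation -> entry}}
        self.cache = {}              # remotely fetched entries kept for reuse
        self.remote_fetches = 0

    def lookup(self, fragment, relation):
        key = (fragment, relation)
        if key in self.cache:        # locality of reference pays off here
            return self.cache[key]
        home = self.locator[fragment]            # consult the directory locator
        entry = self.fragments[home][fragment][relation]
        if home != self.site:                    # remote access: count and cache
            self.remote_fetches += 1
            self.cache[key] = entry
        return entry


locator = {"dirA": 2}
fragments = {2: {"dirA": {"PARTS": {"sites": [2, 3]}}}}
client = DirectoryClient(site=1, locator=locator, fragments=fragments)
first = client.lookup("dirA", "PARTS")
second = client.lookup("dirA", "PARTS")
```

The second lookup is answered from the cache, so only one remote fetch is counted.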

Page 56: Distributed Database Systems

56

Figure 17: Pictorial diagram showing usefulness of keys.

Vpatient : PatientClass: name, SSN, age, patID, {report}
PatientDB1: name, SSN, age
PatientDB2: name, SSN, patID
PatReportDB2: patID, report

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
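The role of keys in Figure 17 can be sketched as a merge of records from the real collections into one virtual patient. The sketch assumes that SSN matches records between PatientDB1 and PatientDB2 and that patID links a patient to its reports; the sample data and the `build_vpatient` function are hypothetical.

```python
# Illustrative contents of the three real collections.
PATIENT_DB1 = [{"name": "Ann", "SSN": "111", "age": 40}]
PATIENT_DB2 = [{"name": "Ann", "SSN": "111", "patID": "p7"}]
PAT_REPORT_DB2 = [
    {"patID": "p7", "report": "x-ray"},
    {"patID": "p7", "report": "blood test"},
]

def build_vpatient(db1, db2, reports):
    """Derive virtual patients: name, SSN, age, patID, {report}."""
    merged = {}
    for rec in db1 + db2:
        # SSN is the key that identifies the same patient in both databases.
        entry = merged.setdefault(
            rec["SSN"], {"age": None, "patID": None, "report": set()})
        entry.update(rec)
    for entry in merged.values():
        # patID is the key that attaches reports to the patient.
        entry["report"] = {r["report"] for r in reports
                           if r["patID"] == entry["patID"]}
    return list(merged.values())

vpatients = build_vpatient(PATIENT_DB1, PATIENT_DB2, PAT_REPORT_DB2)
```

Without the shared keys there would be no reliable way to tell that the two `Ann` records describe the same person.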

Page 57: Distributed Database Systems

57

Figure 15: Pictorial diagram showing correspondence between virtual and real attributes.

Vperson : PersonClass (virtual collection People): name, sex, age, ssn, job
personDB1: name, sex, age, ssn
personDB2: name, gender, ssn, job

Conversion functions shown on the attribute mappings: Character_to_String (twice), LargePositiveInteger_to_String.

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.

Page 58: Distributed Database Systems

58

Figure 18: Pictorial diagram for aggregation.

Vretiree : retireClass: name, income
Vincome : incomeClass: stockAmount, pension
financeDB1: name, stockAmount
financeDB2: name, pension

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.

Page 59: Distributed Database Systems

59

Figure 19: Pictorial diagram of computed attribute.

Vname : nameClass: first, middle, last
personDB1: name
Functions on the mappings: getfirst, getmiddle, getlast

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
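The computed attributes of Figure 19 can be sketched as functions applied to the real `name` attribute. The function names come from the figure, but their bodies are assumptions: the sketch supposes names are stored as space-separated "first middle last" strings, which the slides do not state.

```python
def getfirst(name):
    """First whitespace-separated token (assumed name format)."""
    return name.split()[0]

def getlast(name):
    """Last whitespace-separated token."""
    return name.split()[-1]

def getmiddle(name):
    """Everything between the first and last tokens; empty if there is none."""
    return " ".join(name.split()[1:-1])

def vname(record):
    """Derive the virtual Vname attributes from a personDB1 record."""
    name = record["name"]
    return {"first": getfirst(name),
            "middle": getmiddle(name),
            "last": getlast(name)}

example = vname({"name": "Ada King Lovelace"})
```

The virtual attributes are never stored; they are recomputed from the real attribute whenever the virtual entity is read.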

Page 60: Distributed Database Systems

60

Figure 20: Pictorial diagram of computed attribute.

Vretiree : retireClass: name, income
financeDB1: name, stockAmount (mapping 1)
financeDB2: name, pension (mapping 2)

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.

Page 61: Distributed Database Systems

61

Figure 21: Pictorial diagram showing grouping.

Vinsurance : insuranceClass: name, {insuranceAmounts}
carInsuranceDB1: carOwner, amount
houseInsuranceDB2: houseOwner, amount

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
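Grouping, as in Figure 21, collects the amounts held in different real collections into one set-valued virtual attribute per person. The sketch below is an assumption-laden illustration: the `build_vinsurance` function and the sample records are hypothetical, owners are matched by name, and the house owner attribute is spelled `houseOwner`.

```python
from collections import defaultdict

def build_vinsurance(car_records, house_records):
    """Group amounts by owner name into the set-valued {insuranceAmounts}."""
    amounts = defaultdict(set)
    for rec in car_records:
        amounts[rec["carOwner"]].add(rec["amount"])
    for rec in house_records:
        amounts[rec["houseOwner"]].add(rec["amount"])
    # One virtual entity per distinct owner name, in name order.
    return [{"name": name, "insuranceAmounts": vals}
            for name, vals in sorted(amounts.items())]

car = [{"carOwner": "Lee", "amount": 500}]
house = [{"houseOwner": "Lee", "amount": 900},
         {"houseOwner": "Kim", "amount": 700}]
vinsurance = build_vinsurance(car, house)
```

Because the attribute is a set, two policies with the same amount collapse into one element; a list would be needed to keep duplicates.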

Page 62: Distributed Database Systems

62

Figure 22: Pictorial diagram showing relationship.

Vpatient : patientClass: name, {doctors}
Vdoctors : doctorClass: name, docID, salary

Real collections as labelled in the figure text:
patientDB1: name, salary
patientDB1: name, docID
patientDB2: name, physician
patientDB1: name, docID (key)

Other labels: (pointer), relationship

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.

Page 63: Distributed Database Systems

63

Figure 23: Pictorial diagram showing a named relationship.

VtreatedBy : treatedByClass: patient (key), doctor (key), amountOwed
Vpatient : PatientClass: ...
Vdoctor : DoctorClass: ...
patientDB1: name, docID, amountOwed

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.

Page 64: Distributed Database Systems

64

Figure 24: Pictorial diagram showing relationship.

VpersonPatient : personClass: name
Vpatient : patientClass: patID, amount
VpersonDoctor : personClass: name
Vdoctor : DoctorClass: docID, salary
patientDB1: name, SSN, payment
doctorDB2: name, docID, salary

Virtual collections: patient (Vpatient), doctor (Vdoctor), person (VpersonPatient, VpersonDoctor)

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.

Page 65: Distributed Database Systems

65

Figure 30: Derivation of Virtual Entity Vconcept.

ConceptSemType: conceptID, semTypeID
Concept: conceptID, termID, stringType, stringID, stringVal
Vconcept: conceptID, semType, {termSet}
Vterm: termID, {stringSet}
Vstring: stringName, stringID, stringType

One attribute in the figure is marked (key).

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.

Page 66: Distributed Database Systems

66

Figure 31: Derivation of Virtual Entity VsemType.

DsemType: ID, name, definition, {relatedTo}
DsemRelate: relName, semName, status
SemTypeDef: ID, name, definition
SemTypeRel: name1, rel, name2, status

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.