distributed database systems
1
Distributed Database Systems
2
A Distributed Database on a Geographically Dispersed Network
3
A Distributed Database on a Local Network
4
A Multi-Processor System
5
Types of Accesses to a Distributed Database
6
Distributed Access Plan
1) At site 1: Send sites 2 and 3 the supplier number SN.
2) At sites 2 and 3: Execute in parallel, upon receipt of the supplier number, the following program:
   Find all PARTS records having SUP# = SN; Send result to site 1.
3) At site 1: Merge results from sites 2 and 3; Output the result.
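The three steps of the access plan can be sketched in Python. This is only an illustration: the sites are simulated as in-process functions running on a thread pool, and the `PARTS_AT_SITE` data, the field names, and the helper names `find_parts` and `run_access_plan` are assumptions, not part of the slides.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical PARTS fragments held at sites 2 and 3 (illustrative data only).
PARTS_AT_SITE = {
    2: [{"part": "P1", "sup": "S1"}, {"part": "P2", "sup": "S2"}],
    3: [{"part": "P3", "sup": "S1"}, {"part": "P4", "sup": "S3"}],
}


def find_parts(site, sn):
    """Step 2: runs at a remote site; selects PARTS records with SUP# = SN."""
    return [rec for rec in PARTS_AT_SITE[site] if rec["sup"] == sn]


def run_access_plan(sn, remote_sites=(2, 3)):
    """Steps 1 and 3: site 1 sends SN to the remote sites, then merges replies."""
    with ThreadPoolExecutor(max_workers=len(remote_sites)) as pool:
        # Step 1: "send" SN to every remote site; the calls run in parallel.
        futures = [pool.submit(find_parts, site, sn) for site in remote_sites]
        # Step 3: merge the partial results, in site order.
        merged = []
        for future in futures:
            merged.extend(future.result())
    return merged


result = run_access_plan("S1")
print(result)
```

In a real DDBMS the `pool.submit` calls would be network messages; the thread pool only stands in for the parallel execution at sites 2 and 3.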
7
8
Components of a Commercial DDBMS
9
Data Distribution
Problem: Choose a unit of the logical database to use for assignment to data modules.
Possibilities:
Relations – Distribution issues will influence logical database design.
Columns – Distribution issues will influence logical database design.
Rows – Too many; directories become too large.
Data Items – Too many; directories become too large.
10
Data Distribution
Fragments – Logically defined rectangular subsets of relations.
[Figure: Relation 1 and Relation 2, each divided into rectangular fragments. Labels shown: Fragment 1, Fragment 2, Fragment 3 and Fragment 1, Fragment 2.]
11
Data Distribution
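Logical definition of fragments
[Figure: A relation with columns Name, Age, $, Job-Title, Supervisor, Dept. and the sample row (Jones, 35, 32K, Salesman, Black, A), divided into Fragment 1, Fragment 2, and Fragment 3. Fragment boundaries are defined by the predicates $ > 30K and $ < 30K.]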
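A fragment defined logically by a predicate can be sketched as a row filter plus a column selection, which is what makes it a "rectangular" subset. The sketch below is an assumption-laden illustration: the second employee row, the key name `Salary` for the `$` column, and the helper `fragment` are hypothetical, and the slide does not say which fragment number goes with which predicate.

```python
COLUMNS = ["Name", "Age", "Salary", "Job-Title", "Supervisor", "Dept"]

employees = [
    # The sample row from the slide ("$" column stored under the key "Salary").
    {"Name": "Jones", "Age": 35, "Salary": 32_000, "Job-Title": "Salesman",
     "Supervisor": "Black", "Dept": "A"},
    # Hypothetical row, added only so that both predicates select something.
    {"Name": "Smith", "Age": 28, "Salary": 24_000, "Job-Title": "Clerk",
     "Supervisor": "Black", "Dept": "A"},
]


def fragment(rows, predicate, columns):
    """Rectangular subset: rows satisfying `predicate`, restricted to `columns`."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]


# The two predicates from the slide: $ > 30K and $ < 30K.
high_paid = fragment(employees, lambda r: r["Salary"] > 30_000, COLUMNS)
low_paid = fragment(employees, lambda r: r["Salary"] < 30_000, COLUMNS)

print([r["Name"] for r in high_paid])
print([r["Name"] for r in low_paid])
```

Note that the two predicates as written on the slide leave a salary of exactly 30K in neither fragment; a real fragmentation scheme would need to cover that boundary.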
12
Data Distribution
[Figure: Assignment of Fragments to Datamodules. Fragments of the Personnel and Inventory relations (labels shown: F1, F2, F3, F1, F2) are assigned to datamodules DM1, DM2, and DM3.]
13
Data Distribution
Advantages of fragments as units of distribution.
Very flexible in size and definition.
Distribution choices are largely independent of logical design.
14
System Considerations
Reliable Network
Pipelining
Logical Data Items
Database Operations: Read, Write
Transactions: Read Set, Write Set
Atomic – “All or Nothing” Effect
15
System Considerations (cont’d)
Each site in the DDBMS has one or both of the following software modules:
Transaction Manager (TM)
Data Manager (DM)
TMs:
Read, parse, and optimize user queries
Handle all interface with the user
DMs:
Maintain the physical database
Perform actual reads and writes
16
System Considerations (cont’d)
[Figure: Several TMs and DMs connected in a network. Transactions enter at the TMs; data resides at the DMs.]
TMs communicate only with DMs.
DMs communicate only with TMs.
17
Transaction Execution
Transaction step: TM's action
Begin: Set up temporary workspace.
Read (X): Select a DM which stores X; send a message to this DM requesting X; place X in workspace.
Read (X): No action necessary; X is already in workspace.
Write (X): Change the value of X.
Read (X): No action necessary.
End: Send a pre-commit to each DM that stores a copy of X; await acknowledgements; send commit message.
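The TM actions listed for this transaction can be traced with a small simulation. This is a minimal sketch, not the slides' implementation: the classes `DataManager` and `TransactionManager`, their method names, and the choice to raise an error on a missing acknowledgement are all assumptions made for illustration.

```python
class DataManager:
    """Hypothetical DM: stores copies of data items and logs the messages it gets."""

    def __init__(self, name, items):
        self.name = name
        self.store = dict(items)
        self.pending = {}
        self.log = []

    def read(self, key):
        self.log.append(("read", key))
        return self.store[key]

    def pre_commit(self, updates):
        self.pending = dict(updates)
        self.log.append(("pre-commit", tuple(sorted(updates))))
        return True  # acknowledgement

    def commit(self):
        self.store.update(self.pending)
        self.pending = {}
        self.log.append(("commit",))


class TransactionManager:
    """Hypothetical TM following the Begin / Read / Write / End steps."""

    def __init__(self, dms):
        self.dms = dms
        self.workspace = None
        self.written = set()

    def begin(self):
        self.workspace = {}  # temporary workspace
        self.written = set()

    def read(self, key):
        if key not in self.workspace:  # first read: fetch from one DM storing key
            dm = next(d for d in self.dms if key in d.store)
            self.workspace[key] = dm.read(key)
        return self.workspace[key]  # later reads: no message needed

    def write(self, key, value):
        self.workspace[key] = value  # the change stays in the workspace until End
        self.written.add(key)

    def end(self):
        updates = {k: self.workspace[k] for k in self.written}
        holders = [d for d in self.dms if any(k in d.store for k in updates)]
        acks = [d.pre_commit({k: v for k, v in updates.items() if k in d.store})
                for d in holders]
        if not all(acks):  # assumption: the slides do not cover this case
            raise RuntimeError("a DM did not acknowledge the pre-commit")
        for d in holders:
            d.commit()
        self.workspace = None


dm1 = DataManager("DM1", {"X": 10})
dm2 = DataManager("DM2", {"X": 10, "Y": 5})
tm = TransactionManager([dm1, dm2])
tm.begin()
tm.read("X"); tm.read("X"); tm.write("X", 11); tm.read("X")
tm.end()
print(dm1.log, dm2.log)
```

Only the first Read (X) produces a message to a DM; both copies of X are updated at End, after every DM holding a copy has acknowledged the pre-commit.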
18
Optimal File Allocation In A Distributed Database System
Given a number of computers that process common information files, how can we:
allocate the files optimally so that the allocation yields minimum overall operating costs (storage and communication)?
meet access time requirements for each file?
not exceed the storage capacity of each computer?
Note: A file may be viewed as a segment.
19
System Parameters
n computers
m files
Size of each file
Usage distribution for each file at each computer
Frequency of modification of each file at each computer during usage
Access time requirement for each file at each computer
Storage capacity of each computer
Cost of storage per unit file length per computer
Cost of transmission per unit file length per second per pair of computers
20
Model
COSTS
Total Cost = Storage Costs + Transmission Costs
TC = CS + CT
Transmission Costs = Costs for Retrievals + Cost for Updates
CT = CTR + CTU
CONSTRAINTS
Each file must be stored in at least one computer.
The storage capacity of each computer must not be exceeded.
The probability of exceeding the required access time for each file must be less than a specified bound.
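The cost model TC = CS + CT, with CT = CTR + CTU, can be explored with a brute-force search over allocations. The sketch below is an illustration under stated assumptions, since the slides' own mathematical formulation is not in this transcript: retrievals are assumed to go to the cheapest copy, updates are assumed to be sent to every copy, `queries[i][j]` and `updates[i][j]` are assumed transmission volumes per unit time from computer i for file j, and the access-time constraint is left out. All function and parameter names are hypothetical.

```python
from itertools import product


def total_cost(alloc, size, store_cost, trans_cost, queries, updates):
    """TC = CS + CTR + CTU under the assumed formulas described above.

    alloc[j] is the set of computers that hold a copy of file j.
    """
    n = len(store_cost)
    cs = ctr = ctu = 0.0
    for j, holders in enumerate(alloc):
        cs += sum(store_cost[i] * size[j] for i in holders)
        for i in range(n):
            if i not in holders:  # remote retrieval from the cheapest copy
                ctr += queries[i][j] * min(trans_cost[i][k] for k in holders)
            # every update is propagated to all other copies
            ctu += updates[i][j] * sum(trans_cost[i][k] for k in holders if k != i)
    return cs + ctr + ctu


def best_allocation(size, capacity, store_cost, trans_cost, queries, updates):
    """Exhaustive search; only practical for very small n and m."""
    n, m = len(capacity), len(size)
    # Constraint 1: each file is stored in at least one computer (non-empty sets).
    subsets = [frozenset(i for i in range(n) if (mask >> i) & 1)
               for mask in range(1, 2 ** n)]
    best = None
    for alloc in product(subsets, repeat=m):
        used = [sum(size[j] for j in range(m) if i in alloc[j]) for i in range(n)]
        # Constraint 2: storage capacity must not be exceeded.
        if any(u > c for u, c in zip(used, capacity)):
            continue
        cost = total_cost(alloc, size, store_cost, trans_cost, queries, updates)
        if best is None or cost < best[0]:
            best = (cost, alloc)
    return best


# Two computers, one file of length 10; computer 0 issues most of the queries.
best = best_allocation(
    size=[10], capacity=[10, 10], store_cost=[1, 1],
    trans_cost=[[0, 2], [2, 0]], queries=[[5], [1]], updates=[[0], [0]],
)
print(best)
```

In this toy instance a single copy at computer 0 wins: a second copy would double the storage cost while saving only the small retrieval cost from computer 1.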
21
Mathematical Representation Model
22
23
24
25
26
27
Transmission Paths Between Each Pair of Computers
28
29
Reliability Constraint
Assuming processors and channels each have identical reliability,
ap = availability of the processor
ac = availability of the channel
rj = # of redundant copies of the jth file
Aj = Availability of the jth file
Aj = ap [1 - (1 - ac ap)^rj]
For example, if ap = 0.98 and ac = 0.99, then
Aj = 0.951 for rj = 1
Aj = 0.979 for rj = 2
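The two example values can be checked numerically. The sketch reads the slide's formula as Aj = ap [1 - (1 - ac ap)^rj], which is the reading that reproduces both quoted figures; the function name `file_availability` is an assumption.

```python
def file_availability(ap, ac, r):
    """Availability of a file with r redundant copies.

    ap: availability of a processor, ac: availability of a channel.
    """
    return ap * (1 - (1 - ac * ap) ** r)


for r in (1, 2):
    print(r, round(file_availability(0.98, 0.99, r), 3))
```

The leading factor ap bounds the result: however many copies are kept, the file is never more available than the processor requesting it.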
30
31
File Directory for Distributed Databases
32
[Figure: Overview of the Directory Manager. Components: User Transaction; the DDBMS, containing the Transaction Manager, Directory Manager, and Database Manager; Directory; Database; Fragment; links to other nodes. Legend: High-Level Request, Standard Database Call, Physical Access Call, Non-Local Request.]
33
Content of Directory
Global description
Fragmentation description
Allocation description
Mappings to local names
Access method description
Statistics on the database
Consistency information
34
Content of a Directory System
Physical (Static)
Location (Site, Copy #, Disk, Page);
Creator;
Creation Date;
Version of the File Size;
Code Format;
Date of Last Update;
Logical (Dynamic)
File Status (R, W)
Number of Backlog Jobs;
Site Availability;
Resource Requirement;
Processing Cost;
Communication Cost;
Translation Cost;
Security
(File, User, C);
C=Read/Write;
Read Only;
Write Only;
Operation
Compression ratio (Logical Operation Query Data Value);
Query Access Optimizer;
Statistical Data Gathering;
Protocols
35
The Functional Objectives of Integrated Dictionary/Directory
To support the control of data resources
Maintaining data independence, security, and integrity
To support applications development
Offering standardized data definitions and usage characteristics
Established program entities, DDL
To provide independence of directory data elements
Different hardware and software environments
Changes in these environments
36
Possible Data Types In IDD
Data names, definitions, formats and sizes.
Integrity constraints, authorization tables, and usage statistics for transaction management.
Schemas and sub-schemas.
Description of standardized transactions and reports.
Characteristics of hardware, such as processors, lines, and terminals.
Description of users.
The IDD must support the maintenance of relationships between various entities such as:
Associations between:
Authorization tables and data
Users and transactions
Reports
The IDD supplies version control
37
[Figure 1: Two Entity boxes joined by a Relationship; each of the three carries Attribute boxes.]
38
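[Figure 2: Example dictionary entries. Entities: Payroll Record and Social Security Number, joined by the relationship Contains. Attribute values shown: Relationship Created 820708; Entity Created 820114; Entity Created 820519; Maximum Length 400 Characters; Length 9 Characters; Comments.]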
39
Table 1
Schema Model Level (Typical Meta-Entity-Types) | Schema Level (Typical Entity-Types, Relationship-Types, and Attribute-Types) | Dictionary Level (Typical Entities, Relationships, and Attributes)
Entity-Type | Element, Record, Document | Social-Security-Number, Agency-Name, Employee Record, Payroll Record, Form 1040, FIPS Guideline
Relationship-Type | Record-Contains-Element | Payroll-Record-Contains-Employee-Name
Attribute-Type | Length, Creator | 9 Characters, ADP Division
40
Classes of Directory
Centralized Directory
Single Master Directory
Extended Centralized Directory
Multiple Master Directory
Local Directory
Distributed Directory
41
42
43
44
45
46
Causes For Directory Update
Changing the description or structure of the user database.
Moving user database entities from one node to another.
Changing the description of a user or node.
Changing a user view.
Changing a network node’s status.
47
Specific Drawbacks with Globally Replicated Directories
1) Additional remote activity to maintain directory coherence.
2) Difficulty of posting directory changes to a down site.
3) Difficulty of integrating a new site.
4) Storage of directory entries where they are not referenced.
5) Blurred responsibility for maintaining the directory.
48
Performance Measure
Operating Cost/Unit Time = Communication Cost (Query + Update) + Storage Cost + Code Translation Cost (Query + Update)
Response Time
49
Operating Cost for the Centralized Directory System
50
51
Cost Trade-offs of Directory Systems
Assume
Communication cost much greater than storage cost
No translation cost
All computers have the same directory update rate
Then the cost trade-off point is at directory update rate:
P(C,EC) = 2/(N – 1)
P(C,D) = 2/(N – 1)
P(L,D) = 1
52
53
Directory Design Alternatives

Type: Centralized
Description: Single master directory
Advantages: Simplicity; ease of update
Disadvantages: Transmission costs and delays

Type: Extended Centralized
Description: Variation of the centralized case in which the directory information is permanently appended in the local node once it is obtained from the master directory
Advantages: Reduces transmission costs and delays
Disadvantages: Coordinating updates of local directories; knowledge of appended directories

Type: Multiple Master
Description: Variation of the centralized case in which redundant copies of the master directory exist
Advantages: Reduces transmission costs and delays
Disadvantages: Storage requirements; coordinating update of redundant copies

Type: Distributed Master
Description: Master at every node
Advantages: Fall-soft characteristics; fast response
Disadvantages: Storage costs; transmission costs for updates to the directory

Type: Localized
Description: Local directory at each node without replication
Advantages: Simple update procedure
Disadvantages: Transmission costs for non-local queries
54
The Distributed Ingres Dictionary/Directory Contains Four Types of Data:
Relation name and location
Information for parsing queries (domain names, formats, etc.)
Performance information (number of tuples, storage structures, etc.)
Consistency information (protection, integrity constraints, etc.; does not include control data for concurrency control and synchronization)
55
SDD-1 Dictionary/Directory
The directory itself is defined and maintained like any other user data. It can be logically fragmented, distributed, and replicated across the distributed DBMS’s.
A directory locator (a small highly static file of directory fragment locations) is kept at every site and is used by the TMs and DMs to plan and control transactions and to help ensure DB integrity and consistency across concurrent accesses of data elements.
The transaction modules are capable of caching remotely accessed directory data for subsequent usage. This facility is provided on the presumption that DB operations will exhibit the locality-of-reference characteristic.
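The interplay of the directory locator and the caching of remote directory data can be sketched as follows. This is not SDD-1 code: the class `DirectoryClient`, the dictionary layout, and the fetch counter are assumptions made for illustration, the network is simulated by a nested dictionary, and cache invalidation on directory updates is deliberately left out.

```python
class DirectoryClient:
    """Hypothetical TM-side view of a fragmented, distributed directory."""

    def __init__(self, locator, sites):
        self.locator = locator  # directory fragment name -> site holding it
        self.sites = sites      # site -> {fragment name: {entry: value}}
        self.cache = {}         # (fragment, entry) -> cached value
        self.remote_fetches = 0

    def lookup(self, fragment, entry):
        key = (fragment, entry)
        if key not in self.cache:
            site = self.locator[fragment]  # small, highly static locator file
            self.cache[key] = self.sites[site][fragment][entry]
            self.remote_fetches += 1       # stands in for one remote access
        return self.cache[key]


# Illustrative data: one directory fragment stored at a site named "site2".
sites = {"site2": {"dirfrag_A": {"PARTS": "stored at site3"}}}
client = DirectoryClient({"dirfrag_A": "site2"}, sites)

first = client.lookup("dirfrag_A", "PARTS")
second = client.lookup("dirfrag_A", "PARTS")  # served from the cache
print(first, client.remote_fetches)
```

The second lookup costs no remote access, which is the payoff the slide expects when database operations show locality of reference.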
56
[Figure 17: Pictorial diagram showing usefulness of keys. Boxes: Vpatient : PatientClass (name, SSN, age, patID, {report}); PatientDB1 (name, SSN, age); PatientDB2 (name, SSN, patID); PatReportDB2 (patID, report).]
Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
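One way to read Figure 17 is that keys let rows from different collections be combined into one Vpatient object. The sketch below assumes that SSN links PatientDB1 to PatientDB2 and that patID links PatientDB2 to PatReportDB2; the sample rows and the function `build_vpatient` are hypothetical and not taken from the slides.

```python
# Illustrative rows only; the attribute names follow the boxes in Figure 17.
patient_db1 = [{"name": "Ann", "SSN": "111", "age": 40},
               {"name": "Bob", "SSN": "222", "age": 55}]
patient_db2 = [{"name": "Ann", "SSN": "111", "patID": "p7"}]
pat_report_db2 = [{"patID": "p7", "report": "x-ray"},
                  {"patID": "p7", "report": "blood test"}]


def build_vpatient(db1, db2, reports):
    """Combine the three collections into Vpatient objects using the keys."""
    patients = {}

    def entry(ssn, name):
        # One virtual object per SSN, created on first sight of that key.
        return patients.setdefault(
            ssn, {"name": name, "SSN": ssn, "age": None, "patID": None, "report": []})

    for row in db1:
        entry(row["SSN"], row["name"])["age"] = row["age"]
    for row in db2:
        entry(row["SSN"], row["name"])["patID"] = row["patID"]

    # patID is the key that attaches reports to the right patient.
    by_pat_id = {p["patID"]: p for p in patients.values() if p["patID"] is not None}
    for row in reports:
        if row["patID"] in by_pat_id:
            by_pat_id[row["patID"]]["report"].append(row["report"])
    return list(patients.values())


vpatients = build_vpatient(patient_db1, patient_db2, pat_report_db2)
print(vpatients)
```

Without the shared keys there would be no reliable way to tell that the "Ann" in PatientDB1 and the "Ann" in PatientDB2 are the same patient.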
57
[Figure 15: Pictorial diagram showing correspondence between virtual and real attributes. Boxes: Vperson : PersonClass (name, sex, age, ssn, job), with virtual collection People; personDB1 (name, sex, age, ssn); personDB2 (name, gender, ssn, job). Conversion labels on the links: Character_to_String, Character_to_String, LargePositiveInteger_to_String.]
Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
58
[Figure 18: Pictorial diagram for aggregation. Boxes: Vretiree : retireClass (name, income); Vincome : incomeClass (stockAmount, pension); financeDB1 (name, stockAmount); financeDB2 (name, pension).]
Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
59
[Figure 19: Pictorial diagram of computed attribute. Boxes: Vname : nameClass (first, middle, last); personDB1 (name). Functions on the links: getfirst, getmiddle, getlast.]
Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
60
[Figure 20: Pictorial diagram of computed attribute. Boxes: Vretiree : retireClass (name, income); financeDB1 (name, stockAmount); financeDB2 (name, pension). Two links are labeled 1 and 2.]
Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
61
[Figure 21: Pictorial diagram showing grouping. Boxes: Vinsurance : insuranceClass (name, {insuranceAmounts}); carInsuranceDB1 (carOwner, amount); houseInsuranceDB2 (houseOwner, amount).]
Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
62
[Figure 22: Pictorial diagram showing relationship. Virtual classes: Vpatient : patientClass (name, {doctors}) and Vdoctors : doctorClass (name, docID, salary). Collection boxes are labeled patientDB1 and patientDB2, with attributes including name, salary, docID, and physician. Annotations: docID (key), (pointer), relationship.]
Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
63
[Figure 23: Pictorial diagram showing a named relationship. Boxes: VtreatedBy : treatedByClass (patient (key), doctor (key), amountOwed); Vpatient : PatientClass (...); Vdoctor : DoctorClass (...); patientDB1 (name, docID, amountOwed).]
Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
64
[Figure 24: Pictorial diagram showing relationship. Boxes: VpersonPatient : personClass (name); Vpatient : patientClass (patID, amount); VpersonDoctor : personClass (name); Vdoctor : DoctorClass (docID, salary); patientDB1 (name, SSN, payment); doctorDB2 (name, docID, salary). Virtual collections: patient (Vpatient), doctor (Vdoctor), person (VpersonPatient, VpersonDoctor).]
Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
65
[Figure 30: Derivation of Virtual Entity Vconcept. Boxes: ConceptSemType (conceptID, semTypeID); Concept (conceptID, termID, stringType, stringID, stringVal); Vconcept (conceptID, semType, {termSet}); Vterm (termID, {stringSet}); Vstring (stringName, stringID, stringType). One attribute is marked (key).]
Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
66
[Figure 31: Derivation of Virtual Entity VsemType. Boxes: DsemType (ID, name, definition, {relatedTo}); DsemRelate (relName, semName, status); SemTypeDef (ID, name, definition); SemTypeRel (name1, rel, name2, status).]
Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.