1
Distributed Database Systems
2
A Distributed Database on a Geographically Dispersed Network
3
A Distributed Database on a Local Network
4
A Multi-Processor System
5
Types of Accesses to a Distributed Database
6
Distributed Access Plan
1) At site 1:
   Send sites 2 and 3 the supplier number SN.
2) At sites 2 and 3:
   Execute in parallel, upon receipt of the supplier number, the following program:
   Find all PARTS records having SUP # = SN;
   Send result to site 1.
3) At site 1:
   Merge results from sites 2 and 3;
   Output the result.
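The three-step plan can be sketched in Python as follows. This is only an illustration: the PARTS records, the `SITE_PARTS` table, and the function names are hypothetical, and a thread pool stands in for real message passing between sites.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical PARTS fragments held at sites 2 and 3 (illustrative data only).
SITE_PARTS = {
    2: [{"part": "bolt", "sup": 7}, {"part": "nut", "sup": 9}],
    3: [{"part": "gear", "sup": 7}, {"part": "cam", "sup": 4}],
}

def find_parts_at_site(site, sn):
    """Step 2: runs at a remote site; selects PARTS records with SUP # = sn."""
    return [rec for rec in SITE_PARTS[site] if rec["sup"] == sn]

def run_access_plan(sn, remote_sites=(2, 3)):
    """Steps 1 and 3: site 1 sends sn to the remote sites and merges the replies."""
    with ThreadPoolExecutor(max_workers=len(remote_sites)) as pool:
        # Step 1: "send" sn to each site; the sub-queries run in parallel.
        partials = list(pool.map(lambda s: find_parts_at_site(s, sn), remote_sites))
    # Step 3: merge the partial results at site 1.
    return [rec for partial in partials for rec in partial]

result = run_access_plan(7)
print(sorted(rec["part"] for rec in result))  # ['bolt', 'gear']
```

The merge here is a plain concatenation; a real plan might also sort or remove duplicates, which the slide does not specify.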
7
8
Components of a Commercial DDBMS
9
Data Distribution
Problem:
Choose a unit of the logical database to use for assignment to data modules.
Possibilities:
Relations: Distribution issues will influence logical database design.
Columns: Distribution issues will influence logical database design.
Rows: Too many; directories become too large.
Data Items: Too many; directories become too large.
10
Data Distribution
Fragments: Logically defined rectangular subsets of relations.
[Figure: Relation 1 and Relation 2, each partitioned into numbered fragments (Fragments 1, 2, and 3; Fragments 1 and 2).]
11
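Data Distribution
Logical definition of fragments:
[Figure: a relation with columns Name, Age, $, Job-Title, Supervisor, and Dept., and the sample row (Jones, 35, 32K, Salesman, Black, A). The relation is divided into Fragments 1, 2, and 3, with the predicates $ > 30K and $ < 30K marking fragment boundaries.]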
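A fragment defined as a rectangular subset (a choice of rows by predicate plus a choice of columns) can be sketched as below. The `fragment` helper, the Smith row, and the particular column choices are assumptions made for illustration; only the Jones row and the 30K salary predicates come from the slide.

```python
# Hypothetical relation; only the Jones row appears on the slide.
PERSONNEL = [
    {"Name": "Jones", "Age": 35, "Salary": 32_000, "Job-Title": "Salesman",
     "Supervisor": "Black", "Dept": "A"},
    {"Name": "Smith", "Age": 41, "Salary": 28_000, "Job-Title": "Clerk",
     "Supervisor": "Black", "Dept": "A"},
]

def fragment(relation, columns, predicate):
    """Return the rectangular subset: rows passing predicate, restricted to columns."""
    return [{c: row[c] for c in columns} for row in relation if predicate(row)]

# Assumed column split: the slide does not say which columns each fragment holds.
pay_columns = ["Name", "Age", "Salary"]
high_paid = fragment(PERSONNEL, pay_columns, lambda r: r["Salary"] > 30_000)
low_paid = fragment(PERSONNEL, pay_columns, lambda r: r["Salary"] < 30_000)

print([r["Name"] for r in high_paid])  # ['Jones']
print([r["Name"] for r in low_paid])   # ['Smith']
```

Because a fragment is described by a predicate and a column list rather than by individual rows, the directory only needs to store one entry per fragment.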
12
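Data Distribution
Datamodules
[Figure: Assignment of Fragments to Datamodules. Fragments F1, F2, F3 of the Personnel relation and fragments F1, F2 of the Inventory relation are assigned to datamodules DM1, DM2, and DM3.]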
13
Data Distribution
Advantages of fragments as units of distribution:
Very flexible in size and definition.
Distribution choices are largely independent of logical design.
14
System Considerations
Reliable Network
Pipelining
Logical Data Items
Database Operations: Read, Write
Transactions: Read Set, Write Set
Atomic: "All or Nothing" Effect
15
System Considerations (cont’d)
Each site in the DDBMS has one or both of the following software modules:
Transaction Manager (TM)
Data Manager (DM)
TMs:
  Read, parse, and optimize user queries
  Handle all interface with the user
DMs:
  Maintain physical database
  Perform actual reads and writes
16
System Considerations (cont’d)
[Figure: Transactions enter at the TMs; the TMs exchange messages with the DMs, and the DMs hold the data.]
TMs communicate only with DMs.
DMs communicate only with TMs.
17
Transaction Execution

Transaction    TM's Action
Begin          Set up temporary workspace.
Read (X)       Select a DM which stores X; send a message to this DM requesting X; place X in workspace.
Read (X)       No action necessary; X is already in workspace.
Write (X)      Change the value of X.
Read (X)       No action necessary.
End            Send a pre-commit to each DM that stores a copy of X; await acknowledgements; send commit message.
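The TM behavior for Begin, Read, Write, and End can be sketched as follows. The class and method names are hypothetical, DMs are modeled as in-memory objects rather than remote sites, and the sketch assumes every DM acknowledges the pre-commit (failure handling is not covered on the slide).

```python
class DataManager:
    """Hypothetical DM: stores copies of data items and applies committed writes."""
    def __init__(self, store):
        self.store = dict(store)
        self.pending = {}

    def read(self, item):
        return self.store[item]

    def pre_commit(self, item, value):
        self.pending[item] = value   # stage the new value
        return True                  # acknowledgement

    def commit(self, item):
        self.store[item] = self.pending.pop(item)


class TransactionManager:
    def __init__(self, dms):
        self.dms = dms
        self.workspace = {}
        self.written = set()

    def begin(self):
        # Begin: set up a temporary workspace.
        self.workspace = {}
        self.written = set()

    def read(self, item):
        # First read: fetch from some DM storing the item; later reads hit the workspace.
        if item not in self.workspace:
            dm = next(d for d in self.dms if item in d.store)
            self.workspace[item] = dm.read(item)
        return self.workspace[item]

    def write(self, item, value):
        # Write: change the value in the workspace only.
        self.workspace[item] = value
        self.written.add(item)

    def end(self):
        # End: pre-commit to every DM holding a copy, await acks, then commit.
        targets = [(d, i) for i in self.written for d in self.dms if i in d.store]
        acks = [d.pre_commit(i, self.workspace[i]) for d, i in targets]
        if not all(acks):
            return False
        for d, i in targets:
            d.commit(i)
        return True


dm1, dm2 = DataManager({"X": 10}), DataManager({"X": 10})
tm = TransactionManager([dm1, dm2])
tm.begin()
tm.write("X", tm.read("X") + 5)
tm.end()
print(dm1.store["X"], dm2.store["X"])  # 15 15
```

Keeping writes in the workspace until End is what makes the "all or nothing" effect possible: no DM sees a new value until every DM holding a copy has acknowledged the pre-commit.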
18
Optimal File Allocation In A Distributed Database System
Given a number of computers that process common information files, how can we:
  allocate the files optimally so that the allocation yields minimum overall operating costs (storage and communication)?
  meet access time requirements for each file?
  not exceed the storage capacity of each computer?
Note: A file may be viewed as a segment.
19
System Parameters
n Computers
m Files
Size of each file
Usage distribution for each file at each computer
Frequency of modification of each file at each computer during usage
Access time requirement for each file at each computer
Storage capacity of each computer
Cost of storage per unit file length per computer
Cost of transmission per unit file length per second per pair of computers
20
Model
COSTS
Total Cost = Storage Costs + Transmission Costs
TC = CS + CT
Transmission Costs = Costs for Retrievals + Costs for Updates
CT = CTR + CTU
CONSTRAINTS
Each file must be stored in at least one computer.
The storage capacity of each computer must not be exceeded.
The probability of exceeding the required access time for each file must be less than a specified bound.
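The decomposition TC = CS + CTR + CTU and the first two constraints can be explored with a small brute-force search. The cost formulas below are a simplified assumption for illustration, not the model from the slides: retrievals go to the cheapest copy, updates go to every copy, and the access-time constraint is ignored. All names and parameters are hypothetical.

```python
from itertools import combinations, product

def nonempty_subsets(n):
    """All non-empty sets of computers: each file is stored in at least one."""
    return [s for r in range(1, n + 1) for s in combinations(range(n), r)]

def total_cost(alloc, length, store_cost, trans_cost, queries, updates):
    """TC = CS + CTR + CTU under the simplified (assumed) cost model."""
    cs = ctr = ctu = 0.0
    for j, sites in enumerate(alloc):
        cs += sum(store_cost[k] * length[j] for k in sites)
        for i in range(len(store_cost)):
            # Retrieval: computer i reads file j from its cheapest copy.
            ctr += queries[i][j] * length[j] * min(trans_cost[i][k] for k in sites)
            # Update: computer i sends each modification to every copy.
            ctu += updates[i][j] * length[j] * sum(trans_cost[i][k] for k in sites)
    return cs + ctr + ctu

def best_allocation(length, capacity, store_cost, trans_cost, queries, updates):
    """Enumerate every allocation; return (cost, allocation) or None if infeasible."""
    n, best = len(capacity), None
    for alloc in product(nonempty_subsets(n), repeat=len(length)):
        used = [sum(length[j] for j, s in enumerate(alloc) if k in s) for k in range(n)]
        if any(u > c for u, c in zip(used, capacity)):
            continue  # storage capacity exceeded
        cost = total_cost(alloc, length, store_cost, trans_cost, queries, updates)
        if best is None or cost < best[0]:
            best = (cost, alloc)
    return best

# One file of length 1, two computers; only computer 0 queries the file.
print(best_allocation([1], [1, 1], [1, 1], [[0, 10], [10, 0]], [[5], [0]], [[0], [0]]))
```

Exhaustive enumeration grows as (2^n - 1)^m, so it is only practical for very small n and m; it is meant to make the trade-off between storage and transmission costs concrete, not to scale.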
21
Mathematical Representation Model
22
23
24
25
26
27
Transmission Paths Between Each Pair of Computers
28
29
Reliability Constraint
Assuming processors and channels each have identical reliability,
a_p = availability of the processor
a_c = availability of the channel
r_j = number of redundant copies of the jth file
A_j = availability of the jth file
A_j = a_p [1 - (1 - a_c a_p)^{r_j}]
For example, with a_p = 0.98 and a_c = 0.99:
A_j = 0.951 for r_j = 1
A_j = 0.979 for r_j = 2
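The availability expression A_j = a_p [1 - (1 - a_c a_p)^{r_j}] can be checked numerically against the two example values. The function name is a hypothetical choice for this sketch.

```python
def file_availability(a_p, a_c, r_j):
    """A_j = a_p * (1 - (1 - a_c * a_p) ** r_j) for r_j redundant copies."""
    return a_p * (1 - (1 - a_c * a_p) ** r_j)

for r_j in (1, 2):
    print(r_j, round(file_availability(0.98, 0.99, r_j), 3))
# 1 0.951
# 2 0.979
```

As r_j grows, the bracketed term approaches 1, so A_j approaches a_p: under this formula, extra copies cannot raise file availability above the availability of a single processor.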
30
31
File Directory for Distributed Databases
32
[Figure: Overview of the Directory Manager. Components shown within the DDBMS: User Transaction, Transaction Manager, Directory Manager, Database Manager, Directory, Database (Fragment), and a link To Other Nodes. Legend of arrow types: High-Level Request, Standard Database Call, Physical Access Call, Non-Local Request.]
33
Content of Directory
Global description
Fragmentation description
Allocation description
Mappings to local names
Access method description
Statistics on the database
Consistency information
34
Content of a Directory System
Physical (Static)
  Location (Site, Copy #, Disk, Page);
  Creator;
  Creation Date;
  Version of the File;
  Size;
  Code Format;
  Date of Last Update;
Logical (Dynamic)
  File Status (R, W);
  Number of Backlog Jobs;
  Site Availability;
  Resource Requirement;
  Processing Cost;
  Communication Cost;
  Translation Cost;
  Security (File, User, C), where C = Read/Write, Read Only, or Write Only;
Operation
  Compression ratio (Logical Operation Query Data Value);
  Query Access Optimizer;
  Statistical Data Gathering;
  Protocols
35
The Functional Objectives of Integrated Dictionary/Directory
To support the control of data resources
  Maintaining data independence, security, and integrity
To support applications development
  Offering standardized data definitions and usage characteristics
  Established program entities, DDL
To provide independence of directory data elements
  Different hardware and software environments
  Changes in these environments
36
Possible Data Types In IDD
Data names, definitions, formats and sizes.
Integrity constraints, authorization tables, and usage statistics for transaction management.
Schemas and sub-schemas.
Description of standardized transactions and reports.
Characteristics of hardware, such as processors, lines, and terminals.
Description of users.
The IDD must support the maintenance of relationships between various entities, such as associations between:
  Authorization tables and data
  Users and transactions
  Reports
The IDD supplies version control.
37
[Figure 1: Two Entity boxes connected by a Relationship, with Attribute boxes attached.]
38
[Figure 2: Entity Payroll Record (Maximum Length 400 Characters) linked by relationship Contains (Created 820708) to entity Social Security Number (Length 9 Characters). The entity boxes carry creation dates 820114 and 820519, and a Comments attribute also appears.]
39
Table 1

Schema Model Level:      Schema Level:                Dictionary Level:
Typical                  Typical Entity-Types,        Typical Entities,
Meta-Entity-Types        Relationship-Types, and      Relationships, and
                         Attribute-Types              Attributes

Entity-Type              Element                      Social-Security-Number, Agency-Name
                         Record                       Employee Record, Payroll Record
                         Document                     Form 1040, FIPS Guideline
Relationship-Type        Record-Contains-Element      Payroll-Record-Contains-Employee-Name
Attribute-Type           Length                       9 Characters
                         Creator                      ADP Division
40
Classes of Directory
Centralized Directory
  Single Master Directory
  Extended Centralized Directory
  Multiple Master Directory
Local Directory
Distributed Directory
41
42
43
44
45
46
Causes For Directory Update
Changing the description or structure of the user database.
Moving user database entities from one node to another.
Changing the description of a user or node.
Changing a user view.
Changing a network node’s status.
47
Specific Drawbacks with Globally Replicated Directories
1) Additional remote activity to maintain directory coherence.
2) Difficulty of posting directory changes to a down site.
3) Difficulty of integrating a new site.
4) Storage of directory entries where they are not referenced.
5) Blurred responsibility for maintaining the directory.
48
Performance Measure
Operating Cost/Unit Time = Communication Cost (Query + Update) + Storage Cost + Code Translation Cost (Query + Update)
Response Time
49
Operating Cost for the Centralized Directory System
50
51
Cost Trade-offs of Directory Systems
Assume:
  Communication cost much greater than storage cost
  No translation cost
  All computers have same directory update rate
Then the cost trade-off point is at directory update rate:
  P(C,EC) = 2/(N - 1)
  P(C,D) = 2/(N - 1)
  P(L,D) = 1
52
53
Directory Design Alternatives

Type: Centralized
Description: Single master directory
Advantages: Simplicity; ease of update
Disadvantages: Transmission costs and delays

Type: Extended Centralized
Description: Variation of the centralized case in which the directory information is permanently appended in the local node once it is obtained from the master directory
Advantages: Reduces transmission costs and delays
Disadvantages: Coordinating updates of local directories; knowledge of appended directories

Type: Multiple Master
Description: Variation of the centralized case in which redundant copies of the master directory exist
Advantages: Reduces transmission costs and delays; fall-soft characteristics
Disadvantages: Storage requirements; coordinating update of redundant copies

Type: Distributed Master
Description: Master at every node
Advantages: Fast response
Disadvantages: Storage costs; transmission costs for updates to the directory

Type: Localized
Description: Local directory at each node without replication
Advantages: Simple update procedure
Disadvantages: Transmission costs for non-local queries
54
Distributed Ingres Dictionary/Directory Contains Four Types of Data:
Relation name and location
Information for parsing queries (domain names, formats, etc.)
Performance information (number of tuples, storage structures, etc.)
Consistency information (protection, integrity constraints, etc.; does not include control data for concurrency control and synchronization)
55
SDD-1 Dictionary/Directory
The directory itself is defined and maintained like any other user data. It can be logically fragmented, distributed, and replicated across the distributed DBMSs.
A directory locator (a small, highly static file of directory fragment locations) is kept at every site and is used by the TMs and DMs to plan and control transactions and to help ensure DB integrity and consistency across concurrent accesses of data elements.
The transaction modules are capable of caching remotely accessed directory data for subsequent usage. This facility is provided on the presumption that DB operations will exhibit the locality-of-reference characteristic.
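The combination of a per-site directory locator and a transaction-module cache can be sketched as below. This is an illustration of the idea only, not SDD-1's actual design: the class names, the `locator` mapping, and the cache keyed by (fragment, name) are all assumptions.

```python
class DirectorySite:
    """Hypothetical site holding one fragment of the directory."""
    def __init__(self, entries):
        self.entries = dict(entries)
        self.lookups = 0   # counts remote accesses, to show the cache effect

    def lookup(self, name):
        self.lookups += 1
        return self.entries[name]


class TransactionModule:
    def __init__(self, locator, sites):
        self.locator = locator   # directory fragment name -> site id (small, static)
        self.sites = sites       # site id -> DirectorySite
        self.cache = {}          # remotely fetched directory data

    def resolve(self, fragment, name):
        """Return the directory entry, contacting the remote site only on a miss."""
        key = (fragment, name)
        if key not in self.cache:
            site = self.sites[self.locator[fragment]]
            self.cache[key] = site.lookup(name)
        return self.cache[key]


site_b = DirectorySite({"PARTS": "stored at sites 2 and 3"})
tm = TransactionModule(locator={"dir_frag_1": "B"}, sites={"B": site_b})
tm.resolve("dir_frag_1", "PARTS")
tm.resolve("dir_frag_1", "PARTS")
print(site_b.lookups)  # 1
```

The sketch leaves out cache invalidation when directory data changes, which any real system relying on cached directory entries would need to address.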
56
Vpatient : PatientClass (name, SSN, age, patID, {report})
PatientDB1 (name, SSN, age)
PatientDB2 (name, SSN, patID)
PatReportDB2 (patID, report)
Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
Figure 17: Pictorial diagram showing usefulness of keys.
57
Vperson : PersonClass (name, sex, age, ssn, job)
personDB1 (name, sex, age, ssn)
personDB2 (name, gender, ssn, job)
Conversion functions shown on the mappings: Character_to_String, Character_to_String, LargePositiveInteger_to_String
Virtual collection: Vperson (People)
Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
Figure 15: Pictorial diagram showing correspondence between virtual and real attributes.
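The correspondence between virtual attributes of Vperson and real attributes of personDB1 and personDB2 can be sketched as a mapping table with optional conversion functions. Which attribute each conversion applies to is not recoverable from the figure, so the sketch assumes that sex/gender is stored as a character and ssn as a large positive integer; the helper names and the sample record are hypothetical.

```python
def character_to_string(ch):
    return str(ch)

def large_positive_integer_to_string(n):
    return str(n)

# virtual attribute -> (real attribute, conversion or None), per real collection.
# The assignment of conversions to attributes is an assumption.
MAPPINGS = {
    "personDB1": {
        "name": ("name", None),
        "sex": ("sex", character_to_string),
        "age": ("age", None),
        "ssn": ("ssn", large_positive_integer_to_string),
    },
    "personDB2": {
        "name": ("name", None),
        "sex": ("gender", character_to_string),
        "ssn": ("ssn", large_positive_integer_to_string),
        "job": ("job", None),
    },
}
VIRTUAL_ATTRS = ("name", "sex", "age", "ssn", "job")

def to_vperson(source, record):
    """Present one real record through the virtual Vperson attributes."""
    mapping = MAPPINGS[source]
    out = {}
    for attr in VIRTUAL_ATTRS:
        if attr in mapping:
            real, convert = mapping[attr]
            value = record[real]
            out[attr] = convert(value) if convert else value
        else:
            out[attr] = None   # attribute not available in this real collection
    return out

print(to_vperson("personDB2",
                 {"name": "Ann", "gender": "F", "ssn": 123456789, "job": "nurse"}))
```

Attributes missing from a real collection (age in personDB2, job in personDB1) come back as `None` here; how the original system treats such gaps is not stated in the figure.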
58
Vretiree : retireClass (name, income)
Vincome : incomeClass (stockAmount, pension)
financeDB1 (name, stockAmount)
financeDB2 (name, pension)
Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
Figure 18: Pictorial diagram for aggregation.
59
Vname : nameClass (first, middle, last)
personDB1 (name)
Functions shown on the mappings: getfirst, getmiddle, getlast
Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
Figure 19: Pictorial diagram of computed attribute.
60
Vretiree : retireClass (name, income)
financeDB1 (name, stockAmount)
financeDB2 (name, pension)
Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
Figure 20: Pictorial diagram of computed attribute.
61
Vinsurance : insuranceClass (name, {insuranceAmounts})
carInsuranceDB1 (carOwner, amount)
houseInsuranceDB2 (houseOwner, amount)
Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
Figure 21: Pictorial diagram showing grouping.
62
Vpatient : patientClass (name, {doctors})
Vdoctors : doctorClass (name, docID, salary)
Real collections as labeled in the figure: patientDB1 (name, salary); patientDB1 (name, docID); patientDB2 (name, physician); patientDB1 (name, docID)
Figure annotations: (key), (pointer), relationship
Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
Figure 22: Pictorial diagram showing relationship.
63
VtreatedBy : treatedByClass (patient (key), doctor (key), amountOwed)
Vpatient : PatientClass (...)
Vdoctor : DoctorClass (...)
patientDB1 (name, docID, amountOwed)
Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
Figure 23: Pictorial diagram showing a named relationship.
64
VpersonPatient : personClass (name)
Vpatient : patientClass (patID, amount)
VpersonDoctor : personClass (name)
Vdoctor : DoctorClass (docID, salary)
patientDB1 (name, SSN, payment)
doctorDB2 (name, docID, salary)
Virtual collections: patient (Vpatient), doctor (Vdoctor), person (VpersonPatient, VpersonDoctor)
Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
Figure 24: Pictorial diagram showing relationship.
65
ConceptSemType (conceptID, semTypeID)
Concept (conceptID, termID, stringType, stringID, stringVal)
Vconcept (conceptID, semType, {termSet})
Vterm (termID, {stringSet})
Vstring (stringName, stringID, stringType)
Figure annotation: (key)
Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
Figure 30: Derivation of Virtual Entity Vconcept.
66
DsemType (ID, name, definition, {relatedTo})
DsemRelate (relName, semName, status)
SemTypeDef (ID, name, definition)
SemTypeRel (name1, rel, name2, status)
Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
Figure 31: Derivation of Virtual Entity VsemType.