Distributed DBMSs – Concepts and Design
Chapter 24 in Textbook
Overview
• Concepts: what is a distributed DBMS? Distributed processing. Homogeneous vs. heterogeneous DDBMSs.
• Functions of a DDBMS. Components of a DDBMS. Advantages and disadvantages.
• DDBMS design: fragmentation, replication, allocation.
• DDBMS transparencies. Date’s 12 rules for a DDBMS.
Concepts
Centralized DBMS: a system with a single logical database located at one site under the control of a single DBMS.
Distributed database: a logically interrelated collection of shared data physically distributed over a computer network.
Applications can be classified into:
• Local applications, which need only the data stored at their own site.
• Global applications, which also need data stored at other sites.
Distributed DBMS
Distributed DBMS: the software system that:
• Manages the distributed database.
• Makes the distribution transparent to users.
• Allows users to access data at their own site as well as at remote sites.
Transparency of distribution is the fundamental principle of a DDBMS.
Characteristics of DDBMS
• A collection of logically related shared data.
• The data is split into a number of fragments.
• Fragments may be replicated.
• Fragments/replicas are allocated to sites.
• The sites are linked by a communications network.
• The data at each site is under the control of a DBMS.
• The DBMS at each site can handle local applications.
• Each DBMS participates in at least one global application.
Distributed DBMS Topology
[Figure: Sites 1–4 connected by a computer network]
Data itself is distributed and access to it can be local or remote.
Distributed Processing
[Figure: Sites 1–4 connected by a computer network]
Data itself is centralized but access to it can be local or remote.
Homogeneous vs. Heterogeneous DDBMS
Homogeneous system: all sites use the same DBMS product.
Heterogeneous system: sites may run different DBMS products, which need not be based on the same data model.
Possible differences between data in different databases:
• Data type difference.
• Value difference.
• Semantic difference.
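As a rough illustration of the first two kinds of difference, the Python sketch below translates rows from two hypothetical sites into one global representation. The site rows, the one-letter type codes used at "site B" and the chosen global format are all assumptions made for illustration. Semantic differences (for example, whether two sites mean the same thing by "rent") generally cannot be resolved by a mechanical conversion like this.

```python
# Hypothetical rows for the same kind of data held at two sites.
site_a_row = {"propertyNo": "PG4", "type": "Flat", "rent": 350}   # rent stored as an integer
site_b_row = {"propertyNo": "PA14", "type": "H", "rent": "650"}   # rent as a string, type as a code

# Assumed mapping for site B's one-letter type codes (a value difference).
TYPE_CODES = {"H": "House", "F": "Flat"}


def from_site_a(row):
    """Site A already uses the (assumed) global representation."""
    return {"propertyNo": row["propertyNo"], "type": row["type"], "rent": row["rent"]}


def from_site_b(row):
    """Translate a site B row into the (assumed) global representation."""
    return {
        "propertyNo": row["propertyNo"],
        "type": TYPE_CODES[row["type"]],  # value difference: code -> full word
        "rent": int(row["rent"]),         # data type difference: str -> int
    }


global_rows = [from_site_a(site_a_row), from_site_b(site_b_row)]
print(global_rows)
```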
Functions of a DDBMS
• Provide access to remote sites and allow the transfer of queries and data among the sites of the network.
• Store data distribution details.
• Distributed data processing.
• Security control.
• Concurrency control.
• Recovery services.
Components of a DDBMS
[Figure: Sites 1 and 3 connected by a computer network. Components shown: the DDBMS, the local DBMS (LDBMS) with its database (DB), the data communication component (DC), and the global system catalog (GSC).]
Advantages of DDBMS
• Reflects organizational structure.
• Improved shareability and local autonomy.
• Improved availability.
• Improved reliability.
• Improved performance.
Disadvantages of DDBMS
• Complexity.
• Cost.
• Security.
• Integrity control more difficult.
• Lack of standards.
• Lack of experience.
• DB design more complex.
Distributed Relational DB Design
We have a set of tables and want to distribute them across a set of sites.
The design consists of three major steps:
1. Fragmentation: divide a relation into a number of sub-relations (fragments), horizontally or vertically.
2. Replication: make copies of fragments.
3. Allocation: decide at which site each fragment and each replica is to be stored.
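The outcome of these three steps has to be recorded somewhere the DDBMS can consult it (the global system catalog). The Python sketch below is a hypothetical, heavily simplified catalog: the fragment names and predicates follow the PropertyForRent example later in this chapter, but the site names and the decision to replicate P2 are assumptions made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FragmentEntry:
    relation: str      # global relation the fragment belongs to (fragmentation)
    definition: str    # selection predicate or projection list, as free text
    sites: list = field(default_factory=list)  # allocation; >1 site means replicated


# Hypothetical catalog contents: site names and replica placement are assumptions.
catalog = {
    "P1": FragmentEntry("PropertyForRent", "Type = 'House'", ["Glasgow"]),
    "P2": FragmentEntry("PropertyForRent", "Type = 'Flat'", ["Glasgow", "London"]),
}


def fragments_of(relation):
    """Fragmentation: which fragments make up a global relation."""
    return sorted(name for name, entry in catalog.items() if entry.relation == relation)


def is_replicated(fragment):
    """Replication: a fragment is replicated if it is allocated to more than one site."""
    return len(catalog[fragment].sites) > 1


def sites_holding(fragment):
    """Allocation: the sites that store a copy of the fragment."""
    return list(catalog[fragment].sites)


print(fragments_of("PropertyForRent"), is_replicated("P2"), sites_holding("P1"))
```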
Distributed Relational DB Design
When we fragment, replicate and allocate, we try to achieve:
• Locality of reference.
• Improved reliability and availability.
• Good performance.
• Balanced storage capacities and costs.
• Minimal communication costs.
Rules of Fragmentation
Completeness: nothing (no row or column) is lost when we fragment.
Reconstruction: the original table can be recovered from its fragments.
Disjointness: no row or column appears in two fragments. The one exception is vertical fragmentation, where the primary key is repeated in every fragment so that the table can be reconstructed.
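For horizontal fragmentation the three rules can be checked mechanically. The Python sketch below treats a table as a list of row dictionaries and reconstruction as the union of the fragments; the function names are hypothetical, and the sample rows are the StaffNo and BranchNo columns of the Staff table used later in this chapter. The second fragmentation is deliberately wrong: it loses SL21 and puts SG37 and SG14 in two fragments.

```python
def as_set(rows):
    """Represent each row as a hashable frozenset of (column, value) pairs."""
    return {frozenset(row.items()) for row in rows}


def check_horizontal(relation, fragments):
    """Return (complete, reconstructible, disjoint) for a horizontal fragmentation."""
    original = as_set(relation)
    parts = [as_set(fragment) for fragment in fragments]
    union = set().union(*parts)

    complete = original <= union           # no row is lost
    reconstructible = union == original    # the union gives back exactly the table
    disjoint = all(                        # no row appears in two fragments
        parts[i].isdisjoint(parts[j])
        for i in range(len(parts))
        for j in range(i + 1, len(parts))
    )
    return complete, reconstructible, disjoint


staff = [
    {"StaffNo": "SL21", "BranchNo": "B005"},
    {"StaffNo": "SG37", "BranchNo": "B003"},
    {"StaffNo": "SG14", "BranchNo": "B003"},
    {"StaffNo": "SG5", "BranchNo": "B007"},
]
good = [[r for r in staff if r["BranchNo"] == b] for b in ("B003", "B005", "B007")]
bad = [[r for r in staff if r["BranchNo"] == "B003"],
       [r for r in staff if r["BranchNo"] != "B005"]]

print(check_horizontal(staff, good))  # (True, True, True)
print(check_horizontal(staff, bad))   # (False, False, False)
```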
Types of Fragmentation
Horizontal fragmentation
Vertical fragmentation
Mixed fragmentation
Original PropertyForRent Table

| PropertyNo | Street | City | PostCode | Type | Rooms | Rent | OwnerNo | StaffNo | BranchNo |
|---|---|---|---|---|---|---|---|---|---|
| PA14 | 16 Holhead | Aberdeen | AB7 5SU | House | 6 | 650 | CO46 | SA9 | B007 |
| PG4 | 6 Lawrence | Glasgow | G11 9QX | Flat | 3 | 350 | CO40 | SG14 | B003 |
| PG16 | 5 Novar Dr | Glasgow | G12 9AX | Flat | 4 | 450 | CO93 | SG14 | B003 |
| PG21 | 18 Dale Rd | Glasgow | G12 | House | 5 | 600 | CO87 | SG37 | B003 |
| PG36 | 2 Manor Rd | Glasgow | G32 4QX | Flat | 3 | 375 | CO93 | SG37 | B003 |
| PL94 | 6 Argyll St | London | NW2 | Flat | 4 | 400 | CO87 | SL41 | B005 |

Horizontal Fragmentation
Based on the type of property:
P1: $\sigma_{Type='House'}(PropertyForRent)$
P2: $\sigma_{Type='Flat'}(PropertyForRent)$
Fragment P1

| PropertyNo | Street | City | PostCode | Type | Rooms | Rent | OwnerNo | StaffNo | BranchNo |
|---|---|---|---|---|---|---|---|---|---|
| PA14 | 16 Holhead | Aberdeen | AB7 5SU | House | 6 | 650 | CO46 | SA9 | B007 |
| PG21 | 18 Dale Rd | Glasgow | G12 | House | 5 | 600 | CO87 | SG37 | B003 |

Fragment P2

| PropertyNo | Street | City | PostCode | Type | Rooms | Rent | OwnerNo | StaffNo | BranchNo |
|---|---|---|---|---|---|---|---|---|---|
| PG4 | 6 Lawrence | Glasgow | G11 9QX | Flat | 3 | 350 | CO40 | SG14 | B003 |
| PG16 | 5 Novar Dr | Glasgow | G12 9AX | Flat | 4 | 450 | CO93 | SG14 | B003 |
| PG36 | 2 Manor Rd | Glasgow | G32 4QX | Flat | 3 | 375 | CO93 | SG37 | B003 |
| PL94 | 6 Argyll St | London | NW2 | Flat | 4 | 400 | CO87 | SL41 | B005 |

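The two selections above can be sketched in Python with a table held as a list of row dictionaries. Only some of the PropertyForRent columns are kept for brevity, and `select` is a hypothetical helper standing in for the relational σ operator; reconstruction of a horizontal fragmentation is simply the union of the fragments (list concatenation here, since the fragments are disjoint).

```python
# A subset of the PropertyForRent columns from the table above.
property_for_rent = [
    {"PropertyNo": "PA14", "City": "Aberdeen", "Type": "House", "Rent": 650},
    {"PropertyNo": "PG4", "City": "Glasgow", "Type": "Flat", "Rent": 350},
    {"PropertyNo": "PG16", "City": "Glasgow", "Type": "Flat", "Rent": 450},
    {"PropertyNo": "PG21", "City": "Glasgow", "Type": "House", "Rent": 600},
    {"PropertyNo": "PG36", "City": "Glasgow", "Type": "Flat", "Rent": 375},
    {"PropertyNo": "PL94", "City": "London", "Type": "Flat", "Rent": 400},
]


def select(rows, predicate):
    """Relational selection (sigma): keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


p1 = select(property_for_rent, lambda row: row["Type"] == "House")
p2 = select(property_for_rent, lambda row: row["Type"] == "Flat")

# Reconstruction: the union of the (disjoint) horizontal fragments.
reconstructed = p1 + p2

print([row["PropertyNo"] for row in p1])  # ['PA14', 'PG21']
print([row["PropertyNo"] for row in p2])  # ['PG4', 'PG16', 'PG36', 'PL94']
```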
Original Staff Table
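
| StaffNo | Position | Sex | DOB | Salary | FName | LName | BranchNo |
|---|---|---|---|---|---|---|---|
| SL21 | Manager | M | 1 Oct 93 | 30000 | John | White | B005 |
| SG37 | Assistant | F | 10 Nov 60 | 12000 | Ann | Beech | B003 |
| SG14 | Supervisor | M | 24 Mar 58 | 18000 | David | Ford | B003 |
| SG5 | Assistant | F | 3 Jun 40 | 24000 | Susan | Brand | B007 |
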
Vertical Fragmentation
S1: $\Pi_{StaffNo, Position, Sex, DOB, Salary}(Staff)$
S2: $\Pi_{StaffNo, FName, LName, BranchNo}(Staff)$
Fragment S1

| StaffNo | Position | Sex | DOB | Salary |
|---|---|---|---|---|
| SL21 | Manager | M | 1 Oct 93 | 30000 |
| SG37 | Assistant | F | 10 Nov 60 | 12000 |
| SG14 | Supervisor | M | 24 Mar 58 | 18000 |
| SG5 | Assistant | F | 3 Jun 40 | 24000 |

Fragment S2

| StaffNo | FName | LName | BranchNo |
|---|---|---|---|
| SL21 | John | White | B005 |
| SG37 | Ann | Beech | B003 |
| SG14 | David | Ford | B003 |
| SG5 | Susan | Brand | B007 |

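A minimal Python sketch of the two projections and of reconstruction, assuming the same list-of-dictionaries representation as before. Both fragments keep the primary key StaffNo (the exception to the disjointness rule), which is what lets a join on StaffNo rebuild the original table. `project` and `join_on_key` are hypothetical helpers standing in for Π and the natural join.

```python
staff = [
    {"StaffNo": "SL21", "Position": "Manager", "Sex": "M", "DOB": "1 Oct 93",
     "Salary": 30000, "FName": "John", "LName": "White", "BranchNo": "B005"},
    {"StaffNo": "SG37", "Position": "Assistant", "Sex": "F", "DOB": "10 Nov 60",
     "Salary": 12000, "FName": "Ann", "LName": "Beech", "BranchNo": "B003"},
    {"StaffNo": "SG14", "Position": "Supervisor", "Sex": "M", "DOB": "24 Mar 58",
     "Salary": 18000, "FName": "David", "LName": "Ford", "BranchNo": "B003"},
    {"StaffNo": "SG5", "Position": "Assistant", "Sex": "F", "DOB": "3 Jun 40",
     "Salary": 24000, "FName": "Susan", "LName": "Brand", "BranchNo": "B007"},
]


def project(rows, columns):
    """Relational projection (pi): keep only the listed columns."""
    return [{column: row[column] for column in columns} for row in rows]


def join_on_key(left, right, key):
    """Join two vertical fragments on a shared key column."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


s1 = project(staff, ["StaffNo", "Position", "Sex", "DOB", "Salary"])
s2 = project(staff, ["StaffNo", "FName", "LName", "BranchNo"])

# Reconstruction works only because StaffNo was repeated in both fragments.
rebuilt = join_on_key(s1, s2, "StaffNo")
print(rebuilt == staff)  # True
```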
Mixed Fragmentation – Vertical then Horizontal
S1: $\Pi_{StaffNo, Position, Sex, DOB, Salary}(Staff)$
S2: $\Pi_{StaffNo, FName, LName, BranchNo}(Staff)$
S2.1: $\sigma_{BranchNo='B005'}(S2)$
S2.2: $\sigma_{BranchNo='B003'}(S2)$
S2.3: $\sigma_{BranchNo='B007'}(S2)$
Fragment S1

| StaffNo | Position | Sex | DOB | Salary |
|---|---|---|---|---|
| SL21 | Manager | M | 1 Oct 93 | 30000 |
| SG37 | Assistant | F | 10 Nov 60 | 12000 |
| SG14 | Supervisor | M | 24 Mar 58 | 18000 |
| SG5 | Assistant | F | 3 Jun 40 | 24000 |

Fragment S2.1

| StaffNo | FName | LName | BranchNo |
|---|---|---|---|
| SL21 | John | White | B005 |

Fragment S2.2

| StaffNo | FName | LName | BranchNo |
|---|---|---|---|
| SG37 | Ann | Beech | B003 |
| SG14 | David | Ford | B003 |

Fragment S2.3

| StaffNo | FName | LName | BranchNo |
|---|---|---|---|
| SG5 | Susan | Brand | B007 |

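The two steps, and the reverse steps needed to rebuild Staff, can be sketched in Python as below. For brevity, S1 here keeps only StaffNo, Position and Salary (Sex and DOB are omitted); the helper names are hypothetical. Reconstruction runs in the opposite order to fragmentation: first the union of S2.1, S2.2 and S2.3 gives back S2, then a join with S1 on StaffNo gives back Staff.

```python
staff = [
    {"StaffNo": "SL21", "Position": "Manager", "Salary": 30000,
     "FName": "John", "LName": "White", "BranchNo": "B005"},
    {"StaffNo": "SG37", "Position": "Assistant", "Salary": 12000,
     "FName": "Ann", "LName": "Beech", "BranchNo": "B003"},
    {"StaffNo": "SG14", "Position": "Supervisor", "Salary": 18000,
     "FName": "David", "LName": "Ford", "BranchNo": "B003"},
    {"StaffNo": "SG5", "Position": "Assistant", "Salary": 24000,
     "FName": "Susan", "LName": "Brand", "BranchNo": "B007"},
]


def project(rows, columns):
    """Relational projection (pi): keep only the listed columns."""
    return [{column: row[column] for column in columns} for row in rows]


# Step 1: vertical fragmentation (the key StaffNo is kept in both fragments).
s1 = project(staff, ["StaffNo", "Position", "Salary"])
s2 = project(staff, ["StaffNo", "FName", "LName", "BranchNo"])

# Step 2: horizontal fragmentation of S2 by branch (S2.1, S2.2, S2.3).
s2_fragments = {
    branch: [row for row in s2 if row["BranchNo"] == branch]
    for branch in ("B005", "B003", "B007")
}

# Reconstruction: union of the horizontal pieces, then join with S1 on StaffNo.
s2_rebuilt = [row for fragment in s2_fragments.values() for row in fragment]
by_key = {row["StaffNo"]: row for row in s2_rebuilt}
staff_rebuilt = [{**row, **by_key[row["StaffNo"]]} for row in s1]

print(staff_rebuilt == staff)  # True
```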
Derived Horizontal Fragmentation
Derived horizontal fragmentation is the horizontal fragmentation of a table T1 (the child) that follows from the horizontal fragmentation of a related table T2 (the parent).
It is not specified explicitly in the design but is implied by the fragmentation of T2.
T1 (child) has a foreign key that references T2 (parent).
The relationship between T1 and T2 is either 1-to-1 or many-to-1.
Each fragment of T1 is obtained with a semi-join: $T1_i = T1 \ltimes_{f} T2_i$, where $T2_i$ is the i-th fragment of T2 and f is the join on the foreign key.
Derived Horizontal Fragmentation
Suppose the design requires the Staff table to be fragmented horizontally:
S1: $\sigma_{BranchNo='B003'}(Staff)$
S2: $\sigma_{BranchNo='B005'}(Staff)$
S3: $\sigma_{BranchNo='B007'}(Staff)$

| StaffNo | Position | Sex | DOB | Salary | FName | LName | BranchNo |
|---|---|---|---|---|---|---|---|
| SL21 | Manager | M | 1 Oct 93 | 30000 | John | White | B005 |
| SG37 | Assistant | F | 10 Nov 60 | 12000 | Ann | Beech | B003 |
| SG14 | Supervisor | M | 24 Mar 58 | 18000 | David | Ford | B003 |
| SG5 | Assistant | F | 3 Jun 40 | 24000 | Susan | Brand | B007 |

Derived Horizontal Fragmentation
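Fragment S1

| StaffNo | Position | Sex | DOB | Salary | FName | LName | BranchNo |
|---|---|---|---|---|---|---|---|
| SG37 | Assistant | F | 10 Nov 60 | 12000 | Ann | Beech | B003 |
| SG14 | Supervisor | M | 24 Mar 58 | 18000 | David | Ford | B003 |

Fragment S2

| StaffNo | Position | Sex | DOB | Salary | FName | LName | BranchNo |
|---|---|---|---|---|---|---|---|
| SL21 | Manager | M | 1 Oct 93 | 30000 | John | White | B005 |

Fragment S3

| StaffNo | Position | Sex | DOB | Salary | FName | LName | BranchNo |
|---|---|---|---|---|---|---|---|
| SG5 | Assistant | F | 3 Jun 40 | 24000 | Susan | Brand | B007 |
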
Derived Horizontal Fragmentation
After fragmenting Staff, we find that a related table, PropertyForRent, references it.
Because Staff is now fragmented, it makes sense to fragment PropertyForRent in the same way.
[Figure: ER diagram, Staff (1) handles (N) PropertyForRent]
S1: $\sigma_{BranchNo='B003'}(Staff)$
S2: $\sigma_{BranchNo='B005'}(Staff)$
S3: $\sigma_{BranchNo='B007'}(Staff)$
$P_i = PropertyForRent \ltimes_{StaffNo} S_i$, for i = 1, 2, 3
Derived Horizontal Fragmentation
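Fragment P1

| PropertyNo | Street | City | PostCode | Type | Rooms | Rent | OwnerNo | StaffNo | BranchNo |
|---|---|---|---|---|---|---|---|---|---|
| PG4 | 6 Lawrence | Glasgow | G11 9QX | Flat | 3 | 350 | CO40 | SG14 | B003 |
| PG16 | 5 Novar Dr | Glasgow | G12 9AX | Flat | 4 | 450 | CO93 | SG14 | B003 |
| PG21 | 18 Dale Rd | Glasgow | G12 | House | 5 | 600 | CO87 | SG37 | B003 |
| PG36 | 2 Manor Rd | Glasgow | G32 4QX | Flat | 3 | 375 | CO93 | SG37 | B003 |

Fragment P2

| PropertyNo | Street | City | PostCode | Type | Rooms | Rent | OwnerNo | StaffNo | BranchNo |
|---|---|---|---|---|---|---|---|---|---|
| PL94 | 6 Argyll St | London | NW2 | Flat | 4 | 400 | CO87 | SL41 | B005 |

Fragment P3

| PropertyNo | Street | City | PostCode | Type | Rooms | Rent | OwnerNo | StaffNo | BranchNo |
|---|---|---|---|---|---|---|---|---|---|
| PA14 | 16 Holhead | Aberdeen | AB7 5SU | House | 6 | 650 | CO46 | SA9 | B007 |
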
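The semi-join can be sketched in Python as "keep the child rows whose foreign-key value appears in the parent fragment". The sketch below keeps only the columns needed and computes P1 from S1. It stops at P1 because, in the sample Staff table, the staff numbers that handle PL94 (SL41) and PA14 (SA9) do not appear; with this sample data the semi-joins with S2 and S3 would return no rows. The `semijoin` helper is hypothetical.

```python
# Only the columns needed for the semi-join are kept.
staff = [
    {"StaffNo": "SL21", "BranchNo": "B005"},
    {"StaffNo": "SG37", "BranchNo": "B003"},
    {"StaffNo": "SG14", "BranchNo": "B003"},
    {"StaffNo": "SG5", "BranchNo": "B007"},
]
property_for_rent = [
    {"PropertyNo": "PA14", "StaffNo": "SA9"},
    {"PropertyNo": "PG4", "StaffNo": "SG14"},
    {"PropertyNo": "PG16", "StaffNo": "SG14"},
    {"PropertyNo": "PG21", "StaffNo": "SG37"},
    {"PropertyNo": "PG36", "StaffNo": "SG37"},
    {"PropertyNo": "PL94", "StaffNo": "SL41"},
]


def semijoin(left, right, attribute):
    """left ⋉ right: the rows of left whose attribute value also occurs in right."""
    keys = {row[attribute] for row in right}
    return [row for row in left if row[attribute] in keys]


s1 = [row for row in staff if row["BranchNo"] == "B003"]  # parent fragment S1
p1 = semijoin(property_for_rent, s1, "StaffNo")           # derived child fragment P1

print([row["PropertyNo"] for row in p1])  # ['PG4', 'PG16', 'PG21', 'PG36']
```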
Transparencies in a DDBMS
4 main transparencies:
1. Distribution transparency.
   a. Fragmentation.
   b. Location.
   c. Replication.
   d. Local mapping.
   e. Naming.
2. Transaction transparency.
3. Performance transparency.
4. DBMS transparency.
1. Distribution Transparency
Allows the user to perceive the database as a single, logical entity. Types:
a. Fragmentation: the user does not need to know that the data is fragmented.
b. Location: the user does not need to know the location of fragments.
c. Replication: the user does not need to know that fragments are replicated.
d. Local mapping: the user must specify both the fragment names and their locations.
e. Naming: the DDBMS ensures that every item name is unique.
Consider the following distribution of the Staff relation:
S1: $\Pi_{StaffNo, Position, Sex, DOB, Salary}(Staff)$
S2: $\Pi_{StaffNo, FName, LName, BranchNo}(Staff)$
S21: $\sigma_{BranchNo='B003'}(S2)$
S22: $\sigma_{BranchNo='B005'}(S2)$
S23: $\sigma_{BranchNo='B007'}(S2)$
a. Fragmentation Transparency
The highest level of distribution transparency. The user does not need to know that the data is fragmented and treats the DDB like a centralized DB. Database accesses are based on the global schema, so the fragmentation of the data can be changed without impacting the user.
Example:
SELECT FName, LName
FROM Staff
WHERE Position = 'Manager';
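To picture what the DDBMS does behind such a query, the Python sketch below lets the caller ask for managers against the global Staff relation only, while the function internally rebuilds Staff from S1 and S21, S22, S23 (as defined above, with only the columns this query needs). This is a toy, in-memory stand-in with hypothetical function names; a real distributed query processor would normally avoid rebuilding the whole relation and instead push the selection down to the relevant fragments.

```python
# Fragments the user never sees (only the columns this example needs).
S1 = [
    {"StaffNo": "SL21", "Position": "Manager"},
    {"StaffNo": "SG37", "Position": "Assistant"},
    {"StaffNo": "SG14", "Position": "Supervisor"},
    {"StaffNo": "SG5", "Position": "Assistant"},
]
S21 = [{"StaffNo": "SG37", "FName": "Ann", "LName": "Beech", "BranchNo": "B003"},
       {"StaffNo": "SG14", "FName": "David", "LName": "Ford", "BranchNo": "B003"}]
S22 = [{"StaffNo": "SL21", "FName": "John", "LName": "White", "BranchNo": "B005"}]
S23 = [{"StaffNo": "SG5", "FName": "Susan", "LName": "Brand", "BranchNo": "B007"}]


def global_staff():
    """Rebuild Staff as S1 joined with (S21 ∪ S22 ∪ S23) on StaffNo."""
    s2 = {row["StaffNo"]: row for row in S21 + S22 + S23}
    return [{**row, **s2[row["StaffNo"]]} for row in S1]


def managers():
    """Equivalent of: SELECT FName, LName FROM Staff WHERE Position = 'Manager'."""
    return [(row["FName"], row["LName"])
            for row in global_staff() if row["Position"] == "Manager"]


print(managers())  # [('John', 'White')]
```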
b. Location Transparency
The middle level of distribution transparency.
The user must know that the data is fragmented but still does not need to know the location of the data.
The location of the data can be changed without impact on the user.
Example:
SELECT FName, LName FROM S21
WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position = 'Manager')
UNION
SELECT FName, LName FROM S22
WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position = 'Manager')
UNION
SELECT FName, LName FROM S23
WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position = 'Manager')
c. Replication Transparency
The user is unaware of replication and of the location of the data, but knows that the data is fragmented.
This is at the same level as location transparency.
d. Local Mapping Transparency
The lowest level of distribution transparency.
The user must know both that the data is fragmented and where each fragment is located.
Example:
SELECT FName, LName FROM S21 AT SITE 3
WHERE staffNo IN
(SELECT staffNo FROM S1 AT SITE 5 WHERE position = 'Manager')
UNION
SELECT FName, LName FROM S22 AT SITE 5
WHERE staffNo IN
(SELECT staffNo FROM S1 AT SITE 5 WHERE position = 'Manager')
UNION
SELECT FName, LName FROM S23 AT SITE 7
WHERE staffNo IN
(SELECT staffNo FROM S1 AT SITE 5 WHERE position = 'Manager')
e. Naming Transparency
Each item in a distributed database must have a unique name.
The DDBMS must ensure that no two sites create objects with the same name.
Possible solutions:
• Create a central name server. Drawbacks: it can become a bottleneck, and it goes against local autonomy.
• Prefix each object with the identifier of the site that created it. Drawback: loss of distribution transparency.
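The site-prefix solution can be sketched in a few lines of Python. Each (hypothetical) site checks uniqueness only among its own names, with no central server, yet the prefixed global names of different sites can never collide. The price is visible in the result: the site identifier becomes part of the name the user sees. The class and the `"<site>.<name>"` format are assumptions for illustration.

```python
class Site:
    """A site that enforces name uniqueness locally (hypothetical model)."""

    def __init__(self, site_id):
        self.site_id = site_id
        self.local_names = set()

    def create(self, local_name):
        """Register a local object and return its globally unique name."""
        if local_name in self.local_names:
            raise ValueError(f"{local_name!r} already exists at {self.site_id}")
        self.local_names.add(local_name)
        # The site prefix guarantees global uniqueness but exposes the location.
        return f"{self.site_id}.{local_name}"


site1, site3 = Site("Site1"), Site("Site3")
print(site1.create("Staff"), site3.create("Staff"))  # Site1.Staff Site3.Staff
```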
2. Transaction Transparency
All transactions must ensure the consistency and integrity of the DDB.
A transaction that needs to access data at multiple sites is divided into multiple sub-transactions, one for each site involved.
Even though the transaction is split, atomicity must be maintained: either all of its sub-transactions commit or none of them do.
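One way to see the all-or-nothing requirement is the simplified coordinator below, written in Python in the spirit of two-phase commit (a commit protocol these slides do not name). Each sub-transaction first votes on whether it can commit; only if every vote is yes does the coordinator tell all of them to commit, otherwise all are aborted. The classes and the `can_commit` flag are hypothetical, and failures, timeouts and logging are ignored.

```python
class SubTransaction:
    """The part of a global transaction executed at one site (hypothetical model)."""

    def __init__(self, site, can_commit):
        self.site = site
        self.can_commit = can_commit  # stands in for the site's local checks
        self.state = "active"

    def prepare(self):
        """Phase 1: vote on whether this site is able to commit its part."""
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def run_global_transaction(subtransactions):
    """Commit every sub-transaction or none of them."""
    votes = [st.prepare() for st in subtransactions]  # collect every vote
    if all(votes):
        for st in subtransactions:
            st.commit()
        return "committed"
    for st in subtransactions:
        st.abort()
    return "aborted"


ok = [SubTransaction("London", True), SubTransaction("Glasgow", True)]
failing = [SubTransaction("London", True), SubTransaction("Glasgow", False)]
print(run_global_transaction(ok), run_global_transaction(failing))  # committed aborted
```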
3. Performance Transparency
The DDBMS should perform as if it were a centralized DBMS.
Performance should not suffer because the system is distributed (e.g., because of network communication costs).
When a site issues a query, the system must find the most cost-effective way of executing it.
The Distributed Query Processor (DQP) must determine:
• Which fragment to access.
• Which copy of a fragment to access (if replication is used).
• Where the fragments are located.
3. Performance Transparency
Consider the following distributed DB:
• Property(PropertyNo, City): 10,000 records, stored in London.
• Client(ClientNo, MaxPrice): 100,000 records, stored in Glasgow.
• Viewing(PropertyNo, ClientNo): 1,000,000 records, stored in London.
The London site wants to list the properties in Aberdeen that have been viewed by clients whose maximum price limit is greater than 200,000.
SELECT p.propertyNo
FROM Property p INNER JOIN
(Client c INNER JOIN Viewing v ON c.clientNo = v.clientNo)
ON p.propertyNo = v.propertyNo
WHERE p.city = 'Aberdeen' AND
c.maxPrice > 200000;
3. Performance Transparency
After the query is issued, the DDBMS must determine the most cost-effective strategy for executing it.
Strategies:
1. Move the Client relation to London and process the query there.
2. Move the Property and Viewing relations to Glasgow, process the query there, then return the result.
3. Join Property and Viewing at London, project only the property number and client number, and move the result to Glasgow to join with the clients whose maxPrice > 200,000; then return the result.
4. Select the clients with maxPrice > 200,000 at Glasgow, move them to London, and join them there with Viewing and the Aberdeen properties.
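A simple way to compare the strategies is to estimate communication time as (number of messages × access delay) + (data volume / transfer rate). The Python sketch below applies that model. Every parameter except the three relation sizes is a hypothetical assumption that is not given on the slide: tuple size, transfer rate, access delay, the number of clients with maxPrice > 200,000, and the number of viewings of Aberdeen properties. It also assumes strategy 3 ships only the Aberdeen viewings and counts them as full-size tuples, and it ignores the cost of returning the final result.

```python
# Hypothetical cost-model parameters (NOT given on the slide).
RECORD_CHARS = 100           # size of every tuple, in characters
RATE = 10_000                # transfer rate, characters per second
DELAY = 1.0                  # fixed access delay per message, seconds
RICH_CLIENTS = 10            # clients with maxPrice > 200,000
ABERDEEN_VIEWINGS = 100_000  # viewings of properties in Aberdeen


def transfer_time(messages, records):
    """Communication time = messages * access delay + data volume / rate."""
    return messages * DELAY + records * RECORD_CHARS / RATE


# Data shipped by each strategy (returning the final result is ignored).
strategies = {
    1: transfer_time(1, 100_000),             # ship all of Client to London
    2: transfer_time(2, 10_000 + 1_000_000),  # ship Property and Viewing to Glasgow
    3: transfer_time(1, ABERDEEN_VIEWINGS),   # ship joined (propertyNo, clientNo) rows
    4: transfer_time(1, RICH_CLIENTS),        # ship only the selected clients
}
best = min(strategies, key=strategies.get)

for number, seconds in strategies.items():
    print(f"strategy {number}: {seconds:,.1f} s")
print("cheapest:", best)
```

Under these assumptions, selecting early at the site that holds the data (strategy 4) ships far less data than any other strategy, which is why a query processor tries to apply selections before moving relations across the network.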
4. DBMS Transparency
Hides the fact that different sites may run different local DBMSs.
Relevant only to heterogeneous DDBMSs.
Date’s 12 Rules for a DDBMS
1. Local autonomy.
2. No reliance on a central site.
3. Continuous operation.
4. Location independence.
5. Fragmentation independence.
6. Replication independence.
7. Distributed query processing.
8. Distributed transaction processing.
9. Hardware independence.
10. Operating system independence.
11. Network independence.
12. Database independence.