lecture04 (1)

7/29/2019 Lecture04 (1)


DDBMS - Lecture 4 Distributed DBMS Architecture

Table of Contents

Architecture

Standardization

DBMS Implementation Alternatives

Autonomy

Distribution

Heterogeneity

Architectural Alternatives

Distributed DBMS Architecture

Client/Server System

Peer-to-Peer Distributed System

MDBS Architecture

Directory Issues




Architecture

Defines the structure of the system

components identified

functions of each component defined

interrelationships and interactions between components defined

Standardization

Reference Model – A conceptual framework whose purpose is to divide standardization

work into manageable pieces and to show at a general level how these pieces are related to

one another.

Approaches

Component-based

Components of the system are defined together with the interrelationships between

components.

Good for design and implementation of the system.

Function-based

Classes of users are identified together with the functionality that the system will

provide for each class.

The objectives of the system are clearly identified, but it gives very little insight

into how these objectives will be attained or the level of complexity of the system.

Data-based

Identify the different types of data description and specify the functional units that

will realize and/or use data according to these views. (Datalogical Approach)

It gives central importance to the data resource, but it is impossible to

specify an architectural model fully unless the functional modules are described.

The ANSI/SPARC Architecture

Refer to Figure 1.

External Schema – for the view of the user, e.g., application administrator.

Conceptual Schema – for the view of the enterprise, e.g., enterprise administrator.

Internal Schema – for the view of the system or machine, e.g., database administrator.

Schema definitions by example

Conceptual Schema Definition

RELATION EMP [

KEY = {ENO}

ATTRIBUTES = {

ENO : CHARACTER(9)

ENAME : CHARACTER(15)




TITLE : CHARACTER(10)

}

]

RELATION PAY [

KEY = {TITLE}

ATTRIBUTES = {

TITLE : CHARACTER(10)

SAL : NUMERIC(6)

}

]

RELATION PROJ [

KEY = {PNO}

ATTRIBUTES = {

PNO : CHARACTER(7)

PNAME : CHARACTER(20)

BUDGET : NUMERIC(7)

}

]

RELATION ASG [

KEY = {ENO,PNO}

ATTRIBUTES = {

ENO : CHARACTER(9)

PNO : CHARACTER(7)

RESP : CHARACTER(10)

DUR : NUMERIC(3)

}

]


Internal Level Definition – the relations above are stored in indexed files. Each stored

record also carries a HEADER field, which might contain flags (delete, update, etc.).




INTERNAL_REL EMPL [

INDEX ON E# CALL EMINX

FIELD = {

HEADER : BYTE(1)

E# : BYTE(9)

ENAME : BYTE(15)

TIT : BYTE(10)

}

]
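The internal definition above can be read as a fixed-length record layout. The following is a minimal sketch of that idea, assuming space-padded ASCII fields; the flag bit values, the function names, and the sample employee are hypothetical, since the notes only say that HEADER might contain delete and update flags.

```python
import struct

# Field widths follow INTERNAL_REL EMPL: HEADER 1, E# 9, ENAME 15, TIT 10 bytes.
EMPL_RECORD = struct.Struct("1s9s15s10s")

# Hypothetical flag bits; the notes do not specify an encoding for HEADER.
FLAG_DELETED = 0x01
FLAG_UPDATED = 0x02


def pack_empl(eno: str, ename: str, title: str, flags: int = 0) -> bytes:
    """Encode one employee as a fixed-length record, space-padding each field."""
    return EMPL_RECORD.pack(
        bytes([flags]),
        eno.encode("ascii").ljust(9),
        ename.encode("ascii").ljust(15),
        title.encode("ascii").ljust(10),
    )


def unpack_empl(record: bytes) -> dict:
    """Decode a record back into field values and flag booleans."""
    header, eno, ename, title = EMPL_RECORD.unpack(record)
    return {
        "deleted": bool(header[0] & FLAG_DELETED),
        "updated": bool(header[0] & FLAG_UPDATED),
        "ENO": eno.decode("ascii").rstrip(),
        "ENAME": ename.decode("ascii").rstrip(),
        "TITLE": title.decode("ascii").rstrip(),
    }


# Made-up sample data, used only to exercise the layout.
record = pack_empl("E1", "J. Doe", "Engineer", FLAG_UPDATED)
print(EMPL_RECORD.size)
print(unpack_empl(record))
```

A fixed-length layout like this lets an index such as EMINX locate a record by offset arithmetic; a real storage manager would add more bookkeeping than this sketch shows.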

External View Definition –

Example 1 - Create a BUDGET view from the PROJ relation

CREATE VIEW BUDGET(PNAME, BUD)

AS SELECT PNAME, BUDGET

FROM PROJ

Example 2 - Create a PAYROLL view from relations EMP and PAY

CREATE VIEW PAYROLL (ENO, ENAME, SAL)

AS SELECT EMP.ENO,EMP.ENAME,PAY.SAL

FROM EMP, PAY

WHERE EMP.TITLE = PAY.TITLE
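To see the two external views in action, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table definitions approximate the conceptual schema (SQLite does not enforce the declared lengths), the sample rows are made up, and the column list in CREATE VIEW assumes SQLite 3.9 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: base relations (types are approximations of the notes' schema).
conn.executescript("""
    CREATE TABLE EMP  (ENO CHAR(9) PRIMARY KEY, ENAME CHAR(15), TITLE CHAR(10));
    CREATE TABLE PAY  (TITLE CHAR(10) PRIMARY KEY, SAL NUMERIC(6));
    CREATE TABLE PROJ (PNO CHAR(7) PRIMARY KEY, PNAME CHAR(20), BUDGET NUMERIC(7));

    -- External level: the two views from the examples above.
    CREATE VIEW BUDGET (PNAME, BUD) AS
        SELECT PNAME, BUDGET FROM PROJ;
    CREATE VIEW PAYROLL (ENO, ENAME, SAL) AS
        SELECT EMP.ENO, EMP.ENAME, PAY.SAL
        FROM EMP, PAY
        WHERE EMP.TITLE = PAY.TITLE;
""")

# Made-up sample rows, used only for illustration.
conn.executemany("INSERT INTO EMP VALUES (?, ?, ?)",
                 [("E1", "J. Doe", "Engineer"), ("E2", "M. Smith", "Analyst")])
conn.executemany("INSERT INTO PAY VALUES (?, ?)",
                 [("Engineer", 40000), ("Analyst", 34000)])
conn.execute("INSERT INTO PROJ VALUES (?, ?, ?)", ("P1", "Instrumentation", 150000))

# Users of the views never see PNO, TITLE, or the join between EMP and PAY.
budget_rows = conn.execute("SELECT PNAME, BUD FROM BUDGET").fetchall()
payroll_rows = conn.execute(
    "SELECT ENO, ENAME, SAL FROM PAYROLL ORDER BY ENO").fetchall()
print(budget_rows)
print(payroll_rows)
```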

DBMS Implementation Alternatives

- Dimensions of the Problem – systems are characterized with respect to

(1) the autonomy of local systems, (2) their distribution, and (3) their heterogeneity.

Autonomy

It indicates the degree to which individual DBMSs can operate independently.

It is a function of a number of factors, such as

Whether the component systems exchange information,

Whether they can independently execute transactions, and

Whether one is allowed to modify them.

Not well understood and most troublesome

Various versions

Gligor and Popescu-Zeletin, 1986

Local operations of the individual DBMSs are not affected by their participation in

the multi-database system.

The manner in which individual DBMSs process and optimize queries should not be

affected by the execution of global queries that access multiple databases.




System consistency or operation should not be compromised when individual

DBMSs join or leave the multi-database confederation.

Du and Elmagarmid, 1989

Design autonomy: Ability of a component DBMS to decide on issues related to its

own design.

Communication autonomy: Ability of a component DBMS to decide whether and

how to communicate with other DBMSs.

Execution autonomy: Ability of a component DBMS to execute local operations in

any manner it wants to.

Classification (taxonomy)

Tight integration – a single-image of the entire database is available to any user who

wants to share the information, which may reside in multiple databases.

Semiautonomous systems – consist of DBMSs that can operate independently, but have

decided to participate in a federation to make their local data sharable.

Total isolation – the individual systems are stand-alone DBMSs, which know neither

of the existence of other DBMSs nor how to communicate with them.

Distribution

Concerns whether the components of the system are located on the same machine or not;

it considers the physical distribution of data over multiple sites.

Client/Server distribution – concentrates data management duties at servers while the

clients focus on providing the application environment including the user interface.

Peer-to-Peer (or full) distribution – no distinction of client machines versus servers.

Each machine has full DBMS functionality and can communicate with other machines

to execute queries and transactions.

Heterogeneity

Various levels (hardware, communications, operating system)

The important ones are at the DBMS level – data models, query languages, transaction management algorithms

Representing data with different modeling tools creates heterogeneity because of the

inherent expressive power and limitations of individual data models.

Architectural Alternatives

Refer to Figure 2; the dimensions are identified as A (autonomy), D (distribution) and H

(heterogeneity).

(A0, D0, H0): a composite system. There is no distribution or heterogeneity; the

system is a set of multiple DBMSs that are logically integrated.




(A0, D0, H1): one has multiple data managers that are heterogeneous but provide an

integrated view to the user such as the access to network, hierarchical, and relational

database residing on a single machine.

(A0, D1, H0): the database is distributed even though an integrated view of the data is

provided to users; e.g., client/server distribution.

(A0, D2, H0): the same type of transparency is provided to the user in a fully

distributed environment. No distinction among clients and servers, each site provides

identical functionality; e.g., peer-to-peer distribution.

(A1, D0, H0): semiautonomous systems, which are commonly termed federated

DBMS.

(A1, D0, H1): heterogeneous federated DBMS, commonly found in everyday use. If we

wish to provide an integrated view to the users, then it is necessary to hide the

autonomy and heterogeneity of the component systems and establish a common

interface.

(A1, D1, H1): distributed, heterogeneous federated DBMS, in which the component systems

are placed on different machines; here, autonomy and heterogeneity are more

important than distribution.

(A2, D0, H0): multi-database system (MDBS) architectures with full autonomy.

There’s no concept of cooperation and they do not even know how to talk to each

other. A multi-database management system (multi-DBMS) provides for the

management of this collection of autonomous databases and transparent access to it.

(A2, D0, H1): perhaps even more realistic than (A1, D0, H1); we want to build

applications which access data from multiple storage systems with different

characteristics.

(A2, D1, H1) and (A2, D2, H1): both represent the case where the component databases

that make up the MDBS are distributed over a number of sites, called a distributed

MDBS; both face similar interoperability issues. In the client/server case (A2, D1, H1),

most of the interoperability concerns are delegated to middleware systems, resulting in

a three-layer architecture.
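The three dimensions can be treated as coordinates of a small design space. The sketch below is one way to encode them and to derive a label for any point; the enum names, the `classify` function, and the generated labels are illustrative assumptions that only loosely follow the terms used above, and not every one of the points it enumerates is discussed in these notes.

```python
from enum import IntEnum
from itertools import product


class Autonomy(IntEnum):
    TIGHT_INTEGRATION = 0   # A0
    SEMIAUTONOMOUS = 1      # A1
    TOTAL_ISOLATION = 2     # A2


class Distribution(IntEnum):
    NONE = 0                # D0
    CLIENT_SERVER = 1       # D1
    PEER_TO_PEER = 2        # D2


class Heterogeneity(IntEnum):
    HOMOGENEOUS = 0         # H0
    HETEROGENEOUS = 1       # H1


# The autonomy level decides the base kind of system.
KIND = {
    Autonomy.TIGHT_INTEGRATION: "logically integrated DBMS",
    Autonomy.SEMIAUTONOMOUS: "federated DBMS",
    Autonomy.TOTAL_ISOLATION: "multi-database system",
}


def classify(a: Autonomy, d: Distribution, h: Heterogeneity) -> str:
    """Build a descriptive label for one (A, D, H) point."""
    parts = []
    if d != Distribution.NONE:
        parts.append("distributed")
    if h == Heterogeneity.HETEROGENEOUS:
        parts.append("heterogeneous")
    parts.append(KIND[a])
    return " ".join(parts)


# Every combination of the three dimensions.
design_space = list(product(Autonomy, Distribution, Heterogeneity))
print(len(design_space))
print(classify(Autonomy.SEMIAUTONOMOUS,
               Distribution.CLIENT_SERVER,
               Heterogeneity.HETEROGENEOUS))
```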

Distributed DBMS Architecture

Client/Server System

Emerged at the beginning of the 1990s.

The functions are divided into two classes: server functions and client functions.

This provides a two-level architecture, which makes it easier to manage the complexity of

modern DBMSs and the complexity of distribution.




If one takes a process-centric view, then any process that requests the services of another

process is its client.

“Client/Server computing” and “Client/Server DBMS”, in their modern definition, refer to

actual machines rather than to processes.

Functionality of server and client

The server does most of the data management work: all query

processing and optimization, transaction management and storage management.

The client, in addition to the application and the user interface, has a DBMS client

module that is responsible for managing the data that is cached to the client and

(sometimes) managing the transaction locks that may have been cached as well.

Client/Server Reference Architecture (also refer to Figure 3)

Multiple Client – Single Server (Figure 6)

The database is stored on only one machine (the server).

The differences from centralized systems are in the way transactions are

executed and caches are managed.

Multiple Client – Multiple Server (Figure 7)

Stage 1 – Each client manages its own connection to the appropriate server

“Heavy Client” system: the approach simplifies server code, but loads the

client machines with additional responsibilities.

Stage 2 – Each client knows of only its “home server” which then communicates

with other servers as required.

“Light Client” system: concentrates the data management functionality at the

server; the transparency of data access is provided at the server interface.
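The difference between the two stages can be sketched in a few lines. This is a toy model under stated assumptions: the classes `Server`, `HeavyClient`, and `LightClient`, the `peers` table, and the sample rows are all hypothetical, and "reading a relation" stands in for arbitrary query processing.

```python
class Server:
    """A server that stores some relations and may know which peer holds others."""

    def __init__(self, name, relations):
        self.name = name
        self.relations = relations      # relation name -> list of rows
        self.peers = {}                 # relation name -> Server that holds it

    def read_local(self, relation):
        return self.relations[relation]

    def read(self, relation):
        """Answer locally if possible, otherwise forward to the peer that holds the data."""
        if relation in self.relations:
            return self.relations[relation]
        return self.peers[relation].read_local(relation)


class HeavyClient:
    """Stage 1: the client keeps its own catalog and contacts each server directly."""

    def __init__(self, catalog):
        self.catalog = catalog          # relation name -> Server

    def read(self, relation):
        return self.catalog[relation].read_local(relation)


class LightClient:
    """Stage 2: the client knows only its home server."""

    def __init__(self, home):
        self.home = home

    def read(self, relation):
        return self.home.read(relation)


# Made-up placement: EMP lives on s1, PROJ on s2.
s1 = Server("s1", {"EMP": [("E1", "J. Doe")]})
s2 = Server("s2", {"PROJ": [("P1", "Instrumentation")]})
s1.peers["PROJ"] = s2

heavy = HeavyClient({"EMP": s1, "PROJ": s2})
light = LightClient(home=s1)
print(heavy.read("PROJ"))
print(light.read("PROJ"))
```

Both clients obtain the same rows; what changes is where the knowledge of data placement lives, which is the trade-off between simpler server code and lighter client machines described above.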

Client/server system vs. Peer-to-peer system

From the management perspective, the client/server system provides a different

architecture from the peer-to-peer system.

From a datalogical perspective, they both give the user the appearance of a

logically single database, while at the physical level data may be distributed.

Distinction: the two differ in the architectural paradigm that is used to realize the level of

transparency that is provided to the users and applications.

Advantages of Client-Server Architectures

More efficient division of labor

Horizontal and vertical scaling of resources

Better price/performance on client machines

Ability to use familiar tools on client machines

Client access to remote data (via standards)

Full DBMS functionality provided to client workstations

Overall better system price/performance




Problems With Multiple-Client/Single Server

Server forms bottleneck

Server forms single point of failure

Database scaling difficult

Peer-to-Peer Distributed System

Organization definitions:

LIS (local internal schema) – individual internal schema definition at each site

GCS (global conceptual schema) – the enterprise view of the data

LCS (local conceptual schema) – the third layer of architecture to describe logical

organization of data at each site

ES (external schema) – support for user applications and user access above the GCS

Refer to Figure 9, the datalogical distributed DBMS architecture.

Location and replication transparencies are supported by GCS and LCS.

Network transparency is supported by GCS.

The distributed DBMS translates global queries into a group of local queries.

The local queries are executed by distributed DBMS components at different sites.
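The translation of a global query into local queries can be sketched as follows. This is a simplified illustration that assumes the global relation EMP is split into horizontal fragments, one per site, so the same selection can be sent to every site and the results combined; `SITES`, `GLOBAL_DIRECTORY`, the function names, and the sample rows are hypothetical.

```python
# Local databases: each site stores one fragment of the global relation EMP.
SITES = {
    "site1": {"EMP": [("E1", "J. Doe", "Engineer")]},
    "site2": {"EMP": [("E2", "M. Smith", "Analyst"),
                      ("E3", "A. Lee", "Engineer")]},
}

# Hypothetical global directory: global relation -> sites holding a fragment.
GLOBAL_DIRECTORY = {"EMP": ["site1", "site2"]}


def decompose(relation, predicate):
    """Turn one global query into one local query per site holding a fragment."""
    return [(site, relation, predicate) for site in GLOBAL_DIRECTORY[relation]]


def execute_local(site, relation, predicate):
    """Run a local query against the fragment stored at one site."""
    return [row for row in SITES[site][relation] if predicate(row)]


def execute_global(relation, predicate):
    """Execute all local queries and combine their results."""
    result = []
    for site, local_relation, local_predicate in decompose(relation, predicate):
        result.extend(execute_local(site, local_relation, local_predicate))
    return result


# The user asks one question against the global schema and never names a site.
engineers = execute_global("EMP", lambda row: row[2] == "Engineer")
print(engineers)
```

A real decomposer would also handle vertical fragments, replicas, and cost-based site selection; the sketch only shows how the user's query is kept free of location details.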

Extension of the ANSI/SPARC model in Figure 10

GD/D (global directory/dictionary) – permits the required global mapping

LD/D (local directory/dictionary) – performs the local mapping

The LCSs are mappings of the GCS onto each site. Such a database is typically

designed in a top-down fashion, so all external view definitions are made globally.

The components of a distributed DBMS (Figure 11) – one component handles the

interaction with users (user processor), and another deals with the storage (data processor).

The user processor

User interface handler – interprets user commands as they come in, and formats

the result data as it is sent to the user.

Semantic data controller – uses the integrity constraints and authorizations that are

defined as part of the GCS to check if the user query can be processed.

Global query optimizer and decomposer – determines an execution strategy to

minimize a cost function, and translates the global queries into local ones using the

GCS and LCS as well as the global directory to generate the best strategy.

Distributed execution monitor – also called the distributed transaction manager; coordinates

the distributed execution of the user request.

The data processor

Local query optimizer – acts as the access path selector and is responsible for choosing

the best access path to any data item.

Local recovery manager – makes sure that the local database remains consistent




even when failures occur.

Run-time support processor – physically accesses the database according to the

physical commands in the schedule generated by the query optimizer.

It is the interface to the operating system and contains the database buffer (or

cache) manager, which maintains the main memory buffers and manages data accesses.

Note: in peer-to-peer systems, one expects to find both the user processor modules

and the data processor modules on each machine; however, there are suggestions

to separate “query-only sites” in a system from full-functionality ones.

MDBS Architecture

Distributed DBMS (DDBMS) vs. Distributed Multi-DBMS (MDBS)

GCS definition:

DDBMS: the GCS defines the conceptual view of the entire database.

MDBS: the GCS represents only the collection of those parts of the local databases that

each local DBMS wants to share.

Global database view:

DDBMS: the union of the local databases

MDBS: a subset of the same union
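The union/subset distinction can be checked with plain sets. In this small sketch the database names and the choice of which relations each local DBMS shares are made up for illustration.

```python
# Relations stored by each local DBMS (hypothetical placement).
local_dbs = {
    "db1": {"EMP", "PAY"},
    "db2": {"PROJ", "ASG"},
}

# What each local DBMS is willing to share in the multi-database case (hypothetical).
shared = {
    "db1": {"EMP"},
    "db2": {"PROJ", "ASG"},
}

# DDBMS: the global database is the union of all local databases.
ddbms_view = set().union(*local_dbs.values())

# MDBS: the global database covers only the shared parts.
mdbs_view = set().union(*shared.values())

print(sorted(ddbms_view))
print(sorted(mdbs_view))
print(mdbs_view <= ddbms_view)
```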

MDBS models using a GCS

The GCS defined in MDBS:

Integrating either the external schemas of local autonomous databases or parts of

their local conceptual schemas. (Figure 12)

Users of a local DBMS define their own views on the local database and do not

need to change their applications if they do not want to access data from another

database.

Designing the GCS in multi-database systems involves the integration of either the LCSs

or the LESs.

GCS in multi-DBMS: the mapping is from LCS to GCS;

(up-to-down in Figure 12)

GCS in logically integrated distributed DBMS: the mapping is from GCS to LCS.

(down-to-up in Figure 12)

If heterogeneity exists in the system, two implementation alternatives exist:

Unilingual multi-DBMS: requires the users to utilize possibly different data

models and languages when both a local database and the global database are

accessed.

Any application that accesses data from multiple databases must do so by

means of an external view defined on the GCS.

One application may have the following external views, possibly in different languages:




- a local external schema (LES) defined on the local conceptual schema

(LCS)

- a global external schema (GES) defined on the global conceptual schema.

Multilingual multi-DBMS: permits each user to access the global database by

means of an external schema, defined using the language of the user’s local

DBMS.

The external schemas here are described in the language of the external schemas of the

local database.

Because queries are posed in the language of the local DBMS, they generally require

some processing to be mapped to the GCS.

This makes querying the database easier from the user’s perspective, but the system is more

complicated because it must deal with the translation of queries at run time.

MDBS models without a GCS

Referring to Figure 13, the system consists of two layers: the local system layer and

the multi-database layer.

The local system layer consists of a number of DBMSs, which present to the

multi-database layer the part of their local database they are willing to share with

users of other databases.

If heterogeneity is involved, each of these schemas, LCSi, may use a different data

model.

The external views may be defined on one local conceptual schema or on multiple

conceptual schemas.

The responsibility of providing access to multiple (and possibly heterogeneous)

databases is delegated to the mapping between the external schemas

and the local conceptual schemas.

The MDBS provides a layer of software that runs on top of these individual DBMSs

and provides users with the facilities for accessing various databases.

Figure 14 represents a non-distributed multi-DBMS.

If the system is distributed, we would need to replicate the multi-database layer to

each site where there is a local DBMS that participates in the system.

Directory Issues

The global directory issue is relevant only for a distributed DBMS or a multi-DBMS that

uses a GCS.

The global directory is an extension of the directory described in the ANSI/SPARC report.

It includes information about the location and makeup of the fragments.

The directory is itself a database that contains meta-data.

The first issue – type, global or local?

May be either global to the entire database or local to each site.




May be maintained centrally at one site, or distributed over a number of sites.

The second issue – location, centralized or distributed directories?

Keeping the directory at one site – increases the load at that site and causes traffic around it.

Distributing directories over a number of sites – increases the complexity of managing

directories.

The third issue – replication, single copy or multiple copies?

Multiple copies provide more reliability, since the probability of reaching one copy of

the directory is higher.

Multiple copies also cause lower delays in accessing the directory due to less

contention and the relative proximity of the directory copies.

The multiple-copy case applies only to a distributed DBMS; an MDBS maintains only a

single directory.

Alternative Directory Management Strategies

Refer to Figure 15; the three dimensions are orthogonal to one another.

The question mark (?) represents the unrealistic combinations.

The choice of an appropriate directory management scheme should also depend on the

query processing and the transaction management techniques.
