Database Design – Lecture 16: Distributed Databases
TRANSCRIPT
2
Lecture Objectives
- Distributed Processing and Distributed Databases
- Distributed Database Management System (DDBMS)
- Distributed Database Design
3
Distributed Processing
Shares the database’s logical processing among two or more physically independent sites that are connected through a network.
Note: data resides at only one site and is shared by other sites (“centralized”)
4
Distributed Databases
Stores a logically related database over two or more physically independent sites. The sites are connected by a computer network.
Note: the database is composed of several parts known as database fragments. These fragments are located at several different sites.
5
Distributed Processing and Distributed Databases
In a distributed database environment, the users do not need to know the name or location of each database fragment in order to access the database – transparent to the user
Distributed processing does not require a distributed database but a distributed database requires distributed processing
Both distributed processing and distributed databases require a network to connect all components
7
DDBMS Advantages
- Data are located near/at the “greatest demand” site – improved performance
- Improved reliability – data replication
- Growth facilitation
- Reduced operating costs
9
Distributed Database Management System (DDBMS)
Governs the storage and processing of a single logically related database over interconnected computer systems in which both data and processing functions are distributed among several sites.
10
Distributed Database Management System (DDBMS)
A DDBMS must have at least the following functions to be classified as distributed:
- Application Interface
- Validation
- Transformation
- Query Optimization
- Mapping
- I/O Interface
- Formatting
- Security
- Backup & Recovery
- DB Administration
- Concurrency Control
- Transaction Management
- Computer Workstations (sites or nodes)
- Network Hardware & Software
- Communications Media
11
Distributed Database Management System (DDBMS)
A DDBMS must have at least the following functions to be classified as distributed:
- Application Interface
  - Allows interaction with the end user or application programs and with other DBMSs within the distributed database
- Validation
  - Able to analyze data requests
- Transformation
  - To determine which data request components are distributed and which ones are local
12
Distributed Database Management System (DDBMS)
A DDBMS must have at least the following functions to be classified as distributed:
- Query Optimization
  - To find the best access strategy
- Mapping
  - To determine the data location of local and remote fragments
- I/O Interface
  - To read or write data from or to permanent local storage
13
Distributed Database Management System (DDBMS)
A DDBMS must have at least the following functions to be classified as distributed:
- Formatting
  - To prepare the data for presentation to the end user or an application program
- Security
  - To provide data privacy at both local and remote databases
- Backup and Recovery
  - To ensure the availability and recoverability of the database in case of a failure
14
Distributed Database Management System (DDBMS)
A DDBMS must have at least the following functions to be classified as distributed:
- DB Administration
  - To allow the Database Administrator to maintain the databases
- Concurrency Control
  - To manage simultaneous data access and ensure data consistency across database fragments in the DDBMS
- Transaction Management
  - To ensure that the data move from one consistent state to another – synchronizing transactions
15
Distributed Database Management System (DDBMS)
A DDBMS must have at least the following components:
- Computer Workstations (sites or nodes)
  - Form the network system
- Network Hardware and Software
  - Components that reside in each workstation
  - Allow all sites to interact and exchange data
- Communications Media
  - Carry data from one workstation to another
16
Distributed Database Management System (DDBMS)
A DDBMS must have at least the following components:
- Transaction Processor (TP)
  - Software component found in each computer that requests data
  - Receives and processes the application’s data requests (remote and local)
- Data Processor (DP)
  - Software component residing on each computer that stores and retrieves data located at the site
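The division of labor between the TP and the DP can be sketched in a few lines of Python. This is only an in-memory illustration, not how any particular DDBMS implements it; the class names, the site names ("miami", "tampa") and the sample rows are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataProcessor:
    """DP: stores and retrieves the data located at one site."""
    site: str
    rows: list = field(default_factory=list)

    def retrieve(self, predicate):
        # Return only the rows stored at this site that match the request.
        return [row for row in self.rows if predicate(row)]


@dataclass
class TransactionProcessor:
    """TP: receives an application's data request and passes it to the DPs."""
    site: str
    data_processors: dict  # site name -> DataProcessor (hypothetical registry)

    def request(self, predicate):
        local, remote = [], []
        for site, dp in self.data_processors.items():
            # A request served by the TP's own site is local, otherwise remote.
            target = local if site == self.site else remote
            target.extend(dp.retrieve(predicate))
        return {"local": local, "remote": remote}


dps = {
    "miami": DataProcessor("miami", [{"id": 1, "balance": 500}]),
    "tampa": DataProcessor("tampa", [{"id": 2, "balance": 50},
                                     {"id": 3, "balance": 900}]),
}
tp = TransactionProcessor("miami", dps)
result = tp.request(lambda row: row["balance"] > 100)
```

Here the application talks only to the TP at its own site; which DP actually holds each row stays hidden from it.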
19
Distributed Database Design
- Designing for a relational database structure does not change – start with a top-down approach
- HOWEVER, need to consider the following as well:
  - How to partition the database into fragments
  - Which fragments to replicate
  - Where to locate those fragments and replicas
- More frequently used fragments should be stored locally
- Fragments used by all users should be stored centrally
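The two placement rules above can be illustrated with a small sketch. It assumes a hypothetical access log and a made-up threshold (`shared_ratio`): a fragment is placed at the site that uses it most, unless no single site accounts for more than that share of accesses, in which case it goes to a central site. The fragment and site names are illustrative only.

```python
from collections import Counter

# Hypothetical access log: (site issuing the request, fragment accessed).
access_log = [
    ("tampa", "CUSTOMER_FL"), ("tampa", "CUSTOMER_FL"), ("miami", "CUSTOMER_FL"),
    ("atlanta", "CUSTOMER_GA"), ("atlanta", "CUSTOMER_GA"),
    ("tampa", "PRODUCT"), ("miami", "PRODUCT"), ("atlanta", "PRODUCT"),
]


def place_fragments(log, central_site, shared_ratio=0.5):
    """Place each fragment at its busiest site, or centrally if no site dominates."""
    counts = {}
    for site, fragment in log:
        counts.setdefault(fragment, Counter())[site] += 1

    placement = {}
    for fragment, per_site in counts.items():
        top_site, top_hits = per_site.most_common(1)[0]
        share = top_hits / sum(per_site.values())
        # Assumption: "used by all users" is approximated by "no dominant site".
        placement[fragment] = top_site if share > shared_ratio else central_site
    return placement


placement = place_fragments(access_log, central_site="head_office")
```

With this log, the two customer fragments stay at the sites that use them most, while PRODUCT, accessed equally from everywhere, is stored centrally.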
20
Distributed Database Design
Data Fragmentation:
- Allows a single object to be broken into two or more segments or fragments
- Each fragment can be stored at any site on the network
- Data fragmentation information is stored in the distributed data catalog (DDC), from which it is accessed by the TP to process user requests
22
Distributed Database Design
Types of Data Fragmentation:
- Horizontal
  - The division of a relation into subsets of tuples (rows)
  - Each fragment is stored at a different node and each fragment has unique rows
  - Each tuple has the same attributes (columns), but the rows are fragmented
23
Distributed Database Design
Example of horizontal fragmentation
[Figure: original structure and fragmented structure, split by state]
24
Distributed Database Design
Example of horizontal fragmentation
[Figure: resulting structure of the fragments, split by state]
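A split by state like the one in the figures can be sketched as follows. The CUSTOMER rows and column names below are invented for illustration and are not the data from the slides.

```python
from collections import defaultdict

# Hypothetical CUSTOMER relation; column names are illustrative only.
customers = [
    {"cus_num": 10, "name": "Sinex", "state": "TN"},
    {"cus_num": 11, "name": "Martin", "state": "FL"},
    {"cus_num": 12, "name": "Mydex", "state": "GA"},
    {"cus_num": 13, "name": "BTBC", "state": "TN"},
]


def fragment_horizontally(rows, attribute):
    """Group whole rows by the value of one attribute (one fragment per value)."""
    fragments = defaultdict(list)
    for row in rows:
        fragments[row[attribute]].append(row)
    return dict(fragments)


fragments = fragment_horizontally(customers, "state")

# The union of all fragments gives back the original relation.
rebuilt = sorted(
    (row for fragment in fragments.values() for row in fragment),
    key=lambda row: row["cus_num"],
)
```

Every fragment keeps all columns; only the set of rows differs, and no row appears in more than one fragment.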
25
Distributed Database Design
Types of Data Fragmentation:
- Vertical
  - The division of a relation into subsets by attributes (columns)
  - Each subset is stored at a different node, and each fragment has unique columns – with the exception of the key column, which is common to all fragments
  - Transaction issues arise here because the same record may need to be inserted into two tables (part of the record into one table and the other part into another). If only one insert succeeds, the data end up inconsistent.
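The sketch below illustrates both points: each vertical fragment repeats the key column, and an insert must reach every fragment or none. It is a simplified in-memory version that undoes partial work by hand; a real DDBMS relies on its transaction management for this. The fragment names ("service", "collection") and the sample columns are hypothetical.

```python
def fragment_vertically(rows, key, column_groups):
    """Split rows by attribute; every fragment keeps the key column."""
    return {
        name: [{key: row[key], **{c: row[c] for c in cols}} for row in rows]
        for name, cols in column_groups.items()
    }


def insert_record(fragments, key, record, column_groups):
    """Insert one record into all fragments, undoing partial work on failure."""
    done = []
    try:
        for name, cols in column_groups.items():
            # Raises KeyError if the record lacks a column this fragment needs.
            part = {key: record[key], **{c: record[c] for c in cols}}
            fragments[name].append(part)
            done.append(name)
    except KeyError:
        for name in done:
            fragments[name].pop()  # remove the part that was already inserted
        return False
    return True


customers = [{"cus_num": 10, "name": "Sinex", "address": "12 Main St.", "balance": 1200.0}]
groups = {"service": ["name", "address"], "collection": ["balance"]}
fragments = fragment_vertically(customers, "cus_num", groups)

ok = insert_record(fragments, "cus_num",
                   {"cus_num": 11, "name": "Martin", "address": "321 Sunset Blvd.", "balance": 0.0},
                   groups)
# This record has no balance, so the second insert fails and the first is undone.
bad = insert_record(fragments, "cus_num",
                    {"cus_num": 12, "name": "Mydex", "address": "9 Elm St."},
                    groups)

# Joining the fragments on the common key rebuilds the full records.
joined = [{**s, **c} for s in fragments["service"]
          for c in fragments["collection"] if s["cus_num"] == c["cus_num"]]
```

Without the undo step, the failed insert would leave customer 12 in one fragment but not the other, which is exactly the inconsistency described above.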
26
Distributed Database Design
[Figure: original structure and fragmented structure, split by location]
27
Distributed Database Design
Example of Vertical Fragmentation
[Figure: original structure and fragmented structure, split by location]
28
Distributed Database Design
Types of Data Fragmentation:
- Mixed
  - A combination of horizontal and vertical strategies
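As a rough sketch of a mixed strategy, the function below first splits rows horizontally on one attribute and then splits each row subset vertically into column groups, keeping the key in every fragment. The data, the column groups and the function name are assumptions made for the example.

```python
# Hypothetical CUSTOMER rows and column groups, for illustration only.
customers = [
    {"cus_num": 10, "name": "Sinex", "state": "TN", "balance": 1200.0},
    {"cus_num": 11, "name": "Martin", "state": "FL", "balance": 0.0},
    {"cus_num": 13, "name": "BTBC", "state": "TN", "balance": 75.5},
]
column_groups = {"service": ["name", "state"], "collection": ["balance"]}


def fragment_mixed(rows, split_attr, key, groups):
    """Horizontal split on split_attr, then a vertical split of each row subset."""
    fragments = {}
    for row in rows:
        for group, cols in groups.items():
            part = {key: row[key], **{c: row[c] for c in cols}}
            # Fragments are identified by (horizontal value, vertical group).
            fragments.setdefault((row[split_attr], group), []).append(part)
    return fragments


mixed = fragment_mixed(customers, "state", "cus_num", column_groups)
```

Each resulting fragment, for example `("TN", "collection")`, holds some of the rows and some of the columns of the original relation.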
31
Data Replication
- Storage of data copies at multiple sites served by a computer network
- Fragment copies can be stored at several sites to serve specific information requirements
- Can enhance data availability and response time
- Can help to reduce communication and total query costs
32
Replication Scenarios
- Fully replicated database:
  - Stores multiple copies of each database fragment at multiple sites
  - Can be impractical due to amount of overhead
- Partially replicated database:
  - Stores multiple copies of some database fragments at multiple sites
  - Most DDBMSs are able to handle the partially replicated database well
- Unreplicated database:
  - Stores each database fragment at a single site
  - No duplicate database fragments
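The three scenarios can be told apart by counting the copies of each fragment. The sketch below assumes the allocation is given as a mapping from fragment name to the list of sites holding a copy, with at least one site per fragment; the fragment and site names are hypothetical.

```python
def replication_scenario(allocation):
    """Classify a fragment -> list-of-sites mapping by the definitions above."""
    copies = [len(sites) for sites in allocation.values()]
    if all(n == 1 for n in copies):
        return "unreplicated"          # each fragment at a single site
    if all(n > 1 for n in copies):
        return "fully replicated"      # multiple copies of each fragment
    return "partially replicated"      # multiple copies of some fragments


full = {"E1": ["A", "B"], "E2": ["A", "B"]}
partial = {"E1": ["A", "B"], "E2": ["B"]}
none = {"E1": ["A"], "E2": ["B"]}
labels = [replication_scenario(a) for a in (full, partial, none)]
```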
33
Data Allocation
- Deciding where to locate data
- Allocation strategies:
  - Centralized data allocation
    - Entire database is stored at one site
  - Partitioned data allocation
    - Database is divided into several disjoint parts (fragments) and stored at several sites
  - Replicated data allocation
    - Copies of one or more database fragments are stored at several sites
- Data distribution over a computer network is achieved through data partition, data replication, or a combination of both
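To make the three strategies concrete, the sketch below builds a fragment-to-sites mapping for each one. The round-robin placement and the `copies` parameter are assumptions chosen to keep the example short; the slides do not prescribe how sites are picked.

```python
from itertools import cycle


def allocate(fragments, sites, strategy, copies=2):
    """Return a mapping of fragment -> list of sites for one allocation strategy."""
    if strategy == "centralized":
        # Entire database at one site (here, arbitrarily, the first one).
        return {f: [sites[0]] for f in fragments}
    if strategy == "partitioned":
        # Disjoint fragments spread over the sites, one copy each (round-robin).
        site_cycle = cycle(sites)
        return {f: [next(site_cycle)] for f in fragments}
    if strategy == "replicated":
        # Each fragment is copied to `copies` consecutive sites (assumed policy).
        n = min(copies, len(sites))
        return {
            f: [sites[(i + k) % len(sites)] for k in range(n)]
            for i, f in enumerate(fragments)
        }
    raise ValueError(f"unknown strategy: {strategy}")


fragments = ["E1", "E2", "E3"]
sites = ["A", "B", "C"]
centralized = allocate(fragments, sites, "centralized")
partitioned = allocate(fragments, sites, "partitioned")
replicated = allocate(fragments, sites, "replicated")
```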
34
Distributed Database Design
- How is a distributed database managed?
- Distributed Data Catalog (DDC)
  - Contains the description of the entire database as seen by the DBA
  - Translates user requests into sub-queries (remote requests) that will be processed by different DPs
  - DDC is distributed and replicated at network nodes (the location of a database fragment)
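The translation of a user request into sub-queries can be sketched with a toy catalog. The catalog layout, fragment names and site names below are invented; a real DDC holds far more (schemas, replicas, statistics). The request is "customers in these states", and the catalog is assumed to record which state each horizontal fragment covers.

```python
# Hypothetical catalog: fragment name -> site holding it and the state it covers.
catalog = {
    "CUST_TN": {"site": "nashville", "state": "TN"},
    "CUST_FL": {"site": "miami", "state": "FL"},
    "CUST_GA": {"site": "atlanta", "state": "GA"},
}


def to_subqueries(catalog, states, local_site):
    """Translate 'customers in these states' into one sub-query per fragment."""
    subqueries = []
    for fragment, entry in catalog.items():
        if entry["state"] in states:
            subqueries.append({
                "fragment": fragment,
                "site": entry["site"],
                # A sub-query is remote when its fragment lives at another site.
                "remote": entry["site"] != local_site,
            })
    return subqueries


plan = to_subqueries(catalog, {"TN", "FL"}, local_site="miami")
```

The user names only the states; the fragment names and their locations come from the catalog, which is what keeps the distribution transparent.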
35
Examples of Distributed Databases
- Banking
  - Account data distributed at each local branch
  - Loan data distributed at each local branch
  - Corporate data at head office (summarized branch information)
- Insurance
  - Policy data with each branch
  - Corporate data at head office
36
Examples of Distributed Databases
- Retail
  - Inventory data distributed at each local store
  - Employee scheduling data at each store
  - Corporate data at head office (summarized store information)
  - Payroll data at head office
- Utilities
  - Utility monitoring data at each location (e.g. nuclear station monitoring of air, water, etc. at each location)
  - Corporate data at head office
37
Distributed Database vs Client Server
Client/Server is really an architecture which models a computerized solution based on the distribution of functions between servers and clients. A client requests specific services from a server and a server provides requested services to clients
Distributed processing could be one aspect of client/server architecture – data ‘centralized’
The DDBMS distributes data to different locations – could be used in a Client/Server architecture
38
Distributed Database Design Steps:
1. Always start with a centralized view design
2. Consider horizontal fragmentation of a centralized database
3. Consider vertical fragmentation of a horizontally fragmented database
4. Re-consider PK for all fragments of the database
5. Define data replication rules (scenarios)
6. Complete Design