TRANSCRIPT
1
CG096: Lecture 9
Parallel Data Bases
2
Overview
Background: Recap of Relational Systems.
Parallel Query Processing: Sort and Hash-Join, assuming a "shared-nothing" architecture (supposedly the hardest to program, but actually quite clean).
Data Layout. Parallel Query Optimization.
3
Some History
In pre-relational data bases, programmers reigned: data models used explicit pointers, manipulated by (e.g.) COBOL code.
Relational revolution: data abstraction. Declarative languages and data independence were key to the most successful parallel systems.
Significant developments: Codd's relational model: early 70's; experimental, partially relational implementations (System R & INGRES): mid-late 70's; commercial, fully relational implementations (Oracle, IBM DB2, INGRES Corp): early 80's; rise of parallel DBs: late 80's onwards.
4
The Relational Data Model
A data model is a collection of concepts for describing data.
A schema is a description of a particular collection of data, using a given data model.
The relational model of data: Main construct: relation, basically a table with
rows and columns. Every relation has a schema, which describes the
columns, or fields. N.B. no pointers, no nested structures, no ordering,
no irregular collections.
5
2 Levels of Indirection Many views, single
conceptual (logical) schema and single physical schema. Views describe how
users see the data.
Conceptual schema defines logical structure
Physical schema describes the files and indexes used.
[Figure: View 1, View 2 and View 3 map onto a single Conceptual Schema, which maps onto a single Physical Schema.]
6
Data Independence – aiding the parallel approach
Applications insulated from how data is structured and stored.
Logical data independence: protection from changes in the logical structure of the data, letting you implement parallel processing under traditional applications.
Physical data independence: protection from changes in the physical structure of the data, minimising constraints on processing and enabling clean parallelism.
7
Structure of a DBMS
A typical DBMS has a layered architecture.
The figure does not show the concurrency control and recovery components.
This is one of several possible architectures; each system has its own variations.
[Figure: layered DBMS architecture, top to bottom: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB. Annotation: parallel considerations mostly here.]
8
Relational Query Languages
Query languages: allow manipulation and retrieval of data from a data base.
Relational model supports simple, powerful QLs: a strong formal foundation based on logic, supporting optimisation and parallelisation.
Query Languages <> Programming Languages! QLs are not expected to be "computationally complete" and are not designed for complex calculations; they support easy, efficient access to large data sets.
9
Formal Relational Query Languages Two mathematical Query Languages form the
basis for “real” languages (e.g. SQL), and for implementation:
Relational Algebra: More operational, very useful for representing internal execution plans. “Data base opcodes” – can be optimised and
parallelised for the sake of efficiency. Relational Calculus: Lets users describe what
they want, rather than how to compute it (non-operational, declarative – SQL comes from here.)
10
Basic SQL Relation-list: a list of relation names
– possibly with a range-variable after each name. Target-list: a list of attributes of tables in a
relation-list. Qualification: comparisons combined using
AND, OR and NOT. Comparisons are Attr op const or Attr1 op Attr2, where op is one of =, <>, <, >, <=, >=.
DISTINCT: optional keyword indicating that the answer should not contain duplicates. The default is that duplicates are not eliminated!
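A minimal runnable sketch of these rules, using Python's built-in sqlite3 module and a made-up Emp table (all table and column names here are illustrative, not from the lecture):

```python
import sqlite3

# Hypothetical Emp table, used only to illustrate the SQL rules above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO Emp VALUES (?, ?, ?)",
                 [("ann", "sales", 30), ("bob", "sales", 40), ("cat", "hr", 35)])

# Default: duplicates in the answer are NOT eliminated.
rows1 = conn.execute("SELECT dept FROM Emp WHERE salary >= 30").fetchall()
print(rows1)             # [('sales',), ('sales',), ('hr',)]

# DISTINCT: duplicate rows are removed from the answer.
rows2 = conn.execute("SELECT DISTINCT dept FROM Emp WHERE salary >= 30").fetchall()
print(sorted(rows2))     # [('hr',), ('sales',)]
```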
11
Conceptual Evaluation Strategy
Semantics of an SQL query defined in terms of the following conceptual evaluation strategy: Compute the cross-product of the relation-list. Discard resulting tuples if they fail the
qualifications. Delete attributes that are not in the target-list. If DISTINCT is specified, eliminate duplicate rows.
This approach is naïve from the performance point of view.
12
Downside of Naïve Approach
Probably the least efficient way to compute a query: an optimiser will find the same answers using more efficient strategies.
Need to devise a strategy which: minimises the size of relations involved in joins; does selects and projects before joins; joins small-to-large relations before large-to-large where 3 or more joins are required.
13
Optimiser maps SQL to an algebra tree with specific algorithms: access methods, join algorithms, scheduling.
Relational operators implemented as iterators: Open(), Next() (possibly with a condition), Close().
Parallel processing engine built on partitioning data flow to iterators: inter- and intra-query parallelism.
Query Optimisation & Processing
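The Open/Next/Close iterator interface can be sketched as a toy pull-based pipeline (class and method names are illustrative, not from any particular DBMS):

```python
# Leaf operator: scans an in-memory "relation".
class Scan:
    def __init__(self, rows):
        self.rows = rows
    def open(self):
        self.pos = 0
    def next(self):                      # returns a tuple, or None at end
        if self.pos < len(self.rows):
            row = self.rows[self.pos]
            self.pos += 1
            return row
        return None
    def close(self):
        self.rows = None

# Non-leaf operator: filters its child's output (Next with a condition).
class Select:
    def __init__(self, child, pred):
        self.child, self.pred = child, pred
    def open(self):
        self.child.open()
    def next(self):
        row = self.child.next()
        while row is not None and not self.pred(row):
            row = self.child.next()
        return row
    def close(self):
        self.child.close()

# Operators compose into a tree; the root pulls tuples on demand.
op = Select(Scan([1, 5, 3, 8]), lambda x: x > 2)
op.open()
out = []
row = op.next()
while row is not None:
    out.append(row)
    row = op.next()
op.close()
print(out)   # [5, 3, 8]
```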
14
Parallel and Distributed Data Bases are being increasingly used to achieve: better performance; increased availability of data; access to data distributed at various sites.
Why use Parallel DB’s?
A Parallel DB predominantly enables the first of the above; a DDB enables the last two.
15
Parallelism on one CPU Can use multi-threaded approach Threads are a way for a program to split
itself into two or more simultaneously running tasks.
Multiple threads can be executed in parallel We are more concerned here with use of
multiple CPUs
16
Parallel Data Base Architecture
Parallel data bases (PDBs) use more than one CPU / disk to carry out evaluation of a query faster.
There are three possible ways of sharing processors, disks and memory: Shared nothing: the individual CPU / disk / memory
are interconnected. Shared memory: all processors share the same
memory, all data can move between memory and any disk.
Shared disk: every processor has its own memory, but can read / write from any disk.
17
The shared-disk and shared-memory architectures have communication overheads.
As the number of CPUs / disks increases, there is more and more of a bottleneck.
Beyond a certain limit, increasing the number of CPUs reduces the performance.
Shared-nothing systems provide linear speed-up and scale-up. The time per transaction should remain the same when the numbers of CPUs and transactions increase in proportion.
Communication Overhead
18
The query tree can be evaluated in parallel: each node can be evaluated independently if the corresponding relations are stored on a separate CPU / disk.
Even a simple query can be evaluated in parallel if the relation is partitioned.
How can we partition a relation?
Parallel Query Evaluation
19
Data Partitioning – Recap
Data can be partitioned in various ways: Each row is allocated to one of the processors
in round-robin. Each row is hashed and allocated to the
corresponding processor. Each CPU stores a range of rows.
Which is better? Depends on nature of queries.
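The three partitioning schemes can be sketched as follows (a toy version, in-memory lists standing in for processors' disks; all names are illustrative):

```python
# Round-robin: row i goes to processor i mod n.
def round_robin(rows, n):
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

# Hash: each row is hashed on a key and sent to the matching processor.
def hash_partition(rows, n, key):
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(key(row)) % n].append(row)
    return parts

# Range: boundaries like [10, 20] give ranges (-inf,10), [10,20), [20,inf).
def range_partition(rows, boundaries, key):
    parts = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        i = sum(key(row) >= b for b in boundaries)
        parts[i].append(row)
    return parts

rows = [3, 14, 7, 25, 18]
print(round_robin(rows, 2))                            # [[3, 7, 18], [14, 25]]
print(range_partition(rows, [10, 20], key=lambda r: r))  # [[3, 7], [14, 18], [25]]
```

Round-robin balances load for full scans; hash is ideal for equality lookups; range suits range queries but risks skew.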
20
Background - Hashing
Hashing is an access method that hashes (chops) an input key so that the result is an integer in the range of the available address space.
Example: the address space is 0…99 buckets; the key is s105; the hash function chops off the s, divides by 100 and takes the remainder, 5; the record is stored in bucket 5.
Fast retrieval and placement (one disk access) if there is space in the buckets.
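The slide's worked example as a sketch (the hash function and the dict-of-lists bucket store are toy stand-ins for real disk buckets):

```python
# The slide's hash function: strip the leading 's', then take the
# remainder modulo the number of buckets.
def bucket_for(key, n_buckets=100):
    return int(key.lstrip("s")) % n_buckets

print(bucket_for("s105"))   # 5 -- the record for key s105 goes in bucket 5

# Placing a few records into their buckets.
buckets = {}
for key in ["s105", "s7", "s205"]:
    buckets.setdefault(bucket_for(key), []).append(key)
print(buckets)              # {5: ['s105', 's205'], 7: ['s7']}
```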
21
Distributed Data Bases – Recap
A DDB system involves multiple sites or nodes connected by a network.
Each site has its own CPU, terminals, DBMS, users, DBA and local autonomy.
A user can access data stored locally or on other nodes. DDBs are becoming more common.
22
Benefits of DDBMS
Closer match to distributed applications such as airline reservation systems, bank ATMs etc.
Increased reliability and availability when coupled with replication – better system functionality.
Shares data between sites, under local control. Improved performance may be achieved, because
processing of transactions is in parallel at different sites.
23
Costs of DDBMS
Communication between sites for transmission of data and commands.
Need to keep track of indices when processing queries that access dispersed and replicated data.
Maintenance. Risk of communication failure.
24
Example of Distributed Data
Head office: Supplier Details, Customer Details, Product Details, Stock Levels, Warehouse details, Sales transactions.
Each branch site: Supplier Details (snapshot), Customer Details (snapshot), Product Details (daily replication), Transaction records.
The supplier and customer data does not change often, so a snapshot is sufficient. Product details change infrequently, and daily replication is enough. Replication is for performance.
25
Workloads
On-Line Transaction Processing Many little jobs (e.g. debit / credit). A typical SQL system c. 1995 supported
21,000 t.p.m. using 112 CPUs, 670 disks. Batch (decision support and utility)
Few big jobs, parallelism inside. Scan data at 100 MB/sec. Linear Scaleup to 500 processors.
26
Parallel Sorting Why?
DISTINCT, GROUP BY, ORDER BY, sort-merge join, index build – major overheads and inefficient.
Parallelisation targets these components to improve performance by splitting the work into manageable pieces.
Phases:
1. Parallel read and partition (coarse radix sort), pipelined with parallel sorting of memory-sized runs, spilling runs to disk.
2. Parallel reading and merging of runs.
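A toy version of the two phases, assuming a range split as the coarse partitioning step (in-memory lists stand in for disks and spilled runs; all names are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sort(rows, boundaries):
    # Phase 1: partition by range (a coarse split on key values),
    # then sort each memory-sized run in parallel.
    parts = [[] for _ in range(len(boundaries) + 1)]
    for r in rows:
        parts[sum(r >= b for b in boundaries)].append(r)
    with ThreadPoolExecutor() as pool:
        runs = list(pool.map(sorted, parts))
    # Phase 2: the partitions cover disjoint key ranges, so the sorted
    # runs concatenate into a single globally sorted sequence
    # (entirely local processing, as the next slide notes).
    return [x for run in runs for x in run]

print(parallel_sort([42, 7, 19, 88, 3, 55], [20, 50]))
# [3, 7, 19, 42, 55, 88]
```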
27
Coarse Radix Sort
Radix sort: based on the binary representation of keys; computes the ranks of elements.
Coarse-grained: individual tasks are relatively large.
28
Parallel Sorting 2 Notes:
Phase 1 requires repartitioning. High bandwidth network required.
Phase 2 totally local processing. Both pipelined and partitioned parallelism. Linear speedup, scaleup
29
Hash Join Phase 1: Partition both
relations using hash fn h: R tuples in partition i will only match S tuples in partition i.
Phase 2: Read in a partition of R, hash it using h2 (<> h!). Scan matching partition of S, search for matches.
[Figure: Phase 1 — the original relation on disk is read through B main-memory buffers (1 input, B-1 output); hash function h routes tuples to partitions 1 … B-1, which are written back to disk as the partitions of R and S. Phase 2 — a hash table for partition Ri (k < B-1 pages) is built using hash function h2; partition Si is streamed through an input buffer and probed with h2, and matches are written to the join result via an output buffer.]
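The two phases above can be sketched as a toy in-memory Grace-style hash join (relations as lists of (key, value) pairs; the partition count and all names are illustrative):

```python
def hash_join(R, S, n_partitions=4):
    h = lambda k: hash(k) % n_partitions
    # Phase 1: partition both relations on the join key with h.
    # R tuples in partition i can only match S tuples in partition i.
    R_parts = [[] for _ in range(n_partitions)]
    S_parts = [[] for _ in range(n_partitions)]
    for key, val in R:
        R_parts[h(key)].append((key, val))
    for key, val in S:
        S_parts[h(key)].append((key, val))
    # Phase 2: per partition, build an in-memory hash table on Ri
    # (conceptually with a second hash function h2 <> h) and probe
    # it with the matching partition Si.
    result = []
    for Ri, Si in zip(R_parts, S_parts):
        table = {}
        for key, val in Ri:
            table.setdefault(key, []).append(val)
        for key, sval in Si:
            for rval in table.get(key, []):
                result.append((key, rval, sval))
    return result

R = [(1, "r1"), (2, "r2"), (3, "r3")]
S = [(2, "s2"), (3, "s3"), (3, "s3b")]
print(sorted(hash_join(R, S)))
# [(2, 'r2', 's2'), (3, 'r3', 's3'), (3, 'r3', 's3b')]
```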
30
Parallel Query Processing 1
Essentially no synchronisation except setup & teardown: No barriers, cache coherence, etc. Database transactions work fine in parallel:
data updated in place, with 2-phase locking transactions;
replicas managed only at end of transaction via 2-phase commit;
coarser grain, higher overhead than cache coherency.
31
Terminology Setup
Establish active connection (add to connection table) Teardown
Remove from the connection table 2-phase locking
In the first phase, locks are acquired but may not be released.
In the second phase, locks are released but new locks may not be acquired.
Rationalises acquiring and releasing of resources. Serialises concurrent transactions.
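The two-phase rule can be sketched as minimal bookkeeping (no real concurrency or lock table here; class and method names are illustrative):

```python
class Transaction:
    def __init__(self):
        self.held = set()
        self.shrinking = False   # becomes True after the first release

    def acquire(self, item):
        # Phase 1 (growing): locks may be acquired but not released.
        if self.shrinking:
            raise RuntimeError("2PL violated: acquire after release")
        self.held.add(item)

    def release(self, item):
        # Phase 2 (shrinking): locks may be released but not acquired.
        self.shrinking = True
        self.held.discard(item)

t = Transaction()
t.acquire("A"); t.acquire("B")   # growing phase
t.release("A")                   # shrinking phase begins
try:
    t.acquire("C")               # not allowed under 2PL
except RuntimeError as e:
    print(e)                     # 2PL violated: acquire after release
```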
32
Parallel Query Processing 2
Bandwidth much more important than latency: often pump a fraction 1 - 1/n of a table through the network; aggregate network bandwidth should match aggregate disk bandwidth.
Ordering of data flow immaterial (relational model)– simplifies synchronisation, allows for work-sharing.
Shared memory helps with skew
– but distributed work queues may solve this.
33
Terminology Bandwidth
Data rate across a network Latency
Delays in processing network data Skew
Uneven distribution of data and/or workload across the disks and processors.
34
Disk Layout
Where was the data to begin with? Major effects on performance. Algorithms run at the speed of the slowest disk!
Disk placement: Logical partitioning, hash, round-robin; “Declustering” for availability and load balance; Indices stored with their data.
This task is typically left to the DBA.
35
Handling Skew
For range partitioning: sample the load on the disks; cool "hot" (overloaded) disks by making their range smaller.
For hash partitioning: cool hot disks by mapping some buckets to others during query processing.
Use hashing and assume uniform: if range partitioning, sample the data and use a histogram to even out the distribution; in an application for the Shore Management Plan / River scheme, a work queue was used to balance the load.
36
Query Optimisation Map SQL to a relational algebra tree, annotated
with choice of algorithms. Issues: Choice of access methods (indices, scans); Join ordering; Join algorithms; Post-processing (e.g. hash vs. sort for groups, order).
Typical scheme (courtesy System R): Bottom-up dynamic-programming construction of
entire plan space; Prune based on cost and selectivity estimation.
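A toy sketch of the bottom-up dynamic program over join orders (the cost model here is made up purely for illustration; real System R-style optimisers use much richer cost and selectivity estimates, and consider physical properties too):

```python
from itertools import combinations

def best_plan(cards, selectivity):
    """Left-deep join ordering by bottom-up dynamic programming.
    cards: estimated cardinality per relation; selectivity: one flat
    join selectivity (an illustrative simplification)."""
    rels = list(cards)
    # best[subset] = (estimated cost, estimated cardinality, join order)
    best = {frozenset([r]): (0, cards[r], (r,)) for r in rels}
    for size in range(2, len(rels) + 1):
        for subset in map(frozenset, combinations(rels, size)):
            candidates = []
            for r in subset:                 # r = last relation joined
                rest = subset - {r}
                cost, card, order = best[rest]
                join_card = card * cards[r] * selectivity
                candidates.append((cost + card * cards[r],
                                   join_card, order + (r,)))
            best[subset] = min(candidates)   # prune all but the cheapest
    return best[frozenset(rels)]

cost, card, order = best_plan({"R": 1000, "S": 10, "T": 100}, 0.01)
print(order, cost)   # joins the small relations first
```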
37
Parallel Query Optimisation
More dimensions to plan space: degree of parallelism for each operator; scheduling:
assignment of work to processors.
38
Parallel Query Scheduling 1
Usage of a site by an isolated operator is given by (Tseq, W, V), where:
Tseq is the sequential execution time of the operator (in the absence of parallelism);
W is a d-dimensional work vector over time-shared resources (e.g. disks, CPUs, network interfaces);
V is an s-dimensional demand vector over space-shared resources (e.g. memory buffers).
A set of time-space pairs S = <(W1,V1),…,(Wk,Vk)> is said to be compatible if they can be executed together on a site, i.e. the trade-off between time and space is satisfactory in practice.
39
Parallel Query Scheduling 2 Algorithms in General. All algorithms in
computing science are classified with respect to: Time Space
Time is represented here by W Space by V There are often trade-offs between time and space
Increase space (memory), reduce time Reduce space, increase time
Such trade-offs must be recognised in techniques for handling parallel databases
40
Parallel Query Scheduling 3 Challenges:
capture dependencies among operators (simple); pick a degree of parallelism for each op (no. of
clones); schedule clones to sites, under constraint of
compatibility.
Solution: a mixture of query-plan understanding, approximation algorithms for bin-packing, and modifications of dynamic-programming optimisation algorithms.
41
Moving Onward
Parallelism and Object-Relational
Can you abandon the structure and keep the parallelism? E.g. multi-dimensional objects, lists and array data, multimedia (usually arrays).
Typical tricks include chunking and clustering, followed by sorting, i.e. try to apply set-like algorithms and "put right" later.