TRANSCRIPT
1
CG096: Lecture 9
Parallel Data Bases
2
Overview
Background: Recap of Relational Systems.
Parallel Query Processing: Sort and Hash-Join, assuming a "shared-nothing" architecture (supposedly the hardest to program, but actually quite clean).
Data Layout. Parallel Query Optimization.
3
Some History
In pre-relational data bases, programmers reigned: data models used explicit pointers, manipulated by (e.g.) COBOL code.
Relational revolution: data abstraction. Declarative languages and data independence were key to the most successful parallel systems.
Significant developments: Codd's relational model: early 70's; experimental, partially relational implementations (System R & INGRES): mid-late 70's; commercial, fully relational implementations (Oracle, IBM DB2, INGRES Corp): early 80's; rise of parallel DBs: late 80's onwards.
4
The Relational Data Model
A data model is a collection of concepts for describing data.
A schema is a description of a particular collection of data, using a given data model.
The relational model of data: Main construct: relation, basically a table with
rows and columns. Every relation has a schema, which describes the
columns, or fields. N.B. no pointers, no nested structures, no ordering,
no irregular collections.
5
2 Levels of Indirection Many views, single
conceptual (logical) schema and single physical schema. Views describe how
users see the data.
Conceptual schema defines logical structure
Physical schema describes the files and indexes used.
[Figure: View 1, View 2 and View 3 map onto a single Conceptual Schema, which maps onto a single Physical Schema.]
6
Data Independence – aiding the parallel approach
Applications insulated from how data is structured and stored.
Logical data independence: protection from changes in the logical structure of the data, letting you implement parallel processing under traditional applications.
Physical data independence: protection from changes in the physical structure of the data, minimising constraints on processing and enabling clean parallelism.
7
Structure of a DBMS
A typical DBMS has a layered architecture.
The figure does not show the concurrency control and recovery components.
This is one of several possible architectures; each system has its own variations.
[Figure: layered DBMS architecture, top to bottom: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB. Annotation: parallel considerations mostly here.]
8
Relational Query Languages
Query languages: allow manipulation and retrieval of data from a data base.
Relational model supports simple, powerful QLs: a strong formal foundation based on logic, supporting optimisation and parallelisation.
Query Languages <> Programming Languages! QLs are not expected to be "computationally complete" and are not designed for complex calculations; they support easy, efficient access to large data sets.
9
Formal Relational Query Languages Two mathematical Query Languages form the
basis for “real” languages (e.g. SQL), and for implementation:
Relational Algebra: More operational, very useful for representing internal execution plans. “Data base opcodes” – can be optimised and
parallelised for the sake of efficiency. Relational Calculus: Lets users describe what
they want, rather than how to compute it (non-operational, declarative – SQL comes from here.)
10
Basic SQL Relation-list: a list of relation names
– possibly with a range-variable after each name. Target-list: a list of attributes of tables in a
relation-list. Qualification: comparisons combined using
AND, OR and NOT. Comparisons are Attr op const or Attr1 op Attr2, where op is one of =, <>, <, >, <=, >=.
DISTINCT: optional keyword indicating that the answer should not contain duplicates. The default is that duplicates are not eliminated!
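A minimal runnable sketch of these rules, using Python's built-in sqlite3 module and a made-up Emp table (all table and column names here are illustrative, not from the lecture):

```python
import sqlite3

# Hypothetical Emp table, used only to illustrate the SQL rules above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO Emp VALUES (?, ?, ?)",
                 [("ann", "sales", 30), ("bob", "sales", 40), ("cat", "hr", 35)])

# Default: duplicates in the answer are NOT eliminated.
rows1 = conn.execute("SELECT dept FROM Emp WHERE salary >= 30").fetchall()
print(rows1)             # [('sales',), ('sales',), ('hr',)]

# DISTINCT: duplicate rows are removed from the answer.
rows2 = conn.execute("SELECT DISTINCT dept FROM Emp WHERE salary >= 30").fetchall()
print(sorted(rows2))     # [('hr',), ('sales',)]
```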
11
Conceptual Evaluation Strategy
Semantics of an SQL query defined in terms of the following conceptual evaluation strategy: Compute the cross-product of the relation-list. Discard resulting tuples if they fail the
qualifications. Delete attributes that are not in the target-list. If DISTINCT is specified, eliminate duplicate rows.
This approach is naïve from the performance point of view.
12
Downside of Naïve Approach
Probably the least efficient way to compute a query: an optimiser will find the same answers using more efficient strategies.
Need to devise a strategy which: minimises the size of relations involved in joins; does selects and projects before joins; joins small-to-large relations before large-to-large where 3 or more joins are required.
13
Optimiser maps SQL to an algebra tree with specific algorithms: access methods, join algorithms, scheduling.
Relational operators implemented as iterators: Open(), Next() (possibly with a condition), Close().
Parallel processing engine built on partitioning data flow to iterators: inter- and intra-query parallelism.
Query Optimisation & Processing
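The Open/Next/Close iterator interface can be sketched as a toy pull-based pipeline (class and method names are illustrative, not from any particular DBMS):

```python
# Leaf operator: scans an in-memory "relation".
class Scan:
    def __init__(self, rows):
        self.rows = rows
    def open(self):
        self.pos = 0
    def next(self):                      # returns a tuple, or None at end
        if self.pos < len(self.rows):
            row = self.rows[self.pos]
            self.pos += 1
            return row
        return None
    def close(self):
        self.rows = None

# Non-leaf operator: filters its child's output (Next with a condition).
class Select:
    def __init__(self, child, pred):
        self.child, self.pred = child, pred
    def open(self):
        self.child.open()
    def next(self):
        row = self.child.next()
        while row is not None and not self.pred(row):
            row = self.child.next()
        return row
    def close(self):
        self.child.close()

# Operators compose into a tree; the root pulls tuples on demand.
op = Select(Scan([1, 5, 3, 8]), lambda x: x > 2)
op.open()
out = []
row = op.next()
while row is not None:
    out.append(row)
    row = op.next()
op.close()
print(out)   # [5, 3, 8]
```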
14
Parallel and Distributed Data Bases are being increasingly used to achieve: better performance; increased availability of data; access to data distributed at various sites.
Why use Parallel DB’s?
A Parallel DB predominantly enables the first of the above; a DDB enables the last two.
15
Parallelism on one CPU Can use multi-threaded approach Threads are a way for a program to split
itself into two or more simultaneously running tasks.
Multiple threads can be executed in parallel We are more concerned here with use of
multiple CPUs
16
Parallel Data Base Architecture
Parallel data bases (PDBs) use more than one CPU / disk to carry out evaluation of a query faster.
There are three possible ways of sharing processors, disks and memory: Shared nothing: the individual CPU / disk / memory
are interconnected. Shared memory: all processors share the same
memory, all data can move between memory and any disk.
Shared disk: every processor has its own memory, but can read / write from any disk.
17
The shared-disk and shared-memory architectures have communication overheads.
As the number of CPUs / disks increases, there is more and more of a bottleneck.
Beyond a certain limit, increasing the number of CPUs reduces the performance.
Shared-nothing systems provide linear speed-up and scale-up. The time per transaction should remain the same when the numbers of CPUs and transactions increase in proportion.
Communication Overhead
18
The query tree can be evaluated in parallel: each node can be evaluated independently if the corresponding relations are stored on a separate CPU / disk.
Even a simple query can be evaluated in parallel if the relation is partitioned.
How can we partition a relation?
Parallel Query Evaluation
19
Data Partitioning – Recap
Data can be partitioned in various ways: Each row is allocated to one of the processors
in round-robin. Each row is hashed and allocated to the
corresponding processor. Each CPU stores a range of rows.
Which is better? Depends on nature of queries.
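The three partitioning schemes can be sketched as follows (a toy version, in-memory lists standing in for processors' disks; all names are illustrative):

```python
# Round-robin: row i goes to processor i mod n.
def round_robin(rows, n):
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

# Hash: each row is hashed on a key and sent to the matching processor.
def hash_partition(rows, n, key):
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(key(row)) % n].append(row)
    return parts

# Range: boundaries like [10, 20] give ranges (-inf,10), [10,20), [20,inf).
def range_partition(rows, boundaries, key):
    parts = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        i = sum(key(row) >= b for b in boundaries)
        parts[i].append(row)
    return parts

rows = [3, 14, 7, 25, 18]
print(round_robin(rows, 2))                            # [[3, 7, 18], [14, 25]]
print(range_partition(rows, [10, 20], key=lambda r: r))  # [[3, 7], [14, 18], [25]]
```

Round-robin balances load for full scans; hash is ideal for equality lookups; range suits range queries but risks skew.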
20
Background - Hashing
Hashing is an access method that hashes (chops) an input key so that the result is an integer in the range of the available address space.
Example: the address space is 0…99 buckets; the key is s105; the hash function chops off the s, divides by 100 and takes the remainder, 5; the record is stored in bucket 5.
Fast retrieval and placement (one disk access) if there is space in the buckets.
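The slide's worked example as a sketch (the hash function and the dict-of-lists bucket store are toy stand-ins for real disk buckets):

```python
# The slide's hash function: strip the leading 's', then take the
# remainder modulo the number of buckets.
def bucket_for(key, n_buckets=100):
    return int(key.lstrip("s")) % n_buckets

print(bucket_for("s105"))   # 5 -- the record for key s105 goes in bucket 5

# Placing a few records into their buckets.
buckets = {}
for key in ["s105", "s7", "s205"]:
    buckets.setdefault(bucket_for(key), []).append(key)
print(buckets)              # {5: ['s105', 's205'], 7: ['s7']}
```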
21
Distributed Data Bases – Recap
A DDB system involves multiple sites or nodes connected by a network.
Each site has its own CPU, terminals, DBMS, users, DBA and local autonomy.
A user can access data stored locally or on other nodes. DDBs are becoming more common.
22
Benefits of DDBMS
Closer match to distributed applications such as airline reservation systems, bank ATMs etc.
Increased reliability and availability when coupled with replication – better system functionality.
Shares data between sites, under local control. Improved performance may be achieved, because
processing of transactions is in parallel at different sites.
23
Costs of DDBMS
Communication between sites for transmission of data and commands.
Need to keep track of indices when processing queries that access dispersed and replicated data.
Maintenance. Risk of communication failure.
24
Example of Distributed Data
Head office: Supplier Details, Customer Details, Product Details, Stock Levels, Warehouse details, Sales transactions.
Each branch site: Supplier Details (snapshot), Customer Details (snapshot), Product Details (daily replication), Transaction records.
The supplier and customer data does not change often, so a snapshot is sufficient. Product details change infrequently, and daily replication is enough. Replication is for performance.
25
Workloads
On-Line Transaction Processing Many little jobs (e.g. debit / credit). A typical SQL system c. 1995 supported
21,000 t.p.m. using 112 CPUs, 670 disks. Batch (decision support and utility)
Few big jobs, parallelism inside. Scan data at 100 MB/sec. Linear Scaleup to 500 processors.
26
Parallel Sorting Why?
DISTINCT, GROUP BY, ORDER BY, sort-merge join, index build – major overheads and inefficient.
Parallelisation targets these components to improve performance by splitting the work into manageable pieces.
Phases:
1. Parallel read and partition (coarse radix sort), pipelined with parallel sorting of memory-sized runs, spilling runs to disk.
2. Parallel reading and merging of runs.
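A toy version of the two phases, assuming a range split as the coarse partitioning step (in-memory lists stand in for disks and spilled runs; all names are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sort(rows, boundaries):
    # Phase 1: partition by range (a coarse split on key values),
    # then sort each memory-sized run in parallel.
    parts = [[] for _ in range(len(boundaries) + 1)]
    for r in rows:
        parts[sum(r >= b for b in boundaries)].append(r)
    with ThreadPoolExecutor() as pool:
        runs = list(pool.map(sorted, parts))
    # Phase 2: the partitions cover disjoint key ranges, so the sorted
    # runs concatenate into a single globally sorted sequence
    # (entirely local processing, as the next slide notes).
    return [x for run in runs for x in run]

print(parallel_sort([42, 7, 19, 88, 3, 55], [20, 50]))
# [3, 7, 19, 42, 55, 88]
```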
27
Coarse Radix Sort
Radix sort: based on the binary representation of keys; computes the ranks of elements.
Coarse-grained: individual tasks are relatively large.
28
Parallel Sorting 2 Notes:
Phase 1 requires repartitioning. High bandwidth network required.
Phase 2 totally local processing. Both pipelined and partitioned parallelism. Linear speedup, scaleup
29
Hash Join Phase 1: Partition both
relations using hash fn h: R tuples in partition i will only match S tuples in partition i.
Phase 2: Read in a partition of R, hash it using h2 (<> h!). Scan matching partition of S, search for matches.
[Figure: Phase 1 — the original relation on disk is read through B main-memory buffers (1 input, B-1 output); hash function h routes tuples to partitions 1 … B-1, which are written back to disk as the partitions of R and S. Phase 2 — a hash table for partition Ri (k < B-1 pages) is built using hash function h2; partition Si is streamed through an input buffer and probed with h2, and matches are written to the join result via an output buffer.]
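The two phases above can be sketched as a toy in-memory Grace-style hash join (relations as lists of (key, value) pairs; the partition count and all names are illustrative):

```python
def hash_join(R, S, n_partitions=4):
    h = lambda k: hash(k) % n_partitions
    # Phase 1: partition both relations on the join key with h.
    # R tuples in partition i can only match S tuples in partition i.
    R_parts = [[] for _ in range(n_partitions)]
    S_parts = [[] for _ in range(n_partitions)]
    for key, val in R:
        R_parts[h(key)].append((key, val))
    for key, val in S:
        S_parts[h(key)].append((key, val))
    # Phase 2: per partition, build an in-memory hash table on Ri
    # (conceptually with a second hash function h2 <> h) and probe
    # it with the matching partition Si.
    result = []
    for Ri, Si in zip(R_parts, S_parts):
        table = {}
        for key, val in Ri:
            table.setdefault(key, []).append(val)
        for key, sval in Si:
            for rval in table.get(key, []):
                result.append((key, rval, sval))
    return result

R = [(1, "r1"), (2, "r2"), (3, "r3")]
S = [(2, "s2"), (3, "s3"), (3, "s3b")]
print(sorted(hash_join(R, S)))
# [(2, 'r2', 's2'), (3, 'r3', 's3'), (3, 'r3', 's3b')]
```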
30
Parallel Query Processing 1
Essentially no synchronisation except setup & teardown: No barriers, cache coherence, etc. Database transactions work fine in parallel:
data updated in place, with 2-phase locking transactions;
replicas managed only at end of transaction via 2-phase commit;
coarser grain, higher overhead than cache coherency.
31
Terminology Setup
Establish active connection (add to connection table) Teardown
Remove from the connection table 2-phase locking
In the first phase, locks are acquired but may not be released.
In the second phase, locks are released but new locks may not be acquired.
Rationalises acquiring and releasing of resources. Serialises concurrent transactions.
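The two-phase rule can be sketched as minimal bookkeeping (no real concurrency or lock table here; class and method names are illustrative):

```python
class Transaction:
    def __init__(self):
        self.held = set()
        self.shrinking = False   # becomes True after the first release

    def acquire(self, item):
        # Phase 1 (growing): locks may be acquired but not released.
        if self.shrinking:
            raise RuntimeError("2PL violated: acquire after release")
        self.held.add(item)

    def release(self, item):
        # Phase 2 (shrinking): locks may be released but not acquired.
        self.shrinking = True
        self.held.discard(item)

t = Transaction()
t.acquire("A"); t.acquire("B")   # growing phase
t.release("A")                   # shrinking phase begins
try:
    t.acquire("C")               # not allowed under 2PL
except RuntimeError as e:
    print(e)                     # 2PL violated: acquire after release
```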
32
Parallel Query Processing 2
Bandwidth much more important than latency: often pump a fraction 1 - 1/n of a table through the network; aggregate network bandwidth should match aggregate disk bandwidth.
Ordering of data flow immaterial (relational model)– simplifies synchronisation, allows for work-sharing.
Shared memory helps with skew
– but distributed work queues may solve this.
33
Terminology Bandwidth
Data rate across a network Latency
Delays in processing network data Skew
Uneven distribution of data and/or workload across the disks and processors.
34
Disk Layout
Where was the data to begin with? Major effects on performance. Algorithms run at the speed of the slowest disk!
Disk placement: Logical partitioning, hash, round-robin; “Declustering” for availability and load balance; Indices stored with their data.
This task is typically left to the DBA.
35
Handling Skew
For range partitioning: sample the load on the disks; cool "hot" (overloaded) disks by making their range smaller.
For hash partitioning: cool hot disks by mapping some buckets to others during query processing.
Use hashing and assume uniform: if range partitioning, sample the data and use a histogram to even out the distribution; in an application for the Shore Management Plan / River scheme, a work queue was used to balance the load.
36
Query Optimisation Map SQL to a relational algebra tree, annotated
with choice of algorithms. Issues: Choice of access methods (indices, scans); Join ordering; Join algorithms; Post-processing (e.g. hash vs. sort for groups, order).
Typical scheme (courtesy System R): Bottom-up dynamic-programming construction of
entire plan space; Prune based on cost and selectivity estimation.
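A toy sketch of the bottom-up dynamic program over join orders (the cost model here is made up purely for illustration; real System R-style optimisers use much richer cost and selectivity estimates, and consider physical properties too):

```python
from itertools import combinations

def best_plan(cards, selectivity):
    """Left-deep join ordering by bottom-up dynamic programming.
    cards: estimated cardinality per relation; selectivity: one flat
    join selectivity (an illustrative simplification)."""
    rels = list(cards)
    # best[subset] = (estimated cost, estimated cardinality, join order)
    best = {frozenset([r]): (0, cards[r], (r,)) for r in rels}
    for size in range(2, len(rels) + 1):
        for subset in map(frozenset, combinations(rels, size)):
            candidates = []
            for r in subset:                 # r = last relation joined
                rest = subset - {r}
                cost, card, order = best[rest]
                join_card = card * cards[r] * selectivity
                candidates.append((cost + card * cards[r],
                                   join_card, order + (r,)))
            best[subset] = min(candidates)   # prune all but the cheapest
    return best[frozenset(rels)]

cost, card, order = best_plan({"R": 1000, "S": 10, "T": 100}, 0.01)
print(order, cost)   # joins the small relations first
```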
37
Parallel Query Optimisation
More dimensions to plan space: degree of parallelism for each operator; scheduling:
assignment of work to processors.
38
Parallel Query Scheduling 1
Usage of a site by an isolated operator is given by (Tseq, W, V), where:
Tseq is the sequential execution time of the operator (in the absence of parallelism);
W is a d-dimensional work vector over time-shared resources (e.g. disks, CPUs, network interfaces);
V is an s-dimensional demand vector over space-shared resources (e.g. memory buffers).
A set of time-space pairs S = <(W1,V1),…,(Wk,Vk)> is said to be compatible if they can be executed together on a site, i.e. the trade-off between time and space is satisfactory in practice.
39
Parallel Query Scheduling 2 Algorithms in General. All algorithms in
computing science are classified with respect to: Time Space
Time is represented here by W Space by V There are often trade-offs between time and space
Increase space (memory), reduce time Reduce space, increase time
Such trade-offs must be recognised in techniques for handling parallel databases
40
Parallel Query Scheduling 3 Challenges:
capture dependencies among operators (simple); pick a degree of parallelism for each op (no. of
clones); schedule clones to sites, under constraint of
compatibility.
Solution: a mixture of query-plan understanding, approximation algorithms for bin-packing, and modifications of dynamic-programming optimisation algorithms.
41
Moving Onward
Parallelism and Object-Relational
Can you abandon the structure and keep the parallelism? E.g. multi-dimensional objects, lists and array data, multimedia (usually arrays).
Typical tricks include chunking and clustering, followed by sorting, i.e. try to apply set-like algorithms and "put right" later.