CMSC724: Anatomy of a Database System
Amol Deshpande
University of Maryland, College Park
February 7, 2012

Anatomy of a Database System
- How is it implemented?
- Issues:
  - Process models
  - Parallelism
  - Storage models
  - Buffer manager
  - Query processing architecture
  - Transaction processing
  - Etc.

Overview

Fig. 1.1 Main components of a DBMS.

a well-understood point of reference for new extensions and revolutions in database systems that may arise in the future. As a result, we focus on relational database systems throughout this paper.

At heart, a typical RDBMS has five main components, as illustrated in Figure 1.1. As an introduction to each of these components and the way they fit together, we step through the life of a query in a database system. This also serves as an overview of the remaining sections of the paper.

Consider a simple but typical database interaction at an airport, in which a gate agent clicks on a form to request the passenger list for a flight. This button click results in a single-query transaction that works roughly as follows:

1. The personal computer at the airport gate (the “client”) calls an API that in turn communicates over a network to establish a connection with the Client Communications Manager of a DBMS (top of Figure 1.1). In some cases, this connection

Process Models
- Processes
  - Heavyweight, context switch expensive
  - Costly to create, limits on how many
  - Large address space, OS support from the beginning
- Threads
  - Lightweight, more complicated to program
  - No OS support till recently
  - In theory, can have very large numbers; in practice, not lightweight enough
- Huge implications on performance
- Many DBMSs wrote their own operating systems, their own thread packages, etc.

Process Models
- Assume: uniprocessors + OS support for efficient threads
- Option 1: “Process per connection”
  - Not scalable (1000 Xion/s?), shared data structures
  - OS manages time-sharing, easy to implement

2.1.1 Process per DBMS Worker

The process per DBMS worker model (Figure 2.1) was used by early DBMS implementations and is still used by many commercial systems today. This model is relatively easy to implement since DBMS workers are mapped directly onto OS processes. The OS scheduler manages the timesharing of DBMS workers and the DBMS programmer can rely on OS protection facilities to isolate standard bugs like memory overruns. Moreover, various programming tools like debuggers and memory checkers are well-suited to this process model. Complicating this model are the in-memory data structures that are shared across DBMS connections, including the lock table and buffer pool (discussed in more detail in Sections 6.3 and 5.3, respectively). These shared data structures must be explicitly allocated in OS-supported shared memory accessible across all DBMS processes. This requires OS support (which is widely available) and some special DBMS coding. In practice, the

Fig. 2.1 Process per DBMS worker model: each DBMS worker is implemented as an OS process.

Figure: Process per Connection

Process Models
- Assume: uniprocessors + OS support for efficient threads
- Option 2: “Server Process Model”
  - Single multi-threaded server. Efficient.
  - Difficult to port/debug, no OS protection. Requires asynchronous I/O.

required extensive use of shared memory in this model reduces some of the advantages of address space separation, given that a good fraction of “interesting” memory is shared across processes.

In terms of scaling to very large numbers of concurrent connections, process per DBMS worker is not the most attractive process model. The scaling issues arise because a process has more state than a thread and consequently consumes more memory. A process switch requires switching security context, memory manager state, file and network handle tables, and other process context. This is not needed with a thread switch. Nonetheless, the process per DBMS worker model remains popular and is supported by IBM DB2, PostgreSQL, and Oracle.

2.1.2 Thread per DBMS Worker

In the thread per DBMS worker model (Figure 2.2), a single multi-threaded process hosts all the DBMS worker activity. A dispatcher

Fig. 2.2 Thread per DBMS worker model: each DBMS worker is implemented as an OS thread.

Figure: Server Process Model

Process Models
- Assume: uniprocessors + OS support for efficient threads
- Option 3: “Server Process + I/O processes”
  - Use I/O processes for handling disks. One process per device.

Fig. 2.3 Process Pool: each DBMS Worker is allocated to one of a pool of OS processes as work requests arrive from the Client and the process is returned to the pool once the request is processed.

and all processes are already servicing other requests, the new request must wait for a process to become available.

Process pool has all of the advantages of process per DBMS worker but, since a much smaller number of processes are required, is considerably more memory efficient. Process pool is often implemented with a dynamically resizable process pool where the pool grows potentially to some maximum number when a large number of concurrent requests arrive. When the request load is lighter, the process pool can be reduced to fewer waiting processes. As with thread per DBMS worker, the process pool model is also supported by several current generation DBMSs in use today.

2.1.4 Shared Data and Process Boundaries

All models described above aim to execute concurrent client requests as independently as possible. Yet, full DBMS worker independence and isolation is not possible, since they are operating on the same shared

Figure: Server Process + I/O Processes

Process Models
- DBMS threads, OS processes, OS threads, etc.
  - Earlier OSs did not support:
    - Buffering control, asynchronous I/O, high-performance threads
  - Many DBMSs implemented their own thread packages
  - Much replication of functionality
- How to map DBMS threads onto OS processes/threads?
  - One or more processes/threads to host SQL processing threads
  - One or more “dispatcher processes/threads”
  - One process/thread per disk and one per log disk
  - One coordinator agent process/thread per session
  - Processes/threads for background tools/utilities
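
One way to picture the dispatcher-plus-workers mapping listed above is a fixed pool of OS threads fed by a queue. The following is a minimal Python sketch under stated assumptions: `run_pool`, the `None` shutdown sentinel, and the fake "executed: ..." result are hypothetical names invented for illustration, not the design of any particular DBMS.

```python
import queue
import threading


def run_pool(requests, num_workers=3):
    """Dispatch each request to one of a fixed pool of worker threads."""
    work = queue.Queue()
    results = {}
    results_lock = threading.Lock()

    def worker():
        while True:
            item = work.get()
            if item is None:              # sentinel: shut this worker down
                break
            req_id, sql = item
            answer = f"executed: {sql}"   # stand-in for real query processing
            with results_lock:            # shared state needs protection
                results[req_id] = answer

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    # The "dispatcher" role: hand incoming requests to the pool.
    for req_id, sql in enumerate(requests):
        work.put((req_id, sql))
    for _ in threads:                     # one sentinel per worker
        work.put(None)
    for t in threads:
        t.join()

    return [results[i] for i in range(len(requests))]
```

Because the pool size is fixed, a request that arrives while every worker is busy simply waits in the queue, which mirrors the waiting behaviour described for pooled workers.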

Storage Models
- Spatial control
  - Sequential vs random
  - Seeks not improving that fast
- Controlling spatial locality
  - Direct access to the disk (if possible)
  - Allocate a large file, and address using the offsets

Storage Models
- Buffer management
  - DBMS needs control – why?
    - Correctness (WAL), performance (read-ahead)
    - Typical installations not I/O-bound
  - Allocate a large memory region
    - Maintain a page table with: disk location, dirty bit, replacement policy stats, pin count
  - Page replacement policy
    - LRU-2
  - “Double buffering” issues
  - Memory-mapping: mmap
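
The page table fields listed above (disk location, dirty bit, pin count) can be sketched in a few lines of Python. This is only an illustration under assumptions: `Frame`, `BufferPool`, and the dict standing in for the disk are hypothetical, and the replacement policy is plain LRU rather than the LRU-2 named on the slide.

```python
from collections import OrderedDict


class Frame:
    def __init__(self, page_id, data):
        self.page_id = page_id   # disk location of the page
        self.data = data
        self.dirty = False       # modified since it was read?
        self.pin_count = 0       # number of active users


class BufferPool:
    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk               # dict page_id -> contents (stand-in for the disk)
        self.frames = OrderedDict()    # page table, ordered oldest-used first

    def pin(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)       # mark as most recently used
        else:
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[page_id] = Frame(page_id, self.disk[page_id])
        frame = self.frames[page_id]
        frame.pin_count += 1
        return frame

    def unpin(self, page_id, dirty=False):
        frame = self.frames[page_id]
        assert frame.pin_count > 0, "unpin without matching pin"
        frame.pin_count -= 1
        frame.dirty = frame.dirty or dirty

    def _evict(self):
        # Plain LRU (an assumption): oldest frame that nobody has pinned.
        victim = next((pid for pid, f in self.frames.items()
                       if f.pin_count == 0), None)
        if victim is None:
            raise RuntimeError("all frames are pinned")
        frame = self.frames.pop(victim)
        if frame.dirty:
            self.disk[victim] = frame.data         # write back before reuse
```

A pinned page is never chosen as a victim, which is why callers are expected to unpin as soon as they are done with a page.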

Transactions
- Monolithic (why?)
  - Lock manager, log manager, buffer pool, access methods
- ACID
  - Typically:
    - “I” – locking, “D” – logging
    - “A” – locking + logging, “C” – runtime checks
- BASE? (Eric Brewer)
  - Basically Available, Soft-state, Eventually consistent

Transactions
- Locks
  - Strict 2PL most common
  - Uses a dynamic hash table-based “lock table”
    - Contains: lock mode, holding Xion, waiting Xions, etc.
    - Also, a way to start the Xion when a lock is obtained
- Latches
  - Quick-duration
  - Mostly for internal data structures, internal logic
  - Can’t have deadlocks or other consistency issues
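
The lock table described above (lock mode, holders, waiters, plus a way to resume a waiting transaction) might look roughly like the Python sketch below. The names `LockEntry` and `LockTable` and the string modes "S" and "X" are assumptions made for illustration; lock upgrades and deadlock detection are deliberately left out.

```python
from collections import deque


class LockEntry:
    def __init__(self):
        self.mode = None         # "S" or "X" while held, else None
        self.holders = set()     # transaction ids holding the lock
        self.waiters = deque()   # (txn, mode) in arrival order


class LockTable:
    def __init__(self):
        self.table = {}          # hash table: lock name -> LockEntry

    def acquire(self, txn, name, mode):
        """Return True if granted now, False if txn was queued."""
        entry = self.table.setdefault(name, LockEntry())
        compatible = (not entry.holders) or (mode == "S" and entry.mode == "S")
        # Queue behind existing waiters so a writer is not starved by readers.
        if compatible and not entry.waiters:
            entry.mode = mode
            entry.holders.add(txn)
            return True
        entry.waiters.append((txn, mode))
        return False

    def release(self, txn, name):
        """Release txn's lock; return the transactions that may now resume."""
        entry = self.table[name]
        entry.holders.discard(txn)
        if not entry.holders:
            entry.mode = None
        woken = []
        while entry.waiters:
            waiter, mode = entry.waiters[0]
            if entry.holders and not (mode == "S" and entry.mode == "S"):
                break                       # head of queue still conflicts
            entry.waiters.popleft()
            entry.mode = mode
            entry.holders.add(waiter)
            woken.append(waiter)
        if not entry.holders:
            del self.table[name]            # table grows and shrinks dynamically
        return woken
```

The list returned by `release` plays the role of the "way to start the Xion": a scheduler would use it to wake the transactions that were just granted the lock.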

Isolation Levels
- Degrees of consistency (Gray et al.)
  - Read uncommitted, read committed, repeatable read, serializable
  - “Phantom” tuples
- ANSI SQL isolation levels
  - Not fully well-defined

Log manager
- Required for atomicity and durability
  - Allows recovery and transaction aborts
- Why a problem?
  - “STEAL” and “NO FORCE”
- Concepts:
  - Write-ahead logging, in-order flushes, etc.
  - Undo/redo, checkpoints
  - ARIES
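
The write-ahead rule can be made concrete with a small Python sketch. STEAL means a dirty page of an uncommitted transaction may be written to disk; NO FORCE means data pages need not be written at commit. Everything named here (`LogManager`, `Page`, `update`, `write_page`, `commit`, and the dict used as a disk) is a hypothetical illustration and not ARIES itself.

```python
class LogManager:
    def __init__(self):
        self.records = []       # all log records, in LSN order
        self.flushed_lsn = 0    # records with lsn <= flushed_lsn are on stable storage

    def append(self, txn, page_id, before, after):
        lsn = len(self.records) + 1
        self.records.append((lsn, txn, page_id, before, after))
        return lsn

    def flush(self, up_to_lsn):
        # In-order flush: the stable log is always a prefix of the full log.
        self.flushed_lsn = max(self.flushed_lsn, up_to_lsn)


class Page:
    def __init__(self, page_id, value):
        self.page_id = page_id
        self.value = value
        self.page_lsn = 0       # LSN of the last record that changed this page


def update(log, page, txn, new_value):
    # Log the before-image (for undo) and after-image (for redo) first.
    lsn = log.append(txn, page.page_id, page.value, new_value)
    page.value = new_value
    page.page_lsn = lsn


def write_page(log, page, disk):
    # WAL rule: the log records describing this page must reach disk first.
    if log.flushed_lsn < page.page_lsn:
        log.flush(page.page_lsn)
    disk[page.page_id] = page.value


def commit(log, txn):
    lsn = log.append(txn, None, None, "COMMIT")
    log.flush(lsn)              # NO FORCE: only the log is forced at commit
    return lsn
```

With these rules, a page stolen before commit can be undone from its before-image, and a committed change that never reached disk can be redone from its after-image.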

Locking/Logging and Indexes
- Locking:
  - Can’t use 2PL on indexes
  - Solutions: “Crabbing”, right-link schemes
- Logging:
  - No need to “undo” an index page split
- Phantom problem:
  - 1. Use predicate locking
  - 2. “Next-key” locking

Shared Components
- Memory allocations
  - Usually “context”-based
  - Allocate a large context, and do everything within it
  - Why?
- Disk management subsystems
  - Dealing with RAID, etc.
- Replication services
  - Copy, trigger-based, or replay-log
- Statistics gathering, reorganization/index construction, backup/export

Parallelism
- Shared memory
  - Direct mapping from uni-processor
- Shared nothing
  - Horizontal data partitioning, partial failure
  - Query processing, optimization challenging
- Shared disk
  - Distributed lock managers, cache-coherency, etc.
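
Horizontal partitioning in a shared-nothing system is often done by hashing a key to pick a node. The Python sketch below assumes hash partitioning (range or round-robin partitioning are equally valid choices); `partition_of` and `partition_rows` are hypothetical helper names.

```python
import zlib


def partition_of(key, num_nodes):
    """Map a key to a node number in [0, num_nodes)."""
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def partition_rows(rows, key_index, num_nodes):
    """Split rows into one list per node, based on the key column."""
    parts = [[] for _ in range(num_nodes)]
    for row in rows:
        parts[partition_of(row[key_index], num_nodes)].append(row)
    return parts
```

A lookup on the partitioning key only has to visit node `partition_of(key, num_nodes)`, while a query on any other column must be sent to every node, which hints at why optimization is harder in this setting.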

Query Processing
- Assume single-user, single-threaded
  - Concurrency managed by lower layers
- Steps:
  - Parsing: attribute references, syntax, etc.
    - Catalog stored as “denormalized” tables
  - Rewriting:
    - Views, constants, logical rewrites (transitive predicates, true/false predicates), semantic (using constraints), subquery flattening

Query Processing
- Steps: Optimizer
  - Block-by-block
  - Machine code vs interpretable
  - Compile-time vs run-time
  - Selinger++:
    - Larger plan space, selectivity estimation
    - Top-down (SQL Server), auto-tuning, expensive functions
  - “Hints”

Query Processing
- Steps: Executor
  - “get_next()” iterator model
  - Narrow interface between iterators
  - Can be implemented independently
  - Assumes no-blocking-I/O
- Some low-level details
  - Tuple-descriptors
  - Very carefully allocated memory slots
  - “Avoid in-memory copies”
  - Pin and unpin
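
The get_next() iterator model can be sketched in Python as follows. The operator classes `Scan`, `Select`, and `Project`, the `init` method, and the `run` driver are illustrative assumptions; only the `get_next` name comes from the slide. Each operator knows nothing about its child beyond the two-method interface.

```python
class Scan:
    """Leaf operator: returns rows from an in-memory list one at a time."""
    def __init__(self, rows):
        self.rows = rows

    def init(self):
        self.pos = 0

    def get_next(self):
        if self.pos >= len(self.rows):
            return None                  # end of stream
        row = self.rows[self.pos]
        self.pos += 1
        return row


class Select:
    """Passes through only the rows that satisfy the predicate."""
    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def init(self):
        self.child.init()

    def get_next(self):
        while True:
            row = self.child.get_next()
            if row is None or self.predicate(row):
                return row


class Project:
    """Keeps only the requested column positions."""
    def __init__(self, child, columns):
        self.child = child
        self.columns = columns

    def init(self):
        self.child.init()

    def get_next(self):
        row = self.child.get_next()
        if row is None:
            return None
        return tuple(row[c] for c in self.columns)


def run(plan):
    """Pull rows from the root of the plan until it is exhausted."""
    plan.init()
    out = []
    row = plan.get_next()
    while row is not None:
        out.append(row)
        row = plan.get_next()
    return out
```

For example, `run(Project(Select(Scan(rows), lambda r: r[0] == "UA101"), [1]))` pulls the passenger names for one flight from rows shaped like `(flight, name, seat)`; new operators can be added without touching the existing ones.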

Query Processing
- SQL Update/Delete
  - “Halloween” problem
- Access Methods
  - B+-Tree and heap files
  - Multi-dimensional indexes not common
  - init(SARG)
    - “Avoid too many back-and-forth function calls”
  - Allow access by RID
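
The Halloween problem arises when an update moves a row ahead of the index scan that is driving the update, so the same row gets updated more than once. The Python sketch below is a toy illustration under assumptions: the "index" is just a sorted list of `(salary, rid)` pairs, and `naive_raise` and `safe_raise` are hypothetical names. The safe variant shows one common remedy, collecting the qualifying RIDs first and then updating by RID.

```python
import bisect


def naive_raise(salaries, limit, raise_by):
    """Raise every salary below `limit`, updating the index during the scan."""
    index = sorted((s, rid) for rid, s in salaries.items())
    updates = 0
    pos = 0   # scan cursor; deleting an entry shifts the next one into this slot
    while pos < len(index) and index[pos][0] < limit:
        old, rid = index[pos]
        new = old + raise_by
        salaries[rid] = new
        del index[pos]                      # index maintenance: drop old key
        bisect.insort(index, (new, rid))    # re-insert ahead of the cursor
        updates += 1
    return updates


def safe_raise(salaries, limit, raise_by):
    """Same update, but the scan finishes before any row is modified."""
    index = sorted((s, rid) for rid, s in salaries.items())
    rids = [rid for s, rid in index if s < limit]   # phase 1: collect RIDs
    for rid in rids:                                # phase 2: update by RID
        salaries[rid] += raise_by
    return len(rids)
```

With salaries `{1: 1000, 2: 2000}`, a limit of 3000, and a raise of 1000, the naive version keeps re-encountering rows until everyone reaches the limit, while the safe version gives each row exactly one raise.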