Foundations of Information Systems
5 DBMS Architecture
© Prof. Dr.-Ing. Wolfgang Lehner | INTELLIGENT DATABASE GROUP


Source: pages.iai.uni-bonn.de/behrend_andreas/lehre/FIM/WS16/11... (2017-01-10)


© A. Behrend, Foundations of Information Systems

What is in the Lecture

1. Database usage: query, programming, design

2. Database architecture: indexes, transactions, query processing


How Is a Database System Built?

A DBMS has to bridge the gap between a declarative query such as

SELECT s.firstname, s.lastname, COUNT(l.name)
FROM Student s
INNER JOIN Program p ON s.programId = p.id
INNER JOIN Attendance a ON a.studentId = s.studentId
INNER JOIN Lecture l ON a.lectureId = l.id
WHERE p.name = 'DSE'
GROUP BY s.firstname, s.lastname

and low-level file access such as

byte[] b = read(File f, int pos, int length)


Architectural Blue Print (layers, top to bottom):

- Application: SQL, JDBC, ODBC, …
- Data System: query processing (parsing, plan generation, plan optimization, plan execution), e.g., for the SQL query above
- Access System: data model semantics (system catalog, record format, logical access paths), e.g., Table Person(id INT, name VARCHAR, birthday DATE), Index P_id_IX on Person.id, and the record (1, 'Smith', 15.06.1982)
- Storage System: storage structures (record management, free space management, physical access paths), e.g., records addressed by TIDs
- Buffer: buffered pages (page replacement strategy, materialization strategy, logging, backup, recovery)
- File System: paged files
- Hardware: disks, flash, RAID, SAN, …


Architectural Trends


Different Access Characteristics

OLTP (Online Transaction Processing):
- mix of read-only and update queries
- minor analysis tasks
- used for data preservation and lookup
- typically reads only a few records at a time
- high performance by storing contiguous records in disk pages

OLAP (Online Analytical Processing):
- query-intensive DBMS applications
- infrequent, batch-oriented updates
- complex analysis on large data volumes
- typically reads only a few attributes of large amounts of historical data in order to partition them and compute aggregates
- high performance by storing contiguous values of a single attribute


Hardware Developments

- Hardware improvements are not equally distributed: advances in CPU speed have outpaced advances in RAM latency.
- Main-memory access has become a performance bottleneck for many computer applications (bandwidth, latency, address translation via the TLB) → "Memory Wall".
- Cache memories can reduce the memory latency when the requested data is found in the cache.
- Vertically fragmented data structures optimize memory cache usage.


Row Storage vs Column Storage

Row storage:
+ easy to add/modify a record
- might read unnecessary data

Column storage:
+ only needs to read in relevant data
- tuple writes require multiple accesses
→ suitable for read-mostly, read-intensive, large data repositories
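To make the row/column trade-off concrete, here is a minimal Java sketch; the `Person` schema and all class and method names are hypothetical. The same single-attribute aggregate is computed once over a row layout (one object per record) and once over a column layout (one array per attribute). Only the column variant scans contiguous values of one attribute; since Java objects live on the heap, the row variant only approximates a real page layout.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical schema Person(id, name, age) in a row layout and a column layout.
class RowVsColumn {
    // Row storage: all attributes of one record are kept together.
    static final class PersonRow {
        final int id;
        final String name;
        final int age;
        PersonRow(int id, String name, int age) { this.id = id; this.name = name; this.age = age; }
    }

    // Column storage: one contiguous array per attribute.
    static final class PersonColumns {
        final int[] id;
        final String[] name;
        final int[] age;
        PersonColumns(List<PersonRow> rows) {
            int n = rows.size();
            id = new int[n];
            name = new String[n];
            age = new int[n];
            for (int i = 0; i < n; i++) {   // writing one tuple touches every column array
                PersonRow r = rows.get(i);
                id[i] = r.id;
                name[i] = r.name;
                age[i] = r.age;
            }
        }
    }

    // Aggregate over rows: every record is visited, although only 'age' is needed.
    static long sumAgeRows(List<PersonRow> rows) {
        long sum = 0;
        for (PersonRow r : rows) sum += r.age;
        return sum;
    }

    // Same aggregate over columns: scans a single contiguous int[].
    static long sumAgeColumns(PersonColumns cols) {
        long sum = 0;
        for (int a : cols.age) sum += a;
        return sum;
    }

    public static void main(String[] args) {
        List<PersonRow> rows = new ArrayList<>();
        rows.add(new PersonRow(1, "Smith", 41));
        rows.add(new PersonRow(2, "Jones", 29));
        rows.add(new PersonRow(3, "Miller", 35));
        PersonColumns cols = new PersonColumns(rows);
        System.out.println(sumAgeRows(rows) + " " + sumAgeColumns(cols)); // prints "105 105"
    }
}
```

The column constructor shows the write-side cost: inserting one tuple means one write per attribute array.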


Processing Models

[Marcin Zukowski, Peter A. Boncz, Niels Nes, Sándor Héman: MonetDB/X100 - A DBMS In The CPU Cache. IEEE Data Eng. Bull. 28(2): 17-22 (2005)]


Transaction Management

Principle of a transaction: a sequence of successive DB operations that transforms a database from one consistent state into another consistent state, delimited by BOT and EOT (commit or abort).

Properties: ACID (Atomicity, Consistency, Isolation, Durability). A transaction always comes to an end:
- Normal end (commit): the changes are permanently stored within the DB.
- Abnormal end (abort/rollback): the changes already made are taken back.

Note: the EOT state need not differ from the BOT state.

[Figure: consistent database at BOT (begin of transaction) → DML operations, during which the database is possibly inconsistent → consistent database at EOT (end of transaction)]


ACID Properties of Transactions

Atomicity: indivisibility due to the transaction definition (begin to end). All-or-nothing principle, i.e., the DBS guarantees either the complete execution of a transaction … or the ineffectiveness of the whole transaction (and of all associated operations).

Consistency: a successful transaction guarantees that all consistency requirements (integrity constraints) have been met.

Isolation: multiple transactions run isolated from each other and do not use (inconsistent) intermediate results of other transactions.

Durability: all results of successful transactions have to be made persistent.
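The all-or-nothing principle can be illustrated with an undo log of before images. The following Java sketch is a toy under stated assumptions: one transaction at a time, an in-memory map standing in for the database, and hypothetical names (`UndoTxnStore`, `begin`, `write`, `commit`, `abort`); isolation and durability are not modeled.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy sketch of atomicity via an undo log (one transaction at a time, in memory only).
class UndoTxnStore {
    // Before image of one write: the old value, or null if the key did not exist.
    private static final class UndoEntry {
        final String key;
        final Integer oldValue;
        UndoEntry(String key, Integer oldValue) { this.key = key; this.oldValue = oldValue; }
    }

    private final Map<String, Integer> data = new HashMap<>();
    private final Deque<UndoEntry> undoLog = new ArrayDeque<>();
    private boolean active = false;

    void begin() {                                      // BOT
        if (active) throw new IllegalStateException("transaction already active");
        active = true;
    }

    void write(String key, int value) {
        requireActive();
        undoLog.push(new UndoEntry(key, data.get(key))); // remember the old value first
        data.put(key, value);
    }

    void commit() {                                     // EOT, normal end: keep all changes
        requireActive();
        undoLog.clear();
        active = false;
    }

    void abort() {                                      // EOT, abnormal end: take changes back
        requireActive();
        while (!undoLog.isEmpty()) {                    // newest write first
            UndoEntry e = undoLog.pop();
            if (e.oldValue == null) data.remove(e.key); else data.put(e.key, e.oldValue);
        }
        active = false;
    }

    Integer read(String key) { return data.get(key); }

    private void requireActive() {
        if (!active) throw new IllegalStateException("no active transaction");
    }

    public static void main(String[] args) {
        UndoTxnStore db = new UndoTxnStore();
        db.begin(); db.write("A", 100); db.write("B", 50); db.commit();
        db.begin(); db.write("A", 70); db.write("B", 80); db.abort();
        System.out.println(db.read("A") + " " + db.read("B")); // prints "100 50"
    }
}
```

Undoing in reverse order matters when a key is written several times within one transaction: the oldest before image is restored last.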


Motivation

Atomicity:
- Part of the transaction is done, but we want to cancel it (ABORT/ROLLBACK).
- The system crashes during the transaction: some changes made it to disk, some did not.
→ UNDO recovery

Durability:
- The transaction finished and the user was notified (COMMIT).
- The system crashes before the changes were successfully written to disk (asynchronous write).
→ REDO recovery

Consistency:
- Physical consistency: correctness of the storage and access structures; completely executed modification operations preserve physical consistency.
- Logical consistency: correctness of the data contents, i.e., they correspond to a (possible) state of the real world; completely executed transactions preserve logical consistency:
  - all modifications of finished transactions are included,
  - no modifications of open transactions are included.
→ UNDO recovery for consistency-related rollbacks

Remember: logical consistency requires physical consistency in the first place.


Reasons for crashes

Transaction error:
- violation of system restrictions: violation of security regulations, excessive resource requirements, deadlocks
- application-related errors, e.g., wrong operations and values → ROLLBACK

System error: system crash with loss of main-memory contents (database system, operating system, hardware, power failure)

Device error (especially storage-medium error): destruction of secondary storage systems

Catastrophes: destruction of the computing center


Guaranteeing Atomicity & Durability

Assumptions: the system may crash, but the disk is durable. The only atomicity guarantee is that a disk block write is atomic.

Materialization strategy: the preferred policy is Steal/No-Force. This combination is the most complicated, but allows for the highest performance.

- No-Force (pages modified by a transaction need not be written to disk at commit) complicates enforcing durability: what if the system crashes before a modified page written by a committed transaction makes it to disk?
  → At commit time, write as little as possible, in a convenient place, to support REDOing the modifications.
- Steal (the buffer may write a page P modified by an uncommitted transaction to disk) complicates enforcing atomicity: what if the transaction that performed the updates aborts? What if the system crashes before the transaction is finished?
  → Must remember the old value of P (to support UNDOing the write to page P).
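How the two kinds of log information are used after a crash can be sketched as follows. This is a highly simplified UNDO/REDO restart in Java, not ARIES; it assumes the log survives the crash, each update record carries a before image (for UNDO, needed under Steal) and an after image (for REDO, needed under No-Force), and strict locking prevents two uncommitted transactions from writing the same page. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified restart after a crash: REDO committed ("winner") updates, UNDO the rest.
class UndoRedoRecovery {
    static final class LogRecord {
        final int txn;
        final String page;              // null marks a commit record
        final Integer before, after;    // before image (UNDO), after image (REDO)
        LogRecord(int txn, String page, Integer before, Integer after) {
            this.txn = txn; this.page = page; this.before = before; this.after = after;
        }
        static LogRecord update(int txn, String page, int before, int after) {
            return new LogRecord(txn, page, before, after);
        }
        static LogRecord commit(int txn) { return new LogRecord(txn, null, null, null); }
        boolean isCommit() { return page == null; }
    }

    static void restart(Map<String, Integer> disk, List<LogRecord> log) {
        Set<Integer> winners = new HashSet<>();          // transactions with a commit record
        for (LogRecord r : log) if (r.isCommit()) winners.add(r.txn);
        for (LogRecord r : log)                          // REDO pass, forward
            if (!r.isCommit() && winners.contains(r.txn)) disk.put(r.page, r.after);
        for (int i = log.size() - 1; i >= 0; i--) {      // UNDO pass, backward
            LogRecord r = log.get(i);
            if (!r.isCommit() && !winners.contains(r.txn)) disk.put(r.page, r.before);
        }
    }

    public static void main(String[] args) {
        // Disk at crash time: committed T1's write to A never reached the disk (No-Force),
        // while uncommitted T2's modified page B was already written (Steal).
        Map<String, Integer> disk = new HashMap<>();
        disk.put("A", 10);
        disk.put("B", 99);
        List<LogRecord> log = new ArrayList<>();
        log.add(LogRecord.update(1, "A", 10, 20));
        log.add(LogRecord.update(2, "B", 50, 99));
        log.add(LogRecord.commit(1));
        restart(disk, log);
        System.out.println(disk.get("A") + " " + disk.get("B")); // prints "20 50"
    }
}
```

Because both passes install stored images rather than re-executing operations, running the restart twice yields the same state, which matters if the system crashes again during recovery.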


Record Management


Record

Record: a package of fields that together describe a thing, a person, a fact, etc.
- Each field represents one property of the entity described by the record.
- Similar to a struct in C.
- Variable length (in contrast to pages).

Record manager: organizes the physical storage of records in pages.
- Operations: Get, Insert, Update, Delete, Scan.
- Agnostic to record structure and semantics: records are considered as byte strings of variable length; the structure and content of a record are defined by the access system and the application.

Challenges: record addressing, free space management.


Record Addressing

Record address: identifier used to address records, e.g., in indexes or during query processing; assigned when a record is inserted.

Goals: stability of the identifier, fast and direct access, little organizational overhead.

Direct addressing: byte address or position number in file or page → unstable:
- Byte address: if a record grows in length, the following records get new addresses.
- Position number: insert and delete operations change the sequence of records.

Indirect addressing: surrogate with mapping table (complete indirection); tuple identifier (TID concept).


Surrogate with Mapping Table

Surrogate: record type + serial number; the serial number remains constant during the record's lifetime.

Mapping table: maps surrogate to page (Surrogate | Page ID).

Problems: Where to store the mapping table? How can it be extended? How to search the mapping table efficiently?
→ H2 uses a B-tree to store the mapping table.
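A minimal Java sketch of complete indirection via a mapping table follows. Assumptions: `java.util.TreeMap` (a red-black tree) stands in for the B-tree mentioned above, and packing record type and serial number into one `long` is a hypothetical encoding chosen for illustration; class and method names are hypothetical too.

```java
import java.util.TreeMap;

// Surrogate -> page id mapping table; moving a record changes only the table entry.
class SurrogateMapping {
    private final TreeMap<Long, Integer> mappingTable = new TreeMap<>();
    private long nextSerial = 1;

    // Surrogate = record type (upper 16 bits) + serial number (lower 48 bits).
    static long surrogate(int recordType, long serial) {
        if (serial >= (1L << 48)) throw new IllegalArgumentException("serial number too large");
        return ((long) recordType << 48) | serial;
    }

    long insert(int recordType, int pageId) {
        long s = surrogate(recordType, nextSerial++);
        mappingTable.put(s, pageId);
        return s;
    }

    // Relocation: the surrogate stays stable, only its page entry is replaced.
    void move(long surrogate, int newPageId) {
        if (mappingTable.replace(surrogate, newPageId) == null)
            throw new IllegalArgumentException("unknown surrogate");
    }

    Integer pageOf(long surrogate) { return mappingTable.get(surrogate); }

    public static void main(String[] args) {
        SurrogateMapping m = new SurrogateMapping();
        long s = m.insert(3, 17);          // record of type 3 stored on page 17
        m.move(s, 42);                     // record relocated to page 42
        System.out.println(m.pageOf(s));   // prints 42
    }
}
```

The price of this stability is an extra lookup in the mapping table on every access, which is exactly what the problems listed above are about.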


TID Concept

Record addressing with indirection inside the page: each page contains an array with record positions; the TID of a record consists of the page ID and the index in the position array.

Pros: access with one page access (two pages in case of overflow); stable; no mapping table required.

Operations:
- Insert: reuse an unused position or add a new position.
- Delete: mark the position as unused in the array.
- Update: update all affected positions in the array (records may shift within the page).
- Update with overflow: store the record as an overflow record and store the TID of the overflow record at the original position. No double overflow: if the overflow record has to move again, only the TID at the original position is updated.

[Figure: record slot at the original position holding the TID of the overflow record]
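The TID operations above can be sketched in Java as follows. This is a simplified model under explicit assumptions: byte offsets inside a page are abstracted away (only the used byte count is tracked, so shifting records within a page is not modeled), a forwarding TID takes no space, and the caller chooses pages instead of free space management. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// TID = (page id, slot index); each page keeps a slot array pointing to its records.
class TidStore {
    static final class Tid {
        final int page, slot;
        Tid(int page, int slot) { this.page = page; this.slot = slot; }
    }

    // Slot entry: null = unused, byte[] = record, Tid = forward to an overflow record.
    static final class Page {
        final List<Object> slots = new ArrayList<>();
        int usedBytes = 0;
    }

    private final int pageSize;
    private final Map<Integer, Page> pages = new HashMap<>();

    TidStore(int pageSize) { this.pageSize = pageSize; }

    // Insert: reuse an unused slot or append a new one.
    Tid insert(int pageId, byte[] rec) {
        Page p = pages.computeIfAbsent(pageId, id -> new Page());
        if (p.usedBytes + rec.length > pageSize) throw new IllegalStateException("page full");
        p.usedBytes += rec.length;
        int slot = p.slots.indexOf(null);
        if (slot < 0) {
            p.slots.add(rec);
            slot = p.slots.size() - 1;
        } else {
            p.slots.set(slot, rec);
        }
        return new Tid(pageId, slot);
    }

    // Get: one page access, or two if the slot holds a forwarding TID.
    byte[] get(Tid t) {
        Object o = pages.get(t.page).slots.get(t.slot);
        if (o instanceof Tid) {
            Tid f = (Tid) o;
            o = pages.get(f.page).slots.get(f.slot);
        }
        return (byte[]) o;
    }

    // Delete: mark the slot unused (and drop the overflow record, if any).
    void delete(Tid t) {
        Object o = pages.get(t.page).slots.get(t.slot);
        if (o instanceof Tid) free((Tid) o);
        free(t);
    }

    // Update: in place if the new version fits on the home page; otherwise store it on
    // overflowPageId (assumed to have room, no rollback on failure) and keep a forwarding
    // TID at the original position. No double overflow: an old overflow record is dropped
    // and only the TID at the original position is replaced.
    void update(Tid t, byte[] rec, int overflowPageId) {
        Page home = pages.get(t.page);
        Object o = home.slots.get(t.slot);
        if (o instanceof Tid) free((Tid) o);
        else home.usedBytes -= ((byte[]) o).length;
        if (home.usedBytes + rec.length <= pageSize) {
            home.usedBytes += rec.length;
            home.slots.set(t.slot, rec);
        } else {
            home.slots.set(t.slot, insert(overflowPageId, rec));
        }
    }

    private void free(Tid t) {
        Page p = pages.get(t.page);
        Object o = p.slots.get(t.slot);
        if (o instanceof byte[]) p.usedBytes -= ((byte[]) o).length;
        p.slots.set(t.slot, null);
    }

    public static void main(String[] args) {
        TidStore store = new TidStore(100);
        Tid a = store.insert(1, new byte[40]);
        Tid b = store.insert(1, new byte[40]);
        store.update(a, new byte[70], 2);   // 40 + 70 > 100: overflow to page 2
        System.out.println(store.get(a).length + " " + store.get(b).length); // prints "70 40"
    }
}
```

The TID `a` handed out at insert time stays valid after the overflow; only the slot content on its home page changed.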


Free Space Management

Problem: which page has enough space for a new record?

Solution: a free space table lists for all pages how much space is left.

Free space value:
- Precise value: $\lceil \log_2(\text{page size}) \rceil$ bits, i.e., 12 bits, which fit into 2 bytes for the common page size of 4 KB.
- Rough value: use fewer bits; $\text{free space} \approx \frac{\text{value} \cdot \text{page size}}{2^{\text{bits per value}}}$.

Free space table:
- With direct page addressing: assuming a single page can take n free space entries, the first page and every (n+1)-th page hold free space entries.
- With indirect page addressing: free space information is stored in the page table.
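The free space formulas and the placement rule for direct page addressing can be checked with a small Java sketch. Assumptions: the rough encoding rounds down (so it never overestimates free space), a completely empty page saturates at the largest encodable value, and pages are numbered from 0; method names are hypothetical.

```java
// Free space values and free-space-table placement (sketch).
class FreeSpace {
    // Precise value: ceil(log2(pageSize)) bits, e.g. 12 bits (fits into 2 bytes) for 4 KB.
    static int preciseBits(int pageSize) {
        return 32 - Integer.numberOfLeadingZeros(pageSize - 1);
    }

    // Rough value with 'bits' bits: v stands for about v * pageSize / 2^bits free bytes.
    static int encodeRough(int freeBytes, int pageSize, int bits) {
        long v = ((long) freeBytes << bits) / pageSize;   // round down
        return (int) Math.min(v, (1L << bits) - 1);       // empty page saturates
    }

    static int decodeRough(int v, int pageSize, int bits) {
        return (int) (((long) v * pageSize) >> bits);
    }

    // Direct page addressing with n entries per free space page: pages 0, n+1, 2(n+1), ...
    // hold the entries, each for the n data pages that follow it.
    static boolean isFreeSpacePage(long pageNo, int n) { return pageNo % (n + 1) == 0; }

    static long freeSpacePageFor(long dataPageNo, int n) { return dataPageNo - dataPageNo % (n + 1); }

    public static void main(String[] args) {
        System.out.println(preciseBits(4096));                               // prints 12
        int v = encodeRough(1000, 4096, 4);                                  // 16 levels of 256 bytes
        System.out.println(v + " -> " + decodeRough(v, 4096, 4) + " bytes"); // prints "3 -> 768 bytes"
        System.out.println(freeSpacePageFor(6, 3));                          // prints 4
    }
}
```

With 4 bits per page, one byte describes two pages, at the cost of a granularity of 256 bytes for 4 KB pages.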


Physical Access Paths – Index Structures


Overview Indexes

Table scan: read all pages and evaluate the search criteria for each record; pre-fetching.

Index scan: use an index for search criteria on one or more attributes:
- fast access to single values or value ranges of the index attributes,
- logical/physical sorting of the values of key attributes (depending on the index structure),
- enforcing uniqueness.

Types of indexes:
- Primary (clustered) index: determines the physical organization; used for the PK.
- Secondary (non-clustered) index: redundant access path.

[Figure: relation Pers(PID, NAME, AGE, SALARY) with a primary index and secondary indexes on Age and Salary]


Overview Indexes (2)

Choice of access paths:

Index scan:
- only useful for low selectivity (low number of result tuples),
- break-even point according to the output ratio of the number of tuples (usually max. 5%),
- requires statistics about the data,
- additional costs for index storage and updating.

Table scan:
- adequate/efficient for small tables (e.g., 5 pages) and for queries with high selectivity (large result sets),
- 100-200 MB/s sequential read, compared to ~100 disk seeks/s.

[Figure: access time over hit rate for index scan and table scan]


Classification of Index Structures

[Figure: classification of one-dimensional index structures by key comparison vs. key transformation; categories shown: sequential, tree-based, and hash-based structures; linked lists (logically sequential) and sequential lists (physically sequential); binary search trees; multiway trees (example: B-tree); prefix trees (tries); static vs. dynamic structures]

Multiway trees: tree structure with multiple children per node. Idea: choose the fan-out so that the node size suits the page size.


B-Tree

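Node layout: $P_0, (K_1, D_1, P_1), \ldots, (K_b, D_b, P_b)$, followed by free space; each entry $(K_i, D_i, P_i)$ consists of key $K_i$, data $D_i$, and pointer $P_i$, and b is the number of keys in the node.
- Number of pointers per inner node: $k+1 \le |P| \le 2k+1$ (the root may have fewer).
- Subtree $P_0$: keys $< K_1$; subtree $P_i$: $K_i <$ keys $< K_{i+1}$; subtree $P_b$: keys $> K_b$.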


B-Tree (2)

Keys: agnostic to specific key semantics; only a defined total order is required; could be of fixed or variable length.

Operations: search for data for a given key value; insertion and deletion of key-data pairs.

Payload: agnostic to specific data semantics; can be the record, a reference (TID), or a mix.

[Figure: example B-tree with k = 2, h = 3]


Search in the B-Tree

Starting at the root node, each node is searched from left to right:
1) If $K_i$ matches the desired key value, the data record has been found (further records with the same key value might be located in the subtree to which $P_{i-1}$ points).
2) If $K_i$ is larger than the desired value, the search continues in the root of the subtree identified by $P_{i-1}$.
3) If $K_i$ is smaller than the desired value, the comparison is repeated with $K_{i+1}$.
4) If the last key of the node ($K_{2k}$ in a full node) is also smaller than the desired value, the search continues in the subtree of the last pointer ($P_{2k}$ in a full node).

If it is impossible to descend further into a subtree in case 2 or 4 (leaf node), the search is aborted: no record with the desired key value exists.

[Figure: example searches for keys 38, 20, and 6]
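The search steps can be sketched in Java as follows. The node layout is an assumption made for this sketch: `keys[i]` holds $K_{i+1}$, `data[i]` holds $D_{i+1}$, and `children[i]` holds $P_i$ (`children` is `null` in leaves); the example tree (k = 1, h = 3) and all names are hypothetical.

```java
// B-tree search following cases 1-4 above.
class BTreeSearch {
    static final class Node {
        final int[] keys;
        final String[] data;
        final Node[] children;
        Node(int[] keys, String[] data, Node[] children) {
            this.keys = keys; this.data = data; this.children = children;
        }
        boolean isLeaf() { return children == null; }
    }

    static String search(Node node, int key) {
        while (node != null) {
            int i = 0;
            while (i < node.keys.length && node.keys[i] < key) i++;              // case 3
            if (i < node.keys.length && node.keys[i] == key) return node.data[i]; // case 1
            if (node.isLeaf()) return null;                                       // not found
            node = node.children[i];   // case 2 (left of the larger key) or case 4 (last pointer)
        }
        return null;
    }

    // Helpers for a small example tree; payload "D<key>" stands for the data.
    static Node leaf(int... keys) { return new Node(keys, payload(keys), null); }
    static Node inner(int[] keys, Node... children) { return new Node(keys, payload(keys), children); }
    private static String[] payload(int[] keys) {
        String[] d = new String[keys.length];
        for (int i = 0; i < keys.length; i++) d[i] = "D" + keys[i];
        return d;
    }

    public static void main(String[] args) {
        Node root = inner(new int[] {4},
            inner(new int[] {2}, leaf(1), leaf(3)),
            inner(new int[] {6}, leaf(5), leaf(7, 8)));
        System.out.println(search(root, 7)); // prints D7
        System.out.println(search(root, 9)); // prints null
    }
}
```

Since every key is compared at most once per node and the tree is balanced, a search touches exactly one node per level, i.e., at most h nodes.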


Insertions in the B-Tree (1)

Insertion rule: insert only into leaf nodes.

At non-leaf nodes: descend the tree as for the search, with search key S: S ≤ $K_i$ → follow $P_{i-1}$; S > $K_i$ → check $K_{i+1}$; S > $K_{2k}$ → follow $P_{2k}$.

At the leaf node: insert the data record according to the sort order. Special case: the leaf node is full (2k records) → split the leaf node.

Splitting: generate a new leaf node and split the 2k+1 entries (in order) into two leaf nodes:
- first k entries → left node,
- last k entries → right node,
- the middle entry ((k+1)-th) is used as the new discriminator (branching key) and inserted into the parent node.


Insertions in the B-Tree (2)

Node splitting during insertion: two possible situations after a split:
- The parent node is full → repeat the split on this level.
- Enough space → finished.

Special case root split: splitting the root node → new root with two successor nodes; the height of the tree grows by 1. The tree thus grows from the bottom to the top.

Dynamic reorganization (self-balancing): no unloading or reloading necessary; the tree is always balanced. But: in case of many insertions/deletions, a reorganization can be beneficial.


Insertions in the B-Tree (3)

Insertion example: order k = 1 (n = 2k), keys 1, 5, 2, 6, 7, 4, 8, 3.

Final tree: h = 3.
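The insertion and splitting rules can be traced with a short Java sketch (keys only, no payload; class and method names are hypothetical). Inserting 1, 5, 2, 6, 7, 4, 8, 3 with k = 1 ends with height 3, consistent with h = 3 above; in this implementation the final root holds key 4.

```java
import java.util.ArrayList;
import java.util.List;

// B-tree insertion with node splits; order k means at most 2k keys per node.
class BTreeInsert {
    static final class Node {
        final List<Integer> keys = new ArrayList<>();
        final List<Node> children = new ArrayList<>();   // empty for leaf nodes
        boolean isLeaf() { return children.isEmpty(); }
    }

    // A split reported to the parent: the new discriminator and the new right node.
    private static final class Split {
        final int discriminator;
        final Node right;
        Split(int discriminator, Node right) { this.discriminator = discriminator; this.right = right; }
    }

    final int k;
    Node root = new Node();
    int height = 1;

    BTreeInsert(int k) { this.k = k; }

    void insert(int key) {
        Split s = insert(root, key);
        if (s != null) {                      // root split: new root, height grows by 1
            Node newRoot = new Node();
            newRoot.keys.add(s.discriminator);
            newRoot.children.add(root);
            newRoot.children.add(s.right);
            root = newRoot;
            height++;
        }
    }

    private Split insert(Node node, int key) {
        int i = 0;                            // first i with key <= K_(i+1): follow P_i
        while (i < node.keys.size() && node.keys.get(i) < key) i++;
        if (node.isLeaf()) {
            node.keys.add(i, key);            // insert only into leaves, in sort order
        } else {
            Split s = insert(node.children.get(i), key);
            if (s == null) return null;       // enough space below: finished
            node.keys.add(i, s.discriminator);
            node.children.add(i + 1, s.right);
        }
        return node.keys.size() > 2 * k ? split(node) : null;  // overflow: split this level
    }

    // 2k+1 keys: the first k stay left, the last k move right, the (k+1)-th goes up.
    private Split split(Node node) {
        Node right = new Node();
        int middle = node.keys.get(k);
        right.keys.addAll(node.keys.subList(k + 1, 2 * k + 1));
        node.keys.subList(k, 2 * k + 1).clear();
        if (!node.isLeaf()) {                 // inner node: the last k+1 pointers move too
            right.children.addAll(node.children.subList(k + 1, 2 * k + 2));
            node.children.subList(k + 1, 2 * k + 2).clear();
        }
        return new Split(middle, right);
    }

    // In-order traversal, useful to check that all keys are present and sorted.
    void inOrder(Node n, List<Integer> out) {
        for (int i = 0; i < n.keys.size(); i++) {
            if (!n.isLeaf()) inOrder(n.children.get(i), out);
            out.add(n.keys.get(i));
        }
        if (!n.isLeaf()) inOrder(n.children.get(n.keys.size()), out);
    }

    public static void main(String[] args) {
        BTreeInsert tree = new BTreeInsert(1);
        for (int key : new int[] {1, 5, 2, 6, 7, 4, 8, 3}) tree.insert(key);
        System.out.println(tree.height + " " + tree.root.keys); // prints "3 [4]"
    }
}
```

The last key, 3, triggers a cascade: its leaf splits, the parent overflows and splits, and a new root is created, so the tree grows from the bottom to the top.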


Insertion and Deletion in the B-Tree

Problem: insertion can create overflow; deletion can create underflow and (after merging) overflow.

Example: inserting key 22 → overflow → split. Deleting key 22 again → underflow → merge; all four nodes need to be accessed, and the result is finally the same as the input.

[Figure: inserting key 22]


Deletion in the B-Tree

Example: order k = 1 (n = 2k), delete key 3.

[Figure: underflow → merge]


Deletion in the B-Tree (2)

Example: order k = 1 (n = 2k), delete key 3.

[Figure: underflow → merge]

Remember: each path from the root to a leaf has the same length h.


Deletion in the B-Tree (3)

Example: order k = 1 (n = 2k), delete key 3.

[Figure: underflow → merge; overflow → split]


Deletion in the B-Tree (4)

Example: order k = 1 (n = 2k), delete key 3.

[Figure: overflow → split; root split]


Deletion Algorithm

Example (there are different algorithms):
1. Search the node that contains the key K to be deleted.
2. If K is in a leaf node: delete the key in the leaf node and handle a potentially resulting underflow by merging with a sibling.
3. If K is in an inner node: pull up a new discriminator from one of the successors:
   - analyze which successor node of K, the left or the right one, has more elements; if both have the same number of elements, decide for one,
   - replace the key K to be deleted with its direct predecessor K' (the largest key in the left subtree) or with its direct successor K'' (the smallest key in the right subtree), respectively,
   - delete K' or K'' from the respective successor node (recursively).

Note: major variants:
- Merge (this lecture).
- Re-distribution: instead of split/merge in case of overflow/underflow, the entries are redistributed among one or multiple adjacent nodes.
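The "underflow → merge, then overflow → split" sequence from the deletion examples can be sketched for two adjacent leaves of a B-tree in Java. Assumptions: order k (at most 2k, at least k keys per non-root node), the separating key in the parent is pulled down into the merged node, and the parent's own underflow (the recursive case) is not modeled; names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Underflow handling for two adjacent leaves: merge, and split again on overflow.
class LeafUnderflow {
    static final class Result {
        final List<Integer> left;        // merged node, or left half after a re-split
        final List<Integer> right;       // null if merging was enough
        final Integer discriminator;     // new parent key after a re-split, else null
        Result(List<Integer> left, List<Integer> right, Integer discriminator) {
            this.left = left; this.right = right; this.discriminator = discriminator;
        }
    }

    // 'separator' is the parent key between the leaves 'left' and 'right'.
    static Result mergeOrResplit(List<Integer> left, int separator, List<Integer> right, int k) {
        List<Integer> merged = new ArrayList<>(left);
        merged.add(separator);                         // pull the separator down
        merged.addAll(right);
        if (merged.size() <= 2 * k) {
            return new Result(merged, null, null);     // merge: parent loses one key
        }
        int mid = merged.size() / 2;                   // overflow: split again
        return new Result(new ArrayList<>(merged.subList(0, mid)),
                          new ArrayList<>(merged.subList(mid + 1, merged.size())),
                          merged.get(mid));
    }

    public static void main(String[] args) {
        int k = 2;
        // Right leaf dropped to [7] after a deletion: fewer than k keys (underflow).
        Result r1 = mergeOrResplit(Arrays.asList(1, 2, 3), 5, Arrays.asList(7), k);
        System.out.println(r1.left + " " + r1.discriminator + " " + r1.right); // prints "[1, 2] 3 [5, 7]"
        Result r2 = mergeOrResplit(Arrays.asList(1, 2), 5, Arrays.asList(7), k);
        System.out.println(r2.left + " " + r2.right); // prints "[1, 2, 5, 7] null"
    }
}
```

Merging and immediately re-splitting has the same net effect as the re-distribution variant mentioned in the note, which is why some implementations redistribute directly.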


B-Trees, B+-Trees and B*-Trees

B+-Trees and B*-Trees:
- data is only in leaf nodes,
- key redundancy, but higher fan-out → lower tree height, less I/O,
- simpler delete procedure → requires only merging of nodes,
- doubly linked list of all leaf nodes.

B*-Trees: modified valid node sizes, from [k, 2k] to [4/3·k, 2k] → better node utilization, but more splits/merges.

[Figure: example secondary index (non-unique), B-tree with k = 2, h = 3]
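The benefit of the linked leaf level shows up in range queries. The following Java sketch assumes a non-unique secondary index whose leaves store (key, TID) pairs; the root-to-leaf descent that finds the start leaf is omitted, and the leaf contents, TID strings, and names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Range scan over the linked leaf level of a B+-tree (descent to the first leaf omitted).
class LeafChainScan {
    static final class Leaf {
        final int[] keys;       // sorted keys; duplicates allowed (non-unique index)
        final String[] tids;    // one reference (TID) per key
        Leaf next, prev;        // doubly linked leaf level; prev would serve descending scans
        Leaf(int[] keys, String[] tids) { this.keys = keys; this.tids = tids; }
    }

    // Links the leaves in order and returns the first one.
    static Leaf link(Leaf... leaves) {
        for (int i = 0; i + 1 < leaves.length; i++) {
            leaves[i].next = leaves[i + 1];
            leaves[i + 1].prev = leaves[i];
        }
        return leaves[0];
    }

    // Returns the TIDs of all entries with lo <= key <= hi, starting at leaf 'start'.
    static List<String> rangeScan(Leaf start, int lo, int hi) {
        List<String> out = new ArrayList<>();
        for (Leaf leaf = start; leaf != null; leaf = leaf.next) {
            for (int i = 0; i < leaf.keys.length; i++) {
                if (leaf.keys[i] > hi) return out;      // sorted: nothing further can match
                if (leaf.keys[i] >= lo) out.add(leaf.tids[i]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Leaf first = link(
            new Leaf(new int[] {3, 5, 5}, new String[] {"t7", "t2", "t9"}),
            new Leaf(new int[] {5, 8}, new String[] {"t4", "t1"}),
            new Leaf(new int[] {9, 12}, new String[] {"t3", "t5"}));
        System.out.println(rangeScan(first, 5, 8)); // prints [t2, t9, t4, t1]
    }
}
```

Without the leaf chain, the same query would have to climb back up into inner nodes to reach the next leaf.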


Indexing Low Cardinality Columns

Problem example: a B-tree on the sex of customers for a table with 1,000,000 tuples results in two lists with approximately 500,000 tuples each.
- A query for all female customers requires 500,000 random page accesses (secondary index).
- A table scan would be much faster.

Conclusion: B-trees (and also hashing) are useful for predicates with low selectivity (output/input cardinality ratio). Rule of thumb: the margin hit rate is approx. 5%; higher hit rates do not justify the effort of an index access.

[Figure: B-tree with the two keys F and M, each pointing to a long list of TIDs]


Bitmap Index

Idea (long history since the 1960s):
- create a bitmap (bit list) for each attribute value,
- each tuple in the table is assigned to one bit in the bitmap (by position, i.e., sequential TID),
- bit values: 1 = attribute value set, 0 = attribute value not set.

Necessary condition: sequential numbering of the tuples (TIDs).

Example relation and bitmap index on Sex:

Name   | Sex | Region | Race
Carol  | f   | n      | white
Harold | m   | e      | black
Anne   | f   | e      | asian
Iris   | f   | ne     | white
…      | m   | se     | hisp
…      | f   | e      | white
…      | f   | sw     | asian
…      | f   | w      | black
…      | f   | n      | asian
…      | m   | e      | hisp
…      | m   | se     | black
…      | f   | s      | white
…      | m   | nw     | black
…      | f   | s      | white
…      | f   | w      | black

Sex = F: 1 0 1 1 0 1 1 1 1 0 0 1 0 1 1
Sex = M: 0 1 0 0 1 0 0 0 0 1 1 0 1 0 0
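Building such an index is a single pass over the column. The Java sketch below uses `java.util.BitSet`, one bitmap per distinct value, and assumes 0-based sequential TIDs; the class and method names are hypothetical. Applied to the Sex column above, it reproduces the F and M bitmaps.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Bitmap index: one BitSet per attribute value; bit i belongs to the tuple with TID i.
class BitmapIndexBuild {
    static Map<String, BitSet> build(String[] column) {
        Map<String, BitSet> index = new LinkedHashMap<>();
        for (int tid = 0; tid < column.length; tid++)
            index.computeIfAbsent(column[tid], v -> new BitSet(column.length)).set(tid);
        return index;
    }

    // Renders the first n bits as "1 0 1 ...".
    static String bits(BitSet b, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.append(i == 0 ? "" : " ").append(b.get(i) ? '1' : '0');
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] sex = {"f", "m", "f", "f", "m", "f", "f", "f", "f", "m", "m", "f", "m", "f", "f"};
        Map<String, BitSet> idx = build(sex);
        System.out.println("F: " + bits(idx.get("f"), sex.length)); // prints "F: 1 0 1 1 0 1 1 1 1 0 0 1 0 1 1"
        System.out.println("M: " + bits(idx.get("m"), sex.length)); // prints "M: 0 1 0 0 1 0 0 0 0 1 1 0 1 0 0"
    }
}
```

The sequential-TID condition is what makes this work: bit position and tuple position must coincide across all bitmaps of a table.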


Querying Bitmap Indexes

Main advantage of bitmap indexes: simple and efficient logical combination of bitmaps; only the data relevant for the predicates is read.

Example: $\sigma_{Sex='f' \wedge Region='n'}(R)$: combine bitmaps B1 and B2 in a conjunction:
for (i = 0; i < B1.length; i++) B[i] = B1[i] & B2[i];

Example I/O cost estimation: $\sigma_{Sex='f' \wedge Region='n' \wedge Race='Asian'}(R)$ ("Asian women of region North")
- Selectivity: $\frac{1}{2} \cdot \frac{1}{8} \cdot \frac{1}{4} = \frac{1}{64}$
- N = 10,000 tuples with a length of 400 bytes each (~10 tuples per page for 4 kB pages)
- Table scan: 1,000 pages
- Bitmap access: 10,000/64 ≈ 156 pages (worst case: each qualifying tuple on a different page), plus 1 page for the bitmaps

F:             1 0 1 1 0 1 1 1 1 0 0 1 0 1 1
N:             0 1 1 0 0 1 0 0 0 1 0 0 0 0 0
A:             0 0 1 0 0 0 1 0 1 0 0 0 0 0 0
F AND N AND A: 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
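The conjunction can be evaluated with `java.util.BitSet.and`, which combines whole machine words at a time instead of single bits. This Java sketch takes the F, N and A bitmaps exactly as printed above (15 tuples, 0-based TIDs assumed); the helper names are hypothetical.

```java
import java.util.BitSet;

// Conjunctive bitmap query: B = B1 AND B2 AND ..., computed with BitSet.and.
class BitmapQuery {
    // Parses a space-separated bit string such as "1 0 1".
    static BitSet parse(String bits) {
        String[] parts = bits.trim().split("\\s+");
        BitSet b = new BitSet(parts.length);
        for (int i = 0; i < parts.length; i++) if (parts[i].equals("1")) b.set(i);
        return b;
    }

    static BitSet and(BitSet... bitmaps) {
        BitSet result = (BitSet) bitmaps[0].clone();   // keep the index bitmaps unchanged
        for (int j = 1; j < bitmaps.length; j++) result.and(bitmaps[j]);
        return result;
    }

    public static void main(String[] args) {
        BitSet f = parse("1 0 1 1 0 1 1 1 1 0 0 1 0 1 1");
        BitSet n = parse("0 1 1 0 0 1 0 0 0 1 0 0 0 0 0");
        BitSet a = parse("0 0 1 0 0 0 1 0 1 0 0 0 0 0 0");
        System.out.println(and(f, n, a)); // prints {2}: only the third tuple qualifies
    }
}
```

Only the TIDs set in the result bitmap need to be fetched from the table, which is where the estimate of about 156 instead of 1,000 pages comes from.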

Page 2: Foundations of Information Systems 5 DBMS Architecturepages.iai.uni-bonn.de/behrend_andreas/lehre/FIM/WS16/11... · 2017-01-10 · Main-memory access has become a performance bottleneck

| 264

gt

copy A Behrend Foundations of Information Systems |

What is in the Lecture

1 Database Usage Query Programming Design

2 Database Architecture Indexes Transactions Query Processing

| 265

gt

copy A Behrend Foundations of Information Systems |

How is Database System build

SELECT sfirstname slastname COUNT(lname) FROM Student s INNER JOIN Program p ON sprogramId = pid INNER JOIN Attendance a ON astudentId = sstudentId INNER JOIN Lecture l ON alectureId = lid GROUP BY sfirstname slastname WHERE pname=lsquoDSErsquo

byte[] b = read(File f int pos int length)

| 266

gt

copy A Behrend Foundations of Information Systems |

Storage System

Buffer

File System

Hardware

Data System

Application

Architectural Blue Print

TID TID TID

TID TID TID

SELECT sfirstname slastname COUNT(lname) FROM Student s INNER JOIN Program p ON sprogramId = pid INNER JOIN Attendance a ON astudentId = sstudentId INNER JOIN Lecture l ON alectureId = lid GROUP BY sfirstname slastname WHERE pname=lsquoDSErsquo

Run

Buffered Pages - Page replacement strategy - Materialization strategy - Logging Backup Recovery

Paged files

Disks Flash RAID SAN hellip

Storage Structures - Record management - Free space management - Physical access paths

Access System

Data

base

Sys

tem

Data model semantics - System catalog - Record format - Logical access paths

Query processing - Parsing - Plan generation - Plan optimization - Plan execution

1 lsquoSmithrsquo 15061982

Table Person id INT name VARCHAR birthday DATE Index P_id_IX on Personid

Database System

SQL JDBC ODBC hellip

copy A Behrend Foundations of Information Systems | 267

gt

|

Architectural Trends

| 268

gt

copy A Behrend Foundations of Information Systems |

Different Access Characteristics

OLTP (On-line Transaction Processing) Mix between read-only and update queries Minor analysis tasks Used for data preservation and lookup Read typically only a few records at a time High performance by storing contiguous records in disk pages

OLAP (On-line Analytical Processing) Query-intensive DBMS applications Infrequent batch-oriented updates Complex analysis on large data volumes Read typically only a few attributes of large amounts of historical data in order to

partition them and compute aggregates High performance by storing contiguous values of a single attribute

| 269

gt

copy A Behrend Foundations of Information Systems |

Hardware Developments

Hardware improvements not equally distributed Advances in CPU speed have outpaced advances

in RAM latency Main-memory access has become a performance

bottleneck for many computer applications Bandwidth Latency Address translation (TLB)

rarr Memory Wall Cache memories can reduce the memory latency

when the requested data is found in the cache Vertically fragmented data structures optimize

memory cache usage

| 270

gt

copy A Behrend Foundations of Information Systems |

Row Storage vs Column Storage

Row Storage + easy to addmodify a record - might read unnecessary data

Column Storage + only need to read in relevant data - tuple writes require multiple accesses -gt suitable for read-mostly read-intensive large data repositories

| 271

gt

copy A Behrend Foundations of Information Systems |

Processing Models

[Marcin Zukowski Peter A Boncz Niels Nes Saacutendor Heacuteman MonetDBX100 - A DBMS In The CPU Cache IEEE Data Eng Bull 28(2) p17-22 2005]

| 272

gt

copy A Behrend Foundations of Information Systems |

Transaction Management

Principle of a transaction Sequence of successive DB operations that transform a database from a consistent

state into another consistent state surrounded by BOT EOT (Commit Abort)

Properties ACID Atomicity Consistency Isolation Durability A transaction will always come to an end Normal (commit) changes are permanently stored within the DB Abnormal (abort rollback) already composed changes are taken back

Note EOT state must not be different from BOT state

BOT(begin of transaction)

EOT(end of transaction)

possibly inconsistent database

consistentdatabase

consistentdatabase

DB DB

DML operations

| 273

gt

copy A Behrend Foundations of Information Systems |

ACID Properties of Transactions

Atomicity Indivisibility due to the transaction definition (Begin - End) All-or-nothing principle ie the DBS guarantees Either the complete execution of a transaction hellip hellip or the ineffectiveness of the whole transaction (and of all associated operations)

Consistency A successful transaction guarantees that all consistency requirements (integrity

requirements) have been met

Isolation Multiple transactions run isolated from each other and do not use (inconsistent)

intermediate results from other transactions

Durability All results of successful transactions have to be made persistent

| 274

gt

copy A Behrend Foundations of Information Systems |

Motivation

Atomicity Part of the transaction is done but we want to cancel it ABORTROLLBACK System crashes during transaction some changes made it to the disk some did not

Durability Transaction finished user notified COMMIT System crashes before changes sent successfully to disk (asynchronous write)

Consistency Physical consistency Correctness of the storage and access structures Completely executed modification operations preserve the consistency

Logical consistency Correctness of data contents ndash correspond to a (possible) state of the real world Completely executed transactions preserve the logical consistency

- All modifications of finished transactions are included - No modifications of open transactions are included

Remember Logical consistency requires physical consistency in the first place

UNDO Recovery

REDO Recovery

UNDO Recovery for consistency-related rollbacks

| 275

gt

copy A Behrend Foundations of Information Systems |

Reasons for crashes

Transaction error Violation of system restrictions Violation of security regulations Excessive resource requirements deadlocks

Application-related errors eg wrong operations and values ROLLBACK

System error System crash with loss of main-memory contents Database system operating system hardware power failure

Device error (especially storage-medium error) Destruction of secondary storage systems

Catastrophes Destruction of the computing center

| 276

gt

copy A Behrend Foundations of Information Systems |

Guarantee Atomicity amp Durability

Assumptions System may crash but the disk is durable The only atomicity guarantee is that a disk block write is atomic

Materialization strategy Preferred Policy StealNo Force This combination is most complicated but allows for highest performance No Force complicates enforcing Durability What if system crashes before a modified page written by a committed

transaction makes it to disk Write as little as possible in a convenient place at commit time to support

REDOing modifications Steal complicates enforcing Atomicity What if the transaction that performed udpates aborts What if system crashes before transaction is finished Must remember the old value of P (to support UNDOing the write to page P)

copy A Behrend Foundations of Information Systems | 277

gt

|

Storage System

Buffer

File System

Hardware

Data System

Application

TID TID TID

TID TID TID

SELECT sfirstname slastname COUNT(lname) FROM Student s INNER JOIN Program p ON sprogramId = pid INNER JOIN Attendance a ON astudentId = sstudentId INNER JOIN Lecture l ON alectureId = lid GROUP BY sfirstname slastname WHERE pname=lsquoDSErsquo

Run

Buffered Pages - Page replacement strategy - Materialization strategy - Logging Backup Recovery

Paged files

Disks Flash RAID SAN hellip

Storage Structures - Record management - Free space management - Physical access paths

Access System

Data

base

Sys

tem

Data model semantics - System catalog - Record format - Logical access paths

Query processing - Parsing - Plan generation - Plan optimization - Plan execution

1 lsquoSmithrsquo 15061982

Table Person id INT name VARCHAR birthday DATE Index P_id_IX on Personid

Database System

SQL JDBC ODBC hellip

Record Management

| 278

gt

copy A Behrend Foundations of Information Systems |

Record

Record Package of fields that together describe a thing a person a fact etc Each fields represents on property of the entity described by the record Similar to a struct in C Variable length (in contrast to pages)

Record Manager Organizes physical storage of records in pages Operations Get Insert Update Delete Scan Agnostic to record structure and semantic

records considered as byte strings of variable length Structure and content of record is defined be Access System and application

Challenges Record addressing Free space management

| 279

gt

copy A Behrend Foundations of Information Systems |

Record Addressing

Record address: identifier used to address records, e.g., in indexes or during query processing. Assigned when a record is inserted.

Goals: stability of the identifier, fast and direct access, little organizational overhead.

Direct addressing: byte address or position number in the file or page. Unstable: with byte addresses, if a record grows in length, the following records get new addresses. With position numbers, insert and delete operations change the sequence of records.

Indirect addressing: surrogate with mapping table (complete indirection) or tuple identifier (TID concept).
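The instability of direct addressing can be shown with a small Python sketch. It assumes that records are stored back to back in a page without gaps. The record contents are made up for illustration.

```python
def byte_addresses(records):
    """Direct addressing: each record's address is its byte offset in the page."""
    offsets, pos = [], 0
    for rec in records:
        offsets.append(pos)
        pos += len(rec)
    return offsets

page = [b"Smith", b"Miller", b"Jones"]   # hypothetical records
print(byte_addresses(page))              # [0, 5, 11]
page[0] = b"Smithson"                    # the first record grows by 3 bytes ...
print(byte_addresses(page))              # [0, 8, 14]  ... so the following records get new addresses

positions = ["Smith", "Miller", "Jones"] # position numbers instead of byte offsets
del positions[0]                         # deleting a record ...
print(positions.index("Jones"))          # 1 (was 2): ... shifts the positions of later records
```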


Surrogate with Mapping Table

Surrogate: record type + serial number. The serial number remains constant during the record's lifetime.

Mapping table: maps surrogate to page.

Problems: Where to store the mapping table? How can it be extended? How can it be searched efficiently?

→ H2 uses a B-Tree to store the mapping table.

[Figure: mapping table with columns Surrogate | Page ID]
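A sketch of complete indirection in Python. It assumes that a surrogate is the pair (record type, serial number), as described above. `MappingTable` is a hypothetical name. A plain dict stands in for the persistent, searchable structure (such as a B-Tree) that a real system needs.

```python
from itertools import count

class MappingTable:
    """Complete indirection: surrogate (record type, serial number) -> page id."""

    def __init__(self):
        self._serial = count(1)  # serial numbers are never reused
        self._pages = {}         # stand-in for a persistent structure such as a B-Tree

    def insert(self, record_type, page_id):
        surrogate = (record_type, next(self._serial))
        self._pages[surrogate] = page_id
        return surrogate

    def move(self, surrogate, new_page_id):
        self._pages[surrogate] = new_page_id  # record moved: only the table entry changes

    def lookup(self, surrogate):
        return self._pages[surrogate]         # every access pays this extra lookup

table = MappingTable()
s = table.insert("Person", page_id=7)
table.move(s, 12)
print(s, table.lookup(s))  # ('Person', 1) 12
```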


TID Concept

Record addressing with indirection inside the page: each page contains an array of record positions. The TID of a record consists of the page ID and the index in the position array.

Pros: access with one page access (two pages in case of overflow); stable; no mapping table required.

Operations:
- Insert: reuse an unused position or add a new position
- Delete: mark the position as unused in the array
- Update: update all (affected) positions in the array
- Update with overflow: store the record as an overflow record and store the TID of the overflow record at the original position (no double overflow: if the overflow record moves again, update the TID at the original position)

[Figure: record and overflow record]
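The TID operations above can be sketched in Python. This is a simplified model with the following assumptions:
- Page capacity is counted only in record bytes. The position array and forwarding TIDs are treated as free.
- The overflow page is the first other page with enough room.
- `Page`, `TID`, `read`, and `update` are hypothetical names and do not correspond to any particular system's API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TID:
    page_id: int
    slot: int  # index into the page's position array

class Page:
    def __init__(self, page_id, capacity):
        self.page_id, self.capacity = page_id, capacity
        self.slots = []  # position array: bytes = record, TID = forward to overflow record, None = unused

    def free(self):
        # simplification: only record bytes consume space
        return self.capacity - sum(len(s) for s in self.slots if isinstance(s, bytes))

    def insert(self, data):
        if len(data) > self.free():
            return None
        for i, s in enumerate(self.slots):  # reuse an unused position ...
            if s is None:
                self.slots[i] = data
                return TID(self.page_id, i)
        self.slots.append(data)             # ... or add a new position
        return TID(self.page_id, len(self.slots) - 1)

    def delete(self, slot):
        self.slots[slot] = None             # mark the position as unused

def read(pages, tid):
    entry = pages[tid.page_id].slots[tid.slot]
    if isinstance(entry, TID):              # overflow: one more page access
        entry = pages[entry.page_id].slots[entry.slot]
    return entry

def update(pages, tid, data):
    home = pages[tid.page_id]
    old = home.slots[tid.slot]
    old_overflow = None
    if isinstance(old, TID):                # release the old overflow record (no double overflow)
        old_overflow = pages[old.page_id].slots[old.slot]
        pages[old.page_id].delete(old.slot)
    home.slots[tid.slot] = None             # the old version no longer counts as used space
    if len(data) <= home.free():
        home.slots[tid.slot] = data         # fits: store the record in its home page
        return
    for page in pages.values():
        fwd = page.insert(data) if page is not home else None
        if fwd is not None:
            home.slots[tid.slot] = fwd      # home position keeps a forward TID; the record's TID is unchanged
            return
    if isinstance(old, TID):                # no room anywhere: restore the previous state
        pages[old.page_id].slots[old.slot] = old_overflow
    home.slots[tid.slot] = old
    raise MemoryError("no page has enough free space")

pages = {1: Page(1, capacity=20), 2: Page(2, capacity=20)}
tid = pages[1].insert(b"Smith")
pages[1].insert(b"x" * 10)
update(pages, tid, b"Smithsonian")  # 11 bytes no longer fit in page 1 -> overflow record in page 2
print(tid, read(pages, tid))        # TID(page_id=1, slot=0) b'Smithsonian'
```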


Free Space Management

Problem: Which page has enough space for a new record?

Solution: a free space table lists, for all pages, how much space is left.

Free space value:
- Precise value: Ceil(Log2(page size)) bits => 2 bytes for the common page size of 4K.
- Rough value: use fewer bytes; free space = (value * page size) / 2^(bits per value).

Free space table:
- With direct page addressing: assuming a single page can hold n free space entries, the first page and every (n+1)-th page store free space entries.
- With indirect page addressing: free space information is stored in the page table.
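The free space encoding can be checked with a little arithmetic. The following Python sketch rests on these assumptions:
- Page numbers are 0-based.
- Each free-space page describes the n data pages that follow it.
- The function names and the 4-bit rough value are hypothetical choices for illustration.

```python
import math

def precise_bits(page_size):
    # exact free-space value, following the slide: ceil(log2(page size)) bits
    return math.ceil(math.log2(page_size))

def encode_rough(free_bytes, page_size, bits):
    # round down so the table never promises more space than the page has;
    # capped at the largest representable value
    return min(free_bytes * 2**bits // page_size, 2**bits - 1)

def decode_rough(value, page_size, bits):
    # free space = (value * page size) / 2^(bits per value)
    return value * page_size // 2**bits

def free_space_page_for(page_no, n):
    # direct page addressing: pages 0, n+1, 2(n+1), ... hold the free-space entries
    return page_no // (n + 1) * (n + 1)

print(precise_bits(4096))                   # 12 bits -> fits in 2 bytes
value = encode_rough(1000, 4096, 4)         # 1000 bytes free, 4 bits per entry
print(value, decode_rough(value, 4096, 4))  # 3 768
print(free_space_page_for(6, 3))            # 4
```

Rounding down makes the rough value conservative: a page may hold slightly more free space than the table reports, but never less.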

[Figure: architectural blueprint of the database system (Application; Data System; Access System; Storage System; Buffer; File System; Hardware), repeated as section divider]

Physical Access Paths – Index Structures


Overview Indexes

Table scan: read all pages and evaluate the search criteria for each record; pre-fetching can be used.

Index scan: use an index for search criteria on one or more attributes.
- Fast access to single values or value ranges of the index attributes
- Logical/physical sorting of the values of key attributes (depending on the index structure)
- Enforcing uniqueness

Types of indexes:
- Primary (clustered) index: determines the physical organization; used for the PK
- Secondary (non-clustered) index: redundant access path

[Figure: relation Pers(PID, NAME, AGE, SALARY) with a primary index and secondary indexes on Age and Salary]


Overview Indexes (2)

Choice of access paths:

Index scan: only useful for low selectivity (a low number of result tuples). The break-even point depends on the output ratio of the number of tuples (usually max. 5%). Requires statistics about the data. Additional costs for index storage and updating.

Table scan: adequate/efficient for small tables (e.g., 5 pages) and for queries with high selectivity (large result sets). Sequential read: 100-200 MB/s, compared to ~100 disk seeks/s.

[Figure: access time over hit rate for index scan and table scan]


Classification of Index Structures

Multiway trees: tree structure with multiple children per node. Idea: choose the fan-out so that the node size matches the page size.

[Figure: classification of one-dimensional index structures by key comparison vs. key transformation. Categories: sequential (linked lists, logically sequential; sequential lists, physically sequential), tree-based (binary search trees; multiway trees, e.g., B-Tree), hash-based (static, dynamic), prefix trees (tries)]


B-Tree

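[Figure: B-Tree node with entries and free space]

Entry: (Ki, Di, Pi) with key Ki, data Di, and pointer Pi. Pointers per node: min |P| = k+1, max |P| = 2k+1. The root may have fewer pointers, but at least 2 if it is not a leaf.

Sub-tree order: P0 leads to keys < K1. Pi leads to keys with Ki < key < Ki+1. The last pointer Pp leads to keys > Kp.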
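The node-size limits determine how many keys a tree of a given height can hold. Assume, as usual for B-Trees, that a non-empty root holds at least one key and every other node holds between k and 2k keys (matching min |P| = k+1 and max |P| = 2k+1). Then the number N of keys in a tree of height h is bounded as follows:
- Minimum: the root has one key and all other nodes have k keys.
- Maximum: all nodes have 2k keys.

```latex
\begin{aligned}
N_{\min} &= 1 + 2k \sum_{i=0}^{h-2} (k+1)^i = 2(k+1)^{h-1} - 1 \\
N_{\max} &= 2k \sum_{i=0}^{h-1} (2k+1)^i = (2k+1)^h - 1 \\
\log_{2k+1}(N+1) &\le h \le 1 + \log_{k+1}\frac{N+1}{2}
\end{aligned}
```

For the example tree with k = 2 and h = 3, this gives between 2 · 3^2 - 1 = 17 and 5^3 - 1 = 124 keys.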


B-Tree (2)

Keys: agnostic to the specific key semantics; only a defined total order is required; can be of fixed or variable length.

Operations: search for data for a given key value; insertion and deletion of key-data pairs.

Payload: agnostic to the specific data semantics; can be the record, a reference (TID), or a mix.

[Figure: example B-Tree with k = 2, h = 3]


Search in the B-Tree

Starting at the root node, each node is searched from left to right:
1) If Ki matches the desired key value, the data record has been found (further records with the same key value might be located in the sub-tree to which Pi-1 points).
2) If Ki is larger than the desired value, the search continues at the root of the sub-tree identified by Pi-1.
3) If Ki is smaller than the desired value, the comparison is repeated with Ki+1.
4) If the last key (K2k in a full node) is also smaller than the desired value, the search continues in the sub-tree of the last pointer (P2k in a full node).

If it is impossible to descend further into a sub-tree in case 2) or 4) (leaf node), the search is aborted: no record with the desired key value exists.

[Figure: example searches for 38, 20, and 6]
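The search procedure can be sketched in Python. The node layout is an assumption: lists are 0-based, so `keys[i]` plays the role of K(i+1) and `children[i]` the role of P(i). The example tree is hypothetical; only the search keys 38, 20, and 6 come from the slide.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty for leaves, else len(keys) + 1 entries

def search(node, key):
    """Return True if key is stored in the B-Tree rooted at node."""
    while True:
        i = 0
        while i < len(node.keys) and node.keys[i] < key:  # Ki smaller than key: compare with Ki+1
            i += 1
        if i < len(node.keys) and node.keys[i] == key:     # Ki matches: found
            return True
        if not node.children:                              # leaf: cannot descend, key not present
            return False
        node = node.children[i]                            # Ki larger, or all keys smaller (last pointer)

root = Node([20, 40], [Node([5, 10]), Node([25, 30, 38]), Node([45, 50])])  # hypothetical tree, k = 2
print([search(root, key) for key in (38, 20, 6)])  # [True, True, False]
```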


Insertions in the B-Tree (1)

Insertion rule: insert only into leaf nodes.

At non-leaf nodes: descend the tree as for the search:
- S ≤ Ki → follow Pi-1
- S > Ki → check Ki+1
- S > K2k → follow P2k

At the leaf node: insert the data record according to the sort order. Special case: the leaf node is full (2k records) → split the leaf node.

Splitting: generate a new leaf node and split the 2k+1 entries (in order) into two leaf nodes:
- the first k entries → left node
- the last k entries → right node
- the middle entry (the (k+1)-th) is used as the new "discriminator" (branching key) and inserted into the parent node
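A sketch of the leaf split in Python. It assumes a leaf is a sorted list of keys and that only full leaves (2k entries) are split. `split_leaf` is a hypothetical helper.

```python
def split_leaf(entries, new_key, k):
    """Split a full leaf (2k sorted keys) after inserting new_key.

    Returns (left, discriminator, right); the discriminator goes into the parent node.
    """
    if len(entries) != 2 * k:
        raise ValueError("only a full leaf (2k entries) is split")
    merged = sorted(entries + [new_key])  # the 2k+1 entries in order
    return merged[:k], merged[k], merged[k + 1:]

print(split_leaf([10, 20, 30, 40], 25, k=2))  # ([10, 20], 25, [30, 40])
```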


Insertions in the B-Tree (2)

Node splitting during insertion: two situations are possible after a split:
- The parent node is full → repeat the split on that level.
- Enough space → finished.

Special case, root split: splitting the root node → new root with two successor nodes; the height of the tree grows by 1. The tree grows from the bottom to the top.

Dynamic reorganization (self-balancing): no unloading or reloading necessary; the tree is always balanced. However, after many insertions/deletions, a reorganization can be beneficial.


Insertions in the B-Tree (3)

Insertion example: order k = 1 (n = 2k), keys 1, 5, 2, 6, 7, 4, 8, 3.

[Figure: B-Tree after each insertion; finally h = 3]
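Putting the insertion rules together, here is a compact Python sketch of B-Tree insertion. It covers leaf splits, propagation to the parent, and root splits, and it replays the example above. The following are assumptions:
- The class and method names are hypothetical.
- Plain keys stand in for full (key, data) entries.
- Equal keys descend to the left, as in the rule S ≤ Ki.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)  # empty for leaf nodes

class BTree:
    def __init__(self, k):
        self.k = k  # order: at most 2k keys per node
        self.root = Node()

    def insert(self, key):
        split = self._insert(self.root, key)
        if split:  # root split: new root with two successors, height grows by 1
            middle, right = split
            self.root = Node([middle], [self.root, right])

    def _insert(self, node, key):
        i = 0
        while i < len(node.keys) and key > node.keys[i]:  # S > Ki: check Ki+1
            i += 1
        if node.children:                                  # non-leaf: descend (S <= Ki -> P(i-1))
            split = self._insert(node.children[i], key)
            if split is None:
                return None
            middle, right = split                          # child was split: take the discriminator
            node.keys.insert(i, middle)
            node.children.insert(i + 1, right)
        else:
            node.keys.insert(i, key)                       # leaf: insert in sort order
        if len(node.keys) <= 2 * self.k:                   # enough space: finished
            return None
        k = self.k                                         # 2k+1 keys: split this node
        right = Node(node.keys[k + 1:], node.children[k + 1:])
        middle = node.keys[k]
        node.keys, node.children = node.keys[:k], node.children[:k + 1]
        return middle, right

    def height(self):
        h, node = 1, self.root
        while node.children:
            h, node = h + 1, node.children[0]
        return h

tree = BTree(k=1)
for key in [1, 5, 2, 6, 7, 4, 8, 3]:
    tree.insert(key)
print(tree.height(), tree.root.keys)  # 3 [4]
```

The last key, 3, overflows the leaf [4, 5]. Its split pushes 4 into the root [2, 6], which overflows in turn, so the tree splits from the bottom up and reaches height 3.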


Insertion and Deletion in the B-Tree

Problem: insertion can create an overflow; deletion can create an underflow and an overflow (a merge can overflow and must then be split).

Example: insertion of key 22 → overflow → split. Deletion of key 22 → underflow; all four nodes need to be accessed, and the result is the same as the input.

[Figure: inserting key 22]


Deletion in the B-Tree

Example: order k = 1 (n = 2k), delete key 3.

[Figure: underflow → merge]


Deletion in the B-Tree (2)

Example: order k = 1 (n = 2k), delete key 3.

[Figure: underflow → merge]

Remember: each path from the root to a leaf has the same length h.


Deletion in the B-Tree (3)

Example: order k = 1 (n = 2k), delete key 3.

[Figure: underflow → merge; overflow → split]


Deletion in the B-Tree (4)

Example: order k = 1 (n = 2k), delete key 3.

[Figure: overflow → split; root split]


Deletion Algorithm

Example (there are different algorithms):
1) Search the node that contains the key K to be deleted.
2) If K is in a leaf node: delete K from the leaf node and handle a potentially resulting underflow by merging with a sibling.
3) If K is in an inner node: pull up a new discriminator from one of the successors.
- Analyze which successor node of K (left or right) has more elements. If both have the same number of elements, choose either.
- Replace K with its direct predecessor K' from the left successor node or with its direct successor K'' from the right successor node, respectively.
- Delete K' or K'' from the respective successor node (recursively).

Note: the major variants are merge (this lecture) and re-distribution. In re-distribution, instead of a split/merge in case of overflow/underflow, the entries are re-distributed among one or multiple adjacent nodes.
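The inner-node case can be sketched in Python. The sketch assumes that "more elements" compares the two direct successor nodes of K, with ties going to the left. `delete_from_inner` is a hypothetical helper. The merge that may follow in the affected leaf is not shown.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty for leaves

def delete_from_inner(node, i):
    """Delete node.keys[i] from an inner node; return the leaf that lost a key."""
    left, right = node.children[i], node.children[i + 1]
    if len(left.keys) >= len(right.keys):  # left successor node has at least as many elements
        leaf = left
        while leaf.children:               # direct predecessor K': rightmost key of the left sub-tree
            leaf = leaf.children[-1]
        node.keys[i] = leaf.keys.pop()
    else:
        leaf = right
        while leaf.children:               # direct successor K'': leftmost key of the right sub-tree
            leaf = leaf.children[0]
        node.keys[i] = leaf.keys.pop(0)
    return leaf                            # caller checks this leaf for underflow (merge not shown)

root = Node([20], [Node([5, 10]), Node([30])])
delete_from_inner(root, 0)
print(root.keys, root.children[0].keys)  # [10] [5]
```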


B-Trees, B+-Trees and B*-Trees

B+-Trees and B*-Trees:
- Data is only in leaf nodes.
- Key redundancy, but higher fan-out → lower tree height → less I/O.
- Simpler delete procedure → requires only merging of nodes.
- Doubly linked list of all leaf nodes.

B*-Trees: modified valid node sizes from [k, 2k] to [4k/3, 2k] → better node utilization, but more splits/merges.

[Figure: example secondary index (non-unique), B-Tree with k = 2, h = 3]
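The doubly linked leaf level makes range queries cheap in a B+-Tree. After one descent to the first qualifying leaf, the scan just follows the leaf chain. The Python sketch below assumes the leaf where the search for the lower bound ended is already known. `Leaf`, `link`, and `range_scan` are hypothetical names, and TIDs are plain strings.

```python
class Leaf:
    def __init__(self, keys, tids):
        self.keys, self.tids = keys, tids  # payload only in leaves (here: TIDs of a secondary index)
        self.prev = self.next = None       # doubly linked list over all leaves

def link(leaves):
    for a, b in zip(leaves, leaves[1:]):
        a.next, b.prev = b, a
    return leaves

def range_scan(start, low, high):
    """Collect TIDs with low <= key <= high, beginning at leaf `start`."""
    result, leaf = [], start
    while leaf is not None:
        for key, tid in zip(leaf.keys, leaf.tids):
            if key > high:
                return result
            if key >= low:
                result.append(tid)
        leaf = leaf.next                   # next leaf without going back up the tree
    return result

leaves = link([Leaf([10, 20], ["t1", "t2"]), Leaf([20, 40], ["t3", "t4"]), Leaf([50], ["t5"])])
print(range_scan(leaves[0], 15, 45))  # ['t2', 't3', 't4']
```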


Indexing Low Cardinality Columns

Problem example: a B-tree on the sex of customers for a table with 1,000,000 tuples results in two lists with approximately 500,000 tuples each.

A query for all female customers requires 500,000 random page accesses (secondary index); a table scan would be much faster.

Conclusion: B-trees (and also hashing) are useful for predicates with low selectivity (output/input cardinality ratio). Rule of thumb: the margin hit rate is approx. 5%; higher hit rates do not justify the effort of an index access.

[Figure: B-tree on Sex with keys F and M, each pointing to a long list of TIDs]
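The numbers in the example can be reproduced with a rough page-access count. The Python sketch below rests on these assumptions:
- The secondary index costs one random page access per qualifying tuple (worst case).
- The table scan uses a hypothetical 10 tuples per page.
- Sequential page reads are also much cheaper per page than random ones; this count ignores that difference.

```python
def index_scan_page_accesses(n_tuples, hit_rate):
    # secondary (non-clustered) index, worst case: one random page access per qualifying tuple
    return round(n_tuples * hit_rate)

def table_scan_page_accesses(n_tuples, tuples_per_page):
    # every page is read once, sequentially (ceiling division)
    return -(-n_tuples // tuples_per_page)

def prefer_index(hit_rate, threshold=0.05):
    # rule of thumb from the slide: beyond about 5% hit rate an index access does not pay off
    return hit_rate <= threshold

print(index_scan_page_accesses(1_000_000, 0.5))  # 500000 random page accesses
print(table_scan_page_accesses(1_000_000, 10))   # 100000 sequential page reads
print(prefer_index(0.5))                         # False
```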


Bitmap Index

Idea (long history since the 1960s): create a bitmap (bit list) for each attribute value. Each tuple in the table is assigned to one bit in the bitmap (by position, i.e., sequential TID).

Bit values: 1 = attribute value set, 0 = attribute value not set.

Necessary condition: sequential numbering of the tuples (TIDs).

Name | Sex | Region | Race
Carol | f | n | white
Harold | m | e | black
Anne | f | e | asian
Iris | f | ne | white
… | m | se | hisp
… | f | e | white
… | f | sw | asian
… | f | w | black
… | f | n | asian
… | m | e | hisp
… | m | se | black
… | f | s | white
… | m | nw | black
… | f | s | white
… | f | w | black

Bitmap index on Sex:
F: 1 0 1 1 0 1 1 1 1 0 0 1 0 1 1
M: 0 1 0 0 1 0 0 0 0 1 1 0 1 0 0
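Building one bitmap per attribute value takes a single pass over the tuples in TID order. The Python sketch below uses the Sex column of the sample table above, with list positions standing in for sequential TIDs. `build_bitmaps` is a hypothetical name.

```python
def build_bitmaps(column):
    """One bitmap per distinct value; bit i is 1 iff tuple i (sequential TID) has that value."""
    bitmaps = {}
    for tid, value in enumerate(column):
        bitmaps.setdefault(value, [0] * len(column))[tid] = 1
    return bitmaps

sex = list("fmffmffffmmfmff")  # Sex column of the 15 sample tuples, in TID order
bitmaps = build_bitmaps(sex)
print(bitmaps["f"])            # [1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1]
```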


Querying Bitmap Indexes

Main advantage of bitmap indexes: a simple and efficient logical combination of predicates is possible, and only data that is relevant for the predicates is read. Example: σ Sex='f' ∧ Region='n' (R), bitmaps B1 and B2 in conjunction: for (i = 0; i < B1.length; i++) B[i] = B1[i] & B2[i];

Example I/O cost estimation: σ Sex='f' ∧ Region='n' ∧ Race='Asian' (R) ("Asian women of region North").

Selectivity: 1/2 · 1/8 · 1/4 = 1/64. N = 10,000 tuples with a length of 400 bytes each (~10 tuples per page for 4 kB pages).

Table scan: 1,000 pages.

Bitmap access: 10,000/64 ≈ 156 pages (worst case: each tuple in a different page), plus 1 page for the bitmaps.
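The cost estimate can be recomputed step by step. The Python sketch below assumes three bitmaps with one bit per tuple and 4096-byte pages. The variable names are illustrative.

```python
from fractions import Fraction

PAGE_SIZE = 4096            # 4 kB pages
N, TUPLE_LEN = 10_000, 400  # number of tuples, bytes per tuple

selectivity = Fraction(1, 2) * Fraction(1, 8) * Fraction(1, 4)  # Sex, Region, Race
tuples_per_page = PAGE_SIZE // TUPLE_LEN                        # 10
table_scan_pages = N // tuples_per_page                         # 1000
hits = int(N * selectivity)                                     # 10000/64 = 156.25 -> 156
bitmap_pages = -(-(3 * N // 8) // PAGE_SIZE)                    # 3 bitmaps * 10000 bits = 3750 bytes -> 1 page
bitmap_access_pages = hits + bitmap_pages                       # worst case: every hit on its own page

print(selectivity, table_scan_pages, bitmap_access_pages)       # 1/64 1000 157
```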

F: 1 0 1 1 0 1 1 1 1 0 0 1 0 1 1
N: 0 1 1 0 0 1 0 0 0 1 0 0 0 0 0
A: 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0
F AND N AND A: 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
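The conjunction loop can also run on whole machine words. Packing each bitmap into an integer lets one `&` combine many tuples at once. The Python sketch below uses the three bitmaps F, N, and A shown in the example. `to_int` and `to_bits` are hypothetical helpers, and positions are 0-based sequential TIDs.

```python
def to_int(bits):
    # pack 0/1 values (TID order, first tuple = most significant bit) into one integer
    return int("".join(map(str, bits)), 2)

def to_bits(value, n):
    return [int(b) for b in format(value, f"0{n}b")]

F = [1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1]
N = [0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
A = [0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0]

result = to_bits(to_int(F) & to_int(N) & to_int(A), len(F))
print(result)                                          # [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print([tid for tid, bit in enumerate(result) if bit])  # [2]
```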

Page 3: Foundations of Information Systems 5 DBMS Architecturepages.iai.uni-bonn.de/behrend_andreas/lehre/FIM/WS16/11... · 2017-01-10 · Main-memory access has become a performance bottleneck

| 265

gt

copy A Behrend Foundations of Information Systems |

How is Database System build

SELECT sfirstname slastname COUNT(lname) FROM Student s INNER JOIN Program p ON sprogramId = pid INNER JOIN Attendance a ON astudentId = sstudentId INNER JOIN Lecture l ON alectureId = lid GROUP BY sfirstname slastname WHERE pname=lsquoDSErsquo

byte[] b = read(File f int pos int length)

| 266

gt

copy A Behrend Foundations of Information Systems |

Storage System

Buffer

File System

Hardware

Data System

Application

Architectural Blue Print

TID TID TID

TID TID TID

SELECT sfirstname slastname COUNT(lname) FROM Student s INNER JOIN Program p ON sprogramId = pid INNER JOIN Attendance a ON astudentId = sstudentId INNER JOIN Lecture l ON alectureId = lid GROUP BY sfirstname slastname WHERE pname=lsquoDSErsquo

Run

Buffered Pages - Page replacement strategy - Materialization strategy - Logging Backup Recovery

Paged files

Disks Flash RAID SAN hellip

Storage Structures - Record management - Free space management - Physical access paths

Access System

Data

base

Sys

tem

Data model semantics - System catalog - Record format - Logical access paths

Query processing - Parsing - Plan generation - Plan optimization - Plan execution

1 lsquoSmithrsquo 15061982

Table Person id INT name VARCHAR birthday DATE Index P_id_IX on Personid

Database System

SQL JDBC ODBC hellip

copy A Behrend Foundations of Information Systems | 267

gt

|

Architectural Trends

| 268

gt

copy A Behrend Foundations of Information Systems |

Different Access Characteristics

OLTP (On-line Transaction Processing) Mix between read-only and update queries Minor analysis tasks Used for data preservation and lookup Read typically only a few records at a time High performance by storing contiguous records in disk pages

OLAP (On-line Analytical Processing) Query-intensive DBMS applications Infrequent batch-oriented updates Complex analysis on large data volumes Read typically only a few attributes of large amounts of historical data in order to

partition them and compute aggregates High performance by storing contiguous values of a single attribute

| 269

gt

copy A Behrend Foundations of Information Systems |

Hardware Developments

Hardware improvements not equally distributed Advances in CPU speed have outpaced advances

in RAM latency Main-memory access has become a performance

bottleneck for many computer applications Bandwidth Latency Address translation (TLB)

rarr Memory Wall Cache memories can reduce the memory latency

when the requested data is found in the cache Vertically fragmented data structures optimize

memory cache usage

| 270

gt

copy A Behrend Foundations of Information Systems |

Row Storage vs Column Storage

Row Storage + easy to addmodify a record - might read unnecessary data

Column Storage + only need to read in relevant data - tuple writes require multiple accesses -gt suitable for read-mostly read-intensive large data repositories

| 271

gt

copy A Behrend Foundations of Information Systems |

Processing Models

[Marcin Zukowski Peter A Boncz Niels Nes Saacutendor Heacuteman MonetDBX100 - A DBMS In The CPU Cache IEEE Data Eng Bull 28(2) p17-22 2005]

| 272

gt

copy A Behrend Foundations of Information Systems |

Transaction Management

Principle of a transaction Sequence of successive DB operations that transform a database from a consistent

state into another consistent state surrounded by BOT EOT (Commit Abort)

Properties ACID Atomicity Consistency Isolation Durability A transaction will always come to an end Normal (commit) changes are permanently stored within the DB Abnormal (abort rollback) already composed changes are taken back

Note EOT state must not be different from BOT state

BOT(begin of transaction)

EOT(end of transaction)

possibly inconsistent database

consistentdatabase

consistentdatabase

DB DB

DML operations

| 273

gt

copy A Behrend Foundations of Information Systems |

ACID Properties of Transactions

Atomicity Indivisibility due to the transaction definition (Begin - End) All-or-nothing principle ie the DBS guarantees Either the complete execution of a transaction hellip hellip or the ineffectiveness of the whole transaction (and of all associated operations)

Consistency A successful transaction guarantees that all consistency requirements (integrity

requirements) have been met

Isolation Multiple transactions run isolated from each other and do not use (inconsistent)

intermediate results from other transactions

Durability All results of successful transactions have to be made persistent

| 274

gt

copy A Behrend Foundations of Information Systems |

Motivation

Atomicity Part of the transaction is done but we want to cancel it ABORTROLLBACK System crashes during transaction some changes made it to the disk some did not

Durability Transaction finished user notified COMMIT System crashes before changes sent successfully to disk (asynchronous write)

Consistency Physical consistency Correctness of the storage and access structures Completely executed modification operations preserve the consistency

Logical consistency Correctness of data contents ndash correspond to a (possible) state of the real world Completely executed transactions preserve the logical consistency

- All modifications of finished transactions are included - No modifications of open transactions are included

Remember Logical consistency requires physical consistency in the first place

UNDO Recovery

REDO Recovery

UNDO Recovery for consistency-related rollbacks

| 275

gt

copy A Behrend Foundations of Information Systems |

Reasons for crashes

Transaction error Violation of system restrictions Violation of security regulations Excessive resource requirements deadlocks

Application-related errors eg wrong operations and values ROLLBACK

System error System crash with loss of main-memory contents Database system operating system hardware power failure

Device error (especially storage-medium error) Destruction of secondary storage systems

Catastrophes Destruction of the computing center

| 276

gt

copy A Behrend Foundations of Information Systems |

Guarantee Atomicity amp Durability

Assumptions System may crash but the disk is durable The only atomicity guarantee is that a disk block write is atomic

Materialization strategy Preferred Policy StealNo Force This combination is most complicated but allows for highest performance No Force complicates enforcing Durability What if system crashes before a modified page written by a committed

transaction makes it to disk Write as little as possible in a convenient place at commit time to support

REDOing modifications Steal complicates enforcing Atomicity What if the transaction that performed udpates aborts What if system crashes before transaction is finished Must remember the old value of P (to support UNDOing the write to page P)

copy A Behrend Foundations of Information Systems | 277

gt

|

Storage System

Buffer

File System

Hardware

Data System

Application

TID TID TID

TID TID TID

SELECT sfirstname slastname COUNT(lname) FROM Student s INNER JOIN Program p ON sprogramId = pid INNER JOIN Attendance a ON astudentId = sstudentId INNER JOIN Lecture l ON alectureId = lid GROUP BY sfirstname slastname WHERE pname=lsquoDSErsquo

Run

Buffered Pages - Page replacement strategy - Materialization strategy - Logging Backup Recovery

Paged files

Disks Flash RAID SAN hellip

Storage Structures - Record management - Free space management - Physical access paths

Access System

Data

base

Sys

tem

Data model semantics - System catalog - Record format - Logical access paths

Query processing - Parsing - Plan generation - Plan optimization - Plan execution

1 lsquoSmithrsquo 15061982

Table Person id INT name VARCHAR birthday DATE Index P_id_IX on Personid

Database System

SQL JDBC ODBC hellip

Record Management

| 278

gt

copy A Behrend Foundations of Information Systems |

Record

Record Package of fields that together describe a thing a person a fact etc Each fields represents on property of the entity described by the record Similar to a struct in C Variable length (in contrast to pages)

Record Manager Organizes physical storage of records in pages Operations Get Insert Update Delete Scan Agnostic to record structure and semantic

records considered as byte strings of variable length Structure and content of record is defined be Access System and application

Challenges Record addressing Free space management

| 279

gt

copy A Behrend Foundations of Information Systems |

Record Addressing

Record address Identifier for records used to address records eg in indexes or query processing Assigned during insert of a record

Goals Stability of identifier Fast and direct access Less organizational overhead

Direct addressing Byte address or position number in file or page Instable Byte address If record grows in length following records would get new address Position number Insert and delete operations change series or records

Indirect addressing Surrogate with mapping table (complete indirection) Tuple Identifier (TID concept)

| 280

gt

copy A Behrend Foundations of Information Systems |

Surrogate with Mapping Table

Surrogate Record type + serial number Serial number remains constant during recordrsquos life time

Mapping table Maps

surrogate to page

Problems Where to store mapping table How can it be extended How to search mapping table efficiently

rarr H2 use B-Tree to store mapping table

Mapping Table Surrogate | Page ID

| 281

gt

copy A Behrend Foundations of Information Systems |

TID Concept

Record addressing with indirection inside the page Each page contains an array with record positions TID of a record consist of page id and index in position array

Pros Access with one page access (two pages in case of overflow) Stable No mapping table required

Operations Insert Reuse unused

position or add position Delete Mark position

as unused in array Update Update all

positions in array Update with overflow Store record

as overflow record and store TID of overflow record at original position (No double overflow Update TID at original position)

Record

Overflow Record

| 282

gt

copy A Behrend Foundations of Information Systems |

Free Space Management

Problem In which page is enough space for new record

Solution Free space table lists for all pages how much space is left

Free space value Precise value Ceil(Log2(page size)) =gt 2 bytes for common page size of 4K Rough value use less bytes free space = (value page size)2^(bits per value)

Free space table With direct page addressing Assuming a single page can take n free space entries First page and each (n+1)-th page takes free space entries

With indirect page addressing Free space information stored in page table

copy A Behrend Foundations of Information Systems | 283

gt

|

Storage System

Buffer

File System

Hardware

Data System

Application

TID TID TID

TID TID TID

SELECT sfirstname slastname COUNT(lname) FROM Student s INNER JOIN Program p ON sprogramId = pid INNER JOIN Attendance a ON astudentId = sstudentId INNER JOIN Lecture l ON alectureId = lid GROUP BY sfirstname slastname WHERE pname=lsquoDSErsquo

Run

Buffered Pages - Page replacement strategy - Materialization strategy - Logging Backup Recovery

Paged files

Disks Flash RAID SAN hellip

Storage Structures - Record management - Free space management - Physical access paths

Access System

Data

base

Sys

tem

Data model semantics - System catalog - Record format - Logical access paths

Query processing - Parsing - Plan generation - Plan optimization - Plan execution

1 lsquoSmithrsquo 15061982

Table Person id INT name VARCHAR birthday DATE Index P_id_IX on Personid

Database System

SQL JDBC ODBC hellip

Physical Access Paths ndash Index Structures

| 284

gt

copy A Behrend Foundations of Information Systems |

Primary Index Secondary Index

Overview Indexes

Table scan Read all pages and for each record

evaluate the search criteria Pre-fetching

Index Scan Use index for search criteria

on one or more attributes Fast access to single values or value ranges of index attributes Logicalphysical sorting of values of key attributes (depending on index structure) Enforcing uniqueness

Types if indexes Primary (Clustered) Index

determines physical organization use for PK Secondary (Non-Clustered)

Index redundant access path

Pers(PID NAME AGE SALARY)

Age

Salary

| 285

gt

copy A Behrend Foundations of Information Systems |

Overview Indexes (2)

Choice of Access Paths Index scan Only useful for low selectivity

(low number of result tuples) Break even-point according to the

output ratio of the number of tuples (usually max 5) Requires statistics about data Additional costs for index storage

and updating

Table Scan adequateefficient for small tables

(eg 5 pages) Queries with high selectivity

(large result sets) 100-200MBs sequential read

~ 100 disk seekss

hit rate

Index Scan

Table Scan

access time

| 286

gt

copy A Behrend Foundations of Information Systems |

Classification of Index Structures

Classification

Multiway Trees Tree structure with multiple children per node Idea chose fan out so that node size suits page size

Onedimensional Index Structures

Key Comparison Key Transformation

Sequential Tree-Based Hash-Based

Prefix Trees (Tries)

Binary Search Trees Dynamic Static Linked Lists

(log seq) Seq Lists

(phys seq) Multiway

Trees Example B-Tree

| 287

gt

copy A Behrend Foundations of Information Systems |

B-Tree

free space

(Ki Di Pi) = entry min |P| = k+1 max |P| = 2k+1

keys lt K1 keys gt Kp Ki lt keys lt Ki+1

| 288

gt

copy A Behrend Foundations of Information Systems |

B-Tree (2)

Example Keys Agnostic to specific key semantic Only defined complete order required Could be of fixed or variable length

Operations Search for data for given key value Insertion and deletion of key-data pair

Payload Agnostic to specific data semantic Can be record or reference (TID) or

mix

B-Tree with k = 2 h = 3

| 289

gt

copy A Behrend Foundations of Information Systems |

Search in the B-Tree

Starting at the root node each node is searched from left to right 1) if Ki matches the desired key value the data record has been found (further

records with the same key value might be located in a sub-tree to which Pi-1 points) 2) if Ki is smaller than the desired value the search will be continued in the root of

the sub-tree identified by Pi-1 3) if Ki is larger than the desired value the comparison with Ki+1 is repeated 4) if K2k is also smaller than the desired value the search will be continued in the

sub-tree of P2k If itlsquos impossible to descend further into a sub-tree (2 or 4) (leaf node) The search is aborted no record with the desired key value is found

Search for 38 20 6

| 290

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (1)

Insertion Rule insert only into leaf nodes At Non-Leaf Nodes descend down the tree as for the search S le Ki follow Pi-1 S gt Ki check Ki+1 S gt K2k follow P2k

At Leaf Node Insert the data record according to the sorting order Special case leaf node is full (2k records) rarr split the leaf node

Splitting Generate a new leaf node Split the 2k+1 entries (in order)

into two leaf nodes first k entries rarr left node last k entries rarr right node

middle entry (k+1-th) is used as new ldquodiscriminatorrdquo (branching) and inserted into the parent node

| 291

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (2)

Node Splitting during Insertion Two possible situations after a split The parent node is full rarr repeat split on this level Enough space rarr FINISHED

Special case root split Split of the root node rarr New root with two successor nodes Height of a tree grows by 1 The tree has been split from the bottom to the top

Dynamic reorganization (self-balancing) No unloading or loading necessary Tree is always balanced But In case of many insertions deletions reorganization can be beneficial

| 292

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (3)

Insertion Example Order k = 1 n=2k Keys 1 5 2 6 7 4 8 3

Finally h=3

| 293

gt

copy A Behrend Foundations of Information Systems |

Insertion and Deletion in the B-Tree

Problem Insertion can create overflow Deletion can create underflow and overflow

Example Insertion of key 22

rarr Overflow rarr Split

Deletion of key 22 rarr Underflow need to access all four nodes finally same as input

Insert 22

| 294

gt

copy A Behrend Foundations of Information Systems |

Underflow Merge

Deletion in the B-Tree

Example Order k = 1 n=2k Delete key 3

| 295

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (2)

Example Order k = 1 n=2k Delete key 3

Underflow Merge

Remember Each path from the root to the leaf has the same length h

| 296

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (3)

Example Order k = 1 n=2k Delete key 3

Underflow Merge

Overflow Split

| 297

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (4)

Example Order k = 1 n=2k Delete key 3

Overflow Split

Root Split

| 298

gt

copy A Behrend Foundations of Information Systems |

Deletion Algorithm

Example ndash there are different algorithms Search the node that contains the key K to be deleted If key K is in a leaf node delete the key in the leaf node and handle potentially

resulting underflow by merging with sibling If key K is in an inner node pull up new discriminator from one of the successors Analyze which successor node of K has more elements left or right one

If both have the same number of elements decide for one Replace the key K to be deleted with the direct successor Krsquo from the left

successor node or with the direct successor Krsquorsquo from the right successor node respectively Delete Krsquo or Krsquorsquo from the respective successor node (recursively)

Note Major variants Merge (tis lecture) Re-distribution (instead of splitmerge in case of overflowunderflow the entries are

re-distributed under consideration of one or multiple adjacent nodes)

| 299

gt

copy A Behrend Foundations of Information Systems |

B-Trees, B+-Trees and B*-Trees

B+-Trees and B*-Trees:
- Data is stored only in leaf nodes
- Key redundancy, but higher fan-out → lower tree height, less I/O
- Simpler delete procedure → requires only merging of nodes
- Doubly linked list of all leaf nodes

B*-Trees:
- Modified valid node sizes: from [k, 2k] to [4/3·k, 2k] → better node utilization, but more splits/merges

[Figure: example secondary index (non-unique); B-tree with k = 2, h = 3]
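A B+-tree keeps all entries in the leaves and chains the leaves together. A range query therefore needs only one root-to-leaf descent and then walks sideways. The sketch below uses hypothetical `Leaf`, `link` and `range_scan` names. It omits the descent and simply starts at the leftmost leaf, and it assumes a non-unique index, as in the slide's secondary-index example.

```python
class Leaf:
    """B+-tree leaf: sorted (key, tid) entries, doubly linked to its neighbours."""

    def __init__(self, entries):
        self.entries = list(entries)
        self.prev = None          # kept for descending scans (not used below)
        self.next = None


def link(leaves):
    """Chain the leaves from left to right and return the leftmost one."""
    for left, right in zip(leaves, leaves[1:]):
        left.next, right.prev = right, left
    return leaves[0]


def range_scan(leaf, lo, hi):
    """Yield every (key, tid) with lo <= key <= hi, walking the leaf chain from `leaf`."""
    while leaf is not None:
        for key, tid in leaf.entries:
            if key > hi:
                return                  # entries are sorted: nothing further can qualify
            if key >= lo:
                yield key, tid
        leaf = leaf.next


# Hypothetical non-unique secondary index: key 3 occurs twice and spans two leaves.
head = link([Leaf([(1, "t1"), (3, "t2")]),
             Leaf([(3, "t3"), (5, "t4")]),
             Leaf([(7, "t5"), (9, "t6")])])
print(list(range_scan(head, 3, 7)))   # [(3, 't2'), (3, 't3'), (5, 't4'), (7, 't5')]
```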

Indexing Low Cardinality Columns

Problem:
- Example: a B-tree on the sex of the customers of a table with 1,000,000 tuples results in two lists of approximately 500,000 TIDs each.
- A query for all female customers requires 500,000 random page accesses (secondary index); a table scan would be much faster.

Conclusion:
- B-trees (and also hashing) are useful for predicates with low selectivity, i.e., a small output/input cardinality ratio.
- Rule of thumb: the break-even hit rate is approx. 5%; higher hit rates do not justify the effort of an index access.

[Figure: B-tree on Sex with the keys F and M, each pointing to a long list of TIDs]

Bitmap Index

Idea (long history, since the 1960s):
- Create a bitmap (bit list) for each attribute value.
- Each tuple in the table is assigned to one bit in the bitmap (by position, i.e., sequential TID).
- Bit values: 1 = the tuple has this attribute value, 0 = the tuple does not have it.
- Necessary condition: sequential numbering of the tuples (TIDs).

Example: bitmap index on Sex

Name    Sex  Region  Race
Carol   f    n       white
Harold  m    e       black
Anne    f    e       asian
Iris    f    ne      white
…       m    se      hisp
…       f    e       white
…       f    sw      asian
…       f    w       black
…       f    n       asian
…       m    e       hisp
…       m    se      black
…       f    s       white
…       m    nw      black
…       f    s       white
…       f    w       black

F: 1 0 1 1 0 1 1 1 1 0 0 1 0 1 1
M: 0 1 0 0 1 0 0 0 0 1 1 0 1 0 0
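A bitmap index can be sketched with one Python integer per attribute value, used as a bit set. Bit i stands for the tuple with sequential TID i (counting from 0). The function names are hypothetical, and the input is the Sex column of the 15 example tuples above.

```python
def build_bitmap_index(column):
    """Map each distinct value to a bitmap; bit i is set iff the tuple with TID i has that value."""
    index = {}
    for tid, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << tid)
    return index


def to_bits(bitmap, n):
    """Render a bitmap in TID order (TID 0 first), as drawn on the slide."""
    return " ".join(str((bitmap >> tid) & 1) for tid in range(n))


# Sex column of the 15 example tuples, in TID order.
sex = "f m f f m f f f f m m f m f f".split()
index = build_bitmap_index(sex)
print("F:", to_bits(index["f"], len(sex)))   # F: 1 0 1 1 0 1 1 1 1 0 0 1 0 1 1
print("M:", to_bits(index["m"], len(sex)))   # M: 0 1 0 0 1 0 0 0 0 1 1 0 1 0 0
```

Every tuple sets exactly one bit across all bitmaps of the attribute, which is why bitmap indexes suit low-cardinality columns.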

Querying Bitmap Indexes

Main advantage of bitmap indexes: simple and efficient logical combination of predicates; only the data relevant to the predicates is read.

Example: σ Sex='f' ∧ Region='n' (R)
Combine bitmaps B1 and B2 by conjunction:
for (i = 0; i < B1.length; i++) B[i] = B1[i] & B2[i];

Example I/O cost estimation: σ Sex='f' ∧ Region='n' ∧ Race='asian' (R) ("Asian women of region North")
- Selectivity: 1/2 · 1/8 · 1/4 = 1/64
- N = 10,000 tuples of 400 bytes each (~10 tuples per page for 4 kB pages)
- Table scan: 1,000 pages
- Bitmap access: 10,000/64 ≈ 156 pages (worst case: each tuple on a different page), plus 1 page for the bitmaps

[Figure: F AND N AND A = result]
F:      1 0 1 1 0 1 1 1 1 0 0 1 0 1 1
N:      0 1 1 0 0 1 0 0 0 1 0 0 0 0 0
A:      0 0 1 0 0 0 1 0 1 0 0 0 0 0 0
Result: 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
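The following Python sketch (hypothetical function names) evaluates the three-predicate query with the conjunction loop from the slide and reproduces the cost estimate. The bitmaps are computed from the 15 example tuples of the bitmap-index slide. In that table, the only tuple with Sex = 'f', Region = 'n' and Race = 'asian' is TID 8 (the ninth row), whereas the N bitmap drawn above marks the tuples whose region is 'e'. The cost function ignores the bitmap pages themselves (1 page in the estimate above).

```python
import math


def bitmap(column, value):
    """Bit list for `value`: 1 at position i iff the tuple with TID i has that value."""
    return [1 if v == value else 0 for v in column]


def bitmap_and(*bitmaps):
    """Conjunction of equally long bit lists: B[i] = B1[i] & B2[i] & ..."""
    n = len(bitmaps[0])
    if any(len(b) != n for b in bitmaps):
        raise ValueError("bitmaps must have the same length")
    result = [1] * n
    for b in bitmaps:
        for i in range(n):
            result[i] &= b[i]
    return result


def io_cost(n_tuples, tuple_size, page_size, selectivity):
    """Pages read by a table scan vs. worst-case pages fetched after the bitmap lookup."""
    tuples_per_page = page_size // tuple_size
    table_scan = math.ceil(n_tuples / tuples_per_page)
    bitmap_worst = n_tuples * selectivity            # every hit on a different page
    return table_scan, bitmap_worst


# Columns of the 15 example tuples, in TID order (TID 0 first).
sex = "f m f f m f f f f m m f m f f".split()
region = "n e e ne se e sw w n e se s nw s w".split()
race = ("white black asian white hisp white asian black asian hisp "
        "black white black white black").split()

hits = bitmap_and(bitmap(sex, "f"), bitmap(region, "n"), bitmap(race, "asian"))
print([tid for tid, bit in enumerate(hits) if bit])              # [8]
print(io_cost(10_000, 400, 4096, (1 / 2) * (1 / 8) * (1 / 4)))   # (1000, 156.25)
```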

Architectural Trends

Different Access Characteristics

OLTP (On-line Transaction Processing):
- Mix of read-only and update queries; only minor analysis tasks
- Used for data preservation and lookup
- Typically reads only a few records at a time
- High performance by storing contiguous records in disk pages

OLAP (On-line Analytical Processing):
- Query-intensive DBMS applications; infrequent, batch-oriented updates
- Complex analysis on large data volumes
- Typically reads only a few attributes of large amounts of historical data in order to partition them and compute aggregates
- High performance by storing contiguous values of a single attribute

Hardware Developments

Hardware improvements are not equally distributed:
- Advances in CPU speed have outpaced advances in RAM latency.
- Main-memory access has become a performance bottleneck for many computer applications (bandwidth, latency, address translation (TLB)) → memory wall.
- Cache memories can reduce the memory latency when the requested data is found in the cache.
- Vertically fragmented data structures optimize memory cache usage.

Row Storage vs Column Storage

Row storage:
+ easy to add/modify a record
- might read unnecessary data

Column storage:
+ only needs to read the relevant data
- tuple writes require multiple accesses
→ suitable for read-mostly, read-intensive, large data repositories
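To make the trade-off concrete, here is a toy Python model of one small Person table stored row-wise and column-wise. All names and data are hypothetical. The model counts attribute values touched as a stand-in for I/O or cache traffic; real systems work with pages and cache lines, so the numbers are illustrative only.

```python
# Row storage: one entry per tuple (id, name, birth year).
rows = [(1, "Smith", 1982), (2, "Jones", 1990), (3, "Lee", 1975)]

# Column storage: one contiguous array per attribute (dicts keep insertion order).
columns = {
    "id": [r[0] for r in rows],
    "name": [r[1] for r in rows],
    "year": [r[2] for r in rows],
}


def avg_year_rows(rows):
    """Every complete tuple is touched although only one attribute is needed."""
    touched = sum(len(r) for r in rows)
    return sum(r[2] for r in rows) / len(rows), touched


def avg_year_columns(columns):
    """Only the 'year' array is touched."""
    year = columns["year"]
    return sum(year) / len(year), len(year)


def insert_row(rows, tup):
    rows.append(tup)
    return 1                                   # one write for the whole tuple


def insert_columns(columns, tup):
    for name, value in zip(columns, tup):      # one write per attribute array
        columns[name].append(value)
    return len(tup)


print(avg_year_rows(rows)[1], avg_year_columns(columns)[1])   # 9 3
```

The aggregate touches 9 values in the row layout but only 3 in the column layout, while an insert costs 1 write in the row layout and 3 in the column layout.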

Processing Models

[Marcin Zukowski, Peter A. Boncz, Niels Nes, Sándor Héman: MonetDB/X100 - A DBMS In The CPU Cache. IEEE Data Eng. Bull. 28(2), pp. 17-22, 2005]

Transaction Management

Principle of a transaction: a sequence of successive DB operations that transforms a database from one consistent state into another consistent state, delimited by BOT and EOT (commit or abort).

Properties: ACID (Atomicity, Consistency, Isolation, Durability)

A transaction always comes to an end:
- Normal end (commit): the changes are permanently stored within the DB.
- Abnormal end (abort, rollback): the changes already made are undone.

Note: after an abort, the state at EOT must not differ from the state at BOT.

[Figure: consistent DB at BOT (begin of transaction) → DML operations (possibly inconsistent database in between) → consistent DB at EOT (end of transaction)]

ACID Properties of Transactions

Atomicity: indivisibility by the transaction definition (BOT ... EOT). All-or-nothing principle, i.e., the DBS guarantees either the complete execution of a transaction or the ineffectiveness of the whole transaction (and of all associated operations).

Consistency: a successful transaction guarantees that all consistency requirements (integrity requirements) have been met.

Isolation: multiple transactions run isolated from each other and do not use (inconsistent) intermediate results of other transactions.

Durability: all results of successful transactions have to be made persistent.
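Atomicity and consistency can be observed with Python's built-in `sqlite3` module. Its connection object, used as a context manager, commits the transaction if the block succeeds and rolls it back if an exception escapes. The `account` table and the `transfer` and `balances` functions are hypothetical. The CHECK constraint plays the role of an integrity requirement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY,"
             " balance INTEGER NOT NULL CHECK (balance >= 0))")   # integrity requirement
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; True on commit, False on rollback."""
    try:
        with conn:  # commit if the block succeeds, roll back if it raises
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:   # CHECK violated: the already executed credit is undone too
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account"))


print(transfer(conn, 1, 2, 30), balances(conn))    # True {1: 70, 2: 80}
print(transfer(conn, 2, 1, 500), balances(conn))   # False {1: 70, 2: 80}
```

In the second call, the credit to account 1 has already been executed when the debit violates the constraint. Because the whole transaction is rolled back, neither change survives.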

Motivation

Atomicity → UNDO recovery:
- Part of the transaction is done, but we want to cancel it (ABORT/ROLLBACK).
- The system crashes during the transaction: some changes made it to the disk, some did not.

Durability → REDO recovery:
- The transaction finished and the user was notified (COMMIT).
- The system crashes before the changes were successfully written to disk (asynchronous write).

Consistency → UNDO recovery for consistency-related rollbacks:
- Physical consistency: correctness of the storage and access structures; completely executed modification operations preserve it.
- Logical consistency: correctness of the data contents, which must correspond to a (possible) state of the real world; completely executed transactions preserve it:
  - all modifications of finished transactions are included,
  - no modifications of open transactions are included.

Remember: logical consistency requires physical consistency in the first place.

Reasons for crashes

Transaction error:
- Violation of system restrictions: violation of security regulations, excessive resource requirements, deadlocks
- Application-related errors, e.g., wrong operations and values (ROLLBACK)

System error:
- System crash with loss of main-memory contents (database system, operating system, hardware, power failure)

Device error (especially storage-medium error):
- Destruction of secondary storage systems

Catastrophes:
- Destruction of the computing center

Guarantee Atomicity & Durability

Assumptions:
- The system may crash, but the disk is durable.
- The only atomicity guarantee is that a disk block write is atomic.

Materialization strategy, preferred policy: Steal/No-Force
- This combination is the most complicated, but allows for the highest performance.
- No-Force complicates enforcing Durability: what if the system crashes before a page modified by a committed transaction makes it to disk? At commit time, write as little as possible, in a convenient place, to support REDOing the modifications.
- Steal (a page P modified by an uncommitted transaction may be written to disk) complicates enforcing Atomicity: what if the transaction that performed the updates aborts? What if the system crashes before the transaction is finished? The old value of P must be remembered (to support UNDOing the write to page P).
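The interplay of Steal/No-Force with UNDO and REDO can be sketched as a tiny restart-recovery routine in Python. This is a simplified, ARIES-like illustration rather than the lecture's algorithm. It rests on these assumptions:
- Log records hold before and after images of whole pages.
- The log is written ahead of the data pages (write-ahead logging) and forced at commit.
- Strict locking prevents a committed transaction from overwriting a page changed by an uncommitted one.

The record format and the `recover` function are hypothetical.

```python
def recover(log, disk):
    """Restart recovery after a crash under Steal/No-Force.

    log  -- records in write-ahead order: ("update", tx, page, before, after) or ("commit", tx)
    disk -- dict page -> value as found on disk after the crash (modified in place)
    """
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # REDO (needed because of No-Force): repeat history, so committed updates
    # that never reached the disk are reapplied.
    for rec in log:
        if rec[0] == "update":
            _, _, page, _, after = rec
            disk[page] = after

    # UNDO (needed because of Steal): newest first, restore the before image of
    # every update whose transaction has no commit record.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] not in committed:
            _, _, page, before, _ = rec
            disk[page] = before
    return disk


log = [
    ("update", "T1", "A", 10, 20),
    ("update", "T2", "B", 5, 7),
    ("commit", "T1"),
]                          # crash here: T2 never committed
disk = {"A": 10, "B": 7}   # No-Force: T1's A = 20 is missing; Steal: T2's B = 7 is already on disk
print(recover(log, disk))  # {'A': 20, 'B': 5}
```

Because both passes write absolute images, running the recovery twice gives the same result. This matters if the system crashes again during recovery.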

Record Management

Record

Record:
- A package of fields that together describe a thing, a person, a fact, etc.
- Each field represents one property of the entity described by the record.
- Similar to a struct in C.
- Variable length (in contrast to pages).

Record manager:
- Organizes the physical storage of records in pages.
- Operations: Get, Insert, Update, Delete, Scan.
- Agnostic to record structure and semantics: records are treated as byte strings of variable length; the structure and content of a record are defined by the access system and the application.

Challenges: record addressing, free space management

Record Addressing

Record address: identifier used to address records, e.g., in indexes or during query processing; assigned when a record is inserted.

Goals: stability of the identifier, fast and direct access, little organizational overhead.

Direct addressing (byte address or position number in a file or page) is unstable:
- Byte address: if a record grows in length, the following records get new addresses.
- Position number: insert and delete operations change the sequence of records.

Indirect addressing:
- Surrogate with mapping table (complete indirection)
- Tuple identifier (TID concept)

Surrogate with Mapping Table

Surrogate: record type + serial number; the serial number remains constant during the record's lifetime.

Mapping table: maps each surrogate to a page.

Problems: Where should the mapping table be stored? How can it be extended? How can it be searched efficiently?
→ H2 uses a B-tree to store the mapping table.

[Figure: mapping table with the columns Surrogate | Page ID]

TID Concept

Record addressing with indirection inside the page:
- Each page contains an array of record positions.
- The TID of a record consists of the page ID and the index in the position array.

Pros: access with one page access (two pages in case of overflow); stable; no mapping table required.

Operations:
- Insert: reuse an unused position or add a new position.
- Delete: mark the position as unused in the array.
- Update: update all affected positions in the array (records behind the changed record may move within the page).
- Update with overflow: store the record as an overflow record and store the TID of the overflow record at the original position. No double overflow: if the overflow record has to move again, only the TID at the original position is updated.

[Figure: page with a record and an overflow record]
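The TID operations can be sketched with a toy slotted-page model in Python. Everything here is an assumption for illustration:
- Page capacity is counted in record bytes only.
- A forwarding TID occupies a hypothetical `FWD_SIZE` of 8 bytes, and records are at least that large.
- Pages are chosen first-fit instead of via a free space table.

`Page` and `RecordManager` are made-up names.

```python
FWD_SIZE = 8  # hypothetical size (bytes) of a forwarding TID stored in a slot


class Page:
    """Slotted page. `slots` is the position array; an entry is a record (bytes),
    a forwarding TID (tuple) or None (unused position)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = []

    @staticmethod
    def size(entry):
        if entry is None:
            return 0
        return FWD_SIZE if isinstance(entry, tuple) else len(entry)

    def free(self):
        return self.capacity - sum(self.size(e) for e in self.slots)

    def put(self, entry):
        """Reuse an unused position or append a new one; return the slot number."""
        for slot, e in enumerate(self.slots):
            if e is None:
                self.slots[slot] = entry
                return slot
        self.slots.append(entry)
        return len(self.slots) - 1


class RecordManager:
    def __init__(self, n_pages, capacity):
        self.pages = [Page(capacity) for _ in range(n_pages)]

    def _place(self, entry, exclude=None):
        for pid, page in enumerate(self.pages):               # first fit
            if pid != exclude and page.free() >= Page.size(entry):
                return (pid, page.put(entry))
        raise MemoryError("no page with enough free space")

    def _drop_overflow(self, entry):
        if isinstance(entry, tuple):                           # entry is a forwarding TID
            opid, oslot = entry
            self.pages[opid].slots[oslot] = None

    def insert(self, record):
        return self._place(record)                             # TID = (page id, slot)

    def get(self, tid):
        pid, slot = tid
        entry = self.pages[pid].slots[slot]
        if isinstance(entry, tuple):                           # overflow: one more page access
            opid, oslot = entry
            entry = self.pages[opid].slots[oslot]
        return entry

    def delete(self, tid):
        pid, slot = tid
        self._drop_overflow(self.pages[pid].slots[slot])
        self.pages[pid].slots[slot] = None                     # mark position as unused

    def update(self, tid, record):
        pid, slot = tid
        home = self.pages[pid]
        old = home.slots[slot]
        if home.free() + Page.size(old) >= len(record):
            self._drop_overflow(old)
            home.slots[slot] = record                          # fits: update in place
        else:
            new_tid = self._place(record, exclude=pid)         # may raise before anything changes
            self._drop_overflow(old)                           # no double overflow
            home.slots[slot] = new_tid                         # original position forwards


rm = RecordManager(n_pages=3, capacity=32)
t1 = rm.insert(b"a" * 10)
t2 = rm.insert(b"b" * 10)
rm.update(t1, b"A" * 25)      # no longer fits into page 0 -> overflow record on page 1
print(t1, rm.pages[0].slots[0], rm.get(t1) == b"A" * 25)   # (0, 0) (1, 0) True
```

The TID handed out by `insert` never changes. Callers always use it, and at most one forwarding hop is needed.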

Free Space Management

Problem: which page has enough space for a new record?

Solution: a free space table lists, for every page, how much space is left.

Free space value:
- Precise value: ceil(log2(page size)) bits, i.e., 2 bytes for the common page size of 4 KB
- Rough value: use fewer bits; free space = (value · page size) / 2^(bits per value)

Free space table:
- With direct page addressing: assuming a single page can hold n free space entries, the first page and every (n+1)-th page hold free space entries.
- With indirect page addressing: the free space information is stored in the page table.
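The free space encodings, and the placement of free space entries under direct page addressing, can be sketched as follows. Function names are hypothetical. The rough encoding rounds down, so the table never promises more space than a page actually has; this is an assumption, since the slide only gives the decoding formula.

```python
import math

PAGE_SIZE = 4096


def precise_bytes(page_size=PAGE_SIZE):
    """Bytes per exact entry. ceil(log2(4096)) = 12 bits covers 0..4095; a completely
    empty page (4096 free bytes) needs one more bit. Either way: 2 bytes."""
    bits = page_size.bit_length()           # bits for the values 0..page_size
    return math.ceil(bits / 8)


def encode_free(free_bytes, bits, page_size=PAGE_SIZE):
    """Rough free space code with `bits` bits, rounded down (never over-reports)."""
    levels = 2 ** bits
    return min(free_bytes * levels // page_size, levels - 1)


def decode_free(code, bits, page_size=PAGE_SIZE):
    """free space = (value * page size) / 2^(bits per value)."""
    return code * page_size // 2 ** bits


def fsi_location(page_no, n):
    """Direct addressing: page 0 and every (n+1)-th page hold the n free space entries
    of the n pages that follow them. Return (free space page, entry index)."""
    fsi_page = page_no // (n + 1) * (n + 1)
    if page_no == fsi_page:
        raise ValueError(f"page {page_no} is itself a free space page")
    return fsi_page, page_no - fsi_page - 1


print(precise_bytes(), encode_free(1000, bits=4), decode_free(3, bits=4))   # 2 3 768
print(fsi_location(7, n=4))                                                # (5, 1)
```

With 4 bits per page, a page with 1,000 free bytes is recorded as "at least 768 bytes free".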

Physical Access Paths – Index Structures

Overview Indexes

Table scan:
- Read all pages and evaluate the search criteria for each record
- Pre-fetching

Index scan:
- Use an index for search criteria on one or more attributes
- Fast access to single values or value ranges of the index attributes
- Logical/physical sorting of the values of the key attributes (depending on the index structure)
- Enforcing uniqueness

Types of indexes:
- Primary (clustered) index: determines the physical organization; used for the PK
- Secondary (non-clustered) index: redundant access path

[Figure: table Pers(PID, NAME, AGE, SALARY) with a primary index and secondary indexes; labels: Age, Salary]

Overview Indexes (2)

Choice of access paths

Index scan:
- Only useful for low selectivity (low number of result tuples)
- Break-even point according to the output ratio of the number of tuples (usually max. 5%)
- Requires statistics about the data
- Additional costs for index storage and updating

Table scan:
- Adequate/efficient for small tables (e.g., 5 pages)
- Queries with high selectivity (large result sets)
- Sequential read at 100-200 MB/s vs. ~100 disk seeks/s

[Figure: access time over hit rate for index scan and table scan]

Classification of Index Structures

[Figure: classification of one-dimensional index structures; labels: key comparison, key transformation; sequential, tree-based, hash-based; linked lists (log. seq.), sequential lists (phys. seq.); binary search trees, multiway trees (example: B-tree); prefix trees (tries); static, dynamic]

Multiway trees: tree structure with multiple children per node. Idea: choose the fan-out so that the node size suits the page size.

B-Tree

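[Figure: B-tree node stored in a page: P0, (K1, D1, P1), (K2, D2, P2), …, followed by free space]

- Entry: (Ki, Di, Pi) = key, data, pointer
- Number of pointers per (non-root) node: min |P| = k+1, max |P| = 2k+1
- Sub-tree under P0: keys < K1; sub-tree under Pi: Ki < keys < Ki+1; sub-tree under the last pointer Pp: keys > Kp (p = number of entries in the node)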
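A small calculation shows how the fan-out follows from the page size and what that means for the height. The entry sizes (8-byte key, 8-byte TID as payload, 4-byte page pointer, no page header) are hypothetical. The height bounds use the usual counting argument:
- A tree of height h holds at most (2k+1)^h - 1 keys.
- It holds at least 2(k+1)^(h-1) - 1 keys, because the root needs only one key while every other node needs k.

```python
def max_order(page_size, key_size, data_size, ptr_size, header_size=0):
    """Largest k such that a full node, P0 followed by 2k entries (Ki, Di, Pi), fits into a page."""
    entry = key_size + data_size + ptr_size
    max_entries = (page_size - header_size - ptr_size) // entry
    return max_entries // 2


def height_range(n, k):
    """Smallest and largest possible height of a B-tree of order k holding n >= 1 keys."""
    h_min = 1
    while (2 * k + 1) ** h_min - 1 < n:        # height h holds at most (2k+1)^h - 1 keys
        h_min += 1
    h_max = 1
    while 2 * (k + 1) ** h_max - 1 <= n:       # height h+1 needs at least 2(k+1)^h - 1 keys
        h_max += 1
    return h_min, h_max


k = max_order(page_size=4096, key_size=8, data_size=8, ptr_size=4)
print(k, height_range(1_000_000, k))   # 102 (3, 3)
print(height_range(8, 1))              # (2, 3)
```

Under these assumptions, a million keys always fit into a tree of height 3. The lecture's 8-key example with k = 1 reaches the maximum height 3.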

B-Tree (2)

Keys:
- Agnostic to the specific key semantics; only a defined total order is required
- Can be of fixed or variable length

Operations:
- Search for data for a given key value
- Insertion and deletion of key-data pairs

Payload:
- Agnostic to the specific data semantics
- Can be the record itself, a reference (TID), or a mix

[Figure: example B-tree with k = 2, h = 3]

Search in the B-Tree

Starting at the root node, each node is searched from left to right:
1) If Ki matches the desired key value, the data record has been found (further records with the same key value might be located in the sub-tree to which Pi-1 points).
2) If Ki is larger than the desired value, the search continues in the root of the sub-tree identified by Pi-1.
3) If Ki is smaller than the desired value, the comparison is repeated with Ki+1.
4) If the last key of the node (K2k in a full node) is also smaller than the desired value, the search continues in the sub-tree of the last pointer (P2k in a full node).

If it is impossible to descend further into a sub-tree in case 2 or 4 (leaf node), the search is aborted: no record with the desired key value exists.

[Figure: example searches for 38, 20 and 6]
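The four search rules translate into a short loop. The sketch below uses a hypothetical `Node` class and a made-up tree of order k = 2 and height 3 (not the tree from the slide's figure). It returns whether the key was found, together with the keys of every node visited.

```python
class Node:
    """B-tree node: sorted keys; `children` is empty for a leaf node."""

    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children) if children else []


def search(node, key):
    """Search as described above; return (found, keys of every node visited)."""
    path = []
    while True:
        path.append(node.keys)
        i = 0
        while i < len(node.keys) and node.keys[i] < key:   # rule 3: Ki smaller -> compare with Ki+1
            i += 1
        if i < len(node.keys) and node.keys[i] == key:     # rule 1: match
            return True, path
        if not node.children:                              # leaf: cannot descend -> not found
            return False, path
        node = node.children[i]   # rule 2 (Ki larger -> P(i-1)) or rule 4 (all smaller -> last pointer)


root = Node([25, 50], [
    Node([10, 20], [Node([2, 6]), Node([12, 15]), Node([22, 24])]),
    Node([30, 40], [Node([26, 28]), Node([32, 38]), Node([42, 45])]),
    Node([60, 70], [Node([55, 58]), Node([62, 65]), Node([75, 80])]),
])
for key in (38, 20, 6, 7):
    print(key, search(root, key))
# 38 (True, [[25, 50], [30, 40], [32, 38]])
# 20 (True, [[25, 50], [10, 20]])
# 6 (True, [[25, 50], [10, 20], [2, 6]])
# 7 (False, [[25, 50], [10, 20], [2, 6]])
```

In Python, `children[i]` is the pointer to the left of `keys[i]`, i.e. P(i-1) in the slide's 1-based numbering. `children[len(keys)]` is the last pointer.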

Insertions in the B-Tree (1)

Insertion rule: insert only into leaf nodes.

At non-leaf nodes, descend the tree as for the search:
- S ≤ Ki: follow Pi-1
- S > Ki: check Ki+1
- S > K2k (last key): follow P2k (last pointer)

At the leaf node:
- Insert the data record according to the sort order.
- Special case: the leaf node is full (2k records) → split the leaf node.

Splitting:
- Generate a new leaf node.
- Split the 2k+1 entries (in order) into two leaf nodes: first k entries → left node, last k entries → right node.
- The middle entry ((k+1)-th) is used as the new "discriminator" (branching key) and inserted into the parent node.

Insertions in the B-Tree (2)

Node splitting during insertion; two possible situations after a split:
- The parent node is full → repeat the split on this level.
- Enough space → finished.

Special case, root split: splitting the root node creates a new root with two successor nodes, and the height of the tree grows by 1. The tree thus grows from the bottom to the top.

Dynamic reorganization (self-balancing):
- No unloading or reloading necessary
- The tree is always balanced
- But: after many insertions/deletions, a reorganization can still be beneficial

Insertions in the B-Tree (3)

Insertion example: order k = 1 (n = 2k), keys 1, 5, 2, 6, 7, 4, 8, 3

[Figure: tree after each insertion; finally h = 3]
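The insertion rule and the split procedure can be sketched in Python as follows. The `Node`, `insert` and helper names are hypothetical, and keys are assumed to be unique. Running the sketch on the lecture's example reproduces the final height h = 3.

```python
from bisect import bisect_left


class Node:
    """B-tree node: sorted keys; `children` is empty for a leaf node."""

    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children) if children else []


def _insert(node, key, k):
    """Insert below node; return (discriminator, new right node) if node had to be split."""
    i = bisect_left(node.keys, key)                  # S <= Ki -> follow P(i-1)
    if not node.children:
        node.keys.insert(i, key)                     # insert only into leaf nodes, in sort order
    else:
        split = _insert(node.children[i], key, k)
        if split is not None:                        # child was split: take over its discriminator
            sep, right = split
            node.keys.insert(i, sep)
            node.children.insert(i + 1, right)
    if len(node.keys) <= 2 * k:
        return None                                  # enough space -> finished
    # Overflow (2k+1 keys): first k stay left, the (k+1)-th moves up, the last k go right.
    right = Node(node.keys[k + 1:], node.children[k + 1:])
    sep = node.keys[k]
    node.keys, node.children = node.keys[:k], node.children[:k + 1]
    return sep, right


def insert(root, key, k):
    """Insert key into a B-tree of order k and return the (possibly new) root."""
    split = _insert(root, key, k)
    if split is None:
        return root
    sep, right = split
    return Node([sep], [root, right])                # root split: height grows by 1


def height(node):
    h = 1
    while node.children:
        node, h = node.children[0], h + 1
    return h


def keys_in_order(node):
    if not node.children:
        return list(node.keys)
    out = []
    for i, child in enumerate(node.children):
        out += keys_in_order(child)
        if i < len(node.keys):
            out.append(node.keys[i])
    return out


root = Node([])
for key in [1, 5, 2, 6, 7, 4, 8, 3]:                 # the lecture's example, order k = 1
    root = insert(root, key, k=1)
print(root.keys, height(root))                       # [4] 3
```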

Insertion and Deletion in the B-Tree

Problem:
- Insertion can create overflow.
- Deletion can create underflow and overflow.

Example:
- Insertion of key 22 → overflow → split
- Deletion of key 22 → underflow; all four nodes have to be accessed, and the final tree is the same as the input

Insert 22

| 294

gt

copy A Behrend Foundations of Information Systems |

Underflow Merge

Deletion in the B-Tree

Example Order k = 1 n=2k Delete key 3

| 295

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (2)

Example Order k = 1 n=2k Delete key 3

Underflow Merge

Remember Each path from the root to the leaf has the same length h

| 296

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (3)

Example Order k = 1 n=2k Delete key 3

Underflow Merge

Overflow Split

| 297

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (4)

Example Order k = 1 n=2k Delete key 3

Overflow Split

Root Split

| 298

gt

copy A Behrend Foundations of Information Systems |

Deletion Algorithm

Example ndash there are different algorithms Search the node that contains the key K to be deleted If key K is in a leaf node delete the key in the leaf node and handle potentially

resulting underflow by merging with sibling If key K is in an inner node pull up new discriminator from one of the successors Analyze which successor node of K has more elements left or right one

If both have the same number of elements decide for one Replace the key K to be deleted with the direct successor Krsquo from the left

successor node or with the direct successor Krsquorsquo from the right successor node respectively Delete Krsquo or Krsquorsquo from the respective successor node (recursively)

Note Major variants Merge (tis lecture) Re-distribution (instead of splitmerge in case of overflowunderflow the entries are

re-distributed under consideration of one or multiple adjacent nodes)

| 299

gt

copy A Behrend Foundations of Information Systems |

B-Trees B+-Trees and B-Trees

B+-Trees and B-Trees Data is only in leaf nodes Key redundancy but higher fan-out rarr lower tree high less IO Simpler delete procedure rarr requires only merging of nodes

Double linked list of all leaf nodes B-Trees Modified valid node sizes

from [k2k] to [43k2k] rarr better node utilization but more splitsmerges

Example Secondary index Non unique

B-Tree with k = 2 h = 3

| 300

gt

copy A Behrend Foundations of Information Systems |

Indexing Low Cardinality Columns

Problem Example B-tree on the sex of customers for a table with 1000000 tuples results in

two lists with approximately 500000 tuples each

Query for all female customers requires 500000 random page accesses (secondary index) Table scan would be much faster

Conclusion B-trees (and also hashing) are useful for predicates with low selectivity

(outputinput cardinality ratio) Rule of thumb margin hit rate is approx 5 higher hit rates do not justify the efforts for an index access

F M

TID TID TID TID hellip TID TID TID TID hellip

| 301

gt

copy A Behrend Foundations of Information Systems |

Bitmap Index

Idea (Long history since the 1960s) Create a bitmapbitlist for each

attribute value Each tuple in the table is assigned to

one bit in the bitmap (by position sequential TID)

Bit values 1 attribute value set 0 attribute value not set

Necessary condition Sequential numbering of the tuples

(TIDs)

F M

1 0 1 1 0 1 1 1 1 0 0 1 0 1 1

0 1 0 0 1 0 0 0 0 1 1 0 1 0 0

Name Sex Region Race Carol f n white

Harold m e black Anne f e asian

Iris f ne white hellip m se hisp hellip f e white hellip f sw asian hellip f w black hellip f n asian hellip m e hisp hellip m se black hellip f s white hellip m nw black hellip f s white hellip f w black

Sex

| 302

gt

copy A Behrend Foundations of Information Systems |

Querying Bitmap Indexes

Main advantage of bitmap indexes Simple and efficient logical join possible Read only data that is relevant for predicates Example σSex=lsquoflsquo ᴧ Region=lsquonlsquo R Bitmaps B1 and B2 in conjunction for (i=0 iltB1length i++) B = B1[i] amp B2[i]

Example IO Costs Estimation σSex=lsquoflsquo ᴧ Region=lsquonlsquo ᴧ Race=lsquoAsianlsquo R

(ldquoAsian women of region Northrdquo) Selectivity 12 18 14 = 164 N=10000 tuples with length of 400 bytes each

(~ 10 tuples per page for 4kB pages) Table scan 1000 pages Bitmap access 1000064 156 pages (worst case each tuple in a different page)

plus 1 page for bitmaps

F 1 0 1 1 0 1 1 1 1 0 0 1 0 1 1

N 0 1 1 0 0 1 0 0 0 1 0 0 0 0 0

A 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0

0 0 1 0 0 0 0 0 0 0 0 0 0 0 0

AND = AND

  • DBMS Architecture
  • What is in the Lecture
  • How is Database System build
  • Architectural Blue Print
  • Architectural Trends
  • Different Access Characteristics
  • Hardware Developments
  • Row Storage vs Column Storage
  • Processing Models
  • Transaction Management
  • ACID Properties of Transactions
  • Motivation
  • Reasons for crashes
  • Guarantee Atomicity amp Durability
  • Record Management
  • Record
  • Record Addressing
  • Surrogate with Mapping Table
  • TID Concept
  • Free Space Management
  • Physical Access Paths ndash Index Structures
  • Overview Indexes
  • Overview Indexes (2)
  • Classification of Index Structures
  • B-Tree
  • B-Tree (2)
  • Search in the B-Tree
  • Insertions in the B-Tree (1)
  • Insertions in the B-Tree (2)
  • Insertions in the B-Tree (3)
  • Insertion and Deletion in the B-Tree
  • Deletion in the B-Tree
  • Deletion in the B-Tree (2)
  • Deletion in the B-Tree (3)
  • Deletion in the B-Tree (4)
  • Deletion Algorithm
  • B-Trees B+-Trees and B-Trees
  • Indexing Low Cardinality Columns
  • Bitmap Index
  • Querying Bitmap Indexes
Page 5: Foundations of Information Systems 5 DBMS Architecturepages.iai.uni-bonn.de/behrend_andreas/lehre/FIM/WS16/11... · 2017-01-10 · Main-memory access has become a performance bottleneck

copy A Behrend Foundations of Information Systems | 267

gt

|

Architectural Trends

| 268

gt

copy A Behrend Foundations of Information Systems |

Different Access Characteristics

OLTP (On-line Transaction Processing) Mix between read-only and update queries Minor analysis tasks Used for data preservation and lookup Read typically only a few records at a time High performance by storing contiguous records in disk pages

OLAP (On-line Analytical Processing) Query-intensive DBMS applications Infrequent batch-oriented updates Complex analysis on large data volumes Read typically only a few attributes of large amounts of historical data in order to

partition them and compute aggregates High performance by storing contiguous values of a single attribute

| 269

gt

copy A Behrend Foundations of Information Systems |

Hardware Developments

Hardware improvements not equally distributed Advances in CPU speed have outpaced advances

in RAM latency Main-memory access has become a performance

bottleneck for many computer applications Bandwidth Latency Address translation (TLB)

rarr Memory Wall Cache memories can reduce the memory latency

when the requested data is found in the cache Vertically fragmented data structures optimize

memory cache usage

| 270

gt

copy A Behrend Foundations of Information Systems |

Row Storage vs Column Storage

Row Storage + easy to addmodify a record - might read unnecessary data

Column Storage + only need to read in relevant data - tuple writes require multiple accesses -gt suitable for read-mostly read-intensive large data repositories

| 271

gt

copy A Behrend Foundations of Information Systems |

Processing Models

[Marcin Zukowski Peter A Boncz Niels Nes Saacutendor Heacuteman MonetDBX100 - A DBMS In The CPU Cache IEEE Data Eng Bull 28(2) p17-22 2005]

| 272

gt

copy A Behrend Foundations of Information Systems |

Transaction Management

Principle of a transaction Sequence of successive DB operations that transform a database from a consistent

state into another consistent state surrounded by BOT EOT (Commit Abort)

Properties ACID Atomicity Consistency Isolation Durability A transaction will always come to an end Normal (commit) changes are permanently stored within the DB Abnormal (abort rollback) already composed changes are taken back

Note EOT state must not be different from BOT state

BOT(begin of transaction)

EOT(end of transaction)

possibly inconsistent database

consistentdatabase

consistentdatabase

DB DB

DML operations

| 273

gt

copy A Behrend Foundations of Information Systems |

ACID Properties of Transactions

Atomicity Indivisibility due to the transaction definition (Begin - End) All-or-nothing principle ie the DBS guarantees Either the complete execution of a transaction hellip hellip or the ineffectiveness of the whole transaction (and of all associated operations)

Consistency A successful transaction guarantees that all consistency requirements (integrity

requirements) have been met

Isolation Multiple transactions run isolated from each other and do not use (inconsistent)

intermediate results from other transactions

Durability All results of successful transactions have to be made persistent

| 274

gt

copy A Behrend Foundations of Information Systems |

Motivation

Atomicity Part of the transaction is done but we want to cancel it ABORTROLLBACK System crashes during transaction some changes made it to the disk some did not

Durability Transaction finished user notified COMMIT System crashes before changes sent successfully to disk (asynchronous write)

Consistency Physical consistency Correctness of the storage and access structures Completely executed modification operations preserve the consistency

Logical consistency Correctness of data contents ndash correspond to a (possible) state of the real world Completely executed transactions preserve the logical consistency

- All modifications of finished transactions are included - No modifications of open transactions are included

Remember Logical consistency requires physical consistency in the first place

UNDO Recovery

REDO Recovery

UNDO Recovery for consistency-related rollbacks

| 275

gt

copy A Behrend Foundations of Information Systems |

Reasons for crashes

Transaction error Violation of system restrictions Violation of security regulations Excessive resource requirements deadlocks

Application-related errors eg wrong operations and values ROLLBACK

System error System crash with loss of main-memory contents Database system operating system hardware power failure

Device error (especially storage-medium error) Destruction of secondary storage systems

Catastrophes Destruction of the computing center

| 276

gt

copy A Behrend Foundations of Information Systems |

Guarantee Atomicity amp Durability

Assumptions System may crash but the disk is durable The only atomicity guarantee is that a disk block write is atomic

Materialization strategy Preferred Policy StealNo Force This combination is most complicated but allows for highest performance No Force complicates enforcing Durability What if system crashes before a modified page written by a committed

transaction makes it to disk Write as little as possible in a convenient place at commit time to support

REDOing modifications Steal complicates enforcing Atomicity What if the transaction that performed udpates aborts What if system crashes before transaction is finished Must remember the old value of P (to support UNDOing the write to page P)

[Figure: architectural blueprint of the database system (Application; Data System; Access System; Storage System; Buffer; File System; Hardware), shown again as section divider; example: record 1, 'Smith', 15.06.1982 of table Person(id INT, name VARCHAR, birthday DATE) with index P_id_IX on Person.id]

Record Management


Record

Record: a package of fields that together describe a thing, a person, a fact, etc. Each field represents one property of the entity described by the record. Similar to a struct in C. Variable length (in contrast to pages).

Record Manager: organizes the physical storage of records in pages. Operations: Get, Insert, Update, Delete, Scan. Agnostic to record structure and semantics: records are considered as byte strings of variable length; the structure and content of a record are defined by the Access System and the application.

Challenges: record addressing, free space management.
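To make "records are byte strings of variable length" concrete, here is a sketch of how an access system might lay out the record 1, 'Smith', 15.06.1982 of a table Person(id INT, name VARCHAR, birthday DATE). The layout (fixed-size fields first, a 2-byte length prefix for the name, the date stored as a day number) is an assumption for illustration, not the lecture's format; the record manager would only ever see the resulting `bytes`.

```python
import struct
from datetime import date

# Hypothetical layout: fixed-size fields first, then the variable-length name.
#   id: 4-byte signed int | birthday: 4-byte day number | name length: 2 bytes | name bytes
HEADER = struct.Struct("<iiH")   # little-endian, no padding -> 10 bytes


def encode_person(pid: int, name: str, birthday: date) -> bytes:
    name_bytes = name.encode("utf-8")
    return HEADER.pack(pid, birthday.toordinal(), len(name_bytes)) + name_bytes


def decode_person(record: bytes) -> tuple:
    pid, day_number, name_len = HEADER.unpack_from(record, 0)
    name = record[HEADER.size:HEADER.size + name_len].decode("utf-8")
    return pid, name, date.fromordinal(day_number)


rec = encode_person(1, "Smith", date(1982, 6, 15))
print(len(rec))            # 15: 10-byte header + 5 name bytes
print(decode_person(rec))  # (1, 'Smith', datetime.date(1982, 6, 15))
```

Putting the fixed-size fields first means their offsets are known without parsing the variable-length part.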


Record Addressing

Record address: identifier used to address records, e.g., in indexes or during query processing; assigned when a record is inserted.

Goals: stability of the identifier, fast and direct access, little organizational overhead.

Direct addressing: byte address or position number in the file or page. Unstable:
- Byte address: if a record grows in length, the following records get new addresses
- Position number: insert and delete operations change the sequence of records

Indirect addressing: surrogate with mapping table (complete indirection); tuple identifier (TID concept).


Surrogate with Mapping Table

Surrogate: record type + serial number; the serial number remains constant during the record's lifetime.

Mapping table: maps a surrogate to a page (columns: Surrogate | Page ID).

Problems: Where to store the mapping table? How can it be extended? How can the mapping table be searched efficiently?

→ H2 uses a B-tree to store the mapping table.
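A minimal sketch of complete indirection, in which a Python dict stands in for the mapping table (which, per the slide, H2 keeps in a B-tree). The class and method names are hypothetical. Moving a record to another page changes only its mapping entry, so the surrogate stored in indexes stays valid; the price is one extra lookup on every access.

```python
class MappingTable:
    """Complete indirection: surrogate -> page ID (a dict stands in for the B-tree)."""

    def __init__(self):
        self._last_serial = {}   # record type -> last serial number handed out
        self._page_of = {}       # surrogate -> page ID

    def new_surrogate(self, record_type: str, page_id: int) -> tuple:
        serial = self._last_serial.get(record_type, 0) + 1
        self._last_serial[record_type] = serial
        surrogate = (record_type, serial)        # record type + serial number
        self._page_of[surrogate] = page_id
        return surrogate

    def lookup(self, surrogate: tuple) -> int:
        return self._page_of[surrogate]          # extra step on every access

    def move(self, surrogate: tuple, new_page_id: int) -> None:
        self._page_of[surrogate] = new_page_id   # the surrogate itself never changes


table = MappingTable()
s = table.new_surrogate("Person", page_id=7)
table.move(s, 12)            # record relocated to page 12
print(s, table.lookup(s))    # ('Person', 1) 12
```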


TID Concept

Record addressing with indirection inside the page: each page contains an array with record positions; the TID of a record consists of the page ID and the index into this position array.

Pros: access with one page access (two pages in case of overflow); stable; no mapping table required.

Operations:
- Insert: reuse an unused position or add a new position
- Delete: mark the position as unused in the array
- Update: update all positions in the array (records behind the modified one may move within the page)
- Update with overflow: store the record as an overflow record and store the TID of the overflow record at the original position (no double overflow: if the overflow record has to move again, update the TID at the original position)

[Figure: record and overflow record]
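The following Python sketch models the TID concept under simplifying assumptions: tiny pages, a fixed 4-byte position-array entry (`SLOT_SIZE`), and hypothetical class names. A TID is `(page ID, slot index)`. An update that still fits rewrites the page and updates all positions in the array; an update that no longer fits stores an overflow record on another page and leaves a forwarding TID in the original slot, replacing (never chaining) an earlier forward.

```python
SLOT_SIZE = 4  # assumed size of one position-array entry in bytes


class Page:
    def __init__(self, page_id, size):
        self.page_id, self.size = page_id, size
        self.heap = bytearray()   # record bytes
        self.slots = []           # position array: None | ("rec", offset, length) | ("fwd", tid)

    def free_space(self):
        return self.size - len(self.heap) - SLOT_SIZE * len(self.slots)

    def records(self):
        """Current bytes per slot (None for unused or forwarded slots)."""
        return [bytes(self.heap[s[1]:s[1] + s[2]]) if s and s[0] == "rec" else None
                for s in self.slots]

    def repack(self, records):
        """Rewrite the heap compactly and update all record positions in the array."""
        self.heap = bytearray()
        for i, rec in enumerate(records):
            if rec is not None:
                self.slots[i] = ("rec", len(self.heap), len(rec))
                self.heap += rec

    def insert(self, record):
        free_slot = next((i for i, s in enumerate(self.slots) if s is None), None)
        needed = len(record) + (0 if free_slot is not None else SLOT_SIZE)
        if self.free_space() < needed:
            return None
        entry = ("rec", len(self.heap), len(record))
        self.heap += record
        if free_slot is None:              # add a new position
            self.slots.append(entry)
            free_slot = len(self.slots) - 1
        else:                              # reuse an unused position
            self.slots[free_slot] = entry
        return (self.page_id, free_slot)   # TID = (page ID, index in position array)


class RecordManager:
    def __init__(self, n_pages, page_size=4096):
        self.pages = [Page(i, page_size) for i in range(n_pages)]

    def insert(self, record, avoid=None):
        for page in self.pages:
            if page.page_id != avoid:
                tid = page.insert(record)
                if tid is not None:
                    return tid
        raise MemoryError("no page with enough free space")

    def get(self, tid):
        page_id, slot = tid
        entry = self.pages[page_id].slots[slot]
        if entry[0] == "fwd":              # overflow: one additional page access
            page_id, slot = entry[1]
            entry = self.pages[page_id].slots[slot]
        _, offset, length = entry
        return bytes(self.pages[page_id].heap[offset:offset + length])

    def update(self, tid, record):
        page_id, slot = tid
        page = self.pages[page_id]
        if page.slots[slot][0] == "fwd":   # no double overflow: drop the old overflow record
            old_page, old_slot = page.slots[slot][1]
            self.pages[old_page].slots[old_slot] = None
        records = page.records()
        records[slot] = record
        live = sum(len(r) for r in records if r is not None)
        if live + SLOT_SIZE * len(page.slots) <= page.size:
            page.repack(records)           # record stays on its home page
        else:
            page.slots[slot] = ("fwd", self.insert(record, avoid=page_id))
            records[slot] = None
            page.repack(records)
        return tid                         # the TID never changes


rm = RecordManager(n_pages=2, page_size=32)   # tiny pages to force an overflow
t1 = rm.insert(b"aaaa")
t2 = rm.insert(b"bbbb")
rm.update(t1, b"A" * 12)   # still fits: positions in page 0 are rewritten
rm.update(t1, b"X" * 24)   # too large: overflow record on page 1, forwarding TID in slot
print(t1, rm.get(t1) == b"X" * 24, rm.pages[0].slots[t1[1]][0])   # (0, 0) True fwd
```

Freed heap space is only reclaimed when a page is repacked; a real record manager would also compact on delete.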


Free Space Management

Problem: which page has enough space for a new record?

Solution: a free space table lists, for all pages, how much space is left.

Free space value:
- Precise value: $\lceil \log_2(\text{page size}) \rceil$ bits ⇒ 2 bytes for the common page size of 4 KB
- Rough value: use fewer bits; $\text{free space} = \frac{\text{value} \cdot \text{page size}}{2^{\text{bits per value}}}$

Free space table:
- With direct page addressing: assuming a single page can take n free space entries, the first page and every (n+1)-th page hold free space entries
- With indirect page addressing: free space information is stored in the page table
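A sketch of the rough free-space encoding, assuming values are rounded down so that the table never promises more space than a page actually has. The function names and the choice of 4 bits per value are hypothetical:

```python
import math


def precise_bytes(page_size: int) -> int:
    # ceil(log2(page size)) bits, rounded up to whole bytes
    return math.ceil(math.ceil(math.log2(page_size)) / 8)


def encode_rough(free_bytes: int, page_size: int, bits: int) -> int:
    # Round down, so the table never promises more space than the page has.
    return min(free_bytes * 2**bits // page_size, 2**bits - 1)


def decode_rough(value: int, page_size: int, bits: int) -> int:
    # free space = value * page size / 2^(bits per value)
    return value * page_size // 2**bits


def find_page(fs_table, needed, page_size, bits):
    """Return the first page whose (rough) free space suffices, else None."""
    for page_no, value in enumerate(fs_table):
        if decode_rough(value, page_size, bits) >= needed:
            return page_no
    return None


PAGE = 4096
table = [encode_rough(free, PAGE, bits=4) for free in (100, 1000, 3000)]
print(precise_bytes(PAGE), table)        # 2 [0, 3, 11]
print(find_page(table, 700, PAGE, 4))    # 1
print(find_page(table, 900, PAGE, 4))    # 2 (page 1 has 1000 bytes, but only 768 are recorded)
```

With 4 bits the granularity is 4096 / 16 = 256 bytes: the table shrinks to a quarter of the precise 2-byte version, at the cost of occasionally skipping a page that would have fit.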

[Figure: architectural blueprint of the database system (Application; Data System; Access System; Storage System; Buffer; File System; Hardware), shown again as section divider]

Physical Access Paths – Index Structures


Overview Indexes

Table scan: read all pages and evaluate the search criteria for each record (pre-fetching).

Index scan: use an index for the search criteria on one or more attributes:
- Fast access to single values or value ranges of the index attributes
- Logical/physical sorting of the values of the key attributes (depending on the index structure)
- Enforcing uniqueness

Types of indexes:
- Primary (clustered) index: determines the physical organization; used for the PK
- Secondary (non-clustered) index: redundant access path

[Figure: primary index and secondary indexes (Age, Salary) on Pers(PID, NAME, AGE, SALARY)]


Overview Indexes (2)

Choice of access paths

Index scan:
- Only useful for low selectivity (low number of result tuples)
- Break-even point according to the ratio of output tuples to input tuples (usually at most 5%)
- Requires statistics about the data
- Additional costs for index storage and updating

Table scan:
- Adequate/efficient for small tables (e.g., 5 pages)
- Queries with high selectivity (large result sets)
- 100-200 MB/s sequential read vs. ~100 disk seeks/s

[Figure: access time vs. hit rate for index scan and table scan]
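The break-even reasoning can be made concrete with a deliberately crude page-count model (an assumption, not the lecture's cost formula): a table scan reads every page once, while a secondary-index scan in the worst case reads one page per hit, optionally weighted by a hypothetical `random_penalty` for random versus sequential I/O. Index pages, caching and clustering are ignored.

```python
import math


def table_scan_pages(n_tuples: int, tuples_per_page: int) -> int:
    # Every page is read once, sequentially.
    return math.ceil(n_tuples / tuples_per_page)


def index_scan_cost(n_hits: int, random_penalty: float = 1.0) -> float:
    # Worst case for a secondary index: each hit is on a different page,
    # and each of these page reads is a random I/O (index pages ignored).
    return n_hits * random_penalty


def choose_access_path(n_tuples, tuples_per_page, n_hits, random_penalty=1.0):
    index_cost = index_scan_cost(n_hits, random_penalty)
    table_cost = table_scan_pages(n_tuples, tuples_per_page)
    return "index scan" if index_cost < table_cost else "table scan"


def break_even_hit_rate(tuples_per_page, random_penalty=1.0):
    # hits * penalty == n / tuples_per_page  <=>  hits / n == 1 / (tuples_per_page * penalty)
    return 1.0 / (tuples_per_page * random_penalty)


print(choose_access_path(1_000_000, 10, 500_000))   # table scan
print(choose_access_path(1_000_000, 10, 1_000))     # index scan
print(break_even_hit_rate(10, random_penalty=2.0))  # 0.05
```

For instance, with 1,000,000 tuples at 10 tuples per page and a hit rate of 50%, the index scan reads 500,000 pages against 100,000 for the table scan. The break-even hit rate shrinks as the number of tuples per page or the cost of random I/O grows, which is why the choice depends on statistics about the data.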


Classification of Index Structures

Classification of one-dimensional index structures:
- Key comparison
  - Sequential: sequential lists (physically sequential), linked lists (logically sequential)
  - Tree-based: binary search trees, multiway trees (example: B-tree)
- Key transformation
  - Hash-based: static, dynamic

Also shown in the classification: prefix trees (tries).

Multiway trees: tree structure with multiple children per node. Idea: choose the fan-out so that the node size suits the page size.
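A back-of-the-envelope sketch of "choose the fan-out so that the node size suits the page size", assuming 8-byte keys, 8-byte child pointers and completely full nodes (so the heights are best-case lower bounds). The function names are hypothetical:

```python
def fan_out(page_size: int, key_size: int, pointer_size: int) -> int:
    # Entries (key + child pointer) that fit into one page-sized node;
    # the one extra child pointer per node is ignored for simplicity.
    return page_size // (key_size + pointer_size)


def min_height(n_keys: int, fanout: int) -> int:
    # A completely full tree of height h with fan-out f holds f**h - 1 keys.
    h, capacity = 1, fanout
    while capacity - 1 < n_keys:
        capacity *= fanout
        h += 1
    return h


f = fan_out(4096, key_size=8, pointer_size=8)
print(f, min_height(10**6, f), min_height(10**6, 2))   # 256 3 20
```

With 4 KB pages this gives a fan-out of 256, so about a million keys fit into a multiway tree of height 3 (three page accesses), whereas a binary search tree needs height 20.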


B-Tree

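[Figure: B-tree node with entries and free space]

Entry: $(K_i, D_i, P_i)$ (key, data, pointer to subtree); per inner node min $|P| = k+1$, max $|P| = 2k+1$ pointers (the root needs only 2)

Subtrees: $P_0$: keys $< K_1$; $P_i$: $K_i <$ keys $< K_{i+1}$; $P_p$: keys $> K_p$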


B-Tree (2)

Example

Keys: agnostic to specific key semantics; only a defined total order is required; may be of fixed or variable length.

Operations: search for the data for a given key value; insertion and deletion of key-data pairs.

Payload: agnostic to specific data semantics; can be a record, a reference (TID), or a mix.

[Figure: B-tree with k = 2, h = 3]


Search in the B-Tree

Starting at the root node, each node is searched from left to right:
1) If $K_i$ matches the desired key value, the data record has been found (further records with the same key value might be located in the subtree to which $P_{i-1}$ points).
2) If $K_i$ is larger than the desired value, the search continues in the root of the subtree identified by $P_{i-1}$.
3) If $K_i$ is smaller than the desired value, the comparison is repeated with $K_{i+1}$.
4) If the last key of the node ($K_p$, at most $K_{2k}$) is also smaller than the desired value, the search continues in the subtree of $P_p$.

If it is impossible to descend further into a subtree in step 2 or 4 (leaf node), the search is aborted: no record with the desired key value exists.

Example: search for 38, 20, 6
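A minimal Python sketch of the search procedure. The `Node` class is hypothetical, payloads $D_i$ are omitted, and the small order-1 tree is made up (it is not the tree from the slide figure). The inner loop covers steps 1 and 3; descending into `children[i]` is step 2, or step 4 when every key of the node is smaller:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list                                     # K_1 .. K_p, sorted
    children: list = field(default_factory=list)   # P_0 .. P_p; empty for a leaf


def search(node, key):
    """Return True if key occurs in the B-tree rooted at node.

    0-based lists: keys[i] is K_{i+1} and children[i] is P_i.
    """
    while True:
        i = 0
        while i < len(node.keys) and node.keys[i] < key:   # step 3: key smaller, move right
            i += 1
        if i < len(node.keys) and node.keys[i] == key:    # step 1: found
            return True
        if not node.children:                              # leaf: cannot descend, not found
            return False
        node = node.children[i]                            # step 2 (or step 4 if i == p)


root = Node([20], [Node([6, 10]), Node([30, 38])])
print([search(root, k) for k in (38, 20, 6, 7)])   # [True, True, True, False]
```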


Insertions in the B-Tree (1)

Insertion rule: insert only into leaf nodes.

At non-leaf nodes: descend the tree as for the search: $S \le K_i$ → follow $P_{i-1}$; $S > K_i$ → check $K_{i+1}$; $S > K_p$ (last key) → follow $P_p$.

At the leaf node: insert the data record according to the sort order. Special case: the leaf node is full (2k records) → split the leaf node.

Splitting: generate a new leaf node and split the 2k+1 entries (in order) into two leaf nodes:
- first k entries → left node
- last k entries → right node
- the middle entry (the (k+1)-th) is used as the new "discriminator" (branching key) and inserted into the parent node


Insertions in the B-Tree (2)

Node splitting during insertion: two possible situations after a split:
- The parent node is full → repeat the split on this level
- Enough space → finished

Special case, root split: splitting the root node → new root with two successor nodes; the height of the tree grows by 1. The tree is split from the bottom to the top.

Dynamic reorganization (self-balancing): no unloading or reloading necessary; the tree is always balanced. But: after many insertions/deletions, a reorganization can be beneficial.


Insertions in the B-Tree (3)

Insertion example: order k = 1 (at most n = 2k = 2 keys per node); the keys 1, 5, 2, 6, 7, 4, 8, 3 are inserted in this order.

Finally: h = 3
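A compact Python sketch of insertion with bottom-up splitting, following the rules above. Only keys are stored (no payload) and the class names are hypothetical. The path of parents is remembered during the descent so that a split can push the discriminator into the parent, and a root split adds a new level:

```python
import bisect


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys if keys is not None else []
        self.children = children if children is not None else []   # empty => leaf


class BTree:
    def __init__(self, k):
        self.k = k                   # order: a node holds at most 2k keys
        self.root = Node()

    def insert(self, key):
        # Descend as for the search (S <= K_i -> left), remembering the path.
        path, node = [], self.root
        while node.children:
            i = bisect.bisect_left(node.keys, key)
            path.append((node, i))
            node = node.children[i]
        bisect.insort(node.keys, key)          # insert into the leaf in sort order

        # Split overflowing nodes bottom-up.
        while len(node.keys) > 2 * self.k:
            mid = self.k                        # 0-based index of the (k+1)-th entry
            separator = node.keys[mid]
            right = Node(node.keys[mid + 1:], node.children[mid + 1:])
            node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
            if path:                            # push the discriminator into the parent
                parent, i = path.pop()
                parent.keys.insert(i, separator)
                parent.children.insert(i + 1, right)
                node = parent
            else:                               # root split: the tree grows by one level
                self.root = Node([separator], [node, right])
                break

    def height(self):
        h, node = 1, self.root
        while node.children:
            node, h = node.children[0], h + 1
        return h

    def keys_in_order(self):
        out = []

        def walk(node):
            for i, key in enumerate(node.keys):
                if node.children:
                    walk(node.children[i])
                out.append(key)
            if node.children:
                walk(node.children[-1])

        walk(self.root)
        return out


tree = BTree(k=1)
for key in (1, 5, 2, 6, 7, 4, 8, 3):
    tree.insert(key)
print(tree.height(), tree.root.keys, tree.keys_in_order())   # 3 [4] [1, 2, 3, 4, 5, 6, 7, 8]
```

Inserting 3 last overflows the leaf [3, 4, 5], pushes 4 into the full root [2, 6], and the resulting root split produces height 3.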


Insertion and Deletion in the B-Tree

Problem: insertion can create overflow; deletion can create underflow and (after merging) overflow.

Example: inserting key 22 → overflow → split. Deleting key 22 again → underflow; all four nodes need to be accessed, and the result is the same as the input.

[Figure: insert 22]


Deletion in the B-Tree

Example: order k = 1 (n = 2k); delete key 3.

[Figure: underflow → merge]


Deletion in the B-Tree (2)

Example: order k = 1 (n = 2k); delete key 3.

[Figure: underflow → merge]

Remember: each path from the root to a leaf has the same length h.


Deletion in the B-Tree (3)

Example: order k = 1 (n = 2k); delete key 3.

[Figure: underflow → merge; overflow → split]


Deletion in the B-Tree (4)

Example: order k = 1 (n = 2k); delete key 3.

[Figure: overflow → split; root split]


Deletion Algorithm

Example (there are different algorithms):
- Search the node that contains the key K to be deleted.
- If K is in a leaf node: delete K from the leaf node and handle a potentially resulting underflow by merging with a sibling.
- If K is in an inner node: pull up a new discriminator from one of the successor nodes:
  - Determine which successor node of K, the left or the right one, has more elements; if both have the same number of elements, choose either.
  - Replace K with its direct predecessor K' from the left successor node, or with its direct successor K'' from the right successor node, respectively.
  - Delete K' or K'' from the respective successor node (recursively).

Note: major variants:
- Merge (this lecture)
- Re-distribution (instead of split/merge in case of overflow/underflow, the entries are redistributed among one or multiple adjacent nodes)
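A sketch of the merge variant described above: an underflowing node is merged with a sibling and the separator from the parent; if the merged node overflows, it is split again. The sketch assumes unique keys and non-root nodes holding k to 2k keys; preferring the right sibling for the merge is an assumption, and all names are hypothetical.

```python
import bisect


class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])   # empty => leaf


def delete(root, key, k):
    """Delete key from the order-k B-tree rooted at root; return the (new) root."""
    _delete(root, key, k)
    if not root.keys and root.children:        # root emptied by a merge: height shrinks
        return root.children[0]
    return root


def _delete(node, key, k):
    i = bisect.bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        if not node.children:                  # K in a leaf: just remove it
            node.keys.pop(i)
            return
        left, right = node.children[i], node.children[i + 1]
        if len(left.keys) >= len(right.keys):  # pull up predecessor K' from the left
            node.keys[i], child = _max_key(left), i
        else:                                  # or successor K'' from the right
            node.keys[i], child = _min_key(right), i + 1
        _delete(node.children[child], node.keys[i], k)
    else:
        if not node.children:
            raise KeyError(key)
        child = i
        _delete(node.children[child], key, k)
    _fix_underflow(node, child, k)


def _max_key(node):
    while node.children:
        node = node.children[-1]
    return node.keys[-1]


def _min_key(node):
    while node.children:
        node = node.children[0]
    return node.keys[0]


def _fix_underflow(parent, idx, k):
    if len(parent.children[idx].keys) >= k:
        return
    li = idx if idx + 1 < len(parent.children) else idx - 1   # right sibling if any
    left, right = parent.children[li], parent.children[li + 1]
    separator = parent.keys.pop(li)
    parent.children.pop(li + 1)
    left.keys += [separator] + right.keys                      # merge
    left.children += right.children
    if len(left.keys) > 2 * k:                                 # merged node overflows: split
        mid = len(left.keys) // 2
        parent.keys.insert(li, left.keys[mid])
        parent.children.insert(li + 1, Node(left.keys[mid + 1:], left.children[mid + 1:]))
        left.keys, left.children = left.keys[:mid], left.children[:mid + 1]


def build_example():
    # order-1 tree obtained by inserting 1, 5, 2, 6, 7, 4, 8, 3
    return Node([4], [Node([2], [Node([1]), Node([3])]),
                      Node([6], [Node([5]), Node([7, 8])])])


def in_order(node):
    if not node.children:
        return list(node.keys)
    out = []
    for i, key in enumerate(node.keys):
        out += in_order(node.children[i]) + [key]
    return out + in_order(node.children[-1])


root = delete(build_example(), 3, k=1)
print(root.keys, in_order(root))   # [4, 6] [1, 2, 4, 5, 6, 7, 8]
```

Deleting 3 from this tree causes an underflow at the leaf level and then at the inner level; both are resolved by merging, the root becomes empty, and the height drops from 3 to 2.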


B-Trees, B+-Trees and B*-Trees

B+-Trees (and B*-Trees):
- Data is only in leaf nodes
- Key redundancy, but higher fan-out → lower tree height, less I/O
- Simpler delete procedure → requires only merging of nodes
- Doubly linked list of all leaf nodes

B*-Trees:
- Modified valid node sizes: from [k, 2k] to [4/3·k, 2k] → better node utilization, but more splits/merges

Example: secondary index, non-unique

[Figure: example tree with k = 2, h = 3]
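Because a B+-tree keeps all data in the leaves and links them, a range query descends once and then follows the leaf chain. A minimal sketch with hypothetical classes; it uses only forward `next` links, although the slide mentions a doubly linked list:

```python
import bisect


class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None          # link to the right neighbour leaf


class Inner:
    def __init__(self, keys, children):
        self.keys = keys          # separators only; data lives in the leaves
        self.children = children


def find_leaf(node, key):
    while isinstance(node, Inner):
        # bisect_left: with duplicate keys (non-unique index) start at the leftmost candidate
        node = node.children[bisect.bisect_left(node.keys, key)]
    return node


def range_scan(root, lo, hi):
    """All keys with lo <= key <= hi: one descent, then follow the leaf chain."""
    out, leaf = [], find_leaf(root, lo)
    while leaf is not None:
        for key in leaf.keys:
            if key > hi:
                return out
            if key >= lo:
                out.append(key)
        leaf = leaf.next
    return out


leaves = [Leaf([1, 3]), Leaf([5, 7]), Leaf([9, 11])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Inner([5, 9], leaves)      # separator = first key of the right leaf
print(range_scan(root, 3, 9))     # [3, 5, 7, 9]
```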


Indexing Low Cardinality Columns

Problem, example: a B-tree on the sex of customers for a table with 1,000,000 tuples results in two lists with approximately 500,000 tuples each.

A query for all female customers requires 500,000 random page accesses (secondary index); a table scan would be much faster.

Conclusion: B-trees (and also hashing) are useful for predicates with low selectivity (output/input cardinality ratio). Rule of thumb: the break-even hit rate is approx. 5%; higher hit rates do not justify the effort of an index access.

[Figure: index on Sex with keys F and M, each pointing to a long list of TIDs]


Bitmap Index

Idea (long history, since the 1960s): create a bitmap (bit list) for each attribute value. Each tuple in the table is assigned to one bit in the bitmap (by position, i.e., its sequential TID).

Bit values: 1 = attribute value set, 0 = attribute value not set.

Necessary condition: sequential numbering of the tuples (TIDs).

Example table:
Name | Sex | Region | Race
Carol | f | n | white
Harold | m | e | black
Anne | f | e | asian
Iris | f | ne | white
… | m | se | hisp
… | f | e | white
… | f | sw | asian
… | f | w | black
… | f | n | asian
… | m | e | hisp
… | m | se | black
… | f | s | white
… | m | nw | black
… | f | s | white
… | f | w | black

Bitmap index on Sex:
F: 1 0 1 1 0 1 1 1 1 0 0 1 0 1 1
M: 0 1 0 0 1 0 0 0 0 1 1 0 1 0 0
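A sketch of building a bitmap index from one column, assuming the tuples are numbered sequentially from 0 (the sequential-TID condition above). The Sex values are those of the 15 example tuples in TID order; the function name is hypothetical, and plain lists of 0/1 are used for readability where a real system would pack bits:

```python
def build_bitmap_index(values):
    """One bitmap per distinct value; bit i is 1 iff tuple i (sequential TID) has it."""
    index = {}
    for tid, value in enumerate(values):
        index.setdefault(value, [0] * len(values))[tid] = 1
    return index


sex = list("fmffmffffmmfmff")     # Sex column of the 15 example tuples, in TID order
bitmaps = build_bitmap_index(sex)
print(bitmaps["f"])   # [1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1]
print(bitmaps["m"])   # [0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0]
```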


Querying Bitmap Indexes

Main advantage of bitmap indexes: a simple and efficient logical combination (AND, OR) is possible; only data that is relevant for the predicates is read.

Example: $\sigma_{Sex='f' \wedge Region='n'}(R)$: conjunction of bitmaps B1 and B2:
for (i = 0; i < B1.length; i++) B[i] = B1[i] & B2[i];

Example: I/O cost estimation for $\sigma_{Sex='f' \wedge Region='n' \wedge Race='Asian'}(R)$ ("Asian women of region North"):
- Selectivity: 1/2 · 1/8 · 1/4 = 1/64
- N = 10,000 tuples with a length of 400 bytes each (~10 tuples per page for 4 kB pages)
- Table scan: 1,000 pages
- Bitmap access: 10,000/64 ≈ 156 pages (worst case: each tuple on a different page) plus 1 page for the bitmaps

F: 1 0 1 1 0 1 1 1 1 0 0 1 0 1 1
N: 0 1 1 0 0 1 0 0 0 1 0 0 0 0 0
A: 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0
F AND N AND A = 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
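A Python sketch of the conjunction and of the I/O estimate, using the three bitmaps exactly as printed on the slide. The helper names are hypothetical, and the page model (one page per hit in the worst case, bitmaps stored densely at one bit per tuple) is an assumption that reproduces the slide's numbers:

```python
import math


def bitmap_and(*bitmaps):
    """Conjunction of predicates: bitwise AND of equally long bitmaps."""
    return [int(all(column)) for column in zip(*bitmaps)]


def bits(s):
    return [int(c) for c in s.split()]


F = bits("1 0 1 1 0 1 1 1 1 0 0 1 0 1 1")   # Sex = 'f'
N = bits("0 1 1 0 0 1 0 0 0 1 0 0 0 0 0")   # Region = 'n'
A = bits("0 0 1 0 0 0 1 0 1 0 0 0 0 0 0")   # Race = 'Asian'
result = bitmap_and(F, N, A)
print([tid for tid, b in enumerate(result) if b])   # [2]  (0-based: the third tuple)


def io_estimate(n, tuple_len, page_size, selectivity, n_bitmaps):
    """(table scan pages, bitmap access pages), worst case one page per hit."""
    table_scan = math.ceil(n / (page_size // tuple_len))
    hits = round(n * selectivity)
    bitmap_pages = math.ceil(n_bitmaps * n / 8 / page_size)   # n bits per bitmap
    return table_scan, hits + bitmap_pages


print(io_estimate(10_000, 400, 4096, 1/2 * 1/8 * 1/4, n_bitmaps=3))   # (1000, 157)
```

The three bitmaps over 10,000 tuples take 3 × 1,250 bytes, which is why a single page suffices for them.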



Different Access Characteristics

OLTP (Online Transaction Processing):
- Mix of read-only and update queries; minor analysis tasks
- Used for data preservation and lookup
- Typically reads only a few records at a time
- High performance by storing contiguous records in disk pages

OLAP (Online Analytical Processing):
- Query-intensive DBMS applications; infrequent, batch-oriented updates
- Complex analysis on large data volumes
- Typically reads only a few attributes of large amounts of historical data in order to partition them and compute aggregates
- High performance by storing contiguous values of a single attribute


Hardware Developments

Hardware improvements are not equally distributed: advances in CPU speed have outpaced advances in RAM latency.
- Main-memory access has become a performance bottleneck for many computer applications: bandwidth, latency, address translation (TLB) → "memory wall"
- Cache memories can reduce the memory latency when the requested data is found in the cache
- Vertically fragmented data structures optimize memory-cache usage


Row Storage vs Column Storage

Row storage:
+ easy to add/modify a record
- might read unnecessary data

Column storage:
+ only needs to read in the relevant data
- tuple writes require multiple accesses
→ suitable for read-mostly, read-intensive, large data repositories
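A small sketch contrasting the two layouts for a relation Pers(PID, NAME, AGE, SALARY) with made-up values. Python lists only model the layouts conceptually; they say nothing about actual disk or cache behaviour:

```python
names = ("pid", "name", "age", "salary")
rows = [                       # row storage: each record stored contiguously
    (1, "Smith", 41, 5000),
    (2, "Jones", 29, 4200),
    (3, "Miller", 35, 4600),
]


def to_columns(rows, names):
    """Column storage: one contiguous list per attribute."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


cols = to_columns(rows, names)

# Aggregate over one attribute: conceptually, the row layout reads whole records,
# the column layout only the salary values.
avg_row = sum(r[3] for r in rows) / len(rows)
avg_col = sum(cols["salary"]) / len(cols["salary"])
print(avg_row == avg_col, avg_col)   # True 4600.0

# Inserting a tuple: one append for rows, one append per attribute for columns.
new = (4, "Brown", 52, 6100)
rows.append(new)
for name, value in zip(names, new):
    cols[name].append(value)
print(len(rows), {n: len(v) for n, v in cols.items()})   # 4 {'pid': 4, 'name': 4, 'age': 4, 'salary': 4}
```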


Processing Models

[Marcin Zukowski, Peter A. Boncz, Niels Nes, Sándor Héman: MonetDB/X100 - A DBMS In The CPU Cache. IEEE Data Eng. Bull. 28(2), pp. 17-22, 2005]


Transaction Management

Principle of a transaction: a sequence of successive DB operations that transforms a database from a consistent state into another consistent state, enclosed by BOT and EOT (commit or abort).

Properties: ACID (Atomicity, Consistency, Isolation, Durability). A transaction always comes to an end:
- Normal (commit): changes are permanently stored in the DB
- Abnormal (abort, rollback): changes made so far are taken back

Note: the EOT state need not differ from the BOT state.

[Figure: consistent DB at BOT (begin of transaction) → DML operations (database possibly inconsistent in between) → consistent DB at EOT (end of transaction)]


ACID Properties of Transactions

Atomicity: indivisibility due to the transaction definition (begin ... end); all-or-nothing principle, i.e., the DBS guarantees either the complete execution of a transaction or the ineffectiveness of the whole transaction (and of all associated operations).

Consistency: a successful transaction guarantees that all consistency requirements (integrity constraints) have been met.

Isolation: multiple transactions run isolated from each other and do not use (inconsistent) intermediate results of other transactions.

Durability: all results of successful transactions have to be made persistent.
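Atomicity and consistency can be observed with any SQL engine. The sketch below uses Python's built-in `sqlite3` module; the `account` table and the `transfer` function are hypothetical. Used as a context manager, the connection commits when the block succeeds and rolls back when an exception escapes, so a transfer that would violate the `CHECK` constraint leaves no trace, not even its first, already executed update:

```python
import sqlite3


def make_db():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE account ("
                "id INTEGER PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))")
    con.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
    con.commit()
    return con


def transfer(con, src, dst, amount):
    """Move amount from src to dst as one transaction: both updates or neither."""
    try:
        with con:   # commit on success, rollback if an exception escapes
            con.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            con.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:   # CHECK violated -> the credit above is rolled back too
        return False


def balances(con):
    return dict(con.execute("SELECT id, balance FROM account"))


con = make_db()
print(transfer(con, 1, 2, 30), balances(con))    # True {1: 70, 2: 80}
print(transfer(con, 1, 2, 500), balances(con))   # False {1: 70, 2: 80}
```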


Motivation

Atomicity: part of the transaction is done, but we want to cancel it (ABORT/ROLLBACK); or the system crashes during the transaction: some changes made it to the disk, some did not. → UNDO recovery

Durability: the transaction finished and the user was notified (COMMIT), but the system crashes before the changes were successfully sent to disk (asynchronous write). → REDO recovery

Consistency:
- Physical consistency: correctness of the storage and access structures; completely executed modification operations preserve the physical consistency
- Logical consistency: correctness of the data contents, which correspond to a (possible) state of the real world; completely executed transactions preserve the logical consistency:
  - all modifications of finished transactions are included
  - no modifications of open transactions are included
→ UNDO recovery for consistency-related rollbacks

Remember: logical consistency requires physical consistency in the first place.


Reasons for crashes

Transaction error:
- Violation of system restrictions: violation of security regulations, excessive resource requirements, deadlocks
- Application-related errors, e.g., wrong operations and values
→ ROLLBACK

System error: system crash with loss of main-memory contents (database system, operating system, hardware, power failure)

Device error (especially storage-medium error): destruction of secondary storage systems

Catastrophes: destruction of the computing center

| 276

gt

copy A Behrend Foundations of Information Systems |

Guarantee Atomicity amp Durability

Assumptions System may crash but the disk is durable The only atomicity guarantee is that a disk block write is atomic

Materialization strategy Preferred Policy StealNo Force This combination is most complicated but allows for highest performance No Force complicates enforcing Durability What if system crashes before a modified page written by a committed

transaction makes it to disk Write as little as possible in a convenient place at commit time to support

REDOing modifications Steal complicates enforcing Atomicity What if the transaction that performed udpates aborts What if system crashes before transaction is finished Must remember the old value of P (to support UNDOing the write to page P)

copy A Behrend Foundations of Information Systems | 277

gt

|

Storage System

Buffer

File System

Hardware

Data System

Application

TID TID TID

TID TID TID

SELECT sfirstname slastname COUNT(lname) FROM Student s INNER JOIN Program p ON sprogramId = pid INNER JOIN Attendance a ON astudentId = sstudentId INNER JOIN Lecture l ON alectureId = lid GROUP BY sfirstname slastname WHERE pname=lsquoDSErsquo

Run

Buffered Pages - Page replacement strategy - Materialization strategy - Logging Backup Recovery

Paged files

Disks Flash RAID SAN hellip

Storage Structures - Record management - Free space management - Physical access paths

Access System

Data

base

Sys

tem

Data model semantics - System catalog - Record format - Logical access paths

Query processing - Parsing - Plan generation - Plan optimization - Plan execution

1 lsquoSmithrsquo 15061982

Table Person id INT name VARCHAR birthday DATE Index P_id_IX on Personid

Database System

SQL JDBC ODBC hellip

Record Management

| 278

gt

copy A Behrend Foundations of Information Systems |

Record

Record Package of fields that together describe a thing a person a fact etc Each fields represents on property of the entity described by the record Similar to a struct in C Variable length (in contrast to pages)

Record Manager Organizes physical storage of records in pages Operations Get Insert Update Delete Scan Agnostic to record structure and semantic

records considered as byte strings of variable length Structure and content of record is defined be Access System and application

Challenges Record addressing Free space management

| 279

gt

copy A Behrend Foundations of Information Systems |

Record Addressing

Record address Identifier for records used to address records eg in indexes or query processing Assigned during insert of a record

Goals Stability of identifier Fast and direct access Less organizational overhead

Direct addressing Byte address or position number in file or page Instable Byte address If record grows in length following records would get new address Position number Insert and delete operations change series or records

Indirect addressing Surrogate with mapping table (complete indirection) Tuple Identifier (TID concept)

| 280

gt

copy A Behrend Foundations of Information Systems |

Surrogate with Mapping Table

Surrogate Record type + serial number Serial number remains constant during recordrsquos life time

Mapping table Maps

surrogate to page

Problems Where to store mapping table How can it be extended How to search mapping table efficiently

rarr H2 use B-Tree to store mapping table

Mapping Table Surrogate | Page ID

| 281

gt

copy A Behrend Foundations of Information Systems |

TID Concept

Record addressing with indirection inside the page Each page contains an array with record positions TID of a record consist of page id and index in position array

Pros Access with one page access (two pages in case of overflow) Stable No mapping table required

Operations Insert Reuse unused

position or add position Delete Mark position

as unused in array Update Update all

positions in array Update with overflow Store record

as overflow record and store TID of overflow record at original position (No double overflow Update TID at original position)

Record

Overflow Record

| 282

gt

copy A Behrend Foundations of Information Systems |

Free Space Management

Problem In which page is enough space for new record

Solution Free space table lists for all pages how much space is left

Free space value Precise value Ceil(Log2(page size)) =gt 2 bytes for common page size of 4K Rough value use less bytes free space = (value page size)2^(bits per value)

Free space table With direct page addressing Assuming a single page can take n free space entries First page and each (n+1)-th page takes free space entries

With indirect page addressing Free space information stored in page table

copy A Behrend Foundations of Information Systems | 283

gt

|

Storage System

Buffer

File System

Hardware

Data System

Application

TID TID TID

TID TID TID

SELECT sfirstname slastname COUNT(lname) FROM Student s INNER JOIN Program p ON sprogramId = pid INNER JOIN Attendance a ON astudentId = sstudentId INNER JOIN Lecture l ON alectureId = lid GROUP BY sfirstname slastname WHERE pname=lsquoDSErsquo

Run

Buffered Pages - Page replacement strategy - Materialization strategy - Logging Backup Recovery

Paged files

Disks Flash RAID SAN hellip

Storage Structures - Record management - Free space management - Physical access paths

Access System

Data

base

Sys

tem

Data model semantics - System catalog - Record format - Logical access paths

Query processing - Parsing - Plan generation - Plan optimization - Plan execution

1 lsquoSmithrsquo 15061982

Table Person id INT name VARCHAR birthday DATE Index P_id_IX on Personid

Database System

SQL JDBC ODBC hellip

Physical Access Paths ndash Index Structures

| 284

gt

copy A Behrend Foundations of Information Systems |

Primary Index Secondary Index

Overview Indexes

Table scan Read all pages and for each record

evaluate the search criteria Pre-fetching

Index Scan Use index for search criteria

on one or more attributes Fast access to single values or value ranges of index attributes Logicalphysical sorting of values of key attributes (depending on index structure) Enforcing uniqueness

Types if indexes Primary (Clustered) Index

determines physical organization use for PK Secondary (Non-Clustered)

Index redundant access path

Pers(PID NAME AGE SALARY)

Age

Salary

| 285

gt

copy A Behrend Foundations of Information Systems |

Overview Indexes (2)

Choice of Access Paths Index scan Only useful for low selectivity

(low number of result tuples) Break even-point according to the

output ratio of the number of tuples (usually max 5) Requires statistics about data Additional costs for index storage

and updating

Table Scan adequateefficient for small tables

(eg 5 pages) Queries with high selectivity

(large result sets) 100-200MBs sequential read

~ 100 disk seekss

hit rate

Index Scan

Table Scan

access time

| 286

gt

copy A Behrend Foundations of Information Systems |

Classification of Index Structures

Classification

Multiway Trees Tree structure with multiple children per node Idea chose fan out so that node size suits page size

Onedimensional Index Structures

Key Comparison Key Transformation

Sequential Tree-Based Hash-Based

Prefix Trees (Tries)

Binary Search Trees Dynamic Static Linked Lists

(log seq) Seq Lists

(phys seq) Multiway

Trees Example B-Tree

| 287

gt

copy A Behrend Foundations of Information Systems |

B-Tree

free space

(Ki Di Pi) = entry min |P| = k+1 max |P| = 2k+1

keys lt K1 keys gt Kp Ki lt keys lt Ki+1

| 288

gt

copy A Behrend Foundations of Information Systems |

B-Tree (2)

Example Keys Agnostic to specific key semantic Only defined complete order required Could be of fixed or variable length

Operations Search for data for given key value Insertion and deletion of key-data pair

Payload Agnostic to specific data semantic Can be record or reference (TID) or

mix

B-Tree with k = 2 h = 3

| 289

gt

copy A Behrend Foundations of Information Systems |

Search in the B-Tree

Starting at the root node each node is searched from left to right 1) if Ki matches the desired key value the data record has been found (further

records with the same key value might be located in a sub-tree to which Pi-1 points) 2) if Ki is smaller than the desired value the search will be continued in the root of

the sub-tree identified by Pi-1 3) if Ki is larger than the desired value the comparison with Ki+1 is repeated 4) if K2k is also smaller than the desired value the search will be continued in the

sub-tree of P2k If itlsquos impossible to descend further into a sub-tree (2 or 4) (leaf node) The search is aborted no record with the desired key value is found

Search for 38 20 6

| 290

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (1)

Insertion Rule insert only into leaf nodes At Non-Leaf Nodes descend down the tree as for the search S le Ki follow Pi-1 S gt Ki check Ki+1 S gt K2k follow P2k

At Leaf Node Insert the data record according to the sorting order Special case leaf node is full (2k records) rarr split the leaf node

Splitting Generate a new leaf node Split the 2k+1 entries (in order)

into two leaf nodes first k entries rarr left node last k entries rarr right node

middle entry (k+1-th) is used as new ldquodiscriminatorrdquo (branching) and inserted into the parent node

| 291

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (2)

Node Splitting during Insertion Two possible situations after a split The parent node is full rarr repeat split on this level Enough space rarr FINISHED

Special case root split Split of the root node rarr New root with two successor nodes Height of a tree grows by 1 The tree has been split from the bottom to the top

Dynamic reorganization (self-balancing) No unloading or loading necessary Tree is always balanced But In case of many insertions deletions reorganization can be beneficial

| 292

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (3)

Insertion Example Order k = 1 n=2k Keys 1 5 2 6 7 4 8 3

Finally h=3

| 293

gt

copy A Behrend Foundations of Information Systems |

Insertion and Deletion in the B-Tree

Problem Insertion can create overflow Deletion can create underflow and overflow

Example Insertion of key 22

rarr Overflow rarr Split

Deletion of key 22 rarr Underflow need to access all four nodes finally same as input

Insert 22

| 294

gt

copy A Behrend Foundations of Information Systems |

Underflow Merge

Deletion in the B-Tree

Example Order k = 1 n=2k Delete key 3

| 295

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (2)

Example Order k = 1 n=2k Delete key 3

Underflow Merge

Remember Each path from the root to the leaf has the same length h

| 296

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (3)

Example Order k = 1 n=2k Delete key 3

Underflow Merge

Overflow Split

| 297

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (4)

Example Order k = 1 n=2k Delete key 3

Overflow Split

Root Split

| 298

gt

copy A Behrend Foundations of Information Systems |

Deletion Algorithm

Example ndash there are different algorithms Search the node that contains the key K to be deleted If key K is in a leaf node delete the key in the leaf node and handle potentially

resulting underflow by merging with sibling If key K is in an inner node pull up new discriminator from one of the successors Analyze which successor node of K has more elements left or right one

If both have the same number of elements decide for one Replace the key K to be deleted with the direct successor Krsquo from the left

successor node or with the direct successor Krsquorsquo from the right successor node respectively Delete Krsquo or Krsquorsquo from the respective successor node (recursively)

Note Major variants Merge (tis lecture) Re-distribution (instead of splitmerge in case of overflowunderflow the entries are

re-distributed under consideration of one or multiple adjacent nodes)

| 299

gt

copy A Behrend Foundations of Information Systems |

B-Trees B+-Trees and B-Trees

B+-Trees and B-Trees Data is only in leaf nodes Key redundancy but higher fan-out rarr lower tree high less IO Simpler delete procedure rarr requires only merging of nodes

Double linked list of all leaf nodes B-Trees Modified valid node sizes

from [k2k] to [43k2k] rarr better node utilization but more splitsmerges

Example Secondary index Non unique

B-Tree with k = 2 h = 3

| 300

gt

copy A Behrend Foundations of Information Systems |

Indexing Low Cardinality Columns

Problem
- Example: a B-tree on the sex of customers for a table with 1,000,000 tuples results in two lists with approximately 500,000 tuples each
- A query for all female customers requires 500,000 random page accesses (secondary index); a table scan would be much faster

Conclusion
- B-trees (and also hashing) are useful for predicates with low selectivity (output/input cardinality ratio)
- Rule of thumb: the break-even hit rate is approx. 5%; higher hit rates do not justify the effort of an index access

[Figure: B-tree on Sex with the keys F and M, each pointing to a long list of TIDs]
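The page counts behind this example can be compared with a small Python sketch. The value of 10 tuples per page is an illustrative assumption (the example does not give a tuple size), and the functions are hypothetical helpers, not part of any DBMS API. Counting pages only ignores that random accesses are usually more expensive than sequential ones, which pushes the practical break-even point further down.

```python
import math


def table_scan_pages(num_tuples: int, tuples_per_page: int) -> int:
    # A table scan reads every page of the table once, sequentially
    return math.ceil(num_tuples / tuples_per_page)


def secondary_index_pages(num_tuples: int, hit_rate: float) -> int:
    # Worst case for a non-clustered index: one random page access per matching tuple
    return round(num_tuples * hit_rate)


def index_pays_off(num_tuples: int, tuples_per_page: int, hit_rate: float) -> bool:
    # Compares page counts only; sequential vs. random I/O cost is not modeled
    return secondary_index_pages(num_tuples, hit_rate) < table_scan_pages(num_tuples, tuples_per_page)
```

With 1,000,000 tuples, the query for female customers needs about 500,000 page accesses via the index, but only 100,000 page reads for a full scan under the 10-tuples-per-page assumption.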

Bitmap Index

Idea (long history since the 1960s)
- Create a bitmap/bitlist for each attribute value
- Each tuple in the table is assigned to one bit in the bitmap (by position: sequential TID)
- Bit values: 1 = attribute value set, 0 = attribute value not set

Necessary condition: sequential numbering of the tuples (TIDs)

Example table and bitmap index on Sex:

Name     Sex  Region  Race
Carol    f    n       white
Harold   m    e       black
Anne     f    e       asian
Iris     f    ne      white
…        m    se      hisp
…        f    e       white
…        f    sw      asian
…        f    w       black
…        f    n       asian
…        m    e       hisp
…        m    se      black
…        f    s       white
…        m    nw      black
…        f    s       white
…        f    w       black

F: 1 0 1 1 0 1 1 1 1 0 0 1 0 1 1
M: 0 1 0 0 1 0 0 0 0 1 1 0 1 0 0
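The construction rule can be written down in a few lines of Python. This sketch assumes that the sequential TIDs are simply the 0-based positions of the tuples; `build_bitmaps` is a hypothetical helper, and plain lists of 0/1 are used for readability instead of packed bit vectors.

```python
from typing import Dict, List, Sequence


def build_bitmaps(column: Sequence[str]) -> Dict[str, List[int]]:
    """One bitmap per distinct attribute value; bit i belongs to the tuple with TID i."""
    bitmaps: Dict[str, List[int]] = {}
    for tid, value in enumerate(column):
        if value not in bitmaps:
            bitmaps[value] = [0] * len(column)   # all bits 0 initially
        bitmaps[value][tid] = 1                  # value set for this tuple
    return bitmaps


# Sex column of the example table, in sequential TID order
sex = ["f", "m", "f", "f", "m", "f", "f", "f", "f", "m", "m", "f", "m", "f", "f"]
bitmaps = build_bitmaps(sex)
```

The resulting `bitmaps["f"]` and `bitmaps["m"]` are the F and M bitmaps of the example; since every tuple has exactly one value, the bitmaps of one attribute never overlap.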

Querying Bitmap Indexes

Main advantage of bitmap indexes
- Simple and efficient logical combination of bitmaps (e.g. bitwise AND) possible
- Read only data that is relevant for the predicates
- Example: σ Sex='f' ∧ Region='n' (R): combine bitmaps B1 and B2 by conjunction:
  for (i = 0; i < B1.length; i++) B[i] = B1[i] & B2[i]

Example I/O cost estimation: σ Sex='f' ∧ Region='n' ∧ Race='Asian' (R) ("Asian women of region North")
- Selectivity: 1/2 · 1/8 · 1/4 = 1/64
- N = 10,000 tuples with a length of 400 bytes each (~10 tuples per page for 4 kB pages)
- Table scan: 1,000 pages
- Bitmap access: 10,000/64 ≈ 156 pages (worst case: each tuple in a different page), plus 1 page for the bitmaps

F:             1 0 1 1 0 1 1 1 1 0 0 1 0 1 1
N:             0 1 1 0 0 1 0 0 0 1 0 0 0 0 0
A:             0 0 1 0 0 0 1 0 1 0 0 0 0 0 0
F AND N AND A: 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
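The conjunction and the cost estimate can be reproduced with a short Python sketch. Here a bitmap is packed into a Python integer so that `&` combines all bits at once; the helper names are hypothetical, TIDs are assumed to be 0-based positions, and the estimate follows the worst-case assumption stated above (one page access per result tuple).

```python
import math
from typing import List, Tuple


def to_bitmap(bits: List[int]) -> int:
    # Bit i of the integer belongs to the tuple with sequential TID i
    return sum(bit << tid for tid, bit in enumerate(bits))


def matching_tids(bitmap: int) -> List[int]:
    return [tid for tid in range(bitmap.bit_length()) if (bitmap >> tid) & 1]


# Bitmaps from the example (F: Sex='f', N: Region='n', A: Race='Asian')
F = to_bitmap([1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1])
N = to_bitmap([0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0])
A = to_bitmap([0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0])
result = F & N & A   # conjunction of the predicates = bitwise AND of the bitmaps


def io_estimate(n: int, selectivity: float, num_bitmaps: int,
                tuple_size: int = 400, page_size: int = 4096) -> Tuple[int, int, int]:
    tuples_per_page = page_size // tuple_size         # 10 for 400-byte tuples
    scan_pages = math.ceil(n / tuples_per_page)        # full table scan
    hit_pages = round(n * selectivity)                 # worst case: one page per hit
    bitmap_pages = math.ceil(num_bitmaps * math.ceil(n / 8) / page_size)
    return scan_pages, hit_pages, bitmap_pages
```

With the three example bitmaps only the tuple at position 2 (the third tuple) qualifies. `io_estimate(10_000, 1/2 * 1/8 * 1/4, 3)` returns 1,000 pages for the table scan, 156 pages for the bitmap access and 1 page for the three bitmaps of 1,250 bytes each.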

Hardware Developments

- Hardware improvements are not equally distributed: advances in CPU speed have outpaced advances in RAM latency
- Main-memory access has become a performance bottleneck for many computer applications (bandwidth, latency, address translation (TLB)) → memory wall
- Cache memories can reduce the memory latency when the requested data is found in the cache
- Vertically fragmented data structures optimize memory cache usage

Row Storage vs Column Storage

Row storage
+ easy to add/modify a record
- might read unnecessary data

Column storage
+ only need to read in relevant data
- tuple writes require multiple accesses
→ suitable for read-mostly, read-intensive, large data repositories
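To make the trade-off concrete, a minimal Python sketch can contrast the two layouts for a hypothetical schema (id, name, salary). Python lists only model the layout; they do not reproduce the cache and I/O effects that motivate column storage in a real system.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, str, int]            # hypothetical schema: (id, name, salary)
ATTRIBUTES = ("id", "name", "salary")


def to_columns(rows: List[Record]) -> Dict[str, list]:
    # Column storage: one sequence per attribute instead of one per record
    return {attr: [row[i] for row in rows] for i, attr in enumerate(ATTRIBUTES)}


def total_salary_row_store(rows: List[Record]) -> int:
    # Row storage: every complete record is visited although only one field is needed
    return sum(row[2] for row in rows)


def total_salary_column_store(columns: Dict[str, list]) -> int:
    # Column storage: only the relevant attribute is read
    return sum(columns["salary"])


def insert_column_store(columns: Dict[str, list], row: Record) -> None:
    # A single tuple write has to touch every column separately
    for attr, value in zip(ATTRIBUTES, row):
        columns[attr].append(value)
```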

Processing Models

[Marcin Zukowski, Peter A. Boncz, Niels Nes, Sándor Héman: MonetDB/X100 - A DBMS In The CPU Cache. IEEE Data Eng. Bull. 28(2), pp. 17-22, 2005]

Transaction Management

Principle of a transaction
- Sequence of successive DB operations that transforms a database from a consistent state into another consistent state, surrounded by BOT and EOT (commit/abort)

Properties: ACID (Atomicity, Consistency, Isolation, Durability)

A transaction will always come to an end:
- Normal (commit): changes are permanently stored within the DB
- Abnormal (abort/rollback): changes already made are taken back

Note: the state at EOT need not differ from the state at BOT (e.g. after an abort)

[Figure: consistent DB at BOT (begin of transaction) → DML operations, during which the database is possibly inconsistent → consistent DB at EOT (end of transaction)]

ACID Properties of Transactions

Atomicity
- Indivisibility due to the transaction definition (begin - end)
- All-or-nothing principle, i.e. the DBS guarantees either the complete execution of a transaction or the ineffectiveness of the whole transaction (and of all associated operations)

Consistency
- A successful transaction guarantees that all consistency requirements (integrity requirements) have been met

Isolation
- Multiple transactions run isolated from each other and do not use (inconsistent) intermediate results of other transactions

Durability
- All results of successful transactions have to be made persistent
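Atomicity and consistency can be observed with any SQL system. The sketch below uses Python's built-in `sqlite3` module with an in-memory SQLite database; the `account` table, its CHECK constraint (standing in for an integrity requirement) and the helper functions are hypothetical. Used as a context manager, a `sqlite3` connection commits the enclosed statements if they succeed and rolls them back if an exception is raised.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # The CHECK constraint plays the role of an integrity requirement
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, "
                 "balance INTEGER NOT NULL CHECK (balance >= 0))")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> None:
    # The with-block forms one transaction: COMMIT if the block succeeds,
    # ROLLBACK (including the first UPDATE) if any statement raises
    with conn:
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT id, balance FROM account"))
```

A transfer that would make a balance negative fails on the second UPDATE with `sqlite3.IntegrityError`; the rollback also undoes the first UPDATE, so neither account changes.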

Motivation

Atomicity
- Part of the transaction is done, but we want to cancel it (ABORT/ROLLBACK)
- The system crashes during the transaction: some changes made it to the disk, some did not
→ UNDO recovery

Durability
- Transaction finished, user notified of the COMMIT
- The system crashes before the changes were successfully written to disk (asynchronous write)
→ REDO recovery

Consistency
- Physical consistency: correctness of the storage and access structures; completely executed modification operations preserve the consistency
- Logical consistency: correctness of the data contents, which correspond to a (possible) state of the real world; completely executed transactions preserve the logical consistency
  - All modifications of finished transactions are included
  - No modifications of open transactions are included
- Remember: logical consistency requires physical consistency in the first place
→ UNDO recovery for consistency-related rollbacks

Reasons for crashes

Transaction error
- Violation of system restrictions: violation of security regulations, excessive resource requirements, deadlocks
- Application-related errors, e.g. wrong operations and values → ROLLBACK

System error
- System crash with loss of main-memory contents: database system, operating system, hardware, power failure

Device error (especially storage-medium error)
- Destruction of secondary storage systems

Catastrophes
- Destruction of the computing center

Guarantee Atomicity & Durability

Assumptions
- The system may crash, but the disk is durable
- The only atomicity guarantee is that a disk block write is atomic

Materialization strategy
- Preferred policy: Steal/No-Force
- This combination is the most complicated, but allows for the highest performance

No-Force (modified pages are not forced to disk at commit) complicates enforcing Durability
- What if the system crashes before a modified page written by a committed transaction makes it to disk?
- Write as little as possible, in a convenient place, at commit time to support REDOing modifications

Steal (a buffer page P modified by an uncommitted transaction may be written to disk) complicates enforcing Atomicity
- What if the transaction that performed the updates aborts?
- What if the system crashes before the transaction is finished?
- Must remember the old value of P (to support UNDOing the write to page P)
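The consequences of Steal/No-Force for restart can be sketched as log-based recovery in the style of a write-ahead log: every update record carries a before image (for UNDO) and an after image (for REDO). This is a simplified sketch with hypothetical names; it assumes the log reaches the disk before a commit is acknowledged and before a stolen page is written, and it ignores checkpoints, log sequence numbers and partial page writes.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class LogRecord:
    txn: str
    kind: str                    # "update" or "commit"
    page: Optional[str] = None
    before: Any = None           # before image: needed for UNDO because of Steal
    after: Any = None            # after image: needed for REDO because of No-Force


def recover(disk: Dict[str, Any], log: List[LogRecord]) -> Dict[str, Any]:
    """Restart after a crash: REDO committed work, then UNDO uncommitted work."""
    committed = {r.txn for r in log if r.kind == "commit"}
    # REDO: re-apply updates of committed transactions in log order
    for r in log:
        if r.kind == "update" and r.txn in committed:
            disk[r.page] = r.after
    # UNDO: roll back updates of uncommitted transactions in reverse log order
    for r in reversed(log):
        if r.kind == "update" and r.txn not in committed:
            disk[r.page] = r.before
    return disk
```

In a crash scenario where committed T1's page P1 never reached the disk (No-Force) while uncommitted T2's page P2 was already written (Steal), recovery restores P1 to T1's new value and P2 to its old value.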

Record Management

[Figure: architectural blueprint of the database system (application; data system, access system, storage system; buffer; file system; hardware) with an example record (1, 'Smith', 15.06.1982) of table Person(id INT, name VARCHAR, birthday DATE) and index P_id_IX on Person.id]

Record

Record
- Package of fields that together describe a thing, a person, a fact, etc.
- Each field represents one property of the entity described by the record
- Similar to a struct in C
- Variable length (in contrast to pages)

Record manager
- Organizes the physical storage of records in pages
- Operations: Get, Insert, Update, Delete, Scan
- Agnostic to record structure and semantics: records are considered as byte strings of variable length; the structure and content of a record are defined by the access system and the application

Challenges
- Record addressing
- Free space management

Record Addressing

Record address
- Identifier for records, used to address records, e.g. in indexes or query processing
- Assigned during the insert of a record

Goals
- Stability of the identifier
- Fast and direct access
- Little organizational overhead

Direct addressing
- Byte address or position number in file or page
- Unstable:
  - Byte address: if a record grows in length, the following records get a new address
  - Position number: insert and delete operations change the sequence of records

Indirect addressing
- Surrogate with mapping table (complete indirection)
- Tuple identifier (TID concept)

Surrogate with Mapping Table

Surrogate
- Record type + serial number
- The serial number remains constant during the record's lifetime

Mapping table
- Maps surrogate to page

Problems
- Where to store the mapping table? How can it be extended? How can it be searched efficiently?
→ H2 uses a B-tree to store the mapping table

[Figure: mapping table with the columns Surrogate | Page ID]

TID Concept

Record addressing with indirection inside the page
- Each page contains an array with record positions
- The TID of a record consists of the page id and the index in the position array

Pros
- Access with one page access (two pages in case of overflow)
- Stable
- No mapping table required

Operations
- Insert: reuse an unused position or add a new position
- Delete: mark the position as unused in the array
- Update: update the positions in the array of all records that are shifted within the page
- Update with overflow: store the record as an overflow record and store the TID of the overflow record at the original position (no double overflow: on a further overflow, update the TID at the original position)

[Figure: record at its original position forwarding to an overflow record on another page]
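The TID operations can be sketched with a tiny in-memory record manager. Everything here is a hypothetical model: pages are Python objects with a fixed byte capacity for record data (the position array itself is not counted), free space is found by first fit, and a forward TID stored at the original position marks an overflow record.

```python
from typing import List, Tuple, Union

TID = Tuple[int, int]                 # (page id, index in the position array)
Slot = Union[bytes, TID, None]        # record, forward TID, or unused position


class Page:
    def __init__(self, page_id: int, capacity: int) -> None:
        self.page_id = page_id
        self.capacity = capacity      # bytes for record data (slot array not counted)
        self.slots: List[Slot] = []   # position array

    def free(self) -> int:
        return self.capacity - sum(len(s) for s in self.slots if isinstance(s, bytes))


class RecordManager:
    def __init__(self, num_pages: int, capacity: int) -> None:
        self.pages = [Page(i, capacity) for i in range(num_pages)]

    def _store(self, data: bytes, exclude: int = -1) -> TID:
        for page in self.pages:                          # first fit
            if page.page_id != exclude and page.free() >= len(data):
                for i, slot in enumerate(page.slots):
                    if slot is None:                     # reuse an unused position
                        page.slots[i] = data
                        return (page.page_id, i)
                page.slots.append(data)                  # or add a new position
                return (page.page_id, len(page.slots) - 1)
        raise MemoryError("no page with enough free space")

    def insert(self, data: bytes) -> TID:
        return self._store(data)

    def get(self, tid: TID) -> bytes:
        entry = self.pages[tid[0]].slots[tid[1]]
        if isinstance(entry, tuple):                     # forward TID: one more page access
            entry = self.pages[entry[0]].slots[entry[1]]
        return entry

    def update(self, tid: TID, data: bytes) -> None:
        home = self.pages[tid[0]]
        old = home.slots[tid[1]]
        old_len = len(old) if isinstance(old, bytes) else 0
        if home.free() + old_len >= len(data):
            new_entry = data                             # fits: stays in the home page
        else:
            # Overflow: store the record on another page and keep only its TID
            # at the original position (a second overflow just replaces that TID)
            new_entry = self._store(data, exclude=home.page_id)
        if isinstance(old, tuple):                       # drop the previous overflow record
            self.pages[old[0]].slots[old[1]] = None
        home.slots[tid[1]] = new_entry

    def delete(self, tid: TID) -> None:
        entry = self.pages[tid[0]].slots[tid[1]]
        if isinstance(entry, tuple):
            self.pages[entry[0]].slots[entry[1]] = None
        self.pages[tid[0]].slots[tid[1]] = None          # mark position as unused
```

The TID returned by `insert` stays valid across updates: a growing record moves to an overflow page, but readers still start at the original position and need at most one extra page access.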

Free Space Management

Problem: in which page is there enough space for a new record?

Solution: a free space table lists for all pages how much space is left

Free space value
- Precise value: ceil(log2(page size)) bits → 2 bytes (12 bits) for the common page size of 4 KB
- Rough value: use fewer bits; free space = (value · page size) / 2^(bits per value)

Free space table
- With direct page addressing: assuming a single page can take n free space entries, the first page and each (n+1)-th page hold free space entries
- With indirect page addressing: free space information is stored in the page table
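The free space encoding and the page layout for direct addressing can be sketched as follows. The function names, the 4 KB default page size and the choice to round down when encoding (so that a stored value never promises more space than is available) are assumptions of this sketch.

```python
import math
from typing import Dict, Optional

PAGE_SIZE = 4096


def precise_bits(page_size: int = PAGE_SIZE) -> int:
    # Bits for an exact free-space value: ceil(log2(page size))
    return math.ceil(math.log2(page_size))


def encode_rough(free_bytes: int, bits: int, page_size: int = PAGE_SIZE) -> int:
    # Round down and cap at the largest code, so the value never overstates free space
    return min(free_bytes * 2 ** bits // page_size, 2 ** bits - 1)


def decode_rough(value: int, bits: int, page_size: int = PAGE_SIZE) -> int:
    # free space = (value * page size) / 2^(bits per value)
    return value * page_size // 2 ** bits


def find_page(table: Dict[int, int], needed: int, bits: int) -> Optional[int]:
    # First page whose (conservative) free-space estimate is large enough
    for page_no, value in sorted(table.items()):
        if decode_rough(value, bits) >= needed:
            return page_no
    return None


def fsi_page_for(page_no: int, n: int) -> int:
    # Direct page addressing: page 0 and every (n+1)-th page hold the free-space
    # entries for the n pages that follow them
    return page_no - page_no % (n + 1)
```

With 4 bits per entry the granularity is 4096 / 16 = 256 bytes: 1,000 free bytes are stored as 3 and read back as 768 bytes, a lower bound of the real free space.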

Physical Access Paths – Index Structures

[Figure: architectural blueprint of the database system (application; data system, access system, storage system; buffer; file system; hardware) with the example table Person(id INT, name VARCHAR, birthday DATE), record (1, 'Smith', 15.06.1982) and index P_id_IX on Person.id]

Overview Indexes

Table scan
- Read all pages and evaluate the search criteria for each record
- Pre-fetching

Index scan
- Use an index for search criteria on one or more attributes
- Fast access to single values or value ranges of index attributes
- Logical/physical sorting of the values of key attributes (depending on the index structure)
- Enforcing uniqueness

Types of indexes
- Primary (clustered) index: determines the physical organization; used for the PK
- Secondary (non-clustered) index: redundant access path

[Figure: relation Pers(PID, NAME, AGE, SALARY) with a primary index and secondary indexes on Age and Salary]

Overview Indexes (2)

Choice of access paths

Index scan
- Only useful for low selectivity (low number of result tuples)
- Break-even point according to the output ratio of the number of tuples (usually max. 5%)
- Requires statistics about the data
- Additional costs for index storage and updating

Table scan
- Adequate/efficient for small tables (e.g. 5 pages)
- Queries with high selectivity (large result sets)
- 100-200 MB/s sequential read vs. ~100 disk seeks/s

[Figure: access time over hit rate for index scan and table scan]

Classification of Index Structures

Multiway trees
- Tree structure with multiple children per node
- Idea: choose the fan-out so that the node size suits the page size

[Figure: classification of one-dimensional index structures into key comparison (sequential: sequential lists (physically sequential), linked lists (logically sequential); tree-based: binary search trees, multiway trees, e.g. the B-tree) and key transformation (hash-based: static, dynamic), plus prefix trees (tries)]

B-Tree

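[Figure: B-tree node consisting of entries and free space]
- Entry = (Ki, Di, Pi): key, data, pointer
- Pointers per node: min |P| = k+1, max |P| = 2k+1
- Sub-tree ranges: keys < K1 (left of K1); Ki < keys < Ki+1 (between Ki and Ki+1); keys > Kp (right of the last key Kp)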

B-Tree (2)

Keys
- Agnostic to specific key semantics
- Only a defined complete (total) order is required
- Could be of fixed or variable length

Operations
- Search for data for a given key value
- Insertion and deletion of key-data pairs

Payload
- Agnostic to specific data semantics
- Can be a record, a reference (TID), or a mix

[Figure: example B-tree with k = 2, h = 3]

Search in the B-Tree

Starting at the root node, each node is searched from left to right:
1) If Ki matches the desired key value, the data record has been found (further records with the same key value might be located in the sub-tree to which Pi-1 points)
2) If Ki is larger than the desired value, the search is continued in the root of the sub-tree identified by Pi-1
3) If Ki is smaller than the desired value, the comparison is repeated with Ki+1
4) If the last key Kp (K2k in a full node) is also smaller than the desired value, the search is continued in the sub-tree of Pp (P2k)

If it is impossible to descend further into a sub-tree in case 2 or 4 (leaf node), the search is aborted: no record with the desired key value is found.

[Figure: example searches for 38, 20 and 6]
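The search procedure can be sketched in Python. In this hypothetical `Node` representation, `children[i]` is the pointer to the left of `keys[i]`, which corresponds to Pi-1 next to Ki in the 1-based notation used above; only keys are stored, without payload.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    keys: List[int]
    children: List["Node"] = field(default_factory=list)  # empty for leaf nodes


def search(root: Node, key: int) -> bool:
    node: Optional[Node] = root
    while node is not None:
        for i, k in enumerate(node.keys):        # scan the node from left to right
            if key == k:
                return True                      # 1) match found
            if key < k:                          # 2) K_i larger: pointer left of K_i
                node = node.children[i] if node.children else None
                break
            # 3) K_i smaller: continue with the next key
        else:                                    # 4) larger than the last key
            node = node.children[-1] if node.children else None
    return False                                 # cannot descend further (leaf)
```

For a tree with root 4, inner nodes 2 and 6 and leaves 1, 3, 5 and 7 8, searching 7 succeeds after three node accesses, while searching 9 ends unsuccessfully in the rightmost leaf.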

Insertions in the B-Tree (1)

Insertion
- Rule: insert only into leaf nodes

At non-leaf nodes
- Descend down the tree as for the search: S ≤ Ki → follow Pi-1; S > Ki → check Ki+1; S > K2k → follow P2k

At the leaf node
- Insert the data record according to the sorting order
- Special case: the leaf node is full (2k records) → split the leaf node

Splitting
- Generate a new leaf node
- Split the 2k+1 entries (in order) into two leaf nodes: first k entries → left node, last k entries → right node
- The middle entry ((k+1)-th) is used as the new "discriminator" (branching) and inserted into the parent node

Insertions in the B-Tree (2)

Node splitting during insertion
- Two possible situations after a split:
  - The parent node is full → repeat the split on this level
  - Enough space → finished
- Special case: root split
  - Split of the root node → new root with two successor nodes
  - The height of the tree grows by 1
  - The tree has been split from the bottom to the top

Dynamic reorganization (self-balancing)
- No unloading or loading necessary
- The tree is always balanced
- But: in case of many insertions/deletions, a reorganization can be beneficial

Insertions in the B-Tree (3)

Insertion example: order k = 1 (n = 2k), keys 1, 5, 2, 6, 7, 4, 8, 3

[Figure: step-by-step insertion; finally h = 3]
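The insertion and splitting rules can be combined into a small Python sketch. It is a keys-only illustration with hypothetical names (`Node`, `BTree`) that uses the descent rule S ≤ Ki → Pi-1 from the insertion slide; running it on the example keys reproduces the final height h = 3.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)  # empty for leaf nodes


class BTree:
    def __init__(self, k: int) -> None:
        self.k = k            # order: nodes (except the root) hold between k and 2k keys
        self.root = Node()

    def insert(self, key: int) -> None:
        split = self._insert(self.root, key)
        if split is not None:
            # Root split: new root with two successor nodes, height grows by 1
            middle, right = split
            self.root = Node([middle], [self.root, right])

    def _insert(self, node: Node, key: int) -> Optional[Tuple[int, Node]]:
        # Descend as for the search: S <= K_i follows the pointer left of K_i
        i = 0
        while i < len(node.keys) and key > node.keys[i]:
            i += 1
        if not node.children:
            node.keys.insert(i, key)          # insert only into leaf nodes, in order
        else:
            split = self._insert(node.children[i], key)
            if split is None:
                return None
            middle, right = split             # child was split: add the discriminator
            node.keys.insert(i, middle)
            node.children.insert(i + 1, right)
        if len(node.keys) <= 2 * self.k:
            return None                       # enough space: finished
        # Overflow with 2k+1 entries: first k stay, last k move to a new node,
        # the middle ((k+1)-th) entry goes up to the parent
        k = self.k
        right = Node(node.keys[k + 1:], node.children[k + 1:])
        middle = node.keys[k]
        node.keys, node.children = node.keys[:k], node.children[:k + 1]
        return middle, right

    def height(self) -> int:
        h, node = 1, self.root
        while node.children:
            h, node = h + 1, node.children[0]
        return h
```

After inserting 1, 5, 2, 6, 7, 4, 8, 3 with k = 1, the root holds 4, its children hold 2 and 6, and the leaves hold 1, 3, 5 and 7 8; the last insertion (key 3) causes a leaf split, a split of the full root [2, 4, 6] and thus a root split.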

Insertion and Deletion in the B-Tree

Problem
- Insertion can create an overflow
- Deletion can create an underflow and (after merging) an overflow

Example
- Insertion of key 22 → overflow → split
- Deletion of key 22 → underflow: all four nodes need to be accessed; the result is finally the same as the input

[Figure: insert 22]

Deletion in the B-Tree

Example: order k = 1 (n = 2k), delete key 3

[Figure: underflow → merge]

Deletion in the B-Tree (2)

Example: order k = 1 (n = 2k), delete key 3

[Figure: underflow → merge]

Remember: each path from the root to a leaf has the same length h

| 296

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (3)

Example Order k = 1 n=2k Delete key 3

Underflow Merge

Overflow Split

| 297

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (4)

Example Order k = 1 n=2k Delete key 3

Overflow Split

Root Split

| 298

gt

copy A Behrend Foundations of Information Systems |

Deletion Algorithm

Example ndash there are different algorithms Search the node that contains the key K to be deleted If key K is in a leaf node delete the key in the leaf node and handle potentially

resulting underflow by merging with sibling If key K is in an inner node pull up new discriminator from one of the successors Analyze which successor node of K has more elements left or right one

If both have the same number of elements decide for one Replace the key K to be deleted with the direct successor Krsquo from the left

successor node or with the direct successor Krsquorsquo from the right successor node respectively Delete Krsquo or Krsquorsquo from the respective successor node (recursively)

Note Major variants Merge (tis lecture) Re-distribution (instead of splitmerge in case of overflowunderflow the entries are

re-distributed under consideration of one or multiple adjacent nodes)

| 299

gt

copy A Behrend Foundations of Information Systems |

B-Trees B+-Trees and B-Trees

B+-Trees and B-Trees Data is only in leaf nodes Key redundancy but higher fan-out rarr lower tree high less IO Simpler delete procedure rarr requires only merging of nodes

Double linked list of all leaf nodes B-Trees Modified valid node sizes

from [k2k] to [43k2k] rarr better node utilization but more splitsmerges

Example Secondary index Non unique

B-Tree with k = 2 h = 3

| 300

gt

copy A Behrend Foundations of Information Systems |

Indexing Low Cardinality Columns

Problem Example B-tree on the sex of customers for a table with 1000000 tuples results in

two lists with approximately 500000 tuples each

Query for all female customers requires 500000 random page accesses (secondary index) Table scan would be much faster

Conclusion B-trees (and also hashing) are useful for predicates with low selectivity

(outputinput cardinality ratio) Rule of thumb margin hit rate is approx 5 higher hit rates do not justify the efforts for an index access

F M

TID TID TID TID hellip TID TID TID TID hellip

| 301

gt

copy A Behrend Foundations of Information Systems |

Bitmap Index

Idea (Long history since the 1960s) Create a bitmapbitlist for each

attribute value Each tuple in the table is assigned to

one bit in the bitmap (by position sequential TID)

Bit values 1 attribute value set 0 attribute value not set

Necessary condition Sequential numbering of the tuples

(TIDs)

F M

1 0 1 1 0 1 1 1 1 0 0 1 0 1 1

0 1 0 0 1 0 0 0 0 1 1 0 1 0 0

Name Sex Region Race Carol f n white

Harold m e black Anne f e asian

Iris f ne white hellip m se hisp hellip f e white hellip f sw asian hellip f w black hellip f n asian hellip m e hisp hellip m se black hellip f s white hellip m nw black hellip f s white hellip f w black

Sex

| 302

gt

copy A Behrend Foundations of Information Systems |

Querying Bitmap Indexes

Main advantage of bitmap indexes Simple and efficient logical join possible Read only data that is relevant for predicates Example σSex=lsquoflsquo ᴧ Region=lsquonlsquo R Bitmaps B1 and B2 in conjunction for (i=0 iltB1length i++) B = B1[i] amp B2[i]

Example IO Costs Estimation σSex=lsquoflsquo ᴧ Region=lsquonlsquo ᴧ Race=lsquoAsianlsquo R

(ldquoAsian women of region Northrdquo) Selectivity 12 18 14 = 164 N=10000 tuples with length of 400 bytes each

(~ 10 tuples per page for 4kB pages) Table scan 1000 pages Bitmap access 1000064 156 pages (worst case each tuple in a different page)

plus 1 page for bitmaps

F 1 0 1 1 0 1 1 1 1 0 0 1 0 1 1

N 0 1 1 0 0 1 0 0 0 1 0 0 0 0 0

A 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0

0 0 1 0 0 0 0 0 0 0 0 0 0 0 0

AND = AND

  • DBMS Architecture
  • What is in the Lecture
  • How is Database System build
  • Architectural Blue Print
  • Architectural Trends
  • Different Access Characteristics
  • Hardware Developments
  • Row Storage vs Column Storage
  • Processing Models
  • Transaction Management
  • ACID Properties of Transactions
  • Motivation
  • Reasons for crashes
  • Guarantee Atomicity amp Durability
  • Record Management
  • Record
  • Record Addressing
  • Surrogate with Mapping Table
  • TID Concept
  • Free Space Management
  • Physical Access Paths ndash Index Structures
  • Overview Indexes
  • Overview Indexes (2)
  • Classification of Index Structures
  • B-Tree
  • B-Tree (2)
  • Search in the B-Tree
  • Insertions in the B-Tree (1)
  • Insertions in the B-Tree (2)
  • Insertions in the B-Tree (3)
  • Insertion and Deletion in the B-Tree
  • Deletion in the B-Tree
  • Deletion in the B-Tree (2)
  • Deletion in the B-Tree (3)
  • Deletion in the B-Tree (4)
  • Deletion Algorithm
  • B-Trees B+-Trees and B-Trees
  • Indexing Low Cardinality Columns
  • Bitmap Index
  • Querying Bitmap Indexes
Page 8: Foundations of Information Systems 5 DBMS Architecturepages.iai.uni-bonn.de/behrend_andreas/lehre/FIM/WS16/11... · 2017-01-10 · Main-memory access has become a performance bottleneck

| 270

gt

copy A Behrend Foundations of Information Systems |

Row Storage vs Column Storage

Row Storage + easy to addmodify a record - might read unnecessary data

Column Storage + only need to read in relevant data - tuple writes require multiple accesses -gt suitable for read-mostly read-intensive large data repositories

| 271

gt

copy A Behrend Foundations of Information Systems |

Processing Models

[Marcin Zukowski Peter A Boncz Niels Nes Saacutendor Heacuteman MonetDBX100 - A DBMS In The CPU Cache IEEE Data Eng Bull 28(2) p17-22 2005]

| 272

gt

copy A Behrend Foundations of Information Systems |

Transaction Management

Principle of a transaction Sequence of successive DB operations that transform a database from a consistent

state into another consistent state surrounded by BOT EOT (Commit Abort)

Properties ACID Atomicity Consistency Isolation Durability A transaction will always come to an end Normal (commit) changes are permanently stored within the DB Abnormal (abort rollback) already composed changes are taken back

Note EOT state must not be different from BOT state

BOT(begin of transaction)

EOT(end of transaction)

possibly inconsistent database

consistentdatabase

consistentdatabase

DB DB

DML operations

| 273

gt

copy A Behrend Foundations of Information Systems |

ACID Properties of Transactions

Atomicity Indivisibility due to the transaction definition (Begin - End) All-or-nothing principle ie the DBS guarantees Either the complete execution of a transaction hellip hellip or the ineffectiveness of the whole transaction (and of all associated operations)

Consistency A successful transaction guarantees that all consistency requirements (integrity

requirements) have been met

Isolation Multiple transactions run isolated from each other and do not use (inconsistent)

intermediate results from other transactions

Durability All results of successful transactions have to be made persistent

| 274

gt

copy A Behrend Foundations of Information Systems |

Motivation

Atomicity Part of the transaction is done but we want to cancel it ABORTROLLBACK System crashes during transaction some changes made it to the disk some did not

Durability Transaction finished user notified COMMIT System crashes before changes sent successfully to disk (asynchronous write)

Consistency Physical consistency Correctness of the storage and access structures Completely executed modification operations preserve the consistency

Logical consistency Correctness of data contents ndash correspond to a (possible) state of the real world Completely executed transactions preserve the logical consistency

- All modifications of finished transactions are included - No modifications of open transactions are included

Remember Logical consistency requires physical consistency in the first place

UNDO Recovery

REDO Recovery

UNDO Recovery for consistency-related rollbacks

| 275

gt

copy A Behrend Foundations of Information Systems |

Reasons for crashes

Transaction error Violation of system restrictions Violation of security regulations Excessive resource requirements deadlocks

Application-related errors eg wrong operations and values ROLLBACK

System error System crash with loss of main-memory contents Database system operating system hardware power failure

Device error (especially storage-medium error) Destruction of secondary storage systems

Catastrophes Destruction of the computing center

| 276

gt

copy A Behrend Foundations of Information Systems |

Guarantee Atomicity amp Durability

Assumptions System may crash but the disk is durable The only atomicity guarantee is that a disk block write is atomic

Materialization strategy Preferred Policy StealNo Force This combination is most complicated but allows for highest performance No Force complicates enforcing Durability What if system crashes before a modified page written by a committed

transaction makes it to disk Write as little as possible in a convenient place at commit time to support

REDOing modifications Steal complicates enforcing Atomicity What if the transaction that performed udpates aborts What if system crashes before transaction is finished Must remember the old value of P (to support UNDOing the write to page P)

copy A Behrend Foundations of Information Systems | 277

gt

|

Storage System

Buffer

File System

Hardware

Data System

Application

TID TID TID

TID TID TID

SELECT sfirstname slastname COUNT(lname) FROM Student s INNER JOIN Program p ON sprogramId = pid INNER JOIN Attendance a ON astudentId = sstudentId INNER JOIN Lecture l ON alectureId = lid GROUP BY sfirstname slastname WHERE pname=lsquoDSErsquo

Run

Buffered Pages - Page replacement strategy - Materialization strategy - Logging Backup Recovery

Paged files

Disks Flash RAID SAN hellip

Storage Structures - Record management - Free space management - Physical access paths

Access System

Data

base

Sys

tem

Data model semantics - System catalog - Record format - Logical access paths

Query processing - Parsing - Plan generation - Plan optimization - Plan execution

1 lsquoSmithrsquo 15061982

Table Person id INT name VARCHAR birthday DATE Index P_id_IX on Personid

Database System

SQL JDBC ODBC hellip

Record Management

| 278

gt

copy A Behrend Foundations of Information Systems |

Record

Record Package of fields that together describe a thing a person a fact etc Each fields represents on property of the entity described by the record Similar to a struct in C Variable length (in contrast to pages)

Record Manager Organizes physical storage of records in pages Operations Get Insert Update Delete Scan Agnostic to record structure and semantic

records considered as byte strings of variable length Structure and content of record is defined be Access System and application

Challenges Record addressing Free space management

| 279

gt

copy A Behrend Foundations of Information Systems |

Record Addressing

Record address Identifier for records used to address records eg in indexes or query processing Assigned during insert of a record

Goals Stability of identifier Fast and direct access Less organizational overhead

Direct addressing Byte address or position number in file or page Instable Byte address If record grows in length following records would get new address Position number Insert and delete operations change series or records

Indirect addressing Surrogate with mapping table (complete indirection) Tuple Identifier (TID concept)

| 280

gt

copy A Behrend Foundations of Information Systems |

Surrogate with Mapping Table

Surrogate Record type + serial number Serial number remains constant during recordrsquos life time

Mapping table Maps

surrogate to page

Problems Where to store mapping table How can it be extended How to search mapping table efficiently

rarr H2 use B-Tree to store mapping table

Mapping Table Surrogate | Page ID

| 281

gt

copy A Behrend Foundations of Information Systems |

TID Concept

Record addressing with indirection inside the page Each page contains an array with record positions TID of a record consist of page id and index in position array

Pros Access with one page access (two pages in case of overflow) Stable No mapping table required

Operations Insert Reuse unused

position or add position Delete Mark position

as unused in array Update Update all

positions in array Update with overflow Store record

as overflow record and store TID of overflow record at original position (No double overflow Update TID at original position)

Record

Overflow Record

| 282

gt

copy A Behrend Foundations of Information Systems |

Free Space Management

Problem In which page is enough space for new record

Solution Free space table lists for all pages how much space is left

Free space value Precise value Ceil(Log2(page size)) =gt 2 bytes for common page size of 4K Rough value use less bytes free space = (value page size)2^(bits per value)

Free space table With direct page addressing Assuming a single page can take n free space entries First page and each (n+1)-th page takes free space entries

With indirect page addressing Free space information stored in page table

copy A Behrend Foundations of Information Systems | 283

gt

|

Physical Access Paths – Index Structures

Overview Indexes

Table scan: read all pages and evaluate the search criteria for each record (benefits from pre-fetching)

Index scan: use an index for search criteria on one or more attributes
- Fast access to single values or value ranges of the index attributes
- Logical/physical sorting of the values of the key attributes (depending on the index structure)
- Enforcing uniqueness

Types of indexes:
- Primary (clustered) index: determines the physical organization; used for the PK
- Secondary (non-clustered) index: redundant access path

[Figure: primary index and secondary indexes on Age and Salary for Pers(PID, NAME, AGE, SALARY)]

Overview Indexes (2)

Choice of access paths:
- Index scan: only useful for low selectivity (low number of result tuples)
  - Break-even point determined by the ratio of output to input tuples (usually at most 5%)
  - Requires statistics about the data
  - Additional costs for index storage and updating
- Table scan: adequate/efficient for small tables (e.g., 5 pages) and for queries with high selectivity (large result sets)

[Figure: access time vs. hit rate for index scan and table scan; 100-200 MB/s sequential read vs. ~100 disk seeks/s]

Classification of Index Structures

Multiway trees: tree structure with multiple children per node. Idea: choose the fan-out so that the node size matches the page size.

[Figure: classification of one-dimensional index structures into key comparison (sequential, tree-based) and key transformation (hash-based); shown are sequential lists (physically sequential), linked lists (logically sequential), binary search trees, multiway trees (example: B-Tree), static and dynamic variants, and prefix trees (tries)]
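A rough worked estimate of why matching the node size to the page size pays off. The 8-byte keys and 8-byte pointers are assumed example sizes, page headers are ignored, and the height is only approximated by the smallest h with fan-out^h ≥ number of keys.

```python
def fan_out(page_size, key_size, pointer_size):
    # one key plus one child pointer per entry (page header ignored)
    return page_size // (key_size + pointer_size)


def levels_needed(n, f):
    """Smallest h with f**h >= n: a rough estimate of the tree height."""
    h, reach = 1, f
    while reach < n:
        h += 1
        reach *= f
    return h
```

Under these assumptions, a 4 KB node has a fan-out of 256, so about 3 levels (page accesses) cover a million keys, whereas a binary search tree (fan-out 2) needs about 20 levels.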

B-Tree

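[Figure: B-Tree node consisting of entries and free space]

Entry: (Ki, Di, Pi) with key Ki, data Di and pointer Pi to a sub-tree
Pointers per node: min |P| = k+1, max |P| = 2k+1 (the root is exempt from the minimum)
Sub-tree key ranges: left of K1: keys < K1; between Ki and Ki+1: Ki < keys < Ki+1; right of the last key Kp: keys > Kp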
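The node constraints above can be turned into a checker. A minimal sketch, assuming unique keys, a non-empty tree, a hypothetical `Node` class holding a sorted key list and a child list (empty for leaves), and omitting the data entries Di; the root may hold between 1 and 2k keys.

```python
class Node:  # hypothetical node representation
    def __init__(self, keys, children=None):
        self.keys = keys                  # K1 < K2 < ... (data entries Di omitted)
        self.children = children or []   # pointers P0 ... Pb; empty for a leaf


def check_btree(root, k):
    """Return the height h if the tree satisfies the B-Tree invariants
    for order k, otherwise None. Assumes unique keys."""
    leaf_levels = set()

    def valid(node, lo, hi, level, is_root):
        n = len(node.keys)
        min_keys = 1 if is_root else k    # k+1 .. 2k+1 pointers, root exempt
        if not min_keys <= n <= 2 * k:
            return False
        if any(a >= b for a, b in zip(node.keys, node.keys[1:])):
            return False                  # keys must be strictly increasing
        if lo is not None and node.keys[0] <= lo:
            return False                  # Ki < keys < Ki+1 for the sub-tree
        if hi is not None and node.keys[-1] >= hi:
            return False
        if not node.children:
            leaf_levels.add(level)
            return True
        if len(node.children) != n + 1:
            return False
        bounds = [lo] + node.keys + [hi]
        return all(valid(child, bounds[i], bounds[i + 1], level + 1, False)
                   for i, child in enumerate(node.children))

    if not valid(root, None, None, 1, True) or len(leaf_levels) != 1:
        return None                       # every leaf must be on the same level h
    return leaf_levels.pop()
```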

B-Tree (2)

Keys: agnostic to specific key semantics; only a defined total order is required; can be of fixed or variable length

Operations: search for data with a given key value; insertion and deletion of key-data pairs

Payload: agnostic to specific data semantics; can be the record, a reference (TID), or a mix

[Figure: example B-Tree with k = 2, h = 3]

Search in the B-Tree

Starting at the root node, each node is searched from left to right:
1) If Ki matches the search key, the data record has been found (further records with the same key value might be located in the sub-tree to which Pi-1 points).
2) If Ki is larger than the search key, the search continues in the root of the sub-tree identified by Pi-1.
3) If Ki is smaller than the search key, the comparison is repeated with Ki+1.
4) If the last key of the node (K2k in a full node) is also smaller than the search key, the search continues in the sub-tree of the last pointer (P2k in a full node).
If it is impossible to descend further into a sub-tree in case 2 or 4 (leaf node), the search is aborted: no record with the desired key value exists.

Examples: search for 38, 20, 6
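The four search steps map onto a short loop. A sketch with a hypothetical `Node` class (payload omitted); the example tree with k = 2 used below is made up for illustration and is not the tree from the figure.

```python
class Node:  # hypothetical node representation
    def __init__(self, keys, children=None):
        self.keys = keys                 # sorted keys K1 < K2 < ...
        self.children = children or []  # pointers P0 ... Pb; empty for a leaf


def search(node, key):
    while True:
        for i, ki in enumerate(node.keys):
            if ki == key:        # step 1: found
                return True
            if ki > key:         # step 2: descend left of Ki (children[i] is P(i-1) on the slide)
                break
            # step 3: Ki < key, continue with the next key
        else:
            i = len(node.keys)   # step 4: larger than the last key, use the last pointer
        if not node.children:    # leaf: cannot descend further, not found
            return False
        node = node.children[i]
```

For example, with `root = Node([20, 40], [Node([5, 10]), Node([25, 30, 38]), Node([45, 50])])`, searching 38 descends into the middle leaf, 20 is found in the root, and 6 ends unsuccessfully in the left leaf.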

Insertions in the B-Tree (1)

Insertion rule: insert only into leaf nodes
- At non-leaf nodes: descend the tree as for the search (S ≤ Ki: follow Pi-1; S > Ki: check Ki+1; S > K2k: follow P2k)
- At the leaf node: insert the data record according to the sort order
- Special case: the leaf node is full (2k records) → split the leaf node

Splitting: generate a new leaf node and split the 2k+1 entries (in order) into two leaf nodes:
- first k entries → left node
- last k entries → right node
- the middle entry (the (k+1)-th) is used as the new “discriminator” (branching key) and inserted into the parent node
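The leaf-level insertion and split rule as a small function: a sketch assuming a leaf is a plain sorted Python list of keys (payload omitted) and that the caller inserts the returned discriminator into the parent node. `insert_into_leaf` is a hypothetical name.

```python
from bisect import insort


def insert_into_leaf(leaf, key, k):
    """Insert key into the sorted leaf (in place).

    Returns None if the leaf still holds at most 2k entries, otherwise the
    split (left, discriminator, right) of the 2k+1 entries.
    """
    insort(leaf, key)
    if len(leaf) <= 2 * k:
        return None
    return leaf[:k], leaf[k], leaf[k + 1:]  # first k | (k+1)-th entry | last k
```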

Insertions in the B-Tree (2)

Node splitting during insertion: two possible situations after a split:
- The parent node is full → repeat the split on this level
- Enough space → finished

Special case root split: splitting the root node → new root with two successor nodes; the height of the tree grows by 1. The tree grows from the bottom to the top.

Dynamic reorganization (self-balancing): no unloading or reloading necessary; the tree is always balanced. But: in case of many insertions/deletions, a reorganization can be beneficial.

Insertions in the B-Tree (3)

Insertion example: order k = 1 (n = 2k), keys 1, 5, 2, 6, 7, 4, 8, 3

Finally: h = 3
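The complete insertion procedure, including splits that propagate upward and the root split, can be sketched as follows (hypothetical `Node` class, unique keys, payload omitted). Inserting the example keys with k = 1 ends with height 3, as stated above; in this sketch the resulting tree has root 4, inner nodes 2 and 6, and leaves 1 | 3 | 5 | 7 8.

```python
from bisect import bisect_left, insort


class Node:  # hypothetical node representation
    def __init__(self, keys=None, children=None):
        self.keys = keys if keys is not None else []
        self.children = children if children is not None else []  # [] for a leaf


def _insert(node, key, k):
    """Insert below node; return (discriminator, new right node) if node was split."""
    if not node.children:
        insort(node.keys, key)               # insert only into leaf nodes
    else:
        i = bisect_left(node.keys, key)      # S <= Ki: follow P(i-1)
        split = _insert(node.children[i], key, k)
        if split is None:
            return None
        middle, right = split
        node.keys.insert(i, middle)          # discriminator goes into the parent
        node.children.insert(i + 1, right)
    if len(node.keys) <= 2 * k:
        return None                          # enough space: finished
    middle = node.keys[k]                    # (k+1)-th entry moves up
    right = Node(node.keys[k + 1:], node.children[k + 1:])
    node.keys, node.children = node.keys[:k], node.children[:k + 1]
    return middle, right


def insert(root, key, k):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key, k)
    if split is None:
        return root
    middle, right = split
    return Node([middle], [root, right])     # root split: height grows by 1


def height(node):
    h = 1
    while node.children:
        node, h = node.children[0], h + 1
    return h


root = Node()
for key in [1, 5, 2, 6, 7, 4, 8, 3]:
    root = insert(root, key, k=1)
```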

Insertion and Deletion in the B-Tree

Problem: insertion can create an overflow; deletion can create an underflow, and the subsequent merge can in turn create an overflow

Example:
- Insertion of key 22 → overflow → split
- Deletion of key 22 → underflow; all four nodes need to be accessed; the result is the same as the input

[Figure: Insert 22]

Deletion in the B-Tree

Example: order k = 1 (n = 2k), delete key 3

[Figure: underflow → merge]

Deletion in the B-Tree (2)

Example: order k = 1 (n = 2k), delete key 3

[Figure: underflow → merge]

Remember: each path from the root to a leaf has the same length h

Deletion in the B-Tree (3)

Example: order k = 1 (n = 2k), delete key 3

[Figure: underflow → merge, overflow → split]

Deletion in the B-Tree (4)

Example: order k = 1 (n = 2k), delete key 3

[Figure: overflow → split, root split]

Deletion Algorithm

Example (there are different algorithms):
- Search the node that contains the key K to be deleted
- If K is in a leaf node: delete K from the leaf node and handle a potentially resulting underflow by merging with a sibling
- If K is in an inner node: pull up a new discriminator from one of the successor nodes
  - Determine which successor node of K (left or right) has more elements; if both have the same number of elements, choose either one
  - Replace K with its direct predecessor K' (the largest key in the left sub-tree) or with its direct successor K'' (the smallest key in the right sub-tree), respectively
  - Delete K' or K'' from the respective successor node (recursively)

Note: major variants
- Merge (this lecture)
- Re-distribution: instead of split/merge in case of overflow/underflow, the entries are redistributed among one or multiple adjacent nodes
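A sketch of the deletion algorithm described above, using the merge variant: an underflowing node is merged with a sibling and the discriminator between them; if the merged node overflows, it is split again. Assumptions: unique keys, a hypothetical `Node` class, payload omitted, and the sibling choice (right sibling if there is one) is arbitrary.

```python
from bisect import bisect_left


class Node:  # hypothetical node representation
    def __init__(self, keys, children=None):
        self.keys = keys                 # sorted, unique keys (payload omitted)
        self.children = children or []  # empty list for a leaf


def _max_key(node):                      # direct predecessor K' (rightmost leaf)
    while node.children:
        node = node.children[-1]
    return node.keys[-1]


def _min_key(node):                      # direct successor K'' (leftmost leaf)
    while node.children:
        node = node.children[0]
    return node.keys[0]


def _fix_underflow(parent, i, k):
    """Merge child i with a sibling and their discriminator; split on overflow."""
    if i == len(parent.children) - 1:    # no right sibling: take the left one
        i -= 1
    left, right = parent.children[i], parent.children[i + 1]
    sep = parent.keys.pop(i)             # discriminator moves down
    parent.children.pop(i + 1)
    merged = Node(left.keys + [sep] + right.keys, left.children + right.children)
    if len(merged.keys) <= 2 * k:        # underflow -> merge
        parent.children[i] = merged
        return
    m = len(merged.keys) // 2            # overflow after the merge -> split again
    parent.keys.insert(i, merged.keys[m])
    parent.children[i] = Node(merged.keys[:m], merged.children[:m + 1])
    parent.children.insert(i + 1, Node(merged.keys[m + 1:], merged.children[m + 1:]))


def _delete(node, key, k):
    i = bisect_left(node.keys, key)
    found = i < len(node.keys) and node.keys[i] == key
    if not node.children:                # K in a leaf: delete it directly
        if found:
            node.keys.pop(i)
        return
    if found:                            # K in an inner node: new discriminator
        if len(node.children[i].keys) >= len(node.children[i + 1].keys):
            key = _max_key(node.children[i])        # K' from the left successor
            node.keys[i] = key
        else:
            key = _min_key(node.children[i + 1])    # K'' from the right successor
            node.keys[i] = key
            i += 1
    _delete(node.children[i], key, k)    # delete K (or K', K'') further down
    if len(node.children[i].keys) < k:
        _fix_underflow(node, i, k)


def delete(root, key, k):
    """Delete key and return the (possibly new) root."""
    _delete(root, key, k)
    if not root.keys and root.children:  # root emptied by a merge: height shrinks by 1
        return root.children[0]
    return root
```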

B-Trees, B+-Trees and B*-Trees

B+-Trees and B*-Trees:
- Data is only in leaf nodes
- Key redundancy, but higher fan-out → lower tree height, less I/O
- Simpler delete procedure → requires only merging of nodes
- Doubly linked list of all leaf nodes

B*-Trees: modified valid node sizes from [k, 2k] to [4/3·k, 2k] → better node utilization, but more splits/merges

[Figure: example secondary index (non-unique), B-Tree with k = 2, h = 3]
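The doubly linked leaf list is what makes range queries cheap: after locating the first leaf, a scan only follows the chain. A minimal sketch, assuming leaves store sorted (key, TID) pairs; locating the start leaf by descending the inner nodes is omitted, and `Leaf`, `link` and `range_scan` are hypothetical names.

```python
class Leaf:  # hypothetical leaf node of a B+-Tree
    def __init__(self, entries):
        self.entries = entries  # sorted (key, TID) pairs
        self.prev = None        # doubly linked list of all leaf nodes
        self.next = None


def link(leaves):
    """Chain the leaves in key order and return them."""
    for a, b in zip(leaves, leaves[1:]):
        a.next, b.prev = b, a
    return leaves


def range_scan(start_leaf, lo, hi):
    """Yield all (key, TID) with lo <= key <= hi, following the leaf chain.

    start_leaf would normally be found by descending the inner nodes with lo.
    """
    leaf = start_leaf
    while leaf is not None:
        for key, tid in leaf.entries:
            if key > hi:
                return
            if key >= lo:
                yield key, tid
        leaf = leaf.next
```

The `prev` pointers are not needed for this ascending scan; they would support scans in descending key order.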

Indexing Low Cardinality Columns

Problem example: a B-tree on the sex of customers for a table with 1,000,000 tuples results in two lists with approximately 500,000 tuples each
- A query for all female customers requires 500,000 random page accesses (secondary index)
- A table scan would be much faster

Conclusion: B-trees (and also hashing) are useful for predicates with low selectivity (output/input cardinality ratio). Rule of thumb: the break-even hit rate is approx. 5%; higher hit rates do not justify the effort of an index access.

[Figure: B-tree with the two keys F and M, each pointing to a long list of TIDs]

Bitmap Index

Idea (long history since the 1960s): create a bitmap (bit list) for each attribute value
- Each tuple in the table is assigned to one bit in the bitmap (by position, i.e., sequential TID)
- Bit values: 1 = attribute value set, 0 = attribute value not set
- Necessary condition: sequential numbering of the tuples (TIDs)

Bitmap index on Sex:
F   1 0 1 1 0 1 1 1 1 0 0 1 0 1 1
M   0 1 0 0 1 0 0 0 0 1 1 0 1 0 0

Name | Sex | Region | Race
Carol | f | n | white
Harold | m | e | black
Anne | f | e | asian
Iris | f | ne | white
… | m | se | hisp
… | f | e | white
… | f | sw | asian
… | f | w | black
… | f | n | asian
… | m | e | hisp
… | m | se | black
… | f | s | white
… | m | nw | black
… | f | s | white
… | f | w | black
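Building the bitmaps is a single pass over the table in TID order. A sketch assuming the tuples are numbered 0, 1, 2, … (the sequential TIDs required above) and that bitmaps are plain Python lists of 0/1; a real system would use packed bit vectors. `build_bitmaps` is a hypothetical name. Applied to the Sex column of the example table, it reproduces the F and M bitmaps.

```python
def build_bitmaps(column):
    """One bitmap per distinct value; bit i belongs to the tuple with TID i."""
    bitmaps = {}
    for tid, value in enumerate(column):
        bitmaps.setdefault(value, [0] * len(column))[tid] = 1
    return bitmaps


# Sex and Race columns of the 15 example tuples, in TID order
sex = ["f", "m", "f", "f", "m", "f", "f", "f", "f", "m", "m", "f", "m", "f", "f"]
race = ["white", "black", "asian", "white", "hisp", "white", "asian", "black",
        "asian", "hisp", "black", "white", "black", "white", "black"]
```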

Querying Bitmap Indexes

Main advantage of bitmap indexes:
- Simple and efficient logical combination of bitmaps possible
- Read only the data that is relevant for the predicates

Example: σ Sex='f' ∧ Region='n' (R): combine bitmaps B1 and B2 by conjunction: for (i = 0; i < B1.length; i++) B[i] = B1[i] & B2[i];

Example I/O cost estimation: σ Sex='f' ∧ Region='n' ∧ Race='Asian' (R) ("Asian women of region North")
- Selectivity: 1/2 · 1/8 · 1/4 = 1/64
- N = 10,000 tuples with a length of 400 bytes each (~10 tuples per page for 4 kB pages)
- Table scan: 1,000 pages
- Bitmap access: 10,000/64 ≈ 156 pages (worst case: each tuple on a different page), plus 1 page for the bitmaps

F       1 0 1 1 0 1 1 1 1 0 0 1 0 1 1
N       0 1 1 0 0 1 0 0 0 1 0 0 0 0 0
A       0 0 1 0 0 0 1 0 1 0 0 0 0 0 0
AND     0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
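The conjunction and the cost estimate can be checked with a few lines of Python. A sketch assuming bitmaps are plain lists of 0/1 and a 4 kB page means 4096 bytes; `bitmap_and` and `io_costs` are hypothetical names, and the bitmap page count truncates 156.25 to 156 as in the estimate above.

```python
from fractions import Fraction


def bitmap_and(*bitmaps):
    """Generalizes the loop B[i] = B1[i] & B2[i] to any number of bitmaps."""
    result = list(bitmaps[0])
    for b in bitmaps[1:]:
        if len(b) != len(result):
            raise ValueError("bitmaps must cover the same TIDs")
        result = [x & y for x, y in zip(result, b)]
    return result


def io_costs(n_tuples, tuple_size, page_size, selectivity):
    """Pages read by a table scan vs. worst-case bitmap access (+1 bitmap page)."""
    tuples_per_page = page_size // tuple_size
    table_scan = -(-n_tuples // tuples_per_page)     # ceiling division
    bitmap_access = int(n_tuples * selectivity) + 1  # one page per hit, worst case
    return table_scan, bitmap_access


F = [1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1]
N = [0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
A = [0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0]
selectivity = Fraction(1, 2) * Fraction(1, 8) * Fraction(1, 4)  # Sex, Region, Race
```

Only the tuple with TID 2 satisfies all three predicates, and the estimate yields 1,000 pages for the table scan versus 157 pages for the bitmap access.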

| 272

gt

copy A Behrend Foundations of Information Systems |

Transaction Management

Principle of a transaction Sequence of successive DB operations that transform a database from a consistent

state into another consistent state surrounded by BOT EOT (Commit Abort)

Properties ACID Atomicity Consistency Isolation Durability A transaction will always come to an end Normal (commit) changes are permanently stored within the DB Abnormal (abort rollback) already composed changes are taken back

Note EOT state must not be different from BOT state

BOT(begin of transaction)

EOT(end of transaction)

possibly inconsistent database

consistentdatabase

consistentdatabase

DB DB

DML operations

| 273

gt

copy A Behrend Foundations of Information Systems |

ACID Properties of Transactions

Atomicity Indivisibility due to the transaction definition (Begin - End) All-or-nothing principle ie the DBS guarantees Either the complete execution of a transaction hellip hellip or the ineffectiveness of the whole transaction (and of all associated operations)

Consistency A successful transaction guarantees that all consistency requirements (integrity

requirements) have been met

Isolation Multiple transactions run isolated from each other and do not use (inconsistent)

intermediate results from other transactions

Durability All results of successful transactions have to be made persistent

| 274

gt

copy A Behrend Foundations of Information Systems |

Motivation

Atomicity → UNDO recovery
- Part of the transaction is done, but we want to cancel it (ABORT/ROLLBACK)
- The system crashes during the transaction: some changes made it to disk, some did not

Durability → REDO recovery
- The transaction finished and the user was notified (COMMIT), but the system crashes before the changes were successfully written to disk (asynchronous write)

Consistency → UNDO recovery for consistency-related rollbacks
- Physical consistency: correctness of the storage and access structures; completely executed modification operations preserve it
- Logical consistency: correctness of the data contents, i.e., they correspond to a (possible) state of the real world; completely executed transactions preserve it:
  - all modifications of finished transactions are included
  - no modifications of open transactions are included

Remember: logical consistency requires physical consistency in the first place.

Reasons for crashes

- Transaction error
  - Violation of system restrictions: violation of security regulations, excessive resource requirements, deadlocks
  - Application-related errors, e.g., wrong operations and values (ROLLBACK)
- System error: system crash with loss of main-memory contents (database system, operating system, hardware, power failure)
- Device error (especially storage-medium error): destruction of secondary storage systems
- Catastrophes: destruction of the computing center

Guarantee Atomicity & Durability

Assumptions: the system may crash, but the disk is durable; the only atomicity guarantee is that a disk block write is atomic.

Materialization strategy: preferred policy Steal/No-Force. This combination is the most complicated, but allows for the highest performance.
- No-Force complicates enforcing Durability: what if the system crashes before a modified page written by a committed transaction makes it to disk? Write as little as possible, in a convenient place, at commit time to support REDOing modifications.
- Steal complicates enforcing Atomicity: what if the transaction that performed updates aborts? What if the system crashes before the transaction is finished? We must remember the old value of page P (to support UNDOing the write to P).
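The following toy sketch (hypothetical class and method names, not a real DBMS API) shows why Steal/No-Force needs both UNDO and REDO information. Each update first appends a log record with the before image (for UNDO) and the after image (for REDO); commit only appends a small commit record; pages may be flushed before commit. As simplifying assumptions, the log counts as durable the moment a record is appended, and recovery first repeats all logged updates and then undoes those of transactions without a commit record.

```python
from dataclasses import dataclass, field

@dataclass
class ToyDB:
    """Toy page store under Steal/No-Force (hypothetical, for illustration only)."""
    disk: dict = field(default_factory=dict)    # durable pages: survive a crash
    log: list = field(default_factory=list)     # durable log (assumed forced on every append)
    buffer: dict = field(default_factory=dict)  # volatile buffer pool: lost on a crash

    def write(self, tx, page, value):
        before = self.buffer.get(page, self.disk.get(page))
        # write-ahead: log the before image (UNDO) and the after image (REDO) first
        self.log.append(("update", tx, page, before, value))
        self.buffer[page] = value

    def flush(self, page):
        # Steal: a dirty page may be written even if its transaction is still open
        self.disk[page] = self.buffer[page]

    def commit(self, tx):
        # No-Force: commit only writes a small log record, no data pages
        self.log.append(("commit", tx))

    def crash(self):
        self.buffer.clear()

    def recover(self):
        committed = {rec[1] for rec in self.log if rec[0] == "commit"}
        for rec in self.log:                 # REDO: repeat all logged updates
            if rec[0] == "update":
                self.disk[rec[2]] = rec[4]
        for rec in reversed(self.log):       # UNDO: roll back uncommitted updates
            if rec[0] == "update" and rec[1] not in committed:
                self.disk[rec[2]] = rec[3]

db = ToyDB(disk={"P": 1, "Q": 1})
db.write("T1", "P", 2)
db.commit("T1")      # committed, but page P is never flushed (No-Force)
db.write("T2", "Q", 5)
db.flush("Q")        # uncommitted change reaches disk (Steal)
db.crash()
db.recover()
print(db.disk)       # {'P': 2, 'Q': 1}
```

After the crash the disk holds the stolen value Q = 5 and misses the committed P = 2; the REDO pass restores P and the UNDO pass removes Q.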

[Figure: architectural blueprint (Application; SQL, JDBC, ODBC, …; Data System; Access System; Storage System; Buffer; File System; Hardware) with the example table Person(id INT, name VARCHAR, birthday DATE), index P_id_IX on Person.id, and the record (1, 'Smith', 15.06.1982)]

Record Management

Record

Record: a package of fields that together describe a thing, a person, a fact, etc.
- Each field represents one property of the entity described by the record
- Similar to a struct in C
- Variable length (in contrast to pages)

Record manager: organizes the physical storage of records in pages.
- Operations: Get, Insert, Update, Delete, Scan
- Agnostic to record structure and semantics: records are considered byte strings of variable length; the structure and content of a record are defined by the Access System and the application

Challenges: record addressing, free space management
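To illustrate "records as byte strings of variable length", here is a sketch of one possible layout for the Person(id INT, name VARCHAR, birthday DATE) example, using Python's struct module. The layout (little-endian 4-byte id, 2-byte name length, UTF-8 name, 4-byte day number) is an assumption made up for this example; the record manager would only ever see the resulting bytes.

```python
import struct
from datetime import date

# Hypothetical layout: int32 id | uint16 name length | name bytes | uint32 day number
def encode_person(pid: int, name: str, birthday: date) -> bytes:
    raw = name.encode("utf-8")
    return struct.pack(f"<iH{len(raw)}sI", pid, len(raw), raw, birthday.toordinal())

def decode_person(record: bytes):
    pid, n = struct.unpack_from("<iH", record, 0)         # fixed-size header (6 bytes)
    (raw,) = struct.unpack_from(f"<{n}s", record, 6)      # variable-length field
    (day,) = struct.unpack_from("<I", record, 6 + n)      # field after the VARCHAR
    return pid, raw.decode("utf-8"), date.fromordinal(day)

rec = encode_person(1, "Smith", date(1982, 6, 15))
print(len(rec))            # 15 (4 + 2 + 5 + 4 bytes)
print(decode_person(rec))  # (1, 'Smith', datetime.date(1982, 6, 15))
```

A longer name yields a longer record, which is exactly why record addresses must stay stable when records grow.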

Record Addressing

Record address: identifier used to address records, e.g., in indexes or during query processing; assigned when a record is inserted.

Goals: stability of the identifier, fast and direct access, little organizational overhead.

Direct addressing: byte address or position number in a file or page. Unstable:
- Byte address: if a record grows in length, the following records get new addresses
- Position number: insert and delete operations change the sequence of records

Indirect addressing:
- Surrogate with mapping table (complete indirection)
- Tuple identifier (TID concept)

Surrogate with Mapping Table

Surrogate: record type + serial number; the serial number remains constant during the record's lifetime.

Mapping table: maps a surrogate to a page.

Problems: Where to store the mapping table? How can it be extended? How can it be searched efficiently?
→ H2 uses a B-tree to store the mapping table.

[Figure: mapping table with columns Surrogate | Page ID]

TID Concept

Record addressing with indirection inside the page: each page contains an array of record positions; the TID of a record consists of the page ID and the index in the position array.

Pros: access with one page access (two pages in case of overflow); stable; no mapping table required.

Operations:
- Insert: reuse an unused position or add a new position
- Delete: mark the position as unused in the array
- Update: update all positions in the array that change (records inside the page may be shifted)
- Update with overflow: store the record as an overflow record on another page and store the TID of the overflow record at the original position (no double overflow: if the overflow record has to move again, only the TID at the original position is updated)

[Figure: record and overflow record, addressed via the position arrays of their pages]

Free Space Management

Problem: which page has enough space for a new record?

Solution: a free space table lists, for all pages, how much space is left.

Free space value:
- Precise value: ceil(log2(page size)) bits, i.e., 2 bytes for the common page size of 4 KB
- Rough value: use fewer bits; free space = (value × page size) / 2^(bits per value)

Free space table:
- With direct page addressing: assuming a single page can take n free space entries, the first page and every (n+1)-th page after it hold free space entries
- With indirect page addressing: free space information is stored in the page table
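The arithmetic above can be checked with a short sketch. The function names, the 4-bit rough encoding and the choice to round down (so that free space is never overstated) are assumptions for illustration; fsm_location computes, for direct page addressing, which free-space page holds the entry of a data page.

```python
import math

PAGE_SIZE = 4096  # assumed common page size (4 KB)

def precise_bits(page_size: int = PAGE_SIZE) -> int:
    """Bits for an exact free-space value: ceil(log2(page size))."""
    return math.ceil(math.log2(page_size))

def encode_rough(free_bytes: int, bits: int, page_size: int = PAGE_SIZE) -> int:
    """Quantize to 'bits' bits, rounding down so free space is never overstated."""
    return min(free_bytes * 2**bits // page_size, 2**bits - 1)

def decode_rough(value: int, bits: int, page_size: int = PAGE_SIZE) -> int:
    """free space = (value * page size) / 2^(bits per value)"""
    return value * page_size // 2**bits

def fsm_location(page_no: int, n: int):
    """Direct page addressing: pages 0, n+1, 2(n+1), ... hold n entries each.

    Returns (free-space page, entry index) for the data page page_no.
    """
    if page_no % (n + 1) == 0:
        raise ValueError("page_no is itself a free-space page")
    base = page_no - page_no % (n + 1)
    return base, page_no - base - 1

print(precise_bits(), math.ceil(precise_bits() / 8))  # 12 2  (12 bits -> 2 bytes)
print(encode_rough(1000, 4), decode_rough(3, 4))      # 3 768 (16 steps of 256 bytes)
print(fsm_location(7, n=4))                           # (5, 1)
```

With 4 bits per page, the table is four times smaller than with the precise 2-byte value, at the cost of up to 255 bytes of underestimated free space.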

Physical Access Paths – Index Structures

Overview Indexes

Table scan: read all pages and evaluate the search criteria for each record; benefits from pre-fetching.

Index scan: use an index for search criteria on one or more attributes.
- Fast access to single values or value ranges of the index attributes
- Logical/physical sorting of the values of the key attributes (depending on the index structure)
- Enforcing uniqueness

Types of indexes:
- Primary (clustered) index: determines the physical organization; used for the PK
- Secondary (non-clustered) index: redundant access path

[Figure: relation Pers(PID, NAME, AGE, SALARY) with a primary index and secondary indexes on Age and Salary]

Overview Indexes (2)

Choice of access paths:
- Index scan: only useful for low selectivity (low number of result tuples)
  - Break-even point according to the output ratio of the number of tuples (usually max. 5%)
  - Requires statistics about the data
  - Additional costs for index storage and updating
- Table scan: adequate/efficient for small tables (e.g., 5 pages) and for queries with high selectivity (large result sets)
  - 100-200 MB/s sequential read vs. ~100 disk seeks/s

[Figure: access time over hit rate for index scan and table scan]

Classification of Index Structures

One-dimensional index structures:
- Key comparison
  - Sequential: sequential lists (physical sequence), linked lists (logical sequence)
  - Tree-based: binary search trees, multiway trees (example: B-tree)
- Key transformation
  - Hash-based: static, dynamic
- Prefix trees (tries)

Multiway trees: tree structure with multiple children per node. Idea: choose the fan-out so that the node size suits the page size.

B-Tree

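[Figure: B-tree node consisting of entries and free space]

Entry: (K_i, D_i, P_i); every node except the root has min |P| = k+1 and max |P| = 2k+1 pointers.

Subtrees: the leftmost pointer leads to keys < K_1, the pointer between K_i and K_{i+1} leads to keys with K_i < key < K_{i+1}, and the rightmost pointer leads to keys greater than the last key of the node.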
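To make "choose the fan-out so that the node size suits the page size" concrete, here is a small calculation under assumed sizes that are not from the lecture: 4 KB pages, 8-byte keys, 8-byte TIDs as data D_i and 8-byte page pointers. A full node holds 2k entries (K_i, D_i, P_i) plus the leading pointer, and a tree of completely full nodes with height h holds (2k+1)^h - 1 keys.

```python
def max_k(page_size: int, key_size: int, data_size: int, ptr_size: int) -> int:
    """Largest order k such that a full node (2k entries plus P_0) fits into one page."""
    two_k = (page_size - ptr_size) // (key_size + data_size + ptr_size)
    return two_k // 2

def min_height(n_keys: int, k: int) -> int:
    """Smallest height h of a tree of completely full nodes holding n_keys keys.

    A full tree of height h has (2k+1)^h - 1 keys: every node holds 2k keys
    and has 2k+1 children.
    """
    h = 0
    while (2 * k + 1) ** h - 1 < n_keys:
        h += 1
    return h

k = max_k(4096, 8, 8, 8)     # assumed sizes in bytes
print(k)                     # 85, i.e. up to 170 keys and 171 children per node
print(min_height(10**6, k))  # 3
```

This is a lower bound: with k = 1, eight keys would fit into a completely full tree of height 2, whereas real insertion sequences leave nodes partly empty after splits.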

B-Tree (2)

Keys:
- Agnostic to specific key semantics; only a defined total order is required
- Could be of fixed or variable length

Operations: search for data for a given key value; insertion and deletion of key-data pairs

Payload:
- Agnostic to specific data semantics
- Can be a record, a reference (TID), or a mix

[Figure: example B-tree with k = 2, h = 3]

Search in the B-Tree

Starting at the root node, each node is searched from left to right:
1) If K_i matches the desired key value, the data record has been found (further records with the same key value might be located in the subtree to which P_{i-1} points).
2) If K_i is larger than the desired value, the search continues in the root of the subtree identified by P_{i-1}.
3) If K_i is smaller than the desired value, the comparison is repeated with K_{i+1}.
4) If the last key of the node (K_2k in a full node) is also smaller than the desired value, the search continues in the subtree of the last pointer (P_2k in a full node).
If it is impossible to descend further into a subtree in case 2) or 4) because the node is a leaf, the search is aborted: no record with the desired key value is found.

[Figure: example searches for 38, 20, and 6]
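A minimal search sketch, assuming unique integer keys and nodes that store only keys and child pointers (the data D_i is omitted). Each node is scanned from left to right; the search descends left of the first key that is larger than the search key (S ≤ K_i → P_{i-1}, as in the insertion rule) or into the rightmost pointer. The example tree is hand-built and hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    keys: List[int]
    children: List["Node"] = field(default_factory=list)  # empty list for leaf nodes

def search(node: Optional[Node], key: int) -> Optional[Node]:
    """Return the node containing key, or None if the key does not exist."""
    while node is not None:
        i = 0
        # K_i smaller than the search key: continue with K_{i+1}
        while i < len(node.keys) and node.keys[i] < key:
            i += 1
        # match found
        if i < len(node.keys) and node.keys[i] == key:
            return node
        # leaf reached: cannot descend any further
        if not node.children:
            return None
        # pointer left of the first larger key, or the rightmost pointer
        node = node.children[i]
    return None

# Hypothetical example tree with k = 1 and height 3
leaves = [Node([1]), Node([3]), Node([5]), Node([7, 8])]
root = Node([4], [Node([2], leaves[:2]), Node([6], leaves[2:])])
print(search(root, 8).keys)  # [7, 8]
print(search(root, 9))       # None
```

The number of visited nodes is at most the height h, which is why a low height (high fan-out) matters.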

Insertions in the B-Tree (1)

Insertion rule: insert only into leaf nodes.

At non-leaf nodes: descend the tree as for the search:
- S ≤ K_i: follow P_{i-1}
- S > K_i: check K_{i+1}
- S > K_2k: follow P_2k

At the leaf node: insert the data record according to the sort order.
- Special case: the leaf node is full (2k records) → split the leaf node

Splitting: generate a new leaf node and split the 2k+1 entries (in order) into two leaf nodes:
- first k entries → left node
- last k entries → right node
- the middle entry ((k+1)-th) is used as the new "discriminator" (branching key) and inserted into the parent node

Insertions in the B-Tree (2)

Node splitting during insertion, two possible situations after a split:
- The parent node is full → repeat the split on this level
- Enough space → finished

Special case, root split: splitting the root node creates a new root with two successor nodes, and the height of the tree grows by 1. The tree thus grows from the bottom to the top.

Dynamic reorganization (self-balancing): no unloading or reloading necessary; the tree is always balanced. But: after many insertions and deletions, a reorganization can still be beneficial.

Insertions in the B-Tree (3)

Insertion example: order k = 1 (n = 2k), keys 1, 5, 2, 6, 7, 4, 8, 3

[Figure: step-by-step insertion; finally h = 3]
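The insertion rule and the split procedure can be sketched as follows, assuming unique integer keys and omitting the data D_i. A recursive insert returns the discriminator and the new right node whenever a node overflows, so splits propagate upwards until a node has enough space or the root itself splits. Class and method names are illustrative, not taken from the lecture.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)  # empty for leaf nodes

class BTree:
    def __init__(self, k: int):
        self.k = k              # every node except the root holds k..2k keys
        self.root = Node()

    def insert(self, key: int) -> None:
        split = self._insert(self.root, key)
        if split is not None:   # root split: new root with two successors, height + 1
            median, right = split
            self.root = Node([median], [self.root, right])

    def _insert(self, node: Node, key: int) -> Optional[Tuple[int, Node]]:
        i = bisect_left(node.keys, key)          # S <= K_i -> follow P_{i-1}
        if not node.children:                    # insert only into leaf nodes
            node.keys.insert(i, key)
        else:
            split = self._insert(node.children[i], key)
            if split is None:
                return None
            median, right = split                # discriminator moves up into this node
            node.keys.insert(i, median)
            node.children.insert(i + 1, right)
        if len(node.keys) <= 2 * self.k:         # enough space -> finished
            return None
        return self._split(node)                 # overflow -> split, propagate upwards

    def _split(self, node: Node) -> Tuple[int, Node]:
        # 2k+1 keys: first k stay left, last k move right, the (k+1)-th goes up
        k = self.k
        median = node.keys[k]
        right = Node(node.keys[k + 1:], node.children[k + 1:])
        node.keys, node.children = node.keys[:k], node.children[:k + 1]
        return median, right

    def height(self) -> int:
        h, node = 1, self.root
        while node.children:
            h, node = h + 1, node.children[0]
        return h

tree = BTree(k=1)
for key in [1, 5, 2, 6, 7, 4, 8, 3]:
    tree.insert(key)
print(tree.height())                         # 3
print(tree.root.keys)                        # [4]
print([c.keys for c in tree.root.children])  # [[2], [6]]
```

Inserting 3 splits the leaf [3, 4, 5], which overfills the root [2, 4, 6]; the resulting root split produces the final height h = 3.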

Insertion and Deletion in the B-Tree

Problem: insertion can create overflow; deletion can create underflow and, through merging, also overflow.

Example:
- Insertion of key 22 → overflow → split
- Deletion of key 22 → underflow; all four nodes need to be accessed; finally, the tree is the same as the input

[Figure: insert 22]

Deletion in the B-Tree

Example: order k = 1 (n = 2k), delete key 3

[Figure: underflow → merge]

Deletion in the B-Tree (2)

Example: order k = 1 (n = 2k), delete key 3

[Figure: underflow → merge]

Remember: each path from the root to a leaf has the same length h.

Deletion in the B-Tree (3)

Example: order k = 1 (n = 2k), delete key 3

[Figure: underflow → merge, overflow → split]

Deletion in the B-Tree (4)

Example: order k = 1 (n = 2k), delete key 3

[Figure: overflow → split, root split]

Deletion Algorithm

Example (there are different algorithms):
1) Search the node that contains the key K to be deleted.
2) If K is in a leaf node: delete K from the leaf node and handle a potentially resulting underflow by merging with a sibling.
3) If K is in an inner node: pull up a new discriminator from one of the successors.
   - Analyze which successor node of K, the left or the right one, has more elements; if both have the same number of elements, decide for one.
   - Replace K with its direct predecessor K' (largest key of the left subtree) or its direct successor K'' (smallest key of the right subtree), respectively.
   - Delete K' or K'' from the respective successor node (recursively).

Note, major variants:
- Merge (this lecture)
- Re-distribution: instead of split/merge in case of overflow/underflow, the entries are redistributed among one or multiple adjacent nodes
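The deletion algorithm above can be sketched as follows; it is one possible variant, assuming unique integer keys and omitting the data D_i. On underflow, the child is merged with a sibling (left one preferred) together with the discriminator from the parent; if the merged node overflows, it is split again, which reproduces the "underflow → merge, overflow → split" pattern. An emptied root is replaced by its only child. The insertion part is repeated so the block runs on its own; all names are illustrative.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)  # empty for leaf nodes

class BTree:
    def __init__(self, k: int):
        self.k, self.root = k, Node()

    # insertion (leaf insert + split), used to build example trees
    def insert(self, key):
        split = self._insert(self.root, key)
        if split:
            self.root = Node([split[0]], [self.root, split[1]])

    def _insert(self, node, key):
        i = bisect_left(node.keys, key)
        if not node.children:
            node.keys.insert(i, key)
        else:
            split = self._insert(node.children[i], key)
            if not split:
                return None
            node.keys.insert(i, split[0])
            node.children.insert(i + 1, split[1])
        return self._split(node) if len(node.keys) > 2 * self.k else None

    def _split(self, node):
        k, median = self.k, node.keys[self.k]
        right = Node(node.keys[k + 1:], node.children[k + 1:])
        node.keys, node.children = node.keys[:k], node.children[:k + 1]
        return median, right

    # deletion (merge on underflow, split again if the merge overflows)
    def delete(self, key):
        self._delete(self.root, key)
        if not self.root.keys and self.root.children:  # root emptied by a merge
            self.root = self.root.children[0]          # height shrinks by 1

    def _delete(self, node, key):
        i = bisect_left(node.keys, key)
        found = i < len(node.keys) and node.keys[i] == key
        if not node.children:                           # leaf: remove the key
            if found:
                node.keys.pop(i)
            return
        if found:                                       # inner node: new discriminator
            left, right = node.children[i], node.children[i + 1]
            if len(right.keys) > len(left.keys):
                j, repl = i + 1, self._min(right)       # direct successor K''
            else:
                j, repl = i, self._max(left)            # direct predecessor K'
            node.keys[i] = repl
            key, i = repl, j
        self._delete(node.children[i], key)
        if len(node.children[i].keys) < self.k:         # underflow in the child
            self._merge(node, i)

    def _merge(self, parent, i):
        if i > 0:                                       # prefer the left sibling
            i -= 1
        left, right = parent.children[i], parent.children.pop(i + 1)
        left.keys += [parent.keys.pop(i)] + right.keys  # discriminator moves down
        left.children += right.children
        if len(left.keys) > 2 * self.k:                 # overflow -> split again
            median, new_right = self._split(left)
            parent.keys.insert(i, median)
            parent.children.insert(i + 1, new_right)

    def _min(self, node):
        while node.children:
            node = node.children[0]
        return node.keys[0]

    def _max(self, node):
        while node.children:
            node = node.children[-1]
        return node.keys[-1]

    def in_order(self, node=None):
        node = self.root if node is None else node
        out = []
        for idx, child in enumerate(node.children):
            out += self.in_order(child)
            if idx < len(node.keys):
                out.append(node.keys[idx])
        return out if node.children else list(node.keys)

tree = BTree(k=1)
for key in [1, 5, 2, 6, 7, 4, 8, 3]:
    tree.insert(key)
tree.delete(3)
print(tree.root.keys, [c.keys for c in tree.root.children])  # [4, 6] [[1, 2], [5], [7, 8]]
```

Starting from the tree built by the insertion example, deleting 3 causes two cascading merges, and the tree shrinks to height 2; every path from the root to a leaf still has the same length.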

B-Trees, B+-Trees and B*-Trees

B+-Trees and B*-Trees:
- Data is only in leaf nodes
- Key redundancy, but higher fan-out → lower tree height, less I/O
- Simpler delete procedure → requires only merging of nodes
- Doubly linked list of all leaf nodes

B*-Trees: modified valid node sizes from [k, 2k] to [4/3 k, 2k] → better node utilization, but more splits/merges

[Figure: example secondary index (non-unique), tree with k = 2, h = 3]
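The benefit of the linked leaf level shows up in range queries: after a single descent, the scan simply follows the leaf chain. The sketch below assumes a bulk-loaded tree with only one level of separators above the leaves (a real B+-tree has as many inner levels as needed) and a non-unique secondary index on a hypothetical Age column with made-up TIDs.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple

@dataclass
class Leaf:
    keys: List[int]
    tids: List[str]                # payload: one TID per key (made-up TIDs here)
    next: Optional["Leaf"] = None  # link to the right neighbour in the leaf chain

def bulk_load(pairs: List[Tuple[int, str]], capacity: int):
    """Build the linked leaf level and one separator level from sorted (key, TID) pairs."""
    leaves = [Leaf([k for k, _ in pairs[i:i + capacity]], [t for _, t in pairs[i:i + capacity]])
              for i in range(0, len(pairs), capacity)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    separators = [leaf.keys[0] for leaf in leaves[1:]]  # redundant copies of leaf keys
    return separators, leaves

def range_scan(separators, leaves, lo, hi) -> Iterator[str]:
    """Yield the TIDs of all entries with lo <= key <= hi."""
    # bisect_left also finds duplicates of lo at the end of the preceding leaf
    leaf = leaves[bisect_left(separators, lo)]
    while leaf is not None:          # follow the leaf chain, no new descent needed
        for key, tid in zip(leaf.keys, leaf.tids):
            if key > hi:
                return
            if key >= lo:
                yield tid
        leaf = leaf.next

pairs = [(20, "t1"), (20, "t2"), (25, "t3"), (30, "t4"), (30, "t5"), (41, "t6")]
seps, leaves = bulk_load(pairs, capacity=2)
print(seps)                                    # [25, 30]
print(list(range_scan(seps, leaves, 30, 30)))  # ['t4', 't5']
```

The duplicate key 30 is spread over two leaves; the chain lets the scan collect both entries without going back to the separator level.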

Indexing Low Cardinality Columns

Problem example: a B-tree on the sex of customers for a table with 1,000,000 tuples results in two lists with approximately 500,000 tuples each.
- A query for all female customers requires 500,000 random page accesses (secondary index)
- A table scan would be much faster

Conclusion: B-trees (and also hashing) are useful for predicates with low selectivity (output/input cardinality ratio). Rule of thumb: the break-even hit rate is approx. 5%; higher hit rates do not justify the effort of an index access.

[Figure: index entries F and M, each pointing to a long list of TIDs]
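The comparison can be put into numbers with a rough page-count model. This is a sketch under stated assumptions: 10 tuples per page (as in the bitmap example, 400-byte tuples on 4 KB pages) and, in the worst case, one random page access per qualifying tuple via the secondary index. It counts only pages and ignores that sequential reads are much cheaper than random ones, which favors the table scan even further.

```python
def page_accesses(n_tuples: int, tuples_per_page: int, hit_rate: float):
    """Return (table scan pages, worst-case secondary-index page accesses)."""
    table_scan = -(-n_tuples // tuples_per_page)  # ceil division: every page read once
    index_scan = round(n_tuples * hit_rate)       # one random page access per hit
    return table_scan, index_scan

# 1,000,000 customers, assumed 10 tuples per page, half of them female
print(page_accesses(1_000_000, 10, 0.5))  # (100000, 500000)
```

Even in pure page counts the index access reads five times as many pages as the table scan.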

Bitmap Index

Idea (long history, since the 1960s):
- Create a bitmap (bit list) for each attribute value
- Each tuple in the table is assigned to one bit in the bitmap (by position, i.e., sequential TID)
- Bit values: 1 = attribute value set, 0 = attribute value not set

Necessary condition: sequential numbering of the tuples (TIDs)

Name    Sex  Region  Race
Carol   f    n       white
Harold  m    e       black
Anne    f    e       asian
Iris    f    ne      white
…       m    se      hisp
…       f    e       white
…       f    sw      asian
…       f    w       black
…       f    n       asian
…       m    e       hisp
…       m    se      black
…       f    s       white
…       m    nw      black
…       f    s       white
…       f    w       black

Bitmap index on Sex:
F  1 0 1 1 0 1 1 1 1 0 0 1 0 1 1
M  0 1 0 0 1 0 0 0 0 1 1 0 1 0 0
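Building a bitmap index is a single pass over the column in TID order. The following sketch assumes bitmaps stored as plain Python lists of 0/1 values (real systems pack and often compress them); applied to the Sex column of the 15 example tuples, it reproduces the F and M bitmaps.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

def build_bitmap_index(column: Sequence[str]) -> Dict[str, List[int]]:
    """One bit list per distinct value; bit i belongs to the tuple with sequential TID i."""
    n = len(column)
    bitmaps: Dict[str, List[int]] = defaultdict(lambda: [0] * n)
    for tid, value in enumerate(column):
        bitmaps[value][tid] = 1   # 1 = tuple has this attribute value
    return dict(bitmaps)

# Sex column of the 15 example tuples, in TID order (Carol, Harold, Anne, Iris, ...)
sex = list("fmffmffffmmfmff")
index = build_bitmap_index(sex)
print("".join(map(str, index["f"])))  # 101101111001011
print("".join(map(str, index["m"])))  # 010010000110100
```

Since every tuple has exactly one value, the bitmaps of one attribute are disjoint and together cover all TIDs.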

Querying Bitmap Indexes

Main advantage of bitmap indexes: simple and efficient logical combination of predicates; only data that is relevant for the predicates is read.

Example: σ_{Sex='f' ∧ Region='n'}(R), conjunction of the bitmaps B1 and B2:
    for (i = 0; i < B1.length; i++) B[i] = B1[i] & B2[i];

Example I/O cost estimation: σ_{Sex='f' ∧ Region='n' ∧ Race='Asian'}(R) ("Asian women of region North")
- Selectivity: 1/2 · 1/8 · 1/4 = 1/64
- N = 10,000 tuples with a length of 400 bytes each (~10 tuples per page for 4 KB pages)
- Table scan: 1,000 pages
- Bitmap access: 10,000/64 ≈ 156 pages (worst case: each tuple on a different page), plus 1 page for the bitmaps
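The estimate can be reproduced with a few lines. Assumptions made explicit here: 4 KB means 4096 bytes, the three bitmaps are stored uncompressed with one bit per tuple, and the bitmap access reads one data page per expected hit (worst case).

```python
import math

def bitmap_io_estimate(n_tuples, tuple_bytes, page_bytes, selectivities, n_bitmaps):
    """Return (table scan pages, expected hits = worst-case data pages, bitmap pages)."""
    tuples_per_page = page_bytes // tuple_bytes                      # 4096 // 400 = 10
    table_scan = math.ceil(n_tuples / tuples_per_page)
    hits = n_tuples * math.prod(selectivities)                       # expected result size
    bitmap_pages = math.ceil(n_bitmaps * n_tuples / 8 / page_bytes)  # one bit per tuple
    return table_scan, hits, bitmap_pages

print(bitmap_io_estimate(10_000, 400, 4096, [1/2, 1/8, 1/4], n_bitmaps=3))
# (1000, 156.25, 1): about 156 data pages plus 1 bitmap page vs. 1000 pages
```

The three bitmaps need 3 × 10,000 bits = 3,750 bytes, so they indeed fit into a single page.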

[Figure: conjunction of the bitmaps F AND N AND A]
F        1 0 1 1 0 1 1 1 1 0 0 1 0 1 1
N        0 1 1 0 0 1 0 0 0 1 0 0 0 0 0
A        0 0 1 0 0 0 1 0 1 0 0 0 0 0 0
F∧N∧A    0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
  • DBMS Architecture
  • What is in the Lecture
  • How is Database System build
  • Architectural Blue Print
  • Architectural Trends
  • Different Access Characteristics
  • Hardware Developments
  • Row Storage vs Column Storage
  • Processing Models
  • Transaction Management
  • ACID Properties of Transactions
  • Motivation
  • Reasons for crashes
  • Guarantee Atomicity amp Durability
  • Record Management
  • Record
  • Record Addressing
  • Surrogate with Mapping Table
  • TID Concept
  • Free Space Management
  • Physical Access Paths ndash Index Structures
  • Overview Indexes
  • Overview Indexes (2)
  • Classification of Index Structures
  • B-Tree
  • B-Tree (2)
  • Search in the B-Tree
  • Insertions in the B-Tree (1)
  • Insertions in the B-Tree (2)
  • Insertions in the B-Tree (3)
  • Insertion and Deletion in the B-Tree
  • Deletion in the B-Tree
  • Deletion in the B-Tree (2)
  • Deletion in the B-Tree (3)
  • Deletion in the B-Tree (4)
  • Deletion Algorithm
  • B-Trees B+-Trees and B-Trees
  • Indexing Low Cardinality Columns
  • Bitmap Index
  • Querying Bitmap Indexes
Page 11: Foundations of Information Systems 5 DBMS Architecturepages.iai.uni-bonn.de/behrend_andreas/lehre/FIM/WS16/11... · 2017-01-10 · Main-memory access has become a performance bottleneck

| 273

gt

copy A Behrend Foundations of Information Systems |

ACID Properties of Transactions

Atomicity Indivisibility due to the transaction definition (Begin - End) All-or-nothing principle ie the DBS guarantees Either the complete execution of a transaction hellip hellip or the ineffectiveness of the whole transaction (and of all associated operations)

Consistency A successful transaction guarantees that all consistency requirements (integrity

requirements) have been met

Isolation Multiple transactions run isolated from each other and do not use (inconsistent)

intermediate results from other transactions

Durability All results of successful transactions have to be made persistent

| 274

gt

copy A Behrend Foundations of Information Systems |

Motivation

Atomicity Part of the transaction is done but we want to cancel it ABORTROLLBACK System crashes during transaction some changes made it to the disk some did not

Durability Transaction finished user notified COMMIT System crashes before changes sent successfully to disk (asynchronous write)

Consistency Physical consistency Correctness of the storage and access structures Completely executed modification operations preserve the consistency

Logical consistency Correctness of data contents ndash correspond to a (possible) state of the real world Completely executed transactions preserve the logical consistency

- All modifications of finished transactions are included - No modifications of open transactions are included

Remember Logical consistency requires physical consistency in the first place

UNDO Recovery

REDO Recovery

UNDO Recovery for consistency-related rollbacks

| 275

gt

copy A Behrend Foundations of Information Systems |

Reasons for crashes

Transaction error Violation of system restrictions Violation of security regulations Excessive resource requirements deadlocks

Application-related errors eg wrong operations and values ROLLBACK

System error System crash with loss of main-memory contents Database system operating system hardware power failure

Device error (especially storage-medium error) Destruction of secondary storage systems

Catastrophes Destruction of the computing center

| 276

gt

copy A Behrend Foundations of Information Systems |

Guarantee Atomicity amp Durability

Assumptions System may crash but the disk is durable The only atomicity guarantee is that a disk block write is atomic

Materialization strategy Preferred Policy StealNo Force This combination is most complicated but allows for highest performance No Force complicates enforcing Durability What if system crashes before a modified page written by a committed

transaction makes it to disk Write as little as possible in a convenient place at commit time to support

REDOing modifications Steal complicates enforcing Atomicity What if the transaction that performed udpates aborts What if system crashes before transaction is finished Must remember the old value of P (to support UNDOing the write to page P)

copy A Behrend Foundations of Information Systems | 277

gt

|

Storage System

Buffer

File System

Hardware

Data System

Application

TID TID TID

TID TID TID

SELECT sfirstname slastname COUNT(lname) FROM Student s INNER JOIN Program p ON sprogramId = pid INNER JOIN Attendance a ON astudentId = sstudentId INNER JOIN Lecture l ON alectureId = lid GROUP BY sfirstname slastname WHERE pname=lsquoDSErsquo

Run

Buffered Pages - Page replacement strategy - Materialization strategy - Logging Backup Recovery

Paged files

Disks Flash RAID SAN hellip

Storage Structures - Record management - Free space management - Physical access paths

Access System

Data

base

Sys

tem

Data model semantics - System catalog - Record format - Logical access paths

Query processing - Parsing - Plan generation - Plan optimization - Plan execution

1 lsquoSmithrsquo 15061982

Table Person id INT name VARCHAR birthday DATE Index P_id_IX on Personid

Database System

SQL JDBC ODBC hellip

Record Management

| 278

gt

copy A Behrend Foundations of Information Systems |

Record

Record Package of fields that together describe a thing a person a fact etc Each fields represents on property of the entity described by the record Similar to a struct in C Variable length (in contrast to pages)

Record Manager Organizes physical storage of records in pages Operations Get Insert Update Delete Scan Agnostic to record structure and semantic

records considered as byte strings of variable length Structure and content of record is defined be Access System and application

Challenges Record addressing Free space management

| 279

gt

copy A Behrend Foundations of Information Systems |

Record Addressing

Record address Identifier for records used to address records eg in indexes or query processing Assigned during insert of a record

Goals Stability of identifier Fast and direct access Less organizational overhead

Direct addressing Byte address or position number in file or page Instable Byte address If record grows in length following records would get new address Position number Insert and delete operations change series or records

Indirect addressing Surrogate with mapping table (complete indirection) Tuple Identifier (TID concept)

| 280

gt

copy A Behrend Foundations of Information Systems |

Surrogate with Mapping Table

Surrogate Record type + serial number Serial number remains constant during recordrsquos life time

Mapping table Maps

surrogate to page

Problems Where to store mapping table How can it be extended How to search mapping table efficiently

rarr H2 use B-Tree to store mapping table

Mapping Table Surrogate | Page ID

| 281

gt

copy A Behrend Foundations of Information Systems |

TID Concept

Record addressing with indirection inside the page Each page contains an array with record positions TID of a record consist of page id and index in position array

Pros Access with one page access (two pages in case of overflow) Stable No mapping table required

Operations Insert Reuse unused

position or add position Delete Mark position

as unused in array Update Update all

positions in array Update with overflow Store record

as overflow record and store TID of overflow record at original position (No double overflow Update TID at original position)

Record

Overflow Record

| 282

gt

copy A Behrend Foundations of Information Systems |

Free Space Management

Problem In which page is enough space for new record

Solution Free space table lists for all pages how much space is left

Free space value Precise value Ceil(Log2(page size)) =gt 2 bytes for common page size of 4K Rough value use less bytes free space = (value page size)2^(bits per value)

Free space table With direct page addressing Assuming a single page can take n free space entries First page and each (n+1)-th page takes free space entries

With indirect page addressing Free space information stored in page table

copy A Behrend Foundations of Information Systems | 283

gt

|

Storage System

Buffer

File System

Hardware

Data System

Application

TID TID TID

TID TID TID

SELECT sfirstname slastname COUNT(lname) FROM Student s INNER JOIN Program p ON sprogramId = pid INNER JOIN Attendance a ON astudentId = sstudentId INNER JOIN Lecture l ON alectureId = lid GROUP BY sfirstname slastname WHERE pname=lsquoDSErsquo

Run

Buffered Pages - Page replacement strategy - Materialization strategy - Logging Backup Recovery

Paged files

Disks Flash RAID SAN hellip

Storage Structures - Record management - Free space management - Physical access paths

Access System

Data

base

Sys

tem

Data model semantics - System catalog - Record format - Logical access paths

Query processing - Parsing - Plan generation - Plan optimization - Plan execution

1 lsquoSmithrsquo 15061982

Table Person id INT name VARCHAR birthday DATE Index P_id_IX on Personid

Database System

SQL JDBC ODBC hellip

Physical Access Paths ndash Index Structures

| 284

gt

copy A Behrend Foundations of Information Systems |

Primary Index Secondary Index

Overview Indexes

Table scan Read all pages and for each record

evaluate the search criteria Pre-fetching

Index Scan Use index for search criteria

on one or more attributes Fast access to single values or value ranges of index attributes Logicalphysical sorting of values of key attributes (depending on index structure) Enforcing uniqueness

Types if indexes Primary (Clustered) Index

determines physical organization use for PK Secondary (Non-Clustered)

Index redundant access path

Pers(PID NAME AGE SALARY)

Age

Salary

| 285

gt

copy A Behrend Foundations of Information Systems |

Overview Indexes (2)

Choice of Access Paths Index scan Only useful for low selectivity

(low number of result tuples) Break even-point according to the

output ratio of the number of tuples (usually max 5) Requires statistics about data Additional costs for index storage

and updating

Table Scan adequateefficient for small tables

(eg 5 pages) Queries with high selectivity

(large result sets) 100-200MBs sequential read

~ 100 disk seekss

hit rate

Index Scan

Table Scan

access time

| 286

gt

copy A Behrend Foundations of Information Systems |

Classification of Index Structures

Classification

Multiway Trees Tree structure with multiple children per node Idea chose fan out so that node size suits page size

Onedimensional Index Structures

Key Comparison Key Transformation

Sequential Tree-Based Hash-Based

Prefix Trees (Tries)

Binary Search Trees Dynamic Static Linked Lists

(log seq) Seq Lists

(phys seq) Multiway

Trees Example B-Tree

| 287

gt

copy A Behrend Foundations of Information Systems |

B-Tree

free space

(Ki Di Pi) = entry min |P| = k+1 max |P| = 2k+1

keys lt K1 keys gt Kp Ki lt keys lt Ki+1

| 288

gt

copy A Behrend Foundations of Information Systems |

B-Tree (2)

Example Keys Agnostic to specific key semantic Only defined complete order required Could be of fixed or variable length

Operations Search for data for given key value Insertion and deletion of key-data pair

Payload Agnostic to specific data semantic Can be record or reference (TID) or

mix

B-Tree with k = 2 h = 3

| 289

gt

copy A Behrend Foundations of Information Systems |

Search in the B-Tree

Starting at the root node each node is searched from left to right 1) if Ki matches the desired key value the data record has been found (further

records with the same key value might be located in a sub-tree to which Pi-1 points) 2) if Ki is smaller than the desired value the search will be continued in the root of

the sub-tree identified by Pi-1 3) if Ki is larger than the desired value the comparison with Ki+1 is repeated 4) if K2k is also smaller than the desired value the search will be continued in the

sub-tree of P2k If itlsquos impossible to descend further into a sub-tree (2 or 4) (leaf node) The search is aborted no record with the desired key value is found

Search for 38 20 6

| 290

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (1)

Insertion Rule insert only into leaf nodes At Non-Leaf Nodes descend down the tree as for the search S le Ki follow Pi-1 S gt Ki check Ki+1 S gt K2k follow P2k

At Leaf Node Insert the data record according to the sorting order Special case leaf node is full (2k records) rarr split the leaf node

Splitting Generate a new leaf node Split the 2k+1 entries (in order)

into two leaf nodes first k entries rarr left node last k entries rarr right node

middle entry (k+1-th) is used as new ldquodiscriminatorrdquo (branching) and inserted into the parent node

| 291

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (2)

Node Splitting during Insertion Two possible situations after a split The parent node is full rarr repeat split on this level Enough space rarr FINISHED

Special case root split Split of the root node rarr New root with two successor nodes Height of a tree grows by 1 The tree has been split from the bottom to the top

Dynamic reorganization (self-balancing) No unloading or loading necessary Tree is always balanced But In case of many insertions deletions reorganization can be beneficial

| 292

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (3)

Insertion Example Order k = 1 n=2k Keys 1 5 2 6 7 4 8 3

Finally h=3

| 293

gt

copy A Behrend Foundations of Information Systems |

Insertion and Deletion in the B-Tree

Problem Insertion can create overflow Deletion can create underflow and overflow

Example Insertion of key 22

rarr Overflow rarr Split

Deletion of key 22 rarr Underflow need to access all four nodes finally same as input

Insert 22

| 294

gt

copy A Behrend Foundations of Information Systems |

Underflow Merge

Deletion in the B-Tree

Example Order k = 1 n=2k Delete key 3

| 295

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (2)

Example Order k = 1 n=2k Delete key 3

Underflow Merge

Remember Each path from the root to the leaf has the same length h

| 296

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (3)

Example Order k = 1 n=2k Delete key 3

Underflow Merge

Overflow Split

| 297

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (4)

Example Order k = 1 n=2k Delete key 3

Overflow Split

Root Split

| 298

gt

copy A Behrend Foundations of Information Systems |

Deletion Algorithm

Example ndash there are different algorithms Search the node that contains the key K to be deleted If key K is in a leaf node delete the key in the leaf node and handle potentially

resulting underflow by merging with sibling If key K is in an inner node pull up new discriminator from one of the successors Analyze which successor node of K has more elements left or right one

If both have the same number of elements decide for one Replace the key K to be deleted with the direct successor Krsquo from the left

successor node or with the direct successor Krsquorsquo from the right successor node respectively Delete Krsquo or Krsquorsquo from the respective successor node (recursively)

Note Major variants Merge (tis lecture) Re-distribution (instead of splitmerge in case of overflowunderflow the entries are

re-distributed under consideration of one or multiple adjacent nodes)

| 299

gt

copy A Behrend Foundations of Information Systems |

B-Trees B+-Trees and B-Trees

B+-Trees and B-Trees Data is only in leaf nodes Key redundancy but higher fan-out rarr lower tree high less IO Simpler delete procedure rarr requires only merging of nodes

Double linked list of all leaf nodes B-Trees Modified valid node sizes

from [k2k] to [43k2k] rarr better node utilization but more splitsmerges

Example Secondary index Non unique

B-Tree with k = 2 h = 3

| 300

gt

copy A Behrend Foundations of Information Systems |

Indexing Low Cardinality Columns

Problem: for example, a B-tree on the sex of customers for a table with 1,000,000 tuples results in two lists with approximately 500,000 tuples each.

A query for all female customers requires 500,000 random page accesses (secondary index); a table scan would be much faster.

Conclusion: B-trees (and also hashing) are useful for predicates with low selectivity (output/input cardinality ratio). Rule of thumb: the margin is a hit rate of approx. 5%; higher hit rates do not justify the effort of an index access.

Figure: B-tree on Sex with two entries, F and M, each pointing to a long list of TIDs (TID TID TID TID …).
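The arithmetic behind the problem can be checked with a small sketch. It assumes (not stated on the slide) about 10 tuples per page, counts one random page access per matching TID as the worst case, and uses the 5% rule of thumb as the threshold; the function names are hypothetical.

```python
import math


def page_accesses(n_tuples, tuples_per_page, hit_rate):
    """Return (table scan pages, worst-case index scan page accesses)."""
    table_scan = math.ceil(n_tuples / tuples_per_page)   # every page read once, sequentially
    index_scan = round(n_tuples * hit_rate)             # one random page access per matching TID
    return table_scan, index_scan


def use_index(hit_rate, threshold=0.05):
    """Rule of thumb: an index access only pays off below a hit rate of about 5%."""
    return hit_rate < threshold


# B-tree on Sex: 1,000,000 customers, about half of them female
print(page_accesses(1_000_000, 10, 0.5))   # (100000, 500000)
print(use_index(0.5))                       # False
```

Even before accounting for the much higher cost of random versus sequential I/O, the index scan touches five times as many pages as the table scan in this setting.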


Bitmap Index

Idea (long history since the 1960s): create a bitmap (bit list) for each attribute value.
- Each tuple in the table is assigned to one bit in the bitmap (by position, i.e. sequential TID).
- Bit values: 1 = attribute value set, 0 = attribute value not set.
- Necessary condition: sequential numbering of the tuples (TIDs).

Example: bitmap index on Sex (bitmaps F and M)

Name     Sex  Region  Race   F  M
Carol    f    n       white  1  0
Harold   m    e       black  0  1
Anne     f    e       asian  1  0
Iris     f    ne      white  1  0
…        m    se      hisp   0  1
…        f    e       white  1  0
…        f    sw      asian  1  0
…        f    w       black  1  0
…        f    n       asian  1  0
…        m    e       hisp   0  1
…        m    se      black  0  1
…        f    s       white  1  0
…        m    nw      black  0  1
…        f    s       white  1  0
…        f    w       black  1  0
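Building such an index is a single pass over the column. A minimal sketch, assuming the column values are given in TID order (the necessary condition above); plain lists of 0/1 mirror the slide's bit lists, and `build_bitmap_index` is a hypothetical name.

```python
def build_bitmap_index(column):
    """Map each distinct attribute value to a bit list; position = sequential TID."""
    bitmaps = {}
    for tid, value in enumerate(column):
        if value not in bitmaps:
            bitmaps[value] = [0] * len(column)   # one bit per tuple, initially 0
        bitmaps[value][tid] = 1                  # 1 = tuple has this attribute value
    return bitmaps


sex = ["f", "m", "f", "f", "m", "f", "f", "f", "f", "m", "m", "f", "m", "f", "f"]
index = build_bitmap_index(sex)
print(index["f"])   # [1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1]
```

For a low-cardinality column like Sex, each tuple costs one bit per distinct value instead of a TID in a long list.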


Querying Bitmap Indexes

Main advantage of bitmap indexes: a simple and efficient logical combination of predicates (AND/OR) is possible; only data that is relevant for the predicates is read.

Example: σ_{Sex='f' ∧ Region='n'}(R): bitmaps B1 and B2 are combined by conjunction:

    for (i = 0; i < B1.length; i++) B[i] = B1[i] & B2[i];

Example I/O cost estimation: σ_{Sex='f' ∧ Region='n' ∧ Race='Asian'}(R) ("Asian women of region North")
- Selectivity: 1/2 · 1/8 · 1/4 = 1/64
- N = 10,000 tuples with a length of 400 bytes each (~10 tuples per page for 4 kB pages)
- Table scan: 1,000 pages
- Bitmap access: 10,000/64 ≈ 156 pages (worst case: each tuple in a different page), plus 1 page for the bitmaps

Bitmaps of the example table (one bit per tuple, in TID order):

F              1 0 1 1 0 1 1 1 1 0 0 1 0 1 1
N              1 0 0 0 0 0 0 0 1 0 0 0 0 0 0
A              0 0 1 0 0 0 1 0 1 0 0 0 0 0 0
F AND N AND A  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
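The conjunction and the cost estimate can be reproduced in a short sketch. It derives the bitmaps from the example table (so Region = 'n' selects the 1st and 9th tuples), uses Python's arbitrary-precision integers as bit vectors in place of the bit-by-bit loop, and recomputes the numbers above; `bitmap` is a hypothetical helper.

```python
import math
from fractions import Fraction

# Columns of the example table (position = sequential TID)
sex = "f m f f m f f f f m m f m f f".split()
region = "n e e ne se e sw w n e se s nw s w".split()
race = "white black asian white hisp white asian black asian hisp black white black white black".split()


def bitmap(column, value):
    """Bitmap as a Python int: bit t is set iff tuple t has the given value."""
    bits = 0
    for tid, v in enumerate(column):
        if v == value:
            bits |= 1 << tid
    return bits


# Conjunction of whole bitmaps at once (stand-in for a word-wise AND)
result = bitmap(sex, "f") & bitmap(region, "n") & bitmap(race, "asian")
matches = [tid for tid in range(len(sex)) if result >> tid & 1]
print(matches)  # [8], i.e. the 9th tuple (f, n, asian)

# I/O cost estimate: N = 10,000 tuples, ~10 tuples per 4 kB page
N, page_size = 10_000, 4096
selectivity = Fraction(1, 2) * Fraction(1, 8) * Fraction(1, 4)   # 1/64
table_scan_pages = N // 10                                          # 1000
expected_hits = N * selectivity                                     # 625/4 = 156.25
bitmap_pages = math.ceil(3 * N / 8 / page_size)                     # 3 bitmaps = 3750 bytes
```

Only the three bitmaps (about 3.7 kB together) have to be read to find the single qualifying TID; the table itself is accessed only for matches.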


Motivation

Atomicity:
- Part of the transaction is done, but we want to cancel it (ABORT/ROLLBACK).
- The system crashes during the transaction: some changes made it to the disk, some did not.
→ UNDO recovery

Durability:
- Transaction finished, user notified (COMMIT).
- The system crashes before the changes were successfully written to disk (asynchronous write).
→ REDO recovery

Consistency:
- Physical consistency: correctness of the storage and access structures; completely executed modification operations preserve physical consistency.
- Logical consistency: correctness of the data contents, which correspond to a (possible) state of the real world; completely executed transactions preserve logical consistency:
  - all modifications of finished transactions are included,
  - no modifications of open transactions are included.
→ UNDO recovery for consistency-related rollbacks

Remember: logical consistency requires physical consistency in the first place.


Reasons for crashes

Transaction error:
- Violation of system restrictions: violation of security regulations, excessive resource requirements, deadlocks
- Application-related errors, e.g. wrong operations and values (ROLLBACK)

System error: system crash with loss of main-memory contents (database system, operating system, hardware, power failure)

Device error (especially storage-medium error): destruction of secondary storage systems

Catastrophes: destruction of the computing center


Guarantee Atomicity & Durability

Assumptions:
- The system may crash, but the disk is durable.
- The only atomicity guarantee is that a disk block write is atomic.

Materialization strategy: preferred policy Steal/No-Force. This combination is the most complicated but allows for the highest performance.
- No-Force complicates enforcing Durability: what if the system crashes before a modified page written by a committed transaction makes it to disk? Write as little as possible, in a convenient place, at commit time to support REDOing modifications.
- Steal complicates enforcing Atomicity: what if the transaction that performed updates aborts? What if the system crashes before the transaction is finished? The old value of a page P must be remembered (to support UNDOing the write to page P).
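Under Steal/No-Force, the "convenient place" written at commit time is typically a log that records both old and new values. The following sketch is a deliberately simplified model, not a real recovery algorithm: it assumes one value per page, that log records reach the disk before the corresponding page (write-ahead logging), and no checkpoints or compensation records; all names are hypothetical. REDO repairs what No-Force left undone, UNDO removes what Steal already wrote.

```python
from dataclasses import dataclass


@dataclass
class Update:
    txn: str
    page: str
    before: int     # UNDO information: old value of the page
    after: int      # REDO information: new value of the page


@dataclass
class Commit:
    txn: str


def recover(disk, log):
    """Rebuild a consistent state from the durable disk image and the log."""
    committed = {rec.txn for rec in log if isinstance(rec, Commit)}
    state = dict(disk)
    for rec in log:                               # REDO pass (needed because of No-Force)
        if isinstance(rec, Update):
            state[rec.page] = rec.after
    for rec in reversed(log):                     # UNDO pass (needed because of Steal)
        if isinstance(rec, Update) and rec.txn not in committed:
            state[rec.page] = rec.before
    return state


# T1 changed P1 and committed, but P1 never reached the disk (No-Force).
# T2 changed P2, the page was already written (Steal), then the system crashed.
log = [Update("T1", "P1", 1, 11), Update("T2", "P2", 2, 22), Commit("T1")]
disk = {"P1": 1, "P2": 22, "P3": 3}
print(recover(disk, log))   # {'P1': 11, 'P2': 2, 'P3': 3}
```

Only the small log records have to be forced at commit time; the modified pages can be written whenever the buffer manager prefers.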


Figure: architectural blueprint (Application → Data System → Access System → Storage System → Buffer → File System → Hardware; access via SQL, JDBC, ODBC, …) with the running example: table Person(id INT, name VARCHAR, birthday DATE), index P_id_IX on Person(id), record (1, 'Smith', 15.06.1982).

Record Management


Record

Record:
- Package of fields that together describe a thing, a person, a fact, etc.
- Each field represents one property of the entity described by the record.
- Similar to a struct in C.
- Variable length (in contrast to pages).

Record manager:
- Organizes the physical storage of records in pages.
- Operations: Get, Insert, Update, Delete, Scan.
- Agnostic to record structure and semantics: records are considered as byte strings of variable length. The structure and content of a record are defined by the Access System and the application.

Challenges: record addressing, free space management.
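To the record manager a record is only a byte string. A minimal sketch of how the Access System might lay out the running example Person(id INT, name VARCHAR, birthday DATE): the layout (little-endian 4-byte id, 2-byte length prefix for the name, birthday as a day ordinal) is an assumption for illustration, not the format of any particular system, and the function names are hypothetical.

```python
import struct
from datetime import date

HEADER = struct.Struct("<iH")     # id (4-byte int), length of name (2 bytes)
BIRTHDAY = struct.Struct("<I")    # birthday as proleptic Gregorian day ordinal


def encode_person(pid, name, birthday):
    """Serialize a Person record into a variable-length byte string."""
    raw_name = name.encode("utf-8")
    return HEADER.pack(pid, len(raw_name)) + raw_name + BIRTHDAY.pack(birthday.toordinal())


def decode_person(record):
    """Inverse of encode_person: only the Access System needs to know this layout."""
    pid, name_len = HEADER.unpack_from(record, 0)
    start = HEADER.size
    name = record[start:start + name_len].decode("utf-8")
    (ordinal,) = BIRTHDAY.unpack_from(record, start + name_len)
    return pid, name, date.fromordinal(ordinal)


rec = encode_person(1, "Smith", date(1982, 6, 15))
print(len(rec), decode_person(rec))   # 15 (1, 'Smith', datetime.date(1982, 6, 15))
```

The record length depends on the name (15 bytes here), which is exactly why the record manager must handle byte strings of variable length.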


Record Addressing

Record address: identifier used to address records, e.g. in indexes or during query processing; assigned when a record is inserted.

Goals: stability of the identifier, fast and direct access, low organizational overhead.

Direct addressing: byte address or position number in the file or page. Unstable:
- Byte address: if a record grows in length, the following records get new addresses.
- Position number: insert and delete operations change the sequence of records.

Indirect addressing: surrogate with mapping table (complete indirection), or tuple identifier (TID concept).


Surrogate with Mapping Table

Surrogate: record type + serial number; the serial number remains constant during the record's lifetime.

Mapping table: maps each surrogate to a page (Surrogate | Page ID).

Problems: Where to store the mapping table? How can it be extended? How can it be searched efficiently?
→ H2: uses a B-tree to store the mapping table
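The complete indirection can be modeled in a few lines. A sketch with hypothetical names, in which a Python dict stands in for the mapping table (the slide mentions a B-tree for this purpose); it shows why the surrogate stays stable when a record moves, and why every access costs an extra lookup.

```python
class MappingTable:
    """Surrogate -> page ID. A dict stands in for the B-tree used in practice."""

    def __init__(self):
        self._next_serial = {}    # record type -> next serial number
        self._pages = {}          # (record type, serial number) -> page ID

    def new_surrogate(self, record_type, page_id):
        serial = self._next_serial.get(record_type, 0)
        self._next_serial[record_type] = serial + 1
        surrogate = (record_type, serial)
        self._pages[surrogate] = page_id
        return surrogate

    def page_of(self, surrogate):
        return self._pages[surrogate]          # extra lookup on every access

    def move(self, surrogate, new_page_id):
        self._pages[surrogate] = new_page_id   # the surrogate itself never changes
```

Moving a record only touches its mapping entry; indexes that store the surrogate need no update.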


TID Concept

Record addressing with indirection inside the page: each page contains an array with record positions; the TID of a record consists of the page ID and the index in the position array.

Pros: access with one page access (two pages in case of overflow); stable; no mapping table required.

Operations:
- Insert: reuse an unused position or add a new position.
- Delete: mark the position as unused in the array.
- Update: update all affected positions in the array (records behind the updated one may move within the page).
- Update with overflow: store the record as an overflow record and store the TID of the overflow record at the original position. No double overflow: on a further overflow, update the TID at the original position.

Figure: page with position array pointing to a record and, via a stored TID, to an overflow record in another page.
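A minimal Python model of the TID concept: each page keeps a position (slot) array, a TID is (page ID, slot index), and an update that no longer fits leaves a forwarding TID in the home slot. Assumptions for illustration: page capacity counts payload bytes only, forwarding TIDs cost no space, and the slot array holds the records themselves, so byte offsets (and the "update all positions" step) are implicit; all names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TID:
    page: int
    slot: int          # index in the page's position array


@dataclass
class Page:
    capacity: int
    slots: list = field(default_factory=list)   # bytes (record), TID (forward) or None (unused)

    def free(self):
        # forwarding TIDs are treated as free of charge in this simplified model
        return self.capacity - sum(len(s) for s in self.slots if isinstance(s, bytes))


class RecordFile:
    def __init__(self, n_pages, capacity):
        self.pages = [Page(capacity) for _ in range(n_pages)]

    def _store(self, record, exclude=None):
        """Put record into the first page with enough space; return its TID."""
        for pid, page in enumerate(self.pages):
            if pid != exclude and page.free() >= len(record):
                for i, s in enumerate(page.slots):
                    if s is None:                      # reuse an unused position
                        page.slots[i] = record
                        return TID(pid, i)
                page.slots.append(record)              # or add a new position
                return TID(pid, len(page.slots) - 1)
        raise RuntimeError("no page with enough free space")

    def insert(self, record):
        return self._store(record)

    def get(self, tid):
        entry = self.pages[tid.page].slots[tid.slot]
        if isinstance(entry, TID):                     # overflow: second page access
            entry = self.pages[entry.page].slots[entry.slot]
        return entry

    def _drop_overflow(self, tid):
        entry = self.pages[tid.page].slots[tid.slot]
        if isinstance(entry, TID):
            self.pages[entry.page].slots[entry.slot] = None

    def delete(self, tid):
        self._drop_overflow(tid)
        self.pages[tid.page].slots[tid.slot] = None    # mark position as unused

    def update(self, tid, record):
        home = self.pages[tid.page]
        self._drop_overflow(tid)                       # no double overflow
        home.slots[tid.slot] = None                    # release the old version
        if home.free() >= len(record):
            home.slots[tid.slot] = record              # fits: stays in its home page
        else:                                          # store as overflow record
            home.slots[tid.slot] = self._store(record, exclude=tid.page)
```

Whatever happens to the record, callers keep using the TID returned by `insert`; at most one forwarding step is ever needed.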


Free Space Management

Problem: which page has enough space for a new record?

Solution: a free space table lists, for all pages, how much space is left.

Free space value:
- Precise value: ceil(log2(page size)) bits ⇒ 2 bytes for the common page size of 4 KB
- Rough value: use fewer bits; free space = (value · page size) / 2^(bits per value)

Free space table:
- With direct page addressing: assuming a single page can take n free space entries, the first page and each (n+1)-th page hold free space entries.
- With indirect page addressing: free space information is stored in the page table.
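The formulas above can be made concrete in a short sketch. Assumptions: rough values are rounded down (so a page is never reported as having more free space than it has), and for direct addressing the free space pages sit at positions 0, n+1, 2(n+1), …, each covering the n data pages that follow; the function names are hypothetical.

```python
import math


def precise_bits(page_size):
    """Bits for an exact free-space value (formula from the slide)."""
    return math.ceil(math.log2(page_size))


def encode_rough(free_bytes, page_size, bits):
    """Round down so the table never promises more space than the page has."""
    return min(free_bytes * 2**bits // page_size, 2**bits - 1)


def decode_rough(value, page_size, bits):
    return value * page_size // 2**bits


def fsi_location(data_page, n):
    """Direct page addressing: pages 0, n+1, 2(n+1), ... hold free space entries.

    Returns (free space page, entry index) for a data page number.
    """
    if data_page % (n + 1) == 0:
        raise ValueError("page holds free space entries, not data")
    group = data_page // (n + 1)
    return group * (n + 1), data_page % (n + 1) - 1


print(precise_bits(4096))             # 12 bits, i.e. 2 bytes
print(encode_rough(1000, 4096, 4))    # 3 -> read back as 768 bytes
print(fsi_location(5, 3))             # (4, 0)
```

With 4 bits per entry, free space is tracked in units of 256 bytes for a 4 KB page, which is usually precise enough to pick a candidate page.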



Physical Access Paths – Index Structures


Overview Indexes

Table scan: read all pages and evaluate the search criteria for each record; pre-fetching.

Index scan: use an index for search criteria on one or more attributes:
- fast access to single values or value ranges of the index attributes
- logical/physical sorting of the values of the key attributes (depending on the index structure)
- enforcing uniqueness

Types of indexes:
- Primary (clustered) index: determines the physical organization; used for the PK.
- Secondary (non-clustered) index: redundant access path.

Figure: relation Pers(PID, NAME, AGE, SALARY) with a primary index and secondary indexes on Age and Salary.


Overview Indexes (2)

Choice of access paths:

Index scan:
- Only useful for low selectivity (low number of result tuples)
- Break-even point according to the output ratio of the number of tuples (usually max. 5%)
- Requires statistics about the data
- Additional costs for index storage and updating

Table scan:
- Adequate/efficient for small tables (e.g. 5 pages)
- Queries with high selectivity (large result sets)
- 100–200 MB/s sequential read vs. ~100 disk seeks/s

Figure: access time over hit rate for index scan and table scan.


Classification of Index Structures

Classification

Multiway trees: tree structure with multiple children per node. Idea: choose the fan-out so that the node size suits the page size.

One-dimensional index structures:
- Key comparison
  - Sequential: sequential lists (physically sequential), linked lists (logically sequential)
  - Tree-based: binary search trees (static, dynamic), multiway trees (example: B-tree)
- Key transformation
  - Hash-based
- Prefix trees (tries)


B-Tree

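Node layout: P0, (K1, D1, P1), (K2, D2, P2), …, free space; each entry (Ki, Di, Pi) consists of key, data and pointer.
- min |P| = k+1, max |P| = 2k+1 pointers per node
- Subtree of P0: keys < K1; subtree of Pi: Ki < keys < Ki+1; subtree of the last pointer: keys > Kp (last key of the node)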


B-Tree (2)

Keys:
- Agnostic to specific key semantics; only a defined total order is required
- Could be of fixed or variable length

Operations: search for data for a given key value; insertion and deletion of a key-data pair

Payload:
- Agnostic to specific data semantics
- Can be the record, a reference (TID), or a mix

Example: B-tree with k = 2, h = 3


Search in the B-Tree

Starting at the root node, each node is searched from left to right:
1) If Ki matches the desired key value, the data record has been found (further records with the same key value might be located in the sub-tree to which Pi-1 points).
2) If Ki is larger than the desired value, the search continues in the root of the sub-tree identified by Pi-1.
3) If Ki is smaller than the desired value, the comparison is repeated with Ki+1.
4) If K2k (the last key) is also smaller than the desired value, the search continues in the sub-tree of P2k.

If it is impossible to descend further into a sub-tree in case 2) or 4) (leaf node), the search is aborted: no record with the desired key value exists.

Example: search for 38, 20, 6
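The four cases translate into a short loop. A sketch assuming distinct keys and a node layout of sorted keys K1…Kb with data D1…Db and pointers P0…Pb (list index i in `children` corresponds to Pi); the example tree is a hypothetical one with k = 2, not the tree from the slide figure, and `Node`, `search` and `leaf` are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list                                    # K1 < K2 < ... (sorted)
    data: list                                    # Di belongs to Ki
    children: list = field(default_factory=list)  # P0..Pb; empty for leaves


def search(node, key):
    """Return the data for key, or None if the key does not exist."""
    while True:
        i = 0
        while i < len(node.keys) and node.keys[i] < key:
            i += 1                                # case 3: Ki smaller, compare next key
        if i < len(node.keys) and node.keys[i] == key:
            return node.data[i]                   # case 1: found
        if not node.children:
            return None                           # leaf: cannot descend further
        node = node.children[i]                   # case 2 (first larger key) or case 4


def leaf(*keys):
    return Node(list(keys), [f"D{k}" for k in keys])


root = Node([20, 40], ["D20", "D40"], [leaf(5, 10), leaf(25, 30, 38), leaf(45, 50)])
print(search(root, 38), search(root, 20), search(root, 6))   # D38 D20 None
```

The search for 6 descends into the leftmost leaf, passes 5, stops at 10 and, since a leaf cannot be left, reports that the key does not exist.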


Insertions in the B-Tree (1)

Insertion rule: insert only into leaf nodes.

At non-leaf nodes: descend down the tree as for the search:
- S ≤ Ki: follow Pi-1
- S > Ki: check Ki+1
- S > K2k: follow P2k

At the leaf node: insert the data record according to the sort order.
Special case: the leaf node is full (2k records) → split the leaf node.

Splitting: generate a new leaf node and split the 2k+1 entries (in order) into two leaf nodes:
- first k entries → left node
- last k entries → right node
- the middle entry ((k+1)-th) is used as the new "discriminator" (branching key) and inserted into the parent node


Insertions in the B-Tree (2)

Node splitting during insertion: two possible situations after a split:
- The parent node is full → repeat the split on this level.
- Enough space → finished.

Special case root split: splitting the root node → new root with two successor nodes. The height of the tree grows by 1; the tree grows from the bottom to the top.

Dynamic reorganization (self-balancing): no unloading or reloading necessary; the tree is always balanced. But: in case of many insertions/deletions, a reorganization can be beneficial.
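Insertion with bottom-up splitting can be sketched as a recursive function that hands the discriminator and the new right node to the parent whenever a node overflows. Assumptions: keys only (no payload), `k` is the order, and the names are hypothetical; the usage replays the insertion example with k = 1 and keys 1, 5, 2, 6, 7, 4, 8, 3.

```python
import bisect


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []          # at most 2k keys
        self.children = children or []  # empty for leaves


def _split(node, k):
    """Split an overflowing node (2k+1 keys); return (discriminator, new right node)."""
    mid = node.keys[k]                  # (k+1)-th entry moves up
    right = Node(node.keys[k + 1:], node.children[k + 1:])
    node.keys, node.children = node.keys[:k], node.children[:k + 1]
    return mid, right


def _insert(node, key, k):
    i = bisect.bisect_left(node.keys, key)   # first Ki >= S: descend left of it
    if not node.children:
        node.keys.insert(i, key)             # insert only into leaves, in sort order
    else:
        split = _insert(node.children[i], key, k)
        if split:                            # child split: take over its discriminator
            mid, right = split
            node.keys.insert(i, mid)
            node.children.insert(i + 1, right)
    return _split(node, k) if len(node.keys) > 2 * k else None


def insert(root, key, k):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key, k)
    if split:                                # root split: height grows by 1
        mid, right = split
        return Node([mid], [root, right])
    return root


def height(node):
    return 1 + (height(node.children[0]) if node.children else 0)


root = Node()
for key in [1, 5, 2, 6, 7, 4, 8, 3]:
    root = insert(root, key, k=1)
print(root.keys, height(root))   # [4] 3
```

Inserting 3 overflows a leaf, whose discriminator 4 then overflows the root; the resulting root split is the only way the height ever grows.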


Insertions in the B-Tree (3)

Insertion example: order k = 1 (n = 2k), keys 1, 5, 2, 6, 7, 4, 8, 3

Finally: h = 3


Insertion and Deletion in the B-Tree

Problem: insertion can create overflow; deletion can create underflow and overflow.

Example:
- Insertion of key 22 → overflow → split
- Deletion of key 22 → underflow; all four nodes need to be accessed; finally the tree is the same as the input

Figure: Insert 22


Deletion in the B-Tree

Example: order k = 1 (n = 2k), delete key 3

Underflow → Merge



| 275

gt

copy A Behrend Foundations of Information Systems |

Reasons for crashes

Transaction error Violation of system restrictions Violation of security regulations Excessive resource requirements deadlocks

Application-related errors eg wrong operations and values ROLLBACK

System error System crash with loss of main-memory contents Database system operating system hardware power failure

Device error (especially storage-medium error) Destruction of secondary storage systems

Catastrophes Destruction of the computing center

| 276

gt

copy A Behrend Foundations of Information Systems |

Guarantee Atomicity amp Durability

Assumptions System may crash but the disk is durable The only atomicity guarantee is that a disk block write is atomic

Materialization strategy Preferred Policy StealNo Force This combination is most complicated but allows for highest performance No Force complicates enforcing Durability What if system crashes before a modified page written by a committed

transaction makes it to disk Write as little as possible in a convenient place at commit time to support

REDOing modifications Steal complicates enforcing Atomicity What if the transaction that performed udpates aborts What if system crashes before transaction is finished Must remember the old value of P (to support UNDOing the write to page P)

copy A Behrend Foundations of Information Systems | 277

gt

|

Storage System

Buffer

File System

Hardware

Data System

Application

TID TID TID

TID TID TID

SELECT sfirstname slastname COUNT(lname) FROM Student s INNER JOIN Program p ON sprogramId = pid INNER JOIN Attendance a ON astudentId = sstudentId INNER JOIN Lecture l ON alectureId = lid GROUP BY sfirstname slastname WHERE pname=lsquoDSErsquo

Run

Buffered Pages - Page replacement strategy - Materialization strategy - Logging Backup Recovery

Paged files

Disks Flash RAID SAN hellip

Storage Structures - Record management - Free space management - Physical access paths

Access System

Data

base

Sys

tem

Data model semantics - System catalog - Record format - Logical access paths

Query processing - Parsing - Plan generation - Plan optimization - Plan execution

1 lsquoSmithrsquo 15061982

Table Person id INT name VARCHAR birthday DATE Index P_id_IX on Personid

Database System

SQL JDBC ODBC hellip

Record Management

| 278

gt

copy A Behrend Foundations of Information Systems |

Record

Record Package of fields that together describe a thing a person a fact etc Each fields represents on property of the entity described by the record Similar to a struct in C Variable length (in contrast to pages)

Record Manager Organizes physical storage of records in pages Operations Get Insert Update Delete Scan Agnostic to record structure and semantic

records considered as byte strings of variable length Structure and content of record is defined be Access System and application

Challenges Record addressing Free space management

| 279

gt

copy A Behrend Foundations of Information Systems |

Record Addressing

Record address Identifier for records used to address records eg in indexes or query processing Assigned during insert of a record

Goals Stability of identifier Fast and direct access Less organizational overhead

Direct addressing Byte address or position number in file or page Instable Byte address If record grows in length following records would get new address Position number Insert and delete operations change series or records

Indirect addressing Surrogate with mapping table (complete indirection) Tuple Identifier (TID concept)

| 280

gt

copy A Behrend Foundations of Information Systems |

Surrogate with Mapping Table

Surrogate Record type + serial number Serial number remains constant during recordrsquos life time

Mapping table Maps

surrogate to page

Problems Where to store mapping table How can it be extended How to search mapping table efficiently

rarr H2 use B-Tree to store mapping table

Mapping Table Surrogate | Page ID

| 281

gt

copy A Behrend Foundations of Information Systems |

TID Concept

Record addressing with indirection inside the page Each page contains an array with record positions TID of a record consist of page id and index in position array

Pros Access with one page access (two pages in case of overflow) Stable No mapping table required

Operations Insert Reuse unused

position or add position Delete Mark position

as unused in array Update Update all

positions in array Update with overflow Store record

as overflow record and store TID of overflow record at original position (No double overflow Update TID at original position)

Record

Overflow Record

| 282

gt

copy A Behrend Foundations of Information Systems |

Free Space Management

Problem In which page is enough space for new record

Solution Free space table lists for all pages how much space is left

Free space value Precise value Ceil(Log2(page size)) =gt 2 bytes for common page size of 4K Rough value use less bytes free space = (value page size)2^(bits per value)

Free space table With direct page addressing Assuming a single page can take n free space entries First page and each (n+1)-th page takes free space entries

With indirect page addressing Free space information stored in page table

copy A Behrend Foundations of Information Systems | 283

gt

|

Storage System

Buffer

File System

Hardware

Data System

Application

TID TID TID

TID TID TID

SELECT sfirstname slastname COUNT(lname) FROM Student s INNER JOIN Program p ON sprogramId = pid INNER JOIN Attendance a ON astudentId = sstudentId INNER JOIN Lecture l ON alectureId = lid GROUP BY sfirstname slastname WHERE pname=lsquoDSErsquo

Run

Buffered Pages - Page replacement strategy - Materialization strategy - Logging Backup Recovery

Paged files

Disks Flash RAID SAN hellip

Storage Structures - Record management - Free space management - Physical access paths

Access System

Data

base

Sys

tem

Data model semantics - System catalog - Record format - Logical access paths

Query processing - Parsing - Plan generation - Plan optimization - Plan execution

1 lsquoSmithrsquo 15061982

Table Person id INT name VARCHAR birthday DATE Index P_id_IX on Personid

Database System

SQL JDBC ODBC hellip

Physical Access Paths ndash Index Structures

| 284

gt

copy A Behrend Foundations of Information Systems |

Primary Index Secondary Index

Overview Indexes

Table scan Read all pages and for each record

evaluate the search criteria Pre-fetching

Index Scan Use index for search criteria

on one or more attributes Fast access to single values or value ranges of index attributes Logicalphysical sorting of values of key attributes (depending on index structure) Enforcing uniqueness

Types if indexes Primary (Clustered) Index

determines physical organization use for PK Secondary (Non-Clustered)

Index redundant access path

Pers(PID NAME AGE SALARY)

Age

Salary

| 285

gt

copy A Behrend Foundations of Information Systems |

Overview Indexes (2)

Choice of Access Paths Index scan Only useful for low selectivity

(low number of result tuples) Break even-point according to the

output ratio of the number of tuples (usually max 5) Requires statistics about data Additional costs for index storage

and updating

Table Scan adequateefficient for small tables

(eg 5 pages) Queries with high selectivity

(large result sets) 100-200MBs sequential read

~ 100 disk seekss

hit rate

Index Scan

Table Scan

access time

| 286

gt

copy A Behrend Foundations of Information Systems |

Classification of Index Structures

Classification

Multiway Trees Tree structure with multiple children per node Idea chose fan out so that node size suits page size

Onedimensional Index Structures

Key Comparison Key Transformation

Sequential Tree-Based Hash-Based

Prefix Trees (Tries)

Binary Search Trees Dynamic Static Linked Lists

(log seq) Seq Lists

(phys seq) Multiway

Trees Example B-Tree

| 287

gt

copy A Behrend Foundations of Information Systems |

B-Tree

free space

(Ki Di Pi) = entry min |P| = k+1 max |P| = 2k+1

keys lt K1 keys gt Kp Ki lt keys lt Ki+1

| 288

gt

copy A Behrend Foundations of Information Systems |

B-Tree (2)

Example Keys Agnostic to specific key semantic Only defined complete order required Could be of fixed or variable length

Operations Search for data for given key value Insertion and deletion of key-data pair

Payload Agnostic to specific data semantic Can be record or reference (TID) or

mix

B-Tree with k = 2 h = 3

| 289

gt

copy A Behrend Foundations of Information Systems |

Search in the B-Tree

Starting at the root node each node is searched from left to right 1) if Ki matches the desired key value the data record has been found (further

records with the same key value might be located in a sub-tree to which Pi-1 points) 2) if Ki is smaller than the desired value the search will be continued in the root of

the sub-tree identified by Pi-1 3) if Ki is larger than the desired value the comparison with Ki+1 is repeated 4) if K2k is also smaller than the desired value the search will be continued in the

sub-tree of P2k If itlsquos impossible to descend further into a sub-tree (2 or 4) (leaf node) The search is aborted no record with the desired key value is found

Search for 38 20 6

| 290

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (1)

Insertion Rule insert only into leaf nodes At Non-Leaf Nodes descend down the tree as for the search S le Ki follow Pi-1 S gt Ki check Ki+1 S gt K2k follow P2k

At Leaf Node Insert the data record according to the sorting order Special case leaf node is full (2k records) rarr split the leaf node

Splitting Generate a new leaf node Split the 2k+1 entries (in order)

into two leaf nodes first k entries rarr left node last k entries rarr right node

middle entry (k+1-th) is used as new ldquodiscriminatorrdquo (branching) and inserted into the parent node

| 291

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (2)

Node Splitting during Insertion Two possible situations after a split The parent node is full rarr repeat split on this level Enough space rarr FINISHED

Special case root split Split of the root node rarr New root with two successor nodes Height of a tree grows by 1 The tree has been split from the bottom to the top

Dynamic reorganization (self-balancing) No unloading or loading necessary Tree is always balanced But In case of many insertions deletions reorganization can be beneficial

| 292

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (3)

Insertion Example Order k = 1 n=2k Keys 1 5 2 6 7 4 8 3

Finally h=3

| 293

gt

copy A Behrend Foundations of Information Systems |

Insertion and Deletion in the B-Tree

Problem: insertion can create an overflow; deletion can create an underflow and an overflow.

Example: insertion of key 22 → overflow → split.

Deletion of key 22 → underflow; all four nodes need to be accessed; finally, the tree is the same as the input.

[Figure: insert 22]


Deletion in the B-Tree

Example: order k = 1 (n = 2k), delete key 3.

[Figure: underflow → merge]


Deletion in the B-Tree (2)

Example: order k = 1 (n = 2k), delete key 3.

[Figure: underflow → merge]

Remember: each path from the root to a leaf has the same length h.


Deletion in the B-Tree (3)

Example: order k = 1 (n = 2k), delete key 3.

[Figure: underflow → merge; overflow → split]


Deletion in the B-Tree (4)

Example: order k = 1 (n = 2k), delete key 3.

[Figure: overflow → split; root split]


Deletion Algorithm

Example (there are different algorithms):
- Search the node that contains the key K to be deleted.
- If K is in a leaf node: delete the key from the leaf node and handle a potentially resulting underflow by merging with a sibling.
- If K is in an inner node: pull up a new discriminator from one of the successors.
  - Analyze which successor node of K has more elements, the left or the right one; if both have the same number of elements, choose either.
  - Replace the key K with its direct predecessor K' from the left successor subtree or with its direct successor K'' from the right successor subtree, respectively.
  - Delete K' or K'' from the respective successor node (recursively).

Note: major variants are
- Merge (this lecture);
- Re-distribution: instead of a split/merge in case of overflow/underflow, the entries are re-distributed among one or multiple adjacent nodes.
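The merge-based variant can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the lecture's reference algorithm: unique keys, nodes of order k (k to 2k keys, root excepted), and hypothetical helper names `Node`, `_fix_underflow`, `_delete`, and `delete`. An underflowing node is merged with a sibling plus the discriminator from the parent; if the merged node then overflows, it is split again (the underflow → merge → overflow → split pattern). For a key in an inner node, the in-order neighbour is taken from the successor subtree whose root currently holds more keys (ties go left).

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)  # empty for a leaf


def _fix_underflow(parent, i, k):
    """Merge parent.children[i] with a sibling if it holds fewer than k keys;
    if the merged node overflows (more than 2k keys), split it again."""
    if len(parent.children[i].keys) >= k:
        return
    j = i - 1 if i > 0 else i           # merge children j and j+1
    left, right = parent.children[j], parent.children.pop(j + 1)
    left.keys += [parent.keys.pop(j)] + right.keys  # pull the discriminator down
    left.children += right.children
    if len(left.keys) > 2 * k:          # overflow after the merge -> split
        m = len(left.keys) // 2
        parent.keys.insert(j, left.keys[m])
        parent.children.insert(j + 1, Node(left.keys[m + 1:], left.children[m + 1:]))
        left.keys, left.children = left.keys[:m], left.children[:m + 1]


def _delete(node, key, k):
    i = bisect_left(node.keys, key)
    found = i < len(node.keys) and node.keys[i] == key
    if not node.children:               # leaf: remove the key directly
        if found:
            node.keys.pop(i)
        return found
    if not found:
        found = _delete(node.children[i], key, k)
        _fix_underflow(node, i, k)
        return found
    # Inner node: replace K by K' (predecessor) or K'' (successor).
    if len(node.children[i + 1].keys) > len(node.children[i].keys):
        c, sub = i + 1, node.children[i + 1]
        while sub.children:
            sub = sub.children[0]
        repl = sub.keys[0]              # direct successor K''
    else:
        c, sub = i, node.children[i]
        while sub.children:
            sub = sub.children[-1]
        repl = sub.keys[-1]             # direct predecessor K'
    node.keys[i] = repl
    _delete(node.children[c], repl, k)  # delete K' / K'' recursively
    _fix_underflow(node, c, k)
    return True


def delete(root, key, k):
    """Delete key and return the (possibly new) root."""
    _delete(root, key, k)
    if not root.keys and root.children:  # root emptied by a merge: height shrinks
        return root.children[0]
    return root
```

For example, deleting 3 from the tree with root [4], inner nodes [2] and [6], and leaves [1], [3], [5], [7, 8] (k = 1) causes two merges and returns a root [4, 6] over the leaves [1, 2], [5], [7, 8].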


B-Trees, B+-Trees and B*-Trees

B+-trees and B*-trees: data is stored only in the leaf nodes.
- Key redundancy, but higher fan-out → lower tree height, less I/O.
- Simpler delete procedure → requires only merging of nodes.
- Doubly linked list of all leaf nodes.

B*-trees: modified valid node sizes, from [k, 2k] to [4/3·k, 2k] → better node utilization, but more splits/merges.

[Figure: example secondary index (non-unique), B-tree with k = 2, h = 3]
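To illustrate why the leaf chain helps, here is a hedged Python sketch of a range scan on a B+-tree: one descent from the root to the first candidate leaf, then a sequential walk along the leaf links. The classes `Leaf` and `Inner`, the helper `link`, and the string TIDs are hypothetical; node splitting and page I/O are left out.

```python
from bisect import bisect_left


class Leaf:
    def __init__(self, entries):
        self.entries = entries  # sorted (key, tid) pairs: data only in leaves
        self.prev = None        # leaves form a doubly linked list
        self.next = None


class Inner:
    def __init__(self, keys, children):
        self.keys = keys        # separator keys (redundant copies of leaf keys)
        self.children = children


def link(leaves):
    """Chain the leaves from left to right."""
    for a, b in zip(leaves, leaves[1:]):
        a.next, b.prev = b, a
    return leaves


def range_scan(root, lo, hi):
    """Yield (key, tid) with lo <= key <= hi: one root-to-leaf descent,
    then a sequential walk along the leaf chain."""
    node = root
    while isinstance(node, Inner):
        # bisect_left goes left on ties, so no leaf holding a key >= lo is skipped
        node = node.children[bisect_left(node.keys, lo)]
    while node is not None:
        for key, tid in node.entries:
            if key > hi:
                return
            if key >= lo:
                yield key, tid
        node = node.next


l1, l2, l3 = link([Leaf([(1, "t1"), (3, "t3")]),
                   Leaf([(5, "t5"), (7, "t7")]),
                   Leaf([(9, "t9"), (11, "t11")])])
root = Inner([5, 9], [l1, l2, l3])
print(list(range_scan(root, 3, 9)))  # [(3, 't3'), (5, 't5'), (7, 't7'), (9, 't9')]
```

Going left on ties also makes the scan correct for a non-unique secondary index, where equal keys may be spread over neighbouring leaves.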


Indexing Low Cardinality Columns

Problem example: a B-tree on the sex of the customers of a table with 1,000,000 tuples results in two lists with approximately 500,000 tuples each.

A query for all female customers requires 500,000 random page accesses (secondary index); a table scan would be much faster.

Conclusion: B-trees (and also hashing) are useful for predicates with low selectivity (output/input cardinality ratio). Rule of thumb: the marginal hit rate is approx. 5%; higher hit rates do not justify the effort of an index access.

[Figure: index with the two keys F and M, each pointing to a long list of TIDs]


Bitmap Index

Idea (long history since the 1960s): create a bitmap (bit list) for each attribute value. Each tuple in the table is assigned to one bit in the bitmap (by position, i.e., its sequential TID).

Bit values: 1 = the tuple has this attribute value, 0 = it does not.

Necessary condition: sequential numbering of the tuples (TIDs).

Example: table and bitmap index on Sex (bitmaps F and M; "…" = name not shown):

#  | Name   | Sex | Region | Race  | F | M
1  | Carol  | f   | n      | white | 1 | 0
2  | Harold | m   | e      | black | 0 | 1
3  | Anne   | f   | e      | asian | 1 | 0
4  | Iris   | f   | ne     | white | 1 | 0
5  | …      | m   | se     | hisp  | 0 | 1
6  | …      | f   | e      | white | 1 | 0
7  | …      | f   | sw     | asian | 1 | 0
8  | …      | f   | w      | black | 1 | 0
9  | …      | f   | n      | asian | 1 | 0
10 | …      | m   | e      | hisp  | 0 | 1
11 | …      | m   | se     | black | 0 | 1
12 | …      | f   | s      | white | 1 | 0
13 | …      | m   | nw     | black | 0 | 1
14 | …      | f   | s      | white | 1 | 0
15 | …      | f   | w      | black | 1 | 0
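Building the bitmaps is a single pass over the column. The Python sketch below assumes the column is given as a list in TID order and represents each bitmap as a list of 0/1 values for readability; real systems store bitmaps packed into machine words and often compress them. `build_bitmaps` is a hypothetical helper name.

```python
def build_bitmaps(column):
    """Return one bitmap per distinct value of `column`.
    Bit i of a bitmap belongs to the i-th tuple (sequential TID, 0-based)."""
    bitmaps = {value: [0] * len(column) for value in column}
    for tid, value in enumerate(column):
        bitmaps[value][tid] = 1     # exactly one bitmap has a 1 at each position
    return bitmaps


sex = "f m f f m f f f f m m f m f f".split()   # Sex column of the table above
bitmaps = build_bitmaps(sex)
print(bitmaps["f"])  # [1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1]
```

Applied to the Sex column, this reproduces the F and M bitmaps of the example; the same function on the Race column yields the bitmap for 'asian'.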


Querying Bitmap Indexes

Main advantage of bitmap indexes: simple and efficient logical combination of bitmaps; only the data relevant for the predicates is read.

Example: σ_{Sex='f' ∧ Region='n'}(R) combines the bitmaps B1 (Sex='f') and B2 (Region='n') by conjunction:
for (i = 0; i < B1.length; i++) B[i] = B1[i] & B2[i];

Example I/O cost estimation: σ_{Sex='f' ∧ Region='n' ∧ Race='Asian'}(R) ("Asian women of region North")
- Selectivity: 1/2 · 1/8 · 1/4 = 1/64
- N = 10,000 tuples with a length of 400 bytes each (~10 tuples per page for 4 kB pages)
- Table scan: 1,000 pages
- Bitmap access: 10,000/64 ≈ 156 pages (worst case: each qualifying tuple on a different page), plus 1 page for the bitmaps

F (Sex='f'):          1 0 1 1 0 1 1 1 1 0 0 1 0 1 1
AND N (Region='n'):   0 1 1 0 0 1 0 0 0 1 0 0 0 0 0
AND A (Race='Asian'): 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0
= result:             0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
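The conjunction and the I/O estimate can be checked with a short Python sketch. `conjunction` generalizes the loop to any number of bitmaps; `io_estimate` is a hypothetical helper that encodes the worst-case assumption of one page access per qualifying tuple and counts one bit per tuple and bitmap for the bitmap pages (3 × 10,000 bits = 3,750 bytes, i.e., one 4 kB page).

```python
from math import ceil


def conjunction(*bitmaps):
    """Bitwise AND of equally long 0/1 bitmaps (the loop above, for n bitmaps)."""
    return [int(all(bits)) for bits in zip(*bitmaps)]


def io_estimate(n, tuple_bytes, page_bytes, selectivity, n_bitmaps):
    """Return (table-scan pages, bitmap-access pages) under worst-case assumptions."""
    tuples_per_page = page_bytes // tuple_bytes            # 4096 // 400 = 10
    table_scan = ceil(n / tuples_per_page)
    data_pages = round(n * selectivity)                    # one page per qualifying tuple
    bitmap_pages = ceil(n_bitmaps * n / (8 * page_bytes))  # one bit per tuple and bitmap
    return table_scan, data_pages + bitmap_pages


F = [1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1]
N = [0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
A = [0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(conjunction(F, N, A))                           # only position 3 (TID 2, 0-based) is set
print(io_estimate(10_000, 400, 4096, 1/2 * 1/8 * 1/4, 3))  # (1000, 157)
```

The result 157 is the 156 data pages plus the single bitmap page from the estimate above.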

  • DBMS Architecture
  • What is in the Lecture
  • How is Database System build
  • Architectural Blue Print
  • Architectural Trends
  • Different Access Characteristics
  • Hardware Developments
  • Row Storage vs Column Storage
  • Processing Models
  • Transaction Management
  • ACID Properties of Transactions
  • Motivation
  • Reasons for crashes
  • Guarantee Atomicity & Durability
  • Record Management
  • Record
  • Record Addressing
  • Surrogate with Mapping Table
  • TID Concept
  • Free Space Management
  • Physical Access Paths – Index Structures
  • Overview Indexes
  • Overview Indexes (2)
  • Classification of Index Structures
  • B-Tree
  • B-Tree (2)
  • Search in the B-Tree
  • Insertions in the B-Tree (1)
  • Insertions in the B-Tree (2)
  • Insertions in the B-Tree (3)
  • Insertion and Deletion in the B-Tree
  • Deletion in the B-Tree
  • Deletion in the B-Tree (2)
  • Deletion in the B-Tree (3)
  • Deletion in the B-Tree (4)
  • Deletion Algorithm
  • B-Trees, B+-Trees and B*-Trees
  • Indexing Low Cardinality Columns
  • Bitmap Index
  • Querying Bitmap Indexes

Guarantee Atomicity & Durability

Assumptions: the system may crash, but the disk is durable. The only atomicity guarantee is that a disk block write is atomic.

Materialization strategy: the preferred policy is Steal/No-Force. This combination is the most complicated, but allows for the highest performance.
- No-Force complicates enforcing Durability: what if the system crashes before a modified page written by a committed transaction makes it to disk? Write as little as possible, in a convenient place, at commit time to support REDOing modifications.
- Steal complicates enforcing Atomicity: what if the transaction that performed updates aborts? What if the system crashes before the transaction is finished? The old value of a page P must be remembered (to support UNDOing the write to page P).
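To make the consequences of Steal/No-Force concrete, the following Python sketch replays a simplified log after a crash. It assumes a hypothetical log format of update records with before and after images plus commit records; real recovery schemes (e.g., ARIES) add log sequence numbers, checkpoints, and compensation log records. REDO repeats all updates in log order because of No-Force; UNDO then restores the before images of uncommitted transactions in reverse order because of Steal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    txn: str
    page: str
    old: object   # before image: needed for UNDO (Steal)
    new: object   # after image: needed for REDO (No-Force)


@dataclass(frozen=True)
class Commit:
    txn: str


def recover(disk, log):
    """Return a transaction-consistent copy of the crashed disk state."""
    disk = dict(disk)
    committed = {rec.txn for rec in log if isinstance(rec, Commit)}
    # REDO: repeat history in log order (committed changes may be missing on disk).
    for rec in log:
        if isinstance(rec, Update):
            disk[rec.page] = rec.new
    # UNDO: roll back uncommitted transactions in reverse log order
    # (their dirty pages may already have been stolen to disk).
    for rec in reversed(log):
        if isinstance(rec, Update) and rec.txn not in committed:
            disk[rec.page] = rec.old
    return disk


# T1 committed but P1 was never forced; T2 did not commit but P2 was stolen.
disk = {"P1": 100, "P2": 9}
log = [Update("T1", "P1", 100, 150), Update("T2", "P2", 7, 9), Commit("T1")]
print(recover(disk, log))  # {'P1': 150, 'P2': 7}
```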


[Figure: architectural blueprint of the database system (layers Application, Data System, Access System, Storage System, Buffer, File System, Hardware; accessed via SQL, JDBC, ODBC, …), illustrated with table Person (id INT, name VARCHAR, birthday DATE), index P_id_IX on Person.id, and the record (1, 'Smith', 15.06.1982)]

Record Management


Record

Record: a package of fields that together describe a thing, a person, a fact, etc. Each field represents one property of the entity described by the record. Similar to a struct in C. Variable length (in contrast to pages).

Record manager: organizes the physical storage of records in pages. Operations: Get, Insert, Update, Delete, Scan. Agnostic to record structure and semantics:
records are considered byte strings of variable length; the structure and content of a record are defined by the access system and the application.

Challenges: record addressing, free space management.
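As an illustration of a record as a variable-length byte string, the Python sketch below encodes the example record (1, 'Smith', 15.06.1982) of the Person table. The byte layout (4-byte id, 2-byte length prefix, UTF-8 name, 4-byte day number for the date) and the function names are hypothetical; in the architecture above, the record format belongs to the access system, while the record manager only sees the bytes.

```python
import struct
from datetime import date

# Hypothetical layout: id (int32) | name length (uint16) | name (UTF-8) | birthday (int32 ordinal)
HEADER = struct.Struct("<iH")
DAY = struct.Struct("<i")


def encode_person(pid, name, birthday):
    raw_name = name.encode("utf-8")
    return HEADER.pack(pid, len(raw_name)) + raw_name + DAY.pack(birthday.toordinal())


def decode_person(buf):
    pid, n = HEADER.unpack_from(buf, 0)
    offset = HEADER.size
    name = buf[offset:offset + n].decode("utf-8")
    (ordinal,) = DAY.unpack_from(buf, offset + n)
    return pid, name, date.fromordinal(ordinal)


rec = encode_person(1, "Smith", date(1982, 6, 15))
print(len(rec), decode_person(rec))  # 15 (1, 'Smith', datetime.date(1982, 6, 15))
```

The length of the byte string depends on the name, which is exactly why the record manager has to deal with variable-length records.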


Record Addressing

Record address: identifier used to address records, e.g., in indexes or during query processing; assigned when the record is inserted.

Goals: stability of the identifier, fast and direct access, little organizational overhead.

Direct addressing: byte address or position number in a file or page. Unstable: with byte addresses, if a record grows in length, the following records get new addresses; with position numbers, insert and delete operations change the sequence of records.

Indirect addressing: surrogate with mapping table (complete indirection); tuple identifier (TID concept).


Surrogate with Mapping Table

Surrogate: record type + serial number. The serial number remains constant during the record's lifetime.

Mapping table: maps a surrogate to a page.

Problems: Where to store the mapping table? How can it be extended? How can the mapping table be searched efficiently?

→ H2: use a B-tree to store the mapping table.

[Figure: mapping table with columns Surrogate | Page ID]


TID Concept

Record addressing with indirection inside the page: each page contains an array of record positions; the TID of a record consists of the page ID and the index in the position array.

Pros: access with one page access (two pages in case of overflow); stable; no mapping table required.

Operations:
- Insert: reuse an unused position or add a new position.
- Delete: mark the position as unused in the array.
- Update: update all positions in the array.
- Update with overflow: store the record as an overflow record and store the TID of the overflow record at the original position (no double overflow: if the overflow record moves again, update the TID at the original position).

[Figure: page with position array pointing to a record, and via a TID to an overflow record]
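The TID operations can be sketched with a toy in-memory record manager in Python. All names are hypothetical, free space is counted in record bytes only (the position array and forward TIDs are treated as free), and records are assumed to fit into one page. The home slot always holds either the record itself or a single forward TID, so there is never more than one level of overflow.

```python
class Page:
    def __init__(self, page_id, capacity):
        self.page_id = page_id
        self.capacity = capacity  # bytes available for record data
        self.slots = []           # position array: bytes, forward TID, or None

    def used(self):
        # Simplification: only record bytes count; slots and forward TIDs are free.
        return sum(len(e) for e in self.slots if isinstance(e, bytes))


class RecordManager:
    """Toy record manager addressing records by TID = (page id, slot index)."""

    def __init__(self, page_capacity):
        self.page_capacity = page_capacity
        self.pages = []

    def _page_with_space(self, size):
        for page in self.pages:
            if page.capacity - page.used() >= size:
                return page
        page = Page(len(self.pages), self.page_capacity)  # assumes size <= capacity
        self.pages.append(page)
        return page

    def _store(self, page, record):
        for i, entry in enumerate(page.slots):
            if entry is None:                 # reuse an unused position
                page.slots[i] = record
                return (page.page_id, i)
        page.slots.append(record)             # or add a new position
        return (page.page_id, len(page.slots) - 1)

    def insert(self, record):
        return self._store(self._page_with_space(len(record)), record)

    def get(self, tid):
        entry = self.pages[tid[0]].slots[tid[1]]
        if isinstance(entry, tuple):          # forward TID: one extra page access
            entry = self.pages[entry[0]].slots[entry[1]]
        return entry

    def delete(self, tid):
        page = self.pages[tid[0]]
        entry = page.slots[tid[1]]
        if isinstance(entry, tuple):          # also free the overflow record
            self.pages[entry[0]].slots[entry[1]] = None
        page.slots[tid[1]] = None             # mark the position as unused

    def update(self, tid, record):
        home = self.pages[tid[0]]
        old = home.slots[tid[1]]
        if isinstance(old, tuple):            # drop the previous overflow record
            self.pages[old[0]].slots[old[1]] = None
        home.slots[tid[1]] = None             # release the old space in the home page
        if home.capacity - home.used() >= len(record):
            home.slots[tid[1]] = record       # fits: update in place, TID unchanged
        else:                                 # overflow: store elsewhere, keep forward TID
            target = self._page_with_space(len(record))
            home.slots[tid[1]] = self._store(target, record)
```

Callers keep using the original TID in every case; only `get` pays the second page access while the record lives in an overflow page.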


Free Space Management

Problem: which page has enough space for a new record?

Solution: a free space table lists, for all pages, how much space is left.

Free space value:
- Precise value: ceil(log2(page size)) bits ⇒ 2 bytes for the common page size of 4 KB.
- Rough value: use fewer bits; free space = (value · page size) / 2^(bits per value).

Free space table:
- With direct page addressing: assuming a single page can hold n free space entries, the first page and every (n+1)-th page hold free space entries.
- With indirect page addressing: the free space information is stored in the page table.
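A short Python sketch of the free space encodings and of locating a page's entry under direct page addressing. The 4 KB page size matches the slide; the bit width of the rough value and the helper names are assumptions. The rough value is rounded down, so the decoded free space never overestimates the real one.

```python
import math

PAGE_SIZE = 4096  # 4 KB pages, as on the slide


def precise_bits(page_size=PAGE_SIZE):
    """Bits for the exact free space: ceil(log2(page size)) = 12, i.e., 2 bytes."""
    return math.ceil(math.log2(page_size))


def encode_rough(free_bytes, bits, page_size=PAGE_SIZE):
    """Quantize free space to `bits` bits, rounding down (conservative)."""
    value = free_bytes * 2**bits // page_size
    return min(value, 2**bits - 1)    # a completely empty page would need 2**bits


def decode_rough(value, bits, page_size=PAGE_SIZE):
    """free space = value * page size / 2^(bits per value)"""
    return value * page_size // 2**bits


def fsi_location(page_no, n):
    """Direct page addressing: page 0 and every (n+1)-th page hold the free
    space entries of the n data pages that follow them.
    Returns (page holding the entry, entry index)."""
    if page_no % (n + 1) == 0:
        raise ValueError("this page holds free space entries itself")
    return page_no - page_no % (n + 1), page_no % (n + 1) - 1


v = encode_rough(1000, bits=4)
print(precise_bits(), v, decode_rough(v, bits=4))  # 12 3 768
```

With 4 bits the granularity is 4096 / 16 = 256 bytes, trading precision for a table that is a quarter of the size of one with 2-byte precise values.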


[Figure: architectural blueprint of the database system (layers Application, Data System, Access System, Storage System, Buffer, File System, Hardware; accessed via SQL, JDBC, ODBC, …), illustrated with table Person (id INT, name VARCHAR, birthday DATE), index P_id_IX on Person.id, and the record (1, 'Smith', 15.06.1982)]

Physical Access Paths – Index Structures


Overview Indexes

Table scan: read all pages and evaluate the search criteria for each record (pre-fetching possible).

Index scan: use an index for search criteria on one or more attributes. Fast access to single values or value ranges of the index attributes; logical/physical sorting of the values of the key attributes (depending on the index structure); enforcing uniqueness.

Types of indexes:
- Primary (clustered) index: determines the physical organization; used for the PK.
- Secondary (non-clustered) index: redundant access path.

[Figure: table Pers(PID, NAME, AGE, SALARY) with a primary index and secondary indexes on Age and Salary]


Overview Indexes (2)

Choice of access paths:
- Index scan: only useful for low selectivity (low number of result tuples). The break-even point depends on the output ratio of the number of tuples (usually max. 5%). Requires statistics about the data. Additional costs for index storage and updating.
- Table scan: adequate/efficient for small tables (e.g., 5 pages) and for queries with high selectivity (large result sets). Sequential read: 100-200 MB/s, versus ~100 disk seeks/s for random access.

[Figure: access time over hit rate for index scan and table scan]


Classification of Index Structures

Multiway trees: tree structure with multiple children per node. Idea: choose the fan-out so that the node size matches the page size.

[Figure: classification of one-dimensional index structures by key comparison vs. key transformation, with the categories sequential (sequential lists: physically sequential; linked lists: logically sequential), tree-based (binary search trees, multiway trees, e.g., B-tree), hash-based (static, dynamic), and prefix trees (tries)]


B-Tree

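[Figure: B-tree node with entries and free space]

Each entry is a triple (K_i, D_i, P_i) of key, data, and pointer. A node holds between k and 2k entries, i.e., an inner node has between min |P| = k+1 and max |P| = 2k+1 pointers (the root may hold fewer entries).

The subtree left of K_1 contains keys < K_1, the subtree between K_i and K_{i+1} contains keys with K_i < key < K_{i+1}, and the rightmost subtree contains keys greater than the node's last key.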
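From these node-size rules (every node except the root holds k to 2k keys; the root holds at least one key), the following standard bounds on the number of keys n in a B-tree of height h can be derived; they are not stated on the slide. The minimum has a root with one key and two children and all other nodes with k keys; the maximum has all nodes filled with 2k keys.

```latex
% n_min = 1 + \sum_{i=1}^{h-1} 2(k+1)^{i-1} k,   n_max = \sum_{i=0}^{h-1} 2k (2k+1)^{i}
2(k+1)^{h-1} - 1 \;\le\; n \;\le\; (2k+1)^{h} - 1
\quad\Longrightarrow\quad
\log_{2k+1}(n+1) \;\le\; h \;\le\; 1 + \log_{k+1}\frac{n+1}{2}
```

For k = 1 and h = 3 this gives 7 ≤ n ≤ 26, which is consistent with the 8-key insertion example ending at h = 3.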


B-Tree (2)

Keys: agnostic to the specific key semantics; only a defined total order is required; keys can be of fixed or variable length.

Operations: search for the data of a given key value; insertion and deletion of key-data pairs.

Payload: agnostic to the specific data semantics; can be the record, a reference (TID), or a mix.

[Figure: example B-tree with k = 2, h = 3]


Search in the B-Tree

Starting at the root node, each node is searched from left to right:
1) if K_i matches the desired key value, the data record has been found (further records with the same key value might be located in the subtree to which P_{i-1} points);
2) if K_i is larger than the desired value, the search continues in the root of the subtree identified by P_{i-1};
3) if K_i is smaller than the desired value, the comparison is repeated with K_{i+1};
4) if K_{2k} is also smaller than the desired value, the search continues in the subtree of P_{2k}.
If it is impossible to descend further into a subtree (case 2 or 4 at a leaf node), the search is aborted: no record with the desired key value is found.

[Figure: example searches for 38, 20, 6]
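The search procedure can be sketched in Python as follows, assuming the node layout above (keys K_1..K_b and pointers P_0..P_b, mapped to 0-based Python lists) and the rule also used for insertion: descend via the pointer left of the first key larger than the search key, or via the last pointer if all keys are smaller. The small k = 1 tree in the example is hypothetical and does not reproduce the slide's figure; the function also counts visited nodes, i.e., page accesses if each node is one page.

```python
class Node:
    def __init__(self, keys, children=()):
        self.keys = list(keys)          # K_1 .. K_b, sorted
        self.children = list(children)  # P_0 .. P_b; empty for a leaf


def search(root, s):
    """Return (found, number of nodes visited) for search key s."""
    node, visited = root, 0
    while node is not None:
        visited += 1
        nxt = len(node.keys)            # case 4: all keys smaller -> last pointer
        for j, key in enumerate(node.keys):
            if key == s:
                return True, visited    # case 1: key found
            if key > s:
                nxt = j                 # case 2: descend via P_{i-1}, left of K_i
                break
            # case 3: key < s -> compare with the next key
        node = node.children[nxt] if node.children else None  # leaf: cannot descend
    return False, visited


tree = Node([4], [Node([2], [Node([1]), Node([3])]),
                  Node([6], [Node([5]), Node([7, 8])])])
print(search(tree, 8), search(tree, 9))  # (True, 3) (False, 3)
```

An unsuccessful search always ends at a leaf, so it costs exactly h node accesses, while a successful one may stop early at an inner node.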

| 290

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (1)

Insertion Rule insert only into leaf nodes At Non-Leaf Nodes descend down the tree as for the search S le Ki follow Pi-1 S gt Ki check Ki+1 S gt K2k follow P2k

At Leaf Node Insert the data record according to the sorting order Special case leaf node is full (2k records) rarr split the leaf node

Splitting Generate a new leaf node Split the 2k+1 entries (in order)

into two leaf nodes first k entries rarr left node last k entries rarr right node

middle entry (k+1-th) is used as new ldquodiscriminatorrdquo (branching) and inserted into the parent node

| 291

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (2)

Node Splitting during Insertion Two possible situations after a split The parent node is full rarr repeat split on this level Enough space rarr FINISHED

Special case root split Split of the root node rarr New root with two successor nodes Height of a tree grows by 1 The tree has been split from the bottom to the top

Dynamic reorganization (self-balancing) No unloading or loading necessary Tree is always balanced But In case of many insertions deletions reorganization can be beneficial

| 292

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (3)

Insertion Example Order k = 1 n=2k Keys 1 5 2 6 7 4 8 3

Finally h=3

| 293

gt

copy A Behrend Foundations of Information Systems |

Insertion and Deletion in the B-Tree

Problem Insertion can create overflow Deletion can create underflow and overflow

Example Insertion of key 22

rarr Overflow rarr Split

Deletion of key 22 rarr Underflow need to access all four nodes finally same as input

Insert 22

| 294

gt

copy A Behrend Foundations of Information Systems |

Underflow Merge

Deletion in the B-Tree

Example Order k = 1 n=2k Delete key 3

| 295

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (2)

Example Order k = 1 n=2k Delete key 3

Underflow Merge

Remember Each path from the root to the leaf has the same length h

| 296

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (3)

Example Order k = 1 n=2k Delete key 3

Underflow Merge

Overflow Split

| 297

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (4)

Example Order k = 1 n=2k Delete key 3

Overflow Split

Root Split

| 298

gt

copy A Behrend Foundations of Information Systems |

Deletion Algorithm

Example ndash there are different algorithms Search the node that contains the key K to be deleted If key K is in a leaf node delete the key in the leaf node and handle potentially

resulting underflow by merging with sibling If key K is in an inner node pull up new discriminator from one of the successors Analyze which successor node of K has more elements left or right one

If both have the same number of elements decide for one Replace the key K to be deleted with the direct successor Krsquo from the left

successor node or with the direct successor Krsquorsquo from the right successor node respectively Delete Krsquo or Krsquorsquo from the respective successor node (recursively)

Note Major variants Merge (tis lecture) Re-distribution (instead of splitmerge in case of overflowunderflow the entries are

re-distributed under consideration of one or multiple adjacent nodes)

| 299

gt

copy A Behrend Foundations of Information Systems |

B-Trees B+-Trees and B-Trees

B+-Trees and B-Trees Data is only in leaf nodes Key redundancy but higher fan-out rarr lower tree high less IO Simpler delete procedure rarr requires only merging of nodes

Double linked list of all leaf nodes B-Trees Modified valid node sizes

from [k2k] to [43k2k] rarr better node utilization but more splitsmerges

Example Secondary index Non unique

B-Tree with k = 2 h = 3

| 300

gt

copy A Behrend Foundations of Information Systems |

Indexing Low Cardinality Columns

Problem Example B-tree on the sex of customers for a table with 1000000 tuples results in

two lists with approximately 500000 tuples each

Query for all female customers requires 500000 random page accesses (secondary index) Table scan would be much faster

Conclusion B-trees (and also hashing) are useful for predicates with low selectivity

(outputinput cardinality ratio) Rule of thumb margin hit rate is approx 5 higher hit rates do not justify the efforts for an index access

F M

TID TID TID TID hellip TID TID TID TID hellip

| 301

gt

copy A Behrend Foundations of Information Systems |

Bitmap Index

Idea (Long history since the 1960s) Create a bitmapbitlist for each

attribute value Each tuple in the table is assigned to

one bit in the bitmap (by position sequential TID)

Bit values 1 attribute value set 0 attribute value not set

Necessary condition Sequential numbering of the tuples

(TIDs)

F M

1 0 1 1 0 1 1 1 1 0 0 1 0 1 1

0 1 0 0 1 0 0 0 0 1 1 0 1 0 0

Name Sex Region Race Carol f n white

Harold m e black Anne f e asian

Iris f ne white hellip m se hisp hellip f e white hellip f sw asian hellip f w black hellip f n asian hellip m e hisp hellip m se black hellip f s white hellip m nw black hellip f s white hellip f w black

Sex

| 302

gt

copy A Behrend Foundations of Information Systems |

Querying Bitmap Indexes

Main advantage of bitmap indexes Simple and efficient logical join possible Read only data that is relevant for predicates Example σSex=lsquoflsquo ᴧ Region=lsquonlsquo R Bitmaps B1 and B2 in conjunction for (i=0 iltB1length i++) B = B1[i] amp B2[i]

Example IO Costs Estimation σSex=lsquoflsquo ᴧ Region=lsquonlsquo ᴧ Race=lsquoAsianlsquo R

(ldquoAsian women of region Northrdquo) Selectivity 12 18 14 = 164 N=10000 tuples with length of 400 bytes each

(~ 10 tuples per page for 4kB pages) Table scan 1000 pages Bitmap access 1000064 156 pages (worst case each tuple in a different page)

plus 1 page for bitmaps

F 1 0 1 1 0 1 1 1 1 0 0 1 0 1 1

N 0 1 1 0 0 1 0 0 0 1 0 0 0 0 0

A 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0

0 0 1 0 0 0 0 0 0 0 0 0 0 0 0

AND = AND

  • DBMS Architecture
  • What is in the Lecture
  • How is Database System build
  • Architectural Blue Print
  • Architectural Trends
  • Different Access Characteristics
  • Hardware Developments
  • Row Storage vs Column Storage
  • Processing Models
  • Transaction Management
  • ACID Properties of Transactions
  • Motivation
  • Reasons for crashes
  • Guarantee Atomicity amp Durability
  • Record Management
  • Record
  • Record Addressing
  • Surrogate with Mapping Table
  • TID Concept
  • Free Space Management
  • Physical Access Paths ndash Index Structures
  • Overview Indexes
  • Overview Indexes (2)
  • Classification of Index Structures
  • B-Tree
  • B-Tree (2)
  • Search in the B-Tree
  • Insertions in the B-Tree (1)
  • Insertions in the B-Tree (2)
  • Insertions in the B-Tree (3)
  • Insertion and Deletion in the B-Tree
  • Deletion in the B-Tree
  • Deletion in the B-Tree (2)
  • Deletion in the B-Tree (3)
  • Deletion in the B-Tree (4)
  • Deletion Algorithm
  • B-Trees B+-Trees and B-Trees
  • Indexing Low Cardinality Columns
  • Bitmap Index
  • Querying Bitmap Indexes
Page 15: Foundations of Information Systems 5 DBMS Architecturepages.iai.uni-bonn.de/behrend_andreas/lehre/FIM/WS16/11... · 2017-01-10 · Main-memory access has become a performance bottleneck

copy A Behrend Foundations of Information Systems | 277

gt

|

Storage System

Buffer

File System

Hardware

Data System

Application

TID TID TID

TID TID TID

SELECT sfirstname slastname COUNT(lname) FROM Student s INNER JOIN Program p ON sprogramId = pid INNER JOIN Attendance a ON astudentId = sstudentId INNER JOIN Lecture l ON alectureId = lid GROUP BY sfirstname slastname WHERE pname=lsquoDSErsquo

Run

Buffered Pages - Page replacement strategy - Materialization strategy - Logging Backup Recovery

Paged files

Disks Flash RAID SAN hellip

Storage Structures - Record management - Free space management - Physical access paths

Access System

Data

base

Sys

tem

Data model semantics - System catalog - Record format - Logical access paths

Query processing - Parsing - Plan generation - Plan optimization - Plan execution

1 lsquoSmithrsquo 15061982

Table Person id INT name VARCHAR birthday DATE Index P_id_IX on Personid

Database System

SQL JDBC ODBC hellip

Record Management

| 278

gt

copy A Behrend Foundations of Information Systems |

Record

Record Package of fields that together describe a thing a person a fact etc Each fields represents on property of the entity described by the record Similar to a struct in C Variable length (in contrast to pages)

Record Manager Organizes physical storage of records in pages Operations Get Insert Update Delete Scan Agnostic to record structure and semantic

records considered as byte strings of variable length Structure and content of record is defined be Access System and application

Challenges Record addressing Free space management

| 279

gt

copy A Behrend Foundations of Information Systems |

Record Addressing

Record address Identifier for records used to address records eg in indexes or query processing Assigned during insert of a record

Goals Stability of identifier Fast and direct access Less organizational overhead

Direct addressing Byte address or position number in file or page Instable Byte address If record grows in length following records would get new address Position number Insert and delete operations change series or records

Indirect addressing Surrogate with mapping table (complete indirection) Tuple Identifier (TID concept)

| 280

gt

copy A Behrend Foundations of Information Systems |

Surrogate with Mapping Table

Surrogate Record type + serial number Serial number remains constant during recordrsquos life time

Mapping table Maps

surrogate to page

Problems Where to store mapping table How can it be extended How to search mapping table efficiently

rarr H2 use B-Tree to store mapping table

Mapping Table Surrogate | Page ID

| 281

gt

copy A Behrend Foundations of Information Systems |

TID Concept

Record addressing with indirection inside the page Each page contains an array with record positions TID of a record consist of page id and index in position array

Pros Access with one page access (two pages in case of overflow) Stable No mapping table required

Operations Insert Reuse unused

position or add position Delete Mark position

as unused in array Update Update all

positions in array Update with overflow Store record

as overflow record and store TID of overflow record at original position (No double overflow Update TID at original position)

Record

Overflow Record

| 282

gt

copy A Behrend Foundations of Information Systems |

Free Space Management

Problem In which page is enough space for new record

Solution Free space table lists for all pages how much space is left

Free space value Precise value Ceil(Log2(page size)) =gt 2 bytes for common page size of 4K Rough value use less bytes free space = (value page size)2^(bits per value)

Free space table With direct page addressing Assuming a single page can take n free space entries First page and each (n+1)-th page takes free space entries

With indirect page addressing Free space information stored in page table

copy A Behrend Foundations of Information Systems | 283

gt

|

Storage System

Buffer

File System

Hardware

Data System

Application

TID TID TID

TID TID TID

SELECT sfirstname slastname COUNT(lname) FROM Student s INNER JOIN Program p ON sprogramId = pid INNER JOIN Attendance a ON astudentId = sstudentId INNER JOIN Lecture l ON alectureId = lid GROUP BY sfirstname slastname WHERE pname=lsquoDSErsquo

Run

Buffered Pages - Page replacement strategy - Materialization strategy - Logging Backup Recovery

Paged files

Disks Flash RAID SAN hellip

Storage Structures - Record management - Free space management - Physical access paths

Access System

Data

base

Sys

tem

Data model semantics - System catalog - Record format - Logical access paths

Query processing - Parsing - Plan generation - Plan optimization - Plan execution

1 lsquoSmithrsquo 15061982

Table Person id INT name VARCHAR birthday DATE Index P_id_IX on Personid

Database System

SQL JDBC ODBC hellip

Physical Access Paths ndash Index Structures

| 284

gt

copy A Behrend Foundations of Information Systems |

Primary Index Secondary Index

Overview Indexes

Table scan Read all pages and for each record

evaluate the search criteria Pre-fetching

Index Scan Use index for search criteria

on one or more attributes Fast access to single values or value ranges of index attributes Logicalphysical sorting of values of key attributes (depending on index structure) Enforcing uniqueness

Types if indexes Primary (Clustered) Index

determines physical organization use for PK Secondary (Non-Clustered)

Index redundant access path

Pers(PID NAME AGE SALARY)

Age

Salary

| 285

gt

copy A Behrend Foundations of Information Systems |

Overview Indexes (2)

Choice of Access Paths Index scan Only useful for low selectivity

(low number of result tuples) Break even-point according to the

output ratio of the number of tuples (usually max 5) Requires statistics about data Additional costs for index storage

and updating

Table Scan adequateefficient for small tables

(eg 5 pages) Queries with high selectivity

(large result sets) 100-200MBs sequential read

~ 100 disk seekss

hit rate

Index Scan

Table Scan

access time

| 286

gt

copy A Behrend Foundations of Information Systems |

Classification of Index Structures

Classification

Multiway Trees Tree structure with multiple children per node Idea chose fan out so that node size suits page size

Onedimensional Index Structures

Key Comparison Key Transformation

Sequential Tree-Based Hash-Based

Prefix Trees (Tries)

Binary Search Trees Dynamic Static Linked Lists

(log seq) Seq Lists

(phys seq) Multiway

Trees Example B-Tree

| 287

gt

copy A Behrend Foundations of Information Systems |

B-Tree

free space

(Ki Di Pi) = entry min |P| = k+1 max |P| = 2k+1

keys lt K1 keys gt Kp Ki lt keys lt Ki+1

| 288

gt

copy A Behrend Foundations of Information Systems |

B-Tree (2)

Example Keys Agnostic to specific key semantic Only defined complete order required Could be of fixed or variable length

Operations Search for data for given key value Insertion and deletion of key-data pair

Payload Agnostic to specific data semantic Can be record or reference (TID) or

mix

B-Tree with k = 2 h = 3

| 289

gt

copy A Behrend Foundations of Information Systems |

Search in the B-Tree

Starting at the root node each node is searched from left to right 1) if Ki matches the desired key value the data record has been found (further

records with the same key value might be located in a sub-tree to which Pi-1 points) 2) if Ki is smaller than the desired value the search will be continued in the root of

the sub-tree identified by Pi-1 3) if Ki is larger than the desired value the comparison with Ki+1 is repeated 4) if K2k is also smaller than the desired value the search will be continued in the

sub-tree of P2k If itlsquos impossible to descend further into a sub-tree (2 or 4) (leaf node) The search is aborted no record with the desired key value is found

Search for 38 20 6

| 290

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (1)

Insertion Rule insert only into leaf nodes At Non-Leaf Nodes descend down the tree as for the search S le Ki follow Pi-1 S gt Ki check Ki+1 S gt K2k follow P2k

At Leaf Node Insert the data record according to the sorting order Special case leaf node is full (2k records) rarr split the leaf node

Splitting Generate a new leaf node Split the 2k+1 entries (in order)

into two leaf nodes first k entries rarr left node last k entries rarr right node

middle entry (k+1-th) is used as new ldquodiscriminatorrdquo (branching) and inserted into the parent node

| 291

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (2)

Node Splitting during Insertion Two possible situations after a split The parent node is full rarr repeat split on this level Enough space rarr FINISHED

Special case root split Split of the root node rarr New root with two successor nodes Height of a tree grows by 1 The tree has been split from the bottom to the top

Dynamic reorganization (self-balancing) No unloading or loading necessary Tree is always balanced But In case of many insertions deletions reorganization can be beneficial

| 292

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (3)

Insertion Example Order k = 1 n=2k Keys 1 5 2 6 7 4 8 3

Finally h=3

| 293

gt

copy A Behrend Foundations of Information Systems |

Insertion and Deletion in the B-Tree

Problem Insertion can create overflow Deletion can create underflow and overflow

Example Insertion of key 22

rarr Overflow rarr Split

Deletion of key 22 rarr Underflow need to access all four nodes finally same as input

Insert 22

| 294

gt

copy A Behrend Foundations of Information Systems |

Underflow Merge

Deletion in the B-Tree

Example Order k = 1 n=2k Delete key 3

| 295

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (2)

Example Order k = 1 n=2k Delete key 3

Underflow Merge

Remember Each path from the root to the leaf has the same length h

Deletion in the B-Tree (3)

Example (continued): order k = 1 (n = 2k); delete key 3.

(Figure: underflow → merge; the merged node overflows → split)
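The "underflow → merge, overflow → split" sequence of this example can be sketched at leaf level as follows. It is a hypothetical helper under simplifying assumptions: nodes are sorted key lists without child pointers, the caller passes the two siblings in key order together with the parent's discriminator between them, and an overflowing merged node is split in the middle. For exactly 2k+1 keys this is the usual k / 1 / k split; since an underflowing node holds k − 1 keys, the merge yields at most 3k keys, so both halves stay within [k, 2k].

```java
import java.util.ArrayList;
import java.util.List;

/** Either one merged node (discriminator == null) or two nodes plus a new discriminator. */
final class MergeResult {
    final List<Integer> left;
    final Integer discriminator;
    final List<Integer> right;

    MergeResult(List<Integer> left, Integer discriminator, List<Integer> right) {
        this.left = left;
        this.discriminator = discriminator;
        this.right = right;
    }
}

final class UnderflowHandling {
    /**
     * Merges two adjacent leaf-level siblings and the parent's discriminator between
     * them into one node. If the merged node holds more than 2k keys (overflow), it is
     * split again in the middle and the middle key goes back up to the parent.
     */
    static MergeResult mergeWithSibling(List<Integer> leftNode, int discriminator,
                                        List<Integer> rightNode, int k) {
        List<Integer> merged = new ArrayList<>(leftNode);
        merged.add(discriminator);                      // parent key moves down
        merged.addAll(rightNode);
        if (merged.size() <= 2 * k) {
            return new MergeResult(merged, null, null); // parent loses one key (may underflow in turn)
        }
        int mid = merged.size() / 2;                    // overflow: split again
        return new MergeResult(new ArrayList<>(merged.subList(0, mid)),
                merged.get(mid),
                new ArrayList<>(merged.subList(mid + 1, merged.size())));
    }
}
```

With k = 1, an emptied leaf [], discriminator 4 and sibling [5, 6] merge into [4, 5, 6], which overflows and is split into [4], 5, [6]; [1], 2 and [] merge into the single node [1, 2], and the parent loses key 2, which may cause an underflow one level up.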

Deletion in the B-Tree (4)

Example (continued): order k = 1 (n = 2k); delete key 3.

(Figure: overflow → split; root split)

Deletion Algorithm

Example (there are different algorithms):
1) Search the node that contains the key K to be deleted.
2) If K is in a leaf node: delete K from the leaf node and handle a potentially resulting underflow by merging with a sibling.
3) If K is in an inner node: pull up a new discriminator from one of the successor nodes.
a) Determine which successor node of K, the left or the right one, has more elements; if both have the same number of elements, choose either.
b) Replace K with its direct neighbour: K' from the left successor node's subtree (the largest key smaller than K) or K'' from the right successor node's subtree (the smallest key larger than K), respectively.
c) Delete K' or K'' from the respective subtree (recursively).

Note: the major variants are merge (this lecture) and re-distribution (instead of a split/merge in case of overflow/underflow, the entries are re-distributed across one or multiple adjacent nodes).
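Step 3 of the algorithm, choosing the replacement for a key in an inner node, can be sketched as follows. It is an illustrative sketch, assuming a minimal node type (`BNode`, hypothetical) with sorted keys and child lists; "successor node" is read as the direct child left or right of K, and ties go to the left. The subsequent recursive deletion of K' or K'' from its leaf is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Minimal B-tree node: sorted keys; inner nodes have keys.size() + 1 children. */
final class BNode {
    final List<Integer> keys;
    final List<BNode> children;

    BNode(List<Integer> keys, List<BNode> children) {
        this.keys = new ArrayList<>(keys);
        this.children = new ArrayList<>(children);
    }

    static BNode leaf(Integer... keys) {
        return new BNode(Arrays.asList(keys), List.of());
    }

    boolean isLeaf() { return children.isEmpty(); }
}

final class InnerKeyDeletion {
    /**
     * Returns the key that replaces the inner key K = inner.keys.get(i): the largest
     * key of the left subtree (K') if the left successor node has at least as many
     * keys as the right one, otherwise the smallest key of the right subtree (K'').
     * The caller must then delete that key from its leaf (recursively).
     */
    static int replacementKey(BNode inner, int i) {
        BNode left = inner.children.get(i);
        BNode right = inner.children.get(i + 1);
        if (left.keys.size() >= right.keys.size()) {
            BNode n = left;
            while (!n.isLeaf()) n = n.children.get(n.children.size() - 1); // rightmost path
            return n.keys.get(n.keys.size() - 1);
        }
        BNode n = right;
        while (!n.isLeaf()) n = n.children.get(0);                          // leftmost path
        return n.keys.get(0);
    }
}
```

For a root [4] whose successor nodes [2] (leaves [1], [3]) and [6] (leaves [5], [7, 8]) hold one key each, the tie goes left and K' = 3 replaces 4.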

B-Trees, B+-Trees and B*-Trees

B+-Trees and B*-Trees: data is stored only in the leaf nodes. This causes key redundancy (inner nodes repeat keys), but gives a higher fan-out → lower tree height, less I/O. Simpler delete procedure → requires only merging of nodes. All leaf nodes are connected in a doubly linked list.

B*-Trees: modified valid node sizes, from [k, 2k] to [4/3·k, 2k] → better node utilization, but more splits/merges.

Example: secondary index, non-unique keys.

(Figure: B-tree with k = 2, h = 3)
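The doubly linked leaf list is what makes range queries cheap in a B+-tree: after one root-to-leaf search, the scan simply follows the leaf links. A minimal sketch, assuming keys-only leaves; `Leaf`, `RangeScan` and the given start leaf are hypothetical stand-ins for a real index's leaf pages and search.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** B+-tree leaf: sorted keys plus links to both neighbouring leaves. */
final class Leaf {
    final List<Integer> keys = new ArrayList<>();
    Leaf prev;   // would allow descending scans (not used below)
    Leaf next;

    Leaf(Integer... ks) { Collections.addAll(keys, ks); }

    /** Links the given leaves, left to right, into a doubly linked list; returns the first. */
    static Leaf chain(Leaf... leaves) {
        for (int i = 0; i + 1 < leaves.length; i++) {
            leaves[i].next = leaves[i + 1];
            leaves[i + 1].prev = leaves[i];
        }
        return leaves[0];
    }
}

final class RangeScan {
    /**
     * Collects all keys in [lo, hi]. 'start' is the leaf reached by the usual
     * root-to-leaf search for lo (not shown); from there only next-pointers are followed.
     */
    static List<Integer> scan(Leaf start, int lo, int hi) {
        List<Integer> out = new ArrayList<>();
        for (Leaf leaf = start; leaf != null; leaf = leaf.next) {
            for (int key : leaf.keys) {
                if (key > hi) return out;      // keys are sorted: no further matches
                if (key >= lo) out.add(key);
            }
        }
        return out;
    }
}
```

Starting at leaf [3, 4] of the chain [1, 2] ⇄ [3, 4] ⇄ [5, 6, 7] ⇄ [8, 9], scanning [4, 8] returns 4, 5, 6, 7, 8 and stops at key 9 without touching any inner node.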

Indexing Low Cardinality Columns

Problem, example: a B-tree on the sex of customers, for a table with 1,000,000 tuples, results in two TID lists with approximately 500,000 tuples each. A query for all female customers requires 500,000 random page accesses (secondary index); a table scan would be much faster.

Conclusion: B-trees (and also hashing) are only useful for predicates with low selectivity (output/input cardinality ratio). Rule of thumb: the break-even hit rate is approx. 5%; higher hit rates do not justify the effort of an index access.

(Figure: B-tree on Sex with two TID lists, one for F and one for M)
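The rule of thumb can be written as a simple decision function. This is a sketch only: the slides note that the choice requires statistics about the data, and a real optimizer compares estimated costs rather than applying a fixed threshold. The names (`AccessPath`, `AccessPathChoice`) and the constant 0.05, taken from the approx. 5% margin above, are illustrative assumptions.

```java
/** The two access paths compared above. */
enum AccessPath { INDEX_SCAN, TABLE_SCAN }

final class AccessPathChoice {
    /** Break-even hit rate from the rule of thumb (approx. 5 %). */
    static final double BREAK_EVEN_HIT_RATE = 0.05;

    /** Hit rate (selectivity) = output cardinality / input cardinality. */
    static double hitRate(long resultTuples, long totalTuples) {
        return (double) resultTuples / totalTuples;
    }

    /**
     * Rule-of-thumb choice: with a secondary index, each result tuple costs roughly
     * one random page access, so the index only pays off at low hit rates.
     */
    static AccessPath choose(long resultTuples, long totalTuples) {
        return hitRate(resultTuples, totalTuples) <= BREAK_EVEN_HIT_RATE
                ? AccessPath.INDEX_SCAN
                : AccessPath.TABLE_SCAN;
    }
}
```

For the example above, the hit rate is 500,000 / 1,000,000 = 0.5, far above 0.05, so the table scan is chosen.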

Bitmap Index

Idea (long history since the 1960s): create a bitmap (bit list) for each attribute value. Each tuple in the table is assigned to one bit in the bitmap (by its position, i.e. a sequential TID).

Bit values: 1 = the tuple has this attribute value; 0 = the tuple does not have this attribute value.

Necessary condition: sequential numbering of the tuples (TIDs).

Bitmap index on Sex:
F   1 0 1 1 0 1 1 1 1 0 0 1 0 1 1
M   0 1 0 0 1 0 0 0 0 1 1 0 1 0 0

Name   | Sex | Region | Race
Carol  | f   | n      | white
Harold | m   | e      | black
Anne   | f   | e      | asian
Iris   | f   | ne     | white
…      | m   | se     | hisp
…      | f   | e      | white
…      | f   | sw     | asian
…      | f   | w      | black
…      | f   | n      | asian
…      | m   | e      | hisp
…      | m   | se     | black
…      | f   | s      | white
…      | m   | nw     | black
…      | f   | s      | white
…      | f   | w      | black
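Building a bitmap index for one column can be sketched with `java.util.BitSet`, assuming the tuple's position in the table (0-based here) serves as its sequential TID; `BitmapIndex`, `build` and `toBits` are hypothetical names.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class BitmapIndex {
    /**
     * Builds one bitmap per distinct attribute value: the tuple at position tid
     * (its sequential TID) sets bit tid in the bitmap of its value.
     */
    static Map<String, BitSet> build(List<String> column) {
        Map<String, BitSet> bitmaps = new TreeMap<>();
        for (int tid = 0; tid < column.size(); tid++) {
            bitmaps.computeIfAbsent(column.get(tid), v -> new BitSet()).set(tid);
        }
        return bitmaps;
    }

    /** Renders the first n bits as 0/1 characters, in tuple order. */
    static String toBits(BitSet bits, int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) sb.append(bits.get(i) ? '1' : '0');
        return sb.toString();
    }
}
```

Applied to the Sex column of the example table, this yields F = 101101111001011 and M = 010010000110100, matching the two bitmaps above.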

Querying Bitmap Indexes

Main advantages of bitmap indexes: a simple and efficient logical combination (AND/OR) of predicates is possible, and only data that is relevant for the predicates is read. Example: σ_{Sex='f' ∧ Region='n'}(R); the bitmaps B1 (Sex = 'f') and B2 (Region = 'n') are combined by conjunction: for (int i = 0; i < B1.length; i++) B[i] = B1[i] & B2[i];

Example I/O cost estimation: σ_{Sex='f' ∧ Region='n' ∧ Race='Asian'}(R) ("Asian women of region North").

Selectivity: 1/2 · 1/8 · 1/4 = 1/64; N = 10,000 tuples with a length of 400 bytes each (~10 tuples per page for 4 kB pages).

Table scan: 1,000 pages. Bitmap access: 10,000/64 ≈ 156 pages (worst case: each result tuple on a different page), plus 1 page for the bitmaps.
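The estimate can be written out step by step, assuming (as the example table suggests) 2 values for Sex, 8 regions (n, ne, e, se, s, sw, w, nw) and 4 races (white, black, asian, hisp), each uniformly distributed and independent of the others:

```latex
\begin{aligned}
\text{sel} &= \underbrace{\tfrac{1}{2}}_{\text{Sex}} \cdot \underbrace{\tfrac{1}{8}}_{\text{Region}} \cdot \underbrace{\tfrac{1}{4}}_{\text{Race}} = \tfrac{1}{64}\\
\text{tuples per page} &= \left\lfloor \tfrac{4096\ \text{B}}{400\ \text{B}} \right\rfloor = 10
  \quad\Rightarrow\quad \text{table scan} = \tfrac{10\,000}{10} = 1000 \text{ pages}\\
\text{result tuples} &= \tfrac{10\,000}{64} \approx 156
  \quad\Rightarrow\quad \le 156 \text{ data pages (each tuple on its own page)}\\
\text{bitmaps} &= 3 \cdot 10\,000 \text{ bits} = 3750 \text{ B} \le 4096 \text{ B}
  \quad\Rightarrow\quad 1 \text{ page}\\
\text{bitmap access} &\le 156 + 1 = 157 \text{ pages} \ll 1000 \text{ pages}
\end{aligned}
```

Even in the worst case, the bitmap access reads about 157 pages instead of 1,000.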

F   1 0 1 1 0 1 1 1 1 0 0 1 0 1 1   (Sex = 'f')
N   1 0 0 0 0 0 0 0 1 0 0 0 0 0 0   (Region = 'n')
A   0 0 1 0 0 0 1 0 1 0 0 0 0 0 0   (Race = 'Asian')
=   0 0 0 0 0 0 0 0 1 0 0 0 0 0 0   (F AND N AND A)
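The conjunction can be sketched with `java.util.BitSet`, whose `and` method performs the logical AND that the loop above spells out bit by bit. The bitmaps are written as 0/1 strings in tuple order; the Region = 'n' bitmap is derived from the example table (tuples 1 and 9 have region n). `BitmapQuery`, `fromBits` and `conjunction` are hypothetical names.

```java
import java.util.BitSet;

final class BitmapQuery {
    /** Parses a 0/1 string (tuple order, leftmost character = tuple 0) into a BitSet. */
    static BitSet fromBits(String bits) {
        BitSet b = new BitSet(bits.length());
        for (int i = 0; i < bits.length(); i++) {
            if (bits.charAt(i) == '1') b.set(i);
        }
        return b;
    }

    /** Conjunction of predicates = bitwise AND of their bitmaps; inputs stay unchanged. */
    static BitSet conjunction(BitSet... bitmaps) {
        BitSet result = (BitSet) bitmaps[0].clone();
        for (int i = 1; i < bitmaps.length; i++) result.and(bitmaps[i]);
        return result;
    }
}
```

For F, Region = 'n' (100000001000000) and A (001000101000000), the result has a single set bit at position 8 (0-based), i.e. tuple 9, the only Asian woman of region North in the example table.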

Page 16: Foundations of Information Systems 5 DBMS Architecturepages.iai.uni-bonn.de/behrend_andreas/lehre/FIM/WS16/11... · 2017-01-10 · Main-memory access has become a performance bottleneck

| 278

gt

copy A Behrend Foundations of Information Systems |

Record

Record Package of fields that together describe a thing a person a fact etc Each fields represents on property of the entity described by the record Similar to a struct in C Variable length (in contrast to pages)

Record Manager Organizes physical storage of records in pages Operations Get Insert Update Delete Scan Agnostic to record structure and semantic

records considered as byte strings of variable length Structure and content of record is defined be Access System and application

Challenges Record addressing Free space management

| 279

gt

copy A Behrend Foundations of Information Systems |

Record Addressing

Record address Identifier for records used to address records eg in indexes or query processing Assigned during insert of a record

Goals Stability of identifier Fast and direct access Less organizational overhead

Direct addressing Byte address or position number in file or page Instable Byte address If record grows in length following records would get new address Position number Insert and delete operations change series or records

Indirect addressing Surrogate with mapping table (complete indirection) Tuple Identifier (TID concept)

| 280

gt

copy A Behrend Foundations of Information Systems |

Surrogate with Mapping Table

Surrogate Record type + serial number Serial number remains constant during recordrsquos life time

Mapping table Maps

surrogate to page

Problems Where to store mapping table How can it be extended How to search mapping table efficiently

rarr H2 use B-Tree to store mapping table

Mapping Table Surrogate | Page ID

| 281

gt

copy A Behrend Foundations of Information Systems |

TID Concept

Record addressing with indirection inside the page Each page contains an array with record positions TID of a record consist of page id and index in position array

Pros Access with one page access (two pages in case of overflow) Stable No mapping table required

Operations Insert Reuse unused

position or add position Delete Mark position

as unused in array Update Update all

positions in array Update with overflow Store record

as overflow record and store TID of overflow record at original position (No double overflow Update TID at original position)

Record

Overflow Record

| 282

gt

copy A Behrend Foundations of Information Systems |

Free Space Management

Problem In which page is enough space for new record

Solution Free space table lists for all pages how much space is left

Free space value Precise value Ceil(Log2(page size)) =gt 2 bytes for common page size of 4K Rough value use less bytes free space = (value page size)2^(bits per value)

Free space table With direct page addressing Assuming a single page can take n free space entries First page and each (n+1)-th page takes free space entries

With indirect page addressing Free space information stored in page table

copy A Behrend Foundations of Information Systems | 283

gt

|

Storage System

Buffer

File System

Hardware

Data System

Application

TID TID TID

TID TID TID

SELECT sfirstname slastname COUNT(lname) FROM Student s INNER JOIN Program p ON sprogramId = pid INNER JOIN Attendance a ON astudentId = sstudentId INNER JOIN Lecture l ON alectureId = lid GROUP BY sfirstname slastname WHERE pname=lsquoDSErsquo

Run

Buffered Pages - Page replacement strategy - Materialization strategy - Logging Backup Recovery

Paged files

Disks Flash RAID SAN hellip

Storage Structures - Record management - Free space management - Physical access paths

Access System

Data

base

Sys

tem

Data model semantics - System catalog - Record format - Logical access paths

Query processing - Parsing - Plan generation - Plan optimization - Plan execution

1 lsquoSmithrsquo 15061982

Table Person id INT name VARCHAR birthday DATE Index P_id_IX on Personid

Database System

SQL JDBC ODBC hellip

Physical Access Paths ndash Index Structures

| 284

gt

copy A Behrend Foundations of Information Systems |

Primary Index Secondary Index

Overview Indexes

Table scan Read all pages and for each record

evaluate the search criteria Pre-fetching

Index Scan Use index for search criteria

on one or more attributes Fast access to single values or value ranges of index attributes Logicalphysical sorting of values of key attributes (depending on index structure) Enforcing uniqueness

Types if indexes Primary (Clustered) Index

determines physical organization use for PK Secondary (Non-Clustered)

Index redundant access path

Pers(PID NAME AGE SALARY)

Age

Salary

| 285

gt

copy A Behrend Foundations of Information Systems |

Overview Indexes (2)

Choice of Access Paths Index scan Only useful for low selectivity

(low number of result tuples) Break even-point according to the

output ratio of the number of tuples (usually max 5) Requires statistics about data Additional costs for index storage

and updating

Table Scan adequateefficient for small tables

(eg 5 pages) Queries with high selectivity

(large result sets) 100-200MBs sequential read

~ 100 disk seekss

hit rate

Index Scan

Table Scan

access time

| 286

gt

copy A Behrend Foundations of Information Systems |

Classification of Index Structures

Classification

Multiway Trees Tree structure with multiple children per node Idea chose fan out so that node size suits page size

Onedimensional Index Structures

Key Comparison Key Transformation

Sequential Tree-Based Hash-Based

Prefix Trees (Tries)

Binary Search Trees Dynamic Static Linked Lists

(log seq) Seq Lists

(phys seq) Multiway

Trees Example B-Tree

| 287

gt

copy A Behrend Foundations of Information Systems |

B-Tree

free space

(Ki Di Pi) = entry min |P| = k+1 max |P| = 2k+1

keys lt K1 keys gt Kp Ki lt keys lt Ki+1

| 288

gt

copy A Behrend Foundations of Information Systems |

B-Tree (2)

Example Keys Agnostic to specific key semantic Only defined complete order required Could be of fixed or variable length

Operations Search for data for given key value Insertion and deletion of key-data pair

Payload Agnostic to specific data semantic Can be record or reference (TID) or

mix

B-Tree with k = 2 h = 3

| 289

gt

copy A Behrend Foundations of Information Systems |

Search in the B-Tree

Starting at the root node each node is searched from left to right 1) if Ki matches the desired key value the data record has been found (further

records with the same key value might be located in a sub-tree to which Pi-1 points) 2) if Ki is smaller than the desired value the search will be continued in the root of

the sub-tree identified by Pi-1 3) if Ki is larger than the desired value the comparison with Ki+1 is repeated 4) if K2k is also smaller than the desired value the search will be continued in the

sub-tree of P2k If itlsquos impossible to descend further into a sub-tree (2 or 4) (leaf node) The search is aborted no record with the desired key value is found

Search for 38 20 6

| 290

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (1)

Insertion Rule insert only into leaf nodes At Non-Leaf Nodes descend down the tree as for the search S le Ki follow Pi-1 S gt Ki check Ki+1 S gt K2k follow P2k

At Leaf Node Insert the data record according to the sorting order Special case leaf node is full (2k records) rarr split the leaf node

Splitting Generate a new leaf node Split the 2k+1 entries (in order)

into two leaf nodes first k entries rarr left node last k entries rarr right node

middle entry (k+1-th) is used as new ldquodiscriminatorrdquo (branching) and inserted into the parent node

| 291

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (2)

Node Splitting during Insertion Two possible situations after a split The parent node is full rarr repeat split on this level Enough space rarr FINISHED

Special case root split Split of the root node rarr New root with two successor nodes Height of a tree grows by 1 The tree has been split from the bottom to the top

Dynamic reorganization (self-balancing) No unloading or loading necessary Tree is always balanced But In case of many insertions deletions reorganization can be beneficial

| 292

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (3)

Insertion Example Order k = 1 n=2k Keys 1 5 2 6 7 4 8 3

Finally h=3

| 293

gt

copy A Behrend Foundations of Information Systems |

Insertion and Deletion in the B-Tree

Problem Insertion can create overflow Deletion can create underflow and overflow

Example Insertion of key 22

rarr Overflow rarr Split

Deletion of key 22 rarr Underflow need to access all four nodes finally same as input

Insert 22

| 294

gt

copy A Behrend Foundations of Information Systems |

Underflow Merge

Deletion in the B-Tree

Example Order k = 1 n=2k Delete key 3

| 295

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (2)

Example Order k = 1 n=2k Delete key 3

Underflow Merge

Remember Each path from the root to the leaf has the same length h

| 296

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (3)

Example Order k = 1 n=2k Delete key 3

Underflow Merge

Overflow Split

| 297

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (4)

Example Order k = 1 n=2k Delete key 3

Overflow Split

Root Split

| 298

gt

copy A Behrend Foundations of Information Systems |

Deletion Algorithm

Example ndash there are different algorithms Search the node that contains the key K to be deleted If key K is in a leaf node delete the key in the leaf node and handle potentially

resulting underflow by merging with sibling If key K is in an inner node pull up new discriminator from one of the successors Analyze which successor node of K has more elements left or right one

If both have the same number of elements decide for one Replace the key K to be deleted with the direct successor Krsquo from the left

successor node or with the direct successor Krsquorsquo from the right successor node respectively Delete Krsquo or Krsquorsquo from the respective successor node (recursively)

Note Major variants Merge (tis lecture) Re-distribution (instead of splitmerge in case of overflowunderflow the entries are

re-distributed under consideration of one or multiple adjacent nodes)

| 299

gt

copy A Behrend Foundations of Information Systems |

B-Trees B+-Trees and B-Trees

B+-Trees and B-Trees Data is only in leaf nodes Key redundancy but higher fan-out rarr lower tree high less IO Simpler delete procedure rarr requires only merging of nodes

Double linked list of all leaf nodes B-Trees Modified valid node sizes

from [k2k] to [43k2k] rarr better node utilization but more splitsmerges

Example Secondary index Non unique

B-Tree with k = 2 h = 3

| 300

gt

copy A Behrend Foundations of Information Systems |

Indexing Low Cardinality Columns

Problem Example B-tree on the sex of customers for a table with 1000000 tuples results in

two lists with approximately 500000 tuples each

Query for all female customers requires 500000 random page accesses (secondary index) Table scan would be much faster

Conclusion B-trees (and also hashing) are useful for predicates with low selectivity

(outputinput cardinality ratio) Rule of thumb margin hit rate is approx 5 higher hit rates do not justify the efforts for an index access

F M

TID TID TID TID hellip TID TID TID TID hellip

| 301

gt

copy A Behrend Foundations of Information Systems |

Bitmap Index

Idea (Long history since the 1960s) Create a bitmapbitlist for each

attribute value Each tuple in the table is assigned to

one bit in the bitmap (by position sequential TID)

Bit values 1 attribute value set 0 attribute value not set

Necessary condition Sequential numbering of the tuples

(TIDs)

F M

1 0 1 1 0 1 1 1 1 0 0 1 0 1 1

0 1 0 0 1 0 0 0 0 1 1 0 1 0 0

Name Sex Region Race Carol f n white

Harold m e black Anne f e asian

Iris f ne white hellip m se hisp hellip f e white hellip f sw asian hellip f w black hellip f n asian hellip m e hisp hellip m se black hellip f s white hellip m nw black hellip f s white hellip f w black

Sex

| 302

gt

copy A Behrend Foundations of Information Systems |

Querying Bitmap Indexes

Main advantage of bitmap indexes Simple and efficient logical join possible Read only data that is relevant for predicates Example σSex=lsquoflsquo ᴧ Region=lsquonlsquo R Bitmaps B1 and B2 in conjunction for (i=0 iltB1length i++) B = B1[i] amp B2[i]

Example IO Costs Estimation σSex=lsquoflsquo ᴧ Region=lsquonlsquo ᴧ Race=lsquoAsianlsquo R

(ldquoAsian women of region Northrdquo) Selectivity 12 18 14 = 164 N=10000 tuples with length of 400 bytes each

(~ 10 tuples per page for 4kB pages) Table scan 1000 pages Bitmap access 1000064 156 pages (worst case each tuple in a different page)

plus 1 page for bitmaps

F 1 0 1 1 0 1 1 1 1 0 0 1 0 1 1

N 0 1 1 0 0 1 0 0 0 1 0 0 0 0 0

A 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0

0 0 1 0 0 0 0 0 0 0 0 0 0 0 0

AND = AND

  • DBMS Architecture
  • What is in the Lecture
  • How is Database System build
  • Architectural Blue Print
  • Architectural Trends
  • Different Access Characteristics
  • Hardware Developments
  • Row Storage vs Column Storage
  • Processing Models
  • Transaction Management
  • ACID Properties of Transactions
  • Motivation
  • Reasons for crashes
  • Guarantee Atomicity amp Durability
  • Record Management
  • Record
  • Record Addressing
  • Surrogate with Mapping Table
  • TID Concept
  • Free Space Management
  • Physical Access Paths ndash Index Structures
  • Overview Indexes
  • Overview Indexes (2)
  • Classification of Index Structures
  • B-Tree
  • B-Tree (2)
  • Search in the B-Tree
  • Insertions in the B-Tree (1)
  • Insertions in the B-Tree (2)
  • Insertions in the B-Tree (3)
  • Insertion and Deletion in the B-Tree
  • Deletion in the B-Tree
  • Deletion in the B-Tree (2)
  • Deletion in the B-Tree (3)
  • Deletion in the B-Tree (4)
  • Deletion Algorithm
  • B-Trees B+-Trees and B-Trees
  • Indexing Low Cardinality Columns
  • Bitmap Index
  • Querying Bitmap Indexes
Page 17: Foundations of Information Systems 5 DBMS Architecturepages.iai.uni-bonn.de/behrend_andreas/lehre/FIM/WS16/11... · 2017-01-10 · Main-memory access has become a performance bottleneck

| 279

gt

copy A Behrend Foundations of Information Systems |

Record Addressing

Record address Identifier for records used to address records eg in indexes or query processing Assigned during insert of a record

Goals Stability of identifier Fast and direct access Less organizational overhead

Direct addressing Byte address or position number in file or page Instable Byte address If record grows in length following records would get new address Position number Insert and delete operations change series or records

Indirect addressing Surrogate with mapping table (complete indirection) Tuple Identifier (TID concept)

| 280

gt

copy A Behrend Foundations of Information Systems |

Surrogate with Mapping Table

Surrogate Record type + serial number Serial number remains constant during recordrsquos life time

Mapping table Maps

surrogate to page

Problems Where to store mapping table How can it be extended How to search mapping table efficiently

rarr H2 use B-Tree to store mapping table

Mapping Table Surrogate | Page ID

| 281

gt

copy A Behrend Foundations of Information Systems |

TID Concept

Record addressing with indirection inside the page Each page contains an array with record positions TID of a record consist of page id and index in position array

Pros Access with one page access (two pages in case of overflow) Stable No mapping table required

Operations Insert Reuse unused

position or add position Delete Mark position

as unused in array Update Update all

positions in array Update with overflow Store record

as overflow record and store TID of overflow record at original position (No double overflow Update TID at original position)

Record

Overflow Record

| 282

gt

copy A Behrend Foundations of Information Systems |

Free Space Management

Problem In which page is enough space for new record

Solution Free space table lists for all pages how much space is left

Free space value Precise value Ceil(Log2(page size)) =gt 2 bytes for common page size of 4K Rough value use less bytes free space = (value page size)2^(bits per value)

Free space table With direct page addressing Assuming a single page can take n free space entries First page and each (n+1)-th page takes free space entries

With indirect page addressing Free space information stored in page table

copy A Behrend Foundations of Information Systems | 283

gt

|

Storage System

Buffer

File System

Hardware

Data System

Application

TID TID TID

TID TID TID

SELECT sfirstname slastname COUNT(lname) FROM Student s INNER JOIN Program p ON sprogramId = pid INNER JOIN Attendance a ON astudentId = sstudentId INNER JOIN Lecture l ON alectureId = lid GROUP BY sfirstname slastname WHERE pname=lsquoDSErsquo

Run

Buffered Pages - Page replacement strategy - Materialization strategy - Logging Backup Recovery

Paged files

Disks Flash RAID SAN hellip

Storage Structures - Record management - Free space management - Physical access paths

Access System

Data

base

Sys

tem

Data model semantics - System catalog - Record format - Logical access paths

Query processing - Parsing - Plan generation - Plan optimization - Plan execution

1 lsquoSmithrsquo 15061982

Table Person id INT name VARCHAR birthday DATE Index P_id_IX on Personid

Database System

SQL JDBC ODBC hellip

Physical Access Paths ndash Index Structures

| 284

gt

copy A Behrend Foundations of Information Systems |

Primary Index Secondary Index

Overview Indexes

Table scan Read all pages and for each record

evaluate the search criteria Pre-fetching

Index Scan Use index for search criteria

on one or more attributes Fast access to single values or value ranges of index attributes Logicalphysical sorting of values of key attributes (depending on index structure) Enforcing uniqueness

Types if indexes Primary (Clustered) Index

determines physical organization use for PK Secondary (Non-Clustered)

Index redundant access path

Pers(PID NAME AGE SALARY)

Age

Salary

| 285

gt

copy A Behrend Foundations of Information Systems |

Overview Indexes (2)

Choice of Access Paths Index scan Only useful for low selectivity

(low number of result tuples) Break even-point according to the

output ratio of the number of tuples (usually max 5) Requires statistics about data Additional costs for index storage

and updating

Table Scan adequateefficient for small tables

(eg 5 pages) Queries with high selectivity

(large result sets) 100-200MBs sequential read

~ 100 disk seekss

hit rate

Index Scan

Table Scan

access time

| 286

gt

copy A Behrend Foundations of Information Systems |

Classification of Index Structures

Classification

Multiway Trees Tree structure with multiple children per node Idea chose fan out so that node size suits page size

Onedimensional Index Structures

Key Comparison Key Transformation

Sequential Tree-Based Hash-Based

Prefix Trees (Tries)

Binary Search Trees Dynamic Static Linked Lists

(log seq) Seq Lists

(phys seq) Multiway

Trees Example B-Tree

| 287

gt

copy A Behrend Foundations of Information Systems |

B-Tree

free space

(Ki Di Pi) = entry min |P| = k+1 max |P| = 2k+1

keys lt K1 keys gt Kp Ki lt keys lt Ki+1

| 288

gt

copy A Behrend Foundations of Information Systems |

B-Tree (2)

Example Keys Agnostic to specific key semantic Only defined complete order required Could be of fixed or variable length

Operations Search for data for given key value Insertion and deletion of key-data pair

Payload Agnostic to specific data semantic Can be record or reference (TID) or

mix

B-Tree with k = 2 h = 3

| 289

gt

copy A Behrend Foundations of Information Systems |

Search in the B-Tree

Starting at the root node each node is searched from left to right 1) if Ki matches the desired key value the data record has been found (further

records with the same key value might be located in a sub-tree to which Pi-1 points) 2) if Ki is smaller than the desired value the search will be continued in the root of

the sub-tree identified by Pi-1 3) if Ki is larger than the desired value the comparison with Ki+1 is repeated 4) if K2k is also smaller than the desired value the search will be continued in the

sub-tree of P2k If itlsquos impossible to descend further into a sub-tree (2 or 4) (leaf node) The search is aborted no record with the desired key value is found

Search for 38 20 6

| 290

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (1)

Insertion Rule insert only into leaf nodes At Non-Leaf Nodes descend down the tree as for the search S le Ki follow Pi-1 S gt Ki check Ki+1 S gt K2k follow P2k

At Leaf Node Insert the data record according to the sorting order Special case leaf node is full (2k records) rarr split the leaf node

Splitting Generate a new leaf node Split the 2k+1 entries (in order)

into two leaf nodes first k entries rarr left node last k entries rarr right node

middle entry (k+1-th) is used as new ldquodiscriminatorrdquo (branching) and inserted into the parent node

| 291

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (2)

Node Splitting during Insertion Two possible situations after a split The parent node is full rarr repeat split on this level Enough space rarr FINISHED

Special case root split Split of the root node rarr New root with two successor nodes Height of a tree grows by 1 The tree has been split from the bottom to the top

Dynamic reorganization (self-balancing) No unloading or loading necessary Tree is always balanced But In case of many insertions deletions reorganization can be beneficial

| 292

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (3)

Insertion Example Order k = 1 n=2k Keys 1 5 2 6 7 4 8 3

Finally h=3

| 293

gt

copy A Behrend Foundations of Information Systems |

Insertion and Deletion in the B-Tree

Problem Insertion can create overflow Deletion can create underflow and overflow

Example Insertion of key 22

rarr Overflow rarr Split

Deletion of key 22 rarr Underflow need to access all four nodes finally same as input

Insert 22

| 294

gt

copy A Behrend Foundations of Information Systems |

Underflow Merge

Deletion in the B-Tree

Example Order k = 1 n=2k Delete key 3

| 295

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (2)

Example Order k = 1 n=2k Delete key 3

Underflow Merge

Remember Each path from the root to the leaf has the same length h

| 296

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (3)

Example Order k = 1 n=2k Delete key 3

Underflow Merge

Overflow Split

| 297

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (4)

Example Order k = 1 n=2k Delete key 3

Overflow Split

Root Split

| 298

gt

copy A Behrend Foundations of Information Systems |

Deletion Algorithm

Example ndash there are different algorithms Search the node that contains the key K to be deleted If key K is in a leaf node delete the key in the leaf node and handle potentially

resulting underflow by merging with sibling If key K is in an inner node pull up new discriminator from one of the successors Analyze which successor node of K has more elements left or right one

If both have the same number of elements decide for one Replace the key K to be deleted with the direct successor Krsquo from the left

successor node or with the direct successor Krsquorsquo from the right successor node respectively Delete Krsquo or Krsquorsquo from the respective successor node (recursively)

Note Major variants Merge (tis lecture) Re-distribution (instead of splitmerge in case of overflowunderflow the entries are

re-distributed under consideration of one or multiple adjacent nodes)

| 299

gt

copy A Behrend Foundations of Information Systems |

B-Trees B+-Trees and B-Trees

B+-Trees and B-Trees Data is only in leaf nodes Key redundancy but higher fan-out rarr lower tree high less IO Simpler delete procedure rarr requires only merging of nodes

Double linked list of all leaf nodes B-Trees Modified valid node sizes

from [k2k] to [43k2k] rarr better node utilization but more splitsmerges

Example Secondary index Non unique

B-Tree with k = 2 h = 3

| 300

gt

copy A Behrend Foundations of Information Systems |

Indexing Low Cardinality Columns

Problem Example B-tree on the sex of customers for a table with 1000000 tuples results in

two lists with approximately 500000 tuples each

Query for all female customers requires 500000 random page accesses (secondary index) Table scan would be much faster

Conclusion B-trees (and also hashing) are useful for predicates with low selectivity

(outputinput cardinality ratio) Rule of thumb margin hit rate is approx 5 higher hit rates do not justify the efforts for an index access

F M

TID TID TID TID hellip TID TID TID TID hellip

| 301

gt

copy A Behrend Foundations of Information Systems |

Bitmap Index

Idea (Long history since the 1960s) Create a bitmapbitlist for each

attribute value Each tuple in the table is assigned to

one bit in the bitmap (by position sequential TID)

Bit values 1 attribute value set 0 attribute value not set

Necessary condition Sequential numbering of the tuples

(TIDs)

F M

1 0 1 1 0 1 1 1 1 0 0 1 0 1 1

0 1 0 0 1 0 0 0 0 1 1 0 1 0 0

Name Sex Region Race Carol f n white

Harold m e black Anne f e asian

Iris f ne white hellip m se hisp hellip f e white hellip f sw asian hellip f w black hellip f n asian hellip m e hisp hellip m se black hellip f s white hellip m nw black hellip f s white hellip f w black

Sex

| 302

gt

copy A Behrend Foundations of Information Systems |

Querying Bitmap Indexes

Main advantage of bitmap indexes Simple and efficient logical join possible Read only data that is relevant for predicates Example σSex=lsquoflsquo ᴧ Region=lsquonlsquo R Bitmaps B1 and B2 in conjunction for (i=0 iltB1length i++) B = B1[i] amp B2[i]

Example IO Costs Estimation σSex=lsquoflsquo ᴧ Region=lsquonlsquo ᴧ Race=lsquoAsianlsquo R

(ldquoAsian women of region Northrdquo) Selectivity 12 18 14 = 164 N=10000 tuples with length of 400 bytes each

(~ 10 tuples per page for 4kB pages) Table scan 1000 pages Bitmap access 1000064 156 pages (worst case each tuple in a different page)

plus 1 page for bitmaps

[Figure: conjunction of the three bitmaps]

F       1 0 1 1 0 1 1 1 1 0 0 1 0 1 1
AND N   0 1 1 0 0 1 0 0 0 1 0 0 0 0 0
AND A   0 0 1 0 0 0 1 0 1 0 0 0 0 0 0
=       0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
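The conjunction from the figure can be sketched in Python. Here an arbitrary-precision `int` stands in for the array of machine words a real bitmap index would AND word by word. This representation is an assumption of the sketch. The helper names `pack` and `positions` are hypothetical.

```python
def pack(bits):
    """Pack a 0/1 list into an int: bit i of the int belongs to tuple i."""
    value = 0
    for i, b in enumerate(bits):
        value |= b << i
    return value

def positions(bitmap, n):
    """0-based positions of the 1-bits, i.e. the qualifying tuples."""
    return [i for i in range(n) if (bitmap >> i) & 1]

# bitmaps from the slide's example: Sex = 'f', Region = 'n', Race = 'Asian'
sex_f      = [1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1]
region_n   = [0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
race_asian = [0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0]

result = pack(sex_f) & pack(region_n) & pack(race_asian)   # conjunction = bitwise AND
print(positions(result, len(sex_f)))   # [2] -> only the third tuple qualifies
```

A disjunction works the same way with `|`. Only the positions that survive the combination lead to page accesses.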

  • DBMS Architecture
  • What is in the Lecture
  • How Is a Database System Built
  • Architectural Blue Print
  • Architectural Trends
  • Different Access Characteristics
  • Hardware Developments
  • Row Storage vs Column Storage
  • Processing Models
  • Transaction Management
  • ACID Properties of Transactions
  • Motivation
  • Reasons for crashes
  • Guarantee Atomicity & Durability
  • Record Management
  • Record
  • Record Addressing
  • Surrogate with Mapping Table
  • TID Concept
  • Free Space Management
  • Physical Access Paths – Index Structures
  • Overview Indexes
  • Overview Indexes (2)
  • Classification of Index Structures
  • B-Tree
  • B-Tree (2)
  • Search in the B-Tree
  • Insertions in the B-Tree (1)
  • Insertions in the B-Tree (2)
  • Insertions in the B-Tree (3)
  • Insertion and Deletion in the B-Tree
  • Deletion in the B-Tree
  • Deletion in the B-Tree (2)
  • Deletion in the B-Tree (3)
  • Deletion in the B-Tree (4)
  • Deletion Algorithm
  • B-Trees, B+-Trees and B*-Trees
  • Indexing Low Cardinality Columns
  • Bitmap Index
  • Querying Bitmap Indexes

Surrogate with Mapping Table

Surrogate: record type + serial number; the serial number remains constant during the record's lifetime.

Mapping table: maps the surrogate to a page.

Problems: Where to store the mapping table? How can it be extended? How can it be searched efficiently?

→ H2 uses a B-tree to store the mapping table.

[Figure: mapping table with the columns Surrogate | Page ID]
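The indirection can be sketched in a few lines of Python. A `dict` stands in for the B-tree that the slide attributes to H2, and the class and method names are hypothetical. The point is that moving a record changes only its mapping entry, never the surrogate that other structures refer to.

```python
class MappingTable:
    """Hypothetical surrogate -> page-ID map (a dict stands in for a B-tree)."""

    def __init__(self):
        self._page_of = {}
        self._next_serial = {}

    def new_surrogate(self, record_type, page_id):
        serial = self._next_serial.get(record_type, 0) + 1
        self._next_serial[record_type] = serial
        surrogate = (record_type, serial)        # record type + serial number
        self._page_of[surrogate] = page_id
        return surrogate

    def move(self, surrogate, new_page_id):
        # the record moved: only the mapping changes, the surrogate stays the same
        self._page_of[surrogate] = new_page_id

    def lookup(self, surrogate):
        return self._page_of[surrogate]

table = MappingTable()
s = table.new_surrogate('Person', page_id=7)
table.move(s, new_page_id=12)
print(s, table.lookup(s))   # ('Person', 1) 12
```

Every access pays for the extra lookup in the mapping table. The TID concept on the next slide avoids this extra lookup.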


TID Concept

Record addressing with indirection inside the page: each page contains an array with record positions. The TID of a record consists of the page ID and the index in the position array.

Pros: access with one page access (two pages in case of overflow); stable; no mapping table required.

Operations:
Insert: reuse an unused position or add a position.
Delete: mark the position as unused in the array.
Update: update all positions in the array (records inside the page may be shifted).
Update with overflow: store the record as an overflow record on another page and store the TID of the overflow record at the original position (no double overflow: if the overflow record has to move again, only the TID at the original position is updated).

[Figure: page with position array; a record at its original position and an overflow record on another page]
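The following Python sketch shows a minimal version of the TID scheme. The TID is (page ID, index in the position array), and an overflow leaves a forwarding TID at the original position. All names, the byte budget per page and the 8-byte size of a forwarding TID are assumptions. Byte offsets inside a page are abstracted away, so in-page shifting is not modelled.

```python
TID_SIZE = 8   # assumed on-page size of a forwarding TID (hypothetical)

class Page:
    def __init__(self, page_id, capacity):
        self.page_id = page_id
        self.capacity = capacity   # bytes for record data (headers ignored)
        self.slots = []            # position array: bytes = record, tuple = forwarding TID, None = unused

    def free(self):
        used = sum(len(s) if isinstance(s, bytes) else TID_SIZE
                   for s in self.slots if s is not None)
        return self.capacity - used

    def put(self, entry):
        """Reuse an unused position or add a new one; return its index."""
        for i, s in enumerate(self.slots):
            if s is None:
                self.slots[i] = entry
                return i
        self.slots.append(entry)
        return len(self.slots) - 1

class Segment:
    def __init__(self, page_capacity):
        self.page_capacity = page_capacity
        self.pages = []

    def _page_with_space(self, size, exclude=None):
        for p in self.pages:
            if p is not exclude and p.free() >= size:
                return p
        self.pages.append(Page(len(self.pages), self.page_capacity))
        return self.pages[-1]

    def insert(self, record, exclude=None):
        page = self._page_with_space(len(record), exclude)
        return (page.page_id, page.put(record))          # TID = (page ID, position index)

    def read(self, tid):
        entry = self.pages[tid[0]].slots[tid[1]]
        if isinstance(entry, tuple):                      # overflow: second page access
            entry = self.pages[entry[0]].slots[entry[1]]
        return entry

    def delete(self, tid):
        entry = self.pages[tid[0]].slots[tid[1]]
        if isinstance(entry, tuple):                      # drop the overflow record too
            self.pages[entry[0]].slots[entry[1]] = None
        self.pages[tid[0]].slots[tid[1]] = None           # mark position as unused

    def update(self, tid, record):
        home = self.pages[tid[0]]
        old = home.slots[tid[1]]
        if isinstance(old, tuple):                        # no double overflow: free the
            self.pages[old[0]].slots[old[1]] = None       # old overflow record first
        home.slots[tid[1]] = None
        if home.free() >= len(record):
            home.slots[tid[1]] = record                   # still fits on the home page
        else:                                             # overflow: store it elsewhere and keep
            home.slots[tid[1]] = self.insert(record, exclude=home)   # its TID at the original position

seg = Segment(page_capacity=100)
a = seg.insert(b'x' * 40)
b = seg.insert(b'y' * 40)
seg.update(a, b'z' * 70)        # no longer fits on page 0
print(a, seg.pages[0].slots[0], len(seg.read(a)))   # (0, 0) (1, 0) 70
seg.delete(b)
c = seg.insert(b'q' * 5)        # reuses the freed position on page 0
print(c)                        # (0, 1)
```

The TID `a` stays valid after the overflow. Readers simply pay one extra page access to follow the forwarding entry.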


Free Space Management

Problem: Which page has enough space for a new record?

Solution: a free space table lists for all pages how much space is left.

Free space value: Precise value: $\lceil \log_2(\text{page size}) \rceil$ bits ⇒ 2 bytes for the common page size of 4 KB. Rough value: use fewer bits; $\text{free space} = \frac{\text{value} \cdot \text{page size}}{2^{\text{bits per value}}}$.

Free space table: With direct page addressing: assuming a single page can hold n free space entries, the first page and every (n+1)-th page hold free space entries.

With indirect page addressing: the free space information is stored in the page table.
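The formulas can be tried out with the hypothetical helpers below. Two choices are assumptions of this sketch rather than statements of the slide. First, the rough encoding rounds down, so a decoded value never promises more space than the page has. Second, pages are numbered from 0, and each free-space page covers the n data pages that follow it.

```python
import math

PAGE_SIZE = 4096

def precise_bits(page_size=PAGE_SIZE):
    """Bits for an exact free-space value: ceil(log2(page size))."""
    return math.ceil(math.log2(page_size))

def encode_rough(free_bytes, bits, page_size=PAGE_SIZE):
    """Quantize free space to `bits` bits, rounding down (conservative)."""
    value = free_bytes * 2 ** bits // page_size
    return min(value, 2 ** bits - 1)

def decode_rough(value, bits, page_size=PAGE_SIZE):
    """free space = value * page size / 2^(bits per value)"""
    return value * page_size // 2 ** bits

def holds_fsm_entries(page_no, n):
    """Direct addressing: page 0 and every (n+1)-th page store free-space entries."""
    return page_no % (n + 1) == 0

print(precise_bits())                                        # 12 -> stored in 2 bytes
v = encode_rough(1000, bits=4)
print(v, decode_rough(v, bits=4))                            # 3 768
print([p for p in range(10) if holds_fsm_entries(p, n=4)])   # [0, 5]
```

With 4 bits per page, the table is four times smaller than with 2-byte precise values. The price is a granularity of 256 bytes for 4 KB pages.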


Physical Access Paths – Index Structures

[Figure: architectural blueprint of a database system, from top to bottom: Application (SQL, JDBC, ODBC, …); Data System (query processing: parsing, plan generation, plan optimization, plan execution); Access System (data model semantics: system catalog, record format, logical access paths); Storage System (storage structures: record management, free space management, physical access paths); Buffer (buffered pages: page replacement strategy, materialization strategy, logging, backup, recovery); File System (paged files); Hardware (disks, flash, RAID, SAN, …). Example artifacts: an SQL query, the catalog entry Table Person(id INT, name VARCHAR, birthday DATE) with index P_id_IX on Person.id, and the record (1, 'Smith', 15.06.1982).]


Overview Indexes

Table scan: read all pages and evaluate the search criteria for each record; pre-fetching.

Index scan: use an index for search criteria on one or more attributes; fast access to single values or value ranges of the index attributes; logical/physical sorting of the values of the key attributes (depending on the index structure); enforcing uniqueness.

Types of indexes: Primary (clustered) index: determines the physical organization; used for the PK. Secondary (non-clustered) index: redundant access path.

[Figure: relation Pers(PID, NAME, AGE, SALARY) with a primary index and secondary indexes on AGE and SALARY]
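A table scan and an index scan return the same tuples by different routes. The sketch below uses hypothetical `Pers` rows and models a secondary index on AGE as a sorted list of (key, TID) pairs, standing in for the leaf level of a B-tree. Binary search finds the start of the key range, and the TIDs then come out in key order rather than in physical order.

```python
from bisect import bisect_left, bisect_right

# hypothetical Pers(PID, NAME, AGE, SALARY) rows; list position = TID
pers = [
    (1, 'Ann', 34, 4200), (2, 'Bob', 27, 3100), (3, 'Cid', 45, 5300),
    (4, 'Dee', 27, 2900), (5, 'Eve', 52, 6100), (6, 'Fay', 38, 4700),
]
AGE = 2   # column index of AGE

def table_scan(rows, lo, hi):
    """Read every record and evaluate lo <= AGE <= hi."""
    return [tid for tid, r in enumerate(rows) if lo <= r[AGE] <= hi]

# secondary (non-clustered) index on AGE: a redundant, sorted access path
age_index = sorted((r[AGE], tid) for tid, r in enumerate(pers))
keys = [key for key, _ in age_index]

def index_scan(lo, hi):
    """Locate the key range by binary search, return the TIDs in key order."""
    start, end = bisect_left(keys, lo), bisect_right(keys, hi)
    return [tid for _, tid in age_index[start:end]]

print(index_scan(27, 38))                                       # [1, 3, 0, 5]
print(sorted(index_scan(27, 38)) == table_scan(pers, 27, 38))   # True
```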


Overview Indexes (2)

Choice of access paths:

Index scan: only useful for low selectivity (low number of result tuples); break-even point according to the output ratio of the number of tuples (usually max. 5%); requires statistics about the data; additional costs for index storage and updating.

Table scan: adequate/efficient for small tables (e.g., 5 pages) and for queries with high selectivity (large result sets); 100–200 MB/s sequential read vs. ~100 disk seeks/s.

[Figure: access time over hit rate for index scan and table scan]
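The break-even point can be estimated with a crude cost model built from the slide's disk figures. This sketch takes 150 MB/s sequential reads and 100 seeks/s. It pessimistically assumes one random page access per qualifying tuple, with no buffer hits and the index pages themselves ignored. The 4 KB page size and the table dimensions are hypothetical.

```python
PAGE_SIZE = 4096          # bytes (assumption)
SEQ_BYTES_PER_S = 150e6   # middle of the slide's 100-200 MB/s sequential read rate
SEEKS_PER_S = 100         # ~100 random disk seeks per second

def table_scan_seconds(num_pages):
    """Read all pages sequentially."""
    return num_pages * PAGE_SIZE / SEQ_BYTES_PER_S

def index_scan_seconds(num_tuples, hit_rate):
    """Pessimistic: one random page access per qualifying tuple,
    no buffer hits, cost of reading the index itself ignored."""
    return num_tuples * hit_rate / SEEKS_PER_S

def break_even_hit_rate(num_pages, num_tuples):
    """Hit rate at which both access paths cost the same."""
    return table_scan_seconds(num_pages) * SEEKS_PER_S / num_tuples

pages, tuples = 100_000, 1_000_000                 # hypothetical table
print(round(table_scan_seconds(pages), 2))         # 2.73
print(index_scan_seconds(tuples, 0.05))            # 500.0
print(f"{break_even_hit_rate(pages, tuples):.4%}") # 0.0273%
```

Under these deliberately pessimistic numbers, the break-even point falls far below the 5% rule of thumb. This shows how strongly the choice depends on the assumed costs. Clustering, buffering and prefetching all move the break-even point.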


Classification of Index Structures

[Figure: classification of one-dimensional index structures by key comparison vs. key transformation, with the categories sequential (linked lists: logically sequential; sequential lists: physically sequential), tree-based (binary search trees, multiway trees, prefix trees/tries), and hash-based (static, dynamic)]

Multiway trees: tree structure with multiple children per node. Idea: choose the fan-out so that the node size matches the page size. Example: B-tree.


B-Tree

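[Figure: B-tree node: pointer P0, followed by the entries (K1, D1, P1), (K2, D2, P2), …, followed by free space]

Each entry $(K_i, D_i, P_i)$ consists of a key, its data, and a pointer to a sub-tree. Every node except the root satisfies $\min |P| = k+1$ and $\max |P| = 2k+1$. The sub-tree $P_0$ holds keys $< K_1$, the sub-tree $P_i$ holds keys with $K_i <$ key $< K_{i+1}$, and the last sub-tree $P_p$ holds keys $> K_p$.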
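"Node size suits page size" translates directly into the order k. The sketch below assumes 8-byte keys, 8-byte data (e.g. a TID) and 8-byte pointers, and ignores the page header. A node then holds 2k entries $(K_i, D_i, P_i)$ plus $P_0$. The height range uses the standard bounds $\log_{2k+1}(n+1) \le h \le 1 + \log_{k+1}\frac{n+1}{2}$ for n keys. The function names are hypothetical.

```python
import math

def max_order(page_size, key_size, data_size, ptr_size, header=0):
    """Largest k such that 2k entries (K_i, D_i, P_i) plus P_0 fit into one page."""
    entry = key_size + data_size + ptr_size
    return (page_size - header - ptr_size) // (2 * entry)

def height_bounds(n_keys, k):
    """Possible heights of a B-tree of order k holding n_keys keys."""
    lo = math.ceil(math.log(n_keys + 1, 2 * k + 1))
    hi = math.floor(1 + math.log((n_keys + 1) / 2, k + 1))
    return lo, hi

k = max_order(page_size=4096, key_size=8, data_size=8, ptr_size=8)
print(k, 2 * k + 1)                  # 85 171
print(height_bounds(1_000_000, k))   # (3, 3)
```

With a page-sized node, one million keys fit into a tree of height 3, so a lookup needs three page accesses. With k = 2, the same keys would need a height of at least 9.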


B-Tree (2)

Keys: agnostic to specific key semantics; only a defined total order is required; could be of fixed or variable length.

Operations: search for data for a given key value; insertion and deletion of key-data pairs.

Payload: agnostic to specific data semantics; can be the record, a reference (TID), or a mix.

[Figure: example B-tree with k = 2, h = 3]


Search in the B-Tree

Starting at the root node, each node is searched from left to right:
1) If Ki matches the desired key value, the data record has been found (further records with the same key value might be located in the sub-tree to which Pi-1 points).
2) If Ki is larger than the desired value, the search continues in the root of the sub-tree identified by Pi-1.
3) If Ki is smaller than the desired value, the comparison is repeated with Ki+1.
4) If the last key (K2k in a full node) is also smaller than the desired value, the search continues in the sub-tree of P2k.
If it is impossible to descend further into a sub-tree in case 2 or 4 (leaf node), the search is aborted: no record with the desired key value exists.

[Figure: example searches for the keys 38, 20, and 6]
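The search rules can be sketched in Python as follows. Instead of comparing the keys of a node from left to right, `bisect_left` finds the first key that is greater than or equal to the search key. This is the same position that cases 1 to 4 arrive at, only located by binary search. The example tree (k = 2) is hypothetical, not the one in the slide figure.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys, children=()):
        self.keys = list(keys)            # ascending keys of the node
        self.children = list(children)    # one more pointer than keys; empty for a leaf

def search(node, key):
    """Descend from the root; return True if key is stored in the tree."""
    while True:
        i = bisect_left(node.keys, key)   # first key >= search key
        if i < len(node.keys) and node.keys[i] == key:
            return True                   # case 1: key found in this node
        if not node.children:
            return False                  # leaf reached: no record with this key
        node = node.children[i]           # pointer left of the first larger key, or the last one

# hypothetical tree with k = 2
root = Node([20, 40], [Node([5, 10]), Node([25, 30, 38]), Node([45, 50])])
print([search(root, x) for x in (38, 20, 6)])   # [True, True, False]
```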


Insertions in the B-Tree (1)

Insertion rule: insert only into leaf nodes.

At non-leaf nodes: descend the tree as for the search: S ≤ Ki: follow Pi-1; S > Ki: check Ki+1; S > K2k: follow P2k.

At the leaf node: insert the data record according to the sort order. Special case: the leaf node is full (2k records) → split the leaf node.

Splitting: generate a new leaf node and split the 2k+1 entries (in order) into two leaf nodes: first k entries → left node, last k entries → right node; the middle entry (the (k+1)-th) is used as the new “discriminator” (branching key) and inserted into the parent node.


Insertions in the B-Tree (2)

Node splitting during insertion: two situations are possible after a split: the parent node is full → repeat the split on this level; enough space → FINISHED.

Special case, root split: splitting the root node → new root with two successor nodes; the height of the tree grows by 1. The tree is split from the bottom to the top.

Dynamic reorganization (self-balancing): no unloading or reloading necessary; the tree is always balanced. But: in case of many insertions/deletions, a reorganization can be beneficial.


Insertions in the B-Tree (3)

Insertion example: order k = 1 (at most n = 2k = 2 keys per node), keys 1, 5, 2, 6, 7, 4, 8, 3.

[Figure: the tree after each insertion]

Finally: h = 3.
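The insertion rules from the previous slides fit into a short Python sketch. The rules are: insert into leaves only, split 2k+1 entries into k | middle | k, move the middle entry up, and split the root to grow the tree. Replaying the example key sequence with k = 1 ends with height 3, as on the slide. The class and helper names are hypothetical, and the payload D is omitted.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys=(), children=()):
        self.keys = list(keys)
        self.children = list(children)    # empty for a leaf

class BTree:
    def __init__(self, k):
        self.k = k                        # order: at most 2k keys per node
        self.root = Node()

    def insert(self, key):
        split = self._insert(self.root, key)
        if split:                         # root split: new root, height grows by 1
            middle, right = split
            self.root = Node([middle], [self.root, right])

    def _insert(self, node, key):
        i = bisect_left(node.keys, key)   # S <= K_i: go left of K_i
        if not node.children:
            node.keys.insert(i, key)      # insert only into leaves, in sort order
        else:
            split = self._insert(node.children[i], key)
            if not split:
                return None
            middle, right = split         # discriminator moves up into this node
            node.keys.insert(i, middle)
            node.children.insert(i + 1, right)
        if len(node.keys) > 2 * self.k:   # overflow: 2k+1 entries
            return self._split(node)
        return None

    def _split(self, node):
        k = self.k
        middle = node.keys[k]                                    # (k+1)-th entry goes up
        right = Node(node.keys[k + 1:], node.children[k + 1:])   # last k entries
        node.keys, node.children = node.keys[:k], node.children[:k + 1]   # first k entries
        return middle, right

    def height(self):
        h, node = 1, self.root
        while node.children:
            h, node = h + 1, node.children[0]
        return h

def levels(tree):
    """Keys per level, top to bottom."""
    out, level = [], [tree.root]
    while level:
        out.append([n.keys for n in level])
        level = [c for n in level for c in n.children]
    return out

t = BTree(k=1)
for key in (1, 5, 2, 6, 7, 4, 8, 3):
    t.insert(key)
print(t.height())   # 3
print(levels(t))    # [[[4]], [[2], [6]], [[1], [3], [5], [7, 8]]]
```

Inserting the last key, 3, splits a leaf and then the root. This second split is the root split that raises the height from 2 to 3.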


Insertion and Deletion in the B-Tree

Problem: insertion can create an overflow; deletion can create an underflow and (after merging) an overflow.

Example: insertion of key 22 → overflow → split.

Deletion of key 22 → underflow; all four nodes need to be accessed; finally, the tree is the same as the input.

[Figure: insert 22]


Deletion in the B-Tree

Example: order k = 1 (n = 2k = 2), delete key 3.

[Figure: underflow → merge]


Deletion in the B-Tree (2)

Example: order k = 1 (n = 2k = 2), delete key 3.

[Figure: underflow → merge]

Remember: each path from the root to a leaf has the same length h.


Deletion in the B-Tree (3)

Example: order k = 1 (n = 2k = 2), delete key 3.

[Figure: underflow → merge, then overflow → split]


Deletion in the B-Tree (4)

Example: order k = 1 (n = 2k = 2), delete key 3.

[Figure: overflow → split, then root split]


Deletion Algorithm

Example (there are different algorithms):

Search the node that contains the key K to be deleted.

If key K is in a leaf node: delete the key in the leaf node and handle a potentially resulting underflow by merging with a sibling.

If key K is in an inner node: pull up a new discriminator from one of the successor (child) nodes. Analyze which successor node of K has more elements, the left or the right one; if both have the same number of elements, choose either. Replace the key K to be deleted with its direct predecessor K' from the left sub-tree or with its direct successor K'' from the right sub-tree, respectively. Delete K' or K'' from the respective successor node (recursively).

Note: major variants are merge (this lecture) and re-distribution (instead of split/merge in case of overflow/underflow, the entries are re-distributed among one or multiple adjacent nodes).
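The merge variant can be sketched in Python. An underflowing child is merged with a neighbour and the separating key. If the merged node then overflows, it is split again, which is the "underflow → merge → overflow → split" sequence from the figures. An inner key is replaced by K' or K'' from the fuller side. A root left empty by a merge is dropped, so the height shrinks. The function names and the preference for the left neighbour are assumptions of this sketch. The example trees are built by hand.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)    # empty for a leaf

def fix_underflow(parent, i, k):
    """Child i has fewer than k keys: merge it with a neighbour plus the
    separating key; if the merged node overflows, split it again."""
    j = i - 1 if i > 0 else i             # merge children j and j+1 (left neighbour preferred)
    left, right = parent.children[j], parent.children[j + 1]
    merged = Node(left.keys + [parent.keys[j]] + right.keys,
                  left.children + right.children)
    del parent.keys[j]
    del parent.children[j + 1]
    parent.children[j] = merged
    if len(merged.keys) > 2 * k:          # overflow after the merge -> split
        m = len(merged.keys) // 2
        parent.keys.insert(j, merged.keys[m])
        parent.children.insert(j + 1, Node(merged.keys[m + 1:], merged.children[m + 1:]))
        merged.keys, merged.children = merged.keys[:m], merged.children[:m + 1]

def _delete(node, key, k):
    i = bisect_left(node.keys, key)
    found = i < len(node.keys) and node.keys[i] == key
    if not node.children:                 # leaf: remove the key directly
        if found:
            node.keys.pop(i)
        return
    if found:                             # inner node: pull up a new discriminator
        left, right = node.children[i], node.children[i + 1]
        if len(left.keys) >= len(right.keys):
            c, n = i, left                # predecessor K' = largest key on the left
            while n.children:
                n = n.children[-1]
            key = n.keys[-1]
        else:
            c, n = i + 1, right           # successor K'' = smallest key on the right
            while n.children:
                n = n.children[0]
            key = n.keys[0]
        node.keys[i] = key                # replace K ...
        i = c                             # ... and delete K' / K'' below
    _delete(node.children[i], key, k)
    if len(node.children[i].keys) < k:
        fix_underflow(node, i, k)

def delete(root, key, k):
    """Delete key; return the (possibly new) root."""
    _delete(root, key, k)
    if not root.keys and root.children:   # root emptied by a merge: height shrinks by 1
        return root.children[0]
    return root

k = 1
root = Node([4], [Node([2], [Node([1]), Node([3])]),
                  Node([6], [Node([5]), Node([7, 8])])])
root = delete(root, 3, k)                 # underflow -> merge on two levels
print(root.keys, [c.keys for c in root.children])   # [4, 6] [[1, 2], [5], [7, 8]]
root = delete(root, 4, k)                 # inner key replaced by predecessor 2
print(root.keys, [c.keys for c in root.children])   # [2, 6] [[1], [5], [7, 8]]
r2 = delete(Node([5], [Node([3]), Node([7, 8])]), 3, k)   # merge overflows -> split
print(r2.keys, [c.keys for c in r2.children])             # [7] [[5], [8]]
```

In the last example, merge followed by split produces the same result as re-distributing the keys between the two siblings.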


B-Trees, B+-Trees and B*-Trees

B+-Trees and B*-Trees: data is only in the leaf nodes; key redundancy, but higher fan-out → lower tree height, less I/O; simpler delete procedure → requires only merging of nodes; doubly linked list of all leaf nodes.

B*-Trees: modified valid node sizes, from [k, 2k] to [4/3·k, 2k] → better node utilization, but more splits/merges.

Example: secondary index, non-unique.

[Figure: B-tree with k = 2, h = 3]
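The linked leaf level is what makes range queries cheap. After one descent, the scan simply follows the chain of leaves. The sketch below hand-builds a hypothetical one-level tree for a non-unique secondary index. Its separators are redundant copies of the smallest key of the right leaf. The descent uses `bisect_left`: with duplicates, equal keys may also sit in the left neighbour, so the scan starts there and skips smaller keys. The `prev` pointers are unused here but would serve descending scans.

```python
from bisect import bisect_left

class Leaf:
    def __init__(self, entries):
        self.entries = entries          # sorted (key, TID) pairs: data only in leaves
        self.prev = self.next = None    # doubly linked list of all leaves

class Inner:
    def __init__(self, keys, children):
        self.keys = keys                # separators: redundant copies of leaf keys
        self.children = children

def link(leaves):
    for left, right in zip(leaves, leaves[1:]):
        left.next, right.prev = right, left

def range_scan(root, lo, hi):
    """TIDs of all entries with lo <= key <= hi, in key order."""
    node = root
    while isinstance(node, Inner):
        node = node.children[bisect_left(node.keys, lo)]   # leftmost leaf that may hold lo
    result = []
    while node is not None:
        for key, tid in node.entries:
            if key > hi:
                return result
            if key >= lo:
                result.append(tid)
        node = node.next                # follow the leaf chain, no new descent
    return result

# hypothetical non-unique secondary index on AGE
leaves = [Leaf([(27, 1), (27, 3)]), Leaf([(34, 0), (38, 5)]),
          Leaf([(38, 7), (45, 2)]), Leaf([(52, 4)])]
link(leaves)
root = Inner([34, 38, 52], leaves)      # separator = smallest key of the right leaf
print(range_scan(root, 30, 40))         # [0, 5, 7]
print(range_scan(root, 38, 38))         # [5, 7]
```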

Page 20: Foundations of Information Systems 5 DBMS Architecturepages.iai.uni-bonn.de/behrend_andreas/lehre/FIM/WS16/11... · 2017-01-10 · Main-memory access has become a performance bottleneck

| 282

gt

copy A Behrend Foundations of Information Systems |

Free Space Management

Problem In which page is enough space for new record

Solution Free space table lists for all pages how much space is left

Free space value Precise value Ceil(Log2(page size)) =gt 2 bytes for common page size of 4K Rough value use less bytes free space = (value page size)2^(bits per value)

Free space table With direct page addressing Assuming a single page can take n free space entries First page and each (n+1)-th page takes free space entries

With indirect page addressing Free space information stored in page table

copy A Behrend Foundations of Information Systems | 283

gt

|

Storage System

Buffer

File System

Hardware

Data System

Application

TID TID TID

TID TID TID

SELECT sfirstname slastname COUNT(lname) FROM Student s INNER JOIN Program p ON sprogramId = pid INNER JOIN Attendance a ON astudentId = sstudentId INNER JOIN Lecture l ON alectureId = lid GROUP BY sfirstname slastname WHERE pname=lsquoDSErsquo

Run

Buffered Pages - Page replacement strategy - Materialization strategy - Logging Backup Recovery

Paged files

Disks Flash RAID SAN hellip

Storage Structures - Record management - Free space management - Physical access paths

Access System

Data

base

Sys

tem

Data model semantics - System catalog - Record format - Logical access paths

Query processing - Parsing - Plan generation - Plan optimization - Plan execution

1 lsquoSmithrsquo 15061982

Table Person id INT name VARCHAR birthday DATE Index P_id_IX on Personid

Database System

SQL JDBC ODBC hellip

Physical Access Paths ndash Index Structures

| 284

gt

copy A Behrend Foundations of Information Systems |

Primary Index Secondary Index

Overview Indexes

Table scan Read all pages and for each record

evaluate the search criteria Pre-fetching

Index Scan Use index for search criteria

on one or more attributes Fast access to single values or value ranges of index attributes Logicalphysical sorting of values of key attributes (depending on index structure) Enforcing uniqueness

Types if indexes Primary (Clustered) Index

determines physical organization use for PK Secondary (Non-Clustered)

Index redundant access path

Pers(PID NAME AGE SALARY)

Age

Salary

| 285

gt

copy A Behrend Foundations of Information Systems |

Overview Indexes (2)

Choice of Access Paths Index scan Only useful for low selectivity

(low number of result tuples) Break even-point according to the

output ratio of the number of tuples (usually max 5) Requires statistics about data Additional costs for index storage

and updating

Table Scan adequateefficient for small tables

(eg 5 pages) Queries with high selectivity

(large result sets) 100-200MBs sequential read

~ 100 disk seekss

hit rate

Index Scan

Table Scan

access time

| 286

gt

copy A Behrend Foundations of Information Systems |

Classification of Index Structures

Classification

Multiway Trees Tree structure with multiple children per node Idea chose fan out so that node size suits page size

Onedimensional Index Structures

Key Comparison Key Transformation

Sequential Tree-Based Hash-Based

Prefix Trees (Tries)

Binary Search Trees Dynamic Static Linked Lists

(log seq) Seq Lists

(phys seq) Multiway

Trees Example B-Tree

| 287

gt

copy A Behrend Foundations of Information Systems |

B-Tree

free space

(Ki Di Pi) = entry min |P| = k+1 max |P| = 2k+1

keys lt K1 keys gt Kp Ki lt keys lt Ki+1

| 288

gt

copy A Behrend Foundations of Information Systems |

B-Tree (2)

Example Keys Agnostic to specific key semantic Only defined complete order required Could be of fixed or variable length

Operations Search for data for given key value Insertion and deletion of key-data pair

Payload Agnostic to specific data semantic Can be record or reference (TID) or

mix

B-Tree with k = 2 h = 3

| 289

gt

copy A Behrend Foundations of Information Systems |

Search in the B-Tree

Starting at the root node each node is searched from left to right 1) if Ki matches the desired key value the data record has been found (further

records with the same key value might be located in a sub-tree to which Pi-1 points) 2) if Ki is smaller than the desired value the search will be continued in the root of

the sub-tree identified by Pi-1 3) if Ki is larger than the desired value the comparison with Ki+1 is repeated 4) if K2k is also smaller than the desired value the search will be continued in the

sub-tree of P2k If itlsquos impossible to descend further into a sub-tree (2 or 4) (leaf node) The search is aborted no record with the desired key value is found

Search for 38 20 6

| 290

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (1)

Insertion rule: insert only into leaf nodes

At non-leaf nodes: descend the tree as for the search
- S ≤ Ki: follow Pi-1
- S > Ki: check Ki+1
- S > K2k: follow P2k

At the leaf node: insert the data record according to the sort order
- Special case: the leaf node is full (2k records) → split the leaf node

Splitting:
- Create a new leaf node
- Split the 2k+1 entries (in order) into two leaf nodes: the first k entries → left node, the last k entries → right node
- The middle entry (the (k+1)-th) is used as the new "discriminator" (branching key) and inserted into the parent node

Insertions in the B-Tree (2)

Node splitting during insertion: two possible situations after a split
- The parent node is full → repeat the split on this level
- Enough space → finished

Special case root split:
- Splitting the root node creates a new root with two successor nodes
- The height of the tree grows by 1; the tree grows from the bottom to the top

Dynamic reorganization (self-balancing):
- No unloading or reloading necessary
- The tree is always balanced
- But: after many insertions/deletions, a reorganization can still be beneficial

Insertions in the B-Tree (3)

Insertion example (figure): order k = 1 (at most n = 2k = 2 keys per node); the keys 1, 5, 2, 6, 7, 4, 8, 3 are inserted in this order. Finally, h = 3.
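The insertion rules can be sketched in Java as below. The procedure descends as in the search, inserts into the leaf, splits an overflowing node with 2k+1 keys, and pushes the middle key up. A root split creates a new root. This is a minimal in-memory sketch: nodes are Java objects instead of pages, keys are ints, payloads are omitted, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of B-tree insertion with node splitting (order k: at most 2k keys per node).
final class BTreeInsert {
    static final class Node {
        final List<Integer> keys = new ArrayList<>();
        final List<Node> children = new ArrayList<>(); // empty for leaves
        boolean isLeaf() { return children.isEmpty(); }
    }

    // Result of a split: the discriminator moved up and the new right node.
    private static final class Split {
        final int discriminator; final Node right;
        Split(int discriminator, Node right) { this.discriminator = discriminator; this.right = right; }
    }

    final int k;
    Node root = new Node();

    BTreeInsert(int k) { this.k = k; }

    void insert(int s) {
        Split split = insert(root, s);
        if (split != null) {            // root split: new root with two successors
            Node newRoot = new Node();
            newRoot.keys.add(split.discriminator);
            newRoot.children.add(root);
            newRoot.children.add(split.right);
            root = newRoot;             // height grows by 1
        }
    }

    // Inserts s below node; returns the split to propagate to the parent, or null.
    private Split insert(Node node, int s) {
        int i = 0;
        while (i < node.keys.size() && node.keys.get(i) < s) i++; // S <= Ki: go left of Ki
        if (node.isLeaf()) {
            node.keys.add(i, s);                    // insert according to sort order
        } else {
            Split split = insert(node.children.get(i), s);
            if (split == null) return null;         // enough space below: finished
            node.keys.add(i, split.discriminator);  // discriminator into this node
            node.children.add(i + 1, split.right);
        }
        return node.keys.size() > 2 * k ? split(node) : null;
    }

    // Overflow (2k+1 keys): first k keys stay, last k keys move right,
    // the (k+1)-th key becomes the discriminator for the parent.
    private Split split(Node node) {
        Node right = new Node();
        int middle = node.keys.get(k);
        right.keys.addAll(node.keys.subList(k + 1, node.keys.size()));
        node.keys.subList(k, node.keys.size()).clear();
        if (!node.isLeaf()) {
            right.children.addAll(node.children.subList(k + 1, node.children.size()));
            node.children.subList(k + 1, node.children.size()).clear();
        }
        return new Split(middle, right);
    }

    int height() {
        int h = 1;
        for (Node n = root; !n.isLeaf(); n = n.children.get(0)) h++;
        return h;
    }
}
```

Inserting 1, 5, 2, 6, 7, 4, 8, 3 with k = 1 produces the following tree:
- root key 4
- inner nodes [2] and [6]
- leaves [1], [3], [5] and [7, 8]

The resulting height is 3, which matches h = 3 above. The last key (3) causes a leaf split and then a root split.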

Insertion and Deletion in the B-Tree

Problem:
- Insertion can cause an overflow
- Deletion can cause an underflow and, when nodes are merged, an overflow

Example (figure, "Insert 22"):
- Inserting key 22 → overflow → split
- Deleting key 22 again → underflow; all four nodes need to be accessed, and the final tree is the same as the input tree

Deletion in the B-Tree

Example (figure): order k = 1 (n = 2k), delete key 3: underflow → merge

Deletion in the B-Tree (2)

Example (figure, continued): order k = 1 (n = 2k), delete key 3: underflow → merge

Remember: every path from the root to a leaf has the same length h.

Deletion in the B-Tree (3)

Example (figure, continued): order k = 1 (n = 2k), delete key 3: underflow → merge, then overflow → split

Deletion in the B-Tree (4)

Example (figure, continued): order k = 1 (n = 2k), delete key 3: overflow → split, root split

Deletion Algorithm

Example (there are different algorithms):
- Search the node that contains the key K to be deleted.
- If K is in a leaf node: delete K from the leaf node and handle a potentially resulting underflow by merging with a sibling.
- If K is in an inner node: pull up a new discriminator from one of the successors.
  - Determine which successor node of K (left or right) has more elements; if both have the same number of elements, pick either one.
  - Replace K with its direct predecessor K' (the largest key of the left subtree) or with its direct successor K'' (the smallest key of the right subtree), respectively.
  - Delete K' or K'' from the respective successor node (recursively).

Note: major variants
- Merge (this lecture)
- Redistribution: instead of a split/merge in case of overflow/underflow, the entries are redistributed among one or more adjacent nodes
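The deletion algorithm can be sketched in Java as follows. This minimal in-memory sketch makes several assumptions:
- Keys are unique ints without payload.
- Underflow is handled with the merge variant of this lecture.
- If merging with a sibling produces more than 2k keys, the merged node is split again (underflow → merge → overflow → split).
- An empty root is removed, which reduces the height.

Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of B-tree deletion for order k (non-root nodes hold k..2k keys).
final class BTreeDelete {
    static final class Node {
        final List<Integer> keys = new ArrayList<>();
        final List<Node> children = new ArrayList<>(); // empty for leaves
        boolean isLeaf() { return children.isEmpty(); }

        static Node of(int[] keys, Node... children) {
            Node n = new Node();
            for (int key : keys) n.keys.add(key);
            n.children.addAll(Arrays.asList(children));
            return n;
        }
    }

    final int k;
    Node root;

    BTreeDelete(int k, Node root) { this.k = k; this.root = root; }

    void delete(int key) {
        delete(root, key);
        // An empty inner root is dropped: the tree height shrinks by 1.
        if (root.keys.isEmpty() && !root.isLeaf()) root = root.children.get(0);
    }

    // Deletes key below node; an underflow of node itself is repaired by the caller.
    private void delete(Node node, int key) {
        int i = 0;
        while (i < node.keys.size() && node.keys.get(i) < key) i++;
        boolean found = i < node.keys.size() && node.keys.get(i) == key;
        if (node.isLeaf()) {
            if (found) node.keys.remove(i);       // key in leaf: remove it (by index)
            return;
        }
        if (found) {
            // Key in inner node: replace it by the predecessor (largest key of the
            // left subtree) or the successor (smallest key of the right subtree),
            // taken from the successor node with more entries (left on a tie).
            Node left = node.children.get(i), right = node.children.get(i + 1);
            boolean useLeft = left.keys.size() >= right.keys.size();
            int replacement = useLeft ? maxKey(left) : minKey(right);
            node.keys.set(i, replacement);
            if (!useLeft) i++;
            key = replacement;                    // now delete it further down
        }
        delete(node.children.get(i), key);
        if (node.children.get(i).keys.size() < k) fixUnderflow(node, i);
    }

    // Child i has k-1 keys: merge it with a sibling and the discriminator between
    // them; split the merged node again if it now holds more than 2k keys.
    private void fixUnderflow(Node parent, int i) {
        int j = (i > 0) ? i - 1 : i;                    // merge children j and j+1
        Node left = parent.children.get(j), right = parent.children.get(j + 1);
        left.keys.add(parent.keys.remove(j));           // pull the discriminator down
        left.keys.addAll(right.keys);
        left.children.addAll(right.children);
        parent.children.remove(j + 1);
        if (left.keys.size() > 2 * k) {                 // overflow after merge -> split
            int mid = left.keys.size() / 2;
            Node newRight = new Node();
            newRight.keys.addAll(left.keys.subList(mid + 1, left.keys.size()));
            int discriminator = left.keys.get(mid);
            left.keys.subList(mid, left.keys.size()).clear();
            if (!left.isLeaf()) {
                newRight.children.addAll(left.children.subList(mid + 1, left.children.size()));
                left.children.subList(mid + 1, left.children.size()).clear();
            }
            parent.keys.add(j, discriminator);
            parent.children.add(j + 1, newRight);
        }
    }

    private static int maxKey(Node n) {
        while (!n.isLeaf()) n = n.children.get(n.children.size() - 1);
        return n.keys.get(n.keys.size() - 1);
    }

    private static int minKey(Node n) {
        while (!n.isLeaf()) n = n.children.get(0);
        return n.keys.get(0);
    }
}
```

Two cases illustrate the sketch:
- Start from the tree produced by the insertion example: root [4], inner nodes [2] and [6], leaves [1], [3], [5] and [7, 8]. Deleting key 3 merges twice and leaves a tree of height 2 with root [4, 6].
- Start from a tree with root [2] and leaves [1] and [3, 4]. Deleting key 1 triggers merge → overflow → split and yields root [3] with leaves [2] and [4].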

B-Trees, B+-Trees and B*-Trees

B+-trees (sometimes also called B*-trees):
- Data is stored only in the leaf nodes
- Key redundancy, but higher fan-out → lower tree height, less I/O
- Simpler delete procedure → requires only merging of nodes
- Doubly linked list of all leaf nodes

B*-trees:
- Modified valid node sizes: from [k, 2k] to [4/3·k, 2k] → better node utilization, but more splits/merges

Example (figure): non-unique secondary index, tree with k = 2, h = 3
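Data is kept only in the leaves, and the leaves are chained. A range query therefore descends the tree once and then follows the leaf list. The Java sketch below assumes int keys, TIDs modeled as long values and in-memory nodes; all names are hypothetical. The descent picks the leftmost child that can contain the lower bound, so the scan stays correct whether separators are copied into the left or the right child.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a B+-tree range scan: inner nodes hold only separator keys,
// data (here: TIDs) sits in the leaves, and the leaves are chained.
final class BPlusRangeScan {
    static class Node {
        final List<Integer> keys = new ArrayList<>();
    }

    static final class Inner extends Node {
        final List<Node> children = new ArrayList<>(); // keys.size() + 1 entries
    }

    static final class Leaf extends Node {
        final List<Long> tids = new ArrayList<>();     // TID of the entry keys.get(i)
        Leaf next, prev;                               // doubly linked leaf chain
    }

    // Returns the TIDs of all entries with lo <= key <= hi, in key order.
    static List<Long> range(Node root, int lo, int hi) {
        Node n = root;
        while (n instanceof Inner) {                   // descend once
            Inner inner = (Inner) n;
            int i = 0;
            while (i < inner.keys.size() && lo > inner.keys.get(i)) i++;
            n = inner.children.get(i);
        }
        List<Long> result = new ArrayList<>();
        for (Leaf leaf = (Leaf) n; leaf != null; leaf = leaf.next) { // follow the chain
            for (int i = 0; i < leaf.keys.size(); i++) {
                int key = leaf.keys.get(i);
                if (key > hi) return result;           // keys are sorted: done
                if (key >= lo) result.add(leaf.tids.get(i));
            }
        }
        return result;
    }
}
```

Only the leaf pages on the path and within the range are read. This is why the leaf chain makes range predicates cheap compared with a plain B-tree, where a range query has to move up and down between levels.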

Indexing Low Cardinality Columns

Problem:
- Example: a B-tree on the sex of customers for a table with 1,000,000 tuples results in two lists with approximately 500,000 tuples each
- A query for all female customers requires 500,000 random page accesses (secondary index); a table scan would be much faster

Conclusion:
- B-trees (and also hashing) are only useful for predicates with low selectivity (output/input cardinality ratio)
- Rule of thumb: the break-even hit rate is approx. 5%; higher hit rates do not justify the effort of an index access

Figure: index entries F and M, each pointing to a long list of TIDs (TID, TID, TID, …)

Bitmap Index

Idea (long history since the 1960s):
- Create a bitmap (bit list) for each attribute value
- Each tuple in the table is assigned to one bit in the bitmap (by position, i.e. by sequential TID)
- Bit values: 1 = the tuple has this attribute value, 0 = it does not

Necessary condition: sequential numbering of the tuples (TIDs)

Example: bitmap index on Sex (the columns F and M, read from top to bottom, are the two bitmaps)

Name    Sex  Region  Race   F  M
Carol   f    n       white  1  0
Harold  m    e       black  0  1
Anne    f    e       asian  1  0
Iris    f    ne      white  1  0
…       m    se      hisp   0  1
…       f    e       white  1  0
…       f    sw      asian  1  0
…       f    w       black  1  0
…       f    n       asian  1  0
…       m    e       hisp   0  1
…       m    se      black  0  1
…       f    s       white  1  0
…       m    nw      black  0  1
…       f    s       white  1  0
…       f    w       black  1  0
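Building a bitmap index requires only one pass over the table in TID order. The Java sketch below uses `java.util.BitSet` and assumes that TIDs are the 0-based positions of the tuples; the class and method names are hypothetical. For the Sex column of the example table, it reproduces the F and M bitmaps listed above.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of building a bitmap index: one bitmap per distinct attribute value;
// bit i belongs to the tuple with sequential TID i (0-based here).
final class BitmapIndexBuild {
    static Map<String, BitSet> build(String[] column) {
        Map<String, BitSet> index = new LinkedHashMap<>();
        for (int tid = 0; tid < column.length; tid++) {
            // Create the bitmap on first occurrence of a value, then set the tuple's bit.
            index.computeIfAbsent(column[tid], v -> new BitSet(column.length)).set(tid);
        }
        return index;
    }

    // Renders bits 0..n-1 as "1 0 1 ..." for comparison with the slide.
    static String render(BitSet bits, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i == 0 ? "" : " ").append(bits.get(i) ? '1' : '0');
        }
        return sb.toString();
    }
}
```

For a column with only two values, the two bitmaps are complements of each other. Each bitmap needs one bit per tuple, regardless of how many tuples have that value. This is why bitmap indexes suit low-cardinality columns such as Sex.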

Querying Bitmap Indexes

Main advantage of bitmap indexes:
- Simple and efficient logical combination of predicates
- Only the data relevant for the predicates is read

Example: σ Sex='f' ∧ Region='n' (R); the bitmaps B1 and B2 are combined by conjunction:
for (i = 0; i < B1.length; i++) B[i] = B1[i] & B2[i];

Example I/O cost estimation: σ Sex='f' ∧ Region='n' ∧ Race='Asian' (R) ("Asian women of region North")
- Selectivity: 1/2 · 1/8 · 1/4 = 1/64
- N = 10,000 tuples with a length of 400 bytes each (~10 tuples per page for 4 kB pages)
- Table scan: 1,000 pages
- Bitmap access: 10,000/64 ≈ 156 pages (worst case: each tuple on a different page), plus 1 page for the bitmaps

Figure (bitmaps combined with AND):
F              1 0 1 1 0 1 1 1 1 0 0 1 0 1 1
N              0 1 1 0 0 1 0 0 0 1 0 0 0 0 0
A              0 0 1 0 0 0 1 0 1 0 0 0 0 0 0
F AND N AND A  0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
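The conjunction and the cost estimate can be sketched with `java.util.BitSet`. The helper names are hypothetical, and TIDs are the 0-based bit positions. The cost model is the slide's worst case, with one page access per qualifying tuple plus the bitmap pages.

```java
import java.util.BitSet;

// Sketch: conjunctive predicate evaluation on bitmap indexes and the
// worst-case I/O estimate from the example.
final class BitmapQuery {
    // Conjunction of bitmaps, e.g. F AND N AND A: bit i is set iff it is set in all inputs.
    static BitSet and(BitSet first, BitSet... others) {
        BitSet result = (BitSet) first.clone();   // leave the index bitmaps unchanged
        for (BitSet b : others) result.and(b);
        return result;
    }

    // Parses "1 0 1 ..." into a BitSet (bit i = i-th number).
    static BitSet parse(String bits) {
        String[] parts = bits.trim().split("\\s+");
        BitSet b = new BitSet(parts.length);
        for (int i = 0; i < parts.length; i++) {
            if (parts[i].equals("1")) b.set(i);
        }
        return b;
    }

    // Table scan: every page of the table is read once.
    static long tableScanPages(long tuples, int tupleSize, int pageSize) {
        long perPage = pageSize / tupleSize;      // whole tuples per page
        return (tuples + perPage - 1) / perPage;
    }

    // Worst case for the bitmap access: each qualifying tuple on a different page.
    static double bitmapAccessPages(long tuples, double selectivity, long bitmapPages) {
        return tuples * selectivity + bitmapPages;
    }
}
```

The example's numbers are N = 10,000, 400-byte tuples, 4 kB pages and selectivity 1/64. With these inputs, the sketch gives 1,000 pages for the table scan. For the bitmap access it gives 156.25 + 1 pages, i.e. the slide's 156 pages plus 1 bitmap page. The AND of the three example bitmaps leaves only the third bit set.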

Page 23: Foundations of Information Systems 5 DBMS Architecturepages.iai.uni-bonn.de/behrend_andreas/lehre/FIM/WS16/11... · 2017-01-10 · Main-memory access has become a performance bottleneck

| 285

gt

copy A Behrend Foundations of Information Systems |

Overview Indexes (2)

Choice of Access Paths Index scan Only useful for low selectivity

(low number of result tuples) Break even-point according to the

output ratio of the number of tuples (usually max 5) Requires statistics about data Additional costs for index storage

and updating

Table Scan adequateefficient for small tables

(eg 5 pages) Queries with high selectivity

(large result sets) 100-200MBs sequential read

~ 100 disk seekss

hit rate

Index Scan

Table Scan

access time

| 286

gt

copy A Behrend Foundations of Information Systems |

Classification of Index Structures

Classification

Multiway Trees Tree structure with multiple children per node Idea chose fan out so that node size suits page size

Onedimensional Index Structures

Key Comparison Key Transformation

Sequential Tree-Based Hash-Based

Prefix Trees (Tries)

Binary Search Trees Dynamic Static Linked Lists

(log seq) Seq Lists

(phys seq) Multiway

Trees Example B-Tree

| 287

gt

copy A Behrend Foundations of Information Systems |

B-Tree

free space

(Ki Di Pi) = entry min |P| = k+1 max |P| = 2k+1

keys lt K1 keys gt Kp Ki lt keys lt Ki+1

| 288

gt

copy A Behrend Foundations of Information Systems |

B-Tree (2)

Example Keys Agnostic to specific key semantic Only defined complete order required Could be of fixed or variable length

Operations Search for data for given key value Insertion and deletion of key-data pair

Payload Agnostic to specific data semantic Can be record or reference (TID) or

mix

B-Tree with k = 2 h = 3

| 289

gt

copy A Behrend Foundations of Information Systems |

Search in the B-Tree

Starting at the root node each node is searched from left to right 1) if Ki matches the desired key value the data record has been found (further

records with the same key value might be located in a sub-tree to which Pi-1 points) 2) if Ki is smaller than the desired value the search will be continued in the root of

the sub-tree identified by Pi-1 3) if Ki is larger than the desired value the comparison with Ki+1 is repeated 4) if K2k is also smaller than the desired value the search will be continued in the

sub-tree of P2k If itlsquos impossible to descend further into a sub-tree (2 or 4) (leaf node) The search is aborted no record with the desired key value is found

Search for 38 20 6

| 290

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (1)

Insertion Rule insert only into leaf nodes At Non-Leaf Nodes descend down the tree as for the search S le Ki follow Pi-1 S gt Ki check Ki+1 S gt K2k follow P2k

At Leaf Node Insert the data record according to the sorting order Special case leaf node is full (2k records) rarr split the leaf node

Splitting Generate a new leaf node Split the 2k+1 entries (in order)

into two leaf nodes first k entries rarr left node last k entries rarr right node

middle entry (k+1-th) is used as new ldquodiscriminatorrdquo (branching) and inserted into the parent node

| 291

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (2)

Node splitting during insertion: two possible situations after a split
- The parent node is full → repeat the split on this level
- Enough space → finished
Special case: root split
- Splitting the root node → new root with two child nodes
- The height of the tree grows by 1: the tree grows from the bottom to the top
Dynamic reorganization (self-balancing)
- No unloading or reloading necessary; the tree is always balanced
- But: after many insertions/deletions, a reorganization can still be beneficial
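Leaf insertion, split propagation and the root split fit together in one recursive routine. A minimal in-memory Java sketch, assuming unique `int` keys without payload and hypothetical class names; a real system would store each node in a page.

```java
import java.util.ArrayList;
import java.util.List;

class BTree {
    // Hypothetical in-memory node; a real system stores one node per page.
    static final class Node {
        final List<Integer> keys = new ArrayList<>();
        final List<Node> children = new ArrayList<>(); // empty for leaves
        boolean isLeaf() { return children.isEmpty(); }
    }

    // Discriminator and new right sibling that a split hands up to the parent.
    private static final class Split {
        final int discriminator;
        final Node right;
        Split(int discriminator, Node right) { this.discriminator = discriminator; this.right = right; }
    }

    private final int k;             // order: nodes hold k..2k keys (root: 1..2k)
    private Node root = new Node();
    private int height = 1;

    BTree(int k) { this.k = k; }

    int height() { return height; }
    Node root() { return root; }

    void insert(int key) {
        Split s = insert(root, key);
        if (s != null) {             // root split: new root with two child nodes
            Node newRoot = new Node();
            newRoot.keys.add(s.discriminator);
            newRoot.children.add(root);
            newRoot.children.add(s.right);
            root = newRoot;
            height++;                // the tree grows at the top
        }
    }

    // Inserts into the sub-tree of node; returns a Split if node overflowed, else null.
    private Split insert(Node node, int key) {
        int i = 0;
        while (i < node.keys.size() && node.keys.get(i) < key) i++; // stop at the first Ki >= S
        if (node.isLeaf()) {
            node.keys.add(i, key);   // insert according to the sort order
        } else {
            Split s = insert(node.children.get(i), key);
            if (s == null) return null; // enough space below: finished
            node.keys.add(i, s.discriminator);
            node.children.add(i + 1, s.right);
        }
        return node.keys.size() > 2 * k ? split(node) : null;
    }

    // Splits a node with 2k+1 keys: first k stay, the (k+1)-th moves up, last k go right.
    private Split split(Node node) {
        Node right = new Node();
        int discriminator = node.keys.get(k);
        right.keys.addAll(node.keys.subList(k + 1, node.keys.size()));
        node.keys.subList(k, node.keys.size()).clear();
        if (!node.isLeaf()) {        // inner node: the last k+1 pointers move as well
            right.children.addAll(node.children.subList(k + 1, node.children.size()));
            node.children.subList(k + 1, node.children.size()).clear();
        }
        return new Split(discriminator, right);
    }
}
```

Because a split only ever adds a level above the old root, every path from the root to a leaf keeps the same length.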


Insertions in the B-Tree (3)

Insertion example: order k = 1, n = 2k = 2; keys inserted in the order 1, 5, 2, 6, 7, 4, 8, 3

Finally: h = 3
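A hand-worked trace of this example, applying the insertion and splitting rules above (square brackets show node contents; a node that receives a third key overflows):
- 1, 5: [1 5]
- 2: [1 2 5] overflows → split into [1] and [5]; 2 becomes the new root [2] → h = 2
- 6: [5 6]
- 7: [5 6 7] overflows → split into [5] and [7]; 6 moves up, root [2 6]
- 4: [4 5]
- 8: [7 8]
- 3: [3 4 5] overflows → split into [3] and [5]; 4 moves up, root [2 4 6] overflows → root split into [2] and [6] with the new root [4] → h = 3
Final tree: root [4]; inner nodes [2] and [6]; leaves [1], [3], [5], [7 8]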


Insertion and Deletion in the B-Tree

Problem
- Insertion can create an overflow
- Deletion can create an underflow and, as a consequence of merging, also an overflow
Example
- Insertion of key 22 → overflow → split
- Deletion of key 22 → underflow; all four nodes need to be accessed, and the result is again the input tree

[Figure: insert 22]


Deletion in the B-Tree

Example: order k = 1, n = 2k = 2; delete key 3

[Figure: underflow → merge]


Deletion in the B-Tree (2)

Example: order k = 1, n = 2k = 2; delete key 3

[Figure: underflow → merge]

Remember: each path from the root to a leaf has the same length h


Deletion in the B-Tree (3)

Example: order k = 1, n = 2k = 2; delete key 3

[Figure: underflow → merge, then overflow → split]


Deletion in the B-Tree (4)

Example: order k = 1, n = 2k = 2; delete key 3

[Figure: overflow → split, root split]


Deletion Algorithm

Example (there are different algorithms)
- Search the node that contains the key K to be deleted
- If K is in a leaf node: delete K from the leaf node and handle a potentially resulting underflow by merging with a sibling
- If K is in an inner node: pull up a new discriminator from one of the child nodes
  - Determine which child node of K, the left or the right one, has more elements; if both have the same number of elements, choose either
  - Replace K with its direct predecessor K' from the left sub-tree or with its direct successor K'' from the right sub-tree, respectively
  - Delete K' or K'' from the respective sub-tree (recursively)
Note: major variants
- Merge (this lecture)
- Re-distribution: instead of split/merge in case of overflow/underflow, the entries are redistributed among one or more adjacent nodes
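The deletion slides show the merge variant in action: an underflowing node is merged with a sibling, the parent's discriminator is pulled down, and if the merged node overflows it is split again. A minimal Java sketch of that step for two adjacent leaf nodes, assuming plain `int` keys and a caller that updates the parent; the names are hypothetical, and inner nodes would additionally have to move their child pointers.

```java
import java.util.ArrayList;
import java.util.List;

class UnderflowMerge {
    // Either one merged node (discriminator and right are null) or two nodes after a re-split.
    static final class Result {
        final List<Integer> left;
        final Integer discriminator;
        final List<Integer> right;

        Result(List<Integer> left, Integer discriminator, List<Integer> right) {
            this.left = left;
            this.discriminator = discriminator;
            this.right = right;
        }
    }

    // left and right are adjacent leaf nodes, separator is the parent key between them;
    // one of the two leaves has just dropped below k keys.
    static Result mergeLeaves(List<Integer> left, int separator, List<Integer> right, int k) {
        List<Integer> merged = new ArrayList<>(left);
        merged.add(separator);           // pull the discriminator down from the parent
        merged.addAll(right);
        if (merged.size() <= 2 * k) {
            // Underflow -> merge: the parent loses one key and one pointer (may underflow in turn).
            return new Result(merged, null, null);
        }
        // Overflow -> split: divide the entries again and push a new discriminator up.
        int mid = merged.size() / 2;
        return new Result(new ArrayList<>(merged.subList(0, mid)), merged.get(mid),
                          new ArrayList<>(merged.subList(mid + 1, merged.size())));
    }
}
```

When the merge needs no re-split, the parent shrinks by one key; if it underflows as well, the same step repeats one level up, which is how the merging can propagate to the root.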


B-Trees, B+-Trees and B*-Trees

B+-Trees and B*-Trees
- Data is only in leaf nodes
- Key redundancy, but higher fan-out → lower tree height, less I/O
- Simpler delete procedure → requires only merging of nodes
- Doubly linked list of all leaf nodes
B*-Trees
- Modified valid node sizes: from [k, 2k] to [4/3·k, 2k] → better node utilization, but more splits/merges

Example: secondary index (non-unique), B-tree with k = 2, h = 3
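The doubly linked leaf list lets a range query walk along the leaf level instead of climbing back through the inner nodes. A minimal Java sketch, assuming a hypothetical `Leaf` type with sorted `int[]` keys and `prev`/`next` links; the descent from the root to the first relevant leaf is omitted.

```java
import java.util.ArrayList;
import java.util.List;

class LeafChain {
    // Hypothetical leaf node: sorted keys plus links to both neighbours.
    static final class Leaf {
        final int[] keys;
        Leaf prev, next;
        Leaf(int... keys) { this.keys = keys; }
    }

    // Links the leaves left to right into a doubly linked list; returns the first leaf.
    static Leaf link(Leaf... leaves) {
        for (int i = 0; i + 1 < leaves.length; i++) {
            leaves[i].next = leaves[i + 1];
            leaves[i + 1].prev = leaves[i];
        }
        return leaves[0];
    }

    // Range query lo <= key <= hi, starting at the leaf where the descent for lo ended.
    static List<Integer> rangeScan(Leaf start, int lo, int hi) {
        List<Integer> result = new ArrayList<>();
        for (Leaf leaf = start; leaf != null; leaf = leaf.next) {
            for (int key : leaf.keys) {
                if (key > hi) return result; // keys are sorted across the chain: stop early
                if (key >= lo) result.add(key);
            }
        }
        return result;
    }
}
```

The `prev` links are not needed for this ascending scan; they allow the same walk in descending order.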


Indexing Low Cardinality Columns

Problem
- Example: a B-tree on the sex of the customers in a table with 1,000,000 tuples results in two lists with approximately 500,000 tuples each
- A query for all female customers requires 500,000 random page accesses (secondary index); a table scan would be much faster
Conclusion
- B-trees (and also hashing) are useful for predicates with low selectivity (output/input cardinality ratio)
- Rule of thumb: the break-even hit rate is approx. 5%; higher hit rates do not justify the effort of an index access

[Figure: B-tree on Sex with the two keys F and M, each pointing to a long list of TIDs]
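The page-count comparison behind this example can be sketched in Java. The cost model is a simplifying assumption, not taken from the slides: one random page access per hit for the secondary index, one read per table page for the scan, and a hypothetical `randomCostFactor` weighting a random against a sequential page access; 10 tuples per page is borrowed from the bitmap example.

```java
class IndexVsScan {
    // Worst case for a secondary index: every hit costs one random page access.
    static long indexPages(long hits) {
        return hits;
    }

    // A full table scan reads every page of the table once (ceiling division).
    static long scanPages(long tuples, long tuplesPerPage) {
        return (tuples + tuplesPerPage - 1) / tuplesPerPage;
    }

    // randomCostFactor (assumed >= 1) weights a random page access against a sequential one.
    static boolean useIndex(long hits, long tuples, long tuplesPerPage, double randomCostFactor) {
        return indexPages(hits) * randomCostFactor < scanPages(tuples, tuplesPerPage);
    }
}
```

With the numbers above, the index needs 500,000 page accesses against 100,000 pages for the scan, so the scan wins even before random accesses are weighted as more expensive. The 5% rule of thumb is an empirical threshold and is not derived from this formula.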


Bitmap Index

Idea (long history since the 1960s)
- Create a bitmap (bit list) for each attribute value
- Each tuple in the table is assigned to one bit in the bitmap (by position, i.e., sequential TID)
- Bit values: 1 = the tuple has this attribute value, 0 = the tuple does not have this attribute value

Necessary condition: sequential numbering of the tuples (TIDs)

Example: bitmap index on Sex (bitmaps F and M, one bit per tuple in TID order)

Name    Sex  Region  Race   F  M
Carol   f    n       white  1  0
Harold  m    e       black  0  1
Anne    f    e       asian  1  0
Iris    f    ne      white  1  0
…       m    se      hisp   0  1
…       f    e       white  1  0
…       f    sw      asian  1  0
…       f    w       black  1  0
…       f    n       asian  1  0
…       m    e       hisp   0  1
…       m    se      black  0  1
…       f    s       white  1  0
…       m    nw      black  0  1
…       f    s       white  1  0
…       f    w       black  1  0
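Building such bitmaps takes one pass over the column. A minimal Java sketch using `java.util.BitSet`, assuming the column is given as a `String[]` whose array index is the sequential TID; the class and method names are hypothetical.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

class BitmapIndex {
    // One bitmap per distinct attribute value; bit i belongs to the tuple with sequential TID i.
    static Map<String, BitSet> build(String[] column) {
        Map<String, BitSet> bitmaps = new TreeMap<>();
        for (int tid = 0; tid < column.length; tid++) {
            bitmaps.computeIfAbsent(column[tid], v -> new BitSet(column.length)).set(tid);
        }
        return bitmaps;
    }

    // Renders the first n bits as "1 0 1 ...", the notation used on the slide.
    static String render(BitSet bits, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) sb.append(' ');
            sb.append(bits.get(i) ? '1' : '0');
        }
        return sb.toString();
    }
}
```

Applied to the Sex column of the example table, this reproduces the bitmaps F and M.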


Querying Bitmap Indexes

Main advantage of bitmap indexes
- Simple and efficient logical combination (e.g., AND) of predicates
- Only the data relevant for the predicates is read

Example: σSex='f' ∧ Region='n'(R): bitmaps B1 and B2 in conjunction: for (i = 0; i < B1.length; i++) B[i] = B1[i] & B2[i];

Example: I/O cost estimation for σSex='f' ∧ Region='n' ∧ Race='Asian'(R) ("Asian women of region North")
- Selectivity: 1/2 · 1/8 · 1/4 = 1/64 (assuming uniformly distributed values: 2 sexes, 8 regions, 4 races)
- N = 10,000 tuples with a length of 400 bytes each (~10 tuples per page for 4 kB pages)
- Table scan: 1,000 pages
- Bitmap access: 10,000/64 ≈ 156 pages (worst case: each qualifying tuple on a different page), plus 1 page for the bitmaps (3 bitmaps × 10,000 bits = 3,750 bytes)

Bitmaps for the three predicates and their conjunction:

F (Sex='f')         1 0 1 1 0 1 1 1 1 0 0 1 0 1 1
N (Region='n')      0 1 1 0 0 1 0 0 0 1 0 0 0 0 0
A (Race='Asian')    0 0 1 0 0 0 1 0 1 0 0 0 0 0 0
F AND N AND A       0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
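The conjunction can be sketched with `java.util.BitSet`, whose `and` method combines two bitmaps word by word, the same operation as the loop `B1[i] & B2[i]`. A minimal Java sketch; the class name and the `parse` helper for the slide's "1 0 1 …" notation are hypothetical.

```java
import java.util.BitSet;

class BitmapQuery {
    // Conjunction of predicate bitmaps: bit i stays set only if tuple i satisfies all predicates.
    static BitSet and(BitSet first, BitSet... others) {
        BitSet result = (BitSet) first.clone(); // leave the index bitmaps unchanged
        for (BitSet b : others) result.and(b);  // word-wise AND, like B[i] = B1[i] & B2[i]
        return result;
    }

    // Parses the slide notation "1 0 1 ..." into a BitSet (bit i = tuple with TID i).
    static BitSet parse(String bits) {
        String[] parts = bits.trim().split("\\s+");
        BitSet result = new BitSet(parts.length);
        for (int i = 0; i < parts.length; i++) {
            if (parts[i].equals("1")) result.set(i);
        }
        return result;
    }
}
```

With the bitmaps F, N and A above, the result has a single set bit at the third position, so only that tuple has to be fetched from the table.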



| 287

gt

copy A Behrend Foundations of Information Systems |

B-Tree

free space

(Ki Di Pi) = entry min |P| = k+1 max |P| = 2k+1

keys lt K1 keys gt Kp Ki lt keys lt Ki+1

| 288

gt

copy A Behrend Foundations of Information Systems |

B-Tree (2)

Example Keys Agnostic to specific key semantic Only defined complete order required Could be of fixed or variable length

Operations Search for data for given key value Insertion and deletion of key-data pair

Payload Agnostic to specific data semantic Can be record or reference (TID) or

mix

B-Tree with k = 2 h = 3

| 289

gt

copy A Behrend Foundations of Information Systems |

Search in the B-Tree

Starting at the root node each node is searched from left to right 1) if Ki matches the desired key value the data record has been found (further

records with the same key value might be located in a sub-tree to which Pi-1 points) 2) if Ki is smaller than the desired value the search will be continued in the root of

the sub-tree identified by Pi-1 3) if Ki is larger than the desired value the comparison with Ki+1 is repeated 4) if K2k is also smaller than the desired value the search will be continued in the

sub-tree of P2k If itlsquos impossible to descend further into a sub-tree (2 or 4) (leaf node) The search is aborted no record with the desired key value is found

Search for 38 20 6

| 290

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (1)

Insertion Rule insert only into leaf nodes At Non-Leaf Nodes descend down the tree as for the search S le Ki follow Pi-1 S gt Ki check Ki+1 S gt K2k follow P2k

At Leaf Node Insert the data record according to the sorting order Special case leaf node is full (2k records) rarr split the leaf node

Splitting Generate a new leaf node Split the 2k+1 entries (in order)

into two leaf nodes first k entries rarr left node last k entries rarr right node

middle entry (k+1-th) is used as new ldquodiscriminatorrdquo (branching) and inserted into the parent node

| 291

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (2)

Node Splitting during Insertion Two possible situations after a split The parent node is full rarr repeat split on this level Enough space rarr FINISHED

Special case root split Split of the root node rarr New root with two successor nodes Height of a tree grows by 1 The tree has been split from the bottom to the top

Dynamic reorganization (self-balancing) No unloading or loading necessary Tree is always balanced But In case of many insertions deletions reorganization can be beneficial

| 292

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (3)

Insertion Example Order k = 1 n=2k Keys 1 5 2 6 7 4 8 3

Finally h=3

| 293

gt

copy A Behrend Foundations of Information Systems |

Insertion and Deletion in the B-Tree

Problem Insertion can create overflow Deletion can create underflow and overflow

Example Insertion of key 22

rarr Overflow rarr Split

Deletion of key 22 rarr Underflow need to access all four nodes finally same as input

Insert 22

| 294

gt

copy A Behrend Foundations of Information Systems |

Underflow Merge

Deletion in the B-Tree

Example Order k = 1 n=2k Delete key 3

| 295

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (2)

Example Order k = 1 n=2k Delete key 3

Underflow Merge

Remember Each path from the root to the leaf has the same length h

| 296

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (3)

Example Order k = 1 n=2k Delete key 3

Underflow Merge

Overflow Split

| 297

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (4)

Example Order k = 1 n=2k Delete key 3

Overflow Split

Root Split

| 298

gt

copy A Behrend Foundations of Information Systems |

Deletion Algorithm

Example ndash there are different algorithms Search the node that contains the key K to be deleted If key K is in a leaf node delete the key in the leaf node and handle potentially

resulting underflow by merging with sibling If key K is in an inner node pull up new discriminator from one of the successors Analyze which successor node of K has more elements left or right one

If both have the same number of elements decide for one Replace the key K to be deleted with the direct successor Krsquo from the left

successor node or with the direct successor Krsquorsquo from the right successor node respectively Delete Krsquo or Krsquorsquo from the respective successor node (recursively)

Note Major variants Merge (tis lecture) Re-distribution (instead of splitmerge in case of overflowunderflow the entries are

re-distributed under consideration of one or multiple adjacent nodes)

| 299

gt

copy A Behrend Foundations of Information Systems |

B-Trees B+-Trees and B-Trees

B+-Trees and B-Trees Data is only in leaf nodes Key redundancy but higher fan-out rarr lower tree high less IO Simpler delete procedure rarr requires only merging of nodes

Double linked list of all leaf nodes B-Trees Modified valid node sizes

from [k2k] to [43k2k] rarr better node utilization but more splitsmerges

Example Secondary index Non unique

B-Tree with k = 2 h = 3

| 300

gt

copy A Behrend Foundations of Information Systems |

Indexing Low Cardinality Columns

Problem Example B-tree on the sex of customers for a table with 1000000 tuples results in

two lists with approximately 500000 tuples each

Query for all female customers requires 500000 random page accesses (secondary index) Table scan would be much faster

Conclusion B-trees (and also hashing) are useful for predicates with low selectivity

(outputinput cardinality ratio) Rule of thumb margin hit rate is approx 5 higher hit rates do not justify the efforts for an index access

F M

TID TID TID TID hellip TID TID TID TID hellip

| 301

gt

copy A Behrend Foundations of Information Systems |

Bitmap Index

Idea (Long history since the 1960s) Create a bitmapbitlist for each

attribute value Each tuple in the table is assigned to

one bit in the bitmap (by position sequential TID)

Bit values 1 attribute value set 0 attribute value not set

Necessary condition Sequential numbering of the tuples

(TIDs)

F M

1 0 1 1 0 1 1 1 1 0 0 1 0 1 1

0 1 0 0 1 0 0 0 0 1 1 0 1 0 0

Name Sex Region Race Carol f n white

Harold m e black Anne f e asian

Iris f ne white hellip m se hisp hellip f e white hellip f sw asian hellip f w black hellip f n asian hellip m e hisp hellip m se black hellip f s white hellip m nw black hellip f s white hellip f w black

Sex

| 302

gt

copy A Behrend Foundations of Information Systems |

Querying Bitmap Indexes

Main advantage of bitmap indexes Simple and efficient logical join possible Read only data that is relevant for predicates Example σSex=lsquoflsquo ᴧ Region=lsquonlsquo R Bitmaps B1 and B2 in conjunction for (i=0 iltB1length i++) B = B1[i] amp B2[i]

Example IO Costs Estimation σSex=lsquoflsquo ᴧ Region=lsquonlsquo ᴧ Race=lsquoAsianlsquo R

(ldquoAsian women of region Northrdquo) Selectivity 12 18 14 = 164 N=10000 tuples with length of 400 bytes each

(~ 10 tuples per page for 4kB pages) Table scan 1000 pages Bitmap access 1000064 156 pages (worst case each tuple in a different page)

plus 1 page for bitmaps

F 1 0 1 1 0 1 1 1 1 0 0 1 0 1 1

N 0 1 1 0 0 1 0 0 0 1 0 0 0 0 0

A 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0

0 0 1 0 0 0 0 0 0 0 0 0 0 0 0

AND = AND

  • DBMS Architecture
  • What is in the Lecture
  • How is Database System build
  • Architectural Blue Print
  • Architectural Trends
  • Different Access Characteristics
  • Hardware Developments
  • Row Storage vs Column Storage
  • Processing Models
  • Transaction Management
  • ACID Properties of Transactions
  • Motivation
  • Reasons for crashes
  • Guarantee Atomicity amp Durability
  • Record Management
  • Record
  • Record Addressing
  • Surrogate with Mapping Table
  • TID Concept
  • Free Space Management
  • Physical Access Paths ndash Index Structures
  • Overview Indexes
  • Overview Indexes (2)
  • Classification of Index Structures
  • B-Tree
  • B-Tree (2)
  • Search in the B-Tree
  • Insertions in the B-Tree (1)
  • Insertions in the B-Tree (2)
  • Insertions in the B-Tree (3)
  • Insertion and Deletion in the B-Tree
  • Deletion in the B-Tree
  • Deletion in the B-Tree (2)
  • Deletion in the B-Tree (3)
  • Deletion in the B-Tree (4)
  • Deletion Algorithm
  • B-Trees B+-Trees and B-Trees
  • Indexing Low Cardinality Columns
  • Bitmap Index
  • Querying Bitmap Indexes
Page 25: Foundations of Information Systems 5 DBMS Architecturepages.iai.uni-bonn.de/behrend_andreas/lehre/FIM/WS16/11... · 2017-01-10 · Main-memory access has become a performance bottleneck

| 287

gt

copy A Behrend Foundations of Information Systems |

B-Tree

free space

(Ki Di Pi) = entry min |P| = k+1 max |P| = 2k+1

keys lt K1 keys gt Kp Ki lt keys lt Ki+1

| 288

gt

copy A Behrend Foundations of Information Systems |

B-Tree (2)

Example Keys Agnostic to specific key semantic Only defined complete order required Could be of fixed or variable length

Operations Search for data for given key value Insertion and deletion of key-data pair

Payload Agnostic to specific data semantic Can be record or reference (TID) or

mix

B-Tree with k = 2 h = 3

| 289

gt

copy A Behrend Foundations of Information Systems |

Search in the B-Tree

Starting at the root node each node is searched from left to right 1) if Ki matches the desired key value the data record has been found (further

records with the same key value might be located in a sub-tree to which Pi-1 points) 2) if Ki is smaller than the desired value the search will be continued in the root of

the sub-tree identified by Pi-1 3) if Ki is larger than the desired value the comparison with Ki+1 is repeated 4) if K2k is also smaller than the desired value the search will be continued in the

sub-tree of P2k If itlsquos impossible to descend further into a sub-tree (2 or 4) (leaf node) The search is aborted no record with the desired key value is found

Search for 38 20 6

| 290

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (1)

Insertion Rule insert only into leaf nodes At Non-Leaf Nodes descend down the tree as for the search S le Ki follow Pi-1 S gt Ki check Ki+1 S gt K2k follow P2k

At Leaf Node Insert the data record according to the sorting order Special case leaf node is full (2k records) rarr split the leaf node

Splitting Generate a new leaf node Split the 2k+1 entries (in order)

into two leaf nodes first k entries rarr left node last k entries rarr right node

middle entry (k+1-th) is used as new ldquodiscriminatorrdquo (branching) and inserted into the parent node

| 291

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (2)

Node Splitting during Insertion Two possible situations after a split The parent node is full rarr repeat split on this level Enough space rarr FINISHED

Special case root split Split of the root node rarr New root with two successor nodes Height of a tree grows by 1 The tree has been split from the bottom to the top

Dynamic reorganization (self-balancing) No unloading or loading necessary Tree is always balanced But In case of many insertions deletions reorganization can be beneficial

| 292

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (3)

Insertion Example Order k = 1 n=2k Keys 1 5 2 6 7 4 8 3

Finally h=3

| 293

gt

copy A Behrend Foundations of Information Systems |

Insertion and Deletion in the B-Tree

Problem Insertion can create overflow Deletion can create underflow and overflow

Example Insertion of key 22

rarr Overflow rarr Split

Deletion of key 22 rarr Underflow need to access all four nodes finally same as input

Insert 22

| 294

gt

copy A Behrend Foundations of Information Systems |

Underflow Merge

Deletion in the B-Tree

Example Order k = 1 n=2k Delete key 3

| 295

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (2)

Example Order k = 1 n=2k Delete key 3

Underflow Merge

Remember Each path from the root to the leaf has the same length h

| 296

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (3)

Example Order k = 1 n=2k Delete key 3

Underflow Merge

Overflow Split

| 297

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (4)

Example Order k = 1 n=2k Delete key 3

Overflow Split

Root Split

| 298

gt

copy A Behrend Foundations of Information Systems |

Deletion Algorithm

Example ndash there are different algorithms Search the node that contains the key K to be deleted If key K is in a leaf node delete the key in the leaf node and handle potentially

resulting underflow by merging with sibling If key K is in an inner node pull up new discriminator from one of the successors Analyze which successor node of K has more elements left or right one

If both have the same number of elements decide for one Replace the key K to be deleted with the direct successor Krsquo from the left

successor node or with the direct successor Krsquorsquo from the right successor node respectively Delete Krsquo or Krsquorsquo from the respective successor node (recursively)

Note Major variants Merge (tis lecture) Re-distribution (instead of splitmerge in case of overflowunderflow the entries are

re-distributed under consideration of one or multiple adjacent nodes)

| 299

gt

copy A Behrend Foundations of Information Systems |

B-Trees B+-Trees and B-Trees

B+-Trees and B-Trees Data is only in leaf nodes Key redundancy but higher fan-out rarr lower tree high less IO Simpler delete procedure rarr requires only merging of nodes

Double linked list of all leaf nodes B-Trees Modified valid node sizes

from [k2k] to [43k2k] rarr better node utilization but more splitsmerges

Example Secondary index Non unique

B-Tree with k = 2 h = 3

| 300

gt

copy A Behrend Foundations of Information Systems |

Indexing Low Cardinality Columns

Problem Example B-tree on the sex of customers for a table with 1000000 tuples results in

two lists with approximately 500000 tuples each

Query for all female customers requires 500000 random page accesses (secondary index) Table scan would be much faster

Conclusion B-trees (and also hashing) are useful for predicates with low selectivity

(outputinput cardinality ratio) Rule of thumb margin hit rate is approx 5 higher hit rates do not justify the efforts for an index access

F M

TID TID TID TID hellip TID TID TID TID hellip

| 301

gt

copy A Behrend Foundations of Information Systems |

Bitmap Index

Idea (Long history since the 1960s) Create a bitmapbitlist for each

attribute value Each tuple in the table is assigned to

one bit in the bitmap (by position sequential TID)

Bit values 1 attribute value set 0 attribute value not set

Necessary condition Sequential numbering of the tuples

(TIDs)

F M

1 0 1 1 0 1 1 1 1 0 0 1 0 1 1

0 1 0 0 1 0 0 0 0 1 1 0 1 0 0

Name Sex Region Race Carol f n white

Harold m e black Anne f e asian

Iris f ne white hellip m se hisp hellip f e white hellip f sw asian hellip f w black hellip f n asian hellip m e hisp hellip m se black hellip f s white hellip m nw black hellip f s white hellip f w black

Sex

| 302

gt

copy A Behrend Foundations of Information Systems |

Querying Bitmap Indexes

Main advantage of bitmap indexes Simple and efficient logical join possible Read only data that is relevant for predicates Example σSex=lsquoflsquo ᴧ Region=lsquonlsquo R Bitmaps B1 and B2 in conjunction for (i=0 iltB1length i++) B = B1[i] amp B2[i]

Example IO Costs Estimation σSex=lsquoflsquo ᴧ Region=lsquonlsquo ᴧ Race=lsquoAsianlsquo R

(ldquoAsian women of region Northrdquo) Selectivity 12 18 14 = 164 N=10000 tuples with length of 400 bytes each

(~ 10 tuples per page for 4kB pages) Table scan 1000 pages Bitmap access 1000064 156 pages (worst case each tuple in a different page)

plus 1 page for bitmaps

F 1 0 1 1 0 1 1 1 1 0 0 1 0 1 1

N 0 1 1 0 0 1 0 0 0 1 0 0 0 0 0

A 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0

0 0 1 0 0 0 0 0 0 0 0 0 0 0 0

AND = AND

  • DBMS Architecture
  • What is in the Lecture
  • How is Database System build
  • Architectural Blue Print
  • Architectural Trends
  • Different Access Characteristics
  • Hardware Developments
  • Row Storage vs Column Storage
  • Processing Models
  • Transaction Management
  • ACID Properties of Transactions
  • Motivation
  • Reasons for crashes
  • Guarantee Atomicity amp Durability
  • Record Management
  • Record
  • Record Addressing
  • Surrogate with Mapping Table
  • TID Concept
  • Free Space Management
  • Physical Access Paths ndash Index Structures
  • Overview Indexes
  • Overview Indexes (2)
  • Classification of Index Structures
  • B-Tree
  • B-Tree (2)
  • Search in the B-Tree
  • Insertions in the B-Tree (1)
  • Insertions in the B-Tree (2)
  • Insertions in the B-Tree (3)
  • Insertion and Deletion in the B-Tree
  • Deletion in the B-Tree
  • Deletion in the B-Tree (2)
  • Deletion in the B-Tree (3)
  • Deletion in the B-Tree (4)
  • Deletion Algorithm
  • B-Trees B+-Trees and B-Trees
  • Indexing Low Cardinality Columns
  • Bitmap Index
  • Querying Bitmap Indexes
Page 26: Foundations of Information Systems 5 DBMS Architecturepages.iai.uni-bonn.de/behrend_andreas/lehre/FIM/WS16/11... · 2017-01-10 · Main-memory access has become a performance bottleneck

| 288

gt

copy A Behrend Foundations of Information Systems |

B-Tree (2)

Example Keys Agnostic to specific key semantic Only defined complete order required Could be of fixed or variable length

Operations Search for data for given key value Insertion and deletion of key-data pair

Payload Agnostic to specific data semantic Can be record or reference (TID) or

mix

B-Tree with k = 2 h = 3

| 289

gt

copy A Behrend Foundations of Information Systems |

Search in the B-Tree

Starting at the root node each node is searched from left to right 1) if Ki matches the desired key value the data record has been found (further

records with the same key value might be located in a sub-tree to which Pi-1 points) 2) if Ki is smaller than the desired value the search will be continued in the root of

the sub-tree identified by Pi-1 3) if Ki is larger than the desired value the comparison with Ki+1 is repeated 4) if K2k is also smaller than the desired value the search will be continued in the

sub-tree of P2k If itlsquos impossible to descend further into a sub-tree (2 or 4) (leaf node) The search is aborted no record with the desired key value is found

Search for 38 20 6

| 290

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (1)

Insertion Rule insert only into leaf nodes At Non-Leaf Nodes descend down the tree as for the search S le Ki follow Pi-1 S gt Ki check Ki+1 S gt K2k follow P2k

At Leaf Node Insert the data record according to the sorting order Special case leaf node is full (2k records) rarr split the leaf node

Splitting Generate a new leaf node Split the 2k+1 entries (in order)

into two leaf nodes first k entries rarr left node last k entries rarr right node

middle entry (k+1-th) is used as new ldquodiscriminatorrdquo (branching) and inserted into the parent node

| 291

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (2)

Node Splitting during Insertion Two possible situations after a split The parent node is full rarr repeat split on this level Enough space rarr FINISHED

Special case root split Split of the root node rarr New root with two successor nodes Height of a tree grows by 1 The tree has been split from the bottom to the top

Dynamic reorganization (self-balancing) No unloading or loading necessary Tree is always balanced But In case of many insertions deletions reorganization can be beneficial

| 292

gt

copy A Behrend Foundations of Information Systems |

Insertions in the B-Tree (3)

Insertion Example Order k = 1 n=2k Keys 1 5 2 6 7 4 8 3

Finally h=3

| 293

gt

copy A Behrend Foundations of Information Systems |

Insertion and Deletion in the B-Tree

Problem Insertion can create overflow Deletion can create underflow and overflow

Example Insertion of key 22

rarr Overflow rarr Split

Deletion of key 22 rarr Underflow need to access all four nodes finally same as input

Insert 22

| 294

gt

copy A Behrend Foundations of Information Systems |

Underflow Merge

Deletion in the B-Tree

Example Order k = 1 n=2k Delete key 3

| 295

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (2)

Example Order k = 1 n=2k Delete key 3

Underflow Merge

Remember Each path from the root to the leaf has the same length h

| 296

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (3)

Example Order k = 1 n=2k Delete key 3

Underflow Merge

Overflow Split

| 297

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (4)

Example Order k = 1 n=2k Delete key 3

Overflow Split

Root Split

| 298

gt

copy A Behrend Foundations of Information Systems |

Deletion Algorithm

Example ndash there are different algorithms Search the node that contains the key K to be deleted If key K is in a leaf node delete the key in the leaf node and handle potentially

resulting underflow by merging with sibling If key K is in an inner node pull up new discriminator from one of the successors Analyze which successor node of K has more elements left or right one

If both have the same number of elements decide for one Replace the key K to be deleted with the direct successor Krsquo from the left

successor node or with the direct successor Krsquorsquo from the right successor node respectively Delete Krsquo or Krsquorsquo from the respective successor node (recursively)

Note Major variants Merge (tis lecture) Re-distribution (instead of splitmerge in case of overflowunderflow the entries are

re-distributed under consideration of one or multiple adjacent nodes)

| 299

gt

copy A Behrend Foundations of Information Systems |

B-Trees B+-Trees and B-Trees

B+-Trees and B-Trees Data is only in leaf nodes Key redundancy but higher fan-out rarr lower tree high less IO Simpler delete procedure rarr requires only merging of nodes

Double linked list of all leaf nodes B-Trees Modified valid node sizes

from [k2k] to [43k2k] rarr better node utilization but more splitsmerges

Example Secondary index Non unique

B-Tree with k = 2 h = 3

| 300

gt

copy A Behrend Foundations of Information Systems |

Indexing Low Cardinality Columns

Problem Example B-tree on the sex of customers for a table with 1000000 tuples results in

two lists with approximately 500000 tuples each

Query for all female customers requires 500000 random page accesses (secondary index) Table scan would be much faster

Conclusion B-trees (and also hashing) are useful for predicates with low selectivity

(outputinput cardinality ratio) Rule of thumb margin hit rate is approx 5 higher hit rates do not justify the efforts for an index access

F M

TID TID TID TID hellip TID TID TID TID hellip

| 301

gt

copy A Behrend Foundations of Information Systems |

Bitmap Index

Idea (Long history since the 1960s) Create a bitmapbitlist for each

attribute value Each tuple in the table is assigned to

one bit in the bitmap (by position sequential TID)

Bit values 1 attribute value set 0 attribute value not set

Necessary condition Sequential numbering of the tuples

(TIDs)

F M

1 0 1 1 0 1 1 1 1 0 0 1 0 1 1

0 1 0 0 1 0 0 0 0 1 1 0 1 0 0

Name Sex Region Race Carol f n white

Harold m e black Anne f e asian

Iris f ne white hellip m se hisp hellip f e white hellip f sw asian hellip f w black hellip f n asian hellip m e hisp hellip m se black hellip f s white hellip m nw black hellip f s white hellip f w black

Sex

| 302

gt

copy A Behrend Foundations of Information Systems |

Querying Bitmap Indexes

Main advantage of bitmap indexes Simple and efficient logical join possible Read only data that is relevant for predicates Example σSex=lsquoflsquo ᴧ Region=lsquonlsquo R Bitmaps B1 and B2 in conjunction for (i=0 iltB1length i++) B = B1[i] amp B2[i]

Example IO Costs Estimation σSex=lsquoflsquo ᴧ Region=lsquonlsquo ᴧ Race=lsquoAsianlsquo R

(ldquoAsian women of region Northrdquo) Selectivity 12 18 14 = 164 N=10000 tuples with length of 400 bytes each

(~ 10 tuples per page for 4kB pages) Table scan 1000 pages Bitmap access 1000064 156 pages (worst case each tuple in a different page)

plus 1 page for bitmaps

F 1 0 1 1 0 1 1 1 1 0 0 1 0 1 1

N 0 1 1 0 0 1 0 0 0 1 0 0 0 0 0

A 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0

0 0 1 0 0 0 0 0 0 0 0 0 0 0 0

AND = AND

  • DBMS Architecture
  • What is in the Lecture
  • How is Database System build
  • Architectural Blue Print
  • Architectural Trends
  • Different Access Characteristics
  • Hardware Developments
  • Row Storage vs Column Storage
  • Processing Models
  • Transaction Management
  • ACID Properties of Transactions
  • Motivation
  • Reasons for crashes
  • Guarantee Atomicity & Durability
  • Record Management
  • Record
  • Record Addressing
  • Surrogate with Mapping Table
  • TID Concept
  • Free Space Management
  • Physical Access Paths - Index Structures
  • Overview Indexes
  • Overview Indexes (2)
  • Classification of Index Structures
  • B-Tree
  • B-Tree (2)
  • Search in the B-Tree
  • Insertions in the B-Tree (1)
  • Insertions in the B-Tree (2)
  • Insertions in the B-Tree (3)
  • Insertion and Deletion in the B-Tree
  • Deletion in the B-Tree
  • Deletion in the B-Tree (2)
  • Deletion in the B-Tree (3)
  • Deletion in the B-Tree (4)
  • Deletion Algorithm
  • B-Trees, B+-Trees and B*-Trees
  • Indexing Low Cardinality Columns
  • Bitmap Index
  • Querying Bitmap Indexes

Search in the B-Tree

Starting at the root node, each node is searched from left to right:

1) If K_i matches the desired key value, the data record has been found (further records with the same key value might be located in the sub-tree to which P_{i-1} points).
2) If K_i is larger than the desired value, the search continues in the root of the sub-tree identified by P_{i-1}.
3) If K_i is smaller than the desired value, the comparison is repeated with K_{i+1}.
4) If K_{2k} (the last key of the node) is also smaller than the desired value, the search continues in the sub-tree of P_{2k}.

If it is impossible to descend further into a sub-tree in case 2 or 4 (leaf node), the search is aborted: no record with the desired key value exists.

[Figure: example searches for the keys 38, 20 and 6]
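A minimal Python sketch of the search rules above, assuming a node stores its sorted keys in `keys` and its pointers in `children` (one more pointer than keys; an empty list marks a leaf). The example tree is hypothetical and is not the tree from the slide's figure.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list[int]                                        # sorted keys
    children: list["Node"] = field(default_factory=list)  # [] = leaf


def search(node: Node, s: int) -> tuple[bool, int]:
    """Search key s; return (found, number of nodes visited)."""
    visited = 0
    while True:
        visited += 1
        for i, key in enumerate(node.keys):
            if key == s:             # rule 1: key found
                return True, visited
            if key > s:              # rule 2: key larger -> sub-tree left of it
                next_child = i
                break
            # rule 3: key smaller -> compare with the next key
        else:
            next_child = len(node.keys)  # rule 4: all keys smaller -> last pointer
        if not node.children:        # leaf: cannot descend -> key does not exist
            return False, visited
        node = node.children[next_child]


# Hypothetical example tree of order k = 1.
tree = Node([4], [Node([2], [Node([1]), Node([3])]),
                  Node([6], [Node([5]), Node([7, 8])])])
print(search(tree, 8), search(tree, 0))
```

The number of visited nodes is bounded by the height h, which is why a B-tree lookup costs at most h page accesses.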


Insertions in the B-Tree (1)

Insertion rule: insert only into leaf nodes.

At non-leaf nodes: descend down the tree as for the search, with S = key to be inserted (S ≤ K_i: follow P_{i-1}; S > K_i: check K_{i+1}; S > K_{2k}: follow P_{2k}).

At the leaf node: insert the data record according to the sort order. Special case: the leaf node is already full (2k records) → split the leaf node.

Splitting: generate a new leaf node and split the 2k+1 entries (in order) into two leaf nodes: the first k entries → left node, the last k entries → right node. The middle entry (the (k+1)-th) is used as the new "discriminator" (branching key) and inserted into the parent node.
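The leaf-level part of this rule as a Python sketch. A leaf is modelled as a sorted list of keys; `insert_into_leaf` is a hypothetical helper, and inserting the discriminator into the parent (and replacing the leaf by its two halves) is left to the caller.

```python
import bisect


def insert_into_leaf(leaf: list[int], key: int, k: int):
    """Insert key into the sorted leaf (modified in place).

    Returns None if the leaf still holds at most 2k entries. Otherwise the
    2k+1 entries are split and (left, discriminator, right) is returned:
    the first k entries, the (k+1)-th entry for the parent, the last k entries.
    """
    bisect.insort(leaf, key)          # keep the sort order
    if len(leaf) <= 2 * k:
        return None
    return leaf[:k], leaf[k], leaf[k + 1:]


full_leaf = [10, 20, 30, 40]          # k = 2: leaf is full with 2k = 4 entries
print(insert_into_leaf(full_leaf, 25, k=2))
```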


Insertions in the B-Tree (2)

Node splitting during insertion. Two possible situations after a split: the parent node is full → repeat the split on this level; enough space → finished.

Special case, root split: splitting the root node → new root with two successor nodes; the height of the tree grows by 1. Splits propagate from the bottom to the top, so the tree grows at the root.

Dynamic reorganization (self-balancing): no unloading or reloading necessary; the tree is always balanced. But: after many insertions/deletions, a reorganization can still be beneficial.


Insertions in the B-Tree (3)

Insertion example: order k = 1 (n = 2k = 2 keys per node); keys inserted in the order 1, 5, 2, 6, 7, 4, 8, 3.

Finally: h = 3.
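A compact Python sketch of insertion with bottom-up splits (including the root split), assuming sorted `keys` plus `children` pointers per node, with an empty `children` list marking a leaf. Replaying the example keys with k = 1 yields a tree of height h = 3, as on the slide; all names are hypothetical.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list[int] = field(default_factory=list)          # sorted keys
    children: list["Node"] = field(default_factory=list)   # [] = leaf


def _insert(node: Node, key: int, k: int):
    """Insert key below node; return (discriminator, new right node) if node split."""
    if node.children:
        # Descend as for the search: first position i with key <= keys[i].
        i = bisect.bisect_left(node.keys, key)
        split = _insert(node.children[i], key, k)
        if split is None:
            return None
        sep, right = split                       # child was split:
        node.keys.insert(i, sep)                 # take over its discriminator
        node.children.insert(i + 1, right)       # and the new right sibling
    else:
        bisect.insort(node.keys, key)            # insert only into leaf nodes
    if len(node.keys) <= 2 * k:
        return None                              # enough space -> finished
    # Overflow (2k+1 keys): first k stay, (k+1)-th moves up, last k move right.
    sep = node.keys[k]
    right = Node(node.keys[k + 1:], node.children[k + 1:])
    node.keys, node.children = node.keys[:k], node.children[:k + 1]
    return sep, right


def insert(root: Node, key: int, k: int) -> Node:
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key, k)
    if split is None:
        return root
    sep, right = split
    return Node([sep], [root, right])            # root split: height grows by 1


def height(node: Node) -> int:
    return 1 if not node.children else 1 + height(node.children[0])


root = Node()
for key in [1, 5, 2, 6, 7, 4, 8, 3]:
    root = insert(root, key, k=1)
print(height(root), root.keys, [child.keys for child in root.children])
```

The last key, 3, triggers two splits in a row: the leaf [3, 4, 5] splits and pushes 4 up, which makes the root [2, 4, 6] overflow and split as well.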


Insertion and Deletion in the B-Tree

Problem: insertion can create an overflow; deletion can create an underflow and, as a consequence of merging, an overflow.

Example: insertion of key 22 → overflow → split.

Deletion of key 22 → underflow; all four nodes need to be accessed, and the final result is the same as the input tree.

[Figure: tree before and after inserting key 22]


Deletion in the B-Tree

Example: order k = 1 (n = 2k); delete key 3.

[Figure: underflow → merge]


Deletion in the B-Tree (2)

Example: order k = 1 (n = 2k); delete key 3.

[Figure: underflow → merge]

Remember: each path from the root to a leaf has the same length h.


Deletion in the B-Tree (3)

Example: order k = 1 (n = 2k); delete key 3.

[Figure: underflow → merge; overflow → split]


Deletion in the B-Tree (4)

Example: order k = 1 (n = 2k); delete key 3.

[Figure: overflow → split; root split]
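The underflow → merge → overflow → split sequence of these examples, reduced to two adjacent sibling leaves and their discriminator in the parent. This is a Python sketch: inner nodes, child pointers and the recursive handling of the parent are omitted, and the function name is hypothetical. Splitting at the middle of the merged node generalizes the "first k | middle | last k" rule to merged nodes with more than 2k+1 entries.

```python
def merge_after_underflow(left: list[int], sep: int, right: list[int], k: int):
    """Merge two sibling leaves with their discriminator `sep`.

    Returns ([merged], None) if the merged node fits (at most 2k keys); the
    parent then loses `sep` and may underflow itself. Otherwise the merged node
    overflows and is split again around its middle entry:
    ([left', right'], new_sep), where new_sep goes back into the parent.
    """
    merged = left + [sep] + right
    if len(merged) <= 2 * k:
        return [merged], None
    mid = len(merged) // 2
    return [merged[:mid], merged[mid + 1:]], merged[mid]


# k = 1: the right leaf became empty (underflow) after a deletion.
print(merge_after_underflow([1], 2, [], k=1))      # -> ([[1, 2]], None)
print(merge_after_underflow([1, 2], 4, [], k=1))   # -> ([[1], [4]], 2)
```

Merging followed by an immediate split has the same effect as re-distributing the entries between the two siblings, which is the alternative variant mentioned for the deletion algorithm.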


Deletion Algorithm

Example (there are different algorithms):

Search the node that contains the key K to be deleted.

If K is in a leaf node: delete K from the leaf node and handle a potentially resulting underflow by merging with a sibling.

If K is in an inner node: pull up a new discriminator from one of the successors. Analyze which successor node of K (left or right) has more elements; if both have the same number of elements, decide for either one. Replace K with its direct predecessor K' from the left sub-tree or with its direct successor K'' from the right sub-tree, respectively. Delete K' or K'' from the respective sub-tree (recursively).

Note: major variants are merge (this lecture) and re-distribution (instead of a split/merge in case of overflow/underflow, the entries are re-distributed over one or multiple adjacent nodes).
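A Python sketch of the inner-node case, assuming the node layout used for the search (sorted `keys`, `children` pointers, `[]` for a leaf). It only selects the replacement key; deleting K' or K'' from its leaf and handling the resulting underflow is not shown. The names, the tie-breaking choice (left) and the example trees are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list[int]
    children: list["Node"] = field(default_factory=list)  # [] = leaf


def max_key(node: Node) -> int:
    """Largest key of a sub-tree: follow the rightmost pointers down to a leaf."""
    while node.children:
        node = node.children[-1]
    return node.keys[-1]


def min_key(node: Node) -> int:
    """Smallest key of a sub-tree: follow the leftmost pointers down to a leaf."""
    while node.children:
        node = node.children[0]
    return node.keys[0]


def replacement_for(node: Node, i: int) -> tuple[int, str]:
    """Pick the key that replaces the inner-node key node.keys[i].

    The successor node with more elements provides the replacement: the direct
    predecessor K' (left) or the direct successor K'' (right). On a tie this
    sketch decides for the left side.
    """
    left, right = node.children[i], node.children[i + 1]
    if len(left.keys) >= len(right.keys):
        return max_key(left), "left"
    return min_key(right), "right"


tree = Node([4], [Node([2], [Node([1]), Node([3])]),
                  Node([6], [Node([5]), Node([7, 8])])])
print(replacement_for(tree, 0))
```

Both K' and K'' always sit in a leaf, so the recursive deletion ends in the leaf case.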


B-Trees, B+-Trees and B*-Trees

B+-Trees and B*-Trees: data is only stored in leaf nodes. Key redundancy (inner nodes hold copies of keys as discriminators), but higher fan-out → lower tree height → less I/O. Simpler delete procedure → requires only merging of nodes. Doubly linked list of all leaf nodes.

B*-Trees: modified valid node sizes from [k, 2k] to [4/3·k, 2k] → better node utilization, but more splits/merges.

Example: secondary index, non-unique.

[Figure: tree with k = 2, h = 3]
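The doubly linked leaf level is what makes range queries and sorted scans cheap: once the first qualifying leaf has been found via the inner nodes, the scan just follows the `next` pointers. A Python sketch with a hypothetical non-unique secondary index (duplicate keys, made-up TIDs); the descent through the inner nodes is omitted and the scan simply starts at a given leaf.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Leaf:
    keys: list[int]                 # sorted, duplicates allowed (non-unique index)
    tids: list[int]                 # TID of the tuple for each key
    prev: Leaf | None = field(default=None, repr=False, compare=False)
    next: Leaf | None = field(default=None, repr=False, compare=False)


def link(leaves: list[Leaf]) -> list[Leaf]:
    """Chain the leaves into a doubly linked list (left to right)."""
    for a, b in zip(leaves, leaves[1:]):
        a.next, b.prev = b, a
    return leaves


def range_scan(start: Leaf, low: int, high: int) -> list[tuple[int, int]]:
    """Return (key, TID) pairs with low <= key <= high, following `next`."""
    result = []
    leaf = start
    while leaf is not None:
        for key, tid in zip(leaf.keys, leaf.tids):
            if key > high:          # keys are sorted: nothing more can qualify
                return result
            if key >= low:
                result.append((key, tid))
        leaf = leaf.next
    return result


def keys_descending(last: Leaf) -> list[int]:
    """Walk the `prev` pointers for a descending scan."""
    result = []
    leaf = last
    while leaf is not None:
        result.extend(reversed(leaf.keys))
        leaf = leaf.prev
    return result


leaves = link([Leaf([3, 5], [101, 102]), Leaf([5, 5], [103, 104]), Leaf([8, 9], [105, 106])])
print(range_scan(leaves[0], 5, 8))
print(keys_descending(leaves[-1]))
```

Because all data entries live in the leaves, duplicates of the same key can span several leaves, and the chain lets the scan cross those boundaries without going back up the tree.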


Page 31: Foundations of Information Systems 5 DBMS Architecturepages.iai.uni-bonn.de/behrend_andreas/lehre/FIM/WS16/11... · 2017-01-10 · Main-memory access has become a performance bottleneck

| 293

gt

copy A Behrend Foundations of Information Systems |

Insertion and Deletion in the B-Tree

Problem Insertion can create overflow Deletion can create underflow and overflow

Example Insertion of key 22

rarr Overflow rarr Split

Deletion of key 22 rarr Underflow need to access all four nodes finally same as input

Insert 22

| 294

gt

copy A Behrend Foundations of Information Systems |

Underflow Merge

Deletion in the B-Tree

Example Order k = 1 n=2k Delete key 3

| 295

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (2)

Example Order k = 1 n=2k Delete key 3

Underflow Merge

Remember Each path from the root to the leaf has the same length h

| 296

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (3)

Example Order k = 1 n=2k Delete key 3

Underflow Merge

Overflow Split

| 297

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (4)

Example Order k = 1 n=2k Delete key 3

Overflow Split

Root Split

| 298

gt

copy A Behrend Foundations of Information Systems |

Deletion Algorithm

Example ndash there are different algorithms Search the node that contains the key K to be deleted If key K is in a leaf node delete the key in the leaf node and handle potentially

resulting underflow by merging with sibling If key K is in an inner node pull up new discriminator from one of the successors Analyze which successor node of K has more elements left or right one

If both have the same number of elements decide for one Replace the key K to be deleted with the direct successor Krsquo from the left

successor node or with the direct successor Krsquorsquo from the right successor node respectively Delete Krsquo or Krsquorsquo from the respective successor node (recursively)

Note Major variants Merge (tis lecture) Re-distribution (instead of splitmerge in case of overflowunderflow the entries are

re-distributed under consideration of one or multiple adjacent nodes)

| 299

gt

copy A Behrend Foundations of Information Systems |

B-Trees: B+-Trees and B*-Trees

B+-Trees
- Data is only in leaf nodes
- Key redundancy, but higher fan-out → lower tree height → less I/O
- Simpler delete procedure → requires only merging of nodes
- Doubly linked list of all leaf nodes

B*-Trees
- Modified valid node sizes: from [k, 2k] to [4/3·k, 2k] → better node utilization, but more splits/merges

Example: secondary index (non-unique), B-tree with k = 2, h = 3
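The doubly linked leaf list matters for range queries. Once a root-to-leaf search has located the first qualifying leaf, the query can be answered by walking the leaf chain in either direction, without revisiting inner nodes. Below is a minimal Python sketch. It assumes each leaf holds parallel lists of keys and TIDs. Leaf, link, range_scan and reverse_scan are hypothetical names, and the root-to-leaf search is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Leaf:
    keys: List[int]
    tids: List[int]                 # one TID per key entry (secondary index)
    next: Optional["Leaf"] = None
    prev: Optional["Leaf"] = None


def link(leaves: List[Leaf]) -> List[Leaf]:
    """Chain leaves (given in key order) into a doubly linked list."""
    for a, b in zip(leaves, leaves[1:]):
        a.next, b.prev = b, a
    return leaves


def range_scan(start: Leaf, lo: int, hi: int) -> List[Tuple[int, int]]:
    """Ascending scan for lo <= key <= hi, following next pointers.

    start is assumed to be the leaf a root-to-leaf search for lo would reach.
    """
    out = []
    leaf = start
    while leaf is not None:
        for key, tid in zip(leaf.keys, leaf.tids):
            if key > hi:
                return out
            if key >= lo:
                out.append((key, tid))
        leaf = leaf.next
    return out


def reverse_scan(start: Leaf, hi: int, lo: int) -> List[Tuple[int, int]]:
    """Descending scan for hi >= key >= lo, following prev pointers."""
    out = []
    leaf = start
    while leaf is not None:
        for key, tid in zip(reversed(leaf.keys), reversed(leaf.tids)):
            if key < lo:
                return out
            if key <= hi:
                out.append((key, tid))
        leaf = leaf.prev
    return out
```

Because the leaves are linked in both directions, the same structure serves both ascending and descending range queries.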

Indexing Low Cardinality Columns

Problem example: a B-tree on the sex of customers of a table with 1,000,000 tuples results in two lists of approximately 500,000 TIDs each.

A query for all female customers requires up to 500,000 random page accesses (secondary index); a table scan would be much faster.

Conclusion: B-trees (and also hashing) are useful for predicates with low selectivity (output/input cardinality ratio). Rule of thumb: the break-even hit rate is approx. 5%; higher hit rates do not justify the effort of an index access.

(Figure: B-tree with the two keys F and M, each pointing to a long list of TIDs: TID, TID, TID, TID, …)
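The comparison above can be made concrete with a small page-count model. This Python sketch assumes the worst case for an unclustered secondary index, namely one random page access per qualifying tuple. It also assumes a hypothetical 10 tuples per page (the figure used in the bitmap example). The function names are hypothetical. The model weights random and sequential page reads equally. In practice random reads cost more, which is one reason the rule of thumb is lower than the break-even this simple count suggests (1/tuples_per_page).

```python
import math


def index_page_accesses(n_tuples: int, selectivity: float) -> int:
    # Worst case for an unclustered secondary index:
    # every qualifying tuple sits on a different page (one random access each).
    return math.ceil(n_tuples * selectivity)


def scan_page_accesses(n_tuples: int, tuples_per_page: int) -> int:
    # A table scan reads every page of the table exactly once.
    return math.ceil(n_tuples / tuples_per_page)


def cheaper_access_path(n_tuples: int, tuples_per_page: int, selectivity: float):
    """Return (choice, index page accesses, scan page accesses)."""
    idx = index_page_accesses(n_tuples, selectivity)
    scan = scan_page_accesses(n_tuples, tuples_per_page)
    return ("index" if idx < scan else "scan", idx, scan)


print(cheaper_access_path(1_000_000, 10, 0.5))  # ('scan', 500000, 100000)
```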

Bitmap Index

Idea (long history, since the 1960s): create a bitmap (bit list) for each attribute value.
- Each tuple in the table is assigned to one bit in the bitmap (by position, i.e., sequential TID).
- Bit values: 1 = attribute value set, 0 = attribute value not set.
- Necessary condition: sequential numbering of the tuples (TIDs).

Example: bitmap index on Sex (bitmaps F and M)

Name    Sex  Region  Race   F  M
Carol   f    n       white  1  0
Harold  m    e       black  0  1
Anne    f    e       asian  1  0
Iris    f    ne      white  1  0
…       m    se      hisp   0  1
…       f    e       white  1  0
…       f    sw      asian  1  0
…       f    w       black  1  0
…       f    n       asian  1  0
…       m    e       hisp   0  1
…       m    se      black  0  1
…       f    s       white  1  0
…       m    nw      black  0  1
…       f    s       white  1  0
…       f    w       black  1  0
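Building such an index takes a single pass over the column. The Python sketch below assumes Python integers serve as arbitrary-length bit vectors, with bit i belonging to the tuple with TID i (counted from 0). Real systems typically use fixed-size words, often compressed. build_bitmap_index and to_bit_string are hypothetical names.

```python
def build_bitmap_index(column):
    """One bitmap per distinct value; bit i belongs to the tuple with TID i."""
    index = {}
    for tid, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << tid)
    return index


def to_bit_string(bitmap, n_tuples):
    # Render TID 0 first, matching the left-to-right order of the figure.
    return "".join("1" if (bitmap >> tid) & 1 else "0" for tid in range(n_tuples))


# Sex column of the 15 example tuples, in TID order.
sex = list("fmffmffffmmfmff")
sex_index = build_bitmap_index(sex)
print("F", to_bit_string(sex_index["f"], len(sex)))  # F 101101111001011
print("M", to_bit_string(sex_index["m"], len(sex)))  # M 010010000110100
```

Since every tuple has exactly one value, the bitmaps of one attribute are disjoint, and together they cover all TIDs.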

Querying Bitmap Indexes

Main advantage of bitmap indexes: simple and efficient logical combination of predicates; only the data relevant for the predicates is read. Example: σ Sex='f' ∧ Region='n' (R), where bitmaps B1 (Sex='f') and B2 (Region='n') are combined by conjunction:

    for (i = 0; i < B1.length; i++) B[i] = B1[i] & B2[i];
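The word-wise conjunction can be sketched in Python. This sketch assumes bitmaps are stored as lists of 64-bit words, with TID 0 in the lowest bit of word 0. from_bit_string, and_bitmaps and matching_tids are hypothetical names. The F, N and A bit strings are the example bitmaps from the figure on this slide.

```python
WORD_BITS = 64  # assumed machine word size


def from_bit_string(bits):
    """Pack a '0'/'1' string (TID 0 first) into a list of 64-bit words."""
    words = [0] * ((len(bits) + WORD_BITS - 1) // WORD_BITS)
    for tid, bit in enumerate(bits):
        if bit == "1":
            words[tid // WORD_BITS] |= 1 << (tid % WORD_BITS)
    return words


def and_bitmaps(first, *others):
    """Word-wise conjunction: B[i] = B1[i] & B2[i] & ... for every word i."""
    result = list(first)
    for other in others:
        for i in range(len(result)):
            result[i] &= other[i]
    return result


def matching_tids(words):
    """Decode the set bits of a bitmap back into (0-based) TIDs."""
    tids = []
    for w, word in enumerate(words):
        while word:
            lowest = word & -word                 # isolate the lowest set bit
            tids.append(w * WORD_BITS + lowest.bit_length() - 1)
            word ^= lowest                        # clear it and continue
    return tids


F = from_bit_string("101101111001011")
N = from_bit_string("011001000100000")
A = from_bit_string("001000101000000")
print(matching_tids(and_bitmaps(F, N, A)))  # [2]
```

The only hit is TID 2 (the third tuple), which matches the result row of the figure. Only the matching tuples then need to be fetched from the table.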

Example: I/O cost estimation for σ Sex='f' ∧ Region='n' ∧ Race='Asian' (R) ("Asian women of region North")
- Selectivity: 1/2 · 1/8 · 1/4 = 1/64 (2 sexes, 8 regions, 4 races; uniform and independent distributions assumed)
- N = 10,000 tuples of 400 bytes each (~10 tuples per page for 4 kB pages)
- Table scan: 1,000 pages
- Bitmap access: 10,000/64 ≈ 156 pages (worst case: each qualifying tuple on a different page), plus 1 page for the bitmaps (three bitmaps of 10,000 bits, i.e. 1,250 bytes each, fit into one 4 kB page)

F        1 0 1 1 0 1 1 1 1 0 0 1 0 1 1
N        0 1 1 0 0 1 0 0 0 1 0 0 0 0 0
A        0 0 1 0 0 0 1 0 1 0 0 0 0 0 0
Result   0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
(Figure: F AND N AND A = Result)
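The estimate above can be reproduced in a few lines of Python. This sketch assumes a 4 kB page is 4,096 bytes, that predicates are independent, and that one bitmap is read per predicate. bitmap_io_estimate is a hypothetical name. Fractions keep the selectivity product exact.

```python
import math
from fractions import Fraction


def bitmap_io_estimate(n_tuples, tuple_bytes, page_bytes, selectivities):
    """Worst-case page reads: (table scan, tuple fetches via bitmaps, bitmap pages)."""
    tuples_per_page = page_bytes // tuple_bytes
    scan_pages = math.ceil(n_tuples / tuples_per_page)
    sel = Fraction(1)
    for s in selectivities:            # predicates assumed independent
        sel *= s
    fetch_pages = n_tuples * sel       # one page per qualifying tuple (worst case)
    bitmap_bytes = len(selectivities) * math.ceil(n_tuples / 8)  # one bitmap per predicate
    bitmap_pages = math.ceil(bitmap_bytes / page_bytes)
    return scan_pages, float(fetch_pages), bitmap_pages


print(bitmap_io_estimate(10_000, 400, 4096,
                         [Fraction(1, 2), Fraction(1, 8), Fraction(1, 4)]))
# (1000, 156.25, 1)
```

The 156.25 expected fetches correspond to the "≈ 156 pages" on the slide.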

Page 32: Foundations of Information Systems 5 DBMS Architecturepages.iai.uni-bonn.de/behrend_andreas/lehre/FIM/WS16/11... · 2017-01-10 · Main-memory access has become a performance bottleneck

| 294

gt

copy A Behrend Foundations of Information Systems |

Underflow Merge

Deletion in the B-Tree

Example Order k = 1 n=2k Delete key 3

| 295

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (2)

Example Order k = 1 n=2k Delete key 3

Underflow Merge

Remember Each path from the root to the leaf has the same length h

| 296

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (3)

Example Order k = 1 n=2k Delete key 3

Underflow Merge

Overflow Split

| 297

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (4)

Example Order k = 1 n=2k Delete key 3

Overflow Split

Root Split

| 298

gt

copy A Behrend Foundations of Information Systems |

Deletion Algorithm

Example ndash there are different algorithms Search the node that contains the key K to be deleted If key K is in a leaf node delete the key in the leaf node and handle potentially

resulting underflow by merging with sibling If key K is in an inner node pull up new discriminator from one of the successors Analyze which successor node of K has more elements left or right one

If both have the same number of elements decide for one Replace the key K to be deleted with the direct successor Krsquo from the left

successor node or with the direct successor Krsquorsquo from the right successor node respectively Delete Krsquo or Krsquorsquo from the respective successor node (recursively)

Note Major variants Merge (tis lecture) Re-distribution (instead of splitmerge in case of overflowunderflow the entries are

re-distributed under consideration of one or multiple adjacent nodes)

| 299

gt

copy A Behrend Foundations of Information Systems |

B-Trees B+-Trees and B-Trees

B+-Trees and B-Trees Data is only in leaf nodes Key redundancy but higher fan-out rarr lower tree high less IO Simpler delete procedure rarr requires only merging of nodes

Double linked list of all leaf nodes B-Trees Modified valid node sizes

from [k2k] to [43k2k] rarr better node utilization but more splitsmerges

Example Secondary index Non unique

B-Tree with k = 2 h = 3

| 300

gt

copy A Behrend Foundations of Information Systems |

Indexing Low Cardinality Columns

Problem Example B-tree on the sex of customers for a table with 1000000 tuples results in

two lists with approximately 500000 tuples each

Query for all female customers requires 500000 random page accesses (secondary index) Table scan would be much faster

Conclusion B-trees (and also hashing) are useful for predicates with low selectivity

(outputinput cardinality ratio) Rule of thumb margin hit rate is approx 5 higher hit rates do not justify the efforts for an index access

F M

TID TID TID TID hellip TID TID TID TID hellip

| 301

gt

copy A Behrend Foundations of Information Systems |

Bitmap Index

Idea (Long history since the 1960s) Create a bitmapbitlist for each

attribute value Each tuple in the table is assigned to

one bit in the bitmap (by position sequential TID)

Bit values 1 attribute value set 0 attribute value not set

Necessary condition Sequential numbering of the tuples

(TIDs)

F M

1 0 1 1 0 1 1 1 1 0 0 1 0 1 1

0 1 0 0 1 0 0 0 0 1 1 0 1 0 0

Name Sex Region Race Carol f n white

Harold m e black Anne f e asian

Iris f ne white hellip m se hisp hellip f e white hellip f sw asian hellip f w black hellip f n asian hellip m e hisp hellip m se black hellip f s white hellip m nw black hellip f s white hellip f w black

Sex

| 302

gt

copy A Behrend Foundations of Information Systems |

Querying Bitmap Indexes

Main advantage of bitmap indexes Simple and efficient logical join possible Read only data that is relevant for predicates Example σSex=lsquoflsquo ᴧ Region=lsquonlsquo R Bitmaps B1 and B2 in conjunction for (i=0 iltB1length i++) B = B1[i] amp B2[i]

Example IO Costs Estimation σSex=lsquoflsquo ᴧ Region=lsquonlsquo ᴧ Race=lsquoAsianlsquo R

(ldquoAsian women of region Northrdquo) Selectivity 12 18 14 = 164 N=10000 tuples with length of 400 bytes each

(~ 10 tuples per page for 4kB pages) Table scan 1000 pages Bitmap access 1000064 156 pages (worst case each tuple in a different page)

plus 1 page for bitmaps

F 1 0 1 1 0 1 1 1 1 0 0 1 0 1 1

N 0 1 1 0 0 1 0 0 0 1 0 0 0 0 0

A 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0

0 0 1 0 0 0 0 0 0 0 0 0 0 0 0

AND = AND

  • DBMS Architecture
  • What is in the Lecture
  • How is Database System build
  • Architectural Blue Print
  • Architectural Trends
  • Different Access Characteristics
  • Hardware Developments
  • Row Storage vs Column Storage
  • Processing Models
  • Transaction Management
  • ACID Properties of Transactions
  • Motivation
  • Reasons for crashes
  • Guarantee Atomicity amp Durability
  • Record Management
  • Record
  • Record Addressing
  • Surrogate with Mapping Table
  • TID Concept
  • Free Space Management
  • Physical Access Paths ndash Index Structures
  • Overview Indexes
  • Overview Indexes (2)
  • Classification of Index Structures
  • B-Tree
  • B-Tree (2)
  • Search in the B-Tree
  • Insertions in the B-Tree (1)
  • Insertions in the B-Tree (2)
  • Insertions in the B-Tree (3)
  • Insertion and Deletion in the B-Tree
  • Deletion in the B-Tree
  • Deletion in the B-Tree (2)
  • Deletion in the B-Tree (3)
  • Deletion in the B-Tree (4)
  • Deletion Algorithm
  • B-Trees B+-Trees and B-Trees
  • Indexing Low Cardinality Columns
  • Bitmap Index
  • Querying Bitmap Indexes
Page 33: Foundations of Information Systems 5 DBMS Architecturepages.iai.uni-bonn.de/behrend_andreas/lehre/FIM/WS16/11... · 2017-01-10 · Main-memory access has become a performance bottleneck

| 295

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (2)

Example Order k = 1 n=2k Delete key 3

Underflow Merge

Remember Each path from the root to the leaf has the same length h

| 296

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (3)

Example Order k = 1 n=2k Delete key 3

Underflow Merge

Overflow Split

| 297

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (4)

Example Order k = 1 n=2k Delete key 3

Overflow Split

Root Split

| 298

gt

copy A Behrend Foundations of Information Systems |

Deletion Algorithm

Example ndash there are different algorithms Search the node that contains the key K to be deleted If key K is in a leaf node delete the key in the leaf node and handle potentially

resulting underflow by merging with sibling If key K is in an inner node pull up new discriminator from one of the successors Analyze which successor node of K has more elements left or right one

If both have the same number of elements decide for one Replace the key K to be deleted with the direct successor Krsquo from the left

successor node or with the direct successor Krsquorsquo from the right successor node respectively Delete Krsquo or Krsquorsquo from the respective successor node (recursively)

Note Major variants Merge (tis lecture) Re-distribution (instead of splitmerge in case of overflowunderflow the entries are

re-distributed under consideration of one or multiple adjacent nodes)

| 299

gt

copy A Behrend Foundations of Information Systems |

B-Trees B+-Trees and B-Trees

B+-Trees and B-Trees Data is only in leaf nodes Key redundancy but higher fan-out rarr lower tree high less IO Simpler delete procedure rarr requires only merging of nodes

Double linked list of all leaf nodes B-Trees Modified valid node sizes

from [k2k] to [43k2k] rarr better node utilization but more splitsmerges

Example Secondary index Non unique

B-Tree with k = 2 h = 3

| 300

gt

copy A Behrend Foundations of Information Systems |

Indexing Low Cardinality Columns

Problem Example B-tree on the sex of customers for a table with 1000000 tuples results in

two lists with approximately 500000 tuples each

Query for all female customers requires 500000 random page accesses (secondary index) Table scan would be much faster

Conclusion B-trees (and also hashing) are useful for predicates with low selectivity

(outputinput cardinality ratio) Rule of thumb margin hit rate is approx 5 higher hit rates do not justify the efforts for an index access

F M

TID TID TID TID hellip TID TID TID TID hellip

| 301

gt

copy A Behrend Foundations of Information Systems |

Bitmap Index

Idea (Long history since the 1960s) Create a bitmapbitlist for each

attribute value Each tuple in the table is assigned to

one bit in the bitmap (by position sequential TID)

Bit values 1 attribute value set 0 attribute value not set

Necessary condition Sequential numbering of the tuples

(TIDs)

F M

1 0 1 1 0 1 1 1 1 0 0 1 0 1 1

0 1 0 0 1 0 0 0 0 1 1 0 1 0 0

Name Sex Region Race Carol f n white

Harold m e black Anne f e asian

Iris f ne white hellip m se hisp hellip f e white hellip f sw asian hellip f w black hellip f n asian hellip m e hisp hellip m se black hellip f s white hellip m nw black hellip f s white hellip f w black

Sex

| 302

gt

copy A Behrend Foundations of Information Systems |

Querying Bitmap Indexes

Main advantage of bitmap indexes Simple and efficient logical join possible Read only data that is relevant for predicates Example σSex=lsquoflsquo ᴧ Region=lsquonlsquo R Bitmaps B1 and B2 in conjunction for (i=0 iltB1length i++) B = B1[i] amp B2[i]

Example IO Costs Estimation σSex=lsquoflsquo ᴧ Region=lsquonlsquo ᴧ Race=lsquoAsianlsquo R

(ldquoAsian women of region Northrdquo) Selectivity 12 18 14 = 164 N=10000 tuples with length of 400 bytes each

(~ 10 tuples per page for 4kB pages) Table scan 1000 pages Bitmap access 1000064 156 pages (worst case each tuple in a different page)

plus 1 page for bitmaps

F 1 0 1 1 0 1 1 1 1 0 0 1 0 1 1

N 0 1 1 0 0 1 0 0 0 1 0 0 0 0 0

A 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0

0 0 1 0 0 0 0 0 0 0 0 0 0 0 0

AND = AND

  • DBMS Architecture
  • What is in the Lecture
  • How is Database System build
  • Architectural Blue Print
  • Architectural Trends
  • Different Access Characteristics
  • Hardware Developments
  • Row Storage vs Column Storage
  • Processing Models
  • Transaction Management
  • ACID Properties of Transactions
  • Motivation
  • Reasons for crashes
  • Guarantee Atomicity amp Durability
  • Record Management
  • Record
  • Record Addressing
  • Surrogate with Mapping Table
  • TID Concept
  • Free Space Management
  • Physical Access Paths ndash Index Structures
  • Overview Indexes
  • Overview Indexes (2)
  • Classification of Index Structures
  • B-Tree
  • B-Tree (2)
  • Search in the B-Tree
  • Insertions in the B-Tree (1)
  • Insertions in the B-Tree (2)
  • Insertions in the B-Tree (3)
  • Insertion and Deletion in the B-Tree
  • Deletion in the B-Tree
  • Deletion in the B-Tree (2)
  • Deletion in the B-Tree (3)
  • Deletion in the B-Tree (4)
  • Deletion Algorithm
  • B-Trees B+-Trees and B-Trees
  • Indexing Low Cardinality Columns
  • Bitmap Index
  • Querying Bitmap Indexes
Page 34: Foundations of Information Systems 5 DBMS Architecturepages.iai.uni-bonn.de/behrend_andreas/lehre/FIM/WS16/11... · 2017-01-10 · Main-memory access has become a performance bottleneck

| 296

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (3)

Example Order k = 1 n=2k Delete key 3

Underflow Merge

Overflow Split

| 297

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (4)

Example Order k = 1 n=2k Delete key 3

Overflow Split

Root Split

| 298

gt

copy A Behrend Foundations of Information Systems |

Deletion Algorithm

Example ndash there are different algorithms Search the node that contains the key K to be deleted If key K is in a leaf node delete the key in the leaf node and handle potentially

resulting underflow by merging with sibling If key K is in an inner node pull up new discriminator from one of the successors Analyze which successor node of K has more elements left or right one

If both have the same number of elements decide for one Replace the key K to be deleted with the direct successor Krsquo from the left

successor node or with the direct successor Krsquorsquo from the right successor node respectively Delete Krsquo or Krsquorsquo from the respective successor node (recursively)

Note Major variants Merge (tis lecture) Re-distribution (instead of splitmerge in case of overflowunderflow the entries are

re-distributed under consideration of one or multiple adjacent nodes)

| 299

gt

copy A Behrend Foundations of Information Systems |

B-Trees B+-Trees and B-Trees

B+-Trees and B-Trees Data is only in leaf nodes Key redundancy but higher fan-out rarr lower tree high less IO Simpler delete procedure rarr requires only merging of nodes

Double linked list of all leaf nodes B-Trees Modified valid node sizes

from [k2k] to [43k2k] rarr better node utilization but more splitsmerges

Example Secondary index Non unique

B-Tree with k = 2 h = 3

| 300

gt

copy A Behrend Foundations of Information Systems |

Indexing Low Cardinality Columns

Problem Example B-tree on the sex of customers for a table with 1000000 tuples results in

two lists with approximately 500000 tuples each

Query for all female customers requires 500000 random page accesses (secondary index) Table scan would be much faster

Conclusion B-trees (and also hashing) are useful for predicates with low selectivity

(outputinput cardinality ratio) Rule of thumb margin hit rate is approx 5 higher hit rates do not justify the efforts for an index access

F M

TID TID TID TID hellip TID TID TID TID hellip

| 301

gt

copy A Behrend Foundations of Information Systems |

Bitmap Index

Idea (Long history since the 1960s) Create a bitmapbitlist for each

attribute value Each tuple in the table is assigned to

one bit in the bitmap (by position sequential TID)

Bit values 1 attribute value set 0 attribute value not set

Necessary condition Sequential numbering of the tuples

(TIDs)

F M

1 0 1 1 0 1 1 1 1 0 0 1 0 1 1

0 1 0 0 1 0 0 0 0 1 1 0 1 0 0

Name Sex Region Race Carol f n white

Harold m e black Anne f e asian

Iris f ne white hellip m se hisp hellip f e white hellip f sw asian hellip f w black hellip f n asian hellip m e hisp hellip m se black hellip f s white hellip m nw black hellip f s white hellip f w black

Sex

| 302

gt

copy A Behrend Foundations of Information Systems |

Querying Bitmap Indexes

Main advantage of bitmap indexes Simple and efficient logical join possible Read only data that is relevant for predicates Example σSex=lsquoflsquo ᴧ Region=lsquonlsquo R Bitmaps B1 and B2 in conjunction for (i=0 iltB1length i++) B = B1[i] amp B2[i]

Example IO Costs Estimation σSex=lsquoflsquo ᴧ Region=lsquonlsquo ᴧ Race=lsquoAsianlsquo R

(ldquoAsian women of region Northrdquo) Selectivity 12 18 14 = 164 N=10000 tuples with length of 400 bytes each

(~ 10 tuples per page for 4kB pages) Table scan 1000 pages Bitmap access 1000064 156 pages (worst case each tuple in a different page)

plus 1 page for bitmaps

F 1 0 1 1 0 1 1 1 1 0 0 1 0 1 1

N 0 1 1 0 0 1 0 0 0 1 0 0 0 0 0

A 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0

0 0 1 0 0 0 0 0 0 0 0 0 0 0 0

AND = AND

  • DBMS Architecture
  • What is in the Lecture
  • How is Database System build
  • Architectural Blue Print
  • Architectural Trends
  • Different Access Characteristics
  • Hardware Developments
  • Row Storage vs Column Storage
  • Processing Models
  • Transaction Management
  • ACID Properties of Transactions
  • Motivation
  • Reasons for crashes
  • Guarantee Atomicity amp Durability
  • Record Management
  • Record
  • Record Addressing
  • Surrogate with Mapping Table
  • TID Concept
  • Free Space Management
  • Physical Access Paths ndash Index Structures
  • Overview Indexes
  • Overview Indexes (2)
  • Classification of Index Structures
  • B-Tree
  • B-Tree (2)
  • Search in the B-Tree
  • Insertions in the B-Tree (1)
  • Insertions in the B-Tree (2)
  • Insertions in the B-Tree (3)
  • Insertion and Deletion in the B-Tree
  • Deletion in the B-Tree
  • Deletion in the B-Tree (2)
  • Deletion in the B-Tree (3)
  • Deletion in the B-Tree (4)
  • Deletion Algorithm
  • B-Trees B+-Trees and B-Trees
  • Indexing Low Cardinality Columns
  • Bitmap Index
  • Querying Bitmap Indexes
Page 35: Foundations of Information Systems 5 DBMS Architecturepages.iai.uni-bonn.de/behrend_andreas/lehre/FIM/WS16/11... · 2017-01-10 · Main-memory access has become a performance bottleneck

| 297

gt

copy A Behrend Foundations of Information Systems |

Deletion in the B-Tree (4)

Example Order k = 1 n=2k Delete key 3

Overflow Split

Root Split

| 298

gt

copy A Behrend Foundations of Information Systems |

Deletion Algorithm

Example ndash there are different algorithms Search the node that contains the key K to be deleted If key K is in a leaf node delete the key in the leaf node and handle potentially

resulting underflow by merging with sibling If key K is in an inner node pull up new discriminator from one of the successors Analyze which successor node of K has more elements left or right one

If both have the same number of elements decide for one Replace the key K to be deleted with the direct successor Krsquo from the left

successor node or with the direct successor Krsquorsquo from the right successor node respectively Delete Krsquo or Krsquorsquo from the respective successor node (recursively)

Note Major variants Merge (tis lecture) Re-distribution (instead of splitmerge in case of overflowunderflow the entries are

re-distributed under consideration of one or multiple adjacent nodes)

| 299

gt

copy A Behrend Foundations of Information Systems |

B-Trees B+-Trees and B-Trees

B+-Trees and B-Trees Data is only in leaf nodes Key redundancy but higher fan-out rarr lower tree high less IO Simpler delete procedure rarr requires only merging of nodes

Double linked list of all leaf nodes B-Trees Modified valid node sizes

from [k2k] to [43k2k] rarr better node utilization but more splitsmerges

Example Secondary index Non unique

B-Tree with k = 2 h = 3

| 300

gt

copy A Behrend Foundations of Information Systems |

Indexing Low Cardinality Columns

Problem Example B-tree on the sex of customers for a table with 1000000 tuples results in

two lists with approximately 500000 tuples each

Query for all female customers requires 500000 random page accesses (secondary index) Table scan would be much faster

Conclusion B-trees (and also hashing) are useful for predicates with low selectivity

(outputinput cardinality ratio) Rule of thumb margin hit rate is approx 5 higher hit rates do not justify the efforts for an index access

F M

TID TID TID TID hellip TID TID TID TID hellip

| 301

gt

copy A Behrend Foundations of Information Systems |

Bitmap Index

Idea (Long history since the 1960s) Create a bitmapbitlist for each

attribute value Each tuple in the table is assigned to

one bit in the bitmap (by position sequential TID)

Bit values 1 attribute value set 0 attribute value not set

Necessary condition Sequential numbering of the tuples

(TIDs)

F M

1 0 1 1 0 1 1 1 1 0 0 1 0 1 1

0 1 0 0 1 0 0 0 0 1 1 0 1 0 0

Name Sex Region Race Carol f n white

Harold m e black Anne f e asian

Iris f ne white hellip m se hisp hellip f e white hellip f sw asian hellip f w black hellip f n asian hellip m e hisp hellip m se black hellip f s white hellip m nw black hellip f s white hellip f w black

Sex

| 302

gt

copy A Behrend Foundations of Information Systems |

Querying Bitmap Indexes

Main advantage of bitmap indexes Simple and efficient logical join possible Read only data that is relevant for predicates Example σSex=lsquoflsquo ᴧ Region=lsquonlsquo R Bitmaps B1 and B2 in conjunction for (i=0 iltB1length i++) B = B1[i] amp B2[i]

Example IO Costs Estimation σSex=lsquoflsquo ᴧ Region=lsquonlsquo ᴧ Race=lsquoAsianlsquo R

(ldquoAsian women of region Northrdquo) Selectivity 12 18 14 = 164 N=10000 tuples with length of 400 bytes each

(~ 10 tuples per page for 4kB pages) Table scan 1000 pages Bitmap access 1000064 156 pages (worst case each tuple in a different page)

plus 1 page for bitmaps

F 1 0 1 1 0 1 1 1 1 0 0 1 0 1 1

N 0 1 1 0 0 1 0 0 0 1 0 0 0 0 0

A 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0

0 0 1 0 0 0 0 0 0 0 0 0 0 0 0

AND = AND

  • DBMS Architecture
  • What is in the Lecture
  • How is Database System build
  • Architectural Blue Print
  • Architectural Trends
  • Different Access Characteristics
  • Hardware Developments
  • Row Storage vs Column Storage
  • Processing Models
  • Transaction Management
  • ACID Properties of Transactions
  • Motivation
  • Reasons for crashes
  • Guarantee Atomicity amp Durability
  • Record Management
  • Record
  • Record Addressing
  • Surrogate with Mapping Table
  • TID Concept
  • Free Space Management
  • Physical Access Paths ndash Index Structures
  • Overview Indexes
  • Overview Indexes (2)
  • Classification of Index Structures
  • B-Tree
  • B-Tree (2)
  • Search in the B-Tree
  • Insertions in the B-Tree (1)
  • Insertions in the B-Tree (2)
  • Insertions in the B-Tree (3)
  • Insertion and Deletion in the B-Tree
  • Deletion in the B-Tree
  • Deletion in the B-Tree (2)
  • Deletion in the B-Tree (3)
  • Deletion in the B-Tree (4)
  • Deletion Algorithm
  • B-Trees B+-Trees and B-Trees
  • Indexing Low Cardinality Columns
  • Bitmap Index
  • Querying Bitmap Indexes
Page 36: Foundations of Information Systems 5 DBMS Architecturepages.iai.uni-bonn.de/behrend_andreas/lehre/FIM/WS16/11... · 2017-01-10 · Main-memory access has become a performance bottleneck

| 298

gt

copy A Behrend Foundations of Information Systems |

Deletion Algorithm

Example ndash there are different algorithms Search the node that contains the key K to be deleted If key K is in a leaf node delete the key in the leaf node and handle potentially

resulting underflow by merging with sibling If key K is in an inner node pull up new discriminator from one of the successors Analyze which successor node of K has more elements left or right one

If both have the same number of elements decide for one Replace the key K to be deleted with the direct successor Krsquo from the left

successor node or with the direct successor Krsquorsquo from the right successor node respectively Delete Krsquo or Krsquorsquo from the respective successor node (recursively)

Note Major variants Merge (tis lecture) Re-distribution (instead of splitmerge in case of overflowunderflow the entries are

re-distributed under consideration of one or multiple adjacent nodes)

| 299

gt

copy A Behrend Foundations of Information Systems |

B-Trees B+-Trees and B-Trees

B+-Trees and B-Trees Data is only in leaf nodes Key redundancy but higher fan-out rarr lower tree high less IO Simpler delete procedure rarr requires only merging of nodes

Double linked list of all leaf nodes B-Trees Modified valid node sizes

from [k2k] to [43k2k] rarr better node utilization but more splitsmerges

Example Secondary index Non unique

B-Tree with k = 2 h = 3

| 300

gt

copy A Behrend Foundations of Information Systems |

Indexing Low Cardinality Columns

Problem Example B-tree on the sex of customers for a table with 1000000 tuples results in

two lists with approximately 500000 tuples each

Query for all female customers requires 500000 random page accesses (secondary index) Table scan would be much faster

Conclusion B-trees (and also hashing) are useful for predicates with low selectivity

(outputinput cardinality ratio) Rule of thumb margin hit rate is approx 5 higher hit rates do not justify the efforts for an index access

F M

TID TID TID TID hellip TID TID TID TID hellip

| 301

gt

copy A Behrend Foundations of Information Systems |

Bitmap Index

Idea (Long history since the 1960s) Create a bitmapbitlist for each

attribute value Each tuple in the table is assigned to

one bit in the bitmap (by position sequential TID)

Bit values 1 attribute value set 0 attribute value not set

Necessary condition Sequential numbering of the tuples

(TIDs)

F M

1 0 1 1 0 1 1 1 1 0 0 1 0 1 1

0 1 0 0 1 0 0 0 0 1 1 0 1 0 0

Name Sex Region Race Carol f n white

Harold m e black Anne f e asian

Iris f ne white hellip m se hisp hellip f e white hellip f sw asian hellip f w black hellip f n asian hellip m e hisp hellip m se black hellip f s white hellip m nw black hellip f s white hellip f w black

Sex

| 302

gt

copy A Behrend Foundations of Information Systems |

Querying Bitmap Indexes

Main advantage of bitmap indexes Simple and efficient logical join possible Read only data that is relevant for predicates Example σSex=lsquoflsquo ᴧ Region=lsquonlsquo R Bitmaps B1 and B2 in conjunction for (i=0 iltB1length i++) B = B1[i] amp B2[i]

Example IO Costs Estimation σSex=lsquoflsquo ᴧ Region=lsquonlsquo ᴧ Race=lsquoAsianlsquo R

(ldquoAsian women of region Northrdquo) Selectivity 12 18 14 = 164 N=10000 tuples with length of 400 bytes each

(~ 10 tuples per page for 4kB pages) Table scan 1000 pages Bitmap access 1000064 156 pages (worst case each tuple in a different page)

plus 1 page for bitmaps

F 1 0 1 1 0 1 1 1 1 0 0 1 0 1 1

N 0 1 1 0 0 1 0 0 0 1 0 0 0 0 0

A 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0

0 0 1 0 0 0 0 0 0 0 0 0 0 0 0

AND = AND

  • DBMS Architecture
  • What is in the Lecture
  • How is Database System build
  • Architectural Blue Print
  • Architectural Trends
  • Different Access Characteristics
  • Hardware Developments
  • Row Storage vs Column Storage
  • Processing Models
  • Transaction Management
  • ACID Properties of Transactions
  • Motivation
  • Reasons for crashes
  • Guarantee Atomicity amp Durability
  • Record Management
  • Record
  • Record Addressing
  • Surrogate with Mapping Table
  • TID Concept
  • Free Space Management
  • Physical Access Paths ndash Index Structures
  • Overview Indexes
  • Overview Indexes (2)
  • Classification of Index Structures
  • B-Tree
  • B-Tree (2)
  • Search in the B-Tree
  • Insertions in the B-Tree (1)
  • Insertions in the B-Tree (2)
  • Insertions in the B-Tree (3)
  • Insertion and Deletion in the B-Tree
  • Deletion in the B-Tree
  • Deletion in the B-Tree (2)
  • Deletion in the B-Tree (3)
  • Deletion in the B-Tree (4)
  • Deletion Algorithm
  • B-Trees B+-Trees and B-Trees
  • Indexing Low Cardinality Columns
  • Bitmap Index
  • Querying Bitmap Indexes
Page 37: Foundations of Information Systems 5 DBMS Architecturepages.iai.uni-bonn.de/behrend_andreas/lehre/FIM/WS16/11... · 2017-01-10 · Main-memory access has become a performance bottleneck

| 299

gt

copy A Behrend Foundations of Information Systems |

B-Trees B+-Trees and B-Trees

B+-Trees and B-Trees Data is only in leaf nodes Key redundancy but higher fan-out rarr lower tree high less IO Simpler delete procedure rarr requires only merging of nodes

Double linked list of all leaf nodes B-Trees Modified valid node sizes

from [k2k] to [43k2k] rarr better node utilization but more splitsmerges

Example Secondary index Non unique

B-Tree with k = 2 h = 3

| 300

gt

copy A Behrend Foundations of Information Systems |

Indexing Low Cardinality Columns

Problem Example B-tree on the sex of customers for a table with 1000000 tuples results in

two lists with approximately 500000 tuples each

Query for all female customers requires 500000 random page accesses (secondary index) Table scan would be much faster

Conclusion B-trees (and also hashing) are useful for predicates with low selectivity

(outputinput cardinality ratio) Rule of thumb margin hit rate is approx 5 higher hit rates do not justify the efforts for an index access

F M

TID TID TID TID hellip TID TID TID TID hellip

| 301

gt

copy A Behrend Foundations of Information Systems |

Bitmap Index

Idea (Long history since the 1960s) Create a bitmapbitlist for each

attribute value Each tuple in the table is assigned to

one bit in the bitmap (by position sequential TID)

Bit values 1 attribute value set 0 attribute value not set

Necessary condition Sequential numbering of the tuples

(TIDs)

F M

1 0 1 1 0 1 1 1 1 0 0 1 0 1 1

0 1 0 0 1 0 0 0 0 1 1 0 1 0 0

Name Sex Region Race Carol f n white

Harold m e black Anne f e asian

Iris f ne white hellip m se hisp hellip f e white hellip f sw asian hellip f w black hellip f n asian hellip m e hisp hellip m se black hellip f s white hellip m nw black hellip f s white hellip f w black

Sex

| 302

gt

copy A Behrend Foundations of Information Systems |

Querying Bitmap Indexes

Main advantage of bitmap indexes Simple and efficient logical join possible Read only data that is relevant for predicates Example σSex=lsquoflsquo ᴧ Region=lsquonlsquo R Bitmaps B1 and B2 in conjunction for (i=0 iltB1length i++) B = B1[i] amp B2[i]

Example I/O cost estimation: $\sigma_{\text{Sex}='f' \wedge \text{Region}='n' \wedge \text{Race}='Asian'}(R)$ ("Asian women of region North")

Selectivity: $\frac{1}{2} \cdot \frac{1}{8} \cdot \frac{1}{4} = \frac{1}{64}$; N = 10,000 tuples with a length of 400 bytes each (~10 tuples per page for 4 kB pages).

Table scan: 1,000 pages.
Bitmap access: $10000/64 \approx 156$ pages (worst case: each qualifying tuple on a different page), plus 1 page for the bitmaps.

F:  1 0 1 1 0 1 1 1 1 0 0 1 0 1 1
AND
N:  0 1 1 0 0 1 0 0 0 1 0 0 0 0 0
AND
A:  0 0 1 0 0 0 1 0 1 0 0 0 0 0 0
=   0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
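The I/O estimate can be reproduced with a short Python sketch. It assumes that 4 kB means 4096 bytes, that each bitmap is stored uncompressed with one bit per tuple and read in full, that the predicates are independent (so selectivities multiply, as above), and that the number of data pages is the expected hit count rounded to the nearest integer, matching the 156 pages above. The function name is hypothetical.

```python
import math
from fractions import Fraction


def estimate_io(n_tuples: int, tuple_bytes: int, page_bytes: int,
                selectivities: list[Fraction]) -> dict[str, object]:
    """Rough page counts: full table scan vs. bitmap access (worst case)."""
    tuples_per_page = page_bytes // tuple_bytes                  # 4096 // 400 = 10
    scan_pages = math.ceil(n_tuples / tuples_per_page)
    selectivity = math.prod(selectivities)                       # independent predicates
    data_pages = round(n_tuples * selectivity)                   # worst case: one page per hit
    bitmap_bytes = len(selectivities) * math.ceil(n_tuples / 8)  # one bit per tuple per bitmap
    bitmap_pages = math.ceil(bitmap_bytes / page_bytes)
    return {
        "selectivity": selectivity,
        "scan_pages": scan_pages,
        "data_pages": data_pages,
        "bitmap_pages": bitmap_pages,
    }


result = estimate_io(10_000, 400, 4096,
                     [Fraction(1, 2), Fraction(1, 8), Fraction(1, 4)])
print(result)
```

With the slide's parameters this yields a selectivity of 1/64, 1,000 pages for the table scan, and 156 data pages plus 1 bitmap page for the bitmap access.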
