Faculty of Computer Science
Database and Software Engineering Group
Main-Memory Database Management Systems
David Broneske
Advanced Topics in Databases, 2019/April/05
Otto-von-Guericke University of Magdeburg
Credits
Parts of this lecture are based on content by
● Jens Teubner from TU Dortmund
● Sebastian Breß from TU Berlin
● Sebastian Dorok
David Broneske | Main-Memory Database Management Systems
We will talk about
● Computer and Database Systems Architecture
● Cache Awareness
● Processing Models
● Storage Models
Computer and Database Systems Architecture
The Past and the Present
The Past
Data taken from [Hennessy and Patterson, 1996]
The Past - Database Systems
● Main-memory capacity is limited to several megabytes
→ Only a small fraction of the database fits in main memory
● And disk storage is “huge”
→ Traditional database systems use disk as primary storage
● But disk latency is high
→ Parallel query processing to hide disk latencies
→ Choose a proper buffer replacement strategy to reduce I/O
→ Architectural properties inherited from System R, the first “real” relational DBMS
→ From the 1970s...
System-R-like Architecture
Overhead Breakdown of RDBMS Shore
Picture taken from [Harizopoulos et al., 2008]
The Present - Computer Architecture
Data taken from [Hennessy and Patterson, 1996]
The Present - Database Systems
● Server machines have up to thousands of gigabytes of main memory available
→ Use main memory as primary storage for the database and remove disk access as the main performance bottleneck
● But the architecture of traditional DBMSs is designed for disk-oriented database systems
→ “30 years of Moore’s law have antiquated the disk-oriented relational architecture for OLTP applications.” [Stonebraker et al., 2007]
Disk-based vs. Main-Memory DBMS
Having the database in main memory allows us to remove the buffer manager and paging
→ Removes a level of indirection
→ Results in better performance
Overhead Breakdown of RDBMS Shore: Payment TXN of TPC-C Benchmark
Picture taken from [Harizopoulos et al., 2008]
Disk-based vs. Main-Memory DBMS
The disk bottleneck is removed because the database is kept in main memory
→ Access to main memory becomes the new bottleneck
The New Bottleneck: Memory Access
There is an increasing gap between CPU and memory speeds.
● Also called the memory wall.
● CPUs spend much of their time waiting for memory.
How can we break the memory wall and better utilize the CPU?
→ Caches resemble the buffer manager but are controlled by hardware
→ Be aware of the caches!
Cache Awareness
A Motivating Example (Memory Access)
Task: sum up all entries in a two-dimensional array.
Both alternatives touch the same data, but in different order.
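The two traversal orders can be sketched as follows. This is a minimal model, not the code from the original slide: the array is a flat, row-major Python list, and the entry size of 8 bytes and the helper names (`offsets`, `total`, `line_switches`) are assumptions made for illustration. Python lists do not expose real cache behavior, so the sketch only counts how often consecutive accesses would fall into a different 64-byte cache line.

```python
LINE_BYTES = 64   # typical cache line size
ENTRY_BYTES = 8   # assumed size of one array entry

def offsets(rows, cols, row_wise):
    """Yield flat row-major offsets in the order the loop nest visits them."""
    if row_wise:
        for r in range(rows):
            for c in range(cols):
                yield r * cols + c
    else:
        for c in range(cols):
            for r in range(rows):
                yield r * cols + c

def total(data, rows, cols, row_wise):
    """Sum all entries in the chosen traversal order."""
    return sum(data[i] for i in offsets(rows, cols, row_wise))

def line_switches(rows, cols, row_wise):
    """Count how often consecutive accesses fall into different cache lines."""
    switches, last = 0, None
    for i in offsets(rows, cols, row_wise):
        line = (i * ENTRY_BYTES) // LINE_BYTES
        if line != last:
            switches += 1
            last = line
    return switches

rows, cols = 64, 64
data = list(range(rows * cols))
print(total(data, rows, cols, True) == total(data, rows, cols, False))
print(line_switches(rows, cols, True), line_switches(rows, cols, False))
```

Under these assumptions both orders compute the same sum, but the row-wise order changes cache line once per 8 entries while the column-wise order changes cache line on every access.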
Principle of Locality
Caches take advantage of the principle of locality.
● The hot set of data often fits into caches.
● 90% of execution time is spent in 10% of the code.
Spatial Locality:
● Related data is often spatially close.
● Code often contains loops.
Temporal Locality:
● Programs tend to re-use data frequently.
● Code may call a function repeatedly, even if it is not spatially close.
CPU Cache Internals
To guarantee speed, the overhead of caching must be kept reasonable.
● Organize cache in cache lines.
● Only load/evict full cache lines.
● Typical cache line size: 64 bytes.
● The organization in cache lines is consistent with the principle of spatial
locality.
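The cache line organization can be illustrated with a little address arithmetic. This is a sketch assuming byte addresses and the 64-byte line size mentioned above; `split_address` and `lines_touched` are hypothetical helper names.

```python
LINE_BYTES = 64  # typical cache line size

def split_address(addr):
    """Return (cache line number, byte offset within that line)."""
    return divmod(addr, LINE_BYTES)

def lines_touched(start_addr, num_bytes):
    """Number of full cache lines that must be loaded to read a byte range."""
    first = start_addr // LINE_BYTES
    last = (start_addr + num_bytes - 1) // LINE_BYTES
    return last - first + 1

print(split_address(130))
print(lines_touched(0, 64), lines_touched(60, 8))
```

An 8-byte value that straddles a line boundary costs two line loads, whereas 64 aligned, contiguous bytes cost only one; this is why spatially close data is cheap to access.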
Memory Access
On every memory access, the CPU checks if the respective cache
line is already cached.
Cache Hit:
● Read data directly from the cache.
● No need to access lower-level memory.
Cache Miss:
● Read full cache line from lower-level memory.
● Evict some cached block and replace it by the newly read cache line.
● CPU stalls until data becomes available.*
*Modern CPUs support out-of-order execution and several in-flight cache misses.
Example: AMD Opteron
Data taken from [Hennessy and Patterson, 2006]
Example: AMD Opteron, 2.8 GHz, PC3200 DDR SDRAM
● L1 cache: separate data and instruction caches, each 64 kB, 64 B cache
lines
● L2 cache: shared cache, 1 MB, 64 B cache lines
● L1 hit latency: 2 cycles (≈ 1 ns)
● L2 hit latency: 7 cycles (≈ 3.5 ns)
● L2 miss latency: 160–180 cycles (≈ 60 ns)
Block Placement: Fully Associative Cache
In a fully associative cache, a block can be loaded into any cache line.
● Offers freedom to the block replacement strategy.
● Does not scale to large caches
→ 4 MB cache, line size of 64 B: 65,536 cache lines.
● Used, e.g., for small TLB caches.
Block Placement: Direct-Mapped Cache
In a direct-mapped cache, a block has only one place it can appear in the cache.
● Much simpler to implement.
● Easier to make fast.
● Increases the chance of conflicts.
Block Placement: Set-Associative Cache
Set-associative caches are a compromise between fully associative and direct-mapped caches.
● Group cache lines into sets.
● Each memory block maps to one set.
● Block can be placed anywhere within a set.
● Most processor caches today are set-associative.
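The three placement schemes differ only in the number of cache lines per set. The sketch below assumes the common textbook mapping in which a block's set is its block number modulo the number of sets; real processors may hash addresses differently, and the function names are hypothetical.

```python
def num_sets(cache_bytes, line_bytes, ways):
    """Number of sets for a cache with `ways` cache lines per set."""
    lines = cache_bytes // line_bytes
    return lines // ways

def set_index(addr, cache_bytes, line_bytes, ways):
    """Set that the memory block containing `addr` maps to (modulo mapping, assumed)."""
    block = addr // line_bytes
    return block % num_sets(cache_bytes, line_bytes, ways)

CACHE = 4 * 1024 * 1024  # 4 MB cache
LINE = 64                # 64 B cache lines

print(num_sets(CACHE, LINE, 1))      # direct-mapped: one line per set
print(num_sets(CACHE, LINE, 8))      # 8-way set-associative
print(num_sets(CACHE, LINE, 65536))  # fully associative: a single set
```

With one way the block has exactly one possible cache line (direct-mapped); with as many ways as cache lines there is one set and the block can go anywhere (fully associative).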
Effect of Cache Parameters
Block Replacement
When bringing in new cache lines, an existing entry has to be evicted:
Least Recently Used (LRU)
● Evict cache line whose last access is longest ago.
→ Least likely to be needed any time soon.
First In First Out (FIFO)
● Often behaves similarly to LRU.
● But easier to implement.
Random
● Pick a random cache line to evict.
● Very simple to implement in hardware.
Replacement has to be decided in hardware and fast.
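The difference between LRU and FIFO can be made concrete with a small software simulation of a single cache set. This is only an illustrative sketch (hardware does not use Python dictionaries); the trace and the function names are made up for the example.

```python
from collections import OrderedDict, deque

def misses_lru(trace, capacity):
    """Count misses when the least recently used block is evicted."""
    cache = OrderedDict()
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)       # mark as most recently used
        else:
            misses += 1
            if len(cache) == capacity:
                cache.popitem(last=False)  # evict least recently used block
            cache[block] = True
    return misses

def misses_fifo(trace, capacity):
    """Count misses when the block that was loaded first is evicted."""
    cache, order = set(), deque()
    misses = 0
    for block in trace:
        if block not in cache:
            misses += 1
            if len(cache) == capacity:
                cache.discard(order.popleft())  # evict oldest loaded block
            cache.add(block)
            order.append(block)
    return misses

trace = [1, 2, 3, 1, 4, 1, 5]
print(misses_lru(trace, 3), misses_fifo(trace, 3))
```

On this trace FIFO evicts block 1 although it was just re-used, so it incurs one more miss than LRU; on many other traces the two behave identically.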
What Happens on a Write?
To implement memory writes, CPU makers have two options:
Write Through
● Data is directly written to lower-level memory (and to the cache).
→ Writes will stall the CPU.*
→ Greatly simplifies data coherency.
Write Back
● Data is only written into the cache.
● A dirty flag marks modified cache lines (uses a status field).
→ May reduce traffic to lower-level memory.
→ Need to write on eviction of dirty cache lines.
Modern processors usually implement write back.
*Write buffers can be used to overcome this problem.
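The traffic difference between the two write policies can be sketched by counting how many writes reach lower-level memory. The model below is a simplification under stated assumptions: a tiny LRU-managed cache, write-allocate on a miss, and a final flush of dirty lines; the function name and trace are hypothetical.

```python
from collections import OrderedDict

def memory_writes(write_trace, capacity, write_back):
    """Count writes that reach lower-level memory for a trace of written line ids."""
    cache = OrderedDict()  # line id -> dirty flag, kept in LRU order
    writes = 0
    for line in write_trace:
        if line in cache:
            cache.move_to_end(line)
        else:
            if len(cache) == capacity:
                _, dirty = cache.popitem(last=False)
                if dirty:
                    writes += 1  # dirty victim must be written on eviction
            cache[line] = False
        if write_back:
            cache[line] = True   # only mark the cached line as dirty
        else:
            writes += 1          # write through: every write goes to memory
    # Flush lines that are still dirty at the end of the trace.
    writes += sum(1 for dirty in cache.values() if dirty)
    return writes

trace = [0, 0, 0, 1, 0, 2]
print(memory_writes(trace, 2, write_back=False))
print(memory_writes(trace, 2, write_back=True))
```

Repeated writes to the same line are absorbed by the cache under write back, which is where the reduction in memory traffic comes from.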
Putting It All Together
To compensate for slow memory, systems use caches.
● Typically multiple levels of caching (memory hierarchy).
● Caches are organized into cache lines.
● Set associativity: A memory block can only go into a small number of
cache lines (most caches are set-associative).
Systems will benefit from locality of data and code.
Performance (SPECint 2000)
Why do DBSs show such poor cache behavior?
Poor code locality:
● Polymorphic functions
(E.g., resolve attribute types for each processed tuple individually.)
Poor data locality:
● Database systems are designed to navigate through large data volumes
quickly.
● Navigating an index tree, e.g., by design means “randomly” visiting any of the (many) child nodes.
Processing Models
Or how to improve the instruction cache effectiveness?
Processing Models
There are basically two alternative processing models that are used in
modern DBMSs:
● Tuple-at-a-time volcano model [Graefe, 1990]
○ Operator requests next tuple, processes it, and passes it to the
next operator
● Operator-at-a-time bulk processing [Manegold et al., 2009]
○ Operator consumes its input and materializes its output
Tuple-At-A-Time Processing
Most systems implement the Volcano
iterator model:
● Operators request tuples from
their input using next().
● Data is processed one tuple at a time.
● Each operator keeps its own
state.
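The iterator interface described above can be sketched with three toy operators. This is a minimal illustration, not the code of any particular system: the class names `Scan`, `Select`, and `Project` and the convention that `next()` returns `None` at end of input are assumptions.

```python
class Scan:
    """Leaf operator: returns one stored tuple per next() call."""
    def __init__(self, rows):
        self.rows, self.pos = rows, 0  # operator-local state: a cursor

    def next(self):
        if self.pos >= len(self.rows):
            return None                # end of input
        row = self.rows[self.pos]
        self.pos += 1
        return row

class Select:
    """Pulls tuples from its child until one satisfies the predicate."""
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate

    def next(self):
        while True:
            row = self.child.next()
            if row is None or self.predicate(row):
                return row

class Project:
    """Applies a function to each tuple pulled from its child."""
    def __init__(self, child, fn):
        self.child, self.fn = child, fn

    def next(self):
        row = self.child.next()
        return None if row is None else self.fn(row)

rows = [(1, 10), (2, 20), (3, 30)]
plan = Project(Select(Scan(rows), lambda r: r[1] >= 20), lambda r: (r[0],))
result = list(iter(plan.next, None))   # call next() until it returns None
print(result)
```

Every result tuple requires a chain of `next()` calls through the whole plan, which is the per-tuple call overhead this processing model is known for.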
● Pipeline-parallelism
→ Data processing can start although data does not fully reside in
main memory
→ Small intermediate results
● All operators in a plan run tightly interleaved.
→ Their combined instruction footprint may be large.
→ Instruction cache misses.
● Operators constantly call each other’s functionality.
→ Large function call overhead.
● The combined state may be too large to fit into caches.
○ E.g., hash tables, cursors, partial aggregates.
→ Data cache misses.
Example: TPC-H Query Q1 on MySQL
● Scan query with arithmetic and a bit of aggregation.
Source: MonetDB/X100: Hyper-Pipelining Query Execution. [Boncz et al., 2005]
Observations
● Only single tuple processed in each call; millions of calls.
● Only 10 % of the time spent on actual query task.
● Low instructions-per-cycle (IPC) ratio.
● Much time spent on field access (e.g., rec_get_nth_field()).
○ Polymorphic operators
● Single-tuple functions hard to optimize (by compiler).
→ Low instructions-per-cycle ratio.
→ Vector instructions (SIMD) hardly applicable.
● Function call overhead (e.g., Item_func_plus::val()).
vs. 3 instr. for load/add/store assembly*
* Depends on underlying hardware
Operator-At-A-Time Processing
● Operators consume and produce
full tables.
● Each (sub-)result is fully
materialized (in memory).
● No pipelining (rather a sequence
of statements).
● Each operator runs exactly once.
Function call overhead is now replaced by extremely tight loops.
Example: batval_int_add(···)
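The shape of such an operator can be sketched as one loop over whole columns. The code below is an illustration only and not the implementation of batval_int_add; `col_add`, `col_select_ge`, and `col_fetch` are hypothetical names, and Python itself will not show the performance effect, only the structure.

```python
def col_add(a, b):
    """Add two integer columns in a single tight loop and materialize the result."""
    out = [0] * len(a)
    for i in range(len(a)):
        out[i] = a[i] + b[i]
    return out

def col_select_ge(col, threshold):
    """Return the positions of all values >= threshold (a materialized result)."""
    return [i for i, v in enumerate(col) if v >= threshold]

def col_fetch(col, positions):
    """Materialize the values of a column at the given positions."""
    return [col[i] for i in positions]

price = [10, 20, 30, 40]
tax = [1, 2, 3, 4]
gross = col_add(price, tax)            # operator 1 runs once over the full column
hits = col_select_ge(gross, 30)        # operator 2 consumes the full intermediate
print(col_fetch(gross, hits))
```

Each operator runs exactly once over its complete input, with no per-tuple function calls or type resolution inside the loop; the price is that every intermediate column is fully materialized.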
Operator-At-A-Time Consequences
● Parallelism: Inter-operator and intra-operator
● Function call overhead is now replaced by extremely tight loops that
○ conveniently fit into instruction caches,
○ can be optimized effectively by modern compilers
→ loop unrolling
→ vectorization (use of SIMD instructions)
○ can leverage modern CPU features (hardware prefetching).
● Function calls are now out of the critical code path.
● No per-tuple field extraction or type resolution.
○ Operator specialization, e.g., for every possible type.
○ Implemented using macro expansion.
○ Possible due to column-based storage.
![Page 56: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/56.jpg)
Source: MonetDB/X100: Hyper-Pipelining Query Execution. [Boncz et al., 2005]
![Page 57: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/57.jpg)
Tuple-At-A-Time vs. Operator-At-A-Time
The operator-at-a-time model is a two-edged sword:
😃 Cache-efficient with respect to code and operator state.
😃 Tight loops, optimizable code.
😡 Data won’t fully fit into cache.
→ Repeated scans will fetch data from memory over and over.
→ Strategy falls apart when intermediate results no longer fit into main memory.
Can we aim for the middle ground between the two extremes?
![Page 58: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/58.jpg)
Vectorized Execution Model
Idea:
● Use Volcano-style iteration, but:
● for each next() call, return a large number of tuples
→ a so-called “vector”
Choose vector size
● large enough to compensate for iteration overhead (function calls, instruction cache misses, ...), but
● small enough to not thrash data caches.
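The idea can be sketched with Python generators, where each step of the iterator hands over one vector instead of one tuple. This is only an illustration under assumptions: the operator names `scan` and `select` and the list-based column are hypothetical and not taken from any particular system.

```python
def scan(column, vector_size):
    """Leaf operator: every next() yields one vector of at most vector_size values."""
    for start in range(0, len(column), vector_size):
        yield column[start:start + vector_size]


def select(child, predicate):
    """Selection operator: one next() call on the child per vector, not per tuple."""
    for vector in child:
        # Tight loop over a single vector; the iteration overhead is paid once per vector.
        qualifying = [value for value in vector if predicate(value)]
        if qualifying:
            yield qualifying


column = list(range(10))
plan = select(scan(column, vector_size=4), lambda value: value % 2 == 0)
result = list(plan)
```

With `vector_size=1` the plan degenerates to tuple-at-a-time processing; with `vector_size=len(column)` it behaves like operator-at-a-time processing.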
![Page 60: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/60.jpg)
Vector Size ↔ Instruction Cache Effectiveness
● Vectorized execution quickly compensates for iteration overhead.
● Vectors of 1000 tuples should conveniently fit into caches.
![Page 61: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/61.jpg)
Comparison of Execution Models
![Page 62: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/62.jpg)
Storage Models
Or how to improve the data cache effectiveness?
![Page 63: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/63.jpg)
Storage Models
There are basically two alternative storage models that are used in modern relational DBMSs:
● Row Stores
● Column Stores
![Page 64: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/64.jpg)
Row Stores
a.k.a. row-wise storage or n-ary storage model, NSM:
![Page 65: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/65.jpg)
Column Stores
a.k.a. column-wise storage or decomposition storage model, DSM:
![Page 66: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/66.jpg)
The Effect on Query Processing
Consider, e.g., a selection query:
This query typically involves a full table scan.
![Page 67: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/67.jpg)
A Full Table Scan in a Row Store
In a row-store, all rows of a table are stored sequentially on a
database page.
With every access to an l_shipdate field, we load a large amount of irrelevant information into the cache.
![Page 70: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/70.jpg)
A “Full Table Scan” in a Column Store
In a column store, all values of one column are stored sequentially
on a database page.
All data loaded into caches by an “l_shipdate scan” is now actually relevant for the query.
![Page 72: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/72.jpg)
Column Store Advantages
● All data loaded into caches by an “l_shipdate scan” is now actually relevant for the query.
→ Less data has to be fetched from memory.
→ The cost of a fetch is amortized over more tuples.
→ If we’re really lucky, the full (l_shipdate) data might now even fit into caches.
● The same arguments hold, by the way, also for disk-based systems.
● Additional benefit: Data compression might work better.
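A back-of-the-envelope calculation makes the "less data has to be fetched" argument concrete. All numbers below are assumptions for illustration: 64-byte cache lines, a 128-byte row, and a 4-byte l_shipdate value. Tuples are assumed to be aligned so that the field never straddles two cache lines.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def lines_fetched_row_store(n_tuples, tuple_width, line_size=64):
    """Cache lines touched when reading one field per tuple in a row layout."""
    if tuple_width >= line_size:
        # Every tuple's field lives in a different cache line.
        return n_tuples
    # Several narrow tuples share a line, so the whole table is streamed through.
    return ceil_div(n_tuples * tuple_width, line_size)


def lines_fetched_column_store(n_tuples, field_width, line_size=64):
    """Cache lines touched when reading one densely stored column."""
    return ceil_div(n_tuples * field_width, line_size)


n = 1_000_000
row_lines = lines_fetched_row_store(n, tuple_width=128)
column_lines = lines_fetched_column_store(n, field_width=4)
```

Under these assumptions the row layout touches one cache line per tuple, while the column layout packs 16 values into every line.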
![Page 73: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/73.jpg)
Data Compression - Requirements
● Lossless compression
→ otherwise we generate data errors
● Lightweight (de-)compression
→ otherwise (de-)compression overhead would outweigh the potential performance gains
● Enable processing of compressed values
→ no additional overhead for decompression
![Page 74: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/74.jpg)
Classification of Compression Techniques
● General idea: Replace data by a representation that needs fewer bits than the original data
● Granularity:
○ Attribute values, tuples, tables, pages
○ Index structures
● Code length:
○ Fixed code length: All values are encoded with same number of
bits
○ Variable code length: Number of bits differs (e.g., correlate
number of used bits with value frequency; Huffman Encoding)
![Page 75: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/75.jpg)
Dictionary Encoding
● Use a dictionary that contains data values and their surrogates
● Surrogate can be derived from values’ dictionary position
● Applicable to row- and column-oriented data layouts
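A minimal sketch of dictionary encoding, assuming a sorted dictionary so that the surrogate is simply the value's position in it. The function names and the sample city column are hypothetical.

```python
def dict_encode(values):
    """Return (dictionary, codes); each code is the position of the value in the dictionary."""
    dictionary = sorted(set(values))
    position = {value: index for index, value in enumerate(dictionary)}
    codes = [position[value] for value in values]
    return dictionary, codes


def dict_decode(dictionary, codes):
    """Map surrogates back to the original values."""
    return [dictionary[code] for code in codes]


cities = ["Berlin", "Magdeburg", "Berlin", "Dortmund"]
dictionary, codes = dict_encode(cities)
```

Because the dictionary is sorted here, comparing two surrogates gives the same result as comparing the original values, which is one way to process compressed values directly.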
![Page 76: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/76.jpg)
Bit Packing
● Surrogate values do not have to be multiples of one byte wide
● Example: 16 distinct values can be stored using 4 bits per surrogate → 2 values per byte
→ Processing of compressed values is not straightforward
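The 4-bit example can be sketched as follows. The helper names `pack` and `unpack` and the little-endian bit order are assumptions chosen for illustration; real systems typically operate on machine words rather than on one large integer.

```python
def pack(codes, bits):
    """Pack fixed-width surrogates densely; code i occupies bits [i*bits, (i+1)*bits)."""
    buffer = 0
    for index, code in enumerate(codes):
        if not 0 <= code < (1 << bits):
            raise ValueError(f"code {code} does not fit into {bits} bits")
        buffer |= code << (index * bits)
    n_bytes = (len(codes) * bits + 7) // 8
    return buffer.to_bytes(n_bytes, "little")


def unpack(data, bits, count):
    """Extract count surrogates of the given width by shifting and masking."""
    buffer = int.from_bytes(data, "little")
    mask = (1 << bits) - 1
    return [(buffer >> (index * bits)) & mask for index in range(count)]


codes = [3, 15, 0, 9]
packed = pack(codes, bits=4)
```

The shift-and-mask step in `unpack` is the extra work that makes processing bit-packed values less direct than processing byte-aligned ones.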
![Page 77: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/77.jpg)
Run Length Encoding
● Reduce size of sequences of same value
● Store the value and an indicator about the sequence length
● Applicable to column-oriented data layouts
● Sorting can further improve compression effectiveness
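A small sketch of run length encoding that stores each run as a (value, length) pair. The function names are hypothetical. The example also shows why sorting helps: the sorted column collapses into fewer runs.

```python
def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [(value, length) for value, length in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    values = []
    for value, length in runs:
        values.extend([value] * length)
    return values


column = ["a", "a", "b", "a", "a", "a"]
unsorted_runs = rle_encode(column)
sorted_runs = rle_encode(sorted(column))
```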
![Page 78: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/78.jpg)
Common Value Suppression
● A common value is scattered across a column (e.g., null)
● Use a data structure that indicates whether a common value is
stored at a given row index or not
○ Yes: Common value is stored here
○ No: Lookup value in the dictionary (using prefix sum)
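One possible reading of this scheme, sketched under assumptions: a flag per row marks the common value, the remaining values are stored densely, and a prefix sum over the flags gives the position of a row's value in that dense store. The names `cvs_encode` and `cvs_lookup` are hypothetical, and the dense list stands in for the dictionary mentioned above.

```python
from itertools import accumulate


def cvs_encode(values, common=None):
    """Split a column into common-value flags, a prefix sum, and the remaining values."""
    is_common = [value == common for value in values]
    exceptions = [value for value in values if value != common]
    # prefix[i] = number of non-common values stored before row i
    prefix = [0] + list(accumulate(0 if flag else 1 for flag in is_common))
    return is_common, prefix, exceptions


def cvs_lookup(encoded, row, common=None):
    """Return the value at a row index without decompressing the whole column."""
    is_common, prefix, exceptions = encoded
    if is_common[row]:
        return common
    return exceptions[prefix[row]]


column = [None, 7, None, None, 3, None]
encoded = cvs_encode(column)
```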
![Page 79: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/79.jpg)
Bit-Vector Encoding
● Suitable for columns that have a low number of distinct values
● Use one bit string for every distinct column value that indicates whether the value is present at a given row index or not
● Length of bit string equals number of tuples
● Used in Bitmap-Indexes
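A sketch of bit-vector encoding that uses Python integers as bit strings, with bit i set when the value occurs in row i. The function names and the sample flag column are hypothetical.

```python
def bitvector_encode(values):
    """Build one bit string per distinct value; bit i is set if row i holds that value."""
    bitmaps = {}
    for row, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row)
    return bitmaps


def matching_rows(bitmap, n_rows):
    """List the row indexes whose bit is set."""
    return [row for row in range(n_rows) if (bitmap >> row) & 1]


flags = ["R", "A", "R", "N", "A"]
bitmaps = bitvector_encode(flags)
# A disjunctive predicate (flag = 'R' OR flag = 'N') becomes a bitwise OR.
r_or_n = matching_rows(bitmaps["R"] | bitmaps["N"], len(flags))
```

Combining predicates with bitwise operations is what makes this representation attractive for bitmap indexes.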
![Page 80: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/80.jpg)
Delta Coding
● Store the difference to the preceding value instead of the original value
● Applicable to column-oriented data layouts
● Sorting can further improve compression effectiveness
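A minimal sketch of delta coding on an integer column: the first value is kept as is, and every later entry stores the difference to its predecessor. The function names are hypothetical.

```python
from itertools import accumulate


def delta_encode(values):
    """Keep the first value, then store differences to the preceding value."""
    if not values:
        return []
    return [values[0]] + [current - previous for previous, current in zip(values, values[1:])]


def delta_decode(deltas):
    """A running sum over the deltas restores the original values."""
    return list(accumulate(deltas))


column = [100, 103, 104, 110]
deltas = delta_encode(column)
```

On sorted data the deltas are small and non-negative, so they need fewer bits than the original values.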
![Page 81: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/81.jpg)
Frequency Compression
● Idea: Exploit data skew
● Principle:
○ More frequent values are encoded using fewer bits
○ Less frequent values are encoded using more bits
● Use prefix codes (e.g., Huffman Encoding)
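The principle can be illustrated with a compact Huffman code construction. This is a textbook-style sketch, not the encoding of any specific DBMS. The function name `huffman_codes` is hypothetical, and codes are kept as strings of "0" and "1" for readability.

```python
import heapq
from collections import Counter


def huffman_codes(values):
    """Assign prefix-free bit strings; frequent values receive shorter codes."""
    frequencies = Counter(values)
    if not frequencies:
        return {}
    if len(frequencies) == 1:
        return {next(iter(frequencies)): "0"}
    # Heap entries: (frequency, tie_breaker, {value: code_so_far}).
    heap = [(count, index, {value: ""})
            for index, (value, count) in enumerate(frequencies.items())]
    heapq.heapify(heap)
    tie_breaker = len(heap)
    while len(heap) > 1:
        count_a, _, codes_a = heapq.heappop(heap)
        count_b, _, codes_b = heapq.heappop(heap)
        # Merge the two rarest subtrees and prepend one more bit to their codes.
        merged = {value: "0" + code for value, code in codes_a.items()}
        merged.update({value: "1" + code for value, code in codes_b.items()})
        heapq.heappush(heap, (count_a + count_b, tie_breaker, merged))
        tie_breaker += 1
    return heap[0][2]


column = list("aaaaaaaabbbbccdd")
codes = huffman_codes(column)
total_bits = sum(len(codes[value]) for value in column)
```

For this skewed column the variable-length code needs 28 bits in total, compared with 32 bits for a fixed 2-bit code.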
![Page 82: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/82.jpg)
Frequency Partitioning
● Developed for IBM’s BLINK project [Raman et al., 2008]
● Similar to frequency compression of tuples in a row-oriented data layout
● But tuples are partitioned according to their column values
→ Overhead is reduced, as the code length is fixed within one partition
![Page 83: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/83.jpg)
Frequency Partitioning: Principle
![Page 84: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/84.jpg)
Data Compression - Summary
● General idea: Replace data by a representation that needs fewer bits than the original data
● Discussed approaches:
○ Fixed code length: Dictionary Encoding, RLE, Common Value
Suppression
○ Variable code length: Delta Coding, Frequency Compression,
Frequency Partitioning
● Improvements: bit packing, partitioning
● Benefits for main-memory DBMSs:
○ Reduced storage requirements
○ Better memory bandwidth utilization
![Page 85: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/85.jpg)
Column Store Trade Offs
Tuple recombination can cause considerable cost.
● Need to perform many joins.
● Workload-dependent trade-off.
![Page 86: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/86.jpg)
An example: Binary Association Tables in MonetDB
MonetDB makes this explicit in its data model.
● All tables in MonetDB have two columns (“head” and “tail”).
● Each column yields one binary association table (BAT).
● Object identifiers (oids) identify matching entries (BUNs).
● Often, oids can be implemented as virtual oids (voids).
→ Not explicitly materialized in memory.
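The following sketch illustrates the idea of a virtual head column. It is not MonetDB code: the class `VoidBat`, its `seqbase` parameter, and the sample columns are hypothetical names chosen for illustration. Only the tail values are stored, and the oid of an entry is derived from its position.

```python
class VoidBat:
    """Hypothetical BAT whose head is a dense, virtual oid column."""

    def __init__(self, tail, seqbase=0):
        self.tail = list(tail)   # only the tail is materialized
        self.seqbase = seqbase   # oid of the first entry

    def find(self, oid):
        """Positional lookup: the oid maps directly to an array index."""
        index = oid - self.seqbase
        if not 0 <= index < len(self.tail):
            raise KeyError(oid)
        return self.tail[index]

    def buns(self):
        """Produce the (head, tail) pairs on demand."""
        return [(self.seqbase + index, value) for index, value in enumerate(self.tail)]


def reconstruct(oid, *bats):
    """Recombine one tuple from several BATs that share the same oids."""
    return tuple(bat.find(oid) for bat in bats)


name = VoidBat(["Ann", "Bob", "Cem"], seqbase=100)
age = VoidBat([31, 42, 27], seqbase=100)
row = reconstruct(101, name, age)
```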
![Page 87: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/87.jpg)
Materialization Strategies
● Recall: In a column-oriented data layout each column is stored separately
● Consider, e.g., a selection query:
→ Query accesses and returns values of two columns
→ Materialization (tuple reconstruction) during query processing necessary
![Page 88: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/88.jpg)
Early Materialization
Reconstruct tuples as soon as possible
![Page 89: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/89.jpg)
Late Materialization
Postpone tuple reconstruction to the latest possible time
![Page 90: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/90.jpg)
Advantages of Early and Late Materialization
Early Materialization (EM):
● Reduces access cost if one column has to be accessed multiple times during query processing
↗ [Abadi et al., 2007]
Late Materialization (LM):
● Reduces the number of tuples to reconstruct
● LM allows processing of columns as long as possible
→ Processing of compressed data
→ LM improves cache effectiveness
↗ [Abadi et al., 2008]
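The two strategies can be contrasted on a toy selection over two columns. The data, the column `l_quantity`, and the predicates are assumptions made up for this sketch. Both functions return the same result and differ only in when tuples are reconstructed.

```python
# Hypothetical column data; both columns share the same row order.
l_shipdate = [19980101, 19970615, 19980301, 19960101, 19980715]
l_quantity = [10, 45, 30, 50, 5]


def early_materialization(dates, quantities, min_date, min_quantity):
    """Reconstruct all tuples first, then filter the tuples."""
    tuples = list(zip(dates, quantities))
    return [(date, quantity) for date, quantity in tuples
            if date >= min_date and quantity >= min_quantity]


def late_materialization(dates, quantities, min_date, min_quantity):
    """Filter column by column on position lists, reconstruct only qualifying tuples."""
    positions = [row for row, date in enumerate(dates) if date >= min_date]
    positions = [row for row in positions if quantities[row] >= min_quantity]
    return [(dates[row], quantities[row]) for row in positions]


early = early_materialization(l_shipdate, l_quantity, 19980101, 10)
late = late_materialization(l_shipdate, l_quantity, 19980101, 10)
```

In the late variant only two of the five tuples are ever reconstructed, and the second column is read only at positions that survived the first predicate.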
![Page 91: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/91.jpg)
Conclusion
● Row stores store complete tuples sequentially on a database page
● Column stores store all values of one column sequentially on a database page
● Depending on the workload, either column stores or row stores are more advantageous
○ Tuple reconstruction is an overhead in column stores
○ Analytical workloads that process few columns at a time benefit from column stores
→ No single data storage approach is optimal for all possible workloads
![Page 92: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/92.jpg)
Examples of systems
![Page 93: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/93.jpg)
Take home messages
● Modern hardware offers performance improvements for DB applications → disk vs. main-memory access speed
● Rethinking the architecture of DBMSs to adapt them to changes in hardware pays off → single- vs. multi-threaded OLTP engines
● New DBMS architectures mean that we have to
○ solve old problems again → durability and availability
○ optimize for different things → cache effectiveness
![Page 94: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/94.jpg)
References I
Abadi, D. J., Madden, S., and Hachem, N. (2008). Column-stores vs. row-stores: how different are they really? In SIGMOD, pages 967–980.
Abadi, D. J., Myers, D. S., DeWitt, D. J., and Madden, S. (2007). Materialization Strategies in a Column-Oriented DBMS. In ICDE, pages 466–475.
Boncz, P. A., Zukowski, M., and Nes, N. (2005). MonetDB/X100: Hyper-Pipelining Query Execution. In CIDR, pages 225–237.
Copeland, G. P. and Khoshafian, S. N. (1985). A decomposition storage model. In SIGMOD, pages 268–279.
Drepper, U. (2007). What Every Programmer Should Know About Memory.
Graefe, G. (1990). Encapsulation of Parallelism in the Volcano Query Processing System. In SIGMOD, pages 102–111.
![Page 95: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/95.jpg)
References II
Harizopoulos, S., Abadi, D. J., Madden, S., and Stonebraker, M. (2008). OLTP through the looking glass, and what we found there. In SIGMOD, pages 981–992.
Hennessy, J. L. and Patterson, D. A. (2006). Computer Architecture: A Quantitative Approach. Morgan Kaufmann, 4th edition.
Hennessy, J. L. and Patterson, D. A. (2012). Computer Architecture: A Quantitative Approach. Morgan Kaufmann, 5th edition.
Manegold, S., Kersten, M. L., and Boncz, P. (2009). Database Architecture Evolution: Mammals flourished long before Dinosaurs became extinct. PVLDB, 2(2):1648–1653.
Raman, V., Attaluri, G., Barber, R., Chainani, N., Kalmuk, D., KulandaiSamy, V., Leenstra, J., Lightstone, S., Liu, S., Lohman, G. M., Malkemus, T., Mueller, R., Pandis, I., Schiefer, B., Sharpe, D., Sidle, R., Storm, A., and Zhang, L. (2013). DB2 with BLU Acceleration: So Much More Than Just a Column Store. PVLDB, 6(11):1080–1091.
![Page 96: Faculty of Computer Science Database and Software Engineering Group Main-Memory ... · 2020-03-03 · The New Bottleneck: Memory Access David Broneske | Main-Memory Database Management](https://reader035.vdocument.in/reader035/viewer/2022070705/5e94efdbae07726f0660f6ef/html5/thumbnails/96.jpg)
References III
Raman, V. and Swart, G. (2006). How to Wring a Table Dry: Entropy Compression of Relations and Querying of Compressed Relations. In VLDB, pages 858–869.
Raman, V., Swart, G., Qiao, L., Reiss, F., Dialani, V., Kossmann, D., Narang, I., and Sidle, R. (2008). Constant-Time Query Processing. In ICDE, pages 60–69.
Stonebraker, M., Madden, S., Abadi, D. J., Harizopoulos, S., Hachem, N., and Helland, P. (2007). The end of an architectural era: (it’s time for a complete rewrite). In VLDB, pages 1150–1160.
Zukowski, M. (2009). Balancing Vectorized Query Execution with Bandwidth-Optimized Storage. PhD thesis, CWI Amsterdam.