Query Processing and Optimization in Modern Database Systems · 2017. 3. 13.

  • Query Processing and Optimization in Modern Database Systems

    Viktor Leis

  • Architecture of Traditional RDBMSs

    feature                technique
    transaction isolation  locking (2PL)
    synchronization        latching (“lock coupling”)
    large data sets        buffer management
    durability             ARIES-style logging
    indexing               B+tree
    storage                slotted pages (row-wise)
    SQL                    iterator model (interpreter)
    parallelization        Exchange operators
    query optimization     DP, indep. assumption

    - optimizing (random) disk I/O operations

  • Traditional RDBMSs on Modern Hardware

    feature                technique                     overhead [1]
    transaction isolation  locking (2PL)                 16%
    synchronization        latching (“lock coupling”)    14%
    large data sets        buffer management             35%
    durability             ARIES-style logging           12%
    indexing               B+tree
    storage                slotted pages (row-wise)
    SQL                    iterator model (interpreter)
    parallelization        Exchange operators
    query optimization     DP, indep. assumption

    [1] OLTP Through the Looking Glass (Harizopoulos et al., SIGMOD 2008)

  • Modern Database Systems

    - OLAP: column stores (Vectorwise, Vertica, Microsoft Apollo, IBM BLU)
    - OLTP: main-memory systems (e.g., Microsoft Hekaton, VoltDB)
    - OLAP & OLTP: HANA, HyPer

  • HyPer in 2017

    feature                HyPer in 2017              contributions
    transaction isolation  MVCC, precision locking
    synchronization        -                          Part I
    large data sets        -
    durability             physiological logging
    indexing               Adaptive Radix Tree        Master’s [ICDE 2013]
    storage                Data Blocks
    SQL                    LLVM compilation
    parallelization        morsel-driven parallelism  Part II
    query optimization     DP, indep. assumption      Part III

  • Part I: Synchronization on Multi-Core CPUs

    ICDE 2014, TKDE 2016, DaMoN 2016

  • Synchronization

    - default index structure in HyPer: Adaptive Radix Tree
    - latch acquisition causes cache misses

    [Figure: throughput (M operations/second, 25 to 100) vs. number of threads (5 to 20) for "no synchronization" and "lock coupling"]

    - this explains single-threaded databases (VoltDB, HyPer 2011)

  • Hardware Transactional Memory

    - recent feature offered by Intel CPUs (from Haswell)
    + the easiest way to synchronize data structures
    + often very good scalability
    − not yet widespread
    − scalability issues can be hard to debug

  • Optimistic Lock Coupling

    - idea: writers acquire latches (only on modified nodes)
    - readers validate accesses using version counters (restart if necessary)
    + very general technique
    + easy to use
    − may lead to restarts
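The version-counter validation described on this slide can be sketched for a single node as follows. This is a minimal illustration, not HyPer's implementation: the class `OptimisticNode`, the odd/even version convention, and `optimistic_read` are assumptions made for the example, and a real index would apply the same check while traversing from node to node.

```python
import threading


class RestartException(Exception):
    """Raised when a reader detects a concurrent modification."""


class OptimisticNode:
    """Hypothetical node guarded by a version counter (odd means write-locked)."""

    def __init__(self, value):
        self.value = value
        self.version = 0                # even: unlocked, odd: locked by a writer
        self._latch = threading.Lock()  # taken by writers only

    def read_begin(self):
        version = self.version
        if version % 2 == 1:            # a writer is active, so the reader restarts
            raise RestartException()
        return version

    def read_validate(self, start_version):
        if self.version != start_version:
            raise RestartException()    # node changed during the read

    def write(self, new_value):
        with self._latch:
            self.version += 1           # odd: marks the node as locked
            self.value = new_value
            self.version += 1           # even again: publishes a new version


def optimistic_read(node, max_restarts=1000):
    """Read without taking a latch; retry if validation fails."""
    for _ in range(max_restarts):
        try:
            version = node.read_begin()
            value = node.value          # speculative read
            node.read_validate(version)
            return value
        except RestartException:
            continue
    raise RuntimeError("too many restarts")
```

Readers never write to shared memory in this scheme, which is the property that avoids the cache misses caused by latch acquisition.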

  • Read-Optimized Write Exclusion (ROWEX)

    - idea: writers acquire latches (on modified nodes)
    - writers ensure that reads are always safe
    + reads always succeed
    − more difficult than optimistic lock coupling (but easier than lock-free techniques)
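The idea that writers keep every intermediate state safe for readers can be illustrated with a sorted linked list. This is a simplified sketch under stated assumptions: `RowexSortedList` is a hypothetical structure, it uses one writer latch for the whole list instead of per-node latches, and it relies on a single reference assignment being atomic in CPython.

```python
import threading


class ListNode:
    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node


class RowexSortedList:
    """Hypothetical sorted list: writers are latched, readers never wait or restart."""

    def __init__(self):
        self.head = ListNode(float("-inf"))   # sentinel node
        self._write_latch = threading.Lock()  # simplification: one latch for all writers

    def insert(self, key):
        with self._write_latch:
            prev = self.head
            while prev.next is not None and prev.next.key < key:
                prev = prev.next
            new_node = ListNode(key, prev.next)  # fully initialized before it is visible
            prev.next = new_node                 # one reference store publishes the node

    def contains(self, key):
        # No latch and no validation: every state a reader can see is a valid list.
        node = self.head.next
        while node is not None and node.key < key:
            node = node.next
        return node is not None and node.key == key
```

The extra difficulty mentioned on the slide shows up as soon as an update needs more than one store, because the writer then has to order its stores so that each intermediate state is still readable.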

  • Conclusions

    [Figure: throughput (M operations/second, 25 to 100) vs. number of threads (5 to 20) for no synchronization, lock coupling, Opt. Lock Coupling, ROWEX, and HTM]

    - latching (does not scale), lock-free data structures (scalable but slow), and HTM (not widespread) have major problems
    - Optimistic Lock Coupling and ROWEX are scalable and practical

  • Part II: Intra-Query Parallelization on Multi-Core CPUs

    SIGMOD 2014, VLDB 2015

  • Motivation: Many, Many Cores

    [Figure: cores per CPU (1 to 30) by year (2000 to 2016); data points labeled NetBurst (Foster), NetBurst (Paxville), Core (Kentsfield), Core (Lynnfield), Nehalem (Beckton), Nehalem (Westmere EX), Sandy Bridge EP, Ivy Bridge EP, Ivy Bridge EX, Haswell EP, Broadwell EP, Broadwell EX, Skylake EP]

  • Parallel Query Processing in HyPer

    - break input into work units (“morsels”)
    - worker threads grab morsels dynamically (“work stealing”)
    - # worker threads = # hardware threads
    - requires all operators to be aware of parallelism
    - better scalability than Exchange operators
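The dynamic assignment of morsels to worker threads can be sketched as follows. This is an illustration only: `run_morsel_driven` and its parameters are hypothetical names, the dispatcher is a single shared counter rather than HyPer's NUMA-aware scheduler, and Python threads do not run CPU-bound work in parallel because of the global interpreter lock.

```python
import os
import threading


def run_morsel_driven(data, process_morsel, morsel_size=1000, num_workers=None):
    """Split `data` into morsels and let workers grab the next one on demand."""
    num_workers = num_workers or os.cpu_count() or 1  # one worker per hardware thread
    next_start = 0
    dispatch_lock = threading.Lock()
    partial_results = [[] for _ in range(num_workers)]  # one result list per worker

    def grab_morsel():
        nonlocal next_start
        with dispatch_lock:
            if next_start >= len(data):
                return None                 # input exhausted
            start = next_start
            next_start += morsel_size
        return data[start:start + morsel_size]

    def worker(worker_id):
        # Fast workers simply come back for more morsels, which balances the load.
        while (morsel := grab_morsel()) is not None:
            partial_results[worker_id].append(process_morsel(morsel))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [r for per_worker in partial_results for r in per_worker]
```

For example, `sum(run_morsel_driven(list(range(10_000)), sum, num_workers=4))` adds up the per-morsel sums computed by four workers.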

  • Example 1: Hash Join

    Phase 1: process T morsel-wise and store NUMA-locally
    Phase 2: scan NUMA-local storage area and insert pointers into HT

    [Figure: morsels of T; storage areas of the blue, red, and green cores; each storage area is scanned and the pointers are inserted into the global hash table]
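The two build phases can be sketched sequentially as follows. This is an assumption-laden illustration: `build_hash_table` and `probe` are hypothetical helpers, the workers are simulated by a plain loop, NUMA placement is not modeled, and a "pointer" is represented as an `(area_id, offset)` pair.

```python
from collections import defaultdict


def build_hash_table(morsels_per_worker, key_of):
    """morsels_per_worker: for each simulated worker, the list of morsels it processed."""
    # Phase 1: each worker materializes its morsels into its own storage area.
    storage_areas = []
    for morsels in morsels_per_worker:
        area = []
        for morsel in morsels:
            area.extend(morsel)
        storage_areas.append(area)

    # Phase 2: scan each storage area and insert references into one global hash table.
    hash_table = defaultdict(list)
    for area_id, area in enumerate(storage_areas):
        for offset, tup in enumerate(area):
            hash_table[key_of(tup)].append((area_id, offset))
    return storage_areas, hash_table


def probe(storage_areas, hash_table, key):
    """Follow the stored references back to the tuples with a matching key."""
    return [storage_areas[a][o] for a, o in hash_table.get(key, [])]
```

Because the tuples are fully materialized before phase 2 starts, the number of build tuples is known when the table is created, so the table can be sized once up front.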

  • Example 2: Window Functions

    select a, b, rank() over (partition by a order by b) from r

    1. hash partitioning (thread-local)
    2. combine
    3. sort/evaluation
       3.1. inter-partition parallelism
       3.2. intra-partition parallelism

    [Figure: the steps illustrated for thread 1 and thread 2]
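The three steps for `rank() over (partition by a order by b)` can be sketched sequentially as follows. `rank_over_partitions` is a hypothetical helper: the threads are simulated by separate input lists, each distinct value of `a` forms its own partition (a dictionary stands in for hash partitioning), and the parallel variants of step 3 are not shown.

```python
from collections import defaultdict


def rank_over_partitions(thread_inputs):
    """thread_inputs: one list of (a, b) rows per simulated thread."""
    # Step 1: thread-local hash partitioning on a.
    local_partitions = []
    for rows in thread_inputs:
        parts = defaultdict(list)
        for a, b in rows:
            parts[a].append((a, b))
        local_partitions.append(parts)

    # Step 2: combine the thread-local partitions that share a key.
    combined = defaultdict(list)
    for parts in local_partitions:
        for a, rows in parts.items():
            combined[a].extend(rows)

    # Step 3: sort each partition by b and evaluate rank().
    result = []
    for a, rows in combined.items():
        rows.sort(key=lambda row: row[1])
        rank, prev_b = 0, None
        for position, (_, b) in enumerate(rows, start=1):
            if position == 1 or b != prev_b:
                rank = position   # ties share a rank, and a gap follows the tie
                prev_b = b
            result.append((a, b, rank))
    return result
```

In step 3, partitions are independent of each other, which is what makes inter-partition parallelism straightforward; a single very large partition needs intra-partition parallelism instead.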

  • Scalability on 32-core System (TPC-H Queries)

    [Figure: one panel per TPC-H query (1 to 22); speedup over HyPer (0 to 40) vs. number of threads (1 to 64); systems: HyPer and Vectorwise]

  • Part III: Query Optimization

    VLDB 2016

  • Query Optimization

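    SELECT ... FROM R, S, T WHERE ...

    [Figure: the query is turned into a join tree over R, S, and T with HJ and INL operators; the optimizer components shown are cardinality estimation, cost model, and plan space enumeration]

    - Do we need a new architecture for query optimizers, too?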
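How the three components interact can be sketched with a small dynamic-programming enumerator. This is a toy under stated assumptions, not the optimizer of any particular system: `estimate_cardinality` multiplies selectivities as if they were independent, the cost of a plan is the sum of its estimated intermediate result sizes, cross products are allowed, and all names and numbers are made up for the example.

```python
from itertools import combinations


def estimate_cardinality(relations, base_cards, selectivities):
    """Cardinality estimation: independence assumption over all applicable join predicates."""
    card = 1.0
    for r in relations:
        card *= base_cards[r]
    for (r1, r2), sel in selectivities.items():
        if r1 in relations and r2 in relations:
            card *= sel
    return card


def best_join_order(base_cards, selectivities):
    """Plan space enumeration: dynamic programming over subsets of relations."""
    names = sorted(base_cards)
    best = {frozenset([r]): (0.0, r) for r in names}  # subset -> (cost, plan)
    for size in range(2, len(names) + 1):
        for subset in map(frozenset, combinations(names, size)):
            out_card = estimate_cardinality(subset, base_cards, selectivities)
            for left_size in range(1, size):
                for left in map(frozenset, combinations(sorted(subset), left_size)):
                    right = subset - left
                    # Cost model: cost of both inputs plus the size of this join's output.
                    cost = best[left][0] + best[right][0] + out_card
                    if subset not in best or cost < best[subset][0]:
                        best[subset] = (cost, (best[left][1], best[right][1]))
    return best[frozenset(names)]


# Hypothetical statistics for the query above.
cost, plan = best_join_order(
    {"R": 1000, "S": 100, "T": 10},
    {("R", "S"): 0.01, ("S", "T"): 0.1},
)
```

The enumerator can only be as good as its inputs: every cost it compares is derived from the estimated cardinalities.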

  • Join Order Benchmark

    - Internet Movie Database data set (4 GB)
    - much more challenging than synthetic benchmarks like TPC-H
    - 113 queries with 3 to 16 joins

  • Cardinality Estimation: PostgreSQL

    [Figure: PostgreSQL estimation error on a log scale, from 1e8 (underestimation) through 1 to 1e4 (overestimation), vs. number of joins (0 to 6); shown as 5th, 25th, median, 75th, and 95th percentiles]
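One way the independence assumption produces underestimates can be shown with a tiny, made-up table. The sketch uses two correlated selection predicates instead of joins to keep it short; the helper names and the data are assumptions made for the example.

```python
def estimate_with_independence(rows, predicates):
    """Multiply per-predicate selectivities as if the predicates were independent."""
    n = len(rows)
    estimate = float(n)
    for pred in predicates:
        estimate *= sum(1 for row in rows if pred(row)) / n
    return estimate


def true_cardinality(rows, predicates):
    """Count the rows that satisfy all predicates at once."""
    return sum(1 for row in rows if all(pred(row) for pred in predicates))


# Hypothetical table in which the city fully determines the country.
rows = [("Munich", "DE")] * 50 + [("Paris", "FR")] * 50
predicates = [lambda r: r[0] == "Munich", lambda r: r[1] == "DE"]

estimated = estimate_with_independence(rows, predicates)
actual = true_cardinality(rows, predicates)
error_factor = actual / estimated  # values above 1 mean underestimation
```

Each additional correlated predicate multiplies in another factor of this kind, so errors compound as more predicates or joins are added.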

  • Cardinality Estimation: Commercial Systems

    [Figure: one panel each for PostgreSQL, DBMS A, DBMS B, DBMS C, and HyPer; estimation error on a log scale, from 1e8 (underestimation) through 1 to 1e4 (overestimation), vs. number of joins (0 to 6); shown as 5th, 25th, median, 75th, and 95th percentiles]

  • Conclusions

    - query optimization is essential
    - most (random) join orders are bad
    - optimizers will find good plans for most queries
    - cardinality estimation is usually the reason for bad plans
    - cost model much less important (with memory-resident data)
    - relative plan quality decreases when more indexes are available
    - operators should not rely on estimates (if possible)

  • Future Work

    feature
    transaction isolation  MVCC, precision locking
    synchronization        Optimistic Lock Coupling
    large data sets        ?
    durability             ?
    indexing               Adaptive Radix Tree
    storage                Data Blocks
    SQL                    LLVM compilation
    parallelization        morsel-driven parallelism
    query optimization     index-based join sampling (CIDR 2017)