MySQL 8.0 latest updates: hash join and EXPLAIN ANALYZE
TRANSCRIPT
Copyright © 2019 Oracle and/or its affiliates.
Hash join and EXPLAIN ANALYZE
Norvald H. Ryeng
Software Development Senior Manager
MySQL Optimizer Team
October 1, 2019
MySQL 8.0.18 latest updates
Safe harbor statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions.
The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation.
Program agenda
1 Quick demo
2 How did we get here?
  Phase separation
  Volcano iterator model
3 Hash join
4 EXPLAIN ANALYZE
Quick demo
MySQL 8.0.18
How did we get here?
A long-term investment that finally pays off
https://twitter.com/vlad_mihalcea/status/1173842312398614528
MySQL refactoring: Separating phases
Started ~10 years ago
Considered finished now
A clear separation between query processing phases
  Fixed a large number of bugs
  Improved stability
  Faster feature development
  Fewer surprises and complications during development
[Diagram: SQL → Parse → Abstract syntax tree → Prepare (Resolve, Transform) → Logical plan → Optimize → Physical plan → Execute]
MySQL refactoring: Parsing and preparing
Still ongoing
Implemented piece by piece
Separating parsing and resolving phases
  Eliminate semantic actions that do too much
  Get a true bottom-up parser
  Makes it easier to extend with new SQL syntax
  Parsing doesn't have unintended side effects
Consistent name and type resolving
  Names resolved top-down
  Types resolved bottom-up
Transformations done in the prepare phase
  Bottom-up
MySQL features made possible because we invested in refactoring
CTEs
  Recursive and non-recursive CTEs
  Traverse hierarchies
  Write more readable SQL
LATERAL
  "For-each loops"
Window functions
  Aggregation, ranking, analytics
  Sliding windows
JSON
  JSON_TABLE
  JSON window functions
… and many, many more!
MySQL refactoring: Iterator executor
Volcano iterator model
  Possible because phases were separated
  Ongoing for ~1.5 years
  Much more modular executor
Common iterator interface for all operations
  Each operation is contained within an iterator
Able to put together plans in new ways
  Immediate benefit: Removes temporary tables in some cases
Join is just an iterator
  Nested loop join is just an iterator
  Hash join is just an iterator
  Your favorite join method is just an iterator
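A minimal Python sketch of a Volcano-style iterator interface, to make the "everything is an iterator" idea concrete. The class names echo the iterator names used on these slides, but the `init`/`read` methods, the dict-based rows and every implementation detail are assumptions for illustration; this is not MySQL's C++ code.

```python
from typing import Callable, Optional


class RowIterator:
    """Common interface shared by all operations (hypothetical API)."""

    def init(self) -> None:
        raise NotImplementedError

    def read(self) -> Optional[dict]:
        """Return the next row, or None when the input is exhausted."""
        raise NotImplementedError


class TableScanIterator(RowIterator):
    def __init__(self, rows: list):
        self.rows = rows
        self.pos = 0

    def init(self) -> None:
        self.pos = 0  # restart the scan

    def read(self) -> Optional[dict]:
        if self.pos >= len(self.rows):
            return None
        row = self.rows[self.pos]
        self.pos += 1
        return row


class FilterIterator(RowIterator):
    def __init__(self, child: RowIterator, predicate: Callable[[dict], bool]):
        self.child = child
        self.predicate = predicate

    def init(self) -> None:
        self.child.init()

    def read(self) -> Optional[dict]:
        # Pull rows from the child until one passes the predicate.
        while (row := self.child.read()) is not None:
            if self.predicate(row):
                return row
        return None


class NestedLoopJoinIterator(RowIterator):
    def __init__(self, outer: RowIterator, inner: RowIterator, condition):
        self.outer = outer
        self.inner = inner
        self.condition = condition
        self.outer_row = None

    def init(self) -> None:
        self.outer.init()
        self.outer_row = None

    def read(self) -> Optional[dict]:
        while True:
            if self.outer_row is None:
                self.outer_row = self.outer.read()
                if self.outer_row is None:
                    return None  # outer input exhausted
                self.inner.init()  # rescan the inner input per outer row
            inner_row = self.inner.read()
            if inner_row is None:
                self.outer_row = None  # advance to the next outer row
                continue
            if self.condition(self.outer_row, inner_row):
                return {**self.outer_row, **inner_row}


def drain(it: RowIterator) -> list:
    """Run a plan to completion and collect its rows."""
    it.init()
    rows = []
    while (row := it.read()) is not None:
        rows.append(row)
    return rows
```

Because every class exposes the same two methods, a plan is just a tree of iterators: for example, `NestedLoopJoinIterator(FilterIterator(scan1, pred), scan2, cond)` can be drained without the caller knowing which operations it contains.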
Old MySQL executor vs. iterator executor
Old executor
  Nested loop focused
  Hard to extend
  Code for one operation spread out
  Different interfaces for each operation
  Combination of operations hard coded
Iterator executor
  Modular
  Easy to extend
  Each iterator encapsulates one operation
  Same interface for all iterators
  All operations can be connected
MySQL 8.0 features based on the iterator executor
EXPLAIN FORMAT=TREE
  Print the iterator tree
EXPLAIN ANALYZE
  1. Insert instrumentation nodes in the tree
  2. Execute the query
  3. Print the iterator tree
Hash join
  Just another iterator type
Hash join
MySQL internals
New in 8.0.18
MySQL hash join
Hybrid hash join
  Three execution modes
    Everything fits in memory
    Spill to disk (GRACE hash join)
    Looping
Equi-join
xxHash64
  Fast
  Good distribution
[Diagram: HashJoinIterator reading from two TableScanIterators]
SELECT * FROM t1 JOIN t2 ON t1.a = t2.a;
MySQL hash join — everything fits in memory
1) Hash one table into memory
  Smallest table
  The whole table fits in memory (if not, use next method)
2) Scan the other table and match with rows in memory
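The two steps above can be sketched in a few lines of Python. This is only an illustration: it uses a Python dict as the hash table instead of xxHash64, rows are plain tuples, and the function and parameter names are made up for this example.

```python
from collections import defaultdict


def hash_join_in_memory(build_rows, probe_rows, build_key, probe_key):
    """Equi-join two inputs, assuming the build input fits in memory."""
    # 1) Build: hash the (smaller) build input into memory.
    table = defaultdict(list)
    for row in build_rows:
        table[build_key(row)].append(row)

    # 2) Probe: scan the other input and match against the rows in memory.
    for row in probe_rows:
        for match in table.get(probe_key(row), ()):
            yield match, row
```

For `SELECT * FROM t1 JOIN t2 ON t1.a = t2.a`, the smaller of `t1` and `t2` would be passed as `build_rows`, with both key functions extracting column `a`.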
MySQL hash join — spill to disk
1) Hash the smallest table into memory until the buffer is full
  When the buffer is full, dump the rest as chunks on disk
  Produce up to 128 hash buckets (chunks) on disk
2) Scan the other table and match with rows in memory
  Write to chunks on disk at the same time
  Produce up to 128 hash buckets (chunks) on disk
3) Hash one chunk of the smallest table into memory
  Hash using a different seed than used to create disk buckets
  (If the bucket doesn't fit, see next method)
4) Scan the corresponding chunk and match with rows in memory
5) Repeat 3-4 until all chunks have been processed
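A rough Python sketch of the spill-to-disk steps, under several simplifying assumptions: Python lists stand in for the chunk files on disk, the buffer limit is counted in rows rather than bytes, and `chunk_of` with its `seed` parameter is a hypothetical partition function (the in-memory dict uses Python's own hashing, which plays the role of the "different seed"). It assumes every chunk fits in memory.

```python
from collections import defaultdict


def chunk_of(key, num_chunks, seed=1):
    """Hypothetical partition function: maps a join key to a chunk number."""
    return hash((seed, key)) % num_chunks


def hash_join_spill(build_rows, probe_rows, build_key, probe_key,
                    buffer_rows=4, num_chunks=128):
    table = defaultdict(list)
    build_chunks = [[] for _ in range(num_chunks)]  # stand-ins for disk files
    probe_chunks = [[] for _ in range(num_chunks)]
    in_memory = 0
    spilled = False

    # 1) Hash the build input into memory until the buffer is full,
    #    then write the remaining rows to chunks.
    for row in build_rows:
        if in_memory < buffer_rows:
            table[build_key(row)].append(row)
            in_memory += 1
        else:
            spilled = True
            build_chunks[chunk_of(build_key(row), num_chunks)].append(row)

    # 2) Scan the probe input: match against the rows in memory and,
    #    if anything was spilled, also write each probe row to its chunk.
    for row in probe_rows:
        for match in table.get(probe_key(row), ()):
            yield match, row
        if spilled:
            probe_chunks[chunk_of(probe_key(row), num_chunks)].append(row)

    # 3-5) For each pair of chunks: hash the build chunk into memory,
    #      then scan the corresponding probe chunk.
    for b_chunk, p_chunk in zip(build_chunks, probe_chunks):
        if not b_chunk:
            continue
        table = defaultdict(list)
        for row in b_chunk:
            table[build_key(row)].append(row)
        for row in p_chunk:
            for match in table.get(probe_key(row), ()):
                yield match, row
```

Each build row ends up either in the first in-memory table or in exactly one chunk, so writing every probe row to a chunk as well does not produce duplicate matches.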
MySQL hash join — looping
1) Hash a batch of the chunk into memory
  When the buffer is full, pause reading this operand
2) Scan the corresponding chunk and match with rows in memory
3) Hash the next batch into memory
  Resume from where it was paused
4) Scan the corresponding chunk and match with rows in memory
5) Repeat 3-4 until the whole chunk has been processed
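The looping mode for a single pair of chunks could look roughly like this in Python. As before, this is an illustrative sketch: chunks are plain lists, the buffer limit `buffer_rows` is counted in rows, and all names are hypothetical.

```python
from collections import defaultdict
from itertools import islice


def hash_join_looping(build_chunk, probe_chunk, build_key, probe_key,
                      buffer_rows):
    """Join one chunk pair when the build chunk does not fit in memory."""
    build_iter = iter(build_chunk)  # remembers where reading was paused
    while True:
        # 1/3) Hash the next batch of the build chunk into memory.
        batch = list(islice(build_iter, buffer_rows))
        if not batch:
            return  # the whole build chunk has been processed
        table = defaultdict(list)
        for row in batch:
            table[build_key(row)].append(row)

        # 2/4) Scan the whole corresponding probe chunk for this batch.
        for row in probe_chunk:
            for match in table.get(probe_key(row), ()):
                yield match, row
```

The cost of this mode is visible in the inner loop: the probe chunk is scanned once per batch, so a build chunk that needs N batches causes N scans of the probe chunk.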
MySQL hash join optimization
Replaces BNL
  Currently optimized as BNL
  Replace with hash join after optimization
  A conservative (safe) choice
Optimizer switch to turn on/off
  SET optimizer_switch='hash_join=on';
Hints
  /*+ HASH_JOIN(tables or query blocks) */
  /*+ NO_HASH_JOIN(tables or query blocks) */
Buffer size
  SET join_buffer_size=number;
MySQL hash join performance
BNL compared to hash join
  Force BNL/hash join in DBT-3/TPC-H
DBT-3/TPC-H without indexes
  Optimizer selects BNL
  Automatic conversion to hash join
Hash join is much faster than BNL
  Can't expect same improvement when indexes are available
[Chart: BNL compared to hash join on DBT-3/TPC-H queries; lower number is better. Speedup labels on the chart: 467x, 1520x, >1400x, 1332x, 968x]
EXPLAIN ANALYZE
MySQL internals
New in 8.0.18
MySQL EXPLAIN ANALYZE
Wrap iterators in instrumentation nodes
Measurements
  Time (in ms) to first row
  Time (in ms) to last row
  Number of rows
  Number of loops
Execute the query and dump the stats
[Diagram: HashJoinIterator and its two TableScanIterators, each wrapped in a TimingIterator]
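A small Python sketch of what such an instrumentation node might look like. `TimingIterator` is the name from the slide, but the `init`/`read` interface, the `ListScanIterator` helper, the `clock` parameter and the way statistics are averaged per loop are all assumptions for illustration, not MySQL's implementation.

```python
import time


class ListScanIterator:
    """Stand-in for a table scan over an in-memory list (hypothetical)."""

    def __init__(self, rows):
        self.rows = rows
        self.pos = 0

    def init(self):
        self.pos = 0

    def read(self):
        if self.pos >= len(self.rows):
            return None
        self.pos += 1
        return self.rows[self.pos - 1]


class TimingIterator:
    """Wraps any iterator with init()/read() and records its statistics."""

    def __init__(self, child, label, clock=time.perf_counter):
        self.child = child
        self.label = label
        self.clock = clock
        self.loops = 0          # number of times init() was called
        self.rows = 0           # rows returned, summed over all loops
        self.total_first = 0.0  # seconds to first row, summed over loops
        self.total_last = 0.0   # seconds to end of input, summed over loops
        self._elapsed = 0.0     # time spent in the child during this loop
        self._got_first = False

    def init(self):
        start = self.clock()
        self.child.init()
        self._elapsed = self.clock() - start
        self.loops += 1
        self._got_first = False

    def read(self):
        start = self.clock()
        row = self.child.read()
        self._elapsed += self.clock() - start
        if row is None:
            # Only recorded when the input is read to the end.
            self.total_last += self._elapsed
        else:
            self.rows += 1
            if not self._got_first:
                self._got_first = True
                self.total_first += self._elapsed
        return row

    def stats(self):
        n = max(self.loops, 1)  # report per-loop averages
        return "(actual time=%.3f..%.3f rows=%d loops=%d)" % (
            1000 * self.total_first / n,
            1000 * self.total_last / n,
            round(self.rows / n),
            self.loops,
        )
```

Since the wrapper has the same interface as the iterator it wraps, it can be inserted anywhere in an iterator tree without the parent or child noticing. The measured time includes everything below the wrapped node.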
What's wrong with Q2?
SELECT s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment
FROM part, supplier, partsupp, nation, region
WHERE p_partkey = ps_partkey
  AND s_suppkey = ps_suppkey
  AND p_size = 4
  AND p_type LIKE '%TIN'
  AND s_nationkey = n_nationkey
  AND n_regionkey = r_regionkey
  AND r_name = 'AMERICA'
  AND ps_supplycost = (
    SELECT min(ps_supplycost)
    FROM partsupp, supplier, nation, region
    WHERE p_partkey = ps_partkey
      AND s_suppkey = ps_suppkey
      AND s_nationkey = n_nationkey
      AND n_regionkey = r_regionkey
      AND r_name = 'AMERICA'
  )
ORDER BY s_acctbal DESC, n_name, s_name, p_partkey
LIMIT 100;
MySQL EXPLAIN ANALYZE to the rescue!
-> Limit: 100 row(s) (actual time=591739.000..591739.018 rows=100 loops=1)
    -> Sort: <temporary>.s_acctbal DESC, <temporary>.n_name, <temporary>.s_name, <temporary>.p_partkey, limit input to 100 row(s) per chunk (actual time=591738.999..591739.012 rows=100 loops=1)
        -> Stream results (actual time=591738.599..591738.772 rows=462 loops=1)
            -> Inner hash join (nation.n_regionkey = region.r_regionkey), (nation.n_nationkey = supplier.s_nationkey) (cost=2074295.37 rows=98) (actual time=591738.591..591738.686 rows=462 loops=1)
                -> Table scan on nation (cost=0.00 rows=25) (actual time=0.024..0.026 rows=25 loops=1)
                -> Hash
                    -> Inner hash join (supplier.s_suppkey = partsupp.ps_suppkey) (cost=2074041.10 rows=98) (actual time=591735.554..591738.311 rows=462 loops=1)
                        -> Table scan on supplier (cost=0.06 rows=9760) (actual time=0.068..2.024 rows=10000 loops=1)
                        -> Hash
                            -> Filter: (partsupp.ps_supplycost = (select #2)) (cost=1977898.52 rows=98) (actual time=1282.855..591733.987 rows=462 loops=1)
                                -> Inner hash join (partsupp.ps_partkey = part.p_partkey) (cost=1977898.52 rows=98) (actual time=84.827..271.307 rows=3120 loops=1)
                                    -> Table scan on partsupp (cost=3.54 rows=796168) (actual time=0.034..108.684 rows=800000 loops=1)
                                    -> Hash
                                        -> Inner hash join (cost=20353.91 rows=24) (actual time=0.274..83.964 rows=780 loops=1)
                                            -> Filter: ((part.p_size = 4) and (part.p_type like '%TIN')) (cost=20353.16 rows=2204) (actual time=0.212..83.719 rows=780 loops=1)
                                                -> Table scan on part (cost=20353.16 rows=198401) (actual time=0.160..63.957 rows=200000 loops=1)
                                            -> Hash
                                                -> Filter: (region.r_name = 'AMERICA') (cost=0.75 rows=1) (actual time=0.029..0.036 rows=1 loops=1)
                                                    -> Table scan on region (cost=0.75 rows=5) (actual time=0.021..0.030 rows=5 loops=1)
                                -> Select #2 (subquery in condition; dependent)
                                    -> Aggregate: min(partsupp.ps_supplycost) (actual time=189.566..189.566 rows=1 loops=3120)
                                        -> Inner hash join (nation.n_regionkey = region.r_regionkey), (nation.n_nationkey = supplier.s_nationkey) (cost=77988309.27 rows=79617) (actual time=189.561..189.563 rows=1 loops=3120)
                                            -> Table scan on nation (cost=0.00 rows=25) (actual time=0.005..0.007 rows=25 loops=3120)
                                            -> Hash
                                                -> Inner hash join (supplier.s_suppkey = partsupp.ps_suppkey) (cost=77789257.28 rows=79617) (actual time=187.997..189.549 rows=4 loops=3120)
                                                    -> Table scan on supplier (cost=0.02 rows=9760) (actual time=0.014..1.237 rows=10000 loops=3120)
                                                    -> Hash
                                                        -> Filter: (part.p_partkey = partsupp.ps_partkey) (cost=81757.41 rows=79617) (actual time=92.085..187.748 rows=4 loops=3120)
                                                            -> Inner hash join (cost=81757.41 rows=79617) (actual time=0.018..155.118 rows=800000 loops=3120)
                                                                -> Table scan on partsupp (cost=10101.54 rows=796168) (actual time=0.011..97.353 rows=800000 loops=3120)
                                                                -> Hash
                                                                    -> Filter: (region.r_name = 'AMERICA') (cost=0.75 rows=1) (actual time=0.004..0.005 rows=1 loops=3120)
                                                                        -> Table scan on region (cost=0.75 rows=5) (actual time=0.003..0.003 rows=5 loops=3120)
Time is in milliseconds
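One way to read this output is to multiply a node's time to last row by its number of loops. The Python sketch below does that for two lines copied from the plan above. It assumes that the reported times are averages per loop, which is consistent with the numbers shown here; the regular expression and the helper name `total_ms` are made up for this example.

```python
import re

# Matches the "(actual time=first..last rows=R loops=L)" part of a plan line.
ACTUAL_RE = re.compile(
    r"actual time=(?P<first>\d+(?:\.\d+)?)\.\.(?P<last>\d+(?:\.\d+)?)"
    r" rows=(?P<rows>\d+) loops=(?P<loops>\d+)"
)


def total_ms(plan_line):
    """Estimated total time in a node: time to last row times loops."""
    m = ACTUAL_RE.search(plan_line)
    if m is None:
        return None  # e.g. a "-> Hash" line without measurements
    return float(m["last"]) * int(m["loops"])


subquery = ("-> Aggregate: min(partsupp.ps_supplycost) "
            "(actual time=189.566..189.566 rows=1 loops=3120)")
whole_query = ("-> Limit: 100 row(s) "
               "(actual time=591739.000..591739.018 rows=100 loops=1)")

share = total_ms(subquery) / total_ms(whole_query)
print("subquery: %.0f ms, %.2f%% of the query" % (total_ms(subquery), 100 * share))
```

Under that assumption, the dependent subquery takes about 189.566 ms × 3120 loops ≈ 591,446 ms, which is nearly all of the roughly 591,739 ms the query runs: the subquery is re-executed, full hash joins included, once for each of the 3120 rows that reach the filter.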
Make your queries run faster with MySQL 8.0
Automatic
  Temporary table elimination
    Recursive CTEs
    Some derived tables
    UNION into derived tables
    Input to sorting
  Faster duplicate removal
  Hash join instead of BNL
Analyze queries to find out where time is spent
  EXPLAIN FORMAT=TREE
  EXPLAIN ANALYZE
  Optimizer trace
Feature descriptions and design details directly from the source
https://mysqlserverteam.com/
Thank you!
Norvald H. Ryeng
Software Development Senior Manager
MySQL Optimizer Team