Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited
Presenter: Haonan Wang
Slides Credit: CMU 15-721 Spring 2018
March 19, 2019
Presenter: Haonan Wang (MIT) Sort vs Hash March 19, 2019 1 / 36
Overview
1 Background
    Sort vs. Hash
    Motivation
2 Merge-Sort Join
    The basic idea
    Sort Phase
    Merge Phase
    Multi-Way Merge
3 Experiment
Section 1
Background
Subsection 1
Sort vs. Hash
Sort vs. Hash
There are two main approaches to parallel join algorithms:
→ Hash Join
→ Sort-Merge Join
History of Hash vs. Sort
1970s Sorting
1980s Hashing
1990s Equivalent
2000s Hashing
2010s Hashing (Partitioned vs. Non-Partitioned)
2020s ???
What Is Merge-Sort Join
SIMD?
What is SIMD?
A class of CPU instructions that allow the processor to perform the same operation on multiple data points simultaneously.
Current AMD and Intel CPUs both provide ISA and microarchitecture support for SIMD operations.
→ MMX, 3DNow!, SSE, SSE2, SSE3, SSE4, AVX
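As a rough illustration of "the same operation on multiple data points", the sketch below emulates a 4-lane vector min/max in plain Python. The names `LANES`, `vmin`, and `vmax` are hypothetical and chosen only for this example. Real SIMD code would issue hardware instructions (for example through compiler intrinsics); this version only mimics the lane-wise semantics and gains no speed.

```python
LANES = 4  # assumed vector width: four values per emulated "register"

def vmin(a, b):
    """Lane-wise minimum of two vectors (emulation, not real SIMD)."""
    assert len(a) == len(b) == LANES
    return [min(x, y) for x, y in zip(a, b)]

def vmax(a, b):
    """Lane-wise maximum of two vectors (emulation, not real SIMD)."""
    assert len(a) == len(b) == LANES
    return [max(x, y) for x, y in zip(a, b)]

a = [5, 1, 7, 3]
b = [2, 6, 4, 8]
print(vmin(a, b))  # [2, 1, 4, 3]
print(vmax(a, b))  # [5, 6, 7, 8]
```

One min and one max over two vectors act as four compare-exchange steps at once, which is the building block that SIMD sorting networks rely on.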
SIMD Makes Sorting Better Than Hashing?
Section 2
Merge-Sort Join
The basic design idea
Partition Phase (Optional)
→ Partition R and assign the partitions to workers / cores.
Sort Phase
→ Sort the tuples of R and S based on the join key.
Merge Phase
→ Scan the sorted relations and compare tuples.
→ The outer relation R only needs to be scanned once.
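The sort and merge phases can be sketched as a single-threaded equi-join. This is a minimal illustration under assumptions, not the implementation from the slides: the function name `sort_merge_join` and the `(key, payload)` tuple layout are hypothetical, and the optional partition phase is omitted.

```python
def sort_merge_join(r, s):
    """Equi-join two lists of (key, payload) tuples on key."""
    # Sort phase: order both relations by the join key.
    r = sorted(r, key=lambda t: t[0])
    s = sorted(s, key=lambda t: t[0])

    out = []
    i = j = 0
    # Merge phase: advance two cursors over the sorted relations.
    while i < len(r) and j < len(s):
        if r[i][0] < s[j][0]:
            i += 1
        elif r[i][0] > s[j][0]:
            j += 1
        else:
            key = r[i][0]
            # Find the end of the run of equal keys in S.
            j_end = j
            while j_end < len(s) and s[j_end][0] == key:
                j_end += 1
            # Pair every R tuple with this key against that S run.
            while i < len(r) and r[i][0] == key:
                for k in range(j, j_end):
                    out.append((key, r[i][1], s[k][1]))
                i += 1
            j = j_end
    return out

r = [(3, "c"), (1, "a"), (2, "b")]
s = [(2, "x"), (4, "w"), (2, "y"), (1, "z")]
print(sort_merge_join(r, s))
# [(1, 'a', 'z'), (2, 'b', 'x'), (2, 'b', 'y')]
```

The cursor `i` over R only moves forward, which matches the point that the outer relation is scanned once; only a run of duplicate keys in S is revisited.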
Subsection 2
Sort Phase
Sorting Networks(1)
Sorting Networks(2)
Sorting Networks(3)
Sorting Networks(4)
Sorting Networks(5)
Sorting Networks(6)
Sorting Networks Summary(1)
Sorting Networks Summary(2)
A sorting network always has fixed wiring paths for lists with the same number of elements.
Efficient to execute on modern CPUs because of limited data dependencies and no branches.
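To make the "fixed wiring" point concrete, here is a small sketch of a 4-input sorting network. The comparator list `NETWORK_4` and the function `sort4` are names chosen for this example; the five-comparator layout is a well-known network for four inputs, and the slides may draw a different one. Python's `min` and `max` stand in for the hardware min/max instructions that make the real thing branch-free.

```python
from itertools import permutations

# Fixed comparator sequence for 4 inputs: the same wires are compared
# in the same order no matter what the input values are.
NETWORK_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]

def sort4(values):
    """Sort exactly four values with a fixed comparator network."""
    v = list(values)
    assert len(v) == 4
    for i, j in NETWORK_4:
        # Compare-exchange: smaller value to wire i, larger to wire j.
        v[i], v[j] = min(v[i], v[j]), max(v[i], v[j])
    return v

# Exhaustive check over all 24 input orders.
assert all(sort4(p) == [1, 2, 3, 4] for p in permutations([1, 2, 3, 4]))
print(sort4([9, 3, 7, 1]))  # [1, 3, 7, 9]
```

Because the comparator sequence never depends on the data, the first two comparators, and likewise the next two, touch disjoint wires and could run in parallel.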
Sorting Network Speed Up With SIMD(1)
Sorting Network Speed Up With SIMD(2)
Sorting Network Speed Up With SIMD(3)
Subsection 3
Merge Phase
Bitonic Merge Networks
Merging Larger Lists using Bitonic Merge
Merging-Sort Tree
Merging-Sort Hierarchy(Summary)
in-register sorting, with runs that fit into (SIMD) CPU registers;
in-cache sorting, where runs can still be held in a CPU-local cache;
out-of-cache sorting, once runs exceed cache sizes.
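The three levels can be pictured as one run-based sort: build small sorted runs, then merge runs pairwise so that run length doubles on each pass. The sketch below is only an approximation under assumptions. `run_based_sort`, `merge`, and `run_size` are hypothetical names, Python's `sorted` stands in for the in-register sorting network, a scalar two-way merge stands in for the bitonic merge network, and caches are not modeled at all.

```python
def merge(a, b):
    """Scalar two-way merge of two sorted lists."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out

def run_based_sort(data, run_size=4):
    """Sort by creating small runs, then merging runs pairwise."""
    # Level 1: small fixed-size runs (stand-in for in-register sorting).
    runs = [sorted(data[k:k + run_size])
            for k in range(0, len(data), run_size)]
    # Later levels: each pass merges neighbouring runs, doubling their
    # length; runs first fit in cache, then outgrow it.
    while len(runs) > 1:
        runs = [merge(runs[k], runs[k + 1]) if k + 1 < len(runs) else runs[k]
                for k in range(0, len(runs), 2)]
    return runs[0] if runs else []

print(run_based_sort([5, 2, 9, 1, 7, 3, 8, 6, 4, 0]))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Each pass of the `while` loop reads and writes every element once, so the number of passes determines how often data moves through the memory hierarchy.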
Subsection 4
Multi-Way Merge
Impact of NUMA
In practice, at least some merging passes will inevitably cross NUMA boundaries.
Multi-socket systems show an increasing asymmetry: the NUMA interconnect bandwidth falls further and further behind the aggregate memory bandwidth that the individual memory controllers could provide.
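Since every merging pass moves all tuples through memory, one way to lower the pressure on the interconnect is to merge many sorted runs in a single pass instead of several two-way passes. The sketch below shows that idea with Python's standard `heapq.merge`. It is an illustration only: `multiway_merge` is a hypothetical name, and nothing here models NUMA placement or the buffering a real implementation would need.

```python
import heapq

def multiway_merge(runs):
    """Merge any number of sorted runs in one pass over the data."""
    return list(heapq.merge(*runs))

runs = [[1, 5, 9], [2, 6], [0, 3, 7, 8], [4]]
print(multiway_merge(runs))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

With four runs, repeated two-way merging needs two passes over the data, while the multi-way version reads each element once.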
m-way
Section 3
Experiment
Settings
Intel Sandy Bridge with the 256-bit AVX instruction set.
Four-socket configuration, with each CPU socket containing 8 physical cores and 16 thread contexts with the help of hyper-threading.
Cache sizes are 32 KiB for L1, 256 KiB for L2, and 20 MiB for L3 (the latter shared by the 16 threads within the socket).
The cache line size of the system is 64 bytes.
TLB1 contains 64/32 entries when using 4 KiB/2 MiB pages (respectively), and TLB2 contains 512 entries (page size 4 KiB).
Total memory available is 512 GiB (DDR3 at 1600 MHz).
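A quick back-of-the-envelope calculation relates these numbers to run sizes. The key width (32 bits) and tuple size (8 bytes) below are assumptions made for this sketch and are not stated on the slide; the register width and cache sizes are taken from the settings above.

```python
AVX_BITS = 256          # from the settings above
KEY_BITS = 32           # assumption: 32-bit join keys
TUPLE_BYTES = 8         # assumption: 4-byte key plus 4-byte payload

KIB = 1024
MIB = 1024 * KIB
caches = {"L1": 32 * KIB, "L2": 256 * KIB, "L3": 20 * MIB}

# Keys that fit in one AVX register under the assumed key width.
lanes = AVX_BITS // KEY_BITS
# Tuples that fit in each cache level under the assumed tuple size.
tuples_per_cache = {name: size // TUPLE_BYTES for name, size in caches.items()}

print(lanes)             # 8
print(tuples_per_cache)  # {'L1': 4096, 'L2': 32768, 'L3': 2621440}
```

Under these assumptions, runs of a few thousand tuples still fit in L1, while runs beyond a few million tuples exceed even the shared L3.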
Scalability
Result(1)
Result(2)
The End