multi-threaded queries - cmu 15-721done: implemented multi-threaded llvm versions of sequential scan...

20
Multi-threaded Queries Intra-Query Parallelism in LLVM

Upload: others

Post on 05-Jun-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Multi-threaded Queries - CMU 15-721Done: Implemented multi-threaded LLVM versions of Sequential Scan with Filters and Hash-join, and the results are good. Todo: keep working on Aggregations

Multi-threaded QueriesIntra-Query Parallelism in LLVM

Page 2: Multi-threaded Queries - CMU 15-721Done: Implemented multi-threaded LLVM versions of Sequential Scan with Filters and Hash-join, and the results are good. Todo: keep working on Aggregations

Multithreaded QueriesIntra-Query Parallelism in LLVM

Tianqi WuYang Liu Hao Li

Page 3: Multi-threaded Queries - CMU 15-721Done: Implemented multi-threaded LLVM versions of Sequential Scan with Filters and Hash-join, and the results are good. Todo: keep working on Aggregations

Interpreted vs Compiled (LLVM)

Page 4: Multi-threaded Queries - CMU 15-721Done: Implemented multi-threaded LLVM versions of Sequential Scan with Filters and Hash-join, and the results are good. Todo: keep working on Aggregations

Interpreted vs Compiled (LLVM)

Interpreted Compiled (LLVM)

source: Prashanth Menon

Page 5: Multi-threaded Queries - CMU 15-721Done: Implemented multi-threaded LLVM versions of Sequential Scan with Filters and Hash-join, and the results are good. Todo: keep working on Aggregations

Interpreted vs Compiled (LLVM)

InterpretedCompiled (LLVM)

source: Prashanth Menon

Peloton currently only uses a single thread (worker) for each query.

Project: add support for intra-query parallelism that can work with the new LLVM execution engine to further improve performance.

Page 6: Multi-threaded Queries - CMU 15-721Done: Implemented multi-threaded LLVM versions of Sequential Scan with Filters and Hash-join, and the results are good. Todo: keep working on Aggregations

We wrote code in LLVM that generates IR code that runs in a multi-threaded manner.

Page 7: Multi-threaded Queries - CMU 15-721Done: Implemented multi-threaded LLVM versions of Sequential Scan with Filters and Hash-join, and the results are good. Todo: keep working on Aggregations

Goals

75%: Implement multi-threaded LLVM versions of Sequential Scan with Filters

100%: Implement multi-threaded LLVM versions of Hash-join and Aggregations

125%: Make them NUMA-aware to further improve performance

!

Page 8: Multi-threaded Queries - CMU 15-721Done: Implemented multi-threaded LLVM versions of Sequential Scan with Filters and Hash-join, and the results are good. Todo: keep working on Aggregations

Benchmark

Page 9: Multi-threaded Queries - CMU 15-721Done: Implemented multi-threaded LLVM versions of Sequential Scan with Filters and Hash-join, and the results are good. Todo: keep working on Aggregations

Benchmark

• We didn’t use TPC-H because those queries contains aggregations that we currently don’t support.

• Sequential Scan: • synthetic table with 1M tuples • predicates: a >= ? and b >= a

• Hash-join: • synthetic left table with 250K tuples • synthetic right table with 1M tuples • predicates: left_table.col == right_table.col

• Machine: • Dual Socket Intel Xeon E5-2620v3 @ 2.40GHz (6 cores / 12 threads)

Page 10: Multi-threaded Queries - CMU 15-721Done: Implemented multi-threaded LLVM versions of Sequential Scan with Filters and Hash-join, and the results are good. Todo: keep working on Aggregations

Sequential Scan: Execution Time

Single-threaded LLVM Multi-threaded LLVM (4 threads)

Page 11: Multi-threaded Queries - CMU 15-721Done: Implemented multi-threaded LLVM versions of Sequential Scan with Filters and Hash-join, and the results are good. Todo: keep working on Aggregations

Single-threaded LLVM Multi-threaded LLVM (4 threads)

TIm

es

0

1

2

3

4

Selectivity0 0.25 0.5 0.75 1

2.37 2.52.88 2.98

2.72

Speedup

Sequential Scan: Execution Time

Page 12: Multi-threaded Queries - CMU 15-721Done: Implemented multi-threaded LLVM versions of Sequential Scan with Filters and Hash-join, and the results are good. Todo: keep working on Aggregations

Table Size: 1M tuples Predicates: a >= ? and b >= a

Dual Socket Intel Xeon E5-2620v3 @ 2.40GHz (6 cores / 12 threads)

Exec

utio

n Ti

me

(ms)

0

175

350

525

700

Exec

utio

n Ti

me

(ms)

0

175

350

525

700

Number of Threads1 2 4 6 8 10 12

Sequential Scan: Execution Time

Page 13: Multi-threaded Queries - CMU 15-721Done: Implemented multi-threaded LLVM versions of Sequential Scan with Filters and Hash-join, and the results are good. Todo: keep working on Aggregations

Table Size: 1M tuples Predicates: a >= ? and b >= a

Dual Socket Intel Xeon E5-2620v3 @ 2.40GHz (6 cores / 12 threads)

Com

pila

tion

Tim

e (m

s)

0

3

6

9

12

Number of Threads1 2 4 6 8 10 12

7.387.8 7.79 7.93

8.47 8.16 8.53

6.46 6.46 6.46 6.46 6.46 6.46 6.46

Single-thraeded LLVMMulti-threaded LLVM

Sequential Scan: Compilation Time

Page 14: Multi-threaded Queries - CMU 15-721Done: Implemented multi-threaded LLVM versions of Sequential Scan with Filters and Hash-join, and the results are good. Todo: keep working on Aggregations

Design Decision: Code Generation

Single-threaded LLVM

core

ExecuteQuery(All)

Generate Code for Every Thread

core

ExecuteQuery(1/4)

core

ExecuteQuery(1/4)

core

ExecuteQuery(1/4)

core

ExecuteQuery(1/4)

Poor ScalabilityMuch Longer

Compilation Time

Page 15: Multi-threaded Queries - CMU 15-721Done: Implemented multi-threaded LLVM versions of Sequential Scan with Filters and Hash-join, and the results are good. Todo: keep working on Aggregations

Shared Code Among Threads

core core core

ExecuteQuery()

core

Context0 Context1 Context2 Context3

Design Decision: Code Generation

Page 16: Multi-threaded Queries - CMU 15-721Done: Implemented multi-threaded LLVM versions of Sequential Scan with Filters and Hash-join, and the results are good. Todo: keep working on Aggregations

Shared Code Among Threads

core core core

ExecuteQuery()

core

Context0 Context1 Context2 Context3• The amount of code it generates is pretty

much the same as single-threaded version

• Threads are independent of each other and thus can be easily bound to thread instance in a thread pool

• C++ written context and its proxies can make life easier. And the function calls won’t affect performance if they are not on the critical path.

Design Decision: Code Generation

Page 17: Multi-threaded Queries - CMU 15-721Done: Implemented multi-threaded LLVM versions of Sequential Scan with Filters and Hash-join, and the results are good. Todo: keep working on Aggregations

Performance: Hash-JoinTable Size: 250K ⨝ 1M

Dual Socket Intel Xeon E5-2620v3 @ 2.40GHz (6 cores / 12 threads)

Exec

utio

n Ti

me

(ms)

0

250

500

750

1000

Exec

utio

n Ti

me

(ms)

0

250

500

750

1000

Number of Threads

1 2 4 6 8 10 12

Multi-threaded LLVMMulti-threaded LLVMSingle-threaded LLVM

Page 18: Multi-threaded Queries - CMU 15-721Done: Implemented multi-threaded LLVM versions of Sequential Scan with Filters and Hash-join, and the results are good. Todo: keep working on Aggregations

Design Decision: Global Hash Table Construction

One thread takes care of merging all local hash tables into a global one and notify all other threads when finish.

• Threads would be aware of others, which could complicate the multi-threading model.

• Constructing local hash tables on their own stacks is more efficient, but stuff on stack cannot be reached by other threads.

Every thread takes care of merging its own local hash table into the shared global hash table (on heap)

• Threads can do the merging without being aware of other threads.

• The hash table in Peloton is written in LLVM. It can efficiently operate on raw data but it’s hard (almost impossible) to make it concurrent. In our implementation, the merging is blocking and will be served in a first-come-first-serve order.

Page 19: Multi-threaded Queries - CMU 15-721Done: Implemented multi-threaded LLVM versions of Sequential Scan with Filters and Hash-join, and the results are good. Todo: keep working on Aggregations

Summary

Done: Implemented multi-threaded LLVM versions of Sequential Scan with Filters and Hash-join, and the results are good.

Todo: keep working on Aggregations and refactoring before merging to the master branch

Writing LLVM is non-trivial, and it’s even trickier to write LLVM to generate multi-threaded code. Thank you for the help, Prashanth!!!!

Page 20: Multi-threaded Queries - CMU 15-721Done: Implemented multi-threaded LLVM versions of Sequential Scan with Filters and Hash-join, and the results are good. Todo: keep working on Aggregations

Thank You!