DATABASE SYSTEM IMPLEMENTATION
GT 4420/6422 // SPRING 2019 // @JOY_ARULRAJ
LECTURE #13: QUERY COMPILATION
TODAY’S AGENDA
Background
Code Generation / Transpilation
JIT Compilation (LLVM)
Real-world Implementations
ANATOMY OF A DATABASE SYSTEM
Process Manager
→ Connection Manager + Admission Control
Query Processor
→ Query Parser
→ Query Optimizer
→ Query Executor
Transactional Storage Manager
→ Lock Manager (Concurrency Control)
→ Access Methods (or Indexes)
→ Buffer Pool Manager
→ Log Manager
Shared Utilities
→ Memory Manager + Disk Manager
→ Networking Manager
Source: Anatomy of a Database System
HEKATON REMARK
After switching to an in-memory DBMS, the only way to increase throughput is to reduce the number of instructions executed.
→ To go 10x faster, the DBMS must execute 90% fewer instructions…
→ To go 100x faster, the DBMS must execute 99% fewer instructions…
COMPILATION IN THE MICROSOFT SQL SERVER HEKATON ENGINE
IEEE Data Engineering Bulletin 2011
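The arithmetic behind the remark can be checked with a short sketch. It assumes, as the remark does, that execution time is proportional to the number of instructions executed; the function name `instruction_reduction` is hypothetical.

```python
def instruction_reduction(speedup: float) -> float:
    """Fraction of instructions that must be eliminated to reach `speedup`,
    assuming execution time is proportional to the instruction count."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    # Running `speedup` times faster leaves 1/speedup of the original work.
    return 1.0 - 1.0 / speedup


for s in (10, 100):
    print(f"{s}x faster -> {instruction_reduction(s):.0%} fewer instructions")
```

The two printed lines reproduce the 90% and 99% figures from the slide.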
OBSERVATION
One way to achieve such a reduction in instructions is through code specialization.
This means generating code that is specific to a particular query in the DBMS.
→ Encode everything known about the data (e.g., type) and query (e.g., where clause)
Most code is written to make it easy for humans to understand rather than for performance…
→ Interpretation vs. Compilation
EXAMPLE DATABASE
CREATE TABLE A (
  id INT PRIMARY KEY,
  val INT
);
CREATE TABLE B (
  id INT PRIMARY KEY,
  val INT
);
CREATE TABLE C (
  a_id INT REFERENCES A(id),
  b_id INT REFERENCES B(id),
  PRIMARY KEY (a_id, b_id)
);
QUERY PROCESSING
SELECT A.id, B.val
  FROM A, B
 WHERE A.id = B.id
   AND B.val > 100
Query plan:
π A.id, B.val
  ⨝ A.id=B.id
    A
    σ val>100
      B
QUERY PROCESSING
Tuple-at-a-time
→ Each operator calls next on its child to get the next tuple to process.
Vector-at-a-time
→ Each operator calls next on its child to get the next chunk of data to process.
Operator-at-a-time
→ Each operator materializes its entire output for its parent operator.
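The three processing models can be contrasted with a small sketch. It is only an illustration: rows are plain Python tuples `(id, val)`, the table `B` holds made-up data, and all function names are hypothetical.

```python
from typing import Callable, Iterator, List, Tuple

Row = Tuple[int, int]  # (id, val)


def scan_tuples(table: List[Row]) -> Iterator[Row]:
    """Tuple-at-a-time: every next() hands one row to the parent."""
    for row in table:
        yield row


def scan_vectors(table: List[Row], size: int) -> Iterator[List[Row]]:
    """Vector-at-a-time: every next() hands a chunk of up to `size` rows."""
    for start in range(0, len(table), size):
        yield table[start:start + size]


def scan_all(table: List[Row]) -> List[Row]:
    """Operator-at-a-time: the operator materializes its entire output."""
    return list(table)


def filter_tuples(child: Iterator[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    for row in child:
        if pred(row):
            yield row


def filter_vectors(child: Iterator[List[Row]], pred: Callable[[Row], bool]) -> Iterator[List[Row]]:
    for chunk in child:
        selected = [row for row in chunk if pred(row)]
        if selected:
            yield selected


def filter_all(rows: List[Row], pred: Callable[[Row], bool]) -> List[Row]:
    return [row for row in rows if pred(row)]


B = [(1, 50), (2, 150), (3, 200), (4, 99), (5, 101)]
pred = lambda row: row[1] > 100  # B.val > 100

tuple_result = list(filter_tuples(scan_tuples(B), pred))
vector_result = [r for chunk in filter_vectors(scan_vectors(B, 2), pred) for r in chunk]
operator_result = filter_all(scan_all(B), pred)
```

All three variants return the same rows; they differ only in how much data crosses an operator boundary per call.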
QUERY INTERPRETATION
SELECT *
  FROM A, C,
       (SELECT B.id, COUNT(*)
          FROM B
         WHERE B.val = ? + 1
         GROUP BY B.id) AS B
 WHERE A.val = 123
   AND A.id = C.a_id
   AND B.id = C.b_id
Query plan:
⨝ A.id=C.a_id
  σ A.val=123
    A
  ⨝ B.id=C.b_id
    Γ B.id, COUNT(*)
      σ B.val=?+1
        B
    C
QUERY INTERPRETATION
Each operator in the plan runs its own generic loop.
⨝ (A.id=C.a_id and B.id=C.b_id)
for t1 in left.next():
    buildHashTable(t1)
for t2 in right.next():
    if probe(t2): emit(t1⨝t2)
σ (A.val=123 and B.val=?+1)
for t in child.next():
    if evalPred(t): emit(t)
Γ (B.id, COUNT(*))
for t in child.next():
    buildAggregateTable(t)
for t in aggregateTable:
    emit(t)
A, B, C (table scans, shown for A)
for t in A:
    emit(t)
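The operator loops above can be sketched as composable Python generators. This is a minimal illustration under stated assumptions, not the interpreter of any particular DBMS: rows are tuples, `A` and `B` hold `(id, val)`, `C` holds `(a_id, b_id)`, the sample data is made up, and the parameter `?` is bound to 999.

```python
from collections import defaultdict


def scan(table):
    for t in table:
        yield t


def select(child, pred):
    for t in child:
        if pred(t):  # generic predicate call for every tuple
            yield t


def aggregate_count(child, key):
    counts = defaultdict(int)
    for t in child:  # build the aggregate table
        counts[t[key]] += 1
    for group, n in counts.items():
        yield (group, n)


def hash_join(left, right, left_key, right_key):
    table = defaultdict(list)
    for t1 in left:  # build phase
        table[t1[left_key]].append(t1)
    for t2 in right:  # probe phase
        for t1 in table.get(t2[right_key], ()):
            yield t1 + t2


A = [(1, 123), (2, 123), (3, 7)]
B = [(10, 1000), (11, 1000), (12, 5)]
C = [(1, 10), (2, 11), (3, 10), (1, 12), (2, 10)]
param = 999

# Γ(σ(B)) ⨝ C produces rows (B.id, count, a_id, b_id).
inner = hash_join(
    aggregate_count(select(scan(B), lambda t: t[1] == param + 1), key=0),
    scan(C), left_key=0, right_key=1)
# σ(A) ⨝ inner produces rows (A.id, A.val, B.id, count, a_id, b_id).
plan = hash_join(select(scan(A), lambda t: t[1] == 123), inner, left_key=0, right_key=2)
result = list(plan)
```

Every tuple passes through several generator frames and lambda calls, which is the per-tuple overhead that query compilation tries to remove.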
PREDICATE INTERPRETATION
SELECT *
  FROM A, C,
       (SELECT B.id, COUNT(*)
          FROM B
         WHERE B.val = ? + 1
         GROUP BY B.id) AS B
 WHERE A.val = 123
   AND A.id = C.a_id
   AND B.id = C.b_id
Execution Context
→ Current Tuple: (123, 1000)
→ Query Parameters: (int:999)
→ Table Schema: B→(int:id, int:val)
Expression tree for B.val = ? + 1
=
  TupleAttribute(B.val)
  +
    Parameter(0)
    Constant(1)
Evaluation (bottom-up)
→ TupleAttribute(B.val) evaluates to 1000
→ Parameter(0) evaluates to 999
→ Constant(1) evaluates to 1
→ 999 + 1 evaluates to 1000
→ 1000 = 1000 evaluates to true
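The tree walk above can be sketched as a small expression interpreter. The node names follow the slide; the `ExecutionContext` class, the `evaluate` method, and the schema encoding (attribute name mapped to tuple position) are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class ExecutionContext:
    current_tuple: Tuple[Any, ...]  # e.g. (123, 1000)
    params: List[Any]               # e.g. [999]
    schema: Dict[str, int]          # attribute name -> position in the tuple


@dataclass
class TupleAttribute:
    name: str

    def evaluate(self, ctx: ExecutionContext) -> Any:
        # The schema lookup is repeated for every tuple.
        return ctx.current_tuple[ctx.schema[self.name]]


@dataclass
class Parameter:
    index: int

    def evaluate(self, ctx: ExecutionContext) -> Any:
        return ctx.params[self.index]


@dataclass
class Constant:
    value: Any

    def evaluate(self, ctx: ExecutionContext) -> Any:
        return self.value


@dataclass
class Add:
    left: Any
    right: Any

    def evaluate(self, ctx: ExecutionContext) -> Any:
        return self.left.evaluate(ctx) + self.right.evaluate(ctx)


@dataclass
class Equals:
    left: Any
    right: Any

    def evaluate(self, ctx: ExecutionContext) -> bool:
        return self.left.evaluate(ctx) == self.right.evaluate(ctx)


# B.val = ? + 1
predicate = Equals(TupleAttribute("val"), Add(Parameter(0), Constant(1)))
ctx = ExecutionContext(current_tuple=(123, 1000), params=[999], schema={"id": 0, "val": 1})
matches = predicate.evaluate(ctx)
```

Evaluating one tuple takes five dynamically dispatched `evaluate` calls and a dictionary lookup, all of which repeat for every tuple in the table.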
CODE SPECIALIZATION
Any CPU-intensive entity of the database can be natively compiled if it has a similar execution pattern on different inputs.
→ Access Methods
→ Stored Procedures
→ Operator Execution
→ Predicate Evaluation
→ Logging Operations
BENEFITS
Attribute types are known a priori.
→ Data access function calls can be converted to inline pointer casting.
Predicates are known a priori.
→ They can be evaluated using primitive data comparisons.
No function calls in loops.
→ Allows the compiler to efficiently distribute data to registers and increase cache reuse.
ARCHITECTURE OVERVIEW
SQL Query → Parser → Abstract Syntax Tree → Binder → Annotated AST → Optimizer → Physical Plan → Compiler → Native Code
Supporting inputs: System Catalog (Binder), Cost Estimates (Optimizer)
CODE GENERATION
Approach #1: Transpilation
→ Write code that converts a relational query plan into C/C++ and then run it through a conventional compiler to generate native code.
Approach #2: JIT Compilation
→ Generate an intermediate representation (IR) of the query that can be quickly compiled into native code.
HIQUE – CODE GENERATION
For a given query plan, create a C/C++ program that implements that query’s execution.
→ Bake in all the predicates and type conversions.
Use an off-the-shelf compiler to convert the code into a shared object, link it to the DBMS process, and then invoke the exec function.
GENERATING CODE FOR HOLISTIC QUERY EVALUATION
ICDE 2010
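The generate, compile, and invoke cycle can be illustrated with a loose analogy in Python. This is not how HIQUE works internally: HIQUE emits C/C++ and links a shared object, whereas this sketch emits Python source and uses the built-in `compile` and `exec` as stand-ins. The names `generate_scan_source`, `compile_query`, and `exec_query` are hypothetical.

```python
def generate_scan_source(val_index: int, param: int) -> str:
    """Return source for a scan specialized to `row[val_index] == param + 1`.

    Both arguments must be integers, because they are written directly
    into the generated source text.
    """
    target = int(param) + 1  # fold `? + 1` into a constant at generation time
    return (
        "def exec_query(table, emit):\n"
        "    for row in table:\n"
        f"        if row[{int(val_index)}] == {target}:\n"
        "            emit(row)\n"
    )


def compile_query(source: str):
    """Compile the generated source and return its entry point."""
    namespace = {}
    code = compile(source, "<generated query>", "exec")
    exec(code, namespace)
    return namespace["exec_query"]


A = [(1, 1000), (2, 5), (3, 1000)]
source = generate_scan_source(val_index=1, param=999)
exec_query = compile_query(source)

out = []
exec_query(A, out.append)  # SELECT * FROM A WHERE A.val = 999 + 1
```

The generated function contains no predicate tree and no parameter lookup; the attribute position and the comparison constant are literals in the emitted code.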
OPERATOR TEMPLATES
SELECT * FROM A WHERE A.val = ? + 1
Interpreted Plan
for t in range(table.num_tuples):
    tuple = get_tuple(table, t)
    if eval(predicate, tuple, params):
        emit(tuple)
get_tuple(table, t):
1. Get schema in catalog for table.
2. Calculate offset based on tuple size.
3. Return pointer to tuple.
eval(predicate, tuple, params):
1. Traverse predicate tree and pull values up.
2. If tuple value, calculate the offset of the target attribute.
3. Perform casting as needed for comparison operators.
4. Return true / false.
Templated Plan
tuple_size = ###
predicate_offset = ###
parameter_value = ###
for t in range(table.num_tuples):
    tuple = table.data + t * tuple_size
    val = (tuple+predicate_offset)
    if (val == parameter_value + 1):
        emit(tuple)
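A runnable sketch of the templated plan follows, with the `###` placeholders filled in. The storage layout is an assumption made for illustration: each tuple is two little-endian 32-bit integers `(id, val)` packed back to back in a `bytes` buffer, so `val` sits at byte offset 4. The helper names `pack_table` and `templated_scan` are hypothetical.

```python
import struct

TUPLE_FORMAT = "<ii"                        # assumed layout: (id, val)
tuple_size = struct.calcsize(TUPLE_FORMAT)  # 8 bytes per tuple
predicate_offset = 4                        # byte offset of `val` in a tuple
parameter_value = 999


def pack_table(rows):
    """Serialize rows into one contiguous buffer, like table.data."""
    return b"".join(struct.pack(TUPLE_FORMAT, *row) for row in rows)


def templated_scan(data: bytes, num_tuples: int):
    out = []
    for t in range(num_tuples):
        base = t * tuple_size  # table.data + t * tuple_size
        # Read the attribute directly at its known offset; no schema lookup.
        (val,) = struct.unpack_from("<i", data, base + predicate_offset)
        if val == parameter_value + 1:
            out.append(struct.unpack_from(TUPLE_FORMAT, data, base))
    return out


rows = [(1, 1000), (2, 5), (3, 1000)]
data = pack_table(rows)
matches = templated_scan(data, len(rows))
```

Compared with the interpreted plan, the catalog lookup, the predicate tree traversal, and the type casting have all been resolved before the loop starts.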
DBMS INTEGRATION
The generated query code can invoke any other function in the DBMS.
This allows it to use all the same components as interpreted queries.
→ Concurrency Control
→ Logging / Checkpoints
→ Indexes
EVALUATION
Generic Iterators
→ Canonical model with generic predicate evaluation.
Optimized Iterators
→ Type-specific iterators with inline predicates.
Generic Hardcoded
→ Handwritten code with generic iterators/predicates.
Optimized Hardcoded
→ Direct tuple access with pointer arithmetic.
HIQUE
→ Query-specific specialized code.
QUERY COMPILATION EVALUATION
[Figure: bar chart of execution time (ms) for Generic Iterators, Optimized Iterators, Generic Hardcoded, Optimized Hardcoded, and HIQUE, broken down into L2-cache Miss, Memory Stall, and Instruction Exec.]
Intel Core 2 Duo 6300 @ 1.86GHz
Join Query: 10k ⨝ 10k → 10m
Source: Konstantinos Krikellas
QUERY COMPILATION COST
Compilation Time (ms)
Query   Compile (-O0)   Compile (-O2)
Q1      121             274
Q2      160             403
Q3      213             619
Intel Core 2 Duo 6300 @ 1.86GHz
TPC-H Queries
Source: Konstantinos Krikellas
OBSERVATION
Relational operators are a useful way to reason about a query but are not the most efficient way to execute it.
It takes a (relatively) long time to compile a C/C++ source file into executable code.
HIQUE does not allow for full pipelining…
PIPELINED OPERATORS
SELECT *
  FROM A, C,
       (SELECT B.id, COUNT(*)
          FROM B
         WHERE B.val = ? + 1
         GROUP BY B.id) AS B
 WHERE A.val = 123
   AND A.id = C.a_id
   AND B.id = C.b_id
Pipeline Boundaries
→ Pipeline #1: A → σ A.val=123 → build side of ⨝ A.id=C.a_id
→ Pipeline #2: B → σ B.val=?+1 → Γ B.id,COUNT(*)
→ Pipeline #3: Γ B.id,COUNT(*) → build side of ⨝ B.id=C.b_id
→ Pipeline #4: C → probe ⨝ B.id=C.b_id → probe ⨝ A.id=C.a_id
HYPER – JIT QUERY COMPILATION
Compile queries in-memory into native code using the LLVM toolkit.
Organizes query processing in a way to keep a tuple in CPU registers for as long as possible.
→ Push-based vs. Pull-based
→ Data Centric vs. Operator Centric
EFFICIENTLY COMPILING EFFICIENT QUERY PLANS FOR MODERN HARDWARE
VLDB 2011
LLVM
Collection of modular and reusable compiler and toolchain technologies.
Core component is a low-level programming language (IR) that is similar to assembly.
Not all of the DBMS components need to be written in LLVM IR.
→ LLVM code can make calls to C++ code.
LLVM
Frontends (C, Fortran, Ada) → Common Optimizer, operating on the Intermediate Representation (IR) → Backends (X86, PowerPC, ARM)
Source: The Architecture of Open Source Applications
PUSH-BASED EXECUTION
SELECT *
  FROM A, C,
       (SELECT B.id, COUNT(*)
          FROM B
         WHERE B.val = ? + 1
         GROUP BY B.id) AS B
 WHERE A.val = 123
   AND A.id = C.a_id
   AND B.id = C.b_id
Generated Query Plan
Pipeline #1
for t in A:
    if t.val == 123:
        Materialize t in HashTable ⨝(A.id=C.a_id)
Pipeline #2
for t in B:
    if t.val == <param> + 1:
        Aggregate t in HashTable Γ(B.id)
Pipeline #3
for t in Γ(B.id):
    Materialize t in HashTable ⨝(B.id=C.b_id)
Pipeline #4
for t3 in C:
    for t2 in ⨝(B.id=C.b_id):
        for t1 in ⨝(A.id=C.a_id):
            emit(t1⨝t2⨝t3)
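The four pipelines can be written out as runnable Python to show the shape of the generated code. This is a sketch under assumptions, not HyPer's actual output (HyPer generates LLVM IR): rows are tuples, `A` and `B` hold `(id, val)`, `C` holds `(a_id, b_id)`, the hash tables are Python dictionaries, and the sample data is made up.

```python
from collections import defaultdict


def run_query(A, B, C, param):
    # Pipeline #1: scan A, filter, materialize into the hash table of ⨝(A.id=C.a_id).
    join_a = defaultdict(list)
    for a_id, a_val in A:
        if a_val == 123:
            join_a[a_id].append((a_id, a_val))

    # Pipeline #2: scan B, filter, aggregate COUNT(*) grouped by B.id.
    agg_b = defaultdict(int)
    for b_id, b_val in B:
        if b_val == param + 1:
            agg_b[b_id] += 1

    # Pipeline #3: materialize the aggregate output into the hash table of ⨝(B.id=C.b_id).
    join_b = defaultdict(list)
    for b_id, count in agg_b.items():
        join_b[b_id].append((b_id, count))

    # Pipeline #4: scan C and probe both hash tables; nothing is materialized
    # between the two joins.
    result = []
    for c_a_id, c_b_id in C:
        for t2 in join_b.get(c_b_id, ()):
            for t1 in join_a.get(c_a_id, ()):
                result.append(t1 + t2 + (c_a_id, c_b_id))
    return result


A = [(1, 123), (2, 123), (3, 7)]
B = [(10, 1000), (11, 1000), (12, 5)]
C = [(1, 10), (2, 11), (3, 10), (1, 12), (2, 10)]
rows = run_query(A, B, C, param=999)
```

Each pipeline is one tight loop with the predicates inlined, and a tuple is handled from scan to materialization without any call to a child operator's next.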
QUERY COMPILATION EVALUATION
[Figure: execution time (ms, log scale from 1 to 100000) for TPC-H queries Q1 to Q5, comparing HyPer (LLVM), HyPer (C++), VectorWise, MonetDB, and ???.]
Dual Socket Intel Xeon X5770 @ 2.93GHz
TPC-H Queries
Source: Thomas Neumann
QUERY COMPILATION COST
Compilation Time (ms)
Query      HIQUE (-O2)   HyPer
Query #1   274           13
Query #2   403           37
Query #3   619           15
HIQUE (-O2) vs. HyPer
TPC-H Queries
Source: Konstantinos Krikellas
QUERY COMPILATION COST
LLVM's compilation time grows super-linearly relative to the query size.
→ # of joins
→ # of predicates
→ # of aggregations
Not a big issue with OLTP applications.
Major problem with OLAP workloads.
![Page 53: GT 4420/6422 // SPRING 2019 // @JOY ARULRAJ LECTURE #13 ...jarulraj/courses/4420-s19/slides/13... · CODE GENERATION Approach #1: Transpilation →Write code that converts a relational](https://reader033.vdocument.in/reader033/viewer/2022043004/5f870c3a7f0ee66e7217ad2c/html5/thumbnails/53.jpg)
HYPER – ADAPTIVE EXECUTION
First generate the LLVM IR for the query.
Then execute that IR in an interpreter.
Compile the query in the background.
When the compiled query is ready, seamlessly replace the interpretive execution.
ADAPTIVE EXECUTION OF COMPILED QUERIES, ICDE 2018
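The four steps above can be sketched in Python. This is only an analogy under stated assumptions: a nested-tuple expression stands in for LLVM IR, Python's `eval` stands in for the LLVM compiler, and the switch to compiled code happens at chunk boundaries. The names `interpret`, `to_source`, `run_adaptive`, and `chunk_size` are hypothetical and not taken from HyPer.

```python
import threading

def interpret(ir, row):
    """Slow path: walk a tiny expression 'IR' (nested tuples) for one row."""
    op = ir[0]
    if op == "col":
        return row[ir[1]]
    if op == "const":
        return ir[1]
    left, right = interpret(ir[1], row), interpret(ir[2], row)
    return left + right if op == "add" else left * right

def to_source(ir):
    """Render the IR as a Python expression over a variable named `row`."""
    op = ir[0]
    if op == "col":
        return f"row[{ir[1]!r}]"
    if op == "const":
        return repr(ir[1])
    sym = "+" if op == "add" else "*"
    return f"({to_source(ir[1])} {sym} {to_source(ir[2])})"

def run_adaptive(ir, rows, chunk_size=1000):
    compiled = {}  # filled in by the background thread once compilation is done

    def compile_in_background():
        compiled["fn"] = eval("lambda row: " + to_source(ir))

    worker = threading.Thread(target=compile_in_background)
    worker.start()

    total, compiled_chunks = 0, 0
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        fn = compiled.get("fn")  # check once per chunk whether compiled code exists
        if fn is not None:
            total += sum(fn(r) for r in chunk)
            compiled_chunks += 1
        else:
            total += sum(interpret(ir, r) for r in chunk)
    worker.join()
    return total, compiled_chunks

# Hypothetical workload: SUM(a * 2 + b) over 5000 rows.
ir = ("add", ("mul", ("col", "a"), ("const", 2)), ("col", "b"))
rows = [{"a": i, "b": 1} for i in range(5000)]
total, compiled_chunks = run_adaptive(ir, rows)
```

The result is the same whichever path processes a given chunk; only the number of chunks that ran on the compiled path depends on timing.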
![Page 62: GT 4420/6422 // SPRING 2019 // @JOY ARULRAJ LECTURE #13 ...jarulraj/courses/4420-s19/slides/13... · CODE GENERATION Approach #1: Transpilation →Write code that converts a relational](https://reader033.vdocument.in/reader033/viewer/2022043004/5f870c3a7f0ee66e7217ad2c/html5/thumbnails/62.jpg)
HYPER – ADAPTIVE EXECUTION
SQL Query → Optimizer (0.2 ms) → Query Plan → Code Generator (0.7 ms) → LLVM IR
LLVM IR → Byte Code Compiler (0.4 ms) → Byte Code
LLVM IR → Unoptimized LLVM Compiler (6 ms) → x86 Code
LLVM IR → LLVM Passes (25 ms) → LLVM IR → Optimized LLVM Compiler (17 ms) → x86 Code
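Adding up the stage latencies shows why the interpreter is worth having. The sketch below assumes that each back end starts from the LLVM IR produced by the code generator and that the stages on a path run one after another; the variable names are illustrative.

```python
# Stage latencies in milliseconds, as listed on the slide.
OPTIMIZER, CODEGEN = 0.2, 0.7
BYTECODE_COMPILER = 0.4
UNOPTIMIZED_COMPILER = 6.0
LLVM_PASSES, OPTIMIZED_COMPILER = 25.0, 17.0

# Assumed: every back end consumes the IR emitted by the code generator.
front_end = OPTIMIZER + CODEGEN

time_to_start = {
    "byte code": round(front_end + BYTECODE_COMPILER, 1),
    "unoptimized x86": round(front_end + UNOPTIMIZED_COMPILER, 1),
    "optimized x86": round(front_end + LLVM_PASSES + OPTIMIZED_COMPILER, 1),
}

for mode, ms in time_to_start.items():
    print(f"{mode}: {ms} ms")
```

Under these assumptions the byte code is ready after about 1.3 ms, the unoptimized machine code after about 6.9 ms, and the optimized machine code after about 42.9 ms.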
![Page 63: GT 4420/6422 // SPRING 2019 // @JOY ARULRAJ LECTURE #13 ...jarulraj/courses/4420-s19/slides/13... · CODE GENERATION Approach #1: Transpilation →Write code that converts a relational](https://reader033.vdocument.in/reader033/viewer/2022043004/5f870c3a7f0ee66e7217ad2c/html5/thumbnails/63.jpg)
REAL-WORLD IMPLEMENTATIONS
IBM System R
Oracle
Microsoft Hekaton
Cloudera Impala
Actian Vector
MemSQL
VitesseDB
Apache Spark
Peloton
![Page 64: GT 4420/6422 // SPRING 2019 // @JOY ARULRAJ LECTURE #13 ...jarulraj/courses/4420-s19/slides/13... · CODE GENERATION Approach #1: Transpilation →Write code that converts a relational](https://reader033.vdocument.in/reader033/viewer/2022043004/5f870c3a7f0ee66e7217ad2c/html5/thumbnails/64.jpg)
IBM SYSTEM R
A primitive form of code generation and query compilation was used by IBM in the 1970s.
→ Compiled SQL statements into assembly code by selecting code templates for each operator.
The technique was abandoned when IBM built DB2:
→ High cost of external function calls
→ Poor portability
→ Software engineering complications
A HISTORY AND EVALUATION OF SYSTEM R, COMMUNICATIONS OF THE ACM 1981
![Page 65: GT 4420/6422 // SPRING 2019 // @JOY ARULRAJ LECTURE #13 ...jarulraj/courses/4420-s19/slides/13... · CODE GENERATION Approach #1: Transpilation →Write code that converts a relational](https://reader033.vdocument.in/reader033/viewer/2022043004/5f870c3a7f0ee66e7217ad2c/html5/thumbnails/65.jpg)
ORACLE
Converts PL/SQL stored procedures into Pro*C code, which is then compiled into native C/C++ code.
They also put Oracle-specific operations directly in the SPARC chips as co-processors.
→ Memory Scans
→ Bit-pattern Dictionary Compression
→ Vectorized instructions designed for DBMSs
→ Security/encryption
![Page 66: GT 4420/6422 // SPRING 2019 // @JOY ARULRAJ LECTURE #13 ...jarulraj/courses/4420-s19/slides/13... · CODE GENERATION Approach #1: Transpilation →Write code that converts a relational](https://reader033.vdocument.in/reader033/viewer/2022043004/5f870c3a7f0ee66e7217ad2c/html5/thumbnails/66.jpg)
MICROSOFT HEKATON
Can compile both procedures and SQL.
→ Non-Hekaton queries can access Hekaton tables through compiled inter-operators.
Generates C code from an imperative syntax tree, compiles it into a DLL, and links at runtime.
Employs safety measures to prevent somebody from injecting malicious code in a query.
COMPILATION IN THE MICROSOFT SQL SERVER HEKATON ENGINE, IEEE DATA ENGINEERING BULLETIN 2014
![Page 67: GT 4420/6422 // SPRING 2019 // @JOY ARULRAJ LECTURE #13 ...jarulraj/courses/4420-s19/slides/13... · CODE GENERATION Approach #1: Transpilation →Write code that converts a relational](https://reader033.vdocument.in/reader033/viewer/2022043004/5f870c3a7f0ee66e7217ad2c/html5/thumbnails/67.jpg)
CLOUDERA IMPALA
LLVM JIT compilation for predicate evaluation and record parsing.
→ Not sure if they are also doing operator compilation.
Optimized record parsing is important for Impala because they need to handle multiple data formats stored on HDFS.
IMPALA: A MODERN, OPEN-SOURCE SQL ENGINE FOR HADOOP, CIDR 2015
![Page 72: GT 4420/6422 // SPRING 2019 // @JOY ARULRAJ LECTURE #13 ...jarulraj/courses/4420-s19/slides/13... · CODE GENERATION Approach #1: Transpilation →Write code that converts a relational](https://reader033.vdocument.in/reader033/viewer/2022043004/5f870c3a7f0ee66e7217ad2c/html5/thumbnails/72.jpg)
MEMSQL (PRE–2016)
Performs the same C/C++ code generation as HIQUE and then invokes gcc.
Converts all queries into a parameterized form and caches the compiled query plan.
Incoming queries:
SELECT * FROM A WHERE A.id = 123
SELECT * FROM A WHERE A.id = 456
Cached parameterized form:
SELECT * FROM A WHERE A.id = ?
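The parameterize-then-cache idea can be sketched as follows. This is a toy illustration, not MemSQL's implementation: it assumes literals are plain integers found with a regular expression, and `compile_plan` is a hypothetical stand-in for the expensive code generation and gcc step that only handles `A.id = ?` lookups.

```python
import re

plan_cache = {}     # parameterized SQL text -> compiled plan (a Python callable)
compile_count = 0   # how many times the expensive "compile" step ran

def parameterize(sql):
    """Replace integer literals with '?' and return (template, params)."""
    params = [int(m) for m in re.findall(r"\b\d+\b", sql)]
    template = re.sub(r"\b\d+\b", "?", sql)
    return template, params

def compile_plan(template):
    """Stand-in for code generation + gcc; handles only `A.id = ?` lookups."""
    global compile_count
    compile_count += 1
    return lambda table, params: [row for row in table if row["id"] == params[0]]

def execute(sql, table):
    template, params = parameterize(sql)
    plan = plan_cache.get(template)
    if plan is None:  # cache miss: compile once, reuse for later queries
        plan = plan_cache[template] = compile_plan(template)
    return plan(table, params)

# Hypothetical table contents.
A = [{"id": 123, "val": "x"}, {"id": 456, "val": "y"}]
r1 = execute("SELECT * FROM A WHERE A.id = 123", A)
r2 = execute("SELECT * FROM A WHERE A.id = 456", A)
```

Both queries map to the same template, so the second call skips compilation and only binds a different parameter.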
![Page 73: GT 4420/6422 // SPRING 2019 // @JOY ARULRAJ LECTURE #13 ...jarulraj/courses/4420-s19/slides/13... · CODE GENERATION Approach #1: Transpilation →Write code that converts a relational](https://reader033.vdocument.in/reader033/viewer/2022043004/5f870c3a7f0ee66e7217ad2c/html5/thumbnails/73.jpg)
MEMSQL (2016–PRESENT)
A query plan is converted into an imperative plan expressed in a high-level imperative DSL.
→ MemSQL Programming Language (MPL)
→ Think of this as a C++ dialect.
The DSL then gets converted into a second language of opcodes.
→ MemSQL Bit Code (MBC)
→ Think of this as JVM byte code.
Finally the DBMS compiles the opcodes into LLVM IR and then to native code.
Source: Drew Paroski
![Page 74: GT 4420/6422 // SPRING 2019 // @JOY ARULRAJ LECTURE #13 ...jarulraj/courses/4420-s19/slides/13... · CODE GENERATION Approach #1: Transpilation →Write code that converts a relational](https://reader033.vdocument.in/reader033/viewer/2022043004/5f870c3a7f0ee66e7217ad2c/html5/thumbnails/74.jpg)
APACHE SPARK
Introduced in the new Tungsten engine in 2015.
The system converts a query's WHERE clause expression trees into ASTs.
It then compiles these ASTs to generate JVM bytecode, which is then executed natively.
SPARK SQL: RELATIONAL DATA PROCESSING IN SPARK, SIGMOD 2015
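The expression-tree-to-bytecode flow has a close analogue in Python's standard `ast` module, which is used here as a stand-in for the JVM tool chain. This is a sketch under assumptions: the nested-tuple expression format and the `to_ast` and `compile_predicate` helpers are invented for illustration, and the code targets Python 3.9 or later.

```python
import ast

def to_ast(expr):
    """Translate a tiny expression tree (nested tuples) into a Python AST node."""
    kind = expr[0]
    if kind == "col":   # ("col", "age") becomes row["age"]
        return ast.Subscript(
            value=ast.Name(id="row", ctx=ast.Load()),
            slice=ast.Constant(value=expr[1]),
            ctx=ast.Load(),
        )
    if kind == "lit":   # ("lit", 21) becomes the constant 21
        return ast.Constant(value=expr[1])
    if kind == "and":
        return ast.BoolOp(op=ast.And(), values=[to_ast(expr[1]), to_ast(expr[2])])
    ops = {"gt": ast.Gt(), "eq": ast.Eq(), "lt": ast.Lt()}
    return ast.Compare(left=to_ast(expr[1]), ops=[ops[kind]],
                       comparators=[to_ast(expr[2])])

def compile_predicate(expr):
    """Wrap the AST in `lambda row: <expr>` and compile it to Python bytecode."""
    args = ast.arguments(posonlyargs=[], args=[ast.arg(arg="row")], vararg=None,
                         kwonlyargs=[], kw_defaults=[], kwarg=None, defaults=[])
    tree = ast.Expression(body=ast.Lambda(args=args, body=to_ast(expr)))
    ast.fix_missing_locations(tree)
    return eval(compile(tree, filename="<predicate>", mode="eval"))

# Hypothetical predicate: WHERE age > 21 AND city = 'ATL'
where = ("and", ("gt", ("col", "age"), ("lit", 21)),
                ("eq", ("col", "city"), ("lit", "ATL")))
pred = compile_predicate(where)
rows = [{"age": 30, "city": "ATL"}, {"age": 30, "city": "NYC"}, {"age": 18, "city": "ATL"}]
matches = [r for r in rows if pred(r)]
```

The predicate is compiled once and then called per row, so the expression tree is not walked again for each tuple.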
![Page 75: GT 4420/6422 // SPRING 2019 // @JOY ARULRAJ LECTURE #13 ...jarulraj/courses/4420-s19/slides/13... · CODE GENERATION Approach #1: Transpilation →Write code that converts a relational](https://reader033.vdocument.in/reader033/viewer/2022043004/5f870c3a7f0ee66e7217ad2c/html5/thumbnails/75.jpg)
PELOTON
Full compilation of the entire query plan.
Relax the pipeline breakers of HyPer to create mini-batches for operators that can be vectorized.
Use software pre-fetching to hide memory stalls.
RELAXED OPERATOR FUSION FOR IN-MEMORY DATABASES: MAKING COMPILATION, VECTORIZATION, AND PREFETCHING WORK TOGETHER AT LAST, VLDB 2017
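The difference between a fully fused pipeline and one with a relaxed pipeline breaker can be sketched in Python. This only shows the control-flow shape: Python cannot express SIMD or software prefetching, and the function names, the `val > 10` filter, and `batch_size` are assumptions made for illustration rather than Peloton's generated code.

```python
def fused(rows, ht):
    """Fully fused pipeline: filter and hash probe in one tuple-at-a-time loop."""
    out = []
    for r in rows:
        if r["val"] > 10:
            match = ht.get(r["key"])
            if match is not None:
                out.append((r["key"], match))
    return out

def relaxed(rows, ht, batch_size=4):
    """Same pipeline with a staging buffer between the filter and the probe."""
    out, batch = [], []

    def probe_batch(keys):
        # Stage 2 sees a whole mini-batch of keys. A compiled engine could
        # issue prefetches for all of them here before doing the lookups.
        for k in keys:
            match = ht.get(k)
            if match is not None:
                out.append((k, match))

    for r in rows:
        # Stage 1: a tight filter loop that only fills the staging buffer.
        if r["val"] > 10:
            batch.append(r["key"])
            if len(batch) == batch_size:
                probe_batch(batch)
                batch = []
    probe_batch(batch)  # flush the final, possibly partial, batch
    return out

# Hypothetical data: even keys exist in the hash table.
ht = {k: f"payload{k}" for k in range(0, 20, 2)}
rows = [{"key": k, "val": k * 3} for k in range(20)]
assert fused(rows, ht) == relaxed(rows, ht)
```

Both variants return the same tuples in the same order; the relaxed version trades a small amount of materialization for loops that are simple enough to vectorize or prefetch for.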
![Page 76: GT 4420/6422 // SPRING 2019 // @JOY ARULRAJ LECTURE #13 ...jarulraj/courses/4420-s19/slides/13... · CODE GENERATION Approach #1: Transpilation →Write code that converts a relational](https://reader033.vdocument.in/reader033/viewer/2022043004/5f870c3a7f0ee66e7217ad2c/html5/thumbnails/76.jpg)
PELOTON
[Figure: execution time in ms (log scale, 1 to 100000) for TPC-H queries Q1, Q3, Q13, Q14, and Q19 on a 10 GB database.]
Interpreted: 88147, 26350, 87473, 9960, 21500
LLVM: 901, 1396, 2641, 383, 540
LLVM + ROF: 892, 846, 1763, 191, 220
Dual Socket Intel Xeon E5-2630v4 @ 2.20GHz
Source: Prashanth Menon
![Page 77: GT 4420/6422 // SPRING 2019 // @JOY ARULRAJ LECTURE #13 ...jarulraj/courses/4420-s19/slides/13... · CODE GENERATION Approach #1: Transpilation →Write code that converts a relational](https://reader033.vdocument.in/reader033/viewer/2022043004/5f870c3a7f0ee66e7217ad2c/html5/thumbnails/77.jpg)
PARTING THOUGHTS
Query compilation makes a difference but is non-trivial to implement.
The 2016 version of MemSQL is the best query compilation implementation out there.
Hekaton is very good too.
Any new DBMS that wants to compete has to implement query compilation.
![Page 78: GT 4420/6422 // SPRING 2019 // @JOY ARULRAJ LECTURE #13 ...jarulraj/courses/4420-s19/slides/13... · CODE GENERATION Approach #1: Transpilation →Write code that converts a relational](https://reader033.vdocument.in/reader033/viewer/2022043004/5f870c3a7f0ee66e7217ad2c/html5/thumbnails/78.jpg)
NEXT CLASS
Query Optimization