Boosting Spark Performance: An Overview of Techniques
Ahsan Javed Awan
About me

● Erasmus Mundus Joint Doctoral Fellow at KTH Sweden and UPC Spain.
● Visiting Researcher at the Barcelona Supercomputing Center.
● Speaker at Spark Summit Europe 2016.
● Wrote the Licentiate Thesis, “Performance Characterization of In-Memory Data Analytics with Apache Spark”.
● https://www.kth.se/profile/ajawan/
Why should you listen?

● What's new in Apache Spark 2.0
  ● Phase 1: Memory Management and Cache-aware Algorithms
  ● Phase 2: Whole-stage Codegen and Columnar In-Memory Support
● How to get better performance by
  ● Choosing and tuning the GC.
  ● Using multiple executors, each with a heap size of no more than 32 GB.
  ● Exploiting data locality on DRAM nodes.
  ● Turning off hardware prefetchers.
  ● Keeping Hyper-Threading on.
Apache Spark Philosophy?
Cont...

*Source: http://navcode.info/2012/12/24/cloud-scaling-schemes/

Scale-up frameworks: Phoenix++, Metis, Ostrich, etc.
Scale-out frameworks: Hadoop, Spark, Flink, etc.
Cont...

*Source: SGI

● Exponential increase in core count.
● A mismatch between the characteristics of emerging big data workloads and the underlying hardware.
● Newer promising technologies (Hybrid Memory Cubes, NVRAM, etc.)

Related studies:
● Clearing the clouds, ASPLOS '12
● Characterizing data analysis workloads, IISWC '13
● Understanding the behavior of in-memory computing workloads, IISWC '14
Goals of Project Tungsten

Substantially improve the memory and CPU efficiency of Spark's backend execution and push performance closer to the limits of modern hardware.
Cont...

Phase 1: Foundation
● Memory Management
● Code Generation
● Cache-aware Algorithms

Phase 2: Order-of-magnitude Faster
● Whole-stage Codegen
● Vectorization
Summary of Phase 1

Perform explicit memory management instead of relying on Java objects:
• Reduce memory footprint
• Eliminate garbage collection overheads
• Use sun.misc.Unsafe rows and off-heap memory

Code generation for expression evaluation:
• Reduce virtual function calls and interpretation overhead

Cache-conscious sorting:
• Reduce bad memory access patterns
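The idea of explicit, GC-invisible storage can be sketched in plain Java with `ByteBuffer.allocateDirect`. This is only an illustration of the concept, not Spark's actual `UnsafeRow` implementation (Spark uses `sun.misc.Unsafe` and its own binary row format); the class and field names here are hypothetical.

```java
import java.nio.ByteBuffer;

// Sketch of explicit memory management: fixed-width (int, double) records
// packed into one off-heap buffer instead of one Java object per row.
// Illustrative only -- not Spark's actual UnsafeRow code.
public class OffHeapRows {
    static final int ROW_SIZE = Integer.BYTES + Double.BYTES; // 12 bytes per row

    private final ByteBuffer buf; // direct buffer: lives outside the GC'd heap

    public OffHeapRows(int capacity) {
        this.buf = ByteBuffer.allocateDirect(capacity * ROW_SIZE);
    }

    public void put(int row, int key, double value) {
        buf.putInt(row * ROW_SIZE, key);
        buf.putDouble(row * ROW_SIZE + Integer.BYTES, value);
    }

    public int key(int row)      { return buf.getInt(row * ROW_SIZE); }
    public double value(int row) { return buf.getDouble(row * ROW_SIZE + Integer.BYTES); }

    public static void main(String[] args) {
        OffHeapRows rows = new OffHeapRows(2);
        rows.put(0, 1000, 4.1);
        rows.put(1, 2000, 3.5);
        System.out.println(rows.key(0) + " " + rows.value(1)); // prints "1000 3.5"
    }
}
```

Because the rows live in one dense buffer, the garbage collector never has to scan them, which is the footprint and GC benefit the slide describes.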
Which Benchmarks?
Which Machine?

Our hardware configuration: Intel's Ivy Bridge server.
Performance of Cache-aware Algorithms?

DataFrames exhibit 25% fewer back-end bound stalls, 64% fewer DRAM-bound stalled cycles, 25% less bandwidth consumption, and 10% less starvation of execution resources.
Phase 2

It is difficult to get order-of-magnitude speedups with profiling techniques:
• For a 10x improvement, you would need to find top hotspots that add up to 90% of the runtime and make them instantaneous.
• For 100x, 99%.

Instead, look bottom-up: how fast should it run?
Cont...

select count(*) from store_sales
where ss_item_sk = 1000

Query plan: Scan → Filter → Project → Aggregate
Volcano Iterator Model

Standard for 30 years: almost all databases do it.

Each operator is an “iterator” that consumes records from its input operator:

class Filter(
    child: Operator,
    predicate: (Row => Boolean)) extends Operator {

  def next(): Row = {
    var current = child.next()
    // skip rows until one satisfies the predicate
    while (current == null || !predicate(current)) {
      current = child.next()
    }
    current
  }
}
Hand-Written Code

select count(*) from store_sales
where ss_item_sk = 1000

long count = 0;
for (ss_item_sk in store_sales) {
  if (ss_item_sk == 1000) {
    count += 1;
  }
}
Volcano Model vs Hand-Written Code

Volcano model: 13.95 million rows/sec
Hand-written code (“college freshman” style): 125 million rows/sec (high throughput)

Note: End-to-end, single thread, single column, and data originated in Parquet on disk.
Volcano vs Hand-Written Code

Volcano model:
1. Too many virtual function calls
2. Intermediate data in memory (or L1/L2/L3 cache)
3. Can’t take advantage of modern CPU features -- no loop unrolling, SIMD, pipelining, prefetching, branch prediction, etc.

Hand-written code:
1. No virtual function calls
2. Data in CPU registers
3. Compiler loop unrolling, SIMD, pipelining
Whole-Stage Codegen

Fusing operators together so the generated code looks like hand-optimized code:
- Identify chains of operators (“stages”)
- Compile each stage into a single function
- Get the functionality of a general-purpose execution engine, with performance as if the system had been hand-built just to run your query
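The contrast between the two styles can be sketched in Java on the `count(*) where x == 1000` query. This is a hand-written illustration of the idea only: Spark actually generates such fused Java source at runtime via Janino, and the names below are hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.IntPredicate;

// Two executions of "select count(*) where x == 1000": a Volcano-style
// iterator chain vs one fused loop. Illustrative sketch, not Spark code.
public class FusedVsVolcano {

    // Volcano style: Filter is an iterator pulling from a child iterator,
    // costing one virtual next()/hasNext() call per row per operator.
    static Iterator<Integer> filter(Iterator<Integer> child, IntPredicate p) {
        return new Iterator<Integer>() {
            Integer lookahead = advance();
            Integer advance() {
                while (child.hasNext()) {
                    int v = child.next();
                    if (p.test(v)) return v;
                }
                return null;
            }
            public boolean hasNext() { return lookahead != null; }
            public Integer next() {
                if (lookahead == null) throw new NoSuchElementException();
                Integer v = lookahead;
                lookahead = advance();
                return v;
            }
        };
    }

    static long volcanoCount(int[] data) {
        Iterator<Integer> scan = Arrays.stream(data).boxed().iterator();
        Iterator<Integer> filtered = filter(scan, v -> v == 1000);
        long count = 0;
        while (filtered.hasNext()) { filtered.next(); count++; }
        return count;
    }

    // Fused ("whole-stage") style: scan, filter and count collapsed into a
    // single loop -- no virtual calls, the running count stays in a register.
    static long fusedCount(int[] data) {
        long count = 0;
        for (int v : data) {
            if (v == 1000) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int[] data = {1000, 42, 1000, 7, 1000};
        System.out.println(volcanoCount(data) + " " + fusedCount(data)); // prints "3 3"
    }
}
```

Both functions compute the same answer; the fused version is what the JIT compiler can actually unroll, pipeline, and vectorize.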
Columnar In-Memory

In-memory row format:
1 john 4.1
2 mike 3.5
3 sally 6.4

In-memory column format:
1 2 3
john mike sally
4.1 3.5 6.4
Why Columnar?

1. More efficient: denser storage, regular data access, easier to index into
2. More compatible: most high-performance external systems are already columnar (NumPy, TensorFlow, Parquet); zero serialization/copying to work with them
3. Easier to extend: process encoded data
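The "regular data access" point can be sketched in Java using the three-row table from the previous slide. This is an illustration of the layout difference only; Spark's real columnar format lives in its `ColumnVector`/`ColumnarBatch` classes, and the names below are hypothetical.

```java
// Row vs column layout for the 3-row (id, name, score) table.
// Illustrative sketch, not Spark's columnar implementation.
public class RowVsColumn {

    // Row format: one object per record, fields scattered across the heap.
    static class Row {
        final int id; final String name; final double score;
        Row(int id, String name, double score) {
            this.id = id; this.name = name; this.score = score;
        }
    }

    // Column format: one dense primitive array per column -- regular strides,
    // easy to index into, and friendly to SIMD in the loop below.
    static final int[]    ids    = {1, 2, 3};
    static final String[] names  = {"john", "mike", "sally"};
    static final double[] scores = {4.1, 3.5, 6.4};

    static double scoreSumColumnar() {
        double sum = 0;
        for (double s : scores) sum += s; // touches only the column it needs
        return sum;
    }

    static double scoreSumRows(Row[] rows) {
        double sum = 0;
        for (Row r : rows) sum += r.score; // drags whole rows through the cache
        return sum;
    }

    public static void main(String[] args) {
        Row[] rows = {new Row(1, "john", 4.1), new Row(2, "mike", 3.5),
                      new Row(3, "sally", 6.4)};
        System.out.println(scoreSumColumnar() == scoreSumRows(rows)); // prints "true"
    }
}
```

Both loops compute the same sum, but the columnar one reads a contiguous `double[]`, which is also the zero-copy shape that NumPy, TensorFlow, and Parquet expect.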
High throughput

Parquet: 11 million rows/sec
Parquet vectorized: 90 million rows/sec

Note: End-to-end, single thread, single column, and data originated in Parquet on disk.
Phase 1 (Spark 1.4 - 1.6):
● Memory Management
● Code Generation
● Cache-aware Algorithms

Phase 2 (Spark 2.0+):
● Whole-stage Code Generation
● Columnar In-Memory Support

Both whole-stage codegen [SPARK-12795] and the vectorized Parquet reader [SPARK-12992] are enabled by default in Spark 2.0+.
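For comparison runs, both features can be toggled through Spark SQL configuration; a sketch (flag names as in Spark 2.0's SQL configuration):

```shell
# Both features are on by default in Spark 2.0+; the flags below only
# matter if you want to switch them off for a baseline measurement.
spark-shell \
  --conf spark.sql.codegen.wholeStage=false \
  --conf spark.sql.parquet.enableVectorizedReader=false
```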
Operator Benchmarks: Cost/Row (ns)

5-30x speedups
Spark 2.1, 2.2 and beyond

1. SPARK-16026: Cost-Based Optimizer
   - Leverage table/column-level statistics to optimize joins and aggregates
   - Statistics Collection Framework (Spark 2.1)
   - Cost-Based Optimizer (Spark 2.2)
2. Boosting Spark’s Performance on Many-Core Machines
   - Qifan’s talk today at 2:55pm (Research Track)
   - In-memory / single-node shuffle
3. Improving the quality of generated code and better integration with the in-memory column format in Spark
The choice of Garbage Collector impacts the data processing capability of the system.

Improvement in DPS ranges from 1.4x to 3.7x on average with Parallel Scavenge as compared to G1.
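One way to select the collector is through the executor and driver JVM options; a sketch, where `my-app.jar` is a placeholder for your application:

```shell
# Sketch: run Spark with the Parallel Scavenge collector instead of the
# JVM default, via Spark's extraJavaOptions settings.
spark-submit \
  --conf "spark.executor.extraJavaOptions=-XX:+UseParallelGC" \
  --conf "spark.driver.extraJavaOptions=-XX:+UseParallelGC" \
  my-app.jar
```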
Our Approach: Multiple small executors instead of a single large executor

Multiple small executors can provide up to a 36% performance gain.
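A sketch of such a layout with spark-submit (the core/memory numbers are illustrative, not from the talk's experiments; `--num-executors` applies on YARN):

```shell
# Sketch: split one large machine into several small executors, keeping
# each heap well under 32 GB so the JVM can use compressed object pointers.
spark-submit \
  --num-executors 4 \
  --executor-cores 6 \
  --executor-memory 20g \
  my-app.jar
```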
Our Approach: NUMA Awareness

NUMA awareness results in a 10% speedup on average.
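One way to get NUMA awareness outside of Spark itself is to pin each worker to a socket with `numactl`; a sketch (the master URL is a placeholder, and `start-slave.sh` is the worker-launch script in Spark's `sbin` directory of that era):

```shell
# Sketch: bind one worker's CPUs and allocations to NUMA node 0, so its
# memory accesses stay on local DRAM; repeat per node with its own binding.
numactl --cpunodebind=0 --membind=0 \
  $SPARK_HOME/sbin/start-slave.sh spark://master:7077
```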
Our Approach: Hyper-Threading is effective

Hyper-Threading reduces DRAM-bound stalls by 50%.
Our Approach: Disable next-line prefetchers

Disabling next-line prefetchers can improve performance by 15%.
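On many Intel server parts the hardware prefetchers are controlled through MSR 0x1A4; a sketch using `wrmsr` from msr-tools. The bit assignments below are an assumption for this class of CPU and are model-specific, so verify them against Intel's prefetcher-control documentation before trying this:

```shell
# Sketch: disable the adjacent-line and L1 (DCU) next-line prefetchers by
# writing MSR 0x1A4 on all cores (requires root and the msr kernel module).
sudo modprobe msr
# 0x6 sets bits 1 and 2 -- assumed here to be the L2 adjacent-line and
# DCU next-line prefetcher disable bits; check your CPU model first.
sudo wrmsr -a 0x1a4 0x6
```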
Further Reading

● Performance Characterization of In-Memory Data Analytics on a Modern Cloud Server, 5th IEEE Conference on Big Data and Cloud Computing, 2015 (Best Paper Award).
● How Data Volume Affects Spark Based Data Analytics on a Scale-up Server, 6th Workshop on Big Data Benchmarks, Performance Optimization and Emerging Hardware (BPOE), held in conjunction with VLDB 2015, Hawaii, USA.
● Micro-architectural Characterization of Apache Spark on Batch and Stream Processing Workloads, 6th IEEE Conference on Big Data and Cloud Computing, 2016.
● Node Architecture Implications for In-Memory Data Analytics in Scale-in Clusters, 3rd IEEE/ACM Conference on Big Data Computing, Applications and Technologies, 2016.
● Implications of In-Memory Data Analytics with Apache Spark on Near Data Computing Architectures (under submission).
THANK YOU.

Acknowledgements: Sameer Agarwal for the Project Tungsten slides.