deep dive into project tungsten: bringing spark closer to bare metal-(josh rosen, databricks)

28
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal Josh Rosen (@ jshrsn ) June 16, 2015

Upload: spark-summit

Post on 12-Aug-2015

1.171 views

Category:

Data & Analytics


1 download

TRANSCRIPT

Page 1: Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Rosen, Databricks)

Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal Josh Rosen (@jshrsn) June 16, 2015

Page 2: Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Rosen, Databricks)

About Databricks

Offers a hosted service: •  Spark on EC2 •  Notebooks •  Plot visualizations •  Cluster management •  Scheduled jobs

2

Founded by creators of Spark and remains largest contributor

Page 3: Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Rosen, Databricks)

Goals of Project Tungsten

Substantially improve the memory and CPU efficiency of Spark applications . Push performance closer to the limits of modern hardware.

3

Page 4: Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Rosen, Databricks)

In this talk

4

• Motivation: why we’re focusing on compute instead of IO • How Tungsten optimizes memory + CPU • Case study: aggregation • Case study: record sorting • Performance results • Roadmap + next steps

Page 5: Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Rosen, Databricks)

Many big data workloads are now compute bound

5

NSDI’15:

•  “Network optimizations can only reduce job completion time by a median of at most 2%.”

•  “Optimizing or eliminating disk accesses can only reduce job completion time by a median of at most 19%.”

•  We’ve observed similar characteristics in many Databricks Cloud customer workloads.

Page 6: Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Rosen, Databricks)

Why is CPU the new bottleneck?

6

•  Hardware has improved: –  Increasingly large aggregate IO bandwidth, such as 10Gbps links in

networks –  High bandwidth SSD’s or striped HDD arrays for storage

•  Spark’s IO has been optimized: –  many workloads now avoid significant disk IO by pruning input data

that is not needed in a given job –  new shuffle and network layer implementations

•  Data formats have improved: –  Parquet, binary data formats

•  Serialization and hashing are CPU-bound bottlenecks

Page 7: Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Rosen, Databricks)

How Tungsten improves CPU & memory efficiency •  Memory Management and Binary Processing: leverage

application semantics to manage memory explicitly and eliminate the overhead of JVM object model and garbage collection

•  Cache-aware computation: algorithms and data structures to exploit memory hierarchy

•  Code generation: exploit modern compilers and CPUs; allow efficient operation directly on binary data

7

Page 8: Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Rosen, Databricks)

8

Page 9: Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Rosen, Databricks)

The overheads of Java objects

“abcd”

9

•  Native: 4 bytes with UTF-8 encoding •  Java: 48 bytes

java.lang.String object internals:OFFSET SIZE TYPE DESCRIPTION VALUE 0 4 (object header) ... 4 4 (object header) ... 8 4 (object header) ... 12 4 char[] String.value [] 16 4 int String.hash 0 20 4 int String.hash32 0Instance size: 24 bytes (reported by Instrumentation API)

12 byte object header

8 byte hashcode 20 bytes of overhead + 8 bytes for chars

Page 10: Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Rosen, Databricks)

Garbage collection challenges

•  Many big data workloads create objects in ways that are unfriendly to regular Java GC.

•  Guest blog on GC tuning: tinyurl.com/db-gc-tuning

10

eden   S0   S1   tenured   permanent  

Permanent Generation Old Generation Young Generation

Survivor Space

Page 11: Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Rosen, Databricks)

sun.misc.Unsafe

11

•  JVM internal API for directly manipulating memory without safety checks (hence “unsafe”)

•  We use this API to build data structures in both on- and off-heap memory

Data  structures  

with  pointers  

Flat  data  structures  

Complex  examples  

Page 12: Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Rosen, Databricks)

Java object-based row representation

12

3 fields of type (int, string, string) with value (123, “data”, “bricks”)

GenericMutableRow  

Array   String(“data”)  

String(“bricks”)  

5+ objects; high space overhead; expensive hashCode()

BoxedInteger(123)  

Page 13: Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Rosen, Databricks)

Tungsten’s UnsafeRow format

13

•  Bit set for tracking null values •  Every column appears in the fixed-length values region:

–  Small values are inlined –  For variable-length values (strings), we store a relative offset into the variable-

length data section •  Rows are always 8-byte word aligned (size is multiple of 8 bytes) •  Equality comparison and hashing can be performed on raw bytes without

requiring additional interpretation

null  bit  set  (1  bit/field)    

values  (8  bytes  /  field)    

 variable  length  

 Offset to var. length data

Page 14: Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Rosen, Databricks)

6 “bricks”

Example of an UnsafeRow

14

0x0 123 32L 48L 4 “data”

(123, “data”, “bricks”)

Null tracking bitmap

Offset to var. length data

Offset to var. length data Field lengths

Page 15: Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Rosen, Databricks)

How we encode memory addresses

15

•  Off heap: addresses are raw memory pointers. •  On heap: addresses are base object + offset pairs. •  We use our own “page table” abstraction to enable more

compact encoding of on-heap addresses:

0  1  …  

N  –  1  Page table

Data  page  (Java  object)  

page   offset  in  page  

Page 16: Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Rosen, Databricks)

16

java.util.HashMap

…  

key  ptr   value  ptr   next  

key   value  

array

•  Huge object overheads

•  Poor memory locality

•  Size estimation is hard

Page 17: Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Rosen, Databricks)

Memory  page  

hc  

17

Tungsten’s BytesToBytesMap

ptr  

…  

array

•  Low space overheads

•  Good memory locality, especially for scans

key   value   key   value  key   value   key   value  

key   value   key   value  

Page 18: Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Rosen, Databricks)

Code generation •  Generic evaluation of expression logic

is very expensive on the JVM –  Virtual function calls –  Branches based on expression type –  Object creation due to primitive boxing –  Memory consumption by boxed

primitive objects •  Generating custom bytecode can

eliminate these overheads

18

9.33

9.36

36.65

Hand written

Code gen

Interpreted Projection

Evaluating “SELECT a + a + a” (query time in seconds)

Page 19: Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Rosen, Databricks)

Code generation •  Project Tungsten uses the Janino compiler to reduce code generation time. •  Spark 1.5 will greatly expand the number of expressions that support code

generation: –  SPARK-8159

19

Page 20: Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Rosen, Databricks)

Example: aggregation optimizations in DataFrames and Spark SQL

20

df.groupBy("department").agg(max("age"), sum("expense"))

Page 21: Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Rosen, Databricks)

Example: aggregation optimizations in DataFrames and Spark SQL

21

Input  Row   Grouping  Key   UnsafeRow  project convert

BytesToBytesMap   scan

Update  Aggregates  

Agg.  Result  

update in place

probe

SPARK-7080

Page 22: Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Rosen, Databricks)

Optimized record sorting in Spark SQL + DataFrames (SPARK-7082)

22

pointer  

•  AlphaSort-style prefix sort: –  Store prefixes of sort keys inside the sort pointer array –  During sort, compare prefixes to short-circuit and avoid full record comparisons

•  Use this to build external sort-merge join to support joins larger than memory

record  

Key  prefix   pointer   record  

Naïve layout

Cache friendly layout

Page 23: Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Rosen, Databricks)

Initial performance results for agg. query

23

0

200

400

600

800

1000

1200

1x 2x 4x 8x 16x

Run time (seconds)

Data set size (relative)

Default

Code Gen

Tungsten onheap

Tungsten offheap

Page 24: Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Rosen, Databricks)

Initial performance results for agg. query

24

0

50

100

150

200

1x 2x 4x 8x 16x

Average GC time per

node (seconds)

Data set size (relative)

Default

Code Gen

Tungsten onheap

Tungsten offheap

Page 25: Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Rosen, Databricks)

Project Tungsten Roadmap

25

Spark  1.4   Spark  1.5   Spark  1.6  

•  Binary processing for aggregation in Spark SQL / DataFrames

•  New Tungsten shuffle manager

•  Compression & serialization optimizations

•  Optimized code generation

•  Optimized sorting in Spark SQL / DataFrames

•  End-to-end processing using binary data representations

•  External aggregation

•  Vectorized / batched processing

•  ???

Page 26: Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Rosen, Databricks)

Which Spark jobs can benefit from Tungsten?

26

•  DataFrames –  Java –  Scala –  Python –  R

•  Spark SQL queries •  Some Spark RDD API programs, via general serialization + compression

optimizations

logs.join( !"users, !"logs.userId == users.userId, !""left_outer") \ !

.groupBy("userId").agg({"*": "count"}) !

Page 27: Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Rosen, Databricks)

How to enable all of Spark 1.4’s Tungsten optimizations

27

spark.sql.codegen = truespark.sql.unsafe.enabled = truespark.shuffle.manager = tungsten-sort

Warning!  These  features  are  experimental  in  1.4!  

Page 28: Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Rosen, Databricks)

Thank you. Follow our progress on JIRA: SPARK-7075