ORC 2015

Page 1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved ORC: 2015 Gopal Vijayaraghavan

Upload: t3rmin4t0r

Posted on 16-Jul-2015

Page 1: ORC 2015

ORC: 2015

Gopal Vijayaraghavan

Page 2: ORC 2015

ORC – Optimized Row-Columnar File

● Columnar Storage

● Row-groups & Fixed splits

● Protobuf Metadata Storage

● Type-safe Vectorization

● Hive ACID transactions

● Single SerDe for Format

Page 3: ORC 2015

Need for Speed: The Stinger Initiative

Stinger: An Open Roadmap to improve Apache Hive’s performance 100x.

Launched: February 2013; Delivered: April 2014.

Delivered in 100% Apache Open Source.

Vectorized SQL Engine (SQL engine) + ORC (columnar storage) + Apache Tez (distributed execution) = 100X+

Page 4: ORC 2015

ORC at Facebook

● Saved more than 1,400 servers' worth of storage.

● Compression ratio increased from 5x to 8x globally.

[1]

Page 5: ORC 2015

ORC at Spotify

● IO: 16x less HDFS read when using ORC versus Avro.(5)

● CPU: 32x less CPU when using ORC versus Avro.(5)

[2]

Page 6: ORC 2015

ORC: Today

What is Optimized about ORC?

Page 7: ORC 2015

ORC – Optimized Row-Columnar File

● Columnar Storage

● Row-groups & Stripe splits

● Protobuf Metadata Storage

● Type-safe Vectorization

● Hive ACID transactions

● Single SerDe for Format

Page 8: ORC 2015

Columnar Storage

Storage Performance

● Compress each column differently

● Detect & compress common sub-sequences

● Auto-increment ids

● String Enums

● Large Integers (uid scale)

● Unique strings (UUIDs)

Read Performance

● Column projection

● Columnar deserializers

● Data locality

Write Throughput

● Stats auto-gather
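A minimal sketch of "compress each column differently": pick an encoding per column by looking only at that column's values. The encoding names and the 0.8 distinct-to-total cut-off for using a dictionary are illustrative assumptions, not ORC's exact rules.

```python
from typing import List, Union

# Hypothetical cut-off: build a dictionary only when distinct/total <= 0.8.
DICTIONARY_RATIO = 0.8


def choose_encoding(values: List[Union[int, str]]) -> str:
    """Pick an illustrative encoding for one column, based only on its values."""
    if not values:
        return "DIRECT"
    if all(isinstance(v, int) for v in values):
        deltas = {b - a for a, b in zip(values, values[1:])}
        # Auto-increment ids have one constant delta, so base + delta stores them in a few bytes.
        if len(deltas) <= 1:
            return "DELTA"
        # Large integers (uid scale) with no pattern are bit-packed directly.
        return "DIRECT_INT"
    distinct = len(set(values))
    # String enums repeat heavily, so a dictionary plus small integer ids pays off.
    if distinct / len(values) <= DICTIONARY_RATIO:
        return "DICTIONARY"
    # Unique strings such as UUIDs gain nothing from a dictionary.
    return "DIRECT_STRING"


print(choose_encoding(list(range(1000, 2000))))         # DELTA
print(choose_encoding(["US", "UK", "IN"] * 100))          # DICTIONARY
print(choose_encoding([f"id-{i}" for i in range(100)]))  # DIRECT_STRING
```

Because each column is encoded independently, a table mixing ids, enums and UUIDs gets three different encodings in the same file.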

Page 9: ORC 2015

Row-groups & Stripe splits

Split Parallelism

● Effective parallelism

● No seeks to find boundaries

● No splits with zero data

● Decompress fixed chunks

Stripes

● Single unsplittable chunk

● Will reside in 1 HDFS block entirely

● Is self-contained for all read ops
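Keeping each stripe inside one HDFS block means the writer must sometimes pad. The sketch below shows one hypothetical padding policy; `plan_stripe`, the 5% tolerance and the shrink-to-fit fallback are assumptions for illustration, not ORC's exact writer logic.

```python
from typing import Tuple

MB = 1024 * 1024


def plan_stripe(offset: int, stripe_size: int, block_size: int,
                tolerance: float = 0.05) -> Tuple[int, int]:
    """Return (padding_bytes, stripe_bytes) so the next stripe stays inside one block.

    Hypothetical policy: pad when the leftover space is small, otherwise
    shrink the stripe so it ends exactly at the block boundary.
    """
    if stripe_size > block_size:
        raise ValueError("a stripe larger than a block cannot fit in one block")
    remaining = block_size - offset % block_size
    if stripe_size <= remaining:
        return 0, stripe_size          # fits as-is
    if remaining <= tolerance * block_size:
        return remaining, stripe_size  # pad; the stripe starts on the next block
    return 0, remaining                # write a shorter stripe that fills this block


def in_one_block(start: int, length: int, block_size: int) -> bool:
    """True if [start, start + length) does not cross a block boundary."""
    return start // block_size == (start + length - 1) // block_size


block = 256 * MB
for offset in (0, 200 * MB, 250 * MB):
    pad, size = plan_stripe(offset, 64 * MB, block)
    assert in_one_block(offset + pad, size, block)
    print(offset // MB, pad // MB, size // MB)
# 0 0 64
# 200 0 56
# 250 6 64
```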

Page 10: ORC 2015

A Single SerDe for all ORC Files

A Single Writer

● No mismatch of serialization

● Forward compatibility

Readers

● Multiple reader implementations

● Allows for vector readers

● And row-mode readers

● Similar loop – good JIT hit-rate

Page 11: ORC 2015

Protobuf Metadata Storage

Standardized Metadata

● Readers are easier to write

● Metadata readers are auto-generated

Metadata Forward Compatibility

● Protobuf Optional fields

Statistics Storage in Metadata

● Standard serialization for stats

● Allows predicate pushdown (PPD) into the IO layer
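With min/max statistics stored per stripe, a reader can decide whether a stripe can match a predicate before reading any data. A minimal sketch, assuming integer columns with no NULLs; the YES/NO/MAYBE result is similar in spirit to Hive's search-argument truth values, but every name here is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Truth(Enum):
    YES = "YES"      # every row in the stripe satisfies the predicate
    NO = "NO"        # no row can satisfy it: the stripe is skipped without IO
    MAYBE = "MAYBE"  # the stripe must be read and filtered row by row


@dataclass
class IntStats:
    minimum: int
    maximum: int


def eval_equals(stats: IntStats, literal: int) -> Truth:
    if literal < stats.minimum or literal > stats.maximum:
        return Truth.NO
    if stats.minimum == stats.maximum == literal:
        return Truth.YES
    return Truth.MAYBE


def eval_less_than(stats: IntStats, literal: int) -> Truth:
    if stats.maximum < literal:
        return Truth.YES
    if stats.minimum >= literal:
        return Truth.NO
    return Truth.MAYBE


def stripes_to_read(stats: List[IntStats], predicate: Callable[[IntStats], Truth]) -> List[int]:
    return [i for i, s in enumerate(stats) if predicate(s) is not Truth.NO]


stripe_stats = [IntStats(1, 100), IntStats(101, 200), IntStats(201, 300)]
print(stripes_to_read(stripe_stats, lambda s: eval_equals(s, 150)))     # [1]
print(stripes_to_read(stripe_stats, lambda s: eval_less_than(s, 101)))  # [0]
```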

Page 12: ORC 2015

Type-safe Vectorization

Schema on Write

● Write ORC Structs with types

● SerDe & InputFormat

Read Performance

● Data is read with few copies

● Primitive types are fast

● Primitives are also unboxed

● Predicates are typed too
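A sketch of what type-safe vectorization buys: a column batch holds primitives in a typed array (no boxed objects), and a typed predicate narrows a `selected` index list instead of copying rows. The classes below are hypothetical and only loosely modeled on Hive's column vectors.

```python
from array import array
from dataclasses import dataclass, field
from typing import List


@dataclass
class LongColumnVector:
    """Unboxed 64-bit integers plus a null mask (a sketch, not Hive's class)."""
    vector: array               # array('q'): one primitive slot per row, no per-value objects
    is_null: List[bool]
    no_nulls: bool = True       # lets the inner loop skip the null check entirely


@dataclass
class RowBatch:
    size: int                                  # number of live rows
    selected: array = field(default_factory=lambda: array("i"))
    selected_in_use: bool = False              # if True, only rows listed in `selected` are live


def filter_long_greater(batch: RowBatch, col: LongColumnVector, literal: int) -> None:
    """Typed predicate col > literal: narrows batch.selected instead of copying rows."""
    rows = batch.selected[:batch.size] if batch.selected_in_use else range(batch.size)
    survivors = array("i")
    for i in rows:
        if (col.no_nulls or not col.is_null[i]) and col.vector[i] > literal:
            survivors.append(i)
    batch.selected = survivors
    batch.selected_in_use = True
    batch.size = len(survivors)


col = LongColumnVector(array("q", [5, 10, 15, 20]), [False, False, True, False], no_nulls=False)
batch = RowBatch(size=4)
filter_long_greater(batch, col, 8)
print(list(batch.selected))  # [1, 3]; row 2 is skipped because it is NULL
```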

Page 13: ORC 2015

ORC: ETL Improvements

Always more new data

Page 14: ORC 2015

ORC (Zlib): Compress Differently

[Chart: ETL for TPC-H LineItem (scale 1 TB), Time Taken: ORC (old zlib) 674, ORC SNAPPY 389, ORC (new zlib) 433]

Different zlib strategies and compression levels for each encoding

● Z_FILTERED

● Z_DEFAULT_STRATEGY

● Z_BEST_SPEED

● Z_DEFAULT_COMPRESSION

In detail

● Compress IS_NULL bitsets lightly

● Compress Integers differently from Doubles

● Compress string dictionaries differently

● Allow for user choice
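The per-stream idea can be sketched with Python's `zlib` module, which exposes the same constants. The mapping from stream kind to (level, strategy) below is a hypothetical example, not the table Hive actually uses.

```python
import zlib
from typing import Dict, Tuple

# Hypothetical (level, strategy) per stream kind; the real ORC writer's choices may differ.
STREAM_SETTINGS: Dict[str, Tuple[int, int]] = {
    "PRESENT": (zlib.Z_BEST_SPEED, zlib.Z_FILTERED),    # IS_NULL bitsets: compress lightly
    "DATA_INT": (zlib.Z_BEST_SPEED, zlib.Z_FILTERED),   # integers, already RLE-encoded
    "DATA_DOUBLE": (zlib.Z_DEFAULT_COMPRESSION, zlib.Z_DEFAULT_STRATEGY),
    "DICTIONARY_DATA": (zlib.Z_DEFAULT_COMPRESSION, zlib.Z_DEFAULT_STRATEGY),  # string dictionaries
}
DEFAULT_SETTING = (zlib.Z_DEFAULT_COMPRESSION, zlib.Z_DEFAULT_STRATEGY)


def compress_stream(kind: str, payload: bytes) -> bytes:
    """Compress one stream with the level and strategy chosen for its kind."""
    level, strategy = STREAM_SETTINGS.get(kind, DEFAULT_SETTING)
    compressor = zlib.compressobj(level=level, strategy=strategy)
    return compressor.compress(payload) + compressor.flush()


def decompress_stream(data: bytes) -> bytes:
    # Level and strategy only affect the encoder; every stream decodes the same way.
    return zlib.decompress(data)


present = bytes([0xFF] * 4096)  # IS_NULL bitset of a column with no NULLs
words = b"\n".join(b"value-%d" % (i % 50) for i in range(2000))
assert decompress_stream(compress_stream("PRESENT", present)) == present
assert decompress_stream(compress_stream("DICTIONARY_DATA", words)) == words
```

Since decoding does not depend on the encoder settings, a user-chosen speed/size trade-off can change per stream without any reader changes.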

Page 15: ORC 2015

ORC (Zlib): Compress Differently

[Chart: Data Sizes for TPC-H Lineitem (scale 1 TB), Size on Disk: ORC (old zlib) 178.5, ORC SNAPPY 225.1, ORC (new zlib) 172.2]

Page 16: ORC 2015

Using JDK8 SIMD: Integer Writers

Integer encodings

● Base + Delta

● Run-length

● Direct

Trade-off for Size/Speed

● Use fixed bit-width loops

● Snap to nearest bit-width

[Chart: ORC Write Integer Performance (smaller is better), Mean Time (ms) vs. Bit Width (1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64); series: hive 0.13 bitpacking, hive 1.0 bitpacking (new)]
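Snapping every required width up to one of a few fixed widths means the writer needs only a handful of specialized loops. A minimal sketch, assuming non-negative inputs (signed integers would be zigzag-encoded first) and most-significant-bit-first packing; `ALIGNED_WIDTHS` mirrors the bit widths on the chart's x-axis, while `snap_width` and `pack` are hypothetical names.

```python
from typing import List, Tuple

# Widths with a dedicated fixed-width loop; other widths are rounded up to one of these.
ALIGNED_WIDTHS = [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64]


def snap_width(bits: int) -> int:
    """Snap a required bit width up to the nearest supported fixed width."""
    for width in ALIGNED_WIDTHS:
        if bits <= width:
            return width
    raise ValueError("values wider than 64 bits are not supported")


def pack(values: List[int]) -> Tuple[int, bytes]:
    """Bit-pack non-negative ints MSB-first at one snapped width; returns (width, bytes)."""
    if not values:
        return 1, b""
    if any(v < 0 for v in values):
        raise ValueError("zigzag-encode signed values before packing")
    width = snap_width(max(max(v.bit_length() for v in values), 1))
    out = bytearray()
    acc, acc_bits = 0, 0
    for v in values:
        acc = (acc << width) | v
        acc_bits += width
        while acc_bits >= 8:                      # flush every complete byte
            acc_bits -= 8
            out.append((acc >> acc_bits) & 0xFF)
        acc &= (1 << acc_bits) - 1                # keep only the bits not yet written
    if acc_bits:
        out.append((acc << (8 - acc_bits)) & 0xFF)  # zero-pad the last byte
    return width, bytes(out)


print(pack([3, 0, 2, 1]))  # (2, b'\xc9')
```

The size cost of snapping is small (3-bit values stored in 4 bits), which is the size/speed trade-off named on the slide.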

Page 17: ORC 2015

Double Writers

[Chart: ORC Write Double Performance (smaller is better), Mean Time (ms) by Double Write Mode: old 273.331, buffered + BE 247.634, buffered + LE 231.741]

Double Writers

● JVM is big-endian

● x86 is little-endian

● Special handling of NaN
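ORC stores DOUBLE values as IEEE 754 little-endian (to our understanding of the format), so a writer producing big-endian bytes, as Java's `writeDouble` does, has to swap them. A sketch contrasting a per-value swap with one buffered little-endian write of the whole batch; the function names are hypothetical.

```python
import math
import struct
from typing import List


def write_per_value(values: List[float]) -> bytes:
    """One encode call per value: big-endian like Java's writeDouble, then byte-reversed."""
    out = bytearray()
    for v in values:
        out += struct.pack(">d", v)[::-1]  # swap each 8-byte value to little-endian
    return bytes(out)


def write_buffered_le(values: List[float]) -> bytes:
    """Encode the whole batch in little-endian order with a single call."""
    return struct.pack("<%dd" % len(values), *values)


def read_le(data: bytes) -> List[float]:
    return list(struct.unpack("<%dd" % (len(data) // 8), data))


batch = [1.5, -0.0, float("inf"), float("nan")]
assert write_per_value(batch) == write_buffered_le(batch)   # same bytes, fewer calls
decoded = read_le(write_buffered_le(batch))
assert decoded[:3] == batch[:3] and math.isnan(decoded[3])  # NaN != NaN, so check it explicitly
```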

Page 18: ORC 2015

ORC: Scale compression buffers

File Size (MB) by Compression Buffer Size (KB)

Buffer (KB) | ZLIB  | SNAPPY
8           | 184.8 | 269.4
16          | 183.5 | 263.3
32          | 182.2 | 258.5
64          | 180.1 | 258.4
128         | 178.3 | 258.4
256         | 177.4 | 258.4

Large Columns vs More Columns

● Adjust when >1000 columns

Trade-offs

● Compression

● Low memory use

More additions

● Dynamically partitioned insert
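One way to adjust for very wide tables is to derive the compression buffer size from the stripe size and the column count, so writer memory stays bounded. A hypothetical sizing rule; the headroom of 20 buffers per column and the 4 KB floor are assumptions.

```python
KB = 1024
MB = 1024 * KB

# Hypothetical sizing rule: reserve room for ~20 stream buffers per column in one stripe.
STREAM_HEADROOM = 20
MIN_BUFFER = 4 * KB


def estimate_buffer_size(stripe_size: int, num_columns: int, configured: int = 256 * KB) -> int:
    """Largest power-of-two buffer (>= 4 KB, <= configured) within the per-column budget."""
    budget = stripe_size // (STREAM_HEADROOM * max(num_columns, 1))
    size = MIN_BUFFER
    while size * 2 <= budget and size * 2 <= configured:
        size *= 2
    return min(size, configured)


for cols in (10, 100, 1000):
    print(cols, estimate_buffer_size(64 * MB, cols) // KB)
# 10 256
# 100 32
# 1000 4
```

Larger buffers compress slightly better, so a rule like this only shrinks buffers when the column count demands it.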

Page 19: ORC 2015

ORC: Streaming Ingest + ACID

● Broken pattern: Partitions for Atomicity

● Isolation & Consistency on retries

● Transactions are pluggable (txn.manager)

● Cache/Replication friendly (base + deltas)
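Hive ACID tables keep an immutable base plus delta files and identify each row by (original transaction, bucket, row id). A simplified merge-on-read sketch; the `Event` encoding and the `committed` set are hypothetical stand-ins for Hive's delta records and its list of valid transactions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set, Tuple

# (original transaction, bucket, row id): a stable identity for every row
RowId = Tuple[int, int, int]


@dataclass
class Event:
    txn: int                   # transaction that wrote this delta event
    row_id: RowId
    op: str                    # "insert", "update" or "delete" (hypothetical encoding)
    row: Optional[dict] = None


def merge_on_read(base: Dict[RowId, dict], deltas: Iterable[Event],
                  committed: Set[int]) -> Dict[RowId, dict]:
    """Apply committed delta events on top of an immutable base, oldest transaction first."""
    result = dict(base)                       # the base is never rewritten in place
    for event in sorted(deltas, key=lambda e: e.txn):
        if event.txn not in committed:
            continue                          # aborted or open transactions stay invisible
        if event.op == "delete":
            result.pop(event.row_id, None)
        else:
            result[event.row_id] = event.row
    return result


base = {(1, 0, 0): {"v": 1}, (1, 0, 1): {"v": 2}}
deltas = [
    Event(5, (1, 0, 0), "update", {"v": 10}),
    Event(6, (1, 0, 1), "delete"),
    Event(7, (7, 0, 0), "insert", {"v": 3}),
    Event(8, (8, 0, 0), "insert", {"v": 99}),  # txn 8 failed; a retry would get a new txn id
]
print(merge_on_read(base, deltas, committed={5, 6, 7}))
# {(1, 0, 0): {'v': 10}, (7, 0, 0): {'v': 3}}
```

Because the base and deltas are immutable files, they can be cached or replicated as-is; only the set of committed transactions decides what a reader sees.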

Page 20: ORC 2015

ORC: LLAP and Sub-second

ORC – Pushing for Sub-second

Page 21: ORC 2015

ORC: Row Indexes

Min-Max pruning

● Evaluate on statistics

Bloom filters

● Better String filters

● Filter a random distribution

LLAP Future

● Row-level vector SARGs

[Chart: Rows Read (log scale), query "from tpch_1000.lineitem where l_orderkey = 1212000001;"]

No Indexes: 5,999,989,709 rows

Min-Max Indexes: 540,000 rows

Bloomfilter Indexes: 10,000 rows
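Min-max indexes cannot prune a randomly distributed key, because every row group's range covers it; a bloom filter per row group can. A sketch using a double-hashing filter over SHA-256 as a stand-in hash (ORC's own bloom filters use a different hash); all names are hypothetical.

```python
import hashlib
from typing import Iterator, List, Tuple


class BloomFilter:
    """Bloom filter with double hashing; SHA-256 is a stand-in hash, not the one ORC uses."""

    def __init__(self, num_bits: int = 4096, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: int) -> Iterator[int]:
        digest = hashlib.sha256(str(key).encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little")
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: int) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: int) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def build_index(row_groups: List[List[int]]) -> List[Tuple[int, int, BloomFilter]]:
    """Per row group: (min, max, bloom filter)."""
    index = []
    for rows in row_groups:
        bf = BloomFilter()
        for key in rows:
            bf.add(key)
        index.append((min(rows), max(rows), bf))
    return index


def row_groups_to_read(index, key: int, use_bloom: bool) -> List[int]:
    return [i for i, (lo, hi, bf) in enumerate(index)
            if lo <= key <= hi and (not use_bloom or bf.might_contain(key))]


# Keys interleaved across groups, so every group's min-max range covers almost any key.
groups = [[g + 10 * j for j in range(100)] for g in range(10)]
index = build_index(groups)
print(row_groups_to_read(index, 503, use_bloom=False))  # all 10 groups
print(row_groups_to_read(index, 503, use_bloom=True))   # group 3, plus any false positives
```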

Page 22: ORC 2015

ORC: Row Indexes

[Chart: Time Taken (seconds, smaller is better), query "* from tpch_1000.lineitem where l_orderkey=1212000001;"]

No Indexes: 74 s

Min-Max Indexes: 4.5 s

Bloomfilter Indexes: 1.34 s

Page 23: ORC 2015

ORC: JDK8 SIMD Readers

Integer encodings

● Base + Delta

● Run-length

● Direct

Trade-off for Size/Speed

● Use fixed bit-width loops

● Snap to nearest bit-width

[Chart: ORC Read Integer Performance, Mean Time (ms) vs. Bit Width (1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64); series: hive 0.13 unpacking, hive 1.0 unpacking (new)]
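On the read side, each snapped width gets its own loop. A sketch assuming most-significant-bit-first packing: byte-aligned widths use a simple slice-per-value loop, and every other width falls back to a generic shift-and-mask loop. All names are hypothetical.

```python
from typing import Callable, Dict, List

Unpacker = Callable[[bytes, int, int], List[int]]


def unpack_generic(data: bytes, width: int, count: int) -> List[int]:
    """Fallback for any width: shift bytes in and peel values off, MSB-first."""
    out: List[int] = []
    acc, acc_bits, pos = 0, 0, 0
    mask = (1 << width) - 1
    for _ in range(count):
        while acc_bits < width:          # refill until one whole value is buffered
            acc = (acc << 8) | data[pos]
            pos += 1
            acc_bits += 8
        acc_bits -= width
        out.append((acc >> acc_bits) & mask)
        acc &= (1 << acc_bits) - 1       # drop the bits just consumed
    return out


def byte_aligned(nbytes: int) -> Unpacker:
    """Fixed-width loop for widths that are whole bytes: one slice per value, no bit shifts."""
    def _loop(data: bytes, width: int, count: int) -> List[int]:
        return [int.from_bytes(data[i * nbytes:(i + 1) * nbytes], "big") for i in range(count)]
    return _loop


# One specialized loop per byte-aligned width; other widths use the generic loop.
UNPACKERS: Dict[int, Unpacker] = {w: byte_aligned(w // 8) for w in (8, 16, 24, 32, 40, 48, 56, 64)}


def unpack(data: bytes, width: int, count: int) -> List[int]:
    return UNPACKERS.get(width, unpack_generic)(data, width, count)


print(unpack(b"\xc9", 2, 4))  # [3, 0, 2, 1]
```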

Page 24: ORC 2015

ORC: Vectorization + SIMD

Advantage of a Single SerDe

● Primitive Types

Allocation-free tight inner loops

● JDK8 has auto-vectorization

Vectorized Early Filter

● Vectors can be filtered early in ORC

● StringDictionary can be used to binary-search

Vectorized SIMD Join

● Performance for single-key joins

0x00007f13d2e6afb0: vmovdqu 0x10(%rsi,%rax,8),%ymm2
0x00007f13d2e6afb6: vaddpd %ymm1,%ymm2,%ymm2
0x00007f13d2e6afba: movslq %eax,%r10
0x00007f13d2e6afbd: vmovdqu 0x30(%rsi,%r10,8),%ymm3
;*daload vector.expressions.gen.DoubleColAddDoubleColumn::evaluate (line 94)
0x00007f13d2e6afc4: vmovdqu %ymm2,0x10(%rdx,%rax,8)
0x00007f13d2e6afca: vaddpd %ymm1,%ymm3,%ymm2
0x00007f13d2e6afce: vmovdqu %ymm2,0x30(%rdx,%r10,8)
;*dastore vector.expressions.gen.DoubleColAddDoubleColumn::evaluate (line 94)
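A vectorized early filter on a string column can use the dictionary instead of the strings: one binary search finds the literal's code, after which the filter compares integers only. A sketch assuming the dictionary entries are stored in sorted order (the range filter depends on it); the function names are hypothetical.

```python
from bisect import bisect_left
from typing import List


def dict_equals(dictionary: List[str], ids: List[int], literal: str) -> List[int]:
    """Rows whose value == literal, found by one binary search plus integer compares."""
    code = bisect_left(dictionary, literal)
    if code == len(dictionary) or dictionary[code] != literal:
        return []  # literal is not in this dictionary: the whole vector is filtered out
    return [row for row, c in enumerate(ids) if c == code]


def dict_range(dictionary: List[str], ids: List[int], low: str, high: str) -> List[int]:
    """Rows with low <= value < high; valid only because the dictionary is sorted."""
    lo, hi = bisect_left(dictionary, low), bisect_left(dictionary, high)
    return [row for row, c in enumerate(ids) if lo <= c < hi]


dictionary = ["apple", "banana", "cherry", "date"]  # sorted distinct values
ids = [2, 0, 1, 2, 3]                                # per-row dictionary codes
print(dict_equals(dictionary, ids, "cherry"))  # [0, 3]
print(dict_equals(dictionary, ids, "fig"))     # []
print(dict_range(dictionary, ids, "b", "d"))   # [0, 2, 3]
```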

Page 25: ORC 2015

ORC: Split Strategies + Tez Grouping

Amdahl’s Law

● As fast as the slowest task

● Slice work thinly, but not too thin

Split-generation vs Execution time

● ETL

● BI

● Hybrid

Split-grouping & estimation

● Columnar split size

● Group by estimate, not file size

● Bucket pruning

[Figure: Slow split]
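Grouping by estimate means sizing each task by the bytes of the projected columns rather than by whole-file size. A minimal greedy sketch; `Split`, `column_bytes` and `group_splits` are hypothetical and not Tez's grouping API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Split:
    path: str
    column_bytes: Dict[str, int]  # on-disk bytes per column, e.g. summed from stripe metadata


def estimated_bytes(split: Split, projected: List[str]) -> int:
    """Bytes a task will actually read: only the projected columns count."""
    return sum(split.column_bytes.get(c, 0) for c in projected)


def group_splits(splits: List[Split], projected: List[str], target: int) -> List[List[Split]]:
    """Greedily pack consecutive splits into groups of about `target` estimated bytes."""
    groups: List[List[Split]] = []
    current: List[Split] = []
    current_bytes = 0
    for split in splits:
        size = estimated_bytes(split, projected)
        if current and current_bytes + size > target:
            groups.append(current)
            current, current_bytes = [], 0
        current.append(split)
        current_bytes += size
    if current:
        groups.append(current)
    return groups


splits = [Split(f"part-{i}", {"a": 100, "b": 900}) for i in range(4)]
print(len(group_splits(splits, ["a"], target=250)))       # 2: tasks sized by projected bytes
print(len(group_splits(splits, ["a", "b"], target=250)))  # 4: same as grouping by file size
```

Reading only column `a`, grouping by file size would launch four tasks that each read 100 bytes; grouping by estimate launches two, which keeps tasks from being sliced too thin.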

Page 26: ORC 2015

ORC: LLAP

● JIT performance for short queries

● Row-group level caching

● Asynchronous IO elevator

● Multi-threaded column vector processing
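Row-group level caching can be sketched as an LRU cache keyed by file, stripe, column and row group, so a repeated short query reads only the chunks it is missing. This is an illustration with hypothetical names; LLAP's actual cache is more elaborate.

```python
from collections import OrderedDict
from typing import Callable, Tuple

# Hypothetical key layout: (file, stripe, column, row group)
CacheKey = Tuple[str, int, int, int]


class RowGroupCache:
    """LRU cache of decompressed column chunks at row-group granularity (a sketch)."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries: "OrderedDict[CacheKey, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: CacheKey, load: Callable[[CacheKey], bytes]) -> bytes:
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        data = load(key)                   # only the missing row group is read
        self.entries[key] = data
        self.used += len(data)
        while self.used > self.capacity and len(self.entries) > 1:
            _, evicted = self.entries.popitem(last=False)  # drop the least recently used
            self.used -= len(evicted)
        return data


def load_chunk(key: CacheKey) -> bytes:
    return b"x" * 8  # stand-in for reading and decompressing a column chunk


cache = RowGroupCache(capacity_bytes=20)
for key in [("f.orc", 0, 1, 0), ("f.orc", 0, 1, 1), ("f.orc", 0, 1, 0), ("f.orc", 0, 1, 2)]:
    cache.get(key, load_chunk)
print(cache.hits, cache.misses, list(cache.entries))
# 1 3 [('f.orc', 0, 1, 0), ('f.orc', 0, 1, 2)]
```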

Page 27: ORC 2015

ORC: LLAP (+ SIMD + Split Strategies + Row Indexes)

Page 28: ORC 2015

Questions?

Interested? Stop by the Hortonworks booth to learn more.

Page 29: ORC 2015

Endnotes

(1) https://code.facebook.com/posts/229861827208629/scaling-the-facebook-data-warehouse-to-300-pb/

(2) http://www.slideshare.net/AdamKawa/a-perfect-hive-query-for-a-perfect-meeting-hadoop-summit-2014