orc 2015
TRANSCRIPT
Page 1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
ORC: 2015
Gopal Vijayaraghavan
Page 2
ORC – Optimized Row-Columnar File
+ Columnar Storage
+ Row-groups & Fixed splits
+ Protobuf Metadata Storage
+ Type-safe Vectorization
+ Hive ACID transactions
+ Single SerDe for Format
Page 3
Need for Speed: The Stinger Initiative
Stinger: An Open Roadmap to improve Apache Hive’s performance 100x.
Launched: February 2013; Delivered: April 2014.
Delivered in 100% Apache Open Source.
Vectorized SQL Engine + Columnar Storage (ORC) + Distributed Execution (Apache Tez) = 100X
Page 4
ORC at Facebook
Compression: Saved more than 1,400 servers' worth of storage. [1]
Compression: Compression ratio increased from 5x to 8x globally. [1]
Page 5
ORC at Spotify
IO: 16x less HDFS read when using ORC versus Avro. [2]
CPU: 32x less CPU when using ORC versus Avro. [2]
Page 6
ORC: Today
What is Optimized about ORC?
Page 7
ORC – Optimized Row-Columnar File
+ Columnar Storage
+ Row-groups & Stripe splits
+ Protobuf Metadata Storage
+ Type-safe Vectorization
+ Hive ACID transactions
+ Single SerDe for Format
Page 8
Columnar Storage
Storage Performance
● Compress each column differently
● Detect & compress common sub-sequences
● Auto-increment ids
● String Enums
● Large Integers (uid scale)
● Unique strings (UUIDs)
Read Performance
● Column projection
● Columnar deserializers
● Data locality
Write Throughput
● Stats auto-gather
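A minimal sketch of why compressing each column differently pays off, assuming two of the column shapes listed above: a string enum and an auto-increment id. The function names and sample values are illustrative only and are not ORC's writer API. The enum column collapses to a small dictionary plus integer codes, and the id column collapses to a base value plus constant deltas.

```python
from typing import List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each string with its index in a sorted dictionary of distinct values."""
    dictionary = sorted(set(values))
    index = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [index[value] for value in values]


def delta_encode(values: List[int]) -> Tuple[int, List[int]]:
    """Store the first value plus the difference between each pair of neighbours."""
    base = values[0]
    deltas = [b - a for a, b in zip(values, values[1:])]
    return base, deltas


def delta_decode(base: int, deltas: List[int]) -> List[int]:
    """Rebuild the original values from the base and the deltas."""
    out = [base]
    for delta in deltas:
        out.append(out[-1] + delta)
    return out


# Hypothetical column contents, chosen only for illustration.
status = ["SHIPPED", "PENDING", "SHIPPED", "SHIPPED", "RETURNED"]
ids = [1000, 1001, 1002, 1003, 1004]

dictionary, codes = dictionary_encode(status)
base, deltas = delta_encode(ids)
```

The same routine applied to the wrong column (delta-encoding UUID-like strings, for example) would gain nothing, which is the argument for choosing an encoding per column.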
Page 9
Row-groups & Stripe splits
Split Parallelism
● Effective parallelism
● No seeks to find boundaries
● No splits with zero data
● Decompress fixed chunks
Stripes
● Single unsplittable chunk
● Will reside in 1 HDFS block entirely
● Is self-contained for all read ops
Page 10
A Single SerDe for all ORC Files
A Single Writer
● No mismatch of serialization
● Forward compatibility
Readers
● Multiple reader implementations
● Allows for vector readers
● And row-mode readers
● Similar loop – good JIT hit-rate
Page 11
Protobuf Metadata Storage
Standardized Metadata
● Readers are easier to write
● Metadata readers are auto-generated
Metadata Forward Compatibility
● Protobuf Optional fields
Statistics Storage in Metadata
● Standard serialization for stats
● Allows for PPD into the IO layer
Page 12
Type-safe Vectorization
Schema on Write
● Write ORC Structs with types
● SerDe & Inputformat
Read Performance
● Data is read with few copies
● Primitive types are fast
● Primitives are also unboxed
● Predicates are typed too
Page 13
ORC: ETL Improvements
Always more new data
Page 14
ORC (Zlib): Compress Differently
Chart: ETL for TPC-H LineItem (scale 1 TB), Time Taken
ORC (old zlib): 674 | ORC SNAPPY: 389 | ORC (new zlib): 433
Different Zlib algorithms for encoding
● Z_FILTERED
● Z_DEFAULT
● Z_BEST_SPEED
● Z_DEFAULT_COMPRESSION
In detail
● Compress IS_NULL bitsets lightly
● Compress Integers differently from Doubles
● Compress string dictionaries differently
● Allow for user choice
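The idea of compressing each kind of stream with its own zlib setting can be sketched with Python's standard zlib module, which exposes the same level and strategy constants. Which stream gets which setting below is an assumption made for illustration; the slide does not give ORC's actual mapping, and the stream contents are synthetic.

```python
import struct
import zlib


def deflate(data: bytes, level: int, strategy: int) -> bytes:
    """Compress one stream with an explicit zlib level and strategy."""
    compressor = zlib.compressobj(level=level, strategy=strategy)
    return compressor.compress(data) + compressor.flush()


# Synthetic streams standing in for the pieces of a column.
is_null_bits = bytes([0b00000001] * 4096)
integers = struct.pack("<4096q", *range(4096))
doubles = struct.pack("<4096d", *(i * 0.25 for i in range(4096)))

# Assumed plan: cheap compression for the highly repetitive bitset,
# default settings for integers, Z_FILTERED for floating-point data.
plan = {
    "is_null": (is_null_bits, zlib.Z_BEST_SPEED, zlib.Z_DEFAULT_STRATEGY),
    "integers": (integers, zlib.Z_DEFAULT_COMPRESSION, zlib.Z_DEFAULT_STRATEGY),
    "doubles": (doubles, zlib.Z_DEFAULT_COMPRESSION, zlib.Z_FILTERED),
}

compressed = {
    name: deflate(data, level, strategy)
    for name, (data, level, strategy) in plan.items()
}
```

Every output is still an ordinary zlib stream, so a reader needs no knowledge of which setting the writer picked.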
Page 15
ORC (Zlib): Compress Differently
Chart: Data Sizes for TPC-H LineItem (scale 1 TB), Size on Disk
ORC (old zlib): 178.5 | ORC SNAPPY: 225.1 | ORC (new zlib): 172.2
Page 16
Using JDK8 SIMD: Integer Writers
Integer encodings
● Base + Delta
● Run-length
● Direct
Trade-off for Size/Speed
● Use fixed bit-width loops
● Snap to nearest bit-width
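A rough sketch of the size/speed trade-off, assuming that "snap" means rounding the required bit width up to the next width that has its own fixed loop. The list of widths is the set of bit widths on this slide's benchmark axis; the helper names are hypothetical, and the big-integer packing is for clarity only, not a model of the unrolled per-width loops in the real writer.

```python
from typing import List, Tuple

# Bit widths shown on the benchmark's x-axis.
FIXED_WIDTHS = [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64]


def snap_width(bits_needed: int) -> int:
    """Round a bit width up to the next width in FIXED_WIDTHS."""
    for width in FIXED_WIDTHS:
        if width >= bits_needed:
            return width
    raise ValueError("values wider than 64 bits are not supported")


def pack(values: List[int]) -> Tuple[int, bytes]:
    """Pack non-negative integers MSB-first using one snapped width for the run."""
    width = snap_width(max(max(values).bit_length(), 1))
    acc = 0
    for value in values:
        acc = (acc << width) | value
    total_bits = width * len(values)
    pad = (-total_bits) % 8  # zero bits appended to reach a byte boundary
    return width, (acc << pad).to_bytes((total_bits + pad) // 8, "big")


def unpack(width: int, data: bytes, count: int) -> List[int]:
    """Reverse pack(): read count values of the given width."""
    acc = int.from_bytes(data, "big") >> (len(data) * 8 - width * count)
    mask = (1 << width) - 1
    return [(acc >> (width * (count - 1 - i))) & mask for i in range(count)]


width, packed = pack([3, 5, 7, 2])
```

Here the values need only 3 bits each but are written with 4, trading a little space for a loop whose width is known in advance.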
Chart: ORC Write Integer Performance (smaller is better)
X axis: Bit Width (1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64)
Y axis: Mean Time (ms)
Series: hive 0.13 bitpacking; hive 1.0 bitpacking (new)
Page 17
Double Writers
Chart: ORC Write Double Performance (smaller is better)
Mean Time (ms) by Double Write Mode
old: 273.331 | buffered + BE: 247.634 | buffered + LE: 231.741
Double Writers
● JVM is big-endian
● X86 is little-endian
● Special handling of NaN
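The byte-order difference behind the "BE" and "LE" write modes can be shown with Python's struct module. The NaN handling below is an assumption: the slide does not say what the special handling is, so the sketch uses one common approach, collapsing every NaN to a single canonical bit pattern (the behaviour of Java's Double.doubleToLongBits).

```python
import math
import struct
from typing import List

CANONICAL_NAN = 0x7FF8000000000000  # the pattern Double.doubleToLongBits returns for any NaN


def write_doubles_be(values: List[float]) -> bytes:
    """Serialize doubles most-significant byte first."""
    return struct.pack(f">{len(values)}d", *values)


def write_doubles_le(values: List[float]) -> bytes:
    """Serialize doubles least-significant byte first, the native x86 layout."""
    return struct.pack(f"<{len(values)}d", *values)


def canonical_bits(value: float) -> int:
    """Return the 64-bit pattern of a double, with all NaNs collapsed to one pattern."""
    if math.isnan(value):
        return CANONICAL_NAN
    return struct.unpack("<Q", struct.pack("<d", value))[0]


values = [1.0, -2.5]
be = write_doubles_be(values)
le = write_doubles_le(values)
```

On a little-endian CPU the second layout matches memory order, so a buffered writer can copy values without swapping bytes.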
Page 18
ORC: Scale compression buffers
Chart: File Size (MB) by Compression Buffer Size (KB)
Buffer size (KB):  8      16     32     64     128    256
SNAPPY:            269.4  263.3  258.5  258.4  258.4  258.4
ZLIB:              184.8  183.5  182.2  180.1  178.3  177.4
Large Columns vs More Columns
● Adjust when >1000 columns
Trade offs
● Compression
● Low memory use
More additions
● Dynamically partitioned insert
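The compression side of this trade-off can be reproduced in miniature: compress the same bytes in independent chunks of different sizes and compare the totals. This is only a sketch with synthetic data and plain zlib; it does not model ORC's stream layout. The memory side follows from the fact that each open column stream holds its own buffer, which is why very wide tables may prefer smaller buffers.

```python
import zlib


def compressed_size(data: bytes, buffer_kb: int) -> int:
    """Compress data in independent chunks of buffer_kb KB and total the output size."""
    step = buffer_kb * 1024
    return sum(
        len(zlib.compress(data[start:start + step]))
        for start in range(0, len(data), step)
    )


# Synthetic, repetitive rows used purely for illustration.
rows = "".join(
    f"{i % 5000},status-{i % 7},2015-01-{i % 28 + 1:02d}\n" for i in range(60000)
)
data = rows.encode("ascii")

sizes = {kb: compressed_size(data, kb) for kb in (8, 16, 32, 64, 128, 256)}
```

Each chunk starts with no history, so small buffers repeat that start-up cost many times over.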
Page 19
ORC: Streaming Ingest + ACID
- Broken pattern: Partitions for Atomicity
+ Isolation & Consistency on retries
+ Transactions are pluggable (txn.manager)
+ Cache/Replication friendly (base + deltas)
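The "base + deltas" layout can be pictured as an immutable base plus small per-transaction change sets that are merged at read time. The sketch below is a conceptual model under that assumption; the event tuple, the row ids, and the function name are hypothetical and do not reflect Hive's on-disk ACID format.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]
# (transaction id, row id, new row contents, or None for a delete)
DeltaEvent = Tuple[int, int, Optional[Row]]


def read_snapshot(
    base: Dict[int, Row], deltas: List[List[DeltaEvent]], max_txn: int
) -> Dict[int, Row]:
    """Merge the base with every delta event whose transaction id is <= max_txn."""
    rows = dict(base)  # the base itself is never modified in place
    events = sorted(
        (event for delta in deltas for event in delta if event[0] <= max_txn),
        key=lambda event: event[0],
    )
    for _txn, row_id, row in events:
        if row is None:
            rows.pop(row_id, None)
        else:
            rows[row_id] = row
    return rows


base = {1: {"v": "a"}, 2: {"v": "b"}}
deltas = [
    [(10, 2, {"v": "b2"}), (10, 3, {"v": "c"})],  # transaction 10: update and insert
    [(11, 1, None)],                               # transaction 11: delete
]
```

Because the base never changes, it can be cached or replicated freely, and a reader that only sees transactions up to 10 gets a consistent view even if transaction 11 is retried.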
Page 20
ORC: LLAP and Sub-second
ORC – Pushing for Sub-second
Page 21
ORC: Row Indexes
Min-Max pruning
● Evaluate on statistics
Bloom filters
● Better String filters
● Filter a random distribution
LLAP Future
● Row-level vector SARGs
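Min-max pruning amounts to checking the predicate against per-row-group statistics before any data is read. A minimal sketch, assuming an equality predicate and a hypothetical RowGroupStats record (the field names and values are not ORC's):

```python
from typing import List, NamedTuple


class RowGroupStats(NamedTuple):
    first_row: int
    row_count: int
    min_key: int
    max_key: int


def row_groups_to_read(stats: List[RowGroupStats], key: int) -> List[RowGroupStats]:
    """Keep only the row groups whose [min, max] range could contain key."""
    return [s for s in stats if s.min_key <= key <= s.max_key]


# Illustrative statistics for three row groups.
stats = [
    RowGroupStats(first_row=0, row_count=10000, min_key=1, max_key=9000),
    RowGroupStats(first_row=10000, row_count=10000, min_key=9001, max_key=20000),
    RowGroupStats(first_row=20000, row_count=10000, min_key=15000, max_key=31000),
]

selected = row_groups_to_read(stats, 16000)
```

Pruning works well when the column is roughly sorted. When values are randomly distributed, most ranges overlap the key and little is skipped, which is the case the Bloom filter bullet addresses.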
Chart: Rows Read (log scale)
Query: from tpch_1000.lineitem where l_orderkey = 1212000001;
No Indexes: 5,999,989,709 | Min-Max Indexes: 540,000 | Bloomfilter Indexes: 10,000
Page 22
ORC: Row Indexes
Chart: Time Taken (seconds, smaller is better)
Query: * from tpch_1000.lineitem where l_orderkey=1212000001;
No Indexes: 74 | Min-Max Indexes: 4.5 | Bloomfilter Indexes: 1.34
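A Bloom filter per row group can rule out groups that min-max statistics cannot, because it answers "definitely absent" or "possibly present" for an exact value. The sketch below is a toy filter under stated assumptions: the bit count, the number of hashes, and the use of SHA-256 are illustrative choices, not the parameters or hash function ORC uses.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, value: object) -> Iterator[int]:
        """Derive num_hashes bit positions from the value."""
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: object) -> None:
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value: object) -> bool:
        """False means definitely absent; True means possibly present."""
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


# Two row groups whose key ranges both cover the lookup key,
# so min-max statistics alone could not skip either one.
row_group_keys = {0: [5, 900000, 1212000001], 1: [17, 600000, 1999999999]}
filters = {}
for group, keys in row_group_keys.items():
    filters[group] = BloomFilter()
    for key in keys:
        filters[group].add(key)

candidates = [g for g, f in filters.items() if f.might_contain(1212000001)]
```

A filter never misses a value that was added; it can only report an occasional false positive, which costs one unnecessary row-group read.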
Page 23
ORC: JDK8 SIMD Readers
Integer encodings
● Base + Delta
● Run-length
● Direct
Trade-off for Size/Speed
● Use fixed bit-width loops
● Snap to nearest bit-width
Chart: ORC Read Integer Performance
X axis: Bit Width (1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64)
Y axis: Mean Time (ms)
Series: hive 0.13 unpacking; hive 1.0 unpacking (new)
Page 24
ORC: Vectorization + SIMD
Advantage of a Single SerDe
● Primitive Types
Allocation free tight inner loops
● JDK8 has auto-vectorization
Vectorized Early Filter
● Vectors can be filtered early in ORC
● StringDictionary can be used to binary-search
Vectorized SIMD Join
● Performance for single key joins
0x00007f13d2e6afb0: vmovdqu 0x10(%rsi,%rax,8),%ymm2
0x00007f13d2e6afb6: vaddpd %ymm1,%ymm2,%ymm2
0x00007f13d2e6afba: movslq %eax,%r10
0x00007f13d2e6afbd: vmovdqu 0x30(%rsi,%r10,8),%ymm3
;*daload vector.expressions.gen.DoubleColAddDoubleColumn::evaluate (line 94)
0x00007f13d2e6afc4: vmovdqu %ymm2,0x10(%rdx,%rax,8)
0x00007f13d2e6afca: vaddpd %ymm1,%ymm3,%ymm2
0x00007f13d2e6afce: vmovdqu %ymm2,0x30(%rdx,%r10,8)
;*dastore vector.expressions.gen.DoubleColAddDoubleColumn::evaluate (line 94)
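The bullet about using the string dictionary for binary search can be sketched as follows, assuming the dictionary is stored sorted and each row holds an integer code into it. The function name and sample data are hypothetical; the point is that the string comparison happens once against the dictionary and the per-row work is an integer compare.

```python
from bisect import bisect_left
from typing import List


def filter_equals(dictionary: List[str], codes: List[int], literal: str) -> List[int]:
    """Return the row positions whose string equals literal.

    dictionary must be sorted; codes[i] is the dictionary index for row i.
    """
    pos = bisect_left(dictionary, literal)
    if pos == len(dictionary) or dictionary[pos] != literal:
        return []  # literal is not in the dictionary, so no row can match
    return [row for row, code in enumerate(codes) if code == pos]


# Illustrative dictionary-encoded column.
dictionary = ["AIR", "MAIL", "RAIL", "SHIP"]
codes = [3, 0, 2, 0, 1]

selected_rows = filter_equals(dictionary, codes, "AIR")
```

When the literal is missing from the dictionary, the whole batch is rejected without looking at a single row.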
Page 25
ORC: Split Strategies + Tez Grouping
Amdahl’s Law
● As fast as the slowest task
● Slice work thinly, but not too thin
Split-generation vs Execution time
● ETL
● BI
● Hybrid
Split-grouping & estimation
● ColumnarSplit size
● Group by estimate, not file size
● Bucket pruning
Slow split
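"Group by estimate, not file size" can be illustrated with a greedy grouper that sums the estimated bytes for the projected columns rather than the raw file bytes. This is a simplified sketch under that assumption: the Split record, its fields, and the target size are hypothetical, and real Tez grouping also weighs factors such as locality that are ignored here.

```python
from typing import List, NamedTuple


class Split(NamedTuple):
    path: str
    file_bytes: int       # size of the split on disk
    projected_bytes: int  # estimated bytes for the columns the query reads


def group_splits(splits: List[Split], target_bytes: int) -> List[List[Split]]:
    """Greedily pack consecutive splits into groups of about target_bytes of projected data."""
    groups: List[List[Split]] = []
    current: List[Split] = []
    current_bytes = 0
    for split in splits:
        if current and current_bytes + split.projected_bytes > target_bytes:
            groups.append(current)
            current, current_bytes = [], 0
        current.append(split)
        current_bytes += split.projected_bytes
    if current:
        groups.append(current)
    return groups


# All four splits are the same size on disk, but the query reads very
# different amounts from each one.
splits = [
    Split("a", 1000, 100),
    Split("b", 1000, 100),
    Split("c", 1000, 900),
    Split("d", 1000, 50),
]

groups = group_splits(splits, target_bytes=250)
```

Grouping by file size would treat these four splits as equal work; grouping by the estimate isolates the heavy one so it does not become the slow task.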
Page 26
ORC: LLAP
+ JIT Performance for short queries
+ Row-group level caching
+ Asynchronous IO Elevator
+ Multi-threaded Column Vector processing
Page 27
ORC: LLAP (+ SIMD + Split Strategies + Row Indexes)
Page 28
Questions?
Interested? Stop by the Hortonworks booth to learn more.
Page 29
Endnotes
[1] https://code.facebook.com/posts/229861827208629/scaling-the-facebook-data-warehouse-to-300-pb/
[2] http://www.slideshare.net/AdamKawa/a-perfect-hive-query-for-a-perfect-meeting-hadoop-summit-2014