LLAP: Sub-Second Analytical Queries in Hive
TRANSCRIPT
Page 1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
LLAP: Sub-Second Analytical Queries in Hive
Gopal Vijayaraghavan
Page 2
Why LLAP?
• People like Hive
• Disk->Mem is getting further away
  – Cloud Storage isn't co-located
  – Disks are connected to the CPU via network
• Security landscape is changing
  – Cells & Columns are the new security boundary, not files
  – Safely masking columns needs a process boundary
• Concurrency, Performance & Scale are at conflict
  – Concurrency at 100k queries/hour
  – Latencies at 2-5 seconds/query
  – Petabyte scale warehouses (with terabytes of "hot" data)
[Diagram: a node running an LLAP process with a cache and query fragments, reading from HDFS]
Page 3
What is LLAP?
• Hybrid model combining daemons and containers for fast, concurrent execution of analytical workloads (e.g. Hive SQL queries)
• Concurrent queries without specialized YARN queue setup
• Multi-threaded execution of vectorized operator pipelines
• Asynchronous IO and efficient in-memory caching
• Relational view of the data available through the API
• High performance scans, execution code pushdown
• Centralized data security
Page 4
Hive 2.0 (+ LLAP)
• Transparent to Hive users, BI tools, etc.
• Hive decides where query fragments run (LLAP, Container, AM) based on configuration, data size, format, etc.
• Each query is coordinated independently by a Tez AM
• Number of concurrent queries throttled by number of active AMs
• Hive Operators used for processing
• Tez Runtime used for data transfer
[Diagram: HiveServer2 as query/AM controller between clients and a YARN cluster, where Tez AMs (AM1–AM3) coordinate query fragments running in llapd daemons and in containers]
Page 5
Industry benchmark – 10TB scale
[Chart: LLAP vs Hive 1.x at 10TB scale; runtime in seconds (0–50) for queries 3, 12, 20, 21, 26, 27, 42, 52, 55, 73, 89, 91 and 98 — LLAP is up to 90% faster]
Page 6
Evaluation from a customer case study
• Aggregate daily statistics for a time interval:

  SELECT yyyymmdd, sum(total_1), sum(total_2), ...
  from table
  where yyyymmdd >= xxx and yyyymmdd < xxx and userid = xxx
  group by userid, yyyymmdd;

[Chart: max and average execution time (seconds, 0–8) over day/week/month/year (D/W/M/Y) intervals, comparing Tez, Phoenix and LLAP]
Page 7
Evaluation from a customer case study
• Display a large report
[Chart: max and average execution time (seconds, 0–20) over day/week/month/year (D/W/M/Y) ranges, comparing Tez and LLAP]
Page 8
Cut-away to demo (GIF)
Page 9
How does LLAP make queries faster?
Page 10
Technical overview – execution
• LLAP daemon has a number of executors (think containers) that execute work "fragments"
• Fragments are parts of one, or multiple parallel workloads (e.g. Hive SQL queries)
• Work queue with pluggable priority
  – Geared towards low latency queries over long-running queries (by default)
• I/O is similar to containers – read/write to HDFS, shuffle, other storages and formats
• Streaming output for data API
[Diagram: an LLAP daemon with a work queue (Q1 Map 1 fragments, Q3 Map 19, fragments waiting for shuffle inputs) feeding executors running Q1 Map 1, Q3 Reducer 3 and an external read; IO to HDFS and HBase, shuffle input from a container, streaming output to a Spark executor]
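The work queue with pluggable priority described above can be sketched as a priority queue that prefers fragments from small (interactive) DAGs. This is a minimal sketch under assumptions: the `Fragment` descriptor and its `dagSizeHint` field are hypothetical names, and LLAP's real comparator weighs additional DAG parameters.

```java
import java.util.concurrent.PriorityBlockingQueue;

// Hypothetical fragment descriptor: fragments from smaller DAGs
// (interactive BI queries) sort ahead of wide, long-running ones.
class Fragment implements Comparable<Fragment> {
    final String queryId;
    final long dagSizeHint;  // estimated total work in the owning DAG
    final long arrivalTime;  // tie-breaker: FIFO within a priority level

    Fragment(String queryId, long dagSizeHint, long arrivalTime) {
        this.queryId = queryId;
        this.dagSizeHint = dagSizeHint;
        this.arrivalTime = arrivalTime;
    }

    @Override
    public int compareTo(Fragment other) {
        int bySize = Long.compare(this.dagSizeHint, other.dagSizeHint);
        return bySize != 0 ? bySize : Long.compare(this.arrivalTime, other.arrivalTime);
    }
}

public class WorkQueueSketch {
    public static void main(String[] args) {
        PriorityBlockingQueue<Fragment> queue = new PriorityBlockingQueue<>();
        queue.add(new Fragment("etl-query", 1_000_000, 1)); // wide ETL query, arrived first
        queue.add(new Fragment("bi-query", 1_000, 2));      // small interactive query
        // The interactive fragment jumps ahead of the earlier ETL fragment.
        System.out.println(queue.poll().queryId);  // bi-query
        System.out.println(queue.poll().queryId);  // etl-query
    }
}
```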
Page 11
Technical overview – IO layer
• Asynchronous IO for Hive
  – Optional: when executing inside LLAP
  – Wraps over InputFormat, reads through cache
  – Supported with ORC; all other formats use in-sync mode
• Transparent, compressed in-memory cache
  – Format-specific, extensible
  – NVMe/NVDIMM caches
[Diagram: a fragment's RecordReader passes splits to an IO thread, which plans and decodes what to read, consults the metadata cache and indexes, reads and decompresses from the actual data (HDFS, S3, …), fills cache data buffers, and hands vectorized data back to the executor]
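The IO-thread/executor split above amounts to a bounded producer/consumer pipeline: the IO thread reads and decodes batches ahead of the executor that consumes them. A minimal sketch of that shape only — these are not LLAP's actual elevator or RecordReader classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of asynchronous IO: an IO thread "reads and decodes" row batches
// ahead of the executor, which consumes them from a bounded queue.
public class AsyncIoSketch {
    static final int[] POISON = new int[0];  // end-of-stream marker

    public static List<Integer> run(int batches) throws InterruptedException {
        BlockingQueue<int[]> decoded = new ArrayBlockingQueue<>(4);

        Thread ioThread = new Thread(() -> {
            try {
                for (int b = 0; b < batches; b++) {
                    // Stand-in for "read, decompress, decode" of one batch.
                    int[] vector = {b * 2, b * 2 + 1};
                    decoded.put(vector);
                }
                decoded.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        ioThread.start();

        // Executor side: consume vectorized batches as they become available.
        List<Integer> rows = new ArrayList<>();
        int[] batch;
        while ((batch = decoded.take()) != POISON) {
            for (int v : batch) rows.add(v);
        }
        ioThread.join();
        return rows;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(3));  // [0, 1, 2, 3, 4, 5]
    }
}
```

The bounded queue is the point: it lets IO run ahead for pipelining without buffering unbounded amounts of decoded data.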
Page 12
Parallel queries – priorities, preemption
• Lower-priority fragments can be preempted
• For example, a fragment can start running before its inputs are ready, for better pipelining; such fragments may be preempted
• LLAP work queue examines the DAG parameters to give preference to interactive (BI) queries
[Diagram: executors running interactive query maps 1/3–3/3 while a wide query's reduce waits in the queue; a cluster-utilization timeline shows a short-running query interleaved with a long-running query]
Page 13
In-memory processing – present and future
[Diagram: on the I/O side, the distributed FS, an SSD cache*, and an off-heap cache holding compressed data, with a compression codec and decoder producing compact encoded data; on the execution side, a fragment's Hive operators performing vectorized processing over native data vectors (col1, col2), with low-level pushdown (e.g. filter)*. Items marked * are work in progress.]
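Vectorized processing, as shown on the execution side above, evaluates a predicate over a whole column vector and records the surviving row indices in a selection vector, instead of testing row objects one at a time. A sketch of the idea, not Hive's actual `VectorizedRowBatch` API:

```java
// Sketch of a vectorized filter: operate on a column vector in one tight
// loop and record surviving row indices in a selection vector, which
// downstream operators then honor instead of copying data.
public class VectorizedFilterSketch {
    // Returns the number of selected rows; 'selected' holds their indices.
    static int filterGreaterThan(long[] col, int size, long bound, int[] selected) {
        int count = 0;
        for (int i = 0; i < size; i++) {
            if (col[i] > bound) {
                selected[count++] = i;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long[] col1 = {5, 12, 7, 30, 2};
        int[] selected = new int[col1.length];
        int n = filterGreaterThan(col1, col1.length, 6, selected);
        for (int i = 0; i < n; i++) {
            System.out.println(col1[selected[i]]);  // 12, 7, 30
        }
    }
}
```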
Page 14
First query performance
• Cold LLAP is nearly as fast as shared pre-warmed containers (impractical on real clusters)
• Realistic (long-running) LLAP ~3x faster than realistic (no prewarm) Tez

First query runtime, s:
  No prewarm:     20.94
  Shared prewarm: 11.96
  Cold LLAP:      13.23
  Realistic LLAP:  7.49
Page 15
JIT Performance – heavy use
• Cache disabled!
• For cold start, the warmup took 3 runs; ~2x time saving once warm

Query runtime, s, by query run (runs 0–8):
  Cold LLAP:                        13.23, 7.93, 7.7, 6.29, 5.44, 5.3, 5.01, 4.89, 4.83
  LLAP after an unrelated workload:  7.49, 6.1, 4.61, 4.59, 4.93, 4.62, 4.63, 4.45, 4.25
Page 16
Parallel query execution – LLAP vs Hive 1.2
• Parallelism eats into latency

Total runtime for all queries, s, by number of concurrent users:
  Users:     1     2     4     8     16
  Hive 1.2:  3131  2416  1741  1420  1508
  LLAP:      531   291   147   94    75

(The chart also plotted linear-scaling reference lines for Hive 1.2 and LLAP.)
Page 17
Parallel query execution – 10TB scale
• Latency sensitive pre-emption

LLAP total runtime for all queries, s, by number of concurrent users:
  Users: 1    2    4    8   16
  LLAP:  531  291  147  94  75

(The chart also plotted an LLAP linear-scaling reference line.)
Page 18
Performance – cache on HDFS, 1TB scale
[Chart: query runtime (s, 0–25) after an unrelated vs. a related workload, plus metadata and data cache hit rates (0–100%), across cache sizes from 24 to 42 GB; observed runtimes include 20.2, 18.3, 17.6, 16.8, 16.2, 14.5 and 11.3 s]
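The cache whose hit rates are charted above holds decoded buffers and evicts them by policy. Purely as an illustration of eviction (LLAP's cache is off-heap and uses an LRFU policy rather than plain LRU, and the buffer keys here are made up), an access-ordered map sketch:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration of cache eviction for decoded column buffers. LLAP's actual
// cache is off-heap with an LRFU policy; plain LRU is shown for brevity.
public class BufferCacheSketch extends LinkedHashMap<String, byte[]> {
    private final int maxBuffers;

    BufferCacheSketch(int maxBuffers) {
        super(16, 0.75f, true);  // access-order: reading a buffer refreshes it
        this.maxBuffers = maxBuffers;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
        // Evict the least-recently-used buffer once over capacity.
        return size() > maxBuffers;
    }

    public static void main(String[] args) {
        BufferCacheSketch cache = new BufferCacheSketch(2);
        cache.put("file1:stripe0", new byte[8]);
        cache.put("file1:stripe1", new byte[8]);
        cache.get("file1:stripe0");               // touch stripe0
        cache.put("file2:stripe0", new byte[8]);  // evicts stripe1, the LRU entry
        System.out.println(cache.containsKey("file1:stripe1"));  // false
        System.out.println(cache.containsKey("file1:stripe0"));  // true
    }
}
```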
Page 19
LLAP as a “relational” datanode
Page 20
Example – SparkSQL integration – execution flow

Spark Executor:
  HadoopRDD.getPartitions() – get Hive splits (request splits)
  HadoopRDD.compute() – LlapInputFormat.getRecordReader():
    open socket for incoming data, send package/plan to LLAP
Ranger + HiveServer2:
  Check permissions; generate splits w/ LLAP locations; return securely signed splits
Cluster nodes (LLAP):
  Verify splits; run scan + transform; send data back as partitions/splits

DataFrame for Hive/LLAP data:
  var llapContext = LlapContext.newInstance(sparkContext, jdbcUrl)
  var df: DataFrame = llapContext.sql("select * from tpch_text_5.region")
Page 21
Monitoring LLAP Queries
Page 22
Monitoring
• LLAP exposes a UI for monitoring
• Also has a jmx endpoint with much more data; logs and jstack endpoints as usual
• Aggregate monitoring UI is work in progress
Page 23
Watching queries – Tez UI integration
Page 24
Questions?
Interested? Stop by the Hortonworks booth to learn more