LLAP: Sub-Second Analytical Queries in Hive
TRANSCRIPT
Page 1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
LLAP: Sub-Second Analytical Queries in Hive
Gopal Vijayaraghavan
Page 2
Why LLAP?
• People like Hive
• Disk->Mem is getting further away
  – Cloud Storage isn't co-located
  – Disks are connected to the CPU via network
• Security landscape is changing
  – Cells & Columns are the new security boundary, not files
  – Safely masking columns needs a process boundary
• Concurrency, Performance & Scale are at conflict
  – Concurrency at 100k queries/hour
  – Latencies at 2-5 seconds/query
  – Petabyte scale warehouses (with terabytes of "hot" data)
[Diagram: a node running an LLAP process with a cache and query fragments, reading from HDFS]
Page 3
What is LLAP?
• Hybrid model combining daemons and containers for fast, concurrent execution of analytical workloads (e.g. Hive SQL queries)
• Concurrent queries without specialized YARN queue setup
• Multi-threaded execution of vectorized operator pipelines
• Asynchronous IO and efficient in-memory caching
• Relational view of the data available through the API
• High performance scans, execution code pushdown
• Centralized data security
Page 4
Hive 2.0 (+ LLAP)
• Transparent to Hive users, BI tools, etc.
• Hive decides where query fragments run (LLAP, Container, AM) based on configuration, data size, format, etc.
• Each query is coordinated independently by a Tez AM
• Number of concurrent queries throttled by number of active AMs
• Hive Operators used for processing
• Tez Runtime used for data transfer
[Diagram: HiveServer2 as query/AM controller between clients and a YARN cluster, where Tez AMs (AM1–AM3) coordinate query fragments running in llapd daemons and in containers]
Page 5
Industry benchmark – 10TB scale
[Chart: LLAP vs Hive 1.x at 10TB scale; runtime in seconds (0–50) for queries 3, 12, 20, 21, 26, 27, 42, 52, 55, 73, 89, 91 and 98 — LLAP is up to 90% faster]
Page 6
Evaluation from a customer case study
• Aggregate daily statistics for a time interval:

  SELECT yyyymmdd, sum(total_1), sum(total_2), ...
  from table
  where yyyymmdd >= xxx and yyyymmdd < xxx and userid = xxx
  group by userid, yyyymmdd;

[Chart: max and average execution time (seconds, 0–8) over day/week/month/year (D/W/M/Y) intervals, comparing Tez, Phoenix and LLAP]
Page 7
Evaluation from a customer case study
• Display a large report
[Chart: max and average execution time (seconds, 0–20) over day/week/month/year (D/W/M/Y) ranges, comparing Tez and LLAP]
Page 8
Cut-away to demo (GIF)
Page 9
How does LLAP make queries faster?
Page 10
Technical overview – execution
• LLAP daemon has a number of executors (think containers) that execute work "fragments"
• Fragments are parts of one, or multiple parallel workloads (e.g. Hive SQL queries)
• Work queue with pluggable priority
  – Geared towards low latency queries over long-running queries (by default)
• I/O is similar to containers – read/write to HDFS, shuffle, other storages and formats
• Streaming output for data API
[Diagram: an LLAP daemon with a work queue (Q1 Map 1 fragments, Q3 Map 19, fragments waiting for shuffle inputs) feeding executors running Q1 Map 1, Q3 Reducer 3 and an external read; IO to HDFS and HBase, shuffle input from a container, streaming output to a Spark executor]
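The work queue with pluggable priority described above can be sketched as a priority queue that prefers fragments from small (interactive) DAGs. This is a minimal sketch under assumptions: the `Fragment` descriptor and its `dagSizeHint` field are hypothetical names, and LLAP's real comparator weighs additional DAG parameters.

```java
import java.util.concurrent.PriorityBlockingQueue;

// Hypothetical fragment descriptor: fragments from smaller DAGs
// (interactive BI queries) sort ahead of wide, long-running ones.
class Fragment implements Comparable<Fragment> {
    final String queryId;
    final long dagSizeHint;  // estimated total work in the owning DAG
    final long arrivalTime;  // tie-breaker: FIFO within a priority level

    Fragment(String queryId, long dagSizeHint, long arrivalTime) {
        this.queryId = queryId;
        this.dagSizeHint = dagSizeHint;
        this.arrivalTime = arrivalTime;
    }

    @Override
    public int compareTo(Fragment other) {
        int bySize = Long.compare(this.dagSizeHint, other.dagSizeHint);
        return bySize != 0 ? bySize : Long.compare(this.arrivalTime, other.arrivalTime);
    }
}

public class WorkQueueSketch {
    public static void main(String[] args) {
        PriorityBlockingQueue<Fragment> queue = new PriorityBlockingQueue<>();
        queue.add(new Fragment("etl-query", 1_000_000, 1)); // wide ETL query, arrived first
        queue.add(new Fragment("bi-query", 1_000, 2));      // small interactive query
        // The interactive fragment jumps ahead of the earlier ETL fragment.
        System.out.println(queue.poll().queryId);  // bi-query
        System.out.println(queue.poll().queryId);  // etl-query
    }
}
```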
Page 11
Technical overview – IO layer
• Asynchronous IO for Hive
  – Optional: when executing inside LLAP
  – Wraps over InputFormat, reads through cache
  – Supported with ORC; all other formats use in-sync mode
• Transparent, compressed in-memory cache
  – Format-specific, extensible
  – NVMe/NVDIMM caches
[Diagram: a fragment's RecordReader passes splits to an IO thread, which plans and decodes what to read, consults the metadata cache and indexes, reads and decompresses from the actual data (HDFS, S3, …), fills cache data buffers, and hands vectorized data back to the executor]
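The IO-thread/executor split above amounts to a bounded producer/consumer pipeline: the IO thread reads and decodes batches ahead of the executor that consumes them. A minimal sketch of that shape only — these are not LLAP's actual elevator or RecordReader classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of asynchronous IO: an IO thread "reads and decodes" row batches
// ahead of the executor, which consumes them from a bounded queue.
public class AsyncIoSketch {
    static final int[] POISON = new int[0];  // end-of-stream marker

    public static List<Integer> run(int batches) throws InterruptedException {
        BlockingQueue<int[]> decoded = new ArrayBlockingQueue<>(4);

        Thread ioThread = new Thread(() -> {
            try {
                for (int b = 0; b < batches; b++) {
                    // Stand-in for "read, decompress, decode" of one batch.
                    int[] vector = {b * 2, b * 2 + 1};
                    decoded.put(vector);
                }
                decoded.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        ioThread.start();

        // Executor side: consume vectorized batches as they become available.
        List<Integer> rows = new ArrayList<>();
        int[] batch;
        while ((batch = decoded.take()) != POISON) {
            for (int v : batch) rows.add(v);
        }
        ioThread.join();
        return rows;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(3));  // [0, 1, 2, 3, 4, 5]
    }
}
```

The bounded queue is the point: it lets IO run ahead for pipelining without buffering unbounded amounts of decoded data.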
Page 12
Parallel queries – priorities, preemption
• Lower-priority fragments can be preempted
• For example, a fragment can start running before its inputs are ready, for better pipelining; such fragments may be preempted
• LLAP work queue examines the DAG parameters to give preference to interactive (BI) queries
[Diagram: executors running interactive query maps 1/3–3/3 while a wide query's reduce waits in the queue; a cluster-utilization timeline shows a short-running query interleaved with a long-running query]
Page 13
In-memory processing – present and future
[Diagram: on the I/O side, the distributed FS, an SSD cache*, and an off-heap cache holding compressed data, with a compression codec and decoder producing compact encoded data; on the execution side, a fragment's Hive operators performing vectorized processing over native data vectors (col1, col2), with low-level pushdown (e.g. filter)*. Items marked * are work in progress.]
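Vectorized processing, as shown on the execution side above, evaluates a predicate over a whole column vector and records the surviving row indices in a selection vector, instead of testing row objects one at a time. A sketch of the idea, not Hive's actual `VectorizedRowBatch` API:

```java
// Sketch of a vectorized filter: operate on a column vector in one tight
// loop and record surviving row indices in a selection vector, which
// downstream operators then honor instead of copying data.
public class VectorizedFilterSketch {
    // Returns the number of selected rows; 'selected' holds their indices.
    static int filterGreaterThan(long[] col, int size, long bound, int[] selected) {
        int count = 0;
        for (int i = 0; i < size; i++) {
            if (col[i] > bound) {
                selected[count++] = i;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long[] col1 = {5, 12, 7, 30, 2};
        int[] selected = new int[col1.length];
        int n = filterGreaterThan(col1, col1.length, 6, selected);
        for (int i = 0; i < n; i++) {
            System.out.println(col1[selected[i]]);  // 12, 7, 30
        }
    }
}
```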
Page 14
First query performance
• Cold LLAP is nearly as fast as shared pre-warmed containers (impractical on real clusters)
• Realistic (long-running) LLAP ~3x faster than realistic (no prewarm) Tez

First query runtime, s:
  No prewarm:     20.94
  Shared prewarm: 11.96
  Cold LLAP:      13.23
  Realistic LLAP:  7.49
Page 15
JIT Performance – heavy use
• Cache disabled!
• For cold start, the warmup took 3 runs; ~2x time saving once warm

Query runtime, s, by query run (runs 0–8):
  Cold LLAP:                        13.23, 7.93, 7.7, 6.29, 5.44, 5.3, 5.01, 4.89, 4.83
  LLAP after an unrelated workload:  7.49, 6.1, 4.61, 4.59, 4.93, 4.62, 4.63, 4.45, 4.25
Page 16
Parallel query execution – LLAP vs Hive 1.2
• Parallelism eats into latency

Total runtime for all queries, s, by number of concurrent users:
  Users:     1     2     4     8     16
  Hive 1.2:  3131  2416  1741  1420  1508
  LLAP:      531   291   147   94    75

(The chart also plotted linear-scaling reference lines for Hive 1.2 and LLAP.)
Page 17
Parallel query execution – 10TB scale
• Latency sensitive pre-emption

LLAP total runtime for all queries, s, by number of concurrent users:
  Users: 1    2    4    8   16
  LLAP:  531  291  147  94  75

(The chart also plotted an LLAP linear-scaling reference line.)
Page 18
Performance – cache on HDFS, 1TB scale
[Chart: query runtime (s, 0–25) after an unrelated vs. a related workload, plus metadata and data cache hit rates (0–100%), across cache sizes from 24 to 42 GB; observed runtimes include 20.2, 18.3, 17.6, 16.8, 16.2, 14.5 and 11.3 s]
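The cache whose hit rates are charted above holds decoded buffers and evicts them by policy. Purely as an illustration of eviction (LLAP's cache is off-heap and uses an LRFU policy rather than plain LRU, and the buffer keys here are made up), an access-ordered map sketch:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration of cache eviction for decoded column buffers. LLAP's actual
// cache is off-heap with an LRFU policy; plain LRU is shown for brevity.
public class BufferCacheSketch extends LinkedHashMap<String, byte[]> {
    private final int maxBuffers;

    BufferCacheSketch(int maxBuffers) {
        super(16, 0.75f, true);  // access-order: reading a buffer refreshes it
        this.maxBuffers = maxBuffers;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
        // Evict the least-recently-used buffer once over capacity.
        return size() > maxBuffers;
    }

    public static void main(String[] args) {
        BufferCacheSketch cache = new BufferCacheSketch(2);
        cache.put("file1:stripe0", new byte[8]);
        cache.put("file1:stripe1", new byte[8]);
        cache.get("file1:stripe0");               // touch stripe0
        cache.put("file2:stripe0", new byte[8]);  // evicts stripe1, the LRU entry
        System.out.println(cache.containsKey("file1:stripe1"));  // false
        System.out.println(cache.containsKey("file1:stripe0"));  // true
    }
}
```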
Page 19
LLAP as a “relational” datanode
Page 20
Example – SparkSQL integration – execution flow

Spark Executor:
  HadoopRDD.getPartitions() – get Hive splits (request splits)
  HadoopRDD.compute() – LlapInputFormat.getRecordReader():
    open socket for incoming data, send package/plan to LLAP
Ranger + HiveServer2:
  Check permissions; generate splits w/ LLAP locations; return securely signed splits
Cluster nodes (LLAP):
  Verify splits; run scan + transform; send data back as partitions/splits

DataFrame for Hive/LLAP data:
  var llapContext = LlapContext.newInstance(sparkContext, jdbcUrl)
  var df: DataFrame = llapContext.sql("select * from tpch_text_5.region")
Page 21
Monitoring LLAP Queries
Page 22
Monitoring
• LLAP exposes a UI for monitoring
• Also has a jmx endpoint with much more data; logs and jstack endpoints as usual
• Aggregate monitoring UI is work in progress
Page 23
Watching queries – Tez UI integration
Page 24
Questions?
Interested? Stop by the Hortonworks booth to learn more