Introduction to Tez by Olivier Renault of Hortonworks, meetup of 25/11/2014


Uploaded by hadoop-user-group-france on 02-Jul-2015


DESCRIPTION

During this presentation, Olivier will introduce Apache Tez: what it does, why many see it as MapReduce v2, and how it helps Hive, Pig, Cascading and others improve their performance. Speaker: Olivier Renault is a Principal Solution Engineer at Hortonworks, the company behind the Hortonworks Data Platform. Olivier is an expert in deploying Hadoop at scale in a secure and performant way.

TRANSCRIPT

Page 1: Introduction to Tez by Olivier RENAULT of Hortonworks, meetup of 25/11/2014

Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Apache Tez: Accelerating Hadoop Query Processing

Hortonworks. We do Hadoop.

Page 2

Who am I?

Olivier Renault

Solution engineer – Hortonworks EMEA

Hadoop specialist:

- platform

- security

- tuning

Trying to tame the elephant!

Page 3

Tez – Introduction

Distributed execution framework targeted towards data-processing applications.

Based on expressing a computation as a dataflow graph.

Highly customizable to meet a broad spectrum of use cases.

Built on top of YARN – the resource management framework for Hadoop.

Open source Apache project and Apache licensed.

Page 4

Hadoop 1 -> Hadoop 2

HADOOP 1.0

- HDFS (redundant, reliable storage)

- MapReduce (cluster resource management & data processing)

- Pig (data flow), Hive (SQL), Others (Cascading)

HADOOP 2.0

- HDFS2 (redundant, reliable storage)

- YARN (cluster resource management)

- Tez (execution engine), with Pig (data flow), Hive (SQL) and Others (Cascading) on top

- Batch (MapReduce), Real Time Stream Processing (Storm), Online Data Processing (HBase, Accumulo)

Monolithic

- Resource management

- Execution Engine

- User API

Layered

- Resource Management – YARN

- Execution Engine – Tez

- User API – Hive, Pig, Cascading, …

Page 5

Tez – Empowering Applications

Tez solves the hard problems of running in a distributed Hadoop environment

Apps can focus on solving their domain-specific problems

This design is what allows Tez to be a platform for a wide variety of applications

App

- Custom application logic

- Custom data format

- Custom data transfer technology

Tez

- Distributed parallel execution

- Negotiating resources from the Hadoop framework

- Fault tolerance and recovery

- Horizontal scalability

- Resource elasticity

- Shared library of ready-to-use components

- Built-in performance optimizations

- Security

Page 6

Tez – End User Benefits

Better application performance

- Built-in performance optimizations plus application-defined optimizations

Better predictability of results

- Minimization of overheads and queuing delays

Better utilization of compute capacity

- Efficient use of allocated resources

Reduced load on distributed filesystem (HDFS)

- Reduce unnecessary replicated writes

Reduced network usage

- Better locality and data transfer using new data patterns

Higher application developer productivity

- Focus on application business logic rather than Hadoop internals

Page 7

Tez – Design considerations

Leverage the discrete task-based compute model for elasticity, scalability and fault tolerance

Leverage several man-years of work on Hadoop MapReduce data shuffle operations

Leverage the proven resource sharing and multi-tenancy model of Hadoop and YARN

Leverage the built-in security mechanisms in Hadoop for privacy and isolation

Look to the Future with an eye on the Past

Page 8

Tez – Problems that it addresses

Expressing the computation

- Direct and elegant representation of the data processing flow

- Interfacing with application code and new technologies

Performance

- Late binding: Make decisions as late as possible

- Leverage the resources of the cluster efficiently

- Just work out of the box

- Customizable engine to let applications tailor the job to meet their specific requirements

Operation simplicity

- Painless to operate, experiment and upgrade

Page 9

Tez – Simplifying Operations

No deployments to do. No side effects. Easy and safe to try it out!

- Tez is a completely client side application.

- Simply upload Tez to any accessible FileSystem and change the local Tez configuration to point to it.

- Enables running different versions concurrently: easy to test new functionality while keeping stable versions for production.

- Leverages YARN local resources.

[Diagram: two client machines, each with its own TezClient, use different Tez libraries (Tez Lib 1, Tez Lib 2) stored in HDFS; TezTasks run inside YARN NodeManagers]

Page 10

Tez – Expressing the computation

Distributed data processing jobs typically look like DAGs (Directed Acyclic Graphs)

- Vertices in the graph represent data transformations

- Edges represent data movement from producers to consumers

[Diagram: Distributed sort. Preprocessor, Partition and Aggregate stages (Task-1 and Task-2 each) plus a Sampler; edges labelled Samples and Ranges connect the Sampler to the other stages]
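The distributed sort above can be modeled as a small logical DAG. The sketch below is plain Java 17, not the Tez API: the `LogicalDag` class is hypothetical, and the edge list used in the example (Preprocessor feeding both the Sampler and the Partition stage, the Sampler sending ranges to Partition, Partition feeding Aggregate) is an assumption read from the diagram labels. Kahn's algorithm yields an order in which every stage runs only after all of its producers, and it rejects graphs that contain a cycle.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical logical DAG (not the Tez API): vertices are transformations, edges are data movement.
final class LogicalDag {
    // Producer vertex -> its consumer vertices (insertion order kept).
    private final Map<String, List<String>> consumers = new LinkedHashMap<>();
    // Number of producers feeding each vertex.
    private final Map<String, Integer> inDegree = new LinkedHashMap<>();

    LogicalDag addVertex(String name) {
        consumers.putIfAbsent(name, new ArrayList<>());
        inDegree.putIfAbsent(name, 0);
        return this;
    }

    LogicalDag addEdge(String producer, String consumer) {
        if (!consumers.containsKey(producer) || !consumers.containsKey(consumer)) {
            throw new IllegalArgumentException("Both vertices must be added before the edge");
        }
        consumers.get(producer).add(consumer);
        inDegree.merge(consumer, 1, Integer::sum);
        return this;
    }

    /** Kahn's algorithm: a vertex is emitted only once all of its producers have been emitted. */
    List<String> topologicalOrder() {
        Map<String, Integer> remaining = new LinkedHashMap<>(inDegree);
        Deque<String> ready = new ArrayDeque<>();
        remaining.forEach((v, d) -> { if (d == 0) ready.add(v); });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.poll();
            order.add(v);
            for (String c : consumers.get(v)) {
                if (remaining.merge(c, -1, Integer::sum) == 0) ready.add(c);
            }
        }
        if (order.size() != remaining.size()) {
            throw new IllegalStateException("Graph has a cycle, so it is not a DAG");
        }
        return order;
    }
}
```

With the assumed edges, the order comes out as Preprocessor, Sampler, Partition, Aggregate: the Partition stage waits for both the data and the sampled ranges.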

Page 11

Tez – Expressing the computation

Tez provides the following APIs to define the processing

DAG API

- Defines the structure of the data processing and the relationship between producers and consumers

- Enables the definition of complex data flow pipelines using simple graph connection APIs. Tez expands the logical DAG at runtime

- Specifies all the tasks in the job

Runtime API

- Defines how the framework and app code interact with each other

- App code transforms data and moves it between tasks

- Specifies what actually executes in each task on the cluster nodes

Page 12

Tez – Deep Dive – API

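// Define DAG
DAG dag = new DAG();
// Define Vertex
Vertex map1 = new Vertex(MapProcessor.class);
Vertex reduce1 = new Vertex(ReduceProcessor.class);
// Define Edge (slide shorthand: the real Tez API wraps the processor, output and input classes in descriptor objects)
Edge edge1 = new Edge(map1, reduce1, SCATTER_GATHER,
    PERSISTED, SEQUENTIAL, MOutput.class, RInput.class);
// Connect them: both endpoint vertices must be added before the edge that joins them
dag.addVertex(map1).addVertex(reduce1).addEdge(edge1)…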

[Diagram: vertices map1, map2, reduce1, reduce2 and join1 connected by Scatter_Gather / Bipartite / Sequential edges]

Simple DAG definition API

Page 13

Tez – Deep Dive – API

• Data movement – Defines routing of data between tasks

– One-To-One: Data from the ith producer task routes to the ith consumer task.

– Broadcast: Data from a producer task routes to all consumer tasks.

– Scatter-Gather: Producer tasks scatter data into shards and consumer tasks gather the data. The ith shard from all producer tasks routes to the ith consumer task.

– Custom: Define your own

• Scheduling – Defines when a consumer task is scheduled

– Sequential: Consumer task may be scheduled after a producer task completes.

– Concurrent: Consumer task must be co-scheduled with a producer task.

• Data source – Defines the lifetime/reliability of a task output

– Persisted: Output will be available after the task exits. Output may be lost later on.

– Persisted-Reliable: Output is reliably stored and will always be available

– Ephemeral: Output is available only while the producer task is running

Edge properties define the connection between producer and consumer vertices in the DAG
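To make the three built-in data-movement types concrete, here is a hypothetical routing function in plain Java 17 (`EdgeRouting`, `Movement` and `Route` are illustrative names, not Tez classes). For one producer task it lists which consumer tasks receive data and which shard each one gets; shard 0 stands for the producer's whole output in the One-To-One and Broadcast cases.

```java
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical model of the built-in data-movement types; not Tez classes.
enum Movement { ONE_TO_ONE, BROADCAST, SCATTER_GATHER }

// One delivery: which consumer task receives which shard of the producer's output.
record Route(int consumerTask, int shard) {}

final class EdgeRouting {
    static List<Route> routes(Movement type, int producer, int numProducers, int numConsumers) {
        return switch (type) {
            // ith producer -> ith consumer; both vertices need the same parallelism.
            case ONE_TO_ONE -> {
                if (numProducers != numConsumers) {
                    throw new IllegalArgumentException("One-To-One needs equal parallelism");
                }
                yield List.of(new Route(producer, 0));
            }
            // The whole output (shard 0) goes to every consumer task.
            case BROADCAST -> IntStream.range(0, numConsumers)
                    .mapToObj(c -> new Route(c, 0)).toList();
            // Output is split into one shard per consumer; shard i goes to consumer i.
            case SCATTER_GATHER -> IntStream.range(0, numConsumers)
                    .mapToObj(i -> new Route(i, i)).toList();
        };
    }
}
```

The Scatter-Gather routes do not depend on `producer`: every producer sends its ith shard to consumer i, so consumer i ends up gathering shard i from all producers.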

Page 14

Tez – Logical DAG expansion at Runtime

[Diagram: the logical DAG (map1, map2, reduce1, reduce2, join1) expanded at runtime into tasks: map1, map1, map2, Red1, Red2, Join1]
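Runtime expansion can be sketched the same way: given each vertex's parallelism, every logical edge becomes a set of task-to-task links. In this hypothetical Java 17 model (`DagExpansion` and its nested names are illustrative, not the Tez API), One-To-One links task i to task i, while Broadcast and Scatter-Gather both connect every producer task to every consumer task and differ only in what each link carries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of logical-to-physical DAG expansion; not the Tez API.
final class DagExpansion {
    // At task level, Broadcast and Scatter-Gather both wire every producer to every consumer.
    enum Movement { ONE_TO_ONE, ALL_TO_ALL }

    record LogicalEdge(String from, String to, Movement movement) {}

    /** Returns task-level links such as "map1[0] -> reduce1[1]". */
    static List<String> expand(Map<String, Integer> parallelism, List<LogicalEdge> edges) {
        List<String> links = new ArrayList<>();
        for (LogicalEdge e : edges) {
            int producers = parallelism.get(e.from());
            int consumers = parallelism.get(e.to());
            if (e.movement() == Movement.ONE_TO_ONE) {
                if (producers != consumers) {
                    throw new IllegalArgumentException("One-To-One needs equal parallelism");
                }
                for (int i = 0; i < producers; i++) links.add(task(e.from(), i) + " -> " + task(e.to(), i));
            } else {
                for (int i = 0; i < producers; i++) {
                    for (int j = 0; j < consumers; j++) links.add(task(e.from(), i) + " -> " + task(e.to(), j));
                }
            }
        }
        return links;
    }

    private static String task(String vertex, int index) {
        return vertex + "[" + index + "]";
    }
}
```

The parallelism values are inputs here; in Tez they may be fixed up front or, as the later slides show, decided at runtime.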

Page 15

Tez – Runtime API

Flexible Inputs-Processors-Outputs Model

- Thin API to wrap around arbitrary application code

- Compose inputs, processor and outputs to execute arbitrary processing

- Event routing based control plane architecture

- Applications decide logical data format and data transfer technology

- Customize for performance

- Built-in implementation for Hadoop 2.0 data services – HDFS and YARN ShuffleService

Input

- initialize(TezInputContext ctxt)

- Reader getReader()

- handleEvents(List<Event> evts)

- close()

Processor

- initialize(TezProcessorContext ctxt)

- run(List<Input> inputs, List<Output> outputs)

- handleEvents(List<Event> evts)

- close()

Output

- initialize(TezOutputContext ctxt)

- Writer getWriter()

- handleEvents(List<Event> evts)

- close()
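The Input / Processor / Output split can be illustrated with simplified, hypothetical interfaces in plain Java 17 (`SimpleInput`, `SimpleProcessor`, `SimpleOutput` and `TaskRunner` are not Tez classes; the context objects and `handleEvents` are left out). The processor holds only application logic, here a word count, while the framework-side runner owns the initialize / run / close lifecycle.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified roles; NOT Tez's real interfaces (no contexts, no handleEvents).
interface SimpleInput {
    void initialize();
    Iterator<String> getReader();
    void close();
}

interface SimpleOutput {
    void initialize();
    List<String> getWriter(); // stands in for a writer: records are appended to this list
    void close();
}

interface SimpleProcessor {
    void initialize();
    void run(List<SimpleInput> inputs, List<SimpleOutput> outputs);
    void close();
}

final class ListInput implements SimpleInput {
    private final List<String> records;
    ListInput(List<String> records) { this.records = records; }
    public void initialize() {}
    public Iterator<String> getReader() { return records.iterator(); }
    public void close() {}
}

final class ListOutput implements SimpleOutput {
    private final List<String> sink = new ArrayList<>();
    public void initialize() {}
    public List<String> getWriter() { return sink; }
    public void close() {}
}

// Application logic only: counts words across all inputs and writes "word=count" lines.
final class WordCountProcessor implements SimpleProcessor {
    public void initialize() {}

    public void run(List<SimpleInput> inputs, List<SimpleOutput> outputs) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys give deterministic output
        for (SimpleInput in : inputs) {
            Iterator<String> reader = in.getReader();
            while (reader.hasNext()) {
                for (String word : reader.next().split("\\s+")) {
                    if (!word.isEmpty()) counts.merge(word, 1, Integer::sum);
                }
            }
        }
        List<String> writer = outputs.get(0).getWriter();
        counts.forEach((word, n) -> writer.add(word + "=" + n));
    }

    public void close() {}
}

// Framework side: owns the lifecycle, so the processor never opens or closes its own I/O.
final class TaskRunner {
    static void execute(List<SimpleInput> inputs, SimpleProcessor processor, List<SimpleOutput> outputs) {
        inputs.forEach(SimpleInput::initialize);
        outputs.forEach(SimpleOutput::initialize);
        processor.initialize();
        try {
            processor.run(inputs, outputs);
        } finally {
            processor.close();
            inputs.forEach(SimpleInput::close);
            outputs.forEach(SimpleOutput::close);
        }
    }
}
```

Swapping `ListInput` for an HDFS-backed or shuffle-backed input would not touch `WordCountProcessor`, which is the point of keeping data format and transfer technology out of the processor.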

Page 16

Tez: Library of Inputs and Outputs

Classical ‘Map’: HDFS Input + Map Processor + Sorted Output

Classical ‘Reduce’: Shuffle Input + Reduce Processor + HDFS Output

Intermediate ‘Reduce’ for Map-Reduce-Reduce: Shuffle Input + Reduce Processor + Sorted Output

What is built in?

- Hadoop InputFormat / OutputFormat

- SortedGroupedPartitioned Key-Value Input / Output

- UnsortedGroupedPartitioned Key-Value Input / Output

- Key-Value Input / Output
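The three compositions above can be checked with a small hypothetical model in Java 17 (`PipelineLibrary` and its enums are illustrative, not Tez classes). It assumes one chaining rule: a stage's sorted output must feed the next stage's shuffle input, and only the two ends of the chain touch HDFS. Under that rule a Map-Reduce-Reduce pipeline needs no HDFS write between the two reduces.

```java
import java.util.List;

// Hypothetical model of the library compositions; not Tez classes.
final class PipelineLibrary {
    enum In { HDFS_INPUT, SHUFFLE_INPUT }
    enum Out { HDFS_OUTPUT, SORTED_OUTPUT }

    record Stage(In input, String processor, Out output) {}

    static final Stage MAP = new Stage(In.HDFS_INPUT, "MapProcessor", Out.SORTED_OUTPUT);
    static final Stage INTERMEDIATE_REDUCE =
            new Stage(In.SHUFFLE_INPUT, "ReduceProcessor", Out.SORTED_OUTPUT);
    static final Stage REDUCE = new Stage(In.SHUFFLE_INPUT, "ReduceProcessor", Out.HDFS_OUTPUT);

    /** True if the chain reads and writes HDFS only at its ends and sorted output always feeds shuffle input. */
    static boolean isConsistent(List<Stage> chain) {
        if (chain.isEmpty()) return false;
        if (chain.get(0).input() != In.HDFS_INPUT) return false;
        if (chain.get(chain.size() - 1).output() != Out.HDFS_OUTPUT) return false;
        for (int i = 0; i + 1 < chain.size(); i++) {
            boolean linked = chain.get(i).output() == Out.SORTED_OUTPUT
                    && chain.get(i + 1).input() == In.SHUFFLE_INPUT;
            if (!linked) return false;
        }
        return true;
    }
}
```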

Page 17

Tez - Performance

Benefits of expressing the data processing as a DAG

- Reducing overheads and queuing effects

- Gives the system the global picture for better planning

Efficient use of resources

- Re-use resources to maximize utilisation

- Pre-Launch, pre-warm and cache

- Locality & resource aware scheduling

Support for application defined DAG modification at runtime

- Change task concurrency

- Change task scheduling

- Change DAG edges

- Change DAG Vertices

Page 18

Tez – Benefits of DAG execution

Faster execution and higher predictability

– Eliminate the replicated write barrier between successive computations.

– Eliminate the job launch overhead of workflow jobs.

– Eliminate the extra stage of map reads in every workflow job.

– Eliminate the queue and resource contention suffered by workflow jobs that are started after a predecessor job completes.

– Better locality because the engine has the overall picture

[Diagram: Pig/Hive - MR vs. Pig/Hive - Tez]

Page 19

Hive-on-MR vs. Hive-on-Tez

SELECT a.x, AVG(b.y) AS avg
FROM a JOIN b ON (a.id = b.id) GROUP BY a.x
UNION SELECT x, AVG(y) AS avg
FROM c GROUP BY x
ORDER BY avg;

[Diagram: Hive-on-MR runs the query plan (SELECT a.state, SELECT b.id, SELECT c.price, JOIN(a, b), JOIN(a, c), GROUP BY a.state with COUNT(*) and the average of c.price) as a chain of M/R jobs that write to HDFS between jobs; Hive-on-Tez runs the same plan as a single DAG of M and R vertices without those intermediate HDFS writes]

Tez avoids unneeded writes to HDFS

Page 20

Tez – Container Re-Use

- Reuse YARN containers/JVMs to launch new tasks

- Reduce scheduling and launching delays

- Shared JVM objects across tasks.

- JVM JIT-friendly execution

[Diagram: the Tez Application Master, in its own YARN container, sends Start Task to a TezTask Host in another YARN container; after TezTask1 reports Task Done, TezTask2 starts in the same container and shares objects with it]
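A minimal sketch of container re-use in Java 17, assuming a single pool owned by the Application Master (the `ContainerPool` class is hypothetical, and real Tez also takes locality into account before re-using a container). A finished task returns its container to the pool, so the next task skips the YARN allocation and JVM start-up.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical container pool kept by the Application Master; not the Tez API.
final class ContainerPool {
    private final Deque<Integer> idle = new ArrayDeque<>();
    private int launched = 0; // containers allocated from YARN and started (slow path)
    private int reused = 0;   // tasks that ran in an already-warm container (fast path)

    /** Returns the id of the container that will run the next task. */
    int acquire() {
        Integer id = idle.pollFirst();
        if (id != null) {
            reused++;
            return id;
        }
        return launched++; // new container ids are 0, 1, 2, ...
    }

    /** Task done: keep the container (and its warm JVM) for the next task instead of releasing it. */
    void release(int containerId) {
        idle.addFirst(containerId);
    }

    int launched() { return launched; }

    int reused() { return reused; }
}
```

Five tasks run one after another need a single launch and four re-uses; only tasks that run at the same time force extra launches.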

Page 21

Tez - Sessions

Sessions

- Standard concepts of pre-launch and pre-warm applied

- Key for interactive queries

- Represents a connection between the user and the cluster

- Multiple DAGs executed in the same session

- Containers re-used across queries

- Takes care of data locality and releasing resources when idle

[Diagram: the Client starts a session and submits DAGs to the Application Master, whose Task Scheduler draws on a container pool of pre-warmed JVMs with a shared object registry]
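Sessions extend the same idea across queries. The hypothetical `Session` model below (Java 17) pre-warms containers, lets successive DAGs share them, and releases containers that have been idle longer than a timeout; the timeout value and the instant completion of each DAG are simplifying assumptions. In actual Tez code a session is opened through `TezClient` in session mode, which this sketch does not use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical session model; real Tez sessions are opened through TezClient.
final class Session {
    private final long idleTimeoutMs;
    // One entry per idle container: the time (ms) at which it became idle.
    private final List<Long> idleSince = new ArrayList<>();

    Session(long idleTimeoutMs) { this.idleTimeoutMs = idleTimeoutMs; }

    /** Pre-launch containers before the first query arrives. */
    void preWarm(int count, long now) {
        for (int i = 0; i < count; i++) idleSince.add(now);
    }

    /**
     * Runs one DAG needing {@code tasks} concurrent containers and returns how many new
     * containers had to be launched. Assumes the DAG finishes instantly at {@code now}.
     */
    int submitDag(int tasks, long now) {
        releaseIdle(now);
        int reused = Math.min(tasks, idleSince.size());
        for (int i = 0; i < reused; i++) idleSince.remove(idleSince.size() - 1);
        int launched = tasks - reused;
        // When the DAG completes, all of its containers stay in the session, idle since now.
        for (int i = 0; i < tasks; i++) idleSince.add(now);
        return launched;
    }

    /** Gives containers idle for longer than the timeout back to YARN. */
    void releaseIdle(long now) {
        idleSince.removeIf(t -> now - t > idleTimeoutMs);
    }

    int idleContainers() { return idleSince.size(); }
}
```

With four pre-warmed containers, a three-task query launches nothing and a following six-task query launches only two; a long enough pause then returns everything to YARN.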

Page 22

Tez – Deep Dive – Scheduling

[Diagram: Vertex-1 and Vertex-2 each have a Vertex Manager that starts their tasks; task priorities come from the DAG Scheduler and containers from the Task Scheduler]

Vertex Manager

- Determines task parallelism

- Determines when tasks in a vertex can start

DAG Scheduler

- Determines the priority of tasks

Task Scheduler

- Allocates containers from YARN and assigns them to tasks
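The split between DAG Scheduler and Task Scheduler can be sketched as a priority function plus an allocator. The assumption here (not taken from Tez's source) is that a vertex closer to the roots is more urgent, so its priority is its longest distance from a root; the allocator then hands the available containers to pending tasks in that order. `PriorityScheduler` is a hypothetical Java 17 class, and the Vertex Manager's decision about when tasks may start is left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch of depth-based task priority; not Tez's DAG scheduler.
final class PriorityScheduler {
    /** Longest distance from a root vertex (one with no producers); 0 is most urgent. Assumes no cycles. */
    static int depth(String vertex, Map<String, List<String>> producersOf, Map<String, Integer> memo) {
        Integer cached = memo.get(vertex);
        if (cached != null) return cached;
        int d = 0;
        for (String producer : producersOf.getOrDefault(vertex, List.of())) {
            d = Math.max(d, depth(producer, producersOf, memo) + 1);
        }
        memo.put(vertex, d);
        return d;
    }

    /** Gives up to {@code containers} containers to pending tasks, most urgent vertex first. */
    static List<String> assign(Map<String, List<String>> producersOf,
                               Map<String, Integer> pendingTasks, int containers) {
        Map<String, Integer> memo = new HashMap<>();
        // Order by depth; ties broken by vertex name so the result is deterministic.
        PriorityQueue<String> queue = new PriorityQueue<>(
                Comparator.comparingInt((String v) -> depth(v, producersOf, memo))
                        .thenComparing(Comparator.<String>naturalOrder()));
        queue.addAll(pendingTasks.keySet());
        List<String> assigned = new ArrayList<>();
        while (!queue.isEmpty() && assigned.size() < containers) {
            String vertex = queue.poll();
            int n = Math.min(pendingTasks.get(vertex), containers - assigned.size());
            for (int i = 0; i < n; i++) assigned.add(vertex + "[" + i + "]");
        }
        return assigned;
    }
}
```

For the map1 / map2 / reduce1 / reduce2 / join1 example with four free containers, the map tasks are served first and the join vertex waits.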

Page 23

Tez – Event Based Control Plane

Events are used to communicate between tasks, and between tasks and the framework

Data Movement Event: used by a producer task to inform the consumer tasks about data location, size, etc.

Input Error Event: sent by a task to the engine to report errors reading its input. The engine then takes action by regenerating the input

Other events carry task completion notifications, data statistics and other control-plane information
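A toy version of the event-based control plane, assuming two event types modeled on the slide (`DataMovement`, `InputReadError` and `EventEngine` are illustrative, not Tez classes; Java 17 for sealed interfaces and records). A data-movement event tells each consumer where to fetch a producer's output; a read error makes the engine re-run the producer as a new attempt, whose fresh data-movement event replaces the stale location.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical event types modeled on the slide; not Tez classes.
sealed interface ControlEvent permits DataMovement, InputReadError {}

record DataMovement(String producerTask, int attempt, String location, long bytes) implements ControlEvent {}

record InputReadError(String consumerTask, String producerTask) implements ControlEvent {}

// Toy engine: routes data-movement events and regenerates lost input on read errors.
final class EventEngine {
    private final List<String> consumers;
    // Latest data-movement event seen for each producer task.
    private final Map<String, DataMovement> latest = new HashMap<>();
    // Where each consumer currently fetches each producer's output, keyed "consumer<-producer".
    final Map<String, String> fetchLocation = new HashMap<>();

    EventEngine(List<String> consumers) { this.consumers = consumers; }

    void handle(ControlEvent event) {
        if (event instanceof DataMovement dm) {
            latest.put(dm.producerTask(), dm);
            for (String consumer : consumers) {
                fetchLocation.put(consumer + "<-" + dm.producerTask(), dm.location());
            }
        } else if (event instanceof InputReadError err) {
            DataMovement lost = latest.get(err.producerTask());
            if (lost == null) throw new IllegalStateException("Unknown producer " + err.producerTask());
            // Regenerate the input: re-run the producer as a new attempt (hypothetical location name).
            int attempt = lost.attempt() + 1;
            handle(new DataMovement(err.producerTask(), attempt, "node-for-attempt-" + attempt, lost.bytes()));
        }
    }

    int attemptOf(String producerTask) { return latest.get(producerTask).attempt(); }
}
```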

Page 24

Tez – Automatic Reduce Parallelism

[Diagram: Map Vertex tasks send Data Size Statistics to the Vertex Manager in the App Master, which acts on the Reduce Vertex's state machine (Set Parallelism, Cancel Task, Re-Route)]

Event Model

- Map tasks send data statistics events to the Reduce Vertex Manager.

Vertex Manager

- Pluggable user logic that understands the data statistics and can formulate the correct parallelism.

- Advises the vertex controller on parallelism.
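The Vertex Manager's decision can be sketched as follows, assuming a single tuning knob, a desired input size per reduce task (`AutoParallelism` and `desiredBytesPerTask` are hypothetical names; this is not Tez's implementation). It sums the per-partition sizes reported by the map tasks, picks enough reduce tasks to stay near the target without exceeding the original parallelism, and gives each reduce task a contiguous range of the original partitions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Vertex Manager logic for shrinking reduce parallelism; not Tez's implementation.
final class AutoParallelism {
    /**
     * @param bytesPerPartition   map output size per original reduce partition, summed over map tasks
     * @param desiredBytesPerTask assumed target input size for one reduce task
     * @return one {first, last} range of original partitions per new reduce task
     */
    static List<int[]> plan(long[] bytesPerPartition, long desiredBytesPerTask) {
        if (desiredBytesPerTask <= 0) throw new IllegalArgumentException("target must be positive");
        int original = bytesPerPartition.length;
        long total = 0;
        for (long b : bytesPerPartition) total += b;
        // Ceiling division gives enough tasks to stay near the target; never exceed the original count.
        long needed = (total + desiredBytesPerTask - 1) / desiredBytesPerTask;
        int tasks = (int) Math.max(1, Math.min(original, needed));
        // Each new task fetches a contiguous range of the original partitions.
        int range = (original + tasks - 1) / tasks;
        List<int[]> ranges = new ArrayList<>();
        for (int first = 0; first < original; first += range) {
            ranges.add(new int[] {first, Math.min(first + range, original) - 1});
        }
        return ranges;
    }
}
```

Because every range except the last has the same length, the number of ranges can come out slightly below the computed target (for example, 10 partitions and a target of 6 tasks give 5 ranges of 2).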

Page 25

Theory to practice

Page 26

Tez – Performance

30 TB scale factor – Hive 0.10 with RCFile, Hive 0.13 with ORC

Page 27

Tez – Observations on Performance

Number of stages in the DAG

- High number of stages in the DAG

Cluster / Queue capacity

- Congested queue - container re-use

Size of intermediate output

- Large size of intermediate output – less HDFS usage

Size of data in the job

- Small data and lots of stages – less overhead than MR

Offload work to the cluster

- Use DAG – utilize parallelism and resources of the cluster

Vertex caching

- Reduce re-computation

Page 28

Tez – Adoption Path

Pre-requisite: Hadoop 2 with YARN

Simple client-side install (no admin support needed)

- Needs an HDFS folder with write permission

- No side effects or traces left behind on your cluster

Apache Hive – Available in 0.13

- Set “hive.execution.engine” to “tez”

Apache Pig – Available in 0.14

Cascading – Version 3.0

Run your MapReduce jobs using Tez runtime

- Set “mapreduce.framework.name” to “yarn-tez”

Page 29

Tez - Roadmap

Richer DAG support

- Addition of vertices at runtime

- Shared edges for shared outputs

- Enhance Input / Output library

Performance optimizations

- Improve support for high concurrency

- Improve locality aware scheduling

- Add framework level data statistics

- HDFS memory storage integration

Usability

- Tez UI

- API ease of use