YARN: A Resource Manager for Analytic Platform

Copyright © 2016 NTT Corp. All Rights Reserved.
YARN: A Resource Manager for Analytic Platform
Tsuyoshi Ozawa ([email protected], [email protected])

TRANSCRIPT

Page 1

YARN: A Resource Manager for Analytic PlatformTsuyoshi [email protected]@apache.org

Page 2

• Tsuyoshi Ozawa
• Research Engineer @ NTT
• Twitter: @oza_x86_64
• Over 150 reviews in 2015
• Apache Hadoop Committer and PMC member
• Wrote Chapter 22 (YARN) of "Introduction to Hadoop, 2nd Edition" (Japanese)
• Online article on gihyo.jp: "Why and How Does Hadoop Work?"

About me

Page 3

• Three features of YARN
• Behavior of DAG processing engines on YARN
  • Case study 1: Tez on YARN
  • Case study 2: Spark on YARN
  • How they work, and best practices

Agenda

Page 4

• A resource manager for Apache Hadoop
• Able to share resources not only across MapReduce jobs but also across other processing frameworks
• Three kinds of features:
  1. Managing resources in the cluster
  2. Managing history logs of applications
  3. A mechanism for users to know where applications run on YARN

YARN

Page 5

YARN features 1. Resource Management

Page 6

• Choosing a different data analytics platform per workload
  • Google uses Tenzing/MapReduce for batch/ETL processing, BigQuery for trial and error, and TensorFlow for machine learning
• There are two ways to do so:
  1. Separate clusters
    • Pros: easy to separate workloads and resources
    • Cons: difficult to copy big data between clusters
  2. Living together in the same cluster
    • Pros: no need to copy big data between clusters
    • Cons: difficult to separate resources

Why manage resources with YARN?

Page 7

• Hadoop attaches weight to scalability, since Hadoop is for processing big data!
• Hence, YARN also attaches weight to scalability!
  • Different frameworks live together in the same cluster without moving data
  • A master for each processing framework is launched per job
    • In Hadoop MapReduce v1, the single MapReduce master got overloaded and hit a scalability limit in large-scale processing

Design policy of YARN

Page 8

• Master-worker architecture
  • The master manages resources in the cluster (ResourceManager)
  • The workers manage containers per machine (NodeManager)

Architecture of YARN

[Diagram: a ResourceManager coordinating three worker machines; each worker runs a NodeManager hosting containers, in which per-job masters and workers are placed]

1. The client submits a job
2. The ResourceManager allocates a container for the master and launches it
3. The master requests containers from the ResourceManager and launches workers in those containers

Page 9

• Issue
  • Allocated containers are not monitored: a container may be allocated but never fully utilized
• Suggestion from the YARN community: YARN-1011 (Karthik, Cloudera)
  • Monitor all allocated containers and launch (overcommit) tasks on them if they are not being used
• Effect
  • A NodeManager can utilize more resources per node

Issue of resource management in YARN

Page 10

YARN features 2. Managing history logs

Page 11

• Information about applications that are running or have completed
  • Useful for performance tuning and debugging!
• Types of history log:
  1. Common information that all YARN applications have
    • Allocated containers, names of the queues used
  2. Application-specific information
    • E.g. in the MapReduce case: mapper/reducer counters, amount of shuffle, etc.
• The server that preserves and displays history logs → the Timeline Server

What is a history log?

Page 12

• With only "YARN's common information", YARN applications can show only a generic log
• Screenshot of the Timeline Server

What can Timeline Server do?

Page 13

• Rich performance analysis with YARN application-specific information
• Example: the Tez UI

What can Timeline Server do?

http://hortonworks.com/wp-content/uploads/2014/09/tez_2.png

Page 14

Example of Tez-UI

• We can check job statistics in detail

From: http://hortonworks.com/wp-content/uploads/2015/06/Tez-UI-1024x685.png

Page 15

• Scalability with the number of applications
• Easy access to history logs
  • Users retrieve data via a RESTful API

Design philosophy of the Timeline Server
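The RESTful API can be exercised with a small client. A minimal sketch in Python, assuming the default Timeline web port 8188 and the v1 `/ws/v1/timeline/<entity-type>` endpoint; the response shape below is simplified for illustration, not copied from a live cluster:

```python
import json


def timeline_url(host, entity_type, port=8188):
    """Build the Timeline Server v1 REST URL for listing entities of a type."""
    return "http://{}:{}/ws/v1/timeline/{}".format(host, port, entity_type)


def entity_ids(response_body):
    """Pull entity ids out of a Timeline Server JSON response."""
    return [e["entity"] for e in json.loads(response_body).get("entities", [])]


if __name__ == "__main__":
    # In a live cluster: body = urllib.request.urlopen(url).read()
    url = timeline_url("timeline.example.com", "TEZ_DAG_ID")
    print(url)  # http://timeline.example.com:8188/ws/v1/timeline/TEZ_DAG_ID
    sample = '{"entities": [{"entity": "dag_1"}]}'  # simplified response shape
    print(entity_ids(sample))  # ['dag_1']
```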

Page 16

• Three versions: V1, V1.5, V2
• V1 design
  • Pluggable backend storage (LevelDB is the default)
  • The Timeline Server includes both the writer and the reader
  • Limited scalability
• V2 design, for scalability
  • Distributed writers and readers
  • A scalable KVS as backend storage, focusing on HBase
  • Drastic API changes
  • Enables profiling across multiple jobs
• (New!) V1.5 design
  • HDFS backend storage with the V2 API
  • Reader/writer separation, like V2
  • Please note the incompatibility between V1 and V1.5

Current status of Timeline Server

Page 17

• Do you need other backends for the Timeline Server?
• Alternatives:
  • Time-series DB: Apache Kudu
  • RDBMS: PostgreSQL or MySQL

Question from me

Page 18

YARN feature 3. Service Registry

Page 19

• A service for discovering the location (IP, port) of a YARN application
  • Needed for long-running jobs such as HBase on YARN, LLAP (Hive on Tez), and the MixServer in Hivemall
• Why do we need a service registry?
  • To tell clients the destination of write operations

What is the Service Registry?

[Diagram: a client, ZooKeeper, the ResourceManager, and a NodeManager hosting the service AM and a service container]
1. The service requests to register its port and address
2. ZooKeeper preserves the record
3. The client gets the info
4. The client connects
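The registration above boils down to writing a small record under a well-known ZooKeeper path. A sketch in Python of what such an entry might look like; the path layout and field names here loosely follow the Hadoop registry conventions but are illustrative, not copied from a live cluster:

```python
import json


def registry_path(user, service_class, instance):
    """ZNode path under which a service record would be stored."""
    return "/registry/users/{}/{}/{}".format(user, service_class, instance)


def service_record(host, port):
    """A minimal JSON service record advertising one endpoint."""
    return json.dumps({
        "description": "example long-running service",
        "external": [{"address": host, "port": port}],
    })


if __name__ == "__main__":
    # 1. The service AM registers its address under its znode path.
    path = registry_path("ozawa", "org-apache-hbase", "hbase1")
    record = service_record("worker03.example.com", 16000)
    print(path)  # /registry/users/ozawa/org-apache-hbase/hbase1
    # 2.-4. ZooKeeper preserves the record; a client reads it back and connects.
    print(json.loads(record)["external"][0]["port"])  # 16000
```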

Page 20

Behavior of applications on YARN

Page 21

• New processing frameworks include higher-level DSLs than MapReduce
  • Apache Tez: HiveQL/Pig/Cascading
  • Apache Spark: RDD/SQL/DataFrame/Datasets
  • Apache Flink: Table/DataSets/DataStream
• Features
  • Jobs are described as a more generic Directed Acyclic Graph (DAG), instead of as MapReduce
  • Spark, Hive on Tez, and Flink each compile their DSL down to a DAG

MR-inspired distributed and parallel processing frameworks on YARN

Page 22

• A DAG can express a complex job, like Map – Map – Reduce, as a single job
• Why is it bad to increase the number of jobs?
  • Reduce cannot start before the shuffle finishes
  • The next job cannot start before Reduce finishes
  → Decreased parallelism

Why use DAG?

Figure from: https://tez.apache.org/

Left: Hive on MR. Right: Hive on Tez/Spark.
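A toy model of why fewer jobs means more parallelism (not real Tez/MR code): each boundary between chained MapReduce jobs forces a blocking write of intermediate data to HDFS before the next job can start, while a single DAG job has no such boundary.

```python
def hdfs_barriers(num_chained_jobs):
    """A chain of N MapReduce jobs materializes intermediate results to
    HDFS N-1 times; each write blocks the next job from starting."""
    return max(num_chained_jobs - 1, 0)


# The same Hive query as 3 chained MR jobs vs. one Tez/Spark DAG job:
print(hdfs_barriers(3))  # 2 blocking HDFS writes on MR
print(hdfs_barriers(1))  # 0 on a DAG engine
```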

Page 23

• The first feature is job description with a DAG
• The second depends on the processing framework:
  • Apache Tez: DAG + disk-I/O optimization + low-latency query support
    • Hive/Pig/Spark/Flink can run on Tez
  • Apache Spark: DAG + low-latency in-memory processing + functional programming-style interface
  • Apache Flink: DAG + shuffle optimization + an interface that treats streaming and batch processing transparently
    • Like Apache Beam

Features of modern processing frameworks

Page 24

• Large-scale batch processing
  • Yes, MapReduce can handle it well
  • Sometimes MapReduce is still the best choice for stability
• Short queries for data analytics
  • Trial and error to get a feel for the data while changing queries
  • Google Dremel can handle this well
  • After the MR interface evolved (SQL), users came to need improvements in latency, not throughput
• Currently, we adopt a different processing framework per workload
  • However, that is just a workaround
• YARN treated the former as the first-class citizen at first
  • How can we run the latter on YARN effectively?

Workloads modern processing frameworks can handle

Page 25

• Overhead of job launching
  • YARN needs two steps to launch a job: the master, then the workers
• Time taken to warm up
  • YARN terminates containers after the job exits
• Tez/Spark/Flink run on server-side JVMs, so these problems are significant
→ In this talk, I'll cover how Tez and Spark manage resources on YARN!

Problems with running low-latency queries on YARN

Page 26

• A MapReduce-inspired DAG processing framework for processing large-scale data effectively under a DSL (Pig/Hive)
  • Passes key-value pairs internally, like MapReduce
• Runtime DAG rewriting
  • The DAG plan itself is dumped by the Hive optimizer or Pig optimizer
  • Can describe a DAG without sorting during the shuffle
• Example Hive query execution flow:
  1. Write a Hive query
  2. Submit the Hive query
  3. The Hive optimizer dumps a Tez DAG
  4. Tez executes it

Overview of Apache Tez

Page 27

• Container reuse
  • Containers are reused instead of released within a job, as much as the master can
  • Allocated containers are kept if the session lives on after jobs complete

Q. What is the difference between MapReduce's container reuse and Tez's?
A.
  • Container reuse was removed by accident in the implementation of MapReduce on YARN
  • Sessions enable us to reuse containers across multiple jobs

Effective resource management in Tez

Page 28

• LLAP (Live Long and Process)
  • Allocates containers with no relationship to a "session"
  → Warmed-up JVMs and caches can be reused → even lower latency
• LLAP uses the Service Registry

Q. Is it like a database daemon?
A. That's right. The difference is that Tez can accelerate jobs by adding resources from YARN.

Effective resource management in Tez(cont.)

Page 29

• Rule
  • Keeping containers long can improve latency but can worsen throughput
  • Keeping containers short can improve throughput (resource utilization) but can worsen latency
• Container reuse
  • tez.am.container.idle.release-timeout-min.millis
    • Idle containers are kept at least this long
  • tez.am.container.idle.release-timeout-max.millis
    • Idle containers are released at the latest after this period
• LLAP
  • Looks like it is still work in progress
• Note: Hive configuration overrides Tez configuration, so please test carefully

Tuning parameters
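A sketch of how these two timeouts might be set in tez-site.xml; the values are illustrative placeholders, not tuning recommendations, and tez.am.container.reuse.enabled is the related on/off switch for reuse itself:

```xml
<!-- tez-site.xml (sketch; values illustrative) -->
<property>
  <name>tez.am.container.reuse.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- idle containers are always kept at least this long -->
  <name>tez.am.container.idle.release-timeout-min.millis</name>
  <value>10000</value>
</property>
<property>
  <!-- ...and released at the latest after this period -->
  <name>tez.am.container.idle.release-timeout-max.millis</name>
  <value>20000</value>
</property>
```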

Page 30

• Hive on Tez assumed batch processing at first
  • All containers are released after jobs finish
  → Attaching more weight to throughput than to latency
• As Hive's use cases expanded, users came to need low latency for interactive queries running in a REPL
• Improving latency on top of a high-throughput architecture

Tez on YARN summary

Page 31

• An in-memory processing framework (originally!)
  • Standalone mode is supported
  • Spark on YARN is also supported
• Spark has two kinds of running modes on YARN:
  • yarn-cluster mode: submitting jobs (spark-submit)
  • yarn-client mode: REPL style (spark-shell)
• Both of them run on YARN

Overview of Apache Spark
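The two modes can also be selected through configuration. A sketch of a spark-defaults.conf fragment; `spark.submit.deployMode` takes "cluster" or "client", and the equivalent per-job flags are `--master yarn --deploy-mode <mode>` on spark-submit:

```properties
# spark-defaults.conf (sketch)
spark.master             yarn
# "cluster": driver runs in a YARN container (batch jobs via spark-submit)
# "client":  driver runs in the client process (REPL via spark-shell)
spark.submit.deployMode  cluster
```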

Page 32

• The Spark driver is launched in a YARN container
• Assumes the spark-submit command

yarn-cluster mode

[Diagram: the client talks to the ResourceManager; one NodeManager hosts the Spark AppMaster containing the driver, other NodeManagers host workers 1 and 2]
1. The client submits the job via spark-submit
2. The ResourceManager launches the master
3. The Spark AppMaster allocates resources for the workers

Page 33

• The Spark driver is launched on the client side
• Assumes a REPL such as spark-shell

yarn-client mode

[Diagram: the driver lives inside the client process; the Spark AppMaster and workers 1 and 2 run in NodeManager containers]
1. Launch spark-shell and connect to the ResourceManager
2. The ResourceManager launches the master
3. The Spark AppMaster allocates resources for the workers
4. Enjoy interactive programming!!

Page 34

• Spark natively has Tez's container-reuse feature
• Spark can handle low-latency queries very well
• However, Spark cannot "release containers"
  → Task state held in memory would be lost
  → Containers cannot be released even when the allocated resources are unused
  → Decreasing resource utilization

Resource management of Spark on YARN

[Diagram: two NodeManagers, each with two containers; in stage 1 every container runs at 100% utilization, while in stage 2 most containers sit at 0% but are still held]

Page 35

• Allocating/releasing containers based on the workload
• Intermediate data is persisted via the NodeManager's Shuffle Service
  → No longer held only in executor memory
• The triggers for allocating/releasing containers are timer-based

Dynamic resource allocation(since Spark 1.2)

[Diagram: the Spark AppMaster doubles the number of executors if executors have had pending tasks for a period, and releases containers whose executors have been idle for a period; intermediate data stays on the NodeManagers]
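The timer-driven behavior above can be caricatured in a few lines of Python. This is a toy model only; real Spark drives the doubling and the release from the spark.dynamicAllocation.* timeouts, not from a loop like this:

```python
def ramp_up(current, max_executors):
    """While tasks stay backlogged past the timeout, the number of
    requested executors doubles, capped at the maximum."""
    return min(current * 2, max_executors)


def ramp_down(current, num_idle, min_executors):
    """Executors idle past the timeout are released, down to the minimum."""
    return max(current - num_idle, min_executors)


if __name__ == "__main__":
    n = 2  # initialExecutors
    for _ in range(3):  # three sustained backlog periods
        n = ramp_up(n, max_executors=10)
    print(n)  # 2 -> 4 -> 8 -> 10 (capped)
    n = ramp_down(n, num_idle=7, min_executors=1)
    print(n)  # 3
```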

Page 36

• Rule
  • Keeping containers long can improve latency but can worsen throughput
  • Keeping containers short can improve throughput (resource utilization) but can worsen latency
  → Please tune these points!
• Max/min/initial number of YARN containers
  → A high number of executors increases throughput, but can decrease resource utilization
  • spark.dynamicAllocation.maxExecutors
  • spark.dynamicAllocation.minExecutors
  • spark.dynamicAllocation.initialExecutors
• How long tasks must stay backlogged before the number of requested executors is doubled
  → A smaller value ramps up more aggressively, which can improve latency but worsen stability and resource utilization
  • spark.dynamicAllocation.schedulerBacklogTimeout
  • spark.dynamicAllocation.sustainedSchedulerBacklogTimeout
• How long an idle executor is kept before its container is released
  → A larger value can improve latency but worsen resource utilization
  • spark.dynamicAllocation.executorIdleTimeout
• The same timeout for idle executors holding cached data
  • spark.dynamicAllocation.cachedExecutorIdleTimeout

Tuning parameters
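Put together, these parameters might look like this in spark-defaults.conf. The values are placeholders to show the shape, not tuning advice; note that dynamic allocation also requires the external shuffle service mentioned on the previous slide:

```properties
# spark-defaults.conf (sketch; values illustrative)
spark.dynamicAllocation.enabled                          true
spark.shuffle.service.enabled                            true
spark.dynamicAllocation.minExecutors                     1
spark.dynamicAllocation.initialExecutors                 2
spark.dynamicAllocation.maxExecutors                     20
spark.dynamicAllocation.schedulerBacklogTimeout          1s
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout 1s
spark.dynamicAllocation.executorIdleTimeout              60s
spark.dynamicAllocation.cachedExecutorIdleTimeout        600s
```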

Page 37

• Spark natively assumes an interactive programming interface
  • Containers are not released by default
  → Attaching more weight to latency than to throughput
• A feature to release containers dynamically was added when YARN support arrived
• Increasing throughput on top of a low-latency architecture

Spark on YARN summary

Page 38

• Three kinds of features YARN provides:
  1. Resource management in the cluster
  2. The Timeline Server
  3. The Service Registry
• A dive into resource management in Tez/Spark
  • Apache Tez originated from batch processing
  • Apache Spark originated from interactive processing
• Both Tez and Spark can process a wider range of queries by switching between a high-throughput mode and a low-latency mode

Summary