Dynamic Slot Allocation Technique for MapReduce Clusters

School of Computer Engineering

Nanyang Technological University

25th Sept 2013

Shanjiang Tang, Bu-Sung Lee, Bingsheng He

Outline

• Background & Motivations
• DHFS
• Evaluation
• Conclusion

MapReduce Computation Model

[Figure: Input Data → Map-Phase Computation (Map) → Intermediate Results → Reduce-Phase Computation (Reduce) → Output Results → Final Result]
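To make the two phases concrete, here is a minimal single-process sketch of the model in Python, using word counting as the job. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative and not part of Hadoop; a real cluster runs the map and reduce calls as distributed tasks.

```python
from collections import defaultdict

def map_phase(split):
    """Map: emit one (word, 1) intermediate pair per word in an input split."""
    return [(word.lower(), 1) for word in split.split()]

def shuffle(pairs):
    """Group the intermediate results by key before the reduce phase."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single output result."""
    return key, sum(values)

def word_count(splits):
    """Run the map phase over every split, then reduce the grouped results."""
    intermediate = []
    for split in splits:
        intermediate.extend(map_phase(split))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())

print(word_count(["a b a", "b c"]))  # {'a': 2, 'b': 2, 'c': 1}
```

Every reduce call needs all intermediate values for its key, which is why the reduce phase depends on the map phase finishing.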

Hadoop Execution Model

• Hadoop is an open-source implementation of the MapReduce model.

• The cluster's computation resources are divided into map slots and reduce slots, which are configured by the Hadoop administrator in advance.

• A MapReduce job generally consists of map tasks and reduce tasks.

• Map tasks must be allocated map slots, and reduce tasks must be allocated reduce slots.

Hadoop Execution Model

Map slots Reduce slots

Map tasks start before reduce tasks

Map tasks can only run on map slots; reduce tasks can only run on reduce slots

Implication: Slot utilization can be poor for MapReduce workloads under the current static slot configuration and allocation policy.
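A small arithmetic sketch shows why. The helper `static_utilization` and the slot and task counts below are hypothetical, chosen only to illustrate the static policy in which each task type may use only its own slot type.

```python
def static_utilization(map_slots, reduce_slots, map_tasks, reduce_tasks):
    """Fraction of busy slots when tasks may only use slots of their own type."""
    busy_map = min(map_slots, map_tasks)
    busy_reduce = min(reduce_slots, reduce_tasks)
    return (busy_map + busy_reduce) / (map_slots + reduce_slots)

# Early in a job: many map tasks are waiting, no reduce task is ready yet.
print(static_utilization(6, 4, 20, 0))   # 0.6, the 4 reduce slots sit idle

# Late in a job: all map tasks are done, only reduce tasks remain.
print(static_utilization(6, 4, 0, 10))   # 0.4, the 6 map slots sit idle
```

In both situations tasks are pending while slots of the other type are unused.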

Our Goals

• To maximize slot utilization in a Hadoop cluster without any prior knowledge of, or assumptions about, the MapReduce jobs.

• In other words, at no time should a map or reduce slot sit idle while tasks are pending, i.e., slots should be kept as busy as possible.

• Our work focuses on the Hadoop Fair Scheduler, i.e., improving performance while guaranteeing fairness.

Our Approach

• We propose a dynamic slot allocation technique that breaks the existing slot allocation constraint:

1). Slots are generic and can be used by both map and reduce tasks.

2). Map tasks prefer to use map slots, and likewise reduce tasks prefer to use reduce slots.

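• In other words, with $N_M$ and $N_R$ the numbers of map and reduce tasks, and $S_M$ and $S_R$ the numbers of map and reduce slots:

Case 1: $N_M \le S_M$ and $N_R \le S_R$, no slot borrowing is needed.

Case 2: $N_M > S_M$ and $N_R < S_R$, borrow idle reduce slots for map tasks.

Case 3: $N_M < S_M$ and $N_R > S_R$, borrow idle map slots for reduce tasks.

Case 4: $N_M > S_M$ and $N_R > S_R$, no slots are borrowed, since neither type has idle slots to lend.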
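The four cases can be sketched as one small function that compares pending task counts with slot counts. This is an illustrative reading of the slide, not the DHFS source code; the function name `allocate`, its parameters, and the keys of the returned dictionary are all assumptions.

```python
def allocate(n_map, n_reduce, s_map, s_reduce):
    """Assign tasks to their own slot type first, then lend idle slots across types."""
    # Preferred placement: map tasks on map slots, reduce tasks on reduce slots.
    run_map = min(n_map, s_map)
    run_reduce = min(n_reduce, s_reduce)
    idle_map = s_map - run_map
    idle_reduce = s_reduce - run_reduce
    # Borrowing: leftover tasks of one type take idle slots of the other type.
    # At most one direction is non-zero, because a type with leftover tasks
    # has already filled all of its own slots.
    map_borrowed = min(n_map - run_map, idle_reduce)
    reduce_borrowed = min(n_reduce - run_reduce, idle_map)
    return {
        "map_running": run_map + map_borrowed,
        "reduce_running": run_reduce + reduce_borrowed,
        "reduce_slots_lent_to_map": map_borrowed,
        "map_slots_lent_to_reduce": reduce_borrowed,
    }

print(allocate(3, 2, 6, 4))    # Case 1: everything fits, nothing borrowed
print(allocate(10, 1, 6, 4))   # Case 2: 3 reduce slots lent to map tasks
print(allocate(2, 9, 6, 4))    # Case 3: 4 map slots lent to reduce tasks
print(allocate(10, 10, 6, 4))  # Case 4: both types full, nothing to lend
```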

Dynamic Hadoop Fair Scheduler (DHFS)

• We provide two types of DHFS, based on different levels of fairness:
  - Pool-independent DHFS (PI-DHFS)
  - Pool-dependent DHFS (PD-DHFS)

• Each MapReduce pool consists of two sub-pools:
  - Map-phase pool
  - Reduce-phase pool

PI-DHFS

• It follows the fairness concept of the default Hadoop Fair Scheduler, i.e., fair sharing is done across the phase-pools within each phase.

• The dynamic allocation process consists of two parts:
  - Intra-phase dynamic slot allocation
  - Inter-phase dynamic slot allocation

PI-DHFS

• It performs intra-phase dynamic slot allocation first, and then inter-phase dynamic slot allocation.
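The two-step order can be sketched as follows. This is a simplified illustration under stated assumptions, not the DHFS implementation: pools have equal weight, `fair_share` is a hypothetical integer max-min split, and `pi_dhfs` with its parameter names is invented for the example.

```python
def fair_share(demands, slots):
    """Split `slots` across pools max-min fairly, never exceeding a pool's demand."""
    alloc = {pool: 0 for pool in demands}
    remaining = slots
    active = [pool for pool in demands if demands[pool] > 0]
    while remaining > 0 and active:
        share = max(remaining // len(active), 1)
        for pool in list(active):
            if remaining == 0:
                break
            give = min(share, demands[pool] - alloc[pool], remaining)
            alloc[pool] += give
            remaining -= give
            if alloc[pool] == demands[pool]:
                active.remove(pool)  # satisfied pools release their unused share
    return alloc

def pi_dhfs(map_demands, reduce_demands, map_slots, reduce_slots):
    """Intra-phase sharing first, then lend leftover slots to the other phase."""
    # Step 1, intra-phase: each phase shares its own slot type across its pools.
    map_alloc = fair_share(map_demands, map_slots)
    reduce_alloc = fair_share(reduce_demands, reduce_slots)
    idle_map = map_slots - sum(map_alloc.values())
    idle_reduce = reduce_slots - sum(reduce_alloc.values())
    # Step 2, inter-phase: unmet demand in one phase borrows the other's idle slots.
    unmet_map = {p: map_demands[p] - map_alloc[p] for p in map_demands}
    unmet_reduce = {p: reduce_demands[p] - reduce_alloc[p] for p in reduce_demands}
    for pool, extra in fair_share(unmet_map, idle_reduce).items():
        map_alloc[pool] += extra
    for pool, extra in fair_share(unmet_reduce, idle_map).items():
        reduce_alloc[pool] += extra
    return map_alloc, reduce_alloc

# Pool B has far more map tasks than its share; pool A needs very little.
print(pi_dhfs({"A": 2, "B": 10}, {"A": 1, "B": 0}, map_slots=6, reduce_slots=4))
# ({'A': 2, 'B': 7}, {'A': 1, 'B': 0})
```

In the example, pool B first picks up the map slots pool A does not need (intra-phase), then borrows the three idle reduce slots (inter-phase).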

PD-DHFS

• Fair sharing is done across pools rather than across phases.

• The dynamic allocation process consists of two parts:
  - Intra-pool dynamic slot allocation
  - Inter-pool dynamic slot allocation

PD-DHFS

• It performs intra-pool dynamic slot allocation first, and then inter-pool dynamic slot allocation.
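A corresponding sketch for the pool-dependent variant is given below. It assumes each pool is handed a fixed map-slot and reduce-slot share up front, and that spare slots are lent round-robin with map tasks served before reduce tasks; these choices, the function name `pd_dhfs`, and its parameters are assumptions made for illustration.

```python
def pd_dhfs(pools, shares):
    """pools: {name: (map_tasks, reduce_tasks)}; shares: {name: (map_share, reduce_share)}."""
    alloc = {}
    spare = 0
    # Step 1, intra-pool: a pool's two phases share that pool's own slots.
    for name, (n_map, n_reduce) in pools.items():
        map_share, reduce_share = shares[name]
        run_map = min(n_map, map_share)
        run_reduce = min(n_reduce, reduce_share)
        left = map_share + reduce_share - run_map - run_reduce
        extra_map = min(n_map - run_map, left)
        left -= extra_map
        extra_reduce = min(n_reduce - run_reduce, left)
        left -= extra_reduce
        alloc[name] = [run_map + extra_map, run_reduce + extra_reduce]
        spare += left  # slots this pool cannot use at all
    # Step 2, inter-pool: lend spare slots, one per pool per round, to pools
    # that still have pending tasks.
    progress = True
    while spare > 0 and progress:
        progress = False
        for name, (n_map, n_reduce) in pools.items():
            if spare == 0:
                break
            if alloc[name][0] < n_map:
                alloc[name][0] += 1
            elif alloc[name][1] < n_reduce:
                alloc[name][1] += 1
            else:
                continue
            spare -= 1
            progress = True
    return {name: tuple(counts) for name, counts in alloc.items()}

print(pd_dhfs({"A": (8, 0), "B": (1, 1)}, {"A": (3, 2), "B": (3, 2)}))
# {'A': (8, 0), 'B': (1, 1)}
```

Here pool A first uses its own two reduce slots for map tasks (intra-pool) and then borrows the three slots pool B leaves unused (inter-pool).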

Overview of Slot Allocation Flow

• The slot allocation flow for each pool under PD-DHFS.

[Flowchart: decision nodes "Pending map tasks and idle map slots?", "Pending reduce tasks and idle reduce slots?", "Pending map tasks?", and "Pending reduce tasks?"; action nodes "Map task assignment" and "Reduce task assignment"]
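One plausible reading of this flow, for a single pool, is the loop below. The order in which the four questions are asked is an assumption, since the arrows of the original chart are not recoverable here, and `allocation_flow` is a hypothetical name.

```python
def allocation_flow(pending_map, pending_reduce, idle_map, idle_reduce):
    """Return the (task_type, slot_type) assignments made for one pool."""
    assignments = []
    while True:
        if pending_map and idle_map:            # pending map tasks and idle map slots?
            pending_map -= 1
            idle_map -= 1
            assignments.append(("map", "map"))
        elif pending_reduce and idle_reduce:    # pending reduce tasks and idle reduce slots?
            pending_reduce -= 1
            idle_reduce -= 1
            assignments.append(("reduce", "reduce"))
        elif pending_map and idle_reduce:       # pending map tasks? borrow a reduce slot
            pending_map -= 1
            idle_reduce -= 1
            assignments.append(("map", "reduce"))
        elif pending_reduce and idle_map:       # pending reduce tasks? borrow a map slot
            pending_reduce -= 1
            idle_map -= 1
            assignments.append(("reduce", "map"))
        else:
            break                               # no pending task can be placed
    return assignments

print(allocation_flow(3, 0, 1, 2))
# [('map', 'map'), ('map', 'reduce'), ('map', 'reduce')]
```

Tasks take a slot of their own type whenever one is idle and fall back to the other type only afterwards, matching the preference stated earlier for map and reduce tasks.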

Experimental Setup

• Environment

A Hadoop cluster consisting of 10 nodes, each with two Intel X5675 CPUs, 24 GB of memory, and 56 GB hard disks.

• Workloads

The tested workload is a mix of three representative applications (WordCount, Sort, and Grep) run on the Wikipedia article history dataset at different sizes, e.g., 10 GB, 20 GB, 30 GB, and 40 GB.

Execution Process for DHFS

Performance Improvement

Performance Improvement Under Different Percentages of Borrowed Map and Reduce Slots

Conclusion

• The current static slot configuration and allocation policy can lead to poor slot utilization.

• Two DHFS variants (PI-DHFS and PD-DHFS) are proposed to address the slot utilization problem for the Hadoop Fair Scheduler.

• Experimental results show that DHFS significantly improves the performance of MapReduce workloads while guaranteeing fairness.

• The source code of DHFS is available at:

http://sourceforge.net/projects/dhfs/

Acknowledgement

• This work is supported by the "User and Domain driven data analytics as a Service framework" project under the A*STAR Thematic Strategic Research Programme (SERC Grant No. 1021580034).

• Bingsheng He was partly supported by a startup grant of Nanyang Technological University, Singapore.
