Dynamic Slot Allocation Technique for MapReduce Clusters
School of Computer Engineering
Nanyang Technological University
25th Sept 2013
Shanjiang Tang, Bu-Sung Lee, Bingsheng He
1
Outline
• Background & Motivations
• DHFS
• Evaluation
• Conclusion
2
MapReduce Computation Model
[Figure: input data is processed by several map tasks (map-phase computation), each producing an intermediate result; reduce tasks (reduce-phase computation) turn these into output results, which together form the final result.]
3
Hadoop Execution Model
• Hadoop is an open-source implementation of the MapReduce model.
• The cluster's computation resources are divided into map slots and reduce slots, which the Hadoop administrator configures in advance.
• A MapReduce job generally consists of map tasks and reduce tasks.
• Map tasks have to be allocated with map slots, and reduce tasks have to be allocated with reduce slots.
4
Hadoop Execution Model
5
[Figure labels: Map slots, Reduce slots]
• Map tasks start before reduce tasks.
• Map tasks can run only on map slots; reduce tasks can run only on reduce slots.
• Implication: slot utilization can be poor for MapReduce workloads under the current static slot configuration and allocation policy.
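The implication can be illustrated with a small calculation. The sketch below is not from the slides; the function name and the slot and task counts are hypothetical, chosen only to show how static allocation leaves slots idle.

```python
def static_utilization(map_slots, reduce_slots, pending_map, pending_reduce):
    """Fraction of slots busy when each task type may use only its own slot type."""
    busy_map = min(map_slots, pending_map)
    busy_reduce = min(reduce_slots, pending_reduce)
    return (busy_map + busy_reduce) / (map_slots + reduce_slots)


# Hypothetical numbers: early in a job only map tasks are runnable,
# so every reduce slot sits idle even though map tasks are waiting.
print(static_utilization(map_slots=4, reduce_slots=4,
                         pending_map=10, pending_reduce=0))  # prints 0.5
```

With these numbers half of the cluster is idle while six map tasks wait.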
Our Goals
• To maximize the slot resource utilization of a Hadoop cluster without any prior knowledge of, or assumptions about, the MapReduce jobs.
• In other words, no map or reduce slot should be idle at any time while there are pending tasks, i.e., we try to keep slots as busy as possible.
• Our work focuses on the Hadoop Fair Scheduler, i.e., improving performance while guaranteeing fairness.
6
Outline
• Background & Motivations
• DHFS
• Evaluation
• Conclusion
7
Our Approach
• We propose a dynamic slot allocation technique that relaxes the existing slot allocation constraint:
1). Slots are generic and can be used by either map or reduce tasks.
2). Map tasks prefer to use map slots, and likewise reduce tasks prefer to use reduce slots.
• In other words, with $N_M$, $N_R$ the numbers of map and reduce tasks and $S_M$, $S_R$ the numbers of map and reduce slots:
Case 1: $N_M \le S_M$ and $N_R \le S_R$, no slot borrowing is needed.
Case 2: $N_M > S_M$ and $N_R < S_R$, borrow reduce slots for map tasks.
Case 3: $N_M < S_M$ and $N_R > S_R$, borrow map slots for reduce tasks.
Case 4: $N_M > S_M$ and $N_R > S_R$, no slot borrowing is needed.
8
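The four cases can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the function name `allocate` and its parameters are assumptions, with `n_map`, `n_reduce` standing for the task counts and `s_map`, `s_reduce` for the slot counts.

```python
def allocate(n_map, n_reduce, s_map, s_reduce):
    """Return (running map tasks, running reduce tasks) under dynamic borrowing."""
    # Each task type first fills slots of its own type (the "prefer" rule).
    run_map = min(n_map, s_map)
    run_reduce = min(n_reduce, s_reduce)
    idle_map = s_map - run_map
    idle_reduce = s_reduce - run_reduce
    # Unmet demand of one type may use idle slots of the other type.
    # At most one of the two borrow amounts can be non-zero.
    map_borrows = min(n_map - run_map, idle_reduce)
    reduce_borrows = min(n_reduce - run_reduce, idle_map)
    return run_map + map_borrows, run_reduce + reduce_borrows


print(allocate(3, 2, 4, 4))    # Case 1: (3, 2), nothing borrowed
print(allocate(10, 1, 4, 4))   # Case 2: (7, 1), maps borrow 3 reduce slots
print(allocate(1, 10, 4, 4))   # Case 3: (1, 7), reduces borrow 3 map slots
print(allocate(10, 10, 4, 4))  # Case 4: (4, 4), no idle slots to borrow
```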
Dynamic Hadoop Fair Scheduler (DHFS)
• We provide two types of DHFS, based on different levels of fairness:
  - Pool-Independent DHFS (PI-DHFS)
  - Pool-Dependent DHFS (PD-DHFS)
• Each MapReduce pool consists of two sub-pools:
  - Map-phase pool
  - Reduce-phase pool
9
PI-DHFS
• It follows the fairness concept of the default Hadoop Fair Scheduler, i.e., fair sharing is done across the phase-pools within each phase.
• The dynamic allocation process consists of two parts:
  - Intra-phase dynamic slot allocation
  - Inter-phase dynamic slot allocation
10
PI-DHFS
• It computes the intra-phase dynamic slot allocation first, and then the inter-phase dynamic slot allocation.
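A rough sketch of this two-step order follows. It is an assumption-laden illustration rather than the DHFS code: the helper names (`top_up`, `intra_phase`, `pi_dhfs`) are hypothetical, each pool's fair share is given as an input, and spare slots are handed out in list order, whereas a real fair scheduler would presumably divide them fairly.

```python
def top_up(alloc, demands, spare):
    """Hand spare slots to pools whose demand is not yet met (in list order)."""
    alloc = list(alloc)
    for i, demand in enumerate(demands):
        extra = min(spare, demand - alloc[i])
        alloc[i] += extra
        spare -= extra
    return alloc, spare


def intra_phase(shares, demands):
    """Within one phase, move unused fair share to pools that need more."""
    alloc = [min(s, d) for s, d in zip(shares, demands)]
    spare = sum(shares) - sum(alloc)
    return top_up(alloc, demands, spare)


def pi_dhfs(map_shares, map_demands, reduce_shares, reduce_demands):
    # Step 1: intra-phase allocation, each phase handled on its own.
    map_alloc, map_spare = intra_phase(map_shares, map_demands)
    reduce_alloc, reduce_spare = intra_phase(reduce_shares, reduce_demands)
    # Step 2: inter-phase allocation, slots still idle in one phase
    # are lent to unmet demand in the other phase.
    map_alloc, _ = top_up(map_alloc, map_demands, reduce_spare)
    reduce_alloc, _ = top_up(reduce_alloc, reduce_demands, map_spare)
    return map_alloc, reduce_alloc


# Two pools, 4 map slots and 4 reduce slots split evenly (hypothetical).
print(pi_dhfs([2, 2], [1, 6], [2, 2], [0, 1]))  # ([1, 6], [0, 1])
```

In the example, pool 2 first receives pool 1's unused map slot (intra-phase) and then three idle reduce slots (inter-phase), so all eight slots are busy.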
11
PD-DHFS
• Fair sharing is done across pools instead of across phases.
• The dynamic allocation process consists of two parts:
  - Intra-pool dynamic slot allocation
  - Inter-pool dynamic slot allocation
12
PD-DHFS
• It computes the intra-pool dynamic slot allocation first, and then the inter-pool dynamic slot allocation.
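The pool-dependent order can be sketched in the same spirit. Again this is only an illustration under stated assumptions: the function `pd_dhfs` and the dictionary keys are hypothetical, each pool's map and reduce shares are given as inputs, and leftover slots are handed to other pools in list order rather than by a fairness policy.

```python
def pd_dhfs(pools):
    """pools: dicts with map_share, reduce_share, map_demand, reduce_demand."""
    results = []
    spare = 0
    for p in pools:
        # Each phase first uses its own share inside the pool.
        run_map = min(p["map_demand"], p["map_share"])
        run_reduce = min(p["reduce_demand"], p["reduce_share"])
        idle = (p["map_share"] - run_map) + (p["reduce_share"] - run_reduce)
        # Step 1: intra-pool, the pool's idle slots serve its own other phase.
        extra_map = min(p["map_demand"] - run_map, idle)
        idle -= extra_map
        extra_reduce = min(p["reduce_demand"] - run_reduce, idle)
        idle -= extra_reduce
        results.append({"map": run_map + extra_map,
                        "reduce": run_reduce + extra_reduce})
        spare += idle  # whatever the pool cannot use is offered to others
    # Step 2: inter-pool, leftover slots go to pools with unmet demand.
    for p, r in zip(pools, results):
        for phase in ("map", "reduce"):
            extra = min(p[phase + "_demand"] - r[phase], spare)
            r[phase] += extra
            spare -= extra
    return results


pools = [
    {"map_share": 2, "reduce_share": 2, "map_demand": 5, "reduce_demand": 0},
    {"map_share": 2, "reduce_share": 2, "map_demand": 1, "reduce_demand": 1},
]
print(pd_dhfs(pools))
# [{'map': 5, 'reduce': 0}, {'map': 1, 'reduce': 1}]
```

Here the first pool runs four map tasks on its own share (two of them on its reduce slots) and one more on a slot left over by the second pool.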
13
Overview of Slot Allocation Flow
• The slot allocation flow for each pool under PD-DHFS.
14
[Figure: flowchart with four numbered steps, (1) to (4). Decision nodes: "Pending map tasks and idle map slots?", "Pending reduce tasks and idle reduce slots?", "Pending map tasks?", "Pending reduce tasks?". Yes/No branches lead to "Map task assignment" or "Reduce task assignment".]
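One way to read the flow is as an ordered chain of checks. The sketch below assumes a particular order (own slot type first, then borrowing) that the transcript does not confirm; the function name and return values are hypothetical.

```python
def next_assignment(pending_map, pending_reduce, idle_map_slots, idle_reduce_slots):
    """Pick the next task type and slot type for one pool, or None if stuck."""
    # Assumed order: try the matching slot type before borrowing.
    if pending_map > 0 and idle_map_slots > 0:
        return ("map", "map slot")
    if pending_reduce > 0 and idle_reduce_slots > 0:
        return ("reduce", "reduce slot")
    # Borrowing: a pending task takes an idle slot of the other type.
    if pending_map > 0 and idle_reduce_slots > 0:
        return ("map", "borrowed reduce slot")
    if pending_reduce > 0 and idle_map_slots > 0:
        return ("reduce", "borrowed map slot")
    return None


print(next_assignment(3, 0, 0, 2))  # ('map', 'borrowed reduce slot')
```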
Outline
• Background & Motivations
• DHFS
• Evaluation
• Conclusion
15
Experimental Setup
• Environment
  - A Hadoop cluster of 10 nodes, each with two Intel X5675 CPUs, 24 GB of memory and 56 GB of hard disk.
• Workloads
  - The tested workload is a mix of three representative applications (WordCount, Sort, Grep) run on the Wikipedia article history dataset at different sizes, e.g., 10 GB, 20 GB, 30 GB, 40 GB.
16
Execution Process for DHFS
17
Performance Improvement
18
Performance Improvement Under Different Percentages of Borrowed Map and Reduce Slots
19
Outline
• Background & Motivations
• DHFS
• Evaluation
• Conclusion
20
Conclusion
• The current static slot configuration and allocation policy can lead to poor slot utilization.
• Two DHFSs (PI-DHFS, PD-DHFS) are proposed to address the slot utilization problem for the Hadoop Fair Scheduler.
• Experimental results show that DHFS improves the performance of MapReduce workloads significantly while guaranteeing fairness.
• The source code of DHFS is available at:
http://sourceforge.net/projects/dhfs/
21
Acknowledgement
• This work is supported by the “User and Domain driven data analytics as a Service framework” project under the A*STAR Thematic Strategic Research Programme (SERC Grant No. 1021580034).
• Bingsheng He was partly supported by a startup grant of Nanyang Technological University, Singapore.
22