
Dynamic Slot Allocation Technique for MapReduce Clusters

School of Computer Engineering

Nanyang Technological University

25th Sept 2013

Shanjiang Tang, Bu-Sung Lee, Bingsheng He


Outline

• Background & Motivations

• DHFS

• Evaluation

• Conclusion


MapReduce Computation Model

[Figure: input data is split across several map tasks (map-phase computation), each producing an intermediate result; reduce tasks (reduce-phase computation) turn the intermediate results into output results, which together make up the final result.]


Hadoop Execution Model

• Hadoop is an open-source implementation of the MapReduce model.

• The cluster's computation resources are divided into map slots and reduce slots, which are configured in advance by the Hadoop administrator.

• A MapReduce job generally consists of map tasks and reduce tasks.

• Map tasks must be allocated map slots, and reduce tasks must be allocated reduce slots.


Hadoop Execution Model

[Figure: map slots and reduce slots.]

Map tasks start before reduce tasks.

Map tasks can only run on map slots, and reduce tasks can only run on reduce slots.

Implication: slot utilization can be poor for MapReduce workloads under the current static slot configuration and allocation policy.
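
To make this implication concrete, here is a minimal sketch of slot utilization under the static policy. The function name and the slot and task counts are illustrative assumptions, not figures from the talk.

```python
def static_utilization(map_slots, reduce_slots, pending_map, pending_reduce):
    """Fraction of busy slots when map tasks may only use map slots
    and reduce tasks may only use reduce slots (static policy)."""
    busy_map = min(map_slots, pending_map)
    busy_reduce = min(reduce_slots, pending_reduce)
    return (busy_map + busy_reduce) / (map_slots + reduce_slots)


# Hypothetical cluster with 40 map slots and 20 reduce slots.
# Early in a job only map tasks are runnable, so every reduce slot idles.
early = static_utilization(40, 20, pending_map=100, pending_reduce=0)
# Late in a job only reduce tasks remain, so every map slot idles.
late = static_utilization(40, 20, pending_map=0, pending_reduce=50)
print(f"map-heavy stage: {early:.2f}, reduce-heavy stage: {late:.2f}")
```

In both stages tasks are waiting while part of the cluster sits idle, which is the situation the dynamic allocation technique targets.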


Our Goals

• To maximize slot resource utilization in a Hadoop cluster without any prior knowledge of, or assumptions about, the MapReduce jobs.

• In other words, no map or reduce slot should sit idle while there are pending tasks; slots should be kept as busy as possible.

• Our work focuses on the Hadoop Fair Scheduler, i.e., on improving performance while guaranteeing fairness.


Our Approach

• We propose a dynamic slot allocation technique by breaking the existing slot allocation constraint:

1) Slots are generic and can be used by both map and reduce tasks.

2) Map tasks prefer to use map slots, and likewise reduce tasks prefer to use reduce slots.

• In other words, with $N_M$ and $N_R$ the numbers of map and reduce tasks, and $S_M$ and $S_R$ the numbers of map and reduce slots:

Case 1: $N_M \le S_M$, $N_R \le S_R$: no slot borrowing is needed.

Case 2: $N_M > S_M$, $N_R < S_R$: borrow reduce slots for map tasks.

Case 3: $N_M < S_M$, $N_R > S_R$: borrow map slots for reduce tasks.

Case 4: $N_M > S_M$, $N_R > S_R$: no slot borrowing is needed.
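
The four cases can be sketched as a single function. This is only an illustration: it takes N_M and N_R to be the numbers of map and reduce tasks and S_M and S_R the numbers of map and reduce slots, and the function and key names are hypothetical, not taken from the DHFS source code.

```python
def plan_borrowing(n_map, n_reduce, s_map, s_reduce):
    """Split tasks between their preferred slots and borrowed slots."""
    # Tasks first take slots of their own type (the stated preference).
    map_on_map = min(n_map, s_map)
    reduce_on_reduce = min(n_reduce, s_reduce)
    idle_map = s_map - map_on_map
    idle_reduce = s_reduce - reduce_on_reduce
    # Case 2: surplus map tasks borrow idle reduce slots.
    map_on_reduce = min(n_map - map_on_map, idle_reduce)
    # Case 3: surplus reduce tasks borrow idle map slots.
    reduce_on_map = min(n_reduce - reduce_on_reduce, idle_map)
    # Cases 1 and 4 fall out naturally: there is either no surplus
    # (case 1) or no idle slot (case 4), so both borrow counts are 0.
    return {
        "map_on_map": map_on_map,
        "map_on_reduce": map_on_reduce,
        "reduce_on_reduce": reduce_on_reduce,
        "reduce_on_map": reduce_on_map,
    }


# Hypothetical cluster with 20 map slots and 10 reduce slots.
for n_map, n_reduce in [(10, 5), (30, 4), (5, 25), (30, 25)]:
    print((n_map, n_reduce), plan_borrowing(n_map, n_reduce, 20, 10))
```

At most one of the two borrow counts is non-zero for any input, because a phase can only lend slots it is not using itself.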


Dynamic Hadoop Fair Scheduler (DHFS)

• We provide two types of DHFS, based on different levels of fairness:

1) Pool-Independent DHFS (PI-DHFS)

2) Pool-Dependent DHFS (PD-DHFS)

• Each MapReduce pool consists of two sub-pools:

1) Map-phase pool

2) Reduce-phase pool


PI-DHFS

• PI-DHFS follows the fairness definition of the default Hadoop Fair Scheduler, i.e., fair sharing is done across phase-pools within a phase.

• The dynamic allocation process consists of two parts:

1) Intra-phase dynamic slot allocation

2) Inter-phase dynamic slot allocation


PI-DHFS

• PI-DHFS computes the intra-phase dynamic slot allocation first, and then the inter-phase dynamic slot allocation.
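
The two-step order can be sketched as follows. The slides do not give the algorithm, so this is a rough illustration under stated assumptions: pools have equal weight, each step uses a simple max-min fair split, and all function names and numbers are hypothetical.

```python
def fair_allocate(slots, demands):
    """Max-min fair split of `slots` among pools (assumed equal weights)."""
    alloc = {pool: 0 for pool in demands}
    remaining = slots
    unsatisfied = {p for p, d in demands.items() if d > 0}
    while remaining > 0 and unsatisfied:
        share = max(remaining // len(unsatisfied), 1)
        for pool in sorted(unsatisfied):
            give = min(share, demands[pool] - alloc[pool], remaining)
            alloc[pool] += give
            remaining -= give
        unsatisfied = {p for p in unsatisfied if alloc[p] < demands[p]}
    return alloc


def pi_dhfs(map_slots, reduce_slots, map_demands, reduce_demands):
    # Step 1: intra-phase allocation. Pools share slots within each phase,
    # and slots a pool does not need go to other pools of the same phase.
    map_alloc = fair_allocate(map_slots, map_demands)
    reduce_alloc = fair_allocate(reduce_slots, reduce_demands)
    idle_map = map_slots - sum(map_alloc.values())
    idle_reduce = reduce_slots - sum(reduce_alloc.values())
    # Step 2: inter-phase allocation. Slots still idle in one phase are
    # lent to the unmet demand of the other phase.
    unmet_map = {p: map_demands[p] - map_alloc[p] for p in map_demands}
    unmet_reduce = {p: reduce_demands[p] - reduce_alloc[p] for p in reduce_demands}
    map_borrowed = fair_allocate(idle_reduce, unmet_map)
    reduce_borrowed = fair_allocate(idle_map, unmet_reduce)
    return map_alloc, reduce_alloc, map_borrowed, reduce_borrowed


# Hypothetical example: 12 map slots, 6 reduce slots, two pools A and B.
map_alloc, reduce_alloc, map_borrowed, reduce_borrowed = pi_dhfs(
    12, 6, {"A": 14, "B": 2}, {"A": 1, "B": 0}
)
print(map_alloc, reduce_alloc, map_borrowed, reduce_borrowed)
```

In this example pool A first receives the map slots that pool B leaves unused (intra-phase), and only then borrows idle reduce slots for its remaining map tasks (inter-phase).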


PD-DHFS

• Fair sharing is done across pools rather than per phase.

• The dynamic allocation process consists of two parts:

1) Intra-pool dynamic slot allocation

2) Inter-pool dynamic slot allocation


PD-DHFS

• PD-DHFS computes the intra-pool dynamic slot allocation first, and then the inter-pool dynamic slot allocation.
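
A rough sketch of this order, again under assumptions the slides do not spell out: each pool is given a map share and a reduce share, the two phase-pools of a pool first exchange unused share (intra-pool), and leftover slots are then handed out one at a time in round-robin order (inter-pool). The function name, dictionary keys, and numbers are hypothetical.

```python
def pd_dhfs(pools):
    """pools: name -> {"map_share", "reduce_share", "map_demand", "reduce_demand"}.
    Returns name -> {"intra": slots used from the pool's own share,
                     "inter": slots borrowed from other pools}."""
    result, unmet, spare = {}, {}, 0
    for name, p in pools.items():
        # Step 1: intra-pool allocation. Each phase-pool uses its own share,
        # then the pool's idle slots cover its own unmet demand.
        run_map = min(p["map_demand"], p["map_share"])
        run_reduce = min(p["reduce_demand"], p["reduce_share"])
        idle = p["map_share"] + p["reduce_share"] - run_map - run_reduce
        want = (p["map_demand"] - run_map) + (p["reduce_demand"] - run_reduce)
        moved = min(want, idle)
        result[name] = {"intra": run_map + run_reduce + moved, "inter": 0}
        unmet[name] = want - moved
        spare += idle - moved
    # Step 2: inter-pool allocation. Slots no pool needs for itself go to
    # pools with unmet demand, one slot per pool per round.
    while spare > 0 and any(unmet.values()):
        for name in pools:
            if spare > 0 and unmet[name] > 0:
                result[name]["inter"] += 1
                unmet[name] -= 1
                spare -= 1
    return result


# Hypothetical example: two pools, each with 6 map and 3 reduce slots.
shares = {
    "A": {"map_share": 6, "reduce_share": 3, "map_demand": 10, "reduce_demand": 1},
    "B": {"map_share": 6, "reduce_share": 3, "map_demand": 2, "reduce_demand": 0},
}
allocation = pd_dhfs(shares)
print(allocation)
```

Unlike the phase-level view of PI-DHFS, a pool here exhausts its own combined map and reduce share before it takes anything from another pool.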


Overview of Slot Allocation Flow

• The slot allocation flow for each pool under PD-DHFS.

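[Figure: flowchart of the slot allocation flow, with steps labelled (1) to (4). Decision nodes: "Pending map tasks and idle map slots?", "Pending reduce tasks and idle reduce slots?", "Pending map tasks?", "Pending reduce tasks?", each with Yes/No branches. Actions: map task assignment, reduce task assignment.]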
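
The checks in this flow can be sketched as a small decision function. The order of the checks below (own-type slots first, borrowed slots afterwards) is an assumed reading of the flowchart, and the function name and return values are hypothetical.

```python
def assign_slot(pending_map, pending_reduce, idle_map_slots, idle_reduce_slots):
    """Pick the next (task type, slot type) for a pool, or None."""
    # Pending map tasks and idle map slots? -> map task assignment.
    if pending_map > 0 and idle_map_slots > 0:
        return ("map", "map")
    # Pending reduce tasks and idle reduce slots? -> reduce task assignment.
    if pending_reduce > 0 and idle_reduce_slots > 0:
        return ("reduce", "reduce")
    # Pending map tasks? Only reduce slots can be idle here, so borrow one.
    if pending_map > 0 and idle_reduce_slots > 0:
        return ("map", "reduce")
    # Pending reduce tasks? Only map slots can be idle here, so borrow one.
    if pending_reduce > 0 and idle_map_slots > 0:
        return ("reduce", "map")
    return None


print(assign_slot(3, 0, 0, 2))  # map task on a borrowed reduce slot
print(assign_slot(0, 4, 5, 0))  # reduce task on a borrowed map slot
```

Checking own-type slots first reflects the stated preference of map tasks for map slots and of reduce tasks for reduce slots.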


Experimental Setup

• Environment

A Hadoop cluster of 10 nodes, each with two Intel X5675 CPUs, 24 GB of memory and 56 GB of hard disk.

• Workloads

The tested workload is a mix of three representative applications (WordCount, Sort and Grep) run on the Wikipedia article history dataset at different sizes, e.g., 10 GB, 20 GB, 30 GB and 40 GB.


Execution Process for DHFS


Performance Improvement


Performance Improvement Under Different Percentages of Borrowed Map and Reduce Slots


Conclusion

• The current static slot configuration and allocation policy can lead to poor slot utilization.

• Two DHFS variants (PI-DHFS and PD-DHFS) are proposed to address the slot utilization problem for the Hadoop Fair Scheduler.

• Experimental results show that DHFS improves the performance of MapReduce workloads significantly while guaranteeing fairness.

• The source code of DHFS is available at:

http://sourceforge.net/projects/dhfs/


Acknowledgement

• This work is supported by the "User and Domain driven data analytics as a Service framework" project under the A*STAR Thematic Strategic Research Programme (SERC Grant No. 1021580034).

• Bingsheng He was partly supported by a startup grant of Nanyang Technological University, Singapore.
