SAMR: A Self-adaptive MapReduce Scheduling Algorithm in Heterogeneous Environment
Authors
Quan Chen, Daqiang Zhang, Minyi Guo, Qianni Deng (Department of Computer Science, Shanghai Jiao Tong University, Shanghai, China)
Song Guo (School of Computer Science and Engineering, The University of Aizu, Japan)
Presented by Xiaoyu Sun
Table of Contents
Overview
Scheduling in Hadoop
Heterogeneity in Hadoop
The LATE Scheduler (Longest Approximate Time to End)
The SAMR (A Self-adaptive MapReduce Scheduling Algorithm) Scheduler
Experiment
Conclusion
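Overview
[Figure: MapReduce execution overview. The user program forks a master and several workers. The master assigns map tasks and reduce tasks to workers. Map workers read the input splits (Split 0, Split 1, Split 2) and write intermediate results locally; reduce workers remote-read and sort those results, then write Output File 0 and Output File 1.]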
The Map Step
[Figure: each input key-value pair (k, v) is passed to map, which emits intermediate key-value pairs (k, v).]
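To make the map step concrete, here is a minimal Python sketch using word count, a hypothetical example that is not taken from the slides: each input pair (document id, text) is mapped to intermediate pairs (word, 1). The function names are illustrative only and are not Hadoop's Java API.

```python
from typing import Iterable, Iterator


def word_count_map(doc_id: str, text: str) -> Iterator[tuple[str, int]]:
    """Map one input pair (doc_id, text) to intermediate pairs (word, 1)."""
    for word in text.split():
        yield word.lower(), 1


def run_map(inputs: Iterable[tuple[str, str]]) -> list[tuple[str, int]]:
    """Apply the map function to every input pair, keeping input order."""
    intermediate: list[tuple[str, int]] = []
    for doc_id, text in inputs:
        intermediate.extend(word_count_map(doc_id, text))
    return intermediate


if __name__ == "__main__":
    print(run_map([("doc1", "The cat"), ("doc2", "the dog")]))
    # [('the', 1), ('cat', 1), ('the', 1), ('dog', 1)]
```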
The Reduce Step
[Figure: intermediate key-value pairs are grouped by key into key-value groups (k, v, v, ...); each group is passed to reduce, which emits output key-value pairs (k, v).]
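Continuing a hypothetical word-count example, the sketch below groups intermediate pairs by key and reduces each group by summing its values. In Hadoop the grouping is performed by the framework's shuffle and sort; here it is simulated in memory with a dictionary, and all names are illustrative.

```python
from collections import defaultdict
from typing import Iterable


def group_by_key(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate pairs (k, v) into key-value groups k -> [v, v, ...]."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def word_count_reduce(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce one key-value group to a single output pair."""
    return key, sum(values)


def run_reduce(pairs: Iterable[tuple[str, int]]) -> list[tuple[str, int]]:
    groups = group_by_key(pairs)
    # Keys are processed in sorted order, mirroring the sort that precedes reduce.
    return [word_count_reduce(k, groups[k]) for k in sorted(groups)]


if __name__ == "__main__":
    intermediate = [("the", 1), ("cat", 1), ("the", 1), ("dog", 1)]
    print(run_reduce(intermediate))  # [('cat', 1), ('dog', 1), ('the', 2)]
```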
Overview
Google has noted that speculative execution improves response time by 44%
The paper shows an efficient way to do speculative execution in order to maximize performance
It also shows that Hadoop’s simple speculative algorithm, which compares each task’s progress to the average progress, breaks down in heterogeneous systems
Overview
The proposed scheduling algorithm reduces Hadoop’s response time
The paper addresses two important problems in speculative execution:
• Choosing the best node to run the speculative task
• Distinguishing between nodes slightly slower than the mean and stragglers
Scheduling in Hadoop
Assumptions made by the Hadoop scheduler:
Nodes can perform work at roughly the same rate
Tasks progress at a constant rate throughout time
Scheduling in Hadoop
Reduce Task
• Copy data: R1 = 1/3
• Order: R2 = 1/3
Map Task
• Execute map function: M1 = 1
• Reorder intermediate results: M2 = 0
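The weights above can be read as the share of a task's progress score (PS) assigned to each phase. A minimal sketch, assuming a task's PS is the sum of the weights of its finished phases plus the current phase's weight times the fraction of that phase completed. The last reduce phase (merge/reduce, called R3 here) is not on the slide and is assumed to take the leftover 1/3; `progress_score` and the constant names are hypothetical.

```python
from fractions import Fraction
from typing import Sequence

# Phase weights from the slide. R3 is an assumption: the leftover 1/3 of a reduce task.
REDUCE_WEIGHTS = (Fraction(1, 3), Fraction(1, 3), Fraction(1, 3))  # R1 copy, R2 order, R3 merge
MAP_WEIGHTS = (Fraction(1), Fraction(0))  # M1 execute map function, M2 reorder results


def progress_score(weights: Sequence[Fraction], phase: int, sub_progress: Fraction) -> Fraction:
    """Weights of all finished phases plus the current phase's weight times its fraction done."""
    if not 0 <= phase < len(weights):
        raise ValueError("phase index out of range")
    if not 0 <= sub_progress <= 1:
        raise ValueError("sub_progress must lie in [0, 1]")
    finished = sum(weights[:phase], Fraction(0))
    return finished + weights[phase] * sub_progress


if __name__ == "__main__":
    # Copy and order finished, merge three quarters done: 1/3 + 1/3 + 1/4
    print(progress_score(REDUCE_WEIGHTS, 2, Fraction(3, 4)))  # 11/12
    # Map function half done: 1 * 1/2
    print(progress_score(MAP_WEIGHTS, 0, Fraction(1, 2)))  # 1/2
```

With these weights, a reduce task that has finished copying and sorting and is three quarters through merging scores 1/3 + 1/3 + 1/4 = 11/12, the value the slides show for Task1.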
Scheduling in Hadoop
Scheduling in Hadoop
[Figure: progress scores (PS) of three running reduce tasks, each split into Copy, Sort, and Merge phases.
Task1: Copy 1/3 + Sort 1/3 + Merge 1/4 = 11/12
Task2: Copy 1/3 + Sort 1/3 + Merge 1/4 = 11/12
Task3 (marked X): Copy 1/3 + Sort 1/5 = 8/15]
If the average PS is 10/15
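Hadoop's default scheduler, as described in the LATE paper, marks a task as a straggler when its PS is below the average PS of its category minus 0.2 (and it has run for at least one minute, a condition this sketch omits). The example applies that rule to the three scores listed above, computing the average from them rather than assuming a value; `hadoop_stragglers` is a hypothetical helper name.

```python
from fractions import Fraction


def hadoop_stragglers(scores: dict[str, Fraction], gap: Fraction = Fraction(1, 5)) -> list[str]:
    """Return tasks whose progress score is below the average score minus `gap`.

    Sketch of Hadoop's default rule (gap = 0.2); the one-minute minimum
    runtime condition is not modeled.
    """
    if not scores:
        return []
    average = sum(scores.values(), Fraction(0)) / len(scores)
    return [name for name, ps in scores.items() if ps < average - gap]


if __name__ == "__main__":
    scores = {"Task1": Fraction(11, 12), "Task2": Fraction(11, 12), "Task3": Fraction(8, 15)}
    print(hadoop_stragglers(scores))  # ['Task3']
```

With these scores the average is 71/90, so the cutoff is 53/90 and only Task3 (8/15 = 48/90) is flagged.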
Scheduling in Hadoop
[Figure: the same three reduce tasks annotated with times.
Task1: Copy 1/3, Sort 1/3, Merge 1/4; PS = 11/12; 20s
Task2: Copy 1/3, Sort 1/3, Merge 1/4; PS = 11/12; 60s
Task3 (marked X): Copy 1/3, Sort 1/5, Merge waiting; PS = 8/15; 40s]
Scheduling in Hadoop
[Figure: two reduce tasks, both marked X.
Task1: Copy 1/3, Sort 1/4, Merge waiting; PS = 7/12; 20s
Task2: Copy 1/3, Sort 1/12, Merge waiting; PS = 5/12; 40s]
Scheduling in Hadoop
[Figure: two reduce tasks, one marked X, with panels labeled “Not data locality” and “Data locality”.
Task1: Copy 1/3, Sort waiting, Merge waiting; PS = 1/3; 180s
Task2: Copy 1/3, Sort 1/12, Merge waiting; PS = 5/12; 20s]
The LATE Scheduler
The LATE Scheduler
Reduce Task
• Copy data: R1 = 1/3
• Order: R2 = 1/3
Map Task
• Execute map function: M1 = 1
• Reorder intermediate results: M2 = 0
The LATE Scheduler
[Figure: two reduce tasks; Task1 is marked X.
Task1: Copy 1/3, Sort 1/3, Merge 1/4; PS = 11/12; 40s
Task2: Copy 1/3, Sort 1/4, Merge waiting; PS = 7/12; 30s]
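LATE ranks running tasks by estimated time left rather than by PS. A minimal sketch, assuming the LATE paper's estimates ProgressRate = PS / T (T = time the task has been running) and TimeToEnd = (1 - PS) / ProgressRate; the class, its field names, and the elapsed times below are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningTask:
    name: str
    progress_score: float  # PS in [0, 1]
    elapsed_s: float       # seconds since the task started (hypothetical field)

    @property
    def progress_rate(self) -> float:
        """ProgressRate = PS / T."""
        if self.elapsed_s <= 0:
            raise ValueError("elapsed_s must be positive")
        return self.progress_score / self.elapsed_s

    @property
    def time_to_end(self) -> float:
        """Estimated time left = (1 - PS) / ProgressRate; infinite if no progress yet."""
        rate = self.progress_rate
        return math.inf if rate == 0 else (1.0 - self.progress_score) / rate


def pick_for_speculation(tasks: list[RunningTask]) -> RunningTask:
    """LATE backs up the task with the longest estimated time to end."""
    return max(tasks, key=lambda t: t.time_to_end)


if __name__ == "__main__":
    tasks = [RunningTask("A", 0.9, 360.0), RunningTask("B", 0.5, 30.0)]
    for t in tasks:
        print(t.name, round(t.time_to_end, 1))
    print(pick_for_speculation(tasks).name)  # A
```

With these made-up numbers, task A (PS 0.9 after 360 s) has about 40 s left and task B (PS 0.5 after 30 s) about 30 s, so LATE backs up A even though its PS is higher.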
The LATE Scheduler
[Figure: two reduce tasks, one marked X, with panels labeled “Not data locality” and “Data locality”.
Task1: Copy 1/3, Sort waiting, Merge waiting; PS = 1/3; 180s
Task2: Copy 1/3, Sort 1/12, Merge waiting; PS = 5/12; 20s]
The LATE Scheduler
To give a speculative task the best chance of beating the original task, LATE launches speculative tasks only on fast nodes
It does this using a SlowNodeThreshold on the total work each node has performed: nodes below the threshold do not receive speculative tasks
Because speculative tasks cost resources, LATE uses two additional heuristics:
• A limit on the number of speculative tasks running at once (SpeculativeCap)
• A SlowTaskThreshold that determines whether a task is slow enough to be speculated (compared using progress rate)
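The three heuristics combine into one decision each time a node asks for work. A sketch under stated assumptions: the thresholds and the cap are passed in as plain numbers (how LATE derives them from the cluster is not modeled), and `Candidate` and `late_choose_backup` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str
    progress_rate: float       # PS gained per second
    time_to_end: float         # LATE's estimate, in seconds
    speculated: bool = False   # a backup copy is already running


def late_choose_backup(
    node_total_work: float,
    slow_node_threshold: float,
    tasks: list[Candidate],
    slow_task_threshold: float,
    running_backups: int,
    speculative_cap: int,
) -> Optional[Candidate]:
    """Decide which task, if any, to back up on the node asking for work."""
    if running_backups >= speculative_cap:
        return None  # SpeculativeCap: enough speculative tasks are already running
    if node_total_work < slow_node_threshold:
        return None  # SlowNodeThreshold: the requesting node is too slow
    slow = [t for t in tasks if not t.speculated and t.progress_rate < slow_task_threshold]
    if not slow:
        return None  # SlowTaskThreshold: no task is slow enough
    return max(slow, key=lambda t: t.time_to_end)  # longest estimated time to end
```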
The SAMR Scheduler
Reduce Task
• Copy data: R1 = ?
• Order: R2 = ?
Map Task
• Execute map function: M1 = ?
• Reorder intermediate results: M2 = ?
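The question marks indicate that SAMR does not fix these weights in advance. A minimal sketch, assuming SAMR keeps the same progress-score formula as Hadoop and only substitutes weights estimated for the node running the task; `weighted_progress` is a hypothetical name and the learned weights are made up for illustration.

```python
import math
from typing import Sequence


def weighted_progress(weights: Sequence[float], phase: int, sub_progress: float) -> float:
    """PS = weights of finished phases + weight of the current phase * its fraction done."""
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("phase weights must sum to 1")
    if not 0 <= phase < len(weights):
        raise ValueError("phase index out of range")
    if not 0.0 <= sub_progress <= 1.0:
        raise ValueError("sub_progress must lie in [0, 1]")
    return sum(weights[:phase]) + weights[phase] * sub_progress


if __name__ == "__main__":
    fixed = (1 / 3, 1 / 3, 1 / 3)  # Hadoop-style fixed reduce weights (R1, R2, R3)
    learned = (0.6, 0.1, 0.3)      # hypothetical weights for a node where copying dominates
    # Same task state on both: copy finished, sort half done.
    print(weighted_progress(fixed, 1, 0.5))    # about 0.5
    print(weighted_progress(learned, 1, 0.5))  # about 0.65
```

The same task state yields different scores, which is why per-node weights change which tasks look slow.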
The SAMR Scheduler
How historical information is used and updated
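One way to maintain per-node weights from history is sketched below, under two assumptions not stated on the slides: a finished task's measured weights are its phase durations divided by its total duration, and HP blends stored and measured weights linearly, with HP as the share kept from history. The paper's exact update rule may differ, and all names are hypothetical.

```python
import math
from typing import Sequence


def weights_from_durations(durations: Sequence[float]) -> list[float]:
    """Measured phase weights: each phase's duration divided by the task's total duration."""
    total = sum(durations)
    if total <= 0:
        raise ValueError("durations must sum to a positive value")
    return [d / total for d in durations]


def update_weights(history: Sequence[float], measured: Sequence[float], hp: float) -> list[float]:
    """Blend stored and newly measured weights.

    ASSUMPTION: hp is the share kept from history and (1 - hp) goes to the new
    measurement; the paper's exact formula may differ.
    """
    if len(history) != len(measured):
        raise ValueError("weight vectors must have the same length")
    if not 0.0 <= hp <= 1.0:
        raise ValueError("hp must lie in [0, 1]")
    return [hp * h + (1.0 - hp) * m for h, m in zip(history, measured)]


if __name__ == "__main__":
    stored = [1 / 3, 1 / 3, 1 / 3]                        # R1, R2, R3 before the update
    measured = weights_from_durations([60.0, 10.0, 30.0])  # copy, sort, merge seconds (made up)
    updated = update_weights(stored, measured, hp=0.2)
    print([round(w, 3) for w in updated])
    # A convex blend of two weight vectors that each sum to 1 still sums to 1.
    print(math.isclose(sum(updated), 1.0))
```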
The SAMR Scheduler
SLOW_TASK_CAP (STaC)
The SAMR Scheduler
SLOW_TRACKER_CAP (STrC)
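STaC and STrC are caps used to classify slow tasks and slow trackers. Because the formulas are not visible on these slides, the sketch assumes that something counts as slow when its rate is below (1 - cap) times the average rate, and that a tracker's rate is the mean progress rate of its tasks. The example uses STaC = 0.3 and STrC = 0.2, the values chosen in the experiments; the rates and helper names are invented.

```python
from statistics import mean


def below_cap(rates: dict[str, float], cap: float) -> list[str]:
    """Names whose rate is below (1 - cap) times the mean rate (assumed form of the STaC/STrC tests)."""
    if not 0.0 <= cap < 1.0:
        raise ValueError("cap must lie in [0, 1)")
    if not rates:
        return []
    threshold = (1.0 - cap) * mean(list(rates.values()))
    return sorted(name for name, rate in rates.items() if rate < threshold)


def tracker_rates(task_rates: dict[str, list[float]]) -> dict[str, float]:
    """Hypothetical tracker rate: mean progress rate of the tasks it has run."""
    return {tracker: mean(rs) for tracker, rs in task_rates.items() if rs}


if __name__ == "__main__":
    tasks = {"t1": 0.010, "t2": 0.011, "t3": 0.004}
    print(below_cap(tasks, 0.3))  # ['t3'] with STaC = 0.3
    trackers = tracker_rates({"node1": [0.010, 0.012], "node2": [0.011], "node3": [0.004, 0.005]})
    print(below_cap(trackers, 0.2))  # ['node3'] with STrC = 0.2
```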
The SAMR Scheduler
The SAMR Scheduler
SLOW_TRACKER_PRO (STrP)
SlowTrackerNum < STrP * TrackerNum    (14)
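Inequality (14) bounds how many TaskTrackers may be treated as slow at once. A sketch, assuming it is enforced by marking the slowest candidates first and stopping as soon as one more would violate (14); the candidate speeds and helper names are hypothetical, while STrP = 0.3 with 8 trackers mirrors the experimental setup.

```python
def within_strp(slow_tracker_num: int, tracker_num: int, strp: float) -> bool:
    """Inequality (14): SlowTrackerNum < STrP * TrackerNum."""
    return slow_tracker_num < strp * tracker_num


def cap_slow_trackers(candidates: dict[str, float], tracker_num: int, strp: float) -> list[str]:
    """Mark the slowest candidates as slow trackers while (14) still holds.

    `candidates` maps tracker name -> measured speed (hypothetical metric).
    """
    selected: list[str] = []
    for name in sorted(candidates, key=lambda n: candidates[n]):  # slowest first
        if not within_strp(len(selected) + 1, tracker_num, strp):
            break
        selected.append(name)
    return selected


if __name__ == "__main__":
    candidates = {"n3": 0.004, "n5": 0.006, "n7": 0.005}
    print(cap_slow_trackers(candidates, tracker_num=8, strp=0.3))  # ['n3', 'n7']
```

Here 0.3 × 8 = 2.4, so at most two trackers are marked slow even though three are candidates.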
The SAMR Scheduler
Launching backup tasks
BackupNum < BP * TaskNum    (15), where BP stands for Backup Pro
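Inequality (15) caps the number of backup tasks. A minimal sketch that checks (15) before each launch and, as an assumption borrowed from LATE, backs up the slow tasks with the longest estimated time to end first; task names, times, and helper names are hypothetical.

```python
def can_launch_backup(backup_num: int, task_num: int, bp: float) -> bool:
    """Inequality (15): BackupNum < BP * TaskNum."""
    return backup_num < bp * task_num


def launch_backups(time_left: dict[str, float], running_backups: int, task_num: int, bp: float) -> list[str]:
    """Launch backups for slow tasks, longest estimated time to end first, while (15) holds.

    `time_left` maps slow-task name -> estimated seconds to finish (hypothetical input).
    """
    launched: list[str] = []
    for name in sorted(time_left, key=lambda n: time_left[n], reverse=True):
        if not can_launch_backup(running_backups + len(launched), task_num, bp):
            break
        launched.append(name)
    return launched


if __name__ == "__main__":
    slow = {"t3": 40.0, "t7": 25.0, "t9": 60.0}
    print(launch_backups(slow, running_backups=0, task_num=10, bp=0.2))  # ['t9', 't3']
```

With BP = 0.2 and 10 tasks, BP × TaskNum = 2, so launching stops once two backups are running.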
The SAMR Scheduler
The SAMR Scheduler
Experiment
Effect of “HP” on the execution time
Experiment
Effect of “STaC”, “STrC”, and “STrP” on the execution time
Experiment
Effect of “BP” on the execution time
Experiment
Historical information and Real information on all 8 nodes
Experiment
HP = 0.2, STaC = 0.3, STrC = 0.2, STrP = 0.3, and BP = 0.2
Experiment
Execution results of “Sort” on the experimental platform.
Experiment
LATE decreases execution time by about 7%
LATE using historical information decreases execution time by about 15%
SAMR decreases execution time by about 24% compared to Hadoop
Conclusion
Identify the problem in Hadoop’s scheduler
Compare two schedulers for improving the performance of MapReduce in heterogeneous environments
How to improve the performance of SAMR
Thanks