![Page 1: Improving MapReduce Performance in Heterogeneous … · University of California at Berkeley . Motivation 1. MapReduce becoming popular – Open-source implementation, Hadoop, used](https://reader033.vdocument.in/reader033/viewer/2022050523/5fa6dddd045af646ba0ceb72/html5/thumbnails/1.jpg)
UC Berkeley
Improving MapReduce Performance in Heterogeneous Environments
Matei Zaharia, Andy Konwinski, Anthony Joseph, Randy Katz, Ion Stoica
University of California at Berkeley
![Page 2: Improving MapReduce Performance in Heterogeneous … · University of California at Berkeley . Motivation 1. MapReduce becoming popular – Open-source implementation, Hadoop, used](https://reader033.vdocument.in/reader033/viewer/2022050523/5fa6dddd045af646ba0ceb72/html5/thumbnails/2.jpg)
Motivation
1. MapReduce becoming popular – Open-source implementation, Hadoop, used
by Yahoo!, Facebook, Last.fm, … – Scale: 20 PB/day at Google, O(10,000) nodes
at Yahoo, 3000 jobs/day at Facebook
![Page 3: Improving MapReduce Performance in Heterogeneous … · University of California at Berkeley . Motivation 1. MapReduce becoming popular – Open-source implementation, Hadoop, used](https://reader033.vdocument.in/reader033/viewer/2022050523/5fa6dddd045af646ba0ceb72/html5/thumbnails/3.jpg)
Motivation
2. Utility computing services like Amazon Elastic Compute Cloud (EC2) provide cheap on-demand computing
– Price: 10 cents / VM / hour – Scale: thousands of VMs – Caveat: less control over performance
![Page 4: Improving MapReduce Performance in Heterogeneous … · University of California at Berkeley . Motivation 1. MapReduce becoming popular – Open-source implementation, Hadoop, used](https://reader033.vdocument.in/reader033/viewer/2022050523/5fa6dddd045af646ba0ceb72/html5/thumbnails/4.jpg)
Results
• Main challenge for Hadoop on EC2 was performance heterogeneity, which breaks task scheduler assumptions
• Designed new LATE scheduler that can cut response time in half
![Page 5: Improving MapReduce Performance in Heterogeneous … · University of California at Berkeley . Motivation 1. MapReduce becoming popular – Open-source implementation, Hadoop, used](https://reader033.vdocument.in/reader033/viewer/2022050523/5fa6dddd045af646ba0ceb72/html5/thumbnails/5.jpg)
Outline
1. MapReduce background
2. The problem of heterogeneity
3. LATE: a heterogeneity-aware scheduler
![Page 6: Improving MapReduce Performance in Heterogeneous … · University of California at Berkeley . Motivation 1. MapReduce becoming popular – Open-source implementation, Hadoop, used](https://reader033.vdocument.in/reader033/viewer/2022050523/5fa6dddd045af646ba0ceb72/html5/thumbnails/6.jpg)
What is MapReduce?
• Programming model to split computations into independent parallel tasks
• Hides the complexity of fault tolerance – At 10,000’s of nodes, some will fail every day
J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI 2004.
![Page 7: Improving MapReduce Performance in Heterogeneous … · University of California at Berkeley . Motivation 1. MapReduce becoming popular – Open-source implementation, Hadoop, used](https://reader033.vdocument.in/reader033/viewer/2022050523/5fa6dddd045af646ba0ceb72/html5/thumbnails/7.jpg)
Fault Tolerance in MapReduce
1. Nodes fail re-run tasks
Node 1
Node 2
How to do this in heterogeneous environment?
1. 2. Nodes slow (stragglers) run backup tasks
![Page 8: Improving MapReduce Performance in Heterogeneous … · University of California at Berkeley . Motivation 1. MapReduce becoming popular – Open-source implementation, Hadoop, used](https://reader033.vdocument.in/reader033/viewer/2022050523/5fa6dddd045af646ba0ceb72/html5/thumbnails/8.jpg)
Heterogeneity in Virtualized Environments
• VM technology isolates CPU and memory, but disk and network are shared – Full bandwidth when no contention – Equal shares when there is contention
• 2.5x performance difference
0
10
20
30
40
50
60
70
1 2 3 4 5 6 7
IO P
erfo
rman
ce p
er V
M (M
B/s
)
VMs on Physical Host
![Page 9: Improving MapReduce Performance in Heterogeneous … · University of California at Berkeley . Motivation 1. MapReduce becoming popular – Open-source implementation, Hadoop, used](https://reader033.vdocument.in/reader033/viewer/2022050523/5fa6dddd045af646ba0ceb72/html5/thumbnails/9.jpg)
Backup Tasks in Hadoop’s Default Scheduler
• Start primary tasks, then look for backups to launch as nodes become free
• Tasks report “progress score” from 0 to 1 • Launch backup if
progress < avgProgress – 0.2
![Page 10: Improving MapReduce Performance in Heterogeneous … · University of California at Berkeley . Motivation 1. MapReduce becoming popular – Open-source implementation, Hadoop, used](https://reader033.vdocument.in/reader033/viewer/2022050523/5fa6dddd045af646ba0ceb72/html5/thumbnails/10.jpg)
Problems in Heterogeneous Environment
1. Too many backups, thrashing shared resources like network bandwidth
2. Wrong tasks backed up 3. Backups may be placed on slow nodes 4. Breaks when tasks start at different times
• Example: ~80% of reduces backed up, most losing to originals; network thrashed
![Page 11: Improving MapReduce Performance in Heterogeneous … · University of California at Berkeley . Motivation 1. MapReduce becoming popular – Open-source implementation, Hadoop, used](https://reader033.vdocument.in/reader033/viewer/2022050523/5fa6dddd045af646ba0ceb72/html5/thumbnails/11.jpg)
Idea: Progress Rates
• Instead of using progress values, compute progress rates, and back up tasks that are “far enough” below the mean
• Problem: can still select the wrong tasks
![Page 12: Improving MapReduce Performance in Heterogeneous … · University of California at Berkeley . Motivation 1. MapReduce becoming popular – Open-source implementation, Hadoop, used](https://reader033.vdocument.in/reader033/viewer/2022050523/5fa6dddd045af646ba0ceb72/html5/thumbnails/12.jpg)
Progress Rate Example
Time (min)
Node 1
Node 2
Node 3
3x slower
1.9x slower
1 task/min
1 min 2 min
![Page 13: Improving MapReduce Performance in Heterogeneous … · University of California at Berkeley . Motivation 1. MapReduce becoming popular – Open-source implementation, Hadoop, used](https://reader033.vdocument.in/reader033/viewer/2022050523/5fa6dddd045af646ba0ceb72/html5/thumbnails/13.jpg)
Progress Rate Example
Node 1
Node 2
Node 3
What if the job had 5 tasks?
time left: 1.8 min
2 min
Time (min)
Node 2 is slowest, but should back up Node 3’s task!
time left: 1 min
![Page 14: Improving MapReduce Performance in Heterogeneous … · University of California at Berkeley . Motivation 1. MapReduce becoming popular – Open-source implementation, Hadoop, used](https://reader033.vdocument.in/reader033/viewer/2022050523/5fa6dddd045af646ba0ceb72/html5/thumbnails/14.jpg)
Our Scheduler: LATE
• Insight: back up the task with the largest estimated finish time – “Longest Approximate Time to End” – Look forward instead of looking backward
• Sanity thresholds: – Cap number of backup tasks – Launch backups on fast nodes – Only back up tasks that are sufficiently slow
![Page 15: Improving MapReduce Performance in Heterogeneous … · University of California at Berkeley . Motivation 1. MapReduce becoming popular – Open-source implementation, Hadoop, used](https://reader033.vdocument.in/reader033/viewer/2022050523/5fa6dddd045af646ba0ceb72/html5/thumbnails/15.jpg)
LATE Details
• Estimating finish times:
• Threshold values: – 10% cap on backups, 25th percentiles for slow
node/task – Validated by sensitivity analysis
progress score
execution time progress rate =
1 – progress score
progress rate estimated time left =
![Page 16: Improving MapReduce Performance in Heterogeneous … · University of California at Berkeley . Motivation 1. MapReduce becoming popular – Open-source implementation, Hadoop, used](https://reader033.vdocument.in/reader033/viewer/2022050523/5fa6dddd045af646ba0ceb72/html5/thumbnails/16.jpg)
LATE Example
Node 1
Node 2
Node 3
2 min
Time (min)
Progress = 5.3%
Estimated time left: (1-0.66) / (1/3) = 1
Estimated time left: (1-0.05) / (1/1.9) = 1.8
Progress = 66%
LATE correctly picks Node 3
![Page 17: Improving MapReduce Performance in Heterogeneous … · University of California at Berkeley . Motivation 1. MapReduce becoming popular – Open-source implementation, Hadoop, used](https://reader033.vdocument.in/reader033/viewer/2022050523/5fa6dddd045af646ba0ceb72/html5/thumbnails/17.jpg)
Evaluation
• Environments: – EC2 (3 job types, 200-250 nodes) – Small local testbed
• Self-contention through VM placement • Stragglers through background processes
![Page 18: Improving MapReduce Performance in Heterogeneous … · University of California at Berkeley . Motivation 1. MapReduce becoming popular – Open-source implementation, Hadoop, used](https://reader033.vdocument.in/reader033/viewer/2022050523/5fa6dddd045af646ba0ceb72/html5/thumbnails/18.jpg)
EC2 Sort with Stragglers
• Average 58% speedup over native, 220% over no backups • 93% max speedup over native
0.0
0.5
1.0
1.5
2.0
2.5
Worst Best Average
Nor
mal
ized
Res
pons
e Ti
me
No Backups Hadoop Native LATE Scheduler
![Page 19: Improving MapReduce Performance in Heterogeneous … · University of California at Berkeley . Motivation 1. MapReduce becoming popular – Open-source implementation, Hadoop, used](https://reader033.vdocument.in/reader033/viewer/2022050523/5fa6dddd045af646ba0ceb72/html5/thumbnails/19.jpg)
EC2 Sort without Stragglers
• Average 27% speedup over native, 31% over no backups
0
0.2
0.4
0.6
0.8
1
1.2
1.4
Worst Best Average
Nor
mal
ized
Res
pons
e Ti
me
No Backups Hadoop Native LATE Scheduler
![Page 20: Improving MapReduce Performance in Heterogeneous … · University of California at Berkeley . Motivation 1. MapReduce becoming popular – Open-source implementation, Hadoop, used](https://reader033.vdocument.in/reader033/viewer/2022050523/5fa6dddd045af646ba0ceb72/html5/thumbnails/20.jpg)
Conclusion
• Heterogeneity is a challenge for parallel apps, and is growing more important
• Lessons: – Back up tasks which hurt response time most – Mind shared resources
• 2x improvement using simple algorithm
![Page 21: Improving MapReduce Performance in Heterogeneous … · University of California at Berkeley . Motivation 1. MapReduce becoming popular – Open-source implementation, Hadoop, used](https://reader033.vdocument.in/reader033/viewer/2022050523/5fa6dddd045af646ba0ceb72/html5/thumbnails/21.jpg)
Questions?
? ? ?
![Page 22: Improving MapReduce Performance in Heterogeneous … · University of California at Berkeley . Motivation 1. MapReduce becoming popular – Open-source implementation, Hadoop, used](https://reader033.vdocument.in/reader033/viewer/2022050523/5fa6dddd045af646ba0ceb72/html5/thumbnails/22.jpg)
EC2 Grep and Wordcount
0
0.2
0.4
0.6
0.8
1
1.2
1.4
Worst Best Average
Nor
mal
ized
Res
pons
e Ti
me
No Backups Hadoop Native LATE Scheduler
0
0.5
1
1.5
2
2.5
3
Worst Best Average
Nor
mal
ized
Res
pons
e Ti
me
No Backups Hadoop Native LATE Scheduler
Grep WordCount
• 36% gain over native • 57% gain over no backups
• 8.5% gain over native • 179% gain over no backups