Reining in the Outliers in MapReduce Jobs using Mantri



Page 1: Reining in the Outliers in MapReduce  Jobs using  Mantri

Reining in the Outliers in MapReduce Jobs using Mantri

Ganesh Ananthanarayanan†, Srikanth Kandula*, Albert Greenberg*, Ion Stoica†, Yi Lu*, Bikas Saha*, Ed Harris*

† UC Berkeley * Microsoft

Page 2: Reining in the Outliers in MapReduce  Jobs using  Mantri

MapReduce Jobs

Basis of analytics in modern Internet services
◦ E.g., Dryad, Hadoop

Job → {Phase} → {Task}

Graph flow consists of pipelines as well as strict blocks

Page 3: Reining in the Outliers in MapReduce  Jobs using  Mantri

Example Dryad Job Graph

[Figure: Dryad job graph. Node labels: Distr. File System, EXTRACT, AGGREGATE_PARTITION, FULL_AGGREGATE, PROCESS, COMBINE. Stage labels: Map.1, Reduce.1, Map.2, Reduce.2, Join. Annotations: Phase; Pipeline; Blocked until input is done.]

Page 4: Reining in the Outliers in MapReduce  Jobs using  Mantri

Log Analysis from Production

Logs from a production cluster with thousands of machines, sampled over six months

10,000+ jobs, 80PB of data, 4PB network transfers
◦ Task-level details
◦ Production and experimental jobs

Page 5: Reining in the Outliers in MapReduce  Jobs using  Mantri

Outliers hurt!

Outliers: tasks that run longer than the rest in the phase

Median phase has 10% outliers, running for >10x longer

Slow down jobs by 35% at median

Operational Inefficiency
◦ Unpredictability in completion times affects SLAs
◦ Hurts development productivity
◦ Wastes compute cycles

Page 6: Reining in the Outliers in MapReduce  Jobs using  Mantri

Why do outliers occur?

Mantri: A system that mitigates outliers based on root-cause analysis

[Figure labels: Read Input, Execute; Input Unavailable, Network Congestion, Local Contention, Workload Imbalance]

Page 7: Reining in the Outliers in MapReduce  Jobs using  Mantri

Mantri’s Outlier Mitigation

Avoid Recomputation

Network-aware Task Placement

Duplicate Outliers

Cognizant of Workload Imbalance

Page 8: Reining in the Outliers in MapReduce  Jobs using  Mantri

Recomputes: Illustration

(a) Barrier phases (b) Cascading Recomputes

[Figure: each panel compares the Ideal and Actual timelines, with the gap labelled Inflation. Legend: Recompute task, Normal task.]

Page 9: Reining in the Outliers in MapReduce  Jobs using  Mantri

What causes recomputes? [1]

Faulty machines
◦ Bad disks, non-persistent hardware quirks

(4%)

Set of faulty machines varies with time, not constant

Page 10: Reining in the Outliers in MapReduce  Jobs using  Mantri

What causes recomputes? [2]

Transient machine load
◦ Recomputes correlate with machine load
◦ Requests for data access dropped

Page 11: Reining in the Outliers in MapReduce  Jobs using  Mantri

Replicate costly outputs

[Figure: chain of Task 1, Task 2 and Task 3, annotated with recompute probabilities MR2 and MR3]

MR: Recompute Probability of a machine

Recompute only Task 3, or both Task 3 and Task 2:

$T_{Recomp} = MR_3 (1 - MR_2)\, T_3 + MR_3\, MR_2\, (T_3 + T_2)$

REPLICATE (cost $T_{Rep}$) if $T_{Rep} < T_{Recomp}$
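The replication rule on this slide can be sketched in a few lines of Python. This is only an illustration of the slide's two-case formula; the function names and the example numbers are assumptions, not values from the talk.

```python
def expected_recompute_time(mr3, mr2, t3, t2):
    """Expected recompute cost, following the slide's two cases.

    mr3, mr2: recompute probabilities (the slide's MR3 and MR2).
    t3, t2:   running times of Task 3 and Task 2.
    """
    only_task3 = mr3 * (1 - mr2) * t3    # case 1: only Task 3 is recomputed
    both_tasks = mr3 * mr2 * (t3 + t2)   # case 2: Task 3 and Task 2 are both recomputed
    return only_task3 + both_tasks


def should_replicate(t_rep, mr3, mr2, t3, t2):
    """Replicate the output only when copying is cheaper than the expected recompute."""
    return t_rep < expected_recompute_time(mr3, mr2, t3, t2)


# Hypothetical numbers, chosen only to exercise the formula.
cost = expected_recompute_time(mr3=0.1, mr2=0.2, t3=100, t2=50)
cheap_copy = should_replicate(t_rep=5, mr3=0.1, mr2=0.2, t3=100, t2=50)
costly_copy = should_replicate(t_rep=20, mr3=0.1, mr2=0.2, t3=100, t2=50)
```

With these made-up inputs the expected recompute cost is about 11 time units, so a copy costing 5 is worth making and a copy costing 20 is not.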

Page 12: Reining in the Outliers in MapReduce  Jobs using  Mantri

Transient Failure Causes

Recomputes manifest in clutches

A machine is prone to cause recomputes till the problem is fixed
◦ Load abates, critical process restarts, etc.

Clue: At least r recomputes within a time window t on a machine
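One way to act on this clue is a per-machine sliding window. The sketch below is an assumption about how such a check could look, not Mantri's implementation; the class name, the use of plain numeric timestamps, and the example values of r and t are all hypothetical.

```python
from collections import defaultdict, deque


class RecomputeClutchDetector:
    """Flags a machine once it causes at least r recomputes within a window of length t.

    Assumes timestamps for a given machine arrive in non-decreasing order.
    """

    def __init__(self, r, t):
        self.r = r
        self.t = t
        self._events = defaultdict(deque)  # machine -> timestamps inside the window

    def record(self, machine, timestamp):
        """Record one recompute caused by `machine`; return True if it is now suspect."""
        window = self._events[machine]
        window.append(timestamp)
        # Drop recomputes that fell out of the time window.
        while timestamp - window[0] > self.t:
            window.popleft()
        return len(window) >= self.r


# Hypothetical trace: three recomputes on m1 within 60 time units.
detector = RecomputeClutchDetector(r=3, t=60)
flags = [
    detector.record("m1", 0),
    detector.record("m1", 30),
    detector.record("m1", 50),
    detector.record("m2", 55),
    detector.record("m1", 200),
]
```

In this trace only the third recompute on m1 trips the detector; the later one at time 200 starts a fresh window.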

Page 13: Reining in the Outliers in MapReduce  Jobs using  Mantri

Speculative Recomputes

Anticipatorily recompute tasks whose outputs are unread

[Figure labels: Task, Input Data, Unread Data, (Read Fail), Speculative Recompute]
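A minimal sketch of picking speculative recompute candidates follows. It assumes that the trigger is a set of suspect machines (for example, ones on which a read has just failed) and that the scheduler knows which task outputs have not been read yet; the `TaskOutput` record and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskOutput:
    task_id: str
    machine: str      # machine that stores this output
    fully_read: bool  # True once every consumer has read the output


def speculative_recompute_candidates(outputs, suspect_machines):
    """Return ids of tasks whose unread outputs sit on a suspect machine."""
    return [
        out.task_id
        for out in outputs
        if out.machine in suspect_machines and not out.fully_read
    ]


# Hypothetical state after a read failure on machine "m7".
outputs = [
    TaskOutput("map-1", "m7", fully_read=False),
    TaskOutput("map-2", "m7", fully_read=True),
    TaskOutput("map-3", "m2", fully_read=False),
]
candidates = speculative_recompute_candidates(outputs, {"m7"})
```

Only `map-1` qualifies here: `map-2` has already been consumed, and `map-3` lives on a machine that is not suspect.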

Page 14: Reining in the Outliers in MapReduce  Jobs using  Mantri

Mantri’s Outlier Mitigation

Avoid Recomputation
◦ Preferential Replication + Speculative Recomp.

Network-aware Task Placement

Duplicate Outliers

Cognizant of Workload Imbalance

Page 15: Reining in the Outliers in MapReduce  Jobs using  Mantri

Reduce Tasks

Tasks access output of tasks from previous phases

Reduce phase (74% of total traffic)

[Figure labels: Map, Reduce, Distr. File System; Network, Local; Outlier!]

Page 16: Reining in the Outliers in MapReduce  Jobs using  Mantri

Variable Congestion

[Figure legend: Reduce task, Map output, Rack]

Smart placement smooths out hotspots

Page 17: Reining in the Outliers in MapReduce  Jobs using  Mantri

Traffic-based Allotment

For every rack:
◦ d : data
◦ u : available uplink bandwidth
◦ v : available downlink bandwidth

Goal: Minimize phase completion time

Solve for task allocation fractions, $a_i$

Page 18: Reining in the Outliers in MapReduce  Jobs using  Mantri

Local Control is a good approx.

Goal: Minimize phase completion time

For every rack:
◦ d : data, D : data over all racks
◦ u : available uplink bandwidth
◦ v : available downlink bandwidth

Let rack i have $a_i$ fraction of tasks
◦ Time uploading, $T_u = d_i (1 - a_i) / u_i$
◦ Time downloading, $T_d = (D - d_i)\, a_i / v_i$

$Time_i = \max\{T_u, T_d\}$

Link utilizations average out in the long term and are steady in the short term
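The per-rack times above suggest a simple local rule: pick each rack's fraction so that its upload and download times are equal, which minimizes their maximum for that rack. The sketch below does this and then rescales the fractions to sum to one. The rescaling step, the function names, and the example numbers are assumptions made for illustration; the slides do not spell out the solver.

```python
def local_allocation(d, u, v):
    """Per-rack task fractions that balance upload and download time.

    d[i]: data on rack i, u[i]/v[i]: available uplink/downlink bandwidth.
    Assumes positive bandwidths and a positive total amount of data.
    """
    total_data = sum(d)
    raw = []
    for di, ui, vi in zip(d, u, v):
        # Setting Tu == Td:  di*(1 - a)/ui == (D - di)*a/vi
        raw.append(di * vi / (di * vi + (total_data - di) * ui))
    scale = sum(raw)
    return [a / scale for a in raw]  # assumed step: rescale so fractions sum to 1


def rack_times(d, u, v, a):
    """Time_i = max(Tu, Td) for each rack, using the slide's formulas."""
    total_data = sum(d)
    return [
        max(di * (1 - ai) / ui, (total_data - di) * ai / vi)
        for di, ui, vi, ai in zip(d, u, v, a)
    ]


# Hypothetical two-rack example with equal bandwidths and skewed data.
d, u, v = [30, 10], [1, 1], [1, 1]
fractions = local_allocation(d, u, v)
balanced = max(rack_times(d, u, v, fractions))
even_split = max(rack_times(d, u, v, [0.5, 0.5]))
```

In this example the rack holding three quarters of the data receives three quarters of the tasks, and the slowest rack finishes in 7.5 time units instead of 15 under an even split.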

Page 19: Reining in the Outliers in MapReduce  Jobs using  Mantri

Mantri’s Outlier Mitigation

Avoid Recomputation
◦ Preferential Replication + Speculative Recomp.

Network-aware Task Placement
◦ Traffic on link proportional to bandwidth

Duplicate Outliers

Cognizant of Workload Imbalance

Page 20: Reining in the Outliers in MapReduce  Jobs using  Mantri

Contentions cause outliers

Tasks contend for local resources
◦ Processor, memory etc.

Duplicate tasks elsewhere in the cluster
◦ Current schemes duplicate towards the end of the phase (e.g., LATE [OSDI 2008])

Duplicate outlier or schedule pending task?

Page 21: Reining in the Outliers in MapReduce  Jobs using  Mantri

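Resource-Aware Restart

[Figure: timeline from "now" showing a running task with remaining time $t_{rem}$ and a potential restart with running time $t_{new}$]

Save time and resources: $P\big((c + 1)\, t_{new} < c\, t_{rem}\big)$
◦ c : copies of the task currently running
◦ Left alone, c copies consume $c\, t_{rem}$ more; with a new copy, c + 1 copies consume $(c + 1)\, t_{new}$

Continuously observe and kill wasteful copies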
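A hedged sketch of the restart test follows. It uses a simple resource accounting: c running copies left alone consume c·t_rem more, while starting one more copy makes c + 1 copies run for t_new, so a duplicate saves resources when (c + 1)·t_new < c·t_rem. Estimating the probability from a list of sampled t_new values and comparing it with a threshold `delta` are assumptions of this sketch, as are the names and the example numbers.

```python
def should_duplicate(t_rem, t_new_samples, c, delta=0.25):
    """Decide whether to start another copy of a running task.

    t_rem:         estimated remaining time of the running copies.
    t_new_samples: sampled estimates of a fresh copy's running time (assumed input).
    c:             number of copies currently running.
    delta:         assumed probability threshold for acting.
    """
    if c < 1 or not t_new_samples:
        return False
    # Fraction of samples in which c + 1 copies for t_new cost less than c copies for t_rem.
    saves = sum(1 for t_new in t_new_samples if (c + 1) * t_new < c * t_rem)
    return saves / len(t_new_samples) > delta


# Hypothetical estimates for a task with 100 time units left.
likely_win = should_duplicate(100, [30, 40, 60, 70], c=1)
unlikely_win = should_duplicate(100, [60, 70, 80, 90], c=1)
second_copy = should_duplicate(100, [60, 70, 80, 90], c=2)
```

With one running copy the test reduces to "a new copy is expected to take less than half of the remaining time", which is why the first case duplicates and the second does not.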

Page 22: Reining in the Outliers in MapReduce  Jobs using  Mantri

Mantri’s Outlier Mitigation

Avoid Recomputation
◦ Preferential Replication + Speculative Recomp.

Network-aware Task Placement
◦ Traffic on link proportional to bandwidth

Duplicate Outliers
◦ Resource-Aware Restart

Cognizant of Workload Imbalance

Page 23: Reining in the Outliers in MapReduce  Jobs using  Mantri

Workload Imbalance

A quarter of the outlier tasks have more data to process
◦ Unequal key partitions for reduce tasks

Ignoring these is better than duplicating them

Schedule tasks in descending order of data to process
◦ Time ∝ (Data to Process)
◦ [Graham ‘69] At worst, within 33% of optimal
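Scheduling in descending order of size is the classic longest-processing-time-first heuristic, for which Graham's bound on the makespan is 4/3 − 1/(3m) times optimal on m identical slots. The sketch below compares it with scheduling in arrival order. It assumes running time equals data size and that slots are identical; the function name and task sizes are made up.

```python
import heapq


def greedy_makespan(sizes, n_slots):
    """Assign each task, in the given order, to the least-loaded slot; return the makespan."""
    loads = [0] * n_slots
    heapq.heapify(loads)
    for size in sizes:
        lightest = heapq.heappop(loads)   # slot that frees up first
        heapq.heappush(loads, lightest + size)
    return max(loads)


# Hypothetical task sizes (data to process), listed in arrival order.
sizes = [3, 3, 4, 5, 7]
arrival_order = greedy_makespan(sizes, n_slots=2)
largest_first = greedy_makespan(sorted(sizes, reverse=True), n_slots=2)
```

Here the arrival order leaves the largest task for last and finishes at 14, while largest-first finishes at 12; the best possible split of these sizes is 11, so the heuristic is close but not exact.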

Page 24: Reining in the Outliers in MapReduce  Jobs using  Mantri

Mantri’s Outlier Mitigation

Avoid Recomputation
◦ Preferential Replication + Speculative Recomp.

Network-aware Task Placement
◦ Traffic on link proportional to bandwidth

Duplicate Outliers
◦ Resource-Aware Restart

Cognizant of Workload Imbalance
◦ Schedule in descending order of size

Proactive

Reactive

Predict to act early

Be resource-aware

Act based on the cause


Page 25: Reining in the Outliers in MapReduce  Jobs using  Mantri

Results

Deployed in production Bing clusters

Trace-driven simulations
◦ Mimic workflow, failures, data skew
◦ Compare with existing and idealized schemes

Page 26: Reining in the Outliers in MapReduce  Jobs using  Mantri

Jobs in the Wild

Act Early: Duplicates issued when a task is 42% done (77% for Dryad)

Light: Issues fewer copies (0.47x as many as Dryad)

Accurate: 2.8x higher success rate of copies

Jobs faster by 32% at median, consuming fewer resources

Page 27: Reining in the Outliers in MapReduce  Jobs using  Mantri


Recomputation Avoidance

Eliminates most recomputes with minimal extra resources

(Replication + Speculation) work well in tandem

Page 28: Reining in the Outliers in MapReduce  Jobs using  Mantri

Network-Aware Placement

Mantri approximates the ideal well

Bandwidth approximations

Page 29: Reining in the Outliers in MapReduce  Jobs using  Mantri

Summary

From measurements in a production cluster,
◦ Outliers are a significant problem
◦ Are due to an interplay between storage, network and map-reduce

Mantri: cause-aware and resource-aware mitigation

Deployment shows encouraging results

“Reining in the Outliers in MapReduce Clusters using Mantri”, USENIX OSDI 2010