Reining in the Outliers in MapReduce Jobs using Mantri

Ganesh Ananthanarayanan†, Srikanth Kandula*, Albert Greenberg*, Ion Stoica†, Yi Lu*, Bikas Saha*, Ed Harris*

† UC Berkeley * Microsoft

MapReduce Jobs

Basis of analytics in modern Internet services
◦ E.g., Dryad, Hadoop

Job → {Phase} → {Task}: a job is a set of phases, and each phase is a set of tasks

Graph flow consists of pipelines as well as strict blocks

Example Dryad Job Graph

[Figure: job graphs that read from and write to the distributed file system. Phase labels: EXTRACT, AGGREGATE_PARTITION, FULL_AGGREGATE, PROCESS, COMBINE, Reduce.1, Reduce.2. Edge labels: "Pipeline" and "Blocked until input is done".]

Log Analysis from Production

Logs from a production cluster with thousands of machines, sampled over six months

10,000+ jobs, 80 PB of data, 4 PB of network transfers
◦ Task-level details
◦ Production and experimental jobs

Outliers hurt!

Tasks that run longer than the rest in the phase

Median phase has 10% outliers, running for >10x longer

Slow down jobs by 35% at median

Operational Inefficiency
◦ Unpredictability in completion times affects SLAs
◦ Hurts development productivity
◦ Wastes compute cycles

Why do outliers occur?

Mantri: A system that mitigates outliers based on root-cause analysis

[Figure: task stages (Read Input, Execute) annotated with the causes of outliers: input unavailability, network congestion, local contention, workload imbalance.]

Mantri’s Outlier Mitigation

Avoid Recomputation

Network-aware Task Placement

Duplicate Outliers

Cognizant of Workload Imbalance

Recomputes: Illustration

[Figure: (a) barrier phases, (b) cascading recomputes. Each panel compares the ideal and actual timelines; the gap between them is the inflation. Legend: recompute task, normal task.]

What causes recomputes? [1]

Faulty machines
◦ Bad disks, non-persistent hardware quirks

Set of faulty machines varies with time, not constant

What causes recomputes? [2]

Transient machine load
◦ Recomputes correlate with machine load
◦ Requests for data access dropped

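Replicate costly outputs

Task 2 feeds Task 3. MR: recompute probability of a machine (MR2 and MR3 for Task 2 and Task 3); T2, T3: time to run Task 2 and Task 3.

Recompute only Task 3, or both Task 3 and Task 2:

$T_{Recomp} = \mathit{MR}_3 (1 - \mathit{MR}_2)\, T_3 + \mathit{MR}_3\, \mathit{MR}_2\, (T_3 + T_2)$

REPLICATE (cost $T_{Rep}$) when $T_{Rep} < T_{Recomp}$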
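The replicate-or-not decision on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, and the inputs (recompute probabilities MR2 and MR3, task times T2 and T3, replication time TRep) are assumed to be estimated elsewhere.

```python
def expected_recompute_time(mr3: float, mr2: float, t3: float, t2: float) -> float:
    """Expected recompute cost for Task 3's output, per the slide's two cases.

    mr3, mr2: recompute probabilities (MR3, MR2), assumed to be in [0, 1].
    t3, t2:   run times of Task 3 and Task 2.
    """
    only_task3 = mr3 * (1.0 - mr2) * t3   # Task 3's output is lost, Task 2's survives
    both_tasks = mr3 * mr2 * (t3 + t2)    # both outputs are lost: rerun Task 2, then Task 3
    return only_task3 + both_tasks


def should_replicate(t_rep: float, mr3: float, mr2: float, t3: float, t2: float) -> bool:
    """Replicate the output only when replication is cheaper than the expected recompute."""
    return t_rep < expected_recompute_time(mr3, mr2, t3, t2)


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the talk.
    print(expected_recompute_time(0.1, 0.2, 100.0, 50.0))
    print(should_replicate(5.0, 0.1, 0.2, 100.0, 50.0))
```

Because the cost of losing an upstream output is folded into TRecomp, outputs that sit at the end of long chains become more attractive to replicate.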

Transient Failure Causes

Recomputes manifest in clutches

Machine prone to cause recomputes till the problem is fixed
◦ Load abates, critical process restarts

Clue: At least r recomputes within t time window on a machine
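The "at least r recomputes within a time window t" clue maps naturally onto a per-machine sliding window. The sketch below is one possible reading, not the deployed logic: the class name is hypothetical, and it assumes recompute events for a machine arrive in non-decreasing timestamp order.

```python
from collections import defaultdict, deque


class RecomputeBurstDetector:
    """Flags a machine once it has caused at least r recomputes within a window of length t."""

    def __init__(self, r: int, t: float) -> None:
        self.r = r
        self.t = t
        self._events = defaultdict(deque)  # machine -> timestamps inside the current window

    def record(self, machine: str, timestamp: float) -> bool:
        """Record one recompute; return True if the machine now looks failure-prone."""
        window = self._events[machine]
        window.append(timestamp)
        # Drop events that have fallen out of the window ending at this timestamp.
        while timestamp - window[0] > self.t:
            window.popleft()
        return len(window) >= self.r


if __name__ == "__main__":
    detector = RecomputeBurstDetector(r=3, t=10.0)
    for ts in (0.0, 4.0, 9.0, 25.0):
        print(ts, detector.record("m1", ts))
```

A machine flagged this way is a candidate for the speculative recomputes described next; once the burst ages out of the window the flag clears on its own.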

Speculative Recomputes

Anticipatorily recompute tasks whose outputs are unread

[Figure: a read failure triggers a speculative recompute. Labels: Speculative Recompute, (Read Fail), Unread Data, Input Data.]

Avoid Recomputation
◦ Preferential Replication + Speculative Recomputation

Network-aware Task Placement

Duplicate Outliers

Reduce Tasks

Tasks access output of tasks from previous phases

Reduce phase (74% of total traffic)

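[Figure: reduce tasks read map output across racks over the network and write to the distributed file system; variable congestion turns a reduce task into an outlier. Legend: reduce task, map output, rack.]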

Smart placement smoothens hotspots

Traffic-based Allotment

For every rack:
◦ d : data
◦ u : available uplink bandwidth
◦ v : available downlink bandwidth

Goal: Minimize phase completion time

Solve for task allocation fractions, ai

Local Control is a good approx.

Let rack i have a fraction $a_i$ of the tasks
◦ Time uploading, $T_u = d_i (1 - a_i) / u_i$
◦ Time downloading, $T_d = (D - d_i)\, a_i / v_i$

$\mathrm{Time}_i = \max\{T_u, T_d\}$

Goal: Minimize phase completion time

For every rack:
◦ d : data, D : data over all racks
◦ u : available uplink bandwidth
◦ v : available downlink bandwidth

Link utilizations average out in the long term and are steady in the short term
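To make the per-rack formulas concrete, the sketch below computes, for each rack, the fraction a_i at which upload and download times are equal; since T_u falls and T_d rises with a_i, that is where max{T_u, T_d} is smallest. Two things here are assumptions rather than content from the talk: the function names, and the final step that rescales the per-rack fractions to sum to 1.

```python
def rack_time(d_i: float, D: float, u_i: float, v_i: float, a_i: float) -> float:
    """Time_i = max{T_u, T_d} for a rack that runs a fraction a_i of the tasks."""
    t_up = d_i * (1.0 - a_i) / u_i      # data leaving the rack over its uplink
    t_down = (D - d_i) * a_i / v_i      # data entering the rack over its downlink
    return max(t_up, t_down)


def balanced_fraction(d_i: float, D: float, u_i: float, v_i: float) -> float:
    """The a_i that makes T_u equal T_d: a_i = d_i*v_i / (d_i*v_i + (D - d_i)*u_i)."""
    denom = d_i * v_i + (D - d_i) * u_i
    return d_i * v_i / denom if denom > 0 else 0.0


def allocate(racks: list[tuple[float, float, float]]) -> list[float]:
    """racks: one (d_i, u_i, v_i) tuple per rack. Returns task fractions summing to 1."""
    D = sum(d for d, _, _ in racks)
    if D <= 0:
        raise ValueError("no data to place")
    raw = [balanced_fraction(d, D, u, v) for d, u, v in racks]
    total = sum(raw)
    return [a / total for a in raw]     # assumed normalization step


if __name__ == "__main__":
    # Illustrative inputs: rack 0 holds three times as much data as rack 1.
    print(allocate([(30.0, 1.0, 1.0), (10.0, 1.0, 1.0)]))
```

With equal bandwidths the rack holding more data receives proportionally more tasks, which keeps most bytes off the cross-rack links.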

Network-aware Task Placement
◦ Traffic on link proportional to bandwidth

Duplicate Outliers

Contentions cause outliers

Tasks contend for local resources
◦ Processor, memory etc.

Duplicate tasks elsewhere in the cluster
◦ Current schemes duplicate towards end of the phase (e.g., LATE [OSDI 2008])

Duplicate outlier or schedule pending task?

Resource-Aware Restart

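[Figure: timeline of a running task and a potential restart. $t_{rem}$ is the remaining time of the running task from now; $t_{new}$ is the run time of a restarted copy.]

Save time and resources: $P\big((c + 1)\, t_{new} < c\, t_{rem}\big)$
◦ With c copies running, they would consume $c\, t_{rem}$ more; adding a copy that finishes first costs $(c + 1)\, t_{new}$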

Continuously observe and kill wasteful copies
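One way to read the resource-aware test is as an empirical probability over observed run times. The sketch below assumes the accounting that c running copies would each consume t_rem more, while a restart leaves c + 1 copies running for t_new, so a duplicate pays off when (c + 1) t_new < c t_rem. The function name, the use of samples to estimate the probability, and the threshold `delta` with its default are illustrative assumptions, not values from the slides.

```python
def should_duplicate(t_rem: float, t_new_samples: list[float], c: int,
                     delta: float = 0.25) -> bool:
    """Return True if a duplicate is likely to save both time and resources.

    t_rem:          estimated remaining time of the running copies.
    t_new_samples:  observed run times of comparable tasks, used as an
                    empirical distribution for t_new (an assumption).
    c:              number of copies currently running (at least 1).
    delta:          hypothetical probability threshold.
    """
    if c < 1 or not t_new_samples:
        return False
    favorable = sum(1 for t_new in t_new_samples if (c + 1) * t_new < c * t_rem)
    return favorable / len(t_new_samples) > delta


if __name__ == "__main__":
    samples = [20.0, 30.0, 40.0, 60.0]   # illustrative run times
    print(should_duplicate(100.0, samples, c=1))
    print(should_duplicate(10.0, samples, c=1))
```

With one running copy the test only passes when a fresh copy is expected to take less than half the remaining time, which is what keeps duplicates from crowding out pending tasks.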

Duplicate Outliers
◦ Resource-Aware Restart

Workload Imbalance

A quarter of the outlier tasks have more data to process
◦ Unequal key partitions for reduce tasks

Ignoring these is better than duplication

Schedule tasks in descending order of data to process
◦ Time ∝ (Data to Process)
◦ [Graham ‘69] At worst, within 33% of optimal
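Scheduling in descending order of data size is the classic longest-processing-time-first heuristic. A minimal sketch follows, assuming run time is proportional to input size (as the slide states) and identical slots; the function name and return shape are hypothetical.

```python
import heapq


def schedule_longest_first(task_sizes: list[float], num_slots: int):
    """Assign tasks, largest input first, to whichever slot frees up earliest.

    Returns (assignment, makespan), where assignment maps a slot index to the
    task indices it runs, in order.
    """
    slots = [(0.0, s) for s in range(num_slots)]   # (current load, slot index)
    heapq.heapify(slots)
    assignment = {s: [] for s in range(num_slots)}

    order = sorted(enumerate(task_sizes), key=lambda pair: pair[1], reverse=True)
    for task_id, size in order:
        load, slot = heapq.heappop(slots)          # least-loaded slot so far
        assignment[slot].append(task_id)
        heapq.heappush(slots, (load + size, slot))

    makespan = max(load for load, _ in slots)
    return assignment, makespan


if __name__ == "__main__":
    # Illustrative sizes only.
    print(schedule_longest_first([7, 5, 4, 3, 3], num_slots=2))
```

Starting the big tasks first means the small ones fill in the gaps at the end of the phase, instead of a large task starting last and stretching the phase.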

Duplicate Outliers
◦ Resource-Aware Restart

Cognizant of Workload Imbalance
◦ Schedule in descending order of size

Proactive

Reactive

Predict to act early

Be resource-aware

Act based on the cause

Results

Deployed in production Bing clusters

Trace-driven simulations
◦ Mimic workflow, failures, data skew
◦ Compare with existing and idealized schemes

Jobs in the Wild

Act Early: Duplicates issued when task 42% done (77% for Dryad)

Light: Issues fewer copies (0.47x as many as Dryad)

Accurate: 2.8x higher success rate of copies

Jobs faster by 32% at median, consuming fewer resources

Recomputation Avoidance

Eliminates most recomputes with minimal extra resources

(Replication + Speculation) work well in tandem

Network-Aware Placement

Mantri well-approximates the ideal

Bandwidth approximations

Summary

From measurements in a production cluster,
◦ Outliers are a significant problem
◦ They are due to an interplay between storage, network and map-reduce

Mantri: a cause- and resource-aware mitigation

Deployment shows encouraging results

“Reining in the Outliers in MapReduce Clusters using Mantri”, USENIX OSDI 2010
