
Presenters: Abhishek Verma, Nicolas Zea


TRANSCRIPT

Page 1:

Presenters: Abhishek Verma, Nicolas Zea

Page 2:

Map Reduce
- Clean abstraction
- Extremely rigid two-stage group-by aggregation
- Code reuse and maintenance difficult

Google → MapReduce, Sawzall
Yahoo → Hadoop, Pig Latin
Microsoft → Dryad, DryadLINQ
Improving MapReduce in heterogeneous environments

Page 3:

[Figure: MapReduce data flow. Input records are divided into splits; each split goes through a map task, whose output pairs (k1 v1, k1 v3, k2 v2, ...) are locally sorted (Local QSort); the shuffle then routes all values for a key to a single reduce task (k1 v1, k1 v3, k1 v5 to one reducer; k2 v2, k2 v4 to another), which produces the output records.]
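The fixed pipeline in this figure can be sketched in a few lines of Python. This is a minimal illustration of the two-stage group-by aggregation, not any real framework's API; run_mapreduce, map_fn, and reduce_fn are made-up names.

from collections import defaultdict

def run_mapreduce(records, map_fn, reduce_fn):
    # Map phase: each input record emits (key, value) pairs.
    intermediate = []
    for record in records:
        intermediate.extend(map_fn(record))
    # Shuffle: group all values by key (the sort + shuffle in the figure).
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce phase: one reduce call per key group.
    return [reduce_fn(key, values) for key, values in groups.items()]

# Example: count records per key, the canonical group-by aggregation.
pairs = [("k1", "v1"), ("k2", "v2"), ("k1", "v3"), ("k2", "v4"), ("k1", "v5")]
print(run_mapreduce(pairs,
                    map_fn=lambda r: [(r[0], 1)],
                    reduce_fn=lambda k, vs: (k, sum(vs))))
# [('k1', 3), ('k2', 2)]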

Page 4:

- Extremely rigid data flow; other flows hacked in: stages, joins, splits
- Common operations must be coded by hand: join, filter, projection, aggregates, sorting, distinct
- Semantics hidden inside map-reduce functions
- Difficult to maintain, extend, and optimize

[Figure: workflows hacked together as chains and combinations of map (M) and reduce (R) stages]

Page 5:

Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, Andrew Tomkins
Yahoo! Research

Page 6:

Pigs Eat Anything
- Can operate on data without metadata: relational, nested, or unstructured.
Pigs Live Anywhere
- Not tied to one particular parallel framework.
Pigs Are Domestic Animals
- Designed to be easily controlled and modified by its users.
- UDFs: transformation functions, aggregates, grouping functions, and conditionals.
Pigs Fly
- Processes data quickly(?)

Page 7:

- Dataflow language: procedural, different from SQL
- Quick Start and Interoperability
- Nested Data Model
- UDFs as First-Class Citizens
- Parallelism Required
- Debugging Environment

Page 8:

Data Model
- Atom: 'cs'
- Tuple: ('cs', 'ece', 'ee')
- Bag: { ('cs', 'ece'), ('cs') }
- Map: [ 'courses' → { ('523'), ('525'), ('599') } ]

Expressions
- Fields by position: $0
- Fields by name: f1
- Map lookup: #
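As a rough illustration only (not Pig's implementation), the nested data model maps naturally onto Python values, with bags as collections of tuples:

atom = 'cs'                                        # Atom
tup = ('cs', 'ece', 'ee')                          # Tuple
bag = {('cs', 'ece'), ('cs',)}                     # Bag: a collection of tuples
mp = {'courses': {('523',), ('525',), ('599',)}}   # Map: key -> value

print(tup[0])          # field by position, like $0
print(mp['courses'])   # map lookup, like #'courses'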

Page 9:

Find the top 10 most visited pages in each category.

Visits:
User  URL         Time
Amy   cnn.com     8:00
Amy   bbc.com     10:00
Amy   flickr.com  10:05
Fred  cnn.com     12:00

URL Info:
URL         Category  PageRank
cnn.com     News      0.9
bbc.com     News      0.8
flickr.com  Photos    0.7
espn.com    Sports    0.9

Page 10:

Dataflow:
Load Visits → Group by url → Foreach url generate count
Load Url Info
Both branches → Join on url → Group by category → Foreach category generate top10 urls

Page 11:

visits = load '/data/visits' as (user, url, time);
gVisits = group visits by url;
visitCounts = foreach gVisits generate url, count(visits);

urlInfo = load '/data/urlInfo' as (url, category, pRank);

visitCounts = join visitCounts by url, urlInfo by url;

gCategories = group visitCounts by category;
topUrls = foreach gCategories generate top(visitCounts, 10);

store topUrls into '/data/topUrls';

Page 12:

(Same script as Page 11.) Operates directly over files.

Page 13:

(Same script as Page 11.) Schemas are optional and can be assigned dynamically.

Page 14:

(Same script as Page 11.) UDFs can be used in every construct.

Page 15:

- LOAD: specify input data
- FOREACH: per-tuple processing
- FLATTEN: eliminate nesting
- FILTER: discard unwanted data
- COGROUP: get related data together (also GROUP, JOIN)
- STORE: ask for output
- Other: UNION, CROSS, ORDER, DISTINCT

Page 16:

Page 17:

Every group or join operation forms a map-reduce boundary; other operations are pipelined into the map and reduce phases.

[Figure: the Page 10 dataflow compiled into three map-reduce jobs. Map1/Reduce1: Load Visits, Group by url, Foreach url generate count. Map2/Reduce2: Join on url with Load Url Info. Map3/Reduce3: Group by category, Foreach category generate top10 urls.]
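A sketch of this compilation rule in Python, assuming the logical plan is just a list of operator names: every GROUP/COGROUP/JOIN opens a new map-reduce job, and the operators that follow are pipelined into that job's reduce phase (a simplification; the names here are illustrative, not Pig's actual compiler):

MR_BOUNDARY = {'GROUP', 'COGROUP', 'JOIN'}

def split_into_jobs(plan):
    jobs, pending_map_ops = [], []
    for op in plan:
        if op in MR_BOUNDARY:
            # A shuffle boundary starts a new map-reduce job.
            jobs.append({'map': pending_map_ops, 'shuffle': op, 'reduce': []})
            pending_map_ops = []
        elif jobs:
            jobs[-1]['reduce'].append(op)   # pipeline into the current reduce
        else:
            pending_map_ops.append(op)      # pipeline into the first map
    if pending_map_ops:                     # trailing map-only stage, if any
        jobs.append({'map': pending_map_ops, 'shuffle': None, 'reduce': []})
    return jobs

plan = ['LOAD', 'GROUP', 'FOREACH', 'JOIN', 'GROUP', 'FOREACH', 'STORE']
for i, job in enumerate(split_into_jobs(plan), 1):
    print(i, job)
# 1 {'map': ['LOAD'], 'shuffle': 'GROUP', 'reduce': ['FOREACH']}
# 2 {'map': [], 'shuffle': 'JOIN', 'reduce': []}
# 3 {'map': [], 'shuffle': 'GROUP', 'reduce': ['FOREACH', 'STORE']}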

Page 18:

Write-run-debug cycle: sandbox dataset.
Objectives: realism, conciseness, completeness.
Problems: UDFs.

Page 19:

- Optional "safe" query optimizer: performs only high-confidence rewrites
- User interface: boxes-and-arrows UI; promote collaboration, sharing code fragments and UDFs
- Tight integration with a scripting language: use loops and conditionals of the host language

Page 20:

Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Ulfar Erlingsson, Pradeep Kumar Gunda, Jon Currey

Page 21:

[Figure: Dryad system architecture. The job manager holds the job schedule and controls the cluster over the control plane through the name server (NS) and per-machine process daemons (PD); vertices (V) run on cluster machines and exchange data over the data plane via files, TCP, FIFOs, or the network.]

Page 22:

Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);

var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };
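For readers unfamiliar with LINQ, the query above is roughly the following Python comprehension; Record, is_legal, and hash_key are stand-ins for the user-supplied record type, IsLegal, and Hash:

from collections import namedtuple

Record = namedtuple('Record', ['key', 'value'])

def is_legal(key):    # stand-in for the user's IsLegal
    return key >= 0

def hash_key(key):    # stand-in for the user's Hash
    return format(key % 97, 'x')

collection = [Record(1, 'a'), Record(-2, 'b'), Record(3, 'c')]

# where IsLegal(c.key) ... select new { Hash(c.key), c.value }
results = [(hash_key(c.key), c.value) for c in collection if is_legal(c.key)]
print(results)   # [('1', 'a'), ('3', 'c')]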

Page 23:

[Figure: a DryadLINQ Collection of C# objects divided into partitions]

Partitioning: Hash, Range, RoundRobin
Apply, Fork
Hints

Page 24:

(Same query as Page 22.)

[Figure: the C# LINQ expression is compiled into vertex code and a query plan (a Dryad job); the collection flows through C# vertices to produce the results.]

Page 25:

[Figure: DryadLINQ execution flow. On the client machine, invoking a query turns the C# query expression into a distributed query plan, which is handed to the job manager (JM) in the data center; Dryad executes the plan over the input tables to produce output tables; results return to the client as C# objects, consumed via ToDryadTable and foreach.]

Page 26:

LINQ expressions are converted to an execution plan graph (EPG), similar to a database query plan: a DAG annotated with metadata properties. The EPG is the skeleton of the Dryad dataflow graph; as long as native operations are used, properties can propagate, which helps optimization.

Page 27:

- Pipelining: multiple operations in a single process
- Removing redundancy
- Eager aggregation: move aggregations in front of partitionings
- I/O reduction: use TCP and in-memory FIFOs instead of disk

Page 28:

As information from the job becomes available, mutate the execution graph:
- dataset-size-based decisions
- intelligent partitioning of data

Page 29:

Aggregation can be turned into a tree to improve I/O based on locality. Example: if part of the computation is done locally, it can be aggregated before being sent across the network.
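A toy sketch of the idea in Python (assumed names, not DryadLINQ's API): combine partial results in small groups per level, the way rack-local pre-aggregation reduces the data crossing the network.

def tree_aggregate(values, fan_in=4):
    # Combine groups of `fan_in` partial results per level, so each
    # hop moves far less data than a flat aggregation would.
    level = list(values)
    while len(level) > 1:
        level = [sum(level[i:i + fan_in])
                 for i in range(0, len(level), fan_in)]
    return level[0]

print(tree_aggregate(range(100)))   # 4950: same answer as sum(range(100))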

Page 30:

TeraSort scalability:
- 240-computer cluster of 2.6 GHz dual-core AMD Opterons
- Sort 10 billion 100-byte records on a 10-byte key
- Each computer stores 3.87 GB

Page 31:

DryadLINQ vs. Dryad on SkyServer:
- The Dryad version is hand-optimized, with no dynamic-optimization overhead
- DryadLINQ is within 10% of the native code

Page 32:

- High-level and data-type transparent
- Automatic-optimization friendly
- Manual optimizations using the Apply operator
- Leverage any system running the LINQ framework
- Support for interacting with SQL databases
- Single-computer debugging made easy: strong typing, narrow interface; deterministic replay execution

Page 33:

- Dynamic optimizations appear data-intensive: what kind of overhead do they add?
- EPG analysis overhead → high latency
- No real comparison with other systems
- Progress tracking is difficult; no speculation
- Will solid-state drives diminish the advantages of MapReduce?
- Why not use parallel databases?
- MapReduce vs. Dryad?
- How different from Sawzall and Pig?

Page 34:

Language           | Sawzall                 | Pig Latin                   | DryadLINQ
Built by           | Google                  | Yahoo                       | Microsoft
Programming        | Imperative              | Imperative                  | Imperative & Declarative Hybrid
Resemblance to SQL | Least                   | Moderate                    | Most
Execution engine   | Google MapReduce        | Hadoop                      | Dryad
Performance*       | Very efficient          | 5-10 times slower           | 1.3-2 times slower
Implementation     | Internal, inside Google | Open source, Apache License | Internal, inside Microsoft
Model              | Operate per record      | Sequence of MR              | DAGs
Usage              | Log analysis            | + Machine learning          | + Iterative computations

Page 35:

Matei Zaharia, Andy Konwinski, Anthony Joseph, Randy Katz, Ion Stoica

University of California at Berkeley

Page 36:

Speculative tasks are executed only if there are no failed or waiting tasks available. Notion of progress: three phases of execution:

1. Copy phase
2. Sort phase
3. Reduce phase

Each phase is weighted by the percentage of data processed. This determines whether a task has failed or is a straggler available for speculation.
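A sketch of that scoring scheme for a reduce task in Python, with the three phases weighted equally at one third each; the phase names and the phase_fraction argument (fraction of data processed in the current phase) are illustrative:

REDUCE_PHASES = ['copy', 'sort', 'reduce']

def reduce_progress_score(phase, phase_fraction):
    # Completed phases count 1/3 each, plus progress within the current one.
    finished = REDUCE_PHASES.index(phase)
    return (finished + phase_fraction) / len(REDUCE_PHASES)

print(reduce_progress_score('copy', 0.9))     # 0.30 -- still below 1/3
print(reduce_progress_score('sort', 0.0))     # 0.33 -- jumps once copy ends
print(reduce_progress_score('reduce', 0.5))   # 0.83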

Page 37:

1. Nodes can perform work at exactly the same rate

2. Tasks progress at a constant rate throughout time

3. There is no cost to launching a speculative task on an idle node

4. The three phases of execution take approximately the same time

5. Tasks with a low progress score are stragglers

6. Maps and Reduces require roughly the same amount of work

Page 38:

Virtualization breaks down homogeneity:
- On Amazon EC2, multiple VMs share the same physical host
- VMs compete for memory and network bandwidth
- Example: two map tasks can compete for disk bandwidth, causing one to become a straggler

Page 39:

The progress threshold in Hadoop is fixed and assumes that low progress implies a faulty node. Too many speculative tasks are executed, and speculative execution can harm running tasks.

Page 40:

A task's phases are not equal: the copy phase is typically the most expensive due to network communication cost. Many tasks therefore jump rapidly from 1/3 progress to 1, creating fake stragglers; real stragglers get usurped, and unnecessary copying occurs because of the fake stragglers. With the progress-score threshold, anything with more than 80% progress is never speculatively executed.

Page 41:

Longest Approximate Time to End (LATE)

Primary assumption: the best task to speculate is the one that will finish furthest into the future.
Secondary assumption: tasks make progress at an approximately constant rate.

ProgressRate = ProgressScore / T, where T is the time the task has been running.
Estimated time to completion = (1 - ProgressScore) / ProgressRate.
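Putting the two formulas together, a minimal sketch of LATE's ranking in Python (the task fields are assumed names; the real heuristic also applies the caps and thresholds on the next slide):

def time_to_completion(task, now):
    t = now - task['start_time']                   # T: time running so far
    progress_rate = task['progress_score'] / t     # ProgressScore / T
    return (1 - task['progress_score']) / progress_rate

def pick_speculative_task(tasks, now):
    # Speculate on the task estimated to finish furthest in the future.
    return max(tasks, key=lambda task: time_to_completion(task, now))

tasks = [{'name': 't1', 'start_time': 0, 'progress_score': 0.9},
         {'name': 't2', 'start_time': 0, 'progress_score': 0.3}]
print(pick_speculative_task(tasks, now=100)['name'])   # t2, the straggler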

Page 42:

- Launch speculative tasks on fast nodes: the best chance to overcome a straggler, versus using the first available node
- Cap on the total number of speculative tasks
- 'Slowness' minimum threshold
- Does not take data locality into account

Page 43:

Sort benchmark. EC2 test cluster: 1.0-1.2 GHz Opteron/Xeon machines with 1.7 GB memory.

Page 44:

Sort benchmark: manually slowed down 8 VMs with background processes.

Page 45:

Grep and WordCount benchmarks.

Page 46:

Page 47:

Page 48:

1. Make decisions early
2. Use finishing times
3. Nodes are not equal
4. Resources are precious

Page 49:

- Is focusing the work on small VMs fair? Would it be better to pay for a large VM and implement the system with more customized control?
- Could this be used in other systems? Progress tracking is key.
- Is this a fundamental contribution, or just an optimization? Is it "good" research?