
Cluster Computing with Dryad

Mihai Budiu, MSR-SVC

LiveLabs, March 2008

The Dryad Project

http://research.microsoft.com/research/sv/dryad

Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks

Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly

European Conference on Computer Systems (EuroSys), Lisbon, Portugal, March 21-23, 2007

Outline

• Dryad Design
• Implementation
• Policies as Plug-ins
• Building on Dryad

Design Space

[Figure: design space with axes throughput vs. latency and Internet vs. private data center. Labels: data-parallel, shared memory, Dryad, search, transaction.]

Data Partitioning

2-D Piping

• Unix Pipes: 1-D

grep | sed | sort | awk | perl

• Dryad: 2-D

grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50
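The 2-D notation can be read as a list of stages, each with its own number of parallel instances. The following is a minimal single-process Python sketch of that idea, not Dryad's implementation. The stage functions are toy stand-ins for grep, sed, and sort, and the round-robin routing between stages is an assumption, since the slide does not say how items are routed.

```python
from typing import Callable, Iterable, List, Tuple

# A stage is a function over a list of items; its width is the number of
# parallel instances (the superscript in the 2-D pipeline notation).
Stage = Callable[[List[str]], List[str]]


def repartition(parts: List[List[str]], width: int) -> List[List[str]]:
    """Spread all items over `width` instances (round-robin is an assumption)."""
    out: List[List[str]] = [[] for _ in range(width)]
    i = 0
    for part in parts:
        for item in part:
            out[i % width].append(item)
            i += 1
    return out


def run_2d_pipeline(inputs: Iterable[str],
                    stages: List[Tuple[Stage, int]]) -> List[List[str]]:
    """Run each stage on every one of its instances, stage after stage."""
    parts: List[List[str]] = [list(inputs)]
    for stage, width in stages:
        parts = repartition(parts, width)
        parts = [stage(p) for p in parts]
    return parts


# Toy stand-ins for grep, sed, and sort (hypothetical, for illustration only).
def grep_error(items: List[str]) -> List[str]:
    return [s for s in items if "error" in s]


def sed_short(items: List[str]) -> List[str]:
    return [s.replace("error", "E") for s in items]


def sort_items(items: List[str]) -> List[str]:
    return sorted(items)


lines = ["error a", "ok b", "error c", "error b"]
# Analogue of grep^4 | sed^2 | sort^1
result = run_2d_pipeline(lines, [(grep_error, 4), (sed_short, 2), (sort_items, 1)])
```

Each stage here runs sequentially over its instances; in a real cluster the instances of one stage would run on different machines.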

Dryad = Execution Layer

Job (Application) ≈ Pipeline

Cluster ≈ Machine


Virtualized 2-D Pipelines

• 2D DAG
• multi-machine
• virtualized

Dryad Job Structure

[Figure: job graph for grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50. Labels: input files, vertices (processes), channels, stage, output files.]

Channels

Finite Streams of items

• distributed filesystem files (persistent)
• SMB/NTFS files (temporary)
• TCP pipes (inter-machine)
• memory FIFOs (intra-machine)
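Because every channel is a finite stream of items, vertex code can be written against one read/write interface regardless of transport. The sketch below illustrates this with two stand-ins: an in-memory FIFO and a temporary file. The class and method names (`FifoChannel`, `FileChannel`, `write_all`, `read`) are hypothetical and are not Dryad's API.

```python
import os
import queue
import tempfile
from typing import Iterable, Iterator

_END = object()  # sentinel marking the end of a finite stream


class FifoChannel:
    """In-memory FIFO, standing in for an intra-machine channel."""

    def __init__(self) -> None:
        self._q: "queue.Queue[object]" = queue.Queue()

    def write_all(self, items: Iterable[str]) -> None:
        for item in items:
            self._q.put(item)
        self._q.put(_END)  # the stream is finite: mark its end

    def read(self) -> Iterator[str]:
        while True:
            item = self._q.get()
            if item is _END:
                return
            yield item  # type: ignore[misc]


class FileChannel:
    """Temporary file, standing in for a file-backed channel.

    Items are stored one per line, so they must not contain newlines.
    """

    def __init__(self) -> None:
        fd, self.path = tempfile.mkstemp(suffix=".chan")
        os.close(fd)

    def write_all(self, items: Iterable[str]) -> None:
        with open(self.path, "w", encoding="utf-8") as f:
            for item in items:
                f.write(item + "\n")

    def read(self) -> Iterator[str]:
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                yield line.rstrip("\n")


def upper_vertex(channel_in, channel_out) -> None:
    """A vertex that does not care which transport its channels use."""
    channel_out.write_all(item.upper() for item in channel_in.read())


source = FifoChannel()
source.write_all(["x", "y"])
sink = FileChannel()
upper_vertex(source, sink)
received = list(sink.read())
os.remove(sink.path)
```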

Architecture

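[Figure: the job manager sends the job schedule over the control plane to the cluster, which runs a name server (NS) and process daemons (PD). Vertices exchange data over files, TCP, FIFO, and the network on the data plane.]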

Staging

1. Build
2. Send .exe
3. Start JM
4. Query cluster resources
5. Generate graph
6. Initialize vertices
7. Serialize vertices
8. Monitor vertex execution

[Figure labels: JM code, vertex code, cluster services.]

Fault Tolerance

Outline

• Dryad Design
• Implementation
• Policies and Resource Management
• Building on Dryad

Policy Managers

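[Figure: vertices of stage R and stage X, joined by the R-X connection. Inside the job manager, an R manager, an X manager, and an R-X manager handle the two stages and the connection between them.]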

Duplicate Execution Manager

[Figure: stage X with vertices X[0], X[1], X[3], X[2], and X'[2]. Labels: completed vertices, slow vertex, duplicate vertex.]

Duplication Policy = f(running times, data volumes)
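One way to read "f(running times, data volumes)" is as a rule that compares a running vertex's elapsed time with what its input size would predict from the vertices that have already finished. The sketch below is an illustration under that assumption, not Dryad's actual policy. The `slowdown` and `min_completed` thresholds and all names are hypothetical.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class VertexStats:
    name: str
    input_bytes: int
    seconds: float   # total time if completed, elapsed time if still running
    completed: bool


def vertices_to_duplicate(stage: List[VertexStats],
                          slowdown: float = 2.0,
                          min_completed: int = 3) -> List[str]:
    """Return names of running vertices that look slow for their input size."""
    done = [v for v in stage if v.completed and v.input_bytes > 0]
    if len(done) < min_completed:
        return []  # not enough evidence to call anything slow
    # Typical cost in seconds per input byte among completed vertices.
    typical = median(v.seconds / v.input_bytes for v in done)
    return [v.name for v in stage
            if not v.completed
            and v.seconds > slowdown * typical * v.input_bytes]


stage_x = [
    VertexStats("X[0]", 100, 10.0, True),
    VertexStats("X[1]", 100, 10.0, True),
    VertexStats("X[3]", 100, 10.0, True),
    VertexStats("X[2]", 100, 45.0, False),   # same input size, far slower
]
to_duplicate = vertices_to_duplicate(stage_x)
```

Normalizing by data volume keeps a vertex with a large input from being flagged just because it legitimately needs more time.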

Aggregation Manager

[Figure: source vertices S labelled with rack numbers (#1, #2, #1, #3, #3, #2) in the static graph, and grouped by rack number (#1, #2, #3) in the dynamic graph.]
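The rack labels in the figure suggest aggregating first among sources on the same rack and only then across racks. A minimal sketch of that two-level aggregation follows, assuming summation as the aggregation function (the slide does not name one). The function name and input format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def aggregate_by_rack(sources: List[Tuple[int, int]]) -> Tuple[Dict[int, int], int]:
    """Two-level aggregation.

    sources: (rack number, partial value) pairs, one per source vertex.
    Returns the per-rack partial sums and the global total.
    """
    per_rack: Dict[int, int] = defaultdict(int)
    for rack, value in sources:
        per_rack[rack] += value          # first level: within a rack
    total = sum(per_rack.values())       # second level: across racks
    return dict(per_rack), total


# Rack assignment as in the figure: #1 #2 #1 #3 #3 #2 (values are made up).
rack_sums, grand_total = aggregate_by_rack(
    [(1, 1), (2, 2), (1, 3), (3, 4), (3, 5), (2, 6)]
)
```

Only one partial result per rack crosses rack boundaries, which is the point of grouping by rack.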

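Data Distribution (Group By)

[Figure labels: Source, m.]

Range-Distribution Manager

[Figure: in the static graph, the T vertices cover the ranges [0-?) and [?-100), with the split point unknown. In the dynamic graph, the range [0-100) is split into [0-30) and [30-100).]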
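Choosing the split point at run time can be sketched as sampling the keys, picking boundaries at evenly spaced ranks of the sample, and then routing each key to its range. This is an illustration of the general technique, assuming integer keys. The function names are hypothetical, and the sample values are made up so that the split lands at 30 as in the figure.

```python
import bisect
from typing import List, Sequence


def choose_boundaries(sample: Sequence[int], parts: int) -> List[int]:
    """Pick parts-1 split points at evenly spaced ranks of the sorted sample."""
    ordered = sorted(sample)
    return [ordered[(i * len(ordered)) // parts] for i in range(1, parts)]


def range_partition(keys: Sequence[int], boundaries: List[int]) -> List[List[int]]:
    """Send each key to the half-open range [lower, upper) that contains it."""
    out: List[List[int]] = [[] for _ in range(len(boundaries) + 1)]
    for k in keys:
        out[bisect.bisect_right(boundaries, k)].append(k)
    return out


# A skewed sample: half of the keys fall below 30, so the balanced
# split is [0-30), [30-100) rather than [0-50), [50-100).
sample = [5, 10, 20, 25, 28, 30, 40, 60, 80, 95]
boundaries = choose_boundaries(sample, 2)
partitions = range_partition([29, 30, 0, 99], boundaries)
```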

Goal: Declarative Programming

[Figure labels: static, dynamic.]


Software Stack

[Figure: software stack. Components: Windows Server, Cluster Services, Distributed Filesystem, CIFS/NTFS, Distributed Shell, DryadLINQ, Perl, SQL server, SSIS, legacy code (sed, awk, grep, etc.), Queries, Vectors, Machine Learning.]

SkyServer Query 18

[Figure: job graph over inputs U and N.]

    select distinct U.ObjID
    into results
    from photoPrimary U, neighbors N, photoPrimary L
    where U.ObjID = N.ObjID
      and L.ObjID = N.NeighborObjID
      and U.ObjID < L.ObjID
      and abs((U.u-U.g)-(L.u-L.g)) < 0.05
      and abs((U.g-U.r)-(L.g-L.r)) < 0.05
      and abs((U.r-U.i)-(L.r-L.i)) < 0.05
      and abs((U.i-U.z)-(L.i-L.z)) < 0.05

SkyServer Q18 Performance

[Figure: speed-up (times) versus number of computers (0 to 10) for Dryad In-Memory, Dryad Two-pass, and SQLServer 2005.]

DryadLINQ

• Declarative programming
• Integration with Visual Studio
• Integration with .Net
• Type safety
• Automatic serialization
• Job graph optimizations (static, dynamic)
• Conciseness

    Collection<T> collection;
    bool IsLegal(Key k);
    string Hash(Key k);

    var results = from c in collection
                  where IsLegal(c.key)
                  select new { hash = Hash(c.key), c.value };

DryadLINQ = LINQ + Dryad

[Figure: the C# query over collection becomes a query plan (a Dryad job). C# vertex code processes the data and produces results.]

Sort & Map-Reduce in DryadLINQ

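[Figure: two Sort vertices. The range [0-100) is split into [0-30) and [30-100).]

    public static IEnumerable<TSource> DryadSort<TSource, TKey>(
        IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IComparer<TKey> comparer,
        bool isDescending)
    {
        // Sort in the requested direction.
        return isDescending
            ? source.AsParallel().OrderByDescending(keySelector, comparer)
            : source.AsParallel().OrderBy(keySelector, comparer);
    }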

Machine Learning in DryadLINQ

[Figure: stack of DryadLINQ, Large Vector, and Machine learning / Data analysis.]

Very Large Vector Library

PartitionedVector<T>

Scalar<T>

Operations on Large Vectors: Map 1

f preserves partitioning

Map 2 (Pairwise)

Map 3 (Vector-Scalar)

[Figure labels: T, U.]

Reduce (Fold)

[Figure labels: f, U.]
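The four operations can be sketched on a toy partitioned vector held in one process. The class below only borrows the name `PartitionedVector` from the slide. Its methods (`map`, `map_pairwise`, `map_scalar`, `reduce`) are hypothetical and are not the library's API. The reduce assumes an associative function with an identity element, since partial results are computed per partition and then combined.

```python
from functools import reduce
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")
V = TypeVar("V")


class PartitionedVector(Generic[T]):
    """A vector stored as a list of partitions (toy, single-process)."""

    def __init__(self, partitions: List[List[T]]) -> None:
        self.partitions = partitions

    def map(self, f: Callable[[T], U]) -> "PartitionedVector[U]":
        # Map 1: f is applied element by element, so partitioning is preserved.
        return PartitionedVector([[f(x) for x in p] for p in self.partitions])

    def map_pairwise(self, other: "PartitionedVector[U]",
                     f: Callable[[T, U], V]) -> "PartitionedVector[V]":
        # Map 2: both vectors must be partitioned identically.
        if [len(p) for p in self.partitions] != [len(p) for p in other.partitions]:
            raise ValueError("vectors must be partitioned identically")
        return PartitionedVector(
            [[f(a, b) for a, b in zip(p, q)]
             for p, q in zip(self.partitions, other.partitions)])

    def map_scalar(self, scalar: U, f: Callable[[T, U], V]) -> "PartitionedVector[V]":
        # Map 3: the same scalar is made available to every partition.
        return PartitionedVector([[f(x, scalar) for x in p] for p in self.partitions])

    def reduce(self, f: Callable[[T, T], T], zero: T) -> T:
        # Reduce (fold): fold each partition, then fold the partial results.
        partials = [reduce(f, p, zero) for p in self.partitions]
        return reduce(f, partials, zero)


v = PartitionedVector([[1, 2], [3]])
w = PartitionedVector([[10, 20], [30]])
doubled = v.map(lambda x: x * 2).partitions
summed = v.map_pairwise(w, lambda a, b: a + b).partitions
scaled = v.map_scalar(10, lambda x, s: x * s).partitions
total = v.reduce(lambda a, b: a + b, 0)
```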

Linear Algebra

[Equation: T, U, V instantiated as vector and matrix types with dimensions m and n.]

Linear Regression

• Data: $x_t, y_t$ for $t \in \{1, \dots, n\}$
• Find: $A$
• S.t.: $A x_t \approx y_t$

Analytic Solution

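$$A = \Big(\sum_t y_t x_t^T\Big)\Big(\sum_t x_t x_t^T\Big)^{-1}$$

[Figure: partitions X[0], X[1], X[2] and Y[0], Y[1], Y[2] feed per-partition products X×X^T and Y×X^T, each followed by a Reduce.]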

Linear Regression Code

    Vectors x = input(0), y = input(1);
    Matrices xx = x.PairwiseOuterProduct(x);
    OneMatrix xxs = xx.Sum();
    Matrices yx = y.PairwiseOuterProduct(x);
    OneMatrix yxs = yx.Sum();
    OneMatrix xxinv = xxs.Map(a => a.Inverse());
    OneMatrix A = yxs.Map(xxinv, (a, b) => a.Multiply(b));

$$A = \Big(\sum_t y_t x_t^T\Big)\Big(\sum_t x_t x_t^T\Big)^{-1}$$
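The same computation can be traced on a small example: each partition contributes its own sums of outer products, the partial sums are added, and one inverse and one multiply finish the job. The following pure-Python sketch assumes 2-dimensional inputs so that a closed-form 2x2 inverse suffices. The helper names and the data are made up for illustration.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def outer(a: Sequence[float], b: Sequence[float]) -> Matrix:
    """Outer product a b^T."""
    return [[ai * bj for bj in b] for ai in a]


def add(m1: Matrix, m2: Matrix) -> Matrix:
    return [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def matmul(m1: Matrix, m2: Matrix) -> Matrix:
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*m2)]
            for row in m1]


def inverse_2x2(m: Matrix) -> Matrix:
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular matrix")
    return [[d / det, -b / det], [-c / det, a / det]]


def partial_sums(xs: List[List[float]], ys: List[List[float]]) -> Tuple[Matrix, Matrix]:
    """One partition's contribution: (sum of y x^T, sum of x x^T)."""
    yx: Matrix = [[0.0, 0.0] for _ in range(len(ys[0]))]
    xx: Matrix = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in zip(xs, ys):
        yx = add(yx, outer(y, x))
        xx = add(xx, outer(x, x))
    return yx, xx


def fit(partitions: List[Tuple[List[List[float]], List[List[float]]]]) -> Matrix:
    """A = (sum_t y_t x_t^T) (sum_t x_t x_t^T)^-1, summed across partitions."""
    partials = [partial_sums(xs, ys) for xs, ys in partitions]
    yx, xx = partials[0]
    for pyx, pxx in partials[1:]:
        yx, xx = add(yx, pyx), add(xx, pxx)   # the Reduce step
    return matmul(yx, inverse_2x2(xx))


# Data generated from y = 2*x1 - 1*x2, split into two partitions.
data = [
    ([[1.0, 0.0], [0.0, 1.0]], [[2.0], [-1.0]]),
    ([[1.0, 1.0], [2.0, 1.0]], [[1.0], [3.0]]),
]
A = fit(data)
```

Only the small summed matrices leave each partition, so the amount of data combined in the Reduce step does not grow with the number of data points.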

Expectation Maximization (Gaussians)

• 160 lines
• 3 iterations shown

Conclusions

• Dryad = distributed execution environment
• Application-independent (semantics-oblivious)
• Supports rich software ecosystem
  – Relational algebra
  – Map-reduce
  – LINQ
  – Etc.
• DryadLINQ = A Dryad provider for LINQ
• This is only the beginning!

Backup Slides

• Many similarities

Dryad                     Map-Reduce
Execution layer           Exe + app. model
Job = arbitrary DAG       Map+sort+reduce
Plug-in policies          Few policies
Program = graph gen.      Program = map+reduce
Complex (features)        Simple
New (< 2 years)           Mature (> 4 years)
Still growing             Widely deployed
Internal                  Hadoop

Small Cluster Support

[Figure: Sort and Merge vertices. Labels: grouping vertices, fast channels.]

SkyServer DB query

• Took SQL plan
• Manually coded in Dryad
• Manually partitioned data

[Figure: job graph over inputs U and N, annotated with the plan steps below.]

    u: objid, color
    n: objid, neighborobjid
    [partition by objid]

    select u.color, n.neighborobjid
    from u join n
    where u.objid = n.objid

    (u.color, n.neighborobjid)
    [re-partition by n.neighborobjid]
    [order by n.neighborobjid]

    [distinct]
    [merge outputs]

    select u.objid
    from u join <temp>
    where u.objid = <temp>.neighborobjid
      and |u.color - <temp>.color| < d

Optimization

[Figure: job graph over inputs U and N, shown in two steps.]

Query histogram computation

• Input: log file (n partitions)
• Extract queries from log partitions
• Re-partition by hash of query (k buckets)
• Compute histogram within each bucket
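The four steps can be sketched in a single process as follows. The log format (timestamp, tab, query) and the use of CRC32 as the hash are assumptions made for the example, and the function names are hypothetical.

```python
import zlib
from collections import Counter
from typing import List


def extract_queries(log_partition: List[str]) -> List[str]:
    """Assumed log format: 'timestamp<TAB>query'. Lines without a tab are skipped."""
    return [line.split("\t", 1)[1] for line in log_partition if "\t" in line]


def bucket_of(query: str, k: int) -> int:
    """Stable hash, so equal queries always land in the same bucket."""
    return zlib.crc32(query.encode("utf-8")) % k


def histogram(log_partitions: List[List[str]], k: int) -> List[Counter]:
    """Extract queries, re-partition them by hash, and count within each bucket."""
    buckets: List[List[str]] = [[] for _ in range(k)]
    for part in log_partitions:
        for q in extract_queries(part):
            buckets[bucket_of(q, k)].append(q)
    return [Counter(b) for b in buckets]


logs = [
    ["1\tdryad", "2\tlinq"],   # log partition 0
    ["3\tdryad"],              # log partition 1
]
per_bucket = histogram(logs, 4)
merged = sum(per_bucket, Counter())
```

Because all occurrences of a query hash to the same bucket, each bucket's counts are final and the buckets can be processed independently.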

Naïve histogram topology

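P   parse lines
D   hash distribute
S   quicksort
C   count occurrences
MS  merge sort

[Figure: naïve topology, with an "Each ... is:" annotation expanding a vertex into its sub-graph.]

Efficient histogram topology

P   parse lines
D   hash distribute
S   quicksort
C   count occurrences
MS  merge sort
M   non-deterministic merge

[Figure: efficient topology, with an "Each Q' is:" annotation expanding Q' into its sub-graph.]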

Final histogram refinement

[Figure: refined job graph. Vertex counts: R 450, T 217, 10,405, 99,713. Data volumes: 33.4 GB, 118 GB, 154 GB, 10.2 TB.]

• 1,800 computers
• 43,171 vertices
• 11,072 processes
• 11.5 minutes
