cluster computing with dryadlinq mihai budiu, msr-svc parc, may 8 2008


Cluster Computing with DryadLINQ

Mihai Budiu, MSR-SVC PARC, May 8 2008


Acknowledgments

MSR SVC and ISRC SVC

Michael Isard, Yuan Yu, Andrew Birrell, Dennis Fetterly

Ulfar Erlingsson, Pradeep Kumar Gunda, Jon Currey


Computer Evolution

[Figure: computers of 1961 and 2008, with 2040 left as an open question]


Computer Evolution

ENIAC, 1943: 30 tons, 200 kW

Datacenter, 2008: 500,000 ft², 40 MW

2040: ?


2040


Layers

Networking

Storage

Distributed Execution

Scheduling

Resource Management

Applications

Identity & Security

Caching and Synchronization

Programming Languages and APIs

Operating System


Pieces of the Global Computer


This Work


The Rest of This Talk

[Figure: software stack, from the bottom up: Windows Server on each machine, with CIFS/NTFS; Cluster Services; Distributed Filesystem; Dryad; DryadLINQ; Large Vectors; Machine Learning]


How fast can you sort 10^10 100-byte records (1 TB)?

Sequential scan of a single disk: 4.6 hours

Current record: 435 seconds (7.2 min) on a cluster of 40 Itanium2 machines with 2,520 SAN disks

Code: 3300 lines of C

Our result: 349 seconds (5.8 min) on a cluster of 240 AMD64 (quad) machines with 920 disks

Code: 17 lines of LINQ

TeraSort
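The slide's arithmetic can be checked directly. This sketch assumes a sequential disk read rate of 60 MB/s; that rate is not stated in the talk and is chosen only because it reproduces the 4.6-hour figure.

```python
RECORDS = 10**10        # records to sort
RECORD_BYTES = 100      # bytes per record
total_bytes = RECORDS * RECORD_BYTES        # 10**12 bytes = 1 TB

# Assumed sequential read rate of a single disk (not from the talk).
DISK_BYTES_PER_SEC = 60 * 10**6
scan_hours = total_bytes / DISK_BYTES_PER_SEC / 3600

def to_minutes(seconds):
    """Convert a running time in seconds to minutes."""
    return seconds / 60

previous_record_min = to_minutes(435)   # about 7.2 minutes
dryadlinq_min = to_minutes(349)         # about 5.8 minutes
speedup = 435 / 349                     # about 1.25x faster

print(f"{total_bytes} bytes, {scan_hours:.1f} h scan, "
      f"{previous_record_min:.2f} vs {dryadlinq_min:.2f} min")
```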


Outline

• Introduction
• Dryad
• DryadLINQ
• Building on DryadLINQ


Outline

• Introduction
• Dryad
  – deployed since 2006
  – many thousands of machines
  – analyzes many petabytes of data/day
• DryadLINQ
• Building on DryadLINQ


Goal


Design Space

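[Figure: design space with axes throughput vs. latency and Internet vs. private data center; it places Dryad, Search, HPC, Grid, and Transaction processing, and distinguishes data-parallel from shared-memory systems]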


Data Partitioning

[Figure: the data is split into partitions held by different machines, each with its own RAM]


2-D Piping

• Unix pipes: 1-D

    grep | sed | sort | awk | perl

• Dryad: 2-D (the superscript is the number of parallel instances of each stage)

    grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50


Dryad = Execution Layer

Job (application) ≈ Pipeline

Dryad ≈ Shell

Cluster ≈ Machine


Virtualized 2-D Pipelines


• 2-D DAG
• multi-machine
• virtualized


Dryad Job Structure

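[Figure: a Dryad job: input files feed vertices (processes) running programs such as grep, sed, sort, awk, and perl; vertices are connected by channels, vertices of the same kind form a stage, and the last vertices write the output files]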
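A job is a directed acyclic graph whose vertices are programs and whose edges are channels. As a minimal single-process sketch (assuming Python functions stand in for vertex processes and Python lists stand in for channels; `Vertex` and `run_job` are hypothetical names, not Dryad's API), a job can be executed by running each vertex once all of its inputs are available, i.e., in topological order:

```python
from collections import deque

class Vertex:
    """A vertex: a named program that consumes its input channels."""
    def __init__(self, name, fn, inputs=()):
        self.name = name
        self.fn = fn                # fn(list_of_input_streams) -> list of items
        self.inputs = list(inputs)  # upstream vertices, one channel per edge

def run_job(vertices, sources):
    """Run a DAG of vertices in topological order.

    `sources` maps each source vertex's name to its input file contents.
    Returns the output channel contents of every vertex, keyed by name.
    """
    indegree = {v.name: len(v.inputs) for v in vertices}
    consumers = {v.name: [] for v in vertices}
    for v in vertices:
        for u in v.inputs:
            consumers[u.name].append(v)
    ready = deque(v for v in vertices if indegree[v.name] == 0)
    outputs = {}
    while ready:
        v = ready.popleft()
        streams = [outputs[u.name] for u in v.inputs] if v.inputs else [sources[v.name]]
        outputs[v.name] = v.fn(streams)
        for w in consumers[v.name]:          # a consumer becomes runnable
            indegree[w.name] -= 1            # once all its inputs are done
            if indegree[w.name] == 0:
                ready.append(w)
    return outputs

def grep(pattern):
    return lambda streams: [line for s in streams for line in s if pattern in line]

# Two grep vertices (one per input file) feeding one sort vertex.
g0 = Vertex("grep0", grep("err"))
g1 = Vertex("grep1", grep("err"))
srt = Vertex("sort", lambda streams: sorted(x for s in streams for x in s), [g0, g1])
out = run_job([g0, g1, srt], {"grep0": ["err b", "ok"], "grep1": ["err a"]})
```

In Dryad the vertices of one stage run in parallel on different machines; the sequential loop here only illustrates the dependency order.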


Channels

[Figure: a channel carrying items between vertices X and M]

Channels are finite streams of items, implemented as:

• distributed filesystem files (persistent)
• SMB/NTFS files (temporary)
• TCP pipes (inter-machine)
• memory FIFOs (intra-machine)


Architecture

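[Figure: Dryad architecture. Control plane: the job manager sends the job schedule to the cluster, using the name server (NS) and the process daemons (PD) on each machine to start vertices (V). Data plane: vertices exchange data through files, TCP, FIFOs, and the network.]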


Fault Tolerance


Dynamic Graph Rewriting

[Figure: vertices X[0], X[1], and X[3] have completed; X[2] is a slow vertex, so a duplicate vertex X’[2] is started]

Duplication policy = f(running times, data volumes)
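The slide only says the policy is a function of running times and data volumes. One hypothetical instance of such a policy (the median-based rule and the `slack` factor are assumptions for illustration, not Dryad's actual policy): estimate seconds per byte from completed vertices, and duplicate a running vertex whose elapsed time exceeds a multiple of what its input size predicts.

```python
from statistics import median

def vertices_to_duplicate(completed, running, slack=2.0):
    """Pick slow vertices to re-execute on another machine.

    completed: {name: (seconds, bytes_read)} for finished vertices
    running:   {name: (elapsed_seconds, bytes_to_read)} for running ones
    slack:     hypothetical tolerance factor
    """
    if not completed:
        return []                       # no baseline to compare against yet
    # Median processing cost (seconds per byte) of the finished vertices.
    rate = median(t / max(b, 1) for t, b in completed.values())
    slow = []
    for name, (elapsed, size) in running.items():
        expected = rate * max(size, 1)  # time this vertex "should" need
        if elapsed > slack * expected:
            slow.append(name)
    return sorted(slow)

completed = {"X[0]": (10, 100), "X[1]": (12, 100), "X[3]": (11, 100)}
print(vertices_to_duplicate(completed, {"X[2]": (30, 100)}))
```

Normalizing by data volume keeps a vertex that simply has more input from being mistaken for a straggler.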


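Dynamic Aggregation

[Figure: statically, all source vertices S feed a single aggregation vertex T; dynamically, the sources are grouped by rack (#1, #2, #3) and one intermediate aggregator A per rack combines them before T]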
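The dynamic rewrite can be sketched as a grouping step performed once the rack of each source vertex is known at run time. In this sketch the vertex names, the rack assignment, and the helpers `build_aggregation_tree` and `aggregate` are hypothetical; the rack pattern mirrors the figure (#1, #2, #1, #3, #3, #2). It assumes the combining function is associative, so aggregating per rack first does not change the result.

```python
from collections import defaultdict

def build_aggregation_tree(source_racks):
    """Insert one aggregator per rack.

    source_racks: {source_name: rack_id}, known only at run time.
    Returns {aggregator_name: [sources]}; every aggregator feeds T.
    """
    groups = defaultdict(list)
    for src, rack in source_racks.items():
        groups[f"A[rack {rack}]"].append(src)
    return {agg: sorted(srcs) for agg, srcs in sorted(groups.items())}

def aggregate(values_by_source, tree, combine=sum):
    """Combine rack-locally (A vertices), then globally (T vertex)."""
    partials = [combine(values_by_source[s] for s in srcs) for srcs in tree.values()]
    return combine(partials)

racks = {"S0": 1, "S1": 2, "S2": 1, "S3": 3, "S4": 3, "S5": 2}
tree = build_aggregation_tree(racks)
```

Only one partial result per rack then crosses the core network instead of one per source.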


Data-Parallel Computation

[Figure: storage, execution, and application layers of data-parallel systems: parallel databases; Map-Reduce with GFS and BigTable; Dryad]


Outline

• Introduction
• Dryad
• DryadLINQ
• Building on Dryad


DryadLINQ

Dryad


LINQ

    Collection<T> collection;
    bool IsLegal(Key k);
    string Hash(Key k);

    // An anonymous type needs a member name for a computed value.
    var results = from c in collection
                  where IsLegal(c.key)
                  select new { hash = Hash(c.key), c.value };



DryadLINQ = LINQ + Dryad

[Figure: DryadLINQ turns the C# query into a query plan (a Dryad job) plus C# vertex code; the vertices run over the data of collection and produce results]


Data Model

[Figure: a collection consists of partitions, and each partition holds C# objects]


Query Providers

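[Figure: on the client machine, a C# query expression is handed to DryadLINQ when the program invokes ToDryadTable or iterates over the results with foreach; DryadLINQ generates a distributed query plan, which a job manager (JM) executes in the data center as a Dryad job over the input tables; the output tables come back as an output DryadTable whose results are C# objects]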


Demo


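Example: Histogram

    using System.Linq;

    // Pair and LineRecord are the talk's record types (definitions not shown).
    public static IQueryable<Pair> Histogram(
        IQueryable<LineRecord> input, int k)
    {
        var words = input.SelectMany(x => x.line.Split(' '));          // split lines into words
        var groups = words.GroupBy(x => x);                            // group identical words
        var counts = groups.Select(x => new Pair(x.Key, x.Count()));   // count each group
        var ordered = counts.OrderByDescending(x => x.count);          // most frequent first
        var top = ordered.Take(k);                                     // keep the top k
        return top;
    }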

Input line: “A line of words of wisdom”

SelectMany: [“A”, “line”, “of”, “words”, “of”, “wisdom”]

GroupBy: [[“A”], [“line”], [“of”, “of”], [“words”], [“wisdom”]]

Select: [ {“A”, 1}, {“line”, 1}, {“of”, 2}, {“words”, 1}, {“wisdom”, 1}]

OrderByDescending: [{“of”, 2}, {“A”, 1}, {“line”, 1}, {“words”, 1}, {“wisdom”, 1}]

Take(3): [{“of”, 2}, {“A”, 1}, {“line”, 1}]
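For readers without a .NET setup, the same query can be mirrored on a single machine in Python (a sketch for following the trace, not DryadLINQ code; `histogram` is a hypothetical name). Python's `sorted` is stable even with `reverse=True`, as LINQ to Objects' OrderByDescending is, so ties keep their first-seen order and the result matches the trace for k = 3.

```python
from collections import Counter

def histogram(lines, k):
    """Return the k most frequent words as (word, count) pairs."""
    words = [w for line in lines for w in line.split(" ")]   # SelectMany
    counts = Counter(words)             # GroupBy + Select(Count); keeps first-seen order
    ordered = sorted(counts.items(), key=lambda p: p[1], reverse=True)  # OrderByDescending
    return ordered[:k]                                       # Take(k)

print(histogram(["A line of words of wisdom"], 3))
# [('of', 2), ('A', 1), ('line', 1)]
```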


Histogram Plan

Stage 1: SelectMany, HashDistribute

Stage 2: Merge, GroupBy, Select, OrderByDescending, Take

Stage 3: MergeSort, Take
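The three stages can be simulated to see why the plan is correct: hash distribution sends every occurrence of a word to the same bucket, so each bucket's local top k contains every word that can appear in the global top k, and a final merge of the sorted local lists finishes the job. In this sketch `nparts`, the helper names, and CRC-32 as the hash function are assumptions.

```python
import heapq
import zlib
from collections import Counter
from itertools import islice

def bucket_of(word, nparts):
    """Deterministic hash partitioning (CRC-32 stands in for the real hash)."""
    return zlib.crc32(word.encode("utf-8")) % nparts

def distributed_histogram(partitions, k, nparts=4):
    # Stage 1 (per input partition): SelectMany + HashDistribute.
    buckets = [[] for _ in range(nparts)]
    for lines in partitions:
        for line in lines:
            for word in line.split(" "):
                buckets[bucket_of(word, nparts)].append(word)
    # Stage 2 (per bucket): Merge + GroupBy + Select + OrderByDescending + Take.
    local_tops = [Counter(b).most_common(k) for b in buckets]
    # Stage 3 (single vertex): MergeSort of the sorted local lists + Take.
    merged = heapq.merge(*local_tops, key=lambda p: p[1], reverse=True)
    return list(islice(merged, k))

print(distributed_histogram([["a b c a"], ["b a d a"]], 2))
```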


Map-Reduce in DryadLINQ

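    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Linq.Expressions;

    // Declare inside a static class so it can be called as an extension method.
    public static IQueryable<S> MapReduce<T, M, K, S>(
        this IQueryable<T> input,
        Expression<Func<T, IEnumerable<M>>> mapper,
        Expression<Func<M, K>> keySelector,
        Expression<Func<IGrouping<K, M>, S>> reducer)
    {
        var map = input.SelectMany(mapper);      // map: each T yields many M
        var group = map.GroupBy(keySelector);    // group the M values by key
        var result = group.Select(reducer);      // reduce each group to one S
        return result;
    }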
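The same three-operator composition can be written generically in Python. This is a single-machine sketch: `map_reduce` is a hypothetical name that mirrors the C# signature, with a dictionary standing in for GroupBy and the reducer receiving the group's key and items (as IGrouping exposes Key).

```python
from collections import defaultdict

def map_reduce(inputs, mapper, key_selector, reducer):
    """SelectMany(mapper), then GroupBy(key_selector), then Select(reducer)."""
    mapped = (m for x in inputs for m in mapper(x))            # SelectMany
    groups = defaultdict(list)                                  # GroupBy
    for m in mapped:
        groups[key_selector(m)].append(m)
    return [reducer(key, items) for key, items in groups.items()]  # Select

# Word count expressed as map-reduce.
counts = map_reduce(
    ["a rose is a rose"],
    mapper=lambda line: line.split(),
    key_selector=lambda w: w,
    reducer=lambda key, ws: (key, len(ws)),
)
```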


Map-Reduce Plan

[Figure: Map-Reduce plan in three phases: (1) map and partial aggregation: map (M), sort (Q), groupby (G1), reduce (R), distribute (D); (2) intermediate aggregation, inserted dynamically as in dynamic aggregation: mergesort (MS), groupby (G2), reduce (R); (3) reduce: mergesort (MS), groupby (G2), reduce (R), feeding the consumer (X)]
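The central optimization in this plan is partial aggregation: each map vertex reduces its own output (G1, R) before distributing it, so only one record per key per partition crosses the network. This is only valid when the reduction is associative, as counting is. A sketch under that assumption (the function names and CRC-32 routing are hypothetical):

```python
import zlib
from collections import Counter

def map_side(lines):
    """M + Q + G1 + R: count words locally (partial aggregation)."""
    return Counter(w for line in lines for w in line.split())

def distribute(partial, nreducers):
    """D: route each (word, partial count) to a reducer by hash of the word."""
    out = [Counter() for _ in range(nreducers)]
    for word, n in partial.items():
        out[zlib.crc32(word.encode("utf-8")) % nreducers][word] += n
    return out

def word_count(partitions, nreducers=3):
    inboxes = [Counter() for _ in range(nreducers)]
    for lines in partitions:
        for r, chunk in enumerate(distribute(map_side(lines), nreducers)):
            inboxes[r].update(chunk)      # MS + G2 + R: add up partial counts
    total = Counter()
    for inbox in inboxes:
        total.update(inbox)               # X: the consumer collects all results
    return total
```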


Distributed Sorting in DryadLINQ

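    // Apply and RangePartition are DryadLINQ operators; Sampling and
    // ComputeKeys are user-supplied helpers not shown on the slide.
    public static IQueryable<TSource> DSort<TSource, TKey>(
        this IQueryable<TSource> source,
        Expression<Func<TSource, TKey>> keySelector,
        int pcount)
    {
        var samples = source.Apply(x => Sampling(x));            // sample the input
        var keys = samples.Apply(x => ComputeKeys(x, pcount));   // choose range boundaries
        var parts = source.RangePartition(keySelector, keys);    // redistribute by key range
        return parts.OrderBy(keySelector);                       // sort within each range
    }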
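Sampling is what keeps the range partitions balanced. A single-machine Python sketch of the same idea follows; the slide does not show `Sampling` or `ComputeKeys`, so `compute_keys`, `dsort`, `sample_rate`, and the even-quantile boundary rule are assumptions.

```python
import bisect
import random

def compute_keys(samples, pcount):
    """Pick pcount - 1 boundary keys that split the sorted samples evenly."""
    s = sorted(samples)
    return [s[(i * len(s)) // pcount] for i in range(1, pcount)]

def dsort(records, key, pcount, sample_rate=0.1, seed=0):
    """Return pcount partitions; their concatenation is sorted by key."""
    if not records:
        return [[] for _ in range(pcount)]
    rng = random.Random(seed)
    samples = [key(r) for r in records if rng.random() < sample_rate]
    if not samples:                       # tiny inputs: use every key
        samples = [key(r) for r in records]
    bounds = compute_keys(samples, pcount)
    parts = [[] for _ in range(pcount)]
    for r in records:                     # RangePartition
        parts[bisect.bisect_right(bounds, key(r))].append(r)
    return [sorted(p, key=key) for p in parts]   # OrderBy within each partition
```

Because partition i only receives keys between boundaries i - 1 and i, sorting each partition independently yields a globally sorted result.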


Distributed Sorting Plan

[Figure: distributed sorting plan in three stages (1), (2), (3), built from vertices labeled DS, H, D, M, S, and O]


Outline

• Introduction
• Dryad
• DryadLINQ
• Building on DryadLINQ


Machine Learning in DryadLINQ

[Figure: machine learning and data analysis on top of a Large Vector library, which runs on DryadLINQ over Dryad]


Operations on Large Vectors: Map 1

[Figure: f is applied to each element of vector T, producing vector U]

f preserves partitioning
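A partitioned vector can be modeled as a list of partitions. Under that assumption (`PVector` is a hypothetical class, not the library's PartitionedVector<T> API), an element-wise map runs independently on each partition, which is exactly why it preserves partitioning:

```python
class PVector:
    """A vector stored as a list of partitions (lists of elements)."""
    def __init__(self, partitions):
        self.partitions = [list(p) for p in partitions]

    def map(self, f):
        # Partition i of the result is computed only from partition i
        # of the input, so no data moves between machines.
        return PVector([[f(x) for x in p] for p in self.partitions])

t = PVector([[1, 2], [3, 4, 5]])
u = t.map(lambda x: x * x)
print(u.partitions)   # [[1, 4], [9, 16, 25]]
```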


Map 2 (Pairwise)

[Figure: f combines corresponding elements of vectors T and U, producing vector V]
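A pairwise map is cheap when both vectors are partitioned identically, since corresponding elements already live on the same machine. This sketch assumes identical partitioning and simply rejects anything else (a real system would have to repartition first); `map_pairwise` is a hypothetical name.

```python
def map_pairwise(t_parts, u_parts, f):
    """V[i][j] = f(T[i][j], U[i][j]) for identically partitioned T and U."""
    if [len(p) for p in t_parts] != [len(p) for p in u_parts]:
        raise ValueError("vectors must have identical partitioning")
    return [[f(a, b) for a, b in zip(tp, up)] for tp, up in zip(t_parts, u_parts)]

v = map_pairwise([[1, 2], [3]], [[10, 20], [30]], lambda a, b: a + b)
```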


Map 3 (Vector-Scalar)

[Figure: f combines each element of vector T with the single scalar U, producing vector V]
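In a vector-scalar map the scalar is one value that every partition needs, so it is broadcast rather than partitioned. A sketch assuming the scalar is small enough to copy to each partition; `map_scalar` is a hypothetical name.

```python
def map_scalar(t_parts, scalar, f):
    """Apply f(element, scalar) to every element; scalar is sent to each partition."""
    return [[f(a, scalar) for a in p] for p in t_parts]

v = map_scalar([[1, 2], [3]], 10, lambda a, s: a * s)
```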


Reduce (Fold)

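[Figure: f first folds the elements of each partition into a partial result of type U, then f combines the partial results into a single U]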
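The two-level fold is correct only when f is associative and the starting value is its identity; under that assumption each partition can be folded in parallel and the partial results folded once more. `fold` is a hypothetical name for this sketch.

```python
from functools import reduce

def fold(parts, f, zero):
    """Fold each partition locally, then fold the partial results.

    Correct when f is associative and zero is its identity element.
    """
    partials = [reduce(f, p, zero) for p in parts]   # one fold per partition
    return reduce(f, partials, zero)                  # final fold over partials

total = fold([[1, 2], [3, 4, 5]], lambda a, b: a + b, 0)
```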


Linear Algebra

[Figure: the element types T, U, and V range over vectors and matrices with dimensions n and m]


Linear Regression

• Data: $x_t \in \mathbb{R}^n$, $y_t \in \mathbb{R}^m$, $t \in \{1, \dots, n\}$

• Find: $A \in \mathbb{R}^{m \times n}$

• S.t.: $A x_t \approx y_t$


Analytic Solution

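$A = \Big(\sum_t y_t x_t^T\Big)\Big(\sum_t x_t x_t^T\Big)^{-1}$

[Figure: Map computes X[i]×X[i]^T and Y[i]×X[i]^T for each partition i = 0, 1, 2; Reduce sums (Σ) each set of products; the sum of the Y×X^T terms is multiplied (*) by the inverse ([ ]^-1) of the sum of the X×X^T terms, giving A]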
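The closed form is the least-squares solution. Assuming $\sum_t x_t x_t^T$ is invertible, it follows from setting the gradient of the squared error to zero:

```latex
\begin{aligned}
E(A) &= \sum_t \lVert A x_t - y_t \rVert^2 \\
\frac{\partial E}{\partial A} &= 2 \sum_t (A x_t - y_t)\, x_t^T = 0 \\
A \sum_t x_t x_t^T &= \sum_t y_t x_t^T \\
A &= \Big(\sum_t y_t x_t^T\Big)\Big(\sum_t x_t x_t^T\Big)^{-1}
\end{aligned}
```

Both sums decompose over partitions, which is what lets the Map stage compute local sums and the Reduce stage add them.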


Linear Regression Code

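    // x_t and y_t vectors, partitioned across the cluster
    Vectors x = input(0), y = input(1);
    // Map + Reduce: sum of outer products x_t x_t^T
    Matrices xx = x.Map(x, (a, b) => a.OuterProd(b));
    OneMatrix xxs = xx.Sum();
    // Map + Reduce: sum of outer products y_t x_t^T
    Matrices yx = y.Map(x, (a, b) => a.OuterProd(b));
    OneMatrix yxs = yx.Sum();
    // A = (sum y x^T)(sum x x^T)^-1
    OneMatrix xxinv = xxs.Map(a => a.Inverse());
    OneMatrix A = yxs.Map(xxinv, (a, b) => a.Mult(b));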

$A = \Big(\sum_t y_t x_t^T\Big)\Big(\sum_t x_t x_t^T\Big)^{-1}$
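The same map/reduce structure can be sketched in pure Python with lists of rows as matrices. Everything here (the helper names, the small dense-matrix code, the Gauss-Jordan inverse) is an illustrative assumption rather than the vector library's API, and each partition is assumed to be non-empty.

```python
from functools import reduce

def outer(a, b):
    """Outer product a b^T of two vectors, as a list of rows."""
    return [[ai * bj for bj in b] for ai in a]

def add(m1, m2):
    """Element-wise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

def mult(m1, m2):
    """Matrix product m1 m2."""
    cols = list(zip(*m2))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in m1]

def inverse(m):
    """Inverse of a square, non-singular matrix (Gauss-Jordan, partial pivoting)."""
    n = len(m)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))   # pivot row
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def partition_sums(xs, ys):
    """Map: sums of x x^T and y x^T over the vectors of one partition."""
    xx = reduce(add, (outer(x, x) for x in xs))
    yx = reduce(add, (outer(y, x) for x, y in zip(xs, ys)))
    return xx, yx

def linear_regression(x_parts, y_parts):
    """A = (sum_t y_t x_t^T)(sum_t x_t x_t^T)^-1 over partitioned data."""
    partials = [partition_sums(xs, ys) for xs, ys in zip(x_parts, y_parts)]
    xxs = reduce(add, (xx for xx, _ in partials))    # Reduce: sum of x x^T
    yxs = reduce(add, (yx for _, yx in partials))    # Reduce: sum of y x^T
    return mult(yxs, inverse(xxs))
```

Only the n×n and m×n partial sums leave each partition, so the data moved is independent of the number of samples.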


Expectation Maximization (Gaussians)

• 160 lines
• 3 iterations shown


Conclusions

• Dryad = distributed execution environment
• Application-independent (semantics oblivious)
• Supports rich software ecosystem
  – Relational algebra, Map-reduce, LINQ
• DryadLINQ = compiles LINQ to Dryad
• C# objects and declarative programming
• .Net and Visual Studio for parallel programming


Backup Slides


Software Stack

[Figure: software stack, from the bottom up: Windows Server on each machine, with CIFS/NTFS; Cluster Services; Distributed Filesystem; Dryad; on top of Dryad: a Distributed Shell running legacy code (sed, awk, grep, etc.), PSQL, Perl, SQL Server, SSIS, Scope, C++, and DryadLINQ (C#); on top of DryadLINQ: Vectors and Machine Learning (C#); alongside: job queueing and monitoring]


Very Large Vector Library

[Figure: PartitionedVector<T> holds many elements of type T spread across partitions; Scalar<T> holds a single T]


DryadLINQ

• Declarative programming
• Integration with Visual Studio
• Integration with .Net
• Type safety
• Automatic serialization
• Job graph optimizations (static and dynamic)
• Conciseness


Sort & Map-Reduce in DryadLINQ


Dryad vs. Map-Reduce: many similarities

Dryad
• Execution layer
• Job = arbitrary DAG
• Plug-in policies
• Program = graph generator
• Complex (more features)
• New (< 2 years)
• Still growing
• Internal

Map-Reduce
• Execution + application model
• Map + sort + reduce
• Few policies
• Program = map + reduce
• Simple
• Mature (> 4 years)
• Widely deployed
• Hadoop


PLINQ

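    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static IEnumerable<TSource> DryadSort<TSource, TKey>(
        IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IComparer<TKey> comparer,
        bool isDescending)
    {
        // PLINQ sorts the local partition using all cores of the machine,
        // in the direction requested by isDescending.
        var parallel = source.AsParallel();
        return isDescending
            ? parallel.OrderByDescending(keySelector, comparer)
            : parallel.OrderBy(keySelector, comparer);
    }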


Query histogram computation

• Input: log file (n partitions)
• Extract queries from log partitions
• Re-partition by hash of query (k buckets)
• Compute histogram within each bucket
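The four steps map directly onto code. In this single-machine sketch the log format (the query follows a tab) and the names `extract_query` and `query_histogram` are assumptions; CRC-32 stands in for the hash.

```python
import zlib
from collections import Counter

def extract_query(line):
    """Hypothetical log format: '<timestamp>\t<query>'."""
    return line.split("\t", 1)[1].strip()

def query_histogram(log_partitions, k):
    # Extract queries from each of the n log partitions and re-partition
    # them by a hash of the query into k buckets.
    buckets = [[] for _ in range(k)]
    for part in log_partitions:
        for line in part:
            q = extract_query(line)
            buckets[zlib.crc32(q.encode("utf-8")) % k].append(q)
    # Compute the histogram within each bucket; no query spans two buckets,
    # so the per-bucket histograms together form the complete answer.
    return [Counter(b) for b in buckets]

hist = query_histogram([["1\tcats", "2\tdogs"], ["3\tcats"]], 4)
```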


Naïve histogram topology

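[Figure: n Q vertices, one per log partition, each send data to all k R vertices. Each Q is composed of P, S, C, and D vertices; each R is composed of MS and C vertices]

P: parse lines; D: hash distribute; S: quicksort; C: count occurrences; MS: merge sort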


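Efficient histogram topology

[Figure: n Q' vertices feed T vertices, which feed k R vertices. Each Q' is composed of M, P, S, and C vertices; each T of MS, C, and D vertices; each R of MS and C vertices]

P: parse lines; D: hash distribute; S: quicksort; C: count occurrences; MS: merge sort; M: non-deterministic merge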


Final histogram refinement

[Figure: final topology Q' → T → R → R, annotated with the counts 99,713, 10,405, 217, 450, and 450 and the data volumes 10.2 TB, 154 GB, 118 GB, and 33.4 GB]

1,800 computers, 43,171 vertices, 11,072 processes, 11.5 minutes


Data Distribution (Group By)

[Figure: redistributing data from m source vertices to n destination vertices uses m × n channels]


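Range-Distribution Manager

[Figure: statically, the key ranges [0-?) and [?-100) of the two T vertices are unknown; dynamically, a histogram vertex (Hist) built from the source vertices S splits [0-100) into [0-30) and [30-100), and the distribute vertices (D) route records to T accordingly]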


Goal: Declarative Programming

[Figure: a static plan of single X, S, and T vertices is expanded dynamically into many X, S, and T vertices]


Staging

1. Build the JM code and the vertex code
2. Send .exe
3. Start JM
4. Query cluster resources (cluster services)
5. Generate graph
6. Initialize vertices (cluster services)
7. Serialize vertices
8. Monitor vertex execution


SkyServer Query 18

[Figure: Dryad job graph for Query 18, with vertex stages labeled U, N, L, X, Y, S, M, D, and H; stage widths of n and 4n vertices]

    select distinct U.ObjID
    into results
    from photoPrimary U,
         neighbors N,
         photoPrimary L
    where U.ObjID = N.ObjID
      and L.ObjID = N.NeighborObjID
      and U.ObjID < L.ObjID
      and abs((U.u-U.g)-(L.u-L.g)) < 0.05
      and abs((U.g-U.r)-(L.g-L.r)) < 0.05
      and abs((U.r-U.i)-(L.r-L.i)) < 0.05
      and abs((U.i-U.z)-(L.i-L.z)) < 0.05


SkyServer Q18 Performance

[Figure: speed-up (times, 0 to 16) versus number of computers (0 to 10) for Dryad In-Memory, Dryad Two-pass, and SQLServer 2005]