Motivation and Goal Dryad Some others An improvement: Ciel Comparison Conclusion

Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks

Course: CS655

Rabiet Louis

Colorado State University

Thursday 19 September 2013

1 Motivation and Goal
    Why use Dryad?

2 Dryad
    Graph
    Example
    Properties

3 Some others
    Ants
    Pegasus

4 An improvement: Ciel
    Dynamic tasks

5 Comparison

6 Conclusion

Why use Dryad?

Parallel Programming

Goal: make parallel programs easier for developers.

Current choices:

Google’s MapReduce, a popular framework for developing cloud-scale applications.

GPU shader languages.

Parallel databases.

Solution and Results

How else can we achieve that? Use a graph to describe the scheduling of jobs and the relations between them.

The system handles the rest, i.e. it manages the execution of the stages that compose the graph.

Results: the approach can be both efficient and simpler.

System overview

Graph

A Dryad program is a graph.

Several operators over graphs can be specified:

Adding and cloning vertices

Adding edges

Merging two graphs
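The operators listed above can be sketched in a few lines of Python. This is only an illustration, not Dryad's C++ API: the class `Graph` and the method names `clone`, `pointwise`, `bipartite` and `merge` are hypothetical, and the round-robin rule used for pointwise edges is an assumption of this sketch.

```python
from dataclasses import dataclass


def _union(xs, ys):
    """Concatenate two tuples, dropping duplicates but keeping order."""
    return tuple(dict.fromkeys(xs + ys))


@dataclass(frozen=True)
class Graph:
    vertices: tuple
    edges: frozenset = frozenset()
    inputs: tuple = ()
    outputs: tuple = ()

    @staticmethod
    def vertex(name):
        """A graph made of a single vertex that is both input and output."""
        return Graph((name,), frozenset(), (name,), (name,))

    def clone(self, n):
        """n copies of a single-vertex graph (cloning vertices)."""
        (name,) = self.vertices
        names = tuple(f"{name}{k}" for k in range(n))
        return Graph(names, frozenset(), names, names)

    def _compose(self, other, new_edges):
        return Graph(_union(self.vertices, other.vertices),
                     self.edges | other.edges | frozenset(new_edges),
                     self.inputs, other.outputs)

    def pointwise(self, other):
        """Add edges output-to-input, one by one (round-robin if sizes differ)."""
        outs, ins = self.outputs, other.inputs
        k = max(len(outs), len(ins))
        return self._compose(
            other, ((outs[j % len(outs)], ins[j % len(ins)]) for j in range(k)))

    def bipartite(self, other):
        """Add an edge from every output of self to every input of other."""
        return self._compose(
            other, ((o, i) for o in self.outputs for i in other.inputs))

    def merge(self, other):
        """Merge two graphs; vertices with the same name are identified."""
        return Graph(_union(self.vertices, other.vertices),
                     self.edges | other.edges,
                     _union(self.inputs, other.inputs),
                     _union(self.outputs, other.outputs))
```

For instance, `Graph.vertex("A").clone(2).pointwise(Graph.vertex("B").clone(2))` yields the two edges A0→B0 and A1→B1, while `bipartite` on the same operands yields all four edges.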

Merge

Merge with bypass

Asymmetric join

Example

Code example

select distinct p.objID
from photoObjAll p
join neighbors n                 -- call this join "X"
  on p.objID = n.objID
  and n.objID < n.neighborObjID
  and p.mode = 1
join photoObjAll l               -- call this join "Y"
  on l.objid = n.neighborObjID
  and l.mode = 1
  and abs((p.u-p.g)-(l.u-l.g))<0.05
  and abs((p.g-p.r)-(l.g-l.r))<0.05
  and abs((p.r-p.i)-(l.r-l.i))<0.05
  and abs((p.i-p.z)-(l.i-l.z))<0.05

Corresponding graph

Corresponding vertex program

GraphBuilder XSet = moduleX^N;
GraphBuilder DSet = moduleD^N;
GraphBuilder MSet = moduleM^(N*4);
GraphBuilder SSet = moduleS^(N*4);
GraphBuilder YSet = moduleY^N;
GraphBuilder HSet = moduleH^1;

GraphBuilder XInputs = (ugriz1 >= XSet) || (neighbor >= XSet);
GraphBuilder YInputs = ugriz2 >= YSet;
GraphBuilder XToY = XSet >= DSet >> MSet >= SSet;
for (i = 0; i < N*4; ++i)
{
    XToY = XToY || (SSet.GetVertex(i) >= YSet.GetVertex(i/4));
}
GraphBuilder YToH = YSet >= HSet;
GraphBuilder HOutputs = HSet >= output;
GraphBuilder final = XInputs || YInputs || XToY || YToH || HOutputs;

Properties

Some nice properties

Channels: the communication mechanism can be chosen among

Files

TCP pipes

Shared-memory FIFOs

Support for legacy applications

Vertices can execute existing code that was not specifically written for Dryad.

Runtime aggregation

Fault tolerance: restart the failed vertex.
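Restarting a failed vertex can be pictured as a bounded retry loop. The sketch below is a simplification under stated assumptions: `run_vertex` and `max_attempts` are hypothetical names, the vertex is modelled as a plain deterministic function of its inputs, and nothing here reflects how Dryad's job manager is actually implemented.

```python
def run_vertex(vertex, inputs, max_attempts=3):
    """Run a vertex, re-executing it from its inputs if it fails.

    Assumes the vertex is deterministic, so a re-execution produces the
    same outputs as the failed attempt would have.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return vertex(inputs)
        except Exception:
            if attempt == max_attempts:
                raise  # give up and report the failure to the caller
```

Re-execution is only safe here because the inputs are still available and the vertex has no side effects outside its outputs.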

Scalability ...

SQL

Ants

Some limitations of Dryad

✗ Fault tolerance: what happens if a stage manager fails?

✗ No way of defining dynamic graphs.

Linking ACO (ant colony optimization) and Dryad

Finding a path between producers and consumers

Is finding a path sufficient?

What happens if a node is already close to overutilization?

1 We need a way of evaluating the quality of a path.

2 We must avoid interference between two tasks.

Abstraction of the problem

Each cell is a server.

In a cell C, we have:

N(C), the neighbours of C.

$\Phi^{C}_{D,n}$: the probability that an ant currently on C, with destination D, goes next to the cell n.

Routing table: the mean and variance of the time to go from C to D.
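The per-cell state described above maps naturally onto a small data structure. The sketch below is illustrative only: the class `Cell`, its field names and the uniform initial distribution are assumptions, not details taken from the slides.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Cell:
    """State kept by one server (cell) C."""
    name: str
    neighbours: list                             # N(C)
    # phi[D][n]: probability of moving to neighbour n when heading to D
    phi: dict = field(default_factory=dict)
    # routing[D]: (mean, variance) of the travel time from C to D
    routing: dict = field(default_factory=dict)

    def init_destination(self, dest):
        """Start from a uniform distribution over the neighbours (assumed)."""
        p = 1.0 / len(self.neighbours)
        self.phi[dest] = {n: p for n in self.neighbours}
        self.routing[dest] = (0.0, 0.0)

    def next_hop(self, dest, rng=random):
        """Sample the next cell according to the probabilities for dest."""
        table = self.phi[dest]
        return rng.choices(list(table), weights=list(table.values()), k=1)[0]
```

Passing a seeded `random.Random` instance as `rng` makes the sampling reproducible, which is convenient when simulating the ants.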

First type of ant: routing ants

An ant records its path and knows its final destination D and its source $C_s$. At each step, when the ant is on a cell C different from D:

1 Record the queue waiting time at C.

2 Choose the next hop n (unknown neighbours first) using $\Phi^{C}_{D,n}$.

[Detection of cycles]

3 Enqueue on the link to n.

4 Record the waiting time and the transmission time.

5 If the ant has not arrived at D, it inserts itself in the queue and repeats steps 1-5.

6 If the ant has arrived at D, it starts to go back to $C_s$.
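The forward walk of a routing ant can be sketched as a loop over the steps above. Everything named here is hypothetical: `routing_ant`, the `wait_time` and `link_time` callbacks, and the `max_hops` budget. The sketch also assumes that an "unknown" neighbour is one with no entry yet in the cell's probability table, and that a detected cycle is simply cut out of the recorded path.

```python
import random


def routing_ant(source, dest, neighbours, phi, wait_time, link_time,
                rng, max_hops=100):
    """Walk one routing ant from source to dest; return its recorded path.

    neighbours[c]   -> list of cells adjacent to c
    phi[c]          -> {n: probability of going to n when heading to dest}
    wait_time(c)    -> queue waiting time measured at c
    link_time(c, n) -> waiting plus transmission time on the link c -> n
    """
    path = []  # entries are (cell, queue wait, link time)
    cell = source
    for _ in range(max_hops):
        if cell == dest:
            return path + [(dest, 0.0, 0.0)]       # step 6: ready to go back
        waited = wait_time(cell)                   # step 1
        table = phi.get(cell, {})
        unknown = [n for n in neighbours[cell] if n not in table]
        if unknown:                                # step 2: unknown first
            nxt = rng.choice(unknown)
        else:
            nxt = rng.choices(list(table), weights=list(table.values()), k=1)[0]
        path.append((cell, waited, link_time(cell, nxt)))  # steps 3 and 4
        visited = [c for c, _, _ in path]
        if nxt in visited:                         # cycle: drop the loop
            path = path[:visited.index(nxt)]
        cell = nxt                                 # step 5: repeat
    return None  # hop budget exhausted without reaching dest
```

The returned path is what the ant would replay in reverse on its way back to the source.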

Second type: scouting ants

Try to find an alternative path from $C_s$ to D.

A scout ant places tasks on a server only if the computing time of the new tasks, $L_{new}$, together with the computing time of the tasks already there, $L_{prev}$, does not exceed a user-defined threshold $L_T$.

If $L_T$ is not defined, the ant just ensures that the load on the server stays below α (a random variable). g is another random variable that determines the number of tasks the ant will place on the server.

The system generates scout ants with various α and g. If some ants reach D without having been able to place all their tasks, they crawl back and modify $\Phi^{C}_{D,n}$ (negative reinforcement).
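The two decisions a scout ant makes can be sketched as follows. The function names, the additive reading of the threshold test, and the multiplicative `penalty` used for negative reinforcement are all assumptions of this sketch; the slides do not give the exact update rule.

```python
def can_place(l_new, l_prev, l_t=None, load=None, alpha=None):
    """Decide whether a scout ant may place its tasks on a server.

    With a user-defined threshold l_t, the new and previous computing
    times together must not exceed it; otherwise the server load must
    stay below the ant's own limit alpha.
    """
    if l_t is not None:
        return l_new + l_prev <= l_t
    return load < alpha


def negative_reinforcement(table, hop, penalty=0.5):
    """Scale down the probability of `hop`, then renormalise the table.

    `penalty` is a hypothetical factor in (0, 1); the result is a new
    dictionary whose values still sum to 1.
    """
    scaled = {n: p * (penalty if n == hop else 1.0) for n, p in table.items()}
    total = sum(scaled.values())
    return {n: p / total for n, p in scaled.items()}
```

An ant crawling back would apply `negative_reinforcement` at each cell of its recorded path, to the hop it took from that cell.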

Third type: enforcement ants

When enough scout ants report success, enforcement ants are released.

Unless the scouts' reports show extraordinary results, the number of successful scout ants needs to be higher than a threshold.

The enforcement ants recompute the cost and compare it with the scout results; if everything is fine, the placement is done.

Some drawbacks

✗ Evaluation is done via simulation

✗ No theoretical proof of correctness for the placement algorithm

✗ Very few optimizations → what about overhead costs?

Pegasus

Scheduling horizon

Dynamic tasks

Control Flow

Publishing: how to handle dynamic graphs

Lazy evaluation

Lazy evaluation is used to schedule tasks.

1 Start from the (expected) output of the root task.

2 If a task T has only concrete dependencies, execute it immediately.

3 Otherwise, block T until its inputs become concrete.
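The three steps above amount to a demand-driven traversal of the dependency graph. The sketch below is a static simplification: it ignores tasks that spawn further tasks at run time, and the names `lazy_schedule`, `producer`, `deps` and `concrete` are hypothetical.

```python
def lazy_schedule(goal, producer, deps, concrete):
    """Return the tasks to run, in order, so that `goal` becomes concrete.

    producer[obj] -> task that produces obj
    deps[task]    -> list of objects the task reads
    concrete      -> names of the objects that already exist
    """
    order = []
    concrete = set(concrete)

    def demand(obj):
        if obj in concrete:
            return
        task = producer[obj]
        for dep in deps[task]:   # the task stays blocked until these exist
            demand(dep)
        order.append(task)       # all inputs concrete: run the task
        concrete.update(o for o, t in producer.items() if t == task)

    demand(goal)
    return order
```

Only tasks that the goal actually depends on end up in the returned order; a task whose output nobody demands is never scheduled.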

An optimization

result = spawn_exec(executor, args, n);

Unique naming

For the i-th output: name = executor:SHA1(args‖n):i

Memoisation: if a new task’s outputs have already been produced by a previous task, the new task need not be executed at all.
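Deterministic names are what make memoisation possible: two tasks with the same executor, arguments and output count map to the same names. The sketch below assumes a particular serialisation of the arguments (their `repr` joined to n with `|`), which is not specified in the slides, and `output_names`, `Memo` and `spawn_exec_memo` are hypothetical helpers rather than Ciel's API.

```python
import hashlib


def output_names(executor, args, n):
    """Name the n outputs of a task from its executor and arguments."""
    digest = hashlib.sha1(f"{args!r}|{n}".encode("utf-8")).hexdigest()
    return [f"{executor}:{digest}:{i}" for i in range(n)]


class Memo:
    """Runs a task only if its outputs are not already in the store."""

    def __init__(self):
        self.store = {}  # output name -> value
        self.runs = 0    # number of tasks actually executed

    def spawn_exec_memo(self, executor, fn, args, n):
        names = output_names(executor, args, n)
        if all(name in self.store for name in names):
            return names  # memoised: skip the execution entirely
        self.runs += 1
        for name, value in zip(names, fn(*args)):
            self.store[name] = value
        return names
```

Calling `spawn_exec_memo` twice with the same executor and arguments executes `fn` once; the second call only returns the existing names.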

Master

Ciel supports recovery from master failure.

Persistent logging

A log entry is written when a new job is launched.

The descriptors (graphs) are periodically written to disk.

If the master reboots, it looks for unmatched jobs, i.e. jobs that were launched but have no recorded result. They are then restarted from the stored graph.

Secondary masters

Everything is sent to every master (primary and secondaries).

Every secondary records the workers' addresses.

Object table reconstruction

When a master reboots (failure followed by recovery):

1 Workers detect the master failure (heartbeat).

2 When a failure is detected, the workers send a registration message to the master.

3 Each registration carries the list of objects the worker stores, from which the master rebuilds its object table.
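The recovery sequence can be sketched with two small classes. This is a toy model under stated assumptions: failure detection is reduced to comparing an `epoch` number that changes when the master restarts, each worker's registration is assumed to carry the names of the objects it stores, and `Master`, `Worker`, `register` and `heartbeat` are hypothetical names.

```python
from collections import defaultdict


class Master:
    def __init__(self, epoch):
        self.epoch = epoch                     # changes on every restart
        self.object_table = defaultdict(set)   # object name -> worker names

    def register(self, worker_name, objects):
        """Record which objects a (re-)registering worker holds."""
        for obj in objects:
            self.object_table[obj].add(worker_name)


class Worker:
    def __init__(self, name, objects):
        self.name = name
        self.objects = set(objects)
        self.known_epoch = None                # epoch of the last master seen

    def heartbeat(self, master):
        """Re-register if the master has changed since the last heartbeat."""
        if master.epoch != self.known_epoch:
            master.register(self.name, self.objects)
            self.known_epoch = master.epoch
```

After a restart the new master starts with an empty table, and one round of heartbeats from the workers is enough to repopulate it.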

Speed comparison

Map Reduce

✓ Simple

✓ Efficient for a specific kind of program: web indexing

✗ Reliability mechanism is very simple

✗ Workflow is hardcoded

✗ New? Compare PL-SQL, Teradata.

Improvement

Instead of having a Map phase followed by a Reduce phase, use a combination of Map-Reduce steps.

Computation reuse!

Dryad

✓ Some optimizations: aggregation (data locality).

✓ Much more general than Map Reduce

✗ Not so simple (look at the code)

✗ No cycles

✗ Coming from SQL.

Improvement

Ciel
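
The "No cycles" restriction can be illustrated with a small scheduling sketch: because the job graph is acyclic, vertices can be ordered so that every producer runs before its consumers, and a graph with a cycle is rejected. This is a minimal sketch of Kahn's topological sort, not Dryad's actual job manager; the function name and the vertex names are assumptions made for the example.

```python
from collections import deque


def topological_order(vertices, edges):
    """Return vertices so that each producer precedes its consumers.

    `edges` is a list of (producer, consumer) pairs.
    Raises ValueError if the graph contains a cycle.
    """
    indegree = {v: 0 for v in vertices}
    consumers = {v: [] for v in vertices}
    for src, dst in edges:
        consumers[src].append(dst)
        indegree[dst] += 1

    # Vertices with no pending inputs are ready to run.
    ready = deque(v for v in vertices if indegree[v] == 0)
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)
        for c in consumers[v]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)

    # Any vertex left unscheduled sits on or behind a cycle.
    if len(order) != len(vertices):
        raise ValueError("job graph contains a cycle")
    return order


vertices = ["read_a", "read_b", "join", "sort"]
edges = [("read_a", "join"), ("read_b", "join"), ("join", "sort")]
order = topological_order(vertices, edges)
```

Adding an edge from `sort` back to `join` would make the call raise `ValueError`, which is the kind of iterative job a static acyclic graph cannot express.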

Ants

✓ Analogy with nature.

✓ Distributed

✓ General

✓ Account for machine load

✗ Difficult to predict behaviour

✗ Overhead!

Improvement

What happens if an ant is faulty?

I would replace it with a simpler search of optimal paths.

Pegasus

✓ Performance issue

✗ Not general enough

Improvement

Reliability (only partition level)

Ciel

✓ Much more general.

✓ Dynamic task dependencies

Improvement

Random functions

How far are you from machine performance?
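
"Dynamic task dependencies" means that a running task can add new tasks to the job, so the number of steps may depend on the data. The toy scheduler below sketches that idea only; it is not Ciel's API. The `run` function, the `("spawn", ...)` / `("done", ...)` convention and the `halve_until_small` job are all assumptions invented for this illustration.

```python
from collections import deque


def run(initial_task):
    """Execute tasks from a queue until one reports a final value.

    A task is a zero-argument callable returning either
    ("spawn", next_task) to add a task at runtime, or
    ("done", value) to publish the result.
    """
    queue = deque([initial_task])
    executed = 0
    result = None
    while queue:
        task = queue.popleft()
        kind, payload = task()
        executed += 1
        if kind == "spawn":
            # The task graph grows while the job is running.
            queue.append(payload)
        else:
            result = payload
    return result, executed


def halve_until_small(x, threshold=1.0):
    """Hypothetical iterative job: halve x until it drops below threshold."""
    def task():
        if x < threshold:
            return ("done", x)
        return ("spawn", halve_until_small(x / 2, threshold))
    return task


result, executed = run(halve_until_small(8.0))
```

The number of tasks is not known when the job starts: it is decided by the comparison inside each task, which a graph fixed in advance cannot express.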

Perspectives

Adding user-defined transformations for graphs: tiling, merge, etc.

BGP (path-vector routing protocol)

Look at this problem as a maximum-flow problem.

Conclusion

Microsoft decided to stop developing Dryad for commercial reasons.

Summary

Several restrictions that make the result possible:

Node results are deterministic

Job manager doesn’t fail

Acyclic graph

Doesn’t add enough to MapReduce
