Deadline-constrained workflow scheduling algorithms for Infrastructure as a Service Clouds

Saeid Abrishami, Mahmoud Naghibzadeh, Dick H.J. Epema

Future Generation Computer Systems (FGCS), journal homepage: www.elsevier.com/locate/fgcs

Tai, Yu-Chang, 4/29/2013


*Outline

* Introduction

* Scheduling system model

* IaaS cloud partial critical paths algorithms

* An illustrative example

* Time complexity

* Performance evaluation

* Conclusions

*Introduction

*Clouds are different from utility Grids:

- on-demand resource provisioning

- homogeneous networks

- the pay-as-you-go pricing model

*consider the benefits of using Cloud computing for executing scientific workflows

- there exist several commercial Clouds, such as Amazon

*Introduction

*Infrastructure as a Service (IaaS) Clouds have some potential benefits for executing scientific workflows:

1. users can dynamically obtain and release resources on demand, and are charged on a pay-as-you-go basis

2. resource provisioning

3. illusion of unlimited resources

*important parameter: economic cost

- faster resources are more expensive than slower ones

- time-cost tradeoff in selecting appropriate services

- belongs to the multi-criteria optimization problems

*goal: minimize the execution cost of the workflow while completing the workflow before the user-specified deadline

*algorithms:

- IaaS Cloud Partial Critical Paths (IC-PCP)

- IaaS Cloud Partial Critical Paths with Deadline Distribution (IC-PCPD2)

*Scheduling system model

*An application is modeled by a directed acyclic graph G(T, E)

*T is a set of n tasks {t1, t2, ..., tn}

*E is a set of dependencies ei,j = (ti, tj)

*two dummy tasks, tentry and texit, are added to the beginning and the end of the workflow (they have zero execution time and are connected with zero-weight dependencies to the actual entry and exit tasks)
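The graph model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `Workflow`, its attributes, and the node names `t_entry` and `t_exit` are hypothetical choices, not taken from the slides.

```python
from collections import defaultdict


class Workflow:
    """Workflow DAG G(T, E) with dummy entry and exit tasks (illustrative)."""

    ENTRY, EXIT = "t_entry", "t_exit"

    def __init__(self, tasks, edges):
        self.tasks = list(tasks)
        self.children = defaultdict(list)
        self.parents = defaultdict(list)
        for ti, tj in edges:
            self.children[ti].append(tj)
            self.parents[tj].append(ti)
        self._add_dummy_tasks()

    def _add_dummy_tasks(self):
        # Tasks without parents are the actual entry tasks; tasks without
        # children are the actual exit tasks. Both lists are computed
        # before any dummy edge is added.
        entries = [t for t in self.tasks if not self.parents[t]]
        exits = [t for t in self.tasks if not self.children[t]]
        for t in entries:
            self.children[self.ENTRY].append(t)
            self.parents[t].append(self.ENTRY)
        for t in exits:
            self.children[t].append(self.EXIT)
            self.parents[self.EXIT].append(t)
        self.tasks = [self.ENTRY] + self.tasks + [self.EXIT]


# Two independent tasks feeding a third one.
wf = Workflow(["t1", "t2", "t3"], [("t1", "t3"), ("t2", "t3")])
```

The dummy tasks carry no work; they only give the graph a single source and a single sink, so that every real task has at least one parent and one child.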

*Scheduling system model

*services S = {s1, s2, ..., sm} with different QoS parameters, such as CPU type and memory size, and different prices

*The pricing model is pay-as-you-go, similar to current commercial Clouds: users are charged for the number of time intervals during which they have used a resource, even if they have not completely used the last time interval

*example prices: c1 = 5, c2 = 2, c3 = 1
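The interval-based charging rule can be sketched as follows. The function name `instance_cost` and the default interval length of 10 time units are assumptions made for illustration; the slides do not state the interval length.

```python
import math


def instance_cost(start, end, price_per_interval, interval=10):
    """Cost of leasing one instance from `start` to `end`.

    A partially used last interval is charged in full.
    The interval length is an assumed value.
    """
    if end <= start:
        return 0
    intervals = math.ceil((end - start) / interval)
    return intervals * price_per_interval


full = instance_cost(0, 20, price_per_interval=2)     # exactly 2 intervals
partial = instance_cost(0, 21, price_per_interval=2)  # 3 intervals charged
```

Running one time unit past an interval boundary costs a whole extra interval, which is why the scheduling algorithms try to reuse the already-paid remainder of an instance's last interval.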

*ET(ti, sj): execution time of task ti on computation service sj

*the average bandwidth between the computation services is roughly equal

*TT(ei,j): data transfer time of a dependency ei,j

*MET(ti): Minimum Execution Time of a task ti

- the execution time of task ti on the service sj ∈ S that has the minimum ET(ti, sj) among all available services
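A minimal sketch of how these quantities combine. The slides use EFT and "latest finish time" without spelling out their formulas, so the forward and backward passes below are an assumption: the usual critical-path recurrences, with MET as the task duration. All function names and the sample numbers are hypothetical.

```python
def met(task, et):
    """Minimum Execution Time: smallest ET(task, s) over all services."""
    return min(et[task].values())


def earliest_times(order, parents, met_of, tt):
    """Forward pass over a topological order: returns (EST, EFT)."""
    est, eft = {}, {}
    for t in order:
        est[t] = max((eft[p] + tt.get((p, t), 0) for p in parents.get(t, [])),
                     default=0)
        eft[t] = est[t] + met_of[t]
    return est, eft


def latest_finish_times(order, children, met_of, tt, deadline):
    """Backward pass: each task must leave room for its children."""
    lft = {}
    for t in reversed(order):
        lft[t] = min((lft[c] - met_of[c] - tt.get((t, c), 0)
                      for c in children.get(t, [])), default=deadline)
    return lft


# Hypothetical two-task chain a -> b with ET values on two services.
ET = {"a": {"s1": 2, "s2": 5}, "b": {"s1": 3, "s2": 6}}
MET = {t: met(t, ET) for t in ET}
TT = {("a", "b"): 1}
est, eft = earliest_times(["a", "b"], {"b": ["a"]}, MET, TT)
lft = latest_finish_times(["a", "b"], {"a": ["b"]}, MET, TT, deadline=10)
```

In this chain, task a may finish as late as time 6 and still leave b enough time to receive its data and complete by the deadline of 10.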

[Figure: task ti with its parents (p) and children (c)]

*SS(ti) = sj,k: Selected Service for each scheduled task ti

*sj,k: the kth instance of service sj

*AST(ti): Actual Start Time of ti

*assigned node: a node that has already been assigned to (scheduled on) a service

*Critical Parent: the Critical Parent of a node ti is the unassigned parent of ti that has the latest data arrival time at ti, that is, the parent tp of ti for which EFT(tp) + TT(ep,i) is maximal

*PCP: the Partial Critical Path of a node ti

- is empty if ti does not have any unassigned parents

- consists of the Critical Parent tp of ti and the Partial Critical Path of tp, if ti has any unassigned parents
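The two definitions above translate almost directly into code. The sketch below assumes EFT values and transfer times are already available as dictionaries; the function names and the sample data are hypothetical.

```python
def critical_parent(task, parents, assigned, eft, tt):
    """Unassigned parent whose data arrives last at `task`, or None."""
    candidates = [p for p in parents.get(task, []) if p not in assigned]
    if not candidates:
        return None
    return max(candidates, key=lambda p: eft[p] + tt.get((p, task), 0))


def partial_critical_path(task, parents, assigned, eft, tt):
    """Follow critical parents upward; return the path in execution order."""
    path = []
    current = critical_parent(task, parents, assigned, eft, tt)
    while current is not None:
        path.insert(0, current)
        current = critical_parent(current, parents, assigned, eft, tt)
    return path


# Hypothetical data: t3 has parents t1 and t2, and t2 has parent t0.
parents = {"t3": ["t1", "t2"], "t2": ["t0"]}
eft = {"t0": 2, "t1": 4, "t2": 7}
pcp_all = partial_critical_path("t3", parents, set(), eft, {})
pcp_rest = partial_critical_path("t3", parents, {"t2"}, eft, {})
```

With nothing assigned, the path of t3 runs through t2 (later data arrival than t1) and then t0. Once t2 is assigned, the next path found for t3 contains only t1.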

*Algorithm 1: IC-PCP

*example

[Figure sequence: sample workflow scheduled with IC-PCP, deadline D = 30. Each task node carries three numbers (2/5/8, 4/6/10, 4/8/11, 5/8/11, 5/12/16, 3/8/11, 3/6/8, 3/5/9, 5/8/14, in figure order) and a time label of the form a~b__c; a Gantt chart with a time axis from 0 to 30 shows the instances in use.]

*paths scheduled, one per slide, with the instance that appears in the Gantt chart on that slide:

- Path{t2,t6,t9}: S2,1

- Path{t3}: S3,1

- Path{t5,t8}: S2,2

- Path{t1,t4}: S3,2

- Path{t7}: S3,3

*COST = 2*5 + 1*4 = 14

*Applicable instance

*an instance is applicable for a path if it satisfies two conditions:

- The path can be scheduled on the instance such that each task of the path is finished before its latest finish time

- The new schedule uses (a part of) the extra time of the instance, which is the remaining time of the last time interval of that instance
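The two conditions can be checked with a short function. This is a simplified sketch, not the algorithm from the slides: it only considers appending the path after the tasks already on the instance, and the function name, parameters, and the interval length of 10 are all assumptions.

```python
import math


def is_applicable(path, instance_start, instance_end, exec_time, est, lft,
                  interval=10):
    """Check both applicability conditions for appending `path`.

    path:            tasks in execution order
    instance_start:  time at which the instance was leased
    instance_end:    finish time of the last task already on the instance
    exec_time:       execution time of each task on this instance's service
    est, lft:        earliest start / latest finish time of each task
    """
    # End of the last time interval that is already paid for.
    used = instance_end - instance_start
    paid_until = instance_start + math.ceil(used / interval) * interval
    clock = instance_end
    first_start = None
    for task in path:
        start = max(clock, est[task])
        finish = start + exec_time[task]
        if finish > lft[task]:  # condition 1 violated
            return False
        if first_start is None:
            first_start = start
        clock = finish
    # Condition 2: the path starts inside the already-paid extra time.
    return first_start is not None and first_start < paid_until


# Instance busy from 0 to 14, so the interval up to 20 is already paid for.
ok = is_applicable(["t5"], 0, 14, {"t5": 4}, {"t5": 12}, {"t5": 24})
too_late = is_applicable(["t5"], 0, 14, {"t5": 4}, {"t5": 21}, {"t5": 24})
no_extra = is_applicable(["t5"], 0, 14, {"t5": 4}, {"t5": 20}, {"t5": 30})
```

The first call succeeds because t5 starts at 14, inside the paid interval, and finishes at 18. The second fails condition 1 (finish at 25 exceeds the latest finish time of 24), and the third fails condition 2 (t5 cannot start before the paid interval ends at 20).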

[Figure: tasks P and C, annotated "Cost = zero"]

*Algorithm 2: IC-PCPD2

- Call PLANNING(G(T,E))

- Assign a subdeadline to each PCP node (assigned node)

[Figure sequence: the same sample workflow scheduled with IC-PCPD2, deadline D = 30 (the figures also carry the label sb = 0). Task nodes carry the same three-number labels and a~b__c time labels as before. A row of task labels from tentry to t9 marks the progress of the planning, and the Gantt chart, with a time axis from 0 to 30, shows instances S1,1, S2,1, S3,1, S3,2, S1,2, S3,3 and S2,2.]

*COST = 5*2 + 2*2 + 1*4 = 18

*Time complexity

- O(n+e) ~ O(n^2)

- O(n-1)

- O(m*n) = O(n^2)

- IC-PCP = O(n^2)

- O(n), O(n^2)

- IC-PCPD2 = O(n^2): Call PLANNING(G(T,E)), assign a subdeadline to each PCP node

*Time complexity

- O(n+e) ~ O(n^2)

- O(n)

- O(n^2)

- O(n^2)
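The step from O(n+e) to O(n^2) in the annotations above follows from a bound on the number of edges. A short derivation, assuming a DAG without duplicate edges:

```latex
% In a DAG with n tasks, each pair of tasks is joined by at most one edge:
e \le \binom{n}{2} = \frac{n(n-1)}{2}
\quad\Longrightarrow\quad
O(n + e) \subseteq O\!\left(n + \frac{n(n-1)}{2}\right) = O(n^2)
```

The annotation O(m*n) = O(n^2) additionally presumes that the number of services m is at most on the order of n.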

*Evaluation

Algo 1: IC-PCP

Algo 2: IC-PCPD2

Algo 3: IC-Loss

*the total cost of each workflow execution is normalized

*Cheapest schedule: all workflow tasks are scheduled on a single instance of the cheapest computation service

*Fastest schedule: each workflow task is scheduled on a distinct instance of the fastest computation service, with all data transmission times considered to be zero

*MF = makespan of the Fastest schedule

*deadline factor α: the deadline is set to α·MF

- since the problem has no solution for α = 1, α ranges from 1.5 to 5 in the experiments, with a step length of 0.5
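The experimental setup above can be sketched as follows. The function names are hypothetical, and dividing by the cost of the Cheapest schedule is an assumption about how the normalization is done, based on the definitions on this slide.

```python
def deadlines(mf, alpha_min=1.5, alpha_max=5.0, step=0.5):
    """Deadlines alpha * MF for alpha = 1.5, 2.0, ..., 5.0."""
    count = int(round((alpha_max - alpha_min) / step)) + 1
    return [(alpha_min + i * step) * mf for i in range(count)]


def normalized_cost(total_cost, cheapest_cost):
    """Schedule cost relative to the Cheapest schedule (assumed baseline)."""
    return total_cost / cheapest_cost


# With a hypothetical fastest makespan MF = 10, eight deadlines are tested.
ds = deadlines(mf=10)
```

Normalizing the cost makes results comparable across workflows of different sizes, since a value of 1 always corresponds to the cheapest possible schedule.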

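*Evaluation

Algo 1: IC-PCP

Algo 2: IC-PCPD2

Algo 3: IC-Loss

[Figures: result charts; the rankings of the three algorithms noted on them, in order, are 1>2>3, 1>2>3, 1>2>3, 1≈2>3, 1>2>3, 2>1>3, 1>2>3, 1≈2>3, 1>2>3, 2>1>3]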

*Conclusions

*The new algorithms consider the main features of current commercial Clouds, such as on-demand resource provisioning, homogeneous networks, and the pay-as-you-go pricing model

*The time complexity of both algorithms is O(n^2); this polynomial time complexity makes them suitable for large workflows

*IC-PCP outperforms both IC-PCPD2 and IC-Loss in most cases

*Experiments show that the computation times of the algorithms are very low: less than 500 ms for the large workflows

*Future work: improve the algorithms for real Cloud environments

*Thanks~

*The end