
Post on 12-Jan-2016


Scheduling

CEG 4131 Computer Architecture III

Miodrag Bolic

Slides developed by Dr. Hesham El-Rewini

Copyright Hesham El-Rewini

Outline

• Scheduling models

• Scheduling without considering communication

• Including communication in scheduling

• Heuristic algorithms

Scheduling Parallel Tasks

[Figure: Scheduling Parallel Tasks. Components: Sequential Program, Partitioner, Grains of Sequential Code, Dependence Analyzer, Ideal Parallelism, Parallel Program Tasks (Explicit Approach vs. Implicit Approach), Program Tasks, Scheduler, Parallel/Distributed System, and the resulting Schedule (Processors vs. Time).]

Program Tasks

Task Notation: (T, <, D, A)

• T: set of tasks

• <: partial order on T

• D: communication data

• A: amount of computation
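One way to hold the four parts of the notation in code is sketched below, assuming the partial order < is stored as its immediate-precedence arcs (u, v). The class name `TaskGraph` and its fields are hypothetical, not from the slides. Because < must be a partial order, the sketch also checks that the arcs contain no cycle.

```python
from dataclasses import dataclass, field


@dataclass
class TaskGraph:
    """Hypothetical container for the task notation (T, <, D, A)."""
    tasks: set                                     # T: set of tasks
    edges: set                                     # (u, v) pairs: u immediately precedes v
    data: dict = field(default_factory=dict)       # D: communication data per arc
    work: dict = field(default_factory=dict)       # A: amount of computation per task

    def successors(self, u):
        """Immediate successors of u, sorted by name."""
        return sorted(v for (x, v) in self.edges if x == u)

    def is_acyclic(self):
        """The order < must contain no cycle; checked with Kahn's algorithm."""
        indeg = {t: 0 for t in self.tasks}
        for _, v in self.edges:
            indeg[v] += 1
        ready = [t for t in self.tasks if indeg[t] == 0]
        seen = 0
        while ready:
            u = ready.pop()
            seen += 1
            for v in self.successors(u):
                indeg[v] -= 1
                if indeg[v] == 0:
                    ready.append(v)
        return seen == len(self.tasks)


# Hypothetical graph: task A sends data to B and C.
g = TaskGraph(
    tasks={"A", "B", "C"},
    edges={("A", "B"), ("A", "C")},
    data={("A", "B"): 5, ("A", "C"): 8},
    work={"A": 10, "B": 15, "C": 15},
)
print(g.successors("A"), g.is_acyclic())  # ['B', 'C'] True
```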

Task Graph

[Figure: example task graph with tasks A to I. Legend: Task, Amount of Computation (node label), Communication Data (edge label), Dependency (edge).]

Machine

• m heterogeneous processors

• Connected via an arbitrary interconnection network (network graph)

• Each processor $P_i$ has an associated speed $S_i$

• Each edge (i, j) of the network has an associated transfer rate $R_{ij}$

Task Schedule

• Represented as a Gantt chart

• A mapping f of each task to a processing element and a starting time

• Formally: f(v) = (i, t) means task v is scheduled to be processed by processor i starting at time t
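The mapping f(v) = (i, t) can be stored directly as a dictionary. The sketch below assumes each task's duration is already known; the task names `W1` to `W4` and their durations are hypothetical. It groups tasks into one Gantt-chart row per processor, checks that no two tasks overlap on the same processor, and reports the schedule length. Precedence constraints are deliberately not checked here.

```python
# f maps each task v to (processor i, start time t), i.e. f(v) = (i, t).
# Task names and durations are hypothetical.
f = {"W1": (1, 0), "W2": (2, 0), "W3": (1, 2), "W4": (3, 1)}
duration = {"W1": 2, "W2": 3, "W3": 1, "W4": 2}


def by_processor(f):
    """One Gantt-chart row per processor: (start, task) pairs sorted by start time."""
    rows = {}
    for v, (i, t) in f.items():
        rows.setdefault(i, []).append((t, v))
    return {i: sorted(row) for i, row in sorted(rows.items())}


def is_valid(f, duration):
    """No two tasks on the same processor may overlap in time."""
    for row in by_processor(f).values():
        for (t1, v1), (t2, _) in zip(row, row[1:]):
            if t1 + duration[v1] > t2:
                return False
    return True


def schedule_length(f, duration):
    """Latest finishing time over all tasks."""
    return max(t + duration[v] for v, (_, t) in f.items())


print(by_processor(f))               # {1: [(0, 'W1'), (2, 'W3')], 2: [(0, 'W2')], 3: [(1, 'W4')]}
print(is_valid(f, duration))         # True
print(schedule_length(f, duration))  # 3
```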

Gantt Chart

[Figure: Gantt chart with a Processor axis (processors 1 to 4) and a Time axis (0 to 5); tasks W-1 to W-5 are drawn as bars from their Start to their Stop times.]

Execution and Communication Times

• If task $t_i$ is executed on processor $p_j$:

Execution time = $A_i / S_j$

• The communication delay between $t_i$ and $t_j$, when they are executed on adjacent processing elements $p_k$ and $p_l$, is

$D_{ij} / R_{kl}$
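The two formulas translate directly into code. The sketch below uses hypothetical values for A, S, D and R. It also assumes that two tasks placed on the same processor communicate at no cost, which the slide does not state for this model.

```python
def execution_time(A, S, task, proc):
    """Time for task t_i on processor p_j: A_i / S_j."""
    return A[task] / S[proc]


def communication_delay(D, R, src, dst, pk, pl):
    """Delay for the data D_ij sent from t_i on p_k to t_j on adjacent p_l: D_ij / R_kl.

    Assumption: tasks placed on the same processor communicate at no cost.
    """
    if pk == pl:
        return 0.0
    return D[(src, dst)] / R[(pk, pl)]


# Hypothetical values.
A = {"t1": 20, "t2": 10}                    # amount of computation
S = {"p1": 2.0, "p2": 4.0}                  # processor speeds
D = {("t1", "t2"): 12}                      # data sent along the arc t1 -> t2
R = {("p1", "p2"): 3.0, ("p2", "p1"): 3.0}  # link transfer rates

print(execution_time(A, S, "t1", "p1"))                   # 10.0
print(execution_time(A, S, "t2", "p2"))                   # 2.5
print(communication_delay(D, R, "t1", "t2", "p1", "p2"))  # 4.0
```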

Complexity

• Optimal scheduling is computationally intractable in general

• Polynomial-time optimal algorithms exist only for a small number of restricted cases

• A large number of heuristics exist for more general cases

• Quality of the schedule vs. quality of the scheduler

Scheduling Task Graphs without considering communication

• Polynomial-time optimal algorithms exist in the following cases:

1. The task graph is an in-forest (each node has at most one immediate successor) or an out-forest (each node has at most one immediate predecessor).

2. The task graph is an interval order.

[Figure: In-Forest vs. Out-Forest Structure, showing an in-forest and an out-forest side by side.]
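Both structural conditions are simple degree checks on the arcs of the task graph. The sketch below assumes the graph is given as a set of immediate-precedence arcs (u, v) and is already acyclic. A plain chain passes both tests.

```python
from collections import Counter


def is_in_forest(edges):
    """In-forest: every node has at most one immediate successor."""
    out_degree = Counter(u for u, _ in edges)
    return all(d <= 1 for d in out_degree.values())


def is_out_forest(edges):
    """Out-forest: every node has at most one immediate predecessor."""
    in_degree = Counter(v for _, v in edges)
    return all(d <= 1 for d in in_degree.values())


# Hypothetical in-tree: b and c both feed a, and d feeds b.
in_tree = {("b", "a"), ("c", "a"), ("d", "b")}
out_tree = {(v, u) for u, v in in_tree}      # reverse every arc

print(is_in_forest(in_tree), is_out_forest(in_tree))    # True False
print(is_in_forest(out_tree), is_out_forest(out_tree))  # False True
```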

Assumptions

• A task graph consisting of n tasks

• A distributed system made up of m processors

• The execution time of each task is one unit of time

• The communication cost between any pair of tasks is zero

• The goal is to find an optimal schedule, which minimizes the completion time

List Scheduling

• All the algorithms considered here belong to the list scheduling class.

• Each task is assigned a priority, and a list of tasks is constructed in decreasing priority order.

• A task becomes ready for execution when all of its immediate predecessors in the task graph have been executed, or when it has no predecessors.
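The sketch below shows a generic list scheduler under the assumptions slide (unit-time tasks, m processors, no communication). The priorities are taken as input, so the same routine works for any priority rule. The function name and the tie-breaking by task name are choices made for this illustration, not something the slides prescribe.

```python
def list_schedule(preds, priority, m):
    """Unit-time list scheduling on m processors, ignoring communication.

    preds:    task -> set of immediate predecessors
    priority: task -> number; larger means scheduled earlier
    Returns one list per time step holding the tasks started in that step.
    """
    # Priority list in decreasing priority order (ties broken by task name).
    order = sorted(preds, key=lambda t: (-priority[t], t))
    done, steps = set(), []
    while len(done) < len(order):
        # Ready: not yet executed and every immediate predecessor has finished.
        ready = [t for t in order if t not in done and preds[t] <= done]
        if not ready:
            raise ValueError("precedence graph contains a cycle")
        step = ready[:m]      # each free processor takes the highest-priority ready task
        steps.append(step)
        done.update(step)     # unit-time tasks all finish before the next step
    return steps


preds = {"a": set(), "b": set(), "c": {"a"}, "d": {"a", "b"}}
priority = {"a": 2, "b": 2, "c": 1, "d": 1}
print(list_schedule(preds, priority, 2))  # [['a', 'b'], ['c', 'd']]
print(list_schedule(preds, priority, 1))  # [['a'], ['b'], ['c'], ['d']]
```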

Scheduling In-Forest/Out-Forest Task Graphs

1. The level of each node in the task graph (the number of nodes on the longest path from that node to an exit node) is calculated and used as the node's priority.

2. Whenever a processor becomes available, assign it the unexecuted ready task with the highest priority.
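Computing the levels takes a single memoized pass over the graph. This sketch assumes a level counts the nodes on the longest path from a node to an exit node, so every exit node gets level 1. That matches Example 2, where the single exit task M has priority 1. The graph used below is hypothetical.

```python
from functools import lru_cache


def levels(succs):
    """Level = number of nodes on the longest path from a node to an exit node.

    succs: task -> tuple of immediate successors (graph assumed acyclic).
    """
    @lru_cache(maxsize=None)
    def level(v):
        # Exit nodes (no successors) get level 1.
        return 1 + max((level(w) for w in succs[v]), default=0)

    return {v: level(v) for v in succs}


# Hypothetical in-forest: a and b feed c; c and d feed the exit node e.
succs = {"a": ("c",), "b": ("c",), "c": ("e",), "d": ("e",), "e": ()}
print(levels(succs))  # {'a': 3, 'b': 3, 'c': 2, 'd': 2, 'e': 1}
```

The resulting dictionary can be used directly as the priority input of a list scheduler.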

Example 1: Simple List Scheduling

[Figure: a task graph of 14 unit-time tasks arranged in five levels, and its schedule on processors P1, P2 and P3.]

Example 2: Simple List Scheduling

Priority Assignment

Task  Priority
A     5
B     5
C     5
D     4
E     4
F     4
G     4
H     3
I     3
J     3
K     2
L     2
M     1

[Figure: task graph of the 13 unit-time tasks A to M.]

Scheduling

t   P1  P2  P3  P4
1   A   B   C   E
2   D   F   G   H
3   I   J   L
4   K
5   M

Example 3: Simple List Scheduling

[Figure: task graph with tasks A to M, with panels "Priority Assignment" and "Scheduling".]

Interval Orders

• A task graph is an interval order when its nodes can be mapped into intervals on the real line such that two nodes are related if and only if their intervals do not overlap.

• For any pair of nodes u and v in an interval order, either every successor of u is also a successor of v, or every successor of v is also a successor of u.
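The second bullet can be tested mechanically. Compute every node's full successor set, then check that any two of these sets are nested. The sketch below assumes the graph is acyclic and given as immediate-successor lists. For finite partial orders this nesting condition is equivalent to being an interval order. The smallest counterexample is two unrelated two-task chains, which is the second test graph.

```python
def successor_sets(succs):
    """All successors (immediate and transitive) of every node; graph assumed acyclic."""
    memo = {}

    def reach(v):
        if v not in memo:
            found = set()
            for w in succs[v]:
                found.add(w)
                found |= reach(w)
            memo[v] = found
        return memo[v]

    return {v: reach(v) for v in succs}


def successors_nested(succs):
    """True if, for every pair u, v, one successor set contains the other."""
    sets = list(successor_sets(succs).values())
    return all(a <= b or b <= a for i, a in enumerate(sets) for b in sets[i + 1:])


nested = {"a": ["c", "d"], "b": ["d"], "c": [], "d": []}
two_chains = {"a": ["c"], "b": ["d"], "c": [], "d": []}   # a < c and b < d only
print(successors_nested(nested), successors_nested(two_chains))  # True False
```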

Scheduling Interval Ordered Tasks

1. The number of successors of each node (all nodes reachable from it, not only its immediate successors) is used as the node's priority.

2. Whenever a processor becomes available, assign it the unexecuted ready task with the highest priority.
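Here the priority is read as the count of all reachable successors. That reading is consistent with Example 2, where task A has priority 8 even though the graph has only ten tasks. The sketch below computes this count. The small graph is hypothetical, and its successor sets are nested, so it is itself an interval order.

```python
def successor_counts(succs):
    """Priority of each task = number of its successors (all reachable tasks).

    succs: task -> list of immediate successors (graph assumed acyclic).
    """
    memo = {}

    def reach(v):
        if v not in memo:
            found = set()
            for w in succs[v]:
                found |= {w} | reach(w)
            memo[v] = found
        return memo[v]

    return {v: len(reach(v)) for v in succs}


succs = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(successor_counts(succs))  # {'a': 3, 'b': 1, 'c': 1, 'd': 0}
```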

Example 1: Scheduling Interval Ordered Tasks

[Figure: interval-ordered task graph of nine unit-time tasks (1 to 9), each annotated with its number of successors, and its schedule on processors P1, P2 and P3 over time 0 to 3.]

Example 2: Scheduling Interval Ordered Tasks

Priority Assignment

Task  Priority
A     8
B     6
C     5
D     5
E     4
F     1
G     3
H     0
I     0
J     0

[Figure: task graph of the ten unit-time tasks A to J.]

Scheduling

t   P1  P2  P3
1   A   B
2   C   D   E
3   G   F
4   H   I   J

Example 3: Scheduling Interval Ordered Tasks

[Figure: task graph with tasks A to L, with panels "Priority Assignment" and "Scheduling".]

Communication Models

• Completion time
  – Execution time
  – Communication time

• Completion Time as 2 Components

• Completion Time from the Gantt Chart

Completion Time as 2 Components

• Completion Time = Execution Time + Total Communication Delay

• Total Communication Delay = Number of communication messages * delay per message

• Execution time = maximum finishing time of any task

• Number of communication messages:
  – Model A
  – Model B
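The slides name Models A and B but do not spell out how each counts messages. The sketch below assumes Model A charges one message for every task-graph arc whose two tasks sit on different processors. It assumes Model B charges one message per (sending task, destination processor) pair, so a result needed by several tasks on the same remote processor is sent only once. Both counting rules are assumptions, and the arc set is hypothetical because the example graph's arcs are not recoverable.

```python
def message_count(edges, proc, model):
    """Messages sent for a given task assignment (assumed counting rules).

    edges: (u, v) arcs of the task graph
    proc:  task -> processor it is assigned to
    """
    remote = [(u, v) for u, v in edges if proc[u] != proc[v]]
    if model == "A":
        return len(remote)                              # one message per remote arc
    if model == "B":
        return len({(u, proc[v]) for u, v in remote})   # one per (sender, destination processor)
    raise ValueError("model must be 'A' or 'B'")


def completion_time(exec_time, edges, proc, model, delay=1):
    """Completion time = execution time + number of messages * delay per message."""
    return exec_time + message_count(edges, proc, model) * delay


# Hypothetical arcs: A sends to B on its own processor and to C and E on P2.
edges = [("A", "B"), ("A", "C"), ("A", "E"), ("B", "D")]
proc = {"A": "P1", "B": "P1", "D": "P1", "C": "P2", "E": "P2"}
print(completion_time(3, edges, proc, "A"))  # 5
print(completion_time(3, edges, proc, "B"))  # 4
```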

Completion Time from the Gantt Chart (Model C)

• Completion Time = Schedule Length

• This model assumes the existence of an I/O processor with every processor in the system.

• Communication delay between two tasks allocated to the same processor is negligible.

• Communication delay is counted only between two tasks assigned to different processors.
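Under Model C the schedule length comes from simulating the Gantt chart. The sketch below assumes each processor runs its assigned tasks in a given order. A task starts once its processor is free and all its inputs have arrived, and delay is added only on arcs that cross processors. The function name, the input format and the small unit-time graph are hypothetical.

```python
def model_c_length(order, preds, exec_time, delay):
    """Schedule length when only arcs between different processors pay a delay.

    order:     processor -> tasks in the order they run on that processor
    preds:     task -> immediate predecessors
    exec_time: task -> execution time
    delay:     (u, v) -> communication delay of arc u -> v
    """
    proc = {t: p for p, ts in order.items() for t in ts}
    pending = {p: list(ts) for p, ts in order.items()}
    free_at = {p: 0 for p in order}
    finish = {}
    while any(pending.values()):
        progressed = False
        for p, ts in pending.items():
            # Start the next task on p once all of its predecessors have finished.
            if ts and all(u in finish for u in preds[ts[0]]):
                t = ts.pop(0)
                arrivals = [finish[u] + (0 if proc[u] == p else delay[(u, t)])
                            for u in preds[t]]
                start = max([free_at[p], *arrivals])
                finish[t] = free_at[p] = start + exec_time[t]
                progressed = True
        if not progressed:
            raise ValueError("processor orders conflict with the precedence constraints")
    return max(finish.values())


# Hypothetical unit-time graph: A -> B, A -> C, B -> D, C -> D, every delay 1.
preds = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
order = {"P1": ["A", "B", "D"], "P2": ["C"]}
exec_time = dict.fromkeys(preds, 1)
delay = dict.fromkeys([("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")], 1)
print(model_c_length(order, preds, exec_time, delay))  # 5
```

Here C cannot start until time 2 because A's result needs one time unit to reach P2. D then waits for C's result to return to P1, which gives a length of 5.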

Example

[Figure: task graph with five tasks A, B, C, D and E, each with execution time 1.]

Assume a system with 2 processors

Models A and B

• Assume tasks A, B, and D are assigned to P1 and tasks C and E are assigned to P2.

[Figure: P1 executes A, B, D; P2 executes C, E.]

Model A

Number of messages = 2

Completion time = 3 + 2 = 5

Model B

Number of messages = 1

Completion time = 3 + 1 = 4

Model C

[Figure: Gantt chart of tasks A to E on P1 and P2 over time 0 to 4, with the communication delay between tasks on different processors marked.]

Models A, B, C Example

[Figure: task graph with tasks A to M, shown with the task assignment to processors P1, P2 and P3 used for Models A and B and the task assignment used for Model C.]

Model A: Number of messages = 2 + 2 = 4; Completion time = 3 + (2*4 + 2*3) = 17

Model B: Number of messages = 2 + 1 = 3; Completion time = 3 + (2*4 + 1*3) = 14

Model C: Completion time = 8

Communication delay is displayed in the graph for A and B. Assume the execution time of a task is 1.

(Assume all communication delay is 1 for simplicity.)

Heuristics

A heuristic produces an answer in less than exponential time, but does not guarantee an optimal solution.

• Communication delay versus parallelism

• Clustering

• Duplication

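Communication Delay versus Parallelism

Task graph: a precedes b and c. Execution times: a = 10, b = 15, c = 15. Arc (a, b) carries communication delay y and arc (a, c) carries delay x, with x < y.

[Figure: two Gantt charts on P1 and P2, for x = 5 and x = 25. With x = 5, a (0 to 10) and b (10 to 25) run on P1 and c runs on P2 from 15 to 30. With x = 25, c on P2 would occupy 35 to 50, whereas running a, b and c on P1 finishes at 40.]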
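The trade-off can be checked with the slide's execution times (a = 10, b = 15, c = 15). The sketch assumes b always stays on P1 with a, so the arc (a, b) and its delay y never cost anything. It also assumes that c placed on P2 can start x time units after a finishes. The function name is hypothetical.

```python
def place_c(exec_a, exec_b, exec_c, x):
    """Choose where c runs, given that a and b share P1 (so arc (a, b) is free).

    Returns (choice, completion time); ties favour keeping c on P1.
    """
    on_p2 = max(exec_a + exec_b, exec_a + x + exec_c)  # c waits x for a's result
    on_p1 = exec_a + exec_b + exec_c                    # c runs after b, no communication
    return ("P2", on_p2) if on_p2 < on_p1 else ("P1", on_p1)


print(place_c(10, 15, 15, 5))   # ('P2', 30)
print(place_c(10, 15, 15, 25))  # ('P1', 40)
```

With a small delay, parallelism wins. Once x exceeds the work it would overlap, serializing on one processor gives the shorter schedule.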

Clustering

[Figure: panels (a) to (d) show the same task graph, with nodes a to g, grouped into different clusters.]

Clustering Example 1 Part 1

[Figure: task graph with tasks A to G annotated with execution times and communication delays, the task assignment, and the Gantt chart on P1 and P2 with communication delays and idle slots (NOP) marked; G finishes at time step 10.]

Clustering Example 1 Part 2

[Figure: the same task graph with a different task assignment; in the Gantt chart on P1 and P2, with communication delays and idle slots (NOP) marked, G finishes at time step 9.]

Clustering Example 2

[Figure: task graph with tasks A to H annotated with execution times and communication delays, the task assignment, and the Gantt chart on P1 and P2 with communication delays and idle slots (NOP) marked; H finishes at time step 13.]

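Duplications

[Figure: task graph in which a precedes b and c, each arc carrying communication delay x, scheduled on P1 and P2 in two ways. No Duplication: c must wait for the result of a to cross between processors. Task a is Duplicated: a copy of a runs on each processor, so that wait disappears.]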
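The benefit of duplication can be estimated with the same kind of small calculation. The sketch assumes b runs after a on P1 and c runs on P2. Without duplication, c waits x for a's result. With duplication, P2 first runs its own copy of a. The numbers in the usage lines are hypothetical.

```python
def completion_with_duplication(exec_a, exec_b, exec_c, x, duplicate_a):
    """Completion time when b runs on P1 after a and c runs on P2.

    Without duplication, c waits x for a's result to cross from P1.
    With duplication, P2 runs its own copy of a, so c waits only for that copy.
    """
    c_start = exec_a if duplicate_a else exec_a + x
    return max(exec_a + exec_b, c_start + exec_c)


print(completion_with_duplication(10, 15, 15, 5, duplicate_a=False))  # 30
print(completion_with_duplication(10, 15, 15, 5, duplicate_a=True))   # 25
```

Duplication trades extra computation (a runs twice) for the removal of a communication delay. It pays off when x is large relative to the execution time of the duplicated task.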

Duplication Example (Using Clustering Example 1 Part 2)

[Figure: the task graph of Clustering Example 1 scheduled on P1 and P2 with task A duplicated on both processors; G finishes at time step 8.]

Scheduling and grain packing

• Four major steps are involved in grain determination and scheduling optimization:

– Step 1. Construct a fine-grain program graph.

– Step 2. Schedule the fine-grain computation.

– Step 3. Grain packing to produce the coarse grains.

– Step 4. Generate a parallel schedule based on the packed graph.

Program decomposition for static multiprocessor scheduling

• Two 2 x 2 matrices A and B are multiplied, and the sum of the four elements of the resulting product matrix C = A x B is computed. The program performs eight multiplications and seven additions:

$$\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}$$

Example 2.5 (continued)

– $C_{11} = A_{11} B_{11} + A_{12} B_{21}$

– $C_{12} = A_{11} B_{12} + A_{12} B_{22}$

– $C_{21} = A_{21} B_{11} + A_{22} B_{21}$

– $C_{22} = A_{21} B_{12} + A_{22} B_{22}$

– Sum = $C_{11} + C_{12} + C_{21} + C_{22}$
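The fine-grain operations of this program can be listed explicitly. The sketch below evaluates the sum of the four entries of C = A x B for 2 x 2 matrices and counts the operations. Each entry needs two multiplications and one addition, and the final sum needs three more additions, giving 8 multiplications and 7 additions. The test matrices are hypothetical. The eight products are mutually independent, which is what makes them candidates for separate grains.

```python
def sum_of_product(A, B):
    """Sum of the four entries of C = A x B for 2 x 2 matrices, with operation counts."""
    mults = adds = 0
    C = [[0, 0], [0, 0]]
    for i in range(2):
        for j in range(2):
            # C_ij = A_i1 * B_1j + A_i2 * B_2j: two multiplications, one addition
            C[i][j] = A[i][0] * B[0][j] + A[i][1] * B[1][j]
            mults += 2
            adds += 1
    total = C[0][0] + C[0][1] + C[1][0] + C[1][1]  # three more additions
    adds += 3
    return total, mults, adds


print(sum_of_product([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # (134, 8, 7)
```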
