1
Scheduling
CEG 4131 Computer Architecture IIIMiodrag Bolic
Slides developed by Dr. Hesham El-Rewini
Copyright Hesham El-Rewini
2
Outline
• Scheduling models• Scheduling without considering communication• Including communication in scheduling• Heuristic algorithms
3
Partitioner
Grains of Sequential Code
Parallel/Distributed System
Parallel Program Tasks
Scheduler
Schedule
Processors
Time
Program Tasks
Sequential Program
Explicit ApproachImplicit Approach
Dependence Analyzer
Ideal Parallelism
Scheduling Parallel Tasks
4
Program Tasks
Task Notation: (T, <, D, A)
• T set of tasks
• < partial order on T
• D Communication Data
• A amount of computation
F
20
A
5
Task Graph
10
D
15
E
10
B
15
C
10
G
15
H
15
I
30
558 7
5
5 5
10
5
4 5 4
20
Task
Amount of Computation
Communication Data
Dependency
6
Machine
• m heterogeneous processors
• Connected via an arbitrary interconnection network (network graph)
• Associated with each processor Pi is its speed Si
• Associated with each edge (i,j) is the transfer rate Rij
7
Task Schedule
• Gantt Chart
• Mapping (f) of tasks to a processing element and a starting time
• Formally: f(v) = (i,t) task v is scheduled to be processed by processor i starting at time t
8
Gantt Chart
Processor
Time
1 2 3 4 5 0
1
2
3
4
Stop
Start
W-1 W-2 W-3 W-4 W-5
10
Execution and Communication Times
• If task ti is executed on pj
Execution time = Ai/Sj
• The communication delay between ti and tj, when executed on adjacent processing elements pk and pl is
Dij/Rkl
11
Complexity
• Computationally intractable in general
• Small number of polynomial optimal algorithms in restricted cases
• A large number of heuristics in more general cases
• Quality of the scheduleschedule vs. Quality of the schedulerscheduler
12
Scheduling Task Graphs without considering communication
• Polynomial-Time Optimal Algorithms in the following cases:
1. Task graph is in-forest: each node has at most one immediate successor, or out-forest: each node has at most one immediate predecessor
2. Task graph is an interval order
In-Forest vs. Out-Forest Structure
In-Forest Out-Forest
13
14
Assumptions
• A task graph consisting of n tasks• A distributed system made up of m processors• The execution time of each task is one unit of time• Communication between any pair of tasks is zero• The goal is to find an optimal schedule, which
minimizes the completion time
15
List Scheduling
• All considered algorithms belong to the list scheduling class.
• Each task is assigned a priority, and a list of tasks is constructed in a decreasing priority order.
• A task becomes ready for execution when its immediate predecessors in the task graph have already been executed or if it does not have any predecessors.
16
Scheduling Inforest/Outforest task graphs
1. The level of each node in the task graph is calculated as given above and used as each node’s priority
2. Whenever a processor becomes available, assign it the unexecuted ready task with the highest priority
17
Example 1: Simple List Scheduling
10
14 13
12
11
9
8 7
5
6
4
2
3
1
P1 P2 P3
1
2 3 4
5 6 7
8 9 10 11 12
13 14 Level 5
Level 4
Level 3
Level 2
Level 11
1 1 1
1 1 1
1 1 1 1 1
1 1
Scheduling
Example 2: Simple List Scheduling
Task Priority
A 5
B 5
C 5
D 4
E 4
F 4
G 4
H 3
I 3
J 3
K 2
L 2
M 118
A B C
D E F
H I J
K L
M
G
t Processors
0 P1 P2 P3 P4
1 A B C E
2 D F G H
3 I J L
4 K
5 M
Priority Assignment
Scheduling
C D E
F G H
I J
K L
M
Priority Assignment
SchedulingA B
Example 3: Simple List Scheduling
19
20
Interval Orders
• A task graph is an interval order when its nodes can be mapped into intervals on the real line, and two elements are related iff the corresponding intervals do not overlap.
• For any interval ordered pair of nodes u and v, either the successors of u are also successors of v or the successors of v are also successors of u.
21
Scheduling interval ordered tasks
1. The number of successors of each node is used as each node’s priority
2. Whenever a processor becomes available, assign it the unexecuted ready task with the highest priority
22
Example 1: Scheduling Interval Ordered tasks
0 0 0
3 2 1
4 5 61
9
1 1
1 1 1
1 1 1
1 2 3
654
7 8
Time P1 P2
123
4 5 6
87 9
P30
1
2
3
Example 2: Scheduling Interval Ordered tasks
23
Task Priority
A 8
B 6
C 5
D 5
E 4
F 1
G 3
H 0
I 0
J 0
23
A B
C D E
F G
I JH
t Processors
0 P1 P2 P3
1 A B
2 C D E
3 G F
4 H I J
Priority Assignment
Scheduling
Example 3: Scheduling Interval Ordered tasks
2424
A B
C D E
G
K LH
Priority Assignment
Scheduling
F
I JH
25
Communication Models
• Completion Time– Execution time– Communication time
• Completion Time as 2 Components• Completion Time from the Gantt Chart
26
Completion Time as 2 Components
• Completion Time = Execution Time + Total Communication Delay
• Total Communication Delay = Number of communication messages * delay per message
• Execution time maximum finishing time of any task
• Number of communication messages – Model A– Model B
27
Completion Time from the Gantt Chart (Model C)• Completion Time = Schedule Length
• This model assumes the existence of an I/O processor with every processor in the system
• Communication delay between two tasks allocated to the same processor is negligible.
• Communication delay is counted only between two tasks assigned to different processors
28
Example
A
1
D
1
E
1
B
1
C
1
Assume a system with 2 processors
29
Models A and B
• Assume tasks A, B, and D are assigned to P1 and tasks C and E are assigned to p2
A
B
D
P1
C
E
P2
Model A
Number of messages = 2
Completion time = 3 + 2
Model B
Number of messages = 1
Completion time = 3 + 1
A
1
D
1
E
1
B
1
C
1
30
Model C
A
B
C D
E
Communication Delay
P1 P20
1
2
3
4
A
1
D
1
E
1
B
1
C
1
31
A
4
D
5
E
3
B
9
C
7
L
1
M
1
F
1
G
1
I
1
H
1
K
1
J
1
Processors
P1 P2 P3
A
B C D
E H J
F L K
G M
H I
Model A B Task
Assignment
Processors
P1 P2 P3
A
B
B C D
E H J
F K
G L
H M
I
ModelC Task
Assignment
Model ANumber of Messages = 2 + 2Completion time = 3 + (2*4 + 2*3) = 17
Model BNumber of Messages = 2 + 1 = 3 Completion time = 3 + (2*4 + 1*3) = 14
Model CCompletion time = 8
Communication delay is displayed in the graph for A & B.Assume execution time of a task is 1.
(assume all communication delay is 1 for simplicity)
Models A,B,C Example
32
Heuristics
A heuristic produces an answer in less than exponential time, but does not guarantee an optimal solution.
•Communication delay versus parallelism•Clustering•Duplication
33
a
b c
a
b
c
a
b
c
c
P1 P2 P1 P2
10
25
0
10
25
0
40
15
35c
40
50
x
x
x
Task Graph
Gantt Chart-2Gantt Chart-1
Task Exceution time
abc
101515
y
Arc Communication
(a,b) y
(a,c) x < y
x = 5 x = 25
30
Time Time
Communication Delay versus Parallelism
34
Clustering
a
bc
g
d e
f
a
bc
g
d e
f
a
bc
g
d e
f
(a)(b)
(c)
a
bc
g
d e
f
(d)
Clustering Example 1 Part 1
35
A
B
C
ED
F
G
4 3
2
1.5
2
1.5
5
1
1
1
1
2
1
1
Time P1 P2
1 A
2
B
3 C
4 D
5
6 E
7
8 F
9
10 G
Task Assignment
1
Communication DelayNOP
Clustering Example 1 Part 2
36
A
B
C
ED
F
G
4 3
2
1.5
2
1.5
5
1
1
1
1
2
1
1
Time P1 P2
1 A
2
B3 C
4 D
5
6E7
8 F
9 G
Task Assignment
1
Communication DelayNOP
37
Clustering Example 2
37
A
B D
FE
G
H
4 3
2
22
5
3
2
1
1
2
3
1
Time P1 P2
1 A
2
B3
4
5 D
6 D
7
C
E
8 E
9 F
10 F
11 G
12
13 H
Task Assignment
C
2
1
5
2
1
Communication DelayNOP
38
a
b c
a
b c
a
b
c
P1 P2 P1 P2
xx
Task Graph
x
a
No Duplication Task a is Duplicated
Duplications
Duplication Example (Using Clustering Example 1 Part 2)
39
A
B
C
ED
F
G
4 3
2
1.5
2
1.5
5
1
1
1
1
2
1
1
Time P1 P2
1 A A
2
B
C
3 D
4
5E6
7 F
8 G
Task Assignment
1
Communication DelayNOP
40
Scheduling and grain packing• Four major steps are involved in the grain determination and the process of
scheduling optimization:
– Step 1. Construct a fine-grain program graph.
– Step 2. Schedule the fine-grain computation.
– Step 3. Grain packing to produce the coarse grains.
– Step 4. Generate a parallel schedule based on the packed graph.
41
Program decomposition for static multiprocessor scheduling
• two 2 x 2 matrices A and B are multiplied to compute the sum of the four elements in the resulting product matrix C = A x B. There are eight multiplications and seven additions to be performed in this program, as written below:
2221
1211
2221
1211
2221
1211
C C
C C
B B
B B
A A
A A
42
Example 2.5 Ctd’– C11 = A11 B11 + A12 B21
– C12 = A11 B12 + A12 B22
– C21 = A21 B11 + A22 B21
– C22 = A21 B11 + A22 B22
– Sum = C11 + C12 + C21 + C22
2221
1211
2221
1211
2221
1211
C C
C C
B B
B B
A A
A A
43
44
45