Low-Cost Task Scheduling for Distributed-Memory Machines
Andrei Radulescu and Arjan J.C. Van Gemund
Presented by Bahadır Kaan Özütam
Outline
  Introduction
  List Scheduling
  Preliminaries
  General Framework for LSSP
  Complexity Analysis
  Case Study
  Extensions for LSDP
  Conclusion
Introduction
Task Scheduling
Scheduling heuristics
  Shared-memory vs. distributed-memory
  Bounded vs. unbounded number of processors
  Multistep vs. single-step methods
  Duplicating vs. nonduplicating methods
  Static vs. dynamic priorities
List Scheduling
LSDP and LSSP algorithms
LSSP (List Scheduling with Static Priorities)
  Tasks are scheduled in the order of their previously computed priorities, each on the task’s “best” processor.
  The best processor is ...
    the processor enabling the earliest start time, if schedule performance is the main concern
    the processor becoming idle the earliest, if scheduling speed is the main concern
LSDP (List Scheduling with Dynamic Priorities)
  Priorities are computed for task-processor pairs
  More complex
List Scheduling
Reducing LSSP time complexity
  O(V log(V) + (E + V) P)
  => O(V log(P) + E)
  V = expected number of tasks
  E = expected number of dependencies
  P = number of processors
1. Considering only two processors for each task
2. Maintaining a partially sorted task priority queue in which only a limited number of tasks are sorted
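To get a feel for the size of this reduction, the two bounds can be evaluated with all constants dropped. The problem size below is a made-up illustration, and the result is an operation-count estimate, not a measured run time.

```python
import math

def lssp_cost(v, e, p):
    """Leading terms of O(V log V + (E + V) P), constants dropped."""
    return v * math.log2(v) + (e + v) * p

def low_cost_lssp(v, e, p):
    """Leading terms of O(V log P + E), constants dropped."""
    return v * math.log2(p) + e

# Hypothetical problem size: 2000 tasks, 8000 dependencies, 32 processors.
V, E, P = 2000, 8000, 32
ratio = lssp_cost(V, E, P) / low_cost_lssp(V, E, P)
print(f"estimated reduction factor: {ratio:.1f}x")
```

The gap widens as P grows, because the original bound multiplies the graph size by P while the reduced bound only adds a log(P) factor per task.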
Preliminaries
Parallel programs are modeled as a DAG, G = (V, E)
Computation cost Tw(t)
Communication cost Tc(t, t’)
Communication-to-computation ratio (CCR)
The task graph width (W)
[Figure: example task graph with nodes labeled V (tasks) and edges labeled E (dependencies)]
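A minimal sketch of how such a task graph might be held in memory, assuming plain Python dictionaries for Tw and Tc. The graph and its costs are invented for illustration. CCR is taken here as the average communication cost divided by the average computation cost, which is the usual definition but is an assumption, since the slide does not spell it out.

```python
# Hypothetical 4-task graph: tw[t] is Tw(t), tc[(t, t2)] is Tc(t, t2).
tw = {"a": 2, "b": 3, "c": 1, "d": 2}
tc = {("a", "b"): 4, ("a", "c"): 1, ("b", "d"): 2, ("c", "d"): 3}

def ccr(tw, tc):
    """Average communication cost divided by average computation cost."""
    avg_comm = sum(tc.values()) / len(tc)
    avg_comp = sum(tw.values()) / len(tw)
    return avg_comm / avg_comp

print(ccr(tw, tc))
```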
Preliminaries
Entry and exit tasks
The bottom level Tb(t) of a task
Ready task: all parents scheduled
Start time Ts(t)
Finish time Tf(t)
Partial schedule
Processor ready time
  Tr(p) = max { Tf(t) : t ∈ V, Pr(t) = p }
Processor becoming idle the earliest (pr)
  Tr(pr) = min { Tr(p) : p ∈ P }
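The bottom level and the processor ready time are both simple maxima. The sketch below assumes the common definition of bottom level (longest path from the task to an exit task, counting both computation and communication costs), which the slide does not state. The graph and the helper names are made up for illustration.

```python
from functools import lru_cache

# Hypothetical graph: tw[t] is Tw(t), tc[(t, child)] is Tc(t, child).
tw = {"a": 2, "b": 3, "c": 1, "d": 2}
tc = {("a", "b"): 4, ("a", "c"): 1, ("b", "d"): 2, ("c", "d"): 3}

children = {t: [] for t in tw}
for (parent, child) in tc:
    children[parent].append(child)

@lru_cache(maxsize=None)
def bottom_level(t):
    """Tb(t): longest path from t to an exit task, including Tw(t)."""
    tails = [tc[(t, c)] + bottom_level(c) for c in children[t]]
    return tw[t] + max(tails, default=0)

def processor_ready_time(p, finish, placed_on):
    """Tr(p): latest finish time among tasks placed on p (0 if p is empty)."""
    return max((finish[t] for t in finish if placed_on[t] == p), default=0)

print({t: bottom_level(t) for t in tw})
```

Memoizing bottom_level means each task and each edge is visited once, which matches the O(E + V) cost quoted for priority computation.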
Preliminaries
The last message arrival time
  Tm(t) = max { Tf(t’) + Tc(t’, t) : (t’, t) ∈ E }
The enabling processor pe(t): the processor from which the last message arrives
Effective message arrival time
  Te(t, p) = max { Tf(t’) + Tc(t’, t) : (t’, t) ∈ E, Pr(t’) ≠ p }
The start time of a ready task t, once scheduled on processor p
  Ts(t, p) = max { Te(t, p), Tr(p) }
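These definitions translate almost directly into code. In the sketch below, the partial schedule (finish times, processor assignments, ready times) is a small invented example and all helper names are hypothetical.

```python
# Hypothetical partial schedule: parents a and b are scheduled, c is ready.
tc = {("a", "c"): 5, ("b", "c"): 1}   # communication costs Tc(parent, task)
finish = {"a": 2, "b": 5}             # finish times Tf(parent)
placed_on = {"a": 0, "b": 1}          # processor of each scheduled task
ready_time = {0: 2, 1: 5, 2: 0}       # processor ready times Tr(p)

def parents(t):
    return [u for (u, v) in tc if v == t]

def last_message_arrival(t):
    """Tm(t): latest arrival over all incoming messages."""
    return max((finish[u] + tc[(u, t)] for u in parents(t)), default=0)

def enabling_processor(t):
    """pe(t): processor of the parent whose message arrives last.

    Assumes t has at least one parent.
    """
    u = max(parents(t), key=lambda u: finish[u] + tc[(u, t)])
    return placed_on[u]

def effective_arrival(t, p):
    """Te(t, p): like Tm(t), but skipping parents already placed on p."""
    return max((finish[u] + tc[(u, t)] for u in parents(t)
                if placed_on[u] != p), default=0)

def start_time(t, p):
    """Ts(t, p) = max { Te(t, p), Tr(p) }."""
    return max(effective_arrival(t, p), ready_time[p])

for p in ready_time:
    print(p, start_time("c", p))
```

In this example the last message comes from processor 0, so placing c there removes the most expensive message and gives the earliest start.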
General Framework for LSSP
General LSSP algorithm
  Task priority computation: O(E + V)
  Task selection: O(V log W)
  Processor selection: O((E + V) P)
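The three steps can be sketched as one short function. This is an illustrative baseline under stated assumptions, not the authors' code: priorities are bottom levels, the ready list is a binary heap, and processor selection tries every processor, which is the O((E + V) P) step. The function name and the example graph are hypothetical.

```python
import heapq

def lssp_schedule(tw, tc, num_procs):
    """Naive list scheduling: static priorities, every processor tried."""
    children = {t: [] for t in tw}
    parents = {t: [] for t in tw}
    for (u, v) in tc:
        children[u].append(v)
        parents[v].append(u)

    # Step 1: priorities (bottom levels), computed in reverse topological order.
    order, pending = [], {t: len(parents[t]) for t in tw}
    stack = [t for t in tw if pending[t] == 0]
    while stack:
        t = stack.pop()
        order.append(t)
        for c in children[t]:
            pending[c] -= 1
            if pending[c] == 0:
                stack.append(c)
    tb = {}
    for t in reversed(order):
        tb[t] = tw[t] + max((tc[(t, c)] + tb[c] for c in children[t]), default=0)

    # Steps 2 and 3: pop the highest-priority ready task, then pick its processor.
    ready = [(-tb[t], t) for t in tw if not parents[t]]
    heapq.heapify(ready)
    waiting_on = {t: len(parents[t]) for t in tw}
    proc_ready = [0] * num_procs
    start, finish, placed_on = {}, {}, {}
    while ready:
        _, t = heapq.heappop(ready)

        def ts(p):
            te = max((finish[u] + tc[(u, t)] for u in parents[t]
                      if placed_on[u] != p), default=0)
            return max(te, proc_ready[p])

        p = min(range(num_procs), key=ts)   # scans all P processors
        start[t], finish[t], placed_on[t] = ts(p), ts(p) + tw[t], p
        proc_ready[p] = finish[t]
        for c in children[t]:
            waiting_on[c] -= 1
            if waiting_on[c] == 0:
                heapq.heappush(ready, (-tb[c], c))
    return start, finish, placed_on

# Hypothetical 4-task graph on 2 processors.
tw = {"a": 2, "b": 3, "c": 1, "d": 2}
tc = {("a", "b"): 4, ("a", "c"): 1, ("b", "d"): 2, ("c", "d"): 3}
start, finish, placed_on = lssp_schedule(tw, tc, 2)
print(start, finish, placed_on)
```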
General Framework for LSSP
Processor Selection
Selecting a processor from two candidates:
  1. The enabling processor, pe(t)
  2. The processor becoming idle first, pr
Ts(t, p) = max { Te(t, p), Tr(p) }
General Framework for LSSP
Lemma 1.
  ∀ p ≠ pe(t) : Te(t, p) = Tm(t)
Theorem 1.
  If t is a ready task, one of the processors p ∈ { pe(t), pr } satisfies
  Ts(t, p) = min { Ts(t, px) : px ∈ P }
Processor selection cost: O((E + V) P) => O(V log(P) + E)
  O(E + V) to traverse the task graph
  O(V log P) to maintain the processors sorted
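Theorem 1 means the scan over all processors can be replaced by a comparison of two candidates. The sketch below checks this on one invented state by comparing the two-candidate choice with a brute-force scan. The function names and the numbers are hypothetical.

```python
def select_processor(tm, te_on_ep, ep, ready_time):
    """Pick the better of pe(t) and pr, the only candidates Theorem 1 needs.

    tm         -- Tm(t), which by Lemma 1 equals Te(t, p) for every p != pe(t)
    te_on_ep   -- Te(t, pe(t))
    ep         -- pe(t), the enabling processor
    ready_time -- list with ready_time[p] = Tr(p)
    """
    # A real implementation would keep processors in a heap keyed by Tr(p)
    # so that pr is found in O(log P); min() is used here for brevity.
    pr = min(range(len(ready_time)), key=lambda p: ready_time[p])
    ts_ep = max(te_on_ep, ready_time[ep])
    ts_pr = max(te_on_ep if pr == ep else tm, ready_time[pr])
    return (ep, ts_ep) if ts_ep <= ts_pr else (pr, ts_pr)

def select_processor_brute_force(tm, te_on_ep, ep, ready_time):
    """Reference version: evaluate Ts(t, p) on every processor."""
    def ts(p):
        return max(te_on_ep if p == ep else tm, ready_time[p])
    best = min(range(len(ready_time)), key=ts)
    return best, ts(best)

# Invented state: the last message arrives at time 7 from processor 0,
# and the latest message from any other processor arrives at time 6.
state = dict(tm=7, te_on_ep=6, ep=0, ready_time=[9, 5, 8, 10])
print(select_processor(**state), select_processor_brute_force(**state))
```

Both functions return the same start time. When two processors tie, they may name different processors, which does not affect the schedule length at this step.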
General Framework for LSSP
Task Selection
O(V log W) can be reduced by sorting only some of the tasks.
Task priority queue
  1. A sorted list of size H
  2. A FIFO list (O(1) per operation)
Task selection cost decreases to O(V log H)
H needs to be adjusted
  H = P is optimal (O(V log P))
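One way to picture the two-part queue is the sketch below. It is an assumption-laden illustration: the class name is hypothetical, and a plain Python list stands in for the sorted part, so insertion and removal cost O(H) here. A heap or balanced tree of size H would be needed to reach the O(log H) per-task bound.

```python
import bisect
from collections import deque

class PartiallySortedQueue:
    """Ready-task queue: a sorted list of at most `size` tasks plus a FIFO."""

    def __init__(self, size):
        self.size = size
        self.sorted = []      # (negated priority, task), best task first
        self.fifo = deque()   # overflow, kept in arrival order

    def push(self, task, priority):
        if len(self.sorted) < self.size:
            bisect.insort(self.sorted, (-priority, task))
        else:
            self.fifo.append((priority, task))   # O(1), no sorting

    def pop(self):
        """Remove the head of the sorted list, then refill it from the FIFO."""
        _, task = self.sorted.pop(0)
        if self.fifo:
            priority, waiting = self.fifo.popleft()
            bisect.insort(self.sorted, (-priority, waiting))
        return task

# Priorities 11, 9 and 12 mirror three ready tasks from the case study.
q = PartiallySortedQueue(size=2)
q.push("t1", 11)
q.push("t2", 9)
q.push("t3", 12)          # sorted part is full, so t3 waits in the FIFO
first = q.pop()
print(first, [t for _, t in q.sorted])
```

Note that t1 is selected before t3 even though t3 has the higher priority. That is the price of sorting only H tasks: a task in the FIFO cannot overtake the sorted part until it is moved there.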
Complexity Analysis
Computing task priorities: O(E + V)
Task selection
  O(V log W) for a fully sorted priority queue
  O(V log H) for a partially sorted priority queue
  O(V log P) for a queue of size P
Processor selection
  O(E + V) to traverse the task graph
  O(V log P) to maintain the processors sorted
Total complexity
  O(V (log(W) + log(P)) + E), fully sorted
  O(V log(P) + E), partially sorted
Case Study
MCP (Modified Critical Path)
  The task having the highest bottom level has the highest priority
FCP (Fast Critical Path)
  3 processors
  Partially sorted priority queue of size 2
  8 tasks (t0 to t7)
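[Figure: example task graph. Nodes are labeled task / computation cost: t0 / 2; t1 / 2, t2 / 2, t3 / 2; t4 / 3, t5 / 3, t6 / 2; t7 / 2. Edges are labeled with communication costs.]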
Case Study
Ready tasks                                   Scheduling
Sorted list         FIFO list          t      t -> p [Ts - Tf]
t0 [15]             -                  t0     t0 -> p0 [0 - 2]
t1 [11], t2 [9]     t3 [12]            t1     t1 -> p0 [2 - 4]
t3 [12], t2 [9]     t4 [6], t5 [8]     t3     t3 -> p1 [3 - 6]
t2 [9], t4 [6]      t5 [8]             t2     t2 -> p0 [4 - 6]
t5 [8], t4 [6]      t6 [6]             t5     t5 -> p2 [6 - 9]
t4 [6], t6 [6]      -                  t4     t4 -> p0 [6 - 9]
t6 [6]              -                  t6     t6 -> p1 [7 - 9]
t7 [2]              -                  t7     t7 -> p2 [11 - 13]
Extensions for LSDP
Extend the approach to dynamic priorities
  ETF (Earliest Task First): the ready task that starts the earliest
  ERT (Earliest Ready Task): the ready task that finishes the earliest
  DLS (Dynamic Level Scheduling): the task-processor pair having the highest dynamic level
General formula
  priority(t, p) = λ(t) + max { Te(t, p), Tr(p) }
  ETF: λ(t) = 0
  ERT: λ(t) = Tw(t)
  DLS: λ(t) = -Tb(t)
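The general formula says the three algorithms differ only in a per-task term added to the start time. The sketch below makes that concrete, assuming the pair with the smallest value is the one selected (earliest start for ETF, earliest finish for ERT, and for DLS the largest bottom level minus start time). The candidate pairs and all names are invented.

```python
def pair_priority(term, te, tr):
    """Value to minimise for a (task, processor) pair: term + max(Te, Tr)."""
    return term + max(te, tr)

# Per-task term for each algorithm, as listed on the slide.
def etf_term(tw, tb):
    return 0          # compares start times

def ert_term(tw, tb):
    return tw         # compares finish times

def dls_term(tw, tb):
    return -tb        # compares start time minus bottom level

# Invented candidates: (task, processor) -> (Tw, Tb, Te, Tr)
candidates = {
    ("x", 0): (4, 10, 3, 5),
    ("y", 1): (1, 3, 6, 2),
}

def best_pair(term):
    def value(key):
        tw, tb, te, tr = candidates[key]
        return pair_priority(term(tw, tb), te, tr)
    return min(candidates, key=value)

for name, term in [("ETF", etf_term), ("ERT", ert_term), ("DLS", dls_term)]:
    print(name, best_pair(term))
```

On these numbers ETF and DLS prefer the long task x, which can start earlier and lies on a longer path, while ERT prefers the short task y, which finishes first.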
Extensions for LSDP
EP (enabling processor) case
  On each processor, the tasks are sorted
  The processors are sorted
Non-EP case
  The processor becoming idle first
  If this is the EP, it falls to the EP case
Extensions for LSDP
3 tries: 1 for EP case, 2 for non-EP case
Task priority queues maintained: P for EP case, 2 for non-EP case
Each task is added to 3 queues: 1 for EP case, 2 for non-EP case
Processor queues: 1 for EP case, 1 for non-EP case
Complexity
Originally O(W (E + V) P)
Now O(V (log(W) + log(P)) + E)
Can be further reduced using a partially sorted priority queue; a queue size of P is required to maintain comparable performance:
  O(V log(P) + E)
Conclusion
LSSP can be performed at a significantly lower cost:
  Processor selection between only two processors: the enabling processor or the processor becoming idle first
  Task selection: only a limited number of tasks are sorted
Using the extension of this method, LSDP complexity can also be reduced
For large program and processor dimensions, a superior cost-performance trade-off.
Thank You
Questions?