Task Scheduling on Adaptive Multi-Core
TRANSCRIPT
Task Scheduling on Adaptive Multi-Core
Under the guidance of Kavitha N (Asst. Professor)
Haris M E
Roll No. 29
Seminar on:
Contents
INTRODUCTION
ADAPTIVE MULTI-CORE
BACKGROUND WORK
BAHURUPI ADAPTIVE MULTI-CORE
OPTIMAL SCHEDULE FOR ADAPTIVE MULTI-CORE
CONCLUSION
FUTURE WORK
REFERENCES
Terms
• Instruction-level parallelism (ILP) is a measure of how many of the operations in a computer program can be performed simultaneously.
• Thread-Level Parallelism (TLP) is a software capability that enables a program to work with multiple threads at the same time.
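As an illustration of TLP (not from the slides), the following Python sketch runs two independent computations on separate threads. Note that in CPython the GIL limits true parallel execution of Python bytecode, but the threading model is the same:

```python
import threading

results = {}

def count_up(name, n):
    # Each thread performs independent work: thread-level parallelism (TLP).
    total = 0
    for i in range(n):
        total += i
    results[name] = total

# Two threads execute concurrently, exploiting TLP.
t1 = threading.Thread(target=count_up, args=("a", 1000))
t2 = threading.Thread(target=count_up, args=("b", 2000))
t1.start(); t2.start()
t1.join(); t2.join()
print(results["a"], results["b"])  # 499500 1999000
```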
INTRODUCTION
• Computing systems have made the irreversible transition towards multi-core architectures due to power and thermal limits.
• An increasing number of simple cores enables parallel applications to benefit from abundant thread-level parallelism (TLP), while sequential fragments suffer from poor exploitation of instruction-level parallelism (ILP).
• Adaptive architectures can seamlessly exploit both ILP and TLP.
Symmetric multi-core
• Symmetric multi-core solutions are a perfect match for easily parallelizable applications that can exploit significant thread-level parallelism (TLP).
• Current trend is to multiply the number of cores on chip to offer more TLP, while reducing the complexity of the core to avoid power and thermal issues.
Asymmetric multi-core
• Asymmetric multi-cores are positioned to accommodate varied workloads (a mix of ILP and TLP) better than symmetric multi-cores.
• The asymmetric multi-core architecture provides opportunity for the sequential applications to exploit ILP through a complex core.
• As the mix of simple and complex cores is frozen at design time, an asymmetric multi-core lacks the flexibility to adjust itself to a dynamic workload.
Adaptive multi-core
• The next step forward to support diverse and dynamic workloads is to design a multi-core that can transform itself at runtime according to the applications.
• Adaptive multi-core architectures are physically fabricated as a set of simple, identical cores.
• At run time, two or more such simple cores can be coalesced together to create a more complex virtual core.
ADAPTIVE MULTI-CORE…
• Adaptive multi-cores appear well poised to support diverse and dynamic workloads consisting of a mix of ILP and TLP.
• Existing performance evaluations of adaptive multi-cores, however, only look at how well a virtual complex core formed from simple physical cores can exploit ILP in a sequential application or TLP in parallel applications, independently.
Background work
The following are a few adaptive multi-core architectures that have been proposed in the literature recently:
• Core Fusion fuses homogeneous cores using complex hardware mechanism.
• They evaluate the performance by running serial tasks and parallel tasks separately on the adaptive multi-core.
Background work….
• Voltron exploits different forms of parallelism by coupling cores that together act as a VLIW processor.
• This architecture relies on a very complex compiler that must detect all forms of parallelism found in the sequential program.
• The evaluation is performed with only sequential applications.
• Federation shows an alternative solution where two scalar cores are merged into a 2-way out-of-order (ooo) core.
• They evaluate the performance by running only sequential applications.
Bahurupi adaptive multi-core
• Bahurupi is a cluster based architecture that imposes realistic constraints on the scheduler.
• The figure shows an example of an 8-core Bahurupi architecture with two clusters, (C0-C3) and (C4-C7), each consisting of 2-way out-of-order cores.
• In coalition mode, the cores can merge together to create a virtual core that can exploit more ILP.
• A 2-core coalition behaves like a 4-way ooo processor, while a 4-core coalition behaves like an 8-way ooo processor.
• Bahurupi can only create coalition within a cluster.
• Thus each coalition can consist of at most 4 cores and each cluster can have at most one active coalition at a time. The highlighted cores in Fig. 3 are involved in two coalitions of two and four cores.
• Here one parallel application runs its two threads on cores C2 and C3, one medium-ILP sequential application is scheduled to the coalition (C0-C1), and one high-ILP sequential application is scheduled to the coalition (C4-C7).
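The mapping in this example can be written down as a simple data structure; this is an illustrative sketch (the task names are mine, and cores are assumed to carry the cluster labels C0-C7), not from the original paper:

```python
# Hypothetical representation of the scheduling example above:
# an 8-core Bahurupi chip with clusters (C0-C3) and (C4-C7).
schedule = {
    "parallel_app_thread0": ["C2"],
    "parallel_app_thread1": ["C3"],
    "medium_ilp_seq_app":   ["C0", "C1"],              # 2-core coalition -> 4-way ooo
    "high_ilp_seq_app":     ["C4", "C5", "C6", "C7"],  # 4-core coalition -> 8-way ooo
}

# Every core is used exactly once and each coalition stays within one cluster.
used = sorted(c for cores in schedule.values() for c in cores)
print(used)  # ['C0', 'C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'C7']
```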
OPTIMAL SCHEDULE FOR ADAPTIVE MULTI-CORE
• The ideal adaptive multi-core architecture consists of m physical 2-way superscalar out-of-order cores. Any subset of these cores can be combined to form one or more complex out-of-order cores.
• If r cores (r <= m) form a coalition, the resulting complex core supports 2r-way out-of-order execution.
• The architecture can support any number of coalitions as long as the total number of cores included in coalitions at any point does not exceed m.
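A minimal sketch of the ideal coalition model described above (the function name is mine):

```python
def coalition_width(r, m=8):
    """Effective issue width of a coalition of r two-way cores (ideal model)."""
    if not (1 <= r <= m):
        raise ValueError("coalition size must satisfy 1 <= r <= m")
    return 2 * r  # r coalesced 2-way cores behave as a 2r-way ooo core

print(coalition_width(1), coalition_width(2), coalition_width(4))  # 2 4 8
```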
The scheduling problem:
• Consider an ideal adaptive multi-core architecture consisting of m homogeneous, independent physical processors {P0, P1, P2, …, Pm-1} running a set of n (n <= m) preemptive malleable tasks {T0, T1, T2, …, Tn-1}. We assume that core coalition does not incur any performance overhead. The objective is to allocate and schedule the tasks on the cores so as to minimize the makespan Cmax = maxj{Cj}, where Cj is the finish time of task Tj.
• The adaptive architecture allows both sequential and parallel applications to use a time-varying number of cores.
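As a minimal illustration of the objective (the finish times are hypothetical), the makespan of a schedule is just the latest finish time:

```python
def makespan(finish_times):
    """Cmax = max_j Cj, where Cj is the finish time of task Tj."""
    return max(finish_times)

# Finish times of four tasks (hypothetical values):
print(makespan([5.0, 7.5, 6.2, 4.1]))  # 7.5
```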
• Let us denote the number of processors assigned to task Tj by rj, where 0 < rj <= m.
• Then the set of feasible resource allocations of processors to tasks is as follows:

R = {r = (r0, …, rn-1) | rj > 0, Σ(j=0..n-1) rj <= m}
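The feasible set R can be enumerated directly for small instances; a sketch (function name is mine):

```python
from itertools import product

def feasible_allocations(n, m):
    """All r = (r0, ..., r_{n-1}) with rj > 0 and sum(rj) <= m."""
    return [r for r in product(range(1, m + 1), repeat=n) if sum(r) <= m]

# Two tasks on a 4-core machine:
print(feasible_allocations(2, 4))
# [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1)]
```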
Task scheduling in Bahurupi
Task Scheduling constraints:
When scheduling tasks on a realistic adaptive multi-core architecture, we must take into consideration all the constraints and limitations imposed by the system.
• Constraint 1(C1):
A sequential application can use at most four cores.
• Constraint 2 (C2): The architecture can form at most m/4 coalitions at any time.
• Constraint 3 (C3): A sequential application can only use cores that belong to the same cluster.
• There are no such constraints for parallel tasks. A parallel task may use any number of available cores to minimize the overall makespan.
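A feasibility check for a single sequential task's allocation under C1 and C3 can be sketched as follows (C2 is a chip-wide count and is omitted here; the helper name and cluster encoding are my assumptions):

```python
def valid_sequential_allocation(cores, m, cluster_size=4):
    """Check Bahurupi constraints for one sequential task's core set.

    cores: indices of the physical cores assigned to the task.
    C1: a sequential task uses at most four cores.
    C3: all its cores lie in the same cluster of `cluster_size` cores.
    """
    if not cores or len(cores) > 4:                    # C1
        return False
    clusters = {c // cluster_size for c in cores}      # cluster id per core
    return len(clusters) == 1 and max(cores) < m       # C3, cores on chip

print(valid_sequential_allocation([0, 1], 8))           # True
print(valid_sequential_allocation([3, 4], 8))           # False: spans two clusters
print(valid_sequential_allocation([0, 1, 2, 3, 4], 8))  # False: more than 4 cores
```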
Task scheduling algorithm
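The algorithm slide itself is an image and is not recoverable from this transcript. As a stand-in, here is a minimal greedy allocator that respects constraints C1 and C3; it is an illustrative sketch only, not the paper's optimal algorithm:

```python
def greedy_schedule(tasks, m, cluster_size=4):
    """Illustrative greedy allocator (NOT the paper's optimal algorithm).

    tasks: list of (name, kind, want) where kind is 'seq' or 'par' and
    want is the preferred number of cores.  Sequential tasks get a
    coalition capped at 4 cores inside one cluster (C1, C3); parallel
    tasks take any remaining cores.
    """
    free = list(range(m))
    alloc = {}
    for name, kind, want in tasks:
        if kind == "seq":
            want = min(want, 4)                        # C1
            # Find a cluster with enough free cores (C3).
            for base in range(0, m, cluster_size):
                in_cluster = [c for c in free if base <= c < base + cluster_size]
                if len(in_cluster) >= want:
                    alloc[name] = in_cluster[:want]
                    break
            else:
                alloc[name] = []                       # no feasible coalition
        else:  # parallel task: no cluster constraint
            take = min(want, len(free))
            alloc[name] = free[:take]
        for c in alloc[name]:
            free.remove(c)
    return alloc

sched = greedy_schedule(
    [("seq_hi", "seq", 4), ("par", "par", 2), ("seq_med", "seq", 2)], m=8)
print(sched)  # {'seq_hi': [0, 1, 2, 3], 'par': [4, 5], 'seq_med': [6, 7]}
```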
The experiments
• They conducted a study to characterize the performance limit of adaptive multi-cores compared to static multi-cores.
• They generated different workload (task sets) consisting of varying mix of ILP and TLP tasks.
• They used 850 task sets and computed the makespan of each task set on different multi-core architectures.
• They modeled seven different static and adaptive multi-core configurations in the study, as shown in the table.
Results
• The normalized speedup of adaptive architectures ranges from 10% to 49%.
• On average, adaptive architectures outperform the asymmetric configurations A1, A2 and A3 by 18%, 35% and 52% respectively.
• When compared with the symmetric configuration S2, the adaptive architecture performs 26% better, which makes the symmetric configuration S2 a better option than the asymmetric configurations A2 and A3.
• The results clearly demonstrate that adaptive architectures (Ideal and Bahurupi) perform significantly better when compared to static symmetric and asymmetric architectures.
• The performance of Bahurupi is practically identical to that of Ideal adaptive architecture even though Bahurupi imposes certain constraints on core coalition.
Limitations
• Unlike static multi-cores, adaptive multi-core architectures require complex scheduling algorithms.
• The assumption that core coalition does not incur any performance overhead does not hold in practice.
• In the Bahurupi adaptive multi-core architecture, core coalitions are limited by realistic constraints, which makes it perform worse than an ideal adaptive multi-core architecture.
Conclusion
• They have presented a comprehensive quantitative approach to establish the performance potential of adaptive multi-core architectures compared to static symmetric and asymmetric multi-cores.
• This is the first performance study that considers a mix of sequential and parallel workloads to observe the capability of adaptive multi-cores in exploiting both ILP and TLP.
• They employed an optimal algorithm that allocates and schedules the tasks on a varying number of cores so as to minimize the makespan.
• The experiments reveal that both the ideal and the realistic adaptive architectures provide a significant reduction in makespan for mixed workloads compared to static symmetric and asymmetric architectures.
FUTURE WORK
• Developing optimal schedulers for future versions of adaptive multi-core architectures.
• Improving the scheduling algorithm for better utilization of processors.
• Bringing the Bahurupi adaptive multi-core architecture closer to the ideal adaptive multi-core architecture.
REFERENCES
• P.-E. Bernard, T. Gautier, and D. Trystram, “Large Scale Simulation of Parallel Molecular Dynamics,” Proc. 13th Int’l Symp. Parallel Processing / 10th Symp. Parallel and Distributed Processing, pp. 638-644, 1999.
• C. Bienia, S. Kumar, J.P. Singh, and K. Li, “The PARSEC Benchmark Suite: Characterization and Architectural Implications,” Proc. 17th Int’l Conf. Parallel Architectures and Compilation Techniques, pp. 72-81, 2008.
• E. Blayo and L. Debreu, “Adaptive Mesh Refinement for Finite-Difference Ocean Models: First Experiments,” J. Physical Oceanography, vol. 29, pp. 1239-1250, 1999.