![Page 1: Asymmetry Aware Scheduling Algorithms for Asymmetric Processors](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c44550346895da5c1ba/html5/thumbnails/1.jpg)
Asymmetry Aware Scheduling Algorithms for Asymmetric Processors
Nagesh Lakshminarayana, Sushma Rao, Hyesoon Kim
Computer Science, Georgia Institute of Technology
![Page 2: Asymmetry Aware Scheduling Algorithms for Asymmetric Processors](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c44550346895da5c1ba/html5/thumbnails/2.jpg)
Outline
• Background and Problem
• Application characteristics on AMP/SMP
• LJFPF Policy
• CJFPF Policy
• Conclusion
![Page 3: Asymmetry Aware Scheduling Algorithms for Asymmetric Processors](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c44550346895da5c1ba/html5/thumbnails/3.jpg)
Heterogeneous Architectures
• A particularly interesting class of parallel machines is the heterogeneous architecture:
  – Multiple types of Processing Elements (PEs) available on the same machine
[Figure: one PE of type A (PEA) and four PEs of type B (PEB) connected by an interconnect]
![Page 4: Asymmetry Aware Scheduling Algorithms for Asymmetric Processors](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c44550346895da5c1ba/html5/thumbnails/4.jpg)
Heterogeneous Architectures
• Heterogeneous architectures are becoming very common:
  – Multicore CPU + GPU
  – IBM Cell processor
  – Special accelerator
  – Asymmetric processors (fast cores plus slow cores): focus of this talk
![Page 5: Asymmetry Aware Scheduling Algorithms for Asymmetric Processors](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c44550346895da5c1ba/html5/thumbnails/5.jpg)
Scheduling Problem: Multiple applications
[Figure: scalable and non-scalable applications alongside a processor with fast cores and slow cores]
![Page 6: Asymmetry Aware Scheduling Algorithms for Asymmetric Processors](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c44550346895da5c1ba/html5/thumbnails/6.jpg)
Scheduling Problem: Multi-threaded application
[Figure: asymmetric processor with fast cores and slow cores]
![Page 7: Asymmetry Aware Scheduling Algorithms for Asymmetric Processors](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c44550346895da5c1ba/html5/thumbnails/7.jpg)
Problem
How to schedule multi-threaded applications on Asymmetric Multiprocessors (AMP)?
![Page 8: Asymmetry Aware Scheduling Algorithms for Asymmetric Processors](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c44550346895da5c1ba/html5/thumbnails/8.jpg)
Outline
• Background and Problem
• Application characteristics on AMP/SMP
• LJFPF Policy
• CJFPF Policy
• Conclusion
![Page 9: Asymmetry Aware Scheduling Algorithms for Asymmetric Processors](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c44550346895da5c1ba/html5/thumbnails/9.jpg)
Experimental Methodology
• Use a 1.87 GHz two-socket quad-core machine to measure performance
• Use SpeedStep technology to emulate an AMP
• Configurations:
  – All-slow (SMP): all 8 processors run at 1.6 GHz
  – One-fast (AMP): 1 processor runs at 1.87 GHz, 7 processors run at 1.6 GHz
  – Half-half (AMP): 4 processors run at 1.87 GHz, 4 processors run at 1.6 GHz
  – All-fast (SMP): all processors run at 1.87 GHz
![Page 10: Asymmetry Aware Scheduling Algorithms for Asymmetric Processors](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c44550346895da5c1ba/html5/thumbnails/10.jpg)
Performance Results on AMP/SMP
[Figure: normalized execution time (y-axis, 0.8 to 1.05) for the All-slow, One-fast, Half-half, and All-fast configurations]
![Page 11: Asymmetry Aware Scheduling Algorithms for Asymmetric Processors](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c44550346895da5c1ba/html5/thumbnails/11.jpg)
Slow-Limited Applications
[Figure: threads on fast cores and slow cores synchronizing at a barrier]
![Page 12: Asymmetry Aware Scheduling Algorithms for Asymmetric Processors](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c44550346895da5c1ba/html5/thumbnails/12.jpg)
Middle-perf Benchmarks
[Figure: threads synchronizing at a barrier]
• Similar to a slow-limited benchmark, but the sequential section is much longer
![Page 13: Asymmetry Aware Scheduling Algorithms for Asymmetric Processors](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c44550346895da5c1ba/html5/thumbnails/13.jpg)
Unstable Benchmarks
[Figure: threads synchronizing at multiple barriers]
• Lots of barriers
• Asymmetric workloads
![Page 14: Asymmetry Aware Scheduling Algorithms for Asymmetric Processors](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c44550346895da5c1ba/html5/thumbnails/14.jpg)
PARSEC Benchmarks
| Application | Locks | Barriers | Cond. Variables | AMP performance category |
|---|---|---|---|---|
| Blackscholes | 39 | 8 | 0.000 | slow-limited |
| Bodytrack | 6824702 | 111160 | 0.003 | unstable |
| Canneal | 34 | 0 | 0.003 | middle-perf |
| dedup | 10002625 | 0 | 0.009 | unstable |
| ferret | 1422579 | 0 | 0.014 | slow-limited |
| facesim | 7384488 | 0 | 0.03 | middle-perf |
| Fluidanimate | 1153407308 | 31998 | 0.02 | unstable |
| Freqmine | 39 | 0 | 0.12 | middle-perf |
| streamcluster | 1379 | 633174 | 0.013 | middle-perf |
| swaptions | 9 | 0 | 0.00 | slow-limited |
| vips | 11 | 0 | 0.0049 | unstable |
| x264 | 207692 | 0 | 13793 | middle-perf |
![Page 15: Asymmetry Aware Scheduling Algorithms for Asymmetric Processors](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c44550346895da5c1ba/html5/thumbnails/15.jpg)
Outline
• Background and Problem
• Applications on AMP/SMP
• LJFPF Policy
• CJFPF Policy
• Conclusion
![Page 16: Asymmetry Aware Scheduling Algorithms for Asymmetric Processors](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c44550346895da5c1ba/html5/thumbnails/16.jpg)
LJFPF Policy
• Longest Job to a Fast Processor First
[Figure: threads on fast cores and slow cores reaching a barrier]
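The slide only names the policy, so the following is a minimal sketch of one way to read "longest job to a fast processor first", not the authors' implementation. It assumes one job per core and that each job's length is already known (the next slide says the application supplies it). The `Core` class, the `ljfpf_assign` function, and the sample work sizes are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Core:
    name: str
    freq_ghz: float


def ljfpf_assign(job_lengths, cores):
    """Pair the longest jobs with the fastest cores (assumes one job per core)."""
    if len(job_lengths) > len(cores):
        raise ValueError("this sketch assumes at most one job per core")
    # Jobs ordered longest first; cores ordered fastest first.
    jobs = sorted(enumerate(job_lengths), key=lambda j: j[1], reverse=True)
    fastest = sorted(cores, key=lambda c: c.freq_ghz, reverse=True)
    return {job_id: core for (job_id, _), core in zip(jobs, fastest)}


# Hypothetical example: two fast and two slow cores, four jobs of unequal length.
cores = [Core("slow0", 1.6), Core("fast0", 1.87), Core("slow1", 1.6), Core("fast1", 1.87)]
work = [300, 360, 240, 330]
assignment = ljfpf_assign(work, cores)
for job_id in sorted(assignment):
    print(job_id, work[job_id], assignment[job_id].name)
```

Sorting both lists and zipping them keeps the sketch short; a real scheduler would also have to handle more jobs than cores and jobs that arrive over time.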
![Page 17: Asymmetry Aware Scheduling Algorithms for Asymmetric Processors](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c44550346895da5c1ba/html5/thumbnails/17.jpg)
How Does the Scheduler Know
• Length of work?
• Current mechanism: application sends the information
• On-going work: Prediction mechanism
![Page 18: Asymmetry Aware Scheduling Algorithms for Asymmetric Processors](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c44550346895da5c1ba/html5/thumbnails/18.jpg)
Evaluation
• Matrix Multiplication
  – Sequential version
  – Parallel version, symmetric workload
  – Parallel version, asymmetric workload
![Page 19: Asymmetry Aware Scheduling Algorithms for Asymmetric Processors](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c44550346895da5c1ba/html5/thumbnails/19.jpg)
Asymmetric Workload (Matrix Multiplication)
[Figure: normalized execution time (y-axis, 0.9 to 1.2) versus workload split (300-300, 310-290, 320-280, 330-270, 340-260, 350-250, 360-240) for All-fast, Half-half (LJFPF), Half-half (RR), and All-slow]
![Page 20: Asymmetry Aware Scheduling Algorithms for Asymmetric Processors](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c44550346895da5c1ba/html5/thumbnails/20.jpg)
Real Application
• ITK (medical image processing toolkit)
  – Open source, but a real application
![Page 21: Asymmetry Aware Scheduling Algorithms for Asymmetric Processors](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c44550346895da5c1ba/html5/thumbnails/21.jpg)
Evaluation: MultiRegistration
• Kernel loop has 50 iterations
  – 50 % 8 ≠ 0, so the iterations cannot be split evenly 8 ways
• Divide 50 iterations into 7, 7, 7, 7, 6, 6, 5, 5
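To see why the placement of these uneven chunks matters, here is a small worked sketch using the half-half configuration from the methodology slide (4 cores at 1.87 GHz, 4 at 1.6 GHz). It rests on a strong simplifying assumption that is not in the slides: a chunk's run time is proportional to its iteration count divided by the core frequency. The `barrier_time` helper and the "longest chunks on slow cores" comparison case are hypothetical, and the resulting ratio is not meant to reproduce the measured results.

```python
FAST_GHZ, SLOW_GHZ = 1.87, 1.6
chunks = [7, 7, 7, 7, 6, 6, 5, 5]            # split from the slide
half_half = [FAST_GHZ] * 4 + [SLOW_GHZ] * 4  # half-half configuration


def barrier_time(chunks, freqs):
    """Time until the slowest (chunk, core) pair finishes, in arbitrary units.

    Assumes time = iterations / frequency, which ignores memory and I/O effects.
    """
    return max(c / f for c, f in zip(chunks, freqs))


fastest_first = sorted(half_half, reverse=True)

# LJFPF-style placement: longest chunks paired with the fastest cores.
ljfpf = barrier_time(sorted(chunks, reverse=True), fastest_first)

# Hypothetical asymmetry-unaware placement: longest chunks land on slow cores.
unaware = barrier_time(sorted(chunks), fastest_first)

print(f"LJFPF-style placement: {ljfpf:.3f}")
print(f"longest-on-slow placement: {unaware:.3f}")
```

Under this toy model the barrier is released when the last thread finishes, so giving the 7-iteration chunks to the fast cores shortens the wait for everyone else.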
![Page 22: Asymmetry Aware Scheduling Algorithms for Asymmetric Processors](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c44550346895da5c1ba/html5/thumbnails/22.jpg)
Results: ITK Benchmark
[Figure: normalized execution time (y-axis, 0.92 to 1.04) for All-fast, Half-half (LJFPF), Half-half (RR), and All-slow; a 2.3% difference is annotated]
![Page 23: Asymmetry Aware Scheduling Algorithms for Asymmetric Processors](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c44550346895da5c1ba/html5/thumbnails/23.jpg)
Outline
• Background and Problem
• Application characteristics on AMP/SMP
• LJFPF Policy
• CJFPF Policy
• Conclusion
![Page 24: Asymmetry Aware Scheduling Algorithms for Asymmetric Processors](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c44550346895da5c1ba/html5/thumbnails/24.jpg)
Critical Section
[Figure: lock acquisitions around a critical section]
![Page 25: Asymmetry Aware Scheduling Algorithms for Asymmetric Processors](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c44550346895da5c1ba/html5/thumbnails/25.jpg)
Critical Section Limited Workloads
[Figure: thread timelines for Case (a) and Case (b), showing critical section, useful work, and waiting]
![Page 26: Asymmetry Aware Scheduling Algorithms for Asymmetric Processors](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c44550346895da5c1ba/html5/thumbnails/26.jpg)
Critical Section Effects
[Figure: speedup (y-axis, 0 to 9) for 10%, 15%, and 20% critical-section (CS) workloads on All-fast, Half-half, and All-slow]
• Half-half performs similarly to all-fast
![Page 27: Asymmetry Aware Scheduling Algorithms for Asymmetric Processors](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c44550346895da5c1ba/html5/thumbnails/27.jpg)
CJFPF Policy
• Critical Job to a Fast Processor First Policy
[Figure: one fast core and three slow cores]
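The slide gives only the policy name. A minimal sketch of one plausible reading follows, assuming that a "critical job" is a thread currently executing a critical section and that such threads get the fast cores before any others. The `cjfpf_place` function and the thread records are hypothetical; the slides do not describe how the scheduler detects critical sections.

```python
def cjfpf_place(threads, n_fast):
    """Split thread ids into (on_fast, on_slow).

    Threads inside a critical section are placed on fast cores first
    (an assumed interpretation of the policy name).
    """
    # False sorts before True, so threads in a critical section come first;
    # the sort is stable, so the remaining threads keep their original order.
    ordered = sorted(threads, key=lambda t: not t["in_critical_section"])
    ids = [t["id"] for t in ordered]
    return ids[:n_fast], ids[n_fast:]


# Hypothetical example: one fast core, three slow cores, thread 1 holds the lock.
threads = [
    {"id": 0, "in_critical_section": False},
    {"id": 1, "in_critical_section": True},
    {"id": 2, "in_critical_section": False},
    {"id": 3, "in_critical_section": False},
]
on_fast, on_slow = cjfpf_place(threads, n_fast=1)
print("fast:", on_fast, "slow:", on_slow)
```

Because only one thread can be inside a given critical section at a time, speeding up that thread shortens the time the waiting threads spend blocked.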
![Page 28: Asymmetry Aware Scheduling Algorithms for Asymmetric Processors](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c44550346895da5c1ba/html5/thumbnails/28.jpg)
CJFPF Results
[Figure: speedup (y-axis, 0 to 7) of CJFPF and RR for the 8-12, 16-24, and 40-60 workloads]
• With longer critical sections, the benefit of the CJFPF policy decreases
![Page 29: Asymmetry Aware Scheduling Algorithms for Asymmetric Processors](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c44550346895da5c1ba/html5/thumbnails/29.jpg)
Conclusion
• We evaluated the characteristics of multi-threaded applications on AMPs.
• Barriers and critical sections are important factors.
• We propose two new scheduling policies: longest job to a fast processor first (LJFPF) and critical job to a fast processor first (CJFPF).
  – The scheduling policies improve performance for asymmetric workloads.
• Future work
  – Develop a prediction mechanism
  – Evaluate symmetric workloads on AMPs
  – Other kinds of heterogeneous architectures
![Page 30: Asymmetry Aware Scheduling Algorithms for Asymmetric Processors](https://reader036.vdocument.in/reader036/viewer/2022070404/56813c44550346895da5c1ba/html5/thumbnails/30.jpg)
Thank you!