methods for emulation of multi-core cpu performance
TRANSCRIPT
Methods for Emulation of Multi-Core CPUPerformance
Tomasz Buchert1 Lucas Nussbaum2 Jens Gustedt1
1 INRIA Nancy – Grand Est
2 LORIA / Nancy - Universite
Validation of distributed systems
Approaches:Theoretical approach (paper and pencil)
, the most general results and understanding/ very hard (leads to unsolvability results)
Experimentation (real application on a real environment), realistic context, credibility/ difficulty of preparation and control, questionable reproducibility
Simulation (modeled application inside modeled environment), very simple and perfectly reproducible/ experimental bias, possibly unrealistic
Emulation (real application inside a modeled environment), control over the experiment parameters/ difficult
Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 2 / 26
Emulation
The perfect emulated environment should emulate (independently):
Network bandwidth, latency, topologyMemory capabilitiesBackground noise (network, faults)CPU speed and its features
Some parts implemented in Wrekavoc – a tool to define and controlheterogeneity of the cluster
In this talk, however, we specifically concentrate on
Emulation of CPU speed
Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 3 / 26
Emulation
The perfect emulated environment should emulate (independently):
Network bandwidth, latency, topologyMemory capabilitiesBackground noise (network, faults)CPU speed and its features
Some parts implemented in Wrekavoc – a tool to define and controlheterogeneity of the cluster
In this talk, however, we specifically concentrate on
Emulation of CPU speed
Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 3 / 26
Our goal
0 1 2 3 4 5 6 7
(1) control over the speed of each CPU/core independently
Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 4 / 26
Our goal
0 1 2 3 4 5 6 7
VN 1 VN 2 VN 3 Virtual node 4
(2) ability to create separately scheduled zones of tasks
Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 4 / 26
Existing methods (CPU-Freq)
Hardware solution to reduce heat, noise and power usageFor:
no overhead of emulationcompletely unintrusivemeaningful CPU time measure
Against:only a finite set of different frequency levels
Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 5 / 26
Existing methods (CPU-Lim)
Method available in Wrekavoc toolAlgorithm:
if CPU usage ≥ threshold → send SIGSTOP to the processif CPU usage < threshold → send SIGCONT to the process
CPU usage = CPU time of the processprocess lifetime
For:easy and almost POSIX-compliant
Against:intrusive and unscalabledecision based on one process instead of global CPU usagesleeping is indistinguishable from preemption
Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 6 / 26
Existing methods (Fracas)
Based on idea from KRASH (a load injection tool)Uses Linux Cgroups and Completely Fair SchedulerA predefined portion of the CPU is given to tasks burning CPUAll other processes are given the remaining CPU time
Emulatedprocesses
CPU burner
Core 1
Emulatedprocesses
CPU burner
Core 2
Emulatedprocesses
CPU burner
Core 3
Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 7 / 26
Existing methods (Fracas)
Based on idea from KRASH (a load injection tool)Uses Linux Cgroups and Completely Fair SchedulerA predefined portion of the CPU is given to tasks burning CPUAll other processes are given the remaining CPU time
For:unintrusivescalable
Against:unportable to other systemssensitive to the configuration of the scheduler
Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 7 / 26
New methods (CPU-Gov)
Generalization of CPU-FreqAlternates between two neighbouring hardware frequencies
1.2 GHz 2.4
GH
z
τ
1.5 GHz
τ
0.75τ 0.25τ
Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 8 / 26
New methods (CPU-Gov)
Generalization of CPU-FreqAlternates between two neighbouring hardware frequencies
For:no overhead, unintrusive and meaningful CPU time measure(inherited from CPU-Freq)continuous range of emulated frequency
Against:dependency on the hardware implementation (inherited fromCPU-Freq)special algorithm for small values of emulated frequency
Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 8 / 26
New methods (CPU-Hogs)
Generalization of CPU-burning techniqueFor each core there is a high-priority thread createdThey ”burn” a required number of CPU cycles
For:simple and portable (POSIX)does not rely on the hardware
Against:theoretical problems with scalability (not observed)
Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 9 / 26
New methods (CPU-Hogs)
Generalization of CPU-burning techniqueFor each core there is a high-priority thread createdThey ”burn” a required number of CPU cycles
cores
time
0
1
Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 9 / 26
Evaluation
Microbenchmarks with different types of work:
CPU intensive – running a tight computational loopIO bound – sending UDP packets over a networkCPU and IO intensive – sleeping mixed with a computationmultiprocessing – running multiple processes with CPU workmultithreading – running multiple threads with CPU workmemory speed (STREAM benchmark) – sustainable memorybandwidth
Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 10 / 26
Evaluation (cont.)
Tested with 1, 2, 4 and 8 emulated coresX-axis – emulated frequencyY-axis – speed perceived by the benchmarkeach test repeated 40 times, results = average with 95%confidence intervalEvaluation performed on Grid’5000 platform
Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 11 / 26
Grid’5000
9 sites, 1528 machinesLille, Rennes, Orsay, Nancy, Bordeaux, Lyon, Grenoble, Toulouse, Sophia
Dedicated to research on distributed systems and HPC
Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 12 / 26
CPU intensive work (one core)
0.5 1 1.5 2 2.5 3
2,000
4,000
6,000
8,000
Emulated CPU frequency (GHz)
Loop
s/se
c
CPU-LimCPU-Hogs
FracasCPU-GovCPU-Freq
�� ��All methods work as expected.
Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 13 / 26
IO-intensive work (one core)
0.5 1 1.5 2 2.5 3
0.5
1
·104
Emulated CPU frequency (GHz)
Loop
s/se
c
CPU-LimCPU-Hogs
FracasCPU-GovCPU-Freq
�� ��IO operations should not scale with CPU frequency.
Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 14 / 26
Memory speed (one core)
0.5 1 1.5 2 2.5 3
0.2
0.4
0.6
0.8
1
·104
Emulated CPU frequency (GHz)
MB/
sec
CPU-LimCPU-Hogs
FracasCPU-GovCPU-Freq
�� ��Ideally, memory speed would not be scaled as well.
Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 15 / 26
Computing and sleeping workload (one core)
0.5 1 1.5 2 2.5 3
2,000
4,000
6,000
Emulated CPU frequency (GHz)
Loop
s/se
c
CPU-LimCPU-Hogs
FracasCPU-GovCPU-Freq
�� ��The relation should be proportional, but CPU-Lim’s is not.
Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 16 / 26
Multiprocessing benchmark (one core)
0.5 1 1.5 2 2.5 3
500
1,000
1,500
Emulated CPU frequency (GHz)
Loop
s/se
c
CPU-LimCPU-Hogs
FracasCPU-GovCPU-Freq
�� ��This relation should be proportional again (but CPU-Lim’s is not).
Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 17 / 26
Multithreading benchmark (one core)
0.5 1 1.5 2 2.5 3
500
1,000
1,500
Emulated CPU frequency (GHz)
Loop
s/se
c
CPU-LimCPU-Hogs
FracasCPU-GovCPU-Freq
�� ��The execution speed scales with the frequency.
Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 18 / 26
Multithreading benchmark (two cores)
0.5 1 1.5 2 2.5 3
1,000
2,000
3,000
4,000
Emulated CPU frequency (GHz)
Loop
s/se
c
CPU-LimCPU-Hogs
FracasCPU-GovCPU-Freq
�� ��CPU-Lim and Fracas run twice as slow as CPU-Freq.
Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 19 / 26
Multiprocessing benchmark (two cores)
0.5 1 1.5 2 2.5 3
1,000
2,000
3,000
4,000
Emulated CPU frequency (GHz)
Loop
s/se
c
CPU-LimCPU-Hogs
FracasCPU-GovCPU-Freq
�� ��CPU-Lim and Fracas fail in this benchmark too.
Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 20 / 26
Multiprocessing benchmark (four cores)
0.5 1 1.5 2 2.5 30
2,000
4,000
6,000
Emulated CPU frequency (GHz)
Loop
s/se
c
CPU-LimCPU-Hogs
FracasCPU-GovCPU-Freq
�� ��What happened here?
Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 21 / 26
Multiprocessing benchmark (eight cores)
0.5 1 1.5 2 2.5 30
5,000
10,000
Emulated CPU frequency (GHz)
Loop
s/se
c
CPU-LimCPU-Hogs
FracasCPU-GovCPU-Freq
�� ��Fracas fails once more (but CPU-Lim doesn’t!).
Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 22 / 26
Summary of the evaluation
CPU-Freq:very good resultscoarse granularity
CPU-Lim:not scalable due to implementation, intrusivehigher variancecontrols processes, not threads
Fracas:good behavior for a single-task workloadscalablebad behavior for multitask workloadbehavior differs from one version of Linux to another
Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 23 / 26
Summary of the evaluation (cont.)
CPU-Gov and CPU-Hogs:improvement over previous methodsgood and stable behavior in virtually every benchmarkscalabilityindependent from the underlying OS
Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 24 / 26
Future work
Emulate memory bandwidthEmulate other aspects of CPUTest the methods with real-life applicationsIntegrate the best methods into an open source, user-friendlyemulator (Wrekavoc)
Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 25 / 26
Conclusions
Presented CPU-Hogs, CPU-Gov and previously existing methodsCompared them by running a set of microbenchmarksEvaluated experimentally on Grid’5000New methods show a big improvement in the quality of emulation
Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 26 / 26
Thanks for listening.
Questions?
Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 27 / 26