methods for emulation of multi-core cpu performance

32
Methods for Emulation of Multi-Core CPU Performance Tomasz Buchert 1 Lucas Nussbaum 2 Jens Gustedt 1 1 INRIA Nancy – Grand Est 2 LORIA / Nancy - Universit´ e

Upload: others

Post on 23-Jul-2022

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Methods for Emulation of Multi-Core CPU Performance

Methods for Emulation of Multi-Core CPUPerformance

Tomasz Buchert1 Lucas Nussbaum2 Jens Gustedt1

1 INRIA Nancy – Grand Est

2 LORIA / Nancy - Universite

Page 2: Methods for Emulation of Multi-Core CPU Performance

Validation of distributed systems

Approaches:Theoretical approach (paper and pencil)

, the most general results and understanding/ very hard (leads to unsolvability results)

Experimentation (real application on a real environment), realistic context, credibility/ difficulty of preparation and control, questionable reproducibility

Simulation (modeled application inside modeled environment), very simple and perfectly reproducible/ experimental bias, possibly unrealistic

Emulation (real application inside a modeled environment), control over the experiment parameters/ difficult

Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 2 / 26

Page 3: Methods for Emulation of Multi-Core CPU Performance

Emulation

The perfect emulated environment should emulate (independently):

Network bandwidth, latency, topologyMemory capabilitiesBackground noise (network, faults)CPU speed and its features

Some parts implemented in Wrekavoc – a tool to define and controlheterogeneity of the cluster

In this talk, however, we specifically concentrate on

Emulation of CPU speed

Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 3 / 26

Page 4: Methods for Emulation of Multi-Core CPU Performance

Emulation

The perfect emulated environment should emulate (independently):

Network bandwidth, latency, topologyMemory capabilitiesBackground noise (network, faults)CPU speed and its features

Some parts implemented in Wrekavoc – a tool to define and controlheterogeneity of the cluster

In this talk, however, we specifically concentrate on

Emulation of CPU speed

Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 3 / 26

Page 5: Methods for Emulation of Multi-Core CPU Performance

Our goal

0 1 2 3 4 5 6 7

(1) control over the speed of each CPU/core independently

Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 4 / 26

Page 6: Methods for Emulation of Multi-Core CPU Performance

Our goal

0 1 2 3 4 5 6 7

VN 1 VN 2 VN 3 Virtual node 4

(2) ability to create separately scheduled zones of tasks

Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 4 / 26

Page 7: Methods for Emulation of Multi-Core CPU Performance

Existing methods (CPU-Freq)

Hardware solution to reduce heat, noise and power usageFor:

no overhead of emulationcompletely unintrusivemeaningful CPU time measure

Against:only a finite set of different frequency levels

Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 5 / 26

Page 8: Methods for Emulation of Multi-Core CPU Performance

Existing methods (CPU-Lim)

Method available in Wrekavoc toolAlgorithm:

if CPU usage ≥ threshold → send SIGSTOP to the processif CPU usage < threshold → send SIGCONT to the process

CPU usage = CPU time of the processprocess lifetime

For:easy and almost POSIX-compliant

Against:intrusive and unscalabledecision based on one process instead of global CPU usagesleeping is indistinguishable from preemption

Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 6 / 26

Page 9: Methods for Emulation of Multi-Core CPU Performance

Existing methods (Fracas)

Based on idea from KRASH (a load injection tool)Uses Linux Cgroups and Completely Fair SchedulerA predefined portion of the CPU is given to tasks burning CPUAll other processes are given the remaining CPU time

Emulatedprocesses

CPU burner

Core 1

Emulatedprocesses

CPU burner

Core 2

Emulatedprocesses

CPU burner

Core 3

Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 7 / 26

Page 10: Methods for Emulation of Multi-Core CPU Performance

Existing methods (Fracas)

Based on idea from KRASH (a load injection tool)Uses Linux Cgroups and Completely Fair SchedulerA predefined portion of the CPU is given to tasks burning CPUAll other processes are given the remaining CPU time

For:unintrusivescalable

Against:unportable to other systemssensitive to the configuration of the scheduler

Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 7 / 26

Page 11: Methods for Emulation of Multi-Core CPU Performance

New methods (CPU-Gov)

Generalization of CPU-FreqAlternates between two neighbouring hardware frequencies

1.2 GHz 2.4

GH

z

τ

1.5 GHz

τ

0.75τ 0.25τ

Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 8 / 26

Page 12: Methods for Emulation of Multi-Core CPU Performance

New methods (CPU-Gov)

Generalization of CPU-FreqAlternates between two neighbouring hardware frequencies

For:no overhead, unintrusive and meaningful CPU time measure(inherited from CPU-Freq)continuous range of emulated frequency

Against:dependency on the hardware implementation (inherited fromCPU-Freq)special algorithm for small values of emulated frequency

Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 8 / 26

Page 13: Methods for Emulation of Multi-Core CPU Performance

New methods (CPU-Hogs)

Generalization of CPU-burning techniqueFor each core there is a high-priority thread createdThey ”burn” a required number of CPU cycles

For:simple and portable (POSIX)does not rely on the hardware

Against:theoretical problems with scalability (not observed)

Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 9 / 26

Page 14: Methods for Emulation of Multi-Core CPU Performance

New methods (CPU-Hogs)

Generalization of CPU-burning techniqueFor each core there is a high-priority thread createdThey ”burn” a required number of CPU cycles

cores

time

0

1

Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 9 / 26

Page 15: Methods for Emulation of Multi-Core CPU Performance

Evaluation

Microbenchmarks with different types of work:

CPU intensive – running a tight computational loopIO bound – sending UDP packets over a networkCPU and IO intensive – sleeping mixed with a computationmultiprocessing – running multiple processes with CPU workmultithreading – running multiple threads with CPU workmemory speed (STREAM benchmark) – sustainable memorybandwidth

Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 10 / 26

Page 16: Methods for Emulation of Multi-Core CPU Performance

Evaluation (cont.)

Tested with 1, 2, 4 and 8 emulated coresX-axis – emulated frequencyY-axis – speed perceived by the benchmarkeach test repeated 40 times, results = average with 95%confidence intervalEvaluation performed on Grid’5000 platform

Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 11 / 26

Page 17: Methods for Emulation of Multi-Core CPU Performance

Grid’5000

9 sites, 1528 machinesLille, Rennes, Orsay, Nancy, Bordeaux, Lyon, Grenoble, Toulouse, Sophia

Dedicated to research on distributed systems and HPC

Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 12 / 26

Page 18: Methods for Emulation of Multi-Core CPU Performance

CPU intensive work (one core)

0.5 1 1.5 2 2.5 3

2,000

4,000

6,000

8,000

Emulated CPU frequency (GHz)

Loop

s/se

c

CPU-LimCPU-Hogs

FracasCPU-GovCPU-Freq

�� ��All methods work as expected.

Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 13 / 26

Page 19: Methods for Emulation of Multi-Core CPU Performance

IO-intensive work (one core)

0.5 1 1.5 2 2.5 3

0.5

1

·104

Emulated CPU frequency (GHz)

Loop

s/se

c

CPU-LimCPU-Hogs

FracasCPU-GovCPU-Freq

�� ��IO operations should not scale with CPU frequency.

Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 14 / 26

Page 20: Methods for Emulation of Multi-Core CPU Performance

Memory speed (one core)

0.5 1 1.5 2 2.5 3

0.2

0.4

0.6

0.8

1

·104

Emulated CPU frequency (GHz)

MB/

sec

CPU-LimCPU-Hogs

FracasCPU-GovCPU-Freq

�� ��Ideally, memory speed would not be scaled as well.

Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 15 / 26

Page 21: Methods for Emulation of Multi-Core CPU Performance

Computing and sleeping workload (one core)

0.5 1 1.5 2 2.5 3

2,000

4,000

6,000

Emulated CPU frequency (GHz)

Loop

s/se

c

CPU-LimCPU-Hogs

FracasCPU-GovCPU-Freq

�� ��The relation should be proportional, but CPU-Lim’s is not.

Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 16 / 26

Page 22: Methods for Emulation of Multi-Core CPU Performance

Multiprocessing benchmark (one core)

0.5 1 1.5 2 2.5 3

500

1,000

1,500

Emulated CPU frequency (GHz)

Loop

s/se

c

CPU-LimCPU-Hogs

FracasCPU-GovCPU-Freq

�� ��This relation should be proportional again (but CPU-Lim’s is not).

Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 17 / 26

Page 23: Methods for Emulation of Multi-Core CPU Performance

Multithreading benchmark (one core)

0.5 1 1.5 2 2.5 3

500

1,000

1,500

Emulated CPU frequency (GHz)

Loop

s/se

c

CPU-LimCPU-Hogs

FracasCPU-GovCPU-Freq

�� ��The execution speed scales with the frequency.

Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 18 / 26

Page 24: Methods for Emulation of Multi-Core CPU Performance

Multithreading benchmark (two cores)

0.5 1 1.5 2 2.5 3

1,000

2,000

3,000

4,000

Emulated CPU frequency (GHz)

Loop

s/se

c

CPU-LimCPU-Hogs

FracasCPU-GovCPU-Freq

�� ��CPU-Lim and Fracas run twice as slow as CPU-Freq.

Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 19 / 26

Page 25: Methods for Emulation of Multi-Core CPU Performance

Multiprocessing benchmark (two cores)

0.5 1 1.5 2 2.5 3

1,000

2,000

3,000

4,000

Emulated CPU frequency (GHz)

Loop

s/se

c

CPU-LimCPU-Hogs

FracasCPU-GovCPU-Freq

�� ��CPU-Lim and Fracas fail in this benchmark too.

Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 20 / 26

Page 26: Methods for Emulation of Multi-Core CPU Performance

Multiprocessing benchmark (four cores)

0.5 1 1.5 2 2.5 30

2,000

4,000

6,000

Emulated CPU frequency (GHz)

Loop

s/se

c

CPU-LimCPU-Hogs

FracasCPU-GovCPU-Freq

�� ��What happened here?

Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 21 / 26

Page 27: Methods for Emulation of Multi-Core CPU Performance

Multiprocessing benchmark (eight cores)

0.5 1 1.5 2 2.5 30

5,000

10,000

Emulated CPU frequency (GHz)

Loop

s/se

c

CPU-LimCPU-Hogs

FracasCPU-GovCPU-Freq

�� ��Fracas fails once more (but CPU-Lim doesn’t!).

Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 22 / 26

Page 28: Methods for Emulation of Multi-Core CPU Performance

Summary of the evaluation

CPU-Freq:very good resultscoarse granularity

CPU-Lim:not scalable due to implementation, intrusivehigher variancecontrols processes, not threads

Fracas:good behavior for a single-task workloadscalablebad behavior for multitask workloadbehavior differs from one version of Linux to another

Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 23 / 26

Page 29: Methods for Emulation of Multi-Core CPU Performance

Summary of the evaluation (cont.)

CPU-Gov and CPU-Hogs:improvement over previous methodsgood and stable behavior in virtually every benchmarkscalabilityindependent from the underlying OS

Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 24 / 26

Page 30: Methods for Emulation of Multi-Core CPU Performance

Future work

Emulate memory bandwidthEmulate other aspects of CPUTest the methods with real-life applicationsIntegrate the best methods into an open source, user-friendlyemulator (Wrekavoc)

Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 25 / 26

Page 31: Methods for Emulation of Multi-Core CPU Performance

Conclusions

Presented CPU-Hogs, CPU-Gov and previously existing methodsCompared them by running a set of microbenchmarksEvaluated experimentally on Grid’5000New methods show a big improvement in the quality of emulation

Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 26 / 26

Page 32: Methods for Emulation of Multi-Core CPU Performance

Thanks for listening.

Questions?

Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Methods for Emulation of Multi-Core CPU Perf. 27 / 26