Advanced Modular Software Performance Monitoring

Post on 21-May-2015

CPU profiling with Intel® VTune™ Amplifier XE

Alexander Mazurov
Ferrara University, CERN

I. Event Processing Software
II. Profilers
III. Intel® VTune™ Amplifier XE
IV. Gaudi Framework
V. Gaudi Intel Profiler Auditor
VI. Profiling examples

I. Event Processing Software

Physics events

The Higgs Boson

Simulation * Trigger * Analysis

LHCb High Level Trigger (HLT) Software: Moore

Detector events: 10^6 events/sec
Events to storage: 4500 events/sec

II. Profilers

Collect information about how an application or system performs.

CPU Profiler

Measures the frequency and duration of function calls and/or code instructions.
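
As a small illustration of what a CPU profiler reports, the sketch below uses Python's standard-library cProfile module (not VTune) to record how often a function is called and how long it runs. The functions work and run are placeholders invented for this example.

```python
import cProfile
import pstats


def work(n):
    """Toy workload: sum of squares below n."""
    return sum(i * i for i in range(n))


def run():
    """Call the workload three times so the call count is predictable."""
    return [work(10_000) for _ in range(3)]


profiler = cProfile.Profile()
profiler.enable()
run()
profiler.disable()

stats = pstats.Stats(profiler)
# stats.stats maps (file, line, function name) to
# (primitive calls, total calls, own time, cumulative time, callers).
call_counts = {key[2]: value[1] for key, value in stats.stats.items()}
cumulative = {key[2]: value[3] for key, value in stats.stats.items()}

print("calls to work:", call_counts["work"])
print("cumulative seconds in work:", cumulative["work"])
```

The call count is the "frequency" and the cumulative time is the "duration" from the definition above.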

Profiling Techniques

- Hardware counters
- Instrumenting the code

Hardware counters

Exploit hardware performance counters from the Performance Monitoring Unit (PMU).

Counters:
- Translation lookaside buffer (TLB) misses
- Cache misses
- Stall cycles
- Memory access latency
- ...

Perfmon2 * Intel VTune Amplifier

Instrumenting the code

- Statically:
  * Change code manually / automatically
  * Compiler assisted (gcc -pg)
- Dynamically (at runtime):
  * Change code at runtime
    - Valgrind
    - Google Performance Tools
    - Intel VTune Amplifier
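
A minimal sketch of manual static instrumentation, written in Python for brevity: the source is changed (here with a decorator) so that every call is counted and timed. The names instrumented and decode are hypothetical. Dynamic tools collect comparable data without touching the source.

```python
import functools
import time
from collections import defaultdict

# Per-function counters filled in by the instrumentation.
call_count = defaultdict(int)
total_seconds = defaultdict(float)


def instrumented(func):
    """Wrap func so each call is counted and timed."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            call_count[func.__name__] += 1
            total_seconds[func.__name__] += time.perf_counter() - start
    return wrapper


@instrumented
def decode(values):
    """Hypothetical workload standing in for real event-processing code."""
    return [v * 2 for v in values]


for _ in range(5):
    decode(range(1000))

print(call_count["decode"], "calls,", total_seconds["decode"], "seconds")
```

The cost of this approach is that the timing code itself runs on every call, which is one reason sampling is often preferred for hot functions.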

III. VTune™ Amplifier XE

Performance Profiling Tool

- x86 (32 and 64-bit)
- GUI and CLI

VTune™ Features

Runtime instrumenting profiler

- User-mode sampling
- Hardware-based sampling
- Concurrency, locks and waits analysis
- Threading timeline
- Attach to a running process
- Source view

How does user-mode sampling work?

1) Interrupts a process
2) Collects samples of all active instruction addresses
3) Restores the call sequence for each sample
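
The three steps can be imitated in a few lines of Python. This is only a toy model under stated assumptions: a helper thread wakes up every 10 ms instead of using an OS timer interrupt, and it records Python frames rather than instruction addresses. The function busy is a hypothetical hot spot.

```python
import collections
import sys
import threading
import time
import traceback


def sample_stacks(target_thread_id, interval, stop_event, samples):
    """Periodically record the call stack of the target thread."""
    while not stop_event.is_set():
        frame = sys._current_frames().get(target_thread_id)
        if frame is not None:
            # Rebuild the call sequence (outermost caller first) for this sample.
            stack = tuple(f.name for f in traceback.extract_stack(frame))
            samples[stack] += 1
        time.sleep(interval)


def busy(seconds):
    """Hypothetical hot function: spin for the given wall-clock time."""
    end = time.perf_counter() + seconds
    x = 0
    while time.perf_counter() < end:
        x += 1
    return x


samples = collections.Counter()
stop = threading.Event()
sampler = threading.Thread(
    target=sample_stacks,
    args=(threading.get_ident(), 0.01, stop, samples),
)
sampler.start()
busy(0.3)
stop.set()
sampler.join()

# Attribute each sample to the innermost function on the stack.
hotspots = collections.Counter()
for stack, count in samples.items():
    hotspots[stack[-1]] += count
print(hotspots.most_common(3))
```

Functions that use more time are hit by more samples, so the sample counts approximate the time distribution without instrumenting any function.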

User-mode analysis types

- Hotspots
- Concurrency
- Locks and Waits

User-mode sampling: Hotspots analysis

- Group results
- Call stack
- Filter by timeline
- CPU time by code line, debug mode (-g)

Sampling Accuracy

User-mode sampling is a statistical method and does not provide 100% accurate results.

Accuracy depends on:
- Duration of the collection
- Speed of the processor
- Amount of software activity
- Sampling interval
  * recommended value is 10 ms
  * profiled run is only 5% slower
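
A rough way to see why collection duration matters: at a 10 ms interval, code that uses 1 s of CPU time is hit by about 100 samples. The sketch below assumes the common rule of thumb that a count of N samples carries a relative uncertainty of about 1/sqrt(N). That rule is an assumption of this example, not a statement from the slides or from Intel.

```python
import math


def expected_samples(cpu_seconds, interval_seconds=0.010):
    """Expected number of samples landing in code that uses cpu_seconds of CPU."""
    return cpu_seconds / interval_seconds


def relative_error(n_samples):
    """Rule-of-thumb relative uncertainty of a count of n_samples (1/sqrt(N))."""
    return 1.0 / math.sqrt(n_samples)


for cpu_seconds in (0.1, 1.0, 100.0):
    n = expected_samples(cpu_seconds)
    print(f"{cpu_seconds:7.1f} s -> {n:8.0f} samples, ~{relative_error(n):.1%} error")
```

Under this assumption, a function needs on the order of 100 s of CPU time in the collected data before its estimate settles to about 1%.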

Integrating VTune™ Amplifier into an Event Processing Framework

IV. Gaudi

Event processing framework

- Moore: Trigger
- Gauss: Simulation
- Brunel: Reconstruction
- Online: Monitoring and commissioning
- DaVinci: Physics analysis

Gaudi Architecture

Algorithms * Services * Tools

Moore Event Loop

Algorithms Sequence:
- Hlt1DiMuonHighMassFilterSequence
- Hlt1DiMuonHighMassStreamer
- FastVeloHlt
- MuonRec
- Velo2CandidatesDiMuonHighMass
- GECLooseUnit
- createITLiteClusters
- createVeloLiteClusters

How to profile algorithms?

V. Gaudi Intel Profiling Auditor

VTune™ User API + Gaudi Auditors API

VTune™ User API

- Start/Pause profiling
- Mark profiling regions

Gaudi Auditors API

Algorithm

Start event End event

Callback functions
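
The idea of combining the two APIs can be modelled in Python. Everything here is hypothetical: FakeProfiler stands in for the profiler's start/pause calls, ProfilingAuditor and its before_execute/after_execute methods only imitate the callback pattern, and the half-open event window is an assumed convention. The real Gaudi auditor interface and the VTune User API are C/C++ and are not reproduced here.

```python
class FakeProfiler:
    """Stand-in for a profiler's start/pause API; it only records calls."""

    def __init__(self):
        self.log = []

    def resume(self, region):
        self.log.append(("resume", region))

    def pause(self, region):
        self.log.append(("pause", region))


class ProfilingAuditor:
    """Hypothetical auditor: callbacks fire around every algorithm execution."""

    def __init__(self, profiler, start_event, stop_event):
        self.profiler = profiler
        self.start_event = start_event
        self.stop_event = stop_event

    def _active(self, event_number):
        # Assumed convention: profile events in [start_event, stop_event).
        return self.start_event <= event_number < self.stop_event

    def before_execute(self, algorithm_name, event_number):
        if self._active(event_number):
            self.profiler.resume(algorithm_name)

    def after_execute(self, algorithm_name, event_number):
        if self._active(event_number):
            self.profiler.pause(algorithm_name)


def run_event_loop(algorithms, auditor, n_events):
    """Run each (name, callable) algorithm per event, notifying the auditor."""
    for event_number in range(n_events):
        for name, execute in algorithms:
            auditor.before_execute(name, event_number)
            execute(event_number)
            auditor.after_execute(name, event_number)


profiler = FakeProfiler()
auditor = ProfilingAuditor(profiler, start_event=1, stop_event=3)
algorithms = [("MuonRec", lambda ev: None), ("FastVeloHlt", lambda ev: None)]
run_event_loop(algorithms, auditor, n_events=4)
print(len(profiler.log), "profiler calls recorded")
```

Because the callbacks know which algorithm is running, collected samples can be attributed to named algorithms rather than only to functions.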

Algorithms profiling (I)

CPU time per sequence branch

Algorithms profiling (II)

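Gaudi configuration

from Configurables import AuditorSvc, IntelProfilerAuditor

profiler = IntelProfilerAuditor()
# Restrict profiling to a window of events
profiler.StartFromEventN = 5000
profiler.StopAtEventN = 15000
# Register the profiler with the auditor service
AuditorSvc().Auditors += [profiler]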

Run:
$> intelprofiler -o /collected/data job.py

Analyze (GUI):
$> amplxe-gui /collected/data/r001hs

Analyze (CLI):
$> amplxe-cl -report hotspots -r /collected/data/r001hs

VI. Profiling examples

1. Memory allocation functions
2. Measuring profiling accuracy
3. Custom reports

1. Memory allocation functions

operator new from the libstdc++ library:

tc_new from the tcmalloc library:

tc_new takes half the time of operator new.

2. Measuring profiling accuracy

Intel Profiling Auditor vs. Timing Auditor

The Timing Auditor measures the absolute time of an algorithm's run.

1000 events
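
One way such a comparison could be made is to convert both sets of per-algorithm times into shares of the total and compare the shares, since a sampling profiler and a timer need not agree on absolute values. This is a sketch, not the method used in the talk. The helper names and the numbers are placeholders, not measurements.

```python
def time_shares(seconds_by_algorithm):
    """Convert absolute times into each algorithm's share of the total."""
    total = sum(seconds_by_algorithm.values())
    return {name: t / total for name, t in seconds_by_algorithm.items()}


def share_differences(sampled, timed):
    """Difference in time share (sampled minus timed) per algorithm."""
    sampled_shares = time_shares(sampled)
    timed_shares = time_shares(timed)
    return {name: sampled_shares[name] - timed_shares[name] for name in timed}


# Placeholder numbers for illustration only, not measurements.
sampled = {"AlgA": 2.0, "AlgB": 6.0}
timed = {"AlgA": 2.5, "AlgB": 7.5}
print(share_differences(sampled, timed))
```

In this made-up case the absolute times differ but the shares are identical, so the two tools would rank the algorithms the same way.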

3. Custom reports

Build reports using CSV files exported from VTune Amplifier
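
A minimal sketch of a custom report built from an exported CSV file. The column names "Function", "Module" and "CPU Time" and the sample rows are assumptions made for this example. A real export may use different headers or delimiters, so the keys would need adjusting.

```python
import csv
import io
from collections import defaultdict

# Made-up sample in the assumed layout; real exports may use other columns.
SAMPLE_CSV = """Function,Module,CPU Time
tc_new,libtcmalloc.so,1.5
decode,libExample.so,4.0
fit,libExample.so,2.5
"""


def cpu_time_by_module(csv_file):
    """Sum the 'CPU Time' column per 'Module'."""
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        totals[row["Module"]] += float(row["CPU Time"])
    return dict(totals)


report = cpu_time_by_module(io.StringIO(SAMPLE_CSV))
for module, seconds in sorted(report.items(), key=lambda item: -item[1]):
    print(f"{module:20s} {seconds:6.2f} s")
```

To use it on a real file, pass an open file object instead of the io.StringIO wrapper.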

Conclusions

Intel® VTune™ Amplifier XE:

+ Various analysis types and reports
+ Rich User API
+ Reasonable time overhead
