Language-Centric Performance Analysis of OpenMP Programs with Aftermath

Andi Drebes

The University of Manchester
School of Computer Science
Advanced Processor Technologies

Joint work with: Jean-Baptiste Bréjon, Antoniu Pop, Karine Heydemann, Albert Cohen

IWOMP 2016

Analysis of OpenMP Programs

Programming model

#pragma omp task depend(...)
{ ... }

#pragma omp parallel for
for(int i = 0; i < N; i++) { ... }

Layers: Application, Run-time, OS, Hardware

Andi Drebes – Aftermath: Language-Centric Performance Analysis 1 / 10

New Tools for Performance Analysis

Frequent topics for performance analysis:
- Amount of parallelism and load balancing
- Duration of execution phases
- Synchronization overhead (e.g., barriers)
- Choice of an appropriate loop schedule
- Data distribution on NUMA systems
- Relate hardware events to loops / tasks

Our tools: Aftermath & Aftermath-OpenMP
- Aftermath: Graphical tool for performance analysis
- Aftermath-OpenMP: Instrumented LLVM/clang run-time

Outline

1. Overview of Trace-based Analysis

2. Overview of Aftermath’s GUI

3. Demo

4. Overhead of Tracing

5. Summary & Conclusion

Trace-based Analysis with Aftermath

Figure: The application runs on the Aftermath-OpenMP run-time (on top of OS and hardware), which produces a trace file. Aftermath reads the trace file and provides:
- Visualizations & Exploration
- Statistics & Accurate Numbers
- Programming Model-centric Analysis

Terminology

Loop construct:

#pragma omp parallel for schedule(static, 10)
for(int i = 0; i < 100; i++) { ... }

Iteration space: 0 to 99

Loop: divided into chunks C0 C1 C2 C3 C4 C5 C6 C7 C8 C9

Chunks and iteration sets per worker:

Worker      Chunks         Iteration sets
Worker 0    C0 C3 C6 C9    [0-9] [30-39] [60-69] [90-99]
Worker 1    C1 C4 C7       [10-19] [40-49] [70-79]
Worker 2    C2 C5 C8       [20-29] [50-59] [80-89]

Iteration period: interval on a worker's time line (shown for Worker 0, Worker 1, Worker 2)
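
The chunk-to-worker assignment in the example above can be reproduced with a few lines of C. This is a minimal sketch, assuming the round-robin distribution that schedule(static, 10) prescribes; the function names are hypothetical and are not part of Aftermath or of any OpenMP run-time.

```c
#include <assert.h>

/* Index of the chunk that contains iteration i (chunks C0, C1, ...). */
int chunk_of_iteration(int i, int chunk_size)
{
    assert(i >= 0 && chunk_size > 0);
    return i / chunk_size;
}

/* Worker that executes chunk c when chunks are dealt out round-robin
 * in worker order (assumed behavior of a static schedule with a
 * chunk size). */
int worker_of_chunk(int c, int num_workers)
{
    assert(c >= 0 && num_workers > 0);
    return c % num_workers;
}

/* Worker that executes iteration i. */
int worker_of_iteration(int i, int chunk_size, int num_workers)
{
    return worker_of_chunk(chunk_of_iteration(i, chunk_size), num_workers);
}
```

With a chunk size of 10 and 3 workers, iteration 35 falls into chunk C3 and is executed by Worker 0, which matches the iteration set [30-39] listed for Worker 0.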

Aftermath: Overview of the GUI

Main components of the window:
- Time line
- Detailed text view
- Filters
- Statistics

Time line (axes: Time and Processors): activity during execution

Time line: Run-time states
- Sequential execution (orange)
- Parallel loop (green)
- Barrier synchronization (dark red)
- No activity (background visible)

Time line: Loop constructs

Further views:
- State statistics
- Histogram showing duration of iteration periods
- Detailed text view for parallel loops
- Filter for loop constructs

Demo: NPB’s MG benchmark

Benchmark: NPB MG
- NPB 2.3 C implementation from the Omni Compiler Project
- C input class (512 × 512 elements)

Test platform
- SGI UV 2000 (Xeon E5-4640)
- 192 cores (Hyperthreading disabled)
- 24 NUMA nodes, 756 GiB RAM
- LLVM/clang 3.8.0
- Aftermath-OpenMP for trace generation

DEMO

Demo: Summary

Execution phases
- Parallel initializations + main computation
- Sequential execution in between

Time spent in barriers
- States on time line / statistics panel

Load imbalance
- Sufficient parallelism
- High load imbalance, but not due to partitioning / schedule
- Same NUMA node → approx. same execution time

Solution
- Change allocation scheme: one big allocation
- Reduce number of workers: #iters = n × #workers
- Result: 35× speedup
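
The condition #iters = n × #workers means that the worker count divides the iteration count evenly. The slides do not say how the worker count was picked for the demo; the sketch below shows one possible rule, namely taking the largest worker count that does not exceed the available cores and divides the number of iterations. The function name and parameters are hypothetical.

```c
#include <assert.h>

/* Largest worker count w <= max_workers with num_iters % w == 0,
 * i.e. num_iters = n * w for some integer n >= 1.
 * One possible selection rule, not taken from the talk. */
int balanced_worker_count(int num_iters, int max_workers)
{
    assert(num_iters > 0 && max_workers > 0);

    /* More workers than iterations would leave some workers idle. */
    int w = max_workers < num_iters ? max_workers : num_iters;

    /* Search downwards; w == 1 always divides num_iters. */
    while (num_iters % w != 0)
        w--;

    return w;
}
```

For illustrative inputs of 512 iterations and 192 cores (not necessarily the loop bounds of the demo), the function returns 128, so that each worker executes exactly 4 iterations.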

Overhead of Tracing

Relative increase of execution time [%] (mean for 50 runs; error bars: standard deviation)

Suite                      Benchmark    Increase [%]
NPB-2.3 (loop-based)       CG            0.88
NPB-2.3 (loop-based)       EP            0.60
NPB-2.3 (loop-based)       FT           -0.66
NPB-2.3 (loop-based)       LU            0.35
NPB-2.3 (loop-based)       MG            1.77
BOTS 1.1.2 (task-based)    sparselu      0.01
BOTS 1.1.2 (task-based)    strassen      5.79
BOTS 1.1.2 (task-based)    alignment    -0.03
BOTS 1.1.2 (task-based)    fft           0.24
BOTS 1.1.2 (task-based)    sort          4.07
Geometric mean (abs.)                    0.46

Test system
- SGI UV 2000 (192 cores, 24 NUMA nodes)

Missing benchmarks
- Outlier: floorplan (+380% execution time; very small tasks)
- Segfaults (BT, nqueens, uts) / Excessive execution time (IS) / Verification failure (health)
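
The slide does not spell out how the plotted quantities are computed. A plausible reading, written out as a sketch: the symbols below are assumptions, with T_ref the execution time without tracing, T_trace the execution time with tracing, o_b the relative increase for benchmark b, and B the number of benchmarks.

```latex
% Assumed definitions; the symbol names are not taken from the slides.
o_b = \frac{T_{\mathrm{trace},b} - T_{\mathrm{ref},b}}{T_{\mathrm{ref},b}} \times 100\,\%
\qquad
\bar{o}_{\mathrm{geo}} = \Bigl( \prod_{b=1}^{B} \lvert o_b \rvert \Bigr)^{1/B}
```

Applied to the ten per-benchmark labels on the slide, the geometric mean of the absolute values comes out at roughly 0.45, close to the 0.46 shown; the small gap is plausible because the bar labels are rounded.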

Using Aftermath & Aftermath-OpenMP

Drop-in replacement for libomp with wrapper script:

$ aftermath-openmp-trace -o events.ost -- <program> <args>
$ aftermath events.ost

Source code and tutorial: http://www.openstream.info/aftermath

Virtual Machine (Aftermath + Aftermath-OpenMP + sample traces + documentation): http://www.openstream.info/vm

Summary

Aftermath
- Reactive graphical user interface for trace analysis
- Programming model-centric analysis: Loops and tasks

Aftermath-OpenMP
- Instrumented LLVM/clang OpenMP run-time
- Low tracing overhead

Future work
- Dependent tasks
- Automate recurring analyses

On-line resources
- http://www.openstream.info/aftermath (Main website)
- http://www.openstream.info/vm (VM image)