Instrumenting Parsec's raytrace
DESCRIPTION
(Check my blog @ http://www.marioalmeida.eu/ ) In this presentation I present the performance metrics and results of running the raytrace application from the Parsec benchmark suite on UPC's Boada server.
TRANSCRIPT
Instrumenting a benchmark application
Tools and Measurements Techniques
Project by Mário Almeida (EMDC)
Barcelona, 25 April 2012
Index (1/2)
Tools and configuration
● Parsec
  ○ Overview
  ○ Benchmark programs
● Extrae
● Paraver
● Configuration
Index (2/2)
Measurements
● Raytrace
  ○ Overview
  ○ Code
  ○ Inputs
  ○ Traces
  ○ Load Balancing
  ○ Cache misses and instructions
  ○ Execution time
  ○ Configuration comparisons
  ○ Extrae overhead
Conclusions
Tools and configuration
Parsec: Overview
● Benchmark with the following characteristics:
  ○ Multithreaded
  ○ Emerging workloads
  ○ Diverse
  ○ Not HPC-focused
  ○ Research
Parsec: Benchmark programs
● blackscholes
● bodytrack
● canneal
● dedup
● facesim
● ferret
● fluidanimate
● freqmine
● raytrace
● ...
Extrae
● Instrumentation package for tracing programs that run with the shared memory and message passing programming models.
Paraver
● Detailed quantitative analysis of a program's performance.
● Concurrent comparative analysis of several traces.
● Support for mixed message passing and shared memory.
● Building of derived metrics.
Configuration (1/4)
Boada server:
● Two six-core CPUs with Hyper-Threading.
● Kills applications after a few minutes.
● 24 GB of RAM.
● cpulimit was used to cap CPU usage at four cores.
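
The four-core cap can be sketched as a small command builder. This is only an illustration: the slides do not show the exact invocation, so the helper name `cpulimit_command` and the PID are hypothetical. It assumes the usual cpulimit convention that `-l` takes a percentage of one core (so values above 100 are meaningful on multi-core machines) and `-p` selects the target process.

```python
def cpulimit_command(pid, cores):
    """Build a cpulimit invocation that caps process `pid` at `cores` full cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    # cpulimit expresses the limit as a percentage of a single core,
    # so four cores correspond to a limit of 400.
    return ["cpulimit", "-l", str(cores * 100), "-p", str(pid)]


if __name__ == "__main__":
    # 12345 is a placeholder PID, not one taken from the experiments.
    print(" ".join(cpulimit_command(12345, 4)))
```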
Configuration (2/4)
Installed and/or configured:
● Parsec 2.1 with raytrace package only.
● Extrae 2.2.1.
● Paraver 4.3.0 (on my laptop).
● CpuLimit.
● Minor configurations on .bashrc.
● Multiple scripts to clean, build and run.
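
The clean, build and run scripts are not shown in the slides. A minimal sketch of what they might wrap, assuming Parsec's `parsecmgmt` tool with its `-a` (action), `-p` (package), `-c` (build configuration), `-i` (input set) and `-n` (thread count) options. The helper name and the `gcc-pthreads` configuration are assumptions; the configuration actually used with Extrae may have differed.

```python
def parsecmgmt_command(action, package="raytrace", config="gcc-pthreads",
                       input_set=None, threads=None):
    """Build a parsecmgmt command line as a list of arguments."""
    cmd = ["parsecmgmt", "-a", action, "-p", package, "-c", config]
    if input_set is not None:
        cmd += ["-i", input_set]      # e.g. simsmall, simmedium, simlarge, native
    if threads is not None:
        cmd += ["-n", str(threads)]   # minimum number of threads to use
    return cmd


if __name__ == "__main__":
    for args in (
        parsecmgmt_command("clean"),
        parsecmgmt_command("build"),
        parsecmgmt_command("run", input_set="simsmall", threads=4),
    ):
        print(" ".join(args))
```

Returning the command as a list keeps it ready for `subprocess.run` without shell quoting issues.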
Configuration (3/4)
Configuration (4/4)
Measurements
Raytrace: Overview
● Physical simulation for visualization
● Computer animation
● Input is a complex object of many triangles.
Raytrace: Code
For every pixel in the image
    calculate trajectory of ray striking pixel
    find closest intersection point of ray with scene geometry
    calculate contribution of all lights at intersection point
    recursively trace specularly reflected ray
end for
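
The loop above can be sketched as a tiny runnable ray tracer. This is not the Parsec raytrace code: the scene (two spheres instead of triangles), the single light, the reflectivity values and all function names are assumptions chosen to keep the example short, and shadows are ignored.

```python
import math

# Hypothetical scene: (center, radius, reflectivity) spheres and one point light.
SPHERES = [((0.0, 0.0, -3.0), 1.0, 0.3), ((2.0, 0.0, -4.0), 1.0, 0.3)]
LIGHT = (5.0, 5.0, 0.0)
MAX_DEPTH = 2  # recursion limit for reflected rays


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def scale(a, s):
    return tuple(x * s for x in a)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))


def hit_sphere(origin, direction, center, radius):
    """Smallest ray parameter t > 1e-4 where the unit-direction ray hits, or None."""
    oc = sub(origin, center)
    b = 2.0 * dot(oc, direction)
    c = dot(oc, oc) - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in ((-b - root) / 2.0, (-b + root) / 2.0):
        if t > 1e-4:
            return t
    return None


def closest_hit(origin, direction):
    """Find the closest intersection of the ray with the scene geometry."""
    best = None
    for center, radius, reflectivity in SPHERES:
        t = hit_sphere(origin, direction, center, radius)
        if t is not None and (best is None or t < best[0]):
            best = (t, center, reflectivity)
    return best


def trace(origin, direction, depth=0):
    hit = closest_hit(origin, direction)
    if hit is None:
        return 0.0  # background intensity
    t, center, reflectivity = hit
    point = add(origin, scale(direction, t))
    normal = normalize(sub(point, center))
    # Diffuse contribution of the light at the intersection point.
    color = max(0.0, dot(normal, normalize(sub(LIGHT, point))))
    if depth < MAX_DEPTH:
        # Recursively trace the specularly reflected ray.
        reflected = sub(direction, scale(normal, 2.0 * dot(direction, normal)))
        bounce = trace(point, normalize(reflected), depth + 1)
        color = (1.0 - reflectivity) * color + reflectivity * bounce
    return color


def render(width, height):
    """For every pixel, shoot one ray from a camera at the origin looking down -z."""
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            px = (2.0 * (x + 0.5) / width - 1.0) * (width / height)
            py = 1.0 - 2.0 * (y + 0.5) / height
            row.append(trace((0.0, 0.0, 0.0), normalize((px, py, -1.0))))
        image.append(row)
    return image
```

Each pixel is independent of the others, which is what makes the per-pixel loop a natural candidate for splitting across threads.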
Raytrace: Inputs
● simsmall - 1 million polygons (480x270)
● simmedium - 1 million polygons (960x540)
● simlarge - 1 million polygons (1920x1080)
● native - 10 million polygons (1920x1080)
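
To get a feel for how the inputs scale, the sketch below computes the pixels per frame for each resolution listed above. Treating this as the number of primary rays assumes one ray per pixel, which the slides do not state.

```python
# Resolutions as listed on the slide.
INPUTS = {
    "simsmall": (480, 270),
    "simmedium": (960, 540),
    "simlarge": (1920, 1080),
    "native": (1920, 1080),
}


def pixels(name):
    """Pixels per frame for the given input set."""
    width, height = INPUTS[name]
    return width * height


def relative_to_simsmall(name):
    """How many times more pixels per frame than simsmall."""
    return pixels(name) / pixels("simsmall")


if __name__ == "__main__":
    for name in INPUTS:
        print(name, pixels(name), relative_to_simsmall(name))
```

Each step from simsmall to simmedium to simlarge quadruples the pixel count, while native keeps the simlarge resolution and grows the scene instead.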
Raytrace: Trace (1/2)
Only 10% of the execution time is parallel!
Trace legend: Not created; Running
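
A rough way to see what a 10% parallel fraction implies is Amdahl's law. The sketch assumes the 10% figure is the parallel share of the total run time, and it borrows the roughly three-fold gain that the execution-time slides report for the parallel code with 64 threads; combining the two is an illustration, not a measurement.

```python
def amdahl_speedup(parallel_fraction, parallel_speedup):
    """Overall speedup when only `parallel_fraction` of the run gets faster."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / parallel_speedup)


def amdahl_limit(parallel_fraction):
    """Upper bound on the overall speedup with unlimited parallel resources."""
    return 1.0 / (1.0 - parallel_fraction)


if __name__ == "__main__":
    print(round(amdahl_speedup(0.10, 3.0), 3))  # overall gain with a 3x parallel part
    print(round(amdahl_limit(0.10), 3))         # best case for a 10% parallel fraction
```

Under these assumptions the whole run gets only about 7% faster, and no thread count could push it past roughly 11%.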
Raytrace: Trace (2/2)
Render time is proportional to the number of frames!
Trace phases labelled in the figure: Render; Init and adding object; Build Context
Raytrace: Load balancing (1/2)
Figure labels: Not created; Barrier; Create Threads Task; Wait for all threads
Raytrace: Load balancing (2/2)
Good load balancing between the slave threads.
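
One common way to put a number on load balance is the ratio of the average to the maximum per-thread busy time, where 1.0 means perfectly even work. The slides judge balance visually from the trace, so this metric and the sample times are assumptions for illustration only.

```python
def load_balance(busy_times):
    """Average busy time divided by the maximum busy time (1.0 is perfect)."""
    if not busy_times:
        raise ValueError("need at least one thread")
    longest = max(busy_times)
    if longest == 0:
        return 1.0  # no thread did any work, so nothing is unbalanced
    return (sum(busy_times) / len(busy_times)) / longest


if __name__ == "__main__":
    # Hypothetical per-thread busy times in seconds, not measured values.
    print(load_balance([2.0, 2.1, 1.9, 2.0]))
```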
Raytrace: Cache and instructions
Figure annotations: High number of cache misses; Very low number of cache misses
There were no significant differences in IPC between threads.
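
IPC and a normalized cache-miss rate can be derived from raw hardware counter totals, as in the sketch below. The slides do not say which counters were read, so the counter values here are hypothetical and the misses-per-thousand-instructions metric is an added convenience, not something the author reports.

```python
def ipc(instructions, cycles):
    """Instructions completed per clock cycle."""
    return instructions / cycles


def misses_per_kilo_instruction(cache_misses, instructions):
    """Cache misses per 1000 instructions, comparable across threads."""
    return 1000.0 * cache_misses / instructions


if __name__ == "__main__":
    # Hypothetical per-thread totals: (instructions, cycles, cache misses).
    threads = {
        "thread 1": (2_000_000, 1_600_000, 9_000),
        "thread 2": (2_050_000, 1_640_000, 9_400),
    }
    for name, (ins, cyc, misses) in threads.items():
        print(name, round(ipc(ins, cyc), 3),
              round(misses_per_kilo_instruction(misses, ins), 3))
```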
Raytrace: Execution time (1/3)
These are average times from multiple executions of the parallel code only, without Extrae overhead. There was a high average deviation of 0.3 seconds in the experiments. Measurements with bigger inputs were more accurate.
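
The averaging step can be sketched as follows. It assumes "average deviation" means the mean absolute deviation from the mean, which the slides do not define, and the sample timings are made up.

```python
def mean(values):
    return sum(values) / len(values)


def average_deviation(values):
    """Mean absolute deviation of the values from their mean."""
    center = mean(values)
    return sum(abs(v - center) for v in values) / len(values)


if __name__ == "__main__":
    # Hypothetical run times in seconds for one thread count.
    runs = [2.9, 3.4, 3.1, 2.6]
    print(round(mean(runs), 3), round(average_deviation(runs), 3))
```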
Raytrace: Execution time (2/3)
There was a smaller average deviation of 0.03 seconds. With 64 threads it runs almost three times faster!
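
A figure such as "almost three times faster" is a speedup, the single-thread time divided by the multi-thread time. Dividing by the thread count gives the parallel efficiency. The times below are hypothetical placeholders chosen to land near a three-fold speedup; they are not the measured values.

```python
def speedup(time_one_thread, time_n_threads):
    """How many times faster the run with n threads is."""
    return time_one_thread / time_n_threads


def efficiency(time_one_thread, time_n_threads, n_threads):
    """Speedup per thread; 1.0 would be ideal linear scaling."""
    return speedup(time_one_thread, time_n_threads) / n_threads


if __name__ == "__main__":
    t1, t64 = 3.0, 1.05  # hypothetical seconds, not taken from the slides
    print(round(speedup(t1, t64), 2), round(efficiency(t1, t64, 64), 3))
```

A speedup near 3 with 64 threads corresponds to an efficiency below 5%, which fits a machine with far fewer than 64 hardware threads.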
Raytrace: Execution time (3/3)
There was an even smaller average deviation of 0.02 seconds. With 64 threads it runs almost three times faster!
Raytrace: Configuration comparison
In the limited configuration, performance doesn't seem to degrade, but the execution time stops improving beyond 8 threads.
Raytrace: Extrae overhead
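
Instrumentation overhead is usually reported as the relative increase in run time when tracing is enabled. The slide only shows a chart, so the formula below is the conventional definition rather than a confirmed description of the author's method, and the times are hypothetical.

```python
def overhead_percent(time_plain, time_instrumented):
    """Extra run time caused by instrumentation, as a percentage of the plain run."""
    return 100.0 * (time_instrumented - time_plain) / time_plain


if __name__ == "__main__":
    # Hypothetical seconds without and with tracing enabled.
    print(round(overhead_percent(2.0, 2.3), 1))
```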
Conclusions
Conclusions (1/3)
● The system seemed to perform worse when the number of threads was a multiple of the total number of physical cores.
● The program has good load balancing.
● Fine-grained parallelism.
Conclusions (2/3)
● Although it wasn't possible to verify, increasing the input size should cause more cache misses, because the big working sets won't fit in the cache memory.
● Memory bandwidth should be the main obstacle to good speedups.
● Boada killed almost all the native input executions.
Conclusions (3/3)
● Paraver simplifies the process of analyzing an application's performance.
● Better knowledge of the system's architecture would be needed in order to further analyze the performance of the application.
Questions