PARSEC Benchmark Performance Analysis: Fluidanimate
Iuliia Proskurnia, EMDC
April 17, 2012
- Introduction: PARSEC, Fluidanimate
- Scalability Analysis: MacBook Air, Boada
- Extrae Analysis: Code, Event Log
- Paraver: Trace Analysis, Performance Counter Analysis
- Conclusions
What is PARSEC?
- Benchmark suite of multithreaded programs
- Next-generation shared-memory programs
- Key features:
  - Multithreaded
  - Emerging workloads
  - Research ...
Benchmark Applications
- blackscholes
- bodytrack
- canneal
- facesim
- ferret
- x264
- fluidanimate
- freqmine
- raytrace
- streamcluster
- swaptions
- vips
Fluidanimate
- Fluid dynamics for animation purposes, using the Smoothed Particle Hydrodynamics (SPH) method
- Computer animation application
- Coarse-grained parallelism, static load balancing
Scalability Analysis
Characteristics
MacBook Air:
- Intel(R) Core(TM) i5-2557M CPU @ 1.70GHz
- 2 cores with Hyper-Threading support
- 4 GB RAM
Boada Server:
- Intel(R) Xeon(R) CPU E5645 @ 2.40GHz
- 12 cores with Hyper-Threading support
- 24 GB RAM
MacBook Air Performance (input set: simlarge)

Number of threads   Time, sec
1                   5.399
2                   3.323
4                   3.225
8                   3.678
16                  3.973
32                  4.317
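The MacBook Air timings can be turned into speedup, S(p) = T(1) / T(p), and parallel efficiency, E(p) = S(p) / p. The C sketch below does this under the assumption that the six measured times correspond, in order, to 1, 2, 4, 8, 16 and 32 threads. The names `speedup`, `efficiency`, `fastest_run`, `mba_threads` and `mba_times` are illustrative and are not part of the original analysis.

```c
#include <assert.h>
#include <stddef.h>

/* Thread counts and wall-clock times (seconds) read off the MacBook Air chart. */
const int    mba_threads[] = {1, 2, 4, 8, 16, 32};
const double mba_times[]   = {5.399, 3.323, 3.225, 3.678, 3.973, 4.317};

/* Speedup relative to the single-threaded run: S(p) = T(1) / T(p). */
double speedup(double t1, double tp) {
    assert(t1 > 0.0 && tp > 0.0);
    return t1 / tp;
}

/* Parallel efficiency: E(p) = S(p) / p. */
double efficiency(double t1, double tp, int threads) {
    assert(threads > 0);
    return speedup(t1, tp) / (double)threads;
}

/* Index of the shortest time in an array of n timings. */
size_t fastest_run(const double *times, size_t n) {
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; ++i) {
        if (times[i] < times[best]) {
            best = i;
        }
    }
    return best;
}
```

With these numbers the fastest configuration is 4 threads, with a speedup of roughly 1.67 and an efficiency of roughly 0.42. Beyond 4 threads the run time grows again, which is consistent with a machine that has 2 cores with Hyper-Threading.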
Boada Server Performance (input set: simlarge)

Number of threads   Time, sec
1                   2.708
2                   1.62
4                   1.003
8                   0.726
16                  0.681
32                  0.667
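One way to quantify how quickly the Boada curve flattens is the Karp-Flatt metric, which estimates the serial fraction from a measured speedup: e = (1/S - 1/p) / (1 - 1/p). The slides do not use this metric, so the sketch below is an added illustration only. The names `serial_fraction`, `boada_threads` and `boada_times` are hypothetical, and the timings are assumed to map in order to 1, 2, 4, 8, 16 and 32 threads.

```c
#include <assert.h>

/* Thread counts and wall-clock times (seconds) read off the Boada chart. */
const int    boada_threads[] = {1, 2, 4, 8, 16, 32};
const double boada_times[]   = {2.708, 1.62, 1.003, 0.726, 0.681, 0.667};

/* Karp-Flatt experimentally determined serial fraction:
 *   e = (1/S - 1/p) / (1 - 1/p),  with S = t1 / tp and p > 1. */
double serial_fraction(double t1, double tp, int p) {
    assert(t1 > 0.0 && tp > 0.0 && p > 1);
    double inv_speedup = tp / t1;        /* 1/S */
    double inv_p = 1.0 / (double)p;      /* 1/p */
    return (inv_speedup - inv_p) / (1.0 - inv_p);
}
```

For 8 threads this gives an estimated serial fraction of about 0.16. For 16 and 32 threads the thread count exceeds the 12 physical cores listed for Boada, so the value is harder to interpret there.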
Extrae Analysis
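int main() {...}

#include <pthread.h>
#include "extrae_user_events.h"

int main(int argc, char *argv[]) {
    Extrae_init();
    Extrae_eventandcounters(0, 1);  // event 1: master thread started, initialization begins
    ...
    Extrae_eventandcounters(0, 2);  // event 2: initialization finished, thread creation starts
    for (int i = 0; i < threadnum; ++i) {
        ...
        pthread_create(..., AdvanceFramesMT, ...);
    }
    Extrae_eventandcounters(0, 3);  // event 3: threads created, parallel part starts
    for (int i = 0; i < threadnum; ++i) {
        pthread_join(...);
    }
    Extrae_eventandcounters(0, 4);  // event 4: master joined all threads
    ...
    Extrae_fini();
}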
Thread
for (int i = 0; i < targs->frames; ++i) {
    ClearParticlesMT(i);
    pthread_barrier_wait(&barrier);
    RebuildGridMT(i);
    pthread_barrier_wait(&barrier);
    ComputeDensitiesMT(i);
    ...
    pthread_barrier_wait(&barrier);
    ...
    ProcessCollisionsMT(i);
    pthread_barrier_wait(&barrier);
    AdvanceParticlesMT(i);
    pthread_barrier_wait(&barrier);
}
Event Log
List of events:
1 - Master thread started, begin initialization
2 - Master thread finished initialization, start of thread creation
3 - Master finished creating threads, start of parallel part
4 - Master joined threads
5 - Thread start/run
6 - Thread barrier
7 - End thread
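In instrumented code, numeric event values like these are easier to maintain as named constants. The following sketch is one possible encoding of the list above. The identifiers (`trace_event`, `EV_*`, `event_name`, `check_event_table`) are hypothetical and do not come from the slides or from Extrae.

```c
#include <assert.h>
#include <stddef.h>

/* Event values from the event log; the names are illustrative. */
enum trace_event {
    EV_MASTER_INIT_BEGIN = 1,  /* master thread started, begin initialization */
    EV_MASTER_INIT_END   = 2,  /* initialization finished, start of thread creation */
    EV_THREADS_CREATED   = 3,  /* threads created, start of parallel part */
    EV_THREADS_JOINED    = 4,  /* master joined threads */
    EV_THREAD_RUN        = 5,  /* thread start/run */
    EV_THREAD_BARRIER    = 6,  /* thread barrier */
    EV_THREAD_END        = 7   /* end thread */
};

/* Human-readable label for an event value, or NULL if the value is unknown. */
const char *event_name(int code) {
    switch (code) {
    case EV_MASTER_INIT_BEGIN: return "Master thread started, begin initialization";
    case EV_MASTER_INIT_END:   return "Master thread finished initialization";
    case EV_THREADS_CREATED:   return "Master finished creating threads";
    case EV_THREADS_JOINED:    return "Master joined threads";
    case EV_THREAD_RUN:        return "Thread start/run";
    case EV_THREAD_BARRIER:    return "Thread barrier";
    case EV_THREAD_END:        return "End thread";
    default:                   return NULL;
    }
}

/* Sanity check: every value from 1 to 7 has a label. */
void check_event_table(void) {
    for (int code = EV_MASTER_INIT_BEGIN; code <= EV_THREAD_END; ++code) {
        assert(event_name(code) != NULL);
    }
}
```

The same labels can be reused when naming the event values for the trace viewer, so the code and the trace stay in sync.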
Paraver
2 Threads
Real size:
Zoomed in to two iterations of the for loop:
*red - running, green - waiting
8 Threads
Real size:
Zoomed in:
*red - running, green - waiting
Waiting: 20% of the time
8 Threads - More Analysis
Number of instructions:
Number of total cycles:
*lighter color indicates a higher value than darker color
8 Threads - More Analysis
IPC:
*darker color indicates a higher value than lighter color
L1 cache misses:
*lighter color indicates a higher value than darker color
8 Threads - Even More Analysis
IPC and L1 cache misses per thread:

Thread   L1 cache misses   IPC
2        6.23818           0.84
3        6.78529           1
4        5.87637           0.96
5        6.53314           1.05
6        7.89735           1.13
7        8.68508           1.03
8        6.10146           0.99
9        7.8378            0.81

*red - run, green - waiting
Locks
ComputeForceMT() function:
Results:
- Fine-grained locking
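Fine-grained locking means protecting small pieces of shared state with their own locks, rather than serializing all updates behind one global lock. The sketch below illustrates the idea with one mutex per grid cell. It is not the code of ComputeForceMT(): the cell count, thread count, update pattern and all names (`cell_lock`, `cell_value`, `run_fine_grained`) are assumptions made for the example.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_CELLS 16
#define NUM_THREADS 4
#define UPDATES_PER_THREAD 10000

static pthread_mutex_t cell_lock[NUM_CELLS];  /* one lock per cell */
static long cell_value[NUM_CELLS];            /* shared per-cell accumulator */

static void *worker(void *arg) {
    int tid = *(int *)arg;
    for (int k = 0; k < UPDATES_PER_THREAD; ++k) {
        int c = (tid + k) % NUM_CELLS;
        /* Only threads touching the same cell at the same time contend. */
        pthread_mutex_lock(&cell_lock[c]);
        cell_value[c] += 1;
        pthread_mutex_unlock(&cell_lock[c]);
    }
    return NULL;
}

/* Runs all workers and returns the sum over all cells. */
long run_fine_grained(void) {
    pthread_t threads[NUM_THREADS];
    int tids[NUM_THREADS];
    long total = 0;

    for (int c = 0; c < NUM_CELLS; ++c) {
        pthread_mutex_init(&cell_lock[c], NULL);
        cell_value[c] = 0;
    }
    for (int i = 0; i < NUM_THREADS; ++i) {
        tids[i] = i;
        pthread_create(&threads[i], NULL, worker, &tids[i]);
    }
    for (int i = 0; i < NUM_THREADS; ++i) {
        pthread_join(threads[i], NULL);
    }
    for (int c = 0; c < NUM_CELLS; ++c) {
        total += cell_value[c];
        pthread_mutex_destroy(&cell_lock[c]);
    }
    return total;
}
```

Because no update is lost, `run_fine_grained()` returns `NUM_THREADS * UPDATES_PER_THREAD`. With many cells and few threads, two threads rarely need the same lock at the same moment, which is why this style of locking tends to show little contention in a trace.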
Conclusions
Summary
- PARSEC: Fluidanimate
- Bad load balancing
- Fine-grained locking, no lock contention
- Bus contention
- Hyper-Threading on the server