1
ProActive performance evaluation with NAS benchmarks
and optimization of OO SPMD
Brian Amedro, Vladimir Bodnartchouk
2
Outline
• TimIt : A profiling tool for ProActive
• OO SPMD model in ProActive
• Performance evaluation with NAS benchmarks
• Optimizing group communications
3
TimIt : A profiling tool for ProActive
A ProActive feature to time and analyze applications
4
OO SPMD model
• A parallel programming model
• Flexibility and high level of abstraction
• Heavily used in the NAS benchmark implementations
[Diagrams: One To All, Scattering, Reduce operation]
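The three group operations named on this slide can be imitated in plain Java. This is only a sketch of the idea: it runs sequentially in one JVM and does not use the ProActive API. The class `SpmdSketch`, the `Worker` type and all method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java imitation of three SPMD group operations (NOT the ProActive API).
final class SpmdSketch {

    // Hypothetical group member: every member runs the same code on its own data.
    static final class Worker {
        final int rank;
        int[] data = new int[0];
        int scale = 1;

        Worker(int rank) { this.rank = rank; }

        int partialSum() {
            int sum = 0;
            for (int v : data) sum += v * scale;
            return sum;
        }
    }

    static List<Worker> createGroup(int size) {
        List<Worker> group = new ArrayList<>();
        for (int rank = 0; rank < size; rank++) group.add(new Worker(rank));
        return group;
    }

    // One-to-all: the same value is delivered to every member.
    static void broadcastScale(List<Worker> group, int scale) {
        for (Worker w : group) w.scale = scale;
    }

    // Scattering: the input is cut into contiguous chunks, one chunk per member.
    static void scatter(List<Worker> group, int[] values) {
        int n = group.size();
        int chunk = (values.length + n - 1) / n; // ceiling division
        for (Worker w : group) {
            int from = Math.min(w.rank * chunk, values.length);
            int to = Math.min(from + chunk, values.length);
            w.data = Arrays.copyOfRange(values, from, to);
        }
    }

    // Reduce: the partial results of all members are combined into one value.
    static int reduceSum(List<Worker> group) {
        int total = 0;
        for (Worker w : group) total += w.partialSum();
        return total;
    }

    public static void main(String[] args) {
        List<Worker> group = createGroup(4);
        broadcastScale(group, 2);
        scatter(group, new int[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 10});
        System.out.println(reduceSum(group));
    }
}
```

In a real OO SPMD program the members would be distributed objects and the three operations would be group method calls; the sketch only shows how the data moves.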
5
NAS Parallel Benchmarks
• Designed by NASA to evaluate the benefits of high-performance systems
• Largely derived from computational fluid dynamics (CFD)
• 5 benchmarks (kernels) to test different aspects of a system
• Easy to implement thanks to the OO SPMD pattern
• Tests performed on Sun Java 1.5 with RMI for ProActive and the PGI 6.0 compiler for MPI
6
CG Kernel (Conjugate Gradient)
• Floating point operations
• Eigenvalue computation
• High number of unstructured communications
• 12000 calls
• 570 MB sent
• 1 min 32
• 65 % comms
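The CG kernel is built around the conjugate gradient iteration. Below is a minimal sequential sketch in plain Java, assuming a small dense symmetric positive definite matrix (the benchmark itself works on a large sparse matrix spread over many processes). The class and method names are illustrative and are not taken from the ProActive NAS implementation.

```java
// Sequential conjugate gradient sketch for a dense SPD system A x = b.
final class ConjugateGradientSketch {

    static double dot(double[] u, double[] v) {
        double s = 0.0;
        for (int i = 0; i < u.length; i++) s += u[i] * v[i];
        return s;
    }

    static double[] multiply(double[][] a, double[] v) {
        double[] out = new double[v.length];
        for (int i = 0; i < a.length; i++) out[i] = dot(a[i], v);
        return out;
    }

    static double[] solve(double[][] a, double[] b, int maxIter, double tol) {
        int n = b.length;
        double[] x = new double[n];   // initial guess x = 0
        double[] r = b.clone();       // residual r = b - A x, equal to b for x = 0
        double[] p = r.clone();       // first search direction
        double rsOld = dot(r, r);
        for (int it = 0; it < maxIter && Math.sqrt(rsOld) > tol; it++) {
            double[] ap = multiply(a, p);
            double alpha = rsOld / dot(p, ap);   // step length along p
            for (int i = 0; i < n; i++) {
                x[i] += alpha * p[i];
                r[i] -= alpha * ap[i];
            }
            double rsNew = dot(r, r);
            double beta = rsNew / rsOld;         // keeps directions A-conjugate
            for (int i = 0; i < n; i++) p[i] = r[i] + beta * p[i];
            rsOld = rsNew;
        }
        return x;
    }

    public static void main(String[] args) {
        double[][] a = {{4.0, 1.0}, {1.0, 3.0}};
        double[] b = {1.0, 2.0};
        double[] x = solve(a, b, 10, 1e-12);
        System.out.println(x[0] + " " + x[1]);
    }
}
```

In a distributed version, the matrix-vector product and the dot products are the points where processes have to exchange data, which fits the high share of communication reported for this kernel.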
7
MG Kernel (Multi Grid)
• Floating point operations
• Solving Poisson problem
• Structured communications
• 600 calls
• 45 MB sent
• 1 min 32
• 80 % comms
8
IS Kernel (Integer Sort)
• Key ranking operations
• Bucket sort
• Large arrays in memory
• 65 calls
• 22 MB sent
• 4 min 32
• 60 % comms
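A bucket sort on integer keys can be sketched as follows. This is a sequential plain-Java illustration, assuming keys in the range 0 to `maxKey - 1`; the names `BucketSortSketch`, `maxKey` and `numBuckets` are hypothetical and not taken from the benchmark code.

```java
import java.util.Arrays;

// Sequential bucket sort sketch for integer keys in [0, maxKey).
final class BucketSortSketch {

    static int[] bucketSort(int[] keys, int maxKey, int numBuckets) {
        // Each bucket covers a contiguous key range of this width.
        int width = (maxKey + numBuckets - 1) / numBuckets;

        // Pass 1: count how many keys fall into each bucket.
        int[] counts = new int[numBuckets];
        for (int k : keys) counts[k / width]++;

        // Prefix sums give the start offset of every bucket in the output.
        int[] start = new int[numBuckets + 1];
        for (int b = 0; b < numBuckets; b++) start[b + 1] = start[b] + counts[b];

        // Pass 2: copy each key into the next free slot of its bucket.
        int[] next = Arrays.copyOf(start, numBuckets);
        int[] out = new int[keys.length];
        for (int k : keys) out[next[k / width]++] = k;

        // Sort inside each bucket; the buckets are already in key order.
        for (int b = 0; b < numBuckets; b++) Arrays.sort(out, start[b], start[b + 1]);
        return out;
    }

    public static void main(String[] args) {
        int[] sorted = bucketSort(new int[] {5, 3, 9, 0, 7, 3, 1}, 10, 3);
        System.out.println(Arrays.toString(sorted));
    }
}
```

In a parallel setting each bucket would typically be owned by one process, so the second pass becomes the step in which keys travel over the network.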
9
EP Kernel (Embarrassingly Parallel)
• Random number generation
• Almost no communication
• 6 calls
• 246 bytes sent
• 7 min 32
• 2 % comms
10
FT Kernel (Fourier Transform)
• Floating point operations
• Big messages : 8 MB per call
• 22 calls
• 180 MB sent
• 1 min 32
• 40 % comms
11
Optimizing group communications
Implement efficient group communication
• Minimize the TCP traffic
• Decrease network congestion
Use clustering techniques to choose the best algorithm
12
Ring all-to-all algorithm
• Best for large messages
• Takes n-1 steps
[Figure: ring exchange, steps 1 to 3]
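The n-1 steps of the ring can be checked with a small simulation. The sketch below assumes that the goal is for every process to end up with the block of every other process, and that at each step a process forwards to its right-hand neighbour the block it received in the previous step. It runs in one JVM with arrays standing in for processes; `RingAllToAll` and `exchange` are hypothetical names.

```java
// Single-JVM simulation of a ring exchange; arrays stand in for processes.
final class RingAllToAll {

    // result[p][q] receives the block that originated on process q.
    // Returns the number of communication steps performed.
    static int exchange(String[] blocks, String[][] result) {
        int n = blocks.length;
        for (int p = 0; p < n; p++) result[p][p] = blocks[p]; // own block
        int steps = 0;
        for (int s = 1; s < n; s++) {
            for (int p = 0; p < n; p++) {
                // Block that p received in the previous step (its own at s = 1).
                int origin = ((p - s + 1) % n + n) % n;
                int right = (p + 1) % n;
                result[right][origin] = result[p][origin];
            }
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        String[] blocks = {"a", "b", "c", "d"};
        String[][] result = new String[4][4];
        int steps = exchange(blocks, result);
        System.out.println(steps + " steps, process 0 holds "
                + String.join(",", result[0]));
    }
}
```

Each process only ever talks to its neighbour and sends one block per step, which keeps the load on the network even when the blocks are large.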
13
Recursive doubling all-to-all algorithm
• Best for small messages
• Takes log2(n) steps
[Figure: recursive doubling exchange, steps 1 and 2]
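The logarithmic step count can be checked with a small simulation as well. The sketch assumes that the number of processes is a power of two and that, at the step with distance d (1, 2, 4, ...), process p swaps everything it holds with process p XOR d. It runs in one JVM; `RecursiveDoubling` and `exchange` are hypothetical names.

```java
// Single-JVM simulation of recursive doubling; arrays stand in for processes.
final class RecursiveDoubling {

    // result[p][q] receives the block that originated on process q.
    // Returns the number of communication steps performed.
    static int exchange(String[] blocks, String[][] result) {
        int n = blocks.length;
        if (Integer.bitCount(n) != 1) {
            throw new IllegalArgumentException("n must be a power of two");
        }
        for (int p = 0; p < n; p++) result[p][p] = blocks[p]; // own block
        int steps = 0;
        for (int d = 1; d < n; d <<= 1) {
            // Snapshot the state so that all pairs swap "at the same time".
            String[][] before = new String[n][];
            for (int p = 0; p < n; p++) before[p] = result[p].clone();
            for (int p = 0; p < n; p++) {
                int partner = p ^ d;
                for (int q = 0; q < n; q++) {
                    if (before[partner][q] != null) result[p][q] = before[partner][q];
                }
            }
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        String[] blocks = {"a", "b", "c", "d", "e", "f", "g", "h"};
        String[][] result = new String[8][8];
        int steps = exchange(blocks, result);
        System.out.println(steps + " steps, process 0 holds "
                + String.join(",", result[0]));
    }
}
```

There are fewer steps than in the ring, but the amount of data sent per step doubles each time, so the approach pays off mainly when per-message latency dominates.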
14
Conclusion
• TimIt : easy and helpful profiling tool
• NAS benchmarks are easy to implement with ProActive and the OO SPMD pattern
  http://www-sop.inria.fr/oasis/proactive/nas
• Good performance expected with the upcoming Sun Java 6 and the use of Ibis RMI
15
Questions?
16
MPI / ProActive

| MPI | ProActive |
| --- | --- |
| mpirun | deployment |
| MPI_Init, MPI_Finalize | activities creation |
| MPI_Comm_size | getMyGroupSize |
| MPI_Comm_rank | getMyRank |
| MPI_*Send, MPI_*Recv | method call (setter and getter) |
| MPI_Barrier | barrier |
| MPI_Bcast | method call |
| MPI_Scatter | method call with a scatter group as parameter |
| MPI_Gather | result of a group communication |
| MPI_Reduce | programmer's method |