Fast static performance analysis
of parallel program schemes
Yuriy Sheynin, Boris Sedov,
Alexey Syschikov, Vera Ivanova {sheynin, boris.sedov,
alexey.syschikov, vera.ivanova}@guap.ru
Presenting: Sergey Pakharev
Software for embedded systems and parallelism
2/13
20-24 April 2015 17th FRUCT Conference
For parallel software, it is very important to be able to assess,
at an early stage, the potential parallelism and the possible
speedup depending on the number of processors in the
platform.
Parallel program
3/13
VPL – visual programming language
A VPL program is a directed graph represented as a block scheme:
• vertices are the operators
• arcs are links (pointers) connecting the operators
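A minimal sketch of how such a block scheme could be held in memory, assuming every operator carries an estimated cost. The `Operator` and `Scheme` classes and their fields are hypothetical illustrations, not the data model of the VPL toolchain.

```python
from dataclasses import dataclass, field


@dataclass
class Operator:
    """A vertex of the block scheme (hypothetical fields)."""
    name: str
    cost: float  # assumed: estimated execution time of the operator


@dataclass
class Scheme:
    """A VPL-style program as a directed graph: operators plus arcs."""
    operators: dict[str, Operator] = field(default_factory=dict)
    arcs: dict[str, list[str]] = field(default_factory=dict)  # source -> successors

    def add_operator(self, name: str, cost: float) -> None:
        self.operators[name] = Operator(name, cost)
        self.arcs.setdefault(name, [])

    def link(self, src: str, dst: str) -> None:
        """Add an arc: dst consumes the output of src."""
        self.arcs[src].append(dst)

    def predecessors(self, name: str) -> list[str]:
        """Operators whose arcs point at `name`."""
        return [src for src, dsts in self.arcs.items() if name in dsts]


# Usage: a fork-join scheme in which B and C may run in parallel.
scheme = Scheme()
for name, cost in [("A", 10), ("B", 50), ("C", 50), ("D", 10)]:
    scheme.add_operator(name, cost)
scheme.link("A", "B")
scheme.link("A", "C")
scheme.link("B", "D")
scheme.link("C", "D")
print(scheme.predecessors("D"))  # ['B', 'C']
```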
Early performance evaluation tool
4/13
Static analysis
• Evaluation of parallelism and performance at early stages
• A quick and “cheap” task
Complex performance analysis: static analysis, virtual simulator, platform simulator
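One standard way to evaluate parallelism statically is work-span analysis: the total work $T_1$ (sum of operator costs) and the critical-path length $T_\infty$ bound the speedup on $p$ processors by $T_1 / \max(T_1/p, T_\infty) = \min(p, T_1/T_\infty)$. The sketch below applies this textbook bound to a hypothetical scheme; it is not necessarily the estimator used in the authors' analyzer, and the operator names and costs are made up.

```python
from graphlib import TopologicalSorter


def work_and_span(costs: dict[str, float], preds: dict[str, list[str]]) -> tuple[float, float]:
    """Return (total work T1, critical-path length T_inf) of a DAG."""
    graph = {node: preds.get(node, []) for node in costs}  # include isolated nodes
    finish: dict[str, float] = {}
    for node in TopologicalSorter(graph).static_order():
        # A node starts when its slowest predecessor has finished.
        start = max((finish[p] for p in graph[node]), default=0.0)
        finish[node] = start + costs[node]
    return sum(costs.values()), max(finish.values())


def speedup_bound(work: float, span: float, p: int) -> float:
    """Upper bound on speedup with p processors, since T_p >= max(T1/p, T_inf)."""
    return work / max(work / p, span)


# Hypothetical fork-join scheme: A -> (B, C) -> D.
costs = {"A": 10, "B": 50, "C": 50, "D": 10}
preds = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
work, span = work_and_span(costs, preds)
for p in (1, 2, 4):
    print(p, round(speedup_bound(work, span, p), 2))
```

Beyond two processors the bound stops growing because the critical path A, B, D dominates.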
5/13
Parallelism and data
Part of the program is parallel, so executing it on 2 processors should
significantly reduce the total execution time. However, the real acceleration
of program execution is < 1.5%.
The reason is the difference in the size of the
input data received by each parallel branch of the
program.
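The effect of unequal branch inputs can be illustrated with a small sketch that assumes each branch's execution time is proportional to the amount of data it receives (a hypothetical linear cost model). The function name and input sizes are made up for illustration.

```python
def two_branch_speedup(size_a: int, size_b: int) -> float:
    """Speedup of running two independent branches on 2 processors instead of 1,
    assuming each branch's time is proportional to its input size."""
    t_a, t_b = float(size_a), float(size_b)
    sequential = t_a + t_b        # one processor runs both branches
    parallel = max(t_a, t_b)      # two processors: wait for the slower branch
    return sequential / parallel


print(two_branch_speedup(500, 500))   # balanced inputs: 2.0
print(two_branch_speedup(1000, 10))   # skewed inputs: 1.01
```

With skewed inputs the second processor sits idle almost the whole time, so the parallel structure of the scheme barely helps.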
6/13
Parallelism and data
Matrix addition: $O(n^2)$
Matrix multiplication: $O(n^3)$
Some operators have an asymptotic complexity that depends on the size of the data
being processed.
For the analysis, the user specifies:
• Minimal data amount $N_{min}$
• Base data amount $N_{base}$
• Maximal data amount $N_{max}$
• Base time of program execution $ExecCost_{base}$
$ExecCost = \frac{ExecCost_{base}}{O(N_{base})} \cdot O(N)$
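The scaling rule above can be sketched as a function, assuming $O(\cdot)$ is read as the dominant term of the operator's complexity with constant factors dropped. The complexity functions `matrix_add` and `matrix_mul` and the base cost of 10 are hypothetical inputs, not values from the analyzer.

```python
from collections.abc import Callable


def exec_cost(exec_cost_base: float, complexity: Callable[[float], float],
              n_base: float, n: float) -> float:
    """ExecCost = ExecCost_base / O(N_base) * O(N): rescale the cost measured
    at data amount N_base to data amount N using the complexity function."""
    return exec_cost_base / complexity(n_base) * complexity(n)


def matrix_add(n: float) -> float:  # O(n^2)
    return n ** 2


def matrix_mul(n: float) -> float:  # O(n^3)
    return n ** 3


print(exec_cost(10.0, matrix_add, n_base=1, n=4))  # 160.0
print(exec_cost(10.0, matrix_mul, n_base=1, n=4))  # 640.0
```

Only the ratio $O(N)/O(N_{base})$ matters, so any constant factor in the complexity function cancels out.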
7/13
Parallelism and data
The parallelism of the scheme decreases as the matrix size increases, so the program is
not suitable for parallel platforms.
• $N_{min} = 1$
• $N_{max} = 15$
• $N_{base} = 1$
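The kind of decline described above can be reproduced with a hypothetical scheme (not the one on the slide): one matrix-addition branch and one matrix-multiplication branch run in parallel, each with an assumed base cost of 10 at $N_{base} = 1$, scaled with the $ExecCost$ rule. Here the 2-processor speedup works out to $(10N^2 + 10N^3)/(10N^3) = 1 + 1/N$, which falls from 2 toward 1 over the range $N_{min} = 1$ to $N_{max} = 15$.

```python
def exec_cost(exec_cost_base: float, complexity, n_base: int, n: int) -> float:
    """ExecCost = ExecCost_base / O(N_base) * O(N)."""
    return exec_cost_base / complexity(n_base) * complexity(n)


def square(n: int) -> int:  # matrix addition, O(n^2)
    return n ** 2


def cube(n: int) -> int:    # matrix multiplication, O(n^3)
    return n ** 3


N_MIN, N_MAX, N_BASE = 1, 15, 1
BASE_COST = 10.0  # hypothetical ExecCost_base of each branch at N_BASE


def speedup_at(n: int) -> float:
    """2-processor speedup of the hypothetical two-branch scheme at data amount n."""
    t_add = exec_cost(BASE_COST, square, N_BASE, n)
    t_mul = exec_cost(BASE_COST, cube, N_BASE, n)
    return (t_add + t_mul) / max(t_add, t_mul)


for n in range(N_MIN, N_MAX + 1, 7):
    print(f"N={n:2d}  speedup={speedup_at(n):.3f}")
```

As $N$ grows, the $O(n^3)$ branch dominates and the addition branch contributes almost nothing to overlap.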
Hierarchy
8/13
A VPL program scheme may also contain terminal blocks (data processing) and
composite operators (structural units).
Composite components are designed for
the hierarchical structuring of the program.
They may contain terminal operators and
other composite operators.
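The nesting of terminal and composite operators maps naturally onto a recursive data type. A minimal sketch, assuming each terminal operator has an estimated cost; the class names, node names, and costs are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


@dataclass
class Terminal:
    """A data-processing block with an assumed estimated cost."""
    name: str
    cost: float


@dataclass
class Composite:
    """A structural unit whose body holds terminals and other composites."""
    name: str
    body: list[Node] = field(default_factory=list)


Node = Union[Terminal, Composite]


def total_work(node: Node) -> float:
    """Sum of terminal costs inside a node, descending through the hierarchy."""
    if isinstance(node, Terminal):
        return node.cost
    return sum(total_work(child) for child in node.body)


def depth(node: Node) -> int:
    """Nesting depth: 0 for a terminal, 1 + deepest child for a composite."""
    if isinstance(node, Terminal):
        return 0
    return 1 + max((depth(child) for child in node.body), default=0)


inner = Composite("C2", [Terminal("F3", 30), Terminal("F4", 40)])
outer = Composite("C1", [Terminal("F1", 10), Terminal("F2", 20), inner])
print(total_work(outer))  # 100
print(depth(outer))       # 2
```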
Hierarchy
9/13
[Figure: a complex node whose body contains nodes 1, 2, 3, shown in two mappings onto processors P1 and P2]
Performance models of composite structures:
Fully sequential
• all nodes in the body of the compound operator are placed
on one processor
Fully parallel
• all nodes in the body of the compound operator are distributed
over all available processors according to the
general rules
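The two models can be sketched for a composite whose body is a small DAG of nodes. The "general rules" of the parallel model are not spelled out here, so the sketch assumes greedy list scheduling: nodes are taken in topological order and each goes to the processor that becomes free first, never starting before its predecessors finish. Function names, node names, and costs are hypothetical.

```python
import heapq
from graphlib import TopologicalSorter


def sequential_time(costs: dict[str, float]) -> float:
    """Fully sequential model: the whole body runs on one processor."""
    return sum(costs.values())


def parallel_time(costs: dict[str, float], preds: dict[str, list[str]], p: int) -> float:
    """Fully parallel model, approximated by greedy list scheduling on p processors."""
    graph = {node: preds.get(node, []) for node in costs}
    free_at = [0.0] * p  # min-heap of times at which each processor becomes free
    finish: dict[str, float] = {}
    for node in TopologicalSorter(graph).static_order():
        proc_free = heapq.heappop(free_at)  # earliest available processor
        ready = max((finish[q] for q in graph[node]), default=0.0)
        finish[node] = max(proc_free, ready) + costs[node]
        heapq.heappush(free_at, finish[node])
    return max(finish.values(), default=0.0)


costs = {"A": 10, "B": 50, "C": 50, "D": 10}
preds = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
print(sequential_time(costs))        # 120
print(parallel_time(costs, preds, 2))  # 70.0
```

With one processor the parallel model degenerates to the sequential one, which is a quick sanity check on the scheduler.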
Hierarchy
10/13
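[Figure: sequential model vs parallel model for composite operators C1 and C2 containing nodes F1, F2, F3, F4, scheduled on processors P1 and P2; sequential model t=700, parallel model t=600]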
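Taking the two totals from the figure at face value, the gain of the parallel model over the sequential one for this scheme works out as follows (in the figure's time units):

```latex
S = \frac{t_{\text{seq}}}{t_{\text{par}}} = \frac{700}{600} \approx 1.17,
\qquad
\frac{t_{\text{seq}} - t_{\text{par}}}{t_{\text{seq}}} = \frac{100}{700} \approx 14\%
```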
11/13
Iterations
[Figure: For and While loops containing nodes F1, F2, F3, F4, scheduled on processors P1 and P2]
Most of the computation in a program is expressed as conditional (while) or
iterative (for) loops, so loops have a significant impact on the performance of the
program. The estimate for a loop depends on:
• The asymptotic complexity of the loop body
• The number of iterations
• Execution model (parallel / sequential)
Loop execution time = (accumulated execution time of the body) × (number of
iterations)
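A sketch combining the listed factors, under stated assumptions: the body is taken to consist of independent nodes, the parallel model assigns the longest nodes first to the least-loaded processor (the LPT heuristic, chosen here for illustration), and for a While loop the iteration count, which is not known statically, is a user-supplied estimate. Node costs could first be scaled to the actual data amount with the $ExecCost$ formula. Names and numbers are hypothetical.

```python
from typing import Literal


def body_time(node_costs: list[float], model: Literal["sequential", "parallel"], p: int) -> float:
    """Time of one iteration of a body made of independent nodes (assumption)."""
    if model == "sequential":
        return sum(node_costs)  # all nodes on one processor
    loads = [0.0] * p
    for cost in sorted(node_costs, reverse=True):  # longest node first
        i = loads.index(min(loads))                # least-loaded processor
        loads[i] += cost
    return max(loads)


def loop_time(one_iteration: float, iterations: int) -> float:
    """Loop execution time = accumulated body time x number of iterations."""
    if iterations < 0:
        raise ValueError("iterations must be non-negative")
    return one_iteration * iterations


body = [30, 20, 10]  # hypothetical costs of nodes F2, F3, F4
print(loop_time(body_time(body, "sequential", 2), 100))  # 6000
print(loop_time(body_time(body, "parallel", 2), 100))    # 3000.0
```

Because the body time is multiplied by the iteration count, any inefficiency in the body's parallel schedule is amplified by the loop.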
12/13
Conclusion
The static analyzer of parallel VPL programs provides:
• Evaluation of the program speedup on different numbers of processors
• Evaluation of parallelism deviations depending on the data amount and the complexity of the processing operators
• Evaluation that takes into account the program hierarchy and loops
Further areas of work:
• Accounting for conditional statements (if/switch)
• Implementation of deeper analysis with the virtual and platform simulators
13/13
Thank you!