© 2009 IBM Corporation
19-20 July, 2009 | PADTAD 2009 @ Chicago, Illinois
A Proposal of Operation History Management System for Source-to-Source Optimization of HPC Programs
Yasushi Negishi, Hiroki Murata and Takao Moriyama
Deep Computing, Tokyo Research Laboratory, IBM Research
Outline of this Presentation
1. Proposal of an algorithm for managing the operation history of source-to-source optimization.
2. Prototype system with a new user interface for managing the operation history explicitly.
Background
Improvement of single-processor performance is stalling, and the architectures of supercomputers are becoming more complex.
– Architecture-specific optimizations are needed to exploit the various kinds of network and processor architectures and achieve reasonable performance.
Application areas for numerical simulations continue to expand.
– We need to solve performance issues more effectively and more easily.
Source-to-source optimization tools are becoming important.
– Automatic conversion (a.k.a. refactoring) for optimization.
– Support typical architecture-specific and application-specific performance optimization patterns.
– Reduce programmer’s time and human errors by supporting routine but troublesome optimization.
Typical Source-to-Source Optimization Steps
Strength reduction
– Replace a costly operation with an equivalent but less expensive operation.
• E.g. x = r ** (-1) → x = 1 / r
– Steps
1. Modify the code to use the less expensive operation by manual editing.
Loop unrolling & SIMDization
– Use SIMD instructions if the compiler does not generate optimal SIMD instructions in a loop.
• E.g.
      x(i) = a(i) + b(i) * c(i)
      x(i+1) = a(i+1) + b(i+1) * c(i+1)
  →
      x(i) = FPMADD(a(i), b(i), c(i))
– Steps
1. Unroll the loop by automatic conversion, specifying the range and the unroll factor.
2. Modify the unrolled loop body with inline assembly code for SIMD by manual editing.
Loop tiling (a.k.a. loop blocking, strip-mine and interchange)
– Change the loop structure to increase memory access locality and the cache hit ratio.
• E.g.
      for (i=0; i<N; i++)
        for (j=0; j<N; j++)
          c[i] = c[i] + a[i][j]*b[j];
  →
      for (i=0; i<N; i+=Bi)
        for (j=0; j<N; j+=Bj)
          for (ii=i; ii<min(i+Bi,N); ii++)
            for (jj=j; jj<min(j+Bj,N); jj++)
              c[ii] = c[ii] + a[ii][jj]*b[jj];
– Steps
1. Modify the loop by automatic conversion, specifying the range and the blocking factors.
Optimization steps are combinations of automatic conversion and manual editing.
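The loop tiling example can be checked with a small sketch. The Python code below is illustrative only: the function names `matvec_plain` and `matvec_tiled` and the default blocking factors are assumptions made for this example, not part of the proposed tool. It suggests that tiling changes only the order in which iterations are visited, not the computed result.

```python
def matvec_plain(a, b):
    """Compute c[i] = sum over j of a[i][j] * b[j] in the original loop order."""
    n = len(b)
    c = [0.0] * n
    for i in range(n):
        for j in range(n):
            c[i] += a[i][j] * b[j]
    return c


def matvec_tiled(a, b, bi=2, bj=3):
    """Same computation with the i and j loops tiled by blocking factors bi and bj."""
    n = len(b)
    c = [0.0] * n
    for i in range(0, n, bi):              # step over row tiles
        for j in range(0, n, bj):          # step over column tiles
            for ii in range(i, min(i + bi, n)):
                for jj in range(j, min(j + bj, n)):
                    c[ii] += a[ii][jj] * b[jj]
    return c
```

For every row the tiled version still adds the products in ascending column order, so the two functions agree exactly for any blocking factors.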
“Reapplication Conflict”
Because optimization work is trial and error by nature, it is sometimes necessary to undo a past operation, or to insert or change a past operation, even when a single user manages the code.
We call the conflict that a single user causes in this way a “Reapplication Conflict”.
A system for supporting source-to-source optimization should handle this conflict correctly.
Issues of Existing Version Management Systems Handling “Reapplication Conflict”
Because optimization work is trial and error by nature, it is sometimes necessary to undo a past operation, or to insert or change a past operation, even when a single user manages the code.
– We call the conflict that a single user causes in this way a “Reapplication Conflict”. The system should handle this conflict correctly.
Existing version management systems use the algorithm of the “patch” command, or a similar one, to handle conflicts.
But the patch algorithm has an issue.
– For modifications made by manual editing, the patch algorithm works fine.
• The algorithm applies the difference produced by an operation to a different base code, adjusting the target range to which it is applied.
– For modifications made by automatic conversion, the patch algorithm may generate unexpected results.
The following scenario shows a case in which an existing system does not work as expected.
Example Scenario of “Reapplication Conflict” (original)
The original code is checked out.
Operation history: Original

      program sample
      implicit none
      integer i, n
      parameter(n=10000000)
      real*8 a, b, pi, x(n), sin, s, t1, t2, t3, rtc

      a = 0
      b = 0
      pi = 3.14159265d0
      s = rtc()
      do i = 1, n
         x(i) = i * sin(i / (pi * 4.0d0))
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
         a = a + x(i) ** (-1)
      enddo
      t2 = rtc() - s
      s = rtc()
      do i = 2, n
         b = b + ((x(i) + a) / (pi * 4.0d0) + 1.0d0)
      enddo
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end
Example Scenario of “Reapplication Conflict” (Step 1)
Step 1: Do loop-invariant code motion by manual editing, and check it in.
Operation history: Original → A (Step 1)

      program sample
      implicit none
      integer i, n
      parameter(n=10000000)
      real*8 a, b, pi, fourpi, x(n), sin, s, t1, t2, t3, rtc

      a = 0
      b = 0
      pi = 3.14159265d0
      s = rtc()
      fourpi = pi * 4.0d0
      do i = 1, n
         x(i) = i * sin(i / fourpi)
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
         a = a + x(i) ** (-1)
      enddo
      t2 = rtc() - s
      s = rtc()
      do i = 2, n
         b = b + ((x(i) + a) / fourpi + 1.0d0)
      enddo
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end
Example Scenario of “Reapplication Conflict” (Step 2)
Step 2: Do strength reduction by manual editing, and check it in.
Operation history: Original → A (Step 1) → B (Step 2)

      program sample
      implicit none
      integer i, n
      parameter(n=10000000)
      real*8 a, b, pi, fourpi, x(n), sin, s, t1, t2, t3, rtc

      a = 0
      b = 0
      pi = 3.14159265d0
      s = rtc()
      fourpi = pi * 4.0d0
      do i = 1, n
         x(i) = i * sin(i / fourpi)
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
         a = a + 1.0d0 / x(i)
      enddo
      t2 = rtc() - s
      s = rtc()
      do i = 2, n
         b = b + ((x(i) + a) / fourpi + 1.0d0)
      enddo
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end
Example Scenario of “Reapplication Conflict” (Step 3)
Step 3: Do loop unrolling by automatic conversion, and check it in.
Operation history: Original → A (Step 1) → B (Step 2) → C (Step 3)

      program sample
      implicit none
      integer i, n
      parameter(n=10000000)
      real*8 a, b, pi, fourpi, x(n), sin, s, t1, t2, t3, rtc

      a = 0
      b = 0
      pi = 3.14159265d0
      s = rtc()
      fourpi = pi * 4.0d0
      do i = 1, n
         x(i) = i * sin(i / fourpi)
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
         a = a + 1.0d0 / x(i)
      enddo
      t2 = rtc() - s
      s = rtc()
      do i = 2, n, 4
         b = b + ((x(i) + a) / fourpi + 1.0d0)
         b = b + ((x(i+1) + a) / fourpi + 1.0d0)
         b = b + ((x(i+2) + a) / fourpi + 1.0d0)
         b = b + ((x(i+3) + a) / fourpi + 1.0d0)
      enddo
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end
Example Scenario of “Reapplication Conflict” (Step 4)
Step 4: Compile and execute the code, and analyze the effects of the optimizations. The code is unchanged from Step 3.
Operation history: Original → A (Step 1) → B (Step 2) → C (Step 3)
The analysis finds the following results:
– Optimization A: not effective (N.G.)
– Optimization B: effective (O.K.)
– Optimization C: effective (O.K.)
Example Scenario of “Reapplication Conflict” (Step 5)
Step 5: Undo optimization A by using the “patch” command.
Operation history: Original → A (Step 1) → B (Step 2) → C (Step 3); Step 5 undoes A.
Legend: (1) Target of optimization A. (2) Not a target of optimization A, but influenced by it.

      program sample
      implicit none
      integer i, n
      parameter(n=10000000)
      real*8 a, b, pi, fourpi, x(n), sin, s, t1, t2, t3, rtc   ! (1)

      a = 0
      b = 0
      pi = 3.14159265d0
      s = rtc()
      fourpi = pi * 4.0d0                                      ! (1)
      do i = 1, n
         x(i) = i * sin(i / fourpi)                            ! (1)
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
         a = a + 1.0d0 / x(i)
      enddo
      t2 = rtc() - s
      s = rtc()
      do i = 2, n, 4
         b = b + ((x(i) + a) / fourpi + 1.0d0)                 ! (1)
         b = b + ((x(i+1) + a) / fourpi + 1.0d0)               ! (2)
         b = b + ((x(i+2) + a) / fourpi + 1.0d0)               ! (2)
         b = b + ((x(i+3) + a) / fourpi + 1.0d0)               ! (2)
      enddo
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end
Example Scenario of “Reapplication Conflict” (Final Results)

      program sample
      implicit none
      integer i, n
      parameter(n=10000000)
      real*8 a, b, pi, x(n), sin, s, t1, t2, t3, rtc

      a = 0
      b = 0
      pi = 3.14159265d0
      s = rtc()
      do i = 1, n
         x(i) = i * sin(i / (pi * 4.0d0))
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
         a = a + 1.0d0 / x(i)
      enddo
      t2 = rtc() - s
      s = rtc()
      do i = 2, n, 4
         b = b + ((x(i) + a) / (pi * 4.0d0) + 1.0d0)
         b = b + ((x(i+1) + a) / fourpi + 1.0d0)
         b = b + ((x(i+2) + a) / fourpi + 1.0d0)
         b = b + ((x(i+3) + a) / fourpi + 1.0d0)
      enddo
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end

Problem: The wrong line is unrolled!! Only the first statement of the unrolled loop is reverted; the three copies generated by unrolling still refer to fourpi, which is no longer declared or assigned.
This happens because “patch” does not actually apply the automatic conversion again; it only applies the textual difference that the automatic conversion produced.
A system for managing automatic conversion operations is needed. It should:
(1) Adjust the target range.
(2) Actually apply the automatic operation again.
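The failure above can be reproduced with a simplified model of reverse-applying a textual difference. The Python sketch below is not the real “patch” implementation: `undo_by_text` and `find_block` are hypothetical helpers that locate each block changed by optimization A by exact content and swap it back, and the three line lists are shortened excerpts of the scenario code.

```python
from difflib import SequenceMatcher


def find_block(lines, block):
    """Return the index where `block` occurs contiguously in `lines`, or -1."""
    n = len(block)
    for k in range(len(lines) - n + 1):
        if lines[k:k + n] == block:
            return k
    return -1


def undo_by_text(before, after, current):
    """Reverse the textual change before -> after inside `current`.

    Simplified model: every changed block is found by exact content and
    replaced by its old text. Pure deletions are ignored in this sketch.
    """
    result = list(current)
    matcher = SequenceMatcher(None, before, after, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        new_block = after[j1:j2]
        if tag == "equal" or not new_block:
            continue
        pos = find_block(result, new_block)
        if pos >= 0:
            result[pos:pos + len(new_block)] = before[i1:i2]
    return result


original = [
    "real*8 a, b, pi, x(n)",
    "do i = 1, n",
    "   x(i) = i * sin(i / (pi * 4.0d0))",
    "enddo",
    "do i = 2, n",
    "   b = b + ((x(i) + a) / (pi * 4.0d0) + 1.0d0)",
    "enddo",
]
step_a = [  # after optimization A (loop-invariant code motion)
    "real*8 a, b, pi, fourpi, x(n)",
    "fourpi = pi * 4.0d0",
    "do i = 1, n",
    "   x(i) = i * sin(i / fourpi)",
    "enddo",
    "do i = 2, n",
    "   b = b + ((x(i) + a) / fourpi + 1.0d0)",
    "enddo",
]
step_c = [  # after optimization C (loop unrolling by 4)
    "real*8 a, b, pi, fourpi, x(n)",
    "fourpi = pi * 4.0d0",
    "do i = 1, n",
    "   x(i) = i * sin(i / fourpi)",
    "enddo",
    "do i = 2, n, 4",
    "   b = b + ((x(i) + a) / fourpi + 1.0d0)",
    "   b = b + ((x(i+1) + a) / fourpi + 1.0d0)",
    "   b = b + ((x(i+2) + a) / fourpi + 1.0d0)",
    "   b = b + ((x(i+3) + a) / fourpi + 1.0d0)",
    "enddo",
]

undone = undo_by_text(original, step_a, step_c)
stale = [line for line in undone if "fourpi" in line]  # lines left behind
```

In this model `stale` holds the three unrolled copies: a textual undo cannot know that they were generated from a line that optimization A had modified.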
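Proposed Algorithm for saving/applying automatic operations
Manual editing: handled by the patch algorithm.
– Saving an operation: manual editing turns the original code into the optimization results. The context difference file between the two is saved as the operation log.
– Applying a saved operation: the patch algorithm applies the context difference file from the operation log to the modified code, producing the optimized results on the modified code.
Automatic conversion: handled by our proposed algorithm.
– Saving an operation: specifying the range on the original code produces a pseudo change file; specifying the conversion ID and arguments produces the optimization results. The operation log holds the context difference file (original code vs. pseudo change file), the conversion ID, and the arguments.
– Applying a saved operation: the patch algorithm applies the context difference file from the operation log to the modified code, producing a pseudo change file. The automatic conversion is then applied with the saved conversion ID and arguments to produce the optimization results.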
Scenario of Proposed Algorithm to Save Automatic Operations
Algorithm for saving operation history
Step 1: Generate a pseudo change file by inserting special lines that specify the range for the automatic operation.
The file before editing is the Step 2 code of the example scenario. The pseudo change file differs from it only in the inserted $BEGIN and $END lines.

pseudo change file

      program sample
      implicit none
      integer i, n
      parameter(n=10000000)
      real*8 a, b, pi, fourpi, x(n), sin, s, t1, t2, t3, rtc

      a = 0
      b = 0
      pi = 3.14159265d0
      s = rtc()
      fourpi = pi * 4.0d0
      do i = 1, n
         x(i) = i * sin(i / fourpi)
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
         a = a + 1.0d0 / x(i)
      enddo
      t2 = rtc() - s
      s = rtc()
$BEGIN
      do i = 2, n
         b = b + ((x(i) + a) / fourpi + 1.0d0)
      enddo
$END
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end

Step 2: Create a context difference file between the file before editing and the pseudo change file.

context difference file

*** opeB.F Sat Jul 11 11:36:34 2009
--- opeC2.F Sun Jul 12 13:36:10 2009
***************
*** 19,27 ****
--- 19,29 ----
        enddo
        t2 = rtc() - s
        s = rtc()
+ $BEGIN
        do i = 2, n
           b = b + ((x(i) + a) / fourpi + 1.0d0)
        enddo
+ $END
        t3 = rtc() - s
        write(*,*) 'a=', a, 'b=', b
        write(*,*) 'time=', t1, t2, t3

By saving this context difference file, the range-adjustment algorithm of the “patch” command can be used to identify the target range of the automatic conversion.
Step 3: Save the identifier of the automatic conversion operation (e.g. “loop unrolling”), its parameter (e.g. “4”), and the context difference file as its operation log.

Operation log
– Identifier of automatic conversion: “loop unrolling”
– Parameter: 4
– Context difference file: the file created in Step 2
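The three saving steps can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the prototype's code: the helper names `make_pseudo_change` and `save_operation`, the dictionary layout of the operation log, and the `"factor"` argument key are all hypothetical. The only library call used is `difflib.context_diff` from the Python standard library.

```python
from difflib import context_diff

BEGIN, END = "$BEGIN", "$END"  # special marker lines, as in the pseudo change file


def make_pseudo_change(lines, first, last):
    """Step 1: insert marker lines around lines[first..last] (0-based, inclusive)."""
    return lines[:first] + [BEGIN] + lines[first:last + 1] + [END] + lines[last + 1:]


def save_operation(lines, first, last, conversion_id, arguments):
    """Steps 2 and 3: diff against the pseudo change file and build the log."""
    pseudo = make_pseudo_change(lines, first, last)
    diff = list(context_diff(lines, pseudo, fromfile="before",
                             tofile="pseudo", lineterm=""))
    return {
        "conversion_id": conversion_id,  # e.g. "loop unrolling"
        "arguments": arguments,          # e.g. the unroll factor
        "context_diff": diff,            # used later to relocate the range
    }


code = [
    "      s = rtc()",
    "      do i = 2, n",
    "         b = b + ((x(i) + a) / fourpi + 1.0d0)",
    "      enddo",
    "      t3 = rtc() - s",
]
log = save_operation(code, 1, 3, "loop unrolling", {"factor": 4})
```

The log stores no converted code at all, only where the conversion was applied and how to run it again.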
Scenario of Proposed Algorithm to Apply Automatic Operation (Step 1)
Algorithm for applying operation history on modified target code
Step 1: Apply the context diff file to the target program by using the algorithm used by the “patch” command.
– Trial 1: Apply the history at the same position. Result: Not Match
– Trial 2: Ignore the starting and ending line numbers. Result: Match
– Trial 3: Ignore the outermost one line before/after the modification.
– Trial 4: Ignore the outermost two lines before/after the modification.

Target program (modified code)

      program sample
      implicit none
      integer i, n
      parameter(n=10000000)
      real*8 a, b, pi, x(n), sin, s, t1, t2, t3, rtc

      a = 0
      b = 0
      pi = 3.14159265d0
      s = rtc()
      do i = 1, n
         x(i) = i * sin(i / (pi * 4.0d0))
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
         a = a + x(i) ** (-1)
      enddo
      t2 = rtc() - s
      s = rtc()
      do i = 2, n
         b = b + ((x(i) + a) / (pi * 4.0d0) + 1.0d0)
      enddo
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end

Operation log
– Identifier of automatic conversion: “loop unrolling”
– Parameter: 4
– Context difference file:

*** opeB.F Sat Jul 11 11:36:34 2009
--- opeC2.F Sun Jul 12 13:36:10 2009
***************
*** 19,27 ****
--- 19,29 ----
        enddo
        t2 = rtc() - s
        s = rtc()
+ $BEGIN
        do i = 2, n
           b = b + ((x(i) + a) / fourpi + 1.0d0)
        enddo
+ $END
        t3 = rtc() - s
        write(*,*) 'a=', a, 'b=', b
        write(*,*) 'time=', t1, t2, t3

The context difference file was created against the pseudo change file saved with the operation, that is, the code before the operation with $BEGIN and $END lines inserted around the target loop.
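The trial sequence of Step 1 can be sketched as a progressively relaxed search. The Python code below is a simplified model and not the actual “patch” algorithm: `locate_range` and `match_at` are hypothetical names, and the sketch makes the assumption that only the context lines outside the $BEGIN/$END markers are compared, so that edits inside the marked loop do not block the match. `lead` and `trail` are the context lines before $BEGIN and after $END, and `start` is the 0-based index where the range began when the operation was saved.

```python
def match_at(lines, block, pos):
    """True if `block` occurs in `lines` starting exactly at index `pos`."""
    return pos >= 0 and lines[pos:pos + len(block)] == block


def locate_range(target, lead, trail, start, max_fuzz=2):
    """Return (trial, first, last_exclusive) for the marked range, or None.

    Trial 1 checks only the saved position; trial 2 ignores line numbers;
    trials 3 and 4 also drop the outermost one or two context lines.
    """
    anywhere = range(len(target) + 1)
    trials = [(1, 0, [start])]
    trials += [(fuzz + 2, fuzz, anywhere) for fuzz in range(max_fuzz + 1)]
    for trial, fuzz, candidates in trials:
        lead_f = lead[fuzz:]                  # drop outermost leading lines
        trail_f = trail[:len(trail) - fuzz]   # drop outermost trailing lines
        if not lead_f or not trail_f:
            continue                          # no context left to compare
        for first in candidates:
            if not match_at(target, lead_f, first - len(lead_f)):
                continue
            for last in range(first, len(target) + 1):
                if match_at(target, trail_f, last):
                    return trial, first, last
    return None


# Excerpt of the target program, where optimization A has been undone.
target = [
    "      do i = 1, n",
    "         a = a + 1.0d0 / x(i)",
    "      enddo",
    "      t2 = rtc() - s",
    "      s = rtc()",
    "      do i = 2, n",
    "         b = b + ((x(i) + a) / (pi * 4.0d0) + 1.0d0)",
    "      enddo",
    "      t3 = rtc() - s",
    "      write(*,*) 'a=', a, 'b=', b",
    "      write(*,*) 'time=', t1, t2, t3",
]
lead = ["      enddo", "      t2 = rtc() - s", "      s = rtc()"]
trail = ["      t3 = rtc() - s", "      write(*,*) 'a=', a, 'b=', b",
         "      write(*,*) 'time=', t1, t2, t3"]

# The range started at index 6 when saved (one line later, because the
# saved file still contained the "fourpi = ..." line).
found = locate_range(target, lead, trail, start=6)
```

With this excerpt the first trial fails because the range has moved up by one line, and the second trial finds it, which mirrors the Not Match / Match outcome listed for Trial 1 and Trial 2.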
Scenario of Proposed Algorithm to Apply Automatic Operation (Step 2)
Algorithm for applying operation history on modified target code
Step 2: Redo the automatic conversion with its parameter saved in the operation log.

Operation log
– Identifier of automatic conversion: “loop unrolling”
– Parameter: 4
– Context difference file: the difference between opeB.F and the pseudo change file opeC2.F (hunk 19,27 to 19,29), which inserts the $BEGIN and $END lines around the loop “do i = 2, n”.

Redo “loop unrolling” with parameter “4” on the loop marked by $BEGIN and $END in the pseudo change file.
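Step 2 can be illustrated with a toy conversion that is run again on whatever code now sits between the markers. The Python sketch below is only an illustration: `unroll_marked_loop` is a hypothetical stand-in for the tool's real loop-unrolling conversion, which works on a parsed program rather than on text. It assumes a single un-nested `do` loop with unit stride whose index appears only as a bare subscript and, like the slide example, it emits no remainder loop.

```python
import re

BEGIN, END = "$BEGIN", "$END"
DO_HEADER = re.compile(r"^(\s*)do\s+(\w+)\s*=\s*([^,]+),\s*([^,]+)$")


def unroll_marked_loop(lines, factor):
    """Unroll the simple do loop between $BEGIN and $END by `factor`."""
    first = lines.index(BEGIN)
    last = lines.index(END)
    header = lines[first + 1]
    body = lines[first + 2:last - 1]
    match = DO_HEADER.match(header)
    if match is None or lines[last - 1].strip() != "enddo":
        raise ValueError("marked range is not a simple do loop")
    indent, var, lower, upper = match.groups()
    index = re.compile(rf"\b{re.escape(var)}\b")  # the loop index as a whole word
    unrolled = [f"{indent}do {var} = {lower.strip()}, {upper.strip()}, {factor}"]
    for k in range(factor):
        offset = var if k == 0 else f"{var}+{k}"
        unrolled += [index.sub(offset, line) for line in body]
    unrolled.append(lines[last - 1])
    # The marker lines are dropped from the result.
    return lines[:first] + unrolled + lines[last + 1:]


pseudo = [
    "      s = rtc()",
    "$BEGIN",
    "      do i = 2, n",
    "         b = b + ((x(i) + a) / (pi * 4.0d0) + 1.0d0)",
    "      enddo",
    "$END",
    "      t3 = rtc() - s",
]
result = unroll_marked_loop(pseudo, 4)
```

Because the conversion reads the current loop body, every unrolled copy uses `(pi * 4.0d0)`; no text from the earlier unrolling result is reused.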
Proposed Algorithm to Apply Automatic Operation (Final Results)

      program sample
      implicit none
      integer i, n
      parameter(n=10000000)
      real*8 a, b, pi, x(n), sin, s, t1, t2, t3, rtc

      a = 0
      b = 0
      pi = 3.14159265d0
      s = rtc()
      do i = 1, n
         x(i) = i * sin(i / (pi * 4.0d0))
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
         a = a + 1.0d0 / x(i)
      enddo
      t2 = rtc() - s
      s = rtc()
      do i = 2, n, 4
         b = b + ((x(i) + a) / (pi * 4.0d0) + 1.0d0)
         b = b + ((x(i+1) + a) / (pi * 4.0d0) + 1.0d0)
         b = b + ((x(i+2) + a) / (pi * 4.0d0) + 1.0d0)
         b = b + ((x(i+3) + a) / (pi * 4.0d0) + 1.0d0)
      enddo
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end

Problem solved. The correct line is unrolled!!
The proposed system can reapply automatic conversion operations correctly.
Prototype Implementation of the Proposed System
Implemented as an Eclipse plug-in module
– Works with the open source CDT/Photran modules
– Uses CDT/Photran’s C/Fortran parser
[Figure: architecture diagram. Components: Eclipse; Photran module (Fortran, open source); CDT module (C, open source); HPC refactoring module; pre-defined transformation rules; user-defined transformation rules.]
Proposal of user interface for operation history management system
[Figure: screenshot of the prototype. Panels: source code tree view; source code view; operation history view; information and console output view.]
Operation history view
1. The operation history is displayed as a sequence, and the user can select any point in it and modify the source code at that point.
2. The succeeding operations are automatically reapplied as needed to produce a new version according to the user’s instructions.
3. Operations are classified into the following three categories according to the status and necessity of reapplication, and are displayed in three colors.
– Green: applied.
– Yellow: application not yet attempted.
– Red: application attempted, but failed.
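The replay behaviour and the three colors can be modelled with a short sketch. The Python code below is hypothetical and is not the prototype's implementation: `reapply_history`, `replace_line`, and the `disabled` parameter are names invented for this example, and the policy of stopping at the first failure (leaving later operations yellow) is an assumption.

```python
GREEN, YELLOW, RED = "green", "yellow", "red"


def replace_line(old, new):
    """Build a toy operation that rewrites one exact line, or fails."""
    def apply(code):
        if old not in code:
            raise ValueError(f"line not found: {old!r}")
        return [new if line == old else line for line in code]
    return apply


def reapply_history(original, operations, disabled=()):
    """Replay (name, function) operations on `original`.

    Operations named in `disabled` (undone by the user) are skipped.
    Returns the new code and a color per operation name.
    """
    code = list(original)
    status = {}
    failed = False
    for name, apply in operations:
        if name in disabled or failed:
            status[name] = YELLOW      # not attempted
            continue
        try:
            code = apply(code)
            status[name] = GREEN       # applied
        except ValueError:
            status[name] = RED         # attempted, but failed
            failed = True
    return code, status


original = ["x = r ** (-1)", "y = x * 2"]
history = [
    ("A", replace_line("y = x * 2", "y = x + x")),
    ("B", replace_line("x = r ** (-1)", "x = 1 / r")),
    ("C", replace_line("y = x + x", "y = 2 * x")),   # depends on A's result
]
full_code, full_status = reapply_history(original, history)
undo_code, undo_status = reapply_history(original, history, disabled=("A",))
```

In the second replay operation C turns red because it was recorded against the output of A, which is the situation the operation history view is meant to surface to the user.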
Conclusion
1. We proposed an algorithm for managing the operation history of source-to-source optimization.
2. We presented a prototype system with a new user interface for managing the operation history explicitly.