NCSU 2/24/06 1
Array Dependence Analysis with the Chains of Recurrences Framework for Loop Optimization
Robert van Engelen
Florida State University
Also thanks to J. Birch, Y. Shou, and K. Gallivan
Outline
- Motivation
- Restructuring compilers
- Chains of recurrences algebra and associated algorithms for the GCC and Polaris compilers
- Nonlinear array dependence testing for loop restructuring and vectorization
- Experimental results
- Conclusions
Motivation
Intel CTO: “the increased power requirements of newer chips will lead to CPUs that are hotter than the surface of the sun by 2010”
Enter multi-core CPUs:
- Increase the overall system speed by adding CPU cores
- Speed up multi-threaded applications
- Can effectively lower the power consumption
Enter (more?) multi-media extensions:
- Vector-like instruction sets: MMX, SSE, AltiVec
- Speed up multi-media codes, such as JPEG and MPEG
Code Optimization by Hand or Automatic?
- Rewriting applications by hand to exploit parallelism is doable if:
  - Tasks can be identified that run independently, such as a Web browser’s rendering and communications tasks
  - The parallelism is coarse-grain: tasks must have sufficient work
- Rewriting applications by hand to exploit lots of fine-grain parallelism is not doable:
  - Thousands of read-after-write (RAW), write-after-read (WAR), and write-after-write (WAW) data dependences must be analyzed
Restructuring Compilers
A restructuring compiler typically applies source-code transformations automatically to meet various performance enhancement criteria:
- Exploit parallelism in loops by reordering the loop structure to run loop iterations in parallel
- Find small loops to replace with vector instructions
- Optimize data locality by reordering code to change the memory access order and cache behavior
All code changes are safe as long as RAW, WAR, and WAW data dependences are preserved!
Example: Loop Fission
- Loop fission splits a single loop into multiple loops
- Allows vectorization and parallelization of the new loops when the original loop was sequential
- Loop fission must preserve all dependence relations of the original loop
Original loop nest:
S1 DO I = 1, 10
S2   DO J = 1, 10
S3     A(I,J) = B(I,J) + C(I,J)
S4     D(I,J) = A(I,J-1) * 2.0
S5   ENDDO
S6 ENDDO
Dependence: S3 (=,<) S4

After loop fission:
S1 DO I = 1, 10
S2   DO J = 1, 10
S3     A(I,J) = B(I,J) + C(I,J)
Sx   ENDDO
Sy   DO J = 1, 10
S4     D(I,J) = A(I,J-1) * 2.0
S5   ENDDO
S6 ENDDO
Dependence: S3 (=,<) S4

After vectorization and parallelization:
S1 PARALLEL DO I = 1, 10
S3   A(I,1:10) = B(I,1:10) + C(I,1:10)
S4   D(I,1:10) = A(I,0:9) * 2.0
S6 ENDDO
Dependence: S3 (=,<) S4
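The legality claim can be checked on concrete data. The C sketch below is illustrative only: it assumes 0-based arrays with one extra column, so that column 0 holds the boundary values A(I,0) read at J = 1, and it compares the original nest with the fissioned one. The fissioned loops still execute S3 for (I, J-1) before S4 reads it, so the S3 (=,<) S4 dependence is preserved and both versions should agree.

```c
#include <assert.h>
#include <stdbool.h>

#define N 10

/* Column 0 of A holds the boundary values A(I,0); columns 1..N are computed. */
static void original_nest(double A[N][N + 1], double D[N][N + 1],
                          double B[N][N + 1], double C[N][N + 1]) {
    for (int i = 0; i < N; i++)
        for (int j = 1; j <= N; j++) {
            A[i][j] = B[i][j] + C[i][j];   /* S3 */
            D[i][j] = A[i][j - 1] * 2.0;   /* S4 reads what S3 wrote at j-1 */
        }
}

static void fissioned_nest(double A[N][N + 1], double D[N][N + 1],
                           double B[N][N + 1], double C[N][N + 1]) {
    for (int i = 0; i < N; i++) {
        for (int j = 1; j <= N; j++)
            A[i][j] = B[i][j] + C[i][j];   /* S3 for the whole row first */
        for (int j = 1; j <= N; j++)
            D[i][j] = A[i][j - 1] * 2.0;   /* S4 afterwards */
    }
}

/* Runs both nests on identical inputs and reports whether A and D agree. */
static bool fission_preserves_results(void) {
    double B[N][N + 1], C[N][N + 1];
    double A1[N][N + 1], D1[N][N + 1], A2[N][N + 1], D2[N][N + 1];
    for (int i = 0; i < N; i++)
        for (int j = 0; j <= N; j++) {
            B[i][j] = i + 0.5 * j;
            C[i][j] = i * j;
            A1[i][j] = A2[i][j] = -1.0;    /* same boundary values in both */
            D1[i][j] = D2[i][j] = 0.0;
        }
    original_nest(A1, D1, B, C);
    fissioned_nest(A2, D2, B, C);
    for (int i = 0; i < N; i++)
        for (int j = 0; j <= N; j++)
            if (A1[i][j] != A2[i][j] || D1[i][j] != D2[i][j])
                return false;
    return true;
}
```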
Loop Fission: Algorithm
Compute the acyclic condensation of the dependence graph to find a legal order of the loops
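Original loop:
S1 DO I = 1, 10
S2   A(I) = A(I) + B(I-1)
S3   B(I) = C(I-1)*X + Z
S4   C(I) = 1/B(I)
S5   D(I) = sqrt(C(I))
S6 ENDDO

Dependences:
S3 (<) S2
S4 (<) S3
S3 (=) S4
S4 (=) S5

[Figure: dependence graph over S2, S3, S4, S5 with the edges above, in which S3 and S4 form a cycle; its acyclic condensation has the node {S3, S4} with edges to S2 and to S5.]

After loop fission:
S1 DO I = 1, 10
S3   B(I) = C(I-1)*X + Z
S4   C(I) = 1/B(I)
Sx ENDDO
S2 A(1:10) = A(1:10) + B(0:9)
S5 D(1:10) = sqrt(C(1:10))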
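One standard way to compute the acyclic condensation is Tarjan's strongly-connected-components algorithm; the slide does not say which SCC algorithm is used, so this is a swapped-in textbook technique. The C sketch below hard-codes the four-statement graph above (the vertex numbering S2 = 0, S3 = 1, S4 = 2, S5 = 3 is an assumption of the sketch). Tarjan's algorithm emits components sinks first, so reversing the emission order gives a legal order for the loops produced by fission.

```c
#include <assert.h>

#define MAXV 8

static int nv;                 /* number of statements (vertices) */
static int adj[MAXV][MAXV];    /* adj[u][v] = 1 if there is a dependence u -> v */

static int idx[MAXV], low[MAXV], on_stack[MAXV], stk[MAXV], sp, counter;
static int comp[MAXV];         /* SCC id per vertex, in Tarjan's emission order */
static int ncomp;

static void strongconnect(int v) {
    idx[v] = low[v] = counter++;
    stk[sp++] = v;
    on_stack[v] = 1;
    for (int w = 0; w < nv; w++) {
        if (!adj[v][w]) continue;
        if (idx[w] < 0) {                      /* tree edge: recurse */
            strongconnect(w);
            if (low[w] < low[v]) low[v] = low[w];
        } else if (on_stack[w] && idx[w] < low[v]) {
            low[v] = idx[w];                   /* edge back into the current SCC */
        }
    }
    if (low[v] == idx[v]) {                    /* v roots an SCC: pop it */
        int w;
        do {
            w = stk[--sp];
            on_stack[w] = 0;
            comp[w] = ncomp;
        } while (w != v);
        ncomp++;
    }
}

static void find_sccs(void) {
    sp = counter = ncomp = 0;
    for (int v = 0; v < nv; v++) { idx[v] = -1; on_stack[v] = 0; }
    for (int v = 0; v < nv; v++)
        if (idx[v] < 0) strongconnect(v);
}

/* Position of v's loop after fission (0 = first loop). */
static int loop_position(int v) { return ncomp - 1 - comp[v]; }

/* Builds the slide's graph: S3->S2, S4->S3, S3->S4, S4->S5. */
static void analyze_fission_example(void) {
    enum { S2, S3, S4, S5 };
    nv = 4;
    for (int u = 0; u < MAXV; u++)
        for (int v = 0; v < MAXV; v++) adj[u][v] = 0;
    adj[S3][S2] = 1;   /* S3 (<) S2 */
    adj[S4][S3] = 1;   /* S4 (<) S3 */
    adj[S3][S4] = 1;   /* S3 (=) S4 */
    adj[S4][S5] = 1;   /* S4 (=) S5 */
    find_sccs();
}
```

After `analyze_fission_example()`, S3 and S4 share a component that is placed first, and S2 and S5 each get their own later loop, matching the fissioned code above.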
Example: Loop Interchange
- Changes the loop nesting order
- Allows an outer loop to be vectorized (by moving it innermost) and an inner loop to be parallelized more effectively (by moving it outermost)
- Can be used to improve spatial locality
- Loop interchange must preserve all dependence relations of the original loop
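Original loop nest:
S1 DO I = 1, N
S2   DO J = 1, M
S3     A(I,J) = A(I,J-1) + B(I,J)
S4   ENDDO
S5 ENDDO
Dependence: S3 (=,<) S3

After loop interchange:
S2 DO J = 1, M
S1   DO I = 1, N
S3     A(I,J) = A(I,J-1) + B(I,J)
S4   ENDDO
S5 ENDDO
Dependence: S3 (<,=) S3

After vectorizing the inner I loop:
S2 DO J = 1, M
S3   A(1:N,J) = A(1:N,J-1) + B(1:N,J)
S5 ENDDO
Dependence: S3 (<,=) S3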
Loop Interchange: Algorithm
Compute the direction matrix and find which columns (and therefore which loops) can be permuted without violating dependence relations in the original loop nest
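S1 DO I = 1, N
S2   DO J = 1, M
S3     DO K = 1, L
S4       A(I+1,J+1,K) = A(I,J,K) + A(I,J+1,K+1)
S5     ENDDO
S6   ENDDO
S7 ENDDO

Dependences:
S4 (<,<,=) S4
S4 (<,=,>) S4

Direction matrix (one row per dependence; columns are the loops I, J, K):
  < < =
  < = >

Loop order J, K, I (columns permuted):
  < = <
  = > <
Invalid: in the second row the leftmost non-"=" direction is ">", so the dependence would be reversed.

Loop order J, I, K (I and J interchanged):
  < < =
  = < >
Valid: in every row the leftmost non-"=" direction is "<".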
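The column-permutation check can be sketched in a few lines of C. This is an illustrative sketch, assuming directions are encoded as the characters '<', '=', '>' and that perm[k] names the original loop placed at depth k; a permutation is accepted when every row's leftmost non-'=' entry is still '<'.

```c
#include <assert.h>
#include <stdbool.h>

/* One row per dependence; each row holds the directions '<', '=', '>' for the
   loops in their original order. perm[k] is the original loop placed at depth k. */
static bool permutation_is_legal(int ndeps, int nloops,
                                 const char *dm[], const int perm[]) {
    for (int d = 0; d < ndeps; d++) {
        for (int k = 0; k < nloops; k++) {
            char c = dm[d][perm[k]];
            if (c == '=') continue;            /* not carried at this depth */
            if (c == '>') return false;        /* leftmost non-'=' is '>': reversed */
            break;                             /* leftmost non-'=' is '<': fine */
        }
    }
    return true;
}

/* The slide's direction matrix, columns in loop order I, J, K. */
static const char *example_dm[] = { "<<=", "<=>" };
static const int order_ijk[] = { 0, 1, 2 };
static const int order_jki[] = { 1, 2, 0 };
static const int order_jik[] = { 1, 0, 2 };
```

With these inputs the original order and the J, I, K order are accepted, while J, K, I is rejected, as on the slide.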
Complications
Loop restructuring is complicated by:
- The presence of several induction variables
- Nonlinear and symbolic array index expressions
- The use of pointer arithmetic instead of arrays in C
- Non-unit loop strides and unstructured loops
- Control flow
This calls for loop normalization and preprocessing:
- Apply induction variable substitution
- Convert pointer dereferences to array accesses
- Normalize the loop iteration space
Induction Variable Substitution
Example loop:
  I = 0
  J = 1
  while (I<N)
    I = I+1
    … = A[J]
    J = J+2
    K = 2*I
    A[K] = …
  endwhile

After IV substitution (IVS); note the affine indexes:
  for i=0 to N-1
  S1: … = A[2*i+1]
  S2: A[2*i+2] = …
  endfor

After parallelization:
  forall (i=0,N-1)
    … = A[2*i+1]
    A[2*i+2] = …
  endforall

The GCD test solves the dependence equation $2i_d - 2i_u = -1$. Since gcd(2, 2) = 2 does not divide 1, there is no data dependence.

[Figure: interleaved write (W) and read (R) accesses to A[] at A[2*i+2] and A[2*i+1]; IVS followed by the dependence test]
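The GCD test used above is easy to state in code. For the write A[2*i+2] in iteration i_d and the read A[2*i+1] in iteration i_u, equating subscripts gives 2i_d + 2 = 2i_u + 1, i.e. 2i_d - 2i_u = -1. The C sketch below handles only this single-subscript, two-variable case and ignores loop bounds, so a "may depend" answer is conservative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static long gcd(long a, long b) {
    a = labs(a);
    b = labs(b);
    while (b != 0) {
        long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* GCD test for a1*id + a2*iu = c (loop bounds are ignored).
   Returns false when the equation has no integer solution, which proves
   independence; true only means a dependence cannot be ruled out. */
static bool gcd_test_may_depend(long a1, long a2, long c) {
    long g = gcd(a1, a2);
    if (g == 0) return c == 0;   /* both coefficients zero */
    return c % g == 0;
}
```

For the example, `gcd_test_may_depend(2, -2, -1)` returns false, so the loop can be run as a `forall`.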
IV Recognition on SSA Forms
I1 = 3
M1 = 0
do
  I2 = φ(I1, I3)
  J1 = φ(?, J3)
  K1 = φ(?, K2)
  L1 = φ(?, L2)
  M2 = φ(M1, M3)
  J2 = 3
  I3 = I2+1
  L2 = M2+1
  M3 = L2+2
  J3 = I3+J2
  K2 = 2*J3
while (…)

I2(i) = 3+i
J1(i) = 7+i
L2(i) = 1+3i
K1(i) = 14+2i
M2(i) = 3i

[Figure: spanning tree of the SSA graph]
[Cytron91, Wolfe92]
Symbolic Differencing
do
  x = x+z
  y = z+1
  z = y+1
while (…)

Iteration | x | diff | diff² | y | diff | z | diff
1 | x+z | | | z+1 | | z |
2 | x+2z+2 | z+2 | | z+3 | 2 | z+2 | 2
3 | x+3z+6 | z+4 | 2 | z+5 | 2 | z+4 | 2

Use abstract interpretation to evaluate loop iterations and construct a symbolic difference table of the IV values:
x(i) = x0 + z0·i + (i² - i)
y(i) = z0 + 2i + 1
z(i) = z0 + 2i
[Haghighat95]
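The closed forms read off the difference table can be checked by simply running the loop. The C sketch below is an illustrative check, assuming i counts iterations from 0, that x and z are sampled at the start of iteration i, and that y is the value assigned during iteration i.

```c
#include <assert.h>
#include <stdbool.h>

/* Runs  do x = x+z; y = z+1; z = y+1 while (...)  for n iterations and compares
   the variables with x(i) = x0 + z0*i + (i*i - i), y(i) = z0 + 2i + 1,
   z(i) = z0 + 2i. */
static bool closed_forms_hold(long x0, long z0, int n) {
    long x = x0, y = 0, z = z0;
    for (long i = 0; i < n; i++) {
        if (x != x0 + z0 * i + (i * i - i)) return false;   /* start of iteration i */
        if (z != z0 + 2 * i) return false;
        x = x + z;
        y = z + 1;
        z = y + 1;
        if (y != z0 + 2 * i + 1) return false;             /* y assigned in iteration i */
    }
    return true;
}
```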
Pointer-to-Array Conversion
f += 2;
lsp += 2;
for (i = 2; i <= 5; i++) {
  *f = f[-2];
  for (j = 1; j < i; j++, f--)
    *f += f[-2]-2*(*lsp)*f[-1];
  *f -= 2*(*lsp);
  f += i;
  lsp += 2;
}
Lsp_az speech codec segment from ETSI, with pointer updates.

for (i = 0; i <= 3; i++) {
  f[i+2] = f[i];
  for (j = 0; j <= i; j++)
    f[i-j+2] += f[i-j]-2*lsp[2*i+2]*f[i-j+1];
  f[1] -= 2*lsp[2*i+2];
}
Lsp_az speech codec segment after pointer-to-array conversion. Note that all array index expressions are affine.
[vanEngelen01, Franke01]
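The two Lsp_az forms should compute the same result. The C sketch below runs both on the same inputs and compares them; it assumes integer (`long long`) elements as a stand-in for the codec's fixed-point data, and the input values are arbitrary. Tracing the pointer form shows why the conversion works: the inner loop moves f back by i-1 and `f += i` moves it forward by i, so f advances by exactly one element per outer iteration.

```c
#include <assert.h>
#include <stdbool.h>

typedef long long elem;   /* integer stand-in for the codec's data type */

/* Pointer-update form (f and lsp point at the array bases on entry). */
static void lsp_az_pointer(elem *f, const elem *lsp) {
    f += 2;
    lsp += 2;
    for (int i = 2; i <= 5; i++) {
        *f = f[-2];
        for (int j = 1; j < i; j++, f--)
            *f += f[-2] - 2 * (*lsp) * f[-1];
        *f -= 2 * (*lsp);
        f += i;                 /* net effect per outer iteration: f advances by 1 */
        lsp += 2;
    }
}

/* Form after pointer-to-array conversion: all subscripts are affine in i and j. */
static void lsp_az_array(elem *f, const elem *lsp) {
    for (int i = 0; i <= 3; i++) {
        f[i + 2] = f[i];
        for (int j = 0; j <= i; j++)
            f[i - j + 2] += f[i - j] - 2 * lsp[2 * i + 2] * f[i - j + 1];
        f[1] -= 2 * lsp[2 * i + 2];
    }
}

/* Runs both forms on the same inputs; f needs 6 and lsp 10 elements. */
static bool conversions_agree(void) {
    elem lsp[10], f1[6], f2[6];
    for (int k = 0; k < 10; k++) lsp[k] = (k % 3) - 1;
    for (int k = 0; k < 6; k++) f1[k] = f2[k] = k + 1;
    lsp_az_pointer(f1, lsp);
    lsp_az_array(f2, lsp);
    for (int k = 0; k < 6; k++)
        if (f1[k] != f2[k]) return false;
    return true;
}
```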
Control-Flow Issues
Conditional array accesses and conditionally updated induction variables present problems:

do {
  K = 3;
  K = K+J;
  if (…) J = K;
  else J = J+3;
  A[J] = …
} while (J<N)
Extensive analysis reveals that J := J+3

DO I=1,10
  IF (…) THEN
    J = J+2
  ELSE
    J = I
  ENDIF
  A(J) = …
ENDDO
Problem: J has no single recurrence form

for (…) {
  if (…) A[I] = …
  else … = A[J]
}
Assume RAW and WAR dependences
Chains of Recurrences for Compiler Optimization
Chains of recurrence forms and algebra can be used to:
- Detect (non)linear coupled IVs
- Analyze pointer arithmetic
- Effectively handle control flow
- Implement array dependence testing
Chains of Recurrences
A chain of recurrences (CR) represents a polynomial function, an exponential function, or a mix of both, evaluated over a unit-distance grid [Zima92]
Basic form: {init, ⊙, stride}, where ⊙ is + or *

Iteration | {init, ⊙, stride} | f(i) = 2i+1 = {1, +, 2} | f(i) = 2^i = {1, *, 2}
i = 0 | init | 1 | 1
i = 1 | init ⊙ stride | 3 | 2
i = 2 | init ⊙ stride ⊙ stride | 5 | 4
i = 3 | init ⊙ stride ⊙ stride ⊙ stride | 7 | 8
Chains of Recurrences: General Formulation
The key idea is to represent a non-constant CR stride in CR form itself, thereby forming a chain of recurrences.
Example: f(i) = i² = {0, +, s(i-1)} = {0, +, 1, +, 2}, where s(i) = {1, +, 2}

Iteration | {init, ⊙, s(i-1)} | s(i) = {1, +, 2} | f(i) = {0, +, s(i-1)}
i = 0 | init | 1 | 0
i = 1 | init ⊙ s(0) | 3 | 1
i = 2 | init ⊙ s(0) ⊙ s(1) | 5 | 4
i = 3 | init ⊙ s(0) ⊙ s(1) ⊙ s(2) | 7 | 9
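A CR can be evaluated on the grid by treating its coefficients as coupled induction variables. The C sketch below is an illustrative evaluator, assuming integer coefficients and a fixed maximum chain length; at each step every coefficient c[m] is updated with the old value of c[m+1], from left to right, which reproduces the tables above.

```c
#include <assert.h>

#define MAXCR 8

/* A chain of recurrences {c[0], op[0], c[1], op[1], ..., c[k]}, op[m] in {'+', '*'}. */
typedef struct {
    int k;                 /* number of operators (k+1 coefficients) */
    long c[MAXCR];
    char op[MAXCR];
} CR;

/* Value of the CR at grid point i. Updating left to right means c[m] always
   sees the old value of c[m+1], exactly like coupled induction variables. */
static long cr_at(CR cr, int i) {
    for (int t = 0; t < i; t++)
        for (int m = 0; m < cr.k; m++) {
            if (cr.op[m] == '+') cr.c[m] += cr.c[m + 1];
            else                 cr.c[m] *= cr.c[m + 1];
        }
    return cr.c[0];
}
```

For instance, `cr_at((CR){2, {0, 1, 2}, {'+', '+'}}, 3)` evaluates {0, +, 1, +, 2} at i = 3 and yields 9.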
CRs for Expediting Function Evaluations on Grids
Suppose f(i) = a + b·i + c·i² = {a, +, {b+c, +, 2c}}. We have two IVs x and y:
f(i) = x = {x0, +, y} with x0 = a
s(i) = y = {y0, +, 2c} with y0 = b+c
Implement a loop that updates x and y for efficient evaluation of f(i) over a unit-distance grid i = 0, …, n:
x = a
y = b+c
for i=0 to n
  f[i] = x
  x = x+y
  y = y+2*c
endfor
[Plot: s(i) versus iteration, for iterations 1 to 10]
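The loop above can be written directly in C and compared against the polynomial. This sketch assumes integer coefficients a, b, c and a caller-provided output buffer; it uses two additions per grid point instead of evaluating a + b·i + c·i² each time.

```c
#include <assert.h>
#include <stdbool.h>

/* Evaluates f(i) = a + b*i + c*i*i for i = 0..n with x = {a, +, y} and
   y = {b+c, +, 2c}: two additions per grid point, no multiplications by i. */
static void eval_quadratic(long a, long b, long c, int n, long f[]) {
    long x = a, y = b + c;
    for (int i = 0; i <= n; i++) {
        f[i] = x;
        x = x + y;
        y = y + 2 * c;
    }
}

/* Compares the CR-based evaluation with the direct formula (n must be < 64). */
static bool matches_direct(long a, long b, long c, int n) {
    long f[64];
    if (n < 0 || n >= 64) return false;
    eval_quadratic(a, b, c, n, f);
    for (long i = 0; i <= n; i++)
        if (f[i] != a + b * i + c * i * i) return false;
    return true;
}
```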
Multi-Dimensional Example
Let f(i,j) = i² + i·j + 1
1. Create IV k for f(i,j) in the j-loop: f(i,j) = k_j = {p_i, +, r_i}_j with p_i = i² + 1 and r_i = i
2. Create IVs for p_i and r_i in the i-loop: p_i = {p_0, +, q_i}_i with p_0 = 1; q_i = {q_0, +, 2}_i with q_0 = 1; r_i = {r_0, +, 1}_i with r_0 = 0
3. Implement k, p, q, and r in the i-j loop nest:
p = 1
q = 1
r = 0
for i = 0 to n
  k = p
  for j = 0 to m
    f[i,j] = k
    k = k+r
  endfor
  p = p+q
  q = q+2
  r = r+1
endfor
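A direct C transcription of the loop nest makes it easy to confirm that only additions are needed. In this sketch the grid sizes NI and NJ are arbitrary assumptions (they play the role of n+1 and m+1).

```c
#include <assert.h>
#include <stdbool.h>

#define NI 12   /* i = 0..NI-1, i.e. n = NI-1 */
#define NJ 9    /* j = 0..NJ-1, i.e. m = NJ-1 */

/* Fills f[i][j] = i*i + i*j + 1 using only additions: k walks the j-loop,
   while p, q, r carry the i-loop recurrences. */
static void fill(long f[NI][NJ]) {
    long p = 1, q = 1, r = 0;
    for (int i = 0; i < NI; i++) {
        long k = p;
        for (int j = 0; j < NJ; j++) {
            f[i][j] = k;
            k = k + r;
        }
        p = p + q;
        q = q + 2;
        r = r + 1;
    }
}

static bool fill_is_correct(void) {
    long f[NI][NJ];
    fill(f);
    for (long i = 0; i < NI; i++)
        for (long j = 0; j < NJ; j++)
            if (f[i][j] != i * i + i * j + 1) return false;
    return true;
}
```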
CR Construction with the CR Algebra
To construct the CR form of a symbolic function f(i):
1. Replace i with the CR {0, +, 1}
2. Apply the CR algebra rewrite rules (selected rules shown):
   {x, +, y} + c ⇒ {x+c, +, y}
   c·{x, +, y} ⇒ {c·x, +, c·y}
   {x, +, y} + {u, +, v} ⇒ {x+u, +, y+v}
   {x, +, y} * {u, +, v} ⇒ {x·u, +, y·{u, +, v} + v·{x, +, y} + y·v}
Example: f(i) = c·(i+a) = c·({0, +, 1}+a) = c·{a, +, 1} = {c·a, +, c}
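The rewrite rules can be prototyped for a restricted case. The C sketch below assumes polynomial CRs with constant integer coefficients only (the slide's rules also cover symbolic and nested strides); it implements the +c, scaling, addition, and multiplication rules, the last one recursively on the strides.

```c
#include <assert.h>

#define MAXC 8

/* Polynomial CR {c[0], +, c[1], +, ..., +, c[d]} with constant coefficients. */
typedef struct { int d; long c[MAXC]; } PCR;

static PCR pcr_const(long v) { PCR r = {0, {v}}; return r; }

/* The stride of {c0, +, c1, ...} as a CR: {c1, +, ...}; a constant has stride 0. */
static PCR pcr_tail(PCR a) {
    if (a.d == 0) return pcr_const(0);
    PCR r = {a.d - 1, {0}};
    for (int m = 1; m <= a.d; m++) r.c[m - 1] = a.c[m];
    return r;
}

/* {x,+,y} + {u,+,v} => {x+u, +, y+v}, coefficient-wise (also covers + c). */
static PCR pcr_add(PCR a, PCR b) {
    PCR r = {a.d > b.d ? a.d : b.d, {0}};
    for (int m = 0; m <= r.d; m++)
        r.c[m] = (m <= a.d ? a.c[m] : 0) + (m <= b.d ? b.c[m] : 0);
    return r;
}

/* c{x,+,y} => {c*x, +, c*y} */
static PCR pcr_scale(long k, PCR a) {
    for (int m = 0; m <= a.d; m++) a.c[m] *= k;
    return a;
}

/* {x,+,y} * {u,+,v} => {x*u, +, y*{u,+,v} + v*{x,+,y} + y*v}.
   Assumes a.d + b.d < MAXC. */
static PCR pcr_mul(PCR a, PCR b) {
    if (a.d == 0) return pcr_scale(a.c[0], b);
    if (b.d == 0) return pcr_scale(b.c[0], a);
    PCR y = pcr_tail(a), v = pcr_tail(b);
    PCR step = pcr_add(pcr_add(pcr_mul(y, b), pcr_mul(v, a)), pcr_mul(y, v));
    PCR r = {step.d + 1, {0}};
    r.c[0] = a.c[0] * b.c[0];
    for (int m = 0; m <= step.d; m++) r.c[m + 1] = step.c[m];
    return r;
}

/* Value at grid point i, by running the recurrence i times. */
static long pcr_at(PCR a, int i) {
    for (int t = 0; t < i; t++)
        for (int m = 0; m < a.d; m++) a.c[m] += a.c[m + 1];
    return a.c[0];
}
```

For example, multiplying {0, +, 1} by itself yields {0, +, 1, +, 2}, the CR of i², and scaling {0, +, 1} + 5 by 3 yields {15, +, 3}, matching the c·(i+a) example with c = 3 and a = 5.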
Loop Analysis with CR Forms
The basic idea:
- Scan the loop to detect IV updates
- Construct the CR form for each IV using the CR algebra
do
  J = J+I
  I = I+3
  P = 2*P
while (…)
I = {I0, +, 3}
J = {J0, +, I} = {J0, +, {I0, +, 3}}
P = {P0, *, 2}
[vanEngelen01]
Algorithm 1: Find Recurrences
Input: Loop L with live variable information
Output: Set S of recurrence relations of IVs
1. Start with the set S = {⟨v, v⟩ | v is live at the loop header}
2. Search L from bottom to top: for each assignment v = x of expression x to scalar variable v, update the tuples ⟨u, y⟩ in S by replacing v in y with x

Loop L (step number in parentheses):
do
  M = 2        (step 5)
  L = J-H      (step 4)
  J = L+M      (step 3)
  K = K+M*I    (step 2)
  I = I+1      (step 1)
while (…)

Changes to S = {⟨H, H⟩, ⟨I, I⟩, ⟨J, J⟩, ⟨K, K⟩}:
S1 = {⟨H, H⟩, ⟨I, I+1⟩, ⟨J, J⟩, ⟨K, K⟩}
S2 = {⟨H, H⟩, ⟨I, I+1⟩, ⟨J, J⟩, ⟨K, K+M*I⟩}
S3 = {⟨H, H⟩, ⟨I, I+1⟩, ⟨J, L+M⟩, ⟨K, K+M*I⟩}
S4 = {⟨H, H⟩, ⟨I, I+1⟩, ⟨J, J-H+M⟩, ⟨K, K+M*I⟩}
S5 = {⟨H, H⟩, ⟨I, I+1⟩, ⟨J, J-H+2⟩, ⟨K, K+2*I⟩}
Algorithm 2: Compute CR Forms
Input: Set S with recurrence relations
Output: CR forms for IVs in S
1. For each relation ⟨v, x⟩ in S:
   - if x is of the form v, then v = v0 (v is loop invariant)
   - if x is of the form v + y, then v = {v0, +, y}
   - if x is of the form v * y, then v = {v0, *, y}
   - if x does not contain v, then v = {v0, #, x} (v is wrap-around)
2. Simplify the CR forms with the CR algebra rewrite rules

Recurrence relation in S | CR form | Simplified CR form
⟨H, H⟩ | H = H0 | H = H0
⟨I, I+1⟩ | I = {I0, +, 1} | I = {I0, +, 1}
⟨J, J-H+2⟩ | J = {J0, +, 2-H} | J = {J0, +, 2-H0}
⟨K, K+2*I⟩ | K = {K0, +, 2*I} | K = {K0, +, 2*I0, +, 2}
Algorithm 3: Solve
Input: CR forms for IVs
Output: Closed-form solutions for IVs (when possible)
1. For each CR form of v, apply the CR inverse algebra, assuming the loop is normalized to i = 0, …, n
2. Certain “exotic” mixed non-polynomial and non-exponential CR forms may not have closed forms

Loop L:
do
  M = 2
  L = J-H
  J = L+M
  K = K+M*I
  I = I+1
while (…)

Simplified CR form | Closed form
J = {J0, +, 2-H0} | J(i) = J0 + (2-H0)*i
K = {K0, +, 2*I0, +, 2} | K(i) = K0 + i² + (2*I0-1)*i
I = {I0, +, 1} | I(i) = I0 + i
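A quick way to gain confidence in the closed forms is to simulate the loop and compare at every iteration. The C sketch below does this for arbitrary initial values, assuming i counts iterations from 0 and the variables are sampled at the start of each iteration.

```c
#include <assert.h>
#include <stdbool.h>

/* Runs  do M = 2; L = J-H; J = L+M; K = K+M*I; I = I+1 while (...)  for n
   iterations and compares J, K, I at the start of iteration i with
   J(i) = J0 + (2-H0)*i, K(i) = K0 + i*i + (2*I0-1)*i, I(i) = I0 + i. */
static bool loop_matches_closed_forms(long H0, long I0, long J0, long K0, int n) {
    long H = H0, I = I0, J = J0, K = K0, L, M;
    for (long i = 0; i < n; i++) {
        if (J != J0 + (2 - H0) * i) return false;
        if (K != K0 + i * i + (2 * I0 - 1) * i) return false;
        if (I != I0 + i) return false;
        M = 2;
        L = J - H;
        J = L + M;
        K = K + M * I;
        I = I + 1;
    }
    return true;
}
```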
Example 1
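Loop L (step number in parentheses):
x = 2
z = 0
do
  A(x) = A(z)
  x = x+z      (step 3)
  y = z+1      (step 2)
  z = y+1      (step 1)
while (z<N)

Changes to S = {⟨x, x⟩, ⟨z, z⟩}:
S1 = {⟨x, x⟩, ⟨z, y+1⟩}
S2 = {⟨x, x⟩, ⟨z, z+2⟩}
S3 = {⟨x, x+z⟩, ⟨z, z+2⟩}

CR forms: x = {x0, +, z}, z = {z0, +, 2}
Closed forms: x(i) = x0 + z0·i + i² - i, z(i) = z0 + 2i

With x0 = 2 and z0 = 0, the loop continues while 2(i+1) < N, so for N ≥ 1 it becomes:
do i=0,(N-1)/2
  A(i*i-i+2) = A(2*i)
end do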
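The closed forms and the trip count of this example can be checked by simulation. The C sketch below replays the original do-while loop, checks the subscripts of A(x) = A(z) against i*i-i+2 and 2*i, and returns the number of iterations executed; for N ≥ 1 that count is (N-1)/2 + 1 with integer division.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulates  x = 2; z = 0; do A(x) = A(z); x = x+z; y = z+1; z = y+1 while (z<N)
   and checks, in each iteration i, that x == i*i-i+2 and z == 2*i.
   Returns the number of iterations executed, or -1 on a mismatch. */
static long example1_trip_count(long N) {
    long x = 2, y, z = 0, i = 0;
    do {
        if (x != i * i - i + 2 || z != 2 * i) return -1;   /* subscripts of A(x) = A(z) */
        x = x + z;
        y = z + 1;
        z = y + 1;
        i++;
    } while (z < N);
    return i;
}
```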
Example 2
TRFD code segment from the Perfect Benchmarks, with IV updates:
DO I=1,M
  DO J=1,I
    ij = ij+1
    ijkl = ijkl+I-J+1
    DO K=I+1,M
      DO L=1,K
        ijkl = ijkl+1
        xijkl[ijkl]=xkl[L]
      ENDDO
    ENDDO
    ijkl = ijkl+ij+left
  ENDDO
ENDDO

TRFD after aggressive induction variable substitution (IVS):
DO I=0,M-1
  DO J=0,I
    DO K=0,M-I-2
      DO L=0,I+K+1
        tmp = ijkl+L+I*(K+(M+M*M+2*left+6)/4)+J*(left+(M+M*M)/2)+((I*I*M*M)+2*(K*K+3*K+I*I*(left+1))+M*I*I)/4+2
        xijkl[tmp] = xkl[L+1]
      ENDDO
    ENDDO
  ENDDO
ENDDO
Example 3 (SSA)
a = 1;
while (a<10) {
  x = a+2;
  a = a+1;
}

SSA form:
    a0 = 1
    if (a0>=10) goto L2
L1: a1 = φ(a0, a2)
    x0 = a1 + 2
    a2 = a1+1
    if (a2<10) goto L1
L2:

[Figure: SSA graph in which a1 = φ(a0, a2) with a0 = 1 and a2 = a1+1, and x0 = a1+2]

a1 = {1,+,1}
GCC 4.x uses our approach applied to SSA form.
Note: GCC developers refer to CRs as “scalar evolutions”
Example 4 (SSA)
x = 0;
i = 1;
while (i<10) {
  x = x+i;
  i = i+1;
}

SSA form:
    x0 = 0
    i0 = 1
    if (i0>=10) goto L2
L1: x1 = φ(x0, x2)
    i1 = φ(i0, i2)
    x2 = x1+i1
    i2 = i1+1
    if (i2<10) goto L1
L2:

[Figure: SSA graph in which i1 = φ(i0, i2) with i0 = 1 and i2 = i1+1, and x1 = φ(x0, x2) with x0 = 0 and x2 = x1+i1]

i1 = {1,+,1}
x1 = {0,+,i1} = {0,+,1,+,1}
Example 5 (SSA)
j = 0;
i = 1;
while (i<10) {
  if (p) j = j+2;
  else j = j+3;
  i = i+1;
}

SSA form:
    j0 = 0
    i0 = 1
    if (i0>=10) goto L2
L1: i1 = φ(i0, i2)
    j1 = φ(j0, j4)
    if (!p) goto L3
    j2 = j1+2
    goto L4
L3: j3 = j1+3
L4: j4 = φ(j2, j3)
    i2 = i1+1
    if (i2<10) goto L1
L2:

[Figure: SSA graph in which j1 = φ(j0, j4) with j0 = 0, j2 = j1+2, j3 = j1+3, and j4 = φ(j2, j3)]

{0,+,2} ≤ j1 ≤ {0,+,3}
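The bounds on the conditionally updated IV can be exercised by simulation. In the C sketch below the branch condition p is redrawn every iteration from a small linear congruential generator, an arbitrary choice made only to exercise both branches (the bounds also hold when p is loop invariant). It checks 2k ≤ j ≤ 3k at the loop header of iteration k, i.e. {0,+,2} ≤ j1 ≤ {0,+,3}.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulates  j = 0; i = 1; while (i<10) { if (p) j = j+2; else j = j+3; i = i+1; }
   with a pseudo-random p, checking {0,+,2} <= j1 <= {0,+,3} at every header. */
static bool j_stays_within_bounds(unsigned seed) {
    int j = 0, i = 1, k = 0;                         /* k counts completed iterations */
    while (i < 10) {
        if (j < 2 * k || j > 3 * k) return false;   /* check at the loop header */
        seed = seed * 1103515245u + 12345u;         /* LCG step */
        bool p = (seed >> 16) & 1u;
        if (p) j = j + 2; else j = j + 3;
        i = i + 1;
        k = k + 1;
    }
    return 2 * k <= j && j <= 3 * k;                /* and at loop exit */
}
```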
Recognizing Mixed Functional Forms and Reductions
Loop L:
I = 1
do
  F = F*I
  I = I+1
while (…)
Simplified CR form: F = {F0, *, 1, +, 1}, I = {1, +, 1}
Factorial: F = F0 * i!

Loop L:
I = 0; S = 0
do
  S = S+A[I]
  I = I+2
while (…)
Simplified CR form: S = {0, +, A[{0, +, 2}]}, I = {0, +, 2}
Reduction: S = ∑ A[2i]
Pointer Access Descriptions of Pointer and Array References
A pointer access description (PAD) [vanEngelen01] is a CR form of a pointer or array reference in a loop nest
PADs are computed with the CR-based IV algorithms
Each loop code entry below is placed in the body of:
short a[…], *p;
int i;
p = a;
for (i=0; …; i++) {
  /* loop code */
}

Loop code | PAD | Sequence
a[i] | {a, +, 1} | a[0], a[1], a[2], a[3]
a[2*i+1] | {a+1, +, 2} | a[1], a[3], a[5], a[7]
a[(i*i-i)/2] | {a, +, 0, +, 1} | a[0], a[0], a[1], a[3]
a[1<<i] | {a+1, +, 1, *, 2} | a[1], a[2], a[4], a[8]
p++ | {a, +, 1} | a[0], a[1], a[2], a[3]
p+=i | {a, +, 0, +, 1} | a[0], a[0], a[1], a[3]
}
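The pointer rows of the table can be replayed directly. The C sketch below uses two pointers, p for the `p++` row and a second, hypothetical pointer q for the `p+=i` row, each accessed before it is updated as the Sequence column assumes, and checks that they visit the elements given by the array form of their PADs.

```c
#include <assert.h>
#include <stdbool.h>

#define LEN 64

/* Replays the two pointer rows of the table for the first n iterations. */
static bool pads_agree(int n) {
    short a[LEN];
    short *p = a;   /* p++  : PAD {a, +, 1},       i.e. a[i]         */
    short *q = a;   /* q+=i : PAD {a, +, 0, +, 1}, i.e. a[(i*i-i)/2] */
    if (n < 0 || n > 11) return false;   /* keeps q inside a[] */
    for (int i = 0; i < n; i++) {
        if (p != &a[i]) return false;
        if (q != &a[(i * i - i) / 2]) return false;
        p++;
        q += i;
    }
    return true;
}
```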
CR-Enhanced Array Dependence Testing
Basic idea:
- Construct dependence equations in CR form for both pointer and array accesses
- Determine the solution intervals by computing the value ranges of the equations in CR form
- If the solution space is empty, there is no dependence
Example
float a[…], *p, *q;
p = a;
q = a+2*n;
for (i=0; i<n; i++) {
  t = *p;
S: *p++ = *q;
  *q-- = t;
}
p = {a, +, 1}
q = {a+2n, +, -1}

Dependence equation: {a, +, 1}_{i_d} = {a+2n, +, -1}_{i_u}
Constraints: 0 ≤ i_d ≤ n-1, 0 ≤ i_u ≤ n-1
Rewrite the dependence equation:
{a, +, 1}_{i_d} - {a+2n, +, -1}_{i_u} = 0 ⇒ {{-2n, +, 1}_{i_u}, +, 1}_{i_d} = 0
Compute the solution interval:
Low[{{-2n, +, 1}_{i_u}, +, 1}_{i_d}] = Low[{-2n, +, 1}_{i_u}] = -2n
Up[{{-2n, +, 1}_{i_u}, +, 1}_{i_d}] = Up[{-2n, +, 1}_{i_u} + n-1] = Up[-2n + 2n - 2] = -2
Since 0 lies outside [-2n, -2], there is no dependence.
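For this linear case the solution interval can be computed mechanically. The C sketch below assumes the dependence function has the form h(i_d, i_u) = c + sd·i_d + su·i_u over 0 ≤ i_d, i_u ≤ n-1 (a restriction of the general CR test); each term is monotonic, so its extremes lie at the endpoints. A brute-force overlap check for the slide's loop is included as a cross-check.

```c
#include <assert.h>
#include <stdbool.h>

/* Value range of h(id, iu) = c + sd*id + su*iu for 0 <= id, iu <= n-1, i.e. of
   the CR {{c, +, su}_iu, +, sd}_id. Each term is monotonic in its index. */
static void linear_cr_range(long c, long sd, long su, long n, long *lo, long *hi) {
    long ed = sd * (n - 1), eu = su * (n - 1);
    *lo = c + (ed < 0 ? ed : 0) + (eu < 0 ? eu : 0);
    *hi = c + (ed > 0 ? ed : 0) + (eu > 0 ? eu : 0);
}

/* Independence is proved when 0 lies outside the solution interval. */
static bool range_proves_independence(long c, long sd, long su, long n) {
    long lo, hi;
    linear_cr_range(c, sd, su, n, &lo, &hi);
    return hi < 0 || lo > 0;
}

/* Brute-force cross-check for the slide's loop: does p = a+id ever equal
   q = a+2n-iu within the iteration space? */
static bool accesses_overlap(long n) {
    for (long id = 0; id < n; id++)
        for (long iu = 0; iu < n; iu++)
            if (id == 2 * n - iu) return true;
    return false;
}
```

With c = -2n and both strides equal to 1, the interval is [-2n, -2] and independence is proved. If q instead started at a+n-1 (c = -(n-1)), the interval would contain 0 and this test could not rule out a dependence.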
Determining the Value Range of a CR Form
Suppose x(i) = {x0, +, s(i-1)} for i = 0, …, n:
- If s(i-1) > 0, then x(i) is monotonically increasing
- If s(i-1) < 0, then x(i) is monotonically decreasing
If a function is monotonic on its domain, its exact value range follows from the endpoint values: [x(0), x(n)] when increasing and [x(n), x(0)] when decreasing.
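For polynomial CRs with constant coefficients, a simple sufficient test for monotonicity is the sign of the stride coefficients. The C sketch below is a non-strict variant of the rule above under that assumption: if every stride coefficient is ≥ 0 (or every one is ≤ 0), the stride never changes sign and the range is given by x(0) and x(n). With mixed signs the sketch gives no answer rather than guessing.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXC 8

/* Polynomial CR {c[0], +, c[1], +, ..., +, c[d]} with constant coefficients. */
typedef struct { int d; long c[MAXC]; } PCR;

/* Value at grid point i, by running the recurrence i times. */
static long pcr_at(PCR a, int i) {
    for (int t = 0; t < i; t++)
        for (int m = 0; m < a.d; m++) a.c[m] += a.c[m + 1];
    return a.c[0];
}

/* If all stride coefficients c[1..d] are >= 0 (resp. <= 0), the stride CR is
   non-negative (resp. non-positive) on the grid, so x(i) is monotonic on
   i = 0..n and its exact range is given by the endpoints. Returns false when
   the coefficients have mixed signs. */
static bool pcr_range(PCR a, int n, long *lo, long *hi) {
    bool nonneg = true, nonpos = true;
    for (int m = 1; m <= a.d; m++) {
        if (a.c[m] < 0) nonneg = false;
        if (a.c[m] > 0) nonpos = false;
    }
    long first = pcr_at(a, 0), last = pcr_at(a, n);
    if (nonneg) { *lo = first; *hi = last; return true; }
    if (nonpos) { *lo = last; *hi = first; return true; }
    return false;
}

static bool range_is(PCR a, int n, long want_lo, long want_hi) {
    long lo, hi;
    return pcr_range(a, n, &lo, &hi) && lo == want_lo && hi == want_hi;
}
```

For example, {0, +, 1, +, 2} (i²) on i = 0, …, 3 has range [0, 9], and {5, +, -1} on i = 0, …, 4 has range [1, 5]; {3, +, -2, +, 1} is not classified even though it stays within [0, 3].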
Example: Nonlinear and Symbolic Dependence Testing
float a[…], *p, *q;
p = q = a;
for (i=0; i<n; i++) {
  for (j=0; j<=i; j++)
    *q += *++p;
  q++;
}
p = {{a+1, +, 1, +, 1}_i, +, 1}_j = a[(i²+i)/2+j+1]
q = {a, +, 1}_i = a[i]
The CR dependence test disproves the flow dependence (<, <).
DO i = 1, M+1
S1: A[I*N+10] = ...
S2: ... = A[2*I+K]
    K = 2*K+N
ENDDO
S1: A[{N+10, +, N}_i]
S2: A[{K0+2, +, K0+N+2, +, K0+N, *, 2}_i]
The CR range test disproves the dependence when K+N > 10 and K > 2.
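The pointer example at the top of this slide can be cross-checked by brute force for small n. The C sketch below first confirms the PAD of p, a[(i²+i)/2+j+1], by replaying the pre-increments, and then searches for a write a[i_d] through q that is read through p in a later outer iteration i_u > i_d. That search covers every (<, *) pair and therefore the (<, <) flow dependence. This is only a bounded check (n ≤ 50 is an assumption of the sketch), not a proof like the CR test.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXN 50

/* Before the read in iteration (i, j), p has been pre-incremented
   (i*i+i)/2 + j + 1 times, so *++p reads a[(i*i+i)/2 + j + 1]. */
static bool pad_of_p_holds(int n) {
    static float a[MAXN * (MAXN + 1) / 2 + 1];
    float *p = a;
    if (n > MAXN) return false;
    for (int i = 0; i < n; i++)
        for (int j = 0; j <= i; j++) {
            ++p;
            if (p != &a[(i * i + i) / 2 + j + 1]) return false;
        }
    return true;
}

/* Is a[id], written through q in outer iteration id, ever read through p
   in a later outer iteration iu > id? */
static bool flow_dep_to_later_i_exists(long n) {
    for (long id = 0; id < n; id++)
        for (long iu = id + 1; iu < n; iu++)
            for (long j = 0; j <= iu; j++)
                if (id == (iu * iu + iu) / 2 + j + 1) return true;
    return false;
}
```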
Results
- Implemented a CR-enhanced trapezoidal Banerjee test:
  - Relatively simple test
  - Enhanced with support for nonlinear forms
  - Enhanced with support for conditional flow
  - Constructs dependence equations in CR form
- Implementation based on the Polaris compiler:
  - Pros: can compare to powerful dependence tests such as the Omega test and the Range test
  - Cons: Fortran only
Additional Independences Filtered over Omega Test
[Bar chart: CR-EVT vs. Omega, percentage of independences (0 to 100%) per benchmark: DYFESM, MDG, OCEAN, QCD, TRFD (Perfect Benchmarks); GEP, NEP, SEP (LAPACK)]
Additional Independences Filtered over Range Test
[Bar chart: CR-EVT vs. Range, percentage of independences (0 to 100%) per benchmark: DYFESM, MDG, OCEAN, QCD, TRFD, GEP, NEP, SEP]
Additional Independences Filtered over Omega+Range
[Bar chart: CR-EVT vs. Omega+Range, percentage of independences (0 to 100%) per benchmark: DYFESM, MDG, OCEAN, QCD, TRFD, GEP, NEP, SEP]
Percentage of Conditional IVs w/o Closed Forms in LAPACK
[Stacked bar chart: conditional IVs vs. other IVs, percentage (0 to 100%) for GEP, NEP, SEP]
Timing Comparison: Perf Bench.
[Bar chart: analysis time in seconds (0 to 10 s) for DYFESM, MDG, OCEAN, QCD, TRFD; series: Range, Omega, CR-EVT, CR-EVT (opt)]
Timing Comparison: LAPACK
[Bar chart: analysis time in seconds (0 to 70 s) for GEP, NEP, SEP; series: Range, Omega, CR-EVT, CR-EVT (opt)]
Conclusions
A CR-based compiler framework has advantages:
- Applicable to CFG, AST, and SSA forms
- Handles conditional flow
- Handles nonlinear and symbolic induction variable expressions
- Allows array and pointer-based dependence testing to be applied directly to the CR forms without induction variable substitution
Future work:
- Improve the GCC implementation
- Enhance other dependence tests with CR forms
Further Reading
- Robert van Engelen, Johnnie Birch, Yixin Shou, Burt Walsh, and Kyle Gallivan, “A Unified Framework for Nonlinear Dependence Testing and Symbolic Analysis”, in Proceedings of the ACM International Conference on Supercomputing (ICS), 2004, pages 106-115.
- Robert van Engelen, Johnnie Birch, and Kyle Gallivan, “Array Dependence Testing with the Chains of Recurrences Algebra”, in Proceedings of the IEEE International Workshop on Innovative Architectures for Future Generation High-Performance Processors and Systems (IWIA), January 2004, pages 70-81.
- Robert van Engelen and Kyle Gallivan, “An Efficient Algorithm for Pointer-to-Array Access Conversion for Compiling and Optimizing DSP Applications”, in Proceedings of the 2001 International Workshop on Innovative Architectures for Future Generation High-Performance Processors and Systems (IWIA), January 2001, pages 80-89.
- Robert van Engelen, “Efficient Symbolic Analysis for Optimizing Compilers”, in Proceedings of the International Conference on Compiler Construction, ETAPS 2001, LNCS 2027, pages 118-132.
The End