Introduction to Polyhedral Compilation
Akihiro Hayashi, Jun Shirako (Rice University)
Outline
- High-level Summary
- Theory
- Compilers and Tools
HIGH-LEVEL SUMMARY
The first priority is "performance"

Parallel Computing
[Figure: supercomputers, personal computers, smartphones, and embedded systems are all parallel machines today. Pictures borrowed from commons.wikimedia.org and www.hirt-japan.info]

Parallel programming is hard…
[Figure: why parallel programming is hard. A multi-core CPU (four cores with SIMD units, per-core L1$, shared L2 caches, an L3 cache, and DRAM) next to a many-core GPU, with the memory hierarchy ranked from DRAM (slowest) down to registers (fastest). The labeled challenges: exploiting SIMD, scheduling tasks on CPUs, optimizing data locality, and utilizing accelerators such as many-core GPUs.]
A gap between domain experts and hardware
[Figure: a stack with the Application Domain (Domain Experts) on top, Programming Languages / Compilers / Runtimes in the middle, and Hardware (Concurrency Experts) at the bottom. Domain experts want to get significant performance improvement easily (performance portability), but it is hard for them to exploit the full capability of the hardware.]

We believe languages and compilers are very important!
A review of literature
- Automatic parallelizing compilers:
  - IBM XL Compilers, Intel Compilers, OSCAR, Pluto, Polly, Polaris, R-Stream, SUIF, …
- Parallel languages:
  - Language-based: Cilk, CUDA, OpenCL, C++AMP, Java, Habanero C/Java, PGAS, …
  - Directive-based: OpenMP, OpenACC, OmpSs, …
  - Library-based: Charm++, TBB, Thrust, RAJA, Kokkos, UPC++, HJLib, …
From the perspective of compilers…
- Compilers are among the most complicated pieces of software:
  - Pointer analysis
  - Scalar optimizations
  - Loop transformations
  - Vectorization/SIMDization
  - Scheduling
  - Exploiting accelerators
  - …
What are compilers doing?

Program:

    x = a + b;
    y = a + b;
    z = x + y;

Parsing turns the program into an intermediate representation (e.g., an AST); optimizations then produce the "optimized" code:

    x = a + b;
    y = x;
    z = x + y;
What are compilers doing?
- A compiler may modify a program (e.g., change the execution order of statements) as long as it maintains the semantics of the program. In the example above, reordering the statements as

    z = x + y;
    x = a + b;

  would change the meaning (z would read x before it is written), so such a reordering is not allowed.
Examples of optimizations: scalar optimizations

Common subexpression elimination (CSE):

    x = a + b;            x = a + b;
    y = a + b;     →      y = x;
    z = x + y;            z = x + y;

Constant propagation:

    a = 0;                a = 0;
    if (a) {       →      if (0) {
      …                     …
    }                     }

Dead code elimination:

    a = 0;
    if (0) {       →      a = 0;
      …
    }
Examples of optimizations: loop permutation (interchange)

    // Unit-stride ("offset") accesses: faster on CPUs
    for (i = 0; i < M; i++) {
      for (j = 0; j < N; j++) {
        b[i][j] = a[i][j];
      }
    }

    // Strided accesses: slower on CPUs
    for (j = 0; j < N; j++) {
      for (i = 0; i < M; i++) {
        b[i][j] = a[i][j];
      }
    }

Interchange swaps the loop order, so it can turn the strided version into the unit-stride one.
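As a quick sanity check, the sketch below runs both loop orders on the same input and confirms they compute the same result (interchange is legal here because the statement carries no loop dependences). The sizes and the helper name `copy_matches` are illustrative choices, not part of the slides.

```c
#include <assert.h>

/* Arbitrary illustrative sizes. */
enum { M = 4, N = 5 };

/* Returns 1 iff both loop orders produce the same copy of a into b. */
static int copy_matches(void) {
    int a[M][N], b1[M][N], b2[M][N];
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = i * N + j;

    /* i outer, j inner: unit-stride accesses in row-major C. */
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            b1[i][j] = a[i][j];

    /* Interchanged, j outer, i inner: stride-N accesses. */
    for (int j = 0; j < N; j++)
        for (int i = 0; i < M; i++)
            b2[i][j] = a[i][j];

    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            if (b1[i][j] != a[i][j] || b2[i][j] != a[i][j])
                return 0;
    return 1;
}
```

Only the memory access pattern differs between the two versions; the set of executed statement instances is identical.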
Examples of optimizations: loop fusion/distribution

    // Fused: better temporal locality on CPUs
    for (i = 0; i < N; i++) {
      a[i] = b[i] + c[i];
      d[i] = a[i] + e[i];
    }

    // Distributed: good for vectorization on CPUs
    for (i = 0; i < N; i++) {
      a[i] = b[i] + c[i];
    }
    for (i = 0; i < N; i++) {
      d[i] = a[i] + e[i];
    }

Which version wins depends on the loop size N.
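Fusion and distribution change performance, not results. A minimal sketch (with an arbitrary size `FN` and a hypothetical helper name) runs both versions and checks they agree:

```c
#include <assert.h>

enum { FN = 8 }; /* arbitrary illustrative size */

/* Returns 1 iff the fused and distributed loop nests compute the same
 * arrays a and d from the same inputs. */
static int fusion_matches(void) {
    int b[FN], c[FN], e[FN];
    int a1[FN], d1[FN], a2[FN], d2[FN];
    for (int i = 0; i < FN; i++) { b[i] = i; c[i] = 2 * i; e[i] = 3 * i; }

    /* Fused version. */
    for (int i = 0; i < FN; i++) {
        a1[i] = b[i] + c[i];
        d1[i] = a1[i] + e[i];
    }

    /* Distributed version. */
    for (int i = 0; i < FN; i++)
        a2[i] = b[i] + c[i];
    for (int i = 0; i < FN; i++)
        d2[i] = a2[i] + e[i];

    for (int i = 0; i < FN; i++)
        if (a1[i] != a2[i] || d1[i] != d2[i])
            return 0;
    return 1;
}
```

The fusion is legal because d[i] only reads the a[i] produced in the same iteration i.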
The phase-ordering problem
- Which order is better?

Starting point:

    a = 0;
    if (a) {
      …
    }

Order 1, dead code elimination first: DCE eliminates nothing (the branch is not yet known to be dead), and constant propagation afterwards only gets as far as:

    a = 0;
    if (0) {
      …
    }

Order 2, constant propagation first: the condition is rewritten to if (0), and dead code elimination can then remove the whole branch:

    a = 0;
AST vs. The Polyhedral Model

AST route: a program such as

    x = a + b;
    y = a + b;
    z = x + y;

is parsed into an AST, transformed, and emitted as "optimized" code.

Polyhedral route (TODAY): the program is represented as a polyhedron (affine inequalities such as i >= 0; i < N; …), transformed in that representation, and "synthesized" back into code.
Why the Polyhedral Model?
- One solution for tackling the phase-ordering problem
- Good for performing a set of loop transformations:
  - Loop permutation
  - Loop fusion/distribution
  - Loop tiling
  - …

"The Polyhedral Model is a convenient alternative representation which combines analysis power, expressiveness and high flexibility" (OpenScop Specification and Library)
THEORY
The polyhedral model in a nutshell
- The polyhedral transformation = "scheduling" (determining the execution order of statements)
- 3 important things:
  - Domain: the set of instances of a statement
  - Scattering (scheduling): maps an instance to a time stamp
  - Access: maps an instance to array element(s)
- Limitation: in general, only applicable to a Static Control Part (SCoP)
  - Loop bounds and conditionals must be affine functions of the surrounding loop iterators

[Figure: the overall flow. A program such as

    for (i=1; …) {
      S1;
      for (j=1; …)
        S2;
    }

becomes a system of affine inequalities (constraints such as 1 ≤ i_S1 ≤ 2; 1 ≤ i_S2 ≤ 2; 1 ≤ j_S2 ≤ 3; i_S1 = i_S2, plus legality constraints of the form \(C_i - C_j \ge 0, \ldots\)) together with a cost function \(\delta_e(\vec{s},\vec{t}) = \phi_{S_j}(\vec{t}) - \phi_{S_i}(\vec{s})\); an ILP solver picks a schedule, from which "synthesized" code is generated.]
Representation of "Domain"

    for (i=1; i <= 5; i++)
      for (j=1; j <= 6; j++)
        S1;

- Observations:
  - S1 is executed 30 times (30 instances)
  - Each instance is associated with a pair (i, j)

"The key aspect of the polyhedral model is to consider statement instances." (OpenScop Specification and Library)
Iteration Domain
- A set of constraints representing the instances of a statement
  - Using iteration vectors (i, j)
  - If those constraints are affine, the set is a polyhedron

    for (i=1; i <= 5; i++)
      for (j=1; j <= 6; j++)
        S1;

The constraints 1 ≤ i ≤ 5, 1 ≤ j ≤ 6 in matrix form:

\[
D_{S1} = \begin{pmatrix} 1 & 0 & -1 \\ -1 & 0 & 5 \\ 0 & 1 & -1 \\ 0 & -1 & 6 \end{pmatrix} \begin{pmatrix} i \\ j \\ 1 \end{pmatrix} \ge 0
\]

[Figure: the 5 x 6 grid of integer points of D_S1. Credits: Clint (https://www.ozinenko.com/clint)]
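The 30 instances counted earlier can be recovered directly from the matrix form of D_S1. The sketch below (with a hypothetical helper name `domain_size`) tests the four affine inequalities row by row against the vector (i, j, 1) over a bounding box:

```c
#include <assert.h>

/* Counts the integer points of D_S1 by evaluating each row of the
 * constraint matrix against (i, j, 1) and requiring a nonnegative result. */
static int domain_size(void) {
    static const int A[4][3] = {
        { 1,  0, -1},   /*  i - 1 >= 0  (1 <= i) */
        {-1,  0,  5},   /* -i + 5 >= 0  (i <= 5) */
        { 0,  1, -1},   /*  j - 1 >= 0  (1 <= j) */
        { 0, -1,  6},   /* -j + 6 >= 0  (j <= 6) */
    };
    int count = 0;
    for (int i = -10; i <= 10; i++)       /* bounding box, large enough */
        for (int j = -10; j <= 10; j++) {
            int inside = 1;
            for (int r = 0; r < 4; r++)
                if (A[r][0] * i + A[r][1] * j + A[r][2] < 0)
                    inside = 0;
            count += inside;
        }
    return count;
}
```

The count matches the 30 statement instances observed on the previous slide.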
Representation of "Scheduling": 1-dimensional schedules
- The function T returns the logical date (logical time) of each statement

    x = a + b; // S1   T_S1 = 0;   executes at T=0
    y = a + b; // S2   T_S2 = 1;   T=1
    z = x + y; // S3   T_S3 = 2;   T=2
Representation of "Scheduling": multi-dimensional schedules
- The function T returns the logical date of each statement
- Logical dates may be multi-dimensional (cf. clocks: days, hours, minutes, seconds)
- Lexicographical order: T_S1 ≺ T_S2 ≺ T_S3 ⇔ (0) ≺ (1, i) ≺ (2)

    x = a + b;                // S1   T_S1 = (0);        executes at T=0
    for (i = 0; i < 2; i++) {
      a[i] = x;               // S2   T_S2(0) = (1, 0);  T=1, i=0
    }                         //      T_S2(1) = (1, 1);  T=1, i=1
    z = x + y;                // S3   T_S3 = (2);        T=2
Representation of "Scheduling": multi-dimensional schedules
- The per-instance dates of S2 can be written in parameterized form:

    T_S1 = (0);
    T_S2(i) = (1, i);
    T_S3 = (2);

- Recall the iteration domain: 0 ≤ i < 2 supplies the valid values of i
Representation of "Scheduling": multi-dimensional schedules

    x = a + b;                  // S1   T_S1 = (0);
    for (i = 0; i < 2; i++) {
      a[i] = x;                 // S2   T_S2(i) = (1, i);
    }
    for (i = 0; i < 2; i++) {
      for (j = 0; j < 3; j++) {
        b[i][j] += a[i];        // S3   T_S3(i, j) = (2, i, j);
      }
    }
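Comparing logical dates lexicographically is straightforward to implement. The sketch below uses a hypothetical helper `lex_lt`, with the added convention (an assumption, not from the slides) that a proper prefix precedes any longer date:

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 iff date a (length na) lexicographically precedes date b
 * (length nb): the first differing component decides; if one date is a
 * proper prefix of the other, the shorter one comes first. */
static int lex_lt(const int *a, size_t na, const int *b, size_t nb) {
    size_t n = na < nb ? na : nb;
    for (size_t k = 0; k < n; k++) {
        if (a[k] < b[k]) return 1;
        if (a[k] > b[k]) return 0;
    }
    return na < nb;
}
```

For the example above, (0) ≺ (1,0) ≺ (1,1) ≺ (2, i, j), so S1 runs first, then both instances of S2 in order of i, then S3.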
Loop transformations with schedules

    for (i = 0; i < 2; i++) {
      for (j = 0; j < 3; j++) {
        b[i][j] = ...; // S1
      }
    }

Original schedule: T_S1(i, j) = (i, j);
New schedule:      T_S1(i, j) = (i, j);

A transformation is a matrix applied to the iteration vector; here it is the identity, so the code is unchanged:

\[
T_{S1}(i,j) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} i \\ j \end{pmatrix}
\]
Loop transformations with schedules: Loop Reversal

Original, T_S1(i, j) = (i, j):

    for (i = 0; i < 2; i++) {
      for (j = 0; j < 3; j++) {
        b[i][j] = ...; // S1
      }
    }

New, T_S1(i, j) = (-i, j):

    for (i = -1; i <= 0; i++) {
      for (j = 0; j < 3; j++) {
        b[-i][j] = ...; // S1
      }
    }

\[
T_{S1}(i,j) = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} -i \\ j \end{pmatrix},
\qquad i_{new} = -i_{old},\ i_{old} \to -i_{new}
\]
Loop transformations with schedules: Loop Permutation

Original, T_S1(i, j) = (i, j):

    for (i = 0; i < 2; i++) {
      for (j = 0; j < 3; j++) {
        b[i][j] = ...; // S1
      }
    }

New, T_S1(i, j) = (j, i):

    for (j = 0; j < 3; j++) {
      for (i = 0; i < 2; i++) {
        b[i][j] = ...; // S1
      }
    }

\[
T_{S1}(i,j) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} j \\ i \end{pmatrix}
\]
Loop transformations with schedules: Loop Skewing

Original, T_S1(i, j) = (i, j):

    for (i = 1; i <= 5; i++) {
      for (j = 1; j <= 5; j++) {
        a[i][j] = a[i-1][j+1]; // S1
      }
    }

New, T_S1(i, j) = (i, i+j):

    for (i = 1; i <= 5; i++) {
      for (j = i+1; j <= i+5; j++) {
        a[i][j-i] = a[i-1][j-i+1]; // S1
      }
    }

\[
T_{S1}(i,j) = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} i \\ i+j \end{pmatrix},
\qquad j_{new} = i + j_{old},\ j_{old} \to j_{new} - i
\]
Loop transformations with schedules: Loop Skewing (cont'd)

Execution order, original:
    (i, j) = (1,1); (1,2); (1,3); (1,4); (1,5); (2,1); (2,2); (2,3); (2,4); (2,5); (3,1); (3,2); (3,3); (3,4); (3,5); …

Execution order, skewed by \(T_{S1} = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix}\):
    (i, i+j) = (1,2); (1,3); (1,4); (1,5); (1,6); (2,3); (2,4); (2,5); (2,6); (2,7); (3,4); (3,5); (3,6); (3,7); (3,8); …

[Figure: the original and skewed iteration spaces, with the execution order and dependence arrows overlaid. Credits: Clint (https://www.ozinenko.com/clint)]
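A skewing is only a relabeling of iterations, so both nests from the previous slide must compute the same values. The sketch below (with a hypothetical helper name and an arbitrary initialization) runs both on copies of the same array and compares them:

```c
#include <assert.h>
#include <string.h>

/* Returns 1 iff the original and skewed loop nests from the slide
 * produce identical arrays. Bounds 6 x 7 cover indices i-1..5, j..j+1. */
static int skew_matches(void) {
    int a1[6][7], a2[6][7];
    for (int i = 0; i <= 5; i++)
        for (int j = 0; j <= 6; j++)
            a1[i][j] = a2[i][j] = 10 * i + j;   /* arbitrary initial data */

    /* Original: T_S1(i, j) = (i, j). */
    for (int i = 1; i <= 5; i++)
        for (int j = 1; j <= 5; j++)
            a1[i][j] = a1[i-1][j+1];

    /* Skewed: T_S1(i, j) = (i, i+j), i.e., j_old = j_new - i. */
    for (int i = 1; i <= 5; i++)
        for (int j = i + 1; j <= i + 5; j++)
            a2[i][j-i] = a2[i-1][j-i+1];

    return memcmp(a1, a2, sizeof a1) == 0;
}
```

Within a fixed i, the skewed inner loop visits j-i = 1..5 in the same order as the original, so the statement instances execute identically.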
Scalar Dimensions in schedules
- 2d+1 format: for d nested loops, the schedule has d loop dimensions interleaved with d+1 scalar dimensions
- Can represent/transform imperfectly nested loops
  - e.g., loop fusion/distribution

    for (i = 0; i < 2; i++) {
      s[i] = ...;          // S1   T_S1(i) = (0, i, 0);
      for (j = 0; j < 3; j++)
        a[i][j] = ...;     // S2   T_S2(i, j) = (0, i, 1, j, 0);
    }
    for (i = 0; i < 2; i++)
      for (j = 0; j < 3; j++)
        b[i] = ...;        // S3   T_S3(i, j) = (1, i, 0, j, 0);
Loop transformations with schedules: loop fusion w/ scalar dimensions

Original (distributed), T_S1(i,j) = (0, i, 0, j) and T_S2(i,j) = (1, i, 0, j):

    for (i = 0; i < 2; i++)
      for (j = 0; j < 3; j++)
        a[i] = ...; // S1
    for (i = 0; i < 2; i++)
      for (j = 0; j < 3; j++)
        b[i] = ...; // S2

New (fused), T_S1(i,j) = (0, i, 0, j) and T_S2(i,j) = (0, i, 1, j):

    for (i = 0; i < 2; i++) {
      for (j = 0; j < 3; j++)
        a[i] = ...; // S1
      for (j = 0; j < 3; j++)
        b[i] = ...; // S2
    }

The new schedule of S2 as a transformation; the scalar dimensions come from the constant (offset) part:

\[
T_{S2}(i,j) =
\begin{pmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} i \\ j \end{pmatrix}
+
\begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}
=
\begin{pmatrix} 0 \\ i \\ 1 \\ j \end{pmatrix}
\]
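The fused execution order follows directly from lexicographic comparison of the new 4-dimensional dates. The sketch below (hypothetical helper names, not from the slides) checks that under the new schedules all S1 instances of a given i precede the S2 instances of the same i, which in turn precede the S1 instances of i+1:

```c
#include <assert.h>

/* New (fused) schedules: T_S1(i,j) = (0,i,0,j), T_S2(i,j) = (0,i,1,j). */
static void date_s1(int i, int j, int d[4]) { d[0]=0; d[1]=i; d[2]=0; d[3]=j; }
static void date_s2(int i, int j, int d[4]) { d[0]=0; d[1]=i; d[2]=1; d[3]=j; }

/* Lexicographic "precedes" on fixed-length 4-dimensional dates. */
static int lex4_lt(const int a[4], const int b[4]) {
    for (int k = 0; k < 4; k++) {
        if (a[k] < b[k]) return 1;
        if (a[k] > b[k]) return 0;
    }
    return 0;
}

/* Returns 1 iff the fused schedule interleaves S1 and S2 per iteration i. */
static int fused_order_ok(void) {
    int s1[4], s2[4], s1next[4];
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 3; j++) {
            date_s1(i, j, s1);
            date_s2(i, j, s2);
            if (!lex4_lt(s1, s2)) return 0;         /* S1(i,j) before S2(i,j) */
            if (i + 1 < 2) {
                date_s1(i + 1, 0, s1next);
                if (!lex4_lt(s2, s1next)) return 0; /* S2(i,*) before S1(i+1,*) */
            }
        }
    return 1;
}
```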
Schedules in general

A schedule for a statement S is an affine transformation of its iteration vector \(\vec{i}\) (of size \(m_S\)): a \(d \times m_S\) coefficient matrix plus an offset vector, whose entries carry the scalar dimensions:

\[
T_S(\vec{i}) =
\begin{pmatrix}
\phi_S^1(\vec{i}) \\
\phi_S^2(\vec{i}) \\
\vdots \\
\phi_S^d(\vec{i})
\end{pmatrix}
=
\begin{pmatrix}
C_{11}^S & C_{12}^S & \cdots & C_{1 m_S}^S \\
C_{21}^S & C_{22}^S & \cdots & C_{2 m_S}^S \\
\vdots & \vdots & \ddots & \vdots \\
C_{d1}^S & C_{d2}^S & \cdots & C_{d m_S}^S
\end{pmatrix}
\vec{i}
+
\begin{pmatrix}
C_{10}^S \\
C_{20}^S \\
\vdots \\
C_{d0}^S
\end{pmatrix}
\]

where \(d = 2 m_S + 1\) in the 2d+1 format, with \(m_S\) the size of the iteration vector. The result is a logical date such as (0, i, 0, j).
Goal: Compute the coefficients and offsets for each statement
Legality of transformations
- Are all transformations valid? NO!

Original, T_S1(i) = (0, i, 0) and T_S2(i,j) = (0, i, 1, j, 0):

    for (i = 1; i <= 10; i++) {
      s[i] = ...;        // S1
      for (j = 0; j < 3; j++)
        a[i][j] = s[i];  // S2
    }

Transformed, T_S2(i,j) = (0, i, 0, j, 0) and T_S1(i) = (0, i, 1) (invalid: S2 now reads s[i] before S1 writes it):

    for (i = 1; i <= 10; i++) {
      for (j = 0; j < 3; j++)
        a[i][j] = s[i];  // S2
      s[i] = ...;        // S1
    }
Dependences
- Three types of dependence:
  - Read-After-Write: a = 1; then b = a;
  - Write-After-Read: b = a; then a = 1;
  - Write-After-Write: a = 1; then a = 2;
- Dependences are computed from the domain, access, and schedule
- Transformation = finding a new schedule that satisfies all dependences
Dependence polyhedron
- A dependence polyhedron \(P_e\) is a set of (in)equalities
  - A general and accurate representation of instance-wise dependences

    for (i = 1; i <= 10; i++) {
      s[i] = ...;        // S1
      for (j = 0; j < 3; j++)
        a[i][j] = s[i];  // S2
    }

Constraints: the access equality \(i_{S1} = i_{S2}\) (i.e., \(i_{S1} - i_{S2} \ge 0 \wedge i_{S2} - i_{S1} \ge 0\)), plus the domain constraints \(1 \le i_{S1} \le 10\), \(1 \le i_{S2} \le 10\), \(0 \le j_{S2} < 3\) from \(D_{S1}\) and \(D_{S2}\). In matrix form (first row an equality, the rest inequalities):

\[
\begin{pmatrix}
1 & -1 & 0 & 0 \\
1 & 0 & 0 & -1 \\
-1 & 0 & 0 & 10 \\
0 & 0 & 1 & 0 \\
0 & 0 & -1 & 2
\end{pmatrix}
\begin{pmatrix} i_{S1} \\ i_{S2} \\ j_{S2} \\ 1 \end{pmatrix}
\;\begin{matrix} = \\ \ge \end{matrix}\; 0
\]

[Figure: the iteration domains D_S1 and D_S2 with instance-wise dependence arrows from S1 to S2. Credits: Clint (https://www.ozinenko.com/clint)]
Legality of transformations
- Dependence polyhedron: \(P_e\)
- Legality: \(\forall\, \vec{s}, \vec{t} \in P_e\ (\vec{s} \in D_{S_i},\ \vec{t} \in D_{S_j}),\ T_{S_i}(\vec{s}) \prec T_{S_j}(\vec{t})\)
  - If the "source" instance must happen before the "target" instance in the original program, the transformed program must preserve this property (must satisfy the dependence)
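For the small s[i] example, the legality condition can be checked exhaustively: enumerate the points of the dependence polyhedron and compare source and target dates lexicographically. The sketch below (hypothetical helper names) confirms that the original schedules satisfy every dependence instance while the swapped ones do not:

```c
#include <assert.h>
#include <stddef.h>

/* Lexicographic "precedes" on dates of possibly different lengths;
 * a proper prefix is taken to precede any longer date (an assumption
 * made here for simplicity). */
static int lex_lt(const int *a, size_t na, const int *b, size_t nb) {
    size_t n = na < nb ? na : nb;
    for (size_t k = 0; k < n; k++) {
        if (a[k] < b[k]) return 1;
        if (a[k] > b[k]) return 0;
    }
    return na < nb;
}

/* original != 0: T_S1(i) = (0,i,0), T_S2(i,j) = (0,i,1,j,0).
 * original == 0: T_S1(i) = (0,i,1), T_S2(i,j) = (0,i,0,j,0) (swapped). */
static int all_deps_satisfied(int original) {
    for (int i = 1; i <= 10; i++)       /* i_S1 = i_S2 = i */
        for (int j = 0; j < 3; j++) {   /* 0 <= j_S2 < 3 */
            int s1[3], s2[5];
            s1[0] = 0; s1[1] = i; s1[2] = original ? 0 : 1;
            s2[0] = 0; s2[1] = i; s2[2] = original ? 1 : 0;
            s2[3] = j; s2[4] = 0;
            /* RAW: S1(i) writes s[i]; S2(i,j) reads it. */
            if (!lex_lt(s1, 3, s2, 5)) return 0;
        }
    return 1;
}
```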
Putting it all together
- Goal: compute all coefficients and offsets such that the legality condition holds:

\[
\forall\, \vec{s}, \vec{t} \in P_e\ (\vec{s} \in D_{S1},\ \vec{t} \in D_{S2}),\ T_{S1}(\vec{s}) \prec T_{S2}(\vec{t})
\]

Schedules (the \(C\) coefficients are the unknowns):

\[
T_{S1}(i) =
\begin{pmatrix} C_{11}^{S1} \\ C_{21}^{S1} \\ C_{31}^{S1} \end{pmatrix}
\big( i_{S1} \big)
+
\begin{pmatrix} C_{10}^{S1} \\ C_{20}^{S1} \\ C_{30}^{S1} \end{pmatrix},
\qquad
T_{S2}(i,j) =
\begin{pmatrix}
C_{11}^{S2} & C_{12}^{S2} \\
C_{21}^{S2} & C_{22}^{S2} \\
C_{31}^{S2} & C_{32}^{S2} \\
C_{41}^{S2} & C_{42}^{S2} \\
C_{51}^{S2} & C_{52}^{S2}
\end{pmatrix}
\begin{pmatrix} i_{S2} \\ j_{S2} \end{pmatrix}
+
\begin{pmatrix} C_{10}^{S2} \\ C_{20}^{S2} \\ C_{30}^{S2} \\ C_{40}^{S2} \\ C_{50}^{S2} \end{pmatrix}
\]

Dependence polyhedron \(P_e\) (\(i_{S1} = i_{S2}\); \(1 \le i_{S1} \le 10\); \(1 \le i_{S2} \le 10\); \(0 \le j_{S2} < 3\)):

\[
\begin{pmatrix}
1 & -1 & 0 & 0 \\
1 & 0 & 0 & -1 \\
-1 & 0 & 0 & 10 \\
0 & 0 & 1 & 0 \\
0 & 0 & -1 & 2
\end{pmatrix}
\begin{pmatrix} i_{S1} \\ i_{S2} \\ j_{S2} \\ 1 \end{pmatrix}
\;\begin{matrix} = \\ \ge \end{matrix}\; 0
\]
Linearizing the legality condition (The Pluto Algorithm)
- The legality condition (written over iteration vectors):

\[
\delta(\vec{s},\vec{t}) = (c_1^{S_j}, c_2^{S_j}, \ldots, c_{m_{S_j}}^{S_j})\,\vec{t} - (c_1^{S_i}, c_2^{S_i}, \ldots, c_{m_{S_i}}^{S_i})\,\vec{s} \ge 0, \qquad \vec{s}, \vec{t} \in P_e
\]

- Uniform dependences: the distance between two dependent iterations is a constant (e.g., \(i \to i+1\) makes \(\delta(\vec{s},\vec{t})\) a constant)
- Non-uniform dependences: the distance varies (e.g., \(i \to i+j\) makes \(\delta(\vec{s},\vec{t})\) a function of j)
  - Apply the Farkas lemma: \(\delta\) is nonnegative everywhere on \(P_e\) if and only if it is a nonnegative combination of the inequalities defining \(P_e\):

\[
(c_1^{S_j}, \ldots, c_{m_{S_j}}^{S_j})\,\vec{t} - (c_1^{S_i}, \ldots, c_{m_{S_i}}^{S_i})\,\vec{s} \;\equiv\; \lambda_{e0} + \sum_{k=1}^{m_e} \lambda_{ek} P_e^k, \qquad \lambda_{ek} \ge 0
\]

where each \(P_e^k\) is an inequality of the dependence polyhedron.
Cost Function & Objective Function (The Pluto Algorithm)
- Compute all coefficients and offsets under the legality condition by solving an ILP problem
- Cost function = transformation policy
  - Pluto's cost function: the dependence distance

\[
\delta(\vec{s},\vec{t}) = (c_1^{S_j}, \ldots, c_{m_{S_j}}^{S_j})\,\vec{t} - (c_1^{S_i}, \ldots, c_{m_{S_i}}^{S_i})\,\vec{s}, \qquad \vec{s}, \vec{t} \in P_e
\]

  - Fuse loops as much as possible
  - Push loops carrying dependences to inner levels
  - Also used in isl (Polly, PPCG, …)
- Objective function: lexicographically minimize \((u_1, w, c_1^{S_j}, c_2^{S_j})\)
  - Iteratively find linearly independent solutions
Step-by-step example

    for (i = 0; i < N; i++) {
      for (j = 1; j < N; j++) {
        a[i][j] = a[j][i] + a[i][j-1]; // S1
      }
    }

Unrolled instances:

    a[0][1] = a[1][0] + a[0][0]; // S1(0,1)
    a[0][2] = a[2][0] + a[0][1]; // S1(0,2)
    a[0][3] = a[3][0] + a[0][2]; // S1(0,3)
    ...
    a[1][1] = a[1][1] + a[1][0]; // S1(1,1)
    a[1][2] = a[2][1] + a[1][1]; // S1(1,2)
    a[1][3] = a[3][1] + a[1][2]; // S1(1,3)
    ...
    a[2][1] = a[1][2] + a[2][0]; // S1(2,1)
    a[2][2] = a[2][2] + a[2][1]; // S1(2,2)
    a[2][3] = a[3][2] + a[2][2]; // S1(2,3)
    ...
    a[3][1] = a[1][3] + a[3][0]; // S1(3,1)

Three dependences: Dependence 1 (RAW, through a[i][j-1]), Dependence 2 (RAW, through a[j][i]), and Dependence 3 (WAR).

Step-by-step example: Legality Constraints 1 (The Pluto Algorithm)
- Dependence 1: RAW (flow dependence), \((i_s, j_s) \to (i_t, j_t)\)

    Source: a[0][1] = a[1][0] + a[0][0]; // S1(0,1)
    Target: a[0][2] = a[2][0] + a[0][1]; // S1(0,2)
    ...

Dependence polyhedron: \(P_{e1}:\ i_s = i_t,\ j_s = j_t - 1,\ 0 \le i_t \le N-1,\ 2 \le j_t \le N\)

Legality constraint:

\[
(c_1^{S1}, c_2^{S1}) \begin{pmatrix} i_t \\ j_t \end{pmatrix} - (c_1^{S1}, c_2^{S1}) \begin{pmatrix} i_s \\ j_s \end{pmatrix} \ge 0, \qquad \vec{s}, \vec{t} \in P_{e1}
\]

Substituting \(i_s = i_t\), \(j_s = j_t - 1\):

\[
c_1^{S1} i_t + c_2^{S1} j_t - (c_1^{S1} i_s + c_2^{S1} j_s)
= c_1^{S1} i_t + c_2^{S1} j_t - \big(c_1^{S1} i_t + c_2^{S1} (j_t - 1)\big) \ge 0
\;\Rightarrow\; c_2^{S1} \ge 0
\]
Step-by-step example: Legality Constraints 2 (The Pluto Algorithm)
- Dependence 2: RAW (flow dependence), \((i_s, j_s) \to (i_t, j_t)\)

    Source: a[1][2] = a[2][1] + a[1][1]; // S1(1,2)
    ...
    Target: a[2][1] = a[1][2] + a[2][0]; // S1(2,1)

Dependence polyhedron: \(P_{e2}:\ i_s = j_t,\ j_s = i_t,\ 1 \le i_t \le N,\ 2 \le j_t \le N,\ i_t - j_t \ge 1\)

Legality constraint, substituting \(i_s = j_t\), \(j_s = i_t\):

\[
c_1^{S1} i_t + c_2^{S1} j_t - (c_1^{S1} i_s + c_2^{S1} j_s)
= c_1^{S1} i_t + c_2^{S1} j_t - (c_1^{S1} j_t + c_2^{S1} i_t) \ge 0
\]
\[
\Rightarrow\; (c_1^{S1} - c_2^{S1})\, i_t + (c_2^{S1} - c_1^{S1})\, j_t \ge 0, \qquad 1 \le i_t \le N,\ 2 \le j_t \le N,\ i_t - j_t \ge 1
\]

Applying the Farkas lemma and Fourier-Motzkin elimination yields \(c_1^{S1} - c_2^{S1} \ge 0\).
Step-by-step example: Putting it all together (The Pluto Algorithm)
- Dependence 1: \(c_2^{S1} \ge 0,\ w \ge c_2^{S1}\)
- Dependences 2 & 3: \(c_1^{S1} - c_2^{S1} \ge 0,\ u_1 \ge 0,\ u_1 \ge c_1^{S1} - c_2^{S1},\ 3u_1 + w \ge c_1^{S1} - c_2^{S1}\)
  (constraints using the parameter N that bound the dependence distances)
- Avoiding the zero vector: \(c_1^{S1} + c_2^{S1} \ge 1\)
- Objective function: lexicographically minimize \((u_1, w, c_1^{S1}, c_2^{S1}) \to (0, 1, 1, 1)\)
- The first solution gives the hyperplane \(\phi(i,j) = i + j\); iteratively finding a linearly independent solution gives \(\phi(i,j) = i\), so:

\[
T_{S1}(i,j) = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} i+j \\ i \end{pmatrix}
\]
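The resulting schedule can be checked mechanically against the dependence polyhedra from the previous slides. The sketch below (hypothetical helper names, N fixed to an arbitrary sample value) enumerates the points of \(P_{e1}\) and \(P_{e2}\) and verifies that \(T_{S1}(i,j) = (i+j,\, i)\) orders every source before its target:

```c
#include <assert.h>

/* The schedule found by the algorithm: T_S1(i, j) = (i+j, i). */
static void sched(int i, int j, int d[2]) { d[0] = i + j; d[1] = i; }

static int lex2_lt(const int a[2], const int b[2]) {
    if (a[0] != b[0]) return a[0] < b[0];
    return a[1] < b[1];
}

/* Returns 1 iff T_S1 satisfies dependences 1 and 2 for N = 6. */
static int pluto_schedule_legal(void) {
    enum { N = 6 };   /* arbitrary sample size */
    int s[2], t[2];

    /* Dependence 1: source (i_t, j_t - 1) -> target (i_t, j_t) over P_e1. */
    for (int it = 0; it <= N - 1; it++)
        for (int jt = 2; jt <= N; jt++) {
            sched(it, jt - 1, s);
            sched(it, jt, t);
            if (!lex2_lt(s, t)) return 0;
        }

    /* Dependence 2: source (j_t, i_t) -> target (i_t, j_t), i_t - j_t >= 1,
     * over P_e2. */
    for (int it = 1; it <= N; it++)
        for (int jt = 2; jt <= N; jt++) {
            if (it - jt < 1) continue;
            sched(jt, it, s);
            sched(it, jt, t);
            if (!lex2_lt(s, t)) return 0;
        }
    return 1;
}
```

For dependence 1 the first schedule dimension strictly increases (i+j-1 < i+j); for dependence 2 it ties (j+i = i+j) and the second dimension breaks the tie (j < i), so every dependence is carried by some level.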
Summary
- The polyhedral transformation = "scheduling" (determining the execution order of statements)
- 3 important things:
  - Domain: the set of instances of a statement
  - Scattering (scheduling): maps an instance to a time stamp
  - Access: maps an instance to array element(s)
- Limitation: in general, only applicable to a Static Control Part (SCoP)
  - Loop bounds and conditionals must be affine functions of the surrounding loop iterators

[Figure: the overall flow again: program → affine inequalities (constraints plus the cost function \(\delta_e(\vec{s},\vec{t}) = \phi_{S_j}(\vec{t}) - \phi_{S_i}(\vec{s})\)) → ILP → "synthesized" code.]
COMPILERS AND TOOLS
Polyhedral Compilers & Tools
- PoCC (The Polyhedral Compiler Collection)
  - http://web.cs.ucla.edu/~pouchet/software/pocc/
  - Clan: extracts a polyhedral IR from the source code
  - Candl: a dependence analyzer
  - LetSee: a legal-transformation-space explorer
  - PLuTo: an automatic parallelizer and locality optimizer
  - CLooG: code generation from the polyhedral IR
Polyhedral Compilers & Tools
- Polly
  - http://polly.llvm.org/
  - isl: the Integer Set Library (including a code generator)
- Clay/Chlore/Clint
  - https://www.ozinenko.com/projects
  - Clay: "Chunky Loop Alteration wizardrY"
  - Chlore: recovers a high-level syntactic description of an automatically computed polyhedral optimization
  - Clint: an interactive graphical interface for manual and compiler-assisted program restructuring in the polyhedral model

[Figure: screenshot of Clint]
Further readings
- Fundamentals
  - OpenScop Specification: http://icps.u-strasbg.fr/people/bastoul/public_html/development/openscop/docs/openscop.html
  - isl: https://lirias.kuleuven.be/bitstream/123456789/270231/1/icms2010verdoolaege.pdf
- Pluto algorithm
  - U. Bondhugula, "Effective Automatic Parallelization and Locality Optimization Using The Polyhedral Model" (PhD dissertation, 2010)
  - U. Bondhugula, A. Hartono, J. Ramanujam, P. Sadayappan, "A Practical Automatic Polyhedral Parallelizer and Locality Optimizer" [PLDI'08]
- Polly
  - T. Grosser, S. Verdoolaege, A. Cohen, "Polyhedral AST generation is more than scanning polyhedra" [ACM TOPLAS 2015]
- Polyhedral model + AST-based cost function
  - J. Shirako, L.-N. Pouchet, V. Sarkar, "Oil and Water Can Mix: An Integration of Polyhedral and AST-based Transformations" [SC'14]
- GPU code generation
  - S. Verdoolaege, J.C. Juega, A. Cohen, J.I. Gomez, C. Tenllado, F. Catthoor, "Polyhedral parallel code generation for CUDA" [ACM TACO 2013]
  - J. Shirako, A. Hayashi, V. Sarkar, "Optimized Two-level Parallelization for GPU Accelerators using the Polyhedral Model" [CC'17]