# INSPIRE: The Insieme Parallel Intermediate Representation

Herbert Jordan, Peter Thoman, Simone Pellegrini, Klaus Kofler, and Thomas Fahringer
University of Innsbruck

PACT'13, 9 September
## Slide 2: Programming Models

The user writes sequential code against a simple machine model: a CPU (C) attached to memory.

```
void main(…) {
    int sum = 0;
    for(i = 1..10)
        sum += i;
    print(sum);
}
```
## Slide 3: Programming Models (cont.)

The compiler bridges the gap between the programming language (PL) and the hardware: it lowers the program into an IR and emits assembly, handling

- instruction selection
- register allocation
- optimization

```
.START ST
ST: MOV R1,#2
    MOV R2,#1
M1: CMP R2,#20
    BGT M2
    MUL R1,R2
    INC R2
    JMP M1
```
## Slide 4: Programming Models (cont.)

Hardware evolves: a cache ($) appears between the CPU and memory. The compiler absorbs the change by adding loop and latency optimizations; the user's code and the programming model stay the same.
## Slide 5: Programming Models (cont.)

This machine model, however, matches a ten-year-old architecture: a single core, a cache, and memory.
## Slide 6: Parallel Architectures

Today's hardware is parallel, and each architecture comes with its own programming model:

- Multicore (several cores sharing cache and memory): OpenMP / Cilk
- Accelerators (a host plus a GPU with its own memory): OpenCL / CUDA
- Clusters (multiple nodes, each with cores, cache, and memory): MPI / PGAS
## Slide 7: Compiler Support

For parallel code, today's compilers do little. The frontend lowers the OpenMP pragma into a call to a parallel runtime library (e.g. _GOMP_PFOR) inside an otherwise sequential IR; the backend then emits a binary that jumps into that library. Only the sequential parts of the program pass through the compiler's IR and its optimizations.

```
void main(…) {
    int sum = 0;
    #omp pfor
    for(i = 1..10)
        sum += i;
}
```
## Slide 8: Situation

- Compilers: unaware of thread-level parallelism; the magic happens in libraries
- Libraries: limited perspective and scope; no static analysis, no transformations
- User: has to manage and coordinate parallelism; no performance portability
## Slide 9: Compiler Support?

Can the classic picture simply be extended? The user still writes the sequential example, and the compiler still lowers PL to IR to assembly (instruction selection, register allocation, optimization, loops and latency, vectorization), but the hardware underneath is now a parallel mix of multicores, accelerators, and cluster nodes. The sequential pipeline cannot target such machines by itself.
## Slide 10: Our Approach

Keep the classic sequential pipeline, but extend the compiler so that it also understands the parallel hardware below it.
## Slide 11: Our Approach: Insieme

Insieme adds a source-to-source layer in front of the classic compiler: PL in, PL + extras out. On top of the usual backend concerns it handles

- coordinating parallelism
- high-level optimization
- auto tuning
- instrumentation

while the hardware below is the parallel mix of multicores, accelerators, and cluster nodes. The input (C/C++ with OpenMP):

```
void main(…) {
    int sum = 0;
    #omp pfor
    for(i = 1..10)
        sum += i;
}
```

is translated into INSPIRE, the parallel intermediate representation:

```
unit main(...) {
    ref<int> v1 = 0;
    pfor(..., (){ ... });
}
```
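One plausible reading of the snippet above, annotated (the elided arguments and the exact pfor signature are defined by the INSPIRE language, not shown on the slide):

```
unit main(...) {
    ref<int> v1 = 0;    // the OpenMP variable "sum", now an explicit mutable ref
    pfor(               // INSPIRE's single work-sharing construct
        ...,            // iteration range / thread group (elided on the slide)
        (){ ... }       // the loop body, outlined into a closure
    );
}
```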
## Slide 12: The Insieme Project

Goal: to establish a research platform for hybrid, thread-level parallelism.

Compiler:
- Frontend: C/C++ with OpenMP, Cilk, OpenCL, MPI, and extensions
- INSPIRE at the core, with a Static Optimizer and an IR Toolbox
- Backend

Runtime:
- Dynamic Optimizer and Scheduler
- Monitoring
- Execution Engine
## Slide 13: Parallel Programming

- OpenMP: pragmas (+ API)
- Cilk: keywords
- MPI: library
- OpenCL: library + JIT

Objective: combine those using a unified formalism, and provide an infrastructure for analysis and manipulation.
## Slide 14: INSPIRE Requirements

Input: OpenMP / Cilk / OpenCL / MPI / others. Output: OpenCL / MPI / Insieme Runtime / others.

INSPIRE has to be:
- complete
- unified
- explicit
- analyzable
- transformable
- compact
- high level
- whole program
- an open system
- extensible
## Slide 15: INSPIRE

- Functional basis: first-class functions and closures, generic (function) types; a program is a single expression
- Imperative constructs: loops, conditionals, mutable state
- Explicit parallel constructs: to model parallel control flow
## Slide 16: Parallel Model

Parallel control flow is defined by jobs, which are processed cooperatively by thread groups.
## Slide 17: Parallel Model (2)

- one work-sharing construct
- one data-sharing construct
- point-to-point communication via an abstract channel type
## Slide 18: Evaluation

What inherent impact does the INSPIRE detour impose? The same C input code is compiled two ways: directly with GCC 4.6.3 (-O3), producing Binary A, and through the Insieme compiler (frontend, INSPIRE, backend) into target code for the Insieme runtime (IRT), which is then compiled with the same GCC, producing Binary B. Insieme applies no optimizations, so any difference measures the cost of the detour itself.
## Slide 19: Performance Impact

[Figure: relative execution time of the Insieme-compiled binaries versus the GCC baseline]
## Slide 20: Derived Work (subset)

- Adaptive task granularity control: P. Thoman, H. Jordan, T. Fahringer, "Adaptive Granularity Control in Task Parallel Programs Using Multiversioning", Euro-Par 2013
- Multi-objective auto-tuning: H. Jordan, P. Thoman, J. J. Durillo et al., "A Multi-Objective Auto-Tuning Framework for Parallel Codes", SC 2012
- Compiler-aided loop scheduling: P. Thoman, H. Jordan, S. Pellegrini et al., "Automatic OpenMP Loop Scheduling: A Combined Compiler and Runtime Approach", IWOMP 2012
- OpenCL kernel partitioning: K. Kofler, I. Grasso, B. Cosenza, T. Fahringer, "An Automatic Input-Sensitive Approach for Heterogeneous Task Partitioning", ICS 2013
- Improved usage of MPI primitives: S. Pellegrini, T. Hoefler, T. Fahringer, "On the Effects of CPU Caches on MPI Point-to-Point Communications", Cluster 2012
## Slide 21: Conclusion

INSPIRE is designed to:
- represent and unify parallel applications
- analyze and manipulate parallel codes
- provide the foundation for researching parallel language extensions

It is based on a comprehensive parallel model, sufficient to cover the leading standards for parallel programming. Its practicality has been demonstrated by a variety of derived work.
## Slide 22: Thank You!

Visit: http://insieme-compiler.org
Contact: [email protected]
## Slide 23: Types

7 type constructors.

## Slide 24: Expressions

8 kinds of expressions.

## Slide 25: Statements

9 kinds of statements.