Make HPC Easy with Domain-Specific Languages and High-Level Frameworks
Biagio Cosenza, Ph.D.
DPS Group, Institut für Informatik
Universität Innsbruck, Austria
HPC Seminar at FSP Scientific Computing, Innsbruck, May 15th, 2013
Outline
• Complexity in HPC
  – Parallel hardware
  – Optimizations
  – Programming models
• Harnessing complexity
  – Automatic tuning
  – Automatic parallelization
  – DSLs
  – Abstractions for HPC
• Related work in Insieme
COMPLEXITY IN HPC
Complexity in Hardware
• The need for parallel computing
• Parallelism in hardware
• Three walls
  – Power wall
  – Memory wall
  – Instruction-level parallelism (ILP) wall
The Power Wall
"Power is expensive, but transistors are free."
• We can put more transistors on a chip than we have the power to turn on
• Power efficiency challenge
  – Performance per watt is the new metric
  – Systems are often constrained by power & cooling
• This forces us to concede the battle for maximum performance of individual processing elements in order to win the war for application efficiency through optimizing total system performance
• Example
  – Intel Pentium 4 HT 670 (released May 2005): clock rate 3.8 GHz
  – Intel Core i7-3930K Sandy Bridge (released Nov. 2011): clock rate 3.2 GHz
The Memory Wall
"The growing disparity of speed between the CPU and memory outside the CPU chip would become an overwhelming bottleneck."
• It changes the way we optimize programs
  – Optimize for memory vs. optimize for computation
  – E.g. a multiply is no longer considered a harmfully slow operation compared to a load or a store
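To make the memory-vs-computation point concrete, here is a small illustrative sketch in C (the function names and the matrix size are invented for the example): both functions compute the same sum, but the loop order decides whether a row-major array is walked with unit stride (cache-friendly) or with a large stride (cache-hostile), which on modern machines matters far more than the cost of the arithmetic itself.

```c
#include <stddef.h>

#define N 512

/* Column-by-column traversal of a row-major array: strided accesses,
 * poor cache behavior. */
double sum_strided(double a[N][N]) {
    double s = 0.0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += a[i][j];
    return s;
}

/* Interchanged loops: sequential accesses, cache-friendly.
 * Same result, typically much faster on large arrays. */
double sum_sequential(double a[N][N]) {
    double s = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += a[i][j];
    return s;
}
```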
The ILP Wall
There are diminishing returns on finding more ILP.
• Instruction-level parallelism
  – The potential overlap among instructions
  – Many ILP techniques: instruction pipelining, superscalar execution, out-of-order execution, register renaming, branch prediction
• The goal of compiler and processor designers is to identify and take advantage of as much ILP as possible
• It is increasingly difficult to find enough parallelism in a single instruction stream to keep a high-performance single-core processor busy
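One software-visible face of ILP, sketched in C (function names are invented for the illustration): a single accumulator forms a serial dependence chain, so each add must wait for the previous one; splitting the sum into independent accumulators gives a superscalar, out-of-order core adjacent operations it can overlap.

```c
#include <stddef.h>

/* One accumulator: every add depends on the previous one,
 * so throughput is limited by the add latency. */
long sum_chain(const long *x, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Two independent accumulators: adjacent adds carry no dependence,
 * exposing ILP that the hardware can exploit. */
long sum_ilp(const long *x, size_t n) {
    long s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += x[i];
        s1 += x[i + 1];
    }
    if (i < n)            /* odd tail element */
        s0 += x[i];
    return s0 + s1;
}
```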
Parallelism in Hardware
| | Xeon Phi 5110P | Tesla K20X | FirePro S10000 | Cortex-A50 | TILE-Gx8072 | Power7+ |
|---|---|---|---|---|---|---|
| Company | Intel | NVIDIA | AMD | ARM | Tilera | IBM |
| Memory | 8 GB, 320 GB/s bandwidth | 6 GB, 250 GB/s bandwidth | 6 GB, 480 GB/s bandwidth (dual) | 4 GB, banked L2 | 23 MB on-chip cache: 32 KB L1/core, 256 KB L2/core, 18 MB L3 | 2 MB L2 (256 KB/core), 32 MB L3 (4 MB/core) for the 8-core SCM |
| Cores | 60 (240 threads) | 2688 CUDA cores, arranged in SMs | 2×1792 stream processors | up to 16 (4×4 cluster) | 72 | 8-core SCM, 64 with 4 drawers (4 SMT threads per core) |
| Core frequency | 1.053 GHz | 1 GHz | 825 MHz | n/a | 1.0 GHz | 4.14 GHz |
| SIMD/SIMT | 512-bit | 32-thread warp | 64-thread wavefront | n/a | 32-, 16-, and 8-bit ops | n/a |
The “Many-core” Challenge
• Many-core vs. multi-core
  – Multi-core architectures and programming models suitable for 2 to 32 processors will not easily evolve incrementally to serve many-core systems with 1000s of processors
  – Many-core is the future

(Pictured: the Tilera TILE-Gx8072)
What does it mean?
• Hardware is evolving
  – The number of cores is the new megahertz
• We need
  – New programming models
  – New system software
  – New supporting architectures that are naturally parallel
New Challenges
• Make it easy to write programs that execute efficiently on highly parallel computing systems
  – The target should be 1000s of cores per chip
  – Maximize productivity
• Programming models should
  – be independent of the number of processors
  – support successful models of parallelism, such as task-level parallelism, word-level parallelism, and bit-level parallelism
• “Autotuners” should play a larger role than conventional compilers in translating parallel programs
Parallel Programming Models
A crowded landscape:
• MPI, Pthreads, OpenMP
• MapReduce (Google), StreamIt (MIT & Microsoft)
• CUDA (NVIDIA), OpenCL (Khronos Group), Brook (Stanford)
• DataCutter (Maryland), Threading Building Blocks (Intel)
• Cilk (MIT), NESL (CMU)
• HPCS languages: Chapel (Cray), X10 (IBM), Fortress (Sun)
• Sequoia (Stanford), Charm (Illinois), Erlang, Borealis (Brown)
• HMPP, OpenACC
• Real-Time Workshop (MathWorks), Binary Modular Data Flow Machine (TU Munich and AS Nuremberg)
Reconsidering…
• Applications
  – What are common parallel kernel applications?
  – Parallel patterns
    • Instead of traditional benchmarks, design and evaluate parallel programming models and architectures on parallel patterns
    • A parallel pattern (“dwarf”) is an algorithmic method that captures a pattern of computation and communication
    • E.g. dense linear algebra, sparse linear algebra, spectral methods, …
• Metrics
  – Scalability
    • An old belief was that less-than-linear scaling for a multi-processor application is a failure
    • With the new hardware trend, this is no longer true: any speedup is OK!
HARNESSING COMPLEXITY
Harnessing Complexity
• Compiler approaches
  – DSLs, automatic parallelization, …
• Library-based approaches
What can a compiler do for us?
• Optimize code
• Automatic tuning
• Automatic code generation
  – e.g. to support different hardware
• Automatically parallelize code
Automatic Parallelization
Critical opinions on parallel programming models:

Wen-mei Hwu, University of Illinois at Urbana-Champaign, “Why sequential programming models could be the best way to program many-core systems”
http://view.eecs.berkeley.edu/w/images/3/31/Micro-keynote-hwu-12-11-2006_.pdf

The other way:
• Auto-parallelizing compilers
  – Sequential code => parallel code
Automatic Parallelization
• Nowadays compilers have new “tools” for analysis
  – e.g. the polyhedral model
• …but performance is still far from a manual parallelization approach

The polyhedral flow: polyhedral extraction (SCoP detection, translation to the polyhedral model), analyses & transformations on the model, then code generation (IR code generated back from the model). For example:

```c
for (int i = 0; i < 100; i++) {
    A[i] = A[i+1];
}
```

D: { i in N : 0 <= i < 100 }
R: A[i+1] for each i in D
W: A[i] for each i in D
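The read and write sets matter because they expose the loop's only dependence: iteration i reads A[i+1] before iteration i+1 overwrites it, an anti-dependence that forbids running iterations in arbitrary order. A compiler can remove such a dependence by renaming, as in this illustrative C sketch (function names are invented):

```c
#include <string.h>

#define N 100

/* Original loop: iteration i reads A[i+1], which iteration i+1
 * overwrites -- an anti-dependence, so iterations cannot safely
 * run in arbitrary order. */
void shift_seq(int A[N + 1]) {
    for (int i = 0; i < N; i++)
        A[i] = A[i + 1];
}

/* Renaming: reading from a private copy removes the dependence,
 * making every iteration independent (trivially parallel). */
void shift_par(int A[N + 1]) {
    int tmp[N + 1];
    memcpy(tmp, A, sizeof tmp);
    for (int i = 0; i < N; i++)   /* iterations may run in any order */
        A[i] = tmp[i + 1];
}
```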
Autotuners vs. Traditional Compilers
• Performance of future parallel applications will crucially depend on the quality of the code generated by the compiler
• The compiler selects which optimizations to perform, chooses parameters for these optimizations, and selects among alternative implementations of a library kernel
• The resulting optimization space is large
• A programming model may simplify the problem, but not solve it
Optimizations’ Complexity: An Example
Input:
• OpenMP code
• Simple parallel codes: matrix multiplication, Jacobi, stencil3d, …
• Few optimizations and tuning parameters
  – 2D/3D tiling
  – number of threads

Goal: optimize for performance and efficiency
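Even this small setup spans a large search space: every tile size and thread count is a configuration to evaluate. To show what the tiling parameter actually transforms, here is a hedged sketch of 2D tiling for matrix multiplication in C (sizes and names are made up; N is assumed divisible by T):

```c
#define N 64
#define T 16   /* tile size: the tuning parameter an autotuner explores */

/* Naive matrix multiplication, C = A * B. */
void matmul(double A[N][N], double B[N][N], double C[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double s = 0.0;
            for (int k = 0; k < N; k++)
                s += A[i][k] * B[k][j];
            C[i][j] = s;
        }
}

/* 2D-tiled version: the same arithmetic, iterated in T x T blocks so
 * that tiles of A and B stay resident in cache between uses. */
void matmul_tiled(double A[N][N], double B[N][N], double C[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            C[i][j] = 0.0;
    for (int ii = 0; ii < N; ii += T)
        for (int jj = 0; jj < N; jj += T)
            for (int kk = 0; kk < N; kk += T)
                for (int i = ii; i < ii + T; i++)
                    for (int j = jj; j < jj + T; j++) {
                        double s = C[i][j];
                        for (int k = kk; k < kk + T; k++)
                            s += A[i][k] * B[k][j];
                        C[i][j] = s;
                    }
}
```

An autotuner would compile and time this kernel for many values of T (and thread counts), since the best choice depends on the cache hierarchy of the target machine.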
Optimizations’ ComplexityAn example
• Problem– Big search space• brute force takes year of computation
– Analytical model fails to find the best configuration• Solution– Multi-objective search• Offline search of Pareto front solutions• Runtime selection according to the objective
– Multi versioning
H. Jordan, P. Thoman, J. J. Durillo, S. Pellegrini, P. Gschwandtner, T. Fahringer, H. Moritsch A Multi-Objective Auto-Tuning Framework for Parallel Codes
ACM Super Computing, 2012
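Multi-versioning can be pictured as compiling several variants of a kernel and letting the runtime pick one according to the current objective. A minimal illustrative sketch in C (the variants, cost fields, and objectives are stand-ins for the example, not the actual Insieme mechanism):

```c
typedef enum { OBJ_TIME, OBJ_ENERGY } objective_t;

typedef struct {
    void (*run)(void);   /* one compiled variant of the kernel      */
    double time_cost;    /* offline-measured, Pareto-front metadata */
    double energy_cost;
} version_t;

static void variant_fast(void)   { /* e.g. many threads, large tiles */ }
static void variant_frugal(void) { /* e.g. few threads, small tiles  */ }

/* Pick the stored version that best matches the current objective. */
const version_t *select_version(const version_t *v, int n, objective_t obj) {
    const version_t *best = &v[0];
    for (int i = 1; i < n; i++) {
        double b = (obj == OBJ_TIME) ? best->time_cost : best->energy_cost;
        double c = (obj == OBJ_TIME) ? v[i].time_cost  : v[i].energy_cost;
        if (c < b)
            best = &v[i];
    }
    return best;
}
```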
Optimizations’ Complexity

Framework overview (steps 1–5 at compile time, step 6 at runtime):
1. The Analyzer extracts code regions from the input code
2. The Optimizer searches configurations for each region
3. Measurements on the parallel target platform feed the search
4. The best solutions are passed to the Backend
5. The Backend emits multi-versioned code
6. The runtime system dynamically selects among the versions

H. Jordan, P. Thoman, J. J. Durillo, S. Pellegrini, P. Gschwandtner, T. Fahringer, H. Moritsch. A Multi-Objective Auto-Tuning Framework for Parallel Codes. ACM Supercomputing, 2012.
Domain-Specific Languages
• Ease of programming
  – Use of domain-specific concepts, e.g. “color”, “pixel”, “particle”, “atom”
  – Simple interface
• Hide complexity
  – Data structures
  – Parallelization issues
  – Optimization tuning
  – Addressing specific parallelization patterns
Domain-Specific Languages
• DSLs may help parallelization
  – Focus on domain concepts and abstractions
  – Language constraints may help automatic parallelization by compilers
• Three major benefits
  – Productivity
  – Performance
  – Portability and forward scalability
Domain-Specific Languages: GLSL Shaders (OpenGL)

OpenGL 4.3 pipeline: vertex data and pixel data feed the pipeline; the programmable stages (Vertex Shader, Tessellation Control Shader, Tessellation Evaluation Shader, Geometry Shader, Fragment Shader) surround fixed-function Primitive Setup and Rasterization and final Blending; shaders read from the Texture Store.
Vertex shader:

```glsl
attribute vec3 vertex;
attribute vec3 normal;
attribute vec2 uv1;
uniform mat4 _mvProj;
uniform mat3 _norm;
varying vec2 vUv;
varying vec3 vNormal;

void main(void) {
    // compute position
    gl_Position = _mvProj * vec4(vertex, 1.0);
    vUv = uv1;
    // compute light info
    vNormal = _norm * normal;
}
```

Fragment shader:

```glsl
varying vec2 vUv;
varying vec3 vNormal;
uniform vec3 mainColor;
uniform float specularExp;
uniform vec3 specularColor;
uniform sampler2D mainTexture;
uniform mat3 _dLight;
uniform vec3 _ambient;

void getDirectionalLight(vec3 normal, mat3 dLight, float specularExponent,
                         out vec3 diffuse, out float specular) {
    vec3 ecLightDir = dLight[0];  // light direction in eye coordinates
    vec3 colorIntensity = dLight[1];
    vec3 halfVector = dLight[2];
    float diffuseContribution = max(dot(normal, ecLightDir), 0.0);
    float specularContribution = max(dot(normal, halfVector), 0.0);
    specular = pow(specularContribution, specularExponent);
    diffuse = colorIntensity * diffuseContribution;
}

void main(void) {
    vec3 diffuse;
    float spec;
    getDirectionalLight(normalize(vNormal), _dLight, specularExp, diffuse, spec);
    vec3 color = max(diffuse, _ambient.xyz) * mainColor;
    gl_FragColor = texture2D(mainTexture, vUv) * vec4(color, 1.0)
                 + vec4(spec * specularColor, 0.0);
}
```
DSL Examples
Matlab, DLA DSLs (dense linear algebra), Python, shell scripts, SQL, XML, CSS, BPEL, …

• Interesting recent research work:
  – A. S. Green, P. L. Lumsdaine, N. J. Ross, B. Valiron. Quipper: A Scalable Quantum Programming Language. ACM PLDI 2013.
  – C. Chiw, G. Kindlmann, J. Reppy, L. Samuels, N. Seltzer. Diderot: A Parallel DSL for Image Analysis and Visualization. ACM PLDI 2012.
  – L. A. Meyerovich, M. E. Torok, E. Atkinson, R. Bodík. Superconductor: A Language for Big Data Visualization. LASH-C 2013.
Harnessing Complexity
• Compilers can do
  – Automatic parallelization
  – Optimization of (parallel) code
  – DSLs and code generation
• But well-written, hand-optimized parallel code still outperforms compiler-based approaches
Harnessing Complexity
• Compiler approaches
  – DSLs, automatic parallelization, …
• Library-based approaches
Some Examples
• Pattern-oriented
  – MapReduce (Google)
• Problem-specific
  – FLASH, adaptive-mesh refinement (AMR) code
  – GROMACS, molecular dynamics
• Hardware/programming-model specific
  – Cactus
  – libWater*

(The more specific the approach, the better the achievable performance.)
Insieme Compiler and Research
• Compiler infrastructure
• Runtime support
Insieme Research: Automatic Task Partitioning for Heterogeneous Hardware
• Heterogeneous platforms
  – E.g. CPU + 2 GPUs
• Input: OpenCL code for a single device
• Output: OpenCL code for multiple devices
• Automatic partitioning of work-items between multiple devices
  – Based on hardware, program, and input size
• Machine-learning approach

K. Kofler, I. Grasso, B. Cosenza, T. Fahringer. An Automatic Input-Sensitive Approach for Heterogeneous Task Partitioning. ACM International Conference on Supercomputing, 2013.
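The core idea of partitioning work-items can be sketched independently of the OpenCL API: given a predicted CPU share, split a one-dimensional work-item range into contiguous sub-ranges, one per device, so each device can enqueue its part with a plain global offset. An illustrative C sketch (the struct, function, and share value are invented; the actual share would come from the trained model):

```c
#include <stddef.h>

typedef struct { size_t offset, count; } subrange_t;

/* Split n work-items between CPU and GPU according to cpu_share
 * (0.0 .. 1.0), keeping both sub-ranges contiguous. */
void partition(size_t n, double cpu_share,
               subrange_t *cpu, subrange_t *gpu) {
    size_t ncpu = (size_t)(cpu_share * (double)n);
    cpu->offset = 0;
    cpu->count  = ncpu;
    gpu->offset = ncpu;       /* GPU picks up where the CPU part ends */
    gpu->count  = n - ncpu;
}
```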
Results – Architecture 1
[Bar chart: performance (%, 0 to 100) of CPU-only, GPU-only, and ANN-predicted partitioning per benchmark: DataTrans, VectorAdd, MatMul, BlackScholes, SineWave, Convolution, MolecularDyn, SPMV, LinReg, Kmeans, KNN, SYR2K, SobelFilter, MedianFilter, RayIntersect, FTLE, FC, FlowMap, Reduction, PerlinNoise, MersTwister, Compression, Pendulum, plus GeoMean.]
Results – Architecture 2
[Bar chart: performance (%, 0 to 100) of CPU-only, GPU-only, and ANN-predicted partitioning on the second architecture, over the same benchmarks: DataTrans, VectorAdd, MatMul, BlackScholes, SineWave, Convolution, MolecularDyn, SPMV, LinReg, Kmeans, KNN, SYR2K, SobelFilter, MedianFilter, RayIntersect, FTLE, FC, FlowMap, Reduction, PerlinNoise, MersTwister, Compression, Pendulum, plus GeoMean.]
Insieme Research: OpenCL on Clusters of Heterogeneous Nodes
• libWater
• OpenCL extensions for clusters
  – Event-based, an extension of OpenCL events
  – Supporting inter-device synchronization
• DQL
  – A DSL for device query, management and discovery

I. Grasso, S. Pellegrini, B. Cosenza, T. Fahringer. libWater: Heterogeneous Distributed Computing Made Easy. ACM International Conference on Supercomputing, 2013.
libWater
• Runtime built on
  – OpenCL
  – pthreads, OpenMP
  – MPI
• DAG-based command event representation
libWater: DAG Optimizations
• Dynamic Collective communication pattern Replacement (DCR)
• Latency hiding
• Intra-node copy optimizations
Insieme (Ongoing) Research: Support for DSLs

Architecture overview: DSL input codes enter a Frontend and are lowered to a common Intermediate Representation; a Transformation Framework (polyhedral model, parallel optimizations, stencil computation, automatic tuning support) operates on the IR, backed by library support (rendering algorithm implementations, geometry loader, …); the Backend emits pthreads, OpenCL, and MPI output codes, executed by the Runtime System on the target hardware: GPUs, CPUs, heterogeneous platforms, and compute clusters.
About Insieme
• Insieme compiler
  – Research framework
  – OpenMP, Cilk, MPI, OpenCL
  – Runtime, IR
  – Support for the polyhedral model
  – Multi-objective optimization
  – Machine learning
  – Extensible
• Insieme (GPL) and libWater (LGPL) soon available on GitHub