[Slide 1]
Building-Blocks for Performance Oriented DSLs
Tiark Rompf, Martin Odersky (EPFL)
Arvind Sujeeth, HyoukJoong Lee, Kevin Brown, Hassan Chafi, Kunle Olukotun (Stanford University)
[Slide 2]
DSL Benefits
Make programmers more productive
Raise the level of abstraction
Easier to reason about programs
Maintenance, verification, etc.
[Slide 3]
Performance Oriented DSLs
Make compiler more productive, too!
Generate better code
Optimize using domain knowledge
Target heterogeneous + parallel hardware
[Slide 4]
DSLs under Development
Liszt (mesh-based PDE solvers)
  DeVito et al.: Liszt: A Domain-Specific Language for Building Portable Mesh-based PDE Solvers. Supercomputing (SC) 2011
OptiML (machine learning)
  Sujeeth et al.: OptiML: An Implicitly Parallel Domain-Specific Language for Machine Learning. International Conference on Machine Learning (ICML) 2011
OptiQL (data querying)
All embedded in Scala
Heterogeneous compilation (multi-core CPU/GPU)
Good absolute performance and speedups
[Slide 5]
Common DSL Infrastructure
Don’t start from scratch for each new DSL. It’s just too hard.
Delite Framework + Runtime
  See also Brown et al.: A Heterogeneous Parallel Framework for Domain-Specific Languages. PACT 2011
This Talk/Paper: Building blocks that work together in new or interesting ways
[Slide 6]
Focus on two things:
#1: DeliteOps
  A high-level view of common execution patterns (e.g., loops)
  Parallelism and heterogeneous targets
#2: Staging
  DSL programs are program generators
  Move (costly) abstraction to the generating stage
Case study: the SPADE app in OptiML
[Slide 7]
#1: DeliteOps
[Slide 8]
Heterogeneous Parallel Programming

| Hardware | Programming model |
|---|---|
| Cray Jaguar | MPI |
| Sun T2 | Pthreads, OpenMP |
| Nvidia Fermi | CUDA, OpenCL |
| Altera FPGA | Verilog, VHDL |

Today: Performance = heterogeneous + parallel
[Slide 9]
Heterogeneous Parallel Programming
Your favourite Java, Haskell, Scala, or C++ compiler will not generate code for these platforms. Compilers have not kept pace!
[Slide 10]
Programmability Chasm

Too many different programming models:
Cray Jaguar (MPI), Sun T2 (Pthreads, OpenMP), Nvidia Fermi (CUDA, OpenCL), Altera FPGA (Verilog, VHDL)

Applications: Virtual Worlds, Personal Robotics, Data Informatics, Scientific Engineering
[Slide 11]
DeliteOps

Capture common parallel execution patterns: map, filter, reduce, …, join, bfs, …
Map them efficiently to a variety of target platforms: multi-core CPU, GPU
Express your DSL as DeliteOps ⇒ parallelism for free!
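To make the idea concrete, here is a tiny, hypothetical sketch (the names `DeliteOp`, `OpReduce`, and `vectorSum` are invented for illustration and are not the actual Delite API): a DSL operation declares only the shape of its computation, and the framework owns the execution strategy, so the same op could later be scheduled across threads or compiled to a GPU kernel without changing the DSL.

```scala
// Hypothetical sketch of the DeliteOps idea: an op declares only its
// loop "shape" (here: reduce); the framework supplies the schedule.
trait DeliteOp[A] { def exec(): A }

// A reduce pattern over an indexed collection: the DSL author supplies
// the element function and the combiner; execution stays generic.
case class OpReduce[A](size: Int, elem: Int => A, combine: (A, A) => A, zero: A)
    extends DeliteOp[A] {
  def exec(): A = {
    // Sequential reference implementation; a real backend could split
    // the index range across threads or emit a GPU kernel instead.
    var acc = zero
    var i = 0
    while (i < size) { acc = combine(acc, elem(i)); i += 1 }
    acc
  }
}

// "vector.sum" expressed as a reduce over its indices
def vectorSum(v: Array[Double]): Double =
  OpReduce[Double](v.length, i => v(i), _ + _, 0.0).exec()
```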
[Slide 12]
Delite DSL Compiler

Provide a common intermediate representation (IR) that can be extended while still benefiting from generic analysis and optimization
Extend the common IR with nodes that encode data-parallel execution patterns; now parallel optimizations and mapping are possible
The DSL extends the appropriate data-parallel nodes for its operations; now domain-specific analysis and optimization are possible
Generate an execution graph, kernels, and data structures

[Diagram: Liszt and OptiML programs enter through the Scala Embedding Framework and the Delite Parallelism Framework, producing a layered IR: the Base IR (generic analysis & optimization), the Delite IR (parallelism analysis, optimization & mapping), and domain-specific IRs (domain analysis & optimization). Code generation then emits the Delite Execution Graph, kernels (Scala, C, CUDA, MPI, Verilog, …), and data structures (arrays, trees, graphs, …).]
[Slide 13]
Delite Op Fusion
Operates on all loop-based ops
Reduces op overhead and improves locality
Elimination of temporary data structures
Merging loop bodies may enable further optimizations
Fuse both dependent and side-by-side operations
Fused ops can have multiple inputs + outputs
Algorithm: fuse two loops if
  size(loop1) == size(loop2), and
  there are no mutual dependencies (that aren’t removed by fusing)
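The fusion condition can be sketched on a toy IR (an invented simplification, not the Delite implementation): each loop records its size and the loops it depends on; two loops fuse when their sizes match and they are not mutually dependent.

```scala
// Toy IR for the fusion rule: a loop knows its size and which other
// loops it depends on. (Invented sketch for illustration only.)
case class Loop(id: Int, size: Int, deps: Set[Int])

// Fuse if sizes match and the loops are not mutually dependent;
// a one-way producer/consumer dependency is fine and gets fused too.
def canFuse(a: Loop, b: Loop): Boolean =
  a.size == b.size && !(a.deps.contains(b.id) && b.deps.contains(a.id))

// The fused loop carries both loops' external dependencies; the
// dependency between the two partners becomes loop-internal.
def fuse(a: Loop, b: Loop): Loop =
  Loop(a.id, a.size, (a.deps ++ b.deps) - a.id - b.id)
```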
[Slide 14]
Delite Op Fusion
Source program:

```scala
def square(x: Rep[Double]) = x*x
def mean(xs: Rep[Array[Double]]) =
  xs.sum / xs.length
def variance(xs: Rep[Array[Double]]) =
  xs.map(square).sum / xs.length - square(mean(xs))

val array1 = Array.fill(n) { i => 1 }
val array2 = Array.fill(n) { i => 2*i }
val array3 = Array.fill(n) { i => array1(i) + array2(i) }
val m = mean(array3)
val v = variance(array3)
println(m)
println(v)
```

Generated code after fusion:

```scala
// begin reduce x47,x51,x11
var x47 = 0; var x51 = 0; var x11 = 0
while (x11 < x0) {
  val x44 = 2.0*x11
  val x45 = 1.0+x44
  val x50 = x45*x45
  x47 += x45
  x51 += x50
  x11 += 1
}
// end reduce
val x48 = x47/x0
val x49 = println(x48)
val x52 = x51/x0
val x53 = x48*x48
val x54 = x52-x53
val x55 = println(x54)
```

Before fusion: 3 + 1 + (1 + 1) = 6 traversals and 4 intermediate arrays. After fusion: 1 traversal and 0 arrays.
[Slide 15]
#2: Staging
How do we go from DSL source to DeliteOps?
[Slide 16]
2 Challenges:
#1: generate intermediate representation (IR) from DSL code embedded in Scala
#2: do it in such a way that the IR is free from unnecessary abstraction
Avoid abstraction penalty!
[Slide 17]
Example

DSL program:

```scala
val v = Vector.rand(100)
println("today’s lucky number is: ")
println(v.sum)
```

DSL interface (exposes only the abstract type Rep[T]):

```scala
abstract class Vector[T]
def vector_rand(n: Rep[Int]): Rep[Vector[Double]]
def infix_sum[T:Numeric](v: Rep[Vector[T]]): Rep[T]
```

DSL implementation (fixes Rep[T] = Exp[T], with IR classes Exp[T] and Def[T]):

```scala
case class VectorRand(n: Exp[Int]) extends Def[Vector[Double]]
case class VectorSum[T:Numeric](in: Exp[Vector[T]])
    extends DeliteOpReduce[Exp[T]] {
  def func = (a,b) => a + b
}

def vector_rand(n: Exp[Int]) = new VectorRand(n)
def infix_sum[T:Numeric](v: Exp[Vector[T]]) = new VectorSum(v)
```
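The split between interface and implementation can be seen in a minimal, self-contained sketch (simplified, with invented helper names like `reflect`; the real framework is much richer): smart constructors record Def nodes instead of computing values, so "running" the DSL program builds an IR.

```scala
// Minimal staging sketch: Exp[T] are staged values, Def[T] are IR nodes.
abstract class Exp[+T]
case class Const[T](x: T) extends Exp[T]
case class Sym[T](id: Int) extends Exp[T]
abstract class Def[T]
case class Plus(a: Exp[Int], b: Exp[Int]) extends Def[Int]

// In the implementation, Rep[T] is fixed to Exp[T].
type Rep[T] = Exp[T]

// Smart constructor: record the definition, return a fresh symbol.
var defs = List[Def[_]]()
def reflect[T](d: Def[T]): Exp[T] = { defs = defs :+ d; Sym[T](defs.length - 1) }

def infix_plus(a: Rep[Int], b: Rep[Int]): Rep[Int] = reflect(Plus(a, b))

// "Running" the program does no arithmetic; it builds two IR nodes.
val program = infix_plus(Const(1), infix_plus(Const(2), Const(3)))
```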
[Slide 18]
“Finally Tagless” / Polymorphic Embedding
  Carette, Kiselyov, Shan: Finally Tagless, Partially Evaluated: Tagless Staged Interpreters for Simpler Typed Languages. APLAS 2007 / J. Funct. Prog. 2009
  Hofer, Ostermann, Rendel, Moors: Polymorphic Embedding of DSLs. GPCE 2008
Lightweight Modular Staging (LMS)
  Rompf, Odersky: Lightweight Modular Staging: A Pragmatic Approach to Runtime Code Generation and Compiled DSLs. GPCE 2010
[Slide 19]
Can use the full host language to compose DSL program fragments!
Move (costly) abstraction to the generating stage
[Slide 20]
Example
Use higher order functions in DSL programs
While keeping the DSL first order!
[Slide 21]
Higher-Order Functions

DSL program and staged definitions:

```scala
val xs: Rep[Vector[Int]] = …
println(xs.count(x => x > 7))

def infix_foreach[A](v: Rep[Vector[A]])(f: Rep[A] => Rep[Unit]) = {
  var i: Rep[Int] = 0
  while (i < v.length) {
    f(v(i))
    i += 1
  }
}

def infix_count[A](v: Rep[Vector[A]])(f: Rep[A] => Rep[Boolean]) = {
  var c: Rep[Int] = 0
  v foreach { x => if (f(x)) c += 1 }
  c
}
```

Generated (first-order) code:

```scala
val v: Array[Int] = ...
var c = 0
var i = 0
while (i < v.length) {
  val x = v(i)
  if (x > 7)
    c += 1
  i += 1
}
println(c)
```
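One way to see why the function value disappears is a string-based staging sketch (representing Rep[T] as a code string is a deliberate simplification for illustration, not how Delite represents its IR): the predicate f is an ordinary Scala function that runs during generation, so only its output text reaches the generated program.

```scala
// String-based staging sketch: a Rep[T] is just generated code text.
type Rep[T] = String
val out = new StringBuilder
def emit(s: String): Unit = out.append(s).append("\n")

// The predicate f is applied at generation time; no closure survives
// into the generated first-order code.
def infix_count(v: Rep[Array[Int]])(f: Rep[Int] => Rep[Boolean]): Rep[Int] = {
  emit("var c = 0; var i = 0")
  emit("while (i < " + v + ".length) {")
  emit("  if (" + f(v + "(i)") + ") c += 1")  // f runs here, now
  emit("  i += 1")
  emit("}")
  "c"
}

val result = infix_count("xs")(x => x + " > 7")
```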
[Slide 22]
Continuations

```scala
val u,v,w: Rep[Vector[Int]] = ...
nondet {
  val a = amb(u)
  val b = amb(v)
  val c = amb(w)
  require(a*a + b*b == c*c)
  println("found:")
  println(a,b,c)
}

def amb[T](xs: Rep[Vector[T]]): Rep[T] @cps[Rep[Unit]] = shift { k =>
  xs foreach k
}

def require(x: Rep[Boolean]): Rep[Unit] @cps[Rep[Unit]] = shift { k =>
  if (x) k() else ()
}
```

Generated code (schematically):

```scala
while (…) {
  while (…) {
    while (…) {
      if (…) {
        println("found:")
        println(a,b,c)
      }
    }
  }
}
```
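The same control flow can be followed in plain Scala with explicit continuation passing (a sketch that sidesteps the @cps/shift compiler plugin used on the slide; staging then removes even these function values): each amb runs the rest of the computation once per element, which is exactly the nest of loops in the generated code.

```scala
// Explicit-CPS sketch of amb/require (plain Scala, invented helpers).
// Each choice point invokes the continuation k once per element, so
// nested calls correspond to nested loops.
def amb[T](xs: Seq[T])(k: T => Unit): Unit = xs.foreach(k)
def require(x: Boolean)(k: () => Unit): Unit = if (x) k()

var found = List[(Int, Int, Int)]()
val u = 1 to 5; val v = 1 to 5; val w = 1 to 5

// Search for Pythagorean triples, as in the slide's example
amb(u) { a => amb(v) { b => amb(w) { c =>
  require(a*a + b*b == c*c) { () => found = (a, b, c) :: found }
}}}
```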
[Slide 23]
Result
Function values and continuations translated away by staging
Control flow strictly first order
Much simpler analysis for other optimizations
[Slide 24]
Regular Compiler Optimizations

Common subexpression elimination and dead code elimination
Global code motion
Symbolic execution / pattern rewrites
Coarse-grained: optimizations happen on whole vectors, matrices, or loops
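Common subexpression elimination at this coarse granularity can be sketched with hash-consing smart constructors (an invented toy IR, not the Delite implementation): structurally equal pure definitions are looked up in a table and reuse the same symbol, so duplicate work is never emitted.

```scala
// Hash-consing sketch: pure definitions are deduplicated on creation.
sealed trait Def
case class Times(a: String, b: String) extends Def
case class MatMul(a: String, b: String) extends Def  // coarse-grained op

var table = Map[Def, String]()
def toSym(d: Def): String =
  table.getOrElse(d, { val s = "x" + table.size; table += (d -> s); s })

// The second occurrence of a*b maps to the same symbol: CSE for free.
val e1 = toSym(Times("a", "b"))
val e2 = toSym(Times("a", "b"))
val m1 = toSym(MatMul("A", "B"))
```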
[Slide 25]
In the Paper:
Removing data structure abstraction
Partial evaluation/symbolic execution of staged IR
Effect abstractions
Extending the framework/modularity
[Slide 26]
Case Study: OptiML
A DSL For Machine Learning
[Slide 27]
OptiML: A DSL for Machine Learning

Provides a familiar (MATLAB-like) language and API for writing ML applications
  Ex.: val c = a * b (where a, b are Matrix[Double])
Implicitly parallel data structures
  General data types: Vector[T], Matrix[T], Graph[V,E], independent from the underlying implementation
  Specialized data types: Stream, TrainingSet, TestSet, IndexVector, Image, Video, …
  Encode semantic information and structured, synchronized communication
Implicitly parallel control structures
  sum{…}, (0::end) {…}, gradient {…}, untilconverged {…}
  Anonymous functions with restricted semantics can be passed as arguments to the control structures
[Slide 28]
Putting It All Together: SPADE

Downsample: L1 distances between all 10^6 events in 13-dimensional space, reduced to 50,000 events.

```scala
val distances = Stream[Double](data.numRows, data.numRows){
  (i,j) => dist(data(i),data(j))
}
for (row <- distances.rows) {
  if (densities(row.index) == 0) {
    val neighbors = row find { _ < apprxWidth }
    densities(neighbors) = row count { _ < kernelWidth }
  }
}
```
[Slide 29]
```scala
val distances = Stream[Double](data.numRows, data.numRows){
  (i,j) => dist(data(i),data(j))
}
for (row <- distances.rows) {
  row.init // expensive! part of the stream foreach operation
  if (densities(row.index) == 0) {
    val neighbors = row find { _ < apprxWidth }
    densities(neighbors) = row count { _ < kernelWidth }
  }
}
```

SPADE transformations: row is 235,000 elements in one typical dataset, so fusing is a big win!
[Slide 30]
SPADE Generated Code

```scala
// FOR EACH ELEMENT IN ROW
while (x155 < x61) {
  val x168 = x155 * x64
  var x180 = 0
  // INITIALIZE STREAM VALUE (dist(i,j))
  while (x180 < x64) {
    val x248 = x164 + x180
    // …
  }
  // VECTOR FIND
  if (x245) x201.insert(x201.length, x155)
  // VECTOR COUNT
  if (x246) {
    val x207 = x208 + 1
    x208 = x207
  }
  x155 += 1
}
```

From a ~5-line algorithm description in OptiML to an efficient, fused, imperative version that closely resembles a hand-optimized C++ baseline!
[Slide 31]
Impact of Op Fusion

Per-bar labels from the chart (normalized execution time, SPADE) over 1, 2, 4, and 8 processors:

| Processors | 1 | 2 | 4 | 8 |
|---|---|---|---|---|
| C++ | 0.9 | 1.8 | 3.3 | 5.6 |
| OptiML (fusing) | 1.0 | 1.9 | 3.4 | 5.8 |
| OptiML (no fusing) | 0.3 | 0.6 | 0.9 | 1.0 |
[Slide 32]
Experiments on Larger Apps

Per-bar labels from the charts (normalized execution time, 1/2/4/8 CPUs), OptiML vs. C++:

TM: OptiML 1.0 / 1.7 / 3.1 / 4.9; C++ 0.7
SPADE: OptiML 1.0 / 1.9 / 3.4 / 5.8; C++ 0.9 / 1.8 / 3.3 / 5.6
LBP: OptiML 1.0 / 1.7 / 2.5 / 3.3; C++ 1.2 / 1.5 / 3.5 / 5.4
[Slide 33]
Experiments on ML Kernels

[Charts: normalized execution time for GDA, K-means, RBM, Naive Bayes, Linear Regression, and SVM on 1, 2, 4, 8 CPUs and CPU+GPU, comparing OptiML, parallelized MATLAB, and MATLAB + Jacket. OptiML is competitive with or faster than MATLAB on the CPU and shows large additional speedups on the GPU for several kernels (e.g., 41.3 on GDA).]
[Slide 34]
Summary

Performance oriented DSLs are a promising parallel programming platform, capable of achieving portability, productivity, and high performance
Delite can simplify the task of implementing DSLs
OptiML outperforms MATLAB and C++ on a set of well-known machine learning applications, with expressive code
[Slide 35]
Questions?
[Slide 36]
Programming Language Design Space: a trade-off triangle between Performance, Productivity, and Generality
[Slide 38]
General Purpose Languages and Performance Oriented DSLs occupy different corners of the Performance / Productivity / Generality triangle
[Slide 39]
DSLs Present a New Problem

We need to develop all these DSLs
Current DSL methods are unsatisfactory
[Slide 40]
Current DSL Development Approaches

Stand-alone DSLs
  Can include extensive optimizations
  Enormous effort to develop to a sufficient degree of maturity: actual compiler/optimizations, tooling (IDE, debuggers, …)
  Interoperation between multiple DSLs is very difficult

Purely embedded DSLs ⇒ “just a library”
  Easy to develop (can reuse the full host language); easier to learn
  Can combine multiple DSLs in one program
  Can share DSL infrastructure among several DSLs
  But hard to optimize using domain knowledge, and limited to the same target architecture as the host language

We need to do better
[Slide 41]
DSLs: trade off generality for productivity and performance

DSL embedding: combine the benefits of pure embedding with the analyzability of external DSLs