![Page 1: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/1.jpg)
Towards Intelligent Programing Systems for Modern Computing
Computer Science, North Carolina State University
Xipeng Shen
![Page 2: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/2.jpg)
Unprecedented Scale
2sources: SciDAC, IBM
Y201120 petaflops10mw power
Y202X1000 petaflops20mw power
50X perf
2X power!
![Page 3: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/3.jpg)
Heterogeneity becomes Norm
3
Massively parallel accelerators are
becoming ubiquitous.
![Page 4: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/4.jpg)
Thesis
4
To address the challenges in modern computing, one of the keys exists in making programming systems more intelligent.
For advancing programming systems, right problem formulating goes a long way.
![Page 5: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/5.jpg)
Modern Computing
5
Application Data analytics, Machine learning, …
Infrastructure Data centers, Cloud, IoT, …
Architecture Heterogeneous parallel processors, Emerging complex memory, …
TOP Algorithmic optimizer for data analytics [VLDB’15,ICML’15]
GStreamline+ PORPLE Memory optimization for GPU [ASPLOS’11, Micro’14, ICS’16]
![Page 6: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/6.jpg)
TOP: Enabling Algorithmic Optimizations for
Distance-Related Problems
6
VLDB’2015, ICML’2015
Up to 100s X speedups.
Yufei Ding
![Page 7: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/7.jpg)
Role of Compiler
7
ML Algorithm
Implementation
Execution
compiler
Runtime system/Architecture
compiler
ML experts
co-‐design
Can compilers optimize algorithms?
Learning Problem
compiler
![Page 8: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/8.jpg)
Why algorithm level?
8
Reason 1: Large benefits: orders of magnitude speedups at no extra cost.
Reason 2: Compiler may outsmart ML experts. Really?
![Page 9: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/9.jpg)
Example
9
a
d b
Triangular Inequality: a-‐b ≤ d ≤ a+b
C1
C2X
K-‐Means
![Page 10: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/10.jpg)
NIPS’2012
10
K-Means
SIAM’2010
ICML’2003
![Page 11: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/11.jpg)
11
K-NN
IJCNN’11
VisionInterface’10
SSDM’10
![Page 12: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/12.jpg)
12
P2P: Point-to-Point Shortest PathSIAM’05
ALENEX’04
![Page 13: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/13.jpg)
Observations
13
• TI has led to many enhanced algorithms across problems and domains.
• Applying TI well is tricky, hence the many manual efforts and publications.
Thoughts• Can we have an abstraction to represent all the problems? • Can we then generalize the TI optimizations into compiler-‐based transformations?
![Page 14: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/14.jpg)
14
Query Point Set Target Point Set
Distance
Relation
Constraints
Abstract Distance Problem(Q, T, D, R, C)
KMeansKNN ICP Shortest DistanceNBodyKNN join
![Page 15: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/15.jpg)
15
Abstract Distance-‐Related Problem Essence & 7 Principles of TI Optimizations
KMeansKNN ICP Shortest DistanceNBodyKNN join
Our Analysis and Abstraction
![Page 16: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/16.jpg)
16
Key Insights•Reuse through Landmarks
•Spatial & temporal reuses
•Elasticity through hierarchical landmarks
•Efficient bounds update through ghosts for iterative alg.
•Order of comparison
Lq2
q1 t1t2q3 t3
See VLDB’15 for details.
![Page 17: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/17.jpg)
17
Abstract Distance-‐Related Problem Essence & 7 Principles of TI Optimizations
KMeansKNN ICP Shortest DistanceNBodyKNN join
TOP Framework
TOP API
Compilerproblem semantic
building blocks
Opt Lib
![Page 18: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/18.jpg)
18
TOP APIBasic algorithm description
Compiler
Staged program code
TI Opt LibEfficient execution
Usage
TOP_defDistance(Euclidean);T = init();changedFlag = 1;while (changedFlag){ N = TOP_findClosestTargets(1, S, T); TOP_update(T, &changedFlag, N, S); }
Ad hoc
Systematic
![Page 19: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/19.jpg)
19
Baseline: Classic K-‐means
(16GB, 8-‐core Intel Ivy Bridge)Speedu
p (X)
K-‐Means (K=1024)
TOP Yinyang K-Means
Code link in ICML’15 paper.
Clustering results are same as original method’s.
![Page 20: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/20.jpg)
20
Speedu
p (X)
Baseline: Classic K-‐means(16GB, 8-‐core)
XX
TOP
On K-‐Means
Yinyang K-Means
![Page 21: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/21.jpg)
21
Speedups(X) by manual version0 1 102 104
Spee
dups
(X) b
y TO
P ve
rsio
n
1
102
104
KnnKnnjoinKmeansICPNbodyP2PReference line
In manual version0 106 1013
In T
OP
vers
ion
106
1013
KnnKnnjoinKmeansICPNbodyP2PReference line
Average speedups: 50X vs 20X. Save at least 93% calculations.
Speedups # distance calculations
Manually Optimized Manually Optimized TOP Optim
ized
TOP Optim
ized
Insight: The right abstraction and formulation turn a compiler into an automatic algorithm optimizer, giving out large speedups.
Intel i5-4570 CPU and 8G memory
On All Benchmarks
![Page 22: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/22.jpg)
Modern Computing
22
Application Data analytics, Machine learning, …
Infrastructure Data centers, Cloud, IoT, …
Architecture Heterogeneous parallel processors, Emerging complex memory, …
TOP Algorithmic optimizer for data analytics [VLDB’15,ICML’15]
GStreamline+ PORPLE Memory optimization for GPU [ASPLOS’11, Micro’14, ICS’16]
![Page 23: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/23.jpg)
Overcome GPU Limitations
23
Guoyang Chen (Qualcomm)
Bo Wu (Prof. @ Colorado Mines)
Zheng Zhang (Prof. @ Rutgers Univ)
![Page 24: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/24.jpg)
Xipeng Shen [email protected] 24
a SIMD group(warp)
Graphic Processing Unit (GPU)
• Massive parallelism• Favorable
• computing power• cost effectiveness• energy efficiency
![Page 25: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/25.jpg)
25
Challenges
Irregular Mem & Control
Dyn Task Parallelism
Scheduling Limitations
![Page 26: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/26.jpg)
26
Our ExplorationsCompiler-based software solutions
11/069/07
5/096/10
3/1110/11
6/122/13
9/1312/14
5/156/15
12/156/16
2/17
CUDA release
LCPC talk by David Kirk
IPDPS cross input adap. opt.
ICS remove thread diverg. dyn.
ASPLOS GStreamline
PACT treat synch. correct. GPU2CPU
ICS syn. relax. & opt. GPU2CPU
PPOPP mem coalesc.
PACT NVM for GPU
Micro PORPLE
ICS SM centric
HotOS Co-‐run on Fused
Micro Free Launch
ICS Multiview
PPOPP EffiSha
Sweet KNN; VersaPipe; Lean DNN; …
5/17
IPDPS Co-‐sched on Fused System
![Page 27: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/27.jpg)
27
Solutions
Irregular Mem & Control
Dyn Task Parallelism
Scheduling Limitations
Compiler-based software solutions
SM-‐Centric & EffiSha [ics15,ppopp17]
FreeLaunch [micro15]
Monday PPoPP Session 1
GStreamline & PORPLE [asplos11,micro14, ics16]
![Page 28: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/28.jpg)
Xipeng Shen [email protected]
Dynamic Irregularities
28
A[ ]:
P[ ] = { 0, 5, 1, 7, 4, 3, 6, 2}
... = A[P[tid]];
tid: 0 1 2 3 4 5 6 7
Degrade throughput by up to (warp size - 1) times. (warp size = 32 in modern GPUs)
memory
2 4 10 0 6 0 0A[ ]:
tid: 0 1 2 3 4 5 6 7 if (A[tid]) {...}
control flow (thread divergence)
for (i=0;i<A[tid]; i++) {...}{a mem seg.
P[ ] = { 0, 1, 2, 3, 4, 5, 6, 7}
![Page 29: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/29.jpg)
Solution 1: Thread-Data Remapping
29
{a mem seg.
4 trans/warp
{
a mem seg.
1 trans/warp
Irregularity in a warp: problematic; across warps: okey!
Principle of solution:Turn intra-warp irreg. into reg. or inter-warp irreg.
![Page 30: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/30.jpg)
Trans-1: Data Reordering
30
P[ ] = {0,5,2,3,2,3,7,6}
... = A[P[tid]];
A[ ]:
tid: 0 1 2 3 4 5 6 7
A’[ ]:
tid: 0 1 2 3 4 5 6 7
<relocation>
original
... = A’[Q[tid]];
Q[ ] = {0,1,2,3,2,3,6,7}
<redirection>
transformed
tid: thread ID; : a thread; : data access; : data relocation
maintain mapping between threads &
data values
![Page 31: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/31.jpg)
Trans-2: Job Swapping • Job = operations + data elements accessed
31
newtid = Q[tid]; . . .... = A[P[newtid]];
Q[ ] = {0,4,2,3,1,5,6,7}
<redirection>
transformed
A[ ]:... = A[P[tid]];
tid: 0 1 2 3 4 5 6 7
original
P[ ] = {0,5,2,3,2,3,7,6}
A[ ]:
tid: 0 1 2 3 4 5 6 7
![Page 32: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/32.jpg)
G-Streamline[ASPLOS’2011]
32
1.08—2.5X speedups
First framework enabling runtime thread-data remapping.
CPU-GPU pipeline to hide transformation overhead.
Kernel splitting to resolve dependences.
![Page 33: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/33.jpg)
Xipeng Shen [email protected] 33
Global memory
Texture memory
Shared memory
Constant memory
L1/L2 cacheRead-only cache
Texture cache
Solution 2: Data Placement
![Page 34: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/34.jpg)
Xipeng Shen [email protected]
GPU Memory
34
Global memory
Texture memory
Shared memory
Constant memory
L1/L2 cacheRead-only cache
Texture cache
coalescing; cache hierarchy
2D/3D locality; texture cache; read-only
on-chip; bank conflicts
broadcasting; cached; read-only
private/shared
read-only data
2D/3D locality; read-only
![Page 35: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/35.jpg)
Xipeng Shen [email protected]
Data Placement Problem
35
Global memory
Texture memory
Shared memory
Constant memory
(L1/L2 cache)(Read-only cache)
(Texture cache)
A
B
C
D
…
Data in a program
?????
3X performance difference
![Page 36: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/36.jpg)
Xipeng Shen [email protected]
Data Placement Problem
36
Properties:
Machine dependent
Changes across models/generations
Input dependent
Changes across runs
Options:
Manual efforts by programmers?
Offline autotuning?
![Page 37: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/37.jpg)
Xipeng Shen [email protected] 37
PLACER(placing engine)
MSL(mem. spec. lang.)
PORPLE-C(compiler)
architect/usermem spec
org. program
access patterns
staged program
online profile
desired placement
efficient execution
offline online
microkernels
input
PORPLE in a Whole
More details in our Micro’2014 paper.
![Page 38: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/38.jpg)
Xipeng Shen [email protected] 38
Properties of PORPLE
• Good portability to new memory
• Just need new MSL spec
• Program adapts automatically
• Adaptivity to new program inputs
• On-the-fly placement with placement-agnostic code.
• Generality to regular & irregular programs
• Static analysis + lightweight online profiling
![Page 39: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/39.jpg)
• K20c
• M2075
• C1060
GPU Models
![Page 40: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/40.jpg)
Xipeng Shen [email protected]
Potential for Future Memory Systems
40
3D Stacked Memory
Persistent Memory
DRAM (NUMA)
![Page 41: Towards Intelligent Programing Systems for Modern …SSDM’10. 12 P2P: Point-to-Point Shortest Path SIAM’05 ... • Applying’TIwell’is’tricky,’hence’the’many’manual’efforts’and’](https://reader033.vdocument.in/reader033/viewer/2022060913/60a738403588102a5a67d99c/html5/thumbnails/41.jpg)
Final Takeaways
• Large potential of compilers for modern computing
• Right problem formulation is a key
TOPAn algorithmic optimizer. Up to 100x speedups.
PORPLEPortable solution to mem. complexity. Consistent speedups cross GPUs.
GStreamline