![Page 1: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/1.jpg)
Herb Sutter
![Page 2: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/2.jpg)
1975-2005
Put a computer on every desk, in every home, in every pocket.
2005-2011
Put a parallel supercomputer on every desk, in every home, in every pocket.
2011-201x
Put a heterogeneous supercomputer on every desk, in every home, in every pocket.
Welcome to the jungle
The free lunch is so over
![Page 3: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/3.jpg)
![Page 4: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/4.jpg)
![Page 5: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/5.jpg)
Processors
Memory
![Page 6: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/6.jpg)
Processors
Memory
Xbox 360 & mainstream computer
AMD 80x86
Phenom II Athlon
Fusion APU
AMD GPU
Other GPU
![Page 7: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/7.jpg)
Processors
Memory
AMD 80x86
Phenom II Athlon
Fusion APU
AMD GPU
Cloud + GPU
Microsoft Azure Cloud Computing
Other GPU
Xbox 360 & mainstream computer
![Page 8: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/8.jpg)
Processors
Memory
![Page 9: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/9.jpg)
Processors
Memory
(GP)GPU
Multicore CPU
Cloud IaaS/HaaS
ISO C++0x
ISO C++
![Page 10: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/10.jpg)
![Page 11: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/11.jpg)
Processors
Memory
(GP)GPU
Multicore CPU
Cloud IaaS/HaaS
C++ PPL
ISO C++0x
![Page 12: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/12.jpg)
Processors
Memory
(GP)GPU
Multicore CPU
Cloud IaaS/HaaS
C++ PPL
ISO C++0x
DirectCompute
?
![Page 13: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/13.jpg)
Processors
Memory
(GP)GPU
Multicore CPU
Cloud IaaS/HaaS
C++ PPL
ISO C++0x
DirectCompute
C++ AMP: Accelerated Massive Parallelism
![Page 14: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/14.jpg)
Convert this (serial loop nest)

    void MatrixMult( float* C, const vector<float>& A, const vector<float>& B,
                     int M, int N, int W ) {
        for (int y = 0; y < M; y++)
            for (int x = 0; x < N; x++) {
                float sum = 0;
                for(int i = 0; i < W; i++)
                    sum += A[y*W + i] * B[i*N + x];
                C[y*N + x] = sum;
            }
    }
![Page 15: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/15.jpg)
Convert this (serial loop nest)

    void MatrixMult( float* C, const vector<float>& A, const vector<float>& B,
                     int M, int N, int W ) {
        for (int y = 0; y < M; y++)
            for (int x = 0; x < N; x++) {
                float sum = 0;
                for(int i = 0; i < W; i++)
                    sum += A[y*W + i] * B[i*N + x];
                C[y*N + x] = sum;
            }
    }

… to this (parallel loop, CPU or GPU)

    void MatrixMult( float* C, const vector<float>& A, const vector<float>& B,
                     int M, int N, int W ) {
        array_view<const float,2> a(M,W,A), b(W,N,B);
        array_view<writeonly<float>,2> c(M,N,C);

        parallel_for_each( c.grid, [=](index<2> idx) restrict(direct3d) {
            float sum = 0;
            for(int i = 0; i < a.x; i++)
                sum += a(idx.y, i) * b(i, idx.x);
            c[idx] = sum;
        } );
    }
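The shape of this conversion can be sketched in portable C++ without any accelerator runtime. The sketch below is an illustration only: `Index2` and `for_each_index` are hypothetical stand-ins for `index<2>` and `parallel_for_each`, not part of C++ AMP, and the driver here simply runs on the CPU in order. The point it shows is that the two outer loops disappear and the loop body becomes a kernel invoked once per output element, with no dependency between invocations.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for index<2>: one point in a 2D iteration space.
struct Index2 {
    int y;
    int x;
};

// Hypothetical stand-in for parallel_for_each over a grid. It calls the
// kernel once per (y, x). This sketch runs sequentially; because the kernel
// invocations are independent, a real runtime could run them in any order.
template <typename Kernel>
void for_each_index(int rows, int cols, Kernel kernel) {
    assert(rows >= 0 && cols >= 0);
    for (int y = 0; y < rows; ++y)
        for (int x = 0; x < cols; ++x)
            kernel(Index2{y, x});
}

// Serial loop nest, as on the slide: C (M x N) = A (M x W) * B (W x N).
void matrix_mult_serial(float* C, const std::vector<float>& A,
                        const std::vector<float>& B, int M, int N, int W) {
    for (int y = 0; y < M; y++)
        for (int x = 0; x < N; x++) {
            float sum = 0;
            for (int i = 0; i < W; i++)
                sum += A[y * W + i] * B[i * N + x];
            C[y * N + x] = sum;
        }
}

// Same computation with the two outer loops replaced by a per-index kernel.
void matrix_mult_kernel(float* C, const std::vector<float>& A,
                        const std::vector<float>& B, int M, int N, int W) {
    const float* a = A.data();
    const float* b = B.data();
    for_each_index(M, N, [=](Index2 idx) {
        float sum = 0;
        for (int i = 0; i < W; i++)
            sum += a[idx.y * W + i] * b[i * N + idx.x];
        C[idx.y * N + idx.x] = sum;
    });
}
```

Both functions should produce identical results for the same inputs; only the way the iteration space is expressed differs.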
![Page 16: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/16.jpg)
![Page 17: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/17.jpg)
17 | The Programmer’s Guide to the APU Galaxy | June 2011
EVOLUTION OF HETEROGENEOUS COMPUTING
Vertical axis: Architecture Maturity & Programmer Accessibility (Poor to Excellent)
Proprietary Drivers Era (2002 - 2008)
Graphics & Proprietary Driver-based APIs
“Adventurous” programmers
Exploit early programmable “shader cores” in the GPU
Make your program look like “graphics” to the GPU
CUDA™, Brook+, etc
Standards Drivers Era (2009 - 2011)
OpenCL™, DirectCompute Driver-based APIs
Expert programmers
C and C++ subsets
Compute centric APIs, data types
Multiple address spaces with explicit data movement
Specialized work queue based structures
Kernel mode dispatch
Architected Era (2012 - 2020)
Fusion™ System Architecture, GPU Peer Processor
Mainstream programmers
Full C++
GPU as a co-processor
Unified coherent address space
Task parallel runtimes
Nested Data Parallel programs
User mode dispatch
Pre-emption and context switching
![Page 18: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/18.jpg)
Processors
Memory
![Page 19: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/19.jpg)
PPL: Parallel Patterns Library (VS2010)
ISO C++0x
Single-core to multi-core
?
![Page 20: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/20.jpg)
PPL: Parallel Patterns Library (VS2010)
ISO C++0x
Single-core to multi-core
forall( x, y )
forall( z; w; v )
forall( k, l, m, n )
. . . ?
![Page 21: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/21.jpg)
λ
Single-core to multi-core
PPL: Parallel Patterns Library (VS2010)
ISO C++0x

    parallel_for_each( items.begin(), items.end(), [=]( Item e ) {
        … your code here …
    } );
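To make the pattern concrete without depending on PPL itself, here is a minimal sketch of an iterator-range parallel loop built on `std::thread`. The name `simple_parallel_for_each` and the `workers` parameter are assumptions made for this illustration; this is not how PPL is implemented (PPL schedules work on a task runtime rather than spawning one thread per chunk). It only shows why a lambda is a convenient way to hand "your code here" to a library that decides how to run it.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <thread>
#include <vector>

// Hypothetical helper (not PPL): splits [first, last) into contiguous chunks
// and applies fn to every element, one thread per chunk. The caller must make
// sure fn can safely run on different elements at the same time.
template <typename RandomIt, typename Fn>
void simple_parallel_for_each(RandomIt first, RandomIt last, Fn fn,
                              unsigned workers = 4) {
    using diff_t = typename std::iterator_traits<RandomIt>::difference_type;
    const diff_t total = last - first;
    if (total <= 0) return;

    const diff_t n_workers = std::max<diff_t>(1, static_cast<diff_t>(workers));
    const diff_t chunk = (total + n_workers - 1) / n_workers;  // ceiling division

    std::vector<std::thread> pool;
    for (diff_t start = 0; start < total; start += chunk) {
        const diff_t stop = std::min(total, start + chunk);
        // Each thread gets its own copy of fn and its own sub-range.
        pool.emplace_back([=] {
            std::for_each(first + start, first + stop, fn);
        });
    }
    for (std::thread& t : pool) t.join();  // wait for every chunk to finish
}
```

A call such as `simple_parallel_for_each(v.begin(), v.end(), [](int& e) { e *= 2; });` has the same shape as the slide's `parallel_for_each` call: a range plus a lambda holding the loop body.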
![Page 22: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/22.jpg)
1 language feature for multicore
and STL, functors, callbacks, events, ...
![Page 23: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/23.jpg)
Multi-core to hetero-core
C++ AMP: Accelerated Massive Parallelism
ISO C++0x
?
![Page 24: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/24.jpg)
restrict
Multi-core to hetero-core
C++ AMP: Accelerated Massive Parallelism
ISO C++0x

    parallel_for_each( items.grid, [=](index<2> i) restrict(direct3d) {
        … your code here …
    } );
![Page 25: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/25.jpg)
1 language feature for heterogeneous cores
![Page 26: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/26.jpg)
Processors
Memory
![Page 27: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/27.jpg)
Problem: Some cores don’t support the entire C++ language.
Solution: General restriction qualifiers enable expressing language subsets within the language. Direct3d math functions in the box.
Example

    double sin( double );                       // 1a: general code
    double sin( double ) restrict(direct3d);    // 1b: specific code
    double cos( double ) restrict(direct3d);    // 2: same code for either

    parallel_for_each( c.grid, [=](index<2> idx) restrict(direct3d) {
        …
        sin( data.angle );    // ok, chooses overload based on context
        cos( data.angle );    // ok
        …
    });
![Page 28: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/28.jpg)
Initially supported restriction qualifiers:
restrict(cpu): The implicit default.
restrict(direct3d): Can execute on any DX11 device via DirectCompute. Restrictions follow the limitations of the DX11 device model (e.g., no function pointers, virtual calls, goto).
Potential future directions:
restrict(pure): Declare and enforce that a function has no side effects. Great to be able to state declaratively for parallelism.
General facility for language subsets, not just about compute targets.
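Standard C++ has no restriction qualifiers, so the overload-by-context idea can only be approximated. The sketch below emulates it with tag types: `cpu_context` and `accel_context` are hypothetical names standing in for `restrict(cpu)` and `restrict(direct3d)`, and `my_sin`, `my_cos`, and `kernel` are illustrative helpers. Unlike a real restriction qualifier, nothing here makes the compiler enforce the language subset; the sketch only shows how the same call site can pick a different overload depending on the context it is compiled for.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical context tags standing in for restrict(cpu) / restrict(direct3d).
struct cpu_context {};
struct accel_context {};

// 1a: general code, free to call the full standard library.
inline double my_sin(double x, cpu_context) { return std::sin(x); }

// 1b: specific code for the restricted target, using plain arithmetic only.
// Truncated Taylor series (through x^7), reasonable only for small |x|.
inline double my_sin(double x, accel_context) {
    const double x2 = x * x;
    return x * (1.0 - x2 / 6.0 * (1.0 - x2 / 20.0 * (1.0 - x2 / 42.0)));
}

// 2: same code for either context, via cos(x) = 1 - 2 * sin^2(x / 2).
template <typename Context>
double my_cos(double x, Context ctx) {
    const double s = my_sin(x * 0.5, ctx);
    return 1.0 - 2.0 * s * s;
}

// Kernel body written once; the context argument selects the overloads.
template <typename Context>
double kernel(double angle, Context ctx) {
    return my_sin(angle, ctx) + my_cos(angle, ctx);
}
```

Calling `kernel(0.5, cpu_context{})` and `kernel(0.5, accel_context{})` runs the same kernel source against two different `my_sin` implementations, which is the effect the slide describes as choosing the overload based on context.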
![Page 29: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/29.jpg)
Problem: Memory may be flat, nonuniform, incoherent, and/or disjoint.
Solution: Portable view that works like an N-dimensional “iterator range.” Future-proof: No explicit .copy()/.sync(); data is copied or synchronized only as needed by each actual device.
Example

    void MatrixMult( float* C, const vector<float>& A, const vector<float>& B,
                     int M, int N, int W ) {
        array_view<const float,2> a(M,W,A), b(W,N,B);   // 2D views over C++ std::vector
        array_view<writeonly<float>,2> c(M,N,C);        // 2D view over C array

        parallel_for_each( c.grid, [=](index<2> idx) restrict(direct3d) {
            …
        } );
    }
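The view idea itself can be illustrated in portable C++. The following is a minimal host-only sketch, not the C++ AMP `array_view`: `View2D` and `transpose` are hypothetical names, the view is a non-owning (rows, cols, pointer) triple, and there is no device memory involved, so nothing is ever copied or synchronized. What it does show is that code written against the view does not care whether the storage behind it is a `std::vector` or a C array.

```cpp
#include <cassert>
#include <vector>

// Hypothetical non-owning 2D view over contiguous row-major storage.
template <typename T>
class View2D {
public:
    View2D(int rows, int cols, T* data)
        : rows_(rows), cols_(cols), data_(data) {
        assert(rows >= 0 && cols >= 0);
    }

    // Element access by (row, column); no bounds checking in this sketch.
    T& operator()(int row, int col) const {
        return data_[row * cols_ + col];
    }

    int rows() const { return rows_; }
    int cols() const { return cols_; }

private:
    int rows_;
    int cols_;
    T* data_;
};

// Writes the transpose of `in` into `out`. The function sees only views,
// so the caller decides what kind of storage sits behind each one.
void transpose(View2D<const float> in, View2D<float> out) {
    assert(in.rows() == out.cols() && in.cols() == out.rows());
    for (int r = 0; r < in.rows(); ++r)
        for (int c = 0; c < in.cols(); ++c)
            out(c, r) = in(r, c);
}
```

For instance, the input view can wrap `vec.data()` of a `const std::vector<float>` while the output view wraps a plain `float[]`, mirroring how the slide's `a`, `b`, and `c` wrap two vectors and a raw pointer.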
![Page 30: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/30.jpg)
![Page 31: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/31.jpg)
Bring CPU debugging experience to the GPU
![Page 32: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/32.jpg)
Bring CPU debugging experience to the GPU
![Page 33: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/33.jpg)
![Page 34: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/34.jpg)
![Page 35: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/35.jpg)
![Page 36: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/36.jpg)
# cores, not counting SIMD
OoO CPU
InO CPU
GPU
Cloud OoO
Cloud GPU
![Page 37: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/37.jpg)
# cores, not counting SIMD
OoO CPU
InO CPU
GPU
Cloud OoO
Cloud GPU
Welcome to the jungle
The free lunch is so over
![Page 38: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/38.jpg)
Processors
Memory
![Page 39: Heterogeneous Parallelism at Microsoftdeveloper.amd.com/wordpress/media/2012/10/4-Sutter...2012/10/04 · Great to be able to state declaratively for parallelism. General facility](https://reader035.vdocument.in/reader035/viewer/2022071403/60f796fcb69a4e5e507d04f2/html5/thumbnails/39.jpg)
Herb Sutter
C++ PPL: 9:45am C++ AMP: 2:00pm, Room 406