doing more with gramps - stanford universityyoel/notes/091210-gramps-mr-gcafe.pdf4 gramps review (2)...

28
Doing More With GRAMPS Jeremy Sugerman 10 December 2009 GCafe

Upload: others

Post on 26-Feb-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Doing More With GRAMPS - Stanford Universityyoel/notes/091210-gramps-mr-gcafe.pdf4 GRAMPS Review (2) • Shaders: data-parallel, plus push • Threads/Fixed-function: stateful / tasks

Doing More With GRAMPS

Jeremy Sugerman10 December 2009

GCafe

Page 2: Doing More With GRAMPS - Stanford Universityyoel/notes/091210-gramps-mr-gcafe.pdf4 GRAMPS Review (2) • Shaders: data-parallel, plus push • Threads/Fixed-function: stateful / tasks

2

Introduction

• Past: GRAMPS for building renderers

• This Talk: GRAMPS in two new domains: map-reduce and rigid body physics

Page 3: Doing More With GRAMPS - Stanford Universityyoel/notes/091210-gramps-mr-gcafe.pdf4 GRAMPS Review (2) • Shaders: data-parallel, plus push • Threads/Fixed-function: stateful / tasks

3

GRAMPS Review (1)

• Programming model / API / run-time for heterogeneous many-core machines

• Applications are:– Graphs of multiple stages (cycles allowed)– Connected via queues

• Interesting workloads are irregular

Page 4: Doing More With GRAMPS - Stanford Universityyoel/notes/091210-gramps-mr-gcafe.pdf4 GRAMPS Review (2) • Shaders: data-parallel, plus push • Threads/Fixed-function: stateful / tasks

4

GRAMPS Review (2)

• Shaders: data-parallel, plus push• Threads/Fixed-function: stateful / tasks

Fram

ebuffer

RastFragment

ShadeBlend

Example Rasterization Pipeline

Page 5: Doing More With GRAMPS - Stanford Universityyoel/notes/091210-gramps-mr-gcafe.pdf4 GRAMPS Review (2) • Shaders: data-parallel, plus push • Threads/Fixed-function: stateful / tasks

5

GRAMPS Review (3)

• Queue set:– single logical queue, independent subqueues

• Synchronization and parallel consumption• Binning, screen-space subdivision, etc.

Fram

ebuffer

RastFragment

ShadeBlend

Page 6: Doing More With GRAMPS - Stanford Universityyoel/notes/091210-gramps-mr-gcafe.pdf4 GRAMPS Review (2) • Shaders: data-parallel, plus push • Threads/Fixed-function: stateful / tasks

6

Map-Reduce

• Popular parallel idiom:

• Used at both cluster and multi-core scale• Analytics, indexing, machine learning, …

Map:Foreach(input) {

Do something

Emit(key, &val)

}

Reduce:Foreach(key) {

Process values

EmitFinalResult()

}

Page 7: Doing More With GRAMPS - Stanford Universityyoel/notes/091210-gramps-mr-gcafe.pdf4 GRAMPS Review (2) • Shaders: data-parallel, plus push • Threads/Fixed-function: stateful / tasks

7

Map-Reduce: Combine

• Reduce often has high overhead:– Buffering of intermediate pairs (storage, stall)

– Load imbalance across keys– Serialization within a key

• In practice, Reduce is often associative and commutative (and simple).

• Combine phase enables incremental, parallel reduction

Page 8: Doing More With GRAMPS - Stanford Universityyoel/notes/091210-gramps-mr-gcafe.pdf4 GRAMPS Review (2) • Shaders: data-parallel, plus push • Threads/Fixed-function: stateful / tasks

8

Map-Reduce in GRAMPS

• A few API extensions• A few new optimizations

OutputInput Params

Produce Map ReduceCombine

Splits Pairs:Key0: vals[]Key1: vals[]…

Pairs:Key0: vals[]Key1: vals[]…

Page 9: Doing More With GRAMPS - Stanford Universityyoel/notes/091210-gramps-mr-gcafe.pdf4 GRAMPS Review (2) • Shaders: data-parallel, plus push • Threads/Fixed-function: stateful / tasks

9

Extension: Queue Sets

• Make queue sets more dynamic– Create subqueues on demand

– Sparsely indexed ‘keyed’ subqueues– ANY_SUBQUEUE flag for Reserve

Make-Grid(obj):For (cells in o.bbox) {

key = linearize(cell)

PushKey(out, key, &o)

}

Collide():While (Reserve(input, ANY))

For (each o1, o2 pair)

if (o1 overlaps o2)

...

Page 10: Doing More With GRAMPS - Stanford Universityyoel/notes/091210-gramps-mr-gcafe.pdf4 GRAMPS Review (2) • Shaders: data-parallel, plus push • Threads/Fixed-function: stateful / tasks

10

Extension: Instanced Threads

• Automatic instancing of thread stages– One to one with input subqueues

– Only when processing is independent

Make-Grid(obj):For (cells in o.bbox) {

key = linearize(cell)

PushKey(out, key, &o)

}

Collide(subqueue):For (each o1, o2 pair)

if (o1 overlaps o2)

...

Page 11: Doing More With GRAMPS - Stanford Universityyoel/notes/091210-gramps-mr-gcafe.pdf4 GRAMPS Review (2) • Shaders: data-parallel, plus push • Threads/Fixed-function: stateful / tasks

11

Extension: Fan-in Shaders

• Enable shader parallel partial reductions– Input: One packet, Output: One element

– Can operate in-place or as a filter– Run-time coalesces mostly empty packets

Sum(packet):For (i < packet.numEl)

sum += packet.v[i]

packet.v[0] = sum

packet.numEl = 1

Histogram(pixels):For (i < pixels.numEl){

c = .3r + .6g + .1b

PushKey(out,c/256,1)

}

Page 12: Doing More With GRAMPS - Stanford Universityyoel/notes/091210-gramps-mr-gcafe.pdf4 GRAMPS Review (2) • Shaders: data-parallel, plus push • Threads/Fixed-function: stateful / tasks

12

Digression: Combine is a builtin

• Alternatives:– Regular shader accumulating with atomics

– GPGPU multi-pass shader reduction– Manually replicated thread stages

– Fan-in with same queue as input and output

• Reality: Messy, micro-managed, slow– Run-time should hide complexity, not export it

Page 13: Doing More With GRAMPS - Stanford Universityyoel/notes/091210-gramps-mr-gcafe.pdf4 GRAMPS Review (2) • Shaders: data-parallel, plus push • Threads/Fixed-function: stateful / tasks

13

Optimizations

• Aggressive shader instancing• Per-subqueue push coalescing• Per-core scoreboard

Page 14: Doing More With GRAMPS - Stanford Universityyoel/notes/091210-gramps-mr-gcafe.pdf4 GRAMPS Review (2) • Shaders: data-parallel, plus push • Threads/Fixed-function: stateful / tasks

14

• App Provides:– Produce, Guts of: map, combine, reduce

• Run-time Provides:– GRAMPS bindings, elems per packet

GRAMPS Map-ReduceO

utputInput Params

Produce MapReduce

(instanced)Combine(in-place)

Splits Pairs:Key0: vals[]Key1: vals[]…

Pairs:Key0: vals[]Key1: vals[]…

Page 15: Doing More With GRAMPS - Stanford Universityyoel/notes/091210-gramps-mr-gcafe.pdf4 GRAMPS Review (2) • Shaders: data-parallel, plus push • Threads/Fixed-function: stateful / tasks

15

GRAMPS Map-Reduce Apps

Based on Phoenix map-reduce apps:• Histogram: Quantize input image into 256

buckets• Linear Regression: For a set of (x,y) pairs,

compute average x, x², y, y², and xy• PCA: For a matrix M, compute the mean

of each row and the covariance of all pairs of rows

Page 16: Doing More With GRAMPS - Stanford Universityyoel/notes/091210-gramps-mr-gcafe.pdf4 GRAMPS Review (2) • Shaders: data-parallel, plus push • Threads/Fixed-function: stateful / tasks

16

Map-Reduce App Results

4700 KB2300 KB97.2%Histogram-512

20 KB10 KB96.2%(combine)

205 KB100 KB65.5%LR-32768

1.5 KB1 KB97.0%(combine)

1 KB.5 KB99.2%PCA-128

Footprint

(Peak)

Footprint

(Avg.)

Occupancy

(CPU-Like)

Page 17: Doing More With GRAMPS - Stanford Universityyoel/notes/091210-gramps-mr-gcafe.pdf4 GRAMPS Review (2) • Shaders: data-parallel, plus push • Threads/Fixed-function: stateful / tasks

17

Histogram 512x512

Page 18: Doing More With GRAMPS - Stanford Universityyoel/notes/091210-gramps-mr-gcafe.pdf4 GRAMPS Review (2) • Shaders: data-parallel, plus push • Threads/Fixed-function: stateful / tasks

18

Histogram 512x512 (Combine)

Page 19: Doing More With GRAMPS - Stanford Universityyoel/notes/091210-gramps-mr-gcafe.pdf4 GRAMPS Review (2) • Shaders: data-parallel, plus push • Threads/Fixed-function: stateful / tasks

19

Histogram 512x512 (GPU)

Page 20: Doing More With GRAMPS - Stanford Universityyoel/notes/091210-gramps-mr-gcafe.pdf4 GRAMPS Review (2) • Shaders: data-parallel, plus push • Threads/Fixed-function: stateful / tasks

20

PCA 128x128 (CPU)

Page 21: Doing More With GRAMPS - Stanford Universityyoel/notes/091210-gramps-mr-gcafe.pdf4 GRAMPS Review (2) • Shaders: data-parallel, plus push • Threads/Fixed-function: stateful / tasks

21

PCA 128x128 (GPU)

Page 22: Doing More With GRAMPS - Stanford Universityyoel/notes/091210-gramps-mr-gcafe.pdf4 GRAMPS Review (2) • Shaders: data-parallel, plus push • Threads/Fixed-function: stateful / tasks

22

Sphere Physics

Page 23: Doing More With GRAMPS - Stanford Universityyoel/notes/091210-gramps-mr-gcafe.pdf4 GRAMPS Review (2) • Shaders: data-parallel, plus push • Threads/Fixed-function: stateful / tasks

23

Sphere Physics

A (simplified) proxy for rigid body physics:Generate N spheres, initial velocity

while(true) {

• Find all pairs of intersecting spheres

• Compute ∆v to resolve collision (conserve energy, momentum)

• Compute updated result velocity and position

}

Page 24: Doing More With GRAMPS - Stanford Universityyoel/notes/091210-gramps-mr-gcafe.pdf4 GRAMPS Review (2) • Shaders: data-parallel, plus push • Threads/Fixed-function: stateful / tasks

24

1. Split Spheres into chunks of N2. Emit(cell, sphereNum) for each sphere3. Emit(s1, s2) for each intersection in cell4. For each sphere, resolve and update

GRAMPS: Sphere PhysicsParams

SplitMakeGrid

ResolveCollide

Cell

Spheres

Page 25: Doing More With GRAMPS - Stanford Universityyoel/notes/091210-gramps-mr-gcafe.pdf4 GRAMPS Review (2) • Shaders: data-parallel, plus push • Threads/Fixed-function: stateful / tasks

25

CPU-Like: 256 Spheres

Page 26: Doing More With GRAMPS - Stanford Universityyoel/notes/091210-gramps-mr-gcafe.pdf4 GRAMPS Review (2) • Shaders: data-parallel, plus push • Threads/Fixed-function: stateful / tasks

26

Future Work

• Tuning:– Push, combine coalesce efficiency

– Map-Reduce chunk sizes for split, reduce

• Extensions to enable more shader usage in Sphere Physics?

• Flesh out how/where to apply application enhancements, optimizations

Page 27: Doing More With GRAMPS - Stanford Universityyoel/notes/091210-gramps-mr-gcafe.pdf4 GRAMPS Review (2) • Shaders: data-parallel, plus push • Threads/Fixed-function: stateful / tasks

27

Other People’s Work

• Improved sim: model ILP and caches• Micropolygon rasterization, fixed functions

• x86 many-core:

Page 28: Doing More With GRAMPS - Stanford Universityyoel/notes/091210-gramps-mr-gcafe.pdf4 GRAMPS Review (2) • Shaders: data-parallel, plus push • Threads/Fixed-function: stateful / tasks

28

Thank You

• Questions?