Streaming Algorithms in Graphics Hardware

Page 1: Streaming Algorithms In Graphics Hardware

Streaming Algorithms In Graphics Hardware

Suresh Venkatasubramanian

AT&T Labs–Research

Streaming Algorithms in Graphics Hardware – p.1/22

Page 2: Streaming Algorithms In Graphics Hardware

Two Converging Trends In Computation...

– The accelerated development of graphics accelerator cards (GPUs)

Current graphics accelerators are cheap and ubiquitous.

They are developing faster than CPUs (roughly 1.7 times faster per year).

– The increasing need for streaming computations

Original motivation from dealing with large data sets.

Also interesting from the perspective of multimedia computations, image processing, visualization, and other areas.

Page 4: Streaming Algorithms In Graphics Hardware

Graphics Cards Can Compute!

A graphics card takes a stream of objects (points, lines, triangles) and renders them on a screen.

Each pixel in the screen can be viewed as a small processing unit.

[Figure: a graphics card, with the per-pixel operations glBlend and z-test.]

Page 5: Streaming Algorithms In Graphics Hardware

Large Set Of Diverse Applications

Occlusion Culling in scenes

Shading on objects

View dependent Simplification of Shapes

Geometric Optimization

Motion Planning and Collision Detection

Image processing (wavelet analysis)

Physical Simulations

Scientific Computations (matrix multiplication)

Data analysis (especially spatial data)

Page 6: Streaming Algorithms In Graphics Hardware

THE GRAPHICS PIPELINE: A CLOSER LOOK

Page 7: Streaming Algorithms In Graphics Hardware

Suresh Writes A Program

#include <GL/gl.h>
...
glLight(..);   // Set lighting
glOrtho(..);   // Set viewpoint

// Now draw objects
glColor3f(1, 0, 0);
glBegin(GL_TRIANGLES);
glVertex3f(x1, y1, z1);
...
glEnd();

gcc triangle.cc -lGL

Page 8: Streaming Algorithms In Graphics Hardware

Processing Objects in the GPU: Step 1

[Figure: The Fixed-Function Pipeline. The CPU sends vertices, color, and lighting to the GPU, whose stages are viewpoint calculations and color transforms, lighting, and rasterization; the output is a stream of fragments.]

Page 9: Streaming Algorithms In Graphics Hardware

Processing fragments in the GPU: Step 2

[Figure: The Fixed-Function Pipeline, continued. Fragments, with access to texture memory, pass through the α-test, stencil test, and depth test, and are then blended into the frame buffer, which the GPU sends to the display.]

Page 10: Streaming Algorithms In Graphics Hardware

So where’s the computation?

Stencil test: if (buffer.stencil == K) continue, else drop fragment.

Depth test: if (frag.depth < buffer.depth) continue, else drop fragment.

Blending operations: buffer.color = buffer.color op fragment.color

– General arithmetic and boolean functions for blending.

– General comparison functions.

– Convolution and histogramming operators.
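
The three per-fragment operations on this slide can be mimicked in software. What follows is a minimal Python sketch, not OpenGL code: the `Pixel` and `Fragment` classes, the `process_fragment` function, and the decision to write a surviving fragment's depth back to the buffer are all assumptions made for illustration.

```python
from dataclasses import dataclass
import operator


@dataclass
class Pixel:
    """State held at one frame-buffer location (hypothetical layout)."""
    stencil: int = 0
    depth: float = float("inf")
    color: float = 0.0


@dataclass
class Fragment:
    """One incoming fragment aimed at a pixel (hypothetical layout)."""
    depth: float
    color: float


def process_fragment(pixel, frag, stencil_ref, blend_op=operator.add):
    """Apply stencil test, depth test, then blending; return True if the fragment survived."""
    if pixel.stencil != stencil_ref:       # stencil test: buffer.stencil == K ?
        return False                       # drop fragment
    if not frag.depth < pixel.depth:       # depth test: frag.depth < buffer.depth ?
        return False                       # drop fragment
    # blending: buffer.color = buffer.color op fragment.color
    pixel.color = blend_op(pixel.color, frag.color)
    pixel.depth = frag.depth               # assumption: depth writes are enabled
    return True


pixel = Pixel()
stream = [
    Fragment(depth=5.0, color=1.0),
    Fragment(depth=7.0, color=1.0),   # farther than the stored depth, so dropped
    Fragment(depth=3.0, color=2.0),
]
survived = [process_fragment(pixel, f, stencil_ref=0) for f in stream]
```

With additive blending the pixel accumulates the colors of the fragments that pass both tests. Passing `max` or `min` as `blend_op` turns the same loop into a running maximum or minimum over the surviving fragments.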

Page 11: Streaming Algorithms In Graphics Hardware

Programmable Pipelines

[Figure: the pipeline stages (viewpoint calculations and color transforms, lighting, rasterization, fragments), with a vertex program and a fragment program marked on it.]

Vertex program executes on each vertex.

Fragment program executes on each fragment.

Page 12: Streaming Algorithms In Graphics Hardware

Capabilities

Large instruction set: general purpose arithmetic and scientific calculations on scalars and vectors.

Programs can be large: hundreds of instructions can be executed in a single pass.

Texture buffers allow more general purpose memory access.

Some limited pointer indirection for array lookups.

No looping in fragment programs; some looping permitted in vertex programs.

Page 13: Streaming Algorithms In Graphics Hardware

Haven’t We Seen This Before?

Standard streaming model of computation

[Figure: an input stream of numbers flows through a stream algorithm, which has access to a memory, and produces an output.]

What’s different?

Limited memory (really a constant, versus polylog n in the standard model).

Pipelining restriction: all items have to be treated the same way.

Multi-pass potential: standard streaming models assume exactly one pass (with a few exceptions).

Page 14: Streaming Algorithms In Graphics Hardware

Maybe We Have Seen This Before

Systolic Arrays [Kung+Leiserson 1978]

Graphics cards are a special case (1-D) of systolic arrays.

They have more memory access.

Early graphics card design was in the framework of systolic computation!
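
To make the 1-D systolic picture concrete, here is a small Python sketch of a linear array of cells that sorts a stream: each cell keeps the smaller of its stored value and the incoming value, and passes the larger one on to its neighbor. This is a common textbook illustration, not something taken from the slides, and it simulates the cells sequentially rather than clock tick by clock tick. The name `systolic_sort` is hypothetical.

```python
def systolic_sort(stream):
    """Sort a stream with a simulated linear array of compare-and-pass cells."""
    cells = []  # cells[i] holds the i-th smallest value seen so far
    for value in stream:
        moving = value
        for i in range(len(cells)):
            # Each cell keeps the smaller value and forwards the larger one.
            if moving < cells[i]:
                cells[i], moving = moving, cells[i]
        cells.append(moving)  # the value that falls off the end fills a new cell
    return cells


result = systolic_sort([4, 1, 5])
```

Every cell runs the same comparison and only talks to its right-hand neighbor, which is the sense in which the array is both pipelined and one-dimensional.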

Page 24: Streaming Algorithms In Graphics Hardware

Graphics Card: Streaming Pipelined Architecture

Objects are presented to the card one-by-one.

Once processed, an object is passed to the next phase and does not return.

Spatial Parallelism: Each pixel processes a different stream.

There is limited local memory: each object essentially carries its own state with it.

Pipelining: Each object is processed in the same way.

Significant advantages accrue from exploiting data parallelism and the pipeline model.
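
The properties on this slide (objects arrive one by one, each stage hands its output forward and never sees it again, every object is handled the same way) map naturally onto chained generators. The Python sketch below is an illustration only: the stage names `transform`, `rasterize`, and `blend`, and the use of axis-aligned rectangles as the objects, are assumptions and not part of any graphics API.

```python
def transform(objects, dx, dy):
    """Per-object stage: translate each rectangle (a stand-in for viewpoint calculations)."""
    for (x0, y0, x1, y1) in objects:
        yield (x0 + dx, y0 + dy, x1 + dx, y1 + dy)


def rasterize(objects, width, height):
    """Turn each rectangle [x0, x1) x [y0, y1) into a stream of pixel fragments."""
    for (x0, y0, x1, y1) in objects:
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                yield (x, y)


def blend(fragments, width, height):
    """Per-pixel stage: each pixel only sees the fragments that land on it."""
    frame = [[0] * width for _ in range(height)]
    for x, y in fragments:
        frame[y][x] += 1  # additive blending: count how often the pixel is covered
    return frame


rects = [(0, 0, 2, 2), (1, 1, 3, 3)]
frame = blend(rasterize(transform(rects, dx=1, dy=0), 4, 4), 4, 4)
```

No stage stores the objects it has already processed; the only persistent state is the frame buffer, where each pixel accumulates its own stream.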

Page 25: Streaming Algorithms In Graphics Hardware

EXAMPLES

Page 31: Streaming Algorithms In Graphics Hardware

An Example: Voronoi Diagrams [HCKLM99]

Render right-angled cones for each point.

Set depth-test to LESS, so only the closest points to the viewpoint are rendered.

Also get the diameter for free, using GREATER.
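
The cone-rendering idea can be imitated in software on a small pixel grid. In the Python sketch below, the depth of a cone's fragment at a pixel is taken to be the Euclidean distance from the pixel to the cone's apex, which is what a right-angled cone viewed from directly above produces up to a constant. The function `render_cones`, the grid size, and the sample sites are assumptions for illustration; this is not the implementation from [HCKLM99].

```python
import math


def render_cones(sites, width, height, depth_test="LESS"):
    """Render one right-angled cone per site; return (color buffer, depth buffer)."""
    if depth_test == "LESS":
        passes, clear = (lambda new, old: new < old), math.inf
    elif depth_test == "GREATER":
        passes, clear = (lambda new, old: new > old), -math.inf
    else:
        raise ValueError("depth_test must be 'LESS' or 'GREATER'")
    depth = [[clear] * width for _ in range(height)]
    color = [[None] * width for _ in range(height)]
    for site_id, (sx, sy) in enumerate(sites):      # cones arrive one by one
        for y in range(height):
            for x in range(width):
                frag_depth = math.hypot(x - sx, y - sy)  # cone height above this pixel
                if passes(frag_depth, depth[y][x]):      # the depth test
                    depth[y][x] = frag_depth
                    color[y][x] = site_id
    return color, depth


sites = [(1, 1), (6, 2), (3, 6)]
voronoi, _ = render_cones(sites, 8, 8, "LESS")
_, far_depth = render_cones(sites, 8, 8, "GREATER")
# With GREATER, the depth stored at a site's own pixel is its distance to the farthest site.
diameter = max(far_depth[sy][sx] for sx, sy in sites)
```

With LESS, each pixel ends up colored by its nearest site, which is a discretized Voronoi diagram. With GREATER, each pixel records its farthest site, and reading the depth buffer at the site pixels gives the diameter of the point set.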

Page 32: Streaming Algorithms In Graphics Hardware

Bounding Box [AKMV03]

Each point in the primal is dualized to a plane.

Framebuffer viewed as dual plane: each pixel represents a direction.

Upper and lower envelopes in the dual give extreme points (à la convex hulls).

Superimposing different duals (using the Gauss map), a simple fragment program computes the bounding box.
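
A rough software analogue of the envelope idea, in two dimensions: for each sampled direction, the largest and smallest projections of the points play the role of the upper and lower envelope values at that "pixel", and their difference is the extent of the point set in that direction. The Python sketch below skips the explicit point-to-plane duality and the Gauss map and evaluates the envelopes directly, so it should be read as an illustration of the principle and not as the algorithm of [AKMV03]. The function name `min_bounding_box` and the `steps` parameter are hypothetical.

```python
import math


def min_bounding_box(points, steps=90):
    """Return (area, angle) of the smallest enclosing rectangle over sampled orientations."""
    best = None
    for i in range(steps):
        theta = (math.pi / 2) * i / steps          # orientations in [0, pi/2)
        u = (math.cos(theta), math.sin(theta))     # one side direction
        v = (-u[1], u[0])                          # the perpendicular direction
        extents = []
        for d in (u, v):
            upper, lower = -math.inf, math.inf     # cleared envelope values for this direction
            for (px, py) in points:                # points stream past one at a time
                h = px * d[0] + py * d[1]          # projection of the point onto d
                upper = max(upper, h)              # upper envelope at d
                lower = min(lower, h)              # lower envelope at d
            extents.append(upper - lower)
        area = extents[0] * extents[1]
        if best is None or area < best[0]:
            best = (area, theta)
    return best


area, theta = min_bounding_box([(0, 0), (4, 0), (4, 2), (0, 2)])
```

Each point contributes to every direction independently and only through `max` and `min`, which is the kind of update a depth test or a simple fragment program can perform per pixel.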

Page 33: Streaming Algorithms In Graphics Hardware

Quantile Computation

We want to compute the $k$-th highest element of a sequence.

Depth ordering in scenes.

Natural streaming primitive (selection and sorting).

Relates to various geometric optimization problems.

Easy in stream model:

[MP80]: Computing in $p$ passes requires $\Omega(n^{1/p})$ memory.

[MRL98]: $(1+\epsilon)$-approximation to rank in ONE pass with $O\left(\frac{1}{\epsilon}\log^2(\epsilon n)\right)$ memory.

[GK01]: $O\left(\frac{1}{\epsilon}\log(\epsilon n)\right)$ memory.

None of these algorithms are pipelined.

Page 34: Streaming Algorithms In Graphics Hardware

One- and two-sided tests [GKMV03]

With hardware, we have $O(1)$ memory $\Rightarrow$ $\Omega(\log n)$ passes for a general streaming algorithm.

Depth test provides the one-sided test “Is fragment.depth $< a$?”

Lemma. Computing the $k$-th highest element of a sequence requires [bound missing in source] passes with a one-sided test.

Suppose we had a two-sided test “Is $a <$ fragment.depth $< b$?”

Lemma. With a two-sided depth test, the $k$-th highest element can be computed in $\log n$ passes (randomized).
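
One way to see why a two-sided test helps is a randomized selection that keeps only a window $(lo, hi)$ and a few counters between passes. The Python sketch below is an assumption-laden illustration and not the algorithm of [GKMV03]: it assumes distinct values, picks a random pivot among the elements that pass the two-sided test, and spends two passes per round (one to sample the pivot, one to count). The name `kth_highest` and the `seed` parameter are hypothetical.

```python
import random


def kth_highest(values, k, seed=0):
    """Return (k-th highest value, passes used) for distinct numbers; k=1 is the maximum."""
    rng = random.Random(seed)
    lo, hi = float("-inf"), float("inf")   # two-sided window that contains the answer
    above = 0                              # number of values known to be >= hi
    passes = 0
    while True:
        # Pass A: pick a uniformly random survivor of the two-sided test
        # (reservoir sampling, so only one candidate is stored).
        pivot, survivors = None, 0
        for x in values:
            if lo < x < hi:
                survivors += 1
                if rng.randrange(survivors) == 0:
                    pivot = x
        passes += 1
        if pivot is None:
            raise ValueError("k must be between 1 and len(values)")
        # Pass B: count the survivors above the pivot, again with a two-sided test.
        greater = 0
        for x in values:
            if pivot < x < hi:
                greater += 1
        passes += 1
        rank = above + greater + 1         # rank of the pivot, counted from the top
        if rank == k:
            return pivot, passes
        if rank < k:                       # answer lies below the pivot
            hi, above = pivot, rank
        else:                              # answer lies above the pivot
            lo = pivot


value, passes_used = kth_highest([4, 3, 51, 91, 16, 25, 1], k=3)
```

Each round shrinks the window around the answer, and a random pivot discards a constant fraction of the survivors in expectation, so the expected number of rounds grows logarithmically with the length of the sequence.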

Page 35: Streaming Algorithms In Graphics Hardware

Where do we find a two-sided test?

Shadow test in pipeline (only in nVidia chips) [C02].

Used to render shadows on objects.

Functionally, provides a (texture) buffer for the second side of the test.

α-test is used to simulate the second side.

This can also be done using fragment programs.

Other areas where a two-sided test is useful [GKMV03]:

Sweeping an arrangement of shapes.

Used to compute boolean combinations of objects.

Page 36: Streaming Algorithms In Graphics Hardware

How Do We Write Programs?

Cg (from nVidia): C-like system calls are compiled into vertex and fragment programs.

Can compile for different targets (OpenGL/DirectX).

Can incorporate limits on programs on different cards.

HLSL: Microsoft High Level Shader Language.

GL 2.0: OpenGL standard for higher level programming constructs.

– General Purpose Stream Programming

High level stream programming constructs built over shader languages (BROOK).

Page 39: Streaming Algorithms In Graphics Hardware

Pipelined Streaming: Conclusions

These architectures are ever more prevalent.

Graphics chips are a good platform for general purpose computing.

Numerous applications; demonstrable performance gain.

What computational model do these architectures fit into?

Strictly weaker than general streaming; probably stronger than circuits.

Are results from systolic computation useful?

New ideas are needed for proving upper/lower bounds because of the multipass nature of computations.