stream processing

Stream Processing Main References: “Comparing Reyes and OpenGL on a Stream Architecture”, 2002 “Polygon Rendering on a Stream Architecture”, 2000 Department of Computer Science, University of Virginia [email protected]

Page 1: Stream Processing

Stream Processing

Main References:

“Comparing Reyes and OpenGL on a Stream Architecture”, 2002
“Polygon Rendering on a Stream Architecture”, 2000

Department of Computer Science, University of Virginia
[email protected]

Page 2: Stream Processing

The Stream Programming Model

Programmable Kernel

Stream 4: data data data data data

Stream 3: data data data data data

Stream 2: data data data data data

Stream 1: data data data data data

The Main Idea

Page 3: Stream Processing

The Stream Programming Model

Programmable Kernel

Stream 4: data data data data data

Stream 3: data data data data data

Stream 2: data data data data data

Stream 1: transformed data, transformed data, transformed data, transformed data, transformed data

The Main Idea

Page 7: Stream Processing

The Stream Programming Model

Chaining Kernels
Example: The Geometry Stage of the OpenGL Pipeline

Input Vertexes -> Transform -> Shade -> Assemble -> Cull -> Project -> Toward Rasterization Stage
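Kernel chaining can be sketched in a few lines of Python. This is only an illustration of the idea on the slide, not Imagine code: the kernel bodies (`transform`, `cull`, `project`) and the helper `chain` are hypothetical stand-ins, and the shade and assemble kernels are left out for brevity.

```python
def chain(*kernels):
    """Compose kernels so the output stream of one is the input stream of the next."""
    def run(stream):
        for kernel in kernels:
            stream = kernel(stream)
        return stream
    return run

# Hypothetical toy kernels; each consumes a whole stream and produces a new one.
def transform(stream):
    # Translate every vertex (x, y, z) by +1 along x.
    return [(x + 1.0, y, z) for (x, y, z) in stream]

def cull(stream):
    # Keep only vertices in front of the viewer (z > 0).
    return [v for v in stream if v[2] > 0]

def project(stream):
    # Perspective divide onto the z = 1 plane.
    return [(x / z, y / z) for (x, y, z) in stream]

geometry_stage = chain(transform, cull, project)

vertices = [(1.0, 2.0, 2.0), (0.0, 0.0, -1.0), (3.0, 4.0, 4.0)]
projected = geometry_stage(vertices)
```

Each kernel sees only streams, so kernels can be added, removed, or swapped without touching the others.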

Page 8: Stream Processing

The Stream Programming Model
Hardware Implementation: the Imagine Stream Processor

Communicate with host and issue operations.

Page 9: Stream Processing

The Stream Programming Model
Hardware Implementation: the Imagine Stream Processor

Transfer data between parts of the chip.

Page 10: Stream Processing

The Stream Programming Model
Hardware Implementation: the Imagine Stream Processor

Local storage and reuse of intermediate streams.

Page 11: Stream Processing

The Stream Programming Model
Hardware Implementation: the Imagine Stream Processor

Store kernel code.

Page 12: Stream Processing

The Stream Programming Model
Hardware Implementation: the Imagine Stream Processor

Execute one kernel at a time.

Page 13: Stream Processing

The Stream Programming Model
Hardware Implementation: the Imagine Stream Processor

Connection with other Imagine chips.

Page 14: Stream Processing

The Stream Programming Model

Homogeneous Data Type for Efficiency

Programmable Kernel

Stream 5: data type 1, data type 1, data type 1, data type 1, data type 1

Stream 6: data type 2, data type 2, data type 2, data type 2, data type 2

Code:
    if (data type == data type 1) {...}
    if (data type == data type 2) {...}

Page 16: Stream Processing

The Stream Programming Model

Homogeneous Data Type for Efficiency

Programmable Kernel 1

Stream 5: data type 1, data type 1, data type 1, data type 1, data type 1

Stream 6: data type 2, data type 2, data type 2, data type 2, data type 2

Programmable Kernel 2

Stream 5: data type 1, data type 1, data type 1, data type 1, data type 1

Stream 5: data type 1, data type 1, data type 1, data type 1, data type 1

Stream 7: data type 1, data type 1, data type 1, data type 1, data type 1

DATA SORT
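The data-sort idea can be sketched as follows: instead of one kernel that branches on the type of every element, the mixed data is first sorted into homogeneous streams, and each stream goes to a kernel with no type test in its inner loop. The tags `"type1"` and `"type2"`, the function names, and the arithmetic inside the kernels are assumptions made for this illustration.

```python
from collections import defaultdict

def sort_by_type(mixed_stream):
    """Split a stream of (type_tag, value) pairs into one homogeneous stream per tag."""
    streams = defaultdict(list)
    for tag, value in mixed_stream:
        streams[tag].append(value)
    return dict(streams)

# Hypothetical kernels: each one handles a single data type and never branches on it.
def kernel_type1(stream):
    return [value * 2 for value in stream]

def kernel_type2(stream):
    return [value + 100 for value in stream]

mixed = [("type1", 1), ("type2", 5), ("type1", 2), ("type2", 6)]
streams = sort_by_type(mixed)
out1 = kernel_type1(streams["type1"])
out2 = kernel_type2(streams["type2"])
```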

Page 21: Stream Processing

Advantages of a Stream Processor

Programmability
    Efficient Shading
        Example: OpenGL Inefficiency

1. Draw the plane.
2. Draw the cube.
3. Redraw the cube.

Redraw the complete scene to obtain correct shadow on one object.

Page 22: Stream Processing

Advantages of a Stream Processor

Programmability
    Efficient Shading
    Hardware Implementation of New API
        API Example: Pixar’s RenderMan (Reyes Image Rendering Architecture)

Page 24: Stream Processing

Advantages of a Stream Processor

Producer-Consumer Locality Capture
    Example: OpenGL Pipeline Inefficiency

Vertexes -> Geometry Stage -> Assembled Triangles -> Rasterization Stage -> Fragments -> Composite Stage -> Pixels

Page 25: Stream Processing

Advantages of a Stream Processor

Producer-Consumer Locality Capture
    Example: OpenGL Pipeline Inefficiency

Vertexes -> Geometry Stall -> Assembled Triangles -> Rasterization Stage -> Fragments -> Composite Stage -> Pixels

Page 26: Stream Processing

Advantages of a Stream Processor

Producer-Consumer Locality Capture
    Example: OpenGL Stream Implementation

Vertex Streams -> Geometry Kernels -> Triangle Streams -> Rasterization Kernels -> Fragment Streams -> Composite Kernels -> Pixel Streams

Page 28: Stream Processing

Advantages of a Stream Processor

Flexible Resource Allocation
    Example: OpenGL Pipeline Inefficiency

Vertexes -> Geometry Stage -> Rasterization Stall -> Composite Stall

Waste of hardware capacity.

Page 29: Stream Processing

Advantages of a Stream Processor

Flexible Resource Allocation
    Example: OpenGL Stream Implementation

Vertex Streams -> Geometry Kernels -> Rasterization Kernels -> Composite Kernels

No waste: kernels are pieces of code running on the same hardware!

Page 31: Stream Processing

Advantages of a Stream Processor

Pipeline Reordering
    Example: Blending off in the OpenGL Pipeline

Part of the Rasterization/Composite Stage:

Fragments -> Texture Kernel -> Blending Kernel -> Depth Kernel

Many fragments are needlessly textured.

Page 32: Stream Processing

Advantages of a Stream Processor

Pipeline Reordering
    Example: Blending off in the OpenGL Pipeline

Part of the Rasterization/Composite Stage:

Fragments -> Depth Kernel -> Texture Kernel

We can reorder the pipeline.
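The saving from reordering can be made concrete with a small sketch. It assumes blending is off, so a fragment that fails the depth test contributes nothing to the image; the fragment layout `(x, y, depth)`, the function names, and the lookup counter are hypothetical and serve only to count texture work.

```python
def texture(fragment, counter):
    """Toy texture kernel: counts lookups and returns a color for the fragment."""
    counter["lookups"] += 1
    x, y, depth = fragment
    return f"texel@{depth}"

def render(fragments, depth_first):
    """Run the depth and texture kernels in either order; return (image, lookups)."""
    zbuf, image = {}, {}
    counter = {"lookups": 0}
    for frag in fragments:
        x, y, depth = frag
        nearest = zbuf.get((x, y), float("inf"))
        if depth_first:
            if depth >= nearest:
                continue            # occluded: dropped before any texture work
            color = texture(frag, counter)
        else:
            color = texture(frag, counter)
            if depth >= nearest:
                continue            # occluded: the texture work above was wasted
        zbuf[(x, y)] = depth
        image[(x, y)] = color
    return image, counter["lookups"]

fragments = [(0, 0, 0.2), (0, 0, 0.9), (0, 0, 0.5), (1, 0, 0.3)]
image_a, lookups_texture_first = render(fragments, depth_first=False)
image_b, lookups_depth_first = render(fragments, depth_first=True)
```

Both orders produce the same image here, but the depth-first order textures only the fragments that survive the depth test.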

Page 33: Stream Processing

Advantages of a Stream Processor

Obvious Scalability
    Data Level Parallelism

Fragments -> three Texture Kernels running in parallel
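Data-level parallelism can be sketched by splitting one fragment stream across several copies of the same kernel. The sketch below uses Python threads purely as a stand-in for parallel hardware units; `texture_kernel`, `run_data_parallel`, and the choice of three units are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def texture_kernel(fragments):
    """Toy kernel: tag every fragment (x, y) as textured."""
    return [(x, y, "textured") for (x, y) in fragments]

def run_data_parallel(kernel, stream, n_units):
    """Split one stream into n_units sub-streams and run the same kernel on each."""
    chunks = [stream[i::n_units] for i in range(n_units)]
    with ThreadPoolExecutor(max_workers=n_units) as pool:
        results = list(pool.map(kernel, chunks))
    # Merge the partial output streams; the original element order is not preserved.
    return [item for chunk in results for item in chunk]

fragments = [(x, 0) for x in range(9)]
textured = run_data_parallel(texture_kernel, fragments, n_units=3)
```

Because the kernel treats every element independently, adding more units only changes how the stream is split.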

Page 34: Stream Processing

Advantages of a Stream Processor

Obvious Scalability
    Functional Parallelism

Texture Kernel, Blending Kernel, Depth Kernel

Page 35: Stream Processing

Imagine’s Performance

That looks great!

Page 36: Stream Processing

Imagine’s Performance

“Interaction between host processor and graphics subsystem not modeled” in Imagine.

“Many hardware-accelerated systems are limited by the bus between the processor and the graphics subsystem”.

Page 37: Stream Processing

Imagine’s Performance

“Imagine’s clock rate is also significantly higher (500 MHz vs. 120 MHz)”.

Page 38: Stream Processing

Imagine’s Performance

Page 39: Stream Processing

Imagine’s Performance

But the comparison is still “instructive”.

“Running our tests on commercial systems gives a sense of relative complexity”.

[Figure: NVIDIA Quadro and Imagine relative performance; frame rate normalized to the Sphere test]

Page 40: Stream Processing

Conclusions on Imagine Performance

Year 2000: “Implementing polygon rendering on a stream processor allows performance approaching that of special-purpose graphics hardware while at the same time providing the flexibility traditionally associated with a software-only implementation”

Page 43: Stream Processing

Conclusions on Imagine Performance

Year 2002: “The lack of specialization hurts Imagine’s performance compared to modern graphics processors”.

“When comparing graphics algorithms, [the lack of specialization] does make Imagine performance-neutral to the algorithms employed”.

Page 44: Stream Processing

Comparing Reyes and OpenGL on a Stream Architecture

Why?

[Diagram: axes Frame Speed vs. Frame Complexity / Quality, with OpenGL and Reyes as two points]

OpenGL speed: interactive (50 frames per second).

Reyes speed: fast enough to compute the frames of a 2-hour movie in one year (1 frame every 3 minutes, or about 0.006 frames per second).
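The Reyes figure can be checked with a short calculation. The playback rate of 24 frames per second is an assumption (the slide does not state it); with it, one year of rendering allows roughly 3 minutes per frame.

```python
MOVIE_HOURS = 2
PLAYBACK_FPS = 24          # assumption: standard film rate, not stated on the slide

total_frames = MOVIE_HOURS * 3600 * PLAYBACK_FPS      # frames in the movie
minutes_per_year = 365 * 24 * 60                      # rendering budget in minutes
minutes_per_frame = minutes_per_year / total_frames   # about 3.04 minutes

# The slide rounds to 3 minutes per frame, i.e. 1 frame every 180 seconds.
render_fps_rounded = 1 / (3 * 60)                     # about 0.0056 frames per second
speed_ratio = 50 / render_fps_rounded                 # interactive 50 fps vs. Reyes rate
```

Under these assumptions the interactive target is about 9000 times faster than the offline one.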

Page 45: Stream Processing

Comparing Reyes and OpenGL on a Stream Architecture

Why?

[Diagram: axes Frame Speed vs. Frame Complexity / Quality, with OpenGL and Reyes as two points]

OpenGL quality / complexity: variable...

Reyes quality / complexity: indistinguishable from live-action motion picture photography; as complex as real scenes.

Page 47: Stream Processing

The OpenGL Pipeline Command Specification

    glBegin(GL_TRIANGLES);
        glColor3f(0.5, 0.8, 0.9);
        glVertex3f(5., 0.4, 100.);
        glVertex3f(0.6, 101., 102.);
        glVertex3f(2., 5., 6.);
    glEnd();
    etc...

Object Space

Page 48: Stream Processing

The OpenGL Pipeline Per Vertex Operation

Eye Space

Page 49: Stream Processing

The OpenGL Pipeline Per Vertex Operation: Lighting, Shading

Eye Space

Programmable Stage

Page 50: Stream Processing

The OpenGL Pipeline Assembly

Eye Space

Page 51: Stream Processing

The OpenGL Pipeline Per Primitive Operation: Clip and Project

Eye Space

Page 53: Stream Processing

The OpenGL Pipeline Rasterization: Interpolation

Screen Space

Page 54: Stream Processing

The OpenGL Pipeline Rasterization: Fragment Generation

Screen Space

Page 56: Stream Processing

The OpenGL Pipeline Per Fragment Operation: Texturing and Blending

Screen Space

Programmable Stage

Page 57: Stream Processing

The OpenGL Pipeline Composite: visibility filter

Screen Space

Page 58: Stream Processing

The Reyes Pipeline Command specification

Fractals Graftals Bezier surfaces etc...

Object Space

Page 59: Stream Processing

The Reyes Pipeline Tessellation.

Splitting of big primitives into smaller ones. Dicing into micropolygons.

Eye Space

Sphere split into patches. Patches split into grids of micropolygons.

1/2 pixel

Knowledge of Screen Space
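Dicing can be sketched as follows. The sketch assumes that "1/2 pixel" means a micropolygon side of about half a pixel, that the screen-space extent of the patch is already known, and that the patch is a function of parameters (u, v) in [0, 1]; the names `dice_resolution`, `dice`, and `flat_patch` are hypothetical.

```python
import math

def dice_resolution(width_px, height_px, target_px=0.5):
    """Grid size so that each micropolygon is at most target_px on a side."""
    nu = max(1, math.ceil(width_px / target_px))
    nv = max(1, math.ceil(height_px / target_px))
    return nu, nv

def dice(patch, nu, nv):
    """Evaluate patch(u, v) on a regular grid and return the quads (micropolygons)."""
    pts = [[patch(i / nu, j / nv) for i in range(nu + 1)] for j in range(nv + 1)]
    quads = []
    for j in range(nv):
        for i in range(nu):
            quads.append((pts[j][i], pts[j][i + 1], pts[j + 1][i + 1], pts[j + 1][i]))
    return quads

def flat_patch(u, v):
    # Toy patch covering 4 x 2 pixels on screen.
    return (4.0 * u, 2.0 * v)

nu, nv = dice_resolution(4.0, 2.0)
micropolygons = dice(flat_patch, nu, nv)
```

The grid resolution depends on the projected size of the patch, which is why this stage needs knowledge of screen space.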

Page 60: Stream Processing

The Reyes Pipeline Flat shading, texturing, blending.

Eye Space

1/2 pixel

Programmable Stage

Page 61: Stream Processing

The Reyes Pipeline Jittering or stochastic sampling to eliminate artifacts.

Screen Space

1 Pixel

16 subpixels

Page 62: Stream Processing

The Reyes Pipeline Jittering or stochastic sampling.

Screen Space

1 Pixel

Random displacement
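Jittered sampling of one pixel with 16 subpixels can be sketched like this: the pixel is divided into a 4 x 4 grid and one sample is placed at a random position inside each cell. The function name, the unit pixel size, and the seeded generator are assumptions for this illustration.

```python
import random

def jittered_samples(px, py, grid=4, rng=None):
    """Return grid*grid sample positions inside pixel (px, py), one per subpixel."""
    if rng is None:
        rng = random.Random()
    cell = 1.0 / grid
    samples = []
    for j in range(grid):
        for i in range(grid):
            # Random displacement within subpixel (i, j).
            sx = px + (i + rng.random()) * cell
            sy = py + (j + rng.random()) * cell
            samples.append((sx, sy))
    return samples

samples = jittered_samples(10, 20, rng=random.Random(0))
```

Keeping one sample per subpixel preserves even coverage, while the random displacement breaks up the regular pattern that causes aliasing.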

Page 64: Stream Processing

The Reyes Pipeline Depth filtering to obtain final image.

Screen Space

Page 65: Stream Processing

Difference between OpenGL and Reyes

OpenGL                                        Reyes
Two programmable stages.                      One programmable stage.
Mipmapping (non-coherent texture access).     Coherent texture access.
Primitives are triangles.                     Primitives are micropolygons.
Does not support high-order data types.       Supports high-order data types (e.g. Bezier surfaces).

Reyes hardware implementation: easier.

Page 66: Stream Processing

Difference between OpenGL and Reyes

Reyes saves in computation and memory bandwidth.

Page 67: Stream Processing

Difference between OpenGL and Reyes

Reyes advantages: easy storage of primitives, load balance, parallelization.

OpenGL advantages: work factorization for shading and lighting.

Page 68: Stream Processing

Difference between OpenGL and Reyes

Reyes advantages: easy storage of primitives, load balance, parallelization.

Triangle size gets smaller and smaller in modern graphics scenes.

Page 69: Stream Processing

Difference between OpenGL and Reyes

Reyes reduces the necessary bandwidth between host CPU and graphics card.

Page 70: Stream Processing

Implementation on the Stream Processor

OpenGL modifications:
    Programmable shader added.
    Barycentric rasterizer algorithm instead of scanline algorithm.

Reyes modifications:
    No supersampling.
    Micropolygon size is not half a pixel anymore.
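A barycentric rasterizer can be sketched as below. This is a generic textbook formulation, not the implementation used on Imagine: every pixel center in a bounding region is tested with edge functions, and the normalized edge values are the barycentric weights used for interpolation. The function names and the sampling at pixel centers are assumptions.

```python
def edge(a, b, p):
    """Signed area (times 2) of triangle (a, b, p); the sign tells which side p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(v0, v1, v2, width, height):
    """Return (x, y, (w0, w1, w2)) for every pixel whose center lies in the triangle."""
    area = edge(v0, v1, v2)
    fragments = []
    if area == 0:
        return fragments  # degenerate triangle
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)
            w0 = edge(v1, v2, p) / area
            w1 = edge(v2, v0, p) / area
            w2 = edge(v0, v1, p) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                fragments.append((x, y, (w0, w1, w2)))
    return fragments

fragments = rasterize((0.0, 0.0), (4.0, 0.0), (0.0, 4.0), width=4, height=4)
```

Unlike a scanline algorithm, every pixel is tested independently of its neighbors, which fits the stream model of applying one kernel to many independent elements.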

Page 71: Stream Processing

Implementation on the Stream Processor

[Diagram: axes Frame Speed vs. Frame Complexity / Quality, with OpenGL and Reyes as two points]

Page 72: Stream Processing

Implementation on the Stream Processor

[Diagram: axes Frame Speed vs. Frame Complexity / Quality, with the Enhanced OpenGL Implementation and the Degraded Reyes Implementation as two points]

Page 73: Stream Processing

Implementation on the Stream Processor

OpenGL Implementation: Isim Simulator
    Models complete Imagine architecture.

Reyes Implementation: Idebug Simulator
    Does not model kernel stalls.
    Does not model cluster occupancy effects.
    Increased size of dynamically addressable memory.

How to compare the results?

Page 74: Stream Processing

Implementation on the Stream Processor

Results of Idebug multiplied by 20%

Page 75: Stream Processing

Results

Page 76: Stream Processing

Conclusion

“When comparing graphics algorithms, [the lack of specialization] does make Imagine performance-neutral to the algorithms employed”.

“Our Reyes implementation made slight changes to the simulated Imagine hardware [...] Having a larger [size of addressable memory] was vital for kernel efficiency”.

Page 77: Stream Processing

Conclusion

“Imagine is an appropriate platform for comparing different rendering algorithms toward an eventual goal of high-performance hardware implementation.”

Page 78: Stream Processing

Conclusion

“Continued work in the area of efficient and powerful subdivision algorithms is necessary to allow a Reyes pipeline to demonstrate comparable performance to its OpenGL counterpart.”