commodity components 1 greg humphreys october 10, 2002

89
Commodity Components 1 Greg Humphreys Greg Humphreys October 10, 2002 October 10, 2002

Upload: bailey-stricklen

Post on 14-Jan-2016

217 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Commodity Components 1 Greg Humphreys October 10, 2002

Commodity Components 1Commodity Components 1

Greg HumphreysGreg HumphreysOctober 10, 2002October 10, 2002

Page 2: Commodity Components 1 Greg Humphreys October 10, 2002

Life Is But a (Graphics) StreamLife Is But a (Graphics) Stream

C C C C

T

TT

T

S S S S

“There’s a bug somewhere…”

Page 3: Commodity Components 1 Greg Humphreys October 10, 2002

You Spent How Much Money You Spent How Much Money On On WhatWhat Exactly? Exactly?

You Spent How Much Money You Spent How Much Money On On WhatWhat Exactly? Exactly?

Page 4: Commodity Components 1 Greg Humphreys October 10, 2002

Several Years of Failed ExperimentsSeveral Years of Failed Experiments

1975-1980

1982-1986

1980-1982

1986-Present

Page 5: Commodity Components 1 Greg Humphreys October 10, 2002

The ProblemThe Problem

Scalable graphics solutions are rare Scalable graphics solutions are rare and expensiveand expensive

Commodity technology is getting Commodity technology is getting fasterfaster

But it tends not to scaleBut it tends not to scale

Cluster graphics solutions have been Cluster graphics solutions have been inflexibleinflexible

Page 6: Commodity Components 1 Greg Humphreys October 10, 2002

Big ModelsBig Models

Scans of Saint Matthew (386 MPolys) and the David (2 GPolys) Stanford Digital Michelangelo Project

Page 7: Commodity Components 1 Greg Humphreys October 10, 2002
Page 8: Commodity Components 1 Greg Humphreys October 10, 2002

Large DisplaysLarge Displays

Window system and large-screen interaction metaphorsFrançois Guimbretière, Stanford University HCI group

Page 9: Commodity Components 1 Greg Humphreys October 10, 2002

Modern Graphics ArchitectureModern Graphics ArchitectureNVIDIA GeForce4 Ti 4600NVIDIA GeForce4 Ti 4600Amazing technology:Amazing technology:

• 4.8 Gpix/sec (antialiased)4.8 Gpix/sec (antialiased)

• 136 Mtri/sec136 Mtri/sec

• Programmable pipeline stagesProgrammable pipeline stages

Capabilities increasing at roughly Capabilities increasing at roughly 225% per year225% per year

But it doesn’t scale:But it doesn’t scale:• Triangle rate is bus-limited – 136 Mtri/sec mostly Triangle rate is bus-limited – 136 Mtri/sec mostly

unachievableunachievable

• Display resolution growing very slowlyDisplay resolution growing very slowly

Result: There is a serious gap between dataset Result: There is a serious gap between dataset complexity and processing powercomplexity and processing power

GeForce4 Die Plot Courtesy NVIDIA

Page 10: Commodity Components 1 Greg Humphreys October 10, 2002

Why Clusters?Why Clusters?

Commodity partsCommodity parts• Complete graphics pipeline on a single chip

• Extremely fast product cycle

FlexibilityFlexibility• Configurable building blocks

CostCost• Driven by consumer demand

• Economies of scale

AvailabilityAvailability• Insufficient demand for “big iron” solutions

• Little or no ongoing innovation in graphics “supercomputers”

Page 11: Commodity Components 1 Greg Humphreys October 10, 2002

Stanford/DOE Visualization ClusterStanford/DOE Visualization Cluster32 nodes, each with graphics32 nodes, each with graphics

Compaq SP750Compaq SP750• Dual 800 MHz PIII Xeon

• i840 logic

• 256 MB memory

• 18 GB disk

• 64-bit 66 MHz PCI

• AGP-4x

GraphicsGraphics• 16 NVIDIA Quadro2 Pro

• 16 NVIDIA GeForce 3

NetworkNetwork• Myrinet (LANai 7 ~ 100 MB/sec)

Page 12: Commodity Components 1 Greg Humphreys October 10, 2002

Virginia Cluster?Virginia Cluster?

Page 13: Commodity Components 1 Greg Humphreys October 10, 2002

IdeasIdeas

Technology for driving tiled displaysTechnology for driving tiled displays•Unmodified applications

•Efficient network usage

Scalable rendering rates on clustersScalable rendering rates on clusters•161 Mtri/sec at interactive rates to a display wall

•1.6 Gvox/sec at interactive rates to a single display

Cluster graphics as stream processingCluster graphics as stream processing•Virtual graphics interface

•Flexible mechanism for non-invasive transformations

Page 14: Commodity Components 1 Greg Humphreys October 10, 2002

The Name GameThe Name Game

One idea, two systems:One idea, two systems:WireGLWireGL

• Sort-first parallel rendering for tiled displays

• Released to the public in 2000

ChromiumChromium• General stream processing framework

• Multiple parallel rendering architectures

• Open-source project started in June 2001

• Alpha release September 2001

• Beta release April 2002

• 1.0 release September 2002

Page 15: Commodity Components 1 Greg Humphreys October 10, 2002

General ApproachGeneral Approach

Replace system’s OpenGL driverReplace system’s OpenGL driver• Industry standard API

•Support existing unmodified applications

Manipulate streams of API commandsManipulate streams of API commands•Route commands over a network

•Track state!

•Render commands using graphics hardware

Allow parallel applications to issue Allow parallel applications to issue OpenGLOpenGL•Constrain ordering between multiple streams

Page 16: Commodity Components 1 Greg Humphreys October 10, 2002

Cluster GraphicsCluster Graphics

•Raw scalability is easy (just add more pipelines)•One of our goals is to expose that scalability to an application

AppGraphicsHardware

Display

AppGraphicsHardware

Display

AppGraphicsHardware

Display

AppGraphicsHardware

Display

Page 17: Commodity Components 1 Greg Humphreys October 10, 2002

Cluster GraphicsCluster Graphics

• Graphics hardware is indivisible• Each graphics pipeline managed by a network server

App Server Display

App Server Display

App Server Display

App Server Display

Page 18: Commodity Components 1 Greg Humphreys October 10, 2002

Cluster GraphicsCluster Graphics

• Flexible number of clients, servers and displays• Compute limited = more clients• Graphics limited = more servers• Interface/network limited = more of both

Server

App

Server Display

App

Server Display

App

Server

Page 19: Commodity Components 1 Greg Humphreys October 10, 2002

Output ScalabilityOutput Scalability

Larger displays with Larger displays with unmodifiedunmodified applicationsapplicationsOther possibilities: broadcast, ring networkOther possibilities: broadcast, ring network

App

Server

Server

Server

...

Display

Display

Display

...

Page 20: Commodity Components 1 Greg Humphreys October 10, 2002

1 byte

12 bytes

Protocol DesignProtocol Design

1 byte overhead per function call1 byte overhead per function call

glColor3f( 1.0, 0.5, 0.5 );

COLOR3F1.00.50.5

VERTEX3F

1.02.03.0

COLOR3F

0.51.00.5

VERTEX3F

2.03.01.0

glVertex3f( 1.0, 2.0, 3.0 );

glColor3f( 0.5, 1.0, 0.5 );

glVertex3f( 2.0, 3.0, 1.0 );

Page 21: Commodity Components 1 Greg Humphreys October 10, 2002

Protocol DesignProtocol Design

1 byte overhead per function call1 byte overhead per function call

glColor3f( 1.0, 0.5, 0.5 );

COLOR3F1.00.50.5

VERTEX3F

1.02.03.0

COLOR3F

0.51.00.5

VERTEX3F

2.03.01.0

glVertex3f( 1.0, 2.0, 3.0 );

glColor3f( 0.5, 1.0, 0.5 );

glVertex3f( 2.0, 3.0, 1.0 );

Opcodes

Data

Page 22: Commodity Components 1 Greg Humphreys October 10, 2002

Efficient Remote RenderingEfficient Remote Rendering

NetworkNetwork Mvert/secMvert/sec EfficiencyEfficiency

DirectDirect NoneNone 21.5021.50

Application draws 60,000 vertices/frame Measurements using 800 MhZ PIII + GeForce2Efficiency assumes 12 bytes per triangle

Page 23: Commodity Components 1 Greg Humphreys October 10, 2002

Efficient Remote RenderingEfficient Remote Rendering

NetworkNetwork Mvert/secMvert/sec EfficiencyEfficiency

DirectDirect NoneNone 21.5021.50

GLXGLX 100 Mbit100 Mbit 0.730.73 70%70%

Application draws 60,000 vertices/frame Measurements using 800 MhZ PIII + GeForce2Efficiency assumes 12 bytes per triangle

Page 24: Commodity Components 1 Greg Humphreys October 10, 2002

Efficient Remote RenderingEfficient Remote Rendering

NetworkNetwork Mvert/secMvert/sec EfficiencyEfficiency

DirectDirect NoneNone 21.5021.50

GLXGLX 100 Mbit100 Mbit 0.730.73 70%70%

WireGLWireGL 100 Mbit100 Mbit 0.900.90 86%86%

Application draws 60,000 vertices/frame Measurements using 800 MhZ PIII + GeForce2Efficiency assumes 12 bytes per triangle

Page 25: Commodity Components 1 Greg Humphreys October 10, 2002

Efficient Remote RenderingEfficient Remote Rendering

NetworkNetwork Mvert/secMvert/sec EfficiencyEfficiency

DirectDirect NoneNone 21.5021.50

GLXGLX 100 Mbit100 Mbit 0.730.73 70%70%

WireGLWireGL 100 Mbit100 Mbit 0.900.90 86%86%

WireGLWireGL MyrinetMyrinet 9.189.18 88%88%

Application draws 60,000 vertices/frame Measurements using 800 MhZ PIII + GeForce2Efficiency assumes 12 bytes per triangle

Page 26: Commodity Components 1 Greg Humphreys October 10, 2002

Efficient Remote RenderingEfficient Remote Rendering

NetworkNetwork Mvert/secMvert/sec EfficiencyEfficiency

DirectDirect NoneNone 21.5021.50

GLXGLX 100 Mbit100 Mbit 0.730.73 70%70%

WireGLWireGL 100 Mbit100 Mbit 0.900.90 86%86%

WireGLWireGL MyrinetMyrinet 9.189.18 88%88%

WireGLWireGL None*None* 20.9020.90 97%97%

Application draws 60,000 vertices/frame Measurements using 800 MhZ PIII + GeForce2Efficiency assumes 12 bytes per triangle*None: discard packets, measuring pack rate

Page 27: Commodity Components 1 Greg Humphreys October 10, 2002

Sort-first Stream SpecializationSort-first Stream Specialization

Update bounding box per-vertexUpdate bounding box per-vertex

Transform bounds to screen-spaceTransform bounds to screen-space

Assign primitives to servers (with Assign primitives to servers (with overlap)overlap)

Page 28: Commodity Components 1 Greg Humphreys October 10, 2002

Graphics StateGraphics State

OpenGL is a big state machineOpenGL is a big state machineState encapsulates control for State encapsulates control for

geometric operationsgeometric operations•Lighting/shading parameters

•Texture maps and texture mapping parameters

•Boolean enables/disables

•Rendering modes

Example: Example: glColor3f( 1.0, 1.0, 1.0 )glColor3f( 1.0, 1.0, 1.0 )• Sets the current color to white

•Any subsequent primitives will appear white

Page 29: Commodity Components 1 Greg Humphreys October 10, 2002

Lazy State UpdateLazy State Update

Track entire OpenGL stateTrack entire OpenGL state

Precede a tile’s geometry with state Precede a tile’s geometry with state deltasdeltas

glTexImage2D(…)

glBlendFunc(…)

glEnable(…)

glLightfv(…)

glMaterialf(…)

glEnable(…)

Ian Buck, Greg Humphreys and Pat Hanrahan, Graphics Hardware Workshop 2000

Page 30: Commodity Components 1 Greg Humphreys October 10, 2002

How Does State Tracking Work?How Does State Tracking Work?

Tracking state is a no-brainer, it’s the frequent Tracking state is a no-brainer, it’s the frequent context differences that complicate thingscontext differences that complicate things

Need to quickly find the elements that are Need to quickly find the elements that are differentdifferent

Represent state as a hierarchy of dirty bitsRepresent state as a hierarchy of dirty bits

18 top-level categories: buffer, transformation, 18 top-level categories: buffer, transformation, lighting, texture, stencil, etc.lighting, texture, stencil, etc.

Actually, use dirty bit-vectors. Each bit Actually, use dirty bit-vectors. Each bit corresponds to a rendering servercorresponds to a rendering server

Page 31: Commodity Components 1 Greg Humphreys October 10, 2002

0 0 0 0 0 0 0 0

Inside State TrackingInside State Tracking

glLightf( GL_LIGHT1, GL_SPOT_CUTOFF, 45)

0 0 0 0 0 0 0 0 Transformation

0 0 0 0 0 0 0 0 Pixel

Lighting1 1 1 1 1 1 1 1

0 0 0 0 0 0 0 0 Texture

0 0 0 0 0 0 0 0 Buffer

0 0 0 0 0 0 0 0 Fog

0 0 0 0 0 0 0 0 Line

0 0 0 0 0 0 0 0 Polygon

0 0 0 0 0 0 0 0 Viewport

0 0 0 0 0 0 0 0 Scissor

0 0 0 0 0 0 0 0

Light 00 0 0 0 0 0 0 0

Light 1

0 0 0 0 0 0 0 0 Light 2

0 0 0 0 0 0 0 0

Light 30 0 0 0 0 0 0 0

Light 4

0 0 0 0 0 0 0 0 Light 5

0 0 0 0 0 0 0 0 Texture

0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 Ambient color

0 0 0 0 0 0 0 0 Diffuse color

Spot cutoff

(0,0,0,1)

(1,1,1,1)

180

0 0 0 0 0 0 0 0 Light 2

1 1 1 1 1 1 1 1

451 1 1 1 1 1 1 1

Page 32: Commodity Components 1 Greg Humphreys October 10, 2002

Context ComparisonContext Comparison

(0,0,0,1)

(1,1,1,1)

45

Client State Server 2’s State

(0,0,0,1)

(1,1,1,1)

180

Bit 2 set?Bit 2 set?Bit 2 set?Bit 2 set?Bit 2 set?

Bit 2 set?

No, skip itNo, skip itYes, drill downNo, skip itYes, drill down

No, skip it

Equal

EqualNot Equal!Pack Command!

LIGHTF_OPGL_LIGHT1

GL_SPOT_CUTOFF45

45

Page 33: Commodity Components 1 Greg Humphreys October 10, 2002

Output Scalability ResultsOutput Scalability Results

Marching Cubes

Point-to-point “broadcast” doesn’t scale at allHowever, it’s still the commercial solution [SGI Cluster]

Page 34: Commodity Components 1 Greg Humphreys October 10, 2002

Output Scalability ResultsOutput Scalability Results

Quake III: Arena

Larger polygons overlap more tiles

Page 35: Commodity Components 1 Greg Humphreys October 10, 2002

WireGL on the SIGGRAPH show floorWireGL on the SIGGRAPH show floor

Page 36: Commodity Components 1 Greg Humphreys October 10, 2002

Input ScalabilityInput Scalability

Parallel geometry extractionParallel geometry extractionParallel data submissionParallel data submissionOrdering?Ordering?

App

App

App

...DisplayServer

Page 37: Commodity Components 1 Greg Humphreys October 10, 2002

Parallel OpenGL APIParallel OpenGL API

Express ordering constraints between Express ordering constraints between multiple independent graphics contexts multiple independent graphics contexts

Don’t block the application, just encode Don’t block the application, just encode them like any other graphics commandthem like any other graphics command

Ordering is resolved by the rendering Ordering is resolved by the rendering serverserver

Homan Igehy, Gordon Stoll and Pat Hanrahan, SIGGRAPH 98

Introduce new OpenGL commands:Introduce new OpenGL commands:•glBarrierExec•glSemaphoreP•glSemaphoreV

Page 38: Commodity Components 1 Greg Humphreys October 10, 2002

Serial OpenGL ExampleSerial OpenGL Example

def Display:glClear(…)DrawOpaqueGeometry()DrawTransparentGeometry()SwapBuffers()

Page 39: Commodity Components 1 Greg Humphreys October 10, 2002

Parallel OpenGL ExampleParallel OpenGL Exampledef Init:

glBarrierInit(barrier, 2)glSemaphoreInit(sema, 0)

def Display:if my_client_id == 1:

glClear(…)glBarrierExec(barrier)DrawOpaqueGeometry(my_client_id)glBarrierExec(barrier)if my_client_id == 1:

glSemaphoreP(sema)DrawTransparentGeometry1()

else:DrawTransparentGeometry2()glSemaphoreV(sema)

glBarrierExec(barrier)if my_client_id == 1:

SwapBuffers()

Optional in Chromium

Optional in Chromium

Page 40: Commodity Components 1 Greg Humphreys October 10, 2002

Inside a Rendering ServerInside a Rendering Server

Client 1

Client 2

Clear BarrierOpaqueGeom 1

TransparentGeom 1

SwapSemaP BarrierBarrier

BarrierOpaqueGeom 1

BarrierTransparent

Geom 2SemaV Barrier

Page 41: Commodity Components 1 Greg Humphreys October 10, 2002

Inside a Rendering ServerInside a Rendering Server

Client 1

Client 2

BarrierOpaqueGeom 1

TransparentGeom 1

SwapSemaP BarrierBarrier

BarrierOpaqueGeom 1

BarrierTransparent

Geom 2SemaV Barrier

Page 42: Commodity Components 1 Greg Humphreys October 10, 2002

Inside a Rendering ServerInside a Rendering Server

Client 1

Client 2

OpaqueGeom 1

TransparentGeom 1

SwapSemaP BarrierBarrier

BarrierOpaqueGeom 1

BarrierTransparent

Geom 2SemaV Barrier

Page 43: Commodity Components 1 Greg Humphreys October 10, 2002

Inside a Rendering ServerInside a Rendering Server

Client 1

Client 2

OpaqueGeom 1

TransparentGeom 1

SwapSemaP BarrierBarrier

BarrierOpaqueGeom 1

BarrierTransparent

Geom 2SemaV Barrier

Page 44: Commodity Components 1 Greg Humphreys October 10, 2002

Inside a Rendering ServerInside a Rendering Server

Client 1

Client 2

OpaqueGeom 1

TransparentGeom 1

SwapSemaP BarrierBarrier

OpaqueGeom 1

BarrierTransparent

Geom 2SemaV Barrier

Page 45: Commodity Components 1 Greg Humphreys October 10, 2002

Inside a Rendering ServerInside a Rendering Server

Client 1

Client 2

OpaqueGeom 1

TransparentGeom 1

SwapSemaP BarrierBarrier

BarrierTransparent

Geom 2SemaV Barrier

Page 46: Commodity Components 1 Greg Humphreys October 10, 2002

Inside a Rendering ServerInside a Rendering Server

Client 1

Client 2

OpaqueGeom 1

TransparentGeom 1

SwapSemaP BarrierBarrier

TransparentGeom 2

SemaV Barrier

Page 47: Commodity Components 1 Greg Humphreys October 10, 2002

Inside a Rendering ServerInside a Rendering Server

Client 1

Client 2

OpaqueGeom 1

TransparentGeom 1

SwapSemaP BarrierBarrier

TransparentGeom 2

SemaV Barrier

Page 48: Commodity Components 1 Greg Humphreys October 10, 2002

Inside a Rendering ServerInside a Rendering Server

Client 1

Client 2

TransparentGeom 1

SwapSemaP BarrierBarrier

TransparentGeom 2

SemaV Barrier

Page 49: Commodity Components 1 Greg Humphreys October 10, 2002

Inside a Rendering ServerInside a Rendering Server

Client 1

Client 2

TransparentGeom 1

SwapSemaP Barrier

TransparentGeom 2

SemaV Barrier

Page 50: Commodity Components 1 Greg Humphreys October 10, 2002

Inside a Rendering ServerInside a Rendering Server

Client 1

Client 2

TransparentGeom 1

SwapBarrier

TransparentGeom 2

SemaV Barrier

Page 51: Commodity Components 1 Greg Humphreys October 10, 2002

Inside a Rendering ServerInside a Rendering Server

Client 1

Client 2

TransparentGeom 1

SwapBarrier

TransparentGeom 2

SemaV Barrier

Page 52: Commodity Components 1 Greg Humphreys October 10, 2002

Inside a Rendering ServerInside a Rendering Server

Client 1

Client 2

TransparentGeom 1

SwapBarrier

BarrierSemaV

Page 53: Commodity Components 1 Greg Humphreys October 10, 2002

Inside a Rendering ServerInside a Rendering Server

Client 1

Client 2

TransparentGeom 1

SwapBarrier

Barrier

Page 54: Commodity Components 1 Greg Humphreys October 10, 2002

Inside a Rendering ServerInside a Rendering Server

Client 1

Client 2

TransparentGeom 1

SwapBarrier

Page 55: Commodity Components 1 Greg Humphreys October 10, 2002

Inside a Rendering ServerInside a Rendering Server

Client 1

Client 2

TransparentGeom 1

SwapBarrier

Page 56: Commodity Components 1 Greg Humphreys October 10, 2002

Inside a Rendering ServerInside a Rendering Server

Client 1

Client 2

SwapBarrier

Page 57: Commodity Components 1 Greg Humphreys October 10, 2002

Inside a Rendering ServerInside a Rendering Server

Client 1

Client 2

Swap

Page 58: Commodity Components 1 Greg Humphreys October 10, 2002

Inside a Rendering ServerInside a Rendering Server

Client 1

Client 2

Page 59: Commodity Components 1 Greg Humphreys October 10, 2002

Input Scalability ResultsInput Scalability Results

Multiple clients, Multiple clients, oneone serverserver

Compute limited applicationCompute limited application

Page 60: Commodity Components 1 Greg Humphreys October 10, 2002

A Fully Parallel ConfigurationA Fully Parallel Configuration

App

App

App

...

Server

Server

Server

...Display

Display

Display

Page 61: Commodity Components 1 Greg Humphreys October 10, 2002

Fully Parallel ResultsFully Parallel Results

1-1 rate: 472 KTri/sec16-16 rate: 6.2 MTri/sec

Page 62: Commodity Components 1 Greg Humphreys October 10, 2002

Peak Sort-First Rendering PerformancePeak Sort-First Rendering Performance

Immediate ModeImmediate Mode• Unstructured triangle

strips

• 200 triangles/strip

Data change Data change every every frameframe

Peak observed: Peak observed: 161,000,000 161,000,000 tris/secondtris/second

Total scene size = Total scene size = 16,000,000 16,000,000 trianglestriangles

Total display size at Total display size at 32 nodes: 32 nodes: 2048x1024 2048x1024 (256x256 tiles)(256x256 tiles)

Page 63: Commodity Components 1 Greg Humphreys October 10, 2002

WireGL Image ReassemblyWireGL Image Reassembly

Composite

App

App

App

...

Server

Server

Server

...

NetworkGeometry

NetworkImagery

Display

Page 64: Commodity Components 1 Greg Humphreys October 10, 2002

WireGL Image ReassemblyWireGL Image Reassembly

Composite

ParallelOpenGL

ParallelOpenGL

App

App

App

...

Server

Server

Server

...Display

Page 65: Commodity Components 1 Greg Humphreys October 10, 2002

WireGL Image ReassemblyWireGL Image Reassembly

Server

ParallelOpenGL

ParallelOpenGL

Composite == Server!Composite == Server!

Server

Server

Server

...

App

App

App

...Display

Page 66: Commodity Components 1 Greg Humphreys October 10, 2002

Lightning-2 = Digital Video Switch from Intel ResearchLightning-2 = Digital Video Switch from Intel ResearchRoute partial scanlines from Route partial scanlines from nn inputs to inputs to mm outputs outputsTile reassembly or depth compositing at full refresh rateTile reassembly or depth compositing at full refresh rateLightning-2 and WireGL demonstrated at SIGGRAPH Lightning-2 and WireGL demonstrated at SIGGRAPH 20012001

Image Reassembly in HardwareImage Reassembly in Hardware

Gordon Stoll et al., SIGGRAPH 2001

Page 67: Commodity Components 1 Greg Humphreys October 10, 2002

Example: 16-way Tiling of One MonitorExample: 16-way Tiling of One Monitor

Framebuffer 1

Framebuffer 2

1 2 3 4

5 6 7 8

9 10 11 12

13 14 15 16

1 3 6

8 9 11

14 16

2 4 5

7 10 12

13 15

3 6

ReconstructedImage

StripHeaders

Page 68: Commodity Components 1 Greg Humphreys October 10, 2002

WireGL ShortcomingsWireGL Shortcomings

Sort-firstSort-first•Can be difficult to load-balance

•Screen-space parallelism limited

•Heavily dependent on spatial locality

Resource utilizationResource utilization•Geometry must move over network every frame

•Server’s graphics hardware remains underutilized

We need something more flexibleWe need something more flexible

S

AA

A

...

SS

S

...

Page 69: Commodity Components 1 Greg Humphreys October 10, 2002

Stream ProcessingStream Processing

Stream Source Transform 1

Transform 1

Transform 2...

...

Streams:• Ordered sequences of records• Potentially infinite

Transformations:• Process only the head element• Finite local storage• Can be partitioned across processors to expose

parallelism

Stream Source

Stream Output

Page 70: Commodity Components 1 Greg Humphreys October 10, 2002

Why Stream Processing?Why Stream Processing?

Elegant mechanism for dealing with huge Elegant mechanism for dealing with huge datadata• Explicitly expose and exploit parallelism

• Hide latency

State of the art in many fields:State of the art in many fields:• Databases [Terry92, Babu01]

• Telephony [Cortes00]

• Online Algorithms [Borodin98,O’Callaghan02]

• Sensor Fusion [Madden01]

• Media Processing [Halfhill00,Khailany01]

• Computer Architecture [Rixner98]

• Graphics Hardware [Owens00, NVIDIA, ATI]

Page 71: Commodity Components 1 Greg Humphreys October 10, 2002

Cluster Graphics As Stream ProcessingCluster Graphics As Stream ProcessingTreat OpenGL calls as a stream of Treat OpenGL calls as a stream of

commandscommands

Form a DAG of stream transformation Form a DAG of stream transformation nodesnodes•Nodes are computers in a cluster

•Edges are OpenGL API communication

Each node has a Each node has a serializationserialization stage stage and a and a transformationtransformation stage stage

Page 72: Commodity Components 1 Greg Humphreys October 10, 2002

Stream SerializationStream Serialization

Convert multiple streams into a single Convert multiple streams into a single streamstream

Efficiently context-switch between streamsEfficiently context-switch between streams

Constrain ordering using Parallel OpenGL Constrain ordering using Parallel OpenGL APIAPI

Two kinds of serializers:Two kinds of serializers:

•Network server:

•Application:

•Unmodified serial application

•Custom parallel application

S

AOpenGL

Page 73: Commodity Components 1 Greg Humphreys October 10, 2002

Stream TransformationStream Transformation

Serialized stream is dispatched to Serialized stream is dispatched to “Stream Processing Units” (SPUs)“Stream Processing Units” (SPUs)

Each SPU is a shared libraryEach SPU is a shared library•Exports a (partial) OpenGL interface

Each node loads a Each node loads a chainchain of SPUs at of SPUs at run timerun time

SPUs are generic and interchangeableSPUs are generic and interchangeable

Page 74: Commodity Components 1 Greg Humphreys October 10, 2002

Example: WireGL RevealedExample: WireGL Revealed

App

App

App

...Tilesort

Tilesort

Tilesort

...

Server

Server

Server

Readback

Readback

Readback

Send

Send

Send

Server

Render

Page 75: Commodity Components 1 Greg Humphreys October 10, 2002

SPU InheritanceSPU Inheritance

The Readback and Render SPUs are The Readback and Render SPUs are relatedrelated•Readback renders everything except

SwapBuffers

Readback Readback inheritsinherits from the Render from the Render SPUSPU•Override parent’s implementation of

SwapBuffers

•All OpenGL calls considered “virtual”

Page 76: Commodity Components 1 Greg Humphreys October 10, 2002

Example: Readback’s SwapBuffersExample: Readback’s SwapBuffers

Easily extended to include depth composite Easily extended to include depth composite

All other functions inherited from Render SPUAll other functions inherited from Render SPU

void RB_SwapBuffers(void){ self.ReadPixels( 0, 0, w, h, ... ); if (self.id == 0) child.Clear( GL_COLOR_BUFFER_BIT ); child.BarrierExec( READBACK_BARRIER ); child.RasterPos2i( tileX, tileY ); child.DrawPixels( w, h, ... ); child.BarrierExec( READBACK_BARRIER ); if (self.id == 0) child.SwapBuffers( );}

Optional

Optional

Page 77: Commodity Components 1 Greg Humphreys October 10, 2002

Virtual GraphicsVirtual Graphics

Separate interface from Separate interface from implementationimplementation

Underlying architecture can change Underlying architecture can change without application’s knowledgewithout application’s knowledge

Douglas Voorhies, David Kirk and Olin Lathrop, SIGGRAPH 88

Page 78: Commodity Components 1 Greg Humphreys October 10, 2002

Example: Sort-LastExample: Sort-Last

Application runs directly on graphics hardwareApplication runs directly on graphics hardware

Same application can use sort-last or sort-firstSame application can use sort-last or sort-first

...

Application

Application

Application

Readback

Readback

Readback

Send

Send

Send

Server

Render

Page 79: Commodity Components 1 Greg Humphreys October 10, 2002

Example: Sort-Last Binary SwapExample: Sort-Last Binary Swap

Application Application

Application Application

Readback Readback

Readback Readback

BSwap BSwap

BSwap BSwap

Send Send

Send Send

Server

Render

Page 80: Commodity Components 1 Greg Humphreys October 10, 2002

Binary Swap Volume Rendering ResultsBinary Swap Volume Rendering Results

One node

Two nodes

Four nodesEight nodes Sixteen nodes

Page 81: Commodity Components 1 Greg Humphreys October 10, 2002

Example: User Interface ReintegrationExample: User Interface Reintegration

App

Tilesort ...

Server

Server

Server

IBMT221

Display(3840x2400)

IBMScalableGraphicsEngine

Chromium ProtocolUDP/Gigabit EthernetDigital Video Cables

Integrate

Integrate

Integrate

Serial applications can drive the T221 with their original user interfaceParallel applications can have a user interface

Page 82: Commodity Components 1 Greg Humphreys October 10, 2002

CATIA Driving IBM’s T221 DisplayCATIA Driving IBM’s T221 Display

Jet engine nacelle model courtesy Goodrich AerostructuresX-Windows Integration SPU by Peter Kirchner and Jim Klosowski, IBM T.J. WatsonChromium is the only practical way to drive the T221 with an existing applicationDemonstrated at Supercomputing 2001

3840

24

00

Page 83: Commodity Components 1 Greg Humphreys October 10, 2002

A Hidden-line Style SPUA Hidden-line Style SPU

Page 84: Commodity Components 1 Greg Humphreys October 10, 2002

A Hidden-line Style SPUA Hidden-line Style SPU

Page 85: Commodity Components 1 Greg Humphreys October 10, 2002

DemoDemo

Quake III

Render

Quake III

StateQueryVertexArrayHiddenLine Render

Server

HiddenLine Render

Quake III

SendStateQuery

Page 86: Commodity Components 1 Greg Humphreys October 10, 2002

Is “HiddenLine” Really a SPU?Is “HiddenLine” Really a SPU?

Technically, no!Technically, no!

Requires potentially unbounded Requires potentially unbounded resourcesresources

Alternate design:Alternate design:

Application

SQ VAHiddenLine

Server

Server

ReadbackSend

ReadbackSend

Server

Render

Lines

Polygons

DepthComposit

e

Page 87: Commodity Components 1 Greg Humphreys October 10, 2002

Current ShortcomingsCurrent Shortcomings

General display lists on tiled displaysGeneral display lists on tiled displays•Display lists that affect the graphics state

Distributed texture managementDistributed texture management•Each node must provide its own texture

•Potential N2 texture explosion

•Virtualize distributed texture access?

Ease of creating parallel applicationsEase of creating parallel applications• Input event management

Page 88: Commodity Components 1 Greg Humphreys October 10, 2002

Future DirectionsFuture Directions

End-to-end visualization system for 4D End-to-end visualization system for 4D datadata•Data management and load balancing

•Volume compression

Remote/Ubiquitous VisualizationRemote/Ubiquitous Visualization•Scalable graphics as a shared resource

•Transparent remote interaction with (parallel) apps

•Shift away from desktop-attached graphics

Taxonomy of non-invasive techniquesTaxonomy of non-invasive techniques•Classify SPUs and algorithms

• Identify tradeoffs in design

Page 89: Commodity Components 1 Greg Humphreys October 10, 2002

Observations and PredictionsObservations and Predictions

Manipulation of graphics streams is a Manipulation of graphics streams is a powerful abstraction for cluster powerful abstraction for cluster graphicsgraphics•Achieves both input and output scalability

Providing Providing mechanismsmechanisms instead of instead of algorithms algorithms allows greater flexibilityallows greater flexibility•Data management algorithms can be built into

a parallel application or embedded in a SPU

Flexible remote graphics will lead to a Flexible remote graphics will lead to a revolution in ubiquitous computingrevolution in ubiquitous computing