commodity components 1 greg humphreys october 10, 2002
TRANSCRIPT
Commodity Components 1
Greg Humphreys, October 10, 2002

Life Is But a (Graphics) Stream
[Diagram: parallel clients (C) feeding transformation nodes (T) feeding servers (S)]
“There’s a bug somewhere…”
You Spent How Much Money On What Exactly?
Several Years of Failed Experiments

[Timeline: 1975-1980, 1980-1982, 1982-1986, 1986-Present]
The Problem

Scalable graphics solutions are rare and expensive
Commodity technology is getting faster, but it tends not to scale
Cluster graphics solutions have been inflexible
Big Models

Scans of Saint Matthew (386 MPolys) and the David (2 GPolys). Stanford Digital Michelangelo Project
Large Displays

Window system and large-screen interaction metaphors. François Guimbretière, Stanford University HCI group
Modern Graphics Architecture

NVIDIA GeForce4 Ti 4600. Amazing technology:
• 4.8 Gpix/sec (antialiased)
• 136 Mtri/sec
• Programmable pipeline stages
Capabilities increasing at roughly 225% per year

But it doesn't scale:
• Triangle rate is bus-limited: 136 Mtri/sec mostly unachievable
• Display resolution growing very slowly

Result: there is a serious gap between dataset complexity and processing power
GeForce4 Die Plot Courtesy NVIDIA
Why Clusters?

Commodity parts
• Complete graphics pipeline on a single chip
• Extremely fast product cycle
Flexibility
• Configurable building blocks
Cost
• Driven by consumer demand
• Economies of scale
Availability
• Insufficient demand for "big iron" solutions
• Little or no ongoing innovation in graphics "supercomputers"
Stanford/DOE Visualization Cluster

32 nodes, each with graphics
Compaq SP750
• Dual 800 MHz PIII Xeon
• i840 logic
• 256 MB memory
• 18 GB disk
• 64-bit 66 MHz PCI
• AGP-4x
Graphics
• 16 NVIDIA Quadro2 Pro
• 16 NVIDIA GeForce 3
Network
• Myrinet (LANai 7, ~100 MB/sec)
Virginia Cluster?

Ideas
Technology for driving tiled displays
• Unmodified applications
• Efficient network usage
Scalable rendering rates on clusters
• 161 Mtri/sec at interactive rates to a display wall
• 1.6 Gvox/sec at interactive rates to a single display
Cluster graphics as stream processing
• Virtual graphics interface
• Flexible mechanism for non-invasive transformations
The Name Game

One idea, two systems:
WireGL
• Sort-first parallel rendering for tiled displays
• Released to the public in 2000
Chromium
• General stream processing framework
• Multiple parallel rendering architectures
• Open-source project started in June 2001
• Alpha release September 2001
• Beta release April 2002
• 1.0 release September 2002
General Approach

Replace the system's OpenGL driver
• Industry-standard API
• Support existing, unmodified applications
Manipulate streams of API commands
• Route commands over a network
• Track state!
• Render commands using graphics hardware
Allow parallel applications to issue OpenGL
• Constrain ordering between multiple streams
Cluster Graphics

• Raw scalability is easy (just add more pipelines)
• One of our goals is to expose that scalability to an application

[Diagram: four independent App → Graphics Hardware → Display pipelines]
Cluster Graphics

• Graphics hardware is indivisible
• Each graphics pipeline is managed by a network server

[Diagram: four App → Server → Display pipelines]
Cluster Graphics

• Flexible number of clients, servers and displays
• Compute limited = more clients
• Graphics limited = more servers
• Interface/network limited = more of both

[Diagram: several apps fanning out to several servers, some of which drive displays]
Output Scalability

Larger displays with unmodified applications
Other possibilities: broadcast, ring network

[Diagram: one app sorting to many servers, each driving one display tile]
Protocol Design

1 byte of overhead per function call. Opcodes are packed separately from their data:

glColor3f( 1.0, 0.5, 0.5 );   →  COLOR3F  | 1.0 0.5 0.5
glVertex3f( 1.0, 2.0, 3.0 );  →  VERTEX3F | 1.0 2.0 3.0
glColor3f( 0.5, 1.0, 0.5 );   →  COLOR3F  | 0.5 1.0 0.5
glVertex3f( 2.0, 3.0, 1.0 );  →  VERTEX3F | 2.0 3.0 1.0
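The packing scheme can be sketched in a few lines of Python. The opcode values and buffer layout below are assumptions for illustration, not the actual WireGL wire format:

```python
import struct

# Hypothetical opcode values; the real protocol assigns one opcode
# byte to each OpenGL entry point.
OP_COLOR3F, OP_VERTEX3F = 0x10, 0x11

class PackBuffer:
    """Pack calls as parallel runs of opcodes and operand data, so each
    call adds just 1 opcode byte on top of its raw arguments."""
    def __init__(self):
        self.opcodes = bytearray()   # 1 byte per call
        self.data = bytearray()      # packed operands

    def Color3f(self, r, g, b):
        self.opcodes.append(OP_COLOR3F)
        self.data += struct.pack('<3f', r, g, b)

    def Vertex3f(self, x, y, z):
        self.opcodes.append(OP_VERTEX3F)
        self.data += struct.pack('<3f', x, y, z)

buf = PackBuffer()
buf.Color3f(1.0, 0.5, 0.5)
buf.Vertex3f(1.0, 2.0, 3.0)
buf.Color3f(0.5, 1.0, 0.5)
buf.Vertex3f(2.0, 3.0, 1.0)
# 4 calls cost 4 opcode bytes plus 48 bytes of float data
```

Separating opcodes from operands keeps the data naturally aligned and makes the per-call overhead exactly one byte.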
Efficient Remote Rendering

System    Network     Mvert/sec   Efficiency
Direct    None        21.50
GLX       100 Mbit     0.73       70%
WireGL    100 Mbit     0.90       86%
WireGL    Myrinet      9.18       88%
WireGL    None*       20.90       97%

Application draws 60,000 vertices/frame. Measurements using 800 MHz PIII + GeForce2. Efficiency assumes 12 bytes per triangle. *None: discard packets, measuring pack rate.
Sort-first Stream Specialization

Update a bounding box per vertex
Transform the bounds to screen space
Assign primitives to servers (with overlap)
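The bucketing step above can be sketched as follows. The tile layout and function names are invented for illustration; the real tilesort logic also performs the screen-space transform of the bounds:

```python
# Sketch of sort-first bucketing: a primitive is sent to every server
# whose screen tile its screen-space bounding box overlaps.
def bounds(verts):
    """Grow a bounding box over a primitive's (x, y) vertices."""
    xs = [x for x, y in verts]
    ys = [y for x, y in verts]
    return (min(xs), min(ys), max(xs), max(ys))

def overlapped_servers(bbox, tiles):
    x0, y0, x1, y1 = bbox
    return [s for s, (tx0, ty0, tx1, ty1) in tiles.items()
            if x0 < tx1 and x1 > tx0 and y0 < ty1 and y1 > ty0]

# Four 512x512 tiles of a 1024x1024 display, one server per tile.
tiles = {0: (0, 0, 512, 512),    1: (512, 0, 1024, 512),
         2: (0, 512, 512, 1024), 3: (512, 512, 1024, 1024)}

tri = [(500, 100), (530, 110), (510, 140)]
# This triangle straddles the vertical seam between tiles 0 and 1, so
# it is sent (with overlap) to both servers.
hit = overlapped_servers(bounds(tri), tiles)
```

Primitives overlapping several tiles are simply duplicated into each server's stream, which is where the overlap cost in the results below comes from.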
Graphics State

OpenGL is a big state machine. State encapsulates control for geometric operations:
• Lighting/shading parameters
• Texture maps and texture mapping parameters
• Boolean enables/disables
• Rendering modes

Example: glColor3f( 1.0, 1.0, 1.0 )
• Sets the current color to white
• Any subsequent primitives will appear white
Lazy State Update

Track the entire OpenGL state
Precede a tile's geometry with state deltas:

glTexImage2D(…)
glBlendFunc(…)
glEnable(…)
glLightfv(…)
glMaterialf(…)
glEnable(…)

Ian Buck, Greg Humphreys and Pat Hanrahan, Graphics Hardware Workshop 2000
How Does State Tracking Work?

Tracking state is a no-brainer; it's the frequent context differences that complicate things
Need to quickly find the elements that are different
Represent state as a hierarchy of dirty bits
18 top-level categories: buffer, transformation, lighting, texture, stencil, etc.
Actually, use dirty bit-vectors: each bit corresponds to a rendering server
Inside State Tracking

[Diagram: glLightf( GL_LIGHT1, GL_SPOT_CUTOFF, 45 ) sets every server's dirty bit in the Lighting category, in Light 1, and in Light 1's spot cutoff element (180 → 45). All other categories (Transformation, Pixel, Texture, Buffer, Fog, Line, Polygon, Viewport, Scissor) and the other lights stay clean]
Context Comparison

[Diagram: comparing the client's state to Server 2's state, the tracker drills down only where Server 2's dirty bit is set. The ambient color (0,0,0,1) and diffuse color (1,1,1,1) are equal, but the spot cutoff differs (45 vs 180), so a LIGHTF_OP( GL_LIGHT1, GL_SPOT_CUTOFF, 45 ) command is packed]
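A toy version of the hierarchical dirty-bit tracker makes the mechanism concrete. All names and the two-level structure here are assumptions for illustration:

```python
# Hierarchical dirty bit-vectors: one bit per rendering server at both
# the category and the element level. A state change dirties every
# server's bits; packing a delta for server s clears only bit s, so
# the other servers still see the change when their deltas are packed.
class StateTracker:
    def __init__(self, nservers):
        self.all = (1 << nservers) - 1
        self.values = {}       # (category, element) -> current value
        self.cat_dirty = {}    # category -> bit-vector of stale servers
        self.elem_dirty = {}   # (category, element) -> bit-vector

    def set(self, category, element, value):
        self.values[(category, element)] = value
        self.cat_dirty[category] = self.all
        self.elem_dirty[(category, element)] = self.all

    def delta_for(self, server):
        """State commands needed to bring one server up to date."""
        bit = 1 << server
        delta = []
        for cat, bits in self.cat_dirty.items():
            if not bits & bit:
                continue                 # whole category clean: skip it
            for key, ebits in self.elem_dirty.items():
                if key[0] == cat and ebits & bit:
                    delta.append(key + (self.values[key],))
                    self.elem_dirty[key] = ebits & ~bit
            self.cat_dirty[cat] = bits & ~bit
        return delta

# glLightf(GL_LIGHT1, GL_SPOT_CUTOFF, 45) dirties the lighting bits
# for all 8 servers; only server 2's bits clear when its delta packs.
t = StateTracker(8)
t.set('lighting', 'light1.spot_cutoff', 45)
```

The category-level bits let the comparison skip the 17 untouched top-level categories in a single test each, which is the point of the hierarchy.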
Output Scalability Results

Marching Cubes
Point-to-point "broadcast" doesn't scale at all. However, it's still the commercial solution [SGI Cluster]
Output Scalability Results

Quake III: Arena
Larger polygons overlap more tiles

WireGL on the SIGGRAPH show floor
Input Scalability

Parallel geometry extraction
Parallel data submission
Ordering?

[Diagram: several apps feeding one server and display]
Parallel OpenGL API

Express ordering constraints between multiple independent graphics contexts
Don't block the application; just encode them like any other graphics command
Ordering is resolved by the rendering server
Introduce new OpenGL commands:
• glBarrierExec
• glSemaphoreP
• glSemaphoreV

Homan Igehy, Gordon Stoll and Pat Hanrahan, SIGGRAPH 98
Serial OpenGL Example

def Display:
    glClear(…)
    DrawOpaqueGeometry()
    DrawTransparentGeometry()
    SwapBuffers()

Parallel OpenGL Example

def Init:
    glBarrierInit(barrier, 2)
    glSemaphoreInit(sema, 0)

def Display:
    if my_client_id == 1:
        glClear(…)
    glBarrierExec(barrier)
    DrawOpaqueGeometry(my_client_id)
    glBarrierExec(barrier)
    if my_client_id == 1:
        glSemaphoreP(sema)
        DrawTransparentGeometry1()
    else:
        DrawTransparentGeometry2()
        glSemaphoreV(sema)
    glBarrierExec(barrier)
    if my_client_id == 1:
        SwapBuffers()
Some of the synchronization calls above are optional in Chromium.
Inside a Rendering Server

[Animation: the server drains two client queues.
Client 1: Clear, Barrier, OpaqueGeom 1, Barrier, SemaP, TransparentGeom 1, Barrier, Swap
Client 2: Barrier, OpaqueGeom 2, Barrier, TransparentGeom 2, SemaV, Barrier
Execution blocks at each barrier until both streams reach it, and SemaP blocks until the matching SemaV; the server context-switches to the other stream whenever the current one blocks]
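The context-switching behavior can be modeled in a few lines. The queue contents follow the parallel example above; the round-robin scheduling policy is an assumption, since the server is free to pick any runnable stream:

```python
from collections import deque

# Toy serializer: run each stream until its head blocks on a barrier
# or semaphore, then context-switch to another stream.
def serialize(streams, barrier_size):
    waiting, sem, out = set(), 0, []
    while any(streams):
        progressed = False
        for cid, q in enumerate(streams):
            while q:
                op = q[0]
                if op == 'Barrier':
                    waiting.add(cid)
                    if len(waiting) < barrier_size:
                        break                    # block; switch streams
                    for c in waiting:            # everyone arrived: release
                        streams[c].popleft()
                    waiting.clear()
                elif op == 'SemaP':
                    if sem == 0:
                        break                    # block until a SemaV
                    sem -= 1
                    q.popleft()
                else:
                    if op == 'SemaV':
                        sem += 1
                    else:
                        out.append(op)           # "execute" the command
                    q.popleft()
                progressed = True
        if not progressed:
            raise RuntimeError('deadlock between streams')
    return out

client1 = deque(['Clear', 'Barrier', 'OpaqueGeom1', 'Barrier', 'SemaP',
                 'TransparentGeom1', 'Barrier', 'Swap'])
client2 = deque(['Barrier', 'OpaqueGeom2', 'Barrier', 'TransparentGeom2',
                 'SemaV', 'Barrier'])
order = serialize([client1, client2], barrier_size=2)
```

Note that the barriers only separate phases; within a phase the two clients' geometry may execute in either order, while the semaphore forces TransparentGeom2 before TransparentGeom1.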
Input Scalability Results

Multiple clients, one server
Compute-limited application
A Fully Parallel Configuration

[Diagram: many apps, each sorting to many servers, each server driving one display tile]
Fully Parallel Results

1-1 rate: 472 KTri/sec; 16-16 rate: 6.2 MTri/sec
Peak Sort-First Rendering Performance

Immediate mode
• Unstructured triangle strips
• 200 triangles/strip
Data changes every frame
Peak observed: 161,000,000 tris/second
Total scene size = 16,000,000 triangles
Total display size at 32 nodes: 2048x1024 (256x256 tiles)
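A quick sanity check of what these numbers imply for frame rate:

```python
scene_tris = 16_000_000        # total scene size, triangles
peak_rate = 161_000_000        # peak observed triangles/second
fps = peak_rate / scene_tris   # roughly 10 frames per second
```

So "interactive rates" here means the entire 16-million-triangle scene is resubmitted about ten times per second.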
WireGL Image Reassembly

[Diagram: apps sort geometry over the network to servers; the servers send imagery over the network to a Composite node, which drives the display]
WireGL Image Reassembly

[Diagram: the same pipeline, with the parallel OpenGL interface used on both the app-to-server and server-to-composite hops]
WireGL Image Reassembly

Composite == Server!

[Diagram: the composite stage is itself just another server speaking the parallel OpenGL interface]
Image Reassembly in Hardware

Lightning-2 = digital video switch from Intel Research
Route partial scanlines from n inputs to m outputs
Tile reassembly or depth compositing at full refresh rate
Lightning-2 and WireGL demonstrated at SIGGRAPH 2001

Gordon Stoll et al., SIGGRAPH 2001
Example: 16-way Tiling of One Monitor

[Diagram: the 4x4 display grid (tiles 1-16) is rendered split across two framebuffers: framebuffer 1 holds tiles 1, 3, 6, 8, 9, 11, 14, 16 and framebuffer 2 holds tiles 2, 4, 5, 7, 10, 12, 13, 15. Strip headers on each partial scanline tell Lightning-2 where to route it in the reconstructed image]
WireGL Shortcomings

Sort-first
• Can be difficult to load-balance
• Screen-space parallelism limited
• Heavily dependent on spatial locality
Resource utilization
• Geometry must move over the network every frame
• Servers' graphics hardware remains underutilized

We need something more flexible
Stream Processing

[Diagram: stream source → transform 1 → transform 2 → … → stream output]

Streams:
• Ordered sequences of records
• Potentially infinite
Transformations:
• Process only the head element
• Finite local storage
• Can be partitioned across processors to expose parallelism
Why Stream Processing?

Elegant mechanism for dealing with huge data
• Explicitly expose and exploit parallelism
• Hide latency

State of the art in many fields:
• Databases [Terry92, Babu01]
• Telephony [Cortes00]
• Online algorithms [Borodin98, O'Callaghan02]
• Sensor fusion [Madden01]
• Media processing [Halfhill00, Khailany01]
• Computer architecture [Rixner98]
• Graphics hardware [Owens00, NVIDIA, ATI]
Cluster Graphics as Stream Processing

Treat OpenGL calls as a stream of commands
Form a DAG of stream transformation nodes
• Nodes are computers in a cluster
• Edges are OpenGL API communication
Each node has a serialization stage and a transformation stage
Stream Serialization

Convert multiple streams into a single stream
Efficiently context-switch between streams
Constrain ordering using the Parallel OpenGL API

Two kinds of serializers:
• Network server
• Application: either an unmodified serial application or a custom parallel application
Stream Transformation

The serialized stream is dispatched to "Stream Processing Units" (SPUs)
Each SPU is a shared library
• Exports a (partial) OpenGL interface
Each node loads a chain of SPUs at run time
SPUs are generic and interchangeable
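The chain dispatch can be sketched in Python. The SPU names and the color-rewriting transformation are invented for illustration; real SPUs are C shared libraries resolved through function tables:

```python
# Each SPU exports a partial OpenGL interface; any call it doesn't
# implement falls through to the next SPU in the chain.
class SPU:
    def __init__(self, downstream=None):
        self.downstream = downstream

    def __getattr__(self, name):
        if self.downstream is None:
            raise AttributeError(name)
        return getattr(self.downstream, name)   # defer down the chain

class RenderSPU(SPU):
    """End of the chain: 'renders' by recording the calls it receives."""
    def __init__(self):
        super().__init__()
        self.log = []
    def Color3f(self, r, g, b): self.log.append(('Color3f', r, g, b))
    def Clear(self, mask):      self.log.append(('Clear', mask))

class GrayscaleSPU(SPU):
    """Invented example SPU: rewrites colors, passes everything else on."""
    def Color3f(self, r, g, b):
        y = round(0.299 * r + 0.587 * g + 0.114 * b, 3)
        self.downstream.Color3f(y, y, y)

render = RenderSPU()
chain = GrayscaleSPU(render)
chain.Color3f(1.0, 0.0, 0.0)   # intercepted and transformed
chain.Clear(0x4000)            # not implemented here: falls through
```

Because every SPU speaks the same interface on both sides, chains compose at run time without the application knowing what sits downstream.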
Example: WireGL Revealed

[Diagram: each app node runs a Tilesort SPU; each server runs a Readback SPU chained to a Send SPU; a final server runs a Render SPU]
SPU Inheritance

The Readback and Render SPUs are related
• Readback renders everything except SwapBuffers
Readback inherits from the Render SPU
• Overrides the parent's implementation of SwapBuffers
• All OpenGL calls are considered "virtual"
Example: Readback's SwapBuffers

void RB_SwapBuffers(void)
{
    self.ReadPixels( 0, 0, w, h, ... );
    if (self.id == 0)
        child.Clear( GL_COLOR_BUFFER_BIT );
    child.BarrierExec( READBACK_BARRIER );
    child.RasterPos2i( tileX, tileY );
    child.DrawPixels( w, h, ... );
    child.BarrierExec( READBACK_BARRIER );
    if (self.id == 0)
        child.SwapBuffers( );
}

Easily extended to include depth compositing
All other functions are inherited from the Render SPU
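The override-one-virtual-call pattern maps directly onto class inheritance. A minimal Python analogue, with the framebuffer and child interface simplified into plain lists (all names here are stand-ins):

```python
class RenderSPU:
    """Parent SPU: draws into its local framebuffer and swaps locally."""
    def __init__(self):
        self.framebuffer = []
    def DrawTriangles(self, n):
        self.framebuffer.append(n)
    def SwapBuffers(self):
        return 'local swap'

class ReadbackSPU(RenderSPU):
    """Inherits every call except SwapBuffers, which it overrides to
    read the rendered tile back and ship it downstream instead."""
    def __init__(self, child, spu_id):
        super().__init__()
        self.child = child               # downstream interface (a list here)
        self.id = spu_id
    def SwapBuffers(self):
        pixels = list(self.framebuffer)  # stands in for glReadPixels
        self.child.append(('DrawPixels', self.id, pixels))
        if self.id == 0:                 # one node triggers the real swap
            self.child.append(('SwapBuffers',))

sink = []
spu = ReadbackSPU(child=sink, spu_id=0)
spu.DrawTriangles(3)   # inherited from RenderSPU
spu.SwapBuffers()      # overridden
```

Since the whole OpenGL interface is virtual, a derived SPU only pays implementation effort for the calls it actually changes.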
Virtual Graphics

Separate interface from implementation
The underlying architecture can change without the application's knowledge

Douglas Voorhies, David Kirk and Olin Lathrop, SIGGRAPH 88
Example: Sort-Last

The application runs directly on graphics hardware
The same application can use sort-last or sort-first

[Diagram: each application node runs Readback → Send; a single server runs Render]
Example: Sort-Last Binary Swap

[Diagram: four application nodes each run Readback → BSwap → Send; the BSwap SPUs exchange and composite image halves pairwise before sending their final portions to the Render server]
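The binary-swap exchange pattern can be sketched as follows. Elementwise max stands in for the real depth composite or blend, and images are 1-D arrays for brevity:

```python
# Toy binary swap over n nodes (n a power of two): each round, node i
# swaps half of its working span with partner i XOR step and
# composites the half it keeps. After log2(n) rounds, node i owns a
# fully composited 1/n-th of the image.
def binary_swap(images):
    n, w = len(images), len(images[0])
    lo, hi = [0] * n, [w] * n           # each node's working span
    step = 1
    while step < n:
        spans = []
        for i in range(n):
            mid = (lo[i] + hi[i]) // 2
            # The lower-ranked partner keeps the low half, the
            # higher-ranked partner keeps the high half.
            spans.append((lo[i], mid) if i < (i ^ step) else (mid, hi[i]))
        for i in range(n):
            p = i ^ step
            a, b = spans[i]
            for x in range(a, b):       # composite the received half
                images[i][x] = max(images[i][x], images[p][x])
        lo = [a for a, _ in spans]
        hi = [b for _, b in spans]
        step *= 2
    return {i: (lo[i], images[i][lo[i]:hi[i]]) for i in range(n)}

# Four nodes, four pixels; each final owner holds the elementwise max.
parts = [[1, 0, 0, 2], [0, 3, 0, 0], [5, 0, 1, 0], [0, 0, 7, 1]]
owned = binary_swap(parts)
```

Every node sends w/2 + w/4 + … pixels in total, just under one full image, regardless of n, which is why binary swap scales where point-to-point compositing does not.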
Binary Swap Volume Rendering Results

One node, two nodes, four nodes, eight nodes, sixteen nodes
Example: User Interface Reintegration

[Diagram: one app runs Tilesort, speaking the Chromium protocol over UDP/Gigabit Ethernet to several servers, each running an Integrate SPU; digital video cables feed the IBM Scalable Graphics Engine, which drives an IBM T221 display (3840x2400)]

Serial applications can drive the T221 with their original user interface
Parallel applications can have a user interface
CATIA Driving IBM's T221 Display

Jet engine nacelle model courtesy Goodrich Aerostructures
X-Windows Integration SPU by Peter Kirchner and Jim Klosowski, IBM T.J. Watson
Chromium is the only practical way to drive the T221 with an existing application
Demonstrated at Supercomputing 2001
A Hidden-line Style SPU

Demo
[Diagram: three configurations for Quake III:
(1) Quake III → Render;
(2) Quake III → StateQuery → VertexArray → HiddenLine → Render;
(3) Quake III → StateQuery → Send → Server → HiddenLine → Render]
Is "HiddenLine" Really a SPU?

Technically, no!
It requires potentially unbounded resources
Alternate design:

[Diagram: Application → StateQuery → VertexArray → HiddenLine splits the stream, sending lines to one server and polygons to another; each runs Readback → Send, and a final server depth-composites the results with a Render SPU]
Current Shortcomings

General display lists on tiled displays
• Display lists that affect the graphics state
Distributed texture management
• Each node must provide its own texture
• Potential N² texture explosion
• Virtualize distributed texture access?
Ease of creating parallel applications
• Input event management
Future Directions

End-to-end visualization system for 4D data
• Data management and load balancing
• Volume compression
Remote/ubiquitous visualization
• Scalable graphics as a shared resource
• Transparent remote interaction with (parallel) apps
• Shift away from desktop-attached graphics
Taxonomy of non-invasive techniques
• Classify SPUs and algorithms
• Identify tradeoffs in design
Observations and Predictions

Manipulation of graphics streams is a powerful abstraction for cluster graphics
• Achieves both input and output scalability
Providing mechanisms instead of algorithms allows greater flexibility
• Data management algorithms can be built into a parallel application or embedded in a SPU
Flexible remote graphics will lead to a revolution in ubiquitous computing