geforce 8800 opengl extensions
TRANSCRIPT
GeForce 8800 OpenGL Extensions
Evan Hart
© NVIDIA Corporation 2007
Roadmap
What’s differentThe programmable coreFeeding itNew pathwaysThe backend
© NVIDIA Corporation 2007
GeForce 8800 Differences
Pipeline modificationsAdditional geometry shader stageFeedback available midstream
Unified shading hardwareSame instructions and characteristics across shaders
© NVIDIA Corporation 2007
GeForce 8800 OpenGL Pipeline
More flexible memory access model
Multiple ways to read and write
Additional pipeline stagesFundamentally new capabilities
Fixed / Built in streamProgrammable streamVideo memory IO
Fixed stage
Programmable stage
Memory
Fixed / Built in streamProgrammable streamVideo memory IO
Fixed / Built in streamProgrammable streamVideo memory IO
Fixed stage
Programmable stage
Memory
Input Assembler
Vertex Processor
Raster Operations
Primitive Assembly
Rasterization & Interpolation
Framebuffer
Video M
emory
Geometry Processor
Fragment Processor
TransformFeedback
© NVIDIA Corporation 2007
FutureFuturePhysics
Vertex
Pixel
Geometry(new)
Future
FloatingPoint
Processor
Memory
ROP
Unified Shaders
© NVIDIA Corporation 2007
Programmability
#1 design concern of all extensionsEfficient for inputEfficient for computationEfficient for output
No concessions for fixed-functionOnly the right choices for programmabilityMost will not work with the fixed function pipeline
© NVIDIA Corporation 2007
New Capabilities
Unified instruction set for programsInteger instructions and data typesUniform set of structured branching constructsIndexable constants and temporariesNew texture fetching instructionsAttribute interpolation control
Flat shaded, perspective-incorrect or centroid sampled
© NVIDIA Corporation 2007
New OpenGL Program Extensions
OpenGL Shading LanguageEXT_gpu_shader4
GLSL extension for fourth generation shaders
ARB_vertex_program style asm-like programsNV_gpu_program4
NV_vertex_program4NV_fragment_program4NV_geometry_program4
Cg 2.0Not a GL extensionWill support the capabilities
© NVIDIA Corporation 2007
GL_EXT_gpu_shader4
New integer supportunsigned int, uvec2, uvec3, uvec4Integer varying and attributesInteger texture samplers: isampler*, usampler*Real integer ops
%, &, |, ^, >>, <<, ~
© NVIDIA Corporation 2007
GL_EXT_gpu_shader4 cnt’d
Extends varying type qualifiersCentroid – keeps the value inside the covered regionFlat – no interpolation, like flat shadingNoperspective – interpolate in screen-space
© NVIDIA Corporation 2007
Flat interpolation
© NVIDIA Corporation 2007
GL_EXT_gpu_shader4 cnt’d
Instancing and procedural generationgl_VertexID
Integer index derived from glDrawElements, etcOnly with vertex arrays (no display lists)Only when using VBOs
gl_InstanceIDInteger index of the current primitiveOnly available when using DrawElementsInstancedEXT
gl_PrimitiveIDInteger describing the primitive number
© NVIDIA Corporation 2007
GL_EXT_gpu_shader4 cnt’d
User-defined output variablesDeclared as “varying out”Allow more flexible outputs from the fragment shader
IntegersIntegers and floats simultaneously
Used instead of gl_FragColor or gl_FragData
© NVIDIA Corporation 2007
GL_EXT_gpu_shader cnt’d
Exact texel fetchestexelFetch*( sampler, icoord, lod)Treats a texture as a directly addressable array of texels
Texture size querytextureSize*( sampler, lod)Returns the dimensions of the texture level
Shadow cubemapsDepth is compared against the 4th component
Texture Gradient fetchestexture*Grad( sampler, coord, dx, dy)Allows custom lod/anisotropy control
© NVIDIA Corporation 2007
GL_EXT_gpu_shader4 cnt’d
Offset texture fetchestexture*Offset( sampler, coord, offset)Offset must be compile-time constant expressionAllow convenient shortcut for building filter kernelsOffset size has an implementation dependent limit
© NVIDIA Corporation 2007
GL_NV_gpu_program4
Composed of three sub-extensionsGL_NV_vertex_program4GL_NV_fragment_program4GL_NV_geometry_program4
Still based on 4-wide registersNo longer really matches HWEnhances backward compatibility
Provides same capabilities as EXT_gpu_shader4
© NVIDIA Corporation 2007
Feeding the Shader
InstancingOptimized rendering of multiple copies of an object
New texture typesTexture arrays
New texture formatsInteger texturesAdditional HDR formats
Data buffersFast flexible ways to swap blocks of constants
Texture buffersMassive data store for shaders
© NVIDIA Corporation 2007
EXT_draw_instanced
Efficient rendering for large numbers of objectsVertex array only
glDrawArraysInstancedEXTglDrawElementsInstancedEXT
Draw calls take one additional parameter# of instances to draw
Each instance has a separate instance IDUsed by the shader to change behavior
Select transform matrixSelect material
Clever shaders can even draw different objects
© NVIDIA Corporation 2007
EXT_texture_array
Array TexturesGeneralizes 1D and 2D textures to consist of an array of 1D or 2D textures
1D array loaded using glTexImage2D2D array loaded using glTexImage3D
Array indexable from shader programAccess layer using r texture coordinate
Layers must be same size and formatNo filtering between layersArrays of cubemaps not currently supportedRemoves need for texture atlases
Useful for instancing, terrain texturing
© NVIDIA Corporation 2007
Array Textures Cont’d
S [0-1]
T [0-1]
R [0–3]
© NVIDIA Corporation 2007
Texture Format Extensions
EXT_texture_integerAdds integer texture formats
EXT_packed_floatSpace-efficient float formatRelatively low precision
More than good enough most times
EXT_texture_shared_exponentSpace-efficient float formatVariable accuracy
Can be more or less accurate than packed floatAlso more than good enough
© NVIDIA Corporation 2007
Integer texture formats
EXT_texture_integerAdds integer texture formats8, 16, and 32 bit per componentSigned and unsignedRGB, RGBA, Luminance, Alpha, Intensity, and LALack filtering supportOnly available with EXT_gpu_shader4 / NV_gpu_program4
UsesBitfieldsColor index emulationLookup tables
© NVIDIA Corporation 2007
Packed Float Textures
EXT_packed_float11/11/10 floating point format5 bit exponent per component
Bias of -15Only supports positive values (no sign bit)Can be used as framebuffer formatMax values
R/G – 65024B – 64512
Size advantage can make it much faster than float16Supports filtering, blending, and MSAA
© NVIDIA Corporation 2007
Packed Float Texture Usage
© NVIDIA Corporation 2007
RGBE / Shared Exponent textures
EXT_texture_shared_exponent9/9/9/5 RGBE formatSimilar to Radiance 8/8/8/8 RGBE formatShared 5 bit exponent (bias of -15)Source texture format only (not renderable to)Only supports positive values (no sign bit)
© NVIDIA Corporation 2007
New compressed texture formats
EXT_texture_compression_latcGood format for greyscale w/ alpha compression8-bits per texelStored in 4x4 blocks (like DXT formats)Components are compressed independently
EXT_texture_compression_rgtcSame properties as latcReturns (r,g,0,1) instead of (l,l,l,a)Useful for normal map compression
(x,y) = 2 * (r,g) – 1Z = sqrt( 1 – (x2 + y2) )
© NVIDIA Corporation 2007
OpenGL Data Buffer Extensions
EXT_bindable_uniformUsed with EXT_gpu_shader4Allows a uniform to be bound to a buffer objectQuickly switch all values in a large structure or array
NV_parameter_buffer_objectUsed with NV_gpu_program4Defines new object containing banks of uniform parametersEnables rendering to parameter buffersUseful for instancingProvides a very large parameter store
© NVIDIA Corporation 2007
EXT_texture_buffer_object
Bind buffer object as a textureJumbo 1D texture
Larger size limit (128 Mtexels)Does not support filteringAddressed by element
Not normalized [0-1]Limited format support
No RGB support (RGBA is OK)Minimum of 1 byte per componentNo compressed formats
© NVIDIA Corporation 2007
Texture buffer usage
Large constant storeSignificantly larger than EXT_bindable_uniformJumbo bone list for skinning
Instancing dataTransform matricesMaterials
Custom indexingSeparate ‘index’ for position, normal, texture coordinateLess efficient than normal vertex fetching
Not as coherentCan be useful interactive editing due to reduced sw cost
© NVIDIA Corporation 2007
New Pathways
Geometry ShadersNew pipeline stage
Render target arraysGeometry shader indexing of texture arrays
Transform feedbackExport vertices mid-stream
© NVIDIA Corporation 2007
Geometry Shader BasicsInput
Standard primitivespoint, line, triangle…
New primitive types include neighboring vertices
Line with adjacency
Triangle with adjacency
OutputUnique output type (independent from input type)Points, line strips or triangle stripsCan output zero or more primitivesGenerated primitive stream is in the same order as inputted
12 3
4
56
1
2
34
Primitive Assembly
Clipping & Rasterization
Geometry Shader
Video M
emory
© NVIDIA Corporation 2007
Geometry Shader Applications
Better point spritesRotation, non-square, motion blur
Simple subdivisionSingle pass cube map creationAutomatic stencil shadow polygon generationFur rendering
Fin generationCurve rendering
2D rendering, hair/furParticle systemsGPGPU
Data amplification – variable number of outputs
© NVIDIA Corporation 2007
Geometry Shader Extensions
EXT_geometry_shader4GLSL geometry shaders
NV_geometry_shader4Adds to EXT_geometry_shader4
NV_geometry_program4ARB_vertex_program stylePart of NV_gpu_program4
© NVIDIA Corporation 2007
EXT_geometry_shader
New link-time parametersPrimitive input and output typeMax vertices output
New shader variablesgl_VerticesIn – number of input verticesgl_Layer – texture array layer targetgl_PrimitiveID – Set primitive ID seen by the fragmentsgl_PrimitiveIDIn – Primitive ID based on input prims
© NVIDIA Corporation 2007
EXT_geometry_shader code
varying in vec3 eyeNormal[gl_VerticesIn];
varying out vec3 oEyeNormal;
for ( int i = 0; i < gl_VerticesIn; i++) {oEyeNormal = eyeNormal[i];//causes all output varying to submitEmitVertex();
}
//Start a new stripRestartPrimitive();
© NVIDIA Corporation 2007
NV_geometry_shader4
Relax restrictions of EXT_geometry_shader4Changing input/output primitive without a relinkChanging max output vertices without a relink
Defines additional behaviorQuads and polygons at turned into triangles
Shader accepting triangle accepts quads
© NVIDIA Corporation 2007
NV_geometry_program4
Same capabilities as geometry_shader4ARB_vertex_program style syntaxInputs are ‘ATTRIB’Outputs are ‘RESULT’Input primitive, output primitive, and max vertices
Declared in the shader
© NVIDIA Corporation 2007
Rendering to Texture Arrays
Can select destination layer for each primitive in geometry program
Write to “gl_Layer” or “result.layer”Can be used for single-pass render-to-cubemap
Read input triangleOutput to 6 cube map faces, transformed by correct face matrixNo real savings for GPU
Same transform and rasterization loadAdditional geometry shader work
Simple culling may help
© NVIDIA Corporation 2007
EXT_transform_feedback
Allows storing output from a vertex program or geometry program to buffer objectEnables multi-pass operations on geometry, e.g.
Store results of animation (skinning) to buffer, reuse for multiple lightsRecursive subdivision
Provides queries for number of primitives generated by geometry program
© NVIDIA Corporation 2007
Example: Terrain Subdivision
Takes quad as input (actually line_adj primitive)Geometry program subdivides into 4 new quads using diamond-square subdivisionUses two VBOs, reads from one, writes to the other using transform feedbackThen swap
© NVIDIA Corporation 2007
Terrain Subdivision
© NVIDIA Corporation 2007
The backend
Floating point depth buffersEXT_draw_buffers2sRGB FramebuffersCoverage sample anti-aliasing
© NVIDIA Corporation 2007
NV_depth_float
Provides floating point depth buffers and texturesRange extended to [–MAX_FLOAT, MAX_FLOAT]New functions for non-normalized values
glDepthRangedNVglClearDepthdNVglDepthBoundsdNV
Multiple formatsDEPTH_COMPONENT32F_NVDEPTH32F_STENCIL8_NV
FBO onlyCare must be taken in using the extra precision
© NVIDIA Corporation 2007
Float depth precision
Normal [0-1] mapping is ineffectivePrecision bunches up at 0
Logarithmic distribution of floats24 bits of precision [0.5 (2^-1) – 1.0 (2^0)]
Perspective projection pushes scene toward 12x near maps to roughly 0.590%+ of scene [0.5 – 1.0] range
Effectively a 25-bit depth bufferChanging to [0-256] is no better
[128 – 256] has the same 24-bits
© NVIDIA Corporation 2007
Depth Precision Distribution
Integer distribution
Floating point distribution
0 1
2x near
© NVIDIA Corporation 2007
The Solution
Reverse the depth mappingFar = 0.0Near = 1.0 (or higher)
Precision bunching now reversedCompression and expansion line upResults in essentially linear depth buffering
© NVIDIA Corporation 2007
A Caveat
Small precision cliff near 0FP numbers often use denormals to fill itGraphics hardware typically avoid it
Often CPU’s too, denorms are slow
Problem is minimal due to exponent rangeInfinite far plane completely fixes it
© NVIDIA Corporation 2007
EXT_draw_buffers2
Provides per draw buffer control ofBlend EnableColor mask
Does not provide independent control ofBlend FunctionBlend EquationBlend Color
© NVIDIA Corporation 2007
EXT_framebuffer_sRGB
Allows framebuffer to be stored in sRGB spaceProvides perceptual color compression
8 bits sRGB is similar to 10 bits linear RGBConverts to/from sRGB on access to framebufferImplements a gamma of 2.2
This matches most monitors relatively well
Controlled via a simple enableNot available on all formats (>8 bit per component)
© NVIDIA Corporation 2007
CSAA
Coverage Sample Anti-AliasingGL_NV_framebuffer_multisample_coverage
Decouples primitive coverage from depth/colorIncreases edge quality with little costMemory and bandwidth overhead scale with # depth samplesWorst case is as good as # of depth samples
© NVIDIA Corporation 2007
CSAA usage
Extremely simple
MSAAglRenderbufferStorageMultisampleEXT(
GL_RENDERBUFFER_EXT, 4, GL_RGBA8, width, height)
CSAAglRenderbufferStorageMultisampleCoverageNV(
GL_RENDERBUFFER_EXT, 16, 4, GL_RGBA8, width, height)
© NVIDIA Corporation 2007
CSAA Modes
Name Coverage Samples Color/Depth Samples
2x 2 2
4x 4 4
8x 8 4
8xQ 8 8
16x 16 4
16xQ 16 8
© NVIDIA Corporation 2007
Thanks
Simon GreenSamuel GateauHenry MoretonNVIDIA DevTech team
© NVIDIA Corporation 2007
New Developer Tools at GDC 02007
SDK 10
PerfKit 5 FX Composer 2
GPU-AcceleratedTexture Tools
ShaderPerf 2
Shader Library