firaxis lore

Post on 10-Dec-2015

DESCRIPTION

This is a short presentation about civ5 creator engine

TRANSCRIPT

Firaxis LORE

And other uses of D3D11

Low Overhead Rendering Engine

• Or, how I learned to render 15,000+ batches at 60 FPS

Overview

• Civ 5 is a big game that covers 6,000 years of history
• The entire map can be populated (or polluted) with all sorts of things the user creates
• Need to be able to render a huge number of possibly disparate types

Early Goals

• Build a brand-new engine for Civilization V
• Like the game, we wanted the graphics engine to be able to ‘stand the test of time’
• Decided, while D3D11 was in alpha, to build the engine natively for the D3D11 architecture and map backwards to DX9

Step 1: Cutting the overhead down

[Diagram labels: FSL Files, CPP / H, Template Code, Compile Time Glue Code]

• Shaders start in Firaxis Shading Language (FSL), a superset of HLSL
• FSL compiles into a CPP and header file: all shader constants are mapped to structs, grouped into packages where all packages have the same bindings
• Model code is templated: the FSL-generated header is then bound with the template code
• Result is a tiny amount of code that fills out the required shading and barely shows up in profiling
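The struct-plus-template idea can be sketched as follows. This is a minimal illustration, not the actual FSL output: `TerrainConstants`, its fields, and `appendConstants` are hypothetical names. The only outside fact it relies on is that D3D11 constant buffer sizes must be a multiple of 16 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a generated header: one plain struct per
// constant package. Field names and layout are illustrative only.
struct TerrainConstants {
    float worldViewProj[16];
    float fogColor[4];
    float heightScale;
    float padding[3];  // pads the struct to a multiple of 16 bytes
};

// Template glue code: copies any package struct into a byte payload and
// returns the offset it was written at. The layout checks run at compile
// time, so the run-time cost is a resize and one memcpy.
template <typename Package>
std::size_t appendConstants(std::vector<unsigned char>& payload, const Package& pkg) {
    static_assert(std::is_trivially_copyable<Package>::value,
                  "constant packages must be plain data");
    static_assert(sizeof(Package) % 16 == 0,
                  "constant packages must be a multiple of 16 bytes");
    const std::size_t offset = payload.size();
    payload.resize(offset + sizeof(Package));
    std::memcpy(payload.data() + offset, &pkg, sizeof(Package));
    return offset;
}
```

Because the checks are `static_assert`s, a package with a bad layout fails to compile instead of failing at run time, which fits the "compile time glue code" label on the slide.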

Step 2: Abstracting the Rendering

• Still have to support DX9; might have to support consoles in the future
• Might have to write a ‘driver’
• Our solution: make DX9 ‘look like’ DX11
• Started with as restricted a design as possible, and expanded as we needed to

Packetized Rendering

• Stateless rendering, much simpler than D3D
• Command based: all rendering is performed by self-contained commands
• A command set may contain a list of surfaces to render, each with a shader constant payload
• A surface is an immutable bundle of an IB, VB, textures, shader def, etc.
• All state is bundled into packages (Alpha State, Z State, etc.); commands reference one of these state packages
• Entire frame is queued up
• Minimal per-frame allocation

Only 5 types of commands

• COMMAND_RENDER_BATCHES
  – A list of surfaces to render into 1 or more render targets, with alpha and Z state bundles
  – Surfaces have IB, VB, sampler and texture bundles; all required state is specified
• COMMAND_GENERATE_MIPS
• COMMAND_RESOLVE_RENDERTEXTURE
• COMMAND_COPY_RENDERTEXTURE
• COMMAND_COPY_RESOURCE
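A rough sketch of how such self-contained commands might be laid out is below. Only the five command names come from the slides; every type, field, and the `FrameQueue` class are assumptions made for illustration, and the real engine presumably uses a tighter memory layout than `std::vector`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Handle = std::uint32_t;  // hypothetical opaque resource/state handle

// Mirrors the five command types listed on the slide.
enum class CommandType : std::uint8_t {
    RenderBatches,         // COMMAND_RENDER_BATCHES
    GenerateMips,          // COMMAND_GENERATE_MIPS
    ResolveRenderTexture,  // COMMAND_RESOLVE_RENDERTEXTURE
    CopyRenderTexture,     // COMMAND_COPY_RENDERTEXTURE
    CopyResource           // COMMAND_COPY_RESOURCE
};

// A surface bundles everything needed to draw; treated as immutable
// after creation (commands only hold const pointers to it).
struct Surface {
    Handle indexBuffer;
    Handle vertexBuffer;
    Handle textureBundle;
    Handle samplerBundle;
    Handle shader;
};

// One surface plus the offset of its shader constant payload.
struct Batch {
    const Surface* surface;
    std::uint32_t constantOffset;
};

// A command carries all the state it needs; nothing is inherited
// from whatever command ran before it.
struct Command {
    CommandType type;
    std::vector<Handle> renderTargets;  // one or more targets
    Handle alphaState;                  // reference to an alpha state package
    Handle zState;                      // reference to a Z state package
    std::vector<Batch> batches;
};

// The whole frame is queued first and handed to the backend afterwards.
class FrameQueue {
public:
    void push(Command cmd) { commands_.push_back(std::move(cmd)); }
    std::size_t commandCount() const { return commands_.size(); }
    std::size_t batchCount() const {
        std::size_t n = 0;
        for (const Command& c : commands_) n += c.batches.size();
        return n;
    }
    // Empties the queue for the next frame; the outer vector keeps its capacity.
    void clear() { commands_.clear(); }
private:
    std::vector<Command> commands_;
};
```

Since each command names its own state packages, a backend can replay the queue in order without tracking what was set earlier.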

Packetized Rendering

[Diagram labels: Rendering System, Rendering Engine, D3D/Driver]

Step 3: Threading

[Diagram: many Jobs, Job Manager, Rendering System]

Why do we queue up the entire frame?

• Would seem like additional overhead, but perf analysis shows it is a net win
  – Internal command setup is super cheap, just some mem copies
  – Engine cache coherency is vastly better
  – D3D driver cache coherency is much better with one giant dump
  – Very low % of total CPU time spent in submission
  – Allows us to filter redundant D3D calls; call overhead adds up
  – Fast even in DX9
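Filtering redundant calls can be sketched with a small state cache in front of the device. This is an assumed design, not the engine's code: `CountingDevice` is a fake stand-in for the D3D device context, and `StateFilter` and `kUnsetState` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Fake device that only counts calls, standing in for the real D3D context.
struct CountingDevice {
    int stateCalls = 0;
    int drawCalls = 0;
    void setAlphaState(std::uint32_t) { ++stateCalls; }
    void setZState(std::uint32_t) { ++stateCalls; }
    void draw() { ++drawCalls; }
};

// Sentinel meaning "nothing has been set on the device yet".
constexpr std::uint32_t kUnsetState = 0xFFFFFFFFu;

// Remembers the last state package sent to the device and skips the
// call when the next command asks for the same one.
class StateFilter {
public:
    explicit StateFilter(CountingDevice& device) : device_(device) {}

    void drawWith(std::uint32_t alphaState, std::uint32_t zState) {
        if (alphaState != alpha_) {
            device_.setAlphaState(alphaState);
            alpha_ = alphaState;
        }
        if (zState != z_) {
            device_.setZState(zState);
            z_ = zState;
        }
        device_.draw();
    }

private:
    CountingDevice& device_;
    std::uint32_t alpha_ = kUnsetState;
    std::uint32_t z_ = kUnsetState;
};
```

The commands themselves stay stateless; the cache lives only in the submission layer, so a skipped call never changes what a command means.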

Implementation advantages

• Once the ‘stateless’ concept is grasped, code maintenance is easy
• Next to no state leaking (flickering alpha, textures, etc.)
• Because rendering is packetized, individual jobs need little or no communication with each other
• NO THREADING BUGS

Threaded D3D11 submission

• Top issues:
  – Generally high driver overhead for batch submission
  – But: D3D11 has multithreaded submission
  – Command streams do not necessarily map 1:1 to CommandLists
  – Civilization V can change how it submits via settings in the config files
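One way to decouple the queued command stream from the number of command lists is to split it into contiguous ranges, with the list count coming from configuration. The sketch below shows only that partitioning step; `splitIntoLists` and `Range` are hypothetical names, and how Civilization V actually divides its work is not stated in the slides. In real D3D11 code each range would typically be recorded on a deferred context and the resulting command lists executed in order on the immediate context.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open range [begin, end) of command indices for one command list.
struct Range {
    std::size_t begin;
    std::size_t end;
};

// Splits commandCount queued commands into contiguous ranges, one per
// command list. requestedLists stands in for a value read from a config
// file; it is clamped so no list is empty unless there are no commands.
std::vector<Range> splitIntoLists(std::size_t commandCount, std::size_t requestedLists) {
    const std::size_t lists =
        std::max<std::size_t>(1, std::min(requestedLists, commandCount));
    const std::size_t base = commandCount / lists;
    const std::size_t extra = commandCount % lists;  // first 'extra' lists get one more

    std::vector<Range> ranges;
    ranges.reserve(lists);
    std::size_t cursor = 0;
    for (std::size_t i = 0; i < lists; ++i) {
        const std::size_t size = base + (i < extra ? 1 : 0);
        ranges.push_back({cursor, cursor + size});
        cursor += size;
    }
    return ranges;
}
```

Keeping the ranges contiguous and in order means the lists can be recorded in parallel while the final submission order still matches the original queue.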

Step 4: Gloating over results

• Wildly surpassed commonly held beliefs on the number of batches possible, especially with threading

Test        Driver with native CL support    Driver without CL support
Units       1686*                            931
Landmarks   1152*                            673
Lategame    3616*                            2052

*Believed to be GPU limited

Conclusions

• High throughput rendering is possible, IF:
  – Care is taken to reduce application overhead
  – Job-based, payload-based rendering
  – Redundant state and calls are filtered
  – Use D3D11 command lists
  – Engine can peg 12 threads at 97% (sans driver)

D3D11 Features: Tessellation

• Major addition to the D3D11 API

[Screenshot]

Terrain

• Civ5 contains one of the most complex terrain systems ever made
• Complete procedural process
• Use GPU to raytrace and anti-alias shadows
• Caching system to deal with cases where terrain is too big

Tessellation

• Terrain is very high detail, roughly 64x64 heightmap data per hex
• Triangle count, when zoomed out, can be in the millions
• Used tessellation as a ‘drop-in’

Tessellation, cont.

• Simple bicubic beta-spline patches
• Adjusted global tessellation as the camera moved in and out
• A strict performance increase: 10%-40% faster, on both AMD and Nvidia hardware
• More adaptive techniques would work even better, but we didn’t have time to implement them
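The two ideas on this slide, smooth bicubic patches over heightmap data and a single global tessellation level driven by the camera, can be sketched on the CPU as below. The slides do not give the beta-spline bias and tension parameters, so this sketch substitutes the uniform cubic B-spline basis (the beta-spline special case with bias 1 and tension 0). `bsplineBasis`, `evalHeight`, `globalTessFactor`, and the distance mapping are all assumptions; the only outside fact used is that D3D11 caps tessellation factors at 64.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

// Uniform cubic B-spline basis weights at parameter t in [0, 1].
// The four weights always sum to 1.
std::array<float, 4> bsplineBasis(float t) {
    const float t2 = t * t;
    const float t3 = t2 * t;
    const float s = 1.0f - t;
    return {{ s * s * s / 6.0f,
              (3.0f * t3 - 6.0f * t2 + 4.0f) / 6.0f,
              (-3.0f * t3 + 3.0f * t2 + 3.0f * t + 1.0f) / 6.0f,
              t3 / 6.0f }};
}

// Evaluates a bicubic patch over a 4x4 grid of heightmap control values.
float evalHeight(const float (&h)[4][4], float u, float v) {
    const std::array<float, 4> bu = bsplineBasis(u);
    const std::array<float, 4> bv = bsplineBasis(v);
    float result = 0.0f;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            result += bu[i] * bv[j] * h[i][j];
    return result;
}

// One global tessellation factor for the whole terrain: full detail at
// fullDetailDistance or closer, falling off inversely with distance,
// clamped to the D3D11 range [1, 64]. cameraDistance must be > 0.
float globalTessFactor(float cameraDistance, float fullDetailDistance) {
    const float factor = 64.0f * fullDetailDistance / cameraDistance;
    return std::min(64.0f, std::max(1.0f, factor));
}
```

In a real D3D11 pipeline the patch evaluation would live in the domain shader and the factor would be written by the hull shader; a per-patch, screen-space factor would be the more adaptive variant the slide mentions.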

Leaders

Leader Rendering

• Largely done with DX10.1 rendering tech
• New variable bit rate compression technology implemented for D3D11
• 2.5 GB of texture data reduced to 150 MB; can be decompressed on the GPU
• Details forthcoming; research is in the publication submission process
• Extensive use of UAVs

Future Stuff, NO AO

Future Stuff (CS), AO

Q&A
