developer day - khronos group · shadows shadow caching materials physically based (w/approx)...

74
© Copyright Khronos™ Group 2018 - Page 1 GDC 2019 #KhronosDevDay DEVELOPER DAY #KhronosDevDay

Upload: others

Post on 13-Mar-2020

8 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© Copyright Khronos™ Group 2018 - Page 1

GDC 2019#KhronosDevDay

DEVELOPER DAY#KhronosDevDay

Page 2: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© Copyright Khronos™ Group 2018 - Page 2

GDC 2019#KhronosDevDay

DEVELOPER DAYBringing Fortnite to Mobile with Vulkan and OpenGL ES

Jack Porter, Epic Games Kostiantyn Drabeniuk, Samsung Electronics

Page 3: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 3This work is licensed under a Creative Commons Attribution 4.0 International License

Agenda• Part 1 – Fortnite Mobile Challenges and Solutions - Jack Porter, Epic Games

- Scope of the problem to bring PC & console cross-play to mobile

- Performance

- Memory

- Recent UE improvements

• Part 2 – Vulkan for Fortnite Mobile - Kostiantyn Drabeniuk, Samsung Electronics

- Vulkan advantages

- Performance optimizations

- Hitching and memory optimizations

Page 4: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 4This work is licensed under a Creative Commons Attribution 4.0 International License

The Same Game, Not a Port.From the very start we set out to support cross-play for all platforms including mobile

• The same map is used on all platforms (with regular simultaneous content updates)

• Anything that affects gameplay must be supported

• The engagement distance must be the same across platforms

• Code must not diverge from base Unreal Engine 4

Page 5: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 5This work is licensed under a Creative Commons Attribution 4.0 International License

Fortnite Rendering Features – PC & ConsoleDeferred Renderer

Movable Directional Light Cascaded Shadow Maps

Ray-traced Distance Field Shadows

Movable SkylightDistance Field Ambient Occlusion

Screen Space Ambient Occlusion

Local LightsPoint + Spot

Shadows

Shadow Caching

MaterialsPhysically Based

Subsurface Scattering

Two-sided foliage

EffectsVolumetric Fog

Light Shafts

GPU Particle Simulation

Soft Particles

Decals

Foliage Animation

Post ProcessingBloom

Object Outlines

ACES Tonemapper

Anti-aliasingTemporal AA

MSAA

Page 6: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 6This work is licensed under a Creative Commons Attribution 4.0 International License

Fortnite Rendering Features – MobileForward Renderer

Movable Directional Light Cascaded Shadow Maps

Ray-traced Distance Field Shadows

Movable SkylightDistance Field Ambient Occlusion

Screen Space Ambient Occlusion

Local LightsPoint + Spot

Shadows

Shadow Caching

MaterialsPhysically Based (w/approx)

Subsurface Scattering

Two-sided foliage

EffectsVolumetric Fog

Light Shafts

GPU Particle Simulation

Soft Particles

Decals

Foliage Animation

Post ProcessingBloom

Object Outlines

ACES Tonemapper

Anti-aliasingTemporal AA

MSAA

Page 7: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 7This work is licensed under a Creative Commons Attribution 4.0 International License

Scaling Content for Mobile• Destructible Hierarchical LOD

- Aggregate individual assets into a hierarchy of proxy objects➢Replace individual assets with proxies at a distance

- Fortnite is a game where everything is destructible➢Tag in vertex color to allow the vertex shader cull destroyed geometry from the proxy

assets

No HLOD HLOD

Page 8: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 8This work is licensed under a Creative Commons Attribution 4.0 International License

Fortnite Mobile - By the Numbers

Geometry

- 80,000 objects on the island- 10,000 typically loaded

- 800 draw calls average, 2000+ peak

- 600,000+ triangles (high end)

Shaders / PSOs

- 4,300 PSOs actually needed for rendering!- gathered using automated and manual gameplay

- from a pool of 28,000 shader programs

Memory

- 1.2GB – 2GB - Varies depending on device profile and rendering API and shader allocation strategy

Page 9: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 9This work is licensed under a Creative Commons Attribution 4.0 International License

Challenges

•Performance

•Memory

•Device Compatibility

Page 10: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 10This work is licensed under a Creative Commons Attribution 4.0 International License

Performance

CPU cost

• Draw call cost - graphics API overhead

➢Add an RHI Thread

➢Use Vulkan instead of OpenGL ES

• Reducing draw calls & state change

➢Improving occlusion culling

➢Sorting

➢Instancing

GPU cost

➢Content changes

➢Resolution and frame rate scaling

• Rendering code improvements

➢Collapsing render passes

Page 11: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 11This work is licensed under a Creative Commons Attribution 4.0 International License

Draw Call Cost – Renderer Threading

Game Thread- Update game state from player input,

network and physics simulation

- Enqueue game object state change

- Enqueue resource changes

- Send command to render scene

Render Thread- Dequeue state change into game object

render proxies

- Create or update render resources

- Render scene

1. Retrieve occlusion queries from a

previous frame

2. Calculate object visibility

3. Render shadow maps

4. Render opaque geometry including

lighting and shadows (base pass)

5. Render occlusion queries testing on

depth layed down by base pass

6. Render translucency pass

7. Render post-process and tonemap

8. Render UI

• Unreal Engine 4 has two main threads

Page 12: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 12This work is licensed under a Creative Commons Attribution 4.0 International License

Game / Render Thread

Game Thread waits

for previous frame

Render Thread

Kick

Rendering

Wait for

vsync and

SwapBuffers

Visibility

Walk scene graph and issue

draw calls for visible objects

Page 13: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 13This work is licensed under a Creative Commons Attribution 4.0 International License

Add RHI Thread

Significant part of Render thread time is spent inside GL API calls, especially when

there has been a lot of state change.

- 25ms or more on low end devices

- Time mostly spent in glDrawElements

- Most of the benefits of instancing come from sorting better by state

Improvement was to add an “RHI Thread” that does nothing but issue GL API calls

• Rendering code never waits for GL API calls to return

➢Resource creation and update APIs return to Render thread immediately with a

proxy handle

Page 14: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 14This work is licensed under a Creative Commons Attribution 4.0 International License

Game / Render / RHI Thread

Waiting on

previous RHI

thread

frame

Visibility

Enque

draw calls

to RHI

Thread

Wait for next frame

Render Kick

Wait for

vsync and

SwapBuffers

Update resources, change state,

and issue draw calls using GL ES API

Page 15: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 15This work is licensed under a Creative Commons Attribution 4.0 International License

RHI Thread Synchronization

• Render thread synchronizes with RHI thread

- Waiting on occlusion query results➢Using the RHI Thread adds an extra frame of latency for occlusion queries

• Game thread synchronizes with RHI thread

- Waits to ensure the RHI thread doesn’t get more than 2 frames behind

Page 16: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 16This work is licensed under a Creative Commons Attribution 4.0 International License

RHI Thread comparison – OpenGL ESR

HI T

hre

ad e

nable

dR

HI T

hre

ad d

isable

dGame Thread

Render Thread

RHI Thread

Overall Frame

Time (ms)

Galaxy Note 9

Adreno

GL ES Mode

Page 17: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 17This work is licensed under a Creative Commons Attribution 4.0 International License

Draw Call Cost - Graphics API Selection

• OpenGL ES 3.1

• ASTC textures

• Android 6.0 or later

• Extensions

- EXT_color_buffer_half_float

- EXT_copy_image (or ES 3.2)

- OES_get_program_binary

• Vulkan 1.0.1

• ASTC textures

• Android 8.0 or later

• Whitelisted for specific devices based on

improved measured performance

Fortnite for Android can run with either UE4’s OpenGL ES or Vulkan Render Hardware

Interface (RHI), chosen by the Device Profile at runtime.

Page 18: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 18This work is licensed under a Creative Commons Attribution 4.0 International License

Graphics API Selection

Unfortunately Vulkan is not a clear win on many devices

• Lack driver maturity on older devices can lead to poor performance

• Modest CPU win on newer devices

• Extra GPU cost can negate any gains (working to reduce this)

Many devices where we’d most like to use Vulkan – ie devices with poor CPU

performance limiting draw call counts - are unable to benefit from it.

Page 19: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 19This work is licensed under a Creative Commons Attribution 4.0 International License

Graphics API Selection

Currently Fortnite enables Vulkan only on:

• Galaxy S9 Adreno

• Galaxy Note 9 Mali and Adreno

• Galaxy S10 Mali and Adreno

• Vulkan is also a win on modern devices such as Snapdragon 845 and Mali-G76 devices

➢Expect to ship it enabled by default for many of this year’s flagship devices

Vulkan is enabling us to push quality and performance at the high end

Page 20: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 20This work is licensed under a Creative Commons Attribution 4.0 International License

RHI Thread comparison – VulkanR

HI T

hre

ad e

nable

dR

HI T

hre

ad d

isable

dGame Thread

Render Thread

RHI Thread

Overall Frame

Time (ms)

Galaxy Note 9

Adreno

Vulkan Mode

Page 21: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 21This work is licensed under a Creative Commons Attribution 4.0 International License

RHI Thread comparison – Vulkan vs OpenGL ESGame Thread

Render Thread

RHI Thread

Overall Frame

Time (ms)

Galaxy Note 9

Adreno

Vulk

an

OpenG

L E

S

Page 22: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 22This work is licensed under a Creative Commons Attribution 4.0 International License

Draw Call Cost - Occlusion Culling

• Render proxy geometry against the depth buffer wrapped with a query

- glBeginQuery, glEndQuery / vkCmdBeginQuery, vkCmdEndQuery

• Check if any pixels of the proxy geometry was renderered- glGetQueryObjectuiv(GL_QUERY_RESULT) / vkGetQueryPoolResults(VK_QUERY_RESULT_WAIT_BIT)

• Use that information to

decide whether to render

the real geometry

Page 23: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 23This work is licensed under a Creative Commons Attribution 4.0 International License

Latency

• We need an existing depth buffer to test against - On PC & console we do a depth prepass so we can render the queries early in the frame

- On mobile we don’t have the depth buffer until the end of the base pass

- We need to adds one extra frame of latency to insure the results are available in time

Frame 3

Frame 2

Frame 1

Occlusion Culling – Implementation

Render

Base

Pass

Render

Occlusion

Proxies

Visi-

bility

Render

Trans-

lucency

Post

Process

Sw

ap

Render

Base

Pass

Render

Occlusion

Proxies

Visi-

bility

Render

Trans-

lucency

Post

Process

Sw

ap

Render

Base

Pass

Render

Occlusion

Proxies

Visi-

bility

Render

Trans-

lucency

Post

Process

Sw

ap

Page 24: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 24This work is licensed under a Creative Commons Attribution 4.0 International License

Thread synchronization when reading results

- We can only wait for queries on RHI Thread

- Results are needed on the Render thread where we calculate visibility➢Poll results using glGetQueryObjectuiv(GL_QUERY_RESULT_AVAILABLE) between

RHIThread commands and update a thread-safe flag

➢Usually have results before Render thread asks for them and we do not need to

block

Poll Queries from Previous Frames

Occlusion Culling – Implementation

Render

Base

Pass

Render

Occlusion

Proxies

Render

Trans-

lucency

Post

Process

Sw

ap

Visi-

bilityDraw (Render Thread)

Render

Base

Pass

Render

Occlusion

Proxies

Render

Trans-

lucency

Post

Process

Sw

ap

Visi-

bilityDraw (Render Thread)

Visi-

bilityDraw (Render Thread)

Poll Queries from Previous Frames

Page 25: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 25This work is licensed under a Creative Commons Attribution 4.0 International License

Occlusion Culling – Implementation

Limited number of queries

• Ideally we would have one occlusion query per object

• Some mobile devices have internal limits for the number of outstanding queries

➢OpenGL and Vulkan RHIs virtualize occlusion queries to abstract this away

➢Aggregate proxy geometry on some frames

Page 26: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 26This work is licensed under a Creative Commons Attribution 4.0 International License

Shader Program Memory• In UE4 the majority of shaders are created from artist-generated materials

hlslcc

Artist-created material shader graphs

GLSL

Page 27: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 27This work is licensed under a Creative Commons Attribution 4.0 International License

Shader Permutations• From each material graph we generate fragment and vertex shader permutations

- “Vertex Factory” (mesh type)- Static mesh, skeletal mesh, particle, terrain, …

- Forward lighting pass- Base forward pass with CSM shadow

- Base forward pass, unshadowed

- Shadow depths- Shared for opaque objects

- Unique for alpha masked objects

- Translucency & effects

• Result is over 28,000 individual shaders

Page 28: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 28This work is licensed under a Creative Commons Attribution 4.0 International License

Shaders – OpenGL ES• Set of PSOs encountered while playing Fortnite gathered offline using automated

and manual gameplay

- 4,300 PSOs actually needed for rendering

On OpenGL ES we must compile from GLSL source code

• First launch of Fortnite

- Compile all shader programs

- Save the resulting shader program binary to the user’s phone using

GL_OES_get_program_binary

• Subsequent launches

- Recreate shaders with glProgramBinaryOES

Page 29: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 29This work is licensed under a Creative Commons Attribution 4.0 International License

Shaders – OpenGL ES – LRU Cache

• Ideally we would have all shader programs created before gameplay starts

- Shaders measured to expand to more than 10x their binary size in driver RAM

allocations

- Instead we use an LRU cache to keep only a limited number of shader programs

resident

- Saves over 400MB on some devices

Page 30: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 30This work is licensed under a Creative Commons Attribution 4.0 International License

Shaders – OpenGL ES – LRU Cache

• Shader eviction strategies

- When resident shader program count exceeds some threshold

- Estimation of resident shader memory

- On object destruction- Not great for transient but frequent uses like particles, so we add an extra delay

• Shader restoration strategies

- Stream shader binary from storage, creates hitches in practice

- Recreate shader from compressed binary in RAM- On Adreno, shaders total about 20MB compressed so it’s feasible to always keep them

resident

- On Mali we keep binaries in RAM for non-resident shader programs

- Create create binary from shader program on eviction and store in RAM

- Restore shader program from RAM binary and free RAM binary

Page 31: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 31This work is licensed under a Creative Commons Attribution 4.0 International License

Shaders - Vulkan

• We gather Vulkan PSOs using the same mechanism as for OpenGL GL

• On Vulkan we create pipelines on first launch and save vkPipelineCache to storage

• Vulkan mode also has a runtime PSO cache in memory with LRU

• Kostiantyn will provide some details

including shader memory savings

Page 32: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 32This work is licensed under a Creative Commons Attribution 4.0 International License

Recent UE4 Renderer Improvements

Unreal Engine has been evolving to support new platforms and rendering APIs since its

inception.

The Render Hardware Interface abstraction layer (RHI) has had some recent

improvements made to better support modern graphics APIs like Vulkan.

1. Explicit render passes

2. Vulkan subpasses

3. High-level rendering refactor

Page 33: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 33This work is licensed under a Creative Commons Attribution 4.0 International License

1. Explicit Render PassesStarting a new renderpass can be expensive on mobile tiled GPUs

- Save out the results of the previous render pass from the GPU core to RAM

- Load an existing render target from RAM back into the GPU core

• Render passes in the high level code were originally implict

- Engine code set render targets and then the RHI guessed if we were starting a

new render pass

- Each rendering operation (eg shadows, base color, translucency) called functions

at the beginning and end of their operations to set render targets and resolve the

results

• New in UE 4.22

- RHI functions have been added to explictly begin and end render passes

- UE4 mobile renderer now makes use of these to remove of unnecessary

transitions➢eg base pass → translucency → post processing

Page 34: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 34This work is licensed under a Creative Commons Attribution 4.0 International License

2. Vulkan Sub-passes

• Use case: Soft Particle Translucency and Deferred Decals

- These rendering techniques require access to an existing fragment depth value

- In GLES we use EXT_shader_framebuffer_fetch to

get an existing depth value- Depth previously written to alpha in base pass, or we

use ARM_shader_framebuffer_fetch_depth_stencil

where available

➢Compare fragment depth against existing depth

Page 35: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 35This work is licensed under a Creative Commons Attribution 4.0 International License

2. Vulkan Sub-passes

• Very Vulkan-specific feature, and only applicable to

mobile GPUs➢No general UE4 support for subpasses, instead:

• RHIBeginRenderPass() call provides a hint that the

following passes will use depth

- Vulkan RHI sets up 2 subpasses

- RHINextSubpass()- VulkanRHI calls VkNextSubpass()

- OpenGLRHI could use this to call

FramebufferFetchBarrierQCOM() to support

QCOM_shader_framebuffer_fetch_noncoherent

Page 36: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 36This work is licensed under a Creative Commons Attribution 4.0 International License

2. Vulkan Sub-passesUnfortunately extra PSO permutations are necessary to support MSAA

• The depth is fetched using GLSL subpassLoad(input), but when using MSAA you must

use subpassLoad(input, sampleindex)

➢So toggling MSAA requires alternate shaders

• Targeting UE 4.23

Page 37: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 37This work is licensed under a Creative Commons Attribution 4.0 International License

3. High Level Rendering Refactor

• Much more aggressive caching for static scene elements

• The full state of each drawcall is cached when mesh added to the scene

- Pipeline State ObjectBound resources, shader constants and uniform buffers

• Much reduced Render thread cost➢After calculating visibility it simply walks the drawlist and applies the cached state

• Initial release in UE 4.22

Page 38: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 38This work is licensed under a Creative Commons Attribution 4.0 International License

3. High Level Rendering Refactor

• Automatic geometry instancing support

- Sort draw list by PSO, mesh and bound resources

- Examine for sets of matching PSOs and bound resources

- Requires all per-instance constants (eg transform matrices) to be stored in a

single buffer

- Look up per-instance parameters in the shader

- Potentially bad for mobile performance.

- Previously measured ~30% cost. Work in progress.

Page 39: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 39This work is licensed under a Creative Commons Attribution 4.0 International License

Kostiantyn Drabeniuk,

Samsung Electronics

Part 2

Vulkan for Fortnite

Mobile

Page 40: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 40This work is licensed under a Creative Commons Attribution 4.0 International License

• Provide the best gaming experience to customers on Samsung devices

• Promote new technologies usage

• Contribute to the most popular game engines

• Support game developers all over the world

Galaxy GameDev

Page 41: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 41This work is licensed under a Creative Commons Attribution 4.0 International License

Agenda

• Advantages of using Vulkan in mobile games

• How to get more FPS - performance optimizations

• How to get stable FPS - hitching/memory optimizations

Page 42: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 42This work is licensed under a Creative Commons Attribution 4.0 International License

OpenGL ES

0

10

20

30

40

0 100 200 300 400 500 600 700 800 900 1000

ms

Frames

RHI Thread time

0

20

40

60

0 5 10 15 20 25 30

Seconds

FPS Chart

FPS 39

RHI Thread

Time (ms)16.94

Page 43: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 43This work is licensed under a Creative Commons Attribution 4.0 International License

Vulkan

• Balanced CPU/GPU usage

• Lower CPU overhead

• Parallel tasking

• Explicit control

• No error checking at runtime

Page 44: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 44This work is licensed under a Creative Commons Attribution 4.0 International License

Vulkan vs GLES

0

10

20

30

40

0 100 200 300 400 500 600 700 800 900 1000

ms

Frames

RHI Thread time

GLES

Vulkan

GLES Vulkan

FPS 39 47(+8)

RHI Thread Time (ms) 16.94 8.25(-51%)

0

20

40

60

0 5 10 15 20 25 30

Seconds

FPS Chart

GLES

Vulkan

Page 45: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 45This work is licensed under a Creative Commons Attribution 4.0 International License

Performance optimizations

• DescriptorSet cache

• Merge RenderPasses

• Remove useless barriers

• Remove extra depth copy

• Occlusion query

• Buffer upload

Page 46: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 46This work is licensed under a Creative Commons Attribution 4.0 International License

DescriptorSet cache• Reuse already updated DescriptorSets

AllocateDS

+

UpdateDS

BindDS

+

Draw

Allocate and Update

DescriptorSets before each

draw call

Reuse Descriptors Sets from

cache

AllocateDS

+

UpdateDS

BindDS

+

Draw

AllocateDS

+

UpdateDS

BindDS

+

Draw

AllocateDS

+

UpdateDS

BindDS

+

Draw

……

AllocateDS

+

UpdateDS

BindDS

+

Draw

BindDS

+

Draw

BindDS

+

Draw

AllocateDS

+

UpdateDS

BindDS

+

Draw

……

BindDS

+

Draw

BindDS

+

Draw

Page 47: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 47This work is licensed under a Creative Commons Attribution 4.0 International License

• There were a lot of cache misses due to storing buffer offset inside DescriptorSet

DescriptorSet cache

DescriptorSet1

Binding 0

offset = ...

DescriptorSet3

Binding 0

offset = …

DescriptorSet2

Binding 0

offset = …

UBO2 UBO3UBO1

2240 352

VkBuffer

128

DescriptorSet1

Binding 0

offset = 128

DescriptorSet3

Binding 0

offset = 352

DescriptorSet2

Binding 0

offset = 224

Shader

Binding 0

Shader

Binding 0

Shader

Binding 0

Allocate DescriptorSet

Update DescriptorSet

Bind DescriptorSet

Page 48: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 48This work is licensed under a Creative Commons Attribution 4.0 International License

DescriptorSet cache• Hit rate can be improved by using Dynamic Uniform Buffer

DescriptorSet1

Binding 0

offset = …

UBO2 UBO3UBO1

2240 352

VkBuffer

128

DescriptorSet1

Binding 0

offset = 0

Shader

Binding 0

Shader

Binding 0

Shader

Binding 0

DynamicOffset=128 DynamicOffset=224 DynamicOffset=352

DescriptorSet1 can be used for all binds,

no need to allocate and update new one

Allocate DescriptorSet

Update DescriptorSet

Bind DescriptorSet

Page 49: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 49This work is licensed under a Creative Commons Attribution 4.0 International License

DescriptorSet cache• Do not use Vulkan Handle for hash calculation

- Vulkan can use same handles for different types

- Vulkan can reuse handles from destroyed resources

• Generate own Handle ID for all Vulkan resources and use it for hash calculation

HashInfo

BufferInfo

VkBuffer

Range

Offset

Hash

Value

HashInfo

BufferInfo

Buffer Handle ID

Range

Offset

Hash

Value

Page 50: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 50This work is licensed under a Creative Commons Attribution 4.0 International License

DescriptorSet cache

0

150

300

450

600

0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%

Percentile vkUpdateDescriptorSets() calls

Original DSCache

0

5

10

15

20

0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%

ms

Percentile RHI Thread time

Original DSCache

Original DSCache

Updates (avg calls per frame) 252 2(-99.2%)

RHI Thread Time Avg (ms) 10.12 9.15(-0.97)

Page 51: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 51This work is licensed under a Creative Commons Attribution 4.0 International License

Merge RenderPasses

DecalRenderTarget

Load

Store

Translucency RenderTarget

Load

Store

Decal+

Translucency

RenderTarget

Load

Store

UpscaleRenderTarget

Clear

Store

SlateUI RenderTarget

Load

Store

Upscale+

SlateUI

RenderTarget

Clear

Store

Page 52: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 52This work is licensed under a Creative Commons Attribution 4.0 International License

Remove useless barriers

READ_ONLY -> COLOR_ATTACHMENT

COLOR_ATTACHMENT -> READ_ONLY

READ_ONLY -> READ_ONLY

RenderDoc capture

Page 53: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 53This work is licensed under a Creative Commons Attribution 4.0 International License

Merge RenderPasses/Remove extra barriers

0

10

20

30

40

50

60

0 20 40 60 80 100 120 140 160 180 200 220 240 260

Seconds

FPS Chart

Original MergeRP

41 44

Median FPS

Page 54: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 54This work is licensed under a Creative Commons Attribution 4.0 International License

Decal,

Translucency passes

(Z write off)

...

Decal,

Translucency passes

(Z write off)

...

Remove extra depth copy

Base pass

Color Depth

Color Depth

New Depth

Base pass

Color Depth

Draw

Draw

Color Depth

Draw

Drawfetchfetch

Barrier to Read

Barrier to Optimal

Copy

Barrier to SRC

Page 55: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 55This work is licensed under a Creative Commons Attribution 4.0 International License

Occlusion query• Get occlusion query result for 3 frames back

- UE4 gets occlusion results for 2 frames back by default

- 3 swapchain back buffers are used in Android, so sometimes waiting happens

Frame N-3

Render

Occlusion… …

Frame N-2

Render

Occlusion… …

Frame N-1

Render

Occlusion… …

Frame N

GetQuery

Result… …

Frame N-3

Render

Occlusion… …

Frame N-2

Render

Occlusion… …

Frame N-1

Render

Occlusion… …

Frame N

GetQuery

Result… …

Sometimes, CPU need to wait

occlusion query

DO NOT need to wait

occlusion query

Swapchain queue size 2

Swapchain queue size 2

Page 56: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 56This work is licensed under a Creative Commons Attribution 4.0 International License

Occlusion query• Query management in original version

- Use one global query pool

GlobalPool

GlobalPool

Request M queries

GlobalPool

Get

Result

Get

Result

Get

Result

Get

Result

Get

Result

Get

Result

Get

Result

Get

Result

Get

Result

Get

ResultK calls of

vkGetQueryPoolResults()

K calls of

vkCmdResetQueryPool()

N frame

N-1 frame

N-2 frame

free queries submitted in N-3 frame

submitted in N-2 frame submitted in N frame

Reset Reset Reset Reset Reset Reset Reset Reset Reset Reset

Page 57: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 57This work is licensed under a Creative Commons Attribution 4.0 International License

Occlusion query• Query management after optimization

- Use separate pool for each frame

N frame

N-1 frame

N-2 frame

free queries submitted in N-3 frame

submitted in N-2 frame submitted in N frame

Pool1…

Request M queries

Pool2… Pool3…

Pool1… Pool2… Pool3…

GetResult

Reset

Pool1… Pool2… Pool3…

1 call of vkGetQueryPoolResults(…,0,K,…)

by specifying firstQuery and queryCount

1 call of vkCmdResetQueryPool(…,0,K,…) by

specifying firstQuery and queryCount

Page 58: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 58This work is licensed under a Creative Commons Attribution 4.0 International License

Occlusion query• Performance measurement

OriginalOcclusionQuery

Optimization

Median FPS 27 29(+2)

FPS Stability 75% 90%(+15%)

CPU Usage 16.32% 15.55%

GPU Usage 70.90% 79.52%

Page 59: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 59This work is licensed under a Creative Commons Attribution 4.0 International License

Buffer upload• Remove staging buffer usage

- Mobile GPUs usually have unified memory. Such memory allow direct host access

- For mobile GPUs staging buffer is not needed and extra copying can be removed

Buffer’s raw

dataStaging VkBuffer

HOST_VISIBLE

VkBufferDEVICE_LOCAL

vkCmdCopyBuffer()copy

Buffer’s raw

data

VkBufferHOST_VISIBLE /DEVICE_LOCAL

copy

Page 60: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 60This work is licensed under a Creative Commons Attribution 4.0 International License

Buffer upload

0

5

10

15

20

25

30

35

14000 14200 14400 14600 14800 15000

ms

Frames

RHI Thread time

Original RemoveStagingBuffer

5

10

15

20

25

30

0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%

ms

Percentile RHI Thread time

Original RemoveStagingBuffer

14.96 ms 14.00 msRHI Thread Time (avg)

Page 61: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 61This work is licensed under a Creative Commons Attribution 4.0 International License

Hitching/memory optimizations

• Asynchronous Vertex/Index buffer create

• Upload Texture

• DescriptorSetLayout cache miss

• Remove shader duplication

• Purge ShaderModules

• PSO cache miss

Page 62: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 62This work is licensed under a Creative Commons Attribution 4.0 International License

Asynchronous Vertex/Index buffer create• Allow asynchronous Vertex/Index buffer creation

- The basic versions of CreateVertex/IndexBuffer() use needless RHI Thread stall

- Vulkan RHI allows asynchronous Vertex/Index buffer creation

… …

… …

Wait until RHI Thread finish

execution of current task

No waiting

Render Thread

RHI ThreadOriginal

AsyncBuffer

Creation

Frame Time 92 89(-3)

Render Thread

Time38 16(-22)

RHI Thread

Time41 38(-3)

Number of Hitches

Task …

StallRHIThread

CreateVertexBufferRT

… …

… … Render Thread

RHI ThreadTask …

… …

CreateVertexBufferRT

Page 63: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 63This work is licensed under a Creative Commons Attribution 4.0 International License

Upload Texture• Texture uploading process:

UploadTexture

Image’s raw data

Staging VkBuffer

HOST_VISIBLE

VkImage DEVICE_LOCAL

Record commandsto copy Buffer into

Image

copy SubmitUploadCmdBuffer

Barrier to SRC

vkCmdCopyBufferToImage()

Barrier to READ_ONLY

Page 64: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 64This work is licensed under a Creative Commons Attribution 4.0 International License

• Record texture upload commands into one command buffer

Upload Texture

Frame N

Upload

Texture1

Submit

UploadCmd

Buffer1

… …

N calls of vkQueueSubmit()

1 call of vkQueueSubmit()

……Upload

Texture2

Submit

UploadCmd

Buffer2

Upload

TextureN

Submit

UploadCmd

BufferN

Frame N

Upload

Texture1 ……Upload

Texture2

Upload

TextureN

Upload

Texture3

Upload

Texture4

SubmitUpload

CmdBuffer1…

Page 65: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 65This work is licensed under a Creative Commons Attribution 4.0 International License

Upload Texture

0

20

40

60

80

100

120

0 2000 4000 6000 8000 10000 12000 14000 16000 18000 20000 22000 24000

ms

Frames

Execution time of SubmitUploadCmdBuffer()

Original UploadTextureOptimization

OriginalUploadTexture

Optimization

Frame Time 89 40(-49)

Render Thread Time 16 8(-8)

RHI Thread Time 38 4(-34)

Number of Hitches

Page 66: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 66This work is licensed under a Creative Commons Attribution 4.0 International License

PSO cache miss• Do not hash pointers

• Hash only info which is used for Pipeline creation

• Calculate shader’s key from ShaderCode and use it for hashing

HashInfo

Pipeline Create Info

Shaders

Depth state

Rasterization state

Extra data

Hash

Value

Shader’s keys

Depth state

Rasterization state

Hash

Value

Calculated from

pointers and extra data

HashInfo

Shader’s pointers

Pointer to Depth state

Pointer to Rasterization

state

Extra data

Before After

1 session 33.5% 34.9%

2 session 77.4% 95.4%

3 session 80.8% 99.6%

Cache hit rate

Page 67: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 67This work is licensed under a Creative Commons Attribution 4.0 International License

DescriptorSetLayout cache miss• Hash data instead of pointers

DSLayoutHashInfo

Binding’s pointers

HashValue

Calculated from

pointers

DSLayoutHashInfo

Bindings

HashValue

OriginalDSLayout

HashOpt

vkDescriptor

SetLayout 113791231

(-89%)

vkPipeline

Layout 5999621

(-90%)0

2000

4000

6000

8000

10000

12000

0 100 200 300 400 500 600Frame

Number of vkDescriptorSetLayouts

Original DSLayoutHashOpt

0

1000

2000

3000

4000

5000

6000

7000

0 100 200 300 400 500 600Frame

Number of vkPipelineLayouts

Original DSLayoutHasOpt

Page 68: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 68This work is licensed under a Creative Commons Attribution 4.0 International License

DescriptorSetLayout cache miss

0

500

1000

1500

2000

2500

3000

0 100 200 300 400 500 600 700 800 900

MB

Seconds

Total memory

~250MBytes

Original

DescriptorSetLayout

CacheOpt

Page 69: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 69This work is licensed under a Creative Commons Attribution 4.0 International License

Remove shader duplication• Add shader cache

ShaderCode

CreateShader()

CreateShader()

Shader1

Shader2

ShaderCacheShaderCode

CreateShader()

CreateShader()

Shader1

Shader’s duplications

Page 70: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 70This work is licensed under a Creative Commons Attribution 4.0 International License

Remove shader duplication

0

500

1000

1500

2000

2500

3000

0 100 200 300 400 500 600 700 800 900

MB

Seconds

Total memory

~280MBytesShaderCache

Original

Page 71: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 71This work is licensed under a Creative Commons Attribution 4.0 International License

Purge ShaderModules• Don’t create shader module at shader creation time

• Create shader modules before pipeline creation and destroy after

CreateShader

Modules()CreatePipeline()

PurgeShader

Modules()CreateShader()

GetShader

Modules()CreatePipeline()

VertexShader

ShaderModuleHandle

(0x15sf478e)

CreateShader()

VertexShader

ShaderModuleHandle

(0x15sf478e)

VertexShader

ShaderModuleHandle

(0x15sf478e)

VertexShader

ShaderModuleHandle

(VK_NULL_HANDLE)

VertexShader

ShaderModuleHandle

(0x15sf478e)

VertexShader

ShaderModuleHandle

(0x15sf478e)

VertexShader

ShaderModuleHandle

(VK_NULL_HANDLE)

Page 72: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 72This work is licensed under a Creative Commons Attribution 4.0 International License

Purge ShaderModules

0

500

1000

1500

2000

2500

3000

0 100 200 300 400 500 600 700 800 900

MB

Seconds

Total memory

~110MBytes

Original

Purge

ShaderModules

Page 73: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 73This work is licensed under a Creative Commons Attribution 4.0 International License

0

500

1000

1500

2000

2500

3000

0 100 200 300 400 500 600 700 800 900

MB

Seconds

Original

0

500

1000

1500

2000

2500

3000

0 100 200 300 400 500 600 700 800 900

MB

Seconds

Original

+DSLayoutCacheOpt

~250MBytes

0

500

1000

1500

2000

2500

3000

0 100 200 300 400 500 600 700 800 900

MB

Seconds

Original

+ShaderCache

+DSLayoutCacheOpt

~530MBytes

0

500

1000

1500

2000

2500

3000

0 100 200 300 400 500 600 700 800 900

MB

Seconds

Original

+ShaderCache

+DSLayoutCacheOpt

+PurgeSM~640MBytes

Total Memory Usage

Page 74: DEVELOPER DAY - Khronos Group · Shadows Shadow Caching Materials Physically Based (w/approx) Subsurface Scattering Two-sided foliage Effects Volumetric Fog Light Shafts GPU Particle

© The Khronos® Group Inc. 2018 - Page 74This work is licensed under a Creative Commons Attribution 4.0 International License

Thank you!

Jack PorterEpic Games

https://[email protected]

Kostiantyn DrabeniukSamsung Galaxy GameDev

https://developer.samsung.com/[email protected]

[email protected]