optimization strategies for intel® hd graphics · 2013-02-26 · intel® gpa helps you analyze and...
www.intel.com/software/siggraph
Omar A Rodriguez
Software Engineer / Developer Relations
August 10, 2011
Optimization Strategies for Intel® HD Graphics
Agenda
• 2nd Generation Intel® Core™ Processors
• Quick Intro to Intel® GPA 4.1
• Graphics Optimizations on Darkspore™
• Pre-Instrumented Middleware
• Q & A
2nd Generation Intel® Core™ Processors (codenamed “Sandy Bridge”)
• 32nm process technology throughout
• Full silicon integration
• Graphics power management upgraded to CPU-class power management techniques
Architecture Optimized for Energy Efficiency and Performance
• DirectX 10 and OpenGL 3.0
• Optimal balance of fixed function to compute
• Minimized driver overhead reduces CPU load, so power can be redirected to graphics frequency
• High Speed Transcode with HD Decode and Encode
Improved 3D and Game Performance over Previous Generations of Intel® Graphics
Improved Gaming Experience on Mainstream Graphics with Sandy Bridge
• 1280 x 720
• 30 FPS
• Medium settings
Intel® GPA 4.1 released in July 2011
• Full support for DirectX* 9, 10, and 11
• Display Depth Buffers in Frame Analyzer
• OpenCL support in Platform Analyzer
• New Media Performance Analyzer
• Analyze Graphics-rich Web Content – Google Chrome, Mozilla Firefox, and Microsoft Internet Explorer
• http://www.intel.com/software/gpa
FREE!
Intel® GPA helps you analyze and tune your game for performance
• In-game analysis with System Analyzer HUD using state overrides and real-time metrics graphs
• Deep frame analysis with Frame Analyzer down to the draw call level, including shaders, textures, D3D states, pixel history
• View system wide picture of CPU and GPU workload with Platform Analyzer
Intel® GPA 4.1: The Workflow
Target Game with HUD
If CPU bound, use Platform Analyzer or other CPU profiling tools like VTune
If GPU bound, use Frame Analyzer
In-game Analysis with System Analyzer HUD
FPS, DirectX*, Resolution
4 metrics graphs configurable in real time from GPA Monitor
Use state overrides to find high-level bottlenecks
• Disable Draw Calls: CPU or GPU bound
• 1x1 Scissor Rect: Determine if fill rate is the bottleneck
• Simple Pixel Shader: Determine if pixel shader complexity is an issue
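The override experiments above amount to a small decision procedure, sketched here in C++. This is only an illustration: the struct, the function names, and the 15% speedup threshold are assumptions made for the example, not part of Intel GPA, and the frame rates would be read off the HUD by hand.

```cpp
#include <string>

// Frame rates observed in the HUD. All names and the 15% threshold are
// illustrative assumptions, not values defined by Intel GPA.
struct OverrideResults {
    double baselineFps;      // no overrides enabled
    double disableDrawsFps;  // with "Disable Draw Calls"
    double scissor1x1Fps;    // with "1x1 Scissor Rect"
    double simpleShaderFps;  // with "Simple Pixel Shader"
};

// True when an override raised the frame rate by more than `threshold`.
inline bool Speedup(double baseline, double withOverride, double threshold = 0.15) {
    return withOverride > baseline * (1.0 + threshold);
}

inline std::string ClassifyBottleneck(const OverrideResults& r) {
    // Removing all GPU work does not help: the CPU is the limiter.
    if (!Speedup(r.baselineFps, r.disableDrawsFps)) return "CPU bound";
    // A trivial pixel shader helps: shader complexity is the issue.
    if (Speedup(r.baselineFps, r.simpleShaderFps)) return "GPU bound: pixel shader complexity";
    // Rasterizing almost no pixels helps, but a simpler shader did not: fill rate.
    if (Speedup(r.baselineFps, r.scissor1x1Fps)) return "GPU bound: fill rate";
    return "GPU bound: other (e.g. vertex or geometry work)";
}
```

For instance, a game that stays near 30 FPS with draw calls disabled would be classified as CPU bound, which points to Platform Analyzer rather than Frame Analyzer.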
Deep Frame Analysis with Frame Analyzer
• Draw call visualization
• Configurable X and Y axes
• List of draw calls
• Render Target Viewer
• View selected geometry
• Hardware metrics, textures
• View and modify state, shaders
• Run experiments
Optimizing Darkspore™ to Reach the Most Players
• Online Action RPG
• Play a Squad of 3 Heroes
• Spore* Editor
• 100 Heroes to Unlock
• 4 player Co-op
• 2v2 PVP
Darkspore™ Rendering: The Deferred Pass
Normal (RGB) + Gloss (A)
Depth (R*256+G) + SpecPow (B) + ToonId (A)
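The Depth (R*256+G) layout suggests a 16-bit depth value split across two 8-bit channels. A minimal sketch of that packing follows, assuming depth is normalized to [0, 1] and each channel holds an integer in 0..255. The type and function names are hypothetical, and the slides do not show how Darkspore actually encodes depth.

```cpp
#include <cmath>
#include <cstdint>

// Two 8-bit channels holding one 16-bit depth value (assumed layout).
struct PackedDepth { std::uint8_t r; std::uint8_t g; };

// Quantize depth in [0, 1] to 16 bits, high byte in R and low byte in G.
inline PackedDepth PackDepth(float depth01) {
    float clamped = std::fmin(std::fmax(depth01, 0.0f), 1.0f);
    auto d16 = static_cast<std::uint32_t>(std::lround(clamped * 65535.0f));
    return { static_cast<std::uint8_t>(d16 >> 8),
             static_cast<std::uint8_t>(d16 & 0xFFu) };
}

// Rebuild depth as R*256+G, then map back to [0, 1].
inline float UnpackDepth(PackedDepth p) {
    return static_cast<float>(p.r * 256 + p.g) / 65535.0f;
}
```

The round trip loses at most half a quantization step, that is 0.5/65535 of the depth range.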
Darkspore™ Rendering: The Lighting Pass
Diffuse (RGB) + Specular (A)
Darkspore™ Rendering: The Final Pass
Color + Glow + Post FX + Particles + UI =
Profiling before and after GPA
Identifying the Bottlenecks in Darkspore™ with Frame Analyzer
• Blood decals in the Deferred and Final passes
• Parallel lights + cloud shadows + shadows
• Edge detection for toon lining
Optimizing the Blood Decal Pixel Shaders
Supported tiling decals but never used that feature
1) Normalized a vector used only for cubemap lookup
2) Fresnel term that was adding very little given the fixed camera angle
1) Implementing alpha test with a clip and blending
2) Calculating the normal was overly complex; some of the values were moved to the vertex shader
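Normalizing a vector that is only used for a cubemap lookup is wasted work because the lookup depends on direction alone. The sketch below shows this on the CPU using the OpenGL-style major-axis face selection. It is an illustration, not Darkspore's shader code, and `CubeCoord` and `CubeLookup` are hypothetical names.

```cpp
#include <cmath>

// face: 0..5 = +X, -X, +Y, -Y, +Z, -Z; (s, t) are face coordinates in [0, 1].
struct CubeCoord { int face; float s; float t; };

// Select the cube face and face coordinates for a non-zero direction.
// Every term is a ratio against the major axis, so the length cancels out.
inline CubeCoord CubeLookup(float x, float y, float z) {
    float ax = std::fabs(x), ay = std::fabs(y), az = std::fabs(z);
    int face;
    float sc, tc, ma;
    if (ax >= ay && ax >= az) {
        face = x >= 0.0f ? 0 : 1; sc = x >= 0.0f ? -z : z; tc = -y; ma = ax;
    } else if (ay >= az) {
        face = y >= 0.0f ? 2 : 3; sc = x; tc = y >= 0.0f ? z : -z; ma = ay;
    } else {
        face = z >= 0.0f ? 4 : 5; sc = z >= 0.0f ? x : -x; tc = -y; ma = az;
    }
    return { face, 0.5f * (sc / ma + 1.0f), 0.5f * (tc / ma + 1.0f) };
}
```

Calling `CubeLookup(1, 2, 3)` and `CubeLookup` on the same vector divided by its length selects the same face and the same coordinates up to rounding, so the normalize instruction can be dropped from the shader.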
Hot-loading Shaders in Frame Analyzer Shows 30% and 24% Improvement
Blood decals are volumes that write more pixels
Use the Stencil to Kill Pixels in the Final Pass
Frame Analyzer reports final pass draw call was improved by 65.1%!
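The idea behind the stencil optimization can be modeled on the CPU: an earlier pass marks only the pixels where the decal volume actually touches the scene, and the final pass rejects everything else before the pixel shader would run. The slides do not describe Darkspore's stencil setup, so the toy model below is an assumption throughout, with hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct FinalPassStats { std::size_t shaded = 0; std::size_t rejected = 0; };

// Earlier pass: one entry per pixel in the decal volume's rasterized
// footprint; set the stencil only where the decal lands on a surface.
inline std::vector<std::uint8_t> MarkStencil(const std::vector<bool>& decalHitsSurface) {
    std::vector<std::uint8_t> stencil(decalHitsSurface.size(), 0);
    for (std::size_t i = 0; i < decalHitsSurface.size(); ++i)
        if (decalHitsSurface[i]) stencil[i] = 1;
    return stencil;
}

// Final pass: with the stencil test on, unmarked pixels are rejected
// before shading. `shaded` stands in for pixel shader invocations.
inline FinalPassStats RunFinalPass(const std::vector<std::uint8_t>& stencil, bool useStencil) {
    FinalPassStats stats;
    for (std::uint8_t s : stencil) {
        if (useStencil && s == 0) { ++stats.rejected; continue; }
        ++stats.shaded;
    }
    return stats;
}
```

The saving scales with how much of the volume's footprint misses the scene, which is why volumes that cover many pixels benefit most.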
Optimizing the Blood Decal Pixel Shaders
Parallel lights + cloud shadows + shadows
Further Optimizations for Darkspore™
Trees all had roots below the ground!
Creatures were really dense and burning quads.
Further Optimizations for Darkspore™
View space normals took only two channels but weren’t worth the cost.
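A common way to store a unit normal in two channels is to keep x and y and rebuild z when the normal is read back. The slides do not say which encoding Darkspore used, so the sketch below is an assumption, including the choice that z is non-negative; the names are hypothetical. The square root on every read is part of the cost the slide refers to.

```cpp
#include <cmath>

struct Normal3 { float x; float y; float z; };

// Rebuild z from the stored x and y of a unit-length normal.
// Assumes z >= 0 (the sign convention is an assumption of this sketch).
inline Normal3 DecodeNormalXY(float x, float y) {
    float zz = 1.0f - x * x - y * y;
    // Clamp to guard against small negative values from quantization.
    float z = std::sqrt(zz > 0.0f ? zz : 0.0f);
    return { x, y, z };
}
```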
Terrain mixes 4 textures together per pass, but large sections only really need one.
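One way to act on the terrain observation is to pick a cheaper shader for sections where a single texture dominates. The sketch below is hypothetical: the per-section weight summary, the 0.01 cutoff, and all names are assumptions, and the slides do not say how or whether Darkspore implemented this.

```cpp
#include <array>

enum class TerrainShader { SingleTexture, FourTextureBlend };

// Count layers whose blend weight exceeds a small cutoff (assumed value).
inline int ActiveLayers(const std::array<float, 4>& weights, float cutoff = 0.01f) {
    int n = 0;
    for (float w : weights)
        if (w > cutoff) ++n;
    return n;
}

// `maxWeightsInSection` holds, per layer, the largest blend weight found
// anywhere in the terrain section (a hypothetical precomputed summary).
inline TerrainShader ChooseTerrainShader(const std::array<float, 4>& maxWeightsInSection) {
    return ActiveLayers(maxWeightsInSection) <= 1 ? TerrainShader::SingleTexture
                                                  : TerrainShader::FourTextureBlend;
}
```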
Reaching the most players by optimizing for mainstream graphics
• 30% improvement to blood decal shaders in the deferred pass
• 24% improvement to blood decal shaders in the final pass
• Additional 65% improvement to blood decals in the final pass using a stencil
System-wide Analysis with Platform Analyzer
Task groups visualization
• Instrumented tasks across threads
• CPU & GPU frame duration
• DirectX* calls
• Trace statistics
• Relations, dependencies and hierarchy
• CPU/GPU & DirectX* metrics
Leading Middleware Products Instrumented for Platform Analyzer
Simple and Flexible Instrumentation API
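#include <ittnotify.h>

// Created once and reused; the domain name "Game" is an illustrative placeholder.
static __itt_domain* s_domain = __itt_domain_create( "Game" );
static __itt_string_handle* s_doWork = __itt_string_handle_create( "System::DoWork" );

void System::DoWork( … )
{
    // ittnotify tasks are opened with a domain and a string handle
    __itt_task_begin( s_domain, __itt_null, __itt_null, s_doWork );
    // do work
    __itt_task_end( s_domain );
}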
Intel Sessions – Wednesday, August 10
4:30-5:30pm Visual Computing Performance Optimization: Tools and Strategies
Please turn in your evaluation forms
Legal Disclaimers
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY RELATING TO SALE AND/OR USE OF INTEL PRODUCTS, INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT, OR OTHER INTELLECTUAL PROPERTY RIGHT.
Intel products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications.
Intel Corporation may have patents or pending patent applications, trademarks, copyrights, or other intellectual property rights that relate to the presented subject matter. The furnishing of documents and other materials and information does not provide any license, express or implied, by estoppel or otherwise, to any such patents, trademarks, copyrights, or other intellectual property rights.
Intel may make changes to specifications, product descriptions, and plans at any time, without notice.
The Intel processor and/or chipset products referenced in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
All dates provided are subject to change without notice. All dates specified are target dates, are provided for planning purposes only and are subject to change.
Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
* Other names and brands may be claimed as the property of others.
Copyright © 2010, Intel Corporation. All rights reserved.
Optimization Notice
Intel® compilers, associated libraries and associated development tools may include or utilize options that optimize for instruction sets that are available in both Intel® and non-Intel microprocessors (for example SIMD instruction sets), but do not optimize equally for non-Intel microprocessors. In addition, certain compiler options for Intel compilers, including some that are not specific to Intel micro-architecture, are reserved for Intel microprocessors. For a detailed description of Intel compiler options, including the instruction sets and specific microprocessors they implicate, please refer to the “Intel® Compiler User and Reference Guides” under “Compiler Options.” Many library routines that are part of Intel® compiler products are more highly optimized for Intel microprocessors than for other microprocessors. While the compilers and libraries in Intel® compiler products offer optimizations for both Intel and Intel-compatible microprocessors, depending on the options you select, your code and other factors, you likely will get extra performance on Intel microprocessors. Intel® compilers, associated libraries and associated development tools may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel® Streaming SIMD Extensions 2 (Intel® SSE2), Intel® Streaming SIMD Extensions 3 (Intel® SSE3), and Supplemental Streaming SIMD Extensions 3 (Intel® SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors.
While Intel believes our compilers and libraries are excellent choices to assist in obtaining the best performance on Intel® and non-Intel microprocessors, Intel recommends that you evaluate other compilers and libraries to determine which best meet your requirements. We hope to win your business by striving to offer the best performance of any compiler or library; please let us know if you find we do not.
Notice revision #20101101