© Copyright Khronos Group 2014 - Page 1
Press Briefing GDC, March 2014
Neil Trevett Vice President Mobile Ecosystem, NVIDIA
President Khronos
Lots of Khronos News at GDC!
• OpenGL ES 3.1 Released
- Compute shaders & enhanced rendering coming to over a billion mobile devices
• OpenGL Momentum
- Developer tools and insights into low-overhead drivers
• EGL 1.5 Released
- Enhanced rendering, interop and system portability
• OpenCL 2.0 Adopters Program Released
- Full OpenCL 2.0 conformance tests available
• WebCL 1.0 Released
- Web developers to get access to heterogeneous parallel computing
• SYCL 1.2 Provisional Released
- Enabling high-level, C++ frameworks over OpenCL
Speakers This Morning
• Neil Trevett
- Vice President Mobile Ecosystem, NVIDIA
- President, Khronos
- Chair, OpenCL Working Group
• Tom Olson
- Director of Graphics Research at ARM Media Processing Division
- Chair, OpenGL ES Working Group
• Andrew Richards
- CEO, Codeplay
- Chair, SYCL Working group
• Jon Peddie
- Founder and President, Jon Peddie Research
• David Cole
- Founder and CEO, DFC Intelligence
Khronos Connects Software to Silicon
Open Consortium creating ROYALTY-FREE, OPEN STANDARD APIs for hardware acceleration
Defining the roadmap for low-level silicon interfaces needed on every platform
Graphics, compute, rich media, vision, sensor and camera processing
Rigorous specifications AND conformance tests for cross-vendor portability
Acceleration APIs BY the Industry FOR the Industry
Well over a BILLION people use Khronos APIs Every Day…
Khronos Standards
• Visual Computing
- 3D Graphics
- Heterogeneous Parallel Computing
• 3D Asset Handling
- 3D authoring asset interchange
- 3D asset transmission format with compression
• Acceleration in HTML5
- 3D in browser – no Plug-in
- Heterogeneous computing for JavaScript
• Sensor Processing
- Vision Acceleration
- Camera Control (Camera Control API)
- Sensor Fusion
Over 100 companies defining royalty-free APIs to connect software to silicon
OpenGL ES 3.1
Tom Olson, Director of Graphics Research at ARM Media Processing Division
Chair, OpenGL ES Working Group
OpenGL ES 3.1 Launched at GDC!
• OpenGL ES is used by over a BILLION users every day
- Used in almost every mobile device – phones, tablets, embedded, more…
• Introducing OpenGL ES 3.1 at GDC!
- A significant upgrade in functionality coming for a huge number of users
• Expecting rapid adoption
- A driver upgrade for many SoCs
- Backward compatible with 2.0/3.0 so apps can incrementally adopt features
Timeline: 2002 Working Group formed; 2003 OpenGL ES 1.0; 2004 1.1 (driver update); 2007 2.0 (silicon update); 2012 3.0 (silicon update); 2014 3.1 (driver update)
OpenGL ES 3.1 Goals
• Bringing developer requested features from desktop OpenGL 4 to mobile
- Advanced features, modern programming styles
- Higher performance with lower overhead
• Headline features - Compute Shaders and Draw-Indirect
- Compute shaders can create geometry or other rendering data
- …and also the draw commands needed to render them
- Offload work from CPU to GPU – critical for mobile perf and power
• Run on OpenGL ES 3.0 hardware – expose hidden capabilities of shipping devices
- Enable very rapid adoption across the industry
• Better looking, faster performing apps!
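The compute-plus-draw-indirect flow can be pictured with a CPU-only sketch. The four-word record follows the indirect draw command layout defined for OpenGL ES 3.1; everything else here is hypothetical: `generate_commands` stands in for a compute shader that emits one record per visible object, and `execute_indirect` stands in for the GPU consuming the buffer. No OpenGL ES calls are made.

```cpp
#include <cstdint>
#include <vector>

// One indirect draw record: four tightly packed 32-bit words.
struct DrawArraysIndirectCommand {
    std::uint32_t count;              // number of vertices to draw
    std::uint32_t instanceCount;      // number of instances to draw
    std::uint32_t first;              // index of the first vertex
    std::uint32_t reservedMustBeZero; // reserved field, kept at zero
};
static_assert(sizeof(DrawArraysIndirectCommand) == 16, "four packed 32-bit words");

// Hypothetical stand-in for a compute shader: emits one draw record per
// object that still has visible triangles and skips culled objects, so the
// CPU never has to inspect the result.
std::vector<DrawArraysIndirectCommand> generate_commands(
        const std::vector<std::uint32_t>& visibleTriangleCounts) {
    std::vector<DrawArraysIndirectCommand> buffer;
    std::uint32_t firstVertex = 0;  // geometry is assumed to be packed back to back
    for (std::uint32_t triangles : visibleTriangleCounts) {
        if (triangles == 0) continue;  // culled: no command written
        buffer.push_back({triangles * 3, 1, firstVertex, 0});
        firstVertex += triangles * 3;
    }
    return buffer;
}

// Hypothetical stand-in for the GPU: walks the command buffer and returns
// the total number of vertices that would be processed.
std::uint64_t execute_indirect(const std::vector<DrawArraysIndirectCommand>& buffer) {
    std::uint64_t vertices = 0;
    for (const DrawArraysIndirectCommand& cmd : buffer)
        vertices += static_cast<std::uint64_t>(cmd.count) * cmd.instanceCount;
    return vertices;
}
```

Calling `execute_indirect(generate_commands({2, 0, 5}))` walks two records (the zero-triangle object produces none), which is the sense in which work moves from the CPU to the GPU.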
Key Working Group Status and Participants
• API and shading language specifications ratified and released
- Manual pages also available
• Conformance test is code complete
- Expect to be accepting conformance submissions within three months
• Widespread industry participation
- Tool and Game Engine Developers
- GPU Designers
- SoC Vendors
- Platform Owners
- End Equipment Makers
- Middleware ISVs
[Participant logos, including Apple]
Other OpenGL ES 3.1 Key Features
• Separate shader objects
- Vertex and fragment shaders treated as separate programs
- Applications can mix and match shaders with compatible interfaces
- Supports popular console / PC programming styles
• Texture gather
- Read texture samples from a 2x2 pixel block
- Supports fast PCF shadow filtering, other applications
• New texture types
- Multi-sampled textures, packed depth/stencil textures
• Language features
- Multidimensional arrays, bitfield operations, synchronization primitives
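How a 2x2 gather feeds PCF shadow filtering can be sketched on the CPU. This is an illustration under assumptions, not shader code: `DepthTexture`, `gather2x2` and `pcf_shadow` are hypothetical names, addressing is clamp-to-edge, and the four comparison results are averaged with equal weights (a production filter would usually weight them by the sample position inside the block).

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical single-channel depth texture, stored row-major.
struct DepthTexture {
    std::size_t width;
    std::size_t height;
    std::vector<float> texels;

    // Clamp-to-edge fetch of a single texel.
    float at(std::size_t x, std::size_t y) const {
        if (x >= width) x = width - 1;
        if (y >= height) y = height - 1;
        return texels[y * width + x];
    }
};

// Returns the four texels of the 2x2 block whose lower-left texel is (x, y),
// ordered (x, y+1), (x+1, y+1), (x+1, y), (x, y).
std::array<float, 4> gather2x2(const DepthTexture& t, std::size_t x, std::size_t y) {
    return {{t.at(x, y + 1), t.at(x + 1, y + 1), t.at(x + 1, y), t.at(x, y)}};
}

// Percentage-closer filtering: the fraction of the four gathered samples
// for which the fragment is not behind the stored occluder depth.
float pcf_shadow(const DepthTexture& shadowMap, std::size_t x, std::size_t y,
                 float fragmentDepth) {
    float lit = 0.0f;
    for (float occluderDepth : gather2x2(shadowMap, x, y))
        if (fragmentDepth <= occluderDepth) lit += 1.0f;
    return lit / 4.0f;
}
```

The point of the gather is that one fetch returns all four depth values, so the four comparisons need a single texture access instead of four.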
OpenGL ES 3.1 Optional Extensions
• Useful functionality that may be headed to future core releases
• Image atomics
- Special shading language functions for atomic operations on images
• Multi-sampled array textures
• Advanced blending modes
- Supports compositing APIs found in UI, web standards, and 2D art pipelines
• Sample shading
- Fragment shader can run per-sample in multi-sampled rendering
• Stand-alone 8-bit stencil textures
OpenGL ES 3.1 Desktop Compatibility
• ARB_ES3_1_compatibility specification
- Under development
• Will enable desktop drivers to be used for mobile development
- OpenGL 4.4 drivers will be able to support “OpenGL ES 3.1 context”
• Adds features missing in OpenGL
- New function MemoryBarrierByRegion()
- Raise minimum SSBO size to 128 MB
- Support for GLSL ES version 310
- imageAtomicExchange()
- Extend mix() to int, uint and bool
- gl_HelperInvocation
- gl_MaxSamples
- Adds several gl_Max*ImageUniforms builtins
- gl_MaxCombinedShaderOutputResources
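The extended mix() is a per-component select rather than an interpolation: with a boolean selector, each result component comes from y where the selector is true and from x where it is false. A minimal C++ sketch of that behaviour follows; the `std::array` representation of a vector is an assumption made for illustration, and the function is a CPU model, not shading-language code.

```cpp
#include <array>
#include <cstddef>

// CPU model of mix(x, y, a) with a boolean selector vector:
// component i of the result is y[i] when a[i] is true, otherwise x[i].
// T may be int, unsigned int or bool, matching the extended overloads.
template <typename T, std::size_t N>
std::array<T, N> mix(const std::array<T, N>& x,
                     const std::array<T, N>& y,
                     const std::array<bool, N>& a) {
    std::array<T, N> result{};
    for (std::size_t i = 0; i < N; ++i)
        result[i] = a[i] ? y[i] : x[i];
    return result;
}
```

For example, mixing {1, 2, 3} and {10, 20, 30} with the selector {true, false, true} yields {10, 2, 30}; no arithmetic is performed on the components, which is why the overload is well defined for integers and booleans.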
OpenGL Ecosystem News
• Valve’s VOGL – OpenGL capture / playback debugger
- OpenGL 3.3, OpenGL 4 in progress
- Now on GitHub!
• Valve’s ToGL
- Subset of Direct3D 9.0c -> OpenGL
- API and DX bytecode
- On GitHub
• SIGGRAPH course Introduction to OpenGL programming
- Free on YouTube
• OpenTK updated to OpenGL 4.4
- Low-level C# library that wraps OpenGL and more
OpenGL and Reducing Driver Overhead
• Because driver overhead == cost
• Costs
- CPU cycles from app
- CPU cache from app
- Power / battery
- GPU throughput
• Can we drive driver overhead to ZERO?
- In OpenGL!
OpenGL Fallacy: Old and Inefficient
Immediate Mode
Fixed Function
Ancient crufty stuff: Feedback, Selection, Evaluators, Display Lists, Selectors
OpenGL Reality: Modern & Efficient
Bindless ARB
SSBO GL4.3
Multi-Draw Indirect GL4.3
UBO GL3.1
Texture Arrays GL3.0
Buffer Storage GL4.4
Plus, OpenGL has all the features
Compute
Tessellation
Geometry Shaders
Sparse Textures
Image Load/Store
Classic OpenGL Model
[Diagram: the CPU sends direct drawing commands to the GPU via the command FIFO; buffer objects, texture objects, an indirect draw buffer and the render target reside in memory]
Efficient OpenGL Model
[Diagram: several CPU cores and the GPU share buffer objects, texture objects, indirect draw buffers and the render target in memory]
- CPU writes memory, multi-threaded (no API)!
- GPU writes commands to memory and reads commands from memory
- No API, minimal CPU involvement
- Memory access mediated through OpenGL fences
Results
• OpenGL enables scalable multi-threading with no new API
- CPU and GPU Cores just write to memory
- GPU work creation - builds buffers, constructs MDI commands
• Integer multiple speedups ~5x – ~15x (not a typo)
- On driver limited cases, obviously
• Works TODAY on existing drivers!
- Mostly OpenGL 4.2+
- Extensions are at least EXT
• Does not require a new object model
- Does not require breaking existing applications
“Approaching Zero Driver Overhead in OpenGL”
NVIDIA, AMD, Intel presenting
Thursday 1PM, Room 2004, West Hall
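The "cores just write to memory" idea can be sketched without any OpenGL calls. The five-word record matches the layout read by indexed multi-draw indirect (glMultiDrawElementsIndirect in OpenGL 4.3); `Mesh`, `write_slice` and `build_commands` are hypothetical names, and a plain `std::vector` stands in for a mapped buffer object. Each slice touches a disjoint range of records, so the slices need no locking.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// One indexed indirect draw record: five tightly packed 32-bit words.
struct DrawElementsIndirectCommand {
    std::uint32_t count;         // number of indices to draw
    std::uint32_t instanceCount; // number of instances
    std::uint32_t firstIndex;    // offset into the index buffer
    std::int32_t  baseVertex;    // value added to each index
    std::uint32_t baseInstance;  // used here to carry the object id
};
static_assert(sizeof(DrawElementsIndirectCommand) == 20, "five packed 32-bit words");

// Hypothetical per-object description supplied by the application.
struct Mesh {
    std::uint32_t indexCount;
    std::uint32_t firstIndex;
    std::int32_t  baseVertex;
};

// Work for one worker: fills records [begin, end) and nothing else.
void write_slice(std::vector<DrawElementsIndirectCommand>& commands,
                 const std::vector<Mesh>& meshes,
                 std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) {
        commands[i] = {meshes[i].indexCount, 1, meshes[i].firstIndex,
                       meshes[i].baseVertex, static_cast<std::uint32_t>(i)};
    }
}

// Splits the scene into `workers` disjoint slices of the command buffer.
// The slices are filled one after another here; because they never overlap,
// each call could be handed to its own thread instead.
std::vector<DrawElementsIndirectCommand> build_commands(const std::vector<Mesh>& meshes,
                                                        std::size_t workers) {
    std::vector<DrawElementsIndirectCommand> commands(meshes.size());
    if (workers == 0) workers = 1;
    const std::size_t perWorker = (meshes.size() + workers - 1) / workers;
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(w * perWorker, meshes.size());
        const std::size_t end = std::min(begin + perWorker, meshes.size());
        write_slice(commands, meshes, begin, end);
    }
    return commands;
}
```

In a real renderer the filled buffer would be submitted with a single multi-draw indirect call, so per-object driver work disappears from the CPU timeline; that submission step is deliberately left out of this sketch.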
EGL 1.5 Released
• EGL 1.5 brings functionality from multiple extensions into core
- Increased reliability and portability
• EGLImages
- Sharing textures and renderbuffers
• Context Robustness
- Defending against malicious code
• EGLSync objects
- Improved OpenGL/OpenCL interop
• Platform extensions
- Standardized interactions for multiple OSes, e.g. Android and 64-bit platforms
• sRGB colorspace rendering
[Diagram: EGL sits between applications and the OS and display platforms]
Application Portability: EGL abstracts graphics context management, surface and buffer binding, and rendering synchronization
API Interop: EGL provides efficient transfer of data and events between Khronos APIs
OpenCL: Portable Heterogeneous Computing
• Portable Heterogeneous programming of diverse compute resources
- Targeting supercomputers -> embedded systems -> mobile devices
• One code tree can be executed on CPUs, GPUs, DSPs and hardware
- Dynamically interrogate system load and balance work across available processors
• OpenCL = Two APIs and C-based Kernel language
- Kernel language - Subset of ISO C99 + language extensions
[Diagram: OpenCL C kernel code running on CPUs, GPUs, DSPs and HW]
C Platform API: to query, select and initialize compute devices
C Runtime API: to build and execute kernels across multiple devices
OpenCL 2.0 Adopters Program Launched
• Official conformance test suite for the OpenCL 2.0 specification
- Implementers can certify that their implementations are officially conformant
• Released a set of header files for OpenCL 2.0
- Available on www.khronos.org
• Updated OpenCL 2.0 specification
- Clarifications and corrections to the specification first released in November 2013
• First conformant implementations of OpenCL 2.0 expected to be available to developers in the first half of 2014
WebCL: Heterogeneous Computing for the Web
• WebCL 1.0 specification officially finalized today at GDC!
- https://www.khronos.org/webcl
• WebCL defines JavaScript binding to the OpenCL APIs
- Enables initiation of Kernels written in OpenCL C within the browser
• Typical Use Cases
- 3D asset codecs, video codecs and processing, imaging and vision processing
- Physics for WebGL games, Online data visualization, Augmented Reality
[Diagram: OpenCL C kernel code running on CPUs, GPUs, DSPs and HW]
JavaScript Platform API: to query, select and initialize compute devices
JavaScript Runtime API: to build and execute kernels across multiple devices
WebGL/WebCL Ecosystem
• Content (JavaScript, HTML, CSS, ...)
- Content downloaded from the Web
• JavaScript Middleware
- Middleware can make WebGL and WebCL accessible to non-expert programmers
- E.g. three.js library (http://threejs.org/), used by majority of WebGL content
• Browser (JavaScript, HTML5 / CSS)
- Browser provides WebGL and WebCL alongside other HTML5 technologies
- No plug-in required
• OS Provided Drivers
- WebGL uses OpenGL ES 2.0, or ANGLE for OpenGL ES 2.0 over DX9
- WebCL uses OpenCL 1.X
Low-level APIs provide a powerful foundation for a rich JavaScript middleware ecosystem
WebCL Architectural Security Designed-in
• Leverages OpenCL 1.2 robustness/security extensions
- Context Termination: to prevent DoS from long running kernels
- Memory Initialization: no leakage from out of bounds memory access
• WebCL Kernel Validator https://github.com/KhronosGroup/webcl-validator
- Open source - provided as a “library API” for easy integration into browsers
- Parses and validates kernel code against specification
- Initializes local/private memory if underlying OpenCL implementation does not
- Tracks memory allocations and traces valid ranges for reads and writes
- Run time checks to make all memory accesses safe
• API and Language Restrictions
- Not supported: structures as Kernel arguments, Kernel names > 256 characters, mapping of CL memory objects into host memory, program binaries, some OpenCL API calls and built-ins
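The two memory guarantees listed above (initialized memory, and checked reads and writes) can be pictured with a small sketch. This is not the webcl-validator's code, and the slide does not say what a failed check does; `CheckedRegion` is a hypothetical class, and the choice to return zero for out-of-range reads and to drop out-of-range writes is an assumption made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of a kernel-visible memory region with run-time checks.
class CheckedRegion {
public:
    // Memory is zero-filled on creation, so a kernel can never observe
    // stale data left behind by another program.
    explicit CheckedRegion(std::size_t count) : data_(count, 0) {}

    // Checked read: an index outside the tracked range yields 0
    // (assumed policy) instead of touching foreign memory.
    int load(std::size_t index) const {
        return index < data_.size() ? data_[index] : 0;
    }

    // Checked write: an index outside the tracked range is ignored
    // (assumed policy). Returns true when the value was stored.
    bool store(std::size_t index, int value) {
        if (index >= data_.size()) return false;
        data_[index] = value;
        return true;
    }

    std::size_t size() const { return data_.size(); }

private:
    std::vector<int> data_;
};
```

The design point is that the valid range travels with the allocation, so every access can be checked against it regardless of what index the kernel computes.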
Open Source Implementations and Resources
• WebCL Conformance Framework and Test Suite (contributed by Samsung)
- https://github.com/KhronosGroup/WebCL-conformance/
• Nokia - Firefox build with integrated WebCL
- Firefox extension, open sourced May 2011 (Mozilla Public License 2.0)
- https://github.com/toaarnio/webcl-firefox
• Samsung - uses WebKit, open sourced June 2011 (BSD)
- https://github.com/SRA-SiliconValley/webkit-webcl
• Motorola Mobility - uses Node.js, open sourced April 2012 (BSD)
- https://github.com/Motorola-Mobility/node-webcl
• AMD – uses Chromium (open source)
- https://github.com/amd/Chromium-WebCL
[Demo screenshots: http://fract.ured.me/; based on Iñigo Quilez, Shader Toy; based on Apple QJulia; based on Iñigo Quilez, Shader Toy]
WebCL Parallel Computing for Web Acceleration
http://www.youtube.com/user/SamsungSISA#p/a/u/1/9Ttux1A-Nuc
OpenCL as Parallel Compute Foundation
• 100+ tool chains and languages leveraging OpenCL
- Heterogeneous solutions emerging for the most popular programming languages
[Diagram: tool chains layered over OpenCL, which targets Device X, Device Y and Device Z]
- WebCL: JavaScript binding to OpenCL
- River Trail: language extensions to JavaScript
- C++ AMP: Shevlin Park, uses Clang and LLVM
- Harlan: high level language for GPU programming
- Compiler directives for Fortran, C and C++
- Aparapi: Java language extensions for parallelism
- PyOpenCL: Python wrapper around OpenCL
- Halide: image processing language
OpenCL provides vendor optimized, cross-platform, cross-vendor access to heterogeneous compute resources
Widening OpenCL Ecosystem
[Diagram: apps and high-level frameworks feed OpenCL C kernel source or alternative kernel languages through a SPIR generator (e.g. patched Clang) into the OpenCL runtime, which targets Device X, Device Y and Device Z]
- OpenCL run-time can consume SPIR
- SPIR: Standard Portable Intermediate Representation; SPIR 1.2 released January 2014
- SYCL: programming abstraction that combines portability and efficiency of OpenCL with ease of use and flexibility of C++; SYCL 1.2 Provisional released here at GDC!
SPIR and LLVM
• SPIR 1.2
- Portable non-source encoding for OpenCL 1.2 device programs
• LLVM is an optimizing compiler toolkit
- Open source platform for innovation that is portable, flexible, well understood
- SPIR based on LLVM 3.2 with open consultation with LLVM community
• Consumption API for target hardware
- cl_khr_spir extension to OpenCL runtime API
• Example SPIR generator
- Open source patch to Clang translates OpenCL C to SPIR IR
- https://github.com/KhronosGroup/SPIR
If you can do it in OpenCL C, you can do it in SPIR
SYCL for OpenCL
Andrew Richards, CEO Codeplay
Chair, SYCL Working Group
GDC, March 2014
Where is OpenCL today?
• OpenCL: supported by a very wide range of platforms
- Huge industry adoption
• Provides a C-based kernel language
• NEW: SPIR provides ability to build other languages on top of OpenCL run-time
• Now, we need to provide additional languages and libraries
• Topic for today: C++
[Diagram: OpenCL C kernels and other-language kernels run on the OpenCL runtime across Device X, Device Y and Device Z]
SYCL for OpenCL
• Pronounced ‘sickle’
- To go with ‘spear’ (SPIR)
• Royalty-free, cross-platform C++ programming layer
- Builds on portability and efficiency of OpenCL
- Ease of use and flexibility of C++
• Single-source C++ development
- C++ template functions can contain host & device code
- Construct complex reusable algorithm templates that use OpenCL for acceleration
[Diagram: C++ based apps and high-level frameworks above OpenCL C kernels and the OpenCL runtime, targeting Device X, Device Y and Device Z]
Enabling C++ within OpenCL Ecosystem
• Want C++ code to be portable to OpenCL
- C++ libraries supported on OpenCL
- C++ tools supported on OpenCL
• Aim to achieve long-term support for OpenCL features with C++
- Good performance of C++ software on OpenCL
• Enable multiple sources of implementations and future innovation
- Allows innovators in C++ for heterogeneous devices to leverage an open standard
- Example of what can be done now that OpenCL supports multiple languages with SPIR
• Developers can now use OpenCL as the basis for a whole range of innovations in software for heterogeneous systems
Simple Example
#include <cstdio>
#include <CL/sycl.hpp>

int main ()
{
    int result; // this is where we will write our result
    { // by sticking all the SYCL work in a {} block, we ensure
      // all SYCL tasks must complete before exiting the block
        // create a queue to work on
        cl::sycl::queue myQueue;
        // wrap our result variable in a buffer
        cl::sycl::buffer<int> resultBuf (&result, 1);
        // create some 'commands' for our 'queue'
        cl::sycl::command_group (myQueue, [&] ()
        {
            // request access to our buffer
            auto writeResult =
                resultBuf.access<cl::sycl::access::write_only> ();
            // enqueue a single, simple task
            cl::sycl::single_task(cl::sycl::kernel_lambda<class simple_test>([=] ()
            {
                writeResult [0] = 1234;
            }));
        }); // end of our commands for this queue
    } // end scope, so we wait for the queue to complete
    printf ("Result = %d\n", result);
}
Does everything* expected of an OpenCL program: compilation, startup, shutdown, host fall-back, queue-based parallelism, efficient data movement.
* (this sample doesn’t catch exceptions)
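The comments in the sample rely on one idea: work is only guaranteed complete once the enclosing {} block ends. A plain C++ mock can illustrate that scope-exit synchronization. `ScopedBuffer` and `submit` are hypothetical names, not part of SYCL, and running the queued tasks inside the destructor is a simplification chosen for the mock; it says nothing about how a SYCL implementation schedules work.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical mock (NOT the SYCL API): wraps host memory, collects tasks,
// and only guarantees their effects are visible on the host once the
// object goes out of scope.
template <typename T>
class ScopedBuffer {
public:
    ScopedBuffer(T* host, std::size_t count) : host_(host), device_(count) {}
    ScopedBuffer(const ScopedBuffer&) = delete;
    ScopedBuffer& operator=(const ScopedBuffer&) = delete;

    // Queue a task that operates on the "device" copy; nothing runs yet.
    void submit(std::function<void(std::vector<T>&)> task) {
        pending_.push_back(std::move(task));
    }

    // End of scope: run everything that was queued, then write back.
    ~ScopedBuffer() {
        for (auto& task : pending_) task(device_);
        for (std::size_t i = 0; i < device_.size(); ++i) host_[i] = device_[i];
    }

private:
    T* host_;                 // caller-owned result storage
    std::vector<T> device_;   // stand-in for device memory
    std::vector<std::function<void(std::vector<T>&)>> pending_;
};
```

Used like the slide's sample, a task that writes 1234 to element 0 leaves the host variable untouched until the closing brace of the block that owns the `ScopedBuffer`, after which the host variable holds 1234.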
What Now?
• We are releasing this provisional specification to get feedback from developers
- So please give feedback!
- Khronos forums are the best place
- http://www.khronos.org/opencl/sycl
• Next steps
- Full specification, based on feedback
- Conformance test suite to ensure compatibility between implementations
• Release of implementations
- Codeplay is working on an implementation
- Anyone can implement it - it’s an open, royalty-free standard!
• Roadmap
- OpenCL working group considering C++ as kernel language - Complementary to SPIR
- SPIR 2.0 and SYCL 2.0 for use with OpenCL 2.0