2013 Korean Tour Daegu
DESCRIPTION
Khronos toured Korea in November 2013. Erik Noreke, VP of Business Development, visited the "Human Care Center Workshop" and the "Interaction Standard Workshop". Additional details may be found on the Khronos Group event page: https://www.khronos.org/news/events/korean-tour-2013
TRANSCRIPT
© Copyright Khronos Group, 2013 - Page 1
Graphics Technology Update
Presented by:
Erik Noreke, Khronos Group Vice President of Business Development
November 2013
Khronos Connects Software to Silicon
ROYALTY-FREE, OPEN STANDARD APIs for
advanced hardware acceleration
Low level silicon to software interfaces needed on every platform
Graphics, video, audio, compute,
vision, sensor and camera processing
Defines the forward looking roadmap for
the silicon community
Shipping on billions of devices across
multiple operating systems
Rigorous conformance tests for
cross-vendor consistency
Khronos is OPEN for any company to
join and participate
Acceleration APIs BY the Industry
FOR the Industry
Power is the New Limit to Performance
• GPUs are much more power efficient than CPUs for data parallelism
- When exploiting data parallelism, GPUs can be 10x as efficient – but can go further…
• Lots of space for transistors on an SoC – but can’t turn them all on at the same time!
- Would exceed Thermal Design Point
• Dark Silicon - specialized hardware – only turned on when needed
- Dedicated units can increase locality and parallelism of computation
Enabling new mobile use cases requires pushing computation
onto GPUs and dedicated hardware
[Figure: power efficiency versus computation flexibility. Multi-core CPU: X1, GPU Compute: X10, Dedicated Hardware: X100]
How do we provide
access to this diversity of
processors and hardware
without horrible platform
fragmentation?
Standards!
OpenCL – Heterogeneous Computing
• Native framework for programming diverse
parallel computing resources
- CPU, GPU, DSP – as well as hardware blocks(!)
• Powerful, low-level flexibility
- Foundational access to compute resources for
higher-level engines, frameworks and languages
• Embedded profile
- No need for a separate “ES” spec
- Reduces precision requirements
A cross-platform, cross-vendor standard for
harnessing all the compute resources in an SOC
[Figure: the same OpenCL kernel code dispatched to CPU, CPU, GPU, DSP and HW blocks]
One code tree can be executed on
CPUs, GPUs, DSPs and hardware.
Dynamically interrogate system load
and load balance work across
available processors
OpenCL Overview
• C Platform Layer API
- Query, select and initialize compute devices
• Kernel Language Specification
- Subset of ISO C99 with language extensions
- Well-defined numerical accuracy - IEEE 754 rounding with specified max error
- Rich set of built-in functions: cross, dot, sin, cos, pow, log …
• C Runtime API
- Runtime or build-time compilation of kernels
- Execute compute kernels across multiple devices
• Memory management is explicit
- Application must move data from
host -> global -> local and back
- Implementations can optimize data movement
in unified memory systems
OpenCL: Execution Model
• Kernel
- Basic unit of executable code ~ C function
- Data-parallel or task-parallel
• Program
- Collection of kernels and functions
~ dynamic library with run-time linking
• Command Queue
- Applications queue kernels & data transfers
- Performed in-order or out-of-order
• Work-item
- An execution of a kernel by a processing element
~ thread
• Work-group
- A collection of related work-items that execute on
a single compute unit ~ core
Example of parallelism types
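The relationship between work-items and work-groups can be sketched in plain Python. This is only an emulation under stated assumptions, not OpenCL API code: the names `run_ndrange` and `scale_kernel` are hypothetical, and the loops run serially where a real device would run work-items in parallel. It shows how a one-dimensional global ID is derived from a work-group ID and a local ID.

```python
# Serial emulation of an OpenCL-style 1-D index space (illustrative only).
# `run_ndrange` and `scale_kernel` are hypothetical names, not OpenCL calls.


def run_ndrange(kernel, global_size, local_size, *args):
    """Call `kernel` once per work-item, grouped into work-groups."""
    if global_size % local_size != 0:
        raise ValueError("global_size must be a multiple of local_size")
    num_groups = global_size // local_size
    for group_id in range(num_groups):        # one work-group ~ one compute unit
        for local_id in range(local_size):    # one work-item ~ one thread
            global_id = group_id * local_size + local_id
            kernel(global_id, group_id, local_id, *args)


def scale_kernel(global_id, group_id, local_id, src, dst, factor):
    """Data-parallel kernel body: each work-item handles one element."""
    dst[global_id] = src[global_id] * factor


src = list(range(8))
dst = [0] * 8
run_ndrange(scale_kernel, 8, 4, src, dst, 10)  # 2 work-groups of 4 work-items
print(dst)  # [0, 10, 20, 30, 40, 50, 60, 70]
```

Because each work-item writes only its own element, the order in which work-items run does not affect the result, which is what lets a device execute them in parallel.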
OpenCL Built-in Kernels
• Used to control non-OpenCL C-capable
resources on an SOC – ‘Custom Devices’
- E.g. Video encode/decode, Camera ISP …
• Represent functions of Custom Devices
as an OpenCL kernel
- Can enqueue Built-in Kernels to Custom
Devices alongside standard OpenCL kernels
• OpenCL run-time a powerful coordinating
framework for ALL SOC resources
- Programmable and custom devices
controlled by one run-time
Built-in kernels enable control of specialized processors and hardware
from OpenCL run-time
OpenCL Roadmap
OpenCL SPIR 1.2 Provisional released!
OpenCL 2.0
Significant enhancements to memory and execution models to
expose emerging hardware capabilities and provide increased
flexibility, functionality and performance to developers
OpenCL SPIR (Standard Parallel Intermediate Representation)
LLVM-based, low-level Intermediate Representation for IP Protection and as
target back-end for alternative high-level languages
OpenCL HLM (High Level Model)
High-level programming model, unifying host and device execution environments through
language syntax for increased usability and broader optimization opportunities
OpenCL 2.0 Provisional released!
OpenCL Milestones
• 24 month cadence for major OpenCL 2.0 update
- Slightly longer than 18 month cadence between versions of OpenCL 1.X
• Provisional Specification enables public review
- Warning! The spec may change before final release!
Timeline:
- Dec08: OpenCL 1.0 released. Conformance tests released Dec08
- Jun10: OpenCL 1.1 Specification and conformance tests released
- Nov11: OpenCL 1.2 Specification and conformance tests released
- Jul13: OpenCL 2.0 Provisional Specification released for public review
- Within 6 months (depends on feedback): OpenCL 2.0 Specification finalized and conformance tests released
Mobile OpenCL Shipping
• Android ICD extension released in latest extension specification
- OpenCL implementations can be discovered and loaded as a shared object
• Multiple implementations shipping in Android NDK
- ARM, Imagination, Vivante, Qualcomm, Samsung …
Key OpenCL 2.0 Features
• Shared Virtual Memory
- Host and device kernels can directly share complex, pointer-containing data
structures such as trees and linked lists, providing significant programming
flexibility and eliminating costly data transfers between host and devices
• Dynamic Parallelism
- Device kernels can enqueue kernels to the same device with no host interaction,
enabling flexible work scheduling paradigms and avoiding the need to transfer
execution control and data between the device and host, often significantly
offloading host processor bottlenecks
• Generic Address Space
- Functions can be written without specifying a named address space for
arguments, especially useful for those arguments that are declared to be a
pointer to a type, eliminating the need for multiple functions to be written for
each named address space used in an application
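The Dynamic Parallelism bullet can be illustrated with a small Python emulation. It is a sketch under stated assumptions, not OpenCL 2.0 device-side enqueue code: `run_device` and `sum_range` are hypothetical names, and a `deque` stands in for the device queue. The point it shows is that the host submits one task and the kernels themselves enqueue all follow-up work.

```python
from collections import deque

# Emulation only: a deque stands in for a device-side queue.
# `run_device` and `sum_range` are hypothetical, not OpenCL functions.


def run_device(first_task):
    """Drain the device queue; running kernels may append more kernels."""
    queue = deque([first_task])
    launches = 0
    while queue:
        kernel, args = queue.popleft()
        kernel(queue, *args)
        launches += 1
    return launches


def sum_range(queue, data, lo, hi, out):
    """Split large ranges into child kernels; sum small ranges directly."""
    if hi - lo > 4:
        mid = (lo + hi) // 2
        queue.append((sum_range, (data, lo, mid, out)))   # child enqueued
        queue.append((sum_range, (data, mid, hi, out)))   # without the host
    else:
        out.append(sum(data[lo:hi]))


data = list(range(16))
partials = []
# The "host" submits exactly one task; everything else is enqueued by kernels.
launches = run_device((sum_range, (data, 0, 16, partials)))
print(sum(partials), launches)  # 120 7
```

Seven kernel launches happen in total (1 + 2 + 4), but only the first one involves the host.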
Key OpenCL 2.0 Features – continued…
• Images
- Improved image support including sRGB images and 3D image writes, the ability
for kernels to read from and write to the same image, and the creation of
OpenCL images from a mip-mapped or a multi-sampled OpenGL texture for
improved OpenGL interop
• C11 Atomics
- Subset of C11 atomics and synchronization operations to enable assignments in
one work-item to be visible to other work-items in a work-group, across work-groups
executing on a device or for sharing data between OpenCL device and host
• Pipes
- Pipes are memory objects that store data organized as a FIFO. OpenCL 2.0
provides built-in functions for kernels to read from or write to pipes, providing
straightforward programming that can be highly optimized by implementers
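The FIFO behaviour of a pipe can be sketched in Python as follows. This is an emulation, not the OpenCL 2.0 pipe API: the `Pipe` class and its `write`/`read` methods are hypothetical helpers, and the assumption that a full pipe rejects a write (rather than blocking) is made only for this illustration.

```python
from collections import deque

# Emulation of a bounded FIFO pipe between a producer and a consumer kernel.
# `Pipe`, `write` and `read` are hypothetical names, not OpenCL built-ins.


class Pipe:
    def __init__(self, max_packets):
        self.max_packets = max_packets
        self._packets = deque()

    def write(self, packet):
        """Return True on success, False if the pipe is full (assumed)."""
        if len(self._packets) >= self.max_packets:
            return False
        self._packets.append(packet)
        return True

    def read(self):
        """Return (True, packet), or (False, None) if the pipe is empty."""
        if not self._packets:
            return False, None
        return True, self._packets.popleft()


pipe = Pipe(max_packets=4)

# Producer "kernel": tries to write six packets into a four-packet pipe.
accepted = [pipe.write(value * value) for value in range(6)]

# Consumer "kernel": reads until the pipe is empty, in FIFO order.
results = []
while True:
    ok, packet = pipe.read()
    if not ok:
        break
    results.append(packet)

print(accepted)  # [True, True, True, True, False, False]
print(results)   # [0, 1, 4, 9]
```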
OpenCL as Parallel Compute Foundation
OpenCL provides vendor optimized,
cross-platform, cross-vendor access to
heterogeneous compute resources
[Figure: languages and frameworks layered on OpenCL]
- OpenCL HLM: C++ syntax/compiler extensions
- WebCL: JavaScript binding to OpenCL for initiation of OpenCL C kernels
- River Trail: Language extensions to JavaScript
- C++ AMP: Shevlin Park uses Clang and LLVM
- Harlan: High level language for GPU programming
- Compiler directives for Fortran, C and C++
- Aparapi: Java language extensions for parallelism
- PyOpenCL: Python wrapper around OpenCL
OpenGL 3D API Family Tree
[Figure: timeline, 2002 to 2013, of the desktop, mobile and Web 3D APIs]
Desktop 3D: OpenGL 1.3, 1.5, 2.0, 2.1, 3.0, 3.1, 3.2, 3.3, 4.0, 4.1, 4.2, 4.3, 4.4, then GL-Next
Mobile 3D: OpenGL ES 1.0, 1.1, 2.0, 3.0, then ES-Next
WebGL: WebGL 1.0, then WebGL-Next
- OpenGL ES 1.1 content: fixed function 3D pipeline
- OpenGL ES 2.0 content: programmable vertex and fragment shaders
- OpenGL ES 3.0 content: ES3 is backward compatible so new features can be added incrementally
- OpenGL 4.4 is a superset of DX11
OpenGL ES 3.0 Highlights
• Better looking, faster performing games and apps – at lower power
- Incorporates proven features from OpenGL 3.3 / 4.x
- 32-bit integers and floats in shader programs
- NPOT, 3D textures, depth textures, texture arrays
- Multiple Render Targets for deferred rendering, Occlusion Queries
- Instanced Rendering, Transform Feedback …
• Make life better for the programmer
- Tighter requirements for supported features to reduce implementation variability
• Backward compatible with OpenGL ES 2.0
- OpenGL ES 2.0 apps continue to run unmodified
• Standardized Texture Compression
- #1 developer request!
Accelerating OpenGL Innovation
Bringing state-of-the-art functionality to cross-platform graphics
[Figure: release timeline, 2004 to 2013]
DirectX: 9.0c, 10.0, 10.1, 11, 11.1
OpenGL: 2.0, 2.1, 3.0, 3.1, 3.2, 3.3/4.0, 4.1, 4.2, 4.3, 4.4
OpenGL 4.3 Compute Shaders
• Execute algorithmically general-purpose GLSL shaders
- Can operate on uniforms, images and textures
• Process graphics data in the context of the graphics pipeline
- Easier than interoperating with a compute API IF processing ‘close to the pixel’
• Standard part of all OpenGL 4.3 implementations
- Matches DX11 DirectCompute functionality
[Figure labels: Physics, AI, Simulation, Ray Tracing, Imaging, Global Illumination]
OpenCL and OpenGL Compute Shaders
• OpenGL compute shaders and OpenCL support distinctly different use cases
- OpenCL provides a significantly more powerful and complete compute solution
Compute Shaders (enhanced 3D graphics apps, “Shaders++”):
1. Fine grain compute operations inside OpenGL
2. GLSL Shading Language
3. Execute on single GPU only
OpenCL (pure compute apps touching no pixels):
1. Full ANSI C programming of heterogeneous CPUs and GPUs
2. Utilize multiple processors
3. Precisely defined IEEE accuracy
Application areas spanning both: Imaging, Video, Physics, AI
OpenGL 4.4 reference pages
Huge thanks to Graham Sellers!!!
OpenGL Conformance Test Suite released!
Conformance submissions are required for GL 4.4 implementations and encouraged for earlier driver versions
Shared codebase with OpenGL ES 3.0 CTS, plus additional desktop-specific tests
Core profile functionality
Enhancements underway to add more coverage
Leveraging Proven Native APIs into HTML5
• Khronos and W3C liaison
- Leverage proven native API investments into the Web
- Fast API development and deployment
- Designed by the hardware community
- Familiar foundation reduces developer learning curve
[Figure: native APIs and their JavaScript counterparts, in rows labelled Native, JavaScript and HTML]
Legend:
- Native APIs shipping or Khronos working group
- JavaScript API shipping, acceleration being developed or work underway
- Possible future JavaScript APIs or acceleration
Labels:
- WebVX? Vision Processing
- WebCAM(!) Camera control and video processing
- WebStream? Sensor Fusion
- Canvas
- Path Rendering
- Camera Control
Zygote Body, formerly Google Body
• Rendering in Zygote Body uses WebGL
www.zygote.com
www.zygotebody.com
WebGL Implementation Anatomy
[Figure: layered stack, top to bottom]
- Content (JavaScript, HTML, CSS, ...): Content downloaded from the Web.
- JavaScript Middleware: Middleware can make WebGL accessible to non-expert 3D programmers
- Browser (HTML5, JavaScript, CSS): Browser provides WebGL functionality alongside other HTML5 technologies - no plug-in required
- Drivers (OpenGL ES 2.0, OpenGL, DX9/Angle): OS Provided Drivers. WebGL on Windows can use Google Angle to create conformant OpenGL ES 2.0 over DX9
WebGL Availability in Browsers
- Microsoft – “where you have IE11, you have WebGL – turned on by default and working all the time”
- Microsoft - WebGL also enabled for Windows applications - web app framework and web view
- Apple - WebGL must be explicitly turned on in Mac Safari and is only exposed on iOS for iAds
- Chrome OS - WebGL is the only cross-platform API to program the GPU
- Google IO announcement - Chrome on Android will soon launch with WebGL
Much WebGL content uses three.js library:
http://threejs.org/
Sectional Anatomy: MR Knee
• //sectional-anatomy.org/
Cross-OS Portability
[Figure: three tiers of application development environments]
- C/C++: Native code is portable, but apps must cope with different available APIs and libraries
- SDK (Dalvik (Java), Objective C, C#, DirectX): Preferred development environments not designed for portability
- HTML/CSS: HTML5 provides cross platform portability. GPU accessibility through WebGL available soon on ~90% mobile systems
WebGL First Wave Application Categories
• Maps and Navigation
• Modeling Tools and Repositories
• Games
• 3D Printing
• Visualization
• Music Videos and Promotion
• Education
• Photo Editors
• Music Visualizers
• Vision/Video Processing
WebCL – Parallel Computing for the Web
• JavaScript bindings to OpenCL APIs
- Enables initiation of Kernels written in OpenCL C within the browser
http://www.youtube.com/user/SamsungSISA#p/a/u/1/9Ttux1A-Nuc
3D Needs a Transmission Format!
• Compression and streaming of 3D assets becoming essential
- Mobile and connected devices need access to increasingly large asset databases
• 3D is the last media type to define a compressed format
- 3D is more complex – diverse asset types and use cases
• Needs to be royalty-free
- Avoid an ‘internet video codec war’ scenario
• Eventually enable hardware implementations of successful codecs
- High-performance and low power – but pragmatic adoption strategy is key
Media type:  Audio   Video   Images   3D
Codec:       MP3     H.264   JPEG     ?!
An effective and widely adopted codec ignites previously
unimagined opportunities for a media type
glTF Goals
• Binary file format for efficient transmission of 3D assets
- Reduce network bandwidth and minimize client processing overhead
• Run-time neutral - DO NOT IMPLY OR MANDATE ANY RUN-TIME BEHAVIOR
- Can be used by any app or run-time – usually WebGL accelerated
• Scalable to handle compression and streaming
- Though baseline format does not include compression
• ‘Direct load efficiency’ for WebGL
- Little or NO processing to drop glTF data into WebGL client
• Carry conditioned data from any authoring format
- Prototyping and optimizing efficient handling of COLLADA assets
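The idea of "direct load efficiency" can be sketched in Python: a small JSON descriptor says where each array lives inside one binary blob, and the loader only creates typed views over the bytes instead of parsing text. This is an illustration under stated assumptions. The descriptor layout and field names (`offset`, `length`, `type`) are hypothetical and are not the glTF schema, and `load_views` is an invented helper.

```python
import json
from array import array

# "Export" side: pack vertex positions and triangle indices into one blob.
positions = array("f", [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0])
indices = array("H", [0, 1, 2])
blob = positions.tobytes() + indices.tobytes()

# Hypothetical descriptor; these field names are NOT taken from glTF.
positions_bytes = len(positions) * positions.itemsize
descriptor = json.dumps({
    "positions": {"offset": 0, "length": positions_bytes, "type": "f"},
    "indices": {
        "offset": positions_bytes,
        "length": len(indices) * indices.itemsize,
        "type": "H",
    },
})


def load_views(descriptor_text, binary):
    """Return typed, zero-copy views into `binary`; no per-value parsing."""
    layout = json.loads(descriptor_text)
    data = memoryview(binary)
    return {
        name: data[entry["offset"]:entry["offset"] + entry["length"]].cast(entry["type"])
        for name, entry in layout.items()
    }


views = load_views(descriptor, blob)
print(list(views["positions"]))
print(list(views["indices"]))  # [0, 1, 2]
```

In a WebGL client the analogous step would hand such views to the GPU as buffers, which is why little or no client-side processing is needed.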
A standards-based
content pipeline for
rich native and Web 3D
applications
[Figure: pipeline from Authoring to Playback]
COLLADA and glTF Open Source Ecosystem
[Figure: content pipeline from authoring tools to WebGL deployment]
- Tool Interop
- OpenCOLLADA Importer/Exporter and COLLADA Conformance Tests, on GitHub
- COLLADA2GLTF Translator
- Three.js glTF Importer. Rest3D initiative
- Web-based Tools
- Other authoring formats
- Pervasive WebGL deployment
https://github.com/KhronosGroup/glTF
https://github.com/KhronosGroup/OpenCOLLADA
https://github.com/KhronosGroup/COLLADA-CTS
WebGL as Test-bed for 3D Asset Compression
• Integrating and benchmarking 3D geometry compression formats with glTF
- Baseline is GZIP
- Open3DGC - implementation of the MPEG-SC3DMC - Scalable Complexity 3D Mesh Compression codec
- webgl-loader is Google's lightweight compression for WebGL content
Format and encoding                         First model                   Second model
COLLADA, XML                                336k                          56763k
COLLADA, gzip                               106k                          7378k
glTF+webgl-loader, raw                      54k (utf8:42k + JSON:12k)     9500k (utf8:8747k + JSON:753k)
glTF+webgl-loader, gzip                     36k (utf8:34k + JSON:2k)      1354k (utf8:1325k + JSON:29k)
glTF+Open3DGC ascii, raw                    40k (ascii:29k + JSON:11k)    8380k (ascii:7793k + JSON:587k)
glTF+Open3DGC ascii, gzip                   21k (ascii:19k + JSON:2k)     1462k (ascii:1433k + JSON:29k)
glTF+Open3DGC binary, raw                   29k (bin:18k + JSON:11k)      3794k (bin:3205k + JSON:589k)
glTF+Open3DGC binary, raw bin + gzip JSON   20k (bin:18k + JSON:2k)       3234k (bin:3205k + JSON:29k)
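To make the results easier to compare, the sizes can be turned into reduction factors relative to raw COLLADA XML. The short Python sketch below does that arithmetic for the first model, using the totals 336k, 106k, 36k, 21k and 20k reported above. The dictionary keys are descriptive labels chosen for this example.

```python
# Totals in kilobytes for the first model, as reported in the results above.
sizes_k = {
    "COLLADA XML raw": 336,
    "COLLADA XML gzip": 106,
    "glTF+webgl-loader gzip": 36,
    "glTF+Open3DGC ascii gzip": 21,
    "glTF+Open3DGC binary, gzip JSON": 20,
}

baseline = sizes_k["COLLADA XML raw"]

# Reduction factor: how many times smaller than the raw XML baseline.
ratios = {name: baseline / size for name, size in sizes_k.items()}

for name, ratio in ratios.items():
    print(f"{name:34s} {sizes_k[name]:4d}k  {ratio:5.1f}x smaller")
```

On these figures, gzip alone shrinks the XML roughly 3.2x, while the smallest glTF+Open3DGC variant is 16.8x smaller than the raw XML.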
Compression Example Results Overview
• Early days – Khronos embarking on methodical analysis using glTF as test-bed
• For mobile - need to balance file size AND decompression processing
- Extensive processing can take more time/power than transmission
• OpenCTM is promising but LZMA is very processor intensive
- Work may lead to LZMA in hardware?
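The size-versus-processing trade-off can be explored with Python's standard `zlib` (the DEFLATE algorithm used by gzip) and `lzma` modules. This is a rough sketch, not a benchmark of any codec named in the slides: the payload is synthetic text invented for the example, and timings depend entirely on the machine it runs on.

```python
import lzma
import time
import zlib

# Synthetic stand-in for mesh data (an assumption, not a real 3D asset).
payload = b"".join(
    f"v {i % 97}.{i % 13} {i % 89}.{i % 7} {i % 83}.{i % 11}\n".encode()
    for i in range(20000)
)


def measure(name, compress, decompress):
    """Compress once, then time a single decompression."""
    packed = compress(payload)
    start = time.perf_counter()
    restored = decompress(packed)
    elapsed = time.perf_counter() - start
    assert restored == payload  # round trip must be lossless
    return {"codec": name, "bytes": len(packed), "decode_s": elapsed}


results = [
    measure("zlib", zlib.compress, zlib.decompress),
    measure("lzma", lzma.compress, lzma.decompress),
]

print(f"raw   {len(payload):8d} bytes")
for r in results:
    print(f"{r['codec']:5s} {r['bytes']:8d} bytes  decode {r['decode_s'] * 1000:.2f} ms")
```

Comparing the byte counts against the decode times gives a feel for when a smaller file stops paying for its extra decompression work on a given device.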
Texture Compression is Key
• Texture compression saves precious resources
- Network bandwidth, device memory space AND device memory bandwidth
• Developers need the same texture compression EVERYWHERE
- Otherwise portable apps – such as WebGL – need multiple copies of same texture
[Figure: texture compression formats plotted by Pervasive Deployment versus Quality]
- DXTC/S3TC (Windows) and PVRTC (iOS): NOT Royalty-free. Platform Fragmentation
- ETC1, 2008-2010. Mandated in Android Froyo (400M devices). Royalty-free BUT only optional in ES. Only 4bpp | 3 channel. No alpha support
- ETC2 / EAC, 2012-2013. MANDATED in OpenGL ES 3.0 and OpenGL 4.3. Royalty-free. Backward compatible with ETC1. ETC2: 4bpp | 3 channel. EAC: 4 (8) bpp | 1(2) channel. COMBINED: RGBA 8bpp | 4 channel. Does not have 1-2 bit compression WITH ALPHA
- ASTC, 2014->. OpenGL ES 3.0 and OpenGL 4.3 extensions -> Core once proven. Royalty-free. Best quality. Independent control of bit-rate and # channels. 1 to 4 channel. 1-8bpp in fine steps
ASTC – Universal Texture Standard
• Adaptive Scalable Texture Compression (ASTC)
- Quality significantly exceeds S3TC or PVRTC at same bit rate
• Industry-leading orthogonal compression rate and format flexibility
- 1 to 4 color components: R / RG / RGB / RGBA
- Choice of bit rate: from 8bpp to <1bpp in fine steps
• ASTC is royalty-free and so is available to be universally adopted
- Shipping as OpenGL/OpenGL ES extension today for industry feedback
[Figure: image quality comparison. Original: 24bpp. ASTC Compression: 8bpp, 3.56bpp, 2bpp]
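The bit rates quoted for ASTC follow from its block structure: every ASTC block occupies 128 bits, so the bit rate is 128 divided by the number of texels the block covers. The Python sketch below works through that arithmetic. Block footprints of 4x4, 6x6 and 8x8 reproduce the 8bpp, 3.56bpp and 2bpp figures, though the choice of those footprints and of a 1024x1024 texture is an assumption made for this example. The helper names are hypothetical.

```python
import math

ASTC_BLOCK_BITS = 128  # each ASTC block is 128 bits, whatever its footprint


def astc_bpp(block_w, block_h):
    """Bits per pixel for a given 2D block footprint."""
    return ASTC_BLOCK_BITS / (block_w * block_h)


def astc_bytes(width, height, block_w, block_h):
    """Compressed size: whole blocks are stored even at partial edges."""
    blocks = math.ceil(width / block_w) * math.ceil(height / block_h)
    return blocks * ASTC_BLOCK_BITS // 8


def uncompressed_bytes(width, height, bpp=24):
    """Size of the uncompressed original at `bpp` bits per pixel."""
    return width * height * bpp // 8


width = height = 1024  # example texture size (assumed)
print(f"original 24bpp: {uncompressed_bytes(width, height)} bytes")
for block_w, block_h in [(4, 4), (6, 6), (8, 8)]:
    bpp = astc_bpp(block_w, block_h)
    size = astc_bytes(width, height, block_w, block_h)
    print(f"ASTC {block_w}x{block_h}: {bpp:.2f} bpp, {size} bytes")
```

For the assumed 1024x1024 texture, the 4x4 footprint needs one third of the memory of the 24bpp original, and the 8x8 footprint one twelfth.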
Conclusion
• Hardware acceleration is a complex application domain and needs multiple
standards across diverse domains
• Advances in SOC silicon processing and associated APIs to access them are about
to enable mobile devices to truly meet user expectations
• Now is a good time to get involved with the standards initiatives
that affect your business
• These slides and more details at
www.khronos.org