2013 Korean tour Daegu

© Copyright Khronos Group, 2013 - Page 1 Graphics Technology Update Presented by: Erik Noreke, Khronos Group Vice President of Business Development November 2013

Upload: the-khronos-group-inc

Post on 13-Jan-2015

Category:

Technology


DESCRIPTION

Khronos toured Korea in November 2013. Erik Noreke, VP of Business Development, visited the "Human Care Center Workshop" and the "Interaction Standard Workshop". Additional details may be found on the Khronos Group event page: https://www.khronos.org/news/events/korean-tour-2013

TRANSCRIPT

Page 1: 2013 Korean tour Daegu

© Copyright Khronos Group, 2013 - Page 1

Graphics Technology Update

Presented by:

Erik Noreke, Khronos Group Vice President of Business Development

November 2013

Page 2: 2013 Korean tour Daegu

Khronos Connects Software to Silicon

ROYALTY-FREE, OPEN STANDARD APIs for advanced hardware acceleration

Low level silicon to software interfaces needed on every platform

Graphics, video, audio, compute, vision, sensor and camera processing

Defines the forward looking roadmap for the silicon community

Shipping on billions of devices across multiple operating systems

Rigorous conformance tests for cross-vendor consistency

Khronos is OPEN for any company to join and participate

Acceleration APIs BY the Industry FOR the Industry

Page 3: 2013 Korean tour Daegu

Power is the New Limit to Performance

• GPUs are much more power efficient than CPUs for data parallelism

- When exploiting data parallelism, GPUs can be 10x as efficient, but can go further…

• Lots of space for transistors on SOC, but can’t turn them all on at the same time!

- Would exceed Thermal Design Point

• Dark Silicon: specialized hardware, only turned on when needed

- Dedicated units can increase locality and parallelism of computation

[Chart: Power Efficiency versus Computation Flexibility. Multi-core CPU: X1; GPU Compute: X10; Dedicated Hardware: X100]

Enabling new mobile use cases requires pushing computation onto GPUs and dedicated hardware

How do we provide access to this diversity of processors and hardware without horrible platform fragmentation?

Standards!

Page 4: 2013 Korean tour Daegu

OpenCL – Heterogeneous Computing

• Native framework for programming diverse parallel computing resources

- CPU, GPU, DSP – as well as hardware blocks(!)

• Powerful, low-level flexibility

- Foundational access to compute resources for higher-level engines, frameworks and languages

• Embedded profile

- No need for a separate “ES” spec

- Reduces precision requirements

A cross-platform, cross-vendor standard for harnessing all the compute resources in an SOC

[Diagram: OpenCL kernel code running on CPUs, a GPU, a DSP and hardware (HW) blocks]

One code tree can be executed on CPUs, GPUs, DSPs and hardware. Dynamically interrogate system load and load balance work across available processors
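
The load-balancing idea on this slide can be sketched in plain Python. This is only an illustration and not OpenCL API code: the helper name `split_work` and the relative throughput figures are hypothetical, and a real application would measure device load at run time rather than hard-code it.

```python
def split_work(total_items, throughput):
    """Divide work-items across processors in proportion to their throughput.

    `throughput` maps a processor name to a relative rate. The names and
    rates used below are hypothetical, chosen only for illustration.
    """
    total_rate = sum(throughput.values())
    # Integer share for each processor, rounded down.
    shares = {name: total_items * rate // total_rate
              for name, rate in throughput.items()}
    # Hand any rounding remainder to the fastest processor.
    leftover = total_items - sum(shares.values())
    fastest = max(throughput, key=throughput.get)
    shares[fastest] += leftover
    return shares


shares = split_work(1000, {"CPU": 1, "GPU": 10, "DSP": 4})
print(shares)  # {'CPU': 66, 'GPU': 668, 'DSP': 266}
```

A static proportional split is the simplest policy; it is shown here only because it makes the arithmetic easy to follow.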

Page 5: 2013 Korean tour Daegu

OpenCL Overview

• C Platform Layer API

- Query, select and initialize compute devices

• Kernel Language Specification

- Subset of ISO C99 with language extensions

- Well-defined numerical accuracy: IEEE 754 rounding with specified max error

- Rich set of built-in functions: cross, dot, sin, cos, pow, log …

• C Runtime API

- Runtime or build-time compilation of kernels

- Execute compute kernels across multiple devices

• Memory management is explicit

- Application must move data from host → global → local and back

- Implementations can optimize data movement in unified memory systems
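
The explicit data movement described on this slide can be emulated in plain Python. This sketch does not call OpenCL: ordinary lists stand in for host, global and local memory, and the function name `sum_per_work_group` is hypothetical.

```python
def sum_per_work_group(host_data, local_size):
    """Emulate explicit data movement: host -> global -> local and back."""
    # host -> global: the application copies its data into a device buffer.
    global_buffer = list(host_data)
    global_results = []
    for start in range(0, len(global_buffer), local_size):
        # global -> local: each work-group stages its own tile.
        local_buffer = global_buffer[start:start + local_size]
        # Compute on the local tile, then write the result back to global.
        global_results.append(sum(local_buffer))
    # global -> host: the application reads the results back explicitly.
    return list(global_results)


partial_sums = sum_per_work_group([1, 2, 3, 4, 5, 6, 7, 8], local_size=4)
print(partial_sums)  # [10, 26]
```

Every copy in the sketch is a step the application has to request itself, which is the point the slide makes about explicit memory management.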

Page 6: 2013 Korean tour Daegu

OpenCL: Execution Model

• Kernel

- Basic unit of executable code ~ C function

- Data-parallel or task-parallel

• Program

- Collection of kernels and functions ~ dynamic library with run-time linking

• Command Queue

- Applications queue kernels & data transfers

- Performed in-order or out-of-order

• Work-item

- An execution of a kernel by a processing element ~ thread

• Work-group

- A collection of related work-items that execute on a single compute unit ~ core

Example of parallelism types
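
The relationship between kernels, work-items and work-groups can be sketched with a small pure-Python emulation. It runs sequentially and does not use an OpenCL runtime; the helper name `run_ndrange` and the kernel signature are hypothetical, chosen to mirror the terms on the slide.

```python
def run_ndrange(kernel, global_size, local_size, *args):
    """Emulate a 1-D launch: call `kernel` once per work-item."""
    if global_size % local_size != 0:
        raise ValueError("this sketch needs global_size to be a multiple of local_size")
    num_groups = global_size // local_size
    for group_id in range(num_groups):        # one work-group ~ one compute unit
        for local_id in range(local_size):    # one work-item ~ one thread
            global_id = group_id * local_size + local_id
            kernel(global_id, local_id, group_id, *args)


def vector_add(global_id, local_id, group_id, a, b, out):
    # Data-parallel kernel body: each work-item handles one element.
    out[global_id] = a[global_id] + b[global_id]


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]
out = [0] * 8
run_ndrange(vector_add, 8, 4, a, b, out)
print(out)  # [11, 22, 33, 44, 55, 66, 77, 88]
```

With a global size of 8 and a local size of 4, the emulation forms two work-groups of four work-items each.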

Page 7: 2013 Korean tour Daegu

OpenCL Built-in Kernels

• Used to control non-OpenCL C-capable resources on an SOC: ‘Custom Devices’

- E.g. Video encode/decode, Camera ISP …

• Represent functions of Custom Devices as an OpenCL kernel

- Can enqueue Built-in Kernels to Custom Devices alongside standard OpenCL kernels

• OpenCL run-time is a powerful coordinating framework for ALL SOC resources

- Programmable and custom devices controlled by one run-time

Built-in kernels enable control of specialized processors and hardware from OpenCL run-time

Page 8: 2013 Korean tour Daegu

OpenCL Roadmap

OpenCL 2.0 (OpenCL 2.0 Provisional released!)

Significant enhancements to memory and execution models to expose emerging hardware capabilities and provide increased flexibility, functionality and performance to developers

OpenCL SPIR (Standard Parallel Intermediate Representation) (OpenCL SPIR 1.2 Provisional released!)

LLVM-based, low-level Intermediate Representation for IP Protection and as target back-end for alternative high-level languages

OpenCL HLM (High Level Model)

High-level programming model, unifying host and device execution environments through language syntax for increased usability and broader optimization opportunities

Page 9: 2013 Korean tour Daegu

OpenCL Milestones

• 24-month cadence for major OpenCL 2.0 update

- Slightly longer than the 18-month cadence between versions of OpenCL 1.X

• Provisional Specification enables public review

- Warning! The spec may change before final release!

[Timeline]

Dec08: OpenCL 1.0 released. Conformance tests released Dec08

Jun10: OpenCL 1.1 Specification and conformance tests released

Nov11: OpenCL 1.2 Specification and conformance tests released

Jul13: OpenCL 2.0 Provisional Specification released for public review

Within 6 months (depends on feedback): OpenCL 2.0 Specification finalized and conformance tests released
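
The cadence figures on this slide can be checked against the timeline dates with a little date arithmetic. The sketch below assumes the first day of each month, since the slide gives only month and year.

```python
from datetime import date


def months_between(earlier, later):
    """Whole calendar months from `earlier` to `later`."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


# Dates from the slide's timeline; the day of month is an assumption.
releases = [
    ("OpenCL 1.0", date(2008, 12, 1)),
    ("OpenCL 1.1", date(2010, 6, 1)),
    ("OpenCL 1.2", date(2011, 11, 1)),
    ("OpenCL 2.0 provisional", date(2013, 7, 1)),
]

gaps = [months_between(a[1], b[1]) for a, b in zip(releases, releases[1:])]
print(gaps)  # [18, 17, 20]

# The final 2.0 spec is expected "within 6 months" of the provisional release,
# so the gap from OpenCL 1.2 to the final 2.0 falls in this range of months.
final_gap_range = (gaps[-1], gaps[-1] + 6)
print(final_gap_range)  # (20, 26)
```

The 1.X gaps come out at 18 and 17 months, and the stated 24-month cadence for 2.0 falls inside the 20 to 26 month window.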

Page 10: 2013 Korean tour Daegu

Mobile OpenCL Shipping

• Android ICD extension released in latest extension specification

- OpenCL implementations can be discovered and loaded as a shared object

• Multiple implementations shipping in Android NDK

- ARM, Imagination, Vivante, Qualcomm, Samsung …

Page 11: 2013 Korean tour Daegu

Key OpenCL 2.0 Features

• Shared Virtual Memory

- Host and device kernels can directly share complex, pointer-containing data structures such as trees and linked lists, providing significant programming flexibility and eliminating costly data transfers between host and devices

• Dynamic Parallelism

- Device kernels can enqueue kernels to the same device with no host interaction, enabling flexible work scheduling paradigms and avoiding the need to transfer execution control and data between the device and host, often significantly offloading host processor bottlenecks

• Generic Address Space

- Functions can be written without specifying a named address space for arguments, especially useful for those arguments that are declared to be a pointer to a type, eliminating the need for multiple functions to be written for each named address space used in an application
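
The Dynamic Parallelism bullet can be illustrated with a pure-Python emulation of a device-side queue. This is not the OpenCL 2.0 device-enqueue API: `run_device_queue`, `split_sum` and the `enqueue` callback are hypothetical names, and the loop runs sequentially. The host starts one root kernel, and every later launch is requested by a kernel.

```python
from collections import deque


def run_device_queue(root_kernel, root_arg):
    """Run kernels from a device-side queue until it drains; return launch count."""
    device_queue = deque([(root_kernel, root_arg)])
    launches = 0

    def enqueue(kernel, arg):
        # Called from inside a kernel: no host round trip is involved.
        device_queue.append((kernel, arg))

    while device_queue:
        kernel, arg = device_queue.popleft()
        launches += 1
        kernel(arg, enqueue)
    return launches


leaf_sums = []


def split_sum(bounds, enqueue):
    """Sum the integers in [lo, hi) by recursively enqueuing child kernels."""
    lo, hi = bounds
    if hi - lo <= 2:
        leaf_sums.append(sum(range(lo, hi)))
    else:
        mid = (lo + hi) // 2
        enqueue(split_sum, (lo, mid))
        enqueue(split_sum, (mid, hi))


launches = run_device_queue(split_sum, (0, 8))
print(launches, sum(leaf_sums))  # 7 28
```

Of the seven launches, six were requested by kernels rather than by the host.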

Page 12: 2013 Korean tour Daegu

Key OpenCL 2.0 Features – continued…

• Images

- Improved image support including sRGB images and 3D image writes, the ability for kernels to read from and write to the same image, and the creation of OpenCL images from a mip-mapped or a multi-sampled OpenGL texture for improved OpenGL interop

• C11 Atomics

- Subset of C11 atomics and synchronization operations to enable assignments in one work-item to be visible to other work-items in a work-group, across work-groups executing on a device or for sharing data between OpenCL device and host

• Pipes

- Pipes are memory objects that store data organized as a FIFO. OpenCL 2.0 provides built-in functions for kernels to read from or write to pipes, providing straightforward programming that can be highly optimized by implementers
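
The FIFO behaviour of pipes can be sketched with a small Python emulation. The `Pipe` class and the two kernel functions are hypothetical stand-ins, not the OpenCL 2.0 built-ins; they show only the ordering and capacity semantics that a producer kernel and a consumer kernel would rely on.

```python
from collections import deque


class Pipe:
    """A bounded FIFO of packets, standing in for an OpenCL pipe object."""

    def __init__(self, max_packets):
        self._packets = deque()
        self._max_packets = max_packets

    def write(self, packet):
        """Append a packet; return False if the pipe is full."""
        if len(self._packets) >= self._max_packets:
            return False
        self._packets.append(packet)
        return True

    def read(self):
        """Remove and return the oldest packet, or None if the pipe is empty."""
        if not self._packets:
            return None
        return self._packets.popleft()


def producer_kernel(pipe, values):
    # Each write reports whether the packet was accepted.
    return [pipe.write(v * v) for v in values]


def consumer_kernel(pipe):
    received = []
    while True:
        packet = pipe.read()
        if packet is None:
            break
        received.append(packet)
    return received


pipe = Pipe(max_packets=3)
accepted = producer_kernel(pipe, [1, 2, 3, 4])
received = consumer_kernel(pipe)
print(accepted)  # [True, True, True, False]
print(received)  # [1, 4, 9]
```

The fourth write is rejected because the pipe is full, and the consumer sees the packets in the order they were written.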

Page 13: 2013 Korean tour Daegu

OpenCL as Parallel Compute Foundation

[Diagram: languages and frameworks layered on top of OpenCL]

OpenCL HLM: C++ syntax/compiler extensions

WebCL: JavaScript binding to OpenCL for initiation of OpenCL C kernels

River Trail: Language extensions to JavaScript

C++ AMP: Shevlin Park, uses Clang and LLVM

Harlan: High level language for GPU programming

Compiler directives for Fortran, C and C++

Aparapi: Java language extensions for parallelism

PyOpenCL: Python wrapper around OpenCL

OpenCL provides vendor optimized, cross-platform, cross-vendor access to heterogeneous compute resources

Page 14: 2013 Korean tour Daegu

OpenGL 3D API Family Tree

[Timeline diagram, 2002 to 2013]

Desktop 3D: OpenGL 1.3, 1.5, 2.0, 2.1, 3.0, 3.1, 3.2, 3.3, 4.0, 4.1, 4.2, 4.3, 4.4, then GL-Next

Mobile 3D: OpenGL ES 1.0, 1.1, 2.0, 3.0, then ES-Next

Web: WebGL 1.0, then WebGL-Next

OpenGL ES 1.1 Content: Fixed function 3D Pipeline

OpenGL ES 2.0 Content: Programmable vertex and fragment shaders

OpenGL ES 3.0 Content: ES3 is backward compatible so new features can be added incrementally

OpenGL 4.4 is a superset of DX11

Page 15: 2013 Korean tour Daegu

OpenGL ES 3.0 Highlights

• Better looking, faster performing games and apps – at lower power

- Incorporates proven features from OpenGL 3.3 / 4.x

- 32-bit integers and floats in shader programs

- NPOT, 3D textures, depth textures, texture arrays

- Multiple Render Targets for deferred rendering, Occlusion Queries

- Instanced Rendering, Transform Feedback …

• Make life better for the programmer

- Tighter requirements for supported features to reduce implementation variability

• Backward compatible with OpenGL ES 2.0

- OpenGL ES 2.0 apps continue to run unmodified

• Standardized Texture Compression

- #1 developer request!

Page 16: 2013 Korean tour Daegu

Accelerating OpenGL Innovation

[Timeline diagram, 2004 to 2013]

OpenGL: 2.0, 2.1, 3.0, 3.1, 3.2, 3.3/4.0, 4.1, 4.2, 4.3, 4.4

DirectX: 9.0c, 10.0, 10.1, 11, 11.1

Bringing state-of-the-art functionality to cross-platform graphics

Page 17: 2013 Korean tour Daegu

OpenGL 4.3 Compute Shaders

• Execute algorithmically general-purpose GLSL shaders

- Can operate on uniforms, images and textures

• Process graphics data in the context of the graphics pipeline

- Easier than interoperating with a compute API IF processing ‘close to the pixel’

• Standard part of all OpenGL 4.3 implementations

- Matches DX11 DirectCompute functionality

[Images: Physics, AI, Simulation, Ray Tracing, Imaging, Global Illumination]

Page 18: 2013 Korean tour Daegu

OpenCL and OpenGL Compute Shaders

• OpenGL compute shaders and OpenCL support distinctly different use cases

- OpenCL provides a significantly more powerful and complete compute solution

Compute Shaders (Enhanced 3D Graphics apps, “Shaders++”)

1. Fine grain compute operations inside OpenGL

2. GLSL Shading Language

3. Execute on single GPU only

OpenCL (Pure compute apps touching no pixels)

1. Full ANSI C programming of heterogeneous CPUs and GPUs

2. Utilize multiple processors

3. Precisely defined IEEE accuracy

[Diagram labels: Imaging, Video, Physics, AI]

Page 19: 2013 Korean tour Daegu

OpenGL 4.4 reference pages

Huge thanks to Graham Sellers!!!

Page 20: 2013 Korean tour Daegu

OpenGL Conformance Test Suite released!

Conformance submissions are required for GL 4.4 implementations and encouraged for earlier driver versions

Shared codebase with OpenGL ES 3.0 CTS, plus additional desktop-specific tests

Core profile functionality

Enhancements underway to add more coverage

Page 21: 2013 Korean tour Daegu

Leveraging Proven Native APIs into HTML5

• Khronos and W3C liaison

- Leverage proven native API investments into the Web

- Fast API development and deployment

- Designed by the hardware community

- Familiar foundation reduces developer learning curve

[Diagram: Native APIs and their JavaScript counterparts in HTML]

Legend: Native APIs shipping or Khronos working group; JavaScript API shipping, acceleration being developed or work underway; Possible future JavaScript APIs or acceleration

Diagram labels: WebVX? (Vision Processing), WebCAM(!) (Camera control and video processing), WebStream? (Sensor Fusion), Canvas, Path Rendering, Camera Control

Page 22: 2013 Korean tour Daegu

Zygote Body, formerly Google Body

• Rendering in Zygote Body uses WebGL

www.zygote.com

www.zygotebody.com

Page 23: 2013 Korean tour Daegu

WebGL Implementation Anatomy

[Layer diagram, top to bottom]

Content (JavaScript, HTML, CSS, ...; JavaScript Middleware): Content downloaded from the Web. Middleware can make WebGL accessible to non-expert 3D programmers

Browser (HTML5, JavaScript, CSS): Browser provides WebGL functionality alongside other HTML5 technologies, no plug-in required

Drivers (OpenGL ES 2.0, OpenGL, DX9/Angle): OS Provided Drivers. WebGL on Windows can use Google Angle to create conformant OpenGL ES 2.0 over DX9

Page 24: 2013 Korean tour Daegu

WebGL Availability in Browsers

- Microsoft: “where you have IE11, you have WebGL – turned on by default and working all the time”

- Microsoft: WebGL also enabled for Windows applications (web app framework and web view)

- Apple: WebGL must be explicitly turned on in Mac Safari and is only exposed on iOS for iAds

- Chrome OS: WebGL is the only cross-platform API to program the GPU

- Google IO announcement: Chrome on Android will soon launch with WebGL

Much WebGL content uses three.js library:

http://threejs.org/

Page 25: 2013 Korean tour Daegu

Sectional Anatomy: MR Knee

• //sectional-anatomy.org/

Page 26: 2013 Korean tour Daegu

Sectional Anatomy: MR Knee

• //sectional-anatomy.org/

Page 27: 2013 Korean tour Daegu

Cross-OS Portability

[Diagram: development environments across operating systems]

SDK (Dalvik (Java), Objective C, C#, DirectX): Preferred development environments not designed for portability

C/C++: Native code is portable, but apps must cope with different available APIs and libraries

HTML/CSS: HTML5 provides cross platform portability. GPU accessibility through WebGL available soon on ~90% mobile systems

Page 28: 2013 Korean tour Daegu

WebGL First Wave Application Categories

• Maps and Navigation

• Modeling Tools and Repositories

• Games

• 3D Printing

• Visualization

• Music Videos and Promotion

• Education

• Photo Editors

• Music Visualizers

• Vision/Video Processing

Page 29: 2013 Korean tour Daegu

WebCL – Parallel Computing for the Web

• JavaScript bindings to OpenCL APIs

- Enables initiation of Kernels written in OpenCL C within the browser

http://www.youtube.com/user/SamsungSISA#p/a/u/1/9Ttux1A-Nuc

Page 30: 2013 Korean tour Daegu

3D Needs a Transmission Format!

• Compression and streaming of 3D assets becoming essential

- Mobile and connected devices need access to increasingly large asset databases

• 3D is the last media type to define a compressed format

- 3D is more complex – diverse asset types and use cases

• Needs to be royalty-free

- Avoid an ‘internet video codec war’ scenario

• Eventually enable hardware implementations of successful codecs

- High-performance and low power – but pragmatic adoption strategy is key

Audio: MP3. Video: H.264. Images: JPEG. 3D: ?

An effective and widely adopted codec ignites previously unimagined opportunities for a media type

Page 31: 2013 Korean tour Daegu

glTF Goals

• Binary file format for efficient transmission of 3D assets

- Reduce network bandwidth and minimize client processing overhead

• Run-time neutral - DO NOT IMPLY OR MANDATE ANY RUN-TIME BEHAVIOR

- Can be used by any app or run-time – usually WebGL accelerated

• Scalable to handle compression and streaming

- Though baseline format does not include compression

• ‘Direct load efficiency’ for WebGL

- Little or NO processing to drop glTF data into WebGL client

• Carry conditioned data from any authoring format

- Prototyping and optimizing efficient handling of COLLADA assets

A standards-based content pipeline for rich native and Web 3D applications

[Diagram: Authoring to Playback]
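
The ‘direct load efficiency’ goal can be illustrated with a small Python sketch in which a JSON description points into a binary buffer that is read without any parsing of the vertex data. The JSON layout here is hypothetical and is not the glTF schema; the field names `byteOffset`, `count` and `components` are assumptions made for this example only.

```python
import json
import struct

# Three vertex positions (x, y, z), packed as little-endian 32-bit floats.
positions = [0.0, 0.0, 0.0,
             1.0, 0.0, 0.0,
             0.0, 1.0, 0.0]
binary_blob = struct.pack("<9f", *positions)

# Hypothetical JSON description of where the positions live in the blob.
manifest_text = json.dumps({
    "buffer": {"byteLength": len(binary_blob)},
    "positions": {"byteOffset": 0, "count": 3, "components": 3},
})


def load_positions(manifest, blob):
    """Read the float array described by the manifest straight from the blob."""
    info = json.loads(manifest)["positions"]
    n_floats = info["count"] * info["components"]
    return list(struct.unpack_from(f"<{n_floats}f", blob, info["byteOffset"]))


loaded = load_positions(manifest_text, binary_blob)
print(len(binary_blob), loaded[3:6])  # 36 [1.0, 0.0, 0.0]
```

The binary part is already in the layout a GPU vertex buffer expects, so a client only has to read the small JSON description before handing the bytes on.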

Page 32: 2013 Korean tour Daegu

COLLADA and glTF Open Source Ecosystem

[Diagram: content pipeline from authoring tools to WebGL deployment]

Tool Interop: OpenCOLLADA Importer/Exporter and COLLADA Conformance Tests, on GitHub

COLLADA2GLTF Translator

Other authoring formats

Web-based Tools

Three.js glTF Importer. Rest3D initiative

Pervasive WebGL deployment

https://github.com/KhronosGroup/glTF

https://github.com/KhronosGroup/OpenCOLLADA

https://github.com/KhronosGroup/COLLADA-CTS

Page 33: 2013 Korean tour Daegu

WebGL as Test-bed for 3D Asset Compression

• Integrating and benchmarking 3D geometry compression formats with glTF

- Baseline is GZIP

- Open3DGC: implementation of the MPEG-SC3DMC (Scalable Complexity 3D Mesh Compression) codec

- webgl-loader is Google’s lightweight compression for WebGL content

Format and encoding: first model / second model (component sizes in parentheses)

COLLADA, XML: 336k / 56763k

COLLADA, gzip: 106k / 7378k

glTF+webgl-loader, raw: 54k (utf8:42k + JSON:12k) / 9500k (utf8:8747k + JSON:753k)

glTF+webgl-loader, gzip: 36k (utf8:34k + JSON:2k) / 1354k (utf8:1325k + JSON:29k)

glTF+Open3DGC ascii, raw: 40k (ascii:29k + JSON:11k) / 8380k (ascii:7793k + JSON:587k)

glTF+Open3DGC ascii, gzip: 21k (ascii:19k + JSON:2k) / 1462k (ascii:1433k + JSON:29k)

glTF+Open3DGC binary, raw: 29k (bin:18k + JSON:11k) / 3794k (bin:3205k + JSON:589k)

glTF+Open3DGC binary, raw bin + gzip JSON: 20k (bin:18k + JSON:2k) / 3234k (bin:3205k + JSON:29k)
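
The total sizes reported on this slide can be turned into compression ratios relative to uncompressed COLLADA XML. The sketch below uses only the totals from the slide (in kilobytes, for the two models shown); the dictionary keys are descriptive labels chosen for this example.

```python
# Total sizes in kilobytes for (first model, second model), from the slide.
sizes_k = {
    "COLLADA XML": (336, 56763),
    "COLLADA gzip": (106, 7378),
    "glTF+webgl-loader raw": (54, 9500),
    "glTF+webgl-loader gzip": (36, 1354),
    "glTF+Open3DGC ascii raw": (40, 8380),
    "glTF+Open3DGC ascii gzip": (21, 1462),
    "glTF+Open3DGC binary raw": (29, 3794),
    "glTF+Open3DGC binary, gzip JSON": (20, 3234),
}

baseline = sizes_k["COLLADA XML"]

# Ratio > 1 means the encoding is that many times smaller than COLLADA XML.
ratios = {
    name: tuple(base / size for base, size in zip(baseline, pair))
    for name, pair in sizes_k.items()
}

for name, (first, second) in ratios.items():
    print(f"{name}: {first:.1f}x / {second:.1f}x")
```

The ranking differs between the two models: for the larger one, glTF+webgl-loader with gzip gives the smallest total, while for the smaller one the Open3DGC variants do.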

Page 34: 2013 Korean tour Daegu

Compression Example Results Overview

• Early days – Khronos embarking on methodical analysis using glTF as test-bed

• For mobile - need to balance file size AND decompression processing

- Extensive processing can take more time/power than transmission

• OpenCTM is promising but LZMA is very processor intensive

- Work may lead to LZMA in hardware?
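
The trade-off between file size and decompression cost can be made concrete with a small calculation. Apart from the 1354k file size, which is one of the gzip totals reported in the deck, every number below is hypothetical (link speed, the smaller file size, and both decode times); the sketch shows only how a smaller file can still be slower to become usable.

```python
def time_to_usable(file_kb, link_kb_per_s, decode_s):
    """Seconds until an asset is usable: transmission time plus decode time."""
    return file_kb / link_kb_per_s + decode_s


LINK_KB_PER_S = 250.0  # hypothetical mobile link speed

# Lightly compressed asset: larger file, cheap to decode (decode time hypothetical).
light = time_to_usable(1354, LINK_KB_PER_S, decode_s=0.2)

# Heavily compressed asset: smaller file, expensive to decode (both numbers hypothetical).
heavy = time_to_usable(900, LINK_KB_PER_S, decode_s=3.0)

print(f"light: {light:.2f}s, heavy: {heavy:.2f}s")
```

With these assumed numbers the heavily compressed asset transmits faster but takes longer overall, which is the situation the slide warns about for mobile devices.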

Page 35: 2013 Korean tour Daegu

Texture Compression is Key

• Texture compression saves precious resources

- Network bandwidth, device memory space AND device memory bandwidth

• Developers need the same texture compression EVERYWHERE

- Otherwise portable apps, such as WebGL apps, need multiple copies of the same texture

[Chart: texture compression formats plotted by Pervasive Deployment and Quality over time]

DXTC/S3TC (Windows) and PVRTC (iOS): NOT Royalty-free. Platform Fragmentation

ETC1 (2008-2010): Mandated in Android Froyo (400M devices). Royalty-free BUT only optional in ES. Only 4bpp | 3 channel. No alpha support

ETC2 / EAC (2012-2013): MANDATED in OpenGL ES 3.0 and OpenGL 4.3. Royalty-free. Backward compatible with ETC1. ETC2: 4bpp | 3 channel. EAC: 4 (8) bpp | 1(2) channel. COMBINED: RGBA 8bpp | 4 channel. Does not have 1-2 bit compression WITH ALPHA

ASTC (2014->): OpenGL ES 3.0 and OpenGL 4.3 extensions -> Core once proven. Royalty-free. Best quality. Independent control of bit-rate and # channels. 1 to 4 channel. 1-8bpp in fine steps
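
The bits-per-pixel figures on this slide translate directly into memory and bandwidth savings. The sketch below computes the size of a single 1024×1024 texture level at the rates quoted for uncompressed RGB, ETC1/ETC2 RGB and combined ETC2 + EAC RGBA; the texture dimensions are an assumption, and mipmaps and padding are ignored.

```python
def texture_bytes(width, height, bits_per_pixel):
    """Storage for one texture level at a given bit rate (no mipmaps, no padding)."""
    return width * height * bits_per_pixel // 8


# Bit rates quoted on the slide; the 24bpp entry is plain 8-bit-per-channel RGB.
formats_bpp = {
    "uncompressed RGB": 24,
    "ETC1 / ETC2 RGB": 4,
    "ETC2 + EAC RGBA": 8,
}

sizes = {name: texture_bytes(1024, 1024, bpp) for name, bpp in formats_bpp.items()}
print(sizes)
# {'uncompressed RGB': 3145728, 'ETC1 / ETC2 RGB': 524288, 'ETC2 + EAC RGBA': 1048576}
```

At 4bpp the texture takes one sixth of the uncompressed RGB storage, which is where the savings in network bandwidth, memory space and memory bandwidth come from.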

Page 36: 2013 Korean tour Daegu

ASTC – Universal Texture Standard

• Adaptive Scalable Texture Compression (ASTC)

- Quality significantly exceeds S3TC or PVRTC at same bit rate

• Industry-leading orthogonal compression rate and format flexibility

- 1 to 4 color components: R / RG / RGB / RGBA

- Choice of bit rate: from 8bpp to <1bpp in fine steps

• ASTC is royalty-free and so is available to be universally adopted

- Shipping as OpenGL/OpenGL ES extension today for industry feedback

[Image comparison: Original at 24bpp versus ASTC Compression at 8bpp, 3.56bpp and 2bpp]
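
The bit rates shown for ASTC follow from its block structure. The sketch below assumes the fixed 128-bit block size and the 2D block footprints defined in the ASTC specification, which the slide itself does not state; under that assumption the bit rate is 128 divided by the number of texels in the block.

```python
ASTC_BLOCK_BITS = 128  # assumption from the ASTC specification: every block is 128 bits


def astc_bits_per_pixel(block_width, block_height):
    """Bit rate for a 2D ASTC block footprint of the given size."""
    return ASTC_BLOCK_BITS / (block_width * block_height)


for footprint in [(4, 4), (6, 6), (8, 8), (12, 12)]:
    bpp = astc_bits_per_pixel(*footprint)
    print(f"{footprint[0]}x{footprint[1]}: {bpp:.2f} bpp")
# 4x4: 8.00 bpp
# 6x6: 3.56 bpp
# 8x8: 2.00 bpp
# 12x12: 0.89 bpp
```

The 4x4, 6x6 and 8x8 footprints give the 8bpp, 3.56bpp and 2bpp rates shown in the image comparison, and 12x12 gives the sub-1bpp end of the range.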

Page 37: 2013 Korean tour Daegu

Conclusion

• Hardware acceleration is a complex application domain and needs multiple standards across diverse domains

• Advances in SOC silicon processing and associated APIs to access them are about to enable mobile devices to truly meet user expectations

• Now is a good time to get involved with the standards initiatives that affect your business

• These slides and more details at www.khronos.org