"update on khronos standards for vision and machine learning," a presentation from the...

39
© Copyright Khronos Group 2017 - Page 1 Update on Khronos Standards for Vision and Machine Learning December 2017 Neil Trevett | Khronos President NVIDIA | VP Developer Ecosystem [email protected] | @neilt3d |www.khronos.org

Upload: embedded-vision-alliance

Post on 21-Jan-2018

28 views

Category:

Technology


0 download

TRANSCRIPT

© Copyright Khronos Group 2017 - Page 1

Update on Khronos Standards for

Vision and Machine LearningDecember 2017

Neil Trevett | Khronos President

NVIDIA | VP Developer Ecosystem

[email protected] | @neilt3d |www.khronos.org

© Copyright Khronos Group 2017 - Page 2

Khronos Mission

Software

Silicon

Khronos is an International Industry Consortium of over 100 companies creating

royalty-free, Open Standard APIs to enable software to access hardware

acceleration for 3D Graphics, Virtual and Augmented Reality, Parallel Computing,

Neural Networks and Vision Processing

© Copyright Khronos Group 2017 - Page 3

Khronos Open Standards

Vision

sensor(s)

Tracking and

Positioning

Geometric scene

reconstruction

Semantic scene

understanding

(Neural Networks)

Generate Low Latency 3D

Content and Augmentations

Interact with sensor, haptic

and display devices

Download 3D

augmentation object

and scene data

VR/AR Application

© Copyright Khronos Group 2017 - Page 4

Embedded/Mobile

Vision/Inferencing HardwareEmbedded/Mobile

Vision/Inferencing HardwareEmbedded/Mobile

Vision/Inferencing Hardware

Neural Net Training

FrameworksNeural Net Training

FrameworksNeural Net Training

Frameworks

Vision and Neural Net

Inferencing Runtime

Embedded/Mobile

Vision/Inferencing Hardware

Standards for Vision and Neural Networks

Vision/AI

Applications

Neural Net Training

Frameworks

Desktop and

Cloud Hardware

Trained

Networks

© Copyright Khronos Group 2017 - Page 5

NNEF - Solving Neural Net Fragmentation

NN Authoring Framework 1

NN Authoring Framework 2

NN Authoring Framework 3

Inference Engine 1

Inference Engine 2

Inference Engine 3Every Tool Needs an Exporter to

Every Accelerator

With NNEF- NN Training and Inferencing Interoperability

Before NNEF – NN Training and Inferencing Fragmentation

NN Authoring Framework 1

NN Authoring Framework 2

NN Authoring Framework 3

Inference Engine 1

Inference Engine 2

Inference Engine 3

Optimization and processing tools

NNEF is a Cross-vendor Neural Net file formatEncapsulates network formal semantics, structure, data formats,

commonly-used operations (such as convolution, pooling, normalization, etc.)

© Copyright Khronos Group 2017 - Page 6

NNEF Status and Roadmap• NNEF V1.0 provisional release – targeting late December 2017

- Focus on passing trained frameworks to embedded inference engines

- Support ‘First cut’ range of network types

• NNEF Roadmap

- Track development of new network types

- Allow authoring and retraining (3rd party tools)

- Address a wider range of applications (outside vision apps)

- Increase the expressive power of the format Baseline

Importer

File

Format

Validator

Working Group Participants

Baseline

Exporter

Khronos

Open Source

Projects

© Copyright Khronos Group 2017 - Page 7

OpenVX Pow

er

Eff

icie

ncy

Computation Flexibility

Dedicated Hardware

GPUCompute

Multi-coreCPUX1

X10

X100

Vision DSPs

Wide range of vision hardware architectures

OpenVX provides a high-level Graph-based abstraction

->

Enables Graph-level optimizations!

Can be implemented on almost any hardware or processor!->

Portable, Efficient Vision Processing!

VisionNode

VisionNode

VisionNodeVision

Node

Vision Processing Graph

GPU

Vision Engines

Middleware

Applications

DSP

Hardware

Software Portability

© Copyright Khronos Group 2017 - Page 8

Simple Edge Detector in OpenVXvx_image input = vxCreateImage(1920, 1080);

vx_image output = vxCreateImage(0, 0);

vx_image horiz = vxCreateVirtualImage();

vx_image vert = vxCreateVirtualImage();

vx_image mag = vxCreateVirtualImage();

vx_graph g = vxCreateGraph();

vxSobel3x3Node(g, input, horiz, vert);

vxMagnitudeNode(g, horiz, vert, mag);

vxThresholdNode(g, mag, THRESH, output);

status = vxVerifyGraph(g);

status = vxProcessGraph(g);

m

hi

v

oS M T

Compile the Graph

Execute the Graph

Declare Input and Output Images

Declare Intermediate Images

Construct the Graph topology

© Copyright Khronos Group 2017 - Page 9

OpenVX Evolution

OpenVX 1.0 Spec released October 2014

Conformant

Implementations

OpenVX 1.1 Spec released May 2016

Conformant

Implementations

AMD OpenVX Tools- Open source, highly optimized

for x86 CPU and OpenCL for GPU

- “Graph Optimizer”

- Scripting for rapid prototyping,

without re-compiling, at

production performance levelshttp://gpuopen.com/compute-product/amd-openvx/

New FunctionalityExpanded Nodes Functionality

Enhanced Graph Framework

OpenVX 1.2 Spec released May 2017

Adopters Program November 2017

New FunctionalityConditional node execution

Feature detection

Classification operators

Expanded imaging operations

ExtensionsNeural Network Acceleration

Graph Save and Restore

16-bit image operation

Safety CriticalOpenVX 1.1 SC for

safety-certifiable systems

OpenVX Roadmap under Discussion

NNEF Import

Programmable user

kernels with

accelerator offload

© Copyright Khronos Group 2017 - Page 10

New OpenVX 1.2 Functions• Feature detection: find features useful for object detection and recognition

- Histogram of gradients – HOG Template matching

- Local binary patterns – LBP Line finding

• Classification: detect and recognize objects in an image based on a set of features

- Import a classifier model trained offline

- Classify objects based on a set of input features

• Image Processing: transform an image

- Generalized nonlinear filter: Dilate, erode, median with arbitrary kernel shapes

- Non maximum suppression: Find local maximum values in an image

- Edge-preserving noise reduction

• Conditional execution & node predication

- Selectively execute portions of a graph based on a true/false predicate

• Many, many minor improvements

• New Extensions

- Import/export: compile a graph; save and run later

- 16-bit support: signed 16-bit image data

- Neural networks: Layers are represented as OpenVX nodes

B C

S

ACondition

If A then S ← B else S ← C

© Copyright Khronos Group 2017 - Page 11© Copyright Khronos Group 2017 - Page 11

OpenVX 1.2 and Neural Net Extension• Convolution Neural Network topologies can be represented as OpenVX graphs

- Layers are represented as OpenVX nodes

- Layers connected by multi-dimensional tensors objects

- Layer types include convolution, activation, pooling, fully-connected, soft-max

- CNN nodes can be mixed with traditional vision nodes

• Import/Export Extension

- Efficient handling of network Weights/Biases or complete networks

• OpenVX will be able to import NNEF files into OpenVX Neural Nets

VisionNode

VisionNode

VisionNode

Downstream

Application

Processing

Native

Camera

Control CNN Nodes

An OpenVX graph mixing CNN nodes

with traditional vision nodes

© Copyright Khronos Group 2017 - Page 12© Copyright Khronos Group 2017 - Page 12

OpenVX SC - Safety Critical Vision Processing• OpenVX 1.1 - based on OpenVX 1.1 main specification

- Enhanced determinism

- Specification identifies and numbers requirements

• MISRA C clean per KlocWorks v10

• Divides functionality into “development” and “deployment” feature sets

- Adds requirement to support import/export extension

OpenVX SC

Development Feature

Set (Create Graph)

OpenVX SC

Deployment Feature Set

(Execute Graph)

Binary

format

Verify

Export

Import

Entire graph creation API No graph creation APIImplementation

dependent format

© Copyright Khronos Group 2017 - Page 13

Safety Critical APIs

New Generation APIs for safety

certifiable vision, graphics and

computee.g. ISO 26262 and DO-178B/C

OpenGL ES 1.0 - 2003Fixed function graphics

OpenGL ES 2.0 - 2007Shader programmable pipeline

OpenGL SC 1.0 - 2005Fixed function graphics subset

OpenGL SC 2.0 - April 2016Shader programmable pipeline subset

Experience and Guidelines

OpenCL SC TSG FormedWorking on OpenCL SC 1.2

Eliminate Undefined Behavior

Eliminate Callback Functions

Static Pool of Event Objects

OpenVX SC 1.1 Released May 2017Restricted “deployment” implementation

executes on the target hardware by reading the

binary format and executing the pre-compiled

graphs

Khronos SCAP ‘Safety Critical Advisory Panel’

Guidelines for designing APIs that ease

system certification.

Open to Khronos member AND

industry experts

© Copyright Khronos Group 2017 - Page 14

Embedded/Mobile

Vision/Inferencing HardwareEmbedded/Mobile

Vision/Inferencing HardwareEmbedded/Mobile

Vision/Inferencing Hardware

Neural Net Training

FrameworksNeural Net Training

FrameworksNeural Net Training

Frameworks

Vision and Neural Net

Inferencing Runtime

Embedded/Mobile

Vision/Inferencing Hardware

Standards for Vision and Neural Networks

Vision/AI

Applications

Neural Net Training

Frameworks

Desktop and

Cloud Hardware

Trained

Networks

© Copyright Khronos Group 2017 - Page 15

OpenCL – Low-level Parallel Programing• Low-level, explicit programming of heterogeneous parallel compute resources

- One code tree can be executed on CPUs, GPUs, DSPs and FPGA …

• OpenCL C or C++ language to write kernel programs to execute on any compute device

- Platform Layer API - to query, select and initialize compute devices

- Runtime API - to build and execute kernels programs on multiple devices

• The programmer gets to control:

- What programs execute on what device

- Where data is stored in various speed and size memories in the system

- When programs are run, and what operations are dependent on earlier operations

OpenCL

Kernel

Code

OpenCL

Kernel

Code

OpenCL

Kernel

Code

OpenCL

Kernel

Code

GPU

DSPCPU

CPUFPGA

Kernel code

compiled for

devicesDevices

CPU

Host

Runtime API

loads and executes

kernels across devices

© Copyright Khronos Group 2017 - Page 16

OpenCL 2.2 Released in May 2017

2011

OpenCL 1.2

Becomes industry

baseline for heterogeneous

parallel computing

OpenCL 2.1SPIR-V 1.0

SPIR-V in CoreKernel Language

Flexibility

OpenCL 2.2SPIR-V 1.2

OpenCL C++ Kernel LanguageStatic subset of C++14

Templates and Lambdas

SPIR-V 1.2 OpenCL C++ support

PipesEfficient device-scope

communication between kernels

Code Generation Optimizations - Specialization constants at

SPIR-V compilation time- Constructors and destructors of

program scope global objects- User callbacks can be set at

program release time

201720152013

OpenCL 2.0

Enables new class of hardware

SVMGeneric AddressesOn-device dispatch

https://www.khronos.org/opencl/

© Copyright Khronos Group 2017 - Page 17

OpenCL Conformant Implementations

OpenCL 1.0Specification

Dec08 Jun10OpenCL 1.1

Specification

Nov11OpenCL 1.2

SpecificationOpenCL 2.0

Specification

Nov13

1.0|Jul13

1.0|Aug09

1.0|May09

1.0|May10

1.0 | Feb11

1.0|May09

1.0|Jan10

1.1|Aug10

1.1|Jul11

1.2|May12

1.2|Jun12

1.1|Feb11

1.1|Mar11

1.1|Jun10

1.1|Aug12

1.1|Nov12

1.1|May13

1.1|Apr12

1.2|Apr14

1.2|Sep13

1.2|Dec12

Desktop

Mobile

FPGA

2.0|Jul14

OpenCL 2.1 Specification

Nov15

1.2|May15

2.0|Dec14

1.0|Dec14

1.2|Dec14

1.2|Sep14

Vendor timelines are

first conformant

submission for each spec

generation

1.2|May15

Embedded

1.2|Aug15

1.2|Mar16

2.0|Nov15

2.0|Apr17

2.1 | Jun16

© Copyright Khronos Group 2017 - Page 18

OpenCL as Language/Library Backend

C++ based

Neural

network

framework

MulticoreWare

open source

project on

Bitbucket

Compiler

directives for

Fortran,

C and C++

Java language

extensions

for

parallelism

Language for

image

processing and

computational

photography

Single

Source C++

Programming

for OpenCL

Hundreds of languages, frameworks

and projects using OpenCL to access

vendor-optimized, heterogeneous

compute runtimes

Vision

processing

open source

project

Open source

software library

for machine

learning

Over 4,000 GitHub repositories

using OpenCL: tools, applications,

libraries, languages - up from

2,000 two years ago

© Copyright Khronos Group 2017 - Page 19

SYCL Ecosystem• Single-source heterogeneous programming using STANDARD C++

- Use C++ templates and lambda functions for host & device code

- Layered over OpenCL

• Fast and powerful path for bring C++ apps and libraries to OpenCL

- C++ Kernel Fusion - better performance on complex software than hand-coding

- Halide, Eigen, Boost.Compute, SYCLBLAS, SYCL Eigen, SYCL TensorFlow, SYCL GTX

- triSYCL, ComputeCpp, VisionCpp, ComputeCpp SDK …

• More information at http://sycl.tech

© Copyright Khronos Group 2017 - Page 20

SYCL 1.2.1 Publicly Released Today!• Major update representing two and a half years of work by Khronos members

- Incorporates significant experience gained from three separate implementations

- Feedback from developers of machine learning frameworks such as TensorFlow

- TensorFlow now supports SYCL alongside the original CUDA accelerator back-end

- Layers over OpenCL 1.2

“This is a significant release update for SYCL with an enhanced

ecosystem that matches our intention to support machine learning

and align with modern C++17. SYCL continues to help us to drive

the C++ standard towards heterogeneous support. Our intention is

to move forward quickly with the SYCL roadmap to bring even more

emphasis to machine learning and Safety Critical support, as well as

continued alignment with future ISO C++,”

Michael Wong, SYCL working group chair

C++ Kernel LanguageLow Level Control

‘GPGPU’-style separation of

device-side kernel source

code and host code

Single-source C++Programmer FamiliarityApproach also taken by

C++ AMP and OpenMP

Developer ChoiceThe development of the two specifications are aligned so

code can be easily shared between the two approaches

© Copyright Khronos Group 2017 - Page 21

OpenCL Roadmap

2011

OpenCL 1.2OpenCL C Kernel

Language

OpenCL 2.1 SPIR-V in Core

2015

SYCL 1.2C++11 Single source

programming

OpenCL 2.2 C++ Kernel Language

2017

SYCL 2.2C++14 Single source

programming

Industry working to bring

Heterogeneous compute to

standard ISO C++C++17 Parallel STL hosted by Khronos

Executors – for scheduling work

“Managed pointers” or “channels” –for sharing data

OpenCL for DSPs- Embedded imaging, vision and inferencing

- Flexible reduced precision - Conformance without IEEE 32 Floating Point

- Explicit DMA

Single source C++ programming.

Great for supporting C++ apps,

libraries and frameworks

Help bring OpenCL-

class compute to Vulkan

for GPUs

© Copyright Khronos Group 2017 - Page 22

Embedded/Mobile

Vision/Inferencing HardwareEmbedded/Mobile

Vision/Inferencing HardwareEmbedded/Mobile

Vision/Inferencing Hardware

Neural Net Training

FrameworksNeural Net Training

FrameworksNeural Net Training

Frameworks

Vision and Neural Net

Inferencing Runtime

Embedded/Mobile

Vision/Inferencing Hardware

Standards for Vision and Neural Networks

Vision/AI

Applications

Neural Net Training

Frameworks

Desktop and

Cloud Hardware

Trained

Networks

© Copyright Khronos Group 2017 - Page 23

Pervasive VulkanAll Major GPU Companies shipping Vulkan Drivers for Desktop and Mobile Platforms

Embedded LinuxAndroid TVAndroid 7.0+ Nintendo Switch

http://vulkan.gpuinfo.org/

Oculus RiftSteamVR

VR Platforms

Desktop, Mobile, Embedded and Console Platforms Supporting VulkanIncluding phones and tablets from Google, Huawei, Samsung, Sony, Xiaomi - both premium and mid-range devices

GearVR Google Daydream

Game Engines

Desktop

© Copyright Khronos Group 2017 - Page 24

Market Demand for Universal 3D Portability

Games

Engines

Native 3D

AppsBrowser

Engines Games

Engines

Native 3D

Apps

Browser

Engines

Vulkan Universally

Portable Subset

JavaScript and

WebAssembly Native bindings

for ‘nexgen WebGL’

Community Outreach at GDC 2017

Create a hybrid Portability API?

Feedback - AVOID CREATING A FOURTH API!!!

Would need new specification, CTS, Documentation.

Additional developer learning curve.

A whole new specification to name, brand, promote.

Would INCREASE industry fragmentation

Tools Layers

API Libraries

Shader Translators

MAP Vulkan to Metal

and DX12

© Copyright Khronos Group 2017 - Page 25

Vulkan Portability TSG Process

API

Overlap

Analysis

Metal Shading

Language

HLSL

Vulkan Portability Deliverables1. Vulkan Subset Diff Spec

2. Vulkan Subset Development Layer

3. Vulkan Subset API Library over DX12/Metal

4. SPIRV-Cross Translator

5. Vulkan Subset Conformance Tests

Expand/test existing

open source SPIRV-Cross Tool

Possible proposals for Vulkan extensions for

enhanced portability (and possibly Web

robustness) sent to Vulkan WG

Identify Vulkan

features not directly mappable

to DX12 and Metal

New Vulkan functionality may affect the

overlap analysis

Layers, APIs, Translators and Tests all to be

developed and released in open source

Open source project with similar goals https://github.com/gfx-rs/gfx

Vulkan on iOS and macOShttps://moltengl.com/moltenvk/

© Copyright Khronos Group 2017 - Page 26

Vulkan Evolution

Feb16

Vulkan 1.0

Vulkan ExtensionsMaintenance updates plus additional functionality

Explicit Building Blocks for VR

Explicit Building Blocks for Homogeneous Multi-GPUEnhanced Windows System Integration

Increased Shader Language Flexibility

Enhanced Cross-Process and Cross-API Sharing

Vulkan Roadmap

Regular Vulkan Core ReleasesIntegrates proven KHR extensions

Roadmap DiscussionsPushes forward the envelope of GPU Acceleration:

Enhanced Compute and Language FlexibilityOptimized Vision and Inferencing Acceleration

Ray Tracing

Widened Platform SupportThrough Vulkan Portability Initiative

Including Vulkan on macOS and iOS

Strengthened Ecosystem and SDKEnhanced developer and debugging tools

Regression testing for SDK stability

Enhanced Conformance Testing (API now has 198K test cases

- up from 107K last year)

Compiler robustness - including HLSL support

Explicit Access to GPU

Acceleration

© Copyright Khronos Group 2017 - Page 27

Clspv OpenCL C to Vulkan Compiler• Experimental collaboration between Google, Codeplay, and Adobe

- Successfully tested on over 200K lines of Adobe OpenCL C production code

- Released in open source https://github.com/google/clspv

- Tracks top-of-tree LLVM and clang, not a fork

• Uses new Vulkan extensions to support OpenCL C compute operations

- VK_KHR_16bit_storage/SPV_KHR_16bit_storage

- VK_KHR_variable_pointers/SPV_KHR_variable_pointers

• Compiles OpenCL C’s programming model to Vulkan’s SPIR-V execution environment

- Proof-of-concept that OpenCL compute can be brought seamlessly to Vulkan

Vulkan

Universal

Portability

OpenCL-class

compute in

Vulkan

Universally

Portable

Graphics and

Advanced

Compute

© Copyright Khronos Group 2017 - Page 28

Dedicated Vision

Hardware

Layered Vision/ Neural Net Ecosystem

Programmable Vision

Processors

Application

Implementers may use OpenCL or Vulkan to implement

OpenVX nodes on programmable processors

And then developers can use OpenVX to enable a

developer to easily connect those nodes into a graph

The OpenVX graph abstraction enables implementers to optimize execution

across diverse hardware architectures for optimal power and performance

OpenVX enables the graph to be extended to include hardware

architectures that don’t support programmable APIs

Vulkan RoadmapEnhanced compute – vision and

inferencing on any platforms with

GPU acceleration

OpenCL RoadmapFlexible precision for

widespread deployment on low-

cost embedded processors

© Copyright Khronos Group 2017 - Page 29

Khronos Cooperative Framework

Royalty-Free

Specifications

Adopters

Programs

Implementations, Tools

Documentation, Samples

Adopters Build conformant

implementation and products

DevelopersDevelop applications

using the APIs

Educators / CertifiersCreate Courses

Training and Certification

Educator Guidelines

Courseware Materials

API Working Groups (Industry, Academic and

Associate members)

$

$

Khronos BoardStrategy, budget and

oversight$$

One vote per industry member

Conformance

Tests

Open and

royalty-free.

Often open sourced

Advisory PanelsBy invitation, no fee.

Provide requirements

and draft spec feedback

Industry and

CommunityFeedback and

contributions on

released materials

Membership FeesAcademic $1,000

Small companies $175/employee

(min $3,500)

Non-profits $7,500

Larger companies - $18,000

© Copyright Khronos Group 2017 - Page 30

New Open Source Engagement Model• Khronos is open sourcing specification sources,

conformance tests, tools

- Merge requests welcome from the community

(subject to review by OpenCL working group)

• Deeper Community Enablement

- Mix your own documentation!

- Contribute and fix conformance tests

- Fix the specification, headers, ICD etc.

- Contribute new features (carefully)

Khronos builds and Ratifies

Canonical Specification under

Khronos IP Framework.

No changes or re-hosting allowed

Spec and Ref Language Source and

derivative materials.

Re-mixable under CC-BY by the

industry and community

Source Materials for Specifications and Reference

Documentation CONTRIBUTED Under Khronos IP Framework

(you won’t assert patents against conformant implementations,

and license copyright for Khronos use)

Contributions and

Distribution under

Apache 2.0 Spec Build

System and

Scripts

Spec and

Ref Language

SourceRedistribution

under CC-BY 4.0

Conformance

Test Suite

Source

Community built

documentation and

tools

Contributions and

Distribution under

Apache 2.0

Anyone can test any

implementation at

any time

Khronos Adopters

Program

Conformant Implementations can

use trademark and are covered

by Khronos IP Framework

© Copyright Khronos Group 2017 - Page 31

Khronos Advisory Panels

Working

Group

Advisory

Panel

Shared Email

list and

Repository

Khronos MembersAny company can join

Membership Fee

Sign NDA and IP Framework

Khronos AdvisorsInvited industry experts

$0 Cost

Sign NDA and IP Framework

Specification drafts and

invitations for

requirements and feedback

Requirements and

feedback on

specification drafts

Hosted by Khronos.

Under Khronos NDA

Advisory Panels Active for Vulkan, OpenCL/SYCL and NNEF

© Copyright Khronos Group 2017 - Page 32© Copyright Khronos Group 2017 - Page 32

Need for Camera Control API?• Advanced control of ISP and camera subsystem – with cross-platform portability

- Generate sophisticated image stream for advanced imaging & vision apps

• No platform API currently fulfills all developer requirements

- Portable access to growing sensor diversity: e.g. depth sensors and sensor arrays

- Cross sensor synch: e.g. synch of camera and MEMS sensors

- Advanced, high-frequency per-frame burst control of camera/sensor: e.g. ROI

- Multiple input, output re-circulating streams with RAW, Bayer or YUV Processing

Image Signal

Processor (ISP)

Defines control of Sensor, Color Filter Array

Lens, Flash, Focus, Aperture

Auto Exposure (AE)

Auto White Balance (AWB)

Auto Focus (AF) Stream of Images for

Vision Processing

OpenKCAM standard is

currently on ice – do we

need to restart?

© Copyright Khronos Group 2017 - Page 33© Copyright Khronos Group 2017 - Page 33

Please Get Involved!• Khronos is creating cutting-edge royalty-free open standards

- For embedded vision and neural network acceleration

• Khronos standards are key to many emerging markets such as

3D rendering and advanced gaming, Augmented and Virtual Reality,

Embedded Vision and machine learning

- Advanced next generation capabilities for ALL platforms

• Khronos encourages your participation

- www.khronos.org

- [email protected]

- @neilt3d

© Copyright Khronos Group 2017 - Page 34© Copyright Khronos Group 2017 - Page 34

OpenVX - Graph-Level Abstraction • OpenVX developers express a graph of image operations (‘Nodes’)

- Using a C API

• Nodes can be executed on any hardware or processor coded in any language

- Implementers can optimize under the high-level graph abstraction

• Graphs are the key to run-time power and performance optimizations

- E.g. Node fusion, tiled graph processing for cache efficiency etc.

Array of

Keypoints

YUV

Frame

Gray

Frame

Camera

Input

Rendering

Output

Pyrt

Color Conversion

Channel Extract

Optical Flow

Harris Track

Image Pyramid

RGB

Frame

Array of

FeaturesFtrt-1OpenVX Graph

OpenVX Nodes

Feature Extraction Example Graph

© Copyright Khronos Group 2017 - Page 35

OpenVX Efficiency through Graphs..

Reuse

pre-allocated

memory for

multiple

intermediate data

Memory

Management

Less allocation overhead,

more memory for

other applications

Replace a sub-

graph with a

single faster node

Kernel

Fusion

Better memory

locality, less kernel

launch overhead

Split the graph

execution across

the whole

system: CPU /

GPU / dedicated

HW

Graph

Scheduling

Faster execution

or lower power

consumption

Execute a sub-

graph at tile

granularity

instead of image

granularity

Data

Tiling

Better use of

data cache and

local memory

© Copyright Khronos Group 2017 - Page 36

SPIR-V Ecosystem

LLVM

Third party kernel and

shader languages

SPIR-V• Khronos defined cross-API IR

• Native graphics and parallel compute

• Easily parsed/extended 32-bit stream

• Data object/control flow retained for

effective code generation/translation

OpenCL C++Front-end

OpenCL CFront-end

glslangKhronos open source

tools and translators

IHV Driver

Runtimes

Additional

Intermediate Forms

SPIR-V Validator

SPIR-V (Dis)Assembler

LLVM to SPIR-V

Bi-directional

Translator

https://github.com/KhronosGroup/SPIRV-Tools

GLSL HLSL

Khronos liaising with

Clang/LLVM CommunityE.g. discussing SPIR-V as

supported Clang target

SPIRV-CrossGLSL

HLSL

MSL

SPIRV-opt | SPIRV-remap

SPIR-V Optimizations• Inlining (exhaustive)

• Store/Load Elimination

• Dead Code Elimination

• Dead Branch Elimination

• Common Uniform Elimination

Coming• Loop Unrolling and Constant Folding

• Common Subexpression Elimination

© Copyright Khronos Group 2017 - Page 37

Vulkan and New Generation 3D APIs

Only AppleOnly Windows 10

Cross Platform

Clean, modern architecture | Low overhead, explicit GPU access

Portable across desktop and mobile | Multi-thread / multi-core friendly

Efficient, low-latency, predictable performance

© Copyright Khronos Group 2017 - Page 38

Vulkan Explicit GPU Control

GPU

High-level Driver

AbstractionLayered GPU Control

Context management

Memory allocation

Full GLSL compiler

Error detection

ApplicationSingle thread per context

GPU

Thin DriverExplicit GPU Control

ApplicationMemory allocation

Thread management

Synchronization

Multi-threaded generation

of command buffers

Multiple Front-end

CompilersGLSL, HLSL etc.

Loadable debug and

validation layers

Vulkan 1.0 provides access to

OpenGL ES 3.1 / OpenGL 4.X-class GPU functionality

but with increased performance and flexibility

SPIR-V pre-compiled shaders

Complex drivers

cause overhead

and inconsistent

behavior across

vendors

Always active

error handling

Full GLSL

preprocessor and

compiler in

driver

OpenGL vs.

OpenGL ES

Resource management

offloaded to app:

low-overhead, low-latency

driver

Consistent behavior:

no ‘fighting with driver

heuristics’

Validation and debug layers

loaded only when needed

SPIR-V intermediate

language: shading language

flexibility

Multi-threaded command

creation. Multiple graphics,

command and DMA queues

Unified API across all

platforms with feature set

flexibility

Vulkan = high performance and

low latency 3D and GPU Compute.

Ideal for VR/AR applications

© Copyright Khronos Group 2017 - Page 39

OpenCL Ecosystem

Hardware ImplementersDesktop/Mobile/Embedded/FPGA

100s of applications using

OpenCL accelerationRendering, visualization, video editing,

simulation, image processing, vision and

neural network inferencing

Single Source C++ Programming

Portable Kernel Intermediate Language

Core API and Language Specs

OpenCL 2.2 – Top to Bottom C++