accelerating mobile augmented reality - nvidia · 2012-12-10 · accelerating mobile augmented...

19
1 © 2012 NVIDIA - Page 1 Accelerating Mobile Augmented Reality Neil Trevett VP Mobile Content, NVIDIA President, Khronos Group © 2012 NVIDIA - Page 2 Topics Why mobile devices will be ideal for Augmented Reality The challenges for compelling mobile Augmented Reality Trends in mobile SOCs acceleration and associated APIs Emerging trends and directions in the future of mobile AR How to start developing mobile AR on Tegra devices

Upload: others

Post on 20-May-2020

12 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Accelerating Mobile Augmented Reality - NVIDIA · 2012-12-10 · Accelerating Mobile Augmented Reality Neil Trevett VP Mobile Content, NVIDIA President, ... Why mobile devices will

1

© 2012 NVIDIA - Page 1

Accelerating Mobile Augmented RealityNeil TrevettVP Mobile Content, NVIDIAPresident, Khronos Group

© 2012 NVIDIA - Page 2

Topics

Why mobile devices will be ideal for Augmented Reality

The challenges for compelling mobile Augmented Reality

Trends in mobile SOCs acceleration and associated APIs

Emerging trends and directions in the future of mobile AR

How to start developing mobile AR on Tegra devices

Page 2: Accelerating Mobile Augmented Reality - NVIDIA · 2012-12-10 · Accelerating Mobile Augmented Reality Neil Trevett VP Mobile Content, NVIDIA President, ... Why mobile devices will

2

© 2012 NVIDIA - Page 3

0

500

1000

1500

2000

Million Units

Computing Device TAMSmartphones

Tablets

Netbooks

Notebooks

Desktops

Personal Computing Redefined

Source: IDC, Gartner, Morgan Stanley

© 2012 NVIDIA - Page 4

Global Smartphone Market Share

Windows Mobile/Phone

Page 3: Accelerating Mobile Augmented Reality - NVIDIA · 2012-12-10 · Accelerating Mobile Augmented Reality Neil Trevett VP Mobile Content, NVIDIA President, ... Why mobile devices will

3

© 2012 NVIDIA - Page 5

Vision-based Augmented Reality

Camera video stream sent to the compositor

3D AugmentationRendering

3D augmentationscomposited with video stream

CameraTracking

GPS, accelerometer,

gyro and camera images used to track the

camera’s location and orientation

Camera-to-scene transform locks the 3D rendering to the real world

© 2012 NVIDIA - Page 6

AR = Input AND Output Processing

Augmented Reality is uniquely challenging

Needs sophisticated INPUT AND OUTPUT processing

In perfect harmony

Generating 3D Graphical

Augmentations

Sensing and tracking the context and the scene around the user

Detecting user gestures and actions

Compositing real and synthetic elements to delight and inform

Augmented RealityApplication

Page 4: Accelerating Mobile Augmented Reality - NVIDIA · 2012-12-10 · Accelerating Mobile Augmented Reality Neil Trevett VP Mobile Content, NVIDIA President, ... Why mobile devices will

4

© 2012 NVIDIA - Page 7

How Many Sensors are in a Smartphone?

LightProximity2 cameras3 microphones (ultrasound)TouchPosition

GPSWiFi (fingerprint)Cellular (tri-lateration)NFC, Bluetooth (beacons)

AccelerometerMagnetometerGyroscopePressureTemperatureHumidity

7

19

!

© 2012 NVIDIA - Page 8

The Vision of Wearable Computing - Tomorrow

Google Glass concept videoLooks Great! BUT…

… Google Glass hardware does not provide full FOV augmentations

… Not using vision tracking,just GPS

But wearable computers will be essential to consumer adoption of Augmented Reality..

Courtesy Googlehttp://www.youtube.com/watch?v=9c6W4CCU9M4

Page 5: Accelerating Mobile Augmented Reality - NVIDIA · 2012-12-10 · Accelerating Mobile Augmented Reality Neil Trevett VP Mobile Content, NVIDIA President, ... Why mobile devices will

5

© 2012 NVIDIA - Page 9

State-of-the-art Mobile Vision-Based AR - Today

In the meantime a lot of development is happening with todays devices..

Combining GPS, accelerometer, gyro and camera tracking

Lots of current research into best feature tracking algorithms – all of them very compute intensive

Beachhead industrial applications could use todays devices…

Courtesy Metaiohttp://www.youtube.com/watch?v=xw3M-TNOo44&feature=related

© 2012 NVIDIA - Page 10

Accelerating AR to Meet User Expectations

Ingredients for great mobile AR…

Cameras with advanced controls

Lots of secondary sensors – GPS, sensor, gyro etc.

High performance image and vision processing

Fast 3D processing

Mobile is an enabling platform for Augmented Reality

Mobile SOC and sensor capabilities are expanding quickly

But we need mobile AR to be 60Hz buttery smooth AND low power

Power is now the main challenge to increasing quality of the AR experience

SOC =‘System On Chip’

Complete compute system minus memory and some peripherals

Page 6: Accelerating Mobile Augmented Reality - NVIDIA · 2012-12-10 · Accelerating Mobile Augmented Reality Neil Trevett VP Mobile Content, NVIDIA President, ... Why mobile devices will

6

© 2012 NVIDIA - Page 11

The World’s First Mobile Quad Core,

with 5th Companion Core for Low PowerTegra 3

CPUCPUQuad Core, with 5Quad Core, with 5thth Companion CoreCompanion Core

—— Up to 1.4GHz Single Core, 1.3GHz Quad CoreUp to 1.4GHz Single Core, 1.3GHz Quad Core

GPUGPUUp to 3x Higher GPU PerformanceUp to 3x Higher GPU Performance

—— 12 Core12 Core GeForce GPUGeForce GPU

VIDEOVIDEOBluBlu--Ray Quality VideoRay Quality Video

—— 1080p1080p High Profile @ 40MbpsHigh Profile @ 40Mbps

AUDIOAUDIO HD AudioHD Audio, 7.1 channel surround, 7.1 channel surround

© 2012 NVIDIA - Page 12

Mobile SOC Performance Increases

1

100

CPU/G

PU A

GGREGATE P

ERFORMANCE

2012 2014

WAYNE

2013

2010

2011

TEGRA 2

Tegra 3 (KAL-EL)5x

LOGAN

10

Core 2 Duo

STARK

10x

50x

75x

Core i5

HTC One X+

Google Nexus 7

25x perf increase in next three

years

Page 7: Accelerating Mobile Augmented Reality - NVIDIA · 2012-12-10 · Accelerating Mobile Augmented Reality Neil Trevett VP Mobile Content, NVIDIA President, ... Why mobile devices will

7

© 2012 NVIDIA - Page 13

Power is the New Design Limit

The Process Fairy keeps bringing more transistors

Transistors are getting cheaper

The End of Voltage Scaling

The Process Fairy isn’t helping as much on power as in the past

In the Good Old DaysLeakage was not important, and voltage

scaled with feature size

L’ = L/2V’ = V/2E’ = CV2 = E/8f’ = 2fD’ = 1/L2 = 4DP’ = P

Halve L and get 4x the transistors and 8x the capability for the same power

The New RealityLeakage has limited threshold voltage,

largely ending voltage scaling

L’ = L/2V’ = ~VE’ = CV2 = E/2f’ = ~2fD’ = 1/L2 = 4DP’ = 4P

Halve L and get 4x the transistors and 8x the capability for

4x the power!!

© 2012 NVIDIA - Page 14

Mobile Thermal Design Point

2-4W4-7W

6-10W30-90W

4-5” Screen takes 250-500mW

4-5” Screen takes 250-500mW

7” Screen takes 1W7” Screen takes 1W

10” Screen takes 1-2WResolution makes a difference! The iPad3 screen takes up to

8W

10” Screen takes 1-2WResolution makes a difference! The iPad3 screen takes up to

8W

Leakage current in latest silicon geometries means that Moore’s Law no longer delivers more performance without increasing power

Typical max system power levels before thermal failureEven as battery technology improves - these thermal limits remain

Page 8: Accelerating Mobile Augmented Reality - NVIDIA · 2012-12-10 · Accelerating Mobile Augmented Reality Neil Trevett VP Mobile Content, NVIDIA President, ... Why mobile devices will

8

© 2012 NVIDIA - Page 15

How to Save Power?

Much more expensive toMOVE data than COMPUTE data

Energy efficiency must now be key metric during silicon AND software design

Awareness of where data lives, where

computation happens, how is it scheduled

Need to use hardware acceleration

Lots of processing in parallel

Efficient caching and memory usage

Reduces data movement

32-bit Integer Add1pJ

32-bit Float Operation7pJ

32-bit Register Write0.5pJ

Send 32-bits 2mm24pJ

Send 32-bits Off-chip50pJ

For 40nm,1V process

Write 32-bits to Memory600pJ

© 2012 NVIDIA - Page 16

Example - Typical Camera ISP

Camera ISP (Image Signal Processor) has little or no programmability

Scan-line-based, Data flows through compact hardware pipe

No global memory used to minimize power

BUT… computational photography apps now want to mix non-programmable ISP processing with more flexible GPU or CPU processing

ISP pipelines will provide tap/insertion points to/from CPU/GPU at critical points

~760 math Ops~42K vals = 670Kb

~250Gops @ 300MHz

~760 math Ops~42K vals = 670Kb

~250Gops @ 300MHz

Page 9: Accelerating Mobile Augmented Reality - NVIDIA · 2012-12-10 · Accelerating Mobile Augmented Reality Neil Trevett VP Mobile Content, NVIDIA President, ... Why mobile devices will

9

© 2012 NVIDIA - Page 17

Engineering Software Apps for Power

Use Dark Silicon - specialized hardware only turned on when needed

SOCs lots of space for transistors – just can’t turn them all on at same time!

Dedicated units increase locality and parallelism of computation to save power compared to programmable processors

Be power smart when using programmable processors

Instrumentation will soon appear for energy-aware compilers and profilers

Dynamic and feedback-driven software power optimization

Power optimizing compiler back-end compilers / installers

Smart, holistic use of sensors and peripherals

Wireless modems and networks

Motion sensors, cameras, networking, GPS

© 2012 NVIDIA - Page 18

Programmers View of Typical SOC c. 2012

CPU Complex1-5 Cortex A9/A15 Cores with NEON

L1 Cache

L2 Cache

UnifiedMemoryController for

0.5-2GB DDR3L/LP-DDR3

2D/3D GPUComplex

GraphicsCache

PeripheralBusses

1-4 DisplayControllers

SensorArray

Image Signal

Processor

1-3 Cameras

Video EncoderDecoder

Automatic scaling of frequency/voltage and # cores to meet current software load

Automatic scaling of frequency/voltage and # cores to meet current software load

Significant programmable acceleration. Fully C-programmable soon.

10-50+ cores

Significant programmable acceleration. Fully C-programmable soon.

10-50+ cores

Typically NO or very limited programmable functionalityTypically NO or very limited programmable functionality

Software configurability

Software configurability

VisionProcessor(Future)

Programmable? Or dedicated hardware for power efficiency?Programmable? Or dedicated

hardware for power efficiency?

HD AudioEngine / IO

Software configurability

Software configurability

Unified memory is critical departure from

PC architecture

Common virtual address space for all units – but typically not coherent

Common virtual address space for all units – but typically not coherent

Page 10: Accelerating Mobile Augmented Reality - NVIDIA · 2012-12-10 · Accelerating Mobile Augmented Reality Neil Trevett VP Mobile Content, NVIDIA President, ... Why mobile devices will

10

© 2012 NVIDIA - Page 19

Khronos Connects Software to Silicon

ROYALTY-FREE, OPEN STANDARD APIs for advanced hardware acceleration

Graphics, video, audio, compute, visual and sensor processing

Low level silicon to software interface needed on every platform

Defines the forward looking roadmap for the silicon community

Shipping on billions of devices across multiple operating systems

Rigorous conformance tests for cross-vendor consistency

Khronos is OPEN for any company to join and participate

Acceleration APIs BY the Industry FOR the Industry

© 2012 NVIDIA - Page 20

Mobile Platform Innovation Vectors

Console-Class 3DPerformance, Quality,

Controllers and TV connectivity

HTML5 and WebGLWeb Apps that can be discovered on the Net and run on any platform

VisionCameras as sensors,

Computational Photography,Gesture Processing

Sensor FusionDevices become ‘magically’ context aware –location, usage, position

New platform capabilities being driven by SILICON and APIs

Page 11: Accelerating Mobile Augmented Reality - NVIDIA · 2012-12-10 · Accelerating Mobile Augmented Reality Neil Trevett VP Mobile Content, NVIDIA President, ... Why mobile devices will

11

© 2012 NVIDIA - Page 21

Native APIs for Augmented Reality

Advanced Camera Control and stream

generation

3D Rendering and Video Composition

On GPU

AudioRendering

Application on CPUs and GPUs

Positional and GPS Sensor Data

Computer Vision/Tracking &Computational Photography

Position and Tracking SemanticsSynchronization

and sensor fusion

Positional Sensors

CameraEGLStream

Image streams to GPU and CPU

Tracked features

Dataflow and synchronization

Proprietary Vendors APIs

© 2012 NVIDIA - Page 22

FCAM – Open Source Project

Capture of stream of camera images with precision control

A pipeline that converts requests into images

All parameters packed into the requests - no visible state

Programmer has full control over sensor settings for each stream frame

Control over focus and flash

No hidden daemon running autofocus/metering

Control ISP

Can access supplemental statistics from ISP

Page 12: Accelerating Mobile Augmented Reality - NVIDIA · 2012-12-10 · Accelerating Mobile Augmented Reality Neil Trevett VP Mobile Content, NVIDIA President, ... Why mobile devices will

12

© 2012 NVIDIA - Page 23

Requested Camera Extensions for AR

Enhanced exposure parameters in FCAM single or burst mode

ISO, white balance, frame rate, focus modes, resolution

Synchronization with other system sensors

And with output displayed frame

ROI extraction

From wide angle and fish-eye lenses

Data output format control

Grayscale, RGB(A), YUV

Access to the raw data e.g. Bayer pattern

Query camera information

Focal length (fx, fy), principal point (cx, cy), skew (s), image resolution (h, w)

Spatial information of how cameras and sensors are placed on device

Calibration and lens distortion

© 2012 NVIDIA - Page 24

OpenCV – Open Source Project

Computer vision open source project

Excellent functionality - widely used in academia, fast prototyping, some products

Not an API definition and not managed by Khronos

Extensive functionality >1,000 functions

Difficult for silicon vendors to provide complete acceleration

Traditionally runs on a single CPU

Some partial acceleration projects underway: OpenCL, CUDA, Neon …

Page 13: Accelerating Mobile Augmented Reality - NVIDIA · 2012-12-10 · Accelerating Mobile Augmented Reality Neil Trevett VP Mobile Content, NVIDIA President, ... Why mobile devices will

13

© 2012 NVIDIA - Page 25

OpenCV4Tegra

Tegra-specific acceleration for OpenCV Library

OpenGL ES GLSL, ARM Multithreading and NEON optimizations

Available on Tegra Android Development Platform Toolkit (TADP)

Develop native vision functionality in the NDK

Can access from Java applications using JNI

FunctionsCore

absdiff, add, bitwise_and, bitwise_not, bitwise_or, bitwise_xor, compare, countNonZero, cvRound, dot, max, mean, meanStdDev, min, minMaxLoc, norm2, norm, phase32f, reduceC, reduceR, subtract,sum

Imgproc

Blur3x3, blur5x5, canny, cvtColorGray2RGBA, cvtColorRGBA2Gray, cvtColorYCrCb, cvtColorYUV420, dilate, erode, gaussianBlur3x3, gaussianBlur5x5, integral , medianBlur3x3, pyrDown, pyrUp , resizeAreaDownBy2 , resizeAreaDownBy4, resizeDownLinear, resizeUpLinear, scharrFilter , sobelFilter , threshold

Calib3d

Features2d (FAST detector)

Stitching (warpers)

© 2012 NVIDIA - Page 26

OpenCV4Tegra – Neon Speedups

NEON Speedup on Tegra

Page 14: Accelerating Mobile Augmented Reality - NVIDIA · 2012-12-10 · Accelerating Mobile Augmented Reality Neil Trevett VP Mobile Content, NVIDIA President, ... Why mobile devices will

14

© 2012 NVIDIA - Page 27

OpenCV4Tegra – GPU Speedups

Tegra GPU Speedup

© 2012 NVIDIA - Page 28

OpenVX

Vision Hardware Acceleration Layer

Enables hardware vendors to

implement accelerated imaging and

vision algorithms

For use by high-level libraries or apps

Focus on enabling real-time vision

On mobile and embedded systems

Diversity of efficient implementations

From programmable processors,

through GPUS to dedicated hardware

pipelines

Open source sample implementation?

Hardware vendor implementations

OpenCV open source library

Other higher-levelCV libraries

Application

Dedicated hardware can help make vision processing performant and low-power enough for pervasive ‘always-on’ use

OpenVX does not duplicate OpenCV functionality JUST

provides essential acceleration

OpenVX does not duplicate OpenCV functionality JUST

provides essential acceleration

Aiming for provisional specification in 1H 2013 and final specification 2H13

Page 15: Accelerating Mobile Augmented Reality - NVIDIA · 2012-12-10 · Accelerating Mobile Augmented Reality Neil Trevett VP Mobile Content, NVIDIA President, ... Why mobile devices will

15

© 2012 NVIDIA - Page 29

OpenVX Execution Flow

OpenVX Graph for efficient execution

Each Node can be implemented in software or accelerated hardware

EGL provides data and event interop – with streaming

BUT use of other Khronos APIs are not mandated

VXU Utility Library provides efficient access to single nodes

Open source implementation

OpenVXNode

OpenVXNode

OpenVXNode

OpenVXNode

Camera Control & Image Processing UI and Display

Compute Processing

Other Outputs

Compute Processing

Other Inputs

© 2012 NVIDIA - Page 30

Market Demand for Sensor Fusion API

Innovative use of growing sensor diversity

PORTABLE apps need to be isolated from sensor and OS

details Application developers do not wish to be Sensor

Fusion experts

Synchronized use of multiple interoperating

sensors in one app

StreamInputA High-level Sensor Fusion API

Do NOT force the application developer to access individual sensors (unlike almost all other sensor APIs)

High-level API enables sensor vendors to drive and deliver competitive

sensor fusion innovation

Page 16: Accelerating Mobile Augmented Reality - NVIDIA · 2012-12-10 · Accelerating Mobile Augmented Reality Neil Trevett VP Mobile Content, NVIDIA President, ... Why mobile devices will

16

© 2012 NVIDIA - Page 31

StreamInput - Portable Access to Sensor Fusion

Advanced Sensors EverywhereRGB and depth cameras, multi-axis motion/position, touch and gestures,

microphones, wireless controllers, haptics keyboards, mice, track pads

Apps Need SophisticatedAccess to Sensor DataWithout coding to specific

sensor hardware

Apps request semantic sensor informationStreamInput defines possible requests, e.g.

“Provide Skeleton Position” “Am I in an elevator?”

Processing graph provides sensor data streamUtilizes optimized, smart, sensor middlewareApps can gain ‘magical’ situational awareness

UniversalTimestamps

Standardized NodeIntercommunication

InputDevice

Input Device

Input Device

FilterNode

FilterNode

AppFilterNode

Spec expected in 2013

© 2012 NVIDIA - Page 32

Using Android Java for AR

CameraProcessing

3D Rendering and Video Composition

On GPU

AudioRendering

Application on CPUs

Positional and GPS Sensor Data

Computer Vision and Tracking

Position and Tracking SemanticsSynchronization

and sensor fusion

Positional Sensors

Camera

Video stream to GPU

OpenCV through JNI

SensorManager

JetPlayer

GLSurfaceViewSurfaceTexture

MediaSDK

Java CallbackPreview frames

No advanced camera controls: ROI, burst parameter control,

focus modes, format choices etc.

No advanced camera controls: ROI, burst parameter control,

focus modes, format choices etc.30FPS but 2-3 frames

latency. Framerate jitter30FPS but 2-3 frames

latency. Framerate jitter

Limited fusion. No cross sensor synch.

No virtual sensors

Limited fusion. No cross sensor synch.

No virtual sensors

Java provides access to EGL-type and OpenGL ES functionality

Java provides access to EGL-type and OpenGL ES functionality

Need synch between input and composited frames

Need synch between input and composited frames

No programmable, close to sensor, image processing

No programmable, close to sensor, image processing

RenderScript

Just CPU parallel programming – no

GPU compute

Just CPU parallel programming – no

GPU compute

Page 17: Accelerating Mobile Augmented Reality - NVIDIA · 2012-12-10 · Accelerating Mobile Augmented Reality Neil Trevett VP Mobile Content, NVIDIA President, ... Why mobile devices will

17

© 2012 NVIDIA - Page 33

http://developer.nvidia.com/develop4tegra

Tegra Android Development PackTegra Android Development Pack

CPU DEBUGGING with Nsight Tegra

GPU DEBUGGING with PerfHUD ES

OPTIMIZE applications with Tegra Profiler

REFERENCE docs, samples & tutorials

CPU DEBUGGING with Nsight Tegra

GPU DEBUGGING with PerfHUD ES

OPTIMIZE applications with Tegra Profiler

REFERENCE docs, samples & tutorials

OPTIMIZED for Tegra Android development

FLASHES Tegra DevKit with OS Image

CONFIGURED for debugging and profiling

INCLUDES Kernel symbols and DS-5 support

OPTIMIZED for Tegra Android development

FLASHES Tegra DevKit with OS Image

CONFIGURED for debugging and profiling

INCLUDES Kernel symbols and DS-5 support

For Windows, OSX and LinuxFor Windows, OSX and Linux

GET STARTED in minutes NOT hours

INSTALLS all tools required for Tegra Android

GET STARTED in minutes NOT hours

INSTALLS all tools required for Tegra Android

© 2012 NVIDIA - Page 34

AR Portability with Middleware Engines

Native APIs and Java APIsOpenGL ES, EGLStreams, StreamInput

Middleware SDKs and tools for application developers

Hides OS details – provides OS portability

Porting and optimization through close cooperation with silicon vendors

NVIDIA has close relationships with Metaio and Unity

Optimizing for Tegra power/performance

Applications

Apps Engines

Partners

Page 18: Accelerating Mobile Augmented Reality - NVIDIA · 2012-12-10 · Accelerating Mobile Augmented Reality Neil Trevett VP Mobile Content, NVIDIA President, ... Why mobile devices will

18

© 2012 NVIDIA - Page 35

Future AR Tracking

The industry has just started figuring out how to do highly realistic Augmented Reality

Research today using CUDA-accelerated PCs

Real-Time Surface Light-field Capture for Augmentation of Planar Specular SurfacesJan Jachnik, Richard A. Newcombe, Andrew J. Davison

Imperial College, UK

© 2012 NVIDIA - Page 36

More Hyper Realistic Augmentations

High-Quality Reflections, Refractions, and Caustics in Augmented Reality

and their Contribution to Visual Coherence

P. Kán, H. Kaufmann

Institute of Software Technology and Interactive Systems,

Vienna University of Technology, Vienna, Austria

Page 19: Accelerating Mobile Augmented Reality - NVIDIA · 2012-12-10 · Accelerating Mobile Augmented Reality Neil Trevett VP Mobile Content, NVIDIA President, ... Why mobile devices will

19

© 2012 NVIDIA - Page 37

CUDA on Tegra

Tegra + discrete GPU development platforms

Available to select developers

CUDA 5.0

Kepler support

Enables early development of ARM-based applications with desktop-class graphics and compute

See me after the presentation if you are interested

Or email [email protected]

© 2012 NVIDIA - Page 38

In Summary

Augmented Reality will change the way we use computing – again – but needs significantly more improvement

Multiple APIs and standards being put into place to deliver the needed performance at low power

You can get started today with AR on NVIDIA Tegra

Today – Optimized Android Java APIs, OpenCV4Tegra, OpenGL ES, Unity

Talk to me if you have interest ARM/desktop GPU development systems

Soon – FCAM, GPU Compute, OpenVX, Metaio AR SDK