nvidia ai brain of self driving and hd mappingimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... ·...

34
September 13, 2016 NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPING

Upload: others

Post on 18-Jan-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPINGimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... · • 64 GB eMMC • 64 MB Boot ROM • Automotive IO • Connect & fuse data from

September 13, 2016

NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPING

Page 2: NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPINGimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... · • 64 GB eMMC • 64 MB Boot ROM • Automotive IO • Connect & fuse data from

2

Training on DGX-1

Driving with DriveWorks

KALDI

LOCALIZATION

MAPPING

DRIVENET

NVIDIA DGX-1 NVIDIA DRIVE PX 2

AI FOR AUTONOMOUS DRIVING

Page 3: NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPINGimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... · • 64 GB eMMC • 64 MB Boot ROM • Automotive IO • Connect & fuse data from

3

• Tegra Parker SoC

• 1.3 TFLOPS GPU

• 6 CPU Cores

• Integrated ISP

• 8 GB LPDDR4

• 64 GB eMMC

• 64 MB Boot ROM

• Automotive IO

• Connect & fuse data

from cameras, LIDAR,

radar, ultrasonic sensors

• Includes DriveWorks

software & SDK

• 125 x 125 mm

• 10 W

NVIDIA DRIVE PX 2 FOR AUTOCRUISE

Page 4: NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPINGimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... · • 64 GB eMMC • 64 MB Boot ROM • Automotive IO • Connect & fuse data from

4

NVIDIA DRIVE PX 2 FOR AUTOCHAUFFEUR

• Processing Power

• 2x Tegra Parker SoC

• 2x Pascal dGPU

• 8 TFLOPS

• 24 DNN TOPs

• Connect & fuse data

from up to 12 cameras,

LIDAR, radar, ultrasonic

sensors

• Includes DriveWorks

software & SDK

• Platform for AI, part of

deep learning system

• Available now

Page 5: NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPINGimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... · • 64 GB eMMC • 64 MB Boot ROM • Automotive IO • Connect & fuse data from

5

A PEEK INSIDE DRIVE PX 2 PARKER SOC

CPU GPU ACCEL

A57 A57

A57 A57

2MB L2

Pascal

SM

Pascal

SM

HDR ISP

VIDEO DECODE

VIDEO ENCODE

128b LPDDR4 – 51.2GB/sec

x12 CSI x6 IO SERDES @ 5 Gbps

512KB L2

DENVER DENVER

2MB L2

ASIL-B SAFETY ARCHITECTURE Integrated Safety Engine

Lock-Step R5 Cluster

Memory Error Correction

PERFORMANCE Best in Class GPU & CPU

1.3 TFLOPS processing

1.5Gpix/s Native HDR ISP

Highest Memory BW and Efficiency

Lowest Sustained Power Consumption

AUTOMOTIVE INTEGRATION CAN and Ethernet AVB I/O

Up to 12 Camera Inputs

x6 IO SERDES up to 5Gbps

Page 6: NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPINGimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... · • 64 GB eMMC • 64 MB Boot ROM • Automotive IO • Connect & fuse data from

6

NVIDIA AI SELF-DRIVING CAR PLATFORM

CLOUD HD MAP

NVIDIA DRIVEWORKS OS

MAPPING

AI

PERCEPTION

AI

LOCALIZATION

AI/CV

DRIVING

AI

NVIDIA DRIVE PX

CAR AI SUPERCOMPUTER

CLOUD MAP

+

AI ALGORITHMS

+

AI SUPERCOMPUTER

Page 7: NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPINGimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... · • 64 GB eMMC • 64 MB Boot ROM • Automotive IO • Connect & fuse data from

7

SOFTWARE

NVIDIA Vibrante Linux & Comprehensive BSP

Rich Middleware

SDK, Samples and more

A full stack of rich software components

Page 8: NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPINGimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... · • 64 GB eMMC • 64 MB Boot ROM • Automotive IO • Connect & fuse data from

8

DRIVEINSTALL An easy tool to flash your board

9/23/2016

Page 9: NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPINGimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... · • 64 GB eMMC • 64 MB Boot ROM • Automotive IO • Connect & fuse data from

9

DRIVEWORKS SDK

Tegra, Discrete GPU

Cameras

Other Sensors

NVMEDIA CUDA, OpenGL ES

CUDA accelerated libs: GIE, cuDNN, NPP, …

DriveWorks SAL

DriveWorks Algorithm Modules

Autonomous Driving Applications

DriveWorks Tools

SUBJECT TO CHANGE.

SW Stack

HW Linux SDK DriveWorks Applications

Page 10: NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPINGimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... · • 64 GB eMMC • 64 MB Boot ROM • Automotive IO • Connect & fuse data from

10

DRIVEWORKS TOOLS

SMART CAMERAS

CAMERAS

LIDAR

RADAR

Page 11: NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPINGimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... · • 64 GB eMMC • 64 MB Boot ROM • Automotive IO • Connect & fuse data from

11

CALIBRATION AND SENSOR REGISTRATION Set of tools to calibrate sensors, and runtime module to perform online calibration

Features • Factory calibration tool (goal: zero stop calibration) • Camera Intrinsic calibration – OCAM/Pinhole model. Pattern • Camera Extrinsic calibration

• Two cameras • 4-camera setup (surroundview config)

• Lidar to camera extrinsic calibration

• Online calibration • Recalibration of extrinsics only, with possible extension

recalibrate intrinsics as well • Optimized bundle adjustment for automotive configurations

Modules • Productized tools • Patterns and tool for intrinsic calibration • Patterns and library for extrinsic calibration • Libraries for on rig calibration

NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE. SUBJECT TO CHANGE.

Page 12: NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPINGimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... · • 64 GB eMMC • 64 MB Boot ROM • Automotive IO • Connect & fuse data from

12

CALIBRATION AND SENSOR REGISTRATION

• Rig defines sensors and also rough location estimates

• Camera Intrinsics: OCAM and OpenCV Pinhole parameters

• Camera Extrinsics: 4 SurroundView or relative between 2 cameras

• Lidar Extrinsic: relative to a camera that sees the pattern

Rig Configuration Camera Intrinsics Camera Extrinsics Lidar Extrinsics

NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE. SUBJECT TO CHANGE.

Page 13: NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPINGimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... · • 64 GB eMMC • 64 MB Boot ROM • Automotive IO • Connect & fuse data from

13

TRACE CAPTURING AND REPLAY

Same platform and SW as both development and deployment

• Tuned performance to avoid glitches during capturing and recording

• Optimized for Load balancing threads and cores, memory and IO

Unique time synchronization protocol (PTP Aurix)

Single man operation:

• Support to launch multi-sensor recording at one key press

• Coordinated play/pause/stop for all sensors

For future versions include

• Synchronization between multiple processes (different Tegras)

• Built-in calibration capabilities

NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE. SUBJECT TO CHANGE.

Page 14: NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPINGimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... · • 64 GB eMMC • 64 MB Boot ROM • Automotive IO • Connect & fuse data from

14

DATA LOGGING

Data Logger DRIVE™ PX 2

Camera

Bypass

Vehicle Information Other Sensors

CAN/FlexRay

Gigabit Ethernet

Other Sensors & GPS (LiDAR/Radar)

2x 10 Gigabit Ethernet

Uncompressed Camera stream

12x Camera

GMSL

2x HDMI (4K/60) as Data pipe

Optional: for uncompressed video

10 Gigabit Ethernet

Optional: for uncompressed video ECU debug data

12x Camera

GMSL HDMI-10GbE

Page 15: NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPINGimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... · • 64 GB eMMC • 64 MB Boot ROM • Automotive IO • Connect & fuse data from

15

DRIVEWORKS SDK

Tegra, Discrete GPU

Cameras

Other Sensors

NVMEDIA CUDA, OpenGL ES

CUDA accelerated libs: GIE, cuDNN, NPP, …

DriveWorks SAL

DriveWorks Algorithm Modules

Autonomous Driving Applications

DriveWorks Tools

SUBJECT TO CHANGE.

SW Stack

HW Linux SDK DriveWorks Applications

Page 16: NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPINGimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... · • 64 GB eMMC • 64 MB Boot ROM • Automotive IO • Connect & fuse data from

16

DRIVEWORKS SAL

SMART CAMERAS

CAMERAS

LIDAR

RADAR

Page 17: NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPINGimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... · • 64 GB eMMC • 64 MB Boot ROM • Automotive IO • Connect & fuse data from

17

SENSOR ABSTRACTION LAYER (SAL)

Goals

• Provide a common and simple unified interface to the sensors

• Provide both HW sensor abstraction as well as virtual sensors (for replay)

• Provide raw sensor serialization (for recording)

• Deal with platform and SW particularities

• API/Processor Conversion/transfer: CUDA, GL, NvMedia, CPU

• Exploit additional SoC engines: H264/H265 codec, VIC

NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE. SUBJECT TO CHANGE.

Page 18: NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPINGimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... · • 64 GB eMMC • 64 MB Boot ROM • Automotive IO • Connect & fuse data from

18

COMMON SENSOR API

Minimum Latency

dwSAL_createSensor

dwSAL_startSensor

dwSAL_stopSensor

dwSAL_releaseSensor

Data consumption

Prepare sensor for data delivery: power up, establish connection, open socket, allocation FIFOs, etc...

Start recording into sensor FIFO

Stop recording into sensor FIFO, drain FIFO.

Shutdown sensor, release resources

NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE. SUBJECT TO CHANGE.

Page 19: NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPINGimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... · • 64 GB eMMC • 64 MB Boot ROM • Automotive IO • Connect & fuse data from

19

SCHEDULING

- Current paradigm is non-blocking functions and blocking with timeout

- Defined by EGL, CUDA and NvMedia paradigms and capabilities

- Goal is event-driven and non-blocking data-flow model to be light-weight and efficient

- Be able to schedule work ahead to hide latencies on triggering work for all our HW engines

- Use as little threads as necessary to increase runtime determinism of the system

NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE. SUBJECT TO CHANGE.

Page 20: NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPINGimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... · • 64 GB eMMC • 64 MB Boot ROM • Automotive IO • Connect & fuse data from

20

DRIVEWORKS SDK

Tegra, Discrete GPU

Cameras

Other Sensors

NVMEDIA CUDA, OpenGL ES

CUDA accelerated libs: GIE, cuDNN, NPP, …

DriveWorks SAL

DriveWorks Algorithm Modules

Autonomous Driving Applications

DriveWorks Tools

SUBJECT TO CHANGE.

SW Stack

HW Linux SDK DriveWorks Applications

Page 21: NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPINGimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... · • 64 GB eMMC • 64 MB Boot ROM • Automotive IO • Connect & fuse data from

21

DRIVEWORKS SDK

MODULES

Page 22: NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPINGimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... · • 64 GB eMMC • 64 MB Boot ROM • Automotive IO • Connect & fuse data from

22

DRIVEWORKS SDK

DRIVEWORKS SDK

System SW

Hardware

Sensors

DETECTION

Detection/Classification Map Localization Vehicle Control

Sensor Fusion HD-Map Interfacing

Path Planning solvers Segmentation Egomotion

(SFM, Visual Odometry)

Scene understanding

V4L/V4Q, CUDA , cuDNN, NPP, OpenGL, …

Camera, LIDAR, Radar, GPS, Ultrasound, Odometry, Maps

Tegra , dGPU

LOCALIZATION DRIVING VISUALIZATION

ADAS rendering

Debug Rendering

Streaming to cluster

SUBJECT TO CHANGE.

Overview

Page 23: NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPINGimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... · • 64 GB eMMC • 64 MB Boot ROM • Automotive IO • Connect & fuse data from

23

DRIVEWORKS ALGORITHM MODULES

Sensor

Fusion Detection

Locali-

zation HD Maps Planning Driving

NVIDIA DRIVENET

Page 24: NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPINGimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... · • 64 GB eMMC • 64 MB Boot ROM • Automotive IO • Connect & fuse data from

24

FP 32 / FP16 / INT8 | Vertical & Horizontal Fusion | Pruning

VGG, GoogLeNet, ResNet, AlexNet & Custom Layers

Available on DRIVE PX 2 Today

TENSOR RT INFERENCE ENGINE

Page 25: NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPINGimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... · • 64 GB eMMC • 64 MB Boot ROM • Automotive IO • Connect & fuse data from

25

A COMPLETE DEEP LEARNING PLATFORM MANAGE TRAIN DEPLOY

DIGITS

DATA CENTER AUTOMOTIVE

TRAIN TEST

MANAGE / AUGMENT EMBEDDED

GPU INFERENCE ENGINE

PROTOTXT

Page 26: NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPINGimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... · • 64 GB eMMC • 64 MB Boot ROM • Automotive IO • Connect & fuse data from

26

Workflow

DIGITS OPTIMIZATION RUNTIME PLAN NEURAL NETWORK

TENSOR RT INFERENCE ENGINE

Page 27: NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPINGimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... · • 64 GB eMMC • 64 MB Boot ROM • Automotive IO • Connect & fuse data from

27

GRAPH OPTIMIZATION Unoptimized network

concat

max pool

input

next input

3x3 conv.

relu

bias

1x1 conv.

relu

bias

1x1 conv.

relu

bias

1x1 conv.

relu

bias

concat

1x1 conv.

relu

bias 5x5 conv.

relu

bias

Page 28: NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPINGimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... · • 64 GB eMMC • 64 MB Boot ROM • Automotive IO • Connect & fuse data from

28

GRAPH OPTIMIZATION Vertical fusion

concat

max pool

input

next input

concat

1x1 CBR 3x3 CBR 5x5 CBR 1x1 CBR

1x1 CBR 1x1 CBR

Page 29: NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPINGimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... · • 64 GB eMMC • 64 MB Boot ROM • Automotive IO • Connect & fuse data from

29

GRAPH OPTIMIZATION Horizontal fusion

concat

max pool

input

next input

concat

3x3 CBR 5x5 CBR 1x1 CBR

1x1 CBR

Page 30: NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPINGimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... · • 64 GB eMMC • 64 MB Boot ROM • Automotive IO • Connect & fuse data from

30

GRAPH OPTIMIZATION Concat elision

max pool

input

next input

3x3 CBR 5x5 CBR 1x1 CBR

1x1 CBR

Page 31: NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPINGimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... · • 64 GB eMMC • 64 MB Boot ROM • Automotive IO • Connect & fuse data from

31

BUILD

// create a builder object Ibuilder* builder = createInferBuilder(gLogger); // create the network definition INetworkDefinition* network = infer->createNetwork(); // populate the network definition and map CaffeParser* parser = new CaffeParser; IBlobNameToTensor *blobToTensor = parser->parse(“net.prototxt”, “net.weights”, *network, DataType::kFLOAT); // or kHALF // tell GIE which tensors are network outputs for (auto& s : outputs) network->markOutput(*blobNameToTensor->find(s.c_str()));

Importing a Caffe Model

Page 32: NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPINGimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... · • 64 GB eMMC • 64 MB Boot ROM • Automotive IO • Connect & fuse data from

32

BUILD

// Specify the maximum batch size, scratch size, internal format builder->setMaxBatchSize(maxBatchSize); builder->setMaxWorkspaceSize(1 << 20); builder->setHalf2Mode(true); // create the engine, serialize to storage (C++ ostream) ICudaEngine* builtEngine = builder->buildCudaEngine(*network); builtEngine->serialize(storage); // deserialize the engine from storage IRuntime* runtime = createInferRuntime(gLogger); ICudaEngine* engine = runtime->deserializeCudaEngine(storage);

Engine Creation

Page 33: NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPINGimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... · • 64 GB eMMC • 64 MB Boot ROM • Automotive IO • Connect & fuse data from

33

RUNTIME

// create an execution context for each engine instance // Network weights are shared between contexts IExecutionContext* context = engine->createExecutionContext(); // add GIE kernels to the given cuda stream cudaEvent_t reuseInput; context->enqueue(batchSize, buffers, stream, reuseInput); <…> // wait on the execution stream cudaStreamSynchronize(stream);

Running the Engine

Page 34: NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPINGimages.nvidia.com/cn/gtc/downloads/pdf/auto/401... · • 64 GB eMMC • 64 MB Boot ROM • Automotive IO • Connect & fuse data from

34

RUNTIME

• GIE is currently split into two libraries: libnvcaffeparser and libnvinfer

• libnvinfer currently includes the builder and the runtime

• It’s possible to build/run networks without Caffe parser via C++ API

• Sample C++ API calls:

• ITensor* in = network->addInput(“input”, DataType::kFloat, Dims4{…});

• IPoolingLayer* pool = network->addPooling(in, PoolingType::kMAX, …);

• pool->setStride(Dims2{2,2});

• …

Caffe-free operation