Dennis Lui | August 2017 DEEP NEURAL NETWORKS CHANGING THE AUTONOMOUS VEHICLE LANDSCAPE




THE RISE OF GPU COMPUTING

[Figure: GPU-computing performance versus single-threaded performance, 1980 to 2020, log scale from 10^2 to 10^7]

GPU-Computing perf: 1.5X per year, 1000X by 2025

Single-threaded perf: 1.5X per year, later 1.1X per year

Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten. New plot and data collected for 2010-2015 by K. Rupp.

Stack: APPLICATIONS, SYSTEMS, ALGORITHMS, CUDA, ARCHITECTURE
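The "1.5X per year" and "1000X" figures on this slide are tied together by compound growth. A minimal sketch of that arithmetic, assuming a constant annual growth rate (the function name is illustrative and not from the slides):

```python
import math

def years_to_reach(factor: float, annual_growth: float) -> float:
    """Years of compound growth needed to multiply performance by `factor`."""
    return math.log(factor) / math.log(annual_growth)

# Growth rates quoted on the slide
gpu_years = years_to_reach(1000.0, 1.5)   # roughly 17 years at 1.5X per year
cpu_years = years_to_reach(1000.0, 1.1)   # roughly 72 years at 1.1X per year

print(f"1.5X per year reaches 1000X in {gpu_years:.1f} years")
print(f"1.1X per year reaches 1000X in {cpu_years:.1f} years")
```

At the slower 1.1X rate the same 1000X gain would take about four times as long, which is the gap the chart illustrates.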


AI BREAKTHROUGHS

Recent Breakthroughs

“Superhuman” Image Recognition

Atari Games

AlphaGo Rivals World Champion

Conversational Speech Recognition

Lip Reading

2015 2016 2017


AI IMPROVING AT AMAZING RATES

[Figures: IMAGENET ACCURACY; SPEECH RECOGNITION ACCURACY]


AI IS THE SOLUTION TO SELF DRIVING

PERCEPTION, REASONING, DRIVING

HD MAP, AI COMPUTING, MAPPING


DEEP LEARNING FOR AUTONOMOUS DRIVING


DEEP NEURAL NETWORK

[Figure: a neuron with inputs I0, I1, I2, ..., In and weights w0, w1, w2, ..., wn]
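The inputs I0 ... In and weights w0 ... wn on this slide combine as a weighted sum. A minimal sketch of one such neuron follows; the bias term and the ReLU activation are assumptions added for illustration, since the slide shows only inputs and weights.

```python
import numpy as np

def neuron(inputs: np.ndarray, weights: np.ndarray, bias: float = 0.0) -> float:
    """Weighted sum of the inputs, plus a bias, passed through a ReLU."""
    z = float(np.dot(inputs, weights)) + bias
    return max(0.0, z)

I = np.array([1.0, 2.0, 3.0])     # I0, I1, I2
w = np.array([0.5, -1.0, 1.0])    # w0, w1, w2
out = neuron(I, w)                # 0.5 - 2.0 + 3.0 = 1.5
print(out)
```

A deep network stacks many layers of such units, with the outputs of one layer serving as the inputs of the next.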


MULTICLASS OBJECT DETECTION & CLASSIFICATION NETWORK

Description

Demonstrates NVIDIA’s proprietary deep neural network (DNN) performing object detection

Types Detected/Color Code

Red: Cars

Cyan: Trucks

Green: Traffic Signs (Detection Only)

Blue: Bicycles

Yellow: Pedestrians


LANE DETECTION NETWORK

Description

Demonstrates NVIDIA’s proprietary deep neural network (DNN) performing lane detection on the road

Detects the ego-lane by showing its left and right boundaries and, in some cases, the left and right boundaries of adjacent lanes as well

Color Code

Red: Ego-lane left

Green: Ego-lane right

Yellow: Left adjacent lane

Blue: Right adjacent lane


FREE SPACE DETECTION NETWORK

Description

Demonstrates NVIDIA’s proprietary deep neural network (DNN) detecting free space in front of the vehicle

Color Code

Red: Cars

Green: Curbs

Blue: Pedestrians

Yellow: Others


END-TO-END AUTONOMOUS DRIVING NETWORK

[Figure: training diagram with the labels Training data, Sensor data, CNN, CNN actuator commands, Human actuator commands, and Error (training) signal]
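The labels on this slide suggest a supervised loop: the network's actuator commands are compared with the human's, and the difference is the training signal. A minimal sketch under loud assumptions: a linear model stands in for the CNN, the data is synthetic, and plain gradient descent on squared error is used. None of these specifics come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 200 "sensor" feature vectors and the steering
# command a human driver issued for each one (hypothetical data).
sensor_data = rng.normal(size=(200, 8))
true_w = rng.normal(size=8)
human_commands = sensor_data @ true_w

w = np.zeros(8)        # parameters of a linear model standing in for the CNN
lr = 0.1               # learning rate (assumed value)
losses = []
for step in range(200):
    model_commands = sensor_data @ w            # "CNN actuator commands"
    error = model_commands - human_commands     # error (training) signal
    losses.append(float(np.mean(error ** 2)))
    grad = 2.0 * sensor_data.T @ error / len(error)
    w -= lr * grad                              # adjust weights to shrink the error

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.6f}")
```

The loss falls as the model's commands converge on the recorded human commands, which is the behaviour the error signal in the diagram is meant to drive.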




GPU DEEP LEARNING COMPUTING MODEL


A COMPLETE DEEP LEARNING PLATFORM

MANAGE, TRAIN, DEPLOY

MANAGE / AUGMENT

TRAIN, TEST

DATA CENTER, AUTOMOTIVE, EMBEDDED

DIGITS, PROTOTXT, TensorRT


DEEP LEARNING PLATFORMS

From Training to Development and Production

TRAINING

NVIDIA DGX-1 with Tesla V100 (DGX-1V)

DEPLOY

DRIVE PX 2: 2 PARKER + 2 PASCAL GPU, 20 TOPS DL, 120 SPECINT, 80W

XAVIER: 30 TOPS DL, 160 SPECINT, 30W

ONE ARCHITECTURE


TENSOR CORE

CUDA TensorOp instructions & data formats

4x4 matrix processing array

D[FP32] = A[FP16] * B[FP16] + C[FP32]

Optimized for deep learning

Activation Inputs, Weights Inputs, Output Results
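The mixed-precision formula D[FP32] = A[FP16] * B[FP16] + C[FP32] can be emulated numerically. The sketch below uses NumPy on the CPU purely to show the data types involved; it is not how Tensor Cores are programmed, and the random 4x4 values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)).astype(np.float16)   # FP16 input
B = rng.standard_normal((4, 4)).astype(np.float16)   # FP16 input
C = rng.standard_normal((4, 4)).astype(np.float32)   # FP32 accumulator input

# FP16 operands, products accumulated in FP32: D = A * B + C
D = A.astype(np.float32) @ B.astype(np.float32) + C

# For contrast: doing everything in FP16 rounds intermediate results to FP16
D_fp16 = (A @ B + C.astype(np.float16)).astype(np.float32)

print(D.dtype)                       # float32
print(np.max(np.abs(D - D_fp16)))    # rounding gap between the two variants
```

Storing inputs in FP16 halves memory traffic, while accumulating in FP32 limits the rounding error that builds up across the sum.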


DATASET CREATION


DATA ACQUISITION

[Figure: data acquisition setup built around DRIVE™ PX 2. Labels: Vehicle Information (CAN); 12x Camera (GMSL); Other Sensors (Lidar / Radar / GPS); Gigabit Ethernet; 10 Gigabit Ethernet / USB 3; Network Drive / SSD]


DATASET CREATION

DATA CURATION: Filter and keep data of interest

DATA ANNOTATION: Bounding boxes, per-pixel labeling

START FROM TRAINED NETWORK: May reduce required data size


TRAINING DEEP NEURAL NETWORKS


NVIDIA DIGITS

Interactive Deep Learning GPU Training System

MANAGE DATA, CONFIGURE NEURAL NET, MONITOR TRAINING, VISUALIZE RESULTS


NVIDIA DIGITS

Monitor Training

[Figure panels: WELL BEHAVED, OVERFIT, ILL BEHAVED, MANY OUTPUTS]


DNN TRAINING

Iterate and Innovate Faster

Dual Socket CPU: 711 hours (1X)

8-way GPU Server: 18 hours (40X)

DGX-1 with Tesla V100: 7.4 hours (96X)

Workload: ResNet50, 90 epochs to solution | CPU Server: Dual Xeon E5-2699 v4, 2.6GHz
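The 40X and 96X labels follow from dividing the CPU training time by each GPU system's training time. A quick check of that arithmetic, using only the hours quoted on the slide (the variable names are illustrative):

```python
baseline_hours = 711.0   # dual-socket CPU server
systems = {"8-way GPU Server": 18.0, "DGX-1 with Tesla V100": 7.4}

# Speedup = baseline time / accelerated time
speedups = {name: baseline_hours / hours for name, hours in systems.items()}
for name, s in speedups.items():
    print(f"{name}: {s:.1f}X")
```

The results round to the 40X and 96X shown in the chart.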


DNN INFERENCE OPTIMIZATIONS


DNN INFERENCE OPTIMIZATIONS

HARDWARE ACCELERATIONS: Specialized instructions for deep learning operations

TensorRT: Accelerated neural network inference engine

PRUNING: Prune down the network size (neurons + connections) to reduce inference time


HARDWARE ACCELERATIONS

Specialized Instruction for Deep Learning Operations

INT8 dot product: DP4A

[Figure: DP4A takes A and B, each holding four int8 values, plus an int32 accumulator C, and produces an int32 result D]
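The DP4A operation on this slide is a four-element int8 dot product accumulated into an int32. A minimal NumPy emulation of its arithmetic follows; the function name `dp4a` here is just a Python helper for illustration, not the CUDA intrinsic, and the sample values are hypothetical.

```python
import numpy as np

def dp4a(a: np.ndarray, b: np.ndarray, c: int) -> int:
    """Emulate a 4-element int8 dot product with int32 accumulation: d = a . b + c."""
    assert a.dtype == np.int8 and b.dtype == np.int8
    assert a.shape == (4,) and b.shape == (4,)
    # Widen to int32 before multiplying so the int8 products cannot overflow.
    return int(np.dot(a.astype(np.int32), b.astype(np.int32))) + c

a = np.array([127, -128, 5, 7], dtype=np.int8)
b = np.array([127, 127, -3, 2], dtype=np.int8)
d = dp4a(a, b, c=10)
print(d)   # 16129 - 16256 - 15 + 14 + 10 = -118
```

Packing four int8 multiplies and the accumulation into one instruction is what makes INT8 inference faster than issuing the same work as separate 32-bit operations.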


PRUNING

Train Connectivity, Prune Connections, Train Weights

[Figure: network before pruning and after pruning, showing pruning synapses and pruning neurons]
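The "Prune Connections" step can be sketched as magnitude-based pruning: weights with the smallest absolute values are set to zero and a mask records which connections survive. This is a simplified illustration; the sparsity level, the layer shape, and the helper name are assumptions, and the retraining step ("Train Weights") is omitted.

```python
import numpy as np

def prune_by_magnitude(weights: np.ndarray, sparsity: float):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights.

    Returns the pruned weights and a boolean mask of the kept connections.
    """
    k = int(round(sparsity * weights.size))
    if k == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    # The k-th smallest magnitude becomes the pruning threshold.
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
W = rng.normal(size=(300, 100))        # hypothetical fully connected layer
W_pruned, mask = prune_by_magnitude(W, sparsity=0.9)
kept = int(mask.sum())
print(f"kept {kept} of {W.size} weights ({W.size / kept:.1f}x fewer)")
```

During retraining, the same mask would be applied to the weight updates so that pruned connections stay at zero.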


PRUNING

NETWORK                TOP-1 ERROR   TOP-5 ERROR   PARAMETERS   COMPRESSION RATE
LeNet-300-100 Ref      1.64%         -             262K
LeNet-300-100 Pruned   1.59%         -             22K          12x
LeNet-5 Ref            0.80%         -             431K
LeNet-5 Pruned         0.77%         -             36K          12x
AlexNet Ref            42.78%        19.73%        61M
AlexNet Pruned         42.77%        19.67%        6.7M         9x
VGG-16 Ref             31.50%        11.32%        138M
VGG-16 Pruned          31.34%        10.88%        10.3M        13x

Source: Han, S., Pool, J., Tran, J., Dally, W.J.: Learning both weights and connections for efficient neural networks. CoRR abs/1506.02626 (2015), http://arxiv.org/abs/1506.02626


TensorRT

High-performance framework makes it easy to develop GPU-accelerated inference

Production deployment solution for deep learning inference

Optimized inference for a given trained neural network and target GPU

Solutions for Hyperscale, ADAS, Embedded

Supports deployment of FP32, FP16, INT8* inference

TensorRT for Data Center: Image Classification, Object Detection, Image Segmentation

TensorRT for Automotive (NVIDIA DRIVE PX 2): Pedestrian Detection, Lane Tracking, Traffic Sign Recognition

* INT8 support will be available from v2


TensorRT

Optimizations

Fuse network layers

Eliminate concatenation layers

Kernel specialization

Auto-tuning for target platform

Tuned for given batch size

[Figure: TRAINED NEURAL NETWORK converted into an OPTIMIZED INFERENCE RUNTIME]


TensorRT: iGPU (FP16)

GoogleNet, AlexNet, VGG19

Images / second by batch size

BATCH SIZE   1     2     4     8     16
GoogleNet    94    189   226   258   272
AlexNet      112   232   353   366   450
VGG19        17    35    39    39    40

Configuration: HW (DRIVE PX 2 iGPU@1275 MHz), SW (PDK ALPHA 2.0, TensorRT 1.0RC), GoogleNet & VGG19 Input Image Resolution (224x224), AlexNet Input Image Resolution (227x227)


TensorRT: dGPU (INT8)

GoogleNet, AlexNet, VGG19

Images / second by batch size

BATCH SIZE   1     2     4      8      16
GoogleNet    427   704   1051   1309   1511
AlexNet      423   799   1401   2152   2948
VGG19        111   154   181    208    219

Configuration: HW (DRIVE PX 2 dGPU@1290 MHz), SW (PDK ALPHA 2.0, TensorRT 2.0EA), GoogleNet & VGG19 Input Image Resolution (224x224), AlexNet Input Image Resolution (227x227)


TensorRT

INT8 Workflow

[Figure: INT8 workflow. Stages: FP32 Training Framework, FP32 NEURAL NETWORK, INT8 OPTIMIZATION USING TensorRT, INT8 PLAN, INT8 RUNTIME USING TensorRT. Inputs shown: Calibration Dataset, Calibration Parameters, Validation Dataset, Scoring Function, Batch Size, Precision]
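The role of the calibration dataset in this workflow is to choose how FP32 values map onto the int8 range. The slides do not describe TensorRT's calibration algorithm, so the sketch below substitutes the simplest possible stand-in, symmetric max-abs scaling; all function names and the random data are hypothetical.

```python
import numpy as np

def calibrate_scale(calibration_data: np.ndarray) -> float:
    """Pick a symmetric scale so the largest observed magnitude maps to 127."""
    return float(np.max(np.abs(calibration_data))) / 127.0

def quantize(x: np.ndarray, scale: float) -> np.ndarray:
    """Map FP32 values to int8 with rounding and saturation."""
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8)

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map int8 values back to approximate FP32 values."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
calibration_set = rng.normal(size=10_000).astype(np.float32)  # stand-in activations
scale = calibrate_scale(calibration_set)

x = rng.normal(size=1_000).astype(np.float32)
x_hat = dequantize(quantize(x, scale), scale)
max_err = float(np.max(np.abs(x - x_hat)))
print(f"scale={scale:.5f}, max abs error={max_err:.5f}")
```

Values inside the calibrated range round-trip with an error of at most half the scale; values outside it saturate, which is why the calibration data should be representative of what the network sees in deployment.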


TensorRT

8-bit Inference: Top-1 Accuracy

NETWORK      FP32 TOP1   INT8 TOP1   DIFFERENCE   PERF GAIN
AlexNet      57.22%      56.96%      0.26%        3.70x
GoogLeNet    68.87%      68.49%      0.38%        3.01x
VGG          68.56%      68.45%      0.11%        3.23x
ResNet-152   75.18%      74.56%      0.61%        3.42x


TensorRT: GRAPH OPTIMIZATION

Unoptimized Network

[Figure: block between "input" and "next input" containing six conv. layers (four 1x1, one 3x3, one 5x5), each followed by bias and relu, plus a max pool and concat]


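TensorRT: GRAPH OPTIMIZATION

Vertical Fusion

[Figure: the same block with each conv., bias, relu sequence fused into a single CBR layer. Layers: 1x1 CBR, 3x3 CBR, 5x5 CBR, 1x1 CBR, 1x1 CBR, 1x1 CBR, plus max pool and concat, between input and next input]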


TensorRT: GRAPH OPTIMIZATION

Horizontal Fusion

[Figure: the block after horizontal fusion. Layers: 3x3 CBR, 5x5 CBR, 1x1 CBR, 1x1 CBR, plus max pool and concat, between input and next input]
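Horizontal fusion merges layers that read the same input and apply the same kind of operation into one wider layer. For 1x1 convolutions this is easy to verify, because a 1x1 convolution is a per-pixel matrix multiply. The sketch below checks the equivalence in NumPy; the tensor sizes and channel counts are hypothetical, and this illustrates the idea rather than TensorRT's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 16))    # H x W x C_in feature map (assumed sizes)

# Three separate 1x1 conv + bias + ReLU layers that all read the same input
weights = [rng.normal(size=(16, c)) for c in (4, 6, 8)]
biases = [rng.normal(size=c) for c in (4, 6, 8)]

def cbr_1x1(x, w, b):
    """1x1 convolution (a per-pixel matrix multiply), bias, then ReLU."""
    return np.maximum(x @ w + b, 0.0)

separate = [cbr_1x1(x, w, b) for w, b in zip(weights, biases)]

# Horizontal fusion: one wider layer, then split the output channels
w_fused = np.concatenate(weights, axis=1)    # 16 x (4 + 6 + 8)
b_fused = np.concatenate(biases)
fused = cbr_1x1(x, w_fused, b_fused)
split = np.split(fused, [4, 10], axis=-1)

same = all(np.allclose(s, f) for s, f in zip(separate, split))
print(same)
```

One large multiply generally uses the GPU more efficiently than three small ones, and the shared input is read once instead of three times.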


TensorRT: GRAPH OPTIMIZATION

Concat Elision

[Figure: the block with the concat layers removed. Layers: 3x3 CBR, 5x5 CBR, 1x1 CBR, 1x1 CBR, plus max pool, between input and next input]
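Concat elision removes the separate concatenation step by having each branch write its result directly into its slice of a preallocated output buffer. The NumPy sketch below shows that the two approaches yield the same tensor; the branch shapes are hypothetical, and the slice assignment merely stands in for a layer writing its output in place.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical outputs of three parallel branches (H x W x C each)
branch_outputs = [rng.normal(size=(8, 8, c)) for c in (4, 6, 8)]

# With an explicit concat layer: separate tensors, then a copy joins them
concatenated = np.concatenate(branch_outputs, axis=-1)

# With the concat elided: preallocate the destination and give each
# branch its own channel slice to write into
destination = np.empty((8, 8, 18))
offset = 0
for out in branch_outputs:
    c = out.shape[-1]
    destination[..., offset:offset + c] = out   # stands in for an in-place layer write
    offset += c

identical = np.array_equal(concatenated, destination)
print(identical)
```

Skipping the concat saves one full read and write of the joined tensor on every inference pass.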


AUTONOMOUS DRIVING CHALLENGES


AUTONOMOUS DRIVING

Challenges



PUTTING IT ALL TOGETHER


RESOURCES

NVIDIA DRIVE Platform: https://developer.nvidia.com/drive

Volta Architecture Whitepaper: http://www.nvidia.com/object/volta-architecture-whitepaper.html

NVIDIA DGX-1: https://www.nvidia.com/en-us/data-center/dgx-1/

NVIDIA DIGITS: https://developer.nvidia.com/digits

TensorRT: https://developer.nvidia.com/tensorrt


QUESTIONS?