deep learning in matlab - gtc on-demand featured...

Page 1: Deep learning in MATLAB - GTC On-Demand Featured Talks · on-demand.gputechconf.com/gtc-il/2017/presentation/sil7137-roy fahn... · Deep learning in MATLAB From Concept to CUDA Code Roy

1

© 2017 The MathWorks, Inc.

Deep learning in MATLAB: From Concept to CUDA Code

Roy Fahn

Applications Engineer

Systematics

[email protected]

03-7660111

Ram Kokku

Principal Engineer

MathWorks

[email protected]

2

Talk Outline

Design Deep Learning & Vision Algorithms
• Manage large image sets
• Automate image labeling
• Easy access to models
• Pre-built training frameworks

Accelerate and Scale Training
• Acceleration with GPUs
• Scale to clusters

High Performance Deployment
• Automate compilation with GPU Coder
• On Titan XP: 7x faster than TensorFlow, 5x faster than pyCaffe2
• On Jetson: on par with TensorRT, 2x faster than C++-Caffe

3

Example: Transfer Learning Workflow

Transfer learning steps: Load Reference Network → Modify Network Structure → Learn New Weights → New Classifier
Training data: Images, Labels
Labels: Cars, Trucks, BigTrucks, SUVs, Vans

4

Example: Transfer Learning in MATLAB

Step: Set up training dataset
• Split, shuffle, re-arrange images
• Read image, data augmentation (clip, rotate, resize, etc.)

Easily manage large sets of images
• Single line of code to access images
• Operates on disk, database, big-data file system

5

Example: Transfer Learning in MATLAB

Steps: Set up training dataset → Load Reference Network

Create DNNs in MATLAB
1. Easy access to research models
2. Caffe Model importer
3. Build from scratch

6

Example: Transfer Learning in MATLAB

Steps: Set up training dataset → Load Reference Network → Modify Network Structure

7

Example: Transfer Learning in MATLAB

Steps: Set up training dataset → Load Reference Network → Modify Network Structure

8

Example: Transfer Learning in MATLAB

Steps: Set up training dataset → Load Reference Network → Modify Network Structure → Learn New Weights

Many more training options

9

Deep learning on CPU, GPU, multi-GPU and clusters

More GPUs

10

Deep learning on CPU, GPU, multi-GPU and clusters

More GPUs

More CPUs

11

Deep learning on CPU, GPU, multi-GPU and clusters

More GPUs

More CPUs

13

Visualizing and Debugging Intermediate Results

Image labels: Filters…, Activations, Deep Dream
Panels: Training Accuracy Visualization; Deep Dream; Layer Activations; Feature Visualization

• Many options for visualizations and debugging
• Examples to get started

14

GPU Coder for Deployment: New Product in R2017b

GPU Coder: accelerated implementation of parallel algorithms on GPUs

• Neural Networks (deep learning, machine learning): 7x faster than state-of-art
• Image Processing and Computer Vision (image filtering, feature detection/extraction): 700x faster than CPUs for feature extraction
• Signal Processing and Communications (FFT, filtering, cross correlation): 20x faster than CPUs for FFTs

15

GPU Coder Compilation Flow

GPU Coder

CUDA kernel creation
• Library function mapping
• Loop optimizations
• Dependence analysis

Memory allocation
• Data locality analysis
• GPU memory allocation

Data transfer minimization
• Data-dependence analysis
• Dynamic memcpy reduction

16

GPU Coder Generates CUDA from MATLAB: saxpy

Panels: Scalarized MATLAB; Vectorized MATLAB; CUDA (CUDA kernel for GPU parallelization)

Loops and matrix operations are directly compiled into kernels
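
The code shown on this slide did not survive in the transcript. As a rough illustration of the two styles it names, saxpy (y = a*x + y) can be written either as an explicit element-wise loop (the "scalarized" form) or as a whole-array expression (the "vectorized" form). The sketch below is in Python, not MATLAB or generated CUDA, and the function names are hypothetical. It only shows that every element update is independent of the others, which is the property that lets such a loop be mapped to parallel GPU threads.

```python
def saxpy_scalarized(a, x, y):
    """Explicit loop: one independent update per index (hypothetical helper)."""
    out = [0.0] * len(x)
    for i in range(len(x)):
        # Iteration i reads only x[i] and y[i], so iterations can run in parallel.
        out[i] = a * x[i] + y[i]
    return out


def saxpy_vectorized(a, x, y):
    """Whole-array form of the same computation, with no explicit index."""
    return [a * xi + yi for xi, yi in zip(x, y)]


x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
print(saxpy_scalarized(2.0, x, y))  # [12.0, 24.0, 36.0]
print(saxpy_vectorized(2.0, x, y))  # [12.0, 24.0, 36.0]
```

Both forms compute the same result; the slide's claim is that GPU Coder accepts either style as input.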

17

Generated CUDA Optimized for Memory Performance

Mandelbrot space

CUDA: CUDA kernel for GPU parallelization

Kernel data allocation is automatically optimized
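
For readers unfamiliar with the example, a Mandelbrot computation of the kind this slide refers to can be sketched as an escape-time count per grid point. The Python below is an assumption-laden illustration, not the slide's MATLAB or the generated CUDA: the function names, the iteration cap of 50, and the escape radius of 2 are all choices made here. Each grid point is computed independently, which is why the whole grid suits a single GPU kernel whose data can stay in device memory.

```python
def mandelbrot_count(c_real, c_imag, max_iter=50):
    """Iterations of z -> z*z + c before |z| exceeds 2, capped at max_iter."""
    z_real, z_imag = 0.0, 0.0
    for n in range(max_iter):
        # Compare |z|^2 with 4 to avoid a square root.
        if z_real * z_real + z_imag * z_imag > 4.0:
            return n
        z_real, z_imag = (z_real * z_real - z_imag * z_imag + c_real,
                          2.0 * z_real * z_imag + c_imag)
    return max_iter


def mandelbrot_grid(x_values, y_values, max_iter=50):
    """Escape counts for every (x, y) pair; each entry is independent of the rest."""
    return [[mandelbrot_count(x, y, max_iter) for x in x_values]
            for y in y_values]


grid = mandelbrot_grid([-2.0, -1.0, 0.0, 1.0], [-1.0, 0.0, 1.0])
for row in grid:
    print(row)
```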

20

Algorithm Design to Embedded Deployment Workflow

1. Functional test: MATLAB algorithm (functional reference)
2. Deployment unit-test: desktop GPU; build type .mex; call CUDA from MATLAB directly
3. Deployment integration-test: C++ on desktop GPU; build type .lib; call CUDA from (C++) hand-coded main()
4. Real-time test: C++ on embedded GPU; build type cross-compiled .lib; call CUDA from (C++) hand-coded main()

21

Demo: Alexnet Deployment with ‘mex’ Code Generation

22

Algorithm Design to Embedded Deployment on Tegra GPU

1. Functional test (test in MATLAB on host): MATLAB algorithm (functional reference)
2. Deployment unit-test (test generated code in MATLAB on host + GPU): Tesla GPU; build type .mex; call CUDA from MATLAB directly
3. Deployment integration-test (test generated code within C/C++ app on host + GPU): C++ on Tesla GPU; build type .lib; call CUDA from (C++) hand-coded main()
4. Real-time test (test generated code within C/C++ app on Tegra target): C++ on Tegra GPU; build type cross-compiled .lib; call CUDA from (C++) hand-coded main(); cross-compiled on host with Linaro toolchain

23

Alexnet Deployment to Tegra: Cross-Compiled with ‘lib’

Two small changes

1. Change build-type to ‘lib’

2. Select cross-compile toolchain

24

End-to-End Application: Lane Detection

Pipeline: Image → Lane detection CNN (transfer learning from Alexnet) → left lane coefficients, right lane coefficients → Post-processing (find left/right lane points) → Image with marked lanes

The output of the CNN is the lane parabola coefficients a, b, c of y = ax^2 + bx + c, one set for the left lane and one for the right
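
To make the post-processing step concrete, the sketch below evaluates y = ax^2 + bx + c at a few x positions to obtain lane points from one set of coefficients. It is a minimal Python illustration under assumptions: the function name and the coefficient values are hypothetical, and the slide does not say how the talk's own post-processing samples or draws the points.

```python
def lane_points(coeffs, x_values):
    """Evaluate y = a*x^2 + b*x + c at each x for one lane (hypothetical helper)."""
    a, b, c = coeffs
    return [(x, a * x * x + b * x + c) for x in x_values]


# Made-up coefficients standing in for one lane's CNN output.
left_coeffs = (0.5, -1.0, 2.0)
print(lane_points(left_coeffs, [0.0, 1.0, 2.0]))
# [(0.0, 2.0), (1.0, 1.5), (2.0, 2.0)]
```

The same function would be called once with the left-lane coefficients and once with the right-lane coefficients.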

GPU Coder generates code for the whole application

https://tinyurl.com/ybaxnxjg

25

How Good is Generated Code Performance?

Performance of image processing and computer vision

Performance of CNN inference (Alexnet) on Titan XP GPU

Performance of CNN inference (Alexnet) on Jetson (Tegra) TX2

26

GPU Coder for Image Processing and Computer Vision

Examples: Distance transform; Fog removal; SURF feature extraction; Ray tracing; Stereo disparity

Orders of magnitude speedup over CPU

27

Alexnet Inference on NVIDIA Titan XP

[Chart: frames per second vs. batch size. Series: MATLAB GPU Coder (R2017b), TensorFlow (1.2.0), Caffe2 (0.8.1), mxNet (0.10), MATLAB (R2017b). Speedup annotations on the chart: 2x, 7x, 5x.]

Testing platform
CPU: Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
GPU: Pascal Titan Xp
cuDNN v5

28

Alexnet Inference on Jetson TX2: Frame-Rate Performance

[Chart: frames per second (axis 0 to 400) vs. batch size (1, 16, 32, 64, 128, 256). Series: MATLAB GPU Coder (R2017b), C++ Caffe (1.0.0-rc5), TensorRT (2.1). Annotations on the chart: 2x, 0.85x.]

30

Alexnet Inference on Jetson TX2: Memory Performance

[Chart: peak memory (MB) vs. batch size. Series: MATLAB GPU Coder (R2017b), C++ Caffe (1.0.0-rc5), TensorRT 2.1 (using giexec wrapper).]

31

Design Your DNNs in MATLAB, Deploy with GPU Coder

Design Deep Learning & Vision Algorithms
• Manage large image sets
• Automate image labeling
• Easy access to models
• Pre-built training frameworks

Accelerate and Scale Training
• Acceleration with GPUs
• Scale to clusters

High Performance Deployment
• Automate compilation with GPU Coder
• On Titan XP: 7x faster than TensorFlow, 5x faster than pyCaffe2
• On Jetson TX2: on par with TensorRT, 2x faster than C++-Caffe

32

Check Out Deep Learning in MATLAB and GPU Coder

GPU Coder

Deep learning in MATLAB

systematics.co.il\mwevents