the long road to model deployment · 2018-04-11 · •hyper-parameters 5 categorical parameters of...

…or how to make a good (machine learning) model great.

The Long Road to Model Deployment

GTC, March 2018

Greg Heinrich


[Figure panels: Exemplar model; A less successful trial]

Same model design, same data, different parameters.

MODELS IN ACTION


[Figure panels: Object detector; Concorde]

It is as easy as flying a Concorde.

PARAMETER TUNING

• Topology parameters
  ▪ Number of layers and their width
  ▪ Choice of activations

• Training parameters
  ▪ Learning rate
  ▪ Batch size
  ▪ Choice of optimizer
  ▪ Normalization
  ▪ Number of iterations

• Data parameters
  ▪ Spatial augmentation
  ▪ Color augmentation
  ▪ Rasterization
  ▪ Number of training samples

Source: Christian Kath


Hypothetical example

What is the scale of the problem?

PARAMETER TUNING

• Data
  ▪ 5 cars, 6 cameras, 1 month, 10 frames per minute per camera → 1.3M images
  ▪ 60 epochs

• Hyper-parameters
  ▪ 5 categorical parameters of 4 values each
  ▪ 5 continuous parameters

• Quasi-exhaustive (“grid”) search
  ▪ Explore only 3 values of each continuous parameter → 4^5 × 3^5 ≈ 250k jobs
  ▪ At 10 ms/image, run one after another → 6 thousand years (!)

• Random search
  ▪ May reduce total time by orders of magnitude: 50 jobs → 1.25 years
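The figures on this slide can be reproduced with a few lines of arithmetic. The sketch below takes the slide's numbers as given (1.3M images, 60 epochs, 10 ms/image) and assumes every job runs on a single GPU, one job after another; the function names are illustrative only.

```python
# Back-of-the-envelope cost of the hypothetical search, using the slide's figures.
IMAGES = 1.3e6             # training images (figure quoted on the slide)
EPOCHS = 60
SECONDS_PER_IMAGE = 0.010  # 10 ms/image

SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365


def job_days():
    """Training time of one job on one GPU, in days."""
    return IMAGES * EPOCHS * SECONDS_PER_IMAGE / SECONDS_PER_DAY


def grid_jobs(n_categorical=5, values_per_categorical=4,
              n_continuous=5, values_per_continuous=3):
    """Number of jobs in a full grid over all parameter combinations."""
    return (values_per_categorical ** n_categorical
            * values_per_continuous ** n_continuous)


def total_years(n_jobs):
    """Total time if the jobs run one after another (assumption)."""
    return n_jobs * job_days() / DAYS_PER_YEAR


if __name__ == "__main__":
    print(f"one job:       {job_days():.1f} days")
    print(f"grid search:   {grid_jobs()} jobs, {total_years(grid_jobs()):.0f} years")
    print(f"random search: 50 jobs, {total_years(50):.2f} years")
```

The grid has 4^5 × 3^5 = 248,832 combinations, which the slide rounds to 250k, and a single job takes about nine days.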

[Workflow diagram: Model, Experiment, Dataset]


Divide & conquer

• Run the 50 jobs in parallel

• Use 8 GPUs per job

• Total time → 1 day
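The one-day figure follows if each job speeds up linearly with the number of GPUs. The sketch below makes that assumption explicit through a hypothetical `scaling_efficiency` parameter (1.0 means perfect linear scaling, which real multi-GPU training rarely reaches).

```python
def wall_clock_days(single_gpu_job_days, gpus_per_job, scaling_efficiency=1.0):
    """Wall-clock time when all jobs run concurrently, each on its own GPUs."""
    if gpus_per_job < 1:
        raise ValueError("gpus_per_job must be at least 1")
    if not 0.0 < scaling_efficiency <= 1.0:
        raise ValueError("scaling_efficiency must be in (0, 1]")
    return single_gpu_job_days / (gpus_per_job * scaling_efficiency)


# One job on one GPU: 1.3M images x 60 epochs x 10 ms/image, in days.
SINGLE_GPU_JOB_DAYS = 1.3e6 * 60 * 0.010 / 86_400

if __name__ == "__main__":
    print(f"GPUs needed: {50 * 8}")
    print(f"ideal scaling: {wall_clock_days(SINGLE_GPU_JOB_DAYS, 8):.2f} days")
    print(f"80% efficiency: {wall_clock_days(SINGLE_GPU_JOB_DAYS, 8, 0.8):.2f} days")
```

Because the 50 jobs are independent, the wall-clock time is that of a single job; the cost moves from time to hardware (50 × 8 = 400 GPUs).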

Parallelism comes to the rescue

PARAMETER TUNING

[Diagram: workflow steps Build Experiments, Use Datasets, Run Training, Analyze Results; components Dataset Service, Experiment Service, Workflow Manager, Training Cluster (10’s of thousands of GPUs); output Metrics]


One service, multiple identical workers

PARAMETER TUNING

[Diagram: each Worker talks to the Experiment/hyperopt Service and loops through Get Params → Train → Evaluate → Report Metrics → Continue?]
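The worker loop can be sketched as follows. This is a minimal illustration, not the actual service: `ExperimentService`, `get_params`, `report`, and `train_and_evaluate` are hypothetical names, the service is an in-memory stand-in that hands out random-search samples, and the "training" is a toy function so the example runs instantly.

```python
import math
import random


class ExperimentService:
    """In-memory stand-in for the experiment/hyperopt service (hypothetical API)."""

    def __init__(self, max_trials, seed=0):
        self._rng = random.Random(seed)
        self._remaining = max_trials
        self.results = []  # list of (params, metric) pairs

    def get_params(self):
        """Hand out one random-search sample, or None once the budget is spent."""
        if self._remaining == 0:
            return None
        self._remaining -= 1
        return {
            "optimizer": self._rng.choice(["sgd", "adam", "rmsprop", "adagrad"]),
            "batch_size": self._rng.choice([16, 32, 64, 128]),
            # Sample the learning rate log-uniformly between 1e-5 and 1e-1.
            "learning_rate": 10 ** self._rng.uniform(-5, -1),
        }

    def report(self, params, metric):
        self.results.append((params, metric))


def train_and_evaluate(params):
    """Toy objective standing in for real training; it peaks at a rate of 1e-3."""
    return 1.0 / (1.0 + abs(math.log10(params["learning_rate"]) + 3.0))


def worker(service):
    """Get Params -> Train -> Evaluate -> Report Metrics -> Continue?"""
    while True:
        params = service.get_params()
        if params is None:  # the service says stop
            break
        service.report(params, train_and_evaluate(params))


service = ExperimentService(max_trials=50)
worker(service)
best_params, best_metric = max(service.results, key=lambda result: result[1])
print(best_params, round(best_metric, 3))
```

Since a worker holds no state of its own, many identical workers could poll the same service concurrently; a real service would also need to handle concurrent requests and failed jobs.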


Jupyter notebook

Collecting and analyzing results

PARAMETER TUNING


[Plot panels: Overall parameter sensitivity; When accuracy is over 60%]

Which parameters have the most impact on metrics?

PARAMETER TUNING


[Plot panels: Learning rate; Batch size]

Zooming in on important parameters

PARAMETER TUNING


Eliminating underperformers

PARAMETER TUNING


Diminishing returns

• Adding data helps

• Increasing accuracy through more data is increasingly expensive

Do we need more?

DATA


Active learning

• Collecting data is relatively cheap

• Annotating data is very expensive

• Use trained model to select next frame to annotate

• Strategies:
  ▪ Maximum variance
  ▪ Maximum entropy
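As an illustration of the maximum-entropy strategy, the sketch below ranks unannotated frames by the Shannon entropy of the model's predicted class distribution and picks the most uncertain ones. It assumes one probability vector per frame, which is a simplification for an object detector; the frame ids and probabilities are made up for the example.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_annotation(predictions, k=1):
    """Return the ids of the k frames whose predictions have the highest entropy."""
    ranked = sorted(predictions,
                    key=lambda frame_id: entropy(predictions[frame_id]),
                    reverse=True)
    return ranked[:k]


# Hypothetical model outputs on three unannotated frames.
predictions = {
    "frame_a": [0.98, 0.01, 0.01],  # confident prediction, low entropy
    "frame_b": [0.40, 0.35, 0.25],  # uncertain prediction, high entropy
    "frame_c": [0.70, 0.20, 0.10],
}

print(select_for_annotation(predictions, k=2))
```

The selected frames would be sent for annotation, added to the annotated dataset, and the model retrained before the next selection round.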

Selecting better data

DATA

[Diagram: active learning loop: Annotated dataset → Train → Trained model → Select image to annotate (from Unannotated dataset) → Annotate → Annotated dataset]

See: Adam Lesnikowsky’s talk on Deep Active Learning


Inference time

• Using DL framework to run inference

• DrivePX2 platform

• Time/frame (excluding data loading): 73ms

• 6 cameras → one prediction every 438ms

Great accuracy, slow response time

SELECTING BEST MODEL


Pruning unimportant neurons

REDUCING MODEL COMPLEXITY

[Figure: Unpruned network (12 neurons, 32 connections) vs. Pruned network (11 neurons, 24 connections)]


Pruning implementation

REDUCING MODEL COMPLEXITY

Selecting neurons to prune

• Exhaustive search

• Random

• Minimum weight

• Minimum activations

• Gradient based

Workflow

Train → Prune → Re-train

Results

• Min weight method, single pass

• 83% of weights removed

• Inference time: 73ms → 26ms

• 2.7x speed-up
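A minimum-weight criterion can be sketched in a few lines. The version below is only one plausible reading of the method: it scores each neuron by the L1 norm of its incoming weights and drops the weakest fraction in a single pass. The function name, the L1 scoring, and the toy layer are assumptions, not the implementation behind the slide's results.

```python
def prune_min_weight(weights, fraction):
    """Drop the given fraction of neurons with the smallest L1 weight norm.

    weights: one list of incoming weights per neuron of a layer.
    Returns (kept_indices, pruned_weights).
    """
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    norms = [sum(abs(w) for w in row) for row in weights]
    n_prune = int(len(weights) * fraction)
    # Indices of the n_prune neurons with the smallest norm.
    by_norm = sorted(range(len(weights)), key=lambda i: norms[i])
    weakest = set(by_norm[:n_prune])
    kept = [i for i in range(len(weights)) if i not in weakest]
    return kept, [weights[i] for i in kept]


# Toy layer with four neurons and three inputs each.
layer = [
    [0.90, -1.20, 0.40],
    [0.01, 0.02, -0.01],
    [-0.70, 0.30, 0.80],
    [0.05, -0.03, 0.02],
]

kept, pruned = prune_min_weight(layer, fraction=0.5)
print(kept)
```

In a real network, removing a neuron also removes the matching input weights of the next layer, and the pruned model is then re-trained as in the Train → Prune → Re-train workflow.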

See: Jose Alvarez’s talk on Model Compression


TensorRT conversion

• Inference-specific optimizations

• Platform-specific optimizations

Inference optimization

TENSORRT

[Diagram: Trained Neural Network → Import Model → TensorRT Optimizer → Serialize Engine → Optimized Plans (Plan 1, Plan 2, Plan 3)]

Results (FP32 precision)

• 26ms → 8.5ms/frame

• 3x speed-up


TensorRT conversion

• Store weights and activations in 8 bits

• Accumulations in FP16
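To make "8 bits" concrete, here is a minimal sketch of symmetric linear quantization: floats are divided by a scale and rounded to integers in [-127, 127]. Choosing the scale from the largest absolute value is the simplest possible assumption; it is not a description of how TensorRT's INT8 calibrator picks ranges, and the function names are illustrative.

```python
def quantize_int8(values):
    """Symmetric linear quantization of floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the 8-bit integers back to approximate float values."""
    return [q * scale for q in quantized]


weights = [0.50, -1.27, 0.003, 1.00]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each value is recovered to within half a quantization step, so small weights such as 0.003 collapse to zero; this is why the calibration range matters for accuracy.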

INT8 precision

TENSORRT

Results (FP16 precision)

• 8.5ms → 3.9ms/frame

• 2.2x speed-up

• Even greater speed-up with Tensor Cores on Xavier SoC!
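The per-frame timings quoted across the slides can be chained to see the cumulative effect. The sketch below uses only the figures from the slides and assumes, as the 438ms figure does, that the six cameras are processed one after another; the stage labels are shorthand.

```python
# Per-frame inference times quoted on the slides, in milliseconds.
STAGES = [
    ("DL framework", 73.0),
    ("after pruning", 26.0),
    ("TensorRT, FP32", 8.5),
    ("TensorRT, reduced precision", 3.9),
]
CAMERAS = 6


def summarize(stages, cameras):
    """Return (name, ms_per_frame, cumulative_speedup, ms_per_prediction) rows."""
    baseline = stages[0][1]
    return [(name, ms, baseline / ms, ms * cameras) for name, ms in stages]


rows = summarize(STAGES, CAMERAS)
for name, ms, speedup, per_prediction in rows:
    print(f"{name:<28} {ms:5.1f} ms/frame  {speedup:5.1f}x  "
          f"{per_prediction:6.1f} ms for {CAMERAS} cameras")
```

End to end, the time per frame drops from 73ms to 3.9ms, a cumulative speed-up of almost 19x, so one pass over all six cameras takes about 23ms instead of 438ms.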

[Diagram: INT8 calibration workflow. Processes: Pre-processor, INT8 calibrator, Train, TensorRT Optimizer, Evaluator. Artifacts: Dataset, Pre-processed images, INT8 cal file, Model, TensorRT engine, Metrics.]


Automated workflows enable traceability

AUTOMATION

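[Diagram: automated pipeline. Inputs: Code (SCM), Data, Config. Stages: Build, Data Loader, Train (several parallel jobs), Prune, Re-Train, Pick best model, TensorRT, Evaluate. Results go to a Knowledge base; the diagram also shows a Traceability firewall.]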


Q&A

For more information contact: Poonam Chitale ([email protected]), NVIDIA AV Perception Infrastructure Product Manager
