Introduction to practical aspects of Deep Learning Yuping Luo

Post on 25-May-2020

Page 1: Introduction to practical aspects of Deep Learning · Basic Pipeline: 1. (Installation and Import) 2. Data Loading 3. Network Architecture 4. Optimization (train) 5. Evaluation

Introduction to practical aspects of Deep Learning Yuping Luo

Page 2:

Introduction to practical aspects of Deep Learning PyTorch Yuping Luo

Page 3:

Why use a framework?

● Performance;

● Fewer bugs;

● Code reuse (backpropagation, convolution, etc.);

● Community;

● ...

Page 4:

Basic Pipeline

1. (Installation and Import)

2. Data Loading

3. Network Architecture

4. Optimization (train)

5. Evaluation

Page 5:

Installation and Import

$ pip install torch torchvision

# Optimizers

# Basic Tensor operations

# Modules and layers (class style)

# Modules and layers (function style)
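The import lines themselves did not survive in this transcript. Assuming the conventional PyTorch aliases (`nn`, `F`, `optim` are the customary names, not confirmed by the slide), the four comments above plausibly correspond to the imports below, followed by a quick check that the installation works:

```python
import torch                      # Basic Tensor operations
import torch.nn as nn             # Modules and layers (class style)
import torch.nn.functional as F   # Modules and layers (function style)
import torch.optim as optim       # Optimizers

# Sanity check: one layer, one activation, one optimizer.
x = torch.ones(2, 3)
layer = nn.Linear(3, 4)                              # class style: holds parameters
y = F.relu(layer(x))                                 # function style: stateless
optimizer = optim.SGD(layer.parameters(), lr=0.1)
print(y.shape)
```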

Page 6:

Data Loading

Data preprocessing
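The code for this slide is missing from the transcript. As a minimal sketch of loading and preprocessing, the example below wraps tensors in a `TensorDataset` and batches them with a `DataLoader`. The synthetic "images", the batch size, and the normalization step are assumptions made for illustration, not the data used in the talk.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for a real dataset: 256 grayscale 28x28 images, 10 classes.
images = torch.rand(256, 1, 28, 28)
labels = torch.randint(0, 10, (256,))

# Data preprocessing: normalize to zero mean and unit variance,
# using statistics computed on the training data.
mean, std = images.mean(), images.std()
images = (images - mean) / std

train_set = TensorDataset(images, labels)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)

batch_images, batch_labels = next(iter(train_loader))
print(batch_images.shape, batch_labels.shape)
```

With torchvision, the same role is usually played by a dataset class such as `torchvision.datasets.MNIST` combined with `transforms.ToTensor()` and `transforms.Normalize(...)`.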

Page 7:

Network Architecture and Optimizer

Move parameters to device (cpu or cuda)
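A minimal sketch of this step, assuming a small fully connected classifier for 28x28 inputs. The class name `Net`, the layer sizes, and the SGD settings are hypothetical choices, not taken from the slide.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim


class Net(nn.Module):  # hypothetical architecture, for illustration only
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.flatten(start_dim=1)      # (N, 1, 28, 28) -> (N, 784)
        x = F.relu(self.fc1(x))
        return self.fc2(x)              # raw logits, one per class


# Move parameters to device (cpu or cuda).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = Net().to(device)

# Create the optimizer from the model's parameters.
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

logits = model(torch.zeros(4, 1, 28, 28, device=device))
print(logits.shape)
```

Inputs must live on the same device as the parameters, which is why the dummy batch is created with `device=device`.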

Page 8:

Training

Use train mode: Dropout/BatchNorm/etc.

compute gradient

apply gradient
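The three annotations above map onto one epoch of training roughly as follows. This is a sketch under assumptions: the `train` signature, the cross-entropy loss, and the toy data are illustrative choices, since the slide's own code is not in the transcript.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


def train(model, device, loader, optimizer):
    """Run one epoch and return the mean training loss."""
    model.train()  # train mode: Dropout is active, BatchNorm uses batch statistics
    total_loss = 0.0
    for data, target in loader:
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()                       # clear gradients of the previous step
        loss = F.cross_entropy(model(data), target)
        loss.backward()                             # compute gradient
        optimizer.step()                            # apply gradient
        total_loss += loss.item() * data.size(0)
    return total_loss / len(loader.dataset)


# Toy problem (made up): the label is the sign of the first feature.
torch.manual_seed(0)
X = torch.randn(128, 5)
y = (X[:, 0] > 0).long()
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

device = torch.device("cpu")
model = nn.Linear(5, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)
losses = [train(model, device, loader, optimizer) for _ in range(20)]
print(f"first epoch: {losses[0]:.3f}, last epoch: {losses[-1]:.3f}")
```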

Page 9:

Evaluation

+ Average over multiple random seeds!

don’t need gradient

Use eval mode: Dropout/BatchNorm/etc.
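A sketch of an evaluation routine that follows these notes: switch the model to eval mode, disable gradient tracking, and average the metric over several seeds. The `evaluate` signature, the small network, and the toy data are assumptions, and the training step for each seed is left out for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, device, loader):
    """Return (mean loss, accuracy) over the whole loader."""
    model.eval()  # eval mode: Dropout is disabled, BatchNorm uses running statistics
    loss_sum, correct = 0.0, 0
    with torch.no_grad():  # no gradient needed, so skip building the graph
        for data, target in loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            loss_sum += F.cross_entropy(output, target, reduction="sum").item()
            correct += (output.argmax(dim=1) == target).sum().item()
    n = len(loader.dataset)
    return loss_sum / n, correct / n


# Toy test set (made up for illustration).
torch.manual_seed(0)
X = torch.randn(64, 5)
y = (X[:, 0] > 0).long()
loader = DataLoader(TensorDataset(X, y), batch_size=16)

# Average over multiple random seeds: one model per seed.
accuracies = []
for seed in range(3):
    torch.manual_seed(seed)
    model = nn.Sequential(nn.Linear(5, 16), nn.ReLU(), nn.Dropout(0.5), nn.Linear(16, 2))
    # ... training with this seed would go here (omitted in this sketch) ...
    _, accuracy = evaluate(model, torch.device("cpu"), loader)
    accuracies.append(accuracy)

mean_accuracy = sum(accuracies) / len(accuracies)
print(f"mean accuracy over {len(accuracies)} seeds: {mean_accuracy:.3f}")
```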

Page 10:

Main Loop

Page 11:

Run!

Page 12:

nn.Module: Flexible Network Architecture

A container of parameters.

Nested module
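To illustrate both points, here is a small sketch in which one module contains others. The class names `Block` and `Net` are hypothetical. Parameters of nested modules are registered automatically and show up under dotted names.

```python
import torch
import torch.nn as nn


class Block(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x):
        return torch.relu(self.linear(x))


class Net(nn.Module):
    def __init__(self, dim, depth):
        super().__init__()
        # nn.ModuleList registers each nested Block (a plain Python list would not).
        self.blocks = nn.ModuleList([Block(dim) for _ in range(depth)])
        self.head = nn.Linear(dim, 1)

    def forward(self, x):
        for block in self.blocks:
            x = block(x)
        return self.head(x)


net = Net(dim=8, depth=3)
names = [name for name, _ in net.named_parameters()]
n_params = sum(p.numel() for p in net.parameters())
print(names[0], n_params)
```

Because the container tracks every nested parameter, a single `net.parameters()` call is enough to hand the whole model to an optimizer.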

Page 13:

nn.Module: Flexible Network Architecture

He, Kaiming, et al.
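The figure on this slide is not in the transcript. Assuming the citation refers to the residual networks of He et al., a residual block is a natural example of an architecture that is easy to express as an `nn.Module`. The sketch below is a simplified block with a fixed channel count. The class name and layer choices are illustrative, not a reproduction of the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualBlock(nn.Module):  # hypothetical, simplified block
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # skip connection: add the input back


block = ResidualBlock(8)
x = torch.randn(2, 8, 16, 16)
out = block(x)
print(out.shape)
```

The skip connection is ordinary Python in `forward`, which is the flexibility the slide title refers to.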

Page 14:

Inplace Operations

Do NOT use inplace operations if you require grads.

Live examples!
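The live examples are not part of the transcript. As a stand-in, the sketch below shows one common way the problem appears: `exp` keeps its output for the backward pass, so modifying that output in place makes the later `backward()` call fail. The tensors and values are made up for illustration.

```python
import math

import torch

# In-place version: the saved output of exp() is overwritten.
x = torch.ones(3, requires_grad=True)
y = x.exp()
y.add_(1)  # in-place edit of a tensor that autograd still needs
try:
    y.sum().backward()
    failed = False
except RuntimeError as err:
    failed = True
    print("backward failed:", str(err)[:60], "...")

# Out-of-place version: a new tensor is created, the saved output stays intact.
x2 = torch.ones(3, requires_grad=True)
y2 = x2.exp()
y2 = y2 + 1
y2.sum().backward()
expected = torch.full((3,), math.e)  # d/dx (exp(x) + 1) at x = 1
print(x2.grad)
```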

Page 15:

GPU

1. GPU/CPU interaction is slow.

2. Large batch size.

a. Data parallel (mostly used)

b. Async gradient update (ASGD, etc.)

3. Floating point: 32-bit (float) vs 64-bit (double) vs 16-bit (half)

4. nvidia-smi

5. Async preprocessing (by CPU).
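Two of these points can be sketched in a few lines: the memory cost per element of each floating-point type (item 3), and keeping a running statistic on the device so that values are copied to the CPU only once (item 1). The loop and the random "loss" values are placeholders, not a real training run.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Bytes per element: half, float, double.
sizes = {
    dtype: torch.empty(0, dtype=dtype).element_size()
    for dtype in (torch.float16, torch.float32, torch.float64)
}
print(sizes)

# Accumulate on the device; calling .item() every step would force a
# GPU-to-CPU transfer (and a synchronization) per iteration.
n_steps = 100
running = torch.zeros((), device=device)
for step in range(n_steps):
    loss = torch.rand((), device=device)  # stand-in for a per-batch loss
    running += loss.detach()              # stays on the device
mean_loss = (running / n_steps).item()    # single transfer at the end
print(mean_loss)
```

For item 5, `torch.utils.data.DataLoader` accepts `num_workers` and `pin_memory` arguments, which are the usual way to overlap CPU preprocessing with GPU computation.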

Page 16:

Reproducibility

1. Easier to debug.

2. Fix random seed!

3. Make a copy of source code / command lines.
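A minimal sketch of item 2, assuming randomness comes from Python's `random` module and from PyTorch. The helper name `set_seed` is hypothetical. If NumPy is used as well, `numpy.random.seed` should be called in the same place.

```python
import random

import torch


def set_seed(seed):
    """Seed the random number generators used in this sketch."""
    random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable


set_seed(0)
a = torch.rand(3)
set_seed(0)
b = torch.rand(3)
print(torch.equal(a, b))
```

Fixing the seed does not by itself guarantee bit-identical results on a GPU, since some CUDA kernels are nondeterministic.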

Page 17:

...TensorFlow

1. Make fewer calls to sess.run: each call has a large overhead in TensorFlow.

2. Debug?

a. tf.Print

b. sess.run('Add:0')

c. ...or eager mode

Page 18:

Overfitting

1. Regularization (L2, etc.);

2. Dropout;

3. Data augmentation;

4. Smaller network;

5. Early stopping;

6. ...
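Three of the remedies above can be sketched briefly: L2 regularization through the optimizer's `weight_decay` argument, a `Dropout` layer, and a patience-based early-stopping rule. The layer sizes, the hyperparameter values, the helper `early_stop_epoch`, and the validation losses are all made up for illustration.

```python
import torch.nn as nn
import torch.optim as optim

# Dropout as a layer; L2 regularization via weight_decay.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(64, 2))
optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)


def early_stop_epoch(val_losses, patience):
    """Return the epoch at which training stops: the first epoch that is
    `patience` epochs past the best validation loss seen so far."""
    best, best_epoch = float("inf"), -1
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return len(val_losses) - 1


val_losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.9]  # made-up validation curve
stop = early_stop_epoch(val_losses, patience=3)
print(stop)
```

In practice the weights from the best epoch (epoch 2 in this made-up curve) are the ones to keep, not those from the epoch where training stopped.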

Page 19:

Hyperparameter Tuning

1. Coordinate Descent;

2. Grid Search;

3. Random Search;

4. …
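Grid search and random search can be contrasted with a toy objective. The function `validation_loss` is a hypothetical stand-in for "train a model with these settings and report the validation loss". The search ranges and the budget of nine trials are arbitrary.

```python
import itertools
import math
import random


def validation_loss(lr, weight_decay):
    # Hypothetical stand-in: smallest near lr = 1e-2, weight_decay = 1e-4.
    return (math.log10(lr) + 2) ** 2 + (math.log10(weight_decay) + 4) ** 2


# Grid search: every combination of a few hand-picked values.
lrs = [1e-1, 1e-2, 1e-3]
weight_decays = [1e-3, 1e-4, 1e-5]
grid_best = min(itertools.product(lrs, weight_decays),
                key=lambda cfg: validation_loss(*cfg))

# Random search: same budget (9 trials), sampled log-uniformly.
rng = random.Random(0)
samples = [(10 ** rng.uniform(-4, 0), 10 ** rng.uniform(-6, -2)) for _ in range(9)]
random_best = min(samples, key=lambda cfg: validation_loss(*cfg))

print("grid:", grid_best)
print("random:", random_best)
```

Sampling learning rates and regularization strengths on a log scale is a common choice because their useful values span several orders of magnitude.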

Page 20:

Advanced Techniques

Page 21:

Hessian-vector product

+ Why not store the whole matrix?

+ Quadratic Form

+ Minimizer

+ Conjugate Gradient only requires computing Hv
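The formulas on this slide are not in the transcript. One standard way to write the argument, with the sign convention and the symbols $b$, $x^\star$, $L$ chosen here as assumptions, is the following. For a model with $n$ parameters the Hessian $H$ has $n^2$ entries, which is why storing it is impractical. A product $Hv$, by contrast, is only a vector of length $n$.

```latex
\begin{align}
  % Quadratic form with symmetric positive definite H
  f(x) &= \tfrac{1}{2}\, x^\top H x - b^\top x,
  & \nabla f(x) &= H x - b, \\
  % Minimizer: the solution of the linear system H x = b
  x^\star &= \arg\min_x f(x) = H^{-1} b, \\
  % Hessian-vector product as the gradient of a scalar (v held constant)
  H v &= \nabla_x \big( \nabla_x L(x)^\top v \big).
\end{align}
```

Conjugate Gradient solves $Hx = b$ using only products of the form $Hv$, so the minimizer can be approximated without ever forming $H$.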

Page 22:

Hessian-vector product

symbolic
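The code for these slides is missing from the transcript. A common way to compute a Hessian-vector product in PyTorch is to differentiate twice: take the gradient with `create_graph=True`, form its dot product with `v`, and differentiate that scalar. The sketch below does this and then uses it inside a plain Conjugate Gradient loop. The helper names `hvp` and `conjugate_gradient` and the 2x2 test matrix are illustrative assumptions.

```python
import torch


def hvp(f, x, v):
    """Hessian-vector product H(x) v by double backward, without forming H."""
    x = x.detach().requires_grad_(True)
    (grad,) = torch.autograd.grad(f(x), x, create_graph=True)
    (hv,) = torch.autograd.grad(grad @ v, x)
    return hv


def conjugate_gradient(matvec, b, iters=10, tol=1e-10):
    """Solve H x = b for symmetric positive definite H, using only matvec(p) = H p."""
    x = torch.zeros_like(b)
    r = b.clone()  # residual b - H x, with x = 0
    p = r.clone()
    rs = r @ r
    for _ in range(iters):
        Hp = matvec(p)
        alpha = rs / (p @ Hp)
        x = x + alpha * p
        r = r - alpha * Hp
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


# Quadratic test function whose Hessian is the symmetric matrix A.
A = torch.tensor([[2.0, 1.0], [1.0, 3.0]])
f = lambda x: 0.5 * x @ A @ x
x0 = torch.tensor([1.0, -1.0])

hv = hvp(f, x0, torch.tensor([1.0, 2.0]))  # equals A @ [1, 2]
b = torch.tensor([1.0, 2.0])
solution = conjugate_gradient(lambda p: hvp(f, x0, p), b)  # solves A x = b
print(hv, solution)
```

Each call to `hvp` costs roughly two backward passes, regardless of the number of parameters.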

Page 23:

Hessian-vector product

Page 24:

Hessian-vector product

Page 25:

Gradient Checkpointing

From Rhu, Minsoo, et al.

Page 26:

Gradient Checkpointing

https://github.com/openai/gradient-checkpointing

Page 27:

Gradient Checkpointing

Page 28:

Gradient Checkpointing

Re-compute

Page 29:

Gradient Checkpointing

Re-compute
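The repository linked earlier targets TensorFlow. In PyTorch, the same idea is available through `torch.utils.checkpoint.checkpoint`: activations inside a checkpointed segment are discarded after the forward pass and re-computed during the backward pass. The sketch below assumes a recent PyTorch release that accepts the `use_reentrant` argument. The stack of small blocks is made up for illustration, and the final comparison checks that the gradients match the ordinary forward pass.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)
blocks = nn.ModuleList(
    [nn.Sequential(nn.Linear(32, 32), nn.Tanh()) for _ in range(4)]
)


def forward(x, use_checkpoint):
    for block in blocks:
        if use_checkpoint:
            # Do not keep this block's intermediate activations;
            # re-compute them when backward reaches the block.
            x = checkpoint(block, x, use_reentrant=False)
        else:
            x = block(x)
    return x.sum()


x = torch.randn(8, 32, requires_grad=True)

forward(x, use_checkpoint=False).backward()
grad_plain = x.grad.clone()

x.grad = None
forward(x, use_checkpoint=True).backward()
grad_checkpointed = x.grad.clone()

print(torch.allclose(grad_plain, grad_checkpointed))
```

The trade-off is extra computation (each checkpointed block runs its forward pass twice) in exchange for lower peak activation memory.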

Page 30:

Gradient Checkpointing