deep neural networks a practical optimization algorithm for · a practical optimization algorithm...

30
Neumann Optimizer A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception Research

Upload: others

Post on 28-Mar-2021

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception

Neumann OptimizerA Practical Optimization Algorithm for

Deep Neural Networks

Shankar Krishnan, Ying Xiao, Rif A. SaurousGoogle AI Perception Research

Page 2: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception

Training Deep Networks

● Loss function is non-convex

● Finite sample size

● Lower loss doesn’t necessarily imply better model performance

● Computational hurdles

Page 3: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception

Iterative Methods

Create function approximation around the current iterate

Different methods result depending on information used for the approximation

Page 4: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception

Iterative Methods

Create function approximation around the current iterate

Different methods result depending on information used for the approximation

1st order:

Page 5: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception

Iterative Methods

Create function approximation around the current iterate

Different methods result depending on information used for the approximation

1st order:

2nd order:

Page 6: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception

Iterative Methods

Create function approximation around the current iterate

Different methods result depending on information used for the approximation

1st order:

2nd order:

Cubic reg.:

Page 7: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception

Neumann Optimizer

● Designed for large batch setting

● Incorporates second order information, without explicitly representing

the Hessian

● Based on cubic regularized approximation

● Total per step cost is same as Adam

Page 8: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception

Neumann Optimizer - Results

● Linear speedup with increasing batch size

● 0.8 - 0.9% test accuracy improvements over baseline on image

models

● 10-30% faster than current methods for comparable computational

resources

Page 9: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception

Two Loop Algorithm

● For each mini-batch: solve the quadratic problem of the mini-batch to “good” accuracy.

● Use the solutions of the quadratic problem as updates.

● Hypothesis: with very large mini-batches, the cost-benefit is in favour of extracting more information.

Page 10: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception

Inner loop: solving a quadratic via power series

How to solve the quadratic?

Use the geometric (Neumann) series for inverse, if

This implies the iteration:

Page 11: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception

Eliminate the Hessian calculation (too expensive)

Back to the mini-batch problem:

This implies an iteration:

Page 12: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception

Convex Quadratics

Page 13: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception

Some tweaks for actual neural networksCubic regularizer (convexification of overall problem) + Repulsive regularizer

Convexification of each mini-batch.

Page 14: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception

Spectrum of the Hessian

● Studied on various image models● Lanczos method during

optimization● Similar behavior of extreme

eigenvalues● Consistent with other studies

[Sagun ‘16, Chaudhari ‘16, Dauphin ‘14]

Page 15: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception

Spectrum of the Hessian

● Studied on various image models● Lanczos method during

optimization● Similar behavior of extreme

eigenvalues● Consistent with other studies

[Sagun ‘16, Chaudhari ‘16, Dauphin ‘14]

Page 16: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception

Our Algorithm

Page 17: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception

Experimental Results

● Used image models for experimentation

○ Cifar10, Cifar100, Imagenet

● Various network architectures

○ Resnet-V1, Inception-V3, Inception-Resnet-V2

● Training on GPUs (P100, K40)

○ Multiple GPUs in sync mode for large batch training

● Update steps/epochs to measure performance

Page 18: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception

Experiments on ImageNet (Inception V3)

Page 19: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception

Experiments on ImageNet (Resnet Architectures)

Page 20: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception

Final Top-1 Validation Error

Page 21: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception

Large batch training

Page 22: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception

Scaling Performance on Resnet-50

Page 23: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception

Mini-batch Training

Stochastic methods use unbiased estimates to carry out optimization

For first order methods with batch size B,

Page 24: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception

Mini-batch Training

Stochastic methods use unbiased estimates to carry out optimization

For first order methods with batch size B,

Page 25: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception

Mini-batch Training

Stochastic methods use unbiased estimates to carry out optimization

For first order methods with batch size B,

After T steps:

Page 26: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception

Mini-batch Training

Stochastic methods use unbiased estimates to carry out optimization

For first order methods with batch size B,

After T steps:

Page 27: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception

Conclusion and Future Work

● Stochastic optimization algorithm for training deep nets

● Exhibits linear speedup with batch size

● Gives better model performance

● Total per step cost on par with existing popular optimizers

Page 28: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception

Conclusion and Future Work

● Stochastic optimization algorithm for training deep nets

● Exhibits linear speedup with batch size

● Gives better model performance

● Total per step cost on par with existing popular optimizers

● Other model architectures and models

○ Preconditioned Neumann

● Experiment on TPUs

Page 29: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception

Thank you!

Paper: https://goo.gl/7M9Avr

Code: Tensorflow implementation coming soon

Page 30: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception

Ablation Experiment