neural optimizer search with reinforcement learning2017/11/09  · 1(op 1);u 2(op 2)) irwan bello,...

Post on 06-Aug-2020

4 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Neural Optimizer Search with Reinforcement Learning

Irwan Bello1 Barret Zoph1 Vijay Vasudevan1 Quoc V. Le1

1Google Brain

ICLR, 2017/ Presenter: Anant Kharkar

Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 1

/ 20

Outline

1 IntroductionMotivationApproach

2 MethodsDomain-Specific LanguageController RNN

3 ExperimentsOptimizer DiscoveryTransfer Experiment

4 Summary

Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 2

/ 20

Outline

1 IntroductionMotivationApproach

2 MethodsDomain-Specific LanguageController RNN

3 ExperimentsOptimizer DiscoveryTransfer Experiment

4 Summary

Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 3

/ 20

Motivation

Classical optimizers:

SGD

SGD w/Momentum

Adam

RMSProp

Combination of stochastic methods and heuristic approximations

Want to automate process of generating update rulesProduce equation, not just numerical updates

Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 4

/ 20

Outline

1 IntroductionMotivationApproach

2 MethodsDomain-Specific LanguageController RNN

3 ExperimentsOptimizer DiscoveryTransfer Experiment

4 Summary

Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 5

/ 20

Approach

RNN controller produces update rule string

Controller updated based on performance of optimizer

RL approach to training

How to generate update rules? First define space of update rules

Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 6

/ 20

Previous Work

LSTM for numerical updates (Andrychowicz et al., 2016)

Equations are more transferrable

Genetic programming for update equations (Orchard & Wang, 2016)

Slow and needs heuristics

Neural Architecture Search (Zoph & Le, 2017) - seen earlier

RNN produces network architecture

Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 7

/ 20

Outline

1 IntroductionMotivationApproach

2 MethodsDomain-Specific LanguageController RNN

3 ExperimentsOptimizer DiscoveryTransfer Experiment

4 Summary

Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 8

/ 20

Domain-Specific Language

Each optimizer has computational graph - binary expression tree

Components:

2 operands

Unary function for each operand

Binary function to combine

∆w = λ ∗ b(u1(op1), u2(op2))

Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 9

/ 20

Outline

1 IntroductionMotivationApproach

2 MethodsDomain-Specific LanguageController RNN

3 ExperimentsOptimizer DiscoveryTransfer Experiment

4 Summary

Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 10

/ 20

Controller RNN

Trained with Adam

Objective function: J(θ) = E∆∼pθ(.)[R(∆)]Optimize reward (accuracy of target model)

Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 11

/ 20

Search Space

Operands

Gradients: g , g2, g3, sign(g)Moving averages: m, v , y , sign(m)Weights: 10−4w , 10−3w , 10−2w , 10−1wADAM, RMSProp, 1, small noise

Unary Functions

x ,−x , ex , log |x |, clip, drop, signBinary Functions

x + y , x − y , x ∗ y , xy+ε

Optimizers tested on 3x3 ConvNet (32 filters) for 5 epochs

Favors early progress

Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 12

/ 20

Outline

1 IntroductionMotivationApproach

2 MethodsDomain-Specific LanguageController RNN

3 ExperimentsOptimizer DiscoveryTransfer Experiment

4 Summary

Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 13

/ 20

Optimizer Discovery

Recurring element:esign(g)∗sign(m) ∗ g

If sign(g) agrees with running average, scale e - g keeps decreasing

Else scale 1e - gradient direction changed

Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 14

/ 20

Outline

1 IntroductionMotivationApproach

2 MethodsDomain-Specific LanguageController RNN

3 ExperimentsOptimizer DiscoveryTransfer Experiment

4 Summary

Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 15

/ 20

Rosenbrock Function

Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 16

/ 20

CIFAR-10

Wide ResNet (Zagoruyko & Komodakis, 2016) - 300 epochs

Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 17

/ 20

CIFAR-10

Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 18

/ 20

Neural Machine Translation

Completely different model & task: WMT 2014 English → German taskGNMT model - 8 LSTM layers

Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 19

/ 20

Summary

RNN generates optimizer equations

Train RNN via RL setup

Optimizers tested on small ConvNet

New optimizers on par with state of the art

Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (Northwestern University and Intel Corporation)Neural Optimizer Search with Reinforcement LearningICLR, 2017/ Presenter: Anant Kharkar 20

/ 20

top related