Neural Optimizer Search with Reinforcement Learning

Irwan Bello¹, Barret Zoph¹, Vijay Vasudevan¹, Quoc V. Le¹

¹Google Brain

ICLR 2017 / Presenter: Anant Kharkar

Outline

1 Introduction: Motivation, Approach

2 Methods: Domain-Specific Language, Controller RNN

3 Experiments: Optimizer Discovery, Transfer Experiment

4 Summary

Motivation

Classical optimizers:

SGD w/Momentum

RMSProp

Combination of stochastic methods and heuristic approximations

Want to automate the process of generating update rules

Produce an equation, not just numerical updates

Approach

RNN controller produces update rule string

Controller updated based on performance of optimizer

RL approach to training

How to generate update rules? First define space of update rules

Previous Work

LSTM for numerical updates (Andrychowicz et al., 2016)

Equations are more transferable

Genetic programming for update equations (Orchard & Wang, 2016)

Slow and needs heuristics

Neural Architecture Search (Zoph & Le, 2017) - seen earlier

RNN produces network architecture

Domain-Specific Language

Each optimizer has computational graph - binary expression tree

Components:

2 operands

Unary function for each operand

Binary function to combine

$\Delta w = \lambda \cdot b(u_1(op_1), u_2(op_2))$
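The template above can be sketched in Python as a function that composes two operands, two unary functions, and one binary function. This is only an illustration for scalar values, not the authors' implementation; the name `make_update_rule` and the dictionary-of-operands interface are assumptions made for this example.

```python
import math

def make_update_rule(op1, op2, u1, u2, b, lr):
    """Build delta_w = lr * b(u1(op1), u2(op2)) for scalar operands.

    op1 and op2 are keys into a dict of operand values (hypothetical interface).
    """
    def rule(operands):
        return lr * b(u1(operands[op1]), u2(operands[op2]))
    return rule

# Example rule: lr * (g * sign(m)), with identity and sign as the unary functions.
identity = lambda x: x
sign = lambda x: math.copysign(1.0, x) if x != 0 else 0.0
multiply = lambda x, y: x * y

rule = make_update_rule("g", "m", identity, sign, multiply, lr=0.1)
delta_w = rule({"g": 0.5, "m": -2.0})  # 0.1 * (0.5 * -1.0)
```

Representing a rule as a closure keeps the five choices (two operands, two unary functions, one binary function) explicit, which mirrors the structure of the expression tree.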

Controller RNN

Trained with Adam

Objective function: $J(\theta) = \mathbb{E}_{\Delta \sim p_\theta(\cdot)}[R(\Delta)]$

Optimize reward (accuracy of target model)
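One common way to maximize an expected-reward objective of this form is a score-function (REINFORCE-style) gradient estimate. The sketch below applies it to a toy categorical policy over three hypothetical choices with made-up rewards. It illustrates the idea only: it is not the controller RNN, and the slides do not say which policy-gradient variant was used.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_step(logits, reward_fn, rng, lr=0.1, n_samples=64):
    """One score-function gradient ascent step on J = E[R(choice)]."""
    probs = softmax(logits)
    samples = rng.choices(range(len(logits)), weights=probs, k=n_samples)
    rewards = [reward_fn(a) for a in samples]
    baseline = sum(rewards) / n_samples  # mean reward, used to reduce variance
    grad = [0.0] * len(logits)
    for a, r in zip(samples, rewards):
        for k in range(len(logits)):
            # d/d logit_k of log softmax(logits)[a] is 1[k == a] - probs[k]
            indicator = 1.0 if k == a else 0.0
            grad[k] += (r - baseline) * (indicator - probs[k]) / n_samples
    return [l + lr * g for l, g in zip(logits, grad)]

# Toy rewards (made up): choice 2 plays the role of the best optimizer.
toy_rewards = [0.2, 0.5, 0.9]
rng = random.Random(0)
logits = [0.0, 0.0, 0.0]
for _ in range(500):
    logits = reinforce_step(logits, lambda a: toy_rewards[a], rng)
final_probs = softmax(logits)
```

In the setting described on the slide, the sampled choice would be an entire update-rule string and the reward would come from training the target model with it.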

Search Space

Operands

Gradients: g, g^2, g^3, sign(g)

Moving averages: m, v, y, sign(m)

Weights: 10^-4 w, 10^-3 w, 10^-2 w, 10^-1 w

ADAM, RMSProp, 1, small noise

Unary Functions

x, −x, e^x, log|x|, clip, drop, sign

Binary Functions

x + y, x − y, x ∗ y, x / (y + ε)
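To get a feel for the size of this space, the sketch below samples a rule string uniformly at random from the tokens listed on the slide and counts the combinations for a single expression of the form b(u1(op1), u2(op2)). Uniform sampling is a stand-in here, not the controller RNN, and the token spellings (`noise`, `eps`, `lr`) are assumptions made for readability.

```python
import random

# Candidate tokens copied from the slide; the string spellings are illustrative.
OPERANDS = ["g", "g^2", "g^3", "sign(g)", "m", "v", "y", "sign(m)",
            "1e-4*w", "1e-3*w", "1e-2*w", "1e-1*w",
            "ADAM", "RMSProp", "1", "noise"]
UNARY = ["{x}", "-{x}", "exp({x})", "log|{x}|",
         "clip({x})", "drop({x})", "sign({x})"]
BINARY = ["{a} + {b}", "{a} - {b}", "{a} * {b}", "{a} / ({b} + eps)"]

def sample_rule(rng):
    """Sample one update-rule string uniformly at random."""
    op1, op2 = rng.choice(OPERANDS), rng.choice(OPERANDS)
    u1, u2 = rng.choice(UNARY), rng.choice(UNARY)
    binary = rng.choice(BINARY)
    return "lr * (" + binary.format(a=u1.format(x=op1), b=u2.format(x=op2)) + ")"

def count_rules():
    """Number of single-expression rules built from the lists above."""
    return len(OPERANDS) ** 2 * len(UNARY) ** 2 * len(BINARY)

rule_string = sample_rule(random.Random(0))
n_rules = count_rules()  # 16^2 * 7^2 * 4 = 50176
```

The count only covers the token lists as transcribed here; a longer list of operands or functions would grow it multiplicatively.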

Optimizers tested on 3x3 ConvNet (32 filters) for 5 epochs

Favors early progress

Optimizer Discovery

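Recurring element: $e^{\mathrm{sign}(g) \cdot \mathrm{sign}(m)} \cdot g$

If sign(g) agrees with the sign of the running average m, the gradient is scaled by e: the gradient direction is consistent

Otherwise it is scaled by 1/e: the gradient direction has changed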
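The two cases can be checked numerically with a scalar version of the recurring element. This is a minimal sketch; the function name `sign_agreement_update` is made up for this example, and a real optimizer would apply the expression elementwise to tensors and multiply by a learning rate.

```python
import math

def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def sign_agreement_update(g, m):
    """Return e^(sign(g) * sign(m)) * g for scalar gradient g and moving average m."""
    return math.exp(sign(g) * sign(m)) * g

agree = sign_agreement_update(0.5, 0.3)      # signs match: 0.5 * e
disagree = sign_agreement_update(0.5, -0.3)  # signs differ: 0.5 / e
```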

Rosenbrock Function
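For reference, the Rosenbrock function in its standard two-dimensional form and its gradient can be written as below. The slide does not state which parameters were used, so the common defaults a = 1 and b = 100 are an assumption.

```python
def rosenbrock(x, y, a=1.0, b=100.0):
    """Standard Rosenbrock function: (a - x)^2 + b * (y - x^2)^2."""
    return (a - x) ** 2 + b * (y - x ** 2) ** 2

def rosenbrock_grad(x, y, a=1.0, b=100.0):
    """Analytic gradient (df/dx, df/dy) of the Rosenbrock function."""
    dx = -2.0 * (a - x) - 4.0 * b * x * (y - x ** 2)
    dy = 2.0 * b * (y - x ** 2)
    return dx, dy

value_at_min = rosenbrock(1.0, 1.0)      # global minimum at (a, a^2)
grad_at_min = rosenbrock_grad(1.0, 1.0)  # gradient vanishes there
```

The narrow curved valley around y = x^2 is what makes this function a common stress test for optimizers.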

CIFAR-10

Wide ResNet (Zagoruyko & Komodakis, 2016) - 300 epochs

CIFAR-10

Neural Machine Translation

Completely different model & task: WMT 2014 English → German

GNMT model - 8 LSTM layers

Summary

RNN generates optimizer equations

Train RNN via RL setup

Optimizers tested on small ConvNet

New optimizers on par with state of the art
