recurrent neural networks: process sequencesbuzzard/ma598-spring2019/lectures… · fei-fei li...

72
Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017 10 Today: Recurrent Neural Networks Slides adapted from http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf

Upload: others

Post on 01-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201710

Today: Recurrent Neural Networks

Slides adapted from http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf

Page 2: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201711

Vanilla Neural Networks

“Vanilla” Neural Network

Page 3: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201712

Recurrent Neural Networks: Process Sequences

e.g. Image Captioningimage -> sequence of words

Page 4: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201713

Recurrent Neural Networks: Process Sequences

e.g. Sentiment Classificationsequence of words -> sentiment

Page 5: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201714

Recurrent Neural Networks: Process Sequences

e.g. Machine Translationseq of words -> seq of words

Page 6: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201715

Recurrent Neural Networks: Process Sequences

e.g. Video classification on frame level

Page 7: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201716

Sequential Processing of Non-Sequence Data

Ba, Mnih, and Kavukcuoglu, “Multiple Object Recognition with Visual Attention”, ICLR 2015.Gregor et al, “DRAW: A Recurrent Neural Network For Image Generation”, ICML 2015Figure copyright Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra, 2015. Reproduced with permission.

Classify images by taking a series of “glimpses”

Page 8: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201718

Recurrent Neural Network

x

RNN

Page 9: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201719

Recurrent Neural Network

x

RNN

yusually want to predict a vector at some time steps

Page 10: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201720

Recurrent Neural Network

x

RNN

yWe can process a sequence of vectors x by applying a recurrence formula at every time step:

new state old state input vector at some time step

some functionwith parameters W

Page 11: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201721

Recurrent Neural Network

x

RNN

yWe can process a sequence of vectors x by applying a recurrence formula at every time step:

Notice: the same function and the same set of parameters are used at every time step.CNNs share parameters across space; RNNs share across time.

Page 12: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201722

(Vanilla) Recurrent Neural Network

x

RNN

y

The state consists of a single “hidden” vector h:

Page 13: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201723

h0 fW h1

x1

RNN: Computational Graph

Page 14: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201724

h0 fW h1 fW h2

x2x1

RNN: Computational Graph

Page 15: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201725

h0 fW h1 fW h2 fW h3

x3

x2x1

RNN: Computational Graph

hT

Page 16: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201726

h0 fW h1 fW h2 fW h3

x3

x2x1W

RNN: Computational Graph

Re-use the same weight matrix at every time-step

hT

Page 17: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201727

h0 fW h1 fW h2 fW h3

x3

yT

x2x1W

RNN: Computational Graph: Many to Many

hT

y3y2y1

Page 18: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201730

h0 fW h1 fW h2 fW h3

x3

y

x2x1W

RNN: Computational Graph: Many to One

hT

Page 19: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201731

h0 fW h1 fW h2 fW h3

yT

xW

RNN: Computational Graph: One to Many

hT

y3y3y3

Page 20: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201732

Sequence to Sequence: Many-to-one + one-to-many

h0

fWh1

fWh2

fWh3

x3

x2

x1

W1

hT

Many to one: Encode input sequence in a single vector

Page 21: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201733

Sequence to Sequence: Many-to-one + one-to-many

h0

fWh1

fWh2

fWh3

x3

x2

x1

W1

hT

y1

y2

Many to one: Encode input sequence in a single vector

One to many: Produce output sequence from single input vector

fWh1

fWh2

fW

W2

Page 22: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201734

Example: Character-levelLanguage Model

Vocabulary:[h,e,l,o]

Example trainingsequence:“hello”

Page 23: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201735

Example: Character-levelLanguage Model

Vocabulary:[h,e,l,o]

Example trainingsequence:“hello”

Page 24: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201736

Example: Character-levelLanguage Model

Vocabulary:[h,e,l,o]

Example trainingsequence:“hello”

Page 25: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201737

Example: Character-levelLanguage ModelSampling

Vocabulary:[h,e,l,o]

At test-time sample characters one at a time, feed back to model

.03

.13

.00

.84

.25

.20

.05

.50

.11

.17

.68

.03

.11

.02

.08

.79Softmax

“e” “l” “l” “o”Sample

Page 26: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201738

.03

.13

.00

.84

.25

.20

.05

.50

.11

.17

.68

.03

.11

.02

.08

.79Softmax

“e” “l” “l” “o”SampleExample:

Character-levelLanguage ModelSampling

Vocabulary:[h,e,l,o]

At test-time sample characters one at a time, feed back to model

Page 27: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201739

.03

.13

.00

.84

.25

.20

.05

.50

.11

.17

.68

.03

.11

.02

.08

.79Softmax

“e” “l” “l” “o”SampleExample:

Character-levelLanguage ModelSampling

Vocabulary:[h,e,l,o]

At test-time sample characters one at a time, feed back to model

Page 28: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201740

.03

.13

.00

.84

.25

.20

.05

.50

.11

.17

.68

.03

.11

.02

.08

.79Softmax

“e” “l” “l” “o”SampleExample:

Character-levelLanguage ModelSampling

Vocabulary:[h,e,l,o]

At test-time sample characters one at a time, feed back to model

Page 29: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201741

Backpropagation through timeLoss

Forward through entire sequence to compute loss, then backward through entire sequence to compute gradient

Page 30: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201742

Truncated Backpropagation through timeLoss

Run forward and backward through chunks of the sequence instead of whole sequence

Page 31: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201743

Truncated Backpropagation through timeLoss

Carry hidden states forward in time forever, but only backpropagate for some smaller number of steps

Page 32: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201744

Truncated Backpropagation through timeLoss

Page 33: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201745

min-char-rnn.py gist: 112 lines of Python

(https://gist.github.com/karpathy/d4dee566867f8291f086)

Page 34: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201746

x

RNN

y

Page 35: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201747

train more

train more

train more

at first:

Page 36: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201748

Page 37: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201749

The Stacks Project: open source algebraic geometry textbook

Latex source http://stacks.math.columbia.edu/The stacks project is licensed under the GNU Free Documentation License

Page 38: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201750

Page 39: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201751

Page 40: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201752

Page 41: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201753

Generated C code

Page 42: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201754

Page 43: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201755

Page 44: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201756

Searching for interpretable cells

Karpathy, Johnson, and Fei-Fei: Visualizing and Understanding Recurrent Networks, ICLR Workshop 2016

Page 45: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201758

Searching for interpretable cells

Karpathy, Johnson, and Fei-Fei: Visualizing and Understanding Recurrent Networks, ICLR Workshop 2016Figures copyright Karpathy, Johnson, and Fei-Fei, 2015; reproduced with permission

quote detection cell

Page 46: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201759

Searching for interpretable cells

Karpathy, Johnson, and Fei-Fei: Visualizing and Understanding Recurrent Networks, ICLR Workshop 2016Figures copyright Karpathy, Johnson, and Fei-Fei, 2015; reproduced with permission

line length tracking cell

Page 47: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201763

Explain Images with Multimodal Recurrent Neural Networks, Mao et al.Deep Visual-Semantic Alignments for Generating Image Descriptions, Karpathy and Fei-FeiShow and Tell: A Neural Image Caption Generator, Vinyals et al.Long-term Recurrent Convolutional Networks for Visual Recognition and Description, Donahue et al.Learning a Recurrent Visual Representation for Image Caption Generation, Chen and Zitnick

Image Captioning

Figure from Karpathy et a, “Deep Visual-Semantic Alignments for Generating Image Descriptions”, CVPR 2015; figure copyright IEEE, 2015.Reproduced for educational purposes.

Page 48: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201764

Convolutional Neural Network

Recurrent Neural Network

Page 49: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

test image

Page 50: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

test image

X

Page 51: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

test image

x0<START>

<START>

Page 52: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

h0

x0<START>

y0

<START>

test image

before:h = tanh(Wxh * x + Whh * h)

now:h = tanh(Wxh * x + Whh * h + Wih * v)

v

Wih

Page 53: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

h0

x0<START>

y0

<START>

test image

straw

sample!

Page 54: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

h0

x0<START>

y0

<START>

test image

straw

h1

y1

Page 55: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

h0

x0<START>

y0

<START>

test image

straw

h1

y1

hat

sample!

Page 56: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

h0

x0<START>

y0

<START>

test image

straw

h1

y1

hat

h2

y2

Page 57: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

h0

x0<START>

y0

<START>

test image

straw

h1

y1

hat

h2

y2

sample<END> token=> finish.

Page 58: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201775

A cat sitting on a suitcase on the floor

A cat is sitting on a tree branch

A dog is running in the grass with a frisbee

A white teddy bear sitting in the grass

Two people walking on the beach with surfboards

Two giraffes standing in a grassy field

A man riding a dirt bike on a dirt track

Image Captioning: Example Results

A tennis player in action on the court

Captions generated using neuraltalk2All images are CC0 Public domain: cat suitcase, cat tree, dog, bear, surfers, tennis, giraffe, motorcycle

Page 59: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201776

Image Captioning: Failure Cases

A woman is holding a cat in her hand

A woman standing on a beach holding a surfboard

A person holding a computer mouse on a desk

A bird is perched on a tree branch

A man in a baseball uniform throwing a ball

Captions generated using neuraltalk2All images are CC0 Public domain: fur coat, handstand, spider web, baseball

Page 60: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201787

Visual Question Answering

Agrawal et al, “VQA: Visual Question Answering”, ICCV 2015Zhu et al, “Visual 7W: Grounded Question Answering in Images”, CVPR 2016Figure from Zhu et al, copyright IEEE 2016. Reproduced for educational purposes.

Page 61: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201789

time

depth

Multilayer RNNs

LSTM:

Page 62: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201791

ht-1

xt

W

stack

tanh

ht

Vanilla RNN Gradient FlowBackpropagation from ht to ht-1 multiplies by W (actually Whh

T)

Bengio et al, “Learning long-term dependencies with gradient descent is difficult”, IEEE Transactions on Neural Networks, 1994Pascanu et al, “On the difficulty of training recurrent neural networks”, ICML 2013

Page 63: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201793

Vanilla RNN Gradient Flow

h0 h1 h2 h3 h4

x1 x2 x3 x4

Largest singular value > 1: Exploding gradients

Largest singular value < 1:Vanishing gradients

Computing gradient of h0 involves many factors of W(and repeated tanh)

Bengio et al, “Learning long-term dependencies with gradient descent is difficult”, IEEE Transactions on Neural Networks, 1994Pascanu et al, “On the difficulty of training recurrent neural networks”, ICML 2013

Page 64: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201794

Vanilla RNN Gradient Flow

h0 h1 h2 h3 h4

x1 x2 x3 x4

Largest singular value > 1: Exploding gradients

Largest singular value < 1:Vanishing gradients

Gradient clipping: Scale gradient if its norm is too bigComputing gradient

of h0 involves many factors of W(and repeated tanh)

Bengio et al, “Learning long-term dependencies with gradient descent is difficult”, IEEE Transactions on Neural Networks, 1994Pascanu et al, “On the difficulty of training recurrent neural networks”, ICML 2013

Page 65: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201795

Vanilla RNN Gradient Flow

h0 h1 h2 h3 h4

x1 x2 x3 x4

Computing gradient of h0 involves many factors of W(and repeated tanh)

Largest singular value > 1: Exploding gradients

Largest singular value < 1:Vanishing gradients Change RNN architecture

Bengio et al, “Learning long-term dependencies with gradient descent is difficult”, IEEE Transactions on Neural Networks, 1994Pascanu et al, “On the difficulty of training recurrent neural networks”, ICML 2013

Page 66: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201796

Long Short Term Memory (LSTM)

Hochreiter and Schmidhuber, “Long Short Term Memory”, Neural Computation 1997

Vanilla RNN LSTM

Page 67: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017

98

ct-1

ht-1

xt

fig

o

W ☉

+ ct

tanh

☉ ht

Long Short Term Memory (LSTM)[Hochreiter et al., 1997]

stack

Page 68: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017

99

ct-1

ht-1

xt

fig

o

W ☉

+ ct

tanh

☉ ht

Long Short Term Memory (LSTM): Gradient Flow[Hochreiter et al., 1997]

stack

Backpropagation from ct to ct-1 only elementwise multiplication by f, no matrix multiply by W

Page 69: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017100

Long Short Term Memory (LSTM): Gradient Flow[Hochreiter et al., 1997]

c0 c1 c2 c3

Uninterrupted gradient flow!

If ft is near 0, then the forward flow is interrupted - acts as a forgetting gate.In this case, the gradient flow backward is also interrupted.

Page 70: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017101

Long Short Term Memory (LSTM): Gradient Flow[Hochreiter et al., 1997]

c0 c1 c2 c3

Uninterrupted gradient flow!

Input

Softm

ax

3x3 conv, 64

7x7 conv, 64 / 2

FC 1000

Pool

3x3 conv, 64

3x3 conv, 643x3 conv, 64

3x3 conv, 643x3 conv, 64

3x3 conv, 1283x3 conv, 128 / 2

3x3 conv, 1283x3 conv, 128

3x3 conv, 1283x3 conv, 128

...

3x3 conv, 643x3 conv, 64

3x3 conv, 643x3 conv, 64

3x3 conv, 643x3 conv, 64

Pool

Similar to ResNet!

Page 71: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017103

Other RNN Variants

[LSTM: A Search Space Odyssey, Greff et al., 2015]

[An Empirical Exploration of Recurrent Network Architectures, Jozefowicz et al., 2015]

GRU [Learning phrase representations using rnn encoder-decoder for statistical machine translation, Cho et al. 2014]

Page 72: Recurrent Neural Networks: Process Sequencesbuzzard/MA598-Spring2019/Lectures… · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 12 May 4, 2017 Recurrent Neural Networks:

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017104

Summary- RNNs allow a lot of flexibility in architecture design- Vanilla RNNs are simple but don’t work very well- Common to use LSTM or GRU: their additive interactions

improve gradient flow- Backward flow of gradients in RNN can explode or vanish.

Exploding is controlled with gradient clipping. Vanishing is controlled with additive interactions (LSTM)

- Better/simpler architectures are a hot topic of current research- Better understanding (both theoretical and empirical) is needed.