Deep Learning CNN - RNN - Pytorch - University of Rochester (ECE 477)
TRANSCRIPT
Deep Learning CNN - RNN - Pytorch
Christodoulos Benetatos2019
MLP - Pytorch
Convolutional Neural Nets
● 2012 : AlexNet achieves state-of-the-art results on ImageNet
● 2013 : DQN beats humans on 3 Atari games
● 2014 : GaussianFace surpasses humans on face verification
● 2015 : PReLU-Net surpasses humans on ImageNet
● 2016 : AlphaGo beats Lee Sedol at Go
● 2016 : WaveNet synthesizes high-fidelity speech
Convolution
Why Convolution?
1. Sparse Connections
○ Output units are calculated from a small neighborhood of input units
2. Parameter Sharing
○ Each member (weight) of the kernel is used at every position of the input
Why Convolution ?
Why Convolution?
3. Equivariant to Translation
○ Shifting the input by x shifts the output feature map by x: f(g(x)) = g(f(x))
http://visual.cs.ucl.ac.uk/pubs/harmonicNets/pdfs/worrallEtAl2017.pdf
Convolution Layer
Hyperparameters : channel_in, channel_out, kernel_size, zero_pad
Input : (H_in, W_in, channel_in)
Output : (H_out, W_out, channel_out)
https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/convolutional_neural_networks.html
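These hyperparameters map directly onto PyTorch's `nn.Conv2d`. One caveat: PyTorch orders tensors channel-first, (batch, channel, H, W), rather than the (H, W, channel) order above. A minimal sketch, with hypothetical sizes:

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration: a 3-channel 32x32 input.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

x = torch.randn(1, 3, 32, 32)   # (batch, channel_in, H_in, W_in)
y = conv(x)                     # (batch, channel_out, H_out, W_out)

# With a 3x3 kernel and padding=1, spatial size is preserved:
# H_out = H_in + 2*pad - kernel + 1 = 32 + 2 - 3 + 1 = 32
print(y.shape)  # torch.Size([1, 16, 32, 32])
```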
Pooling Layers
● Reduce spatial size
● Reduce parameters
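As a sketch with hypothetical sizes, PyTorch's `nn.MaxPool2d` shows both effects: it halves the spatial size (shrinking any later fully connected layers) while itself carrying no learnable weights.

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2)   # halves each spatial dimension
x = torch.randn(1, 16, 32, 32)
y = pool(x)
print(y.shape)  # torch.Size([1, 16, 16, 16])

# Pooling has no learnable weights, so it adds zero parameters itself.
print(sum(p.numel() for p in pool.parameters()))  # 0
```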
CNN Architectures
INPUT -> [[CONV -> RELU]*N -> POOL?]*M -> [FC -> RELU]*K -> FC
224 x 224 x 3
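One hypothetical instance of this pattern (N=2, M=2, K=1) for a 224 x 224 x 3 input, with made-up channel widths, might look like:

```python
import torch
import torch.nn as nn

# INPUT -> [[CONV -> RELU]*2 -> POOL]*2 -> [FC -> RELU]*1 -> FC
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 8, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                       # 224 -> 112
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                       # 112 -> 56
    nn.Flatten(),
    nn.Linear(16 * 56 * 56, 64), nn.ReLU(),
    nn.Linear(64, 10),                     # e.g. 10 classes
)
out = model(torch.randn(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 10])
```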
CNN for Audio
● Apply 1D convolution on audio samples (WaveNet)
● Audio → Spectrogram → Treat spectrogram as an image

Applications :
● Classification/Identification : sound, genre, instrument, speaker, etc.
● Source Separation : mask prediction
● Generation : predict the next audio sample

Disadvantages :
● In images, neighboring pixels belong to the same object; the same does not hold for spectrograms.
● CNNs are applied to the magnitude, not the phase
● CNNs do not exploit the temporal information
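The first bullet, 1D convolution on raw samples, can be sketched as below. WaveNet itself uses dilated causal convolutions, so this plain `nn.Conv1d` with hypothetical sizes is only an illustration:

```python
import torch
import torch.nn as nn

# Hypothetical sizes: a 1-channel waveform of 16000 samples.
conv1d = nn.Conv1d(in_channels=1, out_channels=8, kernel_size=9, padding=4)
wave = torch.randn(1, 1, 16000)   # (batch, channels, samples)
feat = conv1d(wave)
# padding=4 with kernel_size=9 keeps the length: 16000 + 8 - 9 + 1 = 16000
print(feat.shape)  # torch.Size([1, 8, 16000])
```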
Feed forward NNs on Sequential Data
● Limited memory
● Fixed window size L
● Increasing L → parameters increase fast
Time Series Analysis: Forecasting and Control by Box and Jenkins (1976)
Recurrent NNs on Sequential Data
● Unlimited memory (in theory)
● Variable input length
● Increasing L → parameters remain the same
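The "parameters remain the same" point is easy to check in PyTorch: the same `nn.RNN` weights process any sequence length. A sketch with hypothetical sizes:

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=20, batch_first=True)
n_params = sum(p.numel() for p in rnn.parameters())
# W_xh: 20x10, W_hh: 20x20, plus two bias vectors of 20 -> 640 parameters

# The same module handles any sequence length L with the same weights.
short = rnn(torch.randn(1, 5, 10))[0]    # L = 5
long  = rnn(torch.randn(1, 500, 10))[0]  # L = 500
print(short.shape, long.shape, n_params)
```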
RNN
Unroll in time
At some timesteps we may want to generate an output
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Bidirectional RNNs
http://colah.github.io/posts/2015-09-NN-Types-FP/img/RNN-bidirectional.png
Forward States
Backward States
Stacked RNNs
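Stacking and bidirectionality combine in PyTorch via the `num_layers` and `bidirectional` arguments. A sketch with hypothetical sizes:

```python
import torch
import torch.nn as nn

# Two stacked layers, run in both directions.
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2,
               bidirectional=True, batch_first=True)
out, (h, c) = lstm(torch.randn(4, 7, 10))   # (batch=4, L=7, features=10)

# Forward and backward states are concatenated along the feature axis.
print(out.shape)  # torch.Size([4, 7, 40])
# h stacks num_layers * num_directions = 4 final hidden states.
print(h.shape)    # torch.Size([4, 4, 20])
```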
BackPropagation Through Time
● Same as regular backpropagation → repeatedly apply the chain rule
● For W_hy, we propagate on the vertical axis (easy to calculate)
BackPropagation Through Time
● Same as regular backpropagation → repeatedly apply the chain rule
● For W_hh and W_xh, we propagate on the horizontal time axis; the gradient also depends on W_hh
Vanishing - Exploding Gradients
● Computing the gradient for the first timestep includes multiple factors of the W_hh matrix, one per timestep.
● norm(W_hh) > 1 → Exploding gradient
○ Clip gradients
● norm(W_hh) < 1 → Vanishing gradient
○ Truncated BPTT
○ Gated architectures
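Gradient clipping, the listed remedy for exploding gradients, is one call in PyTorch. A sketch with hypothetical sizes (`max_norm=1.0` is an arbitrary choice):

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=20, batch_first=True)
loss = rnn(torch.randn(1, 50, 10))[0].sum()   # dummy loss over 50 steps
loss.backward()

# Rescale all gradients so their global norm is at most max_norm.
torch.nn.utils.clip_grad_norm_(rnn.parameters(), max_norm=1.0)

total = torch.sqrt(sum(p.grad.norm() ** 2 for p in rnn.parameters()))
print(float(total) <= 1.0 + 1e-3)  # True
```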
Truncated BPTT (stateful)
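A common way to implement stateful truncated BPTT in PyTorch is to carry the hidden state across chunks but `detach()` it at each chunk boundary, so gradients never flow past the truncation point. A sketch with hypothetical sizes:

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True)
h = None
for chunk in torch.randn(5, 1, 10, 4):   # 5 chunks of 10 timesteps each
    out, h = rnn(chunk, h)               # state carries over (stateful)
    h = h.detach()                       # truncate the graph here
    loss = out.sum()                     # dummy loss per chunk
    loss.backward()                      # BPTT only within the chunk
print(h.shape)  # torch.Size([1, 1, 8])
```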
Truncated BPTT (stateless)
Gated Architectures - LSTM
● Idea : Allow gradients to also flow unchanged.
● Cell state is the internal memory
● Three gates perform delete/write/read operations on the memory
Application : Source Separation
LSTM - Pytorch
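The slide's code is not captured in this transcript; a minimal LSTM classifier in PyTorch (hypothetical sizes and class count) might look like:

```python
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    def __init__(self, n_features=10, hidden=20, n_classes=3):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):            # x: (batch, L, n_features)
        _, (h, _) = self.lstm(x)     # h: (num_layers, batch, hidden)
        return self.fc(h[-1])        # classify from the last hidden state

logits = LSTMClassifier()(torch.randn(4, 7, 10))
print(logits.shape)  # torch.Size([4, 3])
```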
Model Validation
Split the dataset into three subsets:
● Training Set : Data used for learning, namely to fit the parameters (weights) of the model
● Validation Set : Data used to tune the design parameters [i.e., architecture, not weights] of a model (hidden units, layers, batch size, etc.). Also used to prevent overfitting
● Test Set : Data used to evaluate the generalization of the model on unseen data
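A sketch of such a split using `torch.utils.data.random_split`, with a hypothetical 1000-sample dataset and a 70/15/15 ratio:

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical dataset: 1000 samples of 8 features with binary labels.
data = TensorDataset(torch.randn(1000, 8), torch.randint(0, 2, (1000,)))
train, val, test = random_split(data, [700, 150, 150])
print(len(train), len(val), len(test))  # 700 150 150
```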
Overfitting - Regularization
- When model capacity is very large → the model memorizes the input data instead of learning useful features
- "Regularization is any modification we make to a learning algorithm that is intended to reduce its generalization error but not its training error" - Deep Learning, 2016
- The goal of regularization is to prevent overfitting

● Some regularization methods:
○ Early stopping
○ Adding noise to the training data
○ Penalizing the norm of the weights
○ Data set augmentation
○ Dropout
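Two of the listed methods, dropout and a weight-norm penalty, take one line each in PyTorch. A sketch with hypothetical sizes (the L2 penalty appears as SGD's `weight_decay` argument):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 20), nn.ReLU(),
    nn.Dropout(p=0.5),    # randomly zeroes units during training
    nn.Linear(20, 2),
)
# weight_decay adds an L2 penalty on the weights to each update.
opt = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

model.eval()              # dropout is disabled at evaluation time
y = model(torch.randn(3, 10))
print(y.shape)  # torch.Size([3, 2])
```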
SGD Algorithm
● m = 1 : Stochastic Gradient Descent (SGD)
● m < dataset size : Minibatch SGD
● m = dataset size : Gradient Descent
Learning Rate
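A hand-written minibatch SGD loop on a linear least-squares model (all sizes and the learning rate are hypothetical) makes the roles of m and the learning rate concrete:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Hypothetical data: 256 samples, 3 features, scalar targets.
X, y = torch.randn(256, 3), torch.randn(256, 1)
loader = DataLoader(TensorDataset(X, y), batch_size=32)  # m = 32
w = torch.zeros(3, 1, requires_grad=True)
lr = 0.1                                                 # learning rate

for xb, yb in loader:
    loss = ((xb @ w - yb) ** 2).mean()   # minibatch loss
    loss.backward()
    with torch.no_grad():
        w -= lr * w.grad                 # step against the gradient
        w.grad.zero_()
print(w.shape)  # torch.Size([3, 1])
```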
SGD Pytorch Code - Feedforward NN
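The slide's code is not captured in the transcript; a minimal feedforward training step with `torch.optim.SGD` (hypothetical sizes) might look like:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Hypothetical minibatch of 8 samples with binary labels.
x, target = torch.randn(8, 10), torch.randint(0, 2, (8,))
opt.zero_grad()
loss = loss_fn(model(x), target)
loss.backward()
opt.step()
print(loss.item() >= 0)  # True (cross-entropy is non-negative)
```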
SGD Pytorch Code - RNN
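This slide's code is likewise missing from the transcript; a sketch of one SGD step for an RNN regressor (hypothetical sizes) might be:

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=20, batch_first=True)
head = nn.Linear(20, 1)
opt = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()),
                      lr=0.05)

# Hypothetical batch of 4 sequences, 15 steps, 10 features each.
x, target = torch.randn(4, 15, 10), torch.randn(4, 1)
out, _ = rnn(x)
pred = head(out[:, -1])              # predict from the last timestep
loss = ((pred - target) ** 2).mean()
opt.zero_grad()
loss.backward()
opt.step()
print(pred.shape)  # torch.Size([4, 1])
```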