
Page 1:

Introduction to Deep Learning

Quan Geng
Columbia University
October 25, 2019

Page 2:

Outline

● Background of myself
● Motivation: Success of Deep Learning
● Basics of Deep Learning
  ○ Neural networks: Neuron, activation function
  ○ Optimizers: (Stochastic) Gradient Descent
  ○ Backpropagation
  ○ Convolutional Neural Network
● Applications of Deep Learning
  ○ Personal Photo Search, Search Ranking, Smart Reply, ...
● Summary

Page 3:

Reference

● Geoffrey Hinton, Yoshua Bengio, and Yann LeCun, Tutorial on Deep Learning

● Jeff Dean, Trends and Developments in Deep Learning Research
● Jeff Dean, Large-Scale Deep Learning With TensorFlow
● Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning (textbook)
● Google, Machine Learning Crash Course

Page 4:

Background of myself

● 2013: PhD from ECE Dept., University of Illinois Urbana-Champaign
● 2014 - 2015: Quantitative Analyst, Tower Research, New York
● 2015 - now: Senior Software Engineer, Google Research, New York
● Homepage: https://dreaven.github.io/

Page 5:

Machine Learning Jobs


https://www.quanwei.tech/?job=machine+learning

Page 6:

Motivation: Success of Deep Learning


Page 7:

2018 ACM Turing Award

https://awards.acm.org/about/2018-turing

ACM named Yoshua Bengio, Geoffrey Hinton, and Yann LeCun recipients of the 2018 ACM A.M. Turing Award for conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing.

In recent years, deep learning methods have been responsible for astonishing breakthroughs in computer vision, speech recognition, natural language processing, and robotics.

Page 9:

First major success of Neural Networks: AlexNet


ImageNet
● 15M labeled high-resolution images
● 22,000 categories

ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
● A subset of ImageNet
● 1,000 images in each of 1,000 categories
● 1.2M training images
● 50K validation images
● 150K testing images

Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, 2012

Page 10:

Neural networks dominate ImageNet competitions

Page 11:

Trend of Deep Learning papers

Evolution of the number of papers published on Deep Learning topics, relative to those on Deep Learning in bioinformatics (source link).

Page 12:

Deep Learning for High Frequency Trading

http://www.hudson-trading.com/careers/job/?gh_jid=940856

Page 13:

Basics of Deep Learning

Page 14:

https://cs.nyu.edu/~yann/talks/lecun-ranzato-icml2013.pdf

Page 15:

Page 16:

Page 17:

Neuron in Human Brain

https://training.seer.cancer.gov/anatomy/nervous/tissue.html

Page 18:

Artificial Neuron in Deep Learning

Activation functions introduce non-linearity into the model. Commonly used:

● Sigmoid function
● Rectified linear unit (ReLU)
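As an illustration, a minimal NumPy sketch of these two activations (the function names and test values are ours, not from the slides):

```python
import numpy as np

def sigmoid(x):
    # Squashes any real input into (0, 1); saturates for large |x|.
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Keeps positive inputs unchanged, zeroes out negative ones.
    return np.maximum(0.0, x)

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z))  # approx. [0.119 0.5 0.953]
print(relu(z))     # [0. 0. 3.]
```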

Page 19:

Feed-Forward Neural Networks (FFNN)

[Figure: a feed-forward network with an input layer, hidden layers, and an output layer]
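As a hedged sketch of what such a network computes, here is the forward pass for one hidden layer in NumPy (layer sizes and weight names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 4 inputs, 8 hidden units, 3 outputs.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

def forward(x):
    h = np.maximum(0.0, x @ W1 + b1)  # hidden layer with ReLU activation
    return h @ W2 + b2                # output layer (raw scores)

x = rng.normal(size=(1, 4))  # one input example
print(forward(x).shape)      # (1, 3)
```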

Page 20:

Neural network training: (Stochastic) Gradient Descent

1. Randomly initialize the weights in the neural network.
2. Given a batch of (or all of) the input data, compute the predicted output.
3. Compute the loss between the actual output and the predicted output.
4. Compute the gradient of the loss for each weight in the neural network.
5. Update the weights based on the gradient.

Repeat from step 2 until convergence.
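A minimal NumPy sketch of this loop, for a linear model with squared loss (the toy data, learning rate, and step count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # toy inputs
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)   # toy targets

w = rng.normal(size=3)                        # step 1: random initialization
lr = 0.1                                      # illustrative learning rate
for step in range(200):
    y_pred = X @ w                            # step 2: predicted output
    loss = np.mean((y_pred - y) ** 2)         # step 3: loss
    grad = 2.0 * X.T @ (y_pred - y) / len(y)  # step 4: gradient of the loss
    w -= lr * grad                            # step 5: weight update
print(loss, w)                                # w should approach w_true
```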

Page 21:

Neural network training: Backpropagation

Backpropagation carries out step 4 above: compute the gradient of the loss with respect to each weight by applying the chain rule backward through the network, layer by layer.

video link
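As a hedged illustration, here is the chain rule written out by hand for a one-hidden-layer network with squared loss (shapes and names are ours; real frameworks do this automatically):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
x = rng.normal(size=(1, 4))   # one input example
y = np.array([[1.0]])         # its target

# Forward pass, keeping intermediates for the backward pass.
z1 = x @ W1 + b1
h = np.maximum(0.0, z1)       # ReLU
y_pred = h @ W2 + b2
loss = np.mean((y_pred - y) ** 2)

# Backward pass: chain rule from the loss back to every weight.
d_ypred = 2.0 * (y_pred - y) / y.size
dW2 = h.T @ d_ypred
db2 = d_ypred.sum(axis=0)
dh = d_ypred @ W2.T
dz1 = dh * (z1 > 0)           # ReLU derivative: 1 where z1 > 0, else 0
dW1 = x.T @ dz1
db1 = dz1.sum(axis=0)
# (dW1, db1, dW2, db2) are exactly what step 5 of gradient descent consumes.
```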

Page 22:

Neural network training: Optimizers for Gradient Descent

https://towardsdatascience.com/types-of-optimization-algorithms-used-in-neural-networks-and-ways-to-optimize-gradient-95ae5d39529f
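The linked article surveys variants such as momentum and Adam; as a hedged illustration, their update rules as plain NumPy functions (the hyperparameter defaults are typical values, not prescriptions from the slides):

```python
import numpy as np

def sgd_momentum_step(w, grad, v, lr=0.01, beta=0.9):
    # Momentum: accumulate an exponentially decaying sum of past gradients.
    v = beta * v + grad
    return w - lr * v, v

def adam_step(w, grad, m, s, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: adapt a per-weight step size from running moment estimates.
    m = b1 * m + (1 - b1) * grad        # first moment (mean of gradients)
    s = b2 * s + (1 - b2) * grad ** 2   # second moment (uncentered variance)
    m_hat = m / (1 - b1 ** t)           # bias correction for early steps t = 1, 2, ...
    s_hat = s / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(s_hat) + eps), m, s
```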

Page 23:

Properties of neural network training

Properties of Gradient Descent algorithms
● No guarantee of convergence to the global minimum
● Different random initializations converge to different local minima
● Nevertheless, it performs very well in practice

Techniques to avoid overfitting (see the sketch below)
● Weight regularization
● Dropout
● Early stopping

video link
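As a sketch, all three techniques map to a few lines of the TensorFlow Keras API introduced later in the deck (layer sizes and hyperparameters are illustrative):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(
        128, activation="relu",
        kernel_regularizer=tf.keras.regularizers.l2(1e-4)),  # weight regularization
    tf.keras.layers.Dropout(0.5),  # dropout: randomly zero half the activations
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Early stopping: halt training once validation loss stops improving;
# pass it via model.fit(..., callbacks=[early_stop], validation_split=0.1).
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True)
```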

Page 24:

Convolutional Neural Network

Page 25:

Convolutional Neural Network
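These two slides are diagrams; as an illustrative sketch of the idea, a small convolutional stack in Keras (this architecture is ours, not the one pictured):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    # Convolution: slide small learned filters across the image.
    tf.keras.layers.Conv2D(32, kernel_size=3, activation="relu",
                           input_shape=(28, 28, 1)),
    # Pooling: downsample, keeping the strongest local responses.
    tf.keras.layers.MaxPooling2D(pool_size=2),
    tf.keras.layers.Conv2D(64, kernel_size=3, activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    # Flatten the feature maps and classify with a dense layer.
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.summary()
```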

Page 26:

Applications of Deep Learning

Page 27:

Deep Learning Frameworks

https://medium.com/@NirantK/the-silent-rise-of-pytorch-ecosystem-693e74b33f1e

Page 28:

TensorFlow

TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML powered applications. Released by Google Brain in 2015.

Page 29:

TensorFlow Example: MNIST

Link
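The original slide shows the code as an image; here is a minimal sketch in the same spirit, assuming TensorFlow 2.x's bundled MNIST loader and Keras API:

```python
import tensorflow as tf

# Load and normalize the MNIST digits (28x28 grayscale, 10 classes).
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)
```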

Page 30:

Page 31:

Page 32:

Page 33:

Page 34:

Page 35:

Page 36:

Summary

● Motivation: Success of Deep Learning
● Basics of Deep Learning (DL)
  ○ Neural networks (NN): Neuron, activation function
  ○ Optimizers: (Stochastic) Gradient Descent
  ○ Backpropagation
  ○ Convolutional Neural Network
● Applications of Deep Learning
  ○ TensorFlow
  ○ Personal Photo Search, Search Ranking, Smart Reply, and more
● Advanced topics (not covered in this lecture)
  ○ Recurrent neural networks
  ○ Sequence models
  ○ word2vec (text embeddings)
  ○ Advanced optimizers
  ○ Autoencoders
  ○ Generative adversarial networks

Page 37:

Thank you!