
Page 1: Tips for Training Neural Network scratch the surface

Tips for Training Neural Network

scratch the surface

Page 2: Tips for Training Neural Network scratch the surface

Two Concerns

• There are two things you have to be concerned about.

Optimization

• Can I find the “best” parameter set θ* within a limited amount of time?

Generalization

• Is the “best” parameter set θ* good for testing data as well?

Page 3: Tips for Training Neural Network scratch the surface

Initialization

• For gradient descent, we need to pick the initialization parameters θ0.
• Do not set all the parameters in θ0 equal.
• Set the parameters in θ0 randomly.

Page 4: Tips for Training Neural Network scratch the surface

Learning Rate

• Toy Example

[Figure: a single-neuron model with input x, weight w, bias b, and output y]

Training Data (20 examples):
x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5]
y = [0.1, 0.4, 0.9, 1.6, 2.2, 2.5, 2.8, 3.5, 3.9, 4.7, 5.1, 5.3, 6.3, 6.5, 6.7, 7.5, 8.1, 8.5, 8.9, 9.5]

Gradient descent update on the cost C(w, b):
θ^i = θ^{i−1} − η∇C(θ^{i−1})

• Set the learning rate η carefully.
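As a concrete illustration, here is a minimal sketch of gradient descent on this toy example, assuming a linear model y = wx + b and a mean-squared-error cost; the learning rates tried at the end are illustrative, not the ones from the slides.

```python
import numpy as np

# Toy training data from the slide (20 examples)
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5,
              5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5])
y = np.array([0.1, 0.4, 0.9, 1.6, 2.2, 2.5, 2.8, 3.5, 3.9, 4.7,
              5.1, 5.3, 6.3, 6.5, 6.7, 7.5, 8.1, 8.5, 8.9, 9.5])

def train(eta, iterations=10000):
    """Gradient descent on C(w, b) = mean((w*x + b - y)^2)."""
    w, b = 0.0, 0.0               # initial parameters theta^0
    for _ in range(iterations):
        err = w * x + b - y       # prediction error on all examples
        grad_w = 2 * np.mean(err * x)   # dC/dw
        grad_b = 2 * np.mean(err)       # dC/db
        w -= eta * grad_w         # theta^i = theta^(i-1) - eta * grad C
        b -= eta * grad_b
    return w, b

# Try a couple of learning rates: too large a value diverges on this data,
# too small a value converges very slowly.
for eta in [0.01, 0.001]:
    w, b = train(eta)
    cost = np.mean((w * x + b - y) ** 2)
    print(f"eta={eta}: w={w:.3f}, b={b:.3f}, C={cost:.4f}")
```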

Page 5: Tips for Training Neural Network scratch the surface

Learning Rate

• Toy Example

[Figure: the error surface C(w, b), with gradient descent θ^i = θ^{i−1} − η∇C(θ^{i−1}) moving from the start point toward the target]

Page 6: Tips for Training Neural Network scratch the surface

Learning Rate

• Toy Example

[Figure: gradient descent on the error surface with different learning rates η (1.0, 0.01, 0.001); the number of updates needed to reach the target ranges from roughly 3k to 30k]

Page 7: Tips for Training Neural Network scratch the surface

Gradient Descent

Cost over all R training examples:
C(θ) = (1/R) Σ_{r=1..R} C^r(θ),  where C^r(θ) = ||f(x^r; θ) − ŷ^r||₂

Gradient Descent:
θ^i = θ^{i−1} − η∇C(θ^{i−1}) = θ^{i−1} − η (1/R) Σ_{r=1..R} ∇C^r(θ^{i−1})

Stochastic Gradient Descent:
Pick an example x^r:  θ^i = θ^{i−1} − η∇C^r(θ^{i−1})

If all examples x^r have equal probabilities to be picked:
E[∇C^r(θ^{i−1})] = (1/R) Σ_{r=1..R} ∇C^r(θ^{i−1}) = ∇C(θ^{i−1})
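A minimal, self-contained sketch of the two update rules, using made-up linear-regression data and a squared-error per-example cost (the data, learning rate, and epoch count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: R examples from a noisy linear relation y ≈ 2x + 1
R = 100
x = rng.uniform(0, 5, size=R)
y = 2 * x + 1 + rng.normal(0, 0.3, size=R)

def grad_C_r(theta, r):
    """Gradient of the per-example cost C^r(theta) = (w*x^r + b - y^r)^2."""
    w, b = theta
    err = w * x[r] + b - y[r]
    return np.array([2 * err * x[r], 2 * err])

eta = 0.01
theta_gd = np.zeros(2)    # parameters updated by (batch) gradient descent
theta_sgd = np.zeros(2)   # parameters updated by stochastic gradient descent

for epoch in range(100):
    # Gradient descent: one update per epoch, using the gradient
    # averaged over all R examples
    full_grad = np.mean([grad_C_r(theta_gd, r) for r in range(R)], axis=0)
    theta_gd -= eta * full_grad

    # Stochastic gradient descent: R updates per epoch, one example at a time
    for r in rng.permutation(R):
        theta_sgd -= eta * grad_C_r(theta_sgd, r)

print("GD  after 100 epochs:", theta_gd)   # one update per epoch, so it converges slowly
print("SGD after 100 epochs:", theta_sgd)  # R updates per epoch, much further along
```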

Page 8: Tips for Training Neural Network scratch the surface

Gradient Descent vs. Stochastic Gradient Descent

Training Data: (x^1, ŷ^1), (x^2, ŷ^2), …, (x^r, ŷ^r), …, (x^R, ŷ^R)

Gradient Descent (starting at θ^0):
θ^1 = θ^0 − η∇C(θ^0)
θ^2 = θ^1 − η∇C(θ^1)
… …

Stochastic Gradient Descent (starting at θ^0):
Pick x^1:  θ^1 = θ^0 − η∇C^1(θ^0)
Pick x^2:  θ^2 = θ^1 − η∇C^2(θ^1)
… …
Pick x^r:  θ^r = θ^{r−1} − η∇C^r(θ^{r−1})
… …
Pick x^R:  θ^R = θ^{R−1} − η∇C^R(θ^{R−1})

Seen all the examples once: one epoch.

Pick x^1:  θ^{R+1} = θ^R − η∇C^1(θ^R)
… …

Page 9: Tips for Training Neural Network scratch the surface

Gradient Descent

• Toy Example (1 epoch)

Gradient Descent: sees all the examples before each update, so it makes one update per epoch.

Stochastic Gradient Descent: sees only one example per update, so it makes 20 updates in an epoch (one per training example).

Page 10: Tips for Training Neural Network scratch the surface

Gradient Descent

Shuffle your data.

Gradient Descent:
θ^i = θ^{i−1} − η (1/R) Σ_{r=1..R} ∇C^r(θ^{i−1})

Stochastic Gradient Descent:
Pick an example x^r:  θ^i = θ^{i−1} − η∇C^r(θ^{i−1})

Mini-Batch Gradient Descent:
Pick B examples as a batch b:  θ^i = θ^{i−1} − η (1/B) Σ_{x^r ∈ b} ∇C^r(θ^{i−1})
Average the gradient of the examples in the batch b (B is the batch size).
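A minimal sketch of mini-batch gradient descent with per-epoch shuffling, again on made-up linear-regression data (the batch size B, learning rate, and epoch count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data (R examples) for a simple model y ≈ w*x + b
R, B, eta = 100, 10, 0.05             # B is the batch size
x = rng.uniform(0, 4, size=R)
y = 3 * x - 2 + rng.normal(0, 0.3, size=R)

def batch_grad(theta, idx):
    """Gradient of the cost averaged over the examples in one batch."""
    w, b = theta
    err = w * x[idx] + b - y[idx]
    return np.array([np.mean(2 * err * x[idx]), np.mean(2 * err)])

theta = np.zeros(2)
for epoch in range(100):
    order = rng.permutation(R)                   # shuffle your data every epoch
    for start in range(0, R, B):
        batch = order[start:start + B]           # pick B examples as a batch b
        theta -= eta * batch_grad(theta, batch)  # average gradient over the batch
print(theta)                                     # ends up close to [3, -2]
```

The batch size B interpolates between the two extremes above: B = R gives full gradient descent, B = 1 gives stochastic gradient descent.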

Page 11: Tips for Training Neural Network scratch the surface

Gradient Descent

• Real Example: Handwriting Digit Classification

[Figure: training curves comparing batch size = 1 with (full-batch) gradient descent]

Page 12: Tips for Training Neural Network scratch the surface

Two Concerns

• There are two things you have to be concerned about.

Optimization

• Can I find the “best” parameter set θ* within a limited amount of time?

Generalization

• Is the “best” parameter set θ* good for testing data as well?

Page 13: Tips for Training Neural Network scratch the surface

Generalization

• You pick a “best” parameter set θ*

Training Data {(x^r, ŷ^r)}:  f(x^r; θ*) ≈ ŷ^r for all r
Testing Data {x^u}:  is f(x^u; θ*) ≈ ŷ^u as well?

However, training data and testing data have different distributions.

Page 14: Tips for Training Neural Network scratch the surface

Panacea

• Have more training data if possible ……
• Create more training data (?)

Handwriting recognition:
[Figure: original training data and created training data, where each created image is the original shifted (rotated) by 15°]
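A minimal sketch of creating extra training images this way, assuming 28×28 greyscale arrays and using scipy.ndimage.rotate to shift each image by ±15°; the helper name and the random placeholder images are illustrative:

```python
import numpy as np
from scipy.ndimage import rotate

def augment(images, angle=15.0):
    """Create additional training images by rotating each one by +/- angle degrees."""
    created = []
    for img in images:                      # img: 28x28 greyscale array
        for a in (angle, -angle):
            rotated = rotate(img, a, reshape=False, mode="constant", cval=0.0)
            created.append(np.clip(rotated, 0.0, 1.0))
    return np.array(created)

# Usage: N original images -> 2N created images (the labels are simply reused)
original = np.random.rand(5, 28, 28)        # placeholder for real handwriting images
created = augment(original)
print(original.shape, created.shape)        # (5, 28, 28) (10, 28, 28)
```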

Page 15: Tips for Training Neural Network scratch the surface

Reference

• Chapter 3 of Neural Networks and Deep Learning
• http://neuralnetworksanddeeplearning.com/chap3.html

Page 16: Tips for Training Neural Network scratch the surface

Appendix

Page 17: Tips for Training Neural Network scratch the surface

Overfitting

• The function that performs well on the training data does not necessarily perform well on the testing data.

Training Data {(x^r, ŷ^r)}:  f(x^r; θ*) ≈ ŷ^r for all r
Testing Data {x^u}:  f(x^u; θ*) is not necessarily close to ŷ^u

Overfitting in our daily life: memorizing the answers to the previous examples ……

Page 18: Tips for Training Neural Network scratch the surface

• A joke about overfitting
• http://xkcd.com/1122/

Page 19: Tips for Training Neural Network scratch the surface

Initialization

• For gradient descent, we need to pick the initialization parameters θ0.
• Do not set all the parameters in θ0 equal.
• Otherwise your parameters will always stay equal, no matter how many times you update them.
• Randomly pick θ0.
• If the preceding layer has more neurons, the initialization values should be smaller.
• E.g. if the preceding layer has N_{l−1} neurons:
w^l_{ij} ~ N(0, 1/N_{l−1})   or   w^l_{ij} ~ U(−1/N_{l−1}, 1/N_{l−1})
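A minimal numpy sketch of such scaled random initialization; whether 1/N_{l−1} is read as a variance or as a uniform range differs between references, so the two variants below are just one common interpretation:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    """Gaussian initialization: w_ij ~ N(0, 1/n_in), scaled down when n_in is large."""
    W = rng.normal(0.0, np.sqrt(1.0 / n_in), size=(n_out, n_in))
    b = np.zeros(n_out)              # biases can simply start at zero
    return W, b

def init_layer_uniform(n_in, n_out):
    """Uniform alternative: w_ij ~ U(-1/n_in, 1/n_in)."""
    W = rng.uniform(-1.0 / n_in, 1.0 / n_in, size=(n_out, n_in))
    return W, np.zeros(n_out)

W1, b1 = init_layer(784, 30)         # e.g. the first layer of a small MNIST network
print(W1.std())                      # roughly sqrt(1/784) ≈ 0.036
```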

Page 20: Tips for Training Neural Network scratch the surface

MNIST

• The MNIST data comes in two parts. The first part contains 60,000 images to be used as training data. These images are scanned handwriting samples from 250 people, half of whom were US Census Bureau employees, and half of whom were high school students. The images are greyscale and 28 by 28 pixels in size. The second part of the MNIST data set is 10,000 images to be used as test data. Again, these are 28 by 28 greyscale images.

• git clone https://github.com/mnielsen/neural-networks-and-deep-learning.git

• http://yann.lecun.com/exdb/mnist/
• http://www.deeplearning.net/tutorial/gettingstarted.html
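Assuming the repository's own mnist_loader and network modules under src/ (the original code targets Python 2, so a Python 3 port may be needed), loading the data and training the book's small network might look like this sketch:

```python
# Run from the repository's src/ directory, with the bundled data/mnist.pkl.gz in place.
import mnist_loader
import network

# load_data_wrapper() splits the 60,000 training images into 50,000 for training
# and 10,000 for validation, plus the 10,000 test images.
training_data, validation_data, test_data = mnist_loader.load_data_wrapper()

# A 784-30-10 network trained with mini-batch SGD, as in the book:
# 30 epochs, mini-batch size 10, learning rate eta = 3.0
net = network.Network([784, 30, 10])
net.SGD(training_data, 30, 10, 3.0, test_data=test_data)
```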

Page 21: Tips for Training Neural Network scratch the surface

MNIST

• The current (2013) record is classifying 9,979 of 10,000 images correctly. This was done by Li Wan, Matthew Zeiler, Sixin Zhang, Yann LeCun, and Rob Fergus.
• At that level the performance is close to human-equivalent, and is arguably better, since quite a few of the MNIST images are difficult even for humans to recognize with confidence.

Page 22: Tips for Training Neural Network scratch the surface

Early Stopping

• For iteration
• Layer
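The slide only names the idea; a minimal sketch of validation-based early stopping, where train_one_epoch, validation_error, and the patience value are generic placeholders rather than anything from the slides:

```python
def early_stopping_train(train_one_epoch, validation_error, max_epochs=100, patience=5):
    """Stop when the validation error has not improved for `patience` epochs."""
    best_err, best_epoch, epochs_without_improvement = float("inf"), 0, 0
    for epoch in range(max_epochs):
        train_one_epoch()                    # one pass over the training data
        err = validation_error()             # error on held-out validation data
        if err < best_err:
            best_err, best_epoch = err, epoch
            epochs_without_improvement = 0   # ideally also snapshot the parameters here
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                        # validation error stopped improving
    return best_epoch, best_err

# Toy usage: validation error improves for a while, then worsens (overfitting sets in)
errors = iter([0.9, 0.5, 0.3, 0.25, 0.24, 0.26, 0.27, 0.3, 0.31, 0.35, 0.4])
print(early_stopping_train(lambda: None, lambda: next(errors)))   # -> (4, 0.24)
```

In practice you would also keep a copy of the parameters from the best epoch and restore them when training stops.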

Page 23: Tips for Training Neural Network scratch the surface

Difficulty of Deep

• Lower layer cannot plan