
Page 1:

Machine Learning using Matlab

Lecture 6: Neural Network (cont.)

Page 2:

Cost function

For a network with K output units, m training examples, and regularization parameter λ, the cost function is:

  J(ϴ) = −(1/m) Σᵢ Σₖ [ yₖ(i) log(hϴ(x(i)))ₖ + (1 − yₖ(i)) log(1 − (hϴ(x(i)))ₖ) ] + (λ/2m) Σₗ Σᵢ Σⱼ (ϴⱼᵢ(l))²
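As a minimal Matlab sketch of the unregularized part of this cost (H and Yk are hypothetical names, not from the slides: H(i,k) holds (hϴ(x(i)))ₖ and Yk holds one-hot labels):

    m = 3;                                   % toy example with K = 2 classes
    Yk = [1 0; 0 1; 1 0];                    % m x K one-hot labels
    H  = [0.9 0.1; 0.2 0.8; 0.6 0.4];        % m x K hypothesis values
    J = -(1/m) * sum(sum(Yk .* log(H) + (1 - Yk) .* log(1 - H)));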

Page 3:

Forward propagation

● Forward propagation from layer l to layer l+1 is computed as:

  z(l+1) = ϴ(l) a(l),  a(l+1) = g(z(l+1))

  where a(l) is augmented with the bias unit a0(l) = 1 and g is the activation function (here the sigmoid).

● Note when l = 1, a(1) = x.

(Figure: a four-layer network, Layer 1 through Layer 4.)
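A minimal sketch of a single forward step, assuming a sigmoid activation; x and Theta1 are hypothetical example values, sized like the Page 10 network:

    sigmoid = @(z) 1 ./ (1 + exp(-z));
    x = [0.5; -1.2; 3.0; 0.7];        % 4 input features
    Theta1 = 0.1 * randn(5, 5);       % 5 units, 4 inputs + bias column
    a1 = [1; x];                      % a(1) = x, with bias unit a0 = 1
    z2 = Theta1 * a1;                 % z(2) = ϴ(1) a(1)
    a2 = sigmoid(z2);                 % a(2) = g(z(2))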

Page 4:

Backpropagation

● Backpropagation from layer l+1 to layer l is computed as:

  δ(l) = (ϴ(l))ᵀ δ(l+1) .* g′(z(l)),  where g′(z(l)) = a(l) .* (1 − a(l)) for the sigmoid

● When l = L, δ(L) = a(L) − y.

(Figure: the same four-layer network, with the error terms flowing from Layer 4 back to Layer 1.)

Page 5:

Example

(Figure: a four-layer network; the left panel shows forward propagation, the right panel backpropagation.)

Given a training example (x, y), the cost function is first simplified to the single-example, unregularized form cost(x, y) = −y log hϴ(x) − (1 − y) log(1 − hϴ(x)). Forward propagation and backpropagation are then computed layer by layer as on the previous two pages.

Page 6:

Gradient computation

1. Given training set {(x(1), y(1)), …, (x(m), y(m))}
2. Set Δ(l) = 0 for all l
3. For i = 1 to m
   ○ Set a(1) = x(i)
   ○ Perform forward propagation to compute a(l) for l = 2, 3, …, L
   ○ Using y(i), compute δ(L) = a(L) − y(i)
   ○ Compute δ(L−1), δ(L−2), …, δ(2)
   ○ Accumulate Δ(l) := Δ(l) + δ(l+1) (a(l))ᵀ
4. D(l) := (1/m) Δ(l); D(l) holds the partial derivatives of J(ϴ) with respect to ϴ(l) (add (λ/m) ϴ(l) to the non-bias columns if regularization is used)

A Matlab sketch of this loop follows.
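The loop translates almost line for line into Matlab. A minimal sketch for the four-layer network with the Page 10 dimensions, assuming sigmoid activations and toy data; X, Y, and the Theta matrices are illustrative names, and regularization is omitted:

    sigmoid = @(z) 1 ./ (1 + exp(-z));
    m = 10;
    X = randn(m, 4);                          % m examples, s1 = 4 features
    I = eye(4);  Y = I(randi(4, m, 1), :);    % m x 4 one-hot labels
    Theta1 = 0.1 * randn(5, 5);               % s2 x (s1 + 1)
    Theta2 = 0.1 * randn(5, 6);               % s3 x (s2 + 1)
    Theta3 = 0.1 * randn(4, 6);               % s4 x (s3 + 1)
    Delta1 = zeros(size(Theta1));
    Delta2 = zeros(size(Theta2));
    Delta3 = zeros(size(Theta3));
    for i = 1:m
        % forward propagation, prepending the bias unit at each layer
        a1 = [1; X(i, :)'];
        a2 = [1; sigmoid(Theta1 * a1)];
        a3 = [1; sigmoid(Theta2 * a2)];
        a4 = sigmoid(Theta3 * a3);            % output layer, no bias unit
        % backpropagation
        d4 = a4 - Y(i, :)';
        d3 = (Theta3' * d4) .* a3 .* (1 - a3);
        d3 = d3(2:end);                       % drop the bias component
        d2 = (Theta2' * d3) .* a2 .* (1 - a2);
        d2 = d2(2:end);
        % accumulate
        Delta3 = Delta3 + d4 * a3';
        Delta2 = Delta2 + d3 * a2';
        Delta1 = Delta1 + d2 * a1';
    end
    D1 = Delta1 / m;  D2 = Delta2 / m;  D3 = Delta3 / m;   % unregularized gradients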

Page 7:

Random initialization

● Instead of initializing the parameters to all zeros, it is important to initialize them randomly.
● Random initialization serves the purpose of symmetry breaking: with all-zero weights, every unit in a layer computes the same value and receives the same gradient update, so the units never learn different features.


Page 8:

Random initialization - Matlab function

● Initialize each parameter to a random value in [−ε_init, ε_init]:

    function W = randInitializeWeights(L_in, L_out)
        epsilon_init = 0.1;
        W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;
    end
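For instance, the first parameter matrix of the Page 10 network (s1 = 4 inputs, s2 = 5 units) would be created as:

    Theta1 = randInitializeWeights(4, 5);   % 5 x 5, entries in (-0.1, 0.1)

The small ±0.1 range keeps the initial weighted sums small, so the sigmoid units start out in their non-saturated region.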

Page 9:

Advanced optimization

● We have already seen how to call existing numerical optimization functions to obtain the optimal parameters:
  ○ function [J, grad] = costFunction(theta) ...
  ○ optTheta = minFunc(@costFunction, initialTheta, options)
● In the following neural network we have three parameter matrices; how do we feed them into the "minFunc" function?

(Figure: a four-layer network with parameter matrices ϴ(1), ϴ(2), ϴ(3).)

"Unroll" into vectors

Page 10:

Advanced optimization - example

L = 4, s1 = 4, s2 = 5, s3 = 5, s4 = 4

ϴ(1) ∈ ℝ5×5, ϴ(2) ∈ ℝ5×6, ϴ(3) ∈ ℝ4×6

Matlab implementation:

1. Unroll: thetaVec = [Theta1(:); Theta2(:); Theta3(:)]
2. Feed thetaVec into "minFunc"
3. Reshape thetaVec in "costFunction":
   a. Theta1 = reshape(thetaVec(1:25), 5, 5);
   b. Theta2 = reshape(thetaVec(26:55), 5, 6);
   c. Theta3 = reshape(thetaVec(56:79), 4, 6);
   d. Compute J and grad

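A minimal sketch of the full unroll/reshape round trip, reusing randInitializeWeights from Page 8 (the variable names are illustrative):

    Theta1 = randInitializeWeights(4, 5);   % 5 x 5
    Theta2 = randInitializeWeights(5, 5);   % 5 x 6
    Theta3 = randInitializeWeights(5, 4);   % 4 x 6

    thetaVec = [Theta1(:); Theta2(:); Theta3(:)];   % 79 x 1 column vector

    % inside costFunction: recover the matrices in the same column order
    T1 = reshape(thetaVec(1:25),  5, 5);
    T2 = reshape(thetaVec(26:55), 5, 6);
    T3 = reshape(thetaVec(56:79), 4, 6);

    % the gradient must be unrolled the same way before it is returned:
    % grad = [D1(:); D2(:); D3(:)];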

Page 11:

Gradient check

● With so many parameters, it is hard to be sure the computed gradient is correct.
● Recalling the definition of the numerical estimate of a gradient, we can compare the backpropagation gradient with its numerical estimate.

Page 12:

Gradient check

  ∂J(ϴ)/∂θⱼ ≈ [ J(θ1, …, θⱼ + ε, …, θn) − J(θ1, …, θⱼ − ε, …, θn) ] / (2ε),  with a small ε such as 10⁻⁴

Page 13:

Gradient check

● Implementation notes:
  ○ Implement backpropagation to compute the gradient
  ○ Implement the numerical gradient check to compute the estimated gradient
  ○ Make sure the two have similar values (difference below a small threshold)
  ○ Turn off the gradient check for training
● Note:
  ○ Be sure to disable your gradient check code before training, otherwise learning will be very slow
  ○ The gradient check generalizes to checking the gradient of any cost function (a Matlab sketch follows)
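A minimal sketch of the numerical side of the check, using the two-sided difference from Page 12; the function name computeNumericalGradient and the cost handle J are assumptions, not from the slides:

    function numgrad = computeNumericalGradient(J, theta)
        % J: handle returning the cost for an unrolled parameter vector
        numgrad = zeros(size(theta));
        epsilon = 1e-4;
        for j = 1:numel(theta)
            e = zeros(size(theta));
            e(j) = epsilon;
            % two-sided difference approximation of dJ/dtheta_j
            numgrad(j) = (J(theta + e) - J(theta - e)) / (2 * epsilon);
        end
    end

A common rule of thumb: norm(numgrad - grad) / norm(numgrad + grad) should be on the order of 1e-9 when backpropagation is implemented correctly.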

Page 14:

Overview: train a neural network

● Design a network architecture
● Randomly initialize the weights
● Implement forward propagation to get the hypothesis hϴ(x(i)) for any x(i)
● Implement code to compute the cost function J(ϴ)
● Implement backpropagation to compute the partial derivatives of J(ϴ)
● Use the gradient check to compare the partial derivatives with the numerical estimate of the gradient of J(ϴ); if they match well, disable the gradient checking code
● Use gradient descent or an advanced optimization method to minimize J(ϴ) as a function of the parameters ϴ (see the sketch below)
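A minimal sketch of this pipeline under the slides' conventions; costFunction is assumed to return [J, grad] for an unrolled theta and to close over the training data, and minFunc is the optimizer from Page 9 (Matlab's built-in fminunc can be substituted):

    % 1-2. architecture and random initialization (Page 10 dimensions)
    Theta1 = randInitializeWeights(4, 5);
    Theta2 = randInitializeWeights(5, 5);
    Theta3 = randInitializeWeights(5, 4);
    initialTheta = [Theta1(:); Theta2(:); Theta3(:)];

    % 3-6. one-off gradient check on the initial parameters
    [~, grad] = costFunction(initialTheta);
    numgrad = computeNumericalGradient(@costFunction, initialTheta);
    fprintf('relative difference: %g\n', ...
            norm(numgrad - grad) / norm(numgrad + grad));

    % 7. train, with gradient checking disabled from here on
    options = struct('MaxIter', 400);
    optTheta = minFunc(@costFunction, initialTheta, options);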

Page 16:

Deep feedforward Neural Networks

Page 17:

Other architectures

Convolutional Neural Network (CNN)

Recurrent Neural Network (RNN)

Page 20:

Discussion

● More parameters make the network more powerful.
  ○ Which is better: more layers or more neurons?
  ○ What are the disadvantages?
● The neural network cost function is non-convex, so gradient descent is susceptible to local optima; in practice it works fairly well even though the optimum found may not be global.
● A neural network is a black-box model.

Page 21:

From logistic regression to SVM

Logistic regression

● Label: y ∈ {0, 1}

● Hypothesis: hθ(x) = 1 / (1 + e^(−θᵀx))

● Objective: minimize the regularized cross-entropy cost

Support Vector Machine (SVM)

● Label: y ∈ {−1, +1}

● Hypothesis: h(x) = sign(θᵀx)

● Objective: minimize the regularized hinge-loss cost (maximize the margin)

Page 22:

From logistic regression to SVM

● Cost function (logistic regression):
  cost(x, y) = −y log hθ(x) − (1 − y) log(1 − hθ(x))
● Cost function (SVM):
  cost(x, y) = max(0, 1 − y θᵀx)

Page 23:

From logistic regression to SVM

● Logistic regression:
  minθ (1/m) Σᵢ [ −y(i) log hθ(x(i)) − (1 − y(i)) log(1 − hθ(x(i))) ] + (λ/2m) Σⱼ θⱼ²

● SVM:
  minθ C Σᵢ max(0, 1 − y(i) θᵀx(i)) + (1/2) Σⱼ θⱼ²

Page 24:

SVM - model representation

● Given training examples (x(i), y(i)) with y(i) ∈ {−1, +1}, SVM aims to find an optimal hyperplane θᵀx = 0 so that:

  y(i) θᵀx(i) ≥ 1 for all i

● Which is equivalent to minimizing the following cost function:

  J(θ) = (1/2) ‖θ‖² + C Σᵢ max(0, 1 − y(i) θᵀx(i))

● Here max(0, 1 − y θᵀx) is called the hinge loss.
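A minimal Matlab sketch of this cost, assuming X is m-by-n with examples in rows and y is m-by-1 with entries in {−1, +1} (the name svmCost is illustrative):

    function J = svmCost(theta, X, y, C)
        margins = y .* (X * theta);          % y(i) * theta' * x(i) per example
        J = 0.5 * (theta' * theta) + C * sum(max(0, 1 - margins));
    end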

Page 25:

SVM - gradient computing

● Because the hinge loss is not differentiable everywhere, a sub-gradient is computed:

  ∂/∂θ max(0, 1 − y θᵀx) = −y x  if y θᵀx < 1,  and 0 otherwise

  so a sub-gradient of J(θ) is θ − C Σᵢ y(i) x(i), summed over the examples with y(i) θᵀx(i) < 1.
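Extending the svmCost sketch from the previous page with this sub-gradient (only margin-violating examples contribute; names remain illustrative):

    function [J, grad] = svmCostGrad(theta, X, y, C)
        margins = y .* (X * theta);
        J = 0.5 * (theta' * theta) + C * sum(max(0, 1 - margins));
        active = margins < 1;                 % examples inside or across the margin
        grad = theta - C * (X(active, :)' * y(active));
    end

This [J, grad] pair can be fed directly into the "minFunc" call from Page 9.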

Page 26:

SVM - intuition

Page 27:

SVM - intuition

● Which of the linear classifiers is optimal?

Page 28:

SVM - intuition

1. Maximizing the margin is good according to intuition and PAC theory.
2. It implies that only the support vectors are important; the other training examples can be ignored.

(Figure: the maximum-margin hyperplane, with the support vectors lying on the margin boundaries.)

Page 29:

SVM - intuition

● Which linear classifier has better performance?
