Lecture 11 – Perceptrons
Machine Learning
Last Time
• Beginning of Support Vector Machines
– Maximum Margin Criteria
Today
• Perceptrons
• Leading to
– Neural Networks
– aka Multilayer Perceptron Networks
– But more accurately: Multilayer Logistic Regression Networks
Review: Fitting Polynomial Functions
• Fitting nonlinear 1-D functions
• Polynomial: f(x) = θ_0 + θ_1 x + θ_2 x² + … + θ_D x^D
• Risk: R(θ) = ½ Σ_i (y_i − f(x_i))²
Review: Fitting Polynomial Functions
• Order-D polynomial regression for 1-D variables is the same as D-dimensional linear regression
• Extend the feature vector from a scalar: x → (1, x, x², …, x^D)
• More generally, replace the powers with arbitrary basis functions: f(x) = Σ_d θ_d ϕ_d(x) (see the sketch below)
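A minimal sketch of this equivalence (the toy data and function names here are illustrative, not from the lecture): build the expanded feature matrix [1, x, x², …, x^D] and solve ordinary least squares.

```python
import numpy as np

# Order-D polynomial regression as linear regression on expanded features.
def poly_features(x, D):
    # x: (N,) scalar inputs -> (N, D+1) design matrix [1, x, x^2, ..., x^D]
    return np.vander(x, D + 1, increasing=True)

def fit_poly(x, y, D):
    Phi = poly_features(x, D)
    # Least-squares solution of min_theta ||Phi @ theta - y||^2
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return theta

# Toy usage: fit a cubic to noisy samples of a nonlinear 1-D function
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(50)
theta = fit_poly(x, y, D=3)
y_hat = poly_features(x, 3) @ theta
```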
Neuron inspires Regression
• Graphical representation of linear regression
– McCulloch-Pitts neuron
– Note: not a graphical model
Neuron inspires Regression
• Edges multiply the signal (x_i) by some weight (θ_i)
• Nodes sum their inputs
• Equivalent to linear regression
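As a sketch (assuming the usual reading of the diagram), the whole network computes a single weighted sum:

```python
import numpy as np

# A linear neuron: edges scale each input x_i by theta_i, the node sums
# them -- exactly the linear-regression prediction theta^T x.
def linear_neuron(x, theta):
    return np.dot(theta, x)

print(linear_neuron(np.array([1.0, 2.0]), np.array([0.5, -1.0])))  # -1.5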
Introducing Basis functions
• Graphical representation of feature extraction
• Edges multiply the signal by a weight
• Nodes apply a function, ϕ_d
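A sketch of the same picture with basis-function nodes; the Gaussian bumps here are a hypothetical choice of ϕ_d, just for illustration:

```python
import numpy as np

# Nodes apply basis functions phi_d; edges carry weights theta_d.
# Output: f(x) = sum_d theta_d * phi_d(x), here with Gaussian-bump phi_d.
def phi(x, centers, width=0.5):
    # (N,) inputs, (D,) centers -> (N, D) matrix of phi_d(x_n)
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * width**2))

centers = np.linspace(-1, 1, 5)   # 5 basis functions
theta = np.ones(5)                # arbitrary example weights
x = np.linspace(-1, 1, 11)
y_hat = phi(x, centers) @ theta   # network output for each input
```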
Extension to more features
• Graphical representation of feature extraction
Combining function
• How do we construct the neural output?
Linear neuron: y = θᵀx (just the weighted sum)
Combining function
• Sigmoid function or squashing function: g(z) = 1 / (1 + e^(−z))
Logistic neuron: classification using the same metaphor
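A minimal sketch of the logistic neuron, assuming the standard sigmoid above:

```python
import numpy as np

# Logistic neuron: the linear neuron's sum is squashed into (0, 1),
# which can be read as P(y = 1 | x) for classification.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_neuron(x, theta):
    return sigmoid(np.dot(theta, x))
```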
Logistic Neuron optimization
• Minimizing R(θ) is more difficult
Bad News: There is no “closed-form” solution.
Good News: It’s convex.
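For reference, the risk here is presumably the usual negative log-likelihood for labels y_i ∈ {0, 1}, with g the sigmoid:

R(θ) = −Σ_i [ y_i log g(θᵀx_i) + (1 − y_i) log(1 − g(θᵀx_i)) ]

This has no closed-form minimizer, but it is convex in θ.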
Aside: Convex Regions
• Convex: for any pair of points x_a and x_b within a region, every point x_c on the line segment between x_a and x_b is also in the region
Aside: Convex Functions
• A function f is convex if, for any pair of points x_a and x_b, the chord between (x_a, f(x_a)) and (x_b, f(x_b)) lies on or above the function: f(λx_a + (1 − λ)x_b) ≤ λf(x_a) + (1 − λ)f(x_b) for all λ ∈ [0, 1]
Aside: Convex Functions
• Convex functions have a single minimum: any local minimum is the global minimum!
• How does this help us?
• (Nearly) guaranteed optimality of gradient descent
Gradient Descent
• The gradient ∇R(θ) is defined, though we can't set it to zero and solve directly
• It points in the direction of fastest increase
Gradient Descent
• The gradient points in the direction of fastest increase
• To minimize R, move in the opposite direction: θ ← θ − η∇R(θ)
Gradient Descent
• Initialize randomly
• Update with small steps
• (Nearly) guaranteed to converge to the minimum
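A minimal sketch of batch gradient descent on the logistic risk, assuming labels y ∈ {0, 1} and the standard gradient Xᵀ(g(Xθ) − y); the step size and step count here are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, eta=0.01, steps=1000):
    rng = np.random.default_rng(0)
    theta = rng.normal(size=X.shape[1])        # initialize randomly
    for _ in range(steps):
        grad = X.T @ (sigmoid(X @ theta) - y)  # direction of fastest increase
        theta -= eta * grad                    # small step the opposite way
    return theta
```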
Gradient Descent
• Initialize randomly
• Update with small steps
• Can oscillate if η is too large
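A toy illustration (not from the slides) of the oscillation, on f(x) = x² whose gradient is 2x: with η = 1.1 each step overshoots the minimum and the iterates grow, while η = 0.1 converges smoothly.

```python
def descend(eta, x0=1.0, steps=5):
    # Gradient descent on f(x) = x^2 (gradient 2x), tracking the iterates.
    x, trace = x0, [x0]
    for _ in range(steps):
        x -= eta * 2 * x
        trace.append(x)
    return trace

print(descend(eta=0.1))  # 1.0, 0.8, 0.64, ...  -> converges toward 0
print(descend(eta=1.1))  # 1.0, -1.2, 1.44, ... -> oscillates and diverges
```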
Gradient Descent
• Initialize randomly
• Update with small steps
• Can stall if the gradient is ever 0 away from the minimum
Back to Neurons
Linear neuron: y = θᵀx
Logistic neuron: y = g(θᵀx)
Perceptron
• Classification squashing function: replace the sigmoid with a hard threshold, y = sign(θᵀx)
• Uses strictly classification error
Classification Error
• Only count errors when a classification is incorrect
The sigmoid leads to greater-than-zero error even on correct classifications
Perceptrons use strictly classification error
Perceptron Error
• Can't do gradient descent on this: the classification error is piecewise constant, so its gradient is zero almost everywhere
Perceptron Loss
• With classification loss, gradient descent is unavailable, so define the perceptron loss:
– Loss calculated for each misclassified data point: with labels y_i ∈ {−1, +1}, E(θ) = −Σ_{i∈M} y_i θᵀx_i over the set M of misclassified points
– The risk is now piecewise linear rather than a step function
Perceptron Loss
• Loss calculated for each misclassified data point
• Now gradient descent works nicely: each misclassified point contributes gradient −y_i x_i (see the sketch below)
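A minimal sketch of the perceptron loss and its gradient, assuming labels y ∈ {−1, +1} so that a point is misclassified when y · θᵀx ≤ 0:

```python
import numpy as np

def perceptron_loss(theta, X, y):
    # Only misclassified points contribute, each adding -y_i * theta^T x_i
    margins = y * (X @ theta)
    return -np.sum(margins[margins <= 0])

def perceptron_grad(theta, X, y):
    # Piecewise-linear loss: each misclassified point contributes -y_i x_i
    mis = y * (X @ theta) <= 0
    return -(y[mis, None] * X[mis]).sum(axis=0)
```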
Perceptron vs. Logistic Regression
• Logistic regression has a hard time eliminating all errors
– Errors: 2
• Perceptrons often do better, since errors carry increased importance
– Errors: 0
Stochastic Gradient Descent
• Instead of calculating the gradient over all points and then updating,
• update the parameters for each misclassified point as it is encountered during training
• Update rule: θ ← θ + η y_i x_i for a misclassified point x_i
Online Perceptron Training
• Online training: update weights for each data point
• Iterate over points x_i:
– If x_i is correctly classified, leave θ unchanged
– Else, apply the update θ ← θ + η y_i x_i (see the sketch below)
• Theorem: if the points x_i in X are linearly separable, then this process will converge to a θ* which leads to zero error in a finite number of steps
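A minimal sketch of the online training loop under the same conventions (labels in {−1, +1}, bias folded into the features); max_epochs is a safeguard for data that is not linearly separable:

```python
import numpy as np

def train_perceptron(X, y, eta=1.0, max_epochs=100):
    theta = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * np.dot(theta, xi) <= 0:   # misclassified (or on boundary)
                theta += eta * yi * xi        # perceptron update
                mistakes += 1
        if mistakes == 0:                     # zero training error: done
            break
    return theta
```

On linearly separable data this terminates by the convergence theorem above.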
Linearly Separable
• Two classes of points are linearly separable iff there exists a line such that all the points of one class fall on one side of the line and all the points of the other class fall on the other side
Next
• Multilayer Neural Networks