CSC 4510 – Machine Learning
Dr. Mary-Angela Papalaskari
Department of Computing Sciences
Villanova University
Course website: www.csc.villanova.edu/~map/4510/
4: Regression (continued)
1CSC 4510 - M.A. Papalaskari - Villanova University
The slides in this presentation are adapted from:
• The Stanford online ML course, http://www.ml-class.org/
Last time
• Introduction to linear regression
• Intuition – least squares approximation
• Intuition – gradient descent algorithm
• Hands on: Simple example using Excel
Today
• How to apply gradient descent to minimize the cost function for regression
• linear algebra refresher
Reminder: sample problem
[Figure: Housing prices (Portland, OR). Axes: Size (feet²) and Price (in 1000s of dollars).]
Reminder: Notation
m = number of training examples
x's = "input" variable / features
y's = "output" variable / "target" variable

Training set of housing prices (Portland, OR):
Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …
Reminder: Learning algorithm for hypothesis function h
Training Set → Learning Algorithm → h
Size of house → h → Estimated price
Linear hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$ (univariate linear regression)
Gradient descent algorithm Linear Regression Model
Today
• How to apply gradient descent to minimize the cost function for regression
  1. a closer look at the cost function
  2. applying gradient descent to find the minimum of the cost function
• linear algebra refresher
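Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
Parameters: $\theta_0, \theta_1$
Cost Function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$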
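Simplified ($\theta_0 = 0$):
Hypothesis: $h_\theta(x) = \theta_1 x$
Parameters: $\theta_1$
Cost Function: $J(\theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Goal: $\min_{\theta_1} J(\theta_1)$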
Three examples with θ0 = 0:
• hθ(x) = x
• hθ(x) = 0.5x
• hθ(x) = 0
[Figure for each example: left panel, hθ(x) plotted on axes x and y (for fixed θ1, this is a function of x); right panel, the cost J(θ1) (a function of the parameter θ1).]
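The three hypotheses hθ(x) = x, 0.5x and 0 can be compared numerically. The sketch below assumes the usual squared-error cost with a 1/(2m) factor and a hypothetical three-point training set that lies exactly on y = x; neither the data nor the function name `cost` comes from the slides.

```python
def cost(theta1, xs, ys):
    """Squared-error cost J(theta1) for the simplified hypothesis h(x) = theta1 * x."""
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# Hypothetical toy training set (an assumption, not from the slides): points on y = x.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

# Evaluate the cost for the three slopes shown on the slides.
for theta1 in (1.0, 0.5, 0.0):
    print(f"theta1 = {theta1}: J = {cost(theta1, xs, ys):.4f}")
```

On this toy data the cost is zero at θ1 = 1 and grows as θ1 moves away from 1, which is the bowl shape traced out in the right-hand panels.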
What if θ0 ≠ 0?
hθ(x) = 10 + 0.1x
[Figure: left panel, hθ(x) over the training data, axes Size in feet² (x) and Price ($) in 1000's (for fixed θ0, θ1, this is a function of x); right panel, the cost J(θ0, θ1) (a function of the parameters θ0, θ1).]
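As a rough check of how well hθ(x) = 10 + 0.1x fits, the cost can be evaluated on the four training examples listed earlier (sizes 2104, 1416, 1534, 852). This is a minimal sketch assuming the squared-error cost with a 1/(2m) factor; the function name `cost` is illustrative.

```python
def cost(theta0, theta1, xs, ys):
    """Squared-error cost J(theta0, theta1) for h(x) = theta0 + theta1 * x."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# The four training examples shown on the notation slide.
sizes = [2104, 1416, 1534, 852]   # size in square feet (x)
prices = [460, 232, 315, 178]     # price in $1000s (y)

j_slide = cost(10, 0.1, sizes, prices)    # the hypothesis from the slide
j_other = cost(0, 0.2, sizes, prices)     # a hypothetical alternative, for comparison
print(f"J(10, 0.1) = {j_slide:.2f}")
print(f"J(0, 0.2)  = {j_other:.2f}")
```

Each choice of (θ0, θ1) is one point on the cost surface; comparing two points like this is what the contour plots do for many points at once.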
Have some function $J(\theta_0, \theta_1)$
Want $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
Gradient descent algorithm outline:
• Start with some $\theta_0, \theta_1$
• Keep changing $\theta_0, \theta_1$ to reduce $J(\theta_0, \theta_1)$ until we hopefully end up at a minimum
Gradient descent algorithm
repeat until convergence {
    $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$   (for j = 0 and j = 1)
}
α is the learning rate.
If α is too small, gradient descent can be slow.
If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
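The effect of the learning rate can be seen on a one-parameter example. The sketch below assumes a hypothetical cost J(θ) = θ², whose derivative is 2θ and whose minimum is at θ = 0; the function and the three values of α are illustrative choices, not taken from the slides.

```python
def gradient_descent(alpha, theta=1.0, steps=20):
    """Minimize J(theta) = theta**2 (derivative 2 * theta) with a fixed learning rate."""
    for _ in range(steps):
        gradient = 2 * theta
        theta = theta - alpha * gradient  # the step shrinks as the gradient shrinks
    return theta

# Small, moderate and too-large learning rates, all starting from theta = 1.
for alpha in (0.01, 0.4, 1.1):
    print(f"alpha = {alpha}: theta after 20 steps = {gradient_descent(alpha):.6g}")
```

With α = 0.01 the parameter is still far from 0 after 20 steps, with α = 0.4 it is essentially at the minimum, and with α = 1.1 each step overshoots by more than the last, so the iterates move away from the minimum.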
[Figure: the current value of θ1 at a local minimum.]
Gradient descent can converge to a local minimum, even with the learning rate α fixed.
Gradient descent algorithm Linear Regression Model
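Gradient descent algorithm
repeat until convergence {
    $\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
    $\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
}
update $\theta_0$ and $\theta_1$ simultaneously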
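A minimal sketch of gradient descent for univariate linear regression follows, assuming the squared-error cost with a 1/(2m) factor. The training set (points on y = 1 + 2x), the learning rate and the iteration count are hypothetical choices for illustration, and `fit_line` is not a name from the course.

```python
def fit_line(xs, ys, alpha=0.1, iterations=5000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # Prediction errors for all m training examples under the current parameters.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both gradients were computed before either parameter changes.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# Hypothetical toy data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
theta0, theta1 = fit_line(xs, ys)
print(f"theta0 = {theta0:.4f}, theta1 = {theta1:.4f}")
```

Updating θ0 first and then using the new θ0 to compute the gradient for θ1 would be a different procedure; the tuple assignment keeps the two updates simultaneous.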
[Figures: a plot of the cost J(θ0, θ1), followed by a sequence of paired plots: hθ(x) (for fixed θ0, θ1, this is a function of x) and J(θ0, θ1) (a function of the parameters θ0, θ1).]
“Batch” Gradient Descent
“Batch”: Each step of gradient descent uses all the training examples.
Alternative: process part of the dataset for each step of the algorithm.
What’s next? We are not in univariate regression anymore:

      Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
1     2104           5                    1                  45                    460
1     1416           3                    2                  40                    232
1     1534           3                    2                  30                    315
1     852            2                    1                  36                    178
Linear Algebra Review
Matrix: Rectangular array of numbers
Dimension of matrix: number of rows × number of columns, e.g. 4 × 2
Matrix elements (entries of matrix): $A_{ij}$ = "i, j entry", in the i-th row, j-th column
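One possible way to hold a matrix in code is a list of rows. The sketch below uses plain Python lists to stay dependency-free (a numerical library would be the more usual choice in practice); the 4 × 2 values and the helper names `dimension` and `entry` are made up for illustration.

```python
# A hypothetical 4 x 2 matrix stored as a list of rows.
A = [
    [1, 2],
    [3, 4],
    [5, 6],
    [7, 8],
]

def dimension(M):
    """Return (number of rows, number of columns)."""
    return len(M), len(M[0])

def entry(M, i, j):
    """Return the i, j entry using the 1-indexed convention of the slides."""
    return M[i - 1][j - 1]  # Python lists themselves are 0-indexed

print(dimension(A))    # (4, 2)
print(entry(A, 3, 2))  # 6
```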
Another Example: Representing communication links in a network
[Figure: two network graphs, each on the nodes a, b, c, d, e.]

Adjacency matrix (first graph):
   a  b  c  d  e
a  0  1  2  0  3
b  1  0  0  0  0
c  2  0  0  1  1
d  0  0  1  0  1
e  3  0  1  1  0

Adjacency matrix (second graph):
   a  b  c  d  e
a  0  1  0  0  2
b  0  1  0  0  0
c  1  0  0  1  0
d  0  0  1  0  1
e  0  0  0  0  0
Vector: An n × 1 matrix.
$y_i$ = i-th element
1-indexed vs 0-indexed: a 1-indexed vector has elements $y_1, \dots, y_n$; a 0-indexed vector has elements $y_0, \dots, y_{n-1}$.
Matrix Addition
Scalar Multiplication
Combination of Operands
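The worked examples for these three slides did not survive in the transcript, so here is a small sketch of matrix addition, scalar multiplication and a combination of the two. The matrices and the helper names `add` and `scale` are hypothetical; plain lists are used to keep the example dependency-free.

```python
def add(A, B):
    """Element-wise sum; both matrices must have the same dimension."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("matrix addition needs matrices of equal dimension")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    """Multiply every entry of A by the scalar c."""
    return [[c * a for a in row] for row in A]

# Hypothetical 3 x 2 matrices.
A = [[1, 0], [2, 5], [3, 1]]
B = [[4, 2], [2, 5], [0, 1]]

print(add(A, B))                        # [[5, 2], [4, 10], [3, 2]]
print(scale(3, A))                      # [[3, 0], [6, 15], [9, 3]]
print(add(scale(3, A), scale(-1, B)))   # 3A - B = [[-1, -2], [4, 10], [9, 2]]
```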
Matrix-vector multiplication
Details: $A \times x = y$
• A: m × n matrix (m rows, n columns)
• x: n × 1 matrix (n-dimensional vector)
• y: m-dimensional vector
To get $y_i$, multiply A's i-th row with the elements of vector x, and add them up.
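The row-by-row rule can be written directly as code. This is a sketch with a hypothetical 3 × 2 matrix and 2-dimensional vector; `matvec` is an illustrative name.

```python
def matvec(A, x):
    """Multiply an m x n matrix A by an n-dimensional vector x, giving an m-dimensional vector."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("each row of A must have as many entries as x")
    # y_i = (i-th row of A) multiplied element by element with x, then summed.
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

# Hypothetical example: a 3 x 2 matrix times a 2-dimensional vector.
A = [[1, 3], [4, 0], [2, 1]]
x = [1, 5]
y = matvec(A, x)
print(y)  # [16, 4, 7]
```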
House sizes:
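The numbers on this slide were lost in the transcript. As a sketch of the idea, the code below assumes the four house sizes from the training set and the hypothesis hθ(x) = 10 + 0.1x shown earlier; whether the original slide used these values is an assumption. Each row of the data matrix is [1, size], so one matrix-vector product gives all predictions.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

sizes = [2104, 1416, 1534, 852]        # house sizes from the training set
data = [[1, size] for size in sizes]   # a leading 1 in each row pairs with theta0
theta = [10, 0.1]                      # assumed hypothesis h(x) = 10 + 0.1x

predictions = matvec(data, theta)      # predicted prices in $1000s, one per house
for size, price in zip(sizes, predictions):
    print(f"{size} sq ft -> {price:.1f}")
```

Writing the prediction as a single product avoids an explicit loop over houses in the calling code.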
Example matrix-matrix multiplication
Details: $A \times B = C$
• A: m × k matrix (m rows, k columns)
• B: k × n matrix (k rows, n columns)
• C: m × n matrix
The i-th column of the matrix C is obtained by multiplying A with the i-th column of B (for i = 1, 2, …, n).
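The column-by-column description translates into a short sketch: take each column of B, multiply A by it, and collect the results as the columns of C. The 2 × 2 matrices and the helper names are hypothetical.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def matmul(A, B):
    """Multiply an m x k matrix A by a k x n matrix B, one column of B at a time."""
    k, n = len(B), len(B[0])
    if any(len(row) != k for row in A):
        raise ValueError("columns of A must match rows of B")
    # columns[c] is A times the c-th column of B, i.e. the c-th column of C.
    columns = [matvec(A, [B[r][c] for r in range(k)]) for c in range(n)]
    # Turn the list of columns back into a list of rows.
    return [list(row) for row in zip(*columns)]

A = [[1, 3], [2, 5]]
B = [[0, 1], [3, 2]]
C = matmul(A, B)
print(C)  # [[9, 7], [15, 12]]
```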
House sizes, with 3 competing hypotheses (1., 2., 3.): a matrix of house sizes times a matrix of hypothesis parameters (Matrix × Matrix).
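The three hypotheses themselves are missing from the transcript. To illustrate the idea only, the sketch below uses three hypothetical parameter pairs (the first matches hθ(x) = 10 + 0.1x from earlier, the other two are invented) and computes every prediction for every house with one matrix-matrix product.

```python
def matmul(A, B):
    """Multiply an m x k matrix by a k x n matrix (both as lists of rows)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

sizes = [2104, 1416, 1534, 852]
data = [[1, size] for size in sizes]   # 4 x 2: one row [1, size] per house

# 2 x 3 parameter matrix: column c holds (theta0, theta1) of hypothesis c.
# All three columns are assumptions made for this example.
params = [
    [10, -40, 200],    # theta0 for hypotheses 1, 2, 3
    [0.1, 0.25, 0.1],  # theta1 for hypotheses 1, 2, 3
]

predictions = matmul(data, params)     # 4 x 3: row = house, column = hypothesis
for size, row in zip(sizes, predictions):
    print(size, [round(p, 1) for p in row])
```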
Let A and B be matrices. Then in general, $A \times B \neq B \times A$ (not commutative).
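A small numeric case makes the point concrete. The two 2 × 2 matrices below are chosen for illustration, since the slide's own example is not in the transcript.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[1, 1], [0, 0]]
B = [[0, 0], [2, 0]]

AB = matmul(A, B)
BA = matmul(B, A)
print(AB)  # [[2, 0], [0, 0]]
print(BA)  # [[0, 0], [2, 2]]
```

The two products differ, so the order of the operands matters.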
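Identity Matrix
Denoted I (or $I_{n \times n}$ or $I_n$).
Examples of identity matrices:
2 × 2: $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$
3 × 3: $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
4 × 4: $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$
For any matrix A, $A \times I = I \times A = A$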
Matrix inverse: $A^{-1}$
If A is an m × m matrix, and if it has an inverse, $A A^{-1} = A^{-1} A = I$
Matrices that don’t have an inverse are “singular” or “degenerate”
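For the 2 × 2 case the inverse has a closed form, which makes a compact illustration. The sketch below uses exact fractions to avoid rounding; the example matrices and the helper names are hypothetical, and larger matrices would normally be handled by a numerical library.

```python
from fractions import Fraction

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse_2x2(A):
    """Return the inverse of a 2 x 2 matrix, or None if the matrix is singular."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        return None  # no inverse exists: the matrix is singular (degenerate)
    det = Fraction(det)
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[3, 4], [2, 16]]
A_inv = inverse_2x2(A)
identity = [[1, 0], [0, 1]]

print(matmul(A, A_inv) == identity)   # True
print(matmul(A_inv, A) == identity)   # True
print(inverse_2x2([[1, 2], [2, 4]]))  # None (second row is twice the first)
```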
Matrix Transpose
Let A be an m × n matrix, and let $B = A^T$.
Then B is an n × m matrix, and $B_{ij} = A_{ji}$.
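A short sketch of the transpose, using a hypothetical 2 × 3 matrix; rows become columns and the entry at row i, column j of the result is the entry at row j, column i of the original.

```python
def transpose(A):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(column) for column in zip(*A)]

A = [[1, 2, 0], [3, 5, 9]]   # 2 x 3
B = transpose(A)             # 3 x 2
print(B)  # [[1, 3], [2, 5], [0, 9]]
```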