csc 4510 – machine learning

CSC 4510 – Machine Learning Dr. Mary-Angela Papalaskari Department of Computing Sciences Villanova University Course website: www.csc.villanova.edu/~map/4510/ 4: Regression (continued) 1 CSC 4510 - M.A. Papalaskari - Villanova University The slides in this presentation are adapted from: The Stanford online ML course http://www.ml-class.org/

Upload: ciara

Post on 11-Jan-2016

TRANSCRIPT

Page 1: CSC 4510 – Machine Learning

CSC 4510 – Machine Learning
Dr. Mary-Angela Papalaskari
Department of Computing Sciences
Villanova University

Course website: www.csc.villanova.edu/~map/4510/

4: Regression (continued)

CSC 4510 - M.A. Papalaskari - Villanova University

The slides in this presentation are adapted from:
• The Stanford online ML course http://www.ml-class.org/

Page 2: CSC 4510 – Machine Learning

Last time

• Introduction to linear regression
• Intuition – least squares approximation
• Intuition – gradient descent algorithm
• Hands on: Simple example using Excel

Page 3: CSC 4510 – Machine Learning

Today

• How to apply gradient descent to minimize the cost function for regression

• linear algebra refresher


Page 4: CSC 4510 – Machine Learning

Reminder: sample problem

[Figure: Housing Prices (Portland, OR). Horizontal axis: Size (feet²). Vertical axis: Price (in 1000s of dollars).]

Page 5: CSC 4510 – Machine Learning

Reminder: Notation

Training set of housing prices (Portland, OR)

Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …

Notation:

m = Number of training examples
x’s = “input” variable / features
y’s = “output” variable / “target” variable

Page 6: CSC 4510 – Machine Learning

Reminder: Learning algorithm for hypothesis function h

Training Set → Learning Algorithm → h

Size of house → h → Estimated price

Linear Hypothesis: hθ(x) = θ0 + θ1x

(Univariate linear regression)


Page 8: CSC 4510 – Machine Learning

Gradient descent algorithm Linear Regression Model


Page 9: CSC 4510 – Machine Learning

Today

• How to apply gradient descent to minimize the cost function for regression
  1. a closer look at the cost function
  2. applying gradient descent to find the minimum of the cost function

• linear algebra refresher

Page 10: CSC 4510 – Machine Learning

Hypothesis:

Parameters:

Cost Function:

Goal:
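The formulas on this slide were images and did not survive in the transcript. As a hedged reconstruction, assuming the standard least-squares setup of the Stanford course these slides are adapted from (the 1/(2m) scaling is that course's convention and is an assumption here), the four items can be written as:

```latex
\begin{align*}
\text{Hypothesis:}\quad & h_\theta(x) = \theta_0 + \theta_1 x \\
\text{Parameters:}\quad & \theta_0,\ \theta_1 \\
\text{Cost function:}\quad & J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2 \\
\text{Goal:}\quad & \min_{\theta_0,\, \theta_1} J(\theta_0, \theta_1)
\end{align*}
```

Here m is the number of training examples and (x^(i), y^(i)) is the ith training example, matching the notation slide.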


Page 11: CSC 4510 – Machine Learning

Hypothesis:

Parameters:

Cost Function:

Goal:

Simplified

θ0 = 0

Page 12: CSC 4510 – Machine Learning

[Figure: two panels. Left: y against x (for fixed θ1, this is a function of x). Right: a function of the parameter θ1.]

θ0 = 0

hθ(x) = x

Page 13: CSC 4510 – Machine Learning

[Figure: two panels. Left: y against x (for fixed θ1, this is a function of x). Right: a function of the parameter θ1.]

θ0 = 0

hθ(x) = 0.5x

Page 14: CSC 4510 – Machine Learning

[Figure: two panels. Left: y against x (for fixed θ1, this is a function of x). Right: a function of the parameter θ1.]

θ0 = 0

hθ(x) = 0
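The three simplified hypotheses (hθ(x) = x, 0.5x and 0, all with θ0 = 0) can be compared by evaluating the cost for each value of θ1. The sketch below is illustrative only: the training points are hypothetical (the data plotted on the slides did not survive), and the squared-error cost with a 1/(2m) factor is assumed from the Stanford course convention.

```python
def cost(theta1, xs, ys):
    """Squared-error cost J(theta1) for h(x) = theta1 * x (theta0 fixed at 0)."""
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# Hypothetical training set (not from the slides): three points on the line y = x.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

# Evaluate the cost for the three slopes shown on the slides.
for theta1 in (1.0, 0.5, 0.0):
    print(f"theta1 = {theta1}: J = {cost(theta1, xs, ys):.3f}")
```

With this made-up data the slope 1 fits perfectly, so its cost is zero, and the cost grows as θ1 moves away from 1. Each choice of θ1 gives one point on the curve in the right-hand panel.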

Page 15: CSC 4510 – Machine Learning

Hypothesis:

Parameters:

Cost Function:

Goal:

What if θ0 ≠ 0?

Page 16: CSC 4510 – Machine Learning

[Figure: two panels. Left: Price ($) in 1000’s against Size in feet² (x) (for fixed θ0, θ1, this is a function of x). Right: a function of the parameters θ0, θ1.]

hθ(x) = 10 + 0.1x

Page 17: CSC 4510 – Machine Learning

Page 18: CSC 4510 – Machine Learning

(for fixed θ0, θ1, this is a function of x) (function of the parameters θ0, θ1)

Page 19: CSC 4510 – Machine Learning

(for fixed θ0, θ1, this is a function of x) (function of the parameters θ0, θ1)

Page 20: CSC 4510 – Machine Learning

(for fixed θ0, θ1, this is a function of x) (function of the parameters θ0, θ1)

Page 21: CSC 4510 – Machine Learning

(for fixed θ0, θ1, this is a function of x) (function of the parameters θ0, θ1)

Page 22: CSC 4510 – Machine Learning

Today

• How to apply gradient descent to minimize the cost function for regression
  1. a closer look at the cost function
  2. applying gradient descent to find the minimum of the cost function

• linear algebra refresher

Page 23: CSC 4510 – Machine Learning

Have some function J(θ0, θ1)

Want to minimize J(θ0, θ1) over θ0, θ1

Gradient descent algorithm outline:

• Start with some θ0, θ1

• Keep changing θ0, θ1 to reduce J(θ0, θ1) until we hopefully end up at a minimum

Page 24: CSC 4510 – Machine Learning

Have some function J(θ0, θ1)

Want to minimize J(θ0, θ1) over θ0, θ1

Gradient descent algorithm

Page 25: CSC 4510 – Machine Learning

Have some function J(θ0, θ1)

Want to minimize J(θ0, θ1) over θ0, θ1

Gradient descent algorithm

α: learning rate
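The update rule itself was an image on this slide and is missing from the transcript. Assuming the usual form from the Stanford course these slides are adapted from, with α as the learning rate, it can be written as:

```latex
\begin{align*}
&\text{repeat until convergence:} \\
&\qquad \theta_j := \theta_j - \alpha \, \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)
\qquad \text{(for } j = 0 \text{ and } j = 1\text{)}
\end{align*}
```

Each step moves every parameter against the slope of J in that parameter's direction, and α scales the size of the step.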

Page 26: CSC 4510 – Machine Learning

If α is too small, gradient descent can be slow.

If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
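A small experiment makes both cases concrete. This is a sketch on a hypothetical one-parameter cost J(θ) = θ², chosen only because its derivative (2θ) is easy to check by hand; the α values are illustrative, not recommendations.

```python
def gradient_descent(alpha, theta=1.0, steps=10):
    """Run gradient descent on J(theta) = theta**2, whose derivative is 2 * theta."""
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return theta

slow = gradient_descent(alpha=0.01)      # small alpha: still far from the minimum at 0
good = gradient_descent(alpha=0.3)       # moderate alpha: very close to 0 after 10 steps
diverged = gradient_descent(alpha=1.1)   # large alpha: every step overshoots further

print(slow, good, diverged)
```

For this cost each step multiplies θ by (1 − 2α), so the iterates shrink toward 0 only when that factor has magnitude below 1. With α = 1.1 the factor is −1.2, and θ grows in magnitude while flipping sign.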


Page 27: CSC 4510 – Machine Learning

at local minimum

Current value of


Page 28: CSC 4510 – Machine Learning

Gradient descent can converge to a local minimum, even with the learning rate α fixed.
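One way to see why a fixed α can still converge: the derivative shrinks as the parameter approaches a minimum, so the steps shrink with it. The sketch below records the step sizes on a hypothetical cost J(θ) = (θ − 1)²; the cost, starting point and α are all illustrative assumptions.

```python
alpha = 0.1        # fixed learning rate (illustrative value)
theta = 4.0        # arbitrary starting point; the minimum is at theta = 1
step_sizes = []

for _ in range(5):
    step = alpha * 2 * (theta - 1)   # alpha times the derivative of (theta - 1)**2
    theta -= step
    step_sizes.append(abs(step))

print(step_sizes)  # each step is smaller than the one before
```

Since α never changes, the shrinking steps come entirely from the shrinking derivative.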


Page 29: CSC 4510 – Machine Learning

Gradient descent algorithm Linear Regression Model


Page 30: CSC 4510 – Machine Learning

Gradient descent algorithm

update θ0 and θ1 simultaneously
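A minimal sketch of gradient descent for univariate linear regression shows what the simultaneous update means in practice. It assumes the squared-error cost with a 1/(2m) factor (the Stanford course convention), and the data, learning rate and iteration count are hypothetical.

```python
def gradient_descent(xs, ys, alpha, iterations):
    """Batch gradient descent for the hypothesis h(x) = theta0 + theta1 * x."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                                # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m     # dJ/dtheta1
        # Simultaneous update: both gradients were computed from the old
        # values before either parameter is changed.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# Hypothetical data lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
theta0, theta1 = gradient_descent(xs, ys, alpha=0.1, iterations=5000)
print(theta0, theta1)  # close to 1 and 2
```

Updating theta0 first and then using the new theta0 to compute the gradient for theta1 would be a different algorithm. Computing both gradients before assigning either parameter avoids that.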


Page 31: CSC 4510 – Machine Learning

J(θ0, θ1)

Page 32: CSC 4510 – Machine Learning

Page 33: CSC 4510 – Machine Learning

(for fixed θ0, θ1, this is a function of x) (function of the parameters θ0, θ1)

Page 34: CSC 4510 – Machine Learning

(for fixed θ0, θ1, this is a function of x) (function of the parameters θ0, θ1)

Page 35: CSC 4510 – Machine Learning

(for fixed θ0, θ1, this is a function of x) (function of the parameters θ0, θ1)

Page 36: CSC 4510 – Machine Learning

(for fixed θ0, θ1, this is a function of x) (function of the parameters θ0, θ1)

Page 37: CSC 4510 – Machine Learning

(for fixed θ0, θ1, this is a function of x) (function of the parameters θ0, θ1)

Page 38: CSC 4510 – Machine Learning

(for fixed θ0, θ1, this is a function of x) (function of the parameters θ0, θ1)

Page 39: CSC 4510 – Machine Learning

(for fixed θ0, θ1, this is a function of x) (function of the parameters θ0, θ1)

Page 40: CSC 4510 – Machine Learning

(for fixed θ0, θ1, this is a function of x) (function of the parameters θ0, θ1)

Page 41: CSC 4510 – Machine Learning

(for fixed θ0, θ1, this is a function of x) (function of the parameters θ0, θ1)

Page 42: CSC 4510 – Machine Learning

“Batch” Gradient Descent

“Batch”: Each step of gradient descent uses all the training examples.

Alternative: process part of the dataset for each step of the algorithm.


Page 43: CSC 4510 – Machine Learning

What’s next? We are not in univariate regression anymore:

     Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
1    2104           5                    1                  45                    460
1    1416           3                    2                  40                    232
1    1534           3                    2                  30                    315
1    852            2                    1                  36                    178


Page 45: CSC 4510 – Machine Learning

Today

• How to apply gradient descent to minimize the cost function for regression
  1. a closer look at the cost function
  2. applying gradient descent to find the minimum of the cost function

• linear algebra refresher

Page 46: CSC 4510 – Machine Learning

Linear Algebra Review


Page 47: CSC 4510 – Machine Learning

Matrix: Rectangular array of numbers

Dimension of matrix: number of rows x number of columns, e.g. 4 x 2

Matrix elements (entries of matrix): the “i, j entry” is the entry in the ith row, jth column
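The definitions of dimension and i, j entry can be sketched with a matrix stored as a list of rows. The matrix values and the helper name `entry` are illustrative assumptions; the helper exists only to translate row/column numbers that start at 1 into Python's 0-based indices.

```python
# A 4 x 2 matrix stored as a list of rows (values are illustrative).
A = [
    [1402, 191],
    [1371, 821],
    [949, 1437],
    [147, 1448],
]

rows, cols = len(A), len(A[0])   # dimension: number of rows x number of columns

def entry(matrix, i, j):
    """Return the i, j entry, counting rows and columns from 1."""
    return matrix[i - 1][j - 1]

print(rows, cols)        # 4 2
print(entry(A, 3, 2))    # third row, second column: 1437
```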

Page 48: CSC 4510 – Machine Learning

Another Example: Representing communication links in a network

[Figure: two network diagrams, each on the nodes a, b, c, d, e]

Adjacency matrix (first network)

    a  b  c  d  e
a   0  1  2  0  3
b   1  0  0  0  0
c   2  0  0  1  1
d   0  0  1  0  1
e   3  0  1  1  0

Adjacency matrix (second network)

    a  b  c  d  e
a   0  1  0  0  2
b   0  1  0  0  0
c   1  0  0  1  0
d   0  0  1  0  1
e   0  0  0  0  0

Page 49: CSC 4510 – Machine Learning

Vector: An n x 1 matrix.

element


Page 50: CSC 4510 – Machine Learning

Vector: An n x 1 matrix.

1-indexed vs 0-indexed:

element
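The difference between the two indexing conventions shows up as soon as a vector is stored in code. In this sketch the vector holds the four prices from the training set, and the helper name `element` is an assumption introduced for illustration.

```python
y = [460, 232, 315, 178]   # a 4-dimensional vector (prices from the training set)

# Python lists are 0-indexed: the elements are y[0], y[1], y[2], y[3].
first_zero_indexed = y[0]

def element(vector, i):
    """Return the i-th element under the 1-indexed convention (i = 1 .. n)."""
    return vector[i - 1]

print(first_zero_indexed, element(y, 1))   # both are the first element, 460
print(element(y, 4))                       # last element, 178
```

Mathematical notation usually counts from 1, while most programming languages count from 0, so translating formulas into code typically means shifting every index by one.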


Page 51: CSC 4510 – Machine Learning

Matrix Addition


Page 52: CSC 4510 – Machine Learning

Scalar Multiplication


Page 53: CSC 4510 – Machine Learning

Combination of Operands
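The worked examples for matrix addition, scalar multiplication and their combination were images and are not in the transcript. The sketch below illustrates all three with made-up matrices; the function names `add` and `scale` are assumptions.

```python
def add(A, B):
    """Element-wise sum; A and B must have the same dimensions."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("matrix addition needs equal dimensions")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    """Multiply every entry of A by the scalar c."""
    return [[c * a for a in row] for row in A]

A = [[1, 0], [2, 5], [3, 1]]      # 3 x 2
B = [[4, 0.5], [2, 5], [0, 1]]    # 3 x 2

total = add(A, B)                           # A + B
tripled = scale(3, A)                       # 3 * A
combined = add(scale(3, A), scale(-1, B))   # 3 * A - B

print(total, tripled, combined)
```

Both operations work entry by entry, which is why addition is only defined for matrices of the same dimension.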


Page 54: CSC 4510 – Machine Learning

Matrix-vector multiplication


Page 55: CSC 4510 – Machine Learning

Details:

A × x = y

A: m x n matrix (m rows, n columns)
x: n x 1 matrix (n-dimensional vector)
y: m-dimensional vector

To get yi, multiply A’s ith row with elements of vector x, and add them up.
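The rule for computing each yi can be sketched directly. The matrix and vector values are illustrative, and `mat_vec` is a hypothetical helper name.

```python
def mat_vec(A, x):
    """Multiply an m x n matrix A by an n-dimensional vector x."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("each row of A must have as many entries as x")
    # y_i is the sum over j of A[i][j] * x[j]: row i of A times x, added up.
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

A = [[1, 3], [4, 0], [2, 1]]   # 3 x 2 matrix
x = [1, 5]                     # 2-dimensional vector
y = mat_vec(A, x)              # 3-dimensional vector

print(y)  # [16, 4, 7]
```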

Page 56: CSC 4510 – Machine Learning

Example


Page 57: CSC 4510 – Machine Learning

House sizes:
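The figures on this slide are missing from the transcript. As a hedged illustration, the sketch below pairs the four house sizes from the training set with the hypothesis hθ(x) = 10 + 0.1x from the earlier housing-price slide; that pairing is an assumption and may not match the original slide. Each prediction is one row of a data matrix times the parameter vector.

```python
sizes = [2104, 1416, 1534, 852]   # house sizes in feet^2 from the training set

# Data matrix: one row [1, size] per house. The leading 1 multiplies theta0.
data_matrix = [[1, size] for size in sizes]   # 4 x 2

theta = [10, 0.1]   # parameters of h(x) = 10 + 0.1x (assumed choice of hypothesis)

# Matrix-vector product: one predicted price (in $1000s) per house.
predictions = [sum(a * t for a, t in zip(row, theta)) for row in data_matrix]

for size, price in zip(sizes, predictions):
    print(f"{size} feet^2 -> {price:.1f}")
```

Writing the predictions as a single matrix-vector product replaces an explicit loop over houses with one operation.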


Page 58: CSC 4510 – Machine Learning

Example matrix-matrix multiplication

Page 59: CSC 4510 – Machine Learning

Details:

A × B = C

A: m x k matrix (m rows, k columns)
B: k x n matrix (k rows, n columns)
C: m x n matrix

The ith column of the matrix C is obtained by multiplying A with the ith column of B (for i = 1, 2, … , n).
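The column-by-column description translates into code almost word for word. This is a sketch with illustrative matrices; `mat_vec` and `mat_mul` are hypothetical helper names.

```python
def mat_vec(A, x):
    """Multiply matrix A by vector x (row of A times x, summed)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def mat_mul(A, B):
    """Multiply an m x k matrix A by a k x n matrix B, one column of B at a time."""
    k, n = len(B), len(B[0])
    if any(len(row) != k for row in A):
        raise ValueError("columns of A must equal rows of B")
    # The i-th column of C is A times the i-th column of B.
    columns = [mat_vec(A, [B[r][i] for r in range(k)]) for i in range(n)]
    # Turn the list of columns back into a list of rows.
    return [list(row) for row in zip(*columns)]

A = [[1, 3, 2], [4, 0, 1]]      # 2 x 3
B = [[1, 3], [0, 1], [5, 2]]    # 3 x 2
C = mat_mul(A, B)               # 2 x 2

print(C)  # [[11, 10], [9, 14]]
```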

Page 60: CSC 4510 – Machine Learning

Example: Matrix-matrix multiplication


Page 61: CSC 4510 – Machine Learning

House sizes:

Matrix Matrix

Have 3 competing hypotheses:

1.

2.

3.

Page 62: CSC 4510 – Machine Learning

Let A and B be matrices. Then in general, A × B ≠ B × A (not commutative).

E.g.
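A small pair of 2 x 2 matrices makes the point concrete. The matrices below are made up for illustration (the slide's own example did not survive), and `mat_mul` is a hypothetical helper.

```python
def mat_mul(A, B):
    """Matrix product: entry (i, j) is row i of A times column j of B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[1, 1], [0, 0]]
B = [[0, 0], [2, 0]]

AB = mat_mul(A, B)
BA = mat_mul(B, A)

print(AB)  # [[2, 0], [0, 0]]
print(BA)  # [[0, 0], [2, 2]]
```

The two products differ, so the order of the factors matters.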


Page 63: CSC 4510 – Machine Learning

Let

Let

Compute

Compute


Page 64: CSC 4510 – Machine Learning

Identity Matrix

For any matrix A, A × I = I × A = A (with I of the appropriate size on each side)

Denoted I (or In×n or In). Examples of identity matrices:

2 x 2
1 0
0 1

3 x 3
1 0 0
0 1 0
0 0 1

4 x 4
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
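The identity property is easy to check numerically. In this sketch the matrix A is illustrative and not square, which shows that the identity on the left and the identity on the right can have different sizes; `identity` and `mat_mul` are hypothetical helper names.

```python
def identity(n):
    """n x n identity matrix: ones on the diagonal, zeros elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def mat_mul(A, B):
    """Matrix product: entry (i, j) is row i of A times column j of B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[1, 2, 3], [4, 5, 6]]        # 2 x 3

left = mat_mul(identity(2), A)    # I (2 x 2) times A
right = mat_mul(A, identity(3))   # A times I (3 x 3)

print(left == A, right == A)      # True True
```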

Page 65: CSC 4510 – Machine Learning

Matrix inverse: A⁻¹

If A is an m x m matrix, and if it has an inverse, then A × A⁻¹ = A⁻¹ × A = I.

Matrices that don’t have an inverse are “singular” or “degenerate”.
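For the 2 x 2 case the inverse has a closed form, which is enough to illustrate both an invertible and a singular matrix. This is a sketch limited to 2 x 2 matrices and does not cover the general m x m case. The matrices are made up, the helper names are assumptions, and exact fractions are used so the check against the identity has no rounding error.

```python
from fractions import Fraction

def inverse_2x2(M):
    """Return the inverse of a 2 x 2 matrix, or None if it is singular."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        return None   # singular ("degenerate"): no inverse exists
    det = Fraction(det)
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_mul(A, B):
    """Matrix product: entry (i, j) is row i of A times column j of B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[3, 4], [2, 16]]
A_inv = inverse_2x2(A)
product = mat_mul(A, A_inv)          # the 2 x 2 identity

singular = inverse_2x2([[1, 2], [2, 4]])   # second row is twice the first

print(product, singular)
```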

Page 66: CSC 4510 – Machine Learning

Matrix Transpose

Example:

Let A be an m x n matrix, and let B = Aᵀ.

Then B is an n x m matrix, and Bij = Aji.
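The transpose swaps rows and columns, which takes one line in code. The matrix is illustrative (the slide's example was an image), and `transpose` is a hypothetical helper name.

```python
def transpose(A):
    """Return the transpose: row i of the result is column i of A."""
    return [list(col) for col in zip(*A)]

A = [[1, 2, 0], [3, 5, 9]]   # 2 x 3
B = transpose(A)             # 3 x 2

print(B)  # [[1, 3], [2, 5], [0, 9]]
```

Every entry satisfies B[j][i] == A[i][j], which is the index-swapping rule in 0-indexed form.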

Page 67: CSC 4510 – Machine Learning

What’s next? We are not in univariate regression anymore:

     Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
1    2104           5                    1                  45                    460
1    1416           3                    2                  40                    232
1    1534           3                    2                  30                    315
1    852            2                    1                  36                    178