Regression
cs725/notes/lecture...
Instructor: Prof. Ganesh Ramakrishnan
January 12, 2016



Recap

Supervised (Classification and Regression) vs Unsupervised Learning
▶ Three canonical learning problems

What is data and how to predict
▶ More on this today in the context of regression

Squared Error


Agenda

What is Regression
Formal Definition
Types of Regression
Least Square Solution
Geometric Interpretation of the least square solution


Regression

Finding correlations between a set of output variables and a set of input variables
Input variables are called independent variables
Output variables are called dependent variables


Examples

A company wants to know how much money they need to spend on T.V. advertising to increase sales to a desired level, say y*
They have previous data of the form <xi, yi>, where xi is the money spent on advertising and yi are the sales figures
They now fit the data with a function, let's say a linear function

y = β0 + β1 · x    (1)

and then find the money they need to spend using this function
The regression problem is to find the appropriate function and its coefficients
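The fit-then-invert recipe above can be sketched in a few lines of NumPy. This is a minimal illustration with made-up advertising/sales numbers (the data, the target y*, and the variable names are all hypothetical, not from the lecture):

```python
import numpy as np

# Hypothetical data points <x_i, y_i>: advertising spend vs sales figures
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
y = np.array([25.0, 41.0, 62.0, 79.0, 102.0])

# Design matrix with a column of ones so that beta_0 is the intercept
X = np.column_stack([np.ones_like(x), x])

# Least squares fit of equation (1): y = beta_0 + beta_1 * x
(beta0, beta1), *_ = np.linalg.lstsq(X, y, rcond=None)

# Invert the fitted line to find the spend needed for a target level y*
y_star = 90.0
x_needed = (y_star - beta0) / beta1
print(beta0, beta1, x_needed)
```

Plugging x_needed back into the fitted line recovers y* exactly, which is the point of the inversion step.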


Figure: Linear regression on T.V. advertising vs sales figures


What if sales is a non-linear function of advertising?


Formal Definition

Two sets of variables: x ∈ R^N (independent) and y ∈ R^k (dependent)
D is a set of m data points: <x1, y1>, <x2, y2>, ..., <xm, ym>
ϵ(f, D): an error function, designed to reflect the discrepancy between the predicted value f(xi) and yi, ∀i
Regression problem: determine a function f* such that f*(x) is the best predictor for y, with respect to D,

f* = argmin_{f ∈ F} ϵ(f, D)    (2)

where F denotes the class of functions over which the optimization is performed


Types of Regression

Depends on the function class and the error function
Linear Regression: establishes a relationship between the dependent variable (Y) and one or more independent variables (X) using a best-fit straight line, i.e.

Y = a + b · X    (3)

▶ Here F is of the form f(x) = Σ_{i=1}^{p} wi φi(x), where the φi are called basis functions
▶ The problem is to find w* where

w* = argmin_w ϵ(w, D)    (4)
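The basis-function form Σ wi φi(x) is still linear in the weights wi even when the φi themselves are non-linear in x. A minimal sketch, assuming polynomial basis functions φi(x) = x^(i−1) purely as an example (this particular choice of basis is an assumption, not prescribed by the slides):

```python
import numpy as np

def poly_design_matrix(x, p):
    """Design matrix Phi whose column i holds the basis function
    phi_i(x) = x**(i-1) evaluated at every data point."""
    return np.column_stack([x ** i for i in range(p)])

x = np.array([0.0, 1.0, 2.0, 3.0])
Phi = poly_design_matrix(x, 3)  # columns: 1, x, x^2
print(Phi)
```

Any model that is linear in w over such a matrix can then be fitted with the same least squares machinery as a straight line.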


Ridge Regression: a shrinkage (regularization) parameter is added to the error function to reduce discrepancies due to variance
Logistic Regression: used to model the conditional probability of the dependent variable given the independent variable, and is extensively used in classification tasks

log [ p(y|x) / (1 − p(y|x)) ] = β0 + β · x    (5)

Lasso regression, stepwise regression, and many more
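Equation (5) models the log-odds as a linear function of x; inverting it gives the sigmoid. A minimal sketch with hypothetical coefficients (the values of beta0 and beta1 below are made up for illustration):

```python
import numpy as np

# Hypothetical coefficients for equation (5):
# log(p / (1 - p)) = beta_0 + beta_1 * x
beta0, beta1 = -4.0, 0.08

def p_y_given_x(x):
    """Invert the log-odds: p = 1 / (1 + exp(-(beta_0 + beta_1 * x)))."""
    z = beta0 + beta1 * x
    return 1.0 / (1.0 + np.exp(-z))

# The probability crosses 0.5 exactly where the log-odds are zero,
# i.e. at x = -beta_0 / beta_1 = 50 for these coefficients
print(p_y_given_x(np.array([0.0, 50.0, 100.0])))
```

The 0.5 crossing is what makes this usable as a classifier: thresholding p(y|x) at 0.5 is equivalent to thresholding the linear score β0 + β·x at zero.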


Least Square Solution

The form of ϵ plays a major role in the accuracy and tractability of the optimization problem
The squared loss is a commonly used error/loss function. It is the sum of squares of the differences between the actual value and the predicted value

ϵ(f, D) = Σ_i (f(xi) − yi)^2    (6)

The least square solution for linear regression is given by

w* = argmin_w Σ_{j=1}^{m} ( Σ_{i=1}^{p} wi φi(xj) − yj )^2    (7)
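Objective (7) can be written as ||Φw − y||² and checked numerically: the minimizer returned by a least squares routine should have a loss no larger than any perturbed weight vector. A minimal sketch on synthetic data (the data-generating line and noise level are assumptions for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(20)

# Basis functions phi_1(x) = 1, phi_2(x) = x, so p = 2
Phi = np.column_stack([np.ones_like(x), x])

def squared_loss(w):
    """Equation (7): sum over data points j of (sum_i w_i phi_i(x_j) - y_j)^2."""
    return np.sum((Phi @ w - y) ** 2)

w_star, *_ = np.linalg.lstsq(Phi, y, rcond=None)

# Perturbing the minimizer can only increase the squared loss
print(squared_loss(w_star) <= squared_loss(w_star + np.array([0.1, -0.05])))
```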




The minimum value of the squared loss is zero
If zero were attained at w*, we would have ∀u, φ(xu)^T w* = yu, or equivalently φ w* = y, where

φ = [ φ1(x1) ... φp(x1)
       ...   ...   ...
      φ1(xm) ... φp(xm) ]

and

y = [ y1 ... ym ]^T

This has a solution if y is in the column space (the subspace of R^m formed by the column vectors) of φ
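The column-space membership condition can be tested numerically: y lies in the column space of φ exactly when appending y as an extra column does not increase the matrix rank. A minimal sketch with an assumed small φ (the matrix and vectors are illustrative, not from the slides):

```python
import numpy as np

# A tall Phi (m = 4 points, p = 2 basis functions); its column space is
# a 2-dimensional subspace of R^4
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [1.0, 2.0],
                [1.0, 3.0]])

def in_column_space(Phi, y, tol=1e-10):
    """y is in the column space of Phi iff appending y as a column
    does not increase the rank."""
    r = np.linalg.matrix_rank(Phi, tol)
    r_aug = np.linalg.matrix_rank(np.column_stack([Phi, y]), tol)
    return r_aug == r

y_in = Phi @ np.array([1.0, 2.0])               # exactly Phi w for some w
y_out = y_in + np.array([0.0, 0.0, 0.0, 1.0])   # pushed off the subspace

print(in_column_space(Phi, y_in), in_column_space(Phi, y_out))
```

When the check fails, zero loss is unattainable and the best one can do is the projection discussed in the geometric interpretation.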


The minimum value of the squared loss is zero

If zero were NOT attainable at w∗, what can be done?





Geometric Interpretation of Least Square Solution

Let y* be a solution in the column space of φ
The least squares solution is such that the distance between y* and y is minimized
Therefore, the line joining y* to y should be orthogonal to the column space

φ w = y*    (8)
(y − y*)^T φ = 0    (9)
(y*)^T φ = y^T φ    (10)
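The orthogonality condition (9) is easy to verify numerically: the residual of a least squares fit should be orthogonal to every column of φ. A minimal sketch on random data (the shapes and the random matrix are assumptions for the demo):

```python
import numpy as np

rng = np.random.default_rng(1)
Phi = rng.standard_normal((10, 3))   # m = 10, p = 3; full column rank w.h.p.
y = rng.standard_normal(10)

# Least squares solution: y_star = Phi w is the closest point of the
# column space to y
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
y_star = Phi @ w

# Equation (9): the residual y - y_star is orthogonal to all columns of Phi
print(np.allclose((y - y_star) @ Phi, 0.0))
```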


(φ w)^T φ = y^T φ    (11)
w^T φ^T φ = y^T φ    (12)
φ^T φ w = φ^T y    (13)
w = (φ^T φ)^{−1} φ^T y    (14)

Here φ^T φ is invertible only if φ has full column rank
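The closed form (14) can be checked against a library least squares solver: both should give the same weights when φ has full column rank. A minimal sketch on random data (the shapes are assumptions for the demo):

```python
import numpy as np

rng = np.random.default_rng(2)
Phi = rng.standard_normal((8, 3))   # full column rank with probability 1
y = rng.standard_normal(8)

# Equation (14): closed-form solution via the normal equations
w_closed = np.linalg.inv(Phi.T @ Phi) @ Phi.T @ y

# Same solution from NumPy's least squares routine
w_lstsq, *_ = np.linalg.lstsq(Phi, y, rcond=None)

print(np.allclose(w_closed, w_lstsq))
```

In practice one would solve the linear system (13) directly (e.g. np.linalg.solve) or use lstsq rather than forming the explicit inverse, which is numerically less stable; the explicit inverse above just mirrors equation (14).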




Theorem: φ^T φ is invertible if and only if φ has full column rank
Proof:
(⇐) Given that φ has full column rank, its columns are linearly independent, so φ x = 0 ⇒ x = 0
Assume, on the contrary, that φ^T φ is not invertible. Then ∃ x ≠ 0 such that φ^T φ x = 0
⇒ x^T φ^T φ x = 0
⇒ (φ x)^T (φ x) = 0
⇒ φ x = 0
This is a contradiction. Hence φ^T φ is invertible if φ has full column rank
(⇒) If φ^T φ is invertible, then φ x = 0 implies φ^T φ x = 0, which by invertibility implies x = 0. Hence φ has full column rank if φ^T φ is invertible
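The theorem can be exercised numerically: a rank check on φ predicts whether the Gram matrix φ^T φ is singular. A minimal sketch with two assumed toy matrices, one of full column rank and one rank deficient:

```python
import numpy as np

def gram_invertible(Phi, tol=1e-10):
    """By the theorem, Phi^T Phi is invertible iff Phi has full column rank."""
    m, p = Phi.shape
    return np.linalg.matrix_rank(Phi, tol) == p

# Full column rank: the columns 1 and x are linearly independent
Phi_full = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
# Rank deficient: the second column is twice the first
Phi_def = np.array([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]])

print(gram_invertible(Phi_full), gram_invertible(Phi_def))
# The Gram matrix of the deficient Phi is singular (determinant 0)
print(np.isclose(np.linalg.det(Phi_def.T @ Phi_def), 0.0))
```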


Figure: Least square solution y* is the orthogonal projection of y onto the column space of φ