ECLT 5810 Linear Regression and Logistic Regression for Classification Prof. Wai Lam


Linear Regression Models: Least Squares

• Input vector: $X^T = (X_1, X_2, \ldots, X_p)$

• Each $X_j$ is an attribute / feature / predictor (independent variable)

• The linear regression model:

$$f(X) = \beta_0 + \sum_{j=1}^{p} X_j \beta_j$$

• The output $Y$ is called the response (dependent variable)

• The $\beta_j$'s are unknown parameters (coefficients)


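Linear Regression Models: Least Squares

• A set of training data: $(x_1, y_1), \ldots, (x_N, y_N)$

• Each $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})^T$ corresponds to $p$ attributes

• Each $y_i$ is a class attribute value / a label

• Wish to estimate the parameters $\beta = (\beta_0, \beta_1, \ldots, \beta_p)^T$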


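Linear Regression Models: Least Squares

• One common approach is the method of least squares:

• Pick the coefficients $\beta$ to minimize the residual sum of squares:

$$\mathrm{RSS}(\beta) = \sum_{i=1}^{N} \big( y_i - f(x_i) \big)^2 = \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j \Big)^2$$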
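The residual sum of squares can be evaluated directly from its definition. The sketch below is only an illustration: the function names `predict` and `rss` and the toy data are assumptions, not part of the lecture.

```python
def predict(beta, x):
    """Linear model f(x) = beta[0] + sum_j x[j-1] * beta[j]."""
    return beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))


def rss(beta, xs, ys):
    """Residual sum of squares of the linear model over a training set."""
    return sum((y - predict(beta, x)) ** 2 for x, y in zip(xs, ys))


# Toy data (hypothetical): y = 1 + 2*x exactly, so the true coefficients fit perfectly.
xs = [(0.0,), (1.0,), (2.0,)]
ys = [1.0, 3.0, 5.0]

rss_true = rss([1.0, 2.0], xs, ys)  # perfect fit
rss_flat = rss([3.0, 0.0], xs, ys)  # constant prediction at the mean of y
print(rss_true, rss_flat)
```

Least squares picks the coefficient vector with the smallest such value; here the constant model is clearly worse than the exact one.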


Linear Regression Models: Least Squares

• This criterion is reasonable if the training observations $(x_i, y_i)$ represent independent draws.

• Even if the $x_i$'s were not drawn randomly, the criterion is still valid if the $y_i$'s are conditionally independent given the inputs $x_i$.


Linear Regression Models: Least Squares

• Least squares makes no assumption about the validity of the model

• It simply finds the best linear fit to the data


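Linear Regression Models: Finding Residual Sum of Squares

• Denote by $\mathbf{X}$ the $N \times (p+1)$ matrix with each row an input vector (with a 1 in the first position)

• Let $\mathbf{y}$ be the N-vector of outputs in the training set

• Quadratic function in the $p+1$ parameters:

$$\mathrm{RSS}(\beta) = (\mathbf{y} - \mathbf{X}\beta)^T (\mathbf{y} - \mathbf{X}\beta)$$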


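Linear Regression Models: Finding Residual Sum of Squares

• Set the first derivative to zero:

$$\mathbf{X}^T (\mathbf{y} - \mathbf{X}\beta) = 0$$

• Obtain the unique solution (assuming $\mathbf{X}^T\mathbf{X}$ is invertible):

$$\hat{\beta} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$$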
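A minimal NumPy sketch of this step, assuming the design matrix has full column rank. The helper name `fit_least_squares` and the synthetic noise-free data are illustrative assumptions; the system is solved with `np.linalg.solve` rather than by forming the inverse explicitly.

```python
import numpy as np


def fit_least_squares(X, y):
    """Solve the normal equations X^T X beta = X^T y for beta."""
    # np.linalg.solve avoids forming the explicit inverse of X^T X.
    return np.linalg.solve(X.T @ X, X.T @ y)


# Hypothetical noise-free data: N = 6 observations, p = 2 attributes.
rng = np.random.default_rng(0)
inputs = rng.normal(size=(6, 2))
X = np.column_stack([np.ones(len(inputs)), inputs])  # leading column of 1s
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true

beta_hat = fit_least_squares(X, y)
gradient = X.T @ (y - X @ beta_hat)  # the first-order condition, zero at the solution
print(beta_hat)
```

Because the data are generated without noise, the recovered coefficients should match `beta_true` up to floating-point error.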


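Linear Regression Models: Orthogonal Projection

• The fitted values at the training inputs are:

$$\hat{\mathbf{y}} = \mathbf{X}\hat{\beta} = \mathbf{X} (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$$

• The matrix $\mathbf{H} = \mathbf{X} (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T$ appearing in the above equation is called the "hat" matrix because it puts the hat on $\mathbf{y}$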
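The projection view can be checked numerically. The sketch below uses a small hypothetical design matrix (not from the lecture) and verifies the standard properties of an orthogonal projection: the hat matrix is symmetric and idempotent, and the residual is orthogonal to the columns of the design matrix.

```python
import numpy as np

# Hypothetical design matrix: N = 4 observations, intercept column plus one attribute.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0, 2.0, 4.0])

H = X @ np.linalg.inv(X.T @ X) @ X.T  # hat matrix, N x N
y_hat = H @ y                         # fitted values
residual = y - y_hat

is_symmetric = np.allclose(H, H.T)
is_idempotent = np.allclose(H @ H, H)          # projecting twice changes nothing
trace = np.trace(H)                            # equals the number of columns of X
orthogonal = np.allclose(X.T @ residual, 0.0)  # residual is orthogonal to the columns of X
print(is_symmetric, is_idempotent, orthogonal)
```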


Linear Regression Models: Example

• Training Data:

x            y
(1, 2, 1)    22
(2, 0, 4)    49
(3, 4, 2)    39
(4, 2, 3)    52
(5, 4, 1)    38


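Linear Regression Models: Example

$$\mathbf{X} = \begin{pmatrix} 1 & 1 & 2 & 1 \\ 1 & 2 & 0 & 4 \\ 1 & 3 & 4 & 2 \\ 1 & 4 & 2 & 3 \\ 1 & 5 & 4 & 1 \end{pmatrix}, \qquad \mathbf{y} = \begin{pmatrix} 22 \\ 49 \\ 39 \\ 52 \\ 38 \end{pmatrix}$$

• $\hat{\beta} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$

• Fitted model: $\hat{y} = 4.04 x_1 + 0.51 x_2 + 8.43 x_3 + 8.13$ (the intercept is 8.13)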


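Linear Regression Models: Example

• Fitted model: $\hat{y} = 4.04 x_1 + 0.51 x_2 + 8.43 x_3 + 8.13$

• Fitted values and residual vector:

Fitted value    Residual (y minus fitted value)
21.61            0.39
49.91           -0.91
39.13           -0.13
50.57            1.43
38.78           -0.78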
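The example can be reproduced with NumPy. This is a sketch using `np.linalg.lstsq` on the training data from the slides; the variable names are assumptions. With the column of 1s placed first, the coefficient vector comes out with the intercept in the first position, approximately (8.13, 4.04, 0.51, 8.43).

```python
import numpy as np

# Training data from the example.
inputs = np.array([[1, 2, 1],
                   [2, 0, 4],
                   [3, 4, 2],
                   [4, 2, 3],
                   [5, 4, 1]], dtype=float)
y = np.array([22, 49, 39, 52, 38], dtype=float)

X = np.column_stack([np.ones(len(inputs)), inputs])  # 1 in the first position
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)     # least squares solution
y_hat = X @ beta_hat                                 # fitted values
residual = y - y_hat                                 # residual vector

print(np.round(beta_hat, 2))
print(np.round(y_hat, 2))
print(np.round(residual, 2))
```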


Logistic Regression: Discriminant Functions

• Suppose there are $K$ classes, labeled $1, 2, \ldots, K$

• A class of methods models a discriminant function $\delta_k(x)$ for each class, then classifies $x$ to the class with the largest value of its discriminant function

• The decision boundary between classes $k$ and $\ell$ is the set of points for which $\delta_k(x) = \delta_\ell(x)$
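The classification rule is an argmax over the per-class discriminant functions. A minimal sketch, with two hypothetical linear discriminants chosen only for illustration:

```python
def classify(x, discriminants):
    """Return the label k whose discriminant function delta_k(x) is largest."""
    return max(discriminants, key=lambda k: discriminants[k](x))


# Hypothetical linear discriminants for K = 2 classes on one-dimensional x.
discriminants = {
    1: lambda x: 1.0 - x,  # delta_1(x)
    2: lambda x: x - 1.0,  # delta_2(x)
}

# delta_1(x) = delta_2(x) exactly at x = 1, so that point is the decision boundary.
label_left = classify(0.0, discriminants)
label_right = classify(2.5, discriminants)
print(label_left, label_right)
```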


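Logistic Regression

• Suppose $f_k(x)$ is the class-conditional density of $X$ in class $G = k$, i.e., $f_k(x) = \Pr(X = x \mid G = k)$

• Let $\pi_k$ be the prior probability of class $k$, with $\sum_{k=1}^{K} \pi_k = 1$

• A simple application of Bayes' theorem:

$$\Pr(G = k \mid X = x) = \frac{f_k(x)\, \pi_k}{\sum_{\ell=1}^{K} f_\ell(x)\, \pi_\ell}$$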
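Bayes' theorem for class posteriors can be sketched as follows. The Gaussian class-conditional densities, the equal priors, and the function names are assumptions made for illustration only.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution, used here as a class-conditional f_k(x)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior(x, densities, priors):
    """Pr(G = k | X = x) for every class k via Bayes' theorem."""
    joint = {k: densities[k](x) * priors[k] for k in priors}
    total = sum(joint.values())
    return {k: v / total for k, v in joint.items()}


# Hypothetical two-class problem with equal priors.
densities = {1: lambda x: gaussian_pdf(x, 0.0, 1.0),
             2: lambda x: gaussian_pdf(x, 2.0, 1.0)}
priors = {1: 0.5, 2: 0.5}

post_mid = posterior(1.0, densities, priors)   # halfway between the two means
post_left = posterior(0.0, densities, priors)  # at the mean of class 1
print(post_mid, post_left)
```

At the midpoint between the two means the posteriors are equal, which is the decision boundary for this toy setup.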


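Logistic Regression

• Desire to model the posterior probabilities of the $K$ classes via linear functions in $x$ (a p-dimensional vector)

• Ensuring they sum to one and remain in $[0, 1]$

• Model:

$$\log \frac{\Pr(G = k \mid X = x)}{\Pr(G = K \mid X = x)} = \beta_{k0} + \beta_k^T x, \qquad k = 1, \ldots, K-1$$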


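Logistic Regression

• Specified in terms of $K - 1$ log-odds or logit transformations

• Choice of denominator is arbitrary: the estimates are equivariant under this choice

$$\Pr(G = k \mid X = x) = \frac{\exp(\beta_{k0} + \beta_k^T x)}{1 + \sum_{\ell=1}^{K-1} \exp(\beta_{\ell 0} + \beta_\ell^T x)}, \qquad k = 1, \ldots, K-1$$

$$\Pr(G = K \mid X = x) = \frac{1}{1 + \sum_{\ell=1}^{K-1} \exp(\beta_{\ell 0} + \beta_\ell^T x)}$$

• These probabilities sum to 1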
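A short sketch of how K - 1 linear logits turn into K class probabilities, with class K as the reference class in the denominator. The function name and the parameter values are hypothetical.

```python
import math


def class_probabilities(x, intercepts, weights):
    """Posterior probabilities for classes 1..K given K-1 linear logits.

    intercepts[k] and weights[k] hold the intercept and coefficient vector
    of class k + 1; class K is the reference class in the denominator.
    """
    logits = [b0 + sum(w * xj for w, xj in zip(beta, x))
              for b0, beta in zip(intercepts, weights)]
    denom = 1.0 + sum(math.exp(t) for t in logits)
    probs = [math.exp(t) / denom for t in logits]
    probs.append(1.0 / denom)  # reference class K
    return probs


# Hypothetical parameters: K = 3 classes, p = 2 attributes.
intercepts = [0.5, -1.0]
weights = [[1.0, -0.5], [0.2, 0.3]]
probs = class_probabilities([1.0, 2.0], intercepts, weights)
log_odds_1 = math.log(probs[0] / probs[-1])  # recovers the linear logit of class 1
print(probs, log_odds_1)
```

Taking the log of the ratio between a class probability and the reference-class probability gives back the linear function of x, which is the sense in which the model is specified by log-odds.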


Logistic Regression: Two-class Classification

• For two-class classification, we can label the two classes as 0 and 1.

• Treating class 1 as the concept of interest, the posterior probability can be regarded as the class membership probability:

$$\Pr(G = 1 \mid X = x) = \frac{\exp(\beta_0 + \beta^T x)}{1 + \exp(\beta_0 + \beta^T x)} \qquad \text{(logistic function)}$$

• As a result, it maps $x$ in p-dimensional space to a value in [0,1]
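The logistic function is easy to write down, though a direct `exp(t) / (1 + exp(t))` overflows for large positive t. The sketch below uses a common two-branch form to avoid that; the coefficient values are hypothetical.

```python
import math


def logistic(t):
    """exp(t) / (1 + exp(t)), written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def prob_class1(x, beta0, beta):
    """Pr(G = 1 | X = x) for the two-class model."""
    return logistic(beta0 + sum(b * xj for b, xj in zip(beta, x)))


# Hypothetical coefficients for p = 2 attributes; the linear part is 0 at this x.
p_boundary = prob_class1([1.0, 1.0], beta0=-3.0, beta=[1.0, 2.0])
p_extreme = logistic(-1000.0)  # no overflow, essentially 0
print(p_boundary, p_extreme)
```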


Logistic Regression: Shape of sigmoid curve

• Consider 1-dimensional $x$

[Figure: sigmoid curve, with Pr on the vertical axis]


Logistic Regression: An Example of One-dimension

• We wish to predict death from the baseline APACHE II score of patients.

• Let $\Pr(G = 1 \mid X = x)$ be the probability that a patient with score $x$ will die.

• Note that linear regression would not work well here, since it could produce probabilities less than 0 or greater than 1


Logistic Regression: An Example of One-dimension

• Data with a sharp survival cut-off point between patients who live and those who die will lead to a large value of the slope coefficient $\beta_1$ (the coefficient of $x$)


Logistic Regression: An Example of One-dimension

• On the other hand, if the data has a lengthy transition from survival to death, it will lead to a low value of the slope coefficient $\beta_1$
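The link between the slope and the sharpness of the transition can be made concrete. The sketch below assumes a one-dimensional logistic model with an intercept and a slope, and measures the range of scores over which the predicted probability rises from 0.1 to 0.9. All coefficient values are hypothetical.

```python
import math


def death_probability(score, beta0, beta1):
    """Pr(death | score) under a one-dimensional logistic model."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * score)))


def transition_width(beta1, low=0.1, high=0.9):
    """Score range over which the probability rises from `low` to `high`."""
    def logit(p):
        return math.log(p / (1.0 - p))
    return (logit(high) - logit(low)) / beta1


# Hypothetical coefficients; both curves cross 0.5 at a score of 20.
p_mid_sharp = death_probability(20.0, beta0=-40.0, beta1=2.0)
p_mid_gradual = death_probability(20.0, beta0=-4.0, beta1=0.2)

sharp = transition_width(beta1=2.0)    # about 2.2 score points
gradual = transition_width(beta1=0.2)  # about 22 score points
print(sharp, gradual)
```

The width is inversely proportional to the slope, so a tenfold smaller slope stretches the transition over a tenfold wider range of scores.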


Logistic Regression: Model Fitting for General Cases (K classes, p Dimension)

• Logistic regression models are fit by maximum likelihood, using the conditional likelihood of $G$ given $X$

• Since $\Pr(G \mid X)$ completely specifies the conditional distribution, the multinomial distribution is appropriate


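Logistic Regression: Model Fitting for General Cases (K classes, p Dimension)

• Let the entire parameter set be $\theta = \{\beta_{10}, \beta_1, \ldots, \beta_{(K-1)0}, \beta_{K-1}\}$, then

$$\Pr(G = k \mid X = x) = p_k(x; \theta)$$

• Log-likelihood for $N$ observations of input data and class labels:

$$\ell(\theta) = \sum_{i=1}^{N} \log p_{g_i}(x_i; \theta)$$

where $p_k(x_i; \theta) = \Pr(G = k \mid X = x_i; \theta)$

• Find the model that maximizes the log-likelihood.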
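To illustrate the maximization, here is a minimal sketch for the two-class, one-dimensional case. It uses plain gradient ascent, which is a stand-in chosen for brevity and not necessarily the optimizer used in the lecture or in statistical packages. The data, learning rate, and step count are hypothetical.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def log_likelihood(beta0, beta1, xs, gs):
    """Sum over observations of log Pr(G = g_i | X = x_i), with g_i in {0, 1}."""
    total = 0.0
    for x, g in zip(xs, gs):
        p1 = sigmoid(beta0 + beta1 * x)
        total += math.log(p1 if g == 1 else 1.0 - p1)
    return total


def fit(xs, gs, rate=0.1, steps=5000):
    """Maximize the log-likelihood by plain gradient ascent (illustrative only)."""
    beta0 = beta1 = 0.0
    for _ in range(steps):
        errors = [g - sigmoid(beta0 + beta1 * x) for x, g in zip(xs, gs)]
        beta0 += rate * sum(errors) / len(xs)
        beta1 += rate * sum(e * x for e, x in zip(errors, xs)) / len(xs)
    return beta0, beta1


# Hypothetical toy data; the classes overlap, so the maximum is finite.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
gs = [0, 0, 1, 0, 1, 1]

ll_start = log_likelihood(0.0, 0.0, xs, gs)
beta0_hat, beta1_hat = fit(xs, gs)
ll_fit = log_likelihood(beta0_hat, beta1_hat, xs, gs)
print(ll_start, ll_fit)
```

The gradient of the log-likelihood is the sum of (label minus predicted probability) times the input, which is what the update inside `fit` follows.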


Logistic Regression: Example

• A subset of the Coronary Risk-Factor Study (CORIS) baseline survey, carried out in three rural areas of the Western Cape, South Africa

• Aim: establish the intensity of ischemic heart disease risk factors in that high-incidence region

• Response variable is the presence or absence of myocardial infarction (MI) at the time of the survey

• 160 cases in the data set, and a sample of 302 controls


Logistic Regression: Example

• Fit a logistic regression model by maximum likelihood, giving the results shown in the next slide

• z scores are reported for each coefficient in the model (each coefficient divided by its standard error)


Logistic Regression: Example

• Results from a logistic regression fit to the South African heart disease data:

               Coefficient   Std. Error   Z Score
(Intercept)       -4.130        0.964      -4.285
sbp                0.006        0.006       1.023
tobacco            0.080        0.026       3.034
ldl                0.185        0.057       3.219
famhist            0.939        0.225       4.178
obesity           -0.035        0.029      -1.187
alcohol            0.001        0.004       0.136
age                0.043        0.010       4.184


Logistic Regression: Example

• A z score greater than approximately 2 in absolute value is significant at the 5% level

• Some surprises in the table of coefficients:

• sbp and obesity appear to be not significant

• On their own, both sbp and obesity are significant, with positive sign

• In the presence of many other correlated variables, they are no longer needed (and can even get a negative sign)
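The "approximately 2" rule corresponds to a two-sided test against the standard normal distribution. A sketch using the z scores from the table of results; the function name is an assumption, and the standard library's `math.erfc` supplies the normal tail probability.

```python
import math


def two_sided_p_value(z):
    """Pr(|Z| > |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Z scores copied from the table of results for the South African heart disease data.
z_scores = {
    "sbp": 1.023, "tobacco": 3.034, "ldl": 3.219, "famhist": 4.178,
    "obesity": -1.187, "alcohol": 0.136, "age": 4.184,
}

p_values = {name: two_sided_p_value(z) for name, z in z_scores.items()}
significant = sorted(name for name, p in p_values.items() if p < 0.05)
print(significant)
```

Under this rule tobacco, ldl, famhist and age come out significant at the 5% level, while sbp, obesity and alcohol do not.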


Logistic Regression

3 common transformations / link functions (provided by SAS):

Logit: ln(p/(1-p)) (the log of the odds)

Probit: inverse of the standard normal CDF at p (recall the normal table's mapping scheme)

Complementary log-log: ln(-ln(1-p))

The choice of link function depends on your purpose rather than on performance. They all perform about equally well, but their implications differ slightly.
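The three link functions can be compared side by side. This sketch uses only the Python standard library (`statistics.NormalDist` for the normal inverse); the function names are illustrative and are not SAS option names.

```python
import math
from statistics import NormalDist


def logit(p):
    """Log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


def probit(p):
    """Inverse of the standard normal CDF."""
    return NormalDist().inv_cdf(p)


def cloglog(p):
    """Complementary log-log transformation."""
    return math.log(-math.log(1.0 - p))


for p in (0.1, 0.5, 0.9):
    print(p, round(logit(p), 3), round(probit(p), 3), round(cloglog(p), 3))
```

Logit and probit are symmetric around p = 0.5, whereas the complementary log-log link is not, which is one way their implications differ.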


Related Measures

Wald's Chi-square: We could treat an effect as significant if the tail probability is small enough (< 5%).

If we are using the model for predicting the outcome rather than the probability of that outcome (the case when the criterion is set to minimize loss), the interpretation of the misclassification rate / profit and loss / ROC curve / lift chart is similar to that for decision trees.
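For a single coefficient, the Wald chi-square statistic is commonly computed as the squared ratio of the estimate to its standard error and compared against a chi-square distribution with 1 degree of freedom. The sketch below assumes that single-coefficient setting; the estimate and standard error are hypothetical.

```python
import math


def wald_chi_square(coefficient, std_error):
    """Wald statistic for a single coefficient: (estimate / standard error)^2."""
    return (coefficient / std_error) ** 2


def chi_square_tail_1df(w):
    """Pr(chi-square with 1 degree of freedom > w)."""
    return math.erfc(math.sqrt(w / 2.0))


# Hypothetical estimate and standard error, giving a z score of 3.
w = wald_chi_square(0.9, 0.3)
p = chi_square_tail_1df(w)
is_significant = p < 0.05
print(w, p, is_significant)
```

With 1 degree of freedom this tail probability equals the two-sided normal tail probability of the corresponding z score.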