
Page 1

Applied Machine Learning
Lecture 5-2: Logistic regression and SVM

Selpi (selpi@chalmers.se)

These slides are a further development of Richard Johansson's slides

February 7, 2020

Page 2

Overview

A bit on perceptron

logistic regression

training a logistic regression classifier

detour: multiclass linear classifiers

support vector classification

optimizing the LR and SVM objectives

Page 3

The perceptron algorithm and the simpler version

Perceptron algorithm

w = (0, ..., 0)
repeat N times
    for (x_i, y_i) in the training set
        score = w · x_i
        if a positive example is misclassified
            w = w + x_i
        else if a negative example is misclassified
            w = w − x_i
return w

The simpler version, if the y_i are coded as +1 or −1:

w = (0, ..., 0)
for (x_i, y_i) in the training set
    score = w · x_i
    if y_i · score ≤ 0
        w = w + y_i · x_i
return w
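A minimal NumPy sketch of the simpler version above, assuming the labels are coded as +1/−1 and the instances are the rows of a matrix X (the function name and the n_epochs parameter are illustrative additions):

import numpy as np

def perceptron(X, Y, n_epochs=10):
    # X is an (m, n) array of feature vectors, Y an array of +1/-1 labels
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for x_i, y_i in zip(X, Y):
            score = w.dot(x_i)
            if y_i * score <= 0:      # misclassified (or exactly on the boundary)
                w = w + y_i * x_i
    return w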


Page 5

how can we get the "certainty" of a linear classifier?

score = w · x

▶ large positive score: quite certain that x belongs to the positive class

▶ large negative score: quite certain that x belongs to the negative class

▶ near zero: we are unsure


Page 6

Overview

A bit on perceptron

logistic regression

training a logistic regression classifier

detour: multiclass linear classifiers

support vector classification

optimizing the LR and SVM objectives

Page 7

The logistic regression model

▶ logistic regression is a method to train a linear classifier that gives a probabilistic output

▶ how to get the probability? use a logistic or sigmoid function:

P(positive output | x) = 1 / (1 + e^(−score))

where e^(−score) corresponds to np.exp(-score)

P(negative output | x) = 1 − 1 / (1 + e^(−score)) = 1 / (1 + e^(score))


Page 9

the logistic / sigmoid function [Verhulst, 1845]

[figure: the logistic/sigmoid curve, plotting P(y = positive | x) against the classifier score, rising from 0 to 1]

Page 10

interpreting the linear classifier's score

▶ the output score w · x can now be interpreted as the log odds in favor of the positive outcome

▶ odds: how much more likely is the positive outcome than the negative outcome?

odds = p / (1 − p)

Knew that we ventured on such dangerous seas

That if we wrought out life 'twas ten to one

Shakespeare, Henry IV, Part II, Act I, Scene 1, lines 183–4.

Page 11

making it a bit more compact

▶ if we code the positive class as +1 and the negative class as −1, then we can write the probability a bit more neatly:

P(y | x) = 1 / (1 + e^(−y·score))

Page 12

in scikit-learn

▶ LR is called sklearn.linear_model.LogisticRegression

▶ predict_proba gives the probability output

Page 13

code example: using a logistic regression classifier
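The original slide shows a live demo; a minimal sketch of what such an example could look like, using scikit-learn's built-in iris data (the dataset and variable names are illustrative, not taken from the slide):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

print(clf.predict(X_test[:3]))        # predicted class labels
print(clf.predict_proba(X_test[:3]))  # one probability per class for each instance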

Page 14

Overview

A bit on perceptron

logistic regression

training a logistic regression classifier

detour: multiclass linear classifiers

support vector classification

optimizing the LR and SVM objectives

Page 15: Applied Machine Learning Lecture 5-2: Logistic regression ...richajo/dit866/lectures/l5/l5_2.pdf · Applied Machine Learning Lecture 5-2: Logistic regression and SVM Selpi (selpi@chalmers.se)

recall: the maximum likelihood principle

▶ in a probabilistic model, we can train the model by selecting parameters that assign a high probability to the data

▶ in our case, the parameters are the weight vector w

▶ adjust w so that each output label gets a high probability

Page 16: Applied Machine Learning Lecture 5-2: Logistic regression ...richajo/dit866/lectures/l5/l5_2.pdf · Applied Machine Learning Lecture 5-2: Logistic regression and SVM Selpi (selpi@chalmers.se)

the likelihood function

▶ formally, the "probability of the data" is defined by the likelihood function

▶ this is the product of the probabilities of all m individual training instances:

L(w) = P(y_1 | x_1) · ... · P(y_m | x_m)

▶ in our case, this means

L(w) = 1 / (1 + e^(−y_1·(w·x_1))) · ... · 1 / (1 + e^(−y_m·(w·x_m)))

Page 17: Applied Machine Learning Lecture 5-2: Logistic regression ...richajo/dit866/lectures/l5/l5_2.pdf · Applied Machine Learning Lecture 5-2: Logistic regression and SVM Selpi (selpi@chalmers.se)

rewriting a bit...

▶ we rewrite the previous formula

L(w) = 1 / (1 + e^(−y_1·(w·x_1))) · ... · 1 / (1 + e^(−y_m·(w·x_m)))

as

−log L(w) = Loss(w, x_1, y_1) + ... + Loss(w, x_m, y_m)

where

Loss(w, x, y) = log(1 + exp(−y·(w·x)))

is called the log loss function
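A small NumPy sketch of the log loss and the resulting negative log-likelihood of a dataset (the function names are illustrative):

import numpy as np

def log_loss(w, x, y):
    # log loss for a single instance; y is +1 or -1
    return np.log(1 + np.exp(-y * np.dot(w, x)))

def neg_log_likelihood(w, X, Y):
    # -log L(w): the sum of the per-instance log losses
    return sum(log_loss(w, x, y) for x, y in zip(X, Y))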

Page 18: Applied Machine Learning Lecture 5-2: Logistic regression ...richajo/dit866/lectures/l5/l5_2.pdf · Applied Machine Learning Lecture 5-2: Logistic regression and SVM Selpi (selpi@chalmers.se)

plot of the log loss

[figure: the log loss plotted against y · classifier score]

Page 19: Applied Machine Learning Lecture 5-2: Logistic regression ...richajo/dit866/lectures/l5/l5_2.pdf · Applied Machine Learning Lecture 5-2: Logistic regression and SVM Selpi (selpi@chalmers.se)

The fundamental tradeoff in machine learning

▶ goodness of fit: the learned classifier should be able to correctly classify the examples in the training data

▶ regularization: the classifier should be simple

▶ but so far in our LR description, we've just taken care of the first part!

−log L(w) = Loss(w, x_1, y_1) + ... + Loss(w, x_m, y_m)

Page 20: Applied Machine Learning Lecture 5-2: Logistic regression ...richajo/dit866/lectures/l5/l5_2.pdf · Applied Machine Learning Lecture 5-2: Logistic regression and SVM Selpi (selpi@chalmers.se)

regularization in logistic regression models

▶ just like we saw for linear regression models (Ridge and Lasso), we can add a regularizer that keeps the weights small

▶ most commonly, the L2 regularizer:

‖w‖² = w_1·w_1 + ... + w_n·w_n = w · w

▶ ... or an L1 regularizer:

‖w‖₁ = |w_1| + ... + |w_n|

which will do some feature selection

Page 21: Applied Machine Learning Lecture 5-2: Logistic regression ...richajo/dit866/lectures/l5/l5_2.pdf · Applied Machine Learning Lecture 5-2: Logistic regression and SVM Selpi (selpi@chalmers.se)

combining the pieces

▶ we combine the loss and the regularizer:

(1/N) · Σ_{i=1}^{N} Loss(w, x_i, y_i) + (λ/2) · ‖w‖²

▶ in this formula, λ is a "tweaking" parameter that controls the tradeoff between loss and regularization

▶ note: in some formulations (including scikit-learn), there is a parameter C instead of the λ that is put before the loss

C · Σ_{i=1}^{N} Loss(w, x_i, y_i) + (1/2) · ‖w‖²
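A NumPy sketch of the C-weighted objective above with the log loss plugged in; this is just the formula written out as code, not how scikit-learn evaluates it internally:

import numpy as np

def regularized_objective(w, X, Y, C=1.0):
    # C * sum of log losses + (1/2) * ||w||^2
    losses = [np.log(1 + np.exp(-y * np.dot(w, x))) for x, y in zip(X, Y)]
    return C * sum(losses) + 0.5 * np.dot(w, w)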

Page 22

check

C · Σ_{i=1}^{N} Loss(w, x_i, y_i) + (1/2) · ‖w‖²

▶ how do we convert this into an algorithm?

Page 23

probabilistic justification of the regularizers

▶ by adding the regularizer, we are carrying out maximum a posteriori (MAP) estimation instead of maximum likelihood

▶ in MAP estimation, we use a prior to push the parameters in a desired direction

w_MAP = argmax_w p(data | w) · p(w)

▶ L2 regularizer = Gaussian prior

−log p(w) = λ‖w‖² + constant

▶ L1 regularizer = Laplace prior

−log p(w) = λ|w| + constant

[figures: densities of the Gaussian and Laplace priors]
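As a short check of the correspondence claimed above, taking the negative logarithm of the MAP objective with a Gaussian prior gives back the regularized training objective (a sketch, ignoring constant terms):

−log( p(data | w) · p(w) ) = −log p(data | w) − log p(w)
                           = Σ_{i=1}^{N} Loss(w, x_i, y_i) + λ‖w‖² + constant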

Page 24

Overview

A bit on perceptron

logistic regression

training a logistic regression classifier

detour: multiclass linear classifiers

support vector classification

optimizing the LR and SVM objectives

Page 25

two-class (binary) linear classifiers

▶ a linear classifier is a classifier that is defined in terms of a scoring function like this

score = w · x

▶ this is a binary (2-class) classifier:
  ▶ return the first class if the score > 0
  ▶ ... otherwise the second class

▶ how can we deal with non-binary (multi-class) problems when using linear classifiers?

Page 26

approaches to multiclass classification

▶ idea 1: break down the complex problem into simpler problems, train a classifier for each separately

▶ idea 2: modify the learning algorithm so that it can handle the multiclass case directly

Page 27

idea 1: reduction from multiclass to binary

▶ one-versus-rest ("long jump"):
  ▶ for each class c, make a binary classifier to distinguish c from all other classes
  ▶ so if there are n classes, there are n classifiers
  ▶ at test time, we select the class giving the highest score

▶ one-versus-one ("football league"):
  ▶ for each pair of classes c1 and c2, make a classifier to distinguish c1 from c2
  ▶ if there are n classes, there are n·(n−1)/2 classifiers
  ▶ at test time, we select the class that has most "wins"

Page 28

example

▶ assume we're training a classifier of fruits and we have the classes apple, orange, mango

▶ in one-vs-rest, we train the following three classifiers:
  ▶ apple vs orange+mango
  ▶ orange vs apple+mango
  ▶ mango vs apple+orange

▶ in one-vs-one, we train the following three:
  ▶ apple vs orange
  ▶ apple vs mango
  ▶ orange vs mango

Page 29

example (continued)

▶ we train classifiers to distinguish between apple, orange, and mango, using one-vs-rest
  ▶ so we get w_apple, w_orange, w_mango

▶ for some instance x, the respective scores are

[-1, 2.2, 1.5]

so our guess is orange

Page 30

in scikit-learn

▶ scikit-learn includes implementations of both of the methods we have discussed:
  ▶ OneVsRestClassifier
  ▶ OneVsOneClassifier

▶ however, the built-in algorithms (e.g. Perceptron, LogisticRegression) will do this automatically for you
  ▶ they use one-versus-rest
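A minimal sketch of the two wrappers mentioned above; the choice of LinearSVC as the base estimator and of the iris data is illustrative:

from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

ovr = OneVsRestClassifier(LinearSVC()).fit(X, y)   # trains n binary classifiers
ovo = OneVsOneClassifier(LinearSVC()).fit(X, y)    # trains n*(n-1)/2 binary classifiers

print(ovr.predict(X[:3]))
print(ovo.predict(X[:3]))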

Page 31

idea 2: multiclass learning algorithms

▶ is it good to separate the multiclass task into smaller tasks that are trained independently?
  ▶ maybe training should be similar to testing?

▶ let's make a model where one-vs-rest is used while training
  ▶ we'll see how this can be done for logistic regression

Page 32

binary LR: reminders

▶ the logistic or sigmoid function:

P(positive output | x) = 1 / (1 + e^(−score))

def sigmoid(score):
    return 1 / (1 + np.exp(-score))

▶ when training, we minimize the log loss

Loss(w, x, y) = log(1 + exp(−y·(w·x)))

Page 33

multiclass LR using the softmax

▶ the softmax function is used in multiclass LR instead of the logistic:

P(y_i | x) = e^(score_i) / Σ_k e^(score_k)

def softmax(scores):
    expscores = np.exp(scores)
    return expscores / sum(expscores)

[exercise: make softmax numerically stable]
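One possible answer to the exercise, sketched out: subtracting the maximum score before exponentiating leaves the probabilities unchanged but avoids overflow in np.exp for large scores:

import numpy as np

def softmax_stable(scores):
    scores = np.asarray(scores, dtype=float)
    shifted = scores - np.max(scores)    # shifting by a constant does not change the result
    expscores = np.exp(shifted)
    return expscores / expscores.sum()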

Page 34

softmax example

def softmax(scores):
    expscores = np.exp(scores)
    return expscores / sum(expscores)

scores = [-1, 2.2, 1.5, -0.3]
print(softmax(scores))

array([ 0.02517067, 0.61750026, 0.30664156, 0.05068751])

[figure: bar chart of the four softmax output probabilities]

Page 35

cross-entropy loss

▶ when training, the softmax probabilities lead to the cross-entropy loss instead of the log loss

Loss_CE(w, x_i, y_i) = −log P(y_i | x_i) = −log( e^(score_i) / Σ_k e^(score_k) )

▶ just like the log loss:
  ▶ high probability for the correct label y_i ⇒ low loss
  ▶ low probability for y_i ⇒ high loss
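A small NumPy sketch of the cross-entropy loss for a single instance, reusing the softmax from the previous slides (the function name is illustrative):

import numpy as np

def cross_entropy_loss(scores, correct_class):
    # -log of the softmax probability assigned to the correct class
    scores = np.asarray(scores, dtype=float)
    probs = np.exp(scores) / np.exp(scores).sum()
    return -np.log(probs[correct_class])

# with the scores from the softmax example and class 1 as the correct label:
print(cross_entropy_loss([-1, 2.2, 1.5, -0.3], 1))   # about 0.48, i.e. -log(0.6175)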

Page 36

multiclass LR in scikit-learn

▶ LogisticRegression(multi_class='multinomial')

▶ (otherwise, separate classifiers are trained independently)
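A minimal sketch contrasting the two settings on the iris data (the dataset is illustrative; newer scikit-learn versions may warn that the multi_class option is deprecated and use the multinomial variant by default):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# softmax / cross-entropy training over all classes jointly
softmax_lr = LogisticRegression(multi_class='multinomial', max_iter=1000).fit(X, y)

# one binary logistic regression per class, trained independently
ovr_lr = LogisticRegression(multi_class='ovr', max_iter=1000).fit(X, y)

print(softmax_lr.predict_proba(X[:1]))
print(ovr_lr.predict_proba(X[:1]))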

Page 37

Overview

A bit on perceptron

logistic regression

training a logistic regression classifier

detour: multiclass linear classifiers

support vector classification

optimizing the LR and SVM objectives

Page 38

geometric view

▶ using lines to separate 2-dimensional data
▶ using planes to separate 3-dimensional data


Page 39

margin of separation

▶ the margin γ denotes how well w separates the classes:


▶ γ is the shortest distance from the separator to the nearest training instance

Page 40

large margins are good

▶ a result from statistical learning theory:

true error ≤ training error + BigUglyFormula(1/γ²)

▶ larger margin → better generalization

Page 42

support vector machines

▶ support vector machines (SVM) or support vector classifiers (SVC) are linear classifiers constructed by selecting the w that maximizes the margin


▶ note: the solution depends only on the borderline examples: the support vectors

Page 43

soft-margin SVMs

▶ in some cases the dataset is inseparable, or nearly inseparable

▶ soft-margin SVM: allow some examples to be disregarded when maximizing the margin

[figure: A) hard-margin SVM and B) soft-margin SVM with slack variables ξ_i]

Page 44

stating the SVM as an objective function

▶ the hard-margin and soft-margin SVM can be stated mathematically in a number of ways

▶ we'll skip the details, but it can be shown (see Daumé's book) that the soft-margin SVM can be stated as minimizing

C · Σ_{i=1}^{N} Loss(w, x_i, y_i) + (1/2) · ‖w‖²

where

Loss(w, x, y) = max(0, 1 − y·(w·x))

is called the hinge loss
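For comparison with the log loss earlier, a NumPy sketch of the hinge loss and the soft-margin objective written out as code (the function names are illustrative):

import numpy as np

def hinge_loss(w, x, y):
    # hinge loss for a single instance; y is +1 or -1
    return max(0.0, 1 - y * np.dot(w, x))

def svm_objective(w, X, Y, C=1.0):
    # C * sum of hinge losses + (1/2) * ||w||^2
    return C * sum(hinge_loss(w, x, y) for x, y in zip(X, Y)) + 0.5 * np.dot(w, w)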

Page 45

plot of the hinge loss

[figure: the hinge loss plotted against y · classifier score]

Page 46

in scikit-learn

▶ linear SVM is called sklearn.svm.LinearSVC
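A minimal usage sketch (the dataset and the value of C are illustrative); note that LinearSVC does not provide predict_proba, since the SVM score is not a probability, but decision_function gives the raw scores:

from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

svc = LinearSVC(C=1.0, max_iter=10000).fit(X, y)
print(svc.predict(X[:3]))             # predicted labels
print(svc.decision_function(X[:3]))   # raw classifier scores, one column per class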

Page 47

Overview

A bit on perceptron

logistic regression

training a logistic regression classifier

detour: multiclass linear classifiers

support vector classification

optimizing the LR and SVM objectives

Page 48

SVM and LR have convex objective functions


Page 49

optimizing SVM and LR

▶ since the objective functions of SVM and LR are convex, we can find w by stochastic gradient descent

▶ pseudocode:
  ▶ set w to some initial value, e.g. all zero
  ▶ iterate a fixed number of times:
    ▶ select a single training instance x
    ▶ select a "suitable" step length η
    ▶ compute the gradient of the hinge loss or log loss
    ▶ subtract step length · gradient from w

▶ note the similarity to the perceptron!
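A NumPy sketch of this pseudocode for the log loss, using the fact that the gradient of log(1 + exp(−y·(w·x))) with respect to w is −y·x·σ(−y·(w·x)); the fixed step length and the missing regularizer are simplifications:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sgd_log_loss(X, Y, n_iter=1000, eta=0.1):
    # stochastic gradient descent on the (unregularized) log loss; Y holds +1/-1 labels
    rng = np.random.default_rng(0)
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        i = rng.integers(len(X))                      # select a single training instance
        x, y = X[i], Y[i]
        grad = -y * x * sigmoid(-y * np.dot(w, x))    # gradient of the log loss
        w = w - eta * grad                            # step against the gradient
    return w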

Page 50

missing pieces

▶ setting the learning rate η
  ▶ in principle, one can try to select a "small enough" value of η
  ▶ in practice, it's better to decrease η gradually
  ▶ we can use the Pegasos algorithm to set η as follows:

η = C / t = 1 / (λ · t)

where
  ▶ t is the current step (1, 2, ...)
  ▶ C or λ is the loss/regularization tradeoff

▶ gradients for SVM and LR loss functions (hinge and log loss)
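A Pegasos-style sketch for the hinge loss, combining the decreasing step length η = 1/(λ·t) with the L2 regularizer; this is a simplified single-instance version, not a reference implementation of the algorithm:

import numpy as np

def pegasos(X, Y, lam=0.01, n_iter=1000):
    # SGD on the regularized hinge loss objective; Y holds +1/-1 labels
    rng = np.random.default_rng(0)
    w = np.zeros(X.shape[1])
    for t in range(1, n_iter + 1):
        eta = 1.0 / (lam * t)                 # decreasing step length
        i = rng.integers(len(X))
        x, y = X[i], Y[i]
        active = y * np.dot(w, x) < 1         # is the hinge loss nonzero at the current w?
        w = (1 - eta * lam) * w               # gradient step for the regularizer
        if active:
            w = w + eta * y * x               # gradient step for the hinge loss
    return w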

Page 51

Next lecture

▶ Gradient Boosting

▶ Evaluation methods