Online Learning
Rong Jin
Batch Learning
• Given a collection of training examples D
• Learning a classification model from D
• What if training examples are received one at a time?
Online Learning
For t = 1, 2, …, T
• Receive an instance x_t
• Predict its class label ŷ_t
• Receive the true class label y_t
• Incur loss ℓ(ŷ_t, y_t)
• Update the classification model
Objective
• Minimize the total loss over the sequence: Σ_{t=1}^T ℓ(ŷ_t, y_t)
• Loss functions:
• Zero-One loss: ℓ(ŷ, y) = 1 if ŷ ≠ y, and 0 otherwise
• Hinge loss: ℓ(f(x), y) = max(0, 1 − y·f(x))
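The two losses above can be sketched directly in Python (a minimal sketch; the function names are mine, not from the slides):

```python
def zero_one_loss(y_true, y_pred):
    # 1 if the prediction disagrees with the label, 0 otherwise
    return float(y_true != y_pred)

def hinge_loss(y_true, score):
    # max(0, 1 - y * f(x)); also penalizes low-confidence correct predictions
    return max(0.0, 1.0 - y_true * score)
```

Note the hinge loss takes the real-valued score f(x), not the predicted label, so it is nonzero even when sign(f(x)) is correct but |f(x)| < 1.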
Loss Functions
[Figure: zero-one loss and hinge loss plotted against the margin y·f(x)]
Linear Classifiers
• Restrict our discussion to linear classifiers
• Prediction: ŷ = sign(w·x)
• Confidence: |w·x|
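The prediction and confidence of a linear classifier can be sketched as follows (w and x are plain lists; the helper names are mine):

```python
def predict(w, x):
    # predicted label: the sign of the inner product w . x
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s >= 0 else -1

def confidence(w, x):
    # |w . x|: how far x is from the decision boundary (up to scaling by ||w||)
    return abs(sum(wi * xi for wi, xi in zip(w, x)))
```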
Separable Set
Inseparable Sets
Why Online Learning?
• Fast
• Memory efficient – process one example at a time
• Simple to implement
• Formal guarantees – regret/mistake bounds
• Online-to-batch conversions
• No statistical assumptions
• Adaptive
• Not as good as a well-designed batch algorithm
Update Rules
• Online algorithms are based on an update rule which defines w_{t+1} from w_t (and possibly other information)
• Linear classifiers: find w_{t+1} from w_t based on the input (x_t, y_t)
Some update rules:
– Perceptron (Rosenblatt)
– ALMA (Gentile)
– ROMMA (Li & Long)
– NORMA (Kivinen et al.)
– MIRA (Crammer & Singer)
– EG (Littlestone and Warmuth)
– Bregman based (Warmuth)
Perceptron
Initialize w_1 = 0
For t = 1, 2, …, T
• Receive an instance x_t
• Predict its class label ŷ_t = sign(w_t·x_t)
• Receive the true class label y_t
• If ŷ_t ≠ y_t then w_{t+1} = w_t + y_t x_t; otherwise w_{t+1} = w_t
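The Perceptron loop can be sketched in a few lines of Python (a minimal sketch of the slide's pseudocode; the stream is any iterable of (x, y) pairs with y in {−1, +1}):

```python
def perceptron(stream):
    # Online Perceptron: predict with the current w, update only on a mistake.
    w = None
    mistakes = 0
    for x, y in stream:
        if w is None:
            w = [0.0] * len(x)            # initialize w_1 = 0
        y_hat = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
        if y_hat != y:                    # mistake: w_{t+1} = w_t + y_t x_t
            mistakes += 1
            w = [wi + y * xi for wi, xi in zip(w, x)]
    return w, mistakes
```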
Geometrical Interpretation
Mistake Bound: Separable Case
• Assume the data set D is linearly separable with margin γ, i.e., there exists a unit vector u (||u|| = 1) such that y_t (u·x_t) ≥ γ for all t
• Assume ||x_t|| ≤ R for all t
• Then the maximum number of mistakes made by the Perceptron algorithm is bounded by (R/γ)²
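The bound can be checked numerically on a toy separable set (a sketch; the data and the choices u = (1, 0), γ = 1, R = √5 are my own illustration, not from the slides):

```python
# Labels follow sign(x1): u = (1, 0), margin gamma = 1, radius R = sqrt(5),
# so the separable-case bound is (R / gamma)^2 = 5 mistakes.
data = [((1.0, 2.0), 1), ((-1.0, 2.0), -1), ((2.0, -1.0), 1), ((-2.0, -1.0), -1)]

w = [0.0, 0.0]
mistakes = 0
for _ in range(10):                      # several passes over the stream
    for x, y in data:
        y_hat = 1 if w[0] * x[0] + w[1] * x[1] >= 0 else -1
        if y_hat != y:
            mistakes += 1
            w = [w[0] + y * x[0], w[1] + y * x[1]]

bound = 5.0                              # (R / gamma)^2
assert mistakes <= bound
```

On this set the Perceptron makes only 2 mistakes, well under the bound of 5.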
Mistake Bound: Separable Case
Mistake Bound: Inseparable Case
• Let u be the best linear classifier
• We measure our progress by the potential ||w_t − u||²
• Consider a round t on which we make a mistake on (x_t, y_t)
Mistake Bound: Inseparable Case
• Result 1:
Mistake Bound: Inseparable Case
• Result 2:
Perceptron with Projection
Initialize w_1 = 0
For t = 1, 2, …, T
• Receive an instance x_t
• Predict its class label ŷ_t = sign(w_t·x_t)
• Receive the true class label y_t
• If ŷ_t ≠ y_t then w_{t+1} = w_t + y_t x_t
• If ||w_{t+1}|| > B then project: w_{t+1} ← B w_{t+1} / ||w_{t+1}||
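The projection variant can be sketched as follows (the radius parameter B is left implicit on the slide, so it is an explicit assumption here):

```python
import math

def perceptron_with_projection(stream, B):
    # Perceptron update followed by projection onto the ball ||w|| <= B.
    # B is a hypothetical radius parameter; the slide leaves it unnamed.
    w = None
    for x, y in stream:
        if w is None:
            w = [0.0] * len(x)
        y_hat = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
        if y_hat != y:
            w = [wi + y * xi for wi, xi in zip(w, x)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > B:                       # rescale w back onto the ball
            w = [wi * B / norm for wi in w]
    return w
```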
Remarks
• Mistake bound is measured for a sequence of classifiers w_1, …, w_T
• Bound does not depend on dimension of the feature vector
• The bound holds for all sequences (no i.i.d. assumption).
• It is not tight for most real-world data, but it cannot be further improved in general.
Perceptron
Initialize w_1 = 0
For t = 1, 2, …, T
• Receive an instance x_t
• Predict its class label ŷ_t = sign(w_t·x_t)
• Receive the true class label y_t
• If ŷ_t ≠ y_t then w_{t+1} = w_t + y_t x_t
Conservative: updates the classifier only when it misclassifies
Aggressive Perceptron
Initialize w_1 = 0
For t = 1, 2, …, T
• Receive an instance x_t
• Predict its class label ŷ_t = sign(w_t·x_t)
• Receive the true class label y_t
• If y_t (w_t·x_t) ≤ γ then w_{t+1} = w_t + y_t x_t
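The aggressive variant updates whenever the margin y·(w·x) is small, even on correctly classified examples. A sketch (the exact threshold condition is assumed, since the slide's formula was lost in transcription):

```python
def aggressive_perceptron(stream, margin=1.0):
    # Update whenever y * (w . x) <= margin: low confidence OR a mistake.
    # `margin` is an assumed parameter corresponding to gamma on the slide.
    w = None
    for x, y in stream:
        if w is None:
            w = [0.0] * len(x)
        score = sum(wi * xi for wi, xi in zip(w, x))
        if y * score <= margin:
            w = [wi + y * xi for wi, xi in zip(w, x)]
    return w
```

Unlike the conservative Perceptron, this updates on the second example below even though it is already classified correctly.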
Regret Bound
Learning a Classifier
• The evaluation (mistake bound or regret bound) concerns a sequence of classifiers w_1, …, w_T
• But, at the end of the day, which classifier should be used? The last one? One chosen by cross-validation?
Learning with Expert Advice
• Learning to combine the predictions from multiple experts
• An ensemble of d experts: f_1, …, f_d
• Combination weights: w = (w_1, …, w_d), w_i ≥ 0
• Combined classifier: sign(Σ_{i=1}^d w_i f_i(x))
Hedge
Simple case
• There exists one expert who can perfectly classify all the training examples
• What is your learning strategy?
Difficult case
• What if we don't have such a perfect expert?
Hedge Algorithm
[Illustration: predictions from individual experts, e.g., +1, −1, +1, +1]
Hedge Algorithm
Initialize w_1 = (1/d, …, 1/d)
For t = 1, 2, …, T
• Receive a training example (x_t, y_t)
• Prediction: ŷ_t = sign(Σ_i w_{t,i} f_i(x_t))
• If ŷ_t ≠ y_t then
  For i = 1, 2, …, d
  • If f_i(x_t) ≠ y_t then discount: w_{t+1,i} = β w_{t,i}
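A sketch of the Hedge loop in Python (the discount factor β ∈ (0, 1) and the renormalization step are assumptions where the slide's formulas were lost):

```python
def hedge(experts, stream, beta=0.5):
    # experts: list of functions x -> {-1, +1}.
    # On each combined mistake, multiply the weight of every wrong expert
    # by beta, then renormalize so the weights sum to 1.
    d = len(experts)
    w = [1.0 / d] * d
    mistakes = 0
    for x, y in stream:
        total = sum(wi * f(x) for wi, f in zip(w, experts))
        y_hat = 1 if total >= 0 else -1
        if y_hat != y:
            mistakes += 1
            w = [wi * (beta if f(x) != y else 1.0) for wi, f in zip(w, experts)]
            s = sum(w)
            w = [wi / s for wi in w]
    return w, mistakes
```

After a mistake, weight shifts away from the experts that were wrong, so a perfect expert (the simple case above) eventually dominates the vote.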
Mistake Bound
Mistake Bound
• Measure the progress:
• Lower bound:
Mistake Bound
• Upper bound
Mistake Bound
• Upper bound
Mistake Bound