Online Learning
Rong Jin
Batch Learning
• Given a collection of training examples D
• Learning a classification model from D
• What if training examples are received one at a time?
Online Learning
For t = 1, 2, …, T
• Receive an instance x_t
• Predict its class label ŷ_t
• Receive the true class label y_t
• Incur loss ℓ(ŷ_t, y_t)
• Update the classification model
Objective
• Minimize the total loss over the sequence: Σ_{t=1}^T ℓ(ŷ_t, y_t)
• Loss functions:
• Zero-One loss: ℓ(ŷ, y) = 1 if ŷ ≠ y, and 0 otherwise
• Hinge loss: ℓ(f(x), y) = max(0, 1 − y·f(x))
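The two losses above can be sketched directly in Python (a minimal sketch; the function names are mine, not from the slides):

```python
def zero_one_loss(y_true, y_pred):
    # 1 if the prediction disagrees with the label, 0 otherwise
    return float(y_true != y_pred)

def hinge_loss(y_true, score):
    # max(0, 1 - y * f(x)); also penalizes low-confidence correct predictions
    return max(0.0, 1.0 - y_true * score)
```

Note the hinge loss takes the real-valued score f(x), not the predicted label, so it is nonzero even when sign(f(x)) is correct but |f(x)| < 1.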
Loss Functions
[Figure: zero-one loss and hinge loss plotted against the margin y·f(x)]
Linear Classifiers
• Restrict our discussion to linear classifiers
• Prediction: ŷ = sign(w·x)
• Confidence: |w·x|
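The prediction and confidence of a linear classifier can be sketched as follows (w and x are plain lists; the helper names are mine):

```python
def predict(w, x):
    # predicted label: the sign of the inner product w . x
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s >= 0 else -1

def confidence(w, x):
    # |w . x|: how far x is from the decision boundary (up to scaling by ||w||)
    return abs(sum(wi * xi for wi, xi in zip(w, x)))
```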
Separable Set
Inseparable Sets
Why Online Learning?
• Fast
• Memory efficient – process one example at a time
• Simple to implement
• Formal guarantees – regret/mistake bounds
• Online-to-batch conversions
• No statistical assumptions
• Adaptive
• Not as good as a well-designed batch algorithm
Update Rules
• Online algorithms are based on an update rule which defines w_{t+1} from w_t (and possibly other information)
• Linear classifiers: find w_{t+1} from w_t based on the input (x_t, y_t)
Some update rules:
– Perceptron (Rosenblatt)
– ALMA (Gentile)
– ROMMA (Li & Long)
– NORMA (Kivinen et al.)
– MIRA (Crammer & Singer)
– EG (Littlestone and Warmuth)
– Bregman based (Warmuth)
Perceptron
Initialize w_1 = 0
For t = 1, 2, …, T
• Receive an instance x_t
• Predict its class label ŷ_t = sign(w_t·x_t)
• Receive the true class label y_t
• If ŷ_t ≠ y_t then w_{t+1} = w_t + y_t x_t; otherwise w_{t+1} = w_t
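The Perceptron loop can be sketched in a few lines of Python (a minimal sketch of the slide's pseudocode; the stream is any iterable of (x, y) pairs with y in {−1, +1}):

```python
def perceptron(stream):
    # Online Perceptron: predict with the current w, update only on a mistake.
    w = None
    mistakes = 0
    for x, y in stream:
        if w is None:
            w = [0.0] * len(x)            # initialize w_1 = 0
        y_hat = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
        if y_hat != y:                    # mistake: w_{t+1} = w_t + y_t x_t
            mistakes += 1
            w = [wi + y * xi for wi, xi in zip(w, x)]
    return w, mistakes
```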
Geometrical Interpretation
Mistake Bound: Separable Case
• Assume the data set D is linearly separable with margin γ, i.e., there exists a unit vector u (||u|| = 1) such that y_t (u·x_t) ≥ γ for all t
• Assume ||x_t|| ≤ R for all t
• Then the maximum number of mistakes made by the Perceptron algorithm is bounded by (R/γ)²
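The bound can be checked numerically on a toy separable set (a sketch; the data and the choices u = (1, 0), γ = 1, R = √5 are my own illustration, not from the slides):

```python
# Labels follow sign(x1): u = (1, 0), margin gamma = 1, radius R = sqrt(5),
# so the separable-case bound is (R / gamma)^2 = 5 mistakes.
data = [((1.0, 2.0), 1), ((-1.0, 2.0), -1), ((2.0, -1.0), 1), ((-2.0, -1.0), -1)]

w = [0.0, 0.0]
mistakes = 0
for _ in range(10):                      # several passes over the stream
    for x, y in data:
        y_hat = 1 if w[0] * x[0] + w[1] * x[1] >= 0 else -1
        if y_hat != y:
            mistakes += 1
            w = [w[0] + y * x[0], w[1] + y * x[1]]

bound = 5.0                              # (R / gamma)^2
assert mistakes <= bound
```

On this set the Perceptron makes only 2 mistakes, well under the bound of 5.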
Mistake Bound: Separable Case
Mistake Bound: Inseparable Case
• Let u be the best linear classifier
• We measure our progress by the potential ||w_t − u||²
• Consider a round t on which we make a mistake on (x_t, y_t)
Mistake Bound: Inseparable Case
• Result 1:
Mistake Bound: Inseparable Case
• Result 2:
Perceptron with Projection
Initialize w_1 = 0
For t = 1, 2, …, T
• Receive an instance x_t
• Predict its class label ŷ_t = sign(w_t·x_t)
• Receive the true class label y_t
• If ŷ_t ≠ y_t then w_{t+1} = w_t + y_t x_t
• If ||w_{t+1}|| > B then project: w_{t+1} ← B w_{t+1} / ||w_{t+1}||
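The projection variant can be sketched as follows (the radius parameter B is left implicit on the slide, so it is an explicit assumption here):

```python
import math

def perceptron_with_projection(stream, B):
    # Perceptron update followed by projection onto the ball ||w|| <= B.
    # B is a hypothetical radius parameter; the slide leaves it unnamed.
    w = None
    for x, y in stream:
        if w is None:
            w = [0.0] * len(x)
        y_hat = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
        if y_hat != y:
            w = [wi + y * xi for wi, xi in zip(w, x)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > B:                       # rescale w back onto the ball
            w = [wi * B / norm for wi in w]
    return w
```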
Remarks
• Mistake bound is measured for a sequence of classifiers w_1, …, w_T
• Bound does not depend on dimension of the feature vector
• The bound holds for all sequences (no i.i.d. assumption).
• It is not tight for most real-world data, but it cannot be further improved in general.
Perceptron
Initialize w_1 = 0
For t = 1, 2, …, T
• Receive an instance x_t
• Predict its class label ŷ_t = sign(w_t·x_t)
• Receive the true class label y_t
• If ŷ_t ≠ y_t then w_{t+1} = w_t + y_t x_t
Conservative: updates the classifier only when it misclassifies
Aggressive Perceptron
Initialize w_1 = 0
For t = 1, 2, …, T
• Receive an instance x_t
• Predict its class label ŷ_t = sign(w_t·x_t)
• Receive the true class label y_t
• If y_t (w_t·x_t) ≤ γ then w_{t+1} = w_t + y_t x_t
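The aggressive variant updates whenever the margin y·(w·x) is small, even on correctly classified examples. A sketch (the exact threshold condition is assumed, since the slide's formula was lost in transcription):

```python
def aggressive_perceptron(stream, margin=1.0):
    # Update whenever y * (w . x) <= margin: low confidence OR a mistake.
    # `margin` is an assumed parameter corresponding to gamma on the slide.
    w = None
    for x, y in stream:
        if w is None:
            w = [0.0] * len(x)
        score = sum(wi * xi for wi, xi in zip(w, x))
        if y * score <= margin:
            w = [wi + y * xi for wi, xi in zip(w, x)]
    return w
```

Unlike the conservative Perceptron, this updates on the second example below even though it is already classified correctly.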
Regret Bound
Learning a Classifier
• The evaluation (mistake bound or regret bound) concerns a sequence of classifiers w_1, …, w_T
• But, at the end of the day, which classifier should be used? The last one? One chosen by cross-validation?
Learning with Expert Advice
• Learning to combine the predictions from multiple experts
• An ensemble of d experts: f_1, …, f_d
• Combination weights: w = (w_1, …, w_d), w_i ≥ 0
• Combined classifier: sign(Σ_{i=1}^d w_i f_i(x))
Hedge
Simple case
• There exists one expert who can perfectly classify all the training examples
• What is your learning strategy?
Difficult case
• What if we don't have such a perfect expert?
Hedge Algorithm
[Illustration: predictions from individual experts, e.g., +1, −1, +1, +1]
Hedge Algorithm
Initialize w_1 = (1/d, …, 1/d)
For t = 1, 2, …, T
• Receive a training example (x_t, y_t)
• Prediction: ŷ_t = sign(Σ_i w_{t,i} f_i(x_t))
• If ŷ_t ≠ y_t then
  For i = 1, 2, …, d
  • If f_i(x_t) ≠ y_t then discount: w_{t+1,i} = β w_{t,i}
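A sketch of the Hedge loop in Python (the discount factor β ∈ (0, 1) and the renormalization step are assumptions where the slide's formulas were lost):

```python
def hedge(experts, stream, beta=0.5):
    # experts: list of functions x -> {-1, +1}.
    # On each combined mistake, multiply the weight of every wrong expert
    # by beta, then renormalize so the weights sum to 1.
    d = len(experts)
    w = [1.0 / d] * d
    mistakes = 0
    for x, y in stream:
        total = sum(wi * f(x) for wi, f in zip(w, experts))
        y_hat = 1 if total >= 0 else -1
        if y_hat != y:
            mistakes += 1
            w = [wi * (beta if f(x) != y else 1.0) for wi, f in zip(w, experts)]
            s = sum(w)
            w = [wi / s for wi in w]
    return w, mistakes
```

After a mistake, weight shifts away from the experts that were wrong, so a perfect expert (the simple case above) eventually dominates the vote.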
Mistake Bound
Mistake Bound
• Measure the progress:
• Lower bound:
Mistake Bound
• Upper bound
Mistake Bound
• Upper bound
Mistake Bound