
Page 1: Some Useful Machine Learning Tools

Some Useful Machine Learning Tools

M. Pawan Kumar
École Centrale Paris
École des Ponts ParisTech
INRIA Saclay, Île-de-France

Page 2: Some Useful Machine Learning Tools

• Part I : Supervised Learning

• Part II: Weakly Supervised Learning

Outline

Page 3: Some Useful Machine Learning Tools

• Introduction to Supervised Learning

• Probabilistic Methods– Logistic regression– Multiclass logistic regression– Regularized maximum likelihood

• Loss-based Methods– Support vector machine– Structured output support vector machine

Outline – Part I

Page 4: Some Useful Machine Learning Tools

Image Classification

Is this an urban or rural area?

Input: x Output: y ∈ {-1,+1}

Page 5: Some Useful Machine Learning Tools

Image Classification

Is this scan healthy or unhealthy?

Input: x Output: y ∈ {-1,+1}

Page 6: Some Useful Machine Learning Tools

Image Classification

Which city is this?

Input: x Output: y ∈ {1,2,…,C}

Page 7: Some Useful Machine Learning Tools

Image Classification

What type of tumor does this scan contain?

Input: x Output: y ∈ {1,2,…,C}

Page 8: Some Useful Machine Learning Tools

Object Detection

Where is the object in the image?

Input: x Output: y ∈ {Pixels}

Page 9: Some Useful Machine Learning Tools

Object Detection

Where is the rupture in the scan?

Input: x Output: y ∈ {Pixels}

Page 10: Some Useful Machine Learning Tools

Segmentation

What is the semantic class of each pixel?

Input: x Output: y ∈ {1,2,…,C}^|Pixels|

[Figure: street scene segmented into semantic classes: car, road, grass, tree, sky]

Page 11: Some Useful Machine Learning Tools

Segmentation

What is the muscle group of each pixel?

Input: x Output: y ∈ {1,2,…,C}^|Pixels|

Page 12: Some Useful Machine Learning Tools

A Simplified View of the Pipeline

Input x → [Extract Features] → Features Φ(x) → [Compute Scores] → Scores f(Φ(x),y)

Prediction: y(f) = argmax_y f(Φ(x),y)

Learn f.

http://deeplearning.net

Page 13: Some Useful Machine Learning Tools

Learning Objective

Data distribution P(x,y)

f* = argmin_f E_{P(x,y)} [Error(y(f), y)]

Error measures the quality of the prediction y(f) against the ground truth y, in expectation over the data distribution.

Problem: the distribution is unknown.

Page 14: Some Useful Machine Learning Tools

Learning Objective

Training data {(x_i, y_i), i = 1,2,…,n}

f* = argmin_f E_{P(x,y)} [Error(y(f), y)]

Error measures the quality of the prediction y(f) against the ground truth y, in expectation over the data distribution.

Page 15: Some Useful Machine Learning Tools

Learning Objective

Training data {(x_i, y_i), i = 1,2,…,n}

f* = argmin_f Σ_i Error(y_i(f), y_i)

With only finite samples, the expectation is taken over the empirical distribution instead.

Page 16: Some Useful Machine Learning Tools

Learning Objective

Training data {(x_i, y_i), i = 1,2,…,n}

f* = argmin_f Σ_i Error(y_i(f), y_i) + λ R(f)

With finite samples, a regularizer R(f) is added, with relative weight λ (a hyperparameter).

Page 17: Some Useful Machine Learning Tools

• Introduction to Supervised Learning

• Probabilistic Methods– Logistic regression– Multiclass logistic regression– Regularized maximum likelihood

• Loss-based Methods– Support vector machine– Structured output support vector machine

Outline – Part I

Page 18: Some Useful Machine Learning Tools

Logistic Regression

Input: x Output: y ∈ {-1,+1} Features: Φ(x)

f(Φ(x),y) = y θ^T Φ(x) Prediction: sign(θ^T Φ(x))

P(y|x) = l(f(Φ(x),y)), where l(z) = 1/(1+e^{-z}) is the logistic function.

Is the distribution normalized?
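The answer is yes: l(z) + l(-z) = 1 for every z, so P(+1|x) + P(-1|x) = 1. A quick numeric check, as a minimal NumPy sketch (the parameters θ and features Φ(x) below are made-up values, not from the slides):

    import numpy as np

    def logistic(z):
        # l(z) = 1/(1 + e^{-z})
        return 1.0 / (1.0 + np.exp(-z))

    theta = np.array([0.5, -1.2])      # hypothetical parameters θ
    phi_x = np.array([1.0, 2.0])       # hypothetical features Φ(x)
    z = theta @ phi_x
    # P(+1|x) + P(-1|x) = l(z) + l(-z), which is always 1
    print(logistic(z) + logistic(-z))  # -> 1.0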

Page 19: Some Useful Machine Learning Tools

Logistic Regression

Training data {(x_i, y_i), i = 1,2,…,n}

min_θ Σ_i -log P(y_i|x_i) + λ R(θ)

The first term is the negative log-likelihood; R(θ) is the regularizer.

Page 20: Some Useful Machine Learning Tools

Logistic Regression

Training data {(x_i, y_i), i = 1,2,…,n}

min_θ Σ_i -log P(y_i|x_i) + λ ||θ||²

Convex optimization problem. Proof left as an exercise.

Hint: prove that the Hessian H is positive semidefinite (PSD), i.e. a^T H a ≥ 0 for all a.
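As a numeric sanity check: the Hessian of this objective works out to H = Σ_i l(z_i) l(-z_i) Φ(x_i)Φ(x_i)^T + 2λI with z_i = y_i θ^T Φ(x_i), a weighted sum of outer products plus a positive diagonal. A minimal NumPy sketch on random data (all sizes and values below are made up for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, lam = 50, 3, 0.1
    X = rng.normal(size=(n, d))          # rows are features Φ(x_i)
    y = rng.choice([-1.0, 1.0], size=n)  # labels in {-1,+1}
    theta = rng.normal(size=d)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # H = Σ_i l(z_i) l(-z_i) Φ(x_i)Φ(x_i)ᵀ + 2λI, with z_i = y_i θᵀΦ(x_i)
    z = y * (X @ theta)
    curv = sigmoid(z) * sigmoid(-z)      # per-sample curvature weights
    H = (X * curv[:, None]).T @ X + 2 * lam * np.eye(d)

    # Smallest eigenvalue is non-negative, so aᵀHa ≥ 0 for every a
    print(np.linalg.eigvalsh(H).min() >= 0)   # -> True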

Page 21: Some Useful Machine Learning Tools

Gradient Descent

Training data {(x_i, y_i), i = 1,2,…,n}

min_θ L(θ) = Σ_i -log P(y_i|x_i) + λ ||θ||²

Start with an initial estimate θ^0.

θ^{t+1} ← θ^t - μ dL(θ)/dθ evaluated at θ^t

Repeat until the decrease in the objective is below a threshold.

Page 22: Some Useful Machine Learning Tools

Gradient Descent

[Figure: gradient descent steps with a small learning rate μ vs. a large μ]


Page 24: Some Useful Machine Learning Tools

Gradient Descent

Training data {(x_i, y_i), i = 1,2,…,n}

min_θ L(θ) = Σ_i -log P(y_i|x_i) + λ ||θ||²

Start with an initial estimate θ^0.

θ^{t+1} ← θ^t - μ dL(θ)/dθ evaluated at θ^t, where μ is a small constant or chosen by line search.

Repeat until the decrease in the objective is below a threshold.
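A minimal sketch of this loop for the logistic regression objective, in NumPy (the random toy data, the fixed step size μ, and the stopping threshold are assumptions for illustration):

    import numpy as np

    def objective(theta, X, y, lam):
        z = y * (X @ theta)
        return np.sum(np.log1p(np.exp(-z))) + lam * theta @ theta

    def grad(theta, X, y, lam):
        # dL/dθ for L(θ) = Σ_i -log l(y_i θᵀΦ(x_i)) + λ||θ||²
        z = y * (X @ theta)
        s = 1.0 / (1.0 + np.exp(-z))                 # l(z_i)
        return -((y * (1 - s))[:, None] * X).sum(axis=0) + 2 * lam * theta

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))
    y = rng.choice([-1.0, 1.0], size=50)
    theta, lam, mu = np.zeros(3), 0.1, 0.05          # μ: small constant
    prev = objective(theta, X, y, lam)
    while True:
        theta = theta - mu * grad(theta, X, y, lam)  # θ ← θ - μ dL/dθ
        cur = objective(theta, X, y, lam)
        if prev - cur < 1e-8:    # decrease in objective below threshold
            break
        prev = cur
    print(theta)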

Page 25: Some Useful Machine Learning Tools

Newton’s Method

Minimize g(z). Let z^t be the solution at iteration t.

Define g_t(Δz) = g(z^t + Δz).

Second-order Taylor series: g_t(Δz) ≈ g(z^t) + g'(z^t) Δz + ½ g''(z^t) (Δz)²

Setting the derivative with respect to Δz to 0 implies g'(z^t) + g''(z^t) Δz = 0.

Solving for Δz provides the step Δz = -g'(z^t)/g''(z^t), i.e. a learning rate of 1/g''(z^t).
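A minimal one-dimensional sketch (the function g below is an arbitrary convex example, chosen only for illustration):

    import math

    # Toy convex function: g(z) = z² + e^{-z}
    g  = lambda z: z * z + math.exp(-z)
    g1 = lambda z: 2 * z - math.exp(-z)   # g'(z)
    g2 = lambda z: 2 + math.exp(-z)       # g''(z)

    z = 5.0                               # initial estimate
    for _ in range(20):
        step = -g1(z) / g2(z)             # Δz = -g'(zᵗ)/g''(zᵗ)
        z += step
        if abs(step) < 1e-10:
            break
    print(z, g(z))   # converges in a handful of iterations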

Page 26: Some Useful Machine Learning Tools

Newton’s Method

Training data {(x_i, y_i), i = 1,2,…,n}

min_θ L(θ) = Σ_i -log P(y_i|x_i) + λ ||θ||²

Start with an initial estimate θ^0.

θ^{t+1} ← θ^t - μ dL(θ)/dθ evaluated at θ^t, with μ^{-1} = d²L(θ)/dθ² evaluated at θ^t

Repeat until the decrease in the objective is below a threshold.

Page 27: Some Useful Machine Learning Tools

Logistic Regression

Input: x Output: y ∈ {1,2,…,C} Features: Φ(x)

Train C 1-vs-all binary logistic regression classifiers.

Prediction: the class whose classifier assigns the maximum probability to +1.

Simple extension, easy to code, but loses the probabilistic interpretation.

Page 28: Some Useful Machine Learning Tools

• Introduction to Supervised Learning

• Probabilistic Methods– Logistic regression– Multiclass logistic regression– Regularized maximum likelihood

• Loss-based Methods– Support vector machine– Structured output support vector machine

Outline – Part I

Page 29: Some Useful Machine Learning Tools

Multiclass Logistic Regression

Input: x Output: y ∈ {1,2,…,C} Features: Φ(x)

Joint feature vector of input and output: Ψ(x,y)

Ψ(x,1) = [Φ(x) 0 0 … 0]

Ψ(x,2) = [0 Φ(x) 0 … 0]

Ψ(x,C) = [0 0 0 … Φ(x)]

Page 30: Some Useful Machine Learning Tools

Multiclass Logistic Regression

Input: x Output: y ∈ {1,2,…,C} Features: Φ(x)

Joint feature vector of input and output: Ψ(x,y)

f(Ψ(x,y)) = θ^T Ψ(x,y)

Prediction: max_y θ^T Ψ(x,y)

P(y|x) = exp(f(Ψ(x,y)))/Z(x)

Partition function Z(x) = Σ_y exp(f(Ψ(x,y)))
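A minimal NumPy sketch of the joint feature construction and the resulting distribution (random θ and Φ(x), and classes indexed 0,…,C-1 for convenience; all values are made up):

    import numpy as np

    def psi(phi_x, y, C):
        # Ψ(x,y): Φ(x) placed in the block reserved for class y
        out = np.zeros(C * len(phi_x))
        out[y * len(phi_x):(y + 1) * len(phi_x)] = phi_x
        return out

    C, d = 4, 3
    rng = np.random.default_rng(0)
    theta = rng.normal(size=C * d)         # hypothetical parameters θ
    phi_x = rng.normal(size=d)             # hypothetical features Φ(x)

    scores = np.array([theta @ psi(phi_x, y, C) for y in range(C)])
    Z = np.exp(scores).sum()               # partition function Z(x)
    P = np.exp(scores) / Z                 # P(y|x); sums to 1
    print(P, P.argmax())                   # distribution and prediction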

Page 31: Some Useful Machine Learning Tools

Multiclass Logistic Regression

Training data {(x_i, y_i), i = 1,2,…,n}

min_θ Σ_i -log P(y_i|x_i) + λ ||θ||²

Convex optimization problem

Gradient Descent, Newton’s Method, and many others

Page 32: Some Useful Machine Learning Tools

• Introduction to Supervised Learning

• Probabilistic Methods– Logistic regression– Multiclass logistic regression– Regularized maximum likelihood

• Loss-based Methods– Support vector machine– Structured output support vector machine

Outline – Part I

Page 33: Some Useful Machine Learning Tools

Regularized Maximum Likelihood

Input: x Output: y ∈ {1,2,…,C}^m Features: Φ(x)

Joint feature vector of input and output: Ψ(x,y)

[Ψ(x,y_1); Ψ(x,y_2); …; Ψ(x,y_m)]

[Ψ(x,y_i), for all i; Ψ(x,y_i,y_j), for all i, j]

Page 34: Some Useful Machine Learning Tools

Regularized Maximum Likelihood

Input: x Output: y ∈ {1,2,…,C}^m Features: Φ(x)

Joint feature vector of input and output: Ψ(x,y)

[Ψ(x,y_1); Ψ(x,y_2); …; Ψ(x,y_m)]

[Ψ(x,y_i), for all i; Ψ(x,y_i,y_j), for all i, j]

[Ψ(x,y_i), for all i; Ψ(x,y_c), where c is a subset of variables]

Page 35: Some Useful Machine Learning Tools

Input: x Output: y ∈ {1,2,…,C}^m Features: Φ(x)

Joint feature vector of input and output: Ψ(x,y)

f(Ψ(x,y)) = θ^T Ψ(x,y)

Prediction: max_y θ^T Ψ(x,y)

P(y|x) = exp(f(Ψ(x,y)))/Z(x)

Partition function Z(x) = Σ_y exp(f(Ψ(x,y)))

Regularized Maximum Likelihood

Page 36: Some Useful Machine Learning Tools

Training data {(x_i, y_i), i = 1,2,…,n}

min_θ Σ_i -log P(y_i|x_i) + λ ||θ||²

Partition function is expensive to compute

Regularized Maximum Likelihood

Approximate inference (Nikos Komodakis’ tutorial)

Page 37: Some Useful Machine Learning Tools

• Introduction to Supervised Learning

• Probabilistic Methods– Logistic regression– Multiclass logistic regression– Regularized maximum likelihood

• Loss-based Methods– Support vector machine (multiclass)– Structured output support vector machine

Outline – Part I

Page 38: Some Useful Machine Learning Tools

Multiclass SVM

Input: x Output: y ∈ {1,2,…,C} Features: Φ(x)

Joint feature vector of input and output: Ψ(x,y)

Ψ(x,1) = [Φ(x) 0 0 … 0]

Ψ(x,2) = [0 Φ(x) 0 … 0]

Ψ(x,C) = [0 0 0 … Φ(x)]

Page 39: Some Useful Machine Learning Tools

Multiclass SVM

Input: x Output: y ∈ {1,2,…,C} Features: Φ(x)

Joint feature vector of input and output: Ψ(x,y)

f(Ψ(x,y)) = w^T Ψ(x,y)

Prediction: max_y w^T Ψ(x,y)

Predicted output: y(w) = argmax_y w^T Ψ(x,y)

Page 40: Some Useful Machine Learning Tools

Multiclass SVM

Training data {(x_i, y_i), i = 1,2,…,n}

Δ(y_i, y_i(w))

Loss function for i-th sample

Minimize the regularized sum of loss over training data

Highly non-convex in w

Regularization plays no role (overfitting may occur)

Page 41: Some Useful Machine Learning Tools

Multiclass SVM

Training data {(x_i, y_i), i = 1,2,…,n}

Δ(y_i, y_i(w)) = w^T Ψ(x_i, y_i(w)) + Δ(y_i, y_i(w)) - w^T Ψ(x_i, y_i(w))

≤ w^T Ψ(x_i, y_i(w)) + Δ(y_i, y_i(w)) - w^T Ψ(x_i, y_i)

≤ max_y { w^T Ψ(x_i, y) + Δ(y_i, y) } - w^T Ψ(x_i, y_i)

This upper bound is convex in w and sensitive to its regularization.

Page 42: Some Useful Machine Learning Tools

Multiclass SVM

Training data {(x_i, y_i), i = 1,2,…,n}

min_w ||w||² + C Σ_i ξ_i

s.t. w^T Ψ(x_i, y) + Δ(y_i, y) - w^T Ψ(x_i, y_i) ≤ ξ_i for all y

Specialized software packages freely available

http://www.cs.cornell.edu/People/tj/svm_light/svm_multiclass.html

Quadratic program with polynomial # of constraints
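As a minimal sketch, the slack a sample requires is exactly the loss-augmented hinge from the previous slide; the code below computes it with a 0/1 loss Δ (an assumption; any label loss works) and made-up w and Φ(x):

    import numpy as np

    def psi(phi_x, y, C):
        # Ψ(x,y): Φ(x) in the block for class y (classes 0,…,C-1 here)
        out = np.zeros(C * len(phi_x))
        out[y * len(phi_x):(y + 1) * len(phi_x)] = phi_x
        return out

    def hinge(w, phi_x, y_true, C):
        # max_y { wᵀΨ(x,y) + Δ(y_true,y) } - wᵀΨ(x,y_true)
        scores = [w @ psi(phi_x, y, C) + (y != y_true) for y in range(C)]
        return max(scores) - w @ psi(phi_x, y_true, C)

    C, d = 4, 3
    rng = np.random.default_rng(0)
    w = rng.normal(size=C * d)
    phi_x = rng.normal(size=d)
    print(hinge(w, phi_x, 2, C))   # ≥ 0; the slack ξ_i this sample needs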

Page 43: Some Useful Machine Learning Tools

• Introduction to Supervised Learning

• Probabilistic Methods– Logistic regression– Multiclass logistic regression– Regularized maximum likelihood

• Loss-based Methods– Support vector machine (multiclass)– Structured output support vector machine

Outline – Part I

Page 44: Some Useful Machine Learning Tools

Input: x Output: y {1,2,…,C}mFeatures: Φ(x)

Joint feature vector of input and output: Ψ(x,y)

f(Ψ(x,y)) = wTΨ(x,y)

Prediction: maxy wTΨ(x,y))

Structured Output SVM

Page 45: Some Useful Machine Learning Tools

Structured Output SVM

Training data {(xi,yi), i = 1,2,…,n}

wTΨ(x,y) + Δ(yi,y) - wTΨ(x,yi) ≤ ξi for all y

minw ||w||2 + C Σiξi

Quadratic program with exponential # of constraints

Many polynomial time algorithms

Page 46: Some Useful Machine Learning Tools

Cutting Plane Algorithm

Define working sets W_i = {}

REPEAT:

Update w by solving the following problem:

min_w ||w||² + C Σ_i ξ_i
s.t. w^T Ψ(x_i, y) + Δ(y_i, y) - w^T Ψ(x_i, y_i) ≤ ξ_i for all y ∈ W_i

Compute the most violated constraint for all samples: ŷ_i = argmax_y w^T Ψ(x_i, y) + Δ(y_i, y)

Update the working sets W_i by adding ŷ_i
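A minimal end-to-end sketch of the algorithm (the 0/1 loss Δ and the subgradient routine standing in for the QP solver are assumptions; a real implementation would call a package such as the SVM-multiclass code linked above):

    import numpy as np

    def psi(phi_x, y, C):
        out = np.zeros(C * len(phi_x))
        out[y * len(phi_x):(y + 1) * len(phi_x)] = phi_x
        return out

    def solve_working_set(X, Y, W_sets, C_reg, nc, steps=2000, lr=0.01):
        # Stand-in for the QP solver: subgradient descent on ||w||² + C Σ_i ξ_i,
        # ξ_i = max(0, max_{y∈W_i} wᵀΨ(x_i,y) + Δ(y_i,y) - wᵀΨ(x_i,y_i))
        w = np.zeros(nc * X.shape[1])
        for _ in range(steps):
            g = 2 * w
            for x, yi, Wi in zip(X, Y, W_sets):
                if not Wi:
                    continue
                val, y_bad = max((w @ psi(x, y, nc) + (y != yi), y) for y in Wi)
                if val - w @ psi(x, yi, nc) > 0:       # constraint active
                    g += C_reg * (psi(x, y_bad, nc) - psi(x, yi, nc))
            w -= lr * g
        return w

    def cutting_plane(X, Y, nc, C_reg=1.0, max_iter=50):
        W_sets = [set() for _ in Y]
        w = np.zeros(nc * X.shape[1])
        for _ in range(max_iter):
            added = False
            for i, (x, yi) in enumerate(zip(X, Y)):
                # most violated constraint: ŷ_i = argmax_y wᵀΨ(x_i,y) + Δ(y_i,y)
                y_hat = max(range(nc), key=lambda y: w @ psi(x, y, nc) + (y != yi))
                if y_hat not in W_sets[i]:
                    W_sets[i].add(y_hat)
                    added = True
            if not added:      # no new constraint to add: stop
                break
            w = solve_working_set(X, Y, W_sets, C_reg, nc)
        return w

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 2))
    Y = (X[:, 0] > 0).astype(int)      # toy 2-class problem
    print(cutting_plane(X, Y, nc=2))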

Page 47: Some Useful Machine Learning Tools

Cutting Plane Algorithm

Number of iterations = max{O(n/ε), O(C/ε²)}

Termination criterion: violation of ŷ_i < ξ_i + ε, for all i

Ioannis Tsochantaridis et al., JMLR 2005

At each iteration, convex dual of problem increases.

Convex dual can be upper bounded.

http://svmlight.joachims.org/svm_struct.html

Page 48: Some Useful Machine Learning Tools

Structured Output SVM

Training data {(x_i, y_i), i = 1,2,…,n}

min_w ||w||² + C Σ_i ξ_i

s.t. w^T Ψ(x_i, y) + Δ(y_i, y) - w^T Ψ(x_i, y_i) ≤ ξ_i for all y ∈ {1,2,…,C}^m

Number of constraints = nC^m

Page 49: Some Useful Machine Learning Tools

Structured Output SVM

Training data {(x_i, y_i), i = 1,2,…,n}

min_w ||w||² + C Σ_i ξ_i

s.t. w^T Ψ(x_i, y) + Δ(y_i, y) - w^T Ψ(x_i, y_i) ≤ ξ_i for all y ∈ Y

Page 50: Some Useful Machine Learning Tools

Structured Output SVM

Training data {(x_i, y_i), i = 1,2,…,n}

min_w ||w||² + C Σ_i ξ_i

s.t. w^T Ψ(x_i, z_i) + Δ(y_i, z_i) - w^T Ψ(x_i, y_i) ≤ ξ_i for all z_i ∈ Y

Page 51: Some Useful Machine Learning Tools

Structured Output SVM

Training data {(x_i, y_i), i = 1,2,…,n}

min_w ||w||² + C Σ_i ξ_i

s.t. Σ_i (w^T Ψ(x_i, z_i) + Δ(y_i, z_i) - w^T Ψ(x_i, y_i)) ≤ Σ_i ξ_i for all Z = {z_i, i=1,…,n} ∈ Y^n

Equivalent problem to the structured output SVM.

Number of constraints = C^{mn}

Page 52: Some Useful Machine Learning Tools

1-Slack Structured Output SVM

Training data {(x_i, y_i), i = 1,2,…,n}

min_w ||w||² + C ξ

s.t. Σ_i (w^T Ψ(x_i, z_i) + Δ(y_i, z_i) - w^T Ψ(x_i, y_i)) ≤ ξ for all Z = {z_i, i=1,…,n} ∈ Y^n

Page 53: Some Useful Machine Learning Tools

Cutting Plane Algorithm

Define working set W = {}

REPEAT:

Update w by solving the following problem:

min_w ||w||² + C ξ
s.t. Σ_i (w^T Ψ(x_i, z_i) + Δ(y_i, z_i) - w^T Ψ(x_i, y_i)) ≤ ξ for all Z ∈ W

Compute the most violated constraint for all samples: ẑ_i = argmax_y w^T Ψ(x_i, y) + Δ(y_i, y)

Update the working set W by adding {ẑ_i, i=1,…,n}

REPEAT

Page 54: Some Useful Machine Learning Tools

Cutting Plane Algorithm

Number of iterations = O(C/ε)

Termination criterion: violation of {ẑ_i} < ξ + ε

Thorsten Joachims et al., Machine Learning 2009

At each iteration, convex dual of problem increases.

Convex dual can be upper bounded.

http://svmlight.joachims.org/svm_struct.html

Page 55: Some Useful Machine Learning Tools

• Introduction to Weakly Supervised Learning– Two types of problems

• Probabilistic Methods– Expectation maximization

• Loss-based Methods– Latent support vector machine– Dissimilarity coefficient learning

Outline – Part II

Page 56: Some Useful Machine Learning Tools

Computer Vision Data

[Chart: annotation information vs. log(dataset size). Segmentation: ~2,000]

Page 57: Some Useful Machine Learning Tools

Computer Vision Data

[Chart: annotation information vs. log(dataset size). Segmentation: ~2,000; Bounding Box: ~1 M]

Page 58: Some Useful Machine Learning Tools

Computer Vision Data

[Chart: annotation information vs. log(dataset size). Segmentation: ~2,000; Bounding Box: ~1 M; Image-Level (“Car”, “Chair”): >14 M]

Page 59: Some Useful Machine Learning Tools

Computer Vision Data

[Chart: annotation information vs. log(dataset size). Segmentation: ~2,000; Bounding Box: ~1 M; Image-Level: >14 M; Noisy Label: >6 B]

Page 60: Some Useful Machine Learning Tools

Data

Detailed annotation is expensive. Often, in medical imaging, annotation is impossible. The desired annotation keeps changing.

Hence: learn with missing information (latent variables).

Page 61: Some Useful Machine Learning Tools

• Introduction to Weakly Supervised Learning– Two types of problems

• Probabilistic Methods– Expectation maximization

• Loss-based Methods– Latent support vector machine– Dissimilarity coefficient learning

Outline – Part II

Page 62: Some Useful Machine Learning Tools

Annotation Mismatch

Learn to classify an image.

Image: x Annotation: y = “Deer” Latent variable: h Desired output: y

Mismatch between desired and available annotations. The exact value of the latent variable is not “important”.

Page 63: Some Useful Machine Learning Tools

Annotation Mismatch

Learn to classify a DNA sequence.

Sequence: x Annotation: y ∈ {+1, -1} Latent variables: h Desired output: y

Mismatch between desired and possible annotations. The exact value of the latent variables is not “important”.

Page 64: Some Useful Machine Learning Tools

Output Mismatch

Learn to detect an object in an image.

Image: x Annotation: y = “Deer” Latent variable: h Desired output: (y,h)

Mismatch between output and available annotations. The exact value of the latent variable is important.

Page 65: Some Useful Machine Learning Tools

Output Mismatch

Learn to segment an image.

[Figure: an image and its desired segmentation output]

Page 66: Some Useful Machine Learning Tools

Output Mismatch

Learn to segment an image.

[Figure: available annotation (x, y = “Bird”) vs. desired output (y, h)]

Page 67: Some Useful Machine Learning Tools

Output Mismatch

Learn to segment an image.

[Figure: available annotation (x, y = “Cow”) vs. desired output (y, h)]

Mismatch between output and available annotations

Exact value of latent variable is important

Page 68: Some Useful Machine Learning Tools

• Introduction to Weakly Supervised Learning– Two types of problems

• Probabilistic Methods– Expectation maximization

• Loss-based Methods– Latent support vector machine– Dissimilarity coefficient learning

Outline – Part II

Page 69: Some Useful Machine Learning Tools

Expectation Maximization

Input: x Annotation: y Latent variables: h

Joint feature vector: Ψ(x,y,h)

f(Ψ(x,y,h)) = θ^T Ψ(x,y,h)

Prediction: max_y P(y|x;θ) = max_y Σ_h P(y,h|x;θ)

P(y,h|x;θ) = exp(f(Ψ(x,y,h)))/Z(x;θ)

Partition function Z(x;θ) = Σ_{y,h} exp(f(Ψ(x,y,h)))

Page 70: Some Useful Machine Learning Tools

Expectation Maximization

Input: x Annotation: y Latent variables: h

Joint feature vector: Ψ(x,y,h)

f(Ψ(x,y,h)) = θ^T Ψ(x,y,h)

Prediction: max_{y,h} P(y,h|x;θ)

P(y,h|x;θ) = exp(f(Ψ(x,y,h)))/Z(x;θ)

Partition function Z(x;θ) = Σ_{y,h} exp(f(Ψ(x,y,h)))

Page 71: Some Useful Machine Learning Tools

Expectation Maximization

Training data {(x_i, y_i), i = 1,2,…,n}

min_θ Σ_i -log P(y_i|x_i;θ) + λ ||θ||² (annotation mismatch)

-log P(y|x;θ) = -E_{P(h|y,x;θ')}[log P(y,h|x;θ)] + E_{P(h|y,x;θ')}[log P(h|y,x;θ)]

The second term is maximized at θ = θ' (proof left as an exercise).

Page 72: Some Useful Machine Learning Tools

Expectation Maximization

Training data {(x_i, y_i), i = 1,2,…,n}

min_θ Σ_i -log P(y_i|x_i;θ) + λ ||θ||² (annotation mismatch)

min_θ -log P(y|x;θ) = min_θ { -E_{P(h|y,x;θ')}[log P(y,h|x;θ)] + E_{P(h|y,x;θ')}[log P(h|y,x;θ)] }

The second term is maximized at θ = θ', so it is bounded by a constant that does not depend on the minimization.

Page 73: Some Useful Machine Learning Tools

Expectation Maximization

Training data {(x_i, y_i), i = 1,2,…,n}

min_θ Σ_i -log P(y_i|x_i;θ) + λ ||θ||² (annotation mismatch)

min_θ -log P(y|x;θ) → min_θ -E_{P(h|y,x;θ')}[log P(y,h|x;θ)]

Dropping the second term (maximized at θ = θ') leaves the surrogate objective that is minimized at each iteration.

Page 74: Some Useful Machine Learning Tools

Expectation Maximization

Start with an initial estimate θ^0. Repeat until convergence:

E-step: compute P(h|y_i,x_i;θ^t)

M-step: obtain θ^{t+1} by solving

min_θ Σ_i -E_{P(h|y_i,x_i;θ^t)}[log P(y_i,h|x_i;θ)] + λ ||θ||²
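A minimal sketch of these updates for a small discrete problem (the log-linear model over a random feature table Ψ, the gradient-based M-step, and all sizes are assumptions for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    nx, ny, nh, d, lam = 5, 3, 4, 6, 0.01
    psi = rng.normal(size=(nx, ny, nh, d))   # toy feature table Ψ(x,y,h)
    X = rng.integers(0, nx, size=40)         # observed inputs
    Y = rng.integers(0, ny, size=40)         # observed annotations; h unobserved

    def joint(theta, x):
        # P(y,h|x;θ) over the (y,h) grid
        s = psi[x] @ theta
        p = np.exp(s - s.max())
        return p / p.sum()

    theta = np.zeros(d)
    for t in range(50):
        # E-step: q_i(h) = P(h|y_i,x_i;θᵗ)
        qs = []
        for x, y in zip(X, Y):
            p = joint(theta, x)
            qs.append(p[y] / p[y].sum())
        # M-step: gradient steps on Σ_i -E_q[log P(y_i,h|x_i;θ)] + λ||θ||²
        for _ in range(100):
            g = 2 * lam * theta
            for x, y, q in zip(X, Y, qs):
                p = joint(theta, x)
                # expected features under the model minus under q
                g += np.einsum('yh,yhd->d', p, psi[x]) - q @ psi[x, y]
            theta -= 0.01 * g
    print(theta)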

Page 75: Some Useful Machine Learning Tools

• Introduction to Weakly Supervised Learning– Two types of problems

• Probabilistic Methods– Expectation maximization

• Loss-based Methods– Latent support vector machine– Dissimilarity coefficient learning

Outline – Part II

Page 76: Some Useful Machine Learning Tools

Latent SVM

Input: x Output: y ∈ Y Hidden variable: h ∈ H

Annotation: y = “Deer”

Y = {“Bison”, “Deer”, “Elephant”, “Giraffe”, “Llama”, “Rhino”}

Page 77: Some Useful Machine Learning Tools

Latent SVM

Features: Ψ(x,y,h) (e.g. HOG, BoW) Parameters: w

Prediction: (y(w), h(w)) = argmax_{y∈Y, h∈H} w^T Ψ(x,y,h)

Page 78: Some Useful Machine Learning Tools

Latent SVM

Training samples: x_i Ground-truth labels: y_i

Loss function: Δ(y_i, y_i(w))

Annotation Mismatch

Page 79: Some Useful Machine Learning Tools

(y(w), h(w)) = argmax_{y∈Y, h∈H} w^T Ψ(x,y,h)

Δ(y_i, y_i(w)) = w^T Ψ(x_i, y_i(w), h_i(w)) + Δ(y_i, y_i(w)) - w^T Ψ(x_i, y_i(w), h_i(w))

“Very” non-convex

Page 80: Some Useful Machine Learning Tools

(y(w), h(w)) = argmax_{y∈Y, h∈H} w^T Ψ(x,y,h)

Upper bound:

Δ(y_i, y_i(w)) ≤ w^T Ψ(x_i, y_i(w), h_i(w)) + Δ(y_i, y_i(w)) - max_{h_i} w^T Ψ(x_i, y_i, h_i)

Page 81: Some Useful Machine Learning Tools

(y(w), h(w)) = argmax_{y∈Y, h∈H} w^T Ψ(x,y,h)

Upper bound:

Δ(y_i, y_i(w)) ≤ max_{y,h} { w^T Ψ(x_i, y, h) + Δ(y_i, y) } - max_{h_i} w^T Ψ(x_i, y_i, h_i)

Page 82: Some Useful Machine Learning Tools

(y(w), h(w)) = argmax_{y∈Y, h∈H} w^T Ψ(x,y,h)

min_w ||w||² + C Σ_i ξ_i

s.t. max_{h_i} w^T Ψ(x_i, y_i, h_i) - w^T Ψ(x_i, y, h) ≥ Δ(y_i, y) - ξ_i

So is this convex?

Page 83: Some Useful Machine Learning Tools

(y(w), h(w)) = argmax_{y∈Y, h∈H} w^T Ψ(x,y,h)

max_{y,h} { w^T Ψ(x_i, y, h) + Δ(y_i, y) } : convex

max_{h_i} w^T Ψ(x_i, y_i, h_i) : convex

Their difference is a difference-of-convex function!!

Page 84: Some Useful Machine Learning Tools

Concave-Convex Procedure

[Figure: objective written as a convex function plus a concave function; linear upper bound of the concave part]


Page 86: Some Useful Machine Learning Tools

[Figure: minimize the upper bound, re-linearize, and repeat until convergence]

Concave-Convex Procedure

Page 87: Some Useful Machine Learning Tools

(y(w), h(w)) = argmax_{y∈Y, h∈H} w^T Ψ(x,y,h)

max_{y,h} { w^T Ψ(x_i, y, h) + Δ(y_i, y) } - max_{h_i} w^T Ψ(x_i, y_i, h_i)

Linear upper bound at w_t: replace the concave part by -w^T Ψ(x_i, y_i, h_i*), where h_i* = argmax_{h_i} w_t^T Ψ(x_i, y_i, h_i)

Page 88: Some Useful Machine Learning Tools

(y(w), h(w)) = argmax_{y∈Y, h∈H} w^T Ψ(x,y,h)

min_w ||w||² + C Σ_i ξ_i

s.t. max_{h_i} w^T Ψ(x_i, y_i, h_i) - w^T Ψ(x_i, y, h) ≥ Δ(y_i, y) - ξ_i

Solve using CCCP.

Page 89: Some Useful Machine Learning Tools

CCCP for Latent SVM

Start with an initial estimate w_0. REPEAT:

Update h_i* = argmax_{h_i ∈ H} w_t^T Ψ(x_i, y_i, h_i)

Update w_{t+1} by solving the convex problem

min_w ||w||² + C Σ_i ξ_i
s.t. w^T Ψ(x_i, y_i, h_i*) - w^T Ψ(x_i, y, h) ≥ Δ(y_i, y) - ξ_i

http://webdocs.cs.ualberta.ca/~chunnam/
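A minimal sketch of this loop (the tiny discrete H, the made-up feature map, the 0/1 loss, and the subgradient stand-in for the convex solver are all assumptions for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    ny, nh, d = 3, 4, 8
    table = rng.normal(size=(ny, nh, d))   # toy weights defining Ψ
    def feat(x, y, h):                     # hypothetical Ψ(x,y,h)
        return x * table[y, h]

    X = rng.normal(size=(30, d))
    Y = rng.integers(0, ny, size=30)

    def convex_solve(X, Y, H_star, C=1.0, steps=1500, lr=0.01):
        # Stand-in for the convex problem above: subgradient descent on
        # ||w||² + C Σ_i [max_{y,h}(wᵀΨ(x_i,y,h)+Δ(y_i,y)) - wᵀΨ(x_i,y_i,h_i*)]
        w = np.zeros(d)
        for _ in range(steps):
            g = 2 * w
            for x, yi, hi in zip(X, Y, H_star):
                vals = [(w @ feat(x, y, h) + (y != yi), y, h)
                        for y in range(ny) for h in range(nh)]
                _, yb, hb = max(vals)
                g += C * (feat(x, yb, hb) - feat(x, yi, hi))
            w -= lr * g
        return w

    w = np.zeros(d)
    for t in range(10):                    # CCCP outer loop
        # impute latent variables: h_i* = argmax_h wᵀΨ(x_i,y_i,h)
        H_star = [max(range(nh), key=lambda h: w @ feat(x, yi, h))
                  for x, yi in zip(X, Y)]
        w = convex_solve(X, Y, H_star)
    print(w)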

Page 90: Some Useful Machine Learning Tools

CCCP for Human Learning

1 + 1 = 2

1/3 + 1/6 = 1/2

e^{iπ} + 1 = 0

Math is for losers !!

FAILURE … BAD LOCAL MINIMUM

Page 91: Some Useful Machine Learning Tools

Self-Paced Learning

Euler was a genius!!

SUCCESS … GOOD LOCAL MINIMUM

1 + 1 = 2

1/3 + 1/6 = 1/2

e^{iπ} + 1 = 0

Page 92: Some Useful Machine Learning Tools

Self-Paced Learning

Start with “easy” examples, then consider “hard” ones.

Deciding what is easy vs. hard is expensive, and easy for a human ≠ easy for a machine.

So simultaneously estimate the easiness and the parameters. Easiness is a property of data sets, not of single instances.

Page 93: Some Useful Machine Learning Tools

CCCP for Latent SVM

Start with an initial estimate w_0. REPEAT:

Update h_i* = argmax_{h_i ∈ H} w_t^T Ψ(x_i, y_i, h_i)

Update w_{t+1} by solving the convex problem

min_w ||w||² + C Σ_i ξ_i
s.t. w^T Ψ(x_i, y_i, h_i*) - w^T Ψ(x_i, y, h) ≥ Δ(y_i, y) - ξ_i

Page 94: Some Useful Machine Learning Tools

Self-Paced Learning

min_w ||w||² + C Σ_i ξ_i

s.t. w^T Ψ(x_i, y_i, h_i*) - w^T Ψ(x_i, y, h) ≥ Δ(y_i, y, h) - ξ_i

Page 95: Some Useful Machine Learning Tools

Self-Paced Learning

min_w ||w||² + C Σ_i v_i ξ_i

s.t. w^T Ψ(x_i, y_i, h_i*) - w^T Ψ(x_i, y, h) ≥ Δ(y_i, y, h) - ξ_i, with v_i ∈ {0,1}

Trivial solution: v_i = 0 for all i.

Page 96: Some Useful Machine Learning Tools

Self-Paced Learning

min_w ||w||² + C Σ_i v_i ξ_i - Σ_i v_i/K

s.t. w^T Ψ(x_i, y_i, h_i*) - w^T Ψ(x_i, y, h) ≥ Δ(y_i, y, h) - ξ_i, with v_i ∈ {0,1}

[Figure: samples selected for large, medium, and small K]

Page 97: Some Useful Machine Learning Tools

Self-Paced Learning

min_w ||w||² + C Σ_i v_i ξ_i - Σ_i v_i/K

s.t. w^T Ψ(x_i, y_i, h_i*) - w^T Ψ(x_i, y, h) ≥ Δ(y_i, y, h) - ξ_i, with v_i ∈ [0,1]

[Figure: samples selected for large, medium, and small K]

Biconvex problem: solve by alternating convex search.

Page 98: Some Useful Machine Learning Tools

SPL for Latent SVM

Start with an initial estimate w_0. REPEAT:

Update h_i* = argmax_{h_i ∈ H} w_t^T Ψ(x_i, y_i, h_i)

Update w_{t+1} by solving the convex problem

min_w ||w||² + C Σ_i v_i ξ_i - Σ_i v_i/K
s.t. w^T Ψ(x_i, y_i, h_i*) - w^T Ψ(x_i, y, h) ≥ Δ(y_i, y) - ξ_i

Decrease K ← K/μ for some annealing factor μ > 1.

http://cvc.centrale-ponts.fr/personnel/pawan/
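A minimal sketch of the v-step and the annealing (given w, the optimal v has a closed form; the helper names and constants below are assumptions for illustration, not the reference implementation linked above):

    import numpy as np

    def spl_weights(slacks, K, C=1.0):
        # v-step of alternating convex search: with w fixed,
        # min_v Σ_i (C v_i ξ_i - v_i/K) over v_i ∈ [0,1] selects
        # v_i = 1 when C ξ_i < 1/K ("easy" sample), else v_i = 0
        return (C * slacks < 1.0 / K).astype(float)

    # Outer loop, with hypothetical helpers impute_h (the h_i* update) and
    # weighted_solve (the v-weighted convex problem from the slide):
    #
    #   w, K = w0, K0
    #   while K > K_min:
    #       H_star = impute_h(w)                  # h_i* updates
    #       for _ in range(5):                    # alternate v- and w-steps
    #           v = spl_weights(slacks(w, H_star), K)
    #           w = weighted_solve(H_star, v)
    #       K /= 1.3                              # anneal K ← K/μ, μ > 1

    slacks = np.array([0.1, 0.5, 2.0, 8.0])
    for K in (10.0, 1.0, 0.1):
        print(K, spl_weights(slacks, K))  # more samples selected as K shrinks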

Page 99: Some Useful Machine Learning Tools

• Introduction to Weakly Supervised Learning– Two types of problems

• Probabilistic Methods– Expectation maximization

• Loss-based Methods– Latent support vector machine– Dissimilarity coefficient learning (if time permits)

Outline – Part II