
Page 1:

Optimizing Average Precision using Weakly Supervised Data

Aseem Behl, IIIT Hyderabad

Under supervision of:

Dr. M. Pawan Kumar (INRIA Paris), Prof. C.V. Jawahar (IIIT Hyderabad)

Page 2:

Action Classification

Input x; Output y = "Using Computer"; Latent variable h.

Action classes: Jumping, Phoning, Playing Instrument, Reading, Riding Bike, Riding Horse, Running, Taking Photo, Using Computer, Walking.

Training data: inputs x_i with outputs y_i.

Aim: to estimate accurate model parameters by optimizing average precision with weakly supervised data.

Page 3:

Outline

• Preliminaries
• Previous Work
• Our Framework
• Results
• Conclusion

Page 4:

Binary Classification

• Several problems in computer vision can be formulated as binary classification tasks.

• Running example: action classification, i.e., automatically determining whether an image contains a person performing an action of interest (such as 'jumping' or 'walking').

• The binary classifier most widely employed in computer vision is the support vector machine (SVM).

Page 5:

Conventional SVMs

• Input examples x_i (vectors)
• Output labels y_i (either +1 or -1)
• The SVM learns a hyperplane w
• Predictions are sign(w^T Φ(x_i))
• Training involves solving:

min_w ½||w||² + C Σ_i ξ_i
s.t. ∀i: y_i (w^T Φ(x_i)) ≥ 1 - ξ_i

• The sum of slacks Σ_i ξ_i upper-bounds the 0/1 loss.
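The objective above is easy to evaluate directly. Below is a minimal numpy sketch (illustrative, not from the talk) that computes the primal objective with hinge slacks and makes predictions; the function names are our own:

```python
import numpy as np

def svm_objective(w, X, y, C):
    """Primal SVM objective: 0.5*||w||^2 + C * sum of slacks.

    X: (n, d) feature matrix with rows Phi(x_i); y: (n,) labels in {+1, -1}.
    The slack of example i is max(0, 1 - y_i * w^T Phi(x_i)),
    which upper-bounds its 0/1 loss.
    """
    margins = y * (X @ w)
    slacks = np.maximum(0.0, 1.0 - margins)
    return 0.5 * w @ w + C * slacks.sum()

def svm_predict(w, X):
    """Predict labels as sign(w^T Phi(x))."""
    return np.sign(X @ w)
```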

Page 6:

Structural SVM (SSVM)

• Generalization of the SVM to structured output spaces.

Learning:

min_w ½||w||² + C Σ_i ξ_i
s.t. ∀ŷ_i: w^T Ψ(x_i, y_i) - w^T Ψ(x_i, ŷ_i) ≥ Δ(y_i, ŷ_i) - ξ_i

• The joint score of the correct label must be at least as large as that of any incorrect label plus the loss.
• The number of constraints is |dom(y)|, so it can be intractably large.
• At least one constraint exists where the inequality is tight: the "most violated constraint".

Prediction:

y_pred = argmax_y w^T Ψ(x, y)

• Maximize the score over all possible outputs.

Page 7:

Structural SVM Learning

Original SVM problem:
• Exponentially many constraints.
• Most are dominated by a small set of "important" constraints.

Structural SVM approach:
• Repeatedly find the next most violated constraint...
• ...until the set of constraints is a good approximation.

Slide taken from Yue et al. (2007)


Page 11:

Structural SVM Learning

1. Solve the SVM objective function using only the current working set of constraints.

2. Using the model learned in step 1, find the most violated constraint from the exponential set of constraints.

3. If the constraint returned in step 2 is more violated than the most violated constraint in the working set by some small constant, add it to the working set.

Repeat steps 1-3 until no additional constraints are added, and return the most recent model trained in step 1.

Steps 1-3 are guaranteed to loop for at most a polynomial number of iterations.

[Tsochantaridis et al. 2005]
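A sketch of this working-set loop, assuming caller-supplied subroutines solve_qp (step 1) and find_most_violated (step 2); both names and signatures are hypothetical:

```python
def cutting_plane_training(solve_qp, find_most_violated, eps=1e-3, max_iter=1000):
    """Working-set (cutting-plane) training loop.

    solve_qp(working_set) -> (w, xi): optimizes the SVM objective over
    the current constraints (step 1).
    find_most_violated(w) -> (constraint, violation): loss-augmented
    inference over the full, exponentially large set (step 2).
    """
    working_set = []
    w, xi = solve_qp(working_set)
    for _ in range(max_iter):
        constraint, violation = find_most_violated(w)
        # Step 3: stop once the new constraint is violated by at most
        # eps more than the current slack already allows.
        if violation <= xi + eps:
            break
        working_set.append(constraint)
        w, xi = solve_qp(working_set)
    return w
```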

Page 12:

Weak Supervision

• Supervised learning involves the onerous task of collecting detailed annotations for each training sample.

• This becomes financially infeasible as datasets grow.

• Weak supervision: the additional annotations h_i are unknown.

• This results in a more complex machine learning problem.

Page 13:

Weak Supervision – Challenges

• Find the best additional annotation for the positive examples, e.g., identify the bounding box of the 'jumping' person in the positive images.

• Consider all possible values of the annotations for the negative samples as negative examples, i.e., ensure that the scores of 'jumping' person bounding boxes are higher than the scores of all possible bounding boxes in the negative images.

Page 14:

Latent SVM (LSVM)

• Extends the SSVM to incorporate hidden information.
• This information is treated as part of the label, but is not observed during training.

min_w ½||w||² + C Σ_i ξ_i
s.t. ∀ŷ_i, ĥ_i: max_{h_i} w^T Ψ(x_i, y_i, h_i) - w^T Ψ(x_i, ŷ_i, ĥ_i) ≥ Δ(y_i, ŷ_i) - ξ_i

• The objective is non-convex, but can be written as a difference of convex functions.
• The CCCP algorithm converges to a local minimum.

Page 15:

Concave-Convex Procedure (CCCP)

1. Repeat until convergence:

2. Approximate the concave portion of the objective by imputing the hidden variables.

3. Update the parameters using the imputed hidden variables, i.e., solve the resulting convex SSVM problem.

Steps 2-3 are guaranteed to converge to a local minimum in a polynomial number of iterations.

[Yuille and Rangarajan, 2003]
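The loop is simple to state in code. A minimal sketch, assuming hypothetical subroutines impute_latent (step 2) and solve_ssvm (step 3, e.g. the cutting-plane solver sketched earlier):

```python
def cccp(impute_latent, solve_ssvm, w_init, max_iter=50, tol=1e-4):
    """CCCP for latent-variable training.

    impute_latent(w) -> H: fills in the hidden variables given the
    current w (this linearizes the concave part of the objective).
    solve_ssvm(H) -> (w, objective): solves the convex SSVM obtained
    once the hidden variables are fixed.
    """
    w, prev_obj = w_init, float("inf")
    for _ in range(max_iter):
        H = impute_latent(w)      # step 2: impute hidden variables
        w, obj = solve_ssvm(H)    # step 3: convex parameter update
        if prev_obj - obj < tol:  # objective has stopped decreasing
            break
        prev_obj = obj
    return w
```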

Page 16:

Average Precision (AP)

(Example from the slide: two rankings with the same 0/1 loss of 0.40 can have different AP-losses, e.g. 0.24 vs. 0.36.)

• AP is the most commonly used accuracy measure for binary classification.
• AP is the average of the precision scores at the rank locations of the positive samples.
• AP-loss depends on the ranking of the samples; 0/1 loss depends only on the number of incorrectly classified samples.
• A machine learning algorithm optimizing the 0/1 loss might therefore learn a very different model than one optimizing AP.
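For concreteness, here is a small numpy function (ours, not from the talk) that computes AP as the mean of precision-at-rank over the positives, and hence the AP-loss:

```python
import numpy as np

def average_precision(labels, scores):
    """AP = mean of precision@k taken at the rank of each positive.

    labels: (n,) array in {+1, -1}; scores: (n,) classifier scores.
    """
    order = np.argsort(-scores)            # sort by descending score
    is_pos = labels[order] == 1
    hits = np.cumsum(is_pos)               # positives seen up to rank k
    precision = hits / np.arange(1, len(labels) + 1)
    return precision[is_pos].mean()

labels = np.array([1, 1, -1, -1, 1])
scores = np.array([0.9, 0.2, 0.6, 0.1, 0.5])
print(1.0 - average_precision(labels, scores))  # AP-loss ≈ 0.194
```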

Page 17:

Notation

• X: input {x_i, i = 1, ..., n}

• Y: ranking matrix, such that Y_ij = 1 if x_i is ranked higher than x_j, 0 if x_i and x_j are ranked the same, and -1 if x_i is ranked lower than x_j.

• H_P: additional annotations for the positives {h_i, i ∈ P}
• H_N: additional annotations for the negatives {h_j, j ∈ N}

• Δ(Y, Y*): AP-loss = 1 - AP(Y, Y*), where AP(Y, Y*) is the AP of ranking Y with respect to the ground truth Y*.

• Joint feature vector: Ψ(X, Y, {H_P, H_N}) = (1/(|P|·|N|)) Σ_{i∈P} Σ_{j∈N} Y_ij (Φ(h_i) - Φ(h_j))
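The joint feature vector translates directly into code. A sketch under the notation above (the array shapes are our assumption):

```python
import numpy as np

def joint_feature_vector(phi, P, N, Y):
    """Psi(X, Y, {H_P, H_N}) per the definition above.

    phi: (n, d) array whose row k is Phi(h_k) for sample k.
    P, N: lists of positive and negative sample indices.
    Y: (n, n) ranking matrix with entries in {+1, 0, -1}.
    """
    psi = np.zeros(phi.shape[1])
    for i in P:
        for j in N:
            psi += Y[i, j] * (phi[i] - phi[j])
    return psi / (len(P) * len(N))
```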

Page 18:

AP-SVM

• AP-SVM optimizes the correct AP-loss function as opposed to the 0/1 loss.

Learning:

min_w ½||w||² + Cξ
s.t. ∀Y: w^T Ψ(X, Y*, H) - w^T Ψ(X, Y, H) ≥ Δ(Y, Y*) - ξ

• A constraint is defined for each incorrect labeling Y: the joint discriminant score of the correct labeling must be at least as large as that of the incorrect labeling plus the loss.

Prediction:

Y_opt = argmax_Y w^T Ψ(X, Y, H)

• After learning w, a prediction is made by sorting the samples (x_k, h_k) in descending order of w^T Φ(h_k).

Page 19:

AP-SVM – Exponential Constraints

• For average precision, the correct labeling is a ranking in which the positive examples are all ranked at the front.

• An incorrect labeling is any other ranking.

• Exponential number of incorrect rankings.

• Thus an exponential number of constraints.

Page 20:

Finding Most Violated Constraint

• Structural SVM learning requires a subroutine to find the most violated constraint.

• This subroutine depends on the formulation of the loss function and the joint feature representation.

• Yue et al. devised an efficient algorithm for the case of optimizing AP.

Page 21:

Finding Most Violated Constraint

• AP is invariant to the order of examples within the positives and within the negatives.

• The joint SVM score is optimized by sorting examples in descending order of their individual scores.

• The problem therefore reduces to finding an interleaving of the two sorted lists of examples.


Page 26:

Finding Most Violated Constraint

• Start with the perfect ranking.
• Consider swapping adjacent positive/negative examples.
• Find the best feasible rank for the negative example.
• Repeat for the next negative example; we never want to swap past previous negative examples.
• Repeat until all negative examples have been considered.

Slide taken from Yue et al. (2007)
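As a sketch of how this greedy search can be implemented: each negative, taken in descending score order, is walked upward past the sorted positives while accumulating the change in AP-loss plus the change in the pairwise score term of Ψ; the best prefix of swaps gives its position. This is our own illustrative reconstruction of the idea in Yue et al. (2007), not their reference code:

```python
import numpy as np

def most_violated_interleaving(pos_scores, neg_scores):
    """For each negative (highest score first), return how many of the
    sorted positives it is ranked above in the most violated ranking,
    i.e. the ranking maximizing Delta(Y, Y*) + w^T Psi(X, Y, H).
    """
    sp = np.sort(pos_scores)[::-1]           # positive scores, descending
    sn = np.sort(neg_scores)[::-1]           # negative scores, descending
    P, N = len(sp), len(sn)
    above = np.zeros(N, dtype=int)           # positives each negative jumps

    for j in range(1, N + 1):                # j-th highest negative
        gain, best_gain, best = 0.0, 0.0, 0
        for i in range(P, 0, -1):            # walk up past positive i
            # AP-loss increase: positive i now has j (not j-1) negatives above.
            gain += (i / (i + j - 1) - i / (i + j)) / P
            # Score change: the pair (i, j) flips sign in Psi.
            gain += 2.0 * (sn[j - 1] - sp[i - 1]) / (P * N)
            if gain > best_gain:
                best_gain, best = gain, P - i + 1
        above[j - 1] = best
    return above
```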

Page 27:

Hypothesis

Optimizing the correct loss function is important for weakly supervised learning.

Page 28:

Latent Structural SVM (LSSVM)

• Introduces a margin between the maximum score of the ground-truth output and the score of every other pair of output and additional annotation.

• Compares scores between two different sets of annotations.

Learning:

min_w ½||w||² + Cξ
s.t. ∀Y, H: max_Ĥ {w^T Ψ(X, Y*, Ĥ)} - w^T Ψ(X, Y, H) ≥ Δ(Y, Y*) - ξ

Page 29:

Latent Structural SVM (LSSVM)

Prediction:

(Y_opt, H_opt) = argmax_{Y,H} w^T Ψ(X, Y, H)

Page 30:

Latent Structural SVM (LSSVM)

Disadvantages:

• Prediction: LSSVM uses an unintuitive prediction rule.

• Learning: LSSVM optimizes a loose upper bound on the AP-loss.

• Optimization: exact loss-augmented inference is computationally inefficient.

Page 31:

Latent AP-SVM – Prediction

Step 1: Find the best h_i for each sample:

H_opt = argmax_H w^T Ψ(X, Y, H)

Step 2: Sort the samples according to their best scores:

Y_opt = argmax_Y w^T Ψ(X, Y, H_opt)
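In code, the two-step rule is a per-sample maximization followed by a sort. A minimal sketch, assuming each sample comes with a matrix of candidate latent features (the names and shapes are ours):

```python
import numpy as np

def latent_ap_svm_predict(w, phi_candidates):
    """Two-step latent AP-SVM prediction.

    phi_candidates: list of (num_candidates_k, d) arrays; row h of the
    k-th array is Phi(x_k, h) for latent value h of sample k.
    """
    # Step 1: best latent value per sample.
    best_scores = np.array([(phi @ w).max() for phi in phi_candidates])
    # Step 2: rank samples by their best scores, descending.
    ranking = np.argsort(-best_scores)
    return ranking, best_scores
```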

Page 32:

Latent AP-SVM – Learning

• Finds the best assignment of values H_P such that the score of the correct ranking is higher than the score of an incorrect ranking, regardless of the choice of H_N.

• Compares scores between the same sets of additional annotations.

min_w ½||w||² + Cξ
s.t. ∀Y, H_N: max_{H_P} {w^T Ψ(X, Y*, {H_P, H_N}) - w^T Ψ(X, Y, {H_P, H_N})} ≥ Δ(Y, Y*) - ξ

Page 33:

Latent AP-SVM – Learning

• The constraints of latent AP-SVM are a subset of the LSSVM constraints.

• The optimal solution of latent AP-SVM therefore has a lower objective value than the LSSVM solution.

• Latent AP-SVM still provides a valid upper bound on the AP-loss.

⇒ Latent AP-SVM provides a tighter upper bound on the AP-loss.

Page 34:

Latent AP-SVM – Optimization

1. Initialize the parameters w_0.

2. Repeat until convergence:

3. Impute the additional annotations for the positives: choose each annotation in H_P independently; complexity O(n_P·|H|).

4. Update the parameters using the cutting-plane algorithm: the most violated constraint is found by maximizing over H_N and Y independently; complexity O(n_P·n_N).

Page 35:

Action Classification

Input x; Output y = "Using Computer"; Latent variable h.

• PASCAL VOC 2011 action classification: 4846 images, 10 action classes; 2424 trainval and 2422 test images.

• Features: 2400 activation scores of action-specific poselets and 4 object activation scores.

Action classes: Jumping, Phoning, Playing Instrument, Reading, Riding Bike, Riding Horse, Running, Taking Photo, Using Computer, Walking.

Training data: inputs x_i with outputs y_i.

Page 36:

Action Classification

• 5-fold cross-validation on the 'trainval' set.

• Statistically significant increase in performance: 6/10 classes over LSVM, 7/10 classes over LSSVM.

• Overall improvement: 5% over LSVM, 4% over LSSVM.

Page 37:

Action Classification

(Plot: the x-axis is the amount of supervision provided; the y-axis is the mean average precision.)

• As the amount of supervision decreases, the performance gap between latent AP-SVM and the baseline methods increases.

Page 38:

Action Classification

• Performance on the test set of PASCAL VOC 2011.

• Increase in performance: all classes over LSVM, 8/10 classes over LSSVM.

• Overall improvement: 5.1% over LSVM, 3.7% over LSSVM.

Page 39:

Object Detection

• PASCAL VOC 2007 object detection dataset: 9963 images over 20 object categories; 5011 trainval and 4952 test images.

• Features: the 4096-dimensional activation vector of the penultimate layer of a trained Convolutional Neural Network (CNN).

Input x; Output y = "Aeroplane"; Latent variable h (an average of 2000 candidate windows per image, obtained using the selective-search algorithm).

Page 40:

Object Detection

• 5-fold cross-validation on the 'trainval' set.

• Statistically significant increase in performance for 15/20 classes over LSVM.

• The superior performance is partially attributed to the better localization of objects by LAP-SVM during training.

Page 41:

Object Detection

• Performance on the test set of PASCAL VOC 2007.

• Increase in performance for 19/20 classes over LSVM.

• Overall improvement of 7% over LSVM.

• We also obtain improved results on the IIIT 5K-WORD dataset.

Page 42:

Conclusion

• Proposed a novel formulation that obtains an accurate ranking by minimizing a carefully designed upper bound on the AP-loss.

• Showed the theoretical benefits of our method.

• Demonstrated the advantage of our approach on challenging machine learning problems.

Page 43:

Thank you