
Page 1:

Optimizing Learning with SVM Constraint for Content-based Image Retrieval*

Steven C.H. Hoi

1st March, 2004

*Note: The copyright of the presentation material is held by the authors.

Page 2:

Outline

Introduction
Related Work
Optimizing Learning with SVM constraint
Experimental Results
Discussion and Future Work
Conclusions

Page 3:

Introduction

In CBIR, there exists a gap between the high-level semantics and the low-level features calculated by computers.

To learn the associations between human perception and the low-level features, relevance feedback was proposed as a natural way to solve this task.

In a CBIR system, users are asked to provide relevance judgements on the query results. Based on the user's feedback, the CBIR system refines the retrieval performance round by round.

Difficulties in relevance feedback: high-dimensional feature space, small training samples.

Page 4:

Related Work

Major techniques for Relevance Feedback

Query-Point Movement: Rocchio's formula
• Moving the ideal query point toward the positive examples and away from the negative examples
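Rocchio's update can be sketched in a few lines; the mixing weights `alpha`, `beta`, and `gamma` are illustrative defaults, not values from the talk:

```python
def rocchio_update(query, positives, negatives,
                   alpha=1.0, beta=0.75, gamma=0.25):
    """Move the query toward the positive examples' centroid and away
    from the negatives' centroid (query-point movement)."""
    def centroid(samples):
        if not samples:
            return [0.0] * len(query)
        return [sum(col) / len(samples) for col in zip(*samples)]
    pos, neg = centroid(positives), centroid(negatives)
    return [alpha * q + beta * p - gamma * n
            for q, p, n in zip(query, pos, neg)]
```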

Re-weighting: MARS
• Axis re-weighting: the inverse of the standard deviation of a feature, say the j-th feature, is used as the weight for the corresponding axis.
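The MARS re-weighting rule is a one-liner per axis; a sketch (the zero-variance guard is our addition):

```python
import statistics

def mars_axis_weights(positive_samples):
    """Weight each feature axis by the inverse standard deviation of
    that feature over the positive examples: axes on which the positives
    cluster tightly (small sigma) are weighted up."""
    weights = []
    for axis_values in zip(*positive_samples):
        sigma = statistics.pstdev(axis_values)
        weights.append(1.0 / sigma if sigma > 0 else 1.0)  # guard degenerate axes
    return weights
```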

Page 5:

Parameter optimization methods: MindReader, Rui'00
• Minimizing the total distance of the positive examples to the ideal query point
• Based on the Generalized Ellipsoid Distance

Kernel-based classification techniques: SVMs, Boosting, etc.

Page 6:

SVM

Advantages
• Sound theoretical background
• Minimizes structural risk rather than empirical risk
• Excellent classification performance

Basic Theory

Learning the boundary with SVM

Page 7:

Considering the soft margin, the SVM has the form below:
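The formula on this slide did not survive the transcript; the standard soft-margin primal it refers to is:

```latex
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \ \frac{1}{2}\lVert\mathbf{w}\rVert^{2}
+ C\sum_{n=1}^{N}\xi_{n}
\qquad \text{s.t.}\quad
y_{n}\bigl(\mathbf{w}^{\top}\phi(\mathbf{x}_{n}) + b\bigr) \ge 1 - \xi_{n},
\quad \xi_{n}\ge 0 .
```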

The optimization problem can be solved by introducing Lagrange multipliers.

The derived decision function can be described as follows.

The distance function with which the SVM measures similarity for image retrieval is typically given as:
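The two formulas referenced here are also lost in the transcript; in standard SVM notation, the kernelized decision function and the usual distance-from-boundary ranking score are:

```latex
f(\mathbf{x}) = \sum_{n=1}^{N} \alpha_{n}\, y_{n}\, K(\mathbf{x}_{n}, \mathbf{x}) + b,
\qquad
d_{\mathrm{SVM}}(\mathbf{x}) = -f(\mathbf{x}),
```

so that images with a larger positive margin f(x) rank closer to the query.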

Page 8:

Limitations of SVMs

Inaccurate boundary with few samples
In the initial rounds of relevance feedback, the number of training samples is small, so SVMs cannot accurately capture the data distribution.

Ranking problem
SVM was originally designed for classification. Simply taking the distance from the SVM boundary as the distance function may not be effective enough to describe the data.

Page 9:

Solutions

A heuristic approach
Guo et al. [4] suggested a simple approach to embed the Euclidean distance in SVM learning. In their scheme, samples inside the SVM boundary are measured by the Euclidean distance, while samples outside the boundary are evaluated by their distance from the SVM boundary.

However, this heuristic approach based on the plain Euclidean distance is not flexible or powerful enough to describe the data distribution. Moreover, it lacks a systematic mathematical formulation.
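A minimal sketch of the heuristic. The `BIG` offset is our device for keeping every outside sample ranked behind every inside one, and `decision_value` stands for the SVM output f(x), positive inside the relevant region:

```python
import math

BIG = 1e9  # larger than any plausible Euclidean distance in feature space

def heuristic_distance(x, query, decision_value):
    """Guo et al.'s scheme, sketched: samples the SVM places inside the
    relevant region (decision_value > 0) are ranked by Euclidean distance
    to the query; samples outside are ranked behind all of them, ordered
    by how far beyond the boundary they fall."""
    if decision_value > 0:
        return math.dist(x, query)
    return BIG - decision_value  # -decision_value grows with distance outside
```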

Optimizing Learning approach
To best learn the similarity, we suggest a novel systematic scheme: optimal learning with an SVM constraint. Our approach can optimally learn the similarity for image retrieval with the Generalized Ellipsoid Distance.

Page 10:

Optimizing Learning with SVM constraint

Basic Idea
The training samples are first learned by an SVM to form a boundary separating the positive samples from the negative ones.
Then we learn the optimal similarity metric from the positive samples with the Generalized Ellipsoid Distance, constrained by the SVM boundary.

Comparison of the proposed method with previous ones

Page 11:

Problem formulation and notations

N denotes the number of relevant samples.
M denotes the number of features.
Given an image x_n, we use x_ni = [x_ni1, ..., x_niLi] to represent its i-th feature vector, where L_i is the length of the i-th feature vector.
Let q_i = [q_i1, ..., q_iLi] denote the ideal query vector for the i-th feature.
The Generalized Ellipsoid Distance is described by a real symmetric full matrix W_i.
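In this notation, the Generalized Ellipsoid Distance on the i-th feature is (the unit-determinant normalization follows Rui and Huang [3]):

```latex
g_{ni} = (\mathbf{x}_{ni} - \mathbf{q}_{i})^{\top} W_{i}\, (\mathbf{x}_{ni} - \mathbf{q}_{i}),
\qquad W_{i} = W_{i}^{\top},\ \det(W_{i}) = 1 .
```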

Page 12:

Page 13:

Optimization target

where u is used for the feature weights and v provides the goodness value of the positive samples, weighting the importance of each sample.
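The objective itself is missing from the transcript; the Rui–Huang-style formulation [3] that the talk builds on reads (the SVM constraint enters through the sample weights v_n, as the next slide explains):

```latex
\min_{\{\mathbf{q}_i,\,W_i,\,u_i\}} \ J
= \sum_{i=1}^{M} u_{i} \sum_{n=1}^{N} v_{n}\,
(\mathbf{x}_{ni} - \mathbf{q}_{i})^{\top} W_{i} (\mathbf{x}_{ni} - \mathbf{q}_{i})
\qquad \text{s.t.}\quad
\sum_{i=1}^{M} \frac{1}{u_{i}} = 1,\quad \det(W_{i}) = 1 .
```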

Page 14:

To fuse the optimal learning model with SVM learning, we set the goodness value v(x) according to the SVM distance in each iteration.
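The talk does not spell out the mapping; one plausible monotone choice (our assumption, not the authors' formula) squashes the SVM output f(x) into a (0, 1) weight:

```python
import math

def goodness(decision_value):
    """Map the SVM output f(x) to a positive sample weight v(x): samples
    deep inside the boundary approach weight 1, samples near or outside
    it approach 0.  (Illustrative sigmoid, not the talk's exact formula.)"""
    return 1.0 / (1.0 + math.exp(-decision_value))
```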

By introducing Lagrange multipliers, we can solve the previous optimization function. Here we present only the major conclusions.

The optimal solution for the ideal query point can be solved as
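The closed form shown on the slide is lost; in the Rui–Huang derivation [3] the optimal query point is the v-weighted mean of the positive samples:

```latex
\mathbf{q}_{i} = \frac{\sum_{n=1}^{N} v_{n}\,\mathbf{x}_{ni}}{\sum_{n=1}^{N} v_{n}} .
```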

Page 15:

The optimal solution for the distance matrix can be solved as

where the matrix is the weighted covariance matrix of X_i.

The optimal solution for the vector u can be solved as
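The slide's formulas are lost in the transcript; following the same derivation [3], and writing C_i for the v-weighted covariance matrix of X_i and f_i for the total weighted distance on the i-th feature (our notation), the optima are:

```latex
W_{i} = \bigl(\det C_{i}\bigr)^{1/L_{i}}\, C_{i}^{-1},
\qquad
u_{i} = \frac{\sum_{j=1}^{M} \sqrt{f_{j}}}{\sqrt{f_{i}}},
\qquad
f_{i} = \sum_{n=1}^{N} v_{n}\,
(\mathbf{x}_{ni}-\mathbf{q}_{i})^{\top} W_{i} (\mathbf{x}_{ni}-\mathbf{q}_{i}) .
```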

Page 16:

Distance Measure Metrics

where MaxDis is given as
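The formula is lost in the transcript; the usual construction normalizes each distance by the largest one (MaxDis) so that all scores fall in [0, 1]. A sketch under that assumption:

```python
def normalized_scores(distances):
    """Scale distances into [0, 1] by MaxDis, the largest distance in
    the candidate set (assumed construction; slide formula not shown)."""
    max_dis = max(distances)
    if max_dis == 0:
        return [0.0 for _ in distances]
    return [d / max_dis for d in distances]
```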

Page 17:

Experimental Results

Datasets
Natural images selected from the COREL CDs
Two datasets: 20-Category and 50-Category
Each category contains 100 images.

Feature Representation
9-dimensional Color Moment
18-dimensional Edge Direction Histogram
9-dimensional Wavelet-based Texture

Experimental Settings
Radial Basis Function kernel for the SVMs
To enable objective evaluation, we fix the parameters of the learning task for all compared algorithms.

Page 18:

Evaluation on the 20-Cat dataset: Average Precision on Top-20

Page 19:

Evaluation on the 50-Cat dataset:

Page 20:

Computational Complexity and Empirical Time Cost

Page 21:

Discussion and Future Work

Feature Set Selection
In our SVM learning scheme, we do not consider feature set selection in the boundary learning.
We can further improve the retrieval performance by combining feature set selection with the SVM learning tasks.
The efficiency problem may also be considered in future work.

Page 22:

Conclusions

In this talk, we presented a novel scheme for learning the relevance feedback task in image retrieval.
We suggested an approach of optimal learning with an SVM constraint.
Our scheme can not only utilize the advantages of SVMs to learn the boundary in the high-dimensional feature space, but also exploit the hidden similarity structure within the SVM boundary.
Compared with previous methods, our systematically formulated approach performs better in the preliminary experiments.

Page 23:

References

[1] Chu-Hong Hoi and Michael R. Lyu. Optimizing Learning with SVM Constraint for Content-Based Image Retrieval. Technical Report, Department of Computer Science and Engineering, The Chinese University of Hong Kong, March 2004.

[2] Yoshiharu Ishikawa, Ravishankar Subramanya, and Christos Faloutsos. MindReader: Querying Databases through Multiple Examples. Proc. 24th Int. Conf. on Very Large Data Bases (VLDB), 1998.

[3] Y. Rui and T.S. Huang. Optimizing Learning in Image Retrieval. IEEE Conf. on CVPR, June 2000.

[4] G.D. Guo, A.K. Jain, W.Y. Ma, and H.J. Zhang. Learning Similarity Measure for Natural Image Retrieval with Relevance Feedback. IEEE Trans. on Neural Networks, vol. 13, no. 4, pp. 811-820, July 2002.