University of Innsbruck
Institute of Computer Science, Intelligent and Interactive Systems
Kronecker Decomposition for Image Classification
Sabrina Fontanella¹·², Antonio Rodríguez-Sánchez¹, Justus Piater¹, and Sándor Szedmák³
¹University of Innsbruck
²University of Salerno
³Aalto University
Évora, September 2016
Outline
- Image classification: The problem; Decomposing the environment
- The tensor decomposition: What is it; Compression; Interpretation of the image components
- Learning approach: Maximum Margin Regression
- Experimental evaluation: ImageCLEF 2015
- Experimental evaluation: Pascal and Flickr
Antonio Rodríguez-Sánchez (CLEF 2016)
Image classification I
- Images are classified according to their visual content
- Applicability:
  1. Recognition of specific objects
  2. Indoor/outdoor recognition
  3. Analysis of medical images
Image classification II
- Example of a classification algorithm, Bag of Words:
  1. Feature extraction, stored into feature vectors
  2. Approximation of the distribution of the features by a histogram
  3. Application of a classification algorithm (Support Vector Machine, Neural Network, Markov Random Field, etc.)
Relations between objects are of interest
- Is it possible to recognize relationships between the objects appearing in a scene?
- This is of interest, since these relationships can provide knowledge necessary to identify and classify the image
- E.g., a car is quite likely to be in an image that also contains buildings and people.
- E.g., a zebra is quite likely to be outdoors, surrounded by savanna plants or animals.
Decomposing the environment
- Structured decomposition of the environment
- Learning structured output is a popular stream of machine learning
- By decomposing the matrix that represents the image, the structure behind the scene can be captured
- Let us consider 2D image decomposition
- Points close to each other within contiguous 2D blocks can strongly relate to each other
Tensor decomposition
- A tensor is a multidimensional or N-way array
- An N-way or Nth-order tensor is an element of the tensor product of N vector spaces
- Tensor decomposition can be considered a higher-order generalization of the matrix singular value decomposition (SVD) and principal component analysis (PCA)
- The tensor decomposition of a given image is not unique
- Given an RGB image of size (256,256,3), it is possible to perform the following decompositions:
  - (16,16,3), (16,16,1): tensor + matrix (2 components)
  - (8,8,3), (8,8,1), (4,4,1): tensor + 2 matrices (3 components)
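These shape constraints can be checked directly: the Kronecker product multiplies sizes dimension-wise, so both component lists above reproduce a (256,256,3) image. A quick sketch using NumPy's `np.kron`, with random data standing in for real components:

```python
import numpy as np

# Sanity check of the component shapes quoted on the slide: chains of
# Kronecker products of small tensors reproduce a (256, 256, 3) RGB image.
rng = np.random.default_rng(0)

# Two-component decomposition: (16,16,3) ⊗ (16,16,1)
x2 = np.kron(rng.random((16, 16, 3)), rng.random((16, 16, 1)))
print(x2.shape)  # (256, 256, 3)

# Three-component decomposition: (8,8,3) ⊗ (8,8,1) ⊗ (4,4,1)
x3 = np.kron(np.kron(rng.random((8, 8, 3)), rng.random((8, 8, 1))),
             rng.random((4, 4, 1)))
print(x3.shape)  # (256, 256, 3)
```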
Tensor decomposition
- In computer vision, the tensor decomposition could be used to represent:
  - Color images, where three matrices express the R, G and B channels and we can use a tensor of order three (for example (1024,1024,3)).
  - Video streams of color images, where the dimensions are R, G, B and time.
The Kronecker product
Given two matrices A ∈ R^(mA×nA) and B ∈ R^(mB×nB), the Kronecker product X can be expressed as:

$$
X = A \otimes B =
\begin{bmatrix}
A_{1,1}B & A_{1,2}B & \cdots & A_{1,n_A}B \\
A_{2,1}B & A_{2,2}B & \cdots & A_{2,n_A}B \\
\vdots & \vdots & \ddots & \vdots \\
A_{m_A,1}B & A_{m_A,2}B & \cdots & A_{m_A,n_A}B
\end{bmatrix}
$$

with mX = mA × mB, nX = nA × nB.

- If X is given (the image), how can we compute A and B (its components)?
- B can be considered as a 2D filter of the image represented by the matrix X
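The block structure of the definition can be seen numerically with NumPy's `np.kron`: each entry A[i, j] scales a full copy of B inside X.

```python
import numpy as np

# Block structure of the Kronecker product X = A ⊗ B:
# every entry A[i, j] multiplies a full copy of B.
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 5],
              [6, 7]])
X = np.kron(A, B)

# The top-left block of X equals A[0, 0] * B.
print(np.array_equal(X[:2, :2], A[0, 0] * B))  # True
# Sizes multiply: (mA*mB, nA*nB).
print(X.shape)  # (4, 4)
```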
The Kronecker decomposition and SVD
- The Kronecker decomposition can be carried out by Singular Value Decomposition (SVD)
- Given an arbitrary matrix X of size m × n, the SVD is given by

  X = U S V^T

  where
  - U ∈ R^(m×m) is an orthogonal matrix of left singular vectors, with U U^T = I_m,
  - V ∈ R^(n×n) is an orthogonal matrix of right singular vectors, with V V^T = I_n,
  - S ∈ R^(m×n) is a diagonal matrix with the nonnegative singular values on its diagonal
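The decomposition and the orthogonality properties above can be verified with NumPy's `np.linalg.svd` (with `full_matrices=True` to get the square U and V described on the slide):

```python
import numpy as np

# SVD of an arbitrary m x n matrix; full_matrices=True returns the
# square orthogonal factors U (m x m) and V (n x n).
rng = np.random.default_rng(1)
X = rng.random((4, 3))

U, s, Vt = np.linalg.svd(X, full_matrices=True)
S = np.zeros_like(X)            # embed the singular values in an m x n
np.fill_diagonal(S, s)          # diagonal matrix

print(np.allclose(X, U @ S @ Vt))        # True: X = U S V^T
print(np.allclose(U @ U.T, np.eye(4)))   # True: U orthogonal
print(np.allclose(Vt.T @ Vt, np.eye(3))) # True: V orthogonal
```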
Note
- The algorithm solving the SVD does not depend on the order of the elements of the matrix
- Thus, any permutation (reordering) of the columns and/or rows preserves the same solution
- We can then work on a reordered representation of the matrix X
Algorithm for solving Kronecker decomposition
1. Reorder the matrix
2. Compute SVD decomposition
3. Compute the approximation of X
4. Invert the reordering
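The four steps above can be sketched in NumPy. This is a minimal implementation of the Van Loan rearrangement approach, not the authors' code; the function and argument names (`nkp`, `block_shape`) are illustrative:

```python
import numpy as np

def nkp(X, block_shape):
    """Nearest Kronecker product X ≈ A ⊗ B via SVD (Van Loan).

    block_shape is the shape (mB, nB) of the small factor B;
    the shape of A is inferred from X.
    """
    mB, nB = block_shape
    mA, nA = X.shape[0] // mB, X.shape[1] // nB

    # Step 1: reorder X so that each row is one vectorized mB x nB block.
    R = (X.reshape(mA, mB, nA, nB)     # split into blocks
          .transpose(0, 2, 1, 3)       # block indices first
          .reshape(mA * nA, mB * nB))  # one block per row

    # Step 2: compute the SVD of the reordered matrix.
    U, s, Vt = np.linalg.svd(R, full_matrices=False)

    # Step 3: the nearest rank-1 matrix gives vec(A) and vec(B).
    a = np.sqrt(s[0]) * U[:, 0]
    b = np.sqrt(s[0]) * Vt[0, :]

    # Step 4: invert the reordering by reshaping back to matrices.
    return a.reshape(mA, nA), b.reshape(mB, nB)

# If X is an exact Kronecker product, it is recovered exactly.
A0 = np.arange(1, 10).reshape(3, 3)
B0 = np.arange(1, 5).reshape(2, 2)
X = np.kron(A0, B0)
A, B = nkp(X, (2, 2))
print(np.allclose(np.kron(A, B), X))  # True
```

Note the scale ambiguity: only the product A ⊗ B is determined, since (cA) ⊗ (B/c) gives the same X; splitting √σ₁ evenly is one conventional choice.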
Nearest Kronecker Product (NKP)
- Given a matrix X ∈ R^(m×n), the NKP problem involves minimizing

  φ(A,B) = ‖X − A ⊗ B‖_F

  where ‖·‖_F is the Frobenius norm
- This problem can be solved using SVD, working on a reordered representation of X
Step 1: Reorder matrix X¹

The Kronecker product

$$
X = A \otimes B =
\begin{bmatrix}
x_{11} & x_{12} & x_{13} & x_{14} & x_{15} & x_{16} \\
x_{21} & x_{22} & x_{23} & x_{24} & x_{25} & x_{26} \\
x_{31} & x_{32} & x_{33} & x_{34} & x_{35} & x_{36} \\
x_{41} & x_{42} & x_{43} & x_{44} & x_{45} & x_{46} \\
x_{51} & x_{52} & x_{53} & x_{54} & x_{55} & x_{56} \\
x_{61} & x_{62} & x_{63} & x_{64} & x_{65} & x_{66}
\end{bmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{bmatrix}
\otimes
\begin{bmatrix}
b_{11} & b_{12} \\
b_{21} & b_{22}
\end{bmatrix}
$$

can be reordered into the rank-one matrix

$$
\tilde{X} =
\begin{bmatrix}
x_{11} & x_{13} & x_{15} & x_{31} & x_{33} & x_{35} & x_{51} & x_{53} & x_{55} \\
x_{12} & x_{14} & x_{16} & x_{32} & x_{34} & x_{36} & x_{52} & x_{54} & x_{56} \\
x_{21} & x_{23} & x_{25} & x_{41} & x_{43} & x_{45} & x_{61} & x_{63} & x_{65} \\
x_{22} & x_{24} & x_{26} & x_{42} & x_{44} & x_{46} & x_{62} & x_{64} & x_{66}
\end{bmatrix}
=
\begin{bmatrix}
b_{11} \\ b_{12} \\ b_{21} \\ b_{22}
\end{bmatrix}
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{21} & a_{22} & a_{23} & a_{31} & a_{32} & a_{33}
\end{bmatrix}
$$

i.e. an outer product of the vectorized B and the vectorized A.
¹ C. F. Van Loan. The ubiquitous Kronecker product. Journal of Computational and Applied Mathematics, 123:85-100, 2000.
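Step 1 can also be seen in code (a sketch of the rearrangement above built from NumPy reshapes; the variable names are illustrative): for an exact Kronecker product, the reordered matrix is exactly rank one.

```python
import numpy as np

# Reorder X = A ⊗ B so that each row collects the entries sharing one
# factor B[p, q], matching the 4 x 9 layout on the slide.
A = np.arange(1, 10).reshape(3, 3)
B = np.arange(1, 5).reshape(2, 2)
X = np.kron(A, B)  # 6 x 6

mA, nA = A.shape
mB, nB = B.shape
R = (X.reshape(mA, mB, nA, nB)
      .transpose(1, 3, 0, 2)      # B-indices first, as on the slide
      .reshape(mB * nB, mA * nA))

print(R.shape)                    # (4, 9)
print(np.linalg.matrix_rank(R))   # 1
# R is exactly the outer product of vec(B) and vec(A).
print(np.array_equal(R, np.outer(B.ravel(), A.ravel())))  # True
```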
Approximation of X and reordering
- Minimize ‖X̃ − vec(A) vec(B)^T‖_F over the reordered matrix X̃
- vec() is a vectorization operator which stacks the columns of a matrix on top of each other
- This is the problem of finding the nearest rank-1 matrix to X̃
- Well-known solutions using SVD exist
Step 2: Compute SVD decomposition
min ‖X̃ − vec(A) vec(B)^T‖_F

- Let X̃ = U S V^T be the SVD of the reordered matrix X̃
- The best A and B are defined by:

  vec(A) = √σ₁ U(:,1) and vec(B) = √σ₁ V(:,1)

  where σ₁ is the largest singular value and U(:,1), V(:,1) are the corresponding singular vectors
Steps 3 and 4: Approximation and reordering
- Once we have A and B, it is possible to compute the approximation of X
- Since at the beginning we changed the order of the values in the matrix, inverting the reordering is necessary to obtain the original A and B
Components and factorization
- The number of components and the factorization influence the level of detail
- Given, for example, a gray-scale image of size (1024,1024):
  - If it has many details, it is better to choose many components with small factorization:
    - Example: (4,4)(4,4)(4,4)(4,4)(4,4)
  - If it is less detailed, fewer components with large factorization:
    - Example: (32,32)(32,32)
Compression I
- The tensor decomposition can provide a very high level of image compression
- It takes into consideration only the largest singular values (Eckart-Young theorem)
- The level of compression is given by the ratio:

  (total number of elements in the image matrix) / (total number of elements of the components in the decomposition)
Compression II
- Let
  - n_sv be the number of singular values taken into consideration
  - n_f the number of factors per component
  - v the value of the factors
  - n_c the number of components used

  Then the total number of elements of the components is given by:

  n_sv · n_c · v^(n_f)

- To simplify the notation we assume that all factors are equal for every component
- Decompositions with different factors can also be taken into consideration
  - For example (32,28)(16,8)(2,4)
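The counting formula above is simple enough to code directly. A small illustrative helper (not code from the paper), assuming all components share the same square factor size v:

```python
# Compression ratio from the slide: image elements divided by the
# n_sv * n_c * v**n_f elements kept by the decomposition.
def compression_ratio(image_side, n_sv, n_c, v, n_f=2):
    image_elements = image_side ** 2
    component_elements = n_sv * n_c * v ** n_f
    return image_elements / component_elements

# (32,32)(32,32) with 10 singular values on a (1024,1024) image:
print(compression_ratio(1024, n_sv=10, n_c=2, v=32))  # 51.2
# Five (4,4) components with 10 singular values:
print(compression_ratio(1024, n_sv=10, n_c=5, v=4))   # 1310.72
```

These are exactly the two worked examples on the next slide.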
Compression III: Example
- Given an image of size (1024,1024), it can be compressed:
  - with components (32,32)(32,32) and 10 singular values, by a factor of

    1024² / (10 · 2 · 32²) = 51.2

  - with components (4,4),(4,4),(4,4),(4,4),(4,4) and 10 singular values, by a factor of

    1024² / (10 · 5 · 4²) = 1310.72
Compression IV: Example
Figure: Example of compression on the toys room image, with compression ratios 202 and 99.
Interpretation of image components I
X = A⊗ B
- B can be interpreted as an image filter
- It finds the boundaries of the critical regions where most of the structural information concentrates
- This represents a big advantage:
  - In general, in image filtering processes, a predetermined filter is used
  - The Kronecker decomposition automatically tries to predict the optimal filters
Interpretation of image components II
Figure: Toys room picture and its components. The highest components (A) and the lowest components (B) correspond to the matrices A1, ... and B1, ..., respectively.
Learning
- Sample set of pairs of output and input objects:

  {(y_i, x_i) : y_i ∈ Y, x_i ∈ X, i = 1, ..., m}

- Define two functions, φ and ψ, that map the input and output objects respectively into linear vector spaces:
  - the feature space in the case of the input
  - the label space in the case of the output

  φ : X → H_φ and ψ : Y → H_ψ
Objective
- Find a linear function acting on the feature space,

  f(φ(x)) = W φ(x) + b,

  that produces a prediction for every input object in the label space
- The output corresponding to x is:

  y = ψ⁻¹(f(φ(x)))
MMR (Maximum Margin Regression) vs SVM (Support Vector Machine)

- MMR is a framework for multilabel classification
- It is based on the Support Vector Machine (SVM)
- Key idea: reinterpretation of the normal vector w

SVM:
- w is the normal vector of the separating hyperplane.
- y_i ∈ {−1, +1} are binary outputs.
- The labels are equal to the binary outputs.

Extended view (MMR):
- W is a linear operator projecting the feature space into the label space.
- y_i ∈ Y are arbitrary outputs.
- ψ(y_i) ∈ H_ψ are the labels, the embedded outputs in a linear vector space.
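As a rough illustration of the extended view, the operator W in f(φ(x)) = Wφ(x) + b can be fit by ordinary ridge regression from feature space to label space, with multi-label prediction by thresholding. This is a stand-in to show the shapes involved, not the maximum-margin solver used in the talk, and all names and dimensions here are synthetic:

```python
import numpy as np

# Illustrative stand-in for W: ridge regression from features φ(x) to a
# {−1,+1}^k label embedding ψ(y) (NOT the MMR optimization itself).
rng = np.random.default_rng(2)
m, d, k = 200, 16, 4                    # samples, feature dim, labels
Phi = rng.standard_normal((m, d))       # rows are φ(x_i)
W_true = rng.standard_normal((k, d))    # synthetic ground-truth operator
Y = np.sign(Phi @ W_true.T)             # ψ(y_i) ∈ {−1,+1}^k

lam = 1e-2                              # ridge regularization strength
W = np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ Y).T

pred = np.sign(Phi @ W.T)               # ψ⁻¹ realized by thresholding
print((pred == Y).mean())               # training accuracy, close to 1
```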
ImageCLEF dataset
I Task: multi-label classification
Figure: The hierarchy of classes in ImageCLEF multi-label challenge.
Results on ImageCLEF
Figure: F1 scores for six filter sizes (4, 8, 12, 20, 18 and 32) using 3 components, training with two different kernels: (a) polynomial, (b) Gaussian. The parameter varied in the F1 measure is the degree of the polynomial (from 1 to 10) for the polynomial kernel and the standard deviation of the Gaussian for the Gaussian kernel.
Pascal and Flickr: Features to compare to
Feature          Dimension  Source   Descriptor
Hsv                   4096  color    HSV
Lab                   4096  color    LAB
Rgb                   4096  color    RGB
HsvV3H1               5184  color    HSV
LabV3H1               5184  color    LAB
RgbV3H1               5184  color    RGB
DenseHue               100  texture  hue
HarrisHue              100  texture  hue
DenseHueV3H1           300  texture  hue
HarrisHueV3H1          300  texture  hue
DenseSift             1000  texture  sift
HarrisSift            1000  texture  sift
DenseSiftV3H1         3000  texture  sift
HarrisSiftV3H1        3000  texture  sift

Table: The features¹ against which the tensor decomposition is compared.
¹ Matthieu Guillaumin, Thomas Mensink, Jakob Verbeek, and Cordelia Schmid. TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation, 2009.
Results on Pascal07 dataset
Gaussian kernel

Feature          P       R       F1
TD               0.4158  0.2877  0.3400
HarrisSiftV3H1   0.4623  0.4491  0.4552
HarrisSift       0.4202  0.4895  0.4522
DenseSiftV3H1    0.4189  0.4886  0.4510
DenseSift        0.3750  0.5044  0.4302
LabV3H1          0.3911  0.3366  0.3618
DenseHueV3H1     0.3884  0.3282  0.3558
HarrisHueV3H1    0.3274  0.3884  0.3552
RgbV3H1          0.3907  0.3224  0.3533
HsvV3H1          0.4080  0.3048  0.3489
Hsv              0.3911  0.3085  0.3449
Lab              0.4135  0.2920  0.3423
Rgb              0.3857  0.2985  0.3350
HarrisHue        0.3930  0.2887  0.3328
DenseHue         0.3962  0.2828  0.3299

Polynomial kernel

Feature          P       R       F1
TD               0.3931  0.2855  0.3308
HarrisSiftV3H1   0.4002  0.5520  0.4640
HarrisSift       0.3728  0.5523  0.4449
DenseSiftV3H1    0.3592  0.5663  0.4396
DenseSift        0.3442  0.5337  0.4184
HsvV3H1          0.3815  0.3295  0.3536
RgbV3H1          0.3479  0.3551  0.3515
LabV3H1          0.3106  0.3868  0.3434
HarrisHueV3H1    0.3110  0.3894  0.3417
DenseHueV3H1     0.3166  0.3607  0.3363
Hsv              0.3390  0.3232  0.3309
HarrisHue        0.3037  0.3597  0.3241
Rgb              0.2906  0.3420  0.3135
Lab              0.2800  0.3389  0.3031
DenseHue         0.2808  0.3329  0.2995

Figure: Comparing the tensor decomposition (TD) with other features¹ on the Pascal07 dataset with Gaussian and polynomial kernels. The decomposition chosen is 3 components with factorization (22,22).
Results on Flickr dataset
Gaussian kernel

Feature          P       R       F1
TD               0.3164  0.3780  0.3118
HarrisSiftV3H1   0.5470  0.3842  0.4512
DenseSift        0.5438  0.3862  0.4515
HarrisSift       0.5368  0.3780  0.4435
DenseSiftV3H1    0.5475  0.3807  0.4491
LabV3H1          0.4693  0.3200  0.3806
HarrisHueV3H1    0.4368  0.3288  0.3752
DenseHueV3H1     0.4221  0.3333  0.3723
HsvV3H1          0.4570  0.3062  0.3667
HarrisHue        0.3753  0.3435  0.3587
RgbV3H1          0.4150  0.3089  0.3542
Lab              0.4153  0.3016  0.3494
DenseHue         0.3854  0.3187  0.3477
Rgb              0.4181  0.2824  0.3371
Hsv              0.4152  0.2762  0.3317

Polynomial kernel

Feature          P       R       F1
TD               0.2311  0.2615  0.2453
HarrisSiftV3H1   0.5289  0.4646  0.4940
DenseSiftV3H1    0.5328  0.4415  0.4828
HarrisSift       0.5260  0.4447  0.4819
DenseSift        0.5132  0.4316  0.4688
LabV3H1          0.4508  0.3533  0.3961
HsvV3H1          0.3961  0.3655  0.3798
HarrisHueV3H1    0.4115  0.3490  0.3777
DenseHueV3H1     0.4086  0.3445  0.3737
RgbV3H1          0.3996  0.3460  0.3704
Lab              0.2717  0.5600  0.3658
DenseHue         0.2698  0.5249  0.3564
HarrisHue        0.3294  0.4159  0.3561
Hsv              0.3603  0.3602  0.3540
Rgb              0.3495  0.3406  0.3443

Figure: Comparing the tensor decomposition (TD) with other features¹ on the Flickr dataset with Gaussian and polynomial kernels. The decomposition chosen is 3 components with factorization (22,22).
Conclusions
- We have presented a method for feature extraction based on the decomposition of the environment
- Pros:
  1. Compression
  2. Automatic prediction of the best filters to use for extracting features
- Cons:
  1. Different decompositions can strongly influence the final result
  2. Lack of a mechanism for automatically choosing the best parameters