University of Innsbruck
Institute of Computer Science, Intelligent and Interactive Systems
Kronecker Decomposition for Image Classification
Sabrina Fontanella¹·², Antonio Rodríguez-Sánchez¹, Justus Piater¹, and Sándor Szedmák³
¹University of Innsbruck
²University of Salerno
³Aalto University
Évora, September 2016
Outline
- Image classification: The problem; Decomposing the environment
- The tensor decomposition: What is it; Compression; Interpretation of the image components
- Learning approach: Maximum Margin Regression
- Experimental evaluation: ImageCLEF 2015
- Experimental evaluation: Pascal and Flickr
Antonio Rodríguez-Sánchez (CLEF 2016)
Image classification I
- Images are classified according to their visual content
- Applicability:
  1. Recognition of specific objects
  2. Indoor/outdoor recognition
  3. Analysis of medical images
Image classification II
- Example of a classification algorithm, Bag of Words:
  1. Feature extraction, stored into feature vectors
  2. Approximation of the distribution of the features by a histogram
  3. Application of a classification algorithm (Support Vector Machine, Neural Network, Markov Random Field, etc.)
Relations between objects are of interest
- Is it possible to recognize relationships between the objects appearing in a scene?
- This is of interest, since these relationships can provide knowledge necessary to identify and classify the image
- E.g., a car is quite likely to be in an image that also contains buildings and people.
- E.g., a zebra is quite likely to be outdoors, surrounded by savanna plants or animals.
Decomposing the environment
- Structured decomposition of the environment
- Learning structured output is a popular stream of machine learning
- By decomposing the matrix that represents the image, the structure behind the scene can be captured
- Let us consider 2D image decomposition
- Points close to each other within contiguous 2D blocks can strongly relate to each other
Tensor decomposition
- A tensor is a multidimensional or N-way array
- An N-way or Nth-order tensor is an element of the tensor product of N vector spaces
- Tensor decomposition can be considered a higher-order generalization of the matrix singular value decomposition (SVD) and principal component analysis (PCA)
- The tensor decomposition of a given image is not unique
- Given an RGB image of size (256,256,3), it is possible to perform the following decompositions:
  - (16,16,3), (16,16,1): tensor + matrix (2 components)
  - (8,8,3), (8,8,1), (4,4,1): tensor + 2 matrices (3 components)
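These shape constraints can be checked directly: the Kronecker product multiplies sizes dimension-wise, so both component lists above reproduce a (256,256,3) image. A quick sketch using NumPy's `np.kron`, with random data standing in for real components:

```python
import numpy as np

# Sanity check of the component shapes quoted on the slide: chains of
# Kronecker products of small tensors reproduce a (256, 256, 3) RGB image.
rng = np.random.default_rng(0)

# Two-component decomposition: (16,16,3) ⊗ (16,16,1)
x2 = np.kron(rng.random((16, 16, 3)), rng.random((16, 16, 1)))
print(x2.shape)  # (256, 256, 3)

# Three-component decomposition: (8,8,3) ⊗ (8,8,1) ⊗ (4,4,1)
x3 = np.kron(np.kron(rng.random((8, 8, 3)), rng.random((8, 8, 1))),
             rng.random((4, 4, 1)))
print(x3.shape)  # (256, 256, 3)
```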
Tensor decomposition
- In computer vision, the tensor decomposition could be used to represent:
  - Color images, where three matrices express the R, G and B channels and we can use a tensor of order three (for example (1024,1024,3)).
  - Video streams of color images, where the dimensions are R, G, B and time.
The Kronecker product
Given two matrices A ∈ R^(mA×nA) and B ∈ R^(mB×nB), the Kronecker product X can be expressed as:

$$
X = A \otimes B =
\begin{bmatrix}
A_{1,1}B & A_{1,2}B & \cdots & A_{1,n_A}B \\
A_{2,1}B & A_{2,2}B & \cdots & A_{2,n_A}B \\
\vdots & \vdots & \ddots & \vdots \\
A_{m_A,1}B & A_{m_A,2}B & \cdots & A_{m_A,n_A}B
\end{bmatrix}
$$

with mX = mA × mB, nX = nA × nB.

- If X is given (the image), how can we compute A and B (its components)?
- B can be considered as a 2D filter of the image represented by the matrix X
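The block structure of the definition can be seen numerically with NumPy's `np.kron`: each entry A[i, j] scales a full copy of B inside X.

```python
import numpy as np

# Block structure of the Kronecker product X = A ⊗ B:
# every entry A[i, j] multiplies a full copy of B.
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 5],
              [6, 7]])
X = np.kron(A, B)

# The top-left block of X equals A[0, 0] * B.
print(np.array_equal(X[:2, :2], A[0, 0] * B))  # True
# Sizes multiply: (mA*mB, nA*nB).
print(X.shape)  # (4, 4)
```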
The Kronecker decomposition and SVD
- The Kronecker decomposition can be carried out by Singular Value Decomposition (SVD)
- Given an arbitrary matrix X of size m × n, the SVD is given by

  X = U S V^T

  where
  - U ∈ R^(m×m) is an orthogonal matrix of left singular vectors, with U U^T = I_m,
  - V ∈ R^(n×n) is an orthogonal matrix of right singular vectors, with V V^T = I_n,
  - S ∈ R^(m×n) is a diagonal matrix with the nonnegative singular values on its diagonal
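The decomposition and the orthogonality properties above can be verified with NumPy's `np.linalg.svd` (with `full_matrices=True` to get the square U and V described on the slide):

```python
import numpy as np

# SVD of an arbitrary m x n matrix; full_matrices=True returns the
# square orthogonal factors U (m x m) and V (n x n).
rng = np.random.default_rng(1)
X = rng.random((4, 3))

U, s, Vt = np.linalg.svd(X, full_matrices=True)
S = np.zeros_like(X)            # embed the singular values in an m x n
np.fill_diagonal(S, s)          # diagonal matrix

print(np.allclose(X, U @ S @ Vt))        # True: X = U S V^T
print(np.allclose(U @ U.T, np.eye(4)))   # True: U orthogonal
print(np.allclose(Vt.T @ Vt, np.eye(3))) # True: V orthogonal
```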
Note
- The algorithm solving the SVD does not depend on the order of the elements of the matrix
- Thus, any permutation (reordering) of the columns and/or rows preserves the same solution
- We can then work on a reordered representation of the matrix X
Algorithm for solving Kronecker decomposition
1. Reorder the matrix
2. Compute SVD decomposition
3. Compute the approximation of X
4. Invert the reordering
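The four steps above can be sketched in NumPy. This is a minimal implementation of the Van Loan rearrangement approach, not the authors' code; the function and argument names (`nkp`, `block_shape`) are illustrative:

```python
import numpy as np

def nkp(X, block_shape):
    """Nearest Kronecker product X ≈ A ⊗ B via SVD (Van Loan).

    block_shape is the shape (mB, nB) of the small factor B;
    the shape of A is inferred from X.
    """
    mB, nB = block_shape
    mA, nA = X.shape[0] // mB, X.shape[1] // nB

    # Step 1: reorder X so that each row is one vectorized mB x nB block.
    R = (X.reshape(mA, mB, nA, nB)     # split into blocks
          .transpose(0, 2, 1, 3)       # block indices first
          .reshape(mA * nA, mB * nB))  # one block per row

    # Step 2: compute the SVD of the reordered matrix.
    U, s, Vt = np.linalg.svd(R, full_matrices=False)

    # Step 3: the nearest rank-1 matrix gives vec(A) and vec(B).
    a = np.sqrt(s[0]) * U[:, 0]
    b = np.sqrt(s[0]) * Vt[0, :]

    # Step 4: invert the reordering by reshaping back to matrices.
    return a.reshape(mA, nA), b.reshape(mB, nB)

# If X is an exact Kronecker product, it is recovered exactly.
A0 = np.arange(1, 10).reshape(3, 3)
B0 = np.arange(1, 5).reshape(2, 2)
X = np.kron(A0, B0)
A, B = nkp(X, (2, 2))
print(np.allclose(np.kron(A, B), X))  # True
```

Note the scale ambiguity: only the product A ⊗ B is determined, since (cA) ⊗ (B/c) gives the same X; splitting √σ₁ evenly is one conventional choice.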
Nearest Kronecker Product (NKP)
- Given a matrix X ∈ R^(m×n), the NKP problem involves minimizing

  φ(A,B) = ‖X − A ⊗ B‖_F

  where ‖·‖_F is the Frobenius norm
- This problem can be solved using SVD, working on a reordered representation of X
Step 1: Reorder matrix X¹

The Kronecker product

$$
X = A \otimes B =
\begin{bmatrix}
x_{11} & x_{12} & x_{13} & x_{14} & x_{15} & x_{16} \\
x_{21} & x_{22} & x_{23} & x_{24} & x_{25} & x_{26} \\
x_{31} & x_{32} & x_{33} & x_{34} & x_{35} & x_{36} \\
x_{41} & x_{42} & x_{43} & x_{44} & x_{45} & x_{46} \\
x_{51} & x_{52} & x_{53} & x_{54} & x_{55} & x_{56} \\
x_{61} & x_{62} & x_{63} & x_{64} & x_{65} & x_{66}
\end{bmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{bmatrix}
\otimes
\begin{bmatrix}
b_{11} & b_{12} \\
b_{21} & b_{22}
\end{bmatrix}
$$

can be reordered into the rank-one matrix

$$
\tilde{X} =
\begin{bmatrix}
x_{11} & x_{13} & x_{15} & x_{31} & x_{33} & x_{35} & x_{51} & x_{53} & x_{55} \\
x_{12} & x_{14} & x_{16} & x_{32} & x_{34} & x_{36} & x_{52} & x_{54} & x_{56} \\
x_{21} & x_{23} & x_{25} & x_{41} & x_{43} & x_{45} & x_{61} & x_{63} & x_{65} \\
x_{22} & x_{24} & x_{26} & x_{42} & x_{44} & x_{46} & x_{62} & x_{64} & x_{66}
\end{bmatrix}
=
\begin{bmatrix}
b_{11} \\ b_{12} \\ b_{21} \\ b_{22}
\end{bmatrix}
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{21} & a_{22} & a_{23} & a_{31} & a_{32} & a_{33}
\end{bmatrix}
$$

i.e. an outer product of the vectorized B and the vectorized A.
¹ C. F. Van Loan. The ubiquitous Kronecker product. Journal of Computational and Applied Mathematics, 123:85-100, 2000.
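Step 1 can also be seen in code (a sketch of the rearrangement above built from NumPy reshapes; the variable names are illustrative): for an exact Kronecker product, the reordered matrix is exactly rank one.

```python
import numpy as np

# Reorder X = A ⊗ B so that each row collects the entries sharing one
# factor B[p, q], matching the 4 x 9 layout on the slide.
A = np.arange(1, 10).reshape(3, 3)
B = np.arange(1, 5).reshape(2, 2)
X = np.kron(A, B)  # 6 x 6

mA, nA = A.shape
mB, nB = B.shape
R = (X.reshape(mA, mB, nA, nB)
      .transpose(1, 3, 0, 2)      # B-indices first, as on the slide
      .reshape(mB * nB, mA * nA))

print(R.shape)                    # (4, 9)
print(np.linalg.matrix_rank(R))   # 1
# R is exactly the outer product of vec(B) and vec(A).
print(np.array_equal(R, np.outer(B.ravel(), A.ravel())))  # True
```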
Approximation of X and reordering
- Minimize ‖X̃ − vec(A) vec(B)^T‖_F over the reordered matrix X̃
- vec() is a vectorization operator which stacks the columns of a matrix on top of each other
- This is the problem of finding the nearest rank-1 matrix to X̃
- Well-known solutions using SVD exist
Step 2: Compute SVD decomposition
min ‖X̃ − vec(A) vec(B)^T‖_F

- Let X̃ = U S V^T be the SVD of the reordered matrix X̃
- The best A and B are defined by:

  vec(A) = √σ₁ U(:,1) and vec(B) = √σ₁ V(:,1)

  where σ₁ is the largest singular value and U(:,1), V(:,1) are the corresponding singular vectors
Steps 3 and 4: Approximation and reordering
- Once we have A and B, it is possible to compute the approximation of X
- Since at the beginning we changed the order of the values in the matrix, inverting the reordering is necessary to obtain the original A and B
Components and factorization
- The number of components and the factorization influence the level of detail
- Given, for example, a gray-scale image of size (1024,1024):
  - If it has many details, it is better to choose many components with small factorization:
    - Example: (4,4)(4,4)(4,4)(4,4)(4,4)
  - If it is less detailed, fewer components with large factorization:
    - Example: (32,32)(32,32)
Compression I
- The tensor decomposition can provide a very high level of image compression
- It takes into consideration only the largest singular values (Eckart-Young theorem)
- The level of compression is given by the ratio:

  (total number of elements in the image matrix) / (total number of elements of the components in the decomposition)
Compression II
- Let
  - n_sv be the number of singular values taken into consideration
  - n_f the number of factors per component
  - v the value of the factors
  - n_c the number of components used

  Then the total number of elements of the components is given by:

  n_sv · n_c · v^(n_f)

- To simplify the notation we assume that all factors are equal for every component
- Decompositions with different factors can also be taken into consideration
  - For example (32,28)(16,8)(2,4)
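The counting formula above is simple enough to code directly. A small illustrative helper (not code from the paper), assuming all components share the same square factor size v:

```python
# Compression ratio from the slide: image elements divided by the
# n_sv * n_c * v**n_f elements kept by the decomposition.
def compression_ratio(image_side, n_sv, n_c, v, n_f=2):
    image_elements = image_side ** 2
    component_elements = n_sv * n_c * v ** n_f
    return image_elements / component_elements

# (32,32)(32,32) with 10 singular values on a (1024,1024) image:
print(compression_ratio(1024, n_sv=10, n_c=2, v=32))  # 51.2
# Five (4,4) components with 10 singular values:
print(compression_ratio(1024, n_sv=10, n_c=5, v=4))   # 1310.72
```

These are exactly the two worked examples on the next slide.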
Compression III: Example
- Given an image of size (1024,1024), it can be compressed:
  - with components (32,32)(32,32) and 10 singular values, by a factor of

    1024² / (10 · 2 · 32²) = 51.2

  - with components (4,4),(4,4),(4,4),(4,4),(4,4) and 10 singular values, by a factor of

    1024² / (10 · 5 · 4²) = 1310.72
Compression IV: Example
Figure: Example of compression on the toys room image, with compression ratios 202 and 99.
Interpretation of image components I
X = A⊗ B
- B can be interpreted as an image filter
- It finds the boundaries of the critical regions where most of the structural information concentrates
- This represents a big advantage:
  - In general, in image filtering processes, a predetermined filter is used
  - The Kronecker decomposition automatically tries to predict the optimal filters
Interpretation of image components II
Figure: Toys room picture and its components. The highest components (A) and the lowest components (B) correspond to the matrices A1, ... and B1, ..., respectively.
Learning
- Sample set of pairs of output and input objects:

  {(y_i, x_i) : y_i ∈ Y, x_i ∈ X, i = 1, ..., m}

- Define two functions, φ and ψ, that map the input and output objects respectively into linear vector spaces:
  - the feature space in the case of the input
  - the label space in the case of the output

  φ : X → H_φ and ψ : Y → H_ψ
Objective
- Find a linear function acting on the feature space,

  f(φ(x)) = W φ(x) + b,

  that produces a prediction for every input object in the label space
- The output corresponding to x is:

  y = ψ⁻¹(f(φ(x)))
MMR (Maximum Margin Regression) vs SVM (Support Vector Machine)

- MMR is a framework for multilabel classification
- It is based on the Support Vector Machine (SVM)
- Key idea: reinterpretation of the normal vector w

SVM:
- w is the normal vector of the separating hyperplane.
- y_i ∈ {−1, +1} are binary outputs.
- The labels are equal to the binary outputs.

Extended view (MMR):
- W is a linear operator projecting the feature space into the label space.
- y_i ∈ Y are arbitrary outputs.
- ψ(y_i) ∈ H_ψ are the labels, the embedded outputs in a linear vector space.
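As a rough illustration of the extended view, the operator W in f(φ(x)) = Wφ(x) + b can be fit by ordinary ridge regression from feature space to label space, with multi-label prediction by thresholding. This is a stand-in to show the shapes involved, not the maximum-margin solver used in the talk, and all names and dimensions here are synthetic:

```python
import numpy as np

# Illustrative stand-in for W: ridge regression from features φ(x) to a
# {−1,+1}^k label embedding ψ(y) (NOT the MMR optimization itself).
rng = np.random.default_rng(2)
m, d, k = 200, 16, 4                    # samples, feature dim, labels
Phi = rng.standard_normal((m, d))       # rows are φ(x_i)
W_true = rng.standard_normal((k, d))    # synthetic ground-truth operator
Y = np.sign(Phi @ W_true.T)             # ψ(y_i) ∈ {−1,+1}^k

lam = 1e-2                              # ridge regularization strength
W = np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ Y).T

pred = np.sign(Phi @ W.T)               # ψ⁻¹ realized by thresholding
print((pred == Y).mean())               # training accuracy, close to 1
```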
ImageCLEF dataset
I Task: multi-label classification
Figure: The hierarchy of classes in ImageCLEF multi-label challenge.
Results on ImageCLEF
Figure: F1 scores for six filter sizes (4, 8, 12, 20, 18 and 32) using 3 components, training with two different kernels: (a) polynomial, (b) Gaussian. The parameter varied in the F1 measure is the degree of the polynomial (from 1 to 10) for the polynomial kernel and the standard deviation of the Gaussian for the Gaussian kernel.
Pascal and Flickr: Features to compare to
Feature          Dimension  Source   Descriptor
Hsv                   4096  color    HSV
Lab                   4096  color    LAB
Rgb                   4096  color    RGB
HsvV3H1               5184  color    HSV
LabV3H1               5184  color    LAB
RgbV3H1               5184  color    RGB
DenseHue               100  texture  hue
HarrisHue              100  texture  hue
DenseHueV3H1           300  texture  hue
HarrisHueV3H1          300  texture  hue
DenseSift             1000  texture  sift
HarrisSift            1000  texture  sift
DenseSiftV3H1         3000  texture  sift
HarrisSiftV3H1        3000  texture  sift

Table: The features¹ against which the tensor decomposition is compared.
¹ Matthieu Guillaumin, Thomas Mensink, Jakob Verbeek, and Cordelia Schmid. TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation, 2009.
Results on Pascal07 dataset
Gaussian kernel

Feature          P       R       F1
TD               0.4158  0.2877  0.3400
HarrisSiftV3H1   0.4623  0.4491  0.4552
HarrisSift       0.4202  0.4895  0.4522
DenseSiftV3H1    0.4189  0.4886  0.4510
DenseSift        0.3750  0.5044  0.4302
LabV3H1          0.3911  0.3366  0.3618
DenseHueV3H1     0.3884  0.3282  0.3558
HarrisHueV3H1    0.3274  0.3884  0.3552
RgbV3H1          0.3907  0.3224  0.3533
HsvV3H1          0.4080  0.3048  0.3489
Hsv              0.3911  0.3085  0.3449
Lab              0.4135  0.2920  0.3423
Rgb              0.3857  0.2985  0.3350
HarrisHue        0.3930  0.2887  0.3328
DenseHue         0.3962  0.2828  0.3299

Polynomial kernel

Feature          P       R       F1
TD               0.3931  0.2855  0.3308
HarrisSiftV3H1   0.4002  0.5520  0.4640
HarrisSift       0.3728  0.5523  0.4449
DenseSiftV3H1    0.3592  0.5663  0.4396
DenseSift        0.3442  0.5337  0.4184
HsvV3H1          0.3815  0.3295  0.3536
RgbV3H1          0.3479  0.3551  0.3515
LabV3H1          0.3106  0.3868  0.3434
HarrisHueV3H1    0.3110  0.3894  0.3417
DenseHueV3H1     0.3166  0.3607  0.3363
Hsv              0.3390  0.3232  0.3309
HarrisHue        0.3037  0.3597  0.3241
Rgb              0.2906  0.3420  0.3135
Lab              0.2800  0.3389  0.3031
DenseHue         0.2808  0.3329  0.2995

Figure: Comparing the tensor decomposition (TD) with other features¹ on the Pascal07 dataset with Gaussian and polynomial kernels. The decomposition chosen is 3 components with factorization (22,22).
Results on Flickr dataset
Gaussian kernel

Feature          P       R       F1
TD               0.3164  0.3780  0.3118
HarrisSiftV3H1   0.5470  0.3842  0.4512
DenseSift        0.5438  0.3862  0.4515
HarrisSift       0.5368  0.3780  0.4435
DenseSiftV3H1    0.5475  0.3807  0.4491
LabV3H1          0.4693  0.3200  0.3806
HarrisHueV3H1    0.4368  0.3288  0.3752
DenseHueV3H1     0.4221  0.3333  0.3723
HsvV3H1          0.4570  0.3062  0.3667
HarrisHue        0.3753  0.3435  0.3587
RgbV3H1          0.4150  0.3089  0.3542
Lab              0.4153  0.3016  0.3494
DenseHue         0.3854  0.3187  0.3477
Rgb              0.4181  0.2824  0.3371
Hsv              0.4152  0.2762  0.3317

Polynomial kernel

Feature          P       R       F1
TD               0.2311  0.2615  0.2453
HarrisSiftV3H1   0.5289  0.4646  0.4940
DenseSiftV3H1    0.5328  0.4415  0.4828
HarrisSift       0.5260  0.4447  0.4819
DenseSift        0.5132  0.4316  0.4688
LabV3H1          0.4508  0.3533  0.3961
HsvV3H1          0.3961  0.3655  0.3798
HarrisHueV3H1    0.4115  0.3490  0.3777
DenseHueV3H1     0.4086  0.3445  0.3737
RgbV3H1          0.3996  0.3460  0.3704
Lab              0.2717  0.5600  0.3658
DenseHue         0.2698  0.5249  0.3564
HarrisHue        0.3294  0.4159  0.3561
Hsv              0.3603  0.3602  0.3540
Rgb              0.3495  0.3406  0.3443

Figure: Comparing the tensor decomposition (TD) with other features¹ on the Flickr dataset with Gaussian and polynomial kernels. The decomposition chosen is 3 components with factorization (22,22).
Conclusions
- We have presented a method for feature extraction based on the decomposition of the environment
- Pros:
  1. Compression
  2. Automatic prediction of the best filters to use for extracting features
- Cons:
  1. Different decompositions can strongly influence the final result
  2. Lack of a mechanism for automatically choosing the best parameters