Modern Features, Part 3: Software
TRANSCRIPT
Andrea Vedaldi, University of Oxford
Software: feature implementation and computation
https://sites.google.com/site/eccv12features/
goo.gl/jKC2J
• VLFeat: MATLAB ease of use; covariant detectors, MSER, SIFT, etc.
• OpenCV: one-stop experience; the most popular vision library
• Others: there is certainly no lack of software!
VLFeat: an open and portable library of computer vision software
Linux, Windows, Mac
Quick start with MATLAB:

>> untar('http://www.vlfeat.org/download/vlfeat-0.9.16-bin.tar.gz')
>> run vlfeat-0.9.16/toolbox/vl_setup

Since 2008; ACM OSS award
http://www.vlfeat.org/benchmarks/index.html
SIFT detector and descriptor

1. Load an image:
   im = imread('oxford.jpg') ;
2. Convert it to single-precision grayscale:
   imgs = im2single(rgb2gray(im)) ;
3. Run SIFT:
   [frames, descrs] = vl_sift(imgs) ;
4. Visualise the keypoints:
   imagesc(im) ; colormap gray ; hold on ;
   vl_plotframe(frames) ;
5. Visualise a descriptor:
   vl_plotsiftdescriptor(descrs(:,432), frames(:,432)) ;
Example: matching features
684 tentative matches

Example: stitching
• Live demonstration
684 tentative matches; 492 (76%) inliers; mosaic
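The tentative matches come from comparing SIFT descriptors between the two images. A common recipe (not necessarily the exact one used in the demo) is nearest-neighbour matching with Lowe's ratio test; a minimal numpy sketch, with toy 3-dimensional "descriptors" standing in for real 128-dimensional SIFT ones:

```python
import numpy as np

def match_ratio_test(d1, d2, ratio=0.8):
    """Match columns of d1 to columns of d2 (descriptors) using
    nearest neighbour with Lowe's ratio test."""
    matches = []
    for i in range(d1.shape[1]):
        # squared Euclidean distance to every descriptor in d2
        dist = np.sum((d2 - d1[:, i:i+1]) ** 2, axis=0)
        nn = np.argsort(dist)[:2]            # two nearest neighbours
        if dist[nn[0]] < ratio ** 2 * dist[nn[1]]:
            matches.append((i, nn[0]))       # accept unambiguous match
    return matches

# toy example: two descriptors vs three descriptors
d1 = np.array([[0., 1.], [0., 0.], [0., 0.]])
d2 = np.array([[0., 1., 5.], [0., 0., 0.], [0.1, 0., 0.]])
print(match_ratio_test(d1, d2))  # [(0, 0), (1, 1)]
```

The inliers are then selected from these tentative matches by robustly fitting a geometric model (e.g. a homography for the mosaic).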
Other covariant detectors

frames = vl_covdet(imgs, 'verbose') ;
imagesc(im) ; hold on ;
vl_plotframe(frames) ;

Do not forget to convert the image to grayscale in single precision: use im2single(im), not single(im)!
Understanding feature frames

• frames has one feature frame per column, containing a stacked affine transformation (A, T).

The affine transformation maps a point (bx, by) of the canonical reference frame to a point (x, y) in the image:

   [x ; y] = [a11 a12 ; a21 a22] [bx ; by] + [t1 ; t2]

Each column stacks the translation and the matrix A in column-major order:

   frames(:,i) = [t1 ; t2 ; a11 ; a21 ; a12 ; a22]
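Assuming the column layout above, unpacking a frame and mapping a reference point into the image can be sketched in numpy (the frame values below are made up for illustration):

```python
import numpy as np

def frame_to_affine(frame_col):
    """Unpack a frame column [t1 t2 a11 a21 a12 a22]
    into the affine matrix A and translation T."""
    t1, t2, a11, a21, a12, a22 = frame_col
    A = np.array([[a11, a12], [a21, a22]])
    T = np.array([t1, t2])
    return A, T

# made-up frame: scale 2, no rotation, centred at (100, 50)
frame = np.array([100., 50., 2., 0., 0., 2.])
A, T = frame_to_affine(frame)

# map a point on the unit circle of the reference frame into the image
p = A @ np.array([1., 0.]) + T
print(p)  # [102.  50.]
```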
A frame as an oriented ellipse

The reference frame (the unit circle with two attached vectors) is mapped by (A, T) to an oriented ellipse in the image; the oriented ellipse uniquely identifies the affine transformation.
Different cornerness measures

frames = vl_covdet(imgs, 'method', 'Hessian') ;

• Difference of Gaussians (approximates the Laplacian, i.e. the trace of the Hessian, as in SIFT)
• Hessian (determinant of the Hessian)
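To illustrate the determinant-of-Hessian measure, here is a finite-difference sketch in numpy; this is an illustration only, not VLFeat's implementation, which computes the response over a Gaussian scale space:

```python
import numpy as np

def hessian_det_response(im):
    """Determinant-of-Hessian cornerness via finite differences."""
    dy, dx = np.gradient(im)       # first derivatives
    dyy, dyx = np.gradient(dy)     # second derivatives
    dxy, dxx = np.gradient(dx)
    return dxx * dyy - dxy * dyx

# a Gaussian blob gives a strong positive response at its centre
y, x = np.mgrid[0:21, 0:21]
blob = np.exp(-((x - 10.) ** 2 + (y - 10.) ** 2) / 8.)
resp = hessian_det_response(blob)
print(resp[10, 10] > 0)  # True
```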
Different cornerness measures

frames = vl_covdet(imgs, 'method', 'HarrisLaplace') ;

• Harris-Laplace (Harris cornerness; Laplacian for scale selection)
• Hessian-Laplace
Different cornerness measures

frames = vl_covdet(imgs, 'method', 'MultiscaleHarris') ;

• Multiscale Harris (Harris detections at multiple scales, without scale selection)
• Multiscale Hessian
Affine adaptation

frames = vl_covdet(imgs, 'EstimateAffineShape', true) ;
Upright features

By default the rotation is selected so that the vertical axis stays vertical.

Remark: for general affine transformations the horizontal direction may appear rotated even for upright features. In formulas:

   [x ; y] = [a11 0 ; a21 a22] [bx ; by] + [t1 ; t2]
Orientation detection

frames = vl_covdet(imgs, 'EstimateOrientation', true) ;
Computing descriptors

[frames, descrs] = vl_covdet(imgs) ;

• Each column of descrs is a stacked SIFT descriptor.
• How is a SIFT descriptor associated to a feature?
• What about other descriptors?

Computing descriptors

[frames, descrs] = vl_covdet(imgs, 'Descriptor', 'patch') ;

• Each column of descrs is now a stacked normalised image patch.
• Normalisation and patch resampling are controlled by three parameters:
The geometry of a normalised patch

• PatchResolution (number of pixels)
• PatchRelativeExtent (size of the patch in normalised units)
• PatchRelativeSmoothing (antialiasing effect)

Example: normalised patch for SIFT
PatchRelativeSmoothing = 1, PatchResolution = 15; patches shown at relative extents 1, 3 and 6.
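Conceptually, the normalised patch is obtained by mapping each patch pixel's normalised coordinates through the frame's affine transformation (A, T) and sampling the image there. A numpy sketch of this idea, using nearest-neighbour sampling and no antialiasing (the parameter values are illustrative):

```python
import numpy as np

def extract_patch(im, A, T, resolution=15, extent=7.5):
    """Sample a (2*resolution+1)^2 normalised patch: each patch pixel's
    normalised coordinate in [-extent, extent]^2 is mapped by (A, T)
    into the image and read back (nearest neighbour, no smoothing)."""
    n = 2 * resolution + 1
    coords = np.linspace(-extent, extent, n)
    patch = np.zeros((n, n))
    for i, yn in enumerate(coords):
        for j, xn in enumerate(coords):
            x, y = A @ np.array([xn, yn]) + T
            xi, yi = int(round(x)), int(round(y))
            if 0 <= yi < im.shape[0] and 0 <= xi < im.shape[1]:
                patch[i, j] = im[yi, xi]
    return patch

# toy image with a bright pixel at (row 12, col 20); frame centred at (20, 12)
im = np.zeros((32, 32)) ; im[12, 20] = 1.0
patch = extract_patch(im, np.eye(2), np.array([20., 12.]))
print(patch[15, 15])  # the patch centre lands on the bright pixel
```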
Example: running affine adaptation on a grid

Custom frames

frames = vl_covdet(imgs, ...
                   'Frames', frames, ...
                   'EstimateAffineShape', true, ...
                   'EstimateOrientation', true) ;
Obtaining additional information on the features

[frames, descrs, info] = vl_covdet(imgs) ;

info =
                     gss: [1x1 struct]
                     css: [1x1 struct]
              peakScores: [1x351 single]
              edgeScores: [1x351 single]
       orientationScores: [1x351 single]
    laplacianScaleScores: [1x351 single]

[Histograms: occurrences of peak scores (roughly -0.1 to 0.1) and edge scores (roughly 0 to 10).]
Obtaining the scale space

[frames, descrs, info] = vl_covdet(imgs) ;

• info.gss: the Gaussian scale space (octaves -1 to 3 shown).
• info.css: the cornerness measure (scale space).
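Conceptually, a Gaussian scale space blurs the image at geometrically spaced scales and halves its resolution once per octave. A toy numpy sketch of this structure (the octave and level counts here are illustrative, not VLFeat's defaults):

```python
import numpy as np

def gauss_blur(im, sigma):
    """Separable Gaussian blur implemented with 1-D convolutions."""
    r = max(1, int(3 * sigma))
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2. * sigma ** 2)) ; k /= k.sum()
    im = np.apply_along_axis(lambda v: np.convolve(v, k, mode='same'), 0, im)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode='same'), 1, im)

def gaussian_scale_space(im, num_octaves=3, levels=3, sigma0=1.6):
    """Toy Gaussian scale space: per octave, blur at geometrically
    spaced scales, then halve the image for the next octave."""
    gss = []
    current = im.astype(float)
    for _ in range(num_octaves):
        gss.append([gauss_blur(current, sigma0 * 2 ** (l / levels))
                    for l in range(levels)])
        current = current[::2, ::2]
    return gss

gss = gaussian_scale_space(np.random.rand(64, 64))
print(len(gss), gss[0][0].shape, gss[2][0].shape)  # 3 (64, 64) (16, 16)
```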
Performance comparison

Oxford 5K lite retrieval benchmark.
[Bar charts: mAP (ranging from 68.7% to 85.1%) and number of features per image (ranging from 765 to 7131) for the VLFeat (VL) DoG, Hessian, HessianLaplace and HarrisLaplace detectors (each also run on a doubled image), VGG hes, VGG har and CMP Hessian.]
Repeatability: viewpoint

[Plot: repeatability (%) versus viewpoint angle (20 to 60 degrees) on the graf sequence, for VL DoG, VL Hessian, VL HessianLaplace, VL HarrisLaplace (each also on a doubled image), VLFeat SIFT, CMP Hessian, VGG hes and VGG har.]
Repeatability: viewpoint

[Plot: repeatability (%) versus viewpoint angle (20 to 60 degrees) on the wall sequence, for the VLFeat, CMP and VGG detectors.]
Repeatability: scale

[Plot: repeatability (%) versus scale change (1.12 to 2.8) on the boat sequence, for the VLFeat, CMP and VGG detectors.]
Repeatability: JPEG compression

[Plot: repeatability (%) versus JPEG compression (60% to 98%) on the ubc sequence, for the VLFeat, CMP and VGG detectors.]
Maximally Stable Extremal Regions (MSER) [Matas et al. 2002]

• Extract MSER regions and fitted ellipse frames:
  imgs = uint8(rgb2gray(I)) ;
  [regions, frames] = vl_mser(imgs) ;

• Control region stability:
  [regions, frames] = vl_mser(imgs, ...
                              'MinDiversity', 0.7, ...
                              'MaxVariation', 0.2, ...
                              'Delta', 10) ;
Many other features and tools
• superpixels, agglomerative information bottleneck, MSER, distance transform
• feature maps, stochastic gradient SVM, fast k-means, randomized kd-trees
• fast dense SIFT
• Histogram of Oriented Gradients (HOG), SLIC superpixels
• new SGD learning of SVMs

HARVEST Programme
OpenCV

• The largest, best-known vision library
• Features:
  - OpenCV 2.1: Harris, Tomasi & Shi, MSER, DoG, SURF, STAR and FAST detectors; DoG + SIFT, SURF, Harris, GoodFeaturesToTrack
  - OpenCV 2.4: ORB, BRIEF and FREAK

Example: computing ORB features.

#include <opencv2/core/core.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <vector>

using namespace cv;

Mat image = importImage8UC1(in[IN_I]);   // load an 8-bit grayscale image
Mat descriptors;
std::vector<KeyPoint> frames;

ORB orb(numFeatures, scaleFactor, numLevels, edgeThreshold,
        0, wtak, scoreTypeToORBPar(scoreType), patchSize);
orb(image, Mat(), frames);               // detect ORB keypoints
This is not all, folks

• Abundance of open-source / free software for CV; always check the authors' home pages.
• Examples:
  - Rob Hess SIFT: http://robwhess.github.com/opensift/
  - OpenSURF: http://www.chrisevansdev.com/computer-vision-opensurf.html
  - SURF: http://www.vision.ee.ethz.ch/~surf/
  - FAST corners: http://www.edwardrosten.com/work/fast.html
  - LIOP descriptors: http://vision.ia.ac.cn/Students/bfan/index.htm
  - ...
Credits

HARVEST Programme
Karel Lenc, Daniele Perrone, Michal Perdoch, Andrew Zisserman
Cordelia Schmid, Krystian Mikolajczyk, Jiri Matas, Tinne Tuytelaars