Modern Features, Part 3: Software
TRANSCRIPT
Andrea Vedaldi, University of Oxford
Software: feature implementation and computation
https://sites.google.com/site/eccv12features/
goo.gl/jKC2J
• VLFeat: MATLAB ease of use; covariant detectors, MSER, SIFT, etc.
• OpenCV: one-stop experience; the most popular vision library
• Others: there is certainly no lack of software!
VLFeat: an open and portable library of computer vision software
Linux, Windows, Mac
Quick start with MATLAB:

>> untar('http://www.vlfeat.org/download/vlfeat-0.9.16-bin.tar.gz')
>> run vlfeat-0.9.16/toolbox/vl_setup

Since 2008; ACM OSS award
http://www.vlfeat.org/benchmarks/index.html
SIFT detector and descriptor

1. Load an image:
   im = imread('oxford.jpg') ;
2. Convert it to single-precision grayscale:
   imgs = im2single(rgb2gray(im)) ;
3. Run SIFT:
   [frames, descrs] = vl_sift(imgs) ;
4. Visualise the keypoints:
   imagesc(im) ; colormap gray ; hold on ;
   vl_plotframe(frames) ;
5. Visualise a descriptor:
   vl_plotsiftdescriptor(descrs(:,432), frames(:,432)) ;
Example: matching features
684 tentative matches

Example: stitching
• Live demonstration
684 tentative matches; 492 (76%) inliers; mosaic
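The tentative matches come from comparing SIFT descriptors between the two images. A common recipe (not necessarily the exact one used in the demo) is nearest-neighbour matching with Lowe's ratio test; a minimal numpy sketch, with toy 3-dimensional "descriptors" standing in for real 128-dimensional SIFT ones:

```python
import numpy as np

def match_ratio_test(d1, d2, ratio=0.8):
    """Match columns of d1 to columns of d2 (descriptors) using
    nearest neighbour with Lowe's ratio test."""
    matches = []
    for i in range(d1.shape[1]):
        # squared Euclidean distance to every descriptor in d2
        dist = np.sum((d2 - d1[:, i:i+1]) ** 2, axis=0)
        nn = np.argsort(dist)[:2]            # two nearest neighbours
        if dist[nn[0]] < ratio ** 2 * dist[nn[1]]:
            matches.append((i, nn[0]))       # accept unambiguous match
    return matches

# toy example: two descriptors vs three descriptors
d1 = np.array([[0., 1.], [0., 0.], [0., 0.]])
d2 = np.array([[0., 1., 5.], [0., 0., 0.], [0.1, 0., 0.]])
print(match_ratio_test(d1, d2))  # [(0, 0), (1, 1)]
```

The inliers are then selected from these tentative matches by robustly fitting a geometric model (e.g. a homography for the mosaic).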
Other covariant detectors

frames = vl_covdet(imgs, 'verbose') ;
imagesc(im) ; hold on ;
vl_plotframe(frames) ;

Do not forget to convert the image to grayscale in single precision: use im2single(im), not single(im)!
Understanding feature frames

• frames has one feature frame per column, containing a stacked affine transformation (A, T).

The affine transformation maps a point (bx, by) of the canonical reference frame to a point (x, y) in the image:

   [x ; y] = [a11 a12 ; a21 a22] [bx ; by] + [t1 ; t2]

Each column stacks the translation and the matrix A in column-major order:

   frames(:,i) = [t1 ; t2 ; a11 ; a21 ; a12 ; a22]
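Assuming the column layout above, unpacking a frame and mapping a reference point into the image can be sketched in numpy (the frame values below are made up for illustration):

```python
import numpy as np

def frame_to_affine(frame_col):
    """Unpack a frame column [t1 t2 a11 a21 a12 a22]
    into the affine matrix A and translation T."""
    t1, t2, a11, a21, a12, a22 = frame_col
    A = np.array([[a11, a12], [a21, a22]])
    T = np.array([t1, t2])
    return A, T

# made-up frame: scale 2, no rotation, centred at (100, 50)
frame = np.array([100., 50., 2., 0., 0., 2.])
A, T = frame_to_affine(frame)

# map a point on the unit circle of the reference frame into the image
p = A @ np.array([1., 0.]) + T
print(p)  # [102.  50.]
```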
A frame as an oriented ellipse

The reference frame (the unit circle with two attached vectors) is mapped by (A, T) to an oriented ellipse in the image; the oriented ellipse uniquely identifies the affine transformation.
Different cornerness measures

frames = vl_covdet(imgs, 'method', 'Hessian') ;

• Difference of Gaussians (approximates the Laplacian, i.e. the trace of the Hessian, as in SIFT)
• Hessian (determinant of the Hessian)
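To illustrate the determinant-of-Hessian measure, here is a finite-difference sketch in numpy; this is an illustration only, not VLFeat's implementation, which computes the response over a Gaussian scale space:

```python
import numpy as np

def hessian_det_response(im):
    """Determinant-of-Hessian cornerness via finite differences."""
    dy, dx = np.gradient(im)       # first derivatives
    dyy, dyx = np.gradient(dy)     # second derivatives
    dxy, dxx = np.gradient(dx)
    return dxx * dyy - dxy * dyx

# a Gaussian blob gives a strong positive response at its centre
y, x = np.mgrid[0:21, 0:21]
blob = np.exp(-((x - 10.) ** 2 + (y - 10.) ** 2) / 8.)
resp = hessian_det_response(blob)
print(resp[10, 10] > 0)  # True
```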
Different cornerness measures

frames = vl_covdet(imgs, 'method', 'HarrisLaplace') ;

• Harris-Laplace (Harris cornerness; Laplacian for scale selection)
• Hessian-Laplace
Different cornerness measures

frames = vl_covdet(imgs, 'method', 'MultiscaleHarris') ;

• Multiscale Harris (Harris detections at multiple scales, without scale selection)
• Multiscale Hessian
Affine adaptation

frames = vl_covdet(imgs, 'EstimateAffineShape', true) ;
Upright features

By default the rotation is selected so that the vertical axis stays vertical.

Remark: for general affine transformations the horizontal direction may appear rotated even for upright features. In formulas:

   [x ; y] = [a11 0 ; a21 a22] [bx ; by] + [t1 ; t2]
Orientation detection

frames = vl_covdet(imgs, 'EstimateOrientation', true) ;
Computing descriptors

[frames, descrs] = vl_covdet(imgs) ;

• Each column of descrs is a stacked SIFT descriptor.
• How is a SIFT descriptor associated to a feature?
• What about other descriptors?

Computing descriptors

[frames, descrs] = vl_covdet(imgs, 'Descriptor', 'patch') ;

• Each column of descrs is now a stacked normalised image patch.
• Normalisation and patch resampling are controlled by three parameters:
The geometry of a normalised patch

• PatchResolution (number of pixels)
• PatchRelativeExtent (size of the patch in normalised units)
• PatchRelativeSmoothing (antialiasing effect)

Example: normalised patch for SIFT
PatchRelativeSmoothing = 1, PatchResolution = 15; patches shown at relative extents 1, 3 and 6.
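Conceptually, the normalised patch is obtained by mapping each patch pixel's normalised coordinates through the frame's affine transformation (A, T) and sampling the image there. A numpy sketch of this idea, using nearest-neighbour sampling and no antialiasing (the parameter values are illustrative):

```python
import numpy as np

def extract_patch(im, A, T, resolution=15, extent=7.5):
    """Sample a (2*resolution+1)^2 normalised patch: each patch pixel's
    normalised coordinate in [-extent, extent]^2 is mapped by (A, T)
    into the image and read back (nearest neighbour, no smoothing)."""
    n = 2 * resolution + 1
    coords = np.linspace(-extent, extent, n)
    patch = np.zeros((n, n))
    for i, yn in enumerate(coords):
        for j, xn in enumerate(coords):
            x, y = A @ np.array([xn, yn]) + T
            xi, yi = int(round(x)), int(round(y))
            if 0 <= yi < im.shape[0] and 0 <= xi < im.shape[1]:
                patch[i, j] = im[yi, xi]
    return patch

# toy image with a bright pixel at (row 12, col 20); frame centred at (20, 12)
im = np.zeros((32, 32)) ; im[12, 20] = 1.0
patch = extract_patch(im, np.eye(2), np.array([20., 12.]))
print(patch[15, 15])  # the patch centre lands on the bright pixel
```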
Example: running affine adaptation on a grid

Custom frames

frames = vl_covdet(imgs, ...
                   'Frames', frames, ...
                   'EstimateAffineShape', true, ...
                   'EstimateOrientation', true) ;
Obtaining additional information on the features

[frames, descrs, info] = vl_covdet(imgs) ;

info =
                     gss: [1x1 struct]
                     css: [1x1 struct]
              peakScores: [1x351 single]
              edgeScores: [1x351 single]
       orientationScores: [1x351 single]
    laplacianScaleScores: [1x351 single]

[Histograms: occurrences of peak scores (roughly -0.1 to 0.1) and edge scores (roughly 0 to 10).]
Obtaining the scale space

[frames, descrs, info] = vl_covdet(imgs) ;

• info.gss: the Gaussian scale space (octaves -1 to 3 shown).
• info.css: the cornerness measure (scale space).
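Conceptually, a Gaussian scale space blurs the image at geometrically spaced scales and halves its resolution once per octave. A toy numpy sketch of this structure (the octave and level counts here are illustrative, not VLFeat's defaults):

```python
import numpy as np

def gauss_blur(im, sigma):
    """Separable Gaussian blur implemented with 1-D convolutions."""
    r = max(1, int(3 * sigma))
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2. * sigma ** 2)) ; k /= k.sum()
    im = np.apply_along_axis(lambda v: np.convolve(v, k, mode='same'), 0, im)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode='same'), 1, im)

def gaussian_scale_space(im, num_octaves=3, levels=3, sigma0=1.6):
    """Toy Gaussian scale space: per octave, blur at geometrically
    spaced scales, then halve the image for the next octave."""
    gss = []
    current = im.astype(float)
    for _ in range(num_octaves):
        gss.append([gauss_blur(current, sigma0 * 2 ** (l / levels))
                    for l in range(levels)])
        current = current[::2, ::2]
    return gss

gss = gaussian_scale_space(np.random.rand(64, 64))
print(len(gss), gss[0][0].shape, gss[2][0].shape)  # 3 (64, 64) (16, 16)
```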
Performance comparison

Oxford 5K lite retrieval benchmark.
[Bar charts: mAP (ranging from 68.7% to 85.1%) and number of features per image (ranging from 765 to 7131) for the VLFeat (VL) DoG, Hessian, HessianLaplace and HarrisLaplace detectors (each also run on a doubled image), VGG hes, VGG har and CMP Hessian.]
Repeatability: viewpoint

[Plot: repeatability (%) versus viewpoint angle (20 to 60 degrees) on the graf sequence, for VL DoG, VL Hessian, VL HessianLaplace, VL HarrisLaplace (each also on a doubled image), VLFeat SIFT, CMP Hessian, VGG hes and VGG har.]
Repeatability: viewpoint

[Plot: repeatability (%) versus viewpoint angle (20 to 60 degrees) on the wall sequence, for the VLFeat, CMP and VGG detectors.]
Repeatability: scale

[Plot: repeatability (%) versus scale change (1.12 to 2.8) on the boat sequence, for the VLFeat, CMP and VGG detectors.]
Repeatability: JPEG compression

[Plot: repeatability (%) versus JPEG compression (60% to 98%) on the ubc sequence, for the VLFeat, CMP and VGG detectors.]
Maximally Stable Extremal Regions (MSER) [Matas et al. 2002]

• Extract MSER regions and fitted ellipse frames:
  imgs = uint8(rgb2gray(I)) ;
  [regions, frames] = vl_mser(imgs) ;

• Control region stability:
  [regions, frames] = vl_mser(imgs, ...
                              'MinDiversity', 0.7, ...
                              'MaxVariation', 0.2, ...
                              'Delta', 10) ;
Many other features and tools
• superpixels, agglomerative information bottleneck, MSER, distance transform
• feature maps, stochastic gradient SVM, fast k-means, randomized kd-trees
• fast dense SIFT
• Histogram of Oriented Gradients (HOG), SLIC superpixels
• new SGD learning of SVMs

HARVEST Programme
OpenCV

• The largest, best-known vision library
• Features:
  - OpenCV 2.1: Harris, Tomasi & Shi, MSER, DoG, SURF, STAR and FAST detectors; DoG + SIFT, SURF, Harris, GoodFeaturesToTrack
  - OpenCV 2.4: ORB, BRIEF and FREAK

Example: computing ORB features.

#include <opencv2/core/core.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <vector>

using namespace cv;

Mat image = importImage8UC1(in[IN_I]);   // load an 8-bit grayscale image
Mat descriptors;
std::vector<KeyPoint> frames;

ORB orb(numFeatures, scaleFactor, numLevels, edgeThreshold,
        0, wtak, scoreTypeToORBPar(scoreType), patchSize);
orb(image, Mat(), frames);               // detect ORB keypoints
This is not all, folks

• Abundance of open-source / free software for CV; always check the authors' home pages.
• Examples:
  - Rob Hess SIFT: http://robwhess.github.com/opensift/
  - OpenSURF: http://www.chrisevansdev.com/computer-vision-opensurf.html
  - SURF: http://www.vision.ee.ethz.ch/~surf/
  - FAST corners: http://www.edwardrosten.com/work/fast.html
  - LIOP descriptors: http://vision.ia.ac.cn/Students/bfan/index.htm
  - ...
Credits

HARVEST Programme
Karel Lenc, Daniele Perrone, Michal Perdoch, Andrew Zisserman
Cordelia Schmid, Krystian Mikolajczyk, Jiri Matas, Tinne Tuytelaars