Speeded-Up Robust Features (SURF)
Herbert Bay (a), Andreas Ess (a), Tinne Tuytelaars (b), and Luc Van Gool (a,b)
(a) ETH Zurich, BIWI, Sternwartstrasse 7, CH-8092 Zurich, Switzerland
(b) K. U. Leuven, ESAT-PSI, Kasteelpark Arenberg 10
This article presents a novel scale- and rotation-invariant detector and descriptor, coined SURF (Speeded-Up Robust Features). SURF approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster.
This is achieved by relying on integral images for image convolutions; by building on the strengths of the leading existing detectors and descriptors (specifically, using a Hessian matrix-based measure for the detector, and a distribution-based descriptor); and by simplifying these methods to the essential. This leads to a combination of novel detection, description, and matching steps.
The paper encompasses a detailed description of the detector and descriptor and then explores the effect of the most important parameters. We conclude the article with SURF's application to two challenging, yet converse goals: camera calibration as a special case of image registration, and object recognition. Our experiments underline SURF's usefulness in a broad range of topics in computer vision.
Key words: interest points, local features, feature description, camera calibration, object recognition
1. Introduction
The task of finding point correspondences between two images of the same scene or object is part of many computer vision applications. Image registration, camera calibration, object recognition, and image retrieval are just a few.
The search for discrete image point correspondences can be divided into three main steps. First, interest points are selected at distinctive locations in the image, such as corners, blobs, and T-junctions. The most valuable property of an interest point detector is its repeatability. The repeatability expresses the reliability of a detector for finding the same physical interest points under different viewing conditions. Next, the neighbourhood of every interest point is represented by a feature vector. This descriptor has to be distinctive and at the same time robust to noise, detection displacements, and geometric and photometric deformations. Finally, the descriptor vectors are matched between different images. The matching is based on a distance between the vectors, e.g. the Mahalanobis or Euclidean distance. The dimension of the descriptor has a direct impact on the time this takes, and fewer dimensions are desirable for fast interest point matching. However, lower-dimensional feature vectors are in general less distinctive than their high-dimensional counterparts.
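The matching step described above can be illustrated with a minimal sketch. It assumes brute-force nearest-neighbour search with the Euclidean distance; the function names and the toy 4-dimensional descriptors are hypothetical and chosen only for illustration. The inner loop makes the cost per comparison grow with the descriptor dimension, which is the trade-off noted in the text.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two descriptor vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("descriptors must have the same dimension")
    # One subtraction, multiplication and addition per dimension.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def match_nearest(descs_a, descs_b):
    """For each descriptor in descs_a, find its nearest neighbour in descs_b.

    Returns a list of (index_in_a, index_in_b, distance) tuples.
    """
    matches = []
    for i, d in enumerate(descs_a):
        candidates = ((j, euclidean(d, e)) for j, e in enumerate(descs_b))
        j, dist = min(candidates, key=lambda pair: pair[1])
        matches.append((i, j, dist))
    return matches


# Toy descriptors (hypothetical values, 4 dimensions for readability).
image_a = [[0.0, 1.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
image_b = [[0.5, 0.4, 0.6, 0.5], [0.1, 0.9, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]

matches = match_nearest(image_a, image_b)
for i, j, dist in matches:
    print(f"descriptor {i} in A matches descriptor {j} in B (distance {dist:.3f})")
```

A practical matcher would typically add a rejection test for ambiguous matches; that is left out here to keep the sketch short.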
It has been our goal to develop both a detector and descriptor that, in comparison to the state-of-the-art, are fast to compute while not sacrificing performance. In order to succeed, one has to strike a balance between the above requirements, such as simplifying the detection scheme while keeping it accurate, and reducing the descriptor's size while keeping it sufficiently distinctive.
A wide variety of detectors and descriptors have already been proposed in the literature (e.g. [21,24,27,37,39,25]). Also, detailed comparisons and evaluations on benchmarking datasets have been performed [28,30,31]. Our fast detector and descriptor, called SURF (Speeded-Up Robust Features), was introduced in . It is built on the insights gained from this previous work. In our experiments on these benchmarking datasets, SURF's detector and descriptor are not only faster, but the former is also more repeatable and the latter more distinctive.
Preprint submitted to Elsevier 10 September 2008
We focus on scale and in-plane rotation invariant detectors and descriptors. These seem to offer a good compromise between feature complexity and robustness to commonly occurring deformations. Skew, anisotropic scaling, and perspective effects are assumed to be second-order effects that are covered to some degree by the overall robustness of the descriptor. Note that the descriptor can be extended towards affine invariant regions using affine normalisation of the ellipse (cf. ), although this will have an impact on the computation time. Extending the detector, on the other hand, is less straightforward. Concerning the photometric deformations, we assume a simple linear model with a bias (offset) and contrast change (scale factor). Neither detector nor descriptor uses colour information.
In section 3, we describe the strategy applied for fast and robust interest point detection. The input image is analysed at different scales in order to guarantee invariance to scale changes. The detected interest points are provided with a rotation and scale invariant descriptor in section 4. Furthermore, a simple and efficient first-line indexing technique, based on the contrast of the interest point with its surrounding, is proposed.
In section 5, some of the available parameters and their effects are discussed, including the benefits of an upright version (not invariant to image rotation). We also investigate SURF's performance in two important application scenarios. First, we consider a special case of image registration, namely the problem of camera calibration for 3D reconstruction. Second, we will explore SURF's application to an object recognition experiment. Both applications highlight SURF's benefits in terms of speed and robustness as opposed to other strategies. The article is concluded in section 6.
2. Related Work
2.1. Interest Point Detection
The most widely used detector is probably the Harris corner detector , proposed back in 1988. It is based on the eigenvalues of the second moment matrix. However, Harris corners are not scale-invariant. Lindeberg  introduced the concept of automatic scale selection. This allows interest points to be detected in an image, each with its own characteristic scale. He experimented with both the determinant of the Hessian matrix and the Laplacian (which corresponds to the trace of the Hessian matrix) to detect blob-like structures. Mikolajczyk and Schmid  refined this method, creating robust and scale-invariant feature detectors with high repeatability, which they coined Harris-Laplace and Hessian-Laplace. They used a (scale-adapted) Harris measure or the determinant of the Hessian matrix to select the location, and the Laplacian to select the scale. Focusing on speed, Lowe  proposed to approximate the Laplacian of Gaussians (LoG) by a Difference of Gaussians (DoG) filter.
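The DoG approximation mentioned above can be checked numerically. The sketch below is a 1-D analogue, not the 2-D filters used in practice. It relies on a standard property of the Gaussian that is not stated in this text: its derivative with respect to sigma equals sigma times its second spatial derivative. For a scale ratio k close to 1, this gives G(x, k*sigma) - G(x, sigma) ≈ (k - 1) * sigma^2 * G''(x, sigma). The function names and the values of sigma and k are illustrative assumptions.

```python
import math


def gaussian(x, sigma):
    """Normalised 1-D Gaussian."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)


def gaussian_second_derivative(x, sigma):
    """Analytic second derivative of the Gaussian (1-D counterpart of the LoG)."""
    return gaussian(x, sigma) * (x * x / sigma ** 4 - 1.0 / sigma ** 2)


def difference_of_gaussians(x, sigma, k):
    """DoG: Gaussian at scale k*sigma minus Gaussian at scale sigma."""
    return gaussian(x, k * sigma) - gaussian(x, sigma)


def scaled_log(x, sigma, k):
    """Scale-normalised second derivative that the DoG approximates."""
    return (k - 1.0) * sigma ** 2 * gaussian_second_derivative(x, sigma)


sigma, k = 2.0, 1.05  # assumed values; k close to 1 keeps the approximation tight
samples = [0.0, 1.0, 3.0, 5.0]
comparison = [
    (x, difference_of_gaussians(x, sigma, k), scaled_log(x, sigma, k))
    for x in samples
]

for x, dog, log in comparison:
    print(f"x={x:4.1f}  DoG={dog:+.6f}  scaled LoG={log:+.6f}")
```

The two columns agree in sign and closely in magnitude, which is why a subtraction of two smoothed images can stand in for an explicit Laplacian filter.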
Several other scale-invariant interest point detectors have been proposed. Examples are the salient region detector, proposed by Kadir and Brady , which maximises the entropy within the region, and the edge-based region detector proposed by Jurie and Schmid . They seem less amenable to acceleration though. Also several affine-invariant feature detectors have been proposed that can cope with wider viewpoint changes. However, these fall outside the scope of this article.
From studying the existing detectors and from published comparisons [29,30], we can conclude that Hessian-based detectors are more stable and repeatable than their Harris-based counterparts. Moreover, using the determinant of the Hessian matrix rather than its trace (the Laplacian) seems advantageous, as it fires less on elongated, ill-localised structures. We also observed that approximations like the DoG can bring speed at a low cost in terms of lost accuracy.
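The remark about elongated structures can be made concrete with a small numerical sketch. The second-derivative values below are hypothetical, chosen to contrast a blob (strong curvature in both directions) with a ridge (strong curvature in one direction only). The trace responds about equally to both, whereas the determinant, being the product of the two principal curvatures, nearly vanishes on the ridge.

```python
def hessian_responses(lxx, lyy, lxy):
    """Return (determinant, trace) of the 2x2 Hessian [[lxx, lxy], [lxy, lyy]]."""
    det = lxx * lyy - lxy * lxy
    trace = lxx + lyy
    return det, trace


# Hypothetical second derivatives at two image locations.
blob = hessian_responses(-4.0, -4.0, 0.0)    # curvature in both directions
ridge = hessian_responses(-8.0, -0.1, 0.0)   # curvature in one direction only

print("blob : det=%.2f trace=%.2f" % blob)
print("ridge: det=%.2f trace=%.2f" % ridge)
```

With these values, the ridge's trace magnitude is slightly larger than the blob's, while its determinant is twenty times smaller. A detector thresholding on the determinant would therefore reject the ridge.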
2.2. Interest Point Description
An even larger variety of feature descriptors has been proposed, like Gaussian derivatives , moment invariants, complex features [1,36], steerable filters , phase-based local features , and descriptors representing the distribution of smaller-scale features within the interest point neighbourhood. The latter, introduced by Lowe , have been shown to outperform the others . This can be explained by the fact that they capture a substantial amount of information about the spatial intensity patterns, while at the same time being robust to small deformations or localisation errors. The descriptor in , called SIFT for short, computes a histogram of local oriented gradients around the interest point and stores the bins in a 128-dimensional vector (8 orientation bins for each of 4 × 4 location bins).
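The 128-dimensional layout (4 × 4 location bins times 8 orientation bins) can be sketched as follows. This is a deliberately simplified illustration, not the SIFT implementation. It assumes hard binning, no Gaussian weighting, no interpolation between bins, and a final unit-length normalisation; the function name and the 16 × 16 toy patch are hypothetical.

```python
import math

GRID = 4          # 4 x 4 location bins
ORIENTATIONS = 8  # 8 orientation bins per location bin


def sift_like_descriptor(dx, dy):
    """Accumulate gradient magnitudes into a 4*4*8 = 128-bin histogram.

    dx, dy: square 2-D lists of gradient components; the side length must
    be divisible by GRID. Simplified: hard binning, no weighting.
    """
    size = len(dx)
    if size % GRID != 0:
        raise ValueError("patch side must be divisible by %d" % GRID)
    cell = size // GRID
    desc = [0.0] * (GRID * GRID * ORIENTATIONS)
    for r in range(size):
        for c in range(size):
            gx, gy = dx[r][c], dy[r][c]
            magnitude = math.hypot(gx, gy)
            angle = math.atan2(gy, gx) % (2.0 * math.pi)
            # Map the angle in [0, 2*pi) to one of the 8 orientation bins.
            o = int(angle / (2.0 * math.pi) * ORIENTATIONS) % ORIENTATIONS
            location = (r // cell) * GRID + (c // cell)
            desc[location * ORIENTATIONS + o] += magnitude
    # Assumed normalisation to unit length.
    norm = math.sqrt(sum(v * v for v in desc))
    return [v / norm for v in desc] if norm > 0 else desc


# Toy 16 x 16 patch with a constant gradient pointing along +x.
patch_dx = [[1.0] * 16 for _ in range(16)]
patch_dy = [[0.0] * 16 for _ in range(16)]
descriptor = sift_like_descriptor(patch_dx, patch_dy)

print(len(descriptor))
```

For this uniform patch, every location bin places all of its weight in orientation bin 0, so only 16 of the 128 entries are non-zero.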
Various refinements on this basic scheme have been proposed. Ke and Sukthankar  applied PCA on the gradient image around the detected interest point. This PCA-SIFT yields a 36-dimensional descriptor which is fast for matching, but proved to be less distinctive than SIFT in a second comparative study by Mikolajczyk ; and applying PCA slows down feature computation. In the same paper, the authors proposed a variant of SIFT, called GLOH, which proved to be even more distinctive with the same number of dimensions. However, GLOH is computationally more expensive as it again uses PCA for data compression.
The SIFT descriptor still seems the most appealing descriptor for practical uses, and hence also the most widely used nowadays. It is distinctive and relatively fast, which is crucial for on-line applications. Recently, Se et al. implemented SIFT on a Field Programmable Gate Array (FPGA) and improved its speed by an order of magnitude. Meanwhile, Grabner et al.  also used integral images to approximate SIFT. Their detection step is based on difference-of-mean (without interpolation), their description step on integral histograms. They achieve about the same speed as we do (though the description step is constant in speed), but at the cost of reduced quality compared to SIFT. Generally, the high dimensionality of the descriptor is a draw