an analysis of scale-space sampling in...

An Analysis of Scale-space Sampling in SIFT

Ives Rey-Otero† Jean-Michel Morel† Mauricio Delbracio ?†

†CMLA, ENS-Cachan, France ?ECE, Duke University, USA

IEEE ICIP 2014, Paris

Overview

In this article, we propose:

a study of SIFT

an experimental framework consistent with SIFT’s cameramodel

an analysis of detection stability and invariance

a study on the influence of

scale-space samplingimage aliasingthresholds aiming at discarding unstable detections

Overview

Detection stability ?

Overview

This paper is not:

a variant of SIFTSURF: Speeded Up Robust Feature (Bay et al. 2006)

Affine SIFT (Yu and Morel, 2009)

Spectral-SIFT (Koutaki and Kumamoto, 2014)

a new feature descriptorOn affine invariant descriptors related to SIFT (Sadek and Caselles, 2012)

BRIEF (Calonder et al. 2010)

K-means Hashing (He et al. 2013)

a benchmarkA comparison of affine region detectors (Mikolajczyk et al. 2005)

Overview

Study the influence of scale-space sampling on SIFT

SIFT applied with 3 different sampling settings

SIFT overview

The detection step:Assumes that the input image u(x) has a Gaussian camera blur of c

Gaussian scale-space v(σ, x) = G√σ2−c2u(x)

Differential operator: DoG (difference of Gaussians)

Extract discrete 3D extrema

Refine position of 3D extrema (local quadratic model)

Filter unstable keypoints (thresholds)

SIFT camera model

The camera model adopted by SIFT approximates the point spreadfunction (PSF) by a Gaussian kernel of standard deviation c.

u =: S1GcHT Ru0

S1 sampling operator

Gaussian kernel of standard deviation c

H homothety

T translation

R rotation

SIFT overview

The camera model adopted by SIFT approximates the point spreadfunction (PSF) by a Gaussian kernel of standard deviation c.

SIFT overview

Compute the Gaussian scale-space v(σ, x) = G√σ2−c2u(x)

SIFT overview

Compute differential operator: DoG (difference of Gaussians)

Extract 3D extrema

Theoretically invariant to zoom outs

Let uλ and uµ be two different acquisitions at two differentdistances

uλ = S1GcHλu0 uµ = S1GcHµu0

Their respective scale-space

vλ(σ, x) = GσHλu0(x) vµ(σ, x) = GσHµu0(x)

Reparameterizations of v0(σ, x) = Gσu0(σ, x)

vµ(σ/µ, x/µ) = vµ(σ/λ, x/λ)

Does this perfect invariance hold in practice ?

The architecture of digital Gaussian scale-space

A set of digital images with various level of blur σ and sampled atvarious rates δ.


Supersample by a factor 1/δmin (default 1/δmin = 2)

Add extra blur (G(σmin2−c2)1/2/δmin

) to reach the minimal level of blur σmin

(default σmin = 0.8)

blur c supersampling factor 1/δmin > 1blur σmin > c


The scale-space is split into octaves, subsets of nspo images (defaultnspo = 3) sharing the same sampling rate 1/δ

Blurs follow a geometric progression

σ = σmin2s/nspo

Increase the scale-space sampling rates:

↗ 1/δmin

↗ nspo

Simulating the camera model

How to simulate snapshots having a given level of blur c

Take a large image uin with unknown level of blur cin

Apply a Gaussian filtering of standard deviation s × c

Apply a subsampling of factor s >> 1

usimul = S1/sGs×cuin

The approximated level of blur is√c2 + (cin/s)2 ≈ c

The experimental setup

Measure the invariance level by accurately simulating image pairsrelated through a scale change, a translation or a blur

Comparing the set of detected keypoints

The non repeatability ratio (NRR) is the number of keypointsdetected in one image but not detected on the other at itsexpected position (+/−0.25px in space, +/−21/4s relatively inscale) divided by the total number of detected keypoints

Influence of scale-space sampling - Translation

Subpixel translation of 0.25px

Compare the sets of keypoints for various scale-spacesettings

5 10 15 20 25 30 350

1000

2000

3000

4000

nspo

Keypoints

dmin=1

dmin=1/2

dmin=1/4

dmin=1/8

dmin=1/16

5 10 15 20 25 30 350

0.1

0.2

0.3

0.4

0.5

nspo

NRR

Influence of scale-space sampling - Zoom-out

2.15× zoom-out

Compare the sets of keypoints for various scale-spacesettings

5 10 15 20 25 30 350

100

200

300

nspo

Keypoints

dmin=1

dmin=1/2

dmin=1/4

dmin=1/8

dmin=1/16

0 5 10 15 20 25 30 350

0.1

0.2

0.3

0.4

0.5

0.6

0.7

nspo

NRR

Influence of image blur

Stability varying the level of blur c in the input image(0.30 ≤ c ≤ 1.10, no image aliasing for c > 0.75)

Subpixel translation of 0.25pxKeypoint stability to the image blur and the detectionscale

1 1.5 2 2.50

500

1000

1500

2000

2500

smin

Keypoints

c=0.30

c=0.40

c=0.50

c=0.60

c=0.70

c=0.80

c=0.90

c=1.00

c=1.10

1 1.5 2 2.50

0.1

0.2

0.3

0.4

smin

NRR

The DoG threshold

Is the DoG threshold efficient at discarding unstable keypoints ?

Subpixel translation of 0.25pxIncreasing the DoG threshold

0 0.005 0.01 0.015 0.02 0.025 0.030

200

400

600

800

dog

Keypoints

c=0.4 OverSamp

c=0.7 OverSamp

c=0.4 Lowe

c=0.7 Lowe

0 0.005 0.01 0.015 0.02 0.025 0.030

0.1

0.2

0.3

0.4

dog

NRR

The DoG threshold fails to significantly improve the overallstability of keypoints.

Conclusions

Invariance is limited by insufficient sampling ofthe Gaussian scale-space

Invariance is limited by image aliasing

The DoG threshold is not efficient

Future work:

Extend this analysis in the case where the input image blur isnot consistent with SIFT’s camera model

Analyse the influence of image noise

an analysis of scale-space sampling in...

Documents