Scale Invariant Feature Transform (SIFT)

Outline

What is SIFT

Algorithm overview

Object Detection

Summary

Overview

1999

Generates image features, “keypoints”
– invariant to image scaling and rotation
– partially invariant to change in illumination and 3D camera viewpoint
– many can be extracted from typical images
– highly distinctive

Algorithm overview

Scale-space extrema detection
– Uses difference-of-Gaussian function

Keypoint localization
– Sub-pixel location and scale fit to a model

Orientation assignment
– 1 or more for each keypoint

Keypoint descriptor
– Created from local image gradients

Scale space

Definition:

$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$$

where

$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/(2\sigma^2)}$$
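The definition above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names, the truncation of the kernel at three standard deviations, and the replicated image border are all assumptions made for the example.

```python
import math

def gaussian_kernel(sigma, radius=None):
    """Sample G(x, y, sigma) on a (2r+1) x (2r+1) grid and normalize to sum 1."""
    if radius is None:
        radius = max(1, int(math.ceil(3 * sigma)))  # assumption: 3-sigma truncation
    kernel = []
    for y in range(-radius, radius + 1):
        row = []
        for x in range(-radius, radius + 1):
            g = math.exp(-(x * x + y * y) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)
            row.append(g)
        kernel.append(row)
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]

def convolve(image, kernel):
    """L = G * I for a list-of-lists image; border pixels are replicated (assumed).

    The kernel is symmetric, so no flip is needed for a true convolution.
    """
    h, w = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel[dy + r][dx + r] * image[yy][xx]
            out[y][x] = acc
    return out
```

A call such as `convolve(image, gaussian_kernel(1.6))` would give one level L(x, y, σ) of the scale space. A practical implementation would normally use separable 1D filters from an array library instead of nested loops.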


Keypoints are detected using scale-space extrema in the difference-of-Gaussian function D

D definition:

$$D(x, y, \sigma) = (G(x, y, k\sigma) - G(x, y, \sigma)) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma)$$

Efficient to compute
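Because D is just the difference of two smoothed images, a whole stack of D images can be formed by subtracting adjacent scale-space levels. The sketch below assumes the smoothed images are already available as lists of lists ordered by increasing scale; the name `dog_stack` is hypothetical.

```python
def dog_stack(gaussian_images):
    """Given [L(sigma), L(k*sigma), L(k^2*sigma), ...], return the list of
    differences D_i = L_{i+1} - L_i, one per adjacent pair of levels."""
    stack = []
    for lower, upper in zip(gaussian_images, gaussian_images[1:]):
        diff = [
            [u - l for u, l in zip(upper_row, lower_row)]
            for upper_row, lower_row in zip(upper, lower)
        ]
        stack.append(diff)
    return stack
```

With n smoothed images this yields n - 1 difference images, each costing one subtraction per pixel.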

Relationship of D to σ²∇²G

Close approximation to the scale-normalized Laplacian of Gaussian, σ²∇²G

Diffusion equation:

$$\frac{\partial G}{\partial \sigma} = \sigma \nabla^2 G$$

Approximate ∂G/∂σ:

$$\sigma \nabla^2 G = \frac{\partial G}{\partial \sigma} \approx \frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma}$$

– giving,

$$G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\sigma^2 \nabla^2 G$$

When D has scales differing by a constant factor it already incorporates the σ² scale normalization required for scale-invariance
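The approximation can be checked numerically. The sketch below compares the difference of two Gaussians against (k − 1)σ²∇²G, using the closed-form Laplacian ∇²G = ((x² + y² − 2σ²)/σ⁴)·G that follows from differentiating the Gaussian defined earlier. The function names and the sample values of σ and k are chosen only for illustration.

```python
import math

def G(x, y, sigma):
    """2D Gaussian G(x, y, sigma)."""
    return math.exp(-(x * x + y * y) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)

def laplacian_G(x, y, sigma):
    """Closed-form Laplacian of G with respect to x and y."""
    return (x * x + y * y - 2 * sigma ** 2) / sigma ** 4 * G(x, y, sigma)

def dog(x, y, sigma, k):
    """Left-hand side: G(x, y, k*sigma) - G(x, y, sigma)."""
    return G(x, y, k * sigma) - G(x, y, sigma)

def scaled_log(x, y, sigma, k):
    """Right-hand side: (k - 1) * sigma^2 * Laplacian of G."""
    return (k - 1) * sigma ** 2 * laplacian_G(x, y, sigma)

if __name__ == "__main__":
    sigma, k = 1.6, 1.01  # illustrative values; k close to 1 keeps the error small
    for x, y in [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]:
        print(x, y, dog(x, y, sigma, k), scaled_log(x, y, sigma, k))
```

The two columns agree more closely as k approaches 1, which is the sense in which the finite difference approximates ∂G/∂σ.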

Scale space construction

[Figure: stack of Gaussian-smoothed images at scales σ, kσ, 2σ, 2kσ, 2k²σ, beside the difference-of-Gaussian images at scales σ, kσ, 2σ, 2kσ]
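The scale labels in the construction figure are consistent with a ladder in which adjacent levels differ by a constant factor k and k² = 2. A small sketch of that ladder follows; the function name, the parameter `intervals`, and the choice of `intervals + 3` smoothed images per octave are assumptions for the example rather than something stated on the slide.

```python
def octave_sigmas(sigma0=1.6, intervals=2):
    """Scales of the smoothed images in one octave.

    Adjacent levels differ by k = 2 ** (1 / intervals), so the scale doubles
    after `intervals` steps. Producing intervals + 3 levels is an assumed
    convention; it gives intervals + 2 difference-of-Gaussian images.
    """
    k = 2 ** (1.0 / intervals)
    return [sigma0 * k ** i for i in range(intervals + 3)]

if __name__ == "__main__":
    for s in octave_sigmas():
        print(round(s, 4))
```

With `intervals=2` the list has five entries, corresponding to σ, kσ, 2σ, 2kσ and 2k²σ.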

Scale space images

[Figure: scale space images for the first, second, third, and fourth octaves]

Difference-of-Gaussian images

[Figure: difference-of-Gaussian images for the first, second, third, and fourth octaves]

Frequency of sampling

There is no minimum

Best frequency determined experimentally

Prior smoothing for each octave

Increasing σ increases robustness, but costs efficiency

σ = 1.6 is a good tradeoff

Doubling the image initially increases the number of keypoints

Finding extrema

Sample point is selected only if it is a minimum or a maximum among its neighboring points in the DoG scale space

[Figure: DoG scale space; extrema in this image]
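A minimal sketch of the extremum test follows. It assumes the neighbors are the 26 points surrounding a sample: 8 in the same DoG image and 9 in each of the adjacent scales. It also assumes the DoG stack is indexed as `dog[scale][row][col]`. The function names are hypothetical.

```python
def is_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly larger or strictly smaller than all
    26 neighbors in the 3x3x3 block around it."""
    center = dog[s][y][x]
    neighbors = [
        dog[s + ds][y + dy][x + dx]
        for ds in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (ds, dy, dx) != (0, 0, 0)
    ]
    return center > max(neighbors) or center < min(neighbors)

def find_extrema(dog):
    """Scan all interior samples; the first and last scale and the image
    border are skipped because they lack a full neighborhood."""
    found = []
    for s in range(1, len(dog) - 1):
        for y in range(1, len(dog[s]) - 1):
            for x in range(1, len(dog[s][y]) - 1):
                if is_extremum(dog, s, y, x):
                    found.append((s, y, x))
    return found
```

Each returned tuple `(s, y, x)` is a candidate keypoint that still has to pass localization and filtering.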

Localization

3D quadratic function is fit to the local sample points

Start with Taylor expansion with sample point as the origin

$$D(X) = D + \frac{\partial D}{\partial X}^T X + \frac{1}{2} X^T \frac{\partial^2 D}{\partial X^2} X$$

– where $X = (x, y, \sigma)^T$

Take the derivative with respect to X, and set it to 0, giving

$$0 = \frac{\partial D}{\partial X} + \frac{\partial^2 D}{\partial X^2} \hat{X}$$

$$\hat{X} = -\left(\frac{\partial^2 D}{\partial X^2}\right)^{-1} \frac{\partial D}{\partial X}$$

$\hat{X}$ is the location of the keypoint

This is a 3x3 linear system

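$$\begin{bmatrix} \frac{\partial^2 D}{\partial x^2} & \frac{\partial^2 D}{\partial x \partial y} & \frac{\partial^2 D}{\partial x \partial \sigma} \\ \frac{\partial^2 D}{\partial x \partial y} & \frac{\partial^2 D}{\partial y^2} & \frac{\partial^2 D}{\partial y \partial \sigma} \\ \frac{\partial^2 D}{\partial x \partial \sigma} & \frac{\partial^2 D}{\partial y \partial \sigma} & \frac{\partial^2 D}{\partial \sigma^2} \end{bmatrix} \begin{bmatrix} x \\ y \\ \sigma \end{bmatrix} = -\begin{bmatrix} \frac{\partial D}{\partial x} \\ \frac{\partial D}{\partial y} \\ \frac{\partial D}{\partial \sigma} \end{bmatrix}$$

Derivatives approximated by finite differences,
– example (k indexes the scale level; i, j index the sample position):

$$\frac{\partial D}{\partial \sigma} = \frac{D_{k+1}^{i,j} - D_{k-1}^{i,j}}{2}$$

$$\frac{\partial^2 D}{\partial \sigma^2} = \frac{D_{k-1}^{i,j} - 2 D_{k}^{i,j} + D_{k+1}^{i,j}}{1}$$

$$\frac{\partial^2 D}{\partial \sigma \partial y} = \frac{(D_{k+1}^{i+1,j} - D_{k-1}^{i+1,j}) - (D_{k+1}^{i-1,j} - D_{k-1}^{i-1,j})}{4}$$

If $\hat{X}$ is > 0.5 in any dimension, the process is repeated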
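The localization step can be sketched as follows: estimate the gradient and Hessian of D with central differences, solve the 3x3 system for the offset, and evaluate the interpolated value of D at that offset. This is a sketch under assumptions: the DoG stack is indexed as `D[scale][row][col]`, the unknowns are ordered (x, y, σ), Cramer's rule stands in for whatever solver a real implementation uses, and all function names are hypothetical.

```python
def derivatives(D, s, y, x):
    """Gradient and Hessian of D at (s, y, x) by central differences,
    with components ordered (x, y, sigma)."""
    c = D[s][y][x]
    dx = (D[s][y][x + 1] - D[s][y][x - 1]) / 2.0
    dy = (D[s][y + 1][x] - D[s][y - 1][x]) / 2.0
    ds = (D[s + 1][y][x] - D[s - 1][y][x]) / 2.0
    dxx = D[s][y][x + 1] - 2 * c + D[s][y][x - 1]
    dyy = D[s][y + 1][x] - 2 * c + D[s][y - 1][x]
    dss = D[s + 1][y][x] - 2 * c + D[s - 1][y][x]
    dxy = (D[s][y + 1][x + 1] - D[s][y + 1][x - 1]
           - D[s][y - 1][x + 1] + D[s][y - 1][x - 1]) / 4.0
    dxs = (D[s + 1][y][x + 1] - D[s + 1][y][x - 1]
           - D[s - 1][y][x + 1] + D[s - 1][y][x - 1]) / 4.0
    dys = (D[s + 1][y + 1][x] - D[s + 1][y - 1][x]
           - D[s - 1][y + 1][x] + D[s - 1][y - 1][x]) / 4.0
    grad = [dx, dy, ds]
    hess = [[dxx, dxy, dxs], [dxy, dyy, dys], [dxs, dys, dss]]
    return grad, hess

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(A, b):
    """Solve the 3x3 system A x = b with Cramer's rule."""
    d = det3(A)
    if abs(d) < 1e-12:
        raise ValueError("singular Hessian")
    solution = []
    for col in range(3):
        m = [row[:] for row in A]
        for r in range(3):
            m[r][col] = b[r]
        solution.append(det3(m) / d)
    return solution

def refine(D, s, y, x):
    """Return the offset (x, y, sigma) of the fitted extremum from the sample
    point, and the interpolated value D + 0.5 * grad . offset."""
    grad, hess = derivatives(D, s, y, x)
    offset = solve3(hess, [-g for g in grad])
    value = D[s][y][x] + 0.5 * sum(g * o for g, o in zip(grad, offset))
    return offset, value
```

If any component of the returned offset exceeds 0.5 in magnitude, the caller would move to the neighboring sample and call `refine` again. The returned value is the quantity used by the contrast test.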

Filtering

Contrast (use previous equation):

$$D(\hat{X}) = D + \frac{1}{2} \frac{\partial D}{\partial X}^T \hat{X}$$

– If $|D(\hat{X})|$ < 0.03, throw it out

Edge-iness:
– Use ratio of principal curvatures to throw out poorly defined peaks
– Curvatures come from Hessian:

$$H = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix}$$

– Ratio of Trace(H)² and Determinant(H)

$$Tr(H) = D_{xx} + D_{yy}$$

$$Det(H) = D_{xx} D_{yy} - (D_{xy})^2$$

– If ratio > (r+1)²/r, throw it out (SIFT uses r=10)
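The two filters reduce to a pair of small predicates. In this sketch the function names are hypothetical, the 0.03 threshold is taken to apply to pixel values scaled to [0, 1] (an assumption), and rejecting points with a non-positive determinant is an added assumption covering the case where the curvatures have opposite signs.

```python
def passes_contrast(value, threshold=0.03):
    """Keep the keypoint only if |D(X_hat)| is at least the threshold."""
    return abs(value) >= threshold

def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep the keypoint only if Tr(H)^2 / Det(H) <= (r + 1)^2 / r."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Assumption: curvatures of opposite sign, not a well-defined peak.
        return False
    return trace * trace / det <= (r + 1) ** 2 / r
```

For r = 10 the bound is 12.1. A blob with equal curvatures gives a ratio of 4 and passes, while a ridge with curvatures of −10 and −0.1 gives a ratio above 100 and is discarded.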

Orientation assignment

Descriptor computed relative to keypoint’s orientation achieves rotation invariance

Orientation is precomputed along with magnitude for all levels (useful in descriptor computation)

$$m(x, y) = \sqrt{(L(x+1, y) - L(x-1, y))^2 + (L(x, y+1) - L(x, y-1))^2}$$

$$\theta(x, y) = \operatorname{atan2}\big(L(x, y+1) - L(x, y-1),\; L(x+1, y) - L(x-1, y)\big)$$

Multiple orientations assigned to keypoints from an orientation histogram
– Significantly improve stability of matching
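The magnitude and orientation formulas, and the histogram built from them, can be sketched as below. Several details are assumptions made for illustration and are not stated on the slide: 36 histogram bins, a square sampling window, no Gaussian weighting of the samples, and accepting every local peak within 80% of the highest one. The image is indexed as `L[row][col]`, so the slide's L(x, y) corresponds to `L[y][x]`.

```python
import math

def gradient(L, y, x):
    """Return (magnitude, orientation in radians) from pixel differences."""
    dx = L[y][x + 1] - L[y][x - 1]
    dy = L[y + 1][x] - L[y - 1][x]
    return math.hypot(dx, dy), math.atan2(dy, dx)

def orientation_histogram(L, cy, cx, radius, bins=36):
    """Accumulate gradient magnitudes into orientation bins over a square
    window centered on (cy, cx); border pixels are skipped."""
    hist = [0.0] * bins
    two_pi = 2 * math.pi
    for y in range(max(1, cy - radius), min(len(L) - 1, cy + radius + 1)):
        for x in range(max(1, cx - radius), min(len(L[0]) - 1, cx + radius + 1)):
            m, theta = gradient(L, y, x)
            b = int((theta % two_pi) / two_pi * bins) % bins
            hist[b] += m
    return hist

def dominant_orientations(hist, ratio=0.8):
    """Return bin-center angles of local peaks within `ratio` of the maximum."""
    peak = max(hist)
    n = len(hist)
    angles = []
    for i, v in enumerate(hist):
        if v >= ratio * peak and v > hist[i - 1] and v > hist[(i + 1) % n]:
            angles.append((i + 0.5) * 2 * math.pi / n)
    return angles
```

Each angle returned by `dominant_orientations` would produce a separate keypoint at the same location and scale, which is how one location can carry more than one orientation.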

Keypoint images

Descriptor

Descriptor has 3 dimensions (x, y, θ)

Orientation histogram of gradient magnitudes

Position and orientation of each gradient sample rotated relative to keypoint orientation

Descriptor

Weight magnitude of each sample point by Gaussian weighting function

Distribute each sample to adjacent bins by trilinear interpolation (avoids boundary effects)
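Trilinear interpolation splits one weighted sample among the up to eight bins surrounding it in (x, y, θ). The sketch below assumes a 4x4 spatial grid with 8 orientation bins, a histogram indexed as `hist[by][bx][bo]`, continuous bin coordinates in which an integer value is a bin center, and orientation bins that wrap around. The function name and these conventions are illustrative assumptions.

```python
import math

def add_sample(hist, bx, by, bo, magnitude, n_spatial=4, n_orient=8):
    """Distribute `magnitude` over the bins adjacent to the continuous bin
    coordinates (bx, by, bo). Spatial bins outside the grid are dropped;
    orientation bins wrap modulo n_orient."""
    x0, y0, o0 = math.floor(bx), math.floor(by), math.floor(bo)
    fx, fy, fo = bx - x0, by - y0, bo - o0
    for ix, wx in ((x0, 1 - fx), (x0 + 1, fx)):
        if not 0 <= ix < n_spatial:
            continue
        for iy, wy in ((y0, 1 - fy), (y0 + 1, fy)):
            if not 0 <= iy < n_spatial:
                continue
            for io, wo in ((o0, 1 - fo), (o0 + 1, fo)):
                hist[iy][ix][io % n_orient] += magnitude * wx * wy * wo

if __name__ == "__main__":
    hist = [[[0.0] * 8 for _ in range(4)] for _ in range(4)]
    add_sample(hist, 1.25, 2.5, 7.5, 1.0)
    print(sum(v for plane in hist for row in plane for v in row))
```

Because each weight is one minus the distance to the bin center, a sample that shifts slightly moves its contribution smoothly between bins instead of jumping from one to the next.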

Descriptor

Best results achieved with 4x4x8 = 128 descriptor size

Normalize to unit length
– Reduces effect of illumination change

Cap each element to 0.2, normalize again
– Reduces non-linear illumination changes
– 0.2 determined experimentally
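The normalize, cap, normalize sequence is short enough to write out. This is a minimal sketch; the function name is hypothetical and the descriptor is assumed to be a flat list of non-negative histogram values.

```python
import math

def normalize_descriptor(vec, cap=0.2):
    """Scale to unit length, clip every element at `cap`, then rescale to
    unit length again."""
    def unit(v):
        norm = math.sqrt(sum(c * c for c in v))
        return [c / norm for c in v] if norm > 0 else list(v)

    v = unit(vec)
    v = [min(c, cap) for c in v]
    return unit(v)
```

The first normalization cancels a uniform scaling of all gradient magnitudes, for example from a change in image contrast. The cap then limits how much a few very large gradients can dominate the result.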

Object Detection

Create a database of keypoints from training images

Match keypoints to a database
– Nearest neighbor search
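Matching against the database can be sketched as a brute-force nearest neighbor search. The sketch assumes descriptors are equal-length lists of numbers compared by squared Euclidean distance; the function name is hypothetical, and a real system would typically use an approximate search structure for a large database.

```python
def nearest_neighbor(query, database):
    """Return (index, squared distance) of the database descriptor closest
    to `query`, or (-1, inf) if the database is empty."""
    best_index, best_dist = -1, float("inf")
    for i, candidate in enumerate(database):
        dist = sum((a - b) ** 2 for a, b in zip(query, candidate))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index, best_dist
```

The cost is linear in the number of stored keypoints for every query, which is why the exhaustive form is only practical for small databases.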

PCA-SIFT

Different descriptor (same keypoints)

Apply PCA to the gradient patch

Descriptor size is 20 (instead of 128)

More robust, faster

Summary

Difference-of-Gaussian

Filtering

Orientation assignment

Descriptor, 128 elements
