Computer Vision — cs-courses.mines.edu/csci507/schedule/35/tracking.pdf · Colorado School of Mines
TRANSCRIPT
Computer Vision
Colorado School of Mines
Professor William Hoff, Dept of Electrical Engineering & Computer Science
http://inside.mines.edu/~whoff/
Tracking
Detection vs Tracking
• Assume we want to detect an object in a sequence of images and compute its pose
• Two possible approaches:
  1. Detect the object independently in each image
  2. Track the object, using knowledge of its position (and motion) in the previous image
• Approach #2 may be faster and more robust
• Approach #1 is necessary when starting out, or to re-acquire the object if tracking is lost
Example Detection Method
• Assume that we have a “reference image” of a planar object, and we want to find the object in each new image
• Approach:
  – Find feature points (e.g., SIFT or SURF) and their descriptors in the reference image
  – Calculate the 3D locations of the feature points
  – Then, for each new image:
    • Find feature points and their descriptors in the new image
    • Match the descriptors to get a set of tentative matched points
    • Fit a homography (projective transform) to the point locations in the new image and reference image, using RANSAC
    • Using the inlier points, find the pose of the object
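The homography-fitting step above can be sketched in NumPy. This is a minimal illustration, not the course's code: it assumes the feature matching has already produced corresponding point arrays `src` and `dst`, fits H by the Direct Linear Transform (without the point normalization a production implementation would add), and uses a bare-bones RANSAC loop. All names and parameters here are illustrative.

```python
import numpy as np

def fit_homography(src, dst):
    """Direct Linear Transform: solve for H mapping src -> dst (Nx2 arrays)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.array(rows))
    H = Vt[-1].reshape(3, 3)          # null-space vector = homography entries
    return H / H[2, 2]

def project(H, pts):
    """Apply homography H to Nx2 points (homogeneous divide)."""
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]

def ransac_homography(src, dst, iters=200, thresh=2.0, seed=0):
    """Fit H with RANSAC; return H refit on inliers, plus the inlier mask."""
    rng = np.random.default_rng(seed)
    best_mask = None
    for _ in range(iters):
        idx = rng.choice(len(src), 4, replace=False)   # minimal sample
        H = fit_homography(src[idx], dst[idx])
        err = np.linalg.norm(project(H, src) - dst, axis=1)
        mask = err < thresh
        if best_mask is None or mask.sum() > best_mask.sum():
            best_mask = mask
    return fit_homography(src[best_mask], dst[best_mask]), best_mask
```

The refit on all inliers at the end is what makes the final pose estimate stable; the minimal 4-point fits are only used to vote.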
Example Tracking Method
• Assume that we have detected the planar object in the initial image
• Approach:
  – Find simple feature points (e.g., template image patches) in the initial image
  – Calculate the 3D locations of the feature points
  – Then, for each new image:
    • Track each template to its location in the new image
    • Fit a homography (projective transform) to the point locations in the new image and previous image, using RANSAC
    • Using the inlier points, find the pose of the object
    • Update templates with their appearance in the new image
Tracking Patch Features
• We want to select good features to track
  – Features should be “distinctive” in some sense (such as a corner)
  – This is often called “corner detection”, although the features don’t have to be corners!
• Once we have found some patch features, we need to track them to the new image
  – We can always use normalized cross correlation to search for the best matching location (see lecture on template matching)
  – However, if the movement is small in the image, we can use an iterative search instead (can be very fast)
Good Features to Track
Good Features to Track
• Say we want to track a patch from image I0 to image I1
  – We’ll assume that intensities don’t change, and only translational motion
  – So we expect that

        I1(x_i + u) ≈ I0(x_i)

  – In practice we will have noise, so they won’t be exactly equal
• But we can search for the displacement u = (x, y) that minimizes the sum of squared differences (SSD):

        E(u) = Σ_i w(x_i) [ I1(x_i + u) − I0(x_i) ]²
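The SSD search above can be sketched directly as a brute-force loop over integer displacements. This is a minimal NumPy illustration (the course code is MATLAB); the patch half-width and search radius are arbitrary illustrative parameters.

```python
import numpy as np

def ssd_search(I0, I1, x, y, half=3, radius=5):
    """Find the integer displacement u = (dx, dy) minimizing the SSD between
    the patch of I0 centered at (x, y) and same-size patches of I1."""
    patch0 = I0[y-half:y+half+1, x-half:x+half+1].astype(float)
    best, best_u = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            patch1 = I1[y+dy-half:y+dy+half+1,
                        x+dx-half:x+dx+half+1].astype(float)
            e = np.sum((patch1 - patch0) ** 2)   # E(u) for this displacement
            if best is None or e < best:
                best, best_u = e, (dx, dy)
    return best_u, best
```

For a small radius this exhaustive search is cheap; the iterative refinement discussed later avoids even this loop when the motion is small.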
Stability of SSD
• We would like the SSD score to have a distinct minimum at the correct value of u
  – If not, there will be uncertainty in the value of u
• The image texture in the patch will determine the stability of the SSD score
  – Bland, featureless patches will not be able to be matched uniquely
• We want to choose patches to track that will be stable
• We can predict how stable a patch will be by comparing its SSD score with itself, at various displacements u
• If the surface E(u) has a distinct minimum, it is a good feature to track
        E(u) = Σ_i w(x_i) [ I0(x_i + u) − I0(x_i) ]²,   where u = (x, y)

  (w(x) is an optional weighting function)
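The self-comparison score above can be computed directly to see why bland patches are unstable. A minimal NumPy sketch (uniform weighting w = 1; patch size and radius are illustrative): a noise-textured patch gives a surface with a single zero at u = 0, while a constant patch gives E(u) = 0 everywhere, so its minimum tells us nothing.

```python
import numpy as np

def autocorr_surface(I, x, y, half=3, radius=3):
    """E(u) = sum over the patch of [I(x_i + u) - I(x_i)]^2, comparing the
    patch at (x, y) with displaced copies of itself."""
    p0 = I[y-half:y+half+1, x-half:x+half+1].astype(float)
    E = np.empty((2*radius + 1, 2*radius + 1))
    for j, dy in enumerate(range(-radius, radius + 1)):
        for i, dx in enumerate(range(-radius, radius + 1)):
            p = I[y+dy-half:y+dy+half+1, x+dx-half:x+dx+half+1].astype(float)
            E[j, i] = np.sum((p - p0) ** 2)
    return E

rng = np.random.default_rng(0)
textured = rng.uniform(size=(20, 20))   # noise-like patch: distinct minimum
flat = np.ones((20, 20))                # bland patch: E is zero everywhere
E_tex = autocorr_surface(textured, 10, 10)
E_flat = autocorr_surface(flat, 10, 10)
```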
[Figure] From Computer Vision: Algorithms and Applications, Richard Szeliski, 2010, Springer
Stability of SSD Score
• We can estimate the stability of the SSD score
• First, take the Taylor series expansion of the image
  – Recall the Taylor series expansion of a function f(x) near x0:

        f(x) ≈ f(x0) + (df/dx)|_{x0} (x − x0)

  – For vectors x, u:

        I0(x_i + u) ≈ I0(x_i) + ∇I0(x_i) · u

• Then

        E(u) = Σ_i w(x_i) [ I0(x_i + u) − I0(x_i) ]²
             ≈ Σ_i w(x_i) [ I0(x_i) + ∇I0(x_i) · u − I0(x_i) ]²
             = Σ_i w(x_i) [ ∇I0(x_i) · u ]²

where
• x_i = (x_i, y_i)ᵀ is some image point
• u = (x, y)ᵀ is the displacement from that point
• ∇I0(x_i) is the gradient of the image at x_i
Stability of SSD Score (continued)
• Expanding, we get

        E(u) = ( x  y ) [ w*Ix²     w*Ix·Iy ] ( x )  =  uᵀ A u
                        [ w*Ix·Iy   w*Iy²   ] ( y )

• where

        A = w * [ Ix²     Ix·Iy ]
                [ Ix·Iy   Iy²   ]

  with Ix = ∂I/∂x and Iy = ∂I/∂y

  Note – the “*” indicates correlation of w with each element of A
  • w is a small NxN averaging filter such as a box filter
  • it essentially averages the values over a small local neighborhood

• A can tell us how stable the SSD score is, at a given point
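The claim that E(u) ≈ uᵀA u can be checked numerically on a smooth synthetic image: build A from box-summed gradient products around a point, then compare the quadratic form against the exact SSD for small displacements. All sizes and the test image here are illustrative.

```python
import numpy as np

# Smooth synthetic image, so the first-order Taylor expansion is accurate
ys, xs = np.mgrid[0:64, 0:64].astype(float)
I = np.sin(0.3 * xs) * np.cos(0.2 * ys)

Iy, Ix = np.gradient(I)              # np.gradient returns d/drow, d/dcol

# Structure matrix A over an 11x11 box window centered at (cx, cy), w = 1
cx, cy, h = 32, 32, 5
win = (slice(cy - h, cy + h + 1), slice(cx - h, cx + h + 1))
A = np.array([[np.sum(Ix[win] ** 2),        np.sum(Ix[win] * Iy[win])],
              [np.sum(Ix[win] * Iy[win]),   np.sum(Iy[win] ** 2)]])

def E_quad(u):
    """Quadratic approximation E(u) = u^T A u, with u = (dx, dy)."""
    u = np.asarray(u, float)
    return u @ A @ u

def E_direct(dx, dy):
    """Exact SSD between the window and its copy displaced by (dx, dy)."""
    shifted = (slice(cy - h + dy, cy + h + 1 + dy),
               slice(cx - h + dx, cx + h + 1 + dx))
    return np.sum((I[shifted] - I[win]) ** 2)
```

For one-pixel displacements the two agree to within the accuracy of the linearization, which is the sense in which A predicts the local shape of the SSD surface.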
Example Patches
[Figure: four example patches and their SSD surfaces E(u) over u ∈ [−5, 5]², computed from A = w * [ Ix² Ix·Iy ; Ix·Iy Iy² ]]
Stability of SSD Score (continued)
• E = uᵀA u is a second-order polynomial
  – With A = [ a b ; b c ], it has the form E(x, y) = ax² + 2bxy + cy²
  – The matrix A gives the curvature of the surface
• A large value of a means that E curves up sharply as x varies
  – Similarly, a large c means that E curves up sharply as y varies
  – Both a, c large mean that we have a good minimum in E
• Problem – if b is not zero, the surface may be flat along the diagonal

[Plots: the 1D parabolas ax² and cy², and an example E(u) surface]
Estimating stability (continued)
• Let R be the matrix containing the eigenvectors of A
  – We can rotate the coordinate system to u = R v
  – Then E = uᵀA u = (vᵀRᵀ) A (R v) = vᵀ Λ v
  – which has the form E(v1, v2) = λ1 v1² + λ2 v2²
• So if both eigenvalues are large, we have a good minimum (there is no diagonal component)
• If one eigenvalue is low, the surface is flat along that direction

[Plots: two example E(u) surfaces]
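The diagonalization above is easy to verify numerically for a concrete symmetric A. A minimal NumPy check (the matrix and test point are arbitrary illustrative values):

```python
import numpy as np

# For symmetric A, rotating into the eigenvector frame diagonalizes E:
# with u = R v (columns of R = eigenvectors), E = u^T A u = l1*v1^2 + l2*v2^2
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
lams, R = np.linalg.eigh(A)      # eigenvalues (ascending), eigenvectors as columns

v = np.array([0.7, -1.2])        # arbitrary coordinates in the rotated frame
u = R @ v
E_quadratic = u @ A @ u
E_diagonal = lams[0] * v[0]**2 + lams[1] * v[1]**2
```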
Possible interest point measures
• Look at the smallest eigenvalue – if it is sufficiently large, then we have a candidate point (Shi-Tomasi)
• Alternative measures that don’t require square roots:

        det(A) − α tr(A)² = λ0 λ1 − α (λ0 + λ1)²    (Harris)

        det(A) / tr(A) = λ0 λ1 / (λ0 + λ1)          (Harmonic mean)

From Computer Vision: Algorithms and Applications, Richard Szeliski, 2010, Springer
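For a symmetric 2x2 matrix all three measures have closed forms in the entries, so they can be compared directly. A small NumPy sketch (the α value is the conventional Harris constant; the test matrices are illustrative):

```python
import numpy as np

def corner_measures(A, alpha=0.06):
    """Shi-Tomasi (min eigenvalue), Harris, and harmonic-mean scores of a
    symmetric 2x2 structure matrix A = [[a, b], [b, c]]."""
    a, b, c = A[0, 0], A[0, 1], A[1, 1]
    det, tr = a * c - b * b, a + c
    # Smallest eigenvalue of a symmetric 2x2 matrix (this one needs a sqrt)
    lam_min = tr / 2 - np.sqrt((tr / 2) ** 2 - det)
    harris = det - alpha * tr ** 2           # det(A) - alpha * tr(A)^2
    harmonic = det / tr if tr > 0 else 0.0   # det(A) / tr(A)
    return lam_min, harris, harmonic
```

The point of the alternatives: Harris and the harmonic mean use only det and trace, avoiding the square root needed for the exact eigenvalue.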
Matlab Corner Interest Operator
% Find corner points
clear all
close all

I = double(imread('test000.jpg'));

% Apply Gaussian blur
sd = 1.0;
I = imfilter(I, fspecial('gaussian', round(6*sd), sd));

imshow(I, []);

% Compute the gradient components
Gx = imfilter(I, [-1 1]);
Gy = imfilter(I, [-1; 1]);

% Compute the outer product g'g at each pixel
Gxx = Gx .* Gx;
Gxy = Gx .* Gy;
Gyy = Gy .* Gy;
Matlab Corner Interest Operator (continued)
% Size of neighborhood over which to compute corner features.
N = 13;
w = ones(N); % The neighborhood

% Sum the G's over the window size.
A11 = imfilter(Gxx, w);
A12 = imfilter(Gxy, w);
A22 = imfilter(Gyy, w);

% At each pixel (x,y), we have the 2x2 matrix
% [A11(x,y) A12(x,y);
%  A21(x,y) A22(x,y)]
% Of course, A21 = A12.

% Compute the interest score: det(A)/trace(A)
detA = A11.*A22 - A12.^2;  % Compute det(A) at each pixel
traceA = A11 + A22;        % Compute trace(A) at each pixel
s = detA./traceA;          % The interest score

% If trace(A)=0 anywhere, we get "not a number".
s(isnan(s)) = 0; % If any NaN's, just set them to zero
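For readers without MATLAB, the same pipeline can be sketched in NumPy. This is a hedged port, not the course code: the gradient filters are simple differences standing in for imfilter with [-1 1] and [-1; 1], and the box sum replaces imfilter with ones(N).

```python
import numpy as np

def corner_score(I, N=13):
    """det(A)/trace(A) corner score at every pixel, mirroring the MATLAB
    steps: gradients, products, NxN box sums, then the score."""
    I = I.astype(float)
    Gx = np.zeros_like(I); Gx[:, 1:] = I[:, 1:] - I[:, :-1]  # ~ imfilter(I, [-1 1])
    Gy = np.zeros_like(I); Gy[1:, :] = I[1:, :] - I[:-1, :]  # ~ imfilter(I, [-1; 1])

    def boxsum(F):
        # NxN box sum with zero padding (same-size output), N assumed odd
        h = N // 2
        P = np.pad(F, h)
        S = np.zeros_like(F)
        for dy in range(N):
            for dx in range(N):
                S += P[dy:dy + F.shape[0], dx:dx + F.shape[1]]
        return S

    A11, A12, A22 = boxsum(Gx * Gx), boxsum(Gx * Gy), boxsum(Gy * Gy)
    detA = A11 * A22 - A12 ** 2
    traceA = A11 + A22
    # Where trace(A) = 0 the whole matrix is zero, so the score is 0 there
    return np.where(traceA > 0, detA / np.maximum(traceA, 1e-12), 0.0)
```

On a white square against black, the score is large at the square's corners, near zero along its edges, and exactly zero in flat regions — the behavior the det/trace measure is designed to have.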
Finding Local Maxima
• Simply finding all points greater than a threshold will yield too many points
• We should find points that are– greater than the threshold– a local maximum
[Images: interest scores; interest scores, thresholded]
Finding Local Maxima
• If peaks are smooth, you can test each point to see if it is higher than its neighbors
• Matlab code:

Lmax = false(size(S));
Lmax(2:end-1, 2:end-1) = ...
    ( S(2:end-1, 2:end-1) > S(3:end,   2:end-1) ...
    & S(2:end-1, 2:end-1) > S(1:end-2, 2:end-1) ...
    & S(2:end-1, 2:end-1) > S(2:end-1, 3:end) ...
    & S(2:end-1, 2:end-1) > S(2:end-1, 1:end-2) );

[Plot: a 1D cross section of the score vs x, with a threshold line]
In 2D, check 4 nearest neighbors
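The same 4-neighbor test translates almost line-for-line into NumPy slicing. A minimal sketch (border pixels are simply left False, as in the MATLAB version):

```python
import numpy as np

def local_maxima_4(S):
    """Boolean mask of points strictly greater than their 4 nearest
    neighbors; border pixels are left False."""
    L = np.zeros(S.shape, dtype=bool)
    C = S[1:-1, 1:-1]                       # interior points
    L[1:-1, 1:-1] = ((C > S[2:, 1:-1]) & (C > S[:-2, 1:-1]) &
                     (C > S[1:-1, 2:]) & (C > S[1:-1, :-2]))
    return L
```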
Finding Local Maxima (continued)
• Problem – what if you don’t have smooth peaks?
  – Too many detected features
• Instead, find points that are local regional maxima – they are the largest points within a region (e.g., square of size W)
• Can do by calculating the largest value within distance N (this is a dilation)
• Then find those points that equal the local maxima

Smax = (S==imdilate(S, ones(N)));

[Plots: a 1D score profile, and its local max within ±2]
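The S == imdilate(S, ones(N)) trick can be sketched in NumPy by computing the grayscale dilation (a sliding-window maximum) by hand. A minimal illustration; note that, exactly as with imdilate, pixels in perfectly flat regions also equal their neighborhood maximum, which is why a threshold is still applied afterwards.

```python
import numpy as np

def regional_maxima(S, N):
    """Points equal to the max of their NxN neighborhood, i.e. a grayscale
    dilation with a square structuring element (N assumed odd)."""
    h = N // 2
    P = np.pad(S, h, constant_values=-np.inf)   # -inf padding at the border
    D = np.full(S.shape, -np.inf)
    for dy in range(N):
        for dx in range(N):                     # sliding-window maximum
            D = np.maximum(D, P[dy:dy + S.shape[0], dx:dx + S.shape[1]])
    return S == D
```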
Finding Local Maxima (continued)
[Images: the score array S; S dilated with a square of side 15 (Sd); points where S = Sd (Smax); points where S = Sd and are > threshold]
Matlab
% Choose the suppression radius, for non-maxima suppression
r = N;

% Find local maxima within each neighborhood of radius 2r
Lmax = (s==imdilate(s, strel('disk',2*r)));

% Note - we don't want to detect points too close to the border, so just
% zero out everything near the border.
Lmax(1:N,:) = false;
Lmax(:,1:N) = false;
Lmax(end-N:end,:) = false;
Lmax(:,end-N:end) = false;

% Get a list of the indices of all the potential interest points
[rows cols] = find(Lmax);

% Get the values of those interest points
vals = s(Lmax);

(Look at the range of scores.)
Identifying interest points
• Once you have the scores, you can decide which points to identify as interest points – Could choose the points with the top M scores– Or, choose all points with scores greater than a threshold
% Identify interest points above a threshold, and draw
% a box around them, of size NxN
for i=1:length(vals)
    if vals(i) > threshold
        x = cols(i);
        y = rows(i);
        rectangle('Position', [x-N/2 y-N/2 N N], ...
            'EdgeColor', 'r', ...
            'Linewidth', 2.0);     % default is 0.5
        text(x, y, sprintf('%d', i), ...
            'Color', 'r', ...      % label with id number
            'FontSize', 16);       % default is 10
    end
end
Corner interest points:
• Window size = 15
• Threshold = 100
• Local maxima within 15 pixels
Tracking Features
Tracking Features
• Once we have found a good feature to track, how do we find it in the next image?
  – Assume the feature is represented by a small patch
• We can always use normalized cross correlation to search for the best matching location (see lecture on template matching)
• However, if the movement is small in the image, we can use an iterative search instead
  – This can be very fast, as long as you have a good initial guess
        c(x, y) = Σ_{s,t} [ w(s,t) − w̄ ] [ f(x+s, y+t) − f̄ ] /
                  { Σ_{s,t} [ w(s,t) − w̄ ]² · Σ_{s,t} [ f(x+s, y+t) − f̄ ]² }^(1/2)
Lucas‐Kanade Incremental Refinement
• Does gradient descent on the sum‐of‐squared differences energy function
Method
• We have the current guess of the displacement u
• Compute the intensity difference between the translated image patch and the current image at that point
• Compute the local image gradient
• Update the displacement

From: Richard Szeliski, “Computer Vision: Algorithms and Applications”
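The steps above can be sketched as one Lucas-Kanade update in NumPy. This is a simplified illustration, not a full implementation: it evaluates the residual at the rounded current guess rather than interpolating, uses the gradient of I0, and solves the 2x2 normal equations directly.

```python
import numpy as np

def lk_step(I0, I1, x, y, half, u):
    """One Lucas-Kanade update: solve the normal equations A du = b built
    from the gradient of I0 and the current residual; return u + du."""
    win0 = (slice(y - half, y + half + 1), slice(x - half, x + half + 1))
    Iy, Ix = np.gradient(I0)                      # image gradients
    gx, gy = Ix[win0].ravel(), Iy[win0].ravel()
    # Residual e_i = I1(x_i + u) - I0(x_i), sampled at the rounded guess
    dx, dy = int(round(u[0])), int(round(u[1]))
    win1 = (slice(y + dy - half, y + dy + half + 1),
            slice(x + dx - half, x + dx + half + 1))
    e = (I1[win1] - I0[win0]).ravel()
    A = np.array([[gx @ gx, gx @ gy],
                  [gx @ gy, gy @ gy]])            # same structure matrix as before
    b = -np.array([gx @ e, gy @ e])
    du = np.linalg.solve(A, b)                    # gradient-descent-style update
    return np.array([u[0], u[1]], float) + du
```

Starting from u = 0 on a smoothly textured image shifted by one pixel, a single step already lands close to the true displacement.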
Taylor Series Expansion
• In 2D:

        I(x + u) ≈ I(x) + ∇I(x) · u,   where ∇I = ( ∂I/∂x, ∂I/∂y )
Solve for the Displacement Correction
• Solve for Δu in the normal equations:

        A Δu = b

• A, b can be computed using:

        A = Σ_i ∇I0(x_i) ∇I0(x_i)ᵀ,   b = −Σ_i e_i ∇I0(x_i),
        where e_i = I1(x_i + u) − I0(x_i) is the current intensity error
Hierarchical Search
• If the search range is large, a hierarchical search strategy can be used
• An image pyramid is used, where matches are first found in lower‐resolution images and then used to guide search in higher resolution images
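An image pyramid for this coarse-to-fine search can be sketched with simple 2x2 block averaging. This is a minimal stand-in (a real implementation would blur with a Gaussian before subsampling):

```python
import numpy as np

def build_pyramid(I, levels):
    """Coarse-to-fine pyramid by 2x2 block averaging; level 0 is the
    original image, each later level is half the resolution."""
    pyr = [I.astype(float)]
    for _ in range(levels - 1):
        J = pyr[-1]
        h, w = J.shape[0] // 2 * 2, J.shape[1] // 2 * 2
        J = J[:h, :w]                    # trim odd rows/cols
        pyr.append((J[0::2, 0::2] + J[1::2, 0::2] +
                    J[0::2, 1::2] + J[1::2, 1::2]) / 4.0)
    return pyr
```

A displacement found at pyramid level k corresponds to twice that displacement at level k−1, which is how the coarse match seeds the finer search.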
https://en.wikipedia.org/wiki/Pyramid_(image_processing)
Problems Tracking over Long Sequences
• If features are tracked over a long sequence, their appearance can undergo changes.
• If you continue to match against the originally detected patch (feature), tracking may eventually fail.
• Alternatively, you can re-sample the template in each subsequent frame at the matching location.
  – However, the feature may then drift from its original location to some other location in the image
Affine Tracker
• Using an affine motion model is more robust when tracking over long sequences:

        I2(A x + d) = I1(x)

• where
  – d is the translation, A is the deformation matrix:

        A = [ a_xx  a_xy ]
            [ a_yx  a_yy ]

• This can account for rotation and scale changes
• Also viewpoint changes, if the planar patch is small compared to its distance from the camera
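The affine model I2(A x + d) = I1(x) says image 1 can be reconstructed by sampling image 2 at the warped coordinates A x + d. A minimal nearest-neighbor sketch (a real tracker would use bilinear interpolation; out-of-range samples are set to 0 here):

```python
import numpy as np

def affine_warp(I, A, d):
    """Sample I at A x + d for every pixel x = (col, row): given I = I2 and
    the motion (A, d), this approximates I1 under the affine model."""
    H, W = I.shape
    ys, xs = np.mgrid[0:H, 0:W]
    pts = np.stack([xs.ravel(), ys.ravel()])        # x = (col, row)
    wx, wy = A @ pts + np.asarray(d, float)[:, None]
    wx, wy = np.rint(wx).astype(int), np.rint(wy).astype(int)
    valid = (wx >= 0) & (wx < W) & (wy >= 0) & (wy < H)
    out = np.zeros(H * W)
    out[valid] = I[wy[valid], wx[valid]]            # nearest-neighbor sampling
    return out.reshape(H, W)
```

With A = I and d = 0 the model reduces to the pure-translation tracker used earlier; the extra four parameters of A are what absorb rotation, scale, and mild viewpoint change.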
KLT Tracker
• Find interest points in the first frame
• Track (assuming translation only) points from each frame to subsequent frame
• Test consistency with original frame by computing affine transformation
from: Shi and Tomasi, “Good Features to Track”, IEEE CVPR 1994