Computer Vision — cs-courses.mines.edu/csci507/schedule/35/tracking.pdf · Colorado School of Mines
TRANSCRIPT
Computer Vision
Colorado School of Mines
Professor William Hoff, Dept of Electrical Engineering & Computer Science
http://inside.mines.edu/~whoff/
Tracking
Detection vs Tracking
• Assume we want to detect an object in a sequence of images and compute its pose
• Two possible approaches:
  1. Detect the object independently in each image
  2. Track the object, using knowledge of its position (and motion) in the previous image
• Approach #2 may be faster and more robust
• Approach #1 is necessary when starting out, or to re-acquire the object if tracking is lost
Example Detection Method
• Assume that we have a “reference image” of a planar object, and we want to find the object in each new image
• Approach:
  – Find feature points (e.g., SIFT or SURF) and their descriptors in the reference image
  – Calculate the 3D locations of the feature points
  – Then, for each new image:
    • Find feature points and their descriptors in the new image
    • Match the descriptors to get a set of tentative matched points
    • Fit a homography (projective transform) to the point locations in the new image and reference image, using RANSAC
    • Using the inlier points, find the pose of the object
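The homography-fitting step above can be sketched in NumPy. This is a minimal illustration, not the course's code: it assumes the feature matching has already produced corresponding point arrays `src` and `dst`, fits H by the Direct Linear Transform (without the point normalization a production implementation would add), and uses a bare-bones RANSAC loop. All names and parameters here are illustrative.

```python
import numpy as np

def fit_homography(src, dst):
    """Direct Linear Transform: solve for H mapping src -> dst (Nx2 arrays)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.array(rows))
    H = Vt[-1].reshape(3, 3)          # null-space vector = homography entries
    return H / H[2, 2]

def project(H, pts):
    """Apply homography H to Nx2 points (homogeneous divide)."""
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]

def ransac_homography(src, dst, iters=200, thresh=2.0, seed=0):
    """Fit H with RANSAC; return H refit on inliers, plus the inlier mask."""
    rng = np.random.default_rng(seed)
    best_mask = None
    for _ in range(iters):
        idx = rng.choice(len(src), 4, replace=False)   # minimal sample
        H = fit_homography(src[idx], dst[idx])
        err = np.linalg.norm(project(H, src) - dst, axis=1)
        mask = err < thresh
        if best_mask is None or mask.sum() > best_mask.sum():
            best_mask = mask
    return fit_homography(src[best_mask], dst[best_mask]), best_mask
```

The refit on all inliers at the end is what makes the final pose estimate stable; the minimal 4-point fits are only used to vote.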
Example Tracking Method
• Assume that we have detected the planar object in the initial image
• Approach:
  – Find simple feature points (e.g., template image patches) in the initial image
  – Calculate the 3D locations of the feature points
  – Then, for each new image:
    • Track each template to its location in the new image
    • Fit a homography (projective transform) to the point locations in the new image and previous image, using RANSAC
    • Using the inlier points, find the pose of the object
    • Update templates with their appearance in the new image
Tracking Patch Features
• We want to select good features to track
  – Features should be “distinctive” in some sense (such as a corner)
  – This is often called “corner detection”, although the features don’t have to be corners!
• Once we have found some patch features, we need to track them to the new image
  – We can always use normalized cross correlation to search for the best matching location (see lecture on template matching)
  – However, if the movement is small in the image, we can use an iterative search instead (can be very fast)
Good Features to Track
Good Features to Track
• Say we want to track a patch from image I0 to image I1
  – We’ll assume that intensities don’t change, and only translational motion
  – So we expect that

        I1(x_i + u) ≈ I0(x_i)

  – In practice we will have noise, so they won’t be exactly equal
• But we can search for the displacement u = (x, y) that minimizes the sum of squared differences (SSD):

        E(u) = Σ_i w(x_i) [ I1(x_i + u) − I0(x_i) ]²
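The SSD search above can be sketched directly as a brute-force loop over integer displacements. This is a minimal NumPy illustration (the course code is MATLAB); the patch half-width and search radius are arbitrary illustrative parameters.

```python
import numpy as np

def ssd_search(I0, I1, x, y, half=3, radius=5):
    """Find the integer displacement u = (dx, dy) minimizing the SSD between
    the patch of I0 centered at (x, y) and same-size patches of I1."""
    patch0 = I0[y-half:y+half+1, x-half:x+half+1].astype(float)
    best, best_u = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            patch1 = I1[y+dy-half:y+dy+half+1,
                        x+dx-half:x+dx+half+1].astype(float)
            e = np.sum((patch1 - patch0) ** 2)   # E(u) for this displacement
            if best is None or e < best:
                best, best_u = e, (dx, dy)
    return best_u, best
```

For a small radius this exhaustive search is cheap; the iterative refinement discussed later avoids even this loop when the motion is small.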
Stability of SSD
• We would like the SSD score to have a distinct minimum at the correct value of u
  – If not, there will be uncertainty in the value of u
• The image texture in the patch will determine the stability of the SSD score
  – Bland, featureless patches will not be able to be matched uniquely
• We want to choose patches to track that will be stable
• We can predict how stable a patch will be by comparing its SSD score with itself, at various displacements u
• If the surface E(u) has a distinct minimum, it is a good feature to track
        E(u) = Σ_i w(x_i) [ I0(x_i + u) − I0(x_i) ]²,   where u = (x, y)

  (w(x) is an optional weighting function)
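The self-comparison score above can be computed directly to see why bland patches are unstable. A minimal NumPy sketch (uniform weighting w = 1; patch size and radius are illustrative): a noise-textured patch gives a surface with a single zero at u = 0, while a constant patch gives E(u) = 0 everywhere, so its minimum tells us nothing.

```python
import numpy as np

def autocorr_surface(I, x, y, half=3, radius=3):
    """E(u) = sum over the patch of [I(x_i + u) - I(x_i)]^2, comparing the
    patch at (x, y) with displaced copies of itself."""
    p0 = I[y-half:y+half+1, x-half:x+half+1].astype(float)
    E = np.empty((2*radius + 1, 2*radius + 1))
    for j, dy in enumerate(range(-radius, radius + 1)):
        for i, dx in enumerate(range(-radius, radius + 1)):
            p = I[y+dy-half:y+dy+half+1, x+dx-half:x+dx+half+1].astype(float)
            E[j, i] = np.sum((p - p0) ** 2)
    return E

rng = np.random.default_rng(0)
textured = rng.uniform(size=(20, 20))   # noise-like patch: distinct minimum
flat = np.ones((20, 20))                # bland patch: E is zero everywhere
E_tex = autocorr_surface(textured, 10, 10)
E_flat = autocorr_surface(flat, 10, 10)
```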
[Figure] From Computer Vision: Algorithms and Applications, Richard Szeliski, 2010, Springer
Stability of SSD Score
• We can estimate the stability of the SSD score
• First, take the Taylor series expansion of the image
  – Recall the Taylor series expansion of a function f(x) near x0:

        f(x) ≈ f(x0) + (df/dx)|_{x0} (x − x0)

  – For vectors x, u:

        I0(x_i + u) ≈ I0(x_i) + ∇I0(x_i) · u

• Then

        E(u) = Σ_i w(x_i) [ I0(x_i + u) − I0(x_i) ]²
             ≈ Σ_i w(x_i) [ I0(x_i) + ∇I0(x_i) · u − I0(x_i) ]²
             = Σ_i w(x_i) [ ∇I0(x_i) · u ]²

where
• x_i = (x_i, y_i)ᵀ is some image point
• u = (x, y)ᵀ is the displacement from that point
• ∇I0(x_i) is the gradient of the image at x_i
Stability of SSD Score (continued)
• Expanding, we get

        E(u) = ( x  y ) [ w*Ix²     w*Ix·Iy ] ( x )  =  uᵀ A u
                        [ w*Ix·Iy   w*Iy²   ] ( y )

• where

        A = w * [ Ix²     Ix·Iy ]
                [ Ix·Iy   Iy²   ]

  with Ix = ∂I/∂x and Iy = ∂I/∂y

  Note – the “*” indicates correlation of w with each element of A
  • w is a small NxN averaging filter such as a box filter
  • it essentially averages the values over a small local neighborhood

• A can tell us how stable the SSD score is, at a given point
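The claim that E(u) ≈ uᵀA u can be checked numerically on a smooth synthetic image: build A from box-summed gradient products around a point, then compare the quadratic form against the exact SSD for small displacements. All sizes and the test image here are illustrative.

```python
import numpy as np

# Smooth synthetic image, so the first-order Taylor expansion is accurate
ys, xs = np.mgrid[0:64, 0:64].astype(float)
I = np.sin(0.3 * xs) * np.cos(0.2 * ys)

Iy, Ix = np.gradient(I)              # np.gradient returns d/drow, d/dcol

# Structure matrix A over an 11x11 box window centered at (cx, cy), w = 1
cx, cy, h = 32, 32, 5
win = (slice(cy - h, cy + h + 1), slice(cx - h, cx + h + 1))
A = np.array([[np.sum(Ix[win] ** 2),        np.sum(Ix[win] * Iy[win])],
              [np.sum(Ix[win] * Iy[win]),   np.sum(Iy[win] ** 2)]])

def E_quad(u):
    """Quadratic approximation E(u) = u^T A u, with u = (dx, dy)."""
    u = np.asarray(u, float)
    return u @ A @ u

def E_direct(dx, dy):
    """Exact SSD between the window and its copy displaced by (dx, dy)."""
    shifted = (slice(cy - h + dy, cy + h + 1 + dy),
               slice(cx - h + dx, cx + h + 1 + dx))
    return np.sum((I[shifted] - I[win]) ** 2)
```

For one-pixel displacements the two agree to within the accuracy of the linearization, which is the sense in which A predicts the local shape of the SSD surface.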
Example Patches
[Figure: four example patches and their SSD surfaces E(u) over u ∈ [−5, 5]², computed from A = w * [ Ix² Ix·Iy ; Ix·Iy Iy² ]]
Stability of SSD Score (continued)
• E = uᵀA u is a second-order polynomial
  – With A = [ a b ; b c ], it has the form E(x, y) = ax² + 2bxy + cy²
  – The matrix A gives the curvature of the surface
• A large value of a means that E curves up sharply as x varies
  – Similarly, a large c means that E curves up sharply as y varies
  – Both a, c large mean that we have a good minimum in E
• Problem – if b is not zero, the surface may be flat along the diagonal

[Plots: the 1D parabolas ax² and cy², and an example E(u) surface]
Estimating stability (continued)
• Let R be the matrix containing the eigenvectors of A
  – We can rotate the coordinate system to u = R v
  – Then E = uᵀA u = (vᵀRᵀ) A (R v) = vᵀ Λ v
  – which has the form E(v1, v2) = λ1 v1² + λ2 v2²
• So if both eigenvalues are large, we have a good minimum (there is no diagonal component)
• If one eigenvalue is low, the surface is flat along that direction

[Plots: two example E(u) surfaces]
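The diagonalization above is easy to verify numerically for a concrete symmetric A. A minimal NumPy check (the matrix and test point are arbitrary illustrative values):

```python
import numpy as np

# For symmetric A, rotating into the eigenvector frame diagonalizes E:
# with u = R v (columns of R = eigenvectors), E = u^T A u = l1*v1^2 + l2*v2^2
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
lams, R = np.linalg.eigh(A)      # eigenvalues (ascending), eigenvectors as columns

v = np.array([0.7, -1.2])        # arbitrary coordinates in the rotated frame
u = R @ v
E_quadratic = u @ A @ u
E_diagonal = lams[0] * v[0]**2 + lams[1] * v[1]**2
```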
Possible interest point measures
• Look at the smallest eigenvalue – if it is sufficiently large, then we have a candidate point (Shi-Tomasi)
• Alternative measures that don’t require square roots:

        det(A) − α tr(A)² = λ0 λ1 − α (λ0 + λ1)²    (Harris)

        det(A) / tr(A) = λ0 λ1 / (λ0 + λ1)          (Harmonic mean)

From Computer Vision: Algorithms and Applications, Richard Szeliski, 2010, Springer
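For a symmetric 2x2 matrix all three measures have closed forms in the entries, so they can be compared directly. A small NumPy sketch (the α value is the conventional Harris constant; the test matrices are illustrative):

```python
import numpy as np

def corner_measures(A, alpha=0.06):
    """Shi-Tomasi (min eigenvalue), Harris, and harmonic-mean scores of a
    symmetric 2x2 structure matrix A = [[a, b], [b, c]]."""
    a, b, c = A[0, 0], A[0, 1], A[1, 1]
    det, tr = a * c - b * b, a + c
    # Smallest eigenvalue of a symmetric 2x2 matrix (this one needs a sqrt)
    lam_min = tr / 2 - np.sqrt((tr / 2) ** 2 - det)
    harris = det - alpha * tr ** 2           # det(A) - alpha * tr(A)^2
    harmonic = det / tr if tr > 0 else 0.0   # det(A) / tr(A)
    return lam_min, harris, harmonic
```

The point of the alternatives: Harris and the harmonic mean use only det and trace, avoiding the square root needed for the exact eigenvalue.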
Matlab Corner Interest Operator
% Find corner points
clear all
close all

I = double(imread('test000.jpg'));

% Apply Gaussian blur
sd = 1.0;
I = imfilter(I, fspecial('gaussian', round(6*sd), sd));

imshow(I, []);

% Compute the gradient components
Gx = imfilter(I, [-1 1]);
Gy = imfilter(I, [-1; 1]);

% Compute the outer product g'g at each pixel
Gxx = Gx .* Gx;
Gxy = Gx .* Gy;
Gyy = Gy .* Gy;
Matlab Corner Interest Operator (continued)
% Size of neighborhood over which to compute corner features.
N = 13;
w = ones(N); % The neighborhood

% Sum the G's over the window size.
A11 = imfilter(Gxx, w);
A12 = imfilter(Gxy, w);
A22 = imfilter(Gyy, w);

% At each pixel (x,y), we have the 2x2 matrix
% [A11(x,y) A12(x,y);
%  A21(x,y) A22(x,y)]
% Of course, A21 = A12.

% Compute the interest score: det(A)/trace(A)
detA = A11.*A22 - A12.^2;  % Compute det(A) at each pixel
traceA = A11 + A22;        % Compute trace(A) at each pixel
s = detA./traceA;          % The interest score

% If trace(A)=0 anywhere, we get "not a number".
s(isnan(s)) = 0; % If any NaN's, just set them to zero
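For readers without MATLAB, the same pipeline can be sketched in NumPy. This is a hedged port, not the course code: the gradient filters are simple differences standing in for imfilter with [-1 1] and [-1; 1], and the box sum replaces imfilter with ones(N).

```python
import numpy as np

def corner_score(I, N=13):
    """det(A)/trace(A) corner score at every pixel, mirroring the MATLAB
    steps: gradients, products, NxN box sums, then the score."""
    I = I.astype(float)
    Gx = np.zeros_like(I); Gx[:, 1:] = I[:, 1:] - I[:, :-1]  # ~ imfilter(I, [-1 1])
    Gy = np.zeros_like(I); Gy[1:, :] = I[1:, :] - I[:-1, :]  # ~ imfilter(I, [-1; 1])

    def boxsum(F):
        # NxN box sum with zero padding (same-size output), N assumed odd
        h = N // 2
        P = np.pad(F, h)
        S = np.zeros_like(F)
        for dy in range(N):
            for dx in range(N):
                S += P[dy:dy + F.shape[0], dx:dx + F.shape[1]]
        return S

    A11, A12, A22 = boxsum(Gx * Gx), boxsum(Gx * Gy), boxsum(Gy * Gy)
    detA = A11 * A22 - A12 ** 2
    traceA = A11 + A22
    # Where trace(A) = 0 the whole matrix is zero, so the score is 0 there
    return np.where(traceA > 0, detA / np.maximum(traceA, 1e-12), 0.0)
```

On a white square against black, the score is large at the square's corners, near zero along its edges, and exactly zero in flat regions — the behavior the det/trace measure is designed to have.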
Finding Local Maxima
• Simply finding all points greater than a threshold will yield too many points
• We should find points that are– greater than the threshold– a local maximum
[Images: interest scores; interest scores, thresholded]
Finding Local Maxima
• If peaks are smooth, you can test each point to see if it is higher than its neighbors
• Matlab code:

Lmax = false(size(S));
Lmax(2:end-1, 2:end-1) = ...
    ( S(2:end-1, 2:end-1) > S(3:end,   2:end-1) ...
    & S(2:end-1, 2:end-1) > S(1:end-2, 2:end-1) ...
    & S(2:end-1, 2:end-1) > S(2:end-1, 3:end) ...
    & S(2:end-1, 2:end-1) > S(2:end-1, 1:end-2) );

[Plot: a 1D cross section of the score vs x, with a threshold line]
In 2D, check 4 nearest neighbors
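The same 4-neighbor test translates almost line-for-line into NumPy slicing. A minimal sketch (border pixels are simply left False, as in the MATLAB version):

```python
import numpy as np

def local_maxima_4(S):
    """Boolean mask of points strictly greater than their 4 nearest
    neighbors; border pixels are left False."""
    L = np.zeros(S.shape, dtype=bool)
    C = S[1:-1, 1:-1]                       # interior points
    L[1:-1, 1:-1] = ((C > S[2:, 1:-1]) & (C > S[:-2, 1:-1]) &
                     (C > S[1:-1, 2:]) & (C > S[1:-1, :-2]))
    return L
```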
Finding Local Maxima (continued)
• Problem – what if you don’t have smooth peaks?
  – Too many detected features
• Instead, find points that are local regional maxima – they are the largest points within a region (e.g., square of size W)
• Can do by calculating the largest value within distance N (this is a dilation)
• Then find those points that equal the local maxima

Smax = (S==imdilate(S, ones(N)));

[Plots: a 1D score profile, and its local max within ±2]
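The S == imdilate(S, ones(N)) trick can be sketched in NumPy by computing the grayscale dilation (a sliding-window maximum) by hand. A minimal illustration; note that, exactly as with imdilate, pixels in perfectly flat regions also equal their neighborhood maximum, which is why a threshold is still applied afterwards.

```python
import numpy as np

def regional_maxima(S, N):
    """Points equal to the max of their NxN neighborhood, i.e. a grayscale
    dilation with a square structuring element (N assumed odd)."""
    h = N // 2
    P = np.pad(S, h, constant_values=-np.inf)   # -inf padding at the border
    D = np.full(S.shape, -np.inf)
    for dy in range(N):
        for dx in range(N):                     # sliding-window maximum
            D = np.maximum(D, P[dy:dy + S.shape[0], dx:dx + S.shape[1]])
    return S == D
```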
Finding Local Maxima (continued)
[Images: the score array S; S dilated with a square of side 15 (Sd); points where S = Sd (Smax); points where S = Sd and are > threshold]
Matlab
% Choose the suppression radius, for non-maxima suppression
r = N;

% Find local maxima within each neighborhood of radius 2r
Lmax = (s==imdilate(s, strel('disk',2*r)));

% Note - we don't want to detect points too close to the border, so just
% zero out everything near the border.
Lmax(1:N,:) = false;
Lmax(:,1:N) = false;
Lmax(end-N:end,:) = false;
Lmax(:,end-N:end) = false;

% Get a list of the indices of all the potential interest points
[rows cols] = find(Lmax);

% Get the values of those interest points
vals = s(Lmax);

(Look at the range of scores.)
Identifying interest points
• Once you have the scores, you can decide which points to identify as interest points – Could choose the points with the top M scores– Or, choose all points with scores greater than a threshold
% Identify interest points above a threshold, and draw
% a box around them, of size NxN
for i=1:length(vals)
    if vals(i) > threshold
        x = cols(i);
        y = rows(i);
        rectangle('Position', [x-N/2 y-N/2 N N], ...
            'EdgeColor', 'r', ...
            'Linewidth', 2.0);     % default is 0.5
        text(x, y, sprintf('%d', i), ...
            'Color', 'r', ...      % label with id number
            'FontSize', 16);       % default is 10
    end
end
Corner interest points:
• Window size = 15
• Threshold = 100
• Local maxima within 15 pixels
Tracking Features
Tracking Features
• Once we have found a good feature to track, how do we find it in the next image?
  – Assume the feature is represented by a small patch
• We can always use normalized cross correlation to search for the best matching location (see lecture on template matching)
• However, if the movement is small in the image, we can use an iterative search instead
  – This can be very fast, as long as you have a good initial guess
        c(x, y) = Σ_{s,t} [ w(s,t) − w̄ ] [ f(x+s, y+t) − f̄ ] /
                  { Σ_{s,t} [ w(s,t) − w̄ ]² · Σ_{s,t} [ f(x+s, y+t) − f̄ ]² }^(1/2)
Lucas‐Kanade Incremental Refinement
• Does gradient descent on the sum‐of‐squared differences energy function
Method
• We have the current guess of the displacement u
• Compute the intensity difference between the translated image patch and the current image at that point
• Compute the local image gradient
• Update the displacement

From: Richard Szeliski, “Computer Vision: Algorithms and Applications”
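The steps above can be sketched as one Lucas-Kanade update in NumPy. This is a simplified illustration, not a full implementation: it evaluates the residual at the rounded current guess rather than interpolating, uses the gradient of I0, and solves the 2x2 normal equations directly.

```python
import numpy as np

def lk_step(I0, I1, x, y, half, u):
    """One Lucas-Kanade update: solve the normal equations A du = b built
    from the gradient of I0 and the current residual; return u + du."""
    win0 = (slice(y - half, y + half + 1), slice(x - half, x + half + 1))
    Iy, Ix = np.gradient(I0)                      # image gradients
    gx, gy = Ix[win0].ravel(), Iy[win0].ravel()
    # Residual e_i = I1(x_i + u) - I0(x_i), sampled at the rounded guess
    dx, dy = int(round(u[0])), int(round(u[1]))
    win1 = (slice(y + dy - half, y + dy + half + 1),
            slice(x + dx - half, x + dx + half + 1))
    e = (I1[win1] - I0[win0]).ravel()
    A = np.array([[gx @ gx, gx @ gy],
                  [gx @ gy, gy @ gy]])            # same structure matrix as before
    b = -np.array([gx @ e, gy @ e])
    du = np.linalg.solve(A, b)                    # gradient-descent-style update
    return np.array([u[0], u[1]], float) + du
```

Starting from u = 0 on a smoothly textured image shifted by one pixel, a single step already lands close to the true displacement.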
Taylor Series Expansion
• In 2D:

        I(x + u) ≈ I(x) + ∇I(x) · u,   where ∇I = ( ∂I/∂x, ∂I/∂y )
Solve for the Displacement Correction
• Solve for Δu in the normal equations:

        A Δu = b

• A, b can be computed using:

        A = Σ_i ∇I0(x_i) ∇I0(x_i)ᵀ,   b = −Σ_i e_i ∇I0(x_i),
        where e_i = I1(x_i + u) − I0(x_i) is the current intensity error
Hierarchical Search
• If the search range is large, a hierarchical search strategy can be used
• An image pyramid is used, where matches are first found in lower‐resolution images and then used to guide search in higher resolution images
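An image pyramid for this coarse-to-fine search can be sketched with simple 2x2 block averaging. This is a minimal stand-in (a real implementation would blur with a Gaussian before subsampling):

```python
import numpy as np

def build_pyramid(I, levels):
    """Coarse-to-fine pyramid by 2x2 block averaging; level 0 is the
    original image, each later level is half the resolution."""
    pyr = [I.astype(float)]
    for _ in range(levels - 1):
        J = pyr[-1]
        h, w = J.shape[0] // 2 * 2, J.shape[1] // 2 * 2
        J = J[:h, :w]                    # trim odd rows/cols
        pyr.append((J[0::2, 0::2] + J[1::2, 0::2] +
                    J[0::2, 1::2] + J[1::2, 1::2]) / 4.0)
    return pyr
```

A displacement found at pyramid level k corresponds to twice that displacement at level k−1, which is how the coarse match seeds the finer search.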
https://en.wikipedia.org/wiki/Pyramid_(image_processing)
Problems Tracking over Long Sequences
• If features are tracked over a long sequence, their appearance can undergo changes.
• If you continue to match against the originally detected patch (feature), tracking may eventually fail.
• Alternatively, you can re-sample the template in each subsequent frame at the matching location.
  – However, the feature may then drift from its original location to some other location in the image
Affine Tracker
• Using an affine motion model is more robust when tracking over long sequences:

        I2(A x + d) = I1(x)

• where
  – d is the translation, A is the deformation matrix:

        A = [ a_xx  a_xy ]
            [ a_yx  a_yy ]

• This can account for rotation and scale changes
• Also viewpoint changes, if the planar patch is small compared to its distance from the camera
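The affine model I2(A x + d) = I1(x) says image 1 can be reconstructed by sampling image 2 at the warped coordinates A x + d. A minimal nearest-neighbor sketch (a real tracker would use bilinear interpolation; out-of-range samples are set to 0 here):

```python
import numpy as np

def affine_warp(I, A, d):
    """Sample I at A x + d for every pixel x = (col, row): given I = I2 and
    the motion (A, d), this approximates I1 under the affine model."""
    H, W = I.shape
    ys, xs = np.mgrid[0:H, 0:W]
    pts = np.stack([xs.ravel(), ys.ravel()])        # x = (col, row)
    wx, wy = A @ pts + np.asarray(d, float)[:, None]
    wx, wy = np.rint(wx).astype(int), np.rint(wy).astype(int)
    valid = (wx >= 0) & (wx < W) & (wy >= 0) & (wy < H)
    out = np.zeros(H * W)
    out[valid] = I[wy[valid], wx[valid]]            # nearest-neighbor sampling
    return out.reshape(H, W)
```

With A = I and d = 0 the model reduces to the pure-translation tracker used earlier; the extra four parameters of A are what absorb rotation, scale, and mild viewpoint change.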
KLT Tracker
• Find interest points in the first frame
• Track (assuming translation only) points from each frame to subsequent frame
• Test consistency with original frame by computing affine transformation
from: Shi and Tomasi, “Good Features to Track”, IEEE CVPR 1994