
Post on 21-Dec-2015


Page 1: Computer Vision Template matching and object recognition Marc Pollefeys COMP 256 Some slides and illustrations from D. Forsyth, …

Computer Vision

Template matching and object recognition

Marc Pollefeys

COMP 256

Some slides and illustrations from D. Forsyth, …

Page 2

Tentative class schedule

Dates          First class              Second class
Aug 26/28      -                        Introduction
Sep 2/4        Cameras                  Radiometry
Sep 9/11       Sources & Shadows        Color
Sep 16/18      Linear filters & edges   (hurricane Isabel)
Sep 23/25      Pyramids & Texture       Multi-View Geometry
Sep 30/Oct 2   Stereo                   Project proposals
Oct 7/9        Tracking (Welch)         Optical flow
Oct 14/16      -                        -
Oct 21/23      Silhouettes/carving      (Fall break)
Oct 28/30      -                        Structure from motion
Nov 4/6        Project update           Proj. SfM
Nov 11/13      Camera calibration       Segmentation
Nov 18/20      Fitting                  Prob. segm. & fit.
Nov 25/27      Matching templates       (Thanksgiving)
Dec 2/4        Matching relations       Range data
Dec 9          Final project

Page 3

Assignment 3

• Use Hough, RANSAC and EM to estimate a noisy line embedded in noise (details on the web by tonight)

Page 4

Last class: EM (Expectation Maximization)

Alternate:
• Expectation (determine which features belong to the model)
• Maximization (determine ML model parameters)

$$
P(\delta_i = 1 \mid \theta, x_i)
= \frac{P(x_i \mid \delta_i = 1, \theta)\, P(\delta_i = 1)}
       {P(x_i \mid \delta_i = 1, \theta)\, P(\delta_i = 1) + P(x_i \mid \delta_i = 0, \theta)\, P(\delta_i = 0)}
= \frac{\exp\left(-\frac{1}{2\sigma^2}\left[x_i \cos\phi + y_i \sin\phi + c\right]^2\right) \lambda}
       {\exp\left(-\frac{1}{2\sigma^2}\left[x_i \cos\phi + y_i \sin\phi + c\right]^2\right) \lambda + \exp(-k_n)\,(1 - \lambda)}
$$

Parameters $\phi, c, \lambda$:
• $\phi, c$: optimization (weighted with $\delta_i$)
• $\lambda$: counting
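One EM iteration for a line in background noise might be sketched as follows. This is an illustration under stated assumptions, not the course's reference solution: the line is taken as x cos φ + y sin φ + c = 0, the inlier noise is Gaussian with standard deviation `sigma`, the background contributes a constant `exp(-k_n)`, and `lam` is the mixing weight. The names `e_step` and `m_step` are hypothetical, and the M-step uses a weighted total-least-squares fit as one reasonable reading of "optimization (weighted)".

```python
import numpy as np

def e_step(pts, phi, c, lam, sigma=1.0, k_n=2.0):
    """Posterior probability that each point was generated by the line."""
    d = pts[:, 0] * np.cos(phi) + pts[:, 1] * np.sin(phi) + c
    line = np.exp(-d ** 2 / (2.0 * sigma ** 2)) * lam
    noise = np.exp(-k_n) * (1.0 - lam)
    return line / (line + noise)

def m_step(pts, w):
    """Weighted total-least-squares line fit plus the mixing weight."""
    mean = (w[:, None] * pts).sum(axis=0) / w.sum()
    centred = pts - mean
    scatter = (w[:, None] * centred).T @ centred
    # The line normal is the eigenvector with the smallest eigenvalue
    # (np.linalg.eigh returns eigenvalues in ascending order).
    _, evecs = np.linalg.eigh(scatter)
    normal = evecs[:, 0]
    phi = np.arctan2(normal[1], normal[0])
    c = -normal @ mean
    lam = w.mean()  # "counting": average membership
    return phi, c, lam
```

Alternating `e_step` and `m_step` until the parameters stop changing gives the full algorithm; the initial guess matters, since EM only finds a local optimum.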

Page 5

Last class: model selection

AIC: $-2L(D; \theta^*) + 2p$

BIC (and MDL): $-2L(D; \theta^*) + p \log N$

structure complexity

model complexity
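A small sketch of how the two criteria are evaluated, assuming L is the log-likelihood of the data at the fitted parameters, p the number of free parameters and N the number of data items. The function names are illustrative only.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: -2L + 2p (lower is better)."""
    return -2.0 * log_likelihood + 2.0 * n_params

def bic(log_likelihood, n_params, n_data):
    """Bayesian information criterion: -2L + p log N (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_data)
```

For example, a 5-parameter model with log-likelihood -99 loses to a 2-parameter model with log-likelihood -100 under AIC (208 versus 204), because the small gain in fit does not pay for the extra parameters.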

Page 6

Recognition by finding patterns

• We have seen very simple template matching (under filters)
• Some objects behave like quite simple templates
  – Frontal faces
• Strategy:
  – Find image windows
  – Correct lighting
  – Pass them to a statistical test (a classifier) that accepts faces and rejects non-faces

Page 7

Basic ideas in classifiers

• Loss
  – some errors may be more expensive than others
    • e.g. a fatal disease that is easily cured by a cheap medicine with no side-effects -> false positives in diagnosis are better than false negatives
  – We discuss two-class classification: L(1->2) is the loss caused by calling 1 a 2
• Total risk of using classifier s
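The risk expression itself did not survive in this transcript. For the two-class case with the losses defined above, a standard form (stated here as an assumption about what the slide showed) weights each kind of error by its loss:

```latex
R(s) = \Pr\{1 \to 2 \mid \text{using } s\}\, L(1 \to 2)
     + \Pr\{2 \to 1 \mid \text{using } s\}\, L(2 \to 1)
```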

Page 8

Basic ideas in classifiers

• Generally, we should classify as 1 if the expected loss of classifying as 1 is lower than for 2
• gives
    1 if $p(1 \mid x)\, L(1 \to 2) > p(2 \mid x)\, L(2 \to 1)$
    2 if $p(1 \mid x)\, L(1 \to 2) < p(2 \mid x)\, L(2 \to 1)$
• Crucial notion: Decision boundary
  – points where the loss is the same for either case
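The expected-loss rule can be sketched in a few lines. Calling x a 1 costs L(2->1) whenever x is really a 2, and calling it a 2 costs L(1->2) whenever it is really a 1. The function and argument names are hypothetical, and ties are sent to class 2 arbitrarily.

```python
def classify(p1_given_x, loss_1_as_2, loss_2_as_1):
    """Return 1 or 2, whichever label has the lower expected loss."""
    p2_given_x = 1.0 - p1_given_x
    expected_loss_if_1 = p2_given_x * loss_2_as_1  # wrong only if truth is 2
    expected_loss_if_2 = p1_given_x * loss_1_as_2  # wrong only if truth is 1
    return 1 if expected_loss_if_1 < expected_loss_if_2 else 2
```

With p(1|x) = 0.3, equal losses pick class 2, but making L(1->2) ten times larger flips the decision to class 1: the expensive error is avoided even though class 1 is less probable.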

Page 9

Some loss may be inevitable: the minimum risk (shaded area) is called the Bayes risk

Page 10

Finding a decision boundary is not the same as modelling a conditional density.

Page 11

Example: known distributions

• Assume normal class densities, p-dimensional measurements with common (known) covariance $\Sigma$ and different (known) means $\mu_k$:

$$p(x \mid k) = \left(\frac{1}{2\pi}\right)^{p/2} |\Sigma|^{-1/2} \exp\left(-\frac{1}{2}(x - \mu_k)^T \Sigma^{-1} (x - \mu_k)\right)$$

• Class priors are $\pi_k$
• Can ignore a common factor in posteriors - important; posteriors are then:

$$p(k \mid x) \propto \pi_k \left(\frac{1}{2\pi}\right)^{p/2} |\Sigma|^{-1/2} \exp\left(-\frac{1}{2}(x - \mu_k)^T \Sigma^{-1} (x - \mu_k)\right)$$

Page 12

• Classifier boils down to: choose class that minimizes:

$$\delta(x, \mu_k)^2 - 2 \log \pi_k$$

where

$$\delta(x, \mu_k) = \left[(x - \mu_k)^T \Sigma^{-1} (x - \mu_k)\right]^{1/2}$$

is the Mahalanobis distance.

Because the covariance is common, this simplifies to the sign of a linear expression (i.e. a Voronoi diagram in 2D for $\Sigma = I$)
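A minimal sketch of this classifier, assuming the means, the shared covariance and the priors are all known and passed in directly. The function name is illustrative.

```python
import numpy as np

def classify_shared_cov(x, means, cov, priors):
    """Pick the class minimising squared Mahalanobis distance - 2 log prior."""
    cov_inv = np.linalg.inv(cov)
    scores = []
    for mu, prior in zip(means, priors):
        diff = x - mu
        scores.append(diff @ cov_inv @ diff - 2.0 * np.log(prior))
    return int(np.argmin(scores))
```

With equal priors and identity covariance this reduces to nearest-mean classification; a strongly unequal prior shifts the boundary toward the less likely class.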

Page 13

Plug-in classifiers

• Assume that distributions have some parametric form - now estimate the parameters from the data.
• Common:
  – assume a normal distribution with shared covariance, different means; use usual estimates
  – ditto, but different covariances; ditto
• Issue: parameter estimates that are “good” may not give optimal classifiers.

Page 14

Histogram based classifiers

• Use a histogram to represent the class-conditional densities
  – (i.e. p(x|1), p(x|2), etc)
• Advantage: estimates become quite good with enough data!
• Disadvantage: Histogram becomes big with high dimension
  – but maybe we can assume feature independence?

Page 15

Finding skin

• Skin has a very small range of (intensity independent) colours, and little texture
  – Compute an intensity-independent colour measure, check if colour is in this range, check if there is little texture (median filter)
  – See this as a classifier - we can set up the tests by hand, or learn them.
  – get class conditional densities (histograms), priors from data (counting)
• Classifier is
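The classifier expression is missing from the transcript. One common form, assumed here, computes the posterior p(skin | x) from histogram densities and counted priors, then thresholds it. The sketch below works on colour features scaled to [0, 1]; the function names, the bin count and the feature range are all illustrative choices.

```python
import numpy as np

def fit_histogram(samples, bins=4, value_range=(0.0, 1.0)):
    """Normalised histogram as an estimate of p(x | class); samples is (N, D)."""
    dims = samples.shape[1]
    hist, edges = np.histogramdd(samples, bins=bins, range=[value_range] * dims)
    return hist / hist.sum(), edges

def density(hist, edges, x):
    """Look up the histogram cell that contains feature vector x."""
    idx = []
    for e, v in zip(edges, x):
        i = int(np.searchsorted(e, v, side="right")) - 1
        idx.append(min(max(i, 0), len(e) - 2))  # clamp to valid cells
    return hist[tuple(idx)]

def skin_posterior(x, skin, nonskin, bins=4):
    """p(skin | x) from histogram densities and priors obtained by counting."""
    h_s, edges = fit_histogram(skin, bins)
    h_n, _ = fit_histogram(nonskin, bins)
    prior_s = len(skin) / (len(skin) + len(nonskin))
    num = density(h_s, edges, x) * prior_s
    den = num + density(h_n, edges, x) * (1.0 - prior_s)
    return num / den if den > 0 else prior_s  # empty cell: fall back to prior
```

A pixel would then be labelled skin when `skin_posterior(x, ...)` exceeds a chosen threshold; sweeping that threshold traces out an ROC curve.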

Page 16

Figure from “Statistical color models with application to skin detection,” M.J. Jones and J. Rehg, Proc. Computer Vision and Pattern Recognition, 1999 copyright 1999, IEEE

Page 17

Figure from “Statistical color models with application to skin detection,” M.J. Jones and J. Rehg, Proc. Computer Vision and Pattern Recognition, 1999 copyright 1999, IEEE

Receiver Operating Curve

Page 18

Finding faces

• Faces “look like” templates (at least when they’re frontal).
• General strategy:
  – search image windows at a range of scales
  – Correct for illumination
  – Present corrected window to classifier
• Issues
  – How corrected?
  – What features?
  – What classifier?
  – what about lateral views?

Page 19

Naive Bayes

• (Important: naive not necessarily pejorative)
• Find faces by vector quantizing image patches, then computing a histogram of patch types within a face
• Histogram doesn’t work when there are too many features
  – features are the patch types
  – assume they’re independent and cross fingers
  – reduction in degrees of freedom
  – very effective for face finders
    • why? probably because the examples that would present real problems aren’t frequent.

Many face finders on the face detection home page: http://home.t-online.de/home/Robert.Frischholz/face.htm

Page 20

Figure from A Statistical Method for 3D Object Detection Applied to Faces and Cars, H. Schneiderman and T. Kanade, Proc. Computer Vision and Pattern Recognition, 2000, copyright 2000, IEEE

Page 21

Face Recognition

• Whose face is this? (perhaps in a mugshot)
• Issue:
  – What differences are important and what not?
  – Reduce the dimension of the images, while maintaining the “important” differences.
• One strategy:
  – Principal components analysis

Page 22

Template matching

• Simple cross-correlation between images
• Best match wins

$$\arg\max_{i \in S} \left( I_i^T I \right)$$

• Computationally expensive, i.e. requires the presented image to be correlated with every image in the database!
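A minimal sketch of best-match-wins template matching over a database of vectorised images. The slide correlates raw images; subtracting the mean and normalising each image first is an added assumption here, made so that brightness and contrast do not decide the winner. The function name is illustrative.

```python
import numpy as np

def best_match(database, query):
    """Index of the database image most correlated with the query.

    database: (N, D) array, one vectorised image per row.
    query:    (D,) array.
    Assumes no image is constant (zero variance would divide by zero).
    """
    def normalise(v):
        v = v - v.mean(axis=-1, keepdims=True)
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    scores = normalise(database) @ normalise(query)
    return int(np.argmax(scores)), scores
```

The cost is one D-dimensional dot product per database image, which is the expense the slide points out.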

Page 23

Eigenspace matching

• Consider PCA

$$I_i \approx E\, p_i$$

• Then,

$$I_i^T I \approx p_i^T E^T E\, p = p_i^T p$$

$$\arg\max_{i \in S} \left( I_i^T I \right) \approx \arg\max_{i \in S} \left( p_i^T p \right)$$

Much cheaper to compute!
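The saving can be sketched with an orthonormal basis E obtained from an SVD of the image matrix. This is an illustration only: it skips mean subtraction to stay close to the relation I_i ≈ E p_i, and the function names are hypothetical.

```python
import numpy as np

def build_eigenspace(images, k):
    """Return basis E (D, k) and coefficients P (N, k) for images (N, D)."""
    _, _, vt = np.linalg.svd(images, full_matrices=False)
    E = vt[:k].T        # orthonormal columns: the k dominant directions
    P = images @ E      # p_i = E^T I_i, one row per database image
    return E, P

def match(E, P, query):
    """Correlate in the k-dimensional eigenspace instead of pixel space."""
    p = E.T @ query
    scores = P @ p      # p_i^T p for every database image
    return int(np.argmax(scores)), scores
```

Each comparison now costs k multiplications instead of D. When k equals the rank of the image matrix and the query lies in the span of the database, the scores equal the full pixel-space correlations exactly; for smaller k they are approximations.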

Page 24

Page 25

Eigenfaces

plus a linear combination of eigenfaces

Page 26

Appearance manifold approach

- for every object, sample the set of viewing conditions
- use these images as feature vectors
- apply a PCA over all the images
- keep the dominant PCs
- the sequence of views for one object represents a manifold in the space of projections
- what is the nearest manifold for a given view?

(Nayar et al. ‘96)

Page 27

Object-pose manifold

• Appearance changes projected on PCs (1D pose changes)

• Sufficient characterization for recognition and pose estimation

Page 28

Real-time system (Nayar et al. ‘96)

Page 29

Difficulties with PCA

• Projection may suppress important detail
  – smallest variance directions may not be unimportant
• Method does not take discriminative task into account
  – typically, we wish to compute features that allow good discrimination
  – not the same as largest variance

Page 30

Page 31

Page 32

Linear Discriminant Analysis

• We wish to choose linear functions of the features that allow good discrimination.
  – Assume class-conditional covariances are the same
  – Want linear feature that maximises the spread of class means for a fixed within-class variance
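For two classes, the direction that maximises the spread of the class means relative to the within-class scatter is proportional to S_w^{-1}(m_1 - m_0). The sketch below assumes that two-class setting; the function name is illustrative.

```python
import numpy as np

def fisher_direction(class0, class1):
    """Unit vector w maximising between-class over within-class scatter."""
    m0, m1 = class0.mean(axis=0), class1.mean(axis=0)
    s_w = (class0 - m0).T @ (class0 - m0) + (class1 - m1).T @ (class1 - m1)
    w = np.linalg.solve(s_w, m1 - m0)
    return w / np.linalg.norm(w)
```

If both classes vary widely along x but are separated only along y, PCA would pick the x axis (largest variance) while this direction points along y, which is the one that discriminates.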

Page 33

Page 34

Page 35

Page 36

Neural networks

• Linear decision boundaries are useful
  – but often not very powerful
  – we seek an easy way to get more complex boundaries
• Compose linear decision boundaries
  – i.e. have several linear classifiers, and apply a classifier to their output
  – a nuisance, because sign(ax+by+cz) etc. isn’t differentiable.
  – use a smooth “squashing function” in place of sign.

Page 37

Page 38

Page 39

Training

• Choose parameters to minimize error on training set

$$\text{Error}(p) = \frac{1}{2} \sum_e \left| n(x^e; p) - o^e \right|^2$$

• Stochastic gradient descent, computing gradient using trick (backpropagation, aka the chain rule)
• Stop when error is low, and hasn’t changed much
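The smallest instance of this training scheme is a single sigmoid unit trained on the squared error one example at a time. The sketch below is that reduced case, not a full multi-layer network: the gradient is the chain rule applied through the sigmoid, the examples are visited in a fixed order rather than sampled, and the learning rate and epoch count are arbitrary choices.

```python
import numpy as np

def sigmoid(z):
    """Smooth squashing function used in place of sign."""
    return 1.0 / (1.0 + np.exp(-z))

def train_unit(xs, targets, lr=1.0, epochs=2000):
    """Per-example gradient descent on 0.5 * (output - target)^2."""
    w = np.zeros(xs.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(xs, targets):
            o = sigmoid(w @ x + b)
            # Chain rule: dE/dz = (o - t) * sigmoid'(z) = (o - t) * o * (1 - o)
            delta = (o - t) * o * (1.0 - o)
            w -= lr * delta * x
            b -= lr * delta
    return w, b
```

In a multi-layer network the same `delta` is propagated backwards through each layer, which is all that backpropagation adds.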

Page 40

The vertical face-finding part of Rowley, Baluja and Kanade’s system

Figure from “Rotation invariant neural-network based face detection,” H.A. Rowley, S. Baluja and T. Kanade, Proc. Computer Vision and Pattern Recognition, 1998, copyright 1998, IEEE

Page 41

Histogram equalisation gives an approximate fix for illumination induced variability
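A minimal sketch of histogram equalisation for an 8-bit greyscale window, assuming the input is a NumPy `uint8` array. It maps grey levels through the normalised cumulative histogram so the output uses the full intensity range; the function name is illustrative, and the cited face detector may differ in detail.

```python
import numpy as np

def equalize(img):
    """Histogram-equalise a uint8 greyscale image."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # cumulative count at the darkest level present
    denom = cdf[-1] - cdf_min
    if denom == 0:                     # constant image: nothing to stretch
        return img.copy()
    lut = np.round((cdf - cdf_min) / denom * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[img]                    # apply the lookup table per pixel
```

Two windows of the same face under different overall brightness end up with similar grey-level distributions, which is why the fix is only approximate: it cannot undo directional shading.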

Page 42

Architecture of the complete system: they use another neural net to estimate the orientation of the face, then rectify it. They search over scales to find bigger/smaller faces.

Figure from “Rotation invariant neural-network based face detection,” H.A. Rowley, S. Baluja and T. Kanade, Proc. Computer Vision and Pattern Recognition, 1998, copyright 1998, IEEE

Page 43

Figure from “Rotation invariant neural-network based face detection,” H.A. Rowley, S. Baluja and T. Kanade, Proc. Computer Vision and Pattern Recognition, 1998, copyright 1998, IEEE

Page 44

Convolutional neural networks

• Template matching using NN classifiers seems to work
• Natural features are filter outputs
  – probably, spots and bars, as in texture
  – but why not learn the filter kernels, too?

Page 45

Figure from “Gradient-Based Learning Applied to Document Recognition”, Y. Lecun et al Proc. IEEE, 1998 copyright 1998, IEEE

A convolutional neural network, LeNet; the layers filter, subsample, filter, subsample, and finally classify based on outputs of this process.

Page 46

LeNet is used to classify handwritten digits. Notice that the test error rate is not the same as the training error rate, because the test set consists of items not in the training set. Not all classification schemes necessarily have small test error when they have small training error.

Figure from “Gradient-Based Learning Applied to Document Recognition”, Y. Lecun et al Proc. IEEE, 1998 copyright 1998, IEEE

Page 47

Support Vector Machines

• Neural nets try to build a model of the posterior, p(k|x)
• Instead, try to obtain the decision boundary directly
  – potentially easier, because we need to encode only the geometry of the boundary, not any irrelevant wiggles in the posterior.
  – Not all points affect the decision boundary

Page 48

Support Vector Machines

• Set S of points $x_i \in R^n$, each $x_i$ belongs to one of two classes $y_i \in \{-1, 1\}$
• The goal is to find a hyperplane that divides S into these two classes

S is separable if $\exists\, w \in R^n, b \in R$ such that

$$y_i (w \cdot x_i + b) \ge 1$$

Separating hyperplanes: $w \cdot x + b = 0$

Distance of $x_i$ to the hyperplane:

$$d_i = \frac{w \cdot x_i + b}{\|w\|}, \qquad y_i d_i \ge \frac{1}{\|w\|}$$

Closest point: $y_i d_i = \dfrac{1}{\|w\|}$, with $\|w\| = \sqrt{w \cdot w}$

Page 49

Support Vector Machines

Optimal separating hyperplane (OSH)

• Optimal separating hyperplane maximizes $\dfrac{1}{\|w\|}$

Problem 1:

Minimize $\frac{1}{2}\, w \cdot w$

Subject to $y_i (w \cdot x_i + b) \ge 1, \quad i = 1, 2, \ldots, N$

Figure labels: margin $\dfrac{2}{\|w\|}$; support vectors

Page 50

Solve using Lagrange multipliers

• Lagrangian

$$L(w, b, \alpha) = \frac{1}{2}\, w \cdot w - \sum_{i=1}^{N} \alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right]$$

  – at solution

$$\frac{\partial L}{\partial b} = -\sum_{i=1}^{N} \alpha_i y_i = 0, \qquad \frac{\partial L}{\partial w} = w - \sum_{i=1}^{N} \alpha_i y_i x_i = 0$$

  – therefore

$$L = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j y_i y_j\, x_i \cdot x_j, \qquad \alpha \ge 0$$

Page 51

Dual problem

Problem 2:

Minimize $\frac{1}{2}\, \alpha^T D\, \alpha - \sum_{i=1}^{N} \alpha_i$

Subject to $\sum_{i=1}^{N} y_i \alpha_i = 0, \quad \alpha \ge 0$

where $D_{ij} = y_i y_j\, x_i \cdot x_j$

Kuhn-Tucker condition: $\alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right] = 0$

$$w = \sum_{i=1}^{N} \alpha_i y_i x_i \qquad (\alpha_i > 0 \text{ only for support vectors})$$

$$b = y_j - w \cdot x_j \qquad (\text{for } x_j \text{ a support vector})$$

Page 52

Linearly non-separable cases

• Find trade-off between maximum separation and misclassifications

$$y_i (w \cdot x_i + b) \ge 1 - \xi_i$$

Figure label: $\dfrac{1 - \xi_i}{\|w\|}$

Problem 3:

Minimize $\frac{1}{2}\, w \cdot w + C \sum_{i} \xi_i$

Subject to $y_i (w \cdot x_i + b) \ge 1 - \xi_i, \quad i = 1, 2, \ldots, N, \quad \xi \ge 0$

Page 53

Dual problem for non-separable cases

Problem 4:

Minimize $\frac{1}{2}\, \alpha^T D\, \alpha - \sum_{i=1}^{N} \alpha_i$

Subject to $\sum_{i=1}^{N} y_i \alpha_i = 0, \quad 0 \le \alpha_i \le C$

where $D_{ij} = y_i y_j\, x_i \cdot x_j$

Kuhn-Tucker condition:

$$\alpha_i \left[ y_i (w \cdot x_i + b) - 1 + \xi_i \right] = 0, \qquad (C - \alpha_i)\, \xi_i = 0$$

Support vectors:
  – margin vectors: $0 < \alpha_i < C$, $\xi_i = 0$
  – errors: $\alpha_i = C$
    • $\xi_i > 1$: misclassified
    • $0 < \xi_i \le 1$: too close to OSH

Page 54

Decision function

• Once w and b have been computed the classification decision for input x is given by

$$f(x) = \operatorname{sign}(w \cdot x + b)$$

• Note that the globally optimal solution can always be obtained (convex problem)
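A small sketch of evaluating the decision once the multipliers are known, using w = Σ α_i y_i x_i and the sign of w·x + b. It assumes a linear SVM whose dual has already been solved elsewhere; the function name and the toy values below are illustrative.

```python
import numpy as np

def svm_decision(x, support_vectors, labels, alphas, b):
    """Classify x as +1 or -1 from the support-vector expansion of w."""
    w = (alphas * labels) @ support_vectors   # w = sum_i alpha_i y_i x_i
    return int(np.sign(w @ x + b))            # 0 only exactly on the hyperplane
```

For the two-point problem x_1 = (1, 0) with y_1 = +1 and x_2 = (-1, 0) with y_2 = -1, the optimum is α_1 = α_2 = 0.5, giving w = (1, 0) and b = y_1 - w·x_1 = 0, so every point with positive first coordinate is labelled +1.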

Page 55

Non-linear SVMs

• Non-linear separation surfaces can be obtained by non-linearly mapping the data to a high dimensional space and then applying the linear SVM technique
• Note that data only appears through vector product
• Need for vector product in high-dimension can be avoided by using Mercer kernels:

$$K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)$$

$K(x, y) = (x \cdot y)^p$ (Polynomial kernel)

$K(x, y) = \exp\left(-\dfrac{\|x - y\|^2}{2\sigma^2}\right)$ (Radial Basis Function)

$K(x, y) = \tanh(\kappa\, x \cdot y - \delta)$ (Sigmoidal function)

e.g.

$$K(x, y) = (x \cdot y)^2 = x_1^2 y_1^2 + 2 x_1 x_2 y_1 y_2 + x_2^2 y_2^2$$

$$\Phi(x) = \left( x_1^2,\ \sqrt{2}\, x_1 x_2,\ x_2^2 \right)$$
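The degree-2 polynomial example can be checked numerically: evaluating the kernel in the original 2D space gives the same number as an explicit dot product in the 3D feature space. The function names are illustrative.

```python
import numpy as np

def poly2_kernel(x, y):
    """K(x, y) = (x . y)^2, computed without leaving the input space."""
    return float(np.dot(x, y)) ** 2

def phi(x):
    """Explicit feature map whose dot product reproduces poly2_kernel."""
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])
```

For x = (1, 2) and y = (3, -1), x·y = 1 so the kernel is 1, and Φ(x)·Φ(y) = 9 - 12 + 4 = 1 as well. For higher degrees or for the RBF kernel the feature space is far larger (or infinite), which is where avoiding the explicit map pays off.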

Page 56

Space in which decision boundary is linear - a conic in the original space has the form

$$(x, y) \to (x^2, xy, y^2, x, y) = (u_0, u_1, u_2, u_3, u_4)$$

$$a u_0 + b u_1 + c u_2 + d u_3 + e u_4 + f = 0$$

Page 57

SVMs for 3D object recognition

- Consider images as vectors
- Compute pairwise OSH using linear SVM
- Support vectors are representative views of the considered object (relative to other)
- Tournament like classification
  - Competing classes are grouped in pairs
  - Not selected classes are discarded
  - Until only one class is left
  - Complexity linear in number of classes
- No pose estimation

(Pontil & Verri PAMI’98)
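The tournament scheme can be sketched independently of the SVMs themselves. Here `duel(a, b)` is a hypothetical callback standing in for the pairwise classifier: it returns whichever of the two classes the pairwise OSH selects for the current input. Giving a bye to the unpaired class in an odd round is an assumption of this sketch.

```python
def tournament(classes, duel):
    """Knock-out classification: pair classes, keep winners, repeat."""
    remaining = list(classes)
    while len(remaining) > 1:
        winners = [duel(a, b) for a, b in zip(remaining[0::2], remaining[1::2])]
        if len(remaining) % 2:          # odd count: last class gets a bye
            winners.append(remaining[-1])
        remaining = winners
    return remaining[0]
```

Every duel eliminates exactly one class, so N classes need N - 1 pairwise evaluations, which is the linear complexity noted above.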

Page 58

Vision applications

• Reliable, simple classifier,
  – use it wherever you need a classifier
• Commonly used for face finding
• Pedestrian finding
  – many pedestrians look like lollipops (hands at sides, torso wider than legs) most of the time
  – classify image regions, searching over scales
  – But what are the features?
  – Compute wavelet coefficients for pedestrian windows, average over pedestrians. If the average is different from zero, probably strongly associated with pedestrian

Page 59

Figure from, “A general framework for object detection,” by C. Papageorgiou, M. Oren and T. Poggio, Proc. Int. Conf. Computer Vision, 1998, copyright 1998, IEEE

Page 60

Figure from, “A general framework for object detection,” by C. Papageorgiou, M. Oren and T. Poggio, Proc. Int. Conf. Computer Vision, 1998, copyright 1998, IEEE

Page 61

Figure from, “A general framework for object detection,” by C. Papageorgiou, M. Oren and T. Poggio, Proc. Int. Conf. Computer Vision, 1998, copyright 1998, IEEE

Page 62

Latest results on Pedestrian Detection:

Viola, Jones and Snow’s paper (ICCV’03: Marr prize)

• Combine static and dynamic features

cascade for efficiency (4 frames/s)

5 best out of 55k (AdaBoost)

5 best static out of 28k (AdaBoost)

some positive examples used for training

Page 63

Dynamic detection

false detection: typically 1/400,000 (= 1 every 2 frames for 360x240)

Page 64

Static detection

Page 65

Next class:

Object recognition

Reading: Chapter 23