object recognition szeliski chapter 14. recognition

57
Object Recognition Szeliski Chapter 14

Post on 19-Dec-2015

292 views

Category:

Documents


10 download

TRANSCRIPT

Page 1: Object Recognition Szeliski Chapter 14. Recognition

Object Recognition

Szeliski Chapter 14

Page 2: Object Recognition Szeliski Chapter 14. Recognition

Recognition

Page 3: Object Recognition Szeliski Chapter 14. Recognition

Recognition

Simultaneous recognition and segmentation (Shotton, Winn, Rother et al. 2009)

Location recognition (Philbin, Chum, Isard et al. 2007)

Using context (Russell, Torralba, Liu et al. 2007).

Page 4: Object Recognition Szeliski Chapter 14. Recognition

Recognition Slides from Rick Szeliski

Recognition problems

What is it?• Object and scene recognition

Who is it?• Identity recognition

Where is it?• Object detection

What are they doing?• Activities

All of these are classification problems• Choose one class from a list of possible candidates

Page 5: Object Recognition Szeliski Chapter 14. Recognition

Recognition Slides from Rick Szeliski

What is recognition?

A different taxonomy from [Csurka et al. 2006]:• Recognition

• Where is this particular object?

• Categorization• What kind of object(s) is(are) present?

• Content-based image retrieval• Find me something that looks similar

• Detection• Locate all instances of a given class

Page 6: Object Recognition Szeliski Chapter 14. Recognition

Recognition Slides from Rick Szeliski

Readings

• Weakly Supervised Scale-Invariant Learning of Models for Visual Recognition

Fergus, R. , Perona, P. and Zisserman, A. International Journal of Computer Vision, Vol. 71(3), 273-303, March 2007

• MIT course • http://people.csail.mit.edu/torralba/courses/

6.870/6.870.recognition.htm

Page 7: Object Recognition Szeliski Chapter 14. Recognition

Recognition Slides from Rick Szeliski

Sources

• Steve Seitz, CSE 455/576, previous quarters

• Fei-Fei, Fergus, Torralba, CVPR’2007 course

• Efros, CMU 16-721 Learning in Vision

• Freeman, MIT 6.869 Computer Vision: Learning

• Linda Shapiro, CSE 576, Spring 2007

Page 8: Object Recognition Szeliski Chapter 14. Recognition

Object Detection

How to recognize each person in this image?

Recognition

Every possible sub-window?

Effective special-purpose detectors: rapidly find likely regions where particular objects might occur.

Page 9: Object Recognition Szeliski Chapter 14. Recognition

Object Detection

• Face Detection• More successful • Built in

• digital cameras to enhance auto-focus• Video conference to control pan-tilt heads• …

• General object detection • Pedestrian • Cars.

Recognition

Page 10: Object Recognition Szeliski Chapter 14. Recognition

Face Detection

• Every pixel? Scale?• Too slow in practice

• Tutorials• http://people.csail.mit.edu/torralba/shortCourseRLOC/

• General detection and recognition

• http://vision.ai.uiuc.edu/mhyang/face-detection-survey.html• Face recognition

Recognition

Page 11: Object Recognition Szeliski Chapter 14. Recognition

CSE 576, Spring 2008 Face Recognition and Detection 12

Face detection

How to tell if a face is present?

Page 12: Object Recognition Szeliski Chapter 14. Recognition

CSE 576, Spring 2008 Face Recognition and Detection 13

Skin detection

Skin pixels have a distinctive range of colors• Corresponds to region(s) in RGB color space

Skin classifier• A pixel X = (R,G,B) is skin if it is in the skin (color) region• How to find this region?

skin

Page 13: Object Recognition Szeliski Chapter 14. Recognition

CSE 576, Spring 2008 Face Recognition and Detection 14

Skin detection

Learn the skin region from examples• Manually label skin/non pixels in one or more “training images”• Plot the training data in RGB space

– skin pixels shown in orange, non-skin pixels shown in gray

– some skin pixels may be outside the region, non-skin pixels inside.

Page 14: Object Recognition Szeliski Chapter 14. Recognition

CSE 576, Spring 2008 Face Recognition and Detection 15

Skin classifier

Given X = (R,G,B): how to determine if it is skin or not?• Nearest neighbor

– find labeled pixel closest to X

• Find plane/curve that separates the two classes– popular approach: Support Vector Machines (SVM)

• Data modeling– fit a probability density/distribution model to each class

Page 15: Object Recognition Szeliski Chapter 14. Recognition

CSE 576, Spring 2008 Face Recognition and Detection 16

Probability

• X is a random variable• P(X) is the probability that X achieves a certain

value

continuous X discrete X

called a PDF-probability distribution/density function-a 2D PDF is a surface-3D PDF is a volume

Page 16: Object Recognition Szeliski Chapter 14. Recognition

CSE 576, Spring 2008 Face Recognition and Detection 17

Probabilistic skin classification

Model PDF / uncertainty• Each pixel has a probability of being skin or not skin

Skin classifier• Given X = (R,G,B): how to determine if it is skin or not?

• Choose interpretation of highest probability

Where do we get and ?

Page 17: Object Recognition Szeliski Chapter 14. Recognition

CSE 576, Spring 2008 Face Recognition and Detection 18

Learning conditional PDF’s

We can calculate P(R | skin) from a set of training images• It is simply a histogram over the pixels in the training images

• each bin Ri contains the proportion of skin pixels with color Ri

• This doesn’t work as well in higher-dimensional spaces. Why not?Approach: fit parametric PDF functions

• common choice is rotated Gaussian – center – covariance

Page 18: Object Recognition Szeliski Chapter 14. Recognition

CSE 576, Spring 2008 Face Recognition and Detection 19

Learning conditional PDF’s

We can calculate P(R | skin) from a set of training images

But this isn’t quite what we want• Why not? How to determine if a pixel is skin?• We want P(skin | R) not P(R | skin)• How can we get it?

Page 19: Object Recognition Szeliski Chapter 14. Recognition

CSE 576, Spring 2008 Face Recognition and Detection 20

Bayes rule

In terms of our problem:

what we measure(likelihood)

domain knowledge(prior)

what we want(posterior)

normalization term

What can we use for the prior P(skin)?• Domain knowledge:

– P(skin) may be larger if we know the image contains a person– For a portrait, P(skin) may be higher for pixels in the center

• Learn the prior from the training set. How?– P(skin) is proportion of skin pixels in training set

Page 20: Object Recognition Szeliski Chapter 14. Recognition

CSE 576, Spring 2008 Face Recognition and Detection 21

Bayesian estimation

Bayesian estimation• Goal is to choose the label (skin or ~skin) that maximizes the posterior ↔

minimizes probability of misclassification– this is called Maximum A Posteriori (MAP) estimation

likelihood posterior (unnormalized)

Page 21: Object Recognition Szeliski Chapter 14. Recognition

CSE 576, Spring 2008 Face Recognition and Detection 22

Skin detection results

Page 22: Object Recognition Szeliski Chapter 14. Recognition

CSE 576, Spring 2008 Face Recognition and Detection 23

This same procedure applies in more general circumstances• More than two classes• More than one dimension

General classification

Example: face detection• Here, X is an image region

– dimension = # pixels – each face can be thought of as a

point in a high dimensional space

H. Schneiderman, T. Kanade. "A Statistical Method for 3D Object Detection Applied to Faces and Cars". CVPR 2000

Page 23: Object Recognition Szeliski Chapter 14. Recognition

Face Detection

• Feature-based • Distinctive image features: eyes, nose, mouth

• Geometrical arrangement

• Template-based • Active shape model (ASM), active appearance model (AAM)

• Requires good initialization near a real face

• Appearance-based • Search for likely candidates

• Refine using cascade of more expensive but selective detection algorithm

• Rely heavily on training classifiers using labeled faces

Recognition

Page 24: Object Recognition Szeliski Chapter 14. Recognition

Recognition Slides from Rick Szeliski

Page 25: Object Recognition Szeliski Chapter 14. Recognition

Li Fei-Fei, Princeton

Rob Fergus, MIT

Antonio Torralba, MIT

Recognizing and Learning Recognizing and Learning Object Categories: Year 2007Object Categories: Year 2007

CVPR 2007 Minneapolis, Short Course, June 17CVPR 2007 Minneapolis, Short Course, June 17

(see other slide deck)

Page 26: Object Recognition Szeliski Chapter 14. Recognition

Recognition Slides from Rick Szeliski

Today’s lecture

• Known object recognition [Lowe]

• Bag of keypoints [Csurka etc.]

• Location recognition [Schindler et al.]

• Deformable object/category recognition [Fergus et al.]

• Recognition by segmentation

Page 27: Object Recognition Szeliski Chapter 14. Recognition

Recognition Slides from Rick Szeliski

Single object recognition

Page 28: Object Recognition Szeliski Chapter 14. Recognition

Recognition Slides from Rick Szeliski

Single object recognition

• Lowe, et al. 1999, 2003• Mahamud and Herbert, 2000• Ferrari, Tuytelaars, and Van Gool, 2004• Rothganger, Lazebnik, and Ponce, 2004• Moreels and Perona, 2005• …

Page 29: Object Recognition Szeliski Chapter 14. Recognition

Recognition Slides from Rick Szeliski

Planar object recognition [Lowe]

• Use SIFT features• Verify affine

(or homography) geometric alignment

Page 30: Object Recognition Szeliski Chapter 14. Recognition

Recognition Slides from Rick Szeliski

Planar object recognition [Lowe]

• Use SIFT features• Verify affine

(or homography) geometric alignment

Page 31: Object Recognition Szeliski Chapter 14. Recognition

Recognition Slides from Rick Szeliski

3D object recognition [Lowe]

• Extract object outlines with background subtraction

Page 32: Object Recognition Szeliski Chapter 14. Recognition

Recognition Slides from Rick Szeliski

3D object recognition [Lowe]

• Use 3 matches to recognize

• Use additional matches for verification

• Tolerant to occlusions

Page 33: Object Recognition Szeliski Chapter 14. Recognition

Recognition Slides from Rick Szeliski

Feature-based recognition

How can we scale to millions of objects?

Comparison to all stored objects/features is infeasible.

Answer:• quantize features into words [Csurka et al.

04]• use information retrieval (inverted index)• use metric tree for faster quantization (NN)

[Nister & Stewenius 05]

Page 34: Object Recognition Szeliski Chapter 14. Recognition

Recognition Slides from Rick Szeliski

Today’s lecture

• Known object recognition [Lowe]

• Bag of keypoints [Csurka etc.]

• Location recognition [Schindler et al.]

• Deformable object/category recognition [Fergus et al.]

• Recognition by segmentation

Page 35: Object Recognition Szeliski Chapter 14. Recognition

CVPR 2007 Minneapolis, Short Course, June 17CVPR 2007 Minneapolis, Short Course, June 17

Part 1: Bag-of-words models

by Li Fei-Fei (Princeton)

(see other slide deck)

Page 36: Object Recognition Szeliski Chapter 14. Recognition

Recognition Slides from Rick Szeliski

Today’s lecture

• Known object recognition [Lowe]

• Bag of keypoints [Csurka etc.]

• Location recognition [Schindler et al.]

• Deformable object/category recognition [Fergus et al.]

• Recognition by segmentation

Page 37: Object Recognition Szeliski Chapter 14. Recognition

Recognition Slides from Rick Szeliski

How to scale to 106s of images?

Make “word” generation even more efficient:

“Vocabulary tree”

Page 38: Object Recognition Szeliski Chapter 14. Recognition

Recognition Slides from Rick Szeliski

Scalable Recognition with a Vocabulary Tree

David Nistér, Henrik Stewénius

Page 39: Object Recognition Szeliski Chapter 14. Recognition

Recognition Slides from Rick Szeliski

Vocabulary Tree

Page 40: Object Recognition Szeliski Chapter 14. Recognition

Recognition Slides from Rick Szeliski

Page 41: Object Recognition Szeliski Chapter 14. Recognition

Recognition Slides from Rick Szeliski

Performance

Page 42: Object Recognition Szeliski Chapter 14. Recognition

Recognition Slides from Rick Szeliski

http://vis.uky.edu/~stewe/ukbench/

Page 43: Object Recognition Szeliski Chapter 14. Recognition

Recognition Slides from Rick Szeliski

Location Recognition

Can we apply this to recognizing your location from a cell-phone photo?

Page 44: Object Recognition Szeliski Chapter 14. Recognition

City-Scale Location Recognition

Grant Schindler, Matthew Brown,and Richard Szeliski

CVPR’2007

Page 45: Object Recognition Szeliski Chapter 14. Recognition

Recognition Slides from Rick Szeliski

The Problem

Page 46: Object Recognition Szeliski Chapter 14. Recognition

Recognition Slides from Rick Szeliski

Main idea

Find N-best matches in vocabulary tree

Page 47: Object Recognition Szeliski Chapter 14. Recognition

Recognition Slides from Rick Szeliski

Other ideas

• Use only informativefeatures (ignore trees…)

• Integrate matches withadjacent (streetside) neighbors

Page 48: Object Recognition Szeliski Chapter 14. Recognition

Recognition Slides from Rick Szeliski

Today’s lecture

• Known object recognition [Lowe]

• Bag of keypoints [Csurka etc.]

• Location recognition [Schindler et al.]

• Deformable object/category recognition [Fergus et al.]

• Recognition by segmentation

Page 49: Object Recognition Szeliski Chapter 14. Recognition

Part 2: part-based models

by Rob Fergus (MIT)

CVPR 2007 Minneapolis, Short Course, June 17CVPR 2007 Minneapolis, Short Course, June 17

(see other slide deck)

Page 50: Object Recognition Szeliski Chapter 14. Recognition

Recognition Slides from Rick Szeliski

Today’s lecture

• Known object recognition [Lowe]

• Bag of keypoints [Csurka etc.]

• Location recognition [Schindler et al.]

• Deformable object/category recognition [Fergus et al.]

• Recognition by segmentation

Page 51: Object Recognition Szeliski Chapter 14. Recognition

Part 4: Combined segmentation and recognition

by Rob Fergus (MIT)

CVPR 2007 Minneapolis, Short Course, June 17CVPR 2007 Minneapolis, Short Course, June 17

Page 52: Object Recognition Szeliski Chapter 14. Recognition

Recognition Slides from Rick Szeliski

Aim

Given an image and object category, segment the object

Segmentation should (ideally) be• shaped like the object e.g. cow-like• obtained efficiently in an unsupervised manner• able to handle self-occlusion

Segmentation

ObjectCategory

Model

Cow Image Segmented Cow

Slide from Kumar ‘05

Page 53: Object Recognition Szeliski Chapter 14. Recognition

Recognition Slides from Rick Szeliski

Implicit Shape Model - Liebe and Schiele, 2003

BackprojectedHypotheses

Interest PointsMatched Codebook Entries

Probabilistic Voting

Voting Space(continuous)

Backprojection

of Maxima

Segmentation

Refined Hypotheses(uniform sampling)

Page 54: Object Recognition Szeliski Chapter 14. Recognition

Recognition Slides from Rick Szeliski

Other topics: context (scenes)

Antonio Torralba, Contextual Priming for Object Detection,IJCV(53), No. 2, July 2003, pp. 169-191

Page 55: Object Recognition Szeliski Chapter 14. Recognition

Recognition Slides from Rick Szeliski

New work: tiny images

Page 56: Object Recognition Szeliski Chapter 14. Recognition

Datasets and object collections

CVPR 2007 Minneapolis, Short Course, June 17CVPR 2007 Minneapolis, Short Course, June 17

(see other slide deck)

Page 57: Object Recognition Szeliski Chapter 14. Recognition

Recognition Slides from Rick Szeliski

Summary of object recognition

• Known object recognition [Lowe]

• Bag of keypoints [Csurka etc.]

• Location recognition [Schindler et al.]

• Deformable object/category recognition [Fergus et al.]

• Recognition by segmentation

• Context and scenes