
Object recognition

Jana Kosecka

Slides from D. Lowe, D. Forsythe and J. Ponce book, ICCV 2005 Tutorial Fei-Fei Li,Rob Fergus and A. Torralba

perceptible, vision, material thing

Challenges 1: view point variation

Michelangelo 1475-1564

Challenges 2: illumination

slide credit: S. Ullman

Challenges 3: occlusion

Magritte, 1957

Challenges 4: scale

Challenges 5: deformation

Xu, Beihong 1943

Challenges 6: background clutter

Klimt, 1913

Object Recognition - history

• Collected databases of objects on uniform background (no occlusions, no clutter)

• Mostly focus on viewpoint variations (at fixed scale)

• PCA-based techniques, Eigenimages [Turk 91], [Leonardis & Bishop 98]
• Not robust with respect to occlusions, clutter, and changes of viewpoint
• Requires segmentation

History - Recognition

• Alternative representations - Color histogram [Swain 91] (not discriminative enough)

• Geometric invariants [Rothwell 92]
  – a function whose value is independent of the transformation
  – invariant under image rotation: distance between two points
  – invariant under planar homography: cross-ratio

Figure from “Efficient model library access by projectively invariant indexing functions,” by C.A. Rothwell et al., Proc. Computer Vision and Pattern Recognition, 1992, copyright 1992, IEEE (courtesy Forsyth, Ponce CV, Prentice Hall)

Matching by relations

• Idea:
  – find bits, then say the object is present if the bits are OK

• Advantage:
  – objects with complex configuration spaces don’t make good templates
    • internal degrees of freedom
    • aspect changes
    • (possibly) shading
    • variations in texture
    • etc.

Simplest

• Define a set of local feature templates
  – could find these with filters, etc.
  – corner detector + filters

• Think of objects as patterns
• Each template votes for all patterns that contain it
• Pattern with the most votes wins
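The voting scheme above can be sketched in a few lines. This is a toy illustration only: the pattern library and template names are made up, standing in for templates found with filters or a corner detector.

```python
from collections import Counter

# Hypothetical pattern library: each pattern (object) is described by the
# set of local feature templates it contains.
patterns = {
    "mug":   {"handle_corner", "rim_arc", "vertical_edge"},
    "phone": {"screen_corner", "vertical_edge", "button_circle"},
}

def recognize(detected_templates):
    """Each detected template votes for every pattern that contains it;
    the pattern with the most votes wins."""
    votes = Counter()
    for t in detected_templates:
        for name, templates in patterns.items():
            if t in templates:
                votes[name] += 1
    return votes.most_common(1)[0][0] if votes else None

print(recognize({"rim_arc", "handle_corner", "vertical_edge"}))  # prints "mug"
```

With the detections above, "mug" collects three votes against one for "phone", so it wins the vote.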

History: single object recognition

History: single object recognition

• Lowe, et al. 1999, 2003

• Mahamud and Hebert, 2000
• Ferrari, Tuytelaars, and Van Gool, 2004
• Rothganger, Lazebnik, and Ponce, 2004
• Moreels and Perona, 2005
• …

Invariant Local Features

• Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters

SIFT Features

Example of keypoint detection

Threshold on value at DoG peak and on ratio of principal curvatures (Harris approach)

(a) 233×189 image
(b) 832 DoG extrema
(c) 729 left after peak value threshold
(d) 536 left after testing ratio of principal curvatures

SIFT vector formation

• Thresholded image gradients are sampled over 16x16 array of locations in scale space

• Create array of orientation histograms
• 8 orientations × 4×4 histogram array = 128 dimensions
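The 4×4×8 binning step can be sketched as follows. This is a minimal sketch only: the full SIFT descriptor also applies Gaussian weighting of magnitudes and trilinear interpolation between bins, which are omitted here.

```python
import numpy as np

# Minimal sketch of SIFT descriptor binning over a 16x16 patch of
# gradients: 4x4 spatial cells x 8 orientation bins = 128 dimensions.
def sift_descriptor(mag, ori):
    """mag, ori: 16x16 arrays of gradient magnitude and orientation (radians)."""
    desc = np.zeros((4, 4, 8))
    bins = ((ori % (2 * np.pi)) / (2 * np.pi) * 8).astype(int) % 8
    for i in range(16):
        for j in range(16):
            desc[i // 4, j // 4, bins[i, j]] += mag[i, j]
    desc = desc.ravel()
    desc /= np.linalg.norm(desc) + 1e-12   # normalize to unit length
    desc = np.minimum(desc, 0.2)           # clip large gradient magnitudes
    desc /= np.linalg.norm(desc) + 1e-12   # renormalize
    return desc

d = sift_descriptor(np.ones((16, 16)), np.zeros((16, 16)))
print(d.shape)  # (128,)
```

The clip-and-renormalize step reduces the influence of large gradient magnitudes, which helps illumination invariance.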

Model verification

• Examine all clusters with at least 3 features

• Perform least-squares affine fit to model.

• Discard outliers and perform top-down check for additional features.

• Evaluate the probability that the match is correct: use a Bayesian model with the probability that the features would arise by chance if the object were not present (Lowe, CVPR 01)

Solution for affine parameters

• Affine transform of [x,y] to [u,v]:

• Rewrite to solve for transform parameters:
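The standard affine model maps [x, y] to [u, v] via u = m1·x + m2·y + tx, v = m3·x + m4·y + ty; stacking two rows per correspondence gives a linear system in the six parameters, solved by least squares. A minimal sketch (assuming this standard parameterization; the equation images did not survive extraction):

```python
import numpy as np

# Least-squares fit of an affine transform [u,v]^T = M [x,y]^T + t from
# point correspondences (xi, yi) -> (ui, vi). Each correspondence gives
# two rows of the system A p = b, with p = (m1, m2, m3, m4, tx, ty).
def fit_affine(xy, uv):
    A, b = [], []
    for (x, y), (u, v) in zip(xy, uv):
        A.append([x, y, 0, 0, 1, 0]); b.append(u)
        A.append([0, 0, x, y, 0, 1]); b.append(v)
    p, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
    return p[:4].reshape(2, 2), p[4:]  # M, t

# Recover a known transform from 3 exact correspondences (the minimum).
M_true, t_true = np.array([[2.0, 0.0], [0.0, 3.0]]), np.array([1.0, -1.0])
xy = np.array([[0, 0], [1, 0], [0, 1]], float)
uv = xy @ M_true.T + t_true
M, t = fit_affine(xy, uv)
print(np.allclose(M, M_true), np.allclose(t, t_true))  # True True
```

Three correspondences determine the six parameters exactly; extra matches overconstrain the system, and least squares averages out measurement noise.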

3D Object Recognition

• Extract outlines with background subtraction

3D Object Recognition

• Only 3 keys are needed for recognition, so extra keys provide robustness

• Affine model is no longer as accurate

Recognition under occlusion

Test of illumination invariance

• Same image under differing illumination

273 keys verified in final match

Recognition using View Interpolation

Figure from “Local grayvalue invariants for image retrieval,” by C. Schmid and R. Mohr, IEEE Trans. Pattern Analysis and Machine Intelligence, 1997 copyright 1997, IEEE

Employ spatial relations


Recognition …

• Integration of multiple view models (Complex 3D objects)

• Generative vs. discriminative models
• Scaling issues: > 10,000 objects
• Recognition of object categories

• Alternative models of context
• Within-class variations (e.g. chairs)
• Different feature types
• Enable models with a large number of parts

• Image based retrieval – annotating by semantic context

• Associating words with pictures

History: early object categorization

Figure from, “A general framework for object detection,” by C. Papageorgiou, M. Oren and T. Poggio, Proc. ICCV, copyright 1998, IEEE

Recognition by finding patterns

• General strategy: search image windows at a range of scales
• Correct for illumination
• Present the corrected window to a classifier (e.g. face / no face)
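The search strategy above can be sketched as a window generator over an image pyramid. This is a sketch under simplifying assumptions: crude nearest-neighbor rescaling, zero-mean/unit-variance illumination correction, and a classifier left as a stand-in.

```python
import numpy as np

# Sliding-window search at a range of scales; each window is illumination-
# corrected before it would be handed to a face/no-face classifier.
def sliding_windows(image, win=24, step=8, scales=(1.0, 0.5)):
    """Yield (scale, row, col, window) for every window at every scale."""
    for s in scales:
        h, w = int(image.shape[0] * s), int(image.shape[1] * s)
        # crude nearest-neighbor rescale, enough for a sketch
        rows = np.linspace(0, image.shape[0] - 1, h).astype(int)
        cols = np.linspace(0, image.shape[1] - 1, w).astype(int)
        scaled = image[rows][:, cols]
        for r in range(0, h - win + 1, step):
            for c in range(0, w - win + 1, step):
                window = scaled[r:r + win, c:c + win]
                # illumination correction: zero mean, unit variance
                window = (window - window.mean()) / (window.std() + 1e-8)
                yield s, r, c, window

img = np.random.rand(48, 48)
print(sum(1 for _ in sliding_windows(img)))  # 17 windows across both scales
```

On a 48×48 image this yields a 4×4 grid of windows at full scale plus a single window at half scale.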

Figure from A Statistical Method for 3D Object Detection Applied to Faces and Cars, H. Schneiderman and T. Kanade, CVPR, 2000.

Object categorization: the statistical viewpoint

p(zebra | image) vs. p(no zebra | image)

• Bayes rule:

p(zebra | image) / p(no zebra | image) = [ p(image | zebra) / p(image | no zebra) ] · [ p(zebra) / p(no zebra) ]

posterior ratio = likelihood ratio · prior ratio

Object categorization: the statistical viewpoint

p(zebra | image) / p(no zebra | image) = [ p(image | zebra) / p(image | no zebra) ] · [ p(zebra) / p(no zebra) ]

posterior ratio = likelihood ratio · prior ratio

• Discriminative methods model posterior

• Generative methods model likelihood and prior
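A toy numeric illustration of the decision rule: compare the posterior ratio (likelihood ratio × prior ratio) against 1. The 1-D Gaussian class-conditional densities here are made up, standing in for p(image | zebra) and p(image | no zebra).

```python
import numpy as np

def gauss(x, mu, sigma):
    """1-D Gaussian density, a stand-in for an image likelihood model."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def is_zebra(x, prior_zebra=0.1):
    # posterior ratio = likelihood ratio x prior ratio (Bayes rule above)
    likelihood_ratio = gauss(x, mu=5.0, sigma=1.0) / gauss(x, mu=0.0, sigma=1.0)
    prior_ratio = prior_zebra / (1.0 - prior_zebra)
    return bool(likelihood_ratio * prior_ratio > 1.0)

print(is_zebra(5.0), is_zebra(0.0))  # True False
```

Note how a low prior (zebras are rare) scales the likelihood ratio down, so the evidence must be correspondingly strong before the posterior ratio exceeds 1.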

Discriminative

• Direct modeling of p(zebra | image) and p(no zebra | image)

(figure: zebra and non-zebra samples separated by a decision boundary)

Generative

• Model p(image | zebra) and p(image | no zebra)

(figure: example images receiving low / middle / high likelihood under each model)

Finding faces using relations

• Strategy:
  – a face is eyes, nose, mouth, etc., with appropriate relations between them
  – build a specialised detector for each of these (template matching) and look for groups with the right internal structure
  – once we have found enough of a face, there is little uncertainty about where the other bits could be

Three main issues

• Representation
  – How to represent an object category

• Learning
  – How to form the classifier, given training data

• Recognition
  – How the classifier is to be used on novel data

Representation

– Generative / discriminative / hybrid

– Appearance only or location and appearance

Object → Bag of ‘words’

Analogy to documents

Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.

sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical, nerve, image, Hubel, Wiesel

China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with a 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.

China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value

Bag-of-words pipeline:

feature detection & representation → codewords dictionary → image representation → category models (and/or classifiers)

• learning: build the codewords dictionary and category models from training images
• recognition: map a new image to its representation and output a category decision
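The quantization step of the pipeline can be sketched as follows. This is a minimal sketch with a random stand-in dictionary; in practice the codewords come from k-means clustering of descriptors extracted from training images.

```python
import numpy as np

# Bag-of-visual-words: quantize local descriptors against a codeword
# dictionary and represent the image as a normalized codeword histogram.
rng = np.random.default_rng(0)
dictionary = rng.standard_normal((50, 128))   # 50 codewords, 128-d (SIFT-sized)

def bow_histogram(descriptors, dictionary):
    """Assign each descriptor to its nearest codeword; return normalized counts."""
    d2 = ((descriptors[:, None, :] - dictionary[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)                 # nearest-codeword index per descriptor
    hist = np.bincount(words, minlength=len(dictionary)).astype(float)
    return hist / hist.sum()

descriptors = rng.standard_normal((200, 128))  # stand-in for SIFT descriptors
h = bow_histogram(descriptors, dictionary)
print(h.shape, round(float(h.sum()), 6))  # (50,) 1.0
```

The resulting histogram discards all location information, which is exactly the weakness the next slide points out.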

Problem with bag-of-words

• All have equal probability for bag-of-words methods

• Location information is important

Finding faces using relations

• Strategy: compare

Notice that once some facial features have been found, the position of the rest is quite strongly constrained.

Figure from “Finding faces in cluttered scenes using random labelled graph matching,” by Leung, T., Burl, M. and Perona, P., Proc. Int. Conf. on Computer Vision, 1995, copyright 1995, IEEE

Constellation of Parts Model

Fischler & Elschlager, 1973

• [Cootes, Taylor et al.’95] Univ. of Manchester
• [G. Csurka et al.’04] Xerox Research Europe
• [Weber et al.’00, Fei-Fei Li et al.’03] Caltech
• [Fergus et al.’03,’04] Oxford

• Constellation-of-parts models
• Object detection and category recognition

Feature extraction and description

• Scale & Affine Invariant Harris Corner detector [K. Mikolajczyk, ECCV 2002]

SIFT feature descriptor [David Lowe, IJCV 2004]

Feature descriptor: a d-dimensional vector (f1, f2, …, fd), the gradient orientation histogram of the region

How to Obtain Parts?

• Challenge: many features come from the background
  – how to automatically select features which belong to the objects?

• Feature selection: discarding irrelevant / non-discriminative features

• Parts: clusters in the feature descriptor space

Features from training data

(figure: features in class 0 vs. class 1, and the learned components of each class)

Feature Selection in Category Classification

• 30 training images for the face and motorbike classes

Constellation of Parts Model

• Strategy for learning models for recognition

Idea: Learn generative probabilistic model of objects

1. Run part detectors, obtain parts (location, appearance, scale)

2. Form likely object hypotheses, update the probability model and validate (a hypothesis is a particular configuration of parts)

Recognition (computing likelihood ratio)

(figure, courtesy of R. Fergus: generative probabilistic model)

Foreground model: Gaussian shape pdf, Gaussian part appearance pdf, Gaussian relative scale pdf (in log scale), probability of detection per part (e.g. 0.8, 0.75, 0.9)

Clutter model: uniform shape pdf, Gaussian appearance pdf, uniform relative scale pdf (in log scale), Poisson pdf on the number of detections

Constellation model and Bayesian Framework

• P parts: location X, scale S and appearance A
• The distribution is modeled by hidden parameters θ (e.g. mean and covariance of a Gaussian)
• Maximum Likelihood (ML) with a single value of θ (Fergus et al.)
• Approximation to make the integral tractable (Fei-Fei Li et al.)

(terms: appearance, shape, scale, hypothesis)

Motorbikes (Fergus’s results)

Samples from appearance model

References

• L. Fei-Fei, R. Fergus, and P. Perona, “A Bayesian approach to unsupervised one-shot learning of object categories,” Proc. ICCV, 2003.

• R. Fergus, P. Perona, and A. Zisserman, “A visual category filter for Google images,” Proc. 8th European Conf. on Computer Vision (ECCV), 2004.

• Y. Amit and D. Geman, “A computational model for visual selection,” Neural Computation, vol. 11, no. 7, pp. 1691–1715, 1999.

• M. Weber, M. Welling and P. Perona, “Unsupervised learning of models for recognition,” Proc. 6th ECCV, vol. 2, pp. 101–108, 2000.

Horses

Figure from “Efficient Matching of Pictorial Structures,” P. Felzenszwalb and D.P. Huttenlocher, Proc. Computer Vision and Pattern Recognition, 2000, copyright 2000, IEEE

Hausdorff distance matching

• Let M be an n×n binary template and N an n×n binary image we want to compare to that template

• H(M, N) = max(h(M, N), h(N, M)), where
  – ‖·‖ is a distance function such as the Euclidean distance

• h(A, B) is called the directed Hausdorff distance
  – ranks each point in A based on closeness to a point in B
  – the most mis-matched point is the measure of match
  – if h(A, B) = e, then all points in A must be within distance e of B
  – generally, h(A, B) ≠ h(B, A)
  – Hausdorff distances are easy to compute from the distance transform

h(A, B) = max_{a ∈ A} min_{b ∈ B} ‖a − b‖
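The definition translates directly into code. This brute-force sketch computes all pairwise distances; as the slide notes, a distance transform makes this much faster for binary images.

```python
import numpy as np

# Directed Hausdorff distance h(A, B) = max over a in A of min over b in B
# of ||a - b||, computed straight from the definition.
def directed_hausdorff(A, B):
    """A, B: arrays of 2-D points, shapes (m, 2) and (n, 2)."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)  # (m, n) pairwise
    return d.min(axis=1).max()   # distance of the worst-matched point of A

def hausdorff(A, B):
    """Symmetric Hausdorff distance H(A, B) = max(h(A, B), h(B, A))."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[0.0, 0.0], [3.0, 0.0]])
print(directed_hausdorff(A, B), directed_hausdorff(B, A))  # 1.0 2.0
```

The example also shows the asymmetry: every point of A lies within distance 1 of B, but the point (3, 0) of B is distance 2 from its nearest point in A, so h(A, B) ≠ h(B, A).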

Hausdorff Distance Matching

Shape Context

• From “Shape Matching and Object Recognition Using Shape Contexts,” by Belongie, Malik and Puzicha, IEEE PAMI, 24(4), 2002
