Object recognition
Jana Kosecka
Slides from D. Lowe, D. Forsythe and J. Ponce book, ICCV 2005 Tutorial Fei-Fei Li,Rob Fergus and A. Torralba
perceptible, vision, material, thing
Challenges 1: view point variation
Michelangelo 1475-1564
Challenges 2: illumination
slide credit: S. Ullman
Challenges 3: occlusion
Magritte, 1957
Challenges 4: scale
Challenges 5: deformation
Xu, Beihong 1943
Challenges 6: background clutter
Klimt, 1913
Object Recognition - history
• Collected databases of objects on uniform background (no occlusions, no clutter)
• Mostly focus on viewpoint variations (at fixed scale)
• PCA based techniques, Eigenimages [Turk 91], [Leonardis & Bishop 98]
• Not robust with respect to occlusions, clutter, changes of viewpoint
• Requires segmentation
History - Recognition
• Alternative representations - Color histogram [Swain 91] (not discriminative enough)
• Geometric invariants [Rothwell 92]
– Function with a value independent of the transformation
– Invariant for image rotation: distance of two points
– Invariant for planar homography: cross-ratio
Figure from “Efficient model library access by projectively invariant indexing functions,” by C.A. Rothwell et al., Proc. Computer Vision and Pattern Recognition, 1992, copyright 1992, IEEE - (courtesy Forsythe, Ponce CV, Prentice Hall)
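The cross-ratio mentioned above can be checked numerically. A minimal sketch (function names are illustrative, not from the slides): four collinear points, parameterized by scalar positions, keep the same cross-ratio under any 1D projective map.

```python
def cross_ratio(a, b, c, d):
    """Cross-ratio of four collinear points given by scalar positions."""
    return ((a - c) * (b - d)) / ((a - d) * (b - c))

def homography_1d(x, h):
    """Apply a 1D projective map x -> (h0*x + h1) / (h2*x + h3)."""
    return (h[0] * x + h[1]) / (h[2] * x + h[3])

pts = [0.0, 1.0, 2.0, 4.0]
h = [2.0, 1.0, 0.5, 3.0]          # arbitrary projective map
mapped = [homography_1d(x, h) for x in pts]

# The cross-ratio is unchanged by the projective map.
print(abs(cross_ratio(*pts) - cross_ratio(*mapped)) < 1e-9)  # True
```

This is exactly why the cross-ratio can serve as an index into a model library: it can be measured in the image without knowing the camera pose.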
Matching by relations
• Idea:– find bits, then say object is present if bits are ok
• Advantage:
– objects with complex configuration spaces don't make good templates
• internal degrees of freedom
• aspect changes
• (possibly) shading
• variations in texture
• etc.
Simplest
• Define a set of local feature templates
– could find these with filters, etc.
– corner detector + filters
• Think of objects as patterns
• Each template votes for all patterns that contain it
• Pattern with the most votes wins
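The voting scheme above can be sketched in a few lines. The pattern database and template names here are hypothetical, purely for illustration:

```python
from collections import Counter

# Hypothetical model database: which local feature templates each
# object pattern is known to contain (names are illustrative only).
patterns = {
    "mug":   {"handle_corner", "rim_edge", "body_texture"},
    "phone": {"screen_edge", "button_grid", "rim_edge"},
}

def recognize(detected_templates):
    """Each detected template votes for every pattern containing it;
    the pattern with the most votes wins."""
    votes = Counter()
    for t in detected_templates:
        for name, feats in patterns.items():
            if t in feats:
                votes[name] += 1
    return votes.most_common(1)[0] if votes else None

print(recognize({"rim_edge", "handle_corner"}))  # ('mug', 2)
```

Note the weakness the later slides address: votes ignore the spatial arrangement of the templates.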
History: single object recognition
• Lowe, et al. 1999, 2003
• Mahamud and Hebert, 2000
• Ferrari, Tuytelaars, and Van Gool, 2004
• Rothganger, Lazebnik, and Ponce, 2004
• Moreels and Perona, 2005
• …
Invariant Local Features
• Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters
SIFT Features
Example of keypoint detection
Threshold on value at DoG peak and on ratio of principal curvatures (Harris approach)
(a) 233x189 image (b) 832 DoG extrema (c) 729 left after peak value threshold (d) 536 left after testing ratio of principal curvatures
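The curvature-ratio test can be written down concretely. A minimal sketch of the test from Lowe (IJCV 2004), using finite differences on a DoG image (the toy 3x3 patch is illustrative only):

```python
import numpy as np

def passes_edge_test(dog, y, x, r=10.0):
    """Harris-style test on the 2x2 Hessian of the DoG image: keep the
    point only if the ratio of principal curvatures is below r, i.e.
    Tr(H)^2 / Det(H) < (r+1)^2 / r."""
    dxx = dog[y, x+1] - 2*dog[y, x] + dog[y, x-1]
    dyy = dog[y+1, x] - 2*dog[y, x] + dog[y-1, x]
    dxy = (dog[y+1, x+1] - dog[y+1, x-1] - dog[y-1, x+1] + dog[y-1, x-1]) / 4.0
    tr = dxx + dyy
    det = dxx*dyy - dxy*dxy
    if det <= 0:          # curvatures of different sign: reject
        return False
    return tr*tr / det < (r + 1)**2 / r

# A blob-like peak passes; a ridge/edge-like point would fail.
blob = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]], float)
print(passes_edge_test(blob, 1, 1))  # True
```

The point of the ratio form is that it avoids computing the eigenvalues explicitly.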
SIFT vector formation
• Thresholded image gradients are sampled over a 16x16 array of locations in scale space
• Create array of orientation histograms
• 8 orientations x 4x4 histogram array = 128 dimensions
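The 8 x 4x4 = 128 bookkeeping can be made concrete. A minimal sketch of the pooling step only, leaving out the Gaussian weighting, trilinear interpolation, and rotation normalisation of the real SIFT descriptor:

```python
import numpy as np

def sift_like_descriptor(patch):
    """Simplified SIFT vector formation: gradients in a 16x16 patch are
    pooled into a 4x4 grid of 8-bin orientation histograms -> 128 dims."""
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ori = np.arctan2(gy, gx)                         # [-pi, pi]
    bins = ((ori + np.pi) / (2 * np.pi) * 8).astype(int) % 8
    desc = np.zeros((4, 4, 8))
    for i in range(16):
        for j in range(16):
            desc[i // 4, j // 4, bins[i, j]] += mag[i, j]
    desc = desc.ravel()
    return desc / (np.linalg.norm(desc) + 1e-12)     # normalise for illumination

d = sift_like_descriptor(np.random.rand(16, 16))
print(d.shape)  # (128,)
```

The final normalisation is what gives the descriptor its robustness to affine illumination change.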
Model verification
• Examine all clusters with at least 3 features
• Perform least-squares affine fit to model.
• Discard outliers and perform top-down check for additional features.
• Evaluate probability that the match is correct: use a Bayesian model with the probability that the features would arise by chance if the object were not present (Lowe, CVPR 01)
Solution for affine parameters
• Affine transform of [x,y] to [u,v]:
• Rewrite to solve for transform parameters:
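The equations for this step did not survive extraction, so here is a sketch of the standard formulation: each match (x, y) -> (u, v) under the affine map u = m1 x + m2 y + tx, v = m3 x + m4 y + ty contributes two rows to a linear system A p = b in the six unknowns p = (m1, m2, m3, m4, tx, ty), solved by least squares:

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine fit: for each match (x,y)->(u,v) add rows
    [x y 0 0 1 0] and [0 0 x y 0 1] to A; solve A p = b for
    p = (m1, m2, m3, m4, tx, ty)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 0, 0, 1, 0]); b.append(u)
        A.append([0, 0, x, y, 0, 1]); b.append(v)
    p, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
    return p

# Recover a known transform from 3 correspondences (the minimum).
src = [(0, 0), (1, 0), (0, 1)]
M = np.array([[2, 0], [0, 3]]); t = np.array([5, 7])
dst = [tuple(M @ np.array(s) + t) for s in src]
print(np.allclose(fit_affine(src, dst), [2, 0, 0, 3, 5, 7]))  # True
```

Three correspondences give six equations for six unknowns; extra matches overdetermine the system, which is why the outlier-discarding loop in the previous slide helps.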
3D Object Recognition
• Extract outlines with background subtraction
3D Object Recognition
• Only 3 keys are needed for recognition, so extra keys provide robustness
• Affine model is no longer as accurate
Recognition under occlusion
Test of illumination invariance
• Same image under differing illumination
273 keys verified in final match
Recognition using View Interpolation
Figure from “Local grayvalue invariants for image retrieval,” by C. Schmid and R. Mohr, IEEE Trans. Pattern Analysis and Machine Intelligence, 1997 copyright 1997, IEEE
Employ spatial relations
Recognition …
• Integration of multiple view models (Complex 3D objects)
• Generative vs. Discriminative Models
• Scaling issues: > 10000 objects
• Recognition of object categories
• Alternative models of context
• intra-object-within-class variations (chairs)
• Different feature types
• Enable models with a large number of parts
• Image based retrieval – annotating by semantic context
• Associating words with pictures
History: early object categorization
Figure from, “A general framework for object detection,” by C. Papageorgiou, M. Oren and T. Poggio, Proc. ICCV, copyright 1998, IEEE
Recognition by finding patterns
• General strategy: search image windows at a range of scales
• Correct for illumination
• Present corrected window to classifier: face/no-face classifier
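The strategy above can be sketched as a generator of candidate windows. The window size, step, scales, and the zero-mean/unit-variance illumination correction are assumptions for illustration; a trained face/no-face classifier would score each yielded window:

```python
import numpy as np

def sliding_windows(image, win=24, step=8, scales=(1.0, 0.5)):
    """Enumerate illumination-corrected windows at several scales;
    a face/no-face classifier would score each one (sketch only)."""
    for s in scales:
        h, w = int(image.shape[0] * s), int(image.shape[1] * s)
        # nearest-neighbour rescale, enough for a sketch
        ys = (np.arange(h) / s).astype(int)
        xs = (np.arange(w) / s).astype(int)
        scaled = image[np.ix_(ys, xs)]
        for y in range(0, h - win + 1, step):
            for x in range(0, w - win + 1, step):
                window = scaled[y:y+win, x:x+win]
                # simple illumination correction: zero mean, unit variance
                window = (window - window.mean()) / (window.std() + 1e-6)
                yield s, y, x, window

img = np.random.rand(48, 48)
print(sum(1 for _ in sliding_windows(img)))  # 17
```

The cost of this exhaustive search is what motivates the cascaded and feature-based detectors that followed.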
Figure from A Statistical Method for 3D Object Detection Applied to Faces and Cars, H. Schneiderman and T. Kanade, CVPR, 2000.
Object categorization: the statistical viewpoint
p(zebra | image) vs. p(no zebra | image)

• Bayes rule:

p(zebra | image) / p(no zebra | image) = [p(image | zebra) / p(image | no zebra)] · [p(zebra) / p(no zebra)]

posterior ratio = likelihood ratio · prior ratio
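Plugging toy numbers (assumed here, not from the slides) into the ratio form of Bayes rule makes the decision rule concrete:

```python
# Toy numbers (assumed): the posterior ratio is the likelihood ratio
# times the prior ratio.
p_img_given_zebra = 0.20      # p(image | zebra)
p_img_given_no_zebra = 0.05   # p(image | no zebra)
p_zebra, p_no_zebra = 0.01, 0.99

likelihood_ratio = p_img_given_zebra / p_img_given_no_zebra   # 4.0
prior_ratio = p_zebra / p_no_zebra
posterior_ratio = likelihood_ratio * prior_ratio

# Decide "zebra" only if the posterior ratio exceeds 1: here the image
# looks 4x more zebra-like, but the rare prior still wins.
print(round(posterior_ratio, 4), posterior_ratio > 1)  # 0.0404 False
```

This is why a detector for a rare category needs a very large likelihood ratio before it should fire.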
Object categorization: the statistical viewpoint

p(zebra | image) / p(no zebra | image) = [p(image | zebra) / p(image | no zebra)] · [p(zebra) / p(no zebra)]

posterior ratio = likelihood ratio · prior ratio
• Discriminative methods model posterior
• Generative methods model likelihood and prior
Discriminative
• Direct modeling of p(zebra | image)
(illustration: zebra and non-zebra samples separated by a decision boundary)

Generative
• Model p(image | zebra) and p(image | no zebra)
(illustration: low / middle / high likelihood values under each model)
Finding faces using relations
• Strategy:– Face is eyes, nose, mouth, etc. with appropriate relations between them
– build a specialised detector for each of these (template matching) and look for groups with the right internal structure
– Once we’ve found enough of a face, there is little uncertainty about where the other bits could be
Three main issues
• Representation– How to represent an object category
• Learning– How to form the classifier, given training data
• Recognition– How the classifier is to be used on novel data
Representation
– Generative / discriminative / hybrid
– Appearance only or location and appearance
Object → Bag of ‘words’
Analogy to documents
Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.
sensory, brain, visual, perception,
retinal, cerebral cortex,eye, cell, optical
nerve, imageHubel, Wiesel
China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with a 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.
China, trade, surplus, commerce,
exports, imports, US, yuan, bank, domestic,
foreign, increase, trade, value
Bag-of-words pipeline:
feature detection & representation → codewords dictionary → image representation → category models (and/or) classifiers → category decision
(learning builds the category models; recognition applies them to new images)
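The "image representation" step of the pipeline above can be sketched directly: given a learned codeword dictionary, each local descriptor is assigned to its nearest codeword and the image becomes a normalised histogram. The tiny 2-D codebook and descriptors below are illustrative only:

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Assign each local descriptor to its nearest codeword and count:
    the image is represented as a histogram over the dictionary."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# Tiny illustrative dictionary of 3 codewords in a 2-D feature space.
codebook = np.array([[0., 0.], [10., 0.], [0., 10.]])
desc = np.array([[0.1, 0.2], [9.8, 0.1], [9.9, -0.2], [0.2, 9.7]])
print(bow_histogram(desc, codebook).tolist())  # [0.25, 0.5, 0.25]
```

In practice the dictionary is built by clustering (e.g. k-means) over descriptors from many training images, and the histogram is fed to a classifier.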
Problem with bag-of-words
• All have equal probability for bag-of-words methods
• Location information is important
Finding faces using relations
• Strategy: compare
Notice that once some facial features have been found, the position of the rest is quite strongly constrained.
Figure from “Finding faces in cluttered scenes using random labelled graph matching,” by Leung, T., Burl, M. and Perona, P., Proc. Int. Conf. on Computer Vision, 1995, copyright 1995, IEEE
Constellation of Parts Model
Fischler & Elschlager, 1973
• [Cootes, Taylor et al. ’95] Univ. of Manchester
• [G. Csurka et al. ’04] Xerox Research Europe
• [Weber et al. ’00, Fei-Fei Li et al. ’03] Caltech
• [Fergus et al. ’03, ’04] Oxford
• Constellation of parts models• Object detection part, category recognition
Feature extraction and description
• Scale & Affine Invariant Harris Corner detector [K. Mikolajczyk, ECCV 2002]
SIFT feature descriptor [David Lowe, IJCV 2004]
(descriptor vector: f1, f2, …, fd)
Gradient orientation histogram of the region
How to Obtain Parts?
• Challenge – many features come from the background
– how to automatically select features which belong to the objects?
– Feature selection – discarding irrelevant / non-discriminative features
– Parts – Clusters in the feature descriptor space
Features from training data
(plot legend: features in class 0 / features in class 1; components in class 0 / components in class 1)
Feature Selection in Category Classification
• 30 training images for face and motorbike.
Constellation of Parts Model
• Strategy for learning models for recognition
Idea: Learn generative probabilistic model of objects
1. Run part detectors, obtain parts (location, appearance, scale)
2. Form a likely object hypothesis, update the probability model and validate – a hypothesis is a particular configuration of parts
Recognition (computing likelihood ratio)
Foreground model:
• Gaussian shape pdf
• Gaussian part appearance pdf
• Gaussian relative scale pdf (log scale)
• Probability of detection (e.g. 0.8, 0.75, 0.9)

Clutter model:
• Uniform shape pdf
• Gaussian appearance pdf
• Uniform relative scale pdf (log scale)
• Poisson pdf on # detections

(Generative probabilistic model, courtesy of R. Fergus’s presentation)
Constellation model and Bayesian Framework
• P parts: location X, scale S and appearance A
• Distributions are modeled by hidden parameters (e.g. mean and covariance of a Gaussian)
• Maximum Likelihood (ML) with a single parameter value (Fergus et al.)
• Approximation to make the integral tractable (Li Fei-Fei et al.)
• Appearance, shape, scale, hypothesis
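Scoring one hypothesis under this framework can be sketched in miniature. The sketch below compares only the Gaussian shape term of the foreground model against the uniform clutter model for two parts; all numbers (mean shape, covariance, image area) are toy values, not learned parameters:

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate normal density (the assumed Gaussian shape model)."""
    d = x - mean
    k = len(mean)
    return np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / \
        np.sqrt((2 * np.pi) ** k * np.linalg.det(cov))

# Hypothesis: concatenated (x, y) locations of P = 2 parts.
mean = np.array([10., 10., 30., 10.])   # toy "learned" mean shape
cov = np.eye(4) * 4.0                   # toy shape covariance
area = 100.0 * 100.0                    # image area for the uniform model

hyp = np.array([11., 9., 29., 11.])
likelihood_ratio = gaussian_pdf(hyp, mean, cov) / (1.0 / area) ** 2
print(likelihood_ratio > 1)  # True: configuration close to the model
```

The full model multiplies in the analogous appearance, relative-scale, and detection terms shown in the previous slide.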
Motorbikes (Fergus’s results)
Samples from appearance model
References
• L. Fei-Fei, R. Fergus, and P. Perona, “A Bayesian approach to unsupervised one-shot learning of object categories”, Proc. ICCV, 2003.
• R. Fergus, P. Perona and A. Zisserman, “A visual category filter for Google images”, Proc. of the 8th European Conf. on Computer Vision (ECCV), 2004.
• Y. Amit and D. Geman, “A computational model for visual selection”, Neural Computation, vol. 11, no. 7, pp. 1691-1715, 1999.
• M. Weber, M. Welling and P. Perona, “Unsupervised learning of models for recognition”, Proc. 6th ECCV, vol. 2, pp. 101-108, 2000.
Horses
Figure from “Efficient Matching of Pictorial Structures,” P. Felzenszwalb and D.P. Huttenlocher, Proc. Computer Vision and Pattern Recognition, 2000, copyright 2000, IEEE
Hausdorff distance matching
• Let M be an n×n binary template and N an n×n binary image we want to compare to that template
• H(M, N) = max(h(M, N), h(N, M)), where

h(A, B) = max_{a∈A} min_{b∈B} ||a − b||

– || || is a distance function, e.g. the Euclidean distance
• h(A, B) is called the directed Hausdorff distance
– it ranks each point in A by closeness to a point in B; the most mis-matched point is the measure of match
– if h(A, B) = e, then all points in A must be within distance e of B
– generally, h(A, B) ≠ h(B, A)
– directed Hausdorff distances are easy to compute from the distance transform
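The definitions above translate directly to code. A minimal sketch over point sets (a brute-force version; the distance-transform speed-up mentioned above is omitted):

```python
import numpy as np

def directed_hausdorff(A, B):
    """h(A, B) = max over a in A of min over b in B of ||a - b||."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return d.min(axis=1).max()

def hausdorff(A, B):
    """H(A, B) = max(h(A, B), h(B, A)); generally h(A,B) != h(B,A)."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

A = np.array([[0., 0.], [1., 0.]])
B = np.array([[0., 0.], [0., 3.]])
print(directed_hausdorff(A, B), directed_hausdorff(B, A))  # 1.0 3.0
print(hausdorff(A, B))                                     # 3.0
```

The asymmetry is visible in the example: every point of A is near B, but the point (0, 3) of B is far from A, so the symmetric H takes the larger value.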
Hausdorff Distance Matching
Shape Context
• From Shape Matching and Object Recognition using Shape Context, by Belongie, Malik, Puzicha, IEEE PAMI (24), 2002