Geometric Context from a Single Image
Derek Hoiem, Alexei A. Efros, Martial Hebert
Carnegie Mellon University
February 26, 2009
Presented by Luis Guimbarda
Outline
1 Introduction: Motivation; Approach; Observations on the training/testing data; Overview of the Algorithm
2 Learning Segmentations and Labels: Training Data; Generating Multiple Segmentations; Training the Pairwise Affinity Function; Geometric Labeling; Training the Label and Homogeneity Likelihood Functions
3 Results: Geometric Classification; Importance of Structure Estimation; Importance of Cues; Object Detection; Automatic Single-View Reconstruction; Failures
Motivation
The goal is to recover a 3D “contextual frame” from a single image.
Global scene context is also important for object detection.¹ ²
¹ Antonio Torralba. Contextual priming for object detection. Int. J. Comput. Vision, 53(2):169–191, July 2003.
² A. Torralba, K. P. Murphy, and W. T. Freeman. Contextual models for object detection using boosted random fields. In Advances in Neural Information Processing Systems 17 (NIPS), pages 1401–1408, 2005.
Approach
3D geometry estimation is treated as a statistical learning problem.
The system models geometric classes that depend on the orientation of a physical object relative to the scene.
For example, a piece of plywood lying on the ground and the same piece of plywood propped up by a board belong to different geometric classes.
The geometric structure is built progressively.
Observations on the training/testing data
Over 97% of pixels belonged to one of three geometric classes: the ground plane, surfaces roughly perpendicular to the ground, and the sky.
The camera axis was roughly parallel to the ground plane in most of the images.
Observations on the training/testing data
[Figure from Derek Hoiem’s presentation “Automatic Photo Popup”, http://www.cs.uiuc.edu/homes/dhoiem/presentations/index.html]
Overview of the Algorithm: Raw image
Every patch of an image is induced by a surface with some orientation in the real world.
All available cues are necessary to determine the most likely orientations.
Overview of the Algorithm: Superpixels
Each superpixel is assumed to belong to a single geometric class.
To estimate the orientation of large-scale surfaces, it’s necessary to compute more complex geometric features over large regions of the image.
Overview of the Algorithm: Multiple Hypotheses
A small number of segmentations are sampled from the space of all possible superpixel segmentations.
The likelihood of each superpixel label is then determined from these segmentations.
Overview of the Algorithm: Geometric Labels
There are 3 main geometric labels: ground, vertical, and sky.
The vertical class has 5 subclasses: left, center, and right (planar surfaces facing to the left, center, or right of the viewer), porous (◯), and solid (×).
Overview of the Algorithm: Features
C1 captures the red, green, and blue values.
C2 represents the hue and “grayness” of a pixel.
T1-4 are based on derivative of oriented Gaussian filters.
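To make the color cues concrete, here is a minimal sketch of C1- and C2-style features for one superpixel. It assumes pixels arrive as (r, g, b) tuples scaled to [0, 1], takes the mean color as the C1 feature, and converts that mean to HSV for C2, using low saturation as a stand-in for “grayness”. The helper name `color_features` is hypothetical, and the texture cues T1-4 are not covered.

```python
import colorsys
from typing import Dict, List, Tuple

RGB = Tuple[float, float, float]


def color_features(pixels: List[RGB]) -> Dict[str, object]:
    """Hypothetical C1/C2-style color features for one superpixel.

    C1: mean red, green, and blue values.
    C2: hue, saturation, and value of that mean color; low saturation
        is used here as an (assumed) proxy for "grayness".
    """
    if not pixels:
        raise ValueError("a superpixel must contain at least one pixel")
    n = len(pixels)
    mean_rgb = tuple(sum(p[k] for p in pixels) / n for k in range(3))
    hue, saturation, value = colorsys.rgb_to_hsv(*mean_rgb)
    return {"mean_rgb": mean_rgb, "hue": hue,
            "saturation": saturation, "value": value}


# Two reddish pixels: mean color (0.9, 0.0, 0.0), fully saturated, hue 0.
print(color_features([(1.0, 0.0, 0.0), (0.8, 0.0, 0.0)]))
```

A pure gray superpixel gets saturation 0, which is why saturation is a convenient, if assumed, grayness measure.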
Training Data
300 publicly available images from the Internet
Images are often cluttered and span several environments.
Each image is over-segmented, and each segment is labeled according to its geometric class.
50 images are used to train the segmentation algorithm.
250 images are used to train and test the system using 5-fold cross-validation.
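The evaluation protocol can be sketched as a generic k-fold split. This is an illustration rather than the authors’ code: the file names, the seed, and the strided fold assignment are assumptions. With 250 images and k = 5, each fold trains on 200 images and tests on the other 50.

```python
import random
from typing import List, Sequence, Tuple


def k_fold_splits(items: Sequence[str], k: int = 5,
                  seed: int = 0) -> List[Tuple[List[str], List[str]]]:
    """Return k (train, test) splits in which every item is tested exactly once."""
    if not 2 <= k <= len(items):
        raise ValueError("k must be between 2 and the number of items")
    order = list(items)
    random.Random(seed).shuffle(order)          # random but reproducible fold assignment
    folds = [order[i::k] for i in range(k)]     # k disjoint folds of near-equal size
    splits = []
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        splits.append((train, test))
    return splits


images = [f"img_{i:03d}.jpg" for i in range(250)]   # hypothetical file names
for train, test in k_fold_splits(images):
    print(len(train), len(test))                     # prints "200 50" for each fold
```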
Generating Multiple Segmentations
An image is to be segmented into $n_r$ geometrically homogeneous (and not necessarily contiguous) regions.
The superpixels are shuffled.
The first $n_r$ superpixels are assigned to different regions.
Each of the remaining superpixels is then iteratively assigned to a region based on a learned pairwise affinity function.
The algorithm was run with nine different values of $n_r$, ranging from 3 to 25.
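The sampling procedure above can be sketched as a greedy assignment. The slides only say that assignment uses a learned pairwise affinity; this sketch assumes that each remaining superpixel joins the region whose current members have the highest mean affinity with it. `sample_segmentation` and the toy affinity are hypothetical.

```python
import random
from typing import Callable, List, Sequence

# affinity(a, b) stands in for P(y_a = y_b | |x_a - x_b|) from the learned pairwise model
Affinity = Callable[[int, int], float]


def sample_segmentation(superpixels: Sequence[int], n_r: int,
                        affinity: Affinity, seed: int = 0) -> List[List[int]]:
    """Group superpixels into n_r regions (regions need not be contiguous).

    Assumed assignment rule: each remaining superpixel joins the region whose
    current members have the highest mean affinity with it.
    """
    if not 1 <= n_r <= len(superpixels):
        raise ValueError("n_r must be between 1 and the number of superpixels")
    order = list(superpixels)
    random.Random(seed).shuffle(order)              # 1. shuffle the superpixels
    regions = [[sp] for sp in order[:n_r]]          # 2. the first n_r seed separate regions
    for sp in order[n_r:]:                          # 3. assign the rest one at a time
        scores = [sum(affinity(sp, m) for m in region) / len(region)
                  for region in regions]
        best = max(range(n_r), key=scores.__getitem__)
        regions[best].append(sp)
    return regions


# Toy example: superpixels 0-4 are "ground", 5-9 are "sky".
def toy_affinity(a: int, b: int) -> float:
    return 0.9 if (a < 5) == (b < 5) else 0.1


print(sample_segmentation(range(10), n_r=2, affinity=toy_affinity, seed=1))
```

Running it with different seeds and different values of `n_r` gives different segmentation hypotheses, which is what the multiple-hypothesis step needs.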
Training the Pairwise Affinity Function
Pairs of superpixels were sampled: 2500 same-label pairs and 2500 different-label pairs.
From these, the probability that two superpixels share a label, given the absolute difference of their feature vectors, is estimated:
$$P(y_i = y_j \mid |\mathbf{x}_i - \mathbf{x}_j|)$$
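Balanced pair sampling can be sketched as rejection sampling. Assumptions not stated in the slides: each superpixel has one ground-truth geometric label, pairs are drawn uniformly from a single pool (the slides do not say whether a pair may span images), and duplicate pairs are allowed. The helper names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]


def sample_pairs(labels: Sequence[str], n_each: int, seed: int = 0,
                 max_tries: int = 1_000_000) -> Tuple[List[Pair], List[Pair]]:
    """Draw n_each same-label and n_each different-label superpixel pairs.

    labels[k] is the (assumed) ground-truth geometric label of superpixel k.
    """
    if len(labels) < 2:
        raise ValueError("need at least two superpixels")
    rng = random.Random(seed)
    same: List[Pair] = []
    diff: List[Pair] = []
    for _ in range(max_tries):
        if len(same) >= n_each and len(diff) >= n_each:
            break
        i, j = rng.sample(range(len(labels)), 2)    # two distinct superpixels
        bucket = same if labels[i] == labels[j] else diff
        if len(bucket) < n_each:                    # keep the two sets balanced
            bucket.append((i, j))
    if len(same) < n_each or len(diff) < n_each:
        raise RuntimeError("could not draw enough pairs of each kind")
    return same, diff


def abs_difference(x_i: Sequence[float], x_j: Sequence[float]) -> List[float]:
    """Element-wise |x_i - x_j|, the input to the pairwise affinity model."""
    return [abs(a - b) for a, b in zip(x_i, x_j)]


labels = ["ground"] * 60 + ["vertical"] * 30 + ["sky"] * 10   # toy labels
same, diff = sample_pairs(labels, n_each=2500)
print(len(same), len(diff))                                     # 2500 2500
```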
Training the Pairwise Affinity Function
The pairwise likelihood function is estimated using the logistic regression form of AdaBoost.⁴
Each weak learner $f_m$ is based on naive density estimates of the absolute feature differences, where $n_f$ is the number of features:
$$f_m(\mathbf{x}_1, \mathbf{x}_2) = \sum_{i=1}^{n_f} \log \frac{P(y_1 = y_2,\ |x_{1i} - x_{2i}|)}{P(y_1 \neq y_2,\ |x_{1i} - x_{2i}|)}$$
⁴ A. Criminisi, I. Reid, and A. Zisserman. Single view metrology. International Journal of Computer Vision, 40(2):123–148, November 2000.
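A single weak learner of this form can be sketched with per-feature histograms. The choices here are assumptions, not details from the slides: differences are scaled to [0, 1] and split into 10 bins, counts get add-one smoothing, and only one boosting round is shown, so the example weights that AdaBoost re-estimates between rounds are left out. The score is mapped to a probability with a standard logistic link; the exact scaling of that link depends on the boosting variant.

```python
import math
from typing import List, Sequence

N_BINS = 10      # assumed number of histogram bins per feature
MAX_DIFF = 1.0   # assumed: absolute feature differences lie in [0, MAX_DIFF]


def bin_index(d: float) -> int:
    """Histogram bin of an absolute feature difference (values >= MAX_DIFF go to the last bin)."""
    return min(int(d / MAX_DIFF * N_BINS), N_BINS - 1)


def fit_weak_learner(same: Sequence[Sequence[float]],
                     diff: Sequence[Sequence[float]]) -> List[List[float]]:
    """Per-feature tables of log P(y1 = y2, bin) - log P(y1 != y2, bin).

    `same` and `diff` hold |x1 - x2| vectors for same-label and different-label
    pairs. Joint probabilities are estimated from add-one-smoothed counts over
    all pairs; their shared normalizer cancels in the ratio.
    """
    tables = []
    for i in range(len(same[0])):
        c_same = [1.0] * N_BINS
        c_diff = [1.0] * N_BINS
        for d in same:
            c_same[bin_index(d[i])] += 1
        for d in diff:
            c_diff[bin_index(d[i])] += 1
        tables.append([math.log(s / t) for s, t in zip(c_same, c_diff)])
    return tables


def weak_learner(tables: List[List[float]], d: Sequence[float]) -> float:
    """f_m: sum over the n_f features of the log-ratio stored for each feature's bin."""
    return sum(table[bin_index(v)] for table, v in zip(tables, d))


def same_label_probability(score: float) -> float:
    """Numerically stable logistic link from a score to P(y1 = y2 | |x1 - x2|)."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)


# Toy data: same-label pairs differ little, different-label pairs differ a lot.
same_pairs = [[0.05, 0.15]] * 20
diff_pairs = [[0.95, 0.85]] * 20
tables = fit_weak_learner(same_pairs, diff_pairs)
print(same_label_probability(weak_learner(tables, [0.05, 0.15])))  # close to 1
print(same_label_probability(weak_learner(tables, [0.95, 0.85])))  # close to 0
```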
Training the Pairwise Affinity Function
[Figure from Derek Hoiem’s presentation “Automatic Photo Popup”, http://www.cs.uiuc.edu/homes/dhoiem/presentations/index.html]
Geometric Labeling
Each superpixel will belong to several regions, one per hypothesis.
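The confidence in each superpixel label is the average label likelihood of the regions containing it, weighted by the homogeneity likelihoods:
$$C(y_i = v \mid \mathbf{x}) = \sum_{j=1}^{n_h} P(y_j = v \mid \mathbf{x}, h_{ji})\, P(h_{ji} \mid \mathbf{x})$$
where $n_h$ is the number of hypotheses and $h_{ji}$ is the region containing superpixel $i$ in hypothesis $j$.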
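The confidence formula translates directly into a weighted sum. In this sketch (function name and numbers hypothetical) each hypothesis contributes the label likelihoods of the region containing the superpixel, weighted by that region’s homogeneity likelihood.

```python
from typing import Dict, List


def label_confidence(label_probs: List[Dict[str, float]],
                     homogeneity: List[float]) -> Dict[str, float]:
    """C(y_i = v | x) for one superpixel i, summed over the n_h hypotheses.

    label_probs[j][v]: P(y_j = v | x, h_ji), the label likelihood of the
                       region that contains superpixel i in hypothesis j.
    homogeneity[j]:    P(h_ji | x), the likelihood that this region is homogeneous.
    """
    if len(label_probs) != len(homogeneity):
        raise ValueError("need one homogeneity likelihood per hypothesis")
    confidence: Dict[str, float] = {}
    for probs, p_homogeneous in zip(label_probs, homogeneity):
        for v, p in probs.items():
            confidence[v] = confidence.get(v, 0.0) + p * p_homogeneous
    return confidence


# Hypothetical numbers for one superpixel covered by two hypotheses.
c = label_confidence(
    [{"ground": 0.7, "vertical": 0.2, "sky": 0.1},
     {"ground": 0.4, "vertical": 0.5, "sky": 0.1}],
    [0.9, 0.3],
)
print(max(c, key=c.get))   # ground (about 0.75, versus 0.33 for vertical)
```

Dividing by the sum of the homogeneity likelihoods would make this a weighted average, as the prose describes, without changing which label scores highest.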
Training the Label and Homogeneity Likelihood Functions
Several segmentation hypotheses are generated as described above.
Each region is labeled with one of the main geometric classes or “mixed”.
Each “vertical” region is further labeled with one of the vertical subclasses or “mixed”.
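The region-labeling rule can be sketched as below. The slides do not say how much disagreement turns a region into “mixed”, so the 95% purity threshold (`PURITY`) is a hypothetical choice, as are the function names; a region’s members are the ground-truth-labeled segments it contains.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

PURITY = 0.95   # hypothetical share a label must reach for a region not to be "mixed"


def majority_or_mixed(labels: Sequence[str], purity: float = PURITY) -> str:
    """Most common label if it covers at least `purity` of the members, else "mixed"."""
    if not labels:
        raise ValueError("a region needs at least one member")
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) >= purity else "mixed"


def label_region(members: Sequence[Tuple[str, str]],
                 purity: float = PURITY) -> Tuple[str, Optional[str]]:
    """Training labels for one region from its members' (main class, subclass) pairs.

    The subclass is "" for non-vertical members. Every member counts once;
    weighting members by pixel area would be an equally plausible choice.
    """
    main = majority_or_mixed([m for m, _ in members], purity)
    if main != "vertical":
        return main, None                     # subclasses only exist for vertical
    subclasses = [s for m, s in members if m == "vertical"]
    return main, majority_or_mixed(subclasses, purity)


print(label_region([("ground", "")] * 10))                          # ('ground', None)
print(label_region([("ground", "")] * 6 + [("sky", "")] * 4))       # ('mixed', None)
print(label_region([("vertical", "left")] * 39 + [("sky", "")]))    # ('vertical', 'left')
```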
Training the Label and Homogeneity Likelihood Functions
The label likelihood function is learned as one-versus-many.
The homogeneity likelihood function is learned as mixed versus homogeneously labeled.
Both functions are learned using the logistic regression form of AdaBoost with weak learners based on eight-node decision trees.⁶
⁶ J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: a statistical view of boosting, 1998.
Training the Label and Homogeneity Likelihood Functions
[Figure from Derek Hoiem’s presentation “Automatic Photo Popup”, http://www.cs.uiuc.edu/homes/dhoiem/presentations/index.html]
Training the Label and Homogeneity Likelihood Functions
[Figure from Derek Hoiem’s presentation “Automatic Photo Popup”, http://www.cs.uiuc.edu/homes/dhoiem/presentations/index.html]
Geometric Classification
The overall accuracy for the main geometric classes was 86%.
The overall accuracy for the vertical subclasses was 52%.
The difficulty of classifying vertical subclasses is mostly due to ambiguity in the ground-truth labeling.
Importance of Structure Estimation
Accuracy increases with the complexity of the intermediate structureestimation.

| Method | Information used |
|---|---|
| CPrior | class priors only |
| Loc | pixel positions only |
| Pixel | pixel-level colors and textures only |
| SPixel | all features, computed at the superpixel level |
| OneH | a single segmentation hypothesis with 9 segments |
| MultiH | the full multi-hypothesis framework |

Importance of Cues
Location features have the strongest effect on the system’s accuracy.
Location features alone aren’t sufficient for classification.
Object Detection
A local detector⁹ that uses GentleBoost to build a classifier from fragment templates was used to detect cars at multiple orientations on the PASCAL¹⁰ training set, excluding grayscale images.
Two versions of the detector were compared: one used only 500 local features, while the other added 40 contextual features from the geometric context.
⁹ Kevin P. Murphy, Antonio B. Torralba, and William T. Freeman. Graphical model for recognizing scenes and objects. In Sebastian Thrun, Lawrence K. Saul, and Bernhard Schölkopf, editors, NIPS. MIT Press, 2003.
¹⁰ The PASCAL object recognition database collection, website, PASCAL Challenges Workshop, 2005, http://www.pascal-network.org/challenges/VOC/.
Object Detection
Automatic Single-View Reconstruction
The automatically generated 3D model is comparable to the manually specified model.¹¹
¹¹ D. Liebowitz, A. Criminisi, and A. Zisserman. Creating architectural models from images. Computer Graphics Forum, pages 39–50, September 1999.
Failures: Reflection Failures
[Figure from Derek Hoiem’s presentation “Automatic Photo Popup”, http://www.cs.uiuc.edu/homes/dhoiem/presentations/index.html]
Failures: Shadow Failures
[Figure from Derek Hoiem’s presentation “Automatic Photo Popup”, http://www.cs.uiuc.edu/homes/dhoiem/presentations/index.html]
Failures: Catastrophic Failures
[Figure from Derek Hoiem’s presentation “Automatic Photo Popup”, http://www.cs.uiuc.edu/homes/dhoiem/presentations/index.html]
References
[1] A. Criminisi, I. Reid, and A. Zisserman. Single view metrology. International Journal of Computer Vision, 40(2):123–148, November 2000.
[2] J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: a statistical view of boosting, 1998.
[3] D. Liebowitz, A. Criminisi, and A. Zisserman. Creating architectural models from images. Computer Graphics Forum, pages 39–50, September 1999.
[4] Kevin P. Murphy, Antonio B. Torralba, and William T. Freeman. Graphical model for recognizing scenes and objects. In Sebastian Thrun, Lawrence K. Saul, and Bernhard Schölkopf, editors, NIPS. MIT Press, 2003.
[5] A. Torralba, K. P. Murphy, and W. T. Freeman. Contextual models for object detection using boosted random fields. In Advances in Neural Information Processing Systems 17 (NIPS), pages 1401–1408, 2005.
[6] Antonio Torralba. Contextual priming for object detection. Int. J. Comput. Vision, 53(2):169–191, July 2003.