![Page 1: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/1.jpg)
Context in vision
Antonio Torralba
![Page 2: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/2.jpg)
The goal
Bookshelf
Desk
Screen
Office scene
![Page 3: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/3.jpg)
Why object detection is a hard problem
viewpoints
Need to detect Nclasses * Nviews * Nstyles, in clutter.
Lots of variability within classes, and across viewpoints.
Object classes
Styles, lighting conditions, etc, etc, etc…
![Page 4: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/4.jpg)
Where is the field of computer vision?
There are efficient solutions for:
• Detecting particular objects (Lowe, 1999)
• Detecting a few single object categories
• Recognizing objects in isolation (from Leibe & Schiele, 2003)
But the problem of multi-class and multi-view object detection in a scene with clutter is still largely unsolved.
![Page 5: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/5.jpg)
The ingredients
• Object representations
• Scene representations
• Classifiers
• Graphical models
• Object features
• Scene features
![Page 6: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/6.jpg)
OBJECTS
![Page 7: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/7.jpg)
Object representations
Models
• Constellations of parts
• Holistic representations
  – Shape-appearance models
• Shapes, silhouettes
• 3D models
![Page 8: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/8.jpg)
Object representations
Features
• Pixel intensities
• Patches
• SIFT
• Basic geometric forms (Geons, quadrics)
![Page 9: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/9.jpg)
Learning representations
• Generative models
• Discriminative models
![Page 10: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/10.jpg)
Shape-appearance models
• Idea
• Features
  – Pixel intensities
• Representation
  – Subspace model of shape and appearance variations
  – Generative model
AAM = T. F. Cootes, C. J. Taylor, G. J. Edwards
Morphable models = Blanz, T. Vetter
![Page 11: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/11.jpg)
Shape-appearance models
AAM = T. F. Cootes, C. J. Taylor, G. J. Edwards
Morphable models = Blanz, T. Vetter
![Page 12: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/12.jpg)
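Shape-appearance models
[Figure: the original image is separated into shape and texture. Shape = mean shape + s1·(shape mode) + s2·(shape mode) + s3·(shape mode) + …; texture = mean texture + t1·(texture mode) + t2·(texture mode) + t3·(texture mode) + …. An ImageWarp( , , , ) operation relates the original image, its shape, the mean shape, and the shape-free texture; a "zeros" label also appears in the figure.]
AAM uses an additional PCA to reduce redundancy between texture and shape.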
![Page 13: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/13.jpg)
Constellation models
• Idea
• Features
  – Intensities, patches, SIFT features.
• Representation
  – Part-based representation.
AAM = T. F. Cootes, C. J. Taylor, G. J. Edwards
Morphable models = Blanz, T. Vetter
![Page 14: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/14.jpg)
Constellations of parts
Slide from Perona 2005
![Page 15: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/15.jpg)
SIFT features
![Page 16: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/16.jpg)
Invariant Local Features
• Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters
SIFT Features
Slide from David Lowe
![Page 17: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/17.jpg)
Build Scale-Space Pyramid
• All scales must be examined to identify scale-invariant features
• An efficient function is to compute the Difference of Gaussian (DOG) pyramid (Burt & Adelson, 1983)
[Figure: repeated Blur → Resample → Subtract stages]
Slide from David Lowe
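The blur, resample, and subtract stages of a difference-of-Gaussian pyramid can be sketched in plain Python. This is a minimal illustration, not Lowe's implementation: the function names, the 3-sigma kernel truncation, the border clamping, the default `sigma=1.6`, and the order blur → subtract → resample are all assumptions made for the example.

```python
import math

def gaussian_kernel(sigma):
    """Return a normalized 1-D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(img, kernel):
    """Convolve every row with the kernel, clamping indices at the border."""
    radius = len(kernel) // 2
    width = len(img[0])
    return [
        [sum(k * row[min(max(x + j - radius, 0), width - 1)]
             for j, k in enumerate(kernel))
         for x in range(width)]
        for row in img
    ]

def transpose(img):
    return [list(col) for col in zip(*img)]

def blur(img, sigma):
    """Separable Gaussian blur: rows first, then columns."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(img, kernel)), kernel))

def subtract(a, b):
    return [[pa - pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

def resample(img):
    """Keep every second pixel in each direction."""
    return [row[::2] for row in img[::2]]

def dog_pyramid(img, levels=3, sigma=1.6):
    """Return one difference image per level (image minus its blurred copy)."""
    pyramid = []
    current = img
    for _ in range(levels):
        blurred = blur(current, sigma)
        pyramid.append(subtract(current, blurred))
        current = resample(blurred)  # the next level works at half resolution
    return pyramid
```

Each level stores the band of detail removed by the blur, so a constant image produces difference images that are zero everywhere.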
![Page 18: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/18.jpg)
Key point localization
• Detect maxima and minima of difference-of-Gaussian in scale space
[Figure: Blur → Resample → Subtract stage of the pyramid]
Slide from David Lowe
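Detecting maxima and minima in scale space amounts to comparing each sample with its neighbours in position and in scale. The sketch below assumes the difference-of-Gaussian layers are given as same-sized 2-D lists (as within one octave); the function name, the `threshold` parameter, and the strict 26-neighbour test are illustrative choices.

```python
def scale_space_extrema(dog, threshold=0.0):
    """Return (s, y, x) positions that are strict extrema among 26 neighbours.

    `dog` is a list of same-sized 2-D lists, one per scale.
    """
    keypoints = []
    for s in range(1, len(dog) - 1):
        for y in range(1, len(dog[s]) - 1):
            for x in range(1, len(dog[s][y]) - 1):
                value = dog[s][y][x]
                if abs(value) <= threshold:
                    continue  # skip weak responses
                neighbours = [
                    dog[s + ds][y + dy][x + dx]
                    for ds in (-1, 0, 1)
                    for dy in (-1, 0, 1)
                    for dx in (-1, 0, 1)
                    if (ds, dy, dx) != (0, 0, 0)
                ]
                if value > max(neighbours) or value < min(neighbours):
                    keypoints.append((s, y, x))
    return keypoints
```

Border samples and the first and last scale are skipped because they lack a full neighbourhood.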
![Page 19: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/19.jpg)
Select dominant orientation
• Create histogram of local gradient directions computed at selected scale
• Assign canonical orientation at peak of smoothed histogram
[Figure: histogram of gradient directions, from 0 to 2π]
Slide from David Lowe
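A small sketch of the orientation step: build a magnitude-weighted histogram of gradient directions over a patch, smooth it circularly, and take the peak. The bin count of 36, the central-difference gradients, and the [1, 2, 1]/4 smoothing filter are assumptions for illustration; the slide does not specify them.

```python
import math

def dominant_orientation(patch, bins=36):
    """Return the centre angle (radians, in [0, 2*pi)) of the peak histogram bin."""
    hist = [0.0] * bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(dx, dy)
            angle = math.atan2(dy, dx) % (2 * math.pi)
            hist[int(angle / (2 * math.pi) * bins) % bins] += magnitude
    # Circular smoothing; hist[i - 1] wraps around for i == 0.
    smoothed = [
        (hist[i - 1] + 2 * hist[i] + hist[(i + 1) % bins]) / 4
        for i in range(bins)
    ]
    peak = max(range(bins), key=lambda i: smoothed[i])
    return (peak + 0.5) * 2 * math.pi / bins
```

Rotating the descriptor frame to this angle is what makes the later feature vector insensitive to in-plane rotation.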
![Page 20: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/20.jpg)
SIFT vector formation
• Thresholded image gradients are sampled over a 16x16 array of locations in scale space
• Create array of orientation histograms
• 8 orientations x 4x4 histogram array = 128 dimensions
Slide from David Lowe
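The 8 x 4x4 = 128 layout can be made concrete with a simplified sketch. It takes precomputed 16x16 gradient magnitudes and angles and accumulates them into 4x4 cells of 8 orientation bins. Gaussian weighting and interpolation between bins are omitted, and the normalize-clip-renormalize step with `clip=0.2` is an assumption about how the gradients are limited, not something stated on the slide.

```python
import math

def sift_like_descriptor(magnitudes, angles, clip=0.2):
    """Build a 128-D vector from 16x16 gradient magnitudes and angles (radians)."""
    descriptor = [0.0] * (4 * 4 * 8)
    for y in range(16):
        for x in range(16):
            cell = (y // 4) * 4 + (x // 4)              # which of the 4x4 histograms
            angle = angles[y][x] % (2 * math.pi)
            obin = int(angle / (2 * math.pi) * 8) % 8   # which of the 8 orientations
            descriptor[cell * 8 + obin] += magnitudes[y][x]

    def normalize(vec):
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]

    # Assumed post-processing: unit length, clip large entries, unit length again.
    clipped = [min(v, clip) for v in normalize(descriptor)]
    return normalize(clipped)
```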
![Page 21: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/21.jpg)
Invariant Local Features
• Detecting particular objects (Lowe, 1999)
![Page 22: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/22.jpg)
Segmentation driven
• Idea
  – Avoid scanning and reduce number of candidates
• Features
  – Blobs and image regions
• Representation
  – An image is an arrangement of regions
![Page 23: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/23.jpg)
Segmentation-recognition
Slide from Duygulu, 04
P. Duygulu, K. Barnard, N. de Freitas, D. Forsyth. ECCV 02
![Page 24: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/24.jpg)
Discriminative approach
• Idea
• Features
  – Pixel intensities, wavelets, patches
• Representation
  – Any of the representations before
![Page 25: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/25.jpg)
Cascade of classifiers
• Graded Learning for Object Detection - Fleuret, Geman (1999)
• Robust Real-time Object Detection - Viola, Jones (2001)
Features: stumps, inspired by Haar wavelets
Cascade: classifiers of increasing complexity. Low miss rate.
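The early-rejection logic of a cascade can be sketched in a few lines. The representation of a stage as a `(score_function, threshold)` pair is a hypothetical interface chosen for the example; the cited papers define their own stage structure.

```python
def cascade_classify(x, stages):
    """Run stages in order; reject as soon as one stage scores below its threshold.

    `stages` is a list of (score_function, threshold) pairs, cheapest first.
    """
    for score, threshold in stages:
        if score(x) < threshold:
            return False  # early rejection: later, costlier stages never run
    return True
```

Because most patches in an image are background, most of them are discarded by the first cheap stages; thresholds are typically set loosely in early stages so that true objects are rarely rejected.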
![Page 26: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/26.jpg)
Short introduction to Boosting
![Page 27: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/27.jpg)
Why use boosting?
• Creates very accurate, very fast classifiers.
• Training is fast and easy to implement.
• Can handle high-dimensional data (stumps perform feature selection).
• Robust to overfitting (implicitly maximizes margin).
![Page 28: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/28.jpg)
Boosted decision trees
• “Best off-the-shelf classifier in the world” – Leo Breiman, 1998
• 1 node tree = “stump”
• Can be used for feature selection.
• Pick best dimension d and threshold φ by exhaustive search.
• Pick best slope a and offset b using weighted least squares.
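The last two bullets can be sketched as follows, assuming the stump has the form h(x) = a·[x_d > φ] + b. Under that assumption the weighted least-squares solution is closed-form: b is the weighted mean of the targets on the low side, and a + b is the weighted mean on the high side. The function name and the choice of candidate thresholds (the observed feature values) are illustrative.

```python
def fit_stump(X, z, w):
    """Fit h(x) = a * [x[d] > phi] + b by exhaustive search over (d, phi).

    X: list of feature vectors, z: targets (+1/-1), w: non-negative weights.
    Returns (error, d, phi, a, b) for the split with lowest weighted squared error,
    or None if no dimension can be split.
    """
    best = None
    for d in range(len(X[0])):
        for phi in sorted({x[d] for x in X}):
            right = [(wi, zi) for x, zi, wi in zip(X, z, w) if x[d] > phi]
            left = [(wi, zi) for x, zi, wi in zip(X, z, w) if x[d] <= phi]
            w_right = sum(wi for wi, _ in right)
            w_left = sum(wi for wi, _ in left)
            if w_right == 0 or w_left == 0:
                continue  # not a real split
            mean_right = sum(wi * zi for wi, zi in right) / w_right
            mean_left = sum(wi * zi for wi, zi in left) / w_left
            b = mean_left
            a = mean_right - mean_left
            error = sum(wi * (zi - (a * (x[d] > phi) + b)) ** 2
                        for x, zi, wi in zip(X, z, w))
            if best is None or error < best[0]:
                best = (error, d, phi, a, b)
    return best
```

Since only one dimension is chosen per stump, each boosting round effectively selects a feature.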
![Page 29: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/29.jpg)
Additive models for classification
$$H(v, c) = \sum_{m=1}^{M} h_m(v, c)$$
v: feature responses; c: classes; the output is used for +1/-1 classification.
h_m(v, c) is a weak classifier (performs better than chance).
H(v, c) is the strong classifier obtained as a sum of weak classifiers.
![Page 30: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/30.jpg)
Example of weak classifier (stumps)
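[Figure: step function h_m(v) plotted against the feature value v_f; the step occurs at the threshold θ, with output levels a and b on either side.]
A decision stump is a threshold on a single feature.
Each decision stump has 4 parameters: {f, θ, a, b}
f = template index (selected among a dictionary of 2000 templates)
θ = threshold
a, b = average class value (-1, +1) at each side of the threshold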
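A stump with parameters {f, θ, a, b} and the sum of several stumps can be written directly. Which side of the threshold receives a and which receives b is an assumption here (a above θ, b otherwise); the function names are illustrative.

```python
def stump(v, f, theta, a, b):
    """Weak classifier: a if feature f exceeds theta, else b (side choice assumed)."""
    return a if v[f] > theta else b

def strong_classifier(v, stumps):
    """Sum of weak classifiers; the sign of the result gives the +1/-1 decision."""
    return sum(stump(v, *params) for params in stumps)

# Usage with two hypothetical stumps, each given as (f, theta, a, b).
example_stumps = [(0, 0.5, 1.0, -1.0), (1, 2.0, 0.5, -0.5)]
score = strong_classifier([0.7, 1.0], example_stumps)  # 1.0 + (-0.5)
```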
![Page 31: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/31.jpg)
Flavors of boosting
• Different boosting algorithms use different loss functions or minimization procedures (Freund & Schapire, 1995; Friedman, Hastie, Tibshirani, 1998).
• We base our approach on Gentle boosting: learns faster than others (Friedman, Hastie, Tibshirani, 1998; Lienhart, Kuranov, & Pisarevsky, 2003).
![Page 32: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/32.jpg)
Multi-class Boosting
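We use the exponential multi-class cost function
$$J = \sum_{c=1}^{C} E\left[ e^{-z^c H(v, c)} \right]$$
where J is the cost function, the sum runs over the classes c, H(v, c) is the classifier output for class c, and z^c is the membership in class c (+1/-1).
Freund & Schapire, 1995; Friedman, Hastie, Tibshirani, 1998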
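As a numerical illustration of an exponential multi-class cost, the sketch below averages, over training examples, the sum over classes of exp(-z·H), where z is the +1/-1 membership and H the classifier output. Estimating the expectation by a sample mean, and the nested-list layout `H[i][c]`, `z[i][c]`, are assumptions made for the example.

```python
import math

def exponential_cost(H, z):
    """Sample mean over examples i of sum over classes c of exp(-z[i][c] * H[i][c])."""
    total = sum(
        math.exp(-zic * hic)
        for hi, zi in zip(H, z)
        for hic, zic in zip(hi, zi)
    )
    return total / len(H)
```

Confident correct outputs (z and H with the same sign) contribute little to the cost, while confident mistakes are penalized exponentially.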
![Page 33: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/33.jpg)
Weak learners are shared
At each boosting round, we add a perturbation or “weak learner” which is shared across some classes:
$$H(v, c) := H(v, c) + h_m(v, c)$$
We add the weak classifier that provides the best reduction of the exponential cost.
Freund & Schapire, 1995; Friedman, Hastie, Tibshirani, 1998
![Page 34: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/34.jpg)
Use Newton’s method to select weak learners
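Treat h_m as a perturbation, and expand the loss J to second order in h_m:
$$\arg\min_{h_m} J(H + h_m) \simeq \arg\min_{h_m} \sum_{c=1}^{C} E\left[ e^{-z^c H(v, c)} \left( z^c - h_m(v, c) \right)^2 \right]$$
Here J(H + h_m) is the cost function of the classifier with perturbation, e^{-z^c H(v, c)} is the reweighting term, and (z^c - h_m(v, c))^2 is the squared error; z^c is the +1/-1 membership in class c.
Freund & Schapire, 1995; Friedman, Hastie, Tibshirani, 1998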
![Page 35: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/35.jpg)
Multi-class Boosting
Weighted squared error over the training data
Freund & Schapire, 1995; Friedman, Hastie, Tibshirani, 1998
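A compact sketch of boosting rounds driven by a weighted squared error, in the spirit of Gentle boosting: reweight the examples by exp(-z·H), fit a regression stump by weighted least squares, and add it to the running classifier. For brevity this is the binary, 1-D case rather than the shared multi-class version in the slides; all names are illustrative, and the input must contain at least two distinct feature values.

```python
import math

def gentleboost(xs, z, rounds=5):
    """Binary boosting with regression stumps on 1-D inputs.

    Returns a list of stumps (theta, a, b) with h(x) = a if x > theta else b.
    """
    H = [0.0] * len(xs)
    stumps = []
    for _ in range(rounds):
        # Reweighting: examples the current classifier gets wrong weigh more.
        w = [math.exp(-zi * hi) for zi, hi in zip(z, H)]
        best = None
        for theta in sorted(set(xs))[:-1]:  # dropping the max keeps both sides non-empty
            hi_side = [(wi, zi) for x, zi, wi in zip(xs, z, w) if x > theta]
            lo_side = [(wi, zi) for x, zi, wi in zip(xs, z, w) if x <= theta]
            a = sum(wi * zi for wi, zi in hi_side) / sum(wi for wi, _ in hi_side)
            b = sum(wi * zi for wi, zi in lo_side) / sum(wi for wi, _ in lo_side)
            # Weighted squared error between labels and stump output.
            err = sum(wi * (zi - (a if x > theta else b)) ** 2
                      for x, zi, wi in zip(xs, z, w))
            if best is None or err < best[0]:
                best = (err, theta, a, b)
        _, theta, a, b = best
        stumps.append((theta, a, b))
        H = [hi + (a if x > theta else b) for x, hi in zip(xs, H)]
    return stumps

def predict(x, stumps):
    """Sign of the summed stump outputs."""
    return 1 if sum(a if x > theta else b for theta, a, b in stumps) > 0 else -1
```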
![Page 36: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/36.jpg)
Demo Boosting for object detection
![Page 37: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/37.jpg)
Summary
1) Object representation based on local features:
![Page 38: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/38.jpg)
Summary
2) Search strategy:
[Figure: local features vp are fed to one classifier per object class]
• Classifier P( car | vp ) → no car
• Classifier P( cow | vp ) → no cow
• Classifier P( person | vp ) → no person
• …
Agarwal & Roth, (02), Moghaddam, Pentland (97), Turk, Pentland (91), Vidal-Naquet, Ullman, (03)
Heisele, et al, (01), Agarwal & Roth, (02), Kremp, Geman, Amit (02), Dorko, Schmid, (03)
Fergus, Perona, Zisserman (03), Fei Fei, Fergus, Perona, (03), Schneiderman, Kanade (00), Lowe (99)
Etc.
![Page 39: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/39.jpg)
SCENES
![Page 40: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/40.jpg)
Try to find the face in this image
![Page 41: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/41.jpg)
The search space is huge
“Like finding needles in a haystack”
For each object:
- Need to search over locations (x, y) and scales
- Slow (many patches to examine)
- Error prone (classifier must have very low false positive rate)
10,000 patches/object/image
1,000,000 images/day
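The two figures on this slide multiply out as shown below. The per-patch false positive rate is a hypothetical number picked only to show why the rate must be very low; it does not come from the slides.

```python
images_per_day = 1_000_000
patches_per_object_per_image = 10_000

# Patches a single-object detector must classify each day: 10**10.
patches_per_day = images_per_day * patches_per_object_per_image

# Hypothetical per-patch false positive rate, for illustration only.
false_positive_rate = 1e-6
false_alarms_per_day = patches_per_day * false_positive_rate
false_alarms_per_image = false_alarms_per_day / images_per_day
```

Even at one false positive per million patches, this workload would still produce on the order of ten thousand false alarms per day for each object class.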
![Page 42: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/42.jpg)
Local features are not even sufficient
![Page 43: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/43.jpg)
The multiple personalities of a blob
![Page 44: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/44.jpg)
The multiple personalities of a blob
![Page 45: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/45.jpg)
The multiple personalities of a blob
![Page 46: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/46.jpg)
The multiple personalities of a blob
![Page 47: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/47.jpg)
Not everything fits inside a rectangle
• e.g., detecting irregularly-shaped “stuff”
  – Grass, trees, roads, building facades
• e.g., detecting non-rigid / articulated / “wiry” things
  – People, chairs, desk lamps
Source: MIT-CSAIL database of Objects and Scenes
![Page 48: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/48.jpg)
Looking outside the box
[Figure: features arranged by spatial extent relative to the object size. Inside the object (intrinsic features): pixels, parts, global appearance. Outside the object (contextual features): local context, global context.]
Kruppa & Schiele, (03), Fink & Perona (03)
Carbonetto, Freitas, Barnard (03), Kumar, Hebert, (03)
He, Zemel, Carreira-Perpinan (04), Moore, Essa, Monson, Hayes (99)
Strat & Fischler (91), Murphy, Torralba & Freeman (03)
Agarwal & Roth, (02), Moghaddam, Pentland (97), Turk, Pentland (91), Vidal-Naquet, Ullman, (03)
Heisele, et al, (01), Agarwal & Roth, (02), Kremp, Geman, Amit (02), Dorko, Schmid, (03)
Fergus, Perona, Zisserman (03), Fei Fei, Fergus, Perona, (03), Schneiderman, Kanade (00), Lowe (99)
Etc.
![Page 49: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/49.jpg)
What is visual scene context?
• A specific scene category (a coffeemaker is usually in a kitchen)
• The structure of the scene background (a chair is on the ground, not the ceiling)
• A combination of objects or shapes (TV+sofa+rug+bookshelf = living-room)
• Spatial relationships between shapes
![Page 50: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/50.jpg)
Scene Context and Object Consistencies
• Biederman et al (82) proposed that five classes of relations exist between an object and its scene background:
• (1) Interposition (objects interrupt their background)
• (2) Support (objects tend to rest on surfaces)
• (3) Probability (objects tend to be found in some scenes but not others)
• (4) Position (given an object is probable in a scene, it often is found in some positions but not others)
• (5) Familiar size (objects have a limited set of size relations with other objects)
![Page 51: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/51.jpg)
Object Consistencies
Biederman et al (1982), De Graef (1990).
![Page 52: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/52.jpg)
Object Consistencies
Examples of inconsistencies
Biederman et al (1982), De Graef (1990).
![Page 53: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/53.jpg)
Rapid scene processing
• Conceptual information about a picture is available with a glimpse of > 100 ms (M. Potter)
• Scene processing can be quickly done without much object information (Schyns & Oliva, 1994)
![Page 54: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/54.jpg)
Object priming
Inconsistent object
Consistent object
Increasing contextual information
Torralba, Sinha, Oliva, VSS 2001
![Page 55: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/55.jpg)
Object priming
Torralba, Sinha, Oliva, VSS 2001
![Page 56: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/56.jpg)
Why is context important?
• Changes the interpretation of an object (or its function)
• Context defines what an unexpected event is
![Page 57: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/57.jpg)
Why is context important?
• Reduces the search space
• Context features can be shared among many objects across locations and scales: more efficient than local features.
![Page 58: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/58.jpg)
Context models
VC
VL
The problem: how to represent context?
VC might have a very high dimensionality. There are as many ways of breaking down the dimensionality of VC as there are possible definitions of contextual representations.
How far can we go without object detectors?
![Page 59: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/59.jpg)
Previous work on context
• Strat & Fischler (91): Context defined using hand-written rules about relationships between objects.
• Torralba & Sinha (01), Torralba (03): Global context to predict objects.
• Fink & Perona (03): Use boosting incorporating the output of multiple detectors to generate contextual weak-classifiers.
• Murphy, Torralba & Freeman (03): Use graphical models to represent the relation between global context and objects.
• Carbonetto, de Freitas & Barnard (04): They extend the work on “words and images” by adding spatial consistency between labels.
• He, Zemel & Carreira-Perpinan (04): Use dense connectivity for incorporating spatial context using multiscale conditional random fields.
![Page 60: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/60.jpg)
Previous work on context
• Strat & Fischler (91): Context defined using hand-written rules about relationships between objects
![Page 61: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/61.jpg)
Previous work on context
• Fink & Perona (03): Use output of boosting from other objects at previous iterations as input into boosting for this iteration
![Page 62: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/62.jpg)
Previous work on context
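• Murphy, Torralba & Freeman (03): Use global context to predict objects, but there is no modeling of spatial relationships between objects.
[Figure: graphical model, shown for keyboards. For each class (Class 1, Class 2), patch nodes Op1,c … OpN,c with feature nodes vp1,c … vpN,c connect to a class node (E1, E2); the scene node S and the global features vg sit above them, together with nodes X1, X2 and max Vc1, max Vc2.]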
![Page 63: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/63.jpg)
Previous work on context
• Carbonetto, de Freitas & Barnard (04): Enforce spatial consistency between labels using MRF
![Page 64: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/64.jpg)
Previous work on context
• He, Zemel & Carreira-Perpinan (04): Use latent variables to induce long distance correlations between labels in a Conditional Random Field (CRF)
![Page 65: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/65.jpg)
How do we exploit relationships between parts/ wholes
to overcome local ambiguity?
![Page 66: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/66.jpg)
Use probabilistic graphical models!
![Page 67: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/67.jpg)
What is a graphical model?
• Nodes = random variables
  – Shaded = observed
  – Clear = hidden
• Arcs = (soft) constraints
• Bayes nets are a special case
• Goal of inference: state estimation
• Goal of learning: parameter estimation
[Figure: example graph with nodes V1, V3, V’3, V4 and H1, H2, H3, H4]
![Page 68: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/68.jpg)
Including scene-context for object detection
[Figure: graphical model. For each class (Class 1, Class 2), a node Ec (E1, E2) is connected to patch nodes Op1,c … OpN,c, each with its feature node vp1,c … vpN,c.]
Ec = Exists object c anywhere in image?
Op,c = Object c in patch p? (Yes / No)
Vp,c = Features for class c in patch p
![Page 69: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/69.jpg)
Symptoms of local features only
Some false alarms occur in image regions in which it is impossible for the target to be present given the context.
![Page 70: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/70.jpg)
Symptoms of local features only
Low probability of keyboard presence
High probability of keyboard presence
![Page 71: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/71.jpg)
The system does not care about the scene, but we do…
We know there is a keyboard present in this scene even if we cannot see it clearly.
We know there is no keyboard present in this scene
… even if there is one indeed.
![Page 72: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/72.jpg)
Including scene-context for object detection
[Figure: graphical model with nodes S, E1, E2, Op1,c … OpN,c and vp1,c … vpN,c for Class 1 and Class 2]
S = scene (category: street, office, corridor, …)
Ec = Exists object c anywhere in image?
Op,c = Object c in patch p?
Vp,c = Features for class c in patch p
![Page 73: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/73.jpg)
Including scene-context for object detection
[Figure: graphical model with nodes vg, S, E1, E2, Op1,c … OpN,c and vp1,c … vpN,c for Class 1 and Class 2]
Ec = Exists object c anywhere in image?
Op,c = Object c in patch p?
Vp,c = Features for class c in patch p
S = scene (category: street, office, corridor, …)
Vg = global image features
![Page 74: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/74.jpg)
Local and Global features
A set of local features describes image properties at one particular location in the image:
Jet of local orientations and scales
A set of global features provides information about the global image structure without encoding specific objects.
This feature likes images with vertical structures at the top part and horizontal texture at the bottom part (this is a typical composition of an empty street).
![Page 75: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/75.jpg)
Computing the global scene features
• Pipe image through steerable filter bank (here we use 6 orientations, 4 scales)
• Compute magnitude of filter outputs
• Downsample to 4 x 4 each scale/orientation
• PCA to 80 dimensions
[Figure: image → filter bank → | vt | → PCA → vG]
Oliva, Torralba. IJCV 2001
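The four steps above can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the steerable filter bank is replaced by frequency-domain Gabor-like filters, and the function names, filter bandwidths, 32x32 image size and random test images are hypothetical choices, not taken from the slides. With 6 orientations, 4 scales and a 4 x 4 grid, each image yields 6 x 4 x 16 = 384 values before PCA.

```python
import numpy as np

def oriented_bank(size, n_orient=6, n_scales=4):
    """Frequency-domain oriented band-pass filters (stand-in for a steerable bank)."""
    fy, fx = np.meshgrid(np.fft.fftfreq(size), np.fft.fftfreq(size), indexing="ij")
    radius = np.hypot(fx, fy)
    angle = np.arctan2(fy, fx)
    filters = []
    for s in range(n_scales):
        f0 = 0.25 / (2 ** s)  # assumed: centre frequency halves at each scale
        radial = np.exp(-((radius - f0) ** 2) / (2 * (0.35 * f0) ** 2))
        for k in range(n_orient):
            theta = np.pi * k / n_orient
            diff = np.angle(np.exp(1j * (angle - theta)))  # wrapped angular distance
            angular = np.exp(-(diff ** 2) / (2 * (np.pi / n_orient) ** 2))
            filters.append(radial * angular)
    return np.stack(filters)  # shape: (n_orient * n_scales, size, size)

def global_features(image, filters, grid=4):
    """Magnitude of each filter output, block-averaged down to grid x grid."""
    size = image.shape[0]
    cell = size // grid
    spectrum = np.fft.fft2(image)
    feats = []
    for f in filters:
        mag = np.abs(np.fft.ifft2(spectrum * f))
        crop = mag[: grid * cell, : grid * cell]
        blocks = crop.reshape(grid, cell, grid, cell).mean(axis=(1, 3))
        feats.append(blocks.ravel())
    return np.concatenate(feats)

def fit_pca(X, n_components):
    """Return the data mean and the top principal directions (rows)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(x, mean, components):
    return components @ (x - mean)

# Toy run on random images (placeholders for real photographs).
rng = np.random.default_rng(0)
images = rng.random((100, 32, 32))
bank = oriented_bank(32)
X = np.stack([global_features(im, bank) for im in images])
mean, comps = fit_pca(X, 80)
v_g = project(X[0], mean, comps)  # 80-dimensional global feature vector
```

PCA needs more training images than output dimensions, which is why the toy run uses 100 images for 80 components.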
![Page 76: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/76.jpg)
Global features
The representation preserves:
Low resolution structure
Phase is only preserved for very low spatial frequencies (2 cycles/image)
64 global features
![Page 77: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/77.jpg)
Goal
• To build a system that knows where it is
• That recognizes the main objects in the scene
• That can work on new environments
• Robust to user
![Page 78: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/78.jpg)
Our mobile rig, version 1
Kevin Murphy
Torralba, Murphy, Freeman, Rubin, ICCV 2003; Murphy, Torralba, Freeman, NIPS 2003
![Page 79: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/79.jpg)
Our mobile rig, version 2
Torralba, Murphy, Freeman, Rubin, ICCV 2003; Murphy, Torralba, Freeman, NIPS 2003
![Page 80: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/80.jpg)
Training for scene recognition
Scene categorization: 3 categories (office, street, corridor)
Place identification: 62 places (Office 610, Office 615, ‘Draper’ Street, …)
![Page 81: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/81.jpg)
Scene classifier
[Figure: two models relating the global features vg and the scene S]
Discriminative (boosting)
Generative (mixture of Gaussians)
![Page 82: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/82.jpg)
Corridor recognition
[Figure: correct recognitions and misses vs. number of false alarms, for the boosting and Gaussian classifiers. Targets=400, Distractors=2806]
![Page 83: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/83.jpg)
Office recognition
[Figure: correct recognitions and misses vs. number of false alarms, for the boosting and Gaussian classifiers. Targets=400, Distractors=2940]
![Page 84: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/84.jpg)
Temporal context helps
= ?
![Page 85: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/85.jpg)
Temporal context helps
![Page 86: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/86.jpg)
Place and object recognition
$p(o_t, q_t \mid v_{1:t})$
where $o_t$ = objects, $q_t$ = location, $v_{1:t}$ = image sequence
![Page 87: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/87.jpg)
Place and object recognition
$p(o_t, q_t \mid v_{1:t}) = p(o_t, q_t \mid v_{1:t}, v^G_{1:t}) \propto p(o_t \mid q_t, v_{1:t})\, P(q_t \mid v^G_{1:t})$
where $q_t$ = location, $v^G$ = context features
![Page 88: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/88.jpg)
Hidden Markov Model
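$p(o_t, q_t \mid v_{1:t}) \propto p(o_t \mid q_t, v_{1:t})\, P(q_t \mid v^G_{1:t})$
where $q_t$ = location, $v^G$ = context features
We use an HMM to estimate the location recursively:
$P(q_t \mid v^G_{1:t}) \propto p(v^G_t \mid q_t) \sum_{q'} P(q_t \mid q'_{t-1})\, P(q'_{t-1} \mid v^G_{1:t-1})$
• $P(q_t \mid v^G_{1:t})$: probability for each location
• $p(v^G_t \mid q_t)$: observation likelihood
• $P(q_t \mid q'_{t-1})$: transition matrix (encodes topology)
• $P(q'_{t-1} \mid v^G_{1:t-1})$: previous estimation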
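The recursion above is the standard HMM forward (filtering) update. A minimal sketch in plain Python follows; the function name, the three-place transition matrix and the likelihood values are made up for illustration and are not the ones used in the talk.

```python
def hmm_filter_step(prior, transition, likelihood):
    """One recursive update of P(q_t | v^G_{1:t}).

    prior[i]         = P(q_{t-1} = i | v^G_{1:t-1})   (previous estimation)
    transition[i][j] = P(q_t = j | q_{t-1} = i)       (encodes topology)
    likelihood[j]    = p(v^G_t | q_t = j)             (observation likelihood)
    """
    n = len(prior)
    # Sum over the previous location q'.
    predicted = [sum(transition[i][j] * prior[i] for i in range(n)) for j in range(n)]
    unnormalized = [likelihood[j] * predicted[j] for j in range(n)]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]

# Toy topology: three places in a row, so place 0 cannot jump directly to place 2.
transition = [
    [0.8, 0.2, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.8],
]
belief = [1 / 3, 1 / 3, 1 / 3]
for likelihood in ([0.7, 0.2, 0.1], [0.6, 0.3, 0.1]):
    belief = hmm_filter_step(belief, transition, likelihood)
```

Zeros in the transition matrix are how the topology enters: a place that is not adjacent to the current estimate gets no predicted mass, whatever the image looks like.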
![Page 89: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/89.jpg)
Hidden Markov Model
We use 17 annotated sequences for training
[Figure: example frames from Office 610, Corridor 6b, Corridor 6c, Office 617]
• Hidden states = location (63 values)
• Observations = $v^G_t$ (80 dimensions)
• Transition matrix encodes topology of environment
• Observation model is a mixture of Gaussians centered on prototypes (100 views per place)
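One way to read the observation model in the last bullet is sketched below. The slide does not give the mixture weights or covariances, so equal weights and a single shared spherical variance `sigma` are assumptions here, as are the function name and the two-dimensional toy prototypes. The computation is done in the log domain because 80-dimensional Gaussian densities underflow easily.

```python
import math

def log_observation_likelihood(v, prototypes, sigma):
    """log p(v | place) for an equal-weight mixture of spherical Gaussians,
    one component centred on each stored prototype view (assumed form)."""
    d = len(v)
    log_norm = -0.5 * d * math.log(2.0 * math.pi * sigma ** 2)
    log_terms = []
    for proto in prototypes:
        sq_dist = sum((a - b) ** 2 for a, b in zip(v, proto))
        log_terms.append(log_norm - sq_dist / (2.0 * sigma ** 2))
    m = max(log_terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in log_terms)) - math.log(len(prototypes))

# Toy prototypes (hypothetical 2-D features instead of 80-D).
office_views = [[0.0, 0.0], [0.2, 0.1]]
corridor_views = [[2.0, 2.0], [2.1, 1.8]]
query = [0.1, 0.0]
ll_office = log_observation_likelihood(query, office_views, sigma=0.5)
ll_corridor = log_observation_likelihood(query, corridor_views, sigma=0.5)
```

Exponentiating these values, or using them directly in a log-domain filter, gives the observation likelihood term of the HMM.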
![Page 90: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/90.jpg)
Temporal classifier
[Figure: two chain models over time steps t-1 and t, each with room-name nodes R, scene-type nodes S and global features vg]
Discriminative (1D CRF)
Generative (HMM)
R = Room-name
S = Scene-type
Torralba, Murphy, Freeman, Rubin, ICCV 03
![Page 91: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/91.jpg)
Place recognition demo
[Figure: demo display]
Input image (120x160). Shows the category and the identity of the place when the system is confident.
Runs at 4 fps on Matlab.
![Page 92: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/92.jpg)
Identification and categorization of known places
[Figure: ground truth and system estimate vs. frame number for specific location, location category and indoor/outdoor. Sequence segments: Building 400, Outdoor, AI-lab]
![Page 93: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/93.jpg)
Identification and categorization of new places
Category
Identification
frame
![Page 94: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/94.jpg)
Predicting the presence of an object
[Figure: nodes E and S]
Φ(E, S) can be estimated by counting co-occurrences in labeled images
[Figure: three plots, one per scene (Office, Corridor, Street), over the objects pedestrian, bookshelf, car, chair, coffee m/c, desk, keyboard, mouse, screen]
“Cars are likely in streets, but not in offices or corridors”
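Counting co-occurrences can be sketched as follows. The slide only says that Φ(E, S) comes from counts; normalizing the counts into a conditional probability per scene and adding a small smoothing term are assumptions made here, and the labeled images are toy data invented for the example.

```python
from collections import Counter

def estimate_presence_table(labeled_images, smoothing=1.0):
    """Estimate P(E_c = 1 | S = s) from (scene, objects_present) pairs."""
    scene_counts = Counter()
    pair_counts = Counter()
    objects = set()
    for scene, present in labeled_images:
        scene_counts[scene] += 1
        for obj in set(present):
            pair_counts[(scene, obj)] += 1
            objects.add(obj)
    table = {}
    for scene, n in scene_counts.items():
        for obj in objects:
            # Smoothed fraction of images of this scene that contain the object.
            table[(scene, obj)] = (pair_counts[(scene, obj)] + smoothing) / (n + 2 * smoothing)
    return table

# Toy annotations (hypothetical).
labeled_images = [
    ("street", {"car", "pedestrian"}),
    ("street", {"car"}),
    ("office", {"desk", "screen", "keyboard"}),
    ("office", {"desk", "chair"}),
    ("corridor", set()),
]
table = estimate_presence_table(labeled_images)
p_car_street = table[("street", "car")]
p_car_office = table[("office", "car")]
```

With these toy counts the table already reproduces the quoted intuition: the estimate for a car is higher in a street than in an office.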
![Page 95: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/95.jpg)
Predicting the presence of an object
[Figure: place recognition, object predictions, and ground truth for object presence over an indoor/outdoor sequence]
At each place it is not necessary to consider all possible objects for detection.
![Page 96: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/96.jpg)
Combining scene top-down predictions with detectors' bottom-up signal
[Figure: graphical model with nodes vg, S, E1, E2, Op1,c … OpN,c and vp1,c … vpN,c for Class 1 and Class 2]
We use a CRF
S = scene (category: street, office, corridor, …)
Ec = Exists object c anywhere in image?
Op,c = Object c in patch p?
Vp,c = Features for class c in patch p
Vg = global image features
![Page 97: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/97.jpg)
Application of object detection for image retrieval
Results using the keyboard detector alone
[Figure: images ordered from low probability to high probability]
Results using both the keyboard detector and the global scene features
[Figure: images ordered from low probability to high probability]
![Page 98: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/98.jpg)
Application of object detection for image retrieval
Results using the car detector alone
[Figure: images ordered from low probability to high probability]
Results using both the car detector and the global scene features
[Figure: images ordered from low probability to high probability]
![Page 99: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/99.jpg)
Application of object detection for image retrieval
Detecting the coffee machine:
Without context
With context
![Page 100: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/100.jpg)
Global features can predict expected locations/scales of objects before running detectors
[Figure: examples for pedestrians and keyboards]
There is a relationship between the aspect of the objects in a scene, and the aspect of the scene itself. For instance, the point of view of cars is correlated with the orientation of the street. But also, the location of the ground in the scene is correlated with the location of the objects in the scene.
![Page 101: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/101.jpg)
Global scene features predict location
[Figure: graphical model with nodes vg, S, X1, X2, E1, E2, Op1,c … OpN,c and vp1,c … vpN,c for Class 1 and Class 2]
Op,c = Object c in patch p?
Ec = Exists object c anywhere in image?
Vp,c = Features for class c in patch p
S = scene (category: street, office, corridor, …)
Vg = global image features
Xc = expected location of class c
![Page 102: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/102.jpg)
Global scene features predict location
[Figure: graphical model with nodes vg, X1, X2 and Op1,c … OpN,c for Class 1 and Class 2]
Op,c = Object c in patch p?
Vg = global image features
Xc = expected location of class c
![Page 103: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/103.jpg)
Global scene features predict location
1) We learn the mapping between image global features and object location as a regression problem:
$x = \sum_m h_m(V_g)$
Minimize $E[(x_{true} - x)^2]$
We use boosting for regression. $h_m$ are regression stumps.
(We do the regression for the horizontal and vertical components, and for scale.)
[Figure: model vg → X; training set (cars) of pairs {Vg,1, X1}, {Vg,2, X2}, {Vg,3, X3}, {Vg,4, X4}, …]
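As an illustration of step 1, here is a small sketch of boosting with regression stumps under squared error. It is a generic least-squares boosting loop, not necessarily the exact variant used in the talk; the shrinkage factor, the number of rounds, the function names and the toy data are all assumptions.

```python
def fit_stump(X, residual):
    """Best single-feature threshold stump for squared error on the residual."""
    best = None
    for k in range(len(X[0])):
        values = sorted(set(row[k] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = 0.5 * (lo + hi)
            left = [r for row, r in zip(X, residual) if row[k] <= thr]
            right = [r for row, r in zip(X, residual) if row[k] > thr]
            a = sum(left) / len(left)
            b = sum(right) / len(right)
            err = sum((r - a) ** 2 for r in left) + sum((r - b) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, k, thr, a, b)
    if best is None:  # every feature is constant: fall back to the mean
        mean = sum(residual) / len(residual)
        return 0, float("inf"), mean, mean
    return best[1:]  # (feature index, threshold, value below, value above)

def boost_regression(X, y, rounds=20, shrinkage=0.5):
    """Additive model x = sum_m h_m(V_g), each h_m a stump fitted to the residual."""
    stumps = []
    pred = [0.0] * len(y)
    for _ in range(rounds):
        residual = [t - p for t, p in zip(y, pred)]
        k, thr, a, b = fit_stump(X, residual)
        a, b = shrinkage * a, shrinkage * b
        stumps.append((k, thr, a, b))
        pred = [p + (a if row[k] <= thr else b) for p, row in zip(pred, X)]
    return stumps

def predict(stumps, v):
    return sum(a if v[k] <= thr else b for k, thr, a, b in stumps)

# Toy data: the target location depends on a step in feature 0 (hypothetical).
X = [[i / 10, (i * 7 % 10) / 10] for i in range(10)]
y = [0.2 if i < 5 else 0.8 for i in range(10)]
stumps = boost_regression(X, y)
mse = sum((predict(stumps, row) - t) ** 2 for row, t in zip(X, y)) / len(y)
```

In the setting of the slides one such regressor would be trained per output (horizontal position, vertical position, scale), with the global features as input.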
![Page 104: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/104.jpg)
Global scene features predict location
2) We fit a logistic function to compute the probability of object presence in a patch p given the expected location x:
$x = \sum_m h_m(V_g)$
$P(O_{p,c} \mid x) = \sigma\left(w^T [1,\ \|x_p - x\|^2]\right)$
[Figure: model vg → X → Op,c … OpN,c; training set (cars) of pairs {Vg,1, X1}, {Vg,2, X2}, {Vg,3, X3}, {Vg,4, X4}, …]
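Step 2 can be sketched as a two-parameter logistic regression on the squared distance between a patch and the predicted location. The training samples, the learning rate, the use of plain batch gradient descent and the normalized image coordinates are assumptions for illustration only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def presence_prob(w, patch_xy, expected_xy):
    """P(O_{p,c} = 1 | x) = sigma(w0 + w1 * ||x_p - x||^2)."""
    d2 = sum((a - b) ** 2 for a, b in zip(patch_xy, expected_xy))
    return sigmoid(w[0] + w[1] * d2)

def fit_logistic(samples, lr=0.5, epochs=2000):
    """Fit w = [w0, w1] by batch gradient descent on the logistic loss.

    samples: list of (squared_distance, label) with label in {0, 1}.
    """
    w = [0.0, 0.0]
    n = len(samples)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for d2, label in samples:
            err = sigmoid(w[0] + w[1] * d2) - label
            g0 += err
            g1 += err * d2
        w[0] -= lr * g0 / n
        w[1] -= lr * g1 / n
    return w

# Toy samples: patches near the predicted location mostly contain the object.
samples = [(0.0, 1), (0.05, 1), (0.1, 1), (0.2, 0), (0.3, 1), (0.5, 0), (0.8, 0), (1.0, 0)]
w = fit_logistic(samples)
near = presence_prob(w, (0.5, 0.5), (0.5, 0.5))
far = presence_prob(w, (0.5, 0.5), (0.5, 1.5))
```

A negative fitted `w[1]` means the probability of presence falls off with distance from the expected location, which is the behaviour the slide describes.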
![Page 105: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/105.jpg)
Global scene features predict location
$x = \sum_m h_m(V_g)$
Given a new scene, we can predict the most expected location of an object based on the global features of the image.
[Figure: results for predicting the vertical location of cars (true Y vs. estimated Y)]
[Figure: results for predicting the horizontal location of cars (true X vs. estimated X)]
Scenes are arranged on horizontal layers. We can predict the vertical component (ground level), but the horizontal component is poorly constrained by the global scene.
![Page 106: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/106.jpg)
Global scene features predict location
[Figure: input image; model vg → X2 → Op1,c2 … OpN,c2]
1. Compute global scene features
2. Compute location regression
3. Logistic classification
Region of the image likely to contain cars conditional on the scene (global features: Vg)
![Page 107: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/107.jpg)
Full system
[Figure: graphical model with nodes vg, S, X1, X2, E1, E2, Op1,c … OpN,c and vp1,c … vpN,c for Class 1 and Class 2]
S = scene (category: street, office, corridor, …)
Ec = Exists object c anywhere in image?
Op,c = Object c in patch p?
Vg = global image features
Vp,c = Features for class c in patch p
Xc = expected location of class c
![Page 108: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/108.jpg)
The strength of context
Let's see how far we can get in object detection and localization without using detectors at all.
[Figure: graphical model with nodes vg, S, X1, X2, E1, E2 and Op1,c … OpN,c for Class 1 and Class 2 (no local feature nodes)]
S = scene
Op,c = Object c in patch p?
Ec = Exists object c anywhere in image?
Vg = global image features
Xc = expected location of class c
![Page 109: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/109.jpg)
The strength of context
[Figure: model with nodes Scene, vg, X1, X2 and Op1,c … OpN,c for the questions “Keyboard?” and “Car?”]
No temporal integration. Every frame is processed independently from the previous one.
![Page 110: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/110.jpg)
The two sources of information and the final system
[Figure: three graphical models. Local scene analysis: nodes E1, E2, Op,c, vp,c and max V features for Class 1 and Class 2. Global scene analysis: nodes vg, S, X1, X2, E1, E2 and Op,c. Integration of global and local features: both combined.]
![Page 111: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/111.jpg)
Context-based vision system for place and object recognition
Place recognition
Scene categorization
Object detection
![Page 112: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/112.jpg)
Learning joint object models
![Page 113: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/113.jpg)
Multiclass object detection
• We want to recognize many object classes with efficient algorithms (Torralba, Murphy, Freeman, CVPR 04)
• We want to use contextual relationships between objects (Torralba, Murphy, Freeman, NIPS 04)
![Page 114: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/114.jpg)
A more complete model of context
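[Figure: graphical model with three groups of label nodes s1, s2, s3 (each indexed 1 … n), local features VL1 … VLn, global features VG, and scene labels (street, office, …)]
Torralba, Murphy, Freeman, NIPS 04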
![Page 115: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/115.jpg)
Image database
• ~2500 hand labeled images with segmentations
• ~30 objects and stuff
• Indoor and outdoor
• Sets of images are separated by locations and camera (digital/webcam)
![Page 116: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/116.jpg)
Detecting difficult objects
There is a whole range of difficulties for the task of object detection:
[Figure: average percentage of pixels occupied by each object]
![Page 117: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/117.jpg)
Detecting difficult objects
Maybe there is a mouse
Office
Start recognizing the scene
![Page 118: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/118.jpg)
Detecting difficult objects
First detect simple objects (reliable detectors) that provide strong contextual constraints on the target (screen -> keyboard -> mouse)
![Page 119: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/119.jpg)
Segmenting difficult objects
First detect simple objects (reliable detectors) that provide strong contextual constraints on the target (screen -> keyboard -> mouse)
![Page 120: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/120.jpg)
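Learning local features
(First we need some intrinsic object features)
[Figure: three groups of label nodes s1, s2, s3 (each indexed 1 … n) connected to local features VL1 … VLn computed from pixels; example labels: building, road, car]
We maximize the probability of the true labels using Boosting.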
![Page 121: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/121.jpg)
Fragments for class-specific segmentation
Source: Borenstein & Ullman, ECCV’02
![Page 122: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/122.jpg)
Object local features (Borenstein & Ullman, ECCV 02)
[Figure: feature computation pipeline with the operations listed below]
• Convolve with oriented filter
• Threshold
• Convolve with segmentation fragment
• Normalized correlation with an object patch
Patches from 5x5 to 30x30 pixels.
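Of the operations on this slide, normalized correlation with an object patch is the easiest to make concrete. The sketch below is a direct, unoptimized version in plain Python on nested lists; the function name and the tiny 5x5 test image are hypothetical, and a practical implementation would use an array library and handle image borders.

```python
import math

def normalized_correlation(image, patch):
    """Map of normalized cross-correlation scores in [-1, 1] (valid region only)."""
    img_h, img_w = len(image), len(image[0])
    pat_h, pat_w = len(patch), len(patch[0])
    flat = [v for row in patch for v in row]
    p_mean = sum(flat) / len(flat)
    p_c = [v - p_mean for v in flat]
    p_norm = math.sqrt(sum(v * v for v in p_c))
    scores = []
    for i in range(img_h - pat_h + 1):
        row_scores = []
        for j in range(img_w - pat_w + 1):
            window = [image[i + a][j + b] for a in range(pat_h) for b in range(pat_w)]
            w_mean = sum(window) / len(window)
            w_c = [v - w_mean for v in window]
            w_norm = math.sqrt(sum(v * v for v in w_c))
            denom = p_norm * w_norm
            # Flat windows have zero variance; score them as 0.
            score = sum(x * y for x, y in zip(p_c, w_c)) / denom if denom > 0 else 0.0
            row_scores.append(score)
        scores.append(row_scores)
    return scores

# Toy image with the patch embedded at row 1, column 2.
patch = [[1, 2], [3, 5]]
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 2, 0],
    [0, 0, 3, 5, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
scores = normalized_correlation(image, patch)
best = max(max(row) for row in scores)
```

Because both the patch and each window are mean-centred and scaled by their norms, the score does not change under brightness offsets or contrast scaling of the window.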
![Page 123: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/123.jpg)
Object local features (Borenstein & Ullman, ECCV 02)
![Page 124: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/124.jpg)
Results with local features
We use Boosting to build a classifier:
![Page 125: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/125.jpg)
Results with local features
Screen
![Page 126: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/126.jpg)
Results with local features
Car
![Page 127: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/127.jpg)
Adding correlations between objects
[Figure: graphical model with three groups of label nodes s1, s2, s3 (each indexed 1 … n), local features VL1 … VLn and global features VG]
We need to learn
• The structure of the graph
• The pairwise potentials
![Page 128: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/128.jpg)
Previous work on joint object modeling
• Strat & Fischler (91): Context defined using hand-written rules about relationships between objects.
• Torralba & Sinha (01): Global context to predict objects.
• Fink & Perona (03): Use boosting incorporating the output of multiple detectors to generate contextual weak-classifiers.
• Murphy, Torralba & Freeman (03): Use graphical models to represent the relation between global context and objects.
• Carbonetto, Freitas & Barnard (04): They extend the work on “words and images” by adding spatial consistency between labels.
• He, Zemel & Carreira-Perpinan (04): Use dense connectivity for incorporating spatial context using multiscale conditional random fields.
![Page 129: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/129.jpg)
Learning in conditional random fields
• Parameters
– Lafferty, McCallum, Pereira (ICML 2001): Find global optimum using gradient methods plus exact inference (forwards-backwards) in a chain
– Kumar & Hebert (NIPS 2003): Use pseudo-likelihood in 2D CRF
– Carbonetto, de Freitas & Barnard (04): Use approximate inference (loopy BP) and pseudo-likelihood on 2D MRF
• Structure
– He, Zemel & Carreira-Perpinan (CVPR 04): Use contrastive divergence
– Torralba, Murphy, Freeman (NIPS 04): Use boosting
![Page 130: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/130.jpg)
Graphical models for vision
Densely connected graphs with low informative connections
![Page 131: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/131.jpg)
Sequentially learning the structure
Final output
Iteration
![Page 132: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/132.jpg)
Sequentially learning the structure
At each iteration of boosting:
• We pick a weak learner applied to the image (local or global features).
• We pick a weak learner applied to a subset of the label-beliefs at the previous iteration. These subsets are chosen from a dictionary of labeled graph fragments from the training set.
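The two steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: sites lie on a 1D chain rather than an image grid, weak learners are regression stumps fitted GentleBoost-style, and the "graph fragments" are reduced to the beliefs of the left and right neighbors. All function names (`fit_stump`, `context_features`, `train`, `predict`) are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_stump(X, y, w):
    """Weighted least-squares stump: predict a if X[:, d] > th, else b."""
    mean = np.sum(w * y) / np.sum(w)
    # Fallback: a constant predictor (th = inf sends every sample to b).
    best = (np.sum(w * (y - mean) ** 2), 0, np.inf, 0.0, mean)
    for d in range(X.shape[1]):
        for th in np.unique(X[:, d])[:-1]:  # keeps both sides non-empty
            hi = X[:, d] > th
            a = np.sum(w[hi] * y[hi]) / np.sum(w[hi])
            b = np.sum(w[~hi] * y[~hi]) / np.sum(w[~hi])
            err = np.sum(w * (y - np.where(hi, a, b)) ** 2)
            if err < best[0]:
                best = (err, d, th, a, b)
    return best[1:]

def apply_stump(stump, X):
    d, th, a, b = stump
    return np.where(X[:, d] > th, a, b)

def context_features(beliefs):
    """Beliefs of the left and right neighbors on a chain (edges replicated)."""
    p = np.pad(beliefs, 1, mode="edge")
    return np.stack([p[:-2], p[2:]], axis=1)

def train(X, y, rounds=10):
    """X: (n_sites, n_features) image features; y: labels in {-1, +1}."""
    F = np.zeros(len(y))  # accumulated evidence from image features
    G = np.zeros(len(y))  # accumulated evidence from neighbor beliefs
    model = []
    for _ in range(rounds):
        b_prev = sigmoid(F + G)  # label-beliefs at the previous iteration
        # Step 1: weak learner applied to the image features.
        s_img = fit_stump(X, y, np.exp(-y * (F + G)))
        F = F + apply_stump(s_img, X)
        # Step 2: weak learner applied to the previous beliefs of neighbors.
        C = context_features(b_prev)
        s_ctx = fit_stump(C, y, np.exp(-y * (F + G)))
        G = G + apply_stump(s_ctx, C)
        model.append((s_img, s_ctx))
    return model

def predict(model, X):
    F = np.zeros(X.shape[0])
    G = np.zeros(X.shape[0])
    for s_img, s_ctx in model:
        b_prev = sigmoid(F + G)
        F = F + apply_stump(s_img, X)
        G = G + apply_stump(s_ctx, context_features(b_prev))
    return sigmoid(F + G)
```

At test time the same sequence of weak learners is replayed, so each round plays the role of one iteration of belief updates. A fuller version would draw the context subsets from a learned dictionary of fragments instead of a fixed two-neighbor window.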
![Page 133: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/133.jpg)
Car detection
![Page 134: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/134.jpg)
Screen/keyboard/mouse
Iteration
![Page 135: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/135.jpg)
Screen/keyboard/mouse
Iteration
![Page 136: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/136.jpg)
Screen/keyboard/mouse
Iteration
![Page 137: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/137.jpg)
Screen/keyboard/mouse
Iteration
![Page 138: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/138.jpg)
Iteration
Screen/keyboard/mouse
![Page 139: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/139.jpg)
Screen/keyboard/mouse
![Page 140: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/140.jpg)
Cascade
Geman et al, 98; Viola & Jones, 01
Set to zero the beliefs of nodes with low probability of containing the target.
Perform message passing only on undecided nodes.
The detection of the screen reduces the search space for the mouse detector.
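The pruning rule above can be illustrated with a short sketch. It makes several assumptions that are not in the slides: nodes lie on a 1D chain, the rejection threshold `low` is arbitrary, and `neighbor_average` is only a stand-in for a real message-passing update. All names are hypothetical.

```python
import numpy as np

def neighbor_average(beliefs, idx):
    """Stand-in update: average each node with its two chain neighbors."""
    p = np.pad(beliefs, 1, mode="edge")
    return (p[idx] + p[idx + 1] + p[idx + 2]) / 3.0

def cascade_step(beliefs, update_fn, low=0.05):
    """Zero out unlikely nodes, then update only the undecided ones."""
    beliefs = beliefs.copy()
    rejected = beliefs < low
    beliefs[rejected] = 0.0             # rejected nodes stay at zero
    active = np.flatnonzero(~rejected)  # undecided nodes
    beliefs[active] = update_fn(beliefs, active)
    return beliefs, active

beliefs = np.array([0.01, 0.9, 0.6, 0.02, 0.03])
new_beliefs, active = cascade_step(beliefs, neighbor_average)
print(active)  # indices of the nodes that were actually updated
```

Because a rejected node keeps a belief of zero, it is rejected again on every later step, so the cost of each iteration scales with the number of undecided nodes rather than with the size of the image.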
![Page 141: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/141.jpg)
Cascade
Local
Context
Geman et al, 99; Viola & Jones, 01
![Page 142: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/142.jpg)
Cascade
![Page 143: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/143.jpg)
Car detection
From intrinsic features
From contextual features
A car out of context is less of a car
![Page 144: Antonio Torralba - courses.csail.mit.edu · 2005-02-24 · Previous work on context • Strat & Fischler (91) Context defined using hand-written rules about relationships between](https://reader035.vdocument.in/reader035/viewer/2022081606/5e9346bc3f5d094c203d9b32/html5/thumbnails/144.jpg)
Future work
• Learn relationships between more objects (things get interesting beyond the 10-object bar)
Feature sharing
Context
Scene
Cascade