recognizing scene viewpoint using panoramic place
TRANSCRIPT
!"#$ !"#$ !"#$
1. predict with current SVM 2. add the most confident prediction 3. re-train SVM
!"#$%&#!&'
Each Iteration:
()&*+,"#'!"
*&-)*+.#'/01'
!"#$%&#!&'
!"#$%&#!&'
!"#$ !"#$ !"#$
!+#%.%+)&'2+#"*+3+4'5"*')*+.#.#6'
()&*+,"#'!#$"
7' !"#$%&#!&'
!"#$%&#!&'
!+#%.%+)&'2+#"*+3+4'5"*')*+.#.#6'7'
SVM SVM SVM SVM SVM SVM
A B
A C B D
Kernel SVMClassifier
theater VS hotel room
Recognizing Scene Viewpoint using Panoramic Place RepresentationJianxiong Xiao Krista A. Ehinger Aude Oliva Antonio Torralba
SUN360 Database + Source Codehttp://sun360.mit.edu
−30
150
−60
120
−90 90
−120
60
−150
30
−180
0theater -180 0 +180
10 20 30 40 500
0.05
0.1
0.15
0.2
0.25
# of iterations
accu
racy
Greedy IncrRand AllRand Incr
(a) Training strategies.
10 20 30 40 500
0.05
0.1
0.15
0.2
0.25
# of iterations
accu
racy
Type IType IIType IIIType IV
(b) Di�erent symmetries.
0 30 60 90 1201501800
0.5Type I
0 30 60 90 1201501800
0.5Type II
(c) Angle deviation.
Type IType IIType III
Type IV
beachchurch
hotel roomstreet
subway stationtheater
train interiorwharf
corridorliving room
coastlawn
plaza courtyardshop
0
0.2
0.4
0.6
0.8
1
(a) Test on SUN360 Panorama Dataset.
Type IType IIType III
Type IV
beachchurch
hotel roomstreet
subway stationtheater
train interiorwharf
corridorliving room
coastlawn
plaza courtyardshop
0
0.2
0.4
0.6
0.8
1
(b) Test on SUN Dataset.
Viewpoint prediction accuracy.
Test SetManual Alignment Automatic Alignment
Accuracy 1 Accuracy 2 Accuracy Angle DeviationPlace Both Place Both Place View Both I II III IV
SUN360 48.4 27.3 51.9 23.8 51.9 50.2 24.2 55◦ 51◦ 86◦ 90◦
SUN 22.2 14.5 24.1 13.0 24.1 55.7 13.9 29◦ 30◦ 38◦ 90◦
Chance 3.8 0.3 3.8 0.3 3.8 8.3 0.3 90◦ 90◦ 90◦ 90◦
Evaluation HOG-S HOG-L Tiny HOG2×2 COM Final ChanceAccuracy 41.8 45.0 25.9 56.4 62.2 69.3 8.3Deviation 65.6◦ 62.5◦ 73.4◦ 48.0◦ 41.4◦ 34.9◦ 90.0◦
Image Extrapolation
Canonical View of Scenes
Reference:[1] SUN database: Large-scale scene recognition from abbey to zoo. J. Xiao, J. Hays, K. A. Ehinger, A. Oliva, and A. Torralba. CVPR, 2010.[2] Canonical perspective and the perception of objects. S. Palmer, E. Rosch, and P. Chase. Attention and Performance IX, 1981.[3] What your design looks like to peripheral vision. A. Raj and R. Rosen-holtz. APGV, 2010.
Symmetry Discovery & View Sharing
Scene Viewpoint Recognition Place Categorization & Viewpoint Recognition Results
SUN360 Panorama Database De�nition of Scenes
Theater-related Categories in SUN:stage/indoortheater/indoor_seatstheater/indoor_roundtheater/indoor_proceniumorchestra_pitcatwalk...
What about unnamable viewpoints?
Place + Viewpoint
−30
150
−60
120
−90 90
−120
60
−150
30
−180
0theater
Traditional Labelsvs.
theater
hotel room
street
P (Li1, Hi, Ii|M) =∏
j=1...m
P (Lij = C(Li1, j), Iij(Hi)|M)Joint Probability
M∗ = argmaxM∏
i=1...N
P (Ii|M)Training
L = argmaxj=1...mP (j, I|M∗)Testing⟨L(t)i1 , H
(t)i
⟩= argmaxLi1,Hi
∏
j=1...m
P (Lij = C(Li1, j), Iij(Hi)|M(t))Alignment Step
M(t+1) = argmaxM∏
i=1...N
∏
j=1...m
P (C(L(t)i1 , j), Iij(H
(t)i )|M)Maximization Step
Type I
Type II
Type III
Type IV
Despite belonging to the same place category (e.g., theater), the photos taken by an observer inside a place look very di�erent from di�erent viewpoints.
We design a model that, given a photo, can classify its place cat-egory and predict the direction in which the observer is facing within that place. Once we have a compass-like prediction of the observer’s viewpoint, we superimpose the photo onto an aver-aged panorama of the place category to automatically predict the possible layout that extends beyond the available field of view.
We construct a panorama database for training, because panora-mas densely cover all possible views within a place.
We introduce the problem of scene viewpoint recognition, the goal of which is to classify the type of place shown in a photo, and also to recognize the observer’s viewpoint within that cat-egory of place.
Analysis: Maximum-likelihood Interpretation
restaurant lobby atrium museum field expo showroom mountain forest old building park cave ruin workshop
1
trut
h
beach church hotel room street subway station theater train interior wharf corridor living room coast lawn plaza courtyard shop
2
trut
h
3
resu
lt
4
hum
anbi
as −30
150
−60
120
−90 90
−120
60
−150
30
−180
0
50
100
150
200
250−30
150
−60
120
−90 90
−120
60
−150
30
−180
0
50
100
150
200
250−30
150
−60
120
−90 90
−120
60
−150
30
−180
0
50
100
150−30
150
−60
120
−90 90
−120
60
−150
30
−180
0
100
200
300−30
150
−60
120
−90 90
−120
60
−150
30
−180
0
100
200
300−30
150
−60
120
−90 90
−120
60
−150
30
−180
0
100
200
300
400−30
150
−60
120
−90 90
−120
60
−150
30
−180
0
100
200
300
400
500−30
150
−60
120
−90 90
−120
60
−150
30
−180
0
20
40
60
80
100−30
150
−60
120
−90 90
−120
60
−150
30
−180
0
50
100
150−30
150
−60
120
−90 90
−120
60
−150
30
−180
0
200
400
600
800−30
150
−60
120
−90 90
−120
60
−150
30
−180
0
20
40
60
80−30
150
−60
120
−90 90
−120
60
−150
30
−180
0
5
10
15
20
25−30
150
−60
120
−90 90
−120
60
−150
30
−180
0
100
200
300
400−30
150
−60
120
−90 90
−120
60
−150
30
−180
0
100
200
300
400
Target view Panorama
View Matching Task
To further illustrate how each view is correlated with specific scene cat-egories, we run the scene category classi�er from [1] on normal photos extracted from the panoramas.
We study the canonical views of di�erent place categories, i.e. the viewpoints people most often choose when they take a photo of a particular type of place. We obtain ground-truth viewpoint informa-tion for normal limited-�eld-of-view photos from the SUN database [1] by running a view-matching task on Amazon Mechanical Turk.
EvaluationsBecause the alignment was unsuper-vised, we use an oracle to assign each aligned viewpoint to one of the view directions from the ground truth and evaluate accuracy based on the result-ing viewpoint-to-viewpoint mapping.
Illustration: Algorithm Behavior for street.
k-means and EMLatent Structural SVM
Step 1: Place Categorization
Step 2: Simultaneously Align Panoramas and Train Viewpoint Classi�er
Step 0: Sample Normal Photos from Panorama−30
150
−60
120
−90 90
−120
60
−150
30
−180
0
−30
150
−60
120
−90 90
−120
60
−150
30
−180
0
−30
150
−60
120
−90 90
−120
60
−150
30
−180
0
−30
150
−60
120
−90 90
−120
60
−150
30
−180
0
−30
150
−60
120
−90 90
−120
60
−150
30
−180
0
−30
150
−60
120
−90 90
−120
60
−150
30
−180
0
−30
150
−60
120
−90 90
−120
60
−150
30
−180
0
−30
150
−60
120
−90 90
−120
60
−150
30
−180
0
−30
150
−60
120
−90 90
−120
60
−150
30
−180
0
−30
150
−60
120
−90 90
−120
60
−150
30
−180
0
−30
150
−60
120
−90 90
−120
60
−150
30
−180
0
−30
150
−60
120
−90 90
−120
60
−150
30
−180
0
−30
150
−60
120
−90 90
−120
60
−150
30
−180
0
50
100
150
200
250
zoom in
automatic alignment resultmanual alignment result random alignment result
volleyball court/outdoor residen al neighborhood bridge pu ng green constru on site
sandbar islet beach
eld desert/sand
beach desert/sand sn residen l neighborhood islet
movie theater/indoor theater/indoor seats auditorium theater/indoor procenium conference center
laundromat videostore sushi bar restaurant movie theater/indoor
museum/indoor stage/indoor theater/indoor procenium movie theater/indoor subway sta on/pla orm
80 place categories and 67,583 panoramas (resolution 9104x4552)