recognizing scene viewpoint using panoramic place

!"#$ !"#$ !"#$

1. predict with current SVM 2. add the most confident prediction 3. re-train SVM

!"#$%&#!&'

Each Iteration:

()&*+,"#'!"

*&-)*+.#'/01'

!"#$%&#!&'

!"#$%&#!&'

!"#$ !"#$ !"#$

!+#%.%+)&'2+#"*+3+4'5"*')*+.#.#6'

()&*+,"#'!#$"

7' !"#$%&#!&'

!"#$%&#!&'

!+#%.%+)&'2+#"*+3+4'5"*')*+.#.#6'7'

SVM SVM SVM SVM SVM SVM

A B

A C B D

Kernel SVMClassifier

theater VS hotel room

Recognizing Scene Viewpoint using Panoramic Place RepresentationJianxiong Xiao Krista A. Ehinger Aude Oliva Antonio Torralba

SUN360 Database + Source Codehttp://sun360.mit.edu

−30

150

−60

120

−90 90

−120

60

−150

30

−180

0theater -180 0 +180

10 20 30 40 500

0.05

0.1

0.15

0.2

0.25

# of iterations

accu

racy

Greedy IncrRand AllRand Incr

(a) Training strategies.

10 20 30 40 500

0.05

0.1

0.15

0.2

0.25

# of iterations

accu

racy

Type IType IIType IIIType IV

(b) Di�erent symmetries.

0 30 60 90 1201501800

0.5Type I

0 30 60 90 1201501800

0.5Type II

(c) Angle deviation.

Type IType IIType III

Type IV

beachchurch

hotel roomstreet

subway stationtheater

train interiorwharf

corridorliving room

coastlawn

plaza courtyardshop

0

0.2

0.4

0.6

0.8

1

(a) Test on SUN360 Panorama Dataset.

Type IType IIType III

Type IV

beachchurch

hotel roomstreet

subway stationtheater

train interiorwharf

corridorliving room

coastlawn

plaza courtyardshop

0

0.2

0.4

0.6

0.8

1

(b) Test on SUN Dataset.

Viewpoint prediction accuracy.

Test SetManual Alignment Automatic Alignment

Accuracy 1 Accuracy 2 Accuracy Angle DeviationPlace Both Place Both Place View Both I II III IV

SUN360 48.4 27.3 51.9 23.8 51.9 50.2 24.2 55◦ 51◦ 86◦ 90◦

SUN 22.2 14.5 24.1 13.0 24.1 55.7 13.9 29◦ 30◦ 38◦ 90◦

Chance 3.8 0.3 3.8 0.3 3.8 8.3 0.3 90◦ 90◦ 90◦ 90◦

Evaluation HOG-S HOG-L Tiny HOG2×2 COM Final ChanceAccuracy 41.8 45.0 25.9 56.4 62.2 69.3 8.3Deviation 65.6◦ 62.5◦ 73.4◦ 48.0◦ 41.4◦ 34.9◦ 90.0◦

Image Extrapolation

Canonical View of Scenes

Reference:[1] SUN database: Large-scale scene recognition from abbey to zoo. J. Xiao, J. Hays, K. A. Ehinger, A. Oliva, and A. Torralba. CVPR, 2010.[2] Canonical perspective and the perception of objects. S. Palmer, E. Rosch, and P. Chase. Attention and Performance IX, 1981.[3] What your design looks like to peripheral vision. A. Raj and R. Rosen-holtz. APGV, 2010.

Symmetry Discovery & View Sharing

Scene Viewpoint Recognition Place Categorization & Viewpoint Recognition Results

SUN360 Panorama Database De�nition of Scenes

Theater-related Categories in SUN:stage/indoortheater/indoor_seatstheater/indoor_roundtheater/indoor_proceniumorchestra_pitcatwalk...

What about unnamable viewpoints?

Place + Viewpoint

−30

150

−60

120

−90 90

−120

60

−150

30

−180

0theater

Traditional Labelsvs.

theater

hotel room

street

P (Li1, Hi, Ii|M) =∏

j=1...m

P (Lij = C(Li1, j), Iij(Hi)|M)Joint Probability

M∗ = argmaxM∏

i=1...N

P (Ii|M)Training

L = argmaxj=1...mP (j, I|M∗)Testing⟨L(t)i1 , H

(t)i

⟩= argmaxLi1,Hi

∏

j=1...m

P (Lij = C(Li1, j), Iij(Hi)|M(t))Alignment Step

M(t+1) = argmaxM∏

i=1...N

∏

j=1...m

P (C(L(t)i1 , j), Iij(H

(t)i )|M)Maximization Step

Type I

Type II

Type III

Type IV

Despite belonging to the same place category (e.g., theater), the photos taken by an observer inside a place look very di�erent from di�erent viewpoints.

We design a model that, given a photo, can classify its place cat-egory and predict the direction in which the observer is facing within that place. Once we have a compass-like prediction of the observer’s viewpoint, we superimpose the photo onto an aver-aged panorama of the place category to automatically predict the possible layout that extends beyond the available field of view.

We construct a panorama database for training, because panora-mas densely cover all possible views within a place.

We introduce the problem of scene viewpoint recognition, the goal of which is to classify the type of place shown in a photo, and also to recognize the observer’s viewpoint within that cat-egory of place.

Analysis: Maximum-likelihood Interpretation

restaurant lobby atrium museum field expo showroom mountain forest old building park cave ruin workshop

1

trut

h

beach church hotel room street subway station theater train interior wharf corridor living room coast lawn plaza courtyard shop

2

trut

h

3

resu

lt

4

hum

anbi

as −30

150

−60

120

−90 90

−120

60

−150

30

−180

0

50

100

150

200

250−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

50

100

150

200

250−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

50

100

150−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

100

200

300−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

100

200

300−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

100

200

300

400−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

100

200

300

400

500−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

20

40

60

80

100−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

50

100

150−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

200

400

600

800−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

20

40

60

80−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

5

10

15

20

25−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

100

200

300

400−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

100

200

300

400

Target view Panorama

View Matching Task

To further illustrate how each view is correlated with specific scene cat-egories, we run the scene category classi�er from [1] on normal photos extracted from the panoramas.

We study the canonical views of di�erent place categories, i.e. the viewpoints people most often choose when they take a photo of a particular type of place. We obtain ground-truth viewpoint informa-tion for normal limited-�eld-of-view photos from the SUN database [1] by running a view-matching task on Amazon Mechanical Turk.

EvaluationsBecause the alignment was unsuper-vised, we use an oracle to assign each aligned viewpoint to one of the view directions from the ground truth and evaluate accuracy based on the result-ing viewpoint-to-viewpoint mapping.

Illustration: Algorithm Behavior for street.

k-means and EMLatent Structural SVM

Step 1: Place Categorization

Step 2: Simultaneously Align Panoramas and Train Viewpoint Classi�er

Step 0: Sample Normal Photos from Panorama−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

50

100

150

200

250

zoom in

automatic alignment resultmanual alignment result random alignment result

volleyball court/outdoor residen al neighborhood bridge pu ng green constru on site

sandbar islet beach

eld desert/sand

beach desert/sand sn residen l neighborhood islet

movie theater/indoor theater/indoor seats auditorium theater/indoor procenium conference center

laundromat videostore sushi bar restaurant movie theater/indoor

museum/indoor stage/indoor theater/indoor procenium movie theater/indoor subway sta on/pla orm

80 place categories and 67,583 panoramas (resolution 9104x4552)

recognizing scene viewpoint using panoramic place

Documents