recognizing scene viewpoint using panoramic place

1
!"# !"# !"# 1. predict with current SVM 2. add the most confident prediction 3. re-train SVM !"#$%&#!& Each Iteration: ()&*+,"# ! *&-)*+.# /01 !"#$%&#!& !"#$%&#!& !"# !"# !"# !+#%.%+)& 2+#"*+3+4 5"* )*+.#.#6 ()&*+,"# !#$ 7 !"#$%&#!& !"#$%&#!& !+#%.%+)& 2+#"*+3+4 5"* )*+.#.#6 7 SVM SVM SVM SVM SVM SVM A B A C B D Kernel SVM Classier theater VS hotel room Recognizing Scene Viewpoint using Panoramic Place Representation Jianxiong Xiao Krista A. Ehinger Aude Oliva Antonio Torralba SUN360 Database + Source Code http://sun360.mit.edu -30 150 -60 120 -90 90 -120 60 -150 30 -180 0 theater -180 0 +180 10 20 30 40 50 0 0.05 0.1 0.15 0.2 0.25 # of iterations accuracy Greedy Incr Rand All Rand Incr (a) Training strategies. 10 20 30 40 50 0 0.05 0.1 0.15 0.2 0.25 # of iterations accuracy Type I Type II Type III Type IV (b) Different symmetries. 0 30 60 90 120 150 180 0 0.5 Type I 0 30 60 90 120 150 180 0 0.5 Type II (c) Angle deviation. Type I Type II Type III Type IV beach church hotel room street subway station theater train interior wharf corridor living room coast lawn plaza courtyard shop 0 0.2 0.4 0.6 0.8 1 (a) Test on SUN360 Panorama Dataset. Type I Type II Type III Type IV beach church hotel room street subway station theater train interior wharf corridor living room coast lawn plaza courtyard shop 0 0.2 0.4 0.6 0.8 1 (b) Test on SUN Dataset. Viewpoint prediction accuracy. Test Set Manual Alignment Automatic Alignment Accuracy 1 Accuracy 2 Accuracy Angle Deviation Place Both Place Both Place View Both I II III IV SUN360 48.4 27.3 51.9 23.8 51.9 50.2 24.2 55 51 86 90 SUN 22.2 14.5 24.1 13.0 24.1 55.7 13.9 29 30 38 90 Chance 3.8 0.3 3.8 0.3 3.8 8.3 0.3 90 90 90 90 Evaluation HOG-S HOG-L Tiny HOG2 ×2 COM Final Chance Accuracy 41.8 45.0 25.9 56.4 62.2 69.3 8.3 Deviation 65.6 62.5 73.4 48.0 41.4 34.9 90.0 Image Extrapolation Canonical View of Scenes Reference: [1] SUN database: Large-scale scene recognition from abbey to zoo. J. Xiao, J. Hays, K. A. Ehinger, A. Oliva, and A. Torralba. CVPR, 2010. [2] Canonical perspective and the perception of objects. S. Palmer, E. Rosch, and P. Chase. Attention and Performance IX, 1981. [3] What your design looks like to peripheral vision. A. Raj and R. Rosen- holtz. APGV, 2010. Symmetry Discovery & View Sharing Scene Viewpoint Recognition Place Categorization & Viewpoint Recognition Results SUN360 Panorama Database Definition of Scenes Theater-related Categories in SUN: stage/indoor theater/indoor_seats theater/indoor_round theater/indoor_procenium orchestra_pit catwalk ... What about unnamable viewpoints? Place + Viewpoint -30 150 -60 120 -90 90 -120 60 -150 30 -180 0 theater Traditional Labels vs. theater hotel room street P (L i1 ,H i ,I i |M)= j =1...m P (L ij = C (L i1 ,j ),I ij (H i )|M) Joint Probability M * = arg max M i=1...N P (I i |M) Training L = arg max j =1...m P (j, I |M * ) Testing L (t) i1 ,H (t) i = arg max L i1 ,H i j =1...m P (L ij = C (L i1 ,j ),I ij (H i )|M (t) ) Alignment Step M (t+1) = arg max M i=1...N j =1...m P (C (L (t) i1 ,j ),I ij (H (t) i )|M) Maximization Step Type I Type II Type III Type IV Despite belonging to the same place category (e.g., theater), the photos taken by an observer inside a place look very different from different viewpoints. We design a model that, given a photo, can classify its place cat- egory and predict the direction in which the observer is facing within that place. Once we have a compass-like prediction of the observer’s viewpoint, we superimpose the photo onto an aver- aged panorama of the place category to automatically predict the possible layout that extends beyond the available field of view. We construct a panorama database for training, because panora- mas densely cover all possible views within a place. We introduce the problem of scene viewpoint recognition, the goal of which is to classify the type of place shown in a photo, and also to recognize the observer’s viewpoint within that cat- egory of place. Analysis: Maximum-likelihood Interpretation restaurant lobby atrium museum field expo showroom mountain forest old building park cave ruin workshop 1 truth beach church hotel room street subway station theater train interior wharf corridor living room coast lawn plaza courtyard shop 2 truth 3 result 4 human bias -30 150 -60 120 -90 90 -120 60 -150 30 -180 0 50 100 150 200 250 -30 150 -60 120 -90 90 -120 60 -150 30 -180 0 50 100 150 200 250 -30 150 -60 120 -90 90 -120 60 -150 30 -180 0 50 100 150 -30 150 -60 120 -90 90 -120 60 -150 30 -180 0 100 200 300 -30 150 -60 120 -90 90 -120 60 -150 30 -180 0 100 200 300 -30 150 -60 120 -90 90 -120 60 -150 30 -180 0 100 200 300 400 -30 150 -60 120 -90 90 -120 60 -150 30 -180 0 100 200 300 400 500 -30 150 -60 120 -90 90 -120 60 -150 30 -180 0 20 40 60 80 100 -30 150 -60 120 -90 90 -120 60 -150 30 -180 0 50 100 150 -30 150 -60 120 -90 90 -120 60 -150 30 -180 0 200 400 600 800 -30 150 -60 120 -90 90 -120 60 -150 30 -180 0 20 40 60 80 -30 150 -60 120 -90 90 -120 60 -150 30 -180 0 5 10 15 20 25 -30 150 -60 120 -90 90 -120 60 -150 30 -180 0 100 200 300 400 -30 150 -60 120 -90 90 -120 60 -150 30 -180 0 100 200 300 400 Target view Panorama View Matching Task To further illustrate how each view is correlated with specific scene cat- egories, we run the scene category classifier from [1] on normal photos extracted from the panoramas. We study the canonical views of different place categories, i.e. the viewpoints people most often choose when they take a photo of a particular type of place. We obtain ground-truth viewpoint informa- tion for normal limited-field-of-view photos from the SUN database [1] by running a view-matching task on Amazon Mechanical Turk. Evaluations Because the alignment was unsuper- vised, we use an oracle to assign each aligned viewpoint to one of the view directions from the ground truth and evaluate accuracy based on the result- ing viewpoint-to-viewpoint mapping. Illustration: Algorithm Behavior for street. k-means and EM Latent Structural SVM Step 1: Place Categorization Step 2: Simultaneously Align Panoramas and Train Viewpoint Classifier Step 0: Sample Normal Photos from Panorama -30 150 -60 120 -90 90 -120 60 -150 30 -180 0 -30 150 -60 120 -90 90 -120 60 -150 30 -180 0 -30 150 -60 120 -90 90 -120 60 -150 30 -180 0 -30 150 -60 120 -90 90 -120 60 -150 30 -180 0 -30 150 -60 120 -90 90 -120 60 -150 30 -180 0 -30 150 -60 120 -90 90 -120 60 -150 30 -180 0 -30 150 -60 120 -90 90 -120 60 -150 30 -180 0 -30 150 -60 120 -90 90 -120 60 -150 30 -180 0 -30 150 -60 120 -90 90 -120 60 -150 30 -180 0 -30 150 -60 120 -90 90 -120 60 -150 30 -180 0 -30 150 -60 120 -90 90 -120 60 -150 30 -180 0 -30 150 -60 120 -90 90 -120 60 -150 30 -180 0 -30 150 -60 120 -90 90 -120 60 -150 30 -180 0 50 100 150 200 250 zoom in automatic alignment result manual alignment result random alignment result volleyball court/outdoor residenƟal neighborhood bridge puƫng green construĐƟon site sandbar islet beach ƐŶŽǁĮeld desert/sand beach desert/sand snŽǁĮĞůĚ residenƟĂl neighborhood islet movie theater/indoor theater/indoor seats auditorium theater/indoor procenium conference center laundromat videostore sushi bar restaurant movie theater/indoor museum/indoor stage/indoor theater/indoor procenium movie theater/indoor subway staƟon/plaƞorm 80 place categories and 67,583 panoramas (resolution 9104x4552)

Upload: others

Post on 07-Feb-2022

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Recognizing Scene Viewpoint using Panoramic Place

!"#$ !"#$ !"#$

1. predict with current SVM 2. add the most confident prediction 3. re-train SVM

!"#$%&#!&'

Each Iteration:

()&*+,"#'!"

*&-)*+.#'/01'

!"#$%&#!&'

!"#$%&#!&'

!"#$ !"#$ !"#$

!+#%.%+)&'2+#"*+3+4'5"*')*+.#.#6'

()&*+,"#'!#$"

7' !"#$%&#!&'

!"#$%&#!&'

!+#%.%+)&'2+#"*+3+4'5"*')*+.#.#6'7'

SVM SVM SVM SVM SVM SVM

A B

A C B D

Kernel SVMClassifier

theater VS hotel room

Recognizing Scene Viewpoint using Panoramic Place RepresentationJianxiong Xiao Krista A. Ehinger Aude Oliva Antonio Torralba

SUN360 Database + Source Codehttp://sun360.mit.edu

−30

150

−60

120

−90 90

−120

60

−150

30

−180

0theater -180 0 +180

10 20 30 40 500

0.05

0.1

0.15

0.2

0.25

# of iterations

accu

racy

Greedy IncrRand AllRand Incr

(a) Training strategies.

10 20 30 40 500

0.05

0.1

0.15

0.2

0.25

# of iterations

accu

racy

Type IType IIType IIIType IV

(b) Di�erent symmetries.

0 30 60 90 1201501800

0.5Type I

0 30 60 90 1201501800

0.5Type II

(c) Angle deviation.

Type IType IIType III

Type IV

beachchurch

hotel roomstreet

subway stationtheater

train interiorwharf

corridorliving room

coastlawn

plaza courtyardshop

0

0.2

0.4

0.6

0.8

1

(a) Test on SUN360 Panorama Dataset.

Type IType IIType III

Type IV

beachchurch

hotel roomstreet

subway stationtheater

train interiorwharf

corridorliving room

coastlawn

plaza courtyardshop

0

0.2

0.4

0.6

0.8

1

(b) Test on SUN Dataset.

Viewpoint prediction accuracy.

Test SetManual Alignment Automatic Alignment

Accuracy 1 Accuracy 2 Accuracy Angle DeviationPlace Both Place Both Place View Both I II III IV

SUN360 48.4 27.3 51.9 23.8 51.9 50.2 24.2 55◦ 51◦ 86◦ 90◦

SUN 22.2 14.5 24.1 13.0 24.1 55.7 13.9 29◦ 30◦ 38◦ 90◦

Chance 3.8 0.3 3.8 0.3 3.8 8.3 0.3 90◦ 90◦ 90◦ 90◦

Evaluation HOG-S HOG-L Tiny HOG2×2 COM Final ChanceAccuracy 41.8 45.0 25.9 56.4 62.2 69.3 8.3Deviation 65.6◦ 62.5◦ 73.4◦ 48.0◦ 41.4◦ 34.9◦ 90.0◦

Image Extrapolation

Canonical View of Scenes

Reference:[1] SUN database: Large-scale scene recognition from abbey to zoo. J. Xiao, J. Hays, K. A. Ehinger, A. Oliva, and A. Torralba. CVPR, 2010.[2] Canonical perspective and the perception of objects. S. Palmer, E. Rosch, and P. Chase. Attention and Performance IX, 1981.[3] What your design looks like to peripheral vision. A. Raj and R. Rosen-holtz. APGV, 2010.

Symmetry Discovery & View Sharing

Scene Viewpoint Recognition Place Categorization & Viewpoint Recognition Results

SUN360 Panorama Database De�nition of Scenes

Theater-related Categories in SUN:stage/indoortheater/indoor_seatstheater/indoor_roundtheater/indoor_proceniumorchestra_pitcatwalk...

What about unnamable viewpoints?

Place + Viewpoint

−30

150

−60

120

−90 90

−120

60

−150

30

−180

0theater

Traditional Labelsvs.

theater

hotel room

street

P (Li1, Hi, Ii|M) =∏

j=1...m

P (Lij = C(Li1, j), Iij(Hi)|M)Joint Probability

M∗ = argmaxM∏

i=1...N

P (Ii|M)Training

L = argmaxj=1...mP (j, I|M∗)Testing⟨L(t)i1 , H

(t)i

⟩= argmaxLi1,Hi

j=1...m

P (Lij = C(Li1, j), Iij(Hi)|M(t))Alignment Step

M(t+1) = argmaxM∏

i=1...N

j=1...m

P (C(L(t)i1 , j), Iij(H

(t)i )|M)Maximization Step

Type I

Type II

Type III

Type IV

Despite belonging to the same place category (e.g., theater), the photos taken by an observer inside a place look very di�erent from di�erent viewpoints.

We design a model that, given a photo, can classify its place cat-egory and predict the direction in which the observer is facing within that place. Once we have a compass-like prediction of the observer’s viewpoint, we superimpose the photo onto an aver-aged panorama of the place category to automatically predict the possible layout that extends beyond the available field of view.

We construct a panorama database for training, because panora-mas densely cover all possible views within a place.

We introduce the problem of scene viewpoint recognition, the goal of which is to classify the type of place shown in a photo, and also to recognize the observer’s viewpoint within that cat-egory of place.

Analysis: Maximum-likelihood Interpretation

restaurant lobby atrium museum field expo showroom mountain forest old building park cave ruin workshop

1

trut

h

beach church hotel room street subway station theater train interior wharf corridor living room coast lawn plaza courtyard shop

2

trut

h

3

resu

lt

4

hum

anbi

as −30

150

−60

120

−90 90

−120

60

−150

30

−180

0

50

100

150

200

250−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

50

100

150

200

250−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

50

100

150−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

100

200

300−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

100

200

300−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

100

200

300

400−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

100

200

300

400

500−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

20

40

60

80

100−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

50

100

150−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

200

400

600

800−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

20

40

60

80−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

5

10

15

20

25−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

100

200

300

400−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

100

200

300

400

Target view Panorama

View Matching Task

To further illustrate how each view is correlated with specific scene cat-egories, we run the scene category classi�er from [1] on normal photos extracted from the panoramas.

We study the canonical views of di�erent place categories, i.e. the viewpoints people most often choose when they take a photo of a particular type of place. We obtain ground-truth viewpoint informa-tion for normal limited-�eld-of-view photos from the SUN database [1] by running a view-matching task on Amazon Mechanical Turk.

EvaluationsBecause the alignment was unsuper-vised, we use an oracle to assign each aligned viewpoint to one of the view directions from the ground truth and evaluate accuracy based on the result-ing viewpoint-to-viewpoint mapping.

Illustration: Algorithm Behavior for street.

k-means and EMLatent Structural SVM

Step 1: Place Categorization

Step 2: Simultaneously Align Panoramas and Train Viewpoint Classi�er

Step 0: Sample Normal Photos from Panorama−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

−30

150

−60

120

−90 90

−120

60

−150

30

−180

0

50

100

150

200

250

zoom in

automatic alignment resultmanual alignment result random alignment result

volleyball court/outdoor residen al neighborhood bridge pu ng green constru on site

sandbar islet beach

eld desert/sand

beach desert/sand sn residen l neighborhood islet

movie theater/indoor theater/indoor seats auditorium theater/indoor procenium conference center

laundromat videostore sushi bar restaurant movie theater/indoor

museum/indoor stage/indoor theater/indoor procenium movie theater/indoor subway sta on/pla orm

80 place categories and 67,583 panoramas (resolution 9104x4552)