building desert field sky mountain sea sky vert horz...
TRANSCRIPT
SuperParsing: Scalable Nonparametric Image Parsing with SuperpixelsJoseph Tighe and Svetlana Lazebnik Dept. of Computer Science, University of North Carolina at Chapel Hill http://www.cs.unc.edu/SuperParsing
Overview
• “Open universe” system: no training required, easy to accommodate an evolv-ing dataset as new classes or new training exemplars are added
• State of the art performance on the SIFT Flow dataset (Liu et al., CVPR 2009)
• New large-scale baseline for image parsing: per-pixel recognition results on asubset of LabelMe consisting of 15k images, 170 labels
Query Image Retrieval set of similar images
Per-class likelihood
Building
Road
Sky
Car
SkyVertical
Horizontal
Superpixels
Building
Car
Road
Sky Semantic Classes Geometric Classes
Image Parsing MethodGiven a query (test) image:
• Find a retrieval set of 200 similar images by taking the minimum per-feature rankof four global image features.
• For each test superpixel si described by multiple local features {fki }, compute alikelihood ratio score for each class c found in the retrieval set:
L(si, c) =P (si|c)P (si|¬c)
=∏k
P (fki |c)P (fki |¬c)
.
The feature likelihood P (fki |c) is given by a nonparametric density estimate:
P (fki | c) =#(retrieval set features of class c within a fixed radius offki )
#(total features of class c in the training set).
• Use MRF inference to solve for the label field c = {ci} over the entire test image:
J(c) =∑
si∈SP
−wi logL(si, ci) + λ∑
(si,sj)∈A
Eedge(ci, cj) ,
where wi is a weight based on the superpixel size, and edge penalty Eedge isbased on the co-occurrence of adjacent labels in the training set:
Eedge(ci, cj) = − log[(P (ci|cj) + P (cj |ci))/2]× δ[ci 6= cj ] .
Road
Sky Sky
Sea Sea
Sand Sand
Tree
Max. Likelihood Ratio Edge Penality MRF LabelingQuery Image
Joint Semantic and Geometric LabelingSimultaneously solve for a field of semantic labels (c) and geometric labels (g) overthe image by optimizing
H(c,g) = J(c) + J(g) + µ∑
si∈SP
ϕ(ci, gi) ,
where ϕ(ci, gi) is a coherence term between the semantic and geometric label of thesame superpixel.
72.0 68.6 77.6
97.2 97.6 94.8
Road
WindowDoor
SidewalkBuildingAwning
SignPerson
HorzVertSky
InitialLabeling
Joint Semanticand Geometric
Query GroundTruth Labels
SemanticMRF
SemanticClasses
GeometricClasses
Results on Large-Scale Datasets
• SIFT Flow dataset (Liu et al., CVPR 2009): 2,488 training images, 200 test images,33 labels
• Barcelona dataset (a new large-scale benchmark): 14,871 training images, 279test images, 170 labels
020406080
# of
Sup
erpi
xels
(x10
00)
264,945 SIFT Flow Dataset Barcelona DatasetLabel Frequency in Dataset
Per-pixel classification rates (with average per-class rates in parentheses):
SIFT Flow BarcelonaSemantic Geometric Semantic Geometric
Liu et al. (CVPR 2009) 74.75 N/A N/A N/ALocal labeling 73.2 (29.1) 89.8 62.5 (8.0) 89.9MRF 76.3 (28.8) 89.9 66.6 (7.6) 90.2Joint semantic/geometric 76.9 (29.4) 90.8 66.9 (7.6) 90.7
0%25%50%75%
100%SIFT Flow Dataset Barcelona DatasetPer-class Performance
Timing
SIFT Flow BarcelonaTraining set size 2,488 14,871Image size 256× 256 640× 480Ave. # superpixels 63.9 307.9Feature extraction ∼ 4 sec ∼ 5 minRetrieval set search 0.04 ± 0.0 0.21 ± 0.0Superpixel search 4.4 ± 2.3 34.2 ± 13.4MRF solver 0.005 ± 0.003 0.03 ± 0.02Total (excluding features) 4.4 ± 2.3 34.4 ± 13.4 0 100 200 300 400 500
0
20
40
60
80
Number of Superpixels
Sec
onds
Full System Times
SIFT Flow Dataset
Barcelona Dataset
Sample Output on SIFT Flow DatasetInitial
LabelingFinal
LabelingGeometricLabeling
Query GroundTruth Labels
EdgePenalties
HorzUnlabeledTreeSkyRoad
98.7 98.6 99.2
CarBridge VertSky
TreeSkySea HorzVertSkyRoadMountainGrass
95.2 97.1 97.1
FieldDesert
TreeSkySea HorzVertSkySunMountain
86.6 88.4 88.5
Sea HorzVertSkyMountain
68.8 94.2 94.2
FieldDesert SkyBuilding
Tree HorzVertSkyRoad
85.4 86.2 94.2
SkySidewalkBuilding Car
Horz
Window VertSkyRoad SkySidewalk
73.2 77.2 93.3
BuildingBalcony Door
TreeSky HorzVertSkyMountain
57.9 73.2 81.3
Building
Small DatasetsResults on two small-scale datasets using trained boosted decision tree classifiers(instead of retrieval set and superpixel search):
Stanford Dataset Geometric Context Dataset715 images, 8 classes 300 images, 7 classes
Semantic Geometric Sub-classes Main classesGould et al. (ICCV 2009) 76.4 91.0 N/A 86.9Hoiem et al. (IJCV 2007) N/A N/A 61.5 88.1Local labeling 76.9 90.5 57.6 87.8MRF 77.5 90.6 61.0 88.2Joint semantic/geometric 77.5 90.6 61.0 88.1
FundingThis research was supported in part by NSF CAREER award IIS-0845629, Microsoft ResearchFaculty Fellowship, and Xerox.
1