![Page 1: WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES](https://reader035.vdocument.in/reader035/viewer/2022062810/56815b72550346895dc96b20/html5/thumbnails/1.jpg)
WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES
Prasad Gabbur, Kobus Barnard
University of Arizona
![Page 2: WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES](https://reader035.vdocument.in/reader035/viewer/2022062810/56815b72550346895dc96b20/html5/thumbnails/2.jpg)
Overview
Word-prediction using translation model for object recognition
Feature evaluation
Segmentation evaluation
Modifications to Normalized Cuts segmentation algorithm
Evaluation of color constancy algorithms
Effects of illumination color change on object recognition
Strategies to deal with illumination color change
![Page 3: WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES](https://reader035.vdocument.in/reader035/viewer/2022062810/56815b72550346895dc96b20/html5/thumbnails/3.jpg)
Motivation
Low-level computer vision algorithms
• Segmentation, edge detection, feature extraction, etc.
• Building blocks of computer vision systems
Is there a generic task to evaluate these algorithms quantitatively?
Word-prediction using translation model for object recognition
• Sufficiently general
• Quantitative evaluation is possible
![Page 4: WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES](https://reader035.vdocument.in/reader035/viewer/2022062810/56815b72550346895dc96b20/html5/thumbnails/4.jpg)
Translation model for object recognition
Translate from visual to semantic description
![Page 5: WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES](https://reader035.vdocument.in/reader035/viewer/2022062810/56815b72550346895dc96b20/html5/thumbnails/5.jpg)
Approach
Model joint probability distribution of visual representations and associated words using a large, annotated image collection.
Corel database
![Page 6: WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES](https://reader035.vdocument.in/reader035/viewer/2022062810/56815b72550346895dc96b20/html5/thumbnails/6.jpg)
Image pre-processing
[Diagram: an annotated image (keywords: sun sky waves sea) is segmented*; visual features [f1 f2 f3 … fN] are extracted, and features and words together feed the joint distribution]
* Thanks to N-cuts team [Shi, Tal, Malik] for their segmentation algorithm
![Page 7: WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES](https://reader035.vdocument.in/reader035/viewer/2022062810/56815b72550346895dc96b20/html5/thumbnails/7.jpg)
$$P(w \mid b) = \frac{1}{P(b)} \sum_{l} P(w \mid l)\, P(b \mid l)\, P(l)$$
w: word
b: blob
l: node representing joint visual/textual concepts *
• P(w | l): frequency table
• P(b | l): Gaussian over features
Learn P(w | l), P(b | l), and P(l) from data using EM
* Barnard et al JMLR 2003
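The posterior over words for a single blob can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the three distributions have already been learned; the array shapes, the function name `word_posterior`, and the toy numbers are all made up for the example and are not taken from the slides.

```python
import numpy as np

def word_posterior(p_w_given_l, p_b_given_l, p_l):
    """Return P(w | b) for one blob.

    p_w_given_l: (L, W) array, row l is the word frequency table of node l.
    p_b_given_l: (L,) array, likelihood of the blob under each node's Gaussian.
    p_l:         (L,) array, prior over nodes.
    """
    joint = p_b_given_l * p_l                      # P(b | l) P(l) for every node l
    p_b = joint.sum()                              # P(b) = sum_l P(b | l) P(l)
    return (joint[:, None] * p_w_given_l).sum(axis=0) / p_b

# Toy numbers (hypothetical): 2 nodes, 3 words.
p_w_given_l = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.3, 0.6]])
p_b_given_l = np.array([0.5, 0.1])
p_l = np.array([0.4, 0.6])
posterior = word_posterior(p_w_given_l, p_b_given_l, p_l)
```

Because each row of `p_w_given_l` sums to one, the returned vector is itself a distribution over words.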
![Page 8: WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES](https://reader035.vdocument.in/reader035/viewer/2022062810/56815b72550346895dc96b20/html5/thumbnails/8.jpg)
Annotating images
Segment image
Compute P(w|b) for each region
Sum over regions
[Diagram: regions b1, b2, . . . of the image; P(w|b1) + P(w|b2) + . . . gives P(w|image)]
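The three annotation steps can be sketched as follows. This is only an illustration: it assumes the per-region posteriors are already available as rows of an array, renormalizes the sum so that it is a distribution (an assumption, the slide only says "sum over regions"), and uses a hypothetical vocabulary and made-up probabilities.

```python
import numpy as np

def annotate(region_posteriors, vocabulary, n_words=4):
    """Sum P(w | b) over regions, renormalize, and return the top words."""
    scores = np.sum(region_posteriors, axis=0)       # sum over regions
    p_w_given_image = scores / scores.sum()          # renormalize (assumption)
    top = np.argsort(-p_w_given_image, kind="stable")[:n_words]
    return [vocabulary[i] for i in top], p_w_given_image

# Hypothetical vocabulary and posteriors for two regions b1 and b2.
vocabulary = ["sun", "sky", "waves", "sea", "cat"]
region_posteriors = np.array([
    [0.50, 0.30, 0.05, 0.10, 0.05],   # P(w | b1)
    [0.05, 0.15, 0.35, 0.40, 0.05],   # P(w | b2)
])
words, p_w = annotate(region_posteriors, vocabulary, n_words=4)
```

The number of predicted words (`n_words`) is a free choice in this sketch.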
![Page 9: WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES](https://reader035.vdocument.in/reader035/viewer/2022062810/56815b72550346895dc96b20/html5/thumbnails/9.jpg)
CAT TIGER GRASS FOREST
Predicted Words
Actual Keywords
CAT HORSE GRASS WATER
Measuring performance
• Record percent correct
• Use annotation performance as a proxy for recognition
• Large region-labeled databases are not available
• Large annotated databases are available
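A simple way to "record percent correct" is the fraction of predicted words that appear among the actual keywords. The sketch below uses the two word lists from this slide; which list is the prediction is an assumption here (the score is the same either way, since both lists have four words), and this plain fraction is not necessarily the exact measure plotted later in the deck.

```python
def fraction_correct(predicted, actual):
    """Fraction of predicted words that occur among the actual keywords."""
    predicted = list(predicted)
    actual = set(actual)
    if not predicted:
        return 0.0
    return sum(word in actual for word in predicted) / len(predicted)

# Word lists from the slide; the predicted/actual assignment is assumed.
predicted = ["CAT", "TIGER", "GRASS", "FOREST"]
actual = ["CAT", "HORSE", "GRASS", "WATER"]
score = fraction_correct(predicted, actual)
```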
![Page 10: WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES](https://reader035.vdocument.in/reader035/viewer/2022062810/56815b72550346895dc96b20/html5/thumbnails/10.jpg)
Experimental protocol
Corel database
Sampling scheme: each CD contains 100 images on one specific topic like “aircraft”
• 160 CDs in total
• 80 CDs: 75% training, 25% test
• 80 CDs: novel
Average results over 10 different samplings
![Page 11: WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES](https://reader035.vdocument.in/reader035/viewer/2022062810/56815b72550346895dc96b20/html5/thumbnails/11.jpg)
Semantic evaluation of vision processes
Feature sets
• Combinations of visual features
Segmentation methods
• Mean-Shift [Comaniciu, Meer]
• Normalized Cuts [Shi, Tal, Malik]
Color constancy algorithms
• Train with illumination change
• Color constancy processing: Gray-world, Scale-by-max
![Page 12: WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES](https://reader035.vdocument.in/reader035/viewer/2022062810/56815b72550346895dc96b20/html5/thumbnails/12.jpg)
Feature evaluation
Features
Size
Location
Shape
• Second moment
• Compactness
• Convexity
• Outer boundary descriptor
Color (RGB, L*a*b, rgS)
• Average color
• Standard deviation
Texture: responses to a bank of filters
• Even and odd symmetric
• Rotationally symmetric (DOG)
Context (average surrounding color)
![Page 13: WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES](https://reader035.vdocument.in/reader035/viewer/2022062810/56815b72550346895dc96b20/html5/thumbnails/13.jpg)
Feature evaluation
Base = Size + Location + Second moment + Compactness
[Bar chart: annotation performance (bigger is better), y-axis from 0 to 0.12, for Base, +Color, +Texture, and +Shape on training, held-out, and novel data]
![Page 14: WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES](https://reader035.vdocument.in/reader035/viewer/2022062810/56815b72550346895dc96b20/html5/thumbnails/14.jpg)
Segmentation evaluation
Mean Shift
(Comaniciu, Meer)
Normalized Cuts (N-Cuts)
(Shi, Tal, Malik)
![Page 15: WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES](https://reader035.vdocument.in/reader035/viewer/2022062810/56815b72550346895dc96b20/html5/thumbnails/15.jpg)
Segmentation evaluation
• Performance depends on number of regions used for annotation
• Mean Shift is better than N-Cuts for # regions < 6
[Plot: annotation performance (bigger is better) versus # regions]
![Page 16: WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES](https://reader035.vdocument.in/reader035/viewer/2022062810/56815b72550346895dc96b20/html5/thumbnails/16.jpg)
Normalized Cuts
• Graph partitioning technique
• Bi-partitions an edge-weighted graph in an optimal sense
• Normalized cut (Ncut) is the optimizing criterion
[Diagram: nodes i and j joined by an edge of weight wij; edge weight => similarity between i and j; the nodes are split into parts A and B so as to minimize Ncut(A,B)]
• Image segmentation
• Each pixel is a node
• Edge weight is similarity between pixels
• Similarity based on color, texture and contour cues
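The slide does not spell out the criterion. In Shi and Malik's formulation it is Ncut(A,B) = cut(A,B)/assoc(A,V) + cut(A,B)/assoc(B,V), where cut(A,B) sums the weights of edges crossing the partition and assoc(A,V) sums the weights from A to all nodes. The sketch below only evaluates this criterion for a given bipartition on a hypothetical four-node graph; it does not perform the eigenvector-based minimization.

```python
import numpy as np

def ncut(W, mask_a):
    """Normalized cut value of the bipartition (A, B) of a weighted graph.

    W:      (N, N) symmetric matrix of edge weights w_ij.
    mask_a: length-N boolean sequence, True for nodes in A.
    Both parts are assumed to have non-zero total connection weight.
    """
    mask_a = np.asarray(mask_a, dtype=bool)
    mask_b = ~mask_a
    cut = W[np.ix_(mask_a, mask_b)].sum()   # weight of edges crossing A-B
    assoc_a = W[mask_a, :].sum()            # weight from A to all nodes
    assoc_b = W[mask_b, :].sum()            # weight from B to all nodes
    return cut / assoc_a + cut / assoc_b

# Hypothetical graph: nodes {0, 1} and {2, 3} are tightly linked pairs.
W = np.array([[0.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 1.0],
              [0.0, 0.1, 1.0, 0.0]])
good = ncut(W, [True, True, False, False])   # cuts only the weak edges
bad = ncut(W, [True, False, True, False])    # cuts the strong edges
```

The split that keeps similar nodes together has the smaller Ncut value, which is what the minimization looks for.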
![Page 17: WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES](https://reader035.vdocument.in/reader035/viewer/2022062810/56815b72550346895dc96b20/html5/thumbnails/17.jpg)
Normalized Cuts
Original algorithm
[Diagram: a pixel-pixel stage produces the initial segmentation (Preseg); a region-region stage produces the final segmentation (Seg)]
Produces splits in homogeneous regions, e.g., “sky”
• Local connectivity between pixels
![Page 18: WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES](https://reader035.vdocument.in/reader035/viewer/2022062810/56815b72550346895dc96b20/html5/thumbnails/18.jpg)
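Modifications to Normalized Cuts
Meta-segmentation
[Diagram: Preseg, Iteration 1, ..., Iteration n, with region-region grouping at each iteration]
Region-level edge weights between regions $R_k$ and $R_l$:
$$\hat{W}_{kl} = \frac{1}{T} \sum_{i \in R_k} \sum_{j \in R_l} W_{ij}$$
$$W_{kl} = \sum_{i \in R_k} \sum_{j \in R_l} W_{ij}$$
[Figure: region nodes k and l, shown for the Original and Modified algorithms]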
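One reading of the region-level weights on this slide is that the weight between regions R_k and R_l sums the pixel-level weights W_ij over i in R_k and j in R_l. Under that assumption, the aggregation is a product with a region indicator matrix, as sketched below. The function name, the toy graph, and the labels are hypothetical; the 1/T scaling of the hatted variant is left out because T is not defined in the transcript.

```python
import numpy as np

def region_weights(W, labels, n_regions):
    """Sum pixel-level weights W_ij into region-level weights W_kl.

    W:      (N, N) pixel-level weight matrix.
    labels: length-N integer array, labels[i] is the region of pixel i.
    """
    n_pixels = W.shape[0]
    M = np.zeros((n_pixels, n_regions))
    M[np.arange(n_pixels), labels] = 1.0   # M[i, k] = 1 if pixel i is in R_k
    return M.T @ W @ M                     # entry (k, l) sums W_ij over R_k x R_l

# Hypothetical 4-pixel graph grouped into two regions.
W_pixels = np.array([[0.0, 1.0, 0.1, 0.0],
                     [1.0, 0.0, 0.0, 0.1],
                     [0.1, 0.0, 0.0, 1.0],
                     [0.0, 0.1, 1.0, 0.0]])
labels = np.array([0, 0, 1, 1])
W_regions = region_weights(W_pixels, labels, n_regions=2)
```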
![Page 19: WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES](https://reader035.vdocument.in/reader035/viewer/2022062810/56815b72550346895dc96b20/html5/thumbnails/19.jpg)
Modifications to Normalized Cuts
[Figure: example segmentations, Original vs. Modified]
![Page 20: WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES](https://reader035.vdocument.in/reader035/viewer/2022062810/56815b72550346895dc96b20/html5/thumbnails/20.jpg)
Original vs. Modified
• For # regions < 6, modified out-performs original
• For # regions > 6, original is better
[Plot: annotation performance (bigger is better) versus # regions]
![Page 21: WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES](https://reader035.vdocument.in/reader035/viewer/2022062810/56815b72550346895dc96b20/html5/thumbnails/21.jpg)
Incorporating high-level information into segmentation algorithms
Low-level segmenters split up objects (e.g., the black and white halves of a penguin)
Using word-prediction gives a way to incorporate high-level semantic information into segmentation algorithms
Propose a merge between regions that have similar posterior distributions over words
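The merge proposal can be sketched as a comparison of word posteriors for adjacent regions. The slide does not say how similarity is measured; the Jensen-Shannon divergence, the threshold value, and the toy posteriors below are assumptions chosen only to make the idea concrete.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two word distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)

    def kl(a, b):
        # eps guards against log(0) for words with zero probability
        return np.sum(a * np.log((a + eps) / (b + eps)))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def propose_merges(posteriors, adjacent_pairs, threshold=0.05):
    """Return adjacent region pairs whose word posteriors are similar."""
    return [(i, j) for i, j in adjacent_pairs
            if js_divergence(posteriors[i], posteriors[j]) < threshold]

# Hypothetical posteriors over the words (penguin, snow, sky).
posteriors = [
    [0.80, 0.10, 0.10],   # region 0: black half of the penguin
    [0.75, 0.15, 0.10],   # region 1: white half of the penguin
    [0.05, 0.15, 0.80],   # region 2: sky
]
merges = propose_merges(posteriors, adjacent_pairs=[(0, 1), (1, 2)])
```

In this toy case only the two penguin halves are proposed for merging, even though their colors differ.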
![Page 22: WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES](https://reader035.vdocument.in/reader035/viewer/2022062810/56815b72550346895dc96b20/html5/thumbnails/22.jpg)
Illumination change
Illumination color change makes recognition difficult
[Figure: the same scene under Illuminant 1 and Illuminant 2 *]
Strategies to deal with illumination change:
• Train for illumination change
• Color constancy pre-processing and normalization
* http://www.cs.sfu.ca/~colour/data
![Page 23: WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES](https://reader035.vdocument.in/reader035/viewer/2022062810/56815b72550346895dc96b20/html5/thumbnails/23.jpg)
Training
Train for illumination change
Variation of color under expected illumination changes
[Matas et al 1994, Matas 1996, Matas et al 2000]
![Page 24: WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES](https://reader035.vdocument.in/reader035/viewer/2022062810/56815b72550346895dc96b20/html5/thumbnails/24.jpg)
Color constancy pre-processing
[Funt et al 1998]
[Diagram: the training database is under the canonical (reference) illuminant; a test input under an unknown illuminant passes through the algorithm, is mapped to the canonical (reference) illuminant, and then goes to the recognition system]
(Map image as if it were taken under reference illuminant.)
![Page 25: WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES](https://reader035.vdocument.in/reader035/viewer/2022062810/56815b72550346895dc96b20/html5/thumbnails/25.jpg)
Color normalization
[Funt and Finlayson 1995, Finlayson et al 1998]
[Diagram: the training database (canonical (reference) illuminant) passes through the algorithm to give a normalized training database; a test input (unknown illuminant) passes through the algorithm and then goes to the recognition system]
(Map image as if it were taken under reference illuminant.)
![Page 26: WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES](https://reader035.vdocument.in/reader035/viewer/2022062810/56815b72550346895dc96b20/html5/thumbnails/26.jpg)
Simulating illumination change
11 illuminants, numbered 0 to 10 (0 is canonical)
[Figure: an image rendered under illuminants 0 through 10]
![Page 27: WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES](https://reader035.vdocument.in/reader035/viewer/2022062810/56815b72550346895dc96b20/html5/thumbnails/27.jpg)
Train with illumination variation
Experiment A
• Training: No illumination change
• Testing: No illumination change
Experiment B
• Training: No illumination change
• Testing: Illumination change
Experiment C
• Training: Illumination change
• Testing: Illumination change
[Bar chart: annotation performance (bigger is better), y-axis from 0 to 0.14, for experiments A, B, and C on training, held-out, and novel data]
![Page 28: WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES](https://reader035.vdocument.in/reader035/viewer/2022062810/56815b72550346895dc96b20/html5/thumbnails/28.jpg)
Color constancy pre-processing
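Gray-world
Mean color = constant
[Diagram: training images under the canonical illuminant; a test image under an unknown illuminant passes through the algorithm and is mapped to the canonical illuminant]
$$r \rightarrow \frac{c_r}{t_r}\, r, \qquad g \rightarrow \frac{c_g}{t_g}\, g, \qquad b \rightarrow \frac{c_b}{t_b}\, b$$
where $(c_r, c_g, c_b)$ is the mean color under the canonical illuminant and $(t_r, t_g, t_b)$ is the mean color of the test image.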
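Gray-world correction can be sketched as a per-channel scaling that makes the mean color of the test image equal to a reference mean. In this sketch `canonical_mean` is a hypothetical parameter, assumed to have been estimated from images under the canonical illuminant, and the pixel values are made up.

```python
import numpy as np

def gray_world(image, canonical_mean):
    """Scale each channel so the image mean matches canonical_mean.

    image:          (H, W, 3) RGB array; every channel mean must be non-zero.
    canonical_mean: length-3 mean color under the canonical illuminant.
    """
    image = np.asarray(image, dtype=float)
    test_mean = image.reshape(-1, 3).mean(axis=0)          # mean r, g, b of the test image
    scale = np.asarray(canonical_mean, dtype=float) / test_mean
    return image * scale                                   # broadcasts over pixels

# Hypothetical 2x2 test image and canonical mean.
test_image = np.array([[[0.2, 0.4, 0.6], [0.4, 0.4, 0.2]],
                       [[0.6, 0.8, 0.4], [0.4, 0.8, 0.4]]])
canonical_mean = np.array([0.5, 0.5, 0.5])
gw_corrected = gray_world(test_image, canonical_mean)
```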
![Page 29: WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES](https://reader035.vdocument.in/reader035/viewer/2022062810/56815b72550346895dc96b20/html5/thumbnails/29.jpg)
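Color constancy pre-processing
Scale-by-max
Max color = constant
[Diagram: training images under the canonical illuminant; a test image under an unknown illuminant passes through the algorithm and is mapped to the canonical illuminant]
$$r \rightarrow \frac{m_r^{c}}{m_r^{t}}\, r, \qquad g \rightarrow \frac{m_g^{c}}{m_g^{t}}\, g, \qquad b \rightarrow \frac{m_b^{c}}{m_b^{t}}\, b$$
where $m^{c}$ is the per-channel maximum under the canonical illuminant and $m^{t}$ is the per-channel maximum of the test image.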
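Scale-by-max can be sketched the same way as a per-channel scaling, this time matching the channel maxima of the test image to reference maxima. `canonical_max` is a hypothetical parameter, assumed to come from images under the canonical illuminant, and the pixel values are made up.

```python
import numpy as np

def scale_by_max(image, canonical_max):
    """Scale each channel so its maximum matches canonical_max.

    image:         (H, W, 3) RGB array; every channel maximum must be non-zero.
    canonical_max: length-3 per-channel maximum under the canonical illuminant.
    """
    image = np.asarray(image, dtype=float)
    test_max = image.reshape(-1, 3).max(axis=0)            # max r, g, b of the test image
    scale = np.asarray(canonical_max, dtype=float) / test_max
    return image * scale                                   # broadcasts over pixels

# Hypothetical 2x2 test image and canonical maxima.
sbm_image = np.array([[[0.2, 0.4, 0.6], [0.4, 0.4, 0.2]],
                      [[0.6, 0.8, 0.4], [0.4, 0.8, 0.4]]])
canonical_max = np.array([0.9, 0.8, 0.9])
sbm_corrected = scale_by_max(sbm_image, canonical_max)
```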
![Page 30: WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES](https://reader035.vdocument.in/reader035/viewer/2022062810/56815b72550346895dc96b20/html5/thumbnails/30.jpg)
Color constancy pre-processing
Experiment A
• Training: No illumination change
• Testing: No illumination change
Experiment B
• Training: No illumination change
• Testing: Illumination change
Others
• Training: No illumination change
• Testing: Illumination change + Color constancy algorithm
[Bar chart: annotation performance (bigger is better), y-axis from 0 to 0.14, for A, B, Gray-world, and Scale-by-max on training, held-out, and novel data]
![Page 31: WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES](https://reader035.vdocument.in/reader035/viewer/2022062810/56815b72550346895dc96b20/html5/thumbnails/31.jpg)
Color normalization
Gray-world: Mean color = constant
Scale-by-max: Max color = constant
[Diagram, one panel per method: the algorithm is applied to both the training images (canonical illuminant) and the test images (unknown illuminant)]
![Page 32: WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES](https://reader035.vdocument.in/reader035/viewer/2022062810/56815b72550346895dc96b20/html5/thumbnails/32.jpg)
Color normalization
Experiment A
• Training: No illumination change
• Testing: No illumination change
Experiment B
• Training: No illumination change
• Testing: Illumination change
Others
• Training: No illumination change + Color constancy algorithm
• Testing: Illumination change + Color constancy algorithm
[Bar chart: annotation performance (bigger is better), y-axis from 0 to 0.14, for A, B, Gray-world, and Scale-by-max on training, held-out, and novel data]
![Page 33: WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES](https://reader035.vdocument.in/reader035/viewer/2022062810/56815b72550346895dc96b20/html5/thumbnails/33.jpg)
Conclusions
Translation (visual to semantic) model for object recognition
• Identify and evaluate low-level vision processes for recognition
Feature evaluation
• Color and texture are the most important, in that order
• Shape needs better segmentation methods
Segmentation evaluation
• Performance depends on # regions for annotation
• Mean Shift and modified NCuts do better than original NCuts for # regions < 6
Color constancy evaluation
• Training with illumination change helps
• Color constancy processing helps (scale-by-max better than gray-world)
![Page 34: WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES](https://reader035.vdocument.in/reader035/viewer/2022062810/56815b72550346895dc96b20/html5/thumbnails/34.jpg)
Thank you!