
Object Recognition by Discriminative Combinations of Line Segments and Ellipses

Alex Chia^˚

Susanto Rahardja^

Deepu Rajan˚

Maylor Leung˚

^Institute for Infocomm Research (I²R), Singapore

˚Nanyang Technological University, Singapore


Goals

• Image classification – Separate images containing an object category from other images

Goals – cont.

• Category-level object detection – Localize all instances of an object category in an image

Existing Approaches

• Region based approach
– Exploits image pixel brightness or color values
– Not suitable for complex classes characterized by thin skeletal structures (e.g. bicycle)
– Other classes (e.g. horse) are more defined by their shape

Existing Approaches – cont.

• Contour based approach
– Exploits spatial configuration or statistics of edge pixels
– Edge based rich local descriptors
– Contour fragments
– Shape primitives

• Advantages of shape primitives
I. Support abstract reasoning (unlike edge based local descriptors)
II. Efficient storage demands (unlike contour fragments)
III. Efficient comparison across single and multiple scales (unlike contour fragments)

Our contour based approach - outline

Dataset: training images (learning phase) and testing images (evaluation phase)

Steps, in order:
• Extract line segments and ellipses
• Construct shape tokens
• Learn category-specific codebook of shape tokens
• Boost discriminative codeword combinations
• Detect object instances and classify images
• Evaluate performance

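Constructing shape tokens

• Pair a reference primitive to its connected neighbor
– Tokens: Ellipse-line, Line-line, Ellipse-ellipse
• Geometrical and spatial properties
– Length, orientation, distance between midpoints, relative primitive positions

$A = [\,l_r,\ w_r,\ \theta_r,\ l_n,\ w_n,\ \theta_n,\ h,\ v^x,\ v^y\,]^T$

Subscripts r and n denote the reference primitive and its neighbor.

[Figure: a reference primitive ($l_r$, $w_r$, $\theta_r$) and its neighbor ($l_n$, $\theta_n$), with $h$ and $v = (v^x, v^y)$ marked]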
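A minimal Python sketch of how one such token descriptor could be assembled from a reference primitive and its neighbor. The `Primitive` class, its field names, a width of 0 for line segments, and the definition of v as "neighbor midpoint minus reference midpoint" are all assumptions made for illustration; the slides only name the attributes.

```python
import math
from dataclasses import dataclass

@dataclass
class Primitive:
    # Hypothetical container: midpoint, length, width, orientation (radians)
    cx: float
    cy: float
    length: float
    width: float   # assumed to be 0 for a line segment
    theta: float

def make_token(ref: Primitive, nbr: Primitive) -> list:
    """Return [l_r, w_r, theta_r, l_n, w_n, theta_n, h, v_x, v_y]."""
    vx, vy = nbr.cx - ref.cx, nbr.cy - ref.cy   # assumed definition of v
    h = math.hypot(vx, vy)                      # distance between midpoints
    return [ref.length, ref.width, ref.theta,
            nbr.length, nbr.width, nbr.theta, h, vx, vy]

# Ellipse-line token: an ellipse as reference, a line segment as neighbor
ellipse = Primitive(0.0, 0.0, 10.0, 4.0, 0.0)
line = Primitive(3.0, 4.0, 6.0, 0.0, math.pi / 2)
token = make_token(ellipse, line)
```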

Comparing shape tokens

• A token is compared only to similar typed tokens
• Differences in their attributes

$D(A_i, A_j) = [\,D_l(l_i^p, l_j^p),\ D_w(w_i^p, w_j^p),\ D_\theta(\theta_i^p, \theta_j^p),\ D_h(h_i, h_j),\ D_v(v_i, v_j)\,], \quad p \in \{r, n\}$

where $A = [\,l_r,\ w_r,\ \theta_r,\ l_n,\ w_n,\ \theta_n,\ h,\ v^x,\ v^y\,]^T$

– Difference in lengths: $D_l(l_i, l_j) = \min\left(\left|\ln(l_i / l_j)\right|,\ 1\right)$
– Difference in widths: $D_w$
– Difference in orientation: $D_\theta(\theta_i, \theta_j) = \min\left(|\theta_i - \theta_j|,\ \pi - |\theta_i - \theta_j|\right) / (\pi / 2)$
– Difference in spatial separation of primitives: $D_h$
– Difference in relative primitive positions: $D_v(v_i, v_j) = \sqrt{(v_i^x - v_j^x)^2 + (v_i^y - v_j^y)^2}$
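A minimal Python sketch of three of the attribute differences. It assumes the length difference is a log ratio capped at 1, the orientation difference is the smaller angle between two undirected orientations divided by π/2, and the position difference is a Euclidean distance. The function names are hypothetical, and wrapping the angle with a modulo is an added safeguard rather than something stated on the slide.

```python
import math

def d_length(li, lj):
    # min(|ln(li / lj)|, 1): ratio-based difference, capped at 1
    return min(abs(math.log(li / lj)), 1.0)

def d_orientation(ti, tj):
    # Smaller angle between two undirected orientations, normalized by pi/2.
    # The modulo keeps the raw difference inside [0, pi) (added safeguard).
    diff = abs(ti - tj) % math.pi
    return min(diff, math.pi - diff) / (math.pi / 2)

def d_position(vi, vj):
    # Euclidean distance between two relative position vectors
    return math.hypot(vi[0] - vj[0], vi[1] - vj[1])
```

Under these assumptions the length and orientation differences both fall in [0, 1], so tokens with very different scales or perpendicular primitives saturate at 1.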

Learning category-specific codebook

• Extracting tokens from within the bounding boxes of training objects
• Clustering tokens by their scale normalized appearance descriptors
– Adapted bisecting 2-medoid clustering
• Clustering tokens by their relative position
– Mean-shift clustering

[Figure: normalized appearance descriptor; normalized translational vector]

Learning category-specific codebook – cont.

• Medoid in each mean-shift sub-cluster as candidate codeword
• Appearance distance allowance
– Indicates the range of appearance the candidate represents
– = Mean appearance distance + Std. dev.
• Scale normalized circular window
– Indicates where the candidate is found relative to the object centroid
– Center $c$ and radius $r$ of the window

[Figure: mean-shift sub-cluster in feature space, with window center $c$ and radius $r$]
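A minimal Python sketch of how the parameters of one candidate codeword could be derived from a mean-shift sub-cluster. The Euclidean stand-in for the appearance distance, the population standard deviation, and the way the window center and radius are computed (mean position and largest offset from it) are assumptions; the slides state only that the candidate is the medoid, that the allowance is mean appearance distance plus standard deviation, and that the window has a center and a radius.

```python
import math
import statistics

def euclid(a, b):
    # Stand-in appearance distance (assumption)
    return math.dist(a, b)

def candidate_codeword(descriptors, positions):
    # Medoid: the member with the smallest total distance to all members
    medoid = min(descriptors,
                 key=lambda d: sum(euclid(d, o) for o in descriptors))
    dists = [euclid(medoid, d) for d in descriptors]
    # Allowance = mean appearance distance + standard deviation
    allowance = statistics.mean(dists) + statistics.pstdev(dists)
    # Window: mean relative position and largest offset from it (assumption)
    cx = statistics.mean(p[0] for p in positions)
    cy = statistics.mean(p[1] for p in positions)
    radius = max(math.dist((cx, cy), p) for p in positions)
    return medoid, allowance, (cx, cy), radius

medoid, allowance, center, radius = candidate_codeword(
    [(0.0,), (1.0,), (2.0,)], [(0.0, 0.0), (2.0, 0.0)])
```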


• Score each candidate by appearance + geometric qualities
– Appearance qualities, including the number of unique training objects
– Geometric quality: $1/r$

[Figure: candidates from all sub-clusters; candidates from the 350 most populated sub-clusters]


• Radial ranking method to select candidates for the codebook

[Figure: candidates from all sub-clusters; candidates from the 350 most populated sub-clusters; candidates from the 350 selected sub-clusters. Categories shown: Face, Bike-front, Bottle, Horse-side, Cow-side]

Learning discriminative codeword combinations

• Each codeword parameterized by
– Appearance distance allowance
– Scale normalized circular window with radius $r$ and center $c$
• Matching codeword combination
– Every codeword in combination finds image tokens within its appearance distance allowance (appearance constraint)
– Centroid predictions by all codewords in combination concur (geometric constraint)

Learning discriminative codeword combinations – cont.

Basic idea for finding matched codeword combinations

• Given codeword i and codeword j, for a scale ‘s’ and location ‘x’ in an image
– Estimated window of codeword i: center $x + s c_i$, radius $s r_i$
– Estimated window of codeword j: center $x + s c_j$, radius $s r_j$
• If all codewords find matching tokens within their estimated windows at scale ‘s’ and location ‘x’, the matched tokens predict centroid locations which concur

[Figure: estimated windows of codewords i and j around location x = (0,0)]


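• Finding token $t^*$ within estimated window that has the least appearance distance to codeword

$t^* = \arg\min_{t'} d_{app}(i, t')$, with $d_{geo}(i, t')$ restricting $t'$ to tokens inside the estimated window (center $x + s c_i$, radius $s r_i$)

• Response of codeword i at scale ‘s’ and location ‘x’ of image
– Computed from $d_{app}(i, t^*)$ and $d_{geo}(i, t^*)$
– Lies in [0, 2] if a matching token is found within the window; takes a separate value otherwise

[Figure: image tokens (x) around location x = (0,0), with the estimated window of codeword i]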
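A minimal Python sketch of the token search for one codeword. It assumes the estimated window is centered at x + s·c with radius s·r, and that each image token is a `(position, descriptor)` pair. The function name, the token layout and the caller-supplied `app_dist` are hypothetical.

```python
import math

def best_token(tokens, x, s, c, r, app_dist):
    """Return the in-window token with the least appearance distance, or None.

    tokens   : list of (position, descriptor) pairs (hypothetical layout)
    x, s     : hypothesized location and scale
    c, r     : scale normalized window center and radius of the codeword
    app_dist : function mapping a descriptor to its appearance distance
    """
    center = (x[0] + s * c[0], x[1] + s * c[1])   # window moves and grows with s
    in_window = [t for t in tokens if math.dist(t[0], center) <= s * r]
    if not in_window:
        return None                               # no matching token found
    return min(in_window, key=lambda t: app_dist(t[1]))

tokens = [((10, 10), 0.9), ((11, 10), 0.2), ((50, 50), 0.0)]
found = best_token(tokens, (0, 0), 2, (5, 5), 1, lambda d: d)
missing = best_token(tokens, (100, 100), 2, (5, 5), 1, lambda d: d)
```

In this toy input, the token at (50, 50) has the smallest appearance distance but lies outside the window, so it is never considered.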

Binary decision tree

Let $R_i(s, x)$ be the response of codeword i at scale s and location x, $\tau_i$ its threshold and $p_i$ the direction of the inequality.

• Simple example (2 codewords)
– Matching of codewords ‘i’ and ‘j’ at scale s and location x: $p_i R_i(s, x) \le p_i \tau_i$ and $p_j R_j(s, x) \le p_j \tau_j$
• Generalized form
– $p_i R_i(s, x) \le p_i \tau_i$ and $p_j R_j(s, x) \le p_j \tau_j$ and …, where $\tau \in [0, 2]$, $p \in \{-1, +1\}$
• Each tree ends in a predicted label
• Appearance (visual aspects of tokens) + Geometric (spatial layout of tokens) + Structural (relationships of tokens) constraints of object class
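A minimal Python sketch of the matching test for one codeword combination. It assumes each node compares a codeword response in [0, 2] with a threshold, that the direction p ∈ {-1, +1} flips the comparison, and that the comparison is non-strict. The names `node_passes` and `combination_matches` and the `(index, tau, p)` node layout are hypothetical.

```python
def node_passes(response, tau, p):
    # p = +1 tests response <= tau; p = -1 flips it to response >= tau
    return p * response <= p * tau

def combination_matches(responses, nodes):
    """nodes: list of (codeword index, tau, p); every node test must hold."""
    return all(node_passes(responses[i], tau, p) for i, tau, p in nodes)

responses = {0: 0.4, 1: 1.7}                      # toy codeword responses
pair = [(0, 0.5, +1), (1, 1.5, -1)]               # two-codeword combination
single = [(0, 0.3, +1)]                           # one-codeword combination
```

Because a combination is just a list of node tests, the same function handles combinations with one, two or more codewords.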

Boosting

• Input
– Matrix of values: responses of codewords 1 … n at each training sample (scale, location)
– Vector of z labels: one label $z(s, x)$ per sample
– Weight vector: one weight $w(s, x)$ per sample
• Output
– Codeword combinations: binary decision trees in which each node tests one codeword, ($p_i R_i(s, x)$ against $p_i \tau_i$), then ($p_j$, $\tau_j$), then ($p_k$, $\tau_k$), …, ending in a predicted label
– $CC(s, x)$ is nonzero if $p_i R_i(s, x) \le p_i \tau_i$ and $p_j R_j(s, x) \le p_j \tau_j$ …, and 0 otherwise
• Detection confidence: $H(s, x) = \sum_i CC_i(s, x)$
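A minimal Python sketch of the detection confidence as a sum over learned codeword combinations. It assumes each combination contributes a fixed nonzero value when all of its node tests hold and 0 otherwise. The per-combination `weight`, the `(index, tau, p)` node layout and the non-strict inequality are assumptions for illustration, not values taken from the slides.

```python
def detection_confidence(responses, combinations):
    """Sum the contributions of all combinations that match.

    responses    : mapping codeword index -> response at one (scale, location)
    combinations : list of (weight, nodes), nodes = [(index, tau, p), ...]
    """
    total = 0.0
    for weight, nodes in combinations:
        # A combination fires only if every one of its node tests holds
        if all(p * responses[i] <= p * tau for i, tau, p in nodes):
            total += weight
    return total

responses = {0: 0.4, 1: 1.7}
combinations = [(0.8, [(0, 0.5, +1)]),            # fires: 0.4 <= 0.5
                (0.3, [(1, 1.0, +1)])]            # does not fire: 1.7 > 1.0
confidence = detection_confidence(responses, combinations)
```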

Experimental Results – Weizmann horse

Detection RP-AUC (plot axes: recall vs. false positives per image)
Shotton et al. I                      0.8738
Shotton et al. II (Retrained test)    0.8903
Bai et al.                            0.8032
Our method                            0.9218

Classification ROC-AUC (plot axes: true positive rate vs. false positive rate)
Shotton et al. I                      0.9251
Shotton et al. II (Retrained test)    0.9400
Our method                            0.9500

J. Shotton et al., TPAMI, 2008.
X. Bai et al., ICCV, 2009.

Experimental Results – Graz-17

Number of object images: Training, Testing. Image classification results: ROC-AUC. Object detection results: RP-AUC.

Object category    Training  Testing  ROC-AUC       ROC-AUC          RP-AUC        RP-AUC
                                      (Our method)  (Shotton et al.) (Our method)  (Shotton et al.)
Plane              100       400      0.9826        0.9953           0.9325        0.9310
Motorbike          100       400      0.9983        1.0000           0.9996        1.0000
Face               100       217      0.9974        0.9966           0.9895        0.9850
Car-rear           100       400      0.9883        0.9992           0.9797        0.9912
Car-2/3-rear       32        14       0.9643        0.9000           0.6843        0.6925
Car-front          34        16       0.9688        0.9727           0.8256        0.7233
Bike-rear          29        13       0.9468        0.9172           0.6042        0.6398
Bike-front         19        12       0.9584        0.9375           0.7421        0.6344
Bike-side          90        53       0.9445        0.9366           0.8299        0.6959
Bottle             54        64       0.9773        0.9802           0.9009        0.9468
Cow-front          34        16       0.9844        0.9727           0.8335        0.8575
Cow-side           45        65       0.9944        0.9992           0.9945        0.9975
Horse-front        44        22       0.9918        0.9566           0.7368        0.7852
Horse-side         55        96       0.9756        0.9816           0.9361        0.9680
Person             39        18       0.9352        0.9321           0.5730        0.4271
Mug                30        20       0.9525        0.9600           0.9619        0.9035
Cup                31        20       0.9800        0.9825           0.8964        0.9158
Average across categories             0.9730        0.9659           0.8483        0.8291

J. Shotton et al., TPAMI, 2008.

• Additional comparisons with other methods provided in paper
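The "Average across categories" row can be checked directly from the 17 per-category values of the Graz-17 table. The short Python sketch below recomputes the four averages; the variable names are illustrative, and the values are copied from the table in row order (Plane through Cup).

```python
# Per-category values from the Graz-17 table, in row order (Plane ... Cup)
ours_cls = [0.9826, 0.9983, 0.9974, 0.9883, 0.9643, 0.9688, 0.9468, 0.9584,
            0.9445, 0.9773, 0.9844, 0.9944, 0.9918, 0.9756, 0.9352, 0.9525,
            0.9800]
shotton_cls = [0.9953, 1.0000, 0.9966, 0.9992, 0.9000, 0.9727, 0.9172, 0.9375,
               0.9366, 0.9802, 0.9727, 0.9992, 0.9566, 0.9816, 0.9321, 0.9600,
               0.9825]
ours_det = [0.9325, 0.9996, 0.9895, 0.9797, 0.6843, 0.8256, 0.6042, 0.7421,
            0.8299, 0.9009, 0.8335, 0.9945, 0.7368, 0.9361, 0.5730, 0.9619,
            0.8964]
shotton_det = [0.9310, 1.0000, 0.9850, 0.9912, 0.6925, 0.7233, 0.6398, 0.6344,
               0.6959, 0.9468, 0.8575, 0.9975, 0.7852, 0.9680, 0.4271, 0.9035,
               0.9158]

def mean4(values):
    # Arithmetic mean rounded to four decimals, as reported in the table
    return round(sum(values) / len(values), 4)

averages = [mean4(col) for col in (ours_cls, shotton_cls, ours_det, shotton_det)]
```

The recomputed values agree with the reported averages: 0.9730 and 0.9659 for classification, 0.8483 and 0.8291 for detection.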

Summary

• Presented a contour based recognition approach which exploits simple and generic shape primitives
• Proposed a method to learn discriminative primitive combinations with a variable number of primitives
• Demonstrated the effectiveness of our approach with extensive experiments across 17 categories

Thank you
