perceptual and sensory augmented computing visual object recognition tutorial visual object...

Perceptual and Sensory Augmented Computing

Visual Object Recognition Tutorial

Visual Object Recognition

Bastian Leibe (Computer Vision Laboratory, ETH Zurich) & Kristen Grauman (Department of Computer Sciences, University of Texas at Austin)

Chicago, 14.07.2008

Outline

1. Detection with Global Appearance & Sliding Windows

2. Local Invariant Features: Detection & Description

3. Specific Object Recognition with Local Features

― Coffee Break ―

4. Visual Words: Indexing, Bags of Words Categorization

5. Matching Local Features

6. Part-Based Models for Categorization

7. Current Challenges and Research Directions

Recognition with Local Features

• Image content is transformed into local features that are invariant to translation, rotation, and scale

• Goal: Verify if they belong to a consistent configuration

Local Features, e.g. SIFT

Slide credit: David Lowe

Finding Consistent Configurations

• Global spatial models
  – Generalized Hough Transform [Lowe99]
  – RANSAC [Obdrzalek02, Chum05, Nister06]
  – Basic assumption: object is planar

• Assumption is often justified in practice
  – Valid for many structures on buildings
  – Sufficient for small viewpoint variations on 3D objects

Hough Transform

• Origin: Detection of straight lines in clutter
  – Basic idea: each candidate point votes for all lines that it is consistent with.
  – Votes are accumulated in a quantized array.
  – Local maxima correspond to candidate lines.

• Representation of a line
  – Usual form $y = a x + b$ has a singularity around 90º.
  – Better parameterization: $x \cos\theta + y \sin\theta = \rho$

[Figure: axes x, y; line parameters θ, ρ]
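The voting scheme for lines can be sketched in a few lines of NumPy. This is a minimal illustration, not the tutorial's implementation: the function names `hough_lines` and `strongest_line`, the bin counts, and the `rho_max` bound are all assumptions made for the example.

```python
import numpy as np

def hough_lines(points, rho_max, n_theta=180, n_rho=200):
    """Accumulate votes for lines x*cos(theta) + y*sin(theta) = rho."""
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((n_rho, n_theta), dtype=int)   # quantized (rho, theta) array
    cols = np.arange(n_theta)
    for x, y in points:
        # One rho per theta: every line through (x, y) receives a vote.
        rhos = x * np.cos(thetas) + y * np.sin(thetas)
        rows = np.round((rhos + rho_max) / (2 * rho_max) * (n_rho - 1)).astype(int)
        ok = (rows >= 0) & (rows < n_rho)
        acc[rows[ok], cols[ok]] += 1
    return acc, thetas

def strongest_line(acc, thetas, rho_max):
    """Return (rho, theta) of the accumulator cell with the most votes."""
    n_rho = acc.shape[0]
    r, t = np.unravel_index(np.argmax(acc), acc.shape)
    rho = r / (n_rho - 1) * 2 * rho_max - rho_max
    return rho, thetas[t]
```

Because the line is stored as (ρ, θ), a vertical line such as x = 5 is found at θ = 0 without any special handling, which is the case where the slope-intercept form breaks down.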

Hough Transform: Noisy Line

• Problem: Finding the true maximum

[Figure panels: Tokens; Votes (axes θ, ρ)]


Hough Transform: Noisy Input

• Problem: Lots of spurious maxima

[Figure panels: Tokens; Votes (axes θ, ρ)]

Generalized Hough Transform [Ballard81]

• Generalization for an arbitrary contour or shape
  – Choose reference point for the contour (e.g. center)
  – For each point on the contour, remember where it is located w.r.t. the reference point
  – Remember radius r and angle relative to the contour tangent
  – Recognition: whenever you find a contour point, calculate the tangent angle and ‘vote’ for all possible reference points
  – Instead of reference point, can also vote for transformation

The same idea can be used with local features!
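The contour-based voting can be sketched as follows. This is a simplified illustration under stated assumptions: it handles translation only, it indexes the stored displacements by the quantized absolute tangent angle rather than by an angle relative to the tangent, and the names `build_r_table`, `vote`, and `n_bins` are hypothetical.

```python
import numpy as np
from collections import defaultdict

def _bin(phi, n_bins):
    """Quantize a tangent angle (taken modulo pi) into one of n_bins bins."""
    return int(phi % np.pi / np.pi * n_bins) % n_bins

def build_r_table(contour, tangents, n_bins=36):
    """For each tangent-angle bin, store displacements to the reference point."""
    contour = np.asarray(contour, dtype=float)
    ref = contour.mean(axis=0)            # reference point: contour centroid
    table = defaultdict(list)
    for p, phi in zip(contour, tangents):
        table[_bin(phi, n_bins)].append(ref - p)
    return table, ref

def vote(points, tangents, table, shape, n_bins=36):
    """Each contour point votes for all reference points stored for its bin."""
    acc = np.zeros(shape, dtype=int)
    for p, phi in zip(np.asarray(points, dtype=float), tangents):
        for d in table.get(_bin(phi, n_bins), []):
            x, y = np.round(p + d).astype(int)
            if 0 <= x < shape[0] and 0 <= y < shape[1]:
                acc[x, y] += 1
    return acc
```

The accumulator maximum gives the most likely position of the reference point. Voting for rotation or scale as well would add dimensions to the accumulator.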

Slide credit: Bernt Schiele

Gen. Hough Transform with Local Features

• For every feature, store possible “occurrences”

  – Object identity
  – Pose
  – Relative position

• For new image, let the matched features vote for possible object positions
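A minimal sketch of this store-and-vote scheme is shown below. It assumes features have already been matched to integer feature ids, it stores only the object identity and relative position (pose is omitted), and the names `train`, `vote`, and `cell` are hypothetical.

```python
from collections import Counter, defaultdict

def train(occurrences):
    """occurrences: iterable of (feature_id, object_id, (dx, dy)), where
    (dx, dy) is the offset from the feature to the object center."""
    table = defaultdict(list)
    for feat, obj, offset in occurrences:
        table[feat].append((obj, offset))
    return table

def vote(matches, table, cell=10):
    """matches: iterable of (feature_id, (x, y)) found in the new image.
    Votes are accumulated per object in position cells of size `cell`."""
    votes = Counter()
    for feat, (x, y) in matches:
        for obj, (dx, dy) in table.get(feat, []):
            cx, cy = x + dx, y + dy       # predicted object center
            votes[(obj, int(cx // cell), int(cy // cell))] += 1
    return votes
```

Calling `votes.most_common(1)` returns the strongest hypothesis. Features that also occur on other objects spread their votes thinly, while consistent features pile up in one cell.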

3D Object Recognition

• Gen. HT for Recognition
  – Typically only 3 feature matches needed for recognition
  – Extra matches provide robustness
  – Affine model can be used for planar objects
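Three non-collinear point matches determine the six parameters of a 2D affine map, and any extra matches turn the fit into a least-squares problem. The sketch below illustrates this with NumPy. The function name `fit_affine` is hypothetical, and this is not claimed to be the exact procedure used in [Lowe99].

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine map dst ~ A @ src + t from >= 3 point matches."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    X = np.hstack([src, np.ones((len(src), 1))])       # rows [x, y, 1]
    params, *_ = np.linalg.lstsq(X, dst, rcond=None)   # shape (3, 2)
    A, t = params[:2].T, params[2]
    return A, t
```

With exactly three matches the solution is exact. With more, the residuals of individual matches can be checked against the fitted model to reject outliers.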


[Lowe99]

View Interpolation

• Training
  – Training views from similar viewpoints are clustered based on feature matches.
  – Matching features between adjacent views are linked.

• Recognition
  – Feature matches may be spread over several training viewpoints.
  – Use the known links to “transfer votes” to other viewpoints.


[Lowe01]

Recognition Using View Interpolation


[Lowe01]

Location Recognition


Training

[Lowe04]

Applications

• Sony Aibo (Evolution Robotics)

• SIFT usage
  – Recognize docking station
  – Communicate with visual cards

• Other uses
  – Place recognition
  – Loop closure in SLAM


RANSAC (RANdom SAmple Consensus) [Fischler81]

• Randomly choose a minimal subset of data points necessary to fit a model (a sample)

• Points within some distance threshold t of model are a consensus set. Size of consensus set is model’s support.

• Repeat for N samples; model with biggest support is most robust fit

  – Points within distance t of best model are inliers
  – Fit final model to all inliers
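The loop described above can be sketched for 2D line fitting as follows. This is an illustrative sketch under assumptions: the names `fit_line` and `ransac_line` are hypothetical, the final fit uses total least squares via SVD (one possible choice), and the data is assumed to contain at least two distinct points.

```python
import numpy as np

def fit_line(p, q):
    """Line through p and q as unit normal n and offset c, with n . x = c."""
    d = q - p
    n = np.array([-d[1], d[0]]) / np.hypot(d[0], d[1])
    return n, n @ p

def ransac_line(points, t, n_samples, rng):
    points = np.asarray(points, dtype=float)
    best_support, best_inliers = -1, None
    for _ in range(n_samples):
        # Minimal sample: two points define a line.
        i, j = rng.choice(len(points), size=2, replace=False)
        if np.allclose(points[i], points[j]):
            continue                              # degenerate sample
        n, c = fit_line(points[i], points[j])
        inliers = np.abs(points @ n - c) < t      # consensus set
        if inliers.sum() > best_support:          # support = consensus set size
            best_support, best_inliers = inliers.sum(), inliers
    # Final model: total least-squares fit to all inliers of the best sample.
    P = points[best_inliers]
    centroid = P.mean(axis=0)
    _, _, vt = np.linalg.svd(P - centroid)
    n = vt[-1]                                    # direction of least variance
    return n, n @ centroid, best_inliers
```

A call such as `ransac_line(pts, t=0.5, n_samples=100, rng=np.random.default_rng(0))` returns the line normal, its offset, and a boolean inlier mask.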


RANSAC: How many samples?

• How many samples are needed?
  – Suppose w is fraction of inliers (points from line).
  – n points needed to define hypothesis (2 for lines)
  – k samples chosen.

• Prob. that a single sample of n points is correct: $w^n$

• Prob. that all samples fail is: $(1 - w^n)^k$
  – Choose k high enough to keep this below desired failure rate.
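Requiring the probability that all k samples fail, (1 - w^n)^k, to stay below a failure rate p and solving for k gives k ≥ log(p) / log(1 - w^n). A small helper makes the numbers concrete. The function name `num_samples` and the parameter name `p_fail` are hypothetical.

```python
import math

def num_samples(w, n, p_fail):
    """Smallest k such that (1 - w**n)**k <= p_fail."""
    p_good = w ** n                     # prob. that one sample is all inliers
    if p_good <= 0.0:
        raise ValueError("no inliers: no number of samples is sufficient")
    if p_good >= 1.0:
        return 1                        # every sample is correct
    return math.ceil(math.log(p_fail) / math.log(1.0 - p_good))
```

For lines (n = 2) with half of the points being inliers (w = 0.5) and a 1% failure rate, this gives k = 17. A model that needs n = 4 points at the same inlier ratio already needs k = 72, which shows how quickly the cost grows with the sample size.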


After RANSAC

• RANSAC divides data into inliers and outliers and yields estimate computed from minimal set of inliers

• Improve this initial estimate with estimation over all inliers (e.g. with standard least-squares minimization)

• But this may change inliers, so alternate fitting with re-classification as inlier/outlier
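The alternation between fitting and re-classification can be sketched as below. This is a minimal illustration with assumptions: it fits y = a x + b with ordinary least squares (so it inherits that form's problem with near-vertical lines), it classifies points by vertical residual, it assumes at least two inliers remain in every round, and the name `refine_line` is hypothetical.

```python
import numpy as np

def refine_line(points, inliers, t, max_iter=20):
    """Alternate least-squares fitting with inlier/outlier re-classification."""
    points = np.asarray(points, dtype=float)
    x, y = points[:, 0], points[:, 1]
    for _ in range(max_iter):
        # Least-squares fit over the current inlier set.
        a, b = np.polyfit(x[inliers], y[inliers], deg=1)
        # Re-classify every point against the refined model.
        new_inliers = np.abs(y - (a * x + b)) < t
        if np.array_equal(new_inliers, inliers):
            break                       # inlier set is stable
        inliers = new_inliers
    return a, b, inliers
```

The initial `inliers` mask would typically be the consensus set of the best RANSAC sample. The loop stops once re-fitting no longer changes that set.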


Example: Finding Feature Matches

• Find best stereo match within a square search window (here 300 pixels²)

• Global transformation model: epipolar geometry

from Hartley & Zisserman



[Figure: feature matches before RANSAC and after RANSAC]

Comparison

Gen. Hough Transform

• Advantages
  – Very effective for recognizing arbitrary shapes or objects
  – Can handle high percentage of outliers (>95%)
  – Extracts groupings from clutter in linear time

• Disadvantages
  – Quantization issues
  – Only practical for small number of dimensions (up to 4)

• Improvements available
  – Probabilistic Extensions
  – Continuous Voting Space

RANSAC

• Advantages
  – General method suited to large range of problems
  – Easy to implement
  – Independent of number of dimensions

• Disadvantages
  – Only handles moderate number of outliers (<50%)

• Many variants available, e.g.
  – PROSAC: Progressive RANSAC [Chum05]
  – Preemptive RANSAC [Nister05]

[Leibe08]

Example Applications

Mobile tourist guide
• Self-localization
• Object/building recognition
• Photo/video augmentation

Aachen Cathedral

[Quack, Leibe, Van Gool, CIVR’08]

Web Demo: Movie Poster Recognition


http://www.kooaba.com/en/products_engine.html#

50’000 movie posters indexed

Query-by-image from mobile phone available in Switzerland

Application: Large-Scale Retrieval

[Philbin CVPR’07]

[Figure: query image and results from 5k Flickr images (demo available for 100k set)]

Application: Image Auto-Annotation


Left: Wikipedia image. Right: closest match from Flickr.

[Quack CIVR’08]

Examples: Moulin Rouge, Tour Montparnasse, Colosseum, Viktualienmarkt Maypole, Old Town Square (Prague)

Outline

1. Detection with Global Appearance & Sliding Windows

2. Local Invariant Features: Detection & Description

3. Specific Object Recognition with Local Features

― Coffee Break ―

4. Visual Words: Indexing, Bags of Words Categorization

5. Matching Local Features

6. Part-Based Models for Categorization

7. Current Challenges and Research Directions
