Perceptual and Sensory Augmented Computing Visual Object Recognition Tutorial Visual Object Recognition Bastian Leibe & Computer Vision Laboratory ETH Zurich Chicago, 14.07.2008 Kristen Grauman Department of Computer Sciences University of Texas in Austin


Page 1

Visual Object Recognition

Bastian Leibe, Computer Vision Laboratory, ETH Zurich
&
Kristen Grauman, Department of Computer Sciences, University of Texas at Austin

Chicago, 14.07.2008

Page 2

Outline

1. Detection with Global Appearance & Sliding Windows

2. Local Invariant Features: Detection & Description

3. Specific Object Recognition with Local Features

― Coffee Break ―

4. Visual Words: Indexing, Bags of Words Categorization

5. Matching Local Features

6. Part-Based Models for Categorization

7. Current Challenges and Research Directions

Page 3

Recognition with Local Features

• Image content is transformed into local features that are invariant to translation, rotation, and scale

• Goal: Verify if they belong to a consistent configuration

Local Features, e.g. SIFT

Slide credit: David Lowe

Page 4

Finding Consistent Configurations

• Global spatial models
  – Generalized Hough Transform [Lowe99]
  – RANSAC [Obdrzalek02, Chum05, Nister06]
  – Basic assumption: object is planar

• Assumption is often justified in practice
  – Valid for many structures on buildings
  – Sufficient for small viewpoint variations on 3D objects

Page 5

Hough Transform

• Origin: Detection of straight lines in clutter
  – Basic idea: each candidate point votes for all lines that it is consistent with.
  – Votes are accumulated in a quantized array
  – Local maxima correspond to candidate lines

• Representation of a line
  – Usual form y = a x + b has a singularity around 90º.
  – Better parameterization: $x \cos\theta + y \sin\theta = \rho$

[Figure: line in the x–y plane, with θ and ρ marked]
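The voting scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `hough_lines` and `strongest_line`, the bin counts `n_theta` and `rho_step`, and the toy point set are hypothetical and do not come from the slides.

```python
import math
from collections import Counter

def hough_lines(points, n_theta=180, rho_step=1.0):
    """Let every point vote for all (theta, rho) bins of lines passing through it."""
    votes = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta               # theta sampled in [0, pi)
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(t, round(rho / rho_step))] += 1      # quantize rho into bins
    return votes

def strongest_line(votes, n_theta=180, rho_step=1.0):
    """Return (theta, rho, vote count) of the fullest accumulator bin."""
    (t, r), count = votes.most_common(1)[0]
    return math.pi * t / n_theta, r * rho_step, count

# Ten points on the vertical line x = 5 (where y = a x + b breaks down), plus clutter.
points = [(5.0, float(y)) for y in range(10)] + [(1.0, 7.0), (8.0, 2.0), (3.0, 3.0)]
theta, rho, count = strongest_line(hough_lines(points))
```

In this toy case all ten collinear points land in one bin, so the strongest bin holds 10 votes. Neighbouring θ bins collect almost as many votes, which hints at the quantization issues discussed later in the tutorial.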

Page 6

Hough Transform: Noisy Line

• Problem: Finding the true maximum

[Figure panels: Tokens; Votes (axes θ, ρ)]

Slide credit: David Lowe

Page 7

Hough Transform: Noisy Input

• Problem: Lots of spurious maxima

[Figure panels: Tokens; Votes (axes θ, ρ)]

Slide credit: David Lowe

Page 8

Generalized Hough Transform [Ballard81]

• Generalization for an arbitrary contour or shape
  – Choose a reference point for the contour (e.g. center)
  – For each point on the contour, remember where it is located w.r.t. the reference point
  – Remember radius r and angle relative to the contour tangent
  – Recognition: whenever you find a contour point, calculate the tangent angle and ‘vote’ for all possible reference points
  – Instead of a reference point, can also vote for a transformation

The same idea can be used with local features!

Slide credit: Bernt Schiele
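A minimal sketch of the reference-point voting described on this slide, under simplifying assumptions: the shape only translates (no rotation or scale), tangent angles are already quantized into integer bins, and offsets are stored as Cartesian (dx, dy) pairs instead of radius and angle. The names `build_r_table` and `vote_reference_points` and the square test shape are hypothetical.

```python
from collections import Counter, defaultdict

def build_r_table(contour, reference):
    """Map each tangent-angle bin to the offsets from contour point to reference point."""
    r_table = defaultdict(list)
    rx, ry = reference
    for x, y, angle_bin in contour:
        r_table[angle_bin].append((rx - x, ry - y))
    return r_table

def vote_reference_points(r_table, edge_points):
    """Each edge point votes for every reference position stored under its tangent angle."""
    votes = Counter()
    for x, y, angle_bin in edge_points:
        for dx, dy in r_table.get(angle_bin, []):
            votes[(x + dx, y + dy)] += 1
    return votes

# Hypothetical model: outline of a 5x5 square; bin 0 = horizontal edge, bin 1 = vertical edge.
model = ([(x, 0, 0) for x in range(5)] + [(x, 4, 0) for x in range(5)]
         + [(0, y, 1) for y in range(1, 4)] + [(4, y, 1) for y in range(1, 4)])
r_table = build_r_table(model, reference=(2, 2))

# The same shape shifted by (10, 7), plus two clutter points.
scene = [(x + 10, y + 7, a) for x, y, a in model] + [(3, 3, 0), (20, 1, 1)]
best_position, best_votes = vote_reference_points(r_table, scene).most_common(1)[0]
```

Every one of the 16 shifted contour points casts one vote for the true reference position (12, 9), while its other votes scatter, so the peak stands out even though each point votes many times.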

Page 9

Gen. Hough Transform with Local Features

• For every feature, store possible “occurrences”

  – Object identity
  – Pose
  – Relative position

• For new image, let the matched features vote for possible object positions
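The two bullets above might look as follows in code. This is a hedged sketch: the occurrence table, the feature names, and the cell size are invented for illustration, and the pose entry is left out, so the sketch assumes the object appears at the training scale and orientation.

```python
from collections import Counter

# Hypothetical occurrence table: feature id -> list of
# (object identity, offset of the feature relative to the object center).
occurrences = {
    "wheel":  [("car", (-30, 10)), ("car", (30, 10)), ("bike", (-20, 0))],
    "window": [("car", (0, -15))],
    "handle": [("bike", (5, -25))],
}

def vote_for_objects(matches, occurrences, cell=10):
    """Matched features vote for (object, quantized center position)."""
    votes = Counter()
    for feature_id, (x, y) in matches:
        for obj, (dx, dy) in occurrences.get(feature_id, []):
            cx, cy = x - dx, y - dy                  # hypothesized object center
            votes[(obj, round(cx / cell), round(cy / cell))] += 1
    return votes

# Matches found in a new image: a car centered near (100, 50), plus one stray match.
matches = [("wheel", (71, 61)), ("wheel", (129, 59)),
           ("window", (101, 36)), ("handle", (300, 200))]
best_hypothesis, support = vote_for_objects(matches, occurrences).most_common(1)[0]
```

The three car features agree on the cell around (100, 50), whereas the wrong interpretations of each match (the other wheel, the bike) spread their votes over different cells.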

Page 10

3D Object Recognition

• Gen. HT for Recognition
  – Typically only 3 feature matches needed for recognition
  – Extra matches provide robustness
  – Affine model can be used for planar objects

Slide credit: David Lowe

[Lowe99]
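One way to read the "3 feature matches" and "affine model" statements together: a 2D affine transformation has six parameters, and three non-collinear point matches supply exactly six equations. The sketch below solves them with Cramer's rule. The function names and the test points are hypothetical, and this is not claimed to be the solver used in [Lowe99].

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve3(m, rhs):
    """Solve the 3x3 linear system m * p = rhs with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("the three model points are collinear")
    solution = []
    for col in range(3):
        replaced = [row[:col] + [rhs[r]] + row[col + 1:] for r, row in enumerate(m)]
        solution.append(det3(replaced) / d)
    return solution

def affine_from_three_matches(model_pts, image_pts):
    """Return ((a, b, tx), (c, d, ty)) with u = a*x + b*y + tx and v = c*x + d*y + ty."""
    m = [[x, y, 1.0] for x, y in model_pts]
    row_u = solve3(m, [u for u, _ in image_pts])
    row_v = solve3(m, [v for _, v in image_pts])
    return tuple(row_u), tuple(row_v)

# Model points and their matches under "scale by 2, then shift by (3, 4)".
model_pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
image_pts = [(3.0, 4.0), (5.0, 4.0), (3.0, 6.0)]
(a, b, tx), (c, d, ty) = affine_from_three_matches(model_pts, image_pts)
```

With more than three matches the system becomes overdetermined, and a least-squares fit over all matches would take the place of the exact solve.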

Page 11

View Interpolation

• Training
  – Training views from similar viewpoints are clustered based on feature matches.
  – Matching features between adjacent views are linked.

• Recognition
  – Feature matches may be spread over several training viewpoints.
  – Use the known links to “transfer votes” to other viewpoints.

Slide credit: David Lowe

[Lowe01]

Page 12

Recognition Using View Interpolation

Slide credit: David Lowe

[Lowe01]

Page 13

Location Recognition

Slide credit: David Lowe

Training

[Lowe04]

Page 14

Applications

• Sony Aibo (Evolution Robotics)

• SIFT usage
  – Recognize docking station
  – Communicate with visual cards

• Other uses
  – Place recognition
  – Loop closure in SLAM

Slide credit: David Lowe

Page 15

RANSAC (RANdom SAmple Consensus) [Fischler81]

• Randomly choose a minimal subset of data points necessary to fit a model (a sample)

• Points within some distance threshold t of model are a consensus set. Size of consensus set is model’s support.

• Repeat for N samples; model with biggest support is most robust fit
  – Points within distance t of best model are inliers
  – Fit final model to all inliers

Slide credit: David Lowe
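The loop described on this slide can be sketched for the line-fitting case, where the minimal subset is two points. The names `line_through` and `ransac_line`, the seed, and the toy data are assumptions made for illustration. The final least-squares fit over all inliers is left out here.

```python
import math
import random

def line_through(p, q):
    """Return (a, b, c) with a*x + b*y + c = 0 and a^2 + b^2 = 1 through p and q."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = math.hypot(a, b)
    return a / norm, b / norm, -(a * x1 + b * y1) / norm

def ransac_line(points, t, n_samples, rng):
    """Keep the two-point line hypothesis with the largest consensus set."""
    best_model, best_inliers = None, []
    for _ in range(n_samples):
        p, q = rng.sample(points, 2)               # minimal subset for a line
        if p == q:
            continue                               # duplicate points define no line
        a, b, c = line_through(p, q)
        consensus = [(x, y) for x, y in points if abs(a * x + b * y + c) <= t]
        if len(consensus) > len(best_inliers):     # support = size of consensus set
            best_model, best_inliers = (a, b, c), consensus
    return best_model, best_inliers

# 20 points on y = 2x + 1 plus 5 gross outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(20)]
points += [(2.0, 30.0), (5.0, -10.0), (10.0, 0.0), (15.0, 60.0), (18.0, 5.0)]
model, inliers = ransac_line(points, t=0.5, n_samples=50, rng=random.Random(0))
```

Passing a seeded `random.Random` keeps the run reproducible. In practice the number of samples would be chosen from the expected inlier fraction.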

Page 16

RANSAC: How many samples?

• How many samples are needed?
  – Suppose w is fraction of inliers (points from line).
  – n points needed to define hypothesis (2 for lines)
  – k samples chosen.

• Prob. that a single sample of n points is correct: $w^n$

• Prob. that all samples fail is: $(1 - w^n)^k$
  – Choose k high enough to keep this below desired failure rate.

Slide credit: David Lowe
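With the slide's definitions, a single sample is all-inlier with probability w^n, so all k samples fail with probability (1 - w^n)^k. Solving (1 - w^n)^k ≤ failure rate for k gives k ≥ log(failure rate) / log(1 - w^n). A small helper makes this concrete. The name `ransac_num_samples` and the example values are illustrative assumptions.

```python
import math

def ransac_num_samples(w, n, max_failure):
    """Smallest k such that (1 - w**n)**k <= max_failure."""
    if not 0.0 < w <= 1.0 or not 0.0 < max_failure < 1.0:
        raise ValueError("need 0 < w <= 1 and 0 < max_failure < 1")
    p_good = w ** n                    # probability that one sample is all inliers
    if p_good >= 1.0:
        return 1                       # no outliers: a single sample suffices
    return math.ceil(math.log(max_failure) / math.log(1.0 - p_good))

# Line fitting (n = 2) with half of the points being inliers, 1% allowed failure rate.
k_line = ransac_num_samples(w=0.5, n=2, max_failure=0.01)   # 17
```

Because w is raised to the power n, the required k grows quickly with the size of the minimal subset, which is one reason to keep n as small as the model allows.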

Page 17

After RANSAC

• RANSAC divides data into inliers and outliers and yields estimate computed from minimal set of inliers

• Improve this initial estimate with estimation over all inliers (e.g. with standard least-squares minimization)

• But this may change inliers, so alternate fitting with re-classification as inlier/outlier

Slide credit: David Lowe
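The alternation in the last bullet can be sketched as a loop that refits on the current inliers and reclassifies until the inlier set stops changing. Several choices here are assumptions, not taken from the slides: the line is modelled as y = a x + b, residuals are measured vertically, and the names `fit_line_least_squares` and `refine` and the toy data are hypothetical.

```python
def fit_line_least_squares(points):
    """Ordinary least squares for y = a*x + b (assumes the x values are not all equal)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

def refine(points, a, b, t, max_rounds=20):
    """Alternate inlier classification and least-squares fitting until stable."""
    inliers = []
    for _ in range(max_rounds):
        new_inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= t]
        if new_inliers == inliers or len(new_inliers) < 2:
            break                                  # inlier set no longer changes
        inliers = new_inliers
        a, b = fit_line_least_squares(inliers)
    return a, b, inliers

# Points near y = 2x + 1 with alternating +/-0.1 noise, plus two gross outliers.
points = [(float(x), 2.0 * x + 1.0 + (0.1 if x % 2 == 0 else -0.1)) for x in range(10)]
points += [(3.0, 20.0), (7.0, -5.0)]

# A rough initial estimate, as it might come from a minimal RANSAC sample.
a, b, inliers = refine(points, a=2.3, b=0.0, t=1.0)
```

In this example the rough initial line accepts only six of the ten good points. After one refit the remaining four fall within the threshold, and the next round leaves the set unchanged.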

Page 18

Example: Finding Feature Matches

• Find best stereo match within a square search window (here 300 pixels²)

• Global transformation model: epipolar geometry

from Hartley & Zisserman

Slide credit: David Lowe

Page 19

Example: Finding Feature Matches

• Find best stereo match within a square search window (here 300 pixels²)

• Global transformation model: epipolar geometry

from Hartley & Zisserman

Slide credit: David Lowe

[Figure panels: before RANSAC; after RANSAC]

Page 20

Comparison

Gen. Hough Transform

• Advantages
  – Very effective for recognizing arbitrary shapes or objects
  – Can handle high percentage of outliers (>95%)
  – Extracts groupings from clutter in linear time

• Disadvantages
  – Quantization issues
  – Only practical for small number of dimensions (up to 4)

• Improvements available
  – Probabilistic Extensions
  – Continuous Voting Space

RANSAC

• Advantages
  – General method suited to large range of problems
  – Easy to implement
  – Independent of number of dimensions

• Disadvantages
  – Only handles moderate number of outliers (<50%)

• Many variants available, e.g.
  – PROSAC: Progressive RANSAC [Chum05]
  – Preemptive RANSAC [Nister05]

[Leibe08]

Page 21

Example Applications

Mobile tourist guide
• Self-localization
• Object/building recognition
• Photo/video augmentation

Aachen Cathedral

[Quack, Leibe, Van Gool, CIVR’08]

Page 22

Web Demo: Movie Poster Recognition


http://www.kooaba.com/en/products_engine.html#

50’000 movie posters indexed

Query-by-image from mobile phone available in Switzerland

Page 23

Application: Large-Scale Retrieval

[Philbin CVPR’07]

[Figure panels: Query; Results from 5k Flickr images (demo available for 100k set)]

Page 24

Application: Image Auto-Annotation


Left: Wikipedia image
Right: closest match from Flickr

[Quack CIVR’08]

Moulin Rouge

Tour Montparnasse

Colosseum

Viktualienmarkt Maypole

Old Town Square (Prague)

Page 25

Outline

1. Detection with Global Appearance & Sliding Windows

2. Local Invariant Features: Detection & Description

3. Specific Object Recognition with Local Features

― Coffee Break ―

4. Visual Words: Indexing, Bags of Words Categorization

5. Matching Local Features

6. Part-Based Models for Categorization

7. Current Challenges and Research Directions