Learning Local Affine Representations for Texture and Object Recognition


Page 1: Learning Local Affine Representations for Texture and Object Recognition

Learning Local Affine Representations for Texture and Object Recognition

Svetlana Lazebnik
Beckman Institute, University of Illinois at Urbana-Champaign

(joint work with Cordelia Schmid, Jean Ponce)

Page 2: Learning Local Affine Representations for Texture and Object Recognition

Overview

• Goal:
– Recognition of 3D textured surfaces, object classes

• Our contribution:
– Texture and object representations based on local affine regions

• Advantages of proposed approach:
– Distinctive, repeatable primitives
– Robustness to clutter and occlusion
– Ability to approximate 3D geometric transformations

Page 3: Learning Local Affine Representations for Texture and Object Recognition

The Scope

1. Recognition of single-texture images (CVPR 2003)

2. Recognition of individual texture regions in multi-texture images (ICCV 2003)

3. Recognition of object classes (BMVC 2004, work in progress)

Page 4: Learning Local Affine Representations for Texture and Object Recognition

1. Recognition of Single-Texture Images

Page 5: Learning Local Affine Representations for Texture and Object Recognition

Affine Region Detectors

[Figure: regions found by the Harris detector (H) and the Laplacian detector (L)]

Mikolajczyk & Schmid (2002), Gårding & Lindeberg (1996)

Page 6: Learning Local Affine Representations for Texture and Object Recognition

Affine Rectification Process

[Figure: Patch 1 and Patch 2, with their rectified patches (rotational ambiguity)]
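A minimal sketch of the rectification step, assuming the detected region is an ellipse {x : xᵀ M x = 1} given by a 2×2 symmetric positive-definite matrix M. The names `rectifying_transform`, `apply`, `rotate` and `ellipse_point` are hypothetical helpers, not taken from the talk. The map A = M^(1/2) sends the ellipse to the unit circle, and so does R·A for any rotation R, which is the rotational ambiguity noted on the slide.

```python
import math

def rectifying_transform(m):
    """Symmetric square root of a 2x2 SPD matrix m (closed form for 2x2)."""
    (a, b), (_, d) = m
    s = math.sqrt(a * d - b * b)      # sqrt(det m), real because m is SPD
    t = math.sqrt(a + d + 2.0 * s)    # sqrt(trace m + 2 * sqrt(det m))
    return [[(a + s) / t, b / t], [b / t, (d + s) / t]]

def apply(mat, p):
    """Multiply a 2x2 matrix by a 2D point."""
    return (mat[0][0] * p[0] + mat[0][1] * p[1],
            mat[1][0] * p[0] + mat[1][1] * p[1])

def rotate(theta, p):
    """Rotate a 2D point by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

def ellipse_point(m, angle):
    """Point on the ellipse x^T m x = 1 in the direction given by angle."""
    u = (math.cos(angle), math.sin(angle))
    q = m[0][0] * u[0] ** 2 + 2.0 * m[0][1] * u[0] * u[1] + m[1][1] * u[1] ** 2
    return (u[0] / math.sqrt(q), u[1] / math.sqrt(q))

# Example with an assumed (illustrative) shape matrix.
M = [[2.0, 0.5], [0.5, 1.0]]
A = rectifying_transform(M)
y = apply(A, ellipse_point(M, 0.7))   # lies on the unit circle
y_rot = rotate(1.2, y)                # still on the unit circle
```

Because every rotated copy of the rectified patch is equally valid, the descriptors computed on it need to be rotation-invariant.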

Page 7: Learning Local Affine Representations for Texture and Object Recognition

Rotation-Invariant Descriptors 1: Spin Images

• Based on range spin images (Johnson & Hebert 1998)
• Two-dimensional histogram: distance from center × intensity value
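A rough sketch of an intensity-domain spin image as a hard-binned histogram. The bin counts, the hard binning, the restriction to the inscribed disc, and the assumption that intensities are already scaled to [0, 1] are illustrative choices, not details given on the slide; `spin_image` is a hypothetical name.

```python
import math

def spin_image(patch, n_dist=10, n_int=10):
    """2D histogram (distance bin x intensity bin) of a square patch.

    patch: list of rows of intensities in [0, 1]. Only pixels inside the
    inscribed disc are counted; the result is normalized to sum to 1.
    """
    n = len(patch)
    c = (n - 1) / 2.0          # patch center in pixel coordinates
    radius = n / 2.0
    hist = [[0.0] * n_int for _ in range(n_dist)]
    total = 0
    for r, row in enumerate(patch):
        for col, value in enumerate(row):
            d = math.sqrt((r - c) ** 2 + (col - c) ** 2)
            if d >= radius:
                continue       # outside the disc
            di = min(int(d / radius * n_dist), n_dist - 1)
            ii = min(int(value * n_int), n_int - 1)
            hist[di][ii] += 1.0
            total += 1
    return [[v / total for v in h] for h in hist]
```

Since each pixel contributes only through its distance from the center and its intensity, rotating the patch about its center leaves the histogram unchanged (exactly so for 90-degree rotations of a pixel grid).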

Page 8: Learning Local Affine Representations for Texture and Object Recognition

Rotation-Invariant Descriptors 2: RIFT

• Based on SIFT (Lowe 1999)
• Two-dimensional histogram: distance from center × gradient orientation
• Gradient orientation is measured relative to the direction pointing outward from the center of the patch
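A sketch of a RIFT-style histogram under stated assumptions: 4 rings and 8 orientation bins, central-difference gradients, and votes weighted by gradient magnitude are all illustrative choices, and `rift` is a hypothetical name. The point it illustrates is the last bullet: each gradient angle is taken relative to the outward direction at that pixel.

```python
import math

def rift(patch, n_rings=4, n_orient=8):
    """Histogram over (ring, relative gradient orientation) for a square patch.

    patch: list of rows of intensities. Border pixels are skipped because
    central differences need both neighbors.
    """
    n = len(patch)
    c = (n - 1) / 2.0
    radius = n / 2.0
    two_pi = 2.0 * math.pi
    hist = [[0.0] * n_orient for _ in range(n_rings)]
    for r in range(1, n - 1):
        for col in range(1, n - 1):
            dy, dx = r - c, col - c
            d = math.hypot(dx, dy)
            if d >= radius or d == 0.0:
                continue   # outside the disc, or outward direction undefined
            gy = (patch[r + 1][col] - patch[r - 1][col]) / 2.0
            gx = (patch[r][col + 1] - patch[r][col - 1]) / 2.0
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue   # flat pixel carries no orientation
            # Gradient angle measured from the outward-pointing direction.
            rel = (math.atan2(gy, gx) - math.atan2(dy, dx)) % two_pi
            ring = min(int(d / radius * n_rings), n_rings - 1)
            ob = min(int(rel / two_pi * n_orient), n_orient - 1)
            hist[ring][ob] += mag
    total = sum(map(sum, hist))
    if total == 0.0:
        return hist        # constant patch: all-zero descriptor
    return [[v / total for v in h] for h in hist]
```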

Page 9: Learning Local Affine Representations for Texture and Object Recognition

Signatures and EMD

• Signatures: S = {(m1, w1), … , (mk, wk)}
– mi: cluster center
– wi: relative weight

• Earth Mover’s Distance (Rubner et al. 1998)
– Computed from ground distances d(mi, m'j)
– Can compare signatures of different sizes
– Insensitive to the number of clusters
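For reference, the usual transportation-problem form of the Earth Mover's Distance between signatures S = {(m_i, w_i)} and S' = {(m'_j, w'_j)} can be written as below. The flow variables f_ij and the size k' of the second signature are notation introduced here, not symbols from the slide; f is the flow that minimizes the numerator subject to the listed constraints.

```latex
\mathrm{EMD}(S, S') =
  \frac{\sum_{i=1}^{k} \sum_{j=1}^{k'} f_{ij}\, d(m_i, m'_j)}
       {\sum_{i=1}^{k} \sum_{j=1}^{k'} f_{ij}},
\qquad
f_{ij} \ge 0,\quad
\sum_{j} f_{ij} \le w_i,\quad
\sum_{i} f_{ij} \le w'_j,\quad
\sum_{i,j} f_{ij} = \min\Bigl(\sum_i w_i,\ \sum_j w'_j\Bigr)
```

Nothing in these constraints requires k = k', which is why signatures of different sizes can be compared directly.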

Page 10: Learning Local Affine Representations for Texture and Object Recognition

Database: Textured Surfaces

25 textures, 40 sample images each (640x480)

Page 11: Learning Local Affine Representations for Texture and Object Recognition

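Evaluation

• Channels (detector + descriptor): HS, HR, LS, LR, where H/L are the Harris/Laplacian detectors and S/R are the spin image/RIFT descriptors
– Combined through addition of EMD matrices

• Classification results
– 10 training images per class, rates averaged over 200 random training subsets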
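A small sketch of combining channels by adding their distance matrices. The nearest-neighbor decision rule, the function names, and the toy numbers are assumptions made for illustration; the slide only states that the EMD matrices are added.

```python
def combine_channels(matrices):
    """Element-wise sum of equally sized distance matrices (lists of rows)."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[i][j] for m in matrices) for j in range(cols)]
            for i in range(rows)]

def nearest_neighbor_labels(dist, train_labels):
    """dist[i][j] is the distance from test image i to training image j."""
    return [train_labels[min(range(len(row)), key=row.__getitem__)]
            for row in dist]

# Toy example: two channels, two test images, two training images.
hs = [[0.2, 0.9], [0.8, 0.3]]   # made-up distances for channel HS
lr = [[0.5, 0.4], [0.7, 0.1]]   # made-up distances for channel LR
combined = combine_channels([hs, lr])
predicted = nearest_neighbor_labels(combined, ["brick", "wood"])
```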

Page 12: Learning Local Affine Representations for Texture and Object Recognition

Comparative Evaluation

Component | Our method | Varma & Zisserman (2003)
Spatial selection | Harris and Laplacian detectors | None (every pixel location is used)
Neighborhood shape selection | Affine adaptation | None (support of descriptors is fixed)
Descriptors | Spin images, RIFT | Raw pixel values
Textons | Separate set of textons for each image | Universal texton dictionary
Representing/comparing texton distributions | Signatures/EMD | Histograms/chi-squared distance

Page 13: Learning Local Affine Representations for Texture and Object Recognition

Results of Evaluation: Classification rate vs. number of training samples

[Plot: classification rate vs. number of training samples for (H+L)(S+R), VZ-Joint, VZ-MRF]

• Conclusion: an intrinsically invariant representation is necessary to deal with intra-class variations when they are not adequately represented in the training set

Page 14: Learning Local Affine Representations for Texture and Object Recognition

Summary

• A sparse texture representation based on local affine regions
• Two novel descriptors (spin images, RIFT)
• Successful recognition in the presence of viewpoint changes, non-rigidity, non-homogeneity
• A flexible approach to invariance

Page 15: Learning Local Affine Representations for Texture and Object Recognition

2. Recognition of Individual Regions in Multi-Texture Images

• A two-layer architecture:
– Local appearance + neighborhood relations

• Learning:
– Represent the local appearance of each texture class using a mixture-of-Gaussians model
– Compute co-occurrence statistics of sub-class labels over affinely adapted neighborhoods

• Recognition:
– Obtain initial class membership probabilities from the generative model
– Use relaxation to refine these probabilities
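A minimal sketch of obtaining initial membership probabilities from a mixture-of-Gaussians model via Bayes' rule. It assumes isotropic Gaussian components (one per sub-class) with known weights, means and variances; the component layout, the function names and the numbers are illustrative, and fitting the mixture (e.g., by EM) is not shown.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of an isotropic Gaussian with variance var per dimension."""
    sq = sum((a - b) ** 2 for a, b in zip(x, mean))
    return math.exp(-0.5 * sq / var) / (2.0 * math.pi * var) ** (len(x) / 2.0)

def subclass_posteriors(x, components):
    """components: list of (class_label, weight, mean, var), one per sub-class.

    Returns p(sub-class | x) for each component, in order.
    """
    joint = [w * gaussian_pdf(x, mean, var) for _, w, mean, var in components]
    z = sum(joint)   # assumed nonzero, i.e. x is not far from every component
    return [v / z for v in joint]

def class_posteriors(x, components):
    """Sum sub-class posteriors over the sub-classes of each texture class."""
    out = {}
    for comp, p in zip(components, subclass_posteriors(x, components)):
        out[comp[0]] = out.get(comp[0], 0.0) + p
    return out

# Toy model: two "brick" sub-classes and one "carpet" sub-class in 2D.
components = [("brick", 0.25, (0.0, 0.0), 1.0),
              ("brick", 0.25, (1.0, 0.0), 1.0),
              ("carpet", 0.5, (5.0, 5.0), 1.0)]
posterior = class_posteriors((0.2, 0.1), components)
```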

Page 16: Learning Local Affine Representations for Texture and Object Recognition

Two Learning Scenarios

• Fully supervised: every region in the training image is labeled with its texture class

• Weakly supervised: each training image is labeled with the classes occurring in it

[Figure: example training images labeled "brick" and "brick, marble, carpet"]

Page 17: Learning Local Affine Representations for Texture and Object Recognition

Neighborhood Statistics

[Figure: neighborhood definition]

Estimate:
• probability p(c,c')
• correlation r(c,c')
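One way these two quantities could be estimated from labeled (region, neighbor) pairs is sketched below. It assumes r(c,c') is the ordinary correlation coefficient between the label indicator variables, i.e. (p(c,c') − p(c)p(c')) / sqrt((p(c) − p(c)²)(p(c') − p(c')²)); that formula, the function name and the toy data are assumptions, since the slide does not spell them out.

```python
import math
from collections import Counter

def cooccurrence_stats(pairs, labels):
    """pairs: list of (c, c_prime) sub-class labels for (region, neighbor).

    Returns (p, r): joint probabilities and correlations keyed by (c, c_prime).
    """
    n = len(pairs)
    joint = Counter(pairs)
    p = {(c, d): joint[(c, d)] / n for c in labels for d in labels}
    p_first = {c: sum(p[(c, d)] for d in labels) for c in labels}    # marginal of region label
    p_second = {d: sum(p[(c, d)] for c in labels) for d in labels}   # marginal of neighbor label
    r = {}
    for c in labels:
        for d in labels:
            var = (p_first[c] - p_first[c] ** 2) * (p_second[d] - p_second[d] ** 2)
            cov = p[(c, d)] - p_first[c] * p_second[d]
            r[(c, d)] = cov / math.sqrt(var) if var > 0.0 else 0.0
    return p, r

# Toy data: labels "a" and "b" mostly co-occur with themselves.
pairs = [("a", "a")] * 4 + [("b", "b")] * 4 + [("a", "b"), ("b", "a")]
p, r = cooccurrence_stats(pairs, ["a", "b"])
```

In this toy data r("a","a") is positive and r("a","b") is negative, reflecting that neighbors tend to share a label.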

Page 18: Learning Local Affine Representations for Texture and Object Recognition

Relaxation (Rosenfeld et al. 1976)

• Iterative process:
– Initialized with posterior probabilities p(c|xi) obtained from the generative model
– For each region i and each sub-class label c, update the probability pi(c) based on neighbor probabilities pj(c') and correlations r(c,c')

• Shortcomings:
– No formal guarantee of convergence
– After the initialization, the updates to the probability values do not depend on the image data
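A sketch of one iteration in the classic Rosenfeld-Hummel-Zucker form: q_i(c) = Σ_j w_ij Σ_c' r(c,c') p_j(c'), followed by p_i(c) ← p_i(c)(1 + q_i(c)), renormalized over c. The neighbor weights w_ij (assumed to sum to 1 per region), the data layout and the toy numbers are assumptions for illustration.

```python
def relaxation_step(probs, neighbors, r):
    """One relaxation update.

    probs[i][c]: current probability of label c at region i.
    neighbors[i]: list of (j, weight) pairs; weights sum to 1 for each i.
    r[c][c2]: correlation between labels c and c2, in [-1, 1].
    """
    n_labels = len(r)
    updated = []
    for i, p_i in enumerate(probs):
        new = []
        for c in range(n_labels):
            # Support for label c at region i from its neighbors.
            q = sum(w * sum(r[c][c2] * probs[j][c2] for c2 in range(n_labels))
                    for j, w in neighbors[i])
            new.append(p_i[c] * (1.0 + q))
        z = sum(new)
        updated.append([v / z for v in new] if z > 0.0 else list(p_i))
    return updated

# Toy chain of three regions; the middle one starts out ambiguous.
probs = [[0.9, 0.1], [0.5, 0.5], [0.9, 0.1]]
neighbors = [[(1, 1.0)], [(0, 0.5), (2, 0.5)], [(1, 1.0)]]
r = [[0.6, -0.6], [-0.6, 0.6]]
probs = relaxation_step(probs, neighbors, r)
```

The update uses only the current probabilities and the learned correlations, which makes the second shortcoming concrete: the image data enters only through the initial values.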

Page 19: Learning Local Affine Representations for Texture and Object Recognition

Experiment 1: 3D Textured Surfaces

[Figure: single-texture images and multi-texture images for classes T1 (brick), T2 (carpet), T3 (chair), T4 (floor 1), T5 (floor 2), T6 (marble), T7 (wood)]

10 single-texture training images per class, 13 two-texture training images, 45 multi-texture test images

Page 20: Learning Local Affine Representations for Texture and Object Recognition

Effect of Relaxation on Labeling

[Figure: original image and its labelings. Top: before relaxation, bottom: after relaxation]

Page 21: Learning Local Affine Representations for Texture and Object Recognition

Retrieval

[Figure: retrieval results for T1 (brick), T2 (carpet), T3 (chair), T4 (floor 1), T5 (floor 2), T6 (marble), T7 (wood) (single-texture training images)]

Page 22: Learning Local Affine Representations for Texture and Object Recognition

Successful Segmentation Examples

Page 23: Learning Local Affine Representations for Texture and Object Recognition

Unsuccessful Segmentation Examples

Page 24: Learning Local Affine Representations for Texture and Object Recognition

Experiment 2: Animals

• No manual segmentation
• Training data: 10 sample images per class
• Test data: 20 samples per class + 20 negative images

[Figure: training images labeled "cheetah, background", "zebra, background", "giraffe, background"]

Page 25: Learning Local Affine Representations for Texture and Object Recognition

Cheetah Results

Page 26: Learning Local Affine Representations for Texture and Object Recognition

Zebra Results

Page 27: Learning Local Affine Representations for Texture and Object Recognition

Giraffe Results

Page 28: Learning Local Affine Representations for Texture and Object Recognition

Future Work

• Design an improved representation using a random field framework, e.g., conditional random fields (Lafferty et al. 2001, Kumar & Hebert 2003)
• Develop a procedure for weakly supervised learning of random field parameters
• Apply method to recognition of natural texture categories

Summary

• A two-level representation (local appearance + neighborhood relations)
• Weakly supervised learning of texture models

Page 29: Learning Local Affine Representations for Texture and Object Recognition

3. Recognition of Object Classes

The approach:
• Represent objects using multiple composite semi-local affine parts
– More expressive than individual regions
– Not globally rigid

• Correspondence search is key to learning and detection

Page 30: Learning Local Affine Representations for Texture and Object Recognition

Correspondence Search

• Basic operation: a two-image matching procedure for finding collections of affine regions that can be mapped onto each other using a single affine transformation

• Implementation: greedy search based on geometric and photometric consistency constraints
– Returns multiple correspondence hypotheses
– Automatically determines number of regions in correspondence
– Works on unsegmented, cluttered images (weakly supervised learning)
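The geometric side of the basic operation can be illustrated with a least-squares fit of a single affine map to matched region centers, followed by a residual check. This is only a sketch of a geometric consistency test on point correspondences: it is not the greedy search from the talk, it ignores region shape and photometric consistency, and the function names are hypothetical. It also assumes the source points are not all collinear.

```python
import math

def solve3(g, b):
    """Solve the 3x3 system g x = b by Gaussian elimination with pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(g, b)]
    for k in range(3):
        pivot = max(range(k, 3), key=lambda i: abs(a[i][k]))
        a[k], a[pivot] = a[pivot], a[k]
        for i in range(k + 1, 3):
            f = a[i][k] / a[k][k]
            for j in range(k, 4):
                a[i][j] -= f * a[k][j]
    x = [0.0] * 3
    for k in (2, 1, 0):
        x[k] = (a[k][3] - sum(a[k][j] * x[j] for j in range(k + 1, 3))) / a[k][k]
    return x

def fit_affine(src, dst):
    """Least-squares affine map taking src points to dst points.

    Returns two rows (a, b, tx) and (c, d, ty) of the 2x3 affine matrix.
    """
    g = [[0.0] * 3 for _ in range(3)]      # normal-equation matrix
    bx, by = [0.0] * 3, [0.0] * 3
    for (x, y), (u, v) in zip(src, dst):
        h = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                g[i][j] += h[i] * h[j]
            bx[i] += h[i] * u
            by[i] += h[i] * v
    return solve3(g, bx), solve3(g, by)

def residuals(affine, src, dst):
    """Distance between each mapped src point and its dst point."""
    (a, b, tx), (c, d, ty) = affine
    return [math.hypot(a * x + b * y + tx - u, c * x + d * y + ty - v)
            for (x, y), (u, v) in zip(src, dst)]
```

A candidate match whose residual under the fitted map is large would be rejected as geometrically inconsistent; the threshold is a design choice not specified here.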

Page 31: Learning Local Affine Representations for Texture and Object Recognition

Matching: 3D Objects

Page 32: Learning Local Affine Representations for Texture and Object Recognition

Matching: 3D Objects

[Figure: closeups]

Page 33: Learning Local Affine Representations for Texture and Object Recognition

Matching: Faces

spurious match ???

Page 34: Learning Local Affine Representations for Texture and Object Recognition

Finding Symmetries

Page 35: Learning Local Affine Representations for Texture and Object Recognition

Finding Repeated Patterns and Symmetries

Page 36: Learning Local Affine Representations for Texture and Object Recognition

Learning Object Models for Recognition

• Match multiple pairs of training images to produce a set of candidate parts

• Use additional validation images to evaluate repeatability of parts and individual regions

• Retain a fixed number of parts having the best repeatability score

Page 37: Learning Local Affine Representations for Texture and Object Recognition

Recognition Experiment: Butterflies

• 16 training images (8 pairs) per class
• 10 validation images per class
• 437 test images
• 619 images total

Classes: Admiral, Swallowtail, Machaon, Monarch 1, Monarch 2, Peacock, Zebra

Page 38: Learning Local Affine Representations for Texture and Object Recognition

Butterfly Parts

Page 39: Learning Local Affine Representations for Texture and Object Recognition

Recognition

• Top 10 parts per class used for recognition
• Relative repeatability score: (total number of regions detected) / (total part size)
• Classification results:

[Table: classification results per class, including total part size (smallest/largest)]
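The relative repeatability score can be sketched as a simple ratio. Assigning a test image to the class with the highest score is an assumption made here for illustration, as are the function names and the toy counts; only the ratio itself comes from the slide.

```python
def relative_repeatability(detected_counts, part_sizes):
    """Total number of regions detected divided by total part size."""
    return sum(detected_counts) / sum(part_sizes)

def classify(detections_by_class, part_sizes_by_class):
    """Score each class's parts on one test image and pick the best class."""
    scores = {c: relative_repeatability(detections_by_class[c],
                                        part_sizes_by_class[c])
              for c in detections_by_class}
    return max(scores, key=scores.get), scores

# Toy example with two parts per class (made-up counts).
part_sizes = {"Admiral": [10, 20], "Monarch 1": [15, 15]}
detections = {"Admiral": [8, 16], "Monarch 1": [3, 6]}
label, scores = classify(detections, part_sizes)
```

Dividing by the total part size keeps classes with larger parts from winning simply because they have more regions to detect.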

Page 40: Learning Local Affine Representations for Texture and Object Recognition

Classification Rate vs. Number of Parts

Page 41: Learning Local Affine Representations for Texture and Object Recognition

Detection Results (ROC Curves)

Circles: reference relative repeatability rates. Red square: ROC equal error rate (in parentheses)

Page 42: Learning Local Affine Representations for Texture and Object Recognition

Successful Detection Examples

[Figure: training images; test images (blue: occluded regions); all ellipses found in the test images]

Page 43: Learning Local Affine Representations for Texture and Object Recognition

Unsuccessful Detection Examples

[Figure: training images; test images (blue: occluded regions); all ellipses found in the test image]

Page 44: Learning Local Affine Representations for Texture and Object Recognition

Summary

• Semi-local affine parts for describing structure of 3D objects
• Finding a part vocabulary:
– Correspondence search between pairs of images
– Validation
• Additional application:
– Finding symmetry and repetition

Future Work

• Find a better affine region detector
• Represent, learn inter-part relations
• Evaluation: CalTech database, harder classes, etc.

Page 45: Learning Local Affine Representations for Texture and Object Recognition

Birds

[Figure: example images of Egret, Snowy Owl, Mandarin Duck, Wood Duck, Puffin]

Page 46: Learning Local Affine Representations for Texture and Object Recognition

Birds: Candidate Parts

[Figure: candidate parts for Mandarin Duck and Puffin]

Page 47: Learning Local Affine Representations for Texture and Object Recognition

Objects without Characteristic Texture

(LeCun’04)

Page 48: Learning Local Affine Representations for Texture and Object Recognition

Summary of Talk

1. Recognition of single-texture images
• Distribution of local appearance descriptors

2. Recognition of individual regions in multi-texture images
• Local appearance + loose statistical neighborhood relations

3. Recognition of object categories
• Local appearance + strong geometric relations

For more information: http://www-cvr.ai.uiuc.edu/ponce_grp

Page 49: Learning Local Affine Representations for Texture and Object Recognition

Issues, Extensions

• Weakly supervised learning
– Evaluation methods?
– Learning from contaminated data?
• Probabilistic vs. geometric approaches to invariance
• EM vs. direct correspondence search
• Training set size
• Background modeling
• Strengthening the representation
– Heterogeneous local features
– Automatic feature selection
– Inter-part relations