Computer Vision, Part 2

Object recognition and scene “understanding”

• What makes object recognition a hard task for computers?

HMAX: Riesenhuber, M. & Poggio, T. (1999), “Hierarchical Models of Object Recognition in Cortex”

Serre, T., Wolf, L., Bileschi, S., Riesenhuber, M., and Poggio, T. (2006), “Robust Object Recognition with Cortex-Like Mechanisms”

• HMAX: A hierarchical neural-network model of object recognition.

• Meant to model human vision at level of “immediate recognition” capabilities of ventral visual pathway, independent of attention or other top-down processes.

• Also called “Standard Model” (because it incorporates the “standard model” of visual cortex)

• Inspired by earlier “Neocognitron” model of Fukushima (1980)

General ideas behind model

• “Immediate” visual processing is feedforward and hierarchical: low levels detect simple features, which are combined hierarchically into increasingly complex features to be detected

• Layers of hierarchy alternate between “sensitivity” (to detecting features) and “invariance” (to position, scale, orientation)

• Size of receptive fields increases along the hierarchy

• Degree of invariance increases along the hierarchy

The HMAX model for object recognition (Riesenhuber, Poggio, Serre, et al.)

Image (gray-scale)

S1 layer: Edge detectors

C1 layer: Max over local S1 units

S2 layer: Prototypes (small image patches)

C2 layer: Max activation over each prototype

Classification layer: Object or image classification

Layers alternate between “specificity” and “invariance” over position, scale, orientation

Job of HMAX is to produce a higher-level representation of an image that will be useful for classification.

S1 layer: Edge detectors at 4 orientations, 16 scales

[Figure: one S1 receptive field; MAX taken over groups of S1 units]

C1 layer: Max activation over local S1 units (local position, scale)
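The S1 and C1 stages can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes Gabor filters as the S1 edge detectors, uses only two orientations and two scales instead of 4 and 16, and the function names and parameter values (`gabor_kernel`, `s1_response`, `c1_pool`, the filter sizes and wavelengths) are made up for the example.

```python
import math

def gabor_kernel(size, wavelength, sigma, theta, aspect=0.3):
    """Build a zero-mean Gabor kernel (size must be odd) as a list of rows."""
    half = size // 2
    rows = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the sinusoid varies along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (aspect * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        rows.append(row)
    # Subtract the mean so uniform image regions give zero response.
    mean = sum(map(sum, rows)) / size ** 2
    return [[v - mean for v in row] for row in rows]

def s1_response(image, kernel):
    """S1: absolute filter response at every pixel (borders are replicated)."""
    h, w, k = len(image), len(image[0]), len(kernel)
    half = k // 2

    def pixel(i, j):
        return image[min(max(i, 0), h - 1)][min(max(j, 0), w - 1)]

    return [[abs(sum(kernel[a][b] * pixel(i + a - half, j + b - half)
                     for a in range(k) for b in range(k)))
             for j in range(w)]
            for i in range(h)]

def c1_pool(s1_maps, pool=4, stride=2):
    """C1: max over a pool x pool neighbourhood and over all given scales."""
    h, w = len(s1_maps[0]), len(s1_maps[0][0])
    return [[max(m[i + a][j + b] for m in s1_maps
                 for a in range(pool) for b in range(pool))
             for j in range(0, w - pool + 1, stride)]
            for i in range(0, h - pool + 1, stride)]

# Toy 12x12 image with a vertical edge: dark left half, bright right half.
image = [[0.0] * 6 + [1.0] * 6 for _ in range(12)]
# (size, wavelength, sigma) for two neighbouring scales; illustrative values only.
scales = [(5, 4.0, 2.0), (7, 5.6, 2.8)]

def c1_for(theta):
    maps = [s1_response(image, gabor_kernel(s, lam, sig, theta))
            for s, lam, sig in scales]
    return c1_pool(maps)

c1_vertical = c1_for(0.0)            # sinusoid varies along x: fires on the edge
c1_horizontal = c1_for(math.pi / 2)  # sinusoid varies along y: nearly silent here
```

Pooling with MAX over neighbouring positions and scales is what buys the invariance: the C1 value stays high if the edge shifts by a pixel or two or changes size slightly. A practical implementation would use an array library rather than nested lists.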

S2 layer: Calculate similarity to prototype (radial basis function)

Input from C1 layer: 4 orientations, 8 scales

S2 unit: Calculate similarity to prototype for each “pooled” position in C1 layer.

Prototypes (~1000, chosen from image collection, translated to C1 features)

Similarity: Radial basis function:

$S2_i = \exp\left(-\gamma \, \lVert X - P_i \rVert^2\right)$
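As a rough sketch, the radial basis function can be written in a few lines of Python. Here X is taken to be a patch of C1 values flattened into a list and P_i a prototype of the same length. The function name `s2_response` and the default value of gamma are assumptions made for the example.

```python
import math

def s2_response(c1_patch, prototype, gamma=1.0):
    """RBF similarity exp(-gamma * ||X - P_i||^2) between two flat vectors."""
    if len(c1_patch) != len(prototype):
        raise ValueError("patch and prototype must have the same length")
    squared_distance = sum((x - p) ** 2 for x, p in zip(c1_patch, prototype))
    return math.exp(-gamma * squared_distance)

# A patch identical to the prototype gives the maximum response of 1.0;
# the response falls toward 0 as the patch moves away from the prototype.
same = s2_response([0.2, 0.5], [0.2, 0.5])
far = s2_response([0.0, 0.0], [1.0, 1.0], gamma=0.5)
```

A larger gamma makes the unit more selective: the response drops off faster as the patch departs from the prototype.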

C2 layer: Max activation over position, orientation, scale

S2_1, S2_2, … → MAX (1 value per prototype)

Example C2 output: .11 .78 … .32
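The C2 step can be sketched as follows: for each prototype, keep only the best S2 response found anywhere in the image. The code assumes that the C1 patches from all positions and scales have already been gathered into one list of flat vectors. The name `c2_features` and the toy numbers are illustrative only.

```python
import math

def c2_features(c1_patches, prototypes, gamma=1.0):
    """Return one value per prototype: the max RBF response over all patches."""
    features = []
    for prototype in prototypes:
        best = max(
            math.exp(-gamma * sum((x - p) ** 2 for x, p in zip(patch, prototype)))
            for patch in c1_patches
        )
        features.append(best)
    return features

# Three toy C1 patches (from different positions/scales) and two prototypes.
patches = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
prototypes = [[1.0, 1.0], [0.0, 1.0]]
c2_vector = c2_features(patches, prototypes)
```

The result has a fixed length (one entry per prototype) regardless of image size, and it no longer records where in the image each prototype matched. That fixed-length vector is what the classifier receives.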

Support Vector Machine classification (e.g., dog / not dog)
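Once trained, a linear SVM classifies a C2 vector by the sign of a weighted sum. The sketch below shows only that decision step. It assumes a linear kernel, and the weights and bias are invented numbers standing in for the result of training. Training itself is not shown.

```python
def svm_decision(c2_vector, weights, bias):
    """Signed score of a linear SVM: w . x + b."""
    return sum(w * x for w, x in zip(weights, c2_vector)) + bias

def classify(c2_vector, weights, bias):
    """Positive score means 'dog', otherwise 'not dog'."""
    return "dog" if svm_decision(c2_vector, weights, bias) > 0 else "not dog"

# Hypothetical weights for a 3-prototype C2 vector (not from a real model).
weights = [1.5, -2.0, 0.5]
bias = -0.2

score_a = svm_decision([0.11, 0.78, 0.32], weights, bias)
label_a = classify([0.11, 0.78, 0.32], weights, bias)
label_b = classify([0.90, 0.10, 0.40], weights, bias)
```

The magnitude of the score indicates how far the example lies from the decision boundary, which is why a detector can compare it to a threshold instead of using the sign alone.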

Streetscenes “scene understanding” system (Bileschi, 2006)

Use HMAX + SVM to identify object classes: Car, Pedestrian, Bicycle, Building, Tree

How Streetscenes Works (Bileschi, 2006)

1. Densely tile the image with windows of different sizes.

2. C1 and C2 features are computed in each window.

3. The features in each window are given as input to each of five trained support vector machines.

4. If any return a classification with score above a learned threshold, that object is said to be “detected”.
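The four steps above can be sketched as a sliding-window loop. This is a simplified outline, not Bileschi's code: the window stride, the stand-in feature function, the stub classifiers, and the threshold values are all assumptions chosen so the example runs on its own.

```python
def tile_windows(img_w, img_h, sizes, stride_fraction=0.5):
    """Step 1: yield (x, y, size) for overlapping square windows."""
    for size in sizes:
        step = max(1, int(size * stride_fraction))
        for y in range(0, img_h - size + 1, step):
            for x in range(0, img_w - size + 1, step):
                yield (x, y, size)

def detect(img_w, img_h, sizes, features_fn, classifiers, thresholds):
    """Steps 2-4: featurize each window, score it, keep scores above threshold."""
    detections = []
    for window in tile_windows(img_w, img_h, sizes):
        features = features_fn(window)                  # step 2 (C1/C2 in the real system)
        for label, score_fn in classifiers.items():     # step 3: one scorer per class
            score = score_fn(features)
            if score > thresholds[label]:               # step 4: learned threshold
                detections.append((label, window, score))
    return detections

classes = ["car", "pedestrian", "bicycle", "building", "tree"]

# Stand-ins: the "feature" is just the relative window size, and only the
# hypothetical car scorer ever responds. Real scorers would be trained SVMs.
features_fn = lambda window: [window[2] / 8]
classifiers = {name: (lambda f: 0.0) for name in classes}
classifiers["car"] = lambda f: f[0]
thresholds = {name: 0.9 for name in classes}

windows = list(tile_windows(8, 8, sizes=(4, 8)))
detections = detect(8, 8, (4, 8), features_fn, classifiers, thresholds)
```

Because every window is scored independently by every classifier, one window can yield several labels, and overlapping windows can report the same object more than once.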

Object detection (here, “car”) with HMAX model (Bileschi, 2006)

Sample of results from HMAX model (Serre et al., 2006)
