Hierarchies in Object Recognition and Classification
Nat Twarog, 6.870, November 24, 2008
TRANSCRIPT
What's a hierarchy?

Mathematics: a partial ordering. Wikipedia: "A hierarchy is an arrangement of objects, people, elements, values, grades, orders, classes, etc., in a ranked or graduated series."

Basically, a hierarchy is a system by which things are arranged in an order, typically with all but one item having a "parent" (or occasionally multiple "parents") immediately above it, and any number of "children" immediately below it.
Compositional Hierarchy (Biology):
Population → Organism → Organ → Tissue → Cell → Organelle → Protein

Containment Hierarchy (Biology):
Animalia → Chordata → Mammalia → Primates → Hominidae → Homo → sapiens

Containment Hierarchy (Mathematics):
Shape → Polygon → Quadrilateral
  Quadrilateral → Trapezoid, Kite, Parallelogram
  Parallelogram → Rectangle, Rhombus
  Rectangle, Rhombus → Square
Feature Hierarchies for Object Classification
Boris Epshtein & Shimon Ullman, ICCV '05

Visual Features of Intermediate Complexity and Their Use in Classification
Shimon Ullman, Michel Vidal-Naquet & Erez Sali, Nature Neuroscience, July 2002
Motivation

Most object recognition schemes using parts or features fall into one of two categories:
- Using small, basic features like Gabor patches or SIFT features
- Using large, global, complex features like templates or constellations

However, small features return false positives far too often, and large global features are too picky.
Intermediate Features - Better?

To test the hypothesis that features of intermediate complexity (IC) might be optimal for use in object classification, image fragments of varying size were extracted from images of frontal faces and side views of cars.

Fragments were sorted by location, and for each fragment the mutual information was calculated:

I(C, F) = H(C) - H(C|F)

where C denotes the class, F denotes the feature, and H denotes entropy.
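For binary class labels and binary feature detections, this measure can be computed directly. A minimal sketch of the slide's formula, not the authors' code (the function name is ours):

```python
from math import log2

def mutual_information(class_labels, feature_detected):
    """I(C;F) = H(C) - H(C|F) for binary class labels and feature detections."""
    pairs = list(zip(class_labels, feature_detected))
    n = len(pairs)

    def entropy(values):
        h = 0.0
        for v in set(values):
            p = values.count(v) / len(values)
            h -= p * log2(p)
        return h

    # H(C): uncertainty about the class before seeing the feature
    h_c = entropy(list(class_labels))

    # H(C|F): expected remaining uncertainty after seeing the feature
    h_c_given_f = 0.0
    for f_val in set(feature_detected):
        subset = [c for c, f in pairs if f == f_val]
        h_c_given_f += (len(subset) / n) * entropy(subset)

    return h_c - h_c_given_f
```

A feature that perfectly predicts the class carries I = H(C) bits; one that is independent of the class carries zero.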
Results
Conclusion

IC features carry more information than small, simple features and large, complex ones.
- The rationale is simple: IC features represent a minimal trade-off between frequency and specificity.
- But use of IC features raises a new problem: there are far more IC features than small, simple features.
- Solution: hierarchical object classification.
Hierarchical Classification

Basic idea: in traditional object classification, objects are detected due to a sufficient presence/arrangement of some subset of features, e.g. Cow ~ body, tail, legs, head.

If features are too simple, there are too many false positives; if too complex, too many misses.

Solution: represent parts as their own network of subparts, e.g. Head ~ eyes, snout, ears, neck.
Examples
Previous Work
Several algorithms have been proposed using more informative IC features, but without any hierarchical representation
Other algorithms have used hierarchical representation of features, but hierarchies had predefined structures and/or used features which were significantly less informative than IC features
Structure of Training Algorithm

1. Find the most informative set of IC features.
2. For each feature, find the most informative set of sub-features.
3. Continue subdivision until sub-features no longer significantly increase informativeness.
4. Select weights and regions of interest for all features and sub-features by optimizing mutual information.
Find informative features

For a given class, collect >10,000 image fragments from positive images (sized 10%-50% of the image on either axis).

Select the fragment with the highest mutual information. For each subsequent fragment, choose the fragment which provides the most additional information.

Continue until the best fragment adds insignificant information or a set number of features is reached.
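The greedy selection loop can be sketched as follows, treating each candidate fragment as a binary detection vector over the training images. This is an illustration of the procedure described above, not the authors' implementation; the stopping thresholds are made up for the example:

```python
from math import log2

def entropy(values):
    h = 0.0
    for v in set(values):
        p = values.count(v) / len(values)
        h -= p * log2(p)
    return h

def joint_mi(labels, detection_vectors):
    """I(C; F1..Fk), where each Fi is a binary detection vector over images."""
    joint = list(zip(*detection_vectors))  # one response tuple per image
    h_cond = 0.0
    for t in set(joint):
        subset = [c for c, j in zip(labels, joint) if j == t]
        h_cond += (len(subset) / len(labels)) * entropy(subset)
    return entropy(list(labels)) - h_cond

def greedy_select(labels, candidates, min_gain=0.01, max_features=5):
    """Greedily add the candidate that most increases joint mutual
    information; stop when the best gain is insignificant."""
    selected, current = [], 0.0
    while len(selected) < max_features:
        best, best_gain = None, min_gain
        for name, vec in candidates.items():
            if name in selected:
                continue
            gain = joint_mi(labels, [candidates[s] for s in selected] + [vec]) - current
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:
            break  # no fragment adds significant information
        selected.append(best)
        current += best_gain
    return selected
```

Note that exact joint mutual information only scales to a handful of features; with thousands of candidates one would approximate the gain, as the paper's setting requires.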
Finding Sub-features

For any feature, the aim is to repeat the above process using the new feature as the target, rather than the overall object.
- Positive examples are fragments in class-positive images where the feature is detected or almost detected.
- Negative examples are fragments in class-negative images where the feature is detected or almost detected.

Sub-features are selected as before; if the set of sub-features increases information, they are added to the hierarchy.
Positive & Negative Feature Examples
Regions of Interest and Weights

Once all relevant features are found, their regions of interest (ROIs) are chosen to optimize mutual information.
- ROI size is optimized first for the root node, holding other nodes constant, then for child nodes, and so on.
- Relative weights and positions of ROIs are then simultaneously optimized using an iterative back-and-forth procedure.
Use of algorithm

For any given feature, the response is a linear combination of sub-feature responses:

r = w_0 + Σ_{i=1}^{n} w_i s_i

where s_i is the maximal response of sub-feature i in its ROI, and w_i is the weight of sub-feature i.

The response is then passed through a sigmoid to get the parent node's final response:

s_p = 2 / (1 + e^{-r}) - 1
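The linear combination and the sigmoid can be folded into one small function. This is a sketch of the slide's formulas under the assumption that w_0 is an additive bias term; names are illustrative:

```python
from math import exp

def node_response(sub_responses, weights, w0=0.0):
    """Parent-node response: r = w0 + sum_i(w_i * s_i), where s_i is the
    maximal response of sub-feature i in its ROI, followed by a sigmoid
    that maps r into the open interval (-1, 1)."""
    r = w0 + sum(w * s for w, s in zip(weights, sub_responses))
    return 2.0 / (1.0 + exp(-r)) - 1.0
```

Because each node's output lies in (-1, 1), it can in turn serve as an s_i for its own parent, which is what lets the hierarchy be evaluated bottom-up.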
Results
Issues and Questions

Optimization of features is not quite complete.
- Features are chosen for maximum mutual information, then have ROIs optimized: it's possible that some ultimately more optimal features are excluded.

The application process is not as computationally efficient.
- The response map must be calculated over the entire image for every node in the hierarchy.

Use of actual image fragments as features seems sub-optimal.
Coarse-to-Fine Classification and Scene Labeling
Donald Geman, 2003

Hierarchical Object Indexing and Sequential Learning
Xiaodong Fan & Donald Geman, ICPR 2004
Multi-class Object Classification
Suppose we wish to read symbols off of a license plate
With most algorithms we would test for the presence of every possible symbol one at a time, using a very large number of relevant object features
This is slow, inefficient, and often inaccurate
Multi-class Object Classification
What would be ideal would be a classifier which tests for multiple classes simultaneously
In addition, the classifier should not be forced to fully check for every class at every location
How to accomplish this? Let's play Twenty Questions
Twenty Questions

Is it a living thing? Yes. Is it a person? No. Is it a mammal? Yes. Is it a pet? Yes. Is it a dog? Yes. Is it a large dog? No. Is it a chihuahua? Yes.

Is it a living thing? Yes. Is it a person? Yes. Is it female? No. Is he famous? Yes. Is he alive? No. Did he die in the last 100 years? No. Was he a president? Yes. Is he Abraham Lincoln? Yes.
Twenty Questions

In Twenty Questions, a question is selected to pare down the set of possible answers as much as possible.

More importantly, each choice of question depends on the answers that came before.
- It wouldn't be very smart to ask "Is it a chihuahua?" after being told the answer was a president.
- Similarly, asking "Is it Abraham Lincoln?" doesn't make much sense after being told the answer is a dog.

The questions being asked are adaptive.
Coarse-to-Fine Classification

Apply a similar principle to object classification:
- Begin with broad tests to eliminate large numbers of possible hypotheses.
- According to the results of those tests, apply successively more specific tests.
- This set of tests creates a containment hierarchy of hypotheses.
Specifics

λ = (C, θ) is called an instance, where C is a particular class and θ, the presentation, includes information which varies within a class: size, position, orientation. The space of all possible λ will be called S.

Construct a containment hierarchy of select subsets of S, with leaf nodes corresponding to single instances (or sets of instances small enough that object classification is possible) or to the empty set, meaning no object of any class is present.

With each node, associate some test which will always return a positive response if any of the instances in that node's subset are present in the image.
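This node-and-test organization amounts to a recursive search over the hierarchy: run a node's cheap test, and only descend to its children if the test fires. A sketch under assumed data structures (the dict layout and field names are ours, not the paper's):

```python
def detect(node, image):
    """Evaluate a containment hierarchy: if the node's test fails, the
    entire subset of hypotheses below it is eliminated at once; otherwise
    recurse into the finer hypothesis subsets.

    `node` is a dict: {"test": callable, "instances": set, "children": list}.
    """
    if not node["test"](image):
        return set()                 # whole branch of hypotheses pruned
    if not node["children"]:
        return node["instances"]     # leaf: a single (or small set of) instance(s)
    found = set()
    for child in node["children"]:
        found |= detect(child, image)
    return found
```

The efficiency comes from the pruning step: one coarse test near the root can rule out every instance beneath it without ever running the finer tests.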
Toy Example

Classifying the letters {I, H, L, T}, all of which contain a vertical bar:

{I, H, L, T} — Vertical bar?
├─ No other marks → {I}
└─ Horizontal bar → {H, L, T}
   ├─ Another vertical → {H}
   └─ No other vertical → {L, T}
      ├─ Horizontal on top → {T}
      └─ Horizontal on bottom → {L}
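The toy tree can be written as a short decision procedure. Here the tests are handed in as booleans rather than computed from pixels, purely to make the coarse-to-fine ordering explicit:

```python
def classify_letter(has_second_vertical, has_other_marks, horizontal_on_top):
    """Coarse-to-fine classification of {I, H, L, T}: broad tests first,
    finer tests only on the hypotheses that survive."""
    if not has_other_marks:        # splits {I} from {H, L, T}
        return "I"
    if has_second_vertical:        # splits {H} from {L, T}
        return "H"
    return "T" if horizontal_on_top else "L"
```

Note that the most specific test (top vs. bottom horizontal) is only ever evaluated for the two hypotheses it can distinguish, which is the whole point of the hierarchy.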
First Implementation: Faces

One class – faces; presentations vary in position, size, and orientation.

Successive levels of the hierarchy decrease the space of possible presentations.

The initial test of the detector simply tests for the rough possibility of a face: low discrimination, high false alarm rate. Only later tests, adapted to responses, have higher discriminability.
Advantages

High computational efficiency: analysis of a 400x400 pixel image can take place in ~1 second.

Computational resources are far more likely to be used in locations where there is more likely to be a face, often by several orders of magnitude.
Resource Distribution
Limitations
The nature of the tests in the hierarchy leads to a high number of false positives
According to Geman et al., false positives and multiple detections can be dealt with in post-processing
Second Implementation: License Plates
37 object classes: 26 letters, 10 digits, and a vertical bar
Presentation includes position, size, and tilt (assumed to be fairly small, < 5°)
Hierarchy of instances constructed using “hierarchical clustering”
Tests consist of weighted sum of responses of local orientation detectors, trained on positive & negative examples created by varying size, position and tilt of given templates
Results

The algorithm returns ~400 false alarms per image. Clustering class-consistent results and suppressing reduces this number to ~40. Further processing (not described here) gives fairly satisfactory results.
Conclusions

Most important insight: the use of hierarchical, adaptive processing can greatly improve the efficiency of object recognition algorithms, particularly when classifying multiple closely related classes like characters, animals, or vehicles.

Potential limitations of these particular algorithms relate to high false positive rates. These limitations, however, do not negate the positive effects on efficiency nor the potential contribution to object recognition architecture.
References

D. Geman, "Coarse-to-fine classification and scene labeling," in Nonlinear Estimation and Classification (eds. D. D. Denison et al.), Lecture Notes in Statistics. New York: Springer-Verlag, 31-48, 2003.

F. Fleuret and D. Geman, "Fast face detection with precise pose estimation," Proceedings ICPR 2002, 1, 235-238, 2002.

X. Fan and D. Geman, "Hierarchical object indexing and sequential learning," Proceedings ICPR 2004, 3, 65-68, 2004.

S. Ullman, M. Vidal-Naquet, and E. Sali, "Visual features of intermediate complexity and their use in classification," Nature Neuroscience, 5(7), 1-6, 2002.

B. Epshtein and S. Ullman, "Feature hierarchies for object classification," Proceedings ICCV 2005, 1, 220-227, 2005.