![Page 1: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/1.jpg)
Computer Vision Group, University of California, Berkeley
On Visual Recognition
Jitendra Malik UC Berkeley
![Page 2: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/2.jpg)
From Pixels to Perception
Figure labels: Tiger, Grass, Water, Sand; outdoor, wildlife; Tiger; tail, eye, legs, head, back, shadow, mouse
![Page 3: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/3.jpg)
Object Category Recognition
![Page 4: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/4.jpg)
Defining Categories
• What is a “visual category”?
– Not semantic
– Working hypothesis: Two instances of the same category must have “correspondence” (i.e. one can be morphed into the other)
• e.g. Four-legged animals
– Biederman’s estimate of 30,000 basic visual categories
![Page 5: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/5.jpg)
Facts from Biological Vision
• Timing
• Abstraction/Generalization
• Taxonomy and Partonomy
![Page 6: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/6.jpg)
Detection can be very fast
• On a task of judging animal vs. no animal, humans can make mostly correct saccades in 150 ms (Kirchner & Thorpe, 2006)
– Comparable to the cumulative synaptic delay along the retina, LGN, V1, V2, V4, IT pathway.
– Doesn’t rule out feedback, but shows that feedforward processing alone is very powerful
![Page 7: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/7.jpg)
Plot: accuracy (corrected for guessing) vs. exposure [ms], 0 to 200 ms, for three tasks: detection (obj vs. texture), categorization (car vs. obj), identification (jeep vs. car)
As Soon as You Know It Is There, You Know What It Is
Grill-Spector & Kanwisher, Psychological Science, 2005
![Page 8: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/8.jpg)
Abstraction/Generalization
• Configurations of oriented contours
• Considerable tolerance for small deformations
![Page 9: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/9.jpg)
Attneave’s Cat (1954)
Line drawings convey most of the information
![Page 10: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/10.jpg)
Taxonomy and Partonomy
• Taxonomy: e.g. cats are in the family Felidae, which in turn is in the class Mammalia
– Recognition can be at multiple levels of categorization, or be identification at the level of specific individuals, as in faces.
• Partonomy: Objects have parts, they have subparts and so on. The human body contains the head, which in turn contains the eyes.
• These notions apply equally well to scenes and to activities.
• Psychologists have argued that there is a “basic-level” at which categorization is fastest (Eleanor Rosch et al).
• In a partonomy, each level contributes useful information for recognition.
![Page 11: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/11.jpg)
Matching with Exemplars
• Use exemplars as templates
• Correspond features between query and exemplar
• Evaluate similarity score
Figure labels: Query Image, Database of Templates
![Page 12: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/12.jpg)
Matching with Exemplars
• Use exemplars as templates
• Correspond features between query and exemplar
• Evaluate similarity score
Figure labels: Query Image, Database of Templates
Best matching template is a helicopter
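The three steps on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: shapes are small lists of 2-D points, and the similarity score is a simple nearest-point distance that stands in for the correspondence-based score used in the talk. The names `directed_distance`, `dissimilarity`, and `best_match` are hypothetical.

```python
import math

def directed_distance(a, b):
    """Mean distance from each point in `a` to its nearest point in `b`."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)

def dissimilarity(query, template):
    """Symmetric score: lower means the two point sets are more alike."""
    return directed_distance(query, template) + directed_distance(template, query)

def best_match(query, database):
    """Return (label, score) for the template most similar to the query.

    `database` is a list of (label, points) pairs, one pair per exemplar.
    """
    scored = ((label, dissimilarity(query, pts)) for label, pts in database)
    return min(scored, key=lambda pair: pair[1])
```

Calling `best_match(query_points, database)` returns the label of the closest exemplar, which plays the role of "best matching template" above.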
![Page 13: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/13.jpg)
3D objects using multiple 2D views
View selection algorithm from Belongie, Malik & Puzicha (2001)
![Page 14: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/14.jpg)
Error vs. Number of Views
![Page 15: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/15.jpg)
Three Big Ideas
• Correspondence based on local shape/appearance descriptors
• Deformable Template Matching
• Machine learning for finding discriminative features
![Page 16: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/16.jpg)
Three Big Ideas
• Correspondence based on local shape/appearance descriptors
• Deformable Template Matching
• Machine learning for finding discriminative features
![Page 17: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/17.jpg)
Comparing Pointsets
![Page 18: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/18.jpg)
Shape Context
Count the number of points inside each bin, e.g.:
Count = 4
Count = 10
...
Compact representation of distribution of points relative to each point
(Belongie, Malik & Puzicha, 2001)
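The bin counting described on this slide can be sketched as a log-polar histogram. The bin layout here (5 radial by 12 angular bins, radii from 0.125 to 2 after dividing by the mean pairwise distance) is an assumption chosen for illustration, and `shape_context` is a hypothetical helper name.

```python
import math

def mean_pairwise_distance(points):
    """Average distance over all unordered point pairs (used as the scale)."""
    n = len(points)
    total = sum(math.dist(points[i], points[j])
                for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)

def shape_context(points, index, n_radial=5, n_angular=12, r_min=0.125, r_max=2.0):
    """Count the other points falling in each log-polar bin around points[index]."""
    scale = mean_pairwise_distance(points)
    log_min, log_max = math.log(r_min), math.log(r_max)
    # Outer edges of the radial bins, spaced evenly in log(r).
    edges = [math.exp(log_min + (log_max - log_min) * k / (n_radial - 1))
             for k in range(n_radial)]
    hist = [[0] * n_angular for _ in range(n_radial)]
    px, py = points[index]
    for i, (x, y) in enumerate(points):
        if i == index:
            continue
        r = math.hypot(x - px, y - py) / scale
        if r > edges[-1]:
            continue  # beyond the outermost ring: not counted
        r_bin = next(k for k, edge in enumerate(edges) if r <= edge)
        theta = math.atan2(y - py, x - px) % (2 * math.pi)
        t_bin = min(int(theta / (2 * math.pi) * n_angular), n_angular - 1)
        hist[r_bin][t_bin] += 1
    return hist
```

Each row of the returned table is one ring and each column one angular wedge, so an entry is a "Count = ..." value like those shown on the slide.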
![Page 19: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/19.jpg)
Shape Context
![Page 20: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/20.jpg)
Geometric Blur (Local Appearance Descriptor)
Geometric Blur Descriptor
• Compute sparse channels from image
• Extract a patch in each channel
• Apply spatially varying blur and sub-sample
(Figure: idealized signal)
Descriptor is robust to small affine distortions
Berg & Malik '01
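The "spatially varying blur and sub-sample" step can be illustrated on a 1-D idealized signal. This sketch assumes the blur width grows linearly with distance from the feature point, sigma = alpha * |x| + beta. The parameter values and the function names are hypothetical, and the sparse-channel step is omitted.

```python
import math

def gaussian_blur_at(signal, center, sigma):
    """Gaussian-weighted average of `signal` around index `center`."""
    center = min(max(center, 0), len(signal) - 1)  # clamp to the signal
    radius = max(1, int(3 * sigma))
    num = den = 0.0
    for k in range(center - radius, center + radius + 1):
        if 0 <= k < len(signal):
            w = math.exp(-0.5 * ((k - center) / sigma) ** 2)
            num += w * signal[k]
            den += w
    return num / den

def geometric_blur(signal, x0, offsets, alpha=0.5, beta=1.0):
    """Sample around x0, blurring more the farther a sample is from x0."""
    return [gaussian_blur_at(signal, x0 + x, alpha * abs(x) + beta)
            for x in offsets]
```

Samples near the feature point stay sharp while distant ones are heavily smoothed, which is why a small distortion that shifts far-away structure changes the descriptor only slightly.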
![Page 21: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/21.jpg)
Three Big Ideas
• Correspondence based on local shape/appearance descriptors
• Deformable Template Matching
• Machine learning for finding discriminative features
![Page 22: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/22.jpg)
Modeling shape variation in a category
• D’Arcy Thompson: On Growth and Form, 1917
– studied transformations between shapes of organisms
![Page 23: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/23.jpg)
Matching Example
model target
![Page 24: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/24.jpg)
Handwritten Digit Recognition
• MNIST 60 000:
– linear: 12.0%
– 40 PCA + quad: 3.3%
– 1000 RBF + linear: 3.6%
– K-NN: 5%
– K-NN (deskewed): 2.4%
– K-NN (tangent dist.): 1.1%
– SVM: 1.1%
– LeNet 5: 0.95%
• MNIST 600 000 (distortions):
– LeNet 5: 0.8%
– SVM: 0.8%
– Boosted LeNet 4: 0.7%
• MNIST 20 000:
– K-NN, Shape Context matching: 0.63%
![Page 25: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/25.jpg)
![Page 26: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/26.jpg)
EZ-Gimpy Results
• 171 of 192 images correctly identified: 92 %
horse
smile
canvas
spade
join
here
![Page 27: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/27.jpg)
Three Big Ideas
• Correspondence based on local shape/appearance descriptors
• Deformable Template Matching
• Machine learning for finding discriminative features
![Page 28: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/28.jpg)
Discriminative learning
(Frome, Singer, Malik, 2006)
weights on patch features in training images
distance functions from training images to any other images
browsing, retrieval, classification
83/400, 79/400
![Page 29: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/29.jpg)
triplets
• learn from relative similarity
image i, image j, image k
want:
image-to-image distances based on feature-to-image distances
compare image-to-image distances
![Page 30: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/30.jpg)
focal image version
image i (focal)
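image j, image k
Feature-to-image distances from the focal image's patches: 0.3 and 0.4 to image j, 0.8 and 0.2 to image k
$$x_{ijk} = d_{ik} - d_{ij} = (0.8,\ 0.2,\ \dots) - (0.3,\ 0.4,\ \dots) = (0.5,\ -0.2,\ \dots)$$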
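The arithmetic on this slide can be checked with a short sketch. It uses the values shown (0.8, 0.2 for image k and 0.3, 0.4 for image j) and assumes the image-to-image distance is a weighted sum of feature-to-image distances, with image j taken to be the one that should rank closer to the focal image. The uniform weights and the function names are hypothetical.

```python
def image_distance(weights, feature_distances):
    """Image-to-image distance as a weighted sum of feature-to-image distances."""
    return sum(w * d for w, d in zip(weights, feature_distances))

def triplet_vector(d_ik, d_ij):
    """Element-wise difference x_ijk = d_ik - d_ij for one triplet."""
    return [a - b for a, b in zip(d_ik, d_ij)]

d_ij = [0.3, 0.4]   # distances from the focal image's patches to image j
d_ik = [0.8, 0.2]   # distances from the same patches to image k
x_ijk = triplet_vector(d_ik, d_ij)   # approximately [0.5, -0.2]

w = [1.0, 1.0]      # assumed uniform weights, for illustration only
margin = image_distance(w, x_ijk)    # positive when j is ranked closer than k
```

Because the distance is linear in the weights, comparing two image-to-image distances reduces to the sign of the dot product of `w` with `x_ijk`.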
![Page 31: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/31.jpg)
large-margin formulation
slack variables like soft-margin SVM
w constrained to be positive
L2 regularization
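One way to write a problem with the three ingredients named on this slide is given below. It is a sketch rather than a transcription of the slide's equation: the trade-off constant $C$ and the set of training triplets $T$ are assumed notation, and the non-negativity constraint on $w$ is the reading taken here of "constrained to be positive".

$$
\begin{aligned}
\min_{w,\ \xi}\quad & \tfrac{1}{2}\lVert w \rVert^2 + C \sum_{(i,j,k) \in T} \xi_{ijk} \\
\text{s.t.}\quad & w \cdot x_{ijk} \ge 1 - \xi_{ijk}, \qquad \xi_{ijk} \ge 0 \quad \text{for all } (i,j,k) \in T, \\
& w \ge 0 \ \text{(element-wise)}
\end{aligned}
$$

Here $x_{ijk} = d_{ik} - d_{ij}$, so each constraint asks that image $k$ be at least a unit margin farther from the focal image than image $j$, up to the slack $\xi_{ijk}$.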
![Page 32: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/32.jpg)
Caltech-101 [Fei-Fei et al. 04]
• 102 classes, 31-300 images/class
![Page 33: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/33.jpg)
retrieval example
query image
retrieval results:
![Page 34: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/34.jpg)
Caltech 101 classification results
(see Manik Varma’s talks for the best results yet)
![Page 35: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/35.jpg)
15 training/class, 63.2%
![Page 36: Computer Vision Group University of California Berkeley On Visual Recognition Jitendra Malik UC Berkeley](https://reader035.vdocument.in/reader035/viewer/2022070404/56649f355503460f94c52cef/html5/thumbnails/36.jpg)
Conclusion
• Correspondence based on local shape/appearance descriptors
• Deformable Template Matching
• Machine learning for finding discriminative features
• Integrating Perceptual Organization and Recognition