the three r’s of vision jitendra malik. the three r’s of vision

46
The Three R’s of Vision Jitendra Malik

Upload: cory-ball

Post on 28-Jan-2016

219 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

The Three R’s of Vision

Jitendra Malik

Page 2: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

The Three R’s of Vision

Page 3: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

Recognition, Reconstruction & Reorganization

Page 4: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

Review

• Recognition– 2D problems such as handwriting recognition, face

detection– Partial progress on 3d object category recognition

• Reconstruction– Feature matching + multiple view geometry has led to

city scale point cloud reconstructions

• Reorganization– Graphcuts for interactive segmentation– Bottom up boundaries and regions/superpixels

Page 5: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

Caltech-101 [Fei-Fei et al. 04]• 102 classes, 31-300 images/class

Page 6: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

Caltech 101 classification results

(even better by combining cues..)

Page 7: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

PASCAL Visual Object Challenge (Everingham et al)

Page 8: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision
Page 9: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

Builds on Dalal & Triggs HOG detector (2005)

Page 10: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

Critique of the State of the Art

• Performance is quite poor compared to that at 2d recognition tasks and the needs of many applications.

• Pose Estimation / Localization of parts or keypoints is even worse. We can’t isolate decent stick figures from radiance images, making use of depth data necessary.

• Progress has slowed down. Variations of HOG/Deformable part models dominate.

Page 11: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

Recall saturates around 70%

Page 12: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

Recall saturates around 70%

Page 13: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

State of the Art in Reconstruction

• Multiple photographs • Range Sensors

Agarwal et al (2010)

Kinect (PrimeSense)

Velodyne Lidar

Critique: Semantic Segmentation is needed to make this more useful…

Page 14: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

State of the Art in Reorganization• Interactive segmentation

using graph cuts• Berkeley gPb edges &

regions

Rother, Kolmogorov & Blake (2004), Boykov & Jolly (2001), Boykov, Veksler & Zabih(2001)

Arbelaez et al (2009), Martin, Fowlkes, Malik (2004), Shi & Malik (2000)

Critique: What is needed is fully automatic semantic segmentation

Page 15: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

The Three R’s of Vision

Each of the 6 directed arcs in this diagram is a useful direction of information flow

Recognition

Reconstruction Reorganization

Page 16: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

The Three R’s of Vision

Recognition

Reconstruction Reorganization

Shape as feature

Morphable shape models

Superpixel assemblies as candidates

Page 17: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

Next steps in recognition

• Incorporate the “shape bias” known from child development literature to improve generalization– This requires monocular computation of shape, as once

posited in the 2.5D sketch, and distinguishing albedo and illumination changes from geometric contours

• Top down templates should predict keypoint locations and image support, not just information about category

• Recognition and figure-ground inference need to co-evolve. Occlusion is signal, not noise.

Page 18: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

Next steps in recognition

• Incorporate the “shape bias” known from child development literature– This requires monocular computation of shape, as once

posited in the 2.5D sketch, and distinguishing albedo and illumination changes from geometric contours

• Top down templates should predict keypoint locations and image support, not just information about category Poselets: Bourdev & Malik, 2009 & later

• Recognition and figure-ground inference need to co-evolve. Occlusion is signal, not noise.

Barron & Malik, CVPR 2012

Arbelaez et al, CVPR 2012

Page 19: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

ReconstructionShape, Albedo & Illumination

Albedo

Shape

Page 20: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

Shape, Albedo, and Illumination from Shading

Jonathan Barron Jitendra Malik

UC Berkeley

Page 21: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

Far

Near

shape / depth

Forward Optics

Page 22: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

Far

Near

shape / depth

Forward Optics

illumination

Page 23: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

Far

Near

log-shading image of Z and Lshape / depth

Forward Optics

illumination

Page 24: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

Far

Near

log-shading image of Z and Lshape / depth

log-albedo / log-reflectance

Forward Optics

illumination

Page 25: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

Far

Near

log-shading image of Z and Lshape / depth

log-albedo / log-reflectance

illumination

Lambertian reflectance in log-intensity

Forward Optics

Page 26: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

Far

Near

?? ??

??

Shape, Albedo, and Illumination from ShadingSAIFS (“safes”)

??log-shading image of Z and Lshape / depth

log-albedo / log-reflectance

illumination

Lambertian reflectance in log-intensity

Page 27: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

Assume illumination and albedo are known, and solve for the shape

??

Shape from Shading

??Ignore shape and illumination, and classify edges as either shading or albedo

??

Intrinsic Images

Past Work

Page 28: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

Input:

Output:

IlluminationShape Albedo Shading

Problem Formulation

Given a single image, search for the most likely explanation(shape, albedo, and illumination) that together exactly reproduces that image.

Page 29: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

Some Explanations

Page 30: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

Some Explanations

Page 31: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

Some Explanations

Page 32: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

Some Explanations

Page 33: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

MIT intrinsic images data set

Evaluation:

Page 34: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

Real World ImagesEvaluatio

n:

Page 35: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

Real World ImagesEvaluatio

n:

Page 36: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

Real World ImagesEvaluatio

n:

Page 37: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

Evaluation:Graphics!

Page 38: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

Evaluation:Graphics!

Page 39: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

Evaluation:Graphics!

Page 40: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

Evaluation:Graphics!

Page 41: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

Semantic Segmentation using Regions and Parts

P. Arbeláez, B. Hariharan, S. Gupta,

C. Gu, L. Bourdev and J. Malik

Page 42: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

This WorkTop-down Part/Object Detectors

0.93

Cat Segmenter

0.570.32

Bottom-up Region Segmentation

Page 43: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

Region Generation

• Hierarchical segmentation tree based on contrast• Hierarchy computed at three image resolutions• Nodes of the trees are object candidates, and also pairs and triplets of

adjacent nodes

• Output: For each image, a set of ~1000 object candidates (connected regions)

Page 44: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

Results on PASCAL VOC

Page 45: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

How to think about Vision

• “Theory” • Models– Feature Histograms– Support Vector machines– Randomized decision trees– Spectral partitioning– L1 minimization– Stochastic Grammars– Deep Learning– Markov Random Fields– …

Page 46: The Three R’s of Vision Jitendra Malik. The Three R’s of Vision

Thanks!