Open-World Vision Amir R. Zamir
What, Why, and How of “Representation”
Things... Our Knowledge...
3
“Transcript”
Cat
Macbeth was guilty.
4
“Transcript”
Cat
Macbeth was guilty.
[ 81 20 84 64 58 39 17 54 72 15]
Representation Mathematical Model (e.g., classifier)
5
~12 lbs
~8 lbs
-5 0 +20 7 15 11
X X X X X X X X X X X X X X X X X X X X X X X X
w
6
~12 lbs
~8 lbs
-5 0 +20 7 15 11
X X X X X X X X X X X X X X X X X X X X X X X X
w
7
~12 lbs
~8 lbs
-5 0 +20 7 15 11
X X X X X X X X X X X X X X X X X X X X X X X X
w
Weight (w)
Representation Mathematical Model (Classifier)
w>11
X X
Type B
Type A
8
Represent these cats for a cat detector!
9
Represent these cats for a cat detector! (II)
10
Represent these cats for a cat detector! (II)
11
Represent these cats for a cat detector! (III)
12
Represent these cats for a cat detector! (IV)
13
Color Histograms
Deformable Part based Models
(DPM)
Histogram of Gradients
(HOG)
Models based Shapes
14 Felzenszwalb et al. 2010. Dalal and Triggs, 2005. Beis and Lowe, 1997.
Not always as easy (Happy vs Sad)
15
Not always as easy (Sad)
16
Learning Representations
Convolutional Neural Network Autoencoder 17
LeCun et al. 1998. Hinton et al. 2006.
Two approaches to learning Unsupervised
Representation constrained on reconstruction.
18
Supervised
Representation constrained on task(s).
Two approaches to learning Supervised
Representation constrained on task(s).
Unsupervised
Representation constrained on reconstruction.
19 LeCun et al. 1998. Hinton et al. 2006.
20 Stanford CS231n
Lightning overview of Neural Networks
Neural Net
A Neuron
Convolutional X
Understanding Representations
Embedding 21 Maaten and Hinton, 2008.
Understanding Representations
Inverting a representation
[ 81 20 84 64 58 39 17 54 72 15]
22
Understanding Representations
Inverting a representation 23 Dosovitskiy and Brox, 2015.
Representations in NLP, Brain, Speech, etc.
Word2Vec (NLP) FMRI Scan (brain) 24 Mikolov et al. 2013
“Transcript”
Cat
Macbeth was guilty.
[ 81 20 84 64 58 39 17 54 72 15]
Representation Mathematical Model (e.g., classifier)
CS231n
CS331b CS229
25
Now that we’re done with background building…
26
Open-World (Generic) Representations
27
Open-World (Generic) Computer Vision
An Exciting Time!
Fully Supervised Learning
• Fully supervised learning is task specific. • Will not lead to a human-like comprehensive perception.
• Characterized by Generalization & Abstraction
➔ How to develop a system/representation with Generalization & Abstraction?
How to achieve generalization & abstraction?
• Proposition: • (Instead of providing supervision over the desired tasks) • Provide supervision over a set of selected foundational tasks ⇒ generalization to novel tasks and abstraction capabilities.
Held & Hein. 1963.
• But how to pick the foundational tasks? • Biology! • Inspirations from developmental
stages of visual skills in brainGeneric 3D Representation via Pose Estimation and Matching. ECCV 2016. Amir Zamir, Tilman Wekel, Pulkit Agrawal, Colin Wei, Jitendra Malik, Silvio Savarese.
Generic 3D Representation Learning
Generic 3D Representation via Pose Estimation and Matching. Amir Zamir, Tilman Wekel, Pulkit Agrawal, Colin Wei, Jitendra Malik, Silvio Savarese. ECCV 2016.
Generic 3D Representation Learning
Generic 3D Representation via Pose Estimation and Matching. Amir Zamir, Tilman Wekel, Pulkit Agrawal, Colin Wei, Jitendra Malik, Silvio Savarese. ECCV 2016.
Learn it from the world!
Dataset Coverage
The 3D Representation
Evaluations
• Camera pose estimation • Matching (wide-baseline)
State-of-the-art Human-level
• Surface Normal • 3D Object Pose • 3D Scene Layout • Visual Abstraction
State-of-the-art unsupervised
Unsupervised TasksSupervised Tasks
Pose (surface normal) Embedding
Imag
eNet
(A
lexN
et)
Gen
eric
3D
Rep
.
Krizhevsky et al. 2012. Russakovsky et al. 2015
MIT Places
MIT Places
ImageNet (AlexNet) Generic 3D Rep.
Zhou et al. 2014 Krizhevsky et al. 2012. Russakovsky et al. 2015
3D Object Pose - ImageNet
Generic 3D Rep.
Wang & Gupta. 2015 Krizhevsky et al. 2012. Russakovsky et al. 2015
3D Object Pose - ImageNet
3D Object Pose - ImageNet
http://3drepresentation.stanford.edu/
Query Image
Generic 3D Representation
ImageNet (AlexNet)
3D Object Pose Estimation - Abstraction
Wan
g &
Gup
ta. 2
015
Agr
awal
et a
l. 20
15.
Rus
sako
vsky
et a
l. 20
15
Ozu
ysal
et a
l. 20
09.
3D Object Pose Estimation - Abstraction
Wan
g &
Gup
ta. 2
015
Agr
awal
et a
l. 20
15.
Rus
sako
vsky
et a
l. 20
15
Ozu
ysal
et a
l. 20
09.
3D Object Pose Estimation – Cross Category
Wang & Gupta. 2015 Agrawal et al. 2015. Russakovsky et al. 2015 Xiang et al. 2014.
3D Object Pose Estimation – Cross Category
Wang & Gupta. 2015 Agrawal et al. 2015. Russakovsky et al. 2015 Xiang et al. 2014.
3D Layout Estimation - Abstraction
Wan
g &
Gup
ta. 2
015
Agr
awal
et a
l. 20
15.
Rus
sako
vsky
et a
l. 20
15
Zhan
g et
al.
2016
.
3D Layout Estimation - Abstraction
Wan
g &
Gup
ta. 2
015
Agr
awal
et a
l. 20
15.
Rus
sako
vsky
et a
l. 20
15
Zhan
g et
al.
2016
.
Unsupervised Evaluations
Surface Normal Estimation (NYUv2)
Scene Layout Classification (LSUN)
Scene Layout Estimation (LSUN)
Object Pose Estimation (PASCAL3D)
What’s under the hood – Vanishing Points?
Mahendran & Vedadi. 2015. Angladon et al. 2015. Denis et al. 2008. Li et al. 2010.
Matching Evaluation
Zagoruyko, & Komodakis. 2015. Simi-Serra et al. 2015. Lowe. 2004. Arandjelovic & Zisserman. 2012. Wu et al. 2011. Simonyan & Zisserman. 2014. Tola et al. 2008. Morel et al. 2009.
Matching Results
Matching Results
Matching Results
Pose Regression Results
GTEstimated
Pose Regression Results
GT
Estimated
Pose Estimation Evaluation
Wu. 2011. Geiger et al. 2011 Wu et al. 2011.
`• Task Taxonomy
• Problem space is unknown. • Essential for the 3D-complete representation
• Proper Fusion Techniques • Beyond simplistic late-fusion or ConvNet fine tuning • Essential for the vision-complete representation
• Proper Data!
Generic 3D Representation via Pose Estimation and Matching. Amir Zamir, Tilman Wekel, Pulkit Agrowal, Colin Wei, Jitendra Malik, Silvio Savarese. ECCV 2016.
http://www.cs.stanford.edu/~amirz/
http://3DRepresentation.stanford.edu/