presentation (paper review) learning image representations...

21
Hilgad Montelo Learning Image Representations Tied to Egomotion Jayaraman and Grauman. ICCV 2015 Presentation (paper review) 2016 March University of Texas at Austin Visual Recognition

Upload: others

Post on 10-Oct-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Presentation (paper review) Learning Image Representations ...vision.cs.utexas.edu/381V-spring2016/slides/montelo-paper.pdfHilgad Montelo Learning Image Representations Tied to Ego‐motion

Hilgad Montelo

Learning Image Representations Tied to Ego‐motion

Jayaraman and Grauman. ICCV 2015

Presentation (paper review)

2016 March

University of Texas at AustinVisual Recognition

Page 2: Presentation (paper review) Learning Image Representations ...vision.cs.utexas.edu/381V-spring2016/slides/montelo-paper.pdfHilgad Montelo Learning Image Representations Tied to Ego‐motion

Outline

• The "Kitten Carousel" Experiment • Problem• Objective• Main Idea• Related Work• Approach• Experiments and Results• Conclusions

Page 3: Presentation (paper review) Learning Image Representations ...vision.cs.utexas.edu/381V-spring2016/slides/montelo-paper.pdfHilgad Montelo Learning Image Representations Tied to Ego‐motion

The "Kitten Carousel" Experiment (Held & Hein, 1963)

3

active kitten passive kitten

Key to perceptual development:self-generated motion + visual feedback

[Slide credit: Dinesh Jayaraman]

Page 4: Presentation (paper review) Learning Image Representations ...vision.cs.utexas.edu/381V-spring2016/slides/montelo-paper.pdfHilgad Montelo Learning Image Representations Tied to Ego‐motion

Problem

• Today’s visual recognition algorithms  learn from “disembodied” bag of labeled snapshots.

Page 5: Presentation (paper review) Learning Image Representations ...vision.cs.utexas.edu/381V-spring2016/slides/montelo-paper.pdfHilgad Montelo Learning Image Representations Tied to Ego‐motion

Objective

• Provide visual recognition algorithm that learns in the context of acting and moving in the world.

Page 6: Presentation (paper review) Learning Image Representations ...vision.cs.utexas.edu/381V-spring2016/slides/montelo-paper.pdfHilgad Montelo Learning Image Representations Tied to Ego‐motion

• Associate Ego‐Motion and vision by teaching computer vision system the connection:

• “how I move”  “how my visual surroundings change”

+

Main Idea

Page 7: Presentation (paper review) Learning Image Representations ...vision.cs.utexas.edu/381V-spring2016/slides/montelo-paper.pdfHilgad Montelo Learning Image Representations Tied to Ego‐motion

Ego‐motion vision: view prediction

After moving:

7 [Slide credit: Dinesh Jayaraman]

Page 8: Presentation (paper review) Learning Image Representations ...vision.cs.utexas.edu/381V-spring2016/slides/montelo-paper.pdfHilgad Montelo Learning Image Representations Tied to Ego‐motion

Ego‐motion vision for recognition

• Learning this connection requires:

Depth, 3D geometry Semantics Context

• Can be learned without manual labels!

Also key to recognition!

Approach: unsupervised feature learning using egocentric video + motor signals

8

Page 9: Presentation (paper review) Learning Image Representations ...vision.cs.utexas.edu/381V-spring2016/slides/montelo-paper.pdfHilgad Montelo Learning Image Representations Tied to Ego‐motion

Related Works

9

Integrating vision and motion

Visual predictionDoersch, Gupta, Efros, “… context prediction”, ICCV 2015Oh, Guo, Lee, Lewis, Singh, “Action-conditional video …”, NIPS 2015Kulkarni, Whitney, Kohli, Tenenbaum, “… inverse graphics ...”, NIPS 2015Vondrick, Pirsiavash, Torralba, “Anticipating the future ...”, arXiv 2015

Wang, Gupta, “Unsupervised learning of visual …”, ICCV 2015Goroshin, Bruna, Tompson, Eigen, LeCun, “Unsupervised ...”, ICCV 2015

Agrawal, Carreira, Malik, “Learning to see by moving”, ICCV 2015Watter, Springenberg, Boedecker, Riedmiller, “Embed to control...”, NIPS 2015Levine, Finn, Darrell, Abbeel, “… visuomotor policies”, arXiv 2015 Konda, Memisevic, “Learning visual odometry ...”, VISAPP 2015

Video for unsupervised image features

Page 10: Presentation (paper review) Learning Image Representations ...vision.cs.utexas.edu/381V-spring2016/slides/montelo-paper.pdfHilgad Montelo Learning Image Representations Tied to Ego‐motion

Ego‐motion equivarianceInvariant features: unresponsive to some classes of transformations

Invariance discards information;equivariance organizes it.

Equivariant features : predictably responsive to some classes of transformations, through simple mappings (e.g., linear) “equivariance map”

Approach

10

Page 11: Presentation (paper review) Learning Image Representations ...vision.cs.utexas.edu/381V-spring2016/slides/montelo-paper.pdfHilgad Montelo Learning Image Representations Tied to Ego‐motion

Equivariant embedding organized by ego‐motions

Pairs of frames related by similar ego‐motion should be related by same feature 

transformation

11

left turnright turnforward

Learn

time →

motor signal

Training dataUnlabeled video + motor signals

Approach

Source: “Learning image representations equivariant to ego motion ” Jayaraman and Grauman ICCV 2015

Page 12: Presentation (paper review) Learning Image Representations ...vision.cs.utexas.edu/381V-spring2016/slides/montelo-paper.pdfHilgad Montelo Learning Image Representations Tied to Ego‐motion

Approach

1. Extract training frame pairs from video

2. Learn ego‐motion‐equivariant image features

3. Train on target recognition task in parallel

12

Page 13: Presentation (paper review) Learning Image Representations ...vision.cs.utexas.edu/381V-spring2016/slides/montelo-paper.pdfHilgad Montelo Learning Image Representations Tied to Ego‐motion

Training frame pair mining

Discovery of ego‐motion clusters

Right turn

=forward

=right turn

=left turn

yaw cha

nge

forward distance

13 [Slide credit: Dinesh Jayaraman]

Page 14: Presentation (paper review) Learning Image Representations ...vision.cs.utexas.edu/381V-spring2016/slides/montelo-paper.pdfHilgad Montelo Learning Image Representations Tied to Ego‐motion

∥ ∥

Ego‐motion equivariant feature learning

Desired: for all motions  and all images  ,Given:

softmax loss  , y

Unsupervised training

Supervised trainingFeature space

class  ,  and jointly trained

14 [Slide credit: Dinesh Jayaraman]

Page 15: Presentation (paper review) Learning Image Representations ...vision.cs.utexas.edu/381V-spring2016/slides/montelo-paper.pdfHilgad Montelo Learning Image Representations Tied to Ego‐motion

Experiments

15

• Validation using 3 public datasets: NORB, KITTI, SUN.• Comparison with different methods: CLSNET, TEMPORAL, DRLIM.

Page 16: Presentation (paper review) Learning Image Representations ...vision.cs.utexas.edu/381V-spring2016/slides/montelo-paper.pdfHilgad Montelo Learning Image Representations Tied to Ego‐motion

Results: RecognitionLearn from unlabeled car video (KITTI)

Exploit features for static scene classification (SUN, 397 classes)

Geiger et al, IJRR ’13

Xiao et al, CVPR ’1016

Page 17: Presentation (paper review) Learning Image Representations ...vision.cs.utexas.edu/381V-spring2016/slides/montelo-paper.pdfHilgad Montelo Learning Image Representations Tied to Ego‐motion

KITTI⟶ SUN

Do ego‐motion equivariant features improve recognition?

397 classes

recognition

 accuracy (%

)

**Mobahi et al., Deep Learning from Temporal Coherence in Video, ICML’09 

*Hadsell et al., Dimensionality Reduction by Learning an Invariant Mapping, CVPR’06

Results: Recognition

6 labeled training examples per class

KITTI⟶KITTI

NORB⟶NORB

Up to 30% accuracy increase over state of the art!

0.25

0.70

1.02

1.21

1.58

invariance

17

Page 18: Presentation (paper review) Learning Image Representations ...vision.cs.utexas.edu/381V-spring2016/slides/montelo-paper.pdfHilgad Montelo Learning Image Representations Tied to Ego‐motion

• Leverage proposed equivariant embedding to select next best view for object recognition

Results: Active recognition

[Slide credit: Dinesh Jayaraman]

cup frying pan

cup/bowl/pan? cup/bowl/pan?

0

10

20

30

40

50

Accuracy (%

)

NORB data

Page 19: Presentation (paper review) Learning Image Representations ...vision.cs.utexas.edu/381V-spring2016/slides/montelo-paper.pdfHilgad Montelo Learning Image Representations Tied to Ego‐motion

Conclusion and Future Work• The paper provided a new embodied visual feature learning paradigm.

• The Ego‐motion equivariance boosts performance across multiple challenging recognition tasks.

19

Page 20: Presentation (paper review) Learning Image Representations ...vision.cs.utexas.edu/381V-spring2016/slides/montelo-paper.pdfHilgad Montelo Learning Image Representations Tied to Ego‐motion

Questions• Why KITTI training and not some other domain based training?

• Why does incorporating DRLIM improve EQUIV? Still Temporal coherence properties left to be learned?

• Is it meaningful to compare EQUIV or EQUIV + DRLIM with the other cases with respect to equivarianceerror?

Page 21: Presentation (paper review) Learning Image Representations ...vision.cs.utexas.edu/381V-spring2016/slides/montelo-paper.pdfHilgad Montelo Learning Image Representations Tied to Ego‐motion