Deep Learning for 3D Localization
Vincent Lepetit
University of Bordeaux, France & TU Graz, Austria
• 3D object detection from color images;
• Accurate geolocalization without registered images.
BB8: 3D Pose Without Using Depth
BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth. Mahdi Rad, Vincent Lepetit, ICCV 2017.
Predicting the 3D Pose given a 2D Location of the object
Solution #1: Directly predicting the pose
CNN → (exponential map / quaternion / rotation matrix, translation)
We Can Do Better
Predicting 2D locations from an image is an easier regression task;
We can compute the 3D pose from these 2D locations.
CNN → (the 2D projections of the 8 corners of the 3D bounding box)
Getting the 3D Pose
→ we can compute the 3D pose using a PnP algorithm.
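As an illustration of this step, here is a minimal sketch that recovers a pose from the 8 corner projections with a Direct Linear Transform (DLT). DLT is only one simple way to solve PnP and is not necessarily the variant used by the authors. The function names, the box dimensions, and the intrinsics matrix `K` are assumptions made for the example, and the sketch does no noise handling.

```python
import numpy as np

def project(points_3d, R, t, K):
    """Pinhole projection of Nx3 object points into pixel coordinates."""
    cam = np.asarray(points_3d, float) @ R.T + t
    uvw = cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def pose_from_corners_dlt(corners_3d, corners_2d, K):
    """Estimate (R, t) from >= 6 non-coplanar 3D-2D correspondences (DLT)."""
    X = np.asarray(corners_3d, float)
    x = np.asarray(corners_2d, float)
    # Remove the intrinsics: rows become K^-1 [u, v, 1]^T.
    xn = np.c_[x, np.ones(len(x))] @ np.linalg.inv(K).T
    rows = []
    for (Xw, Yw, Zw), (u, v, _) in zip(X, xn):
        P = [Xw, Yw, Zw, 1.0]
        rows.append(P + [0.0] * 4 + [-u * p for p in P])
        rows.append([0.0] * 4 + P + [-v * p for p in P])
    # The null vector of the system is [R|t] up to scale and sign.
    _, _, Vt = np.linalg.svd(np.array(rows))
    M = Vt[-1].reshape(3, 4)
    U, S, Vt2 = np.linalg.svd(M[:, :3])
    R = U @ Vt2                 # closest orthogonal matrix
    t = M[:, 3] / S.mean()      # undo the unknown scale
    if np.linalg.det(R) < 0:    # undo the unknown sign
        R, t = -R, -t
    return R, t

# Hypothetical example: an axis-aligned box (metres) and a known pose.
box = [(x, y, z) for x in (-0.1, 0.1) for y in (-0.05, 0.05) for z in (-0.02, 0.02)]
K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
a, b = 0.3, 0.5
Rz = np.array([[np.cos(a), -np.sin(a), 0], [np.sin(a), np.cos(a), 0], [0, 0, 1]])
Rx = np.array([[1, 0, 0], [0, np.cos(b), -np.sin(b)], [0, np.sin(b), np.cos(b)]])
R_true, t_true = Rz @ Rx, np.array([0.05, -0.02, 1.0])
R_est, t_est = pose_from_corners_dlt(box, project(box, R_true, t_true, K), K)
```

With noisy CNN predictions, a robust or iterative solver such as OpenCV's `cv2.solvePnP` would be a more usual choice than plain DLT.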
Training Set
Split the LINEMOD data: 15% of images for training, 85% for testing.
Augmentation (200,000 images in total):
extraction from real image, random scaling, random background, random translation
[Figure: other examples of augmented training images]
CNN
Either a network that handles only 1 object,
OR
VGG + retraining of the fully connected layers + fine-tuning of the last convolutional layers, which can handle all 15 objects of the LINEMOD dataset.
Direct Pose Estimation VS BB8
Metric: 2D projection
Object          Direct Pose   BB8
Average         64.9          85.4
Ape*            91.2          96.2
Bench Vise      61.3          80.2
Camera          43.1          82.8
Can             62.5          85.8
Cat*            93.1          97.2
Driller*        46.5          77.6
Duck            67.9          84.6
Egg Box         68.2          90.1
Glue            69.3          93.5
Hole Puncher    78.2          91.7
Iron            64.5          79.0
Lamp            50.4          79.9
Phone           46.9          80.0
Refining the Pose
Inputs: input image + 3D model rendered from the current pose estimate
CNN2 → (2D displacements improving the current estimates of the corners' projections)
Refining the Pose
Object          BB8     BB8 + Refinement
Average         85.4    91.7
Ape             96.2    97.6
Bench Vise      80.2    92.0
Camera          82.8    88.3
Can             85.8    93.7
Cat             97.2    98.7
Driller         77.6    83.4
Duck            84.6    94.1
Egg Box         90.1    93.4
Glue            93.5    96.0
Hole Puncher    91.7    97.4
Iron            79.0    85.2
Lamp            79.9    83.8
Phone           80.0    88.8
Others [Brachmann16] VS Ours
Metric          2D Projection     6D Pose           5cm and 5°
Sequence        others   ours     others   ours     others   ours
Average         73.7     89.4     50.2     62.8     40.6     69.0
Ape             85.2     96.5     33.2     40.2     34.4     80.0
Bench Vise      67.9     91.0     64.8     92.0     40.6     81.8
Camera          58.7     86.2     38.4     56.2     30.5     60.3
Can             70.8     92.1     62.9     64.6     48.4     77.1
Cat             84.2     98.7     42.7     62.3     34.6     79.6
Driller         73.9     80.7     61.9     74.1     54.5     69.3
Duck            73.1     92.4     30.2     44.8     22.0     53.6
Egg Box         83.1     91.1     49.9     58.1     57.1     81.3
Glue            74.2     92.5     31.2     41.6     23.6     54.2
Hole Puncher    78.9     95.1     52.8     67.1     47.3     73.1
Iron            83.6     85.0     80.0     84.9     58.7     61.3
Lamp            64.0     75.5     67.0     76.3     49.3     67.4
Phone           60.6     85.1     38.1     53.9     26.8     58.4
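For readers unfamiliar with the "5cm and 5°" column: as this metric is usually defined, a pose counts as correct when its translation error is below 5 cm and its rotation error is below 5°. The sketch below shows that check under the assumption that translations are given in metres and rotations as 3x3 nested lists; the function names are illustrative, not taken from the talk.

```python
import math

def rotation_angle_deg(R_a, R_b):
    """Angle (degrees) of the relative rotation R_a^T R_b."""
    # trace(R_a^T R_b) equals the element-wise product summed over all entries.
    trace = sum(R_a[k][i] * R_b[k][i] for k in range(3) for i in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp rounding noise
    return math.degrees(math.acos(cos_angle))

def is_correct_5cm_5deg(R_est, t_est, R_gt, t_gt):
    """True if translation error < 5 cm (0.05 m) and rotation error < 5 degrees."""
    return math.dist(t_est, t_gt) < 0.05 and rotation_angle_deg(R_est, R_gt) < 5.0

def rot_z(deg):
    """Rotation about the z axis, used only to build test poses."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
```

The reported percentage would then be the fraction of test images for which this check passes.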
Robustness to Partial Occlusion
When generating training images, we randomly superimpose objects from other sequences onto the target object, so that the network becomes robust to occlusion.
(Almost) Symmetric Objects
T-Less Dataset [Hodan et al.]
What Is the Problem?
The network is trained to predict very different poses for very similar images, or exactly the same images if the object is perfectly symmetrical.
[Figure: image space vs. pose space]
Solution (1)
Let β be the angle of symmetry of the object:
1. We train a regressor (a Convolutional Neural Network in practice) to predict the projection of the 3D bounding box as in our previous method BUT ONLY on a restricted range: [0°, β/2].
β = 180° in this object
[Figure: views at 0°, …, β/4, …, β/2]
→ no more ambiguities
Solution (2)
2. To handle larger ranges, we train a classifier (also a Convolutional Neural Network) to tell if the pose is in [0°, β/2] or in [β/2, β]:
[Figure: example views at β/4 and 3β/4]
Solution (3)
At run-time, if the pose is in [β/2, β] (as given by the classifier), we flip the image before applying the regressor:
[Figure: example views at β/4 and 3β/4]
Solution (4)
Still at run-time, if we flipped the image before applying the regressor, we flip the corners of the bounding box predicted by the regressor:
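The run-time procedure of Solutions (3) and (4) can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes that "flip" means a horizontal mirror, that the image is a list of pixel rows, and that `classify_half` and `regress_corners` are hypothetical callables standing in for the two CNNs. Any re-labelling of the corners that mirroring may require is left out.

```python
def predict_corners_symmetric(image, classify_half, regress_corners):
    """Predict 2D corner projections for an (almost) symmetric object.

    classify_half(image) -> 1 if the pose is in [0, beta/2], 2 if in [beta/2, beta].
    regress_corners(image) -> list of (x, y), trained only on [0, beta/2].
    """
    width = len(image[0])
    flipped = classify_half(image) == 2
    if flipped:
        # Mirror the image so that the pose falls back into the regressor's range.
        image = [row[::-1] for row in image]
    corners = regress_corners(image)
    if flipped:
        # Mirror the predicted x-coordinates back into the original image frame.
        corners = [(width - 1 - x, y) for x, y in corners]
    return corners

def brightest_pixel(image):
    """Toy stand-in for the regressor: returns the brightest pixel as one 'corner'."""
    y, x = max(((r, c) for r in range(len(image)) for c in range(len(image[0]))),
               key=lambda rc: image[rc[0]][rc[1]])
    return [(x, y)]

toy_image = [[0, 0, 0, 9], [0, 0, 0, 0]]
as_class_1 = predict_corners_symmetric(toy_image, lambda im: 1, brightest_pixel)
as_class_2 = predict_corners_symmetric(toy_image, lambda im: 2, brightest_pixel)
```

In the toy example both branches return coordinates in the original image frame, which is the property the un-flipping step is meant to guarantee.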
Robustness to Light Changes: Adaptive Local Contrast Normalization (ALCN)
ALCN: Adaptive Local Contrast Normalization for Robust Object Detection and 3D Pose Estimation. Mahdi Rad, Peter Roth, Vincent Lepetit, BMVC 2017.
[Figure: training sequence vs. test sequence]
Existing Illumination Normalization Methods
Difference-of-Gaussians
Difference-of-Gaussians
$I^{DoG} = k_2\, G_{\sigma_2} * I - k_1\, G_{\sigma_1} * I$
Observation: The parameters should be carefully tuned for optimal performance
Idea: The Parameters Should Adapt to the Input Image
$I^{ALCN} = \left( \sum_{i=1}^{N} f_i(I) \cdot G_{\sigma_i^{ALCN}} \right) * I$
BUT we cannot train this CNN in a standard supervised manner.
[Diagram: the Normalizer f maps the input I to weights f_1, f_2, …, f_N; the weighted kernels are summed (+) and convolved (∗) with I to give I^ALCN]
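To make the adaptive normalization concrete, here is a minimal pure-Python sketch of filtering an image with a weighted sum of Gaussian kernels. In ALCN the weights would be predicted per image by the Normalizer CNN; here they are plain numbers passed in by the caller, which is a simplification. A Difference-of-Gaussians is the special case of two kernels with weights of opposite sign. The kernel radius, border handling (edge replication), and function names are assumptions made for the example.

```python
import math

def gaussian_kernel(sigma, radius):
    """Normalized 2D Gaussian kernel of size (2*radius+1)^2."""
    ax = range(-radius, radius + 1)
    k = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma)) for x in ax] for y in ax]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]

def combined_kernel(weights, sigmas, radius):
    """Weighted sum of Gaussian kernels: sum_i w_i * G_sigma_i."""
    size = 2 * radius + 1
    out = [[0.0] * size for _ in range(size)]
    for w, s in zip(weights, sigmas):
        g = gaussian_kernel(s, radius)
        for y in range(size):
            for x in range(size):
                out[y][x] += w * g[y][x]
    return out

def convolve(image, kernel):
    """Same-size 2D filtering with replicated borders (kernel is symmetric)."""
    r = len(kernel) // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel[dy + r][dx + r] * image[yy][xx]
            out[y][x] = acc
    return out

# A constant image carries no contrast, so a DoG-like kernel maps it to zero.
image = [[7.0] * 5 for _ in range(5)]
dog = combined_kernel([1.0, -1.0], [1.0, 2.0], radius=2)
flat = convolve(image, dog)
```

Because each Gaussian is normalized, weights that sum to zero remove any constant brightness offset, which is one reason such filters help under illumination changes.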
Training
Solution: train jointly in a supervised way together with another CNN
[Diagram: image window → Normalizer f → weights f_1, f_2, …, f_N → normalized window → Detector g → BG / Obj. #1 / … / Obj. #N]
Training
We augment the Phos dataset with synthetic images:
Comparing with Existing Methods
Difference-of-Gaussians
ALCN [our method] using 1 real image of a target object
Predicted Filters for Different Input
[Figures (a) to (d): filters predicted for different input images]
Color Image Normalization
[Diagram: RGB image → lightness + ab color; lightness → normalized lightness; normalized lightness + ab color → normalized color image]
Explicit Normalization vs. Illumination Robustness with Deep Learning
Method                  AUC
Shallow                 0.401
ALCN + Shallow          0.787
VGG                     0.606
ResNet (20 Layers)      0.456
ResNet (32 Layers)      0.498
ResNet (44 Layers)      0.518
ResNet (56 Layers)      0.589
ResNet (110 Layers)     0.565
Evaluation
                w/o     with illumination changes
Sequence        #1      #2      #3      #4      #5      #6      #7      #8
BB8             100     47.3    18.3    32.7    0.00    0.00    0.00    0.00
BB8 + ALCN      100     77.8    60.7    70.7    64.1    51.4    76.2    50.1
Green: Ground Truth; Red: BB8; Blue: BB8 + ALCN
Accurate Localization from Images
Accurate Camera Registration in Urban Environments Using High-Level Feature Matching. Anil Armagan, Martin Hirzer, Peter Roth, Vincent Lepetit, BMVC 2017.
[Figure: reprojection of the 2D map using the pose from the sensors]
Use registered images? Very cumbersome!
Idea: Use 2.5D maps from OpenStreetMap
We Want to Use 2.5D Maps
In 2.5D maps, buildings are modeled as 2D polygons + height
- sensors give accurate angles wrt gravity: → we can work on rectified images.
[Figure: original image vs. rectified image]
- we assume we know the altitude of the camera;
3 degrees of freedom remain:
2D translation + rotation in the ground plane
[Figure: 2.5D map around the GPS location]
Matching High-Level Features
3 correspondences between building corners in the image and in the 2D map → pose (2D translation + angle)
+ RANSAC
1 correspondence between a façade in the image and a façade in the 2D map → pose (2D translation + angle)
+ RANSAC
Semantic Segmentation and Façades' Normals Estimation
CNN1: Semantic Segmentation
CNN2: Façades' Normal Estimation
Creating Training Data
[Diagram: Image Sequences + 3D Model from 2.5D Map → 3D Tracker (with human supervision) → Façade, Corners, and Normals Labels]
Segmentation Results
The segmentation is robust to (limited) occlusions.
Locating the Buildings' Corners and Façades
x-coordinates for the building corners
min and max x-coordinates for the façades
orientation (1 angle) for the façades
Selecting the Best Hypothesis: Maximum Likelihood
$\max_{pose} \sum_{x} \log P_{class(Render(pose),\, x)}(x)$
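A minimal sketch of this selection step, reading the criterion as follows: for each pose hypothesis the map is rendered into a per-pixel class image, and the score is the sum over pixels x of the log-probability that the segmentation network assigns, at x, to the rendered class. The data layout (nested lists) and the `render` callable are assumptions made for the example.

```python
import math

def log_likelihood(rendered_labels, class_probs, eps=1e-12):
    """Sum over pixels of log P_c(x), with c the class rendered at pixel x.

    rendered_labels[y][x]: class index rendered from the map under a pose.
    class_probs[y][x]: list of per-class probabilities from the segmentation.
    """
    total = 0.0
    for label_row, prob_row in zip(rendered_labels, class_probs):
        for label, probs in zip(label_row, prob_row):
            total += math.log(max(probs[label], eps))  # eps avoids log(0)
    return total

def select_best_pose(hypotheses, render, class_probs):
    """Return the pose hypothesis with the highest log-likelihood."""
    return max(hypotheses, key=lambda pose: log_likelihood(render(pose), class_probs))

# Toy 1x2 image with two classes and two hypothetical pose hypotheses.
probs = [[[0.9, 0.1], [0.2, 0.8]]]
renderings = {"pose_a": [[0, 1]], "pose_b": [[1, 0]]}
best = select_best_pose(["pose_a", "pose_b"], renderings.get, probs)
```

Here the hypotheses would typically be the poses proposed by the corner and façade matching with RANSAC.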
Some Results
Some More Results
Efficient 3D Tracking in Urban Environments with Semantic Segmentation. Martin Hirzer, Clemens Arth, Peter Roth, Vincent Lepetit, BMVC 2017.
Thanks for listening!
Questions?
Training Set
Split the LINEMOD data: 15% of images for training, 85% for testing.
Augment the training set:
1. Extract the object from a training image;
2. Scale the segmented object;
3. Replace the background with a random image from ImageNet;
4. Shift the object by some pixels.
We generate 200,000 training images.
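The four augmentation steps can be sketched on toy images stored as nested lists. The scale range, the maximum shift, nearest-neighbour resizing, and all names below are assumptions made for illustration; the slides do not give these details.

```python
import random

def scale_nearest(img, factor):
    """Nearest-neighbour resize of a 2D list by a scale factor."""
    h, w = len(img), len(img[0])
    nh, nw = max(1, round(h * factor)), max(1, round(w * factor))
    return [[img[min(h - 1, int(y / factor))][min(w - 1, int(x / factor))]
             for x in range(nw)] for y in range(nh)]

def augment(obj, mask, background, rng, scale_range=(0.8, 1.2), max_shift=2):
    """Paste a segmented object onto a background with random scale and shift.

    obj, mask: the object crop and its binary mask (step 1, done beforehand).
    background: the replacement background (step 3, chosen by the caller).
    """
    factor = rng.uniform(*scale_range)                       # step 2: scaling
    obj_s, mask_s = scale_nearest(obj, factor), scale_nearest(mask, factor)
    out = [row[:] for row in background]                     # keep the input intact
    H, W, h, w = len(out), len(out[0]), len(obj_s), len(obj_s[0])
    top = (H - h) // 2 + rng.randint(-max_shift, max_shift)  # step 4: shift
    left = (W - w) // 2 + rng.randint(-max_shift, max_shift)
    for y in range(h):
        for x in range(w):
            Y, X = top + y, left + x
            if mask_s[y][x] and 0 <= Y < H and 0 <= X < W:
                out[Y][X] = obj_s[y][x]
    return out

rng = random.Random(0)
background = [[0] * 16 for _ in range(16)]
obj = [[9] * 4 for _ in range(4)]
mask = [[1] * 4 for _ in range(4)]
sample = augment(obj, mask, background, rng)
```

In a real pipeline the 2D corner labels would need the same scale and shift applied so that they stay aligned with the pasted object.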
Results
• Rectangular cuboid:
[Figure: previous method vs. ours. Green: ground truth - Blue: estimated pose]
Implementation
Train two CNNs:
1. Train a regressor using training images in the range [0° - α, β/2 + α]
– β is the angle of symmetry (e.g. β = 180° for a rectangular cuboid).
– α << β
– α helps to have more accurate results for 0° and β.
2. Train a classifier to predict if the angle is between 0° and β/2 (class 1), or between β/2 and β (class 2).
– If it is detected as class 2, we rotate the image by β degrees, then apply the regressor to predict the BB.
[Diagram: angle ω, classes 1 and 2 on either side of β/2, with margins α]
Implementation
Examples:
Rectangular cuboid: β = 180°
Cylindrical: β = 0°, α = 0°
[Figure labels: α, β]
CNN → (the 2D projections of the 8 corners of the 3D bounding box)
$\hat{\Theta} = \arg\min_{\Theta} \sum_{(W,p) \in TrainingSet} \sum_{i=1}^{8} \| f_\Theta(W)[i] - p_i \|^2$
Θ: network parameters
f_Θ(W)[i]: prediction for the i-th corner (2 values)
p_i: projection of the i-th corner
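The training objective is a sum of squared 2D distances between predicted and ground-truth corner projections. A small sketch of how it would be evaluated, with `f` as a hypothetical stand-in for the network f_Θ and corners stored as (x, y) tuples:

```python
def corner_loss(predicted, target):
    """Sum over corners of the squared 2D distance ||f(W)[i] - p_i||^2."""
    return sum((px - tx) ** 2 + (py - ty) ** 2
               for (px, py), (tx, ty) in zip(predicted, target))

def training_objective(f, training_set):
    """Total loss over all (window W, corner projections p) training pairs."""
    return sum(corner_loss(f(W), p) for W, p in training_set)

# Toy check: a predictor that always outputs the origin for all 8 corners.
always_origin = lambda W: [(0.0, 0.0)] * 8
toy_set = [("window_0", [(3.0, 4.0)] * 8)]
toy_loss = training_objective(always_origin, toy_set)
```

Minimizing this quantity over the network parameters, typically with stochastic gradient descent, gives the estimate Θ̂.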
But Before That, We Need to Find the Object in 2D
[Diagram: Detector → window W]
1. We split the input image into regions of size 128x128:
[Figure: grid of regions, with dimension labels 128, 128, 384, 512]
2. We segment each region as an 8x8 binary mask. Each block of the masks corresponds to a 16x16 image window.
3. We only keep the largest component.
4. We segment each active block again. This decreases the uncertainty from 16px to 4px.
5. We use the centroid of the segmentation as the center of window W.
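Steps 3 and 5 of the detector can be sketched on a coarse binary mask: keep the largest connected component, then convert its centroid from block units to pixels. The use of 4-connectivity and the function names are assumptions, and the second, finer segmentation pass of step 4 is left out for brevity.

```python
from collections import deque

def largest_component(mask):
    """Return the cells (y, x) of the largest 4-connected component of a binary mask."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            comp, queue = [], deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                comp.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(comp) > len(best):
                best = comp
    return best

def window_center(mask, block_size=16):
    """Centroid of the largest component, in pixels: (x, y), or None if the mask is empty."""
    comp = largest_component(mask)
    if not comp:
        return None
    cy = sum(y for y, _ in comp) / len(comp)
    cx = sum(x for _, x in comp) / len(comp)
    # +0.5 moves from block index to block centre before scaling to pixels.
    return ((cx + 0.5) * block_size, (cy + 0.5) * block_size)

coarse_mask = [[1, 0, 0, 0],
               [0, 0, 1, 1],
               [0, 0, 1, 1],
               [0, 0, 0, 0]]
center = window_center(coarse_mask)
```

In the toy mask the isolated block at the top left is discarded, and the window is centred on the 2x2 group of active blocks.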
Robustness to Light Changes: Adaptive Local Contrast Normalization
Normalization to illumination model:
Comparing with Existing Methods
From Image Window Normalization to Whole Image Normalization
$F_k(I)_{ij} = f_k(I_{ij})$
$I^{ALCN} = \sum_{k=1}^{N} \left( G_{\sigma_k^{ALCN}} * I \right) \circ F_k(I)$
$I_{ij}^{ALCN} = \left( \sum_{k=1}^{N} f_k(I_{ij}) \cdot G_{\sigma_k^{ALCN}} \right) * I_{ij}$
[Diagram: Image I → Normalizer f applied to each window I_ij → maps F_k(I)]
[Diagram: the Normalizer f produces maps F_1, F_2, …, F_N from image I; I is convolved (∗) with each of the N ALCN kernels, each result is multiplied element-wise (∘) by the corresponding map, and the products are summed (+) to give I^ALCN]
Robustness to Light Changes: Adaptive Local Contrast Normalization
How can we detect the target object:
• Under challenging illumination conditions
• From very few training samples
[Figure: example]
Semantic Segmentation
SegNet Video
Images with Various Illuminations After Filtering