L11: 6D Pose Estimation
Hao Su
Machine Learning meets Geometry
Ack: Minghua Liu and Jiayuan Gu for helping to prepare slides
We aren’t talking about human pose
Figure from https://www.tensorflow.org/lite/models/pose_estimation/overview
We are talking about object pose
Figure from https://paperswithcode.com/task/6d-pose-estimation
Rigid Transformation
rotate: R ∈ SO(3)
translate: t ∈ ℝ^{3×1}
source points → target points
Transformation is relative!
Rigid Transformation
• Rigid transformation T(p) = Rp + t, where p ∈ ℝ^{3×1}
• Represented by a rotation R ∈ SO(3) and a translation t ∈ ℝ^{3×1}
• All the rigid transformations {T} form the special Euclidean group, denoted by SE(3)
6D Pose Estimation
Recognize the 3D location and orientation of an object relative to a canonical frame
(figure: canonical frame, pose 1, pose 2)
6D Pose & Rigid Transformation
• 6D pose: object-level rigid transformation, associated with a canonical frame
• Rigid transformation: can be object-level or scene-level, no predefined canonical frame
Agenda
• Introduction
• Rigid transformation estimation
  - Closed-form solution given correspondence
  - Iterative closest point (ICP)
• Learning-based approaches
  - Direct approaches
  - Indirect approaches
Rigid Transformation Estimation
Rigid transformation T(p) = Rp + t
source → (rotate, translate) → target
Correspondence
Rigid transformation T(p) = Rp + t maps each source point p_i to its corresponding target point q_i:
q_1 = T(p_1) = Rp_1 + t
q_2 = T(p_2) = Rp_2 + t
…
q_n = T(p_n) = Rp_n + t
3n equations from n pairs of points
Estimate from Correspondence
Rigid transformation T(p) = Rp + t
How many pairs of points (p_i, q_i) are required to uniquely define a rigid transformation?
Two Key Steps
• Find the correspondence between source and target
  - combinatorial problem
  - greedy heuristic or exhaustive search
• Estimate the rigid transformation given the correspondence
  - constrained (for rotation) optimization
  - closed-form solution
Two Key Steps
• Find the correspondence between source and target
  - combinatorial problem
  - greedy heuristic or exhaustive search
• Estimate the rigid transformation given the correspondence
  - constrained (for rotation) optimization
  - closed-form solution
Math is coming
Least-square Estimation of Rigid Transformation
• Given source points P = {p_i} and target points Q = {q_i}, the objective (least-square error) is:
  L = ∑_{i=1}^n ‖Rp_i + t − q_i‖²
Optimization Problem
• Parameters: R ∈ SO(3) and t ∈ ℝ^{3×1}
• Target point cloud: Q = {q_i}
• Source point cloud: P = {p_i}
• Objective: min_{R,t} ∑_{i=1}^n ‖Rp_i + t − q_i‖²
Step I: Representing t by R
• Assuming R ∈ SO(3) is known,
  - Recall the objective L = ∑_{i=1}^n ‖Rp_i + t − q_i‖²
  - Calculate the gradient ∂L/∂t = 2 ∑_{i=1}^n (Rp_i + t − q_i)
  - Solve ∂L/∂t = 0:
    t = (∑_{i=1}^n q_i)/n − R (∑_{i=1}^n p_i)/n
Step I: Representing t by R
• Substituting the t ∈ ℝ^{3×1} expressed by R, the objective can be simplified as
  L = ∑_{i=1}^n ‖Rp̄_i − q̄_i‖²,
  where p̄_i = p_i − (∑_{j=1}^n p_j)/n and q̄_i = q_i − (∑_{j=1}^n q_j)/n
Step II: Solve R
Objective: L = ∑_{i=1}^n ‖Rp̄_i − q̄_i‖²
⇒ R* = argmin_R ‖RP − Q‖_F, subject to RᵀR = I and det(R) = 1,
where P = [p̄_1, p̄_2, ⋯, p̄_n] ∈ ℝ^{3×n} and Q = [q̄_1, q̄_2, ⋯, q̄_n] ∈ ℝ^{3×n} stack the centered points
Step II: Solve R
• Orthogonal Procrustes Problem: argmin_R ‖RP − Q‖_F, subject to RᵀR = I
• Notice: no determinant constraint
• This problem has a closed-form solution!
  - Project M = QPᵀ to the space of orthogonal matrices
  - Numerically, the magical SVD comes!
    ‣ If M = QPᵀ and M = UΣVᵀ,
    ‣ then R = UVᵀ
• The proof can be found on the wiki
Step II: Solve R
• How to satisfy the determinant constraint det(R) = 1?
• Assume R = UVᵀ
  - If det(R) = −1, then flip the sign of the last column of V
• The proof can be found in Umeyama's paper
Summary of Closed-Form Solution (Known Correspondences)
• Known: P = {p_i}, Q = {q_i}
• Objective: min_{R,t} ∑_{i=1}^n ‖Rp_i + t − q_i‖²
• Solution
  - ∑_{i=1}^n (q_i − q̄)(p_i − p̄)ᵀ = UΣVᵀ (SVD)
  - R = UVᵀ (flip the sign of the last column of V if det(UVᵀ) = −1)
  - t = q̄ − Rp̄, where p̄ = (∑_{i=1}^n p_i)/n and q̄ = (∑_{i=1}^n q_i)/n
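The closed-form recipe above can be sketched in a few lines of NumPy; this is a minimal illustration (function name `solve_rigid_transform` is my own, not from the slides):

```python
# Minimal sketch of the closed-form least-squares solution for (R, t):
# SVD of the cross-covariance, determinant fix, then t = q_bar - R p_bar.
import numpy as np

def solve_rigid_transform(P, Q):
    """P, Q: 3 x n arrays of corresponding points; returns (R, t) with R @ P + t ≈ Q."""
    p_bar = P.mean(axis=1, keepdims=True)   # centroid of the source points
    q_bar = Q.mean(axis=1, keepdims=True)   # centroid of the target points
    M = (Q - q_bar) @ (P - p_bar).T         # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(M)
    R = U @ Vt
    if np.linalg.det(R) < 0:                # enforce det(R) = +1:
        Vt[-1, :] *= -1                     # flip the sign of the last column of V
        R = U @ Vt
    t = q_bar - R @ p_bar
    return R, t
```

Given exact correspondences, this recovers the transformation to machine precision; with noisy correspondences it returns the least-squares optimum.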
Two Key Steps
• Find the correspondence between source and target
  - combinatorial problem
  - greedy heuristic or exhaustive search
• Estimate the rigid transformation given the correspondence
  - constrained (for rotation) optimization
  - closed-form solution
Iterative Closest Point (ICP)
Heuristic
• The closest point might be the corresponding point
• If starting from a transformation close to the actual one, we can iteratively improve the estimation
Find Correspondence
(figures: GT correspondence vs. ICP correspondence between source points p_i and target points q_i)
Update Transformation
Update the transformation by minimizing the least-square error over the ICP correspondence
Iterate to Refine
Find correspondence → update transformation → repeat
General ICP Algorithm
Starting from an initial transformation T = (R, t):
1. Find correspondence: for each point in the source point cloud transformed with the current transformation, find the nearest neighbor in the target point cloud
2. Update the transformation by minimizing an objective function E(T) over the correspondence
3. Go to step 1 until the transformation is no longer updated
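The three steps above can be sketched as a short NumPy loop. This is an illustrative point-to-point ICP, not a production implementation: it finds nearest neighbors by brute force (real code would use a k-d tree, e.g. Open3D or `scipy.spatial.cKDTree`), and the `icp` name and stopping rule are my own choices:

```python
# Minimal point-to-point ICP sketch: alternate nearest-neighbor correspondence
# with the closed-form SVD update of (R, t), until the error stops improving.
import numpy as np

def icp(P, Q, n_iters=50, tol=1e-8):
    """Align source P (3 x n) to target Q (3 x m); returns (R, t)."""
    R, t = np.eye(3), np.zeros((3, 1))
    prev_err = np.inf
    for _ in range(n_iters):
        P_cur = R @ P + t
        # Step 1: correspondence = nearest neighbor in the target (brute force)
        d2 = ((P_cur[:, :, None] - Q[:, None, :]) ** 2).sum(axis=0)  # n x m distances
        Q_corr = Q[:, d2.argmin(axis=1)]
        # Step 2: closed-form update of (R, t) given the correspondence
        p_bar = P.mean(axis=1, keepdims=True)
        q_bar = Q_corr.mean(axis=1, keepdims=True)
        U, _, Vt = np.linalg.svd((Q_corr - q_bar) @ (P - p_bar).T)
        if np.linalg.det(U @ Vt) < 0:
            Vt[-1, :] *= -1          # determinant fix
        R = U @ Vt
        t = q_bar - R @ p_bar
        # Step 3: stop once the least-square error no longer improves
        err = ((R @ P + t - Q_corr) ** 2).sum()
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return R, t
```

Starting from the identity, this converges only when the initial misalignment is small, which is exactly the limitation the following slides discuss.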
Illustration
Animation from https://github.com/yassram/iterative-closest-point
Animation from https://github.com/pglira/simpleICP
Improve ICP
• Objective functions
  - PointToPoint
  - PointToPlane (faster convergence, but requires normal computation)
• Outlier removal: abandon pairs of points with too large a distance
Useful Libraries
• Open3D
  - http://www.open3d.org/docs/release/tutorial/pipelines/icp_registration.html
  - http://www.open3d.org/docs/release/tutorial/geometry/kdtree.html
• PCL: https://pointclouds.org/documentation/group__registration.html
Limitation of ICP
• Even with these improvements, ICP is
  - easy to get stuck in local minima
  - dependent on a good initialization to work
Acquire the Initialization for ICP
• Global registration, e.g. feature-based matching or Go-ICP
  - http://www.open3d.org/docs/release/tutorial/pipelines/global_registration.html
  - https://github.com/yangjiaolong/Go-ICP
• TEASER (more robust to outliers)
  - https://github.com/MIT-SPARK/TEASER-plusplus
Read by yourself
Learning-based Approach
Two Categories of Approaches
• Direct: predict (R, t) directly
• Indirect: predict corresponding pairs {(p_i, q_i)}
  - points in the canonical frame {p_i}
  - points in the camera frame {q_i}
  - estimate (R, t) by solving min_{R,t} ∑_{i=1}^n ‖Rp_i + t − q_i‖²
Direct Approaches
Direct Approaches
• Input: cropped image/point cloud/depth map of a single object
• Output: (R, t)
Image / depth map / point cloud → neural network → (R, t)
Challenges for Direct Approaches
• The choice of the representation of R
  - Recall Lec 5: rotation matrix, Euler angles, quaternion, axis-angle, … (more will be introduced in this lecture)
  - Which is more learnable for neural networks?
Direct Approaches
• Representation of rotation
• Loss for rotation
• Example: DenseFusion
Continuity: 2D Rotation Example
2D rotation can be parameterized by an angle θ
However, the mapping from rotations back to θ ∈ [0, 2π) is discontinuous (it jumps at the wrap-around) due to the topology difference between the circle and the interval
Zhou, Yi, et al. "On the continuity of rotation representations in neural networks." CVPR 2019
Rotation Representations

| Representation | Continuous parameterization | Unique for a rotation |
|---|---|---|
| Rotation matrix | ✔ | ✔ |
| Euler angles | No | No (gimbal lock) |
| Axis-angle | No | No (singularity) |
| Quaternion | No | No (double covering) |

Note that neural networks are generally continuous
Fitting a discontinuous function is not friendly to neural networks
Continuous Representation
• Next, we will introduce two continuous representations: 6D (vector) and 9D (vector)
6D Representation
• 6D representation: x = [a_1ᵀ, a_2ᵀ]ᵀ, where a_1, a_2 ∈ ℝ^{3×1}
• Convert to a rotation matrix R = [b_1, b_2, b_3] through the Gram-Schmidt process, where b_1, b_2, b_3 ∈ ℝ^{3×1} have unit length:
  b_1 = a_1/‖a_1‖,
  b_2 ∝ a_2 − ⟨b_1, a_2⟩b_1 (normalized to unit length),
  b_3 = b_1 × b_2
• Convert a rotation matrix to 6D: x = [b_1ᵀ, b_2ᵀ]ᵀ
Zhou, Yi, et al. "On the continuity of rotation representations in neural networks." CVPR 2019
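The Gram-Schmidt conversion above is a few lines of NumPy; this sketch (function name `rotation_from_6d` is mine) assumes a_1 and a_2 are not parallel, which holds almost surely for network outputs:

```python
# Sketch of the 6D -> rotation-matrix conversion via Gram-Schmidt.
import numpy as np

def rotation_from_6d(x):
    """x = [a1; a2] ∈ R^6  ->  R = [b1, b2, b3] ∈ SO(3)."""
    a1, a2 = x[:3], x[3:]
    b1 = a1 / np.linalg.norm(a1)
    u2 = a2 - np.dot(b1, a2) * b1        # remove the b1 component of a2
    b2 = u2 / np.linalg.norm(u2)
    b3 = np.cross(b1, b2)                # completes a right-handed orthonormal frame
    return np.stack([b1, b2, b3], axis=1)  # b1, b2, b3 as columns
```

Note the representation is continuous but not injective: any positive scaling of a_1 and a_2 maps to the same rotation.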
9D Representation
• 9D representation (rotation matrix): X ∈ ℝ^{3×3}
• Find the closest rotation matrix R ∈ SO(3)
  - Recall the Orthogonal Procrustes Problem: argmin_R ‖RP − Q‖_F, subject to RᵀR = I
  - equivalent with P = I and Q = X
• Implementation: via SVD of X (see the paper below)
Levinson, Jake, et al. "An analysis of svd for deep rotation estimation." NeurIPS 2020
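Concretely, the 9D network output is projected onto SO(3) with the same SVD machinery used for the Procrustes problem earlier; a minimal sketch (the `project_to_so3` name is mine):

```python
# Sketch: project an unconstrained 3x3 output onto the closest rotation matrix
# in the Frobenius norm, with the determinant fix for a proper rotation.
import numpy as np

def project_to_so3(X):
    U, _, Vt = np.linalg.svd(X)
    R = U @ Vt
    if np.linalg.det(R) < 0:   # reflection case: flip the last column of V
        Vt[-1, :] *= -1
        R = U @ Vt
    return R
```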
Direct Approaches
• Representation of rotation
• Loss for rotation
• Example: DenseFusion
Loss for Direct Approaches
• Shape-agnostic: distance between (R, t) and (R_GT, t_GT)
• Shape-aware: distance between the same shape X ∈ ℝ^{3×n} transformed by (R, t) and (R_GT, t_GT) respectively
Shape-agnostic Loss
• Rotation
  - Geodesic distance (angle difference, relative rotation error) on SO(3): arccos[(tr(R_GT Rᵀ) − 1)/2]
  - Mean square error over ℝ^{3×3}: ‖R − R_GT‖²_F
  - L_p distance on representation x: ‖x − x_GT‖_p
• Translation
  - L_p distance: ‖t − t_GT‖_p
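The two rotation distances above are easy to implement; a small sketch (function names are mine, and the `clip` guards against round-off pushing the cosine slightly outside [−1, 1]):

```python
# Sketch of the shape-agnostic rotation losses: geodesic angle and Frobenius MSE.
import numpy as np

def geodesic_distance(R, R_gt):
    """Angle (radians) of the relative rotation R_gt R^T."""
    cos = (np.trace(R_gt @ R.T) - 1.0) / 2.0
    return np.arccos(np.clip(cos, -1.0, 1.0))

def frobenius_mse(R, R_gt):
    """Squared Frobenius distance ||R - R_gt||_F^2."""
    return np.sum((R - R_gt) ** 2)
```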
Challenges: Symmetry
• Symmetry introduces ambiguities in GT labels
• Requires specially designed losses to tackle it
Examples of symmetric objects in the YCB dataset
Rotational Symmetry
• 1 symmetry axis, symmetry order 2
• 1 symmetry axis, infinite symmetry order
• 3 symmetry axes, symmetry orders (4, 2, 2)
Shape-agnostic Loss for Symmetric Objects
• Symmetry group: multiple symmetry-equivalent GT rotations ℛ = {R¹_GT, R²_GT, ⋯, Rⁿ_GT} (n is the order of symmetry)
• Min-of-N loss (for finite order of symmetry): min_{Rⁱ_GT ∈ ℛ} L(Rⁱ_GT, R)
Q: How to deal with infinite order?
Shape-agnostic Loss for Symmetric Objects (Infinite Order)
Use the angle θ between the two symmetry axes (predicted vs. GT) as the loss
Read by yourself
Motivation of Shape-aware Loss
(figure: target pose, predicted pose 1, predicted pose 2)
Similar angle difference, but very different perception error
Shape-aware Loss
• Given a shape X ∈ ℝ^{3×n}, the loss for rotation can be the distance between RX and R_GT X
• e.g., L = ‖RX − R_GT X‖²_F (per-point mean squared error (MSE))
Shape-aware Loss for Symmetric Objects
• With ‖RX − R_GT X‖²_F as the distance metric, we can also apply the Min-of-N loss:
  L = min_{Rⁱ_GT ∈ ℛ} ‖RX − Rⁱ_GT X‖²_F
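The shape-aware Min-of-N loss above can be sketched directly; this toy example (helper names are mine) uses a 4-fold rotationally symmetric shape about the z axis, so a prediction rotated by any multiple of 90° incurs zero loss:

```python
# Sketch of the Min-of-N shape-aware loss for a finite symmetry group.
import numpy as np

def rot_z(a):
    """Rotation by angle a about the z axis."""
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0,        0.0,       1.0]])

def min_of_n_loss(R, R_gt_set, X):
    """min over symmetry-equivalent GT rotations of ||R X - R_gt X||_F^2; X is 3 x n."""
    return min(np.sum((R @ X - R_gt @ X) ** 2) for R_gt in R_gt_set)
```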
Shape-aware Loss for Symmetric Objects
• Recall the distance metrics for point clouds in Lec 7:
  - Chamfer distance
  - earth mover's distance
• These distance metrics are compatible with symmetry (even infinite order)
• But they require a shape and might get stuck in local minima
Xiang et al., “PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes”, RSS 2018
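For reference, a brute-force sketch of the (squared) Chamfer distance between two point clouds, the symmetry-compatible metric mentioned above (the function name is mine; this is illustration, not an efficient implementation):

```python
# Sketch of the squared Chamfer distance: mean nearest-neighbor squared
# distance, accumulated in both directions.
import numpy as np

def chamfer_distance(X, Y):
    """X: 3 x n, Y: 3 x m point clouds."""
    d2 = ((X[:, :, None] - Y[:, None, :]) ** 2).sum(axis=0)  # n x m pairwise squared dists
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```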
Example of Local Minima for Chamfer Distance
(Note: no local minima for Min-of-N)
• Original pose: CD = 2.02
• Rotate the red shape left a bit: CD = 2.05
• Reset, and rotate right instead: CD = 2.50
• The Chamfer distance increases in both directions (2.05 / 2.02 / 2.50), so the original pose is a local minimum
Direct Approaches
• Representation of rotation
• Loss for rotation
• Example: DenseFusion
Example: DenseFusion
Wang et al., “DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion”, CVPR 2019
• Segment objects
• Get the image crop and the point cloud for each object
• Encode appearance (image) with a CNN and geometry (point cloud) with PointNet
• Predict poses from the fused features
• Compute the shape-aware loss
  - non-symmetric objects: per-point MSE
  - symmetric objects: Chamfer distance
  - (Min-of-N as an alternative?)
Indirect Approaches
Indirect Approaches
• Input: cropped image/point cloud/depth map of a single object
• Output: corresponding pairs {(p_i, q_i)}
  - points in the canonical frame {p_i}
  - points in the camera frame {q_i}
  - estimate (R, t) by solving min_{R,t} ∑_{i=1}^n ‖Rp_i + t − q_i‖²
Two Categories of Indirect Approaches
• If points in the canonical frame are known, predict their corresponding locations in the camera frame
• If points in the camera frame are known, predict their corresponding locations in the canonical frame
Given Points in the Canonical Frame, Predict Corresponding Locations in the Camera Frame
• Recall: three correspondences are enough
• Which points in the canonical frame should be given?
  - Choice by PVN3D: keypoints in the canonical frame
He et al., “PVN3D: A Deep Point-wise 3D Keypoints Voting Network for 6DoF Pose Estimation”, CVPR 2020
Keypoint Selection
• Option 1: bounding box vertices
• Option 2: farthest point sampling (FPS) over the CAD object model
Chen et al., “G2L-Net: Global to Local Network for Real-time 6D Pose Estimation with Embedding Vector Features”, CVPR 2020
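Option 2 above, farthest point sampling, greedily picks points that are maximally spread out; a minimal sketch (function name and the fixed starting index are my choices):

```python
# Sketch of farthest point sampling (FPS): repeatedly pick the point whose
# distance to the already-picked set is largest.
import numpy as np

def farthest_point_sampling(X, k, start=0):
    """Pick k indices from the point cloud X (3 x n)."""
    picked = [start]
    d2 = ((X - X[:, [start]]) ** 2).sum(axis=0)  # squared distance to the picked set
    for _ in range(k - 1):
        nxt = int(d2.argmax())                   # farthest from everything picked so far
        picked.append(nxt)
        d2 = np.minimum(d2, ((X - X[:, [nxt]]) ** 2).sum(axis=0))
    return picked
```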
Example: PVN3D
Get point-wise features by fusing color and geometry features
Example: PVN3D
For each keypoint:
• Voting: for each point in the camera frame, predict its offset to the keypoint (in the camera frame)
• Clustering: find one location according to all the candidates
Given Points in the Camera Frame, Predict Corresponding Locations in the Canonical Frame
• Which points in the camera frame should be given?
  - Choice by NOCS: every point in the camera frame
• Mapping: 3D point in the camera frame (2D visible pixel with depth) → 3D point in the (normalized) canonical frame
Wang, He, et al. "Normalized object coordinate space for category-level 6d object pose and size estimation." CVPR 2019.
Example: NOCS
Input image → output NOCS map (H × W × 3)
Note: the object is normalized to have unit diagonal of its bounding box in the canonical space, so the canonical space is called the “Normalized Object Coordinate Space” (NOCS)
Example: NOCS for Symmetric Objects
• Given equivalent GT rotations ℛ = {R¹_GT, R²_GT, ⋯, Rⁿ_GT} (finite symmetry order n), we can generate n equivalent NOCS maps
• Similar to the shape-agnostic loss in direct approaches, we can use the Min-of-N loss
Umeyama’s Algorithm
• However, the target points in the canonical space of NOCS are normalized, and thus we also need to predict the scale factor
• Similarity transformation estimation (rigid transformation + uniform scale factor)
• Closed-form solution
  - Umeyama's algorithm: http://web.stanford.edu/class/cs273/refs/umeyama.pdf
  - Similar to the counterpart without scale
Read by yourself
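A sketch of the similarity-transform estimation above, following my reading of Umeyama's paper: it solves s R p_i + t ≈ q_i in the least-squares sense, differing from the rigid case only in the scale estimate and a slightly different determinant fix (function name `umeyama` is mine):

```python
# Sketch of Umeyama's closed-form similarity-transform estimation (s, R, t).
import numpy as np

def umeyama(P, Q):
    """P, Q: 3 x n corresponding points; returns scale s, rotation R, translation t."""
    n = P.shape[1]
    p_bar = P.mean(axis=1, keepdims=True)
    q_bar = Q.mean(axis=1, keepdims=True)
    Pc, Qc = P - p_bar, Q - q_bar
    U, D, Vt = np.linalg.svd(Qc @ Pc.T / n)      # SVD of the cross-covariance
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1                             # determinant fix
    R = U @ S @ Vt
    var_p = (Pc ** 2).sum() / n                  # variance of the source points
    s = np.trace(np.diag(D) @ S) / var_p         # optimal uniform scale
    t = q_bar - s * R @ p_bar
    return s, R, t
```

Setting s = 1 (i.e., skipping the scale step) recovers the rigid-transformation solution from earlier in the lecture.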
Tips for Homework 2
• For learning-based approaches
  - Start with direct approaches
  - Crop the point cloud of each object from the GT depth map given the GT segmentation mask
  - Train a neural network, e.g. PointNet, with the shape-agnostic loss
  - Improve the results considering symmetry