Exact Recovery in Semidefinite Relaxation of
Synchronization over the Euclidean Group in One
Dimension
by
Vladimir A. Kobzar
A Thesis
Submitted in Partial
Fulfillment of the Requirements
for the Degree
of Master of Science
Courant Institute of Mathematical Sciences
New York University
September 2017
Adviser: Afonso S. Bandeira
© Copyright by Vladimir A. Kobzar, 2017.
All Rights Reserved
Abstract
Nonconvex maximum likelihood estimation problems are often hard to solve computationally. As a result,
convex relaxations of the maximum likelihood estimator (MLE) are commonly used, in particular relaxations
based on semidefinite programming (SDP), which can be solved in a polynomial amount of time.
We consider the problem of synchronization over the Euclidean group in one dimension (E(1)): the goal is
to recover n elements of such group (ground truth) from measurements of their pairwise products corrupted
with non-adversarial noise. Informally this can be seen as the problem of recovering the orientation and
position of a ballerina that moves along a line and faces in one of two possible directions in each of n poor
quality photos.
We show that the SDP relaxation of synchronization over E(1) exactly recovers the orientation ground
truth with high probability. This is demonstrated for any level of noise by leveraging non-asymptotic bounds
for the spectral norm of random matrices with independent entries.
From the orientation, the MLE of the position is recovered by the least squares estimate. Such recovery
is tight, meaning that the least squares solution matches the MLE of the position ground truth. However,
due to noise the exact recovery of the position ground truth is not possible.
Synchronization over the special Euclidean group in d dimensions (SE(d)) includes important problems
in robotics and computer vision. We expect that establishing the tightness of synchronization over SE(d)
should be similar to establishing that over the Euclidean group in d dimensions (E(d)). Therefore, we hope
that our result for E(1) will be extended to higher dimensions.
Acknowledgements
First and foremost, I am extremely grateful to Afonso Bandeira for all the invaluable guidance, instruction
and support he provided as my thesis advisor and teacher at the Courant Institute. The research in this
thesis is inspired by Afonso’s course entitled “Mathematics of Data Science” at Courant, which greatly
shaped my academic interests.
I also would like to thank Carlos Fernandez-Granda, my faculty mentor at the NYU Center for Data
Science (CDS) and the second reader of this thesis, for all of his advice and encouragement with respect to
my research.
I owe a debt of gratitude to Sinan Gunturk and Yuri Bakhtin, whose leadership and mentorship profoundly
shaped my career at Courant.
I am grateful to all participants in the Math and Data Group at Courant and the CDS. Being a part of
that group provided an extremely supportive community while I was working on this thesis.
Keith Moffat, Vukica Srajer and Robert Henning first introduced me to image processing problems in
the context of dynamic X-ray diffraction studies of macromolecules at BioCARS, Argonne National
Laboratory/The University of Chicago. My work with them at BioCARS motivated my interest in molecular
imaging models and, more broadly, the mathematics of data. I am very fortunate to have them as my
mentors, colleagues and friends.
Last but not least, Brett Bernstein very generously shared ideas about this thesis and various related
topics.
Any errors should be attributed to me.
Contents
Abstract
Acknowledgements
1 Introduction
1.1 Formulation of synchronization over E(1)
1.2 Contribution
1.3 Overview of related work
1.3.1 Synchronization over Z2
1.3.2 Synchronization over SE(d)
1.4 Notation
2 Nonconvex MLE and its relaxation
2.1 Least squares estimate
2.2 SDP relaxation
3 Strong duality and exact recovery of the orientation
3.1 Dual certificate
3.2 Decomposition of $L_Q$
3.3 Exactness conditions
3.4 Exactness without pairwise orientation measurements
4 Conclusion
A Proof of Lemma 2.1.1
B Proof of Lemma 3.2.1
B.1 Decomposition of $L^{(TG)}$
B.2 Decomposition of $L^{(TB)}$
B.3 Decomposition of $L^{(TN)}$
B.3.1 Decomposition of $H + H^T$
B.3.2 Decomposition of $D_{H+H^T}$
B.3.3 Decomposition of $G$ and $D_G$
C Proof of Lemma 3.3.1
C.1 Minimum eigenvalue of $L^{(TG)}$
C.2 Spectral radius of $L^{(TN)}$
C.2.1 Spectral radius of $H + H^T$
C.2.2 Spectral radius of $D_{H+H^T}$
C.2.3 Spectral radius of $G$
C.2.4 Spectral radius of $D_G$
C.3 Spectral radius of $D^{(XN)}$ and $M^{(XN)}$
Chapter 1
Introduction
Many signal recovery problems are solved as optimization problems over a set of feasible signals where the
optimum represents the signal with the maximum likelihood given the data. Unfortunately such problems
are often nonconvex and the parameter space is often exponentially large, which makes them computationally
challenging. It is therefore common to use heuristics, such as expectation maximization, to approximate
the maximum likelihood estimator (MLE) [3]. However, the convergence of heuristic methods to the global
maximum is not formally guaranteed in many cases. Also when these methods do attain a global maximum,
in general it is not possible to formally certify that fact.
As a result, another common approach is to determine the MLE over a larger feasible set by removing
the nonconvex constraints, a so-called convex relaxation. The idea is that algorithms for convex optimization
problems are generally guaranteed to converge to a global optimum. One particular class of relaxations is based
on semidefinite programming (SDP), where a linear objective function is optimized over a convex set of positive
semidefinite matrices. An SDP can be solved in a polynomial amount of time [14]. However, the solution
may not be in the original parameter space, and therefore its projection onto the original parameter space
may be suboptimal.
However, in a non-adversarial noise setting, it may be possible to achieve tightness of the convex
relaxation, meaning that the relaxation recovers the solution in the original parameter space with high
probability.¹ Thus, instead of addressing every possible instance of a computationally intractable
problem, the convex optimization route achieves an optimal and computationally tractable solution with high
probability.
¹ In the context of recovering a signal or other object of interest from incomplete measurements, such as compressed sensing [9, 11], matrix completion [10], and linear inverse problems [5], the recovery by convex optimization is achieved with high probability.
In the case of a discrete signal, the relaxation may be not only tight, but may also be exact, meaning that
the MLE coincides with the ground truth signal with high probability. For example, exactness was shown
in the context of Z2 signal recovery (SDP Z2) [6, 7] from noisy measurements, as well as in the context of
the stochastic block model, also known as correlation clustering, with two [1, 2] or more communities [4].²
So-called synchronization problems are one important class of signal recovery problems. They entail
estimating a set of signals from data concerning relations or interactions between them. As described
in [6, 16, 17], this includes various problems in:
1. Computer vision, such as determining structure from motion. This entails building a three-dimensional
model of an object from several two-dimensional photos taken from unknown positions. Although it is
usually not possible to estimate the position of the object relative to the camera from a given photo,
one can compare pairs of photos and estimate their relative positions.
2. Signal processing, such as synchronization for molecule reconstruction in cryo-electron microscopy. This
entails resolving the global three-dimensional structure of a molecule by recording multiple images of
the molecule at unknown orientations, where we can estimate relative orientation of the molecule in a
pair of images.
3. Robotics, such as pose-graph simultaneous localization and mapping (pose-graph SLAM). This entails
the determination of a collection of poses (position and orientation) of a robot or another object from
noisy pairwise relative measurements.
² In the context of compressed sensing, matrix completion and linear inverse problems, exact recovery from incomplete data was achieved with high probability as well. See the references cited in the previous footnote.
1.1 Formulation of synchronization over E(1)
We consider one particular problem of the above class: synchronization over the Euclidean group in one
dimension (E(1)), where E(1) is the semidirect product $\mathbb{R} \rtimes O(1) = \mathbb{R} \rtimes \mathbb{Z}_2$ with the multiplication given by³
\[ r_1 \cdot r_2 = (t_1, x_1) \cdot (t_2, x_2) = (t_1 + x_1 t_2,\; x_1 x_2). \]
The goal is to estimate the values of a set of unknown group elements $r_i^\natural = (t_i^\natural, x_i^\natural) \in E(1)$ (ground truth)
for $1 \le i \le n$ from pairwise products
\[ (r_i^\natural)^{-1} r_j^\natural = \bigl(x_i^\natural (t_j^\natural - t_i^\natural),\; x_i^\natural x_j^\natural\bigr) \]
corrupted with noise.⁴
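As an illustration of the group law above, the following is a minimal Python sketch of multiplication, inversion and the pairwise product in E(1). The names `Element`, `multiply`, `inverse` and `relative` are hypothetical helpers introduced here, not part of the thesis; the inverse uses the formula $(t, x)^{-1} = (-xt, x)$ stated in the footnote.

```python
from typing import Tuple

# Hypothetical representation: an element of E(1) is a pair (t, x)
# with t a real translation and x an orientation in {+1, -1}.
Element = Tuple[float, int]


def multiply(r1: Element, r2: Element) -> Element:
    """Group product (t1, x1) * (t2, x2) = (t1 + x1*t2, x1*x2)."""
    t1, x1 = r1
    t2, x2 = r2
    return (t1 + x1 * t2, x1 * x2)


def inverse(r: Element) -> Element:
    """Inverse (t, x)^{-1} = (-x*t, x)."""
    t, x = r
    return (-x * t, x)


def relative(ri: Element, rj: Element) -> Element:
    """Pairwise product ri^{-1} * rj, which equals (xi*(tj - ti), xi*xj)."""
    return multiply(inverse(ri), rj)


ri, rj = (2.0, -1), (5.0, 1)
print(relative(ri, rj))            # noiseless relative measurement
print(multiply(ri, inverse(ri)))   # identity element (0, 1)
```

The pairwise product depends on the translations only through the difference $t_j - t_i$, which is why translations can be recovered only up to a global shift.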
1.2 Contribution
We show that although synchronization over E(1) is nonconvex, in the case of Gaussian noise the
orientation recovered by an SDP relaxation (SDP E(1)) matches the ground truth $x_i^\natural \in \mathbb{Z}_2$ (i.e., achieves
exact recovery) with high probability. This holds even if the noise levels grow to infinity as the number
of elements n grows to infinity. Moreover, such recovery is achieved with fewer measurements than the
number of measurements needed to achieve recovery by SDP Z2, as defined in Section 1.3.1, i.e., if only
relative orientation measurements $(x_i^\natural)^{-1} x_j^\natural$ were available. These results are demonstrated by leveraging
non-asymptotic bounds for the spectral norm of random matrices with independent entries by an approach
adapted from [6, 7].
Numerical simulations confirm the analytic conditions for the exactness of recovery of orientations in
SDP E(1).⁵ Figures 1.1 and 1.2 demonstrate the exactness of recovery of the orientation where we scale a sparse
translation ground truth $t^\natural$ so that the centered translation ground truth $t_c^\natural = t^\natural - \frac{1}{n}\sum_{i=1}^n t_i^\natural$ has, respectively, a fixed and an increasing $\ell_2$ norm. (Similar results can be obtained for a nonsparse $t^\natural$.)
³ This multiplication convention follows from matrix multiplication if we represent $(t, x)$ by $\begin{bmatrix} x & t \\ 0 & 1 \end{bmatrix}$. As a technical matter, the multiplication by $x_1$ is a homomorphism $\varphi_{x_1} : \mathbb{R} \to \mathbb{R}$ given by $\varphi_{x_1}(t_2) = x_1 t_2$. Since trivially $\varphi_{x_1} \in \mathrm{Aut}(\mathbb{R})$ and the multiplication in E(1) is given by
\[ \cdot : (\mathbb{R} \times \mathbb{Z}_2) \times (\mathbb{R} \times \mathbb{Z}_2) \to \mathbb{R} \times \mathbb{Z}_2, \qquad (t_1, x_1) \cdot (t_2, x_2) = (t_1 + \varphi_{x_1}(t_2),\; x_1 x_2) \]
for $t_1, t_2$ in $\mathbb{R}$ and $x_1, x_2$ in $\mathbb{Z}_2$, the Cartesian product $\mathbb{R} \times \mathbb{Z}_2$ meets the definition of an outer semidirect product. Therefore, we denote it by $\mathbb{R} \rtimes \mathbb{Z}_2$.
⁴ Note that $r^{-1} = (t, x)^{-1} = (-xt, x)$.
⁵ The following experiments were performed on a Linux system with two Intel Xeon E5-2680 (2.80 GHz) CPUs (20 cores) and 128 GB of memory. The experimental software was written in Matlab and used the default CVX solver (SDPT3 4.0).
Figure 1.1: This figure shows how frequently SDP E(1) exactly recovers the orientation $X^\natural = x^\natural x^{\natural T}$ when the centered translation ground truth $t_c^\natural$ has a fixed $\ell_2$ norm. For each $(\sigma_x, n)$, 10 realizations of the data were generated and the exactness of the recovered orientation was verified. The frequency of success is represented in grayscale (white for 100% success and black for 0% success). The results agree with the analytic predictions (solid curve). The analytic predictions without the translation data (SDP Z2) are plotted as well (dashed curve). (In these experiments, the translation noise $\sigma_t$ (plotted on the right vertical axis) is equal to the orientation noise $\sigma_x$. For such levels of noise, the experiments demonstrate the tightness of the $\sigma_x$ bound.)
The translation estimate $\hat{t} \in \mathbb{R}^n$ is recovered by a least squares estimate from the orientation ground
truth $x^\natural$, and such recovery is tight (meaning that $\hat{t}$ is the MLE of the position ground truth with high
probability). However, due to noise, exact recovery of the position ground truth $t^\natural$ is not possible.
The orientation ground truth $x^\natural$ can be recovered exactly even without the pairwise orientation
measurements $x_i^\natural x_j^\natural$ since the measurements of the relative translations $x_i^\natural (t_j^\natural - t_i^\natural)$ incorporate orientation information.
In this context, for a given level of noise more measurements are needed to achieve recovery than in the
context of the regular SDP E(1) (where measurements of $(x_i^\natural (t_j^\natural - t_i^\natural),\; x_i^\natural x_j^\natural)$ are available). Compare Figure 1.1
with Figure 3.1.
Figure 1.2: This figure shows how frequently the semidefinite relaxation (SDP E(1)) exactly recovers the orientation $X^\natural = x^\natural x^{\natural T}$ when the centered ground truth translation $t_c^\natural$ has increasing $\ell_2$ norm (plotted on the top horizontal axis). For each $(\sigma_x, n)$, 10 realizations of the data were generated and the exactness of the recovered orientation was verified. The frequency of success is represented in grayscale (white for 100% success and black for 0% success). The results agree with the analytic predictions (solid curve). (In these experiments, the translation noise $\sigma_t$ (plotted on the right vertical axis) is equal to the orientation noise $\sigma_x$. For such levels of noise, the experiments demonstrate the tightness of the $\sigma_x$ bound.)
1.3 Overview of related work
1.3.1 Synchronization over Z2
The synchronization problem over Z2 (which arises, for example, in the context of community detection for
two communities) is to recover $z_i \in \{\pm 1\}$, $1 \le i \le n$, from observations given by
\[ y_{ij} = z_i z_j + \sigma w_{ij} \]
where $w_{ij} = w_{ji}$ is a standard Gaussian ($\mathcal{N}(0,1)$), and the MLE is given by a least squares solution:
\[ \hat{z} = \arg\min_{x_1, \dots, x_n} \sum_{i,j} (y_{ij} - x_i x_j)^2 \]
In [6, 7] this problem was “lifted” into an equivalent matrix form: for the underlying ground truth
$z \in \{\pm 1\}^n$, the observations are given by the $n \times n$ matrix
\[ Y = z z^T + \sigma W \]
where $W$ is a real symmetric random matrix ($W_{ij} = W_{ji} \sim \mathcal{N}(0,1)$). Then the MLE for $z z^T$ is given by
\[ \arg\max_{X = x x^T} \mathrm{tr}(Y X) \]
An SDP can be formed by relaxing the last constraint as follows:
\[ \arg\max_{X \succeq 0,\; X_{ii} = 1} \mathrm{tr}(Y X) \]
In the above-referenced papers, it was shown that the relaxation recovers $z z^T$ exactly with high probability
if $\sigma < \sqrt{\frac{n}{(2+\varepsilon) \log n}}$.
Note that when $t_c^\natural = 0$, the synchronization problems over Z2 and E(1) are equivalent. This is
illustrated in Figure 1.3.
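To make the lifted formulation concrete, here is a small pure-Python sketch that generates $Y = zz^T + \sigma W$, evaluates $\mathrm{tr}(Y xx^T) = x^T Y x$, and finds the MLE by exhaustive search for a tiny $n$. It is only an illustration under stated assumptions: the helper names are hypothetical, the diagonal of $Y$ is set to 1 (that is, $W_{ii} = 0$ is assumed), and brute force stands in for the SDP, which is what one would actually solve at realistic sizes.

```python
import itertools
import math
import random


def z2_observations(z, sigma, rng):
    """Y = z z^T + sigma * W with symmetric W; assumes W_ii = 0, so Y_ii = 1."""
    n = len(z)
    Y = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            Y[i][j] = Y[j][i] = z[i] * z[j] + sigma * rng.gauss(0.0, 1.0)
    return Y


def objective(Y, x):
    """tr(Y x x^T) = x^T Y x."""
    n = len(x)
    return sum(Y[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def brute_force_mle(Y):
    """Exhaustive search over sign vectors with x[0] = +1 (global sign is not identifiable)."""
    n = len(Y)
    candidates = ((1,) + tail for tail in itertools.product((1, -1), repeat=n - 1))
    return list(max(candidates, key=lambda x: objective(Y, x)))


def noise_threshold(n, eps):
    """Right-hand side of the condition sigma < sqrt(n / ((2 + eps) * log n))."""
    return math.sqrt(n / ((2.0 + eps) * math.log(n)))


rng = random.Random(0)
z = [1, -1, -1, 1, 1, -1]
Y = z2_observations(z, sigma=0.0, rng=rng)
print(brute_force_mle(Y) == z)
print(noise_threshold(100, 0.1))
```

Minimizing $\sum_{i,j}(y_{ij} - x_i x_j)^2$ over sign vectors is the same as maximizing $x^T Y x$ because $x_i^2 x_j^2 = 1$, which is why the sketch only evaluates the trace objective.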
Figure 1.3: This figure shows how frequently the semidefinite relaxation (SDP E(1)) exactly recovers the orientation $X^\natural = x^\natural x^{\natural T}$ when the centered translation ground truth $t_c^\natural$ is equal to zero. For each $(\sigma_x, n)$, 10 realizations of the data were generated and the exactness of the recovered orientation was verified. The frequency of success is represented in grayscale (white for 100% success and black for 0% success). The results agree with the analytic predictions for SDP E(1), which in this context are the same as those for SDP Z2 (solid curve). (In these experiments, the translation noise $\sigma_t$ (plotted on the right vertical axis) is equal to the orientation noise $\sigma_x$. For such levels of noise, the experiments demonstrate the tightness of the $\sigma_x$ bound.)
1.3.2 Synchronization over SE(d)
The SE(d) synchronization problem entails estimating a set of unknown poses $p_i = (t_i, x_i)$, $1 \le i \le n$, given
noisy measurements of their pairwise relative transforms $p_i^{-1} p_j$. As noted previously, this problem arises in
robotics, such as pose-graph SLAM, and computer vision, such as camera pose estimation. It entails the
determination of a collection of poses (position and orientation) of a robot or another object from noisy
pairwise relative measurements.
In [15–17], the SE(d) synchronization problem in a non-adversarial (but operationally relevant) noise
regime was posed as a nonconvex MLE. The algorithm proposed in these papers verified the tightness of the
MLE recovery of an SDP relaxation post hoc for each given instance of the problem.
1.4 Notation
We will use the following standard matrix and probability notation. For a matrix $M$, we denote its $k$-th
smallest eigenvalue by $\lambda_k(M)$, the largest eigenvalue by $\lambda_{\max}(M)$, and its spectral and Frobenius norms by
$\|M\|$ and $\|M\|_F$, respectively; $\mathrm{diag}(M)$ refers to a vector with the diagonal elements of $M$ as entries,
and $\mathrm{ddiag}(M)$ sets the off-diagonal entries of $M$ to zero. For $x \in \mathbb{R}^n$, $\mathrm{diag}(x)$ refers to a diagonal matrix
$D \in \mathbb{R}^{n \times n}$ with $D_{ii} = x_i$. $\mathbf{1}_n$ denotes the vector in $\mathbb{R}^n$ with all components equal to 1 (we will omit the
subscript when the dimension is clear from the context). $D_M$ refers to the diagonal matrix $\mathrm{diag}(M\mathbf{1})$, i.e.,
with $D_{ii} = \sum_{j=1}^n M_{ij}$, and $L_M$ is the matrix given by $L_M = D_M - M$.
$a \lesssim b$ means that there exists a universal constant $C > 0$ such that $a \le Cb$.
We say that an event $E$ happens with high probability as $n \to \infty$ if there exists $\varepsilon > 0$ such that
$\mathbb{P}[E] \ge 1 - n^{-\varepsilon}$. $M \succeq 0$ means that $M$ is not only positive semidefinite (PSD) but also symmetric.
E(d) refers to the Euclidean group of isometries in d dimensions given by
\[ \left\{ X : X = \begin{bmatrix} R & r \\ 0_{1 \times d} & 1 \end{bmatrix},\; R \in O(d),\; r \in \mathbb{R}^d \right\} \]
SE(d) refers to the special Euclidean group of rigid body motions in d dimensions given by
\[ \left\{ X : X = \begin{bmatrix} R & r \\ 0_{1 \times d} & 1 \end{bmatrix},\; R \in SO(d),\; r \in \mathbb{R}^d \right\} \]
where O(d) and SO(d) are the orthogonal and special orthogonal groups in d dimensions, respectively.
Chapter 2
Nonconvex MLE and its relaxation
2.1 Least squares estimate
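In this chapter and Chapter 3, we adapt the approach developed in [6, 7] for synchronization over Z2 to the
synchronization problem over E(1): determine $r^\natural = (t^\natural, x^\natural) \in E(1)^n$ from the following noisy observations
\[
\begin{aligned}
Z_{ij} = (S_{ij}, Y_{ij}) &= (t_i^\natural, x_i^\natural)^{-1} (t_j^\natural, x_j^\natural) + (\sigma_t N_{ij}, \sigma_x W_{ij}) \\
&= \bigl(x_i^\natural (t_j^\natural - t_i^\natural),\; x_i^\natural x_j^\natural\bigr) + (\sigma_t N_{ij}, \sigma_x W_{ij})
\end{aligned}
\]
where $x_i N_{ij} = -x_j N_{ji}$ and $W_{ij} = W_{ji}$ are $\mathcal{N}(0,1)$ i.i.d. for $i \ne j$ ($N_{ii} = W_{ii} = 0$). For simplicity, we assume
that we have a complete set of $n^2 - n$ pairwise relative measurements.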
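The observation model above can be simulated directly. The following pure-Python sketch draws $S$ and $Y$ for given ground truth translations and orientations; the function name `e1_observations` and its arguments are hypothetical, and the diagonal entries are set to the identity element $(0, 1)$, which is an assumption consistent with $N_{ii} = W_{ii} = 0$.

```python
import random


def e1_observations(t, x, sigma_t, sigma_x, rng):
    """Return (S, Y): noisy relative translations and relative orientations."""
    n = len(t)
    S = [[0.0] * n for _ in range(n)]
    Y = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            n_ij = rng.gauss(0.0, 1.0)
            w_ij = rng.gauss(0.0, 1.0)
            # x_i N_ij = -x_j N_ji means N_ji = -x_i x_j N_ij, because x_j^2 = 1.
            n_ji = -x[i] * x[j] * n_ij
            S[i][j] = x[i] * (t[j] - t[i]) + sigma_t * n_ij
            S[j][i] = x[j] * (t[i] - t[j]) + sigma_t * n_ji
            # W is symmetric, so Y is symmetric as well.
            Y[i][j] = Y[j][i] = x[i] * x[j] + sigma_x * w_ij
    return S, Y


rng = random.Random(0)
t_true = [0.5, -1.0, 2.0, -1.5]
x_true = [1, -1, 1, -1]
S, Y = e1_observations(t_true, x_true, sigma_t=0.2, sigma_x=0.2, rng=rng)
print(S[0][1], S[1][0], Y[0][1])
```

With this noise convention the measurements satisfy $x_i S_{ij} = -x_j S_{ji}$, mirroring the noiseless identity $x_i \cdot x_i(t_j - t_i) = -x_j \cdot x_j(t_i - t_j)$.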
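By definition, the maximum a posteriori estimator (MAP) maximizes the probability of recovering $r^\natural =
(t^\natural, x^\natural)$. Since we have no prior information on $r^\natural$, we assume a uniform prior, in which case the MAP is
given by the MLE, i.e., the least squares solution:
\[ \arg\min_{r_i, r_j \in \mathbb{R} \rtimes \mathbb{Z}_2} \sum_{i,j} \| r_i^{-1} r_j - Z_{ij} \|_2^2 \]
where
\[
\begin{aligned}
\| r_i^{-1} r_j - Z_{ij} \|_2^2 &= \| (t_i, x_i)^{-1} \cdot (t_j, x_j) - (S_{ij}, Y_{ij}) \|_2^2 \\
&= \| (-x_i t_i + x_i t_j,\; x_i x_j) - (S_{ij}, Y_{ij}) \|_2^2 \\
&= (x_i (t_j - t_i) - S_{ij})^2 + (x_i x_j - Y_{ij})^2
\end{aligned}
\]
Accordingly, the minimization problem is given by
\[ \hat{r} = (\hat{t}, \hat{x}) = \arg\min_{\substack{x_i, x_j \in \{\pm 1\} \\ t_i, t_j \in \mathbb{R}}} \sum_{i,j} (t_j - t_i - x_i S_{ij})^2 - 2 x_i x_j Y_{ij} \tag{2.1} \]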
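The passage from the squared residual to the objective in (2.1) relies only on $x_i^2 = 1$. As a sketch of the intermediate arithmetic (not quoted from the thesis):

```latex
\begin{aligned}
(x_i (t_j - t_i) - S_{ij})^2
  &= x_i^2 \, (t_j - t_i - x_i S_{ij})^2
   = (t_j - t_i - x_i S_{ij})^2, \\
(x_i x_j - Y_{ij})^2
  &= x_i^2 x_j^2 - 2 x_i x_j Y_{ij} + Y_{ij}^2
   = 1 - 2 x_i x_j Y_{ij} + Y_{ij}^2 .
\end{aligned}
```

The terms $1 + Y_{ij}^2$ do not depend on $(t, x)$, so dropping them leaves the minimizer unchanged.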
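Lemma 2.1.1. The minimization problem (2.1) is equivalent to
\[ \hat{x} = \arg\max_{\substack{X = x x^T \\ x \in \mathbb{Z}_2^n}} \mathrm{Trace}(QX) \tag{2.2} \]
independently of $t$, where
\[ Q = \frac{1}{2n} V^T V + 2Y, \]
\[
V = B_1^T B_2 =
\begin{bmatrix}
\sum_{j \ne 1} S_{1j} & -S_{21} & -S_{31} & \cdots & -S_{n-1,1} & -S_{n,1} \\
-S_{12} & \sum_{j \ne 2} S_{2j} & -S_{32} & \cdots & -S_{n-1,2} & -S_{n,2} \\
-S_{13} & -S_{23} & \sum_{j \ne 3} S_{3j} & \cdots & -S_{n-1,3} & -S_{n,3} \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
-S_{1n} & -S_{2n} & -S_{3n} & \cdots & -S_{n-1,n} & \sum_{j \ne n} S_{n,j}
\end{bmatrix},
\]
\[ \hat{t} = -B_1^{\dagger} B_2 \hat{x} \quad \text{(uniquely up to a global shift)}, \]
$B_1 \in \mathbb{R}^{n^2 \times n}$ is an incidence matrix of a connected graph, i.e., $B_1$ is comprised of $n$ matrix blocks $B_1^{(i)} \in \mathbb{R}^{n \times n}$ arranged vertically. Each $B_1^{(i)}$ is given by
\[ (B_1^{(i)})_{:i} = -\mathbf{1}_n, \qquad (B_1^{(i)})_{jj} = 1 \text{ for } i \ne j, \]
and the remaining entries are zero. Therefore $B_1$ has the following structure:
\[
B_1 =
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-1 & 1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-1 & 0 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-1 & 0 & 0 & 1 & 0 & \cdots & 0 & 0 & 0 \\
\vdots & & & & & & & & \vdots \\
1 & -1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & -1 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & -1 & 0 & 1 & 0 & \cdots & 0 & 0 & 0 \\
\vdots & & & & & & & & \vdots \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & -1 \\
0 & 0 & 0 & 0 & 0 & \cdots & 1 & 0 & -1 \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 1 & -1 \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0
\end{bmatrix}
\]
$B_2 \in \mathbb{R}^{n^2 \times n}$ is a matrix comprised of $n$ matrix blocks $B_2^{(i)} \in \mathbb{R}^{n \times n}$ arranged vertically. Each $B_2^{(i)}$ is given by $(B_2^{(i)})_{:i} = -S_{i:}^T$ and the remaining entries are zero. Therefore $B_2$ has the following structure:
\[
B_2 =
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-S_{12} & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-S_{13} & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-S_{14} & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
\vdots & & & & & & & & \vdots \\
0 & -S_{21} & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & -S_{23} & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & -S_{24} & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
\vdots & & & & & & & & \vdots \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & -S_{n,n-3} \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & -S_{n,n-2} \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & -S_{n,n-1} \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0
\end{bmatrix}
\]
The proof is provided in Section A of the Appendix.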
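The matrix $V$ and the least squares translation estimate of Lemma 2.1.1 can be sketched numerically without forming $B_1$ and $B_2$. The pure-Python example below builds $V$ entrywise from $S$ and recovers centered translations for a fixed orientation vector. It rests on an assumption made here and not stated in the lemma: for the complete measurement graph, $B_1^T B_1 = 2(n I - \mathbf{1}\mathbf{1}^T)$, so that on vectors orthogonal to $\mathbf{1}$ the formula $-B_1^{\dagger} B_2 x$ reduces to $-Vx/(2n)$. The function names are hypothetical.

```python
def build_V(S):
    """V[i][i] = sum_{j != i} S[i][j]; V[j][i] = -S[i][j] for j != i."""
    n = len(S)
    V = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                V[i][i] = sum(S[i][k] for k in range(n) if k != i)
            else:
                V[j][i] = -S[i][j]
    return V


def translations_from_orientations(S, x):
    """Centered least squares translations -V x / (2n) for fixed orientations x.

    Assumes a complete measurement graph (see the lead-in paragraph).
    """
    n = len(S)
    V = build_V(S)
    return [-sum(V[j][i] * x[i] for i in range(n)) / (2.0 * n) for j in range(n)]


# Noiseless check: S_ij = x_i (t_j - t_i) with centered ground truth translations.
t_true = [0.5, -1.0, 2.0, -1.5]
x_true = [1, -1, 1, -1]
n = len(t_true)
S = [[x_true[i] * (t_true[j] - t_true[i]) for j in range(n)] for i in range(n)]
t_hat = translations_from_orientations(S, x_true)
print(t_hat)
```

In the noiseless case the sketch returns the centered ground truth, which illustrates why the translations are determined only up to a global shift once the orientations are fixed.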
2.2 SDP relaxation
We replace the nonconvex rank constraint in (2.2) as follows:
\[ \max_{\substack{X \succeq 0 \\ X_{ii} = 1 \\ |X_{ij}| \le 1}} \mathrm{Trace}(QX) \]
However, the positive semidefiniteness, together with $X_{ii} = 1$, implies that the absolute value of the
off-diagonal terms is bounded by 1. Therefore, the foregoing relaxation is equivalent to:
\[ \max_{\substack{X \succeq 0 \\ X_{ii} = 1}} \mathrm{Trace}(QX) \tag{2.3} \]
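The claim that positive semidefiniteness and a unit diagonal already bound the off-diagonal entries follows from a standard fact about principal minors. A short worked version, offered as a sketch:

```latex
X \succeq 0
\;\Longrightarrow\;
\begin{bmatrix} X_{ii} & X_{ij} \\ X_{ij} & X_{jj} \end{bmatrix} \succeq 0
\;\Longrightarrow\;
X_{ii} X_{jj} - X_{ij}^2 \ge 0
\;\Longrightarrow\;
|X_{ij}| \le \sqrt{X_{ii} X_{jj}} = 1 .
```

Every $2 \times 2$ principal submatrix of a PSD matrix is PSD, so its determinant is nonnegative, and the constraint $|X_{ij}| \le 1$ is redundant.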
Chapter 3
Strong duality and exact recovery of the orientation
3.1 Dual certificate
Since $\mathrm{Trace}(QX)$ is given by $\sum_{i,j} Q_{ij} X_{ji}$, we can express (2.3) in vector notation. Specifically, given a
vector $q$ in $\mathbb{R}^{n(n-1)/2}$ which contains the entries of $Q$ above the main diagonal, we have
\[ \mathrm{Trace}(Q) + \max_{F(x') \succeq 0} 2 q^T x' \]
where
\[ F(x') = I + \sum_{\substack{i,j \text{ above the} \\ \text{main diagonal}}} x'_{i,j} F_{i,j}, \]
$x' \in \mathbb{R}^{n(n-1)/2}$, $I$ is the $n \times n$ identity matrix, and $F_{i,j}$ are a collection of $n(n-1)/2$ symmetric matrices in $\mathbb{R}^{n \times n}$
containing 1's in the $ij$-th and $ji$-th entries and 0's elsewhere, where $i, j$ are the indices of matrix entries above
the main diagonal. Then, using Eq. 28 in [20], the associated dual problem is given by
\[ \mathrm{Trace}(Q) + \min_{\substack{\mathrm{Trace}(F_{i,j} C) = 2 q_{i,j} \\ C \succeq 0}} \mathrm{Trace}(C) \]
Observe that the first constraint disregards the diagonal terms. Thus, if we represent $C = D - Q$ where
$D$ is a diagonal matrix, the constraint will be satisfied for any $D$. Therefore, the preceding dual problem is
equivalent to
\[ \mathrm{Trace}(Q) + \min_{D - Q \succeq 0} \mathrm{Trace}(D - Q) \]
and we obtain the following lemma.
Lemma 3.1.1. The dual problem associated with the relaxation in (2.3) is given by
\[ \min_{D - Q \succeq 0} \mathrm{Trace}(D) \tag{3.1} \]
where $D$ is a diagonal matrix.
If $X$ and $D - Q$ are optimal solutions to (2.3) and (3.1), respectively, weak duality provides that
\[ \mathrm{Trace}(QX) \le \mathrm{Trace}(D). \]
To establish strong duality, we look for a dual certificate $D$, a diagonal matrix that satisfies
\[ \mathrm{Trace}(D) - \mathrm{Trace}(QX) = 0. \tag{3.2} \]
Since $D$ is diagonal and, per the constraint $X_{ii} = 1$, all diagonal entries of $X$ are equal to 1, we have
\[ \mathrm{Trace}(D) = \mathrm{Trace}(DX). \]
Therefore (3.2) is equivalent to
\[ \mathrm{Trace}((D - Q)X) = 0. \]
We observe that since $X$ and $(D - Q)$ are PSD and $\mathrm{Trace}((D - Q)X) = 0$, their product vanishes, so they commute and are simultaneously diagonalizable. If $(D - Q)\mathbf{1} = 0$
and $\lambda_2(D - Q) > 0$, the span of the eigenvectors $v_i$ corresponding to the nonzero eigenvalues $\lambda_i$ ($2 \le i \le n$)
is the orthogonal complement of $\mathbf{1}$. We have $X(D - Q)v_i = \lambda_i X v_i$. Since $X$ is PSD and $\lambda_i > 0$,
the condition $\mathrm{Trace}((D - Q)X) = 0$ requires $X v_i = 0$. This implies that $X$ is a scalar multiple of $\mathbf{1}\mathbf{1}^T$, and
the fact that the diagonal terms must be equal to 1 requires that $X = \mathbf{1}\mathbf{1}^T$. See also [1].
Since $\mathrm{diag}(x^\natural) W \mathrm{diag}(x^\natural) \sim W$ and $\mathrm{diag}(x^\natural) N \sim N$, WLOG, finding a dual certificate $D$ for $r^\natural = (t^\natural, x^\natural)$
where $t^\natural \in \mathbb{R}^n$ and $x^\natural \in \{\pm 1\}^n$ is equivalent to finding it for $x^\natural = \mathbf{1}$ (which implies $S_{ij} = -S_{ji}$ and
$N_{ij} = -N_{ji}$). Therefore, we established the following result.
Theorem 3.1.2. If a diagonal matrix $D$ satisfies
\[ (D - Q)\mathbf{1} = 0 \quad \text{and} \quad \lambda_2(D - Q) > 0, \]
then the unique optimal solution of (2.3) is $X^\natural = x^\natural x^{\natural T}$ and $\hat{t}$ is the MLE of $t^\natural$ (up to a global shift), recovered
in each case with high probability as $n \to \infty$.
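Let $D = \frac{1}{2n} D_{V^T V} + 2 D_Y$. Following [6, 7], to find $D_Y$ we set the following expression to zero:
\[
\begin{aligned}
(D_Y - Y)\mathbf{1} &= (D_Y - \mathbf{1}\mathbf{1}^T - \sigma_x W)\mathbf{1} \\
&= D_Y \mathbf{1} - n\mathbf{1} - \sigma_x \, \mathrm{diag}\Bigl(\sum_{j=1}^n W_{1,j}, \dots, \sum_{j=1}^n W_{n,j}\Bigr) \mathbf{1}
\end{aligned}
\]
Therefore,
\[ D_Y = n I_{n \times n} + \sigma_x D_W, \quad \text{and} \]
\[
D_W = \mathrm{diag}(W\mathbf{1}) =
\begin{bmatrix}
\sum_{j=1}^n W_{1,j} & 0 & \cdots & 0 \\
0 & \sum_{j=1}^n W_{2,j} & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \sum_{j=1}^n W_{n,j}
\end{bmatrix}
\]
To find $D_{V^T V}$ we set
\[ \frac{1}{2n} (D_{V^T V} - V^T V)\mathbf{1} = 0 \]
where
\[
D_{V^T V} = \mathrm{diag}(V^T V \mathbf{1}) =
\begin{bmatrix}
(V^T V \mathbf{1})_1 & 0 & \cdots & 0 \\
0 & (V^T V \mathbf{1})_2 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & (V^T V \mathbf{1})_n
\end{bmatrix}
\]
Observe that by construction $L_Q \mathbf{1} = (D_Q - Q)\mathbf{1} = 0$. Thus, to complete the proof, we just need to confirm that
$\lambda_2(L_Q) = \lambda_2(\frac{1}{2n} L_{V^T V} + 2 L_Y) > 0$, which would imply that the SDP recovers $x^\natural$ exactly with high probability
as $n \to \infty$.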
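The construction of the diagonal certificate can be checked numerically on the orientation part. The pure-Python sketch below forms $L_Y = D_Y - Y$ for $Y = \mathbf{1}\mathbf{1}^T + \sigma_x W$ (the WLOG case $x^\natural = \mathbf{1}$) and confirms that its rows sum to zero and that its diagonal agrees with $D_Y = nI + \sigma_x D_W$. The helper name `laplacian_like` and the parameter values are illustrative assumptions.

```python
import random


def laplacian_like(M):
    """L_M = D_M - M with D_M = diag(M 1), so that L_M 1 = 0 by construction."""
    n = len(M)
    L = [[-M[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        L[i][i] += sum(M[i])
    return L


rng = random.Random(1)
n, sigma_x = 5, 0.3

# Symmetric Gaussian noise with zero diagonal.
W = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        W[i][j] = W[j][i] = rng.gauss(0.0, 1.0)

# Y = 1 1^T + sigma_x W.
Y = [[1.0 + sigma_x * W[i][j] for j in range(n)] for i in range(n)]

L_Y = laplacian_like(Y)
D_Y = [L_Y[i][i] + Y[i][i] for i in range(n)]   # diagonal of D_Y
row_sums = [sum(row) for row in L_Y]            # entries of (D_Y - Y) 1
print(row_sums)
print(D_Y)
```

The same helper applied to $V^T V$ gives $L_{V^T V}$, so the condition $(D - Q)\mathbf{1} = 0$ holds by construction and only the spectral condition $\lambda_2(L_Q) > 0$ remains to be verified.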
3.2 Decomposition of $L_Q$
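We decompose $V^T V$ into the noiseless signal (ground truth) $M^{(TG)}$, the bias term $M^{(TB)}$ and pure noise
$M^{(TN)}$:
\[
\begin{aligned}
M^{(TG)} &:= (\mathbb{E}V)^T (\mathbb{E}V) \\
M^{(TB)} &:= \mathbb{E}[V^T V] - (\mathbb{E}V)^T (\mathbb{E}V) \\
M^{(TN)} &:= V^T V - \mathbb{E}[V^T V]
\end{aligned}
\]
Letting a matrix $T$ be given by $T_{ij} = t_j^\natural - t_i^\natural$, we can decompose $V$ and $V^T V$ as
\[
\begin{aligned}
V &= B_1^T B_2 = B_1^T (B_{2T} + \sigma_t B_{2N}) \\
V^T V &= (B_{2T}^T + \sigma_t B_{2N}^T) B_1 B_1^T (B_{2T} + \sigma_t B_{2N})
\end{aligned}
\]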
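where $B_{2T} \in \mathbb{R}^{n^2 \times n}$ is a matrix comprised of $n$ matrix blocks $B_{2T}^{(i)} \in \mathbb{R}^{n \times n}$ arranged vertically. Each $B_{2T}^{(i)}$ is
given by $(B_{2T}^{(i)})_{:i} = -T_{i:}^T$ and the remaining entries are zero. Therefore $B_{2T}$ has the following structure:
\[
B_{2T} =
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-T_{12} & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-T_{13} & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-T_{14} & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
\vdots & & & & & & & & \vdots \\
0 & -T_{21} & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & -T_{23} & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & -T_{24} & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
\vdots & & & & & & & & \vdots \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & -T_{n,n-3} \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & -T_{n,n-2} \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & -T_{n,n-1} \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0
\end{bmatrix}
\]
Similarly, $B_{2N} \in \mathbb{R}^{n^2 \times n}$ is a matrix comprised of $n$ matrix blocks $B_{2N}^{(i)} \in \mathbb{R}^{n \times n}$ arranged vertically. Each $B_{2N}^{(i)}$
is given by $(B_{2N}^{(i)})_{:i} = -N_{i:}^T$ and the remaining entries are zero. Therefore $B_{2N}$ has the following structure:
\[
B_{2N} =
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-N_{12} & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-N_{13} & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-N_{14} & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
\vdots & & & & & & & & \vdots \\
0 & -N_{21} & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & -N_{23} & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & -N_{24} & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
\vdots & & & & & & & & \vdots \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & -N_{n,n-3} \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & -N_{n,n-2} \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & -N_{n,n-1} \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0
\end{bmatrix}
\]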
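WLOG, we can assume that $\sum_{i=1}^n t_i^\natural = 0$ since $t^\natural$ is recovered only up to a global shift. On this basis and
letting $H = B_{2N}^T B_1 B_1^T B_{2T}$ and $G = B_{2N}^T B_1 B_1^T B_{2N}$, in Section B of the Appendix, we show the following
result.
Lemma 3.2.1. For $L_Q$ where $Q$ is as defined in Lemma 2.1.1, we have $L_Q = L^{(TG)} + L^{(TB)} + L^{(TN)}$ where
\[
\begin{aligned}
L^{(TG)} &= n^2 \, \mathrm{ddiag}(t^\natural t^{\natural T}) + 2n \|t^\natural\|^2 I - n\bigl(\mathbf{1}\,\mathrm{ddiag}(t^\natural t^{\natural T}) - t^\natural t^{\natural T} + \mathrm{ddiag}(t^\natural t^{\natural T})\mathbf{1}^T\bigr) - \|t^\natural\|^2 \mathbf{1}\mathbf{1}^T \\
L^{(TB)} &= 2\sigma_t^2 (nI - \mathbf{1}\mathbf{1}^T) \\
L^{(TN)} &= \sigma_t D_{H+H^T} - \sigma_t (H + H^T) + \sigma_t^2 D_G - \sigma_t^2 G + L^{(TB)}
\end{aligned}
\]
and
\[
\begin{aligned}
D_{H+H^T} &= 2n D_1 + 2 D_2 \\
(D_1)_{ii} &= \sum_{k \ne i} T_{ik} N_{ik} \\
(D_1)_{ij} &= 0 \\
D_2 &= \sum_j \sum_{k<j} T_{kj} N_{kj} \, I \\
H + H^T &= n\, T \odot N +
\begin{bmatrix} N_{(1:)}^T T_{(1:)} \\ \vdots \\ N_{(n:)}^T T_{(n:)} \end{bmatrix} \mathbf{1}^T
+ \mathbf{1} \begin{bmatrix} N_{(:1)}^T T_{(:1)}, & \cdots, & N_{(:n)}^T T_{(:n)} \end{bmatrix} \\
G &= N \odot (N \mathbf{1}\mathbf{1}^T + \mathbf{1}\mathbf{1}^T N) - N^2 \\
D_G &= D_N^2 + 2 D_{N^T N}
\end{aligned}
\]
where $i \ne j$ and $T_{i:}$ and $T_{:i}$ are respectively the $i$-th row and column of the matrix $T$.
On the other hand, the orientation measurements $Y$ do not have a bias since the noise enters them only linearly:
\[
\begin{aligned}
M^{(XN)} &= Y - \mathbb{E}Y = \sigma_x W \\
D^{(XN)} &= \sigma_x D_W \\
M^{(XG)} &= \mathbb{E}Y = \mathbf{1}\mathbf{1}^T \\
D^{(XG)} &= nI \\
L^{(XG)} &= nI - \mathbf{1}\mathbf{1}^T
\end{aligned}
\]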
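Two of the eigenvalue identities stated in Lemma 3.3.1 follow directly from the form $nI - \mathbf{1}\mathbf{1}^T$. As a brief sketch of the computation, for any unit vector $v$ with $v^T \mathbf{1} = 0$:

```latex
\begin{aligned}
(nI - \mathbf{1}\mathbf{1}^T) v &= n v - \mathbf{1}(\mathbf{1}^T v) = n v, \\
v^T L^{(XG)} v &= v^T (nI - \mathbf{1}\mathbf{1}^T) v = n, \\
v^T L^{(TB)} v &= 2\sigma_t^2 \, v^T (nI - \mathbf{1}\mathbf{1}^T) v = 2\sigma_t^2 n .
\end{aligned}
```

Since the quadratic form is constant on the unit sphere of the orthogonal complement of $\mathbf{1}$, its minimum there equals $n$ and $2\sigma_t^2 n$, respectively.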
3.3 Exactness conditions
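For the purposes of the remainder of the thesis, we define the operator norm $\|\cdot\|$ of a square matrix $M$ by
\[ \|M\| = \max_{\substack{\|x\| = 1 \\ x^T \mathbf{1} = 0}} x^T M x \]
and the minimum eigenvalue of $M$ corresponding to an eigenvector orthogonal to $\mathbf{1}$ by
\[ \lambda_2(M) = \min_{\substack{\|v\| = 1 \\ v^T \mathbf{1} = 0}} v^T M v \]
Since $D_N^2$ is PSD, WLOG we can disregard this term, i.e., we assume that $D_G = 2 D_{N^T N}$, and in Section C
of the Appendix, we show the following result.
Lemma 3.3.1. For $L^{(TG)}$, $L^{(TB)}$ and $L^{(TN)}$ as defined in Lemma 3.2.1 and $L^{(XG)}$, $D^{(XN)}$, $M^{(XN)}$ as
defined in Section 3.2, we have
\[
\begin{aligned}
\lambda_2(L^{(TG)}) &\ge n^2 t_{\min}^2 + 2n \|t^\natural\|^2 \\
\lambda_2(L^{(TB)}) &= 2\sigma_t^2 n \\
\lambda_2(L^{(XG)}) &= n \\
\|L^{(TN)}\| &\le \sigma_t \bigl[3 n v \sqrt{(2+\varepsilon) \log n} + \sqrt{2}\, \|T\|_F \sqrt{\varepsilon \log n}\bigr] + \sigma_t^2 \bigl[6n + 8n\sqrt{n}\bigr] \\
\|D^{(XN)}\| &\le \sigma_x \sqrt{(2+\varepsilon)\, n \log n} \\
\|M^{(XN)}\| &\le 2 \sigma_x \sqrt{n}
\end{aligned}
\]
with high probability as $n \to \infty$ where
\[ v = \max(\|T_{1:}\|, \|T_{2:}\|, \dots, \|T_{n:}\|), \qquad t_{\min} = \min(|t_1^\natural|, \dots, |t_n^\natural|). \]
Therefore, $\lambda_2(L_Q) > 0$ if
\[ \frac{1}{2n} \|L^{(TN)}\| + 2\|D^{(XN)}\| + 2\|M^{(XN)}\| \le \frac{1}{2n}\bigl(\lambda_2(L^{(TG)}) + \lambda_2(L^{(TB)})\bigr) + 2\lambda_2(L^{(XG)}) \]
or alternatively
\[
\sigma_t \Bigl[\tfrac{3}{2} v \sqrt{(2+\varepsilon) \log n} + \frac{\|T\|_F}{\sqrt{2}\, n} \sqrt{\varepsilon \log n}\Bigr] + \sigma_t^2 \bigl[4\sqrt{n} + 2\bigr]
+ \sigma_x \bigl(2\sqrt{(2+\varepsilon)\, n \log n} + 4\sqrt{n}\bigr) \le \|t^\natural\|^2 + \tfrac{1}{2} t_{\min}^2 n + 2n
\]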
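The second form of the condition follows from the first by inserting the bounds of Lemma 3.3.1 and dividing by $2n$. The arithmetic, reconstructed here as a sketch rather than quoted from the thesis, is:

```latex
\begin{aligned}
\frac{1}{2n}\|L^{(TN)}\|
  &\le \sigma_t\Bigl[\tfrac{3}{2} v \sqrt{(2+\varepsilon)\log n}
     + \frac{\|T\|_F}{\sqrt{2}\,n}\sqrt{\varepsilon \log n}\Bigr]
     + \sigma_t^2\bigl[3 + 4\sqrt{n}\bigr], \\
2\|D^{(XN)}\| + 2\|M^{(XN)}\|
  &\le \sigma_x\bigl(2\sqrt{(2+\varepsilon)\,n\log n} + 4\sqrt{n}\bigr), \\
\frac{1}{2n}\bigl(\lambda_2(L^{(TG)}) + \lambda_2(L^{(TB)})\bigr) + 2\lambda_2(L^{(XG)})
  &\ge \tfrac{1}{2} t_{\min}^2 n + \|t^\natural\|^2 + \sigma_t^2 + 2n .
\end{aligned}
```

Moving the $\sigma_t^2$ term from the right-hand side to the left changes the coefficient $3 + 4\sqrt{n}$ into $4\sqrt{n} + 2$, which matches the stated inequality.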
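This implies the following main result of this work.
Theorem 3.3.2. For $L_Q$ where $Q$ is as defined in Lemma 2.1.1, we have $\lambda_2(L_Q) > 0$ with high probability
as $n \to \infty$ if
\[
\begin{aligned}
\sigma_x &\le \frac{\|t^\natural\|^2 + \frac{1}{2} t_{\min}^2 n + 2n}{2\sqrt{(2+\varepsilon)\, n \log n}} \\
\sigma_t &\le \frac{\|t^\natural\|^2 + \frac{1}{2} t_{\min}^2 n + 2n}{\frac{3}{2} v \sqrt{(2+\varepsilon) \log n} + \frac{\|T\|_F}{\sqrt{2}\, n} \sqrt{\varepsilon \log n}} \\
\sigma_t &\le \sqrt{\frac{\|t^\natural\|^2 + \frac{1}{2} t_{\min}^2 n + 2n}{4\sqrt{n}}}
\end{aligned}
\]
where
\[ t_{\min} = \min(|t_1^\natural|, \dots, |t_n^\natural|), \qquad v = \max(\|T_{1:}\|, \|T_{2:}\|, \dots, \|T_{n:}\|), \]
and $T_{i:}$ is the $i$-th row of the matrix $T$.
Note that letting $t_{\min}$, $v$, $\|T\|_F$, and $\|t^\natural\|^2$ go to zero confirms that our result for synchronization over
E(1) generalizes the exact recovery conditions for synchronization over Z2.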
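The three conditions of Theorem 3.3.2 are easy to evaluate for a given translation vector. The pure-Python sketch below does so under assumptions that should be read as such: the function name is hypothetical, the input is taken to be the centered ground truth, and the third condition is read as the square root of the whole fraction, which is the reading consistent with the $\sigma_t^2 [4\sqrt{n} + 2]$ term in the preceding inequality.

```python
import math


def exactness_thresholds(t, eps):
    """Right-hand sides of the three conditions of Theorem 3.3.2 (requires n >= 2)."""
    n = len(t)
    norm_sq = sum(ti * ti for ti in t)                 # ||t||^2
    t_min = min(abs(ti) for ti in t)
    # T[i][j] = t[j] - t[i]; v is the largest row norm, frob the Frobenius norm.
    row_sq = [sum((tj - ti) ** 2 for tj in t) for ti in t]
    v = math.sqrt(max(row_sq))
    frob = math.sqrt(sum(row_sq))
    log_n = math.log(n)

    signal = norm_sq + 0.5 * t_min ** 2 * n + 2.0 * n
    sigma_x_max = signal / (2.0 * math.sqrt((2.0 + eps) * n * log_n))
    linear = (1.5 * v * math.sqrt((2.0 + eps) * log_n)
              + frob / (math.sqrt(2.0) * n) * math.sqrt(eps * log_n))
    # With t = 0 the linear coefficient vanishes and the condition is vacuous.
    sigma_t_linear = signal / linear if linear > 0 else math.inf
    # Assumed reading: square root taken over the whole fraction.
    sigma_t_quadratic = math.sqrt(signal / (4.0 * math.sqrt(n)))
    return sigma_x_max, sigma_t_linear, sigma_t_quadratic


n, eps = 100, 0.1
sx, st_lin, st_quad = exactness_thresholds([0.0] * n, eps)
z2_bound = math.sqrt(n / ((2.0 + eps) * math.log(n)))
print(sx, z2_bound)
```

With all translations equal to zero, the first threshold coincides with the Z2 bound $\sqrt{n / ((2+\varepsilon)\log n)}$ from Section 1.3.1, in line with the remark that the E(1) result generalizes the Z2 conditions.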
3.4 Exactness without pairwise orientation measurements
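Lastly we consider the case when the pairwise orientation measurements $x_i^\natural x_j^\natural$ represented by $Y$ are unavailable,
i.e., only the measurements of the relative translations $x_i^\natural (t_j^\natural - t_i^\natural)$ are available. Therefore, we have
$Q = \frac{1}{2n} V^T V$. The analysis in the preceding section implies that $\lambda_2(L_Q) > 0$ if
\[ \frac{1}{2n} \|L^{(TN)}\| \le \frac{1}{2n}\bigl(\lambda_2(L^{(TG)}) + \lambda_2(L^{(TB)})\bigr) \]
or alternatively
\[ \sigma_t \Bigl[\tfrac{3}{2} v \sqrt{(2+\varepsilon) \log n} + \frac{\|T\|_F}{\sqrt{2}\, n} \sqrt{\varepsilon \log n}\Bigr] + \sigma_t^2 \bigl[4\sqrt{n} + 2\bigr] \le \|t^\natural\|^2 + \tfrac{1}{2} t_{\min}^2 n \]
This implies the following result.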
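Theorem 3.4.1. If $Q = \frac{1}{2n} V^T V$ where $V$ is as defined in Lemma 2.1.1, $\lambda_2(L_Q) > 0$ with high probability
as $n \to \infty$ if
\[
\begin{aligned}
\sigma_t &\le \frac{\|t^\natural\|^2 + \frac{1}{2} t_{\min}^2 n}{\frac{3}{2} v \sqrt{(2+\varepsilon) \log n} + \frac{\|T\|_F}{\sqrt{2}\, n} \sqrt{\varepsilon \log n}} \\
\sigma_t &\le \sqrt{\frac{\|t^\natural\|^2 + \frac{1}{2} t_{\min}^2 n}{4\sqrt{n}}}
\end{aligned}
\]
where
\[ t_{\min} = \min(|t_1^\natural|, \dots, |t_n^\natural|), \qquad v = \max(\|T_{1:}\|, \|T_{2:}\|, \dots, \|T_{n:}\|), \]
and $T_{i:}$ is the $i$-th row of the matrix $T$.
The orientation ground truth $x^\natural$ can be recovered exactly even without the pairwise orientation
measurements $x_i^\natural x_j^\natural$. However, in this context, for a given level of noise, more measurements are needed to
achieve recovery than in the context of the regular SDP E(1) (where measurements of $(x_i^\natural (t_j^\natural - t_i^\natural),\; x_i^\natural x_j^\natural)$ are
available). Compare the numerical simulations in this scenario in Figure 3.1 with Figure 1.1.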
Figure 3.1: This figure shows how frequently SDP E(1) exactly recovers the orientation $X^\natural = x^\natural x^{\natural T}$ when the relative orientation measurements are not available. Here, the centered ground truth translation $t_c^\natural$ has a fixed $\ell_2$ norm. For each $(\sigma_t, n)$, 10 realizations of the data were generated and the exactness of the recovered orientation was verified. The frequency of success is represented in grayscale (white for 100% success and black for 0% success). The results agree with the analytic predictions (solid curve). The analytic predictions for the regular SDP E(1) and SDP Z2 are plotted as well. (The orientation noise $\sigma_x$ was not used in the experiments. Nevertheless, it was set to be equal to the translation noise $\sigma_t$ and plotted on the right vertical axis for reference in the SDP E(1) and SDP Z2 analytical predictions.)
Chapter 4
Conclusion
We showed that the SDP relaxation for the E(1) synchronization problem exactly recovers the orientation
ground truth and tightly recovers the translation MLE. As noted in [15–17], synchronization over SE(d)
includes important problems in robotics and computer vision. We expect that establishing the tightness of
synchronization over SE(d) should be similar to establishing that over E(d). Therefore, we hope that our
result for E(1) will be extended to higher dimensions.
Bibliography
[1] E. Abbe, A. S. Bandeira, A. Bracher, and A. Singer. Decoding binary node labels from censored
edge measurements: Phase transition and efficient recovery. Transactions on Network Science and
Engineering, to appear. Available online at arXiv:1404.4749 [cs.IT], 2014.
[2] E. Abbe, A. S. Bandeira, and G. Hall. Exact recovery in the stochastic block model. IEEE Transactions
on Information Theory, 62(1):471–487, Jan 2016.
[3] P.-A. Absil, C.G. Baker, and K.A. Gallivan. Trust-region methods on Riemannian manifolds. Foundations
of Computational Mathematics, 7(3):303–330, Jul 2007.
[4] N. Agarwal, A.S. Bandeira, K. Koiliaris, and A. Kolla. Multisection in the stochastic block model
using semidefinite programming. Compressed Sensing and its Applications: MATHEON Workshop
2015 (Applied and Numerical Harmonic Analysis), to appear., abs/1507.02323, 2015.
[5] D. Amelunxen, M. Lotz, M.B. McCoy, and J.A. Tropp. Living on the edge: Phase transitions in convex
programs with random data. Information and Inference, 3, 2014.
[6] A. S. Bandeira. Convex relaxations for certain inverse problems on graphs. PhD thesis, Program in
Applied and Computational Mathematics, Princeton University, 2015.
[7] A.S. Bandeira, N. Boumal, and A. Singer. Tightness of the maximum likelihood semidefinite relaxation
for angular synchronization. Mathematical Programming, pages 1–23, 2016.
[8] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, New York, NY,
USA, 2004.
[9] E. J. Candes, J. Romberg, and T. Tao. Robust uncertainty principles: exact signal reconstruction from
highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2):489–509,
Feb 2006.
[10] E.J. Candes and B. Recht. Exact matrix completion via convex optimization. Foundations of Compu-
tational Mathematics, 9(6):717, Apr 2009.
[11] D. L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, April
2006.
[12] J. Gallier. The Schur complement and symmetric positive semidefinite (and definite) matrices. Available
online at http://www.cis.upenn.edu/~jean/schur-comp.pdf, Dec 2010.
[13] R. A. Horn. Topics in Matrix Analysis. Cambridge University Press, New York, NY, USA, 1986.
[14] Y. Nesterov and A. Nemirovskii. Interior-Point Polynomial Algorithms in Convex Programming. Society
for Industrial and Applied Mathematics, 1994.
[15] D.M. Rosen. Certifiably Correct SLAM. PhD thesis, MIT, 2016.
[16] D.M. Rosen, L. Carlone, A.S. Bandeira, and J.J. Leonard. A certifiably correct algorithm for synchro-
nization over the special Euclidean group. In Intl. Workshop on the Algorithmic Foundations of Robotics
(WAFR), San Francisco, CA, December 2016.
[17] D.M. Rosen, L. Carlone, A.S. Bandeira, and J.J. Leonard. SE-Sync: A certifiably correct algorithm for
synchronization over the special Euclidean group. Technical Report MIT-CSAIL-TR-2017-002, Com-
puter Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge,
MA, February 2017.
[18] H. J. Sommers, A. Crisanti, H. Sompolinsky, and Y. Stein. Spectrum of large random asymmetric
matrices. Phys. Rev. Lett., 60:1895–1898, May 1988.
[19] J.A. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computational
Mathematics, 12(4):389–434, Aug 2012.
[20] L. Vandenberghe and S. Boyd. Semidefinite programming. SIAM Review, 38(1):49–95, 1996.
Appendix A
Proof of Lemma 2.1.1
We have the following $\mathbb{R}^{n\times n}$ matrices:
$$L = B_1^TB_1 = 2\begin{pmatrix} n-1 & -1 & \cdots & -1 \\ -1 & n-1 & \cdots & -1 \\ \vdots & \vdots & \ddots & \vdots \\ -1 & -1 & \cdots & n-1 \end{pmatrix}$$
$$\Sigma = B_2^TB_2 = \begin{pmatrix} \sum_{j\neq 1}S_{1j}^2 & 0 & \cdots & 0 \\ 0 & \sum_{j\neq 2}S_{2j}^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sum_{j\neq n}S_{nj}^2 \end{pmatrix}$$
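We observe that (2.1) is equivalent to:
$$\begin{aligned}
&\arg\min_{t\in\mathbb{R}^n,\ x\in\mathbb{Z}_2^n}\ \left\|\begin{bmatrix} B_1 & B_2\end{bmatrix}\begin{bmatrix} t \\ x\end{bmatrix}\right\|_2^2 - 2x^TYx \\
={}&\arg\min_{t\in\mathbb{R}^n,\ x\in\mathbb{Z}_2^n}\ \|B_1t + B_2x\|_2^2 - 2x^TYx \\
={}&\arg\min_{t\in\mathbb{R}^n,\ x\in\mathbb{Z}_2^n}\ t^TLt + 2x^TV^Tt + x^T\Sigma x - 2x^TYx \\
={}&\arg\min_{t\in\mathbb{R}^n,\ x\in\mathbb{Z}_2^n}\ t^TLt + 2x^TV^Tt - 2x^TYx
\end{aligned}$$
since $\Sigma$ is diagonal, so that $x^T\Sigma x = \operatorname{tr}\Sigma$ is constant for all $x \in \mathbb{Z}_2^n$. For a fixed $x$, the least squares solution to the minimization problem is given by $t^* = -B_1^\dagger B_2x$ (up to a global shift since $\mathbf{1}$ is in the nullspace of $B_1^\dagger B_2$, a rank $n-1$ matrix).
$$\begin{aligned}
&\arg\min_{t\in\mathbb{R}^n,\ x\in\mathbb{Z}_2^n}\ t^TLt + 2x^TV^Tt \\
={}&\arg\min_{x\in\mathbb{Z}_2^n}\ x^T(B_1^\dagger B_2)^TB_1^TB_1B_1^\dagger B_2x - 2x^T(B_1^TB_2)^TB_1^\dagger B_2x \\
={}&\arg\min_{x\in\mathbb{Z}_2^n}\ x^TB_2^T(B_1B_1^\dagger)^TB_1B_1^\dagger B_2x - 2x^TB_2^TB_1B_1^\dagger B_2x \\
={}&\arg\min_{x\in\mathbb{Z}_2^n}\ x^TB_2^T\left((B_1B_1^\dagger)^TB_1B_1^\dagger - 2B_1B_1^\dagger\right)B_2x \\
={}&\arg\min_{x\in\mathbb{Z}_2^n}\ -x^TB_2^TB_1B_1^\dagger B_2x
\end{aligned}$$
Thus, we can express the optimization problem in terms of $x$ only:$^1$
$$\arg\min_{x\in\mathbb{Z}_2^n}\ -x^TB_2^TB_1B_1^\dagger B_2x - 2x^TYx$$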
$^1$ The same result can be obtained from Appendix A.5.5 in [8] and Proposition 4.2 in [12]. Observe that $x^TV^T\mathbf{1} = 0$, which implies that $Vx \perp \ker L$ for all $x$. Therefore, $(I_{n\times n} - LL^\dagger)Vx = 0$, which implies that
$$\min_{t\in\mathbb{R}^n}\ t^TLt + 2x^TV^Tt = -x^TV^TL^\dagger Vx = -x^TB_2^TB_1B_1^\dagger B_2x$$
$$\arg\min_{t\in\mathbb{R}^n}\ t^TLt + 2x^TV^Tt = -L^\dagger Vx = -B_1^\dagger B_2x$$
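which is equivalent to
$$\arg\max_{X = xx^T,\ x\in\mathbb{Z}_2^n}\ \operatorname{Trace}(QX)$$
where $Q = B_2^TB_1B_1^\dagger B_2 + 2Y$.
Observe that
$$\begin{aligned}
B_2^TB_1B_1^\dagger B_2 &= B_2^TB_1(B_1^TB_1)^\dagger B_1^TB_2 \\
&= V^TL^\dagger V \\
&= \frac{1}{2n}V^T\left(I - \frac{1}{n}\mathbf{1}\mathbf{1}^T\right)V \\
&= \frac{1}{2n}V^TV
\end{aligned}$$
since $\mathbf{1}^TV = 0$ (since $L$ is a Laplacian of a complete graph, $\ker B_1 = \ker L = \operatorname{span}(\mathbf{1})$). Therefore, $Q = \frac{1}{2n}V^TV + 2Y$.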
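Two facts used in this proof can be checked numerically: the pseudoinverse of $L = 2(nI - \mathbf{1}\mathbf{1}^T)$ is $\frac{1}{2n}(I - \frac{1}{n}\mathbf{1}\mathbf{1}^T)$, and the minimum over $t$ of $t^TLt + 2b^Tt$ equals $-b^TL^\dagger b$ when $b \perp \mathbf{1}$. The sketch below is illustrative only; the vector `b` is a random stand-in for $Vx$ (an assumption, since $B_1$ and $B_2$ are not constructed here).

```python
import numpy as np

n = 6
rng = np.random.default_rng(0)
ones = np.ones(n)

# L = B1^T B1 for the complete graph, written directly in closed form.
L = 2 * (n * np.eye(n) - np.outer(ones, ones))
L_pinv_closed = (np.eye(n) - np.outer(ones, ones) / n) / (2 * n)
L_pinv_numeric = np.linalg.pinv(L, rcond=1e-10)

# Stand-in for V x: any vector orthogonal to the all-ones vector.
b = rng.standard_normal(n)
b -= b.mean()

def objective(t):
    """The quadratic t^T L t + 2 b^T t minimized over t in the proof."""
    return t @ L @ t + 2 * b @ t

t_star = -L_pinv_closed @ b            # claimed minimizer, up to a global shift
min_value = objective(t_star)
closed_form_value = -b @ b / (2 * n)   # equals -b^T L^+ b because b is orthogonal to 1
shifted_value = objective(t_star + 3.0 * ones)   # a global shift leaves the value unchanged
```

The shift invariance reflects the remark that the least squares solution is determined only up to a global translation.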
Appendix B
Proof of Lemma 3.2.1
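B.1 Decomposition of $L^{(TG)}$
We have
$$M^{(TG)} = (\mathbb{E}V)^T\,\mathbb{E}V = B_{2T}^TB_1B_1^TB_{2T}$$
$$= \begin{pmatrix} \sum_{j\neq 1}T_{1j} & -T_{12} & \cdots & -T_{1n} \\ -T_{21} & \sum_{j\neq 2}T_{2j} & \cdots & -T_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ -T_{n1} & -T_{n2} & \cdots & \sum_{j\neq n}T_{nj} \end{pmatrix}\begin{pmatrix} \sum_{j\neq 1}T_{1j} & -T_{21} & \cdots & -T_{n1} \\ -T_{12} & \sum_{j\neq 2}T_{2j} & \cdots & -T_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ -T_{1n} & -T_{2n} & \cdots & \sum_{j\neq n}T_{nj} \end{pmatrix}$$
and therefore
$$M^{(TG)}_{ii} = \Big(\sum_{k\neq i}T_{ik}\Big)^2 + \sum_{k\neq i}T_{ik}^2$$
and since we can cancel the diagonal terms of $M^{(TG)}$ and $D^{(TG)}$, we can take
$$M^{(TG)}_{ii} = \Big(\sum_{k\neq i}T_{ik}\Big)^2 + \sum_{k\neq i}T_{ik}^2$$
Also
$$\begin{aligned}
M^{(TG)}_{ij} &= T_{ij}\Big(\sum_{k\neq i}T_{ik} - \sum_{k\neq j}T_{jk}\Big) + \sum_{k\neq i,j}T_{ik}T_{jk} \\
&= T_{ij}\Big[\sum_{k\neq i}(t_k - t_i) - \sum_{k\neq j}(t_k - t_j)\Big] + \sum_{k\neq i,j}T_{ik}T_{jk} \\
&= T_{ij}\big[(t_j - t_i) - (t_i - t_j) + (n-2)(t_j - t_i)\big] + \sum_{k\neq i,j}T_{ik}T_{jk} \\
&= nT_{ij}^2 + \sum_{k\neq i,j}T_{ik}T_{jk}
\end{aligned}$$
where $i \neq j$. Therefore,
$$\begin{aligned}
M^{(TG)}_{ij} &= \sum_k\big(T_{ik}T_{jk} + T_{ij}^2\big) \\
&= \sum_k\big[(t^\natural_k - t^\natural_i)(t^\natural_k - t^\natural_j) + (t^\natural_j - t^\natural_i)^2\big] \\
&= \sum_k\big[t^{\natural 2}_k - t^\natural_k(t^\natural_i + t^\natural_j) + t^\natural_it^\natural_j + (t^\natural_j - t^\natural_i)^2\big] \\
&= n\big(t^{\natural 2}_i + t^{\natural 2}_j - t^\natural_it^\natural_j\big) + \sum_k t^{\natural 2}_k
\end{aligned}$$
This implies that
$$M^{(TG)} = n\big(\mathbf{1}\operatorname{ddiag}(t^\natural t^{\natural T}) - t^\natural t^{\natural T} + \operatorname{ddiag}(t^\natural t^{\natural T})\mathbf{1}^T\big) + \|t^\natural\|^2\mathbf{1}\mathbf{1}^T$$
Consequently,
$$\begin{aligned}
D^{(TG)}_{ii} &= \sum_j\Big(n\big(t^{\natural 2}_i + t^{\natural 2}_j - t^\natural_it^\natural_j\big) + \sum_k t^{\natural 2}_k\Big) \\
&= n^2t^{\natural 2}_i + n\sum_j t^{\natural 2}_j + n\sum_k t^{\natural 2}_k \\
&= n^2t^{\natural 2}_i + 2n\sum_k t^{\natural 2}_k
\end{aligned}$$
This implies that
$$D^{(TG)} = n^2\operatorname{ddiag}(t^\natural t^{\natural T}) + 2n\|t^\natural\|^2I$$
Therefore,
$$L^{(TG)} = n^2\operatorname{ddiag}(t^\natural t^{\natural T}) + 2n\|t^\natural\|^2I - n\big(\mathbf{1}\operatorname{ddiag}(t^\natural t^{\natural T}) - t^\natural t^{\natural T} + \operatorname{ddiag}(t^\natural t^{\natural T})\mathbf{1}^T\big) - \|t^\natural\|^2\mathbf{1}\mathbf{1}^T$$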
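The closed form for $L^{(TG)}$ can be compared against a direct computation. The following sketch is illustrative: it assumes $T_{ij} = t_j - t_i$, a centered $t$, and that $L = D - M$ with $D$ the diagonal matrix of row sums of $M$; it reads the terms $\mathbf{1}\operatorname{ddiag}(\cdot)$ and $\operatorname{ddiag}(\cdot)\mathbf{1}^T$ as the rank-one matrices with entries $t_j^2$ and $t_i^2$. The helper name `laplacian_of` is hypothetical.

```python
import numpy as np

def laplacian_of(M):
    """Return D - M, where D is the diagonal matrix of row sums of M."""
    return np.diag(M.sum(axis=1)) - M

n = 5
rng = np.random.default_rng(1)
t = rng.standard_normal(n)
t -= t.mean()                              # assumption: centered ground truth
ones = np.ones(n)

T = t[None, :] - t[:, None]                # T_ij = t_j - t_i (zero diagonal)
A = np.diag(T.sum(axis=1)) - T             # diagonal: sum_j T_ij, off-diagonal: -T_ij
M_direct = A @ A.T                         # the matrix product displayed above
L_direct = laplacian_of(M_direct)

d = t**2                                   # diagonal of t t^T
norm2 = t @ t
L_formula = (n**2 * np.diag(d) + 2 * n * norm2 * np.eye(n)
             - n * (np.outer(ones, d) - np.outer(t, t) + np.outer(d, ones))
             - norm2 * np.outer(ones, ones))
```

The diagonal of $M$ does not affect $D - M$, which is why the diagonal terms can be cancelled in the derivation.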
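B.2 Decomposition of $L^{(TB)}$
$$G = \begin{pmatrix} \sum_{j\neq 1}N_{1j} & -N_{12} & \cdots & -N_{1n} \\ -N_{21} & \sum_{j\neq 2}N_{2j} & \cdots & -N_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ -N_{n1} & -N_{n2} & \cdots & \sum_{j\neq n}N_{nj} \end{pmatrix}\begin{pmatrix} \sum_{j\neq 1}N_{1j} & -N_{21} & \cdots & -N_{n1} \\ -N_{12} & \sum_{j\neq 2}N_{2j} & \cdots & -N_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ -N_{1n} & -N_{2n} & \cdots & \sum_{j\neq n}N_{nj} \end{pmatrix}$$
and therefore
$$G_{ii} = \Big(\sum_{k\neq i}N_{ik}\Big)^2 + \sum_{k\neq i}N_{ik}^2$$
$$\begin{aligned}
G_{ij} &= N_{ij}\Big(\sum_{k\neq i}N_{ik} - \sum_{k\neq j}N_{jk}\Big) + \sum_{k\neq i,j}N_{ik}N_{jk} \\
&= 2N_{ij}^2 + N_{ij}\Big(\sum_{k\neq i,j}N_{ik} - \sum_{k\neq i,j}N_{jk}\Big) + \sum_{k\neq i,j}N_{ik}N_{jk}
\end{aligned}$$
$$\mathbb{E}G_{ii} = 2(n-1)$$
$$\mathbb{E}G_{ij} = 2$$
where $i \neq j$. Therefore,
$$M^{(TB)}_{ii} = 2(n-1)\sigma_t^2$$
$$M^{(TB)}_{ij} = 2\sigma_t^2$$
where $i \neq j$, or equivalently,
$$M^{(TB)} = 2(n-2)\sigma_t^2I + 2\sigma_t^2\mathbf{1}\mathbf{1}^T$$
Therefore,
$$D^{(TB)}_{ii} = 4(n-1)\sigma_t^2$$
$$D^{(TB)}_{ij} = 0$$
$$L^{(TB)}_{ii} = 2(n-1)\sigma_t^2$$
$$L^{(TB)}_{ij} = -2\sigma_t^2$$
where $i \neq j$, or equivalently
$$L^{(TB)} = 2\sigma_t^2(nI - \mathbf{1}\mathbf{1}^T)$$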
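As a small consistency check, the closed form of $L^{(TB)}$ and its spectrum can be recovered from $M^{(TB)}$. This is a sketch under the assumption that $L = D - M$ with $D$ the diagonal matrix of row sums; the values of `n` and `sigma_t` are arbitrary illustrations.

```python
import numpy as np

n, sigma_t = 7, 0.3
ones = np.ones(n)

M_TB = 2 * (n - 2) * sigma_t**2 * np.eye(n) + 2 * sigma_t**2 * np.outer(ones, ones)
D_TB = np.diag(M_TB.sum(axis=1))           # diagonal matrix of row sums
L_TB = D_TB - M_TB
L_TB_closed = 2 * sigma_t**2 * (n * np.eye(n) - np.outer(ones, ones))

eigenvalues = np.linalg.eigvalsh(L_TB)     # sorted in ascending order
```

The nonzero eigenvalues all equal $2\sigma_t^2 n$, which is the quantity that appears as $\lambda_2(L^{(TB)})$ when the bounds are combined.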
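B.3 Decomposition of $L^{(TN)}$
B.3.1 Decomposition of $H + H^T$
Let $H = B_{2N}^TB_1B_1^TB_{2T}$ and $H^T = B_{2T}^TB_1B_1^TB_{2N}$. Then
$$H = \begin{pmatrix} \sum_{j\neq 1}N_{1j} & -N_{12} & \cdots & -N_{1n} \\ -N_{21} & \sum_{j\neq 2}N_{2j} & \cdots & -N_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ -N_{n1} & -N_{n2} & \cdots & \sum_{j\neq n}N_{nj} \end{pmatrix}\begin{pmatrix} \sum_{j\neq 1}T_{1j} & -T_{21} & \cdots & -T_{n1} \\ -T_{12} & \sum_{j\neq 2}T_{2j} & \cdots & -T_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ -T_{1n} & -T_{2n} & \cdots & \sum_{j\neq n}T_{nj} \end{pmatrix}$$
and therefore
$$H_{ii} = \Big(\sum_{k\neq i}N_{ik}\Big)\Big(\sum_{k\neq i}T_{ik}\Big) + \sum_{k\neq i}N_{ik}T_{ik}$$
$$H_{ij} = -N_{ij}\sum_{k\neq j}T_{jk} + T_{ij}\sum_{k\neq i}N_{ik} + \sum_{k\neq i,j}N_{ik}T_{jk}$$
where $i \neq j$. Similarly
$$H^T_{ii} = H_{ii}$$
$$H^T_{ij} = -T_{ij}\sum_{k\neq j}N_{jk} + N_{ij}\sum_{k\neq i}T_{ik} + \sum_{k\neq i,j}T_{ik}N_{jk}$$
where $i \neq j$. Therefore,
$$(H + H^T)_{ii} = 2\Big[\Big(\sum_{k\neq i}N_{ik}\Big)\Big(\sum_{k\neq i}T_{ik}\Big) + \sum_{k\neq i}N_{ik}T_{ik}\Big]$$
$$\begin{aligned}
(H + H^T)_{ij} &= N_{ij}\Big(-\sum_{k\neq j}T_{jk} + \sum_{k\neq i}T_{ik}\Big) + T_{ij}\Big(\sum_{k\neq i}N_{ik} - \sum_{k\neq j}N_{jk}\Big) + \sum_{k\neq i,j}(N_{ik}T_{jk} + T_{ik}N_{jk}) \\
&= N_{ij}\Big(-\sum_{k\neq j}(t_k - t_j) + \sum_{k\neq i}(t_k - t_i)\Big) + T_{ij}\Big(\sum_{k\neq i}N_{ik} - \sum_{k\neq j}N_{jk}\Big) + \sum_{k\neq i,j}(N_{ik}T_{jk} + T_{ik}N_{jk}) \\
&= N_{ij}\Big(-(t_i - t_j) + (t_j - t_i) + \sum_{k\neq i,j}(t_j - t_i)\Big) + T_{ij}\Big(\sum_{k\neq i}N_{ik} - \sum_{k\neq j}N_{jk}\Big) + \sum_{k\neq i,j}(N_{ik}T_{jk} + T_{ik}N_{jk}) \\
&= nT_{ij}N_{ij} + T_{ij}\Big(\sum_{k\neq i}N_{ik} - \sum_{k\neq j}N_{jk}\Big) + \sum_{k\neq i,j}(N_{ik}T_{jk} + T_{ik}N_{jk}) \\
&= nT_{ij}N_{ij} + \sum_k(N_{ik}T_{ik} + N_{kj}T_{kj})
\end{aligned}$$
Note that in the foregoing calculation we used the following result
$$\begin{aligned}
&T_{ij}\Big(\sum_{k\neq i}N_{ik} - \sum_{k\neq j}N_{jk}\Big) + \sum_{k\neq i,j}(N_{ik}T_{jk} + T_{ik}N_{jk}) \\
={}&2T_{ij}N_{ij} + T_{ij}\sum_{k\neq i,j}(N_{ik} - N_{jk}) + \sum_{k\neq i,j}(N_{ik}T_{jk} + T_{ik}N_{jk}) \\
={}&2T_{ij}N_{ij} + \sum_{k\neq i,j}\big[N_{ik}(T_{ij} + T_{jk}) + N_{jk}(-T_{ij} + T_{ik})\big] \\
={}&\sum_k(N_{ik}T_{ik} + N_{jk}T_{jk})
\end{aligned}$$
where $i \neq j$. Accordingly,
$$\begin{aligned}
(D_{H+H^T})_{ii} &= 2\Big[\Big(\sum_{k\neq i}N_{ik}\Big)\Big(\sum_{k\neq i}T_{ik}\Big) + \sum_{k\neq i}N_{ik}T_{ik}\Big] + \sum_{j\neq i}\Big[nT_{ij}N_{ij} + \sum_k(N_{ik}T_{ik} + N_{kj}T_{kj})\Big] \\
&= 2\Big[\Big(\sum_{k\neq i}N_{ik}\Big)\Big(\sum_{k\neq i}T_{ik}\Big) + \sum_{k\neq i}N_{ik}T_{ik}\Big] + 2n\sum_{k\neq i}T_{ik}N_{ik} + \sum_j\sum_{k\neq j}T_{kj}N_{kj}
\end{aligned}$$
$$(D_{H+H^T})_{ij} = 0$$
Since we can cancel the terms on the main diagonal of $H + H^T$ and $D_{H+H^T}$, we have
$$(H + H^T)_{ii} = 2\sum_{k\neq i}N_{ik}T_{ik}$$
$$(H + H^T)_{ij} = nT_{ij}N_{ij} + \sum_k(N_{ik}T_{ik} + N_{kj}T_{kj})$$
or equivalently
$$H + H^T = nT\circ N + \begin{bmatrix} N_{(1:)}^TT_{(1:)} \\ \vdots \\ N_{(n:)}^TT_{(n:)} \end{bmatrix}\mathbf{1}^T + \mathbf{1}\begin{bmatrix} N_{(:1)}^TT_{(:1)}, & \cdots, & N_{(:n)}^TT_{(:n)} \end{bmatrix}$$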
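The off-diagonal formula for $H + H^T$ can be verified numerically. The sketch below is illustrative and rests on stated assumptions: $T_{ij} = t_j - t_i$, $N$ is skew-symmetric, and each factor of $H$ is the matrix with row sums on the diagonal and negated entries off the diagonal. The helper name `signed_degree_matrix` is hypothetical.

```python
import numpy as np

def signed_degree_matrix(S):
    """Diagonal: sum_k S_ik; off-diagonal: -S_ij (S is assumed to have zero diagonal)."""
    return np.diag(S.sum(axis=1)) - S

n = 6
rng = np.random.default_rng(2)
t = rng.standard_normal(n)
T = t[None, :] - t[:, None]                  # skew-symmetric, T_ij = t_j - t_i
U = np.triu(rng.standard_normal((n, n)), k=1)
N = U - U.T                                  # skew-symmetric noise matrix

H = signed_degree_matrix(N) @ signed_degree_matrix(T).T
S_direct = H + H.T

r = (N * T).sum(axis=1)                      # r_i = sum_k N_ik T_ik
S_formula = n * T * N + r[:, None] + r[None, :]

off_diagonal = ~np.eye(n, dtype=bool)        # mask selecting entries with i != j
max_gap = np.abs(S_direct - S_formula)[off_diagonal].max()
```

Only the off-diagonal entries are compared, since the diagonal entries are the ones cancelled against $D_{H+H^T}$ in the text.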
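B.3.2 Decomposition of $D_{H+H^T}$
$$\begin{aligned}
(D_{H+H^T})_{ii} &= 2n\sum_{k\neq i}T_{ik}N_{ik} + \sum_j\sum_{k\neq j}T_{kj}N_{kj} \\
&= 2n\sum_{k\neq i}T_{ik}N_{ik} + 2\sum_j\sum_{k<j}T_{kj}N_{kj}
\end{aligned}$$
$$(D_{H+H^T})_{ij} = 0$$
Therefore $D_{H+H^T} = 2nD_1 + 2D_2$ where $D_1$ is given by
$$(D_1)_{ii} = \sum_{k\neq i}T_{ik}N_{ik}$$
$$(D_1)_{ij} = 0$$
and
$$D_2 = \sum_j\sum_{k<j}T_{kj}N_{kj}\,I$$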
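B.3.3 Decomposition of $G$ and $D_G$
Also, as we saw previously
$$G_{ii} = \Big(\sum_{k\neq i}N_{ik}\Big)^2 + \sum_{k\neq i}N_{ik}^2$$
$$G_{ij} = 2N_{ij}^2 + N_{ij}\Big(\sum_{k\neq i,j}N_{ik} - \sum_{k\neq i,j}N_{jk}\Big) + \sum_{k\neq i,j}N_{ik}N_{jk}$$
where $i \neq j$.
Therefore $D_G$ is given by
$$\begin{aligned}
(D_G)_{ii} &= \Big(\sum_{k\neq i}N_{ik}\Big)^2 + \sum_{k\neq i}N_{ik}^2 + \sum_{j\neq i}\Big[2N_{ij}^2 + N_{ij}\sum_{k\neq i,j}(N_{ik} - 2N_{jk})\Big] \\
&= \Big(\sum_{k\neq i}N_{ik}\Big)^2 + \sum_{j\neq i}\Big(3N_{ij}^2 + N_{ij}\sum_{k\neq i,j}(N_{ik} - 2N_{jk})\Big)
\end{aligned}$$
$$(D_G)_{ij} = 0$$
Since we can cancel the terms on the main diagonal of $G$ and $D_G$, we have
$$G_{ii} = \sum_{k\neq i}N_{ik}^2$$
$$G_{ij} = 2N_{ij}^2 + N_{ij}\Big(\sum_{k\neq i,j}N_{ik} - \sum_{k\neq i,j}N_{jk}\Big) + \sum_{k\neq i,j}N_{ik}N_{jk}$$
$$(D_G)_{ii} = \sum_{j\neq i}\Big(3N_{ij}^2 + N_{ij}\sum_{k\neq i,j}(N_{ik} - 2N_{jk})\Big)$$
$$(D_G)_{ij} = 0$$
The foregoing implies that
$$G_{ij} = N_{ij}\Big(\sum_kN_{ik} + \sum_kN_{kj}\Big) - \sum_{k\neq i,j}N_{ik}N_{kj}$$
$$G = N\circ(N\mathbf{1}\mathbf{1}^T + \mathbf{1}\mathbf{1}^TN) - N^2$$
where $\circ$ denotes the componentwise (i.e. Schur or Hadamard) product of matrices, and
$$(D_G)_{ii} = \sum_j\Big[N_{ij}\sum_k(N_{ik} - 2N_{jk})\Big]$$
$$(D_G)_{ij} = 0$$
or alternatively
$$D_G = D_N^2 + 2D_{N^TN}$$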
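The matrix identities for $G$ and $D_G$ can be checked on a random skew-symmetric $N$. This sketch is illustrative; it assumes that $D_N$ and $D_{N^TN}$ denote the diagonal matrices built from the vectors $N\mathbf{1}$ and $N^TN\mathbf{1}$, which is an interpretation of the notation rather than something stated explicitly here.

```python
import numpy as np

n = 6
rng = np.random.default_rng(3)
U = np.triu(rng.standard_normal((n, n)), k=1)
N = U - U.T                                  # skew-symmetric noise matrix
ones = np.ones(n)
J = np.outer(ones, ones)                     # the all-ones matrix 1 1^T

A = np.diag(N.sum(axis=1)) - N               # diagonal: sum_k N_ik, off-diagonal: -N_ij
G_direct = A @ A.T
G_formula = N * (N @ J + J @ N) - N @ N      # Hadamard product, then subtract N^2

off_diagonal = ~np.eye(n, dtype=bool)        # mask selecting entries with i != j
row_sums = G_formula.sum(axis=1)             # diagonal of D_G after cancellation
D_formula = (N @ ones) ** 2 + 2 * (N.T @ N @ ones)
```

The diagonal of `G_formula` equals $\sum_{k\neq i}N_{ik}^2$, which matches the diagonal kept after the cancellation described in the text.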
Appendix C
Proof of Lemma 3.3.1
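C.1 Minimum eigenvalue of $L^{(TG)}$
We previously saw that
$$L^{(TG)} = n^2\operatorname{ddiag}(t^\natural t^{\natural T}) + 2n\|t^\natural\|^2I - n\big(\mathbf{1}\operatorname{ddiag}(t^\natural t^{\natural T}) - t^\natural t^{\natural T} + \operatorname{ddiag}(t^\natural t^{\natural T})\mathbf{1}^T\big) - \|t^\natural\|^2\mathbf{1}\mathbf{1}^T$$
Therefore,
$$\lambda_2(L^{(TG)}) \ge n^2t_{\min}^2 + 2n\|t^\natural\|^2$$
where $t_{\min} = \min(|t^\natural_1|, \ldots, |t^\natural_n|)$.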
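The eigenvalue bound can be illustrated numerically. The sketch below assumes a centered $t$ and reads the rank-one terms of $L^{(TG)}$ as the matrices with entries $t_j^2$ and $t_i^2$; the random instance is an arbitrary example, not data from the experiments.

```python
import numpy as np

n = 8
rng = np.random.default_rng(4)
t = rng.standard_normal(n)
t -= t.mean()                                # assumption: centered ground truth
ones = np.ones(n)
d = t**2
norm2 = t @ t

L_TG = (n**2 * np.diag(d) + 2 * n * norm2 * np.eye(n)
        - n * (np.outer(ones, d) - np.outer(t, t) + np.outer(d, ones))
        - norm2 * np.outer(ones, ones))

eigenvalues = np.linalg.eigvalsh(L_TG)       # ascending; the first belongs to the all-ones vector
lower_bound = n**2 * np.abs(t).min()**2 + 2 * n * norm2
```

On the subspace orthogonal to $\mathbf{1}$ the terms involving $\mathbf{1}$ vanish, and the remaining quadratic form is $n^2\sum_i t_i^2u_i^2 + 2n\|t\|^2\|u\|^2 + n(t^Tu)^2$, which gives the bound.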
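C.2 Spectral radius of $L^{(TN)}$
Based on the results below, we have
$$\begin{aligned}
\|L^{(TN)}\| &\le \sigma_t\|D_{H+H^T}\| + \sigma_t\|H + H^T\| + \sigma_t^2\|D_G\| + \sigma_t^2\|G\| + 2\sigma_t^2n \\
&\le \sigma_t\big[3nv\sqrt{(2+\varepsilon)\log n} + \|T\|_F\sqrt{2\varepsilon\log n}\big] + \sigma_t^2\big[8n\sqrt{n} + 6n\big]
\end{aligned}$$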
C.2.1 Spectral radius of $H + H^T$
We previously showed that
$$H + H^T = nT\circ N + \begin{bmatrix} N_{(1:)}^TT_{(1:)} \\ \vdots \\ N_{(n:)}^TT_{(n:)} \end{bmatrix}\mathbf{1}^T + \mathbf{1}\begin{bmatrix} N_{(:1)}^TT_{(:1)}, & \cdots, & N_{(:n)}^TT_{(:n)} \end{bmatrix}$$
To determine the spectral radius of $H + H^T$ in the eigenspace orthogonal to $\mathbf{1}$, it suffices to consider the spectral norm of $E = nT\circ N$. We use the following lemma.
Lemma C.2.1. (Following Section 4.3 in [19]) Let $T$ be a deterministic $n\times n$ skew-symmetric matrix and $N$ be a random $n\times n$ skew-symmetric matrix with independent standard normal (Gaussian) entries. Construct the random matrix $T\circ N$ and observe that its $(i,j)$ component is a Gaussian variable with zero mean and variance $|T_{ij}|^2$. We have
$$P(\|T\circ N\| \ge a) \le 2ne^{-a^2/2v^2}$$
where $v^2 = \max(\|T_{1:}\|^2, \|T_{2:}\|^2, \ldots, \|T_{n:}\|^2)$ and $T_{i:}$ is the $i$-th row of the matrix $T$.
Proof. (of Lemma C.2.1) We decompose the matrix of interest as a Gaussian series:
$$T\circ N = \sum_{i<j}N_{ij}T_{ij}E_{ij}$$
where $E_{ij}$ is an $n\times n$ matrix having 1 as the $ij$-th element, $-1$ as the $ji$-th element, and zeros elsewhere.
To determine the variance parameter, we let
$$\begin{aligned}
\sum_{i<j}T_{ij}E_{ij}(T_{ij}E_{ij})^T &= \sum_{i<j}T_{ij}^2I_{ij} = \operatorname{diag}(\|T_{1:}\|^2, \|T_{2:}\|^2, \ldots, \|T_{n:}\|^2) \\
&= \operatorname{diag}(\|T_{:1}\|^2, \|T_{:2}\|^2, \ldots, \|T_{:n}\|^2) = \sum_{i<j}(T_{ij}E_{ij})^TT_{ij}E_{ij}
\end{aligned}$$
Therefore,
$$v^2 = \|\operatorname{diag}(\|T_{1:}\|^2, \|T_{2:}\|^2, \ldots, \|T_{n:}\|^2)\| = \max(\|T_{1:}\|^2, \|T_{2:}\|^2, \ldots, \|T_{n:}\|^2)$$
and the final result follows by Corollary 4.2 in [19].
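The variance parameter in the proof of Lemma C.2.1 can be computed directly from the Gaussian series. The following sketch is illustrative and assumes $T_{ij} = t_j - t_i$ as a convenient example of a skew-symmetric $T$; any skew-symmetric matrix would serve.

```python
import numpy as np

n = 5
rng = np.random.default_rng(5)
t = rng.standard_normal(n)
T = t[None, :] - t[:, None]                  # example skew-symmetric matrix

variance_sum = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        E = np.zeros((n, n))
        E[i, j], E[j, i] = 1.0, -1.0         # E_ij: +1 at (i, j), -1 at (j, i)
        B = T[i, j] * E                      # coefficient matrix of the series
        variance_sum += B @ B.T

row_norms_sq = (T ** 2).sum(axis=1)          # squared Euclidean norms of the rows of T
v_squared = np.linalg.norm(variance_sum, 2)  # spectral norm of the variance matrix
```

Each term $E_{ij}E_{ij}^T$ places a one at positions $(i,i)$ and $(j,j)$, so the sum accumulates the squared row norms on the diagonal.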
Therefore, for $\varepsilon > 0$, we have $P(\|T\circ N\| \ge a) \le n^{-\varepsilon}$ if
$$a \ge v\sqrt{(2+\varepsilon)\log n}$$
i.e.
$$\|H + H^T\| < nv\sqrt{(2+\varepsilon)\log n}$$
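C.2.2 Spectral radius of $D_{H+H^T}$
We previously showed that $D_{H+H^T} = 2nD_1 + 2D_2$ where $D_1$ is given by
$$(D_1)_{ii} = \sum_{k\neq i}T_{ik}N_{ik}$$
$$(D_1)_{ij} = 0$$
and
$$D_2 = \sum_j\sum_{k<j}T_{kj}N_{kj}\,I$$
By the union bound and the upper deviation inequality,
$$\begin{aligned}
P(\|D_1\| > a) &= nP\Big(\Big|\sum_{k\neq i}T_{ik}N_{ik}\Big| > a\Big) \\
&= nP\Big(N_4 > \frac{a}{v}\Big) \\
&= ne^{-\frac{a^2}{2v^2}}
\end{aligned}$$
where $N_4 \sim N(0,1)$ and again
$$v = \max(\|T_{1:}\|, \|T_{2:}\|, \ldots, \|T_{n:}\|)$$
Thus, if $a \ge v\sqrt{(2+\varepsilon)\log n}$, for $\varepsilon > 0$ we have
$$P(\|D_1\| > a) = n^{-\varepsilon/2}$$
Therefore,
$$\|D_1\| \le v\sqrt{(2+\varepsilon)\log n}$$
with high probability as $n\to\infty$.
Similarly,
$$\begin{aligned}
P(\|D_2\| > a) &= P\Big(\Big|\sum_j\sum_{k<j}T_{kj}N_{kj}\Big| > a\Big) \\
&= P\Big(N_5 > \frac{a}{\|T\|_F/\sqrt{2}}\Big) \\
&= e^{-\frac{a^2}{V^2}}
\end{aligned}$$
where $N_5 \sim N(0,1)$. Thus, if $a \ge \frac{\|T\|_F}{\sqrt{2}}\sqrt{\varepsilon\log n}$, for $\varepsilon > 0$ we have
$$P(\|D_2\| > a) = n^{-\varepsilon}$$
Therefore,
$$\|D_2\| \le \frac{\|T\|_F}{\sqrt{2}}\sqrt{\varepsilon\log n}$$
with high probability as $n\to\infty$. Combining the foregoing results, we get
$$\|D_{H+H^T}\| \le 2nv\sqrt{(2+\varepsilon)\log n} + \|T\|_F\sqrt{2\varepsilon\log n}$$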
C.2.3 Spectral radius of $G$
We previously saw that
$$G = N\circ(N\mathbf{1}\mathbf{1}^T + \mathbf{1}\mathbf{1}^TN) - N^2$$
where $\circ$ denotes the componentwise (i.e. Schur or Hadamard) product of matrices. Consequently, since the Hadamard product is submultiplicative with respect to the spectral norm [13,18], we have
$$\|G\| \le \|N\|\,\|N\mathbf{1}\mathbf{1}^T + \mathbf{1}\mathbf{1}^TN\| + \|N^2\| \le 4n$$
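C.2.4 Spectral radius of $D_G$
As we noted previously, we can assume that
$$D_G = 2D_{N^TN}$$
By the semi-circle law,
$$\|N^TN\mathbf{1}\| = \sqrt{\sum_i(D_G)_{ii}^2} \le 4n\sqrt{n}$$
with high probability as $n\to\infty$. This implies that
$$P(\|D_G\| > 8n\sqrt{n}) \le P\Big(\bigcup_i\big\{|(D_G)_{ii}| > 8n\sqrt{n}\big\}\Big) = 0$$
with high probability as $n\to\infty$, i.e.
$$\|D_G\| \le 8n\sqrt{n}$$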
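C.3 Spectral radius of $D^{(XN)}$ and $M^{(XN)}$
The spectral radius of $M^{(XN)}$ is bounded by $2\sigma_x\sqrt{n}$ for $n\to\infty$ in accordance with the semi-circular law. Therefore, we just need to bound the diagonal entries of $D_W$ given by $\sum_{j=1}^nW_{ij}$.
By the union bound and the upper deviation inequality,
$$\begin{aligned}
P(\|D_W\| > a) &= nP\Big(\Big|\sum_{j=1}^nW_{ij}\Big| > a\Big) \\
&= nP\Big(N_3 > \frac{a}{\sqrt{n}}\Big) \\
&= ne^{-\frac{a^2}{2n}}
\end{aligned}$$
where $N_3 \sim N(0,1)$. Thus, if $a \ge \sqrt{(2+\varepsilon)n\log n}$, for $\varepsilon > 0$ we have
$$\begin{aligned}
P(\|D_W\| > a) &= ne^{-(1+\varepsilon/2)\log n} \\
&= n^{-\varepsilon/2}
\end{aligned}$$
Therefore,
$$\|D^{(XN)}\| \le \sigma_x\sqrt{(2+\varepsilon)n\log n}$$
with high probability as $n\to\infty$.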
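The arithmetic behind the last step can be reproduced directly: substituting $a = \sqrt{(2+\varepsilon)n\log n}$ into $ne^{-a^2/(2n)}$ gives $n^{-\varepsilon/2}$. The following sketch only evaluates that expression; the function name `union_tail_bound` is hypothetical.

```python
import math

def union_tail_bound(n, eps):
    """Evaluate n * exp(-a^2 / (2n)) at the threshold a = sqrt((2 + eps) * n * log n)."""
    a = math.sqrt((2 + eps) * n * math.log(n))
    return n * math.exp(-a**2 / (2 * n))

bound_small_n = union_tail_bound(100, 0.5)
bound_large_n = union_tail_bound(10_000, 0.5)
```

For any fixed $\varepsilon > 0$ the value decays polynomially in $n$, which is the sense in which the bound holds with high probability as $n \to \infty$.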