
Exact Recovery in Semidefinite Relaxation of

Synchronization over the Euclidean Group in One

Dimension

by

Vladimir A. Kobzar

A Thesis

Submitted in Partial

Fulfillment of the Requirements

for the Degree

of Master of Science

Courant Institute of Mathematical Sciences

New York University

September 2017

Adviser: Afonso S. Bandeira

© Copyright by Vladimir A. Kobzar, 2017.

All Rights Reserved

Abstract

Nonconvex maximum likelihood estimation problems are often hard to solve computationally. As a result,

convex relaxations of the maximum likelihood estimator (MLE) are commonly used, in particular relaxations

based on semidefinite programming (SDP), which can be solved in a polynomial amount of time.

We consider the problem of synchronization over the Euclidean group in one dimension (E(1)): the goal is

to recover n elements of this group (the ground truth) from measurements of their pairwise products corrupted

with non-adversarial noise. Informally this can be seen as the problem of recovering the orientation and

position of a ballerina that moves along a line and faces in one of two possible directions in each of n poor

quality photos.

We show that the SDP relaxation of synchronization over E(1) exactly recovers the orientation ground

truth with high probability. This is demonstrated for any level of noise by leveraging non-asymptotic bounds

for the spectral norm of random matrices with independent entries.

From the orientation, the MLE of the position is recovered by the least squares estimate. Such recovery

is tight, meaning that the least squares solution matches the MLE of the position ground truth. However,

due to noise the exact recovery of the position ground truth is not possible.

Synchronization over the special Euclidean group in d dimensions (SE(d)) includes important problems

in robotics and computer vision. We expect that establishing the tightness of synchronization over SE(d)

should be similar to establishing that over the Euclidean group in d dimensions (E(d)). Therefore, we hope

that our result for E(1) will be extended to higher dimensions.


Acknowledgements

First and foremost, I am extremely grateful to Afonso Bandeira for all the invaluable guidance, instruction

and support he provided as my thesis advisor and teacher at the Courant Institute. The research in this

thesis is inspired by Afonso's course entitled "Mathematics of Data Science" at Courant, which greatly

shaped my academic interests.

I also would like to thank Carlos Fernandez-Granda, my faculty mentor at the NYU Center for Data

Science (CDS) and the second reader of this thesis, for all of his advice and encouragement with respect to

my research.

I owe a debt of gratitude to Sinan Gunturk and Yuri Bakhtin, whose leadership and mentorship profoundly

shaped my career at Courant.

I am grateful to all participants in the Math and Data Group at Courant and the CDS. Being a part of

that group provided an extremely supportive community while I was working on this thesis.

Keith Moffat, Vukica Srajer and Robert Henning first introduced me to image processing problems in

the context of dynamic X-ray diffraction studies of macromolecules at BioCARS, Argonne National

Laboratory/The University of Chicago. My work with them at BioCARS motivated my interest in molecular

imaging models and, more broadly, the mathematics of data. I am very fortunate to have them as my

mentors, colleagues and friends.

Last but not least, Brett Bernstein very generously shared ideas about this thesis and various related

topics.

Any errors should be attributed to me.


Contents

Abstract
Acknowledgements
1 Introduction
1.1 Formulation of synchronization over E(1)
1.2 Contribution
1.3 Overview of related work
1.3.1 Synchronization over Z2
1.3.2 Synchronization over SE(d)
1.4 Notation
2 Nonconvex MLE and its relaxation
2.1 Least squares estimate
2.2 SDP relaxation
3 Strong duality and exact recovery of the orientation
3.1 Dual certificate
3.2 Decomposition of $L_Q$
3.3 Exactness conditions
3.4 Exactness without pairwise orientation measurements
4 Conclusion
A Proof of Lemma 2.1.1
B Proof of Lemma 3.2.1
B.1 Decomposition of $L^{(TG)}$
B.2 Decomposition of $L^{(TB)}$
B.3 Decomposition of $L^{(TN)}$
B.3.1 Decomposition of $H + H^T$
B.3.2 Decomposition of $D_{H+H^T}$
B.3.3 Decomposition of $G$ and $D_G$
C Proof of Lemma 3.3.1
C.1 Minimum eigenvalue of $L^{(TG)}$
C.2 Spectral radius of $L^{(TN)}$
C.2.1 Spectral radius of $H + H^T$
C.2.2 Spectral radius of $D_{H+H^T}$
C.2.3 Spectral radius of $G$
C.2.4 Spectral radius of $D_G$
C.3 Spectral radius of $D^{(XN)}$ and $M^{(XN)}$

Chapter 1

Introduction

Many signal recovery problems are solved as optimization problems over a set of feasible signals where the

optimum represents the signal with the maximum likelihood given the data. Unfortunately such problems

are often nonconvex and the parameter space is often exponentially large, which makes them computationally

challenging. It is therefore common to use heuristics, such as expectation maximization, to approximate

the maximum likelihood estimator (MLE) [3]. However, the convergence of heuristic methods to the global

maximum is not formally guaranteed in many cases. Also when these methods do attain a global maximum,

in general it is not possible to formally certify that fact.

As a result, another common approach is to determine the MLE over a larger feasible set by removing

the nonconvex constraints, a so-called convex relaxation. The idea is that algorithms for convex optimization problems

are generally guaranteed to converge to the global optimum. One particular class of relaxations is based

on semidefinite programming (SDP), where a linear objective function is optimized over a convex set of positive

semidefinite matrices. An SDP can be solved in a polynomial amount of time [14]. However, the solution

may not be in the original parameter space, and therefore its projection onto the original parameter space

may be suboptimal.

However, in a non-adversarial noise setting, it may be possible to achieve tightness of the convex relaxation, meaning that the relaxation recovers the solution in the original parameter space with high probability.¹ Thus, instead of addressing every possible instance of a computationally intractable problem, the convex optimization route achieves an optimal and computationally tractable solution with high probability.

¹ In the context of recovering a signal or other object of interest from incomplete measurements, such as compressed sensing [9, 11], matrix completion [10], and linear inverse problems [5], the recovery by convex optimization is achieved with high probability.

In the case of a discrete signal, the relaxation may be not only tight, but may also be exact, meaning that

the MLE coincides with the ground truth signal with high probability. For example, exactness was shown

in the context of Z2 signal recovery (SDP Z2) [6, 7] from noisy measurements, as well as in the context of

the stochastic block model, also known as correlation clustering, with two [1, 2] or more communities [4].²

So-called synchronization problems are one important class of signal recovery problems. They entail

estimating a set of signals from data concerning relations or interactions between them. As described

in [6, 16, 17], this includes various problems in:

1. Computer vision, such as determining structure from motion. This entails building a three-dimensional

model of an object from several two-dimensional photos taken from unknown positions. Although it is

usually not possible to estimate the position of the object relative to the camera from a given photo,

one can compare pairs of photos and estimate their relative positions.

2. Signal processing, such as synchronization for molecule reconstruction in cryo-electron microscopy. This

entails resolving the global three-dimensional structure of a molecule by recording multiple images of

the molecule at unknown orientations, where we can estimate relative orientation of the molecule in a

pair of images.

3. Robotics, such as pose-graph simultaneous localization and mapping (pose-graph SLAM). This entails

the determination of a collection of poses (position and orientation) of a robot or another object from

noisy pairwise relative measurements.

² In the context of compressed sensing, matrix completion and linear inverse problems, exact recovery from incomplete data was achieved with high probability as well. See the references cited in the previous footnote.

1.1 Formulation of synchronization over E(1)

We consider one particular problem of the above class: synchronization over the Euclidean group in one dimension (E(1)), where E(1) is the semidirect product $\mathbb{R} \rtimes O(1) = \mathbb{R} \rtimes \mathbb{Z}_2$ with the multiplication given by³

$$r_1 \cdot r_2 = (t_1, x_1) \cdot (t_2, x_2) = (t_1 + x_1 t_2,\; x_1 x_2).$$

The goal is to estimate the values of a set of unknown group elements $r_i^\natural = (t_i^\natural, x_i^\natural) \in E(1)$ (ground truth) for $1 \le i \le n$ from pairwise products

$$(r_i^\natural)^{-1} r_j^\natural = \left(x_i^\natural (t_j^\natural - t_i^\natural),\; x_i^\natural x_j^\natural\right)$$

corrupted with noise.⁴
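The group law above is easy to check numerically. The following is a minimal sketch, assuming an element of E(1) is stored as a pair `(t, x)` with `x` equal to `+1` or `-1`; the names `multiply`, `inverse` and `relative` are illustrative and do not come from the thesis software.

```python
from typing import Tuple

# Hypothetical representation: (t, x) with translation t in R and orientation x in {+1, -1}.
E1 = Tuple[float, int]


def multiply(r1: E1, r2: E1) -> E1:
    """Group law of E(1): (t1, x1) . (t2, x2) = (t1 + x1 * t2, x1 * x2)."""
    t1, x1 = r1
    t2, x2 = r2
    return (t1 + x1 * t2, x1 * x2)


def inverse(r: E1) -> E1:
    """Inverse element: (t, x)^{-1} = (-x * t, x), using x * x = 1."""
    t, x = r
    return (-x * t, x)


def relative(ri: E1, rj: E1) -> E1:
    """Noiseless pairwise product r_i^{-1} r_j = (x_i (t_j - t_i), x_i x_j)."""
    return multiply(inverse(ri), rj)
```

For example, `relative((1.0, -1), (4.0, 1))` evaluates to `(-3.0, -1)`, which agrees with $(x_i(t_j - t_i), x_i x_j)$ for $x_i = -1$, $t_i = 1$, $t_j = 4$, $x_j = 1$.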

1.2 Contribution

We show that although the synchronization over E(1) is nonconvex, in the case of Gaussian noise the orientation recovered by an SDP relaxation (SDP E(1)) matches the ground truth $x_i^\natural \in \mathbb{Z}_2$ (i.e., achieves exact recovery) with high probability. This holds even if the noise levels grow to infinity as the number of elements n grows to infinity. Moreover, such recovery is achieved with fewer measurements than the number of measurements needed to achieve recovery by SDP Z2, as defined in Section 1.3.1, i.e., if only measurements of the relative orientation $(x_i^\natural)^{-1} x_j^\natural$ were available. These results are demonstrated by leveraging non-asymptotic bounds for the spectral norm of random matrices with independent entries by an approach adapted from [6, 7].

Numerical simulations confirm the analytic conditions for the exactness of recovery of orientations in SDP E(1).⁵ Figures 1.1 and 1.2 demonstrate the exactness of recovery of the orientation where we scale a sparse translation ground truth $t^\natural$ so that the centered translation ground truth $t_c^\natural = t^\natural - \frac{1}{n}\sum_{i=1}^n t_i^\natural$ has, respectively, a fixed and an increasing $\ell_2$ norm. (Similar results can be obtained for a nonsparse $t^\natural$.)

³ This multiplication convention follows from matrix multiplication if we represent $(t, x)$ by $\begin{bmatrix} x & t \\ 0 & 1 \end{bmatrix}$. As a technical matter, the multiplication by $x_1$ is a homomorphism $\varphi_{x_1} : \mathbb{R} \to \mathbb{R}$ given by $\varphi_{x_1}(t_2) = x_1 t_2$. Since trivially $\varphi_{x_1} \in \mathrm{Aut}(\mathbb{R})$ and the multiplication in E(1) is given by

$$\cdot : (\mathbb{R} \times \mathbb{Z}_2) \times (\mathbb{R} \times \mathbb{Z}_2) \to \mathbb{R} \times \mathbb{Z}_2, \qquad (t_1, x_1) \cdot (t_2, x_2) = (t_1 + \varphi_{x_1}(t_2),\; x_1 x_2)$$

for $t_1, t_2$ in $\mathbb{R}$ and $x_1, x_2$ in $\mathbb{Z}_2$, the Cartesian product $\mathbb{R} \times \mathbb{Z}_2$ meets the definition of an outer semidirect product. Therefore, we denote it by $\mathbb{R} \rtimes \mathbb{Z}_2$.

⁴ Note that $r^{-1} = (t, x)^{-1} = (-xt, x)$.

⁵ The following experiments were performed on a Linux system with two Intel Xeon E5-2680 (2.80 GHz) CPUs (20 cores) and 128 GB of memory. The experimental software was written in Matlab and used the default CVX solver (SDPT3 4.0).

Figure 1.1: This figure shows how frequently SDP E(1) exactly recovers the orientation $X^\natural = x^\natural x^{\natural T}$ when the centered translation ground truth $t_c^\natural$ has a fixed $\ell_2$ norm. For each $(\sigma_x, n)$, 10 realizations of the data were generated and the exactness of the recovered orientation was verified. The frequency of success is represented in grayscale (white for 100% success and black for 0% success). The results agree with the analytic predictions (solid curve). The analytic predictions without the translation data (SDP Z2) are plotted as well (dashed curve). (In these experiments, the translation noise $\sigma_t$ (plotted on the right vertical axis) is equal to the orientation noise $\sigma_x$. For such levels of noise, the experiments demonstrate the tightness of the $\sigma_x$ bound.)

The translation estimate $t \in \mathbb{R}^n$ is recovered by a least squares estimate from the orientation ground truth $x^\natural$, and such recovery is tight (meaning that $t$ is the MLE of the position ground truth with high probability). However, due to noise the exact recovery of the position ground truth $t^\natural$ is not possible.

The orientation ground truth $x^\natural$ can be recovered exactly even without the pairwise orientation measurements $x_i^\natural x_j^\natural$ since the measurements of the relative translations $x_i^\natural (t_j^\natural - t_i^\natural)$ incorporate orientation information. In this context, for a given level of noise more measurements are needed to achieve recovery than in the context of the regular SDP E(1) (where measurements of $(x_i^\natural (t_j^\natural - t_i^\natural),\; x_i^\natural x_j^\natural)$ are available). Compare Figure 1.1 with Figure 3.1.

Figure 1.2: This figure shows how frequently the semidefinite relaxation (SDP E(1)) exactly recovers the orientation $X^\natural = x^\natural x^{\natural T}$ when the centered ground truth translation $t_c^\natural$ has increasing $\ell_2$ norm (plotted on the top horizontal axis). For each $(\sigma_x, n)$, 10 realizations of the data were generated and the exactness of the recovered orientation was verified. The frequency of success is represented in grayscale (white for 100% success and black for 0% success). The results agree with the analytic predictions (solid curve). (In these experiments, the translation noise $\sigma_t$ (plotted on the right vertical axis) is equal to the orientation noise $\sigma_x$. For such levels of noise, the experiments demonstrate the tightness of the $\sigma_x$ bound.)

1.3 Overview of related work

1.3.1 Synchronization over Z2

The synchronization problem over Z2 (which arises, for example, in the context of community detection for two communities) is to recover $z_i \in \{\pm 1\}$, $1 \le i \le n$, from observations given by:

$$y_{ij} = z_i z_j + \sigma w_{ij}$$

where $w_{ij} = w_{ji}$ is a standard Gaussian ($N(0,1)$), and the MLE is given by a least squares solution:

$$\hat{z} = \arg\min_{x_1, \dots, x_n} \sum_{i,j} (y_{ij} - x_i x_j)^2$$

In [6, 7] this problem was "lifted" into an equivalent matrix form: for the underlying ground truth $z \in \{\pm 1\}^n$, the observations are given by the $n \times n$ matrix:

$$Y = zz^T + \sigma W$$

where $W$ is a real symmetric random matrix ($W_{ij} = W_{ji} \sim N(0,1)$). Then the MLE for $zz^T$ is given by

$$\arg\max_{X = xx^T} \operatorname{tr}(YX)$$

An SDP can be formed by relaxing the last constraint as follows:

$$\arg\max_{\substack{X \succeq 0 \\ X_{ii} = 1}} \operatorname{tr}(YX)$$

In the above-referenced papers, it was shown that the relaxation recovers $zz^T$ exactly with high probability if

$$\sigma < \sqrt{\frac{n}{(2 + \varepsilon) \log n}}.$$

Note that when $t_c^\natural = 0$, the synchronization problems over Z2 and E(1) are equivalent. This is illustrated in Figure 1.3.
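The equivalence between the least squares form and the lifted trace form can be checked by brute force on a tiny instance. The sketch below is an illustration only: it enumerates all sign vectors (so it is limited to very small n), it is not the SDP used in the thesis, and the helper names `make_observations` and `brute_force_estimates` are hypothetical.

```python
import itertools
import random


def make_observations(z, sigma, seed=0):
    """Build Y = z z^T + sigma * W with W symmetric and N(0,1) entries."""
    rng = random.Random(seed)
    n = len(z)
    Y = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            noise = sigma * rng.gauss(0.0, 1.0)
            Y[i][j] = Y[j][i] = z[i] * z[j] + noise
    return Y


def brute_force_estimates(Y):
    """Return (least-squares minimizer, trace maximizer) over all x in {+1,-1}^n."""
    n = len(Y)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    candidates = list(itertools.product((1, -1), repeat=n))
    # Least squares objective: sum_ij (y_ij - x_i x_j)^2
    least_squares = min(
        candidates,
        key=lambda x: sum((Y[i][j] - x[i] * x[j]) ** 2 for i, j in pairs),
    )
    # Lifted objective: tr(Y X) with X = x x^T, i.e. sum_ij y_ij x_i x_j
    max_trace = max(
        candidates,
        key=lambda x: sum(Y[i][j] * x[i] * x[j] for i, j in pairs),
    )
    return least_squares, max_trace
```

The two objectives differ only by terms that do not depend on x, since $(y_{ij} - x_i x_j)^2 = y_{ij}^2 - 2 y_{ij} x_i x_j + 1$, so both searches return the same sign vector. Note that x and -x give the same objective value, so recovery is only possible up to a global sign.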

Figure 1.3: This figure shows how frequently the semidefinite relaxation (SDP E(1)) exactly recovers the orientation $X^\natural = x^\natural x^{\natural T}$ when the centered translation ground truth $t_c^\natural$ is equal to zero. For each $(\sigma_x, n)$, 10 realizations of the data were generated and the exactness of the recovered orientation was verified. The frequency of success is represented in grayscale (white for 100% success and black for 0% success). The results agree with the analytic predictions for SDP E(1), which in this context are the same as those for SDP Z2 (solid curve). (In these experiments, the translation noise $\sigma_t$ (plotted on the right vertical axis) is equal to the orientation noise $\sigma_x$. For such levels of noise, the experiments demonstrate the tightness of the $\sigma_x$ bound.)

1.3.2 Synchronization over SE(d)

The SE(d) synchronization problem entails estimating a set of unknown poses $p_i = (t_i, x_i)$, $1 \le i \le n$, given noisy measurements of their pairwise relative transforms $p_i^{-1} p_j$. As noted previously, this problem arises in robotics, such as pose-graph SLAM, and computer vision, such as camera pose estimation. It entails the determination of a collection of poses (position and orientation) of a robot or another object from noisy pairwise relative measurements.

In [15–17], the SE(d) synchronization problem in a non-adversarial (but operationally relevant) noise regime was posed as a nonconvex MLE. The algorithm proposed in these papers verified the tightness of the MLE recovery of an SDP relaxation post-hoc for each given instance of the problem.

1.4 Notation

We will use the following standard matrix and probability notation. For a matrix $M$, we denote its $k$-th smallest eigenvalue by $\lambda_k(M)$, the largest eigenvalue by $\lambda_{\max}(M)$, and its spectral and Frobenius norms by $\|M\|$ and $\|M\|_F$, respectively. $\mathrm{diag}(M)$ refers to a vector with the diagonal elements of $M$ as entries, and $\mathrm{ddiag}(M)$ sets the off-diagonal entries of $M$ to zero. For $x \in \mathbb{R}^n$, $\mathrm{diag}(x)$ refers to a diagonal matrix $D \in \mathbb{R}^{n \times n}$ with $D_{ii} = x_i$. $\mathbf{1}_n$ denotes the vector in $\mathbb{R}^n$ with all components equal to 1 (we will omit the subscript when the dimension is clear from the context). $D_M$ refers to the diagonal matrix $\mathrm{diag}(M\mathbf{1})$, i.e., with $D_{ii} = \sum_{j=1}^n M_{ij}$, and $L_M$ is the matrix given by $L_M = D_M - M$.
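The operator $M \mapsto L_M$ is used repeatedly in Chapter 3, so a small numerical illustration may help. The sketch below assumes matrices are stored as nested Python lists; the function name `laplacian` is illustrative. It shows the property that is relied on later: by construction every row of $L_M$ sums to zero, i.e. $L_M \mathbf{1} = 0$.

```python
def laplacian(M):
    """Return L_M = D_M - M, where D_M = diag(M 1) holds the row sums of M."""
    n = len(M)
    row_sums = [sum(row) for row in M]
    return [
        [(row_sums[i] if i == j else 0.0) - M[i][j] for j in range(n)]
        for i in range(n)
    ]


def times_ones(A):
    """Return the vector A 1, i.e. the row sums of A."""
    return [sum(row) for row in A]
```

For instance, with `M = [[0.0, 1.0, 2.0], [1.0, 0.0, -3.0], [2.0, -3.0, 0.0]]` the row sums are 3, -2 and -1, and `times_ones(laplacian(M))` is the zero vector.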

$a \lesssim b$ means that there exists a universal constant $C > 0$ such that $a \le Cb$.

We say that an event $\mathcal{E}$ happens with high probability as $n \to \infty$ if there exists $\varepsilon > 0$ such that $\mathbb{P}[\mathcal{E}] \ge 1 - n^{-\varepsilon}$. $M \succeq 0$ means that $M$ is not only positive semidefinite (PSD) but also symmetric.

E(d) refers to the Euclidean group of isometries in d dimensions given by

$$\left\{ X : X = \begin{pmatrix} R & r \\ 0_{1 \times d} & 1 \end{pmatrix},\; R \in O(d),\; r \in \mathbb{R}^d \right\}$$

SE(d) refers to the special Euclidean group of rigid body motions in d dimensions given by

$$\left\{ X : X = \begin{pmatrix} R & r \\ 0_{1 \times d} & 1 \end{pmatrix},\; R \in SO(d),\; r \in \mathbb{R}^d \right\}$$

where O(d) and SO(d) are the orthogonal and special orthogonal groups in d dimensions, respectively.

Chapter 2

Nonconvex MLE and its relaxation

2.1 Least squares estimate

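In this chapter and Chapter 3, we adapt the approach developed in [6, 7] for synchronization over Z2 to the synchronization problem over E(1): determine $r^\natural = (t^\natural, x^\natural) \in E(1)^n$ from the following noisy observations

$$\begin{aligned}
Z_{ij} = (S_{ij}, Y_{ij}) &= (t_i^\natural, x_i^\natural)^{-1} (t_j^\natural, x_j^\natural) + (\sigma_t N_{ij},\; \sigma_x W_{ij}) \\
&= \left(x_i^\natural (t_j^\natural - t_i^\natural),\; x_i^\natural x_j^\natural\right) + (\sigma_t N_{ij},\; \sigma_x W_{ij})
\end{aligned}$$

where $x_i N_{ij} = -x_j N_{ji}$ and $W_{ij} = W_{ji}$ are $N(0,1)$ i.i.d. for $i \ne j$ ($N_{ii} = W_{ii} = 0$). For simplicity, we assume that we have a complete set of $n^2 - n$ pairwise relative measurements.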
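The observation model can be simulated directly. The following sketch is one possible reading of the model, not the thesis's Matlab code: it assumes the constraint on the translation noise is imposed with the ground truth orientations (so that $x_i S_{ij} = -x_j S_{ji}$ holds exactly), and the function name `simulate_measurements` and its `seed` parameter are hypothetical.

```python
import random


def simulate_measurements(t, x, sigma_t, sigma_x, seed=0):
    """Simulate S_ij = x_i (t_j - t_i) + sigma_t N_ij and Y_ij = x_i x_j + sigma_x W_ij.

    t: list of ground truth translations; x: list of ground truth signs (+1 or -1).
    """
    rng = random.Random(seed)
    n = len(t)
    S = [[0.0] * n for _ in range(n)]
    # With N_ii = W_ii = 0 the model gives S_ii = 0 and Y_ii = x_i * x_i = 1.
    Y = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w_ij = rng.gauss(0.0, 1.0)  # symmetric orientation noise, W_ij = W_ji
            n_ij = rng.gauss(0.0, 1.0)
            n_ji = -x[i] * x[j] * n_ij  # enforces x_i N_ij = -x_j N_ji
            S[i][j] = x[i] * (t[j] - t[i]) + sigma_t * n_ij
            S[j][i] = x[j] * (t[i] - t[j]) + sigma_t * n_ji
            Y[i][j] = Y[j][i] = x[i] * x[j] + sigma_x * w_ij
    return S, Y
```

Setting both noise levels to zero returns the noiseless pairwise products, which is a convenient sanity check before adding noise.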

By definition, the maximum a posteriori estimator (MAP) maximizes the probability of recovering $r^\natural = (t^\natural, x^\natural)$. Since we have no prior information on $r^\natural$, we assume a uniform prior, in which case the MAP is given by the MLE, i.e., the least squares solution:

$$\arg\min_{r_i, r_j \in \mathbb{R} \rtimes \mathbb{Z}_2} \sum_{i,j} \| r_i^{-1} r_j - Z_{ij} \|_2^2$$

where

$$\begin{aligned}
\| r_i^{-1} r_j - Z_{ij} \|_2^2 &= \| (t_i, x_i)^{-1} \cdot (t_j, x_j) - (S_{ij}, Y_{ij}) \|_2^2 \\
&= \| (-x_i t_i + x_i t_j,\; x_i x_j) - (S_{ij}, Y_{ij}) \|_2^2 \\
&= (x_i (t_j - t_i) - S_{ij})^2 + (x_i x_j - Y_{ij})^2
\end{aligned}$$

Accordingly, the minimization problem is given by

$$r = (t, x) = \arg\min_{\substack{x_i, x_j \in \{\pm 1\} \\ t_i, t_j \in \mathbb{R}}} \sum_{i,j} (t_j - t_i - x_i S_{ij})^2 - 2 x_i x_j Y_{ij} \tag{2.1}$$
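The passage from the squared residual to the objective in (2.1) uses only $x_i^2 = x_j^2 = 1$. As a worked step (a reconstruction of the implicit algebra, not a quotation from the thesis):

```latex
\begin{aligned}
(x_i (t_j - t_i) - S_{ij})^2
  &= x_i^2 \,(t_j - t_i - x_i S_{ij})^2
   = (t_j - t_i - x_i S_{ij})^2, \\
(x_i x_j - Y_{ij})^2
  &= x_i^2 x_j^2 - 2 x_i x_j Y_{ij} + Y_{ij}^2
   = 1 - 2 x_i x_j Y_{ij} + Y_{ij}^2 .
\end{aligned}
```

The terms $1 + Y_{ij}^2$ do not depend on $(t, x)$, so dropping them does not change the minimizer, which leaves the summand $(t_j - t_i - x_i S_{ij})^2 - 2 x_i x_j Y_{ij}$.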

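Lemma 2.1.1. The minimization problem (2.1) is equivalent to

$$x = \arg\max_{\substack{X = xx^T \\ x \in \mathbb{Z}_2^n}} \operatorname{Trace}(QX) \tag{2.2}$$

independently of $t$, where

$$Q = \frac{1}{2n} V^T V + 2Y,$$

$$V = B_1^T B_2 = \begin{pmatrix}
\sum_{j \ne 1} S_{1j} & -S_{21} & -S_{31} & \cdots & -S_{n-1,1} & -S_{n,1} \\
-S_{12} & \sum_{j \ne 2} S_{2j} & -S_{32} & \cdots & -S_{n-1,2} & -S_{n,2} \\
-S_{13} & -S_{23} & \sum_{j \ne 3} S_{3j} & \cdots & -S_{n-1,3} & -S_{n,3} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
-S_{1n} & -S_{2n} & -S_{3n} & \cdots & -S_{n-1,n} & \sum_{j \ne n} S_{n,j}
\end{pmatrix},$$

$$t = -B_1^\dagger B_2 x \quad \text{(uniquely up to a global shift)},$$

$B_1 \in \mathbb{R}^{n^2 \times n}$ is an incidence matrix of a connected graph, i.e., $B_1$ is comprised of $n$ matrix blocks $B_1^{(i)} \in \mathbb{R}^{n \times n}$ arranged vertically. Each $B_1^{(i)}$ is given by

$$(B_1^{(i)})_{:i} = -\mathbf{1}_n, \qquad (B_1^{(i)})_{jj} = 1 \text{ for } i \ne j,$$

and the remaining entries are zero. Therefore $B_1$ has the following structure

$$B_1 = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-1 & 1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-1 & 0 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-1 & 0 & 0 & 1 & 0 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
1 & -1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & -1 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & -1 & 0 & 1 & 0 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & -1 \\
0 & 0 & 0 & 0 & 0 & \cdots & 1 & 0 & -1 \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 1 & -1 \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0
\end{pmatrix}$$

$B_2 \in \mathbb{R}^{n^2 \times n}$ is a matrix comprised of $n$ matrix blocks $B_2^{(i)} \in \mathbb{R}^{n \times n}$ arranged vertically. Each $B_2^{(i)}$ is given by $(B_2^{(i)})_{:i} = -S_{i:}^T$ and the remaining entries are zero. Therefore $B_2$ has the following structure:

$$B_2 = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-S_{12} & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-S_{13} & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-S_{14} & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & -S_{21} & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & -S_{23} & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & -S_{24} & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & -S_{n,n-3} \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & -S_{n,n-2} \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & -S_{n,n-1} \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0
\end{pmatrix}$$

The proof is provided in Section A of the Appendix.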
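The closed form of $V$ in Lemma 2.1.1 can be cross-checked against the block definitions of $B_1$ and $B_2$ on a small example. The sketch below is an illustration under stated assumptions: matrices are nested lists, row `i * n + j` of each stacked matrix corresponds to the pair (i, j), the rows with i = j are zero (as in the displayed structure), and all function names are hypothetical.

```python
def build_v_direct(S):
    """Closed form: V_ii = sum_{j != i} S_ij and V_ij = -S_ji for i != j."""
    n = len(S)
    V = [[-S[j][i] for j in range(n)] for i in range(n)]
    for i in range(n):
        V[i][i] = sum(S[i][j] for j in range(n) if j != i)
    return V


def build_blocks(S):
    """Stack the n blocks B_1^{(i)} and B_2^{(i)} vertically (n*n rows, n columns)."""
    n = len(S)
    B1 = [[0.0] * n for _ in range(n * n)]
    B2 = [[0.0] * n for _ in range(n * n)]
    for i in range(n):          # block index
        for j in range(n):      # row within block, i.e. the pair (i, j)
            if j == i:
                continue        # no measurement for the pair (i, i): zero row
            row = i * n + j
            B1[row][i] = -1.0
            B1[row][j] = 1.0
            B2[row][i] = -S[i][j]
    return B1, B2


def transpose_times(A, B):
    """Return the product A^T B for matrices stored as nested lists."""
    rows, n, m = len(A), len(A[0]), len(B[0])
    return [
        [sum(A[r][a] * B[r][b] for r in range(rows)) for b in range(m)]
        for a in range(n)
    ]
```

With `S = [[0.0, 1.0, 2.0], [3.0, 0.0, 4.0], [5.0, 6.0, 0.0]]`, both constructions give the same matrix, whose first row is `[3.0, -3.0, -5.0]`, i.e. $(\sum_{j \ne 1} S_{1j}, -S_{21}, -S_{31})$.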

2.2 SDP relaxation

We replace the nonconvex rank constraint in (2.2) as follows:

$$\max_{\substack{X \succeq 0 \\ X_{ii} = 1 \\ |X_{ij}| \le 1}} \operatorname{Trace}(QX)$$

However, the positive semidefiniteness, together with $X_{ii} = 1$, implies that the absolute value of the off-diagonal terms is bounded by 1. Therefore, the foregoing relaxation is equivalent to:

$$\max_{\substack{X \succeq 0 \\ X_{ii} = 1}} \operatorname{Trace}(QX) \tag{2.3}$$
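The claim that $X \succeq 0$ and $X_{ii} = 1$ force $|X_{ij}| \le 1$ follows from a standard fact about principal submatrices. As a worked step: every $2 \times 2$ principal submatrix of a PSD matrix is PSD, so its determinant is nonnegative, and for $i \ne j$

```latex
\begin{pmatrix} X_{ii} & X_{ij} \\ X_{ij} & X_{jj} \end{pmatrix}
= \begin{pmatrix} 1 & X_{ij} \\ X_{ij} & 1 \end{pmatrix} \succeq 0
\quad \Longrightarrow \quad
1 - X_{ij}^2 \ge 0
\quad \Longrightarrow \quad
|X_{ij}| \le 1 .
```

The constraint $|X_{ij}| \le 1$ is therefore redundant and can be dropped without changing the feasible set.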


Chapter 3

Strong duality and exact recovery of

the orientation

3.1 Dual certificate

Since $\operatorname{Trace}(QX)$ is given by $\sum_{i,j} Q_{ij} X_{ji}$, we can express (2.3) in vector notation. Specifically, given a vector $q$ in $\mathbb{R}^{n(n-1)/2}$ which contains the entries of $Q$ above the main diagonal, we have

$$\operatorname{Trace}(Q) + \max_{F(x') \succeq 0} 2 q^T x'$$

where

$$F(x') = I + \sum_{i,j \text{ above the main diagonal}} x'_{i,j} F_{i,j},$$

$x' \in \mathbb{R}^{n(n-1)/2}$, $I$ is the $n \times n$ identity matrix, and $F_{i,j}$ are a collection of $n(n-1)/2$ symmetric matrices in $\mathbb{R}^{n \times n}$ containing 1's in the $ij$-th and $ji$-th entries and 0's elsewhere, where $i, j$ are the indices of matrix entries above the main diagonal. Then, using Eq. 28 in [20], the associated dual problem is given by

$$\operatorname{Trace}(Q) + \min_{\substack{\operatorname{Trace}(F_{i,j} C) = 2 q_{i,j} \\ C \succeq 0}} \operatorname{Trace}(C)$$

Observe that the first constraint disregards the diagonal terms. Thus, if we represent $C = D - Q$ where $D$ is a diagonal matrix, the constraint will be satisfied for any $D$. Therefore, the preceding dual problem is equivalent to

$$\operatorname{Trace}(Q) + \min_{D - Q \succeq 0} \operatorname{Trace}(D - Q)$$

and we obtain the following lemma.

Lemma 3.1.1. The dual problem associated with the relaxation in (2.3) is given by

$$\min_{D - Q \succeq 0} \operatorname{Trace}(D) \tag{3.1}$$

where $D$ is a diagonal matrix.

If $X$ and $D - Q$ are optimal solutions to (2.3) and (3.1), respectively, weak duality provides that

$$\operatorname{Trace}(QX) \le \operatorname{Trace}(D)$$

To establish strong duality, we look for a dual certificate $D$, a diagonal matrix that satisfies

$$\operatorname{Trace}(D) - \operatorname{Trace}(QX) = 0 \tag{3.2}$$

Since $D$ is diagonal and the diagonal entries of $X$ are equal to 1, we have

$$\operatorname{Trace}(D) = \operatorname{Trace}(DX)$$

Therefore (3.2) is equivalent to

$$\operatorname{Trace}((D - Q)X) = 0$$

We observe that since $X$ and $D - Q$ are PSD and the trace of their product vanishes, their product is zero, so they commute and are simultaneously diagonalizable. If $(D - Q)\mathbf{1} = 0$ and $\lambda_2(D - Q) > 0$, the span of the eigenvectors $v_i$ corresponding to the nonzero eigenvalues $\lambda_{v_i}$ ($2 \le i \le n$) is the orthogonal complement of $\mathbf{1}$. We have $X(D - Q)v_i = \lambda_{v_i} X v_i$. Since $X$ is PSD and $\lambda_{v_i} > 0$, the condition $\operatorname{Trace}((D - Q)X) = 0$ requires $X v_i = 0$. This implies that $X$ is a scalar multiple of $\mathbf{1}\mathbf{1}^T$, and the fact that the diagonal terms must be equal to 1 requires that $X = \mathbf{1}\mathbf{1}^T$. See also [1].

Since $\mathrm{diag}(x^\natural) W \mathrm{diag}(x^\natural) \sim W$ and $\mathrm{diag}(x^\natural) N \sim N$, WLOG, finding a dual certificate $D$ for $r^\natural = (t^\natural, x^\natural)$ where $t^\natural \in \mathbb{R}^n$ and $x^\natural \in \{\pm 1\}^n$ is equivalent to finding it for $x^\natural = \mathbf{1}$ (which implies $S_{ij} = -S_{ji}$ and $N_{ij} = -N_{ji}$). Therefore, we established the following result.

Theorem 3.1.2. If a diagonal matrix $D$ satisfies

$$(D - Q)\mathbf{1} = 0 \quad \text{and} \quad \lambda_2(D - Q) > 0,$$

then the unique optimal solution of (2.3) is $X^\natural = x^\natural x^{\natural T}$ and $t$ is the MLE of $t^\natural$ (up to a global shift), recovered in each case with high probability as $n \to \infty$.

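Let $D = \frac{1}{2n} D_{V^T V} + 2 D_Y$. Following [6, 7], to find $D_Y$ we set the following expression to zero

$$\begin{aligned}
(D_Y - Y)\mathbf{1} &= (D_Y - \mathbf{1}\mathbf{1}^T - \sigma_x W)\mathbf{1} \\
&= D_Y \mathbf{1} - n\mathbf{1} - \sigma_x \begin{pmatrix} \sum_{j=1}^n W_{1,j} \\ \vdots \\ \sum_{j=1}^n W_{n,j} \end{pmatrix}
\end{aligned}$$

Therefore,

$$D_Y = n I_{n \times n} + \sigma_x D_W, \quad \text{and}$$

$$D_W = \mathrm{diag}(W\mathbf{1}) = \begin{pmatrix}
\sum_{j=1}^n W_{1,j} & 0 & \cdots & 0 \\
0 & \sum_{j=1}^n W_{2,j} & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \sum_{j=1}^n W_{n,j}
\end{pmatrix}$$

To find $D_{V^T V}$ we set

$$\frac{1}{2n}(D_{V^T V} - V^T V)\mathbf{1} = 0$$

where

$$D_{V^T V} = \mathrm{diag}(V^T V \mathbf{1}) = \begin{pmatrix}
(V^T V \mathbf{1})_1 & 0 & \cdots & 0 \\
0 & (V^T V \mathbf{1})_2 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & (V^T V \mathbf{1})_n
\end{pmatrix}$$

Observe that by construction $L_Q \mathbf{1} = (D_Q - Q)\mathbf{1} = 0$. Thus, to complete the proof, we just need to confirm that $\lambda_2(L_Q) = \lambda_2\left(\frac{1}{2n} L_{V^T V} + 2 L_Y\right) > 0$, which would imply that the SDP recovers $x^\natural$ exactly with high probability as $n \to \infty$.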

3.2 Decomposition of $L_Q$

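We decompose $V^T V$ into the noiseless signal (ground truth) $M^{(TG)}$, the bias term $M^{(TB)}$ and pure noise $M^{(TN)}$.

$$\begin{aligned}
M^{(TG)} &:= (\mathbb{E}V)^T (\mathbb{E}V) \\
M^{(TB)} &:= \mathbb{E}V^T V - (\mathbb{E}V)^T (\mathbb{E}V) \\
M^{(TN)} &:= V^T V - \mathbb{E}V^T V
\end{aligned}$$

Letting a matrix $T$ be given by $T_{ij} = t_j^\natural - t_i^\natural$, we can decompose $V$ and $V^T V$ as

$$\begin{aligned}
V &= B_1^T B_2 = B_1^T (B_{2T} + \sigma_t B_{2N}) \\
V^T V &= (B_{2T}^T + \sigma_t B_{2N}^T) B_1 B_1^T (B_{2T} + \sigma_t B_{2N})
\end{aligned}$$

where $B_{2T} \in \mathbb{R}^{n^2 \times n}$ is a matrix comprised of $n$ matrix blocks $B_{2T}^{(i)} \in \mathbb{R}^{n \times n}$ arranged vertically. Each $B_{2T}^{(i)}$ is given by $(B_{2T}^{(i)})_{:i} = -T_{i:}^T$ and the remaining entries are zero. Therefore $B_{2T}$ has the following structure.

$$B_{2T} = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-T_{12} & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-T_{13} & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-T_{14} & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & -T_{21} & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & -T_{23} & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & -T_{24} & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & -T_{n,n-3} \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & -T_{n,n-2} \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & -T_{n,n-1} \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0
\end{pmatrix}$$

Similarly $B_{2N} \in \mathbb{R}^{n^2 \times n}$ is a matrix comprised of $n$ matrix blocks $B_{2N}^{(i)} \in \mathbb{R}^{n \times n}$ arranged vertically. Each $B_{2N}^{(i)}$ is given by $(B_{2N}^{(i)})_{:i} = -N_{i:}^T$ and the remaining entries are zero. Therefore $B_{2N}$ has the following structure.

$$B_{2N} = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-N_{12} & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-N_{13} & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-N_{14} & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & -N_{21} & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & -N_{23} & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & -N_{24} & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & -N_{n,n-3} \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & -N_{n,n-2} \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & -N_{n,n-1} \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0
\end{pmatrix}$$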

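WLOG, we can assume that $\sum_{i=1}^n t_i^\natural = 0$ since $t^\natural$ is recovered only up to a global shift. On this basis and letting $H = B_{2N}^T B_1 B_1^T B_{2T}$ and $G = B_{2N}^T B_1 B_1^T B_{2N}$, in Section B of the Appendix, we show the following result.

Lemma 3.2.1. For $L_Q$ where $Q$ is as defined in Lemma 2.1.1, we have $L_Q = L^{(TG)} + L^{(TB)} + L^{(TN)}$ where

$$\begin{aligned}
L^{(TG)} &= n^2\, \mathrm{ddiag}(t^\natural t^{\natural T}) + 2n \|t^\natural\|^2 I - n\left(\mathbf{1}\, \mathrm{diag}(t^\natural t^{\natural T})^T - t^\natural t^{\natural T} + \mathrm{diag}(t^\natural t^{\natural T})\, \mathbf{1}^T\right) - \|t^\natural\|^2 \mathbf{1}\mathbf{1}^T \\
L^{(TB)} &= 2\sigma_t^2 (nI - \mathbf{1}\mathbf{1}^T) \\
L^{(TN)} &= \sigma_t D_{H+H^T} - \sigma_t (H + H^T) + \sigma_t^2 D_G - \sigma_t^2 G + L^{(TB)}
\end{aligned}$$

and

$$\begin{aligned}
D_{H+H^T} &= 2n D_1 + 2 D_2 \\
(D_1)_{ii} &= \sum_{k \ne i} T_{ik} N_{ik} \\
(D_1)_{ij} &= 0 \\
D_2 &= \sum_j \sum_{k < j} T_{kj} N_{kj}\, I \\
H + H^T &= n\, T \odot N + \begin{pmatrix} N_{(1:)} T_{(1:)}^T \\ \vdots \\ N_{(n:)} T_{(n:)}^T \end{pmatrix} \mathbf{1}^T + \mathbf{1} \begin{pmatrix} N_{(:1)}^T T_{(:1)}, & \cdots, & N_{(:n)}^T T_{(:n)} \end{pmatrix} \\
G &= N \odot (N \mathbf{1}\mathbf{1}^T + \mathbf{1}\mathbf{1}^T N) - N^2 \\
D_G &= D_N^2 + 2 D_{N^T N}
\end{aligned}$$

where $i \ne j$, $\odot$ denotes the entrywise product, and $T_{i:}$ and $T_{:i}$ are respectively the $i$-th row and column of the matrix $T$.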

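On the other hand, the orientation measurements $Y$ do not have a bias since the noise enters them only linearly.

$$\begin{aligned}
M^{(XN)} &= Y - \mathbb{E}Y = \sigma_x W \\
D^{(XN)} &= \sigma_x D_W \\
M^{(XG)} &= \mathbb{E}Y = \mathbf{1}\mathbf{1}^T \\
D^{(XG)} &= nI \\
L^{(XG)} &= nI - \mathbf{1}\mathbf{1}^T
\end{aligned}$$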

3.3 Exactness conditions

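For the purposes of the remainder of the thesis, we define the operator norm $\| \cdot \|$ of a square matrix $M$ by

$$\|M\| = \max_{\substack{\|x\| = 1 \\ x^T \mathbf{1} = 0}} x^T M x$$

and the minimum eigenvalue of $M$ corresponding to an eigenvector orthogonal to $\mathbf{1}$ by

$$\lambda_2(M) = \min_{\substack{\|v\| = 1 \\ v^T \mathbf{1} = 0}} v^T M v$$

Since $D_N^2$ is PSD, WLOG we can disregard this term, i.e., we assume that $D_G = 2 D_{N^T N}$, and in Section C of the Appendix, we show the following result.

Lemma 3.3.1. For $L^{(TG)}$, $L^{(TB)}$ and $L^{(TN)}$ as defined in Lemma 3.2.1 and $L^{(XG)}$, $D^{(XN)}$, $M^{(XN)}$ as defined in Section 3.2, we have

$$\begin{aligned}
\lambda_2(L^{(TG)}) &\ge n^2 t_{\min}^2 + 2n \|t^\natural\|^2 \\
\lambda_2(L^{(TB)}) &= 2\sigma_t^2 n \\
\lambda_2(L^{(XG)}) &= n \\
\|L^{(TN)}\| &\le \sigma_t \left[ 3nv\sqrt{(2 + \varepsilon) \log n} + \sqrt{2}\, \|T\|_F \sqrt{\varepsilon \log n} \right] + \sigma_t^2 \left[ 6n + 8n\sqrt{n} \right] \\
\|D^{(XN)}\| &\le \sigma_x \sqrt{(2 + \varepsilon)\, n \log n} \\
\|M^{(XN)}\| &\le 2 \sigma_x \sqrt{n}
\end{aligned}$$

with high probability as $n \to \infty$ where

$$v = \max(\|T_{1:}\|, \|T_{2:}\|, \dots, \|T_{n:}\|), \qquad t_{\min} = \min(|t_1^\natural|, \dots, |t_n^\natural|).$$

Therefore, $\lambda_2(L_Q) > 0$ if

$$\frac{1}{2n} \|L^{(TN)}\| + 2\|D^{(XN)}\| + 2\|M^{(XN)}\| \le \frac{1}{2n}\left(\lambda_2(L^{(TG)}) + \lambda_2(L^{(TB)})\right) + 2\lambda_2(L^{(XG)})$$

or alternatively

$$\sigma_t \left[ \frac{3}{2} v \sqrt{(2 + \varepsilon) \log n} + \frac{\|T\|_F}{\sqrt{2}\, n} \sqrt{\varepsilon \log n} \right] + \sigma_t^2 \left[ 4\sqrt{n} + 2 \right] + \sigma_x \left( 2\sqrt{(2 + \varepsilon)\, n \log n} + 4\sqrt{n} \right) \le \|t^\natural\|^2 + \frac{1}{2} t_{\min}^2 n + 2n$$

This implies the following main result of this work.

Theorem 3.3.2. For $L_Q$ where $Q$ is as defined in Lemma 2.1.1, we have $\lambda_2(L_Q) > 0$ with high probability as $n \to \infty$ if

$$\begin{aligned}
\sigma_x &\le \frac{\|t^\natural\|^2 + \frac{1}{2} t_{\min}^2 n + 2n}{2\sqrt{(2 + \varepsilon)\, n \log n}} \\
\sigma_t &\le \frac{\|t^\natural\|^2 + \frac{1}{2} t_{\min}^2 n + 2n}{\frac{3}{2} v \sqrt{(2 + \varepsilon) \log n} + \frac{\|T\|_F}{\sqrt{2}\, n} \sqrt{\varepsilon \log n}} \\
\sigma_t &\le \sqrt{\frac{\|t^\natural\|^2 + \frac{1}{2} t_{\min}^2 n + 2n}{4\sqrt{n}}}
\end{aligned}$$

where

$$t_{\min} = \min(|t_1^\natural|, \dots, |t_n^\natural|), \qquad v = \max(\|T_{1:}\|, \|T_{2:}\|, \dots, \|T_{n:}\|)$$

and $T_{i:}$ is the $i$-th row of the matrix $T$.

Note that letting $t_{\min}$, $v$, $\|T\|_F$, and $\|t^\natural\|^2$ go to zero confirms that our result for synchronization over E(1) generalizes the exact recovery conditions for synchronization over Z2.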
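The three conditions of Theorem 3.3.2 are simple to evaluate for a given translation vector. The following sketch is an illustration under stated assumptions: the translations are centered first (the WLOG assumption of Section 3.2), $\|T_{i:}\|$ is taken to be the Euclidean norm of row $i$ of $T$, the Frobenius term is read as $\|T\|_F / (\sqrt{2}\, n)$, and the function name `recovery_thresholds` is hypothetical.

```python
import math


def recovery_thresholds(t, eps):
    """Evaluate the right-hand sides of the three bounds in Theorem 3.3.2.

    Returns (sigma_x bound, linear sigma_t bound, quadratic sigma_t bound).
    Requires len(t) >= 2 so that log n > 0.
    """
    n = len(t)
    mean = sum(t) / n
    tc = [ti - mean for ti in t]                 # assumption: center t first
    norm_sq = sum(ti * ti for ti in tc)          # ||t||^2
    t_min = min(abs(ti) for ti in tc)
    # Row norms of T with T_ij = t_j - t_i.
    row_norms = [math.sqrt(sum((tj - ti) ** 2 for tj in tc)) for ti in tc]
    v = max(row_norms)
    frob = math.sqrt(sum(r * r for r in row_norms))
    log_n = math.log(n)

    signal = norm_sq + 0.5 * t_min ** 2 * n + 2 * n
    sigma_x_max = signal / (2 * math.sqrt((2 + eps) * n * log_n))
    linear = (1.5 * v * math.sqrt((2 + eps) * log_n)
              + frob / (math.sqrt(2) * n) * math.sqrt(eps * log_n))
    # When t is constant, T = 0 and the linear condition imposes no restriction.
    sigma_t_linear = signal / linear if linear > 0 else math.inf
    sigma_t_quadratic = math.sqrt(signal / (4 * math.sqrt(n)))
    return sigma_x_max, sigma_t_linear, sigma_t_quadratic
```

With all translations equal to zero, the first bound reduces to $2n / (2\sqrt{(2+\varepsilon)\, n \log n}) = \sqrt{n / ((2+\varepsilon) \log n)}$, which is the Z2 threshold quoted in Section 1.3.1.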

3.4 Exactness without pairwise orientation measurements

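Lastly we consider the case when the pairwise orientation measurements $x_i^\natural x_j^\natural$ represented by $Y$ are unavailable, i.e., only the measurements of the relative translations $x_i^\natural (t_j^\natural - t_i^\natural)$ are available. Therefore, we have $Q = \frac{1}{2n} V^T V$. The analysis in the preceding section implies that $\lambda_2(L_Q) > 0$ if

$$\frac{1}{2n} \|L^{(TN)}\| \le \frac{1}{2n}\left(\lambda_2(L^{(TG)}) + \lambda_2(L^{(TB)})\right)$$

or alternatively

$$\sigma_t \left[ \frac{3}{2} v \sqrt{(2 + \varepsilon) \log n} + \frac{\|T\|_F}{\sqrt{2}\, n} \sqrt{\varepsilon \log n} \right] + \sigma_t^2 \left[ 4\sqrt{n} + 2 \right] \le \|t^\natural\|^2 + \frac{1}{2} t_{\min}^2 n$$

This implies the following result.

Theorem 3.4.1. If $Q = \frac{1}{2n} V^T V$ where $V$ is as defined in Lemma 2.1.1, $\lambda_2(L_Q) > 0$ with high probability as $n \to \infty$ if

$$\begin{aligned}
\sigma_t &\le \frac{\|t^\natural\|^2 + \frac{1}{2} t_{\min}^2 n}{\frac{3}{2} v \sqrt{(2 + \varepsilon) \log n} + \frac{\|T\|_F}{\sqrt{2}\, n} \sqrt{\varepsilon \log n}} \\
\sigma_t &\le \sqrt{\frac{\|t^\natural\|^2 + \frac{1}{2} t_{\min}^2 n}{4\sqrt{n}}}
\end{aligned}$$

where

$$t_{\min} = \min(|t_1^\natural|, \dots, |t_n^\natural|), \qquad v = \max(\|T_{1:}\|, \|T_{2:}\|, \dots, \|T_{n:}\|)$$

and $T_{i:}$ is the $i$-th row of the matrix $T$.

The orientation ground truth $x^\natural$ can be recovered exactly even without the pairwise orientation measurements $x_i^\natural x_j^\natural$. However, in this context, for a given level of noise, more measurements are needed to achieve recovery than in the context of the regular SDP E(1) (where measurements of $(x_i^\natural (t_j^\natural - t_i^\natural),\; x_i^\natural x_j^\natural)$ are available). Compare the numerical simulations in this scenario in Figure 3.1 with those in Figure 1.1.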

2 10 100 600 0

1

10

32

2 10 100 600 0

1

10

32

Figure 3.1: This figure shows how frequently SDP E(1) exactly recovers the orientation X\ = x\x\T when therelative orientation measurements are not available. Here, the centered ground truth translation t\c has a fixed l2norm. For each (σt, n), 10 realization of the data were generated and the exactness of the recovered orientation wasverified. The frequency of success is represented in grayscale (white for 100% success and black for 0% success). Theresults agree with the analytic predictions (solid curve). The analytic predictions for the regula SDP E(1) and SDPZ2 are plotted as well. (The orientation noise σx was not used in the experiments. Nevertheless, it was set to beequal to the translation noise σt and plotted on the right vertical axis for reference in the SDP E(1) and SDP Z2analytical predictions.)


Chapter 4

Conclusion

We showed that the SDP relaxation for the E(1) synchronization problem exactly recovers the orientation ground truth and tightly recovers the translation MLE. As noted in [15–17], synchronization over SE(d) includes important problems in robotics and computer vision. We expect that establishing the tightness of synchronization over SE(d) should be similar to establishing that over E(d). Therefore, we hope that our result for E(1) will be extended to higher dimensions.


Bibliography

[1] E. Abbe, A. S. Bandeira, A. Bracher, and A. Singer. Decoding binary node labels from censored

edge measurements: Phase transition and efficient recovery. Transactions on Network Science and

Engineering, to appear. Available online at arXiv:1404.4749 [cs.IT], 2014.

[2] E. Abbe, A. S. Bandeira, and G. Hall. Exact recovery in the stochastic block model. IEEE Transactions

on Information Theory, 62(1):471–487, Jan 2016.

[3] P.-A. Absil, C.G. Baker, and K.A. Gallivan. Trust-region methods on Riemannian manifolds. Foundations of Computational Mathematics, 7(3):303–330, Jul 2007.

[4] N. Agarwal, A.S. Bandeira, K. Koiliaris, and A. Kolla. Multisection in the stochastic block model

using semidefinite programming. Compressed Sensing and its Applications: MATHEON Workshop

2015 (Applied and Numerical Harmonic Analysis), to appear., abs/1507.02323, 2015.

[5] D. Amelunxen, M. Lotz, M.B. McCoy, and J.A. Tropp. Living on the edge: Phase transitions in convex

programs with random data. Information and Inference, 3, 2014.

[6] A. S. Bandeira. Convex relaxations for certain inverse problems on graphs. PhD thesis, Program in

Applied and Computational Mathematics, Princeton University, 2015.

[7] A.S. Bandeira, N. Boumal, and A. Singer. Tightness of the maximum likelihood semidefinite relaxation

for angular synchronization. Mathematical Programming, pages 1–23, 2016.

[8] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, New York, NY,

USA, 2004.

[9] E. J. Candes, J. Romberg, and T. Tao. Robust uncertainty principles: exact signal reconstruction from

highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2):489–509,

Feb 2006.


[10] E.J. Candes and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717, Apr 2009.

[11] D. L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, April

2006.

[12] J. Gallier. The Schur complement and symmetric positive semidefinite (and definite) matrices. Available online at http://www.cis.upenn.edu/ jean/schur-comp.pdf, Dec 2010.

[13] R. A. Horn. Topics in Matrix Analysis. Cambridge University Press, New York, NY, USA, 1986.

[14] Y. Nesterov and A. Nemirovskii. Interior-Point Polynomial Algorithms in Convex Programming. Society

for Industrial and Applied Mathematics, 1994.

[15] D.M. Rosen. Certifiably Correct SLAM. PhD thesis, MIT, 2016.

[16] D.M. Rosen, L. Carlone, A.S. Bandeira, and J.J. Leonard. A certifiably correct algorithm for synchronization over the special Euclidean group. In Intl. Workshop on the Algorithmic Foundations of Robotics (WAFR), San Francisco, CA, December 2016.

[17] D.M. Rosen, L. Carlone, A.S. Bandeira, and J.J. Leonard. SE-Sync: A certifiably correct algorithm for synchronization over the special Euclidean group. Technical Report MIT-CSAIL-TR-2017-002, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, February 2017.

[18] H. J. Sommers, A. Crisanti, H. Sompolinsky, and Y. Stein. Spectrum of large random asymmetric

matrices. Phys. Rev. Lett., 60:1895–1898, May 1988.

[19] J.A. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computational

Mathematics, 12(4):389–434, Aug 2012.

[20] L. Vandenberghe and S. Boyd. Semidefinite programming. SIAM Review, 38(1):49–95, 1996.


Appendix A

Proof of Lemma 2.1.1

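We have the following $\mathbb{R}^{n\times n}$ matrices:

\[ L = B_1^T B_1 = 2\begin{bmatrix}
n-1 & -1 & -1 & \cdots & -1 \\
-1 & n-1 & -1 & \cdots & -1 \\
-1 & -1 & n-1 & \cdots & -1 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
-1 & -1 & -1 & \cdots & n-1
\end{bmatrix} \]

\[ \Sigma = B_2^T B_2 = \begin{bmatrix}
\sum_{j\ne 1} S_{1j}^2 & 0 & 0 & \cdots & 0 \\
0 & \sum_{j\ne 2} S_{2j}^2 & 0 & \cdots & 0 \\
0 & 0 & \sum_{j\ne 3} S_{3j}^2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \sum_{j\ne n} S_{nj}^2
\end{bmatrix} \]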

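We observe that (2.1) is equivalent to:

\[ \begin{aligned}
& \arg\min_{t\in\mathbb{R}^n,\ x\in\mathbb{Z}_2^n} \left\| \begin{bmatrix} B_1 & B_2 \end{bmatrix} \begin{bmatrix} t \\ x \end{bmatrix} \right\|_2^2 - 2x^T Y x \\
={}& \arg\min_{t\in\mathbb{R}^n,\ x\in\mathbb{Z}_2^n} \|B_1 t + B_2 x\|_2^2 - 2x^T Y x \\
={}& \arg\min_{t\in\mathbb{R}^n,\ x\in\mathbb{Z}_2^n} t^T L t + 2x^T V^T t + x^T \Sigma x - 2x^T Y x \\
={}& \arg\min_{t\in\mathbb{R}^n,\ x\in\mathbb{Z}_2^n} t^T L t + 2x^T V^T t - 2x^T Y x
\end{aligned} \]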

since $\Sigma$ is diagonal, we have $x^T \Sigma x = \operatorname{tr}\Sigma = \text{constant}$ for all $x \in \mathbb{Z}_2^n$. For a fixed $x$, the least squares solution to the minimization problem is given by $t^* = -B_1^\dagger B_2 x$ (up to a global shift since $\mathbf{1}$ is in the nullspace of $B_1^\dagger B_2$, a rank $n-1$ matrix).
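The closed-form minimizer can be checked numerically. This is a sketch under stated assumptions: `B1` is taken to be the incidence matrix of the complete directed graph (one row per ordered pair, which reproduces $L = B_1^T B_1 = 2(nI - \mathbf{1}\mathbf{1}^T)$), and `B2` is a random stand-in, since its actual construction is defined elsewhere in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
# Assumption: one row per ordered pair (i, j), i != j, with -1 in column i, +1 in column j.
pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
B1 = np.zeros((len(pairs), n))
for row, (i, j) in enumerate(pairs):
    B1[row, i], B1[row, j] = -1.0, 1.0
B2 = rng.standard_normal((len(pairs), n))    # hypothetical stand-in for B2
x = rng.choice([-1.0, 1.0], size=n)

t_star = -np.linalg.pinv(B1) @ B2 @ x        # closed-form least squares solution
t_lstsq = np.linalg.lstsq(B1, -B2 @ x, rcond=None)[0]  # minimum-norm solver result

grad = B1.T @ (B1 @ t_star + B2 @ x)         # normal-equation residual, should vanish
shifted = t_star + 3.0                       # global shift along the all-ones vector
```

Both routes return the same minimum-norm solution, and `shifted` attains the same objective value because `B1` annihilates the all-ones vector under the assumed construction.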

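\[ \begin{aligned}
& \arg\min_{t\in\mathbb{R}^n,\ x\in\mathbb{Z}_2^n} t^T L t + 2x^T V^T t \\
={}& \arg\min_{x\in\mathbb{Z}_2^n} x^T (B_1^\dagger B_2)^T B_1^T B_1 B_1^\dagger B_2 x - 2x^T (B_1^T B_2)^T B_1^\dagger B_2 x \\
={}& \arg\min_{x\in\mathbb{Z}_2^n} x^T B_2^T (B_1 B_1^\dagger)^T B_1 B_1^\dagger B_2 x - 2x^T B_2^T B_1 B_1^\dagger B_2 x \\
={}& \arg\min_{x\in\mathbb{Z}_2^n} x^T B_2^T \left((B_1 B_1^\dagger)^T B_1 B_1^\dagger - 2 B_1 B_1^\dagger\right) B_2 x \\
={}& \arg\min_{x\in\mathbb{Z}_2^n} -x^T B_2^T B_1 B_1^\dagger B_2 x
\end{aligned} \]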
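The last equality in the chain above uses only the standard Moore-Penrose properties of $B_1 B_1^\dagger$, which is an orthogonal projector. As a short worked step (the symbol $P$ is introduced here for illustration only):

```latex
\[
P := B_1 B_1^\dagger, \qquad P^T = P, \qquad P^2 = P
\quad\Longrightarrow\quad
P^T P - 2P = P - 2P = -P .
\]
```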

Thus, we can express the optimization problem in terms of $x$ only:¹

\[ \arg\min_{x\in\mathbb{Z}_2^n} -x^T B_2^T B_1 B_1^\dagger B_2 x - 2x^T Y x \]

which is equivalent to

\[ \arg\max_{X = xx^T,\ x\in\mathbb{Z}_2^n} \operatorname{Trace}(QX) \]

where $Q = B_2^T B_1 B_1^\dagger B_2 + 2Y$.

¹ The same result can be obtained from Appendix A.5.5 in [8] and Proposition 4.2 in [12]. Observe that $x^T V^T \mathbf{1} = 0$, which implies that $Vx \perp \ker L$ for all $x$. Therefore, $(I_{n\times n} - LL^\dagger)Vx = 0$, which implies that

\[ \min_{t\in\mathbb{R}^n} t^T L t + 2x^T V^T t = -x^T V^T L^\dagger V x = -x^T B_2^T B_1 B_1^\dagger B_2 x \]

\[ \arg\min_{t\in\mathbb{R}^n} t^T L t + 2x^T V^T t = -L^\dagger V x = -B_1^\dagger B_2 x \]

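Observe that

\[ \begin{aligned}
B_2^T B_1 B_1^\dagger B_2 &= B_2^T B_1 (B_1^T B_1)^\dagger B_1^T B_2 \\
&= V^T L^\dagger V \\
&= \frac{1}{2n} V^T \left(I - \frac{1}{n}\mathbf{1}\mathbf{1}^T\right) V \\
&= \frac{1}{2n} V^T V
\end{aligned} \]

since $\mathbf{1}^T V = 0$ (since $L$ is a Laplacian of a complete graph, $\ker B_1 = \ker L = \operatorname{span}(\mathbf{1})$). Therefore, $Q = \frac{1}{2n} V^T V + 2Y$.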
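The pseudoinverse identity used in this last step is easy to confirm numerically. The sketch below assumes only $L = 2(nI - \mathbf{1}\mathbf{1}^T)$ and uses a random matrix `V` with zero column sums as a hypothetical stand-in for the thesis's $V$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
ones = np.ones((n, 1))
L = 2.0 * (n * np.eye(n) - ones @ ones.T)    # L = 2(nI - 11^T)
L_pinv = np.linalg.pinv(L)

# Hypothetical V satisfying 1^T V = 0: subtract each column's mean.
V = rng.standard_normal((n, n))
V -= V.mean(axis=0, keepdims=True)

lhs = V.T @ L_pinv @ V                       # V^T L^+ V
rhs = V.T @ V / (2 * n)                      # (1/2n) V^T V
```

Here `L_pinv` agrees with $\frac{1}{2n}(I - \frac{1}{n}\mathbf{1}\mathbf{1}^T)$, and `lhs` matches `rhs` up to floating-point error.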


Appendix B


B.1 Decomposition of L(TG)

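We have

\[ M^{(TG)} = (\mathbb{E}V)^T\, \mathbb{E}V = B_{2T}^T B_1 B_1^T B_{2T} \]

\[ = \begin{bmatrix}
\sum_{j\ne 1} T_{1j} & -T_{12} & \cdots & -T_{1n} \\
-T_{21} & \sum_{j\ne 2} T_{2j} & \cdots & -T_{2n} \\
-T_{31} & -T_{32} & \cdots & -T_{3n} \\
\vdots & \vdots & \ddots & \vdots \\
-T_{n1} & -T_{n2} & \cdots & \sum_{j\ne n} T_{nj}
\end{bmatrix}
\begin{bmatrix}
\sum_{j\ne 1} T_{1j} & -T_{21} & \cdots & -T_{n1} \\
-T_{12} & \sum_{j\ne 2} T_{2j} & \cdots & -T_{n2} \\
-T_{13} & -T_{23} & \cdots & -T_{n3} \\
\vdots & \vdots & \ddots & \vdots \\
-T_{1n} & -T_{2n} & \cdots & \sum_{j\ne n} T_{nj}
\end{bmatrix} \]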

and therefore

\[ M^{(TG)}_{ii} = \Big(\sum_{k\ne i} T_{ik}\Big)^2 + \sum_{k\ne i} T_{ik}^2 \]

and since we can cancel the diagonal terms of $M^{(TG)}$ and $D^{(TG)}$, we can take

\[ M^{(TG)}_{ii} = \sum_{k\ne i} T_{ik}^2 \]

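Also

\[ \begin{aligned}
M^{(TG)}_{ij} &= T_{ij}\Big(\sum_{k\ne i} T_{ik} - \sum_{k\ne j} T_{jk}\Big) + \sum_{k\ne i,j} T_{ik}T_{jk} \\
&= T_{ij}\Big[\sum_{k\ne i} (t_k - t_i) - \sum_{k\ne j} (t_k - t_j)\Big] + \sum_{k\ne i,j} T_{ik}T_{jk} \\
&= T_{ij}\big[(t_j - t_i) - (t_i - t_j) + (n-2)(t_j - t_i)\big] + \sum_{k\ne i,j} T_{ik}T_{jk} \\
&= n T_{ij}^2 + \sum_{k\ne i,j} T_{ik}T_{jk}
\end{aligned} \]

where $i \ne j$. Therefore,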

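\[ \begin{aligned}
M^{(TG)}_{ij} &= \sum_k \big(T_{ik}T_{jk} + T_{ij}^2\big) \\
&= \sum_k \big[(t^\natural_k - t^\natural_i)(t^\natural_k - t^\natural_j) + (t^\natural_j - t^\natural_i)^2\big] \\
&= \sum_k \big[(t^\natural_k)^2 - t^\natural_k (t^\natural_i + t^\natural_j) + t^\natural_i t^\natural_j + (t^\natural_j - t^\natural_i)^2\big] \\
&= n\big((t^\natural_i)^2 + (t^\natural_j)^2 - t^\natural_i t^\natural_j\big) + \sum_k (t^\natural_k)^2
\end{aligned} \]

This implies that

\[ M^{(TG)} = n\big(\mathbf{1}\,\mathrm{ddiag}(t^\natural t^{\natural T}) - t^\natural t^{\natural T} + \mathrm{ddiag}(t^\natural t^{\natural T})\,\mathbf{1}^T\big) + \|t^\natural\|^2\, \mathbf{1}\mathbf{1}^T \]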

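Consequently,

\[ \begin{aligned}
D^{(TG)}_{ii} &= \sum_j \Big(n\big((t^\natural_i)^2 + (t^\natural_j)^2 - t^\natural_i t^\natural_j\big) + \sum_k (t^\natural_k)^2\Big) \\
&= n^2 (t^\natural_i)^2 + n\sum_j (t^\natural_j)^2 + n\sum_k (t^\natural_k)^2 \\
&= n^2 (t^\natural_i)^2 + 2n\sum_k (t^\natural_k)^2
\end{aligned} \]

This implies that

\[ D^{(TG)} = n^2\, \mathrm{ddiag}(t^\natural t^{\natural T}) + 2n\|t^\natural\|^2 I \]

Therefore,

\[ L^{(TG)} = n^2\, \mathrm{ddiag}(t^\natural t^{\natural T}) + 2n\|t^\natural\|^2 I - n\big(\mathbf{1}\,\mathrm{ddiag}(t^\natural t^{\natural T}) - t^\natural t^{\natural T} + \mathrm{ddiag}(t^\natural t^{\natural T})\,\mathbf{1}^T\big) - \|t^\natural\|^2\, \mathbf{1}\mathbf{1}^T \]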
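The closed form for the off-diagonal entries of $M^{(TG)}$ can be compared against the explicit matrix product. This is a sketch that assumes $T_{ij} = t_j - t_i$ and a centered translation vector ($\sum_k t_k = 0$); the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 7
t = rng.standard_normal(n)
t -= t.mean()                                # assumption: centered ground truth
T = t[None, :] - t[:, None]                  # assumption: T[i, j] = t_j - t_i

A = -T.copy()                                # off-diagonal entries are -T_ij
np.fill_diagonal(A, T.sum(axis=1))           # diagonal entries are sum_{j != i} T_ij
M_direct = A @ A.T                           # the product of the two displayed matrices

s = t**2
M_closed = n * (s[:, None] + s[None, :] - np.outer(t, t)) + (t @ t) * np.ones((n, n))

off = ~np.eye(n, dtype=bool)                 # mask selecting off-diagonal entries
```

Only the off-diagonal entries are compared, since the diagonal terms are the ones that cancel between $M^{(TG)}$ and $D^{(TG)}$.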

B.2 Decomposition of L(TB)

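\[ G = \begin{bmatrix}
\sum_{j\ne 1} N_{1j} & -N_{12} & \cdots & -N_{1n} \\
-N_{21} & \sum_{j\ne 2} N_{2j} & \cdots & -N_{2n} \\
-N_{31} & -N_{32} & \cdots & -N_{3n} \\
\vdots & \vdots & \ddots & \vdots \\
-N_{n1} & -N_{n2} & \cdots & \sum_{j\ne n} N_{nj}
\end{bmatrix}
\begin{bmatrix}
\sum_{j\ne 1} N_{1j} & -N_{21} & \cdots & -N_{n1} \\
-N_{12} & \sum_{j\ne 2} N_{2j} & \cdots & -N_{n2} \\
-N_{13} & -N_{23} & \cdots & -N_{n3} \\
\vdots & \vdots & \ddots & \vdots \\
-N_{1n} & -N_{2n} & \cdots & \sum_{j\ne n} N_{nj}
\end{bmatrix} \]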

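and therefore

\[ G_{ii} = \Big(\sum_{k\ne i} N_{ik}\Big)^2 + \sum_{k\ne i} N_{ik}^2 \]

\[ \begin{aligned}
G_{ij} &= N_{ij}\Big(\sum_{k\ne i} N_{ik} - \sum_{k\ne j} N_{jk}\Big) + \sum_{k\ne i,j} N_{ik}N_{jk} \\
&= 2N_{ij}^2 + N_{ij}\Big(\sum_{k\ne i,j} N_{ik} - \sum_{k\ne i,j} N_{jk}\Big) + \sum_{k\ne i,j} N_{ik}N_{jk}
\end{aligned} \]

\[ \mathbb{E}G_{ii} = 2(n-1), \qquad \mathbb{E}G_{ij} = 2 \]

\[ M^{(TB)}_{ii} = 2(n-1)\sigma_t^2, \qquad M^{(TB)}_{ij} = 2\sigma_t^2 \]

where $i \ne j$ or equivalently,

\[ M^{(TB)} = 2(n-2)\sigma_t^2 I + 2\sigma_t^2\, \mathbf{1}\mathbf{1}^T \]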

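Therefore,

\[ D^{(TB)}_{ii} = 4(n-1)\sigma_t^2, \qquad D^{(TB)}_{ij} = 0 \]

\[ L^{(TB)}_{ii} = 2(n-1)\sigma_t^2, \qquad L^{(TB)}_{ij} = -2\sigma_t^2 \]

where $i \ne j$, or equivalently

\[ L^{(TB)} = 2\sigma_t^2 (nI - \mathbf{1}\mathbf{1}^T) \]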
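The bias Laplacian and its spectrum follow directly from the entries above. The sketch below rebuilds $L^{(TB)} = D^{(TB)} - M^{(TB)}$ for illustrative values of `n` and `sigma_t` (both chosen arbitrarily) and reads off the eigenvalues.

```python
import numpy as np

n, sigma_t = 8, 0.5                          # illustrative values only
I, J = np.eye(n), np.ones((n, n))
M_TB = 2 * (n - 2) * sigma_t**2 * I + 2 * sigma_t**2 * J
D_TB = np.diag(M_TB.sum(axis=1))             # row sums of M_TB on the diagonal
L_TB = D_TB - M_TB
eigs = np.sort(np.linalg.eigvalsh(L_TB))     # ascending eigenvalues
```

The smallest eigenvalue is zero (eigenvector $\mathbf{1}$) and all others equal $2\sigma_t^2 n$, which is the value of $\lambda_2(L^{(TB)})$.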

B.3 Decomposition of L(TN)

B.3.1 Decomposition of H +HT

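Let $H = B_{2N}^T B_1 B_1^T B_{2T}$ and $H^T = B_{2T}^T B_1 B_1^T B_{2N}$. Then

\[ H = \begin{bmatrix}
\sum_{j\ne 1} N_{1j} & -N_{12} & \cdots & -N_{1n} \\
-N_{21} & \sum_{j\ne 2} N_{2j} & \cdots & -N_{2n} \\
-N_{31} & -N_{32} & \cdots & -N_{3n} \\
\vdots & \vdots & \ddots & \vdots \\
-N_{n1} & -N_{n2} & \cdots & \sum_{j\ne n} N_{nj}
\end{bmatrix}
\begin{bmatrix}
\sum_{j\ne 1} T_{1j} & -T_{21} & \cdots & -T_{n1} \\
-T_{12} & \sum_{j\ne 2} T_{2j} & \cdots & -T_{n2} \\
-T_{13} & -T_{23} & \cdots & -T_{n3} \\
\vdots & \vdots & \ddots & \vdots \\
-T_{1n} & -T_{2n} & \cdots & \sum_{j\ne n} T_{nj}
\end{bmatrix} \]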

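and therefore

\[ H_{ii} = \Big(\sum_{k\ne i} N_{ik}\Big)\Big(\sum_{k\ne i} T_{ik}\Big) + \sum_{k\ne i} N_{ik}T_{ik} \]

\[ H_{ij} = -N_{ij}\sum_{k\ne j} T_{jk} + T_{ij}\sum_{k\ne i} N_{ik} + \sum_{k\ne i,j} N_{ik}T_{jk} \]

where $i \ne j$. Similarly

\[ (H^T)_{ii} = H_{ii} \]

\[ (H^T)_{ij} = -T_{ij}\sum_{k\ne j} N_{jk} + N_{ij}\sum_{k\ne i} T_{ik} + \sum_{k\ne i,j} T_{ik}N_{jk} \]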


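\[ (H + H^T)_{ii} = 2\Big[\Big(\sum_{k\ne i} N_{ik}\Big)\Big(\sum_{k\ne i} T_{ik}\Big) + \sum_{k\ne i} N_{ik}T_{ik}\Big] \]

\[ \begin{aligned}
(H + H^T)_{ij} &= N_{ij}\Big(-\sum_{k\ne j} T_{jk} + \sum_{k\ne i} T_{ik}\Big) + T_{ij}\Big(\sum_{k\ne i} N_{ik} - \sum_{k\ne j} N_{jk}\Big) + \sum_{k\ne i,j} (N_{ik}T_{jk} + T_{ik}N_{jk}) \\
&= N_{ij}\Big(-\sum_{k\ne j} (t_k - t_j) + \sum_{k\ne i} (t_k - t_i)\Big) + T_{ij}\Big(\sum_{k\ne i} N_{ik} - \sum_{k\ne j} N_{jk}\Big) + \sum_{k\ne i,j} (N_{ik}T_{jk} + T_{ik}N_{jk}) \\
&= N_{ij}\Big(-(t_i - t_j) + (t_j - t_i) + \sum_{k\ne i,j} (t_j - t_i)\Big) + T_{ij}\Big(\sum_{k\ne i} N_{ik} - \sum_{k\ne j} N_{jk}\Big) + \sum_{k\ne i,j} (N_{ik}T_{jk} + T_{ik}N_{jk}) \\
&= n T_{ij}N_{ij} + T_{ij}\Big(\sum_{k\ne i} N_{ik} - \sum_{k\ne j} N_{jk}\Big) + \sum_{k\ne i,j} (N_{ik}T_{jk} + T_{ik}N_{jk}) \\
&= n T_{ij}N_{ij} + \sum_k (N_{ik}T_{ik} + N_{kj}T_{kj})
\end{aligned} \]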

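Note that in the foregoing calculation we used the following result

\[ \begin{aligned}
& T_{ij}\Big(\sum_{k\ne i} N_{ik} - \sum_{k\ne j} N_{jk}\Big) + \sum_{k\ne i,j} (N_{ik}T_{jk} + T_{ik}N_{jk}) \\
={}& 2T_{ij}N_{ij} + T_{ij}\sum_{k\ne i,j} (N_{ik} - N_{jk}) + \sum_{k\ne i,j} (N_{ik}T_{jk} + T_{ik}N_{jk}) \\
={}& 2T_{ij}N_{ij} + \sum_{k\ne i,j} \big[N_{ik}(T_{ij} + T_{jk}) + N_{jk}(-T_{ij} + T_{ik})\big] \\
={}& \sum_k (N_{ik}T_{ik} + N_{jk}T_{jk})
\end{aligned} \]

where $i \ne j$. Accordingly,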

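\[ \begin{aligned}
(D_{H+H^T})_{ii} &= 2\Big[\Big(\sum_{k\ne i} N_{ik}\Big)\Big(\sum_{k\ne i} T_{ik}\Big) + \sum_{k\ne i} N_{ik}T_{ik}\Big] + \sum_{j\ne i}\Big[n T_{ij}N_{ij} + \sum_k (N_{ik}T_{ik} + N_{kj}T_{kj})\Big] \\
&= 2\Big[\Big(\sum_{k\ne i} N_{ik}\Big)\Big(\sum_{k\ne i} T_{ik}\Big) + \sum_{k\ne i} N_{ik}T_{ik}\Big] + 2n\sum_{k\ne i} T_{ik}N_{ik} + \sum_j \sum_{k\ne j} T_{kj}N_{kj}
\end{aligned} \]

\[ (D_{H+H^T})_{ij} = 0 \]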

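Since we can cancel the terms on the main diagonal of $H + H^T$ and $D_{H+H^T}$, we have

\[ (H + H^T)_{ii} = 2\sum_{k\ne i} N_{ik}T_{ik} \]

\[ (H + H^T)_{ij} = n T_{ij}N_{ij} + \sum_k (N_{ik}T_{ik} + N_{kj}T_{kj}) \]

or equivalently

\[ H + H^T = n\, T \circ N + \begin{bmatrix} N_{(1:)}^T T_{(1:)} \\ \vdots \\ N_{(n:)}^T T_{(n:)} \end{bmatrix} \mathbf{1}^T + \mathbf{1} \begin{bmatrix} N_{(:1)}^T T_{(:1)}, & \cdots, & N_{(:n)}^T T_{(:n)} \end{bmatrix} \]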
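The auxiliary identity used in the calculation of $(H + H^T)_{ij}$ relies only on $N$ being skew-symmetric and on $T_{ij} = t_j - t_i$. A brute-force numerical check, as a sketch with illustrative names (`lhs`, `rhs`, `max_gap` are not from the thesis):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
t = rng.standard_normal(n)
T = t[None, :] - t[:, None]                  # assumption: T[i, j] = t_j - t_i
U = rng.standard_normal((n, n))
N = np.triu(U, 1) - np.triu(U, 1).T          # random skew-symmetric matrix

def lhs(i, j):
    """T_ij (sum_{k!=i} N_ik - sum_{k!=j} N_jk) + sum_{k!=i,j} (N_ik T_jk + T_ik N_jk)."""
    rest = [k for k in range(n) if k not in (i, j)]
    row_i = sum(N[i, k] for k in range(n) if k != i)
    row_j = sum(N[j, k] for k in range(n) if k != j)
    cross = sum(N[i, k] * T[j, k] + T[i, k] * N[j, k] for k in rest)
    return T[i, j] * (row_i - row_j) + cross

def rhs(i, j):
    """sum_k (N_ik T_ik + N_jk T_jk)."""
    return sum(N[i, k] * T[i, k] + N[j, k] * T[j, k] for k in range(n))

max_gap = max(abs(lhs(i, j) - rhs(i, j))
              for i in range(n) for j in range(n) if i != j)
```

The gap is at the level of floating-point round-off for every pair $i \ne j$; no centering of `t` is needed for this identity.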

B.3.2 Decomposition of DH+HT

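\[ \begin{aligned}
(D_{H+H^T})_{ii} &= 2n\sum_{k\ne i} T_{ik}N_{ik} + \sum_j \sum_{k\ne j} T_{kj}N_{kj} \\
&= 2n\sum_{k\ne i} T_{ik}N_{ik} + 2\sum_j \sum_{k<j} T_{kj}N_{kj}
\end{aligned} \]

\[ (D_{H+H^T})_{ij} = 0 \]

Therefore $D_{H+H^T} = 2nD_1 + 2D_2$ where $D_1$ is given by

\[ (D_1)_{ii} = \sum_{k\ne i} T_{ik}N_{ik}, \qquad (D_1)_{ij} = 0 \]

and

\[ D_2 = \sum_j \sum_{k<j} T_{kj}N_{kj}\, I \]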

B.3.3 Decomposition of G and DG

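Also, as we saw previously

\[ G_{ii} = \Big(\sum_{k\ne i} N_{ik}\Big)^2 + \sum_{k\ne i} N_{ik}^2 \]

\[ G_{ij} = 2N_{ij}^2 + N_{ij}\Big(\sum_{k\ne i,j} N_{ik} - \sum_{k\ne i,j} N_{jk}\Big) + \sum_{k\ne i,j} N_{ik}N_{jk} \]

where $i \ne j$.

Therefore $D_G$ is given by

\[ \begin{aligned}
(D_G)_{ii} &= \Big(\sum_{k\ne i} N_{ik}\Big)^2 + \sum_{k\ne i} N_{ik}^2 + \sum_{j\ne i}\Big[2N_{ij}^2 + N_{ij}\sum_{k\ne i,j} (N_{ik} - 2N_{jk})\Big] \\
&= \Big(\sum_{k\ne i} N_{ik}\Big)^2 + \sum_{j\ne i}\Big(3N_{ij}^2 + N_{ij}\sum_{k\ne i,j} (N_{ik} - 2N_{jk})\Big)
\end{aligned} \]

\[ (D_G)_{ij} = 0 \]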

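Since we can cancel the terms on the main diagonal of $G$ and $D_G$, we have

\[ G_{ii} = \sum_{k\ne i} N_{ik}^2 \]

\[ G_{ij} = 2N_{ij}^2 + N_{ij}\Big(\sum_{k\ne i,j} N_{ik} - \sum_{k\ne i,j} N_{jk}\Big) + \sum_{k\ne i,j} N_{ik}N_{jk} \]

\[ (D_G)_{ii} = \sum_{j\ne i}\Big(3N_{ij}^2 + N_{ij}\sum_{k\ne i,j} (N_{ik} - 2N_{jk})\Big) \]

\[ (D_G)_{ij} = 0 \]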

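The foregoing implies that

\[ G_{ij} = N_{ij}\Big(\sum_k N_{ik} + \sum_k N_{kj}\Big) - \sum_{k\ne i,j} N_{ik}N_{kj} \]

\[ G = N \circ (N\mathbf{1}\mathbf{1}^T + \mathbf{1}\mathbf{1}^T N) - N^2 \]

where $\circ$ denotes the componentwise (i.e. Schur or Hadamard) product of matrices, and

\[ (D_G)_{ii} = \sum_j \Big[N_{ij}\sum_k (N_{ik} - 2N_{jk})\Big] \]

\[ (D_G)_{ij} = 0 \]

or alternatively

\[ D_G = D_N^2 + 2D_{N^T N} \]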

Appendix C


C.1 Minimum eigenvalue of L(TG)

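We previously saw that

\[ L^{(TG)} = n^2\, \mathrm{ddiag}(t^\natural t^{\natural T}) + 2n\|t^\natural\|^2 I - n\big(\mathbf{1}\,\mathrm{ddiag}(t^\natural t^{\natural T}) - t^\natural t^{\natural T} + \mathrm{ddiag}(t^\natural t^{\natural T})\,\mathbf{1}^T\big) - \|t^\natural\|^2\, \mathbf{1}\mathbf{1}^T \]

Therefore,

\[ \lambda_2(L^{(TG)}) \ge n^2 t_{\min}^2 + 2n\|t^\natural\|^2 \]

where $t_{\min} = \min(|t^\natural_1|, \dots, |t^\natural_n|)$.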
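The eigenvalue bound can be checked on a random instance. This sketch makes two assumptions about notation: the translations are centered, and the mixed terms are read entrywise as $(\mathbf{1}\,\mathrm{ddiag})_{ij} = (t_j)^2$ and $(\mathrm{ddiag}\,\mathbf{1}^T)_{ij} = (t_i)^2$, while the standalone $\mathrm{ddiag}$ term is the diagonal matrix of squares.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 9
t = rng.standard_normal(n)
t -= t.mean()                                # assumption: centered translations
s, ones = t**2, np.ones(n)
norm2 = t @ t                                # squared Euclidean norm of t

L_TG = (n**2 * np.diag(s) + 2 * n * norm2 * np.eye(n)
        - n * (np.outer(ones, s) - np.outer(t, t) + np.outer(s, ones))
        - norm2 * np.outer(ones, ones))

eigs = np.sort(np.linalg.eigvalsh(L_TG))     # ascending eigenvalues
bound = n**2 * np.min(np.abs(t))**2 + 2 * n * norm2
```

Under these assumptions the all-ones vector lies in the kernel of `L_TG`, and on its orthogonal complement the quadratic form equals $n^2\sum_i t_i^2 x_i^2 + 2n\|t\|^2\|x\|^2 + n(t^T x)^2$, so the second-smallest eigenvalue is at least `bound`.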

C.2 Spectral radius of L(TN)

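Based on the results below, we have

\[ \begin{aligned}
\|L^{(TN)}\| &\le \sigma_t \|D_{H+H^T}\| + \sigma_t \|H + H^T\| + \sigma_t^2 \|D_G\| + \sigma_t^2 \|G\| + 2\sigma_t^2 n \\
&\le \sigma_t\Big[3nv\sqrt{(2+\varepsilon)\log n} + \|T\|_F\sqrt{2\varepsilon\log n}\Big] + \sigma_t^2\big[8n\sqrt{n} + 6n\big]
\end{aligned} \]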

C.2.1 Spectral radius of H +HT

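We previously showed that

\[ H + H^T = n\, T \circ N + \begin{bmatrix} N_{(1:)}^T T_{(1:)} \\ \vdots \\ N_{(n:)}^T T_{(n:)} \end{bmatrix} \mathbf{1}^T + \mathbf{1} \begin{bmatrix} N_{(:1)}^T T_{(:1)}, & \cdots, & N_{(:n)}^T T_{(:n)} \end{bmatrix} \]

To determine the spectral radius of $H + H^T$ in the eigenspace orthogonal to $\mathbf{1}$, it suffices to consider the spectral norm of $E = n\, T \circ N$. We use the following lemma.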

Lemma C.2.1. (Following Section 4.3 in [19]) Let $T$ be a deterministic $n \times n$ skew-symmetric matrix and $N$ be a random $n \times n$ skew-symmetric matrix with independent standard normal (Gaussian) entries. Construct the random matrix $T \circ N$ and observe that its $(i, j)$ component is a Gaussian variable with zero mean and variance $|T_{ij}|^2$. We have

\[ P(\|T \circ N\| \ge a) \le 2n\, e^{-a^2/2v^2} \]

where $v^2 = \max(\|T_{1:}\|^2, \|T_{2:}\|^2, \dots, \|T_{n:}\|^2)$ and $T_{i:}$ is the $i$-th row of the matrix $T$.

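Proof. (of Lemma C.2.1) We decompose the matrix of interest as a Gaussian series:

\[ T \circ N = \sum_{i<j} N_{ij}T_{ij}E_{ij} \]

where $E_{ij}$ is an $n \times n$ matrix having 1 as the $ij$-th element, $-1$ as the $ji$-th element, and zeros elsewhere.

To determine the variance parameter, we compute

\[ \begin{aligned}
\sum_{i<j} T_{ij}E_{ij}(T_{ij}E_{ij})^T &= \sum_{i<j} T_{ij}^2 I_{ij} = \mathrm{diag}(\|T_{1:}\|^2, \|T_{2:}\|^2, \dots, \|T_{n:}\|^2) \\
&= \mathrm{diag}(\|T_{:1}\|^2, \|T_{:2}\|^2, \dots, \|T_{:n}\|^2) = \sum_{i<j} (T_{ij}E_{ij})^T T_{ij}E_{ij}
\end{aligned} \]

Therefore,

\[ v^2 = \big\|\mathrm{diag}(\|T_{1:}\|^2, \|T_{2:}\|^2, \dots, \|T_{n:}\|^2)\big\| = \max(\|T_{1:}\|^2, \|T_{2:}\|^2, \dots, \|T_{n:}\|^2) \]

and the final result follows by Corollary 4.2 in [19].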

Therefore, for $\varepsilon > 0$, we have $P(\|T \circ N\| \ge a) \le n^{-\varepsilon}$ if

\[ a \ge v\sqrt{(2+\varepsilon)\log n} \]

i.e.

\[ \|H + H^T\| < nv\sqrt{(2+\varepsilon)\log n} \]
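The threshold $v\sqrt{(2+\varepsilon)\log n}$ can be compared with simulated values of $\|T \circ N\|$. This is a Monte Carlo sketch, not a proof: the size `n`, the value of `eps`, the number of trials, and the choice $T_{ij} = t_j - t_i$ with random `t` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n, eps, trials = 50, 2.0, 200                # illustrative parameters
t = rng.standard_normal(n)
T = t[None, :] - t[:, None]                  # deterministic skew-symmetric T
v = np.linalg.norm(T, axis=1).max()          # v = max_i ||T_{i:}||
a = v * np.sqrt((2 + eps) * np.log(n))       # threshold from the text

exceed = 0
for _ in range(trials):
    U = rng.standard_normal((n, n))
    N = np.triu(U, 1) - np.triu(U, 1).T      # skew-symmetric Gaussian noise
    if np.linalg.norm(T * N, 2) >= a:        # spectral norm of the Hadamard product
        exceed += 1
frac = exceed / trials                       # empirical exceedance frequency
```

In runs of this kind the exceedance frequency `frac` is expected to be at or near zero, in line with the tail bound of the lemma.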

C.2.2 Spectral radius of DH+HT

We previously showed that $D_{H+H^T} = 2nD_1 + 2D_2$ where $D_1$ is given by

\[ (D_1)_{ii} = \sum_{k\ne i} T_{ik}N_{ik}, \qquad (D_1)_{ij} = 0 \]

and

\[ D_2 = \sum_j \sum_{k<j} T_{kj}N_{kj}\, I \]

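By the union bound and the upper deviation inequality,

\[ \begin{aligned}
P(\|D_1\| > a) &= nP\Big(\Big|\sum_{k\ne i} T_{ik}N_{ik}\Big| > a\Big) \\
&= nP\Big(N_4 > \frac{a}{v}\Big) \\
&= n\, e^{-a^2/(2v^2)}
\end{aligned} \]

where $N_4 \sim N(0, 1)$ and again

\[ v = \max(\|T_{1:}\|, \|T_{2:}\|, \dots, \|T_{n:}\|) \]

Thus, if $a \ge v\sqrt{(2+\varepsilon)\log n}$, for $\varepsilon > 0$ we have

\[ P(\|D_1\| > a) = n^{-\varepsilon/2} \]

Therefore,

\[ \|D_1\| \le v\sqrt{(2+\varepsilon)\log n} \]

with high probability as $n \to \infty$.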

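Similarly,

\[ \begin{aligned}
P(\|D_2\| > a) &= P\Big(\Big|\sum_j \sum_{k<j} T_{kj}N_{kj}\Big| > a\Big) \\
&= P\Big(N_5 > \frac{a}{\|T\|_F/\sqrt{2}}\Big) \\
&= e^{-a^2/V^2}
\end{aligned} \]

where $N_5 \sim N(0, 1)$. Thus, if $a \ge \frac{\|T\|_F}{\sqrt{2}}\sqrt{\varepsilon\log n}$, for $\varepsilon > 0$ we have

\[ P(\|D_2\| > a) = n^{-\varepsilon} \]

Therefore,

\[ \|D_2\| \le \frac{\|T\|_F}{\sqrt{2}}\sqrt{\varepsilon\log n} \]

with high probability as $n \to \infty$. Combining the foregoing results, we get

\[ \|D_{H+H^T}\| \le 2nv\sqrt{(2+\varepsilon)\log n} + \|T\|_F\sqrt{2\varepsilon\log n} \]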

C.2.3 Spectral radius of G

We previously saw that

\[ G = N \circ (N\mathbf{1}\mathbf{1}^T + \mathbf{1}\mathbf{1}^T N) - N^2 \]

where $\circ$ denotes the componentwise (i.e. Schur or Hadamard) product of matrices. Consequently, since the Hadamard product is submultiplicative with respect to the spectral norm [13,18], we have

\[ \|G\| \le \|N\|\,\|N\mathbf{1}\mathbf{1}^T + \mathbf{1}\mathbf{1}^T N\| + \|N^2\| \le 4n \]

C.2.4 Spectral radius of DG

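As we noted previously, we can assume that

\[ D_G = 2D_{N^T N} \]

By the semi-circle law,

\[ \|N^T N \mathbf{1}\| = \sqrt{\sum_i (D_G)_{ii}^2} \le 4n\sqrt{n} \]

with high probability as $n \to \infty$. This implies that

\[ P(\|D_G\| > 8n\sqrt{n}) \le P\Big(\bigcup_i \big\{|(D_G)_{ii}| > 8n\sqrt{n}\big\}\Big) = 0 \]

with high probability as $n \to \infty$, i.e.

\[ \|D_G\| \le 8n\sqrt{n} \]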

C.3 Spectral radius of D(XN) and M (XN)

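The spectral radius of $M^{(XN)}$ is bounded by $2\sigma_x\sqrt{n}$ for $n \to \infty$ in accordance with the semi-circular law. Therefore, we just need to bound the diagonal entries of $D_W$ given by $\sum_{j=1}^n W_{ij}$.

By the union bound and the upper deviation inequality,

\[ \begin{aligned}
P(\|D_W\| > a) &= nP\Big(\Big|\sum_{j=1}^n W_{ij}\Big| > a\Big) \\
&= nP\Big(N_3 > \frac{a}{\sqrt{n}}\Big) \\
&= n\, e^{-a^2/(2n)}
\end{aligned} \]

where $N_3 \sim N(0, 1)$. Thus, if $a \ge \sqrt{(2+\varepsilon)\, n\log n}$, for $\varepsilon > 0$ we have

\[ \begin{aligned}
P(\|D_W\| > a) &= n\, e^{-(1+\varepsilon/2)\log n} \\
&= n^{-\varepsilon/2}
\end{aligned} \]

Therefore,

\[ \|D^{(XN)}\| \le \sigma_x\sqrt{(2+\varepsilon)\, n\log n} \]

with high probability as $n \to \infty$.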
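The arithmetic behind the choice of threshold can be verified directly: substituting $a = \sqrt{(2+\varepsilon)\,n\log n}$ into $n\,e^{-a^2/(2n)}$ gives $n^{-\varepsilon/2}$. A small sketch (the function name `union_tail` is hypothetical):

```python
import math

def union_tail(n, eps):
    """Evaluate n * exp(-a^2 / (2n)) at a = sqrt((2 + eps) * n * log n)."""
    a = math.sqrt((2 + eps) * n * math.log(n))
    return n * math.exp(-a**2 / (2 * n))
```

For any `n > 1` and `eps > 0` the returned value agrees with `n ** (-eps / 2)` up to floating-point error, so the tail probability vanishes as `n` grows.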
