Stein's Method for Matrix Concentration
Lester Mackey†
Collaborators: Michael I. Jordan‡, Richard Y. Chen*, Brendan Farrell*, and Joel A. Tropp*
†Stanford University  ‡University of California, Berkeley  *California Institute of Technology
December 10, 2012
Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 1 / 35
Motivation
Concentration Inequalities
Matrix concentration
P{‖X − EX‖ ≥ t} ≤ δ
P{λmax(X − EX) ≥ t} ≤ δ
Non-asymptotic control of random matrices with complex distributions
Applications
Matrix completion from sparse random measurements (Gross 2011; Recht 2011; Negahban and Wainwright 2010; Mackey, Talwalkar, and Jordan 2011)
Randomized matrix multiplication and factorization (Drineas, Mahoney, and Muthukrishnan 2008; Hsu, Kakade, and Zhang 2011b)
Convex relaxation of robust or chance-constrained optimization (Nemirovski 2007; So 2011; Cheung, So, and Wang 2011)
Random graph analysis (Christofides and Markstrom 2008; Oliveira 2009)
Motivation: Matrix Completion
Goal: Recover a matrix L0 ∈ R^{m×n} from a subset of its entries
[figure: partially observed ratings matrix → completed ratings matrix]
Examples
Collaborative filtering: How will user i rate movie j?
Ranking on the web: Is URL j relevant to user i?
Link prediction: Is user i friends with user j?
Motivation: Matrix Completion
Goal: Recover a matrix L0 ∈ R^{m×n} from a subset of its entries
[figure: partially observed ratings matrix → completed ratings matrix]
Bad News: Impossible to recover a generic matrix. Too many degrees of freedom, too few observations.
Good News:
Small number of latent factors determine preferences. Movie ratings cluster by genre and director.
L0 = A B⊤ (low-rank factorization)
These low-rank matrices are easier to complete
How to Complete a Low-rank Matrix
Suppose Ω is the set of observed entry locations.
First attempt:
  minimize_A  rank A
  subject to  A_ij = (L0)_ij, (i, j) ∈ Ω
Problem: NP-hard ⇒ computationally intractable
Solution: Solve a convex relaxation (*):
  minimize_A  ‖A‖*
  subject to  A_ij = (L0)_ij, (i, j) ∈ Ω
where ‖A‖* = Σk σk(A) is the trace/nuclear norm of A.
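The nuclear norm in the relaxation is just the sum of singular values, so it is cheap to evaluate. A minimal numpy sketch (illustrative only; the matrix sizes and the latent factors A, B are made up here) of the norm and of the low-rank structure the relaxation exploits:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a rank-2 matrix L0 = A B^T, mimicking the latent-factor model.
m, n, r = 6, 5, 2
A, B = rng.standard_normal((m, r)), rng.standard_normal((n, r))
L0 = A @ B.T

sigma = np.linalg.svd(L0, compute_uv=False)
nuclear_norm = sigma.sum()        # ||L0||_* = sum_k sigma_k(L0)
rank = int(np.sum(sigma > 1e-10)) # numerical rank: only r singular values survive

print(rank)  # 2
```

In practice the constrained nuclear-norm program itself would be handed to a convex solver; this snippet only illustrates the objective.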
Can Convex Optimization Recover L0?
Yes, with high probability.
Theorem (Recht 2011)
If L0 ∈ R^{m×n} has rank r and s ≳ βrn log²(n) entries are observed uniformly at random, then (under some technical conditions) convex optimization recovers L0 exactly with probability at least 1 − n^{−β}.
See also Gross (2011); Mackey, Talwalkar, and Jordan (2011)
Past results (Candes and Recht 2009; Candes and Tao 2009) required stronger assumptions and more intensive analysis
Streamlined approach rests on a matrix variant of a classical Bernstein inequality (1946)
Scalar Bernstein Inequality
Theorem (Bernstein 1946)
Let (Yk)k≥1 be independent random variables in R satisfying
EYk = 0 and |Yk| ≤ R for each index k.
Define the variance parameter
σ² = Σk EYk².
Then, for all t ≥ 0,
P{ |Σk Yk| ≥ t } ≤ 2 · exp( −t² / (2σ² + 2Rt/3) )
Gaussian decay controlled by variance when t is small
Exponential decay controlled by uniform bound for large t
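A quick Monte Carlo sanity check of the scalar bound (an illustrative sketch, not from the slides; the Uniform[−R, R] summands and all sizes are assumptions chosen so that EYk = 0 and |Yk| ≤ R hold):

```python
import numpy as np

# Empirically compare P{|sum_k Y_k| >= t} against the Bernstein bound
#   2 * exp(-t^2 / (2 sigma^2 + 2Rt/3)).
rng = np.random.default_rng(1)
n_vars, n_trials, R = 50, 20000, 1.0

Y = rng.uniform(-R, R, size=(n_trials, n_vars))  # EY_k = 0, |Y_k| <= R
sums = Y.sum(axis=1)
sigma2 = n_vars * R**2 / 3                       # sum_k EY_k^2 for Uniform[-R, R]

t = 10.0
empirical = np.mean(np.abs(sums) >= t)
bound = 2 * np.exp(-t**2 / (2 * sigma2 + 2 * R * t / 3))
print(empirical, bound)  # the bound dominates the empirical tail
```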
Matrix Bernstein Inequality
Theorem (Mackey, Jordan, Chen, Farrell, and Tropp 2012)
Let (Yk)k≥1 be independent matrices in R^{m×n} satisfying
EYk = 0 and ‖Yk‖ ≤ R for each index k.
Define the variance parameter
σ² = max( ‖Σk E[Yk Yk⊤]‖, ‖Σk E[Yk⊤ Yk]‖ ).
Then, for all t ≥ 0,
P{ ‖Σk Yk‖ ≥ t } ≤ (m + n) · exp( −t² / (3σ² + 2Rt) )
See also Tropp (2011); Oliveira (2009); Recht (2011)
Gaussian tail when t is small; exponential tail for large t
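A small numerical sketch of how the theorem's quantities are computed (not from the slides; the random sign-flip ensemble Yk = εk·Bk with fixed Bk is an assumption chosen so that EYk = 0):

```python
import numpy as np

# Compute R, sigma^2, and the matrix Bernstein bound
#   (m+n) * exp(-t^2 / (3 sigma^2 + 2Rt))
# for Y_k = eps_k * B_k with Rademacher signs eps_k.
rng = np.random.default_rng(2)
m, n, K = 4, 3, 30
base = rng.standard_normal((K, m, n)) / np.sqrt(m * n)
R = max(np.linalg.norm(B, 2) for B in base)    # ||Y_k|| <= R

# E[Y_k Y_k^T] = B_k B_k^T and E[Y_k^T Y_k] = B_k^T B_k since eps_k^2 = 1.
row_var = sum(B @ B.T for B in base)
col_var = sum(B.T @ B for B in base)
sigma2 = max(np.linalg.norm(row_var, 2), np.linalg.norm(col_var, 2))

t = 3.0
bound = (m + n) * np.exp(-t**2 / (3 * sigma2 + 2 * R * t))

# One realization of the random sum, for comparison with the bound.
eps = rng.choice([-1.0, 1.0], size=K)
S = np.einsum('k,kij->ij', eps, base)
print(np.linalg.norm(S, 2), bound)
```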
Matrix Bernstein Inequality
Theorem (Mackey, Jordan, Chen, Farrell, and Tropp 2012)
For all t ≥ 0,
P{ ‖Σk Yk‖ ≥ t } ≤ (m + n) · exp( −t² / (3σ² + 2Rt) )
Consequences for matrix completion:
Recht (2011) showed that uniform sampling of entries captures most of the information in incoherent low-rank matrices
Negahban and Wainwright (2010) showed that i.i.d. sampling of entries captures most of the information in non-spiky (near) low-rank matrices
Foygel and Srebro (2011) characterized the generalization error of convex MC through Rademacher complexity
Motivation: Matrix Concentration
Concentration Inequalities
Matrix concentration
P{λmax(X − EX) ≥ t} ≤ δ
Difficulty: Matrix multiplication is not commutative
⇒ e^{X+Y} ≠ e^X e^Y
Past approaches (Ahlswede and Winter 2002; Oliveira 2009; Tropp 2011):
Rely on deep results from matrix analysis
Apply to sums of independent matrices and matrix martingales
This work:
Stein's method of exchangeable pairs (1972), as advanced by Chatterjee (2007) for scalar concentration
⇒ Improved exponential tail inequalities (Hoeffding, Bernstein)
⇒ Polynomial moment inequalities (Khintchine, Rosenthal)
⇒ Dependent sums and more general matrix functionals
Roadmap
1 Motivation
2 Stein's Method Background and Notation
3 Exponential Tail Inequalities
4 Polynomial Moment Inequalities
5 Dependent Sequences
6 Extensions
Background
Notation
Hermitian matrices: H^d = {A ∈ C^{d×d} : A = A*}. All matrices in this talk are Hermitian.
Maximum eigenvalue: λmax(·)
Trace: tr B, the sum of the diagonal entries of B
Spectral norm: ‖B‖, the maximum singular value of B
Matrix Stein Pair
Definition (Exchangeable Pair)
(Z, Z′) is an exchangeable pair if (Z, Z′) =d (Z′, Z)
Definition (Matrix Stein Pair)
Let (Z, Z′) be an exchangeable pair, and let Ψ : Z → H^d be a measurable function. Define the random matrices
X = Ψ(Z) and X′ = Ψ(Z′).
(X, X′) is a matrix Stein pair with scale factor α ∈ (0, 1] if
E[X′ | Z] = (1 − α)X.
Matrix Stein pairs are exchangeable pairs
Matrix Stein pairs always have zero mean
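A hedged sketch (previewing the construction used later in the talk): for X a sum of n independent centered Hermitian matrices Yk, resampling one uniformly chosen summand gives X′ = X + Y′K − YK with E[X′ | Z] = (1 − 1/n)X, i.e. a matrix Stein pair with α = 1/n. Since EY′k = 0, the conditional expectation reduces to averaging X − Yk over k, which we can check exactly:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 8, 4
G = rng.standard_normal((n, d, d))
Y = (G + G.transpose(0, 2, 1)) / 2   # independent Hermitian summands, EY_k = 0

X = Y.sum(axis=0)
# E[X' | Z] = (1/n) sum_k (X - Y_k + E Y'_k) with E Y'_k = 0:
cond_exp_Xprime = np.mean([X - Y[k] for k in range(n)], axis=0)

assert np.allclose(cond_exp_Xprime, (1 - 1/n) * X)
```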
The Conditional Variance
Definition (Conditional Variance)
Suppose that (X, X′) is a matrix Stein pair with scale factor α, constructed from the exchangeable pair (Z, Z′). The conditional variance is the random matrix
ΔX = ΔX(Z) = (1/(2α)) E[ (X − X′)² | Z ].
ΔX is a stochastic estimate for the variance EX²
Take-home Message
Control over ∆X yields control over λmax(X)
Exponential Tail Inequalities
Exponential Concentration for Random Matrices
Theorem (Mackey, Jordan, Chen, Farrell, and Tropp 2012)
Let (X, X′) be a matrix Stein pair with X ∈ H^d. Suppose that
ΔX ≼ cX + vI almost surely, for c, v ≥ 0.
Then, for all t ≥ 0,
P{λmax(X) ≥ t} ≤ d · exp( −t² / (2v + 2ct) )
Control over the conditional variance ΔX yields:
Gaussian tail for λmax(X) for small t; exponential tail for large t
When d = 1 improves scalar result of Chatterjee (2007)
The dimensional factor d cannot be removed
Matrix Hoeffding Inequality
Corollary (Mackey, Jordan, Chen, Farrell, and Tropp 2012)
Let X = Σk Yk for independent matrices in H^d satisfying
EYk = 0 and Yk² ≼ Ak²
for deterministic matrices (Ak)k≥1. Define the variance parameter
σ² = ‖Σk Ak²‖.
Then, for all t ≥ 0,
P{ λmax(Σk Yk) ≥ t } ≤ d · e^{−t²/(2σ²)}
Improves upon the matrix Hoeffding inequality of Tropp (2011): optimal constant 1/2 in the exponent
Can replace the variance parameter with σ² = ½ ‖Σk (Ak² + EYk²)‖
Tighter than classical Hoeffding inequality (1963) when d = 1
Exponential Concentration Proof Sketch
1. Matrix Laplace transform method (Ahlswede & Winter 2002)
Relate tail probability to the trace of the mgf of X:
P{λmax(X) ≥ t} ≤ inf_{θ>0} e^{−θt} · m(θ), where m(θ) = E tr e^{θX}
Problem: e^{X+Y} ≠ e^X e^Y when X, Y ∈ H^d
How to bound the trace mgf?
Past approaches: Golden-Thompson, Lieb's concavity theorem
Chatterjee's strategy for scalar concentration:
Control mgf growth by bounding the derivative
m′(θ) = E tr X e^{θX} for θ ∈ R
Rewrite using exchangeable pairs
Method of Exchangeable Pairs
Lemma
Suppose that (X, X′) is a matrix Stein pair with scale factor α. Let F : H^d → H^d be a measurable function satisfying
E ‖(X − X′) F(X)‖ < ∞.
Then
E[X F(X)] = (1/(2α)) E[(X − X′)(F(X) − F(X′))].  (1)
Intuition
Can characterize the distribution of a random matrix by integrating it against a class of test functions F
Eq. (1) allows us to estimate this integral using the smoothness properties of F and the discrepancy X − X′
Exponential Concentration Proof Sketch
2. Method of Exchangeable Pairs
Rewrite the derivative of the trace mgf:
m′(θ) = E tr X e^{θX} = (1/(2α)) E tr[ (X − X′)( e^{θX} − e^{θX′} ) ]
Goal: Use the smoothness of F(X) = e^{θX} to bound the derivative
Mean Value Trace Inequality
Lemma (Mackey, Jordan, Chen, Farrell, and Tropp 2012)
Suppose that g : R → R is a weakly increasing function and that h : R → R is a function whose derivative h′ is convex. For all matrices A, B ∈ H^d, it holds that
tr[ (g(A) − g(B)) · (h(A) − h(B)) ] ≤ ½ tr[ (g(A) − g(B)) · (A − B) · (h′(A) + h′(B)) ].
Standard matrix functions: if g : R → R and A = Q diag(λ1, …, λd) Q*, then g(A) = Q diag(g(λ1), …, g(λd)) Q*.
The inequality does not hold without the trace.
For exponential concentration, we let g(A) = A and h(B) = e^{θB}.
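A numerical sanity check of the lemma in the case used for exponential concentration, g(a) = a and h(b) = e^{θb} with θ > 0, so h′(b) = θe^{θb} (an illustrative sketch; the test matrices are random and the matrix exponential is evaluated as a standard matrix function via the eigendecomposition, exactly as defined above):

```python
import numpy as np

def mexp(A, theta):
    """Standard matrix function e^{theta A} via eigendecomposition."""
    w, Q = np.linalg.eigh(A)
    return (Q * np.exp(theta * w)) @ Q.T

# Check tr[(A-B)(e^{tA}-e^{tB})] <= (t/2) tr[(A-B)^2 (e^{tA}+e^{tB})]
# on random real symmetric matrices.
rng = np.random.default_rng(4)
theta = 0.7
M1, M2 = rng.standard_normal((5, 5)), rng.standard_normal((5, 5))
A, B = (M1 + M1.T) / 2, (M2 + M2.T) / 2

lhs = np.trace((A - B) @ (mexp(A, theta) - mexp(B, theta)))
rhs = theta / 2 * np.trace((A - B) @ (A - B) @ (mexp(A, theta) + mexp(B, theta)))
print(lhs <= rhs + 1e-9)
```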
Exponential Concentration Proof Sketch
3. Mean Value Trace Inequality
Bound the derivative of the trace mgf:
m′(θ) = (1/(2α)) E tr[ (X − X′)( e^{θX} − e^{θX′} ) ]
 ≤ (θ/(4α)) E tr[ (X − X′)² · ( e^{θX} + e^{θX′} ) ]
 = (θ/(2α)) E tr[ (X − X′)² · e^{θX} ]
 = θ · E tr[ (1/(2α)) E[ (X − X′)² | Z ] · e^{θX} ]
 = θ · E tr[ ΔX e^{θX} ]
Exponential Concentration Proof Sketch
3. Mean Value Trace Inequality
Bound the derivative of the trace mgf:
m′(θ) ≤ θ · E tr[ ΔX e^{θX} ]
4. Conditional Variance Bound: ΔX ≼ cX + vI
Yields the differential inequality
m′(θ) ≤ cθ E tr[ X e^{θX} ] + vθ E tr[ e^{θX} ] = cθ · m′(θ) + vθ · m(θ).
Solve to bound m(θ) and thereby bound
P{λmax(X) ≥ t} ≤ inf_{θ>0} e^{−θt} · m(θ) ≤ d · exp( −t² / (2v + 2ct) )
Refined Exponential Concentration
Relaxing the constraint ΔX ≼ cX + vI:
Theorem (Mackey, Jordan, Chen, Farrell, and Tropp 2012)
Let (X, X′) be a bounded matrix Stein pair with X ∈ H^d. Define the function
r(ψ) = (1/ψ) log E tr( e^{ψΔX} / d ) for each ψ > 0.
Then, for all t ≥ 0 and all ψ > 0,
P{λmax(X) ≥ t} ≤ d · exp( −t² / (2r(ψ) + 2t/√ψ) )
r(ψ) measures the typical magnitude of the conditional variance:
E λmax(ΔX) ≤ inf_{ψ>0} [ r(ψ) + (log d)/ψ ]
When d = 1, improves scalar result of Chatterjee (2008)
Proof extends to unbounded random matrices
Matrix Bernstein Inequality
Corollary (Mackey, Jordan, Chen, Farrell, and Tropp 2012)
Let (Yk)k≥1 be independent matrices in H^d satisfying
EYk = 0 and ‖Yk‖ ≤ R for each index k.
Define the variance parameter
σ² = ‖Σk EYk²‖.
Then, for all t ≥ 0,
P{ λmax(Σk Yk) ≥ t } ≤ d · exp( −t² / (3σ² + 2Rt) )
Gaussian tail controlled by improved variance when t is small
Key proof idea: apply the refined concentration result and bound r(ψ) = (1/ψ) log E tr( e^{ψΔX} / d ) using the unrefined result
Constants better than Oliveira (2009), worse than Tropp (2011)
Polynomial Moment Inequalities
Polynomial Moments for Random Matrices
Theorem (Mackey, Jordan, Chen, Farrell, and Tropp 2012)
Let p = 1 or p ≥ 1.5. Suppose that (X, X′) is a matrix Stein pair where E tr |X|^{2p} < ∞. Then
( E tr |X|^{2p} )^{1/(2p)} ≤ √(2p − 1) · ( E tr ΔX^p )^{1/(2p)}.
Moral: The conditional variance controls the moments of X
Generalizes Chatterjee's version (2007) of the scalar Burkholder-Davis-Gundy inequality (Burkholder 1973)
See also Pisier & Xu (1997); Junge & Xu (2003, 2008)
Proof techniques mirror those for exponential concentration
Also holds for infinite-dimensional Schatten-class operators
Matrix Khintchine Inequality
Corollary (Mackey, Jordan, Chen, Farrell, and Tropp 2012)
Let (εk)k≥1 be an independent sequence of Rademacher random variables and (Ak)k≥1 a deterministic sequence of Hermitian matrices. Then, if p = 1 or p ≥ 1.5,
E tr( Σk εk Ak )^{2p} ≤ (2p − 1)^p · tr( Σk Ak² )^p
The noncommutative Khintchine inequality (Lust-Piquard 1986; Lust-Piquard and Pisier 1991) is a dominant tool in applied matrix analysis
e.g., used in the analysis of column sampling and projection for approximate SVD (Rudelson and Vershynin 2007)
Stein's method offers an unusually concise proof
The constant √(2p − 1) is within √e of optimal
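For p = 1 the Khintchine bound holds with equality, since the cross terms E[εj εk] vanish for j ≠ k and ε²k = 1. A hedged sketch that verifies this exactly by enumerating all 2^n sign patterns (the coefficient matrices here are random and purely illustrative):

```python
import itertools
import numpy as np

# Verify E tr(sum_k eps_k A_k)^2 = tr(sum_k A_k^2) exactly.
rng = np.random.default_rng(5)
n, d = 4, 3
M = rng.standard_normal((n, d, d))
A = (M + M.transpose(0, 2, 1)) / 2              # Hermitian coefficients

# Exact expectation over all sign patterns.
lhs = np.mean([np.trace(np.linalg.matrix_power(
          np.einsum('k,kij->ij', np.array(s, dtype=float), A), 2))
      for s in itertools.product([-1, 1], repeat=n)])
rhs = np.trace(sum(a @ a for a in A))           # tr sum_k A_k^2
assert np.isclose(lhs, rhs)
```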
Dependent Sequences
Adding Dependence
1 Motivation: Matrix Completion; Matrix Concentration
2 Stein's Method Background and Notation
3 Exponential Tail Inequalities
4 Polynomial Moment Inequalities
5 Dependent Sequences: Sums of Conditionally Zero-mean Matrices; Combinatorial Sums
6 Extensions
Sums of Conditionally Zero-mean Matrices
Definition (Sum of Conditionally Zero-Mean Matrices)
Given a sequence of Hermitian matrices (Yk)^n_{k=1} satisfying the conditional zero mean property E[Yk | (Yj)_{j≠k}] = 0 for all k, define the random sum X = Σ^n_{k=1} Yk.
Note: (Yk)k≥1 is a martingale difference sequence
Examples
Sums of independent centered random matrices
Many sums of conditionally independent random matrices:
Yk ⊥⊥ (Yj)_{j≠k} | Z and E[Yk | Z] = 0
Rademacher series with random matrix coefficients:
X = Σk εk Wk, with (Wk)k≥1 Hermitian and (εk)k≥1 independent Rademacher
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Yk | (Yj)_{j≠k}] = 0
Matrix Stein Pair for X = Σ^n_{k=1} Yk:
Let Y′k and Yk be conditionally i.i.d. given (Yj)_{j≠k}.
Draw index K uniformly from {1, …, n}.
Define X′ = X + Y′K − YK.
Check the Stein pair condition:
E[X − X′ | (Yj)_{j≥1}] = E[YK − Y′K | (Yj)_{j≥1}]
 = (1/n) Σ^n_{k=1} ( Yk − E[Y′k | (Yj)_{j≠k}] )
 = (1/n) Σ^n_{k=1} Yk = (1/n) X
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Yk | (Yj)_{j≠k}] = 0
Conditional Variance for X = Σ^n_{k=1} Yk:
ΔX = (n/2) · E[ (X − X′)² | (Yj)_{j≥1} ]
 = (n/2) · E[ (YK − Y′K)² | (Yj)_{j≥1} ]
 = (1/2) Σ^n_{k=1} ( Yk² + E[Y′k² | (Yj)_{j≠k}] )
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Dependent Sequences: Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Given a deterministic array (Ajk)^n_{j,k=1} of Hermitian matrices and a uniformly random permutation π on {1, …, n}, define the combinatorial matrix statistic
Y = Σ^n_{j=1} A_{jπ(j)}, with mean EY = (1/n) Σ^n_{j,k=1} Ajk.
Generalizes the scalar statistics studied by Hoeffding (1951)
Example:
Sampling without replacement from {B1, …, Bn}: W = Σ^s_{j=1} B_{π(j)}
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y = Σ^n_{j=1} A_{jπ(j)}, with mean EY = (1/n) Σ^n_{j,k=1} Ajk
Matrix Stein Pair for X = Y − EY:
Draw indices (J, K) uniformly from {1, …, n}².
Define π′ = π ∘ (J, K) and X′ = Σ^n_{j=1} A_{jπ′(j)} − EY.
Check the Stein pair condition:
E[X − X′ | π] = E[ A_{Jπ(J)} + A_{Kπ(K)} − A_{Jπ(K)} − A_{Kπ(J)} | π ]
 = (1/n²) Σ^n_{j,k=1} ( A_{jπ(j)} + A_{kπ(k)} − A_{jπ(k)} − A_{kπ(j)} )
 = (2/n)(Y − EY) = (2/n) X
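The Stein pair identity above is an exact average over the n² choices of (J, K), so it can be checked numerically without any sampling error (a hedged sketch; the Hermitian array (Ajk) here is random and purely illustrative):

```python
import numpy as np

# Exact check of E[X - X' | pi] = (2/n) X for the combinatorial statistic.
rng = np.random.default_rng(6)
n, d = 5, 3
M = rng.standard_normal((n, n, d, d))
A = (M + M.transpose(0, 1, 3, 2)) / 2        # Hermitian array (A_jk)

pi = rng.permutation(n)
Y = sum(A[j, pi[j]] for j in range(n))
EY = A.sum(axis=(0, 1)) / n
X = Y - EY

# Average the swap increment over all n^2 index pairs (J, K).
diff = np.zeros((d, d))
for J in range(n):
    for K in range(n):
        diff += (A[J, pi[J]] + A[K, pi[K]]
                 - A[J, pi[K]] - A[K, pi[J]]) / n**2

assert np.allclose(diff, (2 / n) * X)
```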
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y = Σ^n_{j=1} A_{jπ(j)}, with mean EY = (1/n) Σ^n_{j,k=1} Ajk
Conditional Variance for X = Y − EY:
ΔX(π) = (n/4) E[ (X − X′)² | π ]
 = (1/(4n)) Σ^n_{j,k=1} [ A_{jπ(j)} + A_{kπ(k)} − A_{jπ(k)} − A_{kπ(j)} ]²
 ≼ (1/n) Σ^n_{j,k=1} [ A²_{jπ(j)} + A²_{kπ(k)} + A²_{jπ(k)} + A²_{kπ(j)} ]
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Extensions
General Complex Matrices
Map any matrix B ∈ C^{d1×d2} to a Hermitian matrix via dilation:
D(B) = [ 0 B ; B* 0 ] ∈ H^{d1+d2}
Preserves spectral information: λmax(D(B)) = ‖B‖
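The dilation identity λmax(D(B)) = ‖B‖ is easy to confirm numerically, since the eigenvalues of D(B) are ±σi(B) together with zeros (an illustrative sketch with a random real B):

```python
import numpy as np

# Build the Hermitian dilation D(B) = [[0, B], [B^T, 0]] and compare
# its top eigenvalue with the spectral norm of B.
rng = np.random.default_rng(7)
d1, d2 = 4, 3
B = rng.standard_normal((d1, d2))

D = np.block([[np.zeros((d1, d1)), B],
              [B.T, np.zeros((d2, d2))]])

lam_max = np.linalg.eigvalsh(D).max()
assert np.isclose(lam_max, np.linalg.norm(B, 2))
```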
Beyond Sums
Matrix-valued functions satisfying a self-reproducing property
e.g., matrix second-order Rademacher chaos Σ_{j,k} εj εk Ajk
Yields a dependent bounded differences inequality for matrices
Generalized Matrix Stein Pairs
Satisfy E[g(X) − g(X′) | Z] = αX almost surely, for g : R → R weakly increasing
References I
Ahlswede, R. and Winter, A. Strong converse for identification via quantum channels. IEEE Trans. Inform. Theory, 48(3):569–579, Mar 2002.
Bernstein, S. The Theory of Probabilities. Gastehizdat Publishing House, 1946.
Burkholder, D. L. Distribution function inequalities for martingales. Ann. Probab., 1:19–42, 1973. doi:10.1214/aop/1176997023.
Candes, E. J. and Recht, B. Exact matrix completion via convex optimization. Found. Comput. Math., 9:717–772, 2009.
Candes, E. J. and Tao, T. The power of convex relaxation: Near-optimal matrix completion. IEEE Trans. Info. Theory, 2009. To appear. Available at arXiv:0903.1476.
Chatterjee, S. Stein's method for concentration inequalities. Probab. Theory Related Fields, 138:305–321, 2007.
Chatterjee, S. Concentration inequalities with exchangeable pairs. PhD thesis, Stanford University, Palo Alto, Feb 2008. URL arXiv:math/0507526v1.
Cheung, S.-S., So, A. Man-Cho, and Wang, K. Chance-constrained linear matrix inequalities with dependent perturbations: A safe tractable approximation approach. Available at http://www.optimization-online.org/DB_FILE/2011/01/2898.pdf, 2011.
Christofides, D. and Markstrom, K. Expansion properties of random Cayley graphs and vertex transitive graphs via matrix martingales. Random Struct. Algorithms, 32(1):88–100, 2008.
Drineas, P., Mahoney, M. W., and Muthukrishnan, S. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30:844–881, 2008.
Foygel, R. and Srebro, N. Concentration-based guarantees for low-rank matrix reconstruction. Journal of Machine Learning Research - Proceedings Track, 19:315–340, 2011.
Gross, D. Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory, 57(3):1548–1566, Mar 2011.
Hoeffding, W. A combinatorial central limit theorem. Ann. Math. Statist., 22:558–566, 1951.
Hoeffding, W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.
References II
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. Available at arXiv:1104.1672, 2011a.
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. arXiv:1104.1672v3 [math.PR], 2011b.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities. Ann. Probab., 31(2):948–995, 2003.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities II: Applications. Israel J. Math., 167:227–282, 2008.
Lust-Piquard, F. Inegalites de Khintchine dans Cp (1 < p < ∞). C. R. Math. Acad. Sci. Paris, 303(7):289–292, 1986.
Lust-Piquard, F. and Pisier, G. Noncommutative Khintchine and Paley inequalities. Ark. Mat., 29(2):241–260, 1991.
Mackey, L., Talwalkar, A., and Jordan, M. I. Divide-and-conquer matrix factorization. In Shawe-Taylor, J., Zemel, R. S., Bartlett, P. L., Pereira, F. C. N., and Weinberger, K. Q. (eds.), Advances in Neural Information Processing Systems 24, pp. 1134–1142, 2011.
Mackey, L., Jordan, M. I., Chen, R. Y., Farrell, B., and Tropp, J. A. Matrix concentration inequalities via the method of exchangeable pairs. URL http://arxiv.org/abs/1201.6002, 2012.
Negahban, S. and Wainwright, M. J. Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. arXiv:1009.2118v2 [cs.IT], 2010.
Nemirovski, A. Sums of random symmetric matrices and quadratic optimization under orthogonality constraints. Math. Program., 109:283–317, January 2007. ISSN 0025-5610. doi:10.1007/s10107-006-0033-0. URL http://dl.acm.org/citation.cfm?id=1229716.1229726.
Oliveira, R. I. Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. Available at arXiv:0911.0600, Nov 2009.
Pisier, G. and Xu, Q. Non-commutative martingale inequalities. Comm. Math. Phys., 189(3):667–698, 1997.
Recht, B. Simpler approach to matrix completion. J. Mach. Learn. Res., 12:3413–3430, 2011.
Rudelson, M. and Vershynin, R. Sampling from large matrices: An approach through geometric functional analysis. J. Assoc. Comput. Mach., 54(4):Article 21, 19 pp., Jul 2007 (electronic).
So, A. Man-Cho. Moment inequalities for sums of random matrices and their applications in optimization. Math. Program., 130(1):125–151, 2011.
Stein, C. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. 6th Berkeley Symp. Math. Statist. Probab., Berkeley, 1972. Univ. California Press.
Tropp, J. A. User-friendly tail bounds for sums of random matrices. Found. Comput. Math., August 2011.
- Motivation
-
- Matrix Completion
- Matrix Concentration
-
- Extensions
-
Motivation
Concentration Inequalities
Matrix concentration
PX minus EX ge t le δ
Pλmax(X minus EX) ge t le δ
Non-asymptotic control of random matrices with complexdistributions
Applications
Matrix completion from sparse random measurements(Gross 2011 Recht 2011 Negahban and Wainwright 2010 Mackey Talwalkar and Jordan 2011)
Randomized matrix multiplication and factorization(Drineas Mahoney and Muthukrishnan 2008 Hsu Kakade and Zhang 2011b)
Convex relaxation of robust or chance-constrained optimization(Nemirovski 2007 So 2011 Cheung So and Wang 2011)
Random graph analysis (Christofides and Markstrom 2008 Oliveira 2009)
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 2 35
Motivation Matrix Completion
Motivation Matrix Completion
Goal Recover a matrix L0 isin Rmtimesn from a subset of its entries
1 43 5 5
rarr
2 3 1 43 4 5 12 5 3 5
Examples
Collaborative filtering How will user i rate movie j
Ranking on the web Is URL j relevant to user i
Link prediction Is user i friends with user j
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 3 35
Motivation Matrix Completion
Motivation Matrix Completion
Goal Recover a matrix L0 isin Rmtimesn from a subset of its entries
1 43 5 5
rarr
2 3 1 43 4 5 12 5 3 5
Bad News Impossible to recover a generic matrixToo many degrees of freedom too few observations
Good News
Small number of latent factors determine preferencesMovie ratings cluster by genre and director
L0 = A
B⊤
These low-rank matrices are easier to completeMackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 4 35
Motivation Matrix Completion
How to Complete a Low-rank Matrix
Suppose Ω is the set of observed entry locations
First attemptminimizeA rankA
subject to Aij = L0ij (i j) isin Ω
Problem NP-hard rArr computationally intractable
Solution Solve convex relaxation ()
minimizeA Alowastsubject to Aij = L0ij (i j) isin Ω
where Alowast =sum
k σk(A) is the tracenuclear norm of A
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 5 35
Motivation Matrix Completion
Can Convex Optimization Recover L0
Yes with high probability
Theorem (Recht 2011)
If L0 isin Rmtimesn has rank r and s amp βrn log2(n) entries are observeduniformly at random then (under some technical conditions) convexoptimization recovers L0 exactly with probability at least 1minus nminusβ
See also Gross (2011) Mackey Talwalkar and Jordan (2011)
Past results (Candes and Recht 2009 Candes and Tao 2009) requiredstronger assumptions and more intensive analysis
Streamlined approach reposes on a matrix variant of a classicalBernstein inequality (1946)
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 6 35
Motivation Matrix Completion
Scalar Bernstein Inequality
Theorem (Bernstein 1946)
Let (Yk)kge1 be independent random variables in R satisfying
EYk = 0 and |Yk| le R for each index k
Define the variance parameter
σ2 =sum
kEY 2
k
Then for all t ge 0
P
∣
∣
∣
sum
kYk
∣
∣
∣ge t
le 2 middot exp minust22σ2 + 2Rt3
Gaussian decay controlled by variance when t is small
Exponential decay controlled by uniform bound for large t
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 7 35
Motivation Matrix Completion
Matrix Bernstein Inequality
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let (Yk)kge1 be independent matrices in Rmtimesn satisfying
EYk = 0 and Yk le R for each index k
Define the variance parameter
σ2 = max(∥
∥
∥
sum
kEYkY
⊤k
∥
∥
∥∥
∥
∥
sum
kEY ⊤
k Yk
∥
∥
∥
)
Then for all t ge 0
P
∥
∥
∥
sum
kYk
∥
∥
∥ge t
le (m+ n) middot exp minust23σ2 + 2Rt
See also Tropp (2011) Oliveira (2009) Recht (2011)
Gaussian tail when t is small exponential tail for large t
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 8 35
Motivation Matrix Completion
Matrix Bernstein Inequality
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
For all t ge 0
P
∥
∥
∥
sum
kYk
∥
∥
∥ge t
le (m+ n) middot exp minust23σ2 + 2Rt
Consequences for matrix completion
Recht (2011) showed that uniform sampling of entries capturesmost of the information in incoherent low-rank matrices
Negahban and Wainwright (2010) showed that iid sampling ofentries captures most of the information in non-spiky (near)low-rank matrices
Foygel and Srebro (2011) characterized the generalization errorof convex MC through Rademacher complexity
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 9 35
Motivation Matrix Concentration
Concentration Inequalities
Matrix concentration
Pλmax(X minus EX) ge t le δ
Difficulty Matrix multiplication is not commutative
rArr eX+Y 6= eXeY
Past approaches (Ahlswede and Winter 2002 Oliveira 2009 Tropp 2011)
Rely on deep results from matrix analysis
Apply to sums of independent matrices and matrix martingales
This work
Steinrsquos method of exchangeable pairs (1972) as advanced byChatterjee (2007) for scalar concentrationrArr Improved exponential tail inequalities (Hoeffding Bernstein)rArr Polynomial moment inequalities (Khintchine Rosenthal)rArr Dependent sums and more general matrix functionals
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 10 35
Motivation Matrix Concentration
Roadmap
1 Motivation
2 Steinrsquos Method Background and Notation
3 Exponential Tail Inequalities
4 Polynomial Moment Inequalities
5 Dependent Sequences
6 Extensions
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 11 35
Background
Notation
Hermitian matrices Hd = A isin Cdtimesd A = AlowastAll matrices in this talk are Hermitian
Maximum eigenvalue λmax(middot)Trace trB the sum of the diagonal entries of B
Spectral norm B the maximum singular value of B
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 12 35
Background
Matrix Stein Pair
Definition (Exchangeable Pair)
(ZZ prime) is an exchangeable pair if (ZZ prime)d= (Z prime Z)
Definition (Matrix Stein Pair)
Let (ZZ prime) be an exchangeable pair and let Ψ Z rarr Hd be ameasurable function Define the random matrices
X = Ψ(Z) and X prime = Ψ(Z prime)
(XX prime) is a matrix Stein pair with scale factor α isin (0 1] if
E[X prime |Z] = (1minus α)X
Matrix Stein pairs are exchangeable pairs
Matrix Stein pairs always have zero mean
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 13 35
Background
The Conditional Variance
Definition (Conditional Variance)
Suppose that (XX prime) is a matrix Stein pair with scale factor αconstructed from the exchangeable pair (ZZ prime) The conditional
variance is the random matrix
∆X = ∆X(Z) =1
2αE[
(X minusX prime)2 |Z]
∆X is a stochastic estimate for the variance EX2
Take-home Message
Control over ∆X yields control over λmax(X)
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 14 35
Exponential Tail Inequalities
Exponential Concentration for Random Matrices
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let (XX prime) be a matrix Stein pair with X isin Hd Suppose that
∆X 4 cX + v I almost surely for c v ge 0
Then for all t ge 0
Pλmax(X) ge t le d middot exp minust22v + 2ct
Control over the conditional variance ∆X yields
Gaussian tail for λmax(X) for small t exponential tail for large t
When d = 1 improves scalar result of Chatterjee (2007)
The dimensional factor d cannot be removed
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 15 35
Exponential Tail Inequalities
Matrix Hoeffding Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let X =sum
k Yk for independent matrices in Hd satisfying
EYk = 0 and Y 2k 4 A2
k
for deterministic matrices (Ak)kge1 Define the variance parameter
σ2 =∥
∥
∥
sum
kA2k
∥
∥
∥
Then for all t ge 0
P
λmax
(
sum
kYk
)
ge t
le d middot eminust22σ2
Improves upon the matrix Hoeffding inequality of Tropp (2011)Optimal constant 12 in the exponent
Can replace variance parameter with σ2 = 12
∥
∥
sum
k
(
A2k + EY 2
k
)∥
∥
Tighter than classical Hoeffding inequality (1963) when d = 1
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 16 35
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
1 Matrix Laplace transform method (Ahlswede amp Winter 2002)
Relate tail probability to the trace of the mgf of X
    P{λmax(X) ≥ t} ≤ inf_{θ>0} e^{−θt} · m(θ), where m(θ) = E tr e^{θX}.
Problem: e^{X+Y} ≠ e^X e^Y in general when X, Y ∈ H^d.

How to bound the trace mgf?

Past approaches: the Golden–Thompson inequality, Lieb's concavity theorem.

Chatterjee's strategy for scalar concentration:

Control mgf growth by bounding the derivative

    m′(θ) = E tr X e^{θX} for θ ∈ R.

Rewrite using exchangeable pairs.

Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 17 / 35
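A hedged numerical illustration of step 1 (my own toy example, not from the slides): for a two-term Rademacher series, the Laplace-transform bound inf_θ e^{−θt} m(θ), computed by exact enumeration plus a grid search over θ, dominates the exact tail probability.

```python
import itertools
import numpy as np

A = [np.array([[0.8, 0.3], [0.3, -0.2]]),
     np.array([[-0.1, 0.5], [0.5, 0.4]])]

def expm_sym(M):
    # Matrix exponential of a symmetric matrix via eigendecomposition.
    w, Q = np.linalg.eigh(M)
    return (Q * np.exp(w)) @ Q.T

patterns = list(itertools.product([-1, 1], repeat=len(A)))
Xs = [sum(e * M for e, M in zip(eps, A)) for eps in patterns]

t = 1.0
tail = np.mean([np.linalg.eigvalsh(X).max() >= t for X in Xs])

# m(theta) = E tr e^{theta X}; minimize e^{-theta t} m(theta) over a theta grid.
thetas = np.linspace(0.01, 5.0, 200)
laplace = min(np.exp(-th * t) * np.mean([np.trace(expm_sym(th * X)) for X in Xs])
              for th in thetas)
print(tail, laplace)  # Markov's inequality guarantees tail <= laplace
```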
Exponential Tail Inequalities
Method of Exchangeable Pairs
Lemma
Suppose that (X, X′) is a matrix Stein pair with scale factor α. Let F : H^d → H^d be a measurable function satisfying E‖(X − X′)F(X)‖ < ∞. Then

    E[X F(X)] = (1/(2α)) E[(X − X′)(F(X) − F(X′))].   (1)
Intuition

One can characterize the distribution of a random matrix by integrating it against a class of test functions F.

Eq. (1) allows us to estimate this integral using the smoothness properties of F and the discrepancy X − X′.

Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 18 / 35
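Eq. (1) can be checked by exact enumeration on a toy Stein pair (my own sketch, not from the talk): for a Rademacher series X = Σ_k ε_k A_k with one uniformly chosen coordinate resampled, the scale factor is α = 1/n; the test function F(X) = X² is an arbitrary choice.

```python
import itertools
import numpy as np

A = [np.array([[1.0, 0.3], [0.3, -0.5]]),
     np.array([[0.2, -0.7], [-0.7, 0.4]]),
     np.array([[-0.3, 0.1], [0.1, 0.8]])]
n = len(A)
alpha = 1.0 / n          # scale factor of this Stein pair
F = lambda M: M @ M      # arbitrary test function F(X) = X^2

lhs = np.zeros((2, 2))
rhs = np.zeros((2, 2))
signs = list(itertools.product([-1, 1], repeat=n))
for eps in signs:                      # Z = eps, probability 1/8 each
    X = sum(e * M for e, M in zip(eps, A))
    lhs = lhs + X @ F(X) / len(signs)
    for K in range(n):                 # resampled coordinate, probability 1/n
        for ep in (-1, 1):             # fresh Rademacher sign, probability 1/2
            Xp = X + (ep - eps[K]) * A[K]
            rhs = rhs + (X - Xp) @ (F(X) - F(Xp)) / (len(signs) * n * 2)
rhs = rhs / (2 * alpha)
print(np.allclose(lhs, rhs))  # Eq. (1) holds exactly
```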
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
2 Method of Exchangeable Pairs
Rewrite the derivative of the trace mgf:

    m′(θ) = E tr X e^{θX} = (1/(2α)) E tr[(X − X′)(e^{θX} − e^{θX′})].

Goal: use the smoothness of F(X) = e^{θX} to bound the derivative.

Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 19 / 35
Exponential Tail Inequalities
Mean Value Trace Inequality
Lemma (Mackey, Jordan, Chen, Farrell, and Tropp 2012)

Suppose that g : R → R is a weakly increasing function and that h : R → R is a function whose derivative h′ is convex. For all matrices A, B ∈ H^d, it holds that

    tr[(g(A) − g(B)) · (h(A) − h(B))] ≤ (1/2) tr[(g(A) − g(B)) · (A − B) · (h′(A) + h′(B))].
Standard matrix functions: if g : R → R and A = Q diag(λ₁, …, λ_d) Q*, then g(A) = Q diag(g(λ₁), …, g(λ_d)) Q*.
The inequality does not hold without the trace.

For exponential concentration, we take g(A) = A and h(B) = e^{θB}.

Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 20 / 35
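A quick numerical sanity check of the lemma (my own sketch, not from the slides), with g(x) = x (weakly increasing) and h(x) = eˣ (so h′ = exp is convex), on randomly drawn symmetric A and B:

```python
import numpy as np

def expm_sym(M):
    # Matrix exponential of a symmetric matrix via eigendecomposition.
    w, Q = np.linalg.eigh(M)
    return (Q * np.exp(w)) @ Q.T

rng = np.random.default_rng(0)
S = rng.standard_normal((4, 4)); A = (S + S.T) / 2
S = rng.standard_normal((4, 4)); B = (S + S.T) / 2

# g(x) = x, h(x) = e^x, h'(x) = e^x.
lhs = np.trace((A - B) @ (expm_sym(A) - expm_sym(B)))
rhs = 0.5 * np.trace((A - B) @ (A - B) @ (expm_sym(A) + expm_sym(B)))
print(lhs <= rhs + 1e-9)  # guaranteed by the mean value trace inequality
```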
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
3 Mean Value Trace Inequality
Bound the derivative of the trace mgf:

    m′(θ) = (1/(2α)) E tr[(X − X′)(e^{θX} − e^{θX′})]
          ≤ (θ/(4α)) E tr[(X − X′)² · (e^{θX} + e^{θX′})]
          = (θ/(2α)) E tr[(X − X′)² · e^{θX}]
          = θ · E tr[(1/(2α)) E[(X − X′)² | Z] · e^{θX}]
          = θ · E tr[∆X e^{θX}].

Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 21 / 35
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
3 Mean Value Trace Inequality
Bound the derivative of the trace mgf:

    m′(θ) ≤ θ · E tr[∆X e^{θX}].

4 Conditional Variance Bound: ∆X ⪯ cX + vI

Yields the differential inequality

    m′(θ) ≤ cθ E tr[X e^{θX}] + vθ E tr[e^{θX}] = cθ · m′(θ) + vθ · m(θ).

Solve to bound m(θ) and thereby bound

    P{λmax(X) ≥ t} ≤ inf_{θ>0} e^{−θt} · m(θ) ≤ d · exp{−t² / (2v + 2ct)}.

Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 22 / 35
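One standard way to carry out the final "solve" step (a sketch consistent with the bound above; θ⋆ denotes the optimizing parameter):

```latex
% Rearranging  m'(\theta) \le c\theta\, m'(\theta) + v\theta\, m(\theta)  for 0 < \theta < 1/c:
\frac{d}{d\theta} \log m(\theta) \;=\; \frac{m'(\theta)}{m(\theta)}
  \;\le\; \frac{v\theta}{1 - c\theta}
  \;\le\; \frac{v\theta}{1 - c\theta^\star} \quad\text{for } \theta \le \theta^\star.
% Integrating from 0 with m(0) = \operatorname{tr} I = d:
m(\theta^\star) \;\le\; d \exp\!\left( \frac{v\,(\theta^\star)^2}{2(1 - c\theta^\star)} \right).
% Choosing \theta^\star = t/(v + ct) < 1/c in  e^{-\theta t} m(\theta)  gives
\mathbb{P}\{\lambda_{\max}(X) \ge t\}
  \;\le\; d \exp\!\left( \frac{-t^2}{2v + 2ct} \right).
```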
Exponential Tail Inequalities
Refined Exponential Concentration
Relaxing the constraint ∆X ⪯ cX + vI
Theorem (Mackey, Jordan, Chen, Farrell, and Tropp 2012)

Let (X, X′) be a bounded matrix Stein pair with X ∈ H^d. Define the function

    r(ψ) = (1/ψ) log(E tr e^{ψ∆X} / d) for each ψ > 0.

Then, for all t ≥ 0 and all ψ > 0,

    P{λmax(X) ≥ t} ≤ d · exp{−t² / (2r(ψ) + 2t/√ψ)}.
r(ψ) measures the typical magnitude of the conditional variance:

    E λmax(∆X) ≤ inf_{ψ>0} [ r(ψ) + log(d)/ψ ].

When d = 1, improves the scalar result of Chatterjee (2008).

The proof extends to unbounded random matrices.

Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 23 / 35
Exponential Tail Inequalities
Matrix Bernstein Inequality
Corollary (Mackey, Jordan, Chen, Farrell, and Tropp 2012)

Let (Y_k)_{k≥1} be independent matrices in H^d satisfying

    EY_k = 0 and ‖Y_k‖ ≤ R for each index k.

Define the variance parameter

    σ² = ‖Σ_k EY_k²‖.
Then, for all t ≥ 0,

    P{λmax(Σ_k Y_k) ≥ t} ≤ d · exp{−t² / (3σ² + 2Rt)}.
Gaussian tail controlled by the improved variance when t is small.

Key proof idea: apply the refined concentration result and bound r(ψ) = (1/ψ) log(E tr e^{ψ∆X} / d) using the unrefined result.

Constants are better than Oliveira (2009), worse than Tropp (2011).

Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 24 / 35
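As a sketch (my own toy example, not from the talk), the Bernstein-type bound can be checked by exact enumeration for a sum of two uniformly bounded matrices Y_k = U_k A_k, where the coefficients U_k take the arbitrarily chosen mean-zero values ±1, ±1/3:

```python
import itertools
import numpy as np

A = [np.array([[1.0, 0.4], [0.4, -0.3]]),
     np.array([[0.5, -0.2], [-0.2, 0.9]])]
d = 2
vals = [-1.0, -1.0 / 3, 1.0 / 3, 1.0]   # mean-zero coefficients with |u| <= 1

R = max(np.linalg.norm(M, 2) for M in A)       # uniform bound on ||Y_k||
EU2 = sum(u * u for u in vals) / len(vals)     # E u^2 = 5/9
sigma2 = np.linalg.norm(sum(EU2 * (M @ M) for M in A), 2)

t = 1.2
outcomes = list(itertools.product(vals, repeat=len(A)))
tail = np.mean([np.linalg.eigvalsh(sum(u * M for u, M in zip(us, A))).max() >= t
                for us in outcomes])
bound = d * np.exp(-t**2 / (3 * sigma2 + 2 * R * t))
print(tail, bound)  # the corollary guarantees tail <= bound
```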
Polynomial Moment Inequalities
Polynomial Moments for Random Matrices
Theorem (Mackey, Jordan, Chen, Farrell, and Tropp 2012)

Let p = 1 or p ≥ 1.5. Suppose that (X, X′) is a matrix Stein pair where E tr |X|^{2p} < ∞. Then

    (E tr |X|^{2p})^{1/2p} ≤ √(2p − 1) · (E tr ∆X^p)^{1/2p}.
Moral: the conditional variance controls the moments of X.

Generalizes Chatterjee's version (2007) of the scalar Burkholder–Davis–Gundy inequality (Burkholder 1973).

See also Pisier & Xu (1997), Junge & Xu (2003, 2008).

Proof techniques mirror those for exponential concentration.

Also holds for infinite-dimensional Schatten-class operators.

Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 25 / 35
Polynomial Moment Inequalities
Matrix Khintchine Inequality
Corollary (Mackey, Jordan, Chen, Farrell, and Tropp 2012)

Let (ε_k)_{k≥1} be an independent sequence of Rademacher random variables and (A_k)_{k≥1} a deterministic sequence of Hermitian matrices. Then, if p = 1 or p ≥ 1.5,

    E tr(Σ_k ε_k A_k)^{2p} ≤ (2p − 1)^p · tr(Σ_k A_k²)^p.
The noncommutative Khintchine inequality (Lust-Piquard 1986; Lust-Piquard and Pisier 1991) is a dominant tool in applied matrix analysis.

E.g., used in the analysis of column sampling and projection for approximate SVD (Rudelson and Vershynin 2007).

Stein's method offers an unusually concise proof.

The constant √(2p − 1) is within √e of optimal.

Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 26 / 35
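The p = 2 case can be checked exactly for a small Rademacher series (my own illustration; the symmetric matrices A_k are arbitrary choices):

```python
import itertools
import numpy as np

A = [np.array([[0.7, 0.2], [0.2, -0.4]]),
     np.array([[-0.1, 0.6], [0.6, 0.3]]),
     np.array([[0.5, -0.3], [-0.3, 0.1]])]
p = 2

# Exact moment over all sign patterns: E tr (sum_k eps_k A_k)^{2p}.
signs = list(itertools.product([-1, 1], repeat=len(A)))
lhs = np.mean([np.trace(np.linalg.matrix_power(sum(e * M for e, M in zip(eps, A)),
                                               2 * p)) for eps in signs])
rhs = (2 * p - 1) ** p * np.trace(np.linalg.matrix_power(sum(M @ M for M in A), p))
print(lhs <= rhs)  # matrix Khintchine with constant (2p-1)^p
```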
Dependent Sequences
Adding Dependence
1 Motivation
   Matrix Completion
   Matrix Concentration
2 Stein's Method: Background and Notation
3 Exponential Tail Inequalities
4 Polynomial Moment Inequalities
5 Dependent Sequences
   Sums of Conditionally Zero-mean Matrices
   Combinatorial Sums
6 Extensions

Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 27 / 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Sum of Conditionally Zero-Mean Matrices)
Given a sequence of Hermitian matrices (Y_k)_{k=1}^n satisfying the conditional zero-mean property

    E[Y_k | (Y_j)_{j≠k}] = 0 for all k,

define the random sum X = Σ_{k=1}^n Y_k.

Note: (Y_k)_{k≥1} is a martingale difference sequence.
Examples
Sums of independent centered random matrices.

Many sums of conditionally independent random matrices:

    Y_k ⊥⊥ (Y_j)_{j≠k} | Z and E[Y_k | Z] = 0.

Rademacher series with random matrix coefficients:

    X = Σ_k ε_k W_k, with (W_k)_{k≥1} Hermitian and (ε_k)_{k≥1} independent Rademacher.

Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 28 / 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Y_k | (Y_j)_{j≠k}] = 0
Matrix Stein Pair for X = Σ_{k=1}^n Y_k

Let Y′_k and Y_k be conditionally i.i.d. given (Y_j)_{j≠k}.

Draw an index K uniformly from {1, …, n}.

Define X′ = X + Y′_K − Y_K.
Check Stein pair condition
    E[X − X′ | (Y_j)_{j≥1}] = E[Y_K − Y′_K | (Y_j)_{j≥1}]
                            = (1/n) Σ_{k=1}^n (Y_k − E[Y′_k | (Y_j)_{j≠k}])
                            = (1/n) Σ_{k=1}^n Y_k = (1/n) X.

Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 29 / 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Y_k | (Y_j)_{j≠k}] = 0
Conditional Variance for X = Σ_{k=1}^n Y_k
    ∆X = (n/2) · E[(X − X′)² | (Y_j)_{j≥1}]
        = (n/2) · E[(Y_K − Y′_K)² | (Y_j)_{j≥1}]
        = (1/2) Σ_{k=1}^n (Y_k² + E[Y_k² | (Y_j)_{j≠k}])

⇒ Conditional variance controlled when summands are bounded

⇒ Dependent analogues of concentration and moment inequalities

Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 30 / 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Given a deterministic array (A_{jk})_{j,k=1}^n of Hermitian matrices and a uniformly random permutation π on {1, …, n}, define the combinatorial matrix statistic

    Y = Σ_{j=1}^n A_{jπ(j)}, with mean EY = (1/n) Σ_{j,k=1}^n A_{jk}.

Generalizes the scalar statistics studied by Hoeffding (1951).
Example

Sampling without replacement from {B₁, …, B_n}:

    W = Σ_{j=1}^s B_{π(j)}.

Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 31 / 35
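A small numpy check of the mean formula (my own illustration, not from the talk): averaging Y = Σ_j A_{jπ(j)} over all n! permutations recovers (1/n) Σ_{j,k} A_{jk} exactly.

```python
import itertools
import numpy as np

n = 3
rng = np.random.default_rng(1)
# A deterministic n x n array of 2x2 symmetric matrices.
A = rng.standard_normal((n, n, 2, 2))
A = (A + np.swapaxes(A, 2, 3)) / 2

# Average Y = sum_j A[j, pi(j)] over all n! permutations.
perms = list(itertools.permutations(range(n)))
EY_exact = sum(sum(A[j, pi[j]] for j in range(n)) for pi in perms) / len(perms)

EY_formula = A.sum(axis=(0, 1)) / n   # (1/n) sum_{j,k} A[j,k]
print(np.allclose(EY_exact, EY_formula))  # True
```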
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
    Y = Σ_{j=1}^n A_{jπ(j)}, with mean EY = (1/n) Σ_{j,k=1}^n A_{jk}.
Matrix Stein Pair for X = Y − EY
Draw indices (J, K) uniformly at random from {1, …, n}².

Define π′ = π ∘ (J K) and X′ = Σ_{j=1}^n A_{jπ′(j)} − EY.
Check Stein pair condition
    E[X − X′ | π] = E[A_{Jπ(J)} + A_{Kπ(K)} − A_{Jπ(K)} − A_{Kπ(J)} | π]
                  = (1/n²) Σ_{j,k=1}^n (A_{jπ(j)} + A_{kπ(k)} − A_{jπ(k)} − A_{kπ(j)})
                  = (2/n)(Y − EY) = (2/n) X.

Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 32 / 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
    Y = Σ_{j=1}^n A_{jπ(j)}, with mean EY = (1/n) Σ_{j,k=1}^n A_{jk}.
Conditional Variance for X = Y − EY
    ∆X(π) = (n/4) E[(X − X′)² | π]
           = (1/(4n)) Σ_{j,k=1}^n [A_{jπ(j)} + A_{kπ(k)} − A_{jπ(k)} − A_{kπ(j)}]²
           ⪯ (1/n) Σ_{j,k=1}^n [A_{jπ(j)}² + A_{kπ(k)}² + A_{jπ(k)}² + A_{kπ(j)}²]

⇒ Conditional variance controlled when summands are bounded

⇒ Dependent analogues of concentration and moment inequalities

Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 33 / 35
Extensions
Extensions
General Complex Matrices
Map any matrix B ∈ C^{d₁×d₂} to a Hermitian matrix via dilation:

    D(B) = [ 0   B
             B*  0 ] ∈ H^{d₁+d₂}

Preserves spectral information: λmax(D(B)) = ‖B‖.
Beyond Sums
Matrix-valued functions satisfying a self-reproducing property
e.g., matrix second-order Rademacher chaos: Σ_{j,k} ε_j ε_k A_{jk}
Yields a dependent bounded differences inequality for matrices
Generalized Matrix Stein Pairs

Satisfy E[g(X) − g(X′) | Z] = αX almost surely, for g : R → R weakly increasing.

Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 34 / 35
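The dilation identity λmax(D(B)) = ‖B‖ can be confirmed numerically (a quick sketch of my own):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((3, 4))          # a general rectangular matrix

# Hermitian dilation D(B) = [[0, B], [B^T, 0]].
D = np.block([[np.zeros((3, 3)), B],
              [B.T, np.zeros((4, 4))]])

lam_max = np.linalg.eigvalsh(D).max()
spec_norm = np.linalg.norm(B, 2)         # largest singular value of B
print(np.isclose(lam_max, spec_norm))    # True: dilation preserves ||B||
```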
Extensions
References I

Ahlswede, R. and Winter, A. Strong converse for identification via quantum channels. IEEE Trans. Inform. Theory, 48(3):569–579, Mar. 2002.

Bernstein, S. The Theory of Probabilities. Gastehizdat Publishing House, 1946.

Burkholder, D. L. Distribution function inequalities for martingales. Ann. Probab., 1:19–42, 1973. doi:10.1214/aop/1176997023.

Candès, E. J. and Recht, B. Exact matrix completion via convex optimization. Found. Comput. Math., 9:717–772, 2009.

Candès, E. J. and Tao, T. The power of convex relaxation: near-optimal matrix completion. IEEE Trans. Inform. Theory, 2009. To appear. Available at arXiv:0903.1476.

Chatterjee, S. Stein's method for concentration inequalities. Probab. Theory Related Fields, 138:305–321, 2007.

Chatterjee, S. Concentration inequalities with exchangeable pairs. PhD thesis, Stanford University, Palo Alto, Feb. 2008. URL arXiv:math/0507526v1.

Cheung, S.-S., So, A. Man-Cho, and Wang, K. Chance-constrained linear matrix inequalities with dependent perturbations: a safe tractable approximation approach. Available at http://www.optimization-online.org/DB_FILE/2011/01/2898.pdf, 2011.

Christofides, D. and Markström, K. Expansion properties of random Cayley graphs and vertex transitive graphs via matrix martingales. Random Struct. Algorithms, 32(1):88–100, 2008.

Drineas, P., Mahoney, M. W., and Muthukrishnan, S. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30:844–881, 2008.

Foygel, R. and Srebro, N. Concentration-based guarantees for low-rank matrix reconstruction. Journal of Machine Learning Research - Proceedings Track, 19:315–340, 2011.

Gross, D. Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory, 57(3):1548–1566, Mar. 2011.

Hoeffding, W. A combinatorial central limit theorem. Ann. Math. Statist., 22:558–566, 1951.

Hoeffding, W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.

Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 35 / 35
Extensions
References II
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. Available at arXiv:1104.1672, 2011a.

Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. arXiv:1104.1672v3 [math.PR], 2011b.

Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities. Ann. Probab., 31(2):948–995, 2003.

Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities II: applications. Israel J. Math., 167:227–282, 2008.

Lust-Piquard, F. Inégalités de Khintchine dans C_p (1 < p < ∞). C. R. Math. Acad. Sci. Paris, 303(7):289–292, 1986.

Lust-Piquard, F. and Pisier, G. Noncommutative Khintchine and Paley inequalities. Ark. Mat., 29(2):241–260, 1991.

Mackey, L., Talwalkar, A., and Jordan, M. I. Divide-and-conquer matrix factorization. In Shawe-Taylor, J., Zemel, R. S., Bartlett, P. L., Pereira, F. C. N., and Weinberger, K. Q. (eds.), Advances in Neural Information Processing Systems 24, pp. 1134–1142, 2011.

Mackey, L., Jordan, M. I., Chen, R. Y., Farrell, B., and Tropp, J. A. Matrix concentration inequalities via the method of exchangeable pairs. URL http://arxiv.org/abs/1201.6002, 2012.

Negahban, S. and Wainwright, M. J. Restricted strong convexity and weighted matrix completion: optimal bounds with noise. arXiv:1009.2118v2 [cs.IT], 2010.

Nemirovski, A. Sums of random symmetric matrices and quadratic optimization under orthogonality constraints. Math. Program., 109:283–317, January 2007. ISSN 0025-5610. doi:10.1007/s10107-006-0033-0. URL http://dl.acm.org/citation.cfm?id=1229716.1229726.

Oliveira, R. I. Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. Available at arXiv:0911.0600, Nov. 2009.

Pisier, G. and Xu, Q. Non-commutative martingale inequalities. Comm. Math. Phys., 189(3):667–698, 1997.

Recht, B. A simpler approach to matrix completion. J. Mach. Learn. Res., 12:3413–3430, 2011.

Rudelson, M. and Vershynin, R. Sampling from large matrices: an approach through geometric functional analysis. J. Assoc. Comput. Mach., 54(4):Article 21, 19 pp., Jul. 2007 (electronic).

So, A. Man-Cho. Moment inequalities for sums of random matrices and their applications in optimization. Math. Program., 130(1):125–151, 2011.

Stein, C. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. 6th Berkeley Symp. Math. Statist. Probab., Berkeley, 1972. Univ. California Press.

Tropp, J. A. User-friendly tail bounds for sums of random matrices. Found. Comput. Math., August 2011.

Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 36 / 35
Motivation Matrix Completion
Motivation Matrix Completion
Goal: recover a matrix L₀ ∈ R^{m×n} from a subset of its entries.

[Figure: a partially observed ratings matrix on the left is completed to a full matrix on the right]
Examples

Collaborative filtering: How will user i rate movie j?

Ranking on the web: Is URL j relevant to user i?

Link prediction: Is user i friends with user j?

Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 3 / 35
Motivation Matrix Completion
Motivation Matrix Completion
Goal: recover a matrix L₀ ∈ R^{m×n} from a subset of its entries.

[Figure: a partially observed ratings matrix on the left is completed to a full matrix on the right]
Bad News: impossible to recover a generic matrix. Too many degrees of freedom, too few observations.

Good News

A small number of latent factors determines preferences: movie ratings cluster by genre and director.

    L₀ = A B^⊤

These low-rank matrices are easier to complete.

Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 4 / 35
Motivation Matrix Completion
How to Complete a Low-rank Matrix
Suppose Ω is the set of observed entry locations
First attempt:

    minimize_A  rank(A)
    subject to  A_ij = (L₀)_ij, (i, j) ∈ Ω

Problem: NP-hard ⇒ computationally intractable.
Solution: solve the convex relaxation

    minimize_A  ‖A‖_*
    subject to  A_ij = (L₀)_ij, (i, j) ∈ Ω

where ‖A‖_* = Σ_k σ_k(A) is the trace/nuclear norm of A.

Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 5 / 35
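The nuclear norm above is just the sum of singular values, so it is a one-liner in numpy (my own illustration, not from the talk; the rank-1 example makes the value easy to predict):

```python
import numpy as np

def nuclear_norm(A):
    # ||A||_* = sum of singular values of A.
    return np.linalg.svd(A, compute_uv=False).sum()

rng = np.random.default_rng(3)
u, v = rng.standard_normal(5), rng.standard_normal(4)
L0 = np.outer(u, v)                      # a rank-1 matrix

# For a rank-1 matrix, the nuclear norm is ||u||_2 * ||v||_2.
print(np.isclose(nuclear_norm(L0), np.linalg.norm(u) * np.linalg.norm(v)))  # True
```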
Motivation Matrix Completion
Can Convex Optimization Recover L₀?

Yes, with high probability.
Theorem (Recht 2011)
If L₀ ∈ R^{m×n} has rank r and s ≳ βrn log²(n) entries are observed uniformly at random, then (under some technical conditions) convex optimization recovers L₀ exactly with probability at least 1 − n^{−β}.

See also Gross (2011); Mackey, Talwalkar, and Jordan (2011).

Past results (Candès and Recht 2009; Candès and Tao 2009) required stronger assumptions and more intensive analysis.

The streamlined approach reposes on a matrix variant of a classical Bernstein inequality (1946).

Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 6 / 35
Motivation Matrix Completion
Scalar Bernstein Inequality
Theorem (Bernstein 1946)
Let (Y_k)_{k≥1} be independent random variables in R satisfying

    EY_k = 0 and |Y_k| ≤ R for each index k.

Define the variance parameter

    σ² = Σ_k EY_k².
Then, for all t ≥ 0,

    P{|Σ_k Y_k| ≥ t} ≤ 2 · exp{−t² / (2σ² + 2Rt/3)}.

Gaussian decay controlled by the variance when t is small.

Exponential decay controlled by the uniform bound for large t.

Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 7 / 35
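A quick exact check of the scalar bound (my own toy example with arbitrarily chosen values a_k): enumerate all sign patterns of Y_k = ε_k a_k and compare the exact tail to the Bernstein bound.

```python
import itertools
import numpy as np

a = np.array([1.0, 0.7, 0.4, 0.9])        # Y_k = eps_k * a_k, illustrative values
R = np.abs(a).max()                        # uniform bound |Y_k| <= R
sigma2 = np.sum(a ** 2)                    # sum_k E Y_k^2

t = 2.0
signs = list(itertools.product([-1, 1], repeat=len(a)))
tail = np.mean([abs(np.dot(eps, a)) >= t for eps in signs])
bound = 2 * np.exp(-t**2 / (2 * sigma2 + 2 * R * t / 3))
print(tail, bound)  # Bernstein's inequality guarantees tail <= bound
```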
Motivation Matrix Completion
Matrix Bernstein Inequality
Theorem (Mackey, Jordan, Chen, Farrell, and Tropp 2012)

Let (Y_k)_{k≥1} be independent matrices in R^{m×n} satisfying

    EY_k = 0 and ‖Y_k‖ ≤ R for each index k.
Define the variance parameter

    σ² = max{ ‖Σ_k E[Y_k Y_k^⊤]‖, ‖Σ_k E[Y_k^⊤ Y_k]‖ }.
Then, for all t ≥ 0,

    P{‖Σ_k Y_k‖ ≥ t} ≤ (m + n) · exp{−t² / (3σ² + 2Rt)}.

See also Tropp (2011), Oliveira (2009), Recht (2011).

Gaussian tail when t is small, exponential tail for large t.

Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 8 / 35
Motivation Matrix Completion
Matrix Bernstein Inequality
Theorem (Mackey, Jordan, Chen, Farrell, and Tropp 2012)

For all t ≥ 0,

    P{‖Σ_k Y_k‖ ≥ t} ≤ (m + n) · exp{−t² / (3σ² + 2Rt)}.
Consequences for matrix completion
Recht (2011) showed that uniform sampling of entries captures most of the information in incoherent low-rank matrices.

Negahban and Wainwright (2010) showed that i.i.d. sampling of entries captures most of the information in non-spiky (near) low-rank matrices.

Foygel and Srebro (2011) characterized the generalization error of convex MC through Rademacher complexity.

Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 9 / 35
Motivation Matrix Concentration
Concentration Inequalities
Matrix concentration
    P{λmax(X − EX) ≥ t} ≤ δ

Difficulty: matrix multiplication is not commutative

    ⇒ e^{X+Y} ≠ e^X e^Y in general.
Past approaches (Ahlswede and Winter 2002; Oliveira 2009; Tropp 2011)
Rely on deep results from matrix analysis
Apply to sums of independent matrices and matrix martingales
This work

Stein's method of exchangeable pairs (1972), as advanced by Chatterjee (2007) for scalar concentration:
⇒ Improved exponential tail inequalities (Hoeffding, Bernstein)
⇒ Polynomial moment inequalities (Khintchine, Rosenthal)
⇒ Dependent sums and more general matrix functionals

Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 10 / 35
Motivation Matrix Concentration
Roadmap
1 Motivation
2 Stein's Method: Background and Notation
3 Exponential Tail Inequalities
4 Polynomial Moment Inequalities
5 Dependent Sequences
6 Extensions

Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 11 / 35
Background
Notation
Hermitian matrices: H^d = {A ∈ C^{d×d} : A = A*}. All matrices in this talk are Hermitian.

Maximum eigenvalue: λmax(·).

Trace: tr B, the sum of the diagonal entries of B.

Spectral norm: ‖B‖, the maximum singular value of B.

Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 12 / 35
Background
Matrix Stein Pair
Definition (Exchangeable Pair)

(Z, Z′) is an exchangeable pair if (Z, Z′) =_d (Z′, Z).

Definition (Matrix Stein Pair)

Let (Z, Z′) be an exchangeable pair, and let Ψ : Z → H^d be a measurable function. Define the random matrices

    X = Ψ(Z) and X′ = Ψ(Z′).

(X, X′) is a matrix Stein pair with scale factor α ∈ (0, 1] if

    E[X′ | Z] = (1 − α)X.

Matrix Stein pairs are exchangeable pairs.

Matrix Stein pairs always have zero mean.

Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 13 / 35
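The Stein pair condition E[X′ | Z] = (1 − α)X can be verified exactly for a Rademacher-series pair in which Z′ resamples one uniformly chosen sign (my own sketch; here α = 1/n and the matrices A_k are arbitrary):

```python
import numpy as np

A = [np.array([[0.9, 0.2], [0.2, -0.1]]),
     np.array([[0.3, -0.5], [-0.5, 0.6]]),
     np.array([[-0.2, 0.4], [0.4, 0.7]])]
n = len(A)

eps = np.array([1, -1, 1])                # a fixed realization of Z
X = sum(e * M for e, M in zip(eps, A))

# Z' replaces a uniformly chosen coordinate K with a fresh Rademacher sign;
# average X' = X + (eps' - eps_K) A_K exactly over K and eps'.
EXp = sum(X + (ep - eps[K]) * A[K]
          for K in range(n) for ep in (-1, 1)) / (2 * n)

print(np.allclose(EXp, (1 - 1.0 / n) * X))  # True: scale factor alpha = 1/n
```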
Background
The Conditional Variance
Definition (Conditional Variance)
Suppose that (XX prime) is a matrix Stein pair with scale factor αconstructed from the exchangeable pair (ZZ prime) The conditional
variance is the random matrix
∆X = ∆X(Z) =1
2αE[
(X minusX prime)2 |Z]
∆X is a stochastic estimate for the variance EX2
Take-home Message
Control over ∆X yields control over λmax(X)
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 14 35
Exponential Tail Inequalities
Exponential Concentration for Random Matrices
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let (XX prime) be a matrix Stein pair with X isin Hd Suppose that
∆X 4 cX + v I almost surely for c v ge 0
Then for all t ge 0
Pλmax(X) ge t le d middot exp minust22v + 2ct
Control over the conditional variance ∆X yields
Gaussian tail for λmax(X) for small t exponential tail for large t
When d = 1 improves scalar result of Chatterjee (2007)
The dimensional factor d cannot be removed
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 15 35
Exponential Tail Inequalities
Matrix Hoeffding Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let X =sum
k Yk for independent matrices in Hd satisfying
EYk = 0 and Y 2k 4 A2
k
for deterministic matrices (Ak)kge1 Define the variance parameter
σ2 =∥
∥
∥
sum
kA2k
∥
∥
∥
Then for all t ge 0
P
λmax
(
sum
kYk
)
ge t
le d middot eminust22σ2
Improves upon the matrix Hoeffding inequality of Tropp (2011)Optimal constant 12 in the exponent
Can replace variance parameter with σ2 = 12
∥
∥
sum
k
(
A2k + EY 2
k
)∥
∥
Tighter than classical Hoeffding inequality (1963) when d = 1
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 16 35
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
1 Matrix Laplace transform method (Ahlswede amp Winter 2002)
Relate tail probability to the trace of the mgf of X
Pλmax(X) ge t le infθgt0
eminusθt middotm(θ)
where m(θ) = E tr eθX
Problem eX+Y 6= eXeY when XY isin Hd
How to bound the trace mgf
Past approaches Golden-Thompson Liebrsquos concavity theorem
Chatterjeersquos strategy for scalar concentration
Control mgf growth by bounding derivative
mprime(θ) = E trXeθX for θ isin R
Rewrite using exchangeable pairs
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 17 35
Exponential Tail Inequalities
Method of Exchangeable Pairs
Lemma
Suppose that (XX prime) is a matrix Stein pair with scale factor α LetF Hd rarr Hd be a measurable function satisfying
E(X minusX prime)F (X) ltinfin
Then
E[X F (X)] =1
2αE[(X minusX prime)(F (X)minus F (X prime))] (1)
Intuition
Can characterize the distribution of a random matrix byintegrating it against a class of test functions F
Eq 1 allows us to estimate this integral using the smoothnessproperties of F and the discrepancy X minusX prime
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 18 35
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
2 Method of Exchangeable Pairs
Rewrite the derivative of the trace mgf
mprime(θ) = E trXeθX =1
2αE tr
[
(X minusX prime)(
eθX minus eθXprime)]
Goal Use the smoothness of F (X) = eθX to bound the derivative
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 19 35
Exponential Tail Inequalities
Mean Value Trace Inequality
Lemma (Mackey Jordan Chen Farrell and Tropp 2012)
Suppose that g R rarr R is a weakly increasing function and thath R rarr R is a function whose derivative hprime is convex For allmatrices AB isin Hd it holds that
tr[(g(A)minus g(B)) middot (h(A)minus h(B))] le1
2tr[(g(A)minus g(B)) middot (AminusB) middot (hprime(A) + hprime(B))]
Standard matrix functions If g R rarr R and
A = Q
λ1
λd
Qlowast then g(A) = Q
g(λ1)
g(λd)
Qlowast
Inequality does not hold without the traceFor exponential concentration we let g(A) = A and h(B) = eθB
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 20 35
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
3 Mean Value Trace Inequality
Bound the derivative of the trace mgf
mprime(θ) =1
2αE tr
[
(X minusX prime)(
eθX minus eθXprime)]
le θ
4αE tr
[
(X minusX prime)2 middot(
eθX + eθXprime)]
=θ
2αE tr
[
(X minusX prime)2 middot eθX]
= θ middot E tr
[
1
2αE[
(X minusX prime)2 |Z]
middot eθX]
= θ middot E tr[
∆X eθX]
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 21 35
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
3 Mean Value Trace Inequality
Bound the derivative of the trace mgf
mprime(θ) le θ middot E tr[
∆X eθX]
4 Conditional Variance Bound ∆X 4 cX + v I
Yields differential inequality
mprime(θ) le cθE tr[
X eθX]
+ vθE tr[
eθX]
= cθ middotmprime(θ) + vθ middotm(θ)
Solve to bound m(θ) and thereby bound
Pλmax(X) ge t le infθgt0
eminusθt middotm(θ) le d middot exp minust22v + 2ct
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 22 35
Exponential Tail Inequalities
Refined Exponential Concentration
Relaxing the constraint ∆X 4 cX + v
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let (XX prime) be a bounded matrix Stein pair with X isin Hd Definethe function
r(ψ) =1
ψlogE tr(eψ∆Xd) for each ψ gt 0
Then for all t ge 0 and all ψ gt 0
Pλmax(X) ge t le d middot exp minust22r(ψ) + 2t
radicψ
r(ψ) measures typical magnitude of conditional variance
Eλmax(∆X) le infψgt0
[
r(ψ) + log dψ
]
When d = 1 improves scalar result of Chatterjee (2008)Proof extends to unbounded random matricesMackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 23 35
Exponential Tail Inequalities
Matrix Bernstein Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let (Yk)kge1 be independent matrices in Hd satisfying
EYk = 0 and Yk le R for each index k
Define the variance parameter
σ2 =∥
∥
∥
sum
kEY 2
k
∥
∥
∥
Then for all t ge 0
P
λmax
(
sum
kYk
)
ge t
le d middot exp minust23σ2 + 2Rt
Gaussian tail controlled by improved variance when t is smallKey proof idea Apply refined concentration and boundr(ψ) = 1
ψlogE tr(eψ∆Xd) using unrefined concentration
Constants better than Oliveira (2009) worse than Tropp (2011)Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 24 35
Polynomial Moment Inequalities
Polynomial Moments for Random Matrices
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let p = 1 or p ge 15 Suppose that (XX prime) is a matrix Stein pairwhere E tr |X|2p ltinfin Then
(
E tr |X|2p)12p le
radic
2pminus 1 middot(
E tr∆pX
)12p
Moral The conditional variance controls the moments of X
Generalizes Chatterjeersquos version (2007) of the scalarBurkholder-Davis-Gundy inequality (Burkholder 1973)
See also Pisier amp Xu (1997) Junge amp Xu (2003 2008)
Proof techniques mirror those for exponential concentration
Also holds for infinite dimensional Schatten-class operators
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 25 35
Polynomial Moment Inequalities
Matrix Khintchine Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let (εk)kge1 be an independent sequence of Rademacher randomvariables and (Ak)kge1 be a deterministic sequence of Hermitianmatrices Then if p = 1 or p ge 15
E tr(
sum
kεkAk
)2p
le (2pminus 1)p middot tr(
sum
kA2k
)p
Noncommutative Khintchine inequality (Lust-Piquard 1986 Lust-Piquard
and Pisier 1991) is a dominant tool in applied matrix analysis
eg Used in analysis of column sampling and projection forapproximate SVD (Rudelson and Vershynin 2007)
Steinrsquos method offers an unusually concise proof
The constantradic2pminus 1 is within
radice of optimal
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 26 35
Dependent Sequences
Adding Dependence
1 MotivationMatrix CompletionMatrix Concentration
2 Steinrsquos Method Background and Notation
3 Exponential Tail Inequalities
4 Polynomial Moment Inequalities
5 Dependent SequencesSums of Conditionally Zero-mean MatricesCombinatorial Sums
6 Extensions
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 27 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Sum of Conditionally Zero-Mean Matrices)
Given a sequence of Hermitian matrices (Yk)nk=1 satisfying the
Conditional zero mean property E[Yk | (Yj)j 6=k] = 0
for all k define the random sum X =sumn
k=1Yk
Note (Yk)kge1 is a martingale difference sequence
Examples
Sums of independent centered random matricesMany sums of conditionally independent random matrices
Yk perpperp (Yj)j 6=k | Z and E[Yk |Z] = 0
Rademacher series with random matrix coefficients
X =sum
kεkWk
(Wk)kge1 Hermitian (εk)kge1 independent Rademacher
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 28 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Yk | (Yj)j 6=k] = 0
Matrix Stein Pair for X =sumn
k=1 Yk
Let Y primek and Yk be conditionally iid given (Yj)j 6=k
Draw index K uniformly from 1 nDefine X prime = X + Y prime
K minus YK
Check Stein pair condition
E[X minusX prime | (Yj)jge1] = E[YK minus Y primeK | (Yj)jge1]
=1
n
sumn
k=1
(
Yk minus E[Y primek | (Yj)j 6=k]
)
=1
n
sumn
k=1Yk =
1
nX
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 29 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Yk | (Yj)j 6=k] = 0
Conditional Variance for X = Y minus EY
∆X =n
2middot E
[
(X minusX prime)2 | (Yj)jge1
]
=n
2middot E
[
(YK minus Y primeK)
2 | (Yj)jge1
]
=1
2
sumn
k=1
(
Y 2k + E[Y 2
k | (Yj)j 6=k])
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 30 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Given a deterministic array (Ajk)njk=1 of Hermitian matrices and a
uniformly random permutation π on 1 n define thecombinatorial matrix statistic
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Generalizes the scalar statistics studied by Hoeffding (1951)
Example
Sampling without replacement from B1 BnW =
sums
j=1Bπ(j)
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 31 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Matrix Stein Pair for X = Y minus EY
Draw indices (JK) uniformly from 1 n2Define πprime = π (JK) and X prime =
sumnj=1Ajπprime(j) minus EY
Check Stein pair condition
E[X minusX prime | π] = E[
AJπ(J) +AKπ(K) minusAJπ(K) minusAKπ(J) | π]
=1
n2
sumn
jk=1Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
=2
n(Y minus EY ) =
2
nX
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 32 35
Combinatorial Sums of Matrices

Definition (Combinatorial Matrix Statistic)
Y = ∑_{j=1}^n A_{j,π(j)}   with mean   EY = (1/n) ∑_{j,k=1}^n A_{jk}

Conditional Variance for X = Y − EY
Δ_X(π) = (n/4) · E[ (X − X′)² | π ]
       = (1/(4n)) ∑_{j,k=1}^n [ A_{j,π(j)} + A_{k,π(k)} − A_{j,π(k)} − A_{k,π(j)} ]²
       ≼ (1/n) ∑_{j,k=1}^n [ A²_{j,π(j)} + A²_{k,π(k)} + A²_{j,π(k)} + A²_{k,π(j)} ]

⇒ Conditional variance controlled when summands are bounded
⇒ Dependent analogues of concentration and moment inequalities
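The Stein-pair condition E[X − X′ | π] = (2/n) X from the previous slide can be verified exactly in numpy by averaging over all n² transposition indices (J, K) for one fixed permutation. The array A and the permutation π below are arbitrary illustrative choices.

```python
import numpy as np

# Sketch: exact check of E[X - X' | pi] = (2/n) X, where pi' swaps the
# images of two uniformly random indices J and K in pi.
rng = np.random.default_rng(2)
n, d = 5, 3
A = rng.standard_normal((n, n, d, d))
A = (A + A.transpose(0, 1, 3, 2)) / 2           # Hermitian entries

pi = rng.permutation(n)
EY = A.sum(axis=(0, 1)) / n
X = sum(A[j, pi[j]] for j in range(n)) - EY

diff = np.zeros((d, d))
for J in range(n):
    for K in range(n):
        pi2 = pi.copy()
        pi2[J], pi2[K] = pi2[K], pi2[J]         # pi' = pi composed with (J K)
        X2 = sum(A[j, pi2[j]] for j in range(n)) - EY
        diff += (X - X2) / n**2                 # average over uniform (J, K)
assert np.allclose(diff, (2.0 / n) * X)         # E[X - X' | pi] = (2/n) X
```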
Extensions

General Complex Matrices
Map any matrix B ∈ C^{d₁×d₂} to a Hermitian matrix via dilation:
D(B) = [ 0   B
         B*  0 ] ∈ H^{d₁+d₂}
Preserves spectral information: λmax(D(B)) = ‖B‖

Beyond Sums
Matrix-valued functions satisfying a self-reproducing property
e.g., matrix second-order Rademacher chaos ∑_{j,k} ε_j ε_k A_{jk}
Yields a dependent bounded differences inequality for matrices

Generalized Matrix Stein Pairs
Satisfy E[g(X) − g(X′) | Z] = αX almost surely, for g: R → R weakly increasing
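The dilation trick is easy to exercise numerically: build D(B) for a rectangular B and confirm that its largest eigenvalue equals the spectral norm of B. The matrix B below is an arbitrary illustrative example.

```python
import numpy as np

# Sketch: Hermitian dilation D(B) = [[0, B], [B*, 0]] and the identity
# lambda_max(D(B)) = ||B|| (the eigenvalues of D(B) are +/- sigma_i(B)).
def dilation(B):
    d1, d2 = B.shape
    return np.block([[np.zeros((d1, d1)), B],
                     [B.conj().T, np.zeros((d2, d2))]])

rng = np.random.default_rng(3)
B = rng.standard_normal((3, 5))
lam_max = np.linalg.eigvalsh(dilation(B)).max()
assert np.isclose(lam_max, np.linalg.norm(B, 2))  # lambda_max(D(B)) = ||B||
```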
References I

Ahlswede, R. and Winter, A. Strong converse for identification via quantum channels. IEEE Trans. Inform. Theory, 48(3):569–579, Mar. 2002.
Bernstein, S. The Theory of Probabilities. Gastehizdat Publishing House, 1946.
Burkholder, D. L. Distribution function inequalities for martingales. Ann. Probab., 1:19–42, 1973. doi:10.1214/aop/1176997023.
Candès, E. J. and Recht, B. Exact matrix completion via convex optimization. Found. Comput. Math., 9:717–772, 2009.
Candès, E. J. and Tao, T. The power of convex relaxation: Near-optimal matrix completion. IEEE Trans. Inform. Theory, 2009. To appear. Available at arXiv:0903.1476.
Chatterjee, S. Stein's method for concentration inequalities. Probab. Theory Related Fields, 138:305–321, 2007.
Chatterjee, S. Concentration inequalities with exchangeable pairs. PhD thesis, Stanford University, Palo Alto, Feb. 2008. Available at arXiv:math/0507526v1.
Cheung, S.-S., So, A. Man-Cho, and Wang, K. Chance-constrained linear matrix inequalities with dependent perturbations: A safe tractable approximation approach. Available at http://www.optimization-online.org/DB_FILE/2011/01/2898.pdf, 2011.
Christofides, D. and Markström, K. Expansion properties of random Cayley graphs and vertex transitive graphs via matrix martingales. Random Struct. Algorithms, 32(1):88–100, 2008.
Drineas, P., Mahoney, M. W., and Muthukrishnan, S. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30:844–881, 2008.
Foygel, R. and Srebro, N. Concentration-based guarantees for low-rank matrix reconstruction. Journal of Machine Learning Research - Proceedings Track, 19:315–340, 2011.
Gross, D. Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory, 57(3):1548–1566, Mar. 2011.
Hoeffding, W. A combinatorial central limit theorem. Ann. Math. Statist., 22:558–566, 1951.
Hoeffding, W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.
References II

Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. Available at arXiv:1104.1672, 2011a.
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. arXiv:1104.1672v3 [math.PR], 2011b.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities. Ann. Probab., 31(2):948–995, 2003.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities II: Applications. Israel J. Math., 167:227–282, 2008.
Lust-Piquard, F. Inégalités de Khintchine dans C_p (1 < p < ∞). C. R. Math. Acad. Sci. Paris, 303(7):289–292, 1986.
Lust-Piquard, F. and Pisier, G. Noncommutative Khintchine and Paley inequalities. Ark. Mat., 29(2):241–260, 1991.
Mackey, L., Talwalkar, A., and Jordan, M. I. Divide-and-conquer matrix factorization. In Shawe-Taylor, J., Zemel, R. S., Bartlett, P. L., Pereira, F. C. N., and Weinberger, K. Q. (eds.), Advances in Neural Information Processing Systems 24, pp. 1134–1142, 2011.
Mackey, L., Jordan, M. I., Chen, R. Y., Farrell, B., and Tropp, J. A. Matrix concentration inequalities via the method of exchangeable pairs. URL http://arxiv.org/abs/1201.6002, 2012.
Negahban, S. and Wainwright, M. J. Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. arXiv:1009.2118v2 [cs.IT], 2010.
Nemirovski, A. Sums of random symmetric matrices and quadratic optimization under orthogonality constraints. Math. Program., 109:283–317, January 2007. ISSN 0025-5610. doi:10.1007/s10107-006-0033-0.
Oliveira, R. I. Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. Available at arXiv:0911.0600, Nov. 2009.
Pisier, G. and Xu, Q. Non-commutative martingale inequalities. Comm. Math. Phys., 189(3):667–698, 1997.
Recht, B. A simpler approach to matrix completion. J. Mach. Learn. Res., 12:3413–3430, 2011.
Rudelson, M. and Vershynin, R. Sampling from large matrices: An approach through geometric functional analysis. J. Assoc. Comput. Mach., 54(4):Article 21, 19 pp., Jul. 2007 (electronic).
So, A. Man-Cho. Moment inequalities for sums of random matrices and their applications in optimization. Math. Program., 130(1):125–151, 2011.
Stein, C. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. 6th Berkeley Symp. Math. Statist. Probab., Berkeley, 1972. Univ. California Press.
Tropp, J. A. User-friendly tail bounds for sums of random matrices. Found. Comput. Math., August 2011.
Motivation: Matrix Completion

Goal: Recover a matrix L₀ ∈ R^{m×n} from a subset of its entries

[ ·  1  ·  4 ]      [ 2  3  1  4 ]
[ 3  ·  5  · ]  →   [ 3  4  5  1 ]
[ ·  5  ·  5 ]      [ 2  5  3  5 ]

Bad News: Impossible to recover a generic matrix
Too many degrees of freedom, too few observations

Good News
Small number of latent factors determine preferences
Movie ratings cluster by genre and director
L₀ = A B⊤
These low-rank matrices are easier to complete
How to Complete a Low-rank Matrix

Suppose Ω is the set of observed entry locations

First attempt:
    minimize_A   rank(A)
    subject to   A_ij = (L₀)_ij, (i, j) ∈ Ω

Problem: NP-hard ⇒ computationally intractable

Solution: Solve the convex relaxation
    minimize_A   ‖A‖_*
    subject to   A_ij = (L₀)_ij, (i, j) ∈ Ω

where ‖A‖_* = ∑_k σ_k(A) is the trace/nuclear norm of A
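The nuclear norm is just the sum of singular values, which is one SVD call in numpy. A minimal sketch, using an illustrative rank-1 matrix (for a rank-1 outer product uv⊤, the only nonzero singular value is ‖u‖·‖v‖):

```python
import numpy as np

# Sketch: the nuclear (trace) norm ||A||_* = sum_k sigma_k(A), the convex
# surrogate for rank(A) in the relaxation above.
def nuclear_norm(A):
    return np.linalg.svd(A, compute_uv=False).sum()

u, v = np.array([1.0, 2.0, 3.0]), np.array([1.0, 1.0])
L0 = np.outer(u, v)                     # rank-1, so ||L0||_* = sigma_1
assert np.linalg.matrix_rank(L0) == 1
assert np.isclose(nuclear_norm(L0), np.linalg.norm(u) * np.linalg.norm(v))
```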
Can Convex Optimization Recover L₀?

Yes, with high probability.

Theorem (Recht, 2011)
If L₀ ∈ R^{m×n} has rank r and s ≳ βrn log²(n) entries are observed uniformly at random, then (under some technical conditions) convex optimization recovers L₀ exactly with probability at least 1 − n^{−β}.

See also Gross (2011); Mackey, Talwalkar, and Jordan (2011)
Past results (Candès and Recht, 2009; Candès and Tao, 2009) required stronger assumptions and more intensive analysis
The streamlined approach rests on a matrix variant of a classical Bernstein inequality (1946)
Scalar Bernstein Inequality

Theorem (Bernstein, 1946)
Let (Y_k)_{k≥1} be independent random variables in R satisfying
EY_k = 0 and |Y_k| ≤ R for each index k.
Define the variance parameter
σ² = ∑_k EY_k².
Then, for all t ≥ 0,
P{ |∑_k Y_k| ≥ t } ≤ 2 · exp( −t² / (2σ² + 2Rt/3) )

Gaussian decay controlled by variance when t is small
Exponential decay controlled by uniform bound for large t
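A Monte Carlo sketch of the bound, using an assumed setup (Uniform[−R, R] summands, so EY_k = 0, |Y_k| ≤ R, and EY_k² = R²/3): the empirical tail of |∑_k Y_k| should sit below the Bernstein envelope at every threshold.

```python
import numpy as np

# Sketch: compare the empirical tail of |sum_k Y_k| against the Bernstein
# bound 2*exp(-t^2 / (2*sigma^2 + 2*R*t/3)) for bounded, centered Y_k.
rng = np.random.default_rng(0)
n, R = 50, 1.0
sigma2 = n * (R**2 / 3.0)               # sum_k E Y_k^2 for Uniform[-R, R]

def bernstein_bound(t):
    return 2.0 * np.exp(-t**2 / (2.0 * sigma2 + 2.0 * R * t / 3.0))

samples = rng.uniform(-R, R, size=(20000, n)).sum(axis=1)
for t in (2.0, 6.0, 10.0):
    empirical = np.mean(np.abs(samples) >= t)
    assert empirical <= bernstein_bound(t)   # the bound holds, with margin
```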
Matrix Bernstein Inequality

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let (Y_k)_{k≥1} be independent matrices in R^{m×n} satisfying
EY_k = 0 and ‖Y_k‖ ≤ R for each index k.
Define the variance parameter
σ² = max{ ‖∑_k E Y_k Y_k⊤‖, ‖∑_k E Y_k⊤ Y_k‖ }.
Then, for all t ≥ 0,
P{ ‖∑_k Y_k‖ ≥ t } ≤ (m + n) · exp( −t² / (3σ² + 2Rt) )

See also Tropp (2011); Oliveira (2009); Recht (2011)
Gaussian tail when t is small, exponential tail for large t
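The variance parameter and the bound are mechanical to compute. A sketch for an illustrative family Y_k = ε_k B_k of bounded, centered rectangular matrices (any family with EY_k = 0 and ‖Y_k‖ ≤ R would do):

```python
import numpy as np

# Sketch: matrix Bernstein variance parameter and tail bound for
# Y_k = eps_k * B_k with ||B_k|| normalized to 1 (so R = 1, E eps_k^2 = 1).
rng = np.random.default_rng(4)
m, n, num = 3, 4, 6
B = rng.standard_normal((num, m, n))
B /= np.array([np.linalg.norm(Bk, 2) for Bk in B])[:, None, None]
R = 1.0

# sigma^2 = max(||sum_k E Y_k Y_k^T||, ||sum_k E Y_k^T Y_k||)
row = sum(Bk @ Bk.T for Bk in B)
col = sum(Bk.T @ Bk for Bk in B)
sigma2 = max(np.linalg.norm(row, 2), np.linalg.norm(col, 2))

def bernstein_bound(t):
    return (m + n) * np.exp(-t**2 / (3 * sigma2 + 2 * R * t))

assert bernstein_bound(0.0) == m + n                 # trivial at t = 0
assert bernstein_bound(8.0) < bernstein_bound(4.0)   # decays in t
```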
Matrix Bernstein Inequality

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
For all t ≥ 0,
P{ ‖∑_k Y_k‖ ≥ t } ≤ (m + n) · exp( −t² / (3σ² + 2Rt) )

Consequences for matrix completion
Recht (2011) showed that uniform sampling of entries captures most of the information in incoherent low-rank matrices
Negahban and Wainwright (2010) showed that i.i.d. sampling of entries captures most of the information in non-spiky (near) low-rank matrices
Foygel and Srebro (2011) characterized the generalization error of convex MC through Rademacher complexity
Concentration Inequalities

Matrix concentration:
P{ λmax(X − EX) ≥ t } ≤ δ

Difficulty: Matrix multiplication is not commutative
⇒ e^{X+Y} ≠ e^X e^Y

Past approaches (Ahlswede and Winter, 2002; Oliveira, 2009; Tropp, 2011)
Rely on deep results from matrix analysis
Apply to sums of independent matrices and matrix martingales

This work
Stein's method of exchangeable pairs (1972), as advanced by Chatterjee (2007) for scalar concentration
⇒ Improved exponential tail inequalities (Hoeffding, Bernstein)
⇒ Polynomial moment inequalities (Khintchine, Rosenthal)
⇒ Dependent sums and more general matrix functionals
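The noncommutativity obstacle is concrete: e^{X+Y} = e^X e^Y only when X and Y commute. A small numpy sketch (the Hermitian exponential is computed by acting on eigenvalues; X and Y are illustrative 2×2 examples):

```python
import numpy as np

# Sketch: for Hermitian A = Q diag(w) Q*, e^A = Q diag(e^w) Q*.
def expm_h(A):
    w, Q = np.linalg.eigh(A)
    return (Q * np.exp(w)) @ Q.conj().T

X = np.array([[0.0, 1.0], [1.0, 0.0]])
Y = np.array([[1.0, 0.0], [0.0, -1.0]])          # XY != YX
assert not np.allclose(expm_h(X + Y), expm_h(X) @ expm_h(Y))
assert np.allclose(expm_h(X + X), expm_h(X) @ expm_h(X))  # commuting case
```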
Roadmap

1. Motivation
2. Stein's Method: Background and Notation
3. Exponential Tail Inequalities
4. Polynomial Moment Inequalities
5. Dependent Sequences
6. Extensions
Notation

Hermitian matrices: H^d = { A ∈ C^{d×d} : A = A* }. All matrices in this talk are Hermitian.
Maximum eigenvalue: λmax(·)
Trace: tr B, the sum of the diagonal entries of B
Spectral norm: ‖B‖, the maximum singular value of B
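These three primitives in numpy, on an illustrative Hermitian matrix:

```python
import numpy as np

# Sketch: lambda_max, trace, and spectral norm for A in H^2.
A = np.array([[2.0, 1.0], [1.0, 2.0]])           # A = A*
lam_max = np.linalg.eigvalsh(A).max()            # maximum eigenvalue
trace = np.trace(A)                              # sum of diagonal entries
spec_norm = np.linalg.norm(A, 2)                 # maximum singular value
assert np.isclose(lam_max, 3.0) and np.isclose(trace, 4.0)
```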
Matrix Stein Pair

Definition (Exchangeable Pair)
(Z, Z′) is an exchangeable pair if (Z, Z′) and (Z′, Z) are equal in distribution.

Definition (Matrix Stein Pair)
Let (Z, Z′) be an exchangeable pair, and let Ψ: Z → H^d be a measurable function. Define the random matrices
X = Ψ(Z) and X′ = Ψ(Z′).
(X, X′) is a matrix Stein pair with scale factor α ∈ (0, 1] if
E[X′ | Z] = (1 − α)X.

Matrix Stein pairs are exchangeable pairs
Matrix Stein pairs always have zero mean
The Conditional Variance

Definition (Conditional Variance)
Suppose that (X, X′) is a matrix Stein pair with scale factor α, constructed from the exchangeable pair (Z, Z′). The conditional variance is the random matrix
Δ_X = Δ_X(Z) = (1/(2α)) · E[ (X − X′)² | Z ].

Δ_X is a stochastic estimate for the variance EX²

Take-home Message
Control over Δ_X yields control over λmax(X)
Exponential Concentration for Random Matrices

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let (X, X′) be a matrix Stein pair with X ∈ H^d. Suppose that
Δ_X ≼ cX + vI almost surely, for c, v ≥ 0.
Then, for all t ≥ 0,
P{ λmax(X) ≥ t } ≤ d · exp( −t² / (2v + 2ct) )

Control over the conditional variance Δ_X yields:
Gaussian tail for λmax(X) for small t, exponential tail for large t
When d = 1, improves scalar result of Chatterjee (2007)
The dimensional factor d cannot be removed
Matrix Hoeffding Inequality

Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let X = ∑_k Y_k for independent matrices in H^d satisfying
EY_k = 0 and Y_k² ≼ A_k²
for deterministic matrices (A_k)_{k≥1}. Define the variance parameter
σ² = ‖∑_k A_k²‖.
Then, for all t ≥ 0,
P{ λmax(∑_k Y_k) ≥ t } ≤ d · e^{−t²/2σ²}

Improves upon the matrix Hoeffding inequality of Tropp (2011): optimal constant 1/2 in the exponent
Can replace variance parameter with σ² = (1/2) ‖∑_k (A_k² + EY_k²)‖
Tighter than classical Hoeffding inequality (1963) when d = 1
Exponential Concentration: Proof Sketch

1. Matrix Laplace transform method (Ahlswede & Winter, 2002)
Relate tail probability to the trace of the mgf of X:
P{ λmax(X) ≥ t } ≤ inf_{θ>0} e^{−θt} · m(θ), where m(θ) = E tr e^{θX}

Problem: e^{X+Y} ≠ e^X e^Y when X, Y ∈ H^d
How to bound the trace mgf?
Past approaches: Golden–Thompson, Lieb's concavity theorem
Chatterjee's strategy for scalar concentration:
Control mgf growth by bounding the derivative
m′(θ) = E tr X e^{θX} for θ ∈ R
Rewrite using exchangeable pairs
Method of Exchangeable Pairs

Lemma
Suppose that (X, X′) is a matrix Stein pair with scale factor α. Let F: H^d → H^d be a measurable function satisfying
E‖(X − X′)F(X)‖ < ∞.
Then
E[X F(X)] = (1/(2α)) · E[ (X − X′)(F(X) − F(X′)) ].   (1)

Intuition
Can characterize the distribution of a random matrix by integrating it against a class of test functions F
Eq. (1) allows us to estimate this integral using the smoothness properties of F and the discrepancy X − X′
Exponential Concentration: Proof Sketch

2. Method of Exchangeable Pairs
Rewrite the derivative of the trace mgf:
m′(θ) = E tr X e^{θX} = (1/(2α)) · E tr[ (X − X′)( e^{θX} − e^{θX′} ) ]

Goal: Use the smoothness of F(X) = e^{θX} to bound the derivative
Mean Value Trace Inequality

Lemma (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Suppose that g: R → R is a weakly increasing function and that h: R → R is a function whose derivative h′ is convex. For all matrices A, B ∈ H^d, it holds that
tr[ (g(A) − g(B)) · (h(A) − h(B)) ] ≤ (1/2) · tr[ (g(A) − g(B)) · (A − B) · (h′(A) + h′(B)) ].

Standard matrix functions: if g: R → R and A = Q diag(λ₁, …, λ_d) Q*, then g(A) = Q diag(g(λ₁), …, g(λ_d)) Q*
The inequality does not hold without the trace
For exponential concentration, we let g(A) = A and h(B) = e^{θB}
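Both the standard-matrix-function construction and the lemma itself are easy to exercise numerically. A sketch with g(x) = x and h(x) = e^x (so g is weakly increasing and h′ = exp is convex), on illustrative random Hermitian A and B:

```python
import numpy as np

# Sketch: apply a scalar function to a Hermitian matrix via its eigenvalues,
# then check the mean value trace inequality for g = id, h = exp.
def mat_fn(g, A):
    w, Q = np.linalg.eigh(A)
    return (Q * g(w)) @ Q.conj().T          # Q diag(g(lambda)) Q*

rng = np.random.default_rng(5)
M, N = rng.standard_normal((2, 3, 3))
A, B = (M + M.T) / 2, (N + N.T) / 2         # Hermitian

lhs = np.trace((A - B) @ (mat_fn(np.exp, A) - mat_fn(np.exp, B)))
rhs = 0.5 * np.trace((A - B) @ (A - B) @ (mat_fn(np.exp, A) + mat_fn(np.exp, B)))
assert lhs <= rhs + 1e-12                   # mean value trace inequality
```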
Exponential Concentration: Proof Sketch

3. Mean Value Trace Inequality
Bound the derivative of the trace mgf:
m′(θ) = (1/(2α)) · E tr[ (X − X′)( e^{θX} − e^{θX′} ) ]
      ≤ (θ/(4α)) · E tr[ (X − X′)² · ( e^{θX} + e^{θX′} ) ]
      = (θ/(2α)) · E tr[ (X − X′)² · e^{θX} ]
      = θ · E tr[ (1/(2α)) · E[ (X − X′)² | Z ] · e^{θX} ]
      = θ · E tr[ Δ_X e^{θX} ]
Exponential Concentration: Proof Sketch

3. Mean Value Trace Inequality
Bound the derivative of the trace mgf:
m′(θ) ≤ θ · E tr[ Δ_X e^{θX} ]

4. Conditional Variance Bound: Δ_X ≼ cX + vI
Yields a differential inequality:
m′(θ) ≤ cθ · E tr[ X e^{θX} ] + vθ · E tr[ e^{θX} ] = cθ · m′(θ) + vθ · m(θ)
Solve to bound m(θ) and thereby bound
P{ λmax(X) ≥ t } ≤ inf_{θ>0} e^{−θt} · m(θ) ≤ d · exp( −t² / (2v + 2ct) )
Refined Exponential Concentration

Relaxing the constraint Δ_X ≼ cX + vI:

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let (X, X′) be a bounded matrix Stein pair with X ∈ H^d. Define the function
r(ψ) = (1/ψ) · log( (1/d) · E tr e^{ψΔ_X} ) for each ψ > 0.
Then, for all t ≥ 0 and all ψ > 0,
P{ λmax(X) ≥ t } ≤ d · exp( −t² / (2r(ψ) + 2t/√ψ) )

r(ψ) measures the typical magnitude of the conditional variance:
E λmax(Δ_X) ≤ inf_{ψ>0} [ r(ψ) + (log d)/ψ ]
When d = 1, improves scalar result of Chatterjee (2008)
Proof extends to unbounded random matrices
Matrix Bernstein Inequality

Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let (Y_k)_{k≥1} be independent matrices in H^d satisfying
EY_k = 0 and ‖Y_k‖ ≤ R for each index k.
Define the variance parameter
σ² = ‖∑_k EY_k²‖.
Then, for all t ≥ 0,
P{ λmax(∑_k Y_k) ≥ t } ≤ d · exp( −t² / (3σ² + 2Rt) )

Gaussian tail controlled by improved variance when t is small
Key proof idea: Apply refined concentration and bound r(ψ) = (1/ψ) · log( (1/d) · E tr e^{ψΔ_X} ) using unrefined concentration
Constants better than Oliveira (2009), worse than Tropp (2011)
Polynomial Moments for Random Matrices

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let p = 1 or p ≥ 1.5. Suppose that (X, X′) is a matrix Stein pair where E tr |X|^{2p} < ∞. Then
( E tr |X|^{2p} )^{1/(2p)} ≤ √(2p − 1) · ( E tr Δ_X^p )^{1/(2p)}.

Moral: The conditional variance controls the moments of X
Generalizes Chatterjee's version (2007) of the scalar Burkholder–Davis–Gundy inequality (Burkholder, 1973)
See also Pisier & Xu (1997); Junge & Xu (2003, 2008)
Proof techniques mirror those for exponential concentration
Also holds for infinite-dimensional Schatten-class operators
Matrix Khintchine Inequality

Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let (ε_k)_{k≥1} be an independent sequence of Rademacher random variables and (A_k)_{k≥1} be a deterministic sequence of Hermitian matrices. Then, if p = 1 or p ≥ 1.5,
E tr( ∑_k ε_k A_k )^{2p} ≤ (2p − 1)^p · tr( ∑_k A_k² )^p.

The noncommutative Khintchine inequality (Lust-Piquard, 1986; Lust-Piquard and Pisier, 1991) is a dominant tool in applied matrix analysis
e.g., used in the analysis of column sampling and projection for approximate SVD (Rudelson and Vershynin, 2007)
Stein's method offers an unusually concise proof
The constant √(2p − 1) is within √e of optimal
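At p = 1 the bound holds with equality, E tr(∑_k ε_k A_k)² = tr ∑_k A_k² (the cross terms vanish since Eε_jε_k = 0 for j ≠ k), which can be verified exactly by enumerating all sign patterns. The A_k below are arbitrary illustrative Hermitian matrices.

```python
import itertools
import numpy as np

# Sketch: exact check of the p = 1 case of matrix Khintchine,
# E tr (sum_k eps_k A_k)^2 = tr sum_k A_k^2, over all 2^n sign patterns.
rng = np.random.default_rng(6)
A = [(M + M.T) / 2 for M in rng.standard_normal((3, 4, 4))]

signs = list(itertools.product([-1.0, 1.0], repeat=len(A)))
total = 0.0
for eps in signs:
    S = sum(e * Ak for e, Ak in zip(eps, A))
    total += np.trace(S @ S) / len(signs)
assert np.isclose(total, np.trace(sum(Ak @ Ak for Ak in A)))
```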
Adding Dependence

1. Motivation
   Matrix Completion
   Matrix Concentration
2. Stein's Method: Background and Notation
3. Exponential Tail Inequalities
4. Polynomial Moment Inequalities
5. Dependent Sequences
   Sums of Conditionally Zero-mean Matrices
   Combinatorial Sums
6. Extensions
Sums of Conditionally Zero-mean Matrices

Definition (Sum of Conditionally Zero-Mean Matrices)
Given a sequence of Hermitian matrices (Y_k)_{k=1}^n satisfying the conditional zero mean property
E[Y_k | (Y_j)_{j≠k}] = 0
for all k, define the random sum X = ∑_{k=1}^n Y_k.

Note: (Y_k)_{k≥1} is a martingale difference sequence

Examples
Sums of independent centered random matrices
Many sums of conditionally independent random matrices:
Y_k ⊥⊥ (Y_j)_{j≠k} | Z and E[Y_k | Z] = 0
Rademacher series with random matrix coefficients:
X = ∑_k ε_k W_k
(W_k)_{k≥1} Hermitian, (ε_k)_{k≥1} independent Rademacher
Sums of Conditionally Zero-mean Matrices

Definition (Conditional Zero Mean Property)
E[Y_k | (Y_j)_{j≠k}] = 0

Matrix Stein Pair for X = ∑_{k=1}^n Y_k
Let Y′_k and Y_k be conditionally i.i.d. given (Y_j)_{j≠k}
Draw index K uniformly from {1, …, n}
Define X′ = X + Y′_K − Y_K

Check Stein pair condition:
E[X − X′ | (Y_j)_{j≥1}] = E[Y_K − Y′_K | (Y_j)_{j≥1}]
                        = (1/n) ∑_{k=1}^n ( Y_k − E[Y′_k | (Y_j)_{j≠k}] )
                        = (1/n) ∑_{k=1}^n Y_k = (1/n) X
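The Stein-pair condition just verified, E[X − X′ | (Y_j)] = (1/n) X, can be checked exactly for a Rademacher series by averaging over the uniform index K and the two equally likely values of the conditionally independent resample ε′_K. The matrices A_k are arbitrary illustrative choices.

```python
import numpy as np

# Sketch: for X = sum_k eps_k A_k, X' = X + Y'_K - Y_K with Y'_K = eps'_K A_K,
# the average of X - X' over K in {1..n} and eps'_K in {-1, +1} is X / n.
rng = np.random.default_rng(7)
n = 4
A = [(M + M.T) / 2 for M in rng.standard_normal((n, 3, 3))]
eps = rng.choice([-1.0, 1.0], size=n)
X = sum(e * Ak for e, Ak in zip(eps, A))

diff = np.zeros((3, 3))
for K in range(n):
    for eps_new in (-1.0, 1.0):
        X2 = X - eps[K] * A[K] + eps_new * A[K]  # X' = X + Y'_K - Y_K
        diff += (X - X2) / (2 * n)               # uniform over K and eps'_K
assert np.allclose(diff, X / n)                  # E[X - X' | .] = (1/n) X
```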
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Yk | (Yj)j 6=k] = 0
Conditional Variance for X = Y minus EY
∆X =n
2middot E
[
(X minusX prime)2 | (Yj)jge1
]
=n
2middot E
[
(YK minus Y primeK)
2 | (Yj)jge1
]
=1
2
sumn
k=1
(
Y 2k + E[Y 2
k | (Yj)j 6=k])
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 30 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Given a deterministic array (Ajk)njk=1 of Hermitian matrices and a
uniformly random permutation π on 1 n define thecombinatorial matrix statistic
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Generalizes the scalar statistics studied by Hoeffding (1951)
Example
Sampling without replacement from B1 BnW =
sums
j=1Bπ(j)
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 31 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Matrix Stein Pair for X = Y minus EY
Draw indices (JK) uniformly from 1 n2Define πprime = π (JK) and X prime =
sumnj=1Ajπprime(j) minus EY
Check Stein pair condition
E[X minusX prime | π] = E[
AJπ(J) +AKπ(K) minusAJπ(K) minusAKπ(J) | π]
=1
n2
sumn
jk=1Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
=2
n(Y minus EY ) =
2
nX
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 32 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Conditional Variance for X = Y minus EY
∆X(π) =n
4E[
(X minusX prime)2 | π]
=1
4n
sumn
jk=1
[
Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
]2
41
n
sumn
jk=1
[
A2jπ(j) +A2
kπ(k) +A2jπ(k) +A2
kπ(j)
]
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 33 35
Extensions
Extensions
General Complex Matrices
Map any matrix B isin Cd1timesd2 to a Hermitian matrix via dilation
D(B) =
[
0 B
Blowast0
]
isin Hd1+d2
Preserves spectral information λmax(D(B)) = B
Beyond Sums
Matrix-valued functions satisfying a self-reproducing property
eg Matrix second-order Rademacher chaossum
jk εjεkAjk
Yields a dependent bounded differences inequality for matrices
Generalized Matrix Stein Pairs
Satisfy E[g(X)minus g(X prime) |Z] = αX almost surely forg R rarr R weakly increasingMackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 34 35
Extensions
References IAhlswede R and Winter A Strong converse for identification via quantum channels IEEE Trans Inform Theory 48(3)
569ndash579 Mar 2002
Bernstein S The theory of probabilities Gastehizdat Publishing House 1946
Burkholder D L Distribution function inequalities for martingales Ann Probab 119ndash42 1973 doi101214aop1176997023
Candes E J and Recht B Exact matrix completion via convex optimization Found Comput Math 9717ndash772 2009
Candes E J and Tao T The power of convex relaxation Near-optimal matrix completion IEEE Trans Info Theory 2009URL arXiv09031476 To appear Available at arXiv09031476
Chatterjee S Steinrsquos method for concentration inequalities Probab Theory Related Fields 138305ndash321 2007
Chatterjee S Concentration inequalities with exchangeable pairs PhD thesis Stanford University Palo Alto Feb 2008 URLarxivmath0507526vl
Cheung S-S So A Man-Cho and Wang K Chance-constrained linear matrix inequalities with dependent perturbations Asafe tractable approximation approach Available athttpwwwoptimization-onlineorgDB_FILE2011012898pdf 2011
Christofides D and Markstrom K Expansion properties of random cayley graphs and vertex transitive graphs via matrixmartingales Random Struct Algorithms 32(1)88ndash100 2008
Drineas P Mahoney M W and Muthukrishnan S Relative-error CUR matrix decompositions SIAM Journal on Matrix
Analysis and Applications 30844ndash881 2008
Foygel R and Srebro N Concentration-based guarantees for low-rank matrix reconstruction Journal of Machine Learning
Research - Proceedings Track 19315ndash340 2011
Gross D Recovering low-rank matrices from few coefficients in any basis IEEE Trans Inform Theory 57(3)1548ndash1566 Mar2011
Hoeffding W A combinatorial central limit theorem Ann Math Statist 22558ndash566 1951
Hoeffding W Probability inequalities for sums of bounded random variables Journal of the American Statistical Association58(301)13ndash30 1963
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 35 35
Extensions
References II
Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matrices Available atarXiv11041672 2011a
Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matricesarXiv11041672v3[mathPR] 2011b
Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities Ann Probab 31(2)948ndash995 2003
Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities II Applications Israel J Math 167227ndash282 2008
Lust-Piquard F Inegalites de Khintchine dans Cp (1 lt p lt infin) C R Math Acad Sci Paris 303(7)289ndash292 1986
Lust-Piquard F and Pisier G Noncommutative Khintchine and Paley inequalities Ark Mat 29(2)241ndash260 1991
Mackey L Talwalkar A and Jordan M I Divide-and-conquer matrix factorization In Shawe-Taylor J Zemel R SBartlett P L Pereira F C N and Weinberger K Q (eds) Advances in Neural Information Processing Systems 24 pp1134ndash1142 2011
Mackey L Jordan M I Chen R Y Farrell B and Tropp J A Matrix concentration inequalities via the method ofexchangeable pairs URL httparxivorgabs12016002 2012
Negahban S and Wainwright M J Restricted strong convexity and weighted matrix completion Optimal bounds with noisearXiv10092118v2[csIT] 2010
Nemirovski A Sums of random symmetric matrices and quadratic optimization under orthogonality constraints Math
Program 109283ndash317 January 2007 ISSN 0025-5610 doi 101007s10107-006-0033-0 URLhttpdlacmorgcitationcfmid=12297161229726
Oliveira R I Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges Availableat arXiv09110600 Nov 2009
Pisier G and Xu Q Non-commutative martingale inequalities Comm Math Phys 189(3)667ndash698 1997
Recht B Simpler approach to matrix completion J Mach Learn Res 123413ndash3430 2011
Rudelson M and Vershynin R Sampling from large matrices An approach through geometric functional analysis J Assoc
Comput Mach 54(4)Article 21 19 pp Jul 2007 (electronic)
So A Man-Cho Moment inequalities for sums of random matrices and their applications in optimization Math Program 130(1)125ndash151 2011
Stein C A bound for the error in the normal approximation to the distribution of a sum of dependent random variables InProc 6th Berkeley Symp Math Statist Probab Berkeley 1972 Univ California Press
Tropp J A User-friendly tail bounds for sums of random matrices Found Comput Math August 2011
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 36 35
- Motivation
-
- Matrix Completion
- Matrix Concentration
-
- Extensions
-
Motivation Matrix Completion
How to Complete a Low-rank Matrix
Suppose Ω is the set of observed entry locations
First attemptminimizeA rankA
subject to Aij = L0ij (i j) isin Ω
Problem NP-hard rArr computationally intractable
Solution Solve convex relaxation ()
minimizeA Alowastsubject to Aij = L0ij (i j) isin Ω
where Alowast =sum
k σk(A) is the tracenuclear norm of A
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 5 35
Motivation Matrix Completion
Can Convex Optimization Recover L0
Yes with high probability
Theorem (Recht 2011)
If L0 isin Rmtimesn has rank r and s amp βrn log2(n) entries are observeduniformly at random then (under some technical conditions) convexoptimization recovers L0 exactly with probability at least 1minus nminusβ
See also Gross (2011) Mackey Talwalkar and Jordan (2011)
Past results (Candes and Recht 2009 Candes and Tao 2009) requiredstronger assumptions and more intensive analysis
Streamlined approach reposes on a matrix variant of a classicalBernstein inequality (1946)
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 6 35
Motivation Matrix Completion
Scalar Bernstein Inequality
Theorem (Bernstein 1946)
Let (Yk)kge1 be independent random variables in R satisfying
EYk = 0 and |Yk| le R for each index k
Define the variance parameter
σ2 =sum
kEY 2
k
Then for all t ge 0
P
∣
∣
∣
sum
kYk
∣
∣
∣ge t
le 2 middot exp minust22σ2 + 2Rt3
Gaussian decay controlled by variance when t is small
Exponential decay controlled by uniform bound for large t
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 7 35
Motivation Matrix Completion
Matrix Bernstein Inequality
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let (Yk)kge1 be independent matrices in Rmtimesn satisfying
EYk = 0 and Yk le R for each index k
Define the variance parameter
σ2 = max(∥
∥
∥
sum
kEYkY
⊤k
∥
∥
∥∥
∥
∥
sum
kEY ⊤
k Yk
∥
∥
∥
)
Then for all t ge 0
P
∥
∥
∥
sum
kYk
∥
∥
∥ge t
le (m+ n) middot exp minust23σ2 + 2Rt
See also Tropp (2011) Oliveira (2009) Recht (2011)
Gaussian tail when t is small exponential tail for large t
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 8 35
Motivation Matrix Completion
Matrix Bernstein Inequality
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
For all t ge 0
P
∥
∥
∥
sum
kYk
∥
∥
∥ge t
le (m+ n) middot exp minust23σ2 + 2Rt
Consequences for matrix completion
Recht (2011) showed that uniform sampling of entries capturesmost of the information in incoherent low-rank matrices
Negahban and Wainwright (2010) showed that iid sampling ofentries captures most of the information in non-spiky (near)low-rank matrices
Foygel and Srebro (2011) characterized the generalization errorof convex MC through Rademacher complexity
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 9 35
Motivation Matrix Concentration
Concentration Inequalities
Matrix concentration
Pλmax(X minus EX) ge t le δ
Difficulty Matrix multiplication is not commutative
rArr eX+Y 6= eXeY
Past approaches (Ahlswede and Winter 2002 Oliveira 2009 Tropp 2011)
Rely on deep results from matrix analysis
Apply to sums of independent matrices and matrix martingales
This work
Steinrsquos method of exchangeable pairs (1972) as advanced byChatterjee (2007) for scalar concentrationrArr Improved exponential tail inequalities (Hoeffding Bernstein)rArr Polynomial moment inequalities (Khintchine Rosenthal)rArr Dependent sums and more general matrix functionals
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 10 35
Motivation Matrix Concentration
Roadmap
1 Motivation
2 Steinrsquos Method Background and Notation
3 Exponential Tail Inequalities
4 Polynomial Moment Inequalities
5 Dependent Sequences
6 Extensions
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 11 35
Background
Notation
Hermitian matrices Hd = A isin Cdtimesd A = AlowastAll matrices in this talk are Hermitian
Maximum eigenvalue λmax(middot)Trace trB the sum of the diagonal entries of B
Spectral norm B the maximum singular value of B
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 12 35
Background
Matrix Stein Pair
Definition (Exchangeable Pair)
(ZZ prime) is an exchangeable pair if (ZZ prime)d= (Z prime Z)
Definition (Matrix Stein Pair)
Let (ZZ prime) be an exchangeable pair and let Ψ Z rarr Hd be ameasurable function Define the random matrices
X = Ψ(Z) and X prime = Ψ(Z prime)
(XX prime) is a matrix Stein pair with scale factor α isin (0 1] if
E[X prime |Z] = (1minus α)X
Matrix Stein pairs are exchangeable pairs
Matrix Stein pairs always have zero mean
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 13 35
The Conditional Variance

Definition (Conditional Variance)
Suppose that (X, X′) is a matrix Stein pair with scale factor α, constructed from the exchangeable pair (Z, Z′). The conditional variance is the random matrix
ΔX = ΔX(Z) = (1/(2α)) · E[(X − X′)² | Z].

ΔX is a stochastic estimate for the variance EX²

Take-home Message
Control over ΔX yields control over λmax(X)
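Continuing the Rademacher-series example (an illustrative construction with α = 1/n, not taken from the talk): averaging (X − X′)² over the resampled index and sign shows that ΔX collapses to the deterministic matrix Σk Ak², which is why this pair yields Khintchine-type bounds. A quick exact check:

```python
import random

def madd(A, B):   return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]
def mscale(c, A): return [[c * A[i][j] for j in range(2)] for i in range(2)]
def mmul(A, B):   return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

As = [[[1.0, 0.5], [0.5, -1.0]],
      [[0.0, 2.0], [2.0, 1.0]],
      [[3.0, -1.0], [-1.0, 0.0]]]
n = len(As)

random.seed(1)
eps = [random.choice([-1, 1]) for _ in range(n)]

# Delta_X = (1/(2*alpha)) E[(X - X')^2 | Z] with alpha = 1/n;
# here X - X' = (eps_K - eps'_K) A_K, so average over K and eps'_K = ±1.
Delta = [[0.0, 0.0], [0.0, 0.0]]
for K in range(n):
    for new in (-1, 1):
        D = mscale(eps[K] - new, As[K])                  # X - X'
        Delta = madd(Delta, mscale((n / 2.0) / (2 * n), mmul(D, D)))

# Closed form: Delta_X = sum_k A_k^2, independent of the signs Z
closed = [[0.0, 0.0], [0.0, 0.0]]
for A in As:
    closed = madd(closed, mmul(A, A))

assert all(abs(Delta[i][j] - closed[i][j]) < 1e-12 for i in range(2) for j in range(2))
```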
Exponential Tail Inequalities

Exponential Concentration for Random Matrices

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp 2012)
Let (X, X′) be a matrix Stein pair with X ∈ H^d. Suppose that
ΔX ≼ cX + vI almost surely, for c, v ≥ 0.
Then, for all t ≥ 0,
P{λmax(X) ≥ t} ≤ d · exp{−t²/(2v + 2ct)}.

Control over the conditional variance ΔX yields a Gaussian tail for λmax(X) for small t, an exponential tail for large t
When d = 1, improves a scalar result of Chatterjee (2007)
The dimensional factor d cannot be removed
Matrix Hoeffding Inequality

Corollary (Mackey, Jordan, Chen, Farrell, and Tropp 2012)
Let X = Σk Yk for independent matrices in H^d satisfying
EYk = 0 and Yk² ≼ Ak²
for deterministic matrices (Ak)k≥1. Define the variance parameter
σ² = ‖Σk Ak²‖.
Then, for all t ≥ 0,
P{λmax(Σk Yk) ≥ t} ≤ d · exp{−t²/(2σ²)}.

Improves upon the matrix Hoeffding inequality of Tropp (2011)
Optimal constant 1/2 in the exponent
Can replace the variance parameter with σ² = (1/2) ‖Σk (Ak² + EYk²)‖
Tighter than the classical Hoeffding inequality (1963) when d = 1
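As a numerical sanity check (the 2×2 coefficients, threshold, and trial count are illustrative choices, not from the talk), the Monte Carlo tail of λmax(Σk εk Ak) should fall below the stated bound d · exp{−t²/(2σ²)}:

```python
import math
import random

def madd(A, B):   return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]
def mscale(c, A): return [[c * A[i][j] for j in range(2)] for i in range(2)]
def mmul(A, B):   return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def lam_max(M):   # largest eigenvalue of a 2x2 symmetric matrix
    a, b, d = M[0][0], M[0][1], M[1][1]
    return (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b ** 2)

def spec_norm(M): # spectral norm of a 2x2 symmetric matrix
    a, b, d = M[0][0], M[0][1], M[1][1]
    mid, r = (a + d) / 2, math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return max(abs(mid + r), abs(mid - r))

As = [[[0.6, 0.2], [0.2, -0.4]], [[0.3, -0.5], [-0.5, 0.1]]] * 10   # 20 coefficients

S = [[0.0, 0.0], [0.0, 0.0]]          # sum_k A_k^2 (Y_k = eps_k A_k, so Y_k^2 = A_k^2)
for A in As:
    S = madd(S, mmul(A, A))
sig2 = spec_norm(S)

t, d = 5.0, 2
bound = d * math.exp(-t * t / (2 * sig2))

random.seed(7)
trials, hits = 2000, 0
for _ in range(trials):
    X = [[0.0, 0.0], [0.0, 0.0]]
    for A in As:
        X = madd(X, mscale(random.choice((-1, 1)), A))
    if lam_max(X) >= t:
        hits += 1

assert hits / trials <= bound         # empirical tail respects the Hoeffding bound
```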
Exponential Concentration: Proof Sketch

1. Matrix Laplace transform method (Ahlswede & Winter 2002)
Relate the tail probability to the trace of the mgf of X:
P{λmax(X) ≥ t} ≤ inf_{θ>0} e^(−θt) · m(θ), where m(θ) := E tr e^(θX)

Problem: e^(X+Y) ≠ e^X e^Y when X, Y ∈ H^d
How to bound the trace mgf?
Past approaches: Golden-Thompson, Lieb's concavity theorem

Chatterjee's strategy for scalar concentration:
Control mgf growth by bounding the derivative
m′(θ) = E tr X e^(θX) for θ ∈ R
Rewrite using exchangeable pairs
Method of Exchangeable Pairs

Lemma
Suppose that (X, X′) is a matrix Stein pair with scale factor α. Let F : H^d → H^d be a measurable function satisfying
E‖(X − X′)F(X)‖ < ∞.
Then
E[X F(X)] = (1/(2α)) · E[(X − X′)(F(X) − F(X′))].  (1)

Intuition
Can characterize the distribution of a random matrix by integrating it against a class of test functions F
Eq. (1) allows us to estimate this integral using the smoothness properties of F and the discrepancy X − X′
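The lemma can be verified by brute force on a small example. Taking the Rademacher-series Stein pair from earlier (an illustrative construction with α = 1/n) and the test function F(X) = X, exact enumeration over all sign patterns, resampled indices, and fresh signs reproduces E[X F(X)] = (1/(2α)) E[(X − X′)(F(X) − F(X′))]:

```python
import itertools

def madd(A, B):   return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]
def mscale(c, A): return [[c * A[i][j] for j in range(2)] for i in range(2)]
def mmul(A, B):   return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

As = [[[1.0, 0.5], [0.5, -1.0]],
      [[0.0, 2.0], [2.0, 1.0]],
      [[3.0, -1.0], [-1.0, 0.0]]]
n = len(As)

def series(signs):
    X = [[0.0, 0.0], [0.0, 0.0]]
    for s, A in zip(signs, As):
        X = madd(X, mscale(s, A))
    return X

lhs = [[0.0, 0.0], [0.0, 0.0]]          # E[X F(X)] with F(X) = X
rhs = [[0.0, 0.0], [0.0, 0.0]]          # (1/(2a)) E[(X-X')(F(X)-F(X'))], a = 1/n
for signs in itertools.product((-1, 1), repeat=n):
    X = series(signs)
    lhs = madd(lhs, mscale(1.0 / 2 ** n, mmul(X, X)))
    for K in range(n):
        for new in (-1, 1):
            s2 = list(signs)
            s2[K] = new
            D = madd(X, mscale(-1.0, series(s2)))       # X - X'
            w = (n / 2.0) / (2 ** n * n * 2)            # (1/(2a)) times P(signs, K, new)
            rhs = madd(rhs, mscale(w, mmul(D, D)))

assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-12 for i in range(2) for j in range(2))
```

With F the identity this is exactly the statement EX² = ΔX in expectation; smoother F (such as the matrix exponential) is what the proof sketch below exploits.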
Exponential Concentration: Proof Sketch

2. Method of Exchangeable Pairs
Rewrite the derivative of the trace mgf:
m′(θ) = E tr X e^(θX) = (1/(2α)) · E tr[(X − X′)(e^(θX) − e^(θX′))]

Goal: Use the smoothness of F(X) = e^(θX) to bound the derivative
Mean Value Trace Inequality

Lemma (Mackey, Jordan, Chen, Farrell, and Tropp 2012)
Suppose that g : R → R is a weakly increasing function and that h : R → R is a function whose derivative h′ is convex. For all matrices A, B ∈ H^d, it holds that
tr[(g(A) − g(B)) · (h(A) − h(B))] ≤ (1/2) tr[(g(A) − g(B)) · (A − B) · (h′(A) + h′(B))].

Standard matrix functions: if g : R → R and A = Q diag(λ1, …, λd) Q*, then g(A) = Q diag(g(λ1), …, g(λd)) Q*
The inequality does not hold without the trace
For exponential concentration, we let g(A) = A and h(B) = e^(θB)
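The trace inequality is easy to probe numerically. A sketch with g the identity and h = exp (so h′ = exp), using a closed-form 2×2 symmetric matrix exponential; the random matrices are purely illustrative:

```python
import math
import random

def madd(A, B): return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]
def msub(A, B): return [[A[i][j] - B[i][j] for j in range(2)] for i in range(2)]
def mmul(A, B): return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
def tr(M):      return M[0][0] + M[1][1]

def sym_exp(M):
    # exp of a 2x2 symmetric matrix via its eigendecomposition
    a, b, d = M[0][0], M[0][1], M[1][1]
    if abs(b) < 1e-15:
        return [[math.exp(a), 0.0], [0.0, math.exp(d)]]
    r = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    l1, l2 = (a + d) / 2 + r, (a + d) / 2 - r
    vx, vy = l1 - d, b                      # eigenvector for l1
    nv = math.hypot(vx, vy)
    vx, vy = vx / nv, vy / nv
    e1, e2 = math.exp(l1), math.exp(l2)
    p00, p01, p11 = vx * vx, vx * vy, vy * vy
    return [[e1 * p00 + e2 * (1 - p00), (e1 - e2) * p01],
            [(e1 - e2) * p01, e1 * p11 + e2 * (1 - p11)]]

random.seed(11)
def rand_sym():
    a, b, d = (random.uniform(-1, 1) for _ in range(3))
    return [[a, b], [b, d]]

violations = 0
for _ in range(100):
    A, B = rand_sym(), rand_sym()
    D = msub(A, B)
    lhs = tr(mmul(D, msub(sym_exp(A), sym_exp(B))))
    rhs = 0.5 * tr(mmul(mmul(D, D), madd(sym_exp(A), sym_exp(B))))
    if lhs > rhs + 1e-9:
        violations += 1

assert violations == 0
```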
Exponential Concentration: Proof Sketch

3. Mean Value Trace Inequality
Bound the derivative of the trace mgf:
m′(θ) = (1/(2α)) E tr[(X − X′)(e^(θX) − e^(θX′))]
 ≤ (θ/(4α)) E tr[(X − X′)² · (e^(θX) + e^(θX′))]
 = (θ/(2α)) E tr[(X − X′)² · e^(θX)]
 = θ · E tr[(1/(2α)) E[(X − X′)² | Z] · e^(θX)]
 = θ · E tr[ΔX e^(θX)]
Exponential Concentration: Proof Sketch

3. Mean Value Trace Inequality
Bound the derivative of the trace mgf:
m′(θ) ≤ θ · E tr[ΔX e^(θX)]

4. Conditional Variance Bound: ΔX ≼ cX + vI
Yields the differential inequality
m′(θ) ≤ cθ E tr[X e^(θX)] + vθ E tr[e^(θX)] = cθ · m′(θ) + vθ · m(θ).
Solve to bound m(θ) and thereby bound
P{λmax(X) ≥ t} ≤ inf_{θ>0} e^(−θt) · m(θ) ≤ d · exp{−t²/(2v + 2ct)}.
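The omitted solution step can be sketched as follows. Since m(0) = E tr I = d, dividing the differential inequality by m(θ) and integrating on 0 < θ < 1/c gives

```latex
\frac{d}{d\theta}\log m(\theta)
  = \frac{m'(\theta)}{m(\theta)}
  \le \frac{v\theta}{1-c\theta}
\quad\Longrightarrow\quad
\log m(\theta)
  \le \log d + \int_0^\theta \frac{vs}{1-cs}\,ds
  \le \log d + \frac{v\theta^2}{2(1-c\theta)} .
```

Substituting into P{λmax(X) ≥ t} ≤ e^(−θt) m(θ) and choosing θ = t/(v + ct), so that 1 − cθ = v/(v + ct), yields the stated bound d · exp{−t²/(2v + 2ct)}.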
Refined Exponential Concentration

Relaxing the constraint ΔX ≼ cX + vI:

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp 2012)
Let (X, X′) be a bounded matrix Stein pair with X ∈ H^d. Define the function
r(ψ) = (1/ψ) log E tr(e^(ψΔX)/d) for each ψ > 0.
Then, for all t ≥ 0 and all ψ > 0,
P{λmax(X) ≥ t} ≤ d · exp{−t²/(2r(ψ) + 2t/√ψ)}.

r(ψ) measures the typical magnitude of the conditional variance:
E λmax(ΔX) ≤ inf_{ψ>0} [r(ψ) + (log d)/ψ]

When d = 1, improves a scalar result of Chatterjee (2008)
Proof extends to unbounded random matrices
Matrix Bernstein Inequality

Corollary (Mackey, Jordan, Chen, Farrell, and Tropp 2012)
Let (Yk)k≥1 be independent matrices in H^d satisfying
EYk = 0 and ‖Yk‖ ≤ R for each index k.
Define the variance parameter
σ² = ‖Σk EYk²‖.
Then, for all t ≥ 0,
P{λmax(Σk Yk) ≥ t} ≤ d · exp{−t²/(3σ² + 2Rt)}.

Gaussian tail controlled by improved variance when t is small
Key proof idea: Apply refined concentration, and bound r(ψ) = (1/ψ) log E tr(e^(ψΔX)/d) using unrefined concentration
Constants better than Oliveira (2009), worse than Tropp (2011)
Polynomial Moment Inequalities

Polynomial Moments for Random Matrices

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp 2012)
Let p = 1 or p ≥ 1.5. Suppose that (X, X′) is a matrix Stein pair where E tr |X|^(2p) < ∞. Then
(E tr |X|^(2p))^(1/(2p)) ≤ √(2p − 1) · (E tr ΔX^p)^(1/(2p)).

Moral: The conditional variance controls the moments of X
Generalizes Chatterjee's version (2007) of the scalar Burkholder-Davis-Gundy inequality (Burkholder 1973)
See also Pisier & Xu (1997), Junge & Xu (2003, 2008)
Proof techniques mirror those for exponential concentration
Also holds for infinite-dimensional Schatten-class operators
Matrix Khintchine Inequality

Corollary (Mackey, Jordan, Chen, Farrell, and Tropp 2012)
Let (εk)k≥1 be an independent sequence of Rademacher random variables and (Ak)k≥1 be a deterministic sequence of Hermitian matrices. Then, if p = 1 or p ≥ 1.5,
E tr(Σk εk Ak)^(2p) ≤ (2p − 1)^p · tr(Σk Ak²)^p.

The noncommutative Khintchine inequality (Lust-Piquard 1986; Lust-Piquard and Pisier 1991) is a dominant tool in applied matrix analysis
e.g., used in the analysis of column sampling and projection for approximate SVD (Rudelson and Vershynin 2007)
Stein's method offers an unusually concise proof
The constant √(2p − 1) is within √e of optimal
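For a small instance the corollary can be checked exactly: with three illustrative 2×2 symmetric Ak and p = 2, enumerate all 2³ sign patterns and compare E tr(Σk εk Ak)⁴ against 3² · tr(Σk Ak²)²:

```python
import itertools

def madd(A, B):   return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]
def mscale(c, A): return [[c * A[i][j] for j in range(2)] for i in range(2)]
def mmul(A, B):   return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
def tr(M):        return M[0][0] + M[1][1]

As = [[[1.0, 0.5], [0.5, -1.0]],
      [[0.0, 2.0], [2.0, 1.0]],
      [[3.0, -1.0], [-1.0, 0.0]]]
n, p = len(As), 2

lhs = 0.0                                   # E tr (sum_k eps_k A_k)^(2p), computed exactly
for signs in itertools.product((-1, 1), repeat=n):
    X = [[0.0, 0.0], [0.0, 0.0]]
    for s, A in zip(signs, As):
        X = madd(X, mscale(s, A))
    X2 = mmul(X, X)
    lhs += tr(mmul(X2, X2)) / 2 ** n

S = [[0.0, 0.0], [0.0, 0.0]]                # sum_k A_k^2
for A in As:
    S = madd(S, mmul(A, A))
rhs = (2 * p - 1) ** p * tr(mmul(S, S))     # (2p - 1)^p tr (sum_k A_k^2)^p

assert lhs <= rhs + 1e-9
```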
Dependent Sequences

Adding Dependence

1. Motivation: Matrix Completion; Matrix Concentration
2. Stein's Method Background and Notation
3. Exponential Tail Inequalities
4. Polynomial Moment Inequalities
5. Dependent Sequences: Sums of Conditionally Zero-mean Matrices; Combinatorial Sums
6. Extensions
Sums of Conditionally Zero-mean Matrices

Definition (Sum of Conditionally Zero-Mean Matrices)
Given a sequence of Hermitian matrices (Yk), k = 1, …, n, satisfying the conditional zero-mean property
E[Yk | (Yj)j≠k] = 0
for all k, define the random sum X = Σ_{k=1}^n Yk.

Note: (Yk)k≥1 is a martingale difference sequence

Examples
Sums of independent centered random matrices
Many sums of conditionally independent random matrices:
Yk ⊥⊥ (Yj)j≠k | Z and E[Yk | Z] = 0
Rademacher series with random matrix coefficients:
X = Σk εk Wk, with (Wk)k≥1 Hermitian and (εk)k≥1 independent Rademacher
Sums of Conditionally Zero-mean Matrices

Definition (Conditional Zero-Mean Property)
E[Yk | (Yj)j≠k] = 0

Matrix Stein Pair for X = Σ_{k=1}^n Yk
Let Y′k and Yk be conditionally i.i.d. given (Yj)j≠k
Draw an index K uniformly from {1, …, n}
Define X′ = X + Y′K − YK
Check the Stein pair condition:
E[X − X′ | (Yj)j≥1] = E[YK − Y′K | (Yj)j≥1]
 = (1/n) Σ_{k=1}^n (Yk − E[Y′k | (Yj)j≠k])
 = (1/n) Σ_{k=1}^n Yk = (1/n) X
Sums of Conditionally Zero-mean Matrices

Definition (Conditional Zero-Mean Property)
E[Yk | (Yj)j≠k] = 0

Conditional Variance for X = Σ_{k=1}^n Yk
ΔX = (n/2) · E[(X − X′)² | (Yj)j≥1]
 = (n/2) · E[(YK − Y′K)² | (Yj)j≥1]
 = (1/2) Σ_{k=1}^n (Yk² + E[Yk² | (Yj)j≠k])

⇒ Conditional variance controlled when summands are bounded
⇒ Dependent analogues of concentration and moment inequalities
Combinatorial Sums of Matrices

Definition (Combinatorial Matrix Statistic)
Given a deterministic array (Ajk), j, k = 1, …, n, of Hermitian matrices and a uniformly random permutation π on {1, …, n}, define the combinatorial matrix statistic
Y = Σ_{j=1}^n A_{jπ(j)}, with mean EY = (1/n) Σ_{j,k=1}^n Ajk.

Generalizes the scalar statistics studied by Hoeffding (1951)

Example
Sampling without replacement from {B1, …, Bn}: W = Σ_{j=1}^s B_{π(j)}
Combinatorial Sums of Matrices

Definition (Combinatorial Matrix Statistic)
Y = Σ_{j=1}^n A_{jπ(j)}, with mean EY = (1/n) Σ_{j,k=1}^n Ajk

Matrix Stein Pair for X = Y − EY
Draw indices (J, K) uniformly from {1, …, n}²
Define π′ = π ∘ (J K) and X′ = Σ_{j=1}^n A_{jπ′(j)} − EY
Check the Stein pair condition:
E[X − X′ | π] = E[A_{Jπ(J)} + A_{Kπ(K)} − A_{Jπ(K)} − A_{Kπ(J)} | π]
 = (1/n²) Σ_{j,k=1}^n (A_{jπ(j)} + A_{kπ(k)} − A_{jπ(k)} − A_{kπ(j)})
 = (2/n)(Y − EY) = (2/n) X
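The scale factor α = 2/n can be confirmed exactly in code: fix a permutation π, average X − X′ over all n² index pairs (J, K), and compare with (2/n)X. The random 2×2 entries below are purely illustrative:

```python
import random

def madd(A, B):   return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]
def msub(A, B):   return [[A[i][j] - B[i][j] for j in range(2)] for i in range(2)]
def mscale(c, A): return [[c * A[i][j] for j in range(2)] for i in range(2)]

random.seed(13)
n = 3
def rand_sym():
    a, b, d = (random.uniform(-1, 1) for _ in range(3))
    return [[a, b], [b, d]]

A = [[rand_sym() for _ in range(n)] for _ in range(n)]   # A[j][k]: Hermitian (real symmetric here)

EY = [[0.0, 0.0], [0.0, 0.0]]                            # EY = (1/n) sum_{j,k} A[j][k]
for j in range(n):
    for k in range(n):
        EY = madd(EY, mscale(1.0 / n, A[j][k]))

def stat(perm):                                          # X = sum_j A[j][perm(j)] - EY
    X = mscale(-1.0, EY)
    for j in range(n):
        X = madd(X, A[j][perm[j]])
    return X

perm = [1, 2, 0]                                         # a fixed permutation pi
X = stat(perm)

avg = [[0.0, 0.0], [0.0, 0.0]]                           # E[X - X' | pi] over uniform (J, K)
for J in range(n):
    for K in range(n):
        p2 = list(perm)
        p2[J], p2[K] = p2[K], p2[J]                      # compose pi with the transposition (J K)
        avg = madd(avg, mscale(1.0 / n ** 2, msub(X, stat(p2))))

target = mscale(2.0 / n, X)
assert all(abs(avg[i][j] - target[i][j]) < 1e-12 for i in range(2) for j in range(2))
```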
Combinatorial Sums of Matrices

Definition (Combinatorial Matrix Statistic)
Y = Σ_{j=1}^n A_{jπ(j)}, with mean EY = (1/n) Σ_{j,k=1}^n Ajk

Conditional Variance for X = Y − EY
ΔX(π) = (n/4) E[(X − X′)² | π]
 = (1/(4n)) Σ_{j,k=1}^n [A_{jπ(j)} + A_{kπ(k)} − A_{jπ(k)} − A_{kπ(j)}]²
 ≼ (1/n) Σ_{j,k=1}^n [A²_{jπ(j)} + A²_{kπ(k)} + A²_{jπ(k)} + A²_{kπ(j)}]

⇒ Conditional variance controlled when summands are bounded
⇒ Dependent analogues of concentration and moment inequalities
Extensions

General Complex Matrices
Map any matrix B ∈ C^(d1×d2) to a Hermitian matrix via dilation:
D(B) = [0 B; B* 0] ∈ H^(d1+d2)
Preserves spectral information: λmax(D(B)) = ‖B‖
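A quick numerical illustration (the 2×2 real B is arbitrary, so B* = Bᵀ): build the 4×4 dilation and recover ‖B‖ as λmax(D(B)). Since D(B)'s spectrum is {±σi(B)}, plain power iteration would oscillate between the top eigenvalue pair, so the sketch iterates with D(B)² instead:

```python
import math

def dilation(B):
    # D(B) = [[0, B], [B^T, 0]] as a 4x4 symmetric matrix
    M = [[0.0] * 4 for _ in range(4)]
    for i in range(2):
        for j in range(2):
            M[i][2 + j] = B[i][j]
            M[2 + j][i] = B[i][j]
    return M

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

B = [[1.0, 2.0], [-0.5, 0.7]]        # illustrative, non-symmetric

# ||B||: closed-form largest eigenvalue of the 2x2 Gram matrix B^T B
G = [[sum(B[k][i] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
a, b, d = G[0][0], G[0][1], G[1][1]
norm_B = math.sqrt((a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b ** 2))

# lambda_max(D(B)) via power iteration on D^2 (eigenvalues sigma_i(B)^2)
D = dilation(B)
v = [1.0, 0.3, -0.2, 0.5]
for _ in range(200):
    w = matvec(D, matvec(D, v))
    nw = math.sqrt(sum(x * x for x in w))
    v = [x / nw for x in w]
w = matvec(D, matvec(D, v))
lam_max_D = math.sqrt(sum(v[i] * w[i] for i in range(4)))   # Rayleigh quotient for D^2

assert abs(lam_max_D - norm_B) < 1e-8
```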
Beyond Sums
Matrix-valued functions satisfying a self-reproducing property
e.g., matrix second-order Rademacher chaos Σ_{j,k} εj εk Ajk
Yields a dependent bounded differences inequality for matrices

Generalized Matrix Stein Pairs
Satisfy E[g(X) − g(X′) | Z] = αX almost surely, for g : R → R weakly increasing
References I
Ahlswede, R. and Winter, A. Strong converse for identification via quantum channels. IEEE Trans. Inform. Theory, 48(3):569-579, Mar. 2002.
Bernstein, S. The Theory of Probabilities. Gastehizdat Publishing House, 1946.
Burkholder, D. L. Distribution function inequalities for martingales. Ann. Probab., 1:19-42, 1973. doi:10.1214/aop/1176997023.
Candès, E. J. and Recht, B. Exact matrix completion via convex optimization. Found. Comput. Math., 9:717-772, 2009.
Candès, E. J. and Tao, T. The power of convex relaxation: Near-optimal matrix completion. IEEE Trans. Info. Theory, 2009. To appear. Available at arXiv:0903.1476.
Chatterjee, S. Stein's method for concentration inequalities. Probab. Theory Related Fields, 138:305-321, 2007.
Chatterjee, S. Concentration inequalities with exchangeable pairs. PhD thesis, Stanford University, Palo Alto, Feb. 2008. arXiv:math/0507526v1.
Cheung, S.-S., So, A. Man-Cho, and Wang, K. Chance-constrained linear matrix inequalities with dependent perturbations: A safe tractable approximation approach. Available at optimization-online.org, 2011.
Christofides, D. and Markström, K. Expansion properties of random Cayley graphs and vertex transitive graphs via matrix martingales. Random Struct. Algorithms, 32(1):88-100, 2008.
Drineas, P., Mahoney, M. W., and Muthukrishnan, S. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30:844-881, 2008.
Foygel, R. and Srebro, N. Concentration-based guarantees for low-rank matrix reconstruction. Journal of Machine Learning Research - Proceedings Track, 19:315-340, 2011.
Gross, D. Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory, 57(3):1548-1566, Mar. 2011.
Hoeffding, W. A combinatorial central limit theorem. Ann. Math. Statist., 22:558-566, 1951.
Hoeffding, W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13-30, 1963.
Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 35 / 35
References II
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. Available at arXiv:1104.1672, 2011a.
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. arXiv:1104.1672v3 [math.PR], 2011b.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities. Ann. Probab., 31(2):948-995, 2003.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities II: Applications. Israel J. Math., 167:227-282, 2008.
Lust-Piquard, F. Inégalités de Khintchine dans Cp (1 < p < ∞). C. R. Math. Acad. Sci. Paris, 303(7):289-292, 1986.
Lust-Piquard, F. and Pisier, G. Noncommutative Khintchine and Paley inequalities. Ark. Mat., 29(2):241-260, 1991.
Mackey, L., Talwalkar, A., and Jordan, M. I. Divide-and-conquer matrix factorization. In Shawe-Taylor, J., Zemel, R. S., Bartlett, P. L., Pereira, F. C. N., and Weinberger, K. Q. (eds.), Advances in Neural Information Processing Systems 24, pp. 1134-1142, 2011.
Mackey, L., Jordan, M. I., Chen, R. Y., Farrell, B., and Tropp, J. A. Matrix concentration inequalities via the method of exchangeable pairs. URL http://arxiv.org/abs/1201.6002, 2012.
Negahban, S. and Wainwright, M. J. Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. arXiv:1009.2118v2 [cs.IT], 2010.
Nemirovski, A. Sums of random symmetric matrices and quadratic optimization under orthogonality constraints. Math. Program., 109:283-317, January 2007. ISSN 0025-5610. doi:10.1007/s10107-006-0033-0.
Oliveira, R. I. Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. Available at arXiv:0911.0600, Nov. 2009.
Pisier, G. and Xu, Q. Non-commutative martingale inequalities. Comm. Math. Phys., 189(3):667-698, 1997.
Recht, B. Simpler approach to matrix completion. J. Mach. Learn. Res., 12:3413-3430, 2011.
Rudelson, M. and Vershynin, R. Sampling from large matrices: An approach through geometric functional analysis. J. Assoc. Comput. Mach., 54(4):Article 21, 19 pp., Jul. 2007 (electronic).
So, A. Man-Cho. Moment inequalities for sums of random matrices and their applications in optimization. Math. Program., 130(1):125-151, 2011.
Stein, C. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. 6th Berkeley Symp. Math. Statist. Probab., Berkeley, 1972. Univ. California Press.
Tropp, J. A. User-friendly tail bounds for sums of random matrices. Found. Comput. Math., August 2011.
Motivation: Matrix Completion

Can Convex Optimization Recover L0?
Yes, with high probability.

Theorem (Recht 2011)
If L0 ∈ R^(m×n) has rank r and s ≳ βrn log²(n) entries are observed uniformly at random, then (under some technical conditions) convex optimization recovers L0 exactly with probability at least 1 − n^(−β).

See also Gross (2011), Mackey, Talwalkar, and Jordan (2011)
Past results (Candès and Recht 2009; Candès and Tao 2009) required stronger assumptions and more intensive analysis
Streamlined approach reposes on a matrix variant of a classical Bernstein inequality (1946)
Scalar Bernstein Inequality

Theorem (Bernstein 1946)
Let (Yk)k≥1 be independent random variables in R satisfying
EYk = 0 and |Yk| ≤ R for each index k.
Define the variance parameter
σ² = Σk EYk².
Then, for all t ≥ 0,
P{|Σk Yk| ≥ t} ≤ 2 · exp{−t²/(2σ² + 2Rt/3)}.

Gaussian decay controlled by variance when t is small
Exponential decay controlled by uniform bound for large t
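As a quick illustration (the parameters here are demo choices, not from the talk), a Monte Carlo tail for a sum of Rademacher variables stays below Bernstein's bound:

```python
import math
import random

n, R = 30, 1.0                      # Y_k = ±1: mean zero, |Y_k| <= R, EY_k^2 = 1
sigma2 = n * 1.0                    # sum_k EY_k^2
t = 12.0
bound = 2 * math.exp(-t * t / (2 * sigma2 + 2 * R * t / 3))

random.seed(17)
trials, hits = 4000, 0
for _ in range(trials):
    s = sum(random.choice((-1, 1)) for _ in range(n))
    if abs(s) >= t:
        hits += 1

assert hits / trials <= bound       # empirical tail respects Bernstein's bound
```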
Matrix Bernstein Inequality

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp 2012)
Let (Yk)k≥1 be independent matrices in R^(m×n) satisfying
EYk = 0 and ‖Yk‖ ≤ R for each index k.
Define the variance parameter
σ² = max{‖Σk E Yk Ykᵀ‖, ‖Σk E Ykᵀ Yk‖}.
Then, for all t ≥ 0,
P{‖Σk Yk‖ ≥ t} ≤ (m + n) · exp{−t²/(3σ² + 2Rt)}.

See also Tropp (2011), Oliveira (2009), Recht (2011)
Gaussian tail when t is small, exponential tail for large t
Matrix Bernstein Inequality

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp 2012)
For all t ≥ 0,
P{‖Σk Yk‖ ≥ t} ≤ (m + n) · exp{−t²/(3σ² + 2Rt)}.

Consequences for matrix completion
Recht (2011) showed that uniform sampling of entries captures most of the information in incoherent low-rank matrices
Negahban and Wainwright (2010) showed that i.i.d. sampling of entries captures most of the information in non-spiky (near) low-rank matrices
Foygel and Srebro (2011) characterized the generalization error of convex MC through Rademacher complexity
Motivation Matrix Concentration
Concentration Inequalities
Matrix concentration
Pλmax(X minus EX) ge t le δ
Difficulty Matrix multiplication is not commutative
rArr eX+Y 6= eXeY
Past approaches (Ahlswede and Winter 2002 Oliveira 2009 Tropp 2011)
Rely on deep results from matrix analysis
Apply to sums of independent matrices and matrix martingales
This work
Steinrsquos method of exchangeable pairs (1972) as advanced byChatterjee (2007) for scalar concentrationrArr Improved exponential tail inequalities (Hoeffding Bernstein)rArr Polynomial moment inequalities (Khintchine Rosenthal)rArr Dependent sums and more general matrix functionals
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 10 35
Motivation Matrix Concentration
Roadmap
1 Motivation
2 Steinrsquos Method Background and Notation
3 Exponential Tail Inequalities
4 Polynomial Moment Inequalities
5 Dependent Sequences
6 Extensions
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 11 35
Background
Notation
Hermitian matrices Hd = A isin Cdtimesd A = AlowastAll matrices in this talk are Hermitian
Maximum eigenvalue λmax(middot)Trace trB the sum of the diagonal entries of B
Spectral norm B the maximum singular value of B
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 12 35
Background
Matrix Stein Pair
Definition (Exchangeable Pair)
(ZZ prime) is an exchangeable pair if (ZZ prime)d= (Z prime Z)
Definition (Matrix Stein Pair)
Let (ZZ prime) be an exchangeable pair and let Ψ Z rarr Hd be ameasurable function Define the random matrices
X = Ψ(Z) and X prime = Ψ(Z prime)
(XX prime) is a matrix Stein pair with scale factor α isin (0 1] if
E[X prime |Z] = (1minus α)X
Matrix Stein pairs are exchangeable pairs
Matrix Stein pairs always have zero mean
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 13 35
Background
The Conditional Variance
Definition (Conditional Variance)
Suppose that (XX prime) is a matrix Stein pair with scale factor αconstructed from the exchangeable pair (ZZ prime) The conditional
variance is the random matrix
∆X = ∆X(Z) =1
2αE[
(X minusX prime)2 |Z]
∆X is a stochastic estimate for the variance EX2
Take-home Message
Control over ∆X yields control over λmax(X)
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 14 35
Exponential Tail Inequalities
Exponential Concentration for Random Matrices
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let (XX prime) be a matrix Stein pair with X isin Hd Suppose that
∆X 4 cX + v I almost surely for c v ge 0
Then for all t ge 0
Pλmax(X) ge t le d middot exp minust22v + 2ct
Control over the conditional variance ∆X yields
Gaussian tail for λmax(X) for small t exponential tail for large t
When d = 1 improves scalar result of Chatterjee (2007)
The dimensional factor d cannot be removed
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 15 35
Exponential Tail Inequalities
Matrix Hoeffding Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let X =sum
k Yk for independent matrices in Hd satisfying
EYk = 0 and Y 2k 4 A2
k
for deterministic matrices (Ak)kge1 Define the variance parameter
σ2 =∥
∥
∥
sum
kA2k
∥
∥
∥
Then for all t ge 0
P
λmax
(
sum
kYk
)
ge t
le d middot eminust22σ2
Improves upon the matrix Hoeffding inequality of Tropp (2011)Optimal constant 12 in the exponent
Can replace variance parameter with σ2 = 12
∥
∥
sum
k
(
A2k + EY 2
k
)∥
∥
Tighter than classical Hoeffding inequality (1963) when d = 1
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 16 35
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
1 Matrix Laplace transform method (Ahlswede amp Winter 2002)
Relate tail probability to the trace of the mgf of X
Pλmax(X) ge t le infθgt0
eminusθt middotm(θ)
where m(θ) = E tr eθX
Problem eX+Y 6= eXeY when XY isin Hd
How to bound the trace mgf
Past approaches Golden-Thompson Liebrsquos concavity theorem
Chatterjeersquos strategy for scalar concentration
Control mgf growth by bounding derivative
mprime(θ) = E trXeθX for θ isin R
Rewrite using exchangeable pairs
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 17 35
Exponential Tail Inequalities
Method of Exchangeable Pairs
Lemma
Suppose that (XX prime) is a matrix Stein pair with scale factor α LetF Hd rarr Hd be a measurable function satisfying
E(X minusX prime)F (X) ltinfin
Then
E[X F (X)] =1
2αE[(X minusX prime)(F (X)minus F (X prime))] (1)
Intuition
Can characterize the distribution of a random matrix byintegrating it against a class of test functions F
Eq 1 allows us to estimate this integral using the smoothnessproperties of F and the discrepancy X minusX prime
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 18 35
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
2 Method of Exchangeable Pairs
Rewrite the derivative of the trace mgf
mprime(θ) = E trXeθX =1
2αE tr
[
(X minusX prime)(
eθX minus eθXprime)]
Goal Use the smoothness of F (X) = eθX to bound the derivative
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 19 35
Exponential Tail Inequalities
Mean Value Trace Inequality
Lemma (Mackey Jordan Chen Farrell and Tropp 2012)
Suppose that g R rarr R is a weakly increasing function and thath R rarr R is a function whose derivative hprime is convex For allmatrices AB isin Hd it holds that
tr[(g(A)minus g(B)) middot (h(A)minus h(B))] le1
2tr[(g(A)minus g(B)) middot (AminusB) middot (hprime(A) + hprime(B))]
Standard matrix functions If g R rarr R and
A = Q
λ1
λd
Qlowast then g(A) = Q
g(λ1)
g(λd)
Qlowast
Inequality does not hold without the traceFor exponential concentration we let g(A) = A and h(B) = eθB
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 20 35
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
3 Mean Value Trace Inequality
Bound the derivative of the trace mgf
mprime(θ) =1
2αE tr
[
(X minusX prime)(
eθX minus eθXprime)]
le θ
4αE tr
[
(X minusX prime)2 middot(
eθX + eθXprime)]
=θ
2αE tr
[
(X minusX prime)2 middot eθX]
= θ middot E tr
[
1
2αE[
(X minusX prime)2 |Z]
middot eθX]
= θ middot E tr[
∆X eθX]
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 21 35
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
3 Mean Value Trace Inequality
Bound the derivative of the trace mgf
mprime(θ) le θ middot E tr[
∆X eθX]
4 Conditional Variance Bound ∆X 4 cX + v I
Yields differential inequality
mprime(θ) le cθE tr[
X eθX]
+ vθE tr[
eθX]
= cθ middotmprime(θ) + vθ middotm(θ)
Solve to bound m(θ) and thereby bound
Pλmax(X) ge t le infθgt0
eminusθt middotm(θ) le d middot exp minust22v + 2ct
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 22 35
Exponential Tail Inequalities
Refined Exponential Concentration
Relaxing the constraint ∆X 4 cX + v
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let (XX prime) be a bounded matrix Stein pair with X isin Hd Definethe function
r(ψ) =1
ψlogE tr(eψ∆Xd) for each ψ gt 0
Then for all t ge 0 and all ψ gt 0
Pλmax(X) ge t le d middot exp minust22r(ψ) + 2t
radicψ
r(ψ) measures typical magnitude of conditional variance
Eλmax(∆X) le infψgt0
[
r(ψ) + log dψ
]
When d = 1 improves scalar result of Chatterjee (2008)Proof extends to unbounded random matricesMackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 23 35
Exponential Tail Inequalities
Matrix Bernstein Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let (Yk)kge1 be independent matrices in Hd satisfying
EYk = 0 and Yk le R for each index k
Define the variance parameter
σ2 =∥
∥
∥
sum
kEY 2
k
∥
∥
∥
Then for all t ge 0
P
λmax
(
sum
kYk
)
ge t
le d middot exp minust23σ2 + 2Rt
Gaussian tail controlled by improved variance when t is smallKey proof idea Apply refined concentration and boundr(ψ) = 1
ψlogE tr(eψ∆Xd) using unrefined concentration
Constants better than Oliveira (2009) worse than Tropp (2011)Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 24 35
Polynomial Moment Inequalities
Polynomial Moments for Random Matrices
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let p = 1 or p ge 15 Suppose that (XX prime) is a matrix Stein pairwhere E tr |X|2p ltinfin Then
(
E tr |X|2p)12p le
radic
2pminus 1 middot(
E tr∆pX
)12p
Moral The conditional variance controls the moments of X
Generalizes Chatterjeersquos version (2007) of the scalarBurkholder-Davis-Gundy inequality (Burkholder 1973)
See also Pisier amp Xu (1997) Junge amp Xu (2003 2008)
Proof techniques mirror those for exponential concentration
Also holds for infinite dimensional Schatten-class operators
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 25 35
Polynomial Moment Inequalities
Matrix Khintchine Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let (εk)kge1 be an independent sequence of Rademacher randomvariables and (Ak)kge1 be a deterministic sequence of Hermitianmatrices Then if p = 1 or p ge 15
E tr(
sum
kεkAk
)2p
le (2pminus 1)p middot tr(
sum
kA2k
)p
Noncommutative Khintchine inequality (Lust-Piquard 1986 Lust-Piquard
and Pisier 1991) is a dominant tool in applied matrix analysis
eg Used in analysis of column sampling and projection forapproximate SVD (Rudelson and Vershynin 2007)
Steinrsquos method offers an unusually concise proof
The constantradic2pminus 1 is within
radice of optimal
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 26 35
Dependent Sequences
Adding Dependence
1 MotivationMatrix CompletionMatrix Concentration
2 Steinrsquos Method Background and Notation
3 Exponential Tail Inequalities
4 Polynomial Moment Inequalities
5 Dependent SequencesSums of Conditionally Zero-mean MatricesCombinatorial Sums
6 Extensions
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 27 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Sum of Conditionally Zero-Mean Matrices)
Given a sequence of Hermitian matrices (Yk)nk=1 satisfying the
Conditional zero mean property E[Yk | (Yj)j 6=k] = 0
for all k define the random sum X =sumn
k=1Yk
Note (Yk)kge1 is a martingale difference sequence
Examples
Sums of independent centered random matricesMany sums of conditionally independent random matrices
Yk perpperp (Yj)j 6=k | Z and E[Yk |Z] = 0
Rademacher series with random matrix coefficients
X =sum
kεkWk
(Wk)kge1 Hermitian (εk)kge1 independent Rademacher
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 28 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Yk | (Yj)j 6=k] = 0
Matrix Stein Pair for X =sumn
k=1 Yk
Let Y primek and Yk be conditionally iid given (Yj)j 6=k
Draw index K uniformly from 1 nDefine X prime = X + Y prime
K minus YK
Check Stein pair condition
E[X minusX prime | (Yj)jge1] = E[YK minus Y primeK | (Yj)jge1]
=1
n
sumn
k=1
(
Yk minus E[Y primek | (Yj)j 6=k]
)
=1
n
sumn
k=1Yk =
1
nX
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 29 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Yk | (Yj)j 6=k] = 0
Conditional Variance for X = Y minus EY
∆X =n
2middot E
[
(X minusX prime)2 | (Yj)jge1
]
=n
2middot E
[
(YK minus Y primeK)
2 | (Yj)jge1
]
=1
2
sumn
k=1
(
Y 2k + E[Y 2
k | (Yj)j 6=k])
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 30 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Given a deterministic array (Ajk)njk=1 of Hermitian matrices and a
uniformly random permutation π on 1 n define thecombinatorial matrix statistic
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Generalizes the scalar statistics studied by Hoeffding (1951)
Example
Sampling without replacement from B1 BnW =
sums
j=1Bπ(j)
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 31 35
Combinatorial Sums of Matrices

Definition (Combinatorial Matrix Statistic)
Y = \sum_{j=1}^n A_{j\pi(j)}   with mean   EY = (1/n) \sum_{j,k=1}^n A_{jk}

Matrix Stein Pair for X = Y - EY
Draw a pair of indices (J, K) uniformly from {1, ..., n}^2.
Define \pi' = \pi \circ (J K) (swap the images of J and K) and X' = \sum_{j=1}^n A_{j\pi'(j)} - EY.
Check the Stein pair condition:
E[X - X' | \pi] = E[A_{J\pi(J)} + A_{K\pi(K)} - A_{J\pi(K)} - A_{K\pi(J)} | \pi]
               = (1/n^2) \sum_{j,k=1}^n ( A_{j\pi(j)} + A_{k\pi(k)} - A_{j\pi(k)} - A_{k\pi(j)} )
               = (2/n)(Y - EY) = (2/n) X,
so the scale factor is \alpha = 2/n.
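Because the randomness here is a single transposition, the Stein pair condition can be verified exactly by enumerating all n^2 choices of (J, K). A minimal numpy sketch (illustrative, not from the talk):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
n, d = 4, 2

# Deterministic array of Hermitian (here real symmetric) matrices A[j, k].
A = rng.standard_normal((n, n, d, d))
A = A + A.transpose(0, 1, 3, 2)

pi = rng.permutation(n)                       # a fixed permutation
EY = A.reshape(n * n, d, d).sum(axis=0) / n   # EY = (1/n) sum_{j,k} A_{jk}
X = sum(A[j, pi[j]] for j in range(n)) - EY

# Average X - X' exactly over the n^2 uniform transposition choices (J, K).
acc = np.zeros((d, d))
for J, K in product(range(n), repeat=2):
    pi2 = pi.copy()
    pi2[J], pi2[K] = pi2[K], pi2[J]           # pi' = pi composed with (J K)
    Xp = sum(A[j, pi2[j]] for j in range(n)) - EY
    acc += X - Xp
cond_exp = acc / n**2

assert np.allclose(cond_exp, (2 / n) * X)     # E[X - X' | pi] = (2/n) X
print("Stein pair condition verified with scale factor 2/n")
```

The check is deterministic: no Monte Carlo error, since the conditional expectation is a finite average.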
Combinatorial Sums of Matrices

Definition (Combinatorial Matrix Statistic)
Y = \sum_{j=1}^n A_{j\pi(j)}   with mean   EY = (1/n) \sum_{j,k=1}^n A_{jk}

Conditional Variance for X = Y - EY:
\Delta_X(\pi) = (n/4) E[(X - X')^2 | \pi]
             = (1/(4n)) \sum_{j,k=1}^n [ A_{j\pi(j)} + A_{k\pi(k)} - A_{j\pi(k)} - A_{k\pi(j)} ]^2
             \preceq (1/n) \sum_{j,k=1}^n [ A_{j\pi(j)}^2 + A_{k\pi(k)}^2 + A_{j\pi(k)}^2 + A_{k\pi(j)}^2 ]

=> The conditional variance is controlled when the summands are bounded
=> Dependent analogues of the concentration and moment inequalities follow
Extensions

General Complex Matrices
Map any matrix B \in C^{d_1 x d_2} to a Hermitian matrix via the dilation
  D(B) = [ 0   B ; B*  0 ] \in H^{d_1 + d_2}
Preserves spectral information: \lambda_max(D(B)) = \|B\|
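The spectral identity for the dilation is easy to confirm numerically. A small numpy sketch (illustrative, not from the talk) builds D(B) for a random complex rectangular B and compares its top eigenvalue with the spectral norm of B:

```python
import numpy as np

rng = np.random.default_rng(2)
d1, d2 = 3, 5
B = rng.standard_normal((d1, d2)) + 1j * rng.standard_normal((d1, d2))

# Hermitian dilation: D(B) = [[0, B], [B*, 0]].
D = np.block([[np.zeros((d1, d1)), B],
              [B.conj().T, np.zeros((d2, d2))]])
assert np.allclose(D, D.conj().T)            # D(B) is Hermitian

lam_max = np.linalg.eigvalsh(D)[-1]          # largest eigenvalue of D(B)
spec_norm = np.linalg.norm(B, 2)             # largest singular value of B

assert np.isclose(lam_max, spec_norm)
print("lambda_max(D(B)) equals ||B||")
```

This is why Hermitian results transfer to rectangular matrices: the eigenvalues of D(B) are plus/minus the singular values of B (padded with zeros).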
Beyond Sums
Matrix-valued functions satisfying a self-reproducing property
  e.g., matrix second-order Rademacher chaos \sum_{j,k} \epsilon_j \epsilon_k A_{jk}
Yields a dependent bounded-differences inequality for matrices

Generalized Matrix Stein Pairs
Satisfy E[g(X) - g(X') | Z] = \alpha X almost surely, for g: R -> R weakly increasing
References I

Ahlswede, R. and Winter, A. Strong converse for identification via quantum channels. IEEE Trans. Inform. Theory, 48(3):569-579, Mar. 2002.
Bernstein, S. The Theory of Probabilities. Gastehizdat Publishing House, 1946.
Burkholder, D. L. Distribution function inequalities for martingales. Ann. Probab., 1:19-42, 1973. doi:10.1214/aop/1176997023.
Candes, E. J. and Recht, B. Exact matrix completion via convex optimization. Found. Comput. Math., 9:717-772, 2009.
Candes, E. J. and Tao, T. The power of convex relaxation: Near-optimal matrix completion. IEEE Trans. Inform. Theory, 2009. To appear. Available at arXiv:0903.1476.
Chatterjee, S. Stein's method for concentration inequalities. Probab. Theory Related Fields, 138:305-321, 2007.
Chatterjee, S. Concentration inequalities with exchangeable pairs. PhD thesis, Stanford University, Palo Alto, Feb. 2008. arXiv:math/0507526v1.
Cheung, S.-S., So, A. M.-C., and Wang, K. Chance-constrained linear matrix inequalities with dependent perturbations: A safe tractable approximation approach. Available at http://www.optimization-online.org/DB_FILE/2011/01/2898.pdf, 2011.
Christofides, D. and Markström, K. Expansion properties of random Cayley graphs and vertex transitive graphs via matrix martingales. Random Struct. Algorithms, 32(1):88-100, 2008.
Drineas, P., Mahoney, M. W., and Muthukrishnan, S. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30:844-881, 2008.
Foygel, R. and Srebro, N. Concentration-based guarantees for low-rank matrix reconstruction. Journal of Machine Learning Research - Proceedings Track, 19:315-340, 2011.
Gross, D. Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory, 57(3):1548-1566, Mar. 2011.
Hoeffding, W. A combinatorial central limit theorem. Ann. Math. Statist., 22:558-566, 1951.
Hoeffding, W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13-30, 1963.
References II

Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. Available at arXiv:1104.1672, 2011a.
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. arXiv:1104.1672v3 [math.PR], 2011b.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities. Ann. Probab., 31(2):948-995, 2003.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities II: Applications. Israel J. Math., 167:227-282, 2008.
Lust-Piquard, F. Inegalites de Khintchine dans C_p (1 < p < infinity). C. R. Math. Acad. Sci. Paris, 303(7):289-292, 1986.
Lust-Piquard, F. and Pisier, G. Noncommutative Khintchine and Paley inequalities. Ark. Mat., 29(2):241-260, 1991.
Mackey, L., Talwalkar, A., and Jordan, M. I. Divide-and-conquer matrix factorization. In Shawe-Taylor, J., Zemel, R. S., Bartlett, P. L., Pereira, F. C. N., and Weinberger, K. Q. (eds.), Advances in Neural Information Processing Systems 24, pp. 1134-1142, 2011.
Mackey, L., Jordan, M. I., Chen, R. Y., Farrell, B., and Tropp, J. A. Matrix concentration inequalities via the method of exchangeable pairs. http://arxiv.org/abs/1201.6002, 2012.
Negahban, S. and Wainwright, M. J. Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. arXiv:1009.2118v2 [cs.IT], 2010.
Nemirovski, A. Sums of random symmetric matrices and quadratic optimization under orthogonality constraints. Math. Program., 109:283-317, January 2007. doi:10.1007/s10107-006-0033-0.
Oliveira, R. I. Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. Available at arXiv:0911.0600, Nov. 2009.
Pisier, G. and Xu, Q. Non-commutative martingale inequalities. Comm. Math. Phys., 189(3):667-698, 1997.
Recht, B. Simpler approach to matrix completion. J. Mach. Learn. Res., 12:3413-3430, 2011.
Rudelson, M. and Vershynin, R. Sampling from large matrices: An approach through geometric functional analysis. J. Assoc. Comput. Mach., 54(4):Article 21, 19 pp., Jul. 2007.
So, A. M.-C. Moment inequalities for sums of random matrices and their applications in optimization. Math. Program., 130(1):125-151, 2011.
Stein, C. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. 6th Berkeley Symp. Math. Statist. Probab., Berkeley, 1972. Univ. California Press.
Tropp, J. A. User-friendly tail bounds for sums of random matrices. Found. Comput. Math., August 2011.
Motivation: Matrix Completion

Scalar Bernstein Inequality

Theorem (Bernstein, 1946)
Let (Y_k)_{k \ge 1} be independent random variables in R satisfying
  E Y_k = 0 and |Y_k| \le R for each index k.
Define the variance parameter
  \sigma^2 = \sum_k E Y_k^2.
Then, for all t \ge 0,
  P{ | \sum_k Y_k | \ge t } \le 2 exp( -t^2 / (2\sigma^2 + 2Rt/3) )

Gaussian decay controlled by the variance when t is small
Exponential decay controlled by the uniform bound for large t
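The two tail regimes can be seen in simulation. A quick Monte Carlo sketch (illustrative; uniform summands are an assumption, not from the slides) checks that the empirical tail of a sum of bounded centered variables sits below the Bernstein bound:

```python
import numpy as np

rng = np.random.default_rng(3)
n, R, t = 200, 1.0, 15.0
trials = 100_000

# Independent, centered, bounded summands: uniform on [-R, R].
Y = rng.uniform(-R, R, size=(trials, n))
S = Y.sum(axis=1)

sigma2 = n * R**2 / 3                          # sum_k E Y_k^2 for Uniform[-R, R]
bound = 2 * np.exp(-t**2 / (2 * sigma2 + 2 * R * t / 3))
empirical = np.mean(np.abs(S) >= t)

assert empirical <= bound
print(f"empirical tail {empirical:.4f} <= Bernstein bound {bound:.4f}")
```

Here t is about two standard deviations of S, so the Gaussian part 2 sigma^2 dominates the denominator; for much larger t the 2Rt/3 term takes over and the bound decays only exponentially in t.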
Matrix Bernstein Inequality

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let (Y_k)_{k \ge 1} be independent matrices in R^{m x n} satisfying
  E Y_k = 0 and \|Y_k\| \le R for each index k.
Define the variance parameter
  \sigma^2 = max( \| \sum_k E Y_k Y_k^T \|, \| \sum_k E Y_k^T Y_k \| ).
Then, for all t \ge 0,
  P{ \| \sum_k Y_k \| \ge t } \le (m + n) exp( -t^2 / (3\sigma^2 + 2Rt) )

See also Tropp (2011), Oliveira (2009), Recht (2011)
Gaussian tail when t is small; exponential tail for large t
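Every quantity in the theorem is computable for a concrete ensemble. The sketch below (illustrative; the random-sign ensemble Y_k = eps_k B_k is an assumption, not from the slides) evaluates R, sigma^2, and the (m + n) bound, and compares it against the simulated tail of the spectral norm:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, N, t = 4, 6, 100, 10.0
trials = 5_000

# Independent centered summands: Y_k = eps_k * B_k with eps_k = +/-1,
# so E Y_k = 0 and ||Y_k|| <= R = max_k ||B_k||.
B = rng.standard_normal((N, m, n)) / np.sqrt(N)
R = max(np.linalg.norm(Bk, 2) for Bk in B)

# sigma^2 = max(||sum_k B_k B_k^T||, ||sum_k B_k^T B_k||); the signs square away.
col = sum(Bk @ Bk.T for Bk in B)
row = sum(Bk.T @ Bk for Bk in B)
sigma2 = max(np.linalg.norm(col, 2), np.linalg.norm(row, 2))

bound = (m + n) * np.exp(-t**2 / (3 * sigma2 + 2 * R * t))
eps = rng.choice([-1.0, 1.0], size=(trials, N))
norms = np.array([np.linalg.norm(np.tensordot(e, B, axes=1), 2) for e in eps])
empirical = np.mean(norms >= t)

assert empirical <= bound
print(f"empirical tail {empirical:.4f} <= matrix Bernstein bound {bound:.4f}")
```

The (m + n) prefactor comes from the Hermitian dilation of the rectangular sum, which is how the Hermitian result is applied here.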
Matrix Bernstein Inequality

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
For all t \ge 0,
  P{ \| \sum_k Y_k \| \ge t } \le (m + n) exp( -t^2 / (3\sigma^2 + 2Rt) )

Consequences for matrix completion
Recht (2011) showed that uniform sampling of entries captures most of the information in incoherent low-rank matrices
Negahban and Wainwright (2010) showed that i.i.d. sampling of entries captures most of the information in non-spiky (near) low-rank matrices
Foygel and Srebro (2011) characterized the generalization error of convex matrix completion through Rademacher complexity
Motivation: Matrix Concentration

Concentration Inequalities

Matrix concentration:
  P{ \lambda_max(X - EX) \ge t } \le \delta

Difficulty: matrix multiplication is not commutative
  => e^{X+Y} \ne e^X e^Y

Past approaches (Ahlswede and Winter, 2002; Oliveira, 2009; Tropp, 2011)
  Rely on deep results from matrix analysis
  Apply to sums of independent matrices and matrix martingales

This work
  Stein's method of exchangeable pairs (1972), as advanced by Chatterjee (2007) for scalar concentration
  => Improved exponential tail inequalities (Hoeffding, Bernstein)
  => Polynomial moment inequalities (Khintchine, Rosenthal)
  => Dependent sums and more general matrix functionals
Roadmap

1. Motivation
2. Stein's Method: Background and Notation
3. Exponential Tail Inequalities
4. Polynomial Moment Inequalities
5. Dependent Sequences
6. Extensions
Background

Notation

Hermitian matrices: H^d = { A \in C^{d x d} : A = A* }
  All matrices in this talk are Hermitian.
Maximum eigenvalue: \lambda_max(.)
Trace: tr B, the sum of the diagonal entries of B
Spectral norm: \|B\|, the maximum singular value of B
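For concreteness, these three quantities are one numpy call each. A minimal sketch (illustrative, not from the talk), including the fact that for Hermitian matrices the spectral norm equals the largest absolute eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(5)
d = 4

# Build a Hermitian matrix A = A* from a random complex matrix.
G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
A = (G + G.conj().T) / 2
assert np.allclose(A, A.conj().T)            # A is Hermitian

lam_max = np.linalg.eigvalsh(A)[-1]          # maximum eigenvalue (sorted ascending)
trace = np.trace(A)                          # sum of diagonal entries (real here)
spec_norm = np.linalg.norm(A, 2)             # maximum singular value

# For Hermitian A, the spectral norm is the largest absolute eigenvalue.
assert np.isclose(spec_norm, np.abs(np.linalg.eigvalsh(A)).max())
print(lam_max, trace.real, spec_norm)
```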
Matrix Stein Pair

Definition (Exchangeable Pair)
(Z, Z') is an exchangeable pair if (Z, Z') =_d (Z', Z).

Definition (Matrix Stein Pair)
Let (Z, Z') be an exchangeable pair, and let \Psi: Z -> H^d be a measurable function. Define the random matrices
  X = \Psi(Z) and X' = \Psi(Z').
(X, X') is a matrix Stein pair with scale factor \alpha \in (0, 1] if
  E[X' | Z] = (1 - \alpha) X.

Matrix Stein pairs are exchangeable pairs
Matrix Stein pairs always have zero mean
The Conditional Variance

Definition (Conditional Variance)
Suppose that (X, X') is a matrix Stein pair with scale factor \alpha, constructed from the exchangeable pair (Z, Z'). The conditional variance is the random matrix
  \Delta_X = \Delta_X(Z) = (1/(2\alpha)) E[ (X - X')^2 | Z ].

\Delta_X is a stochastic estimate for the variance E X^2

Take-home Message
Control over \Delta_X yields control over \lambda_max(X)
Exponential Tail Inequalities

Exponential Concentration for Random Matrices

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let (X, X') be a matrix Stein pair with X \in H^d. Suppose that
  \Delta_X \preceq c X + v I almost surely, for c, v \ge 0.
Then, for all t \ge 0,
  P{ \lambda_max(X) \ge t } \le d exp( -t^2 / (2v + 2ct) )

Control over the conditional variance \Delta_X yields a Gaussian tail for \lambda_max(X) for small t and an exponential tail for large t
When d = 1, improves the scalar result of Chatterjee (2007)
The dimensional factor d cannot be removed
Matrix Hoeffding Inequality

Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let X = \sum_k Y_k for independent matrices in H^d satisfying
  E Y_k = 0 and Y_k^2 \preceq A_k^2
for deterministic matrices (A_k)_{k \ge 1}. Define the variance parameter
  \sigma^2 = \| \sum_k A_k^2 \|.
Then, for all t \ge 0,
  P{ \lambda_max( \sum_k Y_k ) \ge t } \le d e^{-t^2 / 2\sigma^2}

Improves upon the matrix Hoeffding inequality of Tropp (2011): optimal constant 1/2 in the exponent
Can replace the variance parameter with \sigma^2 = (1/2) \| \sum_k ( A_k^2 + E Y_k^2 ) \|
Tighter than the classical Hoeffding inequality (1963) when d = 1
Exponential Concentration: Proof Sketch

1. Matrix Laplace transform method (Ahlswede & Winter, 2002)
Relate the tail probability to the trace of the mgf of X:
  P{ \lambda_max(X) \ge t } \le inf_{\theta > 0} e^{-\theta t} m(\theta),   where m(\theta) = E tr e^{\theta X}

Problem: e^{X+Y} \ne e^X e^Y when X, Y \in H^d. How to bound the trace mgf?
Past approaches: Golden-Thompson, Lieb's concavity theorem

Chatterjee's strategy for scalar concentration
  Control mgf growth by bounding the derivative
    m'(\theta) = E tr X e^{\theta X} for \theta \in R
  Rewrite using exchangeable pairs
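In the scalar case (d = 1) the Laplace transform step can be carried out exactly. A small sketch (illustrative, not from the talk) takes X to be a sum of n Rademacher signs, where m(theta) = cosh(theta)^n in closed form, minimizes e^{-theta t} m(theta) over a grid, and confirms the exact binomial tail sits below the bound:

```python
import numpy as np
from math import comb

n, t = 20, 8

# d = 1 instance: X = sum of n independent Rademacher signs.
# Exact mgf: m(theta) = cosh(theta)^n.
thetas = np.linspace(1e-3, 2.0, 2000)
laplace_bound = np.min(np.exp(-thetas * t) * np.cosh(thetas) ** n)

# Exact tail: X = 2H - n with H ~ Binomial(n, 1/2), so X >= t iff H >= (t + n)/2.
h_min = (t + n) // 2
exact_tail = sum(comb(n, h) for h in range(h_min, n + 1)) / 2**n

assert exact_tail <= laplace_bound
print(f"exact tail {exact_tail:.5f} <= Laplace bound {laplace_bound:.5f}")
```

The matrix proof follows the same outline; the work is in bounding the trace mgf, since the closed form above is unavailable once the summands stop commuting.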
Method of Exchangeable Pairs

Lemma
Suppose that (X, X') is a matrix Stein pair with scale factor \alpha. Let F: H^d -> H^d be a measurable function satisfying
  E \| (X - X') F(X) \| < \infty.
Then
  E[X F(X)] = (1/(2\alpha)) E[ (X - X')( F(X) - F(X') ) ].   (1)

Intuition
Can characterize the distribution of a random matrix by integrating it against a class of test functions F
Eq. (1) allows us to estimate this integral using the smoothness properties of F and the discrepancy X - X'
Exponential Concentration: Proof Sketch

2. Method of Exchangeable Pairs
Rewrite the derivative of the trace mgf:
  m'(\theta) = E tr X e^{\theta X} = (1/(2\alpha)) E tr[ (X - X')( e^{\theta X} - e^{\theta X'} ) ]

Goal: use the smoothness of F(X) = e^{\theta X} to bound the derivative
Mean Value Trace Inequality

Lemma (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Suppose that g: R -> R is a weakly increasing function and that h: R -> R is a function whose derivative h' is convex. For all matrices A, B \in H^d, it holds that
  tr[ (g(A) - g(B))(h(A) - h(B)) ] \le (1/2) tr[ (g(A) - g(B))(A - B)(h'(A) + h'(B)) ].

Standard matrix functions: if g: R -> R and A = Q diag(\lambda_1, ..., \lambda_d) Q*, then g(A) = Q diag(g(\lambda_1), ..., g(\lambda_d)) Q*.

The inequality does not hold without the trace
For exponential concentration, we let g(A) = A and h(B) = e^{\theta B}
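The lemma can be spot-checked numerically for the proof's choice g(A) = A, h(x) = e^{theta x}. A minimal numpy sketch (illustrative, not from the talk; `expm_sym` is a hand-rolled helper computing the matrix exponential through the eigendecomposition, as in the standard-matrix-function definition above):

```python
import numpy as np

rng = np.random.default_rng(6)
d, theta = 5, 0.7

def sym(M):
    # Random real symmetric (hence Hermitian) matrix.
    return (M + M.T) / 2

def expm_sym(M, t):
    # e^{t M} for symmetric M: apply exp to the eigenvalues.
    lam, V = np.linalg.eigh(M)
    return (V * np.exp(t * lam)) @ V.T

# Proof's choice: g(A) = A (weakly increasing), h(x) = e^{theta x} (h' convex),
# so h'(A) = theta e^{theta A}.
A = sym(rng.standard_normal((d, d)))
B = sym(rng.standard_normal((d, d)))
hA, hB = expm_sym(A, theta), expm_sym(B, theta)

lhs = np.trace((A - B) @ (hA - hB))
rhs = 0.5 * np.trace((A - B) @ (A - B) @ (theta * hA + theta * hB))

assert lhs <= rhs + 1e-9
print(f"trace LHS {lhs:.4f} <= trace RHS {rhs:.4f}")
```

The trace is essential: without it, the matrix difference of the two sides need not be positive semidefinite.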
Exponential Concentration: Proof Sketch

3. Mean Value Trace Inequality
Bound the derivative of the trace mgf:
  m'(\theta) = (1/(2\alpha)) E tr[ (X - X')( e^{\theta X} - e^{\theta X'} ) ]
            \le (\theta/(4\alpha)) E tr[ (X - X')^2 ( e^{\theta X} + e^{\theta X'} ) ]
            = (\theta/(2\alpha)) E tr[ (X - X')^2 e^{\theta X} ]        (by exchangeability)
            = \theta E tr[ (1/(2\alpha)) E[ (X - X')^2 | Z ] e^{\theta X} ]
            = \theta E tr[ \Delta_X e^{\theta X} ]
Exponential Concentration: Proof Sketch

3. Mean Value Trace Inequality gives the derivative bound
  m'(\theta) \le \theta E tr[ \Delta_X e^{\theta X} ]

4. Conditional Variance Bound \Delta_X \preceq c X + v I
Yields the differential inequality
  m'(\theta) \le c\theta E tr[ X e^{\theta X} ] + v\theta E tr[ e^{\theta X} ] = c\theta m'(\theta) + v\theta m(\theta).
Solve to bound m(\theta), and thereby bound
  P{ \lambda_max(X) \ge t } \le inf_{\theta > 0} e^{-\theta t} m(\theta) \le d exp( -t^2 / (2v + 2ct) )
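The "solve" step is elementary but worth spelling out. A sketch in LaTeX, under the stated bound and for 0 < theta < 1/c (this derivation is reconstructed from the standard argument, not copied from the slides):

```latex
\frac{m'(\theta)}{m(\theta)} \le \frac{v\theta}{1 - c\theta}
\quad\Longrightarrow\quad
\log m(\theta) - \log d \le \int_0^\theta \frac{v s}{1 - c s}\, ds
\le \frac{v \theta^2}{2(1 - c\theta)},
% using m(0) = \mathbb{E}\,\mathrm{tr}\, I = d and 1/(1-cs) \le 1/(1-c\theta).
```

Substituting into the Laplace bound and choosing theta = t/(v + ct), which satisfies theta < 1/c, gives exactly -theta t + v theta^2 / (2(1 - c theta)) = -t^2 / (2v + 2ct), the exponent in the theorem.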
Refined Exponential Concentration

Relaxing the constraint \Delta_X \preceq c X + v I:

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let (X, X') be a bounded matrix Stein pair with X \in H^d. Define the function
  r(\psi) = (1/\psi) log( E tr e^{\psi \Delta_X} / d )   for each \psi > 0.
Then, for all t \ge 0 and all \psi > 0,
  P{ \lambda_max(X) \ge t } \le d exp( -t^2 / (2 r(\psi) + 2t/\sqrt{\psi}) )

r(\psi) measures the typical magnitude of the conditional variance:
  E \lambda_max(\Delta_X) \le inf_{\psi > 0} [ r(\psi) + (log d)/\psi ]
When d = 1, improves the scalar result of Chatterjee (2008)
The proof extends to unbounded random matrices
Matrix Bernstein Inequality

Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let (Y_k)_{k \ge 1} be independent matrices in H^d satisfying
  E Y_k = 0 and \|Y_k\| \le R for each index k.
Define the variance parameter
  \sigma^2 = \| \sum_k E Y_k^2 \|.
Then, for all t \ge 0,
  P{ \lambda_max( \sum_k Y_k ) \ge t } \le d exp( -t^2 / (3\sigma^2 + 2Rt) )

Gaussian tail controlled by the improved variance when t is small
Key proof idea: apply the refined concentration result and bound r(\psi) = (1/\psi) log( E tr e^{\psi \Delta_X} / d ) using unrefined concentration
Constants better than Oliveira (2009), worse than Tropp (2011)
Polynomial Moment Inequalities

Polynomial Moments for Random Matrices

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let p = 1 or p \ge 1.5. Suppose that (X, X') is a matrix Stein pair where E tr |X|^{2p} < \infty. Then
  ( E tr |X|^{2p} )^{1/2p} \le \sqrt{2p - 1} ( E tr \Delta_X^p )^{1/2p}.

Moral: the conditional variance controls the moments of X
Generalizes Chatterjee's version (2007) of the scalar Burkholder-Davis-Gundy inequality (Burkholder, 1973)
See also Pisier & Xu (1997), Junge & Xu (2003, 2008)
Proof techniques mirror those for exponential concentration
Also holds for infinite-dimensional Schatten-class operators
Matrix Khintchine Inequality

Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let (\epsilon_k)_{k \ge 1} be an independent sequence of Rademacher random variables and (A_k)_{k \ge 1} a deterministic sequence of Hermitian matrices. Then, if p = 1 or p \ge 1.5,
  E tr ( \sum_k \epsilon_k A_k )^{2p} \le (2p - 1)^p tr ( \sum_k A_k^2 )^p.

The noncommutative Khintchine inequality (Lust-Piquard, 1986; Lust-Piquard and Pisier, 1991) is a dominant tool in applied matrix analysis
  e.g., used in the analysis of column sampling and projection for approximate SVD (Rudelson and Vershynin, 2007)
Stein's method offers an unusually concise proof
The constant \sqrt{2p - 1} is within a factor of \sqrt{e} of optimal
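For p = 1 the inequality actually holds with equality, since E[eps_j eps_k] vanishes off the diagonal: E tr(sum_k eps_k A_k)^2 = sum_k tr A_k^2 = tr(sum_k A_k^2). A small sketch (illustrative, not from the talk) verifies this exactly by enumerating all 2^n sign patterns:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(7)
n, d = 4, 3

# Deterministic Hermitian (real symmetric) coefficients A_1, ..., A_n.
A = rng.standard_normal((n, d, d))
A = A + A.transpose(0, 2, 1)

# p = 1: average tr(sum_k eps_k A_k)^2 exactly over all 2^n sign patterns.
lhs = 0.0
for eps in product([-1.0, 1.0], repeat=n):
    S = np.tensordot(np.array(eps), A, axes=1)   # S = sum_k eps_k A_k
    lhs += np.trace(S @ S)
lhs /= 2**n

rhs = np.trace(sum(Ak @ Ak for Ak in A))         # (2p-1)^p tr(sum_k A_k^2), p = 1

assert np.isclose(lhs, rhs)                       # equality at p = 1
print(f"E tr S^2 = {lhs:.4f} = tr(sum A_k^2) = {rhs:.4f}")
```

For larger p the cross terms no longer cancel, and the (2p - 1)^p constant quantifies the slack.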
Dependent Sequences

Adding Dependence

1. Motivation: Matrix Completion; Matrix Concentration
2. Stein's Method: Background and Notation
3. Exponential Tail Inequalities
4. Polynomial Moment Inequalities
5. Dependent Sequences: Sums of Conditionally Zero-Mean Matrices; Combinatorial Sums
6. Extensions
Sums of Conditionally Zero-Mean Matrices

Definition (Sum of Conditionally Zero-Mean Matrices)
Given a sequence of Hermitian matrices (Y_k)_{k=1}^n satisfying the conditional zero-mean property
  E[Y_k | (Y_j)_{j \ne k}] = 0
for all k, define the random sum X = \sum_{k=1}^n Y_k.

Note: (Y_k)_{k \ge 1} is a martingale difference sequence

Examples
  Sums of independent centered random matrices
  Many sums of conditionally independent random matrices:
    Y_k independent of (Y_j)_{j \ne k} given Z, with E[Y_k | Z] = 0
  Rademacher series with random matrix coefficients:
    X = \sum_k \epsilon_k W_k, with (W_k)_{k \ge 1} Hermitian and (\epsilon_k)_{k \ge 1} independent Rademacher
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Yk | (Yj)j 6=k] = 0
Matrix Stein Pair for X =sumn
k=1 Yk
Let Y primek and Yk be conditionally iid given (Yj)j 6=k
Draw index K uniformly from 1 nDefine X prime = X + Y prime
K minus YK
Check Stein pair condition
E[X minusX prime | (Yj)jge1] = E[YK minus Y primeK | (Yj)jge1]
=1
n
sumn
k=1
(
Yk minus E[Y primek | (Yj)j 6=k]
)
=1
n
sumn
k=1Yk =
1
nX
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 29 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Yk | (Yj)j 6=k] = 0
Conditional Variance for X = Y minus EY
∆X =n
2middot E
[
(X minusX prime)2 | (Yj)jge1
]
=n
2middot E
[
(YK minus Y primeK)
2 | (Yj)jge1
]
=1
2
sumn
k=1
(
Y 2k + E[Y 2
k | (Yj)j 6=k])
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 30 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Given a deterministic array (Ajk)njk=1 of Hermitian matrices and a
uniformly random permutation π on 1 n define thecombinatorial matrix statistic
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Generalizes the scalar statistics studied by Hoeffding (1951)
Example
Sampling without replacement from B1 BnW =
sums
j=1Bπ(j)
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 31 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Matrix Stein Pair for X = Y minus EY
Draw indices (JK) uniformly from 1 n2Define πprime = π (JK) and X prime =
sumnj=1Ajπprime(j) minus EY
Check Stein pair condition
E[X minusX prime | π] = E[
AJπ(J) +AKπ(K) minusAJπ(K) minusAKπ(J) | π]
=1
n2
sumn
jk=1Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
=2
n(Y minus EY ) =
2
nX
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 32 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Conditional Variance for X = Y minus EY
∆X(π) =n
4E[
(X minusX prime)2 | π]
=1
4n
sumn
jk=1
[
Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
]2
41
n
sumn
jk=1
[
A2jπ(j) +A2
kπ(k) +A2jπ(k) +A2
kπ(j)
]
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 33 35
Extensions
Extensions
General Complex Matrices
Map any matrix B isin Cd1timesd2 to a Hermitian matrix via dilation
D(B) =
[
0 B
Blowast0
]
isin Hd1+d2
Preserves spectral information λmax(D(B)) = B
Beyond Sums
Matrix-valued functions satisfying a self-reproducing property
eg Matrix second-order Rademacher chaossum
jk εjεkAjk
Yields a dependent bounded differences inequality for matrices
Generalized Matrix Stein Pairs
Satisfy E[g(X)minus g(X prime) |Z] = αX almost surely forg R rarr R weakly increasingMackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 34 35
Extensions
References IAhlswede R and Winter A Strong converse for identification via quantum channels IEEE Trans Inform Theory 48(3)
569ndash579 Mar 2002
Bernstein S The theory of probabilities Gastehizdat Publishing House 1946
Burkholder D L Distribution function inequalities for martingales Ann Probab 119ndash42 1973 doi101214aop1176997023
Candes E J and Recht B Exact matrix completion via convex optimization Found Comput Math 9717ndash772 2009
Candes E J and Tao T The power of convex relaxation Near-optimal matrix completion IEEE Trans Info Theory 2009URL arXiv09031476 To appear Available at arXiv09031476
Chatterjee S Steinrsquos method for concentration inequalities Probab Theory Related Fields 138305ndash321 2007
Chatterjee S Concentration inequalities with exchangeable pairs PhD thesis Stanford University Palo Alto Feb 2008 URLarxivmath0507526vl
Cheung S-S So A Man-Cho and Wang K Chance-constrained linear matrix inequalities with dependent perturbations Asafe tractable approximation approach Available athttpwwwoptimization-onlineorgDB_FILE2011012898pdf 2011
Christofides D and Markstrom K Expansion properties of random cayley graphs and vertex transitive graphs via matrixmartingales Random Struct Algorithms 32(1)88ndash100 2008
Drineas P Mahoney M W and Muthukrishnan S Relative-error CUR matrix decompositions SIAM Journal on Matrix
Analysis and Applications 30844ndash881 2008
Foygel R and Srebro N Concentration-based guarantees for low-rank matrix reconstruction Journal of Machine Learning
Research - Proceedings Track 19315ndash340 2011
Gross D Recovering low-rank matrices from few coefficients in any basis IEEE Trans Inform Theory 57(3)1548ndash1566 Mar2011
Hoeffding W A combinatorial central limit theorem Ann Math Statist 22558ndash566 1951
Hoeffding W Probability inequalities for sums of bounded random variables Journal of the American Statistical Association58(301)13ndash30 1963
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 35 35
Extensions
References II
Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matrices Available atarXiv11041672 2011a
Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matricesarXiv11041672v3[mathPR] 2011b
Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities Ann Probab 31(2)948ndash995 2003
Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities II Applications Israel J Math 167227ndash282 2008
Lust-Piquard F Inegalites de Khintchine dans Cp (1 lt p lt infin) C R Math Acad Sci Paris 303(7)289ndash292 1986
Lust-Piquard F and Pisier G Noncommutative Khintchine and Paley inequalities Ark Mat 29(2)241ndash260 1991
Mackey L Talwalkar A and Jordan M I Divide-and-conquer matrix factorization In Shawe-Taylor J Zemel R SBartlett P L Pereira F C N and Weinberger K Q (eds) Advances in Neural Information Processing Systems 24 pp1134ndash1142 2011
Mackey L Jordan M I Chen R Y Farrell B and Tropp J A Matrix concentration inequalities via the method ofexchangeable pairs URL httparxivorgabs12016002 2012
Negahban S and Wainwright M J Restricted strong convexity and weighted matrix completion Optimal bounds with noisearXiv10092118v2[csIT] 2010
Nemirovski A Sums of random symmetric matrices and quadratic optimization under orthogonality constraints Math
Program 109283ndash317 January 2007 ISSN 0025-5610 doi 101007s10107-006-0033-0 URLhttpdlacmorgcitationcfmid=12297161229726
Oliveira R I Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges Availableat arXiv09110600 Nov 2009
Pisier G and Xu Q Non-commutative martingale inequalities Comm Math Phys 189(3)667ndash698 1997
Recht B Simpler approach to matrix completion J Mach Learn Res 123413ndash3430 2011
Rudelson M and Vershynin R Sampling from large matrices An approach through geometric functional analysis J Assoc
Comput Mach 54(4)Article 21 19 pp Jul 2007 (electronic)
So A Man-Cho Moment inequalities for sums of random matrices and their applications in optimization Math Program 130(1)125ndash151 2011
Stein C A bound for the error in the normal approximation to the distribution of a sum of dependent random variables InProc 6th Berkeley Symp Math Statist Probab Berkeley 1972 Univ California Press
Tropp J A User-friendly tail bounds for sums of random matrices Found Comput Math August 2011
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 36 35
- Motivation
-
- Matrix Completion
- Matrix Concentration
-
- Extensions
-
Motivation Matrix Completion
Matrix Bernstein Inequality
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let (Yk)kge1 be independent matrices in Rmtimesn satisfying
EYk = 0 and Yk le R for each index k
Define the variance parameter
σ2 = max(∥
∥
∥
sum
kEYkY
⊤k
∥
∥
∥∥
∥
∥
sum
kEY ⊤
k Yk
∥
∥
∥
)
Then for all t ge 0
P
∥
∥
∥
sum
kYk
∥
∥
∥ge t
le (m+ n) middot exp minust23σ2 + 2Rt
See also Tropp (2011) Oliveira (2009) Recht (2011)
Gaussian tail when t is small exponential tail for large t
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 8 35
Motivation Matrix Completion
Matrix Bernstein Inequality
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
For all t ge 0
P
∥
∥
∥
sum
kYk
∥
∥
∥ge t
le (m+ n) middot exp minust23σ2 + 2Rt
Consequences for matrix completion
Recht (2011) showed that uniform sampling of entries capturesmost of the information in incoherent low-rank matrices
Negahban and Wainwright (2010) showed that iid sampling ofentries captures most of the information in non-spiky (near)low-rank matrices
Foygel and Srebro (2011) characterized the generalization errorof convex MC through Rademacher complexity
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 9 35
Motivation Matrix Concentration
Concentration Inequalities
Matrix concentration
Pλmax(X minus EX) ge t le δ
Difficulty Matrix multiplication is not commutative
rArr eX+Y 6= eXeY
Past approaches (Ahlswede and Winter 2002 Oliveira 2009 Tropp 2011)
Rely on deep results from matrix analysis
Apply to sums of independent matrices and matrix martingales
This work
Steinrsquos method of exchangeable pairs (1972) as advanced byChatterjee (2007) for scalar concentrationrArr Improved exponential tail inequalities (Hoeffding Bernstein)rArr Polynomial moment inequalities (Khintchine Rosenthal)rArr Dependent sums and more general matrix functionals
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 10 35
Motivation Matrix Concentration
Roadmap
1 Motivation
2 Steinrsquos Method Background and Notation
3 Exponential Tail Inequalities
4 Polynomial Moment Inequalities
5 Dependent Sequences
6 Extensions
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 11 35
Background
Notation
Hermitian matrices Hd = A isin Cdtimesd A = AlowastAll matrices in this talk are Hermitian
Maximum eigenvalue λmax(middot)Trace trB the sum of the diagonal entries of B
Spectral norm B the maximum singular value of B
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 12 35
Background
Matrix Stein Pair
Definition (Exchangeable Pair)
(ZZ prime) is an exchangeable pair if (ZZ prime)d= (Z prime Z)
Definition (Matrix Stein Pair)
Let (ZZ prime) be an exchangeable pair and let Ψ Z rarr Hd be ameasurable function Define the random matrices
X = Ψ(Z) and X prime = Ψ(Z prime)
(XX prime) is a matrix Stein pair with scale factor α isin (0 1] if
E[X prime |Z] = (1minus α)X
Matrix Stein pairs are exchangeable pairs
Matrix Stein pairs always have zero mean
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 13 35
Background
The Conditional Variance
Definition (Conditional Variance)
Suppose that (XX prime) is a matrix Stein pair with scale factor αconstructed from the exchangeable pair (ZZ prime) The conditional
variance is the random matrix
∆X = ∆X(Z) =1
2αE[
(X minusX prime)2 |Z]
∆X is a stochastic estimate for the variance EX2
Take-home Message
Control over ∆X yields control over λmax(X)
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 14 35
Exponential Tail Inequalities
Exponential Concentration for Random Matrices
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let (XX prime) be a matrix Stein pair with X isin Hd Suppose that
∆X 4 cX + v I almost surely for c v ge 0
Then for all t ge 0
Pλmax(X) ge t le d middot exp minust22v + 2ct
Control over the conditional variance ∆X yields
Gaussian tail for λmax(X) for small t exponential tail for large t
When d = 1 improves scalar result of Chatterjee (2007)
The dimensional factor d cannot be removed
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 15 35
Exponential Tail Inequalities
Matrix Hoeffding Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let X =sum
k Yk for independent matrices in Hd satisfying
EYk = 0 and Y 2k 4 A2
k
for deterministic matrices (Ak)kge1 Define the variance parameter
σ2 =∥
∥
∥
sum
kA2k
∥
∥
∥
Then for all t ge 0
P
λmax
(
sum
kYk
)
ge t
le d middot eminust22σ2
Improves upon the matrix Hoeffding inequality of Tropp (2011)Optimal constant 12 in the exponent
Can replace variance parameter with σ2 = 12
∥
∥
sum
k
(
A2k + EY 2
k
)∥
∥
Tighter than classical Hoeffding inequality (1963) when d = 1
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 16 35
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
1 Matrix Laplace transform method (Ahlswede amp Winter 2002)
Relate tail probability to the trace of the mgf of X
Pλmax(X) ge t le infθgt0
eminusθt middotm(θ)
where m(θ) = E tr eθX
Problem eX+Y 6= eXeY when XY isin Hd
How to bound the trace mgf
Past approaches Golden-Thompson Liebrsquos concavity theorem
Chatterjeersquos strategy for scalar concentration
Control mgf growth by bounding derivative
mprime(θ) = E trXeθX for θ isin R
Rewrite using exchangeable pairs
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 17 35
Method of Exchangeable Pairs

Lemma
Suppose that (X, X′) is a matrix Stein pair with scale factor α. Let F: H^d → H^d be a measurable function satisfying E‖(X − X′) F(X)‖ < ∞. Then

    E[X F(X)] = (1/(2α)) · E[(X − X′)(F(X) − F(X′))].   (1)

Intuition:
- Can characterize the distribution of a random matrix by integrating it against a class of test functions F.
- Eq. (1) allows us to estimate this integral using the smoothness properties of F and the discrepancy X − X′.
Exponential Concentration: Proof Sketch

2. Method of Exchangeable Pairs
- Rewrite the derivative of the trace mgf:

    m′(θ) = E tr X e^{θX} = (1/(2α)) · E tr[(X − X′)(e^{θX} − e^{θX′})].

Goal: use the smoothness of F(X) = e^{θX} to bound the derivative.
Mean Value Trace Inequality

Lemma (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Suppose that g: R → R is a weakly increasing function and that h: R → R is a function whose derivative h′ is convex. For all matrices A, B ∈ H^d, it holds that

    tr[(g(A) − g(B)) · (h(A) − h(B))] ≤ (1/2) · tr[(g(A) − g(B)) · (A − B) · (h′(A) + h′(B))].

Standard matrix functions: if g: R → R and A = Q diag(λ₁, …, λ_d) Q*, then g(A) = Q diag(g(λ₁), …, g(λ_d)) Q*.

- The inequality does not hold without the trace.
- For exponential concentration, we let g(A) = A and h(B) = e^{θB}.
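The lemma is easy to probe numerically with the choice used in the proof, g(x) = x and h(x) = e^{θx} (a sketch on random symmetric matrices, not part of the original slides):

```python
import numpy as np

def herm_fun(H, f):
    # Apply a scalar function to a Hermitian matrix spectrally.
    w, Q = np.linalg.eigh(H)
    return (Q * f(w)) @ Q.conj().T

rng = np.random.default_rng(1)
theta = 0.7
M, N = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
A, B = (M + M.T) / 2, (N + N.T) / 2  # random symmetric (Hermitian) matrices

expA = herm_fun(A, lambda w: np.exp(theta * w))
expB = herm_fun(B, lambda w: np.exp(theta * w))

# g(x) = x, h(x) = e^{theta x}, so h'(x) = theta e^{theta x}.
lhs = np.trace((A - B) @ (expA - expB))
rhs = 0.5 * np.trace((A - B) @ (A - B) @ (theta * (expA + expB)))
assert lhs <= rhs + 1e-9
```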
Exponential Concentration: Proof Sketch

3. Mean Value Trace Inequality
- Bound the derivative of the trace mgf:

    m′(θ) = (1/(2α)) · E tr[(X − X′)(e^{θX} − e^{θX′})]
          ≤ (θ/(4α)) · E tr[(X − X′)² · (e^{θX} + e^{θX′})]
          = (θ/(2α)) · E tr[(X − X′)² · e^{θX}]
          = θ · E tr[(1/(2α)) · E[(X − X′)² | Z] · e^{θX}]
          = θ · E tr[Δ_X e^{θX}].
Exponential Concentration: Proof Sketch

3. Mean Value Trace Inequality
- Bound the derivative of the trace mgf:

    m′(θ) ≤ θ · E tr[Δ_X e^{θX}].

4. Conditional Variance Bound: Δ_X ⪯ cX + vI
- Yields the differential inequality

    m′(θ) ≤ cθ · E tr[X e^{θX}] + vθ · E tr[e^{θX}] = cθ · m′(θ) + vθ · m(θ).

- Solve to bound m(θ) and thereby bound

    P{λ_max(X) ≥ t} ≤ inf_{θ>0} e^{−θt} · m(θ) ≤ d · exp(−t² / (2v + 2ct)).
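The slide leaves the solve step implicit; here is one standard way to fill it in (my reconstruction, using m(0) = tr I = d):

```latex
% For 0 < \theta < 1/c, rearranging m'(\theta) \le c\theta\, m'(\theta) + v\theta\, m(\theta) gives
\frac{m'(\theta)}{m(\theta)} \le \frac{v\theta}{1 - c\theta}
\;\Longrightarrow\;
\log m(\theta) \le \log d + \int_0^\theta \frac{v s}{1 - c s}\, ds
\le \log d + \frac{v\theta^2}{2(1 - c\theta)}.

% Substituting into the matrix Laplace transform bound and choosing
% \theta = t/(v + ct), so that 1 - c\theta = v/(v + ct), yields
\mathbb{P}\{\lambda_{\max}(X) \ge t\}
\le d \cdot \exp\!\Bigl(-\theta t + \frac{v\theta^2}{2(1 - c\theta)}\Bigr)
= d \cdot \exp\!\Bigl(\frac{-t^2}{2v + 2ct}\Bigr).
```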
Refined Exponential Concentration

Relaxing the constraint Δ_X ⪯ cX + vI:

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let (X, X′) be a bounded matrix Stein pair with X ∈ H^d. Define the function

    r(ψ) = (1/ψ) · log(E tr e^{ψΔ_X} / d)   for each ψ > 0.

Then, for all t ≥ 0 and all ψ > 0,

    P{λ_max(X) ≥ t} ≤ d · exp(−t² / (2r(ψ) + 2t/√ψ)).

- r(ψ) measures the typical magnitude of the conditional variance:

    E λ_max(Δ_X) ≤ inf_{ψ>0} [r(ψ) + (log d)/ψ].

- When d = 1, this improves the scalar result of Chatterjee (2008).
- The proof extends to unbounded random matrices.
Matrix Bernstein Inequality

Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let (Y_k)_{k≥1} be independent matrices in H^d satisfying

    E Y_k = 0   and   ‖Y_k‖ ≤ R   for each index k.

Define the variance parameter

    σ² = ‖Σ_k E Y_k²‖.

Then, for all t ≥ 0,

    P{λ_max(Σ_k Y_k) ≥ t} ≤ d · exp(−t² / (3σ² + 2Rt)).

- Gaussian tail controlled by the improved variance parameter when t is small.
- Key proof idea: apply the refined concentration result and bound r(ψ) = (1/ψ) · log(E tr e^{ψΔ_X} / d) using the unrefined result.
- Constants are better than Oliveira (2009) but worse than Tropp (2011).
Polynomial Moment Inequalities

Polynomial Moments for Random Matrices

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let p = 1 or p ≥ 1.5. Suppose that (X, X′) is a matrix Stein pair where E tr |X|^{2p} < ∞. Then

    (E tr |X|^{2p})^{1/(2p)} ≤ √(2p − 1) · (E tr Δ_X^p)^{1/(2p)}.

- Moral: the conditional variance controls the moments of X.
- Generalizes Chatterjee's version (2007) of the scalar Burkholder-Davis-Gundy inequality (Burkholder, 1973).
- See also Pisier & Xu (1997), Junge & Xu (2003, 2008).
- The proof techniques mirror those for exponential concentration.
- Also holds for infinite-dimensional Schatten-class operators.
Matrix Khintchine Inequality

Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let (ε_k)_{k≥1} be an independent sequence of Rademacher random variables and (A_k)_{k≥1} a deterministic sequence of Hermitian matrices. Then, if p = 1 or p ≥ 1.5,

    E tr(Σ_k ε_k A_k)^{2p} ≤ (2p − 1)^p · tr(Σ_k A_k²)^p.

- The noncommutative Khintchine inequality (Lust-Piquard, 1986; Lust-Piquard and Pisier, 1991) is a dominant tool in applied matrix analysis, e.g., in the analysis of column sampling and projection for approximate SVD (Rudelson and Vershynin, 2007).
- Stein's method offers an unusually concise proof.
- The constant √(2p − 1) is within √e of optimal.
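For a small number of coefficients the inequality can be verified exactly by enumerating all 2^n sign patterns (a hypothetical example of my own; for p = 1 the cross terms cancel and the bound holds with equality):

```python
import itertools
import numpy as np

# Exact check of E tr (sum_k eps_k A_k)^{2p} <= (2p-1)^p tr (sum_k A_k^2)^p.
rng = np.random.default_rng(2)
n, d = 4, 3
M = rng.standard_normal((n, d, d))
A = (M + np.swapaxes(M, -1, -2)) / 2  # Hermitian coefficient matrices
S2 = sum(Ak @ Ak for Ak in A)         # sum_k A_k^2

def moment(p):
    # E tr (sum_k eps_k A_k)^{2p}, exactly, by enumerating sign patterns.
    vals = []
    for eps in itertools.product([-1, 1], repeat=n):
        S = sum(e * Ak for e, Ak in zip(eps, A))
        vals.append(np.trace(np.linalg.matrix_power(S, 2 * p)))
    return np.mean(vals)

# p = 1: equality, since the cross terms vanish in expectation.
assert np.isclose(moment(1), np.trace(S2))

# p = 2: inequality with constant (2p - 1)^p = 9.
lhs, rhs = moment(2), 9 * np.trace(S2 @ S2)
assert lhs <= rhs + 1e-9
```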
Dependent Sequences

Adding Dependence

1. Motivation: Matrix Completion, Matrix Concentration
2. Stein's Method: Background and Notation
3. Exponential Tail Inequalities
4. Polynomial Moment Inequalities
5. Dependent Sequences: Sums of Conditionally Zero-mean Matrices, Combinatorial Sums
6. Extensions
Dependent Sequences: Sums of Conditionally Zero-mean Matrices

Sums of Conditionally Zero-mean Matrices

Definition (Sum of Conditionally Zero-Mean Matrices)
Given a sequence of Hermitian matrices (Y_k)_{k=1}^n satisfying the conditional zero-mean property

    E[Y_k | (Y_j)_{j≠k}] = 0   for all k,

define the random sum X = Σ_{k=1}^n Y_k.

Note: (Y_k) is a martingale difference sequence.

Examples:
- Sums of independent centered random matrices.
- Many sums of conditionally independent random matrices:

    Y_k ⫫ (Y_j)_{j≠k} | Z   and   E[Y_k | Z] = 0.

- Rademacher series with random matrix coefficients:

    X = Σ_k ε_k W_k,

  with (W_k)_{k≥1} Hermitian and (ε_k)_{k≥1} independent Rademacher.
Sums of Conditionally Zero-mean Matrices

Conditional zero-mean property: E[Y_k | (Y_j)_{j≠k}] = 0.

Matrix Stein Pair for X = Σ_{k=1}^n Y_k:
- Let Y′_k and Y_k be conditionally i.i.d. given (Y_j)_{j≠k}.
- Draw an index K uniformly from {1, …, n} and define X′ = X + Y′_K − Y_K.
- Check the Stein pair condition:

    E[X − X′ | (Y_j)_{j≥1}] = E[Y_K − Y′_K | (Y_j)_{j≥1}]
                            = (1/n) Σ_{k=1}^n (Y_k − E[Y′_k | (Y_j)_{j≠k}])
                            = (1/n) Σ_{k=1}^n Y_k = (1/n) X.
Sums of Conditionally Zero-mean Matrices

Conditional zero-mean property: E[Y_k | (Y_j)_{j≠k}] = 0.

Conditional Variance for X = Σ_{k=1}^n Y_k:

    Δ_X = (n/2) · E[(X − X′)² | (Y_j)_{j≥1}]
        = (n/2) · E[(Y_K − Y′_K)² | (Y_j)_{j≥1}]
        = (1/2) Σ_{k=1}^n (Y_k² + E[Y_k² | (Y_j)_{j≠k}]).

⇒ The conditional variance is controlled when the summands are bounded.
⇒ Dependent analogues of the concentration and moment inequalities follow.
Dependent Sequences: Combinatorial Sums

Combinatorial Sums of Matrices

Definition (Combinatorial Matrix Statistic)
Given a deterministic array (A_{jk})_{j,k=1}^n of Hermitian matrices and a uniformly random permutation π on {1, …, n}, define the combinatorial matrix statistic

    Y = Σ_{j=1}^n A_{jπ(j)},   with mean   E Y = (1/n) Σ_{j,k=1}^n A_{jk}.

- Generalizes the scalar statistics studied by Hoeffding (1951).
- Example: sampling without replacement from {B₁, …, B_n}:

    W = Σ_{j=1}^s B_{π(j)}.
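The mean formula can be confirmed exactly on a small array by enumerating all n! permutations (a hypothetical example of my own):

```python
import itertools
import numpy as np

# Verify E Y = (1/n) * sum_{j,k} A_{jk} by enumerating all permutations.
rng = np.random.default_rng(3)
n, d = 4, 2
M = rng.standard_normal((n, n, d, d))
A = (M + np.swapaxes(M, -1, -2)) / 2  # each A[j, k] Hermitian (symmetric)

perms = list(itertools.permutations(range(n)))
EY = sum(sum(A[j, pi[j]] for j in range(n)) for pi in perms) / len(perms)
formula = A.sum(axis=(0, 1)) / n

assert np.allclose(EY, formula)
```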
Combinatorial Sums of Matrices

Combinatorial matrix statistic: Y = Σ_{j=1}^n A_{jπ(j)}, with mean E Y = (1/n) Σ_{j,k=1}^n A_{jk}.

Matrix Stein Pair for X = Y − E Y:
- Draw a pair of indices (J, K) uniformly from {1, …, n}².
- Define π′ = π ∘ (J K) and X′ = Σ_{j=1}^n A_{jπ′(j)} − E Y.
- Check the Stein pair condition:

    E[X − X′ | π] = E[A_{Jπ(J)} + A_{Kπ(K)} − A_{Jπ(K)} − A_{Kπ(J)} | π]
                  = (1/n²) Σ_{j,k=1}^n (A_{jπ(j)} + A_{kπ(k)} − A_{jπ(k)} − A_{kπ(j)})
                  = (2/n)(Y − E Y) = (2/n) X.
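The Stein pair identity can also be checked numerically for a fixed permutation by averaging over all n² transposition pairs (J, K) (a hypothetical example of my own):

```python
import numpy as np

# For a fixed permutation pi, verify E[X - X' | pi] = (2/n) X.
rng = np.random.default_rng(4)
n, d = 5, 2
M = rng.standard_normal((n, n, d, d))
A = (M + np.swapaxes(M, -1, -2)) / 2  # Hermitian entries

pi = np.array([2, 0, 4, 1, 3])  # a fixed permutation of {0, ..., 4}
EY = A.sum(axis=(0, 1)) / n
Y = sum(A[j, pi[j]] for j in range(n))
X = Y - EY

diff = np.zeros((d, d))
for j in range(n):
    for k in range(n):
        # Transposing pi(j) and pi(k) changes the statistic by this amount:
        diff += A[j, pi[j]] + A[k, pi[k]] - A[j, pi[k]] - A[k, pi[j]]
cond_exp = diff / n**2  # E[X - X' | pi]

assert np.allclose(cond_exp, (2 / n) * X)
```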
Combinatorial Sums of Matrices

Combinatorial matrix statistic: Y = Σ_{j=1}^n A_{jπ(j)}, with mean E Y = (1/n) Σ_{j,k=1}^n A_{jk}.

Conditional Variance for X = Y − E Y:

    Δ_X(π) = (n/4) · E[(X − X′)² | π]
           = (1/(4n)) Σ_{j,k=1}^n [A_{jπ(j)} + A_{kπ(k)} − A_{jπ(k)} − A_{kπ(j)}]²
           ⪯ (1/n) Σ_{j,k=1}^n [A²_{jπ(j)} + A²_{kπ(k)} + A²_{jπ(k)} + A²_{kπ(j)}].

⇒ The conditional variance is controlled when the summands are bounded.
⇒ Dependent analogues of the concentration and moment inequalities follow.
Extensions

General Complex Matrices
- Map any matrix B ∈ C^{d₁×d₂} to a Hermitian matrix via dilation:

    D(B) = [ 0   B
             B*  0 ] ∈ H^{d₁+d₂}.

- Preserves spectral information: λ_max(D(B)) = ‖B‖.
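The dilation and the identity λ_max(D(B)) = ‖B‖ are easy to demonstrate directly (a sketch of my own using NumPy):

```python
import numpy as np

def dilation(B):
    # Hermitian dilation of a rectangular complex matrix B.
    d1, d2 = B.shape
    return np.block([
        [np.zeros((d1, d1), dtype=complex), B],
        [B.conj().T, np.zeros((d2, d2), dtype=complex)],
    ])

rng = np.random.default_rng(5)
B = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))

D = dilation(B)
assert np.allclose(D, D.conj().T)  # D(B) is Hermitian

lam_max = np.linalg.eigvalsh(D)[-1]
spec_norm = np.linalg.norm(B, 2)  # largest singular value of B
assert np.isclose(lam_max, spec_norm)
```

The eigenvalues of D(B) are plus and minus the singular values of B, which is why the maximum eigenvalue recovers the spectral norm.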
Beyond Sums
- Matrix-valued functions satisfying a self-reproducing property.
- E.g., matrix second-order Rademacher chaos: Σ_{j,k} ε_j ε_k A_{jk}.
- Yields a dependent bounded-differences inequality for matrices.

Generalized Matrix Stein Pairs
- Satisfy E[g(X) − g(X′) | Z] = αX almost surely, for g: R → R weakly increasing.
References I

Ahlswede, R. and Winter, A. Strong converse for identification via quantum channels. IEEE Trans. Inform. Theory, 48(3):569–579, Mar. 2002.
Bernstein, S. The Theory of Probabilities. Gastehizdat Publishing House, 1946.
Burkholder, D. L. Distribution function inequalities for martingales. Ann. Probab., 1:19–42, 1973. doi:10.1214/aop/1176997023.
Candès, E. J. and Recht, B. Exact matrix completion via convex optimization. Found. Comput. Math., 9:717–772, 2009.
Candès, E. J. and Tao, T. The power of convex relaxation: Near-optimal matrix completion. IEEE Trans. Inform. Theory, 2009. To appear. Available at arXiv:0903.1476.
Chatterjee, S. Stein's method for concentration inequalities. Probab. Theory Related Fields, 138:305–321, 2007.
Chatterjee, S. Concentration inequalities with exchangeable pairs. PhD thesis, Stanford University, Palo Alto, Feb. 2008. arXiv:math/0507526v1.
Cheung, S.-S., So, A. Man-Cho, and Wang, K. Chance-constrained linear matrix inequalities with dependent perturbations: A safe tractable approximation approach. Available at http://www.optimization-online.org, 2011.
Christofides, D. and Markström, K. Expansion properties of random Cayley graphs and vertex transitive graphs via matrix martingales. Random Struct. Algorithms, 32(1):88–100, 2008.
Drineas, P., Mahoney, M. W., and Muthukrishnan, S. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30:844–881, 2008.
Foygel, R. and Srebro, N. Concentration-based guarantees for low-rank matrix reconstruction. Journal of Machine Learning Research: Proceedings Track, 19:315–340, 2011.
Gross, D. Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory, 57(3):1548–1566, Mar. 2011.
Hoeffding, W. A combinatorial central limit theorem. Ann. Math. Statist., 22:558–566, 1951.
Hoeffding, W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.

Mackey (Stanford) · Stein's Method for Matrix Concentration · December 10, 2012
References II

Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. arXiv:1104.1672, 2011.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities. Ann. Probab., 31(2):948–995, 2003.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities II: Applications. Israel J. Math., 167:227–282, 2008.
Lust-Piquard, F. Inégalités de Khintchine dans C_p (1 < p < ∞). C. R. Math. Acad. Sci. Paris, 303(7):289–292, 1986.
Lust-Piquard, F. and Pisier, G. Noncommutative Khintchine and Paley inequalities. Ark. Mat., 29(2):241–260, 1991.
Mackey, L., Talwalkar, A., and Jordan, M. I. Divide-and-conquer matrix factorization. In Advances in Neural Information Processing Systems 24, pp. 1134–1142, 2011.
Mackey, L., Jordan, M. I., Chen, R. Y., Farrell, B., and Tropp, J. A. Matrix concentration inequalities via the method of exchangeable pairs. arXiv:1201.6002, 2012.
Negahban, S. and Wainwright, M. J. Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. arXiv:1009.2118v2, 2010.
Nemirovski, A. Sums of random symmetric matrices and quadratic optimization under orthogonality constraints. Math. Program., 109:283–317, Jan. 2007. doi:10.1007/s10107-006-0033-0.
Oliveira, R. I. Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. arXiv:0911.0600, Nov. 2009.
Pisier, G. and Xu, Q. Non-commutative martingale inequalities. Comm. Math. Phys., 189(3):667–698, 1997.
Recht, B. A simpler approach to matrix completion. J. Mach. Learn. Res., 12:3413–3430, 2011.
Rudelson, M. and Vershynin, R. Sampling from large matrices: An approach through geometric functional analysis. J. Assoc. Comput. Mach., 54(4):Article 21, Jul. 2007.
So, A. Man-Cho. Moment inequalities for sums of random matrices and their applications in optimization. Math. Program., 130(1):125–151, 2011.
Stein, C. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. 6th Berkeley Symp. Math. Statist. Probab., Berkeley, 1972. Univ. California Press.
Tropp, J. A. User-friendly tail bounds for sums of random matrices. Found. Comput. Math., Aug. 2011.
Motivation: Matrix Completion

Matrix Bernstein Inequality

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
For all t ≥ 0,

    P{‖Σ_k Y_k‖ ≥ t} ≤ (m + n) · exp(−t² / (3σ² + 2Rt)).

Consequences for matrix completion:
- Recht (2011) showed that uniform sampling of entries captures most of the information in incoherent low-rank matrices.
- Negahban and Wainwright (2010) showed that i.i.d. sampling of entries captures most of the information in non-spiky (near) low-rank matrices.
- Foygel and Srebro (2011) characterized the generalization error of convex MC through Rademacher complexity.
Motivation: Matrix Concentration

Concentration Inequalities

Matrix concentration:

    P{λ_max(X − E X) ≥ t} ≤ δ.

Difficulty: matrix multiplication is not commutative ⇒ e^{X+Y} ≠ e^X e^Y.

Past approaches (Ahlswede and Winter, 2002; Oliveira, 2009; Tropp, 2011):
- Rely on deep results from matrix analysis.
- Apply to sums of independent matrices and matrix martingales.

This work: Stein's method of exchangeable pairs (1972), as advanced by Chatterjee (2007) for scalar concentration.
⇒ Improved exponential tail inequalities (Hoeffding, Bernstein).
⇒ Polynomial moment inequalities (Khintchine, Rosenthal).
⇒ Dependent sums and more general matrix functionals.
Roadmap

1. Motivation
2. Stein's Method: Background and Notation
3. Exponential Tail Inequalities
4. Polynomial Moment Inequalities
5. Dependent Sequences
6. Extensions
Background

Notation

- Hermitian matrices: H^d = {A ∈ C^{d×d} : A = A*}. All matrices in this talk are Hermitian.
- Maximum eigenvalue: λ_max(·).
- Trace: tr B, the sum of the diagonal entries of B.
- Spectral norm: ‖B‖, the maximum singular value of B.
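These three quantities map directly onto NumPy primitives (helper names are my own, for illustration):

```python
import numpy as np

def lambda_max(A):
    # Maximum eigenvalue of a Hermitian matrix (eigvalsh sorts ascending).
    return np.linalg.eigvalsh(A)[-1]

A = np.array([[2.0, 1.0], [1.0, 2.0]])  # Hermitian: A = A*
assert np.allclose(A, A.conj().T)

lam = lambda_max(A)          # eigenvalues of A are 1 and 3
tr = np.trace(A)             # sum of diagonal entries
norm = np.linalg.norm(A, 2)  # spectral norm = maximum singular value

assert np.isclose(lam, 3.0)
assert np.isclose(tr, 4.0)
assert np.isclose(norm, 3.0)
```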
Matrix Stein Pair

Definition (Exchangeable Pair)
(Z, Z′) is an exchangeable pair if (Z, Z′) =_d (Z′, Z).

Definition (Matrix Stein Pair)
Let (Z, Z′) be an exchangeable pair, and let Ψ: Z → H^d be a measurable function. Define the random matrices

    X = Ψ(Z)   and   X′ = Ψ(Z′).

(X, X′) is a matrix Stein pair with scale factor α ∈ (0, 1] if

    E[X′ | Z] = (1 − α) X.

- Matrix Stein pairs are exchangeable pairs.
- Matrix Stein pairs always have zero mean.
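A sketch checking the Stein pair condition E[X′ | Z] = (1 − α)X for the running Rademacher series example: X = Σ_k ε_k A_k, with X′ obtained by redrawing one uniformly chosen sign, so α = 1/n (construction and names are my own illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
n, d = 5, 3
M = rng.standard_normal((n, d, d))
A = (M + np.swapaxes(M, -1, -2)) / 2  # Hermitian coefficients

eps = rng.choice([-1.0, 1.0], size=n)  # the conditioning variable Z
X = sum(e * Ak for e, Ak in zip(eps, A))

# E[X' | Z]: average over the uniform index K and the fresh sign eps'.
cond_exp = np.zeros((d, d))
for k in range(n):
    for new_sign in (-1.0, 1.0):
        X_prime = X - eps[k] * A[k] + new_sign * A[k]
        cond_exp += X_prime / (2 * n)

assert np.allclose(cond_exp, (1 - 1 / n) * X)  # alpha = 1/n
```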
The Conditional Variance

Definition (Conditional Variance)
Suppose that (X, X′) is a matrix Stein pair with scale factor α, constructed from the exchangeable pair (Z, Z′). The conditional variance is the random matrix

    Δ_X = Δ_X(Z) = (1/(2α)) · E[(X − X′)² | Z].

- Δ_X is a stochastic estimate for the variance E X².

Take-home Message: control over Δ_X yields control over λ_max(X).
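Continuing the Rademacher series example (my own illustration): redrawing sign K gives (X − X′)² = (ε_K − ε′)² A_K², and averaging over K and ε′ with α = 1/n yields Δ_X = Σ_k A_k² exactly, which the sketch below confirms:

```python
import numpy as np

rng = np.random.default_rng(7)
n, d = 5, 3
M = rng.standard_normal((n, d, d))
A = (M + np.swapaxes(M, -1, -2)) / 2  # Hermitian coefficients
eps = rng.choice([-1.0, 1.0], size=n)

alpha = 1 / n
delta = np.zeros((d, d))
for k in range(n):
    for new_sign in (-1.0, 1.0):
        diff = (eps[k] - new_sign) * A[k]  # X - X' when sign k is redrawn
        delta += (diff @ diff) / (2 * n)   # average over K and the fresh sign
delta /= 2 * alpha                         # Delta_X = (1/(2 alpha)) E[(X-X')^2 | Z]

assert np.allclose(delta, sum(Ak @ Ak for Ak in A))  # equals sum_k A_k^2
```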
Exponential Tail Inequalities
Exponential Concentration for Random Matrices
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let (XX prime) be a matrix Stein pair with X isin Hd Suppose that
∆X 4 cX + v I almost surely for c v ge 0
Then for all t ge 0
Pλmax(X) ge t le d middot exp minust22v + 2ct
Control over the conditional variance ∆X yields
Gaussian tail for λmax(X) for small t exponential tail for large t
When d = 1 improves scalar result of Chatterjee (2007)
The dimensional factor d cannot be removed
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 15 35
Exponential Tail Inequalities
Matrix Hoeffding Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let X =sum
k Yk for independent matrices in Hd satisfying
EYk = 0 and Y 2k 4 A2
k
for deterministic matrices (Ak)kge1 Define the variance parameter
σ2 =∥
∥
∥
sum
kA2k
∥
∥
∥
Then for all t ge 0
P
λmax
(
sum
kYk
)
ge t
le d middot eminust22σ2
Improves upon the matrix Hoeffding inequality of Tropp (2011)Optimal constant 12 in the exponent
Can replace variance parameter with σ2 = 12
∥
∥
sum
k
(
A2k + EY 2
k
)∥
∥
Tighter than classical Hoeffding inequality (1963) when d = 1
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 16 35
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
1 Matrix Laplace transform method (Ahlswede amp Winter 2002)
Relate tail probability to the trace of the mgf of X
Pλmax(X) ge t le infθgt0
eminusθt middotm(θ)
where m(θ) = E tr eθX
Problem eX+Y 6= eXeY when XY isin Hd
How to bound the trace mgf
Past approaches Golden-Thompson Liebrsquos concavity theorem
Chatterjeersquos strategy for scalar concentration
Control mgf growth by bounding derivative
mprime(θ) = E trXeθX for θ isin R
Rewrite using exchangeable pairs
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 17 35
Exponential Tail Inequalities
Method of Exchangeable Pairs
Lemma
Suppose that (XX prime) is a matrix Stein pair with scale factor α LetF Hd rarr Hd be a measurable function satisfying
E(X minusX prime)F (X) ltinfin
Then
E[X F (X)] =1
2αE[(X minusX prime)(F (X)minus F (X prime))] (1)
Intuition
Can characterize the distribution of a random matrix byintegrating it against a class of test functions F
Eq 1 allows us to estimate this integral using the smoothnessproperties of F and the discrepancy X minusX prime
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 18 35
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
2 Method of Exchangeable Pairs
Rewrite the derivative of the trace mgf
mprime(θ) = E trXeθX =1
2αE tr
[
(X minusX prime)(
eθX minus eθXprime)]
Goal Use the smoothness of F (X) = eθX to bound the derivative
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 19 35
Exponential Tail Inequalities
Mean Value Trace Inequality
Lemma (Mackey Jordan Chen Farrell and Tropp 2012)
Suppose that g R rarr R is a weakly increasing function and thath R rarr R is a function whose derivative hprime is convex For allmatrices AB isin Hd it holds that
tr[(g(A)minus g(B)) middot (h(A)minus h(B))] le1
2tr[(g(A)minus g(B)) middot (AminusB) middot (hprime(A) + hprime(B))]
Standard matrix functions If g R rarr R and
A = Q
λ1
λd
Qlowast then g(A) = Q
g(λ1)
g(λd)
Qlowast
Inequality does not hold without the traceFor exponential concentration we let g(A) = A and h(B) = eθB
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 20 35
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
3 Mean Value Trace Inequality
Bound the derivative of the trace mgf
mprime(θ) =1
2αE tr
[
(X minusX prime)(
eθX minus eθXprime)]
le θ
4αE tr
[
(X minusX prime)2 middot(
eθX + eθXprime)]
=θ
2αE tr
[
(X minusX prime)2 middot eθX]
= θ middot E tr
[
1
2αE[
(X minusX prime)2 |Z]
middot eθX]
= θ middot E tr[
∆X eθX]
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 21 35
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
3 Mean Value Trace Inequality
Bound the derivative of the trace mgf
mprime(θ) le θ middot E tr[
∆X eθX]
4 Conditional Variance Bound ∆X 4 cX + v I
Yields differential inequality
mprime(θ) le cθE tr[
X eθX]
+ vθE tr[
eθX]
= cθ middotmprime(θ) + vθ middotm(θ)
Solve to bound m(θ) and thereby bound
Pλmax(X) ge t le infθgt0
eminusθt middotm(θ) le d middot exp minust22v + 2ct
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 22 35
Exponential Tail Inequalities
Refined Exponential Concentration
Relaxing the constraint ∆X 4 cX + v
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let (XX prime) be a bounded matrix Stein pair with X isin Hd Definethe function
r(ψ) =1
ψlogE tr(eψ∆Xd) for each ψ gt 0
Then for all t ge 0 and all ψ gt 0
Pλmax(X) ge t le d middot exp minust22r(ψ) + 2t
radicψ
r(ψ) measures typical magnitude of conditional variance
Eλmax(∆X) le infψgt0
[
r(ψ) + log dψ
]
When d = 1 improves scalar result of Chatterjee (2008)Proof extends to unbounded random matricesMackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 23 35
Exponential Tail Inequalities
Matrix Bernstein Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let (Yk)kge1 be independent matrices in Hd satisfying
EYk = 0 and Yk le R for each index k
Define the variance parameter
σ2 =∥
∥
∥
sum
kEY 2
k
∥
∥
∥
Then for all t ge 0
P
λmax
(
sum
kYk
)
ge t
le d middot exp minust23σ2 + 2Rt
Gaussian tail controlled by improved variance when t is smallKey proof idea Apply refined concentration and boundr(ψ) = 1
ψlogE tr(eψ∆Xd) using unrefined concentration
Constants better than Oliveira (2009) worse than Tropp (2011)Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 24 35
Polynomial Moment Inequalities
Polynomial Moments for Random Matrices
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let p = 1 or p ge 15 Suppose that (XX prime) is a matrix Stein pairwhere E tr |X|2p ltinfin Then
(
E tr |X|2p)12p le
radic
2pminus 1 middot(
E tr∆pX
)12p
Moral The conditional variance controls the moments of X
Generalizes Chatterjeersquos version (2007) of the scalarBurkholder-Davis-Gundy inequality (Burkholder 1973)
See also Pisier amp Xu (1997) Junge amp Xu (2003 2008)
Proof techniques mirror those for exponential concentration
Also holds for infinite dimensional Schatten-class operators
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 25 35
Polynomial Moment Inequalities
Matrix Khintchine Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let (εk)kge1 be an independent sequence of Rademacher randomvariables and (Ak)kge1 be a deterministic sequence of Hermitianmatrices Then if p = 1 or p ge 15
E tr(
sum
kεkAk
)2p
le (2pminus 1)p middot tr(
sum
kA2k
)p
Noncommutative Khintchine inequality (Lust-Piquard 1986 Lust-Piquard
and Pisier 1991) is a dominant tool in applied matrix analysis
eg Used in analysis of column sampling and projection forapproximate SVD (Rudelson and Vershynin 2007)
Steinrsquos method offers an unusually concise proof
The constantradic2pminus 1 is within
radice of optimal
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 26 35
Dependent Sequences
Adding Dependence
1 MotivationMatrix CompletionMatrix Concentration
2 Steinrsquos Method Background and Notation
3 Exponential Tail Inequalities
4 Polynomial Moment Inequalities
5 Dependent SequencesSums of Conditionally Zero-mean MatricesCombinatorial Sums
6 Extensions
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 27 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Sum of Conditionally Zero-Mean Matrices)
Given a sequence of Hermitian matrices (Yk)nk=1 satisfying the
Conditional zero mean property E[Yk | (Yj)j 6=k] = 0
for all k define the random sum X =sumn
k=1Yk
Note (Yk)kge1 is a martingale difference sequence
Examples
Sums of independent centered random matricesMany sums of conditionally independent random matrices
Yk perpperp (Yj)j 6=k | Z and E[Yk |Z] = 0
Rademacher series with random matrix coefficients
X =sum
kεkWk
(Wk)kge1 Hermitian (εk)kge1 independent Rademacher
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 28 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Yk | (Yj)j 6=k] = 0
Matrix Stein Pair for X =sumn
k=1 Yk
Let Y primek and Yk be conditionally iid given (Yj)j 6=k
Draw index K uniformly from 1 nDefine X prime = X + Y prime
K minus YK
Check Stein pair condition
E[X minusX prime | (Yj)jge1] = E[YK minus Y primeK | (Yj)jge1]
=1
n
sumn
k=1
(
Yk minus E[Y primek | (Yj)j 6=k]
)
=1
n
sumn
k=1Yk =
1
nX
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 29 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Yk | (Yj)j 6=k] = 0
Conditional Variance for X = Y minus EY
∆X =n
2middot E
[
(X minusX prime)2 | (Yj)jge1
]
=n
2middot E
[
(YK minus Y primeK)
2 | (Yj)jge1
]
=1
2
sumn
k=1
(
Y 2k + E[Y 2
k | (Yj)j 6=k])
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 30 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Given a deterministic array (Ajk)njk=1 of Hermitian matrices and a
uniformly random permutation π on 1 n define thecombinatorial matrix statistic
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Generalizes the scalar statistics studied by Hoeffding (1951)
Example
Sampling without replacement from B1 BnW =
sums
j=1Bπ(j)
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 31 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Matrix Stein Pair for X = Y minus EY
Draw indices (JK) uniformly from 1 n2Define πprime = π (JK) and X prime =
sumnj=1Ajπprime(j) minus EY
Check Stein pair condition
E[X minusX prime | π] = E[
AJπ(J) +AKπ(K) minusAJπ(K) minusAKπ(J) | π]
=1
n2
sumn
jk=1Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
=2
n(Y minus EY ) =
2
nX
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 32 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Conditional Variance for X = Y minus EY
∆X(π) =n
4E[
(X minusX prime)2 | π]
=1
4n
sumn
jk=1
[
Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
]2
41
n
sumn
jk=1
[
A2jπ(j) +A2
kπ(k) +A2jπ(k) +A2
kπ(j)
]
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 33 35
Extensions
Extensions
General Complex Matrices
Map any matrix B isin Cd1timesd2 to a Hermitian matrix via dilation
D(B) =
[
0 B
Blowast0
]
isin Hd1+d2
Preserves spectral information λmax(D(B)) = B
Beyond Sums
Matrix-valued functions satisfying a self-reproducing property
eg Matrix second-order Rademacher chaossum
jk εjεkAjk
Yields a dependent bounded differences inequality for matrices
Generalized Matrix Stein Pairs
Satisfy E[g(X)minus g(X prime) |Z] = αX almost surely forg R rarr R weakly increasingMackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 34 35
Extensions
References IAhlswede R and Winter A Strong converse for identification via quantum channels IEEE Trans Inform Theory 48(3)
569ndash579 Mar 2002
Bernstein S The theory of probabilities Gastehizdat Publishing House 1946
Burkholder D L Distribution function inequalities for martingales Ann Probab 119ndash42 1973 doi101214aop1176997023
Candes E J and Recht B Exact matrix completion via convex optimization Found Comput Math 9717ndash772 2009
Candes E J and Tao T The power of convex relaxation Near-optimal matrix completion IEEE Trans Info Theory 2009URL arXiv09031476 To appear Available at arXiv09031476
Chatterjee S Steinrsquos method for concentration inequalities Probab Theory Related Fields 138305ndash321 2007
Chatterjee S Concentration inequalities with exchangeable pairs PhD thesis Stanford University Palo Alto Feb 2008 URLarxivmath0507526vl
Cheung S-S So A Man-Cho and Wang K Chance-constrained linear matrix inequalities with dependent perturbations Asafe tractable approximation approach Available athttpwwwoptimization-onlineorgDB_FILE2011012898pdf 2011
Christofides D and Markstrom K Expansion properties of random cayley graphs and vertex transitive graphs via matrixmartingales Random Struct Algorithms 32(1)88ndash100 2008
Drineas P Mahoney M W and Muthukrishnan S Relative-error CUR matrix decompositions SIAM Journal on Matrix
Analysis and Applications 30844ndash881 2008
Foygel R and Srebro N Concentration-based guarantees for low-rank matrix reconstruction Journal of Machine Learning
Research - Proceedings Track 19315ndash340 2011
Gross D Recovering low-rank matrices from few coefficients in any basis IEEE Trans Inform Theory 57(3)1548ndash1566 Mar2011
Hoeffding W A combinatorial central limit theorem Ann Math Statist 22558ndash566 1951
Hoeffding W Probability inequalities for sums of bounded random variables Journal of the American Statistical Association58(301)13ndash30 1963
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 35 35
Extensions
References II
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. Available at arXiv:1104.1672, 2011a.
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. arXiv:1104.1672v3 [math.PR], 2011b.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities. Ann. Probab., 31(2):948–995, 2003.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities II: Applications. Israel J. Math., 167:227–282, 2008.
Lust-Piquard, F. Inégalités de Khintchine dans Cp (1 < p < ∞). C. R. Math. Acad. Sci. Paris, 303(7):289–292, 1986.
Lust-Piquard, F. and Pisier, G. Noncommutative Khintchine and Paley inequalities. Ark. Mat., 29(2):241–260, 1991.
Mackey, L., Talwalkar, A., and Jordan, M. I. Divide-and-conquer matrix factorization. In Shawe-Taylor, J., Zemel, R. S., Bartlett, P. L., Pereira, F. C. N., and Weinberger, K. Q. (eds.), Advances in Neural Information Processing Systems 24, pp. 1134–1142, 2011.
Mackey, L., Jordan, M. I., Chen, R. Y., Farrell, B., and Tropp, J. A. Matrix concentration inequalities via the method of exchangeable pairs. URL http://arxiv.org/abs/1201.6002, 2012.
Negahban, S. and Wainwright, M. J. Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. arXiv:1009.2118v2 [cs.IT], 2010.
Nemirovski, A. Sums of random symmetric matrices and quadratic optimization under orthogonality constraints. Math. Program., 109:283–317, January 2007. ISSN 0025-5610. doi: 10.1007/s10107-006-0033-0.
Oliveira, R. I. Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. Available at arXiv:0911.0600, Nov. 2009.
Pisier, G. and Xu, Q. Non-commutative martingale inequalities. Comm. Math. Phys., 189(3):667–698, 1997.
Recht, B. Simpler approach to matrix completion. J. Mach. Learn. Res., 12:3413–3430, 2011.
Rudelson, M. and Vershynin, R. Sampling from large matrices: An approach through geometric functional analysis. J. Assoc. Comput. Mach., 54(4):Article 21, 19 pp., Jul. 2007 (electronic).
So, A. Man-Cho. Moment inequalities for sums of random matrices and their applications in optimization. Math. Program., 130(1):125–151, 2011.
Stein, C. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. 6th Berkeley Symp. Math. Statist. Probab., Berkeley, 1972. Univ. California Press.
Tropp, J. A. User-friendly tail bounds for sums of random matrices. Found. Comput. Math., August 2011.
Motivation Matrix Concentration
Concentration Inequalities
Matrix concentration:

P{λmax(X − EX) ≥ t} ≤ δ

Difficulty: Matrix multiplication is not commutative ⇒ e^{X+Y} ≠ e^X e^Y

Past approaches (Ahlswede and Winter 2002; Oliveira 2009; Tropp 2011):

Rely on deep results from matrix analysis
Apply to sums of independent matrices and matrix martingales

This work:

Stein's method of exchangeable pairs (1972), as advanced by Chatterjee (2007) for scalar concentration
⇒ Improved exponential tail inequalities (Hoeffding, Bernstein)
⇒ Polynomial moment inequalities (Khintchine, Rosenthal)
⇒ Dependent sums and more general matrix functionals
Motivation Matrix Concentration
Roadmap
1 Motivation
2 Stein's Method: Background and Notation
3 Exponential Tail Inequalities
4 Polynomial Moment Inequalities
5 Dependent Sequences
6 Extensions
Background
Notation
Hermitian matrices: H^d = {A ∈ C^{d×d} : A = A*}. All matrices in this talk are Hermitian.
Maximum eigenvalue: λmax(·). Trace: tr B, the sum of the diagonal entries of B.
Spectral norm: ‖B‖, the maximum singular value of B.
Background
Matrix Stein Pair
Definition (Exchangeable Pair)
(Z, Z′) is an exchangeable pair if (Z, Z′) =d (Z′, Z).

Definition (Matrix Stein Pair)

Let (Z, Z′) be an exchangeable pair, and let Ψ : Z → H^d be a measurable function. Define the random matrices

X = Ψ(Z) and X′ = Ψ(Z′).

(X, X′) is a matrix Stein pair with scale factor α ∈ (0, 1] if

E[X′ | Z] = (1 − α)X.
Matrix Stein pairs are exchangeable pairs
Matrix Stein pairs always have zero mean
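As a concrete sanity check (not from the slides), the condition E[X′ | Z] = (1 − α)X can be verified numerically for a Rademacher series X = ∑_k ε_k A_k, where Z = (ε_1, …, ε_n) and Z′ resamples one uniformly chosen sign; this construction has scale factor α = 1/n. A minimal numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 3
# deterministic Hermitian coefficients A_1, ..., A_n
A = [(M + M.T) / 2 for M in rng.standard_normal((n, d, d))]
eps = rng.choice([-1.0, 1.0], size=n)        # one realization of Z
X = sum(e * Ak for e, Ak in zip(eps, A))     # X = Psi(Z)

# E[X' | Z]: average over the uniform index K and the fresh sign eps'_K
EXp = np.zeros((d, d))
for K in range(n):
    for new_sign in (-1.0, 1.0):
        EXp += (X - eps[K] * A[K] + new_sign * A[K]) / (2 * n)

alpha = 1.0 / n
assert np.allclose(EXp, (1 - alpha) * X)     # Stein pair condition holds
```

The conditional expectation is computed exactly here (the inner average runs over all 2n equally likely resamplings), so the identity holds to machine precision.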
Background
The Conditional Variance
Definition (Conditional Variance)
Suppose that (X, X′) is a matrix Stein pair with scale factor α, constructed from the exchangeable pair (Z, Z′). The conditional variance is the random matrix

ΔX = ΔX(Z) = (1/2α) · E[(X − X′)² | Z].

ΔX is a stochastic estimate for the variance E X².
Take-home Message
Control over ∆X yields control over λmax(X)
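For the Rademacher series X = ∑_k ε_k A_k with the sign-resampling pair sketched earlier (α = 1/n; this example is an illustration, not from the slides), the conditional variance can be computed exactly and turns out to be the deterministic matrix ∑_k A_k². A short numpy check:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 3
A = [(M + M.T) / 2 for M in rng.standard_normal((n, d, d))]
eps = rng.choice([-1.0, 1.0], size=n)
X = sum(e * Ak for e, Ak in zip(eps, A))

# Delta_X = (1 / 2 alpha) E[(X - X')^2 | Z] with alpha = 1/n, averaging
# over the resampled index K and the fresh sign eps'_K
alpha = 1.0 / n
DX = np.zeros((d, d))
for K in range(n):
    for new_sign in (-1.0, 1.0):
        D = (eps[K] - new_sign) * A[K]       # X - X'
        DX += (D @ D) / (2 * n) / (2 * alpha)

# For a Rademacher series the conditional variance is deterministic
assert np.allclose(DX, sum(Ak @ Ak for Ak in A))
```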
Exponential Tail Inequalities
Exponential Concentration for Random Matrices
Theorem (Mackey, Jordan, Chen, Farrell, and Tropp 2012)

Let (X, X′) be a matrix Stein pair with X ∈ H^d. Suppose that

ΔX ⪯ cX + vI almost surely for c, v ≥ 0.

Then, for all t ≥ 0,

P{λmax(X) ≥ t} ≤ d · exp(−t² / (2v + 2ct)).

Control over the conditional variance ΔX yields a Gaussian tail for λmax(X) for small t, an exponential tail for large t.
When d = 1 improves scalar result of Chatterjee (2007)
The dimensional factor d cannot be removed
Exponential Tail Inequalities
Matrix Hoeffding Inequality
Corollary (Mackey, Jordan, Chen, Farrell, and Tropp 2012)

Let X = ∑_k Y_k for independent matrices in H^d satisfying

E Y_k = 0 and Y_k² ⪯ A_k²

for deterministic matrices (A_k)_{k≥1}. Define the variance parameter

σ² = ‖∑_k A_k²‖.

Then, for all t ≥ 0,

P{λmax(∑_k Y_k) ≥ t} ≤ d · e^{−t²/2σ²}.
Improves upon the matrix Hoeffding inequality of Tropp (2011): optimal constant 1/2 in the exponent.
Can replace the variance parameter with σ² = (1/2) ‖∑_k (A_k² + E Y_k²)‖.
Tighter than classical Hoeffding inequality (1963) when d = 1
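A Monte Carlo illustration of the corollary for a Rademacher series (so Y_k = ε_k A_k, with E Y_k = 0 and Y_k² = A_k² exactly); the empirical tail frequency should sit below the stated bound. All parameter choices here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, trials = 20, 4, 2000
A = [(M + M.T) / 2 for M in rng.standard_normal((n, d, d))]
sigma2 = np.linalg.norm(sum(Ak @ Ak for Ak in A), 2)   # ||sum_k A_k^2||

t = 1.5 * np.sqrt(2 * sigma2)
bound = d * np.exp(-t**2 / (2 * sigma2))               # d * e^{-t^2 / 2 sigma^2}

hits = 0
for _ in range(trials):
    eps = rng.choice([-1.0, 1.0], size=n)
    X = sum(e * Ak for e, Ak in zip(eps, A))
    hits += np.linalg.eigvalsh(X)[-1] >= t             # lambda_max(X) >= t ?
assert hits / trials <= bound
```

With this t the bound equals d · e^{−2.25} ≈ 0.42, while the empirical frequency is typically far smaller, reflecting the slack in any such worst-case inequality.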
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
1 Matrix Laplace transform method (Ahlswede & Winter 2002)
Relate tail probability to the trace of the mgf of X
P{λmax(X) ≥ t} ≤ inf_{θ>0} e^{−θt} · m(θ), where m(θ) = E tr e^{θX}

Problem: e^{X+Y} ≠ e^X e^Y when X, Y ∈ H^d

How to bound the trace mgf?

Past approaches: Golden–Thompson, Lieb's concavity theorem

Chatterjee's strategy for scalar concentration:

Control mgf growth by bounding the derivative

m′(θ) = E tr X e^{θX} for θ ∈ R

Rewrite using exchangeable pairs
Exponential Tail Inequalities
Method of Exchangeable Pairs
Lemma
Suppose that (X, X′) is a matrix Stein pair with scale factor α. Let F : H^d → H^d be a measurable function satisfying

E‖(X − X′)F(X)‖ < ∞.

Then

E[X F(X)] = (1/2α) · E[(X − X′)(F(X) − F(X′))].  (1)

Intuition

Can characterize the distribution of a random matrix by integrating it against a class of test functions F.
Eq. (1) allows us to estimate this integral using the smoothness properties of F and the discrepancy X − X′.
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
2 Method of Exchangeable Pairs
Rewrite the derivative of the trace mgf
m′(θ) = E tr X e^{θX} = (1/2α) · E tr[(X − X′)(e^{θX} − e^{θX′})]

Goal: Use the smoothness of F(X) = e^{θX} to bound the derivative.
Exponential Tail Inequalities
Mean Value Trace Inequality
Lemma (Mackey, Jordan, Chen, Farrell, and Tropp 2012)

Suppose that g : R → R is a weakly increasing function and that h : R → R is a function whose derivative h′ is convex. For all matrices A, B ∈ H^d, it holds that

tr[(g(A) − g(B)) · (h(A) − h(B))] ≤ (1/2) tr[(g(A) − g(B)) · (A − B) · (h′(A) + h′(B))].

Standard matrix functions: If g : R → R and A = Q diag(λ₁, …, λ_d) Q*, then g(A) = Q diag(g(λ₁), …, g(λ_d)) Q*.

The inequality does not hold without the trace.
For exponential concentration, we let g(A) = A and h(B) = e^{θB}.
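The lemma is easy to probe numerically. The sketch below (my own check, not from the slides) applies scalar functions to Hermitian matrices via eigendecomposition and tests the inequality with g(x) = x and h(x) = e^{θx}, whose derivative θe^{θx} is convex for θ > 0:

```python
import numpy as np

def matfun(A, f):
    """Apply the scalar function f to the eigenvalues of Hermitian A."""
    w, Q = np.linalg.eigh(A)
    return (Q * f(w)) @ Q.conj().T

rng = np.random.default_rng(2)
d, theta = 4, 0.7
M, N = rng.standard_normal((2, d, d))
A, B = (M + M.T) / 2, (N + N.T) / 2

g = lambda x: x                       # weakly increasing
hp = lambda x: theta * np.exp(theta * x)   # h'(x) for h(x) = e^{theta x}

G = matfun(A, g) - matfun(B, g)       # here simply A - B
lhs = np.trace(G @ (matfun(A, np.exp if theta == 1 else
                           (lambda x: np.exp(theta * x)))
                    - matfun(B, lambda x: np.exp(theta * x))))
rhs = 0.5 * np.trace(G @ (A - B) @ (matfun(A, hp) + matfun(B, hp)))
assert lhs <= rhs + 1e-9              # mean value trace inequality
```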
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
3 Mean Value Trace Inequality
Bound the derivative of the trace mgf
m′(θ) = (1/2α) · E tr[(X − X′)(e^{θX} − e^{θX′})]
 ≤ (θ/4α) · E tr[(X − X′)² · (e^{θX} + e^{θX′})]
 = (θ/2α) · E tr[(X − X′)² · e^{θX}]
 = θ · E tr[(1/2α) · E[(X − X′)² | Z] · e^{θX}]
 = θ · E tr[ΔX e^{θX}]
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
3 Mean Value Trace Inequality
Bound the derivative of the trace mgf
m′(θ) ≤ θ · E tr[ΔX e^{θX}]

4 Conditional Variance Bound: ΔX ⪯ cX + vI

Yields the differential inequality

m′(θ) ≤ cθ · E tr[X e^{θX}] + vθ · E tr[e^{θX}] = cθ · m′(θ) + vθ · m(θ)

Solve to bound m(θ) and thereby bound

P{λmax(X) ≥ t} ≤ inf_{θ>0} e^{−θt} · m(θ) ≤ d · exp(−t² / (2v + 2ct))
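The solve step, omitted on the slide, can be filled in as follows (using m(0) = tr I = d):

```latex
% Rearranging m'(\theta) \le c\theta\, m'(\theta) + v\theta\, m(\theta)
% for 0 \le \theta < 1/c:
\frac{\mathrm{d}}{\mathrm{d}\theta}\log m(\theta)
  = \frac{m'(\theta)}{m(\theta)} \le \frac{v\theta}{1-c\theta}.
% Integrate from 0 to \theta, using m(0) = \operatorname{tr} I = d
% and (1-cs)^{-1} \le (1-c\theta)^{-1} for 0 \le s \le \theta:
\log m(\theta) \le \log d + \frac{v\theta^2}{2(1-c\theta)}.
% Substitute into the Laplace transform bound and choose \theta = t/(v+ct):
\mathbb{P}\{\lambda_{\max}(X) \ge t\}
  \le d\,\exp\!\Big(-\theta t + \frac{v\theta^2}{2(1-c\theta)}\Big)
  = d\,\exp\!\Big(\frac{-t^2}{2v+2ct}\Big).
```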
Exponential Tail Inequalities
Refined Exponential Concentration
Relaxing the constraint ΔX ⪯ cX + vI:
Theorem (Mackey, Jordan, Chen, Farrell, and Tropp 2012)

Let (X, X′) be a bounded matrix Stein pair with X ∈ H^d. Define the function

r(ψ) = (1/ψ) · log(E tr e^{ψΔX} / d) for each ψ > 0.

Then, for all t ≥ 0 and all ψ > 0,

P{λmax(X) ≥ t} ≤ d · exp(−t² / (2r(ψ) + 2t/√ψ)).

r(ψ) measures the typical magnitude of the conditional variance:

E λmax(ΔX) ≤ inf_{ψ>0} [r(ψ) + (log d)/ψ]

When d = 1, improves scalar result of Chatterjee (2008).
Proof extends to unbounded random matrices.
Exponential Tail Inequalities
Matrix Bernstein Inequality
Corollary (Mackey, Jordan, Chen, Farrell, and Tropp 2012)

Let (Y_k)_{k≥1} be independent matrices in H^d satisfying

E Y_k = 0 and ‖Y_k‖ ≤ R for each index k.

Define the variance parameter

σ² = ‖∑_k E Y_k²‖.

Then, for all t ≥ 0,

P{λmax(∑_k Y_k) ≥ t} ≤ d · exp(−t² / (3σ² + 2Rt)).
Gaussian tail controlled by improved variance when t is small.
Key proof idea: Apply refined concentration and bound r(ψ) = (1/ψ) log(E tr e^{ψΔX} / d) using unrefined concentration.
Constants better than Oliveira (2009), worse than Tropp (2011).
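A quick Monte Carlo illustration, again with the illustrative choice Y_k = ε_k A_k (so ‖Y_k‖ ≤ max_k ‖A_k‖ =: R and E Y_k² = A_k²):

```python
import numpy as np

rng = np.random.default_rng(6)
n, d, trials = 30, 4, 2000
A = [(M + M.T) / 2 for M in rng.standard_normal((n, d, d))]
R = max(np.linalg.norm(Ak, 2) for Ak in A)             # ||Y_k|| <= R
sigma2 = np.linalg.norm(sum(Ak @ Ak for Ak in A), 2)   # ||sum_k E Y_k^2||

t = 4 * np.sqrt(sigma2)
bound = d * np.exp(-t**2 / (3 * sigma2 + 2 * R * t))   # matrix Bernstein

hits = 0
for _ in range(trials):
    eps = rng.choice([-1.0, 1.0], size=n)
    X = sum(e * Ak for e, Ak in zip(eps, A))
    hits += np.linalg.eigvalsh(X)[-1] >= t
assert hits / trials <= bound
```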
Polynomial Moment Inequalities
Polynomial Moments for Random Matrices
Theorem (Mackey, Jordan, Chen, Farrell, and Tropp 2012)

Let p = 1 or p ≥ 1.5. Suppose that (X, X′) is a matrix Stein pair where E tr |X|^{2p} < ∞. Then

(E tr |X|^{2p})^{1/2p} ≤ √(2p − 1) · (E tr ΔX^p)^{1/2p}.
Moral The conditional variance controls the moments of X
Generalizes Chatterjee's version (2007) of the scalar Burkholder–Davis–Gundy inequality (Burkholder 1973)
See also Pisier & Xu (1997), Junge & Xu (2003, 2008)
Proof techniques mirror those for exponential concentration
Also holds for infinite-dimensional Schatten-class operators
Polynomial Moment Inequalities
Matrix Khintchine Inequality
Corollary (Mackey, Jordan, Chen, Farrell, and Tropp 2012)

Let (ε_k)_{k≥1} be an independent sequence of Rademacher random variables and (A_k)_{k≥1} a deterministic sequence of Hermitian matrices. Then, if p = 1 or p ≥ 1.5,

E tr(∑_k ε_k A_k)^{2p} ≤ (2p − 1)^p · tr(∑_k A_k²)^p.
The noncommutative Khintchine inequality (Lust-Piquard 1986; Lust-Piquard and Pisier 1991) is a dominant tool in applied matrix analysis,
e.g., used in the analysis of column sampling and projection for approximate SVD (Rudelson and Vershynin 2007).
Stein's method offers an unusually concise proof.
The constant √(2p − 1) is within √e of optimal.
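Since the left side is an exact expectation over sign patterns, the corollary can be checked by brute-force enumeration for small n (a sketch with illustrative sizes, p = 2):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(3)
n, d, p = 6, 3, 2
A = [(M + M.T) / 2 for M in rng.standard_normal((n, d, d))]

# exact expectation E tr (sum_k eps_k A_k)^{2p} over all 2^n sign patterns
lhs = np.mean([np.trace(np.linalg.matrix_power(
           sum(e * Ak for e, Ak in zip(signs, A)), 2 * p))
       for signs in product((-1.0, 1.0), repeat=n)])
rhs = (2 * p - 1) ** p * np.trace(
           np.linalg.matrix_power(sum(Ak @ Ak for Ak in A), p))
assert lhs <= rhs                     # matrix Khintchine inequality
```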
Dependent Sequences
Adding Dependence
1 Motivation: Matrix Completion; Matrix Concentration

2 Stein's Method: Background and Notation

3 Exponential Tail Inequalities

4 Polynomial Moment Inequalities

5 Dependent Sequences: Sums of Conditionally Zero-mean Matrices; Combinatorial Sums
6 Extensions
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Sum of Conditionally Zero-Mean Matrices)
Given a sequence of Hermitian matrices (Y_k)_{k=1}^n satisfying the

conditional zero-mean property: E[Y_k | (Y_j)_{j≠k}] = 0

for all k, define the random sum X = ∑_{k=1}^n Y_k.

Note: (Y_k)_{k≥1} is a martingale difference sequence.

Examples

Sums of independent centered random matrices
Many sums of conditionally independent random matrices:

Y_k ⫫ (Y_j)_{j≠k} | Z and E[Y_k | Z] = 0

Rademacher series with random matrix coefficients:

X = ∑_k ε_k W_k

with (W_k)_{k≥1} Hermitian and (ε_k)_{k≥1} independent Rademacher.
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Y_k | (Y_j)_{j≠k}] = 0

Matrix Stein Pair for X = ∑_{k=1}^n Y_k

Let Y′_k and Y_k be conditionally i.i.d. given (Y_j)_{j≠k}.
Draw index K uniformly from {1, …, n}.
Define X′ = X + Y′_K − Y_K.

Check Stein pair condition:

E[X − X′ | (Y_j)_{j≥1}] = E[Y_K − Y′_K | (Y_j)_{j≥1}]
 = (1/n) ∑_{k=1}^n (Y_k − E[Y′_k | (Y_j)_{j≠k}])
 = (1/n) ∑_{k=1}^n Y_k = (1/n) X
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Y_k | (Y_j)_{j≠k}] = 0

Conditional Variance for X = ∑_{k=1}^n Y_k

ΔX = (n/2) · E[(X − X′)² | (Y_j)_{j≥1}]
 = (n/2) · E[(Y_K − Y′_K)² | (Y_j)_{j≥1}]
 = (1/2) ∑_{k=1}^n (Y_k² + E[Y_k² | (Y_j)_{j≠k}])

⇒ Conditional variance controlled when summands are bounded
⇒ Dependent analogues of concentration and moment inequalities
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Given a deterministic array (A_jk)_{j,k=1}^n of Hermitian matrices and a uniformly random permutation π on {1, …, n}, define the combinatorial matrix statistic

Y = ∑_{j=1}^n A_{jπ(j)}, with mean E Y = (1/n) ∑_{j,k=1}^n A_{jk}.

Generalizes the scalar statistics studied by Hoeffding (1951).

Example

Sampling without replacement from {B₁, …, B_n}:

W = ∑_{j=1}^s B_{π(j)}
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y = ∑_{j=1}^n A_{jπ(j)}, with mean E Y = (1/n) ∑_{j,k=1}^n A_{jk}

Matrix Stein Pair for X = Y − E Y

Draw indices (J, K) uniformly from {1, …, n}².
Define π′ = π ∘ (J, K) and X′ = ∑_{j=1}^n A_{jπ′(j)} − E Y.

Check Stein pair condition:

E[X − X′ | π] = E[A_{Jπ(J)} + A_{Kπ(K)} − A_{Jπ(K)} − A_{Kπ(J)} | π]
 = (1/n²) ∑_{j,k=1}^n (A_{jπ(j)} + A_{kπ(k)} − A_{jπ(k)} − A_{kπ(j)})
 = (2/n)(Y − E Y) = (2/n) X
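The condition E[X − X′ | π] = (2/n) X can be confirmed by exact enumeration over the transposed pair (J, K) for a small array of Hermitian matrices (an illustrative sketch):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(4)
n, d = 4, 3
A = rng.standard_normal((n, n, d, d))
A = (A + A.transpose(0, 1, 3, 2)) / 2           # each A[j, k] Hermitian

EY = A.sum(axis=(0, 1)) / n                     # E Y = (1/n) sum_{j,k} A_{jk}
pi = rng.permutation(n)                         # one fixed permutation
X = sum(A[j, pi[j]] for j in range(n)) - EY

# E[X - X' | pi]: average over the uniformly chosen index pair (J, K)
diff = np.zeros((d, d))
for J, K in product(range(n), repeat=2):
    pip = pi.copy()
    pip[J], pip[K] = pip[K], pip[J]             # pi composed with (J K)
    Xp = sum(A[j, pip[j]] for j in range(n)) - EY
    diff += (X - Xp) / n**2

assert np.allclose(diff, (2 / n) * X)           # Stein pair condition
```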
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y = ∑_{j=1}^n A_{jπ(j)}, with mean E Y = (1/n) ∑_{j,k=1}^n A_{jk}

Conditional Variance for X = Y − E Y

ΔX(π) = (n/4) · E[(X − X′)² | π]
 = (1/4n) ∑_{j,k=1}^n [A_{jπ(j)} + A_{kπ(k)} − A_{jπ(k)} − A_{kπ(j)}]²
 ⪯ (1/n) ∑_{j,k=1}^n [A²_{jπ(j)} + A²_{kπ(k)} + A²_{jπ(k)} + A²_{kπ(j)}]

⇒ Conditional variance controlled when summands are bounded
⇒ Dependent analogues of concentration and moment inequalities
Extensions
Extensions
General Complex Matrices
Map any matrix B ∈ C^{d₁×d₂} to a Hermitian matrix via dilation:

D(B) = [ 0   B ]
       [ B*  0 ]  ∈ H^{d₁+d₂}

Preserves spectral information: λmax(D(B)) = ‖B‖
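A short numpy check of the dilation property λmax(D(B)) = ‖B‖ (the eigenvalues of D(B) are ± the singular values of B, padded with zeros):

```python
import numpy as np

rng = np.random.default_rng(5)
d1, d2 = 3, 5
B = rng.standard_normal((d1, d2))

# Hermitian dilation D(B) = [[0, B], [B*, 0]]
D = np.block([[np.zeros((d1, d1)), B],
              [B.T, np.zeros((d2, d2))]])

# lambda_max(D(B)) equals the spectral norm (largest singular value) of B
assert np.isclose(np.linalg.eigvalsh(D)[-1], np.linalg.norm(B, 2))
```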
Beyond Sums
Matrix-valued functions satisfying a self-reproducing property
e.g., matrix second-order Rademacher chaos: ∑_{j,k} ε_j ε_k A_{jk}
Yields a dependent bounded differences inequality for matrices
Generalized Matrix Stein Pairs
Satisfy E[g(X) − g(X′) | Z] = αX almost surely, for g : R → R weakly increasing.
Extensions
References IAhlswede R and Winter A Strong converse for identification via quantum channels IEEE Trans Inform Theory 48(3)
569ndash579 Mar 2002
Bernstein S The theory of probabilities Gastehizdat Publishing House 1946
Burkholder D L Distribution function inequalities for martingales Ann Probab 119ndash42 1973 doi101214aop1176997023
Candes E J and Recht B Exact matrix completion via convex optimization Found Comput Math 9717ndash772 2009
Candes E J and Tao T The power of convex relaxation Near-optimal matrix completion IEEE Trans Info Theory 2009URL arXiv09031476 To appear Available at arXiv09031476
Chatterjee S Steinrsquos method for concentration inequalities Probab Theory Related Fields 138305ndash321 2007
Chatterjee S Concentration inequalities with exchangeable pairs PhD thesis Stanford University Palo Alto Feb 2008 URLarxivmath0507526vl
Cheung S-S So A Man-Cho and Wang K Chance-constrained linear matrix inequalities with dependent perturbations Asafe tractable approximation approach Available athttpwwwoptimization-onlineorgDB_FILE2011012898pdf 2011
Christofides D and Markstrom K Expansion properties of random cayley graphs and vertex transitive graphs via matrixmartingales Random Struct Algorithms 32(1)88ndash100 2008
Drineas P Mahoney M W and Muthukrishnan S Relative-error CUR matrix decompositions SIAM Journal on Matrix
Analysis and Applications 30844ndash881 2008
Foygel R and Srebro N Concentration-based guarantees for low-rank matrix reconstruction Journal of Machine Learning
Research - Proceedings Track 19315ndash340 2011
Gross D Recovering low-rank matrices from few coefficients in any basis IEEE Trans Inform Theory 57(3)1548ndash1566 Mar2011
Hoeffding W A combinatorial central limit theorem Ann Math Statist 22558ndash566 1951
Hoeffding W Probability inequalities for sums of bounded random variables Journal of the American Statistical Association58(301)13ndash30 1963
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 35 35
Extensions
References II
Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matrices Available atarXiv11041672 2011a
Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matricesarXiv11041672v3[mathPR] 2011b
Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities Ann Probab 31(2)948ndash995 2003
Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities II Applications Israel J Math 167227ndash282 2008
Lust-Piquard F Inegalites de Khintchine dans Cp (1 lt p lt infin) C R Math Acad Sci Paris 303(7)289ndash292 1986
Lust-Piquard F and Pisier G Noncommutative Khintchine and Paley inequalities Ark Mat 29(2)241ndash260 1991
Mackey L Talwalkar A and Jordan M I Divide-and-conquer matrix factorization In Shawe-Taylor J Zemel R SBartlett P L Pereira F C N and Weinberger K Q (eds) Advances in Neural Information Processing Systems 24 pp1134ndash1142 2011
Mackey L Jordan M I Chen R Y Farrell B and Tropp J A Matrix concentration inequalities via the method ofexchangeable pairs URL httparxivorgabs12016002 2012
Negahban S and Wainwright M J Restricted strong convexity and weighted matrix completion Optimal bounds with noisearXiv10092118v2[csIT] 2010
Nemirovski A Sums of random symmetric matrices and quadratic optimization under orthogonality constraints Math
Program 109283ndash317 January 2007 ISSN 0025-5610 doi 101007s10107-006-0033-0 URLhttpdlacmorgcitationcfmid=12297161229726
Oliveira R I Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges Availableat arXiv09110600 Nov 2009
Pisier G and Xu Q Non-commutative martingale inequalities Comm Math Phys 189(3)667ndash698 1997
Recht B Simpler approach to matrix completion J Mach Learn Res 123413ndash3430 2011
Rudelson M and Vershynin R Sampling from large matrices An approach through geometric functional analysis J Assoc
Comput Mach 54(4)Article 21 19 pp Jul 2007 (electronic)
So A Man-Cho Moment inequalities for sums of random matrices and their applications in optimization Math Program 130(1)125ndash151 2011
Stein C A bound for the error in the normal approximation to the distribution of a sum of dependent random variables InProc 6th Berkeley Symp Math Statist Probab Berkeley 1972 Univ California Press
Tropp J A User-friendly tail bounds for sums of random matrices Found Comput Math August 2011
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 36 35
- Motivation
-
- Matrix Completion
- Matrix Concentration
-
- Extensions
-
Motivation Matrix Concentration
Roadmap
1 Motivation
2 Steinrsquos Method Background and Notation
3 Exponential Tail Inequalities
4 Polynomial Moment Inequalities
5 Dependent Sequences
6 Extensions
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 11 35
Background
Notation
Hermitian matrices Hd = A isin Cdtimesd A = AlowastAll matrices in this talk are Hermitian
Maximum eigenvalue λmax(middot)Trace trB the sum of the diagonal entries of B
Spectral norm B the maximum singular value of B
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 12 35
Background
Matrix Stein Pair
Definition (Exchangeable Pair)
(ZZ prime) is an exchangeable pair if (ZZ prime)d= (Z prime Z)
Definition (Matrix Stein Pair)
Let (ZZ prime) be an exchangeable pair and let Ψ Z rarr Hd be ameasurable function Define the random matrices
X = Ψ(Z) and X prime = Ψ(Z prime)
(XX prime) is a matrix Stein pair with scale factor α isin (0 1] if
E[X prime |Z] = (1minus α)X
Matrix Stein pairs are exchangeable pairs
Matrix Stein pairs always have zero mean
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 13 35
Background
The Conditional Variance
Definition (Conditional Variance)
Suppose that (XX prime) is a matrix Stein pair with scale factor αconstructed from the exchangeable pair (ZZ prime) The conditional
variance is the random matrix
∆X = ∆X(Z) =1
2αE[
(X minusX prime)2 |Z]
∆X is a stochastic estimate for the variance EX2
Take-home Message
Control over ∆X yields control over λmax(X)
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 14 35
Exponential Tail Inequalities
Exponential Concentration for Random Matrices
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let (XX prime) be a matrix Stein pair with X isin Hd Suppose that
∆X 4 cX + v I almost surely for c v ge 0
Then for all t ge 0
Pλmax(X) ge t le d middot exp minust22v + 2ct
Control over the conditional variance ∆X yields
Gaussian tail for λmax(X) for small t exponential tail for large t
When d = 1 improves scalar result of Chatterjee (2007)
The dimensional factor d cannot be removed
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 15 35
Exponential Tail Inequalities
Matrix Hoeffding Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let X =sum
k Yk for independent matrices in Hd satisfying
EYk = 0 and Y 2k 4 A2
k
for deterministic matrices (Ak)kge1 Define the variance parameter
σ2 =∥
∥
∥
sum
kA2k
∥
∥
∥
Then for all t ge 0
P
λmax
(
sum
kYk
)
ge t
le d middot eminust22σ2
Improves upon the matrix Hoeffding inequality of Tropp (2011)Optimal constant 12 in the exponent
Can replace variance parameter with σ2 = 12
∥
∥
sum
k
(
A2k + EY 2
k
)∥
∥
Tighter than classical Hoeffding inequality (1963) when d = 1
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 16 35
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
1 Matrix Laplace transform method (Ahlswede amp Winter 2002)
Relate tail probability to the trace of the mgf of X
Pλmax(X) ge t le infθgt0
eminusθt middotm(θ)
where m(θ) = E tr eθX
Problem eX+Y 6= eXeY when XY isin Hd
How to bound the trace mgf
Past approaches Golden-Thompson Liebrsquos concavity theorem
Chatterjeersquos strategy for scalar concentration
Control mgf growth by bounding derivative
mprime(θ) = E trXeθX for θ isin R
Rewrite using exchangeable pairs
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 17 35
Exponential Tail Inequalities
Method of Exchangeable Pairs
Lemma
Suppose that (XX prime) is a matrix Stein pair with scale factor α LetF Hd rarr Hd be a measurable function satisfying
E(X minusX prime)F (X) ltinfin
Then
E[X F (X)] =1
2αE[(X minusX prime)(F (X)minus F (X prime))] (1)
Intuition
Can characterize the distribution of a random matrix byintegrating it against a class of test functions F
Eq 1 allows us to estimate this integral using the smoothnessproperties of F and the discrepancy X minusX prime
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 18 35
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
2 Method of Exchangeable Pairs
Rewrite the derivative of the trace mgf
mprime(θ) = E trXeθX =1
2αE tr
[
(X minusX prime)(
eθX minus eθXprime)]
Goal Use the smoothness of F (X) = eθX to bound the derivative
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 19 35
Exponential Tail Inequalities
Mean Value Trace Inequality
Lemma (Mackey Jordan Chen Farrell and Tropp 2012)
Suppose that g R rarr R is a weakly increasing function and thath R rarr R is a function whose derivative hprime is convex For allmatrices AB isin Hd it holds that
tr[(g(A)minus g(B)) middot (h(A)minus h(B))] le1
2tr[(g(A)minus g(B)) middot (AminusB) middot (hprime(A) + hprime(B))]
Standard matrix functions If g R rarr R and
A = Q
λ1
λd
Qlowast then g(A) = Q
g(λ1)
g(λd)
Qlowast
Inequality does not hold without the traceFor exponential concentration we let g(A) = A and h(B) = eθB
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 20 35
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
3 Mean Value Trace Inequality
Bound the derivative of the trace mgf
mprime(θ) =1
2αE tr
[
(X minusX prime)(
eθX minus eθXprime)]
le θ
4αE tr
[
(X minusX prime)2 middot(
eθX + eθXprime)]
=θ
2αE tr
[
(X minusX prime)2 middot eθX]
= θ middot E tr
[
1
2αE[
(X minusX prime)2 |Z]
middot eθX]
= θ middot E tr[
∆X eθX]
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 21 35
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
3 Mean Value Trace Inequality
Bound the derivative of the trace mgf
mprime(θ) le θ middot E tr[
∆X eθX]
4 Conditional Variance Bound ∆X 4 cX + v I
Yields differential inequality
mprime(θ) le cθE tr[
X eθX]
+ vθE tr[
eθX]
= cθ middotmprime(θ) + vθ middotm(θ)
Solve to bound m(θ) and thereby bound
Pλmax(X) ge t le infθgt0
eminusθt middotm(θ) le d middot exp minust22v + 2ct
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 22 35
Exponential Tail Inequalities
Refined Exponential Concentration
Relaxing the constraint ∆X 4 cX + v
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let (XX prime) be a bounded matrix Stein pair with X isin Hd Definethe function
r(ψ) =1
ψlogE tr(eψ∆Xd) for each ψ gt 0
Then for all t ge 0 and all ψ gt 0
Pλmax(X) ge t le d middot exp minust22r(ψ) + 2t
radicψ
r(ψ) measures typical magnitude of conditional variance
Eλmax(∆X) le infψgt0
[
r(ψ) + log dψ
]
When d = 1 improves scalar result of Chatterjee (2008)Proof extends to unbounded random matricesMackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 23 35
Exponential Tail Inequalities
Matrix Bernstein Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let (Yk)kge1 be independent matrices in Hd satisfying
EYk = 0 and Yk le R for each index k
Define the variance parameter
σ2 =∥
∥
∥
sum
kEY 2
k
∥
∥
∥
Then for all t ge 0
P
λmax
(
sum
kYk
)
ge t
le d middot exp minust23σ2 + 2Rt
Gaussian tail controlled by improved variance when t is smallKey proof idea Apply refined concentration and boundr(ψ) = 1
ψlogE tr(eψ∆Xd) using unrefined concentration
Constants better than Oliveira (2009) worse than Tropp (2011)Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 24 35
Polynomial Moment Inequalities
Polynomial Moments for Random Matrices
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let p = 1 or p ge 15 Suppose that (XX prime) is a matrix Stein pairwhere E tr |X|2p ltinfin Then
(
E tr |X|2p)12p le
radic
2pminus 1 middot(
E tr∆pX
)12p
Moral The conditional variance controls the moments of X
Generalizes Chatterjeersquos version (2007) of the scalarBurkholder-Davis-Gundy inequality (Burkholder 1973)
See also Pisier amp Xu (1997) Junge amp Xu (2003 2008)
Proof techniques mirror those for exponential concentration
Also holds for infinite dimensional Schatten-class operators
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 25 35
Polynomial Moment Inequalities
Matrix Khintchine Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let (εk)kge1 be an independent sequence of Rademacher randomvariables and (Ak)kge1 be a deterministic sequence of Hermitianmatrices Then if p = 1 or p ge 15
E tr(
sum
kεkAk
)2p
le (2pminus 1)p middot tr(
sum
kA2k
)p
Noncommutative Khintchine inequality (Lust-Piquard 1986 Lust-Piquard
and Pisier 1991) is a dominant tool in applied matrix analysis
eg Used in analysis of column sampling and projection forapproximate SVD (Rudelson and Vershynin 2007)
Steinrsquos method offers an unusually concise proof
The constantradic2pminus 1 is within
radice of optimal
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 26 35
Dependent Sequences
Adding Dependence
1 MotivationMatrix CompletionMatrix Concentration
2 Steinrsquos Method Background and Notation
3 Exponential Tail Inequalities
4 Polynomial Moment Inequalities
5 Dependent SequencesSums of Conditionally Zero-mean MatricesCombinatorial Sums
6 Extensions
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 27 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Sum of Conditionally Zero-Mean Matrices)
Given a sequence of Hermitian matrices (Yk)nk=1 satisfying the
Conditional zero mean property E[Yk | (Yj)j 6=k] = 0
for all k define the random sum X =sumn
k=1Yk
Note (Yk)kge1 is a martingale difference sequence
Examples
Sums of independent centered random matricesMany sums of conditionally independent random matrices
Yk perpperp (Yj)j 6=k | Z and E[Yk |Z] = 0
Rademacher series with random matrix coefficients
X =sum
kεkWk
(Wk)kge1 Hermitian (εk)kge1 independent Rademacher
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 28 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Yk | (Yj)j 6=k] = 0
Matrix Stein Pair for X =sumn
k=1 Yk
Let Y primek and Yk be conditionally iid given (Yj)j 6=k
Draw index K uniformly from 1 nDefine X prime = X + Y prime
K minus YK
Check Stein pair condition
E[X minusX prime | (Yj)jge1] = E[YK minus Y primeK | (Yj)jge1]
=1
n
sumn
k=1
(
Yk minus E[Y primek | (Yj)j 6=k]
)
=1
n
sumn
k=1Yk =
1
nX
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 29 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Yk | (Yj)j 6=k] = 0
Conditional Variance for X = Y minus EY
∆X =n
2middot E
[
(X minusX prime)2 | (Yj)jge1
]
=n
2middot E
[
(YK minus Y primeK)
2 | (Yj)jge1
]
=1
2
sumn
k=1
(
Y 2k + E[Y 2
k | (Yj)j 6=k])
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 30 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Given a deterministic array (Ajk)njk=1 of Hermitian matrices and a
uniformly random permutation π on 1 n define thecombinatorial matrix statistic
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Generalizes the scalar statistics studied by Hoeffding (1951)
Example
Sampling without replacement from B1 BnW =
sums
j=1Bπ(j)
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 31 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices

Definition (Combinatorial Matrix Statistic)
$$Y = \sum_{j=1}^n A_{j\pi(j)} \quad\text{with mean}\quad \mathbb{E}Y = \frac{1}{n} \sum_{j,k=1}^n A_{jk}$$

Matrix Stein Pair for $X = Y - \mathbb{E}Y$
Draw indices $(J, K)$ uniformly from $\{1, \ldots, n\}^2$. Define $\pi' = \pi \circ (J, K)$ and $X' = \sum_{j=1}^n A_{j\pi'(j)} - \mathbb{E}Y$.
Check the Stein pair condition:
$$\mathbb{E}[X - X' \mid \pi] = \mathbb{E}\big[ A_{J\pi(J)} + A_{K\pi(K)} - A_{J\pi(K)} - A_{K\pi(J)} \mid \pi \big] = \frac{1}{n^2} \sum_{j,k=1}^n \big( A_{j\pi(j)} + A_{k\pi(k)} - A_{j\pi(k)} - A_{k\pi(j)} \big) = \frac{2}{n} (Y - \mathbb{E}Y) = \frac{2}{n}\, X$$
Combinatorial Sums of Matrices

Definition (Combinatorial Matrix Statistic)
$$Y = \sum_{j=1}^n A_{j\pi(j)} \quad\text{with mean}\quad \mathbb{E}Y = \frac{1}{n} \sum_{j,k=1}^n A_{jk}$$

Conditional Variance for $X = Y - \mathbb{E}Y$
$$\Delta_X(\pi) = \frac{n}{4} \mathbb{E}\big[ (X - X')^2 \mid \pi \big] = \frac{1}{4n} \sum_{j,k=1}^n \big[ A_{j\pi(j)} + A_{k\pi(k)} - A_{j\pi(k)} - A_{k\pi(j)} \big]^2 \preceq \frac{1}{n} \sum_{j,k=1}^n \big[ A_{j\pi(j)}^2 + A_{k\pi(k)}^2 + A_{j\pi(k)}^2 + A_{k\pi(j)}^2 \big]$$

⇒ The conditional variance is controlled when the summands are bounded
⇒ Dependent analogues of the concentration and moment inequalities follow
Extensions
General Complex Matrices
Map any matrix $B \in \mathbb{C}^{d_1 \times d_2}$ to a Hermitian matrix via dilation:
$$\mathcal{D}(B) = \begin{bmatrix} 0 & B \\ B^* & 0 \end{bmatrix} \in \mathbb{H}^{d_1 + d_2}$$
Preserves spectral information: $\lambda_{\max}(\mathcal{D}(B)) = \|B\|$

Beyond Sums
Matrix-valued functions satisfying a self-reproducing property,
e.g., the matrix second-order Rademacher chaos $\sum_{j,k} \varepsilon_j \varepsilon_k A_{jk}$
Yields a dependent bounded-differences inequality for matrices

Generalized Matrix Stein Pairs
Satisfy $\mathbb{E}[g(X) - g(X') \mid Z] = \alpha X$ almost surely, for $g: \mathbb{R} \to \mathbb{R}$ weakly increasing
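The dilation identity $\lambda_{\max}(\mathcal{D}(B)) = \|B\|$ is easy to confirm numerically; a small numpy sketch (numpy is an assumption of this illustration): the eigenvalues of the dilation are $\pm$ the singular values of $B$, padded with zeros.

```python
import numpy as np

rng = np.random.default_rng(3)
d1, d2 = 3, 5
B = rng.standard_normal((d1, d2)) + 1j * rng.standard_normal((d1, d2))

# Hermitian dilation D(B) = [[0, B], [B*, 0]]
D = np.block([[np.zeros((d1, d1)), B],
              [B.conj().T, np.zeros((d2, d2))]])
assert np.allclose(D, D.conj().T)  # D(B) is Hermitian

# lambda_max(D(B)) equals the spectral norm ||B||
lam_max = np.linalg.eigvalsh(D).max()
assert np.isclose(lam_max, np.linalg.norm(B, 2))
```

This is the device that transfers every Hermitian result in the talk to rectangular matrices.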
References I

Ahlswede, R. and Winter, A. Strong converse for identification via quantum channels. IEEE Trans. Inform. Theory, 48(3):569–579, Mar. 2002.
Bernstein, S. The Theory of Probabilities. Gastehizdat Publishing House, 1946.
Burkholder, D. L. Distribution function inequalities for martingales. Ann. Probab., 1:19–42, 1973. doi:10.1214/aop/1176997023.
Candès, E. J. and Recht, B. Exact matrix completion via convex optimization. Found. Comput. Math., 9:717–772, 2009.
Candès, E. J. and Tao, T. The power of convex relaxation: Near-optimal matrix completion. IEEE Trans. Inform. Theory, 2009. To appear. Available at arXiv:0903.1476.
Chatterjee, S. Stein's method for concentration inequalities. Probab. Theory Related Fields, 138:305–321, 2007.
Chatterjee, S. Concentration inequalities with exchangeable pairs. PhD thesis, Stanford University, Palo Alto, Feb. 2008. Available at arXiv:math/0507526v1.
Cheung, S.-S., So, A. Man-Cho, and Wang, K. Chance-constrained linear matrix inequalities with dependent perturbations: A safe tractable approximation approach. Available at http://www.optimization-online.org/DB_FILE/2011/01/2898.pdf, 2011.
Christofides, D. and Markström, K. Expansion properties of random Cayley graphs and vertex transitive graphs via matrix martingales. Random Struct. Algorithms, 32(1):88–100, 2008.
Drineas, P., Mahoney, M. W., and Muthukrishnan, S. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30:844–881, 2008.
Foygel, R. and Srebro, N. Concentration-based guarantees for low-rank matrix reconstruction. Journal of Machine Learning Research - Proceedings Track, 19:315–340, 2011.
Gross, D. Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory, 57(3):1548–1566, Mar. 2011.
Hoeffding, W. A combinatorial central limit theorem. Ann. Math. Statist., 22:558–566, 1951.
Hoeffding, W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.
References II
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. Available at arXiv:1104.1672, 2011a.
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. arXiv:1104.1672v3 [math.PR], 2011b.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities. Ann. Probab., 31(2):948–995, 2003.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities II: Applications. Israel J. Math., 167:227–282, 2008.
Lust-Piquard, F. Inégalités de Khintchine dans C_p (1 < p < ∞). C. R. Math. Acad. Sci. Paris, 303(7):289–292, 1986.
Lust-Piquard, F. and Pisier, G. Noncommutative Khintchine and Paley inequalities. Ark. Mat., 29(2):241–260, 1991.
Mackey, L., Talwalkar, A., and Jordan, M. I. Divide-and-conquer matrix factorization. In Shawe-Taylor, J., Zemel, R. S., Bartlett, P. L., Pereira, F. C. N., and Weinberger, K. Q. (eds.), Advances in Neural Information Processing Systems 24, pp. 1134–1142, 2011.
Mackey, L., Jordan, M. I., Chen, R. Y., Farrell, B., and Tropp, J. A. Matrix concentration inequalities via the method of exchangeable pairs. Available at http://arxiv.org/abs/1201.6002, 2012.
Negahban, S. and Wainwright, M. J. Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. arXiv:1009.2118v2 [cs.IT], 2010.
Nemirovski, A. Sums of random symmetric matrices and quadratic optimization under orthogonality constraints. Math. Program., 109:283–317, Jan. 2007. doi:10.1007/s10107-006-0033-0.
Oliveira, R. I. Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. Available at arXiv:0911.0600, Nov. 2009.
Pisier, G. and Xu, Q. Non-commutative martingale inequalities. Comm. Math. Phys., 189(3):667–698, 1997.
Recht, B. A simpler approach to matrix completion. J. Mach. Learn. Res., 12:3413–3430, 2011.
Rudelson, M. and Vershynin, R. Sampling from large matrices: An approach through geometric functional analysis. J. Assoc. Comput. Mach., 54(4):Article 21, 19 pp., Jul. 2007.
So, A. Man-Cho. Moment inequalities for sums of random matrices and their applications in optimization. Math. Program., 130(1):125–151, 2011.
Stein, C. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. 6th Berkeley Symp. Math. Statist. Probab., Berkeley, 1972. Univ. California Press.
Tropp, J. A. User-friendly tail bounds for sums of random matrices. Found. Comput. Math., Aug. 2011.
Background
Notation
Hermitian matrices: $\mathbb{H}^d = \{ A \in \mathbb{C}^{d \times d} : A = A^* \}$. All matrices in this talk are Hermitian.
Maximum eigenvalue: $\lambda_{\max}(\cdot)$
Trace: $\operatorname{tr} B$, the sum of the diagonal entries of $B$
Spectral norm: $\|B\|$, the maximum singular value of $B$
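These notational building blocks map directly onto standard linear-algebra routines; a minimal numpy sketch (numpy is an illustration choice, not part of the talk):

```python
import numpy as np

# Build a random Hermitian matrix A = (B + B*) / 2
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = (B + B.conj().T) / 2

assert np.allclose(A, A.conj().T)          # A is Hermitian: A = A*
lam_max = np.linalg.eigvalsh(A).max()      # maximum eigenvalue
trace = np.trace(A).real                   # sum of diagonal entries
spec_norm = np.linalg.norm(A, 2)           # maximum singular value

# For Hermitian A, the spectral norm is the largest absolute eigenvalue
assert np.isclose(spec_norm, np.abs(np.linalg.eigvalsh(A)).max())
```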
Matrix Stein Pair
Definition (Exchangeable Pair)
$(Z, Z')$ is an exchangeable pair if $(Z, Z') \stackrel{d}{=} (Z', Z)$.

Definition (Matrix Stein Pair)
Let $(Z, Z')$ be an exchangeable pair, and let $\Psi: \mathcal{Z} \to \mathbb{H}^d$ be a measurable function. Define the random matrices
$$X = \Psi(Z) \quad\text{and}\quad X' = \Psi(Z').$$
$(X, X')$ is a matrix Stein pair with scale factor $\alpha \in (0, 1]$ if
$$\mathbb{E}[X' \mid Z] = (1 - \alpha) X.$$
Matrix Stein pairs are exchangeable pairs
Matrix Stein pairs always have zero mean
The Conditional Variance
Definition (Conditional Variance)
Suppose that $(X, X')$ is a matrix Stein pair with scale factor $\alpha$, constructed from the exchangeable pair $(Z, Z')$. The conditional variance is the random matrix
$$\Delta_X = \Delta_X(Z) = \frac{1}{2\alpha} \mathbb{E}\big[ (X - X')^2 \mid Z \big].$$

$\Delta_X$ is a stochastic estimate for the variance $\mathbb{E}X^2$
Take-home Message
Control over ∆X yields control over λmax(X)
Exponential Tail Inequalities
Exponential Concentration for Random Matrices
Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let $(X, X')$ be a matrix Stein pair with $X \in \mathbb{H}^d$. Suppose that
$$\Delta_X \preceq cX + vI \quad\text{almost surely, for } c, v \ge 0.$$
Then, for all $t \ge 0$,
$$\mathbb{P}\{\lambda_{\max}(X) \ge t\} \le d \cdot \exp\left\{\frac{-t^2}{2v + 2ct}\right\}.$$

Control over the conditional variance $\Delta_X$ yields a Gaussian tail for $\lambda_{\max}(X)$ for small $t$ and an exponential tail for large $t$.
When $d = 1$, this improves the scalar result of Chatterjee (2007).
The dimensional factor $d$ cannot be removed.
Matrix Hoeffding Inequality

Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let $X = \sum_k Y_k$ for independent matrices in $\mathbb{H}^d$ satisfying
$$\mathbb{E}Y_k = 0 \quad\text{and}\quad Y_k^2 \preceq A_k^2$$
for deterministic matrices $(A_k)_{k \ge 1}$. Define the variance parameter
$$\sigma^2 = \Big\| \sum_k A_k^2 \Big\|.$$
Then, for all $t \ge 0$,
$$\mathbb{P}\Big\{ \lambda_{\max}\Big( \sum_k Y_k \Big) \ge t \Big\} \le d \cdot e^{-t^2 / 2\sigma^2}.$$

Improves upon the matrix Hoeffding inequality of Tropp (2011): optimal constant 1/2 in the exponent
Can replace the variance parameter with $\sigma^2 = \frac{1}{2} \big\| \sum_k (A_k^2 + \mathbb{E}Y_k^2) \big\|$
Tighter than the classical Hoeffding inequality (1963) when $d = 1$
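The corollary can be exercised numerically; a Monte Carlo sketch in numpy (the Rademacher-series summands, the seed, and the test point $t$ are all illustrative choices, not from the talk). It checks that the empirical tail frequency of $\lambda_{\max}(\sum_k \varepsilon_k A_k)$ stays below $d \cdot e^{-t^2/2\sigma^2}$ at one test point.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 10, 4
# Independent summands Y_k = eps_k A_k with Y_k^2 = A_k^2, so the
# Hoeffding hypotheses hold with these A_k
As = [(M + M.T) / 2 for M in rng.standard_normal((n, d, d))]
sigma2 = np.linalg.norm(sum(A @ A for A in As), 2)  # ||sum_k A_k^2||

t = 2.0 * np.sqrt(sigma2)
bound = d * np.exp(-t**2 / (2 * sigma2))

# Empirical tail frequency of lambda_max(sum_k eps_k A_k)
trials = 2000
hits = 0
for _ in range(trials):
    eps = rng.choice([-1.0, 1.0], size=n)
    X = sum(e * A for e, A in zip(eps, As))
    hits += np.linalg.eigvalsh(X).max() >= t
assert hits / trials <= bound  # the tail bound dominates the empirical frequency
```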
Exponential Concentration: Proof Sketch

1. Matrix Laplace transform method (Ahlswede & Winter, 2002)
Relate the tail probability to the trace of the mgf of $X$:
$$\mathbb{P}\{\lambda_{\max}(X) \ge t\} \le \inf_{\theta > 0} \; e^{-\theta t} \cdot m(\theta), \quad\text{where } m(\theta) = \mathbb{E} \operatorname{tr} e^{\theta X}.$$

Problem: $e^{X+Y} \neq e^X e^Y$ when $X, Y \in \mathbb{H}^d$. How to bound the trace mgf?
Past approaches: Golden-Thompson, Lieb's concavity theorem

Chatterjee's strategy for scalar concentration:
Control mgf growth by bounding the derivative
$$m'(\theta) = \mathbb{E} \operatorname{tr} X e^{\theta X} \quad\text{for } \theta \in \mathbb{R},$$
rewritten using exchangeable pairs.
Method of Exchangeable Pairs
Lemma
Suppose that $(X, X')$ is a matrix Stein pair with scale factor $\alpha$. Let $F: \mathbb{H}^d \to \mathbb{H}^d$ be a measurable function satisfying $\mathbb{E}\|(X - X') F(X)\| < \infty$. Then
$$\mathbb{E}[X\, F(X)] = \frac{1}{2\alpha} \mathbb{E}[(X - X')(F(X) - F(X'))]. \tag{1}$$

Intuition
We can characterize the distribution of a random matrix by integrating it against a class of test functions $F$.
Eq. (1) allows us to estimate this integral using the smoothness properties of $F$ and the discrepancy $X - X'$.
Exponential Concentration: Proof Sketch

2. Method of Exchangeable Pairs
Rewrite the derivative of the trace mgf:
$$m'(\theta) = \mathbb{E} \operatorname{tr} X e^{\theta X} = \frac{1}{2\alpha} \mathbb{E} \operatorname{tr}\big[ (X - X') \big( e^{\theta X} - e^{\theta X'} \big) \big].$$
Goal: Use the smoothness of $F(X) = e^{\theta X}$ to bound the derivative.
Mean Value Trace Inequality

Lemma (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Suppose that $g: \mathbb{R} \to \mathbb{R}$ is a weakly increasing function and that $h: \mathbb{R} \to \mathbb{R}$ is a function whose derivative $h'$ is convex. For all matrices $A, B \in \mathbb{H}^d$, it holds that
$$\operatorname{tr}[(g(A) - g(B)) \cdot (h(A) - h(B))] \le \frac{1}{2} \operatorname{tr}[(g(A) - g(B)) \cdot (A - B) \cdot (h'(A) + h'(B))].$$

Standard matrix functions: if $g: \mathbb{R} \to \mathbb{R}$ and $A = Q \operatorname{diag}(\lambda_1, \ldots, \lambda_d) Q^*$, then $g(A) = Q \operatorname{diag}(g(\lambda_1), \ldots, g(\lambda_d)) Q^*$.
The inequality does not hold without the trace.
For exponential concentration, we let $g(A) = A$ and $h(B) = e^{\theta B}$.
Exponential Concentration: Proof Sketch

3. Mean Value Trace Inequality
Bound the derivative of the trace mgf:
$$m'(\theta) = \frac{1}{2\alpha} \mathbb{E} \operatorname{tr}\big[ (X - X') \big( e^{\theta X} - e^{\theta X'} \big) \big] \le \frac{\theta}{4\alpha} \mathbb{E} \operatorname{tr}\big[ (X - X')^2 \cdot \big( e^{\theta X} + e^{\theta X'} \big) \big] = \frac{\theta}{2\alpha} \mathbb{E} \operatorname{tr}\big[ (X - X')^2 \cdot e^{\theta X} \big] = \theta \cdot \mathbb{E} \operatorname{tr}\Big[ \frac{1}{2\alpha} \mathbb{E}\big[ (X - X')^2 \mid Z \big] \cdot e^{\theta X} \Big] = \theta \cdot \mathbb{E} \operatorname{tr}\big[ \Delta_X\, e^{\theta X} \big].$$
Exponential Concentration: Proof Sketch

3. Mean Value Trace Inequality
Bound the derivative of the trace mgf: $m'(\theta) \le \theta \cdot \mathbb{E} \operatorname{tr}\big[ \Delta_X\, e^{\theta X} \big]$

4. Conditional Variance Bound: $\Delta_X \preceq cX + vI$
Yields the differential inequality
$$m'(\theta) \le c\theta\, \mathbb{E} \operatorname{tr}\big[ X e^{\theta X} \big] + v\theta\, \mathbb{E} \operatorname{tr}\big[ e^{\theta X} \big] = c\theta \cdot m'(\theta) + v\theta \cdot m(\theta).$$
Solve to bound $m(\theta)$, and thereby bound
$$\mathbb{P}\{\lambda_{\max}(X) \ge t\} \le \inf_{\theta > 0} e^{-\theta t} \cdot m(\theta) \le d \cdot \exp\left\{\frac{-t^2}{2v + 2ct}\right\}.$$
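The "solve" step compresses a short computation; a sketch of the standard argument, consistent with the constants above (the choice $\theta = t/(v + ct)$ is the usual optimizer, stated here as an illustration):

```latex
% Rearranging the differential inequality for 0 < \theta < 1/c:
(1 - c\theta)\, m'(\theta) \le v\theta\, m(\theta)
\quad\Longrightarrow\quad
\frac{d}{d\theta} \log m(\theta) \le \frac{v\theta}{1 - c\theta}.
% Integrating from 0 to \theta, using m(0) = \mathbb{E}\operatorname{tr} I = d
% and 1/(1 - cs) \le 1/(1 - c\theta) for 0 \le s \le \theta:
\log m(\theta) \le \log d + \int_0^\theta \frac{vs}{1 - cs}\, ds
\le \log d + \frac{v\theta^2}{2(1 - c\theta)}.
% Substituting into the Laplace transform bound and choosing
% \theta = t/(v + ct) gives exponent -t^2/(2v + 2ct), i.e.
% \mathbb{P}\{\lambda_{\max}(X) \ge t\} \le d \exp\{-t^2/(2v + 2ct)\}.
```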
Refined Exponential Concentration

Relaxing the constraint $\Delta_X \preceq cX + vI$:

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let $(X, X')$ be a bounded matrix Stein pair with $X \in \mathbb{H}^d$. Define the function
$$r(\psi) = \frac{1}{\psi} \log \mathbb{E} \operatorname{tr}\big( e^{\psi \Delta_X} / d \big) \quad\text{for each } \psi > 0.$$
Then, for all $t \ge 0$ and all $\psi > 0$,
$$\mathbb{P}\{\lambda_{\max}(X) \ge t\} \le d \cdot \exp\left\{\frac{-t^2}{2 r(\psi) + 2t/\sqrt{\psi}}\right\}.$$

$r(\psi)$ measures the typical magnitude of the conditional variance: $\mathbb{E}\lambda_{\max}(\Delta_X) \le \inf_{\psi > 0} \big[ r(\psi) + \frac{\log d}{\psi} \big]$
When $d = 1$, this improves the scalar result of Chatterjee (2008).
The proof extends to unbounded random matrices.
Matrix Bernstein Inequality

Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let $(Y_k)_{k \ge 1}$ be independent matrices in $\mathbb{H}^d$ satisfying
$$\mathbb{E}Y_k = 0 \quad\text{and}\quad \|Y_k\| \le R \quad\text{for each index } k.$$
Define the variance parameter
$$\sigma^2 = \Big\| \sum_k \mathbb{E}Y_k^2 \Big\|.$$
Then, for all $t \ge 0$,
$$\mathbb{P}\Big\{ \lambda_{\max}\Big( \sum_k Y_k \Big) \ge t \Big\} \le d \cdot \exp\left\{\frac{-t^2}{3\sigma^2 + 2Rt}\right\}.$$

Gaussian tail controlled by the improved variance when $t$ is small
Key proof idea: Apply the refined concentration result and bound $r(\psi) = \frac{1}{\psi} \log \mathbb{E} \operatorname{tr}(e^{\psi \Delta_X} / d)$ using the unrefined result
Constants better than Oliveira (2009), worse than Tropp (2011)
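A numerical sketch with genuinely non-symmetric bounded summands (numpy, the Bernoulli-weighted projections, and the test point $t$ are all illustrative assumptions): $Y_k = (b_k - p) P_k$ with $b_k \sim \mathrm{Bernoulli}(p)$ and fixed projections $P_k$ is centered with $\|Y_k\| \le \max(p, 1-p)$ and $\mathbb{E}Y_k^2 = p(1-p) P_k$.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 12, 3
p = 0.3
# Fixed rank-one projections P_k; summands Y_k = (b_k - p) P_k are
# centered and uniformly bounded
Ps = []
for _ in range(n):
    v = rng.standard_normal(d)
    Ps.append(np.outer(v, v) / (v @ v))
R = max(p, 1 - p)                                             # ||Y_k|| <= R
sigma2 = np.linalg.norm(sum(p * (1 - p) * P for P in Ps), 2)  # ||sum_k E Y_k^2||

t = 3.0 * np.sqrt(sigma2)
bound = d * np.exp(-t**2 / (3 * sigma2 + 2 * R * t))

# Empirical tail frequency of lambda_max(sum_k Y_k)
trials = 2000
hits = 0
for _ in range(trials):
    b = rng.random(n) < p
    X = sum((float(bk) - p) * P for bk, P in zip(b, Ps))
    hits += np.linalg.eigvalsh(X).max() >= t
assert hits / trials <= bound  # the Bernstein bound dominates the empirical tail
```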
Polynomial Moment Inequalities
Polynomial Moments for Random Matrices

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let $p = 1$ or $p \ge 1.5$. Suppose that $(X, X')$ is a matrix Stein pair where $\mathbb{E} \operatorname{tr} |X|^{2p} < \infty$. Then
$$\big( \mathbb{E} \operatorname{tr} |X|^{2p} \big)^{1/2p} \le \sqrt{2p - 1} \cdot \big( \mathbb{E} \operatorname{tr} \Delta_X^p \big)^{1/2p}.$$

Moral: The conditional variance controls the moments of $X$.
Generalizes Chatterjee's version (2007) of the scalar Burkholder-Davis-Gundy inequality (Burkholder, 1973)
See also Pisier & Xu (1997), Junge & Xu (2003, 2008)
Proof techniques mirror those for exponential concentration
Also holds for infinite-dimensional Schatten-class operators
Matrix Khintchine Inequality

Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let $(\varepsilon_k)_{k \ge 1}$ be an independent sequence of Rademacher random variables and $(A_k)_{k \ge 1}$ a deterministic sequence of Hermitian matrices. Then, if $p = 1$ or $p \ge 1.5$,
$$\mathbb{E} \operatorname{tr}\Big( \sum_k \varepsilon_k A_k \Big)^{2p} \le (2p - 1)^p \cdot \operatorname{tr}\Big( \sum_k A_k^2 \Big)^p.$$

The noncommutative Khintchine inequality (Lust-Piquard, 1986; Lust-Piquard and Pisier, 1991) is a dominant tool in applied matrix analysis,
e.g., used in the analysis of column sampling and projection for approximate SVD (Rudelson and Vershynin, 2007).
Stein's method offers an unusually concise proof.
The constant $\sqrt{2p - 1}$ is within $\sqrt{e}$ of optimal.
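For small sums the corollary can be checked exactly, with no sampling, by enumerating all sign patterns (a numpy/itertools sketch; the dimensions and $p = 2$ are illustrative choices satisfying the $p = 1$ or $p \ge 1.5$ hypothesis):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(6)
k, d, p = 4, 3, 2
As = [(M + M.T) / 2 for M in rng.standard_normal((k, d, d))]

# Left side: E tr (sum_k eps_k A_k)^{2p}, exactly, over all 2^k sign patterns
lhs = 0.0
for eps in product([-1, 1], repeat=k):
    X = sum(e * A for e, A in zip(eps, As))
    lhs += np.trace(np.linalg.matrix_power(X, 2 * p)) / 2**k

# Right side: (2p - 1)^p * tr (sum_k A_k^2)^p
S = sum(A @ A for A in As)
rhs = (2 * p - 1)**p * np.trace(np.linalg.matrix_power(S, p))

assert lhs <= rhs + 1e-9  # the matrix Khintchine inequality holds
```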
Dependent Sequences
Adding Dependence
1. Motivation: Matrix Completion; Matrix Concentration
2. Stein's Method: Background and Notation
3. Exponential Tail Inequalities
4. Polynomial Moment Inequalities
5. Dependent Sequences: Sums of Conditionally Zero-mean Matrices; Combinatorial Sums
6. Extensions
Sums of Conditionally Zero-mean Matrices

Definition (Sum of Conditionally Zero-Mean Matrices)
Given a sequence of Hermitian matrices $(Y_k)_{k=1}^n$ satisfying the conditional zero-mean property
$$\mathbb{E}[Y_k \mid (Y_j)_{j \neq k}] = 0 \quad\text{for all } k,$$
define the random sum $X = \sum_{k=1}^n Y_k$.
Note: $(Y_k)_{k \ge 1}$ is a martingale difference sequence.

Examples
Sums of independent centered random matrices
Many sums of conditionally independent random matrices: $Y_k \perp\!\!\!\perp (Y_j)_{j \neq k} \mid Z$ and $\mathbb{E}[Y_k \mid Z] = 0$
Rademacher series with random matrix coefficients: $X = \sum_k \varepsilon_k W_k$, with $(W_k)_{k \ge 1}$ Hermitian and $(\varepsilon_k)_{k \ge 1}$ independent Rademacher
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 28 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Yk | (Yj)j 6=k] = 0
Matrix Stein Pair for X =sumn
k=1 Yk
Let Y primek and Yk be conditionally iid given (Yj)j 6=k
Draw index K uniformly from 1 nDefine X prime = X + Y prime
K minus YK
Check Stein pair condition
E[X minusX prime | (Yj)jge1] = E[YK minus Y primeK | (Yj)jge1]
=1
n
sumn
k=1
(
Yk minus E[Y primek | (Yj)j 6=k]
)
=1
n
sumn
k=1Yk =
1
nX
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 29 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Yk | (Yj)j 6=k] = 0
Conditional Variance for X = Y minus EY
∆X =n
2middot E
[
(X minusX prime)2 | (Yj)jge1
]
=n
2middot E
[
(YK minus Y primeK)
2 | (Yj)jge1
]
=1
2
sumn
k=1
(
Y 2k + E[Y 2
k | (Yj)j 6=k])
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 30 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Given a deterministic array (Ajk)njk=1 of Hermitian matrices and a
uniformly random permutation π on 1 n define thecombinatorial matrix statistic
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Generalizes the scalar statistics studied by Hoeffding (1951)
Example
Sampling without replacement from B1 BnW =
sums
j=1Bπ(j)
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 31 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Matrix Stein Pair for X = Y minus EY
Draw indices (JK) uniformly from 1 n2Define πprime = π (JK) and X prime =
sumnj=1Ajπprime(j) minus EY
Check Stein pair condition
E[X minusX prime | π] = E[
AJπ(J) +AKπ(K) minusAJπ(K) minusAKπ(J) | π]
=1
n2
sumn
jk=1Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
=2
n(Y minus EY ) =
2
nX
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 32 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Conditional Variance for X = Y minus EY
∆X(π) =n
4E[
(X minusX prime)2 | π]
=1
4n
sumn
jk=1
[
Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
]2
41
n
sumn
jk=1
[
A2jπ(j) +A2
kπ(k) +A2jπ(k) +A2
kπ(j)
]
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 33 35
Extensions
Extensions
General Complex Matrices
Map any matrix B isin Cd1timesd2 to a Hermitian matrix via dilation
D(B) =
[
0 B
Blowast0
]
isin Hd1+d2
Preserves spectral information λmax(D(B)) = B
Beyond Sums
Matrix-valued functions satisfying a self-reproducing property
eg Matrix second-order Rademacher chaossum
jk εjεkAjk
Yields a dependent bounded differences inequality for matrices
Generalized Matrix Stein Pairs
Satisfy E[g(X)minus g(X prime) |Z] = αX almost surely forg R rarr R weakly increasingMackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 34 35
Extensions
References IAhlswede R and Winter A Strong converse for identification via quantum channels IEEE Trans Inform Theory 48(3)
569ndash579 Mar 2002
Bernstein S The theory of probabilities Gastehizdat Publishing House 1946
Burkholder D L Distribution function inequalities for martingales Ann Probab 119ndash42 1973 doi101214aop1176997023
Candes E J and Recht B Exact matrix completion via convex optimization Found Comput Math 9717ndash772 2009
Candes E J and Tao T The power of convex relaxation Near-optimal matrix completion IEEE Trans Info Theory 2009URL arXiv09031476 To appear Available at arXiv09031476
Chatterjee S Steinrsquos method for concentration inequalities Probab Theory Related Fields 138305ndash321 2007
Chatterjee S Concentration inequalities with exchangeable pairs PhD thesis Stanford University Palo Alto Feb 2008 URLarxivmath0507526vl
Cheung S-S So A Man-Cho and Wang K Chance-constrained linear matrix inequalities with dependent perturbations Asafe tractable approximation approach Available athttpwwwoptimization-onlineorgDB_FILE2011012898pdf 2011
Christofides D and Markstrom K Expansion properties of random cayley graphs and vertex transitive graphs via matrixmartingales Random Struct Algorithms 32(1)88ndash100 2008
Drineas P Mahoney M W and Muthukrishnan S Relative-error CUR matrix decompositions SIAM Journal on Matrix
Analysis and Applications 30844ndash881 2008
Foygel R and Srebro N Concentration-based guarantees for low-rank matrix reconstruction Journal of Machine Learning
Research - Proceedings Track 19315ndash340 2011
Gross D Recovering low-rank matrices from few coefficients in any basis IEEE Trans Inform Theory 57(3)1548ndash1566 Mar2011
Hoeffding W A combinatorial central limit theorem Ann Math Statist 22558ndash566 1951
Hoeffding W Probability inequalities for sums of bounded random variables Journal of the American Statistical Association58(301)13ndash30 1963
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 35 35
Extensions
References II
Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matrices Available atarXiv11041672 2011a
Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matricesarXiv11041672v3[mathPR] 2011b
Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities Ann Probab 31(2)948ndash995 2003
Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities II Applications Israel J Math 167227ndash282 2008
Lust-Piquard F Inegalites de Khintchine dans Cp (1 lt p lt infin) C R Math Acad Sci Paris 303(7)289ndash292 1986
Lust-Piquard F and Pisier G Noncommutative Khintchine and Paley inequalities Ark Mat 29(2)241ndash260 1991
Mackey L Talwalkar A and Jordan M I Divide-and-conquer matrix factorization In Shawe-Taylor J Zemel R SBartlett P L Pereira F C N and Weinberger K Q (eds) Advances in Neural Information Processing Systems 24 pp1134ndash1142 2011
Mackey L Jordan M I Chen R Y Farrell B and Tropp J A Matrix concentration inequalities via the method ofexchangeable pairs URL httparxivorgabs12016002 2012
Negahban S and Wainwright M J Restricted strong convexity and weighted matrix completion Optimal bounds with noisearXiv10092118v2[csIT] 2010
Nemirovski A Sums of random symmetric matrices and quadratic optimization under orthogonality constraints Math
Program 109283ndash317 January 2007 ISSN 0025-5610 doi 101007s10107-006-0033-0 URLhttpdlacmorgcitationcfmid=12297161229726
Oliveira R I Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges Availableat arXiv09110600 Nov 2009
Pisier G and Xu Q Non-commutative martingale inequalities Comm Math Phys 189(3)667ndash698 1997
Recht B Simpler approach to matrix completion J Mach Learn Res 123413ndash3430 2011
Rudelson M and Vershynin R Sampling from large matrices An approach through geometric functional analysis J Assoc
Comput Mach 54(4)Article 21 19 pp Jul 2007 (electronic)
So A Man-Cho Moment inequalities for sums of random matrices and their applications in optimization Math Program 130(1)125ndash151 2011
Stein C A bound for the error in the normal approximation to the distribution of a sum of dependent random variables InProc 6th Berkeley Symp Math Statist Probab Berkeley 1972 Univ California Press
Tropp J A User-friendly tail bounds for sums of random matrices Found Comput Math August 2011
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 36 35
- Motivation
-
- Matrix Completion
- Matrix Concentration
-
- Extensions
-
Background
Matrix Stein Pair
Definition (Exchangeable Pair)
(ZZ prime) is an exchangeable pair if (ZZ prime)d= (Z prime Z)
Definition (Matrix Stein Pair)
Let (ZZ prime) be an exchangeable pair and let Ψ Z rarr Hd be ameasurable function Define the random matrices
X = Ψ(Z) and X prime = Ψ(Z prime)
(XX prime) is a matrix Stein pair with scale factor α isin (0 1] if
E[X prime |Z] = (1minus α)X
Matrix Stein pairs are exchangeable pairs
Matrix Stein pairs always have zero mean
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 13 35
Background
The Conditional Variance
Definition (Conditional Variance)
Suppose that (XX prime) is a matrix Stein pair with scale factor αconstructed from the exchangeable pair (ZZ prime) The conditional
variance is the random matrix
∆X = ∆X(Z) =1
2αE[
(X minusX prime)2 |Z]
∆X is a stochastic estimate for the variance EX2
Take-home Message
Control over ∆X yields control over λmax(X)
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 14 35
Exponential Tail Inequalities
Exponential Concentration for Random Matrices
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let (XX prime) be a matrix Stein pair with X isin Hd Suppose that
∆X 4 cX + v I almost surely for c v ge 0
Then for all t ge 0
Pλmax(X) ge t le d middot exp minust22v + 2ct
Control over the conditional variance ∆X yields
Gaussian tail for λmax(X) for small t exponential tail for large t
When d = 1 improves scalar result of Chatterjee (2007)
The dimensional factor d cannot be removed
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 15 35
Exponential Tail Inequalities
Matrix Hoeffding Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let X =sum
k Yk for independent matrices in Hd satisfying
EYk = 0 and Y 2k 4 A2
k
for deterministic matrices (Ak)kge1 Define the variance parameter
σ2 =∥
∥
∥
sum
kA2k
∥
∥
∥
Then for all t ge 0
P
λmax
(
sum
kYk
)
ge t
le d middot eminust22σ2
Improves upon the matrix Hoeffding inequality of Tropp (2011)Optimal constant 12 in the exponent
Can replace variance parameter with σ2 = 12
∥
∥
sum
k
(
A2k + EY 2
k
)∥
∥
Tighter than classical Hoeffding inequality (1963) when d = 1
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 16 35
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
1 Matrix Laplace transform method (Ahlswede amp Winter 2002)
Relate tail probability to the trace of the mgf of X
Pλmax(X) ge t le infθgt0
eminusθt middotm(θ)
where m(θ) = E tr eθX
Problem eX+Y 6= eXeY when XY isin Hd
How to bound the trace mgf
Past approaches Golden-Thompson Liebrsquos concavity theorem
Chatterjeersquos strategy for scalar concentration
Control mgf growth by bounding derivative
mprime(θ) = E trXeθX for θ isin R
Rewrite using exchangeable pairs
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 17 35
Exponential Tail Inequalities
Method of Exchangeable Pairs
Lemma
Suppose that (XX prime) is a matrix Stein pair with scale factor α LetF Hd rarr Hd be a measurable function satisfying
E(X minusX prime)F (X) ltinfin
Then
E[X F (X)] =1
2αE[(X minusX prime)(F (X)minus F (X prime))] (1)
Intuition
Can characterize the distribution of a random matrix byintegrating it against a class of test functions F
Eq 1 allows us to estimate this integral using the smoothnessproperties of F and the discrepancy X minusX prime
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 18 35
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
2 Method of Exchangeable Pairs
Rewrite the derivative of the trace mgf
mprime(θ) = E trXeθX =1
2αE tr
[
(X minusX prime)(
eθX minus eθXprime)]
Goal Use the smoothness of F (X) = eθX to bound the derivative
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 19 35
Exponential Tail Inequalities
Mean Value Trace Inequality
Lemma (Mackey Jordan Chen Farrell and Tropp 2012)
Suppose that g R rarr R is a weakly increasing function and thath R rarr R is a function whose derivative hprime is convex For allmatrices AB isin Hd it holds that
tr[(g(A)minus g(B)) middot (h(A)minus h(B))] le1
2tr[(g(A)minus g(B)) middot (AminusB) middot (hprime(A) + hprime(B))]
Standard matrix functions If g R rarr R and
A = Q
λ1
λd
Qlowast then g(A) = Q
g(λ1)
g(λd)
Qlowast
Inequality does not hold without the traceFor exponential concentration we let g(A) = A and h(B) = eθB
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 20 35
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
3 Mean Value Trace Inequality
Bound the derivative of the trace mgf
mprime(θ) =1
2αE tr
[
(X minusX prime)(
eθX minus eθXprime)]
le θ
4αE tr
[
(X minusX prime)2 middot(
eθX + eθXprime)]
=θ
2αE tr
[
(X minusX prime)2 middot eθX]
= θ middot E tr
[
1
2αE[
(X minusX prime)2 |Z]
middot eθX]
= θ middot E tr[
∆X eθX]
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 21 35
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
3 Mean Value Trace Inequality
Bound the derivative of the trace mgf
mprime(θ) le θ middot E tr[
∆X eθX]
4 Conditional Variance Bound ∆X 4 cX + v I
Yields differential inequality
mprime(θ) le cθE tr[
X eθX]
+ vθE tr[
eθX]
= cθ middotmprime(θ) + vθ middotm(θ)
Solve to bound m(θ) and thereby bound
Pλmax(X) ge t le infθgt0
eminusθt middotm(θ) le d middot exp minust22v + 2ct
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 22 35
Exponential Tail Inequalities
Refined Exponential Concentration
Relaxing the constraint ∆X 4 cX + v
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let (XX prime) be a bounded matrix Stein pair with X isin Hd Definethe function
r(ψ) =1
ψlogE tr(eψ∆Xd) for each ψ gt 0
Then for all t ge 0 and all ψ gt 0
Pλmax(X) ge t le d middot exp minust22r(ψ) + 2t
radicψ
r(ψ) measures typical magnitude of conditional variance
Eλmax(∆X) le infψgt0
[
r(ψ) + log dψ
]
When d = 1 improves scalar result of Chatterjee (2008)Proof extends to unbounded random matricesMackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 23 35
Exponential Tail Inequalities
Matrix Bernstein Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let (Yk)kge1 be independent matrices in Hd satisfying
EYk = 0 and Yk le R for each index k
Define the variance parameter
σ2 =∥
∥
∥
sum
kEY 2
k
∥
∥
∥
Then for all t ge 0
P
λmax
(
sum
kYk
)
ge t
le d middot exp minust23σ2 + 2Rt
Gaussian tail controlled by improved variance when t is smallKey proof idea Apply refined concentration and boundr(ψ) = 1
ψlogE tr(eψ∆Xd) using unrefined concentration
Constants better than Oliveira (2009) worse than Tropp (2011)Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 24 35
Polynomial Moment Inequalities
Polynomial Moments for Random Matrices
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let p = 1 or p ge 15 Suppose that (XX prime) is a matrix Stein pairwhere E tr |X|2p ltinfin Then
(
E tr |X|2p)12p le
radic
2pminus 1 middot(
E tr∆pX
)12p
Moral The conditional variance controls the moments of X
Generalizes Chatterjeersquos version (2007) of the scalarBurkholder-Davis-Gundy inequality (Burkholder 1973)
See also Pisier amp Xu (1997) Junge amp Xu (2003 2008)
Proof techniques mirror those for exponential concentration
Also holds for infinite dimensional Schatten-class operators
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 25 35
Polynomial Moment Inequalities
Matrix Khintchine Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let (εk)kge1 be an independent sequence of Rademacher randomvariables and (Ak)kge1 be a deterministic sequence of Hermitianmatrices Then if p = 1 or p ge 15
E tr(
sum
kεkAk
)2p
le (2pminus 1)p middot tr(
sum
kA2k
)p
Noncommutative Khintchine inequality (Lust-Piquard 1986 Lust-Piquard
and Pisier 1991) is a dominant tool in applied matrix analysis
eg Used in analysis of column sampling and projection forapproximate SVD (Rudelson and Vershynin 2007)
Steinrsquos method offers an unusually concise proof
The constantradic2pminus 1 is within
radice of optimal
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 26 35
Dependent Sequences

Adding Dependence

1. Motivation: Matrix Completion; Matrix Concentration
2. Stein's Method: Background and Notation
3. Exponential Tail Inequalities
4. Polynomial Moment Inequalities
5. Dependent Sequences: Sums of Conditionally Zero-mean Matrices; Combinatorial Sums
6. Extensions
Sums of Conditionally Zero-mean Matrices

Definition (Sum of Conditionally Zero-Mean Matrices)
Given a sequence of Hermitian matrices (Y_k)_{k=1}^n satisfying the conditional zero-mean property
    E[Y_k | (Y_j)_{j≠k}] = 0 for all k,
define the random sum X = ∑_{k=1}^n Y_k.

Note: (Y_k)_{k≥1} is a martingale difference sequence.

Examples
- Sums of independent centered random matrices.
- Many sums of conditionally independent random matrices:
    Y_k ⊥⊥ (Y_j)_{j≠k} | Z and E[Y_k | Z] = 0.
- Rademacher series with random matrix coefficients:
    X = ∑_k ε_k W_k,
  with (W_k)_{k≥1} Hermitian and (ε_k)_{k≥1} independent Rademacher.
Sums of Conditionally Zero-mean Matrices

Conditional zero-mean property: E[Y_k | (Y_j)_{j≠k}] = 0.

Matrix Stein Pair for X = ∑_{k=1}^n Y_k
- Let Y′_k and Y_k be conditionally i.i.d. given (Y_j)_{j≠k}.
- Draw an index K uniformly from {1, …, n}.
- Define X′ = X + Y′_K − Y_K.

Check the Stein pair condition:
    E[X − X′ | (Y_j)_{j≥1}] = E[Y_K − Y′_K | (Y_j)_{j≥1}]
        = (1/n) ∑_{k=1}^n (Y_k − E[Y′_k | (Y_j)_{j≠k}])
        = (1/n) ∑_{k=1}^n Y_k = (1/n) X.
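The Stein pair condition can be checked by direct enumeration for the Rademacher-series example from the previous slide, Y_k = ε_k·A_k. The sketch below (the matrices and the fixed sign realization are made up) averages X − X′ exactly over the uniform index K and the resampled sign, and confirms the result is X/n, i.e. a Stein pair with scale factor α = 1/n.

```python
import numpy as np

# Hypothetical Hermitian coefficients and an arbitrary fixed sign realization.
A = [np.array([[1.0, 0.2], [0.2, -0.3]]),
     np.array([[0.0, 0.5], [0.5, 0.4]]),
     np.array([[0.7, -0.1], [-0.1, 0.2]])]
n = len(A)
eps = [1, -1, 1]

X = sum(e * Ak for e, Ak in zip(eps, A))

# E[X - X' | eps]: average over K uniform in {0,...,n-1} and eps'_K in {-1,+1}.
cond_mean = np.zeros_like(X)
for K in range(n):
    for new_sign in (-1, 1):
        X_prime = X + (new_sign - eps[K]) * A[K]
        cond_mean += (X - X_prime) / (2 * n)

assert np.allclose(cond_mean, X / n)   # Stein pair condition with alpha = 1/n
```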
Sums of Conditionally Zero-mean Matrices

Conditional zero-mean property: E[Y_k | (Y_j)_{j≠k}] = 0.

Conditional Variance for X = ∑_{k=1}^n Y_k
    Δ_X = (n/2) · E[(X − X′)² | (Y_j)_{j≥1}]
        = (n/2) · E[(Y_K − Y′_K)² | (Y_j)_{j≥1}]
        = (1/2) ∑_{k=1}^n (Y_k² + E[Y_k² | (Y_j)_{j≠k}]).

⇒ The conditional variance is controlled when the summands are bounded.
⇒ Dependent analogues of the concentration and moment inequalities follow.
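The displayed identity can be verified exactly in the Rademacher-series case Y_k = ε_k·A_k, where Y_k² = E[Y_k² | rest] = A_k², so the formula predicts Δ_X = ∑_k A_k². The sketch below (made-up matrices and signs) computes Δ_X from the definition (n/2)·E[(X − X′)² | signs] by enumeration and compares.

```python
import numpy as np

# Hypothetical Hermitian coefficients and an arbitrary fixed sign realization.
A = [np.array([[1.0, 0.2], [0.2, -0.3]]),
     np.array([[0.0, 0.5], [0.5, 0.4]]),
     np.array([[0.7, -0.1], [-0.1, 0.2]])]
n = len(A)
eps = [1, 1, -1]

# Definition: (n/2) * E[(X - X')^2 | eps], averaging over the uniform index K
# and the resampled sign eps'_K in {-1, +1}.
delta = np.zeros((2, 2))
for K in range(n):
    for new_sign in (-1, 1):
        diff = (eps[K] - new_sign) * A[K]          # X - X'
        delta += (n / 2) * (diff @ diff) / (2 * n)

# Formula from the slide: (1/2) sum_k (Y_k^2 + E[Y_k^2 | rest]) = sum_k A_k^2.
formula = sum(Ak @ Ak for Ak in A)
assert np.allclose(delta, formula)
```

Note the result is deterministic here: Δ_X does not depend on the realized signs, which is why the Rademacher series inherits the Hoeffding-type bound with σ² = ‖∑_k A_k²‖.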
Combinatorial Sums of Matrices

Definition (Combinatorial Matrix Statistic)
Given a deterministic array (A_{jk})_{j,k=1}^n of Hermitian matrices and a uniformly random permutation π on {1, …, n}, define the combinatorial matrix statistic
    Y = ∑_{j=1}^n A_{j,π(j)}, with mean E Y = (1/n) ∑_{j,k=1}^n A_{jk}.

Generalizes the scalar statistics studied by Hoeffding (1951).

Example: sampling without replacement from {B_1, …, B_n}:
    W = ∑_{j=1}^s B_{π(j)}.
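The stated mean follows because π(j) is uniform over {1, …, n} for each fixed j. For small n this is cheap to verify by averaging over all n! permutations; the sketch below uses a made-up 3×3 array of random symmetric 2×2 matrices.

```python
import numpy as np
from itertools import permutations

n = 3
rng = np.random.default_rng(0)

def sym(M):
    """Symmetrize a real matrix so each array entry is Hermitian."""
    return (M + M.T) / 2

# Hypothetical deterministic array A[j][k] of Hermitian matrices.
A = [[sym(rng.standard_normal((2, 2))) for _ in range(n)] for _ in range(n)]

# Average Y = sum_j A[j][pi(j)] over all n! permutations.
perms = list(permutations(range(n)))
mean_Y = np.zeros((2, 2))
for pi in perms:
    mean_Y += sum(A[j][pi[j]] for j in range(n)) / len(perms)

expected = sum(A[j][k] for j in range(n) for k in range(n)) / n
assert np.allclose(mean_Y, expected)   # E Y = (1/n) sum_{j,k} A_{jk}
```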
Combinatorial Sums of Matrices

Combinatorial matrix statistic: Y = ∑_{j=1}^n A_{j,π(j)}, with mean E Y = (1/n) ∑_{j,k=1}^n A_{jk}.

Matrix Stein Pair for X = Y − E Y
- Draw indices (J, K) uniformly from {1, …, n}².
- Define π′ = π ∘ (J K) (swap the values of π at J and K) and X′ = ∑_{j=1}^n A_{j,π′(j)} − E Y.

Check the Stein pair condition:
    E[X − X′ | π] = E[A_{J,π(J)} + A_{K,π(K)} − A_{J,π(K)} − A_{K,π(J)} | π]
        = (1/n²) ∑_{j,k=1}^n (A_{j,π(j)} + A_{k,π(k)} − A_{j,π(k)} − A_{k,π(j)})
        = (2/n)(Y − E Y) = (2/n) X.
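Since (J, K) ranges over only n² pairs, the Stein pair condition can be checked exactly for any fixed permutation. The sketch below (made-up array, arbitrary fixed π) averages X − X′ over all index pairs and confirms the scale factor α = 2/n.

```python
import numpy as np

n = 3
rng = np.random.default_rng(1)

def sym(M):
    return (M + M.T) / 2

# Hypothetical deterministic array of Hermitian matrices.
A = [[sym(rng.standard_normal((2, 2))) for _ in range(n)] for _ in range(n)]
EY = sum(A[j][k] for j in range(n) for k in range(n)) / n

def stat(p):
    return sum(A[j][p[j]] for j in range(n))

pi = [2, 0, 1]                    # an arbitrary fixed permutation
X = stat(pi) - EY

# E[X - X' | pi]: average over all n^2 transposition index pairs (J, K).
cond_mean = np.zeros((2, 2))
for J in range(n):
    for K in range(n):
        ps = list(pi)
        ps[J], ps[K] = ps[K], ps[J]
        cond_mean += (X - (stat(ps) - EY)) / n ** 2

assert np.allclose(cond_mean, (2 / n) * X)   # Stein pair with alpha = 2/n
```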
Combinatorial Sums of Matrices

Combinatorial matrix statistic: Y = ∑_{j=1}^n A_{j,π(j)}, with mean E Y = (1/n) ∑_{j,k=1}^n A_{jk}.

Conditional Variance for X = Y − E Y
    Δ_X(π) = (n/4) · E[(X − X′)² | π]
        = (1/4n) ∑_{j,k=1}^n [A_{j,π(j)} + A_{k,π(k)} − A_{j,π(k)} − A_{k,π(j)}]²
        ≼ (1/n) ∑_{j,k=1}^n [A²_{j,π(j)} + A²_{k,π(k)} + A²_{j,π(k)} + A²_{k,π(j)}].

⇒ The conditional variance is controlled when the summands are bounded.
⇒ Dependent analogues of the concentration and moment inequalities follow.
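Both the displayed identity and the semidefinite bound can be checked by enumeration for a fixed permutation. The sketch below (made-up array) computes Δ_X(π) from the definition via actual swapped statistics, matches it against the expanded formula, and verifies the ≼ bound by checking that the difference is positive semidefinite.

```python
import numpy as np

n = 3
rng = np.random.default_rng(4)

def sym(M):
    return (M + M.T) / 2

# Hypothetical deterministic array of Hermitian matrices.
A = [[sym(rng.standard_normal((2, 2))) for _ in range(n)] for _ in range(n)]
pi = [1, 2, 0]

def stat(p):
    return sum(A[j][p[j]] for j in range(n))

Y = stat(pi)

# Definition: (n/4) * average of (X - X')^2 over all n^2 pairs (J, K);
# the constant E Y cancels in X - X'.
lhs = np.zeros((2, 2))
for J in range(n):
    for K in range(n):
        ps = list(pi)
        ps[J], ps[K] = ps[K], ps[J]
        diff = Y - stat(ps)
        lhs += (n / 4) * (diff @ diff) / n ** 2

# Expanded formula and its semidefinite upper bound from the slide.
rhs = np.zeros((2, 2))
bound = np.zeros((2, 2))
for j in range(n):
    for k in range(n):
        d = A[j][pi[j]] + A[k][pi[k]] - A[j][pi[k]] - A[k][pi[j]]
        rhs += (d @ d) / (4 * n)
        for M in (A[j][pi[j]], A[k][pi[k]], A[j][pi[k]], A[k][pi[j]]):
            bound += (M @ M) / n

assert np.allclose(lhs, rhs)                                # identity
assert np.linalg.eigvalsh(bound - rhs).min() >= -1e-9       # rhs <= bound (psd order)
```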
Extensions

General Complex Matrices
Map any matrix B ∈ C^{d₁×d₂} to a Hermitian matrix via dilation:
    D(B) = [ 0   B ;  B*  0 ] ∈ H^{d₁+d₂}.
Preserves spectral information: λ_max(D(B)) = ‖B‖.

Beyond Sums
Matrix-valued functions satisfying a self-reproducing property,
e.g., matrix second-order Rademacher chaos ∑_{j,k} ε_j ε_k A_{jk}.
Yields a dependent bounded-differences inequality for matrices.

Generalized Matrix Stein Pairs
Satisfy E[g(X) − g(X′) | Z] = αX almost surely, for g: R → R weakly increasing.
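The dilation's key property, λ_max(D(B)) = ‖B‖, is easy to confirm numerically: the eigenvalues of D(B) are plus and minus the singular values of B (padded with zeros). The sketch below uses an arbitrary random rectangular matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((3, 5))        # arbitrary rectangular matrix

# Hermitian dilation D(B) = [[0, B], [B*, 0]].
D = np.block([[np.zeros((3, 3)), B],
              [B.T, np.zeros((5, 5))]])

lam_max = np.linalg.eigvalsh(D).max()
spec_norm = np.linalg.norm(B, 2)       # largest singular value of B
assert np.isclose(lam_max, spec_norm)  # lambda_max(D(B)) = ||B||
```

This is why Hermitian results transfer to rectangular matrices: a tail bound on λ_max(D(B)) is a tail bound on the spectral norm of B.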
References I

Ahlswede, R. and Winter, A. Strong converse for identification via quantum channels. IEEE Trans. Inform. Theory, 48(3):569–579, Mar. 2002.
Bernstein, S. The Theory of Probabilities. Gastehizdat Publishing House, 1946.
Burkholder, D. L. Distribution function inequalities for martingales. Ann. Probab., 1:19–42, 1973. doi:10.1214/aop/1176997023.
Candès, E. J. and Recht, B. Exact matrix completion via convex optimization. Found. Comput. Math., 9:717–772, 2009.
Candès, E. J. and Tao, T. The power of convex relaxation: Near-optimal matrix completion. IEEE Trans. Inform. Theory, 2009. To appear. Available at arXiv:0903.1476.
Chatterjee, S. Stein's method for concentration inequalities. Probab. Theory Related Fields, 138:305–321, 2007.
Chatterjee, S. Concentration inequalities with exchangeable pairs. PhD thesis, Stanford University, Palo Alto, Feb. 2008. Available at arXiv:math/0507526v1.
Cheung, S.-S., So, A. M.-C., and Wang, K. Chance-constrained linear matrix inequalities with dependent perturbations: A safe tractable approximation approach. Available at http://www.optimization-online.org/DB_FILE/2011/01/2898.pdf, 2011.
Christofides, D. and Markström, K. Expansion properties of random Cayley graphs and vertex transitive graphs via matrix martingales. Random Struct. Algorithms, 32(1):88–100, 2008.
Drineas, P., Mahoney, M. W., and Muthukrishnan, S. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30:844–881, 2008.
Foygel, R. and Srebro, N. Concentration-based guarantees for low-rank matrix reconstruction. Journal of Machine Learning Research - Proceedings Track, 19:315–340, 2011.
Gross, D. Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory, 57(3):1548–1566, Mar. 2011.
Hoeffding, W. A combinatorial central limit theorem. Ann. Math. Statist., 22:558–566, 1951.
Hoeffding, W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.

References II

Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. Available at arXiv:1104.1672, 2011a.
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. arXiv:1104.1672v3 [math.PR], 2011b.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities. Ann. Probab., 31(2):948–995, 2003.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities II: Applications. Israel J. Math., 167:227–282, 2008.
Lust-Piquard, F. Inégalités de Khintchine dans C_p (1 < p < ∞). C. R. Math. Acad. Sci. Paris, 303(7):289–292, 1986.
Lust-Piquard, F. and Pisier, G. Noncommutative Khintchine and Paley inequalities. Ark. Mat., 29(2):241–260, 1991.
Mackey, L., Talwalkar, A., and Jordan, M. I. Divide-and-conquer matrix factorization. In Shawe-Taylor, J., Zemel, R. S., Bartlett, P. L., Pereira, F. C. N., and Weinberger, K. Q. (eds.), Advances in Neural Information Processing Systems 24, pp. 1134–1142, 2011.
Mackey, L., Jordan, M. I., Chen, R. Y., Farrell, B., and Tropp, J. A. Matrix concentration inequalities via the method of exchangeable pairs. Available at http://arxiv.org/abs/1201.6002, 2012.
Negahban, S. and Wainwright, M. J. Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. Available at arXiv:1009.2118v2, 2010.
Nemirovski, A. Sums of random symmetric matrices and quadratic optimization under orthogonality constraints. Math. Program., 109:283–317, Jan. 2007. doi:10.1007/s10107-006-0033-0.
Oliveira, R. I. Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. Available at arXiv:0911.0600, Nov. 2009.
Pisier, G. and Xu, Q. Non-commutative martingale inequalities. Comm. Math. Phys., 189(3):667–698, 1997.
Recht, B. A simpler approach to matrix completion. J. Mach. Learn. Res., 12:3413–3430, 2011.
Rudelson, M. and Vershynin, R. Sampling from large matrices: An approach through geometric functional analysis. J. Assoc. Comput. Mach., 54(4):Article 21, 19 pp., Jul. 2007.
So, A. M.-C. Moment inequalities for sums of random matrices and their applications in optimization. Math. Program., 130(1):125–151, 2011.
Stein, C. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. 6th Berkeley Symp. Math. Statist. Probab., Berkeley, 1972. Univ. California Press.
Tropp, J. A. User-friendly tail bounds for sums of random matrices. Found. Comput. Math., Aug. 2011.
Background

The Conditional Variance

Definition (Conditional Variance)
Suppose that (X, X′) is a matrix Stein pair with scale factor α, constructed from the exchangeable pair (Z, Z′). The conditional variance is the random matrix
    Δ_X = Δ_X(Z) = (1/2α) · E[(X − X′)² | Z].

Δ_X is a stochastic estimate of the variance E X².

Take-home Message: control over Δ_X yields control over λ_max(X).
Exponential Tail Inequalities

Exponential Concentration for Random Matrices

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp 2012)
Let (X, X′) be a matrix Stein pair with X ∈ H^d. Suppose that
    Δ_X ≼ cX + vI almost surely, for some c, v ≥ 0.
Then, for all t ≥ 0,
    P{λ_max(X) ≥ t} ≤ d · exp{−t² / (2v + 2ct)}.

Control over the conditional variance Δ_X yields a Gaussian tail for λ_max(X) for small t and an exponential tail for large t.
When d = 1, improves the scalar result of Chatterjee (2007).
The dimensional factor d cannot be removed.
Matrix Hoeffding Inequality

Corollary (Mackey, Jordan, Chen, Farrell, and Tropp 2012)
Let X = ∑_k Y_k for independent matrices in H^d satisfying
    E Y_k = 0  and  Y_k² ≼ A_k²
for deterministic matrices (A_k)_{k≥1}. Define the variance parameter
    σ² = ‖∑_k A_k²‖.
Then, for all t ≥ 0,
    P{λ_max(∑_k Y_k) ≥ t} ≤ d · e^{−t²/(2σ²)}.

Improves upon the matrix Hoeffding inequality of Tropp (2011): optimal constant 1/2 in the exponent.
Can replace the variance parameter with σ² = (1/2) ‖∑_k (A_k² + E Y_k²)‖.
Tighter than the classical Hoeffding inequality (1963) when d = 1.
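For a Rademacher series X = ∑_k ε_k A_k the hypotheses hold with Y_k² = A_k², and the exact tail probability is computable by enumerating sign patterns. The sketch below (made-up 2×2 matrices) confirms the Hoeffding bound at a few thresholds.

```python
import numpy as np
from itertools import product

# Hypothetical Hermitian coefficients; Y_k = eps_k * A_k gives Y_k^2 = A_k^2.
A = [np.array([[0.6, 0.1], [0.1, -0.2]]),
     np.array([[0.3, -0.4], [-0.4, 0.5]]),
     np.array([[-0.2, 0.2], [0.2, 0.7]])]
d, n = 2, len(A)
sigma2 = np.linalg.norm(sum(Ak @ Ak for Ak in A), 2)   # ||sum_k A_k^2||

def exact_tail(t):
    """Exact P{lambda_max(sum_k eps_k A_k) >= t} over all 2^n sign patterns."""
    hits = 0
    for signs in product([-1, 1], repeat=n):
        X = sum(s * Ak for s, Ak in zip(signs, A))
        hits += np.linalg.eigvalsh(X).max() >= t
    return hits / 2 ** n

for t in [0.5, 1.0, 1.5]:
    assert exact_tail(t) <= d * np.exp(-t ** 2 / (2 * sigma2))
```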
Exponential Concentration: Proof Sketch

Step 1: Matrix Laplace transform method (Ahlswede & Winter 2002).
Relate the tail probability to the trace of the moment generating function (mgf) of X:
    P{λ_max(X) ≥ t} ≤ inf_{θ>0} e^{−θt} · m(θ), where m(θ) = E tr e^{θX}.

Problem: e^{X+Y} ≠ e^X e^Y in general for X, Y ∈ H^d, so how do we bound the trace mgf?
- Past approaches: the Golden–Thompson inequality, Lieb's concavity theorem.
- Chatterjee's strategy for scalar concentration: control mgf growth by bounding the derivative
    m′(θ) = E tr X e^{θX} for θ ∈ R,
  rewritten using exchangeable pairs.
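The failure of the scalar identity e^{x+y} = e^x e^y is easy to exhibit. The sketch below defines a small helper for the Hermitian matrix exponential (via eigendecomposition, to avoid assuming scipy is available) and shows the identity breaking for two non-commuting Hermitian matrices.

```python
import numpy as np

def expm_hermitian(H):
    """Matrix exponential of a Hermitian matrix via eigendecomposition."""
    w, Q = np.linalg.eigh(H)
    return (Q * np.exp(w)) @ Q.T.conj()

# Two non-commuting Hermitian matrices (Pauli-type examples).
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Y = np.array([[1.0, 0.0], [0.0, -1.0]])

lhs = expm_hermitian(X + Y)
rhs = expm_hermitian(X) @ expm_hermitian(Y)

assert not np.allclose(lhs, rhs)      # e^{X+Y} != e^X e^Y for non-commuting X, Y
assert np.allclose(lhs, lhs.T)        # e^{X+Y} is Hermitian; the product is not
```

This non-commutativity is exactly what blocks the naive scalar mgf argument and motivates the trace-mgf machinery that follows.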
Method of Exchangeable Pairs

Lemma
Suppose that (X, X′) is a matrix Stein pair with scale factor α. Let F: H^d → H^d be a measurable function satisfying
    E ‖(X − X′) F(X)‖ < ∞.
Then
    E[X F(X)] = (1/2α) · E[(X − X′)(F(X) − F(X′))].   (1)

Intuition
- The distribution of a random matrix can be characterized by integrating it against a class of test functions F.
- Eq. (1) lets us estimate this integral using the smoothness properties of F and the discrepancy X − X′.
Exponential Concentration: Proof Sketch

Step 2: Method of exchangeable pairs.
Rewrite the derivative of the trace mgf:
    m′(θ) = E tr X e^{θX} = (1/2α) E tr[(X − X′)(e^{θX} − e^{θX′})].

Goal: use the smoothness of F(X) = e^{θX} to bound the derivative.
Mean Value Trace Inequality

Lemma (Mackey, Jordan, Chen, Farrell, and Tropp 2012)
Suppose that g: R → R is a weakly increasing function and that h: R → R is a function whose derivative h′ is convex. For all matrices A, B ∈ H^d, it holds that
    tr[(g(A) − g(B)) · (h(A) − h(B))] ≤ (1/2) · tr[(g(A) − g(B)) · (A − B) · (h′(A) + h′(B))].

Standard matrix functions: if g: R → R and A = Q diag(λ₁, …, λ_d) Q*, then g(A) = Q diag(g(λ₁), …, g(λ_d)) Q*.
The inequality does not hold without the trace.
For exponential concentration, we let g(A) = A and h(B) = e^{θB}.
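The lemma can be spot-checked numerically for the choice used in the proof, g(x) = x and h(x) = e^{θx} (so h′(x) = θe^{θx} is convex for θ > 0). The sketch below applies standard matrix functions via eigendecomposition to made-up random symmetric matrices.

```python
import numpy as np

def mat_fn(A, f):
    """Standard matrix function: apply f to the eigenvalues of Hermitian A."""
    w, Q = np.linalg.eigh(A)
    return (Q * f(w)) @ Q.T.conj()

rng = np.random.default_rng(3)
theta = 0.7
M, N = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
A, B = (M + M.T) / 2, (N + N.T) / 2    # random symmetric (Hermitian) matrices

h = lambda w: np.exp(theta * w)        # h(x) = e^{theta x}
hp = lambda w: theta * np.exp(theta * w)

# g is the identity, so g(A) - g(B) = A - B.
lhs = np.trace((A - B) @ (mat_fn(A, h) - mat_fn(B, h)))
rhs = 0.5 * np.trace((A - B) @ (A - B) @ (mat_fn(A, hp) + mat_fn(B, hp)))
assert lhs <= rhs + 1e-9
```

Dropping the trace does make the matrix inequality fail in general, which is why the lemma is stated only for traces.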
Exponential Concentration: Proof Sketch

Step 3: Mean value trace inequality.
Bound the derivative of the trace mgf:
    m′(θ) = (1/2α) E tr[(X − X′)(e^{θX} − e^{θX′})]
          ≤ (θ/4α) E tr[(X − X′)² · (e^{θX} + e^{θX′})]
          = (θ/2α) E tr[(X − X′)² · e^{θX}]
          = θ · E tr[(1/2α) · E[(X − X′)² | Z] · e^{θX}]
          = θ · E tr[Δ_X e^{θX}].
Exponential Concentration: Proof Sketch

Step 3 (mean value trace inequality) bounded the derivative of the trace mgf:
    m′(θ) ≤ θ · E tr[Δ_X e^{θX}].

Step 4: Conditional variance bound Δ_X ≼ cX + vI.
This yields a differential inequality:
    m′(θ) ≤ cθ E tr[X e^{θX}] + vθ E tr[e^{θX}] = cθ · m′(θ) + vθ · m(θ).
Solve it to bound m(θ), and thereby bound
    P{λ_max(X) ≥ t} ≤ inf_{θ>0} e^{−θt} · m(θ) ≤ d · exp{−t² / (2v + 2ct)}.
Refined Exponential Concentration

Relaxing the constraint Δ_X ≼ cX + vI:

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp 2012)
Let (X, X′) be a bounded matrix Stein pair with X ∈ H^d. Define the function
    r(ψ) = (1/ψ) · log(E tr e^{ψΔ_X} / d) for each ψ > 0.
Then, for all t ≥ 0 and all ψ > 0,
    P{λ_max(X) ≥ t} ≤ d · exp{−t² / (2r(ψ) + 2t/√ψ)}.

r(ψ) measures the typical magnitude of the conditional variance:
    E λ_max(Δ_X) ≤ inf_{ψ>0} [r(ψ) + (log d)/ψ].
When d = 1, improves the scalar result of Chatterjee (2008).
The proof extends to unbounded random matrices.
Exponential Tail Inequalities
Matrix Bernstein Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let (Yk)kge1 be independent matrices in Hd satisfying
EYk = 0 and Yk le R for each index k
Define the variance parameter
σ2 =∥
∥
∥
sum
kEY 2
k
∥
∥
∥
Then for all t ge 0
P
λmax
(
sum
kYk
)
ge t
le d middot exp minust23σ2 + 2Rt
Gaussian tail controlled by improved variance when t is smallKey proof idea Apply refined concentration and boundr(ψ) = 1
ψlogE tr(eψ∆Xd) using unrefined concentration
Constants better than Oliveira (2009) worse than Tropp (2011)Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 24 35
Polynomial Moment Inequalities
Polynomial Moments for Random Matrices
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let p = 1 or p ge 15 Suppose that (XX prime) is a matrix Stein pairwhere E tr |X|2p ltinfin Then
(
E tr |X|2p)12p le
radic
2pminus 1 middot(
E tr∆pX
)12p
Moral The conditional variance controls the moments of X
Generalizes Chatterjeersquos version (2007) of the scalarBurkholder-Davis-Gundy inequality (Burkholder 1973)
See also Pisier amp Xu (1997) Junge amp Xu (2003 2008)
Proof techniques mirror those for exponential concentration
Also holds for infinite dimensional Schatten-class operators
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 25 35
Polynomial Moment Inequalities
Matrix Khintchine Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let (εk)kge1 be an independent sequence of Rademacher randomvariables and (Ak)kge1 be a deterministic sequence of Hermitianmatrices Then if p = 1 or p ge 15
E tr(
sum
kεkAk
)2p
le (2pminus 1)p middot tr(
sum
kA2k
)p
Noncommutative Khintchine inequality (Lust-Piquard 1986 Lust-Piquard
and Pisier 1991) is a dominant tool in applied matrix analysis
eg Used in analysis of column sampling and projection forapproximate SVD (Rudelson and Vershynin 2007)
Steinrsquos method offers an unusually concise proof
The constantradic2pminus 1 is within
radice of optimal
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 26 35
Dependent Sequences
Adding Dependence
1 MotivationMatrix CompletionMatrix Concentration
2 Steinrsquos Method Background and Notation
3 Exponential Tail Inequalities
4 Polynomial Moment Inequalities
5 Dependent SequencesSums of Conditionally Zero-mean MatricesCombinatorial Sums
6 Extensions
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 27 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Sum of Conditionally Zero-Mean Matrices)
Given a sequence of Hermitian matrices (Yk)nk=1 satisfying the
Conditional zero mean property E[Yk | (Yj)j 6=k] = 0
for all k define the random sum X =sumn
k=1Yk
Note (Yk)kge1 is a martingale difference sequence
Examples
Sums of independent centered random matricesMany sums of conditionally independent random matrices
Yk perpperp (Yj)j 6=k | Z and E[Yk |Z] = 0
Rademacher series with random matrix coefficients
X =sum
kεkWk
(Wk)kge1 Hermitian (εk)kge1 independent Rademacher
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 28 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Yk | (Yj)j 6=k] = 0
Matrix Stein Pair for X =sumn
k=1 Yk
Let Y primek and Yk be conditionally iid given (Yj)j 6=k
Draw index K uniformly from 1 nDefine X prime = X + Y prime
K minus YK
Check Stein pair condition
E[X minusX prime | (Yj)jge1] = E[YK minus Y primeK | (Yj)jge1]
=1
n
sumn
k=1
(
Yk minus E[Y primek | (Yj)j 6=k]
)
=1
n
sumn
k=1Yk =
1
nX
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 29 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Yk | (Yj)j 6=k] = 0
Conditional Variance for X = Y minus EY
∆X =n
2middot E
[
(X minusX prime)2 | (Yj)jge1
]
=n
2middot E
[
(YK minus Y primeK)
2 | (Yj)jge1
]
=1
2
sumn
k=1
(
Y 2k + E[Y 2
k | (Yj)j 6=k])
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 30 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Given a deterministic array (Ajk)njk=1 of Hermitian matrices and a
uniformly random permutation π on 1 n define thecombinatorial matrix statistic
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Generalizes the scalar statistics studied by Hoeffding (1951)
Example
Sampling without replacement from B1 BnW =
sums
j=1Bπ(j)
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 31 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Matrix Stein Pair for X = Y minus EY
Draw indices (JK) uniformly from 1 n2Define πprime = π (JK) and X prime =
sumnj=1Ajπprime(j) minus EY
Check Stein pair condition
E[X minusX prime | π] = E[
AJπ(J) +AKπ(K) minusAJπ(K) minusAKπ(J) | π]
=1
n2
sumn
jk=1Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
=2
n(Y minus EY ) =
2
nX
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 32 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Conditional Variance for X = Y minus EY
∆X(π) =n
4E[
(X minusX prime)2 | π]
=1
4n
sumn
jk=1
[
Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
]2
41
n
sumn
jk=1
[
A2jπ(j) +A2
kπ(k) +A2jπ(k) +A2
kπ(j)
]
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 33 35
Extensions
Extensions
General Complex Matrices
Map any matrix B isin Cd1timesd2 to a Hermitian matrix via dilation
D(B) =
[
0 B
Blowast0
]
isin Hd1+d2
Preserves spectral information λmax(D(B)) = B
Beyond Sums
Matrix-valued functions satisfying a self-reproducing property
eg Matrix second-order Rademacher chaossum
jk εjεkAjk
Yields a dependent bounded differences inequality for matrices
Generalized Matrix Stein Pairs
Satisfy E[g(X)minus g(X prime) |Z] = αX almost surely forg R rarr R weakly increasingMackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 34 35
Extensions
References IAhlswede R and Winter A Strong converse for identification via quantum channels IEEE Trans Inform Theory 48(3)
569ndash579 Mar 2002
Bernstein S The theory of probabilities Gastehizdat Publishing House 1946
Burkholder D L Distribution function inequalities for martingales Ann Probab 119ndash42 1973 doi101214aop1176997023
Candes E J and Recht B Exact matrix completion via convex optimization Found Comput Math 9717ndash772 2009
Candes E J and Tao T The power of convex relaxation Near-optimal matrix completion IEEE Trans Info Theory 2009URL arXiv09031476 To appear Available at arXiv09031476
Chatterjee S Steinrsquos method for concentration inequalities Probab Theory Related Fields 138305ndash321 2007
Chatterjee S Concentration inequalities with exchangeable pairs PhD thesis Stanford University Palo Alto Feb 2008 URLarxivmath0507526vl
Cheung S-S So A Man-Cho and Wang K Chance-constrained linear matrix inequalities with dependent perturbations Asafe tractable approximation approach Available athttpwwwoptimization-onlineorgDB_FILE2011012898pdf 2011
Christofides D and Markstrom K Expansion properties of random cayley graphs and vertex transitive graphs via matrixmartingales Random Struct Algorithms 32(1)88ndash100 2008
Drineas P Mahoney M W and Muthukrishnan S Relative-error CUR matrix decompositions SIAM Journal on Matrix
Analysis and Applications 30844ndash881 2008
Foygel R and Srebro N Concentration-based guarantees for low-rank matrix reconstruction Journal of Machine Learning
Research - Proceedings Track 19315ndash340 2011
Gross D Recovering low-rank matrices from few coefficients in any basis IEEE Trans Inform Theory 57(3)1548ndash1566 Mar2011
Hoeffding W A combinatorial central limit theorem Ann Math Statist 22558ndash566 1951
Hoeffding W Probability inequalities for sums of bounded random variables Journal of the American Statistical Association58(301)13ndash30 1963
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 35 35
Extensions
References II
Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matrices Available atarXiv11041672 2011a
Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matricesarXiv11041672v3[mathPR] 2011b
Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities Ann Probab 31(2)948ndash995 2003
Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities II Applications Israel J Math 167227ndash282 2008
Lust-Piquard F Inegalites de Khintchine dans Cp (1 lt p lt infin) C R Math Acad Sci Paris 303(7)289ndash292 1986
Lust-Piquard F and Pisier G Noncommutative Khintchine and Paley inequalities Ark Mat 29(2)241ndash260 1991
Mackey L Talwalkar A and Jordan M I Divide-and-conquer matrix factorization In Shawe-Taylor J Zemel R SBartlett P L Pereira F C N and Weinberger K Q (eds) Advances in Neural Information Processing Systems 24 pp1134ndash1142 2011
Mackey L Jordan M I Chen R Y Farrell B and Tropp J A Matrix concentration inequalities via the method ofexchangeable pairs URL httparxivorgabs12016002 2012
Negahban S and Wainwright M J Restricted strong convexity and weighted matrix completion Optimal bounds with noisearXiv10092118v2[csIT] 2010
Nemirovski A Sums of random symmetric matrices and quadratic optimization under orthogonality constraints Math
Program 109283ndash317 January 2007 ISSN 0025-5610 doi 101007s10107-006-0033-0 URLhttpdlacmorgcitationcfmid=12297161229726
Oliveira R I Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges Availableat arXiv09110600 Nov 2009
Pisier G and Xu Q Non-commutative martingale inequalities Comm Math Phys 189(3)667ndash698 1997
Recht B Simpler approach to matrix completion J Mach Learn Res 123413ndash3430 2011
Rudelson M and Vershynin R Sampling from large matrices An approach through geometric functional analysis J Assoc
Comput Mach 54(4)Article 21 19 pp Jul 2007 (electronic)
So A Man-Cho Moment inequalities for sums of random matrices and their applications in optimization Math Program 130(1)125ndash151 2011
Stein C A bound for the error in the normal approximation to the distribution of a sum of dependent random variables InProc 6th Berkeley Symp Math Statist Probab Berkeley 1972 Univ California Press
Tropp J A User-friendly tail bounds for sums of random matrices Found Comput Math August 2011
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 36 35
- Motivation
-
- Matrix Completion
- Matrix Concentration
-
- Extensions
-
Exponential Tail Inequalities
Exponential Concentration for Random Matrices
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let (XX prime) be a matrix Stein pair with X isin Hd Suppose that
∆X 4 cX + v I almost surely for c v ge 0
Then for all t ge 0
Pλmax(X) ge t le d middot exp minust22v + 2ct
Control over the conditional variance ∆X yields
Gaussian tail for λmax(X) for small t exponential tail for large t
When d = 1 improves scalar result of Chatterjee (2007)
The dimensional factor d cannot be removed
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 15 35
Exponential Tail Inequalities
Matrix Hoeffding Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let X =sum
k Yk for independent matrices in Hd satisfying
EYk = 0 and Y 2k 4 A2
k
for deterministic matrices (Ak)kge1 Define the variance parameter
σ2 =∥
∥
∥
sum
kA2k
∥
∥
∥
Then for all t ge 0
P
λmax
(
sum
kYk
)
ge t
le d middot eminust22σ2
Improves upon the matrix Hoeffding inequality of Tropp (2011)Optimal constant 12 in the exponent
Can replace variance parameter with σ2 = 12
∥
∥
sum
k
(
A2k + EY 2
k
)∥
∥
Tighter than classical Hoeffding inequality (1963) when d = 1
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 16 35
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
1 Matrix Laplace transform method (Ahlswede amp Winter 2002)
Relate tail probability to the trace of the mgf of X
Pλmax(X) ge t le infθgt0
eminusθt middotm(θ)
where m(θ) = E tr eθX
Problem eX+Y 6= eXeY when XY isin Hd
How to bound the trace mgf
Past approaches Golden-Thompson Liebrsquos concavity theorem
Chatterjeersquos strategy for scalar concentration
Control mgf growth by bounding derivative
mprime(θ) = E trXeθX for θ isin R
Rewrite using exchangeable pairs
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 17 35
Exponential Tail Inequalities
Method of Exchangeable Pairs
Lemma
Suppose that (XX prime) is a matrix Stein pair with scale factor α LetF Hd rarr Hd be a measurable function satisfying
E(X minusX prime)F (X) ltinfin
Then
E[X F (X)] =1
2αE[(X minusX prime)(F (X)minus F (X prime))] (1)
Intuition
Can characterize the distribution of a random matrix byintegrating it against a class of test functions F
Eq 1 allows us to estimate this integral using the smoothnessproperties of F and the discrepancy X minusX prime
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 18 35
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
2 Method of Exchangeable Pairs
Rewrite the derivative of the trace mgf
mprime(θ) = E trXeθX =1
2αE tr
[
(X minusX prime)(
eθX minus eθXprime)]
Goal Use the smoothness of F (X) = eθX to bound the derivative
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 19 35
Exponential Tail Inequalities
Mean Value Trace Inequality
Lemma (Mackey Jordan Chen Farrell and Tropp 2012)
Suppose that g R rarr R is a weakly increasing function and thath R rarr R is a function whose derivative hprime is convex For allmatrices AB isin Hd it holds that
tr[(g(A)minus g(B)) middot (h(A)minus h(B))] le1
2tr[(g(A)minus g(B)) middot (AminusB) middot (hprime(A) + hprime(B))]
Standard matrix functions If g R rarr R and
A = Q
λ1
λd
Qlowast then g(A) = Q
g(λ1)
g(λd)
Qlowast
Inequality does not hold without the traceFor exponential concentration we let g(A) = A and h(B) = eθB
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 20 35
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
3 Mean Value Trace Inequality
Bound the derivative of the trace mgf
mprime(θ) =1
2αE tr
[
(X minusX prime)(
eθX minus eθXprime)]
le θ
4αE tr
[
(X minusX prime)2 middot(
eθX + eθXprime)]
=θ
2αE tr
[
(X minusX prime)2 middot eθX]
= θ middot E tr
[
1
2αE[
(X minusX prime)2 |Z]
middot eθX]
= θ middot E tr[
∆X eθX]
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 21 35
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
3 Mean Value Trace Inequality
Bound the derivative of the trace mgf
mprime(θ) le θ middot E tr[
∆X eθX]
4 Conditional Variance Bound ∆X 4 cX + v I
Yields differential inequality
mprime(θ) le cθE tr[
X eθX]
+ vθE tr[
eθX]
= cθ middotmprime(θ) + vθ middotm(θ)
Solve to bound m(θ) and thereby bound
Pλmax(X) ge t le infθgt0
eminusθt middotm(θ) le d middot exp minust22v + 2ct
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 22 35
Exponential Tail Inequalities
Refined Exponential Concentration
Relaxing the constraint ∆X 4 cX + v
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let (XX prime) be a bounded matrix Stein pair with X isin Hd Definethe function
r(ψ) =1
ψlogE tr(eψ∆Xd) for each ψ gt 0
Then for all t ge 0 and all ψ gt 0
Pλmax(X) ge t le d middot exp minust22r(ψ) + 2t
radicψ
r(ψ) measures typical magnitude of conditional variance
Eλmax(∆X) le infψgt0
[
r(ψ) + log dψ
]
When d = 1 improves scalar result of Chatterjee (2008)Proof extends to unbounded random matricesMackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 23 35
Exponential Tail Inequalities
Matrix Bernstein Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let (Yk)kge1 be independent matrices in Hd satisfying
EYk = 0 and Yk le R for each index k
Define the variance parameter
σ2 =∥
∥
∥
sum
kEY 2
k
∥
∥
∥
Then for all t ge 0
P
λmax
(
sum
kYk
)
ge t
le d middot exp minust23σ2 + 2Rt
Gaussian tail controlled by improved variance when t is smallKey proof idea Apply refined concentration and boundr(ψ) = 1
ψlogE tr(eψ∆Xd) using unrefined concentration
Constants better than Oliveira (2009) worse than Tropp (2011)Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 24 35
Polynomial Moment Inequalities

Polynomial Moments for Random Matrices

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let p = 1 or p ≥ 1.5. Suppose that (X, X′) is a matrix Stein pair where E tr|X|^{2p} < ∞. Then
    (E tr|X|^{2p})^{1/2p} ≤ √(2p−1) · (E tr ΔX^p)^{1/2p}.

Moral: the conditional variance controls the moments of X
Generalizes Chatterjee's version (2007) of the scalar Burkholder–Davis–Gundy inequality (Burkholder, 1973)
See also Pisier & Xu (1997), Junge & Xu (2003, 2008)
Proof techniques mirror those for exponential concentration
Also holds for infinite-dimensional Schatten-class operators
Polynomial Moment Inequalities

Matrix Khintchine Inequality

Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let (εk)_{k≥1} be an independent sequence of Rademacher random variables and (Ak)_{k≥1} a deterministic sequence of Hermitian matrices. Then, if p = 1 or p ≥ 1.5,
    E tr(Σk εk Ak)^{2p} ≤ (2p−1)^p · tr(Σk A²k)^p.

The noncommutative Khintchine inequality (Lust-Piquard, 1986; Lust-Piquard and Pisier, 1991) is a dominant tool in applied matrix analysis
e.g., used in the analysis of column sampling and projection for approximate SVD (Rudelson and Vershynin, 2007)
Stein's method offers an unusually concise proof
The constant √(2p−1) is within √e of optimal
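For small n the expectation over the Rademacher signs can be computed exactly by enumerating all 2^n sign patterns, so the corollary can be verified deterministically. A minimal sketch (the coefficient matrices are arbitrary test data, not from the deck):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
d, n, p = 3, 8, 2
# arbitrary deterministic Hermitian coefficients A_k
As = [(M + M.T) / 2 for M in rng.standard_normal((n, d, d))]

# exact expectation: enumerate all 2^n Rademacher sign patterns
lhs = np.mean([
    np.trace(np.linalg.matrix_power(sum(e * A for e, A in zip(signs, As)), 2 * p))
    for signs in product([-1.0, 1.0], repeat=n)
])
rhs = (2 * p - 1) ** p * np.trace(np.linalg.matrix_power(sum(A @ A for A in As), p))
assert lhs <= rhs   # E tr(sum_k eps_k A_k)^{2p} <= (2p-1)^p tr(sum_k A_k^2)^p
```

With p = 2 the constant is (2p−1)^p = 9; the optimal constant grows like (2p/e)^p, which is the sense in which √(2p−1) is within √e of optimal per unit power.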
Dependent Sequences

Adding Dependence

1 Motivation: Matrix Completion; Matrix Concentration
2 Stein's Method: Background and Notation
3 Exponential Tail Inequalities
4 Polynomial Moment Inequalities
5 Dependent Sequences: Sums of Conditionally Zero-mean Matrices; Combinatorial Sums
6 Extensions
Dependent Sequences: Sums of Conditionally Zero-mean Matrices

Sums of Conditionally Zero-mean Matrices

Definition (Sum of Conditionally Zero-Mean Matrices)
Given a sequence of Hermitian matrices (Yk)_{k=1}^n satisfying the conditional zero-mean property
    E[Yk | (Yj)_{j≠k}] = 0  for all k,
define the random sum X = Σ_{k=1}^n Yk.

Note: (Yk)_{k≥1} is a martingale difference sequence

Examples
  Sums of independent centered random matrices
  Many sums of conditionally independent random matrices:
      Yk ⟂⟂ (Yj)_{j≠k} | Z  and  E[Yk | Z] = 0
  Rademacher series with random matrix coefficients:
      X = Σk εk Wk, with (Wk)_{k≥1} Hermitian and (εk)_{k≥1} independent Rademacher
Dependent Sequences: Sums of Conditionally Zero-mean Matrices

Sums of Conditionally Zero-mean Matrices

Definition (Conditional Zero-Mean Property)
    E[Yk | (Yj)_{j≠k}] = 0

Matrix Stein Pair for X = Σ_{k=1}^n Yk
  Let Y′k and Yk be conditionally i.i.d. given (Yj)_{j≠k}
  Draw an index K uniformly from {1, …, n}
  Define X′ = X + Y′K − YK
  Check the Stein pair condition:
      E[X − X′ | (Yj)_{j≥1}] = E[YK − Y′K | (Yj)_{j≥1}]
                             = (1/n) Σ_{k=1}^n (Yk − E[Y′k | (Yj)_{j≠k}])
                             = (1/n) Σ_{k=1}^n Yk = (1/n) X
Dependent Sequences: Sums of Conditionally Zero-mean Matrices

Sums of Conditionally Zero-mean Matrices

Definition (Conditional Zero-Mean Property)
    E[Yk | (Yj)_{j≠k}] = 0

Conditional Variance for X = Σ_{k=1}^n Yk
    ΔX = (n/2) · E[(X − X′)² | (Yj)_{j≥1}]
       = (n/2) · E[(YK − Y′K)² | (Yj)_{j≥1}]
       = (1/2) Σ_{k=1}^n (Y²k + E[Y²k | (Yj)_{j≠k}])

⇒ Conditional variance controlled when summands are bounded
⇒ Dependent analogues of concentration and moment inequalities
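The Stein pair identity E[X − X′ | (Yj)] = (1/n) X from the construction above can be checked numerically. This sketch uses the special case of independent summands Yk = εk Ak (hypothetical coefficients), where the conditional expectation E[Y′k | rest] reduces to E Y′k = 0, and averages X − X′ over many redraws of (K, Y′K) for one fixed realization of the Yk.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, reps = 3, 6, 200_000
# special case: independent Y_k = eps_k A_k with hypothetical Hermitian A_k
As = np.stack([(M + M.T) / 2 for M in rng.standard_normal((n, d, d))])
eps = rng.choice([-1.0, 1.0], size=n)
Ys = eps[:, None, None] * As
X = Ys.sum(axis=0)

# X' = X + Y'_K - Y_K with K uniform on {1,...,n} and Y'_K a fresh copy of Y_K
Ks = rng.integers(n, size=reps)
signs = rng.choice([-1.0, 1.0], size=reps)
diffs = Ys[Ks] - signs[:, None, None] * As[Ks]   # X - X' = Y_K - Y'_K
avg = diffs.mean(axis=0)
assert np.allclose(avg, X / n, atol=0.05)        # E[X - X' | (Y_j)] = X / n
```

The scale factor α = 1/n read off here is exactly the constant that enters the exchangeable-pairs lemma used in the proof sketch.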
Dependent Sequences: Combinatorial Sums

Combinatorial Sums of Matrices

Definition (Combinatorial Matrix Statistic)
Given a deterministic array (Ajk)_{j,k=1}^n of Hermitian matrices and a uniformly random permutation π on {1, …, n}, define the combinatorial matrix statistic
    Y = Σ_{j=1}^n A_{jπ(j)},  with mean  EY = (1/n) Σ_{j,k=1}^n Ajk.

Generalizes the scalar statistics studied by Hoeffding (1951)

Example
  Sampling without replacement from {B1, …, Bn}: W = Σ_{j=1}^s B_{π(j)}
Dependent Sequences: Combinatorial Sums

Combinatorial Sums of Matrices

Definition (Combinatorial Matrix Statistic)
    Y = Σ_{j=1}^n A_{jπ(j)},  with mean  EY = (1/n) Σ_{j,k=1}^n Ajk

Matrix Stein Pair for X = Y − EY
  Draw an index pair (J, K) uniformly from {1, …, n}²
  Define π′ = π ∘ (J K) and X′ = Σ_{j=1}^n A_{jπ′(j)} − EY
  Check the Stein pair condition:
      E[X − X′ | π] = E[A_{Jπ(J)} + A_{Kπ(K)} − A_{Jπ(K)} − A_{Kπ(J)} | π]
                    = (1/n²) Σ_{j,k=1}^n (A_{jπ(j)} + A_{kπ(k)} − A_{jπ(k)} − A_{kπ(j)})
                    = (2/n)(Y − EY) = (2/n) X
Dependent Sequences: Combinatorial Sums

Combinatorial Sums of Matrices

Definition (Combinatorial Matrix Statistic)
    Y = Σ_{j=1}^n A_{jπ(j)},  with mean  EY = (1/n) Σ_{j,k=1}^n Ajk

Conditional Variance for X = Y − EY
    ΔX(π) = (n/4) E[(X − X′)² | π]
          = (1/4n) Σ_{j,k=1}^n [A_{jπ(j)} + A_{kπ(k)} − A_{jπ(k)} − A_{kπ(j)}]²
          ≼ (1/n) Σ_{j,k=1}^n [A²_{jπ(j)} + A²_{kπ(k)} + A²_{jπ(k)} + A²_{kπ(j)}]

⇒ Conditional variance controlled when summands are bounded
⇒ Dependent analogues of concentration and moment inequalities
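The Stein pair condition for the combinatorial statistic can be verified exactly, because for a fixed permutation π the conditional expectation over (J, K) is a finite average over all n² index pairs. A small self-contained check (array entries are arbitrary test data):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(4)
d, n = 3, 5
A = rng.standard_normal((n, n, d, d))
A = (A + A.transpose(0, 1, 3, 2)) / 2          # Hermitian A_{jk}
EY = A.sum(axis=(0, 1)) / n                    # EY = (1/n) sum_{j,k} A_{jk}

pi = rng.permutation(n)
Y = sum(A[j, pi[j]] for j in range(n))
X = Y - EY

# average X - X' over all n^2 pairs (J, K), where pi' = pi composed with (J K)
acc = np.zeros((d, d))
for J, K in product(range(n), repeat=2):
    pi2 = pi.copy()
    pi2[J], pi2[K] = pi2[K], pi2[J]            # swap the images of J and K
    acc += X - (sum(A[j, pi2[j]] for j in range(n)) - EY)
assert np.allclose(acc / n**2, 2 * X / n)      # E[X - X' | pi] = (2/n) X
```

Note the pairs with J = K contribute zero; they are included because (J, K) is drawn uniformly from all of {1, …, n}², which is what produces the factor 2/n.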
Extensions

Extensions

General Complex Matrices
  Map any matrix B ∈ C^{d1×d2} to a Hermitian matrix via the dilation
      D(B) = [0 B; B* 0] ∈ H^{d1+d2}
  Preserves spectral information: λmax(D(B)) = ‖B‖

Beyond Sums
  Matrix-valued functions satisfying a self-reproducing property
  e.g., matrix second-order Rademacher chaos Σ_{j,k} εj εk Ajk
  Yields a dependent bounded-differences inequality for matrices

Generalized Matrix Stein Pairs
  Satisfy E[g(X) − g(X′) | Z] = αX almost surely, for g: ℝ → ℝ weakly increasing
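The dilation trick is easy to see in code: the eigenvalues of D(B) are the singular values of B together with their negatives (padded with zeros), so the top eigenvalue recovers the spectral norm. A minimal demonstration with an arbitrary rectangular B:

```python
import numpy as np

rng = np.random.default_rng(5)
d1, d2 = 3, 5
B = rng.standard_normal((d1, d2))

# Hermitian dilation D(B) = [[0, B], [B*, 0]]
D = np.block([[np.zeros((d1, d1)), B],
              [B.T, np.zeros((d2, d2))]])

# lambda_max(D(B)) equals the spectral norm ||B|| (largest singular value)
assert np.isclose(np.linalg.eigvalsh(D).max(), np.linalg.norm(B, 2))
```

This is why results stated for Hermitian matrices, like the tail bounds above, transfer to arbitrary rectangular matrices with λmax replaced by the spectral norm.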
Extensions

References I
Ahlswede, R. and Winter, A. Strong converse for identification via quantum channels. IEEE Trans. Inform. Theory, 48(3):569–579, Mar. 2002.
Bernstein, S. The Theory of Probabilities. Gastehizdat Publishing House, 1946.
Burkholder, D. L. Distribution function inequalities for martingales. Ann. Probab., 1:19–42, 1973. doi:10.1214/aop/1176997023.
Candès, E. J. and Recht, B. Exact matrix completion via convex optimization. Found. Comput. Math., 9:717–772, 2009.
Candès, E. J. and Tao, T. The power of convex relaxation: near-optimal matrix completion. IEEE Trans. Inform. Theory, 2009. To appear. Available at arXiv:0903.1476.
Chatterjee, S. Stein's method for concentration inequalities. Probab. Theory Related Fields, 138:305–321, 2007.
Chatterjee, S. Concentration inequalities with exchangeable pairs. PhD thesis, Stanford University, Palo Alto, Feb. 2008. Available at arXiv:math/0507526v1.
Cheung, S.-S., So, A. Man-Cho, and Wang, K. Chance-constrained linear matrix inequalities with dependent perturbations: a safe tractable approximation approach. Available at http://www.optimization-online.org/DB_FILE/2011/01/2898.pdf, 2011.
Christofides, D. and Markström, K. Expansion properties of random Cayley graphs and vertex transitive graphs via matrix martingales. Random Struct. Algorithms, 32(1):88–100, 2008.
Drineas, P., Mahoney, M. W., and Muthukrishnan, S. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30:844–881, 2008.
Foygel, R. and Srebro, N. Concentration-based guarantees for low-rank matrix reconstruction. Journal of Machine Learning Research – Proceedings Track, 19:315–340, 2011.
Gross, D. Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory, 57(3):1548–1566, Mar. 2011.
Hoeffding, W. A combinatorial central limit theorem. Ann. Math. Statist., 22:558–566, 1951.
Hoeffding, W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.
Extensions

References II
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. Available at arXiv:1104.1672, 2011a.
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. arXiv:1104.1672v3 [math.PR], 2011b.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities. Ann. Probab., 31(2):948–995, 2003.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities II: applications. Israel J. Math., 167:227–282, 2008.
Lust-Piquard, F. Inégalités de Khintchine dans Cp (1 < p < ∞). C. R. Math. Acad. Sci. Paris, 303(7):289–292, 1986.
Lust-Piquard, F. and Pisier, G. Noncommutative Khintchine and Paley inequalities. Ark. Mat., 29(2):241–260, 1991.
Mackey, L., Talwalkar, A., and Jordan, M. I. Divide-and-conquer matrix factorization. In Shawe-Taylor, J., Zemel, R. S., Bartlett, P. L., Pereira, F. C. N., and Weinberger, K. Q. (eds.), Advances in Neural Information Processing Systems 24, pp. 1134–1142, 2011.
Mackey, L., Jordan, M. I., Chen, R. Y., Farrell, B., and Tropp, J. A. Matrix concentration inequalities via the method of exchangeable pairs. Available at http://arxiv.org/abs/1201.6002, 2012.
Negahban, S. and Wainwright, M. J. Restricted strong convexity and weighted matrix completion: optimal bounds with noise. arXiv:1009.2118v2 [cs.IT], 2010.
Nemirovski, A. Sums of random symmetric matrices and quadratic optimization under orthogonality constraints. Math. Program., 109:283–317, January 2007. doi:10.1007/s10107-006-0033-0.
Oliveira, R. I. Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. Available at arXiv:0911.0600, Nov. 2009.
Pisier, G. and Xu, Q. Non-commutative martingale inequalities. Comm. Math. Phys., 189(3):667–698, 1997.
Recht, B. A simpler approach to matrix completion. J. Mach. Learn. Res., 12:3413–3430, 2011.
Rudelson, M. and Vershynin, R. Sampling from large matrices: an approach through geometric functional analysis. J. Assoc. Comput. Mach., 54(4):Article 21, 19 pp., Jul. 2007.
So, A. Man-Cho. Moment inequalities for sums of random matrices and their applications in optimization. Math. Program., 130(1):125–151, 2011.
Stein, C. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. 6th Berkeley Symp. Math. Statist. Probab., Berkeley, 1972. Univ. California Press.
Tropp, J. A. User-friendly tail bounds for sums of random matrices. Found. Comput. Math., August 2011.
Exponential Tail Inequalities

Matrix Hoeffding Inequality

Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let X = Σk Yk for independent matrices in H^d satisfying
    E Yk = 0  and  Y²k ≼ A²k
for deterministic matrices (Ak)_{k≥1}. Define the variance parameter
    σ² = ‖Σk A²k‖.
Then, for all t ≥ 0,
    P{λmax(Σk Yk) ≥ t} ≤ d · e^{−t²/2σ²}

Improves upon the matrix Hoeffding inequality of Tropp (2011)
Optimal constant 1/2 in the exponent
Can replace the variance parameter with σ² = ½‖Σk (A²k + E Y²k)‖
Tighter than the classical Hoeffding inequality (1963) when d = 1
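An illustrative simulation of the corollary (made-up data; not from the deck): summands Yk = uk Ak with uk uniform on [−1, 1] satisfy Y²k ≼ A²k with E Y²k = A²k/3, so the refined parameter ½‖Σk (A²k + E Y²k)‖ is strictly smaller than ‖Σk A²k‖ here, and the Hoeffding tail bound dominates the empirical tail.

```python
import numpy as np

rng = np.random.default_rng(6)
d, n, trials = 4, 20, 2000
# hypothetical Hermitian bounds A_k; take Y_k = u_k A_k, u_k ~ Uniform[-1, 1]
As = [(M + M.T) / 2 for M in rng.standard_normal((n, d, d))]
S = sum(A @ A for A in As)

sigma2 = np.linalg.norm(S, 2)                     # ||sum_k A_k^2||
sigma2_ref = 0.5 * np.linalg.norm(S + S / 3, 2)   # (1/2)||sum_k (A_k^2 + E Y_k^2)||
assert sigma2_ref < sigma2                        # refined parameter is tighter here

t = 3.0 * np.sqrt(sigma2)
lam = np.empty(trials)
for i in range(trials):
    u = rng.uniform(-1.0, 1.0, size=n)
    lam[i] = np.linalg.eigvalsh(sum(ui * A for ui, A in zip(u, As))).max()
empirical = (lam >= t).mean()
assert empirical <= d * np.exp(-t**2 / (2 * sigma2)) + 0.02
```

The gap between the two variance parameters is largest when the summands are far from saturating their bounds Y²k ≼ A²k, as in this uniform example.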
Exponential Tail Inequalities

Exponential Concentration: Proof Sketch

1. Matrix Laplace transform method (Ahlswede & Winter, 2002)
   Relate the tail probability to the trace of the mgf of X:
       P{λmax(X) ≥ t} ≤ inf_{θ>0} e^{−θt} · m(θ),  where m(θ) = E tr e^{θX}
   Problem: e^{X+Y} ≠ e^X e^Y when X, Y ∈ H^d. How to bound the trace mgf?
   Past approaches: Golden–Thompson, Lieb's concavity theorem

Chatterjee's strategy for scalar concentration
   Control mgf growth by bounding the derivative
       m′(θ) = E tr X e^{θX}  for θ ∈ ℝ
   Rewrite using exchangeable pairs
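The failure of e^{X+Y} = e^X e^Y for non-commuting Hermitian matrices, and the one-sided Golden–Thompson trace comparison that earlier approaches rely on, can both be checked in a few lines (Pauli matrices used as a standard non-commuting example):

```python
import numpy as np
from scipy.linalg import expm

# two non-commuting Hermitian 2x2 matrices (Pauli x and Pauli z)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Y = np.array([[1.0, 0.0], [0.0, -1.0]])

# the matrix exponential is not multiplicative for non-commuting arguments
noncommutative = not np.allclose(expm(X + Y), expm(X) @ expm(Y))
assert noncommutative

# Golden-Thompson: tr e^{X+Y} <= tr(e^X e^Y), the trace-level substitute
assert np.trace(expm(X + Y)) <= np.trace(expm(X) @ expm(Y)) + 1e-12
```

The exchangeable-pairs route sketched next avoids invoking Golden–Thompson or Lieb's theorem by working with the derivative m′(θ) directly.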
Exponential Tail Inequalities

Method of Exchangeable Pairs

Lemma
Suppose that (X, X′) is a matrix Stein pair with scale factor α. Let F: H^d → H^d be a measurable function satisfying E‖(X − X′)F(X)‖ < ∞. Then
    E[X F(X)] = (1/2α) · E[(X − X′)(F(X) − F(X′))].  (1)

Intuition
  Can characterize the distribution of a random matrix by integrating it against a class of test functions F
  Eq. (1) allows us to estimate this integral using the smoothness properties of F and the discrepancy X − X′
Exponential Tail Inequalities

Exponential Concentration: Proof Sketch

2. Method of Exchangeable Pairs
   Rewrite the derivative of the trace mgf:
       m′(θ) = E tr X e^{θX} = (1/2α) E tr[(X − X′)(e^{θX} − e^{θX′})]
   Goal: use the smoothness of F(X) = e^{θX} to bound the derivative
Exponential Tail Inequalities

Mean Value Trace Inequality

Lemma (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Suppose that g: ℝ → ℝ is a weakly increasing function and that h: ℝ → ℝ is a function whose derivative h′ is convex. For all matrices A, B ∈ H^d, it holds that
    tr[(g(A) − g(B)) · (h(A) − h(B))] ≤ ½ tr[(g(A) − g(B)) · (A − B) · (h′(A) + h′(B))].

Standard matrix functions: if g: ℝ → ℝ and A = Q diag(λ1, …, λd) Q*, then g(A) = Q diag(g(λ1), …, g(λd)) Q*
The inequality does not hold without the trace
For exponential concentration, we let g(A) = A and h(B) = e^{θB}
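The lemma is easy to stress-test numerically in the instance actually used in the proof, g(A) = A and h(A) = e^A (so h′ = exp, which is convex). A sketch over random Hermitian pairs:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(7)
d, trials = 4, 100
ok = True
for _ in range(trials):
    A, B = [(M + M.T) / 2 for M in rng.standard_normal((2, d, d))]
    # mean value trace inequality with g = id, h = exp:
    # tr[(A-B)(e^A - e^B)] <= (1/2) tr[(A-B)^2 (e^A + e^B)]
    lhs = np.trace((A - B) @ (expm(A) - expm(B)))
    rhs = 0.5 * np.trace((A - B) @ (A - B) @ (expm(A) + expm(B)))
    ok = ok and (lhs <= rhs + 1e-9)
assert ok
```

As the slide notes, the corresponding semidefinite inequality without the trace is false in general; only the traced version survives non-commutativity, which is exactly what the proof needs.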
Exponential Tail Inequalities

Exponential Concentration: Proof Sketch

3. Mean Value Trace Inequality
   Bound the derivative of the trace mgf:
       m′(θ) = (1/2α) E tr[(X − X′)(e^{θX} − e^{θX′})]
             ≤ (θ/4α) E tr[(X − X′)² · (e^{θX} + e^{θX′})]
             = (θ/2α) E tr[(X − X′)² · e^{θX}]
             = θ · E tr[(1/2α) E[(X − X′)² | Z] · e^{θX}]
             = θ · E tr[ΔX e^{θX}]
Exponential Tail Inequalities

Exponential Concentration: Proof Sketch

3. Mean Value Trace Inequality
   Bound the derivative of the trace mgf:
       m′(θ) ≤ θ · E tr[ΔX e^{θX}]

4. Conditional Variance Bound: ΔX ≼ cX + vI
   Yields the differential inequality
       m′(θ) ≤ cθ E tr[X e^{θX}] + vθ E tr[e^{θX}] = cθ · m′(θ) + vθ · m(θ)
   Solve to bound m(θ), and thereby bound
       P{λmax(X) ≥ t} ≤ inf_{θ>0} e^{−θt} · m(θ) ≤ d · exp{−t²/(2v + 2ct)}
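The "solve" step compresses a short calculation; one standard way to carry it out (a sketch, assuming the stated differential inequality, with m(0) = E tr I = d) is:

```latex
% For 0 < \theta < 1/c, rearrange m'(\theta) \le c\theta\, m'(\theta) + v\theta\, m(\theta):
\frac{d}{d\theta} \log m(\theta) \;=\; \frac{m'(\theta)}{m(\theta)} \;\le\; \frac{v\theta}{1 - c\theta}.
% Integrate from 0 to \theta, using m(0) = d and 1/(1-cs) \le 1/(1-c\theta) for s \le \theta:
\log m(\theta) \;\le\; \log d + \frac{v\theta^2}{2(1 - c\theta)}.
% Substitute into the Laplace transform bound and choose \theta = t/(v + ct),
% so that 1 - c\theta = v/(v + ct):
\mathbb{P}\{\lambda_{\max}(X) \ge t\}
  \;\le\; d \exp\Bigl\{-\theta t + \frac{v\theta^2}{2(1 - c\theta)}\Bigr\}
  \;=\; d \exp\Bigl\{-\frac{t^2}{2v + 2ct}\Bigr\}.
```

The chosen θ lies in (0, 1/c), as the rearrangement requires, and the final expression matches the stated tail bound.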
Exponential Tail Inequalities
Refined Exponential Concentration
Relaxing the constraint ∆X 4 cX + v
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let (XX prime) be a bounded matrix Stein pair with X isin Hd Definethe function
r(ψ) =1
ψlogE tr(eψ∆Xd) for each ψ gt 0
Then for all t ge 0 and all ψ gt 0
Pλmax(X) ge t le d middot exp minust22r(ψ) + 2t
radicψ
r(ψ) measures typical magnitude of conditional variance
Eλmax(∆X) le infψgt0
[
r(ψ) + log dψ
]
When d = 1 improves scalar result of Chatterjee (2008)Proof extends to unbounded random matricesMackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 23 35
Exponential Tail Inequalities
Matrix Bernstein Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let (Yk)kge1 be independent matrices in Hd satisfying
EYk = 0 and Yk le R for each index k
Define the variance parameter
σ2 =∥
∥
∥
sum
kEY 2
k
∥
∥
∥
Then for all t ge 0
P
λmax
(
sum
kYk
)
ge t
le d middot exp minust23σ2 + 2Rt
Gaussian tail controlled by improved variance when t is smallKey proof idea Apply refined concentration and boundr(ψ) = 1
ψlogE tr(eψ∆Xd) using unrefined concentration
Constants better than Oliveira (2009) worse than Tropp (2011)Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 24 35
Polynomial Moment Inequalities
Polynomial Moments for Random Matrices
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let p = 1 or p ge 15 Suppose that (XX prime) is a matrix Stein pairwhere E tr |X|2p ltinfin Then
(
E tr |X|2p)12p le
radic
2pminus 1 middot(
E tr∆pX
)12p
Moral The conditional variance controls the moments of X
Generalizes Chatterjeersquos version (2007) of the scalarBurkholder-Davis-Gundy inequality (Burkholder 1973)
See also Pisier amp Xu (1997) Junge amp Xu (2003 2008)
Proof techniques mirror those for exponential concentration
Also holds for infinite dimensional Schatten-class operators
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 25 35
Polynomial Moment Inequalities
Matrix Khintchine Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let (εk)kge1 be an independent sequence of Rademacher randomvariables and (Ak)kge1 be a deterministic sequence of Hermitianmatrices Then if p = 1 or p ge 15
E tr(
sum
kεkAk
)2p
le (2pminus 1)p middot tr(
sum
kA2k
)p
Noncommutative Khintchine inequality (Lust-Piquard 1986 Lust-Piquard
and Pisier 1991) is a dominant tool in applied matrix analysis
eg Used in analysis of column sampling and projection forapproximate SVD (Rudelson and Vershynin 2007)
Steinrsquos method offers an unusually concise proof
The constantradic2pminus 1 is within
radice of optimal
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 26 35
Dependent Sequences
Adding Dependence
1 MotivationMatrix CompletionMatrix Concentration
2 Steinrsquos Method Background and Notation
3 Exponential Tail Inequalities
4 Polynomial Moment Inequalities
5 Dependent SequencesSums of Conditionally Zero-mean MatricesCombinatorial Sums
6 Extensions
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 27 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Sum of Conditionally Zero-Mean Matrices)
Given a sequence of Hermitian matrices (Yk)nk=1 satisfying the
Conditional zero mean property E[Yk | (Yj)j 6=k] = 0
for all k define the random sum X =sumn
k=1Yk
Note (Yk)kge1 is a martingale difference sequence
Examples
Sums of independent centered random matricesMany sums of conditionally independent random matrices
Yk perpperp (Yj)j 6=k | Z and E[Yk |Z] = 0
Rademacher series with random matrix coefficients
X =sum
kεkWk
(Wk)kge1 Hermitian (εk)kge1 independent Rademacher
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 28 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Yk | (Yj)j 6=k] = 0
Matrix Stein Pair for X =sumn
k=1 Yk
Let Y primek and Yk be conditionally iid given (Yj)j 6=k
Draw index K uniformly from 1 nDefine X prime = X + Y prime
K minus YK
Check Stein pair condition
E[X minusX prime | (Yj)jge1] = E[YK minus Y primeK | (Yj)jge1]
=1
n
sumn
k=1
(
Yk minus E[Y primek | (Yj)j 6=k]
)
=1
n
sumn
k=1Yk =
1
nX
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 29 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Yk | (Yj)j 6=k] = 0
Conditional Variance for X = Y minus EY
∆X =n
2middot E
[
(X minusX prime)2 | (Yj)jge1
]
=n
2middot E
[
(YK minus Y primeK)
2 | (Yj)jge1
]
=1
2
sumn
k=1
(
Y 2k + E[Y 2
k | (Yj)j 6=k])
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 30 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Given a deterministic array (Ajk)njk=1 of Hermitian matrices and a
uniformly random permutation π on 1 n define thecombinatorial matrix statistic
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Generalizes the scalar statistics studied by Hoeffding (1951)
Example
Sampling without replacement from B1 BnW =
sums
j=1Bπ(j)
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 31 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Matrix Stein Pair for X = Y minus EY
Draw indices (JK) uniformly from 1 n2Define πprime = π (JK) and X prime =
sumnj=1Ajπprime(j) minus EY
Check Stein pair condition
E[X minusX prime | π] = E[
AJπ(J) +AKπ(K) minusAJπ(K) minusAKπ(J) | π]
=1
n2
sumn
jk=1Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
=2
n(Y minus EY ) =
2
nX
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 32 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Conditional Variance for X = Y minus EY
∆X(π) =n
4E[
(X minusX prime)2 | π]
=1
4n
sumn
jk=1
[
Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
]2
41
n
sumn
jk=1
[
A2jπ(j) +A2
kπ(k) +A2jπ(k) +A2
kπ(j)
]
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 33 35
Extensions
Extensions
General Complex Matrices
Map any matrix B isin Cd1timesd2 to a Hermitian matrix via dilation
D(B) =
[
0 B
Blowast0
]
isin Hd1+d2
Preserves spectral information λmax(D(B)) = B
Beyond Sums
Matrix-valued functions satisfying a self-reproducing property
eg Matrix second-order Rademacher chaossum
jk εjεkAjk
Yields a dependent bounded differences inequality for matrices
Generalized Matrix Stein Pairs
Satisfy E[g(X)minus g(X prime) |Z] = αX almost surely forg R rarr R weakly increasingMackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 34 35
Extensions
References IAhlswede R and Winter A Strong converse for identification via quantum channels IEEE Trans Inform Theory 48(3)
569ndash579 Mar 2002
Bernstein S The theory of probabilities Gastehizdat Publishing House 1946
Burkholder D L Distribution function inequalities for martingales Ann Probab 119ndash42 1973 doi101214aop1176997023
Candes E J and Recht B Exact matrix completion via convex optimization Found Comput Math 9717ndash772 2009
Candes E J and Tao T The power of convex relaxation Near-optimal matrix completion IEEE Trans Info Theory 2009URL arXiv09031476 To appear Available at arXiv09031476
Chatterjee S Steinrsquos method for concentration inequalities Probab Theory Related Fields 138305ndash321 2007
Chatterjee S Concentration inequalities with exchangeable pairs PhD thesis Stanford University Palo Alto Feb 2008 URLarxivmath0507526vl
Cheung S-S So A Man-Cho and Wang K Chance-constrained linear matrix inequalities with dependent perturbations Asafe tractable approximation approach Available athttpwwwoptimization-onlineorgDB_FILE2011012898pdf 2011
Christofides D and Markstrom K Expansion properties of random cayley graphs and vertex transitive graphs via matrixmartingales Random Struct Algorithms 32(1)88ndash100 2008
Drineas P Mahoney M W and Muthukrishnan S Relative-error CUR matrix decompositions SIAM Journal on Matrix
Analysis and Applications 30844ndash881 2008
Foygel R and Srebro N Concentration-based guarantees for low-rank matrix reconstruction Journal of Machine Learning
Research - Proceedings Track 19315ndash340 2011
Gross D Recovering low-rank matrices from few coefficients in any basis IEEE Trans Inform Theory 57(3)1548ndash1566 Mar2011
Hoeffding W A combinatorial central limit theorem Ann Math Statist 22558ndash566 1951
Hoeffding W Probability inequalities for sums of bounded random variables Journal of the American Statistical Association58(301)13ndash30 1963
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 35 35
Extensions
References II
Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matrices Available atarXiv11041672 2011a
Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matricesarXiv11041672v3[mathPR] 2011b
Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities Ann Probab 31(2)948ndash995 2003
Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities II Applications Israel J Math 167227ndash282 2008
Lust-Piquard F Inegalites de Khintchine dans Cp (1 lt p lt infin) C R Math Acad Sci Paris 303(7)289ndash292 1986
Lust-Piquard F and Pisier G Noncommutative Khintchine and Paley inequalities Ark Mat 29(2)241ndash260 1991
Mackey L Talwalkar A and Jordan M I Divide-and-conquer matrix factorization In Shawe-Taylor J Zemel R SBartlett P L Pereira F C N and Weinberger K Q (eds) Advances in Neural Information Processing Systems 24 pp1134ndash1142 2011
Mackey L Jordan M I Chen R Y Farrell B and Tropp J A Matrix concentration inequalities via the method ofexchangeable pairs URL httparxivorgabs12016002 2012
Negahban S and Wainwright M J Restricted strong convexity and weighted matrix completion Optimal bounds with noisearXiv10092118v2[csIT] 2010
Nemirovski A Sums of random symmetric matrices and quadratic optimization under orthogonality constraints Math
Program 109283ndash317 January 2007 ISSN 0025-5610 doi 101007s10107-006-0033-0 URLhttpdlacmorgcitationcfmid=12297161229726
Oliveira R I Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges Availableat arXiv09110600 Nov 2009
Pisier G and Xu Q Non-commutative martingale inequalities Comm Math Phys 189(3)667ndash698 1997
Recht B Simpler approach to matrix completion J Mach Learn Res 123413ndash3430 2011
Rudelson M and Vershynin R Sampling from large matrices An approach through geometric functional analysis J Assoc
Comput Mach 54(4)Article 21 19 pp Jul 2007 (electronic)
So A Man-Cho Moment inequalities for sums of random matrices and their applications in optimization Math Program 130(1)125ndash151 2011
Stein C A bound for the error in the normal approximation to the distribution of a sum of dependent random variables InProc 6th Berkeley Symp Math Statist Probab Berkeley 1972 Univ California Press
Tropp J A User-friendly tail bounds for sums of random matrices Found Comput Math August 2011
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 36 35
- Motivation
-
- Matrix Completion
- Matrix Concentration
-
- Extensions
-
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
1 Matrix Laplace transform method (Ahlswede amp Winter 2002)
Relate tail probability to the trace of the mgf of X
Pλmax(X) ge t le infθgt0
eminusθt middotm(θ)
where m(θ) = E tr eθX
Problem eX+Y 6= eXeY when XY isin Hd
How to bound the trace mgf
Past approaches Golden-Thompson Liebrsquos concavity theorem
Chatterjeersquos strategy for scalar concentration
Control mgf growth by bounding derivative
mprime(θ) = E trXeθX for θ isin R
Rewrite using exchangeable pairs
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 17 35
Exponential Tail Inequalities
Method of Exchangeable Pairs
Lemma
Suppose that (XX prime) is a matrix Stein pair with scale factor α LetF Hd rarr Hd be a measurable function satisfying
E(X minusX prime)F (X) ltinfin
Then
E[X F (X)] =1
2αE[(X minusX prime)(F (X)minus F (X prime))] (1)
Intuition
Can characterize the distribution of a random matrix byintegrating it against a class of test functions F
Eq 1 allows us to estimate this integral using the smoothnessproperties of F and the discrepancy X minusX prime
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 18 35
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
2 Method of Exchangeable Pairs
Rewrite the derivative of the trace mgf
mprime(θ) = E trXeθX =1
2αE tr
[
(X minusX prime)(
eθX minus eθXprime)]
Goal Use the smoothness of F (X) = eθX to bound the derivative
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 19 35
Exponential Tail Inequalities
Mean Value Trace Inequality
Lemma (Mackey Jordan Chen Farrell and Tropp 2012)
Suppose that g R rarr R is a weakly increasing function and thath R rarr R is a function whose derivative hprime is convex For allmatrices AB isin Hd it holds that
tr[(g(A)minus g(B)) middot (h(A)minus h(B))] le1
2tr[(g(A)minus g(B)) middot (AminusB) middot (hprime(A) + hprime(B))]
Standard matrix functions If g R rarr R and
A = Q
λ1
λd
Qlowast then g(A) = Q
g(λ1)
g(λd)
Qlowast
Inequality does not hold without the traceFor exponential concentration we let g(A) = A and h(B) = eθB
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 20 35
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
3 Mean Value Trace Inequality
Bound the derivative of the trace mgf
mprime(θ) =1
2αE tr
[
(X minusX prime)(
eθX minus eθXprime)]
le θ
4αE tr
[
(X minusX prime)2 middot(
eθX + eθXprime)]
=θ
2αE tr
[
(X minusX prime)2 middot eθX]
= θ middot E tr
[
1
2αE[
(X minusX prime)2 |Z]
middot eθX]
= θ middot E tr[
∆X eθX]
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 21 35
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
3 Mean Value Trace Inequality
Bound the derivative of the trace mgf
mprime(θ) le θ middot E tr[
∆X eθX]
4 Conditional Variance Bound ∆X 4 cX + v I
Yields differential inequality
mprime(θ) le cθE tr[
X eθX]
+ vθE tr[
eθX]
= cθ middotmprime(θ) + vθ middotm(θ)
Solve to bound m(θ) and thereby bound
Pλmax(X) ge t le infθgt0
eminusθt middotm(θ) le d middot exp minust22v + 2ct
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 22 35
Exponential Tail Inequalities
Refined Exponential Concentration
Relaxing the constraint ∆X 4 cX + v
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let (XX prime) be a bounded matrix Stein pair with X isin Hd Definethe function
r(ψ) =1
ψlogE tr(eψ∆Xd) for each ψ gt 0
Then for all t ge 0 and all ψ gt 0
Pλmax(X) ge t le d middot exp minust22r(ψ) + 2t
radicψ
r(ψ) measures typical magnitude of conditional variance
Eλmax(∆X) le infψgt0
[
r(ψ) + log dψ
]
When d = 1 improves scalar result of Chatterjee (2008)Proof extends to unbounded random matricesMackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 23 35
Exponential Tail Inequalities
Matrix Bernstein Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let (Yk)kge1 be independent matrices in Hd satisfying
EYk = 0 and Yk le R for each index k
Define the variance parameter
σ2 =∥
∥
∥
sum
kEY 2
k
∥
∥
∥
Then for all t ge 0
P
λmax
(
sum
kYk
)
ge t
le d middot exp minust23σ2 + 2Rt
Gaussian tail controlled by improved variance when t is smallKey proof idea Apply refined concentration and boundr(ψ) = 1
ψlogE tr(eψ∆Xd) using unrefined concentration
Constants better than Oliveira (2009) worse than Tropp (2011)Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 24 35
Polynomial Moment Inequalities
Polynomial Moments for Random Matrices
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let p = 1 or p ge 15 Suppose that (XX prime) is a matrix Stein pairwhere E tr |X|2p ltinfin Then
(
E tr |X|2p)12p le
radic
2pminus 1 middot(
E tr∆pX
)12p
Moral The conditional variance controls the moments of X
Generalizes Chatterjeersquos version (2007) of the scalarBurkholder-Davis-Gundy inequality (Burkholder 1973)
See also Pisier amp Xu (1997) Junge amp Xu (2003 2008)
Proof techniques mirror those for exponential concentration
Also holds for infinite dimensional Schatten-class operators
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 25 35
Polynomial Moment Inequalities
Matrix Khintchine Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let (εk)kge1 be an independent sequence of Rademacher randomvariables and (Ak)kge1 be a deterministic sequence of Hermitianmatrices Then if p = 1 or p ge 15
E tr(
sum
kεkAk
)2p
le (2pminus 1)p middot tr(
sum
kA2k
)p
Noncommutative Khintchine inequality (Lust-Piquard 1986 Lust-Piquard
and Pisier 1991) is a dominant tool in applied matrix analysis
eg Used in analysis of column sampling and projection forapproximate SVD (Rudelson and Vershynin 2007)
Steinrsquos method offers an unusually concise proof
The constantradic2pminus 1 is within
radice of optimal
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 26 35
Dependent Sequences

Adding Dependence

1. Motivation: Matrix Completion; Matrix Concentration
2. Stein's Method Background and Notation
3. Exponential Tail Inequalities
4. Polynomial Moment Inequalities
5. Dependent Sequences: Sums of Conditionally Zero-mean Matrices; Combinatorial Sums
6. Extensions
Dependent Sequences | Sums of Conditionally Zero-mean Matrices

Sums of Conditionally Zero-mean Matrices

Definition (Sum of Conditionally Zero-Mean Matrices)
Given a sequence of Hermitian matrices (Y_k)_{k=1}^n satisfying the conditional zero-mean property

    E[Y_k | (Y_j)_{j≠k}] = 0   for all k,

define the random sum X = Σ_{k=1}^n Y_k.

Note: (Y_k)_{k≥1} is a martingale difference sequence.

Examples:
- Sums of independent, centered random matrices
- Many sums of conditionally independent random matrices:
    Y_k ⊥⊥ (Y_j)_{j≠k} | Z  and  E[Y_k | Z] = 0
- Rademacher series with random matrix coefficients:
    X = Σ_k ε_k W_k,  (W_k)_{k≥1} Hermitian, (ε_k)_{k≥1} independent Rademacher
Dependent Sequences | Sums of Conditionally Zero-mean Matrices

Sums of Conditionally Zero-mean Matrices

Definition (Conditional Zero-Mean Property)
    E[Y_k | (Y_j)_{j≠k}] = 0

Matrix Stein Pair for X = Σ_{k=1}^n Y_k:
- Let Y′_k and Y_k be conditionally i.i.d. given (Y_j)_{j≠k}.
- Draw index K uniformly from {1, ..., n}.
- Define X′ = X + Y′_K − Y_K.

Check Stein pair condition:

    E[X − X′ | (Y_j)_{j≥1}] = E[Y_K − Y′_K | (Y_j)_{j≥1}]
                            = (1/n) Σ_{k=1}^n (Y_k − E[Y′_k | (Y_j)_{j≠k}])
                            = (1/n) Σ_{k=1}^n Y_k = (1/n) X
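The Stein pair condition can be checked exactly on a toy instance. In the sketch below (illustrative, not from the deck) the summands are a Rademacher series Y_k = ε_k A_k, so the fresh copy Y′_K is just an independent sign flip, and the conditional average over (K, Y′_K) is a finite sum:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 5, 3
A = [(M + M.T) / 2 for M in rng.standard_normal((n, d, d))]
eps = rng.choice([-1.0, 1.0], size=n)
Y = [e * Ak for e, Ak in zip(eps, A)]
X = sum(Y)

# E[X - X' | Y]: average Y_K - Y'_K over K uniform in {0, ..., n-1}
# and Y'_K = +/- A_K with equal probability
cond = sum((Y[k] - s * A[k]) / (2 * n) for k in range(n) for s in (-1.0, 1.0))

print(np.allclose(cond, X / n))  # True: matches the slide's (1/n) X
```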
Dependent Sequences | Sums of Conditionally Zero-mean Matrices

Sums of Conditionally Zero-mean Matrices

Definition (Conditional Zero-Mean Property)
    E[Y_k | (Y_j)_{j≠k}] = 0

Conditional Variance for X = Σ_{k=1}^n Y_k:

    Δ_X = (n/2) · E[(X − X′)² | (Y_j)_{j≥1}]
        = (n/2) · E[(Y_K − Y′_K)² | (Y_j)_{j≥1}]
        = (1/2) Σ_{k=1}^n (Y_k² + E[Y_k² | (Y_j)_{j≠k}])

⇒ Conditional variance controlled when summands are bounded
⇒ Dependent analogues of concentration and moment inequalities
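The conditional variance identity can likewise be verified exactly for a Rademacher series Y_k = ε_k A_k, where E[Y_k² | rest] = A_k² (a hand-picked toy setup, for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 4, 3
A = [(M + M.T) / 2 for M in rng.standard_normal((n, d, d))]
eps = rng.choice([-1.0, 1.0], size=n)
Y = [e * Ak for e, Ak in zip(eps, A)]

# (n/2) E[(Y_K - Y'_K)^2 | Y]: average over K uniform, Y'_K = +/- A_K
lhs = (n / 2) * sum(
    (Y[k] - s * A[k]) @ (Y[k] - s * A[k]) / (2 * n)
    for k in range(n) for s in (-1.0, 1.0)
)

# (1/2) sum_k (Y_k^2 + E[Y_k^2 | rest]), with E[Y_k^2 | rest] = A_k^2 here
rhs = 0.5 * sum(Yk @ Yk + Ak @ Ak for Yk, Ak in zip(Y, A))

print(np.allclose(lhs, rhs))  # True
```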
Dependent Sequences | Combinatorial Sums

Combinatorial Sums of Matrices

Definition (Combinatorial Matrix Statistic)
Given a deterministic array (A_{jk})_{j,k=1}^n of Hermitian matrices and a uniformly random permutation π on {1, ..., n}, define the combinatorial matrix statistic

    Y = Σ_{j=1}^n A_{jπ(j)}   with mean   EY = (1/n) Σ_{j,k=1}^n A_{jk}.

Generalizes the scalar statistics studied by Hoeffding (1951).

Example:
- Sampling without replacement from {B_1, ..., B_n}: W = Σ_{j=1}^s B_{π(j)}
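For small n the mean formula can be confirmed exactly by enumerating every permutation (a minimal sketch with made-up symmetric entries):

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
n, d = 4, 2
A = rng.standard_normal((n, n, d, d))
A = (A + np.swapaxes(A, -1, -2)) / 2  # make each A[j, k] symmetric

# E Y over all n! permutations; should equal (1/n) sum_{j,k} A_{jk}
perms = list(itertools.permutations(range(n)))
EY = sum(sum(A[j, pi[j]] for j in range(n)) for pi in perms) / len(perms)

print(np.allclose(EY, A.sum(axis=(0, 1)) / n))  # True
```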
Dependent Sequences | Combinatorial Sums

Combinatorial Sums of Matrices

Definition (Combinatorial Matrix Statistic)
    Y = Σ_{j=1}^n A_{jπ(j)}   with mean   EY = (1/n) Σ_{j,k=1}^n A_{jk}

Matrix Stein Pair for X = Y − EY:
- Draw indices (J, K) uniformly from {1, ..., n}².
- Define π′ = π ∘ (J K) and X′ = Σ_{j=1}^n A_{jπ′(j)} − EY.

Check Stein pair condition:

    E[X − X′ | π] = E[A_{Jπ(J)} + A_{Kπ(K)} − A_{Jπ(K)} − A_{Kπ(J)} | π]
                  = (1/n²) Σ_{j,k=1}^n (A_{jπ(j)} + A_{kπ(k)} − A_{jπ(k)} − A_{kπ(j)})
                  = (2/n)(Y − EY) = (2/n) X
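Averaging over all n² transposition choices (J, K) is a finite computation, so the condition E[X − X′ | π] = (2/n) X can be checked exactly for one fixed permutation (toy data, for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 4, 2
A = rng.standard_normal((n, n, d, d))
A = (A + np.swapaxes(A, -1, -2)) / 2
EY = A.sum(axis=(0, 1)) / n

pi = [2, 0, 3, 1]                       # one fixed permutation
Y = sum(A[j, pi[j]] for j in range(n))
X = Y - EY

# Average X - X' over all n^2 pairs (J, K); pi' = pi composed with (J K)
avg = np.zeros((d, d))
for J in range(n):
    for K in range(n):
        pi2 = list(pi)
        pi2[J], pi2[K] = pi2[K], pi2[J]
        Y2 = sum(A[j, pi2[j]] for j in range(n))
        avg += (Y - Y2) / n**2

print(np.allclose(avg, (2 / n) * X))  # True
```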
Dependent Sequences | Combinatorial Sums

Combinatorial Sums of Matrices

Definition (Combinatorial Matrix Statistic)
    Y = Σ_{j=1}^n A_{jπ(j)}   with mean   EY = (1/n) Σ_{j,k=1}^n A_{jk}

Conditional Variance for X = Y − EY:

    Δ_X(π) = (n/4) · E[(X − X′)² | π]
           = (1/(4n)) Σ_{j,k=1}^n [A_{jπ(j)} + A_{kπ(k)} − A_{jπ(k)} − A_{kπ(j)}]²
           ⪯ (1/n) Σ_{j,k=1}^n [A²_{jπ(j)} + A²_{kπ(k)} + A²_{jπ(k)} + A²_{kπ(j)}]

⇒ Conditional variance controlled when summands are bounded
⇒ Dependent analogues of concentration and moment inequalities
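The first equality above can be confirmed exactly by recomputing X − X′ for each transposition (a sketch with small made-up matrices):

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 4, 2
A = rng.standard_normal((n, n, d, d))
A = (A + np.swapaxes(A, -1, -2)) / 2

pi = [1, 3, 0, 2]
Y = sum(A[j, pi[j]] for j in range(n))

# (n/4) E[(X - X')^2 | pi], averaging over all n^2 swaps (J, K)
lhs = np.zeros((d, d))
for J in range(n):
    for K in range(n):
        pi2 = list(pi)
        pi2[J], pi2[K] = pi2[K], pi2[J]
        D = Y - sum(A[j, pi2[j]] for j in range(n))  # X - X' (EY cancels)
        lhs += (n / 4) * (D @ D) / n**2

# (1/(4n)) sum_{j,k} [A_{j pi(j)} + A_{k pi(k)} - A_{j pi(k)} - A_{k pi(j)}]^2
rhs = np.zeros((d, d))
for j in range(n):
    for k in range(n):
        D = A[j, pi[j]] + A[k, pi[k]] - A[j, pi[k]] - A[k, pi[j]]
        rhs += (D @ D) / (4 * n)

print(np.allclose(lhs, rhs))  # True
```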
Extensions

Extensions

General Complex Matrices
- Map any matrix B ∈ C^{d1×d2} to a Hermitian matrix via dilation:

      D(B) = [ 0   B ]
             [ B*  0 ]  ∈ H^{d1+d2}

- Preserves spectral information: λmax(D(B)) = ‖B‖.

Beyond Sums
- Matrix-valued functions satisfying a self-reproducing property.
- E.g., matrix second-order Rademacher chaos Σ_{j,k} ε_j ε_k A_{jk}.
- Yields a dependent bounded differences inequality for matrices.

Generalized Matrix Stein Pairs
- Satisfy E[g(X) − g(X′) | Z] = αX almost surely, for g: R → R weakly increasing.
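The dilation and its spectral identity take two lines in numpy (illustrative sketch; the rectangular B is random made-up data):

```python
import numpy as np

rng = np.random.default_rng(6)
d1, d2 = 3, 5
B = rng.standard_normal((d1, d2))

# Hermitian dilation D(B) in H^{d1+d2}
D = np.block([[np.zeros((d1, d1)), B],
              [B.T, np.zeros((d2, d2))]])

# lambda_max(D(B)) equals the spectral norm ||B||
lam_max = np.linalg.eigvalsh(D).max()
print(np.isclose(lam_max, np.linalg.norm(B, 2)))  # True
```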
Extensions

References I
Ahlswede, R. and Winter, A. Strong converse for identification via quantum channels. IEEE Trans. Inform. Theory, 48(3):569–579, Mar. 2002.
Bernstein, S. The Theory of Probabilities. Gastehizdat Publishing House, 1946.
Burkholder, D. L. Distribution function inequalities for martingales. Ann. Probab., 1:19–42, 1973. doi: 10.1214/aop/1176997023.
Candès, E. J. and Recht, B. Exact matrix completion via convex optimization. Found. Comput. Math., 9:717–772, 2009.
Candès, E. J. and Tao, T. The power of convex relaxation: Near-optimal matrix completion. IEEE Trans. Info. Theory, 2009. To appear. Available at arXiv:0903.1476.
Chatterjee, S. Stein's method for concentration inequalities. Probab. Theory Related Fields, 138:305–321, 2007.
Chatterjee, S. Concentration inequalities with exchangeable pairs. PhD thesis, Stanford University, Palo Alto, Feb. 2008. URL arXiv:math/0507526v1.
Cheung, S.-S., So, A. Man-Cho, and Wang, K. Chance-constrained linear matrix inequalities with dependent perturbations: A safe tractable approximation approach. Available at http://www.optimization-online.org/DB_FILE/2011/01/2898.pdf, 2011.
Christofides, D. and Markström, K. Expansion properties of random Cayley graphs and vertex transitive graphs via matrix martingales. Random Struct. Algorithms, 32(1):88–100, 2008.
Drineas, P., Mahoney, M. W., and Muthukrishnan, S. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30:844–881, 2008.
Foygel, R. and Srebro, N. Concentration-based guarantees for low-rank matrix reconstruction. Journal of Machine Learning Research - Proceedings Track, 19:315–340, 2011.
Gross, D. Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory, 57(3):1548–1566, Mar. 2011.
Hoeffding, W. A combinatorial central limit theorem. Ann. Math. Statist., 22:558–566, 1951.
Hoeffding, W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.
Extensions

References II
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. Available at arXiv:1104.1672, 2011a.
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. arXiv:1104.1672v3 [math.PR], 2011b.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities. Ann. Probab., 31(2):948–995, 2003.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities II: Applications. Israel J. Math., 167:227–282, 2008.
Lust-Piquard, F. Inégalités de Khintchine dans C_p (1 < p < ∞). C. R. Math. Acad. Sci. Paris, 303(7):289–292, 1986.
Lust-Piquard, F. and Pisier, G. Noncommutative Khintchine and Paley inequalities. Ark. Mat., 29(2):241–260, 1991.
Mackey, L., Talwalkar, A., and Jordan, M. I. Divide-and-conquer matrix factorization. In Shawe-Taylor, J., Zemel, R. S., Bartlett, P. L., Pereira, F. C. N., and Weinberger, K. Q. (eds.), Advances in Neural Information Processing Systems 24, pp. 1134–1142, 2011.
Mackey, L., Jordan, M. I., Chen, R. Y., Farrell, B., and Tropp, J. A. Matrix concentration inequalities via the method of exchangeable pairs. URL http://arxiv.org/abs/1201.6002, 2012.
Negahban, S. and Wainwright, M. J. Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. arXiv:1009.2118v2 [cs.IT], 2010.
Nemirovski, A. Sums of random symmetric matrices and quadratic optimization under orthogonality constraints. Math. Program., 109:283–317, January 2007. ISSN 0025-5610. doi: 10.1007/s10107-006-0033-0. URL http://dl.acm.org/citation.cfm?id=1229716.1229726.
Oliveira, R. I. Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. Available at arXiv:0911.0600, Nov. 2009.
Pisier, G. and Xu, Q. Non-commutative martingale inequalities. Comm. Math. Phys., 189(3):667–698, 1997.
Recht, B. A simpler approach to matrix completion. J. Mach. Learn. Res., 12:3413–3430, 2011.
Rudelson, M. and Vershynin, R. Sampling from large matrices: An approach through geometric functional analysis. J. Assoc. Comput. Mach., 54(4): Article 21, 19 pp., Jul. 2007 (electronic).
So, A. Man-Cho. Moment inequalities for sums of random matrices and their applications in optimization. Math. Program., 130(1):125–151, 2011.
Stein, C. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. 6th Berkeley Symp. Math. Statist. Probab., Berkeley, 1972. Univ. California Press.
Tropp, J. A. User-friendly tail bounds for sums of random matrices. Found. Comput. Math., August 2011.
Exponential Tail Inequalities

Method of Exchangeable Pairs

Lemma
Suppose that (X, X′) is a matrix Stein pair with scale factor α. Let F: H^d → H^d be a measurable function satisfying E‖(X − X′)F(X)‖ < ∞. Then

    E[X F(X)] = (1/(2α)) · E[(X − X′)(F(X) − F(X′))].   (1)

Intuition:
- Can characterize the distribution of a random matrix by integrating it against a class of test functions F.
- Eq. (1) allows us to estimate this integral using the smoothness properties of F and the discrepancy X − X′.
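In the scalar case d = 1 the identity (1) can be checked exactly by enumeration. The sketch below (illustrative) uses the Stein pair built from X = Σ_k ε_k by redrawing one uniformly chosen coordinate, which has scale factor α = 1/n, with the test function F(x) = x³:

```python
import itertools

n = 5
alpha = 1.0 / n          # for this pair, E[X - X' | eps] = X / n
F = lambda x: x ** 3     # a scalar test function

states = list(itertools.product([-1, 1], repeat=n))

# Left side: E[X F(X)], exact over all 2^n sign vectors
lhs = sum(sum(e) * F(sum(e)) for e in states) / len(states)

# Right side: (1/(2 alpha)) E[(X - X')(F(X) - F(X'))], where X' redraws
# one uniformly chosen coordinate of eps
acc = 0.0
for eps in states:
    X = sum(eps)
    for K in range(n):
        for new in (-1, 1):
            Xp = X - eps[K] + new
            acc += (X - Xp) * (F(X) - F(Xp))
rhs = acc / (len(states) * n * 2) / (2 * alpha)

print(abs(lhs - rhs) < 1e-9)  # True: both sides agree exactly
```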
Exponential Tail Inequalities

Exponential Concentration: Proof Sketch

2. Method of Exchangeable Pairs
Rewrite the derivative of the trace mgf:

    m′(θ) = E tr[X e^{θX}] = (1/(2α)) · E tr[(X − X′)(e^{θX} − e^{θX′})]

Goal: Use the smoothness of F(X) = e^{θX} to bound the derivative.
Exponential Tail Inequalities

Mean Value Trace Inequality

Lemma (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Suppose that g: R → R is a weakly increasing function and that h: R → R is a function whose derivative h′ is convex. For all matrices A, B ∈ H^d, it holds that

    tr[(g(A) − g(B)) · (h(A) − h(B))] ≤ (1/2) tr[(g(A) − g(B)) · (A − B) · (h′(A) + h′(B))].

Standard matrix functions: if g: R → R and A = Q diag(λ₁, ..., λ_d) Q*, then g(A) = Q diag(g(λ₁), ..., g(λ_d)) Q*.

- The inequality does not hold without the trace.
- For exponential concentration, we let g(A) = A and h(B) = e^{θB}.
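A numeric spot-check of the lemma for the choice used here, g(x) = x and h(x) = e^{θx} (so h′ is convex), applying the standard matrix function via an eigendecomposition; the random symmetric A, B are illustrative:

```python
import numpy as np

def apply_fn(M, f):
    """Standard matrix function: apply f to the eigenvalues of Hermitian M."""
    w, Q = np.linalg.eigh(M)
    return (Q * f(w)) @ Q.T  # Q diag(f(w)) Q^T

rng = np.random.default_rng(7)
d, theta = 4, 0.7
A = rng.standard_normal((d, d)); A = (A + A.T) / 2
B = rng.standard_normal((d, d)); B = (B + B.T) / 2

h = lambda x: np.exp(theta * x)
hp = lambda x: theta * np.exp(theta * x)  # h' (convex for theta > 0)

# g(A) - g(B) = A - B since g is the identity
lhs = np.trace((A - B) @ (apply_fn(A, h) - apply_fn(B, h)))
rhs = 0.5 * np.trace((A - B) @ (A - B) @ (apply_fn(A, hp) + apply_fn(B, hp)))

print(lhs <= rhs + 1e-9)  # True, as the lemma guarantees
```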
Exponential Tail Inequalities

Exponential Concentration: Proof Sketch

3. Mean Value Trace Inequality
Bound the derivative of the trace mgf:

    m′(θ) = (1/(2α)) E tr[(X − X′)(e^{θX} − e^{θX′})]
          ≤ (θ/(4α)) E tr[(X − X′)² · (e^{θX} + e^{θX′})]
          = (θ/(2α)) E tr[(X − X′)² · e^{θX}]
          = θ · E tr[(1/(2α)) E[(X − X′)² | Z] · e^{θX}]
          = θ · E tr[Δ_X e^{θX}]
Exponential Tail Inequalities

Exponential Concentration: Proof Sketch

3. Mean Value Trace Inequality
Bound the derivative of the trace mgf:

    m′(θ) ≤ θ · E tr[Δ_X e^{θX}]

4. Conditional Variance Bound: Δ_X ⪯ cX + vI
Yields the differential inequality

    m′(θ) ≤ cθ E tr[X e^{θX}] + vθ E tr[e^{θX}] = cθ · m′(θ) + vθ · m(θ).

Solve to bound m(θ) and thereby bound

    P{λmax(X) ≥ t} ≤ inf_{θ>0} e^{−θt} · m(θ) ≤ d · exp{−t²/(2v + 2ct)}.
Exponential Tail Inequalities

Refined Exponential Concentration

Relaxing the constraint Δ_X ⪯ cX + vI:

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let (X, X′) be a bounded matrix Stein pair with X ∈ H^d. Define the function

    r(ψ) = (1/ψ) log E tr(e^{ψΔ_X}/d)   for each ψ > 0.

Then, for all t ≥ 0 and all ψ > 0,

    P{λmax(X) ≥ t} ≤ d · exp{−t²/(2r(ψ) + 2t/√ψ)}.

- r(ψ) measures the typical magnitude of the conditional variance:
    E λmax(Δ_X) ≤ inf_{ψ>0} [r(ψ) + log(d)/ψ].
- When d = 1, improves the scalar result of Chatterjee (2008).
- Proof extends to unbounded random matrices.
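For a Rademacher series X = Σ_k ε_k A_k the conditional variance Δ_X = Σ_k A_k² is deterministic, so r(ψ) is a plain trace computation, and the relation λmax(Δ_X) ≤ r(ψ) + log(d)/ψ can be checked directly (illustrative sketch with made-up coefficients):

```python
import numpy as np

rng = np.random.default_rng(8)
n, d = 5, 3
A = [(M + M.T) / 2 for M in rng.standard_normal((n, d, d))]

# Deterministic conditional variance for a Rademacher series
Delta = sum(Ak @ Ak for Ak in A)
w = np.linalg.eigvalsh(Delta)

def r(psi):
    # r(psi) = (1/psi) log tr(exp(psi * Delta) / d), via eigenvalues
    return np.log(np.exp(psi * w).sum() / d) / psi

# lambda_max(Delta) <= r(psi) + log(d)/psi for every psi > 0
ok = all(w.max() <= r(psi) + np.log(d) / psi + 1e-9
         for psi in (0.1, 0.5, 1.0, 2.0))
print(ok)  # True
```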
Exponential Tail Inequalities

Matrix Bernstein Inequality

Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let (Y_k)_{k≥1} be independent matrices in H^d satisfying

    E Y_k = 0  and  ‖Y_k‖ ≤ R  for each index k.

Define the variance parameter

    σ² = ‖Σ_k E Y_k²‖.

Then, for all t ≥ 0,

    P{λmax(Σ_k Y_k) ≥ t} ≤ d · exp{−t²/(3σ² + 2Rt)}.
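A quick Monte Carlo spot-check of the corollary on a toy ensemble (illustrative; Y_k = ε_k A for a fixed Hermitian A with spectrum {±R}, so λmax(Σ_k Y_k) is computable in closed form):

```python
import numpy as np

rng = np.random.default_rng(9)
d, n, trials = 2, 100, 20000

A = np.array([[1.0, 0.5], [0.5, -1.0]])   # fixed Hermitian, spectrum {+R, -R}
R = np.abs(np.linalg.eigvalsh(A)).max()
sigma2 = np.linalg.norm(n * (A @ A), 2)   # sigma^2 = || sum_k E Y_k^2 ||

t = 3.0 * np.sqrt(sigma2)
bound = d * np.exp(-t**2 / (3 * sigma2 + 2 * R * t))

# Y_k = eps_k * A, so lambda_max(sum_k Y_k) = |sum_k eps_k| * R here
eps = rng.choice([-1.0, 1.0], size=(trials, n))
emp = (np.abs(eps.sum(axis=1)) * R >= t).mean()

print(emp <= bound)  # True: the empirical tail sits well below the bound
```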
Polynomial Moment Inequalities
Polynomial Moments for Random Matrices
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let p = 1 or p ge 15 Suppose that (XX prime) is a matrix Stein pairwhere E tr |X|2p ltinfin Then
(
E tr |X|2p)12p le
radic
2pminus 1 middot(
E tr∆pX
)12p
Moral The conditional variance controls the moments of X
Generalizes Chatterjeersquos version (2007) of the scalarBurkholder-Davis-Gundy inequality (Burkholder 1973)
See also Pisier amp Xu (1997) Junge amp Xu (2003 2008)
Proof techniques mirror those for exponential concentration
Also holds for infinite dimensional Schatten-class operators
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 25 35
Polynomial Moment Inequalities
Matrix Khintchine Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let (εk)kge1 be an independent sequence of Rademacher randomvariables and (Ak)kge1 be a deterministic sequence of Hermitianmatrices Then if p = 1 or p ge 15
E tr(
sum
kεkAk
)2p
le (2pminus 1)p middot tr(
sum
kA2k
)p
Noncommutative Khintchine inequality (Lust-Piquard 1986 Lust-Piquard
and Pisier 1991) is a dominant tool in applied matrix analysis
eg Used in analysis of column sampling and projection forapproximate SVD (Rudelson and Vershynin 2007)
Steinrsquos method offers an unusually concise proof
The constantradic2pminus 1 is within
radice of optimal
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 26 35
Dependent Sequences
Adding Dependence
1 MotivationMatrix CompletionMatrix Concentration
2 Steinrsquos Method Background and Notation
3 Exponential Tail Inequalities
4 Polynomial Moment Inequalities
5 Dependent SequencesSums of Conditionally Zero-mean MatricesCombinatorial Sums
6 Extensions
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 27 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Sum of Conditionally Zero-Mean Matrices)
Given a sequence of Hermitian matrices (Yk)nk=1 satisfying the
Conditional zero mean property E[Yk | (Yj)j 6=k] = 0
for all k define the random sum X =sumn
k=1Yk
Note (Yk)kge1 is a martingale difference sequence
Examples
Sums of independent centered random matricesMany sums of conditionally independent random matrices
Yk perpperp (Yj)j 6=k | Z and E[Yk |Z] = 0
Rademacher series with random matrix coefficients
X =sum
kεkWk
(Wk)kge1 Hermitian (εk)kge1 independent Rademacher
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 28 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Yk | (Yj)j 6=k] = 0
Matrix Stein Pair for X =sumn
k=1 Yk
Let Y primek and Yk be conditionally iid given (Yj)j 6=k
Draw index K uniformly from 1 nDefine X prime = X + Y prime
K minus YK
Check Stein pair condition
E[X minusX prime | (Yj)jge1] = E[YK minus Y primeK | (Yj)jge1]
=1
n
sumn
k=1
(
Yk minus E[Y primek | (Yj)j 6=k]
)
=1
n
sumn
k=1Yk =
1
nX
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 29 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Yk | (Yj)j 6=k] = 0
Conditional Variance for X = Y minus EY
∆X =n
2middot E
[
(X minusX prime)2 | (Yj)jge1
]
=n
2middot E
[
(YK minus Y primeK)
2 | (Yj)jge1
]
=1
2
sumn
k=1
(
Y 2k + E[Y 2
k | (Yj)j 6=k])
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 30 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Given a deterministic array (Ajk)njk=1 of Hermitian matrices and a
uniformly random permutation π on 1 n define thecombinatorial matrix statistic
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Generalizes the scalar statistics studied by Hoeffding (1951)
Example
Sampling without replacement from B1 BnW =
sums
j=1Bπ(j)
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 31 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Matrix Stein Pair for X = Y minus EY
Draw indices (JK) uniformly from 1 n2Define πprime = π (JK) and X prime =
sumnj=1Ajπprime(j) minus EY
Check Stein pair condition
E[X minusX prime | π] = E[
AJπ(J) +AKπ(K) minusAJπ(K) minusAKπ(J) | π]
=1
n2
sumn
jk=1Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
=2
n(Y minus EY ) =
2
nX
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 32 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Conditional Variance for X = Y minus EY
∆X(π) =n
4E[
(X minusX prime)2 | π]
=1
4n
sumn
jk=1
[
Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
]2
41
n
sumn
jk=1
[
A2jπ(j) +A2
kπ(k) +A2jπ(k) +A2
kπ(j)
]
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 33 35
Extensions
Extensions
General Complex Matrices
Map any matrix B isin Cd1timesd2 to a Hermitian matrix via dilation
D(B) =
[
0 B
Blowast0
]
isin Hd1+d2
Preserves spectral information λmax(D(B)) = B
Beyond Sums
Matrix-valued functions satisfying a self-reproducing property
eg Matrix second-order Rademacher chaossum
jk εjεkAjk
Yields a dependent bounded differences inequality for matrices
Generalized Matrix Stein Pairs
Satisfy E[g(X)minus g(X prime) |Z] = αX almost surely forg R rarr R weakly increasingMackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 34 35
Extensions
References IAhlswede R and Winter A Strong converse for identification via quantum channels IEEE Trans Inform Theory 48(3)
569ndash579 Mar 2002
Bernstein S The theory of probabilities Gastehizdat Publishing House 1946
Burkholder D L Distribution function inequalities for martingales Ann Probab 119ndash42 1973 doi101214aop1176997023
Candes E J and Recht B Exact matrix completion via convex optimization Found Comput Math 9717ndash772 2009
Candes E J and Tao T The power of convex relaxation Near-optimal matrix completion IEEE Trans Info Theory 2009URL arXiv09031476 To appear Available at arXiv09031476
Chatterjee S Steinrsquos method for concentration inequalities Probab Theory Related Fields 138305ndash321 2007
Chatterjee S Concentration inequalities with exchangeable pairs PhD thesis Stanford University Palo Alto Feb 2008 URLarxivmath0507526vl
Cheung S-S So A Man-Cho and Wang K Chance-constrained linear matrix inequalities with dependent perturbations Asafe tractable approximation approach Available athttpwwwoptimization-onlineorgDB_FILE2011012898pdf 2011
Christofides D and Markstrom K Expansion properties of random cayley graphs and vertex transitive graphs via matrixmartingales Random Struct Algorithms 32(1)88ndash100 2008
Drineas P Mahoney M W and Muthukrishnan S Relative-error CUR matrix decompositions SIAM Journal on Matrix
Analysis and Applications 30844ndash881 2008
Foygel R and Srebro N Concentration-based guarantees for low-rank matrix reconstruction Journal of Machine Learning
Research - Proceedings Track 19315ndash340 2011
Gross D Recovering low-rank matrices from few coefficients in any basis IEEE Trans Inform Theory 57(3)1548ndash1566 Mar2011
Hoeffding W A combinatorial central limit theorem Ann Math Statist 22558ndash566 1951
Hoeffding W Probability inequalities for sums of bounded random variables Journal of the American Statistical Association58(301)13ndash30 1963
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 35 35
Extensions
References II
Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matrices Available atarXiv11041672 2011a
Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matricesarXiv11041672v3[mathPR] 2011b
Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities Ann Probab 31(2)948ndash995 2003
Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities II Applications Israel J Math 167227ndash282 2008
Lust-Piquard F Inegalites de Khintchine dans Cp (1 lt p lt infin) C R Math Acad Sci Paris 303(7)289ndash292 1986
Lust-Piquard F and Pisier G Noncommutative Khintchine and Paley inequalities Ark Mat 29(2)241ndash260 1991
Mackey L Talwalkar A and Jordan M I Divide-and-conquer matrix factorization In Shawe-Taylor J Zemel R SBartlett P L Pereira F C N and Weinberger K Q (eds) Advances in Neural Information Processing Systems 24 pp1134ndash1142 2011
Mackey L Jordan M I Chen R Y Farrell B and Tropp J A Matrix concentration inequalities via the method ofexchangeable pairs URL httparxivorgabs12016002 2012
Negahban S and Wainwright M J Restricted strong convexity and weighted matrix completion Optimal bounds with noisearXiv10092118v2[csIT] 2010
Nemirovski A Sums of random symmetric matrices and quadratic optimization under orthogonality constraints Math
Program 109283ndash317 January 2007 ISSN 0025-5610 doi 101007s10107-006-0033-0 URLhttpdlacmorgcitationcfmid=12297161229726
Oliveira R I Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges Availableat arXiv09110600 Nov 2009
Pisier G and Xu Q Non-commutative martingale inequalities Comm Math Phys 189(3)667ndash698 1997
Recht B Simpler approach to matrix completion J Mach Learn Res 123413ndash3430 2011
Rudelson M and Vershynin R Sampling from large matrices An approach through geometric functional analysis J Assoc
Comput Mach 54(4)Article 21 19 pp Jul 2007 (electronic)
So A Man-Cho Moment inequalities for sums of random matrices and their applications in optimization Math Program 130(1)125ndash151 2011
Stein C A bound for the error in the normal approximation to the distribution of a sum of dependent random variables InProc 6th Berkeley Symp Math Statist Probab Berkeley 1972 Univ California Press
Tropp J A User-friendly tail bounds for sums of random matrices Found Comput Math August 2011
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 36 35
- Motivation
-
- Matrix Completion
- Matrix Concentration
-
- Extensions
-
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
2 Method of Exchangeable Pairs
Rewrite the derivative of the trace mgf
mprime(θ) = E trXeθX =1
2αE tr
[
(X minusX prime)(
eθX minus eθXprime)]
Goal Use the smoothness of F (X) = eθX to bound the derivative
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 19 35
Exponential Tail Inequalities
Mean Value Trace Inequality
Lemma (Mackey Jordan Chen Farrell and Tropp 2012)
Suppose that g R rarr R is a weakly increasing function and thath R rarr R is a function whose derivative hprime is convex For allmatrices AB isin Hd it holds that
tr[(g(A)minus g(B)) middot (h(A)minus h(B))] le1
2tr[(g(A)minus g(B)) middot (AminusB) middot (hprime(A) + hprime(B))]
Standard matrix functions If g R rarr R and
A = Q
λ1
λd
Qlowast then g(A) = Q
g(λ1)
g(λd)
Qlowast
Inequality does not hold without the traceFor exponential concentration we let g(A) = A and h(B) = eθB
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 20 35
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
3 Mean Value Trace Inequality
Bound the derivative of the trace mgf
mprime(θ) =1
2αE tr
[
(X minusX prime)(
eθX minus eθXprime)]
le θ
4αE tr
[
(X minusX prime)2 middot(
eθX + eθXprime)]
=θ
2αE tr
[
(X minusX prime)2 middot eθX]
= θ middot E tr
[
1
2αE[
(X minusX prime)2 |Z]
middot eθX]
= θ middot E tr[
∆X eθX]
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 21 35
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
3 Mean Value Trace Inequality
Bound the derivative of the trace mgf
mprime(θ) le θ middot E tr[
∆X eθX]
4 Conditional Variance Bound ∆X 4 cX + v I
Yields differential inequality
mprime(θ) le cθE tr[
X eθX]
+ vθE tr[
eθX]
= cθ middotmprime(θ) + vθ middotm(θ)
Solve to bound m(θ) and thereby bound
Pλmax(X) ge t le infθgt0
eminusθt middotm(θ) le d middot exp minust22v + 2ct
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 22 35
Exponential Tail Inequalities
Refined Exponential Concentration
Relaxing the constraint ∆X 4 cX + v
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let (XX prime) be a bounded matrix Stein pair with X isin Hd Definethe function
r(ψ) =1
ψlogE tr(eψ∆Xd) for each ψ gt 0
Then for all t ge 0 and all ψ gt 0
Pλmax(X) ge t le d middot exp minust22r(ψ) + 2t
radicψ
r(ψ) measures typical magnitude of conditional variance
Eλmax(∆X) le infψgt0
[
r(ψ) + log dψ
]
When d = 1 improves scalar result of Chatterjee (2008)Proof extends to unbounded random matricesMackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 23 35
Exponential Tail Inequalities
Matrix Bernstein Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let (Yk)kge1 be independent matrices in Hd satisfying
EYk = 0 and Yk le R for each index k
Define the variance parameter
σ2 =∥
∥
∥
sum
kEY 2
k
∥
∥
∥
Then for all t ge 0
P
λmax
(
sum
kYk
)
ge t
le d middot exp minust23σ2 + 2Rt
Gaussian tail controlled by improved variance when t is smallKey proof idea Apply refined concentration and boundr(ψ) = 1
ψlogE tr(eψ∆Xd) using unrefined concentration
Constants better than Oliveira (2009) worse than Tropp (2011)Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 24 35
Polynomial Moment Inequalities
Polynomial Moments for Random Matrices
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let p = 1 or p ge 15 Suppose that (XX prime) is a matrix Stein pairwhere E tr |X|2p ltinfin Then
(
E tr |X|2p)12p le
radic
2pminus 1 middot(
E tr∆pX
)12p
Moral The conditional variance controls the moments of X
Generalizes Chatterjeersquos version (2007) of the scalarBurkholder-Davis-Gundy inequality (Burkholder 1973)
See also Pisier amp Xu (1997) Junge amp Xu (2003 2008)
Proof techniques mirror those for exponential concentration
Also holds for infinite dimensional Schatten-class operators
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 25 35
Polynomial Moment Inequalities
Matrix Khintchine Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let (εk)kge1 be an independent sequence of Rademacher randomvariables and (Ak)kge1 be a deterministic sequence of Hermitianmatrices Then if p = 1 or p ge 15
E tr(
sum
kεkAk
)2p
le (2pminus 1)p middot tr(
sum
kA2k
)p
Noncommutative Khintchine inequality (Lust-Piquard 1986 Lust-Piquard
and Pisier 1991) is a dominant tool in applied matrix analysis
eg Used in analysis of column sampling and projection forapproximate SVD (Rudelson and Vershynin 2007)
Steinrsquos method offers an unusually concise proof
The constantradic2pminus 1 is within
radice of optimal
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 26 35
Dependent Sequences
Adding Dependence
1 MotivationMatrix CompletionMatrix Concentration
2 Steinrsquos Method Background and Notation
3 Exponential Tail Inequalities
4 Polynomial Moment Inequalities
5 Dependent SequencesSums of Conditionally Zero-mean MatricesCombinatorial Sums
6 Extensions
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 27 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Sum of Conditionally Zero-Mean Matrices)
Given a sequence of Hermitian matrices (Yk)nk=1 satisfying the
Conditional zero mean property E[Yk | (Yj)j 6=k] = 0
for all k define the random sum X =sumn
k=1Yk
Note (Yk)kge1 is a martingale difference sequence
Examples
Sums of independent centered random matricesMany sums of conditionally independent random matrices
Yk perpperp (Yj)j 6=k | Z and E[Yk |Z] = 0
Rademacher series with random matrix coefficients
X =sum
kεkWk
(Wk)kge1 Hermitian (εk)kge1 independent Rademacher
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 28 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Yk | (Yj)j 6=k] = 0
Matrix Stein Pair for X =sumn
k=1 Yk
Let Y primek and Yk be conditionally iid given (Yj)j 6=k
Draw index K uniformly from 1 nDefine X prime = X + Y prime
K minus YK
Check Stein pair condition
E[X minusX prime | (Yj)jge1] = E[YK minus Y primeK | (Yj)jge1]
=1
n
sumn
k=1
(
Yk minus E[Y primek | (Yj)j 6=k]
)
=1
n
sumn
k=1Yk =
1
nX
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 29 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Yk | (Yj)j 6=k] = 0
Conditional Variance for X = Y minus EY
∆X =n
2middot E
[
(X minusX prime)2 | (Yj)jge1
]
=n
2middot E
[
(YK minus Y primeK)
2 | (Yj)jge1
]
=1
2
sumn
k=1
(
Y 2k + E[Y 2
k | (Yj)j 6=k])
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 30 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Given a deterministic array (Ajk)njk=1 of Hermitian matrices and a
uniformly random permutation π on 1 n define thecombinatorial matrix statistic
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Generalizes the scalar statistics studied by Hoeffding (1951)
Example
Sampling without replacement from B1 BnW =
sums
j=1Bπ(j)
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 31 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Matrix Stein Pair for X = Y minus EY
Draw indices (JK) uniformly from 1 n2Define πprime = π (JK) and X prime =
sumnj=1Ajπprime(j) minus EY
Check Stein pair condition
E[X minusX prime | π] = E[
AJπ(J) +AKπ(K) minusAJπ(K) minusAKπ(J) | π]
=1
n2
sumn
jk=1Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
=2
n(Y minus EY ) =
2
nX
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 32 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Conditional Variance for X = Y minus EY
∆X(π) =n
4E[
(X minusX prime)2 | π]
=1
4n
sumn
jk=1
[
Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
]2
41
n
sumn
jk=1
[
A2jπ(j) +A2
kπ(k) +A2jπ(k) +A2
kπ(j)
]
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 33 35
Extensions

General complex matrices
Map any matrix $B \in \mathbb{C}^{d_1 \times d_2}$ to a Hermitian matrix via the Hermitian dilation
\[ \mathcal{D}(B) = \begin{bmatrix} 0 & B \\ B^* & 0 \end{bmatrix} \in \mathbb{H}^{d_1 + d_2}. \]
The dilation preserves spectral information: $\lambda_{\max}(\mathcal{D}(B)) = \|B\|$.

Beyond sums
Matrix-valued functions satisfying a self-reproducing property, e.g., the matrix second-order Rademacher chaos $\sum_{j,k} \varepsilon_j \varepsilon_k A_{jk}$. This yields a dependent bounded-differences inequality for matrices.

Generalized matrix Stein pairs
Satisfy $\mathbb{E}[g(X) - g(X') \mid Z] = \alpha X$ almost surely for $g: \mathbb{R} \to \mathbb{R}$ weakly increasing.
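A short sketch of the dilation trick on a hypothetical random $B$: build $\mathcal{D}(B)$ and confirm that it is Hermitian and that its largest eigenvalue equals the spectral norm of $B$ (the eigenvalues of the dilation are $\pm$ the singular values of $B$, padded with zeros):

```python
import numpy as np

rng = np.random.default_rng(2)
d1, d2 = 3, 4
B = rng.standard_normal((d1, d2)) + 1j * rng.standard_normal((d1, d2))

# Hermitian dilation D(B) = [[0, B], [B*, 0]]
D = np.block([[np.zeros((d1, d1)), B],
              [B.conj().T, np.zeros((d2, d2))]])

assert np.allclose(D, D.conj().T)  # D(B) is Hermitian
# lambda_max(D(B)) equals the spectral norm ||B||
lam_max = np.linalg.eigvalsh(D).max()
assert np.isclose(lam_max, np.linalg.norm(B, 2))
```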
References I

Ahlswede, R. and Winter, A. Strong converse for identification via quantum channels. IEEE Trans. Inform. Theory, 48(3):569–579, Mar. 2002.
Bernstein, S. The Theory of Probabilities. Gastehizdat Publishing House, 1946.
Burkholder, D. L. Distribution function inequalities for martingales. Ann. Probab., 1:19–42, 1973. doi:10.1214/aop/1176997023.
Candes, E. J. and Recht, B. Exact matrix completion via convex optimization. Found. Comput. Math., 9:717–772, 2009.
Candes, E. J. and Tao, T. The power of convex relaxation: Near-optimal matrix completion. IEEE Trans. Inform. Theory, 2009. To appear. Available at arXiv:0903.1476.
Chatterjee, S. Stein's method for concentration inequalities. Probab. Theory Related Fields, 138:305–321, 2007.
Chatterjee, S. Concentration inequalities with exchangeable pairs. PhD thesis, Stanford University, Palo Alto, Feb. 2008. URL arXiv:math/0507526v1.
Cheung, S.-S., So, A. Man-Cho, and Wang, K. Chance-constrained linear matrix inequalities with dependent perturbations: A safe tractable approximation approach. Available at http://www.optimization-online.org/DB_FILE/2011/01/2898.pdf, 2011.
Christofides, D. and Markström, K. Expansion properties of random Cayley graphs and vertex transitive graphs via matrix martingales. Random Struct. Algorithms, 32(1):88–100, 2008.
Drineas, P., Mahoney, M. W., and Muthukrishnan, S. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30:844–881, 2008.
Foygel, R. and Srebro, N. Concentration-based guarantees for low-rank matrix reconstruction. Journal of Machine Learning Research - Proceedings Track, 19:315–340, 2011.
Gross, D. Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory, 57(3):1548–1566, Mar. 2011.
Hoeffding, W. A combinatorial central limit theorem. Ann. Math. Statist., 22:558–566, 1951.
Hoeffding, W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.
References II

Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. Available at arXiv:1104.1672, 2011a.
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. arXiv:1104.1672v3 [math.PR], 2011b.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities. Ann. Probab., 31(2):948–995, 2003.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities II: Applications. Israel J. Math., 167:227–282, 2008.
Lust-Piquard, F. Inégalités de Khintchine dans C_p (1 < p < ∞). C. R. Math. Acad. Sci. Paris, 303(7):289–292, 1986.
Lust-Piquard, F. and Pisier, G. Noncommutative Khintchine and Paley inequalities. Ark. Mat., 29(2):241–260, 1991.
Mackey, L., Talwalkar, A., and Jordan, M. I. Divide-and-conquer matrix factorization. In Shawe-Taylor, J., Zemel, R. S., Bartlett, P. L., Pereira, F. C. N., and Weinberger, K. Q. (eds.), Advances in Neural Information Processing Systems 24, pp. 1134–1142, 2011.
Mackey, L., Jordan, M. I., Chen, R. Y., Farrell, B., and Tropp, J. A. Matrix concentration inequalities via the method of exchangeable pairs. URL http://arxiv.org/abs/1201.6002, 2012.
Negahban, S. and Wainwright, M. J. Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. arXiv:1009.2118v2 [cs.IT], 2010.
Nemirovski, A. Sums of random symmetric matrices and quadratic optimization under orthogonality constraints. Math. Program., 109:283–317, January 2007. ISSN 0025-5610. doi:10.1007/s10107-006-0033-0. URL http://dl.acm.org/citation.cfm?id=1229716.1229726.
Oliveira, R. I. Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. Available at arXiv:0911.0600, Nov. 2009.
Pisier, G. and Xu, Q. Non-commutative martingale inequalities. Comm. Math. Phys., 189(3):667–698, 1997.
Recht, B. A simpler approach to matrix completion. J. Mach. Learn. Res., 12:3413–3430, 2011.
Rudelson, M. and Vershynin, R. Sampling from large matrices: An approach through geometric functional analysis. J. Assoc. Comput. Mach., 54(4):Article 21, 19 pp., Jul. 2007.
So, A. Man-Cho. Moment inequalities for sums of random matrices and their applications in optimization. Math. Program., 130(1):125–151, 2011.
Stein, C. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. 6th Berkeley Symp. Math. Statist. Probab., Berkeley, 1972. Univ. California Press.
Tropp, J. A. User-friendly tail bounds for sums of random matrices. Found. Comput. Math., August 2011.
Exponential Tail Inequalities

Mean Value Trace Inequality

Lemma (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Suppose that $g: \mathbb{R} \to \mathbb{R}$ is a weakly increasing function and that $h: \mathbb{R} \to \mathbb{R}$ is a function whose derivative $h'$ is convex. For all matrices $A, B \in \mathbb{H}^d$, it holds that
\[ \operatorname{tr}\big[ (g(A) - g(B)) \cdot (h(A) - h(B)) \big] \le \frac{1}{2} \operatorname{tr}\big[ (g(A) - g(B)) \cdot (A - B) \cdot (h'(A) + h'(B)) \big]. \]

Standard matrix functions: if $g: \mathbb{R} \to \mathbb{R}$ and $A = Q \operatorname{diag}(\lambda_1, \dots, \lambda_d)\, Q^*$, then $g(A) = Q \operatorname{diag}(g(\lambda_1), \dots, g(\lambda_d))\, Q^*$.

The inequality does not hold without the trace. For exponential concentration we take $g(A) = A$ and $h(B) = e^{\theta B}$.
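The lemma is stated for abstract $g$ and $h$; the instance used later is $g(A) = A$, $h(B) = e^{\theta B}$. A numerical spot-check of that instance on hypothetical random symmetric matrices, applying scalar functions spectrally via the eigendecomposition:

```python
import numpy as np

def apply_spectral(A, f):
    """Apply a scalar function f to a Hermitian matrix via its eigendecomposition."""
    w, Q = np.linalg.eigh(A)
    return (Q * f(w)) @ Q.T  # Q diag(f(w)) Q^T

rng = np.random.default_rng(3)
d, theta = 4, 0.7
A = rng.standard_normal((d, d))
A = (A + A.T) / 2
B = rng.standard_normal((d, d))
B = (B + B.T) / 2

g = lambda x: x                      # weakly increasing
h = lambda x: np.exp(theta * x)      # h' is convex for theta > 0
hp = lambda x: theta * np.exp(theta * x)

gA, gB = apply_spectral(A, g), apply_spectral(B, g)
hA, hB = apply_spectral(A, h), apply_spectral(B, h)
lhs = np.trace((gA - gB) @ (hA - hB))
rhs = 0.5 * np.trace((gA - gB) @ (A - B) @ (apply_spectral(A, hp) + apply_spectral(B, hp)))
assert lhs <= rhs + 1e-9  # mean value trace inequality
```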
Exponential Concentration: Proof Sketch

3. Mean Value Trace Inequality
Bound the derivative of the trace mgf $m(\theta) = \mathbb{E} \operatorname{tr}\, e^{\theta X}$:
\[
m'(\theta) = \frac{1}{2\alpha} \mathbb{E} \operatorname{tr}\big[ (X - X') \big( e^{\theta X} - e^{\theta X'} \big) \big]
\le \frac{\theta}{4\alpha} \mathbb{E} \operatorname{tr}\big[ (X - X')^2 \cdot \big( e^{\theta X} + e^{\theta X'} \big) \big]
= \frac{\theta}{2\alpha} \mathbb{E} \operatorname{tr}\big[ (X - X')^2 \cdot e^{\theta X} \big]
= \theta \cdot \mathbb{E} \operatorname{tr}\Big[ \frac{1}{2\alpha} \mathbb{E}\big[ (X - X')^2 \mid Z \big] \cdot e^{\theta X} \Big]
= \theta \cdot \mathbb{E} \operatorname{tr}\big[ \Delta_X\, e^{\theta X} \big],
\]
where the second equality uses the exchangeability of $(X, X')$ and the third conditions on $Z$.
4. Conditional Variance Bound: $\Delta_X \preccurlyeq c X + v I$
This yields the differential inequality
\[ m'(\theta) \le c\theta\, \mathbb{E} \operatorname{tr}\big[ X e^{\theta X} \big] + v\theta\, \mathbb{E} \operatorname{tr}\big[ e^{\theta X} \big] = c\theta \cdot m'(\theta) + v\theta \cdot m(\theta). \]
Solve to bound $m(\theta)$, and thereby bound
\[ \mathbb{P}\{\lambda_{\max}(X) \ge t\} \le \inf_{\theta > 0} e^{-\theta t} \cdot m(\theta) \le d \cdot \exp\Big\{ \frac{-t^2}{2v + 2ct} \Big\}. \]
Refined Exponential Concentration

Relaxing the constraint $\Delta_X \preccurlyeq c X + v I$:

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let $(X, X')$ be a bounded matrix Stein pair with $X \in \mathbb{H}^d$. Define the function
\[ r(\psi) = \frac{1}{\psi} \log \mathbb{E} \operatorname{tr}\big( e^{\psi \Delta_X} / d \big) \quad \text{for each } \psi > 0. \]
Then, for all $t \ge 0$ and all $\psi > 0$,
\[ \mathbb{P}\{\lambda_{\max}(X) \ge t\} \le d \cdot \exp\Big\{ \frac{-t^2}{2 r(\psi) + 2t/\sqrt{\psi}} \Big\}. \]

$r(\psi)$ measures the typical magnitude of the conditional variance:
\[ \mathbb{E}\, \lambda_{\max}(\Delta_X) \le \inf_{\psi > 0} \Big[ r(\psi) + \frac{\log d}{\psi} \Big]. \]

When $d = 1$, this improves a scalar result of Chatterjee (2008). The proof extends to unbounded random matrices.
Matrix Bernstein Inequality

Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let $(Y_k)_{k \ge 1}$ be independent matrices in $\mathbb{H}^d$ satisfying
\[ \mathbb{E} Y_k = 0 \quad \text{and} \quad \|Y_k\| \le R \quad \text{for each index } k. \]
Define the variance parameter $\sigma^2 = \big\| \sum_k \mathbb{E} Y_k^2 \big\|$. Then, for all $t \ge 0$,
\[ \mathbb{P}\Big\{ \lambda_{\max}\Big( \sum_k Y_k \Big) \ge t \Big\} \le d \cdot \exp\Big\{ \frac{-t^2}{3\sigma^2 + 2Rt} \Big\}. \]

The Gaussian tail is controlled by the variance $\sigma^2$ when $t$ is small. Key proof idea: apply the refined concentration result and bound $r(\psi) = \frac{1}{\psi} \log \mathbb{E} \operatorname{tr}(e^{\psi \Delta_X} / d)$ using the unrefined result. The constants are better than Oliveira (2009) and worse than Tropp (2011).
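A Monte Carlo sketch of the corollary for Rademacher-weighted summands $Y_k = \varepsilon_k A_k$ (so $\mathbb{E} Y_k^2 = A_k^2$); the matrices are hypothetical test data. It computes $R$ and $\sigma^2$, evaluates the bound at $t = 4\sigma$, and compares against the empirical tail:

```python
import numpy as np

rng = np.random.default_rng(4)
d, n_mat, trials = 4, 30, 1000

# Hypothetical summands Y_k = eps_k * A_k: independent, centered, bounded
As = []
for _ in range(n_mat):
    G = rng.standard_normal((d, d))
    As.append((G + G.T) / 2)
R = max(np.linalg.norm(A, 2) for A in As)           # uniform bound ||Y_k|| <= R
sigma2 = np.linalg.norm(sum(A @ A for A in As), 2)  # sigma^2 = ||sum_k E Y_k^2||

t = 4.0 * np.sqrt(sigma2)
bound = d * np.exp(-t**2 / (3 * sigma2 + 2 * R * t))

hits = 0
for _ in range(trials):
    eps = rng.choice([-1.0, 1.0], size=n_mat)
    S = sum(e * A for e, A in zip(eps, As))
    hits += np.linalg.eigvalsh(S).max() >= t
assert hits / trials <= bound  # empirical tail respects the Bernstein bound
```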
Polynomial Moment Inequalities

Polynomial Moments for Random Matrices

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let $p = 1$ or $p \ge 1.5$. Suppose that $(X, X')$ is a matrix Stein pair where $\mathbb{E} \operatorname{tr} |X|^{2p} < \infty$. Then
\[ \big( \mathbb{E} \operatorname{tr} |X|^{2p} \big)^{1/2p} \le \sqrt{2p - 1} \cdot \big( \mathbb{E} \operatorname{tr} \Delta_X^p \big)^{1/2p}. \]

Moral: the conditional variance controls the moments of $X$.
Generalizes Chatterjee's version (2007) of the scalar Burkholder-Davis-Gundy inequality (Burkholder, 1973); see also Pisier & Xu (1997) and Junge & Xu (2003, 2008). The proof techniques mirror those for exponential concentration, and the result also holds for infinite-dimensional Schatten-class operators.
Matrix Khintchine Inequality

Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let $(\varepsilon_k)_{k \ge 1}$ be an independent sequence of Rademacher random variables and $(A_k)_{k \ge 1}$ a deterministic sequence of Hermitian matrices. Then, if $p = 1$ or $p \ge 1.5$,
\[ \mathbb{E} \operatorname{tr}\Big( \sum_k \varepsilon_k A_k \Big)^{2p} \le (2p - 1)^p \cdot \operatorname{tr}\Big( \sum_k A_k^2 \Big)^p. \]

The noncommutative Khintchine inequality (Lust-Piquard, 1986; Lust-Piquard and Pisier, 1991) is a dominant tool in applied matrix analysis, e.g., in the analysis of column sampling and projection for approximate SVD (Rudelson and Vershynin, 2007). Stein's method offers an unusually concise proof. The constant $\sqrt{2p - 1}$ is within $\sqrt{e}$ of optimal.
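Because the Rademacher vector takes only $2^n$ values, the matrix Khintchine inequality can be verified exactly for a small instance by enumerating every sign pattern (hypothetical matrices, $p = 2$):

```python
import itertools
import numpy as np

rng = np.random.default_rng(5)
d, n_mat, p = 3, 4, 2

# Hypothetical deterministic Hermitian coefficients A_k
As = []
for _ in range(n_mat):
    G = rng.standard_normal((d, d))
    As.append((G + G.T) / 2)

# Exact expectation over all 2^n sign patterns
lhs = 0.0
for eps in itertools.product([-1.0, 1.0], repeat=n_mat):
    S = sum(e * A for e, A in zip(eps, As))
    lhs += np.trace(np.linalg.matrix_power(S, 2 * p))
lhs /= 2 ** n_mat

# Right-hand side: (2p - 1)^p * tr (sum_k A_k^2)^p
rhs = (2 * p - 1) ** p * np.trace(
    np.linalg.matrix_power(sum(A @ A for A in As), p))
assert lhs <= rhs + 1e-8
```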
Dependent Sequences

Adding Dependence

1. Motivation: Matrix Completion; Matrix Concentration
2. Stein's Method: Background and Notation
3. Exponential Tail Inequalities
4. Polynomial Moment Inequalities
5. Dependent Sequences: Sums of Conditionally Zero-mean Matrices; Combinatorial Sums
6. Extensions
Dependent Sequences: Sums of Conditionally Zero-mean Matrices

Sums of Conditionally Zero-mean Matrices

Definition (Sum of Conditionally Zero-mean Matrices)
Given a sequence of Hermitian matrices $(Y_k)_{k=1}^n$ satisfying the conditional zero-mean property
\[ \mathbb{E}[Y_k \mid (Y_j)_{j \neq k}] = 0 \quad \text{for all } k, \]
define the random sum $X = \sum_{k=1}^n Y_k$.

Note: $(Y_k)_{k \ge 1}$ is a martingale difference sequence.

Examples
- Sums of independent centered random matrices.
- Many sums of conditionally independent random matrices: $Y_k \perp (Y_j)_{j \neq k} \mid Z$ and $\mathbb{E}[Y_k \mid Z] = 0$.
- Rademacher series with random matrix coefficients: $X = \sum_k \varepsilon_k W_k$, with $(W_k)_{k \ge 1}$ Hermitian and $(\varepsilon_k)_{k \ge 1}$ independent Rademacher.
Matrix Stein pair for $X = \sum_{k=1}^n Y_k$:

Let $Y_k'$ and $Y_k$ be conditionally i.i.d. given $(Y_j)_{j \neq k}$. Draw an index $K$ uniformly from $\{1, \dots, n\}$ and define $X' = X + Y_K' - Y_K$.

Check the Stein pair condition:
\[
\mathbb{E}[X - X' \mid (Y_j)_{j \ge 1}] = \mathbb{E}[Y_K - Y_K' \mid (Y_j)_{j \ge 1}]
= \frac{1}{n} \sum_{k=1}^n \big( Y_k - \mathbb{E}[Y_k' \mid (Y_j)_{j \neq k}] \big)
= \frac{1}{n} \sum_{k=1}^n Y_k = \frac{1}{n} X.
\]
Conditional variance for $X = \sum_{k=1}^n Y_k$:
\[
\Delta_X = \frac{n}{2} \cdot \mathbb{E}\big[ (X - X')^2 \mid (Y_j)_{j \ge 1} \big]
= \frac{n}{2} \cdot \mathbb{E}\big[ (Y_K - Y_K')^2 \mid (Y_j)_{j \ge 1} \big]
= \frac{1}{2} \sum_{k=1}^n \big( Y_k^2 + \mathbb{E}[Y_k^2 \mid (Y_j)_{j \neq k}] \big).
\]

⇒ The conditional variance is controlled when the summands are bounded.
⇒ Dependent analogues of the concentration and moment inequalities follow.
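For the Rademacher series $X = \sum_k \varepsilon_k W_k$ with deterministic Hermitian $W_k$, the conditional variance collapses to the deterministic matrix $\sum_k W_k^2$, since $Y_k^2 = W_k^2$ regardless of the sign. A sketch that evaluates the definition of $\Delta_X$ directly, averaging over the uniform index $K$ and a fresh sign (the $W_k$ are hypothetical test data):

```python
import numpy as np

rng = np.random.default_rng(6)
d, n = 3, 5
Ws = []
for _ in range(n):
    G = rng.standard_normal((d, d))
    Ws.append((G + G.T) / 2)  # hypothetical Hermitian coefficients
eps = rng.choice([-1.0, 1.0], size=n)
X = sum(e * W for e, W in zip(eps, Ws))

# Delta_X = (n/2) E[(X - X')^2 | signs], where X' = X + (eps'_K - eps_K) W_K,
# K uniform on {1, ..., n} and eps'_K a fresh Rademacher sign
Delta = np.zeros((d, d))
for K in range(n):
    for new in (-1.0, 1.0):
        Xp = X + (new - eps[K]) * Ws[K]
        Delta += (n / 2) * ((X - Xp) @ (X - Xp)) / (2 * n)

# For a Rademacher series the conditional variance is sum_k W_k^2
assert np.allclose(Delta, sum(W @ W for W in Ws))
```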
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Given a deterministic array (Ajk)njk=1 of Hermitian matrices and a
uniformly random permutation π on 1 n define thecombinatorial matrix statistic
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Generalizes the scalar statistics studied by Hoeffding (1951)
Example
Sampling without replacement from B1 BnW =
sums
j=1Bπ(j)
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 31 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Matrix Stein Pair for X = Y minus EY
Draw indices (JK) uniformly from 1 n2Define πprime = π (JK) and X prime =
sumnj=1Ajπprime(j) minus EY
Check Stein pair condition
E[X minusX prime | π] = E[
AJπ(J) +AKπ(K) minusAJπ(K) minusAKπ(J) | π]
=1
n2
sumn
jk=1Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
=2
n(Y minus EY ) =
2
nX
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 32 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Conditional Variance for X = Y minus EY
∆X(π) =n
4E[
(X minusX prime)2 | π]
=1
4n
sumn
jk=1
[
Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
]2
41
n
sumn
jk=1
[
A2jπ(j) +A2
kπ(k) +A2jπ(k) +A2
kπ(j)
]
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 33 35
Extensions
Extensions
General Complex Matrices
Map any matrix B isin Cd1timesd2 to a Hermitian matrix via dilation
D(B) =
[
0 B
Blowast0
]
isin Hd1+d2
Preserves spectral information λmax(D(B)) = B
Beyond Sums
Matrix-valued functions satisfying a self-reproducing property
eg Matrix second-order Rademacher chaossum
jk εjεkAjk
Yields a dependent bounded differences inequality for matrices
Generalized Matrix Stein Pairs
Satisfy E[g(X)minus g(X prime) |Z] = αX almost surely forg R rarr R weakly increasingMackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 34 35
Extensions
References IAhlswede R and Winter A Strong converse for identification via quantum channels IEEE Trans Inform Theory 48(3)
569ndash579 Mar 2002
Bernstein S The theory of probabilities Gastehizdat Publishing House 1946
Burkholder D L Distribution function inequalities for martingales Ann Probab 119ndash42 1973 doi101214aop1176997023
Candes E J and Recht B Exact matrix completion via convex optimization Found Comput Math 9717ndash772 2009
Candes E J and Tao T The power of convex relaxation Near-optimal matrix completion IEEE Trans Info Theory 2009URL arXiv09031476 To appear Available at arXiv09031476
Chatterjee S Steinrsquos method for concentration inequalities Probab Theory Related Fields 138305ndash321 2007
Chatterjee S Concentration inequalities with exchangeable pairs PhD thesis Stanford University Palo Alto Feb 2008 URLarxivmath0507526vl
Cheung S-S So A Man-Cho and Wang K Chance-constrained linear matrix inequalities with dependent perturbations Asafe tractable approximation approach Available athttpwwwoptimization-onlineorgDB_FILE2011012898pdf 2011
Christofides D and Markstrom K Expansion properties of random cayley graphs and vertex transitive graphs via matrixmartingales Random Struct Algorithms 32(1)88ndash100 2008
Drineas P Mahoney M W and Muthukrishnan S Relative-error CUR matrix decompositions SIAM Journal on Matrix
Analysis and Applications 30844ndash881 2008
Foygel R and Srebro N Concentration-based guarantees for low-rank matrix reconstruction Journal of Machine Learning
Research - Proceedings Track 19315ndash340 2011
Gross D Recovering low-rank matrices from few coefficients in any basis IEEE Trans Inform Theory 57(3)1548ndash1566 Mar2011
Hoeffding W A combinatorial central limit theorem Ann Math Statist 22558ndash566 1951
Hoeffding W Probability inequalities for sums of bounded random variables Journal of the American Statistical Association58(301)13ndash30 1963
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 35 35
Extensions
References II
Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matrices Available atarXiv11041672 2011a
Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matricesarXiv11041672v3[mathPR] 2011b
Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities Ann Probab 31(2)948ndash995 2003
Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities II Applications Israel J Math 167227ndash282 2008
Lust-Piquard F Inegalites de Khintchine dans Cp (1 lt p lt infin) C R Math Acad Sci Paris 303(7)289ndash292 1986
Lust-Piquard F and Pisier G Noncommutative Khintchine and Paley inequalities Ark Mat 29(2)241ndash260 1991
Mackey L Talwalkar A and Jordan M I Divide-and-conquer matrix factorization In Shawe-Taylor J Zemel R SBartlett P L Pereira F C N and Weinberger K Q (eds) Advances in Neural Information Processing Systems 24 pp1134ndash1142 2011
Mackey L Jordan M I Chen R Y Farrell B and Tropp J A Matrix concentration inequalities via the method ofexchangeable pairs URL httparxivorgabs12016002 2012
Negahban S and Wainwright M J Restricted strong convexity and weighted matrix completion Optimal bounds with noisearXiv10092118v2[csIT] 2010
Nemirovski A Sums of random symmetric matrices and quadratic optimization under orthogonality constraints Math
Program 109283ndash317 January 2007 ISSN 0025-5610 doi 101007s10107-006-0033-0 URLhttpdlacmorgcitationcfmid=12297161229726
Oliveira R I Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges Availableat arXiv09110600 Nov 2009
Pisier G and Xu Q Non-commutative martingale inequalities Comm Math Phys 189(3)667ndash698 1997
Recht B Simpler approach to matrix completion J Mach Learn Res 123413ndash3430 2011
Rudelson M and Vershynin R Sampling from large matrices An approach through geometric functional analysis J Assoc
Comput Mach 54(4)Article 21 19 pp Jul 2007 (electronic)
So A Man-Cho Moment inequalities for sums of random matrices and their applications in optimization Math Program 130(1)125ndash151 2011
Stein C A bound for the error in the normal approximation to the distribution of a sum of dependent random variables InProc 6th Berkeley Symp Math Statist Probab Berkeley 1972 Univ California Press
Tropp J A User-friendly tail bounds for sums of random matrices Found Comput Math August 2011
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 36 35
- Motivation
-
- Matrix Completion
- Matrix Concentration
-
- Extensions
-
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
3 Mean Value Trace Inequality
Bound the derivative of the trace mgf
mprime(θ) =1
2αE tr
[
(X minusX prime)(
eθX minus eθXprime)]
le θ
4αE tr
[
(X minusX prime)2 middot(
eθX + eθXprime)]
=θ
2αE tr
[
(X minusX prime)2 middot eθX]
= θ middot E tr
[
1
2αE[
(X minusX prime)2 |Z]
middot eθX]
= θ middot E tr[
∆X eθX]
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 21 35
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
3 Mean Value Trace Inequality
Bound the derivative of the trace mgf
mprime(θ) le θ middot E tr[
∆X eθX]
4 Conditional Variance Bound ∆X 4 cX + v I
Yields differential inequality
mprime(θ) le cθE tr[
X eθX]
+ vθE tr[
eθX]
= cθ middotmprime(θ) + vθ middotm(θ)
Solve to bound m(θ) and thereby bound
Pλmax(X) ge t le infθgt0
eminusθt middotm(θ) le d middot exp minust22v + 2ct
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 22 35
Exponential Tail Inequalities
Refined Exponential Concentration
Relaxing the constraint ∆X 4 cX + v
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let (XX prime) be a bounded matrix Stein pair with X isin Hd Definethe function
r(ψ) =1
ψlogE tr(eψ∆Xd) for each ψ gt 0
Then for all t ge 0 and all ψ gt 0
Pλmax(X) ge t le d middot exp minust22r(ψ) + 2t
radicψ
r(ψ) measures typical magnitude of conditional variance
Eλmax(∆X) le infψgt0
[
r(ψ) + log dψ
]
When d = 1 improves scalar result of Chatterjee (2008)Proof extends to unbounded random matricesMackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 23 35
Exponential Tail Inequalities
Matrix Bernstein Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let (Yk)kge1 be independent matrices in Hd satisfying
EYk = 0 and Yk le R for each index k
Define the variance parameter
σ2 =∥
∥
∥
sum
kEY 2
k
∥
∥
∥
Then for all t ge 0
P
λmax
(
sum
kYk
)
ge t
le d middot exp minust23σ2 + 2Rt
Gaussian tail controlled by improved variance when t is smallKey proof idea Apply refined concentration and boundr(ψ) = 1
ψlogE tr(eψ∆Xd) using unrefined concentration
Constants better than Oliveira (2009) worse than Tropp (2011)Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 24 35
Polynomial Moment Inequalities
Polynomial Moments for Random Matrices
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let p = 1 or p ge 15 Suppose that (XX prime) is a matrix Stein pairwhere E tr |X|2p ltinfin Then
(
E tr |X|2p)12p le
radic
2pminus 1 middot(
E tr∆pX
)12p
Moral The conditional variance controls the moments of X
Generalizes Chatterjeersquos version (2007) of the scalarBurkholder-Davis-Gundy inequality (Burkholder 1973)
See also Pisier amp Xu (1997) Junge amp Xu (2003 2008)
Proof techniques mirror those for exponential concentration
Also holds for infinite dimensional Schatten-class operators
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 25 35
Polynomial Moment Inequalities
Matrix Khintchine Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let (εk)kge1 be an independent sequence of Rademacher randomvariables and (Ak)kge1 be a deterministic sequence of Hermitianmatrices Then if p = 1 or p ge 15
E tr(
sum
kεkAk
)2p
le (2pminus 1)p middot tr(
sum
kA2k
)p
Noncommutative Khintchine inequality (Lust-Piquard 1986 Lust-Piquard
and Pisier 1991) is a dominant tool in applied matrix analysis
eg Used in analysis of column sampling and projection forapproximate SVD (Rudelson and Vershynin 2007)
Steinrsquos method offers an unusually concise proof
The constantradic2pminus 1 is within
radice of optimal
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 26 35
Dependent Sequences
Adding Dependence
1 MotivationMatrix CompletionMatrix Concentration
2 Steinrsquos Method Background and Notation
3 Exponential Tail Inequalities
4 Polynomial Moment Inequalities
5 Dependent SequencesSums of Conditionally Zero-mean MatricesCombinatorial Sums
6 Extensions
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 27 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Sum of Conditionally Zero-Mean Matrices)
Given a sequence of Hermitian matrices (Yk)nk=1 satisfying the
Conditional zero mean property E[Yk | (Yj)j 6=k] = 0
for all k define the random sum X =sumn
k=1Yk
Note (Yk)kge1 is a martingale difference sequence
Examples
Sums of independent centered random matricesMany sums of conditionally independent random matrices
Yk perpperp (Yj)j 6=k | Z and E[Yk |Z] = 0
Rademacher series with random matrix coefficients
X =sum
kεkWk
(Wk)kge1 Hermitian (εk)kge1 independent Rademacher
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 28 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Yk | (Yj)j 6=k] = 0
Matrix Stein Pair for X =sumn
k=1 Yk
Let Y primek and Yk be conditionally iid given (Yj)j 6=k
Draw index K uniformly from 1 nDefine X prime = X + Y prime
K minus YK
Check Stein pair condition
E[X minusX prime | (Yj)jge1] = E[YK minus Y primeK | (Yj)jge1]
=1
n
sumn
k=1
(
Yk minus E[Y primek | (Yj)j 6=k]
)
=1
n
sumn
k=1Yk =
1
nX
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 29 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Yk | (Yj)j 6=k] = 0
Conditional Variance for X = Y minus EY
∆X =n
2middot E
[
(X minusX prime)2 | (Yj)jge1
]
=n
2middot E
[
(YK minus Y primeK)
2 | (Yj)jge1
]
=1
2
sumn
k=1
(
Y 2k + E[Y 2
k | (Yj)j 6=k])
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 30 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Given a deterministic array (Ajk)njk=1 of Hermitian matrices and a
uniformly random permutation π on 1 n define thecombinatorial matrix statistic
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Generalizes the scalar statistics studied by Hoeffding (1951)
Example
Sampling without replacement from B1 BnW =
sums
j=1Bπ(j)
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 31 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Matrix Stein Pair for X = Y minus EY
Draw indices (JK) uniformly from 1 n2Define πprime = π (JK) and X prime =
sumnj=1Ajπprime(j) minus EY
Check Stein pair condition
E[X minusX prime | π] = E[
AJπ(J) +AKπ(K) minusAJπ(K) minusAKπ(J) | π]
=1
n2
sumn
jk=1Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
=2
n(Y minus EY ) =
2
nX
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 32 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Conditional Variance for X = Y minus EY
∆X(π) =n
4E[
(X minusX prime)2 | π]
=1
4n
sumn
jk=1
[
Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
]2
41
n
sumn
jk=1
[
A2jπ(j) +A2
kπ(k) +A2jπ(k) +A2
kπ(j)
]
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 33 35
Extensions
Extensions
General Complex Matrices
Map any matrix B isin Cd1timesd2 to a Hermitian matrix via dilation
D(B) =
[
0 B
Blowast0
]
isin Hd1+d2
Preserves spectral information λmax(D(B)) = B
Beyond Sums
Matrix-valued functions satisfying a self-reproducing property
eg Matrix second-order Rademacher chaossum
jk εjεkAjk
Yields a dependent bounded differences inequality for matrices
Generalized Matrix Stein Pairs
Satisfy E[g(X)minus g(X prime) |Z] = αX almost surely forg R rarr R weakly increasingMackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 34 35
Extensions
References IAhlswede R and Winter A Strong converse for identification via quantum channels IEEE Trans Inform Theory 48(3)
569ndash579 Mar 2002
Bernstein S The theory of probabilities Gastehizdat Publishing House 1946
Burkholder D L Distribution function inequalities for martingales Ann Probab 119ndash42 1973 doi101214aop1176997023
Candes E J and Recht B Exact matrix completion via convex optimization Found Comput Math 9717ndash772 2009
Candes E J and Tao T The power of convex relaxation Near-optimal matrix completion IEEE Trans Info Theory 2009URL arXiv09031476 To appear Available at arXiv09031476
Chatterjee S Steinrsquos method for concentration inequalities Probab Theory Related Fields 138305ndash321 2007
Chatterjee S Concentration inequalities with exchangeable pairs PhD thesis Stanford University Palo Alto Feb 2008 URLarxivmath0507526vl
Cheung S-S So A Man-Cho and Wang K Chance-constrained linear matrix inequalities with dependent perturbations Asafe tractable approximation approach Available athttpwwwoptimization-onlineorgDB_FILE2011012898pdf 2011
Christofides D and Markstrom K Expansion properties of random cayley graphs and vertex transitive graphs via matrixmartingales Random Struct Algorithms 32(1)88ndash100 2008
Drineas P Mahoney M W and Muthukrishnan S Relative-error CUR matrix decompositions SIAM Journal on Matrix
Analysis and Applications 30844ndash881 2008
Foygel R and Srebro N Concentration-based guarantees for low-rank matrix reconstruction Journal of Machine Learning
Research - Proceedings Track 19315ndash340 2011
Gross D Recovering low-rank matrices from few coefficients in any basis IEEE Trans Inform Theory 57(3)1548ndash1566 Mar2011
Hoeffding W A combinatorial central limit theorem Ann Math Statist 22558ndash566 1951
Hoeffding W Probability inequalities for sums of bounded random variables Journal of the American Statistical Association58(301)13ndash30 1963
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 35 35
Extensions
References II
Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matrices Available atarXiv11041672 2011a
Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matricesarXiv11041672v3[mathPR] 2011b
Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities Ann Probab 31(2)948ndash995 2003
Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities II Applications Israel J Math 167227ndash282 2008
Lust-Piquard F Inegalites de Khintchine dans Cp (1 lt p lt infin) C R Math Acad Sci Paris 303(7)289ndash292 1986
Lust-Piquard F and Pisier G Noncommutative Khintchine and Paley inequalities Ark Mat 29(2)241ndash260 1991
Mackey L Talwalkar A and Jordan M I Divide-and-conquer matrix factorization In Shawe-Taylor J Zemel R SBartlett P L Pereira F C N and Weinberger K Q (eds) Advances in Neural Information Processing Systems 24 pp1134ndash1142 2011
Mackey L Jordan M I Chen R Y Farrell B and Tropp J A Matrix concentration inequalities via the method ofexchangeable pairs URL httparxivorgabs12016002 2012
Negahban S and Wainwright M J Restricted strong convexity and weighted matrix completion Optimal bounds with noisearXiv10092118v2[csIT] 2010
Nemirovski A Sums of random symmetric matrices and quadratic optimization under orthogonality constraints Math
Program 109283ndash317 January 2007 ISSN 0025-5610 doi 101007s10107-006-0033-0 URLhttpdlacmorgcitationcfmid=12297161229726
Oliveira R I Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges Availableat arXiv09110600 Nov 2009
Pisier G and Xu Q Non-commutative martingale inequalities Comm Math Phys 189(3)667ndash698 1997
Recht B Simpler approach to matrix completion J Mach Learn Res 123413ndash3430 2011
Rudelson M and Vershynin R Sampling from large matrices An approach through geometric functional analysis J Assoc
Comput Mach 54(4)Article 21 19 pp Jul 2007 (electronic)
So A Man-Cho Moment inequalities for sums of random matrices and their applications in optimization Math Program 130(1)125ndash151 2011
Stein C A bound for the error in the normal approximation to the distribution of a sum of dependent random variables InProc 6th Berkeley Symp Math Statist Probab Berkeley 1972 Univ California Press
Tropp J A User-friendly tail bounds for sums of random matrices Found Comput Math August 2011
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 36 35
- Motivation
-
- Matrix Completion
- Matrix Concentration
-
- Extensions
-
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
3 Mean Value Trace Inequality
Bound the derivative of the trace mgf
mprime(θ) le θ middot E tr[
∆X eθX]
4 Conditional Variance Bound ∆X 4 cX + v I
Yields differential inequality
mprime(θ) le cθE tr[
X eθX]
+ vθE tr[
eθX]
= cθ middotmprime(θ) + vθ middotm(θ)
Solve to bound m(θ) and thereby bound
Pλmax(X) ge t le infθgt0
eminusθt middotm(θ) le d middot exp minust22v + 2ct
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 22 35
Exponential Tail Inequalities

Refined Exponential Concentration

Relaxing the constraint Δ_X ≼ cX + vI:

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp 2012)
Let (X, X′) be a bounded matrix Stein pair with X ∈ H^d. Define the function
    r(ψ) = (1/ψ) · log((1/d) E tr e^{ψ Δ_X})    for each ψ > 0.
Then, for all t ≥ 0 and all ψ > 0,
    P{λmax(X) ≥ t} ≤ d · exp{−t² / (2r(ψ) + 2t/√ψ)}.

• r(ψ) measures the typical magnitude of the conditional variance:
    E λmax(Δ_X) ≤ inf_{ψ>0} [r(ψ) + (log d)/ψ]
• When d = 1, improves the scalar result of Chatterjee (2008)
• Proof extends to unbounded random matrices
Exponential Tail Inequalities

Matrix Bernstein Inequality

Corollary (Mackey, Jordan, Chen, Farrell, and Tropp 2012)
Let (Y_k)_{k≥1} be independent matrices in H^d satisfying
    E Y_k = 0    and    ‖Y_k‖ ≤ R    for each index k.
Define the variance parameter
    σ² = ‖Σ_k E Y_k²‖.
Then, for all t ≥ 0,
    P{λmax(Σ_k Y_k) ≥ t} ≤ d · exp{−t² / (3σ² + 2Rt)}.

• Gaussian tail controlled by improved variance when t is small
• Key proof idea: Apply refined concentration, and bound
  r(ψ) = (1/ψ) log((1/d) E tr e^{ψ Δ_X}) using unrefined concentration
• Constants better than Oliveira (2009), worse than Tropp (2011)
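A quick numerical sanity check of the stated tail bound (not from the slides): the sizes, seed, and Rademacher-weighted summands Y_k = ε_k A_k are illustrative assumptions chosen so that R and σ² are computable exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 50

# Hypothetical summands Y_k = eps_k * A_k: independent, centered, and
# bounded by R = max_k ||A_k||.
A = rng.standard_normal((n, d, d))
A = (A + A.transpose(0, 2, 1)) / 2                    # Hermitian coefficients
R = max(np.linalg.norm(Ak, 2) for Ak in A)
sigma2 = np.linalg.norm(sum(Ak @ Ak for Ak in A), 2)  # ||sum_k E Y_k^2||

t = 2.0 * np.sqrt(sigma2)
trials, hits = 2000, 0
for _ in range(trials):
    eps = rng.choice([-1.0, 1.0], size=n)
    X = np.einsum('k,kij->ij', eps, A)
    hits += np.linalg.eigvalsh(X).max() >= t

bound = d * np.exp(-t**2 / (3 * sigma2 + 2 * R * t))
assert hits / trials <= bound  # empirical tail never exceeds the bound
```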
Polynomial Moment Inequalities

Polynomial Moments for Random Matrices

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp 2012)
Let p = 1 or p ≥ 1.5. Suppose that (X, X′) is a matrix Stein pair where E tr|X|^{2p} < ∞. Then
    (E tr|X|^{2p})^{1/2p} ≤ √(2p − 1) · (E tr Δ_X^p)^{1/2p}.

• Moral: The conditional variance controls the moments of X
• Generalizes Chatterjee's version (2007) of the scalar Burkholder–Davis–Gundy inequality (Burkholder 1973)
• See also Pisier & Xu (1997), Junge & Xu (2003, 2008)
• Proof techniques mirror those for exponential concentration
• Also holds for infinite-dimensional Schatten-class operators
Polynomial Moment Inequalities

Matrix Khintchine Inequality

Corollary (Mackey, Jordan, Chen, Farrell, and Tropp 2012)
Let (ε_k)_{k≥1} be an independent sequence of Rademacher random variables and (A_k)_{k≥1} be a deterministic sequence of Hermitian matrices. Then, if p = 1 or p ≥ 1.5,
    E tr(Σ_k ε_k A_k)^{2p} ≤ (2p − 1)^p · tr(Σ_k A_k²)^p.

• The noncommutative Khintchine inequality (Lust-Piquard 1986; Lust-Piquard and Pisier 1991) is a dominant tool in applied matrix analysis
  e.g., used in the analysis of column sampling and projection for approximate SVD (Rudelson and Vershynin 2007)
• Stein's method offers an unusually concise proof
• The constant √(2p − 1) is within √e of optimal
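For small n, the Rademacher expectation can be computed exactly by enumerating all 2^n sign patterns, which makes the corollary directly checkable. A sketch with illustrative sizes (the dimensions and random coefficients are assumptions, not from the slides):

```python
import itertools
import numpy as np
from numpy.linalg import matrix_power

rng = np.random.default_rng(1)
d, n, p = 3, 4, 2  # small illustrative sizes

A = rng.standard_normal((n, d, d))
A = (A + A.transpose(0, 2, 1)) / 2  # Hermitian (real symmetric) coefficients

# Left side: exact expectation over all 2^n Rademacher sign patterns
lhs = np.mean([
    np.trace(matrix_power(
        np.einsum('k,kij->ij', np.array(signs, dtype=float), A), 2 * p))
    for signs in itertools.product([-1, 1], repeat=n)
])

# Right side: (2p - 1)^p * tr (sum_k A_k^2)^p
rhs = (2 * p - 1) ** p * np.trace(matrix_power(sum(Ak @ Ak for Ak in A), p))

assert lhs <= rhs  # matrix Khintchine inequality
```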
Dependent Sequences

Adding Dependence

1. Motivation: Matrix Completion; Matrix Concentration
2. Stein's Method Background and Notation
3. Exponential Tail Inequalities
4. Polynomial Moment Inequalities
5. Dependent Sequences: Sums of Conditionally Zero-mean Matrices; Combinatorial Sums
6. Extensions
Dependent Sequences: Sums of Conditionally Zero-mean Matrices

Sums of Conditionally Zero-mean Matrices

Definition (Sum of Conditionally Zero-Mean Matrices)
Given a sequence of Hermitian matrices (Y_k)_{k=1}^n satisfying the conditional zero mean property
    E[Y_k | (Y_j)_{j≠k}] = 0    for all k,
define the random sum X = Σ_{k=1}^n Y_k.

Note: (Y_k)_{k≥1} is a martingale difference sequence.

Examples
• Sums of independent centered random matrices
• Many sums of conditionally independent random matrices:
    Y_k ⫫ (Y_j)_{j≠k} | Z    and    E[Y_k | Z] = 0
• Rademacher series with random matrix coefficients:
    X = Σ_k ε_k W_k,
  with (W_k)_{k≥1} Hermitian and (ε_k)_{k≥1} independent Rademacher
Dependent Sequences: Sums of Conditionally Zero-mean Matrices

Sums of Conditionally Zero-mean Matrices

Definition (Conditional Zero Mean Property)
    E[Y_k | (Y_j)_{j≠k}] = 0

Matrix Stein Pair for X = Σ_{k=1}^n Y_k
• Let Y′_k and Y_k be conditionally i.i.d. given (Y_j)_{j≠k}
• Draw an index K uniformly from {1, ..., n}
• Define X′ = X + Y′_K − Y_K
• Check the Stein pair condition:
    E[X − X′ | (Y_j)_{j≥1}] = E[Y_K − Y′_K | (Y_j)_{j≥1}]
                            = (1/n) Σ_{k=1}^n (Y_k − E[Y′_k | (Y_j)_{j≠k}])
                            = (1/n) Σ_{k=1}^n Y_k = (1/n) X
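In the independent Rademacher-series special case, the Stein pair condition can be verified exactly by averaging over the uniform index K and the fresh sign of Y′_K. A sketch (sizes, seed, and the ±1-weighted summands are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 3, 5
W = rng.standard_normal((n, d, d))
W = (W + W.transpose(0, 2, 1)) / 2          # Hermitian coefficients
eps = rng.choice([-1.0, 1.0], size=n)       # one realization of the signs
Y = eps[:, None, None] * W                  # summands Y_k = eps_k W_k
X = Y.sum(axis=0)

# E[X - X' | Y]: average over uniform K and a fresh sign for Y'_K
diff = np.zeros((d, d))
for k in range(n):
    for new_sign in (-1.0, 1.0):
        Xp = X - Y[k] + new_sign * W[k]     # X' = X + Y'_K - Y_K
        diff += (X - Xp) / (2 * n)

assert np.allclose(diff, X / n)             # Stein pair condition: E[X - X'|Y] = X/n
```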
Dependent Sequences: Sums of Conditionally Zero-mean Matrices

Sums of Conditionally Zero-mean Matrices

Definition (Conditional Zero Mean Property)
    E[Y_k | (Y_j)_{j≠k}] = 0

Conditional Variance for X = Σ_{k=1}^n Y_k
    Δ_X = (n/2) · E[(X − X′)² | (Y_j)_{j≥1}]
        = (n/2) · E[(Y_K − Y′_K)² | (Y_j)_{j≥1}]
        = (1/2) Σ_{k=1}^n (Y_k² + E[Y_k² | (Y_j)_{j≠k}])

⇒ Conditional variance controlled when summands are bounded
⇒ Dependent analogues of concentration and moment inequalities
Dependent Sequences: Combinatorial Sums

Combinatorial Sums of Matrices

Definition (Combinatorial Matrix Statistic)
Given a deterministic array (A_jk)_{j,k=1}^n of Hermitian matrices and a uniformly random permutation π on {1, ..., n}, define the combinatorial matrix statistic
    Y = Σ_{j=1}^n A_{jπ(j)}    with mean    EY = (1/n) Σ_{j,k=1}^n A_{jk}.

• Generalizes the scalar statistics studied by Hoeffding (1951)
• Example: Sampling without replacement from {B_1, ..., B_n}:
    W = Σ_{j=1}^s B_{π(j)}
Dependent Sequences: Combinatorial Sums

Combinatorial Sums of Matrices

Definition (Combinatorial Matrix Statistic)
    Y = Σ_{j=1}^n A_{jπ(j)}    with mean    EY = (1/n) Σ_{j,k=1}^n A_{jk}

Matrix Stein Pair for X = Y − EY
• Draw an index pair (J, K) uniformly from {1, ..., n}²
• Define π′ = π ∘ (J K) and X′ = Σ_{j=1}^n A_{jπ′(j)} − EY
• Check the Stein pair condition:
    E[X − X′ | π] = E[A_{Jπ(J)} + A_{Kπ(K)} − A_{Jπ(K)} − A_{Kπ(J)} | π]
                  = (1/n²) Σ_{j,k=1}^n (A_{jπ(j)} + A_{kπ(k)} − A_{jπ(k)} − A_{kπ(j)})
                  = (2/n)(Y − EY) = (2/n) X
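Since (J, K) ranges over only n² pairs, the Stein pair condition here is a finite identity that can be checked exactly for any fixed array and permutation. A sketch with illustrative sizes and a random Hermitian array (assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 2, 4
A = rng.standard_normal((n, n, d, d))
A = (A + A.transpose(0, 1, 3, 2)) / 2       # Hermitian array (A_jk)

pi = rng.permutation(n)                     # one realization of pi
Y = sum(A[j, pi[j]] for j in range(n))
EY = A.reshape(n * n, d, d).sum(axis=0) / n
X = Y - EY

# E[X - X' | pi]: average over the n^2 uniformly random index pairs (J, K)
diff = np.zeros((d, d))
for j in range(n):
    for k in range(n):
        pip = pi.copy()
        pip[j], pip[k] = pip[k], pip[j]     # pi' = pi composed with (J K)
        Xp = sum(A[i, pip[i]] for i in range(n)) - EY
        diff += (X - Xp) / n**2

assert np.allclose(diff, 2 * X / n)         # Stein pair condition: E[X - X'|pi] = (2/n) X
```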
Dependent Sequences: Combinatorial Sums

Combinatorial Sums of Matrices

Definition (Combinatorial Matrix Statistic)
    Y = Σ_{j=1}^n A_{jπ(j)}    with mean    EY = (1/n) Σ_{j,k=1}^n A_{jk}

Conditional Variance for X = Y − EY
    Δ_X(π) = (n/4) E[(X − X′)² | π]
           = (1/4n) Σ_{j,k=1}^n [A_{jπ(j)} + A_{kπ(k)} − A_{jπ(k)} − A_{kπ(j)}]²
           ≼ (1/n) Σ_{j,k=1}^n [A²_{jπ(j)} + A²_{kπ(k)} + A²_{jπ(k)} + A²_{kπ(j)}]
⇒ Conditional variance controlled when summands are bounded
⇒ Dependent analogues of concentration and moment inequalities
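The semidefinite bound in the last line follows from the operator inequality (a + b − c − d)² ≼ 4(a² + b² + c² + d²), and it too can be checked numerically for a fixed array and permutation. A sketch (the sizes and random Hermitian array are assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 3, 4
A = rng.standard_normal((n, n, d, d))
A = (A + A.transpose(0, 1, 3, 2)) / 2       # Hermitian array (A_jk)
pi = rng.permutation(n)

lhs = np.zeros((d, d))                      # conditional variance Delta_X(pi)
rhs = np.zeros((d, d))                      # its semidefinite upper bound
for j in range(n):
    for k in range(n):
        S = A[j, pi[j]] + A[k, pi[k]] - A[j, pi[k]] - A[k, pi[j]]
        lhs += S @ S / (4 * n)
        rhs += (A[j, pi[j]] @ A[j, pi[j]] + A[k, pi[k]] @ A[k, pi[k]]
                + A[j, pi[k]] @ A[j, pi[k]] + A[k, pi[j]] @ A[k, pi[j]]) / n

# lhs ≼ rhs: the difference must be positive semidefinite
assert np.linalg.eigvalsh(rhs - lhs).min() >= -1e-10
```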
Extensions

Extensions

General Complex Matrices
• Map any matrix B ∈ C^{d1×d2} to a Hermitian matrix via the dilation
    D(B) = [ 0   B ]
           [ B*  0 ]  ∈ H^{d1+d2}
• Preserves spectral information: λmax(D(B)) = ‖B‖
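The dilation identity λmax(D(B)) = ‖B‖ is easy to confirm numerically; the dimensions and random complex matrix below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
d1, d2 = 3, 5
B = rng.standard_normal((d1, d2)) + 1j * rng.standard_normal((d1, d2))

# Hermitian dilation D(B) = [[0, B], [B*, 0]]
D = np.block([[np.zeros((d1, d1)), B],
              [B.conj().T, np.zeros((d2, d2))]])

assert np.allclose(D, D.conj().T)                   # D(B) is Hermitian
lam_max = np.linalg.eigvalsh(D).max()
assert np.isclose(lam_max, np.linalg.norm(B, 2))    # lambda_max(D(B)) = ||B||
```

The eigenvalues of D(B) are ± the singular values of B (padded with zeros), which is why the maximum eigenvalue recovers the spectral norm.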
Beyond Sums
• Matrix-valued functions satisfying a self-reproducing property
  e.g., matrix second-order Rademacher chaos Σ_{j,k} ε_j ε_k A_{jk}
• Yields a dependent bounded differences inequality for matrices

Generalized Matrix Stein Pairs
• Satisfy E[g(X) − g(X′) | Z] = αX almost surely, for g: R → R weakly increasing
Extensions

References I

Ahlswede, R. and Winter, A. Strong converse for identification via quantum channels. IEEE Trans. Inform. Theory, 48(3):569–579, Mar. 2002.
Bernstein, S. The Theory of Probabilities. Gastehizdat Publishing House, 1946.
Burkholder, D. L. Distribution function inequalities for martingales. Ann. Probab., 1:19–42, 1973. doi: 10.1214/aop/1176997023.
Candès, E. J. and Recht, B. Exact matrix completion via convex optimization. Found. Comput. Math., 9:717–772, 2009.
Candès, E. J. and Tao, T. The power of convex relaxation: Near-optimal matrix completion. IEEE Trans. Info. Theory, 2009. To appear. Available at arXiv:0903.1476.
Chatterjee, S. Stein's method for concentration inequalities. Probab. Theory Related Fields, 138:305–321, 2007.
Chatterjee, S. Concentration inequalities with exchangeable pairs. PhD thesis, Stanford University, Palo Alto, Feb. 2008. URL arXiv:math/0507526v1.
Cheung, S.-S., So, A. Man-Cho, and Wang, K. Chance-constrained linear matrix inequalities with dependent perturbations: A safe tractable approximation approach. Available at http://www.optimization-online.org/DB_FILE/2011/01/2898.pdf, 2011.
Christofides, D. and Markström, K. Expansion properties of random Cayley graphs and vertex transitive graphs via matrix martingales. Random Struct. Algorithms, 32(1):88–100, 2008.
Drineas, P., Mahoney, M. W., and Muthukrishnan, S. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30:844–881, 2008.
Foygel, R. and Srebro, N. Concentration-based guarantees for low-rank matrix reconstruction. Journal of Machine Learning Research - Proceedings Track, 19:315–340, 2011.
Gross, D. Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory, 57(3):1548–1566, Mar. 2011.
Hoeffding, W. A combinatorial central limit theorem. Ann. Math. Statist., 22:558–566, 1951.
Hoeffding, W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.
Extensions

References II

Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. Available at arXiv:1104.1672, 2011a.
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. arXiv:1104.1672v3 [math.PR], 2011b.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities. Ann. Probab., 31(2):948–995, 2003.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities II: Applications. Israel J. Math., 167:227–282, 2008.
Lust-Piquard, F. Inégalités de Khintchine dans C_p (1 < p < ∞). C. R. Math. Acad. Sci. Paris, 303(7):289–292, 1986.
Lust-Piquard, F. and Pisier, G. Noncommutative Khintchine and Paley inequalities. Ark. Mat., 29(2):241–260, 1991.
Mackey, L., Talwalkar, A., and Jordan, M. I. Divide-and-conquer matrix factorization. In Shawe-Taylor, J., Zemel, R. S., Bartlett, P. L., Pereira, F. C. N., and Weinberger, K. Q. (eds.), Advances in Neural Information Processing Systems 24, pp. 1134–1142, 2011.
Mackey, L., Jordan, M. I., Chen, R. Y., Farrell, B., and Tropp, J. A. Matrix concentration inequalities via the method of exchangeable pairs. URL http://arxiv.org/abs/1201.6002, 2012.
Negahban, S. and Wainwright, M. J. Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. arXiv:1009.2118v2 [cs.IT], 2010.
Nemirovski, A. Sums of random symmetric matrices and quadratic optimization under orthogonality constraints. Math. Program., 109:283–317, January 2007. ISSN 0025-5610. doi: 10.1007/s10107-006-0033-0. URL http://dl.acm.org/citation.cfm?id=1229716.1229726.
Oliveira, R. I. Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. Available at arXiv:0911.0600, Nov. 2009.
Pisier, G. and Xu, Q. Non-commutative martingale inequalities. Comm. Math. Phys., 189(3):667–698, 1997.
Recht, B. A simpler approach to matrix completion. J. Mach. Learn. Res., 12:3413–3430, 2011.
Rudelson, M. and Vershynin, R. Sampling from large matrices: An approach through geometric functional analysis. J. Assoc. Comput. Mach., 54(4):Article 21, 19 pp., Jul. 2007 (electronic).
So, A. Man-Cho. Moment inequalities for sums of random matrices and their applications in optimization. Math. Program., 130(1):125–151, 2011.
Stein, C. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. 6th Berkeley Symp. Math. Statist. Probab., Berkeley, 1972. Univ. California Press.
Tropp, J. A. User-friendly tail bounds for sums of random matrices. Found. Comput. Math., August 2011.
- Motivation
-
- Matrix Completion
- Matrix Concentration
-
- Extensions
-
Exponential Tail Inequalities
Matrix Bernstein Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let (Yk)kge1 be independent matrices in Hd satisfying
EYk = 0 and Yk le R for each index k
Define the variance parameter
σ2 =∥
∥
∥
sum
kEY 2
k
∥
∥
∥
Then for all t ge 0
P
λmax
(
sum
kYk
)
ge t
le d middot exp minust23σ2 + 2Rt
Gaussian tail controlled by improved variance when t is smallKey proof idea Apply refined concentration and boundr(ψ) = 1
ψlogE tr(eψ∆Xd) using unrefined concentration
Constants better than Oliveira (2009) worse than Tropp (2011)Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 24 35
Polynomial Moment Inequalities
Polynomial Moments for Random Matrices
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let p = 1 or p ge 15 Suppose that (XX prime) is a matrix Stein pairwhere E tr |X|2p ltinfin Then
(
E tr |X|2p)12p le
radic
2pminus 1 middot(
E tr∆pX
)12p
Moral The conditional variance controls the moments of X
Generalizes Chatterjeersquos version (2007) of the scalarBurkholder-Davis-Gundy inequality (Burkholder 1973)
See also Pisier amp Xu (1997) Junge amp Xu (2003 2008)
Proof techniques mirror those for exponential concentration
Also holds for infinite dimensional Schatten-class operators
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 25 35
Polynomial Moment Inequalities
Matrix Khintchine Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let (εk)kge1 be an independent sequence of Rademacher randomvariables and (Ak)kge1 be a deterministic sequence of Hermitianmatrices Then if p = 1 or p ge 15
E tr(
sum
kεkAk
)2p
le (2pminus 1)p middot tr(
sum
kA2k
)p
Noncommutative Khintchine inequality (Lust-Piquard 1986 Lust-Piquard
and Pisier 1991) is a dominant tool in applied matrix analysis
eg Used in analysis of column sampling and projection forapproximate SVD (Rudelson and Vershynin 2007)
Steinrsquos method offers an unusually concise proof
The constantradic2pminus 1 is within
radice of optimal
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 26 35
Dependent Sequences
Adding Dependence
1 MotivationMatrix CompletionMatrix Concentration
2 Steinrsquos Method Background and Notation
3 Exponential Tail Inequalities
4 Polynomial Moment Inequalities
5 Dependent SequencesSums of Conditionally Zero-mean MatricesCombinatorial Sums
6 Extensions
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 27 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Sum of Conditionally Zero-Mean Matrices)
Given a sequence of Hermitian matrices (Yk)nk=1 satisfying the
Conditional zero mean property E[Yk | (Yj)j 6=k] = 0
for all k define the random sum X =sumn
k=1Yk
Note (Yk)kge1 is a martingale difference sequence
Examples
Sums of independent centered random matricesMany sums of conditionally independent random matrices
Yk perpperp (Yj)j 6=k | Z and E[Yk |Z] = 0
Rademacher series with random matrix coefficients
X =sum
kεkWk
(Wk)kge1 Hermitian (εk)kge1 independent Rademacher
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 28 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Yk | (Yj)j 6=k] = 0
Matrix Stein Pair for X =sumn
k=1 Yk
Let Y primek and Yk be conditionally iid given (Yj)j 6=k
Draw index K uniformly from 1 nDefine X prime = X + Y prime
K minus YK
Check Stein pair condition
E[X minusX prime | (Yj)jge1] = E[YK minus Y primeK | (Yj)jge1]
=1
n
sumn
k=1
(
Yk minus E[Y primek | (Yj)j 6=k]
)
=1
n
sumn
k=1Yk =
1
nX
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 29 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Yk | (Yj)j 6=k] = 0
Conditional Variance for X = Y minus EY
∆X =n
2middot E
[
(X minusX prime)2 | (Yj)jge1
]
=n
2middot E
[
(YK minus Y primeK)
2 | (Yj)jge1
]
=1
2
sumn
k=1
(
Y 2k + E[Y 2
k | (Yj)j 6=k])
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 30 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Given a deterministic array (Ajk)njk=1 of Hermitian matrices and a
uniformly random permutation π on 1 n define thecombinatorial matrix statistic
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Generalizes the scalar statistics studied by Hoeffding (1951)
Example
Sampling without replacement from B1 BnW =
sums
j=1Bπ(j)
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 31 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Matrix Stein Pair for X = Y minus EY
Draw indices (JK) uniformly from 1 n2Define πprime = π (JK) and X prime =
sumnj=1Ajπprime(j) minus EY
Check Stein pair condition
E[X minusX prime | π] = E[
AJπ(J) +AKπ(K) minusAJπ(K) minusAKπ(J) | π]
=1
n2
sumn
jk=1Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
=2
n(Y minus EY ) =
2
nX
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 32 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Conditional Variance for X = Y minus EY
∆X(π) =n
4E[
(X minusX prime)2 | π]
=1
4n
sumn
jk=1
[
Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
]2
41
n
sumn
jk=1
[
A2jπ(j) +A2
kπ(k) +A2jπ(k) +A2
kπ(j)
]
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 33 35
Extensions
Extensions
General Complex Matrices
Map any matrix B isin Cd1timesd2 to a Hermitian matrix via dilation
D(B) =
[
0 B
Blowast0
]
isin Hd1+d2
Preserves spectral information λmax(D(B)) = B
Beyond Sums
Matrix-valued functions satisfying a self-reproducing property
eg Matrix second-order Rademacher chaossum
jk εjεkAjk
Yields a dependent bounded differences inequality for matrices
Generalized Matrix Stein Pairs
Satisfy E[g(X)minus g(X prime) |Z] = αX almost surely forg R rarr R weakly increasingMackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 34 35
Extensions
References IAhlswede R and Winter A Strong converse for identification via quantum channels IEEE Trans Inform Theory 48(3)
569ndash579 Mar 2002
Bernstein S The theory of probabilities Gastehizdat Publishing House 1946
Burkholder D L Distribution function inequalities for martingales Ann Probab 119ndash42 1973 doi101214aop1176997023
Candes E J and Recht B Exact matrix completion via convex optimization Found Comput Math 9717ndash772 2009
Candes E J and Tao T The power of convex relaxation Near-optimal matrix completion IEEE Trans Info Theory 2009URL arXiv09031476 To appear Available at arXiv09031476
Chatterjee S Steinrsquos method for concentration inequalities Probab Theory Related Fields 138305ndash321 2007
Chatterjee S Concentration inequalities with exchangeable pairs PhD thesis Stanford University Palo Alto Feb 2008 URLarxivmath0507526vl
Cheung S-S So A Man-Cho and Wang K Chance-constrained linear matrix inequalities with dependent perturbations Asafe tractable approximation approach Available athttpwwwoptimization-onlineorgDB_FILE2011012898pdf 2011
Christofides D and Markstrom K Expansion properties of random cayley graphs and vertex transitive graphs via matrixmartingales Random Struct Algorithms 32(1)88ndash100 2008
Drineas P Mahoney M W and Muthukrishnan S Relative-error CUR matrix decompositions SIAM Journal on Matrix
Analysis and Applications 30844ndash881 2008
Foygel R and Srebro N Concentration-based guarantees for low-rank matrix reconstruction Journal of Machine Learning
Research - Proceedings Track 19315ndash340 2011
Gross D Recovering low-rank matrices from few coefficients in any basis IEEE Trans Inform Theory 57(3)1548ndash1566 Mar2011
Hoeffding W A combinatorial central limit theorem Ann Math Statist 22558ndash566 1951
Hoeffding W Probability inequalities for sums of bounded random variables Journal of the American Statistical Association58(301)13ndash30 1963
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 35 35
Extensions
References II
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. Available at arXiv:1104.1672, 2011a.
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. arXiv:1104.1672v3 [math.PR], 2011b.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities. Ann. Probab., 31(2):948–995, 2003.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities II: Applications. Israel J. Math., 167:227–282, 2008.
Lust-Piquard, F. Inégalités de Khintchine dans C_p (1 < p < ∞). C. R. Math. Acad. Sci. Paris, 303(7):289–292, 1986.
Lust-Piquard, F. and Pisier, G. Noncommutative Khintchine and Paley inequalities. Ark. Mat., 29(2):241–260, 1991.
Mackey, L., Talwalkar, A., and Jordan, M. I. Divide-and-conquer matrix factorization. In Advances in Neural Information Processing Systems 24, pp. 1134–1142, 2011.
Mackey, L., Jordan, M. I., Chen, R. Y., Farrell, B., and Tropp, J. A. Matrix concentration inequalities via the method of exchangeable pairs. Available at http://arxiv.org/abs/1201.6002, 2012.
Negahban, S. and Wainwright, M. J. Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. arXiv:1009.2118v2 [cs.IT], 2010.
Nemirovski, A. Sums of random symmetric matrices and quadratic optimization under orthogonality constraints. Math. Program., 109:283–317, Jan. 2007. doi:10.1007/s10107-006-0033-0.
Oliveira, R. I. Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. Available at arXiv:0911.0600, Nov. 2009.
Pisier, G. and Xu, Q. Non-commutative martingale inequalities. Comm. Math. Phys., 189(3):667–698, 1997.
Recht, B. A simpler approach to matrix completion. J. Mach. Learn. Res., 12:3413–3430, 2011.
Rudelson, M. and Vershynin, R. Sampling from large matrices: An approach through geometric functional analysis. J. ACM, 54(4):Article 21, 19 pp., Jul. 2007.
So, A. M.-C. Moment inequalities for sums of random matrices and their applications in optimization. Math. Program., 130(1):125–151, 2011.
Stein, C. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. 6th Berkeley Symp. Math. Statist. Probab., Berkeley, 1972. Univ. California Press.
Tropp, J. A. User-friendly tail bounds for sums of random matrices. Found. Comput. Math., Aug. 2011.
Polynomial Moment Inequalities

Polynomial Moments for Random Matrices

Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let p = 1 or p ≥ 1.5. Suppose that (X, X′) is a matrix Stein pair where E tr|X|^{2p} < ∞. Then

    (E tr|X|^{2p})^{1/(2p)} ≤ √(2p − 1) · (E tr ∆_X^p)^{1/(2p)}.

Moral: The conditional variance controls the moments of X.

- Generalizes Chatterjee's version (2007) of the scalar Burkholder–Davis–Gundy inequality (Burkholder, 1973)
- See also Pisier & Xu (1997) and Junge & Xu (2003, 2008)
- Proof techniques mirror those for exponential concentration
- Also holds for infinite-dimensional Schatten-class operators
Matrix Khintchine Inequality

Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let (ε_k)_{k≥1} be an independent sequence of Rademacher random variables and (A_k)_{k≥1} be a deterministic sequence of Hermitian matrices. Then, if p = 1 or p ≥ 1.5,

    E tr(Σ_k ε_k A_k)^{2p} ≤ (2p − 1)^p · tr(Σ_k A_k²)^p.

- The noncommutative Khintchine inequality (Lust-Piquard, 1986; Lust-Piquard and Pisier, 1991) is a dominant tool in applied matrix analysis
- e.g., used in the analysis of column sampling and projection for approximate SVD (Rudelson and Vershynin, 2007)
- Stein's method offers an unusually concise proof
- The constant √(2p − 1) is within √e of optimal
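For small n, the expectation on the left-hand side can be computed exactly by enumerating all 2^n Rademacher sign vectors, so the bound can be sanity-checked numerically. A minimal NumPy sketch (the random Hermitian A_k and the choices d = 4, n = 6, p = 2 are illustrative assumptions, not from the slides):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def random_hermitian(d):
    """A random real symmetric (hence Hermitian) test matrix."""
    M = rng.standard_normal((d, d))
    return (M + M.T) / 2

d, n, p = 4, 6, 2  # dimension, number of summands, moment order (p = 2 >= 1.5)
A = [random_hermitian(d) for _ in range(n)]

# Right-hand side: (2p - 1)^p * tr[(sum_k A_k^2)^p]
S = sum(Ak @ Ak for Ak in A)
rhs = (2 * p - 1) ** p * np.trace(np.linalg.matrix_power(S, p))

# Left-hand side: E tr[(sum_k eps_k A_k)^(2p)], computed exactly by
# averaging over all 2^n Rademacher sign vectors
lhs = np.mean([
    np.trace(np.linalg.matrix_power(sum(e * Ak for e, Ak in zip(eps, A)), 2 * p))
    for eps in itertools.product([-1.0, 1.0], repeat=n)
])

assert lhs <= rhs  # the matrix Khintchine bound holds for this instance
```

Since the average is exact rather than Monte Carlo, the assertion is guaranteed by the corollary (up to floating-point error, which is negligible here given the (2p − 1)^p slack).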
Dependent Sequences

Adding Dependence

1. Motivation: Matrix Completion; Matrix Concentration
2. Stein's Method: Background and Notation
3. Exponential Tail Inequalities
4. Polynomial Moment Inequalities
5. Dependent Sequences: Sums of Conditionally Zero-mean Matrices; Combinatorial Sums
6. Extensions
Sums of Conditionally Zero-mean Matrices

Definition (Sum of Conditionally Zero-Mean Matrices)
Given a sequence of Hermitian matrices (Y_k)_{k=1}^n satisfying the conditional zero-mean property

    E[Y_k | (Y_j)_{j≠k}] = 0 for all k,

define the random sum X = Σ_{k=1}^n Y_k.

Note: (Y_k)_{k≥1} is a martingale difference sequence.

Examples:
- Sums of independent centered random matrices
- Many sums of conditionally independent random matrices: Y_k ⊥⊥ (Y_j)_{j≠k} | Z and E[Y_k | Z] = 0
- Rademacher series with random matrix coefficients: X = Σ_k ε_k W_k, with (W_k)_{k≥1} Hermitian and (ε_k)_{k≥1} independent Rademacher
Sums of Conditionally Zero-mean Matrices

Definition (Conditional Zero-Mean Property)
    E[Y_k | (Y_j)_{j≠k}] = 0

Matrix Stein Pair for X = Σ_{k=1}^n Y_k:
- Let Y′_k and Y_k be conditionally i.i.d. given (Y_j)_{j≠k}
- Draw an index K uniformly from {1, …, n}
- Define X′ = X + Y′_K − Y_K

Check the Stein pair condition:

    E[X − X′ | (Y_j)_{j≥1}] = E[Y_K − Y′_K | (Y_j)_{j≥1}]
                            = (1/n) Σ_{k=1}^n (Y_k − E[Y′_k | (Y_j)_{j≠k}])
                            = (1/n) Σ_{k=1}^n Y_k = (1/n) X
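The condition E[X − X′ | (Y_j)] = X/n can be checked directly for a Rademacher series X = Σ_k ε_k W_k, because the auxiliary randomness (the index K and the refreshed sign ε′_K) can be averaged out exactly. A small NumPy sketch (the coefficients W_k and the sizes d = 3, n = 5 are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_symmetric(d):
    M = rng.standard_normal((d, d))
    return (M + M.T) / 2

d, n = 3, 5
W = [random_symmetric(d) for _ in range(n)]   # fixed Hermitian coefficients
eps = rng.choice([-1.0, 1.0], size=n)         # one realization of the signs
X = sum(e * Wk for e, Wk in zip(eps, W))      # X = sum_k eps_k W_k

# E[X - X' | eps]: average X - X' = (eps_K - eps'_K) W_K exactly over
# K uniform on {1..n} and the refreshed sign eps'_K uniform on {-1, +1}
diff = np.zeros((d, d))
for k in range(n):
    for eps_new in (-1.0, 1.0):
        diff += (eps[k] - eps_new) * W[k] / (2 * n)

assert np.allclose(diff, X / n)  # Stein pair condition: E[X - X' | Y] = X / n
```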
Sums of Conditionally Zero-mean Matrices

Definition (Conditional Zero-Mean Property)
    E[Y_k | (Y_j)_{j≠k}] = 0

Conditional Variance for X = Σ_{k=1}^n Y_k:

    ∆_X = (n/2) · E[(X − X′)² | (Y_j)_{j≥1}]
        = (n/2) · E[(Y_K − Y′_K)² | (Y_j)_{j≥1}]
        = (1/2) Σ_{k=1}^n (Y_k² + E[Y_k² | (Y_j)_{j≠k}])

⇒ The conditional variance is controlled when the summands are bounded
⇒ Dependent analogues of the concentration and moment inequalities
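For a Rademacher series X = Σ_k ε_k W_k with fixed coefficients, each Y_k² = W_k² is deterministic, so the formula collapses to ∆_X = Σ_k W_k². This can be confirmed from the definition ∆_X = (n/2)·E[(X − X′)² | ε] by exact averaging (the W_k and sizes below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_symmetric(d):
    M = rng.standard_normal((d, d))
    return (M + M.T) / 2

d, n = 3, 5
W = [random_symmetric(d) for _ in range(n)]   # fixed Hermitian coefficients
eps = rng.choice([-1.0, 1.0], size=n)         # one realization of the signs

# Delta_X = (n/2) E[(X - X')^2 | eps], with X - X' = (eps_K - eps'_K) W_K,
# averaged exactly over K uniform on {1..n} and eps'_K uniform on {-1, +1}
delta = np.zeros((d, d))
for k in range(n):
    for eps_new in (-1.0, 1.0):
        D = (eps[k] - eps_new) * W[k]
        delta += (n / 2) * (D @ D) / (2 * n)

assert np.allclose(delta, sum(Wk @ Wk for Wk in W))  # Delta_X = sum_k W_k^2
```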
Combinatorial Sums of Matrices

Definition (Combinatorial Matrix Statistic)
Given a deterministic array (A_{jk})_{j,k=1}^n of Hermitian matrices and a uniformly random permutation π on {1, …, n}, define the combinatorial matrix statistic

    Y = Σ_{j=1}^n A_{jπ(j)}, with mean EY = (1/n) Σ_{j,k=1}^n A_{jk}.

Generalizes the scalar statistics studied by Hoeffding (1951).

Example: Sampling s matrices without replacement from {B_1, …, B_n}: W = Σ_{j=1}^s B_{π(j)}
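For small n, the mean formula follows because P(π(j) = k) = 1/n for every pair (j, k), and it can be verified by brute force over all n! permutations. A NumPy sketch (the array A and the sizes n = 4, d = 3 are illustrative assumptions):

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n, d = 4, 3
# Deterministic array of Hermitian (here real symmetric) matrices A[j, k]
A = rng.standard_normal((n, n, d, d))
A = (A + A.transpose(0, 1, 3, 2)) / 2

# E Y = average of sum_j A[j, pi(j)] over all n! permutations pi
EY = np.mean(
    [sum(A[j, pi[j]] for j in range(n)) for pi in itertools.permutations(range(n))],
    axis=0,
)

assert np.allclose(EY, A.sum(axis=(0, 1)) / n)  # matches (1/n) sum_{j,k} A[j,k]
```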
Combinatorial Sums of Matrices

Definition (Combinatorial Matrix Statistic)
    Y = Σ_{j=1}^n A_{jπ(j)}, with mean EY = (1/n) Σ_{j,k=1}^n A_{jk}

Matrix Stein Pair for X = Y − EY:
- Draw indices (J, K) uniformly from {1, …, n}²
- Define π′ = π ∘ (J K) and X′ = Σ_{j=1}^n A_{jπ′(j)} − EY

Check the Stein pair condition:

    E[X − X′ | π] = E[A_{Jπ(J)} + A_{Kπ(K)} − A_{Jπ(K)} − A_{Kπ(J)} | π]
                  = (1/n²) Σ_{j,k=1}^n (A_{jπ(j)} + A_{kπ(k)} − A_{jπ(k)} − A_{kπ(j)})
                  = (2/n)(Y − EY) = (2/n) X
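Since (J, K) ranges over only n² pairs, the conditional expectation above can be computed exactly for any fixed π. A NumPy sketch confirming E[X − X′ | π] = (2/n)X (the array A and the sizes n = 5, d = 3 are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 5, 3
A = rng.standard_normal((n, n, d, d))
A = (A + A.transpose(0, 1, 3, 2)) / 2  # Hermitian (real symmetric) array

pi = rng.permutation(n)                # one fixed permutation
Y = sum(A[j, pi[j]] for j in range(n))
EY = A.sum(axis=(0, 1)) / n
X = Y - EY

# E[X - X' | pi]: average over all n^2 choices of the transposed pair (J, K).
# Swapping pi(J) and pi(K) changes Y by A[J,pi(K)] + A[K,pi(J)] - A[J,pi(J)] - A[K,pi(K)].
diff = np.zeros((d, d))
for j in range(n):
    for k in range(n):
        diff += (A[j, pi[j]] + A[k, pi[k]] - A[j, pi[k]] - A[k, pi[j]]) / n**2

assert np.allclose(diff, 2 * X / n)  # Stein pair condition: E[X - X' | pi] = (2/n) X
```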
Combinatorial Sums of Matrices

Definition (Combinatorial Matrix Statistic)
    Y = Σ_{j=1}^n A_{jπ(j)}, with mean EY = (1/n) Σ_{j,k=1}^n A_{jk}

Conditional Variance for X = Y − EY:

    ∆_X(π) = (n/4) · E[(X − X′)² | π]
           = (1/(4n)) Σ_{j,k=1}^n [A_{jπ(j)} + A_{kπ(k)} − A_{jπ(k)} − A_{kπ(j)}]²
           ⪯ (1/n) Σ_{j,k=1}^n [A_{jπ(j)}² + A_{kπ(k)}² + A_{jπ(k)}² + A_{kπ(j)}²]

⇒ The conditional variance is controlled when the summands are bounded
⇒ Dependent analogues of the concentration and moment inequalities
Extensions

General Complex Matrices
- Map any matrix B ∈ C^{d₁×d₂} to a Hermitian matrix via the dilation

    D(B) = [ 0   B ]
           [ B*  0 ] ∈ H^{d₁+d₂}

- Preserves spectral information: λ_max(D(B)) = ‖B‖

Beyond Sums
- Matrix-valued functions satisfying a self-reproducing property
- e.g., matrix second-order Rademacher chaos Σ_{j,k} ε_j ε_k A_{jk}
- Yields a dependent bounded-differences inequality for matrices

Generalized Matrix Stein Pairs
- Satisfy E[g(X) − g(X′) | Z] = αX almost surely, for g: R → R weakly increasing
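The dilation's key property, λ_max(D(B)) = ‖B‖ (the spectral norm of B, since the eigenvalues of D(B) are ±σ_i(B) together with zeros), is easy to confirm numerically (the sizes d₁ = 3, d₂ = 5 are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
d1, d2 = 3, 5
B = rng.standard_normal((d1, d2))

# Hermitian dilation: D(B) = [[0, B], [B^*, 0]] in H^(d1 + d2)
D = np.block([
    [np.zeros((d1, d1)), B],
    [B.T, np.zeros((d2, d2))],
])

# The largest eigenvalue of D(B) equals the spectral norm of B
lam_max = np.linalg.eigvalsh(D).max()
assert np.isclose(lam_max, np.linalg.norm(B, 2))
```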
Extensions
References IAhlswede R and Winter A Strong converse for identification via quantum channels IEEE Trans Inform Theory 48(3)
569ndash579 Mar 2002
Bernstein S The theory of probabilities Gastehizdat Publishing House 1946
Burkholder D L Distribution function inequalities for martingales Ann Probab 119ndash42 1973 doi101214aop1176997023
Candes E J and Recht B Exact matrix completion via convex optimization Found Comput Math 9717ndash772 2009
Candes E J and Tao T The power of convex relaxation Near-optimal matrix completion IEEE Trans Info Theory 2009URL arXiv09031476 To appear Available at arXiv09031476
Chatterjee S Steinrsquos method for concentration inequalities Probab Theory Related Fields 138305ndash321 2007
Chatterjee S Concentration inequalities with exchangeable pairs PhD thesis Stanford University Palo Alto Feb 2008 URLarxivmath0507526vl
Cheung S-S So A Man-Cho and Wang K Chance-constrained linear matrix inequalities with dependent perturbations Asafe tractable approximation approach Available athttpwwwoptimization-onlineorgDB_FILE2011012898pdf 2011
Christofides D and Markstrom K Expansion properties of random cayley graphs and vertex transitive graphs via matrixmartingales Random Struct Algorithms 32(1)88ndash100 2008
Drineas P Mahoney M W and Muthukrishnan S Relative-error CUR matrix decompositions SIAM Journal on Matrix
Analysis and Applications 30844ndash881 2008
Foygel R and Srebro N Concentration-based guarantees for low-rank matrix reconstruction Journal of Machine Learning
Research - Proceedings Track 19315ndash340 2011
Gross D Recovering low-rank matrices from few coefficients in any basis IEEE Trans Inform Theory 57(3)1548ndash1566 Mar2011
Hoeffding W A combinatorial central limit theorem Ann Math Statist 22558ndash566 1951
Hoeffding W Probability inequalities for sums of bounded random variables Journal of the American Statistical Association58(301)13ndash30 1963
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 35 35
Extensions
References II
Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matrices Available atarXiv11041672 2011a
Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matricesarXiv11041672v3[mathPR] 2011b
Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities Ann Probab 31(2)948ndash995 2003
Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities II Applications Israel J Math 167227ndash282 2008
Lust-Piquard F Inegalites de Khintchine dans Cp (1 lt p lt infin) C R Math Acad Sci Paris 303(7)289ndash292 1986
Lust-Piquard F and Pisier G Noncommutative Khintchine and Paley inequalities Ark Mat 29(2)241ndash260 1991
Mackey L Talwalkar A and Jordan M I Divide-and-conquer matrix factorization In Shawe-Taylor J Zemel R SBartlett P L Pereira F C N and Weinberger K Q (eds) Advances in Neural Information Processing Systems 24 pp1134ndash1142 2011
Mackey L Jordan M I Chen R Y Farrell B and Tropp J A Matrix concentration inequalities via the method ofexchangeable pairs URL httparxivorgabs12016002 2012
Negahban S and Wainwright M J Restricted strong convexity and weighted matrix completion Optimal bounds with noisearXiv10092118v2[csIT] 2010
Nemirovski A Sums of random symmetric matrices and quadratic optimization under orthogonality constraints Math
Program 109283ndash317 January 2007 ISSN 0025-5610 doi 101007s10107-006-0033-0 URLhttpdlacmorgcitationcfmid=12297161229726
Oliveira R I Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges Availableat arXiv09110600 Nov 2009
Pisier G and Xu Q Non-commutative martingale inequalities Comm Math Phys 189(3)667ndash698 1997
Recht B Simpler approach to matrix completion J Mach Learn Res 123413ndash3430 2011
Rudelson M and Vershynin R Sampling from large matrices An approach through geometric functional analysis J Assoc
Comput Mach 54(4)Article 21 19 pp Jul 2007 (electronic)
So A Man-Cho Moment inequalities for sums of random matrices and their applications in optimization Math Program 130(1)125ndash151 2011
Stein C A bound for the error in the normal approximation to the distribution of a sum of dependent random variables InProc 6th Berkeley Symp Math Statist Probab Berkeley 1972 Univ California Press
Tropp J A User-friendly tail bounds for sums of random matrices Found Comput Math August 2011
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 36 35
- Motivation
-
- Matrix Completion
- Matrix Concentration
-
- Extensions
-
Polynomial Moment Inequalities
Matrix Khintchine Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let (εk)kge1 be an independent sequence of Rademacher randomvariables and (Ak)kge1 be a deterministic sequence of Hermitianmatrices Then if p = 1 or p ge 15
E tr(
sum
kεkAk
)2p
le (2pminus 1)p middot tr(
sum
kA2k
)p
Noncommutative Khintchine inequality (Lust-Piquard 1986 Lust-Piquard
and Pisier 1991) is a dominant tool in applied matrix analysis
eg Used in analysis of column sampling and projection forapproximate SVD (Rudelson and Vershynin 2007)
Steinrsquos method offers an unusually concise proof
The constantradic2pminus 1 is within
radice of optimal
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 26 35
Dependent Sequences
Adding Dependence
1 MotivationMatrix CompletionMatrix Concentration
2 Steinrsquos Method Background and Notation
3 Exponential Tail Inequalities
4 Polynomial Moment Inequalities
5 Dependent SequencesSums of Conditionally Zero-mean MatricesCombinatorial Sums
6 Extensions
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 27 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Sum of Conditionally Zero-Mean Matrices)
Given a sequence of Hermitian matrices (Yk)nk=1 satisfying the
Conditional zero mean property E[Yk | (Yj)j 6=k] = 0
for all k define the random sum X =sumn
k=1Yk
Note (Yk)kge1 is a martingale difference sequence
Examples
Sums of independent centered random matricesMany sums of conditionally independent random matrices
Yk perpperp (Yj)j 6=k | Z and E[Yk |Z] = 0
Rademacher series with random matrix coefficients
X =sum
kεkWk
(Wk)kge1 Hermitian (εk)kge1 independent Rademacher
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 28 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Yk | (Yj)j 6=k] = 0
Matrix Stein Pair for X =sumn
k=1 Yk
Let Y primek and Yk be conditionally iid given (Yj)j 6=k
Draw index K uniformly from 1 nDefine X prime = X + Y prime
K minus YK
Check Stein pair condition
E[X minusX prime | (Yj)jge1] = E[YK minus Y primeK | (Yj)jge1]
=1
n
sumn
k=1
(
Yk minus E[Y primek | (Yj)j 6=k]
)
=1
n
sumn
k=1Yk =
1
nX
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 29 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Yk | (Yj)j 6=k] = 0
Conditional Variance for X = Y minus EY
∆X =n
2middot E
[
(X minusX prime)2 | (Yj)jge1
]
=n
2middot E
[
(YK minus Y primeK)
2 | (Yj)jge1
]
=1
2
sumn
k=1
(
Y 2k + E[Y 2
k | (Yj)j 6=k])
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 30 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Given a deterministic array (Ajk)njk=1 of Hermitian matrices and a
uniformly random permutation π on 1 n define thecombinatorial matrix statistic
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Generalizes the scalar statistics studied by Hoeffding (1951)
Example
Sampling without replacement from B1 BnW =
sums
j=1Bπ(j)
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 31 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Matrix Stein Pair for X = Y minus EY
Draw indices (JK) uniformly from 1 n2Define πprime = π (JK) and X prime =
sumnj=1Ajπprime(j) minus EY
Check Stein pair condition
E[X minusX prime | π] = E[
AJπ(J) +AKπ(K) minusAJπ(K) minusAKπ(J) | π]
=1
n2
sumn
jk=1Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
=2
n(Y minus EY ) =
2
nX
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 32 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Conditional Variance for X = Y minus EY
∆X(π) =n
4E[
(X minusX prime)2 | π]
=1
4n
sumn
jk=1
[
Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
]2
41
n
sumn
jk=1
[
A2jπ(j) +A2
kπ(k) +A2jπ(k) +A2
kπ(j)
]
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 33 35
Extensions
Extensions
General Complex Matrices
Map any matrix B isin Cd1timesd2 to a Hermitian matrix via dilation
D(B) =
[
0 B
Blowast0
]
isin Hd1+d2
Preserves spectral information λmax(D(B)) = B
Beyond Sums
Matrix-valued functions satisfying a self-reproducing property
eg Matrix second-order Rademacher chaossum
jk εjεkAjk
Yields a dependent bounded differences inequality for matrices
Generalized Matrix Stein Pairs
Satisfy E[g(X)minus g(X prime) |Z] = αX almost surely forg R rarr R weakly increasingMackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 34 35
Extensions
References IAhlswede R and Winter A Strong converse for identification via quantum channels IEEE Trans Inform Theory 48(3)
569ndash579 Mar 2002
Bernstein S The theory of probabilities Gastehizdat Publishing House 1946
Burkholder D L Distribution function inequalities for martingales Ann Probab 119ndash42 1973 doi101214aop1176997023
Candes E J and Recht B Exact matrix completion via convex optimization Found Comput Math 9717ndash772 2009
Candes E J and Tao T The power of convex relaxation Near-optimal matrix completion IEEE Trans Info Theory 2009URL arXiv09031476 To appear Available at arXiv09031476
Chatterjee S Steinrsquos method for concentration inequalities Probab Theory Related Fields 138305ndash321 2007
Chatterjee S Concentration inequalities with exchangeable pairs PhD thesis Stanford University Palo Alto Feb 2008 URLarxivmath0507526vl
Cheung S-S So A Man-Cho and Wang K Chance-constrained linear matrix inequalities with dependent perturbations Asafe tractable approximation approach Available athttpwwwoptimization-onlineorgDB_FILE2011012898pdf 2011
Christofides D and Markstrom K Expansion properties of random cayley graphs and vertex transitive graphs via matrixmartingales Random Struct Algorithms 32(1)88ndash100 2008
Drineas P Mahoney M W and Muthukrishnan S Relative-error CUR matrix decompositions SIAM Journal on Matrix
Analysis and Applications 30844ndash881 2008
Foygel R and Srebro N Concentration-based guarantees for low-rank matrix reconstruction Journal of Machine Learning
Research - Proceedings Track 19315ndash340 2011
Gross D Recovering low-rank matrices from few coefficients in any basis IEEE Trans Inform Theory 57(3)1548ndash1566 Mar2011
Hoeffding W A combinatorial central limit theorem Ann Math Statist 22558ndash566 1951
Hoeffding W Probability inequalities for sums of bounded random variables Journal of the American Statistical Association58(301)13ndash30 1963
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 35 35
Extensions
References II
Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matrices Available atarXiv11041672 2011a
Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matricesarXiv11041672v3[mathPR] 2011b
Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities Ann Probab 31(2)948ndash995 2003
Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities II Applications Israel J Math 167227ndash282 2008
Lust-Piquard F Inegalites de Khintchine dans Cp (1 lt p lt infin) C R Math Acad Sci Paris 303(7)289ndash292 1986
Lust-Piquard F and Pisier G Noncommutative Khintchine and Paley inequalities Ark Mat 29(2)241ndash260 1991
Mackey L Talwalkar A and Jordan M I Divide-and-conquer matrix factorization In Shawe-Taylor J Zemel R SBartlett P L Pereira F C N and Weinberger K Q (eds) Advances in Neural Information Processing Systems 24 pp1134ndash1142 2011
Mackey L Jordan M I Chen R Y Farrell B and Tropp J A Matrix concentration inequalities via the method ofexchangeable pairs URL httparxivorgabs12016002 2012
Negahban S and Wainwright M J Restricted strong convexity and weighted matrix completion Optimal bounds with noisearXiv10092118v2[csIT] 2010
Nemirovski A Sums of random symmetric matrices and quadratic optimization under orthogonality constraints Math
Program 109283ndash317 January 2007 ISSN 0025-5610 doi 101007s10107-006-0033-0 URLhttpdlacmorgcitationcfmid=12297161229726
Oliveira R I Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges Availableat arXiv09110600 Nov 2009
Pisier G and Xu Q Non-commutative martingale inequalities Comm Math Phys 189(3)667ndash698 1997
Recht B Simpler approach to matrix completion J Mach Learn Res 123413ndash3430 2011
Rudelson M and Vershynin R Sampling from large matrices An approach through geometric functional analysis J Assoc
Comput Mach 54(4)Article 21 19 pp Jul 2007 (electronic)
So A Man-Cho Moment inequalities for sums of random matrices and their applications in optimization Math Program 130(1)125ndash151 2011
Stein C A bound for the error in the normal approximation to the distribution of a sum of dependent random variables InProc 6th Berkeley Symp Math Statist Probab Berkeley 1972 Univ California Press
Tropp J A User-friendly tail bounds for sums of random matrices Found Comput Math August 2011
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 36 35
- Motivation
-
- Matrix Completion
- Matrix Concentration
-
- Extensions
-
Dependent Sequences
Adding Dependence
1 MotivationMatrix CompletionMatrix Concentration
2 Steinrsquos Method Background and Notation
3 Exponential Tail Inequalities
4 Polynomial Moment Inequalities
5 Dependent SequencesSums of Conditionally Zero-mean MatricesCombinatorial Sums
6 Extensions
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 27 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Sum of Conditionally Zero-Mean Matrices)
Given a sequence of Hermitian matrices (Yk)nk=1 satisfying the
Conditional zero mean property E[Yk | (Yj)j 6=k] = 0
for all k define the random sum X =sumn
k=1Yk
Note (Yk)kge1 is a martingale difference sequence
Examples
Sums of independent centered random matricesMany sums of conditionally independent random matrices
Yk perpperp (Yj)j 6=k | Z and E[Yk |Z] = 0
Rademacher series with random matrix coefficients
X =sum
kεkWk
(Wk)kge1 Hermitian (εk)kge1 independent Rademacher
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 28 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Yk | (Yj)j 6=k] = 0
Matrix Stein Pair for X =sumn
k=1 Yk
Let Y primek and Yk be conditionally iid given (Yj)j 6=k
Draw index K uniformly from 1 nDefine X prime = X + Y prime
K minus YK
Check Stein pair condition
E[X minusX prime | (Yj)jge1] = E[YK minus Y primeK | (Yj)jge1]
=1
n
sumn
k=1
(
Yk minus E[Y primek | (Yj)j 6=k]
)
=1
n
sumn
k=1Yk =
1
nX
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 29 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Yk | (Yj)j 6=k] = 0
Conditional Variance for X = Y minus EY
∆X =n
2middot E
[
(X minusX prime)2 | (Yj)jge1
]
=n
2middot E
[
(YK minus Y primeK)
2 | (Yj)jge1
]
=1
2
sumn
k=1
(
Y 2k + E[Y 2
k | (Yj)j 6=k])
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 30 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Given a deterministic array (Ajk)njk=1 of Hermitian matrices and a
uniformly random permutation π on 1 n define thecombinatorial matrix statistic
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Generalizes the scalar statistics studied by Hoeffding (1951)
Example
Sampling without replacement from B1 BnW =
sums
j=1Bπ(j)
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 31 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Matrix Stein Pair for X = Y minus EY
Draw indices (JK) uniformly from 1 n2Define πprime = π (JK) and X prime =
sumnj=1Ajπprime(j) minus EY
Check Stein pair condition
E[X minusX prime | π] = E[
AJπ(J) +AKπ(K) minusAJπ(K) minusAKπ(J) | π]
=1
n2
sumn
jk=1Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
=2
n(Y minus EY ) =
2
nX
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 32 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Conditional Variance for X = Y minus EY
∆X(π) =n
4E[
(X minusX prime)2 | π]
=1
4n
sumn
jk=1
[
Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
]2
41
n
sumn
jk=1
[
A2jπ(j) +A2
kπ(k) +A2jπ(k) +A2
kπ(j)
]
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 33 35
Extensions
Extensions
General Complex Matrices
Map any matrix B isin Cd1timesd2 to a Hermitian matrix via dilation
D(B) =
[
0 B
Blowast0
]
isin Hd1+d2
Preserves spectral information λmax(D(B)) = B
Beyond Sums
Matrix-valued functions satisfying a self-reproducing property
eg Matrix second-order Rademacher chaossum
jk εjεkAjk
Yields a dependent bounded differences inequality for matrices
Generalized Matrix Stein Pairs
Satisfy E[g(X)minus g(X prime) |Z] = αX almost surely forg R rarr R weakly increasingMackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 34 35
Extensions
References IAhlswede R and Winter A Strong converse for identification via quantum channels IEEE Trans Inform Theory 48(3)
569ndash579 Mar 2002
Bernstein S The theory of probabilities Gastehizdat Publishing House 1946
Burkholder D L Distribution function inequalities for martingales Ann Probab 119ndash42 1973 doi101214aop1176997023
Candes E J and Recht B Exact matrix completion via convex optimization Found Comput Math 9717ndash772 2009
Candes E J and Tao T The power of convex relaxation Near-optimal matrix completion IEEE Trans Info Theory 2009URL arXiv09031476 To appear Available at arXiv09031476
Chatterjee S Steinrsquos method for concentration inequalities Probab Theory Related Fields 138305ndash321 2007
Chatterjee S Concentration inequalities with exchangeable pairs PhD thesis Stanford University Palo Alto Feb 2008 URLarxivmath0507526vl
Cheung S-S So A Man-Cho and Wang K Chance-constrained linear matrix inequalities with dependent perturbations Asafe tractable approximation approach Available athttpwwwoptimization-onlineorgDB_FILE2011012898pdf 2011
Christofides D and Markstrom K Expansion properties of random cayley graphs and vertex transitive graphs via matrixmartingales Random Struct Algorithms 32(1)88ndash100 2008
Drineas P Mahoney M W and Muthukrishnan S Relative-error CUR matrix decompositions SIAM Journal on Matrix
Analysis and Applications 30844ndash881 2008
Foygel R and Srebro N Concentration-based guarantees for low-rank matrix reconstruction Journal of Machine Learning
Research - Proceedings Track 19315ndash340 2011
Gross D Recovering low-rank matrices from few coefficients in any basis IEEE Trans Inform Theory 57(3)1548ndash1566 Mar2011
Hoeffding W A combinatorial central limit theorem Ann Math Statist 22558ndash566 1951
Hoeffding W Probability inequalities for sums of bounded random variables Journal of the American Statistical Association58(301)13ndash30 1963
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 35 35
Extensions
References II
Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matrices Available atarXiv11041672 2011a
Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matricesarXiv11041672v3[mathPR] 2011b
Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities Ann Probab 31(2)948ndash995 2003
Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities II Applications Israel J Math 167227ndash282 2008
Lust-Piquard F Inegalites de Khintchine dans Cp (1 lt p lt infin) C R Math Acad Sci Paris 303(7)289ndash292 1986
Lust-Piquard F and Pisier G Noncommutative Khintchine and Paley inequalities Ark Mat 29(2)241ndash260 1991
Mackey L Talwalkar A and Jordan M I Divide-and-conquer matrix factorization In Shawe-Taylor J Zemel R SBartlett P L Pereira F C N and Weinberger K Q (eds) Advances in Neural Information Processing Systems 24 pp1134ndash1142 2011
Mackey L Jordan M I Chen R Y Farrell B and Tropp J A Matrix concentration inequalities via the method ofexchangeable pairs URL httparxivorgabs12016002 2012
Negahban S and Wainwright M J Restricted strong convexity and weighted matrix completion Optimal bounds with noisearXiv10092118v2[csIT] 2010
Nemirovski A Sums of random symmetric matrices and quadratic optimization under orthogonality constraints Math
Program 109283ndash317 January 2007 ISSN 0025-5610 doi 101007s10107-006-0033-0 URLhttpdlacmorgcitationcfmid=12297161229726
Oliveira R I Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges Availableat arXiv09110600 Nov 2009
Pisier G and Xu Q Non-commutative martingale inequalities Comm Math Phys 189(3)667ndash698 1997
Recht B Simpler approach to matrix completion J Mach Learn Res 123413ndash3430 2011
Rudelson M and Vershynin R Sampling from large matrices An approach through geometric functional analysis J Assoc
Comput Mach 54(4)Article 21 19 pp Jul 2007 (electronic)
So A Man-Cho Moment inequalities for sums of random matrices and their applications in optimization Math Program 130(1)125ndash151 2011
Stein C A bound for the error in the normal approximation to the distribution of a sum of dependent random variables InProc 6th Berkeley Symp Math Statist Probab Berkeley 1972 Univ California Press
Tropp J A User-friendly tail bounds for sums of random matrices Found Comput Math August 2011
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 36 35
- Motivation
-
- Matrix Completion
- Matrix Concentration
-
- Extensions
-
Dependent Sequences: Sums of Conditionally Zero-Mean Matrices

Definition (Sum of Conditionally Zero-Mean Matrices)
Given a sequence of Hermitian matrices (Y_k)_{k=1}^n satisfying the conditional zero-mean property

    E[Y_k | (Y_j)_{j≠k}] = 0 for all k,

define the random sum X = Σ_{k=1}^n Y_k.

Note: (Y_k)_{k≥1} forms a martingale difference sequence.

Examples
- Sums of independent centered random matrices
- Many sums of conditionally independent random matrices:
  Y_k ⊥⊥ (Y_j)_{j≠k} | Z and E[Y_k | Z] = 0
- Rademacher series with random matrix coefficients:
  X = Σ_k ε_k W_k, where (W_k)_{k≥1} are Hermitian and (ε_k)_{k≥1} are independent Rademacher signs
Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 28 / 35
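As a quick numerical sanity check of the conditional zero-mean property (my own sketch, not from the slides; numpy with small random symmetric matrices, all sizes and seeds illustrative), we can enumerate a matrix Rademacher series and verify that averaging out any single sign leaves only the other summands:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 4
# Fixed real symmetric (hence Hermitian) coefficients W_k
W = []
for _ in range(n):
    A = rng.normal(size=(d, d))
    W.append((A + A.T) / 2)

# EX = 0: average the series X = sum_k eps_k W_k over all 2^n sign patterns.
X_mean = np.zeros((d, d))
for eps in itertools.product((-1, 1), repeat=n):
    X_mean += sum(e * Wk for e, Wk in zip(eps, W))
X_mean /= 2 ** n
assert np.allclose(X_mean, 0)

# Conditional zero mean: freeze eps_j for j != k and average over eps_k alone.
k, frozen = 2, (1, -1, None, 1)   # hypothetical fixed signs for j != k
cond = sum(
    sum((e if j == k else frozen[j]) * W[j] for j in range(n))
    for e in (-1, 1)
) / 2
expected = sum(frozen[j] * W[j] for j in range(n) if j != k)
assert np.allclose(cond, expected)  # the eps_k W_k term averages out
print("conditional zero mean verified")
```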
Dependent Sequences: Sums of Conditionally Zero-Mean Matrices

Definition (Conditional Zero-Mean Property)
E[Y_k | (Y_j)_{j≠k}] = 0

Matrix Stein Pair for X = Σ_{k=1}^n Y_k
- Let Y′_k and Y_k be conditionally i.i.d. given (Y_j)_{j≠k}
- Draw an index K uniformly from {1, …, n}
- Define X′ = X + Y′_K − Y_K

Check the Stein pair condition:

    E[X − X′ | (Y_j)_{j≥1}] = E[Y_K − Y′_K | (Y_j)_{j≥1}]
                            = (1/n) Σ_{k=1}^n (Y_k − E[Y′_k | (Y_j)_{j≠k}])
                            = (1/n) Σ_{k=1}^n Y_k = (1/n) X
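The Stein pair identity above can be checked exactly for the Rademacher-series example: conditional on the signs, averaging X − X′ over the uniform index K and the redrawn sign ε′_K recovers X/n. A minimal sketch (my own, not from the slides; numpy, illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 3, 5
W = [(lambda A: (A + A.T) / 2)(rng.normal(size=(d, d))) for _ in range(n)]
eps = rng.choice((-1, 1), size=n)
Y = [e * Wk for e, Wk in zip(eps, W)]   # conditionally zero-mean summands
X = sum(Y)

# Exchangeable pair: pick K uniformly, redraw eps_K. Average X - X' over
# the n choices of K and the two equally likely values of eps'_K.
diff_mean = np.zeros((d, d))
for K in range(n):
    for eps_new in (-1, 1):
        Xp = X + eps_new * W[K] - Y[K]  # X' = X + Y'_K - Y_K
        diff_mean += (X - Xp) / (n * 2)

assert np.allclose(diff_mean, X / n)    # E[X - X' | Y] = (1/n) X
print("Stein pair identity holds with alpha = 1/n")
```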
Dependent Sequences: Sums of Conditionally Zero-Mean Matrices

Definition (Conditional Zero-Mean Property)
E[Y_k | (Y_j)_{j≠k}] = 0

Conditional Variance for X = Σ_{k=1}^n Y_k

    Δ_X = (n/2) · E[(X − X′)² | (Y_j)_{j≥1}]
        = (n/2) · E[(Y_K − Y′_K)² | (Y_j)_{j≥1}]
        = (1/2) Σ_{k=1}^n (Y_k² + E[Y_k² | (Y_j)_{j≠k}])

⇒ The conditional variance is controlled when the summands are bounded
⇒ Dependent analogues of the concentration and moment inequalities follow
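For the matrix Rademacher series, both terms in the closed form collapse to W_k², so Δ_X = Σ_k W_k² exactly. A small numerical confirmation (my own sketch, not from the slides; numpy, illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 3, 5
W = [(lambda A: (A + A.T) / 2)(rng.normal(size=(d, d))) for _ in range(n)]
eps = rng.choice((-1, 1), size=n)
Y = [e * Wk for e, Wk in zip(eps, W)]

# Delta_X = (n/2) E[(X - X')^2 | Y], averaging over K and the redrawn sign.
Delta = np.zeros((d, d))
for K in range(n):
    for eps_new in (-1, 1):
        D = Y[K] - eps_new * W[K]       # X - X' = Y_K - Y'_K
        Delta += (n / 2) * (D @ D) / (n * 2)

# Slide formula (1/2) sum_k (Y_k^2 + E[Y_k^2 | rest]): here both terms equal W_k^2.
closed_form = sum(Wk @ Wk for Wk in W)
assert np.allclose(Delta, closed_form)
print("Delta_X = sum_k W_k^2 for a matrix Rademacher series")
```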
Dependent Sequences: Combinatorial Sums of Matrices

Definition (Combinatorial Matrix Statistic)
Given a deterministic array (A_{jk})_{j,k=1}^n of Hermitian matrices and a uniformly random permutation π on {1, …, n}, define the combinatorial matrix statistic

    Y = Σ_{j=1}^n A_{jπ(j)} with mean EY = (1/n) Σ_{j,k=1}^n A_{jk}

Generalizes the scalar statistics studied by Hoeffding (1951).

Example
Sampling without replacement from {B_1, …, B_n}: W = Σ_{j=1}^s B_{π(j)}
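The closed form for EY follows because π(j) is uniform on {1, …, n} for each j. For a small n we can verify it exactly by enumerating all n! permutations (my own sketch, not from the slides; numpy, illustrative sizes):

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
d, n = 2, 4
A = rng.normal(size=(n, n, d, d))
A = (A + A.transpose(0, 1, 3, 2)) / 2     # make each A[j, k] symmetric

def Y_of(perm):
    # combinatorial statistic Y = sum_j A_{j, perm(j)}
    return sum(A[j, perm[j]] for j in range(n))

# Exact mean over all n! permutations vs the closed form (1/n) sum_{jk} A_{jk}.
perms = list(itertools.permutations(range(n)))
EY_enum = sum(Y_of(p) for p in perms) / len(perms)
EY_formula = A.sum(axis=(0, 1)) / n
assert np.allclose(EY_enum, EY_formula)
print("EY matches (1/n) * sum_{jk} A_{jk}")
```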
Dependent Sequences: Combinatorial Sums of Matrices

Definition (Combinatorial Matrix Statistic)
Y = Σ_{j=1}^n A_{jπ(j)} with mean EY = (1/n) Σ_{j,k=1}^n A_{jk}

Matrix Stein Pair for X = Y − EY
- Draw an index pair (J, K) uniformly from {1, …, n}²
- Define π′ = π ∘ (J K) and X′ = Σ_{j=1}^n A_{jπ′(j)} − EY

Check the Stein pair condition:

    E[X − X′ | π] = E[A_{Jπ(J)} + A_{Kπ(K)} − A_{Jπ(K)} − A_{Kπ(J)} | π]
                  = (1/n²) Σ_{j,k=1}^n (A_{jπ(j)} + A_{kπ(k)} − A_{jπ(k)} − A_{kπ(j)})
                  = (2/n)(Y − EY) = (2/n) X
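Since (J, K) ranges over only n² equally likely pairs, the conditional expectation above can be computed exactly for a fixed permutation (my own sketch, not from the slides; numpy, illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 2, 5
A = rng.normal(size=(n, n, d, d))
A = (A + A.transpose(0, 1, 3, 2)) / 2
EY = A.sum(axis=(0, 1)) / n

pi = rng.permutation(n)
Y = sum(A[j, pi[j]] for j in range(n))
X = Y - EY

# Average X - X' over the n^2 equally likely index pairs (J, K),
# where pi' = pi composed with the transposition (J K).
diff_mean = np.zeros((d, d))
for J in range(n):
    for K in range(n):
        pip = pi.copy()
        pip[J], pip[K] = pip[K], pip[J]
        Yp = sum(A[j, pip[j]] for j in range(n))
        diff_mean += (Y - Yp) / n**2

assert np.allclose(diff_mean, (2 / n) * X)
print("E[X - X' | pi] = (2/n) X")
```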
Dependent Sequences: Combinatorial Sums of Matrices

Definition (Combinatorial Matrix Statistic)
Y = Σ_{j=1}^n A_{jπ(j)} with mean EY = (1/n) Σ_{j,k=1}^n A_{jk}

Conditional Variance for X = Y − EY

    Δ_X(π) = (n/4) E[(X − X′)² | π]
           = (1/4n) Σ_{j,k=1}^n [A_{jπ(j)} + A_{kπ(k)} − A_{jπ(k)} − A_{kπ(j)}]²
           ≼ (1/n) Σ_{j,k=1}^n [A_{jπ(j)}² + A_{kπ(k)}² + A_{jπ(k)}² + A_{kπ(j)}²]

⇒ The conditional variance is controlled when the summands are bounded
⇒ Dependent analogues of the concentration and moment inequalities follow
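The final step uses the semidefinite inequality (a + b − c − d)² ≼ 4(a² + b² + c² + d²), a matrix Cauchy–Schwarz-type bound. It can be spot-checked numerically by confirming the gap is positive semidefinite (my own sketch, not from the slides; numpy, random symmetric matrices):

```python
import numpy as np

rng = np.random.default_rng(5)
dim = 3
# Four arbitrary symmetric matrices; the operator bound
# (a + b - c - d)^2 <= 4 (a^2 + b^2 + c^2 + d^2) drives the slide's estimate.
mats = [(lambda M: (M + M.T) / 2)(rng.normal(size=(dim, dim))) for _ in range(4)]
a, b, c, d = mats

S = a + b - c - d
lhs = S @ S
rhs = 4 * sum(M @ M for M in mats)

# rhs - lhs should be positive semidefinite (up to floating-point error).
eigmin = np.linalg.eigvalsh(rhs - lhs).min()
assert eigmin >= -1e-9
print("semidefinite bound verified")
```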
Extensions

General Complex Matrices
Map any matrix B ∈ ℂ^{d₁×d₂} to a Hermitian matrix via the dilation

    D(B) = [[0, B], [B*, 0]] ∈ ℍ^{d₁+d₂}

The dilation preserves spectral information: λ_max(D(B)) = ‖B‖.

Beyond Sums
Matrix-valued functions satisfying a self-reproducing property, e.g., the matrix second-order Rademacher chaos Σ_{j,k} ε_j ε_k A_{jk}. Yields a dependent bounded-differences inequality for matrices.

Generalized Matrix Stein Pairs
Satisfy E[g(X) − g(X′) | Z] = αX almost surely, for g: ℝ → ℝ weakly increasing.
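The dilation trick above is easy to verify directly: the eigenvalues of D(B) are the singular values of B and their negatives, so λ_max(D(B)) equals the spectral norm ‖B‖. A minimal sketch (my own, not from the slides; numpy, illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(6)
d1, d2 = 3, 4
B = rng.normal(size=(d1, d2)) + 1j * rng.normal(size=(d1, d2))

# Hermitian dilation D(B) = [[0, B], [B*, 0]] in H^{d1+d2}
D = np.block([[np.zeros((d1, d1)), B],
              [B.conj().T, np.zeros((d2, d2))]])
assert np.allclose(D, D.conj().T)          # D(B) is Hermitian

# lambda_max(D(B)) equals the spectral norm ||B|| (largest singular value)
lam_max = np.linalg.eigvalsh(D).max()
spec_norm = np.linalg.svd(B, compute_uv=False).max()
assert np.isclose(lam_max, spec_norm)
print("lambda_max(D(B)) = ||B||")
```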
References I

Ahlswede, R. and Winter, A. Strong converse for identification via quantum channels. IEEE Trans. Inform. Theory, 48(3):569–579, Mar. 2002.
Bernstein, S. The Theory of Probabilities. Gastehizdat Publishing House, 1946.
Burkholder, D. L. Distribution function inequalities for martingales. Ann. Probab., 1:19–42, 1973. doi:10.1214/aop/1176997023.
Candès, E. J. and Recht, B. Exact matrix completion via convex optimization. Found. Comput. Math., 9:717–772, 2009.
Candès, E. J. and Tao, T. The power of convex relaxation: Near-optimal matrix completion. IEEE Trans. Inform. Theory, 2009. To appear. Available at arXiv:0903.1476.
Chatterjee, S. Stein's method for concentration inequalities. Probab. Theory Related Fields, 138:305–321, 2007.
Chatterjee, S. Concentration inequalities with exchangeable pairs. PhD thesis, Stanford University, Palo Alto, Feb. 2008. URL arXiv:math/0507526v1.
Cheung, S.-S., So, A. Man-Cho, and Wang, K. Chance-constrained linear matrix inequalities with dependent perturbations: A safe tractable approximation approach. Available at http://www.optimization-online.org/DB_FILE/2011/01/2898.pdf, 2011.
Christofides, D. and Markström, K. Expansion properties of random Cayley graphs and vertex transitive graphs via matrix martingales. Random Struct. Algorithms, 32(1):88–100, 2008.
Drineas, P., Mahoney, M. W., and Muthukrishnan, S. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30:844–881, 2008.
Foygel, R. and Srebro, N. Concentration-based guarantees for low-rank matrix reconstruction. Journal of Machine Learning Research - Proceedings Track, 19:315–340, 2011.
Gross, D. Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory, 57(3):1548–1566, Mar. 2011.
Hoeffding, W. A combinatorial central limit theorem. Ann. Math. Statist., 22:558–566, 1951.
Hoeffding, W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.
References II

Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. Available at arXiv:1104.1672, 2011a.
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. arXiv:1104.1672v3 [math.PR], 2011b.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities. Ann. Probab., 31(2):948–995, 2003.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities II: Applications. Israel J. Math., 167:227–282, 2008.
Lust-Piquard, F. Inégalités de Khintchine dans C_p (1 < p < ∞). C. R. Math. Acad. Sci. Paris, 303(7):289–292, 1986.
Lust-Piquard, F. and Pisier, G. Noncommutative Khintchine and Paley inequalities. Ark. Mat., 29(2):241–260, 1991.
Mackey, L., Talwalkar, A., and Jordan, M. I. Divide-and-conquer matrix factorization. In Shawe-Taylor, J., Zemel, R. S., Bartlett, P. L., Pereira, F. C. N., and Weinberger, K. Q. (eds.), Advances in Neural Information Processing Systems 24, pp. 1134–1142, 2011.
Mackey, L., Jordan, M. I., Chen, R. Y., Farrell, B., and Tropp, J. A. Matrix concentration inequalities via the method of exchangeable pairs. URL http://arxiv.org/abs/1201.6002, 2012.
Negahban, S. and Wainwright, M. J. Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. arXiv:1009.2118v2 [cs.IT], 2010.
Nemirovski, A. Sums of random symmetric matrices and quadratic optimization under orthogonality constraints. Math. Program., 109:283–317, Jan. 2007. doi:10.1007/s10107-006-0033-0. URL http://dl.acm.org/citation.cfm?id=1229716.1229726.
Oliveira, R. I. Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. Available at arXiv:0911.0600, Nov. 2009.
Pisier, G. and Xu, Q. Non-commutative martingale inequalities. Comm. Math. Phys., 189(3):667–698, 1997.
Recht, B. A simpler approach to matrix completion. J. Mach. Learn. Res., 12:3413–3430, 2011.
Rudelson, M. and Vershynin, R. Sampling from large matrices: An approach through geometric functional analysis. J. Assoc. Comput. Mach., 54(4):Article 21, 19 pp., Jul. 2007 (electronic).
So, A. Man-Cho. Moment inequalities for sums of random matrices and their applications in optimization. Math. Program., 130(1):125–151, 2011.
Stein, C. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. 6th Berkeley Symp. Math. Statist. Probab., Berkeley, 1972. Univ. California Press.
Tropp, J. A. User-friendly tail bounds for sums of random matrices. Found. Comput. Math., Aug. 2011.
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 36 35
- Motivation
-
- Matrix Completion
- Matrix Concentration
-
- Extensions
-
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Yk | (Yj)j 6=k] = 0
Matrix Stein Pair for X =sumn
k=1 Yk
Let Y primek and Yk be conditionally iid given (Yj)j 6=k
Draw index K uniformly from 1 nDefine X prime = X + Y prime
K minus YK
Check Stein pair condition
E[X minusX prime | (Yj)jge1] = E[YK minus Y primeK | (Yj)jge1]
=1
n
sumn
k=1
(
Yk minus E[Y primek | (Yj)j 6=k]
)
=1
n
sumn
k=1Yk =
1
nX
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 29 35
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Yk | (Yj)j 6=k] = 0
Conditional Variance for X = Y minus EY
∆X =n
2middot E
[
(X minusX prime)2 | (Yj)jge1
]
=n
2middot E
[
(YK minus Y primeK)
2 | (Yj)jge1
]
=1
2
sumn
k=1
(
Y 2k + E[Y 2
k | (Yj)j 6=k])
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 30 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Given a deterministic array (Ajk)njk=1 of Hermitian matrices and a
uniformly random permutation π on 1 n define thecombinatorial matrix statistic
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Generalizes the scalar statistics studied by Hoeffding (1951)
Example
Sampling without replacement from B1 BnW =
sums
j=1Bπ(j)
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 31 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Matrix Stein Pair for X = Y minus EY
Draw indices (JK) uniformly from 1 n2Define πprime = π (JK) and X prime =
sumnj=1Ajπprime(j) minus EY
Check Stein pair condition
E[X minusX prime | π] = E[
AJπ(J) +AKπ(K) minusAJπ(K) minusAKπ(J) | π]
=1
n2
sumn
jk=1Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
=2
n(Y minus EY ) =
2
nX
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 32 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Conditional Variance for X = Y minus EY
∆X(π) =n
4E[
(X minusX prime)2 | π]
=1
4n
sumn
jk=1
[
Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
]2
41
n
sumn
jk=1
[
A2jπ(j) +A2
kπ(k) +A2jπ(k) +A2
kπ(j)
]
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 33 35
Extensions
Extensions
General Complex Matrices
Map any matrix B isin Cd1timesd2 to a Hermitian matrix via dilation
D(B) =
[
0 B
Blowast0
]
isin Hd1+d2
Preserves spectral information λmax(D(B)) = B
Beyond Sums
Matrix-valued functions satisfying a self-reproducing property
eg Matrix second-order Rademacher chaossum
jk εjεkAjk
Yields a dependent bounded differences inequality for matrices
Generalized Matrix Stein Pairs
Satisfy E[g(X)minus g(X prime) |Z] = αX almost surely forg R rarr R weakly increasingMackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 34 35
Extensions
References IAhlswede R and Winter A Strong converse for identification via quantum channels IEEE Trans Inform Theory 48(3)
569ndash579 Mar 2002
Bernstein S The theory of probabilities Gastehizdat Publishing House 1946
Burkholder D L Distribution function inequalities for martingales Ann Probab 119ndash42 1973 doi101214aop1176997023
Candes E J and Recht B Exact matrix completion via convex optimization Found Comput Math 9717ndash772 2009
Candes E J and Tao T The power of convex relaxation Near-optimal matrix completion IEEE Trans Info Theory 2009URL arXiv09031476 To appear Available at arXiv09031476
Chatterjee S Steinrsquos method for concentration inequalities Probab Theory Related Fields 138305ndash321 2007
Chatterjee S Concentration inequalities with exchangeable pairs PhD thesis Stanford University Palo Alto Feb 2008 URLarxivmath0507526vl
Cheung S-S So A Man-Cho and Wang K Chance-constrained linear matrix inequalities with dependent perturbations Asafe tractable approximation approach Available athttpwwwoptimization-onlineorgDB_FILE2011012898pdf 2011
Christofides D and Markstrom K Expansion properties of random cayley graphs and vertex transitive graphs via matrixmartingales Random Struct Algorithms 32(1)88ndash100 2008
Drineas P Mahoney M W and Muthukrishnan S Relative-error CUR matrix decompositions SIAM Journal on Matrix
Analysis and Applications 30844ndash881 2008
Foygel R and Srebro N Concentration-based guarantees for low-rank matrix reconstruction Journal of Machine Learning
Research - Proceedings Track 19315ndash340 2011
Gross D Recovering low-rank matrices from few coefficients in any basis IEEE Trans Inform Theory 57(3)1548ndash1566 Mar2011
Hoeffding W A combinatorial central limit theorem Ann Math Statist 22558ndash566 1951
Hoeffding W Probability inequalities for sums of bounded random variables Journal of the American Statistical Association58(301)13ndash30 1963
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 35 35
Extensions
References II
Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matrices Available atarXiv11041672 2011a
Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matricesarXiv11041672v3[mathPR] 2011b
Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities Ann Probab 31(2)948ndash995 2003
Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities II Applications Israel J Math 167227ndash282 2008
Lust-Piquard F Inegalites de Khintchine dans Cp (1 lt p lt infin) C R Math Acad Sci Paris 303(7)289ndash292 1986
Lust-Piquard F and Pisier G Noncommutative Khintchine and Paley inequalities Ark Mat 29(2)241ndash260 1991
Mackey L Talwalkar A and Jordan M I Divide-and-conquer matrix factorization In Shawe-Taylor J Zemel R SBartlett P L Pereira F C N and Weinberger K Q (eds) Advances in Neural Information Processing Systems 24 pp1134ndash1142 2011
Mackey L Jordan M I Chen R Y Farrell B and Tropp J A Matrix concentration inequalities via the method ofexchangeable pairs URL httparxivorgabs12016002 2012
Negahban S and Wainwright M J Restricted strong convexity and weighted matrix completion Optimal bounds with noisearXiv10092118v2[csIT] 2010
Nemirovski A Sums of random symmetric matrices and quadratic optimization under orthogonality constraints Math
Program 109283ndash317 January 2007 ISSN 0025-5610 doi 101007s10107-006-0033-0 URLhttpdlacmorgcitationcfmid=12297161229726
Oliveira R I Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges Availableat arXiv09110600 Nov 2009
Pisier G and Xu Q Non-commutative martingale inequalities Comm Math Phys 189(3)667ndash698 1997
Recht B Simpler approach to matrix completion J Mach Learn Res 123413ndash3430 2011
Rudelson M and Vershynin R Sampling from large matrices An approach through geometric functional analysis J Assoc
Comput Mach 54(4)Article 21 19 pp Jul 2007 (electronic)
So A Man-Cho Moment inequalities for sums of random matrices and their applications in optimization Math Program 130(1)125ndash151 2011
Stein C A bound for the error in the normal approximation to the distribution of a sum of dependent random variables InProc 6th Berkeley Symp Math Statist Probab Berkeley 1972 Univ California Press
Tropp J A User-friendly tail bounds for sums of random matrices Found Comput Math August 2011
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 36 35
- Motivation
-
- Matrix Completion
- Matrix Concentration
-
- Extensions
-
Dependent Sequences Sums of Conditionally Zero-mean Matrices
Sums of Conditionally Zero-mean Matrices
Definition (Conditional Zero Mean Property)
E[Yk | (Yj)j 6=k] = 0
Conditional Variance for X = Y minus EY
∆X =n
2middot E
[
(X minusX prime)2 | (Yj)jge1
]
=n
2middot E
[
(YK minus Y primeK)
2 | (Yj)jge1
]
=1
2
sumn
k=1
(
Y 2k + E[Y 2
k | (Yj)j 6=k])
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 30 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Given a deterministic array (Ajk)njk=1 of Hermitian matrices and a
uniformly random permutation π on 1 n define thecombinatorial matrix statistic
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Generalizes the scalar statistics studied by Hoeffding (1951)
Example
Sampling without replacement from B1 BnW =
sums
j=1Bπ(j)
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 31 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Matrix Stein Pair for X = Y minus EY
Draw indices (JK) uniformly from 1 n2Define πprime = π (JK) and X prime =
sumnj=1Ajπprime(j) minus EY
Check Stein pair condition
E[X minusX prime | π] = E[
AJπ(J) +AKπ(K) minusAJπ(K) minusAKπ(J) | π]
=1
n2
sumn
jk=1Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
=2
n(Y minus EY ) =
2
nX
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 32 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Conditional Variance for X = Y minus EY
∆X(π) =n
4E[
(X minusX prime)2 | π]
=1
4n
sumn
jk=1
[
Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
]2
41
n
sumn
jk=1
[
A2jπ(j) +A2
kπ(k) +A2jπ(k) +A2
kπ(j)
]
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 33 35
Extensions
Extensions
General Complex Matrices
Map any matrix B isin Cd1timesd2 to a Hermitian matrix via dilation
D(B) =
[
0 B
Blowast0
]
isin Hd1+d2
Preserves spectral information λmax(D(B)) = B
Beyond Sums
Matrix-valued functions satisfying a self-reproducing property
eg Matrix second-order Rademacher chaossum
jk εjεkAjk
Yields a dependent bounded differences inequality for matrices
Generalized Matrix Stein Pairs
Satisfy E[g(X)minus g(X prime) |Z] = αX almost surely forg R rarr R weakly increasingMackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 34 35
Extensions
References IAhlswede R and Winter A Strong converse for identification via quantum channels IEEE Trans Inform Theory 48(3)
569ndash579 Mar 2002
Bernstein S The theory of probabilities Gastehizdat Publishing House 1946
Burkholder D L Distribution function inequalities for martingales Ann Probab 119ndash42 1973 doi101214aop1176997023
Candes E J and Recht B Exact matrix completion via convex optimization Found Comput Math 9717ndash772 2009
Candes E J and Tao T The power of convex relaxation Near-optimal matrix completion IEEE Trans Info Theory 2009URL arXiv09031476 To appear Available at arXiv09031476
Chatterjee S Steinrsquos method for concentration inequalities Probab Theory Related Fields 138305ndash321 2007
Chatterjee S Concentration inequalities with exchangeable pairs PhD thesis Stanford University Palo Alto Feb 2008 URLarxivmath0507526vl
Cheung S-S So A Man-Cho and Wang K Chance-constrained linear matrix inequalities with dependent perturbations Asafe tractable approximation approach Available athttpwwwoptimization-onlineorgDB_FILE2011012898pdf 2011
Christofides D and Markstrom K Expansion properties of random cayley graphs and vertex transitive graphs via matrixmartingales Random Struct Algorithms 32(1)88ndash100 2008
Drineas P Mahoney M W and Muthukrishnan S Relative-error CUR matrix decompositions SIAM Journal on Matrix
Analysis and Applications 30844ndash881 2008
Foygel R and Srebro N Concentration-based guarantees for low-rank matrix reconstruction Journal of Machine Learning
Research - Proceedings Track 19315ndash340 2011
Gross D Recovering low-rank matrices from few coefficients in any basis IEEE Trans Inform Theory 57(3)1548ndash1566 Mar2011
Hoeffding W A combinatorial central limit theorem Ann Math Statist 22558ndash566 1951
Hoeffding W Probability inequalities for sums of bounded random variables Journal of the American Statistical Association58(301)13ndash30 1963
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 35 35
Extensions
References II
Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matrices Available atarXiv11041672 2011a
Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matricesarXiv11041672v3[mathPR] 2011b
Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities Ann Probab 31(2)948ndash995 2003
Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities II Applications Israel J Math 167227ndash282 2008
Lust-Piquard F Inegalites de Khintchine dans Cp (1 lt p lt infin) C R Math Acad Sci Paris 303(7)289ndash292 1986
Lust-Piquard F and Pisier G Noncommutative Khintchine and Paley inequalities Ark Mat 29(2)241ndash260 1991
Mackey L Talwalkar A and Jordan M I Divide-and-conquer matrix factorization In Shawe-Taylor J Zemel R SBartlett P L Pereira F C N and Weinberger K Q (eds) Advances in Neural Information Processing Systems 24 pp1134ndash1142 2011
Mackey L Jordan M I Chen R Y Farrell B and Tropp J A Matrix concentration inequalities via the method ofexchangeable pairs URL httparxivorgabs12016002 2012
Negahban S and Wainwright M J Restricted strong convexity and weighted matrix completion Optimal bounds with noisearXiv10092118v2[csIT] 2010
Nemirovski A Sums of random symmetric matrices and quadratic optimization under orthogonality constraints Math
Program 109283ndash317 January 2007 ISSN 0025-5610 doi 101007s10107-006-0033-0 URLhttpdlacmorgcitationcfmid=12297161229726
Oliveira R I Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges Availableat arXiv09110600 Nov 2009
Pisier G and Xu Q Non-commutative martingale inequalities Comm Math Phys 189(3)667ndash698 1997
Recht B Simpler approach to matrix completion J Mach Learn Res 123413ndash3430 2011
Rudelson M and Vershynin R Sampling from large matrices An approach through geometric functional analysis J Assoc
Comput Mach 54(4)Article 21 19 pp Jul 2007 (electronic)
So A Man-Cho Moment inequalities for sums of random matrices and their applications in optimization Math Program 130(1)125ndash151 2011
Stein C A bound for the error in the normal approximation to the distribution of a sum of dependent random variables InProc 6th Berkeley Symp Math Statist Probab Berkeley 1972 Univ California Press
Tropp J A User-friendly tail bounds for sums of random matrices Found Comput Math August 2011
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 36 35
- Motivation
-
- Matrix Completion
- Matrix Concentration
-
- Extensions
-
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Given a deterministic array (Ajk)njk=1 of Hermitian matrices and a
uniformly random permutation π on 1 n define thecombinatorial matrix statistic
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Generalizes the scalar statistics studied by Hoeffding (1951)
Example
Sampling without replacement from B1 BnW =
sums
j=1Bπ(j)
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 31 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Matrix Stein Pair for X = Y minus EY
Draw indices (JK) uniformly from 1 n2Define πprime = π (JK) and X prime =
sumnj=1Ajπprime(j) minus EY
Check Stein pair condition
E[X minusX prime | π] = E[
AJπ(J) +AKπ(K) minusAJπ(K) minusAKπ(J) | π]
=1
n2
sumn
jk=1Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
=2
n(Y minus EY ) =
2
nX
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 32 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Conditional Variance for X = Y minus EY
∆X(π) =n
4E[
(X minusX prime)2 | π]
=1
4n
sumn
jk=1
[
Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
]2
41
n
sumn
jk=1
[
A2jπ(j) +A2
kπ(k) +A2jπ(k) +A2
kπ(j)
]
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 33 35
Extensions
Extensions
General Complex Matrices
Map any matrix B isin Cd1timesd2 to a Hermitian matrix via dilation
D(B) =
[
0 B
Blowast0
]
isin Hd1+d2
Preserves spectral information λmax(D(B)) = B
Beyond Sums
Matrix-valued functions satisfying a self-reproducing property
eg Matrix second-order Rademacher chaossum
jk εjεkAjk
Yields a dependent bounded differences inequality for matrices
Generalized Matrix Stein Pairs
Satisfy E[g(X)minus g(X prime) |Z] = αX almost surely forg R rarr R weakly increasingMackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 34 35
Extensions
References IAhlswede R and Winter A Strong converse for identification via quantum channels IEEE Trans Inform Theory 48(3)
569ndash579 Mar 2002
Bernstein S The theory of probabilities Gastehizdat Publishing House 1946
Burkholder D L Distribution function inequalities for martingales Ann Probab 119ndash42 1973 doi101214aop1176997023
Candes E J and Recht B Exact matrix completion via convex optimization Found Comput Math 9717ndash772 2009
Candes E J and Tao T The power of convex relaxation Near-optimal matrix completion IEEE Trans Info Theory 2009URL arXiv09031476 To appear Available at arXiv09031476
Chatterjee S Steinrsquos method for concentration inequalities Probab Theory Related Fields 138305ndash321 2007
Chatterjee S Concentration inequalities with exchangeable pairs PhD thesis Stanford University Palo Alto Feb 2008 URLarxivmath0507526vl
Cheung S-S So A Man-Cho and Wang K Chance-constrained linear matrix inequalities with dependent perturbations Asafe tractable approximation approach Available athttpwwwoptimization-onlineorgDB_FILE2011012898pdf 2011
Christofides D and Markstrom K Expansion properties of random cayley graphs and vertex transitive graphs via matrixmartingales Random Struct Algorithms 32(1)88ndash100 2008
Drineas P Mahoney M W and Muthukrishnan S Relative-error CUR matrix decompositions SIAM Journal on Matrix
Analysis and Applications 30844ndash881 2008
Foygel R and Srebro N Concentration-based guarantees for low-rank matrix reconstruction Journal of Machine Learning
Research - Proceedings Track 19315ndash340 2011
Gross D Recovering low-rank matrices from few coefficients in any basis IEEE Trans Inform Theory 57(3)1548ndash1566 Mar2011
Hoeffding W A combinatorial central limit theorem Ann Math Statist 22558ndash566 1951
Hoeffding W Probability inequalities for sums of bounded random variables Journal of the American Statistical Association58(301)13ndash30 1963
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 35 35
Extensions
References II
Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matrices Available atarXiv11041672 2011a
Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matricesarXiv11041672v3[mathPR] 2011b
Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities Ann Probab 31(2)948ndash995 2003
Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities II Applications Israel J Math 167227ndash282 2008
Lust-Piquard F Inegalites de Khintchine dans Cp (1 lt p lt infin) C R Math Acad Sci Paris 303(7)289ndash292 1986
Lust-Piquard F and Pisier G Noncommutative Khintchine and Paley inequalities Ark Mat 29(2)241ndash260 1991
Mackey L Talwalkar A and Jordan M I Divide-and-conquer matrix factorization In Shawe-Taylor J Zemel R SBartlett P L Pereira F C N and Weinberger K Q (eds) Advances in Neural Information Processing Systems 24 pp1134ndash1142 2011
Mackey L Jordan M I Chen R Y Farrell B and Tropp J A Matrix concentration inequalities via the method ofexchangeable pairs URL httparxivorgabs12016002 2012
Negahban S and Wainwright M J Restricted strong convexity and weighted matrix completion Optimal bounds with noisearXiv10092118v2[csIT] 2010
Nemirovski A Sums of random symmetric matrices and quadratic optimization under orthogonality constraints Math
Program 109283ndash317 January 2007 ISSN 0025-5610 doi 101007s10107-006-0033-0 URLhttpdlacmorgcitationcfmid=12297161229726
Oliveira R I Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges Availableat arXiv09110600 Nov 2009
Pisier G and Xu Q Non-commutative martingale inequalities Comm Math Phys 189(3)667ndash698 1997
Recht B Simpler approach to matrix completion J Mach Learn Res 123413ndash3430 2011
Rudelson M and Vershynin R Sampling from large matrices An approach through geometric functional analysis J Assoc
Comput Mach 54(4)Article 21 19 pp Jul 2007 (electronic)
So A Man-Cho Moment inequalities for sums of random matrices and their applications in optimization Math Program 130(1)125ndash151 2011
Stein C A bound for the error in the normal approximation to the distribution of a sum of dependent random variables InProc 6th Berkeley Symp Math Statist Probab Berkeley 1972 Univ California Press
Tropp J A User-friendly tail bounds for sums of random matrices Found Comput Math August 2011
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 36 35
- Motivation
-
- Matrix Completion
- Matrix Concentration
-
- Extensions
-
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Matrix Stein Pair for X = Y minus EY
Draw indices (JK) uniformly from 1 n2Define πprime = π (JK) and X prime =
sumnj=1Ajπprime(j) minus EY
Check Stein pair condition
E[X minusX prime | π] = E[
AJπ(J) +AKπ(K) minusAJπ(K) minusAKπ(J) | π]
=1
n2
sumn
jk=1Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
=2
n(Y minus EY ) =
2
nX
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 32 35
Dependent Sequences Combinatorial Sums
Combinatorial Sums of Matrices
Definition (Combinatorial Matrix Statistic)
Y =sumn
j=1Ajπ(j) with mean EY =
1
n
sumn
jk=1Ajk
Conditional Variance for X = Y minus EY
∆X(π) =n
4E[
(X minusX prime)2 | π]
=1
4n
sumn
jk=1
[
Ajπ(j) +Akπ(k) minusAjπ(k) minusAkπ(j)
]2
41
n
sumn
jk=1
[
A2jπ(j) +A2
kπ(k) +A2jπ(k) +A2
kπ(j)
]
rArr Conditional variance controlled when summands are bounded
rArr Dependent analogues of concentration and moment inequalities
Mackey (Stanford) Steinrsquos Method for Matrix Concentration December 10 2012 33 35
Extensions
Extensions
General Complex Matrices
Map any matrix B isin Cd1timesd2 to a Hermitian matrix via dilation
D(B) =
[
0 B
Blowast0
]
isin Hd1+d2
Preserves spectral information λmax(D(B)) = B
Beyond Sums
Matrix-valued functions satisfying a self-reproducing property,
e.g., the matrix second-order Rademacher chaos \sum_{j,k} \varepsilon_j \varepsilon_k A_{jk}
Yields a dependent bounded-differences inequality for matrices

Generalized Matrix Stein Pairs
Satisfy \mathbb{E}[g(X) - g(X') \mid Z] = \alpha X almost surely, for g: \mathbb{R} \to \mathbb{R} weakly increasing

Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 34 / 35
Extensions
References I
Ahlswede, R. and Winter, A. Strong converse for identification via quantum channels. IEEE Trans. Inform. Theory, 48(3):569–579, Mar. 2002.
Bernstein, S. The Theory of Probabilities. Gastehizdat Publishing House, 1946.
Burkholder, D. L. Distribution function inequalities for martingales. Ann. Probab., 1:19–42, 1973. doi: 10.1214/aop/1176997023.
Candès, E. J. and Recht, B. Exact matrix completion via convex optimization. Found. Comput. Math., 9:717–772, 2009.
Candès, E. J. and Tao, T. The power of convex relaxation: Near-optimal matrix completion. IEEE Trans. Inform. Theory, 2009. To appear. Available at arXiv:0903.1476.
Chatterjee, S. Stein's method for concentration inequalities. Probab. Theory Related Fields, 138:305–321, 2007.
Chatterjee, S. Concentration inequalities with exchangeable pairs. PhD thesis, Stanford University, Palo Alto, Feb. 2008. Available at arXiv:math/0507526v1.
Cheung, S.-S., So, A. M.-C., and Wang, K. Chance-constrained linear matrix inequalities with dependent perturbations: A safe tractable approximation approach. Available at http://www.optimization-online.org/DB_FILE/2011/01/2898.pdf, 2011.
Christofides, D. and Markström, K. Expansion properties of random Cayley graphs and vertex transitive graphs via matrix martingales. Random Struct. Algorithms, 32(1):88–100, 2008.
Drineas, P., Mahoney, M. W., and Muthukrishnan, S. Relative-error CUR matrix decompositions. SIAM J. Matrix Anal. Appl., 30:844–881, 2008.
Foygel, R. and Srebro, N. Concentration-based guarantees for low-rank matrix reconstruction. J. Mach. Learn. Res. Proceedings Track, 19:315–340, 2011.
Gross, D. Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory, 57(3):1548–1566, Mar. 2011.
Hoeffding, W. A combinatorial central limit theorem. Ann. Math. Statist., 22:558–566, 1951.
Hoeffding, W. Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc., 58(301):13–30, 1963.
Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 35 / 35
Extensions
References II
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. Available at arXiv:1104.1672, 2011a.
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. arXiv:1104.1672v3 [math.PR], 2011b.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities. Ann. Probab., 31(2):948–995, 2003.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities II: Applications. Israel J. Math., 167:227–282, 2008.
Lust-Piquard, F. Inégalités de Khintchine dans C_p (1 < p < ∞). C. R. Math. Acad. Sci. Paris, 303(7):289–292, 1986.
Lust-Piquard, F. and Pisier, G. Noncommutative Khintchine and Paley inequalities. Ark. Mat., 29(2):241–260, 1991.
Mackey, L., Talwalkar, A., and Jordan, M. I. Divide-and-conquer matrix factorization. In Shawe-Taylor, J., Zemel, R. S., Bartlett, P. L., Pereira, F. C. N., and Weinberger, K. Q. (eds.), Advances in Neural Information Processing Systems 24, pp. 1134–1142, 2011.
Mackey, L., Jordan, M. I., Chen, R. Y., Farrell, B., and Tropp, J. A. Matrix concentration inequalities via the method of exchangeable pairs. Available at http://arxiv.org/abs/1201.6002, 2012.
Negahban, S. and Wainwright, M. J. Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. arXiv:1009.2118v2 [cs.IT], 2010.
Nemirovski, A. Sums of random symmetric matrices and quadratic optimization under orthogonality constraints. Math. Program., 109:283–317, January 2007. doi: 10.1007/s10107-006-0033-0.
Oliveira, R. I. Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. Available at arXiv:0911.0600, Nov. 2009.
Pisier, G. and Xu, Q. Non-commutative martingale inequalities. Comm. Math. Phys., 189(3):667–698, 1997.
Recht, B. A simpler approach to matrix completion. J. Mach. Learn. Res., 12:3413–3430, 2011.
Rudelson, M. and Vershynin, R. Sampling from large matrices: An approach through geometric functional analysis. J. ACM, 54(4):Article 21, Jul. 2007.
So, A. M.-C. Moment inequalities for sums of random matrices and their applications in optimization. Math. Program., 130(1):125–151, 2011.
Stein, C. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. 6th Berkeley Symp. Math. Statist. Probab., Berkeley, 1972. Univ. California Press.
Tropp, J. A. User-friendly tail bounds for sums of random matrices. Found. Comput. Math., August 2011.
Mackey (Stanford) Stein's Method for Matrix Concentration December 10, 2012 36 / 35