
Matrix norms Local norm regularization Constructive regularization Sub-block regularization Appendix

Constructive regularization of the random matrix norm.

Liza Rebrova

University of California Los Angeles

Structural inference in High Dimensional Models workshop, September 2018


Non-asymptotic random matrix theory framework

A = (A_ij)_{n×m}, with entries A_ij drawn from some distribution.

Usually, we have

• no specific distribution assumption

• no symmetry assumption

• high probability results (hold with probability 1− o(1))

• for the dimensions large enough (all large matrices with n, m > N_0)

(Figures: concentration on the sphere; a convex set in R^n. Right picture is taken from “Estimation in high dimensions” by R. Vershynin.)


Operator (spectral) norm

By definition,

‖A‖ := sup_{‖x‖_2=1} ‖Ax‖_2 = sup_{u,v ∈ S^{n−1}} |⟨Au, v⟩| = s_1(A)

Norm of the inverse: 1/‖A^{−1}‖ = inf_{‖x‖_2=1} ‖Ax‖_2 = s_n(A)

Singular values are the real spectrum of the matrix:

s(A) = √eig(AᵀA), s_1 ≥ s_2 ≥ … ≥ s_n ≥ 0.

Key idea: spectrum stabilizes as the size of the matrices →∞

(Figures: eigenvalues of a Wigner matrix; real vs complex eigenvalues of a gaussian matrix. Left picture is taken from “Estimation in high dimensions” by R. Vershynin.)
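These definitions are easy to check numerically; a small NumPy sketch (not from the talk; the seed and size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
A = rng.standard_normal((n, n))          # i.i.d. N(0,1) entries

s = np.linalg.svd(A, compute_uv=False)   # singular values s_1 >= ... >= s_n
op_norm = np.linalg.norm(A, 2)           # operator (spectral) norm = s_1(A)

assert np.isclose(op_norm, s[0])
assert np.all(np.diff(s) <= 1e-9)        # returned in non-increasing order
print(op_norm / np.sqrt(n))              # concentrates near 2 for a gaussian matrix
```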


What is optimal norm order?

Let A = (Aij)n×n be a square random matrix with i.i.d. entries.

• Gaussian: for any t ≥ 0, s_1(A) ≤ 2√n + t with probability 1 − 2e^{−t²/2} (from Gordon's theorem).

• Subgaussian: for any t ≥ C_0, s_1(A) ≤ t√n with probability 1 − e^{−ct²n} (from Bernstein's inequality).

(Figure: singular value distributions; blue: gaussian, red: subgaussian, green: heavy-tailed. Picture taken from D. Mixon's blog “Short, fat matrices”.)

Def.: the A_ij are subgaussian if P{|A_ij| > t} ≤ C_1 e^{−c_2 t²} for any t > 0.


Not an optimal order

Light tails ((sub)gaussian, 4 finite moments): with high probability,

‖A‖ = s_max(A) ∼ √n and s_min(A) ∼ 1/√n.

Heavy tails (2 finite moments): with high probability,

‖A‖ = s_max(A) ≫ √n and s_min(A) ∼ 1/√n.

Example (‖A‖ ∼ n ≫ √n)

• Litvak-Spector: constructive example of ‖A‖ ∼ O(n^{1−β}) for any β ≥ 0, with probability at least 1/2.

• Bai-Silverstein-Yin: 4 finite moments are needed for ‖A‖ ∼ √n.

For the all-ones matrix and the unit test vector x = (n^{−1/2}, …, n^{−1/2})ᵀ:

sup_{‖x‖=1} ‖Ax‖ ≥ ‖(1)_{n×n} · (n^{−1/2}, …, n^{−1/2})ᵀ‖_2 = n
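The displayed lower bound can be reproduced directly; a minimal sketch, with the all-ones matrix standing in for the worst case:

```python
import numpy as np

n = 100
A = np.ones((n, n))            # matrix of all ones
x = np.full(n, n ** -0.5)      # the unit vector (n^{-1/2}, ..., n^{-1/2})

assert np.isclose(np.linalg.norm(x), 1.0)
val = np.linalg.norm(A @ x)    # ||Ax||_2 = n, far above sqrt(n)
print(val)                     # approximately 100 (= n)
```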


Local norm regularization

Questions:

1. Can we regularize the norm by correcting just a small fraction of the entries of A?

2. What in the structure of a heavy-tailed matrix causes the norm to blow up from the “ideal” order O(√n)?

Local regularization: A ↦ Ã, such that

• Ã differs from A in a small εn × εn sub-matrix

• ‖Ã‖ ≲ √n

Theorem (with R. Vershynin, informal statement)

Let A be a large enough random square matrix with i.i.d. entries. Local regularization is possible with high probability ⟺

EA_ij = 0 and EA_ij^2 is bounded.


Local norm regularization

Theorem (Part 1: local obstructions)

Let A = (A_ij)_{n×n} have i.i.d. entries such that EA_ij = 0, EA_ij^2 = 1. For any ε ∈ (0, 1/6],

with probability ≥ 1 − 11e^{−εn/12}

there exists an εn × εn sub-matrix A_0 ⊂ A:

‖A \ A_0‖ ≤ C_ε √n, where C_ε = C · ln(ε^{−1})/√ε

(Figure: an n × n matrix with an εn × εn corner block A_0. Here A \ A_0 means: zero out all entries in A_0.)

• log-optimal dependence of C_ε on the size ε

• can take any ε < 1 at the cost of larger constants

• non-constructive: does not identify A_0


Proof idea

“Ideal” norm relation?

‖A‖ ≲ ‖A‖_{∞→2}/√n ≲ ‖A‖_{2→∞} ≲ √n

Definition

‖A‖_{∞→2} := ‖A : l_∞ → l_2‖ = max_{x∈{−1,1}^n} ‖Ax‖_2 (cut norm)

‖A‖_{2→∞} := ‖A : l_2 → l_∞‖ = max_i ‖row_i(A)‖_2 (max row norm)

Example (true for gaussian matrices)

For a gaussian matrix (i.i.d. N(0,1) entries) we have:

‖A‖_{2→∞} ∼ √n, ‖A‖_{∞→2} ∼ n, ‖A‖ ∼ √n
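For small n the two auxiliary norms can be brute-forced and compared with the deterministic sandwich ‖A‖_{∞→2}/√n ≤ ‖A‖ ≤ ‖A‖_{∞→2}; a sketch (the 2^n loop over sign vectors is only feasible for tiny n):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
n = 10
A = rng.standard_normal((n, n))

norm_2_to_inf = np.max(np.linalg.norm(A, axis=1))    # max row L2-norm
norm_inf_to_2 = max(np.linalg.norm(A @ np.array(x))  # max over all sign vectors
                    for x in product([-1.0, 1.0], repeat=n))
op_norm = np.linalg.norm(A, 2)

assert norm_inf_to_2 / np.sqrt(n) <= op_norm + 1e-9  # standard estimate, lower side
assert op_norm <= norm_inf_to_2 + 1e-9               # standard estimate, upper side
assert norm_2_to_inf <= op_norm + 1e-9               # ||A|| dominates every row norm
```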


Proof idea

“Ideal” norm relation? The chain ‖A‖ ≲ ‖A‖_{∞→2}/√n ≲ ‖A‖_{2→∞} ≲ √n is crossed out:

not true for heavy tails :) Instead, we prove

‖A_{J_3^c}‖ ≲ ‖A_{J_2^c}‖_{∞→2}/√n ≲ ‖A_{J_1^c}‖_{2→∞} ≲ √n,

where J_1 ⊂ J_2 ⊂ J_3 (|J_i| ≤ εn) are small column subsets to remove

• (*) an εn-column cut ∼ an εn × εn sub-matrix cut

• (**) last step: Grothendieck-Pietsch factorization for matrices (non-constructive!)


(*) An εn-column cut is as good

It is enough to show that an εn-column cut regularizes the norm:

(Figure: the column cut (green) and the row cut (brown) each satisfy ‖green‖ ≤ √n and ‖brown‖ ≤ √n; added together they cover everything outside the εn × εn intersection, with ‖dashed‖ ≤ 2√n.)


(**) Grothendieck-Pietsch factorization

Standard estimate: (1/√n)‖B‖_{∞→2} ≤ ‖B‖ ≤ ‖B‖_{∞→2}

We want: ‖A_{J_3^c}‖ ≲ (1/√n)‖A_{J_2^c}‖_{∞→2} with high probability

Theorem (Grothendieck-Pietsch, sub-matrix version)

Let B be an n × n_1 real matrix and ε > 0. Then there exists J ⊂ [n_1] with |J| ≥ (1 − ε)n_1 such that

‖B_{[n]×J}‖ ≤ 2‖B‖_{∞→2}/√(εn_1).

We apply it to B = A_{J_2^c} with n_1 = (1 − ε/2)n to find |J| ≥ (1 − ε)n.


Constructive regularization

Question:

1. Can we regularize the norm by correcting just a small fraction of the entries of A?

Yes, iff EA_ij = 0, EA_ij^2 = 1.

2. What in the structure of a heavy-tailed matrix causes the norm to blow up from the “ideal” order O(√n)?

Or: how to perform the local regularization constructively?


Individual entries correction

Theorem (more than 2 moments, any δ > 0)

Let A = (A_ij)_{n×n} have i.i.d. entries such that EA_ij = 0, E|A_ij|^{2+δ} ≤ 1. With high probability, zeroing out the n^{1−δ/9} largest entries of A leads to

‖A‖ ≤ 8√n.

The proof is based on the Bandeira-van Handel theorem: for any γ > 1,

‖A‖ ≤ γ · σ + t with probability 1 − n exp(−t²/(c_γ σ_*²)),

where

• σ is the max expected row/column norm: σ_row² = max_i Σ_j EA_ij²

• σ_* is the max entry: σ_* = max_ij ‖A_ij‖_∞

We have t, σ ∼ √n, and σ_* ≪ √n.
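A simulation sketch of this recipe (the Student-t entries with 2 + δ moments, the seed, and the size are illustrative stand-ins; the exponent n^{1−δ/9} is the one from the theorem):

```python
import numpy as np

rng = np.random.default_rng(2)
n, delta = 1000, 0.5
A = rng.standard_t(df=2 + delta, size=(n, n))     # heavy-tailed, 2+delta moments

k = int(n ** (1 - delta / 9))                     # how many entries to zero out
flat = np.argsort(np.abs(A), axis=None)[-k:]      # indices of the k largest entries
A_reg = A.copy()
A_reg[np.unravel_index(flat, A.shape)] = 0.0

before = np.linalg.norm(A, 2) / np.sqrt(n)
after = np.linalg.norm(A_reg, 2) / np.sqrt(n)
print(before, after)                              # the second ratio is smaller
```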


If we have just finite 2nd moment...

Matrix Bernstein inequality: zeroing out a few entries gives ‖A‖ ≲ ln n · √n.

Example (failure of the individual-corrections approach)

Consider the scaled Bernoulli matrix A_ij ∼ √n · Ber(1/n).

• There is a row with at least (ln n / ln ln n) non-zero elements w.h.p. So norm regularization is needed, as

‖A‖ ≥ ‖A_i‖_2 ≫ √n

• The entries are {0, √n}, so looking at the size only, we can only delete all or nothing.

• There are too many non-zero entries to fit into an εn × εn sub-matrix.

We need to use information about the locations of the entries with respect to each other (in the given realization).


Constructive norm regularization

Theorem (Main)

Let A = (A_ij)_{n×n} have i.i.d. symmetrically distributed entries such that EA_ij^2 = 1. For any ε ∈ (0, 1/6] and r > 1,

with probability ≥ 1 − n^{0.1−r}

zeroing out the εn rows and the εn columns with the largest L_2-norms leads to the matrix Ã:

‖Ã‖ ≤ C√(c_ε ln ln n · n), where c_ε = ln(ε^{−1})/ε

• a simple & constructive way to regularize the norm

• a better description of the obstructions (to the good norm)

• extra ln ln n term and a symmetry assumption
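A sketch of the theorem's recipe in NumPy (the t-distribution, its normalization, ε, and the seed are illustrative choices, not from the talk):

```python
import numpy as np

def regularize(A, eps):
    """Zero out the eps*n rows and eps*n columns with the largest L2-norms."""
    n = A.shape[0]
    k = int(eps * n)
    heavy_rows = np.argsort(np.linalg.norm(A, axis=1))[-k:]
    heavy_cols = np.argsort(np.linalg.norm(A, axis=0))[-k:]
    A_reg = A.copy()
    A_reg[heavy_rows, :] = 0.0
    A_reg[:, heavy_cols] = 0.0
    return A_reg

rng = np.random.default_rng(3)
n, df = 1000, 2.1
A = rng.standard_t(df=df, size=(n, n))   # symmetric, barely two moments
A /= np.sqrt(df / (df - 2))              # normalize so that E A_ij^2 = 1
A_reg = regularize(A, eps=0.1)

print(np.linalg.norm(A, 2) / np.sqrt(n), np.linalg.norm(A_reg, 2) / np.sqrt(n))
```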


Constructive norm regularization

Theorem (Main, equivalent version)

Let A = (A_ij)_{n×n} have i.i.d. symmetrically distributed entries such that EA_ij^2 = 1. For any ε ∈ (0, 1/6] and r > 1,

with probability ≥ 1 − n^{0.1−r}

zeroing out any product subset of the entries, such that on the rest all rows and columns have ‖row_i(Ã)‖_2, ‖col_i(Ã)‖_2 ≤ C√(c_ε n),

produces Ã:

‖Ã‖ ≤ C√(c_ε ln ln n · n), where c_ε = ln(ε^{−1})/ε

• a simple & constructive way to regularize the norm

• a better description of the obstructions (to the good norm)

• extra ln ln n term and a symmetry assumption


Proof background: Bernoulli matrices

B is an n × n matrix with 0-1 entries, EB_ij = p.

E(B_ij − EB_ij)² ∼ p, so the optimal norm is ‖B − EB‖ ∼ √(np).

This is known:

• (Feige-Ofek) if pn ≳ ln n, then ‖B − EB‖ ∼ √(np) w.h.p.

• (Krivelevich-Sudakov) if pn ≪ ln n: no, there is a counterexample.

Note: p = 1/n means exactly 2 finite (constant) moments.

Regularization for the sparse case:

• (Feige-Ofek) zero out all rows and columns of A with more than 10d non-zero elements,

• (Le-Levina-Vershynin) reweight or zero out some elements so that the sum of elements in every row and column is at most 10d,

where d := E{number of non-zero elements in a row/column}.

(The same regularization is used in the general case with d := E{L_2-norm of a row/column}.)


Constructive regularization: proof ideas

High-level idea: split

|A_ij| ∼ Σ_k 2^k 1{|A_ij| ∈ (2^{k−1}, 2^k]} = Σ_k 2^k B^k_ij

and apply the Bernoulli results at each “level”.

Some challenges:

1. Even though the rows/columns of A have bounded L_2-norms, some levels can be too heavy (compensated by other, lighter ones)

2. Pass to absolute values (we cannot directly approximate ‖|A|‖, it is too large: mean zero is needed for the local regularization)

3. Consider as few levels as possible
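The level splitting above can be written down in a few lines; a sketch (the t-distribution and sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_t(df=2.5, size=(200, 200))
absA = np.abs(A)

kmax = int(np.ceil(np.log2(absA.max())))
levels = {k: ((absA > 2 ** (k - 1)) & (absA <= 2 ** k)).astype(float)
          for k in range(kmax + 1)}       # B^k: 0-1 matrix of level-k entries

# Each entry of |A| is matched, within a factor of 2, by sum_k 2^k B^k.
approx = sum((2.0 ** k) * B for k, B in levels.items())
mask = absA > 0.5                          # entries covered by the levels k >= 0
assert np.all(approx[mask] >= absA[mask] - 1e-9)
assert np.all(approx[mask] <= 2 * absA[mask] + 1e-9)
```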


1. Bernoulli matrix decomposition

With probability 1 − 3n^{−r} the entries of B split as B = B_1 ⊔ B_2 ⊔ B_3:

• #(row_i(B_1)) ≲ rnp, #(col_i(B_1)) ≲ rnp: bounded rows & columns

• #(row_i(B_2)) ≲ r: very sparse rows

• #(col_i(B_3)) ≲ r: very sparse columns

(Figure: recursive splitting of B into the blocks B_1, B_2, B_3.)

Lemma (1, pn > 4)

In every n2^{−k} × n2^{−k} submatrix of B there are at most 2^{−k}/p columns with > C_1 rnp non-zeros.

Lemma (2, pn > 4)

In every n2^{−k} × 2^{−k}/p submatrix of B there are at most 2^{−k}n/4 columns with > C_2 r non-zeros.



1. Bernoulli matrix decomposition (sparse case)

In the sparse regime the analogous lemmas read:

Lemma (1, pn ≤ 4)

In every n2^{−k} × n2^{−k} submatrix of B there are at most n2^{−k−1} columns with > C_1 r non-zeros.

Lemma (2, pn ≤ 4)

In every n2^{−k} × n2^{−k−1} submatrix of B there are at most n2^{−k−1} columns with > C_2 r non-zeros.



1. Bernoulli matrices: after decomposition

Recall:

|A_ij| ∼ Σ_k 2^k 1{|A_ij| ∈ (2^{k−1}, 2^k]} = Σ_k 2^k B^k_ij

• For A_part1 = Σ_k 2^k (B^k_{B_2} ∪ B^k_{B_3}) use

Lemma (Norm of sparse matrices)

For any matrix Q we have

‖Q‖ ≤ max_j ‖col_j(Q)‖_2 · √(max_i #(row_i(Q))).

(Here the first factor is ≲ √n and the second is ≲ √(const · #(terms)).)

• For each B^k in B_1 all rows and columns are bounded by O(np_k) ⟹ we can use the results for Bernoulli matrices.
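The sparse-matrix norm lemma holds deterministically and is easy to sanity-check; a sketch (the sparsity level and size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300
Q = rng.standard_normal((n, n)) * (rng.random((n, n)) < 0.02)  # sparse random Q

max_col_norm = np.max(np.linalg.norm(Q, axis=0))      # max_j ||col_j(Q)||_2
max_row_count = np.max(np.count_nonzero(Q, axis=1))   # max_i #(row_i(Q))

bound = max_col_norm * np.sqrt(max_row_count)
assert np.linalg.norm(Q, 2) <= bound + 1e-9           # the lemma's inequality
```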


2. Heavy and light indices: Bernoulli

Using the definition ‖B‖ = sup_{u,v∈S^{n−1}} |Σ_ij B_ij u_i v_j|.

Light indices := {(i, j) : |u_i v_j| ≤ √(p/n)} for every u, v.

Split the sum:

|Σ_ij (B_ij − EB_ij) u_i v_j| ≤ |Σ_light (B_ij − EB_ij) u_i v_j| + |Σ_heavy EB_ij u_i v_j| + |Σ_heavy B_ij u_i v_j|

• Light part: bounded summands, Bernstein's concentration

• Expectation part: #(heavy indices) ≤ n/p, Cauchy-Schwarz

• Heavy part: Feige-Ofek theorem (the bound follows from a tail estimate for e(S, T) = the number of non-zero entries in an S × T sub-block)


2. Heavy and light indices: general case

Light indices := {(i, j) : |u_i v_j A_ij| ≤ √(4/n)} for every u, v. Split the sum:

|Σ_ij A_ij u_i v_j| ≤ |Σ_light A_ij u_i v_j| + Σ_heavy |A_ij| |u_i v_j|

E|A_ij| ≠ 0, but we do not care: split into Bernoulli levels and use the Feige-Ofek theorem at each level!

Σ_heavy |A_ij| |u_i v_j| ≤ Σ_ij Σ_k 2^k B^k_ij |u_i v_j| ≤ Σ_k 2^k √(np_k) ≤ √(n Σ_k 2^{2k} p_k) · √(#levels).

From the second moment condition, 1 ≥ EA_ij^2 ≥ 0.25 Σ_k 2^{2k} p_k.

The number of levels is an extra term: minimize it.


3. Only average levels matter

• Large entries (≳ √(n c_ε)) are zeroed out (they produce heavy rows)

• Small entries (≲ √(n/ln n)) are bounded separately by the Bandeira-van Handel theorem

The number of levels is at most

log_2(C c_ε n) − log_2(c_1 n / ln n) ≤ log_2(C c_ε n · ln n / (c_1 n)) ∼ log log n.

Note: symmetry is needed only to keep zero mean in the various truncations.

Q.E.D.


What if we want to zero out an εn × εn block only?

We need to find the most “dense” part of the matrix. It is enough to find an exceptional εn subset of columns (only) and an exceptional εn subset of rows (only), and take their intersection.

(Figure: zeroing the exceptional columns gives ‖green‖ ≤ √n, zeroing the exceptional rows gives ‖brown‖ ≤ √n; added together, the two cuts give ‖dashed‖ ≤ 2√n outside the εn × εn intersection.)



Algorithm idea

Idea: find εn columns to replace with zeros, such that all rows and columns have bounded L_2-norms, then apply the Main Theorem.

Lemma (with K. Tikhomirov)

B is an n × n matrix with 0-1 entries, EB_ij = p. Then for any L ≥ 10, with probability 1 − exp(−n exp(−Lpn)): if we define

W_ij := 1, if #(row_i(B)) ≤ Lnp or B_ij = 0; and W_ij := Lnp/#(row_i(B)) otherwise,

and V_j := Π_{i=1}^n W_ij, and J := {j : V_j < 0.1}, then

|J| ≤ n exp(−Lnp) and Σ_{j∈J^c} B_ij ≤ 10Lnp, for any i ∈ [n].
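A sketch of the lemma's weighting scheme (the planted heavy rows, seed, and sizes are illustrative additions to make the mechanism visible; the weights are read as penalizing columns that meet heavy rows):

```python
import numpy as np

rng = np.random.default_rng(6)
n, L = 2000, 10.0
p = 4 / n
B = (rng.random((n, n)) < p).astype(float)
B[:5, :200] = 1.0                  # plant a few heavy rows (illustration only)

Lnp = L * n * p                    # = 40
row_counts = B.sum(axis=1)         # #(row_i(B))
ratio = Lnp / np.maximum(row_counts[:, None], 1.0)
# W_ij = 1 if row i is light or B_ij = 0, else Lnp / #(row_i(B))
W = np.where((row_counts[:, None] <= Lnp) | (B == 0), 1.0, ratio)
V = W.prod(axis=0)                 # V_j = prod_i W_ij
J = np.flatnonzero(V < 0.1)        # exceptional columns to delete

B_rest = B.copy()
B_rest[:, J] = 0.0
assert len(J) >= 200                           # the planted columns are caught
assert B_rest.sum(axis=1).max() <= 10 * Lnp    # all rows bounded after the cut
```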


Damping: Bernoulli example

Idea: we construct a diagonal matrix of weights that regularizes each row. On the slides this is illustrated on the 5 × 5 matrix

0 1 0 0 1
0 0 0 0 0
0 1 1 0 0
1 1 0 0 1
1 0 0 0 0

multiplied on the right by a diagonal weight matrix that is built row by row:

• 1st row: damping with a weight 0 < δ_1 < 1

• 2nd row: all good

• 3rd row: damping with the weight δ_1

• 4th row: damping with a smaller weight 0 < δ_2 < δ_1 < 1

• 5th row: all good

• the 2nd column accumulates a small weight: to be deleted


Lemma (with K. Tikhomirov)

B is an n × n matrix with 0-1 entries, EB_ij ≤ p. Then for any L ≥ 10, with probability 1 − exp(−n exp(−Lpn)): if we define W_ij as before, and V_j := Π_{i=1}^n W_ij, and J := {j : V_j < 0.1}, then

|J| ≤ n exp(−Lnp) and Σ_{j∈J^c} B_ij ≤ 10Lnp, for any i ∈ [n].

How do we use the Lemma? Split

A_ij^2 ≤ Σ_k q_k 1{A_ij^2 ∈ (q_{k−1}, q_k]}, with I_k := (q_{k−1}, q_k].

Then Σ_k q_{k−1} P{A_ij^2 ∈ I_k} ≤ EA_ij^2 = 1. Apply the Lemma to B^k with entries

B^k_ij = A_ij^2 · 1{A_ij ∈ I_k} ∼ q_k 1{A_ij ∈ I_k}

to get

‖row_j(A_{J^c})‖_2^2 ≲ Σ_k q_k n P{A_ij^2 ∈ I_k} ≤ 2n.


Quantiles and regularization process

To pass from Bernoulli to the general case we now need the p_k to be under control: not too small (for the probability estimate), not too large (for the cardinality estimate).

Definition (2^{−k} quantiles)

Denote by q_k the 2^{−k}-quantiles of |A_ij|: the points such that

P{|A_ij| > q_k} = 2^{−k}.

Let A_k := A · 1{A_ij ∈ (q_{k−1}, q_k]}.

Note: the quantiles q_k can be approximated by the order statistics of the A_ij (the matrix itself is a free set of samples from the distribution!). We use Ã_k ∼ A_k. So the algorithm is distribution-oblivious.
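A sketch of the order-statistics approximation (the sizes and the t-distribution are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
A = rng.standard_t(df=3, size=(n, n))

samples = np.sort(np.abs(A).ravel())       # the matrix itself is a free sample
N = samples.size
kmax = 10
# empirical q_k: the value exceeded by roughly a 2^{-k} fraction of the entries
q = {k: samples[int(N * (1 - 2.0 ** -k))] for k in range(1, kmax + 1)}

for k in range(1, kmax + 1):
    frac = np.mean(np.abs(A) > q[k])       # should be close to 2^{-k}
    assert abs(frac - 2.0 ** -k) <= 2.0 ** -k   # crude sanity check
```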


Submatrix norm regularization algorithm

1. delete the too-large entries

2. the small entries are fine without regularization

3. for each average level k, construct the weights W^k_ij and V^k_j for A_k to find an exceptional subset of columns J_k:

|∪_k J_k| ≤ εn/2 with high probability

4. J = J̄ ∪ (∪_k J_k), where J̄ is a subset of εn/2 columns with the largest norms

5. repeat the process for Aᵀ to find an exceptional row subset I

6. the intersection of I and J gives an εn × εn exceptional sub-matrix A_0

⟹ ‖Ã‖ = ‖A \ A_0‖ ∼ √(n ln ln n) by the Main Theorem.


THANKS FOR YOUR ATTENTION!


Applications to community detection

A is the adjacency matrix of an Erdos-Renyi random graph G(n, p_ij). Stochastic block model:

EA =
p p s s
p p s s
s s p p
s s p p

Inhomogeneous Erdos-Renyi model G(n, (p_ij)): edges are still independent, but can have different probabilities p_ij. This allows modeling networks with structure = communities (clusters).

Example. Stochastic block model with two communities, G(n, p, q): edges within each community have probability p; across communities, probability q < p.

The spectral method for community detection is based on the idea:

1. the eigenstructure of EA reveals the communities,

2. eigenstructure[A] ∼ eigenstructure[EA].

So the eigenstructure of A (observed) reveals the communities too.

Condition 2 is satisfied only for dense graphs. Idea for sparse graphs: regularize the graph locally to make ‖A − EA‖ small.
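A sketch of the dense-regime spectral method (the SBM sizes, p, q, the seed, and the second-eigenvector sign rounding are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(8)
n, p, q = 400, 0.30, 0.05
labels = np.repeat([1, -1], n // 2)                  # two planted communities

P = np.where(np.equal.outer(labels, labels), p, q)   # EA (up to the diagonal)
A = np.triu(rng.random((n, n)) < P, 1)
A = (A + A.T).astype(float)                          # symmetric 0-1 adjacency

vals, vecs = np.linalg.eigh(A)
v2 = vecs[:, -2]                 # eigenvector of the second-largest eigenvalue
guess = np.sign(v2)              # round its signs to community labels
accuracy = max(np.mean(guess == labels), np.mean(guess == -labels))
print(accuracy)                  # close to 1 in this dense regime
```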


Obstructions for random graphs

Sparse graphs: maximal expected degree d := max_ij p_ij · n ≪ log n.

[Feige-Ofek] the obstructions to a small ‖A − EA‖ are the few high-degree vertices of the graph.

For the regularization it is enough to:

• Feige-Ofek: delete all high-degree vertices (degree > 10d)

• Le-Levina-Vershynin: reweight or delete some of the edges adjacent to high-degree vertices (to make all the degrees bounded by 10d)

• R.: it is enough to delete a small ne^{−d} × ne^{−d} subgraph; we get a description of the “bad” subgraph (we can direct its edges so that every vertex has a finite number of outgoing edges)


Theorem (Part 2: global obstructions)

Let A be an n × n matrix with i.i.d. entries such that

• EA_ij^2 ≥ M,

• |A_ij| ≤ √n almost surely.

If M = M(C, ε) is a large enough constant, then every εn × εn sub-matrix A_0 has large norm:

‖A_0‖ ≥ C√n,

with probability at least 1 − exp(−εn).

So, if we were to cut some part for the regularization, we would need to cut almost everything! No εn × εn sub-matrix can survive.

(Figure: an n × n matrix with an εn × εn zeroed block.)