Sub-Gaussian Estimators of the Mean of a Random Matrix with Entries Possessing Only Two Moments
Stas Minsker, University of Southern California
July 21, 2016
ICERM Workshop
Simple question: how to estimate the mean?
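Assume that $X_1, \dots, X_n$ are i.i.d. $N(\mu, \sigma_0^2)$.
Problem: construct $\mathrm{CI}_{\mathrm{norm}}(\alpha)$ for $\mu$ with coverage probability $\ge 1 - 2\alpha$.
Solution: compute $\mu_n := \frac{1}{n}\sum_{j=1}^n X_j$ and take
$$\mathrm{CI}_{\mathrm{norm}}(\alpha) = \left[\mu_n - \sigma_0\sqrt{\frac{2\log(1/\alpha)}{n}},\ \mu_n + \sigma_0\sqrt{\frac{2\log(1/\alpha)}{n}}\right].$$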
Coverage is guaranteed because $\mu_n - \mu \sim N(0, \sigma_0^2/n)$, so
$$\Pr\left(|\mu_n - \mu| \ge \sigma_0\sqrt{\frac{2\log(1/\alpha)}{n}}\right) \le 2\alpha.$$
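Here is a minimal Python sketch of this interval. It assumes σ0 is known exactly. The helper name `normal_ci` is hypothetical.

```python
import math

def normal_ci(xs, sigma0, alpha):
    """Return CI_norm(alpha) = [mu_n - r, mu_n + r], r = sigma0 * sqrt(2 log(1/alpha) / n).

    For i.i.d. N(mu, sigma0^2) data this interval covers mu with probability >= 1 - 2*alpha.
    """
    n = len(xs)
    mu_n = sum(xs) / n                                   # sample mean
    r = sigma0 * math.sqrt(2 * math.log(1 / alpha) / n)  # half-width
    return mu_n - r, mu_n + r

lo, hi = normal_ci([1.0, 2.0, 3.0, 2.0], sigma0=1.0, alpha=0.05)
print(lo, hi)
```

The half-width grows only like $\sqrt{\log(1/\alpha)}$ as the confidence level increases.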
Example: how to estimate the mean?
P. J. Huber (1964): “...This raises a question which could have been asked already by Gauss, but which was, as far as I know, only raised a few years ago (notably by Tukey): what happens if the true distribution deviates slightly from the assumed normal one?”
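Going back to our question: what if $X_1, \dots, X_n$ are i.i.d. copies of $X \sim \Pi$ such that
$$\mathbb{E}X = \mu, \qquad \operatorname{Var}(X) \le \sigma_0^2?$$
Problem: construct a CI for $\mu$ with coverage probability $\ge 1 - \alpha$ such that, for any $\alpha$,
$$\operatorname{length}(\mathrm{CI}(\alpha)) \le (\text{absolute constant}) \cdot \operatorname{length}(\mathrm{CI}_{\mathrm{norm}}(\alpha)).$$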
No additional assumptions on Π are imposed.
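Remark: the guarantees for the sample mean $\mu_n = \frac{1}{n}\sum_{j=1}^n X_j$ are unsatisfactory. Chebyshev's inequality only gives
$$\Pr\left(|\mu_n - \mu| \ge \sigma_0\sqrt{\frac{1/\alpha}{n}}\right) \le \alpha,$$
so the interval length grows like $\sqrt{1/\alpha}$ instead of $\sqrt{\log(1/\alpha)}$.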
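The gap between the two guarantees is easy to quantify. This short sketch uses hypothetical helper names. It compares the Chebyshev radius with the Gaussian radius from $\mathrm{CI}_{\mathrm{norm}}(\alpha)$. Their ratio, $\sqrt{1/\alpha}/\sqrt{2\log(1/\alpha)}$, does not depend on $n$ or $\sigma_0$, and it grows without bound as $\alpha \to 0$.

```python
import math

def chebyshev_radius(sigma0, n, alpha):
    # Chebyshev: Pr(|mu_n - mu| >= t) <= sigma0^2 / (n t^2); set the right-hand side to alpha
    return sigma0 * math.sqrt(1 / (alpha * n))

def gaussian_radius(sigma0, n, alpha):
    # Half-width of CI_norm(alpha) in the Gaussian case
    return sigma0 * math.sqrt(2 * math.log(1 / alpha) / n)

for alpha in (1e-1, 1e-3, 1e-6):
    ratio = chebyshev_radius(1.0, 100, alpha) / gaussian_radius(1.0, 100, alpha)
    print(f"alpha={alpha:g}: Chebyshev/Gaussian radius ratio = {ratio:.1f}")
```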
Does such a solution exist?
Example: how to estimate the mean?
Answer (somewhat unexpected?): Yes!
Construction: [A. Nemirovski, D. Yudin ‘83; N. Alon, Y. Matias, M. Szegedy ‘96; R. Oliveira, M. Lerasle ‘11]
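Split the sample into $k = \lfloor \log(1/\alpha) \rfloor + 1$ groups $G_1, \dots, G_k$ of size $\approx n/k$ each, and compute the group means and their median:
$$\underbrace{X_1, \dots, X_{|G_1|}}_{G_1}\ \ \dots\ \ \underbrace{X_{n-|G_k|+1}, \dots, X_n}_{G_k}, \qquad \mu_j := \frac{1}{|G_j|}\sum_{X_i \in G_j} X_i, \quad j = 1, \dots, k,$$
$$\mu_* = \mu_*(\alpha) := \operatorname{median}(\mu_1, \dots, \mu_k).$$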
Claim:
$$\Pr\left(|\mu_* - \mu| \ge 7.7\,\sigma_0\sqrt{\frac{\log(e/\alpha)}{n}}\right) \le \alpha.$$
Then take
$$\mathrm{CI}(\alpha) = \left[\mu_* - 7.7\,\sigma_0\sqrt{\frac{\log(e/\alpha)}{n}},\ \mu_* + 7.7\,\sigma_0\sqrt{\frac{\log(e/\alpha)}{n}}\right].$$
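Below is a minimal Python sketch of the median-of-means construction and the resulting interval. It assumes σ0 (an upper bound on the standard deviation) is known. The groups are consecutive blocks whose sizes differ by at most one. The function names are hypothetical.

```python
import math
import statistics

def median_of_means(xs, alpha):
    """mu_*(alpha): median of the means of k = floor(log(1/alpha)) + 1 consecutive groups."""
    n = len(xs)
    k = math.floor(math.log(1 / alpha)) + 1
    if k > n:
        raise ValueError("need at least k = floor(log(1/alpha)) + 1 observations")
    # Boundaries i*n//k give k nonempty blocks whose sizes differ by at most 1.
    bounds = [i * n // k for i in range(k + 1)]
    means = [sum(xs[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]
    return statistics.median(means)

def mom_ci(xs, sigma0, alpha):
    """CI(alpha) = [mu_* - r, mu_* + r] with r = 7.7 * sigma0 * sqrt(log(e/alpha) / n)."""
    r = 7.7 * sigma0 * math.sqrt(math.log(math.e / alpha) / len(xs))
    mu_star = median_of_means(xs, alpha)
    return mu_star - r, mu_star + r

xs = [0.0] * 99 + [1e6]          # one gross outlier among 100 observations
print(median_of_means(xs, 0.01), sum(xs) / len(xs))
```

In this example, $\alpha = 0.01$ gives $k = 5$ groups. The outlier corrupts only one group mean, so the median ignores it, while the sample mean is dragged to $10^4$.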
Idea of the proof:
[Figure: the group means (labels $\mu_1, \dots, \mu_8$) and the mean $\mu$ marked on a line.]
$|\mu_* - \mu| \ge s \implies$ at least half of the events $\{|\mu_j - \mu| \ge s\}$, $j = 1, \dots, k$, occur.
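Here is one way to complete this idea. It relies on simplifying assumptions that the slide does not state: all $k$ groups have exactly $m = n/k$ observations, and constants are not optimized. This crude version uses $k \approx 8\log(1/\alpha)$ groups rather than $\lfloor\log(1/\alpha)\rfloor + 1$, so it does not reproduce the constant 7.7.

By Chebyshev's inequality, each group mean satisfies
$$\Pr\big(|\mu_j - \mu| \ge s\big) \le \frac{\sigma_0^2}{m s^2} = \frac{\sigma_0^2 k}{n s^2} \le \frac{1}{4} \qquad \text{for } s = 2\sigma_0\sqrt{k/n}.$$

The groups are disjoint, so these events are independent. Hoeffding's inequality for the number of events that occur then gives
$$\Pr\big(|\mu_* - \mu| \ge s\big) \le \Pr\Big(\sum_{j=1}^k \mathbf{1}\{|\mu_j - \mu| \ge s\} \ge \tfrac{k}{2}\Big) \le \exp\Big(-2k\big(\tfrac12 - \tfrac14\big)^2\Big) = e^{-k/8}.$$

Choosing $k = \lceil 8\log(1/\alpha)\rceil$ makes the right-hand side at most $\alpha$. The radius $s = 2\sigma_0\sqrt{k/n}$ is then of order $\sigma_0\sqrt{\log(1/\alpha)/n}$.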
Improve the constant?
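O. Catoni’s estimator (2012), “generalized truncation”: let $\theta > 0$ and let $\psi$ be a nondecreasing function such that
$$-\log(1 - x + x^2/2) \le \psi(x) \le \log(1 + x + x^2/2),$$
and define $\hat\mu$ via
$$\sum_{j=1}^n \psi\big(\theta(X_j - \hat\mu)\big) = 0.$$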
The truncation $\tau(x) = (|x| \wedge 1)\operatorname{sign}(x)$ satisfies a weaker inequality:
$$-\log(1 - x + x^2) \le \tau(x) \le \log(1 + x + x^2).$$
[Figure omitted; both axes range from $-1$ to $1$.]
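The two inequalities can be checked numerically on a grid. This sketch assumes one admissible choice of $\psi$ (Catoni's "narrowest" one): the upper bound for $x \ge 0$ and the lower bound for $x < 0$. The helper names are hypothetical. The check confirms that $\tau$ satisfies the weaker bounds but not Catoni's. For example, $\tau(1) = 1 > \log(2.5)$.

```python
import math

def catoni_psi(x):
    # One admissible psi: upper bound for x >= 0, lower bound for x < 0
    return math.log1p(x + x * x / 2) if x >= 0 else -math.log1p(-x + x * x / 2)

def truncation(x):
    # tau(x) = (|x| ∧ 1) sign(x), i.e. clipping to [-1, 1]
    return max(-1.0, min(1.0, x))

def within_bounds(f, x, c):
    # Checks -log(1 - x + c x^2) <= f(x) <= log(1 + x + c x^2), up to rounding
    lower = -math.log(1 - x + c * x * x)
    upper = math.log(1 + x + c * x * x)
    return lower - 1e-12 <= f(x) <= upper + 1e-12

grid = [i / 100 for i in range(-500, 501)]
print(all(within_bounds(catoni_psi, x, 0.5) for x in grid))  # Catoni's bounds, c = 1/2
print(all(within_bounds(truncation, x, 1.0) for x in grid))  # weaker bounds, c = 1
print(all(within_bounds(truncation, x, 0.5) for x in grid))  # tau vs Catoni's bounds
```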
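Intuition: for small $\theta > 0$, since $\psi(x) \approx x$ near $0$,
$$\sum_{j=1}^n \psi\big(\theta(X_j - \hat\mu)\big) \approx \sum_{j=1}^n \theta(X_j - \hat\mu) = 0 \implies \hat\mu \approx \frac{1}{n}\sum_{j=1}^n X_j.$$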
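The following holds: set $\theta_* = \frac{1}{\sigma_0}\sqrt{\frac{2\log(1/\alpha)}{n}}$. Then
$$|\hat\mu - \mu| \le \left(\sqrt{2} + o(1)\right)\sigma_0\sqrt{\frac{\log(1/\alpha)}{n}}$$
with probability $\ge 1 - 2\alpha$.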
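Here is a minimal sketch of Catoni's estimator with $\theta = \theta_*$. It assumes σ0 is known and uses the narrowest admissible $\psi$. The slides do not prescribe a solver. Because $\psi$ is strictly increasing here, the left-hand side is decreasing in $\hat\mu$, so plain bisection between the smallest and largest observation finds the unique root. The names are hypothetical.

```python
import math

def psi(x):
    # Catoni's narrowest influence function (strictly increasing, psi(0) = 0)
    return math.log1p(x + x * x / 2) if x >= 0 else -math.log1p(-x + x * x / 2)

def catoni_mean(xs, sigma0, alpha, iters=100):
    """Solve sum_j psi(theta * (X_j - mu)) = 0 with theta = theta_* by bisection."""
    n = len(xs)
    theta = math.sqrt(2 * math.log(1 / alpha) / n) / sigma0

    def g(mu):
        # Decreasing in mu; g(min(xs)) >= 0 >= g(max(xs))
        return sum(psi(theta * (x - mu)) for x in xs)

    lo, hi = min(xs), max(xs)
    for _ in range(iters):              # a fixed iteration count avoids float-spacing stalls
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(catoni_mean([0.0] * 99 + [1e6], sigma0=1.0, alpha=0.01))
```

This matches the intuition above. With a very large σ0 (so θ is tiny), the estimate is essentially the sample mean. With θ = θ* and one gross outlier, it stays close to 0.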
Extensions to higher dimensions
A natural question: is it possible to extend the presented techniques to the multivariate mean?
Motivation: PCA
J. Novembre et al., “Genes mirror geography within Europe”, Nature 2008.
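Mathematical framework: $Y_1, \dots, Y_n \in \mathbb{R}^d$ are i.i.d. with $\mathbb{E}Y_j = 0$ and $\mathbb{E}Y_jY_j^T = \Sigma$.
Goal: construct $\hat\Sigma$, an estimator of $\Sigma$, such that $\|\hat\Sigma - \Sigma\|_{\mathrm{Op}}$ is small.
The sample covariance
$$\widetilde\Sigma_n = \frac{1}{n}\sum_{j=1}^n Y_jY_j^T$$
is very sensitive to outliers.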
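Here is a quick NumPy illustration of this sensitivity. The setup is hypothetical and not from the talk: Gaussian data with $\Sigma = I_5$ and $n = 200$. Corrupting a single observation $y$ shifts the sample covariance by roughly $yy^T/n$, so one point with $\|y\|_2^2 = 5 \cdot 10^4$ moves the operator-norm error by about 250.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 200
Y = rng.standard_normal((n, d))            # clean sample, true covariance Sigma = I_d

def sample_cov(Y):
    # (1/n) * sum_j Y_j Y_j^T for a mean-zero sample stored row-wise
    return Y.T @ Y / len(Y)

def op_error(S):
    # Operator-norm (largest singular value) distance to Sigma = I_d
    return np.linalg.norm(S - np.eye(S.shape[0]), ord=2)

Y_bad = Y.copy()
Y_bad[0] = 100.0                            # corrupt one observation out of 200
print(op_error(sample_cov(Y)), op_error(sample_cov(Y_bad)))
```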
Excerpt from J. Novembre et al., “Genes mirror geography within Europe”, Nature 2008, shown on the PCA motivation slide:
The direction of the PC1 axis and its relative strength may reflect a special role for this geographic axis in the demographic history of Europeans (as first suggested in ref. 10). PC1 aligns north-northwest/south-southeast (NNW/SSE, 216 degrees) and accounts for approximately twice the amount of variation as PC2 (0.30% versus 0.15%, first eigenvalue = 4.09, second eigenvalue = 2.04). However, caution is required because the direction and relative strength of the PC axes are affected by factors such as the spatial distribution of samples (results not shown, also see ref. 9). More robust evidence for the importance of a roughly NNW/SSE axis in Europe is that, in these same data, haplotype diversity decreases from south to north (A.A. et al., submitted). As the fine-scale spatial structure evident in Fig. 1 suggests, European DNA samples can be very informative about the geographical origins of their donors. Using a multiple-regression-based assignment approach, one can place 50% of individuals within 310 km of their reported origin and 90% within 700 km of their origin (Fig. 2 and Supplementary Table 4, results based on populations with n > 6). Across all populations, 50% of individuals are placed within 540 km of their reported origin, and 90% of individuals within 840 km (Supplementary Fig. 3 and Supplementary Table 4). These numbers exclude individuals who reported mixed grandparental ancestry, who are typically assigned to locations between those expected from their grandparental origins (results not shown). Note that distances of assignments from reported origin may be reduced if finer-scale information on origin were available for each individual.

Population structure poses a well-recognized challenge for disease-association studies (for example, refs 11–13). The results obtained here reinforce that the geographic distribution of a sample is important to consider when evaluating genome-wide association studies...
[Figure 1 | Population structure within Europe. a, A statistical summary of genetic data from 1,387 Europeans based on principal component axis one (PC1) and axis two (PC2). Small coloured labels represent individuals and large coloured points represent median PC1 and PC2 values for each country. The inset map provides a key to the labels. The PC axes are rotated to emphasize the similarity to the geographic map of Europe. AL, Albania; AT, Austria; BA, Bosnia-Herzegovina; BE, Belgium; BG, Bulgaria; CH, Switzerland; CY, Cyprus; CZ, Czech Republic; DE, Germany; DK, Denmark; ES, Spain; FI, Finland; FR, France; GB, United Kingdom; GR, Greece; HR, Croatia; HU, Hungary; IE, Ireland; IT, Italy; KS, Kosovo; LV, Latvia; MK, Macedonia; NO, Norway; NL, Netherlands; PL, Poland; PT, Portugal; RO, Romania; RS, Serbia and Montenegro; RU, Russia; Sct, Scotland; SE, Sweden; SI, Slovenia; SK, Slovakia; TR, Turkey; UA, Ukraine; YG, Yugoslavia. b, A magnification of the area around Switzerland from a showing differentiation within Switzerland by language. c, Genetic similarity versus geographic distance. Median genetic correlation between pairs of individuals as a function of geographic distance between their respective populations.]
(Nature, Vol. 456, 6 November 2008, p. 99.)
A good explanation for non-experts: https://faculty.washington.edu/tathornt/SISG2015/lectures/assoc2015session05.pdf
Extensions to higher dimensions
Naive approach: apply the "median trick" (or Catoni’s estimator) coordinatewise. This makes the bound dimension-dependent.
Better approach: replace the usual median by the geometric median,
$$x_* = \operatorname{med}(x_1, \dots, x_k) := \operatorname{argmin}_{y \in \mathbb{R}^d} \sum_{j=1}^k \|y - x_j\|.$$
Still some issues:
1. it does not work well for small sample sizes;
2. it yields bounds in the wrong norm.
Alternatives: Tyler’s M-estimator and Maronna’s M-estimator; their guarantees are limited to special classes of distributions.
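The geometric median has no closed form. A standard way to approximate it is Weiszfeld's algorithm, an iteratively reweighted average with weights $1/\|y - x_j\|$. The slides do not say which solver was used, so the following NumPy sketch is an assumption. Its function name is hypothetical.

```python
import numpy as np

def geometric_median(points, iters=500, eps=1e-12):
    """Approximate argmin_y sum_j ||y - x_j|| with Weiszfeld's algorithm."""
    x = np.asarray(points, dtype=float)
    y = x.mean(axis=0)                           # start at the coordinatewise mean
    for _ in range(iters):
        dist = np.linalg.norm(x - y, axis=1)
        w = 1.0 / np.maximum(dist, eps)          # guard against division by zero at a data point
        y_new = (w[:, None] * x).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < 1e-12:
            return y_new
        y = y_new
    return y

pts = [[1, 0], [-1, 0], [0, 1], [0, -1], [1000, 1000]]
print(geometric_median(pts), np.mean(pts, axis=0))
```

In the median-of-means setting, the points $x_j$ would be the group means in $\mathbb{R}^d$. In this example, the outlier pulls the mean to $(200, 200)$, while the geometric median stays within distance 1 of the origin.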
Matrix functions
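Let $f: \mathbb{R} \to \mathbb{R}$ and $A = A^T = U\Lambda U^T$. Then
$$f(A) = U f(\Lambda) U^T, \qquad f(\Lambda) = f\begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_d \end{pmatrix} = \begin{pmatrix} f(\lambda_1) & & \\ & \ddots & \\ & & f(\lambda_d) \end{pmatrix}.$$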
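Here is a minimal NumPy sketch of this definition. It assumes $A$ is real symmetric, so `numpy.linalg.eigh` returns real eigenvalues and an orthogonal $U$. The name `matrix_function` is hypothetical.

```python
import numpy as np

def matrix_function(f, A):
    """f(A) = U f(Lambda) U^T for symmetric A, with f applied to the eigenvalues."""
    lam, U = np.linalg.eigh(A)           # A = U diag(lam) U^T
    return (U * f(lam)) @ U.T            # scaling the columns of U equals U @ diag(f(lam))

A = np.array([[2.0, 1.0], [1.0, 2.0]])
print(matrix_function(lambda x: x**2, A))   # agrees with A @ A
```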
Construction of the estimator
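$X \in \mathbb{R}^{d\times d}$ is a symmetric random matrix; $X_1, \dots, X_n \in \mathbb{R}^{d\times d}$ are i.i.d. copies of $X$, and $\mathbb{E}\|X\|_F^2 < \infty$.
No additional assumptions.
Let $\psi$ satisfy $-\log(1 - x + x^2/2) \le \psi(x) \le \log(1 + x + x^2/2)$, let $\theta > 0$, and define (with $\psi$ applied as a matrix function)
$$\hat\Sigma_n = \frac{1}{n\theta}\sum_{j=1}^n \psi(\theta X_j).$$
For example, if $X_j = Y_jY_j^T$, we get
$$\hat\Sigma_n = \frac{1}{n\theta}\sum_{j=1}^n \psi\big(\theta Y_jY_j^T\big).$$
Intuition: for small $\theta$, $\psi(\theta x) \approx \theta x$, hence
$$\hat\Sigma_n \approx \text{sample mean} + o(\theta).$$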
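Note that
$$\psi\big(\theta Y_jY_j^T\big) = \psi\big(\theta\|Y_j\|_2^2\big)\,\frac{Y_j}{\|Y_j\|_2}\,\frac{Y_j^T}{\|Y_j\|_2}$$
is easy to compute. Indeed, $Y_jY_j^T$ has the single nonzero eigenvalue $\|Y_j\|_2^2$ with eigenvector $Y_j/\|Y_j\|_2$, and the bounds on $\psi$ force $\psi(0) = 0$.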
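Here is a minimal NumPy sketch of $\hat\Sigma_n$ for covariance estimation. It uses the rank-one shortcut, so no eigendecomposition is needed. It assumes the narrowest admissible $\psi$ and takes $\theta$ as an input. The function names are hypothetical.

```python
import numpy as np

def psi(x):
    """Catoni's narrowest influence function, applied elementwise."""
    x = np.asarray(x, dtype=float)
    # Both log1p arguments stay >= -1/2, so np.where can evaluate both branches safely.
    return np.where(x >= 0, np.log1p(x + x**2 / 2), -np.log1p(-x + x**2 / 2))

def robust_covariance(Y, theta):
    """Sigma_hat_n = (1/(n*theta)) * sum_j psi(theta * Y_j Y_j^T); rows of Y are the Y_j."""
    Y = np.asarray(Y, dtype=float)
    n = Y.shape[0]
    sq = np.sum(Y**2, axis=1)                         # ||Y_j||_2^2
    safe = np.where(sq > 0, sq, 1.0)                  # avoid 0/0; zero rows contribute 0
    weights = np.where(sq > 0, psi(theta * sq) / safe, 0.0)
    return (Y.T * weights) @ Y / (n * theta)          # sum_j w_j Y_j Y_j^T / (n * theta)

Y = np.random.default_rng(1).standard_normal((50, 4))
print(robust_covariance(Y, theta=0.3))
```

When an upper bound $\sigma^2 \ge \|\mathbb{E}X^2\|$ is available, the theorem stated for this estimator suggests $\theta = \sqrt{2\log(d/\alpha)/n}\,/\,\sigma$.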
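$$\hat\Sigma_n = \frac{1}{n\theta}\sum_{j=1}^n \psi(\theta X_j)$$
Theorem (M., 2016). Let $X_1, \dots, X_n$ be i.i.d. Assume that $\sigma^2 \ge \|\mathbb{E}X^2\|$. Let $\theta = \sqrt{\frac{2\log(d/\alpha)}{n}}\,\frac{1}{\sigma}$. Then
$$\left\|\hat\Sigma_n - \mathbb{E}X\right\| \le \sigma\sqrt{\frac{2\log(d/\alpha)}{n}}$$
with probability $\ge 1 - 2\alpha$.
For example, in covariance estimation $X = YY^T$, so $X^2 = \|Y\|_2^2\, YY^T$ and $\sigma^2 = \left\|\mathbb{E}\|Y\|_2^2\, YY^T\right\|$.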
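Compare to:
Theorem (Matrix Bernstein inequality, Tropp ’11). Let $X, X_1, \dots, X_n \in \mathbb{R}^{d\times d}$ be i.i.d., $\sigma_0^2 = \left\|\mathbb{E}(X - \mathbb{E}X)^2\right\|$, and $\|X\| \le M$. Then for all $0 < \alpha < 1$,
$$\left\|\frac{1}{n}\sum_{j=1}^n X_j - \mathbb{E}X\right\| \le \max\left(2\sigma_0\sqrt{\frac{\log(d/\alpha)}{n}},\ \frac{4}{3}\,\frac{M\log(d/\alpha)}{n}\right)$$
with probability $\ge 1 - 2\alpha$.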
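Further improvements: write $X_j = (X_j - S) + S$ for a fixed symmetric matrix $S$, and apply the estimator to the recentered $X_j - S$:
$$\hat\Sigma(S) = S + \underbrace{\frac{1}{n\theta}\sum_{j=1}^n \psi\big(\theta(X_j - S)\big)}_{\approx\, \mathbb{E}X - S}.$$
The "ideal choice" $S = \mathbb{E}X$ is unavailable $\implies$ use the initial estimator $\hat\Sigma_n$ in place of $S$.
Iterate...
$$S_\infty = S_\infty + \underbrace{\frac{1}{n\theta}\sum_{j=1}^n \psi\big(\theta(X_j - S_\infty)\big)}_{=\,0}.$$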
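Here is a minimal NumPy sketch of "Iterate..." as a plain fixed-point iteration, $S \leftarrow \hat\Sigma(S)$. It starts from an initial symmetric estimate such as $\hat\Sigma_n$. The slides do not specify the solver, so this scheme, the stopping rule, and the names are assumptions. A fixed point of the update is exactly a solution of the equation defining $S_\infty$.

```python
import numpy as np

def psi_sym(A):
    """Catoni's narrowest psi applied to a symmetric matrix via its eigendecomposition."""
    lam, U = np.linalg.eigh(A)
    vals = np.where(lam >= 0, np.log1p(lam + lam**2 / 2), -np.log1p(-lam + lam**2 / 2))
    return (U * vals) @ U.T

def iterate_estimator(X, theta, S0, n_iter=100, tol=1e-10):
    """Iterate S <- S + (1/(n*theta)) * sum_j psi(theta * (X_j - S)) until the step is tiny."""
    X = np.asarray(X, dtype=float)       # shape (n, d, d), each slice symmetric
    n = X.shape[0]
    S = np.array(S0, dtype=float)
    for _ in range(n_iter):
        step = sum(psi_sym(theta * (Xj - S)) for Xj in X) / (n * theta)
        S = S + step
        if np.linalg.norm(step, ord=2) < tol:   # correction term ~ 0, so S solves the equation
            break
    return S
```

A usage example, with a hypothetical sample of random symmetric matrices started at their average:

```python
rng = np.random.default_rng(2)
A = rng.standard_normal((50, 3, 3))
X = (A + np.transpose(A, (0, 2, 1))) / 2
S_inf = iterate_estimator(X, theta=0.5, S0=X.mean(axis=0))
```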
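Theorem (M., 2016). Assume that $\sigma_0^2 \ge \|\mathbb{E}(X - \mathbb{E}X)^2\|$. Let $\theta = \sqrt{\frac{2\log(d/\alpha)}{n}}\,\frac{1}{\sigma_0}$, and let $S_\infty$ solve
$$\frac{1}{n\theta}\sum_{j=1}^n \psi\big(\theta(X_j - S_\infty)\big) = 0.$$
Assume that $n$ is large enough ($n \gtrsim d^3$). Then $S_\infty$ exists and
$$\left\|S_\infty - \mathbb{E}X\right\| \le C\sigma_0\sqrt{\frac{\log(d/\alpha)}{n}}$$
with probability $\ge 1 - \alpha$.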
Numerical results
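$Y_1, \dots, Y_n \in \mathbb{R}^{100}$,
$$\Sigma = \begin{pmatrix} 10 & & & & \\ & 5 & & & \\ & & 1 & & \\ & & & \ddots & \\ & & & & 1 \end{pmatrix} \in \mathbb{R}^{100\times 100},$$
$Y_{i,j} \sim$ symmetric Pareto-type distribution with 4 moments.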
Histograms over 500 replications: $n = 100$.
[Figure: histograms of the relative errors $\|S_n - \Sigma\|/\|\Sigma\|$ (sample covariance estimator) and $\|\hat\Sigma_n - \Sigma\|/\|\Sigma\|$ (robust covariance estimator); horizontal axis: error, vertical axis: frequency.]
Histograms over 500 replications: $n = 100$.
[Figure: histograms of the errors in estimating the leading eigenvector projection, $\|u_1(\hat\Sigma_n)u_1(\hat\Sigma_n)^T - u_1(\Sigma)u_1(\Sigma)^T\|$ (robust covariance estimator) and $\|u_1(S_n)u_1(S_n)^T - u_1(\Sigma)u_1(\Sigma)^T\|$ (sample covariance estimator); horizontal axis: error, vertical axis: frequency.]
Histograms over 500 replications: $n = 1000$.
[Figure: histograms of the relative errors $\|\hat\Sigma_n - \Sigma\|/\|\Sigma\|$ (robust covariance estimator) and $\|S_n - \Sigma\|/\|\Sigma\|$ (sample covariance estimator); horizontal axis: error, vertical axis: frequency.]
Histograms over 500 replications: $n = 1000$.
[Figure: histograms of the errors in estimating the leading eigenvector projection, $\|u_1(\hat\Sigma_n)u_1(\hat\Sigma_n)^T - u_1(\Sigma)u_1(\Sigma)^T\|$ (robust covariance estimator) and $\|u_1(S_n)u_1(S_n)^T - u_1(\Sigma)u_1(\Sigma)^T\|$ (sample covariance estimator); horizontal axis: error, vertical axis: frequency.]
Matrix Completion
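Observe some entries of the ratings matrix
$$A_0 = \begin{array}{c|cccc} & \text{movie } 1 & \text{movie } 2 & \cdots & \text{movie } n \\ \hline \text{user } 1 & * & * & \cdots & * \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \text{user } k & * & * & \cdots & * \end{array}$$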
Question: can we predict the unobserved entries?
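Let $\mathcal{X} = \{e_j(d)e_k(d)^T : 1 \le j \le d,\ 1 \le k \le d\}$, where $e_1(d), \dots, e_d(d)$ are the standard basis vectors of $\mathbb{R}^d$. $X_1, \dots, X_n$ is an independent sample from $\Pi := \mathrm{Unif}(\mathcal{X})$, and the observations $Y_j$, $j = 1, \dots, n$, have the form
$$Y_j = \operatorname{tr}(X_j^T A_0) + \xi_j \qquad \text{(“noisy matrix entry”)},$$
where $\xi_j$, $j = 1, \dots, n$, is additive noise.
If the noise has zero mean given $X$, then $\mathbb{E}(YX) = \frac{1}{d^2}A_0$, hence a natural estimator of $A_0$ is
$$\hat A = \frac{d^2}{n}\sum_{j=1}^n Y_jX_j.$$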
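Since $X_j = e_{r_j}e_{c_j}^T$ has a single nonzero entry, $\hat A$ simply accumulates each observed value at its sampled position and rescales by $d^2/n$. This NumPy sketch represents the sample by the hypothetical arrays `rows`, `cols`, `values` (one entry per observation), an encoding chosen for illustration.

```python
import numpy as np

def naive_completion_estimate(rows, cols, values, d):
    """A_hat = (d^2 / n) * sum_j Y_j X_j with X_j = e_{rows[j]} e_{cols[j]}^T."""
    n = len(values)
    A_hat = np.zeros((d, d))
    np.add.at(A_hat, (rows, cols), values)    # unbuffered: repeated (row, col) pairs are summed
    return d**2 / n * A_hat
```

`np.add.at` is used instead of `A_hat[rows, cols] += values` because the latter keeps only one contribution when a position is sampled more than once.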
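Incorporate the low-rank assumption:
$$\hat A^\tau = \operatorname{argmin}_{A \in \mathbb{R}^{d\times d}}\left[\frac{\|A - \hat A\|_F^2}{d^2} + \tau\|A\|_1\right],$$
where $\|A\|_1$ denotes the nuclear norm (the sum of the singular values of $A$).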
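Multiplying the objective by $d^2/2$ turns it into $\frac12\|A - \hat A\|_F^2 + \frac{\tau d^2}{2}\|A\|_1$. Its minimizer is obtained by soft-thresholding the singular values of $\hat A$ at level $\tau d^2/2$ (singular value thresholding). Here is a minimal NumPy sketch with a hypothetical function name.

```python
import numpy as np

def nuclear_norm_prox(A_hat, tau, d):
    """argmin_A ||A - A_hat||_F^2 / d^2 + tau * ||A||_1 via singular value soft-thresholding."""
    U, s, Vt = np.linalg.svd(A_hat, full_matrices=False)
    s_shrunk = np.maximum(s - tau * d**2 / 2, 0.0)   # threshold = tau * d^2 / 2
    return (U * s_shrunk) @ Vt                        # U @ diag(s_shrunk) @ Vt

print(nuclear_norm_prox(np.diag([3.0, 1.0, 0.2]), tau=1 / 9, d=3))
```

With threshold $\tau d^2/2 = 0.5$, the singular values $3, 1, 0.2$ become $2.5, 0.5, 0$. The smallest component is removed, which is how the penalty enforces low rank.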
Matrix completion
What if the noise $\xi_j$ is heavy-tailed (only $\mathrm{Var}(\xi_j) < \infty$)?
Replace $\hat A$ with a “robust” estimator
$$\hat R = \frac{d^2}{n\theta} \sum_{j=1}^n \psi\big(\theta Y_j H(X_j)\big)$$
and
$$\hat R_\tau = \operatorname*{arg\,min}_{A \in \mathbb{R}^{2d\times 2d}} \left[ \frac{\|A - \hat R\|_F^2}{d^2} + \tau \|A\|_1 \right].$$
Here, $H(X) = \begin{pmatrix} 0 & X \\ X^T & 0 \end{pmatrix}$ is the so-called self-adjoint dilation of $X$.
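$\psi$ is defined earlier in the talk and is not restated in this section. The sketch below assumes the Catoni-type choice $\psi(x) = \mathrm{sign}(x)\log(1 + |x| + x^2/2)$ and applies it to a symmetric matrix through its eigenvalues. This is why the dilation is needed: $H(X_j)$ turns the non-symmetric $X_j$ into a symmetric $2d\times 2d$ matrix. The value of $\theta$, the data, and the function names are hypothetical. For $X_j = e_r e_c^T$, $H(X_j)$ has eigenvalues $\pm 1$ and $0$, so with an odd $\psi$ the spectral computation reduces to $\psi(\theta Y_j)H(X_j)$. Large observations are therefore damped logarithmically instead of entering linearly.

```python
import numpy as np

def psi(x):
    """Assumed Catoni-type influence function sign(x) * log(1 + |x| + x^2 / 2)."""
    return np.sign(x) * np.log1p(np.abs(x) + x ** 2 / 2.0)

def dilation(X):
    """Self-adjoint dilation H(X) = [[0, X], [X^T, 0]]."""
    d1, d2 = X.shape
    return np.block([[np.zeros((d1, d1)), X], [X.T, np.zeros((d2, d2))]])

def matrix_psi(M):
    """Apply psi to a symmetric matrix through its eigendecomposition."""
    lam, V = np.linalg.eigh(M)
    return (V * psi(lam)) @ V.T  # V diag(psi(lam)) V^T

def robust_estimator(rows, cols, Y, d, theta):
    """R_hat = d^2 / (n theta) * sum_j psi(theta * Y_j * H(X_j)), a 2d x 2d matrix."""
    n = len(Y)
    R = np.zeros((2 * d, 2 * d))
    for r, c, y in zip(rows, cols, Y):
        X = np.zeros((d, d))
        X[r, c] = 1.0  # X_j = e_r e_c^T
        R += matrix_psi(theta * y * dilation(X))
    return d ** 2 / (n * theta) * R

rows, cols = [0, 1, 1], [1, 0, 0]
Y = np.array([2.0, -1.0, 50.0])  # the last value plays the role of a heavy-tailed outlier
R_hat = robust_estimator(rows, cols, Y, d=2, theta=0.5)
print(R_hat.shape)  # → (4, 4)
```

As $\theta \to 0$, $\psi(\theta y)/\theta \to y$, and $\hat R$ approaches $H(\hat A)$. In this sketch, $\theta$ trades bias for robustness. Note also that $\mathrm{rank}(H(A_0)) = 2\,\mathrm{rank}(A_0)$, the rank that appears in the bound of the theorem.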
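Theorem (M., 2016). Take
$$\tau = \mathrm{Const}\cdot\sqrt{\frac{t + \log 2d}{nd}};$$
then
$$\frac{1}{d^2}\big\|\hat R_\tau - H(A_0)\big\|_F^2 \le \Big(1 + \frac{\sqrt{2}}{2}\Big)^2 \frac{d \cdot 2\,\mathrm{rank}(A_0)}{n}\sqrt{t + \log 2d}$$
with probability at least $1 - e^{-t}$.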
Thank you for your attention!