ePCA: High Dimensional Exponential Family Principal Component Analysis
Lydia T. Liu ’17
ORFE/PACM, Princeton University
April 20, 2017
Joint work with Edgar Dobriban and Amit Singer
1 / 35
Overview
Introduction
  Review of PCA
  The ePCA problem
The ePCA method
  Overview
  Diagonal Debiasing
  Homogenization
  Shrinkage
  Heterogenization and Scaling
  Denoising
Illustration with X-ray molecular imaging
  Experiments on simulated XFEL data
Software
2 / 35
Principal Component Analysis (PCA)
- PCA: a useful tool for dimensionality reduction and data analysis
- Data X: an n × p matrix (n samples from a p-dimensional population)
- PCs: linear combinations of the features explaining the most variance
  - eigenvectors of the sample covariance matrix Σ_n = (1/n) X^T X
  - the corresponding eigenvalue λ_i is the variance of PC_i
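A minimal MATLAB sketch of classical PCA (illustrative, not the released ePCA code; X is assumed to be an n × p data matrix):

    % Classical PCA via the sample covariance (minimal sketch)
    n = size(X, 1);
    Xc = X - repmat(mean(X, 1), n, 1);      % center each feature
    Sn = (Xc' * Xc) / n;                    % sample covariance Sigma_n, p x p
    [V, D] = eig(Sn);                       % eigenvectors and eigenvalues
    [lam, idx] = sort(diag(D), 'descend');  % order PCs by explained variance
    V = V(:, idx);                          % columns of V are the PC directions
    scores = Xc * V;                        % PC scores of each sample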
3 / 35
PCA in classical vs. high-dimensional regimes
Classical setting: p fixed, n → ∞
- Σ_n → Σ by the law of large numbers

High-dimensional setting: p/n = γ ∈ (0, ∞), p → ∞, n → ∞
- Classical asymptotics don't apply; plain PCA is inconsistent!
- Limiting spectrum: the Marchenko-Pastur (MP) distribution (Marchenko and Pastur 1967)
- Acute need to shrink empirical eigenvalues (Donoho et al. 2013)
4 / 35
Motivating example for ePCA: XFEL
(a) 3D structure of a lysozyme
(b) XFEL imaging process (Gaffney and Chapman 2007)
Figure: X-ray free electron laser (XFEL) molecular imaging
5 / 35
Demo of ePCA on XFEL images
(a) Clean intensity maps
(b) Noisy photon counts
(c) Denoised (PCA) (d) Denoised (ePCA)
Figure: XFEL diffraction images (n = 70,000, p = 65,536)
6 / 35
PCA for the exponential family
In many applications the X_ij's have an exponential family distribution:
- SNPs: Binomial
- RNA-seq: Negative Binomial
- XFEL/photon-limited imaging: low-intensity Poisson
Single-parameter exponential family distributions have a density of the form f_θ(y) = exp(θy − A(θ)).
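For example, Poisson(µ) fits this form with natural parameter θ = log µ and A(θ) = e^θ (with base measure 1/y!), so A′(θ) = A′′(θ) = e^θ = µ: the mean equals the variance.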
There is no commonly agreed-upon version of PCA for non-Gaussian data (Jolliffe 2002).
7 / 35
A new method: ePCA
A deterministic 4-step algorithm using moments and shrinkage.

Advantages compared to previous proposals:
- Likelihood/generalized linear latent variable models (Collins et al. 2001; Knott and Bartholomew 1999; Udell et al. 2016): lack global convergence guarantees
- Gaussianizing transforms: wavelet, Anscombe (Anscombe 1948; Starck et al. 2010): unsuitable for low-intensity data
8 / 35
Problem formulation
Sampling model for Poisson
- Each p-dimensional latent vector is drawn i.i.d. from a distribution D:

    X = (X^(1), ..., X^(p))^T ∼ D(µ, Σ)

  e.g., the noiseless image; µ and Σ are the mean and covariance of D.
- Observations Y_i ∼ Y ∈ R^p (e.g., the noisy image)
- Model for Y: draw the latent X ∈ R^p, then

    Y = (Y^(1), ..., Y^(p))^T, where Y^(j) ∼ Poisson(X^(j))

Goal: Recover information about the original distribution D, i.e. Σ. Recover X.
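A minimal MATLAB simulation sketch of this model (the sizes, rank k, and nonnegative low-rank construction of X are illustrative assumptions; poissrnd requires the Statistics Toolbox):

    % Simulate the Poisson sampling model: low-rank latent intensities X,
    % observed photon counts Y (illustrative sizes and construction)
    n = 1000; p = 200; k = 3;
    U = rand(p, k);                 % k nonnegative "eigen-image" factors
    C = rand(n, k);                 % random per-sample coefficients
    X = C * U' + 0.1;               % latent intensities, kept strictly positive
    Y = poissrnd(X);                % n x p matrix of noisy photon counts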
9 / 35
Problem formulation
Sampling model for Exponential family
- One-parameter exponential family with density

    f_θ(y) = exp(θy − A(θ))

  Natural parameter θ; E[y] = A′(θ), Var[y] = A′′(θ)
- Observations Y_i ∼ Y ∈ R^p
- Model for Y: draw a latent θ ∈ R^p, then

    Y^(j) | θ^(j) ∼ f_{θ^(j)}(y),   Y = (Y^(1), ..., Y^(p))^T

Goal: Recover Σ_x := Cov(A′(θ)) and X := E(Y | θ) = A′(θ)
10 / 35
Mean parameter is low-rank
The mean X = E(Y | θ) has unknown low-dimensional structure
- as opposed to the natural parameter θ
- can leverage Random Matrix Theory ⇒ a simple method
- reasonable for image data (Basri and Jacobs 2003)
11 / 35
Our sampling model is realistic
Two applications
                          1. XFEL images                 2. Single cell RNA-seq
Sample latent vector X    Random 2D intensity map due    Random expression rate due
                          to the random 3D orientation   to biological variation
                          of the molecule                of cells
Sample Y ∼ Poisson(X)     Noisy 2D image Y due to low    Noisy read counts Y due to
                          photon count                   technical variation
12 / 35
Introduction
  Review of PCA
  The ePCA problem
The ePCA method
  Overview
  Diagonal Debiasing
  Homogenization
  Shrinkage
  Heterogenization and Scaling
  Denoising
Illustration with X-ray molecular imaging
  Experiments on simulated XFEL data
Software
13 / 35
Summary of ePCA
ePCA can be seen as a sequence of improved covariance estimators
Table: Covariance estimators
Name                 Formula                                         Motivation
Sample covariance    S = (1/n) Σ_{i=1}^n (Y_i − Ȳ)(Y_i − Ȳ)^T       -
Diagonal debiasing   S_d = S − diag[V(Ȳ)]                            Hierarchy
Homogenization       S_h = D_n^{-1/2} S_d D_n^{-1/2}                 Heteroskedasticity
Shrinkage            S_h,η = η(S_h)                                  High dimensionality
Heterogenization     S_he = D_n^{1/2} S_h,η D_n^{1/2}                Heteroskedasticity
Scaling              S_s = Σ_i α_i v_i v_i^T (S_he = Σ_i v_i v_i^T)  Heteroskedasticity
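Read top to bottom, the table is a pipeline. A minimal MATLAB sketch of how the steps compose for Poisson data (illustrative only: eta stands in for a scalar eigenvalue shrinker, e.g. the one sketched on the shrinkage slide, and the scaling coefficients α_i are omitted since their formula is in Liu et al. 2016):

    % ePCA covariance pipeline for Poisson data Y (n x p) -- illustrative sketch
    [n, p] = size(Y);
    Ybar = mean(Y, 1)';                   % sample mean, p x 1 (assumed positive)
    S    = cov(Y, 1);                     % sample covariance (1/n convention)
    Sd   = S - diag(Ybar);                % diagonal debiasing: V(y) = y for Poisson
    d    = 1 ./ sqrt(Ybar);
    Sh   = (d * d') .* Sd;                % homogenization: Dn^{-1/2} Sd Dn^{-1/2}
    [V, D] = eig((Sh + Sh') / 2);         % symmetrize against round-off
    lam  = eta(diag(D), p / n);           % eigenvalue shrinkage (eta assumed given)
    She  = diag(sqrt(Ybar)) * (V * diag(lam) * V') * diag(sqrt(Ybar));  % heterogenize
    % Scaling: recombine the eigenvectors of She with corrected eigenvalues
    % alpha_i (formula in Liu et al. 2016; omitted here)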
14 / 35
Diagonally debiasing the sample covariance
Poisson case:
- Sample mean: Ȳ_n = (1/n) Σ_{i=1}^n Y_i
- Sample covariance: S = (1/n) Σ_{i=1}^n (Y_i − Ȳ_n)(Y_i − Ȳ_n)^T
- But S is inconsistent for Σ: E[S] = Σ + diag[µ]
- D_n := diag(Ȳ_n)
- Diagonally debiased covariance: S_d = S − D_n
15 / 35
Diagonally debiasing the sample covariance
Exponential family:
- Cov[Y] = Cov[A′(θ)] + E diag[A′′(θ)], by the law of total variance
- Mean-variance map: V(y) = A′′[(A′)^{-1}(y)] (examples sketched below)
- D_n := diag[V(Ȳ_n)] estimates E diag[A′′(θ)]
- S_d = S − D_n
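For illustration, mean-variance maps for a few common families (the binomial form assumes N trials known and the negative binomial assumes dispersion r known; these are textbook identities, not necessarily the exact parameterizations of Liu et al. 2016):

    % Mean-variance maps V(y) = A''[(A')^{-1}(y)] for a few families (sketch)
    V_poisson  = @(y) y;                  % Var = mean
    V_binomial = @(y, N) y .* (1 - y/N);  % mean N*p, Var = N*p*(1-p)
    V_negbin   = @(y, r) y .* (1 + y/r);  % Var = mean + mean^2/r
    Dn = diag(V_poisson(mean(Y, 1)));     % plug-in estimate, Poisson case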
Theorem (Convergence of the debiased covariance (Liu et al. 2016))

  E[‖S_d − Σ_x‖_F] ≲ (√p / n) · [√p · m_4 + ‖µ‖]
16 / 35
Homogenization
- Motivation: remove the effects of heteroskedasticity ⇒ closer to the standard spiked model (Johnstone 2001) + MP law
- S_h = D_n^{-1/2} S_d D_n^{-1/2} = D_n^{-1/2} S D_n^{-1/2} − I_p
- Different from standardization (dividing each measurement by its empirical standard deviation)!
Figure: (a) Spectrum before homogenization; (b) spectrum after homogenization, with the MP density overlaid in red.
17 / 35
Marchenko-Pastur Law for ePCA
Theorem (Marchenko-Pastur Law (Liu et al. 2016))
- The eigenvalue distribution of S converges a.s. to a general MP law F_{γ,H}.
- The eigenvalue distribution of S_h + I_p converges a.s. to the standard MP law with aspect ratio γ.
Importance of homogenization and the MP law
- enables optimal eigenvalue shrinkers for covariance estimation (Donoho et al. 2013; Lee et al. 2010)
- improves signal strength (Liu et al. 2016)
- matches the Hardy-Weinberg equilibrium normalization (Patterson et al. 2006)
18 / 35
Eigenvalue shrinkage
- Reduce noise by shrinkage
- η(·) is a scalar shrinker, applied elementwise to the eigenvalues:

    M = VΛV^T;   η(M) = V η(Λ) V^T;   S_h,η = η(S_h)

- Optimal shrinkage functions (Donoho et al. 2013); a simple spiked-model shrinker is sketched after the figure
Figure: Need to Shrink! Spiked model spectrum after homogenization
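One concrete choice of η (a sketch, not necessarily the loss-optimal shrinkers of Donoho et al. 2013): hard-threshold at the MP bulk edge and invert the spiked-model eigenvalue map. Since S_h + I_p follows the standard MP law, the sketch shrinks the eigenvalues of S_h + I_p and subtracts 1 at the end:

    function lam_s = spike_shrink(lam, gamma)
    % Shrink the eigenvalues lam of Sh + I_p under the standard spiked model
    % (noise level 1, aspect ratio gamma) -- illustrative sketch
    edge  = (1 + sqrt(gamma))^2;          % MP bulk edge
    lam_s = ones(size(lam));              % bulk eigenvalues are pure noise -> 1
    above = lam > edge;                   % detectable spikes only
    b = lam(above) + 1 - gamma;
    lam_s(above) = (b + sqrt(b.^2 - 4*lam(above))) / 2;  % invert the spike map
    lam_s = lam_s - 1;                    % back to the eigenvalues of eta(Sh)
    end

With the eigendecomposition S_h = VΛV^T, this gives S_h,η = V diag(spike_shrink(diag(Λ) + 1, p/n)) V^T.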
19 / 35
Heterogenization
- The homogenized covariance matrix S_h doesn't estimate the original covariance matrix Σ! ⇒ need to heterogenize by multiplying back the estimated standard errors
- Improves eigenvector estimates
- S_he = D_n^{1/2} · S_h,η · D_n^{1/2}
20 / 35
Scaling
Heterogenization induces a bias in the eigenvalues (Liu et al. 2016)
- Need a final scaling step to correct the bias
- S_s = Σ_i α_i v_i v_i^T, where S_he = Σ_i v_i v_i^T
- Scaling function based on assuming a Gaussian spiked model (Baik et al. 2005; Johnstone 2001)
- We conjecture that the phase-transition literature also applies to our model
21 / 35
Denoising by EBLP
- Use the standard Empirical Best Linear Predictor (EBLP) strategy from random effects models (Searle et al. 2009). First we estimate E[A′′(θ)] and Σ_x using ePCA. Then, estimate

    X̂_i = Σ_x (diag[E A′′(θ)] + Σ_x)^{-1} Y_i + diag[E A′′(θ)] (diag[E A′′(θ)] + Σ_x)^{-1} Ȳ
- For the Poisson distribution,

    X̂_i = S_s (diag[Ȳ] + S_s)^{-1} Y_i + diag[Ȳ] (diag[Ȳ] + S_s)^{-1} Ȳ
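In MATLAB the Poisson EBLP takes a couple of lines (a sketch assuming Y is the n × p count matrix and Ss the scaled ePCA covariance estimate; it uses the identity diag[Ȳ](diag[Ȳ] + S_s)^{-1} = I − S_s(diag[Ȳ] + S_s)^{-1}, so only one weight matrix is needed):

    % EBLP denoising for Poisson data -- illustrative sketch
    [n, p] = size(Y);
    Ybar = mean(Y, 1)';                   % p x 1 sample mean
    W = Ss / (diag(Ybar) + Ss);           % W = Ss * inv(diag(Ybar) + Ss)
    Xhat = Y * W' + repmat(((eye(p) - W) * Ybar)', n, 1);  % row i is Xhat_i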
22 / 35
Introduction
  Review of PCA
  The ePCA problem
The ePCA method
  Overview
  Diagonal Debiasing
  Homogenization
  Shrinkage
  Heterogenization and Scaling
  Denoising
Illustration with X-ray molecular imaging
  Experiments on simulated XFEL data
Software
23 / 35
Covariance estimation
(a) Spectral norm (b) Frobenius norm
Figure: Error of covariance matrix estimation: norm of the difference between each successive covariance estimate (Sample, +Debiased, +Heterogenized, +Scaled) and the true covariance Σ_x.
24 / 35
Eigenvalues
Figure: Error of eigenvalue estimation for the top 5 eigenvalues, measured as percentage error relative to the true eigenvalue.
25 / 35
Eigenvectors
Figure: XFEL Eigenimages for γ = 1/4, ordered by eigenvalue
26 / 35
Denoising comparison
Figure: Comparing various methods' sampled reconstructions of the XFEL dataset. ePCA took 13.9 seconds, while the exponential family PCA of Collins et al. (2001) took 10900 seconds, or about 3 hours.
27 / 35
Denoising results
Figure: MSE against log10 of the sample size; mean over 50 Monte Carlo trials. Purple: PCA. Grey: ePCA (projection only). Green: ePCA + EBLP.
28 / 35
Software
- ePCA is publicly available as an open-source MATLAB implementation at github.com/lydiatliu/epca/
- The main function is exp_fam_pca, e.g.:

    [cov,~,~,~,~,eigval,eigvec,~,~,~] = exp_fam_pca(data, 'poisson')
29 / 35
Summary
1. A new method, ePCA, for PCA of exponential family data, based on a new covariance estimator.
2. Homogenization, shrinkage, heterogenization, and scaling of the debiased covariance matrix improve performance for high-dimensional data; each step has theoretical justification.
3. Applied ePCA to simulated XFEL data ⇒ reduced MSE for covariance, eigenvalue, and eigenvector estimation.
4. Used ePCA to develop a new denoising method, a form of Empirical Best Linear Predictor (EBLP) from random effects models, demonstrated on simulated XFEL data.
30 / 35
References I
F. J. Anscombe. "The Transformation of Poisson, Binomial and Negative-Binomial Data". In: Biometrika 35.3-4 (1948), p. 246.
Jinho Baik, Gérard Ben Arous, and Sandrine Péché. "Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices". In: Annals of Probability 33.5 (2005), pp. 1643–1697.
Ronen Basri and David W. Jacobs. "Lambertian Reflectance and Linear Subspaces". In: IEEE Transactions on Pattern Analysis and Machine Intelligence 25.2 (2003), pp. 218–233.
Michael Collins, Sanjoy Dasgupta, and Robert E. Schapire. "A generalization of principal component analysis to the exponential family". In: Advances in Neural Information Processing Systems (NIPS) (2001).
31 / 35
References II
D. L. Donoho, M. Gavish, and I. M. Johnstone. "Optimal shrinkage of eigenvalues in the spiked covariance model". In: arXiv preprint arXiv:1311.0851 (2013), pp. 1–35.
K. J. Gaffney and H. N. Chapman. "Imaging Atomic Structure and Dynamics with Ultrafast X-ray Scattering". In: Science 316.5830 (2007), pp. 1444–1448. doi: 10.1126/science.1135923. url: http://science.sciencemag.org/content/316/5830/1444.
Iain M. Johnstone. "On the distribution of the largest eigenvalue in principal components analysis". In: Annals of Statistics 29.2 (2001), pp. 295–327.
32 / 35
References III
Ian Jolliffe. Principal Component Analysis. Wiley Online Library, 2002.
Martin Knott and David J. Bartholomew. Latent Variable Models and Factor Analysis. Edward Arnold, 1999.
Seunggeun Lee, Fei Zou, and Fred A. Wright. "Convergence and prediction of principal component scores in high-dimensional settings". In: Annals of Statistics 38.6 (2010), pp. 3605–3629.
Vladimir A. Marchenko and Leonid A. Pastur. "Distribution of eigenvalues for some sets of random matrices". In: Mat. Sb. 114.4 (1967), pp. 507–536.
N. Patterson, A. L. Price, and D. Reich. "Population structure and eigenanalysis". In: PLoS Genet 2.12 (2006), e190.
33 / 35
References IV
Shayle R. Searle, George Casella, and Charles E. McCulloch. Variance Components. John Wiley & Sons, 2009.
Andrey A. Shabalin and Andrew B. Nobel. "Reconstruction of a low-rank matrix in the presence of Gaussian noise". In: Journal of Multivariate Analysis 118 (2013), pp. 67–76.
Jean-Luc Starck, Fionn Murtagh, and Jalal M. Fadili. Sparse Image and Signal Processing: Wavelets, Curvelets, Morphological Diversity. Cambridge University Press, 2010.
Madeleine Udell et al. "Generalized Low Rank Models". In: Foundations and Trends in Machine Learning 9.1 (2016), pp. 1–118.
34 / 35
References V
L. T. Liu, E. Dobriban, and A. Singer. "ePCA: High Dimensional Exponential Family PCA". In: arXiv e-prints (Nov. 2016). arXiv: 1611.05550.
35 / 35