arXiv:2005.06551v2 [astro-ph.CO] 15 Jul 2020

MNRAS 000, 1–15 (2020) Preprint 16 July 2020 Compiled using MNRAS LaTeX style file v3.0

Parameter Inference for Weak Lensing using Gaussian Processes and MOPED

Arrykrishna Mootoovaloo1⋆, Alan F. Heavens1, Andrew H. Jaffe1 and Florent Leclercq1
1 ICIC, Astrophysics, Imperial College, Blackett Laboratory, Prince Consort Road, London SW7 2AZ, UK

Accepted 2020 July 10. Received 2020 July 03; in original form 2020 May 19

ABSTRACT

In this paper, we propose a Gaussian Process (GP) emulator for the calculation both of tomographic weak lensing band-powers, and of coefficients of summary data massively compressed with the MOPED algorithm. In the former case, cosmological parameter inference is accelerated by a factor of ∼ 10–30 compared with the Boltzmann solver CLASS applied to KiDS-450 weak lensing data. Much larger gains of order 10³ will come with future data, and MOPED with GPs will be fast enough to permit the Limber approximation to be dropped, with acceleration in this case of ∼ 10⁵. A potential advantage of GPs is that an error on the emulated function can be computed and this uncertainty incorporated into the likelihood. However, it is known that the GP error can be unreliable when applied to deterministic functions, and we find, using the Kullback-Leibler divergence between the emulator and CLASS likelihoods, and from the uncertainties on the parameters, that agreement is better when the GP uncertainty is not used. In future, weak lensing surveys such as Euclid, and the Legacy Survey of Space and Time (LSST), will have up to ∼ 10⁴ summary statistics, and inference will be correspondingly more challenging. However, since the speed of MOPED is determined not by the number of summary data, but by the number of parameters, MOPED analysis scales almost perfectly, provided that a fast way to compute the theoretical MOPED coefficients is available. The GP provides such a fast mechanism.

Key words: cosmology: cosmological parameters – large-scale structure of Universe – gravitational lensing: weak – methods: data analysis – statistical

1 INTRODUCTION

With the continuous advancement of technology, our understanding of the Universe has progressively improved, starting with the radiation of the cosmic microwave background (CMB) from experiments such as COBE (Smoot et al. 1992; Jaffe et al. 2001), WMAP (Spergel et al. 2003) and Planck (Planck Collaboration et al. 2014, 2016), to ongoing observations of supernovae as standard candles (Betoule et al. 2014) and large-scale structure probes (Anderson et al. 2014). While these experiments place the ΛCDM model on a firm footing, there is still the need to understand better dark matter and the evolution of dark energy. A powerful probe of the geometry of the Universe is cosmic shear, the weak lensing effect observed as a result of bending of light between the observer and background galaxies due to intervening large-scale structure. Cosmic shear is proving to be a powerful method for measuring the temporal and spatial properties

⋆ E-mail: [email protected]

of dark species statistically over a large sample of sources (Köhlinger et al. 2017).

However, one possible bottleneck in the era of massive surveys lies in the forward computation of two-point statistics from cosmological parameters, either correlation functions or power spectra (Kilbinger 2015). The complexity of the problem is exacerbated when performing weak lensing analysis in n tomographic redshift slices, for which n(n + 1)/2 auto- and cross-correlations are needed for each angular scale probed. If intrinsic alignments (Heavens et al. 2000a; Hirata & Seljak 2004) are included, we require an additional calculation of n(n + 1) power spectra. In general, running this full forward model is computationally expensive. The problem is even worse in very large simulation settings. As an example, Heitmann et al. (2009) argued that a naive analysis to obtain cosmological constraints from future surveys such as LSST (Seppala 2002; LSST Science Collaboration et al. 2017) from weak lensing shear spectra would take a 2048-processor machine 20 years. An alternative approach is to replace the simulator (the full forward model) by an

© 2020 The Authors



approximate mathematical function, often referred to as a metamodel, surrogate model, or emulator.

Various types of emulators have been designed for different purposes in cosmology. As an example, PICO (Fendt & Wandelt 2007b) was developed to accelerate parameter estimation for the CMB. The underlying idea is to use an order-p polynomial and a clustering method to interpolate between power spectra generated at specific points in parameter space. On the other hand, neural networks have also been used as emulators for generating power spectra. Auld et al. (2007) used a 3-layer multilayer perceptron (MLP) algorithm to construct an emulator to accelerate the calculation of the CMB power spectra. Neural networks learn the non-linearities between the input space and output space by optimizing the weights associated with each neuron. In the same spirit, Schmit & Pritchard (2018) developed a 2-layer MLP algorithm to emulate the 21cm power spectrum. Similarly, Agarwal et al. (2012, 2014) designed a neural network emulator for non-linear matter power spectrum interpolation based on 6 cosmological parameters. Charnock et al. (2018) also used neural networks to find functionals of the data that maximise the Fisher information matrix. Manrique-Yus & Sellentin (2020) devised a neural network scheme, coupled to an MCMC, to accelerate cosmological parameter inference. In an alternative approach, Habib et al. (2007) developed a statistical framework using Gaussian Processes to emulate the matter power spectrum, and Schneider et al. (2011) used a similar method to emulate the cosmic microwave background temperature power spectrum. An analogous approach was used and extended in the Coyote Universe collaboration (Heitmann et al. 2009, 2010; Lawrence et al. 2010; Heitmann et al. 2014). Recently, Bird et al. (2019) and Rogers et al. (2019) developed Gaussian Process emulators for Lyman-α forest simulations, and Kern et al. (2017) designed a Gaussian Process emulator for the 21cm power spectrum.

An alternative to predicting the signal using Machine Learning techniques is to predict the likelihood. Instead of building emulators at the level of the theory/data, another option is to construct a likelihood regressor. Fendt & Wandelt (2007a) extended the PICO formalism to fit a likelihood function. Leclercq (2018) applied the BOLFI algorithm (Gutmann & Corander 2015) to construct a likelihood regressor which fits two cosmological parameters, w0 and Ωm, for the JLA dataset (Betoule et al. 2014). In particular, in the BOLFI approach, the uncertainty from the Gaussian Process appears in the acquisition function, which is used to choose the next design point where the simulation will be run. However, the surrogate model itself is not a perfect reconstruction of the true likelihood function. If instead the emulator were built at the level of the theoretical model, the GP uncertainty could be propagated through the full forward model. In a similar vein, Alsing et al. (2018) developed a density-estimation likelihood-free method and argued that it requires less tuning compared to traditional approaches such as Approximate Bayesian Computation (ABC). Leclercq et al. (2019) also developed a likelihood-free approach to infer the power spectrum and cosmological parameters from forward simulations only.

Each type of emulator (polynomial, neural network and Gaussian Processes) has its own advantages and disadvantages. In the case of polynomial regression, one has to specify the basis functions prior to performing interpolation. However, it provides a full predictive distribution of the function at a test point. On the other hand, neural network regression requires much empirical work, for example, choosing an appropriate optimiser, setting the learning rate of the optimiser and choosing the number of layers and neurons. The training stage is computationally very challenging if we have tens or hundreds of functions to interpolate, as in our case, but it can be done (Manrique-Yus & Sellentin 2020). The outputs from a neural network are point estimates, unless we consider Bayesian neural networks, which attempt to get a full posterior distribution on the weights of the neural network using variational inference methods, but this is also challenging (Kendall & Gal 2017).

In this paper, we develop a Gaussian Process emulator for speeding up cosmological parameter inference for weak lensing cosmology, whilst keeping the number of calls to the full solver to the order of a few tens of thousands only, to avoid an expensive and potentially unstable Cholesky decomposition. In other contexts, where the generation of training point data may be done through expensive full numerical simulations, the number may need to be further restricted. Despite this restriction, an important and interesting result is that the posterior densities from the emulator and the full solver are comparable. On the other hand, in the case of full N-body simulations, one might be restricted to only a few hundred forward simulations, and the mechanism (MOPED + GP) proposed in this work (see discussion below) will enable us to develop promising inference schemes for this scenario. A useful advantage of GPs is that we also get a theoretical predictive uncertainty; note, however, that the latter depends on the Gaussian Process model, including the kernel. Building such a framework is a challenging task. It is well-known that non-parametric regression methods such as Gaussian Processes suffer from the so-called 'curse of dimensionality' (Geenens et al. 2011), as a result of the training data being sparse in high-dimensional spaces. The only way to obtain accurate interpolation is to add information to our model, and this can be achieved by adding more and more training points. However, this involves a penalty, not only in terms of computation but also storage. Moreover, from the cosmology perspective, drawing inferences from current weak lensing data is a demanding task, for we have to correctly account for sources of systematic error, which add new effective parameters to our model. In addition, given the current status of the weak lensing field, with relatively few data points at low signal-to-noise, cosmological parameters are not very well constrained compared to (e.g.) the CMB. As a result, our training points have to be distributed over a larger volume in parameter space.

Fortunately, sophisticated techniques such as Latin Hypercube (LH) sampling (McKay et al. 1979), with appropriate transformations at the level of both the input covariates and the response, can simplify the problem. Re-casting the full problem as a hierarchical model enables us to leverage the predictive uncertainty of the reconstructed probabilistic functions, thus propagating the uncertainty consistently in the statistical framework. This is in principle a major advantage of Gaussian Processes, but the GP error is not always a reliable measure when applied to deterministic functions (Karvonen et al. 2020; Wang 2020), and we find that the posteriors are more accurate if it is not included (see §6 for further details).


Figure 1. A diagrammatic form of the core principle in this work. We substitute the most expensive part of the pipeline by surrogate models (Gaussian Processes) built at the level of the band powers. The other blocks in the inference procedure (likelihood, covariance, data, systematics), for example the computations related to the nuisance parameters, are unaltered.

A powerful application of the GP emulator is to combine it with the extreme data compression algorithm MOPED (Heavens et al. 2000b) to further speed up parameter inference. MOPED is essentially a lossless data compression technique which, irrespective of the size of the original data, compresses the latter to the number of parameters in the model. One can simply replace the theoretical prediction by the surrogate model in the MOPED likelihood function and use the MOPED vectors to compute the MOPED coefficients. However, in order to achieve the full acceleration that MOPED allows, a fast mechanism to generate the expected coefficients themselves is needed, and GPs can do this.

In this study, as a first application, the Gaussian Process emulator is built firstly for tomographic weak lensing band power spectra, and secondly for MOPED coefficients. With a well-designed emulator, it is possible to obtain reliable marginalised posterior densities of cosmological and nuisance parameters with only a few hundred forward simulations. We find that, for a small number of training points, the emulator yields posterior distributions close to the true posterior whilst still maintaining a decent speed for each likelihood evaluation. When combined with MOPED data compression, the GP allows for very large acceleration of the analysis of large datasets, which may otherwise be prohibitively expensive.

The paper is organised as follows: in §2 we provide a brief overview of weak lensing theory and in §3 we touch upon Gaussian Processes, before systematically going through the main steps taken to build the emulator in §4. In §5, we discuss how the MOPED algorithm can be used alongside the emulator, and we present our results in §6 before concluding in §7. Throughout, we will assume a flat universe.

2 WEAK LENSING

Gravitational lensing is the bending of light as it propagates through the inhomogeneous Universe, which leads to a coherent distortion of galaxy images. In particular, weak lensing can only be studied in a statistical sense, since the distortion is small and requires averaging over a large sample of galaxies as a result of the broad distribution of intrinsic ellipticities. Weak lensing has the advantage of probing the distribution of matter, and not that of biased tracers such as galaxies, which is harder to predict. In essence, we would ideally want to perform a full 3D statistical analysis, but much of the cosmological information can still be retained by tomography, where objects are separated by redshift. We refer the reader to the literature on weak gravitational lensing (Bartelmann & Schneider 2001; Kilbinger 2015) for further details on these techniques.

In a weak lensing analysis, observables include galaxy positions, photometric redshifts and shapes. The latter are given in terms of the ellipticity components, ε1 and ε2. In particular, the observed ellipticity, ε = ε1 + iε2, is related to the unlensed intrinsic ellipticity, ε0, via the shear field such that

ε ≃ ε0 + γ. (1)

The observed ellipticities are binned into pixels and redshift bins, i. An estimate of the shear field, γ_i, is obtained by averaging the ellipticities in each pixel. This (complex) shear field can be expanded in terms of spin-weighted spherical harmonics, _sY_ℓm (Hu 2000; Castro et al. 2005):

γ1(r) ± iγ2(r) = √(2/π) Σ_ℓm ∫ dk k² γ_ℓm(k) _{±2}Y_ℓm(n̂) j_ℓ(kr), (2)

where j_ℓ is a spherical Bessel function, k is a radial wavenumber and ℓ is a positive integer, while m = −ℓ, ..., ℓ. The coefficients γ_ℓm are related to the transform of the lensing potential, φ(r), by

γ_ℓm = (1/2) √[(ℓ + 2)!/(ℓ − 2)!] φ_ℓm(k). (3)

Similarly, the expansion coefficients for the convergence field, κ, are

κ_ℓm = −[ℓ(ℓ + 1)/2] φ_ℓm(k). (4)

The shear field can be decomposed into E and B modes, corresponding to the curl-free and divergence-free components. In particular, the convergence field κ_E contains most of the cosmological information, since κ_B is negligible in the absence of systematics (Castro et al. 2005). Under this condition, the E-mode lensing power spectrum between tomographic bins i and j is equal to the convergence power spectrum, that is, C^EE_ℓ,ij = C^κκ_ℓ,ij, and is given, in the Limber approximation (Limber 1953; Loverde & Afshordi 2008), by

C^EE_ℓ,ij = ∫_0^χH dχ [w_i(χ) w_j(χ)/χ²] P_δ(k = (ℓ + 1/2)/χ; χ), (5)

where χ is the comoving radial distance and χH is the comoving distance to the horizon. Without the Limber approximation, the integrals can be slow to compute, although faster methods are being developed (Fang et al. 2019). Crucially, the tomographic convergence power spectrum is sensitive to the background geometry and the growth of structure. It depends on the three-dimensional matter power spectrum, P_δ(k; χ), which is a function of redshift (Weinberg et al. 2013). The weight function w_i for a flat universe is

w_i(χ) = [3 Ωm H0²/(2c²)] χ (1 + z) ∫_χ^χH dχ′ n_i(χ′) (χ′ − χ)/χ′, (6)

which depends on the lensing kernel. Ωm is the present matter density, H0 is the Hubble constant and c is the speed


of light. An important quantity is the redshift distribution, n_i(z) dz = n_i(χ) dχ, which is normalised such that

∫ n_i(χ) dχ = 1. (7)
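To make equations (5)-(7) concrete, a minimal numerical sketch of the Limber integral follows (Python; the callable p_delta is a hypothetical stand-in for the 3D matter power spectrum, which in practice would come from a Boltzmann solver such as CLASS):

import numpy as np

def limber_cl_ee(ell, chi, w_i, w_j, p_delta):
    """E-mode spectrum in the Limber approximation (equation 5).

    ell      : multipole
    chi      : grid of comoving distances in (0, chi_H], shape (n,)
    w_i, w_j : lensing weight functions of equation (6), evaluated on chi
    p_delta  : callable p_delta(k, chi) returning the 3D matter power spectrum
    """
    k = (ell + 0.5) / chi                        # Limber wavenumber
    integrand = w_i * w_j / chi**2 * p_delta(k, chi)
    return np.trapz(integrand, chi)              # integrate over chi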

For a weak lensing survey, the data vector consists of the measured shear per pixel for each redshift bin. At this point, in order to extract the shear power spectrum, one can either take a quadratic estimator approach using a maximum-likelihood technique (Bond et al. 1998) or employ a pseudo-Cℓ approach (Hinshaw et al. 2007). Alternatively, one can also build a full Bayesian hierarchical model to infer the full shear power spectrum (Alsing et al. 2016, 2017). Here, we focus on the tomographic band power spectra, as determined by Köhlinger et al. (2017).

2.1 Astrophysical Systematics

Coupled to the E-mode power spectrum are various systematics which we should consider. For example, baryon feedback alters the power at high k. Although feedback is not fully understood, it is often parameterized through the bias function, b²(k, z), such that the modified power spectrum is

P_δ^mod(k, z) = b²(k, z) P_δ(k, z). (8)

As an example, for the KiDS-450 analysis, the following fitting formula from van Daalen et al. (2011) was used:

b²(k, z) = 1 − A_bary [A_z e^((B_z x − C_z)³) − D_z x e^(E_z x)], (9)

where x = log10(k/1 Mpc⁻¹) and the other parameters A_z, B_z, C_z, D_z and E_z depend on the scale factor a. Moreover, we must account for intrinsic alignment effects, which give rise to a preferred ellipticity orientation. The total lensing power spectrum between two redshift slices is a linear combination of the gravitational lensing (EE), intrinsic alignment (II) and interference (GI) power spectra. Specifically, the II effect is due to correlation of ellipticities in the local environment and contributes positively towards the total lensing spectrum. The second effect, GI, is due to correlation between tidally-stretched foreground galaxies and the shear of background galaxies. The GI term subtracts from the total lensing spectrum. We model the power spectrum, following Köhlinger et al. (2017), as

C^tot_ℓ,ij = C^EE_ℓ,ij + A_IA² C^II_ℓ,ij − A_IA C^GI_ℓ,ij, (10)

where the II power spectrum, C^II_ℓ,ij, and the GI power spectrum, C^GI_ℓ,ij, respectively are

C^II_ℓ,ij = ∫_0^χH dχ [w_i(χ) w_j(χ)/χ²] P_δ(k = (ℓ + 0.5)/χ; χ) F²(χ), (11)

C^GI_ℓ,ij = ∫_0^χH dχ {[w_i(χ) n_j(χ) + w_j(χ) n_i(χ)]/χ²} P_δ(k = (ℓ + 0.5)/χ; χ) F(χ), (12)

F(χ) = C1 ρ_crit Ωm / D+(χ), (13)

and A_IA is a free parameter to be inferred during sampling. This allows for the flexibility of rescaling the otherwise fixed normalisation value, C1 = 5 × 10⁻¹⁴ h⁻² M_⊙⁻¹ Mpc³. ρ_crit is the critical density of the Universe, while D+(χ) refers to the linear growth factor normalised to unity today.
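As a concrete illustration, equation (9) translates directly into code; a minimal sketch, where the scale-factor-dependent coefficients A_z, ..., E_z are assumed to have been precomputed for the scale factor of interest:

import numpy as np

def baryon_feedback_bias_sq(k, a_bary, a_z, b_z, c_z, d_z, e_z):
    """Squared bias of equation (9), the van Daalen et al. (2011) fitting
    formula. k is in Mpc^{-1}; returns b^2(k, z), which multiplies the
    matter power spectrum as in equation (8)."""
    x = np.log10(k)
    return 1.0 - a_bary * (a_z * np.exp((b_z * x - c_z)**3)
                           - d_z * x * np.exp(e_z * x))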

3 GAUSSIAN PROCESSES

In this section, we provide a general outline of the basic concepts behind the Gaussian Process, which is at the core of our cosmological parameter inference scheme. We refer the reader to Rasmussen & Williams (2006) for further details.

Our goal is to design an emulator which takes as input a set of cosmological parameters, θ, and outputs summary statistics, y, for example power spectra or band powers. We consider our regression problem to be of the form

y = f + ε, (14)

where f is a multivariate Gaussian when evaluated at an arbitrary number of points, and ε is an error. The distribution of f is informed by a set of N_train training points in parameter space, and has a Gaussian prior distribution, f ∼ N(0, k_pq), with covariance k_pq ≡ k(θ_p, θ_q).

The kernel function k, which encapsulates the correlation between points in the parameter space, is used to model the smoothness of the function. We choose the automatic relevance determination (ARD) kernel, also referred to as the Squared Exponential (SE) kernel, Radial Basis Function (RBF) or simply the Gaussian kernel:

k(θ_p, θ_q) = A² exp[−(1/2) (θ_p − θ_q)ᵀ Λ⁻¹ (θ_p − θ_q)], (15)

where Λ = diag(λ1, λ2 ... λd). A and λ are referred to as the amplitude and characteristic squared length-scale respectively. In particular, the former determines the average distance of the function from the mean, while λ controls the smoothness of the function. The set of hyper-parameters for this particular kernel is {A, λ}. This kernel has the nice property that it is fully differentiable and positive definite.

Unlike parametric regression, where we define priors over parameters, we now define a prior covariance over the functions directly, and using Bayes' theorem, the joint posterior distribution of the functions is

p(f | θ, y) = p(y | θ, f) p(f | θ) / p(y | θ), (16)

where the likelihood is p(y | θ, f) ∼ N(f, Σ) and the prior is p(f | θ) ∼ N(0, K). Σ is the noise covariance matrix and K = k(θ, θ′) is the kernel matrix, obtained by calculating equation (15) for every pair of points in θ. Prior to making predictive inference, the Gaussian Process is trained by finding the set of hyper-parameters {A, λ} which maximises the Bayesian evidence (marginal likelihood)

log p(y | θ) = −(1/2) yᵀ K_y⁻¹ y − (1/2) log|K_y| + constant, (17)


Figure 2. The E-mode band powers (data) used in our inference scheme, similar to the KiDS-450 analysis (Köhlinger et al. 2017). The ℓ-ranges are as follows: 76 ≤ ℓ < 220, 221 ≤ ℓ < 420, 421 ≤ ℓ < 670 and 671 ≤ ℓ < 1310. In particular, the auto-correlation band powers are along the main diagonal (z1 × z1, z2 × z2 and z3 × z3) for the 3 redshift bins 0.10 < z1 ≤ 0.30, 0.30 < z2 ≤ 0.60 and 0.60 < z3 ≤ 0.90. The off-diagonal blocks show the unique cross-correlation band powers. The blue shaded regions indicate the 1σ level errors from the covariance matrix.

where K_y = K + Σ. The first term in the marginal likelihood controls the fit to the data, while the second term controls the model complexity. For numerical stability, we first compute the Cholesky factor L of K_y ≡ LLᵀ, solve for u in the linear system Lu = y, followed by solving for α in Lᵀα = u. The marginal likelihood is then given by

log p(y | θ) = −(1/2) yᵀ α − Σ_i log L_ii + constant. (18)

Moreover, the partial derivatives of equation (17) with respect to the kernel hyperparameters, η = {A, λ}, can be computed in closed form:

∂ log p(y | θ)/∂η_i = (1/2) tr[(ααᵀ − K_y⁻¹) ∂K/∂η_i], (19)

where α = K_y⁻¹ y. The gradients are useful when maximising the marginal likelihood using gradient-based optimisation. Another option, which we will not use for this particular application, is to marginalise over the kernel hyperparameters (given an appropriate set of priors) in a fully Bayesian formalism. The reason for not taking this route is that the number of latent variables in the data model becomes large (∼ 10²), and recall that the marginal likelihood for a GP is an expensive calculation.

For a given test point θ∗, the posterior distribution of the function, f∗ = f(θ∗), is a Gaussian distribution with mean and variance given by

f̄∗ = k∗ᵀ α,
var(f∗) = k∗∗ − k∗ᵀ K_y⁻¹ k∗. (20)

The GP approach is quite appealing as it predicts both the mean and variance of the function at a particular test point, at the expense of an O(N²) operation.


Figure 3. The data correlation matrix for the KiDS-450 analysis, with entries C_ij/√(C_ii C_jj). We have ordered the covariance matrix in order of the tomographic labelling ij. Note that we have 4 band powers per tomographic bin, hence 6 × 4 blocks in the covariance matrix.

Moreover, as seen from equation (20), the mean of a GP is a linear predictor. Once α is calculated, the mean function can be quickly and easily calculated for any set of test points, since it involves only O(N) operations. In addition, if required, the analytical gradient of the mean function at a particular test point can also be calculated. However, a GP can be prohibitively expensive for large data sets, since it involves the computation of the Cholesky factor, which has computational complexity O(N³) during the training phase, and requires O(N²) operations for the predictive variance. Besides, a GP has O(N²) memory requirements for storing the Cholesky factor (if the computation of the predictive error is required) and the vector α.
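To make the training and prediction steps concrete, the following is a minimal sketch of equations (15), (18) and (20) using NumPy and SciPy, for fixed hyperparameters and with a small jitter standing in for the noise covariance Σ (the function names are ours, not from the paper's code):

import numpy as np
from scipy.linalg import cholesky, cho_solve, solve_triangular

def ard_kernel(tp, tq, amp, ls):
    """ARD/squared-exponential kernel of equation (15).
    tp: (n, d) and tq: (m, d) inputs; ls: per-dimension length-scales."""
    diff = tp[:, None, :] - tq[None, :, :]
    return amp**2 * np.exp(-0.5 * np.sum((diff / ls)**2, axis=-1))

def train_gp(theta, y, amp, ls, jitter=1e-10):
    """Cholesky-based training: returns L, alpha = K_y^{-1} y and the
    log marginal likelihood of equation (18), up to a constant."""
    Ky = ard_kernel(theta, theta, amp, ls) + jitter * np.eye(len(y))
    L = cholesky(Ky, lower=True)
    alpha = cho_solve((L, True), y)
    log_ml = -0.5 * y @ alpha - np.sum(np.log(np.diag(L)))
    return L, alpha, log_ml

def predict_gp(theta_star, theta, L, alpha, amp, ls):
    """Predictive mean and variance of equation (20) at test points."""
    k_star = ard_kernel(theta, theta_star, amp, ls)   # (n_train, n_test)
    mean = k_star.T @ alpha                           # k_*^T alpha
    v = solve_triangular(L, k_star, lower=True)       # L^{-1} k_*
    var = amp**2 - np.sum(v**2, axis=0)               # k_** - k_*^T K_y^{-1} k_*
    return mean, var

The O(N³) Cholesky factorisation is done once at training; each subsequent mean prediction costs only O(N), which is what makes the emulator fast inside an MCMC.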

In the next section, we look into how a GP emulator can be useful for future weak lensing surveys, for which we naively expect around 10 tomographic redshift bins and thousands of summary statistics (Euclid Collaboration et al. 2019). In particular, one can either emulate the band powers directly or the MOPED coefficients. We investigate both and discuss the advantages of using the MOPED coefficients in the following sections.

4 EMULATOR

In this section, we use the formalism presented above to build the emulator. In brief, this involves 4 main stages: 1) generating a set of design points, 2) running the full forward simulator at these points, 3) training the emulator and 4) making predictions at test locations in the parameter space. Once this is done, the emulator is connected to an MCMC sampler to obtain the marginalised posterior distributions of the parameters in our model. A simple flow of the core idea is shown in Fig. 1. In the following, we touch briefly on the data we have used for our analysis before systematically going through the steps we have taken to build the emulator.

Table 1. Set of cosmological and systematic parameters which are used as inputs in the emulator. θ is the set of parameters (first 8 parameters in the table below) used for the emulation scheme and β is the set of the remaining 4 parameters, which are also marginalised over.

Definition                                    Symbol
CDM density                                   Ωcdm h²
Baryon density                                Ωb h²
Scalar spectrum amplitude                     ln(10¹⁰ As)
Scalar spectral index                         ns
Hubble parameter                              h
Free amplitude baryon feedback parameter      Abary
Intrinsic alignment parameter                 AIA
Neutrino mass (eV)                            Σmν
Free amplitude (bin 1)                        A1
Free amplitude (bin 2)                        A2
Free amplitude (bin 3)                        A3
Multiplicative bias                           m

4.1 Data

We use the publicly-available weak lensing data from Köhlinger et al. (2017) to test the performance of our emulator. We use 3 tomographic redshift bins, namely 0.10 < z < 0.30, 0.30 < z < 0.60 and 0.60 < z < 0.90, and the convergence power spectrum is computed in the range 10 < ℓ < 4000. Moreover, we follow Köhlinger et al. (2017) and drop the first, second-to-last and last band powers in our analysis; that is, we use only the band powers corresponding to the following ℓ-ranges: 76 ≤ ℓ < 220, 221 ≤ ℓ < 420, 421 ≤ ℓ < 670 and 671 ≤ ℓ < 1310. For a 3-bin tomographic analysis, we have 6 auto- and cross-tomographic power spectra to calculate. The data and covariance matrix for this problem are shown in Figs. 2 and 3 respectively.

The emulator can be built at the level of the power spectra or the band powers. Here we choose to build a GP for each band power, giving 24 GPs. Alternatively, for likelihood-free inference methods, one can also emulate the likelihood directly using the GPs (see Leclercq (2018) and Fendt & Wandelt (2007a)). For power spectrum reconstruction, one can use the PICO method; an alternative, but more restrictive, stance is to adopt the approach taken by Habib et al. (2007), to first learn a set of basis functions via Singular Value Decomposition (SVD) and model the resulting weights by a Gaussian Process. However, building an emulator for weak lensing analysis needs to account for systematic effects, but some of these can be included analytically without emulation, resulting in an 8-dimensional GP, rather than 12 (6 cosmological and 6 systematic parameters) if we were to emulate the likelihood.

4.2 Training Points

The generation of the training points is a key ingredient for the emulator to perform well. Accurate high-dimensional regression is not easy, mainly due to the curse of dimensionality.


Figure 4. Five Latin Hypercube samples (using the maximin method) projected in 2D. In particular, we generate five Latin Hypercube samples in 8D and scale them according to our pre-defined priors. The figure shows the 2D projections for 3 parameters (Ωcdm h², Ωb h² and ln(10¹⁰ As)) and, as expected, each point uniquely occupies its corresponding row and column.

Figure 5. The marginal likelihood of the Gaussian Process (with 3000 training points) for the fourth band power matrix and i = j = 2, over five restarts of the optimiser. Note the local minimum for the 3rd run of the optimiser. The other runs have almost the same value, hence showing that N_restart = 5 is a good choice for training the GP.

With the formalism presented in this work, and depending on the complexity of the function, one can reconstruct the function precisely and accurately in low dimensions, hence leading to an accurate likelihood, as would be the case if we were to use the full simulator, CLASS (Lesgourgues 2011) in this case. As the dimensionality of the problem increases, we need an exponentially increasing number of training points to emulate the true function accurately.

In PICO, the training points were generated uniformly from a box whose sides were centred on the mean of a converged MCMC chain (consisting of ∼ 60000 cosmological models), with width 3σ along each direction. In the second release of PICO, they selected training points which lie within 25 log-likelihoods of the WMAP peak (Fendt & Wandelt 2007a). On the other hand, Auld et al. (2007) first drew 2000 training points from the same box defined in PICO and also added an extra 5000 training points drawn from a Gaussian distribution, whose covariance was twice the expected covariance matrix, centred on the maximum likelihood. These techniques perform quite well for two reasons: 1) restricting the prior volume of the training points to the high-likelihood regions allows the sampler explicitly to explore this specific region in parameter space; 2) creating a data set with thousands of training points will also improve any regression method. A shortcoming of these approaches is that the algorithm will not perform well in regions where there is no training point nearby (see Appendix A in Habib et al. (2007) for a comparison of their method with PICO). This is a typical manifestation of almost any Machine Learning algorithm: they are good at making reliable predictions within a pre-defined prior, provided they are trained with enough data points. Building Machine Learning algorithms in the small-data regime is still in its infancy, and hence an active area of research (Barz & Denzler 2019).

Moreover, if the training points are naively generated randomly from our pre-defined priors, we might not obtain a suitable coverage of the parameter space. A possible solution to this is to use a grid, but then the number of training points grows exponentially as the dimensionality of the problem increases. As an example, say we have a 7D problem and we choose to have 10 points per parameter; then our training set will have 10 million points.

Alternatively, we can use Latin Hypercube (LH) sampling (McKay et al. 1979), which is a method for generating random samples from a multidimensional distribution in a controlled (quasi-random) way. A point is assigned such that it uniquely occupies its row and column respectively. This procedure generalises to higher-dimensional designs. In Fig. 4, we show the 2D projection of 5 LH samples, which have been generated from a box in 8D and scaled by the pre-defined priors in §4.3. In particular, we show the projection for 3 parameters only, but the same applies for the other parameters: each point uniquely occupies its corresponding row and column. The LH method is now a ubiquitous tool for performing emulation in large simulation scenarios (Habib et al. 2007; Schneider et al. 2011; Schmit & Pritchard 2018) and is seen to be quite efficient, not only in producing a fair interpolation, but also in providing reasonable posterior densities.

In this work, we adopt the LH approach to generate our training set. The LH samples are generated using the maximinLHS function from the lhs R package (Carnell 2012). This particular design relies on a distance criterion (Johnson et al. 1990), and the final design is a result of maximising the minimum distance between points.

4.3 Priors

In our baseline emulator, we generate 1000 Latin Hypercube samples from a box between 0 and 1. We first linearly transform these samples to the range of the pre-defined prior box for the 6 cosmological and 2 systematic parameters:

Ωcdm h² ∼ U[0.01, 0.40]
Ωb h² ∼ U[0.019, 0.026]
ln(10¹⁰ As) ∼ U[1.70, 5.00]
ns ∼ U[0.70, 1.30]
h ∼ U[0.64, 0.82]
Abary ∼ U[0.0, 2.0]
AIA ∼ U[−6.0, 6.0]
Σmν ∼ U[0.06, 1.0]

followed by running the full simulator at these points to obtain the total band powers. U[a, b] denotes a uniform distribution with lower and upper limits a and b respectively. We apply a more restrictive prior than the original KiDS-450 prior [0.01, 0.99] for Ωcdm h², since otherwise a large fraction of the LH samples we generate would lie outside the region of parameter space constrained by the current weak lensing analysis. Moreover, having a smaller volume of parameter space also improves the performance of the emulator. The prior for Abary is set to an upper limit of 2 (instead of 10 in Köhlinger et al. (2017)) because we found that values of Abary ≳ 3 lead to negative b², which implies an unphysical negative power spectrum. In the same spirit, large values of Abary lead to negative auto-correlated band powers, and in some cases the band power matrix (equation (21)) was not positive definite. We also found that large values of the neutrino mass, Σmν ≳ 1 eV, result in almost half of the CLASS band powers in our training set being nan. We therefore set an upper limit for Σmν of 1 eV.
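As an illustration of this step, a minimal sketch using scipy.stats.qmc (the paper itself used the maximinLHS function of the R lhs package; scipy's optional space-filling optimisation criteria differ from maximin, so this is an approximate stand-in):

import numpy as np
from scipy.stats import qmc

# Prior box of Section 4.3, in the order:
# [Omega_cdm h^2, Omega_b h^2, ln(1e10 A_s), n_s, h, A_bary, A_IA, Sum m_nu]
lower = [0.01, 0.019, 1.70, 0.70, 0.64, 0.0, -6.0, 0.06]
upper = [0.40, 0.026, 5.00, 1.30, 0.82, 2.0,  6.0, 1.00]

sampler = qmc.LatinHypercube(d=8, seed=0)
unit_cube = sampler.random(n=1000)                 # LH samples in [0, 1]^8
theta_train = qmc.scale(unit_cube, lower, upper)   # linearly rescaled to the priors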

4.4 Transformations

Training the Gaussian Processes with the LH samples from above might be suboptimal, the reason being that the volume occupied by a hypercube grows exponentially with increasing dimension. On the other hand, a sphered training set (hypersphere) has a smaller volume compared to its corresponding hypercube, but with the same scaling with dimension. This transformation step is analogous to the one used by Fendt & Wandelt (2007b). Schneider et al. (2011) assessed in detail the effect of various transformations prior to building an emulator for the CMB power spectrum. They found that de-correlating the input space leads to significant improvements compared to working with the original form of the input parameters. The interpolation can further be improved if one uses a known Fisher information matrix specific to the problem.

The transformation matrix can be calculated as follows: we first compute the sample covariance, Cθ, of the 1000 input parameters θ to the emulator (see Table 1), which we diagonalise as Cθ = UDUᵀ. U is a d × d orthonormal matrix and D is a diagonal d × d matrix consisting of the (necessarily positive) eigenvalues. The transformation which whitens θ is then X = D^(−1/2) Uᵀ θ, such that the covariance of the transformed input covariates X is the identity matrix. Also, having a pre-whitened basis justifies the use of a diagonal kernel matrix such as the ARD kernel in equation (15), for which it is often blindly assumed (without transforming the inputs) that the correlation among the input parameters vanishes.
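A minimal sketch of this whitening step (numpy only; the helper name is ours):

import numpy as np

def whitening_matrix(theta):
    """Whitening of the input parameters: with C_theta = U D U^T,
    the transformed samples X = theta @ (U D^{-1/2}) have identity
    sample covariance."""
    cov = np.cov(theta, rowvar=False)
    eigvals, U = np.linalg.eigh(cov)      # cov = U diag(eigvals) U^T
    return U / np.sqrt(eigvals)           # columns scaled: U D^{-1/2}

# usage: X = theta @ whitening_matrix(theta)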

Next, we consider the transformation of the band powers. The distribution of the original band powers in our training set is left-skewed. For a fixed ℓ in our 3-bin tomographic analysis, the resulting 3 × 3 matrix,

B_ℓ = ( B_ℓ,00  B_ℓ,01  B_ℓ,02
        B_ℓ,10  B_ℓ,11  B_ℓ,12
        B_ℓ,20  B_ℓ,21  B_ℓ,22 ),   (21)

must be positive-definite, and emulating the matrix elements individually will not guarantee this.

To ensure that the 3 × 3 band power matrix remains positive-definite during the prediction phase when using the emulator, we instead build the latter on each element of the logarithm of B_ℓ (lower or upper triangular part, essentially all the unique elements),

V_ℓ = R Λ̃ Rᵀ = log(B_ℓ), (22)

where B_ℓ = R Λ Rᵀ, Λ̃_νν = log(Λ_νν), and Λ and Λ̃ are diagonal. Moreover, since we normally assume a Gaussian Process with mean zero and kernel K, we apply an additional linear scaling such that the band powers in our training set have zero mean and unit standard deviation, for example, for the ith transformed band power,

v′_i → (v_i − v̄_i)/σ_i, (23)

and the predictive mean and variance are

E[v_i(∗)] = σ_i E[v′_i(∗)] + v̄_i,
var[v_i(∗)] = σ_i² var[v′_i(∗)]. (24)
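A minimal sketch of the matrix-logarithm transform of equation (22) and its inverse, via the eigendecomposition of the symmetric band power matrix:

import numpy as np

def log_bandpower_matrix(B):
    """Matrix logarithm of a positive-definite band power matrix B_ell
    (equation 22): B = R diag(lam) R^T  ->  V = R diag(log lam) R^T."""
    lam, R = np.linalg.eigh(B)
    return (R * np.log(lam)) @ R.T

def exp_bandpower_matrix(V):
    """Inverse transform: exponentiating the eigenvalues guarantees a
    positive-definite band power matrix at prediction time."""
    lam, R = np.linalg.eigh(V)
    return (R * np.exp(lam)) @ R.T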

4.5 Training the Emulator

We now have our training set {X, V_ℓ,ij}. We therefore have a set of 24 Gaussian Processes, one for each element of the transformed band powers. Prior to building the emulator, a crucial step is to choose a kernel function for the Gaussian Process. Here we use the ARD kernel, defined in equation (15).

To ensure a good performance, we have to find the set of hyperparameters which maximises the marginal likelihood, as discussed in §3. An important ingredient is the analytical gradient of the marginal likelihood with respect to the kernel hyperparameters, to help ensure convergence to the global minimum. The gradients are

∂k_pq/∂A = (2/A) k_pq^ARD,
∂k_pq/∂ℓ_i = k_pq^ARD (θ_p(i) − θ_q(i))²/ℓ_i³, (25)

where i indicates the ith dimension of the problem.


Figure 6. The left plot shows the predicted band power across a slice in parameter space. In other words, we choose a point within the prior box and compute the GP mean, variance and the actual band power for Ωcdm h² ∈ [0.05, 0.40]. The same procedure is repeated in the right plot, but we instead choose a point from the training set, to illustrate the fact that the predicted GP uncertainty tends to zero near the training point and the predictive variance increases towards the edge of the prior box.

We use the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS-B) algorithm (Zhu et al. 1997; Press et al. 2007), along with the gradients defined above, to optimise these hyperparameters by minimising the negative log-marginal likelihood of equation (17) via gradient descent. However, it is a known fact that training a Gaussian Process is not an easy task, because the marginal likelihood has various local maxima (Rasmussen & Williams 2006). We adopt the standard approach of restarting our optimiser at different positions, and we find that N_restart = 5 was sufficient in practice to ensure that we find the set of hyperparameters corresponding to the global optimum (see Fig. 5). Although this is not guaranteed, we also want to emphasise that the use of the gradients was required to find the global optimum. Once the Gaussian Process is trained, the kernel parameters are fixed at their optimised values, and equation (20) is then used to make predictions. A sketch of this restart strategy is given below.
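A minimal sketch of the restart strategy (neg_log_ml is a hypothetical function returning the negative of equation (18) together with its gradient, built from equations (19) and (25)):

import numpy as np
from scipy.optimize import minimize

def optimise_hyperparams(neg_log_ml, bounds, n_restart=5, seed=0):
    """Minimise the negative log marginal likelihood with L-BFGS-B,
    restarting from random positions to avoid local optima."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_restart):
        eta0 = [rng.uniform(lo, hi) for lo, hi in bounds]
        res = minimize(neg_log_ml, eta0, jac=True,
                       method="L-BFGS-B", bounds=bounds)
        if best is None or res.fun < best.fun:
            best = res
    return best.x   # hyperparameters at the (approximate) global optimum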

4.6 The GP Uncertainty

In this section, we look into propagating the GP uncertainty through the full forward model when we use the emulator. To be more specific, we seek the posterior distributions of the cosmological parameters and the two nuisance parameters (AIA, Abary), that is,

θ = [Ωcdm h², Ωb h², ln(10¹⁰ As), ns, h, Abary, AIA, Σmν],

with the other 4 nuisance parameters,

β = [A1, A2, A3, m],

marginalised over the probabilistic band powers. A1, A2, A3 correspond to free parameters which determine excess noise in the autocorrelation power spectrum, while m is the shear multiplicative bias parameter (Köhlinger et al. 2017).

Figure 7. The full forward model can be understood as follows: at each step in the inference procedure, a random set of samples of the cosmological parameters, θ, and nuisance parameters, β, is drawn from the prior, followed by a random realisation of the probabilistic band powers, v, centred on its mean and variance, before computing the likelihood of the data d. Note that the kernel hyperparameters are fixed to their optimised values.

Using equation (10) and defining v as the total band powers, we can write the joint posterior, p(θ, β | d), as

p(θ, β | d) = ∫ p(θ, β, v | d) dv
            = ∫ p(d | v, β) p(v | θ) dv p(θ) p(β). (26)

If p(v | θ) were a Gaussian distribution of the band powers from the Gaussian Process, the above integration would be a convolution of two Gaussian distributions and the likelihood part would be Gaussian.

However, in our analysis, the predictive distribution is Gaussian in each element of the logarithm of the band power matrix.


Figure 8. The full 1D and 2D marginalised posterior distributions obtained using three different methods. The one in tan corresponds to posterior distributions with the full simulator (CLASS), while the solid brown one corresponds to the Gaussian Process emulator when random functions of the band powers are drawn, hence marginalising over the Gaussian Process uncertainty. The posterior in blue shows the distributions obtained when only the mean of the Gaussian Process was used in the inference routine. The contours denote the 68% and 95% credible intervals respectively. Note that some parameters are dominated by their respective priors and are not constrained at all; a similar conclusion was drawn by Köhlinger et al. (2017). However, the important point here is that the posterior from the GP is close to that obtained with CLASS.

For example, in Fig. 6, we show the GP mean and variance for two elements across a slice in parameter space. As previously discussed, if the GP predictions were Gaussian in the band powers, we could marginalise analytically over the theoretical uncertainty. Since they are Gaussian in each element of V_ℓ,ij, we instead marginalise by drawing samples of the cosmological and nuisance parameters (see Fig. 7) and performing a Monte Carlo integration, which is relatively fast, approximating the joint posterior as

p(θ, β | d) ∝ [p(θ) p(β)/N_s] Σ_{i=1}^{N_s} p(d | V_ℓ,ij), (27)

where N_s is the number of random band powers drawn after computing the predictive mean and variance. We use N_s = 20 at every step in the MCMC to take into account the uncertainty from the Gaussian Process. Recall that each band power is modelled independently by a GP, and hence the Monte Carlo integral in equation (27) requires few draws of the probabilistic band powers.
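Schematically, the Monte Carlo average of equation (27) can be written as follows (gp_predict, loglike_bandpowers and log_prior are hypothetical stand-ins for the trained GPs, the band power likelihood and the priors):

import numpy as np
from scipy.special import logsumexp

def log_posterior(theta, beta, data, gp_predict, loglike_bandpowers,
                  log_prior, n_s=20, rng=None):
    """Monte Carlo estimate of equation (27): average the likelihood over
    n_s random draws of the probabilistic (transformed) band powers."""
    rng = np.random.default_rng() if rng is None else rng
    mean, var = gp_predict(theta)                 # predictive moments (eq. 20)
    loglikes = []
    for _ in range(n_s):
        v = rng.normal(mean, np.sqrt(var))        # one draw of the band powers
        loglikes.append(loglike_bandpowers(data, v, beta))
    # log of the average likelihood, computed stably
    return logsumexp(loglikes) - np.log(n_s) + log_prior(theta, beta)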


Figure 9. Samples of the log-posterior obtained with the 3 methods investigated. In panel (a) (log-posterior without compression), the pale blue histogram refers to the log-posterior samples from CLASS, while the red and green step histograms correspond to the mean and error of the GP respectively. A similar plot is shown in panel (b) (log-posterior with MOPED compression), after applying the MOPED compression step.

5 DATA COMPRESSION

The next era of weak lensing surveys such as Euclid and LSST will have ∼ 10 tomographic bins, and with multiple band powers or correlation functions per bin, the number of summary statistics will be large, ∼ 10³-10⁴. As an example, Euclid Collaboration et al. (2019) considered 100 band powers per bin and 10 tomographic bins, which gives a minimum of 1000 summaries, and 5500 if cross-band powers are included. The setup in the previous section is not a scalable approach for these future surveys. In particular, emulating each band power is not an entirely feasible approach, because one would have to train and store thousands of separate Gaussian Processes, and this process in itself can be quite expensive.

In this section, we show that the emulator can be used with the MOPED algorithm (Heavens et al. 2000b), which reduces the number of data points from N to just p numbers, where N is the number of data points and p is the number of parameters in our model. For the current weak lensing analysis the gain is not significant (since we are working with only 24 band powers), but the method proposed in this work is expected to yield fast parameter inference in the regime of a large number of band powers, N ∼ 10⁴, with only p ∼ 10 parameters of interest.

Here, we briefly cover the MOPED algorithm. The latter essentially finds a weighting vector, b, which encapsulates as much information as possible about a specific model parameter θ_α. This vector is then used to form a linear combination of the data, d, such that the compressed data are

y_α ≡ b_αᵀ d. (28)

The first and subsequent MOPED vectors are given respectively by

b_1 = C⁻¹ μ_{,1} / √(μ_{,1}ᵀ C⁻¹ μ_{,1}) (29)

and

b_α = [C⁻¹ μ_{,α} − Σ_{β=1}^{α−1} (μ_{,α}ᵀ b_β) b_β] / √[μ_{,α}ᵀ C⁻¹ μ_{,α} − Σ_{β=1}^{α−1} (μ_{,α}ᵀ b_β)²]   (α > 1), (30)

where C is the data covariance matrix and μ_{,α} is the vector obtained by calculating the gradient of our theoretical model with respect to parameter α at a fiducial parameter set. In previous applications of the MOPED algorithm, it was assumed that the covariance matrix is fixed. In our case, Köhlinger et al. (2017) constructed a covariance matrix which depends on the m parameter, the multiplicative bias. In this work, we fix C at the average fiducial value provided in the data (cosmological parameter inference depends mildly on the parameter m). Data compression with a parameter-dependent covariance matrix has been explored by Heavens et al. (2017). If B ∈ R^(N×p) is the matrix whose columns consist of the MOPED vectors, the compressed data vector is just

y = Bᵀ x. (31)

By construction, the MOPED vectors b_α are orthogonal to each other, that is, b_αᵀ C b_β = δ_αβ. Therefore, the covariance matrix of y, BᵀCB = I, is the identity matrix of size p × p. As a result of this orthogonality condition, elements of the compressed data vector are uncorrelated.



Hence, the log-likelihood is straightforwardly

log L = −(1/2) Σ_{α=1}^{p} (y_α − b_αᵀ μ)² + constant, (32)

where b_αᵀ μ is computed using the emulator. The fact that the likelihood of the compressed data involves only O(p) operations makes parameter inference very fast, since the O(N³) operation in the standard likelihood is completely eliminated, provided b_αᵀ μ can be rapidly computed.

By emulating the MOPED coefficients directly with separate Gaussian Processes, we have a very powerful tool. The GPs are still functions of just 8 parameters (6 cosmological and 2 systematics) and we now have only 11 separate GPs. Crucially, this setup is interesting because increasing the number of band powers (for example, in forthcoming lensing surveys) will not affect the MOPED timings at all. A sketch of the compression step is given below.
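A minimal sketch of the Gram-Schmidt construction of equations (29)-(30) and the compressed likelihood of equation (32) (the array dmu holds the derivatives μ_{,α} at the fiducial point; all names are ours):

import numpy as np

def moped_vectors(dmu, C):
    """MOPED vectors of equations (29)-(30).

    dmu : (p, N) derivatives of the theory vector mu with respect to each
          of the p parameters, evaluated at the fiducial point
    C   : (N, N) data covariance matrix
    """
    Cinv_dmu = np.linalg.solve(C, dmu.T).T       # rows are C^{-1} mu_{,alpha}
    B = []
    for alpha in range(dmu.shape[0]):
        b = Cinv_dmu[alpha].copy()
        for b_prev in B:
            b -= (dmu[alpha] @ b_prev) * b_prev  # subtract earlier components
        b /= np.sqrt(dmu[alpha] @ b)             # normalise: b^T C b = 1
        B.append(b)
    return np.array(B)                           # (p, N)

def moped_loglike(y, b_mu):
    """MOPED log-likelihood of equation (32), up to a constant; y and b_mu
    are the compressed data and compressed theory coefficients."""
    return -0.5 * np.sum((y - b_mu) ** 2)

Because the compressed likelihood involves only p numbers, its cost is independent of the original data size, which is the source of the scaling advantage discussed above.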

6 RESULTS

Fig. 6 shows 2 band powers, evaluated across the Ωcdm h² slice in parameter space. In particular, the function in black corresponds to the accurate solver, CLASS, while the broken red function corresponds to the GP mean, with the tan shading giving the 3σ credible interval of the GP. Note also that the right panel shows the GP prediction through a given training point and, as expected, the GP uncertainty tends to zero. As seen in Fig. 6, the GP is able to predict the band powers quite well.

Since the predictive function is a Gaussian distribution, we can build a simple emulator by using just the mean, or propagate the uncertainty from the Gaussian Process through the model. Either method gives reasonable posterior densities, as shown in Fig. 8. On a high-end desktop computer, the evaluation is quite fast. Computing one likelihood with the mean of the Gaussian Process takes 0.03 seconds, compared to 0.09 seconds if we include the Gaussian Process uncertainty with 20 Monte Carlo samples to marginalise over it. By contrast, CLASS takes 0.65 seconds for a likelihood evaluation. If we use 1000 training points, this yields an overall speed-up by a factor of ∼ 12–30, depending on whether we use the mean or the GP variance. In our case, we generate 360,000 MCMC samples using EMCEE (Goodman & Weare 2010; Foreman-Mackey et al. 2013), for which the full simulator takes about 44 hours while the Gaussian Process emulator, using the mean, takes about 1.5 hours. On the other hand, when we emulate the MOPED coefficients using 1000 training points, each likelihood computation takes ∼ 0.03 seconds with either the mean or the variance of the GPs. As an example, with the MOPED compression, CLASS takes ∼ 44 hours to generate 330,000 MCMC samples (there is no significant speed-up here because we have just 24 band powers, so each likelihood computation costs almost the same with or without the MOPED compression). However, with the emulator, we obtain the same number of MCMC samples in ∼ 1.5 hours with either the mean or the variance of the GPs. All experiments with EMCEE were run on a single core; an interesting extension of the emulation scheme would be to exploit parallelisation to speed up inference further. A sketch of the Monte Carlo marginalisation over the GP uncertainty follows.
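As an illustration of the 20-sample Monte Carlo marginalisation quoted above, here is a minimal sketch; it assumes the same hypothetical `gps` list, now returning a predictive mean and standard deviation via a scikit-learn-style `predict(..., return_std=True)`.

```python
import numpy as np

def loglike_marginalised(theta, gps, y_obs, n_mc=20, rng=None):
    """Average the Gaussian likelihood over draws from the GP predictive distribution."""
    if rng is None:
        rng = np.random.default_rng(0)
    x = np.atleast_2d(theta)
    mean, std = zip(*(gp.predict(x, return_std=True) for gp in gps))
    mean, std = np.ravel(mean), np.ravel(std)
    draws = rng.normal(mean, std, size=(n_mc, mean.size))    # n_mc coefficient realisations
    logl = -0.5 * np.sum((y_obs - draws) ** 2, axis=1)
    return np.logaddexp.reduce(logl) - np.log(n_mc)          # stable log of the MC average
```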

The distribution of the log-posterior (up to a normalisation constant) of the MCMC samples obtained using CLASS (in pale blue) is shown in Fig. 9. In the same plot, the red and green histograms show the distribution of the log-posterior when using the mean and the error from the GP, respectively. In panel (b) of the same figure, we show the log-posterior of the samples obtained after compressing the data using the MOPED formalism. Note that the distribution of the log-posterior of the different MCMC samples gives an indication of how faithful the function reconstruction with the GP is. With a small number of training points, there is a small shift of the log-posterior distribution of the GP emulator (with either the mean or the uncertainty) relative to the CLASS distribution.

To compare the two distributions, we compute the Kullback-Leibler (KL) divergence between the CLASS distribution and the GP distribution, that is,

$$D_{\mathrm{KL}}\left(p \,\|\, q\right) = \sum p\left(\theta, \beta \,|\, d\right) \log \left[ \frac{p\left(\theta, \beta \,|\, d\right)}{q\left(\theta, \beta \,|\, d\right)} \right], \qquad (33)$$

where p(θ, β | d) and q(θ, β | d) are the posterior probabilities computed using CLASS and the GP at the same points in parameter space. Since the posterior probability is cheap to compute with the GP, we use all the MCMC samples obtained with CLASS to compute q(θ, β). The KL divergence in nats, as a function of the number of training points, is shown in Table 2. In general, the reconstruction of the band powers becomes almost perfect as the number of training points increases. This can also be deduced from the 5th column of Table 2, where the KL divergence decreases with increasing training points. If one could afford additional simulations, one option would be to use just the mean of the GP to sample the posterior distribution, since it is not only faster than the case where the GP uncertainty is included, but also closer to the true posterior distribution. A sketch of this Monte Carlo estimate follows.
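Since the CLASS chain provides samples from p, Eq. (33) reduces to a simple Monte Carlo average. The sketch below assumes `logp` and `logq` are arrays of (possibly un-normalised) log-posteriors evaluated at the same CLASS samples; the unknown normalisations shift D_KL by a constant, which is why Table 2 quotes scaled values.

```python
import numpy as np

def kl_divergence(logp, logq):
    """Monte Carlo estimate of Eq. (33): with theta_i ~ p (the CLASS chain),
    D_KL(p||q) ~ mean_i [log p(theta_i) - log q(theta_i)], up to an additive
    constant when the log-posteriors are un-normalised."""
    return np.mean(np.asarray(logp) - np.asarray(logq))
```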

To assess the convergence of our MCMC chains, we also compute the Gelman-Rubin statistic (Gelman & Rubin 1992) for different scenarios. The latter is simply defined as R = V/W, where V is the between-chain variance and W is the within-chain variance. R is calculated for different cases; for example, for a fixed number of training points, we compare the MCMC samples obtained using the GP (mean) with the MCMC samples obtained using CLASS. This is repeated with the MCMC samples where the GP uncertainty is included. In all cases, we apply a threshold of 1.05 to ensure that the chains satisfy the ergodicity condition. A minimal implementation is sketched below.
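The sketch below uses one common textbook form of the statistic, for a single parameter traced by several chains; the pooled-variance formula may differ in detail from the authors' implementation.

```python
import numpy as np

def gelman_rubin(chains):
    """Gelman-Rubin R for one parameter; `chains` has shape (n_chains, n_samples)."""
    n = chains.shape[1]
    W = chains.var(axis=1, ddof=1).mean()       # within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)     # between-chain variance
    V = (1.0 - 1.0 / n) * W + B / n             # pooled variance estimate
    return V / W                                # well-mixed chains give R close to 1 (< 1.05 here)
```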

We are also interested in the S8 = σ8 √(Ωm/0.3) cosmological parameter constraint. Recall that the GPs for sampling the posterior are built using the 8 parameters (6 cosmological and 2 systematics), and they do not allow us to predict σ8 directly. However, σ8 is a function of just the 6 cosmological parameters, since it involves an integration over the power spectrum. Therefore, as we compute the band powers with the 1000 training points, we also record the σ8 values generated by CLASS. We then construct a training set with inputs

[Ωcdm h², Ωb h², ln(10¹⁰ As), ns, h, Σmν],

which is then used to build an additional GP for σ8. This allows us to predict σ8 at any point in the parameter space within the prior box. We find that it takes only 1 minute to predict σ8 for the 360,000 MCMC samples. A sketch of this derived-parameter emulator follows.
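Here is a sketch of the σ8 emulator, using scikit-learn in place of the authors' own GP code and with placeholder training arrays standing in for the recorded CLASS values.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X_train = rng.uniform(size=(1000, 6))           # placeholder: the 6 cosmological inputs
sigma8_train = rng.uniform(0.6, 1.0, 1000)      # placeholder: sigma_8 recorded from CLASS

# One extra GP mapping the 6 cosmological parameters to sigma_8.
gp_sigma8 = GaussianProcessRegressor(kernel=RBF(length_scale=np.ones(6)), normalize_y=True)
gp_sigma8.fit(X_train, sigma8_train)

def S8(cosmo_samples, omega_m):
    """Derived parameter S8 = sigma_8 * sqrt(Omega_m / 0.3) for MCMC samples."""
    sigma8 = gp_sigma8.predict(np.atleast_2d(cosmo_samples))
    return sigma8 * np.sqrt(omega_m / 0.3)
```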


[Figure 10: three panels of S8 = σ8 √(Ωm/0.3) versus Ωm (axes: Ωm from 0.0 to 0.7; S8 from 0.40 to 0.88). Panel (a): Emulator for computing σ8 (CLASS versus CLASS (GP)). Panel (b): Without MOPED Compression (CLASS versus GP (Mean) and GP (Uncertainty)). Panel (c): With MOPED Compression (CLASS versus GP (Mean) and GP (Uncertainty)).]

Figure 10. S8 versus Ωm plane for our analysis. The left panel shows that the Gaussian Process emulator for computing σ8, which is a function of our cosmological parameters, is sufficiently accurate and precise compared to CLASS. The middle panel shows the constraints without MOPED compression, while the right panel includes MOPED compression. The inner and outer contours correspond to the 68% and 95% credible intervals, respectively.

Table 2. Computational cost comparison between CLASS and the GP emulator.

Ntrain | Training | MCMC (Mean) | MCMC (Uncertainty) | DKL (Mean) | DKL (Uncertainty)
------ | -------- | ----------- | ------------------ | ---------- | -----------------
1000   | 20       | 84          | 216                | 0.84       | 1.00
1500   | 48       | 85          | 290                | 0.63       | 0.89
2000   | 92       | 86          | 396                | 0.60       | 0.81
2500   | 139      | 88          | 524                | 0.47       | 0.68
3000   | 209      | 90          | 692                | 0.09       | 0.65

Note: the training and sampling times (columns 2, 3 and 4) are given in minutes, and the KL divergence is in units of nats (scaled by a constant); the largest DKL occurs with 1000 training points when the GP uncertainty is included, in which case DKL = 2.03 × 10⁻¹².

In Fig. 10, we show the 2D marginalised posterior distribution of the derived parameters S8 and Ωm using three different methods. In particular, we compare the distribution obtained from CLASS with those from the mean and the uncertainty of the GP, and we conclude that we are able to recover comparable posterior densities for these two quantities.

In high dimensions, the GP uncertainty inflates between any two training points. Adding more training points is expected to improve the performance of the emulator (with either the mean or the GP uncertainty), since the reconstruction of the emulated function converges to the original function. In general, with an increasing number of training points, the GP uncertainty will also decrease. The effect of the number of training points is indicated by the values of the KL divergence in the last two columns of Table 2. However, we found empirically that the KL divergence when using the mean of the emulator is always better than when including the GP uncertainty.

One might expect the inclusion of the GP uncertainty to broaden the likelihoods, in which case the KL divergence would not be an appropriate measure of success. However, this does not appear to be the case: marginal errors are not noticeably increased. Our conclusion is that inclusion of the GP uncertainty does not improve results, though this might vary with application. The reason is probably that we are emulating a precise function, where the training points have zero error, and in this circumstance the GP (which makes some assumptions that do not hold in detail) provides an error that is only approximately correct (Karvonen et al. 2020; Wang 2020).

7 CONCLUSIONS

We have designed a principled and detailed Gaussian Process emulator for constraining not only weak lensing cosmological parameters but also the nuisance parameters. In summary, for this problem: 1) the (expensive) solver is queried only a few thousand times, to generate a training set (compared to a conventional MCMC routine, where the solver is queried at every likelihood computation); 2) the emulator is ∼ 20 times faster than the full solver, which makes inference very quick; and 3) by emulating the MOPED coefficients, the number of separate Gaussian Processes equals the number of parameters in the model, irrespective of the number of data points, making this a very powerful technique for analysing large datasets. Moreover, the posterior distributions obtained from the emulator are quite robust compared to the full run of the simulator, with and without MOPED.

[Figure 11: time in minutes versus number of training points (1000–3000), with curves for GP (Training), GP (Mean) and GP (Uncertainty).]

Figure 11. Illustrating the performance of the emulator as a function of the number of training points. The expensive computations reside in training and in predicting the GP uncertainty. Sampling the posterior with the GP mean is quick, even with 3000 training points. The graphs do not perfectly follow the expected scaling with N because of various overheads.

We have also demonstrated that the emulator can be

used to emulate the MOPED coefficients directly. Both combined are expected to accelerate cosmological parameter inference. Emulating the MOPED-compressed data has two major advantages. The first is a feature of MOPED itself: the compressed data set does not grow at all as the original dataset increases in size, and so scales exceptionally well to Euclid and LSST. The second is that MOPED is only fast if the theoretical values of the MOPED coefficients can be computed very quickly; the GP provides this functionality. This is the most important conclusion of this paper.

In addition, we have used the KL divergence as a metric to assess the performance of the emulator in obtaining reliable high-dimensional posterior distributions. As is evident from Table 2, the larger the number of training points, the better the reconstruction of the emulated function, and hence the lower the KL divergence between the accurate CLASS posterior distribution and the emulator posterior distribution.

We also recommend using the mean of the emulator for this application. In Table 2, the KL divergence between the emulator posterior and the CLASS posterior shows that the mean is always better than the emulator with the GP uncertainty. From a computational perspective, this has various other advantages: for example, inference is very fast, since the GP mean prediction requires only O(N) operations and storage (recall that the GP mean is a linear predictor), as sketched below.
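The following generic sketch (isotropic RBF kernel assumed; not the authors' code) shows why: the weights α = K⁻¹y are computed once after training, after which each mean prediction is a single O(N) dot product.

```python
import numpy as np
from scipy.spatial.distance import cdist

def precompute_alpha(K, y):
    """One-off O(N^3) solve after training; alpha is then reused for every prediction."""
    return np.linalg.solve(K, y)

def gp_mean(x_star, X_train, alpha, lengthscale=1.0):
    """O(N) mean prediction: a dot product between cross-covariances and alpha."""
    k_star = np.exp(-0.5 * cdist(np.atleast_2d(x_star), X_train) ** 2 / lengthscale ** 2)
    return float(k_star @ alpha)
```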

An exciting application of this emulator is the case where one requires non-Limber computation of the power spectra. This certainly applies to galaxy clustering statistics (Fang et al. 2019) and the weak lensing bispectrum (Deshpande & Kitching 2020), even if the Limber approximation is good for the weak lensing power spectrum (Kitching & Heavens 2017; Kilbinger et al. 2017). In general, the non-Limber computation is expensive to carry out accurately, especially at large ℓ, because of the rapid oscillations of the spherical Bessel functions (Lemos et al. 2017). For example, if the CLASS run were repeated without the Limber approximation, the emulator would be faster still, by a factor of ∼ 10³. In future surveys, because the number of tomographic bins will be large, more power spectrum computations will be required. For example, 10 tomographic bins lead to 55 auto- and cross-power spectra, and the emulator would be ∼ 10³ and ∼ 10⁵ times faster with and without the Limber approximation, respectively.

Emulation has various other key advantages, apart from speeding up inference. As an example, to run an MCMC chain with the full simulator, one has to choose a good proposal distribution, which often requires tuning. The emulator can be used to explore the parameter space quickly and learn a suboptimal proposal distribution, which can then be used with the full simulator.

The accuracy of the reconstructed function can be improved by adding more training points, as we have demonstrated. However, scaling Gaussian Processes to a large number of training points results in a major computational bottleneck, mainly due to the O(N³) operations in training and the O(N²) operations in predicting the uncertainty (see Fig. 11). Fortunately, here a few hundred training points suffice to give cosmological results with only a few per cent degradation in the error bars. Moreover, in this work, the training points have been placed according to the prior range itself. The interpolation scheme could be improved for well-constrained parameters by using better prior information, such as a Fisher matrix, to place the training points intelligently. Alternatively, one can do a quick optimisation to find the maximum-likelihood estimator and the Hessian matrix, both of which can be used to construct an optimal design for the training points.

An alternative option for accelerating the computation of the GP uncertainty is to partition the training set intelligently using a clustering algorithm, for example k-means clustering (Hastie et al. 2001). During the prediction step, one can then use a local expert, which has a smaller kernel matrix, to compute the GP uncertainty swiftly, as in the sketch below.
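A sketch of this local-expert idea, using scikit-learn's KMeans (the specific partitioning is our illustration, not the authors' implementation):

```python
import numpy as np
from sklearn.cluster import KMeans

def build_experts(X_train, n_clusters=10):
    """Partition the training inputs once; each cluster defines one local expert."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X_train)
    clusters = [np.where(km.labels_ == c)[0] for c in range(n_clusters)]
    return km, clusters

def local_indices(x_star, km, clusters):
    """Indices of the training points handled by the expert nearest to x_star;
    the GP variance is then computed with this much smaller kernel matrix."""
    c = km.predict(np.atleast_2d(x_star))[0]
    return clusters[c]
```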

A quantity often required in optimisation, or in Monte Carlo methods such as Hamiltonian Monte Carlo (HMC), is the gradient of the negative log-likelihood (the cost function) with respect to the parameters. Conveniently, the gradient of the mean of the Gaussian Process surrogate model is analytic, and this opens a new avenue towards accelerating gradient computations as well; a sketch follows.
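For an isotropic RBF kernel, for instance, the gradient of the GP mean with respect to the test point is available in closed form, as in this sketch (again generic, not the authors' code):

```python
import numpy as np

def gp_mean_grad(x_star, X_train, alpha, lengthscale=1.0):
    """d mu / d x* = sum_i alpha_i k(x*, x_i) (x_i - x*) / l^2 for an RBF kernel."""
    diff = X_train - x_star                                           # (N, d)
    k = np.exp(-0.5 * np.sum(diff ** 2, axis=1) / lengthscale ** 2)   # (N,)
    return (alpha * k) @ diff / lengthscale ** 2                      # (d,) gradient of the mean
```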

Gaussian Processes should not be interpreted merely as a method for accelerating computations. Rather, they effectively allow us to compute the posterior distribution of a function by placing a prior over it. In this work, the EE band powers and MOPED coefficients are modelled independently as Gaussian Processes, and we have shown that we can recover robust cosmological parameters whilst still marginalising over the nuisance parameters.

ACKNOWLEDGEMENT

We thank the referee for helpful and meaningful comments. AM is supported by the Imperial College President's Scholarship. We thank Jonathan Pritchard for suggesting the use


of LH samples for building the emulator, and Pat Scott and Daniel Jones for useful discussions. We also thank Zafiirah Hosenie for providing useful suggestions to improve this manuscript, and Prof. Marc Deisenroth and Prof. David van Dyk for insightful discussions at the beginning of this project.

DATA AVAILABILITY

The code and data products underlying this article will be made available at https://github.com/Harry45/gp_emulator.

REFERENCES

Agarwal, S., Abdalla, F. B., Feldman, H. A., Lahav, O., & Thomas, S. A. 2012, MNRAS, 424, 1409
Agarwal, S., Abdalla, F. B., Feldman, H. A., Lahav, O., & Thomas, S. A. 2014, MNRAS, 439, 2102
Alsing, J., Heavens, A., & Jaffe, A. H. 2017, MNRAS, 466, 3272
Alsing, J., Heavens, A., Jaffe, A. H., et al. 2016, MNRAS, 455, 4452
Alsing, J., Wandelt, B., & Feeney, S. 2018, MNRAS, 477, 2874
Anderson, L., Aubourg, E., Bailey, S., et al. 2014, MNRAS, 441, 24
Auld, T., Bridges, M., Hobson, M. P., & Gull, S. F. 2007, MNRAS, 376, L11
Bartelmann, M. & Schneider, P. 2001, Phys. Rep., 340, 291
Barz, B. & Denzler, J. 2019, arXiv e-prints, arXiv:1901.09054
Betoule, M., Kessler, R., Guy, J., et al. 2014, A&A, 568, A22
Bird, S., Rogers, K. K., Peiris, H. V., et al. 2019, J. Cosmology Astropart. Phys., 2019, 050
Bond, J. R., Jaffe, A. H., & Knox, L. 1998, Phys. Rev. D, 57, 2117
Carnell, R. 2012, R package version 0.10
Castro, P. G., Heavens, A. F., & Kitching, T. D. 2005, Phys. Rev. D, 72, 023516
Charnock, T., Lavaux, G., & Wandelt, B. D. 2018, Phys. Rev. D, 97, 083004
Deshpande, A. C. & Kitching, T. D. 2020, arXiv e-prints, arXiv:2004.01666
Euclid Collaboration, Blanchard, A., Camera, S., et al. 2019, arXiv e-prints, arXiv:1910.09273
Fang, X., Krause, E., Eifler, T., & MacCrann, N. 2019, arXiv e-prints, arXiv:1911.11947
Fendt, W. A. & Wandelt, B. D. 2007a, arXiv e-prints, arXiv:0712.0194
Fendt, W. A. & Wandelt, B. D. 2007b, ApJ, 654, 2
Foreman-Mackey, D., Hogg, D. W., Lang, D., & Goodman, J. 2013, PASP, 125, 306
Geenens, G. et al. 2011, Statistics Surveys, 5, 5
Gelman, A. & Rubin, D. B. 1992, Statistical Science, 7, 457
Goodman, J. & Weare, J. 2010, Communications in Applied Mathematics and Computational Science, 5, 65
Gutmann, M. U. & Corander, J. 2015, arXiv e-prints, arXiv:1501.03291
Habib, S., Heitmann, K., Higdon, D., Nakhleh, C., & Williams, B. 2007, Phys. Rev. D, 76, 083503
Hastie, T., Tibshirani, R., & Friedman, J. 2001, The Elements of Statistical Learning (Springer Series in Statistics, New York)
Heavens, A., Refregier, A., & Heymans, C. 2000a, MNRAS, 319, 649
Heavens, A. F., Jimenez, R., & Lahav, O. 2000b, MNRAS, 317, 965
Heavens, A. F., Sellentin, E., de Mijolla, D., & Vianello, A. 2017, MNRAS, 472, 4244
Heitmann, K., Higdon, D., White, M., et al. 2009, ApJ, 705, 156
Heitmann, K., Lawrence, E., Kwan, J., Habib, S., & Higdon, D. 2014, ApJ, 780, 111
Heitmann, K., White, M., Wagner, C., Habib, S., & Higdon, D. 2010, ApJ, 715, 104
Hinshaw, G., Nolta, M. R., Bennett, C. L., et al. 2007, ApJS, 170, 288
Hirata, C. M. & Seljak, U. 2004, Phys. Rev. D, 70, 063526
Hu, W. 2000, Phys. Rev. D, 62, 043007
Jaffe, A. H., Ade, P. A., Balbi, A., et al. 2001, Phys. Rev. Lett., 86, 3475
Johnson, M., Moore, L., & Ylvisaker, D. 1990, Journal of Statistical Planning and Inference, 26, 26
Karvonen, T., Wynne, G., Tronarp, F., Oates, C. J., & Särkkä, S. 2020, arXiv e-prints, arXiv:2001.10965
Kendall, A. & Gal, Y. 2017, arXiv e-prints, arXiv:1703.04977
Kern, N. S., Liu, A., Parsons, A. R., Mesinger, A., & Greig, B. 2017, ApJ, 848, 23
Kilbinger, M. 2015, Reports on Progress in Physics, 78, 086901
Kilbinger, M., Heymans, C., Asgari, M., et al. 2017, MNRAS, 472, 2126
Kitching, T. D. & Heavens, A. F. 2017, Phys. Rev. D, 95, 063522
Köhlinger, F., Viola, M., Joachimi, B., et al. 2017, MNRAS, 471, 4412
Lawrence, E., Heitmann, K., White, M., et al. 2010, ApJ, 713, 1322
Leclercq, F. 2018, Phys. Rev. D, 98, 063511
Leclercq, F., Enzi, W., Jasche, J., & Heavens, A. 2019, MNRAS, 490, 4237
Lemos, P., Challinor, A., & Efstathiou, G. 2017, J. Cosmology Astropart. Phys., 2017, 014
Lesgourgues, J. 2011, arXiv e-prints, arXiv:1104.2932
Limber, D. N. 1953, ApJ, 117, 134
Loverde, M. & Afshordi, N. 2008, Phys. Rev. D, 78, 123506
LSST Science Collaboration, Marshall, P., Anguita, T., et al. 2017, arXiv e-prints, arXiv:1708.04058
Manrique-Yus, A. & Sellentin, E. 2020, MNRAS, 491, 2655
McKay, M. D., Beckman, R. J., & Conover, W. J. 1979, Technometrics, 21, 21
Planck Collaboration, Ade, P. A. R., Aghanim, N., et al. 2014, A&A, 571, A16
Planck Collaboration, Ade, P. A. R., Aghanim, N., et al. 2016, A&A, 594, A13
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. 2007, Numerical Recipes 3rd Edition: The Art of Scientific Computing (Cambridge University Press)
Rasmussen, C. E. & Williams, C. K. I. 2006, Gaussian Processes for Machine Learning (MIT Press)
Rogers, K. K., Peiris, H. V., Pontzen, A., et al. 2019, J. Cosmology Astropart. Phys., 2019, 031
Schmit, C. J. & Pritchard, J. R. 2018, MNRAS, 475, 1213
Schneider, M. D., Holm, O., & Knox, L. 2011, ApJ, 728, 137
Seppala, L. G. 2002, in Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, Vol. 4836, Proc. SPIE, ed. J. A. Tyson & S. Wolff, 111–118
Smoot, G. F., Bennett, C. L., Kogut, A., et al. 1992, ApJ, 396, L1
Spergel, D. N., Verde, L., Peiris, H. V., et al. 2003, ApJS, 148, 175
van Daalen, M. P., Schaye, J., Booth, C. M., & Dalla Vecchia, C. 2011, MNRAS, 415, 3649
Wang, W. 2020, arXiv e-prints, arXiv:2002.01381
Weinberg, D. H., Mortonson, M. J., Eisenstein, D. J., et al. 2013, Phys. Rep., 530, 87
Zhu, C., Byrd, R. H., Lu, P., & Nocedal, J. 1997, ACM Trans. Math. Softw., 23, 23


This paper has been typeset from a TEX/LATEX file prepared by the author.
