
Mon. Not. R. Astron. Soc. 000, 000–000 (0000) Printed 3 May 2019 (MN LATEX style file v2.2)

Trans-Dimensional Bayesian Inference for Gravitational Lens Substructures

Brendon J. Brewer1⋆, David Huijser1, Geraint F. Lewis2

1 Department of Statistics, The University of Auckland, Private Bag 92019, Auckland 1142, New Zealand
2 Sydney Institute for Astronomy, School of Physics, A28, The University of Sydney, NSW 2006, Australia

⋆ To whom correspondence should be addressed. Email: [email protected]

ABSTRACT
We introduce a Bayesian solution to the problem of inferring the density profile of strong gravitational lenses when the lens galaxy may contain multiple dark or faint substructures. The source and lens models are based on a superposition of an unknown number of non-negative basis functions (or “blobs”) whose form was chosen with speed as a primary criterion. The prior distribution for the blobs’ properties is specified hierarchically, so the mass function of substructures is a natural output of the method. We use reversible jump Markov Chain Monte Carlo (MCMC) within Diffusive Nested Sampling (DNS) to sample the posterior distribution and evaluate the marginal likelihood of the model, including the summation over the unknown number of blobs in the source and the lens. We demonstrate the method on a simulated data set with a single substructure, which is recovered well with moderate uncertainties. We also apply the method to the g-band image of the “Cosmic Horseshoe” system, and find some hints of potential substructures. However, we caution that such results could also be caused by misspecifications in the model (such as the shape of the smooth lens component or the point spread function), which are difficult to guard against in full generality.

Key words: gravitational lensing: strong – methods: data analysis – methods: statistical

1 INTRODUCTION

Galaxy-galaxy gravitational lensing is a powerful astrophysical tool for studying the distribution of matter, including dark matter, in galaxies (Treu 2010). One promising application of lensing is to study and measure the properties of dark matter substructures in the lens galaxy (Koopmans 2005). In recent years, this promise has been realised in several lens systems which are thought to contain at least one dark substructure (Vegetti et al. 2012, 2010; Vegetti, Czoske, & Koopmans 2010). In these studies, the lens was modelled as a superposition of a smooth component (such as an elliptical power-law matter distribution) plus a pixellized “non-parametric” correction term. The estimated spatial structure of the correction term provides clues about the locations of possible substructures. In a second modelling step, the lens is assumed to be a smooth overall component plus a compact “blob” near any locations suggested by the correction term. This kind of inference has also been done with point-like multiple images of QSOs (Fadely & Keeton 2012).

We have developed an approach for solving this same problem (inferring the number and properties of dark substructures given an image) in a more direct fashion. However, with appropriate modifications, it may also prove useful in other galaxy-galaxy lens modelling areas. One example is time-delay cosmography, which requires realistically flexible lens models, and reliable quantification of the uncertainties (Suyu et al. 2013, 2014; Grillo et al. 2015).

Lens modelling has been a topic of much interest over the last two decades. Most approaches are based on Bayesian inference (Sivia & Skilling 2006; O’Hagan and Forster 2004), maximum likelihood (Millar 2011) or variations thereof. These approaches may be categorized according to the following criteria:

(i) Whether to use a simply parameterized (e.g. a Sérsic profile), moderately flexible (e.g. a mixture model, Brewer et al. 2011) or free-form (e.g. pixellated) model for the surface brightness profile of the source;

(ii) Whether to use a simply parameterized (e.g. Singular Isothermal Ellipsoid [SIE]) or flexible model (e.g. pixellated) for the mass profile of the lens; and

(iii) How to compute the results (e.g. optimization methods, Markov Chain Monte Carlo [MCMC] with source parameters marginalized out analytically, or MCMC with the source parameters included).

© 0000 RAS. arXiv:1508.00662v1 [astro-ph.IM] 4 Aug 2015

Recent sophisticated approaches investigating different prior assumptions and computational approaches include Coles, Read, & Saha (2014), Tagore & Jackson (2015), and Birrer, Amara, & Refregier (2015).

Each approach involves tradeoffs between convenience and realism. Simply parameterized models, such as Sérsic surface brightness profiles for the source, and singular isothermal ellipsoid plus external shear (SIE+γ) models for the lens (Kormann, Schneider, & Bartelmann 1994), are very convenient. They only have a few adjustable parameters, and they capture (to “first order”) relevant prior information that we have about the source and lens profiles. Of course, they are clearly simplifications, and can produce misleading results if the actual profile is very different from any member of the assumed family.

On the other hand, pixellated models for the source (e.g. Suyu et al. 2006) or the lens (e.g. Coles, Read, & Saha 2014) can in principle represent “any” source surface brightness profile or lens projected density profile. However, the prior distribution over pixel values is often chosen to be a multivariate gaussian for mathematical reasons, so that the source can be analytically marginalized out (Warren & Dye 2003). Unfortunately, a multivariate gaussian prior over pixel intensities usually corresponds to a poor model of our prior beliefs about the source. It assigns virtually zero prior probability to the hypothesis that the source actually looks like a galaxy, and very high prior probability to the hypothesis that the source looks like noise (or blurred noise). These priors also assign nonzero probability to negative surface brightness or density values; in fact, the marginal prior probability that any pixel is negative is typically 0.5.

Brewer et al. (2011) argued that an ideal modelling approach lies somewhere between simply-parameterised and pixellated models, which is also a motivation behind Tagore & Jackson (2015)’s investigation of shapelets. One way of achieving this is with mixture models. The source and the lens can be built up from a mixture of a moderate number (from a few to a few hundred) of simply parameterised components. For the source, this allows us to incorporate prior knowledge about the local correlations (the surface brightness at any particular point is likely to be similar to that at a nearby point) and the fact that most of the sky is dark (Brewer & Lewis 2006). The Brewer et al. (2011) model also allowed for multi-band data, and allowed for our expectation that the image may or may not be similar in different bands.

The present paper is similar in approach to Brewer et al. (2011). The main differences are: i) we use a simpler and faster set of basis functions; ii) we apply it to the lens as well as the source, allowing for the possibility of substructure; and iii) the implementation is based on a C++ template library developed by Brewer (2014) which allows for hierarchical priors and trans-dimensional MCMC to be implemented in a relatively straightforward manner. However, in the present paper we do not account for multi-band data; this will be reserved for a future contribution. The source code for this project is available at http://www.github.com/eggplantbren/Lensing2.

If we are interested in detecting and measuring the properties of possible dark substructures in a lens galaxy, a superposition of an unknown number of “blobs” is the most natural model. In addition, if we want to constrain the mass function of these blobs (e.g. Vegetti & Koopmans 2009; Vegetti et al. 2014), we will need a hierarchical model which specifies the prior probability distribution for the masses conditional on some hyperparameters. Inferring the mass function of the blobs then reduces to calculating the posterior distribution for the hyperparameters. This is related to the general principle that we should (ideally) construct our inference methods so that the quantities we infer directly answer our scientific questions (Alsing et al. 2015; Schneider et al. 2015; Pancoast, Brewer, & Treu 2011).

2 THE MODEL

We now describe the details of our model and its parameterization. The motivation for most of our modelling choices is a compromise between computational efficiency, realism, and ease of implementation. None of the choices we have made are final in any sense; rather, this model should be considered a proof of concept. We encourage exploration of other choices. Since we can compute marginal likelihoods, Bayesian model comparison between our choices here and any proposed alternatives should be straightforward, if a union of our hypothesis space and another can be considered a reasonable model of prior uncertainty.

2.1 The Source

The surface brightness profile of the source is assumed to be composed of a sum of a finite number of “blobs”, or basis functions, in order to allow some flexibility while incorporating prior information about the non-negativity of surface brightness, and the spatial correlation expected in real surface brightness profiles. For computational speed, we choose the following functional form for the surface brightness profile of a single blob centered at the origin:

$$f(x, y) = \begin{cases} \dfrac{2A}{\pi w^2}\left(1 - \dfrac{r^2}{w^2}\right), & r \le w \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$

where $r = \sqrt{x^2 + y^2}$, $w$ is the width of the blob, and $A$ is the total flux of the blob (i.e. the integral of the surface brightness over the entire domain). These basis functions are inverted paraboloids, which are faster to evaluate than gaussians, since they do not contain an exponential function. In addition, the finite support means that each blob will evaluate to zero over a large fraction of the domain, which confers an additional speed advantage.
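As an illustration of equation (1), the following minimal Python sketch evaluates a single blob and checks numerically that its integral equals the total flux $A$. It is not taken from the Lensing2 code; the function names `blob_brightness` and `total_flux`, and the midpoint-rule check, are assumptions made for this example.

```python
import math


def blob_brightness(x, y, amplitude, width, xc=0.0, yc=0.0):
    """Surface brightness of one inverted-paraboloid blob, as in equation (1)."""
    rsq = (x - xc) ** 2 + (y - yc) ** 2   # r^2; no square root is needed
    wsq = width ** 2
    if rsq > wsq:
        return 0.0                        # finite support: zero outside r <= w
    return 2.0 * amplitude / (math.pi * wsq) * (1.0 - rsq / wsq)


def total_flux(amplitude, width, n=200):
    """Midpoint-rule integral of the blob over the square [-w, w]^2."""
    h = 2.0 * width / n
    total = 0.0
    for i in range(n):
        x = -width + (i + 0.5) * h
        for j in range(n):
            y = -width + (j + 0.5) * h
            total += blob_brightness(x, y, amplitude, width)
    return total * h * h
```

Calling `total_flux(3.0, 0.5)` should return a value close to 3.0, consistent with $A$ being the total flux of the blob.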

If our model contains $N_{\rm src}$ such blobs, positioned at $(x_i^{\rm src}, y_i^{\rm src})$ with widths $w_i$ and total fluxes $A_i$, the overall surface brightness profile is:

$$f(x, y) = \sum_{i=1}^{N_{\rm src}} \begin{cases} \dfrac{2A_i}{\pi w_i^2}\left(1 - \dfrac{r_i^2}{w_i^2}\right), & r_i \le w_i \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$

where $r_i = \sqrt{(x - x_i^{\rm src})^2 + (y - y_i^{\rm src})^2}$. Under these assumptions, the source can be described in its entirety by the following parameters:

$$\left\{N_{\rm src}, \left\{\theta_i^{\rm src}\right\}_{i=1}^{N_{\rm src}}\right\} \qquad (3)$$

where $\theta_i^{\rm src}$ denotes the parameters for source blob $i$:

$$\theta_i^{\rm src} = \left\{x_i^{\rm src}, y_i^{\rm src}, A_i, w_i\right\}. \qquad (4)$$

The dimensionality of the parameter space for the source depends on the minimum and maximum values of $N_{\rm src}$, which we set to 0 and 100 respectively (more detail about priors is given in Section 3). Therefore the source is described by 1–401 parameters: $N_{\rm src}$ itself plus four parameters for each blob.

2.2 The Lens

The surface mass density profile of the lens is modelled as a superposition of a singular isothermal ellipsoid plus external shear (SIE+γ) and N circular lens “blobs”. The SIE+γ is a simple and widely used lens model with analytically available deflection angles, and is intended to account for the bulk of the lensing effect due to the smooth spatial distribution of (visible and dark) matter in the lens galaxy. Unfortunately, along with simplicity comes a lack of realism, and generalising the smooth lens model is an important future step. Ultimately, concentric mixtures of smooth components may be the most useful for realistic inference of the density profile of the halo. A common approach in the lensing community, which is similar in spirit, is a superposition of an elliptical power law, external shear, and a pixellated potential correction. However, the direct use of blobs is closer to the scientific question at hand when investigating potential dark substructures.

The SIE+γ has seven free parameters: the (circularized) Einstein radius $b$, axis ratio $q$, central position $(x_c, y_c)$, orientation angle $\theta$, the external shear $\gamma$, and the orientation angle of the external shear, $\theta_\gamma$.

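The N lens blobs are intended to model possible dark or faint substructures in the projected mass profile of the lens. A lensing blob with mass $M$ and width $v$ centered at the origin has the following surface mass density profile:

$$\rho(x, y) = \begin{cases} \dfrac{2M}{\pi v^2}\left(1 - \dfrac{r^2}{v^2}\right), & r \le v \\ 0, & \text{otherwise} \end{cases} \qquad (5)$$

which is the same as the surface brightness profile of a source blob. The deflection angles for a single blob are:

$$\alpha_x(x, y) = \begin{cases} \left(2 - \dfrac{r^2}{v^2}\right)\dfrac{Mx}{\pi v^2}, & r \le v \\ \dfrac{Mx}{\pi r^2}, & \text{otherwise} \end{cases} \qquad (6)$$

$$\alpha_y(x, y) = \begin{cases} \left(2 - \dfrac{r^2}{v^2}\right)\dfrac{My}{\pi v^2}, & r \le v \\ \dfrac{My}{\pi r^2}, & \text{otherwise} \end{cases} \qquad (7)$$

where $r = \sqrt{x^2 + y^2}$. These do not depend on any slow functions such as square roots or exponentials. For N such blobs the deflection angles are summed over all blobs. Therefore the lens can be entirely described by the following parameters:

$$\left\{\theta_{\rm SIE}, N_{\rm lens}, \left\{\theta_i^{\rm lens}\right\}_{i=1}^{N_{\rm lens}}\right\} \qquad (8)$$

where $\theta_i^{\rm lens}$ denotes the parameters for lens blob $i$:

$$\theta_i^{\rm lens} = \left\{x_i^{\rm lens}, y_i^{\rm lens}, M_i, v_i\right\}, \qquad (9)$$

and $\theta_{\rm SIE}$ describes the parameters for the SIE+γ component:

$$\theta_{\rm SIE} = \left\{b, q, x_c^{\rm SIE}, y_c^{\rm SIE}, \theta, \gamma, \theta_\gamma\right\}. \qquad (10)$$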
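The deflection angles of equations (6) and (7) can be sketched in a few lines of Python. This is an illustrative sketch rather than the Lensing2 implementation; the names `blob_deflection` and `total_deflection`, and the tuple layout `(xc, yc, mass, width)`, are assumptions. Only $r^2$ is needed, so no square root is taken.

```python
import math


def blob_deflection(x, y, mass, width):
    """Deflection (alpha_x, alpha_y) of one blob centred at the origin."""
    rsq = x * x + y * y
    vsq = width * width
    if rsq <= vsq:
        # Inside the blob: equations (6) and (7), first branch.
        factor = (2.0 - rsq / vsq) * mass / (math.pi * vsq)
    else:
        # Outside the blob the deflection is that of a point mass.
        factor = mass / (math.pi * rsq)
    return factor * x, factor * y


def total_deflection(x, y, blobs):
    """Sum the deflections of blobs given as (xc, yc, mass, width) tuples."""
    ax = ay = 0.0
    for xc, yc, mass, width in blobs:
        dx, dy = blob_deflection(x - xc, y - yc, mass, width)
        ax += dx
        ay += dy
    return ax, ay
```

The two branches agree at $r = v$, where both give $M x / (\pi v^2)$, so the deflection field is continuous across the edge of a blob.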

Since the model is not of fixed dimension (the number of source and lens “blobs” is unknown), we used a trans-dimensional MCMC sampler based on reversible jump MCMC (Green 1995). This framework has been used successfully in many astronomical inference problems (e.g. Jones, Kashyap, & van Dyk 2014; Umstätter et al. 2005). We implemented the MCMC using the RJObject software (Brewer 2014), a C++ library for implementing trans-dimensional MCMC with hierarchically-specified priors when the model is a “mixture model” or similar, as is the case here. The RJObject library uses the Diffusive Nested Sampling algorithm (DNS; Brewer, Pártay, & Csányi 2011) for its sampling, but the trans-dimensionality is handled by the MCMC moves. Therefore, the marginal likelihood we obtain is one that involves a sum over the hypothesis space for $N_{\rm src}$ and $N_{\rm lens}$. In other words, we do not need separate runs with different trial values of $N_{\rm src}$ and $N_{\rm lens}$. Previous astronomical applications of RJObject include Huppenkothen et al. (2015) and Brewer & Donovan (2015). Proposal moves for the parameters (positions and masses of the source and lens blobs), and birth/death moves that add blobs to or remove blobs from the source or lens, are handled internally by RJObject and fit into the general framework described in Brewer (2014).
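To make the idea of a birth/death move concrete, here is a minimal Python sketch of one such move that leaves a prior of the form "N uniform on {0, ..., N_MAX}, blobs iid given the hyperparameters" invariant. It is not RJObject's implementation: the function name `birth_death_step`, the callable `draw_blob`, and the choice to add and remove blobs only at the end of an ordered list are all assumptions made for illustration. Because a new blob is drawn from its own conditional prior, the prior ratio and proposal ratio cancel, so every move that stays inside the allowed range of N is accepted with respect to the prior; within Nested Sampling, a likelihood constraint would then be applied on top.

```python
import random

N_MAX = 10  # upper limit of a Uniform(0, ..., 10) prior on the number of blobs


def birth_death_step(blobs, draw_blob, rng):
    """Propose a birth or a death with equal probability.

    blobs     : ordered list of blob parameter sets (current state)
    draw_blob : hypothetical callable returning one blob drawn from its prior
    rng       : random.Random instance
    """
    if rng.random() < 0.5:
        if len(blobs) == N_MAX:
            return blobs                  # birth would leave the prior support: reject
        return blobs + [draw_blob(rng)]   # birth: append a blob drawn from the prior
    if not blobs:
        return blobs                      # nothing to remove: reject
    return blobs[:-1]                     # death: remove the most recently added blob
```

Run on its own, with no likelihood constraint, this move produces a number of blobs that is uniformly distributed over 0, ..., N_MAX in the long run, which is a simple check that it respects the prior.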

3 PRIOR PROBABILITY DISTRIBUTIONS

The prior probability distributions for all hyperparameters, parameters, and the data, are given in Table 1, and a factorization of the joint distribution is displayed in Figure 1. With these priors, we aim to express a large degree of prior uncertainty (hence the liberal use of uniform, log-uniform, and Cauchy distributions). However, we also specify some priors hierarchically to allow (for example) blobs to cluster around a typical central position, rather than implying a high probability for substructure positions being spread uniformly (in a frequency sense) over the sky. We also applied hierarchical priors to the substructure masses (and source fluxes) so that, while the typical order of magnitude of the masses is unknown, the masses themselves are likely to be roughly the same order of magnitude. The hyperparameters of these distributions can also be interpreted as straightforward answers to questions about the substructure mass function.

We assign somewhat informative (but heavy-tailed) Cauchy priors to the central positions of the source and the lens. One might object to the Cauchy priors for the central positions (and argue instead for gaussians) on the basis that they do not have rotational symmetry. However, the priors we are assigning are prior only to the values of the data pixel intensities, and not to other facts about the data, such as its dimensions in arc seconds, or its rectangular shape. Given these, there is no reason to insist on rotational symmetry. The Cauchy priors are intended to enhance the plausibility (relative to what a uniform prior would imply) that the lens and source are somewhere near the centre of the image, but in a cautious way.

| Quantity | Meaning | Prior |
|---|---|---|
| Numbers of Blobs | | |
| $N_{\rm src}$ | Number of source blobs | Uniform({0, 1, ..., 100}) |
| $N_{\rm lens}$ | Number of lens blobs | Uniform({0, 1, ..., 10}) |
| Source hyperparameters ($\alpha_{\rm src}$) | | |
| $(x_c^{\rm src}, y_c^{\rm src})$ | Typical position of source blobs | iid Cauchy(location=(0, 0), scale=0.1×imageWidth) |
| $R_{\rm src}$ | Typical distance of blobs from $(x_c^{\rm src}, y_c^{\rm src})$ | LogUniform(0.01×imageWidth, 10×imageWidth) |
| $\mu_{\rm src}$ | Typical flux of source blobs | $\ln(\mu_{\rm src})$ ∼ Cauchy(0, 1)T(−21.205, 21.205) |
| $W_{\rm max}^{\rm src}$ | Maximum width of source blobs | LogUniform(0.001×imageWidth, imageWidth) |
| $W_{\rm min}^{\rm src}$ | Minimum width of source blobs | Uniform(0, $W_{\rm max}^{\rm src}$) |
| Lens hyperparameters ($\alpha_{\rm lens}$) | | |
| $(x_c^{\rm lens}, y_c^{\rm lens})$ | Typical position of lens blobs | iid Cauchy(location=(0, 0), scale=0.1×imageWidth) |
| $R_{\rm lens}$ | Typical distance of blobs from $(x_c^{\rm lens}, y_c^{\rm lens})$ | LogUniform(0.01×imageWidth, 10×imageWidth) |
| $\mu_{\rm lens}$ | Typical mass of lens blobs | $\ln(\mu_{\rm lens})$ ∼ Cauchy(0, 1)T(−21.205, 21.205) |
| $W_{\rm max}^{\rm lens}$ | Maximum width of lens blobs | LogUniform(0.001×imageWidth, imageWidth) |
| $W_{\rm min}^{\rm lens}$ | Minimum width of lens blobs | Uniform(0, $W_{\rm max}^{\rm lens}$) |
| Source Blob Parameters ($\theta_i^{\rm src}$) | | |
| $(x_i^{\rm src}, y_i^{\rm src})$ | Blob position | Circular(location=$(x_c^{\rm src}, y_c^{\rm src})$, scale=$R_{\rm src}$) |
| $A_i$ | Blob flux | Exponential(mean=$\mu_{\rm src}$) |
| $w_i$ | Blob width | Uniform($W_{\rm min}^{\rm src}$, $W_{\rm max}^{\rm src}$) |
| Lens Blob Parameters ($\theta_i^{\rm lens}$) | | |
| $(x_i^{\rm lens}, y_i^{\rm lens})$ | Blob position | Circular(location=$(x_c^{\rm lens}, y_c^{\rm lens})$, scale=$R_{\rm lens}$) |
| $M_i$ | Blob mass | Exponential(mean=$\mu_{\rm lens}$) |
| $v_i$ | Blob width | Uniform($W_{\rm min}^{\rm lens}$, $W_{\rm max}^{\rm lens}$) |
| Smooth Lens Parameters ($\theta_{\rm SIE}$) | | |
| $b$ | SIE Einstein radius | LogUniform(0.001×imageWidth, imageWidth) |
| $q$ | Axis ratio | Uniform(0, 0.95) |
| $(x_c^{\rm SIE}, y_c^{\rm SIE})$ | Central position | iid Cauchy(location=(0, 0), scale=0.1×imageWidth) |
| $\theta$ | Orientation angle | Uniform(0, π) |
| $\gamma$ | External shear | Cauchy(0, 0.05)T(0, ∞) |
| $\theta_\gamma$ | External shear angle | Uniform(0, π) |
| Noise Parameters ($\sigma$) | | |
| $\sigma_0$ | Constant component of noise variance | $\ln(\sigma_0)$ ∼ Cauchy(0, 1)T(−21.205, 21.205) |
| $\sigma_1$ | Coefficient for variance increasing with flux | $\ln(\sigma_1)$ ∼ Cauchy(0, 1)T(−21.205, 21.205) |
| Data ($D$) | | |
| $D_{ij}$ | Pixel intensities | Normal($m_{ij}$, $s_{ij}^2 + \sigma_0^2 + \sigma_1 m_{ij}$) |

Table 1. The prior distribution for all hyperparameters, parameters, and the data, in our model. Uniform(a, b) is a uniform distribution between a and b. LogUniform(a, b) is a log-uniform distribution (with density f(x) ∝ 1/x, sometimes erroneously called a Jeffreys prior) between a and b. The notation T(α, β) after a distribution denotes truncation to the interval [α, β]. The constant imageWidth is the geometric mean of the image dimensions in the x and y directions.

The “circular” conditional prior for the blob positions has the following density:

$$\rho(x, y)\, dx\, dy = \frac{1}{2\pi W}\, \frac{1}{r} \exp\left(-\frac{r}{W}\right) dx\, dy \qquad (11)$$

where $r = \sqrt{(x - x_c)^2 + (y - y_c)^2}$. This results from an exponential distribution (with expectation $W$) for the radial coordinate and a uniform distribution for the angular coordinate in plane polar coordinates. This was chosen (as opposed to a more “obvious” choice such as a normal distribution) for computational reasons: the RJObject code requires a function to transform from Uniform(0, 1) distributions to this distribution and back (making use of cumulative distribution functions and their inverses), and the normal would have required the use of special functions for this.
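The transform between Uniform(0, 1) variables and the circular prior of equation (11) only needs a logarithm, an exponential, and elementary trigonometry. The sketch below illustrates this in Python; it does not reproduce RJObject's actual interface, and the function names `circular_from_uniform` and `circular_to_uniform` are hypothetical.

```python
import math


def circular_from_uniform(u1, u2, xc, yc, scale):
    """Map (u1, u2) in [0, 1)^2 to a position drawn from the circular prior."""
    r = -scale * math.log(1.0 - u1)   # inverse CDF of Exponential(mean=scale)
    phi = 2.0 * math.pi * u2          # uniform angle in plane polar coordinates
    return xc + r * math.cos(phi), yc + r * math.sin(phi)


def circular_to_uniform(x, y, xc, yc, scale):
    """Inverse map: position back to the pair of Uniform(0, 1) coordinates."""
    dx, dy = x - xc, y - yc
    r = math.hypot(dx, dy)
    u1 = 1.0 - math.exp(-r / scale)   # CDF of the exponential radial coordinate
    u2 = (math.atan2(dy, dx) % (2.0 * math.pi)) / (2.0 * math.pi)
    return u1, u2
```

A round trip through both functions should return the starting pair (u1, u2) up to floating point error. A normal distribution would instead require the error function and its inverse, which is the special-function cost mentioned above.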

3.1 Conditional prior for the data

Figure 1. A probabilistic graphical model (PGM) of the dependence structure of the prior information, produced using DAFT (www.daft-pgm.org). The nodes are $\alpha_{\rm src}$, $\alpha_{\rm lens}$, $\sigma$, $N_{\rm src}$, $N_{\rm lens}$, $\theta_{\rm SIE}$, $\theta_i^{\rm src}$ (plate: source blobs, $i = 1, ..., N_{\rm src}$), $\theta_i^{\rm lens}$ (plate: lens blobs, $i = 1, ..., N_{\rm lens}$), and $D$. The prior for the source and lens blob parameters are specified conditional on hyperparameters. The purpose of this is to induce dependence in the prior distribution for the blob parameters, implying (for example) that the mass of one blob is slightly informative about the mass of another, and that the locations might be clustered around a certain typical location.

The conditional prior for the data given the parameters is a product of independent gaussian distributions, one for each pixel. The mean of the gaussian is given by the “mock” noise-free image we would expect based on the parameters. The standard deviation of the gaussian is a combination of three terms: the first (denoted $s_{ij}$) is a “noise map” which is loaded from a file, the second is an unknown constant $\sigma_0$ which applies to the whole image, and the third is proportional to the square root of the mock image, governed by the coefficient $\sigma_1$. The three terms combine in quadrature, giving the variance $s_{ij}^2 + \sigma_0^2 + \sigma_1 m_{ij}$ listed in Table 1.
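The noise model can be written down directly from the last row of Table 1. The following Python sketch computes the log of the gaussian density of the data given a mock image; the function name `log_likelihood` and the nested-list image layout are assumptions for illustration, and the mock image is assumed to be non-negative so that the variance stays positive.

```python
import math


def log_likelihood(data, mock, noise_map, sigma0, sigma1):
    """Sum of independent gaussian log-densities, one per pixel.

    data, mock, noise_map : nested lists (rows of pixels) of equal shape
    Variance per pixel    : s_ij^2 + sigma0^2 + sigma1 * m_ij
    """
    logl = 0.0
    for d_row, m_row, s_row in zip(data, mock, noise_map):
        for d, m, s in zip(d_row, m_row, s_row):
            var = s * s + sigma0 * sigma0 + sigma1 * m
            logl += -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (d - m) ** 2 / var
    return logl
```

Because the variance depends on the mock image through the $\sigma_1 m_{ij}$ term, the normalisation term $\log(2\pi\,{\rm var})$ cannot be dropped as a constant when the lens or source parameters change.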

This is usually called the “sampling distribution”, or sometimes just the “likelihood”. However, sampling distribution is misleading since no physical frequency distribution exists which is being sampled from. The term likelihood is also usually used to refer to the scalar function of the parameters obtained when the data are known. For a discussion of the view that this object is really a prior distribution, see Caticha (2008, pp. 33–35).

It is common to provide a “variance image” to lens modelling software, which specifies the standard deviations used in the likelihood function. However, telescopes do not provide variance images. It also does not make sense to specify vague priors prior to the image pixel values but posterior to the variance image (which usually resembles the image itself, and therefore would be highly informative). Allowing the standard deviation of the gaussian in the likelihood to depend on the mock image brightness is a more principled way of modelling our actual inferential situation. Nevertheless, we allow for an input variance image as well, which can be used for ad-hoc masking of troublesome regions (such as unmodelled flux from other sources).

4 COMPUTATION

Computing the posterior distribution over the parameters of such a model requires that we can implement Markov Chain Monte Carlo (MCMC) over the space of possible sources and lenses. To compute the posterior distribution for the parameters (of which there are 24–464, depending on the values of $N_{\rm src}$ and $N_{\rm lens}$), we use Diffusive Nested Sampling (Brewer, Pártay, & Csányi 2011), a form of Nested Sampling (Skilling 2006) that uses the Metropolis algorithm to move around the parameter space.

The proposal distributions for the blob parameters (both source and lens) are handled by the RJObject library (Brewer 2014). This includes birth and death proposals that increase or decrease either $N_{\rm src}$ or $N_{\rm lens}$, as well as proposals that move the blobs (in their parameter space) while keeping the number of blobs fixed. RJObject also facilitates proposals that change the hyperparameters (either for the lens or the source) while keeping the actual blobs in place, as well as proposals that change the hyperparameters and shift all of the relevant blobs so that they are appropriate for the new values of the hyperparameters.

Evaluating the likelihood function requires that we compute a “mock” image from the current setting of the parameters. This mock image is calculated using standard ray-tracing methods with a uniform grid of n × n rays fired per image pixel. However, certain kinds of proposals do not affect the image in any way, such as those which change the noise parameters. For efficiency we do not recompute the mock image in these cases.
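A minimal Python sketch of the ray-tracing step is given below. It is an illustration under stated assumptions rather than the Lensing2 code: the image is taken to be square and centred on the origin, the PSF convolution is omitted, and `deflect` and `source` are hypothetical callables standing for the total deflection field and the source surface brightness.

```python
def mock_image(n_pix, width, n_rays, deflect, source):
    """Average the lensed source over an n_rays x n_rays grid of rays per pixel.

    The image covers [-width/2, width/2]^2 with n_pix x n_pix pixels.
    deflect(x, y) returns (alpha_x, alpha_y); source(xs, ys) returns brightness.
    """
    pix = width / n_pix            # pixel size
    sub = pix / n_rays             # spacing between rays inside a pixel
    image = [[0.0] * n_pix for _ in range(n_pix)]
    for i in range(n_pix):         # row index (y direction)
        for j in range(n_pix):     # column index (x direction)
            total = 0.0
            for a in range(n_rays):
                y = -0.5 * width + i * pix + (a + 0.5) * sub
                for b in range(n_rays):
                    x = -0.5 * width + j * pix + (b + 0.5) * sub
                    ax, ay = deflect(x, y)
                    total += source(x - ax, y - ay)   # lens equation
            image[i][j] = total / (n_rays * n_rays)
    return image
```

Since the result depends only on the lens and source parameters, a caller could keep the last image and reuse it whenever a proposal touches only the noise parameters, which is the saving described above.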

Figure 2. Simulated dataset: a simulated image of a simple “galaxy” lensed by an SIE+γ lens in the center of the image (which does not contribute any flux), plus a single substructure close to the top left image. The image was blurred by a PSF and had some noise added.

5 SIMULATED DATA

To demonstrate the method, we generated a simulated dataset with a single substructure. The image is shown in Figure 2, and consists of 100 × 100 pixels. The point spread function (PSF) was compact and was defined on a 5 × 5 grid. The data was created by firing only one ray per pixel, and the inference was also carried out under the same level of approximation. Since the model assumptions are all correct for this image, the demonstration here is purely to illustrate the computational tractability of the problem, and the kind of outputs the method can produce.

We executed the code to generate 5,000 samples from the “mixture of constrained priors” distribution of DNS. Our thinning factor was $10^5$, so $5 \times 10^8$ MCMC iterations were actually performed, taking approximately 48 hours on a modest desktop PC¹. After resampling these samples to reflect the posterior distribution, we were left with 537 (equally weighted) posterior samples.
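The text does not spell out the resampling rule, so the following Python sketch shows only one common way of turning weighted samples into equally weighted ones: the number of samples kept is set by an entropy-based effective sample size, and the samples are then drawn in proportion to their posterior weights. The function names and the use of log-weights as input are assumptions, not a description of the DNS post-processing code.

```python
import math
import random


def normalised_weights(log_weights):
    """Convert log-weights to probabilities, guarding against overflow."""
    top = max(log_weights)
    w = [math.exp(lw - top) for lw in log_weights]
    total = sum(w)
    return [wi / total for wi in w]


def effective_sample_size(log_weights):
    """exp(entropy) of the normalised weights; equals n for equal weights."""
    p = normalised_weights(log_weights)
    entropy = -sum(pi * math.log(pi) for pi in p if pi > 0.0)
    return math.exp(entropy)


def resample(samples, log_weights, rng):
    """Draw round(ESS) equally weighted samples (with replacement)."""
    p = normalised_weights(log_weights)
    k = max(1, round(effective_sample_size(log_weights)))
    return rng.choices(samples, weights=p, k=k)
```

Under this rule, a set of 5,000 weighted samples whose posterior weight is concentrated on a minority of them would be reduced to a much smaller equally weighted set, which is the qualitative behaviour reported above.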

The posterior distributions for complex models, such as the mixture models used here, are often challenging and unintuitive to summarise. One effective way to visualise the uncertainty in the inferences is to play a movie where each frame is a sample from the posterior distribution. The degree to which the frames differ from each other conveys the uncertainty remaining after taking the data into account.

For the purposes of a paper, static summaries are more convenient than movies. One useful summary is based on the concept of empirical measure. The empirical measure of the substructure positions is a function that takes the actual substructure positions $(x_i^{\rm lens}, y_i^{\rm lens})$ and produces a “density function” over two dimensions, composed of delta functions at the positions themselves:

$$\left\{(x_i^{\rm lens}, y_i^{\rm lens})\right\}_{i=1}^{N_{\rm lens}} \implies \sum_{i=1}^{N_{\rm lens}} \delta^2\left(x - x_i^{\rm lens}, y - y_i^{\rm lens}\right). \qquad (12)$$

¹ The computer was purchased in 2012 and has a second-generation Intel i7 processor, and the process was run on 8 threads.

Intuitively, the empirical measure is a mathematical object that is like an “infinite resolution histogram”, in this case a two dimensional histogram, of the substructure positions. Being a function of the actual substructure positions, the empirical measure is not available to us since we do not know those positions with certainty. However, we have samples from the posterior distribution for those positions, and can use these (trivially) to create samples from the posterior distribution for the empirical measure. We can also summarise this posterior, for example, by taking its expected value.

The posterior expected value of the empirical measureis:∫

p(θ|D)

Nlens∑i=1

δ2(x− xlens

i , ylensi − ylens

i

)dθ (13)

where $\theta$ denotes all parameters and hyperparameters (including the $N$s) and “$\int d\theta$” is an integral and summation over the entire parameter space.

Since we can approximate posterior expectations using Monte Carlo, we can obtain the expected value of the empirical measure using:

$$\frac{1}{n} \sum_{k=1}^{n} \sum_{i=1}^{N_{\rm lens}} \delta^2\left(x - x_i^{\rm lens},\, y - y_i^{\rm lens}\right) \qquad (14)$$

where $n$ is the number of posterior samples, and the inner sum uses the substructure positions (and the value of $N_{\rm lens}$) from the $k$th sample. The resulting function is an “image” with a point mass wherever a substructure occurred. For visualisation purposes the image can be blurred, or calculated at a lower resolution by replacing the Dirac delta function with a discrete version which returns a nonzero constant if a substructure appears in a pixel or zero otherwise.
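The lower-resolution version of this Monte Carlo estimate can be sketched as a two dimensional histogram averaged over posterior samples. The function name, the `extent` convention and the data layout (one array of positions per posterior sample) are assumptions made for illustration, not taken from the authors' code.

```python
import numpy as np

def expected_empirical_measure(samples, extent, n_pix):
    """Pixelised Monte Carlo estimate of the expected empirical measure.

    samples : sequence of arrays with shape (N_k, 2), the (x, y) substructure
              positions in the k-th posterior sample (N_k may be zero).
    extent  : (x_min, x_max, y_min, y_max) of the image region.
    n_pix   : number of pixels along each axis.
    """
    x_min, x_max, y_min, y_max = extent
    image = np.zeros((n_pix, n_pix))
    for positions in samples:
        positions = np.asarray(positions, dtype=float).reshape(-1, 2)
        if positions.shape[0] == 0:
            continue  # a sample with no substructures adds nothing
        # Rows index y and columns index x, as for an image array.
        counts, _, _ = np.histogram2d(
            positions[:, 1], positions[:, 0],
            bins=n_pix, range=[[y_min, y_max], [x_min, x_max]])
        image += counts
    # Divide by n, the number of posterior samples, as in the estimator above.
    return image / len(samples)
```

Summing the returned image over any set of pixels gives the estimated expected number of substructures whose centres lie in that region.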

The masses of the smooth and substructure components of the lens are usually of interest. Since the total mass of an SIE lens is infinite, the question needs to be redefined, so we ask about the mass within some aperture of finite area. For an SIE, the mass (in dimensionless units based on the critical density) within the critical ellipse is simply $\pi b^2$. However, the blobs have finite total masses $M_i$. To obtain the posterior distribution for the lens masses, one must be clear about exactly which mass one is talking about, and many definitions are possible, although some might be more meaningful or well constrained than others.

In the present paper we do not address the question of exactly which quantities related to the density profile of the lens are most scientifically interesting. Nevertheless, we can verify that the results of the inference do behave in understandable ways. For example, in Figure 3, we plot the joint posterior distribution for the SIE mass within its critical ellipse and the total substructure mass over the entire domain. As one would expect, there is a strong dependence between these two quantities in the posterior distribution, as mass in the SIE can be traded off with mass in substructures to some extent, while the model remains (loosely speaking) “consistent with the data”.

The posterior distribution for $N_{\rm lens}$ is also clearly of interest, and is displayed in Figure 4. The prior for this parameter was uniform from 0 to 10 (inclusive), and the possibility N = 0 has been (loosely speaking) “ruled out” by the image data. The true solution (N = 1) has the highest probability. However, the possibilities with N > 1 are still fairly plausible, since it’s possible that a very small substructure exists, and it’s also possible that two substructures near each other could mimic the effect of one. The degree to which these possibilities are plausible is related to the choice of prior for the blob amplitudes and positions.

Figure 3. The joint posterior distribution for the SIE mass (integrated within its critical ellipse, which is not the critical curve of the lens overall), and the total mass in substructures. (Axes: SIE mass within critical ellipse, horizontal; total substructure mass, vertical.)

Figure 4. The marginal posterior distribution for $N_{\rm lens}$, the number of substructures in the lens. The prior was uniform and the true value used to generate the data was 1. However, the data is only informative enough to suppress the probability of values above 1 slightly, since it is possible (given this data) that low mass substructures might exist somewhere, or that what we think is one substructure might actually be two close together, and other such possibilities. (Axes: $N_{\rm lens}$, horizontal; number of posterior samples, vertical.)

The estimated marginal likelihood of our model (averaged over all parameters including $N_{\rm lens}$ and $N_{\rm src}$) is $\ln[p(D)] \approx -14248$, and the information (Kullback-Leibler divergence from the prior to the posterior) is $H \approx 99$ nats. The information represents the degree of compression of the posterior distribution with respect to the prior, and can be interpreted quite literally as how much was learned about the parameters from the data. It is also straightforward to estimate from Nested Sampling (Skilling 2006). Its definition is:

Figure 5. The simulated data, with substructure positions overlaid (a Monte Carlo approximation to the posterior expectation of the empirical measure). The density of black points in any region is proportional to the expected number of substructures whose centers lie within that region. In this case, there is strong evidence for a substructure close to the top-left image (where one was actually placed).

$$H = \int p(\theta|D) \ln\left[\frac{p(\theta|D)}{p(\theta)}\right] d\theta \qquad (15)$$

and we can build intuition about its meaning based on some simple examples. One such example is a uniform prior over a volume $V_0$ and a posterior which is uniform over a smaller volume $V_1$ contained within $V_0$. In this case $H = \ln(V_0/V_1)$ nats. Therefore, a value of $H = 100$ nats implies the posterior distribution occupies roughly $e^{-100}$ of the prior volume.
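The uniform example can be worked through explicitly. This is a sketch of the standard calculation, using only the definition of $H$ and the two uniform densities described above:

```latex
\begin{align}
p(\theta) &= \frac{1}{V_0} \ \text{for } \theta \in V_0, \qquad
p(\theta|D) = \frac{1}{V_1} \ \text{for } \theta \in V_1 \subseteq V_0, \\
H &= \int_{V_1} \frac{1}{V_1} \ln\left[\frac{1/V_1}{1/V_0}\right] d\theta
   = \ln\left(\frac{V_0}{V_1}\right), \\
\frac{V_1}{V_0} &= e^{-H}.
\end{align}
```

For the value $H \approx 99$ nats found for the simulated data, the same reading gives a compression factor of $e^{-99}$, or about $10^{-43}$, although the real posterior is of course not uniform.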

6 THE COSMIC HORSESHOE

As a further demonstration of the method, we apply it to the g-band data of the Cosmic Horseshoe J1148+1930 (Belokurov et al. 2007; Dye et al. 2008) taken with the Isaac Newton Telescope (INT). The image is shown in Figure 6. For this system, more data is available (i and U band data, as well as more recent Hubble Space Telescope (HST) imaging), although the g-band image is the highest signal-to-noise of the INT images. The source redshift is $z_s = 2.379$, and the lens is a massive luminous red galaxy at $z_l = 0.4457$. Unfortunately our current implementation doesn’t allow for multi-band data (unlike Brewer et al. (2011)), and is quite slow when running on the larger HST image. Therefore this section should be considered a further demonstration of the technique, and not a thorough study of this system.

We generated 5,000 samples from the DNS target distribution and thinned by a factor of $10^5$, so $5\times10^8$ MCMC iterations were actually performed, taking approximately 40 hours. The first 1/3 of the run was excluded as burn-in. After resampling, this resulted in 632 equally weighted posterior samples. Although this image is smaller than the simulated data (46×46 pixels), we fired 2×2 rays per pixel for greater accuracy.

Figure 6. The g-band INT image of the Cosmic Horseshoe, with the lens galaxy subtracted. (Axes: x and y in arcseconds, each spanning −7.66 to 7.66.)

The joint posterior distribution for the SIE mass and total substructure mass for the Cosmic Horseshoe is given in Figure 7. As with the simulated data (Figure 3), we see an expected negative correlation between these two quantities. However, the points are not as smoothly distributed in this case (we discuss this issue further in Section 6.1). An upper “limit” on the SIE mass, with 95% posterior probability, is $5.21\times10^{12}$ solar masses. The algorithm has also found some possibilities where the substructure mass is much greater than this. In these cases, the substructures are far from the image; it is the environment that is being modelled.

The posterior distribution for $N_{\rm lens}$ (Figure 8) also shows some evidence for more than zero substructures. In particular, the prior probability for $N_{\rm lens} > 0$ was 10/11 ≈ 0.91, whereas the posterior probability is approximately 0.97. This corresponds to a Bayes factor of approximately 3 in favour of $N_{\rm lens} > 0$ versus the alternative $N_{\rm lens} = 0$.
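The arithmetic behind this Bayes factor is the ratio of posterior odds to prior odds. A minimal sketch (the function name is ours, and the inputs are the rounded probabilities quoted in the text):

```python
def bayes_factor(prior_prob, posterior_prob):
    """Bayes factor for a hypothesis versus its complement.

    Computed as posterior odds divided by prior odds, where
    odds = P(hypothesis) / (1 - P(hypothesis)).
    """
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    return posterior_odds / prior_odds

# Uniform prior on 0..10 gives P(N_lens > 0) = 10/11; posterior is about 0.97.
bf = bayes_factor(10.0 / 11.0, 0.97)
```

The prior odds are 10 and the posterior odds are about 32, so the ratio is a little above 3. Because the posterior probability is quoted to only two figures, the result is sensitive to rounding and should be read as "approximately 3" rather than a precise value.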

Despite weak evidence for the existence of substructure, when we examine the expected value of the empirical measure of substructure positions (Figure 9) we find no consistency in their positions, unlike for the simulated data (Figure 5). This situation is not uncommon. For example, Brewer & Donovan (2015) found evidence for a large number of Keplerian signals in a time series, but the number of such signals with well constrained properties was much lower. Related to this, we can compute posterior probabilities for any hypothesis about the substructure masses. With 95% probability the substructure mass is less than $5.80\times10^{12}$ solar masses and with 75% probability it is less than $3.56\times10^{12}$ solar masses. As the Einstein ring itself implies a mass of about $5\times10^{12}$ solar masses, these summaries are affected by the possibility of blobs outside the main Einstein ring.

As with any inference, the results presented here may be sensitive to many of the input modelling assumptions, and a slightly different (yet still reasonable) set of choices might yield different results. For example, if the density profile of the lens was smooth but not of the SIE form, and the data were informative enough to show this, the current model would only be able to explain the data by adding substructures. Hence, an alternative explanation for these results is that, rather than containing a substructure, the density profile of the lens is simply not within the SIE+γ family. In fact, some of the posterior samples obtained contain massive substructures outside of the image, which could be modelling higher-order effects of the environment beyond what is captured by the simple external shear model. This is one way of violating the SIE+γ assumption, and another is simply to have a different projected density profile. One way of further investigating this is to use a different family of smooth lens models and do model selection based on the marginal likelihoods. Another option, which we defer to future work, is to implement a more flexible model (such as a mixture of concentric elliptical lenses). This result is consistent with that of Dye et al. (2008), who found a slight preference for an elliptical power law profile over the SIE (which is a specific case of an elliptical power law).

Figure 7. The joint posterior distribution for the SIE mass (integrated within its critical ellipse, which is not the critical curve of the lens overall), and the total mass in substructures. The units are defined by the critical density, so that a mass of π units would have an Einstein radius of one arcsecond. To calculate the masses in solar masses, we assumed a flat cosmology with $\Omega_m = 0.3$, $\Omega_\Lambda = 0.7$, and $H_0 = 70$ km/s/Mpc. (Axes: SIE mass within critical ellipse, horizontal; total substructure mass, vertical; both in units of $10^{12}$ solar masses.)

Figure 8. The posterior distribution for the number of lens substructures in the Cosmic Horseshoe system. There is mild evidence in favour of the hypothesis that $N_{\rm lens} \neq 0$. (Axes: $N_{\rm lens}$, horizontal; number of posterior samples, vertical.)

Figure 9. The positions of substructures encountered in the MCMC sampling for the Cosmic Horseshoe; technically a Monte Carlo approximation to the posterior expected value of the empirical measure of substructure positions. Unlike Figure 5, for the Cosmic Horseshoe there is little consistency in the positions of the substructures.

Another potential inadequacy of our model is the assumption that the PSF is known. In practice, the PSF is often estimated from the image of nearby star(s), or from theoretical knowledge of the telescope (especially in the case of space based imaging). However, if the PSF is misspecified, or in fact varies across the image, this could induce subtle effects in the imaging which our current model could only explain using substructure.

The estimated marginal likelihood of our model is $\ln[p(D)] \approx -6967$, and the information (Kullback-Leibler divergence from the prior to the posterior) is $H \approx 129$ nats. Three example models (lens and source) sampled from the posterior are shown in Figure 10.

6.1 Reproducibility of the results

Figure 10. Three example sources (and the corresponding lensed, blurred images) representative of the posterior distribution. All three lens system scenarios have substructures, which cause subtle distortions from the usual shape of the caustics and critical curves (plotted as black curves). The angular scales are 3 arcseconds for the sources and 15 arcseconds for the images. (Panels: Source, Image.)

The “effective sample size” returned by DNS, which we have described as the number of posterior samples, takes into account the fact that DNS’s target distribution is not the posterior. However, it does not take autocorrelation into account, and thus can present an optimistic picture of the accuracy of any Monte Carlo approximations to posterior quantities. Most standard diagnostic techniques used in MCMC can be applied here. The simplest of these is a check of reproducibility. If different runs yield substantially different results, the MCMC output should be treated with caution. For example, in Figure 7, there is a correlation between the mass attributed to the SIE component and that attributed to substructures. However, the distribution also appears somewhat “lumpy” or multimodal. This may not be a feature of the actual posterior distribution, but could arise due to imperfect sampling. Whereas a standard Metropolis sampler would move around very slowly in the hypothesis space, DNS naturally spends a non-negligible fraction of the time sampling the prior. Therefore, a particle exploring the parameter space can “forget” its good fitting position, move somewhere completely different, and find another good fitting model in a different location. This is a natural feature of the algorithm (and is also present in related algorithms such as parallel tempering). Despite this, the patchy nature of Figure 7 suggests that posterior exploration is still challenging in this problem.
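One simple form of reproducibility check is to compare quantiles of a scalar summary (for example the total substructure mass) between two independent runs. The sketch below is only an illustration: the function name, the choice of quantile levels and the scaling by the pooled standard deviation are our assumptions, not a diagnostic used in the paper.

```python
import numpy as np

def quantile_discrepancy(run_a, run_b, probs=(0.05, 0.25, 0.5, 0.75, 0.95)):
    """Largest quantile difference between two runs, in units of the pooled spread.

    run_a, run_b : 1-D arrays of the same scalar summary, one value per
                   equally weighted posterior sample, from independent runs.
    """
    a = np.asarray(run_a, dtype=float)
    b = np.asarray(run_b, dtype=float)
    diff = np.max(np.abs(np.quantile(a, probs) - np.quantile(b, probs)))
    scale = np.std(np.concatenate([a, b]))
    if scale == 0.0:
        return 0.0 if diff == 0.0 else float("inf")
    return float(diff / scale)
```

A value near zero indicates that the two runs agree on the chosen quantiles; a value comparable to one means the runs disagree by about as much as the posterior is wide, in which case the output should be treated with caution. Any threshold between these is a judgment call.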

The results for the simulated data (Figure 3) were less complex because of the definite existence of one substructure. Its mass and position provide some information about the hyperparameters $\alpha_{\rm lens}$, restricting the probability that high mass substructures exist far from the image.

7 CONCLUSIONS

We have developed a trans-dimensional Bayesian approach motivated directly by the question of whether dark substructures exist in a lens galaxy, given image data. By making use of the Diffusive Nested Sampling algorithm (Brewer, Partay, & Csanyi 2011) and the RJObject library (Brewer 2014), we outsource the difficulties associated with choosing Metropolis proposals for a hierarchical model of non-fixed dimension. The model allows for source and lens “blobs” to appear as needed to explain the data. The prior for the blobs’ properties is specified hierarchically, to more realistically model sensible prior beliefs, and to tie the model parameters directly to questions of scientific interest such as the mass function of substructures.

As a proof of concept, we demonstrated the successful recovery of a single substructure from simulated data for which all of the model assumptions were true. We then applied the method to an image of the Cosmic Horseshoe system and found mild evidence in favour of the existence of a substructure, or at the very least, a departure from a simple SIE+γ lens profile. The main output of our method is a set of posterior samples, each representing a plausible scenario for the lens and the source given the data and the assumed prior information. These samples can be displayed as a movie, which is a very useful method for intuitively understanding the remaining uncertainty. Any posterior summaries of interest can be produced from the samples, but the interpretation of many of these summaries is not necessarily straightforward. We have suggested that the posterior expectation of the empirical measure of substructure positions is one of the most helpful summaries.

The model assumptions used in this paper are fairly simple. In future work we intend to generalize the model to multi-band data, and extend the smooth lens model beyond the overly simplistic SIE+γ assumption. Applications to other systems are also forthcoming, as well as variants on the model using different conditional priors for the substructure properties given the hyperparameters.

ACKNOWLEDGEMENTS

It is a pleasure to thank Phil Marshall (Stanford), Tommaso Treu (UCLA), Ross Fadely (NYU), and Alan Heavens (Edinburgh) for valuable discussion. Thomas Lumley (Auckland) also deserves credit for ending BJB’s usage of the jet colormap. This work was funded by a Marsden Fast Start grant from the Royal Society of New Zealand.

REFERENCES

Alsing J., Heavens A., Jaffe A. H., Kiessling A., Wandelt B., Hoffmann T., 2015, arXiv, arXiv:1505.07840

Belokurov V., et al., 2007, ApJ, 671, L9

Birrer S., Amara A., Refregier A., 2015, arXiv, arXiv:1504.07629

Brewer B. J., Lewis G. F., 2006, ApJ, 637, 608

Brewer B. J., Partay L. B., Csanyi G., 2011, Statistics and Computing, 21, 4, 649-656. arXiv:0912.2380

Brewer B. J., Lewis G. F., Belokurov V., Irwin M. J., Bridges T. J., Evans N. W., 2011, MNRAS, 412, 2521

Brewer B. J., 2014, preprint. arXiv:1411.3921

Brewer B. J., Donovan C. P., 2015, MNRAS, 448, 3206

Caticha A., 2008, Lectures on Probability, Entropy, and Statistical Physics. arXiv:0808.0012

Coles J. P., Read J. I., Saha P., 2014, MNRAS, 445, 2181

Dye S., Evans N. W., Belokurov V., Warren S. J., Hewett P., 2008, MNRAS, 388, 384

Fadely R., Keeton C. R., 2012, MNRAS, 419, 936

Green P. J., 1995, Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination, Biometrika 82 (4): 711–732

Grillo C., et al., 2015, ApJ, 800, 38

Huppenkothen D., et al., 2015, accepted for publication in ApJ. arXiv:1501.05251

Jones D. E., Kashyap V. L., van Dyk D. A., 2014, arXiv, arXiv:1411.7447

Koopmans L. V. E., 2005, MNRAS, 363, 1136

Kormann R., Schneider P., Bartelmann M., 1994, A&A, 284, 285

Millar R. B., 2011, Maximum likelihood estimation and inference: with examples in R, SAS and ADMB. Vol. 111. John Wiley & Sons

O’Hagan A., Forster J., 2004, Bayesian inference. London: Arnold

Pancoast A., Brewer B. J., Treu T., 2011, ApJ, 730, 139

Schneider M. D., Hogg D. W., Marshall P. J., Dawson W. A., Meyers J., Bard D. J., Lang D., 2015, ApJ, 807, 87

Sivia D. S., Skilling J., 2006, Data Analysis: A Bayesian Tutorial, 2nd Edition, Oxford University Press

Skilling J., 2006, “Nested Sampling for General Bayesian Computation”, Bayesian Analysis 4, pp. 833-860

Suyu S. H., Marshall P. J., Hobson M. P., Blandford R. D., 2006, MNRAS, 371, 983

Suyu S. H., et al., 2014, ApJ, 788, L35

Suyu S. H., et al., 2013, ApJ, 766, 70

Tagore A. S., Jackson N., 2015, arXiv, arXiv:1505.00198

Treu T., 2010, ARA&A, 48, 87

Umstatter R., Christensen N., Hendry M., Meyer R., Simha V., Veitch J., Vigeland S., Woan G., 2005, PhRvD, 72, 022001

Vegetti S., Koopmans L. V. E., 2009, MNRAS, 400, 1583

Vegetti S., Lagattuta D. J., McKean J. P., Auger M. W., Fassnacht C. D., Koopmans L. V. E., 2012, Natur, 481, 341

Vegetti S., Czoske O., Koopmans L. V. E., 2010, MNRAS, 407, 225

Vegetti S., Koopmans L. V. E., Bolton A., Treu T., Gavazzi R., 2010, MNRAS, 408, 1969

Vegetti S., Koopmans L. V. E., Auger M. W., Treu T., Bolton A. S., 2014, MNRAS, 442, 2017

Warren S. J., Dye S., 2003, ApJ, 590, 673
