
Astronomy & Astrophysics manuscript no. paper_main ©ESO 2021 September 16, 2021

Euclid: Constraining ensemble photometric redshift distributions with stacked spectroscopy⋆

M.S. Cagliari1,⋆⋆, B.R. Granett2,⋆⋆⋆, L. Guzzo1,2,3, M. Bolzonella4, L. Pozzetti4, I. Tutusaus5,6, S. Camera7,8,9, A. Amara10, N. Auricchio4, R. Bender11,12, C. Bodendorf12, D. Bonino9, E. Branchini13,14, M. Brescia15, V. Capobianco9, C. Carbone16, J. Carretero17, F.J. Castander5,6, M. Castellano18, S. Cavuoti15,19,20, A. Cimatti21,22, R. Cledassou23,24, G. Congedo25, C.J. Conselice26, L. Conversi27,28, Y. Copin29, L. Corcione9, M. Cropper30, H. Degaudenzi31, M. Douspis32, F. Dubath31, S. Dusini33, A. Ealet29, S. Ferriol29, N. Fourmanoit34, M. Frailis35, E. Franceschi4, P. Franzetti16, B. Garilli16, C. Giocoli36,37, A. Grazian33, F. Grupp11,12, S.V.H. Haugan38, H. Hoekstra39, W. Holmes40, F. Hormuth41,42, P. Hudelot43, K. Jahnke42, S. Kermiche44, A. Kiessling40, M. Kilbinger45, T. Kitching30, M. Kümmel11, M. Kunz46, H. Kurki-Suonio47, S. Ligori9, P.B. Lilje38, I. Lloro48, E. Maiorano4, O. Mansutti35, O. Marggraf49, K. Markovic40, R. Massey50, M. Meneghetti4,51,52, E. Merlin18, G. Meylan53, M. Moresco4,21, L. Moscardini4,21,52, S.M. Niemi54, C. Padilla17, S. Paltani31, F. Pasian35, K. Pedersen55, W.J. Percival56,57,58, V. Pettorino45, S. Pires45, M. Poncet24, L. Popa59, F. Raison12, R. Rebolo60,61, J. Rhodes40, H.-W. Rix42, M. Roncarelli4,21, E. Rossetti21, R. Saglia11,12, R. Scaramella18,62, P. Schneider49, M. Scodeggio16, A. Secroun44, G. Seidel42, S. Serrano5,6, C. Sirignano63,64, G. Sirri52, D. Tavagnacco35, A.N. Taylor25, I. Tereno65,66, R. Toledo-Moreo67, E.A. Valentijn68, L. Valenziano4,52, Y. Wang69, N. Welikala25, J. Weller11,12, G. Zamorani4, J. Zoubian44, M. Baldi4,21,52, R. Farinelli70, E. Medinaceli4, S. Mei71, G. Polenta72, E. Romelli35, T. Vassallo11, A. Humphrey73

(Affiliations can be found after the references)

ABSTRACT

Context. The ESA Euclid mission will produce photometric galaxy samples over 15 000 square degrees of the sky that will be rich for clustering and weak lensing statistics. The accuracy of the cosmological constraints derived from these measurements will depend on the knowledge of the underlying redshift distributions based on photometric redshift calibrations.
Aims. A new approach is proposed to use the stacked spectra from Euclid slitless spectroscopy to augment the broad-band photometric information to constrain the redshift distribution with spectral energy distribution fitting. The high spectral resolution available in the stacked spectra complements the photometry and helps to break the colour-redshift degeneracy and constrain the redshift distribution of galaxy samples.
Methods. We model the stacked spectra as a linear mixture of spectral templates. The mixture may be inverted to infer the underlying redshift distribution using constrained regression algorithms. We demonstrate the method on simulated Vera C. Rubin Observatory and Euclid mock survey data sets based on the Euclid Flagship mock galaxy catalogue. We assess the accuracy of the reconstruction by considering the inference of the baryon acoustic scale from angular two-point correlation function measurements.
Results. We select mock photometric galaxy samples at redshift z > 1 using the self-organizing map algorithm. Considering the idealized case without dust attenuation, we find that the redshift distributions of these samples can be recovered with 0.5% accuracy on the baryon acoustic scale. The estimates are not significantly degraded by the spectroscopic measurement noise due to the large sample size. However, the error degrades to 2% when the dust attenuation model is left free. We find that the colour degeneracies introduced by attenuation limit the accuracy considering the wavelength coverage of the Euclid near-infrared spectroscopy.

Key words. method: data analysis – method: statistical – galaxies: distances and redshifts – large-scale structure of Universe

1. Introduction

The next generation of photometric surveys will produce unprecedented galaxy statistics that will fuel large-scale structure studies (LSST Science Collaboration et al. 2009; Laureijs et al. 2011; Benitez et al. 2014; Dark Energy Survey Collaboration et al. 2016). Compared with their spectroscopic counterparts (Le Fèvre et al. 2005; Driver et al. 2011; Guzzo et al. 2014; DESI Collaboration et al. 2016), photometric surveys go deeper and faster; however, the surveying efficiency comes at the cost of spectral resolution. Imaging surveys are limited to photometric measurements such as broadband colours to infer the redshifts of galaxies (Connolly et al. 1995; Bolzonella et al. 2000; Benítez 2000). The minimum error in a photometric redshift estimate with optical and near-infrared broadband photometry is σz/(1 + z) ∼ 0.02 due to fundamental degeneracies in the colour-redshift parameter space (Salvato et al. 2019). Nevertheless, with the promise of large sample sizes this precision is often acceptable for large-scale structure studies based on galaxy clustering and weak lensing analyses. These analyses do not require the redshifts of individual galaxies; instead, precise knowledge of the redshift distribution of the sample is needed to properly interpret the statistics. A systematic error in the redshift distribution estimate propagates directly to biases in the results (Newman et al. 2015).

⋆ This paper is published on behalf of the Euclid Consortium.
⋆⋆ e-mail: [email protected]
⋆⋆⋆ e-mail: [email protected]

Article number, page 1 of 15

arXiv:2109.07303v1 [astro-ph.CO] 15 Sep 2021

[Figure 1. Left panel: Y − J [mag] versus redshift z for the SB and Ell templates. Right panel: normalised flux versus wavelength [Å], with NISP Y and NISP J transmission curves; curves labelled "Ell at z ∼ 1.7", "SB at z ∼ 1.9" and "SB at z ∼ 2.5".]

Fig. 1. Left: To illustrate the method we show a colour degeneracy in Y − J as a function of redshift for starburst (SB) and elliptical (Ell) galaxy templates from Ilbert et al. (2009). The dashed horizontal line indicates Y − J = 0.8 and shows three redshift solutions consistent with the colour: elliptical galaxies at z ∼ 1.7, and starburst galaxies at z ∼ 1.9 and z ∼ 2.5. Right: Each solution gives a unique spectral shape in the near-infrared range probed by the Euclid NISP instrument (red shaded area). The red line shows the elliptical template at z ∼ 1.7, and the blue and green lines show the starburst template at z ∼ 1.9 and 2.5. The spectra are normalised at the effective wavelength of the Y NISP filter. The Euclid NISP Y and J filter transmission curves are overplotted. The spectral resolution of the plotted templates is lower than that of the Euclid NISP spectrograph. The stacked spectrum at fixed colour is built from the linear combination of these templates and encodes enough information to recover the relative contributions of each spectral type at each redshift.

The ensemble redshift distribution of a photometrically selected galaxy sample can be measured directly by targeting a representative subsample with spectroscopy. The complete calibration of the colour-redshift relation (C3R2) campaign is currently underway, using 8 m class telescopes to construct a calibration dataset for Euclid¹ (Masters et al. 2019; Euclid Collaboration et al. 2020; Stanford et al. 2021). It is challenging to build fully representative spectroscopic samples, particularly at the faint end, at both low and high redshift. Past surveys had to include corrections for incompleteness in the spectroscopic measurements (Lima et al. 2008; Hartley et al. 2020), and these corrections depend on a complex set of parameters related to the observing conditions and intrinsic galaxy properties (Scodeggio et al. 2018). An alternative solution, the clustering redshift estimator, uses the signal encoded in the spatial correlation between a photometric sample and reference spectroscopic samples to infer the redshift distribution of the photometric sample (Schneider et al. 2006; Newman 2008; Schmidt et al. 2013; Scottez et al. 2016). This approach is expected to be successful when applied to the Euclid data set. Each method to calibrate photometric redshift distributions comes with its own assumptions and sources of systematic errors; therefore, it is worthwhile to develop complementary methods that can provide robust cross-checks. We focus here on an approach that will be enabled by the rich data set provided by Euclid's slitless spectroscopy.

Slitless spectroscopy provides a unique tool since a measurement of spectral flux can be extracted for every source detected in imaging (e.g. Zwicky 1941; Momcheva et al. 2016). The ESA Euclid mission will be the first modern all-sky survey program to employ a slitless spectrograph (Laureijs et al. 2011); the SPHEREx (Doré et al. 2018) and NASA Nancy Grace Roman (Akeson et al. 2019) missions will follow. By design, the Euclid spectroscopy will detect and measure the redshifts of the brightest emission-line galaxies, primarily using the Hα line in the redshift range 0.9 < z < 1.8. The majority of photometrically detected sources will be fainter and give only a very low signal-to-noise ratio spectrum, precluding a direct redshift measurement. However, by stacking the spectra we can extract physical information from the ensemble and augment photometric studies.

1 http://www.euclid-ec.org

The ensemble photometric redshift method proposed by Padmanabhan et al. (2019) aims to constrain the redshift distribution of a photometrically selected galaxy sample by using the stacked spectrum built from the average of many low signal-to-noise ratio spectra. Since broadband photometric measurements have coarse wavelength resolution, galaxies with different spectral types at different redshifts can have degenerate colours. These degeneracies lead to catastrophic photometric redshift errors, which are characterised by multiple peaks and long tails in the redshift distribution. Adding information from stacked spectroscopy can break these degeneracies since spectral features leave their signature in the stack. The spectra cannot be used to infer the redshift of individual sources due to the low signal-to-noise ratio of the measurements, but the ensemble can be used to infer the redshift distribution. The stacked spectrum will be a mixture of galaxy spectral types at different redshifts, and with template fitting a unique decomposition may be found to recover the redshift distribution.

In this work we implement and test the approach on a mock galaxy survey considering ground-based photometry from the Vera C. Rubin Observatory and near-infrared photometry and slitless spectroscopy from Euclid. We select galaxy samples based on the photometry and infer the redshift distributions using the combination of stacked spectroscopy and stacked photometry. The Euclid near-infrared spectrograph (NISP) has a wavelength range 1.25–1.85 µm; therefore, it will measure the rest-frame spectral energy distribution (SED) at λ < 9000 Å for galaxies at z > 1. Thus the spectroscopy can augment the near-infrared photometry by adding continuum shape information in the spectral range of the 4000 Å break, which is a key feature for redshift estimation. The redshift distributions inferred from broadband photometry alone are generally not accurate because of the dependence on the template priors (Benítez 2000). However, the joint fit of spectroscopy and photometry proves to be a powerful tool for extracting redshift distributions: broadband photometry provides a broad wavelength coverage, and spectroscopy gives higher spectral resolution that can break colour degeneracies. We demonstrate this in the case of Euclid in Appendix A. We quantify the accuracy of the constraints by considering the inference of the baryon acoustic oscillation (BAO) scale from angular two-point correlation function measurements. The BAO scale is not the only feature that will be measured in photometric galaxy clustering analyses; the full shape of the galaxy power spectrum encodes relevant information for cosmological studies. However, we can consider that the uncertainty on the BAO scale provides a lower limit on the information contained in the power spectrum and thus is a useful metric for quantifying systematic errors. This metric is also applicable to weak lensing studies that require determinations of the mean redshift of tomographic bins.
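The rest-frame coverage quoted above follows from λ_rest = λ_obs/(1 + z). The short Python sketch below is an illustration only, not part of the paper's pipeline: it maps the quoted 1.25–1.85 µm NISP range to the rest frame. With just these band edges as input, the rest-frame 4000 Å point itself falls inside the window only for z ≳ 2.1 (12 500/4000 − 1 ≈ 2.1), while at lower redshift the spectra sample the continuum redward of it.

```python
# Rest-frame coverage of an observed-frame wavelength window:
# lambda_rest = lambda_obs / (1 + z). The band edges are the 1.25-1.85 um
# NISP range quoted in the text; everything else is illustrative.
NISP_MIN_A = 12500.0  # lower edge [Angstrom]
NISP_MAX_A = 18500.0  # upper edge [Angstrom]


def rest_frame_range(z, lam_min=NISP_MIN_A, lam_max=NISP_MAX_A):
    """Return the (min, max) rest-frame wavelengths [Angstrom] observed at redshift z."""
    if z < 0:
        raise ValueError("redshift must be non-negative")
    return lam_min / (1.0 + z), lam_max / (1.0 + z)


def covers(z, lam_rest):
    """True if the rest-frame wavelength lam_rest lies inside the window at redshift z."""
    lo, hi = rest_frame_range(z)
    return lo <= lam_rest <= hi


if __name__ == "__main__":
    for z in (1.0, 1.5, 2.0, 2.3):
        lo, hi = rest_frame_range(z)
        print(f"z = {z:.1f}: {lo:6.0f}-{hi:6.0f} A, contains 4000 A: {covers(z, 4000.0)}")
```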

This paper is organized as follows. In Sect. 2, we present the ensemble photometric redshift method and describe how we reduce the formal problem of finding the redshift distribution of a group of galaxies with similar colours to a linear problem. Then, in Sect. 3, we describe the construction of the mock catalogues used to test the method (Sect. 3.1). In this section we also discuss the SED templates used to fit the redshift distributions, how the galaxies are partitioned into colour groups, and the quantitative benchmark for the redshift distribution estimates based on the measurement of the BAO scale. In Sect. 4, we present the results of the analyses of both ideal noiseless spectroscopic data and realistic cases with noise. Finally, in Sect. 5 we summarize our results and discuss the applications and possible improvements that may be made.

2. The ensemble photometric redshift method

2.1. Method

The distribution of galaxies in colour-redshift space can be constrained by adding information from stacked spectroscopy. This is illustrated in Fig. 1. The left panel shows a three-fold degeneracy in colour at Y − J = 0.8 for starburst and elliptical galaxy spectral types. This colour can correspond to a population of elliptical galaxies at z ∼ 1.7, starburst galaxies either at z ∼ 1.9 or z ∼ 2.5, or to different mixes of these populations. In the right panel we show that these galaxy populations have unique spectral shapes, and so the stacked spectrum encodes information about the distribution of redshifts and the mixture of galaxy types. We now consider the general case with many photometric colour measurements and describe how the information in the stacked spectroscopy can be extracted.

The normalised stacked spectrum (hereafter stacked spectrum) of a sample of galaxies with similar colours (hereafter a colour group) is defined as the weighted mean of the individual spectral flux measurements,

$$ f^{\rm obs}_{\rm stack}(\lambda) = \frac{1}{N_{\rm gal}} \sum_{i=1}^{N_{\rm gal}} \frac{1}{\bar{f}_i}\, f_i(\lambda) , \qquad (1) $$

where i indexes the galaxies, f(λ) is the measured galaxy flux as a function of wavelength, and $\bar{f}$ is the integrated flux used for normalisation; the normalised stacked spectrum is therefore expressed in units Hz−1. The calculation of the integrated flux will be discussed in Sect. 2.3. In the analysis we will generalize f(λ) to also include broadband photometric measurements (see Sect. 2.4). The observed flux spectrum can be written in terms of the rest-frame SED of the galaxy,

$$ f(\lambda) = a\, T(\lambda \,|\, z) , \qquad (2) $$

that is, as the product of a flux normalisation, a, and a rest-frame template transformed to redshift z, T(λ | z).
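Eq. (1) is a plain average of per-galaxy spectra after dividing each by its normalising flux. The NumPy sketch below assumes the spectra are already resampled onto a common wavelength grid (an N_gal × N_λ array) and that the normalising fluxes are given; the function name `stack_spectra` is ours. Combined with Eq. (2), it shows that galaxies sharing one template but with different amplitudes a contribute identically after normalisation.

```python
import numpy as np


def stack_spectra(fluxes, norms):
    """Normalised stacked spectrum of Eq. (1).

    fluxes : (N_gal, N_lambda) array of spectra f_i(lambda) on a common grid
    norms  : (N_gal,) array of integrated (normalising) fluxes, one per galaxy
    """
    fluxes = np.asarray(fluxes, dtype=float)
    norms = np.asarray(norms, dtype=float)
    if fluxes.ndim != 2 or norms.shape != (fluxes.shape[0],):
        raise ValueError("expected fluxes (N_gal, N_lambda) and norms (N_gal,)")
    if np.any(norms <= 0):
        raise ValueError("normalising fluxes must be positive")
    # Divide each spectrum by its own normalisation, then average over galaxies.
    return (fluxes / norms[:, None]).mean(axis=0)


if __name__ == "__main__":
    template = np.array([1.0, 2.0, 3.0])      # toy T(lambda | z)
    amplitudes = np.array([2.0, 5.0])         # toy flux normalisations a
    spectra = amplitudes[:, None] * template  # Eq. (2): f = a T
    # Toy normalisation: the pixel sum stands in for the integrated flux.
    print(stack_spectra(spectra, spectra.sum(axis=1)))  # equals template / template.sum()
```

In the paper the normalisation comes from broadband photometry (Sect. 2.3) rather than from the spectra themselves; the pixel sum here is only a stand-in.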

Galaxy SEDs can be modeled by a finite set of parameters (e.g. Marchetti et al. 2013). Therefore, the expression for the stacked spectrum can be rewritten as a sum over discrete SEDs indexed by α and weighted by their frequency as a function of redshift, pα(z),

$$ f^{\rm model}_{\rm stack}(\lambda) = \sum_{\alpha=1}^{N_{\rm SED}} \int {\rm d}z\, p_\alpha(z)\, T_\alpha(\lambda \,|\, z) . \qquad (3) $$

The template normalisation required to equate $f^{\rm obs}_{\rm stack}(\lambda) = f^{\rm model}_{\rm stack}(\lambda)$ will be discussed in Sect. 2.3.

In order to carry out the numerical analysis we discretise the expression over a regular grid of redshift,

$$ f^{\rm model}_{\rm stack}(\lambda) = \sum_{\alpha=1}^{N_{\rm SED}} \sum_{z=z_{\rm min}}^{z_{\rm max}} p_{\alpha,z}\, T_{\alpha,z}(\lambda) , \qquad (4) $$

where NSED is the number of templates, and zmin and zmax are the minimum and maximum redshift over which to evaluate the redshift distribution. The overall redshift distribution is therefore given by the summation over templates,

$$ p_z \propto \sum_{\alpha=1}^{N_{\rm SED}} p_{\alpha,z} . \qquad (5) $$

In principle the redshift distribution can be found by substituting the observed flux $f^{\rm obs}_{\rm stack}$ for $f^{\rm model}_{\rm stack}$ in Eq. (4) and solving for the coefficients pα,z.
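Flattening the (α, z) pair into a single index turns Eq. (4) into a matrix-vector product, and Eq. (5) into a sum over the template axis. The NumPy sketch below assumes the redshifted, normalised templates are stored as an N_SED × N_z × N_λ array; the function names and the matrix `A` are illustrative. Here p_z is normalised to unit sum because Eq. (5) fixes it only up to a constant.

```python
import numpy as np


def model_stack(p, templates):
    """Eq. (4): f_model(lambda) = sum_alpha sum_z p[alpha, z] * T[alpha, z, lambda]."""
    return np.einsum("az,azl->l", np.asarray(p, dtype=float),
                     np.asarray(templates, dtype=float))


def redshift_distribution(p):
    """Eq. (5): p_z proportional to sum_alpha p[alpha, z]; normalised to unit sum."""
    pz = np.asarray(p, dtype=float).sum(axis=0)
    total = pz.sum()
    if total <= 0:
        raise ValueError("p must contain positive weight")
    return pz / total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_sed, n_z, n_lam = 3, 5, 7
    p = rng.random((n_sed, n_z))
    templates = rng.random((n_sed, n_z, n_lam))
    # Same model as a matrix product: column k of A is the template for the
    # k-th flattened (alpha, z) pair, so A has shape (N_lambda, N_SED * N_z).
    A = templates.reshape(n_sed * n_z, n_lam).T
    assert np.allclose(A @ p.ravel(), model_stack(p, templates))
    print(redshift_distribution(p))
```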

2.2. Machine implementation

Equation (4) describes a linear set of equations that can be written in matrix notation as

$$ \mathbf{f} = \mathbf{T}\,\mathbf{p} , \qquad (6) $$

where f is the spectral flux data vector with Nλ elements. The matrix T is constructed from NSED SED templates, each shifted to Nz redshifts; therefore T has dimension (NSED Nz) × Nλ. The redshift distribution of each template is encoded in the vector p, which has length NSED Nz. Since the product of templates and redshifts NSED Nz is much greater than the number of spectral elements Nλ, the system is under-constrained and does not have a unique solution.

We can make progress in solving Eq. (6) by using a linear regression algorithm that employs regularization terms to identify the most interesting solutions. We add two pieces of information: first, we are not interested in unphysical solutions with negative pα,z; and second, galaxy spectra are well fit by a small number of SED templates. These two considerations lead us to impose a non-negativity constraint and to use a shrinkage estimator² to find the minimum set of templates that can fit the stacked spectra.

We test three linear regression methods with non-negativity constraints:

1. the non-negative least squares method (NNLS, Lawson & Hanson 1987);

2. the least absolute shrinkage and selection operator (LASSO, Tibshirani 1996);

3. the elastic net regularization (ElasticNet, Zou & Hastie 2005).

2 In statistics, shrinkage is a process that aims to reduce overfitting and the effect of sampling variation. It can be implemented with the addition of penalties to the cost function of interest.

All three methods minimize a cost function of the form

$$ \min_{\mathbf{p} \geq 0} \left[ \frac{1}{N} \sum_i \Big( f_i - \sum_j T_{ji}\, p_j \Big)^2 + Q(\mathbf{p}) \right] , \qquad (7) $$

but adopt different penalty functions Q. The penalty function acts to reward solutions that use fewer templates. LASSO adds the ℓ1 penalty of the form $Q(\mathbf{p}, \alpha) = \alpha \lVert \mathbf{p} \rVert_1$, and ElasticNet uses $Q(\mathbf{p}, \alpha, \beta) = \alpha \left[ \beta \lVert \mathbf{p} \rVert_1 + 0.5\,(1 - \beta)\, \lVert \mathbf{p} \rVert_2^2 \right]$. The variables α and β are free parameters that must be chosen in the analysis (see Sect. 3.5). NNLS is a variant of the standard least squares solver and does not introduce a penalty term. In this work we used the implementation of the NNLS algorithm in the Python SciPy library, optimize.nnls (Virtanen et al. 2020). For LASSO and ElasticNet we used the implementations found in the Scikit-learn library (Pedregosa et al. 2011).
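The sketch below applies `scipy.optimize.nnls` to a toy, under-determined version of Eq. (6). The two "spectral types" are hypothetical smoothed continuum breaks, not the COSMOS templates, and the pixel and redshift grids are chosen only so that N_SED N_z = 60 exceeds N_λ = 40. Note that `nnls` expects one row per data point, i.e. a matrix of shape N_λ × (N_SED N_z), which is the transpose of the layout quoted for T in Sect. 2.2. Because the toy system is under-determined, the recovered p reproduces the stack but need not equal the input weights.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical toy set-up (not the paper's templates or NISP sampling).
wave = np.linspace(12500.0, 18500.0, 40)     # N_lambda = 40 observed-frame pixels [A]
z_grid = np.linspace(1.5, 3.5, 30)           # N_z = 30 redshift bins
sed_types = [(4000.0, -1.0), (3650.0, 0.5)]  # (break [A], power-law slope); N_SED = 2


def toy_template(wave, z, break_rest, slope):
    """Smoothed continuum step at break_rest * (1 + z) times a power law."""
    step = 1.0 / (1.0 + np.exp(-(wave - break_rest * (1.0 + z)) / 150.0))
    return (0.3 + step) * (wave / 15000.0) ** slope


# Template cube T[alpha, z, lambda], flattened into the (N_lambda, N_SED * N_z)
# design matrix expected by nnls.
cube = np.array([[toy_template(wave, z, b, s) for z in z_grid] for b, s in sed_types])
A = cube.reshape(-1, wave.size).T

# Noiseless "stack" built from two populations at different redshifts.
p_true = np.zeros(A.shape[1])
p_true[5] = 0.7                   # type 0 at z_grid[5]
p_true[z_grid.size + 20] = 0.3    # type 1 at z_grid[20]
f_stack = A @ p_true

p_hat, rnorm = nnls(A, f_stack)   # minimises ||A p - f|| subject to p >= 0
pz = p_hat.reshape(len(sed_types), z_grid.size).sum(axis=0)  # Eq. (5), unnormalised

if __name__ == "__main__":
    print("residual norm:", rnorm)
    print("z bins with weight:", z_grid[pz > 1e-6])
```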

The attractiveness of the ensemble photometric redshift method as presented here comes from its ability to infer the underlying distribution using only a chosen template set and no additional information. However, adding physical priors, e.g. the galaxy luminosity function or galaxy type-redshift distributions, may improve the performance of the method. Given how the problem was reduced to a set of linear equations (Eq. 6), incorporating physical priors is not a trivial task. Possibly the most straightforward way to do so is to rewrite the problem in terms of likelihood maximisation in a Bayesian framework. The likelihood could be sampled in the parameter space via a Markov chain Monte Carlo (MCMC) algorithm. This approach could have a high computational cost since the parameter space is very large.
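The penalised cost of Eq. (7) can also be minimised with a few lines of NumPy. The sketch below uses projected proximal-gradient descent (ISTA), a generic solver substituted here for illustration; it is not the scikit-learn coordinate-descent implementation used in the paper. Setting β = 1 gives the non-negative LASSO penalty. In scikit-learn the corresponding estimators are `Lasso` and `ElasticNet` with `positive=True` and `fit_intercept=False`, whose `l1_ratio` plays the role of β. Scikit-learn scales the squared-error term by 1/(2N) rather than the 1/N written in Eq. (7), so its `alpha` corresponds to α/2 here. The toy design matrix is random and hypothetical.

```python
import numpy as np


def cost_eq7(p, A, f, alpha, beta):
    """Eq. (7) with the ElasticNet penalty; beta = 1 reduces to the LASSO penalty."""
    resid = f - A @ p
    penalty = alpha * (beta * np.abs(p).sum() + 0.5 * (1.0 - beta) * (p @ p))
    return resid @ resid / f.size + penalty


def nonneg_elastic_net(A, f, alpha, beta, n_iter=5000):
    """Minimise Eq. (7) subject to p >= 0 by projected proximal gradient (ISTA).

    A : (N_lambda, N_params) design matrix; f : (N_lambda,) stacked data vector.
    """
    n = f.size
    # Lipschitz constant of the gradient of the smooth (quadratic) part.
    lip = 2.0 * np.linalg.norm(A, 2) ** 2 / n + alpha * (1.0 - beta)
    step = 1.0 / lip
    p = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = -2.0 / n * (A.T @ (f - A @ p)) + alpha * (1.0 - beta) * p
        # Proximal step for alpha*beta*||p||_1 together with p >= 0:
        # shift by step*alpha*beta, then clip negative values to zero.
        p = np.maximum(p - step * (grad + alpha * beta), 0.0)
    return p


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    A = rng.random((30, 50))                 # hypothetical toy design matrix
    p_true = np.zeros(50)
    p_true[[4, 17, 33]] = [0.5, 0.3, 0.2]
    f = A @ p_true
    for alpha in (1e-4, 1e-2, 1e-1):
        p_hat = nonneg_elastic_net(A, f, alpha=alpha, beta=0.9)
        print(f"alpha={alpha:g}: {np.count_nonzero(p_hat)} non-zero coefficients, "
              f"cost={cost_eq7(p_hat, A, f, alpha, 0.9):.4g}")
```

Increasing α trades goodness of fit for sparsity, which is the role the text assigns to the penalty term.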

2.3. Normalisation

The normalisation of the spectra is important in the stacking process (Eq. 1) to standardize the contribution from the faintest and brightest sources. We chose to normalise the galaxy spectroscopy by the integrated flux. However, since the measured spectra are very noisy, the integration cannot be carried out on the spectra themselves. Instead, we use the broadband photometry to set the normalisation. The photometry is typically deeper than the spectroscopy and so gives a robust normalisation.

In this analysis we use the near-infrared photometry in the Y, J and H bandpasses that will be measured by the Euclid NISP instrument. That is, the integrated flux used to weight the measured spectra in Eq. (1) is given by

$$ \bar{f} = f_Y + f_J + f_H , \qquad (8) $$

where fY, fJ and fH represent the measured photometric flux in the Y, J and H bandpasses.

The SED templates, Tα,z(λ), are normalised in the same way by computing the integrated flux in the three NISP bandpasses and summing them. The flux integrated over a bandpass response function R(λ) is

$$ f_R = \frac{\int R(\lambda)\, f_\lambda(\lambda)\, \frac{\lambda}{hc}\, {\rm d}\lambda}{\int R(\lambda)\, {\rm d}\lambda} , \qquad (9) $$

where fλ(λ) is the spectral flux in units erg cm−2 s−1 Å−1 and hc is the product of Planck's constant and the speed of light.
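A NumPy sketch of Eqs. (8) and (9) follows, assuming the SED and the response curves are tabulated on a common wavelength grid in ångström. The trapezoidal rule is written out explicitly so the sketch does not depend on NumPy-version differences. The top-hat bandpass in the usage example is hypothetical, not the NISP transmission.

```python
import numpy as np

H_ERG_S = 6.62607015e-27     # Planck constant [erg s]
C_A_PER_S = 2.99792458e18    # speed of light [Angstrom s^-1]
HC = H_ERG_S * C_A_PER_S     # [erg Angstrom]


def trapezoid(y, x):
    """Trapezoidal-rule integral of y(x)."""
    return 0.5 * np.sum((y[1:] + y[:-1]) * np.diff(x))


def band_flux(wave, f_lambda, response):
    """Eq. (9): f_R = int R f_lambda (lambda / hc) dlambda / int R dlambda.

    wave [Angstrom], f_lambda [erg cm^-2 s^-1 Angstrom^-1], response R(lambda),
    all sampled on the same grid.
    """
    numerator = trapezoid(response * f_lambda * wave / HC, wave)
    return numerator / trapezoid(response, wave)


def integrated_flux(wave, f_lambda, responses):
    """Eq. (8): sum of the band fluxes, e.g. responses = (R_Y, R_J, R_H)."""
    return sum(band_flux(wave, f_lambda, r) for r in responses)


if __name__ == "__main__":
    wave = np.linspace(10000.0, 20000.0, 1001)
    flat = np.full_like(wave, 1e-17)   # flat f_lambda
    top_hat = np.ones_like(wave)       # hypothetical top-hat bandpass
    # For a flat spectrum and a top-hat band, Eq. (9) gives f_lambda * <lambda> / hc.
    print(band_flux(wave, flat, top_hat), 1e-17 * 15000.0 / HC)
```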

2.4. Combining photometry and spectroscopy

The extension of the observed stacked spectrum f with photometry is straightforward. We generalize the spectroscopic wavelength λ in Eq. (1) so that it also refers to photometric bands. The first part of the data vector f will contain the actual stacked spectrum, while its last Nb elements, where Nb is the number of photometric filters, will be the observed stacked photometry in each filter. The photometric data are stacked following Eq. (1) in the same way as the spectra and have the same normalisation (Eq. 8). Therefore, the extended stacked spectrum f is a vector with Nλ + Nb components.

In order to extend the template matrix with photometry we compute the photometric fluxes in the bandpasses of interest with Eq. (9) for each one of the NSED Nz templates in the matrix. These fluxes are normalised in the same way as the SED templates. The dimension of the template matrix T becomes (NSED Nz) × (Nλ + Nb); note that its columns have the same order as the elements of the extended stacked spectrum.
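A minimal sketch of the concatenation described above, assuming the template spectra are stored with one row per (α, z) pair and one column per wavelength, the (NSED Nz) × Nλ layout given in Sect. 2.2. The function name is illustrative.

```python
import numpy as np


def extend_with_photometry(stack_spec, stack_phot, T_spec, T_phot):
    """Append N_b photometric points to the stacked spectrum and template matrix.

    stack_spec : (N_lambda,) stacked spectrum; stack_phot : (N_b,) stacked photometry
    T_spec     : (N_SED*N_z, N_lambda) template spectra
    T_phot     : (N_SED*N_z, N_b) template band fluxes, normalised like T_spec
    Returns f of length N_lambda + N_b and T of shape (N_SED*N_z, N_lambda + N_b),
    with the column order of T matching the element order of f.
    """
    stack_spec, stack_phot = np.asarray(stack_spec), np.asarray(stack_phot)
    T_spec, T_phot = np.asarray(T_spec), np.asarray(T_phot)
    if T_spec.shape[1] != stack_spec.size or T_phot.shape[1] != stack_phot.size:
        raise ValueError("template columns must match the data they model")
    f = np.concatenate([stack_spec, stack_phot])
    T = np.hstack([T_spec, T_phot])  # also checks that the row counts agree
    return f, T


if __name__ == "__main__":
    n_par, n_lam, n_b = 6, 10, 9   # e.g. 9 bands: u g r i z y (Rubin) + Y J H (Euclid)
    rng = np.random.default_rng(3)
    f, T = extend_with_photometry(rng.random(n_lam), rng.random(n_b),
                                  rng.random((n_par, n_lam)), rng.random((n_par, n_b)))
    print(f.shape, T.shape)  # (19,) (6, 19)
```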

Lastly, in this work we do not weight the data with their observational errors. This choice was dictated by the large number of galaxies in the colour groups: there are so many galaxies in a colour group that the noise in the stacked spectrum becomes negligible. This is seen in the right panel of Fig. 2, which illustrates the spectral stack with 2 × 10⁶ galaxies. However, an inverse-error weighting may be applied to improve the performance in the analysis of less populous colour groups. The weights may be defined using the variance of the stacked spectrum,

$$ \sigma^2_{\rm stack}(\lambda) = \frac{1}{N_{\rm gal}^2} \sum_{i=1}^{N_{\rm gal}} \frac{1}{\bar{f}_i^{\,2}}\, \sigma_i^2(\lambda) , \qquad (10) $$

where σ(λ) is the observed galaxy flux error as a function of wavelength. The stacked standard deviation is the square root of the stacked variance, and its inverse can be used to weight the stacked spectrum and the columns of the template matrix. If the data are weighted following this recipe, the computation of p remains a linear problem with the form of Eq. (6), the only difference being that we substitute the stacked spectrum and the template matrix with their weighted counterparts.
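Eq. (10) and the weighting recipe can be sketched as follows, assuming per-galaxy flux errors σ_i(λ) on the common grid and the same normalising fluxes as in Eq. (1). The template matrix uses the (NSED Nz) × Nλ layout, so each wavelength column is divided by σ_stack(λ). Function names are illustrative.

```python
import numpy as np


def stack_variance(sigmas, norms):
    """Eq. (10): sigma_stack^2 = (1 / N_gal^2) * sum_i sigma_i^2 / norm_i^2.

    sigmas : (N_gal, N_lambda) flux errors; norms : (N_gal,) normalising fluxes.
    """
    sigmas = np.asarray(sigmas, dtype=float)
    norms = np.asarray(norms, dtype=float)
    n_gal = sigmas.shape[0]
    return (sigmas**2 / norms[:, None] ** 2).sum(axis=0) / n_gal**2


def weight_system(f_stack, T, var_stack):
    """Divide the data vector and each wavelength column of T by sigma_stack.

    The weighted pair can be passed to the same solvers as before, since the
    problem keeps the linear form of Eq. (6).
    """
    inv_sigma = 1.0 / np.sqrt(np.asarray(var_stack, dtype=float))
    return np.asarray(f_stack, dtype=float) * inv_sigma, np.asarray(T, dtype=float) * inv_sigma[None, :]


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    sig = rng.uniform(0.5, 2.0, size=(1000, 5))   # toy per-galaxy errors
    norms = rng.uniform(1.0, 3.0, size=1000)      # toy normalising fluxes
    var = stack_variance(sig, norms)
    # The stacked variance falls roughly as 1 / N_gal, which is why large
    # colour groups can be left unweighted.
    print(np.sqrt(var))
```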

3. Application to mock Euclid data

3.1. Survey simulation

We synthesize mock spectroscopic and photometric observations representative of the Euclid survey to validate the ensemble photometric redshift method. We base the simulations on the Euclid Flagship mock galaxy catalogue v1.8.4³. The Euclid Flagship simulation is a dark matter N-body simulation with a box size of 3780 h−1 Mpc and a particle mass of 2.4 × 10⁹ M⊙ (Potter et al. 2017).

The cosmic web of dark matter halos in the Flagship simulation was populated with galaxies using an extended halo occupation distribution model (Carretero et al. 2015; Crocce et al. 2015) by the SciPIC collaboration (Carretero et al. 2017; Tallada et al. 2020), and a full-sky light cone was produced spanning the redshift range from 0 to 2.3. Galaxy properties, including the SEDs and broadband magnitudes, were assigned to match the luminosity function and galaxy clustering measurements at z = 0.1 and extrapolated to higher redshift.

3 https://sci.esa.int/web/euclid/-/59348-euclid-flagship-mock-galaxy-catalogue


In this work we used the Flagship catalogue v1.8.4 covering one octant of the sky (5157 deg²) in the redshift range 0 < z < 2.3. We used the SED of each galaxy to simulate the Euclid grism spectroscopy in the near-infrared as well as the Euclid broadband photometry in the Y, J and H bands and the six bands from the Vera C. Rubin Observatory: u, g, r, i, z and y⁴. The flux from spectral lines was not simulated in the SEDs or broadband photometry. This choice simplified the SED fitting procedure but is an idealization that should be addressed in a future analysis. However, the effect of emission-line flux on the photometry is minor compared with that of internal attenuation, which we describe next.

The mock SEDs are based on the COSMOS template set (Ilbert et al. 2009) with a variation in the internal galaxy attenuation curves, with the addition of the 2175 Å bump (Prevot et al. 1984; Calzetti et al. 2000). We use three mock catalogue versions in the analysis with different attenuation models:

1. non-attenuated: internal galaxy attenuation was not applied to the SEDs;

2. fixed: a fixed attenuation model was applied to all galaxy SEDs (see below);

3. real: the value of E(B − V) for each galaxy in the Flagship catalogue was used to apply attenuation to the SED.

In each case the broadband photometry was computed in a consistent way from the SED with Eq. (9).

We apply attenuation to the SED in the following way, which is consistent with the construction of the Flagship mock galaxy catalogue. The attenuated SED is computed as the product of the non-attenuated SED and an attenuation factor,

$$ F_{\rm att}(\lambda) = \left( \frac{f_{\rm att}(\lambda)}{f_0(\lambda)} \right)^{E(B-V)/0.2} , \qquad (11) $$

where fatt(λ) is one of the four attenuation curves from Prevot et al. (1984) and Calzetti et al. (2000), f0(λ) is a constant function per unit frequency, and E(B − V) is the colour excess. To build the fixed attenuation catalogue, all of the galaxy SEDs were multiplied by the same attenuation factor computed with the attenuation curve from Prevot et al. (1984) and E(B − V) = 0.2. In the case of the real attenuation catalogue, the attenuation curve and the value of E(B − V) specified for each galaxy in the Flagship catalogue were used.
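A sketch of Eq. (11), assuming f_att and f0 are tabulated on the SED's wavelength grid. The exponent E(B − V)/0.2 implies that the ratio f_att/f0 describes the attenuation at E(B − V) = 0.2; that reading is our interpretation of the equation. The toy curve in the example is hypothetical, not one of the Prevot or Calzetti curves, and f0 is simply taken as constant on the toy grid.

```python
import numpy as np


def attenuation_factor(f_att, f0, ebv):
    """Eq. (11): F_att(lambda) = (f_att(lambda) / f0(lambda)) ** (E(B-V) / 0.2)."""
    ratio = np.asarray(f_att, dtype=float) / np.asarray(f0, dtype=float)
    return ratio ** (ebv / 0.2)


def attenuate(sed, f_att, f0, ebv):
    """Attenuated SED: the non-attenuated SED times the attenuation factor."""
    return np.asarray(sed, dtype=float) * attenuation_factor(f_att, f0, ebv)


if __name__ == "__main__":
    wave = np.linspace(3000.0, 9000.0, 7)   # rest frame [Angstrom]
    f0 = np.ones_like(wave)                 # constant reference (toy)
    f_att = 1.0 - 0.5 * (3000.0 / wave)     # hypothetical curve, stronger in the blue
    for ebv in (0.0, 0.2, 0.4):
        print(ebv, np.round(attenuation_factor(f_att, f0, ebv), 3))
```

Because the colour excess enters as an exponent, doubling E(B − V) squares the attenuation factor, so a single tabulated curve covers every galaxy in the "real" catalogue.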

We simulate the measurement uncertainty of the mock spectroscopy and photometry using simple noise models. For the photometry, the signal-to-noise ratio, SNR, is defined as

$$\mathrm{SNR} = \frac{f}{\sigma}, \tag{12}$$

where f is the band flux and σ its measurement uncertainty. The variance of a given flux can then be computed as

$$\sigma_f^2 = \frac{f_{\rm lim}}{\mathrm{SNR}_{\rm lim}^2}\,f, \tag{13}$$

where f_lim is the flux corresponding to the instrumental depth in the chosen band, SNR_lim is the signal-to-noise ratio at which the depth is expressed and f is the true galaxy flux. We adopt the 10σ depth values (J.C. Cuillandre, private communication) listed in AB magnitudes in Table 1. We generate the observed photometric flux by drawing a value from a Gaussian distribution centred on the true photometric flux with standard deviation σ_f. To simulate the measured galaxy sample, we apply an H-band magnitude selection H < 24 and a signal-to-noise ratio threshold SNR_H > 5 for the redshift distribution analysis. The total number of galaxies in each catalogue is about 10⁹ after the signal-to-noise ratio selection.

4 The Euclid and Vera C. Rubin Observatory filter transmission functions were obtained from the Euclid data model version 1.8.

Table 1. The 10σ depth values in AB magnitude adopted for each filter.

Filter   10σ depth (AB mag)
u        24.2
g        24.5
r        23.9
i        23.6
z        23.4
y        23.2
Y        23.0
J        23.0
H        23.0
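The photometric noise model of Eqs. (12) and (13), together with the depths of Table 1 and the H-band selection, can be sketched as follows. This is an illustration under stated assumptions. Fluxes are in microjansky and the helper names are hypothetical. Both selection cuts are applied to the observed (noisy) H-band flux, a choice the text does not specify.

```python
import numpy as np

# 10-sigma depths from Table 1 (AB magnitudes).
DEPTH_10SIGMA = {"u": 24.2, "g": 24.5, "r": 23.9, "i": 23.6, "z": 23.4,
                 "y": 23.2, "Y": 23.0, "J": 23.0, "H": 23.0}
SNR_LIM = 10.0


def ab_to_ujy(mag):
    """AB magnitude to flux density in microjansky (m_AB = 23.9 at 1 uJy)."""
    return 10.0 ** (-0.4 * (np.asarray(mag, dtype=float) - 23.9))


def flux_sigma(f_true, band):
    """Eq. (13): sigma_f**2 = f_lim / SNR_lim**2 * f, with f the true flux."""
    f_lim = ab_to_ujy(DEPTH_10SIGMA[band])
    return np.sqrt(f_lim / SNR_LIM**2 * np.asarray(f_true, dtype=float))


def observe(f_true, band, rng):
    """Draw observed fluxes from Gaussians centred on the true fluxes."""
    sigma = flux_sigma(f_true, band)
    return rng.normal(f_true, sigma), sigma


def h_band_selection(f_obs_h, sigma_h, mag_cut=24.0, snr_cut=5.0):
    """H < 24 and SNR_H > 5, both evaluated on observed fluxes (an assumption)."""
    return (f_obs_h > ab_to_ujy(mag_cut)) & (f_obs_h / sigma_h > snr_cut)


rng = np.random.default_rng(42)
f_true_h = ab_to_ujy(np.array([20.0, 22.5, 23.8, 25.0]))
f_obs_h, sigma_h = observe(f_true_h, "H", rng)
keep = h_band_selection(f_obs_h, sigma_h)
```

By construction, a source exactly at the depth flux has SNR = SNR_lim = 10. Because σ_f grows only as √f, brighter sources have a higher SNR.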

The Euclid NISP spectrograph is sensitive over the wavelength range 1.25 < λ < 1.85 µm. The pixel dispersion is Δλ = 13.4 Å pixel⁻¹, such that the spectral data vector has N_λ = 488 elements. We model the measurement uncertainty with instrumental and astrophysical noise sources. The variance on the detector, in electron count units per pixel, is

$$\sigma_{\rm pixel}^2 = N_{\rm exp}\left[t_{\rm exp}\left(n_{\rm dark} + n_{\rm sky}\right) + \sigma_{\rm read}^2\right], \tag{14}$$

where N_exp is the number of exposures and t_exp is the exposure time. The detector noise has contributions from the dark current n_dark and the read noise σ²_read. The astrophysical background n_sky includes contributions from zodiacal emission and scattered light. The noise per pixel is propagated to the flux-calibrated one-dimensional spectrum σ_λ(λ), in units of erg cm⁻² s⁻¹ Å⁻¹, with

$$\sigma_\lambda(\lambda) = \frac{hc}{\lambda}\,\frac{w}{A\,\Delta\lambda\,q_e(\lambda)\,T(\lambda)}\,\sigma_{\rm pixel}. \tag{15}$$

Here, A is the collecting area of the telescope, Δλ is the spectral dispersion in Å pixel⁻¹, w is the extraction window in pixels, q_e is the detector quantum efficiency and T is the transmission function. The measurement uncertainty is assigned to the model flux spectra by computing the spectral variance σ²_λ(λ). The noisy realizations of the spectra are generated by drawing values from a zero-mean Gaussian distribution with this variance and adding them to the model flux spectra.
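The spectroscopic noise model of Eqs. (14) and (15), followed by the Gaussian noise draw, can be sketched as below. All instrument numbers in the example call are placeholder values chosen for illustration, not Euclid NISP specifications. These include the exposures, dark current, sky level, read noise, area, window and efficiency, and the wavelength grid is only indicative. Eq. (15) is implemented literally, so the output carries whatever units σ_pixel and the other inputs are given in.

```python
import numpy as np

H_ERG_S = 6.62607015e-27       # Planck constant [erg s]
C_ANGSTROM_S = 2.99792458e18   # speed of light [Angstrom / s]


def sigma_pixel(n_exp, t_exp, n_dark, n_sky, sigma_read):
    """Eq. (14): per-pixel noise in electrons."""
    return np.sqrt(n_exp * (t_exp * (n_dark + n_sky) + sigma_read**2))


def sigma_lambda(wavelength, sig_pix, area, dlambda, width, qe, transmission):
    """Eq. (15): (hc / lambda) * w / (A * dlambda * qe * T) * sigma_pixel."""
    photon_energy = H_ERG_S * C_ANGSTROM_S / wavelength  # erg per photon
    return photon_energy * width / (area * dlambda * qe * transmission) * sig_pix


def noisy_spectrum(model_flux, sigma, rng):
    """Add zero-mean Gaussian noise with per-pixel standard deviation sigma."""
    return model_flux + rng.normal(0.0, sigma)


# Placeholder configuration, for illustration only.
wavelength = np.linspace(12500.0, 18500.0, 488)  # Angstrom, indicative grid
sig_pix = sigma_pixel(n_exp=4, t_exp=500.0, n_dark=0.01, n_sky=1.0, sigma_read=7.0)
sig = sigma_lambda(wavelength, sig_pix, area=1.0e4, dlambda=13.4, width=5.0,
                   qe=0.8, transmission=np.full(wavelength.size, 0.5))
model = np.full(wavelength.size, 1e-18)
obs = noisy_spectrum(model, sig, np.random.default_rng(1))
```

With everything else held fixed, the flux-calibrated noise falls towards the red end because the energy per photon, hc/λ, decreases.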

In Fig. 2 we show example stacked spectra traced by ugrizyYJH photometry and NISP spectroscopic measurements. The left and right panels show stacks built from 2 × 10³ and 2 × 10⁶ sources, respectively. In both cases the measurement uncertainty on the photometric points is too small to be visible. The uncertainty on the spectroscopy is evident in the stack of 2 × 10³ sources, but it becomes negligible with more than 10⁶ sources. Finally, the two uncertainty models for photometry and spectroscopy described above simulate only observational uncertainties. A discussion of systematic uncertainties is left to the final conclusions (Sect. 5).
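The statement that spectroscopic noise becomes negligible for large stacks follows from the usual 1/√N scaling. The minimal sketch below assumes independent noise of equal variance in every spectrum and an unweighted mean; the actual stacking and normalization may differ. It shows the expected noise reduction between the two panels of Fig. 2 and includes a small Monte Carlo check.

```python
import numpy as np


def relative_stack_noise(n_sources):
    """Noise on an unweighted mean of n spectra with equal, independent noise,
    relative to the noise of a single spectrum."""
    return 1.0 / np.sqrt(n_sources)


# Noise ratio between a 2e3-source stack and a 2e6-source stack.
gain = relative_stack_noise(2e3) / relative_stack_noise(2e6)  # sqrt(1000)

# Small-scale Monte Carlo check: scatter of the mean of 400 unit-variance draws.
rng = np.random.default_rng(0)
means = rng.normal(0.0, 1.0, size=(2000, 400)).mean(axis=1)
empirical = means.std()  # expected to be close to 1 / sqrt(400) = 0.05
```

The thousandfold increase in the number of sources reduces the stacked noise by a factor of about 31.6.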

3.2. Description of templates

For the ensemble photometric redshift method to be successful, the template set used to build the analysis matrix T should be representative of the observed galaxy SEDs and should also span the range of attenuation values. In this study we use the same

Article number, page 5 of 15


A&A proofs: manuscript no. paper_main

[Figure 2 panels: normalised stacked flux (Hz⁻¹, left axis) and filter transmission (right axis) versus wavelength (2500–20 000 Å)]

Fig. 2. The stacked spectro-photometry for two groups of galaxies that have been photometrically selected (see Sect. 3.3). The photometric data include the Euclid YJH and the Vera C. Rubin Observatory ugrizy bands, and the spectroscopic data are from the Euclid NISP instrument with simulated noise. The photometric bandpasses used in the analysis are overplotted. In both plots the photometric uncertainty bars are smaller than the markers. Left: an example stack for a group of galaxies with mean redshift z = 1.10. The stack includes 2 × 10³ galaxies. Right: stacked flux for a group at mean redshift z = 1.44 with 2 × 10⁶ galaxies.

[Figure 3: normalized flux per Å (logarithmic, 10⁻⁶ to 10⁻¹) versus wavelength (10³ to 10⁵ Å)]

Fig. 3. Arbitrarily scaled COSMOS templates used for the Euclid Flagship mock galaxy catalogue. The elliptical templates are shown in red, the lenticular and spiral templates in blue, and the starburst templates in green.

template set that was used to generate the mock galaxy SEDs. Clearly this is an idealized situation that can lead to over-optimistic results; we return to this issue in the final conclusions (Sect. 5). The COSMOS template set includes a mix of elliptical, spiral, lenticular and starburst galaxy types, for a total of 31 templates (indexed from 0 to 30; Ilbert et al. 2009). The templates are plotted in Fig. 3.

The three versions of the mock galaxy catalogue described in Sect. 3.1 make different assumptions about the attenuation and therefore require different template sets for the analysis. In the first case, for the analysis of the non-attenuated catalogue, we use templates without attenuation. This idealized case study allows us to assess the impact of attenuation on the result.

In the second case, with fixed attenuation, the attenuation model is assumed to be known a priori. We apply the fixed attenuation model described in Sect. 3.1 to all galaxy SEDs and also to the template set.

Finally, the third catalogue is the most realistic case, in which realistic attenuation is applied to the mock galaxies. The attenuation curve and the E(B − V) value are assigned to each galaxy from the Flagship catalogue. In this case we use a combination of attenuated and non-attenuated templates in the analysis. The attenuated templates were constructed by applying the attenuation factor (Eq. 11) with the Prevot et al. (1984) and Calzetti et al. (2000) attenuation curves. E(B − V) was fixed to the median value of the Flagship catalogue, whose colour excess values range from 0 to 0.5. In Table 2 we show the correspondence between the COSMOS templates and the attenuation curves. This procedure follows the recipe used for the Flagship mock galaxy catalogue (Carretero et al., private communication). In this way we obtain a general set of 47 templates that also contains the attenuation model. A more representative set of templates could be built using different values of E(B − V). However, a greater number of templates would also increase the number of parameters that need to be fitted to compute the redshift distributions.

Table 2. The assignment of attenuation curves to the COSMOS templates. The attenuation curve identified by 0 is a constant function per unit frequency. Index 1 refers to the attenuation curve from Prevot et al. (1984), and the other attenuation curves, from 2 to 4, are from Calzetti et al. (2000).

COSMOS SEDs   Attenuation curve
0–9           0
10–22         1
23–30         2
23–30         3
23–30         4

In order to build the matrix, the templates need to be shifted to N_z redshifts (see Sect. 2.2). The redshift distributions are measured on the grid of these N_z redshifts. We use the same redshift grid for the analysis of all three catalogues. It ranges from redshift 0 to 2.30 with a step of 0.01, for a total of 231 redshifts. In the analysis of the real attenuation catalogue, the dimension of the template matrix is (N_SED N_z) × (N_λ + N_b) = 10 857 × 497, that is, 47 × 231 rows and 488 + 9 columns. For the non-attenuated and fixed attenuation catalogue analyses it is 7161 × 497 (31 × 231 rows).
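The bookkeeping behind these matrix dimensions can be checked in a few lines. The sketch assumes a hypothetical callable `model(sed_id, z)` that returns the concatenated grism spectrum and broadband fluxes of a template at redshift z. The actual template projection of Sect. 2.2 is not reproduced here. The sketch counts the templates of Table 2 and stacks one row per template–redshift pair.

```python
import numpy as np

# Table 2: (first COSMOS template, last template, attenuation curve id).
TABLE2 = [(0, 9, 0), (10, 22, 1), (23, 30, 2), (23, 30, 3), (23, 30, 4)]
N_LAMBDA, N_BANDS = 488, 9  # grism pixels and broadband filters

z_grid = np.round(np.arange(231) * 0.01, 2)  # 0.00, 0.01, ..., 2.30
n_sed_real = sum(last - first + 1 for first, last, _ in TABLE2)  # 47
n_sed_plain = 31  # COSMOS templates 0-30 without attenuation


def build_template_matrix(model, sed_ids, z_grid):
    """Stack one row per (template, redshift) pair, template-major order.

    `model(sed_id, z)` is a hypothetical callable returning a vector of
    length N_LAMBDA + N_BANDS for template `sed_id` redshifted to z.
    """
    rows = [model(sed_id, z) for sed_id in sed_ids for z in z_grid]
    return np.vstack(rows)


def dummy_model(sed_id, z):
    """Stand-in model used only to check the bookkeeping."""
    return np.full(N_LAMBDA + N_BANDS, sed_id + z)


# Here sed_ids index the 47 analysis templates, not COSMOS template numbers.
T_real = build_template_matrix(dummy_model, range(n_sed_real), z_grid)
```

Each extra E(B − V) value per attenuated template would add another 231 rows, which is the growth in fitted parameters that the text mentions.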


M.S. Cagliari et al.: Euclid: Redshift distribution with stacked spectroscopy

3.3. Colour selection

We use a self-organizing map (SOM) to divide the galaxies into colour groups. The SOM (Kohonen 1982, 1990) is an unsupervised machine learning algorithm that projects high-dimensional data onto a lower-dimensional grid, usually a two-dimensional map. Its main characteristic is that the lower-dimensional representation preserves the topology of the high-dimensional data: objects that are close in the input space are mapped to nearby cells. In the last few years SOMs have grown in popularity as a data-driven method to estimate photometric redshifts (e.g. Masters et al. 2015; Wilson et al. 2020). However, in this work we simply exploit the efficiency with which SOMs cluster and group data with similar features.

For each galaxy we have nine photometric fluxes (see Sect. 3.1), from which we compute the eight colours used to build the SOM. Using SOMPY, a SOM library for Python by Moosavi et al. (2014), we built a 20 × 20 rectangular-cell SOM with principal component analysis (PCA) initialization. Our SOMs have much smaller dimensions than the SOMs used for photometric redshift measurements or to estimate physical properties (Masters et al. 2015; Davidzon et al. 2019). We have two goals when choosing the SOM size. Firstly, we want galaxy samples that span relatively narrow redshift ranges, which are appropriate for measuring clustering statistics. Secondly, we need these samples to be highly populated so that the spectroscopic noise averages out during stacking. However, there is an intrinsic tension between these two goals due to the nature of the SOM algorithm: a finer grid gives narrower redshift ranges but fewer galaxies per cell. In this work we opted for a coarse SOM to have highly populated cells (N_gal > 10⁶) without the need to group cells together. Comparing the left and right panels of Fig. 2 shows how increasing the number of galaxies in a cell reduces the noise in the stacked spectroscopy until it becomes negligible. The issues related to the analysis of less populated groups (see Fig. 2, left panel) are discussed in Sect. 5. Nevertheless, the choice of SOM size will depend on the application and should be investigated in future studies.
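To illustrate how the colour groups can be formed, here is a from-scratch NumPy version of a classic online SOM with PCA initialization. It is not SOMPY and does not reproduce that library's API or training schedule. The decay schedules, the choice of adjacent-band colours and the function names are our own assumptions. Training one object at a time like this would be far too slow for ~10⁹ galaxies, so the sketch is meant only to show the algorithm.

```python
import numpy as np


def colours_from_fluxes(fluxes):
    """Eight adjacent-band colours (magnitude differences) from nine positive fluxes."""
    mags = -2.5 * np.log10(fluxes)
    return mags[:, :-1] - mags[:, 1:]


def pca_init(data, nx, ny):
    """Spread the initial weights over the plane of the first two principal components."""
    mean = data.mean(axis=0)
    _, s, vt = np.linalg.svd(data - mean, full_matrices=False)
    scale = s[:2] / np.sqrt(len(data))  # standard deviation along each component
    gx, gy = np.meshgrid(np.linspace(-1, 1, nx), np.linspace(-1, 1, ny), indexing="ij")
    w = mean + gx[..., None] * scale[0] * vt[0] + gy[..., None] * scale[1] * vt[1]
    return w.reshape(nx * ny, -1)


def train_som(data, nx=20, ny=20, n_epochs=5, lr0=0.5, seed=0):
    """Online Kohonen updates with shrinking neighbourhood and learning rate."""
    rng = np.random.default_rng(seed)
    w = pca_init(data, nx, ny)
    ix, iy = np.divmod(np.arange(nx * ny), ny)  # cell coordinates, row-major
    grid = np.column_stack([ix, iy]).astype(float)
    sigma0 = max(nx, ny) / 2.0
    n_steps, step = n_epochs * len(data), 0
    for _ in range(n_epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            sigma = sigma0 * (0.5 / sigma0) ** frac  # decays from sigma0 to 0.5 cells
            lr = lr0 * (0.01 / lr0) ** frac          # decays from lr0 to 0.01
            bmu = np.argmin(((w - x) ** 2).sum(axis=1))  # best-matching unit
            d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2.0 * sigma**2))       # Gaussian neighbourhood
            w += lr * h[:, None] * (x - w)
            step += 1
    return w


def assign_cells(data, w):
    """Index of the best-matching SOM cell for every object."""
    d2 = ((data[:, None, :] - w[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


# Toy usage on random positive fluxes (not mock galaxies).
rng = np.random.default_rng(3)
fluxes = rng.lognormal(mean=0.0, sigma=0.5, size=(300, 9))
colours = colours_from_fluxes(fluxes)
weights = train_som(colours, n_epochs=2)
cells = assign_cells(colours, weights)
```

The cell index returned by `assign_cells` plays the role of the colour-group label.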

We use the cells of the SOM grid to define the colour groups for the analysis. Fig. 4 illustrates how the cells of the SOM grid map to redshift. In the analysis we focus on colour groups with compact redshift distributions. We therefore selected groups with a standard deviation in redshift below 0.20, which are marked with red spots in the left panel of Fig. 4. The mean number of galaxies in a group is 2.5 × 10⁶.
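Given a SOM cell index and the true mock redshift of each galaxy, the colour-group selection reduces to per-cell statistics. The minimal sketch below uses a hypothetical function name and assumes the population standard deviation. It returns the cells whose redshift scatter is below the 0.20 threshold.

```python
import numpy as np


def select_colour_groups(cells, z, n_cells, max_std=0.20):
    """Per-cell mean and standard deviation of redshift, and cells with std < max_std."""
    counts = np.bincount(cells, minlength=n_cells)
    s1 = np.bincount(cells, weights=z, minlength=n_cells)
    s2 = np.bincount(cells, weights=z * z, minlength=n_cells)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = s1 / counts  # NaN for empty cells
        std = np.sqrt(np.maximum(s2 / counts - mean**2, 0.0))
        selected = np.flatnonzero((counts > 0) & (std < max_std))
    return selected, mean, std, counts


# Toy example: cell 0 is compact in redshift, cell 1 is broad, cell 2 is empty.
cells = np.array([0, 0, 1, 1])
z = np.array([1.0, 1.1, 0.5, 1.5])
selected, mean, std, counts = select_colour_groups(cells, z, n_cells=3)
```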

3.4. Quantifying the method performance

To quantify the error in the redshift distributions measured with the ensemble photometric redshift method, and to understand whether they can be useful in cosmological studies, we compare two angular positions of the BAO peak. One is computed with the real redshift distribution and the other with the distribution measured by the ensemble photometric redshift method.

The position of the BAO peak in the angular correlation function is determined by the projection of the sound horizon r_s at the comoving distance r(z) of the galaxy sample. For a thin redshift shell at redshift z, the angular scale is ϑ_BAO = r_s/r(z) (Sánchez et al. 2011). To first order, a systematic shift Δz in the redshift distribution propagates to an error in the angular position of

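$$\frac{\Delta\vartheta_{\rm BAO}}{\vartheta_{\rm BAO}} = \frac{1}{r(\bar z)}\left.\frac{{\rm d}r}{{\rm d}z}\right|_{\bar z}\Delta z, \tag{16}$$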

where z̄ is the mean redshift of the shell.
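Eq. (16) can be evaluated directly for the flat ΛCDM model adopted in this section (Ω_m = Ω_b + Ω_CDM = 0.3, h = 0.7, radiation neglected). In this sketch the helper names are hypothetical and the distance integral uses a simple trapezoidal rule. Because both r and dr/dz scale as 1/h, the result does not depend on h. At low redshift it reduces to Δz/z̄.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light [km/s]


def efunc(z, om=0.3):
    """Dimensionless Hubble rate E(z) for flat LCDM (radiation neglected)."""
    return np.sqrt(om * (1.0 + z) ** 3 + (1.0 - om))


def comoving_distance(z, h=0.7, om=0.3, n=4001):
    """r(z) in Mpc from a trapezoidal integral of c / H(z')."""
    zz = np.linspace(0.0, z, n)
    inv_e = 1.0 / efunc(zz, om)
    integral = 0.5 * np.sum((inv_e[1:] + inv_e[:-1]) * np.diff(zz))
    return C_KM_S / (100.0 * h) * integral


def bao_shift(zbar, dz, h=0.7, om=0.3):
    """Eq. (16): relative BAO angle shift (1 / r) (dr/dz) dz at the mean redshift."""
    drdz = C_KM_S / (100.0 * h) / efunc(zbar, om)
    return drdz / comoving_distance(zbar, h, om) * dz


shift = bao_shift(1.2, 0.005)
```

For example, a bias of Δz = 0.005 in the mean redshift of a group at z̄ = 1.2 gives a relative shift of roughly 0.3%. This linear relation is what makes the BAO error a proxy for the mean-redshift error in the narrow-distribution limit.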

However, the full shape of the angular correlation function also depends on the evolution of the correlation function integrated over the redshift distribution. This can have a substantial impact on the measurement of the BAO scale, particularly when the redshift distribution has extended tails. We therefore use a full model of the angular correlation function to propagate the error. We write the angular correlation function in terms of the three-dimensional galaxy power spectrum P_g(k, z) and the normalised redshift distribution p(z),

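$$w(\vartheta) = \int \frac{{\rm d}\ell\,\ell}{2\pi}\,J_0(\ell\vartheta)\int {\rm d}r\,\frac{1}{r^2}\left[p(z)\,\frac{{\rm d}z}{{\rm d}r}\right]^2 P_{\rm g}\!\left(\frac{\ell + 1/2}{r}, z\right), \tag{17}$$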

where r is the radial comoving distance, J_0 is the Bessel function of order zero, and the redshift z = z(r) is a function of the radial comoving distance (Elvin-Poole et al. 2018). This expression is derived using the Limber and flat-sky approximations, k → (ℓ + 1/2)/r, which are valid at the redshifts we probe.

We model the relation between the galaxy power spectrum, P_g(k, z), and the matter power spectrum, P_m(k, z), with a linear bias, P_g(k, z) = b² P_m(k, z).

We compute the matter power spectrum using the CLASS code (Blas et al. 2011). We adopt a flat ΛCDM cosmological model with h = 0.7, Ω_b = 0.05, Ω_CDM = 0.25 and Ω_Λ = 0.7. We use only the position of the BAO peak in the analysis, so the details of the galaxy bias model, the overall power spectrum amplitude and the broadband shape of the power spectrum will not significantly influence the results. We integrate Eq. (17) numerically over the redshift range 0 < z < 2.3, the limit of the mock catalogue. To achieve convergence, the integration range was set with k_max = 10, which corresponds to ℓ_max = 53 000.
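The following sketch evaluates Eq. (17) by direct trapezoidal quadrature for a Gaussian p(z). The power spectrum is a made-up stand-in with BAO-like wiggles, not CLASS output, and growth is ignored. Distances are in Mpc, and all function names are hypothetical. The ℓ range is truncated at 3000 for speed, well below the ℓ_max = 53 000 quoted above, so the result is not expected to be converged.

```python
import numpy as np
from scipy.special import j0

C_OVER_H0 = 2997.92458  # c / (100 km/s/Mpc) in Mpc; divide by h


def trapz_last(y, x):
    """Trapezoidal integral of y along its last axis, sampled at x."""
    return 0.5 * np.sum((y[..., 1:] + y[..., :-1]) * np.diff(x), axis=-1)


def distance_grid(z, h=0.7, om=0.3):
    """Comoving distance r(z) [Mpc] on a grid starting at z = 0, and dz/dr = H/c."""
    e = np.sqrt(om * (1.0 + z) ** 3 + 1.0 - om)
    seg = 0.5 * (1.0 / e[1:] + 1.0 / e[:-1]) * np.diff(z)
    r = C_OVER_H0 / h * np.concatenate([[0.0], np.cumsum(seg)])
    return r, e * h / C_OVER_H0


def w_theta(theta, z, pz, pg, ell, h=0.7, om=0.3):
    """Eq. (17) with the Limber substitution k = (ell + 1/2) / r.

    theta : angles [rad]; z : redshift grid starting at 0; pz : normalised p(z)
    pg    : callable pg(k, z) giving the galaxy power spectrum
    ell   : multipole grid
    """
    r, dzdr = distance_grid(z, h, om)
    r, dzdr, z, pz = r[1:], dzdr[1:], z[1:], pz[1:]  # drop r = 0
    kernel = (pz * dzdr) ** 2 / r**2
    k = (ell[:, None] + 0.5) / r[None, :]
    c_ell = trapz_last(kernel[None, :] * pg(k, z[None, :]), r)
    integrand = ell / (2.0 * np.pi) * j0(np.outer(theta, ell)) * c_ell
    return trapz_last(integrand, ell)


def toy_pg(k, z, bias=1.5):
    """Hypothetical smooth spectrum with BAO-like wiggles; z is unused (no growth)."""
    smooth = 2.0e4 * k / (1.0 + (k / 0.02) ** 2.5)
    return bias**2 * smooth * (1.0 + 0.05 * np.sin(150.0 * k))


z = np.linspace(0.0, 2.3, 231)
pz = np.exp(-0.5 * ((z - 1.2) / 0.1) ** 2)
pz /= trapz_last(pz, z)
ell = np.arange(1.0, 3001.0)
theta = np.radians(np.linspace(0.5, 6.0, 30))
w = w_theta(theta, z, pz, toy_pg, ell)
```

With a linear bias, w(ϑ) scales as b², so the bias amplitude does not move the peak position. This is consistent with using only the peak position in the analysis.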

To locate the BAO peak, we fit the angular correlation function computed with Eq. (17) with a template consisting of a power law, which describes the decreasing part of the correlation function, plus a Gaussian component that represents the peak,

$$f(\vartheta) = c_1\,\vartheta^{-\gamma} + c_2\exp\left[-\frac{(\vartheta - \vartheta_{\rm BAO})^2}{\sigma^2}\right] + y_{\rm norm}. \tag{18}$$

There are six parameters in this model. They are y_norm, an integration constant; c_1 and c_2, two coefficients; and three parameters characteristic of the correlation function: γ, which determines the slope, σ, the BAO peak width, and ϑ_BAO, the angular position of the BAO. We used the optimize.curve_fit algorithm from SciPy (Virtanen et al. 2020) to fit the correlation function. For every colour group, we compute the BAO angular position using both the measured redshift distribution and the real one, known from the mock catalogue data (see Sect. 3.1). The error in the measured redshift distribution is quantified by the relative error of the BAO angular position (hereafter BAO relative error):

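$$\frac{\Delta\vartheta_{\rm BAO}}{\vartheta_{\rm BAO}} = \frac{\vartheta_{\rm BAO}^{\rm real} - \vartheta_{\rm BAO}^{\rm fit}}{\vartheta_{\rm BAO}^{\rm real}}. \tag{19}$$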
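A minimal sketch of the peak fit with SciPy's `curve_fit`, using the template of Eq. (18), followed by Eq. (19). The synthetic parameters and the initial guesses `p0` are our own assumptions; the text does not state which starting values were used. The fit is run on a noise-free synthetic curve only to show that ϑ_BAO is recovered.

```python
import numpy as np
from scipy.optimize import curve_fit


def bao_template(theta, y_norm, c1, gamma, c2, theta_bao, width):
    """Eq. (18): power law plus Gaussian bump plus constant offset."""
    return (c1 * theta ** (-gamma)
            + c2 * np.exp(-((theta - theta_bao) ** 2) / width**2)
            + y_norm)


def fit_theta_bao(theta, w, p0):
    """Least-squares fit of Eq. (18); returns the fitted BAO angle."""
    popt, _ = curve_fit(bao_template, theta, w, p0=p0, maxfev=20000)
    return popt[4]  # parameter order follows the template signature


def bao_relative_error(theta_real, theta_fit):
    """Eq. (19): (theta_real - theta_fit) / theta_real."""
    return (theta_real - theta_fit) / theta_real


# Synthetic, noise-free curve with made-up parameters (angles in degrees).
theta = np.linspace(1.0, 8.0, 200)
truth = (1e-4, 1e-2, 1.5, 2e-3, 4.0, 0.5)
w_synth = bao_template(theta, *truth)
theta_fit = fit_theta_bao(theta, w_synth, p0=(0.0, 1.1e-2, 1.45, 1.8e-3, 3.9, 0.55))
```

In the analysis, the same fit is applied twice per colour group, once to the correlation function from the real p(z) and once to that from the estimated p(z). The two angles then enter Eq. (19).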

We use the error on the BAO scale as the performance metric. However, the ensemble photometric redshift method may also be applied in tomographic weak lensing analyses, which require estimates of the mean redshifts of the tomographic bins. Equation (16) shows that, in the limit of narrow redshift distributions, the error on the BAO scale is directly proportional to the error on the mean redshift. Therefore, our results can also be interpreted as systematic errors for weak lensing analyses.


[Figure 4 panels. Left: 20 × 20 SOM grid coloured by mean redshift z (0.25–2.00). Right: mean redshift versus colour group index (0–400).]

Fig. 4. Left: the two-dimensional projection of the galaxy colour groups constructed with the SOM. The colour scale indicates the mean redshift of the SOM cells, which we define as colour groups. The red spots identify the cells with σ_z < 0.20. Right: the colour groups sorted by their mean redshift. The error bars represent the standard deviation of redshift, σ_z, in each group. The groups with red error bars are those with σ_z < 0.20.

[Figure 5: mean |Δϑ_BAO/ϑ_BAO| (0.004–0.008) with and without spectroscopic noise. LASSO parameter sets have α from 5 × 10⁻⁶ to 10⁻⁴. ElasticNet parameter sets have α from 5 × 10⁻⁶ to 5 × 10⁻⁴ and β from 0.6 to 0.9.]

Fig. 5. Mean BAO error for different sets of LASSO and ElasticNet fit parameters. The colour groups for the analysis were selected to have z_mean > 1 and a standard deviation in redshift lower than 0.15.

3.5. Optimising regression parameters

The regression methods LASSO and ElasticNet described in Sect. 2.2 have free parameters that must be chosen. We analysed the fixed attenuation catalogue over a range of the parameter space to test the quality of the regression result and tune the parameters. The LASSO method has a single parameter, α_LASSO, while ElasticNet has two parameters, α_ElasticNet and β_ElasticNet. We quantify the goodness of fit with the BAO relative error, Eq. (19).

We found that the optimal regression parameters depend weakly on the mean redshift, z_mean, of the analysed colour group. We therefore limited the redshift range of our analyses by selecting only colour groups with z_mean > 1. With this selection we can neglect the redshift dependence of the regression parameters and use the same set of parameters for all the analysed groups. As discussed in the introduction, we are most interested in the ability of the method to fit redshift distributions at high redshift, z > 1, rather than at low redshift. At z > 1 the Euclid near-infrared spectroscopy will measure the rest-frame SED at λ < 9000 Å, where the continuum shape carries more information to constrain photometric redshifts.

In Fig. 5 we show the mean over the colour groups of the absolute relative error on the BAO scale (hereafter mean BAO error), obtained with different parameter sets. For LASSO we observe a clear minimum of the mean BAO error at α = 5 × 10⁻⁵. This is the best-fit parameter both for the analyses with and without spectroscopic noise, and we use it as α_LASSO for all the analyses in the following section. On the other hand, for the ElasticNet method we find a broad minimum in the parameter space. The best-fit parameters are α_no noise = 5 × 10⁻⁵ and β_no noise = 0.6 for the analysis without noise, and α_noise = 5 × 10⁻⁵ and β_noise = 0.7 for the analysis with noise. These two sets of fit parameters are used as α_ElasticNet and β_ElasticNet in the following section. The parameters with subscript ‘no noise’ are used for the results in Sect. 4.1, and those with subscript ‘noise’ for the analyses in Sect. 4.2.
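Sect. 2.2 defines LASSO and ElasticNet, but the exact parameterization is not part of this excerpt. The sketch below assumes a scikit-learn-style objective with β as the L1 mixing fraction, which may differ from the authors' definitions. Because the coefficients are constrained to be non-negative, the L1 norm becomes a plain sum and the objective is smooth. It can therefore be minimized with SciPy's bounded L-BFGS-B, which is our substitute solver and not necessarily the one used in the paper.

The toy problem has a random design matrix whose columns play the role of the template vectors, and a single template per redshift. Parameter values are scored by the relative error on the mean redshift, a stand-in for the BAO metric motivated by Eq. (16). The α values here are not comparable to those in Fig. 5.

```python
import numpy as np
from scipy.optimize import minimize


def fit_nonneg(design, data, alpha, beta=1.0):
    """Non-negative ElasticNet (LASSO when beta = 1) under an assumed objective:

        1/(2n) ||design @ c - data||^2 + alpha*beta*sum(c) + alpha*(1-beta)/2 * ||c||^2
    """
    n, m = design.shape

    def objective(c):
        res = design @ c - data
        value = (0.5 / n * res @ res + alpha * beta * c.sum()
                 + 0.5 * alpha * (1 - beta) * c @ c)
        grad = design.T @ res / n + alpha * beta + alpha * (1 - beta) * c
        return value, grad

    out = minimize(objective, np.zeros(m), jac=True, method="L-BFGS-B",
                   bounds=[(0.0, None)] * m)
    return out.x


def mean_z_error(coeffs, z_grid, z_mean_true):
    """Relative error on the mean redshift of the coefficient-weighted p(z)."""
    total = coeffs.sum()
    if total == 0:
        return np.inf
    return (z_mean_true - np.sum(coeffs * z_grid) / total) / z_mean_true


# Hypothetical toy problem: one column per redshift on a coarse grid.
rng = np.random.default_rng(7)
z_grid = np.linspace(0.0, 2.3, 47)
design = rng.random((120, z_grid.size))
c_true = np.exp(-0.5 * ((z_grid - 1.3) / 0.1) ** 2)
data = design @ c_true + rng.normal(0.0, 0.01, 120)
z_true = np.sum(c_true * z_grid) / np.sum(c_true)

grid = [0.0, 1e-4, 1e-3, 1e-2]
scores = {a: abs(mean_z_error(fit_nonneg(design, data, a), z_grid, z_true)) for a in grid}
best_alpha = min(scores, key=scores.get)
```

An ElasticNet scan works the same way over a two-dimensional (α, β) grid, passing `beta` to `fit_nonneg`.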

We found that the parameter choice also affects the smoothness of the redshift distribution estimates, as judged by eye. However, it was not possible to achieve the minimum error and smoothness simultaneously, so we optimised only for the error.

4. Results

Having explained how the method is implemented, how the data are prepared and how the redshift distributions are computed, we now discuss the results of the analyses of the three catalogues. We present the joint analysis of stacked photometry and spectroscopy. The analyses with broadband photometry only or spectroscopy only prove to be significantly less accurate; we discuss these cases in Appendix A.

In Sect. 3.1 we discussed how uncertainties are added to the photometry and the spectroscopy. The errors in the spectral fluxes are greater than those in the photometric data (see Fig. 2), and we expect them to be the main source of uncertainty in the redshift distribution fits. We therefore first analysed the catalogues with only the photometric error and added the spectroscopic noise in later analyses. The analyses without spectroscopic noise represent the limit in which a colour group contains enough galaxies for the noise in the stacked spectrum to be negligible.


[Figure 6 panels. Left column: p(z) versus z for the real distribution and the NNLS, LASSO and ElasticNet fits. Right column: (ϑ_BAO^real − ϑ_BAO^fit)/ϑ_BAO^real versus mean redshift for NNLS, LASSO and ElasticNet.]

Fig. 6. Results on ideal spectroscopy without measurement noise. Top row: results of the non-attenuated catalogue analyses. Middle row: results of the fixed attenuation catalogue analyses. Bottom row: results of the real attenuation catalogue analyses. NNLS, LASSO and ElasticNet results are plotted in orange, green and red, respectively. The left column shows examples of redshift distribution fits, with the real distribution plotted in blue. The right column shows the BAO relative error of the analysed colour groups, ordered by redshift.

4.1. Analyses without noise

In the left panels of Fig. 6 we present an example of a fitted redshift distribution for each of the analysed catalogues. These plots highlight both the general features of the analysis without spectroscopic noise and the characteristics of the different catalogue analyses.

Firstly, in the non-attenuated and fixed attenuation analyses the method is able to recover the position, width and height of the redshift distributions, and the detailed shapes of the distributions are well fit. The three algorithms, NNLS, LASSO and ElasticNet, give comparable results. However, in some cases spurious secondary peaks are evident. Spurious peaks occur less often in the LASSO and ElasticNet results. This is due to the shrinkage and selection processes in these algorithms, which suppress secondary solutions in favour of the principal ones.

Secondly, we find a loss of precision in the real attenuation case. All three estimators fail to reproduce the shape of the redshift distributions accurately. As seen in the example shown in


Table 3. Mean BAO errors for the analyses without and with spectroscopic noise.

Analysis without spectroscopic noise
                  NNLS     LASSO    ElasticNet
non-attenuated    0.0069   0.0062   0.0053
fixed             0.0086   0.0050   0.0040
real              0.019    0.021    0.014

Analysis with spectroscopic noise
                  NNLS     LASSO    ElasticNet
non-attenuated    0.0073   0.0071   0.0063
fixed             0.0099   0.0066   0.0057
real              0.016    0.023    0.015

Fig. 6, the principal peak in the estimated redshift distribution is narrower and many significant spurious peaks are seen.

The right-hand panels of Fig. 6 show the relative error in the recovered BAO position, Eq. (19), for each colour group. The error depends weakly on the mean redshift of the group. The three estimators tend to perform more stably and give lower errors at higher redshift, z > 1.4. The NNLS estimator gives the largest error, while the LASSO and ElasticNet estimators perform similarly.

The best performance is found for the non-attenuated catalogue, with a mean error of approximately 0.5–0.7% in the BAO position across the three estimators (see Table 3). The fixed attenuation analysis shows a similar error of 0.4–0.9%. However, when the attenuation is left free in the real attenuation case, the error grows to about 2%. Attenuation introduces a degeneracy in colour–redshift space that leads to spurious peaks in the redshift distributions, which bias the BAO angular position. An underestimated BAO position means that the redshift was biased high, as seen for the NNLS estimator at z > 1.8. Overall, the three methods have a similar error of around 2% in the presence of attenuation, with NNLS and ElasticNet giving the best fits. LASSO performed less well on average because of a small number of poorly fit groups. At z > 1.5 the LASSO and ElasticNet algorithms perform better in the presence of attenuation, which may be attributed to the training process. Indeed, the performance will depend on the internal galaxy attenuation models used in the template set and on the training sample.

4.2. Analyses with noise

The results of the analyses with noisy spectroscopy are shown in Fig. 7. In the left-hand panels we show an example redshift distribution for each of the analysed catalogues. The non-attenuated and fixed attenuation catalogue analyses with noise give results similar to those without noise. At lower redshift all three regression methods recover the redshift distributions in great detail, but this is not the case at higher redshifts, z > 1.5. At high redshift the computed distributions tend to be very noisy and less smooth (see Fig. 7, middle left panel). The degree of this behaviour depends on the regression method used in the fit, with NNLS showing more spurious peaks. Even though the smoothness of the redshift distributions is lost at high redshift, their position and width are still recovered precisely enough to give small relative errors on the BAO angular position. As in the analyses without spectroscopic noise, we observe more stable performance and lower error at higher redshifts.

In the real attenuation catalogue analysis we observe more spurious peaks separated from the principal peak than in the other two catalogue analyses. However, these spurious structures are usually narrower and more frequent than the much wider ones found in the noiseless analysis of the same catalogue. Moreover, the fitted redshift distributions tend to be very noisy and lose smoothness even at lower redshifts (see Fig. 7, bottom left panel).

The measured errors on the BAO position are plotted in the right-hand panels of Fig. 7. The non-attenuated and fixed attenuation cases show trends similar to the analyses without noise. In both cases we recover the BAO position well; however, at z > 1.5 the NNLS algorithm tends to underestimate the BAO scale.

In the real attenuation analysis, colour groups at z ∼ 1.05 tend to have an overestimated BAO position. This trend was not evident in the analysis without noise. It indicates an additional degeneracy related to the attenuation curves and to how attenuation is modelled in the presence of spectroscopic noise. At z > 1.5 similar behaviour is found with and without spectroscopic noise.

The mean BAO errors are reported in Table 3. The trends are consistent with the analysis without spectroscopic noise, but the error degrades by approximately 10%. Spectroscopic noise degrades the error only modestly, whereas leaving the attenuation free (real catalogue) roughly triples it. This indicates that internal galaxy attenuation and its imperfect modelling are the most important factors limiting the fit performance.
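The comparison in this paragraph can be made explicit with the numbers of Table 3. The short script below computes three quantities: the per-entry ratio of the error with noise to the error without noise, the ratio of the pooled means, and the ratio between the real attenuation and non-attenuated cases. On these numbers the pooled mean rises by about 8%. Individual entries range from a decrease (NNLS, real: 0.019 to 0.016) to an increase of about 40% (ElasticNet, fixed: 0.0040 to 0.0057). Free attenuation, by contrast, multiplies the noiseless error by roughly 2.6 to 3.4.

```python
import numpy as np

# Table 3: mean BAO errors, columns NNLS, LASSO, ElasticNet.
no_noise = {"non-attenuated": [0.0069, 0.0062, 0.0053],
            "fixed": [0.0086, 0.0050, 0.0040],
            "real": [0.019, 0.021, 0.014]}
with_noise = {"non-attenuated": [0.0073, 0.0071, 0.0063],
              "fixed": [0.0099, 0.0066, 0.0057],
              "real": [0.016, 0.023, 0.015]}

# Per-entry effect of spectroscopic noise (with / without).
noise_ratio = {k: np.array(with_noise[k]) / np.array(no_noise[k]) for k in no_noise}
# Ratio of the pooled means over all nine entries.
pooled = sum(map(sum, with_noise.values())) / sum(map(sum, no_noise.values()))
# Effect of free attenuation without noise (real / non-attenuated).
attenuation_ratio = np.array(no_noise["real"]) / np.array(no_noise["non-attenuated"])
```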

5. Conclusions

In this pilot study we have tested the use of stacked spectra from Euclid near-infrared grism spectroscopy to reconstruct the ensemble redshift distribution of photometrically selected galaxy samples. The general approach in the context of slitless spectroscopic surveys was proposed by Padmanabhan et al. (2019). Here we combined broadband photometry, including the Vera C. Rubin Observatory ugrizy bands and Euclid NISP YJH, with stacked NISP grism spectroscopy, using the Euclid Flagship mock galaxy catalogue. Since the optimisation of the photometric galaxy selection in Euclid is ongoing (Euclid Collaboration et al. 2021), we selected mock galaxy samples in colour space using the SOM algorithm. These galaxy samples have compact distributions in both colour and redshift. The redshift distributions inferred from broadband photometry alone prove to be unreliable, as shown in Appendix A. This is not unexpected, since the constraints from SED fitting depend on the template priors, which we do not consider (Benítez 2000). However, we find that the full application of the joint analysis of photometry and spectroscopy on mock survey data is promising and very informative about both the method's limits and its potential applications.

To assess the quality of the redshift distribution estimates, we focused on the cosmological application of inferring the BAO scale from photometric galaxy clustering measurements. Currently the best constraint on the BAO scale from photometric measurements is ∼ 4% (Seo et al. 2012; Abbott et al. 2019). This error depends on the survey area, the redshift of the sample and the width of the redshift distribution. We can expect Euclid to measure the BAO scale with percent-level statistical precision in multiple redshift bins over 0 < z < 3. Thus, it will be necessary to reduce the systematic


[Figure 7 panels. Left column: p(z) versus z for the real distribution and the NNLS, LASSO and ElasticNet fits. Right column: (ϑ_BAO^real − ϑ_BAO^fit)/ϑ_BAO^real versus mean redshift for NNLS, LASSO and ElasticNet.]

Fig. 7. Results with noisy spectroscopy. The panels show the same quantities as in Fig. 6, but the example redshift distribution fits come from different colour groups. The redshift distributions estimated for colour groups at higher redshift tend to be less smooth, with more frequent spurious peaks. The LASSO and ElasticNet regression algorithms have free parameters that can be adjusted to give smoother distributions, but at the cost of lower accuracy.

error propagated from the uncertainty in the redshift distributions to the sub-percent level.

We tested the quality of the redshift distribution estimates on mock galaxy catalogues in progressively more realistic cases, with and without noise in the grism spectroscopy. In the most idealized configuration, without internal galaxy attenuation, the redshift distributions were reconstructed with excellent accuracy, about 0.5% on the BAO scale. Spectroscopic noise degraded this error to about 0.6%. We compared three regression algorithms, NNLS, LASSO and ElasticNet. All three performed well, but ElasticNet, which has two free parameters, gave the best results.

Our main conclusion is that the accuracy of the redshift distribution estimation is limited primarily by internal galaxy attenuation and its modelling. Compared with the non-attenuated and fixed attenuation cases, we found a significant loss of precision in the real attenuation analysis, where the attenuation curve varies from galaxy to galaxy. This was the case in both our analyses, with and without spectroscopic measurement noise (Sects. 4.1 and 4.2). Nevertheless, despite the degeneracies introduced


by attenuation, we found that the BAO scale could be recovered with a precision of about 2% (1.4–2.3% for the three estimators; see Table 3). However, this behaviour reveals the importance of a template set and attenuation model that are representative of the galaxy sample.

In this work we used the same template set to build the galaxy spectra and the template matrix (see Sect. 3.2). This ideal situation is not possible when analysing real observations. We expect a further loss of precision in a realistic case, where the template set is not fully representative of the galaxy sample. However, the template set may be optimised, and adding priors may improve the fitting performance. Spectroscopic campaigns such as the ongoing C3R2 will build representative redshift catalogues. These can provide invaluable information to improve the templates, constrain the attenuation models and set priors.

Another idealisation made while building the template matrix that needs to be highlighted is the range of the redshift grid. The redshift grid used in our analyses covers only the redshift range simulated in the Euclid Flagship catalogue. Real catalogues will contain higher-redshift galaxies, so the redshift grid should span a wider range. Nevertheless, we expect that extending the redshift range will not produce a significant loss in precision, although spurious peaks may appear at high redshift if they are degenerate with the adopted attenuation model.

Moreover, it may be possible to improve the fitting performance of the method by introducing inverse-error weights in the stacked spectrum and template matrix, as we suggested in Sect. 2.4. These weights will be useful in the analyses with noisy spectra, to balance the relative importance of the photometry and spectroscopy in the fit and to produce smoother redshift distributions. In addition, they could make feasible the analysis of less populous colour groups, which do not contain enough galaxies to average the noise out of the stacked spectrum.
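The exact weighting scheme of Sect. 2.4 is not reproduced here, but the general idea of inverse-error weighting can be sketched. Assuming each row of a linear system y ≈ X w (stacked-spectrum pixels followed by photometric bands) carries a known 1σ error, dividing that row by σ makes it enter the least-squares objective with weight 1/σ². The sizes, error values and the helper `whiten` below are hypothetical.

```python
import numpy as np


def whiten(X, y, sigma):
    """Scale each row of the system y ~ X w by 1/sigma.

    Minimising ||(y - X w) / sigma||^2 is then equivalent to weighted least
    squares with weights 1/sigma^2.
    """
    sigma = np.asarray(sigma, dtype=float)
    return X / sigma[:, None], y / sigma


rng = np.random.default_rng(1)
n_spec, n_phot, n_bins = 150, 6, 8  # hypothetical: spectral pixels, bands, z bins
X = rng.random((n_spec + n_phot, n_bins))  # toy template matrix
w_true = rng.random(n_bins)  # toy redshift-distribution weights

# Hypothetical 1-sigma errors: noisy stacked-spectrum pixels, precise photometry
sigma = np.concatenate([np.full(n_spec, 0.5), np.full(n_phot, 0.05)])
y = X @ w_true + sigma * rng.standard_normal(sigma.size)

Xw, yw = whiten(X, y, sigma)
# Unconstrained solve for brevity; the rescaled (Xw, yw) could equally be
# passed to a non-negative or regularised solver.
w_fit, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
print(np.round(w_fit - w_true, 3))
```

Because the rescaling acts on the data before the solver is called, the same weighting can be combined with NNLS, LASSO or ElasticNet without changing the solver itself.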

In the case of real observations we should also account for contamination from stars and quasars, for which the template fits may be unreliable. We will also face additional sources of systematic error that we have not addressed here. Grism spectroscopy suffers from contamination due to overlapping spectra (Kümmel et al. 2009). This contamination can particularly spoil the measurement of the galaxy continuum. However, we expect these spurious signals to be uncorrelated between spectra and to average out in the stack. The spectrophotometric calibration error, on the other hand, can systematically alter the shape of all spectra in the stack and bias the fit. The importance of these sources of error will be investigated in a later work.

Future work must also investigate the effect of emission lines in the galaxy SEDs on the stacked spectrum. We expect emission lines to appear in the stacked spectrum as bumps whose width will depend on the photometric redshift bin width. To take the emission lines into account in the analysis, they need to be modelled and added to the template matrix SEDs. Potentially, the emission-line signal would help constrain the template fitting and improve the results, but it could also make the analysis more sensitive to the choice of the template set.
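The expected width of the emission-line bumps follows from λ_obs = λ_rest (1 + z): lines from galaxies spread over a redshift interval Δz are smeared over roughly λ_rest Δz in the observed frame. The toy sketch below assumes Hα (rest wavelength 6562.8 Å in air), redshifts drawn uniformly within a bin, and Gaussian lines of a fixed, hypothetical width; it measures the full width at half maximum of the stacked bump for two bin widths. Real redshift distributions within a colour group are not uniform, so this only illustrates the scaling.

```python
import numpy as np

H_ALPHA = 6562.8  # Halpha rest-frame wavelength in Angstrom (air)


def stacked_line(wave, z, rest_wave=H_ALPHA, sigma=20.0):
    """Average of unit-amplitude Gaussian lines, one per galaxy, each centred
    on its observed wavelength rest_wave * (1 + z). sigma (Angstrom) is a
    hypothetical intrinsic-plus-instrumental line width."""
    centres = rest_wave * (1.0 + np.asarray(z))
    profiles = np.exp(-0.5 * ((wave[None, :] - centres[:, None]) / sigma) ** 2)
    return profiles.mean(axis=0)


def fwhm(wave, profile):
    """Wavelength span over which the profile exceeds half its maximum."""
    above = wave[profile > 0.5 * profile.max()]
    return above.max() - above.min()


wave = np.linspace(12000.0, 16000.0, 2001)  # 2 Angstrom pixels
rng = np.random.default_rng(3)
widths = {}
for dz in (0.05, 0.2):
    z = rng.uniform(1.0, 1.0 + dz, size=2000)  # galaxies spread uniformly over the bin
    widths[dz] = fwhm(wave, stacked_line(wave, z))
    print(f"dz = {dz}: stacked FWHM = {widths[dz]:.0f} A, "
          f"expected ~ rest_wave * dz = {H_ALPHA * dz:.0f} A")
```

Widening the redshift interval by a factor of four widens the stacked bump by roughly the same factor, while lowering its peak, which is why narrow photometric-redshift groups would give sharper line features.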

Our analyses confirm that, in the case of Euclid, stacked spectroscopy adds information that can help to break degeneracies in colour space that affect statistical studies based on photometric redshifts. The approach provides an internal method for calibrating the redshift distributions without relying on representative spectroscopic samples. This is particularly important at the high redshifts and faint galaxy luminosities probed by Euclid, where statistically complete samples of spectroscopic galaxy redshifts are lacking for calibration.

Acknowledgements. The Euclid Consortium acknowledges the European Space Agency and a number of agencies and institutes that have supported the development of Euclid, in particular the Academy of Finland, the Agenzia Spaziale Italiana, the Belgian Science Policy, the Canadian Euclid Consortium, the Centre National d’Etudes Spatiales, the Deutsches Zentrum für Luft- und Raumfahrt, the Danish Space Research Institute, the Fundação para a Ciência e a Tecnologia, the Ministerio de Economia y Competitividad, the National Aeronautics and Space Administration, the National Astronomical Observatory of Japan, the Nederlandse Onderzoekschool Voor Astronomie, the Norwegian Space Agency, the Romanian Space Agency, the State Secretariat for Education, Research and Innovation (SERI) at the Swiss Space Office (SSO), and the United Kingdom Space Agency. A complete and detailed list is available on the Euclid web site (http://www.euclid-ec.org). This work has made use of CosmoHub. CosmoHub has been developed by the Port d’Informació Científica (PIC), maintained through a collaboration of the Institut de Física d’Altes Energies (IFAE), the Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas (CIEMAT) and the Institute of Space Sciences (CSIC & IEEC), and was partially funded by the "Plan Estatal de Investigación Científica y Técnica y de Innovación" program of the Spanish government.

References

Abbott, T. M. C., Abdalla, F. B., Alarcon, A., et al. 2019, MNRAS, 483, 4866
Akeson, R., Armus, L., Bachelet, E., et al. 2019, arXiv e-prints, arXiv:1902.05569
Benítez, N. 2000, ApJ, 536, 571
Benitez, N., Dupke, R., Moles, M., et al. 2014, arXiv e-prints, arXiv:1403.5237
Blas, D., Lesgourgues, J., & Tram, T. 2011, J. Cosmology Astropart. Phys., 2011, 034
Bolzonella, M., Miralles, J. M., & Pelló, R. 2000, A&A, 363, 476
Calzetti, D., Armus, L., Bohlin, R. C., et al. 2000, ApJ, 533, 682
Carretero, J., Castander, F. J., Gaztañaga, E., Crocce, M., & Fosalba, P. 2015, MNRAS, 447, 646
Carretero, J., Tallada, P., Casals, J., et al. 2017, in Proceedings of The European Physical Society Conference on High Energy Physics, PoS(EPS-HEP2017), 488
Connolly, A. J., Csabai, I., Szalay, A. S., et al. 1995, AJ, 110, 2655
Crocce, M., Castander, F. J., Gaztañaga, E., Fosalba, P., & Carretero, J. 2015, MNRAS, 453, 1513
Dark Energy Survey Collaboration, Abbott, T., Abdalla, F. B., et al. 2016, MNRAS, 460, 1270
Davidzon, I., Laigle, C., Capak, P. L., et al. 2019, MNRAS, 489, 4817
DESI Collaboration, Aghamousa, A., Aguilar, J., et al. 2016, arXiv e-prints, arXiv:1611.00036
Doré, O., Werner, M. W., Ashby, M. L. N., et al. 2018, arXiv e-prints, arXiv:1805.05489
Driver, S. P., Hill, D. T., Kelvin, L. S., et al. 2011, MNRAS, 413, 971
Elvin-Poole, J., Crocce, M., Ross, A. J., et al. 2018, Phys. Rev. D, 98, 042006
Euclid Collaboration, Guglielmo, V., Saglia, R., et al. 2020, A&A, 642, A192
Euclid Collaboration, Pocino, A., Tutusaus, I., et al. 2021, arXiv e-prints, arXiv:2104.05698
Guzzo, L., Scodeggio, M., Garilli, B., et al. 2014, A&A, 566, A108
Hartley, W. G., Chang, C., Samani, S., et al. 2020, MNRAS, 496, 4769
Ilbert, O., Capak, P., Salvato, M., et al. 2009, ApJ, 690, 1236
Kohonen, T. 1982, Biological Cybernetics, 43, 59
Kohonen, T. 1990, Proceedings of the IEEE, 78, 1464
Kümmel, M., Walsh, J. R., Pirzkal, N., Kuntschner, H., & Pasquali, A. 2009, PASP, 121, 59
Laureijs, R., Amiaux, J., Arduini, S., et al. 2011, arXiv e-prints, arXiv:1110.3193
Lawson, C. & Hanson, R. J. 1987, Solving least squares problems (Philadelphia: SIAM)
Le Fèvre, O., Vettolani, G., Garilli, B., et al. 2005, A&A, 439, 845
Lima, M., Cunha, C. E., Oyaizu, H., et al. 2008, MNRAS, 390, 118
LSST Science Collaboration, Abell, P. A., Allison, J., et al. 2009, arXiv e-prints, arXiv:0912.0201
Marchetti, A., Granett, B. R., Guzzo, L., et al. 2013, MNRAS, 428, 1424
Masters, D., Capak, P., Stern, D., et al. 2015, ApJ, 813, 53
Masters, D. C., Stern, D. K., Cohen, J. G., et al. 2019, ApJ, 877, 81
Momcheva, I. G., Brammer, G. B., van Dokkum, P. G., et al. 2016, ApJS, 225, 27
Moosavi, V., Packmann, S., & Vallés, I. 2014, SOMPY: A Python Library for Self Organizing Map (SOM), GitHub repository, https://github.com/sevamoo/SOMPY
Newman, J. A. 2008, ApJ, 684, 88
Newman, J. A., Abate, A., Abdalla, F. B., et al. 2015, Astroparticle Physics, 63, 81
Padmanabhan, N., White, M., Chang, T.-C., et al. 2019, arXiv e-prints, arXiv:1903.01571
Pedregosa, F., Varoquaux, G., Gramfort, A., et al. 2011, Journal of Machine Learning Research, 12, 2825
Potter, D., Stadel, J., & Teyssier, R. 2017, Computational Astrophysics and Cosmology, 4, 2
Prevot, M. L., Lequeux, J., Maurice, E., Prevot, L., & Rocca-Volmerange, B. 1984, A&A, 132, 389
Salvato, M., Ilbert, O., & Hoyle, B. 2019, Nature Astronomy, 3, 212
Sánchez, E., Carnero, A., García-Bellido, J., et al. 2011, MNRAS, 411, 277
Schmidt, S. J., Ménard, B., Scranton, R., Morrison, C., & McBride, C. K. 2013, MNRAS, 431, 3307
Schneider, M., Knox, L., Zhan, H., & Connolly, A. 2006, ApJ, 651, 14
Scodeggio, M., Guzzo, L., Garilli, B., et al. 2018, A&A, 609, A84
Scottez, V., Mellier, Y., Granett, B. R., et al. 2016, MNRAS, 462, 1683
Seo, H.-J., Ho, S., White, M., et al. 2012, ApJ, 761, 13
Stanford, S. A., Masters, D., Darvish, B., et al. 2021, arXiv e-prints, arXiv:2106.11367
Tallada, P., Carretero, J., Casals, J., et al. 2020, Astronomy and Computing, 32, 100391
Tibshirani, R. 1996, Journal of the Royal Statistical Society. Series B (Methodological), 58, 267
Virtanen, P., Gommers, R., Oliphant, T. E., et al. 2020, Nature Methods, 17, 261
Wilson, D., Nayyeri, H., Cooray, A., & Häußler, B. 2020, ApJ, 888, 83
Zou, H. & Hastie, T. 2005, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67, 301
Zwicky, F. 1941, PASP, 53, 242

1 Dipartimento di Fisica "Aldo Pontremoli", Università degli Studi di Milano, Via Celoria 16, I-20133 Milano, Italy
2 INAF-Osservatorio Astronomico di Brera, Via Brera 28, I-20122 Milano, Italy
3 INFN-Sezione di Milano, Via Celoria 16, I-20133 Milano, Italy
4 INAF-Osservatorio di Astrofisica e Scienza dello Spazio di Bologna, Via Piero Gobetti 93/3, I-40129 Bologna, Italy
5 Institute of Space Sciences (ICE, CSIC), Campus UAB, Carrer de Can Magrans, s/n, 08193 Barcelona, Spain
6 Institut d’Estudis Espacials de Catalunya (IEEC), Carrer Gran Capità 2-4, 08034 Barcelona, Spain
7 INFN-Sezione di Torino, Via P. Giuria 1, I-10125 Torino, Italy
8 Dipartimento di Fisica, Università degli Studi di Torino, Via P. Giuria 1, I-10125 Torino, Italy
9 INAF-Osservatorio Astrofisico di Torino, Via Osservatorio 20, I-10025 Pino Torinese (TO), Italy
10 Institute of Cosmology and Gravitation, University of Portsmouth, Portsmouth PO1 3FX, UK
11 Universitäts-Sternwarte München, Fakultät für Physik, Ludwig-Maximilians-Universität München, Scheinerstrasse 1, 81679 München, Germany
12 Max Planck Institute for Extraterrestrial Physics, Giessenbachstr. 1, D-85748 Garching, Germany
13 INFN-Sezione di Roma Tre, Via della Vasca Navale 84, I-00146, Roma, Italy
14 Department of Mathematics and Physics, Roma Tre University, Via della Vasca Navale 84, I-00146 Rome, Italy
15 INAF-Osservatorio Astronomico di Capodimonte, Via Moiariello 16, I-80131 Napoli, Italy
16 INAF-IASF Milano, Via Alfonso Corti 12, I-20133 Milano, Italy
17 Institut de Física d’Altes Energies (IFAE), The Barcelona Institute of Science and Technology, Campus UAB, 08193 Bellaterra (Barcelona), Spain
18 INAF-Osservatorio Astronomico di Roma, Via Frascati 33, I-00078 Monteporzio Catone, Italy
19 Department of Physics "E. Pancini", University Federico II, Via Cinthia 6, I-80126, Napoli, Italy
20 INFN section of Naples, Via Cinthia 6, I-80126, Napoli, Italy
21 Dipartimento di Fisica e Astronomia “Augusto Righi” - Alma Mater Studiorum Università di Bologna, via Piero Gobetti 93/2, I-40129 Bologna, Italy
22 INAF-Osservatorio Astrofisico di Arcetri, Largo E. Fermi 5, I-50125, Firenze, Italy
23 Institut national de physique nucléaire et de physique des particules, 3 rue Michel-Ange, 75794 Paris Cédex 16, France
24 Centre National d’Etudes Spatiales, Toulouse, France
25 Institute for Astronomy, University of Edinburgh, Royal Observatory, Blackford Hill, Edinburgh EH9 3HJ, UK
26 Jodrell Bank Centre for Astrophysics, School of Physics and Astronomy, University of Manchester, Oxford Road, Manchester M13 9PL, UK
27 European Space Agency/ESRIN, Largo Galileo Galilei 1, 00044 Frascati, Roma, Italy
28 ESAC/ESA, Camino Bajo del Castillo, s/n., Urb. Villafranca del Castillo, 28692 Villanueva de la Cañada, Madrid, Spain
29 Univ Lyon, Univ Claude Bernard Lyon 1, CNRS/IN2P3, IP2I Lyon, UMR 5822, F-69622, Villeurbanne, France
30 Mullard Space Science Laboratory, University College London, Holmbury St Mary, Dorking, Surrey RH5 6NT, UK
31 Department of Astronomy, University of Geneva, ch. d’Écogia 16, CH-1290 Versoix, Switzerland
32 Université Paris-Saclay, CNRS, Institut d’astrophysique spatiale, 91405, Orsay, France
33 INAF-Osservatorio Astronomico di Padova, Via dell’Osservatorio 5, I-35122 Padova, Italy
34 University of Lyon, UCB Lyon 1, CNRS/IN2P3, IUF, IP2I Lyon, France
35 INAF-Osservatorio Astronomico di Trieste, Via G. B. Tiepolo 11, I-34131 Trieste, Italy
36 Istituto Nazionale di Astrofisica (INAF) - Osservatorio di Astrofisica e Scienza dello Spazio (OAS), Via Gobetti 93/3, I-40127 Bologna, Italy
37 Istituto Nazionale di Fisica Nucleare, Sezione di Bologna, Via Irnerio 46, I-40126 Bologna, Italy
38 Institute of Theoretical Astrophysics, University of Oslo, P.O. Box 1029 Blindern, N-0315 Oslo, Norway
39 Leiden Observatory, Leiden University, Niels Bohrweg 2, 2333 CA Leiden, The Netherlands
40 Jet Propulsion Laboratory, California Institute of Technology, 4800 Oak Grove Drive, Pasadena, CA, 91109, USA
41 von Hoerner & Sulger GmbH, Schloßplatz 8, D-68723 Schwetzingen, Germany
42 Max-Planck-Institut für Astronomie, Königstuhl 17, D-69117 Heidelberg, Germany
43 Institut d’Astrophysique de Paris, 98bis Boulevard Arago, F-75014, Paris, France
44 Aix-Marseille Univ, CNRS/IN2P3, CPPM, Marseille, France
45 AIM, CEA, CNRS, Université Paris-Saclay, Université de Paris, F-91191 Gif-sur-Yvette, France
46 Université de Genève, Département de Physique Théorique and Centre for Astroparticle Physics, 24 quai Ernest-Ansermet, CH-1211 Genève 4, Switzerland
47 Department of Physics and Helsinki Institute of Physics, Gustaf Hällströmin katu 2, 00014 University of Helsinki, Finland
48 NOVA optical infrared instrumentation group at ASTRON, Oude Hoogeveensedijk 4, 7991 PD, Dwingeloo, The Netherlands
49 Argelander-Institut für Astronomie, Universität Bonn, Auf dem Hügel 71, 53121 Bonn, Germany
50 Institute for Computational Cosmology, Department of Physics, Durham University, South Road, Durham, DH1 3LE, UK
51 California Institute of Technology, 1200 E California Blvd, Pasadena, CA 91125, USA
52 INFN-Sezione di Bologna, Viale Berti Pichat 6/2, I-40127 Bologna, Italy
53 Observatoire de Sauverny, Ecole Polytechnique Fédérale de Lausanne, CH-1290 Versoix, Switzerland
54 European Space Agency/ESTEC, Keplerlaan 1, 2201 AZ Noordwijk, The Netherlands
55 Department of Physics and Astronomy, University of Aarhus, Ny Munkegade 120, DK-8000 Aarhus C, Denmark
56 Perimeter Institute for Theoretical Physics, Waterloo, Ontario N2L 2Y5, Canada
57 Department of Physics and Astronomy, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
58 Centre for Astrophysics, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
59 Institute of Space Science, Bucharest, Ro-077125, Romania
60 Departamento de Astrofísica, Universidad de La Laguna, E-38206, La Laguna, Tenerife, Spain
61 Instituto de Astrofísica de Canarias, Calle Vía Láctea s/n, 38204, San Cristóbal de la Laguna, Tenerife, Spain
62 INFN-Sezione di Roma, Piazzale Aldo Moro, 2 - c/o Dipartimento di Fisica, Edificio G. Marconi, I-00185 Roma, Italy
63 INFN-Padova, Via Marzolo 8, I-35131 Padova, Italy
64 Dipartimento di Fisica e Astronomia “G. Galilei”, Università di Padova, Via Marzolo 8, I-35131 Padova, Italy
65 Instituto de Astrofísica e Ciências do Espaço, Faculdade de Ciências, Universidade de Lisboa, Tapada da Ajuda, PT-1349-018 Lisboa, Portugal
66 Departamento de Física, Faculdade de Ciências, Universidade de Lisboa, Edifício C8, Campo Grande, PT1749-016 Lisboa, Portugal
67 Universidad Politécnica de Cartagena, Departamento de Electrónica y Tecnología de Computadoras, 30202 Cartagena, Spain
68 Kapteyn Astronomical Institute, University of Groningen, PO Box 800, 9700 AV Groningen, The Netherlands
69 Infrared Processing and Analysis Center, California Institute of Technology, Pasadena, CA 91125, USA
70 INAF-IASF Bologna, Via Piero Gobetti 101, I-40129 Bologna, Italy
71 Université de Paris, CNRS, Astroparticule et Cosmologie, F-75013 Paris, France
72 Space Science Data Center, Italian Space Agency, via del Politecnico snc, 00133 Roma, Italy
73 Instituto de Astrofísica e Ciências do Espaço, Universidade do Porto, CAUP, Rua das Estrelas, PT4150-762 Porto, Portugal


[Fig. A.1 image. Top panel: p(z) versus z, showing the real distribution and the fits from spectroscopy & photometry, spectroscopy alone and photometry alone. Bottom panel: relative BAO error, (real BAO − fit BAO)/real BAO, versus mean redshift, for spectroscopy & photometry and for photometry alone.]

Fig. A.1. Analysis of the non-attenuated catalogue without measurement noise, obtained with the ElasticNet regularization. The results from the combination of stacked spectroscopy and photometry, stacked spectroscopy alone and stacked photometry alone are plotted in black, green and orange, respectively. Top: an example of a fitted redshift distribution. The real distribution is the filled blue histogram. Bottom: BAO relative error of the analysed colour groups, ordered by redshift. The BAO position could not be fitted in the spectroscopy-only analysis, so the error in this case is not shown.

Appendix A: Spectro-photometry vs. photometry

In Sect. 1 we stated that the combination of stacked spectroscopy and photometry is needed to break the colour-redshift degeneracy and recover detailed redshift distributions. Here we justify this claim by comparing the results of analyses that use the combination of stacked spectroscopy and photometry, stacked spectroscopy alone, and stacked photometry alone.

We analysed a subset of the non-attenuated catalogue without measurement noise. The colour groups for the analysis were selected with the same criterion used for the analyses presented in the paper ($z_{\rm mean} > 1$ and $\sigma_z < 0.2$). In Fig. A.1 we present the results of the analysis that used the ElasticNet regularization, which was the best-performing linear regression method, with the best-fitting parameters labelled $\alpha_{\rm no\,noise}$ and $\beta_{\rm no\,noise}$ in Sect. 3.5. The top panel shows an example redshift distribution fit. The plot makes clear that the analysis with stacked spectroscopy alone (green line) is not able to localise the peak of the redshift distribution. Stacked photometry (orange line), on the other hand, roughly locates the redshift distribution but does not fit its substructure and presents spurious peaks. Finally, the combination of stacked spectroscopy and photometry (black line) breaks the colour-redshift degeneracies and recovers the redshift distribution with a significant improvement in accuracy that can be seen by eye.

The bottom panel of Fig. A.1 shows the BAO relative error derived for all of the colour groups with the combination of stacked spectroscopy and photometry, and with stacked photometry alone. We were unable to fit the BAO angular position in the analysis with stacked spectroscopy alone, because of the dispersed distributions that were recovered, so it is not shown in the plot. The mean BAO errors of the two analyses we present are 0.0044 for the combination of stacked spectroscopy and photometry and 0.010 for stacked photometry alone. Thus, adding spectroscopy to the analysis reduces the error by more than a factor of 2.
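For reference, the factor quoted above follows directly from the two mean BAO errors; the symbols $\bar{\epsilon}$ are introduced here only to write the ratio compactly.

```latex
\frac{\bar{\epsilon}_{\rm phot}}{\bar{\epsilon}_{\rm spec+phot}}
  = \frac{0.010}{0.0044} \approx 2.3
```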

These results justify the choice of combining stacked spectroscopy and photometry. Photometry is indeed needed to locate the redshift distribution, but adding spectroscopic information helps to break the degeneracies in colour-redshift space and significantly improves the constraints.
