a deep learning approach to galaxy cluster x-ray masses

1
A Deep Learning Approach to Galaxy Cluster X-ray Masses Michelle Ntampaka 1, 2 , J. ZuHone 3 , D. Eisenstein 1 , D. Nagai 4 , A. Vikhlinin 1 , L. Hernquist 1 , F. Marinacci 1 , D. Nelson 5 , R. Pakmor 5 , A. Pillepich 6 , P. Torrey 7 , and M. Vogelsberger 8 1 Center for Astrophysics | Harvard & Smithsonian 2 Harvard Data Science Initiative 3 Smithsonian Astrophysical Observatory 4 Yale University 5 Max-Planck-Institut f¨ ur Astrophysik 6 Max-Planck-Institut f¨ ur Astronomie 7 University of Florida 8 Massachusetts Institute of Technology Abstract We present a machine-learning approach for estimating galaxy cluster masses from Chandra mock images. We utilize a Convolutional Neural Network (CNN), a deep machine learning tool commonly used in image recognition tasks. The CNN is trained and tested on our sample of 7,896 Chandra X-ray mock observations, which are based on 329 massive clusters from the IllustrisTNG simulation. Our CNN learns from a low resolution spatial distribution of photon counts and does not use spectral information. Despite our simplifying assumption to neglect spectral information, the resulting mass values estimated by the CNN exhibit small bias in comparison to the true masses of the simulated clusters (-0.02 dex) and reproduce the cluster masses with low intrinsic scatter, 8% in our best fold and 12% averaging over all. In contrast, a more standard core-excised luminosity method achieves 15-18% scatter. We interpret the results with an approach inspired by Google DeepDream and find that the CNN ignores the central regions of clusters, which are known to have high scatter with mass. Introduction: Galaxy Clusters as a Cosmological Probe Galaxy clusters are gravitationally bound systems that contain hundreds or thousands of galaxies in dark matter halos of mass & 10 14 M . They are massive and rare, and their abundance is sensitive to the underlying cosmological model. Utilizing cluster abundance as a cosmological probe requires a large cluster sample with a well-defined selection function and a way to connect cluster observations to the underlying cluster mass with low intrinsic scatter. Methods: Mock Chandra Observations Figure: A sample of 16 of the 7,896 mock X-ray cluster observations created with pyXSIM soft- ware (ZuHone et al., 2014) applied to the IllustrisTNG cosmological hydrodynamical simulation (Springel et al., 2018). The mock observations emulate 100ks Chandra observations that have been degraded to 128 × 128 postage stamp images for one broad 0.5 - 7keV energy band; shown are the number of photons, N , in each pixel for this mock observation. Methods: Convolutional Neural Network input image 128x128x1 convolution pool fully connected feature extraction output global pool convolution convolution pool pool Figure: Convolutional neural networks employ a many-layered network to identify features of an input image, typically for image recognition. Deep convolutional neural networks (CNNs) have been used successfully in computer vision and image recognition. The method utilizes many layers of information processing, with multiple modules, each consisting of a convolutional layer and a pooling layer, and modules stacked to form a deep network. The convolutional layer averages the information within a rectangular window, while the pooling layer subsamples from the convolutional layer. Our model utilizes three layers of convolution and pooling, followed by three fully connected layers (see Ntampaka et al., 2018, for full model details). Results were produced through ten-fold crossvalidation. References ZuHone, J. A., Biffi, V., Hallman, E. J., et al. 2014, in Proceedings of the 13th Python in Science Conference, ed. St´ efan van der Walt & James Bergstra, 103 – 110 Springel, V., Pakmor, R., Pillepich, A., et al. 2018, MNRAS, 475, 676 Ntampaka, M., ZuHone, J., Eisenstein, D., et al. 2018, arXiv e-prints, arXiv:1810.07703 Maughan, B. J. 2007, ApJ, 668, 772 Mantz, A. B., Allen, S. W., Morris, R. G., & von der Linden, A. 2018, MNRAS, 473, 3072 Mordvintsev, A., Olah, C., & Tyka, M. 2015, in Google AI Blog Chollet, F. 2016, How Convolutional Neural Networks See the World Results: Cluster Mass Predictions Figure: PDF of mass error (blue solid) given by log(M predicted ) - log(M true ). The full sample error distribution best fit Gaussian (green dash) has 11.6% intrinsic scatter, while the best-fit fold has 7.7% intrinsic scatter (pink solid and purple dash). Core-excised L X -based methods that use a single measure of cluster luminosity typically achieve a 15% - 18% scatter (red band, Maughan, 2007; Mantz et al., 2018), though it is somewhat larger (15% - 23%) in the sample of simulated clusters. The Y X technique, which uses a the full spatial and spectral observation, can yield a tighter 5% - 7% scatter. Our CNN approach uses low-resolution spatial information to improve mass estimates over one based on a single summary parameter, L X . Interpretation Figure: Convolutional Neural Networks are notoriously difficult to interpret. To understand the method, we use an approach inspired by DeepDream (e.g. Mordvintsev et al., 2015) which, for an image classifier, asks the question “What changes in the input image will result in a significant change in the classification of this image?” In contrast, our implementation regresses an output mass label, so we ask “What changes in the input cluster image will result in a mass change of this image?” Original image (left) credit: NASA/CXC/NGST, public domain. Figure: The fractional change in photons (ΔN/N ) as a function of projected distance from the cluster center (R/R 500c ) for a sample of test clusters. The first iteration (solid red) adds photons beyond 0.2 R 500c , while the second iteration (blue dotted) increases the photon count at larger radii. This suggests that the CNN has learned to excise cores, which have been shown to have large scatter with M 500c . One notable exception (green) occurs when the CNN misidentifies off-center structure as the core. Interpretation tools such as this one can be used to interpret a CNN. For more details on visualizing filters of CNNs implemented in Keras, see Chollet (2016). Summary Convolutional neural networks, a class of machine learning image recognition algorithms, can reduce scatter in galaxy cluster mass estimates. For 100ks Chandra mock observations, they reduce scatter by a factor of 2 or more. The model is not a black box; it produces interpretable results. Ntampaka, et al., ApJ, Volume 876, Issue 1, article id. 82, 7 pp. (2019).

Upload: others

Post on 28-May-2022

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A Deep Learning Approach to Galaxy Cluster X-ray Masses

A Deep Learning Approach to GalaxyCluster X-ray Masses

Michelle Ntampaka1, 2, J. ZuHone3, D. Eisenstein1, D. Nagai4, A. Vikhlinin1,L. Hernquist1, F. Marinacci1, D. Nelson5, R. Pakmor5, A. Pillepich6, P. Torrey7, and

M. Vogelsberger8

1Center for Astrophysics | Harvard & Smithsonian 2Harvard Data Science Initiative 3Smithsonian Astrophysical Observatory

4Yale University 5Max-Planck-Institut fur Astrophysik 6Max-Planck-Institut fur Astronomie 7University of Florida

8Massachusetts Institute of Technology

Abstract

We present a machine-learning approach for estimating galaxy cluster masses from Chandramock images. We utilize a Convolutional Neural Network (CNN), a deep machine learning toolcommonly used in image recognition tasks. The CNN is trained and tested on our sample of 7,896Chandra X-ray mock observations, which are based on 329 massive clusters from the IllustrisTNGsimulation. Our CNN learns from a low resolution spatial distribution of photon counts and doesnot use spectral information. Despite our simplifying assumption to neglect spectral information,the resulting mass values estimated by the CNN exhibit small bias in comparison to the truemasses of the simulated clusters (-0.02 dex) and reproduce the cluster masses with low intrinsicscatter, 8% in our best fold and 12% averaging over all. In contrast, a more standard core-excisedluminosity method achieves 15-18% scatter. We interpret the results with an approach inspiredby Google DeepDream and find that the CNN ignores the central regions of clusters, which areknown to have high scatter with mass.

Introduction: Galaxy Clusters as a Cosmological Probe

Galaxy clusters are gravitationally bound systems that contain hundreds or thousands of galaxiesin dark matter halos of mass & 1014 M�. They are massive and rare, and their abundance issensitive to the underlying cosmological model. Utilizing cluster abundance as a cosmologicalprobe requires a large cluster sample with a well-defined selection function and a way to connectcluster observations to the underlying cluster mass with low intrinsic scatter.

Methods: Mock Chandra Observations

1.0 1.5 2.0 2.5log(N)

Figure: A sample of 16 of the 7,896 mock X-ray cluster observations created with pyXSIM soft-ware (ZuHone et al., 2014) applied to the IllustrisTNG cosmological hydrodynamical simulation(Springel et al., 2018). The mock observations emulate 100ks Chandra observations that havebeen degraded to 128× 128 postage stamp images for one broad 0.5− 7keV energy band; shownare the number of photons, N , in each pixel for this mock observation.

Methods: Convolutional Neural Network

inputimage128x128x1

convolution pool fullyconnected

featureextraction

output

globalpool

convolution convolutionpool pool

Figure: Convolutional neural networks employ a many-layered network to identify features of aninput image, typically for image recognition. Deep convolutional neural networks (CNNs) havebeen used successfully in computer vision and image recognition. The method utilizes manylayers of information processing, with multiple modules, each consisting of a convolutional layerand a pooling layer, and modules stacked to form a deep network. The convolutional layeraverages the information within a rectangular window, while the pooling layer subsamples fromthe convolutional layer. Our model utilizes three layers of convolution and pooling, followed bythree fully connected layers (see Ntampaka et al., 2018, for full model details). Results wereproduced through ten-fold crossvalidation.

References

ZuHone, J. A., Biffi, V., Hallman, E. J., et al. 2014, in Proceedings of the 13th Python in Science Conference, ed. Stefan van der Walt & James Bergstra, 103 – 110

Springel, V., Pakmor, R., Pillepich, A., et al. 2018, MNRAS, 475, 676

Ntampaka, M., ZuHone, J., Eisenstein, D., et al. 2018, arXiv e-prints, arXiv:1810.07703

Maughan, B. J. 2007, ApJ, 668, 772

Mantz, A. B., Allen, S. W., Morris, R. G., & von der Linden, A. 2018, MNRAS, 473, 3072

Mordvintsev, A., Olah, C., & Tyka, M. 2015, in Google AI Blog

Chollet, F. 2016, How Convolutional Neural Networks See the World

Results: Cluster Mass Predictions

0.6 0.4 0.2 0.0 0.2 0.4 0.6log(M500c, predicted) log(M500c, true)

0

2

4

6

8

10

12

PDF

CNN, best fold7.7%CNN, all folds11.6%15%-18%

Figure: PDF of mass error (blue solid) given by log(Mpredicted)− log(Mtrue). The full sample errordistribution best fit Gaussian (green dash) has 11.6% intrinsic scatter, while the best-fit fold has7.7% intrinsic scatter (pink solid and purple dash). Core-excised LX-based methods that use asingle measure of cluster luminosity typically achieve a 15%− 18% scatter (red band, Maughan,2007; Mantz et al., 2018), though it is somewhat larger (15%− 23%) in the sample of simulatedclusters. The YX technique, which uses a the full spatial and spectral observation, can yield atighter 5% − 7% scatter. Our CNN approach uses low-resolution spatial information to improvemass estimates over one based on a single summary parameter, LX.

Interpretation

Figure: Convolutional Neural Networks are notoriously difficult to interpret. To understand themethod, we use an approach inspired by DeepDream (e.g. Mordvintsev et al., 2015) which, foran image classifier, asks the question “What changes in the input image will result in a significantchange in the classification of this image?” In contrast, our implementation regresses an outputmass label, so we ask “What changes in the input cluster image will result in a mass change ofthis image?” Original image (left) credit: NASA/CXC/NGST, public domain.

0.0 0.2 0.4 0.6 0.8 1.0R/R500c

0.95

1.00

1.05

1.10

1.15

1.20

N/N

iteration 1iteration 2

Figure: The fractional change in photons (∆N/N) as a function of projected distance from thecluster center (R/R500c) for a sample of test clusters. The first iteration (solid red) adds photonsbeyond ≈ 0.2R500c, while the second iteration (blue dotted) increases the photon count at largerradii. This suggests that the CNN has learned to excise cores, which have been shown to have largescatter with M500c. One notable exception (green) occurs when the CNN misidentifies off-centerstructure as the core. Interpretation tools such as this one can be used to interpret a CNN. Formore details on visualizing filters of CNNs implemented in Keras, see Chollet (2016).

Summary

Convolutional neural networks, a class of machine learning image

recognition algorithms, can reduce scatter in galaxy cluster mass

estimates. For 100ks Chandra mock observations, they reduce

scatter by a factor of ∼2 or more. The model is not a black box; it

produces interpretable results.

Ntampaka, et al., ApJ, Volume 876, Issue 1, article id. 82, 7 pp. (2019).