arXiv:1903.01998v3 [gr-qc] 10 Mar 2021



Statistically-informed deep learning for gravitational wave parameter estimation

Hongyu Shen,1, 2, 3 E. A. Huerta,1, 3, 4, 5, 6 Eamonn O’Shea,7 Prayush Kumar,7, 8 and Zhizhen Zhao1, 2, 3, 9

1 National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
2 Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
3 NCSA Center for Artificial Intelligence Innovation, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
4 Department of Physics, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
5 Department of Astronomy, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
6 Illinois Center for Advanced Studies of the Universe, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
7 Cornell Center for Astrophysics and Planetary Science, Cornell University, Ithaca, New York 14853, USA
8 International Centre for Theoretical Sciences, Tata Institute of Fundamental Research, Bangalore 560089, India
9 Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA

(Dated: March 11, 2021)

We introduce deep learning models for gravitational wave parameter estimation that combine a modified WaveNet architecture with contrastive learning and normalizing flows. To ascertain the statistical consistency of these models, we validated their predictions against a Gaussian conjugate prior family whose posterior distribution is described by a closed analytical expression. Upon confirming that our models produce statistically consistent results, we used them to estimate the astrophysical parameters of five binary black holes: GW150914, GW170104, GW170814, GW190521 and GW190630. Our findings indicate that our deep learning approach predicts posterior distributions that encode physical correlations, and that our data-driven median results and 90% confidence intervals are consistent with those obtained with gravitational wave Bayesian analyses. This methodology requires a single NVIDIA V100 GPU to produce median values and posterior distributions within two milliseconds for each event. This neural network, and a tutorial for its use, are available at the Data and Learning Hub for Science.

I. INTRODUCTION

The advanced LIGO [1, 2] and advanced Virgo [3] observatories have reported the detection of over fifty gravitational wave sources [4–6]. At design sensitivity, these instruments will probe a larger volume of space, thereby increasing the detection rate of sources populating the gravitational wave spectrum. Given the expected scale of gravitational wave discovery in upcoming observing runs, it is timely to explore the use of computationally efficient signal-processing algorithms for gravitational wave detection and parameter estimation.

The rationale for developing scalable and computationally efficient signal-processing tools is apparent. Advanced gravitational wave detectors will be just one of many large-scale science programs competing for access to oversubscribed and finite computational resources [7–10]. Furthermore, transformational breakthroughs in Multi-Messenger Astrophysics over the next decade will be enabled by combining observations in the gravitational, electromagnetic and astro-particle spectra. Combining these high-dimensional, large-volume and high-speed datasets in a timely and innovative manner presents unique challenges and opportunities [11–13].

The realization that companies such as Google and YouTube have addressed some of the big-data challenges we face in Multi-Messenger Astrophysics has motivated a number of researchers to study what these companies have done, and how their innovations may be adapted to maximize the science reach of big-data projects. The most successful approach to date combines deep learning with innovative, extreme-scale computing.

Deep learning was first proposed as a novel signal-processing tool for gravitational wave astrophysics in [14]. That initial approach considered a 2-D signal manifold for binary black hole mergers, namely the masses of the binary components (m1, m2), and simulated advanced LIGO noise. The fact that this method was as sensitive as template-matching algorithms, but at a fraction of the computational cost and orders of magnitude faster, provided sufficient motivation to extend the methodology and apply it to detect real gravitational wave sources in advanced LIGO noise [15, 16]. These studies have sparked the interest of the gravitational wave community in the use of deep learning for the detection of the large zoo of gravitational wave sources [17–38].

Deep learning methods have matured to now cover a 4-D signal manifold that describes the masses of the binary components and the z-component of the 3-D spin vector: (m1, m2, s^z_1, s^z_2) [39, 40]. These algorithms have been used to search for and find gravitational wave sources by processing open source advanced LIGO data in bulk, which is available at the Gravitational Wave Open Science Center [41]. In the context of Multi-Messenger sources, deep learning has been used to forecast the merger of binary neutron stars and black hole-neutron star systems [37]. The importance of including eccentricity in deep learning forecasting has also been studied and quantified [38]. In brief, deep learning research is moving at an incredible pace.

Another application area that has gained traction is the use of deep learning for gravitational wave parameter estimation. The established approach to estimate the astrophysical parameters of gravitational wave signals is through Bayesian inference [42–45], which is a well tested and extensively used method, though computationally intensive. On the other hand, given the scalability and computational efficiency of deep learning models, it is natural to explore their applicability for gravitational wave parameter estimation.

In this article we focus on the application of deep learning to estimate in real-time the astrophysical parameters of binary black holes, focusing on the measurement of the masses of the binary components, (m1, m2), and the parameters that describe the post-merger remnant, i.e., the final spin, a_f, and the frequency and damping time of the ringdown oscillations of the fundamental ℓ = m = 2 bar mode, (ω_R, ω_I), also known as quasinormal modes (QNMs) [46].

Related Work The first exploration of deep learning for the detection and point-parameter estimation of a 2-D signal manifold, comprising the masses (m1, m2) of the binary components, was presented in [14–16]. These results provided a glimpse of the robustness and scalability of neural networks, and of their ability to generalize to new types of signals that are not used during training. They provided sufficient motivation to further develop these tools for use in the context of high-dimensional signal manifolds.

During the review of this article, a number of studies have used deep learning approaches to learn the posterior distribution predicted by traditional Bayesian methods employed for gravitational wave parameter estimation. For instance, Ref. [47] used neural networks to process simulated gravitational waves injected in Gaussian noise, and to output a parametrized approximation of the corresponding posterior distribution. Other methods have proposed the use of Conditional Variational Auto-Encoders (CVAEs) to infer the parameters of gravitational waves embedded in simulated noise [48–50]. In these studies, the posterior estimation is indirectly formed by an integration of two probability functions that are learned after model training. Instead of calculating the integral, they draw one sample from each of the two probability functions sequentially to approximate the posterior function. This approach may work well for high signal-to-noise ratio (SNR) signals, the context in which these methods have been applied. However, real gravitational wave events with moderate SNRs may present a challenge, since noisy data may introduce greater variance in the integration approximation, leading to a significant accumulation of errors. Thus, further studies are needed to establish the applicability of this approach to reconstruct the parameters of detected events with moderate and low SNRs. To overcome some of these limitations, we designed a model that is able to make fast inference on events that cover a broad network SNR range, between 13 and 24, with a short training time.

We note that while Refs. [51, 52] discuss similar ideas to learn a joint distribution, the approach we introduce in this article to produce a data-driven posterior distribution is entirely novel. Furthermore, while Refs. [51, 52] apply q(z), an arbitrary random distribution, to their generative model, our posterior distributions do not involve arbitrary random distributions.

Although a similar idea of applying normalizing flow techniques to characterize the distribution exists in previous works [53, 54], our method still differs from the recent gravitational wave study [53], in which the authors showcase their method for a single event, GW150914, and apply a data transformation to the original 8s-long waveform inputs during model training and inference. Our method, on the other hand, focuses on 1s-long time domain inputs. Furthermore, our method only overlaps with theirs in that we both measure the masses of the binary components. Their method is then used to measure other astrophysical parameters, while we apply our method to measure for the first time the astrophysical parameters of the black hole remnant. We also demonstrate that our proposed approach is able to generalize to other events from the first, second and third observing runs, even though we only use noise from the first observing run to train our model. We provide a more detailed list of highlights of this deep learning methodology below.

Highlights of This Work Our novel methodology may be summarized as follows:

• We have designed new architectures and training schemes to demonstrate that deep learning provides the means to reconstruct the parameters of binary black hole mergers in realistic astrophysical settings, i.e., quasi-circular, spinning, non-precessing binary black holes. This 4-D signal manifold marks the first time deep learning models at scale are used for gravitational wave data analysis, i.e., models trained using datasets with tens of millions of waveforms while significantly reducing the training stage. Once fully trained, these deep learning models can reconstruct in real-time the parameters of real gravitational wave sources.

• We guide the reader through the construction of the mathematical and statistical foundations of our deep learning method. Thereafter, we study in detail the predictions of our methodology against well known statistical models. This exercise establishes the statistical validity and consistency of our approach.

• This methodology learns physical correlations during the training stage. As we show below, these known correlations emerge naturally in the data-driven posterior distributions predicted by our model.

• Our approach is domain agnostic. It can be readily applied to other time-series datasets.

• We trained a single model using open source advanced LIGO noise from the first observing run. This model was able to generalize to five events, GW150914, GW170104, GW170814, GW190521 and GW190630, without additional fine-tuning. It is remarkable that these events were observed in different observing runs, and cover a broad SNR range.

• We have directly compared our data-driven posterior distributions with those produced by the Bayesian pipeline PyCBC Inference [45, 55, 56]. Though these two methodologies are entirely different, we find that their predictions are consistent.

• Our model only requires 1s-long modeled signals to perform inference, which reduces data storage and minimizes training time.

This article is organized as follows. In Section II we describe the architecture of our neural network model, and the datasets used to train, validate and test it. We briefly describe the Bayesian inference pipeline, PyCBC Inference, in Section III, which we used as a baseline to compare the full posterior distributions predicted by our deep learning model. We quantify the accuracy and physical consistency of the predictions of our deep learning model for several gravitational wave sources in Section IV. We summarize our findings and future directions of work in Section V.

II. METHODS

In this section we describe our methodology for parameter estimation, and introduce several new techniques to improve the training performance and model accuracy. We have used PyTorch [57] to design, train, validate and test the neural network models presented in this section.

A. Deep Learning Model Objective

We briefly introduce the observation model, followed by our deep learning model formulation and implementation details for the evaluation with real gravitational wave events.

The goal of our deep learning model is to estimate a posterior distribution of the physical parameters of the waveforms from the input noisy data. This approach shares similarities with Bayesian approaches such as MCMC, e.g., once a likelihood function and a predefined prior are provided, posterior samples may be drawn. The difference between the deep learning model and MCMC is that our proposed framework learns a distribution from which we can easily draw samples, thereby increasing computational efficiency significantly. It is worth emphasizing that once the likelihood model is properly defined, the framework we introduce here may be applicable to other disciplines.

In the context of gravitational waves, the noisy waveform y is generated according to the following physical model,

y_{i,ℓ} = F(x_i) + n_{i,ℓ}, (1)

where F is the function that maps the physical parameters (masses and spins) x_i to the clean waveform [45, 55, 56], and n_{i,ℓ} denotes the additive noise at various signal-to-noise ratios (SNRs). We use y_{i,ℓ} with subscript pair (i, ℓ) to specifically indicate the i-th template associated with the ℓ-th noise realization in our dataset D. For simplicity, we use y and x to indicate noisy waveforms and the physical parameters when the specification of the i or ℓ subscript is not needed. We use K and M to denote the dimensions of y and x, respectively.
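As a concrete illustration of the observation model in Eq. (1), the sketch below builds noisy training pairs from a toy chirp standing in for F(x_i); the function name and the SNR scaling convention are illustrative, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_noisy_waveform(clean, snr, rng):
    # Scale a whitened template to a target optimal SNR and add
    # unit-variance white Gaussian noise: y_{i,l} = F(x_i) + n_{i,l}.
    template = clean / np.linalg.norm(clean) * snr
    noise = rng.standard_normal(clean.shape)
    return template + noise

t = np.linspace(0.0, 1.0, 4096)               # 1 s sampled at 4096 Hz
clean = np.sin(2 * np.pi * 60.0 * t**2)       # toy chirp in place of F(x_i)
y_1 = make_noisy_waveform(clean, snr=20.0, rng=rng)
y_2 = make_noisy_waveform(clean, snr=20.0, rng=rng)  # second noise realization
```

Drawing two realizations per template mirrors the data pairs used later by the contrastive term.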

We use WaveNet [58] to extract features from the input noisy waveforms. WaveNet was first introduced as an audio synthesis tool to generate human-like audio from random inputs. It uses dilated convolutional kernels and residual networks to capture spatial information both in the time domain and across the model depth, and has been shown to be a powerful tool for modeling time-series data. Previously, [59, 60] tailored this architecture for gravitational wave denoising and detection. The encoded feature vector h ∈ R^L comes from an embedding function parameterized by the WaveNet weights ω, f_ω : y → h. In other words, h = f_ω(y).
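The encoder f_ω can be sketched in PyTorch as a stack of gated, dilated residual blocks pooled into a feature vector. The channel count, dilation schedule and class names below are our own illustrative choices, not the authors' exact architecture; only the block count (11) and output dimension (254) follow the text.

```python
import torch
import torch.nn as nn

class DilatedBlock(nn.Module):
    # One residual block with a gated, dilated convolution, WaveNet-style.
    def __init__(self, channels, dilation):
        super().__init__()
        self.filter = nn.Conv1d(channels, channels, 3,
                                padding=dilation, dilation=dilation)
        self.gate = nn.Conv1d(channels, channels, 3,
                              padding=dilation, dilation=dilation)
        self.residual = nn.Conv1d(channels, channels, 1)

    def forward(self, x):
        z = torch.tanh(self.filter(x)) * torch.sigmoid(self.gate(x))
        return x + self.residual(z)

class Encoder(nn.Module):
    # f_omega: maps a noisy waveform y to a feature vector h.
    def __init__(self, channels=32, n_blocks=11, feature_dim=254):
        super().__init__()
        self.inp = nn.Conv1d(1, channels, 1)
        self.blocks = nn.Sequential(
            *[DilatedBlock(channels, 2 ** (i % 8)) for i in range(n_blocks)])
        self.pool = nn.AdaptiveAvgPool1d(1)
        self.out = nn.Linear(channels, feature_dim)

    def forward(self, y):                     # y: (batch, 1, 4096)
        z = self.pool(self.blocks(self.inp(y))).squeeze(-1)
        return self.out(z)                    # h: (batch, 254)

h = Encoder()(torch.randn(2, 1, 4096))
```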

Normalizing flow is a technique to transform distributions with invertible parameterized functions. Specifically, we use a conditional version of normalizing flow, the conditional autoregressive spline [61–63], to learn the posterior distribution on top of the latent space produced by the WaveNet encoding. The actual implementation of the conditional autoregressive spline [64, 65] is provided in the probabilistic programming language Pyro [63]. The invertible function g_{(h,θ)} : z → x is parameterized by the learnable model weights θ and the encoded feature h. In this way, we encode dependencies of the posterior distribution on the input y. The random vector z ∈ R^M is drawn according to a pre-defined base distribution p(z), and has the same dimension as x. The function g_{(h,θ)}(z) is then used to convert the base distribution p(z) to the approximated posterior distribution p_{ω,θ}(x|y) of the physical parameters,

p_{ω,θ}(x|y) = p(z) |det(∂g_{(h,θ)}(z)/∂z)|^{-1}, (2)

with h = f_ω(y).

The computation of the transformation g_{(h,θ)}(z) contains two steps. The first step is to compute the intermediate coefficients α from the feature vector h using the function k_θ, which is parameterized by 2 fully connected layers with weights denoted as θ, i.e., α = k_θ(h). The coefficients α are used to combine the invertible linear rational splines to form g_{(h,θ)} (see Eq. (5) in [62] for details). Therefore, g_{(h,θ)} is an invertible, element-wise linear rational spline with coefficients α. Since h depends on the input waveform y and α = k_θ(h), the resulting mapping g_{(h,θ)} and the parameterized distribution in Eq. (2) vary with the input y. The parameterization of the estimated posterior distribution is illustrated in Fig. 1.
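To make the change-of-variables formula in Eq. (2) concrete, here is a minimal one-dimensional sketch that substitutes a conditional affine map for the conditional autoregressive spline; the conditioning functions s(h) and t(h) are illustrative stand-ins for the coefficients α = k_θ(h).

```python
import numpy as np

def flow_logpdf(x, h, theta):
    # Toy conditional flow g_(h,theta)(z) = s(h)*z + t(h) with a standard
    # normal base distribution p(z). Eq. (2): p(x|y) = p(z) |det dg/dz|^{-1}.
    s = np.exp(theta[0] * h)             # scale conditioned on the feature h
    t = theta[1] * h                     # shift conditioned on h
    z = (x - t) / s                      # invert g to recover z
    log_base = -0.5 * (z**2 + np.log(2.0 * np.pi))
    return log_base - np.log(np.abs(s))  # subtract log|det dg/dz|

# sanity check: the induced density integrates to ~1 for any conditioning h
xs = np.linspace(-20.0, 20.0, 20001)
pdf = np.exp(flow_logpdf(xs, h=0.5, theta=(0.3, 1.0)))
total = pdf.sum() * (xs[1] - xs[0])
```

The spline used in the paper replaces the affine map with a piecewise linear rational function, but the log-determinant bookkeeping is identical.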

To learn the network weights, we construct the empirical loss objective given the collection of training data {x_i, y_{i,ℓ}}. We propose to include a loss term defined on the feature vectors in our learning objective, to account for the variation in the waveform due to noise. That is, if the underlying physical parameters are similar, then the similarity of the feature vectors should be large, and vice versa. To achieve this, we use a contrastive learning objective [66] to distinguish positive data pairs (waveforms with the same physical parameters) from negative pairs (noisy waveforms with different physical parameters). Specifically, we use the normalized temperature-scaled cross entropy (NT-Xent) loss used in the state-of-the-art contrastive learning technique SimCLR [67, 68]. SimCLR was originally introduced to improve the performance of image classification with additional data augmentation and NT-Xent loss evaluation. We adapt the NT-Xent loss used in contrastive learning to our feature vectors,

l(h_{i,j}, h_{i,ℓ}) ≡ −log [ exp(sim(h_{i,j}, h_{i,ℓ})/τ) / Σ_{i′≠i} Σ_{j=1}^{2} Σ_{ℓ=1}^{2} exp(sim(h_{i,j}, h_{i′,ℓ})/τ) ], (3)

where h_{i,·} = f_ω(y_{i,·}), and τ ∈ (0, ∞) is a scalar temperature parameter; we choose τ = 0.2 according to the default setting provided in [67]. The NT-Xent loss ensures that, regardless of the noise statistics, the cosine distances of the encoded features associated with the same underlying physical parameters (i.e., h_{i,j} and h_{i,ℓ}) are minimized, and the distances of features with different underlying physical parameters are maximized. Consequently, the trained model is robust to changes in noise realizations and noise statistics. Therefore, the term in Eq. (3) acts as a noise stabilizer for gravitational wave parameter estimation. We found that the inclusion of this term speeds up convergence in training.
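A batched PyTorch version of the NT-Xent term in Eq. (3) can be sketched as follows; this follows the standard SimCLR formulation with cosine similarity, and the helper name is ours, not the authors'.

```python
import torch
import torch.nn.functional as F

def nt_xent(h1, h2, tau=0.2):
    # NT-Xent over a batch of feature pairs (h_{i,1}, h_{i,2}): each sample's
    # positive is its counterpart in the other view; every other sample in
    # the doubled batch acts as a negative.
    h = F.normalize(torch.cat([h1, h2], dim=0), dim=1)   # 2B x L, unit norm
    sim = h @ h.t() / tau                                # scaled cosine sims
    n = h.shape[0]
    sim.fill_diagonal_(float('-inf'))                    # exclude self-pairs
    pos = torch.arange(n).roll(n // 2)                   # index of the positive
    return F.cross_entropy(sim, pos)

torch.manual_seed(0)
a, b = torch.randn(8, 254), torch.randn(8, 254)
loss_matched = nt_xent(a, a)    # identical views: positives dominate
loss_random = nt_xent(a, b)     # unrelated views: near chance level
```

With identical views the positive similarity saturates at 1/τ, so the loss is far below the chance value of roughly log(2B − 1).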

Our deep learning objective in Eq. (4) combines the NT-Xent loss in Eq. (3) with the posterior approximation term. Given a batch of B physical parameters x_i, we generate different noise realizations y_{i,ℓ} for each x_i, and the empirical loss function is

L(ω, θ) = (1/2B) Σ_{i=1}^{B} [ −Σ_{ℓ=1}^{2} log p_{ω,θ}(x_i|y_{i,ℓ}) + Σ_{j=1}^{2} Σ_{ℓ=1, ℓ≠j}^{2} l(f_ω(y_{i,j}), f_ω(y_{i,ℓ})) ], (4)

where p_{ω,θ}(x_i|y_{i,ℓ}) is defined in Eq. (2). Minimizing the loss in Eq. (4) with respect to ω and θ provides a posterior estimation for gravitational wave events.
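Structurally, Eq. (4) averages the flow's negative log-posterior over both noise realizations of each template and adds the pairwise contrastive term; a schematic with stand-in callables for Eqs. (2) and (3) looks like:

```python
def empirical_loss(log_posterior, contrastive, batch):
    # batch: list of (x_i, y_i1, y_i2) triples; log_posterior and contrastive
    # are stand-ins for the flow density of Eq. (2) and the NT-Xent of Eq. (3).
    total = 0.0
    for x_i, y_i1, y_i2 in batch:
        total += -(log_posterior(x_i, y_i1) + log_posterior(x_i, y_i2))
        total += contrastive(y_i1, y_i2) + contrastive(y_i2, y_i1)
    return total / (2 * len(batch))

# toy check with trivial stand-ins: each item contributes 2 + 1 = 3
val = empirical_loss(lambda x, y: -1.0, lambda u, v: 0.5, [(0, 1, 2)] * 4)
```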

B. Separate Models for Parameters

In this paper, we are interested in the following physical parameters: (m1, m2, a_f, ω_R, ω_I). We find that trying to estimate all parameters using a single model leads to sub-optimal results, given that they are of different scales.

FIG. 1. Model Architecture. The first component of our model is a WaveNet architecture with 11 blocks, whose input is a 1s-long waveform sampled at 4096 Hz, denoted by y. The output of the WaveNet module is a 254-dimensional vector, h, that is fed into a Normalizing Flow module, which is then combined with a base distribution, p(z), to provide the posterior distribution estimation p_{ω,θ}(x|y). z represents the random variable for the base distribution, and x represents the physical parameters of the binary black hole mergers.

Thus, we use two separate models with a similar model architecture, as shown in Fig. 1. One model is used to estimate the masses (m1, m2) of the binary components, while the other is used to infer the final spin (a_f) and QNMs (ω_R, ω_I) of the remnant.

The final spin of the remnant and its QNMs have a similar range of values when the QNMs are cast in dimensionless units. We trained the second model using the fact that the QNMs are determined by the final spin a_f through the relation [46]:

ω_{220}(a_f) = ω_R + i ω_I, (5)

where (ω_R, ω_I) correspond to the frequency and damping time of the ringdown oscillations for the fundamental ℓ = m = 2 bar mode and the first overtone n = 0. We compute the QNMs following [46]. One can translate ω_R into the ringdown frequency (in units of Hertz) and ω_I into the corresponding (inverse) damping time (in units of seconds) by computing M_f · ω_{220}, where M_f is the final mass of the remnant, which can be determined using Eq. (1) in [69]. An additional benefit of using two separate models is that training converges faster when each model considers a set of physical parameters of similar magnitude.
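The M_f · ω_{220} conversion described above can be sketched as follows. The QNM values in the example are illustrative numbers in the right ballpark for a remnant with a_f ≈ 0.7 (from standard QNM tables, not this paper), and we assume the e^{−iωt} convention in which ω_I < 0 for a damped mode.

```python
import math

MSUN_SEC = 4.925490947e-6   # G * Msun / c^3: one solar mass in seconds

def qnm_to_physical(omega_r, omega_i, m_final_msun):
    # Convert the dimensionless l = m = 2, n = 0 QNM, omega_R + i*omega_I,
    # into a ringdown frequency (Hz) and damping time (s) by scaling with
    # the remnant mass M_f, as described in the text.
    m_sec = m_final_msun * MSUN_SEC
    f_ring = omega_r / (2.0 * math.pi * m_sec)   # Hz
    tau = -m_sec / omega_i                       # s, assuming omega_i < 0
    return f_ring, tau

# illustrative values for a ~62 Msun remnant with a_f ~ 0.7
f, tau = qnm_to_physical(0.5326, -0.0807, 62.0)
```

This places the ringdown of a GW150914-like remnant at a few hundred Hz with a millisecond-scale damping time, consistent with the scalings quoted in the ringdown literature.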

C. Dataset Preparation and Training

Modeled Waveforms We used the surrogate waveform family [70] to produce modeled waveforms that describe binary black holes with component masses m1 ∈ [10 M⊙, 80 M⊙], m2 ∈ [10 M⊙, 50 M⊙], and spin components s^z_{1,2} ∈ [−0.9, 0.9]. By uniformly sampling this parameter space we produce a dataset with 1,061,023 waveforms. These waveforms describe the last second of the late inspiral, merger, and ringdown, and are produced using a sample rate of 4096 Hz.

For training purposes, we label the waveforms using the masses and spins of the binary components, and then use this information to also enable the neural net to estimate the final spin of the black hole remnant using the formulae provided in [71], and the quasinormal modes of the ringdown following [46]. In essence, we are training our neural network models to identify the key features that determine the properties of the binary black holes before and after merger using a unified framework.

We use 90% of these waveform samples for training and 10% for testing. The training samples are randomly and uniformly chosen. Throughout the training, we use the AdamW optimizer to minimize the mean squared error of the predicted parameters with the default hyper-parameter setup [72]. We choose the learning rate to be 0.0001. To simulate the environment in which true gravitational waves are embedded, we use real advanced LIGO noise to compute the power spectral density (PSD), which is then used to whiten the templates.

Advanced LIGO noise. For training we used a 4096s-long advanced LIGO noise data segment, sampled at 4096 Hz, starting at GPS time 1126259462. We obtained these data from the Gravitational Wave Open Science Center [41]. We estimate a PSD using the entire 4096s segment to whiten the modeled waveforms and noise. For each one-second-long noisy waveform used in training, we combine the clean whitened template with a randomly picked one-second-long noise segment from the 4096s-long advanced LIGO strain data. For each clean template, we generate two different noise realizations. The number of different noisy waveforms used in training is equal to the number of training iterations multiplied by two times the batch size.
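The whitening step can be sketched with a Welch PSD estimate; the segment length and the final unit-variance normalization below are our illustrative choices, not the authors' pipeline settings.

```python
import numpy as np
from scipy.signal import welch

def whiten(strain, fs=4096, seg_sec=4):
    # Estimate the one-sided PSD with Welch's method, then divide the rFFT
    # of the strain by sqrt(PSD) interpolated onto the FFT frequency grid.
    freqs, psd = welch(strain, fs=fs, nperseg=seg_sec * fs)
    spec = np.fft.rfft(strain)
    fft_freqs = np.fft.rfftfreq(len(strain), 1.0 / fs)
    white = np.fft.irfft(spec / np.sqrt(np.interp(fft_freqs, freqs, psd)),
                         n=len(strain))
    return white / np.std(white)     # normalize to unit variance

strain = np.random.default_rng(1).standard_normal(4096 * 8)  # toy 8 s segment
w = whiten(strain)
```

In practice the PSD would come from the full 4096 s segment described above, with the same divide-by-sqrt(PSD) step applied to both templates and noise.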

In Section IV, we demonstrate that our model, trained only with advanced LIGO noise from the first observing run, is able to estimate the astrophysical parameters of events across O1-O3. We fixed the merger point of the training templates at the 3,596th timestep out of 4,096 total timesteps. We empirically found that having a fixed merger point, rather than shifting the templates to obtain a time-invariance property, provides a tighter estimation of the posteriors. Our deep learning model was trained on one NVIDIA V100 GPU with a batch size of 8. In general, it takes about 1-2 days to fully train this model.

D. GPS Trigger Time

It is known that the trigger GPS time associated with a gravitational wave event, typically provided by a detection algorithm, may differ from the true time of coalescence.

Therefore, we perform a local search around the trigger time provided by any given detection algorithm as a preprocessing step for parameter estimation with the trained model. We first identify local merger time candidates by evaluating the normalized cross-correlation (NCC) of the whitened observation with 33,713 whitened clean templates, whose physical parameters uniformly cover the range m1 ∈ [10 M⊙, 80 M⊙], m2 ∈ [10 M⊙, 50 M⊙], and s^z_{1,2} ∈ [−0.9, 0.9], over a time window of 0.015 seconds around the time candidates. The time points with the top NCC values are selected as candidates. We then use the trained models to estimate the posterior distributions of the physical parameters at each candidate time point. In practice, we found that the trigger times with the best NCC values differ from those published at the Gravitational Wave Open Science Center by up to 0.01s. These trigger times produce different posterior distributions that vary in size by up to ±1 M⊙ for the masses of the binary components, and by up to 5% for the astrophysical properties of the compact remnant. We have selected the time point that gives the smallest 90% confidence area for the results we present in Section IV B.
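The NCC scan over candidate merger times can be sketched as follows; the template, injection point and amplitude are illustrative, and the normalization (template norm times windowed data energy) is one standard NCC convention rather than the authors' exact implementation.

```python
import numpy as np

def ncc(data, template):
    # Correlate a unit-norm template against every alignment of the data
    # and normalize each score by the energy of the corresponding window.
    t = template / np.linalg.norm(template)
    scores = np.correlate(data, t, mode='valid')
    c = np.cumsum(np.concatenate(([0.0], data**2)))
    window_energy = c[len(t):] - c[:-len(t)]
    return scores / np.sqrt(window_energy + 1e-12)

rng = np.random.default_rng(2)
ts = np.linspace(0.0, 0.25, 1024)
tmpl = np.sin(2 * np.pi * 300.0 * ts**2)      # toy whitened chirp template
data = rng.standard_normal(4096)
data[2000:3024] += 5.0 * tmpl                 # inject the signal at index 2000
best = int(np.argmax(ncc(data, tmpl)))        # recovered alignment
```

The index of the NCC maximum recovers the injection point, which is the role the scan plays in locating merger-time candidates near the trigger.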

III. BAYESIAN INFERENCE

We compare our data-driven posterior estimation with PyCBC Inference [45, 55, 56], which uses a parallel-tempered MCMC algorithm, emcee_pt [73], to evaluate the posterior probability p(x|y) for the set of source parameters x given the data y. The posterior is calculated as p(x|y) ∝ p(y|x)p(x), where p(y|x) is the likelihood and p(x) is the prior. The likelihood function for a set of N detectors is

p(y|x) = exp( −(1/2) Σ_{i=1}^{N} ⟨y_i(k) − s_i(k, x) | y_i(k) − s_i(k, x)⟩ ), (6)

where y_i(k) and s_i(k, x) are the frequency-domain representations of the data and the model waveform for detector i. The inner product ⟨·|·⟩ is defined as

⟨a_i(k) | b_i(k)⟩ = 4 Re ∫_0^∞ [ a_i(k) b_i*(k) / P_i(k) ] dk, (7)

where P_i(k) is the PSD of the i-th detector.

We performed the MCMC analysis using the publicly available data from the GWTC-1 release [4] and used the corresponding publicly available PSD files for each event [74]. We analyse a segment of 8 seconds around the GPS trigger 1167559935.6, with the data sampled to 2048 Hz. We use the IMRPhenomD [75] waveform model to generate waveform templates to evaluate the likelihood. We assume uniform priors for the component masses with m_{1,2} ∈ [10 M⊙, 80 M⊙) and uniform priors on the component spins with a_{1,2} ∈ (−0.99, 0.99). We also set uniform priors on the luminosity distance with D_L ∈ [10, 4000) Mpc and on the deviation of the arrival time from the trigger time, −0.1 < ∆t < 0.1. We set uniform priors for the coalescence phase and the polarization angle, φ_c, ψ ∈ [0, 2π). The prior on the inclination angle between the binary's orbital angular momentum and the line of sight, ι, is set to be uniform in the sine of the angle, and the right ascension and declination have priors uniform over the sky.
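Equations (6) and (7) translate directly into a discrete frequency-domain computation. The sketch below is a minimal illustration of that likelihood, not the PyCBC Inference implementation; the function names and the assumption of a uniform one-sided frequency grid with spacing `df` are ours.

```python
import numpy as np

def inner_product(a, b, psd, df):
    """Discrete version of Eq. (7): <a|b> = 4 Re sum_k a(k) conj(b(k)) / P(k) * df,
    for one-sided frequency-domain arrays `a`, `b` and detector PSD `psd`."""
    return 4.0 * df * np.real(np.sum(a * np.conj(b) / psd))

def log_likelihood(data_fd, model_fd, psds, df):
    """Eq. (6), up to an additive normalization constant: the sum over
    detectors of -0.5 <d - h | d - h> for residuals d - h."""
    return -0.5 * sum(
        inner_product(d - h, d - h, p, df)
        for d, h, p in zip(data_fd, model_fd, psds)
    )
```

An MCMC sampler such as emcee_pt would then evaluate this log-likelihood, plus the log-prior, at each proposed parameter point.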

Furthermore, they may be used to cross-validate the physical reality of an event [39, 40], and to assess whether the estimated merger time is consistent between the two separate models. For instance, if the models output very different merger times, then we may conclude that they are not providing a reliable merger time. On the other hand, when their results are consistent, within a window between 0.001 s and up to 0.005 s, then we can remove the ambiguity introduced when using the NCC approach described in Section II D.

IV. EXPERIMENTAL RESULTS

In this section we present two types of results. First, we validate our model against a well known statistical model. Upon confirming that our deep learning approach is statistically consistent, we use it to estimate the parameters of five binary black hole mergers.

A. Validation on Simulated Data

We performed experiments on simulated data that have closed-form posterior distributions. This is important to ascertain the accuracy and reliability of our method. The simulated data are generated through a linear observation model with additive white Gaussian noise,

\[
y = Ax + n, \tag{8}
\]

where the additive noise n ∼ N(0, σ²I). We consider the underlying parameters x ∈ R^M and the linear map A ∈ R^{K×M}, with M = 2 and K = 5. The likelihood function is

\[
p(y|x) = \frac{1}{(\sqrt{2\pi}\,\sigma)^{K}}\,\exp\left(-\frac{\|y - Ax\|^{2}}{2\sigma^{2}}\right). \tag{9}
\]

If we assume the prior distribution of x is Gaussian with mean 0 and covariance S, we obtain an analytical expression for the posterior distribution of x given the observation y,

\[
p(x|y) = C \exp\left(-\frac{1}{2}\,(x - \Sigma^{-1}b)^{T}\,\Sigma\,(x - \Sigma^{-1}b)\right), \tag{10}
\]

where

\[
C = \sqrt{\frac{|\Sigma|}{(2\pi)^{M}}}\,, \qquad \Sigma = S^{-1} + \frac{1}{\sigma^{2}}A^{T}A\,, \qquad b = \frac{1}{\sigma^{2}}A^{T}y\,,
\]

so that Σ is the posterior precision matrix and Σ^{-1}b is the posterior mean.

During the training stage we draw 100 samples of x from its prior p(x), and y is generated through the linear observation model (8). We train a 3-layer model with the model objective (4), and show three examples of the posterior estimation in Fig. 2. Therein we show 50% and 90% confidence contours. Black lines represent ground-truth results (ellipses, given that the posterior is Gaussian), while the red contours correspond to the neural network estimations, based on Gaussian kernel density estimation (KDE) with 9,000 samples generated from the network. These results indicate that our deep learning model can produce reliable and statistically valid results.
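The analytic posterior used for this validation is straightforward to compute. A minimal sketch, following the conventions above (Σ as posterior precision, mean Σ⁻¹b); the variable names and the choices S = I, σ = 0.5 are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, sigma = 2, 5, 0.5
A = rng.normal(size=(K, M))   # linear map of Eq. (8)
S = np.eye(M)                 # prior covariance, x ~ N(0, S)

def analytic_posterior(y):
    """Closed-form posterior of x | y for y = A x + n, n ~ N(0, sigma^2 I):
    precision Sigma = S^{-1} + A^T A / sigma^2, mean = Sigma^{-1} b,
    with b = A^T y / sigma^2, as in Eq. (10)."""
    Sigma = np.linalg.inv(S) + A.T @ A / sigma**2
    cov = np.linalg.inv(Sigma)          # posterior covariance
    mean = cov @ (A.T @ y / sigma**2)   # posterior mean Sigma^{-1} b
    return mean, cov

# One simulated draw from the generative model.
x_true = rng.multivariate_normal(np.zeros(M), S)
y = A @ x_true + sigma * rng.normal(size=K)
mean, cov = analytic_posterior(y)
```

The network's 9,000 posterior samples can then be compared against Gaussian contours drawn from `mean` and `cov`, as in Fig. 2.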

B. Results with Real Events

In this section we use our deep learning models to estimate the medians and posterior distributions of the astrophysical parameters (m1, m2) and (af, ωR, ωI), respectively, for five binary black hole mergers: GW150914, GW170104, GW170814, GW190521 and GW190630.

As described in Section II A, we consider 1 s-long advanced LIGO noise input data batches, denoted as y, sampled at 4096 Hz. We construct two posterior distribution estimations, p_{ω,θ}(x|y), by minimizing the loss in Eq. (4) for (m1, m2) and for (af, ωR, ωI). We use two different multivariate normal base distributions for p(z) in the two models. To estimate the masses of the binary components, the mean and covariance matrix (µ, Σ) are µ = (30, 30), Σ = diag(5, 5); whereas for the final-spin and QNM model we use µ = (0.5, 0.55, 0.07), Σ = diag(0.05, 0.03, 0.002). Here "diag(·)" denotes the diagonal matrix with "·" as the diagonal elements. The number of normalizing flow layers also varies between the two models: we use a 3-layer normalizing flow module for the mass predictions, and an 8-layer module for the predictions of final spin and QNMs.
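The two base distributions p(z) quoted above can be written down directly. The sketch below only draws base samples with the stated means and diagonal covariances; the trained flow layers that transform these samples into posterior samples are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Base distribution p(z) for the (m1, m2) model, as quoted in the text.
mu_masses = np.array([30.0, 30.0])
cov_masses = np.diag([5.0, 5.0])

# Base distribution p(z) for the (af, omega_R, omega_I) model.
mu_remnant = np.array([0.5, 0.55, 0.07])
cov_remnant = np.diag([0.05, 0.03, 0.002])

# 9,000 base samples, to be pushed through the trained normalizing-flow
# layers (3 layers for masses, 8 for the remnant model).
z_masses = rng.multivariate_normal(mu_masses, cov_masses, size=9000)
z_remnant = rng.multivariate_normal(mu_remnant, cov_remnant, size=9000)
```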

Our first set of results is presented in Figures 3, 4, and 5. These figures provide the median, and the 50% and 90% confidence intervals, which we computed using Gaussian KDE estimation with 9,000 samples drawn from the estimated posteriors. In Tables I and II we also present a summary of our data-driven median results and 90% confidence intervals, along with those obtained with traditional Bayesian algorithms in [4, 76]. Before we present the main highlights of these results, it is important to emphasize that our results are entirely data-driven. We have not attempted to use deep learning as a fast interpolator that learns the properties of traditional Bayesian posterior distributions. Rather, we have allowed deep learning to figure out the physical correlations among the different parameters that describe the physics of black hole mergers. Furthermore, we have quantified the statistical consistency of our approach by validating it against a well known model. This is of paramount importance, since deep learning models may be constructed to reproduce the properties of traditional Bayesian distributions, but that fact does not provide enough evidence of their statistical validity or consistency.


FIG. 2. Comparison of posterior distributions produced by our deep learning model (red contours) and a Gaussian conjugate prior family whose posterior distribution (black contours) is given by a closed analytical model. These data-driven predictions for the 50% and 90% confidence contours are in agreement with the expected statistical results.

TABLE I. Data-driven and Bayesian results [4, 76] for the median and 90% confidence intervals of the masses of five binary black hole mergers. The network signal-to-noise ratio (SNR) for each event is also provided for reference.

Event      | Our model: m1 [M⊙]  | Our model: m2 [M⊙]  | LIGO [4, 76]: m1 [M⊙] | LIGO [4, 76]: m2 [M⊙] | SNR
GW150914   | 38.85 (+6.90/−4.15) | 31.20 (+4.39/−5.94) | 35.60 (+4.70/−3.10)   | 30.60 (+3.00/−4.40)   | 24.4
GW170104   | 28.90 (+6.55/−3.80) | 22.75 (+3.73/−5.14) | 30.80 (+7.80/−5.60)   | 20.00 (+4.90/−4.60)   | 13.0
GW170814   | 33.92 (+9.14/−5.27) | 24.31 (+4.13/−5.46) | 30.60 (+5.60/−5.30)   | 25.20 (+2.80/−4.00)   | 15.9
GW190521   | 46.10 (+8.77/−6.61) | 33.74 (+6.68/−8.47) | 42.10 (+5.90/−4.90)   | 32.70 (+5.40/−6.20)   | 14.4
GW190630   | 34.00 (+7.19/−4.43) | 26.17 (+4.54/−5.86) | 35.00 (+6.90/−5.70)   | 23.60 (+5.20/−5.10)   | 15.6

TABLE II. Data-driven and Bayesian results [4, 76] for the median and 90% confidence intervals of the final spin of five binary black hole mergers. Results for the frequencies of the ringdown oscillations, (ωR, ωI), are directly measured by our model from advanced LIGO's strain data, whereas the results quoted for LIGO are estimated using af values from [4, 76] and Eq. (5) [77].

Event      | Our model: af      | Our model: ωR         | Our model: ωI            | LIGO [4, 76]: af   | LIGO: ωR              | LIGO: ωI
GW150914   | 0.71 (+0.06/−0.07) | 0.536 (+0.028/−0.029) | 0.0805 (+0.0023/−0.0026) | 0.69 (+0.05/−0.04) | 0.528 (+0.016/−0.023) | 0.0811 (+0.0021/−0.0013)
GW170104   | 0.69 (+0.06/−0.07) | 0.530 (+0.028/−0.030) | 0.0810 (+0.0023/−0.0025) | 0.66 (+0.08/−0.11) | 0.515 (+0.036/−0.033) | 0.0821 (+0.0026/−0.0030)
GW170814   | 0.68 (+0.06/−0.09) | 0.525 (+0.028/−0.032) | 0.0815 (+0.0024/−0.0024) | 0.72 (+0.07/−0.05) | 0.541 (+0.037/−0.022) | 0.0800 (+0.0018/−0.0037)
GW190521   | 0.73 (+0.05/−0.06) | 0.548 (+0.029/−0.028) | 0.0795 (+0.0024/−0.0029) | 0.72 (+0.05/−0.07) | 0.552 (+0.026/−0.030) | 0.0800 (+0.0025/−0.0025)
GW190630   | 0.71 (+0.06/−0.07) | 0.535 (+0.028/−0.030) | 0.0806 (+0.0024/−0.0026) | 0.70 (+0.06/−0.07) | 0.532 (+0.030/−0.028) | 0.0808 (+0.0037/−0.0022)

[Figure 3, panels (a)–(e): GW150914, GW170104, GW170814, GW190521, GW190630. Each panel shows m2 [M⊙] versus m1 [M⊙], with 50% and 90% confidence contours and the true (m1, m2) values marked.]

FIG. 3. Data-driven posterior distributions, including 50% and 90% confidence regions, for the masses of black hole mergers.

[Figure 4, panels (a)–(e): GW150914, GW170104, GW170814, GW190521, GW190630. Each panel shows ωR versus af, with confidence contours and the true (af, ωR) values marked.]

FIG. 4. Data-driven posterior distributions, including 50% and 90% confidence regions, for (af , ωR) of black hole mergers.

[Figure 5, panels (a)–(e): GW150914, GW170104, GW170814, GW190521, GW190630. Each panel shows ωI versus af, with 50% and 90% confidence contours and the true (af, ωI) values marked.]

FIG. 5. Data-driven posterior distributions, including 50% and 90% confidence regions, for (af , ωI) of real black hole mergers.


FIG. 6. P-P plot comparing the posterior distributions estimated by the neural network model for five astrophysical parameters (m1, m2, af, ωR, ωI).

Finally, given the nature of the signal processing tools and computing approaches we use in this study, we do not expect our data-driven results to exactly reproduce the traditional Bayesian results reported in [4, 76].

Our results may be summarized as follows. Figures 3, 4, and 5 show that our data-driven posterior distributions encode the expected physical correlations for the masses of the binary components, (m1, m2), and the parameters of the remnant, (af, ωR) and (af, ωI). We also learn that these posterior distributions are determined by the properties of the noise and the loudness of the signal that describes these events. Figure 3 presents a direct comparison between the posterior distributions predicted by our deep learning models and those produced with PyCBC Inference, marked with dashed lines. These results show that our deep learning models provide real-time, reliable information about the astrophysical properties of binary black hole mergers that were detected in three different observing runs, and which span a broad SNR range.

On the other hand, Tables I and II show that our median and 90% confidence intervals are better than, similar to, or in some cases slightly larger than those obtained with Bayesian algorithms. In these tables, Bayesian LIGO results for af are taken directly from [4, 76], while the (ωR, ωI) results are computed using their Bayesian results for af and the tables available at [78]. These results indicate that deep learning methods can learn physical correlations in the data, and provide reliable estimates of the parameters of gravitational wave sources. To demonstrate that our model represents true statistical properties of the

posterior distribution, we tested the posterior estimation on simulated noisy gravitational waveforms. We calculate the empirical cumulative distribution function (CDF) of the number of times the true value for each parameter was found within a given confidence interval p, as a function of p. We compare the empirical CDF with the true CDF of p in the P-P plot in Figure 6. To obtain the empirical CDF, for each test waveform (1,000 waveforms in total) and each one-dimensional estimated posterior distribution generated from the network with 9,000 samples, we record the count of the confidence intervals p (p = 1%, . . . , 100%) in which the true parameters fall. The empirical CDF is based on the frequency of such counts over the 1,000 waveforms randomly drawn from the test dataset. Since the empirical CDFs lie close to the diagonal, we conclude that the networks generate close approximations of the posteriors. Furthermore, our data-driven results, including medians and posterior distributions, can be produced within 2 milliseconds per event using a single NVIDIA V100 GPU. We expect that these tools will provide the means to assess in real time whether the inferred astrophysical parameters of the binary components and the post-merger remnant adhere to general relativistic predictions. If not, these results may prompt follow-up analyses to investigate whether apparent discrepancies are due to poor data quality or other astrophysical effects [79].
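The empirical-CDF construction behind the P-P plot of Figure 6 can be sketched as follows. `empirical_coverage` is a hypothetical helper, and the use of central credible intervals is our assumption about how the confidence intervals are defined; the paper does not specify the interval convention.

```python
import numpy as np

def empirical_coverage(posterior_samples, true_values,
                       levels=np.linspace(0.01, 1.0, 100)):
    """For each test event, compute the quantile of the true parameter value
    within its 1-D posterior samples, convert it to the smallest central
    credible interval that contains the truth, and return the fraction of
    events covered at each level p (the empirical CDF of a P-P plot)."""
    # Quantile of each true value within its own posterior samples.
    q = np.array([np.mean(s <= t) for s, t in zip(posterior_samples, true_values)])
    # The truth lies inside the central p-interval iff |q - 0.5| <= p / 2.
    needed = 2.0 * np.abs(q - 0.5)
    return np.array([np.mean(needed <= p) for p in levels])
```

For a well-calibrated posterior estimator the returned curve lies on the diagonal, which is the criterion applied to Figure 6.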

The reliable astrophysical information inferred in low latency by deep learning algorithms warrants the extension of this framework to characterize other sources, including eccentric compact binary mergers, and sources that require the inclusion of higher-order waveform modes. Furthermore, the use of physics-inspired deep learning architectures and optimization schemes [29] may enable an accurate measurement of the spin of the binary components. These studies should be pursued in the future.

V. CONCLUSION

We designed neural networks to estimate five parameters that describe the astrophysical properties of binary black holes before and after the merger event. The first two parameters constrain the masses of the binary components, while the others estimate the properties of the black hole remnant, namely (m1, m2, af, ωR, ωI). These models combine a WaveNet architecture with normalizing flow and contrastive learning to provide statistically consistent estimates for both simulated distributions and real gravitational wave sources.

Our findings indicate that deep learning can abstract physical correlations in complex data, and then provide reliable predictions for the median and 90% confidence intervals for binary black holes that span a broad SNR range. Furthermore, while these models were trained using only advanced LIGO noise from the first observing run, they were capable of generalizing to binary black holes that were reported during the first, second and third observing runs.


These models will be extended in future work to provide informative estimates for the spin of the binary components, including higher-order waveform modes to better model the physics of highly spinning and asymmetric mass-ratio black hole systems.

VI. ACKNOWLEDGEMENTS

Datasets used in this study are available from the corresponding author on reasonable request. Neural network models are available at the Data and Deep Learning Hub for Science [80, 81]. EAH, HS and ZZ gratefully acknowledge National Science Foundation (NSF) awards OAC-1931561 and OAC-1934757. EOS and PK gratefully acknowledge NSF grants PHY-1912081 and OAC-193128, and the Sherman Fairchild Foundation. PK also acknowledges the support of the Department of Atomic Energy, Government of India, under project no. RTI4001. This work utilized the Hardware-Accelerated Learning (HAL) cluster, supported by the NSF Major Research Instrumentation program, grant OAC-1725729, as well as the University of Illinois at Urbana-Champaign. Compute resources were provided by XSEDE using allocation TG-PHY160053. This work made use of the Illinois Campus Cluster, a computing resource that is operated by the Illinois Campus Cluster Program (ICCP) in conjunction with the National Center for Supercomputing Applications and which is supported by funds from the University of Illinois at Urbana-Champaign. This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357. This research also made use of LIGO Data Grid clusters at the California Institute of Technology. This research used data, software and/or web tools obtained from the LIGO Open Science Center (https://gw-openscience.org), a service of LIGO Laboratory, the LIGO Scientific Collaboration and the Virgo Collaboration. LIGO is funded by the U.S. National Science Foundation. Virgo is funded by the French Centre National de la Recherche Scientifique (CNRS), the Italian Istituto Nazionale di Fisica Nucleare (INFN) and the Dutch Nikhef, with contributions by Polish and Hungarian institutes.

VII. CONTRIBUTION

EAH envisioned this study, directed the construction of the data sets used to train/validate/test the neural network models, and advised the design of physically informed neural network models. HS designed the model, and carried out the training and evaluation of this model using the deep learning HAL cluster at NCSA and XSEDE's Bridges-AI. AI-driven inference results were also obtained using HAL and Bridges-AI. The Bayesian parameter estimation results presented in this article were produced by EOS and PK using PyCBC Inference. ZZ supervised the study and the construction of the model. All co-authors contributed to drafting and editing the manuscript.

[1] B. P. Abbott, R. Abbott, T. D. Abbott, M. R. Abernathy, F. Acernese, K. Ackley, C. Adams, T. Adams, P. Addesso, R. X. Adhikari, and et al., Physical Review Letters 116, 131103 (2016), arXiv:1602.03838 [gr-qc].
[2] J. Aasi, B. Abbott, R. Abbott, T. Abbott, M. Abernathy, K. Ackley, C. Adams, T. Adams, P. Addesso, R. Adhikari, et al., Classical and Quantum Gravity 32, 074001 (2015).
[3] F. Acernese et al., Classical and Quantum Gravity 32, 024001 (2015), arXiv:1408.3978 [gr-qc].
[4] B. P. Abbott et al. (LIGO Scientific Collaboration and Virgo Collaboration), Phys. Rev. X 9, 031040 (2019).
[5] R. Abbott et al. (LIGO Scientific, Virgo), (2020), arXiv:2010.14527 [gr-qc].
[6] R. Abbott et al. (LIGO Scientific, Virgo), (2020), arXiv:2010.14533 [astro-ph.HE].
[7] E. Huerta, R. Haas, E. Fajardo, D. S. Katz, S. Anderson, P. Couvares, J. Willis, T. Bouvet, J. Enos, W. T. Kramer, et al., in 2017 IEEE 13th International Conference on e-Science (e-Science) (IEEE, 2017) pp. 335–344.
[8] E. Huerta, R. Haas, S. Jha, M. Neubauer, and D. S. Katz, Computing and Software for Big Science 3, 5 (2019).
[9] D. Weitzel, B. Bockelman, D. A. Brown, P. Couvares, F. Wurthwein, and E. F. Hernandez, in Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and Impact (2017) pp. 1–6.
[10] S. Liang, Y. Liu, C. Wang, and L. Jian, in 2010 IEEE 2nd Symposium on Web Society (2010) pp. 53–60.
[11] G. Allen, W. Anderson, E. Blaufuss, J. S. Bloom, P. Brady, S. Burke-Spolaor, S. B. Cenko, A. Connolly, P. Couvares, D. Fox, A. Gal-Yam, S. Gezari, A. Goodman, D. Grant, P. Groot, J. Guillochon, C. Hanna, D. W. Hogg, K. Holley-Bockelmann, D. A. Howell, D. Kaplan, E. Katsavounidis, M. Kowalski, L. Lehner, D. Muthukrishna, G. Narayan, J. E. G. Peek, A. Saha, P. Shawhan, and I. Taboada, arXiv e-prints, arXiv:1807.04780 (2018), arXiv:1807.04780 [astro-ph.IM].
[12] G. Allen, I. Andreoni, E. Bachelet, G. B. Berriman, F. B. Bianco, R. Biswas, M. Carrasco Kind, K. Chard, M. Cho, P. S. Cowperthwaite, Z. B. Etienne, D. George, T. Gibbs, M. Graham, W. Gropp, A. Gupta, R. Haas, E. A. Huerta, E. Jennings, D. S. Katz, A. Khan, V. Kindratenko, W. T. C. Kramer, X. Liu, A. Mahabal, K. McHenry, J. M. Miller, M. S. Neubauer, S. Oberlin, J. Olivas, Alexander R., S. Rosofsky, M. Ruiz, A. Saxton, B. Schutz, A. Schwing, E. Seidel, S. L. Shapiro, H. Shen, Y. Shen, B. M. Sipocz, L. Sun, J. Towns, A. Tsokaros, W. Wei, J. Wells, T. J. Williams, J. Xiong, and Z. Zhao, arXiv e-prints, arXiv:1902.00522 (2019), arXiv:1902.00522 [astro-ph.IM].

[13] E. Huerta et al., Nature Rev. Phys. 1, 600 (2019), arXiv:1911.11779 [gr-qc].
[14] D. George and E. A. Huerta, Phys. Rev. D 97, 044039 (2018), arXiv:1701.00008 [astro-ph.IM].
[15] H. Shen, D. George, E. Huerta, et al., APS 2018, L01 (2018).
[16] D. George and E. Huerta, Physics Letters B 778, 64 (2018).
[17] H. Gabbard, M. Williams, F. Hayes, and C. Messenger, Physical Review Letters 120, 141103 (2018), arXiv:1712.06041 [astro-ph.IM].
[18] V. Skliris, M. R. Norman, and P. J. Sutton, arXiv preprint arXiv:2009.14611 (2020).
[19] Y.-C. Lin and J.-H. P. Wu, arXiv preprint arXiv:2007.04176 (2020).
[20] H. Wang, S. Wu, Z. Cao, X. Liu, and J.-Y. Zhu, Phys. Rev. D 101, 104003 (2020), arXiv:1909.13442 [astro-ph.IM].
[21] H. Nakano, T. Narikawa, K.-i. Oohara, K. Sakai, H.-a. Shinkai, H. Takahashi, T. Tanaka, N. Uchikata, S. Yamamoto, and T. S. Yamamoto, Phys. Rev. D 99, 124032 (2019), arXiv:1811.06443 [gr-qc].
[22] X. Fan, J. Li, X. Li, Y. Zhong, and J. Cao, Sci. China Phys. Mech. Astron. 62, 969512 (2019), arXiv:1811.01380 [astro-ph.IM].
[23] X.-R. Li, G. Babu, W.-L. Yu, and X.-L. Fan, Front. Phys. (Beijing) 15, 54501 (2020), arXiv:1712.00356 [astro-ph.IM].
[24] D. S. Deighan, S. E. Field, C. D. Capano, and G. Khanna, (2020), arXiv:2010.04340 [gr-qc].
[25] A. L. Miller et al., Phys. Rev. D 100, 062005 (2019), arXiv:1909.02262 [astro-ph.IM].
[26] P. G. Krastev, Phys. Lett. B 803, 135330 (2020), arXiv:1908.03151 [astro-ph.IM].
[27] M. B. Schafer, F. Ohme, and A. H. Nitz, Phys. Rev. D 102, 063015 (2020), arXiv:2006.01509 [astro-ph.HE].
[28] C. Dreissigacker and R. Prix, Phys. Rev. D 102, 022005 (2020), arXiv:2005.04140 [gr-qc].
[29] A. Khan, E. Huerta, and A. Das, Phys. Lett. B 808, 0370 (2020), arXiv:2004.09524 [gr-qc].
[30] C. Dreissigacker, R. Sharma, C. Messenger, R. Zhao, and R. Prix, Phys. Rev. D 100, 044009 (2019), arXiv:1904.13291 [gr-qc].
[31] W. Wei, E. A. Huerta, B. C. Whitmore, J. C. Lee, S. Hannon, R. Chandar, D. A. Dale, K. L. Larson, D. A. Thilker, L. Ubeda, M. Boquien, M. Chevance, J. M. D. Kruijssen, A. Schruba, G. A. Blanc, and E. Congiu, MNRAS 493, 3178 (2020), arXiv:1909.02024 [astro-ph.GA].
[32] B. Beheshtipour and M. A. Papa, Phys. Rev. D 101, 064009 (2020), arXiv:2001.03116 [gr-qc].
[33] V. Skliris, M. R. K. Norman, and P. J. Sutton, arXiv e-prints, arXiv:2009.14611 (2020), arXiv:2009.14611 [astro-ph.IM].
[34] S. Khan and R. Green, arXiv preprint arXiv:2008.12932 (2020).
[35] A. J. K. Chua, C. R. Galley, and M. Vallisneri, Phys. Rev. Lett. 122, 211101 (2019).
[36] A. Rebei, E. A. Huerta, S. Wang, S. Habib, R. Haas, D. Johnson, and D. George, Phys. Rev. D 100, 044025 (2019), arXiv:1807.09787 [gr-qc].
[37] W. Wei and E. A. Huerta, arXiv e-prints, arXiv:2010.09751 (2020), arXiv:2010.09751 [gr-qc].
[38] W. Wei, E. A. Huerta, M. Yun, N. Loutrel, R. Haas, and V. Kindratenko, arXiv e-prints, arXiv:2012.03963 (2020), arXiv:2012.03963 [gr-qc].
[39] E. A. Huerta, A. Khan, X. Huang, M. Tian, M. Levental, R. Chard, W. Wei, M. Heflin, D. S. Katz, V. Kindratenko, D. Mu, B. Blaiszik, and I. Foster, arXiv preprint arXiv:2012.08545 (2020), arXiv:2012.08545 [gr-qc].
[40] W. Wei, A. Khan, E. A. Huerta, X. Huang, and M. Tian, Phys. Lett. B 812, 136029 (2021), arXiv:2010.15845 [gr-qc].
[41] M. Vallisneri, J. Kanner, R. Williams, A. Weinstein, and B. Stephens, Proceedings, 10th International LISA Symposium: Gainesville, Florida, USA, May 18-23, 2014, J. Phys. Conf. Ser. 610, 012021 (2015), arXiv:1410.4839 [gr-qc].
[42] P. Graff, F. Feroz, M. P. Hobson, and A. Lasenby, MNRAS 421, 169 (2012), arXiv:1110.2997 [astro-ph.IM].
[43] J. Veitch, V. Raymond, B. Farr, W. Farr, P. Graff, S. Vitale, B. Aylott, K. Blackburn, N. Christensen, M. Coughlin, W. Del Pozzo, F. Feroz, J. Gair, C.-J. Haster, V. Kalogera, T. Littenberg, I. Mandel, R. O'Shaughnessy, M. Pitkin, C. Rodriguez, C. Rover, T. Sidery, R. Smith, M. Van Der Sluys, A. Vecchio, W. Vousden, and L. Wade, Phys. Rev. D 91, 042003 (2015), arXiv:1409.7215 [gr-qc].
[44] L. P. Singer and L. R. Price, Physical Review D 93, 024013 (2016), arXiv:1508.03634 [gr-qc].
[45] C. M. Biwer, C. D. Capano, S. De, M. Cabero, D. A. Brown, A. H. Nitz, and V. Raymond, The Astronomical Society of the Pacific 131, 024503 (2019), arXiv:1807.10312 [astro-ph.IM].
[46] E. Berti, V. Cardoso, and C. M. Will, Phys. Rev. D 73, 064030 (2006), arXiv:gr-qc/0512160.
[47] A. J. Chua and M. Vallisneri, Physical Review Letters 124, 041102 (2020).
[48] H. Gabbard, C. Messenger, I. S. Heng, F. Tonolini, and R. Murray-Smith, arXiv preprint arXiv:1909.06296 (2019), arXiv:1909.06296 [astro-ph.IM].
[49] S. R. Green, C. M. Simpson, and J. R. Gair, ArXiv abs/2002.07656 (2020).
[50] T. S. Yamamoto and T. Tanaka, arXiv: General Relativity and Quantum Cosmology (2020).
[51] V. Dumoulin, I. Belghazi, B. Poole, A. Lamb, M. Arjovsky, O. Mastropietro, and A. C. Courville, ArXiv abs/1606.00704 (2017).
[52] J. Donahue, P. Krahenbuhl, and T. Darrell, ArXiv abs/1605.09782 (2017).
[53] S. R. Green and J. Gair, arXiv preprint arXiv:2008.03312 (2020).
[54] A. Grover, M. Dhar, and S. Ermon, in AAAI (2018).
[55] A. Nitz, I. Harry, D. Brown, et al., "gwastro/pycbc: PyCBC release 1.16.4," (2020).
[56] W. D. Vousden, W. M. Farr, and I. Mandel, Monthly Notices of the Royal Astronomical Society 455, 1919–1937 (2015).
[57] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, in NIPS-W (2017).
[58] A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, arXiv preprint arXiv:1609.03499 (2016).
[59] W. Wei and E. Huerta, Physics Letters B 800, 135081 (2020).
[60] W. Wei, A. Khan, E. Huerta, X. Huang, and M. Tian, arXiv preprint arXiv:2010.15845 (2020).
[61] C. Durkan, A. Bekasov, I. Murray, and G. Papamakarios, in Advances in Neural Information Processing Systems (2019) pp. 7511–7522.
[62] H. M. Dolatabadi, S. Erfani, and C. Leckie, arXiv preprint arXiv:2001.05168 (2020).
[63] E. Bingham, J. P. Chen, M. Jankowiak, F. Obermeyer, N. Pradhan, T. Karaletsos, R. Singh, P. Szerlip, P. Horsfall, and N. D. Goodman, The Journal of Machine Learning Research 20, 973 (2019).
[64] E. Bingham, J. P. Chen, M. Jankowiak, F. Obermeyer, N. Pradhan, T. Karaletsos, R. Singh, P. Szerlip, P. Horsfall, and N. D. Goodman, Journal of Machine Learning Research (2018).
[65] D. Phan, N. Pradhan, and M. Jankowiak, arXiv preprint arXiv:1912.11554 (2019).
[66] R. Hadsell, S. Chopra, and Y. LeCun, in 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), Vol. 2 (IEEE, 2006) pp. 1735–1742.
[67] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, in Thirty-seventh International Conference on Machine Learning (2020).
[68] T. Chen, S. Kornblith, K. Swersky, M. Norouzi, and G. E. Hinton, Advances in Neural Information Processing Systems 33 (2020).
[69] J. Healy and C. O. Lousto, Phys. Rev. D 95, 024037 (2017).
[70] J. Blackman, S. E. Field, C. R. Galley, B. Szilagyi, M. A. Scheel, M. Tiglio, and D. A. Hemberger, Physical Review Letters 115, 121102 (2015), arXiv:1502.07758 [gr-qc].
[71] F. Hofmann, E. Barausse, and L. Rezzolla, Astrophys. J. 825, L19 (2016).
[72] I. Loshchilov and F. Hutter, arXiv preprint arXiv:1711.05101 (2017).
[73] D. Foreman-Mackey, D. W. Hogg, D. Lang, and J. Goodman, The Astronomical Society of the Pacific 125, 306 (2013), arXiv:1202.3665 [astro-ph.IM].
[74] "https://dcc.ligo.org/ligo-p1800370/public".
[75] S. Khan, S. Husa, M. Hannam, F. Ohme, M. Purrer, X. J. Forteza, and A. Bohe, Phys. Rev. D 93, 044007 (2016).
[76] R. Abbott, T. Abbott, S. Abraham, F. Acernese, K. Ackley, A. Adams, C. Adams, R. Adhikari, V. Adya, C. Affeldt, et al., arXiv preprint arXiv:2010.14527 (2020).
[77] E. Berti, V. Cardoso, and M. Casals, Phys. Rev. D 73, 024013 (2006), arXiv:gr-qc/0511111.
[78] E. Berti, A. Buonanno, and C. M. Will, Phys. Rev. D 71, 084025 (2005), gr-qc/0411129.
[79] J. R. Gair, M. Vallisneri, S. L. Larson, and J. G. Baker, Living Rev. Rel. 16, 7 (2013), arXiv:1212.5575 [gr-qc].
[80] B. Blaiszik, L. Ward, M. Schwarting, J. Gaff, R. Chard, D. Pike, K. Chard, and I. Foster, MRS Communications 9 (2019), 10.1557/mrc.2019.118.
[81] R. Chard, Z. Li, K. Chard, L. Ward, Y. Babuji, A. Woodard, S. Tuecke, B. Blaiszik, M. J. Franklin, and I. Foster, in 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (2019) pp. 283–292.
[81] R. Chard, Z. Li, K. Chard, L. Ward, Y. Babuji,A. Woodard, S. Tuecke, B. Blaiszik, M. J. Franklin, andI. Foster, in 2019 IEEE International Parallel and Dis-tributed Processing Symposium (IPDPS) (2019) pp. 283–292.