
Page 1: Data Mining, Modeling, and Machine Learning in Astronomy

Data Mining, Modeling, and Machine Learning in Astronomy

• Today:
  • Finish CNNs and related
  • PCA, basis functions, Fourier examples
• Reading:
  • Backpropagation
• Assignment 3 due Thursday 21 March
• Questions?
• Projects

A4523/A6523 Spring 2019 www.astro.cornell.edu/~cordes/A6523 1

Lecture 14

Page 2: Data Mining, Modeling, and Machine Learning in Astronomy

Project guidelines
• Choose a topic of interest, learn and apply it
• Preferably involve data analysis (simulated or real)
• Can be related to research you are already doing or data from the instructor
• Examples
  • A method discussed in class; go into it more deeply
  • Topics related to course topics but not covered in any depth
  • Compare two or more methods applied to a specific problem
    • Spectral analysis, time series modeling, image processing
    • Deconvolution methods
    • Neural networks and related
    • Cluster analysis
    • Nonlinear models + Markov chain Monte Carlo, genetic algorithms, simulated annealing …
  • Devise a NN for a specific application; gauge performance
  • Simulation methods
    • e.g., population of objects – recovery of population parameters with a biased data set
  • Acquire data using the Space Sciences radio telescope
    • Fast-sampled time series (1D) → dynamic spectra (2D)
    • Obtain multiple data sets and run through a NN to classify features
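The 1D → 2D step in the radio-telescope project can be illustrated with a minimal short-time-Fourier sketch: FFT consecutive blocks of a fast-sampled time series to build a dynamic spectrum. The block length and test tone below are arbitrary choices, not course-provided code:

```python
import numpy as np

def dynamic_spectrum(x, nfft=256):
    """Split a fast-sampled 1D time series into consecutive blocks and
    FFT each one, giving power as a function of (time, frequency)."""
    nblocks = len(x) // nfft
    blocks = x[:nblocks * nfft].reshape(nblocks, nfft)
    return np.abs(np.fft.rfft(blocks, axis=1)) ** 2  # shape (nblocks, nfft//2 + 1)
```

A pure tone at an integer FFT bin shows up as a constant ridge in every time block, which is a quick sanity check on the block/FFT layout.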


Page 3: Data Mining, Modeling, and Machine Learning in Astronomy

Project guidelines

• Other examples
  • Find a paper in the literature (astronomical or other) and do an in-depth analysis of their use of a NN (or some other method).
    • Critique their choice of architecture
    • Implement a reduced version of their NN and test it using simulated data (i.e., develop a toy model so that it is feasible and illustrates a key element of the published NN).


Page 4: Data Mining, Modeling, and Machine Learning in Astronomy


Astronomy & Astrophysics

A&A 611, A97 (2018) https://doi.org/10.1051/0004-6361/201731106 © ESO 2018

Deep learning approach for classifying, detecting and predicting photometric redshifts of quasars in the Sloan Digital Sky Survey Stripe 82*

J. Pasquet-Itam1,2 and J. Pasquet3,4

1 LUPM UMR 5299 CNRS/UM, Université de Montpellier, CC 72, 34095 Montpellier Cedex 05, France
2 CPPM, CNRS-IN2P3, Université Aix Marseille II, CC 907, 13288 Marseille Cedex 9, France
  e-mail: [email protected]
3 LIRMM UMR 5506 – team ICAR, Université de Montpellier, Campus St Priest, 34090 Montpellier, France
4 LSIS UMR 7296, CNRS, ENSAM, Université de Toulon et Aix-Marseille, Bâtiment Polytech, 13397 Marseille, France
  e-mail: [email protected]

Received 4 May 2017 / Accepted 3 November 2017

ABSTRACT

We have applied a convolutional neural network (CNN) to classify and detect quasars in the Sloan Digital Sky Survey Stripe 82, and also to predict the photometric redshifts of quasars. The network takes the variability of objects into account by converting light curves into images. The width of the images, noted w, corresponds to the five magnitudes ugriz, and the height of the images, noted h, represents the date of the observation. The CNN provides good results since its precision is 0.988 for a recall of 0.90, compared to a precision of 0.985 for the same recall with a random forest classifier. Moreover, 175 new quasar candidates are found with the CNN considering a fixed recall of 0.97. The combination of probabilities given by the CNN and the random forest makes good performance even better, with a precision of 0.99 for a recall of 0.90. For the redshift predictions, the CNN presents excellent results which are higher than those obtained with a feature-extraction step and different classifiers (a K-nearest-neighbors, a support vector machine, a random forest, and a Gaussian process classifier). Indeed, the accuracy of the CNN within |Δz| < 0.1 can reach 78.09%, within |Δz| < 0.2 reaches 86.15%, within |Δz| < 0.3 reaches 91.2%, and the root mean square (rms) error is 0.359. The performance of the KNN decreases in all three |Δz| regions, since its accuracy within |Δz| < 0.1, |Δz| < 0.2, and |Δz| < 0.3 is 73.72%, 82.46%, and 90.09% respectively, and its rms amounts to 0.395. So the CNN successfully reduces the dispersion and the catastrophic redshifts of quasars. This new method is very promising for the future of big databases such as the Large Synoptic Survey Telescope.

Key words. methods: data analysis – techniques: photometric – techniques: image processing – quasars: general – surveys

1. Introduction

Quasars are powered by accretion onto supermassive black holes at the dynamical centers of their host galaxies, producing high luminosities spanning a broad range of frequencies. They are of paramount importance in astronomy. For example, their study can inform us about massive black holes (e.g., Portinari et al. 2012). Moreover, as they are the most luminous active galactic nuclei (AGN), they can be seen far across the Universe, so they give clues to the evolution and structure of galaxies (e.g., Hopkins et al. 2006). They are also used as background objects to study the absorption by intergalactic matter along the line of sight, which has many applications in cosmology (e.g., Lopez et al. 2008). With the advent of large and dedicated surveys such as the Sloan Digital Sky Survey (SDSS; York et al. 2000) and the 2dF Quasar Redshift Survey (2QZ; Croom et al. 2009), the number of known quasars has rapidly increased. Thus, the SDSS DR7 Quasar catalog (Schneider et al. 2010) contains 105 783 spectroscopically confirmed quasars. The catalog covers an area of ≈9380 deg², and the quasar redshifts range from 0.065 to 5.46.

* A table of the candidates is only available at the CDS via anonymous ftp to cdsarc.u-strasbg.fr (130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/611/A97

With the imminent arrival of the Large Synoptic Survey Telescope (LSST Science Collaboration 2009), it is important to develop classification tools for quasar detection given the huge amount of future data. In this context, machine learning algorithms are being used increasingly. These algorithms permit us to predict the label of an object thanks to the extraction of different features which characterize the object (e.g., the color of the source). Several classifiers are now commonly used in astronomy, like random forests, which are sets of decision trees (Quinlan 1986), naive Bayes (Duda & Hart 1973), neural networks (Rumelhart et al. 1986), and support vector machines (Cortes & Vapnik 1995). These methods are very powerful for classification and detection of variable objects in astronomy (e.g., Eyer & Blake 2005; Dubath et al. 2011; Blomme et al. 2011; Rimoldini et al. 2012; Peng et al. 2012; Peters et al. 2015). We can also cite the recent work of Hernitschek et al. (2016) on the classification and detection of QSOs in the Pan-STARRS1 (PS1) 3π survey. This is a multi-epoch survey that covered three quarters of the sky at typically 35 epochs between 2010 and the beginning of 2014 with five filters (gP1, rP1, iP1, zP1, yP1). They use a random forest classifier, with colors and a structure function as features, to identify 1 000 000 QSO candidates.

The main motivation for this work is to propose a new classification and detection method for quasars in the Sloan Digital Sky Survey Stripe 82 that can be easily adapted to large

Article published by EDP Sciences A97, page 1 of 11

SDSS multi-epoch data in five optical wavelength bands

Time series in the five bands are converted to a 2D "image", i.e., a dynamic spectrum
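A minimal sketch of that light-curve-to-image conversion, assuming a hypothetical helper that stacks per-band magnitude arrays into an epochs-by-5 array (the paper's actual scheme bins by observation date; this only illustrates the layout, with missing epochs left as NaN):

```python
import numpy as np

def light_curves_to_image(mags, n_epochs=60):
    """Stack ugriz light curves into a 2D 'image': width = 5 bands,
    height = observation epochs. `mags` maps band -> magnitude array."""
    bands = ["u", "g", "r", "i", "z"]
    img = np.full((n_epochs, len(bands)), np.nan)  # NaN marks missing epochs
    for j, b in enumerate(bands):
        m = np.asarray(mags[b], dtype=float)[:n_epochs]
        img[: len(m), j] = m
    return img
```

The resulting array can then be fed to a 2D CNN exactly as an image would be.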

Page 5: Data Mining, Modeling, and Machine Learning in Astronomy


J. Pasquet-Itam and J. Pasquet: Classification of quasars based on a deep learning approach

Fig. 4. Representation of the architecture that we are proposing. The structure is subdivided into five successive processing blocks at different temporal resolutions. Two types of convolutions are used: temporal convolutions with four kernel sizes (41×1, 21×1, 11×1, and 5×1) and filter convolutions with a kernel size of 5×1.

outputs of the fully connected layers are randomly dropped out (Srivastava et al. 2014). During the back-propagation processing, the network has to determine a large number of parameters, namely 1 802 032 in the convolution layers and 1 146 880 in the fully-connected layers.

5. Classification results

5.1. Experimental protocol

We did five cross-validations of the database by always selecting 75% of the LCIs for the learning base and 25% for the testing base. For each of the five cross-validations, each CNN completed its learning over 60 epochs (during an epoch, each LCI is transmitted to the network and its error is back-propagated). Each CNN has three outputs on the softmax layer, corresponding to the following classes: quasars, pulsating stars (RR Lyrae and δ Scuti), and other objects. During the testing phase, each CNN gives a list of detected quasars in the testing base. We merge the lists given by each CNN into one list that we evaluate.
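The split protocol can be sketched as follows; the dataset size `n_lci` is made up, and only the repeated 75/25 shuffle structure reflects the text:

```python
import numpy as np

rng = np.random.default_rng(0)
n_lci = 1000                  # hypothetical number of light-curve images (LCIs)
split = int(0.75 * n_lci)     # 75% learning base, 25% testing base

folds = []
for _ in range(5):            # five cross-validations, each a fresh shuffle
    idx = rng.permutation(n_lci)
    folds.append((idx[:split], idx[split:]))
```

Each fold's learning/testing indices are disjoint, so every CNN sees a different random 75% of the data.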

5.2. Results

The performance of the CNN is given in Fig. 5 as a function of the magnitude and the redshift. We can notice that for a g-band magnitude below 17, the value of the recall decreases to as low as 50%. This is due to the very small number of examples of very bright quasars in the training set: there are only 22 light curves of quasars in the training database with magnitudes between 15 and 17. However, the recall is similar for all g-band magnitudes above 17. This is a very interesting result because it means that the CNN performance does not depend on the magnitude but only on the number of objects in the training database. This effect is less visible on the right histogram of Fig. 5. Indeed, it is enough to consider only 5% of the training database to reach a recall between 98% and 99%. This experiment shows that the CNN is invariant to redshift.

In the testing base, for a fixed recall of 0.97, 175 new quasars detected by the CNN have never been identified before. We call them quasar candidates. Figure 6 represents the spatial distribution of the quasars found in the testing base by the CNN. The red crosses mark the new quasar candidates.
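Operating "at a fixed recall" means choosing the score threshold where the classifier first attains the target recall and reading off the precision there. A small illustrative helper (not the authors' code):

```python
import numpy as np

def precision_at_recall(scores, labels, target_recall=0.97):
    """Rank candidates by classifier score, find the first cut whose
    recall reaches `target_recall`, and return the precision there."""
    order = np.argsort(scores)[::-1]            # descending score
    labels = np.asarray(labels)[order]
    tp = np.cumsum(labels)                      # true positives above each cut
    fp = np.cumsum(1 - labels)                  # false positives above each cut
    recall = tp / labels.sum()
    precision = tp / (tp + fp)
    k = np.searchsorted(recall, target_recall)  # first cut reaching the target
    return precision[k]
```

Raising the target recall forces the threshold lower, which is why precision (and thus sample purity) drops as recall is pushed toward 1.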

As we can see, the quasars detected by the CNN are distributed in a uniform manner. Figure 7 shows the average number of quasars in the sky per square degree, detected by the CNN, against the recall. As the recall increases, the number of quasars per square degree increases, which is consistent as we detect more and more quasars. For a recall around 0.92, the average number of quasars per square degree is about 20. Beyond that, this number increases drastically because the precision is reduced and the sample is contaminated by sources which are not quasars.

It is also interesting to highlight that a well-known property of quasars is met by the new quasar candidates, namely a bluer-when-brighter tendency. This trend has been well established in the UV and optical color variations of quasars (e.g., Cristiani et al. 1997; Giveon et al. 1999; Vanden Berk et al. 2004). Figure 8 represents the amplitude of variations of detected quasars in the u-band filter against the r-band filter at different recalls. We note that 83.6% and 88.7% of variation amplitudes in the u-band filter are larger than in the r-band filter for a recall of 0.90 and 0.97, respectively. Thus the detected quasars show larger variation amplitudes in bluer bands, and so a strong wavelength dependence.

5.3. Comparison with a random forest classifier

We compare the performance of our algorithm with that of a random forest classifier whose best parameters we estimated empirically on the same database. It contains 400 decision trees



Page 6: Data Mining, Modeling, and Machine Learning in Astronomy


MNRAS 000, 1–11 (2018) Preprint 25 February 2019 Compiled using MNRAS LaTeX style file v3.0

Optimizing Sparse RFI Prediction using Deep Learning

Joshua Kerrigan, Paul La Plante, Saul Kohn, Jonathan C. Pober, James Aguirre, Zara Abdurashidova, Paul Alexander, Zaki S. Ali, Yanga Balfour, Adam P. Beardsley, Gianni Bernardi, Judd D. Bowman, Richard F. Bradley, Jacob Burba, Chris L. Carilli, Carina Cheng, David R. DeBoer, Matt Dexter, Eloy de Lera Acedo, Joshua S. Dillon, Julia Estrada, Aaron Ewall-Wice, Nicolas Fagnoni, Randall Fritz, Steve R. Furlanetto, Brian Glendenning, Bradley Greig, Jasper Grobbelaar, Deepthi Gorthi, Ziyaad Halday, Bryna J. Hazelton, Jack Hickish, Daniel C. Jacobs, Austin Julius, Nick Kern, Piyanat Kittiwisit, Matthew Kolopanis, Adam Lanman, Telalo Lekalake, Adrian Liu, David MacMahon, Lourence Malan, Cresshim Malgas, Matthys Maree, Zachary E. Martinot, Eunice Matsetela, Andrei Mesinger, Mathakane Molewa, Miguel F. Morales, Tshegofalang Mosiane, Abraham R. Neben, Aaron R. Parsons, Nipanjana Patra, Samantha Pieterse, Nima Razavi-Ghods, Jon Ringuette, James Robnett, Kathryn Rosie, Peter Sims, Craig Smith, Angelo Syce, Nithyanandan Thyagarajan, Peter K. G. Williams, Haoxuan Zheng

The authors' affiliations are shown in Appendix B

Accepted XXX. Received YYY; in original form ZZZ

ABSTRACT
Radio Frequency Interference (RFI) is an ever-present limiting factor among radio telescopes, even in the most remote observing locations. When looking to retain the maximum amount of sensitivity and reduce contamination for Epoch of Reionization studies, the identification and removal of RFI is especially important. In addition to improved RFI identification, we must also take into account the computational efficiency of the RFI-identification algorithm as radio interferometer arrays such as the Hydrogen Epoch of Reionization Array grow larger in number of receivers. To address this, we present a Deep Fully Convolutional Neural Network (DFCN) that is comprehensive in its use of interferometric data, where both amplitude and phase information are used jointly for identifying RFI. We train the network using simulated HERA visibilities containing mock RFI, yielding a known "ground truth" dataset for evaluating the accuracy of various RFI algorithms. Evaluation of the DFCN model is performed on observations from the 67-dish build-out, HERA-67, and achieves a data throughput of 1.6×10⁵ HERA time-ordered 1024-channel visibilities per hour per GPU. We determine that, relative to an amplitude-only network, including visibility phase adds important adjacent time-frequency context which increases discrimination between RFI and non-RFI. The inclusion of phase when predicting achieves a recall of 0.81, precision of 0.58, and F2 score of 0.75 as applied to our HERA-67 observations.

Key words: methods: data analysis – techniques: interferometric

1 INTRODUCTION

Next generation radio interferometers are now beginning to become operational. These arrays are looking to detect and measure some of the weakest signals the Universe has to offer, such as the brightness-temperature contrast of the 21 cm signal during the Epoch of Reionization (EoR). By measuring this highly redshifted signal we can characterize the progression of the EoR. The understanding gained from this characterization has the potential to help us unravel how the first stars and galaxies formed and reionized their surrounding neutral hydrogen. While instruments like the Hydrogen Epoch of Reionization Array (HERA) (DeBoer et al. 2017) have the intrinsic sensitivity required to

* E-mail: joshua [email protected] (JRK)

© 2018 The Authors

arXiv:1902.08244v1 [astro-ph.IM] 21 Feb 2019


detect the EoR signal through a power spectrum, they are afflicted with anthropogenic noise which we refer to as Radio Frequency Interference (RFI). Interference from RFI in 21cm EoR observations is an especially significant obstacle because it can have a brightness anywhere from on the order of the EoR signal to orders of magnitude beyond even Galactic and extra-galactic foregrounds. RFI unfortunately reduces sensitivity in two separate but distinct ways: one is direct contamination, by having similar spectral characteristics to and overpowering the 21cm signal; the other is the introduction of a complex sampling function due to missing data, which produces correlations between modes when computing the Fourier transform along the frequency axis (Offringa et al. 2019). It is therefore important to strike a balance between identifying RFI and not falsely identifying non-RFI as RFI, which would further complicate our sampling function over frequency. Many approaches have recently been developed to identify and extract RFI from radio telescope data. RFI algorithms of particular interest include AOflagger (Offringa et al. 2012), which uses a scale-invariant rank operator to identify morphologies that are scale-invariant in time or frequency, a characteristic of many RFI signals. This RFI detection strategy has been used successfully on instruments such as the MWA (Offringa et al. 2015) and the Low-Frequency Array (LOFAR) (Offringa et al. 2013). Alternative approaches to RFI identification include the application of neural networks. More specifically, a Deep Fully Convolutional Neural Network (DFCN) based on the U-Net architecture (Ronneberger et al. 2015) has been used on single-dish radio telescope data (Akeret et al. 2017), and a Recurrent Neural Network (RNN) has been applied to signal amplitudes from radio interferometer data (Burd et al. 2018).

In this paper we expand upon the RFI identification approach using a DFCN developed in Tensorflow (Abadi et al. 2016), with the use of both the amplitude and phase information from an interferometric visibility. This technique is prompted by examples such as what is shown in Figure 1, which demonstrates how the phase of time-ordered visibilities (waterfall visibilities) can provide supplemental information for identifying RFI beyond that of an amplitude-only approach. Note that in this paper, all time-ordered visibility plots of real data are in the yellow-purple palette (e.g. Figure 1), whereas all simulated data is in the blue-white palette (e.g. Figure 7). To understand the improvements afforded by our joint amplitude-phase network, we compare it to both an amplitude-only network and the Watershed RFI algorithm (see Appendix A), which is the current RFI-flagging algorithm of choice for the HERA data processing pipeline.

The paper is outlined as follows. Section 2 introduces the architecture of our network, discusses how it compares to previous work, and describes the training dataset. We then demonstrate the effectiveness by evaluating both DFCNs on simulated and real HERA observations in Section 3. Finally, in Section 4 we conclude with a discussion of further applications and future work.

Figure 1. An example of a HERA 14 m baseline waterfall visibility between 170–195 MHz in amplitude (left) and phase (right). The phase waterfall visibility demonstrates how it can provide complementary information about the presence of RFI, such as in the 181.3 MHz channel, which has constant narrow-band RFI, and the more spontaneous 'blips' in the 179.5 MHz channel at time integrations of 13, 22, and 23. The significant contrast between the phase of the sky fringe, and how it is restricted to a narrow band, is an obvious indication of RFI.

2 METHOD

2.1 DFCN Architecture

The standard 2D convolutional neural network (CNN) (LeCun & Bengio 1998) is structurally similar to a typical Artificial Neural Network (ANN) (LeCun et al. 1998), but it differs from an ANN's dense layers of 'neurons' by its successive convolutions of an input image, which preserve spatial dependence. Each convolutional layer contains a set of learnable filters which represent a response for particular shapes at different scales (e.g. the edges of an object in an image). The convolved output for every layer is then typically downsampled using a process known as max pooling, which strides a window across the image keeping the highest pixel value within the window. Max pooling provides both a computational improvement due to a decreased image size, and an added level of abstraction relative to the initial image. After the convolution and max pooling layers the image is typically passed through a non-linear activation function (e.g. a sigmoid function), which produces a spatial activation map describing the convolutional layer's response to every pixel contained within the image. The eventual output of these successive convolution, max pooling, and activation layers is then used to predict (or regress) the classification of the image. The error between the predicted class and the true class is then computed through a loss function such as the cross-entropy loss (or mean squared error for regression) and the error is back-propagated through the network, updating the learnable parameters.
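The building blocks named in this paragraph (convolution, max pooling, a non-linear activation) can each be written in a few lines of NumPy. This is a didactic sketch, not the authors' Tensorflow implementation:

```python
import numpy as np

def conv2d(img, kern):
    """Valid 2D convolution (cross-correlation, as in most deep-learning code)."""
    H, W = img.shape
    kh, kw = kern.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kern)
    return out

def maxpool(img, s=2):
    """Stride an s x s window across the image, keeping the highest pixel."""
    H, W = img.shape
    return img[:H - H % s, :W - W % s].reshape(H // s, s, W // s, s).max(axis=(1, 3))

def sigmoid(x):
    """One choice of non-linear activation applied to the feature maps."""
    return 1.0 / (1.0 + np.exp(-x))
```

Chaining conv2d → maxpool → sigmoid reproduces, in miniature, one convolutional "block" of the kind the DFCN stacks repeatedly.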

The style of network we describe in this paper deviates from a traditional CNN by requiring a fully connected convolutional layer of neurons after the convolutional downsampling, and an upsampling stage to semantically predict classes on a per-pixel basis. For a deeper understanding


Page 7: Data Mining, Modeling, and Machine Learning in Astronomy



of this kind of network architecture, see Krizhevsky et al. (2017). We begin with a Deep Fully Convolutional Network architecture similar to the U-Net RFI (Ronneberger et al. 2015; Akeret et al. 2017) implementation. However, instead of using a uniform number of feature layers for each convolutional layer, we use an image pyramid (Lin et al. 2016) style approach, with an increasing number of features as the network approaches the fully connected convolutional layers, inverting to a decreasing number of feature layers towards the output prediction layer. This approach should offer us an increase in performance as the input image for each successive convolution shrinks. Each stacked layer in the max pooling stages has dimensions (H/2^L × W/2^L × 2^L F), where F is the number of feature layers, H and W are the layer height and width in pixels, and L is the layer of interest.

To adapt the network to use the visibility phase component, we mirror the amplitude-only network as shown in Figure 2. We then combine successive amplitude & phase convolution layers at each transpose convolution layer with the technique known as 'skip connections', introduced in Long et al. (2014) and He et al. (2016). This is implemented by taking the output of a downsampled convolutional layer and concatenating it with an upsampled transpose convolutional layer of equal time, frequency, and feature dimensions. By using these skips in the convolutional pathway, the network is provided with an initial "template" from which to make small deviations. This fixes an issue within deep networks where fits to higher-order nonlinearities become dominant in a layer, leading to training and overfitting issues. Empirically, we find that using skip connections in conjunction with phase information allows for training a deeper network that converges in fewer iterations than the simple amplitude-only network.
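The concatenation step of a skip connection can be sketched as follows (an illustration only, not the HERA pipeline code; array shapes stand in for time × frequency × feature maps):

```python
import numpy as np

def skip_concat(down, up):
    """'Skip connection': concatenate a downsampling-path feature map with
    an upsampled transpose-convolution output whose time and frequency
    dimensions match, stacking along the feature axis."""
    assert down.shape[:2] == up.shape[:2], "time/freq dims must match"
    return np.concatenate([down, up], axis=-1)
```

The upsampling path thus receives the earlier layer's output as a "template", and only has to learn deviations from it.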

For each of the skip layer concatenations between the amplitude and phase pathways, we subtract the mean and normalize over both time and frequency, which assists in standardization, as amplitude and phase features can be quite dissimilar. The amplitude-only DFCN we use has ~6×10⁵ trainable parameters, while the addition of the phase downsampling layers for the amplitude & phase DFCN pushes the number of trainable parameters to ~9×10⁵. The specific per-layer attributes employed in our networks can be seen in Table 1, where it should be noted that per-layer dimension sizes are not specified because this style of network is agnostic to the input height and width.

To optimize the network hyperparameters, a coarse grid search was performed over dropout rate, learning rate, and batch size; the optimal results from this search are found in Table 2. The depth of our convolutional layers is chosen to maximize learning and minimize prediction times, while trying to retain abstractions of the input visibilities that can properly describe our RFI. These dimensions are thus determined by initially training at an arbitrarily high number of feature layers and scaling back to the minimum number of layers we need to retain for convergence of the training loss.
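A coarse grid search of this kind simply enumerates the Cartesian product of candidate values. The values below are hypothetical, since the paper reports only the chosen optimum (Table 2), not the grid that was searched:

```python
from itertools import product

# Hypothetical candidate values for the three hyperparameters searched.
grid = {
    "dropout": [0.1, 0.3, 0.5],
    "learning_rate": [1e-4, 1e-3],
    "batch_size": [16, 32],
}
settings = [dict(zip(grid, combo)) for combo in product(*grid.values())]
# Each setting would get a full training run; the one with the best
# validation score is kept.
```

Even this tiny grid yields 3 × 2 × 2 = 12 full training runs, which is why the search is kept coarse.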

2.2 Data Preparation

The analysis in this paper is performed entirely on HERA data (both simulated and real); it should therefore be noted that any data preparation techniques outlined here may be unique to HERA. This does not imply that they are unsuitable for other radio interferometers, but additional precautions may need to be taken into consideration. To prepare the amplitude-phase input space to be as robust to as many visibility scenarios as possible, we must adopt several standardizations. The amplitude of the visibility can vary drastically by local sidereal time (LST), day, and baseline type, while having significant differences in dynamic range. In contrast, the phase of a visibility is intrinsically more standardized: it is constrained between −π and π and should have a mean that is approximately μ_φ = 0, so we should only expect substantial deviations across baseline type, which are due to changing fringe rates. Therefore, to lessen the dynamic range issues in amplitude, we standardize our waterfall visibilities V(t, ν) according to V(t, ν) = (ln|V| − μ_ln|V|) / σ_ln|V|, by subtracting the mean, μ_ln|V|, and dividing by the standard deviation, σ_ln|V|, across time and frequency of the logarithmic visibilities.

layer type       kernel size  stride  filters  depth
convolution      3x3          1       16       2
convolution      1x1          1       16       1
maxpool          2x2          2       –        1
batch norm.
convolution      3x3          1       32       2
convolution      1x1          1       32       1
maxpool          2x2          2       –        1
batch norm.
convolution      3x3          1       64       2
convolution      1x1          1       64       1
maxpool          2x2          2       –        1
batch norm.
convolution      3x3          1       128      2
convolution      1x1          1       128      1
maxpool          2x2          2       –        1
batch norm.
convolution      3x3          2       256      2
convolution      1x1          1       256      1
maxpool          2x2          2       –        1
batch norm.
transpose conv.  3x3          2       128      1
transpose conv.  3x3          2       64       1
transpose conv.  3x3          2       32       1
transpose conv.  5x5          4       2        1

Table 1. Architecture overview of the DFCNs demonstrated in this analysis. The colored rows correspond to the concatenations on the outputs between those respective layers, where prior to the concatenation each layer undergoes a batch normalization. The depth of a layer means that multiples of the layer are stacked, all having the same properties. The amplitude-phase DFCN has two input pathways, mirrored up until the first transpose convolution layer.
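The standardization formula translates directly into code (a sketch; `standardize_visibility` is an illustrative name, not a HERA pipeline function):

```python
import numpy as np

def standardize_visibility(V):
    """Standardize a complex waterfall visibility V(t, nu): z-score of
    ln|V| over both time and frequency, as in the text."""
    logamp = np.log(np.abs(V))
    return (logamp - logamp.mean()) / logamp.std()
```

The output has zero mean and unit standard deviation regardless of the visibility's original dynamic range, which is the point of the transformation.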

To further increase the robustness and generalizability of our network for different time and frequency sub-bands, we slice the HERA visibilities into 16 spectral windows of dimensions 64 frequency channels by 60 time integrations (6.3 MHz × 600 s). We then pad both time and frequency dimensions by reflecting about the boundaries, extending the dataset in both directions. This allows for making predictions for the edge pixels, which otherwise would be ignored due to our convolution layer kernel size of 3 × 3 (98.44 kHz × 30 s). Furthermore, we want to use square input channels to maintain a 1:1 aspect of time to frequency pixels.
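The slicing and reflection padding can be sketched with `np.pad`. This is an illustration, not the authors' code; the pad width of one pixel per side is our assumption for a 3 × 3 kernel, and the window shapes come from the text:

```python
import numpy as np

def make_windows(vis, n_time=60, n_freq=64, pad=1):
    """Slice a (time x freq) waterfall into n_time x n_freq windows and
    reflect each about its boundaries, so that edge pixels get a full
    receptive field under a 3x3 convolution kernel."""
    windows = []
    for t0 in range(0, vis.shape[0] - n_time + 1, n_time):
        for f0 in range(0, vis.shape[1] - n_freq + 1, n_freq):
            w = vis[t0:t0 + n_time, f0:f0 + n_freq]
            windows.append(np.pad(w, pad, mode='reflect'))
    return np.stack(windows)
```

A 60 × 1024 waterfall yields the 16 spectral windows quoted above, each padded to 62 × 66.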

MNRAS 000, 1–11 (2018)

ROC Plot (J. R. Kerrigan et al.)

Figure 5. ROC curve comparing all three RFI flagging algorithms: Amplitude DFCN (red), Amplitude-Phase DFCN (black), and Watershed (orange). The ROC curves were derived from each algorithm predicting on real HERA data visibilities (solid) and simulated HERA visibilities (broken). Black circles represent the optimal F2 score. The Area Under the Curve (AUC) metric condenses the overall performance of our algorithms and tells us that the Amplitude-Phase network exhibits the best response on our real data, with an AUC of 0.95. The TPRs and FPRs for the real data (solid) are based on manually flagging RFI to the best of our ability to discern RFI from signals on the sky, and therefore should not be taken as a ground truth.
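The F2 score marked on these ROC curves is the F-beta metric with beta = 2. A short sketch with hypothetical confusion counts (the general formula, not values from the paper):

```python
def f_beta(tp, fp, fn, beta=2.0):
    """F-beta from confusion counts.  beta=2 (the F2 marked on the ROC
    curves) weights recall twice as heavily as precision, which suits
    RFI flagging, where missed RFI is costlier than over-flagging."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With the same total error count, F2 rewards the recall-heavy operating point, which is why the optimal-F2 circles sit toward the high-TPR end of each curve.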

ities, coupled with an amplitude-phase DFCN, we should be able to achieve an extremely effective first-round RFI flagger that reduces a common pipeline bottleneck. We do however recognize that the DFCN approach can have issues with identifying RFI bursts that occupy single time-frequency samples, what we called 'blips', and broadband bursts. This is most likely due to an imbalanced representation in our training dataset, and the loss optimization not being rewarded enough to drive the DFCNs to learn a sub-class that appears at a rate of < 0.1%. This can potentially be overcome by fine-tuning the model using transfer learning (Yosinski et al. 2014), which would involve a training dataset consisting almost entirely of these two sub-classes, where the trained DFCN model shown here would serve as the starting point.

In near-future build-outs of HERA, great importance will need to be placed on reducing bottlenecks in the HERA data processing pipeline. The current Watershed RFI flagging algorithm does not scale particularly well, which makes this class of fully convolutional neural network an ideal alternative. The eventual number of HERA dishes will total 350, which for a single 10 minute observation gives us 61,075 unique waterfall visibilities. In the amplitude-phase DFCN design outlined in this paper, the RFI flagging throughput is 1.6×10^5 waterfalls/h/GPU,³ compared to the Watershed RFI flagger at 7.4×10^3 on the same resources.
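The 61,075 figure is just the number of unique antenna pairs among 350 dishes, and the throughput ratio follows directly; a quick arithmetic check:

```python
def n_baselines(n_ant):
    """Number of unique antenna pairs (cross-correlation baselines)."""
    return n_ant * (n_ant - 1) // 2

full_hera = n_baselines(350)   # 350 dishes -> 61,075 waterfalls per observation
speedup = 1.6e5 / 7.4e3        # DFCN vs. Watershed throughput, roughly 21.6x
```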

Future work related to the amplitude-phase DFCN could include a modification to a similarly styled comprehensive data quality classifier, which should in turn lead to improved results for sky-based (Barry et al. 2016) and redundant calibration (Zheng et al. 2014), both of which require exceptionally conditioned data. A strict binary classifier could be achieved by developing a training dataset that doesn't use a mock sky, but an accurately modeled sky with a proper HERA beam model. Of course, it would also be possible, and might be better suited, to develop an observation-derived training dataset in this instance, as failure modes are generally easier to identify in visibilities as opposed to contamination by RFI.

³ Performed on a single NVIDIA GeForce GTX TITAN

It should also be possible to extend this work to arrays with better temporal resolution, such as the MWA (Tingay et al. 2015), in the search for transients like Fast Radio Bursts (FRBs; Zhang et al. 2018). The additional phase information could potentially reduce the low-end limit of fluence for identification due to a more significant contrast between RFI and sky fringes.

The GitHub repository for the RFI DFCN described in this paper can be found at https://github.com/UPennEoR/ml_rfi.

ACKNOWLEDGEMENTS

This material is based upon work supported by the National Science Foundation under Grant Nos. 1636646 and 1836019 and institutional support from the HERA collaboration partners. This research is funded in part by the Gordon and Betty Moore Foundation. HERA is hosted by the South African Radio Astronomy Observatory, which is a facility of the National Research Foundation, an agency of the Department of Science and Technology. This work was supported by the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562 (Towns et al. 2014). Specifically, it made use of the Bridges system, which is supported by NSF award number ACI-1445606, at the Pittsburgh Supercomputing Center (Nystrom et al. 2015). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X GPU used for this research. SAK is supported by a University of Pennsylvania SAS Dissertation Completion Fellowship. Parts of this research were supported by the Australian Research Council Centre of Excellence for All Sky Astrophysics in 3 Dimensions (ASTRO 3D), through project number CE170100013. GB acknowledges support from the Royal Society and the Newton Fund under grant NA150184. This work is based on research supported in part by the National Research Foundation of South Africa (grant No. 103424). GB acknowledges funding from the INAF PRIN-SKA 2017 project 1.05.01.88.04 (FORECaST). We acknowledge the support from the Ministero degli Affari Esteri della Cooperazione Internazionale - Direzione Generale per la Promozione del Sistema Paese Progetto di Grande Rilevanza ZA18GR02. This work is based on research supported by the National Research Foundation of South Africa (Grant Number 113121).



A4523/A6523 Spring 2019 www.astro.cornell.edu/~cordes/A6523 8

Time Series Classification from Scratch with Deep Neural Networks: A Strong Baseline

Zhiguang Wang, Weizhong Yan
GE Global Research

{zhiguang.wang, yan}@ge.com

Tim Oates
Computer Science and Electrical Engineering
University of Maryland Baltimore County

[email protected]

Abstract—We propose a simple but strong baseline for time series classification from scratch with deep neural networks. Our proposed baseline models are pure end-to-end, without any heavy preprocessing on the raw data or feature crafting. The proposed Fully Convolutional Network (FCN) achieves performance comparable to or better than other state-of-the-art approaches, and our exploration of very deep neural networks with the ResNet structure is also competitive. The global average pooling in our convolutional model enables the exploitation of the Class Activation Map (CAM) to find the contributing region in the raw data for specific labels. Our models provide a simple choice for real-world applications and a good starting point for future research. An overall analysis is provided to discuss the generalization capability of our models, learned features, network structures, and the classification semantics.

I. INTRODUCTION

Time series data is ubiquitous. Both human activities and nature produce time series every day and everywhere, like weather readings, financial recordings, physiological signals, and industrial observations. As the simplest type of time series data, univariate time series provide a reasonably good starting point to study such temporal signals. Representation learning and classification research has found many potential applications in fields like finance, industry, and health care.

However, learning representations and classifying time series still attract much attention. As the earliest baseline, distance-based methods work directly on raw time series with some pre-defined similarity measures, such as Euclidean distance or Dynamic Time Warping (DTW) [1], to perform classification. The combination of DTW and the k-nearest-neighbors classifier is known to be a very efficient approach and has served as the gold standard over the last decade.

Feature-based methods extract a set of features that represent the global/local time series patterns. Commonly, these features are quantized to form a Bag-of-Words (BoW), then given to classifiers [2]. Feature-based approaches mostly differ in the extracted features. To name a few recent benchmarks: the bag-of-features framework (TSBF) [3] extracts interval features with different scales from each interval to form an instance, and each time series forms a bag; a supervised codebook is built with a random forest for classifying the time series. Bag-of-SFA-Symbols (BOSS) [4] proposes a distance based on the histograms of symbolic Fourier approximation words. Its extension, the BOSSVS method [5], combines the BOSS model with the vector space model to reduce the time complexity and improve the performance by ensembling models with different window sizes. The final classification is performed with a One-Nearest-Neighbor classifier.

Ensemble-based approaches combine different classifiers to achieve higher accuracy. Different ensemble paradigms integrate various feature sets or classifiers. The Elastic Ensemble (PROP) [6] combines 11 classifiers based on elastic distance measures with a weighted ensemble scheme. The Shapelet Ensemble (SE) [7] produces classifiers through the shapelet transform in conjunction with a heterogeneous ensemble. The flat collective of transform-based ensembles (COTE) is an ensemble of 35 different classifiers based on features extracted from both the time and frequency domains.

All the above approaches need heavy crafting in data preprocessing and feature engineering. Recently, some effort has been spent to exploit deep neural networks, especially convolutional neural networks (CNNs), for end-to-end time series classification. In [8], a multi-channel CNN (MC-CNN) is proposed for multivariate time series classification. The filters are applied on each single channel and the features are flattened across channels as the input to a fully connected layer. The authors applied sliding windows to enhance the data. They only evaluate this approach on two multivariate time series datasets, where there is no published benchmark for comparison. In [9], the authors proposed a multi-scale CNN approach (MCNN) for univariate time series classification. Down sampling, skip sampling, and sliding windows are used for preprocessing the data to manually prepare for the multi-scale settings. Although this approach claims state-of-the-art performance on 44 UCR time series datasets [10], the heavy preprocessing efforts and a large set of hyperparameters make it complicated to deploy. The proposed window slicing method for data augmentation seems to be ad hoc.

We provide a standard baseline to exploit deep neural networks for end-to-end time series classification without any crafting in feature engineering and data preprocessing. Deep multilayer perceptrons (MLP), fully convolutional networks (FCN), and residual networks (ResNet) are evaluated on the same 44 benchmark datasets against the other benchmarks. Through pure end-to-end training on the raw time series data, the ResNet and FCN achieve comparable or better performance than COTE and MCNN. The global average pooling in our convolutional model enables the exploitation of

arXiv:1611.06455v4 [cs.LG] 14 Dec 2016


arXiv 1611.06455

[Figure 1 rendered as text; only the structure is recoverable from the residue:
(a) MLP: Input -> three fully-connected layers of 500 units each (ReLU), with dropout rates 0.1, 0.2, 0.2, 0.3 from input to softmax -> Softmax.
(b) FCN: Input -> convolution blocks with 128, 256, and 128 filters, each followed by BN + ReLU -> Global Pooling -> Softmax.
(c) ResNet: Input -> residual blocks built from 64- and 128-filter convolution layers (BN + ReLU) with shortcut additions -> Global Pooling -> Softmax.]

Fig. 1. The network structure of the three tested neural networks. Dashed lines indicate the operation of dropout.

the Class Activation Map (CAM) to find the contributing region in the raw data for the specific labels.

II. NETWORK ARCHITECTURES

We tested three deep neural network architectures to provide a fully comprehensive baseline.

A. Multilayer Perceptrons

Our plain baseline is a basic MLP built by stacking three fully-connected layers. The fully-connected layers each have 500 neurons and follow two design rules: (i) dropout [11] is used at each layer's input to improve the generalization capability; and (ii) the non-linearity is provided by the rectified linear unit (ReLU) [12] as the activation function to prevent saturation of the gradient when the network is deep. The network ends with a softmax layer. A basic layer block is formalized as

x = f_dropout,p(x)
y = W · x + b
h = ReLU(y)        (1)

This architecture is mostly distinguished from the seminal MLP of decades ago by the utilization of ReLU and dropout. ReLU helps to stack the networks deeper, and dropout largely prevents the co-adaptation of the neurons to help the model generalize well, especially on some small datasets. However, if the network is too deep, most neurons will hibernate, as ReLU totally halves the negative part. The Leaky ReLU [13] might help, but we only use a three-layer MLP with ReLU to provide a fundamental baseline. The dropout rates at the input layer, hidden layers, and the softmax layer are {0.1, 0.2, 0.3}, respectively (Figure 1(a)).
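The basic block of Eq. (1) is easy to sketch in numpy. This is an illustration, not the authors' implementation; the inverted-dropout scaling and the weight shapes are our assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def dropout(x, p, train=True):
    """Inverted dropout: zero each input with probability p and rescale
    the survivors by 1/(1-p), so the expected activation is unchanged."""
    if not train or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

def mlp_block(x, W, b, p):
    """One basic block of Eq. (1): dropout -> affine -> ReLU."""
    x = dropout(x, p)
    y = W @ x + b
    return np.maximum(y, 0.0)  # ReLU
```

Stacking three such blocks with 500-unit weight matrices and ending with a softmax reproduces the MLP of Figure 1(a).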

B. Fully Convolutional Networks

FCN has shown compelling quality and efficiency for semantic segmentation on images [14]. Each output pixel is a classifier corresponding to the receptive field, and the networks can thus be trained pixel-to-pixel given the category-wise semantic segmentation annotation.

In our problem setting, the FCN serves as a feature extractor. Its final output still comes from the softmax layer. The basic block is a convolutional layer followed by a batch normalization layer [15] and a ReLU activation layer. The convolution operation is fulfilled by three 1-D kernels with sizes {8, 5, 3} and no striding. The basic convolution block is

y = W ⊗ x + b
s = BN(y)
h = ReLU(s)        (2)

⊗ is the convolution operator. We build the final networks by stacking three convolution blocks with the filter sizes {128, 256, 128} in each block. Unlike MCNN and MC-CNN, we exclude any pooling operation. This strategy is also adopted in ResNet [16] to prevent overfitting. Batch normalization is applied to speed up convergence and help improve generalization. After the convolution blocks, the features are fed into a global average pooling layer [17] instead of a fully connected layer.
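A minimal numpy sketch of one convolution block of Eq. (2) followed by global average pooling. This is illustrative only: it handles a single 1-D input channel, and the per-example normalization stands in for true batch normalization (which normalizes over a mini-batch):

```python
import numpy as np

def conv1d_same(x, kernels, b):
    """Stride-1 multi-filter 1-D convolution with 'same' padding.
    x: (length,), kernels: (n_filters, k), b: (n_filters,)."""
    out = np.stack([np.convolve(x, k[::-1], mode='same') for k in kernels])
    return out + b[:, None]

def batch_norm(y, eps=1e-5):
    # per-filter normalization; a single-example stand-in for batch norm
    mu = y.mean(axis=1, keepdims=True)
    var = y.var(axis=1, keepdims=True)
    return (y - mu) / np.sqrt(var + eps)

def fcn_block(x, kernels, b):
    """One basic block of Eq. (2): convolution -> BN -> ReLU."""
    return np.maximum(batch_norm(conv1d_same(x, kernels, b)), 0.0)

def global_average_pool(h):
    """Average over the time axis: one feature per filter, fed to softmax."""
    return h.mean(axis=1)
```

Because global average pooling collapses the time axis, the network accepts series of any length, and the pooled weights admit the CAM visualization mentioned above.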



Draft version January 16, 2019. Typeset using LaTeX twocolumn style in AASTeX62.

Classification of Broad Absorption Line Quasars with a Convolutional Neural Network

Zhiyuan Guo1 and Paul Martini2

1Department of Astronomy, The Ohio State University, 140 West 18th Avenue, Columbus, OH 43210, USA
[email protected]

2Department of Astronomy and Center for Cosmology and Astro-Particle Physics, The Ohio State University, Columbus, OH 43210, USA
[email protected]

ABSTRACT

Quasars that exhibit blue-shifted, broad absorption lines (BAL QSOs) are an important probe of black hole feedback on galaxy evolution. Yet the presence of BALs is also a complication for large, spectroscopic surveys that use quasars as cosmological probes, because the BAL features can affect redshift measurements and contaminate information about the matter distribution in the Lyman-α forest. We present a new BAL QSO catalog for quasars in the Sloan Digital Sky Survey (SDSS) Data Release 14 (DR14). As the SDSS DR14 quasar catalog has over 500,000 quasars, we have developed an automated BAL classifier with a Convolutional Neural Network (CNN). We trained our CNN classifier on the C IV λ1549 region of a sample of quasars with reliable human classifications, and compared the results to both a dedicated test sample and visual classifications from the earlier SDSS DR12 quasar catalog. Our CNN classifier correctly classifies over 98% of the BAL quasars in the DR12 catalog, which demonstrates comparable reliability to human classification. The disagreements are generally for quasars with lower signal-to-noise ratio spectra and/or weaker BAL features. Our new catalog includes the probability that each quasar is a BAL; the strength, blueshifts, and velocity widths of the troughs; and similar information for any Si IV λ1398 BAL troughs that may be present. We find significant BAL features in 16.8% of all quasars with 1.57 < z < 5.56 in the SDSS DR14 quasar catalog.

Keywords: Catalog — Quasars: absorption lines

1. INTRODUCTION

Quasars or quasi-stellar objects (QSOs) are highly energetic sources at the centers of galaxies that are caused by the accretion of matter onto supermassive black holes. A subset of QSOs exhibit blue-shifted, broad absorption line (BAL) troughs with velocities greater than 2000 km s−1 (Weymann et al. 1991). One longstanding question in BAL research is whether the BAL phenomenon represents an evolutionary phase of all QSOs, or if it is always present, but only visible along a subset of all lines of sight. One interesting observation is that the fraction of BALs in a QSO sample depends on the selection method. Surveys that employ UV and visible wavelengths find BAL fractions of 10-30% (Foltz et al. 1990; Trump et al. 2006), while the IR-selected BAL fraction is greater, about 40%, based on a 2MASS-selected sample from Dai et al. (2008). This result suggests that BALs are present in at least a large fraction of all QSOs, and that the presence of BAL troughs may inhibit the identification of BAL QSOs via UV and visible-wavelength selection methods.

BAL QSOs are often subdivided based on the spectral lines that show broad absorption features, and the BAL fraction depends on which absorption features are seen. The most common type of BAL QSO exhibits absorption only in high-ionization lines such as C IV λ1549; these are called HiBALs. If absorption troughs from low-ionization features such as Mg II are also seen, then the BAL is classified as a LoBAL. LoBALs that exhibit absorption in Fe lines such as Fe II are classified as FeLoBALs. Finally, the rarest BALs exhibit absorption in the Balmer lines (Hall 2007; Mudd et al. 2017). The distribution of these classes depends on selection method, as Trump et al. (2006) find HiBALs, LoBALs, and FeLoBALs are 26%, 1.3%, and 0.3% in their study of QSOs in the SDSS, while Urrutia et al. (2009) find that the BAL fraction is above 30% for all three classes based on an IR-selected sample.

One important application of large, spectroscopic quasar samples is to measure the matter distribution at 0.8 < z < 2.2, the redshift range where quasars are the most accessible tracer of the matter distribution (Ata et al. 2018). That analysis requires robust measurements

arXiv:1901.04506v1 [astro-ph.GA] 14 Jan 2019

Combination of principal component analysis and a CNN




uncertain on the blue wing of the C IV line. Another is that it is not sensitive to BAL troughs that are very shallow. We attempt to address the uncertainty due to the continuum with the addition of an error term that accounts for the uncertainty due to the continuum fit. Our modified equation is:

σ²_BI = −∫_{25000}^{3000} [(σ_f²(v) + σ_PCA²)/0.9²] C(v) dv        (3)

The quantity σ_PCA is the uncertainty in our PCA fitting. We describe how we calculate this quantity in Section 2.3. AI and the uncertainty in AI are similar to the expressions for BI:

AI = −∫_{25000}^{0} [1 − f(v)/0.9] C(v) dv        (4)

σ²_AI = −∫_{25000}^{0} [(σ_f²(v) + σ_PCA²)/0.9²] C(v) dv        (5)

The main difference between AI and BI is that for AI the quantity C(v) is set to one once the trough has extended for more than 450 km s−1, rather than 2000 km s−1. It also extends the calculation velocity interval to zero blueshift. Hall et al. (2002) introduced AI to account for uncertainties in the systemic redshift and the continuum shape, and to measure intrinsic absorption systems, such as the mini-BAL troughs identified by Hamann et al. (2001). Finally, Trump et al. (2006) introduced the parameter χ²_trough, the reduced chi-squared for each detected trough:

χ²_trough = Σ (1/N) [(1 − f(v))/σ]²        (6)

In this expression N is the number of pixels in a trough, f(v) is the normalized flux, and σ is the estimated rms noise for each pixel. This expression is intended to quantify the statistical significance of any apparent trough, such that larger values correspond to more significant troughs. This quantity is particularly useful for assessing weak troughs and/or troughs in low signal-to-noise ratio data. Trump et al. (2006) consider a trough with χ²_trough > 10 to be significant.
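These trough statistics are straightforward to evaluate on a discretized, continuum-normalized spectrum. A sketch only: the velocity grid, flux arrays, and noise level below are hypothetical, and the C(v) logic follows the 450 km s−1 rule quoted for AI:

```python
import numpy as np

def absorption_index(v, f, min_width=450.0):
    """Discrete version of Eq. (4) over an increasing outflow-velocity
    grid v (km/s) with continuum-normalized flux f.  C(v) switches to 1
    only after f has stayed below 0.9 continuously for > min_width km/s."""
    dv = np.gradient(v)
    C = np.zeros_like(f)
    run = 0.0
    for i in range(len(v)):
        run = run + dv[i] if f[i] < 0.9 else 0.0
        if run > min_width:
            C[i] = 1.0
    return np.sum((1.0 - f / 0.9) * C * dv)

def chi2_trough(f, sigma):
    """Eq. (6): (1/N) * sum ((1 - f)/sigma)^2 over the N trough pixels."""
    f = np.asarray(f, dtype=float)
    return np.mean(((1.0 - f) / sigma) ** 2)
```

A saturated trough 1000 km s−1 wide yields a positive index, while a 300 km s−1 one yields zero, which is exactly the minimum-width behavior described in the text.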

2.2. Data

The starting point for our analysis is the SDSS DR14 Quasar Catalog of Paris et al. (2018). This catalog has 526,356 quasars, including measurements of BI and σ_BI for each quasar. The catalog was derived from the spectroscopic data in the Fourteenth Data Release of SDSS (Abolfathi et al. 2018), and we also used these spectra as the starting point for our analysis. These spectra were mostly obtained as part of the Baryon Oscillation Spectroscopic Survey (BOSS) and its extension (eBOSS; Dawson et al. 2013, 2016), which were observed with the SDSS spectrograph (Smee et al. 2013). More information about the selection and analysis of these quasars is described in Paris et al. (2018).

We also used information from the SDSS DR12 quasar catalog (Paris et al. 2017). That catalog includes 297,301 quasars from the Twelfth Data Release of SDSS (Alam et al. 2015). The advantage of the DR12 catalog is that BAL quasars were flagged during a visual inspection of all of the quasar targets (see also Paris et al. 2012). The DR12 quasar catalog has 29,580 quasars visually flagged as BALs. For all quasars at z ≥ 1.57, this catalog includes measurements of BI, AI, and χ²_trough. There is an AI value if there is at least one trough with χ²_trough ≥ 10, and 48,863 quasars meet this criterion. The catalog also includes the number of troughs and the velocity ranges of each trough. Of the sample of 29,580 quasars visually flagged as BALs, 21,444 have AI > 0 and 15,044 have BI > 0.

The redshift distributions of the DR12 and DR14 quasar catalogs are shown in Figure 1. The DR14 catalog includes many more quasars with 0.8 < z < 2.2 because of a change in the selection criteria to identify more quasars to trace large-scale structure in this redshift range (Ata et al. 2018). The inset panel shows the redshift distribution for quasars with a significant BI value, which we define as BI > 3σ_BI.

2.3. Principal Component Analysis

We use Principal Component Analysis (PCA) to fit the spectra of the quasars from DR14. These fits are necessary to obtain accurate estimates of the continuum to characterize any absorption troughs. Removal of the quasar continuum shape and broad emission features also reduces the complexity of the quasar spectra for automated classification. Similar to Paris et al. (2012), we generated five principal components from 8000 quasars with no evidence for BAL features and redshifts of 1.57 < z < 5.56 that match our search for BALs, and generate PCA components over the rest frame wavelength range from 1260 Å to 2400 Å. This wavelength range provides good coverage of the C IV and Si IV regions, where we characterize absorption troughs with blueshifts up to 25000 km s−1. The five PCA components are shown in Figure 2.

We fit these five PCA components to each quasar with a χ² minimization algorithm. This algorithm decreases the wavelength range of the fit for quasars near the redshift limits of our study. We also run an algorithm to detect troughs with blueshifts from −25000 to 0 km s−1


Figure 1. Redshift distribution of the SDSS-DR12 (dashed red histogram) and SDSS-DR14 (solid blue histogram) quasars over the redshift range 0 < z < 5. The inset panel in the upper right shows the redshift distribution of quasars with a significant BI value, BI > 3σ_BI. The BAL quasars are only shown over the redshift range 1.57 < z < 5.56, where the C IV region is visible with the SDSS spectra.

relative to C IV, and use the results to iteratively maskBAL features. The iterative masking of the BAL fea-tures significantly improves the PCA fit to quasars withsignificant absorption troughs. Finally, we subtract thebest PCA fit from each quasar. Examples of the PCAfit and the subtraction are shown in Figure 3.We calculated PCA fits to all of the SDSS DR14

quasar spectra with 1.57 < z < 5.56. In most cases,the subtracted spectrum is flat and the broad emissionlines of C IV and other ions are barely visible, especiallyfor quasars without BAL troughs. The exceptions aretypically on the blue side of the C IV line (and oftenthe Si IV line), where the impact of absorption troughsare most apparent. These di↵erence spectra thereforehighlight exactly the features that we want the auto-mated classifier to identify. Finally, we only use thevelocity range from �25000 to 0 km s�1 relative to C IV

for the automatic classification. This dramatically de-creases the size of the data, and increases the e�ciencyof the classifier.Following the method described above, we fit PCA

components to all of the quasars in SDSS DR14. Figure 3 shows examples for both non-BAL and BAL quasars centered on the C IV emission line. This emission line is removed when the PCA fit is subtracted, while the absorption features in the BAL quasar examples are preserved. Even though we fit the PCA components over a wide wavelength range, we only use the subtracted spectra from −25,000 km s⁻¹ to 0 km s⁻¹ relative to C IV as input to our classifier.

[Figure 2 panels omitted: Principal Components 1–5 plotted as flux versus rest-frame wavelength over 1400–2400 Å.]

Figure 2. Five principal components computed from a sample of 8000 quasars with no absorption features. The PCA components span the rest-frame wavelength range 1260 Å to 2400 Å. See Section 2.3 for more details.

To characterize the quality of the PCA fit for the measurement of absorption features, we calculate a separate χ²_fit over just the velocity range from −25,000 to 5,000 km s⁻¹ relative to C IV:

χ²_fit = (1/D) Σ_{k=1}^{n} (O_k − E_k)² / σ²        (7)

The quantity D is the number of degrees of freedom, O_k and E_k represent the value for each pixel in the original spectrum and the PCA fit, respectively, and σ is the uncertainty in each original pixel as provided in Paris et al. (2017). The distribution of these χ²_fit values for our DR14 quasar sample is shown in Figure 4. The distribution of χ²_fit values is peaked at about one, although with a tail to larger values.
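Equation (7) is straightforward to compute over the fitted pixel window. A minimal numpy sketch (the function name is ours, and defaulting D to the number of pixels is an assumption; the paper's D is not spelled out here):

```python
import numpy as np

def chi2_fit(obs, model, sigma, dof=None):
    # Reduced chi-squared of a fit over a pixel window (Eq. 7):
    # obs and model are the original spectrum and the PCA fit, sigma the
    # per-pixel uncertainty; dof defaults to the number of pixels.
    obs, model, sigma = map(np.asarray, (obs, model, sigma))
    if dof is None:
        dof = obs.size
    return np.sum((obs - model) ** 2 / sigma ** 2) / dof

# Sanity check: Gaussian noise at the stated sigma should give chi2 near 1.
rng = np.random.default_rng(1)
model = np.ones(375)
sigma = 0.1 * np.ones(375)
obs = model + sigma * rng.standard_normal(375)
```

A well-fit quasar spectrum should land near the peak of the distribution at χ²_fit ≈ 1, while BAL troughs inflate the statistic.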

3. AUTOMATIC BAL CLASSIFICATION

We chose to implement our automatic BAL classification algorithm with a convolutional neural network


Figure 4. Reduced χ² distribution of the PCA fits to the entire sample of SDSS-DR14 quasars (blue, dashed line). The distributions are also shown for the CNN training set (red, dotted line) and test set (green, solid line). These two subsets of the data have larger χ² values because both have a larger percentage of BALs by construction (about 50%), and the fits to BALs typically have larger χ² values.

[Figure 5 schematic: Input (1@375×1) → Convolution → C1: feature maps (32@375×1) → Max pooling → M2: feature maps (32@75×1) → Full connection (F3: 512×1) → Output (2).]

Figure 5. CNN structure employed to create our BAL classifier. Each input spectrum has 375 pixels. The CNN structure has one convolutional layer, one max pooling layer, and a fully connected layer that performs the classification. See Section 3 for more details.

subtraction of the PCA fit. As CNNs typically work best with input layers on the interval [0,1] or [−1,1], we also experimented with various schemes to renormalize our data, such as division by the PCA fit, but did not identify one that produced substantially better results than simple subtraction. The CNN structure we use is shown in Figure 5. We implemented the CNN structure with

TensorFlow¹, an open source software library for machine learning (Abadi et al. 2015).
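To make the shapes in Figure 5 concrete, the forward pass can be sketched in plain numpy (the kernel width of 7, ReLU activations, and random weights are our assumptions for illustration, not the paper's trained network):

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_same(x, kernels):
    # 'Same'-padded 1-D convolution followed by ReLU.
    # x: (length,); kernels: (n_filters, k) with odd k.
    n_filters, k = kernels.shape
    xp = np.pad(x, k // 2)
    out = np.stack([np.convolve(xp, kern, mode="valid")[: x.size]
                    for kern in kernels], axis=1)
    return np.maximum(out, 0.0)

def max_pool(x, size):
    # Non-overlapping max pooling along the pixel axis.
    length = (x.shape[0] // size) * size
    return x[:length].reshape(-1, size, x.shape[1]).max(axis=1)

spectrum = rng.standard_normal(375)                             # one difference spectrum
c1 = conv1d_same(spectrum, 0.1 * rng.standard_normal((32, 7)))  # C1: (375, 32)
m2 = max_pool(c1, 5)                                            # M2: (75, 32)
f3 = np.maximum(m2.ravel() @ (0.01 * rng.standard_normal((75 * 32, 512))), 0.0)
logits = f3 @ (0.01 * rng.standard_normal((512, 2)))            # 2-way BAL / non-BAL
z = logits - logits.max()
probs = np.exp(z) / np.exp(z).sum()                             # softmax probabilities
```

The pooling factor of 5 takes the 375 convolved pixels down to 75, matching the 32@75×1 feature maps in the figure; a trained version of this structure would be built and optimized in TensorFlow rather than hand-rolled.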

3.2. Training and Testing Sets

Machine learning methods such as our CNN classifier require a training set. We used the DR12 quasar catalog of Paris et al. (2017) as the starting point to produce one for our classifier. One virtue of the DR12 catalog is that there was a visual inspection of every quasar. However, we did not completely rely on the DR12 classifications because human classification is inherently subjective and there is no single, quantitative definition of a BAL that is appropriate for all applications. For example, the balnicity index of Weymann et al. (1991) does not include the first 2000 km s⁻¹ of the absorption feature, will miss shallow troughs, and does not extend to the center of the C IV line. It will consequently miss broad absorption that is less than 2000 km s⁻¹ in extent that could nevertheless impact cosmological analysis with the Lyα forest, as well as strong absorption features near the center of the C IV line that could compromise the redshift estimate. The AI quantity introduced by Hall et al. (2002) is sensitive to narrower absorption features that are still broader than typical galaxy velocities, and does extend to zero velocity, although it is still insensitive to the shallowest features. Finally, both of these measures work less well in low signal-to-noise ratio spectra, and both can be compromised by poor fits to the quasar continuum. We started construction of our training set with about

10,000 visually-classified BALs and 10,000 visually-classified non-BALs from the DR12 quasar catalog (Paris et al. 2017), but adjusted these classifications through several iterative passes. For each iteration we trained a new classifier on the BALs and non-BALs in the training set, ran the classifier on the training sample, visually inspected all of the apparent mis-classifications and ambiguous cases, and adjusted the classifications of the training set as appropriate. After several iterations, the classifier converged well with our visual classifications. In all cases, we label quasars with a BAL probability greater than 50% as BALs, and quasars below this percentage as non-BALs. For our visual inspection step, we classified a quasar as a BAL if it showed narrower troughs than the 2000 km s⁻¹ minimum width for BI, and included troughs that extended to the center of the C IV emission line. Our iterative process changed the final classifications of about 6.5% of the training set quasars relative to the DR12 visual BAL flag. While still a subjective process, this approach proved to be

1 https://www.tensorflow.org
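The balnicity index (BI) discussed above can be sketched discretely. This is a hedged toy implementation of the Weymann et al. (1991) idea (integrate the trough depth below 90% of the continuum, counting it only after a contiguous 2000 km s⁻¹ of trough); the function name and grid are ours, and details such as the exact integration limits may differ from the published definition:

```python
import numpy as np

def balnicity_index(vel, flux_norm):
    # vel: blueshift magnitudes (km/s), increasing; flux_norm: continuum-
    # normalized flux on that grid. Depth below 90% of the continuum is
    # accumulated only once a trough has persisted for 2000 km/s.
    dv = np.diff(vel, prepend=vel[0])
    depth = np.clip(1.0 - flux_norm / 0.9, 0.0, None)
    bi, run = 0.0, 0.0
    for d, w in zip(depth, dv):
        if d > 0:
            run += w
            if run > 2000.0:       # contiguity condition satisfied
                bi += d * w
        else:
            run = 0.0              # trough broken; reset the running width
    return bi

# Toy spectrum: a single 4000 km/s wide, 50%-deep trough in the search window.
vel = np.arange(3000.0, 25000.0, 10.0)
flux = np.ones_like(vel)
flux[(vel > 8000) & (vel < 12000)] = 0.5
bi = balnicity_index(vel, flux)
```

This makes the text's caveat concrete: the first 2000 km s⁻¹ of any trough contributes nothing, so a trough narrower than that yields BI = 0 even if it is deep.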

The method uses PCA to fit the spectra of many quasars (continuum and emission lines), which are subtracted to give spectra with (primarily) absorption lines.
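The fit-and-subtract step can be sketched with scikit-learn's PCA. The spectra below are synthetic stand-ins (real inputs would be SDSS quasar spectra with BAL features iteratively masked), and all shapes and values are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Stand-in data: 200 "quasar spectra" of 500 pixels, built from a smooth
# continuum plus a fake C IV emission bump plus noise.
wave = np.linspace(1260.0, 2400.0, 500)        # rest-frame wavelength (Angstroms)
continuum = 1.0 + 0.3 * np.exp(-0.5 * ((wave - 1550.0) / 30.0) ** 2)
spectra = continuum + 0.05 * rng.standard_normal((200, 500))

# Fit a small number of principal components to the absorption-free sample ...
pca = PCA(n_components=5)
pca.fit(spectra)

# ... reconstruct each spectrum from those components, and subtract. The
# residual is (mostly) flat, so real absorption troughs stand out in it.
model = pca.inverse_transform(pca.transform(spectra))
residual = spectra - model
```

In the paper's pipeline the residual spectra, restricted to the velocity window blueward of C IV, are what get fed to the CNN classifier.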


A4523/A6523 Spring 2019 12

Fast Radio Bursts

Fast Radio Burst 121102 Pulse Detection and Periodicity: A Machine Learning Approach

Yunfan Gerry Zhang¹,², Vishal Gajjar²,³, Griffin Foster¹,²,⁴, Andrew Siemion¹,²,⁵,⁶,⁷, James Cordes⁸, Casey Law¹, and Yu Wang⁹

1 Department of Astronomy, University of California, Berkeley, CA, USA; [email protected]
2 Berkeley SETI Research Center, University of California, Berkeley, CA, USA
3 Space Sciences Laboratory, Berkeley, CA, USA
4 Sub-Department of Astrophysics, Oxford University, UK
5 SETI Institute, Mountain View, CA, USA
6 Radboud University, Nijmegen, The Netherlands
7 Institute of Space Sciences and Astronomy, University of Malta, Malta
8 Department of Astronomy, Cornell University, USA
9 Department of Statistics, University of California, Berkeley, CA, USA

Received 2018 April 10; revised 2018 August 19; accepted 2018 August 30; published 2018 October 23

Abstract

We report the detection of 72 new pulses from the repeating fast radio burst FRB 121102 in Breakthrough Listen C-band (4–8 GHz) observations at the Green Bank Telescope. The new pulses were found with a convolutional neural network in data taken on 2017 August 26, where 21 bursts have been previously detected. Our technique combines neural network detection with dedispersion verification. For the current application, we demonstrate its advantage over a traditional brute-force dedispersion algorithm in terms of higher sensitivity, lower false-positive rates, and faster computational speed. Together with the 21 previously reported pulses, this observation marks the highest number of FRB 121102 pulses from a single observation, totaling 93 pulses in five hours, including 45 pulses within the first 30 minutes. The number of data points reveals trends in pulse fluence, pulse detection rate, and pulse frequency structure. We introduce a new periodicity search technique, based on the Rayleigh test, to analyze the time of arrivals (TOAs), with which we exclude with 99% confidence periodicity in TOAs with periods larger than 5.1 times the model-dependent timestamp uncertainty. In particular, we rule out constant periods ≳10 ms in the barycentric arrival times, though intrinsic periodicity in the time of emission remains plausible.

Key words: methods: data analysis – methods: observational – methods: statistical – pulsars: general – techniques: image processing
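The Rayleigh-test idea mentioned in the abstract can be sketched quickly: fold the TOAs at a trial period and measure how concentrated the resulting phases are. This is a generic toy version, not the paper's exact statistic or its treatment of timestamp uncertainties:

```python
import numpy as np

def rayleigh_power(toas, period):
    # Rayleigh statistic Z for a trial period: for unperiodic TOAs the phases
    # are uniform on the circle and Z ~ chi^2 with 2 dof; clustered phases
    # (a real periodicity) drive Z toward 2n.
    phases = 2.0 * np.pi * np.asarray(toas) / period
    n = phases.size
    return 2.0 * (np.cos(phases).sum() ** 2 + np.sin(phases).sum() ** 2) / n

# Toy TOAs: 40 arrivals near multiples of an invented 0.5 s period.
rng = np.random.default_rng(2)
true_period = 0.5
toas = true_period * np.arange(40) + 0.005 * rng.standard_normal(40)

z_true = rayleigh_power(toas, true_period)   # phases cluster: large Z
z_wrong = rayleigh_power(toas, 0.37)         # phases scatter: small Z
```

A periodicity search would scan `rayleigh_power` over a grid of trial periods and compare the largest Z against the false-alarm distribution.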

1. Introduction

Fast radio bursts (FRBs) are millisecond-duration radio transients that exhibit dispersion relations consistent with propagation through cold plasma (Lorimer et al. 2007; Thornton et al. 2013; Petroff et al. 2016). Out of the known FRBs, only FRB 121102 has been observed to repeat (Spitler et al. 2014, 2016; Scholz et al. 2016, 2017). The repeating pulses allowed precise localization of the source within a dwarf galaxy of redshift 0.193 (Chatterjee et al. 2017; Marcote et al. 2017; Tendulkar et al. 2017), confirming the extragalactic nature of the phenomenon suspected from their high dispersion measures (DMs). Recently, Breakthrough Listen reported observations of 21 pulses of FRB 121102 recorded with the C-band receiver at the Green Bank Telescope (GBT; Gajjar et al. 2018). The reported bursts mark the highest frequencies of pulses ever detected from the repeating FRB. Together with the observation at the William E. Gordon Telescope at the Arecibo Observatory, the new pulses show 100% linearly polarized emission with a high and variable rotation measure of +1.33×10⁵ radians per square meter to +1.46×10⁵ radians per square meter in the source reference frame, indicating that the source is situated in a highly magneto-ionic environment (Michilli et al. 2018).

In this paper, we present a reanalysis of the C-band observation by Breakthrough Listen on 2017 August 26 with convolutional neural networks (CNNs). Recent rapid development of deep learning, and, in particular, CNNs (Krizhevsky et al. 2012; Simonyan & Zisserman 2014; Szegedy et al. 2014; He et al. 2015), has enabled revolutionary improvements in

signal classification and pattern recognition in all fields of data science such as, but not limited to, computer image processing, medicine, and autonomous driving. In this work, we present the first successful application of deep learning to direct detection of fast radio transient signals in raw spectrogram data. Deep learning methods have been applied to pulsar searches in Zhu et al. (2014) and Guo et al. (2017), while Wagstaff et al. (2016) and Foster et al. (2018) applied traditional machine learning to single-pulse transient candidate classification. Recently, Connor & van Leeuwen (2018) applied deep learning models to FRB searches. These works all focused on reducing the false-positive rate from traditional search candidates, though Connor & van Leeuwen (2018) suggested the possibility of direct deep learning detection. As we shall see, neural networks can in some scenarios offer higher sensitivity, but lack interpretability in their predictions. Dedispersion searches are interpretable but may suffer from a poor sensitivity/false-alarm trade-off. In this work, we leverage the advantages of both techniques by using the latter to verify the candidate outputs of the former. The resulting technique revealed more than 70 new pulses of FRB 121102 in a 5 hr C-band observation conducted by Breakthrough Listen. Our neural network is capable of processing Breakthrough Listen spectral-temporal data 70 times faster than real time on a single GPU, though processing speed in other contexts depends on the frequency and time resolution. We do not claim our technique is ready to replace current state-of-the-art dedispersion pipelines, but our method shows advantages in some scenarios and encourages further exploration.

The Astrophysical Journal, 866:149 (18pp), 2018 October 20. https://doi.org/10.3847/1538-4357/aadf31 © 2018. The American Astronomical Society. All rights reserved.




number of layers by half. Both of these models show a noticeable drop in recall compared to the original. While it is certainly possible to obtain higher sensitivity with a larger model, here we chose an architecture that is sufficient for this analysis.

The accuracy of detection with a sufficiently capable network is determined by the quality of the training set. A good training set not only needs to be large enough to capture the distribution of inputs but also be relatively balanced. Since our positive examples are simulated, we are able to have an exactly balanced representation of positive signals, resulting in good recall scores for all ranges of DMs, widths, and frequency modulations. The RFI distribution, however, is not necessarily balanced. The rate of false detections in our network is around 2%. Out of the 400,000 images in our training set, if a type of RFI only exists in 400 images, then the network would not have sufficient incentive to learn to reject an interferer of that type, because complete misclassification of the RFI only leads

to a 0.1% reduction in precision. A common method to reduce such false positives is fine-tuning, which refers to the retraining of the network with a smaller data set. However, fine-tuning is subject to overfitting and can prove difficult in practice. We are developing a novel method to train our network to be more robust to such underrepresented RFI types. For this current work, we simply manually reject all such false positives. In addition to RFI, background random noise presents

another potential source of false detection. Even though the distinction between RFI and FRB can be reduced with better training data, the detection performance in a background of noise is subject to trade-offs between recall and false-alarm rate depending on the threshold of detection. To test this explicitly, we ran our trained model on a 3 hr BL observation in the X-band (8–11.6 GHz). The network returned only seven positives due to occasional strong RFI. In other words, the observation, the equivalent of around 120,000 images, produced no false positives due to noise, thus indicating we are in the very low false-positive regime in noise receiver operation characteristics.

3.5. Inference Speed

Inference speed is crucial in real-time applications such as autonomous driving, where a large number of images must be processed per second. For radio astronomy, inference speed is less of an issue, as we now explain. Because of the high noise in radio astronomy data, each pixel in the input spectrogram does not contain independent useful information. Therefore, the first one or two convolutional layers should employ large convolutional kernels and large strides, which immediately reduces the data rate. In this analysis, we fix the channel

Figure 1. Examples of simulated pulses on real observations. Relatively bright examples are shown for visual clarity, while the actual training set contained much weaker pulses.

Table 1. Architecture of Our Residual Network, Showing Input, Five Convolutional Stacks (conv), Average Pooling (avg-pool), and Fully Connected (fc) Output Layers

Group Name   Output Size    Stack Type
conv0        342×256×1      [32×1]×1
conv1        171×128×32     [7×7]×1
conv2        42×32×32       [3×3]×2
conv3        10×8×64        [3×3]×3
conv4        5×4×128        [3×3]×2
avg-pool     1×128
fc           2

Note. Stack types are shown as [h×w]×N, where h and w are the height and width of the weights, and N is the number of convolutional blocks in the stack. The output sizes are shown as [H×W]×M, where M is the number of features.

Figure 2. Model recall scores as a function of fluence (top) and full-band frequency-integrated S/N (bottom). The dashed lines show the training recall while the solid lines show test recall. Two smaller models are shown in comparison with our original (red). Compared to the original model, the thin model has half the amount of features, and the shallow model has half as many layers. Both show reduced recall for weak pulses. The flare at the low-fluence side is due to small-number statistics. In the lower panel, the theoretical upper-limit recall for the full-band dedispersion search with a 6σ threshold is included for comparison.





Applying Deep Learning to Fast Radio Burst Classification

Liam Connor¹,² and Joeri van Leeuwen¹,²

1 ASTRON, Netherlands Institute for Radio Astronomy, Postbus 2, 7990 AA Dwingeloo, The Netherlands; [email protected]
2 Anton Pannekoek Institute for Astronomy, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands

Received 2018 March 8; revised 2018 October 2; accepted 2018 October 2; published 2018 November 9

Abstract

Upcoming fast radio burst (FRB) surveys will search ∼10³ beams on the sky with a very high duty cycle, generating large numbers of single-pulse candidates. The abundance of false positives presents an intractable problem if candidates are to be inspected by eye, making it a good application for artificial intelligence (AI). We apply deep learning to single-pulse classification and develop a hierarchical framework for ranking events by their probability of being astrophysical transients. We construct a treelike deep neural network that takes multiple or individual data products as input (e.g., dynamic spectra and multibeam information) and trains on them simultaneously. We have built training and test sets using false-positive triggers from real telescopes, simulated FRBs, and pulsar single pulses. Training the network was independently done for both the CHIME Pathfinder and Apertif. High accuracy and recall can be achieved with a labeled training set of a few thousand events. Even with high triggering rates, classification can be done very quickly on graphical processing units, which is essential for selective voltage dumps or real-time VOEvents. We investigate whether dedispersion back ends could be replaced by a real-time DNN classifier. It is shown that a single forward propagation through a moderate convolutional network could be faster than brute-force dedispersion, but the low signal-to-noise per pixel makes such a classifier suboptimal for this problem. Real-time automated classification will prove useful for bright, unexpected signals, both now and when searchable parameter spaces outgrow our ability to manually inspect data, such as for the SKA and ngVLA.

Key words: methods: data analysis – pulsars: general – techniques: image processing

1. Introduction

Fast radio bursts (FRBs) are bright, millisecond-duration, extragalactic radio transients characterized by dispersion measures (DMs) that are significantly larger than the expected Milky Way contribution. They have been detected at flux densities between tens of μJy and tens of Jy (Lorimer et al. 2007; Thornton et al. 2013; Petroff et al. 2015a; Ravi et al. 2016). The majority of early detections were made with the Parkes telescope multibeam receiver, but in recent years, detections have been made at Arecibo (Spitler et al. 2014), Green Bank Telescope (GBT; Masui et al. 2015), the Upgraded Molonglo Synthesis Telescope (UTMOST; Caleb et al. 2017), and the Australian Square Kilometre Array Pathfinder (ASKAP; Bannister et al. 2017). The only source known to repeat is FRB 121102 (Scholz et al. 2016; Spitler et al. 2016), allowing for the first host galaxy localization using very long baseline interferometry (VLBI; Marcote et al. 2017; Tendulkar et al. 2017). Recently, the repeating bursts from this source were found to be almost 100% linearly polarized with a Faraday rotation measure (RM) of 10⁵ rad m⁻² (Michilli et al. 2018a).

There are likely thousands of detectable events each day across the full sky, but only ∼50 have been observed to date. This is due to the moderate field of view (FoV) and relatively low duty cycle of current FRB surveys. Still, such surveys have produced thousands of false-positive triggers for each true FRB, the diagnostic plots of which have traditionally been inspected by eye (Masui et al. 2015; Amiri et al. 2017; Caleb et al. 2017; Foster et al. 2018). For upcoming fast transient surveys, the false-positive problem will be intractable if single-pulse candidates are to be human-inspected, even with rigorous removal of radio frequency interference (RFI). The Canadian

Hydrogen Intensity Mapping Experiment (CHIME) will search 1024 beams at all times between 400 and 800 MHz and up to very high DMs (Kaspi & CHIME/FRB Collaboration 2017; Ng et al. 2017). The Aperture Tile in Focus (Apertif) experiment on the Westerbork telescope will continuously search thousands of synthesized beams at 1.4 GHz (van Leeuwen 2014). ASKAP (Bannister et al. 2017) and UTMOST (Caleb et al. 2017) are also expected to have high detection rates, searching many beams with high duty cycles. As a result, we will go from roughly five new FRB detections per year (2012–2017) to, potentially, thousands (>2019). This will also correspond with an orders-of-magnitude increase in the number of false-positive candidates, meaning that the generation of such events must be mitigated, and the process of sifting through them must be automated. In pulsar searching, the problem is arguably worse due to the

larger number of parameters involved, like rotation period and its derivatives. Over the last decades, the ranking of pulsar candidates has involved an initial step of selection through simple heuristics, the main one being the peak signal-to-noise ratio (S/N) of the frequency-collapsed pulse profile. Thereafter, astronomers go through the ordered list of candidate plots looking for further pulsar signs, such as broadband, properly dispersed signal; a sharply peaked (not sinusoidal) folded profile; and steady emission throughout the observation. An experienced pulsar astronomer can average one to two plots per second, and human brains are very capable of singling out the most promising candidates. But modern multibeam pulsar surveys and the increasing bandwidths and new frequencies outside of the radio-quiet protected spectrum are making this approach unfeasible. A telescope like LOFAR employs many hundreds of beams (van Leeuwen & Stappers 2010) and produces vast numbers of candidates. The LOFAR pilot surveys the LPPS and LOFAR Tied-Array All-Sky Survey

The Astronomical Journal, 156:256 (13pp), 2018 December. https://doi.org/10.3847/1538-3881/aae649 © 2018. The American Astronomical Society. All rights reserved.




Notion of replacing standard dedispersion with a NN that ‘learns’ dedispersion



MNRAS 000, 1–14 (2019). Preprint 19 February 2019. Compiled using MNRAS LaTeX style file v3.0

Towards deeper neural networks for Fast Radio Burst detection

Devansh Agarwal,¹,²* Kshitij Aggarwal,¹,²† Sarah Burke-Spolaor,¹,² Duncan R. Lorimer¹,² and Nathaniel Garver-Daniels¹,²

1 West Virginia University, Department of Physics and Astronomy, P. O. Box 6315, Morgantown, WV, USA
2 Center for Gravitational Waves and Cosmology, West Virginia University, Chestnut Ridge Research Building, Morgantown, WV, USA

Accepted XXX. Received YYY; in original form ZZZ

ABSTRACT

With the upcoming commensal surveys for Fast Radio Bursts (FRBs), and their high candidate rate, usage of machine learning algorithms for candidate classification is a necessity. Such algorithms will also play a pivotal role in sending real-time triggers for prompt follow-ups with other instruments. In this paper, we have used the technique of Transfer Learning to train state-of-the-art deep neural networks for classification of FRB and Radio Frequency Interference (RFI) candidates. These are convolutional neural networks which work on radio frequency-time and dispersion measure-time images as the inputs. We trained these networks using simulated FRBs and real RFI candidates from telescopes at the Green Bank Observatory. We present 11 deep learning models, each with an accuracy and recall above 99.5% on our test dataset comprising real RFI and pulsar candidates. As we demonstrate, these algorithms are telescope and frequency agnostic and are able to detect all FRBs with signal-to-noise ratios above 10 in ASKAP and Parkes data. We also provide an open-source python package FETCH (Fast Extragalactic Transient Candidate Hunter) for classification of candidates, using our models. Using FETCH, these models can be deployed along with any commensal search pipeline for real-time candidate classification.

Key words: radio continuum: transients – methods: data analysis

1 INTRODUCTION

Fast Radio Bursts (FRBs) are extremely bright, millisecond-duration radio transients that are characterised by dispersion measures (DMs) that are much higher than the expected Milky Way contribution, originally seen in data from the Parkes radio telescope (Lorimer et al. 2007; Thornton et al. 2013a). They have subsequently been detected in data collected at Arecibo (Spitler et al. 2014), Green Bank Telescope (GBT) (Masui et al. 2015), the upgraded Molonglo Synthesis Telescope (UTMOST) (Caleb et al. 2017), and the Australian Square Kilometre Array Pathfinder (ASKAP) (Bannister et al. 2017; Shannon et al. 2018). Of over 60 FRBs published,¹ two have been found to repeat: FRB 121102 (Spitler et al. 2016) and FRB 180814.J0422+73 (Amiri et al. 2019a). FRB 121102 was confidently localized to a low-metallicity host galaxy at a redshift of 0.19 by the Realfast

* E-mail: [email protected] (DA)
† E-mail: [email protected] (KA)
* Both authors contributed equally to this work.

1 http://frbcat.org (Petroff et al. 2016)

detector (Law et al. 2018) on the Karl G. Jansky Very Large Array (Chatterjee et al. 2017; Tendulkar et al. 2017), making it evident that some, if not all, FRBs are cosmological in origin.

FRB searches are typically done on high time and frequency resolution radio astronomical data by first correcting for the dispersive delay over many trial DM values. This is then frequency averaged to generate a time series. These de-dispersed time series are then convolved with box-car kernels of various widths to look for broader pulses. Finally, candidates above a detection threshold are marked for visual inspection by a human. More recently, however, with the advent of state-of-the-art de-dispersion algorithms and Graphic Processing Unit (GPU)-accelerated pipelines (e.g., heimdall² (Barsdell et al. 2012); FREDDA (Bannister et al. in prep); bonsai (Smith et al. in prep)), it is now possible to implement real-time FRB searches. As a result, commensal back-ends for FRB detection are now running on many radio telescopes around the world. All of these searches

2 https://sourceforge.net/projects/heimdall-astro

© 2019 The Authors

arXiv:1902.06343v1 [astro-ph.IM] 17 Feb 2019
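The brute-force search described in the introduction (dedisperse at many trial DMs, frequency-average to a time series, boxcar-convolve, threshold) can be sketched end to end. The dispersion constant is the standard 4.149 ms (DM in pc cm⁻³, f in GHz); the band, sample time, and injected pulse below are invented for the demo:

```python
import numpy as np

def dedisperse(spec, dm, freqs_ghz, dt):
    # Undo the cold-plasma delay of each channel relative to the top of the band:
    # delay(f) = 4.149 ms * DM * (f^-2 - f_top^-2), with f in GHz.
    out = np.empty_like(spec)
    for i, f in enumerate(freqs_ghz):
        delay_s = 4.149e-3 * dm * (f ** -2 - freqs_ghz[0] ** -2)
        out[i] = np.roll(spec[i], -int(round(delay_s / dt)))
    return out

def boxcar_snr(series, width):
    # S/N of a boxcar-smoothed time series (simplified: global mean and std).
    smooth = np.convolve(series, np.ones(width) / width, mode="same")
    return (smooth - smooth.mean()) / smooth.std()

# Toy data: one pulse dispersed across a 4-8 GHz band, injected into noise.
rng = np.random.default_rng(3)
freqs = np.linspace(8.0, 4.0, 64)      # GHz, highest channel first
dt = 1e-3                               # 1 ms samples
data = rng.standard_normal((64, 512))
true_dm = 560.0
for i, f in enumerate(freqs):
    delay_s = 4.149e-3 * true_dm * (f ** -2 - freqs[0] ** -2)
    data[i, 100 + int(round(delay_s / dt))] += 10.0

# Search a few trial DMs; the true DM maximizes the frequency-averaged S/N.
snrs = {dm: boxcar_snr(dedisperse(data, dm, freqs, dt).mean(axis=0), 4).max()
        for dm in (0.0, 280.0, 560.0)}
best_dm = max(snrs, key=snrs.get)
```

Production pipelines such as heimdall do this over thousands of trial DMs and boxcar widths on GPUs; this sketch only illustrates the structure of the search.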





It’s not all about neural networks



The Astrophysical Journal, 517:78–91, 1999 May 20
© 1999. The American Astronomical Society. All rights reserved. Printed in U.S.A.

AN AUTOMATED CLUSTER FINDER: THE ADAPTIVE MATCHED FILTER

JEREMY KEPNER,¹,² XIAOHUI FAN,¹ NETA BAHCALL,¹ JAMES GUNN,¹ ROBERT LUPTON,¹ AND GUOHONG XU¹

Received 1998 March 9; accepted 1998 December 29

ABSTRACT

We describe an automated method for detecting clusters of galaxies in imaging and redshift galaxy surveys. The adaptive matched filter (AMF) method utilizes galaxy positions, magnitudes, and, when available, photometric or spectroscopic redshifts to find clusters and determine their redshift and richness. The AMF can be applied to most types of galaxy surveys, from two-dimensional (2D) imaging surveys, to multiband imaging surveys with photometric redshifts of any accuracy (2.5-dimensional [2½D]), to three-dimensional (3D) redshift surveys. The AMF can also be utilized in the selection of clusters in cosmological N-body simulations. The AMF identifies clusters by finding the peaks in a cluster likelihood map generated by convolving a galaxy survey with a filter based on a model of the cluster and field galaxy distributions. In tests on simulated 2D and 2½D data with a magnitude limit of r′ ≈ 23.5, clusters are detected with an accuracy of Δz ≈ 0.02 in redshift and ∼10% in richness to z ≲ 0.5. Detecting clusters at higher redshifts is possible with deeper surveys. In this paper we present the theory behind the AMF and describe test results on synthetic galaxy catalogs.

Subject headings: galaxies: clusters: general – methods: data analysis

1. INTRODUCTION

Clusters of galaxies, the most massive virialized systems known, provide powerful tools in the study of cosmology, from tracing the large-scale structure of the universe (Bahcall 1988; Huchra et al. 1990; Postman, Huchra, & Geller 1992; Dalton et al. 1994; Peacock & Dodds 1994, and references therein) to determining the amount of dark matter on Mpc scales (Zwicky 1957; Tyson, Valdes, & Wenk 1990; Kaiser & Squires 1993; Peebles 1993; Bahcall, Lubin, & Dorman 1995; Carlberg et al. 1996) to studying the evolution of cluster abundance and its cosmological implications (Evrard 1989; Peebles, Daly, & Juszkiewicz 1989; Henry et al. 1992; Eke, Cole, & Frenk 1996; Bahcall, Fan, & Cen 1997; Carlberg et al. 1997; Oukbir & Blanchard 1997). The above studies place some of the strongest constraints yet on cosmological parameters, including the mass-density parameter of the universe, the amplitude of mass fluctuations at a scale of 8 h⁻¹ Mpc, and the baryon fraction.

The availability of complete and accurate cluster catalogs needed for such studies is limited. One of the most used catalogs, the Abell catalog of rich clusters (Abell 1958, and its southern counterpart, Abell, Corwin, & Olowin 1989), has been extremely useful over the past four decades. This catalog, which contains ∼4000 rich clusters to z ≲ 0.2 over the entire high-latitude sky, with estimated redshifts and richnesses for all clusters, was constructed by visual selection from the Palomar Sky Survey plates, using well-defined selection criteria. The Zwicky cluster catalog (Zwicky et al. 1961–1968) was similarly constructed by visual inspection.

The need for new, objective, and accurate large-area cluster catalogs to various depths is growing, following the important use of clusters in cosmology. Large-area sky surveys using CCD imaging in one or several colors, as well as redshift surveys, are currently planned or underway,

1 Princeton University Observatory, Peyton Hall, Ivy Lane, Princeton, NJ 08544-1001; jvkepner@astro.princeton.edu, fan@astro.princeton.edu, neta@astro.princeton.edu, jeg@astro.princeton.edu, rhl@astro.princeton.edu, xu@astro.princeton.edu.

2 Present address: MIT Lincoln Laboratory, Lexington, MA.

including, among others, the Sloan Digital Sky Survey (SDSS). Such surveys will provide the data needed for constructing accurate cluster catalogs that will be selected in an objective and automated manner. In order to identify clusters in the new galaxy surveys, a robust and automated cluster selection algorithm is needed. We propose such a method here.

Cluster identification algorithms have typically been targeted at specific surveys, and new algorithms have been created as each survey is completed. Abell (1958) was the first to develop a well-defined method for cluster selection, even though the identification was carried out by visual inspection (see, e.g., McGill & Couchman 1990 for an analysis of this method). Other algorithms have been created for the Automatic Plate Measurement Facility (APM) survey (Dalton et al. 1994; Dalton, Maddox, & Sutherland 1997; see Schuecker & Bohringer 1998 for a variant of this method), the Edinburgh-Durham survey (ED; Lumsden et al. 1992), and the Palomar Distant Cluster Survey (PDCS; Postman et al. 1996; see also Kawasaki et al. 1998 for a variant of this method; and Kleyna et al. 1996 for an application of this method to finding dwarf spheroidals). All the above methods were designed for and applied to two-dimensional imaging surveys.

In this paper we present a well-defined, quantitative method, based on a matched filter technique that expands on some of the previous methods and provides a general algorithm that can be used to identify clusters in any type of survey. It can be applied to two-dimensional (2D) imaging surveys, 2½-dimensional (2½D) surveys (multiband imaging with photometric redshift estimates of any accuracy), and three-dimensional (3D) redshift surveys, as well as combinations of the above (i.e., some galaxies with photometric redshifts and some with spectral redshifts). In addition, this adaptive matched filter (AMF) method can be applied to identify clusters in cosmological simulations.

The AMF identifies clusters by finding the peaks in a cluster likelihood map generated by convolving a galaxy survey (2D, 2½D, or 3D) with a filter that models the cluster and field galaxy distribution. The peaks in the likelihood map correspond to locations where the match between the filter and the data is maximized.
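The convolve-and-find-peaks idea can be illustrated with a toy example. This is not the authors' AMF code: the Gaussian filter shape, map size, injected cluster, and 5-sigma threshold are all invented for the sketch.

```python
# Illustrative sketch of the matched-filter idea: convolve a 2D galaxy
# count map with a cluster-shaped filter, then locate likelihood peaks.
import numpy as np

rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(64, 64)).astype(float)  # field galaxies
counts[30:34, 30:34] += 8.0                             # a toy "cluster"

# Gaussian filter standing in for a cluster radial profile, laid out
# with wraparound so the convolution can be done with FFTs
y, x = np.mgrid[0:64, 0:64]
r2 = (np.minimum(y, 64 - y) ** 2 + np.minimum(x, 64 - x) ** 2).astype(float)
filt = np.exp(-r2 / (2 * 2.0**2))
filt /= filt.sum()

likelihood = np.real(np.fft.ifft2(np.fft.fft2(counts) * np.fft.fft2(filt)))

# Peaks: cells exceeding the map mean by 5 sigma
threshold = likelihood.mean() + 5 * likelihood.std()
peaks = np.argwhere(likelihood > threshold)
print(peaks)  # rows/cols near the injected cluster at (30-33, 30-33)
```

The same machinery generalizes to the 2½D and 3D cases by carrying the filter through the extra (redshift) dimension.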



A4523/A6523 Spring 2019 20

Unsupervised learning of Kepler data
MNRAS 484, 834–849 (2019), doi:10.1093/mnras/sty3461. Advance Access publication 2018 December 21

Systematic serendipity: a test of unsupervised machine learning as a method for anomaly detection

Daniel Giles1,2 and Lucianne Walkowicz1
1Astronomy Department, The Adler Planetarium, Chicago, IL 60605, USA
2Physics Department, Illinois Institute of Technology, 10 W 35th St, Chicago, IL 60616, USA

Accepted 2018 December 18. Received 2018 December 17; in original form 2018 October 8

ABSTRACT
Advances in astronomy are often driven by serendipitous discoveries. As survey astronomy continues to grow, the size and complexity of astronomical data bases will increase, and the ability of astronomers to manually scour data and make such discoveries decreases. In this work, we introduce a machine learning-based method to identify anomalies in large data sets to facilitate such discoveries, and apply this method to long cadence light curves from NASA's Kepler Mission. Our method clusters data based on density, identifying anomalies as data that lie outside of dense regions. This work serves as a proof-of-concept case study and we test our method on four quarters of the Kepler long cadence light curves. We use Kepler's most notorious anomaly, Boyajian's star (KIC 8462852), as a rare 'ground truth' for testing outlier identification to verify that objects of genuine scientific interest are included among the identified anomalies. We evaluate the method's ability to identify known anomalies by identifying unusual behaviour in Boyajian's star; we report the full list of identified anomalies for these quarters, and present a sample subset of identified outliers that includes unusual phenomena, objects that are rare in the Kepler field, and data artefacts. By identifying <4 per cent of each quarter as outlying data, we demonstrate that this anomaly detection method can create a more targeted approach in searching for rare and novel phenomena.
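The density idea in the abstract (cluster the data, then flag whatever falls outside dense regions) can be sketched in a few lines. This toy version is not the paper's pipeline — the real method runs density-based clustering (DBSCAN) on light-curve features — it simply counts neighbours within a radius and flags sparse points; the data and parameters here are invented for the example.

```python
# Minimal density-based outlier flagging: a point is "ordinary" if enough
# other points fall within a radius eps of it, and an outlier otherwise.
import numpy as np

rng = np.random.default_rng(1)
normal = rng.normal(0.0, 1.0, size=(500, 2))   # a dense "ordinary" cloud
weird = np.array([[8.0, 8.0], [-9.0, 7.5]])    # stand-ins for anomalous stars
X = np.vstack([normal, weird])

eps, min_neighbors = 0.7, 10
# Pairwise distances between all points (502 x 502 matrix)
d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
neighbor_counts = (d < eps).sum(axis=1) - 1    # exclude the point itself
is_outlier = neighbor_counts < min_neighbors

print(f"{is_outlier.sum()} of {len(X)} points flagged as outliers")
```

In the paper's setting, each row of `X` would instead hold features extracted from one Kepler light curve, so sparsely populated feature-space regions correspond to anomalous stars.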

Key words: methods: data analysis – surveys – stars: individual: KIC 8462852.

1 IN T RO D U C T I O N

Survey astronomy is producing more data than ever before, both expanding the number of objects observed and the number of observations per object. PanSTARRS, for example, recently delivered to astronomy the first petabyte scale data release (Chambers et al. 2016), Gaia has released data for nearly 2 billion sources (Gaia Collaboration 2016, 2018), and others, like the Transiting Exoplanet Survey Satellite (TESS; Ricker 2014), and the Zwicky Transient Facility (ZTF; Smith et al. 2014), have launched and will release data in short order. The Large Synoptic Survey Telescope (LSST; LSST Science Collaboration 2009) will have first light in the next few years and deliver 10 to 30 terabytes of data per night. These surveys yield unprecedented insights into the universe by observing billions of stars and galaxies through space and time, adding new objects to every category of known phenomena, and creating new categories of previously unknown, unobserved events. Identifying

⋆ E-mail: [email protected] (DG); [email protected] (LW)
‡ LSSTC Data Science Fellow

new, anomalous, and outlying observations poses a significant challenge given the scale of data. In this work we present a proof-of-concept for a methodology we've developed to address this challenge.

As Douglas Hawkins puts it, 'An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism' (Hawkins 1980). The need for, and by extension the application of, anomaly detection in large-scale astronomy is still relatively new, but anomaly detection is well precedented outside of the astronomical community. Computer scientists have developed techniques to identify abnormalities for a multitude of reasons, including detecting network attacks (Agrawal & Agrawal 2015), fraud (Ahmed, Mahmood & Islam 2016), and malware (Menahem et al. 2009). A survey of different anomaly detection methods is presented by Chandola, Banerjee & Kumar (2009) and initial applications to astronomical survey data have been pursued (for example Wagstaff et al. 2013; Baron & Poznanski 2017). To date, though, discoveries of novel phenomena in astronomy have often been more serendipitous than intentional (see Thompson et al. 2012; Wright et al. 2014; Boyajian et al. 2016). The scale of modern astronomical surveys does not

© 2018 The Author(s). Published by Oxford University Press on behalf of the Royal Astronomical Society


‘Anomaly’ = dips in Boyajian’s star that don’t fit standard transit time series

Kepler data = training set

Application to TESS data is anticipated



Mon. Not. R. Astron. Soc. 319, 700–716 (2000)

Wide field imaging — I. Applications of neural networks to object detection and star/galaxy classification

S. Andreon, G. Gargiulo, G. Longo, R. Tagliaferri and N. Capuano

1 Osservatorio Astronomico di Capodimonte, via Moiariello 16, 80131 Napoli, Italy
2 Facoltà di Scienze, Università di Salerno, via S. Allende, 84081 Baronissi (Salerno), Italy
3 IIASS 'E. R. Caianiello', via G. Pellegrino 19, 84019 Vietri sul Mare (Salerno), Italy
4 DMI, Università di Salerno, via S. Allende, 84081 Baronissi (Salerno), Italy
5 INFM, Unità di Salerno, via S. Allende, 84081 Baronissi (Salerno), Italy

Accepted 2000 May 3. Received 2000 April 24; in original form 1999 November 26

ABSTRACT

Astronomical wide-field imaging performed with new large-format CCD detectors poses data reduction problems of unprecedented scale, which are difficult to deal with using traditional interactive tools. We present here NExt (Neural Extractor), a new neural network (NN) based package capable of detecting objects and performing both deblending and star/galaxy classification in an automatic way. Traditionally, in astronomical images, objects are first distinguished from the noisy background by searching for sets of connected pixels having brightnesses above a given threshold; they are then classified as stars or as galaxies through diagnostic diagrams having variables chosen according to the astronomer's taste and experience. In the extraction step, assuming that images are well sampled, NExt requires only the simplest a priori definition of 'what an object is' (i.e. it keeps all structures composed of more than one pixel) and performs the detection via an unsupervised NN, approaching detection as a clustering problem that has been thoroughly studied in the artificial intelligence literature. The first part of the NExt procedure consists of an optimal compression of the redundant information contained in the pixels via a mapping from pixel intensities to a subspace individualized through principal component analysis. At magnitudes fainter than the completeness limit, stars are usually almost indistinguishable from galaxies, and therefore the parameters characterizing the two classes do not lie in disconnected subspaces, thus preventing the use of unsupervised methods. We therefore adopted a supervised NN (i.e. a NN that first finds the rules to classify objects from examples and then applies them to the whole data set). In practice, each object is classified depending on its membership of the regions mapping the input feature space in the training set. In order to obtain an objective and reliable classification, instead of using an arbitrarily defined set of features we use a NN to select the most significant features among the large number of measured ones, and then we use these selected features to perform the classification task. In order to optimize the performance of the system, we implemented and tested several different models of NN. The comparison of the NExt performance with that of the best detection and classification package known to the authors (SExtractor) shows that NExt is at least as effective as the best traditional packages.

Key words: methods: data analysis – techniques: image processing – catalogues.

1 INTRODUCTION

Astronomical wide-field (hereafter WF) imaging encompasses the use of images larger than 3000² pixels (Lipovetsky 1993) and is the only tool to tackle problems based on rare objects or on statistically significant samples of optically selected objects. Therefore, WF imaging has been and still is of paramount relevance to almost the whole field of astrophysics: from the structure and dynamics of our Galaxy to the environmental effects on galaxy formation and evolution to the large-scale structure of the Universe. In the past, WF was the almost exclusive domain of Schmidt telescopes equipped with large photographic plates and

⋆ E-mail: andreon@na.astro.it



From: Wide field imaging — I. Applications of neural networks to object detection and star/galaxy classification. Mon Not R Astron Soc 2000;319(3):700–716. doi:10.1046/j.1365-8711.2000.03700.x. © RAS

Preprocessing of data before NN






Basics of Principal Component Analysis (PCA)

• Done on blackboard
• Estimate covariance matrix C from data vectors
• Solve eigen equation with C as operator
• Eigenvectors = principal components
• Eigenvalues = statistical variance in the data associated with each eigenvector

• The eigen equation is found via constrained maximization:
• Maximize the variance along each eigenvector subject to:

• Orthonormality of the eigenvectors (unit length, orthogonal)
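The constrained maximization can be written out explicitly (the standard PCA derivation, with C the covariance matrix and w a candidate direction):

```latex
% Maximize the variance along w subject to unit length:
%   max_w  w^T C w   subject to  w^T w = 1
\mathcal{L}(\mathbf{w},\lambda)
  = \mathbf{w}^{\mathsf T} C\,\mathbf{w}
  - \lambda\,(\mathbf{w}^{\mathsf T}\mathbf{w} - 1),
\qquad
\nabla_{\mathbf{w}}\mathcal{L} = 2C\mathbf{w} - 2\lambda\mathbf{w} = 0
\;\Longrightarrow\; C\mathbf{w} = \lambda\mathbf{w}.
% At a solution, the variance w^T C w equals the eigenvalue lambda,
% which is why eigenvalues measure the variance per component.
```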

• Utility:
• Detection (most of the signal variance may be in a small subset of components)
• Compression (possibly express data with a small number of components)
• Preprocessing of data as input to a network
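The steps above can be sketched in a few lines of numpy (illustrative only; the toy data and variable names are not from the lecture):

```python
# PCA sketch: estimate covariance, solve the eigen equation, then use
# the leading eigenvector(s) for detection/compression.
import numpy as np

rng = np.random.default_rng(2)
# Toy data: 200 vectors in 3-D with most variance along one direction
t = rng.normal(size=(200, 1))
X = t @ np.array([[3.0, 2.0, 0.5]]) + 0.1 * rng.normal(size=(200, 3))

Xc = X - X.mean(axis=0)              # center the data vectors
C = np.cov(Xc, rowvar=False)         # estimate covariance matrix C
evals, evecs = np.linalg.eigh(C)     # solve the eigen equation (C symmetric)
order = np.argsort(evals)[::-1]      # sort by variance, descending
evals, evecs = evals[order], evecs[:, order]

# Most of the signal variance should sit in the first component
print(evals / evals.sum())
scores = Xc @ evecs[:, :1]           # compress: project onto the leading PC
```

`eigh` returns orthonormal eigenvectors, so the constraint in the derivation is satisfied automatically, and the printed ratios show how few components carry the variance.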



Principal Component Analysis

These slides will be posted separately from the lecture slides.