
MNRAS 000, 1–17 (2017)    Preprint 20 June 2017    Compiled using MNRAS LaTeX style file v3.0
arXiv:1612.04748v3 [astro-ph.IM] 18 Jun 2017

Linear feature detection algorithm for astronomical surveys - I. Algorithm description

Dino Bektešević1,2⋆ and Dejan Vinković1,2,3
1 Faculty of Science, University of Split, Rudjera Boškovića 33, HR-21000 Split, Croatia
2 Science and Society Synergy Institute, Bana Josipa Jelačića 22, HR-40000 Čakovec, Croatia
3 HiperSfera d.o.o., Ilica 36, HR-10000 Zagreb, Croatia

    Accepted XXX. Received YYY; in original form ZZZ

ABSTRACT
Computer vision algorithms are powerful tools in astronomical image analyses, especially when automation of object detection and extraction is required. Modern object detection algorithms in astronomy are oriented towards the detection of stars and galaxies, completely ignoring the detection of existing linear features. With the emergence of wide-field sky surveys, linear features attract scientific interest as possible trails of fast flybys of near-Earth asteroids and meteors. In this work we describe a new linear feature detection algorithm designed specifically for implementation in Big Data astronomy. The algorithm combines a series of algorithmic steps that first remove other objects (stars, galaxies) from the image and then enhance the line to enable more efficient line detection with the Hough algorithm. The rate of false positives is greatly reduced thanks to a step that replaces possible line segments with rectangles and then compares lines fitted to the rectangles with the lines obtained directly from the image. The speed of the algorithm and its applicability in astronomical surveys are also discussed.

Key words: methods: data analysis – surveys – meteors – minor planets, asteroids: general

    1 INTRODUCTION

Long linear features in astronomical images are seldom of scientific interest, with the exception of asteroid tracks (in the case of fast near-Earth-crossers, but those are still much shorter than meteor and satellite tracks; Waszczak et al. 2017). Such features are treated as noise, since the expectation is that they originate from non-astronomical sources: satellite and airplane tracks, scratches and dirt within the optical system and imaging camera, and reflection and diffraction spikes within the telescope (Storkey et al. 2004). With the growing importance of sky surveys of various sizes and scopes (Djorgovski et al. 2013), the need for a proper classification of all objects detected within the field of view also grows. This also includes linear features, as their identification enables corrections to the scientifically relevant features affected by the linear noise.

However, the large volume of imaging data produced in surveys, especially in wide-field surveys in visual bands, created two interesting challenges. The first is that detection of linear features has to be done automatically, with a robust linear feature detection algorithm (LFDA). The second

⋆ E-mail: [email protected] (DB)

is that some specific cases of linear features attracted scientific interest: trails created by close flybys of near-Earth asteroids (NEAs) (Ivezić et al. 2010; Abell et al. 2009) and, more recently, trails created by meteors (Bektešević et al. 2014; Vinković et al. 2016). Those are rare transient events with unpredictable positions in the sky (except when a NEA flyby is predicted far in advance, but most of the time smaller NEAs are discovered during their close flyby). Wide-field sky surveys cover a large fraction of the sky, with modern surveys doing that repeatedly. This creates a big enough sky coverage to detect such rare random events, but they are hidden in a large volume of imaging data.

Detection of lines is a very common requirement in computer vision. The standard method used for this is the Hough algorithm (Hough 1959) or its derivatives. Hence, the first suggestions for LFDA in astronomy were based on this algorithm (e.g. Kubičková 2011; Storkey et al. 2004), but detecting lines in astronomical images is not the same class of problem as detecting lines in ordinary photos. Astronomical images have stars and galaxies that produce strong localized brightness peaks that easily confuse the Hough algorithm. This manifests itself as a large number of false positives, which forces the algorithm to be limited only to very bright


lines that dominate the image. This ignores all the false negatives coming from lines of lower brightness.

In this paper we describe a new type of LFDA that overcomes the above mentioned problems. The Hough method is still one of the crucial elements, but we added several other steps that expand the applicability of the line detection. For the survey images we used the SDSS database (Alam et al. 2015). A detailed review of our LFDA performance on the entire SDSS image dataset will be published in an upcoming paper, while here we focus solely on the algorithm description and its basic performance properties. In Section 2 we first describe the constraints on the applicability of any LFDA in Big Data astronomy. An overview of the algorithm and its algorithmic components is given in Section 3. The first step in our LFDA is the removal of all objects (stars, galaxies, etc.) from the image, such that the linear features can dominate in the Hough space. This step depends on the particular survey under consideration; we describe in Section 4 how we do it in the case of SDSS. Steps in our LFDA differ slightly between the detection of bright lines, described in Section 5, and the detection of dim lines, described in Section 6. In Section 7 we show some typical examples of line detection, including intermediate steps of the algorithm performance. In Section 8 we discuss the algorithm's performance in the context of false positive and false negative detections and the algorithm's implementation.

2 ALGORITHM APPLICABILITY AND LIMITATIONS

The LFDA is not particularly useful in situations when images can be inspected manually for linear features. If we assume that the manual revision time for one image is one second, then a set of 10 000 images would require approximately 3 hours for such a task, while 100 000 images would take about a day of revision time. In practice, the total revision time is dominated mainly by reviewer fatigue, which prolongs the review and makes it a very laborious process. Thus, we can stipulate that there would be no need to implement a computer vision algorithm to deal with a line detection problem in up to 100 000 images. The number is even lower if we need to inspect images multiple times.

This limits the algorithm use cases to situations where we need to analyze images in real time or when we need to inspect image sets with the number of images in the millions. The former use case is well represented by potential applications in large sky surveys that require real-time image analysis, such as the Large Synoptic Survey Telescope (LSST)1, while the latter is applicable in cases of existing image databases, such as the Sloan Digital Sky Survey (SDSS)2

or the Dark Energy Survey (DES)3.

In both use cases it is necessary for the algorithm to perform fast. For example, the SDSS database contains approximately 6.5 million images4. If it takes a second for the algorithm to decide whether an image contains a linear feature or not, it would take a total of 75 days to process the entire data set.

1 http://www.lsst.org/
2 http://www.sdss.org/
3 http://www.darkenergysurvey.org/
4 http://www.sdss.org/dr13/data_access/volume/

It is therefore desirable that the total execution time per image, including data acquisition and not just the processing time, is less than a second. Also, the algorithm should be easily parallelizable to speed up the analysis of the entire database. Analysis of an image does not depend on any of its neighbouring images. As long as there are no shared input or output objects, or resources that cannot be accessed simultaneously, the parallelization becomes an embarrassingly parallel problem. Hence, maintaining low data access times is a greater problem than parallelization, except in the case of real-time execution when the images come directly from the detector.

If the image storage facilities are not equipped to run the detection algorithm on-site, it is necessary to consider the computer memory required to store the data on a machine that would be capable of running the algorithm. For example, the corrected SDSS bzipped FITS files require 15.37 TB of storage. Once bunzipped for processing, the total memory requirement rises to 70-80 TB, assuming an average file size of 13 MB. This example is the lower estimate of the total memory requirements, as the SDSS camera has "only" 126 megapixels (Gunn et al. 1998), compared to more modern cameras such as the 560-megapixel DECam (Honscheid et al. 2008) or the 3 200-megapixel LSST camera (Abell et al. 2009). Additionally, if the processing cannot be done on-site, the question of how to transfer such a large quantity of data to the machine becomes important. With the hard disk drive (HDD) storage requirements only growing, it is obvious that the processing will eventually have to be done in real time, within the image processing pipeline of the sky survey in question, or on compressed sections of the total data that can fit into the machine where the processing will be done. In the latter case, uncompressing the data puts additional strain on the temporal execution requirements of the algorithm itself. Our test on bz2-compressed SDSS FITS files, with 100 test samples performed on an Intel i5 Asus laptop with a 7 200 rpm Seagate Momentus HDD (average read rate 78.6 MB/s, access time 15.76 ms), shows consistent and stable bunzip execution times of 0.8 seconds.

Less worrying than the HDD storage requirements are the random access memory (RAM) requirements. During algorithm execution, each image might require several copies to be held in memory during processing. If several LFDA processes depend on the same RAM space, and each holds several copies of an image, then the number of processes able to run in parallel could be very limited. Fortunately, RAM constraints are alleviated by the fact that the images themselves are not that large. In the case of SDSS the image frames are 2048x1489 pixel arrays, while the DES images have 2000x4000 pixels. This limits image size to the range of 10-30 MB, which means that even in a case when a single process requires 100-120 MB, it would be possible to run up to ten processes per 1 GB of RAM space.

The last constraint on the algorithm execution is its correctness. For example, when we tried to analyze 6.5 million image frames from the SDSS database, it was crucial to minimize Type I (false positive) and Type II (false negative) error rates. If we assume that the software rejects 99% of images as images without linear features, the number of returned images, possibly containing linear features, would be approximately 65 000. Each one of those would


have to be reviewed manually to determine whether it is a false positive or not. On the other hand, severely restricting the detection parameters to accept only the most confident detections would boost Type II errors and result in a biased sample of only the brightest, longest and thickest linear features, which would therefore be a poor statistical sample unfit for making general conclusions.

    3 ALGORITHM OVERVIEW

3.1 Platform and data specifics of our LFDA implementation

We implemented our LFDA in Python 2.6.65 and utilized Erin Sheldon's fitsio v0.9.46, numpy v1.7.07 and OpenCV v2.4.88 for reading/writing FITS files and image analysis. In order to meet the fast performance requirements, some steps (such as object removal, data input and congregation of results) are coded to give optimal performance for the dataset in question. Currently the algorithm is specifically set up for the SDSS data structure and for execution on the Fermi cluster at the Observatory Belgrade. However, a lot of attention has been given to abstracting the data layer input-output operations, which makes the algorithm easily adaptable to other types of datasets and platforms. The existing software stack contains seven modules in total. Most of them deal with input, output and various algorithm settings and will not be discussed in depth in this paper. We will focus only on the modules that contain the gist of the linear feature detection algorithm.

Distributed Queueing System9 (DQS) scripts control the process requirements (number of CPU cores, memory, file requests, etc.) and parallelized execution by instructing the TORQUE10 resource manager and Maui11 scheduler on the requirements of each process within a submitted parallel job. These scripts are generated by the Createjobs module (see pseudo-code Createjobs) and, apart from the intrinsic process requirements, consist mostly of calls made to the Detecttrails module (see pseudo-code DetectTrails). Each job is given a subset of images for analysis, disjoint from the other jobs. Within an individual job, images are processed in serial order and the job has its own isolated output file. Parallelization is thus dependent on the environment in which the code is running and is not inherent to the actual line-detection algorithm. The Detecttrails module contains all of the 29 parameters that control the algorithm execution, listed in Table 1, and invokes the commands in the Removestars and Processfield modules in the correct order. The Removestars module contains the known-object removal functions (stars, galaxies, etc.), while the Processfield module is where the detect_bright and detect_dim functions are located.

5 https://www.python.org/
6 https://github.com/esheldon/fitsio
7 http://www.numpy.org/
8 http://opencv.org
9 http://www.csb.yale.edu/userguides/sysresource/batch/doc/InstMaint.html#_Toc352941915
10 http://www.adaptivecomputing.com/products/open-source/torque/
11 http://www.adaptivecomputing.com/products/open-source/maui/

Procedure Createjobs
Input: Execution and detection parameters, execution limitations, execution environment variables.
Output: Series of DQS scripts and a batch shell script.

read DQS file template;
while not out of parameters do
    replace KEYWORDS;
end
save DQS scripts as job#.dqs;
open new file;
foreach DQS script do
    write "submit job#.dqs";
end
save as shell script;

Procedure DetectTrails. Called from DQS scripts.
Input: Detection parameters, set of files to process.
Output: Results (text file) or none.

foreach file in set do
    bunzip file to BUNZIP_PATH;
    open image header with fitsio;
    open data header with fitsio;
    call remove_stars in Removestars module;
    call detect_bright in Processfield module;
    if detection then
        write results;
        continue;
    else
        call detect_dim in Processfield module;
        if detection then
            write results;
            continue;
        else if error then
            write errors;
            continue;
        else
            continue;
        end
    end
end

    3.2 General algorithm flow

The algorithm itself is designed as a three-step pipeline in which each successive step is more time consuming and less reliable. A short diagram of the execution logic is shown in the pseudo-codes DetectTrails and Createjobs. A more detailed algorithm-execution flowchart is shown in Figure B1 (Appendix B). The first step is the removal of all known objects, followed by a step that tries to detect bright linear features. When this step detects a possible linear feature, the results are output and the process terminates. If no bright linear features are detected, the next step is to search for dim linear features. The image is subjected to additional processing that increases the intensity of dim objects, followed by a step similar to detecting bright lines.


Figure 1. (a) Dilation; (b) Original; (c) Erosion. An example of the effects dilation and erosion have on linear objects. The original image is a part of the SDSS image frame frame-i-002888-1-0139.fits. A 3x3 rectangle kernel was applied in both operations, with all kernel elements used for the decision on the new value of the anchor point.

Detecting bright and dim lines includes several checks to test the validity of the possible linear features. Whenever any of the checks fail, the algorithm immediately rejects the line candidate without completing further checks.

The central idea of the algorithm is to remove all known objects in the image, detect the remaining objects, apply shape descriptor operators to perform structural analysis on the detected remaining objects, and then reconstruct the image while keeping only the elongated objects. If the same object is detected in the reconstructed image and in the image with known objects removed, then the detection can be confidently declared as positive. This approach is used in both the detect-bright-lines and the detect-dim-lines steps. Such a two-step search based on line brightness allows the algorithm to be more robust against varying image quality.

    3.3 Erosion and dilation

Erosion and dilation are morphological operators used in the LFDA for noise removal. Both erosion and dilation use a kernel applied to the image pixels. The kernel is a matrix with a defined anchoring element, usually at its center. The new value of the element located at the anchor point is decided by the image pixel values covered by the nonzero elements of the kernel. Erosion will assign the anchored pixel the minimum value found in the kernel, while dilation will assign the maximum value found in the kernel.

Chaining combinations of erosion and dilation operators creates other operators. The opening operator is erosion followed by dilation. As we show in Figure 1, applying a too-large erosion operator has the potential of destroying objects in the image. However, if just sufficiently strong erosion is applied, such that the object of interest survives, and then equally strong dilation is applied, the resulting image will contain the original object of interest and all objects larger than it, but no objects smaller than the object of interest. The opening operator is thus very good at dealing with noise. A small erosion kernel effectively deletes isolated pixels, after which the dilation operator restores the remaining objects in the image to their previous sizes.

Figure 2. (a) Original image; (b) original image after the opening operator has been applied. The opening operator (erosion followed by dilation) is very useful at removing noise, as shown on this portion of the SDSS image frame frame-i-002888-1-0017.fits. A 3x3 rectangle kernel was used.

This process is shown in Figure 2. The opening operator tends to break up diffuse objects, such as halos of large stars and galaxies, into a collection of small "blobs". To reconnect these blobs back into a single object, it is necessary to apply dilation with a larger kernel than the one used in erosion.
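To make the operators concrete, here is a minimal Python/OpenCV sketch (cv2 and numpy are the libraries our implementation already uses; the input file name is a placeholder):

import cv2
import numpy as np

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # placeholder 8-bit image
kernel = np.ones((3, 3), np.uint8)                   # 3x3 rectangular kernel

eroded = cv2.erode(img, kernel)    # anchor pixel takes the kernel minimum
dilated = cv2.dilate(img, kernel)  # anchor pixel takes the kernel maximum
# opening: erosion deletes isolated noise pixels, then dilation restores
# the surviving objects to roughly their original size
opened = cv2.dilate(cv2.erode(img, kernel), kernel)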

    3.4 Histogram equalization

Histogram equalization is a technique used for adjusting image contrast. In essence, histogram equalization finds the smallest and largest intensity values in the image, assigns them the minimum and maximum allowed intensity values (set by the image bit depth) and then reassigns the intensity values of the remaining pixels based on a cumulative distribution function of the image. In our LFDA, histogram equalization is applied to images in conjunction with brightness enhancement. Brightness enhancement consists of manually adding a fixed value to all existing pixels with intensities larger than zero. Increasing the brightness and then applying histogram equalization can lead to drastically different end results depending on the amount of brightness added (see Figure 3).


Figure 3. (a) Only histogram equalization; (b) brightness was increased by 0.5 before histogram equalization. An example of histogram equalization (adjusting image contrast) without and with brightness enhancement. This technique is used for enhancing the visibility of existing dim linear features. The image is a part of the SDSS image frame frame-i-005973-3-0130.fits.

Increasing the brightness and contrast of an image increases the amount of noise in the image as well.
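As a hedged sketch of this combination (the boost handling is illustrative, not the exact implementation), brightness enhancement followed by histogram equalization on an 8-bit image might look like:

import cv2
import numpy as np

def enhance(img8, boost=0):
    """Add `boost` to all nonzero pixels, then equalize the histogram."""
    out = img8.astype(np.float32)
    out[out > 0] += boost                        # brightness enhancement
    out = np.clip(out, 0, 255).astype(np.uint8)  # stay within 8-bit depth
    return cv2.equalizeHist(out)                 # contrast adjustment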

    3.5 Canny edge detection

Canny edge detection is a method of finding edges in a picture. In our LFDA, it is used for detecting minimum-area rectangles (see Figure B1). It was first developed by Canny (1986), but today there are several variations of the algorithm. The version used in our LFDA is the OpenCV implementation. First the image is blurred with a 5x5 Gaussian blur matrix to reduce the effects of remaining noise. Then the Sobel operator (Sobel 1968) is applied. It is based on two 3x3 kernels, Gx and Gy, which are used for calculating the approximate derivatives of the intensity gradient at a pixel in the horizontal and vertical directions, respectively. A new image representing the gradient magnitude is constructed from the two images as

G = √(Gx² + Gy²) . (1)

As the gradient will be the largest at the pixels with the largest change in intensity, the end result is an image with emphasized edges. The gradient direction can be calculated as

Θ = arctan(Gy/Gx) . (2)

Edge direction is perpendicular to the gradient direction and is rounded to one of four possible directions: two diagonal, vertical or horizontal.

To reduce the number of falsely detected edges, non-maximum suppression is applied. Every nonzero gradient point (i.e. every nonzero pixel in the constructed image) is checked and compared to the neighboring gradient points, in the respective gradient directions, to see if the gradient point in question is a local maximum or not. If that particular point is not the local maximum, the gradient at this point is set to zero. This is an edge-thinning technique, as the resulting image contains only the points of sharpest change in intensity value, while all other pixels are removed. The remaining edges are filtered through

two user-defined thresholds (lower and higher) to remove noise-induced edges. This is known as hysteresis thresholding. Pixels that are above the higher threshold are automatically declared to be actual edges. Pixels below the lower threshold are automatically discarded. The remaining pixels, between the lower and higher thresholds, are judged by the connectivity criterion: if such a pixel can be connected through any number of steps to a pixel above the higher threshold, it is declared an edge; otherwise it is discarded. The result is a binary image containing the edges of the input image.
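In OpenCV the whole chain reduces to a blur followed by cv2.Canny; the threshold values below are illustrative only:

import cv2

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # placeholder image
blurred = cv2.GaussianBlur(img, (5, 5), 0)           # 5x5 Gaussian blur
# hysteresis thresholds: above 150 kept, below 50 discarded, in-between
# pixels survive only if connected to a strong edge
edges = cv2.Canny(blurred, 50, 150)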

    3.6 Contour detection

A contour is an outline representing the shape of an object. We use a contour-detection method first developed by Suzuki & Keiichi (1985), which takes detected edges and finds contours of objects. Unlike the Canny edge-detection algorithm, the result of contour detection is a list of 2D points (vectors). The function we use is taken from OpenCV, and it has two important parameters - mode and method. The mode parameter describes relationships between the contours that belong to the same object. Considering that the main interest is the shape of an object, it is necessary to retain the outermost object contour. However, to avoid problems with edges near the image borders and poorly detected edges, it is prudent to return all detected contours and examine them as well. This is the mode currently used in our LFDA.

The method parameter controls contour-detection approximations and deals with the way the contours are stored. Since these contours will be used only for fitting minimum-area rectangles, it would be enough to use a simple approximation where all horizontal, diagonal and vertical segments are compressed and only their end points are stored. However, as with the mode parameter, it is again prudent to return all contour points and use a more reliable shape descriptor to select contours of interest for linear feature detection.
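In OpenCV this corresponds to a cv2.findContours call with the flags discussed above; the trailing slice is a common idiom that absorbs the differing return signatures across OpenCV versions:

import cv2

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # placeholder image
edges = cv2.Canny(img, 0, 255)
# RETR_LIST returns all contours without hierarchy; CHAIN_APPROX_NONE
# stores every contour point without truncation or approximation
contours, hierarchy = cv2.findContours(edges, cv2.RETR_LIST,
                                       cv2.CHAIN_APPROX_NONE)[-2:]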

    3.7 Minimum area rectangle

Minimum-area rectangles are rectangles of arbitrary rotation, but with the minimum surface area that still encompasses the entire contour. A fast O(n) method for fitting minimum-area rectangles using contours was first proposed by Toussaint (1983), based on previous work by Freeman & Shapira (1975). The algorithm exploits the fact that a rectangle of minimum area enclosing a convex polygon has a side collinear with one of the edges of the polygon. The first step in the algorithm is to find the object's vertices of minimum and maximum x and y coordinates. Then parallel lines are drawn through the vertices, representing the first calipers. The second caliper lines are drawn perpendicular to the first and touching the object at its extreme vertices in the opposite direction. These calipers are rotated around the object's contour and the enclosed area is recalculated and compared until the rectangle with the minimum area is found. The benefit is that the rotation does not need to be incremental because, by the Freeman & Shapira (1975) theorem, one of the sides of the calipers must be collinear with one of the sides of the object and therefore

    MNRAS 000, 1–17 (2017)

  • 6 D. Bektešević and D. Vinković

it is sufficient to rotate the calipers by the minimum angle one of the caliper hands closes with the contour.

The main motivation for finding the minimum-area rectangles from contours is that we are able to gain descriptive information about the contours (e.g., the lengths of the rectangle sides). Additionally, this approach has proven itself to be more reliable than selecting a simple contours method in conjunction with the contours mode parameter set to return only the outermost contours. By requiring that the ratio of the rectangle's side lengths differs from 1, it is possible to find only those contours that encompass elongated objects in the image.
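A minimal sketch of this selection, with minAreaRectMinLen and lwTresh standing in for the algorithm parameters of the same names:

import cv2

def elongated_rects(contours, minAreaRectMinLen, lwTresh):
    """Keep only minimum-area rectangles that look like linear features."""
    rects = []
    for cnt in contours:
        center, (w, h), angle = cv2.minAreaRect(cnt)
        short, long_ = sorted((w, h))
        # side-length condition, then the elongation (side ratio) condition
        if short > minAreaRectMinLen and long_ / short > lwTresh:
            rects.append((center, (w, h), angle))
    return rects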

    3.8 Hough transform

The generalized Hough transform refers to a feature-extraction technique used in computer vision that is based on a voting procedure favoring a certain class of shapes. The classical Hough transform (Hough 1959) deals with detecting linear shapes in images. In Cartesian coordinates a line can be represented by the equation

    y = mx + b. (3)

Transforming equation (3) into the normal-line parameterization (Hart 2009), we obtain

y = −(cos(θ)/sin(θ)) x + r/sin(θ) . (4)

Now we can rewrite equation (4) into the Hessian normal form

    r = x cos(θ) + y sin(θ). (5)

The transformed coordinate space, which has θ and r as its axes, is called the Hough space. A set of points belonging to a line in Cartesian space gets mapped to a set of sinusoids intersecting at a point (θ, r) in the Hough space. Thus, we have reduced the problem of detecting a line in an image to a problem of detecting points in Hough space. Once the local maximum in Hough space is detected, an inverse transform can be performed using equation (4) to get the corresponding line parameters in Cartesian space.

The basic Hough transform is fairly easy to reproduce. First set a threshold value that determines which pixels in an image are considered for line detection (i.e., pixels with intensity greater than the threshold). Then for each active pixel (x, y), draw a family of lines (θi, ri) in the Hough space. Each consecutive line in the family has its θi increased by a preset value defined by the resolution of Hough space. For each θi calculate ri and increase the vote in the Hough-space accumulator at (ri, θi). Once all active pixels have been exhausted, search for the global maximum in Hough space and declare these Hough-space coordinates to be the detected line parameters. A search for local maxima would produce a set of lines found in the image. Additionally, only longer lines can be selected by setting a lower limit on votes in the Hough space. However, a lot of subtlety is hidden in the way these local maxima are found. In Appendix A we illustrate why accumulator maxima have to be selected carefully, how the Hough transform can produce false positives, and why any disk-like object (stars, galaxies, nebulae, etc.) has to be "hollowed out" or completely removed in order to prevent false line detections.
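A minimal numpy sketch of this voting procedure (the angular resolution and vote threshold are illustrative values):

import numpy as np

def hough_lines(binary, n_theta=180, min_votes=100):
    """Accumulate votes for r = x cos(theta) + y sin(theta), equation (5)."""
    h, w = binary.shape
    diag = int(np.ceil(np.hypot(h, w)))          # largest possible |r|
    thetas = np.linspace(0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((2 * diag, n_theta), dtype=np.int32)
    ys, xs = np.nonzero(binary)                  # the "active" pixels
    for j, theta in enumerate(thetas):
        r = np.round(xs * np.cos(theta) + ys * np.sin(theta)).astype(int)
        np.add.at(acc, (r + diag, j), 1)         # one vote per active pixel
    peaks = np.argwhere(acc >= min_votes)        # naive maxima selection
    return [(ri - diag, thetas[j]) for ri, j in peaks]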

Figure 4. Lines detected by the Hough algorithm drawn over rectangle objects representing simplified examples of linear features encountered in astronomical images. All examples use the first five maxima in the Hough space. Notice how the slope and position of the detected lines change with the rectangle shape.

The ideal case for detecting linear features would be to apply the Hough transform on an image containing only the edges of objects, to avoid ambiguities. For example, applying the Hough line detection algorithm on an empty image with only one line that is just a single pixel wide leaves no ambiguity about which set of pixels represents the line. This is unlikely in the astronomical setting, where we deal with lines at least a dozen pixels wide. In that case we are actually trying to detect a rectangle and, as demonstrated in Figure 4, the line that encompasses the most pixels (i.e. with the most votes in Hough space) is one of the rectangle's diagonals. The number of lines fitted to a single object, or either of its diagonals, also depends on the object's width and length (see Figure 4). Our LFDA takes these ambiguities into account.

    4 OBJECT REMOVAL

The purpose of this step is to remove all known objects from the image. A better object removal process yields higher detection rates and faster algorithm performance. In an ideal case where all objects in the image were registered correctly (known positions) and confidently (known object shape parameters), and their removal was perfect, the result would be an empty image except for the linear features, and the detection step would be trivial. In the case of sky surveys with image differencing implemented in the data pipeline, high quality object removal is a part of the survey's data products. LSST is going to be such an example, with linear features hiding within its transient alert stream of ∼10 000 events/visit (∼10 million per night) (Abell et al. 2009). The Tomo-e Gozen wide-field camera is another example (Morii et al. 2016), ideal for meteor surveys due to its ability to take 2 frames per second of a 20 deg² field of view with 84 chips of 2k×1k CMOS sensors.

However, in cases when an automatic object detection pipeline is not perfect or image databases do not contain image differencing, the LFDA has to incorporate its own object removal module. Depending on the set of images in question, object removal should be adapted appropriately based on the specifics of the image detector and data acquisition


and the availability of a catalogue of objects covered by the images. For example, the UCAC4 object catalog12 would contain enough objects to perform this step successfully when dealing with La Sagra Sky Survey images13. For image sets with a much dimmer limiting magnitude than the 16th magnitude of the UCAC4 catalog, the object detection step can be handled by running SExtractor (Bertin & Arnouts 1996) prior to the star removal step to build a catalogue of objects, or by extracting a list of objects from the SDSS catalogue.

However, the problem of object removal is particularly challenging in the case of SDSS, where the list of objects visible in images is obtained from the appropriate SDSS photoObj files. PhotoObj files are FITS files that contain in their header a table of all the objects registered in that particular single frame. An object is removed only if all of the following conditions are satisfied (a sketch of this filtering logic follows the list):

• the object's PSF fitted magnitudes are less than the SDSS 95% completeness limit (Stoughton et al. 2002) set by the LFDA parameter filter caps,
• the object's magnitude difference between SDSS filters exceeds the maximum allowed maxmagdiff at least magcount times, where both parameters are set by the LFDA,
• and the number of times the object was observed equals the number of times the field was observed (when the same field on the sky was imaged more than once).
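The sketch below renders these three conditions in Python. The record layout (per-filter PSF magnitudes, observation counts) and all field names are hypothetical stand-ins for the actual photoObj schema, and the second check follows one reading of the wording above:

def should_mask(obj, filter_caps, maxmagdiff, magcount, n_field_obs):
    """obj["psf_mag"] maps SDSS filter name -> PSF magnitude (hypothetical);
    obj["n_observed"] counts how many times the object was observed."""
    mags = obj["psf_mag"]
    # 1. brighter (numerically smaller) than the 95% completeness caps
    if not all(mags[f] < filter_caps[f] for f in mags):
        return False
    # 2. inter-filter magnitude differences exceed maxmagdiff
    #    at least magcount times (one reading of the condition above)
    names = sorted(mags)
    diffs = [abs(mags[a] - mags[b])
             for i, a in enumerate(names) for b in names[i + 1:]]
    if sum(d > maxmagdiff for d in diffs) < magcount:
        return False
    # 3. the object was seen every time the field was imaged
    return obj["n_observed"] == n_field_obs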

These conditions are fine-tuned for the LFDA implementation on the SDSS image database; the detailed logic behind this choice will be explained in our future publication focused solely on SDSS.

It is difficult to generalize the choice of these parameters for cases where object removal is applied to images not from SDSS. For example, setting the bottom magnitude limit filter caps assures that object removal using the SDSS object catalogue does not over-mask the image. Over-masking would happen in cases where the SDSS catalogue contains a number of objects dimmer than the limiting magnitude of the analysed image. In that situation object removal would mask parts of the image not showing these objects (as they are too dim), which could jeopardize the transient linear features that might appear in the image at these same positions.

Once the objects are registered in the image, a mask is constructed for object removal. Reconstructing the exact light profile for each object is also possible, but it is computationally more demanding and, as shown later in Appendix A, has proved unnecessary. We found that masking objects with a simple square is good enough, with the square size determined by scaling the measured SDSS petrosian radius containing 90% of the flux with the image pixel scale in arcseconds/pixel (pixscale parameter). If the determined square is larger than the maximum allowed size (maxxy parameter), a default-sized square is drawn instead, whose size is set by the defaultxy parameter. As shown in Appendix A, this masking simplification has a negligible impact on the detection rates in our design of the LFDA.
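A sketch of the square masking, with names following the parameters just described (the mask layout and the zeroing convention are assumptions):

import numpy as np

def mask_square(image, x, y, petroR90, pixscale, maxxy, defaultxy):
    """Blank out a square centred on (x, y); all sizes are in pixels."""
    side = int(petroR90 / pixscale)  # scale the Petrosian radius to pixels
    if side > maxxy:                 # oversized square: fall back to default
        side = defaultxy
    half = side // 2
    image[max(y - half, 0):y + half + 1,
          max(x - half, 0):x + half + 1] = 0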

This entire procedure is demonstrated in Figure 5.

12 http://www.usno.navy.mil/USNO/astrometry/optical-IR-prod/ucac
13 http://www.minorplanets.org/OLS/LSSS.html

For the SDSS images these steps have proved themselves capable of removing the maximum number of objects without harming the linear features. The features can be harmed either by accidentally covering them with oversized masks from nearby objects or by interpreting the linear feature itself as an object (see Figures 5b and 5c).

In general, where star catalogues are used, such as SDSS or UCAC4, there is no need to check the correctness of the object detection, as the catalogues have already been vetted by the survey authors. Also, the mask dimensions could be approximately determined directly from the apparent magnitudes of objects. However, if SExtractor is used for detecting objects, then the dimensions of the mask rectangles can be determined from the petrosian flux, but the correctness of the detection is not guaranteed and it should be carefully monitored.

    5 DETECTING BRIGHT LINEAR FEATURES

Bright linear features require less processing than dim features, which makes them simpler and faster to detect. In fact, excessive processing of bright features can harm the possibility of their detection. Substantially increasing the brightness and contrast of an image exposes a lot of noise and remnants of incomplete object removal. Such spurious objects would decrease the confidence in the detection of linear features because they can obscure the linear feature or mask the maximum in Hough space. Detecting bright linear features is therefore primarily oriented towards the detection of linear features that require minimal brightness and contrast enhancements.

All the steps in this process are listed in Figure B1. It starts with a reduction of the image depth, where the FITS images, usually 32-bit float, are reduced to 8-bit integer images, with a floor function used for rounding to a whole number. New pixel values are recalculated as

Inew(x, y) = min(max(Iold(x, y), 0), 255) . (6)

The loss of bit-depth information during the conversion is not particularly concerning because the bit-depth data is mostly needed for photometry. As long as the objects' intensities are bright enough not to be rounded to zero, their shapes will not change drastically.

In essence, all pixels with values less than 1 are rounded to 0 and all pixels with brightness larger than 255 are rounded to 255. Pixels with brightness in the 0-255 range are left untouched. The image is then subjected to histogram equalization, followed by the dilation procedure to increase the size of all remaining objects. The size of the dilation kernel is controlled by the dilateKernel parameter found in the Detecttrails module. We refer to this converted, equalized and dilated image as the "processed image". It will be used later for line detection by the Hough line-detection procedure.
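Equation (6), together with the floor rounding, amounts to a one-line numpy conversion:

import numpy as np

def to_uint8(img32):
    """Reduce a 32-bit float frame to 8 bits: floor, then clip to [0, 255]."""
    return np.clip(np.floor(img32), 0, 255).astype(np.uint8)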

The processed image is manipulated further to obtain an image with sufficiently elongated objects replaced by rectangles. The processed image is first exposed to the Canny edge-detection algorithm, with the hysteresis thresholding parameters set to a maximum of 255 and a minimum of 0. Effectively, this step returns all existing edges found, as if hysteresis thresholding were skipped. The purpose of this edge detection is to clearly identify all remaining objects in the image in such a way that there is no ambiguity


about the position of the objects' borders. This is important because the next step is extracting the shapes of objects from their contours.

The contour function's mode and method options are controlled by the LFDA's contoursMode and contoursMethod parameters. ContoursMode is set to the RETR_LIST flag, which means that all found contours are returned without establishing any hierarchical relationships. ContoursMethod is set to the CHAIN_APPROX_NONE flag, which means that all contour points are returned with no truncation or approximation. It is not recommended to change these values, as the other available parameter flags can break the LFDA algorithm. The contours are then fed into the procedure for finding the minimum-area rectangles. However, the minimum-area rectangles are fitted to the contours only if

• the shorter rectangle side is longer than the minAreaRectMinLen parameter,
• and the ratio of the longer to the shorter rectangle side is larger than lwTresh.

The first condition excludes all remaining contours that are too small to belong to a valid linear feature. The second condition filters out all minimum-area rectangles not elongated enough to be considered a linear feature. The rectangles that pass those criteria are drawn on a new empty image. If no rectangles are found, either because no edges or no contours were found, this detection step terminates as if the image were rejected.

Lines are derived by the Hough line detection procedure from the image with rectangles and from the processed image. This creates two sets of lines - one for the processed image and one for the image with rectangles. The number of lines in a set is the number of the first nLinesInSet strongest lines in Hough space. The sets are compared to each other and the lines are compared within their sets. When comparing lines within the same set, the spread is determined as the difference between the maximum and minimum θ coordinates. If that difference is larger than thetaTresh then the detection is rejected. As shown in Figure 4, wider and shorter linear features are expected to have larger spreads. We noticed that this step helps in differentiating between the linear features of interest and star diffraction spikes. The diffraction spikes, being generally shorter than interesting linear features, have wider spreads. Setting the thetaTresh parameter to a smaller value favors the detection of longer linear features and rejects false positives generated by diffraction spikes, albeit at the risk of rejecting valid linear features. Unfortunately, it is not always possible to cleanly differentiate between diffraction spikes and linear features of interest.

Sets of lines are compared to each other in a similar way. First, all lines in a set are averaged to obtain the average (r, θ) for the set. After that, the averages of the two sets are compared such that the difference between the average θ must not be greater than lineSetTresh or the detection is rejected. Since θ describes only the slope, it is also possible that the two sets of lines are nearly parallel to each other, but at different places in the image. Therefore, if the difference between the average r coordinates is larger than dro, the detection is rejected. Working with two line sets thus enables us to double-check that a detected linear feature is not just a spurious event.
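A sketch of these consistency checks on two sets of (r, θ) lines, with the threshold parameters named as above:

import numpy as np

def lines_consistent(set_a, set_b, thetaTresh, lineSetTresh, dro):
    """set_a, set_b: (n, 2) arrays of (r, theta) Hough-space coordinates."""
    for s in (set_a, set_b):
        if s[:, 1].max() - s[:, 1].min() > thetaTresh:  # spread within a set
            return False
    r_a, theta_a = set_a.mean(axis=0)  # average line of each set
    r_b, theta_b = set_b.mean(axis=0)
    if abs(theta_a - theta_b) > lineSetTresh:  # the slopes must agree
        return False
    return abs(r_a - r_b) <= dro               # and so must the offsets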

    6 DETECTING DIM LINEAR FEATURES

If no bright linear features were found, then faint features in the image are boosted in the hope that one of them is a trail. The process is similar to the detection of bright lines, except that now the first step transforms all pixels with values smaller than minFlux to zero, while the remaining pixels have an addFlux value added to them. Only then is the image converted from a 32-bit float image into an 8-bit image using equation (6) and its contrast enhanced by histogram equalization. Since the pixel values have been altered before the conversion and histogram equalization, a larger number of pixels remains visible in the final image.

Thus, these additional operations enhance the noise, which now has to be removed as well. Increasing the minFlux parameter would achieve this, but at the risk of destroying existing dim features. Another approach, as demonstrated in section 3.3, is to apply the opening operator. However, we use an operation that is not a true opening, as the dimensions of the erosion and dilation kernels are not equal. The problem with certain dim linear features is that they are "transparent" - so dim that they are barely distinguishable from the background. This creates a problem for erosion because pixels within transparent objects can be eroded. When that happens, dilation has no effect because there is no object center to expand from. To combat these effects it is necessary to use a small erosion kernel and a relatively large dilation kernel in order to preserve and restore as much of the original linear feature as possible. The dimension of the erosion kernel is controlled by the erosionKernel parameter, while dilatationKernel controls the size of the dilation kernel.
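As a sketch, the dim-feature preprocessing then reads (the kernel sizes mirror the example in Figure 7 and are not the tuned values):

import cv2
import numpy as np

def preprocess_dim(img32, minFlux, addFlux):
    """Boost faint pixels, convert per equation (6), equalize the histogram,
    then apply the asymmetric erosion/dilation described above."""
    img = np.where(img32 < minFlux, 0.0, img32 + addFlux)
    img = np.clip(np.floor(img), 0, 255).astype(np.uint8)  # equation (6)
    img = cv2.equalizeHist(img)
    img = cv2.erode(img, np.ones((3, 3), np.uint8))   # small erosion kernel
    img = cv2.dilate(img, np.ones((9, 9), np.uint8))  # larger dilation kernel
    return img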

The remaining detection steps are the same as when detecting bright linear features: detection of edges by Canny, locating contours among the edges and then a reconstruction of the image via the minimum-area-rectangles approach. Lines are fitted to both the reconstructed image and the processed original image and their spread is tested individually. The two line sets are then compared to each other in the same way as in the detection of bright lines. The parameters involved are named the same as before, but do not necessarily have the same values.

    7 LFDA PROCESSING EXAMPLES

We illustrate the LFDA processing sequence for three situations. Figure 6 shows the steps involved in a successful detection of a bright linear feature. Without dilating the image, there is a strong possibility that no minimum-area rectangles would be found due to the transparency of the linear feature. Notice also the low level of noise in the image. This example is a limiting case of a successful detection of bright features, as trails slightly dimmer or more transparent require the additional steps for detecting dim features.

Figure 7 represents the steps taken during a successful detection of a dim linear feature. This procedure is activated after a failed attempt to detect a bright feature. The linear feature visible in the image is very thin and not particularly transparent. This example displays the dangers of the erosion step, where trails that are more transparent or thinner would not be detected because erosion destroys them. However, if the erosion step were omitted, the sheer amount of


the background noise now present in the image would negatively impact the line detection and would have a drastic impact on the execution time required, due to a larger number of active pixels.

Figure 8 represents a false-negative detection. Due to the relatively crowded field and the shortness of the trail visible in the image, the detection ended up unsuccessful. It demonstrates the limitations of the current version of the LFDA, but also a situation where the algorithm can be further improved in future releases.

All LFDA parameter values were the same for all three figures and are shown in Table 1. However, these detection parameters do not represent the optimal detection parameters for SDSS, which we will address in our future publication. In addition to the listed parameters, there is one more parameter, debug, that controls the verbosity of execution by printing various messages to the screen as well as saving images depicting the processing steps.

    8 DISCUSSION AND CONCLUSION

    8.1 False positives

Our LFDA is highly resistant to false-positive detections, mainly due to the step in which it compares lines obtained through minimum-area rectangles with lines from the processed image. The reason why this step is so efficient can be understood if we consider what type of signals tend to emerge as false positives. Those are objects that are not removed, or not entirely removed, from the image. This is often the case with very bright objects, very dim objects and objects with visible internal structure. Very dim objects and objects with internal structure tend to have poorly determined petrosian radii, which results in a bad masking element. Very bright objects are saturated and their centroids cannot be determined; when this occurs, the masking element is not placed at the appropriate pixel coordinates. Objects that are only partially removed generally have well-defined measured properties, but the objects are bigger than the maxxy size of the masking element.

The minimum-area rectangles fitted to large, completely unremoved objects are rejected because they fail to comply with the lwTresh condition. This enables the LFDA to avoid false positives in such cases because the object will not appear in the reconstructed image (which contains only drawings of elongated minimum-area rectangles) and, therefore, cannot satisfy a comparison with the possible lines that the Hough algorithm produces in the processed image. In the case of partially removed objects, the minimum-area rectangle fitted to such an object will depend on the shape of the object. Partially removed objects have only their centers masked out, while some outer brightness structure remains. This makes them appear like a hollow disk and, except in the rare cases of large edge-on galaxies, no minimum-area rectangles fitted to such structures can pass the lwTresh condition. In some cases the hollowed-out objects can appear like arcs, especially when there are other masked objects in the vicinity that mask out parts of the hollow disk as well. Such arcs can possibly yield minimum-area rectangles, but the Hough algorithm applied to the processed image would not favor such objects (see Appendix A). This again leads to

a comparison failure between lines from the reconstructed and processed images.

This process rejects the vast majority of false-positive detections and makes our LFDA a very robust algorithm. Nonetheless, some false positives may appear, but since the number of detected lines is typically small, such cases can be easily removed by visual inspection in the post-analysis.

    8.2 False negatives

More worrisome are situations when a linear feature is present in the image but is not detected. In such cases we would not know that some lines still exist in the processed sample, and our conclusions based on the detected lines might become biased. We recognized three classes of linear features that our LFDA has a problem identifying:

• Cases when the overzealous object removal process removes too much of the line, as shown in Figure 8.
• Extremely dim, translucent features (examples are shown in Figure 9), which are often destroyed during the erosion step. The mechanism of how that happens has been discussed in section 6.
• Lines that have an internal brightness structure in the form of a brightness dip along the middle of the line. This feature is typically a sign of the defocusing effect in meteors imaged through big telescopes. It can also be due to meteor fragmentation, when multiple brightness dips might emerge. Such linear features can be very wide, as shown in the 48-pixel-wide example in Figure 9b. The problem with these lines is that they can fail one of the line consistency checks (dro, lineSetTresh or thetaTresh).

These shortcomings can be bypassed in certain situations, as the sources of long linear features are transient objects and the fields of view are not particularly large. Imaging devices on large-scale sky surveys are generally constructed out of tens of CCDs arranged in square, rectangular or circular arrays, in which each CCD records one or multiple images stored into a database as imaging frames. By knowing the geometry of this array, the path of a linear feature across the array can be reconstructed from a detection in a single frame. This way we can look for the linear feature even in frames where the LFDA fails to detect the line.

    8.3 Implementation of LFDA

During the development phase we tested our LFDA on a large variety of images, mostly from the SDSS database. In general, almost all trails could be detected with a careful selection of the detection parameters. We see that our LFDA performs fast enough, 0.1 to 0.3 seconds per image frame, in small-scale tests (Bektešević 2016). We will soon finish a large-scale test on the entire SDSS database and reveal the results in the upcoming publication. It will cover the questions of algorithm performance, possible existing execution bottlenecks, selection of optimal detection parameter values and the corresponding detection rates for the entire SDSS database. For the purpose of measuring detection rates we identified a close to complete set of frames with lines in the SDSS r and i filters (our high degree of confidence in the completeness arises from inspecting by hand


Figure 5. An example of mask creation and subtraction on a particularly bright linear feature in the SDSS frame-i-002888-1-0139.fits image. The image itself has undergone slight processing to increase the brightness of the objects in the image, which otherwise would not be visible. (a) The original image containing a very bright trail (additional brightness adjustments need to be applied to make all the objects visible). (b) Mask constructed from all the objects contained in the photoObj FITS file header; notice how the linear feature was misclassified as a large number of individual objects. (c) Applying varying square sizes to accommodate the sizes of objects shows how parts of the linear feature have been detected as galaxies or stars; if this mask were used, the linear feature would be completely destroyed. (d) Imposing conditions on magnitude measurements in different SDSS filters reduces the number of objects that are masked. (e) Equating the number of times an object and the field are observed eliminates the remaining fictitious objects. (f) The resulting image once the mask is overlaid on the original image.


(a) The original image with a relatively bright trail. The image has undergone brightness and contrast enhancements.
(b) Objects are masked, the image is converted to an 8-bit image and then histogram equalized.
(c) A dilation operator is applied using a 4x4 kernel in order to enhance the objects and fill in any holes due to line transparency after conversion.
(d) After the edges and contours have been found, the processed image from panel (c) is reconstructed by drawing only the elongated minimum-area rectangles.
(e) Lines are fitted to the reconstructed image in panel (d).
(f) Lines are fitted to the processed image in panel (c).

Figure 6. An example of how the bright detection algorithm processes bright trails. The example uses the SDSS frame-i-005973-3-0130 image frame. The original image in panel (a) had its brightness and contrast increased in the same way as in panel (b) in order to make the objects in the image more pronounced. After the fitted lines from panels (e) and (f) were tested and compared (see section 5), the detection was accepted.


(a) The original image with a dim trail. The image has undergone brightness and contrast enhancements.
(b) Objects are masked, brightness is increased, the image is converted to an 8-bit image and then histogram equalized.
(c) An erosion operator is applied using a 3x3 kernel in order to remove the noise now present due to the brightness and contrast enhancements.
(d) A dilation operator is applied using a 9x9 kernel in order to enhance the objects, reconnect them and fill in any holes due to erosion.
(e) Contours are found among the Canny edges, followed by the minimum-area rectangles fitted to panel (d), and finally the lines are fitted to the rectangles.
(f) Lines are fitted to panel (c) and drawn on panel (d).

Figure 7. An example of how the dim detection algorithm processes dim trails. The example uses the SDSS frame-i-000094-1-0313 image frame. The original image in panel (a) had its brightness and contrast increased in the same way as in panel (b) in order to make the objects in the image more pronounced. After the fitted lines from panels (e) and (d) were tested and compared (see section 6), the detection was accepted.


(a) The original image with a dim trail. The image has undergone brightness and contrast enhancements.
(b) Objects are masked, brightness is increased, the image is converted to an 8-bit image and then histogram equalized.
(c) An erosion operator is applied using a 3x3 kernel in order to remove the noise now present due to the brightness and contrast enhancements.
(d) A dilation operator is applied using a 9x9 kernel in order to enhance the objects, reconnect them and fill in any holes due to erosion.
(e) Contours are found among the Canny edges, followed by the minimum-area rectangles fitted to panel (d), and finally the lines are fitted to the rectangles.
(f) Lines are fitted to panel (c) and drawn on panel (d).

Figure 8. An example of a false negative detection. The example uses the SDSS frame-i-005415-1-0055 image frame. Even though a trail exists in the image, it is not detected; the main reason for the failed detection is the over-masking of objects in a relatively dense field. Additionally, due to the shortness of the linear feature, it is unclear whether the thetaTresh conditions would have been met.


Table 1. Parameters that control the execution of the line detection algorithm in the DetectTrails module.

RemoveStars parameters

  pixscale            Pixel scale (arcseconds/pixel).                                      0.396
  defaultxy           Default mask size (pixels).                                          10
  maxxy               Maximum allowed mask size (pixels).                                  60
  filter_caps         Minimum apparent magnitude under which objects are not removed,     u: 22.0, g: 22.2, r: 22.2, i: 21.3, z: 20.5
                      given per SDSS filter.
  magcount            Number of different filters in which the magnitude difference       3
                      can exceed maxmagdiff.
  maxmagdiff          Maximal magnitude difference allowed between two filters.           5

Detect bright parameters

  dilateKernel        Dilation kernel.                                                    4x4 matrix, all elements equal to 1
  contoursMode        Contour retrieval mode.                                             RETR_LIST (return all contours as a flat list)
  contoursMethod      Contour approximation method.                                       CHAIN_APPROX_NONE (a full set of points encompassing the contour is returned)
  minAreaRectMinLen   Minimum allowed length of the sides of a minimum-area               1
                      rectangle (pixels).
  lwTresh             Minimum allowed ratio of the sides of a minimum-area rectangle.     5
  nlinesInSet         Number of detected lines required to be declared a set.             3
  thetaTresh          Maximum allowed difference between any two line slopes in a set     0.15
                      (radians).
  lineSetTresh        Maximum allowed difference between the averaged line slopes of      0.15
                      two line sets (radians).
  dro                 Maximum allowed horizontal displacement between line sets           25
                      (pixels).

Detect dim parameters

  minFlux             Pixels with flux less than minFlux are set to zero.                 0.03
  addFlux             Pixels with flux larger than zero have addFlux added to them.       0.5
  erodeKernel         Erosion kernel.                                                     3x3 matrix, all elements equal to 1
  dilateKernel        Dilation kernel.                                                    9x9 matrix, all elements equal to 1
  contoursMode        Contour retrieval mode.                                             RETR_LIST (return all contours as a flat list)
  contoursMethod      Contour approximation method.                                       CHAIN_APPROX_NONE (a full set of points encompassing the contour is returned)
  minAreaRectMinLen   Minimum allowed length of the sides of a minimum-area               1
                      rectangle (pixels).
  lwTresh             Minimum allowed ratio of the sides of a minimum-area rectangle.     5
  nlinesInSet         Number of detected lines required to be declared a set.             3
  thetaTresh          Maximum allowed difference between any two line slopes in a set     0.15
                      (radians).
  lineSetTresh        Maximum allowed difference between the averaged line slopes of      0.15
                      two line sets (radians).
  dro                 Maximum allowed horizontal displacement between line sets           25
                      (pixels).


A natural concern is the applicability of LFDA in diverse observational settings. Apart from the required modifications to the object removal step and the optimization of the detection parameter values, no other part of LFDA would have to be changed for implementation in a different environment. The method does not require large-scale testing to derive optimal algorithm parameter values; this can be done on a small initial test set, which enables easy implementation into the image processing pipelines of existing and upcoming sky surveys. The source code of our LFDA is available at https://github.com/DinoBektesevic/LFDA under the General Public License GPLv3 (https://www.gnu.org/licenses/gpl-3.0.en.html).

Sky surveys are particularly suitable for LFDA as they already have an object detection pipeline, which is a necessary component of LFDA within its initial object removal step.


(a) Transparent dim linear features are almost indistinguishable from the surrounding noise. This example is taken from the SDSS frame-u-005973-3-0127 image frame.
(b) A transparent linear feature exhibiting a defocusing effect, visible as a brightness dip along the middle of the line. The interior of the linear feature is full of holes, i.e. pixels of very low intensity. This example is taken from the SDSS frame-g-002728-2-0424 image frame.

Figure 9. Two examples of transparent linear features that tend to avoid detection with LFDA. Their low brightness makes them barely visible, and only a view from a distance reveals their existence as an increase in white pixel density. The sizes of the kernels used in the LFDA morphological operators are typically fine-tuned to the brighter, more pronounced features, which makes them unfit for detecting these wide transparent trails. Additionally, not all linear features have a uniform internal structure, as seen in the lower panel.

High-cadence sky surveys represent an especially attractive target for LFDA implementation because they use image differencing in their data pipelines. This approach can be almost perfect in its ability to detect all the objects in an image, which removes many of the obstacles that cause false positive or false negative detections. However, implementations of LFDA are not limited to sky surveys alone. As mentioned in section 4, there are other ways in which the object removal step can be achieved; the most general, although one of the slowest, would be applying SExtractor to detect and remove objects such as stars and galaxies (see the sketch below). Hence, we see our LFDA as a useful tool for any research aiming at the identification of long linear features in large astronomical imaging datasets.
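As a sketch of that generic alternative, the Python package sep (which implements the core SExtractor algorithms) could be used for the object removal step roughly as follows; the masking strategy and parameter values here are illustrative, not the ones used by LFDA:

    import numpy as np
    import sep

    def remove_objects(image, thresh=1.5, mask_scale=3.0):
        data = image.astype(np.float64)
        # Estimate and subtract the spatially varying background.
        bkg = sep.Background(data)
        data_sub = data - bkg
        # SExtractor-style source extraction above thresh * global RMS.
        objects = sep.extract(data_sub, thresh, err=bkg.globalrms)
        masked = data_sub.copy()
        for obj in objects:
            # Mask a square around each detection, scaled to the
            # semi-major/semi-minor axes of the fitted ellipse.
            half = int(mask_scale * max(obj['a'], obj['b'])) + 1
            x, y = int(obj['x']), int(obj['y'])
            masked[max(0, y - half):y + half, max(0, x - half):x + half] = 0.0
        return masked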

In an upcoming publication we will address the issue of meteor defocusing, which affects the shape of the lines (Figure 9b) but enables differentiation between satellites and meteors, as well as the extraction of meteor physical properties from the track brightness profile. This will be followed by a publication describing the results of a large-scale test on the entire SDSS database, which will also enable us to publish a large set of SDSS frames with lines. This set can be used in the future for testing new linear feature detection algorithms, including machine learning methods that need training sets. The goal is to reach a level of automation where algorithms not only detect lines, but also classify them based on their origin and extract some physical properties of the moving object.

    ACKNOWLEDGEMENTS

We thank Darko Jevermović for his help with the SDSS database setup and for access to the Fermi cluster at the Observatory Belgrade. We are thankful to Aleksandar Cikota for helpful discussions and a preliminary investigation into this topic. We thank Benjamin A. Weaver from the Physics Department, New York University, for his help with the SDSS data. The authors are also thankful to Russ Laher for useful comments on our manuscript. We also thank the Physics Department, University of Split, and the Technology Innovation Centre Međimurje for additional computational resources.

The authors acknowledge being part of the network supported by the COST Action TD1403 Big Data Era in Sky and Earth Observation, which helped with discussions on the applicability of our research.

We would also like to thank the Sloan Digital Sky Survey for the free and open data access and the accompanying detailed documentation. Funding for SDSS-III has been provided by the Alfred P. Sloan Foundation, the Participating Institutions, the National Science Foundation, and the U.S. Department of Energy Office of Science. The SDSS-III web site is http://www.sdss3.org/.

SDSS-III is managed by the Astrophysical Research Consortium for the Participating Institutions of the SDSS-III Collaboration including the University of Arizona, the Brazilian Participation Group, Brookhaven National Laboratory, Carnegie Mellon University, University of Florida, the French Participation Group, the German Participation Group, Harvard University, the Instituto de Astrofisica de Canarias, the Michigan State/Notre Dame/JINA Participation Group, Johns Hopkins University, Lawrence Berkeley National Laboratory, Max Planck Institute for Astrophysics, Max Planck Institute for Extraterrestrial Physics, New Mexico State University, New York University, Ohio State University, Pennsylvania State University, University of Portsmouth, Princeton University, the Spanish Participation Group, University of Tokyo, University of Utah, Vanderbilt University, University of Virginia, University of Washington, and Yale University.

    REFERENCES

Abell P. A., et al., 2009, LSST Science Book, Version 2.0 (arXiv:0912.0201)
Alam S., et al., 2015, ApJS, 219, 12
Bektešević D., 2016, LSST@Europe2 conference, Belgrade, Serbia. Accessible at https://project.lsst.org/meetings/lsst-europe-2016/programme
Bektešević D., Cikota A., Jevermović D., Vinković D., 2014, in Soille P., Marchetti P. G., eds, Proc. Conf. on Big Data from Space (BiDS'14), p. 287
Bertin E., Arnouts S., 1996, A&AS, 117, 393
Canny J., 1986, IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8, 679
Djorgovski S. G., Mahabal A., Drake A., Graham M., Donalek C., 2013, in Oswalt T. D., Bond H. E., eds, Planets, Stars and Stellar Systems. Volume 2: Astronomical Techniques, Software and Data, p. 223
Freeman H., Shapira R., 1975, Commun. ACM, 18, 409
Gunn J. E., et al., 1998, AJ, 116, 3040
Hart P. E., 2009, IEEE Signal Processing Magazine, 26:6, 18
Honscheid K., et al., 2008, The Dark Energy Camera (DECam) (arXiv:0810.3600)
Hough P. V. C., 1959, in Proc. of the 2nd International Conference on High-Energy Accelerators and Instrumentation, p. 554
Ivezić Ž., et al., 2010, NASA Planetary Data System, 124
Kubičková E. A., 2011, in Asher D. J., Christou A. A., Atreya P., Barentsen G., eds, Proc. of the International Meteor Conference 2010, Armagh, Northern Ireland, p. 41
Morii M., et al., preprint (arXiv:1609.03249)
Sobel I., Feldman G., 1968, A 3x3 isotropic gradient operator for image processing. Presented as a talk at the Stanford Artificial Intelligence Project. http://proceedings.informingscience.org/InSITE2009/InSITE09p097-107Vincent613.pdf
Storkey A. J., Hambly N. C., Williams C. K. I., Mann R. G., 2004, MNRAS, 347, 36
Stoughton C., et al., 2002, AJ, 123, 485
Suzuki S., Abe K., 1985, Computer Vision, Graphics, and Image Processing, 30, 32
Toussaint G. T., 1983, in Proc. of the Mediterranean Electrotechnical Conference, Athens, Greece
Vinković D., et al., 2016, in Roggemans A., Roggemans P., eds, Proc. of the International Meteor Conference 2016, Egmond, the Netherlands, p. 319
Waszczak A., et al., 2017, PASP, 129, 034402

APPENDIX A: HOUGH TRANSFORM EXAMPLES

Figure A1 shows why a careful determination of the position of the accumulator maximum in the Hough (r, θ) space is so important. A large number of highly active accumulators (i.e. values close to the maximum) are visible in the neighborhood of the detection point (the maximum value in the Hough space) in the zoomed panel of Figure A1. Each of those active accumulators represents a line that passes through the three dots in the input image with varying precision. A proper centroid location method applied to the neighborhood of the maximum would probably yield the best (r, θ) values, but it would also be computationally more demanding. Considering how symmetrically the active accumulators are spaced around the actual value, a good approximate approach is to average a set of the top n active accumulators (i.e. the first n maxima in the Hough space), as sketched below. Notice also how the dots produce pronounced and tight sinusoids in the Hough space compared to the sinusoids produced by the circles shown in Figures A2 and A3. The reason why dots produce such sinusoids is that they are disks. Within disks and disk-like objects there are a lot of neighboring active pixels through which a new line can be drawn. All those lines produce a similar maximum in the Hough space and therefore leave a very distinguishable compact track. It is important to notice that even though a line does not actually exist in Figure A1, a strong maximum in the Hough space implies a line anyhow.
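A minimal sketch of that averaging, assuming the Hough accumulator is available as a 2D array (the function and the choice of n are illustrative):

    import numpy as np

    def average_top_n(hough_space, n=10):
        # Indices of the n strongest accumulators, flattened.
        flat_idx = np.argsort(hough_space, axis=None)[-n:]
        r_idx, theta_idx = np.unravel_index(flat_idx, hough_space.shape)
        # Average their (r, theta) bin coordinates as an inexpensive
        # stand-in for a full centroid of the maximum's neighborhood.
        return r_idx.mean(), theta_idx.mean()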

Figures A2 and A3 show the opposite case of a disk.

Figure A1. Hough space of an image with three dots and no line.

    Figure A2. Hough space of an image with a circle and a line.


Figure A3. Hough space of an image with a line and a lot of circles.

Unlike the filled dots, circles and, generally, objects that are not filled produce wider and less distinguishable tracks in the Hough space. This means that all disk-like objects have to be "hollowed out", or completely removed; otherwise the Hough line detection algorithm would likely produce a detection, as was the case in Figure A1. This is why masking objects with squares in the object removal step is successful, provided enough of the objects' interior is masked out. Figure A3 also shows how robust and unaffected by noise the Hough transform is: the line accumulators in the Hough space remain as visible as in the case of a single circle (Figure A2), even though the image is far more crowded.

APPENDIX B: DETAILED ALGORITHM FLOWCHART

The pseudo-codes DetectTrails and Createjobs show the general logic flow of our LFDA, while Figure B1 shows in more detail the algorithm flow of the key parts: object removal (in case images do not already have objects removed), detection of bright linear features and detection of dim linear features.

This paper has been typeset from a TEX/LATEX file prepared by the author.

Figure B1. Flowchart diagram describing a detailed algorithm flow.

