
Deep Convolutional Neural Networks as a Method to Classify Rotating Objects based on Monostatic Radar Cross Section

Eric Wengrowski1*, Matthew Purri1*, Kristin Dana1, and Andrew Huston2

*(equally contributing co-primary authors)

Abstract— Radar systems emit a time-varying signal and measure the response of a radar-reflecting surface. In the narrowband, monostatic radar signal domain, all spatial information is projected into a scalar Radar Cross Section (RCS) value. We address the challenging problem of determining an object’s shape class using RCS estimates from a rotating object collected as a time series. A monostatic signal is difficult to recognize when the subject is tumbling with unknown motion parameters under detectability limitations and signal noise. Previous shape classification methods have relied on image-like synthetic aperture radar (SAR) or multistatic (multiview) radar configurations with known geometry. Deep learning and convolutional neural networks (CNNs) have revolutionized machine learning and classification tasks in the computer vision domain, but vision tasks usually leverage images and video rich with high-resolution 2D or 3D spatial information. We show that a feed-forward convolutional neural network can be trained to successfully classify an object’s shape using only a noisy monostatic RCS signal with unknown quaternion motion parameters. We construct datasets that contain over 100,000 simulated monostatic RCS signals belonging to different shape classes. We introduce deep neural network architectures that produce 2% classification error on a testing dataset. We also introduce a refinement network that transforms signals generated by a simulator so they appear more realistic and have greater utility in training. To our knowledge, we are the first to apply modern deep learning techniques to target shape classification using noisy monostatic radar signals. The results are a pioneering step toward the recognition of more complex targets using narrowband, monostatic radar.

I. INTRODUCTION

When illuminated with a narrowband radar signal, an object reflects incident energy, and the reflectance depends on the object’s geometry and material properties. The amount of energy that is reflected directly back toward the source of illumination is a function of its monostatic RCS (Radar Cross Section). As an object changes orientation, the RCS changes as well. We wish to classify the 3D shape of objects based only on a time series of monostatic RCS as the object moves according to force-free rigid body motion. Our set of target objects includes right circular cones, right circular cylinders, rectangular planes, spheroids, and trapezoidal prisms. The target object set varies in size with respect to a geometric parameter for each class (e.g. radius and height variation for cylinders). The chosen geometric properties in the test set are selected by radar wavelength so that each object is modeled as a Perfect Electrical Conductor (PEC). Since frequency was a free parameter, it was arbitrarily set to 300 MHz, so

1 Rutgers University, Department of Electrical and Computer Engineering
2 Lockheed Martin Radar Group, Moorestown, NJ

Fig. 1: Our goal is to correctly predict an object’s shape family from a noisy monostatic RCS time series. Each RCS signal varies temporally according to the object’s rotation through space, parameterized with generalized Euler tumble, roll, and observation angles. The rotation rates and viewing angles are unknown to the classifier. For example, the object may be rotating very fast or very slow about multiple axes. These signals contain added white Gaussian noise to achieve a specified mean SNR (signal-to-noise ratio). The addition of a Swerling detection model, where the probability of detection is smaller for lower RCS values, results in missing data points. A convolutional neural network (CNN) is used to learn the separating features that accurately recognize each object class, overcoming the challenges of noisy data, missing data, and unknown trajectories.

wavelength would be 1 meter, and all geometric parameters would be scaled to 1 meter. Labelled data, i.e. RCS of known objects, are required to train and test our supervised classifier. To generate training and testing data, we use POFacets [2], a MATLAB tool for simulating RCS with respect to viewing angle. For each 3D object model, a 2D RCS reference map is computed using POFacets. The parameters of the RCS map are the viewing angles θ and φ in standard spherical coordinates. The object is modeled as following rigid body rotations using a specified initial orientation, roll rate, and tumble rate. Using the orientation of the object in spherical coordinates over time and the reference 2D RCS map, a time series of RCS signals is generated which represents monostatic radar measurements over time. Several unique motion paths are generated for

Fig. 2: The 4 shape families correspond to 4 target classes in our classifier. Each shape class has a range of geometric parameters and motion parameters. The parameter ranges are listed under each shape. λ is the wavelength of the incident radar signal.

each object.

To simulate real-world conditions, the input signals for testing are corrupted by Gaussian noise and Swerling dropout. The Swerling Model [29] is a standard method for determining the detectability of an object based on SNR and waveform characteristics. The instantaneous probability of detecting each object at a given time is explicitly included in order to bring the performance closer to real-world operation. If the Signal to Noise Ratio (SNR) at a given time point is too small, a real-world radar system may be unable to separate the object from noise and will therefore be unable to detect the object and estimate its RCS.

A subset of the generated signals is used to train a feed-forward convolutional neural network classifier. We employ an end-to-end learning architecture, where signal features and the classifier are jointly solved for. The inputs are a series of RCS samples over time as the object rotates through free space. These objects belong to 1 of 4 shape families, illustrated in Figure 2. When the rotation is simple and follows a known path (as shown in Figure 4, bottom row), the problem is trivial. However, the problem becomes substantially more difficult when the motion parameters are unknown (see Figure 4, top row).

The neural network classifier returns the probability of each signal belonging to each of the 4 shape families. The number of RCS time series signals used for training and testing varies across the experiments discussed in Section IV. Experiments with the addition of a fifth class with a small number of training measurements test the network’s ability to add training classes.

In this work, we successfully classify the shape family for rotating objects with unknown roll rates, tumble rates, and initial orientations. The classifier training and testing are implemented in Torch and PyTorch, machine learning libraries for Lua and Python, respectively.

II. RELATED WORK

Producing an accurate representation of a target object’s narrowband monostatic RCS is a challenging problem. Radar-specific properties such as wavelength and sampling rate, as well as object-specific properties such as surface material, shape, and motion, may dramatically influence the resulting RCS time series. In this application, the objects under investigation are geometrically simple, convex shapes with uniform material construction. The incident energy wave is assumed to be a simple plane wave. The environment is not modeled, except for the addition of Gaussian noise. Due to these constraints, the physical optics (PO) approximation is appropriate to produce realistic returns. Open source RCS signal generation tools such as POFacets are readily available [2]. POFacets is a MATLAB toolbox that has been used in many applications; for example, Touzopoulos et al. [32] recently created 3D models of aircraft from images and then generated an RCS signal of those aircraft with POFacets.

A powerful new class of supervised machine learning algorithms called convolutional neural networks (CNNs) leverages optimization to learn complex latent features for robust classification. This family of algorithms is called deep learning when networks contain many convolutional layers. In 2012, a convolutional neural network significantly outperformed all other algorithms on the object classification dataset ImageNet [16], and CNNs have become the algorithm of choice for image recognition in computer vision [26], [4], [30], [7], [8].

Traditional neural networks have been used for radar classification tasks for decades, often derived from architectures developed for speech recognition such as the time-delay neural network [15], [17]. Early work on neural networks for processing radar signals was applied to identifying the number and type of radar emitters in a simulated multisource environment [1]. Pulse-train radar signal classification and source identification remains a topic of active research [9], [10]. Another recent challenge for neural networks in radar is the identification of radar jamming signals [27], [21]. Traditional neural networks have been applied to: SAR imagery for ground terrain classification [6] and crop classification [34]; microwave radar for classifying pedestrians and vehicles [23]; Doppler radar for identifying human breathing [24]; ground penetrating radar for the classification of geological structures [31]; and forward scattering radar for identifying very small marine targets [11].

Fig. 3: A 3D model was first created in POFacets and the library was used to generate a 2D RCS plane parameterized by rotation angles θ and φ. Generalized Euler motion is simulated by discretely drawing line segments on the 2D RCS plane. Additive Gaussian noise and Swerling 2 dropout are then incorporated to generate the final signal.

While traditional neural networks have been used widely in radar classification tasks, modern deep learning and CNNs are beginning to take hold in recent applications [18], [22], [12], [3], [19]. The success of 2D CNNs on standard color images has translated well into radar applications. Most deep learning networks are designed for 2D imagery and can be directly applied to radar-based imagery; however, the RCS time series signals in our work are one-dimensional. In fields such as natural language processing [33] and medical applications [14], 1D CNNs have provided successful classification. In this work, we leverage successful deep networks for 2D image recognition, but adapt the networks to 1D monostatic RCS signals.

Multistatic radar systems utilize a set of receivers and transmitters to create multiple 1D RCS signals of a target object. In prior work, multistatic RCS signals are classified individually using CNNs [18], [28] and the average of multiple CNNs [20] for multistatic contextual target signatures. The monostatic system addressed in our work contains a single collocated receiver-transmitter pair, whereas multistatic systems have one or more spatially separated receivers and transmitters. The classification problem of monostatic RCS signals is particularly challenging since the signals do not contain contextual information from multiple sources.

III. GENERATING RCS SIGNALS

The first step in RCS classification is generating 3D models of our target objects. The parameters of these objects are listed in Figure 2. All models are created using POFacets, the MATLAB library for radar simulation. POFacets is used to generate 128 geometric models, each corresponding to 1 of 4 shape classes in the primary experiments. For each geometric model, a monostatic, fixed-frequency 2D RCS map is created. The 2D RCS maps are parameterized by viewing angles θ and φ in standard spherical coordinates. Each of the 3D models was used for RCS signal generation. Classes with more static parameters, like plate, had more 3D models.

Once the 3D models have been created, POFacets is used to generate narrowband monostatic RCS values. In the case of monostatic radar, we assume that the radar source and receiver are at the same location. The radar frequency is kept constant. It is important to note that in the physical optics model, RCS behavior depends only on the size of the object in wavelengths. Thus we can arbitrarily set the chosen frequency to 0.3 GHz while preserving the general behavior at any wavelength. Since the 3D model parameters are scaled by wavelength, this allows for unit shape size parameters. POFacets is used to generate a 2D map of RCS response versus viewing angles θ and φ. The mapping is done by specifying an angular sweep from 0° to 180° at fine sampling intervals (0.1°). Symmetry of the shapes allows us to limit the simulation to a maximum rotation of 180°.
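As a quick check of the scaling argument, the short sketch below computes the wavelength at the chosen frequency and the number of samples along each axis of the RCS map. The helper names are illustrative and are not part of POFacets; the sample count assumes that both endpoints of the sweep are included.

```python
# Illustrative helpers; the names are hypothetical and not part of POFacets.
SPEED_OF_LIGHT = 299_792_458.0  # meters per second


def wavelength_m(frequency_hz):
    """Free-space wavelength in meters for a given frequency in Hz."""
    return SPEED_OF_LIGHT / frequency_hz


def size_in_wavelengths(length_m, frequency_hz):
    """Express a physical length as a multiple of the wavelength."""
    return length_m / wavelength_m(frequency_hz)


def sweep_samples(start_deg=0.0, stop_deg=180.0, step_deg=0.1):
    """Samples along one axis of the map, assuming both endpoints are included."""
    return int(round((stop_deg - start_deg) / step_deg)) + 1
```

At 0.3 GHz the wavelength is very close to 1 meter, so a dimension expressed in meters is nearly the same number expressed in wavelengths, and a 0° to 180° sweep at 0.1° gives 1801 samples per axis under the inclusive-endpoint assumption.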

Even for simple rotation about a fixed axis, different sets of geometric parameters, such as for the 3D cone models, produce substantially different RCS responses. The diversity of signals makes the classification task challenging, but CNNs are particularly suited for achieving a wide range of invariance over input signals. In feature space, the distance between any two of these diverse signals from one class should be smaller than the distance to every model in each other class.

A. Generalized Euler Motion

Once an RCS map has been generated, a motion path is drawn over the surface and the map is interpolated along it. The target objects are assigned an initial tumble, roll, and viewing angle. The initial conditions are then propagated following the physics of rigid body motion in the presence of no external forces (free motion). A quaternion model is used to generate the motion path parameterized by θ and φ over the precomputed 2D RCS map. The roll and tumble parameters are bounded by the values described in Figure 2.

For each shape class, the center of mass and moment of inertia are calculated and used for the simulation of realistic, geometry-dependent object motion.
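The step of drawing a path over the map and interpolating along it can be sketched as follows. This is a simplified illustration, not the simulator described above: the constant-rate path in θ and φ stands in for the quaternion propagation, the reflection used in `fold_angle` is an assumed form of the 180° symmetry, and all function names are hypothetical.

```python
def fold_angle(angle_deg):
    """Reflect any angle into [0, 180] degrees (an assumed symmetry)."""
    a = angle_deg % 360.0
    return a if a <= 180.0 else 360.0 - a


def bilinear_lookup(rcs_map, theta_deg, phi_deg, step_deg=0.1):
    """Interpolate a map sampled every step_deg along theta (rows) and phi (columns)."""
    n_rows, n_cols = len(rcs_map), len(rcs_map[0])
    # Fractional grid coordinates, clamped to the map extent.
    r = min(max(theta_deg / step_deg, 0.0), n_rows - 1.0)
    c = min(max(phi_deg / step_deg, 0.0), n_cols - 1.0)
    r0 = min(int(r), n_rows - 2)
    c0 = min(int(c), n_cols - 2)
    fr, fc = r - r0, c - c0
    top = rcs_map[r0][c0] * (1 - fc) + rcs_map[r0][c0 + 1] * fc
    bottom = rcs_map[r0 + 1][c0] * (1 - fc) + rcs_map[r0 + 1][c0 + 1] * fc
    return top * (1 - fr) + bottom * fr


def sample_path(rcs_map, theta0, phi0, tumble_rate, roll_rate, n_samples, step_deg=0.1):
    """Sample RCS along a constant-rate path; rates are in degrees per update."""
    return [
        bilinear_lookup(
            rcs_map,
            fold_angle(theta0 + tumble_rate * t),
            fold_angle(phi0 + roll_rate * t),
            step_deg,
        )
        for t in range(n_samples)
    ]
```

Mapping the tumble rate to θ and the roll rate to φ is a deliberate simplification; a force-free rigid body generally traces a more complicated curve over the map, which is why the text uses a quaternion model.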

B. Randomizations in Motion Parameters

It would be relatively easy to classify RCS signals from objects at integer-valued roll, tumble, and viewing angle. To make the problem more realistic and challenging, randomizations were applied to the values of each parameter. A random variable x with mean µ = 1 and standard deviation σ = 0.5 was multiplied with the viewing angle (θ and φ), tumble rate, and rotation rate for each signal. The random variation allows for the construction of a database where the same 2D RCS map can be used to generate multiple signals. The ability to scale motion parameters with random jitter allowed the creation of a nearly equal number of signals among the 4

Fig. 4 (panels 4.1 to 4.12, in columns labeled Cone, Cylinder, and Plate): There is tremendous variation among the cone, cylinder, and plate RCS signals on the top row. Those signals have rotation about a fixed axis at a relatively slow speed and zero noise. The bottom row features 2 more realistic cone, cylinder, and plate RCS signals. The salient features present in the top examples are now gone.

classes, even though there were more 3D models created for plates. CNN performance is generally improved when there is an equal number of training examples in all classes.
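The multiplicative jitter described in this subsection can be sketched in a few lines. The text gives only the mean and standard deviation of the random factor, so the Gaussian distribution, the independent draw per parameter, and the parameter names used here are assumptions made for illustration.

```python
import random


def jitter_motion(params, rng, mean=1.0, std=0.5):
    """Scale each nominal motion parameter by an independent random factor.

    Assumption: the factor is Gaussian; the text states only its mean and
    standard deviation.
    """
    return {name: value * rng.gauss(mean, std) for name, value in params.items()}


def make_variants(params, n_variants, seed=0):
    """Generate several jittered parameter sets from one nominal set."""
    rng = random.Random(seed)
    return [jitter_motion(params, rng) for _ in range(n_variants)]


# Hypothetical nominal values, for illustration only.
nominal = {"theta": 30.0, "phi": 60.0, "tumble_rate": 0.1, "roll_rate": 0.2}
variants = make_variants(nominal, n_variants=5)
```

Because each variant reuses the same 2D RCS map, the number of signals per class can be balanced by choosing `n_variants` per model, which is how a class with fewer 3D models can still contribute a comparable number of signals.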

C. Update Rate, Swerling, Gaussian Noise, Gradients, and Pyramids

A realistic radar model has a finite update rate. The number of samples collected as an object rotates is related to the update rate (in Hz) and the rotation rates (in radians/second). In this study the kinematic bounds of the objects are defined in radians/update; thus a quickly rotating object observed at a high update rate yields the same signal as a slowly rotating object observed at a correspondingly lower update rate. The radar update rate is arbitrarily set to 1 Hz. To simulate realistic distortions of each RCS value, Gaussian noise and a Swerling detectability model are incorporated into each RCS signal. The addition of Gaussian noise transforms the RCS from a truth value to an estimate. The specific parameters can be found in Figure 7.
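A minimal sketch of the two corruptions, additive Gaussian noise and SNR-dependent dropout, is given below. It is not the authors' simulator. The detection probability uses the closed-form single-pulse expression commonly quoted for Swerling 1/2 targets, Pd = Pfa^(1/(1+SNR)); the false-alarm probability `pfa`, the treatment of RCS as a power-like quantity, and the clamping of negative estimates to zero are all assumptions.

```python
import math
import random


def detection_probability(snr_linear, pfa=0.05):
    """Single-pulse detection probability for a Swerling 1/2 target.

    pfa (false-alarm probability) is a hypothetical setting, not a value
    taken from the text.
    """
    return pfa ** (1.0 / (1.0 + snr_linear))


def corrupt_signal(rcs, mean_snr_db, pfa=0.05, seed=0):
    """Add Gaussian noise at a target mean SNR and zero undetected samples.

    Assumes a strictly positive mean RCS so that the noise level is nonzero.
    """
    rng = random.Random(seed)
    mean_rcs = sum(rcs) / len(rcs)
    # Noise level in RCS units such that mean_rcs / noise_level is the target SNR.
    noise_level = mean_rcs / (10.0 ** (mean_snr_db / 10.0))
    noisy = []
    for value in rcs:
        p_detect = detection_probability(value / noise_level, pfa)
        if rng.random() < p_detect:
            # Detected: the true RCS becomes a noisy estimate (clamped at zero).
            noisy.append(max(value + rng.gauss(0.0, noise_level), 0.0))
        else:
            # Not detected: the measurement is dropped to zero.
            noisy.append(0.0)
    return noisy
```

Because the detection probability is evaluated per sample, low-RCS samples are dropped more often than high-RCS samples, which is the qualitative behavior the Swerling dropout is meant to capture.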

To summarize, the objects under test have complex motion with tumble, roll, and variable viewing angles, yielding complex time series of RCS estimates. The signals are noisy and have missing data points. Each RCS signal dataset contains variable values for each of the aforementioned parameters. Therefore, the same classifier is expected to correctly label RCS signals from objects moving at highly varied speeds in highly varied motion paths with different amounts of noise.

IV. EXPERIMENTS

We analyze and expand upon our generated datasets by using them to train and evaluate deep learning architectures. Two datasets were created according to the previous methodology. The parameters used to create these datasets are listed in Figure 7. The datasets in this paper are named A4 and B4, respectively, because they both contain four classes but have different parameter values.

Fig. 5: Swerling detectability was an important parameter in our model. As the RCS SNR decreased, so did the probability of detection. According to the above graph, SNRs of 25 and 15 dB provide almost no dropped measurements. But for SNR = 5 dB, the probability of detection drops significantly, to roughly 50%. The RCS measurements with the lowest magnitude have a greater likelihood of being dropped to 0. Although Swerling dropout did have a major effect on our results, it often preserves larger RCS values in the time series signal, and the larger RCS values are expected to play a more substantial role in feature selection.

A. Residual Network

Our 1-dimensional residual network architecture is inspired by He et al. [7]. Two-dimensional 3×3 convolutional filters were replaced by 1-dimensional 3×1, 5×1, and 7×1 filters, but the original block module structure and skip connections are maintained. See Figure 6 for a detailed view of the 18-layer network architecture. The residual

Fig. 6: Here, an 18-layer convolutional network is trained to analyze a noisy RCS signal. The architecture is strongly inspired by ResNet [7]. Skip connections are shown as curved arrows. Unlike ResNet, batch normalization is incorporated into the model.

network was trained for 30 epochs and updated using the Adam [13] optimizer with a learning rate of 0.001. Unlike the original implementation of ResNet, batch normalization is done during training to avoid overfitting. The batch size was 128 signals for all models except the 152-layer residual network, which was run with a batch size of 32 signals due to GPU memory constraints. The learning rate decayed by 70% if the current validation accuracy did not improve compared to the average of the previous five validation accuracies. The network with the lowest validation error was used to evaluate the test data.
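The learning-rate rule stated above can be written as a small function. This is a sketch under stated assumptions: "decayed by 70%" is read as multiplying the rate by 0.3, and no decay is applied until five earlier validation accuracies are available. The function name is hypothetical and is not a Torch or PyTorch API.

```python
def update_learning_rate(lr, previous_val_accs, current_val_acc, decay=0.7, window=5):
    """Reduce lr when validation accuracy fails to beat the recent average.

    Assumptions: 'decayed by 70%' means lr is multiplied by (1 - 0.7), and
    the rule is inactive until `window` earlier accuracies exist.
    """
    if len(previous_val_accs) >= window:
        baseline = sum(previous_val_accs[-window:]) / window
        if current_val_acc <= baseline:
            lr *= 1.0 - decay
    return lr


# Example: five flat epochs followed by a slightly worse one triggers a decay.
new_lr = update_learning_rate(0.001, [0.90, 0.90, 0.90, 0.90, 0.90], 0.89)
```

Comparing against an average of several epochs rather than only the previous one makes the rule less sensitive to a single noisy validation measurement.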

B. Expanding the A4 Dataset

In secondary tests we expand our four-class dataset to include a new trapezoidal prism class. We augment the dataset to answer the question of how our model performance would be affected by the addition of a smaller class of

Fig. 7: The POFacets library is used to generate both of the datasets. The total number of signals depends on the number of parameters.

Fig. 8: The dataset was created such that the new class was similar to other classes but had far fewer signals than the other classes.

signals. This object is selected such that it closely resembles one of the original classes, i.e. the plate class. Only one model for the trapezoidal prism class was created. The new dataset distribution is shown in Figure 8. The number of signals for the new class is significantly lower than for the other classes.

C. Siamese Network

Our initial hypothesis was that our residual network would misclassify signals belonging to the class with the fewest instances, confusing them with one of the larger classes. If we assume one class will be confused, the loss function will be minimized by misclassifying signals in the smallest class. In order to test our hypothesis, we compare the performance of the residual network with a siamese network. A siamese network consists of two feature extractor modules, each outputting a feature vector of lower dimension than the original input. Siamese networks work by clustering signals from the same class in close proximity while moving signals from different classes farther apart in feature space. This network is chosen so that the smaller class is less likely to be grouped with another class. The feature extractor modules share the same parameters so that the output vectors can be compared symmetrically. Residual networks were used as the feature extractors in the siamese architecture. The network was trained with similar settings, using the Adam optimizer and batch sizes of 128 for 30 epochs. The siamese networks used were made up of 18-layer

Fig. 9: Two signals are fed into two CNNs with shared parameters. The output feature vectors are compared via the siamese network loss function (Equation 1). The target label is one if the two signals belong to the same class and zero otherwise.

residual networks. The learning rate was also initialized and adjusted similarly. The comparator, or loss function, requires a margin hyperparameter to separate signals of different classes:

$$L = \sum_{i}^{N} y_i \, \lVert x_1^i - x_2^i \rVert_2^2 + (1 - y_i) \, \max\left(m, \lVert x_1^i - x_2^i \rVert_2^2\right) \tag{1}$$

The function compares the L2 distance between the two feature vectors x1 and x2 of the two input signals. If the binary label y is equal to one, the squared L2 distance is penalized directly; if y is equal to zero, the loss increases when the signals are not farther apart than the margin parameter m. Since the network requires two signals, evaluation is based on measuring the similarity between a set of signals belonging to different classes. Several methods were attempted as classifiers, but ultimately a nearest neighbor classifier performed with the greatest accuracy. An input signal first passes through the feature extractor network to produce the corresponding test feature vector. The test feature vector is compared to a set of training feature vectors, and the most similar training feature vector assigns its label to the test vector. Other methods, such as k-nearest neighbor with k greater than one and a support vector machine, were also used but did not perform as well.
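The pairwise loss and the nearest neighbor evaluation can be sketched in plain Python. The loss below uses the widely used hinge form of the contrastive loss, in which a dissimilar pair is penalized only when it is closer than the margin; this follows the prose description and is an assumption rather than a transcription of Equation 1. The function names and the toy feature vectors are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def contrastive_loss(pairs, margin=1.0):
    """Sum the pairwise loss over (x1, x2, y) triples.

    y = 1 marks a same-class pair and y = 0 a different-class pair.
    Assumed hinge form: different-class pairs cost max(0, margin - d)^2.
    """
    total = 0.0
    for x1, x2, y in pairs:
        d = l2_distance(x1, x2)
        total += y * d ** 2 + (1 - y) * max(0.0, margin - d) ** 2
    return total


def nearest_neighbor_label(test_vec, train_vecs, train_labels):
    """Assign the label of the closest training feature vector."""
    best = min(range(len(train_vecs)), key=lambda i: l2_distance(test_vec, train_vecs[i]))
    return train_labels[best]


# Toy feature vectors, for illustration only.
label = nearest_neighbor_label(
    [0.9, 1.1], [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]], ["cone", "plate", "cylinder"]
)
```

With this form, a same-class pair contributes nothing only when its two feature vectors coincide, and a different-class pair contributes nothing once it is at least `margin` apart, so the margin sets the minimum separation the network is asked to enforce between classes.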

D. Robustness Test

In order for the classifier to be utilized in real-world applications, it must make accurate predictions on signals with previously unseen distortions. Signal distortions such as occlusion, saturation, and clutter can affect monostatic

Fig. 10: The refiner network and the discriminator work in an adversarial manner similar to a generative adversarial network. The refiner optimizes the simulated signals to look more like the unlabeled realistic data while the discriminator tries to distinguish between the refined and realistic signals.

RCS signals. Occlusion, in this work, is defined as zeroing a subset of a signal’s RCS values. Clutter is defined as random amplitude spikes at random locations within a signal. Saturation, or clipping, is a hard cutoff at a set threshold that limits a signal’s amplitude. Subsampling is the removal of a random contiguous section of a signal. Occlusion differs from subsampling because occluded signals have the same number of samples after the distortion is applied, unlike subsampled signals. As a robustness test, the 1-dimensional residual network’s performance is evaluated on signals with these unseen distortions. The network is trained on dataset A4, which only contains signals distorted by noise and Swerling dropout. The test set of A4 is distorted by one of the previously mentioned distortions and then evaluated with the residual network. The degree of distortion is varied in each test, e.g. the test signals are saturated to 75% of their maximum amplitude. The residual architecture can receive signals of various dimensions as its input because of an average pooling layer before the end of the feature extractor module. Subsampling is implemented by circularly shifting the signal by a random integer while the first n RCS values of the original 501 RCS samples remain the same.
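The four distortions defined above can be illustrated with short helper functions. These are sketches that follow the definitions in the text, not the authors' test harness: the contiguous occlusion window, the uniformly scaled spike amplitude, and all function names are assumptions, and `subsample` follows the stated definition (removing a contiguous section) rather than the circular-shift implementation.

```python
import random


def occlude(signal, start, length):
    """Zero a run of samples; the signal length is unchanged.

    Assumption: the occluded subset is contiguous.
    """
    out = list(signal)
    for i in range(start, min(start + length, len(out))):
        out[i] = 0.0
    return out


def saturate(signal, fraction=0.75):
    """Clip the signal at a fraction of its maximum amplitude."""
    ceiling = fraction * max(signal)
    return [min(v, ceiling) for v in signal]


def add_clutter(signal, n_spikes, spike_amplitude, rng):
    """Add spikes of random height (up to spike_amplitude) at random locations."""
    out = list(signal)
    for i in rng.sample(range(len(out)), n_spikes):
        out[i] += spike_amplitude * rng.random()
    return out


def subsample(signal, start, length):
    """Remove a contiguous section, which shortens the signal."""
    return list(signal[:start]) + list(signal[start + length:])
```

Only `subsample` changes the number of samples, which is why the variable-length input support provided by the average pooling layer matters for that test and not for the other three.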

E. Refiner Network

This section is heavily inspired by the work of Shrivastava et al. [25], where the authors train a refiner network to make generated images appear more realistic. This network resembles a generative adversarial network [4], where a generating network tries to create “realistic” data and a discriminator network decides whether the data is real or fake. The generator network iteratively improves the generated image while the discriminator network learns to tell the real and fake data apart more accurately. Instead of generating data from a noise distribution as in a GAN, a refiner network converts simulated data into data that more closely resembles the realistic data. In this work we use a refiner network to make our simulated signals from POFacets look like simulated signals with added noise. The refiner network maintains the structure of our signal

Fig. 11: Some examples of signals before and after refinement. The structure of the signal is maintained but pseudo noise is added to the original signal by the refiner network.

while adding features to make it appear more like the signals with noise. The parameters used for the simulated dataset were similar to those of the A4 dataset, except that no noise was added to the signal and the rotation and roll rates were decreased.

The refiner network is a three-layer convolutional neural network that takes a simulated signal as input and outputs a signal of the same size. The discriminator network is a 5-layer CNN that receives the refined signal as input and outputs a vector probability map. The probability map determines which parts of the input signal appear realistic to the discriminator. The refiner and discriminator networks have separate loss functions and are trained iteratively. The refiner network’s loss function combines the distance between the input signal and the generated signal with the likelihood that the discriminator believes that the refined signal is real. The discriminator network’s loss function is a combination of the likelihood that the discriminator believes that the refined signal is real and the likelihood that the discriminator is unsure that the real data is real. Both networks are trained for 50 epochs with the Adam optimizer. For each epoch the refiner network is trained twice while the discriminator is trained only once.
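The two loss functions described above can be sketched numerically. The text does not give their exact form, so the L1 distance for the self-regularization term, the cross-entropy form of the adversarial terms, the weight `reg_weight`, and the function names are all assumptions chosen to match the general formulation of [25].

```python
import math

EPS = 1e-12  # guards against log(0)


def refiner_loss(simulated, refined, disc_probs_on_refined, reg_weight=1.0):
    """Adversarial term plus a penalty for drifting away from the input signal.

    disc_probs_on_refined is the discriminator's probability map (values in
    [0, 1]) that each part of the refined signal is real.
    """
    self_reg = sum(abs(r - s) for r, s in zip(refined, simulated)) / len(simulated)
    adversarial = -sum(math.log(p + EPS) for p in disc_probs_on_refined) / len(
        disc_probs_on_refined
    )
    return adversarial + reg_weight * self_reg


def discriminator_loss(disc_probs_on_refined, disc_probs_on_real):
    """Penalize calling refined signals real and calling real signals fake."""
    refined_term = -sum(math.log(1.0 - p + EPS) for p in disc_probs_on_refined) / len(
        disc_probs_on_refined
    )
    real_term = -sum(math.log(p + EPS) for p in disc_probs_on_real) / len(
        disc_probs_on_real
    )
    return refined_term + real_term
```

Under the schedule given in the text, each epoch would apply two refiner updates using `refiner_loss` for every one discriminator update using `discriminator_loss`. The self-regularization term is what keeps the overall structure of the simulated signal intact while noise-like detail is added.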

V. RESULTS & DISCUSSION

In this section we explore the performance of the convolutional neural network algorithm on our generated datasets. We also compare different architectures on an augmented dataset, investigate the robustness of our classifier, and explore improving our simulated data post-generation.

To our knowledge, our methods are the first application of deep learning for monostatic radar signal classification. Monostatic signals are generated from only one transmitter and receiver pair, as opposed to multistatic signals, which are generated from multiple transmitter and receiver pairs. Classifying monostatic signals is thus more difficult because only a single viewing angle is considered and no contextual information is included. As in many fields, CNNs have been used to great effect, improving performance on nearly every dataset.

A. Classification on A4 and B4 Datasets

Several 1-dimensional residual networks of various depths are trained, as described in the experiments section, on

Fig. 12: The residual network was evaluated on both dataset A4 and B4. As the number of layers in the architecture increases, the test error on both datasets decreases, but the improvement is only marginal past the 18-layer depth.

both dataset A4 and B4. As network depth is increased, classification error decreases to 2.5% and 2.0% on datasets A4 and B4, respectively, as shown in Figure 12. Although dataset B4 contains a larger variety of parameters, it also contains more training examples. Models trained on the B4 dataset perform better than models trained on the A4 dataset across all network depths. As a baseline, a neural network and a non-residual convolutional neural network were trained and evaluated on the A4 dataset, with test errors of 29.5% and 6.1%, respectively. The neural network contains 6 layers, dropout, and non-linear layers. Increasing the number of layers in the neural network did not significantly improve results. The non-residual convolutional neural network contains 18 layers and is trained with the same training parameters described in the experiments section. When the number of layers in the non-residual convolutional network was increased, the performance plateaued and then began to degrade.
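The difference between the residual and non-residual baselines is the skip connection. The sketch below shows one 1D residual block and the global average pooling that makes the output independent of signal length. It is a pure-Python illustration with hypothetical function names; batch normalization, multiple channels, and learned weights are omitted.

```python
def conv1d_same(signal, kernel):
    """Zero-padded 1D correlation with an odd-length kernel; output length equals input length."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(v, 0.0) for v in values]


def residual_block(signal, kernel_a, kernel_b):
    """Two convolutions whose output is added back to the block input."""
    hidden = relu(conv1d_same(signal, kernel_a))
    out = conv1d_same(hidden, kernel_b)
    return relu([o + s for o, s in zip(out, signal)])


def global_average_pool(values):
    """Collapse a feature sequence of any length to a single number."""
    return sum(values) / len(values)
```

When both kernels are zero the block passes a non-negative input through unchanged, which illustrates why adding residual layers need not hurt: each block only has to learn a correction to the identity mapping.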

Our classification results for the 1D residual networks seem counter-intuitive at first glance, since CNNs typically perform worse on datasets that have more variation. Datasets with more variation are simply more difficult to learn because the CNN has to learn specific filters to deal with that variation. Not only does the B4 dataset contain more signals, but it also contains faster roll and tumble rates. The faster roll and tumble rates for our signals actually increase the amount of information per sample because the models we use to generate our signals have large distinct edges and smooth surfaces. If the models used instead had rough surfaces and less distinct edges, information would be lost by increasing the roll and tumble rates. The B4 dataset also contains signals with lower SNR values and more varied viewing angles, which decrease the amount of information within the signals. Regardless of the size of the network, test performance on the B4 dataset was greater than on the A4 dataset. It was for this reason that the A4 dataset was selected to create new datasets

Fig. 13: Three different classifier modules are compared after a CNN feature extractor of varied depths. The nearest neighbor classifier performed with the highest overall accuracy across all architectures tested.

Fig. 14: The single residual network’s robustness performance is shown for several novel distortions. This benchmark was a way to compare the network robustness to realistic signal distortion. Signal occlusion, clutter, saturation, and subsampling were the realistic distortions used for this benchmark.

and to further train and test our models. If a more difficult dataset is used, then there will be a clearer distinction between the results of the single CNN and the siamese network.

B. Classification on A5 Dataset

The A5 dataset contains the same set of parameters as A4 but includes an additional geometric model of a trapezoidal prism. The additional class contains only one model and makes up a small portion of the total signals in the A5 dataset. The single residual network outperforms the siamese network in terms of overall accuracy, as shown in Figure 16. The nearest neighbor classifier performed significantly better than the k-nearest neighbor and support vector machine classifiers. From the confusion matrices in Figure 15, the single network classifies the trapezoidal prism class more accurately than all siamese network architectures. The siamese network with a nearest neighbor classifier correctly predicts the trapezoidal class nearly as frequently as the single network. The k-nearest neighbor and support vector machine classifiers, however, consistently misclassify the trapezoidal prism class. The single network did not perform well in terms of accuracy for the cone class. The precision, recall, and F1-scores for the single network and the siamese network with a nearest neighbor classifier are shown in Table I. The single network performs worst on the cone class and best on the spheroid class in terms of F1-scores. The siamese network performs worst on the trapezoidal prism class. The average F1-scores for the single network and the siamese network were 0.948 and 0.956, respectively.

The siamese network structure has been used on a variety of tasks, such as signature matching and facial identification, with high performance. This type of network performs most effectively when the number of classes in a dataset is large and the number of samples per class is relatively low. The architecture’s unique comparator function forces inputs from the same class to cluster in high-dimensional space and inputs from different classes to be farther apart in high-dimensional space. The loss function for a typical CNN classifier is the negative log likelihood function, which does not contain any constraint on how far apart the output vectors of the feature extractor module are.

If the A4 dataset is augmented with a class that is unique but similar to one of the other classes in the dataset, we would expect the CNN to be more likely to misclassify the novel class. The results of this experiment are shown in Table I. The single residual network outperforms all types of siamese networks in terms of overall accuracy, as shown in Figure 16. Initially it appears that the lack of a clustering term in the objective function does not reduce performance on the A5 dataset. However, because the new class makes up only a small portion of the dataset, the CNN could maintain high accuracy even while misclassifying all of the signals in the newest class. To investigate this result further, the precision, recall, and F1-score of each class are calculated and shown in Table I. The siamese networks with the k-nearest neighbor and support vector machine classifiers misclassified the trapezoidal prism class in every case. The single residual network and the siamese network with the nearest neighbor classifier were both able to correctly classify the trapezoidal prism class a majority of the time. It was surprising that the single residual network performed better on the new class than the siamese network.

Fig. 15: The confusion matrices for all siamese networks and the single residual network. Confusion matrices belong to (a) the single network, (b) the siamese network with nearest neighbor, (c) the siamese network with k-nearest neighbor, and (d) the siamese network with support vector machine. The classes are enumerated as (0) cone, (1) cylinder, (2) plate, (3) spheroid, and (4) trapezoidal prism.

In Table I we can see that the F1-score for the trapezoidal prism class is greater for the single network than for the siamese network. Overall, the average F1-score across classes is 0.948 and 0.956 for the single network and siamese network, respectively. If we weight the F1-score by the number of signals per class, there is an even larger difference in performance: the weighted F1-scores of the single network and siamese network are 0.947 and 0.959, respectively. It appears that the single network showed high performance on the trapezoidal prism class because it misclassified more of the signals in the cone class. The siamese network with the nearest neighbor classifier performs well because the feature extractor module is better able to separate the clusters for each class. Intuitively, we would expect the k-nearest neighbor and support vector machine classifiers to outperform the nearest neighbor classifier. A potential reason that the nearest neighbor classifier performed better is the dimensionality of the output vector from the feature extractor module. As the number of dimensions increases, the k-nearest neighbor algorithm tends to perform worse due to the increasing space between points.
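For reference, the nearest neighbor and k-nearest neighbor decision rules applied to feature-extractor outputs can be sketched as below. This is a generic illustration under stated assumptions: `knn_predict`, the two-dimensional toy embeddings, and the labels are hypothetical and are not the paper's data or code. Setting `k=1` gives the nearest neighbor classifier.

```python
import math
from collections import Counter


def knn_predict(query, gallery, labels, k=1):
    """Classify `query` by majority vote among its k nearest gallery embeddings.

    gallery: list of embedding vectors with known classes
    labels:  class label for each gallery embedding
    k:       number of neighbors that vote (k=1 is nearest neighbor)
    """
    # Sort gallery indices by Euclidean distance to the query embedding.
    order = sorted(range(len(gallery)), key=lambda i: math.dist(query, gallery[i]))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy embeddings (hypothetical): the two rules can disagree on the same query.
gallery = [[0.0, 0.0], [1.0, 0.0], [1.1, 0.0], [0.9, 0.1]]
labels = ["trapezoidal prism", "cone", "cone", "cone"]
query = [0.4, 0.0]

print(knn_predict(query, gallery, labels, k=1))  # trapezoidal prism
print(knn_predict(query, gallery, labels, k=3))  # cone
```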

C. Robustness Metric Performance

The robustness metric is designed as a means of studying how a convolutional neural network would perform on unseen data corrupted by unknown distortions. The four types of distortion were signal occlusion, clutter, saturation, and subsampling. The network was trained on the A4 dataset, which does not contain any of the four distortions. The robustness results for the 18-layer residual network are shown in Figure 14. The amount of the signal occluded varies from 0% to 99%. At 0% none of the signal is distorted, while at 99% only 1% of the signal is non-zero. The overall F1-score of the 18-layer residual network is above 0.8 at 50% signal occlusion. The steepest degradation in performance occurs around 85% signal occlusion. Clutter immediately begins to degrade the performance of the network. However, after roughly 50% signal clutter the performance of the network plateaus for each class. The network's performance on saturated signals is essentially unaffected below 13% saturation. The average F1-score drops most significantly at 20% saturation, and performance improves at 40% saturation before falling again. The final distortion is the removal of signal samples, reducing the original 501-sample signal to lengths of 101, 51, 25, and 10 samples. When the signals are a fifth of the original number of samples, the average F1-score decreases to 0.7. Even when the test signal length is 25, the network achieves an average F1-score of roughly 0.4. At a test signal length of 10 samples, the average F1-score is essentially that of random guessing.
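The four distortions can be sketched roughly as follows. The exact definitions are given in the experiments section and are not repeated here, so every detail below is an assumption made for illustration: occlusion zeroes one contiguous block, clutter adds uniformly distributed positive peaks, saturation clips at a ceiling placed a given fraction below the signal peak, and subsampling keeps evenly spaced samples. All function names and parameters are hypothetical.

```python
import random


def occlude(signal, fraction):
    """Zero out a contiguous block covering `fraction` of the samples (assumed form)."""
    n = len(signal)
    width = round(fraction * n)
    start = random.randint(0, n - width)
    return [0.0 if start <= i < start + width else x for i, x in enumerate(signal)]


def add_clutter(signal, fraction, max_amplitude):
    """Add random-amplitude peaks at a random `fraction` of sample positions."""
    n = len(signal)
    hits = set(random.sample(range(n), round(fraction * n)))
    return [x + random.uniform(0.0, max_amplitude) if i in hits else x
            for i, x in enumerate(signal)]


def saturate(signal, fraction):
    """Clip the signal at a ceiling `fraction` of the dynamic range below its peak."""
    lo, hi = min(signal), max(signal)
    ceiling = hi - fraction * (hi - lo)
    return [min(x, ceiling) for x in signal]


def subsample(signal, length):
    """Keep `length` evenly spaced samples, including both endpoints."""
    n = len(signal)
    if length == 1:
        return [signal[0]]
    return [signal[round(i * (n - 1) / (length - 1))] for i in range(length)]


# Example: reduce a 501-sample signal to the lengths used in the evaluation.
signal = [1.0 + 0.01 * i for i in range(501)]
for m in (101, 51, 25, 10):
    assert len(subsample(signal, m)) == m
```

Note the difference the text draws between occlusion and subsampling: `occlude` returns a signal of the original length, whereas `subsample` returns a shorter one.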

A CNN classifier's ability to handle noisy input data can be evaluated in multiple ways, such as testing on a novel set of data with distortions seen in the training data or testing on a novel set of data with distortions unseen in the training data. Monostatic signals can exhibit a variety of distortions, such as signal occlusion, clutter, sensor saturation, and subsampling, or a combination of several. Since generating a dataset with every combination of signal distortions is unwieldy, we instead evaluate our system's robustness on data with distortions not seen in the training data. The results shown in Figure 14 are from the single 1-dimensional 18-layer residual network; the results for the same network with more layers are nearly identical. The evaluation set was generated via the method described in the experiments section.

Surprisingly, our network performs remarkably well on signals that have been occluded by even 75%. Occlusion may not affect our network significantly because the rotation rates used in our dataset generation are relatively large, and sometimes the model rotates several times within the full sampling window. Even if the signal is occluded significantly, some signals with high rotation rates may contain enough information for classification. However, signals generated with slower rotation rates do not appear to complete multiple rotations within a full window. For these cases the CNN is able to discern the object within a limited viewing window.

The CNN is, however, very sensitive to signal clutter: accuracy per class drops as soon as clutter is introduced. Clutter in this work is the addition of random peaks to a signal, and CNNs are sensitive to slight distortions of the input data. This distortion is similar to the distortion created by adversarial attacks such as FGSM [5], except that we are adding distortions with random amplitudes at random locations. Most CNNs are not robust to adversarial attacks, and it appears that clutter approximates an adversarial attack.

The CNN is resilient to signal saturation up to roughly 15%, and performance decreases significantly soon after. Signals with heavy saturation begin to appear indistinguishable from each other, and the filters that the CNN uses to detect features cannot distinguish between the classes. The rise in F1-score of some of the classes seems to be an artifact of the dataset rather than a feature of the network.

The final distortion is subsampling the input signal. This measure is similar to the occlusion distortion, but the total number of samples in the signal does not change under occlusion. The results of subsampling show that the CNN can use signals with lengths as small as 25 samples as input and still achieve a relatively good F1-score. The performance halves when the input size is 5% of its original length.

The siamese network is not included in the robustness evaluation because the current siamese testing method compares an input signal to a subset of the training data. Since the training data does not contain the distortions of the evaluation data, the siamese network unsurprisingly performs very poorly.

Fig. 16: A single residual network is trained and evaluated on the A5 dataset and compared to three residual siamese networks with various classifier modules. All models were trained for the same number of epochs, but the model with the highest validation accuracy was used for testing. ANN stands for artificial neural network, NN for nearest neighbor, kNN for k-nearest neighbor, and SVM for support vector machine.

| Class | Single Network: Precision | Single Network: Recall | Single Network: F1-Score | Siamese Network+NN: Precision | Siamese Network+NN: Recall | Siamese Network+NN: F1-Score |
|---|---|---|---|---|---|---|
| Cone | 0.94 | 0.89 | 0.91 | 0.92 | 0.95 | 0.94 |
| Cylinder | 0.96 | 0.96 | 0.96 | 0.95 | 0.94 | 0.95 |
| Plate | 0.94 | 0.97 | 0.95 | 0.99 | 0.96 | 0.98 |
| Spheroid | 0.98 | 1.00 | 0.99 | 0.98 | 1.00 | 0.99 |
| Trapezoidal Prism | 0.93 | 0.94 | 0.93 | 0.95 | 0.89 | 0.92 |

TABLE I: Computed precision, recall, and F1-scores for the single residual network and the siamese network with the nearest neighbor classifier. The average and weighted F1-scores of the siamese network are larger than those of the single network.

D. Classification on Refined Dataset

To compare the simulated dataset with the refined dataset, we train a separate three-layer convolutional neural network on each. Each network's performance was evaluated by classifying simulated signals with added white noise. The simulated signals with added noise were also used as “real” data in the refiner network training. Overall, the model's accuracy on the evaluation dataset is 3.5 percentage points higher when the model is trained on the refined dataset: the accuracy of the network trained on the simulated subset of the A4 dataset is 86.7%, while the accuracy of the network trained on the refined dataset is 90.2%.
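A minimal sketch of how “real” stand-in signals might be produced from simulated ones by adding white noise is shown below. It assumes zero-mean Gaussian noise with a fixed standard deviation; the function name, the `sigma` value, and the seeding are hypothetical, since the noise level used in the experiment is not stated in this section.

```python
import random


def add_white_noise(signal, sigma, seed=None):
    """Return a copy of `signal` with independent zero-mean Gaussian noise added.

    sigma: noise standard deviation (hypothetical value, chosen by the user)
    seed:  optional seed so the same noisy signal can be regenerated
    """
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma) for x in signal]


# Example: a flat simulated signal and its noisy counterpart.
simulated = [0.0] * 1000
noisy = add_white_noise(simulated, sigma=0.1, seed=1)
```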

Fig. 17: The result of the refiner network. The refiner network takes the signal in (a) as input and returns the signal in (b). It learns to make this transformation by observing signals with noise, like the signal in (c).

No simulator can perfectly model all of the nuances and variables that are required to create real data. Therefore, a CNN trained on simulated data typically does not perform well on real data. This does not mean that networks should be trained with only real data, because representative real data is difficult and expensive to obtain. Real data is also potentially biased, in that it may represent only certain occurrences, and typically few variables can be controlled when creating real datasets. Simulated data is useful because very large datasets can be generated easily, adjustments can be made to one variable at a time, and the values of all parameters used to create the data are known. The generative CNN called the refiner network, described in the experiments section, has been shown to make simulated data appear more like real data. Using the refined data to train a small network on a subset of our A4 dataset results in a 3.5 percentage point accuracy improvement over training on the equivalent simulated data. For that test, the only “realistic” feature added to the “real” data was noise. In Figure 17 we see that the refiner seemingly adds noise to the simulated signal but maintains the structural elements of the signal.
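The trade-off described here, adding realism while preserving the structure of the simulated signal, is often expressed as a two-term refiner objective in the style of [25]: an adversarial term plus a self-regularization term. The sketch below is written under that assumption; this section does not state the exact objective used, and `refiner_loss`, `p_real`, and `lam` are hypothetical names.

```python
import math


def refiner_loss(simulated, refined, p_real, lam=0.5):
    """Per-signal refiner objective (assumed form, not the authors' code).

    simulated: input signal from the simulator
    refined:   output of the refiner for that signal
    p_real:    discriminator's probability that `refined` is a real signal
    lam:       weight of the self-regularization term
    """
    # Adversarial term: small when the discriminator is fooled (p_real near 1).
    adversarial = -math.log(p_real)
    # Self-regularization: L1 distance that keeps the refined signal
    # close to the simulated one, preserving its structure.
    self_reg = sum(abs(r - s) for r, s in zip(refined, simulated))
    return adversarial + lam * self_reg
```

A larger `lam` favors output that stays close to the simulated input, while a smaller `lam` lets the refiner change the signal more freely to fool the discriminator.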

VI. CONCLUSION

To the best of our knowledge, we are the first to train convolutional neural networks to classify object shape from monostatic radar signals. We expand upon the POFacets MATLAB package to generate large datasets with a variety of selected parameters. Realistic motion, added noise, and Swerling dropout enhance the initial simulation. Utilizing the latest deep learning architectures, we create a 1D residual network capable of achieving test error as low as 2-2.5% on our generated datasets. Our A4 dataset is augmented with an additional class and then evaluated with a siamese network architecture. The siamese CNN does not perform as well in terms of accuracy but surpasses the single residual network in terms of average F1-score. The robustness of our CNN is then evaluated on signals with previously unseen realistic distortions. The single residual network performs well on signals with occlusion and subsampling but performs poorly on signals with clutter and saturation. We explored increasing the quality of the simulated signals using a state-of-the-art refiner network. Deep learning models trained on the refined signals outperform models trained on the original simulated data.

ACKNOWLEDGMENT

This project was funded by Lockheed Martin. We would like to thank Rowland Escritor and Kevin Vance for their substantial efforts coordinating and facilitating this project.

REFERENCES

[1] J. A. Anderson, M. T. Gately, P. A. Penz, and D. R. Collins. Radar signal categorization using a neural network. Proceedings of the IEEE, 78(10):1646–1657, 1990.

[2] F. Chatzigeorgiadis. Development of code for a physical optics radar cross section prediction and analysis application. PhD thesis, Monterey California. Naval Postgraduate School, 2004.

[3] M. Gong, J. Zhao, J. Liu, Q. Miao, and L. Jiao. Change detection in synthetic aperture radar images based on deep neural networks. IEEE transactions on neural networks and learning systems, 27(1):125–138, 2016.

[4] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.

[5] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.

[6] Y. Hara, R. G. Atkins, S. H. Yueh, R. T. Shin, and J. A. Kong. Application of neural networks to radar image classification. IEEE Transactions on Geoscience and Remote Sensing, 32(1):100–109, 1994.

[7] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.

[8] G. Huang, Z. Liu, K. Q. Weinberger, and L. van der Maaten. Densely connected convolutional networks. arXiv preprint arXiv:1608.06993, 2016.

[9] I. Jordanov and N. Petrov. Sets with incomplete and missing data nn radar signal classification. In Neural Networks (IJCNN), 2014 International Joint Conference on, pages 218–224. IEEE, 2014.

[10] I. Jordanov, N. Petrov, and A. Petrozziello. Supervised radar signal classification. In Neural Networks (IJCNN), 2016 International Joint Conference on, pages 1464–1471. IEEE, 2016.

[11] C. Kabakchiev, V. Behar, I. Garvanov, D. Kabakchieva, and H. Rohling. Detection, parametric imaging and classification of very small marine targets emerged in heavy sea clutter utilizing gps-based forward scattering radar. In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on, pages 793–797. IEEE, 2014.

[12] Y. Kim and T. Moon. Human detection and activity classification based on micro-doppler signatures using deep convolutional neural networks. IEEE Geoscience and Remote Sensing Letters, 13(1):8–12, 2016.

[13] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[14] S. Kiranyaz, T. Ince, and M. Gabbouj. Real-time patient-specific ecg classification by 1-d convolutional neural networks. IEEE Transactions on Biomedical Engineering, 63(3):664–675, 2016.

[15] G. Kouemou. Radar target classification technologies. In Radar Technology. InTech, 2010.

[16] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.

[17] K. J. Lang, A. H. Waibel, and G. E. Hinton. A time-delay neural network architecture for isolated word recognition. Neural networks, 3(1):23–43, 1990.

[18] J. Lunden and V. Koivunen. Deep learning for hrrp-based target recognition in multistatic radar systems. In Radar Conference (RadarConf), 2016 IEEE, pages 1–6. IEEE, 2016.

[19] E. Mason, B. Yonel, and B. Yazici. Deep learning for radar. In Radar Conference (RadarConf), 2017 IEEE, pages 1703–1708. IEEE, 2017.

[20] Z. Mathews, L. Quiriconi, U. Boniger, C. Schupbach, and P. Weber. Learning multi-static contextual target signatures. In Radar Conference (RadarConf), 2017 IEEE, pages 1568–1572. IEEE, 2017.

[21] A. Mendoza, A. Soto, and B. C. Flores. Classification of radar jammer fm signals using a neural network. In Radar Sensor Technology XXI, volume 10188, page 101881G. International Society for Optics and Photonics, 2017.

[22] D. A. Morgan. Deep convolutional neural networks for atr from sar imagery. Proceedings of the Algorithms for Synthetic Aperture Radar Imagery XXII, Baltimore, MD, USA, 23:94750F, 2015.

[23] S. Park, J. P. Hwang, E. Kim, H. Lee, and H. G. Jung. A neural network approach to target classification for active safety system using microwave radar. Expert Systems with Applications, 37(3):2340–2346, 2010.

[24] A. Rahman, E. Yavari, V. M. Lubecke, and O.-B. Lubecke. Noncontact doppler radar unique identification system using neural network classifier on life signs. In Biomedical Wireless Technologies, Networks, and Sensing Systems (BioWireleSS), 2016 IEEE Topical Conference on, pages 46–48. IEEE, 2016.

[25] A. Shrivastava, T. Pfister, O. Tuzel, J. Susskind, W. Wang, and R. Webb. Learning from simulated and unsupervised images through adversarial training. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 3, page 6, 2017.

[26] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[27] A. Soto, A. Mendoza, and B. C. Flores. Optimization of neural network architecture for classification of radar jamming fm signals. In Radar Sensor Technology XXI, volume 10188, page 101881H. International Society for Optics and Photonics, 2017.

[28] P. Stinco, M. S. Greco, F. Gini, and M. La Manna. Non-cooperative target recognition in multistatic radar systems. IET Radar, Sonar & Navigation, 8(4):396–405, 2013.

[29] P. Swerling. Probability of detection for fluctuating targets. IRE Transactions on Information Theory, 6(2):269–308, 1960.

[30] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9, 2015.

[31] P. Szymczyk and M. Szymczyk. Classification of geological structure using ground penetrating radar and laplace transform artificial neural networks. Neurocomputing, 148:354–362, 2015.

[32] P. Touzopoulos, D. Boviatsis, and K. C. Zikidis. 3d modelling of potential targets for the purpose of radar cross section (rcs) prediction: Based on 2d images and open source data. In Military Technologies (ICMT), 2017 International Conference on, pages 636–642. IEEE, 2017.

[33] X. Zhang and Y. LeCun. Text understanding from scratch. arXiv preprint arXiv:1502.01710, 2015.

[34] Y. Zhang and L. Wu. Crop classification by forward neural network with adaptive chaotic particle swarm optimization. Sensors, 11(5):4721–4743, 2011.
