deep complex networks for protocol-agnostic radio ...applicability in practical and large scale...

Deep Complex Networks for Protocol-AgnosticRadio Frequency Device Fingerprinting in the Wild

Ioannis Agadakos∗SRI International

Email: [email protected]

Nikolaos Agadakos∗†University of Illinois at Chicago


Jason PolakisUniversity of Illinois at Chicago


Mohamed R. Amer†Robust.AI


Abstract—Researchers have demonstrated various techniquesfor fingerprinting and identifying devices. Previous approacheshave identified devices from their network traffic or transmittedsignals while relying on software or operating system specificartifacts (e.g., predictability of protocol header fields) or charac-teristics of the underlying protocol (e.g., central frequency offset).As these constraints can be a hindrance in real-world settings, weintroduce a practical, generalizable approach that offers signifi-cant operational value for a variety of scenarios, including as anadditional factor of authentication for preventing impersonationattacks. Our goal is to identify artifacts in transmitted signalsthat are caused by a device’s unique hardware “imperfections”without any knowledge about the nature of the signal.

We develop RF-DCN, a novel Deep Complex-valued NeuralNetwork (DCN) that operates on raw RF signals and is completelyagnostic of the underlying applications and protocols. We presenttwo DCN variations: (i) Convolutional DCN (CDCN) for modelingfull signals, and (ii) Recurrent DCN (RDCN) for modeling timeseries. Our system handles raw I/Q data from open air captureswithin a given spectrum window, without knowledge of themodulation scheme or even the carrier frequencies. This enablesthe first exploration of device fingerprinting where the fingerprintof a device is extracted from open air captures in a completelyautomated way. We conduct an extensive experimental evaluationon large and diverse datasets (with WiFi and ADSB signals)collected by an external red team and investigate the effectsof different environmental factors as well as neural networkarchitectures and hyperparameters on our system’s performance.RF-DCN is able to detect the device of interest in realistic settingswith concurrent transmissions within the band of interest frommultiple devices and multiple protocols. While our experimentsdemonstrate the effectiveness of our system, especially underchallenging conditions where other neural network architecturesbreak down, we identify additional challenges in signal-basedfingerprinting and provide guidelines for future explorations.Our work lays the foundation for more research within this vastand challenging space by establishing fundamental directions forusing raw RF I/Q data in novel complex-valued networks.

I. INTRODUCTION

In recent years, the feasibility of identifying devicesthrough browser or device fingerprints has garnered significantattention [1], [2], [3], [4], [5], [6], [7]. Such techniques are notrestricted to device characteristics and specifications but canalso fingerprint devices based on imperfections inherent in thedevice’s hardware [8]. In an alternative line of research, tech-niques have been proposed for fingerprinting devices based onunique characteristics of the device’s hardware that are exposed

∗ Equal Contributions† This work was done while working at SRI International.

in their transmissions. A remote network-based fingerprintingtechnique was presented in the seminal work by Kohno etal. [9], but could be prevented by software-level defenses [10].Brik et al. [10] proposed a more robust but geographically-localized technique that focused on transmitted radio frequency(RF) signals. However, their approach also relied on protocol-specific information, and crucial practical factors were notconsidered in the experimentation and evaluation. While bothapproaches pose significant contributions, the diversity ofwireless protocols in the wild (which differ considerably acrossversions or implementations [11]), coupled with the emergenceof new protocols and standards, hinders their applicabilityin many real-world scenarios. Even the more generalizabletechniques by Danev et al. [12], [13] require, at least, someknowledge of higher-level but protocol-defined information,such as the frequencies of the carrier or sub-carrier signalsor the bandwidth of the transmission channel. Inspired by theextensive prior research in the area, we explore novel methodsto ameliorate current limitations and study the feasibility offingerprinting with no prior knowledge of even the centralcarrier frequencies or modulation schemes involved.

Furthermore, while RF signals are inherently complex-valued, limitations of both theoretical frameworks and avail-able development tools have constrained RF-based researchto real-valued conventional networks that severely limit therepresentational power of generated models. However, recentbreakthroughs laid the theoretical foundations for complex-valued networks [14] and paved the road for a new strain ofdeep learning models which are able to work natively withcomplex data. These networks have higher representationalcapacity and thus more degrees of freedom, allowing themto locate appropriate manifolds that are out of the reach ofconventional real-valued networks. In this work, we leveragethe power of complex-valued deep learning architectures, studytheir ability to operate on raw complex RF data, and devisemethodologies for training these novel networks. Our modelsare trained on raw I/Q data of signals captured under chal-lenging environmental conditions in real settings, to show theirapplicability in practical and large scale deployments. In fact,RF-DCN is able to fingerprint devices using open air captureswithout any protocol preprocessing, and without knowledge ofthe underlying protocol or what carrier frequencies to look for.

Specifically, our system guides the deep neural networktowards identifying the artifacts in signals that are left by theunavoidable minor hardware variations or impairments (herebyreferred to as “imperfections”) that occur during the manu-facturing process. To train our models and demonstrate theirperformance under different conditions, we utilize a plethora

arX

iv:1

909.

0870

3v1

[cs

.CR

] 1

8 Se

p 20

19

of datasets with as many as 1,000 devices. With each samplecorresponding to a captured transmission of only 64 µseconds,our total training set requires just 6.4-51.2 milliseconds worthof captured signals per device. We show that when using only a64 µs long signal, RF-DCN can correctly fingerprint the device72% and 100% of the time for WiFi and ADS-B transmittersrespectively. We also show that our system is able to train asingle model for both types and obtain an accuracy of 82%despite the drastic differences between the two signals (i.e.,protocols, transmission frequencies, and modulation schemes).

We find that RF-DCN is completely protocol-agnostic atthe different spectrum ranges of our dataset and can handledevices that operate at multiple ranges (e.g., a device thattransmits WiFi at 2.4GHz and 5GHz) without any priorknowledge, without relying on protocol implementation flawslike predictable sequence numbers [15], exposed device identi-fiers [16], [17], [18] or software-level data patterns (our datasetonly includes control packets). Our system can be applied ina defensive capacity in a wide range of different scenarios,including device-authentication for authorizing access to re-sources (e.g., a cell tower or access point), detecting unautho-rized or fraudulent devices in a restricted area, and unmaskingspoofed transmissions (including relay attacks or attacks us-ing compromised certificates). Nonetheless, our experimentalevaluation is designed to abstract away the specificities ofthe deployment scenario and focuses on the central task ofidentifying devices based on their transmitted RF signals.

Overall, our research presents new techniques for handlingraw RF I/Q data in their native complex format, highlights thechallenges of RF-based fingerprinting in real-world settings,and demonstrates novel directions for passively fingerprintingdevices without requiring any knowledge of the underlyingprotocols. We believe that being able to fingerprint devicesin an agnostic and scalable way can greatly augment cur-rent authentication schemes and defend against challengingattacks such as relay attacks or credential theft. Our evaluationexplores numerous critical dimensions, yet many interestingfuture research directions remain unexplored. We hope thatmethodologies, network architectures, and experimental find-ings presented in this work will spearhead more research inthe area. In summary, our main research contributions are:

• We present a novel device fingerprinting technique thatworks on complex I/Q data representing raw RF wirelesssignals. We build a novel deep complex-valued network thatis protocol-agnostic, designed to identify devices based onlearnable characteristics from passively captured transmittedwireless signals. This is the first, to our knowledge, systemable to fingerprint devices using unprocessed raw signals ina range of frequencies of interest.

• We detail a series of neural network architectures andconfigurations, and explore their suitability for device fin-gerprinting. We derive appropriate insights and highlight theinherent limitations of certain architectures, which can helpguide future research.

• We provide an extensive experimental evaluation usingdatasets from an externally organized red team. Our ex-periments investigate multiple dimensions of signal-baseddevice fingerprinting and assess the impact of multiplefactors under different environmental settings and real-world conditions that previous work has not explored. We

demonstrate the effectiveness of our system’s fingerprintingcapabilities under challenging and realistic scenarios, usingreal-world, multi-device, open air captures of RF signals.

II. BACKGROUND AND DEPLOYMENT PREDICATES

A. RF Primer

We first introduce relevant background information thatfacilitates comprehension of the following sections. This in-cludes basic telecommunication terms and notions and techni-cal details on the two signal families that we use throughoutthis paper. As the methodologies involved in signal processingare sufficiently complicated, we refer readers without back-ground in the area to [19] for a more complete overview.

Baseband signals. Signals are encountered in this form be-fore being transmitted or after the receiver removes the carriersignal during the early stages of demodulation, centering thereceived signal around 0Hz.

Modulation is the process by which a transmitter encodesinformation so that it can be effectively and efficiently trans-mitted. The main idea is to encode the information from abaseband signal by utilizing a carrier signal of higher fre-quency. Information is encoded by modifying one or more ofthe carrier’s characteristics: amplitude, phase, and frequency.

I/Q data. Signals are commonly represented in a formatknown as I/Q form [20]. This is based on a methodology ofrepresenting a signal using a vector of complex numbers. The’I’ stands for the in-phase component whereas the ’Q’ standsfor the quadrature or out-of-phase component. Analytically itcan be represented by the following equation:

S(t) = A cos θt+A sin θt→ S(t) = I +Qj

In essence any signal can be generated by a combination ofin-phase and out-of-phase components and, hence, it is widelyused in signal processing and telecommunications.

SDR. Software Defined Radios revolutionized the fieldof telecommunications as they allowed for the reception andtransmission of RF signals with any modulation scheme and atany frequency while offering the convenience of defining ev-erything through software despite utilizing the same underlyinghardware. One of the key components behind this technology isthe quadrature mixer. The quadrature mixer can synthesize anysignal using two signals with a phase difference of 90 degrees,and these quadrature signals can be conveniently expressed inanalytical form as I/Q data. Figure 11 in Appendix A depictsthe process of transmuting I/Q data to a modulated signal usinga quadrature modulator.

Channel. The medium through which the signal is trans-mitted, which can greatly alter signals in transmission and havea detrimental effect on them. A channel’s characteristics are of-ten ephemeral as they change due to a variety of environmentalconditions, and the same signal transmitted through the samemedium at different times can differ drastically. This inherentchallenge is pertinent to our research goals, and we explore itin depth during our evaluation.

SNR. Due to the transient nature of a channel’s conditions,the Signal-to-Noise Ratio metric is useful in denoting thequality of the received signal by quantifying how much ofa received signal is noise and what portion is actually useful.

Fig. 1: Spectrograms depicting WiFi and ADS-B signalscaptured under different channel environments and SNR. Theoutdoors WiFi signals depict the open air capture conditions.

Spectrogram. Captured transmissions can be visually rep-resented as spectrograms. To obtain a spectrogram from rawI/Q data, a Short Term Fourier Transform (STFT) [21] isperformed over the entire captured trace. The SFTF outcomedepicts the spectral content of the signal through time andone can easily identify drastic changes, the effect of noise, ordifferent parts of the transmission. Figure 1 shows a collectionof spectrograms for both WiFi and ADS-B signals underdifferent conditions to highlight the aforementioned effects.Plots (i) and (ii) show a WiFi signal captured from the samedevice over two different days and (vi) depicts the signalafter a basebanding and low pass filtering operation. Sincethe capture was conducted outdoors, the interference fromother channels/devices is evident in the first two plots. In(iii) we show an unprocessed WiFi signal that was capturedindoors where the SDR basebanded the signal without usingan intermediate frequency (IF); one can notice that noise isstill present since the signal has not been filtered. Plots (iv),(v) show both versions of ADS-B transmissions under differentSNR (the x-axes are different for the two signals as the shortversion is almost half the transmission).

WiFi. WiFi signals are part of the 802.11 IEEE protocolfamily. While many variations exist, in practice the mostfrequently encountered follow the 802.11b and 802.11g spec-ifications (which use a central frequency of 2.4GhZ) and thenewer 802.11n version (which utilizes the 5GhZ band). WiFisignals utilize the Orthogonal Frequency Division Multiplexing(OFDM) modulation which encodes information not in onecarrier signal but in multiple sub-carriers, where each sub-carrier can be modulated using amplitude, phase or frequencymodulation schemes such as 16QAM, 16QPSK etc. The in-formation content in a WiFi signal can vary significantly insize. We direct the interested reader to [22] for an excellentresource with more information.

ADS-B. A protocol of choice used by air crafts and isscheduled to replace most RADAR systems according to theFederal Aviation Agency [23]. It is encountered in two forms,either short or extended (64 or 120 µs). Apart from the headerand the CRC, short messages only contain a unique aircraftidentifier (24bits), whereas the extended version carries addi-

tional information on the altitude, position, heading, horizontaland vertical velocity. ADS-B transmits at 1090MHz with50KHz bandwidth or at 978MHz with 1.3MHz bandwidth.The modulation employed in this protocol is referred to asPulse Position Modulation (PPM), which encodes data bysimply increasing or decreasing the width of the carrier pulsesaccording to the value of the modulating signal at a giventime. It is a much simpler modulation scheme than the oneused in WiFi signals and the messages transmitted are either5 or 10 bytes. An important characteristic of this protocol isthat it is very robust to multipath effects and environmentalnoise. A more detailed overview and description can befound in [24]. While ADS-B packets contain a unique deviceidentifier, certain transceivers have the option for anonymousmode where a randomized ID is sent [25]. Prior work hasdiscussed how ADS-B data can easily be spoofed to disruptair travel safety [26], [27]. As such, the ability to fingerprintand identify transmitters without relying on identifiers hassignificant operational value.

B. Deep Learning Primer

This section introduces the basic concepts required forfollowing the deep learning networks and methodologies dis-cussed in this paper. Readers familiar with the field mayadvance to the next section. An excellent detailed presentationof foundational concepts can be found in [28].

CNN. Convolutional Neural Networks [29] perform ex-ceptionally well at detecting spatial points of interest andstructural properties, and several variants have been used ina plethora of applications with great success, such as faceand gender detection [30], semantic image segmentation [31],speech recognition [32] and stock prediction [33]. The basicidea behind a CNN is to perform a convolution operationbetween a weight vector of a given size (often called akernel) and a subset of the input data, with the length of eachconvolution being the size of the kernel. Convolutional layersare almost always used in conjunction with pooling layers toachieve dimensionality reduction without loss of information.

RNN. Recurrent Neural Networks are networks that excelat modeling sequential or time series data. The core conceptsbehind them are the context and temporal relations betweendata points, potentially in different parts of the input. Twoadvanced forms of RNNs enhance the concept of context by in-troducing well-defined memory properties, namely Long ShortTerm Memory Models (LSTM) and Gated Recurrent Units(GRU). These advanced models allow the recurrent networkto “choose” and “memorize” parts of the context that aresignificant for the model, emulating a form of memory. Whilethe mathematical models behind each variant are different,both networks have been used in practice with significantsuccess and similar capabilities [34], [35], [36], [37].

Back Propagation Through Time. As recurrent networkscapture sequential properties and try to find temporal relations,the gradient descent step during the back propagation phaseessentially attempts to link the current state with previousstates in time. This is referred to as Back Propagation ThroughTime (BPTT) [38]. RNNs face the problem of vanishing orexploding gradients for lengthy time sequences and whileLSTMs are known to be more robust they can still suffer fromthe same problems albeit at much longer sequences [39].

Complex-valued inputs. Most networks currently used inpractice operate on real-valued data. However, many problemsare expressed in complex numbers, e.g., signals (RF or other-wise). Down-converting complex numbers to either the real orimaginary part limits the available information and introduces amiss-match between the model and problem at hand. To intro-duce complex-valued inputs and, thus, create complex-valuednetworks that theoretically have higher representational power(as they can express solutions in complex-valued manifolds)one needs to address many theoretical challenges and providealternatives to elementary units such as activation units, orproperly initializing complex-valued weights in hidden layers.

Complex batch normalization. Data normalization iscommonly used for efficiently training in most types of deepnetworks. However, when the underlying problem lies in thecomplex domain, data requires special handling as outlinedin [14]. Briefly, one cannot perform two-way independent nor-malization in the real and imaginary part of a complex numberas there is also information precisely in the relation of the realand imaginary axes. This is highly evident if we consider thepolar representation of a complex number z = rειφ, where φis the angle the number forms in the complex plane with thepositive real axis, and is given by φ = atan2(IM(z), Re(z)).If one operates on either the real or imaginary part indepen-dently, thus disregarding any correlation, we can introduce anaffine transformation which is not equivalent to normalization.To properly handle the normalization process, we treat thecomplex numbers as 2D vectors and the process as 2D whiten-ing. We scale the 0 centered data by the inverse square rootof their covariance matrix V (the existence of the inverse isguaranteed by the Tikhonov regularization; see Appendix A4):

x = (V)−12 (x− E[x]),

where V is the 2x2 covariance matrix:

V =

(Vrr VriVri Vii

)=

(Cov(<(x),<(x)) Cov(<(x),=(x))Cov(<(x),=(x)) Cov(<(x),<(x))

)The complex batch normalization layer is defined as:

BN(x) = γx + β, where γ =

(γrr γriγri γii

)Complex activation units. As in the normalization case,

activation functions must also be adapted to handle complex-valued numbers. Several activation units have been proposedsuch as ModRelu [40], CReLU [14] and zReLU [41]. Thesefunctions preserve differentiability, at least in part, to main-tain compatibility with back-propagation. As shown in [42],full complex differentiable functions are not strictly required.While many activation functions have been proposed, weevaluate the following complex activation functions:

CReLU = ReLU(<(z)) + ιReLU(=(z))

zReLU =

{z ifθz ∈ [0, π/2],0 elsewhere

Complex weight initialization. Appropriate weight initial-ization is often considered helpful for the training process ofnetworks. It can help avoid the risk of vanishing or explodinggradients and facilitate a faster convergence. In the complexcase, as derived in [14], the variance of the weight vectors isgiven by Var(W) = 2σ2, where σ is Rayleigh distribution’s only

parameter; σ can be selected according to any initializationcriterion such as those described in [43], [44]. Note that as thevariance is not affected by phase, one can initialize the phaseφ of W as φW ∼ U [−π, π].

C. Deployment Predicates

The techniques presented in this work can be applied to awide range of scenarios. Nonetheless, for ease of presentationwe will assume that role of an entity aiming to identify thesources of wireless signals (be it electronic devices or aircrafts). We assume that this entity passively collects wirelesssignals within a certain area, without interacting with oractively probing the devices. Similarly we do not require theuser to interact with a specific resource. The only assumptionis that the user’s device is transmitting some form of wirelesssignal; in our experimental evaluation we use WiFi and ADS-B traces, but this could be applied to other types as well (e.g.,Bluetooth, GSM, 4G, etc ). Depending on the scenario, theapplication can include the deployment of multiple antennaswithin a larger physical area (e.g., for authenticating devicesconnecting to an organization’s restricted wireless network) ormay focus on a specific small space (e.g., for detecting rogueaccess points using stolen certificates).

III. SYSTEM DESIGN AND IMPLEMENTATION

Here we present an overview of RF-DCN, and discuss themotivation behind our design and implementation choices.

Design constraints. The main goal of RF-DCN is tooperate on raw I/Q data so as to decouple the fingerprinting ca-pabilities of our system from protocol or software specificities(which can differ across versions [11]) and implementationflaws. It also removes the burden of manual analysis thatwould otherwise be required if new or custom protocols wereencountered. To that end we limit all data preprocessing togeneral RF methodologies (such as applying low or band passfiltering) and data augmentation techniques that can be appliedto any RF signal regardless of protocol (such as decimation).

CDCN. We first develop a complex-valued DCN network(Figure 2a) similar to the shallow MusicNet from the recentwork of Trabelsi et al. [14], which was successfully applied toacoustic data. We draw inspiration from this network becauseeven though acoustic and RF signals are different in nature(one is mechanical and the other electromagnetic) they sharemany similarities. Both can be modeled as a 1D complexvector of samples, and many of the techniques, methodologiesand signal processing theories are applicable to both in practice(low/bandpass filtering, decimation, etc). This also poses as acomparison baseline for our novel LSTM-based approach.

RDCN. Since RF signals can be expressed as a time seriesand have temporal coherency, we hypothesize that networksthat capture such temporal features will perform well at ana-lyzing RF data. Consider, for instance, the case of amplitudemodulation where voltage changes from a higher value toa lower value must pass through intermediate values (albeitephemerally). The precise nature of this transition is based onthe device’s electronics and this is precisely what we aim tocapture. Our second complex network elevates LSTM layers,and employs a custom build layer we call the “Sequencer”,as shown in Figure 2b. A known problem for RNNs is the

RawI/Q

Preprocessing

cBN01

32

cConv

CReLU

cBN1 AvgPool

CReLU

Dropout

4096

cLinear

CReLU 4096

cBN2

4096

cLinear

CReLU4096

cBN3

OutputGate

100

softmax

(a) Convolutional Deep Complex Network (CDCN) architecture.

RawI/Q

Preprocessing

cBN0

6400

cLinear

CReLU

cBN1

1

1024

Sequencer

LSTM0

LSTM1

CReLU

CReLU

Stack cBN2

4096

cLinear

CReLU 4096

cBN3

OutputGate

100

softmax

(b) Recurrent Deep Complex Network (RDCN) architecture.

Fig. 2: Convolutional Deep Complex Network (CDCN): the Complex Convolutional Layer combines both I/Q pairs in two mixedchannels (top). The Recurrent Deep Complex Network (RDCN) is based on the powerful LSTM variant; while initially the I/Qpairs are in two different channels the Complex Linear layer binds them together, and its output contains two mixed channels(bottom). The schematic was produced with PlotNeuralNet [45].

BPTT problem; as RF signals can expand to extremely largetime series ranging to several thousands of samples, simplyoperating on a point to point basis is infeasible. While LSTM’smemory mechanisms help alleviate the problem by providingalternative ways for the gradient to flow back through time,they can still break down under sufficiently large signals [39],as the memory cells grow to crippling proportions. Drawingupon the conclusions of Gers et al. [39] to combat thisproblem we have designed the Sequencer layer, which splitsthe signal -that is modeled as a 1D time sequence of I/Q pairsinto a number of smaller vectors. The generated vectors arethen served as inputs sequentially to the LSTM in a propertemporal order as shown in Figure 4, allowing the LSTM to“see” the same amount of data but over less timesteps. Thistransmutation of a longer vector into a collection of smallervectors changes the nature of the signal by transforming itinto different dimensions. As such, the size and number of thenewly generated vectors are critical as they might not preservethe qualities of the original signal.

A. Real, Imaginary and Mixed Data Channels

Due to limitations of the underlying frameworks, the fun-damental data-holding structure (i.e., tensors) cannot containcomplex numbers. To overcome this our tensors contain thereal (“I”) component in one dimension and the imaginary (“Q”)component in another, akin to [14]. Initially, our input channelsare clearly separated between real and imaginary componentscontaining real-valued float numbers. This separation ceasesonce the data is pushed through the first layer. The linear,convolutional layers operate on both channels utilizing theequations described previously, and the output channels ex-iting these layers contain two channels with data containinginformation from both the real and imaginary parts.

These output channels no longer solely represent the realand imaginary parts of the number, but are instead mixedinformation-wise. To differentiate from the prior “pure” input

channels we refer to the new channels as “A” and “B”.To illustrate the qualitative difference between these sets ofchannels, consider the output of the complex linear layer:

LCW (h) =

[<(W · h)=(W · h))

]=

[A -BB A

]·[

xy

](1)

The outcome of this equation contains values that are quali-tatively mixed and stores them as:

• channel A= R ∗R− I ∗ I• channel B= R ∗ I + I ∗R

Mathematically, the outcome of this operation is still real-valued in one channel and imaginary in the other. However,information-wise the channels contain mixed data since theimaginary values are used in tandem with the real valued parts.

Output channels. The output of the network is containedin two mixed channels A and B. Trabelsi et al. [14] utilizethe output of channel A, and discard the output of channel Bin the shallow version of their proposed network but utilizeboth channels through a linear dense output layer in the deepversion of their model. Since we are working with novel RFdata we explore three possible configurations of the outputs soas to identify the optimal way for leveraging the mixed outputchannels: (i) only the output of channel A (per prior work),(ii) we introduce parameters Ga and Gb and sum the channels,allowing the network to modify the parameters and (iii) weemploy a convolutional layer for joining the two channels intoone output channel of identical dimensions. It is importantto note that while Wolter et al. [46] follow a similar schemefor the gates that control the memory of the Complex GRUthat should not be confused with our parameters which act asa weighting factor for the two output channels. Our resultsshow that both channels are actually used in practice and thenetwork keeps both parameters in close proximity.

B. Layers

Complex layers. We use the complex batch normalization,complex convolutional, and complex linear layers proposedby Trabelsi et al.[14]. The input of the initial convolutionaland linear layers is split in two channels that contain the(purely) imaginary and real part in each channel. However,as aforementioned, the output of these layers contains mixedinformation from both imaginary and real parts of the signal.

Activation units. To identify the optimal activation unit weevaluate both ZReLU and CReLu. After the initial complexconvolutional/linear layer the output channels contain mixedinformation. We found that ZReLU, which activates on thefirst quadrant of the Cartesian plane, is not optimal for themixed nature of the data and CReLU provides better results.

LSTM. Our LSTM layers are unmodified real-valuedLSTMs, but the input they receive is mixed. We utilize twoLSTMs in parallel and provide the A, B channels as input re-spectively. The output used for the classification is the contextfrom each LSTM during the final step. The internal contextis preserved for the entire epoch and reinitialized to a cleancontext at the start of each epoch. The computational graph ispreserved within each minibatch but detached -yet the contextvalues are still preserved across minibatches so as to avoidbackpropagation across minibatches. During our experimentswe found that the RDCN architecture was significantly harderto train, corroborating the findings of [47].

Preprocessing step. Our data preprocessing is minimal andcomprises of: (i) basebanding the signals [19], and (ii) applyinga 3rd order Butterworth bandpass filter [48] with cutoffsat 25KHz and 20MHz. Our insight behind the lower orderButterworth filter is to allow a smooth cutoff and include po-tential discriminating effects emanating from the imperfectionsinherent to the transmitter’s filters. The precise lower value wascalibrated experimentally. No other data preprocessing or otherprotocol-specific treatment takes place.

C. Data Augmentation

Existing methodologies for processing image or audiodata contain a multitude of data manipulation techniques forcreating new synthetic instances by increasing robustness andreducing generalization errors. For instance, a well establishedapproach for creating synthetic image training data is to rotatethe image, scale it or add/remove noise. Prior work has shownthat such practices improve the network [49], [50], [51]. Weadapt common data augmentation practices and evaluate theirapplicability and limitations in the realm of RF data.

Decimation. A widely used methodology in signal pro-cessing is to reduce the computational load by keeping everyMth sample of a signal where M is called the decimationfactor. In order to avoid aliasing a low pass filter must beapplied before the decimation operation. For neural networksapart from reducing the computational load, decimation alsoreduces the number of network parameters since the inputsignal is reduced according to the decimation factor. In ourwork decimation has the additional role of augmenting thedataset since the decimated signals are used as extra samples.The decimation factor must be an integer value and obey thelower limit of the Nyquist theorem [52] to be able to correctly

Preamble 20μs M

AC

4μs

Payload - Variable Length

Prea

mbl

e 8μ

s

Dow

nlin

k 5

μs

Cap

abili

ty 3

μs

24 bit Identifier 24 bit Parity

WiF

iA

DS-

B

Fig. 3: Signal breakdown and unique identifier position for the“I” component over time.

1 2 5 74 83 6

Time

S

DecimationFactor:2

1 5 73S1

2 6 84S2

1 2 5 74 83 6

Time

S

SequencerSteps:2

1 3 42S1

5 7 86S2

LSTM0

LSTM1S2 S1

ChannelA

ChannelB

S2 S1

Fig. 4: Decimation with a factor of 2 generates two newsamples of half the original length (left). The Sequencerconverts the signal into a series of vectors (here a vector of 2vectors) that are sequentially entered in the LSTM in propertemporal order, i.e, the samples will be seen in the same orderas if the original signal was provided unprocessed (right).

reconstruct the signal. Decimation can be seen as reducingthe rate of sampling by a ratio equal to the decimation factorM , thus the maximum ratio must be the number that allowsa sampling rate with at least Fs ≥ 2Fmax, where Fmax isthe maximum frequency required. For example in the case ofWiFi where the bandwidth is 20MHz the minimum requiredsample rate is 40MSPS (Mega Samples Per Second); sinceour sample rate is 100MSPS per channel and the factor Mmust be an integer the only allowed value is two. Increasingthe decimation factor beyond that limit reduces the highestfrequency one may reconstruct, leading to potentially missedinformation. To establish whether this constrain also appliesto the problem at hand (which is to fingerprint the transmitterrather than faithfully reconstruct the signal) we add thisparameter to our hyperparameter exploration. Figure 4 (left)shows an illustration of decimating a signal composed of 8 datasamples with a factor of two. We also adapted other commondata augmentation techniques but did not find them to be usefulin this context. We provide more info in Appendix B.

Randomized Cropping. Our main design goal is for RF-DCN to be protocol-agnostic and to not learn software-levelcharacteristics. To guide the network away from latching ontoprotocol identifiers, we crop each sample in every minibatchinto N parts and randomly choose, without replacement, onepart to train on. This approach forces the network towardsselecting features that are common in different parts of the

signal since it has to minimize the error for every part itencounters during training. The same procedure is followedfor testing as well, and each test signal is fragmented into Nparts and a random piece is chosen for classification.

Both WiFi and ADS-B signals used in our evaluation, con-tain identifiers that exist in predetermined positions and havedifferent sizes (see Figure 3 in Appendix A). In our currentexperiments we find that N = 6 provides a good balancebetween having a sufficient part of the signal and making surethat the majority of the generated random crops do not containany unique identifier. Our technique is protocol agnostic as wedo not make assumptions on the precise location of the uniqueidentifier in the signal, choosing a random part when trainingor testing. In other words, during our experiments 83.3% ofeach testing and training signal is randomly discarded, whichprevents the system from latching on to identifiers withoutus revealing information to the system about the position ofthe identifier within a specific type of packet (as that wouldimplicitly teach the system information about each protocol).By following this scheme, the network is forced to identify thedevice without learning superficial features that correspond toidentifiers, and it does so regardless of where such an identifieris located in the signal. The results solidify our hypothesisthat our approach captures the essential characteristics of eachdevice; if the networks had learned to predominantly discern asignal through its identifier, the accuracy results reported withrandom signal cropping would arguably be significantly lowerthan those presented in this work.

IV. DATASET DESCRIPTION

Here we provide details of the datasets used throughoutour experiments, which are comprised of signal traces capturedunder a variety of conditions over a period of 13 days. Thiswas part of our external evaluation by a red team (they werein charge of creating the dataset – performers were not partof IRB-related processes). Data is stored in the SigMF format,which we describe in Appendix A2.

The RF signals were captured from 96 different TektronixRSA5106B each equipped with a HG2458-08LP antenna, andfour USRP B210 SDRs. Signals were captured at 100MSPS1

which is well beyond the minimum sample rate required toreconstruct even the most demanding signal category in ourdataset, i.e., WiFi signals. Finally, the captures were conductedwith both vertical (90%) and horizontal (10%) antenna po-larization, 5% of the signals are captured in both forms. Thecaptured signals have variable lengths according to the protocoland the message type. ADS-B signals have a length of 6,400(64µs short) or 12,000 (120µs extended) whereas for WiFilength is extremely variable (as per specification), ranging froma few thousand I/Q pairs to 272K I/Q pairs in our data. Asour goal is to create a system without any knowledge of theunderlying protocols, we perform our experiments with vectorsof 6,400 samples (I/Q pairs) for both WiFi and ADS-B. We donot differentiate between protocols by adding more data pointsfor WiFi signals, as that would increase the informationalcontent the system sees for one type and could guide it toseparate signals due to superficial characteristics (e.g., different

1The overall sampling rate is 200MSPS split between both “I” and “Q”components leading to 100 MSPS per channel.

signal lengths). At our sample rate 6,400 complex data pointscorrespond to to 64µs of signal capture.

WiFi-1. This dataset spans all 13 days and data wascaptured in both indoor and outdoor settings. It includes 4TBof signals that have been collected from 53,853 unique devicesfrom a variety of manufactures. The majority of devices areApple iPhones (∼30%) and the rest are from different man-ufacturers and types (smartphones, laptops, tablets, drones).The signals captured are mainly from 2.4Ghz but a significantportion (≥ 20%) is also from 5Ghz emitting devices; somedevices transmit in both bands. Device labeling was doneusing the MAC addresses, as MAC address randomization wasdisabled to facilitate the labelling process of such a massiveand diverse set of devices. The real-world capturing processhas also resulted in additional interference and “noise” in thecaptured signals from other protocols that transmit at thoseranges (e.g., Bluetooth), which poses an additional challengefor protocol-agnostic fingerprinting. Furthermore, the open aircapturing was designed to eliminate or minimize environmen-tal sources of bias that could assist the fingerprinting systems.Specifically, while the deep learning models could potentiallylearn environmental factors or location-specific artifacts, thiswas counteracted by: (i) the weather and other environmentalfactors (e.g., physical obstacles that alter signals) changingover the course of 13 days, (ii) using devices that are notstationary (as those commonly used in prior studies) butmobile devices during normal operation where they changemultiple locations per day while transmitting their signals,and (iii) the antennas being deployed in different locationswhich also changed through time. As such, this dataset allowsfor the evaluation of our proposed models across differentcapturing conditions and for large-scale deployments wheremany devices need to be tracked.

WiFi-2. This dataset is formed from a set of 19 identicaldevices with the same firmware version installed. The signalswere collected in a lab (indoors) and the devices were set upto transmit the same data and configured with the same MACaddress. The dataset includes ∼200K signals and requires17GB of storage. Labeling was done manually: traffic wascaptured from one device and labelled, then the next devicewas used and labelled, etc. It is important to note that while alldevices have the same MAC address, and devices transmittedthe same data, ephemeral identifiers in the traffic will differ(e.g., timestamps). The main purpose of this dataset is todemonstrate that RF-DCN does not learn specific identifiersand is unaffected by other software or content-related artifacts,but is instead capable of identifying the artifacts left by adevice’s unique hardware imperfections in a transmitted signal.

ADSB-1. This dataset was collected over a span of 10days and contains approximately an equal amount of shortand extended signals. The transmitters were airplanes and wehave collected signals from 10,404 unique sources. The cap-tured data contains a plethora of different airplane positions,velocities, altitude and SNR values. Labeling was done usingthe unique 24bit identifier encoded in each message. The entiredataset includes 3.5M signals and is approximately 7TB. Theoutdoors collection took place at a single location.

Realistic environmental conditions. Numerous factorsaffect the quality of a recorded signal; channel effects (e.g.,multipath, hysterisis), environmental effects (e.g., outdoors vs

indoors), the type of transmitter (e.g., cellphones, UAV), cap-ture conditions (e.g., antenna polarization), will vary in reality.Our dataset was generated so as to contain signals representingall of the above conditions. We believe that this plethoraof different capture parameters is a critical requirement forevaluating a system’s ability to fingerprint devices in practice,as ideal indoor-lab settings alone are not indicative of actualdeployment settings. Furthermore, the signal captures can con-tain multiple concurrent transmissions from different devices,thus, introducing additional noise. Another important feature isthat our training and testing data are not from a contiguous timewindow (apart from WiFi-2), further increasing the realismof our testing scenarios. Prior studies have evaluated theirproposed systems using datasets that reflected some but notall of the aforementioned parameters. To our knowledge, weare the first to present a study that experimentally evaluates asystem using a dataset that contains signals captured in suchdiverse conditions and quantifies their effect.

V. EXPERIMENTAL EVALUATION

Here we provide an extensive evaluation that exploresRF-DCN’s performance under different environmental factorsand across multiple dimensions of our model’s design andimplementation. The combinations of factors and the specificdetails (e.g., number of training and testing samples) in eachexperiment were dictated by a red team evaluation processorganized externally (we have provided more information tothe PC chairs). All the experiments presented, unless otherwisenoted, were performed on captured signals with a duration of64µs represented by 1D vectors of 6,400 complex numbers. Inall experiments the testing and training data are from differenttransmissions. The reported accuracy in each case denotesthe percentage of target devices that were identified correctly,thus, presenting a lower bound of the actually accuracy of ourmodels. For instance, consider an example where the test signalcapture for device A also contains background transmissionsfrom a device B which is one of the other devices used inthe experiment. If our system labels the test signal as “DeviceB” we will count that as a misclassification, despite Device Bbeing one of the classes and present in the capture.

Hardware specifications. Experiments were performed ona Supermicro SuperServer 1029GQ-TVRT equipped with 2Intel Xeon Bronze 3104 (6 cores at 1.7GHz each) with 96GBof DDR4 RAM and two Nvidia Tesla V100 SXM2 GPUs with16GB of VRAM each. All of the architectures occupied amaximum of 8GB on the GPU; the multiple units enabled theconcurrent training of different models and reduced the timerequired for the hyperparameter exploration.

Models. We present two real-valued conventional net-works, namely ANN and CNN, and two complex-valuednetworks, namely CDCN and RDCN shown in Figure 2.We use the real-valued networks as baseline for evaluatingthe potential of the complex-valued variants and drawing abaseline of comparison using all the information present in asignal (properly bound using the covariance matrices, and theusage of complex weights and complex activation units).

Machine learning configuration. The networks weretrained over 100 epochs with a batch size of 32. We found thatadaptive learning rate annealing is helpful and, thus, reduce

the learning rate by 1e−1 after every 10 epochs where thenetwork fails to improve its validation accuracy. We also allowearly stopping once the learning rate reaches 1e−7. The resultswe report in this section represent the best generalization accuracy over 100 epochs or if the early stopping criterionis met. This falls in line with evaluation approaches in thedeep learning literature when handling large and complicatedneural networks (e.g.,[53], [54], [55], [56]). This is standardpractice for obtaining unbiased error estimations when facingsignificant computational requirements for training such net-works and exploring different architectures, configurations, anddatasets, as opposed to k-fold validation approaches that arecommon for other machine learning techniques. We evaluatedboth Cross Entropy and Binary Cross Entropy and foundthat the later improves the convergence of the model andproduces better results. We evaluated both the Adam optimizerand SGD in a subset of the experiments and found thatAdam with amsgrad disabled is superior. Real-valued layersare initialized using uniform distribution based on the workby Glorot et al. [43], while complex-valued layers leveragecomplex weight initialization. We performed no normalizationand standardization of the data during preprocessing, and reliedon the first Batch Normalization layer that performs a similarcomputation. The hyperparameters that we explored includeseveral variables such as the optimal decimation factor, theoptimal sequencer time step and the number of hidden units inthe RDCN layer. The hyperparameter space was explored usingSpearmint [57], and the optimal values obtained correspondedto the best accuracy using a validation dataset comprised ofboth WiFi and ADS-B devices. Our methodology and detailson the obtained hyperparameters can be found in Appendix C.

Hyperparameter exploration takeaways. While our ex-periments show that the hyperparameters greatly affect thelearning capabilities and accuracy of each architecture, ourexploration reveals that the RDCN network is the most “sensi-tive” to hyperparameters. Its performance can range from notlearning anything (i.e., getting results equivalent to randomlyguessing the source of the signal) to obtaining highly accurateresults. For RDCN we find that the most crucial parameter,besides the learning rate, is the number of time steps producedby the sequencer; this indicates that restructuring the 1Dtime series vector into a tensor of vectors directly influencesthe representation of the signal, effectively changing an N-length one dimensional signal to an N/M vector length ofM-dimensional vectors. The best results were obtained for alength that corresponded to approximately 1µs worth of data,given our sampling rate that corresponds to 100 values pertimestep (or 50 when a factor 2 decimation is used).

A. Signal Preprocessing

First we explore how preprocessing the signal prior tofeeding it to RF-DCN affects accuracy. To that end we comparethe accuracy obtained when the different neural architecturesare fed completely unfiltered data as opposed to data thathas undergone generic and standard telecommunications pre-processing, namely baseband and passband filtering. Figure 5shows that filtering and basebanding are crucial for the WiFisignals. Apart from higher susceptibility to outdoor conditionsand channel effects, the inherent ability of WiFi transmitters tobroadcast in different communication channels mandates the

TABLE I: Accuracy in regards to temporal relation of the collected signals, using the WiFi-1 dataset.

T-train T-test Environment Classes Samples (Train) Samples (Test) ANN CNN CDCN RDCN RDCN (top5)

Day-1 Day-1 Indoors 100 800 200 39% 66% 74% 72% 90%Day-1 Day-1 Outdoors 100 800 200 38% 52% 66% 63% 89%Day-1 Day-1 Both 100 800 200 32% 61% 71% 70% 91%Day-1 Day-2 Both 50 100 25 2% 3% 7% 10% 30%Day-1,2 Day-3 Both 65 218 54 4% 10% 12% 21% 44%

0

20

40

60

80

100

ANN CNNCDCN

RDCN

Accu

racy

(%)

ADSB-1 (100)

ANN CNNCDCN

RDCN

WiFi-1 (100)

Unfiltered Filtered

ANN CNNCDCN

RDCN

WiFi-1/ADSB-1 (100)

Fig. 5: Preprocessing effect on accuracy, for 100 devices.While all networks benefit from preprocessing, CDCN andRDCN are more robust to learning from unprocessed RF data.

use of basebanding. Interestingly, we find that the complex-valued networks are able to “see through” the clutter inducedby concurrent WiFi transmissions and ambient noise, locatingthe intended signal even in the raw pre-processed signal casewhere the signal is not even basebanded and the capturedinstance between training and testing could be in differentcentral frequencies. It is evident however that performingat least the basic signal processing can greatly improve theaccuracy, especially in the weaker real-valued networks.

B. Multi-protocol networks

Apart from the positive effect of filtering, Figure 5 alsoindicates the challenge of building a network for signals frommultiple protocols. When the network is trained on a mixtureof two signal types, performance is lower than the averageaccuracy of the two stand-alone networks (which one might

0

20

40

60

80

100

ANNCNN

CDCNRDCN

Accu

racy (

%)

ADSB-1 (50)

ANNCNN

CDCNRDCN

WiFi-1 (50)

ANNCNN

CDCNRDCN

WiFi-1/ADSB-1 (100)

Fig. 6: Accuracy comparison of two stand-alone single-protocol networks versus a protocol-agnostic approach for allsignal sources. The red lines denote the “ideal” accuracy,which is the average of the two stand-alone networks.

hope to achieve). To further explore this effect we considerthe following experiment. We assume a scenario where 50WiFi and 50 ADS-B sources exist, and want to quantify theperformance impact of a protocol-agnostic approach where noexplicit information is provided to the network for differenti-ating between them. Specifically, we quantify the accuracy oftwo stand-alone versions of RF-DCN, each trained and testedon 50 classes of a single protocol, and compare it to a protocol-agnostic version that has to handle all 100 classes. As canbe seen in Figure 6, the multi-protocol network is “weigheddown” by the WiFi protocol and is not able to reach the averageaccuracy of the two standalone networks. It is important to notethat while the increase in the number of classes from 50 to 100can affect the performance, the presence of multiple types ofprotocols has a dominant impact (as in Figure 5). We find thatthe RDCN architecture not only achieves the highest accuracy,but also presents the smallest accuracy loss for the multi-protocol system, which further highlights its effectiveness andsuitability for challenging settings where multiple types ofsignals are emitted from sources.

C. Channel effect: spatial implications

Due to the transient nature of the channel through which asignal is being transmitted, we explore how RF-DCN’s accu-racy changes based on the spatial conditions of the differentsignals. We break down our experiments on the WiFi-1 datasetin the first two rows of Table I. We start by using indoordata that was captured on the same day, for 100 differentdevices. Our RDCN network is able to correctly identify thedevice in 72% of the cases, while the correct device is inthe top five for 92% of the tests. When we replicate thisexperiment using signals that were captured outdoors, accuracydrops by 4-9%, due to the impact environmental factors haveon the transmitted signals. When training and testing usinga mixture of indoor/outdoor signals RF-DCN performance isroughly in the middle between the indoor and outdoor settings.Once again, RDCN vastly outperforms real-valued networks,

demonstrating our system’s robustness against environmentalfactors – if our model relied on location-specific artifacts foridentifying devices its performance would have significantlydeteriorated in this experiment.

D. Channel effect: temporal implications

Prevalent ambient environmental differences greatly affectthe morphology of each signal as shown in Figure 1. Apartfrom the effect natural environmental changes have (e.g. rain/-sunny day) ephemeral factors such as vehicles or obstaclesmay induce multipath or other effects. Finally some protocolslike WiFi allow for transmission in different communicationsub-channels according to conditions in the RF spectrum inthe devices’ vicinity, for example if more than one devicebroadcasts in the same channel. As such, we also experimentwith a more challenging and realistic scenario where thetraining data is collected during one day and the testing datais from a different day, using a smaller number of classes.The last two rows in Table I contain the experiments usedto estimate the effect of these conditions. When we train onone day and test using a different one, all the aforementionedchallenges are reflected in our results as even the RDCNarchitecture obtains only a 10% accuracy with the correctanswer being in the top five 30% of the time. The real-valued networks learn nothing as their accuracy is equivalent torandom selection. When we train on signals from two differentdays the network appears to learn and discard the channeleffects and create more robust signal features, since testing ona different day now achieves an increase in accuracy with theRDCN rising to 21%. While the other architectures improveas well, they remain considerably less accurate.

It is important to note that while we refer to Days 1,2,3 forease of presentation, the data is not from three specific daysbut can span the entire 13 days. For example, the data usedfor one device might be from the first, sixth and eleventh day,while for a different device it might be from the second, third,and seventh day of the collection period. This allows us to trulystress test our system under different environmental settings.In our opinion, this experiment highlights perhaps the toughestchallenge in operating with raw I/Q data. Previous work thatrelied on protocol-specific techniques by extracting attributes(e.g., constellation phase-shift) obfuscated this obstacle sincesophisticated protocol-specific signal processing techniquescan rectify these effects. The fact that the RDCN networkseems to be able to learn the channel effects within such acomplicated domain using signals captured from different daysis an encouraging result for the feasibility of protocol-agnosticfingerprinting in realistic settings.

E. Channel effect: signal-to-noise implications

As the devices being monitored are likely to be in anexternal setting, the quality of the signal can be affectedby environmental factors resulting in varying levels of noise.In this set of experiments we aim to investigate how RF-DCN’s performance is affected using datasets with differentsignal-to-noise ratios. We train our models with 100 classesusing an equal number of training and testing samples (525)that were captured over a mix of different days. We breakdown our data into three different levels of SNR (Low ≤2dB,2dB<Med<5dB and High ≥5dB). To make conditions more

TABLE II: Accuracy when using different subsets of ADSB-1with varying levels of SNR ratios.

Train Test ANN CNN CDCN RDCN RDCN (top5)

Low Med 1% 98% 100% 100% 100%Low High 75% 99% 99% 100% 100%Med Low 85% 85% 98% 99% 100%Med High 99% 100% 99% 100% 100%High Low 27% 26% 99% 99% 100%High Med 78% 78% 99% 100% 100%

0

20

40

60

80

100

1 2 3 4 5 6 7 8 9 10

Accura

cy (

%)

Run

ANN CNN CDCN RDCN

Fig. 7: Consistency of neural architectures over ten runs onthe same testing and training datasets.

challenging we experiment with testing and training combina-tions that have different SNR levels, as shown in Table II. Thecomplex-valued networks both achieve accuracy higher than98% and our results show that they are unaffected by differentSNRs. The RDCN network achieves almost 100% accuracy inall settings. The real-valued networks perform relatively betterin settings where the testing data is of higher quality thanthe training but suffer when attempting to recognize signalsof lower quality when trained on signals of higher quality.The most indicative such case is shown when the trainingused signals with High SNR and the testing occurred on LowSNR with ANN and CNN both achieving an accuracy of lessthan 27%. We observe an interesting relation between thisexperiment and the one comparing filtered and raw signals;in both experiments the complex-valued networks appear tobe able to “see through” the noise and extract robust andmeaningful features whereas the real-valued networks are moresusceptible to superficial effects induced by noise.

Model Reliability. During this experiment we encountereda discrepancy in the ANN model, shown in the first rowof Table II. To further investigate this, we performed 10runs for each architecture as shown in Figure 7. We observethat the ANN network has a mean value of 54.1% with astandard deviation of 36.7 whereas the other networks remainconsistent. When the ANN performed poorly it suffered fromthe first epoch, indicating that this issue stems from weightinitialization. The limitations of this simplistic, yet parameter-rich, model are also evident when the number of devicesincreases and it completely fails to scale, as discussed next.

F. Class size effect

We explore how our system fares when the number ofclasses increases. As Figure 8 shows, we run three sets ofexperiments where RF-DCN is required to differentiate be-tween 100, 500, and 1000 sources. These experiments evaluateour system using an equal mix of data from both the WiFi-1 and ADSB-1 datasets, in all setups we use 218 signals foreach device for training and 54 for testing. One can easily

0

10

20

30

40

50

60

70

80

90

100

100 500 1000

Accu

racy (

%)

Number of classes

ANNCNN

CDCN

RDCNRDCN (top5)

Fig. 8: RF-DCN’s accuracy for different number of classes.

observe that accuracy diminishes as the number of classesincrease. Interestingly it does so in a non-linear fashion, withthe RDCN achieving an accuracy of about 82% for 100 devicesand dropping to 72% and 62% for 500 and 1,000 classesrespectively. It is also evident that the real-valued networksfail to produce models with sufficient descriptive power tofollow the increase in cardinality, even though they have moreparameters than the CDCN architecture. Finally we observea complete breakdown of the ANN and a more rapid declineof the real-valued CNN when compared to its correspondingcomplex-valued CDCN. The decrease in accuracy due to anincrease in cardinality is known and expected [58].

G. Learning hardware imperfections

If the data processing methodology is not designed cor-rectly then the models tend to latch onto (software-level)identifiers as opposed to the hardware characteristics of thetransmitting device. To further demonstrate the feasibility ofcreating fingerprints from the unique hardware imperfectionsthat arise during a device’s manufacturing process, our nextexperiment aims to stress test our system. Specifically, weleverage the WiFi-2 dataset obtained from 19 identical smart-phones flashed with the same firmware version, configuredwith the same link-layer identifier (i.e., MAC address) andtransmitting the same data. We present the results in Table III.To avoid latching on to either ephemeral or unique identifierswe cut each signal in 6 parts and perform classificationusing one randomly selected segment. Each training samplecomprises less than 17% of the original signal, disrupting thecontinuity of the transmission, which leads to two primarybenefits; it increases the diversity of the dataset and guides thenetwork towards finding features that identify the device basedon the signal itself. Simply latching onto identifiers wouldlead to poor performance in this experiment. In our currentsetup, our random segments are approximately 10µs worth oftransmission or 1033 data points (out of the 6400 originally).This fragmentation prevents latching onto ephemeral or uniqueidentifiers wherever they may be located in the signal, withoutrequiring prior knowledge of where they may be located. Thisallows our methodology to be applied to any protocol.

As can be seen in Table III, this has a detrimental effect onall the networks except RDCN, as its accuracy remains within2-4% of when provided with the entire signal. This also showsthat our cropping methodology prevents the models from usingsecondary protocol features (e.g., timestamps) as a discerning

TABLE III: Accuracy when using random signal fragments toprevent networks from learning software-level identifiers.

Dataset Classes ANN CNN CDCN RDCN RDCN (t5)

WiFi-2 19 21% 26% 27% 99% 100%ADSB-1 100 1% 15% 37% 98% 100%WiFi-1 100 2% 26% 24% 69% 90%Mix 100 15% 34% 36% 80% 93%

ANN CNN DCN LSTM ANN(D) CNN(D) DCN(D) LSTM(D)0

200

400

Tim

e (S

ec)

0

20

40

60

80

Para

met

ers (

Milli

on)

Fig. 9: Performance of different architectures. (D) denotes theuse of decimation with a factor of 2.

signal since, if that was the case, all the models would continueto exhibit high accuracy.We believe that superiority of ourmodel is due to the existence of sufficient gated memory cellsin the RDCN layer and its ability to learn features in context.Once the signal is cropped and fed to the network its variousparts stem from different transmission phases (i.e., preamble,transient states, payload-data transmission, checksum etc). Themorphology of the signal in these different phases differsdrastically as seen in Fig 3, and the RDCN network’s ability tomemorize features and work in context is the crucial differenceto the other architectures that struggle with this diversity. Thisrobustness, derived from capturing the essential transitionalimperfections of the original transmitting device, will allowan authentication system to reject impersonating devices thathave spoofed identifiers (e.g., MAC address) even if they areusing stolen credentials or certificates.

H. System Performance

Figure 9 depicts the processing time required to traina mixed dataset over a single epoch, and the parameterscontained by the model. We find that the number of parametersdoes not always correlate with the time required to processthe data, as is evident for the ANN that has almost double theparameters compared to CDCN yet takes less than a quarterof the time for processing. This computational load, which isindependent of the number of parameters, exhibits an inversecorrelation to the accuracy of the networks. We also includemeasurements from our decimation experiment, in which thesignals are decimated with a factor of two. Decimation vastlyreduces the number of parameters and even though it leadsto a signal half the size, the number of parameters is lessthan one third of the original. RDCN is the “heaviest” modelcontaining 30M parameters, down from 92.3M. As we useall the decimated samples instead of just one, we elevatedecimation from a computation-reducing technique to a data-augmentation strategy as well. While this improves accuracy, itincreases the processing time required by the complex-valuednetworks, indicating the computational mismatch between thereal and the complex-valued counterparts.

Figure 10 plots RF-DCN’s testing duration for each phase.

0

0.2

0.4

0.6

0.8

1

0.0001 0.001 0.01 0.1 1 10 100

CD

F

Time (sec)

Model-LoadData-Load

Data-PreprocessData-Test

Fig. 10: Time required for each testing subphase (RDCN).

We use RDCN and conduct 100 runs on an idle server. Ourdataset includes 11,200 signals from 100 classes, amounting toa 1.7G testing dataset. Here the model has already been trainedand we are interested in the time required for the differentphases required for testing. It is evident that computation isdominated by the time required to pre-process and then test thedata, requiring between 23-33 and 29-33 seconds respectively.This experiment reflects a scenario where the attacker monitorsan area with 100 signal sources and aims to classify them.In a scenario where RF-DCN is interacting with a singledevice, e.g., for authenticating a device connecting to an accesspoint, verifying the device would require approximately 0.5seconds which is a reasonable requirement for bootstrappingauthenticated communication between the two endpoints.

VI. RELATED WORK

Fingerprinting. Numerous prior studies have proposedapproaches for fingerprinting devices, with browser fingerprint-ing attracting much attention due to the ubiquitous presenceof browsers [1], [7], [6], [4], [59], [3], [60]. Fingerprintingwork, however, is not exclusive to browsers. Kohno et al. [9]introduced the notion of remotely fingerprinting a specificphysical device over the network by inspecting packet headersand identifying the clock skew due to the hardware charac-teristics of a given device. Several subsequent studies haveexplored alternative approaches for detecting clock skew orother protocol-specific features [61], [62], [63], [64], [65]. Briket al. [10] demonstrated that hardware imperfections can beleveraged as successful fingerprinting features; however theirapproach relied on protocol-specific information since duringthe last steps of demodulation one needs to know the utilizedmodulation scheme in order to estimate the difference from theideal expected symbol constellation. Such approaches, whileaccurate, rely on protocol-specific noticeable imperfectionsthat need to be manually identified and extracted using hand-crafted techniques which is problematic for unknown proto-cols. It is also important to note that their data was collectedfrom an indoor testbed with statically placed transmitters –in practice a fingerprinting system would face significantlyharsher conditions that affect performance, as demonstratedby our experiments. Also, their data was collected using onereceiver which can prove problematic in realistic settings (seeSection VII). Such limitations are common in this line ofresearch (e.g. [66]). Dey et al. [8] argued that the physicalimperfections of smartphones’ accelerometer sensors rendereddevices fingerprintable, as do the audio components [67].

In contrast to prior work we propose and develop a widely

applicable fingerprinting approach, which presents a paradigmshift as it removes many assumptions and practical limitationsthat affected prior techniques. Specifically our work differsacross multiple dimensions. First, our system is fully passiveand does not require any form of interaction with the user orthe target device. Second, our approach is completely protocol-agnostic and does not require any information about theunderlying protocols, operating system or applications running.As a result, our proposed system is not affected by differencesin software implementations across platforms or differentversions, changes to the device’s software (e.g., updates), orthe use of custom protocols. Moreover, we present an extensiveexperimentation that explores the effect of multiple dimensionsof the data collection and model creation process. Finally, oursystem design guides the neural network towards learning theeffect of the device’s hardware on the transmitted signal, thusovercoming obstacles presented by software-level defenses.

Complex-valued networks. Trabelsi et al. [14] laid thetheoretical foundations for using complex-valued networks andproposed appropriate building blocks such as complex batchnormalization and complex weight initialization to facilitatetraining with complex data. Wolter et al. [46] subsequentlyintroduced and successfully trained a complex-enabled GRUand showed that it exhibits both stability and competitiveness.

Deep learning in RF. Recent papers [68], [69], [70]highlighted the potential benefits of using deep learning for sig-nal processing as well as tackling fundamental problems likechannel estimation [68], [70], modulation classification [69]even directly recovering symbols without demodulation [68];all these networks rely on real valued networks and treat thenaturally complex data as two real valued channels. The au-thors of [69], [70] stated the lack of complex-valued networksas an open research question and highlighted the mismatchbetween the common use of complex numbers in almost allsignal processing algorithms and the real-valued nature of thenetworks. O’Shea and Hoydis [69] identified some crucialproblems such as the non-holomorphic nature of commonactivation functions, which was addressed in [14].

In an independent concurrent study, Restuccia et al. [18]have proposed the use of deep learning for fingerprintingdevices based on RF signals. However, their study presentsmultiple significant limitations compared to our work. First,their system relies on protocol-specific processing of thesignal, which introduces an important deployment constraintcompared to our protocol-agnostic design. Next, their systemworks with real-valued data, as opposed to our complex-valued data approach, which inherently results in a loss ofinformation during the down-converting process. Moreover,their system is built on top of a CNN which, as demonstratedby our experimental evaluation, is considerably less accurate– our RDCN architecture outperformed the CNN across allexperiments. Perhaps the most critical limitation is that theyprovide the entire signal for training and testing; as wehighlighted in our experiments, that guides the CNN towardslearning the device identifiers (e.g., MAC address or aircraftID) as opposed to actual hardware imperfections. Next, themajority of their experiments is conducted with a very smallnumber of classes; their largest experiment includes numbersfor up to 500 classes but, as shown in our Figure 8, theCNN architecture’s accuracy deteriorates significantly for more

devices. Finally, they train their WiFi and ADS-B networksseparately, whereas we experiment with multi-protocol sys-tems where the networks are fed both types of signals withoutany prior knowledge of the underlying protocols. Overall,while their work presents similarities with ours, we believethat our paper presents significant advantages for practicaldeployment, while also demonstrating superior accuracy undermore challenging scenarios. We also explore the effect ofmany practical dimensions of the data collection and networktraining processes that are omitted in their work.

VII. DISCUSSION AND FUTURE WORK

Deployment. Our techniques can be deployed in a de-fensive capacity in multiple scenarios ranging from passivelyauthenticating devices [71], [72], [73], [74] to detecting unau-thorized devices in restricted areas [75] or unmasking spoofedtransmissions (e.g., “wormhole” attacks [76]). Another poten-tial application is detecting the illegitimate use of credential-s/certificates from unauthorized transmitters.

Agnostic vs generalizable techniques. An interestingrelated line of work explores using a range of mathematicaland statistical tools that are broadly applicable across pro-tocols, such as the work Danev et al. [13], [12] or Kleinet al. [77]. The main difference between our work and theabove is that we explore the feasibility of fingerprinting withoutknowledge of the modulation or the carrier frequencies. Whileour system’s accuracy does indeed benefit from doing genericpreprocessing, our system works in a completely agnosticmanner given a capture of signals within a given frequencyrange; it also handles devices that transmit at multiple differentfrequency ranges, without any “guiding” information providedto the system. In contrast while these prior methods are indeedgeneralizable and applicable across protocols, they require theability to locate the signal and move it to an IF while filteringand clearing the signal sufficiently to extract the statistical ormathematical properties used in defining the fingerprint.

Hardware-based bypassing. While our technique is in-herently robust to OS and software level attempts of subver-sion, attackers could try to bypass it using hardware-basedapproaches and trying to mimic the physical characteristicsthat were learned by our network for a specific device. Thispresents several significant challenges in practice, and wouldrequire considerable effort and money [10], if at all feasible.

Danev et al.[78] explored two attacks for circumvent-ing physical authentication systems: (i) a feature replicationscheme for bypassing modulation-based schemes, and (ii)using a high quality signal generator to faithfully reconstruct acaptured signal and deceive low-level physical authenticationschemes such as transient-based approaches. The first attackis ineffective against our system since modulation features areirrelevant; the second, more powerful attack could potentiallytrick our system within lab conditions. In their evaluationthey captured and replicated signals in a lab environment withdevices placed on a tripod 30cm away from the receiver. Ina realistic scenario, capturing a signal in a given locationwould consist of transmissions that are affected by the givenchannel and would be re-transmitted in the channel of thenew location, which would adversely affect the re-transmittedsignal. The spectrum conditions at the new location might also

affect the re-transmitted signal in ways that the authors didnot explore. Consider capturing a WiFi signal at channel 1;when re-transmitting at the relay location prevalent spectrumconditions may contain other WiFi devices transmitting atchannel 1, thus, inducing conflicts. Therefore, while relaytransmission from high-end quality hardware could pose athreat to our fingerprinting scheme, significant experimentationis required for exploring the feasibility of such an attackunder realistic non-lab conditions. As such, we argue that ourproposed techniques present a robust augmentation to existingauthentication (and other defensive) schemes.

Scalability. The limitations on softmax-based classifica-tion, also referred to as the “softmax bottleneck”, have beenstudied extensively in the work of Yang et al. [58]. Soft-max adaptations like Sigsoftmax [79] might allow for highercardinalities. Discriminatory schemes such as Siamese net-works [80] can facilitate an open set classification strategy,providing robustness in the face of a large number of classes.Our RDCN and CDCN networks can be retrofitted to power aSiamese architecture with the appropriate loss function; findingthe appropriate loss metric for the comparison of RF complexdata is a challenging and interesting problem.

Signal types. In our experimental evaluation we used WiFiand ADS-B signals collected both in the wild and in laboratorysettings. Common user devices also use other signal familiesand protocols such as GSM and 4G. As such, in the future weplan to experiment with other prevalent signal types; however,given that RF-DCN works on raw data and is agnostic of theunderlying protocols we believe that our techniques can bedirectly applied to a wide variety of wireless signals.

Model transferability. Similarly to how the transmittersleave a unique fingerprint on signals, which is the motivatingobservation behind RF-DCN, the receiving hardware may alsouniquely affect the signal. As such, in scenarios where thetraining and testing data have been collected by differentdevices a model’s accuracy could be impacted. In the datasetsused in our experiments the testing and training data wascollected by multiple receivers and antennas. However, as thedatasets do not have labels regarding the capturing device/an-tenna, we were unable to quantify this effect. We considerthe in-depth exploration of the transferability of models acrossdifferent monitoring devices as part of our future work. Givenour current results we believe that leveraging multi-antennaor multi-device monitoring setups, where signals are collectedfrom several receivers with different polarizations, can lead toRF-DCN learning the imperfections of the transmitting deviceswhile “ignoring” the receiving hardware’s effect.

Model interpretability. This study presents the first ex-ploration on the feasibility of using deep complex models forfingerprinting devices based on their raw RF transmissions.Our extensive experimental evaluation demonstrates that suchtechniques are well suited for such tasks. While we havedesigned our novel models based on specific domain-basedobservations and well-defined design goals, we consider anin-depth exploration of these network architectures that decon-structs and interprets their predictions (e.g., using LRP [81])an important aspect of our future work.

VIII. CONCLUSION

In this paper we demonstrated the effectiveness of novelcomplex-valued networks, as they can produce powerful mod-els able to express fingerprints that uniquely identify devicesbased on the wireless signals they transmit. Specifically, wepresented a system that can identify artifacts ingrained in thewireless signals due to the imperfections inherent in eachtransmitter’s circuitry. One of our main motivations was toexplore the feasibility of building an RF-based fingerprintingsystem that can act as an additional factor of authentication,able to defend against powerful attacks such as impersonation,digital certificate theft, or relay attacks. Our prototype demon-strates that such a system can be readily deployed and operatewithin realistic environments and without constraints on thetype of signal that devices need to transmit, knowledge of theunderlying protocol or software implementations. Through ourextensive experimental evaluation we highlighted the challeng-ing nature of this task, and quantified the effect of differentfactors across multiple dimensions of the testing and trainingprocesses. While our approach is robust and effective underrealistic conditions, we believe that our work is an importantexploratory first step within a vast and challenging new space,and has identified several interesting future directions.

REFERENCES

[1] P. Eckersley, “How unique is your web browser?” in Proceedings ofthe 10th International Conference on Privacy Enhancing Technologies,ser. PETS’10, 2010.

[2] P. Laperdrix, W. Rudametkin, and B. Baudry, “Beauty and the beast:Diverting modern web browsers to build unique browser fingerprints,”in 2016 IEEE Symposium on Security and Privacy (SP). IEEE, 2016,pp. 878–894.

[3] G. Acar, M. Juarez, N. Nikiforakis, C. Diaz, S. Gurses, F. Piessens,and B. Preneel, “Fpdetective: Dusting the web for fingerprinters,” inProceedings of the 2013 ACM SIGSAC Conference on Computer &Communications Security, ser. CCS ’13, 2013.

[4] O. Starov and N. Nikiforakis, “Xhound: Quantifying the fingerprintabil-ity of browser extensions,” in 2017 IEEE Symposium on Security andPrivacy (SP). IEEE, 2017, pp. 941–956.

[5] A. Sjosten, S. Van Acker, and A. Sabelfeld, “Discovering browserextensions via web accessible resources,” in Proceedings of the SeventhACM on Conference on Data and Application Security and Privacy,ser. CODASPY ’17. New York, NY, USA: ACM, 2017, pp. 329–336.[Online]. Available: http://doi.acm.org/10.1145/3029806.3029820

[6] Y. Cao, S. Li, and E. Wijmans, “(cross-)browser fingerprinting via OSand hardware level features,” in 24th Annual Network and DistributedSystem Security Symposium, NDSS 2017, San Diego, California,USA, February 26 - March 1, 2017, 2017. [Online]. Available:https://www.ndss-symposium.org/ndss2017/ndss-2017-programme/cross-browser-fingerprinting-os-and-hardware-level-features/

[7] K. Mowery and H. Shacham, “Pixel perfect: Fingerprinting canvas inHTML5,” in Proceedings of W2SP 2012, May 2012.

[8] S. Dey, N. Roy, W. Xu, R. R. Choudhury, and S. Nelakuditi, “Accelprint:Imperfections of accelerometers make smartphones trackable.” in NDSS,2014.

[9] T. Kohno, A. Broido, and K. C. Claffy, “Remote physical device fin-gerprinting,” IEEE Transactions on Dependable and Secure Computing,vol. 2, no. 2, pp. 93–108, 2005.

[10] V. Brik, S. Banerjee, M. Gruteser, and S. Oh, “Wireless device identi-fication with radiometric signatures,” in Proceedings of the 14th ACMinternational conference on Mobile computing and networking. ACM,2008, pp. 116–127.

[11] X. Zeng, R. Bagrodia, and M. Gerla, “Glomosim: a library for parallelsimulation of large-scale wireless networks,” in Proceedings. TwelfthWorkshop on Parallel and Distributed Simulation PADS’98 (Cat. No.98TB100233). IEEE, 1998, pp. 154–161.

[12] B. Danev, D. Zanetti, and S. Capkun, “On physical-layer identificationof wireless devices,” ACM Computing Surveys (CSUR), vol. 45, no. 1,p. 6, 2012.

[13] B. Danev and S. Capkun, “Transient-based identification of wirelesssensor nodes,” in Proceedings of the 2009 International Conference onInformation Processing in Sensor Networks. IEEE Computer Society,2009, pp. 25–36.

[14] C. Trabelsi, O. Bilaniuk, Y. Zhang, D. Serdyuk, S. Subramanian, J. F.Santos, S. Mehri, N. Rostamzadeh, Y. Bengio, and C. J. Pal, “Deepcomplex networks,” arXiv preprint arXiv:1705.09792, 2017.

[15] M. Vanhoef, C. Matte, M. Cunche, L. S. Cardoso, and F. Piessens,“Why mac address randomization is not enough: An analysis of wi-finetwork discovery mechanisms,” in Proceedings of the 11th ACM onAsia Conference on Computer and Communications Security. ACM,2016, pp. 413–424.

[16] D. Rupprecht, K. Kohls, T. Holz, and C. Popper, “Breaking LTE onlayer two,” in IEEE Symposium on Security & Privacy (SP). IEEE,May 2019.

[17] A. Musa and J. Eriksson, “Tracking unmodified smartphones using wi-fi monitors,” in Proceedings of the 10th ACM conference on embeddednetwork sensor systems. ACM, 2012, pp. 281–294.

[18] F. Restuccia, S. D’Oro, A. Al-Shawabka, M. Belgiovine, L. Angioloni,S. Ioannidis, K. R. Chowdhury, and T. Melodia, “Deepradioid:Real-time channel-resilient optimization of deep learning-based radiofingerprinting algorithms,” CoRR, vol. abs/1904.07623, 2019. [Online].Available: http://arxiv.org/abs/1904.07623

[19] J. G. Proakis and M. Salehi, Fundamentals of communication systems.Pearson Education India, 2007.

[20] “National instruments - i/q data,” 2018, http://www.ni.com/tutorial/4805/en/.

[21] J. Allen, “Short term spectral analysis, synthesis, and modification bydiscrete fourier transform,” IEEE Transactions on Acoustics, Speech,and Signal Processing, vol. 25, no. 3, pp. 235–238, 1977.

[22] “Concepts for orthogonal frequency division modulation,”2019, rfmw.em.keysight.com/wireless/helpfiles/89600B/WebHelp/Subsystems/wlan-ofdm/content/ofdm basicprinciplesoverview.htm.

[23] “Federal aviation administration,” 2018, https://www.faa.gov/nextgen/programs/adsb/faq/.

[24] “Airplane tracking using ads-b signals,” 2019, https://www.mathworks.com/help/supportpkg/rtlsdrradio/examples/airplane-tracking-using-ads-b-signals.html.

[25] “Federal aviation administration,” 2018, https://www.faa.gov/nextgen/equipadsb/resources/faq/.

[26] A. Atanasov and R. Chenane, “Security vulnerabilities in next genera-tion air transportation system,” Master of Science in Computer Systemsand Networks, Chalmers University of Technology, 2013.

[27] R. Faragher, P. F. MacDoran, and M. B. Mathews, “Spoofing mitigation,robust collision avoidance, and opportunistic receiver localisation usinga new signal processing scheme for ads-b or ais,” in Proceedings of the27th International Technical Meeting of The Satellite Division of theInstitute of Navigation (ION GNSS+ 2014), 2014.

[28] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MITPress, 2016, http://www.deeplearningbook.org.

[29] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner et al., “Gradient-basedlearning applied to document recognition,” Proceedings of the IEEE,vol. 86, no. 11, pp. 2278–2324, 1998.

[30] R. Ranjan, V. M. Patel, and R. Chellappa, “Hyperface: A deep multi-task learning framework for face detection, landmark localization, poseestimation, and gender recognition,” IEEE Transactions on PatternAnalysis and Machine Intelligence, vol. 41, no. 1, pp. 121–135, 2019.

[31] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networksfor semantic segmentation,” in Proceedings of the IEEE conference oncomputer vision and pattern recognition, 2015, pp. 3431–3440.

[32] Y. Zhang, W. Chan, and N. Jaitly, “Very deep convolutional networks forend-to-end speech recognition,” in 2017 IEEE International Conferenceon Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2017,pp. 4845–4849.

[33] Z. Zhang, S. Zohren, and S. Roberts, “Deeplob: Deep convolutional

http://doi.acm.org/10.1145/3029806.3029820

https://www.ndss-symposium.org/ndss2017/ndss-2017-programme/cross-browser-fingerprinting-os-and-hardware-level-features/

https://www.ndss-symposium.org/ndss2017/ndss-2017-programme/cross-browser-fingerprinting-os-and-hardware-level-features/

http://arxiv.org/abs/1904.07623

http://www.ni.com/tutorial/4805/en/

http://www.ni.com/tutorial/4805/en/

rfmw.em.keysight.com/wireless/helpfiles/89600B/WebHelp/Subsystems/wlan-ofdm/content/ofdm_basicprinciplesoverview.htm

rfmw.em.keysight.com/wireless/helpfiles/89600B/WebHelp/Subsystems/wlan-ofdm/content/ofdm_basicprinciplesoverview.htm

https://www.faa.gov/nextgen/programs/adsb/faq/

https://www.faa.gov/nextgen/programs/adsb/faq/

https://www.mathworks.com/help/supportpkg/rtlsdrradio/examples/airplane-tracking-using-ads-b-signals.html



https://www.faa.gov/nextgen/equipadsb/resources/faq/

https://www.faa.gov/nextgen/equipadsb/resources/faq/

http://www.deeplearningbook.org

neural networks for limit order books,” IEEE Transactions on SignalProcessing, vol. 67, no. 11, pp. 3001–3012, 2019.

[34] S. Xingjian, Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong, and W.-c.Woo, “Convolutional lstm network: A machine learning approach forprecipitation nowcasting,” in Advances in neural information processingsystems, 2015, pp. 802–810.

[35] A. Graves, A.-r. Mohamed, and G. Hinton, “Speech recognition withdeep recurrent neural networks,” in 2013 IEEE international conferenceon acoustics, speech and signal processing. IEEE, 2013, pp. 6645–6649.

[36] Z. Che, S. Purushotham, K. Cho, D. Sontag, and Y. Liu, “Recurrentneural networks for multivariate time series with missing values,”Scientific reports, vol. 8, no. 1, p. 6085, 2018.

[37] Z. Zhao, W. Chen, X. Wu, P. C. Chen, and J. Liu, “Lstm network: adeep learning approach for short-term traffic forecast,” IET IntelligentTransport Systems, vol. 11, no. 2, pp. 68–75, 2017.

[38] P. J. Werbos, “Generalization of backpropagation with application to arecurrent gas market model,” Neural networks, vol. 1, no. 4, pp. 339–356, 1988.

[39] F. A. Gers, J. Schmidhuber, and F. Cummins, “Learning to forget:Continual prediction with lstm,” 1999.

[40] M. Arjovsky, A. Shah, and Y. Bengio, “Unitary evolution recurrentneural networks,” in International Conference on Machine Learning,2016, pp. 1120–1128.

[41] N. Guberman, “On complex valued convolutional neural networks,”arXiv preprint arXiv:1602.09046, 2016.

[42] A. Hirose and S. Yoshida, “Generalization characteristics of complex-valued feedforward neural networks in relation to signal coherence,”IEEE Transactions on Neural Networks and learning systems, vol. 23,no. 4, pp. 541–551, 2012.

[43] X. Glorot and Y. Bengio, “Understanding the difficulty of trainingdeep feedforward neural networks,” in Proceedings of the thirteenthinternational conference on artificial intelligence and statistics, 2010,pp. 249–256.

[44] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers:Surpassing human-level performance on imagenet classification,” inProceedings of the IEEE international conference on computer vision,2015, pp. 1026–1034.

[45] “Plotneuralnet: Latex code for making neural networks diagrams,” 2018,https://github.com/HarisIqbal88/PlotNeuralNet.

[46] M. Wolter and A. Yao, “Gated complex recurrent neural networks,”arXiv preprint arXiv:1806.08267, 2018.

[47] R. Pascanu, T. Mikolov, and Y. Bengio, “On the difficulty of trainingrecurrent neural networks,” in International conference on machinelearning, 2013, pp. 1310–1318.

[48] S. Butterworth, “On the theory of filter amplifiers,” Wireless Engineer,vol. 7, no. 6, pp. 536–541, 1930.

[49] J. Ding, B. Chen, H. Liu, and M. Huang, “Convolutional neural networkwith data augmentation for sar target recognition,” IEEE Geoscienceand remote sensing letters, vol. 13, no. 3, pp. 364–368, 2016.

[50] J.-J. Lv, X.-H. Shao, J.-S. Huang, X.-D. Zhou, and X. Zhou, “Dataaugmentation for face recognition,” Neurocomputing, vol. 230, pp. 184–196, 2017.

[51] H. Salehinejad, S. Valaee, T. Dowdell, and J. Barfett, “Image aug-mentation using radial transform for training deep neural networks,” in2018 IEEE International Conference on Acoustics, Speech and SignalProcessing (ICASSP). IEEE, 2018, pp. 3016–3020.

[52] H. Nyquist, “Certain topics in telegraph transmission theory,” Transac-tions of the American Institute of Electrical Engineers, vol. 47, no. 2,pp. 617–644, 1928.

[53] L. Prechelt, “Early stopping-but when?” in Neural Networks: Tricks ofthe trade. Springer, 1998, pp. 55–69.

[54] C. M. Bishop, Pattern recognition and machine learning. springer,2006.

[55] I. Goodfellow, Y. Bengio, and A. Courville, Deep learning. MIT press,2016.

[56] C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals, “Understand-ing deep learning requires rethinking generalization,” arXiv preprintarXiv:1611.03530, 2016.

[57] J. Snoek, H. Larochelle, and R. P. Adams, “Practical bayesian optimiza-tion of machine learning algorithms,” in Advances in neural informationprocessing systems, 2012.

[58] Z. Yang, Z. Dai, R. Salakhutdinov, and W. W. Cohen, “Breaking thesoftmax bottleneck: A high-rank rnn language model,” arXiv preprintarXiv:1711.03953, 2017.

[59] T. Hupperich, D. Maiorca, M. Kuhrer, T. Holz, and G. Giacinto, “Onthe robustness of mobile device fingerprinting: Can mobile users escapemodern web-tracking mechanisms?” in Proceedings of the 31st AnnualComputer Security Applications Conference. ACM, 2015, pp. 191–200.

[60] S. Englehardt and A. Narayanan, “Online tracking: A 1-million-sitemeasurement and analysis,” in Proceedings of the 2016 ACM SIGSACConference on Computer and Communications Security. ACM, 2016,pp. 1388–1401.

[61] C. Arackaparambil, S. Bratus, A. Shubina, and D. Kotz, “On thereliability of wireless fingerprinting using clock skews,” in Proceedingsof the third ACM conference on Wireless network security. ACM,2010, pp. 169–174.

[62] S. Jana and S. K. Kasera, “On fast and accurate detection of unautho-rized wireless access points using clock skews,” IEEE transactions onMobile Computing, vol. 9, no. 3, pp. 449–462, 2009.

[63] M. Miettinen, S. Marchal, I. Hafeez, N. Asokan, A.-R. Sadeghi, andS. Tarkoma, “Iot sentinel: Automated device-type identification for se-curity enforcement in iot,” in 2017 IEEE 37th International Conferenceon Distributed Computing Systems (ICDCS). IEEE, 2017, pp. 2177–2184.

[64] S. V. Radhakrishnan, A. S. Uluagac, and R. Beyah, “Gtid: A techniquefor physical deviceanddevice type fingerprinting,” IEEE Transactionson Dependable and Secure Computing, vol. 12, no. 5, pp. 519–532,2014.

[65] R. R. R. Barbosa, R. Sadre, and A. Pras, “Flow whitelisting in scadanetworks,” International journal of critical infrastructure protection,vol. 6, no. 3-4, pp. 150–158, 2013.

[66] T. D. Vo-Huu, T. D. Vo-Huu, and G. Noubir, “Fingerprinting wi-fidevices using software defined radios,” in Proceedings of the 9thACM Conference on Security & Privacy in Wireless and MobileNetworks, ser. WiSec ’16. New York, NY, USA: ACM, 2016, pp.3–14. [Online]. Available: http://doi.acm.org/10.1145/2939918.2939936

[67] A. Das, N. Borisov, and M. Caesar, “Do you hear what i hear?:Fingerprinting smart devices through embedded acoustic components,”in Proceedings of the 2014 ACM SIGSAC Conference on Computer andCommunications Security. ACM, 2014, pp. 441–452.

[68] H. Ye, G. Y. Li, and B.-H. Juang, “Power of deep learning for channelestimation and signal detection in ofdm systems,” IEEE WirelessCommunications Letters, vol. 7, no. 1, pp. 114–117, 2018.

[69] T. O’Shea and J. Hoydis, “An introduction to deep learning for thephysical layer,” IEEE Transactions on Cognitive Communications andNetworking, vol. 3, no. 4, pp. 563–575, 2017.

[70] S. Rajendran, W. Meert, D. Giustiniano, V. Lenders, and S. Pollin,“Deep learning models for wireless signal classification with distributedlow-cost spectrum sensors,” IEEE Transactions on Cognitive Commu-nications and Networking.

[71] B. Kroon, S. Bergin, I. O. Kennedy, and G. O. Zamora, “Steady staterf fingerprinting for identity verification: one class classifier versuscustomized ensemble,” in Irish Conference on Artificial Intelligence andCognitive Science. Springer, 2009, pp. 198–206.

[72] K. I. Talbot, P. R. Duley, and M. H. Hyatt, “Specific emitter identifica-tion and verification,” Technology Review, vol. 113, 2003.

[73] M. J. Riezenman, “Cellular security: better, but foes still lurk,” IEEESpectrum, vol. 37, no. 6, pp. 39–42, 2000.

[74] P. Scanlon, I. O. Kennedy, and Y. Liu, “Feature extraction approachesto rf fingerprinting for device identification in femtocells,” Bell LabsTechnical Journal, vol. 15, no. 3, pp. 141–151, 2010.

[75] I. Bisio, C. Garibotto, F. Lavagetto, A. Sciarrone, and S. Zappatore,“Unauthorized amateur uav detection based on wifi statistical fingerprintanalysis,” IEEE Communications Magazine, vol. 56, no. 4, pp. 106–111,2018.

[76] Y.-C. Hu, A. Perrig, and D. B. Johnson, “Packet leashes: A defenseagainst wormhole attacks in wireless ad hoc networks,” in Proceedingsof INFOCOM, vol. 2003, 2003.

https://github.com/HarisIqbal88/PlotNeuralNet

http://doi.acm.org/10.1145/2939918.2939936

[77] R. W. Klein, M. A. Temple, and M. J. Mendenhall, “Application ofwavelet-based rf fingerprinting to enhance wireless network security,”Journal of Communications and Networks, vol. 11, no. 6, pp. 544–555,2009.

[78] B. Danev, H. Luecken, S. Capkun, and K. El Defrawy, “Attackson physical-layer identification,” in Proceedings of the third ACMconference on Wireless network security. ACM, 2010, pp. 89–98.

[79] S. Kanai, Y. Fujiwara, Y. Yamanaka, and S. Adachi, “Sigsoftmax: Re-analysis of the softmax bottleneck,” in Advances in Neural InformationProcessing Systems, 2018.

[80] J. Bromley, I. Guyon, Y. LeCun, E. Sackinger, and R. Shah, “Signatureverification using a” siamese” time delay neural network,” in Advancesin neural information processing systems, 1994, pp. 737–744.

[81] A. Binder, G. Montavon, S. Bach, K. Muller, and W. Samek,“Layer-wise relevance propagation for neural networks with localrenormalization layers,” CoRR, vol. abs/1604.00825, 2016. [Online].Available: http://arxiv.org/abs/1604.00825

[82] P. Werbos, “Beyond regression:” new tools for prediction and analysisin the behavioral sciences,” Ph. D. dissertation, Harvard University,1974.

[83] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,”arXiv preprint arXiv:1412.6980, 2014.

[84] M. D. Zeiler, “Adadelta: an adaptive learning rate method,” arXivpreprint arXiv:1212.5701, 2012.

[85] “Sigmf specification,” 2017, https://github.com/gnuradio/SigMF/blob/master/sigmf-spec.md.

[86] A. N. Tikhonov, A. Goncharsky, V. Stepanov, and A. G. Yagola,Numerical methods for the solution of ill-posed problems. SpringerScience & Business Media, 2013, vol. 328.

[87] “Commpy: Digital communication with python,” 2015, https://github.com/veeresht/CommPy.

APPENDIX

A. Background

1) General training and backpropagation.: In the generalcase, we use a subset of available data as training samples thatthe networks attempt to best describe by adjusting their pa-rameters accordingly. This adjustment step is usually done bypassing a number of samples trough the network, acquiring theoutput and combining it with a function that is usually referredto as a loss function or cost module. This loss function outputsa value, usually scalar, that is a metric representing the erroror distance of the network’s output relative to the ground truth.Driving the loss towards zero is a desirable goal, as it tends tolead to a better performing network; this is done by adjustingthe parameters of the network in such a way that if we passedthe same sample we used to obtain the loss for this iteration thenew output’s loss will be smaller. This is generally performedwith a method known as backpropagation (proven to be ableto train deep networks [82]), which essentially uses the chaindifferentiation rule to discover the contribution of each ofthe network’s parameters to the computed loss. The algorithmstarts computing the expressions from the final layers of thenetwork (closest to the output), towards the start, computingthe error contributions of each layer’s representation based onthe previous layer’s representation of the input. The generalalgorithm employed for the actual adjustment step, i.e, themagnitude and direction of change, is given by a gradientdescent-based algorithm such as Stochastic Gradient Descent(SGD) or variants like Adam [83] and AdaDelta [84].

2) SigMF: storing I/Q data.: Storing and sharing RF datahad been a long standing problem. Following the DARPA RFinitiative, the SigMF specification was established in 2017 to

tackle this issue and allow for a flexible yet well-specifiedstandard for sharing RF data in an efficient manner. The rawI/Q data is specified in an interleaved form in a binary filewhere the precise location of each signal in the data file isdenoted in an accompanying metadata file in a JSON format.The metadata files also contain a host of additional informationsuch as capture details, sampling rate and may optionally beextended to support new custom headers. All the datasets usedin this work utilize the SigMF [85] specification.

3) Quadrature Modulation: Figure 11 depicts a schematicof an ideal quadrature modulator. The output signal is com-posed of a linear combination of signals in-phase or “I”component and the quadrature or “Q” component. Besides itsapparent simplicity, quadrature modulators can synthesize anyreal signal.

I

Q

Re{X}

Im{X}

C{X}

Acos(ωc+Φ)

Asin(ωc+Φ)

Fig. 11: Quadrature Modulation Overview and I/Q data. Aninverse procedure is followed on the receiver side in order toobtain the I/Q pair from the modulated signal.

4) Inverse Square Root of Matrix V: To guarantee theexistence of the inverse square root we leverage the Tikhonovregularization [86] that is performed by: ε I + V. Meansubtraction and multiplication by the inverse square root of thevariance guarantees a standard complex distribution with meanµ = 0, covariance Γ = 1 and relation C = 0. As in the realcase batch normalization procedure, two learnable parametersare used: the shift parameter β and scale parameter γ. Thesetranslate and scale a layer’s input along new learned principalcomponents to achieve the desired variance.

B. Data Augmentation using Rotation, Scaling and Noise

While our experiments using decimation revealed that itcan have a positive impact, we found that for our particulartask these transformations actually have a negative impact onour system’s performance. Below we provide more details forthese augmentations.

1) Rotation and scaling.: Similarly to their 2D counterparts(images), signals can be rotated (phase shift) and scaled(magnitude scaling). This lead to an expansion of the amountof available data by introducing reversible transformations.

We induce rotation by adding a phase shift and scalingby altering the magnitude of each complex data point. Tocalculate the phase and magnitude we convert each point topolar form, perform the necessary calculations, and recover theI/Q components using Euler’s equations. Formulae 2 describe

http://arxiv.org/abs/1604.00825

https://github.com/gnuradio/SigMF/blob/master/sigmf-spec.md

https://github.com/gnuradio/SigMF/blob/master/sigmf-spec.md

https://github.com/veeresht/CommPy

https://github.com/veeresht/CommPy

the procedure for rotation, scaling follows the same principle.

Original Signal = I +Qj

Polar Form = ρeφ, ρ =√I2 +Q2, φ = arctan (I,Q)

Rotation(θ) = ρeιφeιθ = ρeι(φ+θ)

I ′ = ρ cos (φ+ θ), Q′ = ρ sin (φ+ θ)j,

Rotated Signal = I ′ +Q′j

(2)

2) Adding noise - channel effects.: Perhaps the most promi-nent problem regarding RF data is the effect the channelhas to captured signals under different channel conditions. Asshown in Figure 1 signals from the same device captured overdifferent days can differ drastically. The fundamental way thesignal mutates under different channel conditions introducesa mismatch between the data used for training and signalscaptured for testing at a later time when the ambient channelconditions deviate significantly between them. We evaluatedtwo methodologies and their efficacy in improving the model’sgeneralization in regards with being able to classify signalsfrom a day not part of the training set: (i) Augmenting thedataset by introducing samples from different days in thetraining data (ii) Augmenting captured signals synthetically byintroducing Rician or Rayleigh fading using a SISO channelmodel, based on [87].

C. Hyperparameters

The hyperparameters used for each architecture are ob-tained using Spearmint [57], thus allowing 500 runs for each(complex) architecture. The input data used in Spearmint foroptimizing the hyperparameters is a balanced dataset that Itis important to note that even though using Spearmint andits underlying Bayesian optimization engine requires less triesthan conventional searches such as grid search, we were stillunable to exhaustively explore all options as that would requirea prohibitive amount of time and used the best values after500 tests. We included the following hyperparameters in theSpearmint exploration for each architecture:

CDCN: Learning rate, kernel/stride/padding of the convo-lutional layer, the size of the output dense layer.

RDCN: Learning rate, weight decay, the timestep (µs) usedby the sequencer to split the signal,the size of the hidden layerof the LSTM, the number of layers in the LSTM unit andwhether it should be bidirectional or not.

For both architectures the Spearmint search configurationsalso include the factor of decimation factor M and explorewhether the following layers should be part of the architectureor not: (i) batch normalization for the input layer,(ii) batchnormalization for each layer independently. For the decimationfactor our exploration range lies between two and four, givenour sampling rate factors above 2 violate Nyquist criterion, andthe original signal (for the WiFi category) can no longer befaithfully reconstructed. Since our goal in this work is rather tofingerprint the devices and not to reconstruct the original signalwe evaluated the impact of using higher decimation factors,even if that would violate Nyquist’s theorem. We identified thatindeed the best results where obtained with a decimation factorof 2 (which obeys the theorem). As a final note, while thenon-conforming (M > 2) decimation factors produced worst

results they also led to less computations and less parameterswhich in some applications might be a wanted outcome.

Optimized hyperparameters. Our exploration revealedthe following hyperparameters and network configurations:CDCN: lr=1e-4, weight decay=9e-5, kernel size=32, stride=1,padding=1, dense layer=4096.

RDCN lr=1e-4, weight decay=9e-5, single layer unidirec-tional LSTM with 1024 hidden units and sequencer time stepset to 1µ leading to 64 vectors of 100 data points for thebaseline setting.

For both networks the batch normalization layers werefound to be of critical importance and the best performingactivation unit was found to be CReLU. We believe that sinceour channels are mixed channels containing information thatis not entirely real or imaginary, the effect of ZReLU thatconstricts activations to the first quadrant is over-constrainingthe network, whereas the outcome of CReLU that allowsactivation on positive values for both channels shows betterresults.

It is important to note that these hyperparameters are opti-mized for our design goals, i.e., operating on raw I/Q data andwith minimal pre-processing being performed. Alternate datarepresentations or extra pre-processing steps would potentiallyrender these hyperparameters invalid and a re-exploration usingSpearmint (or a conventional, albeit slower, grid search) toobtain the optimal parameters would be required.

deep complex networks for protocol-agnostic radio ...applicability in practical and large scale...

Documents