time of flight estimation in the presence of outliers part...

> REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) <

1

Abstract— When the Signal-to-Noise Ratio (SNR) falls below

a certain level, the error of the Time-of-Flight (ToF) Maximum

Likelihood Estimator (MLE) increases abruptly due to the well-

known threshold effect. Nevertheless, operating near and below

the threshold SNR value might be necessary for many remote

sensing applications due to power-related constraints. These

constrains may include a limit on the maximum power of a single

source pulse or a limit on the total power used by multiple signals

transmitted during a single measurement.

For narrowband signals, the threshold effect emerges mostly

due to outliers induced by local maxima of the autocorrelation

function of a source signal. Following the previously explored

path of biosonar-inspired echo processing, this paper introduces

a new method for ToF estimation in the presence of outliers. The

proposed method employs a bank of phase-shifted unmatched

filters for generating multiple biased but only partially correlated

estimators (multiple experts). Using machine learning techniques,

the information from the multiple experts is combined together

for improving the near-the-threshold ToF estimation from a

single echo. .

Index Terms—Sonar, Threshold Effect, Biosonar, Time-of-

Flight Estimation

I. INTRODUCTION

he classical Time of Flight (ToF) estimation method

employs a matched filter[20] for the estimation of the two-way

travel time. Since the returned signal is usually corrupted by

an additive noise, the resulting estimator produces a value

which is only an approximation for the true two-way travel

time. The average magnitude of the squared estimation error

depends on the amount of additive noise and a shape of the

source waveform [13]. For moderate levels of noise and for

source pulses with sufficiently wide bandwidth, the Mean

Square Error (MSE) of the classical estimator closely follows

Manuscript received Mat 1, 2012. This work was supported in part by the U.S. Army Research Office

A. Apartsin is with the Blavatnik School of Computer Science, Tel -Aviv

University, Israel(phone: +972-3-6408040;fax: +972-3-6409373;e-mail: [email protected])

L.N. Cooper is with Institute For Brain and Neural Systems and Physics

Department, Brown Univeristy, Providence, USA (e-mail:

[email protected])

N. Intrator is with the Blavatnik School of Computer Science, Tel -Aviv

University, Israel(e-mail: [email protected])

the Cramer-Rao Low Bound (CRLB) which is the best

achievable performance for any “good” estimator [10]. In the

theory of optimal receivers, these conditions correspond to a

receiver being in the coherent state [22].

However, in many practical cases, active sonar or other

remote sensing devices are employed under conditions that no

longer ensure a coherent reception and estimation. An

application might impose power and bandwidth constrains on

the source pulses generated by a transmitter. For instance, the

maximum power of an individual pulse could be restricted by

the limitations of electronic or acoustic equipment employed

at the transmitter. For military applications, it is sometimes

important to keep the source power low to avoid detection by

the adversary. The low accuracy of an individual estimations

could be somehow compensated by averaging (fusing)

multiple independent measurements. However, the usage of

multiple source pulses for a single estimation might be limited

by constraints imposed on the total measurement time (or,

equivalently, on a number of measurements) and by constrains

on the total employed power. The later is crucial for

measurements that are done by a mobile robotic sensor with

limited autonomous power source (a battery) [8]. Moreover,

there are also might be constraints on a shape of the

transmitted source signal. For instance, underground

exploration by low–powered seismic pulses employs low-

bandwidth source waveforms since the high frequency

harmonics are attenuated rapidly in the ground [7].

The classical approach based on the application of matched

filter and on a simple averaging of multiple measurements is

not adequate for these constrained cases. It has been shown

that, if the level of noise rises above a certain level, the mean

squared error of the conventional ToF estimator abruptly

diverges from CRLB producing the well-known threshold

effect [25]. Moreover, the threshold effect intensifies when a

source pulse waveform has low bandwidth spectrum as it is

sometimes the case in the underground explorations. The

classical approach does not provide an efficient way for

taming the Mean Square Error (MSE) of near-the-threshold

ToF estimation and it does not address the energy-related

constraints in a proper way. Therefore, the alternative

approach is required for improving the efficiency of ToF

estimation that is carried out in the presence of the threshold

effect and within the imposed power limitations.

In [2] [27], an alternative approach for ToF estimation is

described. The ToF estimation is carried out using a version of

the Kalman filter. The shape of the pulse envelope is modeled

using a parametric model and the parameters of the model and

the ToF are jointly estimated “on-the-fly” by an iterative or

recursive procedure. The method described in this paper does

Time of Flight Estimation in the Presence of

Outliers Part I-Singe Echo Processing

Alexander Apartsin, Leon N. Cooper, and Nathan Intrator

T


2

not assume any parametric model for the pulse shape or its

envelope. Using the machine learning approach, the estimator

is accommodated for any specific pulse waveform by training

from the simulated data.

It appears that some of the ideas for improving the

accuracy of near-the-threshold ToF estimation could be

borrowed from nature. Many echolocating animals use

mechanical waves (e.g. airborne ultrasound or underground

infrasound) for communications and navigations. Naturally,

biological systems have many constrains similar to those

described above and yet, the echolocating animals

demonstrate striking ability for the efficient extraction of

information from the very noisy signals [12]. Moreover, a

biosonar processing system employs individual elements (e.g.

cells) with insufficient time-frequency sensitivity for reliably

estimating the signal parameters directly. However, organized

in massively parallel sensory and computational networks,

these elements can function together with astonishing

accuracy [15].

A biologically inspired approach to the ToF estimation tries

to extract additional information from the received signal by

employing a bank of filters instead of a single matched filter

as in the classical theory. The partial information obtained by

the application of individual filters is combined together using

robust fusion methods in order to produce an estimator that

outperforms the classical matched filter.

There is a significant body of research focusing on

analyzing and mimicking biosonar systems. In [5], a filter

bank of 22 combined band pass and low pass filters was

inspired by the echolocating characteristics of big brown bat.

The information from these filters was combined using three

fusion techniques including average-like and voting-like

methods. This work has shown that in high noise settings,

accuracy comparable to that of the classical approach could be

achieved by intelligently fusing information from multiple

bandwidth-constrained filters. In [19], the two-glint resolution

of constrained sonar was analyzed using similar approach that

included a filter bank of 81 low-bandwidth filters followed by

a form of template matching. The resolution achieved by this

approach was far beyond the resolution of individual

processing elements (filters).

In this work, we continue with a biosonar-inspired approach

by analyzing the threshold effect for simple narrowband

source signals. Identification of the major source of the rapid

degradation of accuracy caused by the threshold effect

motivates the design of a family of full bandwidth phase-

shifted filters. These filters combined together provide more

information compared to the matched filter alone. This extra

information can be used for mitigating the threshold effect

and, thus, for improvement in the accuracy of a single

measurement. We employ machine learning techniques for

extracting the information from the vector of responses

generated by the application of these multiple filters. We

construct a classifier to assign a label to a measurement based

on the vector of filters’ responses computed on the retuned

signal. Using this assigned label, the classical estimate

(obtained using the matched filter) is corrected to account for

the expected bias that appears due to the threshold effect.

In our previous work, we used computer simulations to

demonstrate the effectiveness of phase-shifted unmatched

filters for constructing a SNR-dependant Time-of-Arrival

(ToA) estimator that improves estimation performance in the

presence of the threshold effect. In a recent paper [4], a simple

regression method was used for constructing a more robust

ToA estimator which does not require a priori knowledge of

the operational SNR value. We extend this proposed approach

by providing an analytical treatment of the phase-shifted

unmatched filters and by introducing a more robust fusion

technique which is based on weak outlier classifiers.

The work is organized as follows. In section II, the formal

framework for the two-way travel time estimation problem is

established and then this framework is used for analyzing the

major cause of the threshold effect for narrowband signals. In

section III, the phase-shifted unmatched filters and associated

biased estimators are introduced and discussed. The machine

learning-based technique for fusion of the information from

the biased estimates is outlined in section IV. In section V, we

report simulation results which demonstrate the effectiveness

of the described methods. The work is concluded by the

summary section which also discusses possible generalizations

of these methods and possible direction for future work.

II. TIME-OF-FLIGHT PROBLEM AND THE

THRESHOLD EFFECT

In this section, we provide an analytical treatment of the

threshold effect associated with Time of Flight (ToF)

Maximum Likelihood Estimator. We derive an approximation

for the probability of an outlier event using the fundamental

properties of a narrowband source signal, namely its

centralized bandwidth to central frequency ratio. In the next

section, we will extend the analysis for the biased estimator

and show that it is possible to devise a family of biased

estimators which are not completely correlated.

The Time of Flight estimation (ToF) problem also known

as “Time of Arrival” (ToA) estimation problem arises in the

context of Radar, Sonar and other remote sensing applications.

In discrete formulation, the problem could be stated as follows

[20]. An acoustic source emits a signal waveform ( )s t and,

then, the reflected signal contaminated by Additive White

Gaussian Noise (AWGN) ( )n t is picked up by a receiver

after a time delay 0t . Then, the received signal is given by:

0( ) ( ) ( ) r t c s t t n t (1)

Where c is an unknown factor that accounts for a signal loss

due to the attenuation in the medium. Since, in this work, we

consider the estimation under unknown SNR, without loss of

generality, we assume c=1 for the rest of the presentation. The

estimator that we describe below is insensitive to the average

power of the received signal and, thus, larger attenuations

correspond to a stronger noise or to a smaller SNR in our

model. In practice the Additive White Gaussian Noise

assumption does not always hold due to frequency-selective

attenuation and host of other media-dependent effects.

However, since our goal is to present an improvement for the

ToF estimation using low-frequency narrowband source

signals, the received signal is less affected by the frequency-

selective attenuation.


3

The goal of ToF estimation is to recover a value of

unknown time delay 0t based on the received signal ( )r t and

on the known source waveform ( )s t . We usually seek an

estimator 0̂ ( )t r that minimizes the MSE cost function. Since

noise samples ( )n t are i.i.d. random variables with a

probability density2(0, )N , the conditional probability

density function for received signal given the time delay is

|

22

2

Pr ( ( ) | )

12 exp ( ) ( )

2

r t

N

t

r t

r t s t

(2)

where N is the number of waveform samples recorded

during an observation interval. Assuming no prior knowledge

on values of time delay (uniform distribution) and using Bayes

rule, the a posteriori conditional probability density is given

by:

|

|

2

Pr ( | ) Pr( )Pr ( | )

Pr( )

1exp ( ) ( )

r t

t r

t

rr

r

K r t s t

(3)

where K is a normalization factor which accounts for parts

of the conditional density that do not depend on t . The

Minimum Mean Square Estimator (MMSE) is the conditional

expectation of a time delay given the observation:

|

ˆ [ | ] Pr ( | )MMSE t rt E r r (4) Since the evaluation of this sum involves computationally

expensive and numerically unstable steps, the Maximum

Likelihood Estimator (MLE) can be computed instead [20].

The MLE is obtained by maximizing log-likelihood function:

|

2 2

( ) log Pr ( | )

1 ( )( ) ( )

r t

t

l r

CK r t s t K

(5)

Again, the uncertainties about the attenuation constant are

encoded by the uncertainties in the noise power, i.e. unknown

SNR.

The maximization of the log-likelihood function is

equivalent to maximizing the cross-correlation ( )C (the

output of the matched filter) between the received signal and

the source signal [20]. Therefore, the Maximum Likelihood

Estimator of time delay is

ˆ ( )MLEt argmax C (6)

Furthermore, the cross-correlation could be expressed in

terms of the autocorrelation function of the source waveform

and an additive filtered noise:

0

0

( ) ( ) ( ) ( ) ( )

( ) ( )

C s t t s t n t s t

R t W

(7)

Since the filtered noise ( )W is zero-mean, the signal

autocorrelation ( )R is the expected value of the cross-

correlation ( )C when averaged over all possible noise

samples (the ensemble average likelihood function). As will

be discussed later in this section, the multimodal shape of this

ensemble-averaged log-likelihood function plays a critical role

in appearance of outliers when the noise level rises above a

certain threshold. The amount of noise in the received signal is

characterized by pre-filtered and post-filtered Signal-To-Noise

Ratio (SNR) defined by

2

2

2

2 2

( )

( )

( )

s tpre

n

Rpost

W t

s tPSNR

P N

RPSNR

P s t

(8)

Where , , ,s n R WP P P P are the power of the source signal,

the power of the noise, the power the autocorrelation of the

source signal and filtered noise respectively. When the SNR

level is high, the filtered noise ( )W t causes the MLE to be

near the global maximum of the autocorrelation function at

0t . When the SNR falls below a certain threshold level

threshSNR , the filtered noise may occasionally cause the

global maximum of the received signal to be detected far away

from the true value 0t .

For general autocorrelation function the “wrong” peak

could emerge almost anywhere outside the vicinity of the

global maximum of the autocorrelation function. But for a

special class of signals and, more generally, for a special class

of likelihood functions, the outliers will be clustered around

certain locations which are characterized by the local

maximum of the ensemble-average likelihood function. The

narrowband signals which are frequently used in sonar and

radar applications are good example for such class of signals

since the autocorrelation function of narrowband signals has

significant local maxima (side lobes).

Using discrete cosine representation, a DC-less narrowband

source signal could be represented by

1

( ) cos 2k k k

k

s t a f t

(9)

The total (finite) signal energy, the central frequency, and

mean square bandwidth of a signal in the above representation

are given respectively by

2 2 2

21 1

2 2 2

21

1,

1

k c k k

k k

k k

k

E a F a fE

B a fE

(10)

The centralized mean square bandwidth of a signal is given

by

22 2 2 2

21

1c k k c c

k

B a f F B FE

(11)

We assume that most of the signal energy is concentrated

around its central frequency or, equivalently, the signal’s


4

centralized bandwidth is much smaller than the signal central

frequency, namely 2

21 or

1 1 for all such that a 0

c

c

k c kk

c c

B

F

f F fk

F F

Under this assumption, the autocorrelation function of a

signal has local maxima (side lobes) located at a distance

1m

c

T mF

from the peak at 0t . For the rest of this

section we consider only the two most significant side lobes

corresponding to values of 1m although in our

simulations even more distant side lobes are taken into

account. The probability of outliers is the probability that a

local maximum of the signal autocorrelation function becomes

the global maximum of the cross-correlation function. Below

we approximate the probability of this event (an outlier event)

using a narrowband signal representation. Without loss of

generality, we assume 0ot as we are interested only in

relative positions of the autocorrelation maxima. We also

consider the signal and noise values only in the vicinity of the

global maximum and near a single highest side lobe of the

cross-correlation (two-point model). Then, using this model,

the outlier probability is

1Pr( ) Pr (0)

1 1Pr (0) 0

c

c c

outlier C CF

R R W WF F

(12)

Using Taylor expansion the left hand side of this inequality

is approximated by

2

0

2

0

2222 2 2

2 20

1(0) (1 cos 2

(1 cos 2

22

kk

kc c

k ck

k c

ck k c

kc c

fR R a

F F

f Fa

F

Ba f F E

F F

(13)

Denoting

22

2

c

c

BV

F , the difference between values of

autocorrelation function at its peaks could be expressed as2 2 22 E V . It is clear from the representation that the

difference decreases when the central frequency is larger (as

peaks are more closely spaced) or if the signal centralized

bandwidth is lowered, (the signal envelope becomes less

sharp). Asymptotically, if the signal bandwidth is made very

small (tends to zero), the resulting signal approaches a

sinusoid with zero differences between peaks height.

Now we turn our attention to the difference between

filtered noise samples. Filtered noise samples are linear

combinations of zero-mean normal variables and, therefore,

are distributed as zero mean normal random variable with

variance2 2 2

w E . The covariance between filtered noise

samples at distance 1

cF is

2

1 1(0, ) (0)

1

W

c c

c

Cov E W WF F

RF

(14)

Again, using Taylor expansion the value of the

autocorrelation function near the side lobe could be

approximated by:

2

0

2 2 2

1cos 2

1 2

k ck

kc c

f FR a

F F

E V

(15)

Therefore, the correlation coefficient between filtered noise

samples at neighboring maxima is

2 2 2 2

2 2

2 2

1 210, 1 2W

c

E VV

F E

(16)

The difference between two correlated normally distributed

zero-mean random variables is also zero-mean normal with

variance:

2 2 2 2 2 2 2

2 2 2 2

2 2 1 2

4

W E E V

E V

(17)

Therefore, the probability of the outlier becomes

Pr( ) Pr

2 2

Pr

R WR W

EV EV

E EZ V Q V

(18)

Where Z is a random variable with standard normal

distribution and Q is the tail probability of the standard normal

distribution. Therefore, the probability of outlier becomes

large when either signal-to-noise ratio E

or centralized

bandwidth to central frequency ratio V is small.

For an outlier measurement, there will be a bias of 1

cF

induced by left and right side lobes of the autocorrelation

function. Since the probabilities of “left” and “right” outlier

are equal, the overall bias due to outliers will disappear.

However, since the impact of an outlier to the overall error is

proportional to the bias squared, an outlier event will seriously

affect MSE of the estimation. The resulting MSE could be


5

decomposed into parts reflecting the impact from inlier and

outlier events separately:

2 2

2

Pr( )

(1 Pr( ))

outlier outlier

inlier

MSE outlier Variance Bias

outlier Variance

(19)

Assuming that impact to the MSE from the expected

additional variance due to outlier

2 2Pr( )

outlier inlieroutlier Variance Variance (20)

is small relative to the impact by the bias squared term, the

MSE is approximated by:

2 2Pr( )inlier outlierMSE Variance outlier Bias

(21)

Therefore, the cost of outlier for the overall MSE is

proportional to

2

c

EQ Z V

F

(22)

The analysis above is consistent with the well-known s-

shaped curve describing dependency between the MSE of

MLE and the Signal-to-Noise Ratio (SNR) [23]. For large

values of SNR, the probability of outlier is small and,

therefore, the MSE approaches the CRLB curve and the

system is said to be in the coherent-state. As soon as SNR

drops below a certain threshold value, the MSE increases

abruptly due to the presence of outliers and, accordingly, the

system is said to be in the semi-coherent state. For very low

SNR values, the sidelobes produced by the strong presence of

the central frequency in the signal spectrum do not play a

significant role in the location of outliers and the system is

said to be in non-coherent state.

In the analysis above, we have neglected the impact to the

MSE by the additional variance due to outliers. This is

justified by the fact that the probability of outlier and the

difference between inlier and outlier-induced variances are

inversely connected. For the probability of outliers to be

significant, the heights of the main lobe and the secondary

lobe need to be comparable, making the difference in outlier

and inlier variances small. Conversely, for the expected bias

component of the MSE, the bias square part is independent

from the probability of an outlier event. Nevertheless, in

forthcoming part II paper we address the variance parts of the

MSE by combining estimation from the multiple

measurements (pings).

For the rest of the paper, we are mainly concerned with the

semi-coherent area and the methods for improving the MSE

by reducing the effect of outliers on it. We conclude this

section by noting that the central frequency appears in both

nominator and denominator of the expression above.

Therefore, although the signals with higher central frequency

have greater probability for outlier, the impact on overall MSE

of each outlier event is smaller.

III. PHASE-SHIFTED UNMATCHED FILTERS

In this section, we introduce the Phase-Shifted Unmatched

Filters and the associated biased ToF estimators. We extend

the analysis from the previous section to show that although

each of the biased ToF estimators has larger estimation error,

the errors of these estimators are not completely correlated.

This property, which we denote by Semi-correlated Estimators

(SME) enables a fusing of multiple biased ToF estimators

into a single robust ToF estimator using machine learning

algorithms which are described in the next section

A Phase-Shifted Unmatched Filter is generated from the

source waveform ( )s t by shifting the phase of all of its

spectrum content by a same value . The phase-shifting

operation can be easily performed using the Hilbert Transform

[10] that could be computed numerically. Continuing with

previously used notation, the phase-shifted signal is

represented as

0

( ) cos 2k k k

k

s t a f t

(23)

The cross-correlation between the received signal and the

unmatched filter could be split into a sum of two terms as in

the previous section:

( ) ( ) ( )C t R t W t

(24)

The first term is the cross-correlation between the source

signal and the unmatched filter. The second term represents an

additive filtered Gaussian noise. The cross-correlation

between the source signal and unmatched filter is given by

2

0

( ) cos 2k k

k

R t a f t

(25)

The associated ToF estimator is obtained by locating the

global maximum in the unmatched filter output, that is

ˆ arg max ( )t C t

(26)

For small values of phase shift, the global maximum will be

located near the peak of autocorrelation function. Therefore,

the position of the global maximum of ( )R t could be

estimated by setting the time derivative to zero and using

Taylor expansion around zero phase and time

2

0

( )2 sin 2 0k k k

k

R ta f f t

t

(27)

2

0

2

0

2 2

sin 2

2

2 0

k k k

k

k k k

k

c

a f f t

a f f t

E B t F

(28)

and, therefore, the position of global maximum is

2 2 2

1 1

2 2 1

M c

c c c

Ft

F B F V

(29)

The value of the cross-correlation function at its global

maximum could be approximated using linear and quadratic

terms of the Taylor expansion


6

2

20

222

20

( ) cos 1

1 12

M ck k

k

ck k

k

FR t a f

B

Fa f

B

(30)

Expanding and substituting we obtain:

222

20

2 222 2

2 2

2 22

2

2 22

2

1 12

2 12

12 1

(0)2 1

ck k

k

c c

Fa f

B

F FE E

B B

VE

V

VR E

V

(31)

The height of the peak of the cross-correlation between

source signal and phase-shifted unmatched filter is lower

compared to the peak of the source signal autocorrelation.

Therefore, the unmatched filter does not maximize peak

signal-to-noise ratio as the matched filter does. The difference 2

2 2

21

VE

V

depends on a specific value of the phase shift.

Since the phase shift appears in quadratic form, the difference

is the same for positive and negative phase shift of the same

magnitude.

The heights of two closest sidelobes of cross-correlation

function which are located at distances 1

m

c

TF

from its

main peak could be estimated using Taylor expansion:

2

2

0

2

2

0

1

2cos 1

2 ( )cos 1

M

c

c k

k k

k c

c k c

k k

k c

R tF

F fa f

B F

F f Fa f

B F

(32)

2

2

2

0

1

2 ( )11 1

2

M

c

c k c

k k

k c

R tF

F f Fa f

B F

(33)

2 2 2

2

1

12

1 2

M

c

c

R tF

V ER

F V

(34)

Therefore, the difference between the heights of sidelobes

and the main peak is

22

2

2 2 2

2

1 1( )

12

1

12 1

1

M M

c c

c

R R t R tF F

VR E

F V

E VV

(35)

Thus, the difference between the heights of sidelobes

depends on their location relative to the main peak. For

positive phase shift, the difference between the main peak and

the right sidelobe is greater than difference between the main

peak and the left sidelobe (figure 1). Since the filtered noise

has the same autocorrelation function as in case of the

matched filter, the probability of a left outlier for t̂ is greater

than the probability of the right outlier. For instance, the

probability of the left outlier generated by the larger sidelobe

is therefore

2 2 2

2

2

2

1 1Pr ( ) Pr

12 1

1Pr

2

1Pr 1

1

11

1

left

c c

outlier W RF F

E VV

ZEV

EZ V

V

EQ V

V

(36)

It seems that the probability of at least one class of outliers

is greater for a single phase-shifted estimator. However, the

relation between probabilities of left and right outlier is

reversed if we consider the estimator generated by phase shift

with the same magnitude but different sign. Therefore, the

expected bias of two estimators with opposite phase shift is

canceled out and their average constitutes an unbiased

estimator for ToF.

ˆ ˆˆ

2

t tt

(37)

We have reported the improvement achieved by this simple

average in [2]. The gain in performance obtained by such

estimator depends on the SNR level and on a selected phase


7

shift value. The employed value of phase shift defines the

tradeoff in MSE between the inlier variance and the expected

square of outlier bias. The optimal value of a phase shift could

be calibrated to match the underlying SNR level and the

source signal parameters [3]. Unfortunately, this method

requires the knowledge of the operating SNR level which is

not always available in practice. This requirement could be

avoided by combining information from several pairs of

symmetrically phase-shifted unmatched filters. Since each

phase-shifted unmatched filter perturbates the noise signal in a

slightly different way, each estimator derived from a different

phase-shifted unmatched filter contains some additional

information on the interrogated signal. For instance, for two

different values of phase shift, the covariance between noise

samples at global maximum is

2

2

2

2 2

2 2

2

( ( ) ( ))

( )

1 1( )

2 1

12 1

M M

w

M M

c

Cov W t W t

R t t

RF V

VE

V

(38)

And, therefore, the correlation coefficient (assuming small

value of phase shift)

2 2

, 21 1

2 1

M V

V

(39)

Thus, for low SNR (presence of outliers) the phase shift of

the source waveform generates a family of unmatched filters,

from which a family of biased semi-correlated estimators can

be constructed. This family of SMEs can be fused into a single

estimator with improved estimation accuracy.

Since the correlation between outlier events is decreasing

as the differences between phase shift values is increasing, it

seems that, fusing a pair of estimators corresponding to a

greater difference between corresponding phase shift values

should make sense. However, as the underlying unmatched

filters increasingly diverge from the shape of the matched

filter, the error of the individual biased estimators increases.

Therefore, since the probability of outliers increases with a

decrease in the SNR, the estimators with larger phase shift

differences should be assigned a greater weight in the SMEs

fusion as the correlation between them becomes more

important. This suggests that the fusion of the SMEs depends

on the SNR.

In addition, since often an unbiased estimator is required, a

fusion method should somehow reduce the combined bias of

multiple input estimators to produce an unbiased output

estimate.

In [4] we have introduced a method for fusion of multiple

biased estimators using regression. The Decision Trees based

regression [1] was trained using a mix of samples for a range

of SNR values in the semi-coherent zone and, thus, the

combined estimate did not require a priori knowledge of the

operational SNR value. The method provided up to 10%

improvement in the estimation accuracy compared to the

conventional Maximum Likelihood Estimator.

In the next section, we introduce an alternative fusion

scheme that relies on phase-shifted unmatched filters for a

weak classification of ToF MLE outcomes into classes

corresponding to peaks of the signal autocorrelation function.

Using the labels produces by the classifier, the expected bias

introduced by an outlier measurement is reduced from the

MLE value, resulting in up to a 30% increase in the estimation

accuracy. The additional information generated by the

classifier could be used for devising an efficient method for

fusion of multiple independent ToF measurements in the

presence of outliers. This and other applications of the

described method will be addressed in forthcoming papers.

IV. MACHINE LEARNING FUSION OF BIASED

ESTIMATORS

This section describes a novel machine learning method

for improving the MSE of the Maximum-Likelihood ToF

estimator at the semi-coherent region. The method employs a

single weak classifier that is trained to label the MLE value

according to the side lobe of the autocorrelation function that

induced the peak in the output of the matched filter. The

classifier relies on a bank of phase-shifted unmatched filters to

generate an input feature vector for training and classification.

Based on the label produced by the classifier, an expected bias

due to outliers can be computed using the estimated prior

probabilities of different outlier types and the estimated

confusion matrix of the classifier. The resulting estimator is

obtained by subtracting the expected bias from the ToF MLE

value. Below we describe the method in a general form to

demonstrate its applicability to similar estimation problems.

Consider an estimator x̂ of a parameter x contaminated by

additive random noise : x̂ x . We model outliers by

assuming that the random noise is sampled form m different

distribution (similar to a mixture model). These noise

probability distributions are characterized by the vector of

means 1( ... )m and the vector 2 2 2

1( ... )m of

their variances. There is also a vector of prior probabilities

1

1

( ... ), 1m

m i

i

p p p p

such that the probability of a

noise sample to be selected from distribution i is ip . We

define true (unknown) labeling function ˆ( )N x . We say that

ˆ( )N x i if noise at measurement x̂ is generated using the

probability distribution with mean i and variance 2

i

The expected bias of the measurement is

0

1

m

i i

i

b p

. Therefore, we can compute the error

of unbiased estimator 0ˆ ˆ

ux x b as


8

2 2 2

0 0

2 2 2

1 1

ˆ( ) ( )u

m m

i i i i o

i i

e x x b

p p b

(40)

If the expected bias is zero as in the case of symmetric

outliers induced by an autocorrelation function, the last term

disappears and the total error becomes:

2 2 2

0

1 1

m m

i i i i

i i

e p p

(41)

Assume we can train a (weak) classifier that assigns labels

ˆ( ) 1...L x m to an estimate x̂ such that

,ˆ ˆPr( ( ) | ( ) ) i jL x i N x j g which is the probability for

a measurement to receive a label i by the classifier given that

the measurement’s noise is generated by the j-th distribution.

These conditional probabilities form the confusion matrix G of

the classifier: ,

, 1, 1{ }m m

i j i jG g .Than we can express

posteriori probabilities for a sample to be labeled as class i as:

1

,

1

ˆPr( ( ) )

ˆ ˆ ˆPr( ( ) | ( ) ) Pr( ( ) )

i

m

j

m

i j j

j

q L x i

L x i N x j N x j

g p

(42)

or in a matrix form q G p . Given this posteriori class

probabilities, the likelihood matrix,

, 1, 1{ }m m

i j i jC c becomes:

,

,

ˆ ˆPr( ( ) | ( ) )

ˆ ˆPr( ( ) | ( ) )

i j

i

j

j i i

j

c N x i L x j

L x j N x i p

q

g p

q

(43)

Based on this likelihood matrix, we can also compute the

conditional bias of a measurement given the classifier has

labeled it with class i .

ˆ( )ˆ( ) ,

1

ˆ( )L x i

m

i L x i j j i

j

b x x c

(44)

Therefore, we can reduce appropriate bias b(x) from the

estimate using additional information obtained from the

classifier using:

ˆ ˆ( ) if ( )ib x b L x i (45)

The expected value of this function is equal to the bias of

the original estimator as

1

,

1 1 1

ˆ( )m

i i

i

m m m

i j j i j j o

i j j

b x q b

q c p b

(46)

This allows constructing the modified unbiased estimator

ˆ ˆ ˆ( )mx x b x . The MSE of the new modified estimator is

2 2 2

2 2 2 2 2

1 1

2

2 2

1 1

ˆ ˆ( ) ( ( ))m m

m m

i i o o i i

i i

m m

o i i i i

i i

e x x b x

q b e b q b

e q b q b

(47)

The improvement of the MSE depends on the differences

between the last two terms which is always non-positive due

to Jensen’s inequality [14]

2

2

1 1

m m

i i i i

i i

q b q b

(48)

For the case of the symmetric outliers

1 1

0m m

o i i i i

i i

b q b p

and for a perfect classification

achieving i iq p , the bias square part of the MSE is

completely reduced, resulting in MSE containing only the

variance part

2 2

0

m

m i i

i

e p

(49)

To demonstrate the accuracy gain obtained by this

approach, we consider a simple example involving two

symmetric outliers (called left and right outliers) which occur

with prior probability 1

2Op each. Therefore, the vector of

the prior probabilities becomes:

, ,( , ) ( 1 2 , )L M R O I O Op p p p p p p p (50)

The outliers are symmetric, thus, the vector of means is

( ,0, ) ( ,0, )L R O O . The variances of outliers

are identical and are different from the variance of the inlier2 2 2 2( , , )O I O . Assume we have managed to train a

classifier which produces a constant misclassification rate on

all classes. That is

,

1ˆ ˆPr( ( ) | ( ) ) ,

2i jg L x i L x j i j and

,ˆ ˆPr( ( ) | ( ) ) 1 2i ig L x i L x i .

Under these settings, the bias 0b of the estimator x̂ is zero

and the error becomes

2 2 2 2

0 2 2I I O O O Oe p p p (51)

The vector of posterior class probabilities can be shown to

be

( , , ) ( , , )

( (1 3 ),

2 , (1 3 ))

L M R O I O

O O I

O O O

q q q q q q q

p p p

p p p

(52)


9

An finally, the conditional expected biases given the class

labels are

0Mb

( )O O

L R

L

pb b

q

(53)

Then the improvement in the MSE is equal to

2 22 2 2 2 2

1

2 22

2( )

(1 3 )2

(1 3 )

mO

o m i i

i O

O

O

pe e e q b

q

p

p

(54)

Again, for a perfect classifier with 0 , the difference

achieves its maximum at 2 22 Oe p and, therefore, it

completely cancels out the bias square part in the estimator

error. For random guess classifier with1

3 , this difference

become zero and no gain in the accuracy could be achieved.

The above algorithm requires parametric modeling of

outlier classes to obtain estimates on , and the prior

probabilities p . These parameters could be obtained from

labeled training data by various methods (e.g. EM algorithm).

The same training data could be used to train multiclass

classifier to assign labels according to outlier classes. In

addition, the classifier performance needs to be estimated and

summoned in a confusion matrix which might be done using

the “evaluation” data set.

In the next section, we describe the training and estimation

process along with simulations that are used for evaluating the

method. The simulations show that the method achieves a

significant improvement (up to 30%) in the accuracy of the

semi-coherent ToF estimation.

We conclude this section by noting that although the

described method tries to reduce the bias square part of the

error introduced by outliers, it does not attempt to account for

higher uncertainty which is usually associated with outlier

measurement. The inclusion of the variance of the

measurement as a measure of its uncertainty requires fusion of

information from multiple independent measurements which

is considered in the forthcoming paper.

V. THE ESTIMATION PROCESS AND RESULTS

The simulation process (Figure 2) consists of two phases.

During the preprocessing phase (blocks P1 through P4 in the

diagram), a large data set of randomly generated samples is

created and, then, different portions of this data set are used

for training the classifier and for the evaluation of the

classifier performance. During the estimation phase (blocks

E1 through E4 in the diagram), the ToF estimation for a new

simulated sample is computed using the classifier’s

predictions.

We start with the description of the preprocessing phase.

About 10000 data samples each representing a received signal

are simulated by adding random White Gaussian Noise

(WGN) to the selected narrowband signal (P1 at the diagram

and eq. 1). The noise is produced using ten post-filtered SNR

values corresponding to the semi-coherent range (3-8dB). The

samples of all SNR values are mixed together for all further

processing. A feature vector for each sample is extracted using

a bank of n phase shifted filters (similar to the feature

extraction step performed during the estimation phase). The

filters in the bank are generated using original source signal

waveform (eq.23). For an odd n, the values of n phase values

are selected from the interval :a a according to

12 where 1,...,

1k

ka a k n

n

(55)

Thus, k n k and1

2

0n , meaning that the middle

element in the feature vector corresponds to the Maximum

Likelihood Estimator. An n-valued feature vector

( ,..., )i n

t t t is formed by computing the locations of the

global maximum in n cross-correlations:

arg max ( )i i

t C t (56)

where ( )i

C t is the cross-correlation of thi filter with a

simulated sample (eq.24). The value returned by the

Maximum Likelihood Estimator (which is obtained using the

zero phase-shifted filter as in eq. 6) is subtracted from all

elements of t . Therefore, a corrected feature vector is

computed as

1

1 1 1

2 2

1 1 3

2 2

( ,..., )

( ,..., )

( ,..., ,0, ,..., )

n

n n n

n n n

c c c

c c c c

t t t

t t t t

t t t t

(57)

The subtraction of the MLE value is necessary as we are

interested in identifying an outlier by its locations relatively to

the location of the global maximum of the autocorrelation

function. Consequently, n-valued feature vectors always will

have zero as their middle element and therefore they actually

represent total of n-1 non-trivial features.

The MLE values are also used for computing the samples’

“true” labels according to the distance between the MLE and

the true value of time-delay. One out of m possible class labels

is assigned to a sample according to its distance to one out of

m selected peaks of autocorrelation function. Then, about a

half of the labeled data is used to train the decision trees

classifier (block P3 in the diagram) which accept n-valued

feature vector (eq. 57) and produces one out of m possible

class labels. The second half of the data is used for evaluating

the performance of the classifier, that is for computing the

confusion matrix and the posteriori class probabilities (eq. 42,

block P4 in the diagram).


10

The results presented in this paper are obtained using values

of 2

a

and 11,7,3n . The number of classes - m is set

to five with four outlier classes corresponding to four largest

sidelobes of the autocorrelation function (two at each side of

the main lobe) and the inlier class corresponding to the main

lobe of the autocorrelation function.

After the preprocessing phase, the ToF estimation for a new

sample proceeds in straightforward manner. The feature vector

for a sample is extracted by applying n phase-shifted filters

and computing corresponding biased estimates (block E1 in

the diagram). After the feature vector is corrected by

subtracting the MLE value (eq. 57), the vector is fed into the

classifier which produces one out of m possible class labels

(block E3 in the diagram). The expected bias is then computed

as described in the previous section using the classifier’s

confusion matrix and posteriori class probabilities (eq. 44).

The computed expected bias is then subtracted from the MLE

resulting in a bias-corrected ToF estimate.

For comparison purposes, the simpler fusion of individual

biased estimates is computed using a median statistics as

1

{ ,..., }nmediant median t t (57)

The resulting Root Mean Square Error (RMSE) normalized

by the signal central frequency is shown in the figure 3. The

filter bank of n=11 filters is used in this experiment. From the

graph, it is clear that the resulting error is significantly reduced

(up to 30%) by correcting the MLE estimator using the

expected bias (provided the estimation is carried-out in the

semi-coherent area). The Cramer-Rao Lower Bound (CRLB)

is shown at this figure at red. As expected, the ToF estimators

deviate rapidly from the CRLB as SNR approaches the

threshold value.

Figure 4 shows RMSE curves obtained using different

sizes of filter banks. Generally, the more filters are employed,

the more delicate interrogation of the signal is possible. Since

samples with all SNR levels are mixed together for training, a

finer analysis of the signal results in a larger generalization

error. This is illustrated in Figure 4 by the n=11 line having

better performance over the n=7 line in the region of higher

SNR levels. The relation is reversed as more “coarse” n=7

case gives smaller generalization error for lower SNR values.

The effect of the source signal bandwidth on the estimation

accuracy could be seen at Figure 5. In accordance with the

analysis in the section II (eq. 22), the probability of outliers

decreases when the fractional bandwidth is increased.

Therefore, the overall error and the improvement achieved by

the described method are smaller for estimation performed

using a source pulse with lower bandwidth.

The increase in the pulse’s central frequency leads to a

sharper main lobe and, thus, to an improvement in estimation

accuracy for inlier events. However, the sidelobes are also

becoming higher which cause an increase in the probability of

the outliers. Therefore, an improvement in the estimation

accuracy for low SNRs is still achievable by bias reduction as

could be seen at the Figure 6.

VI. SUMMARY AND CONCLUSIONS

In this paper, we have introduced a method for improving a

near-threshold ToF estimation. Since the threshold effect in

this problem emerges due to the multimodal shape of the

ensemble-average likelihood function, it is possible to reduce

the threshold effect through usage of a weak classifier. After

the classifier has been trained, its output could be used for

computing and, subsequently, subtracting from the MLE the

expected bias occurring due to these outliers. The second

contribution of this work introduced phase shifted unmatched

filters as a means to create a collection of biased estimators.

These estimators were used to generate a feature vector that

characterizes the maxima of likelihood function. The

simulation results showed the combined effect of these two

approaches on near-threshold MSE of Time of Flight MLE.

The problem of semi-coherent ToF estimation emerges

when there are various power and spectrum constraints

imposed by the application on the source signal. In particular,

the described methods improves the accuracy ToF estimation

carried out using pulses with low central frequency and of a

very limited bandwidth.

In the planned sequel papers, we address an additional

constraint which limits the total power spent on multiple

pulses and the associated problem of improving efficiency of

measurement per unit of the invested power.

FIGURES

Figure 1: Two symmetrically phase shifted unmatched filters and the source signal (top); the time-aligned cross correlation between the source signal and

unmatched filters (bottom)


11

Figure 2: The simulation process consists of the preprocessing phase (P1-

P4) and the estimation phase (E1-E4). During the preprocessing phase the

training and the evaluation data sets are generated (P1) and, then, are labeled

by appropriate inlier/outlier class labels (P2). The training data set is used for

training a weak classifier (P3) and the evaluation data set is used for

estimating the classifier’s confusion matrix (P4). During the estimation phase,

the feature vector is extracted from the sample (E1) and fed into the classifier (E3). The predicted class label is used for computing the expected bias and

subtracting it (E4) from the previously computed Maximum Likelihood

Estimator (E2).

Figure 3: The RMSE of the ToF MLE (blue), the bias-corrected estimator

(green) and the Cramer-Rao Low Bound for the ToF estimation. The bias-corrected ToF estimator improves the classical MLE by about 30%

Figure 4: The RMSE obtained using filter banks of different sizes. The n=11

(red) gives better performance at higher SNR as more delicate interrogation of

the signal required. On the other hand, it results in larger generalization error compared to more course estimator for n=7 (green)

Figure 5 : The RMSE curves for different values of fractional bandwidth. For

higher fractional bandwidth and the same central frequency the probability of

outliers is lower, thus, the overall error and the improvement achieved by bias reduction are smaller

Figure 6: The RMSE curves for different values of the central frequency but

the same bandwidth. As the frequency increases, the main lobe becomes sharper, leading to an increase in the estimation accuracy. However, the

probability of outliers is also increases due to higher sidelobes. Therefore, a

significant improvement is still achievable by bias reduction.

References

[1] E. Alpayadin, “Introduction to Machine Learning”, MIT Press

2004

[2] L. Angrisani, A.Baccigalupi et al. “A Measurement Method Based on Kalman Filtering for Ultrasonic Time-of-Flight Estimation”

IEEE Transactions on Instrumentation and Measurements, vol.

55(2), 2006 [3] A. Apartsin, LN. Cooper, N. Intrator. “Biosonar-inspired source

localization in low SNR,” In Proc. International Conference on

Bio-Inspired Systems and Signal Processing (BIOSIGNALS), 2011, pp. 399-404.

[4] A. Apartsin, LN. Cooper, N. Intrator. “SNR-dependant filtering for

Time of Arrival Estimation in High Noise,” in Proc. IEEE Machine Learning for Signal Processing (MLSP), 2010, pp. 427-

431.

[5] A. Apartsin, LN. Cooper, N. Intrator.“Semi-coherent Time of Arrival Estimation Using Regression.”The Journal of the Acoustic

Society of America, vol. 132(2), 2012, pp. 832-837. [6] E.W. Barankin. “Locally best unbiased estimates.” The Annals of

Mathematical Statistics, vol. 20, 1949, pp. 477-501.

[7] C. Chapman. Fundamentals of Seismic Wave Propagation. New York: Cambridge University Press, 2004.

[8] A. Elfes. “Sonar-based real-world mapping and navigation.” IEEE

Journal of Robotics and Automation, vol. 3(3), 1987, pp. 249-265. [9] B.M Sadler, R.J. Kozick. “A Survey of Time Delay Estimation

Performance Bounds,” in Fourth IEEE Workshop on Sensor Array

and Multichannel Processing, 2006, pp. 282-288. [10] S. M. Kay. Fundamental of Statistical Signal Processing. New

Jersey: Prentice Hall, 1993, 595 pages.


12

[11] S. S. Keerthi, C.J. Lin. “Asymptotic behaviors of support vector

machines with Gaussian kernel.” Neural Computation, vol.15 (7), 2003, pp. 1667-1689.

[12] T. Kimchi, M. Reshef, J. Terkel. ”Evidence for the use of reflected

self-generated seismic waves for spatial orientation in a blind subterranean mammal.” Journal of Experimental Biology, vol.

208(4), 2005, pp. 647-659.

[13] R.N. McDonough, A. D. Whalen. Detection of Signals in Noise. San Diego: Academic Press, 1995,495 pages.

[14] N. Neretti, N. Intrator, and LN. Cooper. “Adaptive pulse

optimization for improved sonar range accuracy.” IEEE Signal Processing Letters, vol. 11(4), 2004, pp. 409-412.

[15] N. Neretti, M.I. Sanderson, J.A Simmons, and N. Intrator. “Time-

frequency computational model for echo-delay resolution in sonar images of the big brown bat, Eptesicus fuscus.” Journal of the

Acoustic Society of America, vol. 113(4), 2003, pp. 2137-2145.

[16] D.R. Pauluzzi, N.C. Beaulieu. “A comparison of SNR estimation techniques for the AWGN channel.” IEEE Transaction on

Communications, vol. 48(10), 2000, pp. 1681-1691.

[17] A. Pinkus, J. Tabrikian.“Barankin bound for range and Doppler estimation using orthogonal signal transmission,” in Proc. of IEEE

Radar Conference, 2006, pp. 94–99.

[18] M. I. Sanderson, N. Neretti, N. Intrator, and J. A. Simmons. "Evaluation of an auditory model for echo delay accuracy in

wideband biosonar." The Journal of the Acoustical Society of

America, vol. 114, 2003, pp. 1648-1649. [19] J.A. Simmons and R. A. Stein. “Acoustic imaging in bat sonar:

Echolocation signals and the evolution of echolocation.” Journal of Comparative Physiology, vol. 135(1), 1980, pp. 61-68.

[20] M.I. Skolnik. Introduction to Radar System. New York: McGraw-

Hill, 1962, 772 pages. [21] GP. Succi, G. Prado, R. Gampert, and TK. Pedersen. “Problems in

Seismic detection and Tracking,” in Proc. of SPIE 2000, vol. 4040,

pp. 165-173. [22] Harry L. Van Trees. Detection, Estimation and Modulation

Theory. New York: John Wiley & Sons, Inc., 2004, 720 pages.

[23] L. R. Varshney and D. Thomas. “Side lobe reduction for matched filter range processing,” in Proc. of IEEE Radar Conference, 2003,

pp. 446-451.

[24] P.D. Wasserman. Advanced Methods in Neural Computing. New

York: Van Nostrand Reinhold, 1993, pp. 155–161.

[25] P. Woodward. Probability and Information Theory, with

Applications to Radar. New-York: McGraw-Hill, 1953,140 pages [26] L. Yu, N. Neretti, N. Intrator. “Multiple ping sonar accuracy

improvement using robust motion estimation and ping fusion.” The

Journal of the Acoustic Society of America, vol.119, 2006, pp. 2106-2113.

[27]

[28] ZJ. Yao, QH. Meng, M. Zeng, “Improvement in the accuracy of estimating the time-of-flight in an ultrasonic ranging system using

multiple square-root unscented Kalman filters”, Review of

Scientific Instruments,81.10 (2010)

Alexander Apartsin is a Ph.D. student at the Blavatnik School of Computer Science at Tel-Aviv University. His research focuses on

problems which require synthesis of machine learning and signal

processing techniques. Before joining Tel-Aviv University, Alexander Apartsin was at Weizmann Institute of Science where

he had received his M.Sc. degree for studying multiscale

segmentation algorithms.

Leon N Cooper is presently the Thomas J. Watson, Sr. Professor

of Science and Director of the Institute for Brain and Neural Systems at Brown University, Rhode Island, USA. He was a

Member of the Institute for Advanced Study and a professor at

various universities in the USA and Europe. He was a co-recipient of the 1972 Nobel Prize in Physics. His research interests include

nuclear physics, quantum field theory, superconductivity,

theoretical neuroscience, synaptic modification, and the biological basis of learning and memory storage, neural networks,

foundations of the quantum theory, algorithms for comparison of

gene data sets and tests of the linear-no-threshold hypothesis for low-level radiation damage.

Nathan Intrator received his PhD in applied mathematics from

Brown University in 1991. He is a Professor of Computer Science at Tel Aviv University and an adjunct Professor at the Institute for

Brain and Neural Systems at Brown University. Professor

Intrator’s research focuses on statistical and machine learning methods for signal processing and data fusion from multiple

experts. He has been studying biosonar extensively and the

connection between machine learning expert fusion and multiple decision fusion in biosonar. His earlier work included object

representation and recognition and dimensionality reduction via

exploratory projection pursuit methods. He developed various methods for estimating predictor confidence which can be easily

used for sensor fusion as well as classification and detection based

on multiscale representations. He currently applies his methods in various biomedical signal processing applications.

time of flight estimation in the presence of outliers part...

Documents