abstract arxiv:1612.00876v1 [cs.sd] 2 dec 2016 · response power and subspace methods, it does not...

FRIDA: FRI-BASED DOA ESTIMATION FOR ARBITRARY ARRAY LAYOUTS

Hanjie Pan†, Robin Scheibler†, Eric Bezzam†, Ivan Dokmanic‡, and Martin Vetterli†

†Ecole Polytechnique Federale de Lausanne (EPFL), Switzerland‡Institut Langevin, CNRS, ESPCI Paris, PSL Research University, France

{firstname.lastname}@epfl.ch, [email protected]

ABSTRACT

In this paper we present FRIDA—an algorithm for estimating di-rections of arrival of multiple wideband sound sources. FRIDAcombines multi-band information coherently and achieves state-of-the-art resolution at extremely low signal-to-noise ratios. Itworks for arbitrary array layouts, but unlike the various steeredresponse power and subspace methods, it does not require a gridsearch. FRIDA leverages recent advances in sampling signalswith a finite rate of innovation. It is based on the insight that forany array layout, the entries of the spatial covariance matrix canbe linearly transformed into a uniformly sampled sum of sinu-soids.

Index Terms— Direction of arrival, finite rate of innovation,subspace method, search-free, wideband sources

1. INTRODUCTION

A wishlist for a direction of arrival (DOA) estimator may looksomething like this: it should be high-resolution, work at lowsignal-to-noise ratios (SNRs), resolve many possibly closelyspaced sources, work with few arbitrarily laid out microphones,and do so efficiently, without grid searches.

It is uncommon to have all of these items checked at once.For example, the steered response power (SRP) methods [1] canbe made robust, do not require a specific array geometry, andare immune to coherence in signals. Because they are based onbeamforming though, they cannot resolve close sources [2].

Close sources can be resolved by the high-resolution DOAfinders. Their main representatives are subspace methods suchas MUSIC [3], Prony-type methods such as root-MUSIC [4], andmethods that attempt to compute the maximum likelihood (ML)estimator such as IQML [5].

Subspace methods exploit the fact that for uncorrelated sig-nal and noise, the eigenspace of the spatial covariance matrixcorresponding to largest eigenvalues is spanned by the sourcesteering vectors [3]. These methods are fundamentally narrow-band since the signal subspaces vary with frequency; they canbe made wideband either by incoherently combining narrowband

This work was supported by the Swiss National Science Foun-dation grant 20FP-1 151073 — Inverse problems regularized by spar-sity. ID was funded by LABEX WIFI (Laboratory of Excellencewithin the French Program “Investments for the Future”) under ref-erences ANR-10-LABX-24 and ANR-10-IDEX-0001-02 PSL* and byAgence Nationale de la Recherche under reference ANR-13-JS09-0001-01. All the code used to produce the results of this paper is available athttp://go.epfl.ch/FRIDA.

estimates or, better, by combining them coherently through trans-forming the array manifold at each frequency to a manifold at areference frequency (CSSM [6], WAVES [7]). These methodsrequire a search over space unless the array is a uniform lineararray (ULA) [8]. Coherent methods also require special “focus-ing matrices”, essentially initial guesses of the source locations.WAVES can do without focusing but at the cost of performance.In between coherent and incoherent methods is the TOPS algo-rithm [9], which performs well at mid-SNRs, but still requires asearch and performs worse than coherent methods at low SNRs.

We propose a new finite rate of innovation (FRI) sampling-based algorithm for DOA finding—FRIDA. Among the men-tioned algorithms, FRIDA is most reminiscent of IQML [5],especially for narrowband signals and ULAs. Unlike IQML,FRIDA works for arbitrary sensor geometries and for widebandsignals. Moreover, it uses multi-band information coherently.Still, it requires no grid search and no sensitive preprocessingakin to focusing matrices, and it achieves very high resolution atvery low SNRs, outperforming previous state-of-the-art.

FRIDA can work with fewer microphones than sources as ituses cross-correlations instead of raw microphone streams. Thetradeoff is that it is not able to handle completely correlated sig-nals. A straightforward modification of the algorithm which op-erates on raw signals rather than cross-correlations does not havethis issue, but it requires more microphones.

The main ingredient of FRIDA is an FRI sampling algorithm[10]. FRI sampling has recently been extended to non-uniformgrids along with a robust reconstruction algorithm [11]. The al-gorithm is an iterative algorithm similar to IQML, but with anadded spectral resampling layer and a modified stopping criterion(Section 2.2.3). The key insight is that the elements of the spa-tial correlation matrix can be linearly transformed into uniformlysampled sums of sinusoids, regardless of the array geometry.

2. NOTATION AND PROBLEM FORMULATION

Throughout the paper, matrices and vectors are denoted by boldupper and lower case letters. The Euclidean norm of a vector xis denoted by ‖x‖2 = (xHx)

1/2. We denote by S the unit circle.Unit propagation vectors will be denoted by p.

2.1. Source Signal and Measurements

2.1.1. Sources with Arbitrary Spatial Support

We assume a setup with Q microphones located at {rq ∈R2}Qq=1, and K monochromatic and uncorrelated point sources

1

arX

iv:1

612.

0087

6v1

[cs

.SD

] 2

Dec

201

6

http://go.epfl.ch/FRIDA

in the far-field indexed by the letter k. Each propagates in the di-rection of the unit vector pk ∈ S. Within a narrow band centeredat frequency ω, the baseband representation of the signal comingfrom direction p ∈ S reads x(p, ω, t) = x(p, ω)ejωt,wherex(p, ω) is the emitted sound signal by a source located at p andfrequency ω. The intensity of the sound field is then

I(p, ω)def= E

[|x(p, ω, t)|2

]= E

[|x(p, ω)|2

].

We assume frame-based processing, and the expectation is overthe randomness of x from frame to frame. As x carries the phase,the assumption E [x] = 0 holds. Note that at this point we didnot yet make the point source assumption.

The received signal at the q-th microphone located at rq isthe integration of all plane waves along the unit circle:

yq(ω, t) =

∫Sx(p, ω, t)e−jω〈p,

rqc 〉 dp for q = 1, · · · , Q,

where c is the speed of sound. In this paper, we will take as mea-surements the cross-correlations1 between the received signalsfor a microphone pair (q, q′):

Vq,q′(ω)def= E

[yq(ω, t)y

∗q′(ω, t)

](1)

for q, q′ ∈ [1, Q] and q 6= q′. In practice, Vq,q′ is estimated byaveraging over frames; for simplicity we use the same symbolfor the empirical version.

If we assume that these sources are spatially uncorrelated,then the cross-correlation reduces to:

Vq,q′ (ω)=

∫S

∫SE[x(p, t)x∗(p′, t)

]e−jω

⟨p,

rqc

⟩ejω

⟨p′,

rq′c

⟩dpdp′

=

∫SI(p, ω)e

−jω⟨p,∆rq,q′

⟩dp, (2)

where ∆rq,q′def=

rq−rq′c

is the normalized baseline.

2.1.2. Point Sources

Assume now that our source distribution is a sum of spatiallylocalized sources. We can write

x(p, ω) =

K∑k=1

αk(ω)√γφ(γ(p− pk)

)(3)

where φ(p) is a localized function with finite energy,∫φ2 dp =

1, and γ is the spatial scaling factor. To ensure that the k-thsource has finite power equal to σ2

k, we ask that∫E[|αk|2γφ2(γp)

]dp = σ2

k yielding E[|αk|2

]= σ2

k. Ifwe now denote the intensity corresponding to (3) by Iγ(p, ω),then in the point source limit we get

limγ→0

Iγ(p, ω) =: I(p, ω) =

K∑k=1

σ2k(ω)δ(p− pk),

1 Alternatively, we can estimate the DOA directly from the receivedmicrophone signals yq(ω, t). We leave the detailed discussions for afuture publication.

where σ2k(ω) = E

[|x(pk, ω)|2

], pk = (cosϕk, sinϕk)T, and

ϕk is the azimuth of the k-th source.With the above established, we rewrite the microphone

cross-correlations (2) as

Vq,q′(ω) =

K∑k=1

σ2k(ω)e−jω〈pk,∆rq,q′〉

for q 6= q′. Instead of the more conventional approach to FRIsampling where the microphone signals yq would be used as in-put [12], we use as input the correlations Vq,q′ . This effectivelyincreases the number of measurements and allows us to use asmall number of microphones.

2.2. Point Source Reconstruction

Following the generalized FRI sampling framework [11], we willfirst identify the set of unknown sinusoidal samples and its rela-tion with the given measurements (1). Then, the DOA estimationis cast as a constrained optimization (see e.g., (6)).

2.2.1. Relation between Measurements and the Uniform Sam-ples of Sinusoids

Since p is supported on the circle, we have the following Fourierseries representation for intensity:

I(p, ω) =∑m∈Z

Im(ω)Ym(p),

where Ym(p) is the Fourier series basis Ym(p) = Ym(ϕ) =

ejmϕ, and Im(ω) is the associated expansion coefficient for asub-band centered at frequency ω:

Im(ω) =1

2π

∫SI(p, ω)Y ∗m(p) dp=

1

2π

K∑k=1

σ2k(ω)e

−jmϕk. (4)

Notice that the Fourier series coefficients Im(ω) for m ∈ Z areuniform samples of sinusoids, which are related with the cross-correlation (2) as:

Vq,q′(ω) =

∫S

∑m∈Z

Im(ω)Ym(p)e−jω〈p,∆rq,q′〉 dp

(a)= 2π

∑m∈Z

(−j)mJm(‖ω∆rq,q′‖2)Ym

(∆rq,q′

‖∆rq,q′‖2

)Im (5)

where (a) is from Jacobi-Anger expansion [13] of the complexexponential and Jm(·) is Bessel function of the first kind.

Therefore, we establish a linear mapping from the uniformlysampled sinusoids Im to the given measurements Vq,q′ . Con-cretely, denote a lexicographically ordered vectorization of thecross-correlations Vq,q′(ω), q 6= q′ by a(ω) ∈ CQ(Q−1), andlet the vector b(ω) be the Fourier series coefficients Im(ω) form ∈ M, whereM is a set of considered Fourier coefficients2.Define also a Q(Q− 1)× |M| matrix G(ω) as

g(q,q′),m(ω)def= (−j)mJm(‖ω∆rq,q′‖2)Ym

(∆rq,q′

‖∆rq,q′‖2

),

2Note that these correspond to the spatial Fourier transform of I overthe circle, not to sources’ temporal spectra.

2

where rows of G are indexed by microphone pairs (q, q′), andcolumns of G are indexed by Fourier bins m. We can then con-cisely write (5) as a(ω) = G(ω)b(ω).

2.2.2. Annihilation on the Circle

Since Im in (4) is a weighted sum of uniformly sampled sinu-soids, we know that Im should satisfy a set of annihilation equa-tions [10]: Im ∗hm = 0. Here hm is the unknown annihilat-ing filters to be recovered. A polynomial, whose coefficients arespecified by the filter hm, has roots located at e−jϕk [10]. Thesource azimuths ϕk are subsequently reconstructed with polyno-mial root-finding.

In a multi-band setting, the uniform sinusoidal samplesIm(ω) are different for each sub-band. This is because, the sig-nal power σ2

k varies with the mid-band frequency ω in general.However, since we have the same source locations ϕk for eachsub-band, we only need to find one filter hm (depending solelyon the source locations ϕk) that annihilates Im(ω) for all ω-s:

Im(ω) ∗mhm = 0 ∀ω.

2.2.3. Reconstruction Algorithm

Following the discussion in the previous section, we reconstructthe source locations jointly across all sub-bands. More specif-ically, suppose we consider J sub-bands centered around fre-quencies {ωj}Jj=1. Then, we formulate the FRIDA estimate as asolution of the following constrained optimization:

minb1,··· ,bJ

h∈H

J∑i=1

∥∥ai −Gibi∥∥2

2

subject to bi ∗ h = 0 for i = 1, · · · , J,

(6)

Here ai, bi and Gi are the cross-correlation, uniform sinusoidalsamples, and the linear mapping between them for the i-th sub-band as specified in Section 2.2.1; H is a feasible set that theannihilating filter coefficients belong to, e.g., ‖h‖22 = 1.

Note that (6) is a simple quadratic minimization with respectto bi-s for a given annihilating filter h. By substituting the solu-tion of bi (in function of h), we end up with an optimization forh alone:

minh∈H

hHΛ(h)h, (7)

where

Λ(h)=

J∑i=1

TH(βi)[R(h)

(GHiGi

)−1RH(h)

]−1

T(βi).

Here βi = (GHiGi)

−1GHi ai; T(·) builds a Toeplitz matrix from

the input vector; and R(·) is the right-dual matrix associatedwith T(·) such that T(b)h = R(h)b, ∀b,h. This follows fromthe commutativity of convolution: b ∗ h = h ∗ b.

In general, it is challenging to solve (7) directly. We use an it-erative strategy, building Λ(h) with the reconstructed h from theprevious iteration. However, unlike similar approaches (e.g. [5]),we do not aim at obtaining a convergent solution of (7) but rathera valid solution such that the reconstructed sinusoidal samplesbi-s explain the given measurements up to a certain approxima-tion level (ε2):

∑Ji=1 ‖ai −Gibi‖22 ≤ ε2. Readers are referred

30 20 10 0 10SNR [dB]

0

20

40

60

80

Aver

age

Erro

r [◦]

A AlgorithmFRIDAMUSICSRP-PHATCSSMTOPSWAVES

3 4 6 8 11 16 22 32 45 64 90Separation angle [ ◦ ]

0.5

1.0

1.5

2.0

# so

urce

s res

olved

B

AlgorithmFRIDAMUSICSRP-PHATCSSMTOPSWAVES

Fig. 1. A Average DOA reconstruction error as a function of SNR. Loweris better. B Average number of sources reconstructed for the case of twosources separated by a fixed angle.

to [11] for detailed discussions on the algorithmic details, e.g.,choice of ε, implementation details, etc.

3. EXPERIMENTS

In this section, we demonstrate the effectiveness of the proposedalgorithm through numerical simulations and practical experi-ments. We compare the performance of FRIDA to that of otherwideband algorithms: incoherent MUSIC [3], SRP-PHAT [2],CSSM [6], WAVES [7], and TOPS [9].

The sampling frequency is fixed at 16 kHz. The narrow-band sub-carriers are extracted by a 256-point short-time Fouriertransform (STFT) with a Hanning window and no overlap. Weuse a triangular array of 24 microphones. Each edge is 30 cmlong and carries 8 microphones. The spacing of microphonesranges from 8 mm to 25 cm. This geometry is that of the Pyra-mic compact array designed at EPFL [14] and used to collect therecordings for the practical experiments, see Fig. 2A.

The number of frequency bands used (out of the 128 narrow-bands) is a key parameter for performance and was tuned foreach algorithm. FRIDA, MUSIC and SRP-PHAT use 20 bands,CSSM and WAVES 10 bands, and TOPS 60 bands. In the syn-thetic experiments, the source signals are all white noise to sim-plify the choice of the sub-bands. For speech recordings, theSTFT bins with the largest power are chosen. All implementa-tion details are in the supplementary material.

The reconstruction errors are quantified according to the dis-tance on the unit circle defined as

dS(ϕ, ϕ) = mins∈{±1}

s (ϕ− ϕ) mod 2π. (8)

For multiple DOA, the originals and their reconstructions arematched to minimize the sum of errors.(8)

3.1. Influence of Noise

We study the influence of noise on the algorithms through nu-merical simulation. One source playing white noise is placed atrandom on the unit circle. The propagation of sound is simulatedby applying fractional delay filters to generate the microphonesignals based on the array geometry. Finally, the algorithms arerun with additive white Gaussian noise of variance correspond-ing to a wide range of SNR. The algorithms are fed with 256snapshots of 256 samples each. It should be noted that 256 snap-shots correspond to a processing gain of about 24 dB. We run500 rounds of Monte-Carlo simulation for each SNR value.

3

0°5.5°

39.4°

20.1°

90.6°140.1°

Array

162°

195.7°

217.3° 269.2° 322.5°

A B

Set 2 Set 1+2Set 1

C D

Fig. 2. A Pyramic array, a compact microphone array with 48 MEMS microphones distributed on the edges of a tetrahedron. For the experiments, onlythe top triangle is used. B Locations of the loudspeakers and microphone array in experiments. C Reconstruction error for the different algorithmsapplied to the recorded speech signals. D Reconstruction of 10 sources from only 9 microphones. The average reconstruction error is within 2◦.

DOA FRIDA MUSIC SRP-PHAT

0◦ −0.5± 0.4◦ 1.6± 0.3◦ 1.4± 0.2◦

5.5◦ 4.6± 0.2◦ −93.9± 41.2◦ −38.1± 8.6◦

Table 1. The accuracy of the reconstruction for recordings with sourcesclosely located at 0◦ and 5.5◦. The mean is computed as the logarithmof the average of complex exponentials with argument given by the re-construction angle. The second number is the average distance (8) fromthe sample to the mean.

The simulation results in Fig. 1A show that FRIDA and MU-SIC are the most robust with a breaking points slightly below−20 dB. Next are SRP-PHAT and TOPS, breaking around 2 dBand 4 dB higher, respectively. While WAVES initially seems toperform as well as TOPS, it never reaches zero error. Least re-sistant to noise is CSSM, breaking down as early as −5 dB. Thepoor performance of WAVES and CSSM might be attributed topoor initial estimates of the focusing frequencies.

3.2. Resolving Close Sources

Next, we study the minimum angle of separation necessary toresolve distinct sources. We simulate two sources of white noiseat angles ϕ and ϕ + δ where δ is varied from 90◦ to 2.8◦. Theaverage error is then computed over ten realizations of the noisefor 120 values of ϕ. We mark a DOA as successfully recoveredif the reconstruction error is less than δ/2. This criterion seemscrude for large δ, but for small δ, where performance is critical,it is very stringent. Here again 256 snapshots are used and theSNR is set to 0 dB.

As seen in Fig. 1B, we find that FRIDA largely outperformsthe other algorithms. It always separates sources located as closeas 11.2◦, while the closest contenders, MUSIC and SRP-PHAT,have difficulties for sources closer than 22.5◦. The coherentmethods perform worse than the incoherent ones; they even suf-fer from a lack of precision in estimating a single source.

3.3. Experiments on Recorded Signals

Finally, we perform two experiments with recorded data to val-idate the algorithm in non-ideal, real-world conditions. In thefirst experiment, the Pyramic array is placed at the center of eightloudspeakers (Fig. 2B, Set 1). All the loudspeakers are between1.45 m and 2.45 m away from the array. Recordings are madewith all possible combinations of one, two, and three speakersplaying simultaneously (distinct) speech segments of 3 to 4 sec-

onds duration. Two of the speakers are located at 5.5◦ of eachother to test the resolving power of the algorithms.

The statistics of the reconstruction errors for the different al-gorithms are shown in Fig. 2C. We find the coherent methodsWAVES and CSSM to perform well for one and two sources, butbreak down for three sources. The TOPS method maintains anacceptable but somewhat imprecise performance for more thanone source. FRIDA, MUSIC and SRP-PHAT perform best witha median error within one degree from the ground truth. WhereFRIDA distinguishes itself from the conventional methods is forclosely spaced sources. This is highlighted in Table 1 wherethe average reconstructed DOA for the sources located at 0◦ and5.5◦ is shown. While all three methods correctly identify the firstsource, only FRIDA is able to resolve the second.

The second experiment tests the ability of FRIDA to resolvemore sources than microphones are used. We place ten loud-speakers (Fig. 2B, Set 2) around the Pyramic array and recordthem simultaneously playing white noise. Then, we discard thesignals of all but nine microphones and run FRIDA. The algo-rithm successfully reconstructs all DOA within 2◦ of the groundtruth, as shown in Fig. 2D. Note that none of the subspace meth-ods can achieve this result. While SRP-PHAT is not limited inthis way, its resolution is lower (its error is ∼ 4◦ on this record-ing).

4. CONCLUSION

We introduced FRIDA, a new algorithm for DOA estimation ofsound sources. FRIDA relies on finite rate of innovation sam-pling to do so efficiently on arbitrary array geometries, avoidingany costly grid search. Its ability to use wideband signal informa-tion makes it robust to many types of noise and interference. Wedemonstrate that FRIDA compares favorably to the state-of-the-art, and clearly outperforms all other algorithms when it comesto resolving close sources. Moreover, FRIDA is notable for re-solving more sources than microphones, as demonstrated exper-imentally on recorded signals. Besides the logical extension tothe full spherical case, we want to extend the algorithm to workon plane waves directly, rather than the cross-correlation coeffi-cients. This will allow the algorithm to handle correlated as wellas uncorrelated sources. Finally, it is important for practical pur-poses to improve the computational complexity of the algorithm.

Acknowledgement We are indebted to Juan Azcarreta Ortiz andRene Beuchat for their help with the Pyramic array.

4

5. REFERENCES

[1] J. Capon, “High-resolution frequency-wavenumber spec-trum analysis,” in Proceedings of the IEEE, 1969, pp.1408–1418.

[2] J. H. DiBiase, “A high-accuracy, low-latency techniquefor talker localization in reverberant environments usingmicrophone arrays,” Ph.D. dissertation, Brown University,Providence, RI, 2000.

[3] R. Schmidt, “Multiple emitter location and signal parame-ter estimation,” IEEE Transactions on Antennas and Prop-agation, vol. 34, no. 3, pp. 276–280, 1986.

[4] B. Friedlander, “The root-MUSIC algorithm for direc-tion finding with interpolated arrays,” Signal Processing,vol. 30, no. 1, pp. 15–29, 1993.

[5] Y. Bresler and A. Macovski, “Exact maximum likelihoodparameter estimation of superimposed exponential signalsin noise,” IEEE Transactions on Acoustics, Speech and Sig-nal Processing, vol. 34, no. 5, pp. 1081–1089, Oct. 1986.

[6] H. Wang and M. Kaveh, “Coherent signal-subspace pro-cessing for the detection and estimation of angles of ar-rival of multiple wide-band sources,” IEEE Transactions onAcoustics, Speech and Signal Processing, vol. 33, no. 4, pp.823–831, Aug. 1985.

[7] E. D. di Claudio and R. Parisi, “WAVES: weighted averageof signal subspaces for robust wideband direction finding,”IEEE Transactions on Signal Processing, vol. 49, no. 10,pp. 2179–2191, Oct. 2001.

[8] A. Barabell, “Improving the resolution performanceof eigenstructure-based direction-finding algorithms,” inIEEE International Conference on Acoustics, Speech, andSignal Processing (ICASSP’83), vol. 8. IEEE, 1983, pp.336–339.

[9] Y.-S. Yoon, L. M. Kaplan, and J. H. McClellan, “TOPS:new DOA estimator for wideband signals,” IEEE Transac-tions on Signal Processing, vol. 54, no. 6, pp. 1977–1989,May 2006.

[10] M. Vetterli, P. Marziliano, and T. Blu, “Sampling signalswith finite rate of innovation,” IEEE Transactions on SignalProcessing, vol. 50, no. 6, pp. 1417–1428, 2002.

[11] H. Pan, T. Blu, and M. Vetterli, “Towards generalised FRIsampling with an application to source resolution in ra-dioastronomy,” IEEE Transactions on Signal Processing,2016, submitted to.

[12] P. J. Hayuningtyas and P. Marziliano, “Finite rate of inno-vation method for DOA estimation of multiple sinusoidalsignals with unknown frequency components,” Radar Con-ference (EuRAD), pp. 115–118, 2012.

[13] D. Colton and R. Kress, Inverse acoustic and electromag-netic scattering theory. Springer Science & Business Me-dia, 2012, vol. 93.

[14] J. Azcarreta Ortiz, “Pyramic array: An FPGA based plat-form for many-channel audio acquisition,” Master’s thesis,EPFL, Lausanne, Switzerland, Aug. 2016.

5

abstract arxiv:1612.00876v1 [cs.sd] 2 dec 2016 · response power and subspace methods, it does not...

Documents