
J. G. Tylka and E. Y. Choueiri, "Domains of Practical Applicability for Parametric Interpolation Methods for Virtual Sound Field Navigation," J. Audio Eng. Soc., vol. 67, no. 11, pp. 882–893 (2019 November). DOI: https://doi.org/10.17743/jaes.2019.0038

Domains of Practical Applicability for Parametric Interpolation Methods for Virtual Sound Field Navigation

JOSEPH G. TYLKA, AES Student Member ([email protected]), AND EDGAR Y. CHOUEIRI, AES Associate Member

Princeton University, Princeton, NJ, USA

Suitable domains are established for the practical application of two state-of-the-art parametric interpolation methods for virtual navigation of ambisonics-encoded sound fields. Although several navigational methods have been developed, existing studies rarely include comparisons between methods and, significantly, practical assessments of such methods have been limited. To that end, the errors introduced by both methods are objectively evaluated, in terms of metrics for sound level, spectral coloration, source localization, and diffuseness, through numerical simulations. Various practical domains are subsequently identified, and guidelines are established with which to choose between these methods based on their intended application. Results show that the first method, which entails a time-frequency analysis of the sound field, is preferable for large-area recordings and when spatial localization accuracy is critical, as this method achieves superior localization performance (compared to the second method) with sparsely distributed microphones. However, the second method, which parametrically excludes from the interpolation any microphones that are farther from the listening position than is any source, is shown to be more suitable for applications in which sound quality attributes such as coloration and diffuseness are critical, since this method achieves smaller spectral errors with sparsely distributed microphones and smaller diffuseness errors under all conditions.

0 INTRODUCTION

Given an ambisonics-encoded sound field (i.e., a sound field that has been decomposed into spherical harmonics), virtual navigation enables a listener to explore the recorded space and, ideally, experience a spatially and tonally accurate perception of the sound field. One application of such a procedure is to reproduce (e.g., over headphones), from an arbitrary vantage point, an acoustically recorded scene. This would allow, for example, a listener to experience a recording of an orchestral performance from elsewhere in the original venue.

A well-known limitation of the ambisonics framework is that a finite-order expansion of a sound field yields only an approximation to that sound field, the accuracy of which decreases with increasing frequency and distance from the expansion center [1]. Consequently, the navigable region of such a sound field is inherently restricted. Indeed, existing techniques for virtual navigation using a single ambisonics microphone¹ have been shown to introduce spectral distortions [2] and degrade localization [3, 4] as the listener navigates farther away from the expansion center.

¹ Here, we use the term "ambisonics microphone" to refer to any array of microphone capsules (typically arranged on the surface of a sphere or tetrahedron) that captures ambisonics signals.

Furthermore, according to theory, the ambisonics expansion provides a mathematically valid description of the sound field only in the free field, thereby effectively creating a spherical region of validity (also known as the region of convergence), which is centered on the recording microphone and extends up to the nearest sound source (or scattering body) [5, Sec. 6.8]. Consequently, near-field sources may pose a significantly limiting problem to navigation, although the particular degradations in sound quality (e.g., in terms of spatial or tonal fidelity) that might result from violating this region of validity restriction are unclear.

In an effort to overcome these challenges, several authors have developed both parametric navigational methods, which leverage additional information about the sound field (e.g., known or inferred source positions) beyond the recorded ambisonics signals, and interpolation-based navigational methods, which employ an array of ambisonics microphones (of first-order or higher) distributed throughout the sound field. As it is outside the scope of this work to review all of these methods in detail, the interested reader is referred to Refs. [6–8] and the works cited below.


0.1 Previous Work and Remaining Problems

Several of the recently developed linear interpolation-based navigational methods have been evaluated experimentally and have shown promising results. Patricio et al. [9] proposed a modified linear interpolation method in which the directional components of the microphone nearest to the listener are emphasized over those of the farther microphones. In their study, the authors experimentally demonstrated that the proposed distance-biasing approach achieves plausible source localization and perception of listener movement.

Grosche et al. [10] recently developed a method that employs multiple distinct virtual loudspeaker arrays (VLAs), each corresponding to a different ambisonics (or other) recording microphone and reproducing the sound field as captured by that microphone. The superposition of the reproduced signals from all of the VLAs is then rendered for the listener at an arbitrary desired position in the virtual sound field. This method has been evaluated via localization tests [11], which showed a significant improvement in performance over a basic weighted-average interpolation approach, and Deppisch and Sontacchi [12] have developed a browser-based implementation of the method. Comparisons between these linear methods under comparable conditions, however, have not been conducted. As described subsequently, in the present article we comprehensively compare two parametric methods, the procedure for which may serve as a template for conducting similar comparisons in the future.

Several of the parametric methods that have been developed entail a directional analysis of the recorded signals (often carried out in the time-frequency domain) in order to construct a navigable model of the incident sound field [13–15]. While such methods might yield accurate source localization, some have been shown to introduce minor degradations of sound quality [14, Sec. 5.3] and they may not be suitable for dense or highly reverberant environments [13, Sec. II]. In a recent publication [16], we developed an alternative method that parametrically selects for the interpolation a suitable subset of microphones to ensure that the region of validity restriction for each included microphone is not violated. The differences in performance between these two parametric methods under the same conditions have not been established, so we seek guidance for choosing between them in various domains of practical application.

To that end, we evaluate and compare the performance of these two state-of-the-art parametric interpolation methods, referred to here as the time-frequency analysis (TFA) method [13] and the valid microphone interpolation (VMI) method [16]. First, in Section 1, we review the formulation of each of these methods. We then describe, in Section 2, the numerical simulations conducted in this study and the objective metrics used to evaluate the errors introduced by each method in terms of sound level, spectral coloration, source localization, and diffuseness. We then present and discuss in Section 3 the results of these simulations, from which we identify in Section 4 considerations regarding the suitability of each method to various applications. Relevant ambisonics theory is reviewed in Appendix A.

The present article is intended to complement the findings of our previous study [16], in which the VMI method is described in detail and fundamental aspects of the method are demonstrated through numerical proof-of-concept analyses. Compared to its original implementation [16], the VMI method is largely unchanged (with the exception of the two-band approach; see Section 1.2.2). However, compared to previous analyses, the present analyses are significantly more comprehensive, several additional performance metrics are employed, and the comparative analysis of the two parametric methods is entirely new.

1 REVIEW OF NAVIGATIONAL METHODS

As is common in ambisonics, we adopt Cartesian and spherical coordinate systems in which, for a listener positioned at the origin, the +x-axis points forward, the +y-axis points to the left, and the +z-axis points upward. Correspondingly, r is the (non-negative) radial distance from the origin, θ ∈ [−π/2, π/2] is the elevation angle above the horizontal (x-y) plane, and φ ∈ [0, 2π) is the azimuthal angle around the vertical (z) axis, with (θ, φ) = (0, 0) corresponding to the +x direction and (0, π/2) to the +y direction. For a position vector $\vec{r} = (x, y, z)$, we denote the corresponding unit vector by $\hat{r} \equiv \vec{r}/r$.

Generally, we seek the ambisonics signals, up to a given order $L_{\mathrm{out}}$, that describe the sound field in the vicinity of a listener located at $\vec{r}_0$. Here, we use real-valued orthonormal (N3D) spherical harmonics, as given by Zotter [17, Sec. 2.2], and we adopt the ambisonics channel number (ACN) convention [18] such that, for a spherical harmonic function of degree l ∈ [0, ∞) and order m ∈ [−l, l], the ACN index n is given by n = l(l + 1) + m and the spherical harmonic function is denoted by $Y_n$.
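As an illustration of this indexing convention, consider the following minimal Python helpers (the function names `acn_index` and `degree_order` are ours, added for this discussion; they are not part of the original paper):

```python
import math

def acn_index(l, m):
    """ACN convention: n = l*(l + 1) + m, for degree l >= 0 and order -l <= m <= l."""
    assert abs(m) <= l
    return l * (l + 1) + m

def degree_order(n):
    """Invert the ACN index: l = floor(sqrt(n)), m = n - l*(l + 1)."""
    l = math.isqrt(n)
    return l, n - l * (l + 1)

# acn_index(1, -1) == 1 (the "Y" channel); degree_order(3) == (1, 1) (the "X" channel)
```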

For a finite-order ambisonics expansion about $\vec{r}_0$, the acoustic potential field (defined as the Fourier transform of the acoustic pressure field) is given by

$$\psi(k, \vec{r} + \vec{r}_0) = \sum_{n=0}^{N_{\mathrm{out}}-1} 4\pi (-i)^l A_n(k)\, j_l(kr)\, Y_n(\hat{r}), \quad (1)$$

where k is the angular wavenumber, $A_n$ is the nth desired ambisonics signal, $j_l$ is the spherical Bessel function of order l, and $N_{\mathrm{out}} = (L_{\mathrm{out}} + 1)^2$ is the number of terms in the expansion (cf. Eq. (A.31) in Appendix A).

We take as inputs to each navigational method P sets of measured ambisonics signals, $B_n^{[p]}$, up to order $L_{\mathrm{in}}$, that describe the sound field in the vicinity of the pth microphone, which is located at $\vec{u}_p$, $\forall p \in [1, P]$. This local potential field is given by

$$\psi(k, \vec{r} + \vec{u}_p) = \sum_{n=0}^{N_{\mathrm{in}}-1} 4\pi (-i)^l B_n^{[p]}(k)\, j_l(kr)\, Y_n(\hat{r}), \quad (2)$$

where $N_{\mathrm{in}} = (L_{\mathrm{in}} + 1)^2$.

1.1 Time-Frequency Analysis (TFA) Method

In the method proposed by Thiergart et al. [13], the sound field is first analyzed in the time-frequency domain and subsequently modeled as a finite set of monochromatic omnidirectional point sources. As will become clear below, this method can only use the first-order ambisonics signals, so we must have $L_{\mathrm{in}} = 1$. The output signals, however, can be computed to an arbitrary order $L_{\mathrm{out}}$. Below, we describe our implementation of this method.

1.1.1 Time-Frequency Sound Field Analysis

For each microphone, we first compute the short-time Fourier transform (STFT) of each of the first four ambisonics signals, which gives $B_n^{[p]}(\xi, \kappa)$ for n ∈ [0, 3], where ξ and κ are the time and frequency indices, respectively. Typically, we take an overlap fraction of R = 0.5 and set the FFT length to be

$$N_{\mathrm{FFT}} = 2^{\left\lceil \log_2\left( \frac{F_s}{1-R}\, \frac{\Delta}{c} \right) \right\rceil}, \quad (3)$$

where ⌈·⌉ denotes rounding up to the nearest integer, $F_s$ is the sampling rate of the system, Δ is the distance between microphones (defined later in Section 2), and c ≈ 343 m/s is the speed of sound. For the STFT analysis window, we choose a Hamming window [19] of length $N_{\mathrm{FFT}}$.
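To make Eq. (3) concrete, here is a brief Python sketch (using NumPy and SciPy; the helper name `fft_length` and the stand-in signal are our own, not from the paper) that computes the FFT length and the corresponding Hamming-windowed STFT:

```python
import numpy as np
from scipy.signal import stft

def fft_length(fs, delta, overlap=0.5, c=343.0):
    """Eq. (3): next power of two above (Fs / (1 - R)) * (Delta / c)."""
    return int(2 ** np.ceil(np.log2(fs / (1.0 - overlap) * delta / c)))

fs = 48000.0
nfft = fft_length(fs, delta=2.0)           # two microphones 2 m apart -> NFFT = 1024
b0 = np.random.randn(16384)                # stand-in for one ambisonics signal
f, xi, B0 = stft(b0, fs=fs, window="hamming", nperseg=nfft, noverlap=nfft // 2)
```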

Using the transformed signals from each microphone, we then compute the acoustic intensity vector, $\vec{\nu}_I^{[p]}(\xi, \kappa)$, given by [20, Eq. (11)]

$$\vec{\nu}_I(\xi, \kappa) = \frac{\sqrt{2}}{\rho_0 c} \operatorname{Re}\left\{ \overline{W}(\xi, \kappa)\, \vec{X}(\xi, \kappa) \right\}, \quad (4)$$

where $\overline{(\cdot)}$ and Re{·} denote taking the complex conjugate and the real part of the argument, respectively; W and $\vec{X} = [X\ Y\ Z]$ are defined below in Eq. (28); and $\rho_0 \approx 1.225$ kg/m³ is the density of air.

At each time-frequency bin, we then triangulate a single "effective" source (which may or may not coincide with a real source). For two microphones and with sources restricted to the horizontal plane, triangulation is computed as follows:

$$\vec{s}_0(\xi, \kappa) = \vec{u}_1 + c_1 \hat{\nu}_I^{[1]}(\xi, \kappa) = \vec{u}_2 + c_2 \hat{\nu}_I^{[2]}(\xi, \kappa)$$
$$\implies \vec{u}_2 - \vec{u}_1 = c_1 \hat{\nu}_I^{[1]}(\xi, \kappa) - c_2 \hat{\nu}_I^{[2]}(\xi, \kappa), \quad (5)$$

where $\vec{s}_0$ is the triangulated source position and $c_1$ and $c_2$ are scalars found for each time-frequency bin. These scalars are computed by

$$\begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} \cos\varphi_I^{[1]} & -\cos\varphi_I^{[2]} \\ \sin\varphi_I^{[1]} & -\sin\varphi_I^{[2]} \end{bmatrix}^{-1} \cdot \begin{bmatrix} x_2 - x_1 \\ y_2 - y_1 \end{bmatrix}, \quad (6)$$

where $\varphi_I^{[p]}$ denotes the azimuth of $\vec{\nu}_I^{[p]}$, and $x_p$ and $y_p$ denote the x and y components of $\vec{u}_p$, respectively. Note that the matrix inversion in Eq. (6) fails when $\varphi_I^{[1]} = \varphi_I^{[2]}$, i.e., when the intensity vectors are parallel. A more general approach for source triangulation, either in three dimensions or for P > 2 microphones (or both), is described by Thiergart et al. [13, Sec. IV.A]. It is worth noting that this triangulation step assumes that the sound field consists of a finite number of discrete sources that can be easily separated (i.e., sources that are far enough apart or not emitting sound simultaneously) [13, Sec. II]; consequently, the performance of this method may suffer in dense or highly reverberant environments.
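The two-microphone, horizontal-plane triangulation of Eqs. (5) and (6) reduces to a 2 × 2 linear solve per time-frequency bin. A minimal sketch follows (assuming the intensity-vector azimuths have already been estimated via Eq. (4); the function name is ours):

```python
import numpy as np

def triangulate(u1, u2, phi1, phi2):
    """Solve Eq. (6) for c1, c2, then return the effective source position
    s0 per Eq. (5). u1, u2 are 2-D microphone positions (x, y); phi1, phi2
    are the intensity-vector azimuths at each microphone, in radians.
    Raises LinAlgError when the intensity vectors are parallel."""
    A = np.array([[np.cos(phi1), -np.cos(phi2)],
                  [np.sin(phi1), -np.sin(phi2)]])
    c1, c2 = np.linalg.solve(A, np.asarray(u2) - np.asarray(u1))
    return np.asarray(u1) + c1 * np.array([np.cos(phi1), np.sin(phi1)])
```

For example, with microphones at (0, 1) and (0, −1) and a source at (2, 0), the azimuths seen from each microphone recover the source position exactly.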

Finally, we compute the acoustic potential, given by

$$\psi^{[p]}(\xi, \kappa) = \sqrt{4\pi}\, B_0^{[p]}(\xi, \kappa), \quad (7)$$

and the diffuseness parameter, $\Psi^{[p]}(\xi, \kappa)$, as given below in Eq. (29) and described in Section 2.2.4. (Note that, if we were not using orthonormal N3D spherical harmonics, an additional normalization coefficient would be needed in the above equation.)

1.1.2 Sound Field Modeling and Synthesis

The estimated ambisonics output signals are assembled in the time-frequency domain as follows. For a given listener position $\vec{r}_0$, we let $\vec{s}_0' = \vec{s}_0 - \vec{r}_0$ denote the position of the triangulated source relative to the listener for each time-frequency bin. Additionally, we choose a reference microphone with index $p = p_{\mathrm{ref}}$, such that the position of the triangulated source relative to the reference microphone is given by $\vec{s}_{p_{\mathrm{ref}}} = \vec{s}_0 - \vec{u}_{p_{\mathrm{ref}}}$. By default, we choose as the reference the nearest microphone to the listener. We further define a direct-to-diffuse sound ratio parameter, given by

$$\Gamma(\xi, \kappa) = \frac{1}{\Psi^{[p_{\mathrm{ref}}]}(\xi, \kappa)} - 1, \quad (8)$$

as well as direct and diffuse components of the sound field, given by

$$S_{\mathrm{dir}} = \sqrt{\frac{\Gamma(\xi, \kappa)}{1 + \Gamma(\xi, \kappa)}}\, \frac{\psi^{[p_{\mathrm{ref}}]}}{i k\, h_0(k s_{p_{\mathrm{ref}}})}, \quad (9)$$

$$S_{\mathrm{diff}} = \sqrt{\frac{1}{1 + \Gamma(\xi, \kappa)}}\, \psi^{[p_{\mathrm{ref}}]}, \quad (10)$$

respectively, where $h_0$ is the zeroth-order outgoing spherical Hankel function.

From these, we compute the ambisonics output signals up to order $L_{\mathrm{out}}$ by

$$A_n(\xi, \kappa) = i^{l+1} k\, h_l(k s_0')\, Y_n(\hat{s}_0')\, S_{\mathrm{dir}}(\xi, \kappa) + \frac{1}{\sqrt{4\pi}}\, S_{\mathrm{diff}}(\xi, \kappa), \quad (11)$$

where we have used Eq. (A.32) to encode the direct point-source components and applied the factor of $1/\sqrt{4\pi}$ to encode the diffuse sound components, thereby compensating for the "directivity" of each ambisonics channel. (This factor is only independent of channel for orthonormal spherical harmonics.) Each of these signals is finally converted into the time domain via an inverse STFT.
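For concreteness, the per-bin synthesis of Eqs. (8)–(11) can be sketched as follows for first-order output (n ∈ [0, 3]); the function and variable names are ours, and the spherical harmonic evaluation is assumed to be supplied by the caller:

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn

def sph_hankel(l, x):
    """Outgoing spherical Hankel function h_l(x) = j_l(x) + i*y_l(x)."""
    return spherical_jn(l, x) + 1j * spherical_yn(l, x)

def synthesize_bin(psi_ref, Psi_ref, k, s0_rel_listener, s_ref, Y):
    """Eqs. (8)-(11) for one time-frequency bin.
    psi_ref : acoustic potential at the reference microphone, Eq. (7)
    Psi_ref : diffuseness at the reference microphone, Eq. (29)
    s0_rel_listener : triangulated source position relative to the listener
    s_ref   : distance from the reference microphone to the triangulated source
    Y       : callable Y(n, direction) returning the real N3D spherical harmonic"""
    gamma = 1.0 / Psi_ref - 1.0                                  # Eq. (8)
    s_dir = (np.sqrt(gamma / (1 + gamma))
             * psi_ref / (1j * k * sph_hankel(0, k * s_ref)))    # Eq. (9)
    s_diff = np.sqrt(1.0 / (1 + gamma)) * psi_ref                # Eq. (10)
    r = np.linalg.norm(s0_rel_listener)
    direction = s0_rel_listener / r
    A = np.empty(4, dtype=complex)
    for n, l in enumerate([0, 1, 1, 1]):                         # ACN degrees, n in [0, 3]
        A[n] = ((1j ** (l + 1)) * k * sph_hankel(l, k * r)
                * Y(n, direction) * s_dir
                + s_diff / np.sqrt(4 * np.pi))                   # Eq. (11)
    return A
```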

1.2 Valid Microphone Interpolation (VMI) Method

Below, we describe our previously proposed parametric navigational method, which comprises two components:

(1) A parametric method for excluding "invalid" microphones from the interpolation calculation based on estimated source positions, and

(2) A two-band implementation of regularized least-squares interpolation filters.


The method (as originally implemented, without the two-band analysis) and its performance are discussed in more detail by Tylka and Choueiri [16].

1.2.1 Source Position and Microphone Validity

According to theory (see Appendix A), ambisonics signals provide a valid description of the captured sound field only in a spherical free-field region around the ambisonics microphone that extends up to the nearest source or obstacle. Consequently, in order to determine the set of microphones for which the listening position is valid, we first localize any near-field sources. Several methods for acoustically localizing sources using signals from two or more ambisonics microphones are discussed by Zheng [14, Ch. 3]; such methods often involve triangulation via acoustic intensity vectors, as described in Section 1.1.1.

Once the locations of any near-field sources are determined, we compute the distances from each microphone to its nearest source and to the listening position. Only those microphones that are nearer to the listening position than to any near-field source are included in the interpolation calculation (i.e., all microphones such that $r_p = \|\vec{r}_0 - \vec{u}_p\| < \|\vec{s}_0 - \vec{u}_p\| = s_p$). As described in the following section, a matrix of interpolation filters is then computed and applied to the signals from the remaining valid microphones.²
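The validity test itself amounts to one distance comparison per microphone; a minimal NumPy sketch (the array shapes and function name are our own convention):

```python
import numpy as np

def valid_microphones(mic_positions, source_positions, listener_position):
    """Return indices p such that r_p < s_p, i.e., microphones nearer to the
    listener than to any near-field source (Section 1.2.1).
    mic_positions: (P, 3); source_positions: (S, 3); listener_position: (3,)"""
    r = np.linalg.norm(mic_positions - listener_position, axis=1)       # r_p
    s = np.min(np.linalg.norm(mic_positions[:, None, :]
                              - source_positions[None, :, :], axis=2),
               axis=1)                                                  # nearest-source distance s_p
    return np.flatnonzero(r < s)
```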

1.2.2 Regularized Least-Squares Interpolation

We first define vectors of ambisonics signals, given by

$$\mathbf{a} = \begin{bmatrix} A_0 \\ A_1 \\ \vdots \\ A_{N_{\mathrm{out}}-1} \end{bmatrix}, \qquad \mathbf{b}_p = \begin{bmatrix} B_0^{[p]} \\ B_1^{[p]} \\ \vdots \\ B_{N_{\mathrm{in}}-1}^{[p]} \end{bmatrix}. \quad (12)$$

We also define a matrix of interpolation weights, given by

$$\mathbf{W} = \begin{bmatrix} \sqrt{w_1}\,\mathbf{I} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \sqrt{w_2}\,\mathbf{I} & \ddots & \vdots \\ \vdots & \ddots & \ddots & \mathbf{0} \\ \mathbf{0} & \cdots & \mathbf{0} & \sqrt{w_P}\,\mathbf{I} \end{bmatrix}, \quad (13)$$

where $w_p$ is the interpolation weight for the pth microphone; I is the $N_{\mathrm{in}} \times N_{\mathrm{in}}$ identity matrix; and 0 is an $N_{\mathrm{in}} \times N_{\mathrm{in}}$ matrix of zeros. Here, we compute the weights $w_p$ using a standard linear interpolation scheme.

We then pose interpolation as an inverse problem, in which we consider the ambisonics signals at the listening position and, using matrices of ambisonics translation coefficients, we write a system of equations simultaneously describing the ambisonics signals at all P valid (first- or higher-order) ambisonics microphones.³ That is, for each frequency, we write

$$\mathbf{W} \cdot \mathbf{M} \cdot \mathbf{a} = \mathbf{W} \cdot \mathbf{b}, \quad (14)$$

where, omitting frequency dependencies, we let

$$\mathbf{M} = \begin{bmatrix} \mathbf{T}(-\vec{r}_1) \\ \mathbf{T}(-\vec{r}_2) \\ \vdots \\ \mathbf{T}(-\vec{r}_P) \end{bmatrix}, \qquad \mathbf{b} = \begin{bmatrix} \mathbf{b}_1 \\ \mathbf{b}_2 \\ \vdots \\ \mathbf{b}_P \end{bmatrix}, \quad (15)$$

where $\vec{r}_p$ is the vector from the pth microphone to the listening position, given by $\vec{r}_p = \vec{r}_0 - \vec{u}_p$, and T is a matrix of ambisonics translation coefficients, which we compute using the recurrence formulae given by Gumerov and Duraiswami [22, Sec. 3.2], Zotter [17, Ch. 3], and Tylka and Choueiri [23]. Essentially, this formulation allows us to compute the ambisonics signals at the desired listening position that "best explain" (in a least-squares sense) the measured signals, while weighting most heavily those signals from the microphone nearest to the listening position.

² In practice, as the listener traverses the navigable region, the number of valid microphones may change. Consequently, one should crossfade between the filters for successive audio frames to prevent any audible discontinuities.

Next, we compute the singular value decomposition of $\mathbf{M}_w = (\mathbf{W} \cdot \mathbf{M})$, such that $\mathbf{M}_w = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^*$, where $(\cdot)^*$ represents conjugate-transposition. This allows us to compute a regularized pseudoinverse of $\mathbf{M}_w$, given by [24, Sec. 5.1]

$$\mathbf{L} = \mathbf{V} \boldsymbol{\Lambda} \boldsymbol{\Sigma}^+ \mathbf{U}^*, \quad (16)$$

where $(\cdot)^+$ represents pseudoinversion, and Λ is a square, diagonal matrix whose elements are given by

$$\Lambda_{nn} = \frac{\sigma_n^2}{\sigma_n^2 + \beta}. \quad (17)$$

Here, $\sigma_n$ is the nth singular value of $\mathbf{M}_w$ and β is a frequency-dependent regularization parameter, for which we choose a high-shelf filter profile (cf. Tylka and Choueiri [16, Eq. (17)]).
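In code, Eqs. (14)–(17) amount to assembling the stacked, weighted translation matrix and filtering its singular values. A sketch for a single frequency, assuming the translation matrices T(−r⃗_p) are supplied by the caller (e.g., computed per [23]); all names here are ours:

```python
import numpy as np

def regularized_interpolator(T_list, weights, beta):
    """Build L = V * Lambda * Sigma^+ * U^* (Eqs. (16)-(17)) for one frequency.
    T_list : list of P translation matrices T(-r_p), each (N_in, N_out)
    weights: interpolation weights w_p (standard linear scheme)
    beta   : regularization parameter at this frequency"""
    w_diag = np.concatenate([np.sqrt(w) * np.ones(T.shape[0])
                             for w, T in zip(weights, T_list)])   # diagonal of W
    Mw = w_diag[:, None] * np.vstack(T_list)                      # W . M
    U, sigma, Vh = np.linalg.svd(Mw, full_matrices=False)
    filt = sigma / (sigma**2 + beta)          # Lambda * Sigma^+, combining Eqs. (16)-(17)
    L = (Vh.conj().T * filt) @ U.conj().T
    return L, w_diag
```

Per Eq. (18) below, the low-frequency estimate at this frequency is then obtained as `L @ (w_diag * b)`, where `b` stacks the measured signals from the valid microphones.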

Rearranging Eq. (14) yields an estimate of a, given by

$$\tilde{\mathbf{a}} = \mathbf{L} \cdot \mathbf{W} \cdot \mathbf{b}, \quad (18)$$

which we apply below some critical wavenumber $k_0$. Ideally, as $L_{\mathrm{in}} \to \infty$, we should find $\tilde{\mathbf{a}} \to \mathbf{a}$ (the exact ambisonics signals of the sound field at $\vec{r}_0$). Above $k_0$, we compute a weighted average,⁴ given by

$$\tilde{\mathbf{a}} = \begin{bmatrix} w_1 \mathbf{I} & w_2 \mathbf{I} & \cdots & w_P \mathbf{I} \end{bmatrix} \cdot \mathbf{b}, \quad (19)$$

where now I is the $N_{\mathrm{out}} \times N_{\mathrm{in}}$ identity matrix. Finally, the combined interpolated signals are given by

$$\tilde{\mathbf{a}} = \begin{cases} \mathbf{L} \cdot \mathbf{W} \cdot \mathbf{b} & \text{for } k < k_0, \\ \begin{bmatrix} w_1 \mathbf{I} & w_2 \mathbf{I} & \cdots & w_P \mathbf{I} \end{bmatrix} \cdot \mathbf{b} & \text{for } k \geq k_0, \end{cases} \quad (20)$$

where we have chosen

$$k_0 = \begin{cases} 1/r_1 & \text{for } P = 1, \\ \Delta/(r_1 r_2) & \text{for } P = 2, \\ 1/\max_{p \in [1,P]} r_p & \text{otherwise}, \end{cases} \quad (21)$$

which we found empirically to perform well in terms of spectral errors.

It is worth noting that, for large distances, this critical wavenumber may correspond to very low frequencies, such that the computation in Eq. (20) simplifies to just a weighted-average interpolation. For example, consider a pair of microphones separated by a distance of Δ = 2 m and a desired listener position at the midpoint, such that $r_1 = r_2 = 1$ m. In this configuration, if only one microphone is valid (i.e., P = 1), the frequency corresponding to $k_0$ is given by $c/(2\pi r_1) \approx 54.6$ Hz; if both microphones are valid (i.e., P = 2), this frequency is given by $c\Delta/(2\pi r_1 r_2) \approx 109.2$ Hz.

³ Samarasinghe et al. perform a similar derivation for a two-dimensional sound field [21, Sec. III.A].

⁴ Future refinements to this method might instead employ one of the state-of-the-art linear interpolation methods [10, 9] (provided that we have P > 1 valid microphones), which have been shown to outperform the basic weighted-average interpolation approach adopted here.

Fig. 1. Diagram of a two-microphone array (empty circles) with a single source (filled circle). The shaded gray disk indicates the interior region, where r < Δ/2. The jagged line segment indicates the navigable region, where y ∈ [−Δ/2, Δ/2] and x = z = 0.
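As a quick check of Eq. (21) and the worked example above, the following sketch (function name ours) computes the crossover frequency $f_0 = k_0 c / (2\pi)$:

```python
import numpy as np

def crossover_freq(r, delta, c=343.0):
    """Frequency corresponding to k0 in Eq. (21): f0 = k0 * c / (2*pi).
    r: distances from each valid microphone to the listener (length P)."""
    if len(r) == 1:
        k0 = 1.0 / r[0]
    elif len(r) == 2:
        k0 = delta / (r[0] * r[1])
    else:
        k0 = 1.0 / max(r)
    return k0 * c / (2 * np.pi)

print(crossover_freq([1.0], 2.0))        # ~54.6 Hz  (P = 1)
print(crossover_freq([1.0, 1.0], 2.0))   # ~109.2 Hz (P = 2)
```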

2 SIMULATIONS AND METRICS

Consider a linear microphone array geometry, illustrated in Fig. 1, in which a pair of microphones (P = 2) are separated by a distance Δ, equidistant from the origin and placed along the lateral y-axis, such that their positions are given by $\vec{u}_1 = (0, \Delta/2, 0)$ and $\vec{u}_2 = (0, -\Delta/2, 0)$. We define the navigable region as the segment of the y-axis connecting the two microphone positions, i.e., all listener positions $\vec{r}_0 = (0, y_0, 0)$ where $y_0 \in [-\Delta/2, \Delta/2]$. We further define a nondimensional geometrical parameter γ = r/(Δ/2) and refer to the region with γ > 1 as the exterior region and that with γ < 1 as the interior region (see Fig. 1).

A single point source is placed on the horizontal plane at $\vec{s}_0 = (s_0 \cos\varphi_0, s_0 \sin\varphi_0, 0)$. From the position of the pth microphone, the apparent source position is given by $\vec{s}_p = \vec{s}_0 - \vec{u}_p = (s_p \cos\varphi_p, s_p \sin\varphi_p, 0)$, such that the apparent source azimuth is $\varphi_p$ and the relative source distance from the microphone is $s_p$.

2.1 Simulation Parameters

We simulate recording of the sound field depicted in Fig. 1 for a range of microphone spacings, Δ ∈ [0.1, 10] m, and all source distances $s_0 = \gamma\Delta/2$ for γ ∈ [0.1, 10]. In each simulation, we vary the source azimuth from $\varphi_0 = 0°$ to 90° in increments of 5° and generate, using Eq. (A.32), an artificial ambisonics impulse response at the microphone. We then compute, using each method, the ambisonics impulse responses at listener positions from $y_0 = -\Delta/2$ to $+\Delta/2$, taken in 20 equal increments. Note that for the TFA method only, we intentionally omit source azimuths of 90° since, for $\varphi_0 = \pm 90°$, the source becomes collinear with the microphones and consequently the triangulation calculation (see Eq. (6)) can no longer produce a unique solution.

In all simulations, we choose $L_{\mathrm{in}} = L_{\mathrm{out}} = 1$.⁵ Although the VMI method has been derived for an arbitrary $L_{\mathrm{in}}$ (whereas the TFA method has only been derived for first-order ambisonics input signals; see Section 1.1), it can be verified that the performance of that method does not vary significantly with input order due to our order-independent choice of critical wavenumber (see Eq. (21)). The sampling rate is 48 kHz and all impulse responses are calculated with 16,384 samples (≈341 ms). Additionally, we filter all point-source ambisonics impulse responses with order-dependent near-field compensation high-pass filters, given for the lth-order ambisonics signals by

$$H_l(f) = 1 - \frac{1}{\sqrt{1 + \left( f/f_l \right)^l}}, \quad (22)$$

where $f_l$ is the corner frequency of the lth filter, which we choose to be $f_l = (200 \times l)$ Hz.
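The magnitude response of Eq. (22) is straightforward to evaluate; a short sketch follows (applying it as a causal filter would additionally require a phase or filter-design choice, which the paper does not specify here, and the function name is ours):

```python
import numpy as np

def nfc_highpass_mag(f, l, f_corner_per_order=200.0):
    """|H_l(f)| = 1 - 1/sqrt(1 + (f/f_l)^l), with f_l = 200*l Hz (Eq. (22))."""
    fl = f_corner_per_order * l
    return 1.0 - 1.0 / np.sqrt(1.0 + (f / fl) ** l)

f = np.array([50.0, 200.0, 1000.0, 5000.0])
print(nfc_highpass_mag(f, l=1))   # rises from ~0 toward 1 with increasing frequency
```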

2.2 Metrics

We quantify the errors incurred through navigation by each method using the following metrics:

1. The level error, $e_\lambda$, of the mean audible energy (MAE), as given in Section 2.2.1,
2. The range, $\rho_\eta$, of the auditory band spectral error (ABSE), as given in Section 2.2.2,
3. The localization error, $e_\nu$, for the precedence-effect localization vector, as given in Section 2.2.3, and
4. The error, $e_\Psi$, in the diffuseness parameter, as given in Section 2.2.4.

For each simulation presented here, we average these error metrics over the entire navigable region (as defined above) and all source azimuths, for specified combinations of source distance $s_0$ and microphone spacing Δ.

⁵ Note that, for the metrics listed in Section 2.2, only the localization model (described in Section 2.2.3) depends on $L_{\mathrm{out}}$; all of the other metrics, by construction, use only the zeroth- and first-order signals.


2.2.1 Mean Audible Energy (λ)

We define the mean audible energy (MAE), λ, of an ambisonics signal as the average energy of the zeroth-order term across a set of critical bands, i.e.,

$$\lambda = 10 \log_{10} \left( \frac{1}{N_b} \sum_{c=1}^{N_b} \frac{\int |H_\Gamma(f; f_c)|\, |A_0(f)|^2\, df}{\int |H_\Gamma(f; f_c)|\, df} \right), \quad (23)$$

where $H_\Gamma(f; f_c)$ is the transfer function of a gammatone filter⁶ (which approximates critical bands), with center frequency $f_c$ for $c \in [1, N_b]$, and integration is taken over all frequencies f. Here, we choose $f_c$ to be a set of ERB-spaced (equivalent rectangular bandwidth) center frequencies [26] spanning the range $f_c \in [50\ \mathrm{Hz}, 21\ \mathrm{kHz}]$. We further define the level error, given in dB by

$$e_\lambda = \tilde{\lambda} - \lambda, \quad (24)$$

where λ is the MAE for a reference signal and $\tilde{\lambda}$ is that for a translated signal.

2.2.2 Auditory Band Spectral Error (η)

The auditory band spectral error (ABSE), adapted from Schärer and Lindau [27, Eq. (9)], is given by

$$\eta(f_c) = 10 \log_{10} \left( \frac{\int |H_\Gamma(f; f_c)|\, |\tilde{A}_0(f)|^2\, df}{\int |H_\Gamma(f; f_c)|\, |A_0(f)|^2\, df} \right), \quad (25)$$

where $A_0$ and $\tilde{A}_0$ are the zeroth-order terms of the reference and translated ambisonics transfer functions, respectively; each integration is taken over all frequencies f; and we again choose ERB-spaced $f_c \in [50\ \mathrm{Hz}, 21\ \mathrm{kHz}]$. We further define the spectral error, given by

$$\rho_\eta = \max_c \eta(f_c) - \min_c \eta(f_c), \quad (26)$$

which we found through a previous subjective validation study to be a strong predictor of perceptible colorations induced through navigation [28].
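Both metrics reduce to band-weighted energy ratios. A compact sketch of Eqs. (23)–(26), in which the rows of H hold the sampled gammatone magnitudes $|H_\Gamma(f; f_c)|$ on the FFT frequency grid and discrete sums stand in for the integrals (all names are ours):

```python
import numpy as np

def band_energies(A0, H):
    """Weighted band energies: int |H| |A0|^2 df / int |H| df, per band (Eq. (23)).
    A0: complex spectrum, shape (F,); H: band magnitudes, shape (Nb, F)."""
    return (H @ np.abs(A0)**2) / H.sum(axis=1)

def level_error(A0_ref, A0_trans, H):
    """e_lambda = MAE(translated) - MAE(reference), in dB (Eqs. (23)-(24))."""
    mae = lambda A0: 10 * np.log10(np.mean(band_energies(A0, H)))
    return mae(A0_trans) - mae(A0_ref)

def spectral_error(A0_ref, A0_trans, H):
    """rho_eta = max_c eta(f_c) - min_c eta(f_c) (Eqs. (25)-(26))."""
    eta = 10 * np.log10(band_energies(A0_trans, H) / band_energies(A0_ref, H))
    return eta.max() - eta.min()
```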

2.2.3 Precedence-Effect Localization Vector (ν⃗)

Localization is predicted using a recently developed and subjectively validated precedence-effect-based localization model, the details of which are provided in an earlier publication [28, Sec. 2.A.i]. Briefly, this model entails decomposing the ambisonics impulse response into a finite set of plane-wave impulse responses, which are further divided into wavelets with distinct arrival times. The signal amplitudes, plane-wave directions, and times-of-arrival for all wavelets are fed into the precedence-effect-based energy vector model of Stitt et al. [29] to produce a single predicted source localization vector, $\vec{\nu}$. The corresponding localization error is then computed by

$$e_\nu = \cos^{-1}\left( \hat{\nu} \cdot \hat{s}_0' \right), \quad (27)$$

where $\hat{s}_0'$ is the direction of the source relative to the listener, found by normalizing the vector $\vec{s}_0' = \vec{s}_0 - \vec{r}_0$.

⁶ Here, we used the gammatone filters implemented in the Large Time-Frequency Analysis Toolbox (LTFAT) for MATLAB [25].

2.2.4 Diffuseness Parameter (Ψ)

According to Merimaa and Pulkki [20], the acoustic intensity vector and a diffuseness parameter can be computed using the four standard "B-format" signals, which are related to the first four ACN/N3D ambisonics signals by

$$W = \frac{A_0}{\sqrt{2}}, \qquad Y = \frac{A_1}{\sqrt{3}}, \qquad Z = \frac{A_2}{\sqrt{3}}, \qquad X = \frac{A_3}{\sqrt{3}}. \quad (28)$$

The diffuseness parameter is a nondimensional measure of the fraction of the total acoustic energy that is not directional. To compute it, we first construct a frequency-dependent Cartesian row vector, $\vec{X} = [X\ Y\ Z]$. The diffuseness parameter Ψ is then given by [20, Eq. (12)]

$$\Psi(f) = 1 - \frac{\sqrt{2}\, \left\| \operatorname{Re}\left\{ \overline{W}(f)\, \vec{X}(f) \right\} \right\|}{|W(f)|^2 + \left\| \vec{X}(f) \right\|^2 / 2}, \quad (29)$$

where $\overline{(\cdot)}$ denotes taking the complex conjugate of the argument and, for a complex-valued vector, ‖·‖ denotes the norm given by $\|\vec{x}\| = \sqrt{\langle \vec{x}, \vec{x} \rangle} \equiv \sqrt{\vec{x}\, \vec{x}^{\mathrm{H}}}$, where $(\cdot)^{\mathrm{H}}$ denotes the conjugate (Hermitian) transpose of the argument. Thus, Ψ is a real-valued scalar which takes on values Ψ ∈ [0, 1], where Ψ = 0 corresponds to a purely directional incident sound field and Ψ = 1 corresponds to a purely diffuse incident sound field.

We then compute the logarithmically weighted mean of the difference between the diffuseness spectra for the translated and reference signals, given by

$$e_\Psi = \frac{\displaystyle\int_{f_L}^{f_U} \frac{1}{f} \left( \tilde{\Psi}(f) - \Psi(f) \right) df}{\log(f_U) - \log(f_L)}, \quad (30)$$

where $f_L = 50$ Hz and $f_U = 21$ kHz.
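Eqs. (28)–(30) translate directly into a few lines of NumPy; a sketch, assuming A holds the four ACN/N3D frequency-domain signals with shape (4, F) and f the corresponding frequency grid (function names ours):

```python
import numpy as np

def diffuseness(A, eps=1e-12):
    """Psi(f) per Eqs. (28)-(29), from first-order ACN/N3D signals A, shape (4, F)."""
    W = A[0] / np.sqrt(2)
    X = np.vstack([A[3], A[1], A[2]]) / np.sqrt(3)          # rows are [X, Y, Z]
    num = np.sqrt(2) * np.linalg.norm(np.real(np.conj(W) * X), axis=0)
    den = np.abs(W)**2 + np.sum(np.abs(X)**2, axis=0) / 2
    return 1.0 - num / (den + eps)

def diffuseness_error(Psi_ref, Psi_trans, f):
    """e_Psi: log-weighted mean of (Psi_trans - Psi_ref) over [f_L, f_U] (Eq. (30))."""
    g = (Psi_trans - Psi_ref) / f
    integral = np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(f))  # trapezoidal rule
    return integral / (np.log(f[-1]) - np.log(f[0]))
```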

3 CHARACTERIZATION AND DISCUSSION

In Fig. 2(a), we plot the level errors incurred by the TFA method as a function of array spacing Δ and normalized source distance γ. Note that we exclude from these plots the region in which $s_0 + \Delta/2 < 0.1$ m (i.e., the bottom left corner of each panel in Figs. 2 and 3), as this corresponds to geometries for which the source is "inside the head" (for an approximate head radius of 10 cm) at all positions within the navigable region. From Fig. 2(a), we see that the TFA method is able to achieve approximately zero error almost everywhere, with the exception of far interior sources (γ < 1). This is an improvement over the VMI method, shown in Fig. 2(b), which is only able to accurately reconstruct the sound level for exterior sources (γ > 1) and is otherwise several dB too quiet for interior sources.

That the reconstructed level is too low may be particularly detrimental to a listener's perception of source proximity, since one of the primary distance cues humans expect is an increase in level [30, Sec. 3.1.1]. Consequently, the impact of these errors on a listener's perception of distance should be investigated, although it is outside the scope of this work to do so.


Fig. 2. Level errors $e_\lambda$ (top panels) and spectral errors $\rho_\eta$ (bottom) for microphone spacing Δ and normalized source distance γ. Level error contour lines are drawn every 2 dB; spectral error contour lines are drawn every 1 dB.

For spectral coloration, we see in Figs. 2(c) and 2(d) that the TFA method yields larger errors than the VMI method for all microphone spacings larger than approximately 0.5 m. In particular, the VMI method yields significantly smaller errors for interior sources with large microphone spacings (γ < 1 and Δ > 0.5 m). Only for exterior sources with microphone spacings smaller than approximately 0.25 m does the TFA method achieve smaller spectral errors than the VMI method. Future investigations should attempt to determine the source of, and correct for, the spectral coloration induced by the TFA method at large microphone spacings.

Localization errors for the TFA method are shown in Fig. 3(a). Contrary to that method's coloration performance (see Fig. 2(c)), which is most accurate at small microphone spacings (e.g., Δ < 0.5 m), the localization errors incurred by this method are largest at those small microphone spacings. In particular, for exterior sources with microphone spacings smaller than approximately 0.3 m, the VMI method yields a significant improvement (∼15°) over the TFA method. Additionally, for far exterior sources (γ > 3) and at all microphone spacings, the VMI method yields a marked improvement (∼5°) over the TFA method. For microphone spacings larger than approximately 0.5 m, the errors incurred by the VMI method are relatively constant with spacing, whereas those incurred by the TFA method improve with increasing spacing and even become very small ($e_\nu < 5°$) at large Δ and γ < 1. Accordingly, the TFA method yields a significant improvement over the VMI method for interior sources with microphone spacings larger than approximately 1 m.


Fig. 3. Predicted localization errors $e_\nu$ (panels (a) and (b)) and diffuseness errors $e_\Psi$ (panels (c) and (d)) for microphone spacing Δ and normalized source distance γ. Localization error contour lines are drawn every 5°; diffuseness error contour lines are drawn in increments of 0.1.

Under such conditions, although the absolute localization performance of the VMI method ($e_\nu \sim 15°$) may be tolerable for some applications, it is unclear to what extent the method can reproduce dynamic localization cues as the listener moves. At large translation distances, the interpolation calculation effectively reduces to a "switching" between different microphones (since $k_0$ will be very low; see Eq. (21)) to the one that is valid for a given listener position, typically with a small region of overlap wherein multiple microphones can be used. Thus, to prevent abrupt changes in perspective, the audio streams may require some form of crossfading between frames. Alternatively, more sophisticated extrapolation methods [6, 15, 8] could potentially be employed for cases with only P = 1 valid microphone. This is a topic for exploration and development in future practical implementations.

From the plots of diffuseness errors shown in Figs. 3(c) and 3(d), we immediately see that the VMI method achieves more accurate performance than the TFA method over all conditions. The TFA method consistently yields a diffuseness parameter which is too small, whereas the VMI method achieves nearly exact diffuseness (except at very small γ and Δ). This apparent deficiency in diffuseness of the TFA method suggests that the diffuse sound term in the sound field re-synthesis equation, Eq. (11), is underestimated by the method. Consequently, it may be relatively straightforward to modify the TFA method to correct for this behavior. This too is a topic for further development.


4 CONCLUSIONS

In this work we conducted numerical simulations in order to characterize and compare the performance of the time-frequency analysis (TFA) interpolation method of Thiergart et al. [13] to our recently proposed parametric valid microphone interpolation (VMI) method [16]. Following the simulation framework laid out in Section 2, we simulated simple incident sound fields consisting of a two-microphone array and a single point source and varied source distance and azimuth, microphone spacing, and listener position. We conducted a comprehensive analysis of the methods by computing, over a wide range of conditions, the metrics enumerated in Section 2.2 for sound level, spectral coloration, source localization, and diffuseness. These analyses yielded the following findings:

• The TFA method yields virtually exact sound levels for all conditions and is particularly superior to the VMI method for interior sources;

• The TFA method yields significantly larger spectral errors than the VMI method for microphone spacings larger than approximately 0.5 m;

• The TFA method yields significantly smaller localization errors than the VMI method for interior sources with microphone spacings larger than approximately 1 m; and

• The TFA method does not sufficiently reproduce the diffuseness of a sound field, whereas the VMI method yields nearly exact diffuseness for almost all conditions.

4.1 Practical Implications

Taken together, these findings suggest that the TFA and VMI methods may each be more suitable in different practical domains. For the present discussion, we define the following practically relevant "axes":

(1) The sparsity of the microphone array (i.e., the size of the desired navigable region relative to the number of available microphones),

(2) The intimacy of the sound sources (i.e., the proximity of the sources to the navigable region), and

(3) The complexity of the sound field (i.e., the total number of sources and/or the reverberance of the recording environment).

While the first two of these axes can be easily related to microphone spacing and normalized source distance, respectively, the third axis has not been directly explored here. However, based on the construction of the TFA method, we speculate that this method may have difficulties accommodating multiple sources since, at each time-frequency bin, only a single point source is created (see Section 1.1). Consequently, the capability of this method to accurately reproduce multiple sources warrants further study.

Nevertheless, below, we identify several general principles with which to choose between the two methods in various applications spanning these axes:

(1a) With a sparse microphone array (e.g., when covering a large room with only a few microphones), the TFA method will generally yield superior localization accuracy, whereas the VMI method will incur less spectral coloration.

(1b) With a dense microphone array, the methods perform comparably to each other (i.e., neither method is particularly superior) in terms of localization accuracy and spectral coloration.

(2a) When recording primarily intimate sources (e.g., an immersive recording of a small group of musicians), the TFA method will yield superior localization accuracy and will likely better convey source distance information (due to its accurate reproduction of sound level), whereas the VMI method will again incur less spectral coloration.

(2b) When recording primarily distant sources (e.g., when covering the audience section only of a concert hall), the VMI method will yield smaller spectral and localization errors.

(3a) For an acoustically complex sound field (e.g., in a room with highly reflective surfaces and/or many scattering bodies), the VMI method is likely more suitable, as it more accurately reproduces diffuseness and, potentially, the TFA method will fail to adequately reproduce many sources.

(3b) For an acoustically simple sound field (e.g., an outdoor recording of a park with sparsely distributed sources), the TFA method will likely yield superior localization accuracy, and its deficiency in diffuseness will be less problematic.

While these principles specify the superior method for any given practical domain, another way of summarizing the present results is to determine the domains, in terms of these practical axes, over which each method yields accurate and superior performance. As we did not explore the complexity axis explicitly, here we omit that axis and focus only on the sparsity of the microphone array and the intimacy of the sources. Additionally, since the level and diffuseness results are relatively straightforward (see the top panels of Fig. 2 and the bottom panels of Fig. 3, respectively), we omit those metrics as well.

Using the spectral errors plotted in the bottom panels of Fig. 2 and the localization errors plotted in the top panels of Fig. 3, we first identify, for each method, regions of low coloration ($\rho_\eta < 3$ dB) and regions of accurate localization ($e_\nu < 10°$). We then determine the regions in which each method performs (a) more accurately than that error limit and (b) more accurately than, or at least comparably to, the other method. These regions are sketched in Fig. 4.

Fig. 4. Region plots illustrating accurate and superior methods in terms of spectral coloration (panel (a)) and localization accuracy (panel (b)) across practical applications with varying microphone array sparsity (horizontal axis) and sound source intimacy (vertical). The gray filled regions correspond to the valid microphone interpolation (VMI) method and the hatched regions correspond to the time-frequency analysis (TFA) method. Regions that are both filled and hatched indicate that the methods perform comparably; empty regions indicate that neither method satisfies the specified error limit.

From these plots, we see that, in applications with distant sources and with a sparse microphone array, the VMI method yields accurate and superior performance in both coloration and localization. Furthermore, for most applications with a sparse microphone array or with intimate sources, the VMI method yields accurate and often superior spectral coloration performance, and for most applications with distant sources, the VMI method yields accurate and superior localization performance. The TFA method, however, yields accurate spectral coloration performance only for applications with a dense microphone array and with intimate sources, and accurate and superior localization performance only for applications with a sparse microphone array and with intimate sources. Consequently, in such applications (with both a sparse microphone array and with intimate sources), the TFA method yields improved localization but degraded coloration performance compared to the VMI method.

Although the results shown here are for first-order ambisonics input signals only, future modifications to the VMI method might yield improved performance through including higher-order terms (e.g., by choosing a different critical wavenumber, cf. Eq. (21)). At present, however, it is practically advantageous to employ first-order ambisonics microphones (which tend to be significantly less expensive than higher-order ones and require fewer recording channels, preamplifiers, etc.). Also, as mentioned in Section 2.1, the TFA method fails to triangulate sources with azimuths of $|\varphi_0| = 90°$. While, in practice, a source azimuth of exactly ±90° is virtually impossible (e.g., due to positioning errors, noise, etc.), this does suggest that the triangulation calculation (see Eq. (6)) may be very sensitive to small changes in azimuth near these extremes. This issue might be easily avoided, however, by using P > 2 microphones arranged in a triangular or rectangular configuration, for example.

While the evaluation presented here has been purely numerical, we hope that the practical recommendations enumerated above will facilitate real-world implementations of these navigational methods. Ideally, the conclusions drawn from these analyses will be borne out by future experimental investigations. In particular, subjective listening assessments of these and other methods would be useful to both perceptually validate the present findings and identify other areas for improvement in these methods.

5 ACKNOWLEDGMENTS

This work was sponsored by the Sony Corporation of America. The authors wish to thank R. Sridhar for fruitful discussions throughout the work and P. Stitt for providing the MATLAB code for the precedence-effect-based energy vector model [31].

REFERENCES

[1] M. A. Poletti, "Three-Dimensional Surround Sound Systems Based on Spherical Harmonics," J. Audio Eng. Soc., vol. 53, no. 11, pp. 1004–1025 (2005 Nov.).

[2] N. Hahn and S. Spors, "Physical Properties of Modal Beamforming in the Context of Data-Based Sound Reproduction," presented at the 139th Convention of the Audio Engineering Society (2015 Oct.), convention paper 9468.

[3] F. Winter, F. Schultz, and S. Spors, "Localization Properties of Data-Based Binaural Synthesis Including Translatory Head-Movements," presented at the Forum Acusticum (2014 Sep.).

[4] J. G. Tylka and E. Y. Choueiri, "Comparison of Techniques for Binaural Navigation of Higher-Order Ambisonic Soundfields," presented at the 139th Convention of the Audio Engineering Society (2015 Oct.), convention paper 9421.

[5] E. G. Williams, Fourier Acoustics: Sound Radiation and Nearfield Acoustical Holography (Academic Press, 1999).

[6] T. Pihlajamäki and V. Pulkki, "Synthesis of Complex Sound Scenes with Transformation of Recorded Spatial Sound in Virtual Reality," J. Audio Eng. Soc., vol. 63, no. 7/8, pp. 542–551 (2015 Aug.).

[7] K. Wakayama, J. Trevino, H. Takada, S. Sakamoto, and Y. Suzuki, "Extended Sound Field Recording Using Position Information of Directional Sound Sources," presented at the 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 185–189 (2017 Oct.), https://doi.org/10.1109/WASPAA.2017.8170020.

[8] A. Allen, "Ambisonics Sound Field Navigation Using Directional Decomposition and Path Distance Estimation" (2019 Jan.), US Patent 10,182,303, https://patents.google.com/patent/US10182303B1/.

[9] E. Patricio, A. Rumiński, A. Kuklasiński, Ł. Januszkiewicz, and T. Żernicki, "Toward Six Degrees of Freedom Audio Recording and Playback Using Multiple Ambisonics Sound Fields," presented at the 146th Convention of the Audio Engineering Society (2019 Mar.), convention paper 10141.

[10] P. Grosche, F. Zotter, C. Schörkhuber, M. Frank, and R. Höldrich, "Method and Apparatus for Acoustic Scene Playback" (2018 May), WIPO (PCT) Patent App. WO/2018/077379.

[11] D. Rudrich, F. Zotter, and M. Frank, "Evaluation of Interactive Localization in Virtual Acoustic Scenes," 43. Jahrestagung für Akustik (DAGA 2017), pp. 279–282 (2017 Mar.).

[12] T. Deppisch and A. Sontacchi, "Browser Application for Virtual Audio Walkthrough," Forum Media Technology, pp. 145–150 (2017).

[13] O. Thiergart, G. Del Galdo, M. Taseska, and E. A. P. Habets, "Geometry-Based Spatial Sound Acquisition Using Distributed Microphone Arrays," IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 12, pp. 2583–2594 (2013 Dec.), https://doi.org/10.1109/TASL.2013.2280210.

[14] X. Zheng, Soundfield Navigation: Separation, Compression and Transmission, Ph.D. thesis, University of Wollongong (2013).

[15] A. Plinge, S. J. Schlecht, O. Thiergart, T. Robotham, O. Rummukainen, and E. A. P. Habets, "Six-Degrees-of-Freedom Binaural Audio Reproduction of First-Order Ambisonics with Distance Information," presented at the 2018 AES International Conference on Audio for Virtual and Augmented Reality (2018 Aug.), conference paper P6-2.

[16] J. G. Tylka and E. Y. Choueiri, "Soundfield Navigation Using an Array of Higher-Order Ambisonics Microphones," presented at the 2016 AES International Conference on Audio for Virtual and Augmented Reality (2016 Sep.), conference paper 4-2.

[17] F. Zotter, Analysis and Synthesis of Sound-Radiation with Spherical Arrays, Ph.D. thesis, University of Music and Performing Arts Graz (2009).

[18] C. Nachbar, F. Zotter, E. Deleflie, and A. Sontacchi, "ambiX – A Suggested Ambisonics Format," Proceedings of the 3rd Ambisonics Symposium (2011 June).

[19] The MathWorks, Inc., "Hamming Window," https://www.mathworks.com/help/signal/ref/hamming.html (2019) [Online; accessed 16 February 2019].

[20] J. Merimaa and V. Pulkki, "Spatial Impulse Response Rendering I: Analysis and Synthesis," J. Audio Eng. Soc., vol. 53, pp. 1115–1127 (2005 Dec.).

[21] P. Samarasinghe, T. Abhayapala, and M. Poletti, "Wavefield Analysis Over Large Areas Using Distributed Higher Order Microphones," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, pp. 647–658 (2014 Mar.), ISSN 2329-9290, https://doi.org/10.1109/TASLP.2014.2300341.

[22] N. A. Gumerov and R. Duraiswami, Fast Multipole Methods for the Helmholtz Equation in Three Dimensions (Elsevier Science, 2005).

[23] J. G. Tylka and E. Y. Choueiri, "Algorithms for Computing Ambisonics Translation Filters," Technical report, 3D Audio and Applied Acoustics Laboratory, Princeton University (2019 Feb.).

[24] P. C. Hansen, Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion (Society for Industrial and Applied Mathematics, 1998).

[25] Z. Průša, P. L. Søndergaard, N. Holighaus, C. Wiesmeyr, and P. Balazs, "The Large Time-Frequency Analysis Toolbox," http://ltfat.github.io (2012) [Online; accessed 16 February 2019].

[26] B. R. Glasberg and B. C. J. Moore, "Derivation of Auditory Filter Shapes from Notched-Noise Data," Hearing Research, vol. 47, no. 1–2, pp. 103–138 (1990 Aug.).

[27] Z. Schärer and A. Lindau, "Evaluation of Equalization Methods for Binaural Signals," presented at the 126th Convention of the Audio Engineering Society (2009 May), convention paper 7721.

[28] J. G. Tylka and E. Y. Choueiri, "Models for Evaluating Navigational Techniques for Higher-Order Ambisonics," Proc. Mtgs. Acoust., vol. 30, no. 1, p. 050009 (2017 Oct.), https://doi.org/10.1121/2.0000625.

[29] P. Stitt, S. Bertet, and M. van Walstijn, "Extended Energy Vector Prediction of Ambisonically Reproduced Image Direction at Off-Center Listening Positions," J. Audio Eng. Soc., vol. 64, no. 5, pp. 299–310 (2016 May).

[30] P. Zahorik, D. S. Brungart, and A. W. Bronkhorst, "Auditory Distance Perception in Humans: A Summary of Past and Present Research," Acta Acustica united with Acustica, vol. 91, no. 3, pp. 409–420 (2005 May).

[31] P. Stitt, "Matlab Code," https://circlesounds.wordpress.com/matlab-code/ (2016) [Online; accessed 16 February 2019].


APPENDIX

A.1 RELEVANT AMBISONICS THEORY

In the free field (i.e., in a region free of sources and scattering bodies), the acoustic potential field satisfies the homogeneous Helmholtz equation and can therefore be expressed as an infinite sum of regular (i.e., not singular) basis solutions. In ambisonics, these basis solutions are given by $j_l(kr) Y_n(\hat{r})$, and the sum, also known as a spherical Fourier-Bessel series expansion, is given by [23, Ch. 2]

$$\psi(k, \vec{r}) = \sum_{n=0}^{\infty} 4\pi (-i)^l A_n(k)\, j_l(kr)\, Y_n(\hat{r}), \quad (A.31)$$

where $A_n$ are the corresponding (frequency-dependent) expansion coefficients and we have, without loss of generality, factored out $(-i)^l$ to ensure conjugate-symmetry in each $A_n$, thereby making each ambisonics signal (i.e., the inverse Fourier transform of $A_n$) real-valued for a real pressure field.

The ambisonics encoding filters for a point source located at $\vec{s}_0$ and expanded about the origin are given in the frequency domain by [1, Eq. (10)]

$$A_n(k) = i^{l+1} k\, h_l(k s_0)\, Y_n(\hat{s}_0), \quad (A.32)$$

where $h_l$ is the (outgoing) spherical Hankel function of order l.
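Eq. (A.32) is also the expression used in Section 2.1 to generate the artificial ambisonics impulse responses; a first-order sketch (the explicit N3D harmonic expressions below are standard but stated here without derivation, and the function name is ours):

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn

def encode_point_source(k, s0):
    """First-order ACN/N3D encoding filters A_n(k) = i^(l+1) k h_l(k s0) Y_n(s0_hat),
    per Eq. (A.32). Real orthonormal harmonics: Y0 = 1/sqrt(4 pi) and
    (Y1, Y2, Y3) = sqrt(3/(4 pi)) * (y, z, x) of the unit source direction."""
    s = np.linalg.norm(s0)
    x_hat, y_hat, z_hat = s0[0] / s, s0[1] / s, s0[2] / s
    Y = np.array([1 / np.sqrt(4 * np.pi),
                  np.sqrt(3 / (4 * np.pi)) * y_hat,
                  np.sqrt(3 / (4 * np.pi)) * z_hat,
                  np.sqrt(3 / (4 * np.pi)) * x_hat])
    h = lambda l: spherical_jn(l, k * s) + 1j * spherical_yn(l, k * s)  # outgoing h_l
    l = np.array([0, 1, 1, 1])                                          # ACN degrees
    return (1j ** (l + 1)) * k * np.array([h(li) for li in l]) * Y
```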

THE AUTHORS


Joe Tylka is a recent Ph.D. recipient from the 3D Audio and Applied Acoustics Laboratory at Princeton University, where his dissertation research explored binaural rendering and virtual navigation of recorded 3D sound fields. Joe's research interests include machine listening and acoustic signal processing.

Edgar Choueiri is a professor of applied physics in the Department of Mechanical and Aerospace Engineering at Princeton University and associated faculty in the Department of Astrophysical Sciences. He heads Princeton's Electric Propulsion and Plasma Dynamics Lab and the 3D Audio and Applied Acoustics Lab. His research interests are plasma physics, plasma propulsion, acoustics, and 3D audio.
