is my decoder ambisonic?

27
Is My Decoder Ambisonic? Aaron J. Heller Artificial Intelligence Center, SRI International Menlo Park, CA 94025, USA Email: [email protected] Richard Lee Pandit Littoral Cooktown, Queensland 4895, AU Email: [email protected] Eric M. Benjamin Dolby Laboratories San Francisco, CA 94044, USA Email: [email protected] Revision 2 Abstract In earlier papers, the present authors established the importance of various aspects of Am- bisonic decoder design: a decoding matrix matched to the geometry of the loudspeaker array in use, phase-matched shelf filters, and near-field compensation [1, 2]. These are needed for accurate reproduction of spatial localization cues, such as interaural time difference (ITD), interaural level difference (ILD), and distance cues. Unfortunately, many listening tests of Ambisonic reproduction reported in the literature either omit the details of the decoding used or utilize suboptimal decoding. In this paper we review the acoustic and psychoacoustic criteria for Ambisonic reproduc- tion, present a methodology and tools for “black box” testing to verify the performance of a candidate decoder, and present and discuss the results of this testing on some widely used decoders. Keywords: Ambisonics, decoder design, listening tests, surround sound, psychoacoustics NOTE: This is a revised and corrected version of the paper presented at the 125th Convention of the Audio Engineering Society, held October 1 – 5, 2008 in San Francisco, CA USA $Id: imda-revised.tex 25737 2012-06-10 23:40:07Z heller $ 1

Upload: others

Post on 02-Feb-2022

8 views

Category:

Documents


0 download

TRANSCRIPT

Is My Decoder Ambisonic?

Aaron J. HellerArtificial Intelligence Center, SRI International

Menlo Park, CA 94025, USAEmail: [email protected]

Richard LeePandit Littoral

Cooktown, Queensland 4895, AUEmail: [email protected]

Eric M. BenjaminDolby Laboratories

San Francisco, CA 94044, USAEmail: [email protected]

Revision 2

Abstract

In earlier papers, the present authors established the importance of various aspects of Am-bisonic decoder design: a decoding matrix matched to the geometry of the loudspeaker arrayin use, phase-matched shelf filters, and near-field compensation [1, 2]. These are needed foraccurate reproduction of spatial localization cues, such as interaural time difference (ITD),interaural level difference (ILD), and distance cues. Unfortunately, many listening tests ofAmbisonic reproduction reported in the literature either omit the details of the decoding usedor utilize suboptimal decoding.

In this paper we review the acoustic and psychoacoustic criteria for Ambisonic reproduc-tion, present a methodology and tools for “black box” testing to verify the performance ofa candidate decoder, and present and discuss the results of this testing on some widely useddecoders.

Keywords: Ambisonics, decoder design, listening tests, surround sound, psychoacoustics

NOTE: This is a revised and corrected version of the paperpresented at the 125th Convention of the Audio EngineeringSociety, held October 1 – 5, 2008 in San Francisco, CA USA

$Id: imda-revised.tex 25737 2012-06-10 23:40:07Z heller $

1

1 INTRODUCTIONThis paper is about testing Ambisonic de-coders. The decoder is the component of anAmbisonic reproduction system that derives theloudspeaker signals from the program signals.Unlike most other surround sound systems inwhich each channel of a recording is intendedto drive a single loudspeaker directly, an Am-bisonic recording can be played back on a va-riety of speaker layouts, both 2-D and 3-D, byusing an appropriate decoder.

A key feature of Ambisonic theory is that itprovides a mathematical encapsulation of prac-tically all known auditory localization models,except the pinna coloration and impulsive (high-frequency) interaural time delay models. Thesemathematical descriptions can be used to provetheorems about surround sound recording andreproduction, predict what spatial informationcan and cannot be conveyed by a particular sys-tem, guide the design of decoders, and as dis-cussed in this paper, evaluate and validate im-plementations.

We assume that the reader is familiar withthe basic workings of surround sound in gen-eral and Ambisonics in particular. Backgroundmaterial on these topics, as well as sample Am-bisonic recordings, can be found at various web-sites [3, 4].

Our interest in determining whether or not agiven decoder meets the criteria for Ambisonicreproduction is motivated by practical consider-ations. When we first conducted listening tests,we did what many do: obtained some record-ings made with a Soundfield microphone, set upsix loudspeakers in a hexagon, downloaded a de-coder off the Internet, and listened with the de-fault settings. What we experienced was quiteconfusing completely ambiguous localizationand severe comb filtering artifacts from slighthead movements. Over the next few listeningsessions, we tried other software decoders andother settings with different but equally unsat-isfying results. Had we not had previous expe-

rience with good Ambisonic reproduction, wemight have stopped there and written off Am-bisonics as yet another failed surround soundtechnology.

Instead, we went to the benchmark ofgood Ambisonic playback, what are known in-formally as Classic Ambisonic Decoders thehardware-based decoders designed by the origi-nal Ambisonics team [5] and built up an offline,file-to-file decoding workflow that mimicked theprocessing performed by those decoders. Sinceeach step produced an intermediate file, we wereable to verify that our implementation was per-forming as expected. The techniques describedin this paper are a formalization and extensionof this verification process.

Finally, by using a playback program thatprovided synchronized playback of a number offiles and rapid switching among them, we wereable to gain an understanding of the effects ofeach of the key components in an Ambisonicdecoder: a decoding matrix matched to the ge-ometry of the loudspeaker array in use, phase-matched shelf filters, and near-field compensa-tion (NFC). These listening tests demonstratedthat using the correct decoder results in dramat-ically improved performance [1, 2].

A number of recent papers have reportedon the results of Ambisonic listening tests thathave used decoders that are clearly faulty oremployed incorrectly. As an example, in ref-erence [6] the authors compare various spatial-ization techniques, including Ambisonics. Themethodology used was well thought out, but un-fortunately the software used to decode the Am-bisonic program material may not have been themost appropriate:

“The ‘in-phase’ ambisonic decoderwas selected as it is recommendedfor larger rooms and listening areas,preventing anti-phase signals to befed to the loudspeaker opposite tothe sound source.”

Later in the paper, the authors conclude that

2

Ambisonics provides poor localization. How-ever, given that the listening tests were per-formed with single listeners using a speakerarray with 2-meter radius, the best (known)methodology for decoding would have been toperform exact, or velocity decoding at low fre-quencies, energy decoding at middle and highfrequencies and use near-field compensation.

Another recent paper discusses “spectralimpairment” in Ambisonic reproduction [7].While not stated explicitly, a careful reading re-veals that exact decoding was used over the en-tire frequency range. This is known to producecomb-filtering artifacts, and again, better decod-ing strategies are known.

Craven sums up the current situation as fol-lows [8],

There may be doubt about themeaning of the term “Ambison-ics”. The precise definition givenin [Sec. 1.1 of this paper] seems tohave been largely ignored, and theterm “ambisonic” is now appliedloosely to any system that makesuse of circular or spherical harmon-ics.

Most software decoders have many adjust-ments, but their authors provide little or no guid-ance on appropriate settings for various play-back situations, making it difficult for a user toknow if they are functioning correctly withoutextensive listening tests. We have read many ac-counts of “phasey”, “ambiguous”, or “unpleas-ant” Ambisonic reproduction that can be at-tributed to this problem. In particular, phaseyreproduction will occur when exact velocity de-coding is used at higher frequencies, where thewavelengths are smaller than the inter-ear dis-tance.

The key point here is that it is not enough tosimply specify that an Ambisonic decoder wasused; not all decoders or decoder philosophiesperform in the same way. It is also worth tonoting that various workers in the field may not

want to design a decoder; they simply want toverify that an existing one works properly andthen use it.

Good engineering practice dictates that thebehaviors of the individual components of a sys-tem under test be verified so that its overall per-formance can be properly characterized. Whilethe design criteria have been outlined or impliedin many papers, we have found no discussion oftools or methodologies to assess how well theyhave been met in a given implementation.

We confine the discussion in this paper to de-coders suitable for one or two listeners.1 In thispaper we test only horizontal, first-order Am-bisonic decoders; however, the extensions forfull 3-D reproduction (periphony) and arbitraryorders are straightforward.

There are a number of additional factors, anyof which can have a large effect on the qualityof playback but are beyond the scope of whatis discussed here, such as room acoustics, accu-racy of speaker positioning and matching, tim-ing skew in multichannel D/A converters, and soforth. Simply due to the number of interconnec-tions, speakers, and amplifiers in a typical Am-bisonics playback system, the odds of making asetup error are much higher than in the case ofstereo and the faults more difficult to diagnosethan, for example, a channel reversal in stereoreproduction.

Due to space limitations, we test just fourdecoders and a single speaker configuration, the√

3 : 1 rectangle. In this configuration there arefour speakers at azimuths ±30 and ±150 de-grees in the horizontal plane. It was preferredover a square layout in previous listening tests,as well as being easier to fit in most domesticrooms.

1Design of decoders that work well over large areas isa distinct art and in general involves additional constraintsthat compromise their performance for small areas. [9]

3

1.1 Definition of AmbisonicsIn summary, we are trying to decide if a givendecoder and loudspeaker configuration meet thecriteria for Ambisonic reproduction as definedby Gerzon in [10].

A decoder or reproduction systemfor 360◦ surround sound is definedto be Ambisonic if, for a central lis-tening position, it is designed suchthat

i) velocity and energy vec-tor2directions are the same atleast up to around 4 kHz, suchthat the reproduced azimuthθV = θE is substantiallyunchanged with frequency,

ii) at low frequencies, say belowaround 400 Hz, the magnitudeof the velocity vector is nearunity for all reproduced az-imuths,

iii) at mid/high frequencies, saybetween around 700 Hz and4 kHz, the energy vector mag-nitude, rE , is substantiallymaximised across as large apart of the 360◦ sound stage aspossible.

We feel that these are necessary, if perhaps notsufficient, conditions for good surround soundreproduction.

2 REVIEW OF AMBISONIC CRITERIAGerzon defines two primitive models that arecharacterized by the velocity localization vec-tor (rV) and energy localization vector (rE).These models encapsulate the primary Interau-ral Time Difference (ITD) and Interaural LevelDifference (ILD) theories of auditory localiza-tion. The direction of each indicates the direc-

2Precise definitions of these are given in Sec. 2.

tion of the localization perception, and the mag-nitude indicates the quality of the localization.In natural hearing, from a single source the mag-nitude of each is exactly 1 and the direction isthe direction to the source.

Ideally, both types of cue will be accuratelyrecreated by a multispeaker playback environ-ment and they will be in agreement with eachother. In terms of Gerzon’s models this meansthat rV and rE should agree in direction up toaround 4 kHz; that below 400 Hz, the magnitudeof rV is near unity for all reproduced directions;and that between 700 Hz and 4 kHz, |rE| is max-imized over as many reproduced directions aspossible. |rE| achieves a maximum value of 1for a single source and is always less than 1 formultiple sources. Gerzon observes that a valueless than 0.5 “gives rather poor image stability.”[11]

Following Gerzon [12], the magnitude anddirection of the velocity vector, rV and rV, at thecenter of a speaker array with n speakers is

rV rV = Re∑n

i=1 Giui

∑ni=1 Gi

(1)

whereas the magnitude and direction of the en-ergy vector, rE and rE are computed by

rE rE = ∑ni=1(GiGi

∗)ui

∑ni=1(GiGi

∗)(2)

where the Gi are the (possibly complex) gainsfrom the source to the i-th speaker, u is a unitvector in the direction of the speaker, and Gi

∗ isthe complex conjugate of Gi.

The main goal of the test protocol outlinedin Sec. 3 is to recover the Gi’s used by the de-coder under test for a given source direction andspeaker configuration. In the general case, theyvary with frequency; hence, Gi and GiGi

∗ can bethought of as the complex frequency and energyresponses of the decoder for a particular direc-tion.

The remaining parameters are the imaginary

4

parts of velocity localization vector

Im∑n

i=1 Giui

∑ni=1 Gi

(3)

which correspond to “phaseyness” arising fromusing filters whose phase responses are notmatched. The most important part of this is theY -component, the direction of the ear axis, overthe frequency range 300 to 1500 Hz, and shouldbe as close to zero as possible [12].

In general, optimizing the rV and rE vectorsrequires the use of a different decoding matrixfor each frequency range. This can be accom-plished with shelf filters or band-splitting filterssimilar to those used in loudspeaker crossovers.In either case, it is imperative that the filters arephase matched to preserve uniform frequencyresponse over all directions.

Last, near-field compensation corrects forthe reactive component of the reproducedsoundfield when the listening position is withina few meters of the loudspeakers.

3 TEST PROTOCOLIt is not necessarily straightforward to determinewhether a decoder is operating optimally sim-ply by inspecting the software or listening tothe output. They must be tested in order toverify that the desired characteristics have beenachieved.

To do that, a test signal was created con-sisting of unit impulses at 216-sample intervals.This signal was encoded according to the B-format conventions (see Appendix B) to createa series of unit impulses from varying source di-rections. This test signal is the equivalent to theoutput of a virtual soundfield microphone witha virtual source that is moved from one angu-lar position to the next. The original series ofimpulses is included on an additional channel inthe test file to act as a sync signal to simplify thelater analysis. A plot of the test file is shown inFigure 1. At 48 kHz sample frequency, the play-

ing time of this file is 109.2 seconds. 3

This file is then applied directly to the in-put of a software decoder, or played out thougha multichannel soundcard into a hardware de-coder, and the output recorded for subsequentanalysis. In either case the intermediate outputof the testing process is a file containing the re-sultant loudspeaker feeds derived by the decoderfor the particular speaker configuration. Thesync signal is recorded directly into the outputfile, without passing through the decoder undertest. A screen capture showing this process isshown in Figure 2.

In the current work, 72 horizontal directionsare used and the four loudspeaker feeds capturedto the file, resulting in 288 impulse response(IR) measurements for each decoder configura-tion tested.

To perform the analysis, the complex fre-quency and energy responses are computed foreach IR, yielding the Gis needed to compute rVand rE according to Eqns. 1 and 2. By exam-ining these results, we can evaluate the decoderagainst Gerzon’s criteria for Ambisonic repro-duction as well as our other criteria.

It is worth noting that a number of methodscan be used to measure the impulse responseof a system. A survey of these techniques canbe found in Stan, et al. [15]. Any of thesetechniques should work for this analysis. Forour current purposes, we have selected the sim-plest one since it provides adequate signal-to-noise (S/N) ratio for direct testing of softwaredecoders and removes the deconvolution pro-cess as a potential source of errors. To test hard-ware decoders, more sophisticated IR measure-ment techniques, such as MLS or Sine Sweep,are needed to deal with the lower S/N ratio andpossibly higher distortion levels found in analogcircuitry.

3Matlab code to generate this test file, along with thecode discussed in Sec. 3.1 is available on our website [13].

5

Figure 1: A plot of the B-format impulses file encoding an impulse from 72 source directions inthe horizontal plane. The signals from top to bottom are W , X , Y , Z, and original impulses. Theyare offset vertically on the plot for clarity. The original impulse is included as a sync pulse that isrecorded directly to the output file to simplify the analysis process. The first and last four impulsesin the file excite each B-format component of the decoder independently. These are not used in thecurrent analysis.

3.1 AnalysisWe have created a set of tools in Matlab to an-alyze the recorded impulse responses and pro-duce various plots, which give us insight intothe behavior of the decoders. All the Matlabcode used in this paper is available on our web-site [13].

The first function, read_spkr_imps, readsthe speaker feeds file, normalizes the rangeof the data to fullscale = 1.0, extracts thesync pulses from the sync track, and then usesthem to extract the individual impulses. It re-turns a 216 × 72 × 4 real-valued array, calledimps_dir_spkr in this example. The indicesto each dimension represent sample number,source direction in 5 degree increments, and

speaker. It also returns the sample rate of thefile, Fs. Optional return values are the raw sam-ples and an array containing the locations of thesync pulses.

[ imps_dir_spkr Fs ] = ...

read_spkr_imps( file );

The next function, compute_ffts_imps, takesthe imps_dir_spkr array as input and com-putes the FFT of each impulse. This returnsa complex valued array, with the same indicesas above, but with frequency instead of samplenumber. It also returns the length of the FFT. Byslicing though this array along various dimen-sions, we obtain the data we need to computethe parameters of interest.

[ ffts_dir_spkr NFFT ] = ...

6

Figure 2: A screen capture of Plogue Bidule [14] being used as a test harness for a decoder avail-able as a VST plug-in. This works for Windows and MacOSX, which also supports Apple AudioUnit plug-ins. A similar setup using Jack and Ecasound is used to test Linux-hosted decoders.

compute_ffts_imps( imps_dir_spkr );

Next, the speaker positions are specified. In the√3 : 1 rectangular array used here, there are four

loudspeakers, with 60◦ separation between thepairs of speakers in the front and rear. It is im-portant that the order of these corresponds to theorder of the recorded speaker signals in the datafile.

phi = pi/6; % frontal spkrs half-angle

speaker_weights = [ ...

1, cos( phi), sin( phi), 0 ;

1, cos(pi-phi), sin(pi-phi), 0 ;

1, cos(pi+phi), sin(pi+phi), 0 ;

1, cos( -phi), sin( -phi), 0 ];

The next functions compute the unnormalizedcomponents (Pressure, X, Y, and Z) of rV andrE, by summing the ffts and the ffts squaredweighted by the speaker locations. Vpxyz andEpxyz are indexed by frequency, direction, andcomponent. The values in Vpxyz are complex,those in Epxyz are real.

Vpxyz = sum_pxyz( ffts_dir_spkr, ...

speaker_weights );

Epxyz = sum_pxyz( ffts_dir_spkr .* ...

conj(ffts_dir_spkr), ...

speaker_weights );

The absolute value of Vpxyz is used to exam-ine the frequency response of the pressure andvelocity components to determine whether ornot the decoder under test implements near-fieldcorrection and dual-band processing. We alsocompare the frequency and phase responses invarious directions to check that they are consis-tent.

rV and rE are now computed by normaliz-ing by the pressure component and convertingto spherical coordinates to yield the directionand magnitude. rVcart is complex. The realparts comprise rV. The imaginary parts, and inparticular the one parallel to the Y -axis, indicatephaseyness due to use of filters that are not phasematched.

[ rVsph rVcart ] = r_from_pxyz( Vpxyz );

[ rEsph rEcart ] = r_from_pxyz( Epxyz );

At this point, polar plots of rV and rE are createdat various frequencies and evaluated according

7

to the Ambisonic criteria discussed in Sec. 2.Moore and Wakefield have published schemesfor prioritizing and weighting these criteria toproduce a single figure of merit suiteable for usein numerical optimization methods [16, 17].

4 KEY COMPONENTS OF ANAMBISONIC DECODER

All decoders must perform the fundamentalfunction of forming suitable linear combinationsof the B-format signals for each loudspeaker inthe array that reproduces the pressure and parti-cle velocity at the central position in the array.This set of linear combinations is called the ex-act or matching decoder matrix. It is also calledthe basic or systematic solution of the speakerarray or simply the velocity decode. Regardlessof what it is called, it is unique for each loud-speaker array geometry.

In general, there are three types of loud-speaker arrays:

1. regular polygons and polyhedra, such assquare, hexagon, cube, dodecahedron

2. irregular layouts but with speakers in dia-metrically opposite pairs, such the

√3 : 1

rectangle tested here and its 3-D coun-terparts the bi- and tri-rectangular arrays.This are also called semiregular arrays insome literature.

3. general irregular arrays such as an ITU 5.1array or hemispherical dome.

In all cases the number of loudspeakers must ex-ceed the number of B-format signals.

A method for deriving the decoder matrixfor the first two types is given in Appendix A.Methods for the third type remain an area ofopen research [10, 18, 19]. In the case of strictregular arrays (type 1), this reduces to the re-sult that the decoding and encoding matrices areidentical, with the speaker positions substitutedfor the source positions. The single most perva-sive error in Ambisonic decoder design and use

is assuming that also holds for irregular arrays.Sec. 5.3 discusses the effect of this error. In ourexperience, most software Ambisonic decodersthat can be downloaded are of this type.

The exact decoder matrix recreates the pres-sure and velocity at the central position underthe assumption that the wavefronts are planar,i.e., sources and loudspeakers at an infinite dis-tance. Sources and loudspeakers at finite dis-tances produce wavefronts with a “reactive” (orimaginary) component, which is perpendicularto the direction of propagation, in addition tothe “real” component, which is parallel to the di-rection of propagation. This results in the well-known bass-boosting proximity effect in direc-tional microphones. It is important to under-stand that this is an actual physical effect, not adesign flaw in the microphone or loudspeaker.4

For point sources, the frequency at which the re-active and real components are equal is given by

f =c

2πd(4)

where c is the speed of sound and d is the dis-tance from the loudspeaker [20].

In terms of the velocity localization vector,this makes rV > 1 at low frequencies, which hasthe effect of widening the source images andmaking them unstable with head rotation. Thisartifact is most apparent in recordings of stringtrios and quartets, where the cello sounds as if itis somewhat larger and closer than the other in-struments. To compensate for this, a single-polehigh-pass filter is applied to the velocity signalsin the decoder. We call this near-field compen-sation. This is also refered to as distance com-pensation in the older literature. The design of

4The implication for B-format signal encoding is thatthe X, Y, and Z signals must have a low-frequency boostand phase shift relative to the W signal, the amount ofwhich is a function of the source distance. For natu-ral acoustic sources, a properly aligned Soundfield-typemicrophone does this by virtue of accurate transductionof the incident wavefronts, and thereby encodes distance.For synthetic sources, this must be included in the encod-ing equations. Further discussion of this is in Appendix B.

8

this filter is covered in Appendix A.This exact reproduction of acoustic pressure

and velocity is equivalent to the condition rV = 1in Gerzon’s velocity localization model. In the-ory that happens at only a single point in space;however, in practice, it is good enough up toroughly one-half wavelength from the centralposition. If we desire reconstruction over anarea of 0.5-meter, the exact decoder matrix canbe used up to roughly 350 Hz and correspondsto the frequency regime of ITD-based auditorylocalization. If it is used beyond that frequency,comb filter artifacts and in-head localization ef-fects will be experienced by the listener. This isprobably the second most common error in Am-bisonic reproduction.

At higher frequencies, say 700 to 4000 Hz,ILD-based auditory localization models are ap-propriate, which Gerzon encapsulates in the en-ergy localization vector, rE. The one parameterthat can be changed in the exact solution with-out changing the direction of the velocity local-ization vector, rV, is the velocity-to-pressure ra-tio (i.e, the gain of X, Y, and Z vs. W), usuallydenoted by k.5 Writing the magnitude of the en-ergy localization vector, rE , as a function of k,for any regular 2-D polygonal array with at leastfour speakers, we get

rE(k) =2k

2k2 +1(5)

which attains its maximum value when k =√2

2 ≈ 0.7071≈−3.01 dB.6 In the case of a regu-lar 3-D polyhedral array, with at least six speak-ers, we get

rE(k) =2k

3k2 +1(6)

which attains its maximum value when k =√3

3 ≈ 0.5774 ≈ −4.77 dB. Figure 3 showsgraphs of these equations. Solutions with these

5k is equivalent to the inverse of the specific acousticimpedance.

6found by setting the derivative with respect to k equalto zero and finding the roots

values of k are often called “Max-rE” or “energydecodes.”

Next we must apportion the total betweenboost for pressure and cut for velocity in sucha way that the overall loudness and balancebetween low and high frequencies is main-tained. One approach is preserving the root-mean-square (RMS) level.7 In the 2-D case

W 2 +X2 +Y 2 = 3 at both LF & HF (7)

XW

=YW

=√

22

at HF only (8)

solving for the HF gains

W =

√32≈ +1.76 dB (9)

X = Y =

√34≈−1.25 dB (10)

In the 3-D case

W 2 +X2 +Y 2 +Z2 = 4 at both LF & HF

(11)

XW

=YW

=ZW

=√

33

at HF only (12)

solving for the HF gains

W =√

2 ≈ +3.01 dB (13)

X = Y = Z =

√23≈−1.76 dB (14)

Note that the gains derived here for the 3-D case differ from those given by Gerzon in

7In our listening tests with small numbers of loud-speakers, this provides a tonal balance similar to a single-band energy decode. However, Adriaensen has observedthat with a 16-speaker array reproducing first- and second-order material, there is a quite apparent low-frequencyboost unless the decoder is set up assuming that pressuresare added [21]. This is because very low frequencies sumcoherently, so equalizing the RMS levels is not correct inthat regime. The effect is also present in smaller arrays,but to a lesser extent, and results in the subjective impres-sion of extended bass. The best way to treat this is an openquestion at this time.

9

[11]. We have auditioned both sets using a bi-rectangle array and find that the present onesyield a similar tonal balance to 2-D playbackon a rectangular rig with similar aspect ra-tio, whereas Gerzon’s provide a distinctly morerolled off and spatially diffuse balance.

For semiregular speaker arrays (“type 2”),the magnitude of rE varies in direct proportionto the angular density of the loudspeakers in agiven direction, but for first-order Ambisonicsthe average value cannot exceed

√2

2 for hori-

zontal arrays and√

33 for 3-D arrays. Gerzon

notes that√

33 is “perilously close to being un-

satisfactory” [11]. However, in most periphonic(with height) systems, practical and domesticconsiderations often dictate that there will bemore speakers in near horizontal than verticaldirections. Localization is better in directionswith more speakers hence, our preference for arectangle horizontal array over a square for pre-dominantly frontal source material. However,Ambisonic systems have a clear advantage overother surround systems in that ambient/diffusesounds are still perceived realistically even fromdirections with “poor localization.”

A physical interpretation of the energy de-code is that for a square array, a source directlyahead (azimuth zero), is reproduced with equalgain in the two front speakers and with zero gainin the two rear speakers. That is, the virtual mi-crophone pattern formed by the gains from a di-rectly frontal source to the speakers is a near-supercardioid, with the two nulls at the angu-lar locations of the rear speakers. The same istrue in 3-D of a cube array; a frontal sound withazimuth and elevation zero, is reproduced withequal gain in the front speakers and with zerogain in the rear speakers.

This suggests that for optimal reproductiontwo decoding matrices are needed, one for lowfrequencies with k = 1 and another for highfrequencies, with k =

√2

2 or√

33 for the 2-D

and 3-D cases, respectively. Classic AmbisonicDecoders used phase-matched shelf filters to

Figure 3: Plots of rE as a function of thevelocity-to-pressure ratio k. The top curveshows the 2-D case, the bottom curve shows the3-D case. The maximum values are

√2

2 and√

33 ,

respectively.

“morph” the exact solution into the energy so-lution. A more flexible strategy, first suggestedby Barton [10], is to split each B-format sig-nal into two bands so that two independent so-lutions can be used. We call this a dual-banddecoder. It requires the use of phase-matchedband-splitting filters similar to those used inloudspeaker crossover networks. The design ofsuch filters is discussed in Appendix A.

In summary, all the key components

• decoding matrices matched to the speakerarray geometry

• near-field compensation

• frequency-dependent gains (shelf filters ora dual-band) optimizing for ITD and ILDcues using phase-matched filters

are needed to satisfy various localization mecha-nisms. It is this compensation for important psy-choacoustic phenomena by simple means thatmakes a decoder Ambisonic.

Anecdotal evidence suggests that us-ing shelf or band-splitting filters with smallphase-matching errors is better than omitting

10

frequency-dependent processing all together.This remains to be confirmed with formallistening tests, however. Finally, if one mustuse a decoder without frequency-dependentprocessing, the best result will be obtainedusing the energy-optimized values of k for allfrequencies [1].

5 EXAMPLESWe review general classes of decoders and dis-cuss the results of testing on four samples. Twoperform well and two do not perform well forthe given speaker configuration, the

√3 : 1 rect-

angle.

5.1 Types of DecodersBeyond the issues of operating system, audioand user interfaces, the main distinguishing at-tributes of decoders are which of the three nec-essary components discussed in Sec. 4 are im-plemented and how the decoding matrix is spec-ified to the decoder.

Some decoders provide presets such assquare, rectangle, pentagon, hexagon, cube, do-decahedron, and so forth. With others the angu-lar position of the speaker is entered along withthe directivity of a virtual microphone or gainsof the various orders of spherical harmonics. Inthe third type, the decoding matrix is specifieddirectly.

5.2 Decoder 1Decoder 1 is Adriaensen’s AmbDec [22]. Ver-sion 0.2.0 was tested on a dual AMD Athlonmachine running Fedora Core 8 and the PlanetCCRMA distribution of audio software [23].It has provisions for near-field compensation,dual-band processing, and a number of otherfeatures. The decoding matrices are specified di-rectly in the configuration file. The distributioncontains presets files for common loudspeakerarrays, but does not include the

√3 : 1 array.

It was tested with decoding matrices derived bythe procedure outlined in Appendix A. Distance

was set to 2.0 meters and dual-band decodingwas turned on with 380 Hz crossover.

Results are shown in Figure 4. Examiningthe frequency and phase response graph, we seethat NFC is implemented and has the correct -3 dB point for 2 meter distance (27 Hz). TheNFC (correctly) has a large effect on the low-frequency gain and phase response of the ve-locity signals, which makes the magnitude ofrV less than 1 (0.91) and introduces a phase-matching error between pressure and velocity.These are intended to be the complement of thenear-field effect of the loudspeakers, so that atthe listening position, the magnitude of rV willbe exactly 1 and the phase mismatch will be 0.

To examine the frequency and phase re-sponse of the band-splitting filter in isolation,we ran a second test with NFC turned off.These results are shown in Figure 5. Frequency-dependent gains are implemented with the cor-rect values of k, and the phase responses of thefilters are matched. Also note that the pressureand velocity gains are identical over all sourceazimuths.

Examining the polar plots of rV and rE, wesee that all source azimuths are rendered cor-rectly at both high and low frequencies. Atlow frequencies the magnitude of rV is (almost)1 and at high frequencies the magnitude of rEvaries smoothly and has the highest attainableaverage for a first-order decoder of

√2

2 . Themagnitude of rV is slightly less than 1 (0.95) be-cause the shelf filters are already affecting thegain at 150 Hz, as seen in Figure 5(c).

This decoder and configuration is Am-bisonic.

5.3 Decoder 2Decoder 2 is a VST plugin that was tested on aMacBook Pro running OS X 10.5 using Biduleas a host program.8 The GUI has sliders used

8We do not identify Decoder 2 since many decodersappear to use the same underlying approach. We suspectthat any one of them would have produced results no bet-

11

(a) rV and rE measured at 150 Hz (b) rV and rE measured at 3 kHz

(c) frequency and phase response

Figure 4: AmbDec with configuration derived by the procedure given in Appendix A. This is avery good result. (a) and (b) show rV and rE at 150 Hz and 3 kHz. Source directions are correct andmatched. The magnitude of rV is uniform in all directions and rE at 3 kHz attains an average valueof

√2

2 . (c) shows that NFC and dual-band processing is implemented. The next figure shows thesame configuration with NFC switched off so that frequency and phase response can be examinedin isolation.

12

(a) rV and rE measured at 150 Hz (b) rV and rE measured at 3 kHz

(c) frequency and phase response

(d) phase mismatch between pressure and velocity

Figure 5: AmbDec with NFC switched off. Only three lines are seen in (c) because the pressureand velocity phase responses are identical.

13

to specify azimuth, elevation, directivity, anddistance for each loudspeaker. There is also aswitch to turn shelf filters on and off. The shelv-ing frequency is not listed in the documentation,nor is there any mention of near-field compensa-tion. It was tested with shelf filters switched onand the azimuths of the four speakers entered onthe sliders. All other settings were left in theirdefault positions.

Testing results are shown in Figure 6. Nonear-field compensation is provided. Shelf fil-ters are implemented with a turnover frequencyof about 415 Hz. The magnitude of rV varieswith source direction, and its direction does notmatch the source direction. For some source di-rections rV > 1.0. Frequency response varieswith source direction. At 3 kHz, the magni-tude of rE is < 0.5 from 45 to 135 degrees az-imuth. Note that over these azimuths, the shelffilters merely make a suboptimal rE even worse.We emphasize that shelf filters will have the in-tended effect only when paired with the correctdecoder matrix for the speaker array geometry.There is a small (15◦) error in phase matchingnear the crossover frequency.

In summary, this decoder is not suitable foruse with this speaker configuration when set upaccording to the instructions provided.

To be fair, it is possible to derive appropri-ate virtual microphone angles and directivitiesfrom the exact decoder matrix, enter those intodecoders of this type and obtain somewhat betterperformance than we observe here. In the gen-eral case, the virtual microphones will not pointat the speaker positions. See Sec. A.1.2 for fur-ther discussion and examples.

5.4 Decoder 3Decoder 3 is Csound’s bformdec Opcode.9

Csound is a computer programming languagefor dealing with sound, also known as a sound

ter than the one we happened to test.9Csound tests were performed by Sven Bien from Bre-

men University.

compiler or an audio programming language[24]. The sound processing elements are calledopcodes, which are connected and invoked us-ing orchestra and score files. The documenta-tion for bformdec says “New in Version 5.07(October 2007).” This opcode does not haveprovision for the rectangular configuration weare using; however, it is still useful to includethis test as an example of a decoder that is inwidespread use. The configuration tested is thesquare speaker layout.

The test was conducted with the followingorchestra file:

sr = 48000

kr = 4800

ksmps = 10

nchnls = 5

instr 1

a1 soundin "bf-1-sync.wav"

aw soundin "bf-2-w.wav"

ax soundin "bf-3-x.wav"

ay soundin "bf-4-y.wav"

az soundin "bf-5-z.wav"

a2, a3, a4, a5 bformdec 2, aw, ax, ay, az

outc a1, a2, a3, a4, a5

endin

Test results are shown in Figure 7.The most apparent feature of these results is

that the frequency response is perfectly flat; nei-ther NFC nor shelf filters are implemented. Atlow frequencies the magnitude of rV is 1 andsource directions are rendered correctly. How-ever, at high frequencies it remains 1, which willresult in in-head localization and comb filteringartifacts with head movement. This also puts thehigh-frequency rE magnitude well below the op-timum value. This decoder should not be used ascurrently implemented.

If one is forced to use this decoder, reducingthe levels of X , Y , and Z by 3 dB will producean “energy decode,” which was recommended in[1] in cases where no shelf filters are available.

14

(a) rV and rE measured at 150 Hz (b) rV and rE measured at 3 kHz

(c) frequency and phase response for a 0◦

azimuth source(d) frequency and phase response for a 90◦

azimuth source

(e) phase mismatch between pressure and velocity

Figure 6: Decoder 2

15

(a) rV and rE measured at 150 Hz (b) rV and rE measured at 3 kHz

(c) frequency and phase response for all source azimuths

Figure 7: Decoder 3 Csound’s bformdec Opcode. Only two of the four lines are visible in (c)because the pressure and velocity responses are identical.

16

(a) rV and rE measured at 150 Hz (b) rV and rE measured at 3 kHz

(c) frequency response with and without NFC (d) phase mismatch between pressure and velocity

Figure 8: Decoder 4 Minim Analog Ambisonic Decoder.

17

5.5 Decoder 4Decoder 4 is a Mimim AD-10 decoder whoseAmbisonic processing is similar to the modeldescribed here [5] and is an example of a ClassicAmbisonic Decoder. It is an all-analog imple-mentation of an Ambisonic decoder for horizon-tal decoding to four or six loudspeakers. It hasa switch to turn NFC on and off and a “layout”adjustment to accommodate speaker arrays withaspect ratios ranging from 1 : 2 to 2 : 1. For thesetests, the layout control was set to “3 o’clock”,which was our best estimate of the setting forthe

√3 : 1 array used for our analysis. NFC was

switched on. The test was performed using anAudio Precision analyzer. Results are shown inFigure 8. As can be seen from the graphs, cor-rect shelf filters and NFC are implemented. Atlow frequencies, rV = 1 and source directionsare rendered correctly. At high frequencies, themaximum possible values of rE are achieved, in-dicating that this decoder is operating near opti-mally for the speaker array used for these tests.The slight “egg shape” in rE at low frequenciesis most likely due to a gain mismatch betweenthe front and rear loudspeaker outputs.

This decoder is Ambisonic.

6 LISTENING TESTSOne of the authors carried out informal listeningtests in a domestic setting using some of the testfiles employed in earlier listening tests: voiceannouncements in eight directions, continuouslypanned pink noise, a recording of a classicalchamber orchestra, and the applause that fol-lowed the performance. The last was recordedwith a Calrec Soundfield Microphone MkIV, se-rial number 099, with original capsules and cal-ibration.

The results broadly confirm the results of thetesting described in this paper.

• Adriaensen’s AmbDec decoder providedgood localization in all directions, uni-form frequency response, and a good

sense of envelopment, with no audible ar-tifacts.

• With Decoder 2 all sources were localizedto the front or rear, with no sense of envel-opment and a very narrow “sound stage”on orchestral test material.

• We were not able to audition the Csounddecoder directly, but did simulate it inBidule using the measured parameters,and can confirm that the predicted comb-filtering and in-head localization artifactsare quite apparent.

7 DISCUSSIONFrom the measurements reported above, it maybe safely concluded that not all available Am-bisonic decoders perform according to the re-quirements set down in Sec. 2, and in fact mostdo not! The most common problems that werefound in decoders are

• Incorrect coefficients for rectangular orother nonregular polygonal loudspeakerarrays

• Lack of dual-band decoding

• Lack of near-field compensation

These omissions or errors cause the audio repro-duction to suffer from poor localization behav-ior, phasiness, and other artifacts.

The regular polygon dilemmaIn General Metatheory, Gerzon described a“naïve decoder for a regular polygon loud-speaker Layout,” with the following form:

W+Xcosθ +Y sinθ (15)

for which he proved that “the Makita and En-ergy vector localizations coincide, and the en-ergy vector magnitude, rE , cannot exceed

√2

2 .[12]

18

Unfortunately, that statement is true only forthe case of regular polygons, and following thisform for nonregular polygons gives an incorrectresult. Specifically, if a regular array is nar-rowed, then the naïve decoder gives values forthe coefficients that are larger for X and smallerfor Y, relative to the regular polygon. But in-tuition tells us that narrowing the array wouldrequire less X and more Y to achieve the samelocalization vectors.

The correct decoder for the rectangle case is

W± X√2cosϕ

± Y√2sinϕ

(16)

where 2ϕ is the angle subtended by the front twoloudspeakers.

It may be argued, and in fact we do argue,that the rectangle decoder (and its 3-D exten-sion, the bi-rectangle) is the single most impor-tant case for actual applications in reproductionof first-order Ambisonic recordings. The reasonfor this importance is that rectangular arrays fitbetter into ordinary rooms, and in addition thatthey give a needed improvement in mid/high-frequency localization toward the front and backwhen the correct decoder is used.

Many of the available software Ambisonicdecoders do not implement dual-band decod-ing, presumably because of lack of knowledgeabout how to design IIR filters. When dual-band decoding is implemented, it frequently isfound to utilize shelf filters that are not phasematched between the filter for W and the fil-ters for X/Y/Z. Since that phase mismatch oc-curs only in the frequency range around the tran-sition frequency (see Figure 6(e)), it is difficultto evaluate the seriousness of the error. Clearly,any errors will be program dependent, depend-ing on the spectral density in that particular fre-quency range. However, it is relatively easy todo it correctly and it should be done correctly.See Appendix A.

Previous listening tests by the authors of thispaper, and informal listening tests done duringthe writing of the present paper, have shown the

importance of all the features of an Ambisonicdecoder. The use of Ambisonic decoders thatare inappropriate to the venue, have incorrectdecoding coefficients, or lack the important fea-tures of dual-band decoding and NFC will giveresults that are inferior to what would be ob-tained with a correct Ambisonic decoder andwhich will prejudice the results of comparativelistening tests.

In this paper, we have made every effort to“tell it all” as clearly and plainly as we can with-out oversimplification, and back that up withexamples, test files, and sample code that canbe downloaded and used. We hope that re-searchers conducting experiments in audio lo-calization will adopt these or similar techniquesto validate their experimental setups and that de-coder writers will now have necessary knowl-edge and tools to write better decoders.

A decade ago, lack of program material (B-format recordings) was the biggest problem withAmbisonics. Now that downloads of B-formatrecordings [4], relatively low-cost B-format mi-crophones [25], and pocket-sized multichanneldigital audio recorders are available, suitableprogram material is somewhat more plentiful.

The next hurdle is the creation of easy-to-useplayback software that runs on popular comput-ing platforms, about which we can say: Theseare Ambisonic.

8 ACKNOWLEDGMENTSThe authors thank Sven Bien from BremenUniversity and his advisor Jörn Loviscach atHochschule Bremen, University of Applied Sci-ences, for help with Csound, and in partic-ular Sven for running the tests under verytight deadline constraints. We also thank Mar-tin Leese, Daniel Courville, David Malham,and Etienne Deleflie for their careful readingand suggestions, as well as the participants inthe sursound email discussion group for theirencouragement and comments, Don Dreweckifor providing a steady flow of top-notch B-

19

format concert recordings for our listening tests,and Jim Mastracco for many hours of stimu-lating discussion on microphones and acous-tics.

REFERENCES[1] Eric Benjamin, Richard Lee, and Aaron

Heller. Localization in Horizontal-OnlyAmbisonic Systems. In 121st Audio En-gineering Society Convention Preprints,number 6967, San Francisco, October2007. 1, 2, 11, 14, 23

[2] Richard Lee and Aaron J Heller. Am-bisonic localisation - part 2. 14th Interna-tional Congress on Sound and Vibration,2007. 1, 2

[3] Richard Elen. Ambisonic.net - wheresurround-sound comes to life. http://www.ambisonic.net/. 2

[4] Etienne Deleflie. Ambisonia: The nextgeneration surround sound experience, bythe enthusiasts, for the home theatre.http://www.ambisonia.com/. 2, 19

[5] Michael Gerzon. Multi-System AmbisonicDecoder. Wireless World, 83(1499):43–47,July 1977. Part 2 in issue 1500. Availablefrom http://www.geocities.com/ambinutter/Integrex.pdf (accessedJune 1, 2006). 2, 18

[6] Catherine Cuastavino, Veronique Larcher,Guillaume Gatusseau, and PatrickBossard. Spatial audio quality evaluation:Comparing transaural, ambisonics andstereo. Proceedings of the 13th Interna-tional Conference on Auditory Display,2007. 2

[7] Audun Solvang. Spectral impairment fortwo-dimensional higher order ambisonics.Journal of the Audio Engineering Society,56(4):267–279, Jan 2008. 3

[8] Peter G Craven. The hierarchical view-point. AES 22nd UK Conference - IllusionsIn Sound, (16):7, Sep 2007. 3

[9] David G Malham. Experience with LargeArea 3-D Ambisonic Sound Systems. Pro-ceedings of the Institute of Acoustics Au-tumn Conference on Reproduced Sound 8,1992. 3

[10] Michael A. Gerzon and Geoffrey J. Barton.Ambisonic Decoders for HDTV. In 92ndAudio Engineering Society ConventionPreprints, number 3345, Vienna, March1992. AES E-lib http://www.aes.org/e-lib/browse.cfm?elib=6788. 4, 8, 10

[11] Michael A. Gerzon. Practical Periphony:The Reproduction of Full-Sphere Sound.In 65th Audio Engineering SocietyConvention Preprints, number 1571,London, February 1980. AES E-libhttp://www.aes.org/e-lib/browse.cfm?elib=3794. 4, 10, 22, 23

[12] Michael A. Gerzon. General Metathe-ory of Auditory Localisation. In 92ndAudio Engineering Society ConventionPreprints, number 3306, Vienna, March1992. AES E-lib http://www.aes.org/e-lib/browse.cfm?elib=6827. 4, 5,18, 27

[13] Aaron Heller. Ambisonics reference ma-terial. http://www.ai.sri.com/ajh/ambisonics. 5, 6, 22

[14] Plogue Art et Technologie Inc. Bidule:modular audio software. http://www.plogue.com/. 7

[15] Guy-Bart Stan, Jean-Jacques Embrechts,and Dominique Archambeau. Compari-son of different impulse response measure-ment techniques. Journal of the AudioEngineering Society, 50(4):249–262, Apr2002. 5

20

[16] David Moore and Jonathan Wakefield.The Design of Improved First Order Am-bisonic Decoders by the Application ofRange Removal and Importance in aHeuristic Search Algorithm. PreprintsAES 122nd Convention, 7053, 2007. 8

[17] David Moore and Jonathan Wakefield. Ex-ploiting Human Spatial Resolution in Sur-round Sound Decoder Design. In 125thAES Convention Preprints, San Francisco,Oct 2008. 8

[18] Bruce Wiggins, Iain Paterson-Stephens,Val Lowndes, and Stuart Berry. The De-sign and Optimisation of Surround SoundDecoders Using Heuristic Methods. InProceedings of UKSim 2003, Conferenceof the UK Simulation Society, pages 106–114. UK Simulation Society, 2003. Availi-ble from http://sparg.derby.ac.uk/SPARG/PDFs/SPARG_UKSIM_Paper.pdf(accessed May 15, 2006). 8

[19] David Moore and Jonathan Wakefield. TheDesign of Ambisonic Decoders for theITU 5.1 Layout with Even PerformanceCharacteristics. In 124th Audio Engineer-ing Society Convention Preprints, number7473, Amsterdam, May 2008. 8

[20] Philip M. Morse and K. Uno Ingard. The-oretical Acoustics, chapter 7, page 311.Princeton University Press, 1986. ISBN:0691024014. 8, 27

[21] Fons Adriaensen. 1st vs 2nd order: tim-bre differences? email to Sursound onlinediscussion group at music.vt.edu, January2009. 9

[22] Fons Adriaensen. AmbDec UserManual, 0.4.3 edition, 2011.http://kokkinizita.linuxaudio.org/ambisonics/. 11

[23] Fernando Lopez-Lezcano. Planet ccrmaat home. http://ccrma.stanford.edu/planetccrma/software/. 11

[24] Wikipedia. Csound — wikipedia, the freeencyclopedia, 2008. [Online; accessed 11-August-2008]. 14

[25] Core sound tetramic. http://www.core-sound.com/TetraMic/1.php. 19

[26] Eric W. Weisstein. Moore-Penrose Matrix Inverse. FromMathWorld–A Wolfram Web Resource.http://mathworld.wolfram.com/Moore-PenroseMatrixInverse.html,2008. 23

[27] Wikipedia. Singular value decomposition— wikipedia, the free encyclopedia, 2008.[Online; accessed 10-August-2008]. 23

[28] Wikipedia. Digital biquad filter —wikipedia, the free encyclopedia, 2008.[Online; accessed 11-August-2008]. 26

[29] Leo Beranek. Acoustic Measurements,pages 56–64. John Wiley, New York,1949. 27

[30] Philip Cotterel. On the Theory ofthe Second-Order Soundfield Microphone.PhD thesis, Dept. of Cybernetics. ReadingUniversity, Febuary 2002. 27

[31] Jerome Daniel. Spatial sound encoding in-cluding near field effect: Introducing dis-tance coding filters and a viable, new am-bisonic format. Preprints 23rd AES Inter-national Conference, Copenhagen, 2003.27

21

APPENDICES

A DECODER DESIGNWe present some “cookbook” procedures for thedesign of the key decoder components, alongwith some digressions into the underlying the-ory and mathematics. Examples and implemen-tations of all three can be found on the authors’website [13].

A.1 Exact-Solution Decoder MatrixThis method is based on the idea of inversionwhere we write down the direction of propa-gation of the acoustic impulse created by eachloudspeaker in the array, decompose that intothe selected set of spherical harmonics and usegeneralized inversion to derive the decoder ma-trix that recreates the original impulse. This is ageneralization of Figure 12 (“The Design Math-ematics”) in [11] and can be shown to be equiv-alent to the least-squares solution. If the prob-lem is under-constrained (many solutions), as itis for typical Ambisonic speaker arrays (morespeakers than signals), it will give the solutionthat requires the minimum overall radiated en-ergy, which also will yield the largest |rE|’s.

While this procedure provides a solution forany loudspeaker array, only regular arrays andthose with speakers in diametrically opposedpairs (Type 1 and 2 from Sec. 4) will result in thedirections of rV and rE agreeing for all sourcedirections, which is one of the basic criteria forAmbisonic reproduction. As noted earlier, so-lution of general irregular arrays (Type 3) thatsatisfy Ambisonic criteria is beyond the scopeof this paper.

An additional constraint is that all the speak-ers in the array are equidistant from the listen-ing position. While this can be relaxed by in-troducing delays, “1/r”-level adjustments, andper-speaker NFC, it is also beyond the scope ofthis paper.

We assume that each loudspeaker in the ar-ray produces a planar wave front propagating to-

wards the center of the array that is parallel tothe direction given by its position relative to thecenter. For the ith loudspeaker

Li =[xi yi zi

](17)

where xi2 + yi

2 + zi2 = 1, that is they are the di-

rection cosines of the of the vector from the cen-ter of the array to the ith loudspeaker. In spheri-cal coordinates this is

Li =[cosθ cosε sinθ cosε sinε

](18)

where θ is the counterclockwise azimuth fromdirectly ahead, and ε is the elevation from hori-zontal.

Next, we select a set of spherical harmonicfunctions, up to the desired order, that form anorthogonal basis and then project the speaker di-rections onto it. For first-order Ambisonics thereis single choice, the B-format definitions.10 InCartesian coordinates, each Li becomes

Ki =[√

22 xi yi zi

]. (19)

For the array used in this paper, the√

3 : 1 rect-angle, which has speakers at azimuths 30, 150,210, and 330 degrees in the horizontal plane

K_rect30 =

0.7071 0.8660 0.5000 0

0.7071 -0.8660 0.5000 0

0.7071 -0.8660 -0.5000 0

0.7071 0.8660 -0.5000 0

where each row corresponds to the coefficientsof a single speaker. For a cuboid array that is 2meters wide, 3 meters deep, and 1.5 meters tall

K_cuboid =

0.7071 0.5121 0.7682 -0.3841

0.7071 0.5121 -0.7682 -0.3841

0.7071 -0.5121 -0.7682 -0.3841

0.7071 -0.5121 0.7682 -0.3841

0.7071 0.5121 0.7682 0.3841

0.7071 0.5121 -0.7682 0.3841

0.7071 -0.5121 -0.7682 0.3841

0.7071 -0.5121 0.7682 0.3841

10For higher-order Ambisonics, there are at least threepossibilities.

22

We want to find M, the decoder matrix, that sat-isfies the condition

M×K = I (20)

where I is the identity matrix, a matrix with 1’son the diagonal and 0’s everywhere else and ×indicates matrix multiplication. When that issatisfied, it means that the speaker array, L, incombination with the decoder matrix, M, canreproduce all the spherical harmonics indepen-dently (i.e., without crosstalk).

If K is invertible, then M = K−1; however,in the case with most Ambisonic arrays (and inparticular in the two examples above), it is not,so we use the least-squares solution to the sys-tem

M×K− I = r (21)

In the typical case, we have more speakers thansignals, so this system is over determined andthere are many solutions. By selecting the onethat also minimizes the 2-norm of r, we obtainthe one providing the highest average value of|rE|.

The desired solution of Eqn. 21, M is givenby the Moore-Penrose pseudoinverse of K [26],which is available in Matlab and GNU Octaveas the function pinv() and in many other com-puter mathematics systems, such as Scilab andMathematica.

In Matlab,

>> M_rect30 = pinv(K_rect30);

>> transpose(M_rect30)

ans =

0.3536 0.2887 0.5000 0

0.3536 -0.2887 0.5000 0

0.3536 -0.2887 -0.5000 0

0.3536 0.2887 -0.5000 0

>> M_cuboid = pinv(K_cuboid);

>> transpose(M_cuboid)

ans =

0.1768 0.2441 0.1627 -0.3254

0.1768 0.2441 -0.1627 -0.3254

0.1768 -0.2441 -0.1627 -0.3254

0.1768 -0.2441 0.1627 -0.3254

0.1768 0.2441 0.1627 0.3254

0.1768 0.2441 -0.1627 0.3254

0.1768 -0.2441 -0.1627 0.3254

0.1768 -0.2441 0.1627 0.3254

where the function transpose() swaps therows and columns of the matrix, yielding a ma-trix where each column corresponds to one ofthe B-format signals, W, X, Y, and Z and eachrow contains the decoder gains for the corre-sponding speaker for that signal.

We can now use Eqn. 21 to check the qual-ity of the solution by examining the entries inr. Non-zero entries on the diagonal indicate aspherical harmonic that is not being reproducedcorrectly. Non-zero entries off the diagonal indi-cate crosstalk or aliasing between the sphericalharmonics. Either condition indicates that fur-ther analysis of the array geometry is needed.

A.1.1 A further math digression...One way to compute A†, the pseudoinverse ofA, is

A† = (ATA)−1AT (22)

where AT indicates the transpose of A.11 Withone further optimization (factoring out the Wcolumn, which is constant), this is what Gerzonis doing in Figure 12 of Practical Periphony[11],which is reproduced as Eqn. 4 in [1].

Most spreadsheet programs include basicmatrix operations such as transposition and in-version, so it is possible to create spreadsheetsthat do these calculations, however from a nu-merical computing standpoint, this is not thebest way to obtain the pseudoinverse.

A better way to compute A† is to usesingular-value decomposition (SVD) [27]. Thiswill also yield some insight into the underlyingmechanism of the inverse method. The SVDfactors any matrix (real or complex) into three

11If A is complex, use the conjugate transpose, denotedAH (where the H stands for "Hermitian".)

23

other matrices, as follows

A = U×Σ×VT (23)

where U and V are orthonormal and Σ is diago-nal. Then the pseudoinverse is

A† = V×Σ† ×UT. (24)

Σ† is trivial to compute because it is diagonal;simply substitute each non-zero entry in Σ withits reciprocal.

Informally, V is said to contain the “input”or “analyzing” vectors of A, U is said to con-tain the “output” vectors of A, and the diagonalentries of Σ the “gains.”

In terms of solving for Ambisonic decodermatrices

1. VT transforms K, into an orientation thatis symmetric about the coordinate axes, X ,Y , and Z in the case of first order, so eachcan be adjusted independently, then

2. Σ† adjusts the gain of each spherical har-monic, so the pressure, velocity, and soforth, are correctly reproduced, whichalso assures that source directions are cor-rectly reproduced, and finally,

3. U returns everything to the original orien-tation of the speaker array.

The non-zero entries in Σ are called the singularvalues of A. In analyzing Ambisonic playbacksystems, if the number of singular values doesnot equal the number of signals in use, then thespeaker array is not capable of reproducing allthe intended spherical harmonics. This may be atrivial result, for example that a horizontal arraycannot reproduce Z, or more significantly, thatthe array geometry is degenerate in some otherway.

The ratio of the largest to the smallest sin-gular value is called the condition of the matrix.This is related to the minimum and maximumvalues of rE , and hence gives us insight into theoverall quality of localization the array will pro-vide. For example, in Matlab

>> svd(K_rect30)

ans =

1.7321

1.4142

1.0000

0

indicates that one spherical harmonic will not bereproduced (Z) and that rE will not be uniformin all directions.

A.1.2 Virtual MicrophonesWith some decoders, the decoder matrix is spec-ified in terms of virtual microphones. The thedirections and directivites are obtained from theexact solutions found in Sec. A.1 as follows

[ az el r ] = cart2sph( M(:,2), ...

M(:,3), ...

M(:,4) );

d = r ./ ( r + M(:,1)*sqrt(1/2) );

vm = [ az el d ]

Using the two examples from above and con-verting from radians to degrees

>> [ az el r ] = cart2sph( M_rect30(:,2), ...

M_rect30(:,3), ...

M_rect30(:,4) );

>> d = r ./ ( r + M_rect30(:,1)*sqrt(1/2) );

>> vm_rect30 = [ az*(180/pi) el*(180/pi) d ]

vm_rect30 =

60.0000 0 0.6202

120.0000 0 0.6202

-120.0000 0 0.6202

-60.0000 0 0.6202

>> [ az el r ] = cart2sph( M_cuboid(:,2), ...

M_cuboid(:,3), ...

M_cuboid(:,4) );

>> d = r ./ ( r + M_cuboid(:,1)*sqrt(1/2) );

>> vm_cuboid = [ az*(180/pi) el*(180/pi) d ]

vm_cuboid =

33.6901 -47.9689 0.7125

24

-33.6901 -47.9689 0.7125

-146.3099 -47.9689 0.7125

146.3099 -47.9689 0.7125

33.6901 47.9689 0.7125

-33.6901 47.9689 0.7125

-146.3099 47.9689 0.7125

146.3099 47.9689 0.7125

Note that the virtual microphones do not pointat the loudspeakers.

A.2 Phase-Matched Band-SplittingFilters

Sec. 4 discussed the need for frequency-dependent processing in order to transition fromthe exact solution at low-frequencies (LF), toone that optimizes rE at high frequencies (HF).The crossover frequency use by Classic Am-bisonic Decoders and in our earlier listeningtests is 380 Hz. We present the design of a suit-able filter for this.

The key idea is to treat the LF-to-HF transi-tion as one would the crossover network feedingthe LF and HF units in a loudspeaker. We de-sire a gradual transition, so simple second-orderfilters are used

LF(s) =1

1+2sT +(sT )2 (25)

HF(s) =(sT )2

1+2sT +(sT )2 (26)

These have the -6 dB point at the crossover fre-quency ω = 1/T rad/sec. If you combine these,the outputs cancel at the crossover frequency,but reversing the phase of the HF section, makesits phase match that of the LF section and thereis no cancellation at the crossover frequency.The output is

Total(s) =1− (sT )2

1+2sT +(sT )2 (27)

=(1+ sT )(1− sT )(1+ sT )(1+ sT )

(28)

=1− sT1+ sT

(29)

which is a first-order all-pass network. Hence,the phase response is the same as the LF sec-tion and is maintained regardless of the relativelevels of the LF and HF sections.

Applying the bilinear transformation to im-plement these as digital infinite-impulse re-sponse (IIR) filters, the second-order pole

H(s) =1

1+2sT +(sT )2 (30)

becomes

H(z) =b0 +b1z−1 +b2z−2

a0 +a1z−1 +a2z−2 (31)

where, for the LF section

b0 =k2

k2 +2k +1(32)

b1 = 2b0 (33)b2 = b0 (34)a0 = 1 (35)

a1 =2(k2 −1)

k2 +2k +1(36)

a2 =k2 −2k +1k2 +2k +1

(37)

and, for the HF section

b0 =1

k2 +2k +1(38)

b1 = −2b0 (39)b2 = b0 (40)

with a0, a1, a2 as in the LF section and

k = tanπFc

Fs(41)

and Fc is the crossover frequency in Hz and Fs isthe sample rate.12

As an example, with Fc = 380 and Fs =48000

12Note that this k is not the same as the k used in Sec. 4.

25

b_lp =

0.000589143208472

0.001178286416944

0.000589143208472

b_hp =

0.952044598366767

-1.904089196733534

0.952044598366767

a =

1.000000000000000

-1.902910910316590

0.905267483150478

These filters can be implemented in Plogue Bid-ule as Direct Form 1 IIRs [28] using the Recur-sive Function block. The HF section is speci-fied by entering

( 0.952044598366767 * x) +

(-1.904089196733534 * prevX(1)) +

( 0.952044598366767 * prevX(2)) -

(-1.902910910316590 * prevR(1)) -

( 0.905267483150478 * prevR(2))

and the LP section by entering

( 0.000589143208472 * x) +

( 0.001178286416944 * prevX(1)) +

( 0.000589143208472 * prevX(2)) -

(-1.902910910316590 * prevR(1)) -

( 0.905267483150478 * prevR(2))

Recall that the desired phase response is ob-tained by subtracting the output of these sec-tions, so after scaling according to the desiredresponse, the output signals must be differenced,not summed.

A.3 Near-Field Compensation FilterAll that is needed here is a first-order high-pass(HP) filter

H(s) =s

1+ sT(42)

which translates to the digital IIR filter

H(z) =b0 +b1z−1

a0 +a1z−1 (43)

Figure 10: Frequency and phase response of 2-meter NFC filter implemented as Direct Form 1IIR with Bidule’s recursive function block.

with

b0 =1

k +1(44)

b1 = −b0 (45)a0 = 1 (46)a1 = (k−1)b0 (47)

where k is given by Eqn. 41.As an example, for 2 meters, f-3dB = 27.1 Hz

and Fs = 48000

b =

0.998229447703366 -0.998229447703366

a =

1.000000000000000 -0.996458895406731

Implementing in Bidule’s recursive functionblock

( 0.998229447703366 * x) +

(-0.998229447703366 * prevX(1)) -

(-0.996458895406731 * prevR(1))

B IS MY ENCODER AMBISONIC?Many references show the first-order horizontalAmbisonic B-format encoding equations asW

XY

=

22

cosθsinθ

S (48)

26

(a) low-pass filter (b) high-pass filter

Figure 9: Frequency and phase response of phase-matched 380 Hz band-splitting filters imple-mented as Direct Form 1 IIR with Bidule’s recursive function block.

where θ is the azimuth and S is the pressuredue to the source at the position of the “micro-phone.” But this is a simplification that has ledto much misunderstanding of the nature of B-format. As an example, one myth suggests thatB-format does not accurately encode near or dif-fuse soundfields because these equations haveno “phase information.”

W is a perfect omni-directional pressure mi-crophone with -3 dB gain. However, no practi-cal microphone has a response as shown for Xand Y . The correct equations regard X and Yas two perfect figure-8 particle velocity micro-phones with an on-axis gain of 1. As such, X andY are subject to the variations in phase and am-plitude between particle velocity and pressureencountered in real life. An important exampleof this is proximity [29, 20], which is a clearand unambiguous coding of distance for nearsources. Cotterel [30] correctly derives XY Z assolutions of the Helmholtz wave equation. Thiscodes near and diffuse soundfields properly andis consistent with practical implementations ofthe Soundfield Microphone.

The correct equation, via Beranek [29] en-codes a single point source at distance d asW

XY

=

22

D(s)cosθD(s)sinθ

S (49)

where D(s) =1+ sT

sT, T =

dc

, and c is the speedof sound.

Eqn. 49 is essentially Gerzon’s full expres-sion for velocity components at the bottom ofpage 15, General Metatheory of Auditory Lo-calisation [12], but from an encoding viewpoint.The Wave Equation precludes any practical de-vice that implements the simplistic Eqn. 48.

Daniel [31] extends this form for higher or-ders but his derivation does not convenientlydescribe diffuse fields, standing waves or non-point sources for which we refer you to Cotterel[30]. However, a microphone with response topoint sources as Eqn. 49 is necessary and suffi-cient to correctly encode diffuse fields, standingwaves, non-point and nearby sources.

27