ACTA ACUSTICA UNITED WITH ACUSTICA Vol. 98 (2012) 61 – 71
DOI 10.3813/AAA.918492

Distance Perception in Interactive Virtual Acoustic Environments using First and Higher Order Ambisonic Sound Fields

Gavin Kearney 1), Marcin Gorzel 2), Henry Rice 3), Frank Boland 2)

1) Department of Theatre, Film and Television, University of York, United Kingdom. gavin.kearney@york.ac.uk
2) Department of Electronic and Electrical Engineering, Trinity College Dublin, Ireland. [gorzelm, fboland]@tcd.ie
3) Department of Mechanical and Manufacturing Engineering, Trinity College Dublin, Ireland. hrice@tcd.ie

Received 25 February 2011, accepted 1 October 2011.
© S. Hirzel Verlag · EAA

Summary

In this paper, we present an investigation into the perception of source distance in interactive virtual auditory environments in the context of First (FOA) and Higher Order Ambisonic (HOA) reproduction. In particular, we investigate the accuracy of sound field reproduction over virtual loudspeakers (headphone reproduction) with increasing Ambisonic order. Performance of 1st, 2nd and 3rd order Ambisonics in representing distance cues is assessed in subjective audio perception tests. Results demonstrate that 1st order sound fields can be sufficient in representing distance cues for Ambisonic-to-binaural decodes.

PACS no. 43.20.-f, 43.55.-n, 43.58.-e, 43.60.-c, 43.71.-k, 43.75.-z

1. Introduction

Recent advances in interactive entertainment technology have led to visual displays with a convincing perception of source distance, based not only on stereo vision techniques, but also on real time graphics rendering technology for correct motion parallax [1, 2].

Typically, such presentations are accompanied by loudspeaker surround technology based on amplitude panning techniques and aimed at multiple listeners. However, in interactive virtual environments, headphone listening allows for greater control over personalized sound field reproduction. One method of auditory spatialization is to incorporate Head Related Transfer Functions (HRTFs) into the headphone reproduction signals. HRTFs describe the interaction of a listener’s head and pinnae on impinging source wavefronts. It has been shown that for effective externalization and localization to occur, head-tracking should be employed to control this spatialization process [3], particularly where non-individualised HRTFs are used. However, the switching of the directionally dependent HRTFs with head movement can lead to auditory artifacts caused by wave discontinuity in the convolved binaural signals [4]. A more flexible solution is to form ‘virtual loudspeakers’ from HRTFs, where the listener is placed at the centre of an imaginary loudspeaker array. Here, the loudspeaker feeds are changed relative to the head position and any technique for sound source spatialization over loudspeakers can be used. Many different spatialization systems have been proposed for such application in the literature, most notably Vector Based Amplitude Panning (VBAP) [5] and Wavefield Synthesis [6]. However, the Ambisonics system [7], which is based on the spherical harmonic decomposition of the sound field, represents a practical and asymptotically holographic approach to spatialization. It is well known in Ambisonic loudspeaker reproduction that, as the order of sound field representation gets higher, the localization accuracy increases due to greater directional resolution.

However, there are many unanswered questions about the capability of Ambisonic techniques with regard to the perception of depth and distance. In this paper, we want to investigate whether enhanced directional accuracy of direct sound and early reflections in a sound field can possibly lead to better perception of environmental depth and thus better localization of the sound source distance in this environment. We approach the problem by means of subjective listening tests in which we compare the perception of distance of real sound sources to the First Order Ambisonic (FOA) and Higher Order Ambisonic (HOA) sound fields presented over headphones.

This paper is outlined as follows: We will begin by presenting a succinct review of the relevant psychoacoustical aspects of auditory localization and distance perception. We will then outline the incorporation of Ambisonic techniques to virtual loudspeaker reproduction and subsequent re-synthesis of measured FOA sound fields into higher orders. A case study investigating the perception of source distance at higher Ambisonic orders is then presented through subjective listening tests.



2. Distance Perception

It is important to note that throughout the literature there exists a clear distinction between ‘distance’ and ‘depth’, both understood as perceptual attributes of sound. According to [8], ‘distance’ is related to the physical range between the sound source and a listener, whereas ‘depth’ relates to the recreated auditory scene as a whole and concerns a sense of perspective in that scene.

2.1. Distance Perception in a Free Field

Although the human ability to perceive sources at different distances is not fully understood, there are several key factors which are known to contribute to distance perception. In the first case, changes in distance lead to changes in the monaural transfer function (the sound pressure at one ear). This is shown in Figure 1 for a spherical model of a head. We see that for sources of less than 1 m distance, the sound pressure level varies depending on the angle of incidence due to the shadowing effects of the head. Beyond 1 m, the intensity of the source decays according to the inverse square law.

However, absolute monaural cues will only be meaningful if we have some prior knowledge of the source level, i.e. how familiar we are with the source. In other words, a form of semiosis occurs, where the perception of localization is based on anticipation and experience [9]. For example, for normal level speech (approximately 60 dB at 1 m), we expect louder sources to be nearer and quieter sources to be further away. However, this is more difficult to assess for synthetic sounds or sounds that we are unfamiliar with.
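As a rough illustration of the level cue described above, the following sketch applies the inverse square law to the speech level quoted in the text. It assumes an ideal point source in a free field beyond about 1 m; the function name and the list of distances are illustrative only and are not taken from the paper.

```python
import math

def free_field_level_db(distance_m, level_at_1m_db=60.0):
    """Sound pressure level of a point source under the inverse square law.

    level_at_1m_db defaults to the approximate speech level quoted in the text.
    """
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    # Intensity falls with 1/r^2, i.e. 20*log10(r) dB relative to 1 m.
    return level_at_1m_db - 20.0 * math.log10(distance_m)

# Illustrative distances only: each doubling costs about 6 dB.
for d in (1.0, 2.0, 4.0, 8.0):
    print(f"{d:4.1f} m: {free_field_level_db(d):5.1f} dB")
```

A listener who knows the typical level of speech can invert this relation to estimate distance, which is not possible for an unfamiliar source whose level at 1 m is unknown.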

It is interesting to note that for sources in the median plane, the level at distances less than 1 m does not change as dramatically as for sources located at the ipsilateral point. This will not significantly affect the low frequency Interaural Time Difference (ITD), but it is reflected in the Interaural Level Difference (ILD), as shown in Figure 2. We note that the most extreme ILD is exhibited at the side of the head (90°) due to the maximum head shadowing effect. For a similar reason, subconscious head movements may be regarded as another important cue, since level changes close to the source will be more apparent than far from it [10]. Thus, near-field ILD cues exist which aid us in discriminating source distance.

On the other hand, for larger distances and high sound pressure levels, the propagation speed of a sound wave in a medium ceases to be constant with frequency, which may lead to distortion of the waveform [11]. Furthermore, sound waves travelling a substantial distance also undergo a process of energy absorption by water molecules in the atmosphere. This is more apparent for high-frequency energy of the wave and leads to spectral changes (low-pass filtering) of the sound being heard.

Figure 1. RMS monaural transfer function (dB) for a spherical head model at the left ear for broadband source at different angles (0°, 30°, 60°, 90°) with varying source distance (m) (reference = plane wave at (0, 0)).

Figure 2. Interaural level difference (dB) of spherical head model for broadband source at different angles (0°, 30°, 60°, 90°) with varying source distance (m).

2.2. Distance Perception in a Reverberant Field

In reverberant rooms, the ratio of the direct to reverberant sound plays an extremely important role in distance perception. For near sources, where the direct field energy is much greater than the reverberant field, the sound pressure level approximately changes in accordance with the free-field conditions. However, for source-listener distances greater than the critical distance, the level of reverberation is in general independent of the source position due to the homogeneous level of the diffuse field, and the direct to reverberant ratio changes approximately 6 dB per doubling of distance from the source.
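The 6 dB per doubling behaviour can be sketched with a simple model. This is a minimal illustration, assuming an inverse-square direct sound, a perfectly uniform diffuse field, and the usual definition of the critical distance as the point where the two levels are equal; the 2 m critical distance is a hypothetical value, not a measurement from the paper.

```python
import math

def direct_to_reverberant_db(distance_m, critical_distance_m):
    """Direct-to-reverberant ratio in dB for an idealized room.

    The direct level falls by 20*log10(r) while the diffuse level is constant,
    and the ratio is 0 dB at the critical distance by definition.
    """
    if distance_m <= 0 or critical_distance_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(critical_distance_m / distance_m)

# Hypothetical critical distance of 2 m, chosen only for illustration.
for d in (1.0, 2.0, 4.0, 8.0):
    print(f"{d:4.1f} m: D/R = {direct_to_reverberant_db(d, 2.0):+5.1f} dB")
```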

The directions of arrival of the early reflections are another parameter which changes according to the source-listener position and can be regarded as an important factor in creating environmental depth. Whether it is useful to the listeners in determining the distance to the sound source in the presence of other cues, like sound intensity, direct to reverberant energy ratio or the arrival pattern of delays, remains an open question that needs to be addressed. Ambisonics allows for enhanced directional reproduction of deterministic components of a sound field by increasing the order of spherical harmonic decomposition. However, better directional localization can be achieved without affecting other important cues for distance estimation, like overall sound intensity or direct to reverberant energy ratio. Thus, it can constitute an ideal framework for testing whether less apparent properties of a sound field can influence the perception of distance.

2.3. Former Psychoacoustical Studies on Distance Perception

The perception of distance has been shown to be one that is not linearly proportional to the source distance. For example, both Nielson et al. [12] and Gardner [13] have shown that the distance of speech signals is consistently underestimated in an anechoic environment. This underestimation has also been shown by other authors in the context of reverberant environments, both real and virtual. In [14], Bronkhorst et al. demonstrate that in a damped virtual environment, sources are consistently perceived to be closer than in a reverberant virtual environment, due to the direct to reverberant ratio. In their studies, the room simulation is conducted using simulated Binaural Room Impulse Responses (BRIRs) created from the image source method [15]. They show how perceived distance increases rapidly with the number and amplitude of the reflections.

In a similar study, Rychtarikova et al. [16] investigated the difference in localization accuracy between real rooms and computationally derived BRIRs. Their findings show that at 1 m, localization accuracy in both the virtual and real environments is in good agreement with the true source position. However, at 2.4 m, the accuracy degrades, and high frequency localization errors were found in the virtual acoustic environment, pertaining to the difference in HRTFs between the model and the subject. In the same vein, Chan et al. [17] have shown that distance perception using recordings made from in-ear microphones on individual subjects again leads to underestimation of the source distance in virtual reverberant environments, more so than with real sources.

Waller [18] and Ashmead et al. [10] have identified that one of the factors improving distance perception is listener movement in the virtual or real space. It is therefore crucial to account for any listener’s movements (or lack thereof) in the experimental design.

Similarly, for headphone reproduction of virtual acoustic environments, small subconscious head rotations may lead to improvements in distance perception by providing enhanced ILD and ITD cues. Therefore, the sound field transformations should accurately reflect the small changes of orientation of the listener’s head.

3. Ambisonic Spatialization

Ambisonics was originally developed by Gerzon, Barton and Fellgett [7] as a unified system for the recording, reproduction and transmission of surround sound. The theory of Ambisonics is based on the decomposition of the sound field measured at a single point in space into spherical harmonic functions, defined as

$$ Y^{\sigma}_{mn}(\Phi,\Theta) = A_{mn}\,P_{mn}(\sin\Theta) \cdot \begin{cases} \cos(m\Phi) & \text{if } \sigma = +1 \\ \sin(m\Phi) & \text{if } \sigma = -1 \end{cases} \qquad (1) $$

where $m$ is the order and $n$ is the degree of the spherical harmonic, and $P_{mn}$ is the fully normalized (N3D) associated Legendre function. The coordinate system used comprises x, y and z axes pointing to the front, left and up respectively. $\Phi$ is the azimuthal angle with the clockwise rotation and $\Theta$ is the elevation angle from the x-y plane. For each order $m$ there are $(2m + 1)$ spherical harmonics.

In order for plane wave representation over a loudspeaker array, we must ensure that

$$ s\,Y^{\sigma}_{mn}(\Phi,\Theta) = \sum_{i=1}^{I} g_i\,Y^{\sigma}_{mn}(\phi_i,\theta_i) \qquad (2) $$

where $s$ is the pressure of the source signal from direction $(\Phi,\Theta)$ and $g_i$ is the $i$th loudspeaker gain from direction $(\phi_i,\theta_i)$. We can then express the left hand side of equation (2) in vector notation, giving the Ambisonic channels

$$ B = Y_{\Phi\Theta}\,s = \left[\,Y^{1}_{00}(\Phi,\Theta),\; Y^{1}_{10}(\Phi,\Theta),\; \ldots,\; Y^{\sigma}_{mm}(\Phi,\Theta)\,\right]^{T} s \qquad (3) $$

Equation (2) can then be rewritten as

$$ B = C \cdot g \qquad (4) $$

where $C$ are the encoding gains associated with the loudspeaker positions and $g$ is the loudspeaker signal vector. In order to obtain $g$, we require a decode matrix $D$, which is the inverse of $C$. However, to invert $C$ we need the matrix to be square, which is only possible when the number of Ambisonic channels is equal to the number of loudspeakers. When the number of loudspeaker channels is greater than the number of Ambisonic channels, which is usually the case, we then obtain the pseudo-inverse of $C$, where

$$ D = \mathrm{pinv}(C) = C^{T}\,(C\,C^{T})^{-1} \qquad (5) $$
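To make the pseudo-inverse decode concrete, the sketch below builds $C$ for a first order sound field and a cube of eight loudspeakers, then forms $D = C^{T}(CC^{T})^{-1}$. It is an illustration under stated assumptions rather than the decoder used in the study: the first order N3D gains $[1, \sqrt{3}\cos\Phi\cos\Theta, \sqrt{3}\sin\Phi\cos\Theta, \sqrt{3}\sin\Theta]$ are written out explicitly, the cube angles are hypothetical, and the small matrix helpers stand in for a linear algebra library.

```python
import math

def encode_n3d_first_order(azimuth, elevation):
    """First order N3D gains [W, X, Y, Z] for a direction in radians."""
    g = math.sqrt(3.0)
    return [1.0,
            g * math.cos(azimuth) * math.cos(elevation),
            g * math.sin(azimuth) * math.cos(elevation),
            g * math.sin(elevation)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def invert(a):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Hypothetical cube layout: four loudspeakers above and four below the listener.
elev = math.asin(1.0 / math.sqrt(3.0))
cube = [(math.radians(az), sign * elev)
        for sign in (1, -1) for az in (45, 135, 225, 315)]

# C holds one column of encoding gains per loudspeaker (N channels x I speakers).
C = transpose([encode_n3d_first_order(az, el) for az, el in cube])
Ct = transpose(C)
D = matmul(Ct, invert(matmul(C, Ct)))   # equation (5), I x N

# Decode a unit-amplitude source straight ahead into eight loudspeaker gains.
B = [[v] for v in encode_n3d_first_order(0.0, 0.0)]
g = [row[0] for row in matmul(D, B)]
print([round(v, 4) for v in g])
```

Because $I > N$ here, $C\,D$ is the identity, so re-encoding the loudspeaker gains returns the original Ambisonic channels, which is exactly the condition stated in equation (4).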

Since the sound field is represented by a spherical coordinate system, sound field transformation matrices can be used to rotate, tilt and tumble the sound fields. In this way, the Ambisonic signals themselves can be controlled by the user, allowing for the virtual loudspeaker approach to be employed. For 3-D reproduction, the number $I$ of virtual loudspeakers employed with the Ambisonics approach is dependent on the Ambisonic order $m$, where

$$ I \ge N = (m + 1)^2 \qquad (6) $$
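Equation (6) and the per-order count of $(2m + 1)$ harmonics can be checked with a few lines. The function names are illustrative only.

```python
def harmonics_at_order(order):
    """Number of spherical harmonics belonging to a single order m."""
    return 2 * order + 1

def ambisonic_channels(order):
    """Total channel count N = (m + 1)^2 for a 3-D sound field of order m."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2

for m in (1, 2, 3):
    n = ambisonic_channels(m)
    # Summing (2k + 1) over orders 0..m gives the same total as (m + 1)^2.
    assert n == sum(harmonics_at_order(k) for k in range(m + 1))
    print(f"order {m}: {n} channels, so at least {n} loudspeakers")
```

For 3rd order this gives 16 channels, so an array of 16 loudspeakers is the smallest that satisfies equation (6).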


4. Virtual Loudspeaker Reproduction

In the ‘virtual loudspeaker’ approach, HRTFs are measured at the ‘sweet-spot’ (the limited region in the centre of a reproduction array where an adequate spatial impression is generally guaranteed) in a multi-loudspeaker reproduction setup, and the resultant binaural playback is formed from the convolution of the loudspeaker feeds with the HRIRs of the virtual loudspeakers. This concept is illustrated in Figure 3. For the left ear we have

$$ L = \sum_{i=1}^{I} h_{Li} * q_i \qquad (7) $$

where $*$ denotes convolution, $h_{Li}$ is the left ear HRIR corresponding to the $i$th virtual loudspeaker and $q_i$ is the $i$th loudspeaker feed. Similar relations apply for the right ear signal. This method was first introduced by McKeag and McGrath [19], and examples of its adoption can be found in [20] and [21]. This approach has major computational advantages, since a complex filter kernel is not required and head rotation can be simulated by changing the loudspeaker feeds $q$, as opposed to the HRTFs. Whilst the HRTFs in this case play an important role in the spatialization, ultimately it is the sound field creation over the virtual loudspeakers which gives the overall spatial impression. Most existing research uses a block frequency domain approach to this convolution. However, given that the virtual loudspeaker feeds are controlled via head-tracking in real-time, a time-domain filtering approach can also be utilized. For short filter lengths, obtaining the output in a point-wise manner avoids the inherent latencies introduced by block convolution in the frequency domain. A strategy for significant reduction of the filter length without artifacts has been proposed in [22].
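Equation (7) can be sketched as a direct time-domain sum of convolutions. This is a minimal illustration with toy two-tap filters and two virtual loudspeakers, both invented for the example; a real renderer would use measured HRIRs and would process the feeds sample by sample or in blocks.

```python
def convolve(h, q):
    """Direct-form (time-domain) linear convolution of two sequences."""
    out = [0.0] * (len(h) + len(q) - 1)
    for i, hv in enumerate(h):
        for j, qv in enumerate(q):
            out[i + j] += hv * qv
    return out

def render_ear(hrirs, feeds):
    """Sum over virtual loudspeakers of HRIR_i * feed_i, as in equation (7)."""
    if len(hrirs) != len(feeds):
        raise ValueError("need one HRIR per loudspeaker feed")
    length = max(len(h) + len(q) - 1 for h, q in zip(hrirs, feeds))
    ear = [0.0] * length
    for h, q in zip(hrirs, feeds):
        for n, v in enumerate(convolve(h, q)):
            ear[n] += v
    return ear

# Toy data (not measured): two virtual loudspeakers with two-tap left-ear HRIRs.
hrirs_left = [[1.0, 0.5], [0.25, 0.0]]
feeds = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
left = render_ear(hrirs_left, feeds)
print(left)
```

Note that the HRIRs stay fixed: when the head rotates, only `feeds` changes, which is the computational advantage described above.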

Figure 3. The virtual loudspeaker reproduction concept.

5. Higher Order Synthesis

In order to compare the distance perception of different orders of Ambisonic sound fields, it is desirable to take real world sound field measurements. However, the formation of higher order spherical harmonic directional patterns is non-trivial. Thus, in order for us to change FOA impulse responses to HOA representations, we will employ a perceptual based approach, which will allow us to synthesize the increased directional resolution that would be achieved with a HOA sound field recording. For this, we adopt the directional analysis method of Pulkki and Merimaa, found in [23]. Here, the B-format signals are analyzed in terms of sound intensity and energy in order to derive time-frequency based direction of arrival and diffuseness. The instantaneous intensity vector is given from the pressure $p$ and particle velocity $u$ as

$$ I(t) = p(t)\,u(t) \qquad (8) $$

Since we are using FOA impulse response measurements, the pressure can be approximated by the 0th order Ambisonics component $w(t)$, which is omnidirectional,

$$ p(t) = w(t) \qquad (9) $$

and the particle velocity by

$$ u(t) = \frac{1}{\sqrt{2}\,Z_0}\left[\,x(t)\,e_x + y(t)\,e_y + z(t)\,e_z\,\right] \qquad (10) $$

where $e_x$, $e_y$ and $e_z$ represent Cartesian unit vectors, $x(t)$, $y(t)$, $z(t)$ are the FOA signals and $Z_0$ is the characteristic acoustic impedance of air.

The instantaneous intensity represents the direction of the energy transfer of the sound field, and the direction of arrival can be determined simply by the opposite direction of $I$. For FOA, we can calculate the intensity for each coordinate axis and in the frequency domain. Since a portion of the energy will also oscillate locally, a diffuseness estimate can be made from the ratio of the magnitude of the intensity vector to the overall energy density $E$, given as

$$ \psi = 1 - \frac{\lVert \langle I \rangle \rVert}{c\,\langle E \rangle} \qquad (11) $$

where $\langle \cdot \rangle$ denotes time averaging, $\lVert \cdot \rVert$ denotes the norm of the vector and $c$ is the speed of sound. The diffuseness estimate will yield a value of zero for incident plane waves from a particular direction, but will give a value of 1 where there is no net transport of acoustic energy, such as in the cases of reverberation or standing waves. Time averaging is used since it is difficult to determine an instantaneous measure of diffuseness.
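The diffuseness estimate of equation (11) can be sketched in the time domain as follows. Several things here are assumptions and not taken from the paper: the energy density is the standard acoustic expression $E = \tfrac{1}{2}\rho_0\lVert u\rVert^2 + p^2/(2\rho_0 c^2)$, the values of $\rho_0$ and $c$ are nominal, the averaging runs over the whole input instead of short windows per frequency band, and the test signals are synthetic.

```python
import math

RHO0 = 1.204          # assumed density of air, kg/m^3
C0 = 343.0            # assumed speed of sound, m/s
Z0 = RHO0 * C0        # characteristic acoustic impedance of air

def diffuseness(w, x, y, z):
    """Estimate psi = 1 - ||<I>|| / (c <E>) from first order B-format samples."""
    n = len(w)
    scale = 1.0 / (math.sqrt(2.0) * Z0)
    mean_i = [0.0, 0.0, 0.0]
    mean_e = 0.0
    for wt, xt, yt, zt in zip(w, x, y, z):
        u = (scale * xt, scale * yt, scale * zt)       # equation (10)
        for axis in range(3):
            mean_i[axis] += wt * u[axis] / n           # equations (8) and (9)
        u_sq = sum(c * c for c in u)
        # Assumed energy density: kinetic plus potential term.
        mean_e += (0.5 * RHO0 * u_sq + wt * wt / (2.0 * RHO0 * C0 ** 2)) / n
    if mean_e == 0.0:
        return 1.0
    norm_i = math.sqrt(sum(c * c for c in mean_i))
    return 1.0 - norm_i / (C0 * mean_e)

# Synthetic plane wave along the x axis: X carries sqrt(2) times the W signal.
s = [math.sin(0.1 * k) for k in range(200)]
plane = diffuseness(s, [math.sqrt(2.0) * v for v in s], [0.0] * 200, [0.0] * 200)

# Synthetic field whose direction flips every sample, so no net energy flows.
flip = [math.sqrt(2.0) * (1.0 if k % 2 == 0 else -1.0) for k in range(200)]
no_flow = diffuseness([1.0] * 200, flip, [0.0] * 200, [0.0] * 200)
print(plane, no_flow)
```

The two synthetic cases reproduce the limits described above: close to zero for the plane wave and one where there is no net transport of energy.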


The output of the analysis is then subject to smoothing based on the Equivalent Rectangular Bandwidth (ERB) scale, such that the resolution of the human auditory system is approximated. Since the frequency dependent direction of arrival of the non-diffuse portion of the sound field can be determined, HOA reproduction can be achieved by re-encoding point-like sources, corresponding to the direction indicated in each temporal average and frequency band, into a higher order spherical harmonic representation. The resultant Ambisonic signals are then weighted in each frequency band $k$ according to $1 - \psi_k$. However, it is only vital to re-encode non-diffuse components to higher order, and the diffuse field can be obtained by multiplying the FOA signals by $\sqrt{\psi_k}$ and forming a first order decode. This is justified since source localisation is dependent on the direction of arrival of the direct sound and early reflections, and not on late room reverberation [24]. Thus, from the perceptual point of view, it is questionable whether there is a need to preserve the full directional accuracy of the reverberant field. Furthermore, if there exists a general directional distribution to the diffuse field, this will still be preserved in first order form. On the other hand, the diffuse component should not be simply derived from the 0th order signal. One can easily see that such a solution would provide perfectly correlated versions of the diffuse field to the left and right ear signals, which have no equivalent in the physical world (i.e. real physical sound field). Moreover, interaural decorrelation is an important factor in providing spatial impression in enclosed environments [25].

Figure 4 shows an example of the first 20 ms of a 1st order impulse response taken in a reverberant hall [26]. Here, the source was located 3 m from a Soundfield ST350 microphone and the Spatial Room Impulse Response (SRIR) was captured using the exponentially swept-sine tone technique [27]. In these plots, particular attention is drawn to the direct sound (coming from directly in front of the microphone) and a left wall reflection at approximately 14 ms. It can be seen that the directional resolution increases significantly with HOA representation. It should be noted that the A-format capsules on sound field microphones only display adequate directionality up to 10 kHz [28]. Spatial aliasing is therefore an issue for high frequencies and, as a result, the directional information above 10 kHz cannot be relied upon.

6. Method: Localization of Distance of Test Sounds

Different protocols have been used in the literature for subjective assessment of distance perception, most notably a verbal report [29, 30], direct or indirect blind walking [31, 32] or imagined timed walking [32]. All of these methods have proved to provide reliable and comparable results for both auditory and visual stimuli, with direct blind walking exhibiting the least between-subject variability [31, 32].

In former work [26], the authors of this paper developed a method where subjects indicated the perceived distance of real and virtual sound sources by selecting one of several physical loudspeakers lined up (and slightly offset in order to provide ‘acoustic transparency’) in front of their eyes. However, for the present study, in order to completely eliminate any possible anchors as well as visual cues, it was decided to utilize the method of direct blind walking. One of the main concerns in the experiment was a direct comparison of distance perception of real sound sources versus virtual sound sources presented over headphones. Due to different apparatus requirements, the experiment had to be conducted in two separate phases.

Figure 4. Ambisonic sound field from 1st order measurement with a Soundfield ST350, with the direct sound and the left wall reflection marked. (a) 1st order representation, (b) 3rd order up-mix.

6.1. Participants

Seven participants, aged 24–58, took part in the experiment. All subjects were of good hearing and were either music technology students or practitioners actively involved in audio research or production. Prior to the test, HRIR data for all the participants were obtained in a sound-proof, large (18 × 15 × 10 m³) but quite damped (T60 at 1000 Hz = 0.57 s) multipurpose room (Black Box) in the Department of Theatre, Film and Television at the University of York. Additional damping was assured by thick, heavy curtains covering all four walls and a carpet on the floor. The measurement process consisted of a standard procedure, where miniature omnidirectional microphones (Knowles FG-23629-P16) were placed at the entrance of a blocked ear canal in order to capture acoustic pressure generated by one loudspeaker at a time, located at constant distance and varying angular direction.

Figure 5. Measuring Head Related Impulse Responses with miniature microphones.

Subjects were seated on an elevated platform, so that their ears were 2.20 m above the ground and their head was in the centre of a spherical loudspeaker array arranged in diametrically opposed pairs. The ear height was calibrated using a laser guide, as shown in Figure 5. The array consisted of 16 full range Genelec 8050A loudspeakers, since the intention was to reproduce Ambisonic sound fields up to and including 3rd order. This 3-D setup, shown in Figure 6, comprised a flat-front horizontal octagon and a cube (four loudspeakers on top and four on the bottom). The radius of the loudspeaker array (and thus the virtual loudspeaker array) was 3.27 m. For FOA-to-binaural decode, only virtual loudspeakers from the cube configuration were utilized, since no directional resolution is gained by using a higher number of loudspeakers. Furthermore, despite careful alignment, oversampling of the sound field with higher numbers of speakers has the potential to yield sound field distortions [33]. Note that for 2nd and 3rd order reproduction, all 16 loudspeakers were used. Although the oversampled configuration was not optimal from the 2nd order reproduction point of view, it was not possible to easily and accurately rearrange the loudspeaker array in order to accommodate a different layout.

Figure 6. Array of 16 loudspeakers used for HRIR measurements.

HRIRs were captured using the exponentially swept-sine tone technique [27] at 44.1 kHz sampling rate and 16-bit resolution. Since the measurement environment was not fully anechoic, further processing of the measured data was necessary. The HRIRs were tapered before the arrival of the first reflection (from the floor), yielding filter kernels with 257 taps, and were subsequently diffuse-field equalized.
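The truncation step can be sketched as below. The paper does not state which window was used, so the half-Hann fade-out and its 32-sample length are assumptions made only for illustration; the 257-tap length and the 44.1 kHz sampling rate are the values given in the text.

```python
import math

FS = 44100   # sampling rate in Hz
TAPS = 257   # kernel length

def taper_hrir(hrir, taps=TAPS, fade=32):
    """Truncate an impulse response to `taps` samples with a half-Hann fade-out.

    The fade shape and length are illustrative assumptions.
    """
    out = [float(v) for v in hrir[:taps]]
    out += [0.0] * (taps - len(out))            # zero-pad short responses
    for k in range(fade):
        # Gain falls smoothly to exactly zero at the final sample.
        gain = 0.5 * (1.0 + math.cos(math.pi * (k + 1) / fade))
        out[taps - fade + k] *= gain
    return out

# Synthetic decaying response standing in for a measured HRIR.
measured = [math.exp(-k / 40.0) for k in range(1024)]
kernel = taper_hrir(measured)
window_ms = 1000.0 * TAPS / FS
print(len(kernel), f"{window_ms:.2f} ms")
```

At this sampling rate, 257 taps span a little under 6 ms, which sets the time by which the taper must have removed the floor reflection.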

6.2. Stimuli

The stimuli used in the experiment were pink noise bursts and phonetically balanced phrases selected from the TIMIT Acoustic-Phonetic Continuous Speech Corpus database and recorded by a female reader [34]. A sampling rate of 44.1 kHz and 16 bit resolution was used in both cases. These two sample types were selected in order to represent both unfamiliar and familiar sound sources. They were presented to the subjects in a pseudo-randomized manner to avoid any ordering effects.

For headphone reproduction, prior to the test phase, FOA impulse response measurements were taken at the listener position for each loudspeaker position using the exponentially swept-sine tone technique [27]. From these measurements, 2nd and 3rd order impulse response sets were extracted using the directional analysis approach outlined in Section 5. 0th order Ambisonics does not provide any directional information, which means that it would lack the cues that are investigated in the higher order renderings. Therefore, it was decided not to include it in this comparison.

The only psychoacoustical optimization applied to the Ambisonic decodes was shelf filtering, which was intended to satisfy Gerzon’s localization criteria for maximized velocity decode at low frequencies and energy decode at higher frequencies [35]. This involved changing the ratio of the pressure to velocity components at low and high frequencies. Whilst the crossover frequency for the high frequency boost in the pressure channel at first order is normally in the region of 400 Hz for regular loudspeaker listening, here we restore the crossover point to 700 Hz, since the subject is always perfectly centred in the virtual loudspeaker array.

6.3. Test Environment and Apparatus

A series of subjective listening tests was conducted in the Large Rehearsal Room in the Department of Theatre, Film and Television at the University of York. The room dimensions were 12 × 9 × 3.5 m³ and the spatially averaged T60 at 1 kHz was 0.26 s. A low T60 was desired for this study, so the walls were covered with thick, heavy curtains, as shown in Figure 7. Since the up-mix from 1st to 2nd and 3rd order Ambisonics concerned only the deterministic part of the measured SRIRs, it was assumed that no advantage would be gained from using a more reverberant space.

A professional camera dolly track was set up roughly in the direction of the diagonal of the room. It not only allowed for testing distances of the real loudspeaker up to 8 m, but its non-symmetrical position also assured that early reflections of the same order from different surfaces did not easily coincide at the subject's ears, but instead arrived at different times. A single full-range loudspeaker (Genelec 8050A) was mounted on a camera dolly, which enabled it to be noiselessly translated by the experiment assistant to different locations. A guiding rope was hung along the dolly track, which was intended to help and guide the participants when walking toward the sound source. Since it was not possible to walk exactly on the dolly track, it was decided that the walking path would be directly next to it, as shown in Figure 7. The only weakness of this solution was that the sound source horizontal angle varied from 14.04 degrees at the closest distance (2 m) to 3.58 degrees at the furthest distance (8 m). However, this did not have any effect on the distance judgments, for two reasons. Firstly, the subjects were allowed (or even encouraged) to rotate their head in order to fully utilize the available ITD and ILD cues. Secondly, the initial head orientation was not in any way fixed. This, combined with the fact that there were no clear cues to the subject's initial orientation in the room at the origin, made this small initial angular offset unimportant. Furthermore, none of the participants reported any bias in their assessment based on the horizontal offset of the sound source.
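The reported horizontal angles (14.04 degrees at 2 m and 3.58 degrees at 8 m) are consistent with a fixed lateral offset between the walking path and the dolly track. The following minimal sketch assumes an offset of 0.5 m, a value that is inferred from the two angles and is not stated in the text; the function name is hypothetical.

```python
import math

# Assumption: lateral offset between walking path and dolly track,
# inferred from the reported angles rather than given in the paper.
LATERAL_OFFSET_M = 0.5


def horizontal_angle_deg(distance_m, offset_m=LATERAL_OFFSET_M):
    """Angle between the walking direction and the source, in degrees."""
    return math.degrees(math.atan2(offset_m, distance_m))


for distance in (2, 4, 6, 8):
    print(f"{distance} m: {horizontal_angle_deg(distance):.2f} degrees")
```

Under this assumption the angle shrinks steadily with distance, which is why the offset is largest for the nearest presentation point.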

For trials with binaural presentation, high quality open back headphones (AKG-K601) were used, which exhibit low levels of interaural magnitude and group delay distortion. Sound field rotation, tilt and tumble control was implemented via the TrackIR 5 infra-red head tracking system [36], resulting in stable virtual images with head rotations. The system responsible for playback of virtualized sound sources was built entirely in the Pure Data visual programming environment [37], and its combined latency (including head-tracker data porting and audio update rate) was 20 ms.
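As an illustration of how head-tracked rotation keeps a virtual image stable, the sketch below counter-rotates a first order sound field about the vertical axis. It is not the Pure Data implementation used in the experiment. It assumes that azimuth is measured from the x (front) axis towards the y (left) axis, and both function names are hypothetical.

```python
import math


def rotate_yaw(w, x, y, z, angle_rad):
    """Rotate a first order sound field about the z axis by angle_rad."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    # W (omnidirectional) and Z (vertical) are unaffected by a yaw rotation.
    return w, x * c - y * s, x * s + y * c, z


def compensate_head_yaw(w, x, y, z, head_yaw_rad):
    """Counter-rotate the field so the source stays fixed in the room."""
    return rotate_yaw(w, x, y, z, -head_yaw_rad)


# A source encoded at 30 degrees azimuth, listener turns 30 degrees towards it.
az = math.radians(30.0)
w, x, y, z = 1.0, math.cos(az), math.sin(az), 0.0
print(compensate_head_yaw(w, x, y, z, math.radians(30.0)))
```

After compensation the source sits directly in front of the listener (x close to 1, y close to 0), which is the behaviour expected from a stable virtual image. Tilt and tumble follow the same pattern about the other two axes.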

Figure 7. Participant performing a trial during the experiment.

6.4. Procedure

In the experiment, subjects entered the test environment blindfolded and without any prior expectation regarding the room dimensions, its acoustic properties or the test apparatus. They were guided by the experimenter to the reference point (the 'origin'). After a short explanation of the experiment objectives, a training session began with a short (3–5 min) walking-only trial until participants felt comfortable with walking blindfolded and using a guide rope. Next, they performed 4–6 training trials in which the same test stimuli to be used in the experiment (speech and pink noise) were played by the loudspeaker at randomly chosen distances. No feedback was given and no results were recorded after each test trial. The end of the training session was clearly announced, and after a 1 minute interval the first phase of the test began.

In test phase I, participants were asked to listen to static sound sources at randomly chosen points, focusing on the perceived distance. They could listen to any audio sample as many times as they wished. During the playback, they were instructed to stay still and refrain from any translational head movements. However, they were encouraged to rotate their head freely. After the playback had stopped, they were asked to walk, guided by the rope, to the point where they thought the sound originated from. The distance walked was subsequently recorded by the assistant using a laser measuring tool, after which the participant walked backwards to the origin. In the meantime, the loudspeaker was noiselessly translated to its new position and the test proceeded. As in the training session, no feedback was given at any stage.

During the first test phase, participants had to indicate the perceived distance for sound sources randomly located at 2 m, 4 m, 6 m or 8 m. Taking into account that both speech and pink noise burst samples were used (in a pseudo-random order), the number of trials in the first phase added up to 8. Each subject performed all the trials only once.

Upon completion of the first phase of the test, there was a short (approximately 2 minutes) interval that was required in order to put on the headphones and calibrate the head-tracking system. In phase II, subjects were also asked to identify the sound source distance, but this time using Ambisonic sound fields presented over headphones. Other than the fact that headphones and the head-tracking system were used, the test protocol remained the same as in phase I. However, because there were three playback configurations to be tested (1st, 2nd and 3rd order Ambisonics), participants had to perform 24 trials instead of 8. Instead of separate phases for each Ambisonic order, all samples were randomly presented to the subject within the same test phase. Again, subjects performed all the trials only once and no feedback was given at any stage.
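The trial counts follow directly from the factor levels: 4 distances × 2 stimuli give 8 trials in phase I, and adding 3 Ambisonic orders gives 24 trials in phase II. A minimal sketch of how such randomized trial lists could be generated is given below; the function names and the use of Python's `random` module are illustrative assumptions and do not describe the software used in the study.

```python
import itertools
import random

DISTANCES_M = (2, 4, 6, 8)
STIMULI = ("speech", "pink_noise")
AMBISONIC_ORDERS = (1, 2, 3)


def phase_one_trials(rng):
    """Every distance/stimulus pair once, in random order (real loudspeaker)."""
    trials = list(itertools.product(DISTANCES_M, STIMULI))
    rng.shuffle(trials)
    return trials


def phase_two_trials(rng):
    """Every order/distance/stimulus triple once, mixed within one phase."""
    trials = list(itertools.product(AMBISONIC_ORDERS, DISTANCES_M, STIMULI))
    rng.shuffle(trials)
    return trials


rng = random.Random(2012)  # fixed seed only so the sketch is repeatable
print(len(phase_one_trials(rng)), len(phase_two_trials(rng)))
```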

7. Results

The perceived sound source distance (indicated by the distance walked) was collected from 7 subjects for 4 presentation points (2 m, 4 m, 6 m and 8 m), two stimuli (female speech and pink noise bursts) and four playback options: 1st, 2nd and 3rd order Ambisonics and real loudspeakers, which for analysis we will denote FOA, SOA, TOA and REAL, respectively. With headphone trials, none of the participants reported in-head localization; however, there were 3 cases where the proximity of the sound source was very apparent, so participants decided not to move at all. In some cases, the virtual sound source was initially localized behind the subjects, but all participants were able to resolve the confusion by applying head rotation.

We computed the mean values of walked distances μ for each test condition, along with the corresponding standard errors se(μ). The results are presented separately for each stimulus type within 95% confidence intervals.
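For readers who want to reproduce this kind of summary, the sketch below computes a mean, its standard error and a 95% confidence interval for one test condition. The walked distances are hypothetical placeholders, not data from the study, and the use of a Student t critical value for 7 subjects (6 degrees of freedom) is an assumption, since the text does not say how the intervals were formed.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t for 6 degrees of freedom.
T_CRIT_DF6 = 2.447


def mean_se_ci(samples, t_crit=T_CRIT_DF6):
    """Return (mean, standard error, (ci_low, ci_high)) for one condition."""
    mu = statistics.mean(samples)
    se = statistics.stdev(samples) / math.sqrt(len(samples))
    return mu, se, (mu - t_crit * se, mu + t_crit * se)


# Hypothetical walked distances [m] for seven subjects in one condition.
walked = [1.2, 1.9, 1.5, 2.1, 1.4, 1.8, 1.6]
mu, se, (low, high) = mean_se_ci(walked)
print(f"mean = {mu:.3f} m, se = {se:.3f} m, 95% CI = [{low:.3f}, {high:.3f}] m")
```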

As expected, the perception of distance for the real sources was more accurate for near sources. Beyond 4 m, distance perception was continuously underestimated, which is congruent with the previous studies outlined in section 2. Furthermore, the standard deviation of localization increases as the source moves further into the diffuse field. We also see that unfamiliar stimuli produce greater variability in subjects' answers. The mean localization of the virtual sources follows the reference source localization well. The answers for virtual sources deviate from their means in roughly the same fashion as the answers for reference sources, as localization becomes more difficult within the diffuse field.

Figure 8. Mean localization of real and virtual sound sources (female speech). Horizontal axis: real distance [m]; vertical axis: distance walked [m]; series: FOA, SOA, TOA and Real.

Figure 9. Mean localization of real and virtual sound sources (pink noise bursts). Horizontal axis: real distance [m]; vertical axis: distance walked [m]; series: FOA, SOA, TOA and Real.

Since the study followed a within-subject factorial design with 2 (stimuli) × 4 (playback conditions), a two-way ANOVA has been performed for each presentation distance in order to investigate the effects of these two factors (referred to later as factors A and B), as well as potential interaction effects. The null hypothesis being tested here is that the mean perceived distances for all the stimuli and playback methods do not differ significantly.

$$ H_0:\ \mu_{FOA} = \mu_{SOA} = \mu_{TOA} = \mu_{Real} = \mu $$

$$ H_1:\ \text{not all localization means } (\mu_i) \text{ are the same} $$

No statistically significant effect of stimuli (familiar vs. unfamiliar) on the perception of distance has been found (F2m(3, 48) = 0.835, p = 0.365; F4m(3, 48) = 2.0462, p = 0.159; F6m(3, 48) = 2.575, p = 0.115; F8m(3, 48) = 2.0462, p = 0.159). For distances of 4 m and more, playback option also had no statistically significant effect (F4m(3, 48) = 2.192, p = 0.101; F6m(3, 48) = 0.665, p = 0.577; F8m(3, 48) = 0.202, p = 0.894).

Table I. Mean localization [m] of virtual and real sound sources at 2 m.

| Stimulus | μ_FOA | μ_SOA | μ_TOA | μ_Real |
|---|---|---|---|---|
| Speech | 1.119 | 1.389 | 0.841 | 1.638 |
| Noise | 0.877 | 1.001 | 0.902 | 1.641 |

Table II. Correlation coefficients ρ and corresponding p-values for pairs of distance estimations for real and virtual sound sources (Speech).

| Pair | ρ | p-value |
|---|---|---|
| Real vs. FOA | 0.9828 | 0.0172 |
| Real vs. SOA | 0.9960 | 0.0040 |
| Real vs. TOA | 0.9590 | 0.0410 |

Table III. Correlation coefficients ρ and corresponding p-values for pairs of distance estimations for real and virtual sound sources (Noise).

| Pair | ρ | p-value |
|---|---|---|
| Real vs. FOA | 0.9913 | 0.0087 |
| Real vs. SOA | 0.9857 | 0.0143 |
| Real vs. TOA | 0.9972 | 0.0028 |

However, a statistically significant difference has been detected for the distance of 2 m. In larger study designs with multiple levels, it is advisable to use the Honestly Significant Difference (HSD) approach, since there is an increased risk of a spuriously significant difference arising purely by chance. So, in order to investigate further where the difference occurs, an HSD has been computed (HSD = 1.423 m). If we now compile the table of mean perceived distances for the sound sources located at 2 m (Table I), we can see that all of the values clearly lie within a single HSD of each other and cannot be distinguished. We can then safely assume an ANOVA false alarm (type I error) and no statistically significant effect of playback method for the sources at the distance of 2 m as well.
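The HSD comparison for the 2 m sources can be checked with a few lines of arithmetic. The sketch below uses the means reported in Table I and the stated HSD of 1.423 m, and tests whether the largest pairwise gap for each stimulus stays below the HSD; the variable and function names are illustrative.

```python
from itertools import combinations

HSD_M = 1.423  # Honestly Significant Difference reported in the text

# Mean perceived distances [m] at 2 m, as reported in Table I.
MEANS_2M = {
    "speech": {"FOA": 1.119, "SOA": 1.389, "TOA": 0.841, "REAL": 1.638},
    "noise": {"FOA": 0.877, "SOA": 1.001, "TOA": 0.902, "REAL": 1.641},
}


def max_pairwise_gap(means):
    """Largest absolute difference between any two condition means."""
    return max(abs(a - b) for a, b in combinations(means.values(), 2))


for stimulus, means in MEANS_2M.items():
    gap = max_pairwise_gap(means)
    print(f"{stimulus}: largest gap {gap:.3f} m, within HSD: {gap < HSD_M}")
```

The largest gaps are 0.797 m for speech and 0.764 m for noise, both well inside a single HSD, which matches the conclusion drawn in the text.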

Lastly, for all the distances, no synergetic effects of factors A (stimuli) and B (playback conditions) have been detected.

Additionally, we calculated correlation coefficients ρ for pairs of distance estimations for real and virtual sound sources (either 1st, 2nd or 3rd order) and two stimuli. In all cases, high correlation coefficients have been obtained, which confirms our findings that, for these particular test conditions, the perception of distance of binaurally rendered Ambisonic sound fields of orders 1 to 3 cannot be distinguished from the perception of distance of the real sound sources.
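A correlation of this kind can be sketched as follows. The per-distance means in the example are hypothetical placeholders, since only the 2 m means are tabulated. The sketch also relies on a standard result: for exactly four paired points under a bivariate normal null hypothesis, Pearson's r is uniformly distributed on [-1, 1], so the two-sided p-value equals 1 − |r|. The reported pairs (for example ρ = 0.9828, p = 0.0172) are consistent with this, which suggests, although the text does not state it, that each correlation was computed over the four presentation distances.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def p_value_four_points(r):
    """Two-sided p-value for Pearson r when n = 4 (r uniform under the null)."""
    return 1.0 - abs(r)


# Hypothetical mean walked distances [m] at 2, 4, 6 and 8 m.
real_means = [1.6, 3.4, 4.6, 5.5]
virtual_means = [1.1, 3.0, 4.5, 5.1]
r = pearson_r(real_means, virtual_means)
print(f"r = {r:.4f}, p = {p_value_four_points(r):.4f}")
```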

8. Discussion

The results presented for real sources corroborate the classic underestimation of source distance, as reported in the literature. These results were used as a basis with which to measure the ability of Ambisonic sound fields of different orders to present sources at different distances. It was expected that a further underestimation of the source distance would ensue with the binaural rendering, as reported in [17]. However, this was not the case, even for first order presentations, and the apparent distances of the virtual sources matched the real source distances well. One should note that the major difference between this study and that of [17] is our use of head-tracking, indicating the importance of head movements in perceiving source distance, which develops the findings of Waller [18] and Ashmead et al. [10] on user interaction in a virtual space. Further work is required to quantify the effect of this.

Moreover, the presented study demonstrates that the enhanced directional accuracy gained by presenting sound sources in HOA through head-tracked binaural rendering does not yield a significant improvement in the perception of the source distance. What is noteworthy is that for each order there is no significant difference in the perception of the source location when compared to real-world sources. We therefore conclude that sound field directionality for distance perception is sufficient with 1st order playback.

The presence of the ANOVA false alarm at the 2 m point is of interest. It is noteworthy that the 2 m point represents a source inside the virtual array geometry. It is a known issue that virtual sound sources rendered inside the array of loudspeakers cannot be reproduced in a straightforward way without artifacts. Some of these artifacts include incorrect wavefront curvature and insufficient bass boost.

In the first case, there is ample evidence in the literature to suggest that the wavefront curvature translates to significant binaural cues for sound sources near the head [30, 38]. It was already shown in section 2.1 that, as a source moves closer to the head, the levels of the monaural transfer function and the ILD both change significantly with source angle. However, this effect is not strong at 1 m and beyond. For sources further away, it has been shown in [39] that it is very difficult to assess distance by binaural cues alone.

In the second case, the requirement for distance compensation filtering due to near field effects for the large loudspeaker radius (3.27 m) and the given source distances (>2 m) is only prominent below 100 Hz. For the female speech test stimuli, this will not have an effect, since the first formant frequencies do not go below 180 Hz. Also, the current method employed for capturing HRIRs allowed for reliably obtaining filters with a frequency response reaching down to around 170 Hz, thereby also band-limiting the delivery of the pink noise stimuli.

Finally, there was no significant difference in the results presented for different sources, although the greater variance in the results for pink noise suggests that the familiarity of the source does indeed play a role in the perception of source distance, as mentioned in section 2.3. Future studies will investigate the use of these monaural cues further, and will utilize 0th order sound field rendering, since it will remove the influence of any directional information.

Considering the aforementioned study of Bronkhorst et al. [14], where the accuracy of distance perception for binaural playback increases with the number of reflections, our findings demonstrate that the net effect of the monaural cues of direct to reverberant ratio, level difference and time of arrival of early reflections is of greater importance in distance perception for binaural rendering than Ambisonic directional accuracy beyond 1st order.

9. Conclusions

We have assessed, through subjective analysis, the perceived source distance in virtual Ambisonic sound fields in comparison to real world sources. The hypothesis tested was that enhanced directional accuracy of the deterministic part of the sound field may lead to better reconstruction of environmental depth and thus improve the perception of sound source distance. However, it was shown that Ambisonic reproduction matches the perceived real world source distances well, even at 1st order, and no improvement in this regard was observed when increasing the order. It must be emphasized, though, that this analysis applies to Ambisonic-to-binaural decodes, with higher order synthesis achieved using the directional analysis method of [23]. Therefore, further work will examine this topic for loudspeaker reproduction, for both centre and off-centre listening, as well as investigate the effectiveness of HOA synthesis in comparison to real world HOA measurements.

Acknowledgments

The authors gratefully acknowledge the participation of the test subjects for both their time and constructive comments, as well as the technical support staff at the Department of Theatre, Film and Television at the University of York for their assistance in the experimental setups. This research is supported by Science Foundation Ireland.

References

[1] L. Fauster: Stereoscopic techniques in computer graphics. Technical paper, TU Wien, 2007.

[2] J. Lee: Head tracking for desktop VR displays using the Wii remote. http://johnnylee.net/projects/wii, accessed 30th Sept. 2011.

[3] D. R. Begault: Direct comparison of the impact of head tracking, reverberation, and individualized head-related transfer functions on the spatial perception of a virtual sound source. J. Audio Eng. Soc. 49 (2001) 904–916.

[4] M. Otani, T. Hirahara: Auditory artifacts due to switching head-related transfer functions of a dynamic virtual auditory display. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. E91-A (2008) 1320–1328.

[5] V. Pulkki: Virtual sound source positioning using Vector Base Amplitude Panning. J. Audio Eng. Soc. 45 (1997) 456–466.

[6] A. J. Berkhout: A holographic approach to acoustic control. J. Audio Eng. Soc. 36 (1988) 977–995.

[7] M. A. Gerzon: Periphony: With-height sound reproduction. J. Audio Eng. Soc. 21 (1973) 2–10.

[8] F. Rumsey: Spatial quality evaluation for reproduced sound: Terminology, meaning, and a scene-based paradigm. J. Audio Eng. Soc. 50 (2002) 651–666.

[9] J. Blauert: Communication acoustics. Springer, 2008.

[10] D. H. Ashmead, D. L. Davis, A. Northington: Contribution of listeners' approaching motion to auditory distance perception. J. Exp. Psy.: Hum. Percep. and Perform. 21 (1995) 239–256.

[11] E. Czerwinski, A. Voishvillo, S. Alexandrov, A. Terekhov: Propagation distortion in sound systems: Can we avoid it? J. Audio Eng. Soc. 48 (2000) 30–48.

[12] S. H. Nielsen: Auditory distance perception in different rooms. J. Audio Eng. Soc. 41 (1993) 755–770.

[13] M. B. Gardner: Distance estimation of 0° or apparent 0°-oriented speech signals in anechoic space. J. Acoust. Soc. Am. 45 (1969) 47–53.

[14] A. W. Bronkhorst, T. Houtgast: Auditory distance perception in rooms. Nature 397 (1999) 517–520.

[15] J. B. Allen, D. A. Berkley: Image method for efficiently simulating small-room acoustics. J. Acoust. Soc. Am. 65 (1979) 943–950.

[16] M. Rychtarikova, T. V. d. Bogaert, G. Vermeir, J. Wouters: Binaural sound source localization in real and virtual rooms. J. Audio Eng. Soc. 57 (2009) 205–220.

[17] J. S. Chan, C. Maguinness, D. Lisiecka, C. Ennis, M. Larkin, C. O'Sullivan, F. Newell: Comparing audiovisual distance perception in various real and virtual environments. Proc. of the 32nd Euro. Conf. on Vis. Percep., Regensburg, Germany, 2009.

[18] D. Waller: Factors affecting the perception of interobject distances in virtual environments. Presence: Teleoper. Virtual Environ. 8 (1999) 657–670.

[19] A. McKeag, D. McGrath: Sound field format to binaural decoder with head-tracking. Proc. of the 6th Australian Regional Convention of the AES, 1996.

[20] M. Noisternig, A. Sontacchi, T. Musil, R. Holdrich: A 3D Ambisonic based binaural sound reproduction system. Proc. of the 24th Int. Conf. of the Audio Eng. Soc., Alberta, Canada, 2003.

[21] B.-I. Dalenbäck, M. Strömberg: Real time walkthrough auralization - the first year. Proc. of the Inst. of Acous., Copenhagen, Denmark, 2006.

[22] C. Masterson, S. Adams, G. Kearney, F. Boland: A method for head related impulse response simplification. Proc. of the 17th European Signal Processing Conference (EUSIPCO), Glasgow, Scotland, 2009.

[23] J. Merimaa, V. Pulkki: Spatial impulse response rendering I: Analysis and synthesis. J. Audio Eng. Soc. 53 (2005).

[24] W. M. Hartmann: Localization of sound in rooms. J. Acoust. Soc. Am. 74 (1983) 1380–1391.

[25] D. Griesinger: Spatial impression and envelopment in small rooms. Proc. of the 103rd Conv. of the Audio Eng. Soc., New York, USA, 1997.

[26] G. Kearney, M. Gorzel, H. Rice, F. Boland: Depth perception in interactive virtual acoustic environments using higher order ambisonic soundfields. Proc. of the 2nd Int. Ambisonics Symp., Paris, France, 2010.

[27] A. Farina: Simultaneous measurement of impulse response and distortion with a swept-sine technique. Proc. of the 108th Conv. of the Audio Eng. Soc., Paris, France, 2000.

[28] M. Gerzon: The design of precisely coincident microphone arrays for stereo and surround sound. Proc. of the 50th Conv. of the Audio Eng. Soc., London, UK, 1975.

[29] C. Guastavino, B. F. G. Katz: Perceptual evaluation of multi-dimensional spatial audio reproduction. J. Acoust. Soc. Am. 116 (2004) 1105–1115.

[30] P. Zahorik: Assessing auditory distance perception using virtual acoustics. J. Acoust. Soc. Am. 111 (2002) 1832–1846.

[31] J. M. Loomis, R. L. Klatzky, J. W. Philbeck, R. G. Golledge: Assessing auditory distance perception using perceptually directed action. Perception and Psychophysics 60 (1998) 966–980.

[32] T. Y. Grechkin, T. D. Nguyen, J. M. Plumert, J. F. Cremer, J. K. Kearney: How does presentation method and measurement protocol affect distance estimation in real and virtual environments? ACM Trans. Appl. Percept. 7 (2010) 26:1–26:18.

[33] S. Bertet: Formats audio 3D hiérarchiques: Caractérisation objective et perceptive des systèmes ambisonics d'ordres supérieurs. PhD dissertation, INSA Lyon, 2008.

[34] W. M. Fisher, G. R. Doddington, K. M. Goudie-Marshall: The DARPA speech recognition research database: Specifications and status. Proc. of the DARPA Workshop on Speech Recognition, 1986.

[35] M. A. Gerzon, G. J. Barton: Ambisonic decoders for HDTV. Proc. of the 92nd Conv. of the Audio Eng. Soc., Vienna, Austria, 1992.

[36] NaturalPoint: TrackIR 5. http://www.naturalpoint.com/trackir, accessed 30th Sept. 2011.

[37] M. Puckette: Pure Data. http://puredata.info, accessed 30th Sept. 2011.

[38] P. Zahorik, D. S. Brungart, A. W. Bronkhorst: Auditory distance perception in humans: A summary of past and present research. Acta Acustica united with Acustica 91 (2005) 409–420.

[39] H. Wittek: Perceptual differences between Wavefield Synthesis and Stereophony. Department of Music and Sound Recording, School of Arts, Communication and Humanities, University of Surrey, UK, 2007.

2. Distance Perception

It is important to note that throughout the literature there exists a clear distinction between 'distance' and 'depth', both understood as perceptual attributes of sound. According to [8], 'distance' is related to the physical range between the sound source and a listener, whereas 'depth' relates to the recreated auditory scene as a whole and concerns a sense of perspective in that scene.

2.1. Distance Perception in a Free Field

Although the human ability to perceive sources at different distances is not fully understood, there are several key factors which are known to contribute to distance perception. In the first case, changes in distance lead to changes in the monaural transfer function (the sound pressure at one ear). This is shown in Figure 1 for a spherical model of a head. We see that for sources of less than 1 m distance, the sound pressure level varies depending on the angle of incidence due to the shadowing effects of the head. Beyond 1 m, the intensity of the source decays according to the inverse square law.

However, absolute monaural cues will only be meaningful if we have some prior knowledge of the source level, i.e. how familiar we are with the source. In other words, a form of semiosis occurs, where the perception of localization is based on anticipation and experience [9]. For example, for normal level speech (approximately 60 dB at 1 m), we expect nearer sources to be loud and quieter sources to be further away. However, this is more difficult to assess for synthetic sounds or sounds that we are unfamiliar with.
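The inverse square law mentioned above corresponds to a level drop of roughly 6 dB for each doubling of distance. As a minimal sketch, assuming free-field propagation beyond 1 m and using the 60 dB at 1 m speech level from the text as the reference (the function name is illustrative):

```python
import math


def level_at_distance_db(level_at_1m_db, distance_m):
    """Free-field sound pressure level at distance_m, given the level at 1 m."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return level_at_1m_db - 20.0 * math.log10(distance_m)


for distance in (1, 2, 4, 8):
    print(f"{distance} m: {level_at_distance_db(60.0, distance):.1f} dB")
```

A listener who knows that normal speech is about 60 dB at 1 m can therefore, in principle, map a received level back to a distance; for an unfamiliar source the reference level is unknown and the mapping is ambiguous.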

It is interesting to note that for sources in the median plane, the level at distances less than 1 m does not change as dramatically as for sources located at the ipsilateral point. This will not significantly affect the low frequency Interaural Time Difference (ITD), but it is reflected in the Interaural Level Difference (ILD), as shown in Figure 2. We note that the most extreme ILD is exhibited at the side of the head (90°) due to the maximum head shadowing effect. For a similar reason, subconscious head movements may be regarded as another important cue, since level changes close to the source will be more apparent than far from it [10]. Thus, near-field ILD cues exist which aid us in discriminating source distance.

On the other hand, for larger distances and high sound pressure levels, the propagation speed of a sound wave in a medium ceases to be constant with frequency, which may lead to distortion of the waveform [11]. Furthermore, sound waves travelling a substantial distance also undergo a process of energy absorption by water molecules in the atmosphere. This is more apparent for the high-frequency energy of the wave and leads to spectral changes (low-pass filtering) of the sound being heard.

2.2. Distance Perception in a Reverberant Field

Figure 1. RMS monaural transfer function for a spherical head model at the left ear for a broadband source at different angles with varying source distance (reference = plane wave at (0°, 0°)). Horizontal axis: source distance (m); vertical axis: RMS monaural transfer function (dB); curves for 0°, 30°, 60° and 90°.

Figure 2. Interaural level difference of a spherical head model for a broadband source at different angles with varying source distance. Horizontal axis: source distance (m); vertical axis: ILD (dB); curves for 0°, 30°, 60° and 90°.

In reverberant rooms, the ratio of the direct to reverberant sound plays an extremely important role in distance perception. For near sources, where the direct field energy is much greater than the reverberant field, the sound pressure level changes approximately in accordance with free-field conditions. However, for source-listener distances greater than the critical distance, the level of reverberation is in general independent of the source position due to the homogeneous level of the diffuse field, and the direct to reverberant ratio changes by approximately 6 dB per doubling of distance from the source.
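The behaviour of the direct to reverberant ratio can be illustrated with a deliberately simple model. The sketch assumes that the direct sound follows the inverse square law, that the reverberant level is constant throughout the room, and that the two are equal at the critical distance; the critical distance value used here is a hypothetical parameter, not a measurement from the study.

```python
import math


def direct_to_reverberant_db(distance_m, critical_distance_m):
    """D/R ratio in dB; 0 dB at the critical distance by definition."""
    return 20.0 * math.log10(critical_distance_m / distance_m)


def total_level_re_reverb_db(distance_m, critical_distance_m):
    """Total level relative to the (constant) reverberant level, in dB."""
    dr_db = direct_to_reverberant_db(distance_m, critical_distance_m)
    return 10.0 * math.log10(10.0 ** (dr_db / 10.0) + 1.0)


CRITICAL_DISTANCE_M = 1.5  # hypothetical value for illustration only
for distance in (0.5, 1.5, 3.0, 6.0):
    dr = direct_to_reverberant_db(distance, CRITICAL_DISTANCE_M)
    total = total_level_re_reverb_db(distance, CRITICAL_DISTANCE_M)
    print(f"{distance} m: D/R = {dr:+.1f} dB, total = {total:.1f} dB re reverb")
```

In this model the total level flattens out beyond the critical distance while the D/R ratio keeps falling by about 6 dB per doubling, which is why the ratio remains a usable distance cue where overall level is not.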

The directions of arrival of the early reflections are another parameter which changes according to the source-listener position and can be regarded as an important factor in creating environmental depth. Whether it is useful to listeners in determining the distance to the sound source in the presence of other cues, like sound intensity, direct to reverberant energy ratio or the arrival pattern of delays, remains an open question that needs to be addressed. Ambisonics allows for enhanced directional reproduction of deterministic components of a sound field by increasing the order of spherical harmonic decomposition. However, better directional localization can be achieved without affecting other important cues for distance estimation, like overall sound intensity or direct to reverberant energy ratio. Thus, it can constitute an ideal framework for testing whether less apparent properties of a sound field can influence the perception of distance.

2.3. Former Psychoacoustical Studies on Distance Perception

The perception of distance has been shown to be one that is not linearly proportional to the source distance. For example, both Nielsen [12] and Gardner [13] have shown that the localization of speech signals is consistently underestimated in an anechoic environment. This underestimation has also been shown by other authors in the context of reverberant environments, both real and virtual. In [14], Bronkhorst et al. demonstrate that in a damped virtual environment, sources are consistently perceived to be closer than in a reverberant virtual environment, due to the direct to reverberant ratio. In their studies, the room simulation is conducted using simulated Binaural Room Impulse Responses (BRIRs) created from the image source method [15]. They show how perceived distance increases rapidly with the number and amplitude of the reflections.

In a similar study, Rychtarikova et al. [16] investigated the difference in localization accuracy between real rooms and computationally derived BRIRs. Their findings show that at 1 m, localization accuracy in both the virtual and real environments is in good agreement with the true source position. However, at 2.4 m the accuracy degrades, and high frequency localization errors were found in the virtual acoustics, pertaining to the difference in HRTFs between the model and the subject. In the same vein, Chan et al. [17] have shown that distance perception using recordings made from in-ear microphones on individual subjects again leads to underestimation of the source distance in virtual reverberant environments, more so than with real sources.

Waller [18] and Ashmead et al. [10] have identified that one of the factors improving distance perception is the listener's movement in the virtual or real space. It is therefore crucial to account for any listener movements (or lack thereof) in the experimental design.

Similarly, for headphone reproduction of virtual acoustic environments, small subconscious head rotations may lead to improvements in distance perception by providing enhanced ILD and ITD cues. Therefore, the sound field transformations should accurately reflect small changes in the orientation of the listener's head.

3. Ambisonic Spatialization

Ambisonics was originally developed by Gerzon, Barton and Fellgett [7] as a unified system for the recording, reproduction and transmission of surround sound. The theory of Ambisonics is based on the decomposition of the sound field measured at a single point in space into spherical harmonic functions, defined as

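$$ Y^{\sigma}_{mn}(\Phi,\Theta) = A_{mn}\, P_{mn}(\sin\Theta) \cdot \begin{cases} \cos(m\Phi) & \text{if } \sigma = +1 \\ \sin(m\Phi) & \text{if } \sigma = -1 \end{cases} \tag{1} $$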

where m is the order and n is the degree of the spherical harmonic, and P_mn is the fully normalized (N3D) associated Legendre function. The coordinate system used comprises x, y and z axes pointing to the front, left and up, respectively. Φ is the azimuthal angle with the clockwise rotation and Θ is the elevation angle from the x-y plane. For each order m there are (2m + 1) spherical harmonics.

In order to represent a plane wave over a loudspeaker array, we must ensure that

$$ s\, Y^{\sigma}_{mn}(\Phi,\Theta) = \sum_{i=1}^{I} g_i\, Y^{\sigma}_{mn}(\varphi_i,\theta_i) \tag{2} $$

where s is the pressure of the source signal from direction (Φ, Θ) and g_i is the ith loudspeaker gain from direction (φ_i, θ_i). We can then express the left hand side of equation (2) in vector notation, giving the Ambisonic channels

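$$ B = Y_{\Phi\Theta}\, s = \left[ Y^{1}_{00}(\Phi,\Theta),\ Y^{1}_{10}(\Phi,\Theta),\ \ldots,\ Y^{\sigma}_{mm}(\Phi,\Theta) \right]^{T} s \tag{3} $$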

Equation (2) can then be rewritten as

$$ B = C \cdot g \tag{4} $$

where C contains the encoding gains associated with the loudspeaker positions and g is the loudspeaker signal vector. In order to obtain g, we require a decode matrix D, which is the inverse of C. However, to invert C we need the matrix to be square, which is only possible when the number of Ambisonic channels is equal to the number of loudspeakers. When the number of loudspeaker channels is greater than the number of Ambisonic channels, which is usually the case, we then obtain the pseudo-inverse of C, where

$$ D = \mathrm{pinv}(C) = C^{T} \left( C\, C^{T} \right)^{-1} \tag{5} $$
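The pseudo-inverse decode of equations (4) and (5) can be sketched in plain Python for a first order sound field. The example assumes un-normalized first order components (1, cos az cos el, sin az cos el, sin el) rather than the N3D functions of equation (1), and a hypothetical cube of eight virtual loudspeakers; all helper names are illustrative. It builds C, forms D = Cᵀ(CCᵀ)⁻¹, decodes an encoded source to loudspeaker gains g, and checks that re-encoding g returns the original channels.

```python
import math


def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def encode(az, el):
    """Assumed un-normalized first order encoding gains (W, X, Y, Z)."""
    return [1.0, math.cos(az) * math.cos(el),
            math.sin(az) * math.cos(el), math.sin(el)]


# Hypothetical layout: eight virtual loudspeakers at the vertices of a cube.
cube_el = math.asin(1.0 / math.sqrt(3.0))
speakers = [(math.radians(a), s * cube_el)
            for a in (45, 135, 225, 315) for s in (1, -1)]

C = transpose([encode(az, el) for az, el in speakers])  # 4 x 8 encoding matrix
Ct = transpose(C)
D = matmul(Ct, inverse(matmul(C, Ct)))                  # 8 x 4 decode matrix

B = [[v] for v in encode(math.radians(30.0), 0.0)]      # source at 30 degrees
g = matmul(D, B)                                        # loudspeaker gains
B_check = matmul(C, g)                                  # re-encoded channels
```

Because there are more loudspeakers (8) than Ambisonic channels (4), C is not square and the right pseudo-inverse is used; the re-encoding check C·g = B is the property that equation (4) requires of the decode.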

Since the sound field is represented in a spherical coordinate system, sound field transformation matrices can be used to rotate, tilt and tumble the sound fields. In this way, the Ambisonic signals themselves can be controlled by the user, allowing for the virtual loudspeaker approach to be employed. For 3-D reproduction, the number I of virtual loudspeakers employed with the Ambisonics approach is dependent on the Ambisonic order m, where

$$ I \geq N = (m + 1)^{2} \tag{6} $$
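As a quick check of equation (6), the short sketch below lists the number of Ambisonic channels N, and hence the minimum number of virtual loudspeakers, for the orders compared in this paper. It also confirms that summing the (2k + 1) harmonics of each order k up to m gives (m + 1)². The function names are illustrative.

```python
def ambisonic_channels(order):
    """Number of 3-D Ambisonic channels N for a given order m."""
    return (order + 1) ** 2


def harmonics_in_order(order):
    """Number of spherical harmonics contributed by a single order."""
    return 2 * order + 1


for m in (1, 2, 3):
    print(f"order {m}: N = {ambisonic_channels(m)} channels, so I >= {ambisonic_channels(m)}")
```

First, second and third order therefore need at least 4, 9 and 16 virtual loudspeakers, respectively.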

4. Virtual Loudspeaker Reproduction

In the 'virtual loudspeaker' approach, HRTFs are measured at the 'sweet-spot' (the limited region in the centre of a reproduction array where an adequate spatial impression is generally guaranteed) in a multi-loudspeaker reproduction setup, and the resultant binaural playback is formed from the convolution of the loudspeaker feeds with the virtual loudspeakers. This concept is illustrated in Figure 3. For the left ear we have

$$ L = \sum_{i=1}^{I} h_{Li} \ast q_i \tag{7} $$

where ∗ denotes convolution, h_Li is the left ear HRIR corresponding to the ith virtual loudspeaker and q_i is the ith loudspeaker feed. Similar relations apply for the right ear signal. This method was first introduced by McKeag and McGrath [19], and examples of its adoption can be found in [20] and [21]. This approach has major computational advantages, since a complex filter kernel is not required and head rotation can be simulated by changing the loudspeaker feeds q as opposed to the HRTFs. Whilst the HRTFs in this case play an important role in the spatialization, ultimately it is the sound field creation over the virtual loudspeakers which gives the overall spatial impression. Most existing research uses a block frequency domain approach to this convolution. However, given that the virtual loudspeaker feeds are controlled via head-tracking in real time, a time-domain filtering approach can also be utilized. For short filter lengths, obtaining the output in a point-wise manner avoids the inherent latencies introduced by block convolution in the frequency domain. A strategy for significant reduction of the filter length without artifacts has been proposed in [22].
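The sum of convolutions in equation (7) can be written directly in the time domain. The following sketch uses plain Python lists and toy impulse responses; it is meant only to illustrate the structure of the virtual loudspeaker rendering for one ear, not the real-time implementation used in the experiment, and the function names are illustrative.

```python
def convolve(h, q):
    """Direct (point-wise) linear convolution of two finite sequences."""
    out = [0.0] * (len(h) + len(q) - 1)
    for i, hv in enumerate(h):
        for j, qv in enumerate(q):
            out[i + j] += hv * qv
    return out


def render_ear(hrirs, feeds):
    """Sum over virtual loudspeakers of HRIR convolved with its feed."""
    if len(hrirs) != len(feeds):
        raise ValueError("need one HRIR per loudspeaker feed")
    length = max(len(h) + len(q) - 1 for h, q in zip(hrirs, feeds))
    ear = [0.0] * length
    for h, q in zip(hrirs, feeds):
        for k, v in enumerate(convolve(h, q)):
            ear[k] += v
    return ear


# Toy example: two virtual loudspeakers with very short impulse responses.
left_hrirs = [[1.0, 2.0], [1.0]]
feeds = [[1.0, 1.0], [0.5, 0.5]]
print(render_ear(left_hrirs, feeds))  # [1.5, 3.5, 2.0]
```

When the head rotates, only `feeds` changes (through the rotated Ambisonic decode) while `left_hrirs` stays fixed, which is the computational advantage described above.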

5. Higher Order Synthesis

In order to compare the distance perception of different orders of Ambisonic sound fields, it is desirable to take real world sound field measurements. However, the formation of higher order spherical harmonic directional patterns is non-trivial. Thus, in order for us to change FOA impulse responses to HOA representations, we will employ a perceptual based approach which will allow us to synthesize the increased directional resolution that would be achieved with a HOA sound field recording. For this, we adopt the directional analysis method of Pulkki and Merimaa, found in [23]. Here, the B-format signals are analyzed in terms of sound intensity and energy in order to derive time-frequency based direction of arrival and diffuseness. The instantaneous intensity vector is given from the pressure p and particle velocity u as

$$\mathbf{I}(t) = p(t)\,\mathbf{u}(t) \qquad (8)$$

Since we are using FOA impulse response measurements, the pressure can be approximated by the 0th order Ambisonics component w(t), which is omnidirectional,

$$p(t) = w(t) \qquad (9)$$

and the particle velocity by

$$\mathbf{u}(t) = \frac{1}{\sqrt{2}\,Z_0} \left[ x(t)\,\mathbf{e}_x + y(t)\,\mathbf{e}_y + z(t)\,\mathbf{e}_z \right] \qquad (10)$$

where $\mathbf{e}_x$, $\mathbf{e}_y$ and $\mathbf{e}_z$ represent Cartesian unit vectors, x(t), y(t), z(t) are the FOA signals and $Z_0$ is the characteristic acoustic impedance of air.

Figure 3. The virtual loudspeaker reproduction concept.

The instantaneous intensity represents the direction of the energy transfer of the sound field, and the direction of arrival can be determined simply by the opposite direction of I. For FOA, we can calculate the intensity for each coordinate axis and in the frequency domain. Since a portion of the energy will also oscillate locally, a diffuseness estimate can be made from the ratio of the magnitude of the intensity vector to the overall energy density E, given as

$$\psi = 1 - \frac{\|\langle \mathbf{I} \rangle\|}{c\,\langle E \rangle} \qquad (11)$$

where $\langle \cdot \rangle$ denotes time averaging, $\|\cdot\|$ denotes the norm of the vector and c is the speed of sound. The diffuseness estimate will yield a value of zero for incident plane waves from a particular direction, but will give a value of 1 where there is no net transport of acoustic energy, such as in the cases of reverberation or standing waves. Time averaging is used since it is difficult to determine an instantaneous measure of diffuseness.
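Equations (8) to (11) can be tried out with the short broadband Python sketch below. It is not the authors' code: the energy density expression $E = \tfrac{\rho_0}{2}\left(p^2/Z_0^2 + \|\mathbf{u}\|^2\right)$, the values chosen for air density and speed of sound, and the use of a plain mean over the whole block as the time average are all assumptions made for this example. A full analysis would apply the same steps per frequency band.

```python
import numpy as np

RHO_0 = 1.2        # assumed air density in kg/m^3
C = 343.0          # assumed speed of sound in m/s
Z_0 = RHO_0 * C    # characteristic acoustic impedance of air

def diffuseness(w, x, y, z):
    """Diffuseness estimate psi for one block of B-format samples."""
    p = np.asarray(w, dtype=float)                                  # eq. (9)
    u = np.stack([x, y, z]).astype(float) / (np.sqrt(2.0) * Z_0)    # eq. (10)
    intensity = p * u                                               # eq. (8), shape (3, n)
    # Assumed energy density: E = rho_0 / 2 * (p^2 / Z_0^2 + ||u||^2)
    energy = 0.5 * RHO_0 * (p ** 2 / Z_0 ** 2 + np.sum(u ** 2, axis=0))
    mean_energy = energy.mean()                                     # time average <E>
    if mean_energy == 0.0:
        raise ValueError("diffuseness is undefined for a silent block")
    mean_intensity = intensity.mean(axis=1)                         # time average <I>
    return 1.0 - np.linalg.norm(mean_intensity) / (C * mean_energy)  # eq. (11)

rng = np.random.default_rng(0)
s = rng.standard_normal(1024)
zeros = np.zeros_like(s)

# Single plane wave along the x axis, with W carrying the usual 1/sqrt(2) gain.
psi_plane = diffuseness(s / np.sqrt(2.0), s, zeros, zeros)
# Pressure with no particle velocity: no net energy transport.
psi_no_transport = diffuseness(s, zeros, zeros, zeros)
```

The two toy inputs reproduce the limiting cases described in the text: the plane wave gives a value close to zero, and the field without net energy transport gives one.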


The output of the analysis is then subject to smoothing based on the Equivalent Rectangular Bandwidth (ERB) scale, such that the resolution of the human auditory system is approximated. Since the frequency dependent direction of arrival of the non-diffuse portion of the sound field can be determined, HOA reproduction can be achieved by re-encoding point like sources, corresponding to the direction indicated in each temporal average and frequency band, into a higher order spherical harmonic representation. The resultant Ambisonic signals are then weighted in each frequency band k according to $1 - \psi_k$. However, it is only vital to re-encode non-diffuse components to higher order, and the diffuse field can be obtained by multiplying the FOA signals by $\sqrt{\psi_k}$ and forming a first order decode. This is justified since source localisation is dependent on the direction of arrival of the direct sound and early reflections, and not on late room reverberation [24]. Thus, from the perceptual point of view, it is questionable whether there is a need to preserve the full directional accuracy of the reverberant field. Furthermore, if there exists a general directional distribution to the diffuse field, this will still be preserved in first order form. On the other hand, the diffuse component should not be simply derived from the 0th order signal. One can easily see that such a solution would provide perfectly correlated versions of the diffuse field to the left and right ear signals, which have no equivalent in the physical world (i.e. real physical sound field). Moreover, interaural decorrelation is an important factor in providing spatial impression in enclosed environments [25].
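The per-band weighting described above can be sketched as follows. The two gains follow the weights quoted in the text; the ERB formula is the widely used Glasberg and Moore approximation, which is an assumption here because the paper does not say which ERB expression was used. Function names are illustrative only.

```python
import numpy as np

def erb_bandwidth_hz(centre_hz):
    """Equivalent rectangular bandwidth (Glasberg and Moore approximation)."""
    return 24.7 * (4.37 * np.asarray(centre_hz, dtype=float) / 1000.0 + 1.0)

def upmix_gains(psi):
    """Per-band gains for the re-encoded (directional) and diffuse streams.

    psi: diffuseness per frequency band k, expected in [0, 1].
    Returns (directional_gain, diffuse_gain).
    """
    psi = np.clip(np.asarray(psi, dtype=float), 0.0, 1.0)
    directional_gain = 1.0 - psi    # weight applied to the higher order re-encoding
    diffuse_gain = np.sqrt(psi)     # weight applied to the first order signals
    return directional_gain, diffuse_gain

directional, diffuse = upmix_gains([0.0, 0.25, 1.0])
```

A fully directional band (ψ = 0) is passed entirely to the higher order re-encoding, while a fully diffuse band (ψ = 1) is kept at first order only.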

Figure 4 shows an example of the first 20 ms of a 1st order impulse response taken in a reverberant hall [26]. Here, the source was located 3 m from a Soundfield ST350 microphone and the Spatial Room Impulse Response (SRIR) was captured using the exponentially swept-sine tone technique [27]. In these plots, particular attention is drawn to the direct sound (coming from directly in front of the microphone) and a left wall reflection at approximately 14 ms. It can be seen that the directional resolution increases significantly with HOA representation. It should be noted that the A-format capsules on sound field microphones only display adequate directionality up to 10 kHz [28]. Spatial aliasing is therefore an issue for high frequencies and, as a result, the directional information above 10 kHz cannot be relied upon.

6. Method: Localization of Distance of Test Sounds

Different protocols have been used in the literature for subjective assessment of distance perception, most notably a verbal report [29, 30], direct or indirect blind walking [31, 32] or imagined timed walking [32]. All of these methods have proved to provide reliable and comparable results for both auditory and visual stimuli, with direct blind walking exhibiting the least between-subject variability [31, 32].

In former work [26], the authors of this paper developed a method where subjects indicated the perceived distance of real and virtual sound sources by selecting one of several physical loudspeakers lined up (and slightly offset in order to provide ‘acoustic transparency’) in front of their eyes. However, for the present study, in order to completely eliminate any possible anchors as well as visual cues, it was decided to utilize the method of direct blind walking. One of the main concerns in the experiment was a direct comparison of distance perception of real sound sources versus virtual sound sources presented over headphones. Due to different apparatus requirements, the experiment had to be conducted in two separate phases.

Figure 4. Ambisonic sound field from 1st order measurement with a Soundfield ST350: (a) 1st order representation, (b) 3rd order up-mix. Both panels mark the direct sound and the left wall reflection.

6.1. Participants

Seven participants, aged 24–58, took part in the experiment. All subjects were of good hearing and were either music technology students or practitioners actively involved in audio research or production. Prior to the test, HRIR data for all the participants were obtained in a sound-proof, large (18 × 15 × 10 m³) but quite damped (T60 at 1000 Hz = 0.57 s) multipurpose room (Black Box) in the Department of Theatre, Film and Television at the University of York. Additional damping was assured by thick, heavy curtains covering all four walls and a carpet on the floor. The measurement process consisted of a standard procedure, where miniature omnidirectional microphones (Knowles FG-23629-P16) were placed at the entrance of a blocked ear canal in order to capture the acoustic pressure generated by one loudspeaker at a time, located at constant distance and varying angular direction.

Figure 5. Measuring Head Related Impulse Responses with miniature microphones.

Subjects were seated on an elevated platform, so that their ears were 2.20 m above the ground and their head was in the centre of a spherical loudspeaker array arranged in diametrically opposed pairs. The ear height was calibrated using a laser guide, as shown in Figure 5. The array consisted of 16 full range Genelec 8050A loudspeakers, since the intention was to reproduce Ambisonic sound fields up to and including 3rd order. This 3-D setup, shown in Figure 6, comprised a flat-front horizontal octagon and a cube (four loudspeakers on top and four on the bottom). The radius of the loudspeaker array (and thus the virtual loudspeaker array) was 3.27 m. For FOA-to-binaural decode, only virtual loudspeakers from the cube configuration were utilized, since no directional resolution is gained by using a higher number of loudspeakers. Furthermore, despite careful alignment, oversampling of the sound field with higher numbers of speakers has the potential to yield sound field distortions [33]. Note that for 2nd and 3rd order reproduction, all 16 loudspeakers were used. Although the oversampled configuration was not optimal from the 2nd order reproduction point of view, it was not possible to easily and accurately rearrange the loudspeaker array in order to accommodate a different layout.

Figure 6. Array of 16 loudspeakers used for HRIR measurements.

HRIRs were captured using the exponentially swept-sine tone technique [27] at 44.1 kHz sampling rate and 16-bit resolution. Since the measurement environment was not fully anechoic, further processing of the measured data was necessary. The HRIRs were tapered before the arrival of the first reflection (from the floor), yielding filter kernels with 257 taps, and were subsequently diffuse-field equalized.
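A truncation of this kind might look like the sketch below. The half-Hann fade and its 32-sample length are assumptions chosen for illustration; the paper only states that the responses were tapered to 257 taps before the floor reflection arrives. At 44.1 kHz, 257 taps correspond to roughly 5.8 ms.

```python
import numpy as np

SAMPLE_RATE_HZ = 44100

def taper_hrir(h, n_taps=257, fade_len=32):
    """Truncate an impulse response to n_taps and fade out the tail.

    The last fade_len samples are multiplied by the falling half of a
    Hann window (assumed shape), so the kernel ends smoothly at zero.
    """
    h = np.asarray(h, dtype=float)
    if len(h) < n_taps:
        raise ValueError("impulse response is shorter than the requested kernel")
    if not 0 < fade_len <= n_taps:
        raise ValueError("fade_len must lie between 1 and n_taps")
    out = h[:n_taps].copy()
    fade = 0.5 * (1.0 + np.cos(np.linspace(0.0, np.pi, fade_len)))  # 1 -> 0
    out[-fade_len:] *= fade
    return out

measured = np.ones(1024)            # stand-in for a measured response
kernel = taper_hrir(measured)
kernel_ms = 1000.0 * len(kernel) / SAMPLE_RATE_HZ
```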

6.2. Stimuli

The stimuli used in the experiment were pink noise bursts and phonetically balanced phrases selected from the TIMIT Acoustic-Phonetic Continuous Speech Corpus database and recorded by a female reader [34]. A sampling rate of 44.1 kHz and 16 bit resolution was used in both cases. These two sample types were selected in order to represent both unfamiliar and familiar sound sources. They were presented to the subjects in a pseudo-randomized manner to avoid any ordering effects.

For headphone reproduction, prior to the test phase, FOA impulse response measurements were taken at the listener position for each loudspeaker location using the exponentially swept-sine tone technique [27]. From these measurements, 2nd and 3rd order impulse response sets were extracted using the directional analysis approach outlined in section 5. 0th order Ambisonics does not provide any directional information, which means that it would lack the cues that are investigated in the higher order renderings. Therefore, it was decided not to include it in this comparison.

The only psychoacoustical optimization applied to the Ambisonics decodes was shelf filtering, which was intended to satisfy Gerzon’s localization criteria for maximized velocity decode at low frequencies and energy decode at higher frequencies [35]. This involved changing the ratio of the pressure to velocity components at low and high frequencies. Whilst the crossover frequency for the high frequency boost in the pressure channel at first order is normally in the region of 400 Hz for regular loudspeaker listening, here we restore the crossover point to 700 Hz, since the subject is always perfectly centred in the virtual loudspeaker array.

6.3. Test Environment and Apparatus

A series of subjective listening tests was conducted in the Large Rehearsal Room in the Department of Theatre, Film and Television at the University of York. The room dimensions were 12 × 9 × 3.5 m³ and the spatially averaged T60 at 1 kHz was 0.26 s. A low T60 was desired for this study, so the walls were covered with thick, heavy curtains, as shown in Figure 7. Since the up-mix from 1st to 2nd and 3rd order Ambisonics concerned only the deterministic part of the measured SRIRs, it was assumed that no advantage would be gained from using a more reverberant space.

A professional camera dolly track was set up roughly in the direction of the diagonal of the room. It not only allowed for testing distances of the real loudspeaker up to 8 m, but its non-symmetrical position also assured that early reflections of the same order from different surfaces did not easily coincide at the subject’s ears, but instead arrived at different times. A single full-range loudspeaker (Genelec 8050A) was mounted on a camera dolly, which enabled it to be noiselessly translated by the experiment assistant to different locations. A guiding rope was hung along the dolly track to help and guide the participants when walking toward the sound source. Since it was not possible to walk exactly on the dolly track, it was decided that the walking path would be directly next to it, as shown in Figure 7. The only weakness of this solution was that the sound source horizontal angle varied from 14.04 degrees at the closest distance (2 m) to 3.58 degrees at the furthest distance (8 m). However, this did not have any effect on the distance judgments, for two reasons. Firstly, the subjects were allowed (or even encouraged) to rotate their head in order to fully utilize the available ITD and ILD cues. Secondly, the initial head orientation was not in any way fixed. This, combined with the fact that there were no clear cues to the subject’s initial orientation in the room at the origin, made this small initial angular offset unimportant. Furthermore, none of the participants reported any bias in their assessment based on the horizontal offset of the sound source.
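The two quoted angles are consistent with a simple right-angle geometry in which the walking path runs parallel to the dolly track at a fixed lateral offset. The sketch below assumes an offset of 0.5 m, a value inferred from the quoted angles rather than stated in the text, and reproduces both figures.

```python
import math

LATERAL_OFFSET_M = 0.5   # assumed; inferred from the quoted angles, not stated in the text

def source_angle_deg(distance_m, offset_m=LATERAL_OFFSET_M):
    """Horizontal angle of the source seen from the origin of the walking path."""
    return math.degrees(math.atan2(offset_m, distance_m))

# Angle at each of the four test distances, rounded to two decimals.
angles = {d: round(source_angle_deg(d), 2) for d in (2, 4, 6, 8)}
```

Under this assumption the angle shrinks monotonically with distance, so the largest offset always occurs for the nearest source.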

For trials with binaural presentation, high quality open back headphones (AKG-K601) were used, which exhibit low levels of interaural magnitude and group delay distortion. Sound field rotation, tilt and tumble control was implemented via the TrackIR 5 infra-red head tracking system [36], resulting in stable virtual images with head rotations. The system responsible for playback of virtualized sound sources was built entirely in the Pure Data visual programming environment [37], and its combined latency (including head-tracker data porting and audio update rate) was 20 ms.

Figure 7. Participant performing a trial during the experiment.

6.4. Procedure

In the experiment, subjects entered the test environment blindfolded and without any prior expectation regarding the room dimensions, its acoustic properties or the test apparatus. They were guided by the experimenter to the reference point (the ‘origin’). After a short explanation of the experiment objectives, a training session began with a short (3–5 min) walking-only trial, until participants felt comfortable with walking blindfolded and using a guide rope. Next, they performed 4–6 training trials, in which the same test stimuli to be used in the experiment (speech and pink noise) were played by the loudspeaker at randomly chosen distances. No feedback was given and no results were recorded after each trial. The end of the training session was clearly announced and, after a 1 minute interval, the first phase of the test began.

In test phase I, participants were asked to listen to static sound sources at randomly chosen points, focusing on the perceived distance. They could listen to any audio sample as many times as they wished. During the playback, they were instructed to stay still and refrain from any translational head movements. However, they were encouraged to rotate their head freely. After the playback had stopped, they were asked to walk, guided by the rope, to the point where they thought the sound originated from. The distance walked was subsequently recorded by the assistant using a laser measuring tool, after which the participant walked backwards to the origin. In the meantime, the loudspeaker was noiselessly translated to its new position and the test proceeded. As in the training session, no feedback was given at any stage.

During the first test phase, participants had to indicate the perceived distance for sound sources randomly located at 2 m, 4 m, 6 m or 8 m. Taking into account that both speech and pink noise burst samples were used (in a pseudo-random order), the number of trials in the first phase added up to 8. Each subject performed all the trials only once.

Upon completion of the first phase of the test, there was a short (approximately 2 minutes) interval that was required in order to put on the headphones and calibrate the head-tracking system. In phase II, subjects were also asked to identify the sound source distance, but this time using Ambisonic sound fields presented over headphones. Other than the fact that headphones and the head-tracking system were used, the test protocol remained the same as in phase I. However, because there were three playback configurations to be tested (1st, 2nd and 3rd order Ambisonics), participants had to perform 24 trials instead of 8. Rather than using separate phases for each Ambisonic order, all samples were randomly presented to the subject within the same test phase. Again, subjects performed all the trials only once and no feedback was given at any stage.
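The trial counts follow directly from the factorial structure: 4 distances × 2 stimuli = 8 trials in phase I, and 4 × 2 × 3 orders = 24 trials in phase II. The sketch below builds one shuffled schedule per phase. The labels, the seed and the use of a simple shuffle are assumptions for illustration; the paper does not describe how its pseudo-random order was generated.

```python
import itertools
import random

DISTANCES_M = (2, 4, 6, 8)
STIMULI = ("speech", "pink_noise")
ORDERS = ("FOA", "SOA", "TOA")

def phase_one_trials(rng):
    """Every distance/stimulus pair exactly once, in shuffled order."""
    trials = list(itertools.product(DISTANCES_M, STIMULI))
    rng.shuffle(trials)
    return trials

def phase_two_trials(rng):
    """Every distance/stimulus/order triple exactly once, orders interleaved."""
    trials = list(itertools.product(DISTANCES_M, STIMULI, ORDERS))
    rng.shuffle(trials)
    return trials

rng = random.Random(2012)   # arbitrary seed, only so the example is repeatable
phase_one = phase_one_trials(rng)
phase_two = phase_two_trials(rng)
```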

7. Results

The perceived sound source distance (indicated by the distance walked) was collected from 7 subjects for 4 presentation points (2 m, 4 m, 6 m and 8 m), two stimuli (female speech and pink noise bursts) and four playback options: 1st, 2nd and 3rd order Ambisonics and real loudspeakers, which for analysis we will denote FOA, SOA, TOA and REAL respectively. With headphone trials, none of the participants reported in-head localization; however, there were 3 cases where the proximity of the sound source was very apparent, so participants decided not to move at all. In some cases, the virtual sound source was initially localized behind the subjects, but all participants were able to resolve the confusion by applying head-rotation.

We computed the mean values of walked distances μ for each test condition, along with the corresponding standard errors se(μ). The results are presented separately for each stimulus type, within 95% confidence intervals.
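For a single test condition with seven subjects, these summary statistics could be computed as in the sketch below. The walked distances are made-up illustrative values, not data from the study, and the use of a Student-t interval with 6 degrees of freedom is an assumption, since the paper does not state how its intervals were formed.

```python
import math
import statistics

T_CRIT_DF6 = 2.447   # two-sided 95 % Student-t critical value, 6 degrees of freedom

def summarise(walked_m):
    """Return (mean, standard error, 95 % confidence interval) for 7 subjects."""
    n = len(walked_m)
    if n != 7:
        raise ValueError("the hard-coded t value is only valid for n = 7")
    mean = statistics.fmean(walked_m)
    se = statistics.stdev(walked_m) / math.sqrt(n)   # sample standard deviation
    half_width = T_CRIT_DF6 * se
    return mean, se, (mean - half_width, mean + half_width)

# Hypothetical walked distances in metres for one condition (NOT study data).
hypothetical = [1.2, 1.5, 1.9, 1.7, 1.4, 2.1, 1.6]
mean, se, (ci_low, ci_high) = summarise(hypothetical)
```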

As expected, the perception of distance for the real sources was more accurate for near sources. Beyond 4 m, distance was consistently underestimated, which is congruent with the previous studies outlined in section 2. Furthermore, the standard deviation of localization increases as the source moves further into the diffuse field. We also see that unfamiliar stimuli produce greater variability in subjects’ answers. The mean localization of the virtual sources follows the reference source localization well. The answers for virtual sources deviate from their means roughly in the same fashion as the answers for reference sources, as localization becomes more difficult within the diffuse field.

Figure 8. Mean localization of real and virtual sound sources (female speech). Distance walked [m] plotted against real distance [m] for FOA, SOA, TOA and Real.

Figure 9. Mean localization of real and virtual sound sources (pink noise bursts). Distance walked [m] plotted against real distance [m] for FOA, SOA, TOA and Real.

Since the study followed a within-subject factorial design with 2 (stimuli) × 4 (playback conditions), a two-way ANOVA was performed for each presentation distance in order to investigate the effects of these two factors (referred to later as factors A and B) as well as potential interaction effects. The null hypothesis being tested here is that all the mean perceived distances for all the stimuli and playback methods do not differ significantly:

$$H_0: \mu_{\mathrm{FOA}} = \mu_{\mathrm{SOA}} = \mu_{\mathrm{TOA}} = \mu_{\mathrm{Real}} = \mu$$

$$H_1: \text{not all localization means } \mu_i \text{ are the same}$$

No statistically significant effect of stimuli (familiar vs. unfamiliar) on the perception of distance has been found ($F_{2m}(3, 48) = 0.835$, $p = 0.365$; $F_{4m}(3, 48) = 2.0462$, $p = 0.159$; $F_{6m}(3, 48) = 2.575$, $p = 0.115$; $F_{8m}(3, 48) = 2.0462$, $p = 0.159$). For distances of 4 m and more, playback option also had no statistically significant effect ($F_{4m}(3, 48) = 2.192$, $p = 0.101$; $F_{6m}(3, 48) = 0.665$, $p = 0.577$; $F_{8m}(3, 48) = 0.202$, $p = 0.894$).

However, a statistically significant difference has been detected for the distance of 2 m. In larger study designs with multiple levels, it is advisable to use the Honestly Significant Difference (HSD) approach, since there is an increased risk of a spuriously significant difference arising purely by chance. So, in order to investigate further where the difference occurs, an HSD has been computed (HSD = 1.423 m). If we now compile the table of mean perceived distances for the sound sources located at 2 m (Table I), we can see that all of the values lie within a single HSD of each other and cannot be distinguished. We can therefore assume an ANOVA false alarm (type I error) and no statistically significant effect of playback method for the sources at the distance of 2 m as well.

Table I. Mean localization [m] of virtual and real sound sources at 2 m.

            μ_FOA    μ_SOA    μ_TOA    μ_Real
  Speech    1.119    1.389    0.841    1.638
  Noise     0.877    1.001    0.902    1.641

Table II. Correlation coefficients ρ and corresponding p-values for pairs of distance estimations for real and virtual sound sources (Speech).

                  ρ         p-value
  Real vs. FOA    0.9828    0.0172
  Real vs. SOA    0.9960    0.0040
  Real vs. TOA    0.9590    0.0410

Table III. Correlation coefficients ρ and corresponding p-values for pairs of distance estimations for real and virtual sound sources (Noise).

                  ρ         p-value
  Real vs. FOA    0.9913    0.0087
  Real vs. SOA    0.9857    0.0143
  Real vs. TOA    0.9972    0.0028
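The HSD argument can be checked with a few lines of Python: if the largest pairwise gap between the 2 m means is smaller than the HSD, no pair of playback methods can be told apart. The sketch reads the Table I entries as metres with three decimals and uses the quoted HSD of 1.423 m; it does not recompute the HSD itself, which would require the ANOVA error term.

```python
from itertools import combinations

HSD_M = 1.423   # value quoted in the text

# Mean walked distances at 2 m, read from Table I.
MEANS_2M = {
    "speech": {"FOA": 1.119, "SOA": 1.389, "TOA": 0.841, "Real": 1.638},
    "noise": {"FOA": 0.877, "SOA": 1.001, "TOA": 0.902, "Real": 1.641},
}

def largest_gap(means):
    """Largest absolute difference between any two condition means."""
    return max(abs(a - b) for a, b in combinations(means.values(), 2))

gaps = {stimulus: largest_gap(means) for stimulus, means in MEANS_2M.items()}
all_within_hsd = all(gap < HSD_M for gap in gaps.values())
```

The widest gap is between the real source and TOA for speech, and it is still only a little over half of the HSD.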

Lastly, for all the distances, no synergetic effects of factors A (stimuli) and B (playback conditions) have been detected.

Additionally, we calculated correlation coefficients ρ for pairs of distance estimations for real and virtual sound sources (either 1st, 2nd or 3rd order) and two stimuli. In all cases, high correlation coefficients have been obtained, which supports our finding that, for these particular test conditions, the perception of distance of binaurally rendered Ambisonic sound fields of orders 1 to 3 cannot be distinguished from the perception of distance of the real sound sources.
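In Tables II and III, every p-value equals 1 − ρ. That is exactly what a two-sided test of a Pearson correlation gives when only four points are correlated (2 degrees of freedom), which would be consistent with correlating the four per-distance means. This is an inference from the numbers, not something the text states. The sketch below, with illustrative function names, shows the calculation using the closed-form Student-t distribution for 2 degrees of freedom.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def p_value_four_points(r):
    """Two-sided p-value for a Pearson r computed from n = 4 points."""
    if abs(r) >= 1.0:
        return 0.0
    t = r * math.sqrt(2.0 / (1.0 - r * r))   # t statistic with n - 2 = 2 d.o.f.
    # The Student-t CDF with 2 d.o.f. is 1/2 + t / (2 * sqrt(t^2 + 2)),
    # so the two-sided tail probability simplifies to 1 - |r|.
    return 1.0 - abs(t) / math.sqrt(t * t + 2.0)

p_speech_foa = p_value_four_points(0.9828)   # coefficient from Table II
p_noise_toa = p_value_four_points(0.9972)    # coefficient from Table III
```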

8. Discussion

The results presented for real sources corroborate the classic underestimation of source distance as reported in the literature. These results were used as a basis with which to measure the ability of Ambisonic sound fields of different orders to present sources at different distances. It was expected that a further underestimation of the source distance would ensue with the binaural rendering, as reported in [17]. However, this was not the case, even for first order presentations, and the apparent distances of the virtual sources matched the real source distances well. One should note that the major difference between this study and that of [17] is our use of head-tracking, indicating the importance of head-movements in perceiving source distance, which develops the findings of Waller [18] and Ashmead et al. [10] on user interaction in a virtual space. Further work is required to quantify the effect of this.

Moreover, the presented study demonstrates that the enhanced directional accuracy gained by presenting sound sources in HOA through head-tracked binaural rendering does not yield a significant improvement in the perception of the source distance. What is noteworthy is that, for each order, there is no significant difference in the perception of the source location when compared to real-world sources. We therefore conclude that sound field directionality for distance perception is sufficient with 1st order playback.

The presence of the ANOVA false alarm at the 2 m point is of interest. It is noteworthy that the 2 m point represents a source inside the virtual array geometry. It is a known issue that virtual sound sources rendered inside the array of loudspeakers cannot be reproduced in a straightforward way without artifacts. Some of these artifacts include incorrect wave-front curvature and insufficient bass boost.

In the first case, there is ample evidence in the literature to suggest that the wavefront curvature translates to significant binaural cues for sound sources near the head [30, 38]. It was already shown in section 2.1 that, as a source moves closer to the head, the levels of the monaural transfer function and the ILD both change significantly with source angle. However, this effect is not strong at 1 m and beyond. For sources further away, it has been shown in [39] that it is very difficult to assess distance by binaural cues alone.

In the second case, the requirement for distance compensation filtering due to near field effects, for the large loudspeaker radius (3.27 m) and the given source distances (> 2 m), is only prominent below 100 Hz. For the female speech test stimuli this will not have an effect, since the first formant frequencies do not go below 180 Hz. Also, the current method employed for capturing HRIRs allowed for reliably obtaining filters with a frequency response reaching down to around 170 Hz, thereby also band-limiting the delivery of the pink noise stimuli.

Finally, there was no significant difference in the results presented for different sources, although the greater variance in the results for pink noise suggests that the familiarity of the source does indeed play a role in the perception of source distance, as mentioned in section 2.3. Future studies will investigate the use of these monaural cues further and will utilize 0th order sound field rendering, since it will remove the influence of any directional information.

Considering the aforementioned study of Bronkhorst et al. [14], where the accuracy of distance perception for binaural playback increases with the number of reflections, our findings demonstrate that the net effect of the monaural cues of direct to reverberant ratio, level difference and time of arrival of early reflections is of greater importance in distance perception for binaural rendering than Ambisonic directional accuracy beyond 1st order.

9. Conclusions

We have assessed, through subjective analysis, the perceived source distance in virtual Ambisonic sound fields in comparison to real world sources. The hypothesis tested was that enhanced directional accuracy of the deterministic part of the sound field may lead to better reconstruction of environmental depth and thus improve the perception of sound source distance. However, it was shown that Ambisonic reproduction matches the perceived real world source distances well even at 1st order, and no improvement in this regard was observed when increasing the order. It must be emphasized, though, that this analysis applies to Ambisonic-to-binaural decodes with higher order synthesis achieved using the directional analysis method of [23]. Therefore, further work will examine this topic for loudspeaker reproduction for both centre and off-centre listening, as well as investigate the effectiveness of HOA synthesis in comparison to real world HOA measurements.

Acknowledgments

The authors gratefully acknowledge the participation of the test subjects, for both their time and constructive comments, as well as the technical support staff at the Department of Theatre, Film and Television at the University of York for their assistance in the experimental setups. This research is supported by Science Foundation Ireland.

References

[1] L. Fauster: Stereoscopic techniques in computer graphics. Technical paper, TU Wien, 2007.

[2] J. Lee: Head tracking for desktop VR displays using the Wii remote. http://johnnylee.net/projects/wii/, accessed 30th Sept. 2011.

[3] D. R. Begault: Direct comparison of the impact of head tracking, reverberation, and individualized head-related transfer functions on the spatial perception of a virtual sound source. J. Audio Eng. Soc. 49 (2001) 904–916.

[4] M. Otani, T. Hirahara: Auditory artifacts due to switching head-related transfer functions of a dynamic virtual auditory display. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. E91-A (2008) 1320–1328.

[5] V. Pulkki: Virtual sound source positioning using Vector Base Amplitude Panning. J. Audio Eng. Soc. 45 (1997) 456–466.

[6] A. J. Berkhout: A Holographic Approach to Acoustic Control. J. Audio Eng. Soc. 36 (1988) 977–995.

[7] M. A. Gerzon: Periphony: With-height sound reproduction. J. Audio Eng. Soc. 21 (1973) 2–10.

[8] F. Rumsey: Spatial quality evaluation for reproduced sound: Terminology, meaning, and a scene-based paradigm. J. Audio Eng. Soc. 50 (2002) 651–666.

[9] J. Blauert: Communication acoustics. Springer, 2008.

[10] D. H. Ashmead, D. L. Davis, A. Northington: Contribution of listeners’ approaching motion to auditory distance perception. J. Exp. Psy.: Hum. Percep. and Perform. 21 (1995) 239–256.

[11] E. Czerwinski, A. Voishvillo, S. Alexandrov, A. Terekhov: Propagation distortion in sound systems: Can we avoid it? J. Audio Eng. Soc. 48 (2000) 30–48.

[12] S. H. Nielsen: Auditory distance perception in different rooms. J. Audio Eng. Soc. 41 (1993) 755–770.

[13] M. B. Gardner: Distance estimation of 0° or apparent 0°-oriented speech signals in anechoic space. J. Acoust. Soc. Am. 45 (1969) 47–53.

[14] A. W. Bronkhorst, T. Houtgast: Auditory distance perception in rooms. Nature 397 (1999) 517–520.

[15] J. B. Allen, D. A. Berkley: Image method for efficiently simulating small-room acoustics. J. Acoust. Soc. Am. 65 (1979) 943–950.

[16] M. Rychtarikova, T. V. d. Bogaert, G. Vermeir, J. Wouters: Binaural sound source localization in real and virtual rooms. J. Audio Eng. Soc. 57 (2009) 205–220.

[17] J. S. Chan, C. Maguinness, D. Lisiecka, C. Ennis, M. Larkin, C. O’Sullivan, F. Newell: Comparing audiovisual distance perception in various real and virtual environments. Proc. of the 32nd Euro. Conf. on Vis. Percep., Regensburg, Germany, 2009.

[18] D. Waller: Factors affecting the perception of interobject distances in virtual environments. Presence: Teleoper. Virtual Environ. 8 (1999) 657–670.

[19] A. McKeag, D. McGrath: Sound field format to binaural decoder with head-tracking. Proc. of the 6th Australian Regional Convention of the AES, 1996.

[20] M. Noisternig, A. Sontacchi, T. Musil, R. Holdrich: A 3D Ambisonic based binaural sound reproduction system. Proc. of the 24th Int. Conf. of the Audio Eng. Soc., Alberta, Canada, 2003.

[21] B.-I. Dalenback, M. Strömberg: Real time walkthrough auralization - the first year. Proc. of the Inst. of Acous., Copenhagen, Denmark, 2006.

[22] C. Masterson, S. Adams, G. Kearney, F. Boland: A method for head related impulse response simplification. Proc. of the 17th European Signal Processing Conference (EUSIPCO), Glasgow, Scotland, 2009.

[23] J. Merimaa, V. Pulkki: Spatial impulse response rendering I: Analysis and synthesis. J. Audio Eng. Soc. 53 (2005).

[24] W. M. Hartmann: Localization of sound in rooms. J. Acoust. Soc. Am. 74 (1983) 1380–1391.

[25] D. Griesinger: Spatial impression and envelopment in small rooms. Proc. of the 103rd Conv. of the Audio Eng. Soc., New York, USA, 1997.

[26] G. Kearney, M. Gorzel, H. Rice, F. Boland: Depth perception in interactive virtual acoustic environments using higher order ambisonic soundfields. Proc. of the 2nd Int. Ambisonics Symp., Paris, France, 2010.

[27] A. Farina: Simultaneous measurement of impulse response and distortion with a swept-sine technique. Proc. of the 108th Conv. of the Audio Eng. Soc., Paris, France, 2000.

[28] M. Gerzon: The design of precisely coincident microphone arrays for stereo and surround sound. Proc. of the 50th Conv. of the Audio Eng. Soc., London, UK, 1975.


[29] C. Guastavino, B. F. G. Katz: Perceptual evaluation of multi-dimensional spatial audio reproduction. J. Acoust. Soc. Am. 116 (2004) 1105–1115.

[30] P. Zahorik: Assessing auditory distance perception using virtual acoustics. J. Acoust. Soc. Am. 111 (2002) 1832–1846.

[31] J. M. Loomis, R. L. Klatzky, J. W. Philbeck, R. G. Golledge: Assessing auditory distance perception using perceptually directed action. Perception and Psychophysics 60 (1998) 966–980.

[32] T. Y. Grechkin, T. D. Nguyen, J. M. Plumert, J. F. Cremer, J. K. Kearney: How does presentation method and measurement protocol affect distance estimation in real and virtual environments? ACM Trans. Appl. Percept. 7 (2010) 26:1–26:18.

[33] S. Bertet: Formats audio 3D hiérarchiques: Caractérisation objective et perceptive des systèmes ambisonics d’ordres supérieurs. PhD dissertation, INSA Lyon, 2008.

[34] W. M. Fisher, G. R. Doddington, K. M. Goudie-Marshall: The DARPA speech recognition research database: Specifications and status. Proc. of the DARPA Workshop on Speech Recognition, 1986.

[35] M. A. Gerzon, G. J. Barton: Ambisonic decoders for HDTV. Proc. of the 92nd Conv. of the Audio Eng. Soc., Vienna, Austria, 1992.

[36] NaturalPoint: TrackIR 5. http://www.naturalpoint.com/trackir/, accessed 30th Sept. 2011.

[37] M. Puckette: Pure Data. http://puredata.info/, accessed 30th Sept. 2011.

[38] P. Zahorik, D. S. Brungart, A. W. Bronkhorst: Auditory distance perception in humans: A summary of past and present research. Acta Acustica united with Acustica 91 (2005) 409–420.

[39] H. Wittek: Perceptual differences between Wavefield Synthesis and Stereophony. Department of Music and Sound Recording, School of Arts, Communication and Humanities, University of Surrey, UK, 2007.


the order of spherical harmonic decomposition. However, better directional localization can be achieved without affecting other important cues for distance estimation, like overall sound intensity or direct to reverberant energy ratio. Thus, it can constitute an ideal framework for testing whether less apparent properties of a sound field can influence the perception of distance.

2.3. Former Psychoacoustical Studies on Distance Perception

The perception of distance has been shown to be one that is not linearly proportional to the source distance. For example, both Nielson et al. [12] and Gardner [13] have shown that the distance of speech signals is consistently underestimated in an anechoic environment. This underestimation has also been shown by other authors in the context of reverberant environments, both real and virtual. In [14], Bronkhorst et al. demonstrate that in a damped virtual environment, sources are consistently perceived to be closer than in a reverberant virtual environment, due to the direct to reverberant ratio. In their studies, the room simulation is conducted using simulated Binaural Room Impulse Responses (BRIRs) created from the image source method [15]. They show how perceived distance increases rapidly with the number and amplitude of the reflections.

In a similar study, Rychtarikova et al. [16] investigated the difference in localization accuracy between real rooms and computationally derived BRIRs. Their findings show that at 1 m, localization accuracy in both the virtual and real environments is in good agreement with the true source position. However, at 2.4 m the accuracy degrades, and high frequency localization errors were found in the virtual acoustics, pertaining to the difference in HRTFs between the model and the subject. In the same vein, Chan et al. [17] have shown that distance perception using recordings made from in-ear microphones on individual subjects again leads to underestimation of the source distance in virtual reverberant environments, more so than with real sources.

Waller [18] and Ashmead et al. [10] have identified that one of the factors improving distance perception is the listener's movement in the virtual or real space. It is therefore crucial to account for any listener movements (or lack thereof) in the experimental design.

Similarly, for headphone reproduction of virtual acoustic environments, small subconscious head rotations may lead to improvements in distance perception by providing enhanced ILD and ITD cues. Therefore, the sound field transformations should accurately reflect small changes in the orientation of the listener's head.

3. Ambisonic Spatialization

Ambisonics was originally developed by Gerzon, Barton and Fellgett [7] as a unified system for the recording, reproduction and transmission of surround sound. The theory of Ambisonics is based on the decomposition of the sound field measured at a single point in space into spherical harmonic functions, defined as

$$Y^{\sigma}_{mn}(\Phi, \Theta) = A_{mn} P_{mn}(\sin\Theta) \cdot \begin{cases} \cos(m\Phi) & \text{if } \sigma = +1 \\ \sin(m\Phi) & \text{if } \sigma = -1 \end{cases} \tag{1}$$

where $m$ is the order and $n$ is the degree of the spherical harmonic, and $P_{mn}$ is the fully normalized (N3D) associated Legendre function. The coordinate system used comprises x, y and z axes pointing to the front, left and up, respectively. $\Phi$ is the azimuthal angle with the clockwise rotation and $\Theta$ is the elevation angle from the x-y plane. For each order $m$ there are $(2m + 1)$ spherical harmonics.

For plane wave representation over a loudspeaker array, we must ensure that

$$s\, Y^{\sigma}_{mn}(\Phi, \Theta) = \sum_{i=1}^{I} g_i\, Y^{\sigma}_{mn}(\varphi_i, \theta_i) \tag{2}$$

where $s$ is the pressure of the source signal from direction $(\Phi, \Theta)$ and $g_i$ is the $i$th loudspeaker gain from direction $(\varphi_i, \theta_i)$. We can then express the left hand side of equation (2) in vector notation, giving the Ambisonic channels

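$$\mathbf{B} = \mathbf{Y}_{\Phi\Theta}\, s = \left[ Y^{1}_{00}(\Phi, \Theta),\; Y^{1}_{10}(\Phi, \Theta),\; \ldots,\; Y^{\sigma}_{mm}(\Phi, \Theta) \right]^{T} s \tag{3}$$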

Equation (2) can then be rewritten as

$$\mathbf{B} = \mathbf{C} \cdot \mathbf{g} \tag{4}$$

where $\mathbf{C}$ contains the encoding gains associated with the loudspeaker positions and $\mathbf{g}$ is the loudspeaker signal vector. In order to obtain $\mathbf{g}$, we require a decode matrix $\mathbf{D}$, which is the inverse of $\mathbf{C}$. However, to invert $\mathbf{C}$ we need the matrix to be square, which is only possible when the number of Ambisonic channels is equal to the number of loudspeakers. When the number of loudspeaker channels is greater than the number of Ambisonic channels, which is usually the case, we instead obtain the pseudo-inverse of $\mathbf{C}$, where

$$\mathbf{D} = \mathrm{pinv}(\mathbf{C}) = \mathbf{C}^{T} (\mathbf{C}\mathbf{C}^{T})^{-1} \tag{5}$$
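As a minimal sketch of equations (4) and (5), the following builds a first-order encoding matrix for a cube of eight loudspeakers and forms the decode matrix from the pseudo-inverse. The helper `encode_first_order`, the cube angles, and the N3D first-order gains written out explicitly are assumptions made for illustration; they are not taken from the paper's implementation, and the azimuth here is taken as positive towards the left.

```python
import numpy as np

def encode_first_order(azimuth, elevation):
    """Assumed first-order N3D encoding gains [W, X, Y, Z] for one direction (radians)."""
    return np.array([
        1.0,
        np.sqrt(3.0) * np.cos(azimuth) * np.cos(elevation),
        np.sqrt(3.0) * np.sin(azimuth) * np.cos(elevation),
        np.sqrt(3.0) * np.sin(elevation),
    ])

# Hypothetical cube layout: four loudspeakers above and four below the horizontal plane.
azimuths = np.radians([45, 135, -135, -45, 45, 135, -135, -45])
cube_elevation = np.arctan(1.0 / np.sqrt(2.0))  # elevation of a cube vertex
elevations = np.array([cube_elevation] * 4 + [-cube_elevation] * 4)

# C holds one column of encoding gains per loudspeaker: shape (N, I) = (4, 8).
C = np.column_stack([encode_first_order(a, e) for a, e in zip(azimuths, elevations)])

# Equation (5): D = C^T (C C^T)^-1, shape (I, N) = (8, 4).
D = C.T @ np.linalg.inv(C @ C.T)

# Loudspeaker gains g for a unit source straight ahead, from B = C g.
B = encode_first_order(0.0, 0.0)
g = D @ B
```

Re-encoding the gains with `C @ g` should return the original channel vector `B`, which is a quick sanity check that the decode matrix is consistent with equation (4).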

Since the sound field is represented in a spherical coordinate system, sound field transformation matrices can be used to rotate, tilt and tumble the sound fields. In this way, the Ambisonic signals themselves can be controlled by the user, allowing for the virtual loudspeaker approach to be employed. For 3-D reproduction, the number $I$ of virtual loudspeakers employed with the Ambisonics approach is dependent on the Ambisonic order $m$, where

$$I \geq N = (m + 1)^{2} \tag{6}$$
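The channel count in equation (6) can be tabulated with a few lines of Python. The function names below are illustrative only. The sketch shows that the 16-loudspeaker array used in this study meets the bound exactly at 3rd order, while a cube of eight loudspeakers is sufficient only at 1st order.

```python
def ambisonic_channels(order):
    """Number of 3-D Ambisonic channels N = (m + 1)^2 for order m."""
    return (order + 1) ** 2

def layout_is_sufficient(num_loudspeakers, order):
    """Check the condition I >= N from equation (6)."""
    return num_loudspeakers >= ambisonic_channels(order)

for m in (1, 2, 3):
    print(f"order {m}: N = {ambisonic_channels(m)}, "
          f"8 loudspeakers ok: {layout_is_sufficient(8, m)}, "
          f"16 loudspeakers ok: {layout_is_sufficient(16, m)}")
```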


4. Virtual Loudspeaker Reproduction

In the 'virtual loudspeaker' approach, HRTFs are measured at the 'sweet-spot' (the limited region in the centre of a reproduction array where an adequate spatial impression is generally guaranteed) in a multi-loudspeaker reproduction setup, and the resultant binaural playback is formed from the convolution of the loudspeaker feeds with the HRIRs of the virtual loudspeakers. This concept is illustrated in Figure 3. For the left ear we have

$$L = \sum_{i=1}^{I} h_{Li} \ast q_i \tag{7}$$

where $\ast$ denotes convolution, $h_{Li}$ is the left ear HRIR corresponding to the $i$th virtual loudspeaker and $q_i$ is the $i$th loudspeaker feed. Similar relations apply for the right ear signal. This method was first introduced by McKeag and McGrath [19], and examples of its adoption can be found in [20] and [21]. This approach has major computational advantages, since a complex filter kernel is not required and head rotation can be simulated by changing the loudspeaker feeds $q_i$ as opposed to the HRTFs. Whilst the HRTFs in this case play an important role in the spatialization, ultimately it is the sound field creation over the virtual loudspeakers which gives the overall spatial impression. Most existing research uses a block frequency domain approach to this convolution. However, given that the virtual loudspeaker feeds are controlled via head-tracking in real time, a time-domain filtering approach can also be utilized. For short filter lengths, obtaining the output in a point-wise manner avoids the inherent latencies introduced by block convolution in the frequency domain. A strategy for significant reduction of the filter length without artifacts has been proposed in [22].
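The summation in equation (7) can be sketched in the time domain as follows. This is an offline illustration under simple assumptions (whole-signal convolution with `numpy.convolve`, hypothetical array shapes); it is not the real-time, sample-by-sample implementation used in the study.

```python
import numpy as np

def render_binaural(feeds, hrirs_left, hrirs_right):
    """Sum each loudspeaker feed convolved with the HRIR pair of its virtual loudspeaker.

    feeds:       array of shape (I, num_samples), the loudspeaker feeds q_i
    hrirs_left:  array of shape (I, num_taps), the left ear HRIRs h_Li
    hrirs_right: array of shape (I, num_taps), the right ear HRIRs
    """
    num_out = feeds.shape[1] + hrirs_left.shape[1] - 1
    left = np.zeros(num_out)
    right = np.zeros(num_out)
    for q, h_left, h_right in zip(feeds, hrirs_left, hrirs_right):
        left += np.convolve(q, h_left)    # one term of equation (7)
        right += np.convolve(q, h_right)  # same relation for the right ear
    return left, right

# Tiny made-up example with two virtual loudspeakers and two-tap HRIRs.
feeds = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
hrirs_left = np.array([[1.0, 0.5], [0.25, 0.0]])
hrirs_right = np.array([[0.0, 1.0], [1.0, 0.0]])
left, right = render_binaural(feeds, hrirs_left, hrirs_right)
```

Because the HRIRs stay fixed, head rotation only changes `feeds`, which is the computational advantage described above.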

5. Higher Order Synthesis

In order to compare the distance perception of different orders of Ambisonic sound fields, it is desirable to take real world sound field measurements. However, the formation of higher order spherical harmonic directional patterns is non-trivial. Thus, in order to change FOA impulse responses to HOA representations, we employ a perceptually based approach which allows us to synthesize the increased directional resolution that would be achieved with a HOA sound field recording. For this, we adopt the directional analysis method of Pulkki and Merimaa found in [23]. Here, the B-format signals are analyzed in terms of sound intensity and energy in order to derive time-frequency based direction of arrival and diffuseness. The instantaneous intensity vector is given from the pressure $p$ and particle velocity $\mathbf{u}$ as

$$\mathbf{I}(t) = p(t)\, \mathbf{u}(t) \tag{8}$$

Since we are using FOA impulse response measurements, the pressure can be approximated by the 0th order Ambisonics component $w(t)$, which is omnidirectional,

$$p(t) = w(t) \tag{9}$$

and the particle velocity by

$$\mathbf{u}(t) = \frac{1}{\sqrt{2}\, Z_0} \left[ x(t)\, \mathbf{e}_x + y(t)\, \mathbf{e}_y + z(t)\, \mathbf{e}_z \right] \tag{10}$$

where $\mathbf{e}_x$, $\mathbf{e}_y$ and $\mathbf{e}_z$ represent Cartesian unit vectors, $x(t)$, $y(t)$ and $z(t)$ are the FOA signals, and $Z_0$ is the characteristic acoustic impedance of air.

Figure 3. The virtual loudspeaker reproduction concept.

The instantaneous intensity represents the direction of the energy transfer of the sound field, and the direction of arrival can be determined simply as the opposite direction of $\mathbf{I}$. For FOA, we can calculate the intensity for each coordinate axis and in the frequency domain. Since a portion of the energy will also oscillate locally, a diffuseness estimate can be made from the ratio of the magnitude of the intensity vector to the overall energy density $E$, given as

$$\psi = 1 - \frac{\| \langle \mathbf{I} \rangle \|}{c\, \langle E \rangle} \tag{11}$$

where $\langle \cdot \rangle$ denotes time averaging, $\| \cdot \|$ denotes the norm of the vector and $c$ is the speed of sound. The diffuseness estimate will yield a value of zero for incident plane waves from a particular direction, but will give a value of 1 where there is no net transport of acoustic energy, such as in the cases of reverberation or standing waves. Time averaging is used since it is difficult to determine an instantaneous measure of diffuseness.
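Equations (8) to (11) can be illustrated with a broadband, time-domain sketch. Several things here are assumptions rather than details from the paper: the energy density expression $E = \frac{\rho_0}{2}\left(p^2 / Z_0^2 + \|\mathbf{u}\|^2\right)$ is the standard acoustic form and is not stated in the text, the values of air density and speed of sound are nominal, and the analysis is done over one long time window instead of per time-frequency tile.

```python
import numpy as np

RHO0 = 1.2           # assumed air density in kg/m^3
C_SOUND = 343.0      # assumed speed of sound in m/s
Z0 = RHO0 * C_SOUND  # characteristic acoustic impedance of air

def diffuseness(w, x, y, z):
    """Broadband diffuseness estimate from first-order signals, following equation (11)."""
    p = w                                                # equation (9)
    u = np.stack([x, y, z]) / (np.sqrt(2.0) * Z0)        # equation (10), shape (3, T)
    intensity = p * u                                    # equation (8)
    # Assumed standard energy density; not given explicitly in the text.
    energy = 0.5 * RHO0 * (p ** 2 / Z0 ** 2 + np.sum(u ** 2, axis=0))
    mean_intensity = intensity.mean(axis=1)              # time average of I
    return 1.0 - np.linalg.norm(mean_intensity) / (C_SOUND * energy.mean())

rng = np.random.default_rng(0)
s = rng.standard_normal(20000)

# Plane wave from the front: directional channels carry sqrt(2) * s along x only.
direction = np.array([1.0, 0.0, 0.0])
psi_plane = diffuseness(s, *(np.sqrt(2.0) * s * direction[:, None]))

# Four mutually independent noise signals stand in for a fully diffuse field.
psi_diffuse = diffuseness(*rng.standard_normal((4, 20000)))
```

Under these assumptions the plane wave gives a diffuseness close to 0 and the independent-noise case gives a value close to 1, matching the limiting cases described above.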


The output of the analysis is then subject to smoothing based on the Equivalent Rectangular Bandwidth (ERB) scale, such that the resolution of the human auditory system is approximated. Since the frequency dependent direction of arrival of the non-diffuse portion of the sound field can be determined, HOA reproduction can be achieved by re-encoding point-like sources, corresponding to the direction indicated in each temporal average and frequency band, into a higher order spherical harmonic representation. The resultant Ambisonic signals are then weighted in each frequency band $k$ according to $1 - \psi_k$. However, it is only vital to re-encode non-diffuse components to higher order, and the diffuse field can be obtained by multiplying the FOA signals by $\sqrt{\psi_k}$ and forming a first order decode. This is justified since source localisation is dependent on the direction of arrival of the direct sound and early reflections, and not on late room reverberation [24]. Thus, from the perceptual point of view, it is questionable whether there is a need to preserve the full directional accuracy of the reverberant field. Furthermore, if there exists a general directional distribution to the diffuse field, this will still be preserved in first order form. On the other hand, the diffuse component should not be simply derived from the 0th order signal. Such a solution would provide perfectly correlated versions of the diffuse field to the left and right ear signals, which have no equivalent in a real physical sound field. Moreover, interaural decorrelation is an important factor in providing spatial impression in enclosed environments [25].

Figure 4 shows an example of the first 20 ms of a 1st order impulse response taken in a reverberant hall [26]. Here, the source was located 3 m from a Soundfield ST350 microphone and the Spatial Room Impulse Response (SRIR) was captured using the exponentially swept-sine tone technique [27]. In these plots, particular attention is drawn to the direct sound (coming from directly in front of the microphone) and a left wall reflection at approximately 14 ms. It can be seen that the directional resolution increases significantly with HOA representation. It should be noted that the A-format capsules on sound field microphones only display adequate directionality up to 10 kHz [28]. Spatial aliasing is therefore an issue for high frequencies and, as a result, the directional information above 10 kHz cannot be relied upon.

6. Method: Localization of Distance of Test Sounds

Different protocols have been used in the literature for subjective assessment of distance perception, most notably verbal report [29, 30], direct or indirect blind walking [31, 32], or imagined timed walking [32]. All of these methods have been shown to provide reliable and comparable results for both auditory and visual stimuli, with direct blind walking exhibiting the least between-subject variability [31, 32].

In former work [26], the authors of this paper developed a method where subjects indicated the perceived distance of real and virtual sound sources by selecting one of several physical loudspeakers lined up (and slightly offset in order to provide 'acoustic transparency') in front of their eyes. However, for the present study, in order to completely eliminate any possible anchors as well as visual cues, it was decided to utilize the method of direct blind walking. One of the main concerns in the experiment was a direct comparison of distance perception of real sound sources versus virtual sound sources presented over headphones. Due to different apparatus requirements, the experiment had to be conducted in two separate phases.

Figure 4. Ambisonic sound field from 1st order measurement with a Soundfield ST350: (a) 1st order representation, (b) 3rd order up-mix. Both panels mark the direct sound and the left wall reflection.

6.1. Participants

Seven participants, aged 24–58, took part in the experiment. All subjects were of good hearing and were either music technology students or practitioners actively involved in audio research or production. Prior to the test, HRIR data for all the participants was obtained in a sound-proof, large (18 × 15 × 10 m³) but quite damped ($T_{60}$ at 1000 Hz = 0.57 s) multipurpose room (Black Box) in the Department of Theatre, Film and Television at the University of York. Additional damping was assured by thick, heavy curtains covering all four walls and a carpet on the floor. The measurement process consisted of a standard procedure, where miniature omnidirectional microphones (Knowles FG-23629-P16) were placed at the entrance of a blocked ear canal in order to capture the acoustic pressure generated by one loudspeaker at a time, located at constant distance and varying angular direction.

Figure 5. Measuring Head Related Impulse Responses with miniature microphones.

Subjects were seated on an elevated platform, so that their ears were 2.20 m above the ground and their head was in the centre of a spherical loudspeaker array arranged in diametrically opposed pairs. The ear height was calibrated using a laser guide, as shown in Figure 5. The array consisted of 16 full range Genelec 8050A loudspeakers, since the intention was to reproduce Ambisonic sound fields up to and including 3rd order. This 3-D setup, shown in Figure 6, comprised a flat-front horizontal octagon and a cube (four loudspeakers on top and four on the bottom). The radius of the loudspeaker array (and thus the virtual loudspeaker array) was 3.27 m. For the FOA-to-binaural decode, only virtual loudspeakers from the cube configuration were utilized, since no directional resolution is gained by using a higher number of loudspeakers. Furthermore, despite careful alignment, oversampling of the sound field with higher numbers of speakers has the potential to yield sound field distortions [33]. Note that for 2nd and 3rd order reproduction all 16 loudspeakers were used. Although the oversampled configuration was not optimal from the 2nd order reproduction point of view, it was not possible to easily and accurately rearrange the loudspeaker array in order to accommodate a different layout.

Figure 6. Array of 16 loudspeakers used for HRIR measurements.

HRIRs were captured using the exponentially swept-sine tone technique [27] at 44.1 kHz sampling rate and 16-bit resolution. Since the measurement environment was not fully anechoic, further processing of the measured data was necessary. The HRIRs were tapered before the arrival of the first reflection (from the floor), yielding filter kernels with 257 taps, and were subsequently diffuse-field equalized.
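The tapering step can be sketched as a truncation followed by a short fade-out. The text only states that the kernels were tapered before the floor reflection and have 257 taps; the half-Hann shape, the 32-sample fade length and the function name below are assumptions for illustration, and onset alignment and diffuse-field equalization are not covered.

```python
import numpy as np

SAMPLE_RATE_HZ = 44100.0

def truncate_hrir(hrir, num_taps=257, fade_len=32):
    """Keep the first num_taps samples and fade out the tail with a half-Hann window."""
    kernel = np.array(hrir[:num_taps], dtype=float)
    fade_len = min(fade_len, kernel.size)
    if fade_len > 0:
        # Half-Hann fade running from 1 down towards 0 over the last fade_len samples.
        fade = 0.5 * (1.0 + np.cos(np.pi * np.arange(fade_len) / fade_len))
        kernel[-fade_len:] *= fade
    return kernel

# Duration of a 257-tap kernel at the stated sampling rate, in milliseconds.
kernel_duration_ms = 1000.0 * 257 / SAMPLE_RATE_HZ

# A constant test signal makes the shape of the applied taper visible.
tapered = truncate_hrir(np.ones(1000))
```

At 44.1 kHz, 257 taps correspond to roughly 5.8 ms of response.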

6.2. Stimuli

The stimuli used in the experiment were pink noise bursts and phonetically balanced phrases selected from the TIMIT Acoustic-Phonetic Continuous Speech Corpus database and recorded by a female reader [34]. A sampling rate of 44.1 kHz and 16-bit resolution was used in both cases. These two sample types were selected in order to represent both unfamiliar and familiar sound sources. They were presented to the subjects in a pseudo-randomized manner to avoid any ordering effects.

For headphone reproduction, prior to the test phase, FOA impulse response measurements were taken from the listener position of each loudspeaker using the exponentially swept-sine tone technique [27]. From these measurements, 2nd and 3rd order impulse response sets were extracted using the directional analysis approach outlined in section 5. 0th order Ambisonics does not provide any directional information, which means that it would lack the cues that are investigated in the higher order renderings. Therefore, it was decided not to include it in this comparison.

The only psychoacoustical optimization applied to the Ambisonic decodes was shelf filtering, intended to satisfy Gerzon's localization criteria for a maximized velocity decode at low frequencies and energy decode at higher frequencies [35]. This involved changing the ratio of the pressure to velocity components at low and high frequencies. Whilst the crossover frequency for the high frequency boost in the pressure channel at first order is normally in the region of 400 Hz for regular loudspeaker listening, here we restore the crossover point to 700 Hz, since the subject is always perfectly centred in the virtual loudspeaker array.

6.3. Test Environment and Apparatus

A series of subjective listening tests was conducted in the Large Rehearsal Room in the Department of Theatre, Film and Television at the University of York. The room dimensions were 12 × 9 × 3.5 m³ and the spatially averaged $T_{60}$ at 1 kHz was 0.26 s. A low $T_{60}$ was desired for this study, so the walls were covered with thick, heavy curtains, as shown in Figure 7. Since the up-mix from 1st to 2nd and 3rd order Ambisonics concerned only the deterministic part of the measured SRIRs, it was assumed that no advantage would be gained from using a more reverberant space.

A professional camera dolly track was set up roughly in the direction of the diagonal of the room. It not only allowed for testing distances of the real loudspeaker up to 8 m, but its non-symmetrical position also assured that early reflections of the same order from different surfaces did not easily coincide at the subject's ears, but instead arrived at different times. A single full-range loudspeaker (Genelec 8050A) was mounted on a camera dolly, which enabled it to be noiselessly translated by the experiment assistant to different locations. A guiding rope was hung along the dolly track, which was intended to help and guide the participants when walking toward the sound source. Since it was not possible to walk exactly on the dolly track, it was decided that the walking path would be directly next to it, as shown in Figure 7. The only weakness of this solution was that the sound source horizontal angle varied from 14.04 degrees at the closest distance (2 m) to 3.58 degrees at the furthest distance (8 m). However, this is not expected to have affected the distance judgments, for two reasons. Firstly, the subjects were allowed (or even encouraged) to rotate their head in order to fully utilize the available ITD and ILD cues. Secondly, the initial head orientation was not in any way fixed. This, combined with the fact that there were no clear cues to the subject's initial orientation in the room at the origin, made this small initial angular offset unimportant. Furthermore, none of the participants reported any bias in their assessment based on the horizontal offset of the sound source.
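The two reported angles are consistent with a fixed lateral offset between the walking path and the dolly track. The value of 0.5 m used below is not stated in the text; it is inferred here from the reported angles and should be read as an assumption.

```python
import math

LATERAL_OFFSET_M = 0.5  # assumed offset between walking path and dolly track

def source_angle_deg(distance_m, offset_m=LATERAL_OFFSET_M):
    """Horizontal angle of the source seen from the origin, in degrees."""
    return math.degrees(math.atan2(offset_m, distance_m))

for distance in (2, 4, 6, 8):
    print(f"{distance} m: {source_angle_deg(distance):.2f} degrees")
```

With this offset, the angle at 2 m and at 8 m rounds to the 14.04 and 3.58 degrees quoted above.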

For trials with binaural presentation, high quality open-back headphones (AKG K601) were used, which exhibit low levels of interaural magnitude and group delay distortion. Sound field rotation, tilt and tumble control was implemented via the TrackIR 5 infra-red head tracking system [36], resulting in stable virtual images under head rotation. The system responsible for playback of virtualized sound sources was built entirely in the Pure Data visual programming environment [37], and its combined latency (including head-tracker data porting and audio update rate) was 20 ms.

Figure 7. Participant performing a trial during the experiment.
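As a rough illustration of head-tracked sound field rotation, the sketch below counter-rotates first-order signals about the vertical axis only. It assumes azimuth and head yaw are both positive towards the left (anticlockwise seen from above), so signs may need flipping for the clockwise azimuth convention defined in section 3. Tilt and tumble, higher orders, and the actual Pure Data implementation are not covered.

```python
import numpy as np

def rotate_yaw(b_format, head_yaw):
    """Counter-rotate first-order signals [w, x, y, z] for a head turned by head_yaw radians."""
    w, x, y, z = b_format
    c, s = np.cos(head_yaw), np.sin(head_yaw)
    # Only the horizontal dipole components mix; w and z are unaffected by yaw.
    return np.array([w, c * x + s * y, -s * x + c * y, z])

# A source straight ahead, expressed with unit gains for simplicity.
front_source = np.array([1.0, 1.0, 0.0, 0.0])

# After the head turns 90 degrees to the left, the source should appear on the right.
rotated = rotate_yaw(front_source, np.pi / 2)
```

The rotated signals would then be decoded to the fixed virtual loudspeakers, so the HRIRs themselves never change.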

6.4. Procedure

In the experiment, subjects entered the test environment blindfolded and without any prior expectation regarding the room dimensions, its acoustic properties or the test apparatus. They were guided by the experimenter to the reference point (the 'origin'). After a short explanation of the experiment objectives, a training session began with a short (3–5 min) walking-only trial, until participants felt comfortable with walking blindfolded and using a guide rope. Next, they performed 4–6 training trials in which the same test stimuli to be used in the experiment (speech and pink noise) were played by the loudspeaker at randomly chosen distances. No feedback was given and no results were recorded after each training trial. The end of the training session was clearly announced and, after a 1 minute interval, the first phase of the test began.

In test phase I, participants were asked to listen to static sound sources at randomly chosen points, focusing on the perceived distance. They could listen to any audio sample as many times as they wished. During the playback they were instructed to stay still and refrain from any translational head movements. However, they were encouraged to rotate their head freely. After the playback had stopped, they were asked to walk, guided by the rope, to the point where they thought the sound originated from. The distance walked was subsequently recorded by the assistant using a laser measuring tool, after which the participant walked backwards to the origin. In the meantime, the loudspeaker was noiselessly translated to its new position and the test proceeded. Similar to the training session, no feedback was given at any stage.

During the first test phase, participants had to indicate the perceived distance for sound sources randomly located at 2 m, 4 m, 6 m or 8 m. Taking into account that both speech and pink noise burst samples were used (in a pseudo-random order), the number of trials in the first phase added up to 8. Each subject performed all the trials only once.

Upon completion of the first phase of the test, there was a short (approximately 2 minute) interval that was required in order to put on the headphones and calibrate the head-tracking system. In phase II, subjects were also asked to identify the sound source distance, but this time using Ambisonic sound fields presented over headphones. Other than the fact that headphones and the head-tracking system were used, the test protocol remained the same as in phase I. However, since there were three playback configurations to be tested (1st, 2nd and 3rd order Ambisonics), participants had to perform 24 trials instead of 8. Instead of separate phases for each Ambisonic order, all samples were randomly presented to the subject within the same test phase. Again, subjects performed all the trials only once and no feedback was given at any stage.
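The trial counts for the two phases follow from crossing distances, stimuli and playback options. The sketch below builds such shuffled trial lists; the function name, labels and seeds are hypothetical, and the randomization actually used in the experiment is not specified beyond being pseudo-random.

```python
import itertools
import random

DISTANCES_M = (2, 4, 6, 8)
STIMULI = ("speech", "pink noise")
AMBISONIC_ORDERS = ("FOA", "SOA", "TOA")

def make_trials(playback_options, seed=None):
    """Cross playback options, stimuli and distances, then shuffle the order."""
    trials = list(itertools.product(playback_options, STIMULI, DISTANCES_M))
    random.Random(seed).shuffle(trials)
    return trials

phase_one = make_trials(("REAL",), seed=1)      # 1 x 2 x 4 = 8 trials
phase_two = make_trials(AMBISONIC_ORDERS, seed=2)  # 3 x 2 x 4 = 24 trials
```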

7. Results

The perceived sound source distance (indicated by the distance walked) was collected from 7 subjects for 4 presentation points (2 m, 4 m, 6 m and 8 m), two stimuli (female speech and pink noise bursts) and four playback options: 1st, 2nd and 3rd order Ambisonics and real loudspeakers, which for analysis we will denote FOA, SOA, TOA and REAL, respectively. With headphone trials, none of the participants reported in-head localization; however, there were 3 cases where the proximity of the sound source was very apparent, so participants decided not to move at all. In some cases, the virtual sound source was initially localized behind the subjects, but all participants were able to resolve the confusion by applying head rotation.

We computed the mean values of walked distances $\mu$ for each test condition, along with the corresponding standard errors $se(\mu)$. The results are presented separately for each stimulus type, with 95% confidence intervals.

As expected, the perception of distance for the real sources was more accurate for near sources. Beyond 4 m, distance was consistently underestimated, which is congruent with the previous studies outlined in section 2. Furthermore, the standard deviation of localization increases as the source moves further into the diffuse field. We also see that unfamiliar stimuli produce greater variability in subjects' answers. The mean localization of the virtual sources follows the reference source localization well. The answers for virtual sources deviate from their means in roughly the same fashion as the answers for reference sources, as localization becomes more difficult within the diffuse field.
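A minimal sketch of the per-condition summary statistics is given below. The walked distances are made up for illustration, and the use of a Student's t critical value for 6 degrees of freedom (seven subjects) is an assumption; the text does not state how the 95% intervals were constructed.

```python
import numpy as np

T_CRIT_DF6 = 2.447  # two-sided 95% critical value of Student's t, 6 degrees of freedom

def mean_se_ci(walked):
    """Mean, standard error and 95% confidence interval for seven walked distances."""
    walked = np.asarray(walked, dtype=float)
    if walked.size != 7:
        raise ValueError("the hard-coded critical value assumes exactly 7 subjects")
    mean = walked.mean()
    se = walked.std(ddof=1) / np.sqrt(walked.size)  # sample standard error of the mean
    half_width = T_CRIT_DF6 * se
    return mean, se, (mean - half_width, mean + half_width)

# Hypothetical distances walked (in metres) by seven subjects for one condition.
example_walked = [3.1, 3.6, 4.2, 2.9, 3.8, 4.0, 3.4]
mean, se, ci = mean_se_ci(example_walked)
```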

Figure 8. Mean localization of real and virtual sound sources (female speech). Axes: real distance [m] against distance walked [m]; series: FOA, SOA, TOA, Real.

Figure 9. Mean localization of real and virtual sound sources (pink noise bursts). Axes: real distance [m] against distance walked [m]; series: FOA, SOA, TOA, Real.

Since the study followed a within-subject factorial design with 2 (stimuli) × 4 (playback conditions), a two-way ANOVA has been performed for each presentation distance in order to investigate the effects of these two factors (referred to later as factors A and B), as well as potential interaction effects. The null hypothesis being tested here is that the mean perceived distances for all the stimuli and playback methods do not differ significantly:

$$H_0: \mu_{FOA} = \mu_{SOA} = \mu_{TOA} = \mu_{Real} = \mu$$

$H_1$: not all localization means ($\mu_i$) are the same.

No statistically significant effect of stimuli (familiar vs. unfamiliar) on the perception of distance has been found ($F_{2m}(3, 48) = 0.835$, $p = 0.365$; $F_{4m}(3, 48) = 2.0462$, $p = 0.159$; $F_{6m}(3, 48) = 2.575$, $p = 0.115$; $F_{8m}(3, 48) = 2.0462$, $p = 0.159$). For distances of 4 m and more, the playback option also had no statistically significant effect ($F_{4m}(3, 48) = 2.192$, $p = 0.101$; $F_{6m}(3, 48) = 0.665$, $p = 0.577$; $F_{8m}(3, 48) = 0.202$, $p = 0.894$).
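The structure of a balanced two-way ANOVA for one presentation distance can be sketched as follows. Treating the seven subjects as replicates within each of the 2 × 4 cells reproduces the 48 error degrees of freedom quoted above, but this is an assumption about the analysis, not a confirmed description of it; a repeated-measures model would partition the error differently. The data are randomly generated placeholders, and p-values are omitted because they require the F distribution.

```python
import numpy as np

def two_way_anova(y):
    """Balanced two-way ANOVA; y has shape (levels of A, levels of B, replicates)."""
    y = np.asarray(y, dtype=float)
    a, b, n = y.shape
    grand = y.mean()
    mean_a = y.mean(axis=(1, 2))  # means per stimulus
    mean_b = y.mean(axis=(0, 2))  # means per playback condition
    cell = y.mean(axis=2)         # means per (stimulus, playback) cell
    ss = {
        "A": b * n * np.sum((mean_a - grand) ** 2),
        "B": a * n * np.sum((mean_b - grand) ** 2),
        "AB": n * np.sum((cell - mean_a[:, None] - mean_b[None, :] + grand) ** 2),
        "error": np.sum((y - cell[:, :, None]) ** 2),
    }
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "error": a * b * (n - 1)}
    ms_error = ss["error"] / df["error"]
    f_ratios = {k: (ss[k] / df[k]) / ms_error for k in ("A", "B", "AB")}
    return f_ratios, df, ss

# Placeholder data: 2 stimuli x 4 playback conditions x 7 subjects, in metres.
rng = np.random.default_rng(0)
walked = rng.normal(loc=1.2, scale=0.5, size=(2, 4, 7))
f_ratios, dof, sums = two_way_anova(walked)
```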

However, a statistically significant difference has been detected for the distance of 2 m. In larger study designs with multiple levels, it is advisable to use the Honestly Significant Difference (HSD) approach, since there is an increased risk of spuriously significant differences arising purely by chance. So, in order to investigate further where the difference occurs, an HSD has been computed (HSD = 1.423 m). If we now compile the table of mean perceived distances for the sound sources located at 2 m (Table I), we can see that all of the values lie within a single HSD of each other and cannot be distinguished. We can then safely assume an ANOVA false alarm (type I error) and no statistically significant effect of playback method for the sources at the distance of 2 m as well.

Table I. Mean localization [m] of virtual and real sound sources at 2 m.

| Stimulus | $\mu_{FOA}$ | $\mu_{SOA}$ | $\mu_{TOA}$ | $\mu_{Real}$ |
|----------|-------------|-------------|-------------|--------------|
| Speech   | 1.119       | 1.389       | 0.841       | 1.638        |
| Noise    | 0.877       | 1.001       | 0.902       | 1.641        |

Table II. Correlation coefficients $\rho$ and corresponding p-values for pairs of distance estimations for real and virtual sound sources (Speech).

| Pair         | $\rho$ | p-value |
|--------------|--------|---------|
| Real vs. FOA | 0.9828 | 0.0172  |
| Real vs. SOA | 0.9960 | 0.0040  |
| Real vs. TOA | 0.9590 | 0.0410  |

Table III. Correlation coefficients $\rho$ and corresponding p-values for pairs of distance estimations for real and virtual sound sources (Noise).

| Pair         | $\rho$ | p-value |
|--------------|--------|---------|
| Real vs. FOA | 0.9913 | 0.0087  |
| Real vs. SOA | 0.9857 | 0.0143  |
| Real vs. TOA | 0.9972 | 0.0028  |
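The HSD comparison for the 2 m means can be checked directly from the reported numbers. The sketch below uses the means from Table I and the stated HSD of 1.423 m; comparing pairs within each stimulus row is a choice made here for illustration.

```python
from itertools import combinations

HSD_M = 1.423  # Honestly Significant Difference reported for the 2 m condition

MEANS_AT_2M = {
    "Speech": {"FOA": 1.119, "SOA": 1.389, "TOA": 0.841, "Real": 1.638},
    "Noise": {"FOA": 0.877, "SOA": 1.001, "TOA": 0.902, "Real": 1.641},
}

def largest_pairwise_difference(means):
    """Largest absolute difference between any two playback-condition means."""
    return max(abs(means[a] - means[b]) for a, b in combinations(means, 2))

for stimulus, means in MEANS_AT_2M.items():
    diff = largest_pairwise_difference(means)
    print(f"{stimulus}: largest difference {diff:.3f} m, within HSD: {diff < HSD_M}")
```

The largest gaps are between the real source and the TOA (speech) or FOA (noise) means, and both are well below one HSD.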

Lastly, for all the distances, no interaction effects between factors A (stimuli) and B (playback conditions) have been detected.

Additionally, we calculated correlation coefficients $\rho$ for pairs of distance estimations for real and virtual sound sources (either 1st, 2nd or 3rd order) and the two stimuli. In all cases, high correlation coefficients have been obtained, which supports our finding that, for these particular test conditions, the perception of distance of binaurally rendered Ambisonic sound fields of orders 1 to 3 cannot be distinguished from the perception of distance of the real sound sources.

8. Discussion

The results presented for real sources corroborate the classic underestimation of source distance as reported in the literature. These results were used as a basis with which to measure the ability of Ambisonic sound fields of different orders to present sources at different distances. It was expected that a further underestimation of the source distance would ensue with the binaural rendering, as reported in [17]. However, this was not the case, even for first order presentations, and the apparent distances of the virtual sources matched the real source distances well. One should note that the major difference between this study and that of [17] is our use of head-tracking, indicating the importance of head movements in perceiving source distance, which develops the findings of Waller [18] and Ashmead et al. [10] on user interaction in a virtual space. Further work is required to quantify this effect.

Moreover, the presented study demonstrates that the enhanced directional accuracy gained by presenting sound sources in HOA through head-tracked binaural rendering does not yield a significant improvement in the perception of the source distance. What is noteworthy is that for each order there is no significant difference in the perception of the source location when compared to real-world sources. We therefore conclude that sound field directionality for distance perception is sufficient with 1st order playback.

The presence of the ANOVA false alarm at the 2 m point is of interest. It is noteworthy that the 2 m point represents a source inside the virtual array geometry. It is a known issue that virtual sound sources rendered inside the array of loudspeakers cannot be reproduced in a straightforward way without artifacts. Some of these artifacts include incorrect wavefront curvature and insufficient bass boost.

In the first case, there is ample evidence in the literature to suggest that the wavefront curvature translates to significant binaural cues for sound sources near the head [30, 38]. It was already shown in section 2.1 that as a source moves closer to the head, the levels of the monaural transfer function and the ILD both change significantly with source angle. However, this effect is not strong at 1 m and beyond. For sources further away, it has been shown in [39] that it is very difficult to assess distance by binaural cues alone.

In the second case, the requirement for distance compensation filtering due to near field effects for the large loudspeaker radius (3.27 m) and the given source distances (> 2 m) is only prominent below 100 Hz. For the female speech test stimuli this will not have an effect, since the first formant frequencies do not go below 180 Hz. Also, the current method employed for capturing HRIRs allowed for reliably obtaining filters with a frequency response reaching down to around 170 Hz, thereby also band-limiting the delivery of the pink noise stimuli.

Finally, there was no significant difference in the results presented for different sources, although the greater variance in the results for pink noise suggests that the familiarity of the source does indeed play a role in the perception of source distance, as mentioned in section 2.3. Future studies will investigate the use of these monaural cues further and will utilize 0th order sound field rendering, since it will remove the influence of any directional information.

Considering the aforementioned study of Bronkhorst et al. [14], where the accuracy of distance perception for binaural playback increases with the number of reflections, our findings demonstrate that the net effect of the monaural cues of direct-to-reverberant ratio, level difference and time of arrival of early reflections is of greater importance in distance perception for binaural rendering than Ambisonic directional accuracy beyond 1st order.

9. Conclusions

We have assessed, through subjective analysis, the perceived source distance in virtual Ambisonic sound fields in comparison to real-world sources. The hypothesis tested was that enhanced directional accuracy of the deterministic part of the sound field may lead to better reconstruction of environmental depth and thus improve the perception of sound source distance. However, it was shown that Ambisonic reproduction matches the perceived real-world source distances well even at 1st order, and no improvement in this regard was observed when increasing the order. It must be emphasized, though, that this analysis applies to Ambisonic-to-binaural decodes with higher order synthesis achieved using the directional analysis method of [23]. Therefore, further work will examine this topic for loudspeaker reproduction, for both centre and off-centre listening, as well as investigate the effectiveness of HOA synthesis in comparison to real-world HOA measurements.

Acknowledgments

The authors gratefully acknowledge the participation of the test subjects, for both their time and constructive comments, as well as the technical support staff at the Department of Theatre, Film and Television at the University of York for their assistance in the experimental setups. This research is supported by Science Foundation Ireland.

References

[1] L. Fauster: Stereoscopic techniques in computer graphics. Technical paper, TU Wien, 2007.

[2] J. Lee: Head tracking for desktop VR displays using the Wii remote. http://johnnylee.net/projects/wii/, accessed 30th Sept. 2011.

[3] D. R. Begault: Direct comparison of the impact of head tracking, reverberation, and individualized head-related transfer functions on the spatial perception of a virtual sound source. J. Audio Eng. Soc. 49 (2001) 904–916.

[4] M. Otani, T. Hirahara: Auditory artifacts due to switching head-related transfer functions of a dynamic virtual auditory display. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. E91-A (2008) 1320–1328.

[5] V. Pulkki: Virtual sound source positioning using Vector Base Amplitude Panning. J. Audio Eng. Soc. 45 (1997) 456–466.

[6] A. J. Berkhout: A Holographic Approach to Acoustic Control. J. Audio Eng. Soc. 36 (1988) 977–995.

[7] M. A. Gerzon: Periphony: With-height sound reproduction. J. Audio Eng. Soc. 21 (1973) 2–10.

[8] F. Rumsey: Spatial quality evaluation for reproduced sound: Terminology, meaning, and a scene-based paradigm. J. Audio Eng. Soc. 50 (2002) 651–666.

[9] J. Blauert: Communication acoustics. Springer, 2008.

[10] D. H. Ashmead, D. L. Davis, A. Northington: Contribution of listeners' approaching motion to auditory distance perception. J. Exp. Psy.: Hum. Percep. and Perform. 21 (1995) 239–256.

[11] E. Czerwinski, A. Voishvillo, S. Alexandrov, A. Terekhov: Propagation distortion in sound systems: Can we avoid it? J. Audio Eng. Soc. 48 (2000) 30–48.

[12] S. H. Nielsen: Auditory distance perception in different rooms. J. Audio Eng. Soc. 41 (1993) 755–770.

[13] M. B. Gardner: Distance estimation of 0° or apparent 0°-oriented speech signals in anechoic space. J. Acoust. Soc. Am. 45 (1969) 47–53.

[14] A. W. Bronkhorst, T. Houtgast: Auditory distance perception in rooms. Nature 397 (1999) 517–520.

[15] J. B. Allen, D. A. Berkley: Image method for efficiently simulating small-room acoustics. J. Acoust. Soc. Am. 65 (1979) 943–950.

[16] M. Rychtarikova, T. V. d. Bogaert, G. Vermeir, J. Wouters: Binaural sound source localization in real and virtual rooms. J. Audio Eng. Soc. 57 (2009) 205–220.

[17] J. S. Chan, C. Maguinness, D. Lisiecka, C. Ennis, M. Larkin, C. O'Sullivan, F. Newell: Comparing audiovisual distance perception in various real and virtual environments. Proc. of the 32nd Euro. Conf. on Vis. Percep., Regensburg, Germany, 2009.

[18] D. Waller: Factors affecting the perception of interobject distances in virtual environments. Presence: Teleoper. Virtual Environ. 8 (1999) 657–670.

[19] A. McKeag, D. McGrath: Sound field format to binaural decoder with head-tracking. Proc. of the 6th Australian Regional Convention of the AES, 1996.

[20] M. Noisternig, A. Sontacchi, T. Musil, R. Höldrich: A 3D Ambisonic based binaural sound reproduction system. Proc. of the 24th Int. Conf. of the Audio Eng. Soc., Alberta, Canada, 2003.

[21] B.-I. Dalenbäck, M. Strömberg: Real time walkthrough auralization - the first year. Proc. of the Inst. of Acous., Copenhagen, Denmark, 2006.

[22] C. Masterson, S. Adams, G. Kearney, F. Boland: A method for head related impulse response simplification. Proc. of the 17th European Signal Processing Conference (EUSIPCO), Glasgow, Scotland, 2009.

[23] J. Merimaa, V. Pulkki: Spatial impulse response rendering I: Analysis and synthesis. J. Audio Eng. Soc. 53 (2005).

[24] W. M. Hartmann: Localization of sound in rooms. J. Acoust. Soc. Am. 74 (1983) 1380–1391.

[25] D. Griesinger: Spatial impression and envelopment in small rooms. Proc. of the 103rd Conv. of the Audio Eng. Soc., New York, USA, 1997.

[26] G. Kearney, M. Gorzel, H. Rice, F. Boland: Depth perception in interactive virtual acoustic environments using higher order ambisonic soundfields. Proc. of the 2nd Int. Ambisonics Symp., Paris, France, 2010.

[27] A. Farina: Simultaneous measurement of impulse response and distortion with a swept-sine technique. Proc. of the 108th Conv. of the Audio Eng. Soc., Paris, France, 2000.

[28] M. Gerzon: The design of precisely coincident microphone arrays for stereo and surround sound. Proc. of the 50th Conv. of the Audio Eng. Soc., London, UK, 1975.


[29] C. Guastavino, B. F. G. Katz: Perceptual evaluation of multi-dimensional spatial audio reproduction. J. Acoust. Soc. Am. 116 (2004) 1105–1115.

[30] P. Zahorik: Assessing auditory distance perception using virtual acoustics. J. Acoust. Soc. Am. 111 (2002) 1832–1846.

[31] J. M. Loomis, R. L. Klatzky, J. W. Philbeck, R. G. Golledge: Assessing auditory distance perception using perceptually directed action. Perception and Psychophysics 60 (1998) 966–980.

[32] T. Y. Grechkin, T. D. Nguyen, J. M. Plumert, J. F. Cremer, J. K. Kearney: How does presentation method and measurement protocol affect distance estimation in real and virtual environments? ACM Trans. Appl. Percept. 7 (2010) 26:1–26:18.

[33] S. Bertet: Formats audio 3D hiérarchiques: Caractérisation objective et perceptive des systèmes ambisonics d'ordres supérieurs. PhD dissertation, INSA Lyon, 2008.

[34] W. M. Fisher, G. R. Doddington, K. M. Goudie-Marshall: The DARPA speech recognition research database: Specifications and status. Proc. of the DARPA Workshop on Speech Recognition, 1986.

[35] M. A. Gerzon, G. J. Barton: Ambisonic decoders for HDTV. Proc. of the 92nd Conv. of the Audio Eng. Soc., Vienna, Austria, 1992.

[36] NaturalPoint: TrackIR 5. http://www.naturalpoint.com/trackir/, accessed 30th Sept. 2011.

[37] M. Puckette: Pure data. http://puredata.info, accessed 30th Sept. 2011.

[38] P. Zahorik, D. S. Brungart, A. W. Bronkhorst: Auditory distance perception in humans: A summary of past and present research. Acta Acustica united with Acustica 91 (2005) 409–420.

[39] H. Wittek: Perceptual differences between Wavefield Synthesis and Stereophony. Department of Music and Sound Recording, School of Arts, Communication and Humanities, University of Surrey, UK, 2007.


4. Virtual Loudspeaker Reproduction

In the 'virtual loudspeaker' approach, HRTFs are measured at the 'sweet-spot' (the limited region in the centre of a reproduction array where an adequate spatial impression is generally guaranteed) in a multi-loudspeaker reproduction setup, and the resultant binaural playback is formed from the convolution of the loudspeaker feeds with the virtual loudspeakers. This concept is illustrated in Figure 3. For the left ear we have

$$L = \sum_{i=1}^{I} h_{L_i} * q_i \qquad (7)$$

where $*$ denotes convolution, $h_{L_i}$ is the left ear HRIR corresponding to the $i$th virtual loudspeaker and $q_i$ is the $i$th loudspeaker feed. Similar relations apply for the right ear signal. This method was first introduced by McKeag and McGrath [19], and examples of its adoption can be found in [20] and [21]. This approach has major computational advantages, since a complex filter kernel is not required and head rotation can be simulated by changing the loudspeaker feeds $q_i$ as opposed to the HRTFs. Whilst the HRTFs in this case play an important role in the spatialization, ultimately it is the sound field creation over the virtual loudspeakers which gives the overall spatial impression. Most existing research uses a block frequency-domain approach to this convolution. However, given that the virtual loudspeaker feeds are controlled via head-tracking in real time, a time-domain filtering approach can also be utilized. For short filter lengths, obtaining the output in a point-wise manner avoids the inherent latencies introduced by block convolution in the frequency domain. A strategy for significant reduction of the filter length without artifacts has been proposed in [22].
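As an illustration of Equation (7), the following minimal Python sketch forms one ear signal by direct time-domain convolution of each loudspeaker feed with its HRIR and summing the results. The function names and the toy data are hypothetical and chosen only for this example; it is not the real-time implementation used in the study.

```python
def convolve(h, q):
    """Direct-form (time-domain) linear convolution of two sequences."""
    out = [0.0] * (len(h) + len(q) - 1)
    for n, h_n in enumerate(h):
        for m, q_m in enumerate(q):
            out[n + m] += h_n * q_m
    return out


def render_ear(hrirs, feeds):
    """Sum over virtual loudspeakers of HRIR_i * feed_i, as in Equation (7)."""
    if len(hrirs) != len(feeds):
        raise ValueError("need one HRIR per loudspeaker feed")
    length = max(len(h) + len(q) - 1 for h, q in zip(hrirs, feeds))
    ear = [0.0] * length
    for h, q in zip(hrirs, feeds):
        for n, sample in enumerate(convolve(h, q)):
            ear[n] += sample
    return ear


# Hypothetical toy data: two virtual loudspeakers, 2-tap HRIRs, 3-sample feeds.
left_hrirs = [[1.0, 0.5], [0.25, 0.0]]
feeds = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
left_ear = render_ear(left_hrirs, feeds)
```

Because only the feeds change with head rotation, the HRIR set passed to `render_ear` can stay fixed while the feeds are updated from the head-tracker.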

5. Higher Order Synthesis

In order to compare the distance perception of different orders of Ambisonic sound fields, it is desirable to take real-world sound field measurements. However, the formation of higher order spherical harmonic directional patterns is non-trivial. Thus, in order to convert FOA impulse responses to HOA representations, we employ a perceptually based approach which allows us to synthesize the increased directional resolution that would be achieved with a HOA sound field recording. For this we adopt the directional analysis method of Pulkki and Merimaa found in [23]. Here, the B-format signals are analyzed in terms of sound intensity and energy in order to derive time-frequency based direction of arrival and diffuseness. The instantaneous intensity vector is given from the pressure $p$ and particle velocity $\mathbf{u}$ as

$$\mathbf{I}(t) = p(t)\,\mathbf{u}(t) \qquad (8)$$

Since we are using FOA impulse response measurements, the pressure can be approximated by the 0th order Ambisonics component $w(t)$, which is omnidirectional,

$$p(t) = w(t) \qquad (9)$$

and the particle velocity by

$$\mathbf{u}(t) = \frac{1}{\sqrt{2}\,Z_0}\left[x(t)\,\mathbf{e}_x + y(t)\,\mathbf{e}_y + z(t)\,\mathbf{e}_z\right] \qquad (10)$$

where $\mathbf{e}_x$, $\mathbf{e}_y$ and $\mathbf{e}_z$ represent Cartesian unit vectors, $x(t)$, $y(t)$ and $z(t)$ are the FOA signals and $Z_0$ is the characteristic acoustic impedance of air.

Figure 3. The virtual loudspeaker reproduction concept.

The instantaneous intensity represents the direction of the energy transfer of the sound field, and the direction of arrival can be determined simply as the opposite direction of $\mathbf{I}$. For FOA, we can calculate the intensity for each coordinate axis and in the frequency domain. Since a portion of the energy will also oscillate locally, a diffuseness estimate can be made from the ratio of the magnitude of the intensity vector to the overall energy density $E$, given as

$$\psi = 1 - \frac{\|\langle \mathbf{I} \rangle\|}{c\,\langle E \rangle} \qquad (11)$$

where $\langle\cdot\rangle$ denotes time averaging, $\|\cdot\|$ denotes the norm of the vector and $c$ is the speed of sound. The diffuseness estimate will yield a value of zero for incident plane waves from a particular direction, but will give a value of 1 where there is no net transport of acoustic energy, such as in the cases of reverberation or standing waves. Time averaging is used since it is difficult to determine an instantaneous measure of diffuseness.
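The diffuseness estimate of Equation (11) can be sketched in a few lines of Python. This is a broadband, time-domain illustration only, whereas the analysis described here operates per frequency band. It assumes the pressure and velocity of Equations (9) and (10) and, as an assumption not stated in the text, the standard acoustic energy density $E = \frac{\rho_0}{2}\left(\|\mathbf{u}\|^2 + p^2/Z_0^2\right)$; with $Z_0 = \rho_0 c$ the physical constants cancel. The function name is hypothetical.

```python
import math


def diffuseness(w, x, y, z):
    """Estimate psi = 1 - ||<I>|| / (c <E>) from time-domain FOA samples.

    Assumes p = w and u = (x, y, z) / (sqrt(2) * Z0), and the energy density
    E = (rho0 / 2) * (|u|^2 + p^2 / Z0^2) (an assumption of this sketch).
    Z0, rho0 and c then cancel, leaving
    psi = 1 - sqrt(2) * ||<w * v>|| / <w^2 + |v|^2 / 2>, with v = (x, y, z).
    """
    n = len(w)
    if n == 0 or not (len(x) == len(y) == len(z) == n):
        raise ValueError("w, x, y, z must be non-empty and of equal length")
    # Time-averaged intensity components, up to the common factor 1/(sqrt(2) Z0).
    ix = sum(a * b for a, b in zip(w, x)) / n
    iy = sum(a * b for a, b in zip(w, y)) / n
    iz = sum(a * b for a, b in zip(w, z)) / n
    # Time-averaged energy term <w^2 + (x^2 + y^2 + z^2) / 2>.
    energy = sum(
        a * a + 0.5 * (b * b + c * c + d * d) for a, b, c, d in zip(w, x, y, z)
    ) / n
    if energy == 0.0:
        raise ValueError("silent input: diffuseness is undefined")
    return 1.0 - math.sqrt(2.0) * math.sqrt(ix * ix + iy * iy + iz * iz) / energy
```

For a plane wave encoded with the traditional B-format convention (W carrying a gain of $1/\sqrt{2}$, which the factor in Equation (10) suggests but which is assumed here), the estimate approaches zero; with no velocity component at all it equals one.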


The output of the analysis is then subject to smoothing based on the Equivalent Rectangular Bandwidth (ERB) scale, such that the resolution of the human auditory system is approximated. Since the frequency-dependent direction of arrival of the non-diffuse portion of the sound field can be determined, HOA reproduction can be achieved by re-encoding point-like sources, corresponding to the direction indicated in each temporal average and frequency band, into a higher order spherical harmonic representation. The resultant Ambisonic signals are then weighted in each frequency band $k$ according to $1 - \psi_k$. However, it is only vital to re-encode non-diffuse components to higher order, and the diffuse field can be obtained by multiplying the FOA signals by $\sqrt{\psi_k}$ and forming a first order decode. This is justified since source localisation is dependent on the direction of arrival of the direct sound and early reflections, and not on late room reverberation [24]. Thus, from the perceptual point of view, it is questionable whether there is a need to preserve the full directional accuracy of the reverberant field. Furthermore, if there exists a general directional distribution to the diffuse field, this will still be preserved in first order form. On the other hand, the diffuse component should not be simply derived from the 0th order signal. One can easily see that such a solution would provide perfectly correlated versions of the diffuse field to the left and right ear signals, which have no equivalent in a real physical sound field. Moreover, interaural decorrelation is an important factor in providing spatial impression in enclosed environments [25].

Figure 4 shows an example of the first 20 ms of a 1st order impulse response taken in a reverberant hall [26]. Here, the source was located 3 m from a Soundfield ST350 microphone and the Spatial Room Impulse Response (SRIR) was captured using the exponentially swept-sine tone technique [27]. In these plots, particular attention is drawn to the direct sound (coming from directly in front of the microphone) and a left wall reflection at approximately 14 ms. It can be seen that the directional resolution increases significantly with HOA representation. It should be noted that the A-format capsules on sound field microphones only display adequate directionality up to 10 kHz [28]. Spatial aliasing is therefore an issue at high frequencies and, as a result, the directional information above 10 kHz cannot be relied upon.

6. Method: Localization of Distance of Test Sounds

Different protocols have been used in the literature for the subjective assessment of distance perception, most notably verbal report [29, 30], direct or indirect blind walking [31, 32] or imagined timed walking [32]. All of these methods have proved to provide reliable and comparable results for both auditory and visual stimuli, with direct blind walking exhibiting the least between-subject variability [31, 32].

In former work [26], the authors of this paper developed a method where subjects indicated the perceived distance of real and virtual sound sources by selecting one of several physical loudspeakers lined up (and slightly offset in order to provide 'acoustic transparency') in front of their eyes. However, for the present study, in order to completely eliminate any possible anchors as well as visual cues, it was decided to utilize the method of direct blind walking. One of the main concerns in the experiment was a direct comparison of distance perception of real sound sources versus virtual sound sources presented over headphones. Due to different apparatus requirements, the experiment had to be conducted in two separate phases.

Figure 4. Ambisonic sound field from 1st order measurement with a Soundfield ST350: (a) 1st order representation, (b) 3rd order up-mix. Both panels mark the direct sound and the left wall reflection.

6.1. Participants

Seven participants, aged 24–58, took part in the experiment. All subjects had good hearing and were either music technology students or practitioners actively involved in audio research or production. Prior to the test, HRIR data for all the participants were obtained in a large (18 × 15 × 10 m³) sound-proof, but quite damped (T60 at 1000 Hz = 0.57 s), multipurpose room (Black Box) in the Department of Theatre, Film and Television at the University of York. Additional damping was assured by thick, heavy curtains covering all four walls and a carpet on the floor. The measurement process consisted of a standard procedure where miniature omnidirectional microphones (Knowles FG-23629-P16) were placed at the entrance of a blocked ear canal in order to capture the acoustic pressure generated by one loudspeaker at a time, located at constant distance and varying angular direction.

Figure 5. Measuring Head Related Impulse Responses with miniature microphones.

Subjects were seated on an elevated platform so that their ears were 2.20 m above the ground and their head was in the centre of a spherical loudspeaker array arranged in diametrically opposed pairs. The ear height was calibrated using a laser guide, as shown in Figure 5. The array consisted of 16 full-range Genelec 8050A loudspeakers, since the intention was to reproduce Ambisonic sound fields up to and including 3rd order. This 3-D setup, shown in Figure 6, comprised a flat-front horizontal octagon and a cube (four loudspeakers on top and four on the bottom). The radius of the loudspeaker array (and thus the virtual loudspeaker array) was 3.27 m. For FOA-to-binaural decode, only virtual loudspeakers from the cube configuration were utilized, since no directional resolution is gained by using a higher number of loudspeakers. Furthermore, despite careful alignment, oversampling of the sound field with higher numbers of speakers has the potential to yield sound field distortions [33]. Note that for 2nd and 3rd order reproduction all 16 loudspeakers were used. Although the oversampled configuration was not optimal from the 2nd order reproduction point of view, it was not possible to easily and accurately rearrange the loudspeaker array in order to accommodate a different layout.

Figure 6. Array of 16 loudspeakers used for HRIR measurements.

HRIRs were captured using the exponentially swept-sine tone technique [27] at 44.1 kHz sampling rate and 16-bit resolution. Since the measurement environment was not fully anechoic, further processing of the measured data was necessary. The HRIRs were tapered before the arrival of the first reflection (from the floor), yielding filter kernels with 257 taps, and were subsequently diffuse-field equalized.
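One plausible way to taper a measured response before the first reflection is to truncate it and apply a short fade-out, as in the Python sketch below. The 257-tap length and the 44.1 kHz rate come from the text; the half-Hann fade shape, its 32-sample length and the function name are assumptions made for illustration and are not taken from the paper.

```python
import math


def taper_hrir(hrir, n_taps=257, fade_len=32):
    """Truncate an impulse response to n_taps with a half-Hann fade-out.

    n_taps matches the 257-tap kernels in the text; fade_len is a hypothetical
    choice for this sketch.
    """
    if not 0 < fade_len <= n_taps <= len(hrir):
        raise ValueError("need 0 < fade_len <= n_taps <= len(hrir)")
    out = list(hrir[:n_taps])
    start = n_taps - fade_len
    for k in range(fade_len):
        # Falling half of a Hann window: close to 1 at the fade start,
        # exactly 0 at the last tap.
        gain = 0.5 * (1.0 + math.cos(math.pi * (k + 1) / fade_len))
        out[start + k] *= gain
    return out


# Duration of a 257-tap kernel at the 44.1 kHz sampling rate, in milliseconds.
kernel_ms = 1000.0 * 257 / 44100
```

At 44.1 kHz a 257-tap kernel spans roughly 5.8 ms, which gives a sense of how early the floor reflection had to be excluded.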

6.2. Stimuli

The stimuli used in the experiment were pink noise bursts and phonetically balanced phrases selected from the TIMIT Acoustic-Phonetic Continuous Speech Corpus database and recorded by a female reader [34]. A sampling rate of 44.1 kHz and 16-bit resolution were used in both cases. These two sample types were selected in order to represent both unfamiliar and familiar sound sources. They were presented to the subjects in a pseudo-randomized manner to avoid any ordering effects.

For headphone reproduction, prior to the test phase, FOA impulse response measurements were taken at the listener position for each loudspeaker position using the exponentially swept-sine tone technique [27]. From these measurements, 2nd and 3rd order impulse response sets were extracted using the directional analysis approach outlined in section 5. 0th order Ambisonics does not provide any directional information, which means that it would lack the cues that are investigated in the higher order renderings. Therefore, it was decided not to include it in this comparison.

The only psychoacoustical optimization applied to the Ambisonic decodes was shelf filtering, intended to satisfy Gerzon's localization criteria for maximized velocity decode at low frequencies and energy decode at higher frequencies [35]. This involved changing the ratio of the pressure to velocity components at low and high frequencies. Whilst the crossover frequency for the high frequency boost in the pressure channel at first order is normally in the region of 400 Hz for regular loudspeaker listening, here we restore the crossover point to 700 Hz, since the subject is always perfectly centred in the virtual loudspeaker array.

6.3. Test Environment and Apparatus

A series of subjective listening tests was conducted in the Large Rehearsal Room in the Department of Theatre, Film and Television at the University of York. The room dimensions were 12 × 9 × 3.5 m³ and the spatially averaged T60 at 1 kHz was 0.26 s. A low T60 was desired for this study, so the walls were covered with thick, heavy curtains, as shown in Figure 7. Since the up-mix from 1st to 2nd and 3rd order Ambisonics concerned only the deterministic part of the measured SRIRs, it was assumed that no advantage would be gained from using a more reverberant space.

A professional camera dolly track was set up roughly along the diagonal of the room. This not only allowed for testing distances of the real loudspeaker up to 8 m, but its non-symmetrical position also ensured that early reflections of the same order from different surfaces did not easily coincide at the subject's ears, but instead arrived at different times. A single full-range loudspeaker (Genelec 8050A) was mounted on a camera dolly, which enabled it to be noiselessly translated by the experiment assistant to different locations. A guide rope was hung along the dolly track to help guide the participants when walking toward the sound source. Since it was not possible to walk exactly on the dolly track, it was decided that the walking path would be directly next to it, as shown in Figure 7. The only weakness of this solution was that the sound source horizontal angle varied from 14.04 degrees at the closest distance (2 m) to 3.58 degrees at the furthest distance (8 m). However, this did not have any effect on the distance judgments, for two reasons. Firstly, the subjects were allowed (and even encouraged) to rotate their head in order to fully utilize the available ITD and ILD cues. Secondly, the initial head orientation was not in any way fixed. This, combined with the fact that there were no clear cues to the subject's initial orientation in the room at the origin, made this small initial angular offset unimportant. Furthermore, none of the participants reported any bias in their assessment based on the horizontal offset of the sound source.
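The two quoted angles are consistent with a fixed lateral offset between the walking path and the dolly track. The Python sketch below assumes an offset of 0.5 m, a value inferred here from the 14.04 degree figure rather than stated in the text, and evaluates the resulting angle at each test distance.

```python
import math


def horizontal_angle_deg(lateral_offset_m, distance_m):
    """Angle between the walking path and the source, in degrees."""
    return math.degrees(math.atan2(lateral_offset_m, distance_m))


# Hypothetical lateral offset between walking path and dolly track, inferred
# from the stated 14.04 degrees at 2 m (not given explicitly in the text).
OFFSET_M = 0.5

angles = {d: round(horizontal_angle_deg(OFFSET_M, d), 2) for d in (2, 4, 6, 8)}
```

Under this assumption the angle falls from 14.04 degrees at 2 m to 3.58 degrees at 8 m, matching the values given above.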

For trials with binaural presentation, high quality open-back headphones (AKG-K601) were used, which exhibit low levels of interaural magnitude and group delay distortion. Sound field rotation, tilt and tumble control was implemented via the TrackIR 5 infra-red head-tracking system [36], resulting in stable virtual images under head rotation. The system responsible for playback of virtualized sound sources was built entirely in the Pure Data visual programming environment [37], and its combined latency (including head-tracker data porting and audio update rate) was 20 ms.

Figure 7. Participant performing a trial during the experiment.

6.4. Procedure

In the experiment, subjects entered the test environment blindfolded and without any prior expectation regarding the room dimensions, its acoustic properties or the test apparatus. They were guided by the experimenter to the reference point (the 'origin'). After a short explanation of the experiment objectives, a training session began with a short (3–5 min) walking-only trial until participants felt comfortable with walking blindfolded and using a guide rope. Next, they performed 4–6 training trials in which the same test stimuli to be used in the experiment (speech and pink noise) were played by the loudspeaker at randomly chosen distances. No feedback was given and no results were recorded after each trial. The end of the training session was clearly announced, and after a 1 minute interval the first phase of the test began.

In test phase I, participants were asked to listen to static sound sources at randomly chosen points, focusing on the perceived distance. They could listen to any audio sample as many times as they wished. During the playback, they were instructed to stay still and refrain from any translational head movements. However, they were encouraged to rotate their head freely. After the playback had stopped, they were asked to walk, guided by the rope, to the point where they thought the sound originated from. The distance walked was subsequently recorded by the assistant using a laser measuring tool, after which the participant walked backwards to the origin. In the meantime, the loudspeaker was noiselessly translated to its new position and the test proceeded. As in the training session, no feedback was given at any stage.

During the first test phase, participants had to indicate the perceived distance for sound sources randomly located at 2 m, 4 m, 6 m or 8 m. Taking into account that both speech and pink noise burst samples were used (in a pseudo-random order), the number of trials in the first phase added up to 8. Each subject performed all the trials only once.

Upon completion of the first phase of the test, there was a short (approximately 2 minute) interval that was required in order to put on the headphones and calibrate the head-tracking system. In phase II, subjects were also asked to identify the sound source distance, but this time using Ambisonic sound fields presented over headphones. Other than the fact that headphones and the head-tracking system were used, the test protocol remained the same as in phase I. However, because there were three playback configurations to be tested (1st, 2nd and 3rd order Ambisonics), participants had to perform 24 trials instead of 8. Instead of separate phases for each Ambisonic order, all samples were randomly presented to the subject within the same test phase. Again, subjects performed all the trials only once and no feedback was given at any stage.
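The trial counts follow directly from the factorial structure of the two phases. A minimal Python sketch of how such pseudo-randomized trial lists could be generated is given below; the labels, seeds and function name are hypothetical, and the actual randomization procedure used in the experiment is not described in the text.

```python
import itertools
import random

DISTANCES_M = (2, 4, 6, 8)
STIMULI = ("speech", "pink_noise")
ORDERS = ("FOA", "SOA", "TOA")  # 1st, 2nd and 3rd order Ambisonics


def trial_list(factors, seed=None):
    """Return every combination of the given factors once, in shuffled order."""
    trials = list(itertools.product(*factors))
    random.Random(seed).shuffle(trials)
    return trials


phase_1 = trial_list((DISTANCES_M, STIMULI), seed=1)          # real loudspeaker
phase_2 = trial_list((ORDERS, DISTANCES_M, STIMULI), seed=2)  # headphones
```

Phase I then contains 4 × 2 = 8 trials and phase II contains 3 × 4 × 2 = 24 trials, each combination appearing exactly once.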

7. Results

The perceived sound source distance (indicated by the distance walked) was collected from 7 subjects for 4 presentation points (2 m, 4 m, 6 m and 8 m), two stimuli (female speech and pink noise bursts) and four playback options: 1st, 2nd and 3rd order Ambisonics and real loudspeakers, which for analysis we denote FOA, SOA, TOA and REAL, respectively. In headphone trials, none of the participants reported in-head localization; however, there were 3 cases where the proximity of the sound source was so apparent that participants decided not to move at all. In some cases, the virtual sound source was initially localized behind the subjects, but all participants were able to resolve the confusion by applying head rotation.

We computed the mean values of walked distances $\mu$ for each test condition, along with the corresponding standard errors $se(\mu)$. The results are presented separately for each stimulus type with 95% confidence intervals.
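For a single test condition, the mean, standard error and a 95% confidence interval can be computed as in the Python sketch below. The walked distances are hypothetical, and the use of a Student-t interval with 6 degrees of freedom (7 subjects) is an assumption, since the text does not say how the intervals were constructed.

```python
import math
import statistics

# Two-sided 95% Student-t critical value for 6 degrees of freedom (7 subjects).
T_CRIT_DF6 = 2.447


def mean_se_ci(samples, t_crit=T_CRIT_DF6):
    """Return (mean, standard error, (ci_low, ci_high)) for one condition."""
    n = len(samples)
    mean = statistics.fmean(samples)
    se = statistics.stdev(samples) / math.sqrt(n)  # sample std. dev. / sqrt(n)
    half_width = t_crit * se
    return mean, se, (mean - half_width, mean + half_width)


# Hypothetical walked distances [m] of 7 subjects for one condition.
walked = [3.1, 3.6, 4.2, 3.9, 4.4, 3.3, 3.8]
mean, se, (ci_low, ci_high) = mean_se_ci(walked)
```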

As expected, the perception of distance for the real sources was more accurate for near sources. Beyond 4 m, distance was consistently underestimated, which is congruent with the previous studies outlined in section 2. Furthermore, the standard deviation of localization increases as the source moves further into the diffuse field. We also see that unfamiliar stimuli produce greater variability in subjects' answers. The mean localization of the virtual sources follows the reference source localization well. The answers for virtual sources deviate from their means in roughly the same fashion as the answers for reference sources, as localization becomes more difficult within the diffuse field.

Figure 8. Mean localization of real and virtual sound sources (female speech). Axes: real distance [m] against distance walked [m]; series: FOA, SOA, TOA, Real.

Figure 9. Mean localization of real and virtual sound sources (pink noise bursts). Axes: real distance [m] against distance walked [m]; series: FOA, SOA, TOA, Real.

Since the study followed a within-subject factorial design with 2 (stimuli) × 4 (playback conditions), a two-way ANOVA was performed for each presentation distance in order to investigate the effects of these two factors (referred to later as factors A and B) as well as potential interaction effects. The null hypothesis being tested here is that the mean perceived distances for all the stimuli and playback methods do not differ significantly:

$$H_0:\ \mu_{FOA} = \mu_{SOA} = \mu_{TOA} = \mu_{Real} = \mu$$

$$H_1:\ \text{not all localization means } \mu_i \text{ are the same}$$

No statistically significant effect of stimuli (familiar vs. unfamiliar) on the perception of distance was found ($F_{2m}(3, 48) = 0.835$, $p = 0.365$; $F_{4m}(3, 48) = 2.0462$, $p = 0.159$; $F_{6m}(3, 48) = 2.575$, $p = 0.115$; $F_{8m}(3, 48) = 2.0462$, $p = 0.159$). For distances of 4 m and more, the playback option also had no statistically significant effect ($F_{4m}(3, 48) = 2.192$, $p = 0.101$; $F_{6m}(3, 48) = 0.665$, $p = 0.577$; $F_{8m}(3, 48) = 0.202$, $p = 0.894$).

However, a statistically significant difference was detected for the distance of 2 m. In larger study designs with multiple levels, it is advisable to use the Honestly Significant Difference (HSD) approach, since there is an increased risk of spuriously significant differences arising purely by chance. So, in order to investigate further where the difference occurs, an HSD was computed (HSD = 1.423 m). If we now compile the table of mean perceived distances for the sound sources located at 2 m (Table I), we can see that all of the values clearly lie within a single HSD of each other and cannot be distinguished. We can therefore assume an ANOVA false alarm (type I error) and no statistically significant effect of playback method for the sources at the distance of 2 m as well.

Table I. Mean localization [m] of virtual and real sound sources at 2 m.

          μ_FOA    μ_SOA    μ_TOA    μ_Real
Speech    1.119    1.389    0.841    1.638
Noise     0.877    1.001    0.902    1.641

Table II. Correlation coefficients ρ and corresponding p-values for pairs of distance estimations for real and virtual sound sources (Speech).

                ρ         p-value
Real vs. FOA    0.9828    0.0172
Real vs. SOA    0.9960    0.0040
Real vs. TOA    0.9590    0.0410

Table III. Correlation coefficients ρ and corresponding p-values for pairs of distance estimations for real and virtual sound sources (Noise).

                ρ         p-value
Real vs. FOA    0.9913    0.0087
Real vs. SOA    0.9857    0.0143
Real vs. TOA    0.9972    0.0028
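The HSD comparison for the 2 m condition can be checked directly from the reported numbers. The Python sketch below takes the mean values of Table I and the stated HSD of 1.423 m, and lists any pair of playback conditions whose means differ by more than the HSD; the variable and function names are chosen only for this example.

```python
from itertools import combinations

HSD_M = 1.423  # Honestly Significant Difference reported in the text [m]

# Mean perceived distances [m] at 2 m, copied from Table I.
MEANS_AT_2M = {
    "speech": {"FOA": 1.119, "SOA": 1.389, "TOA": 0.841, "Real": 1.638},
    "noise": {"FOA": 0.877, "SOA": 1.001, "TOA": 0.902, "Real": 1.641},
}


def pairs_exceeding(means, hsd):
    """Return the condition pairs whose mean difference exceeds the HSD."""
    return [
        (a, b)
        for a, b in combinations(sorted(means), 2)
        if abs(means[a] - means[b]) > hsd
    ]


largest_gap = {
    stimulus: max(means.values()) - min(means.values())
    for stimulus, means in MEANS_AT_2M.items()
}
distinguishable = {s: pairs_exceeding(m, HSD_M) for s, m in MEANS_AT_2M.items()}
```

The largest gaps are about 0.80 m for speech and 0.76 m for noise, both well below the HSD, so no pair of conditions is flagged.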

Lastly, for all the distances, no synergetic effects of factors A (stimuli) and B (playback conditions) were detected.

Additionally, we calculated correlation coefficients $\rho$ for pairs of distance estimations for real and virtual sound sources (either 1st, 2nd or 3rd order) and the two stimuli (Tables II and III). In all cases, high correlation coefficients were obtained, which supports our finding that, for these particular test conditions, the perception of distance of binaurally rendered Ambisonic sound fields of orders 1 to 3 cannot be distinguished from the perception of distance of the real sound sources.
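A Pearson correlation of this kind can be sketched in Python as follows. Every p-value in Tables II and III equals $1 - \rho$, which is what the usual significance test for a Pearson coefficient gives when exactly four pairs are correlated; this suggests, although the text does not state it, that the coefficients were computed over the four per-distance means. The per-distance values below are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def p_value_n4(r):
    """Two-sided p-value of r for exactly four pairs.

    Under the null hypothesis r is uniform on [-1, 1] when n = 4, so the usual
    t-test p-value reduces to 1 - |r|.
    """
    return 1.0 - abs(r)


# Hypothetical per-distance mean walked distances [m] at 2, 4, 6 and 8 m.
real_means = [1.6, 3.5, 4.6, 5.4]
foa_means = [1.1, 3.2, 4.5, 5.0]
rho = pearson_r(real_means, foa_means)
p = p_value_n4(rho)
```

Applied to the tabulated coefficients, `p_value_n4(0.9828)` reproduces the reported 0.0172.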

8. Discussion

The results presented for real sources corroborate the classic underestimation of source distance reported in the literature. These results were used as a basis against which to measure the ability of Ambisonic sound fields of different orders to present sources at different distances. It was expected that a further underestimation of the source distance would ensue with the binaural rendering, as reported in [17]. However, this was not the case, even for first order presentations, and the apparent distances of the virtual sources matched the real source distances well. One should note that the major difference between this study and that of [17] is our use of head-tracking, indicating the importance of head movements in perceiving source distance, which develops the findings of Waller [18] and Ashmead et al. [10] on user interaction in a virtual space. Further work is required to quantify this effect.

Moreover the presented study demonstrates that the en-hanced directional accuracy gained by presenting soundsources in HOA through head-tracked binaural renderingdoes not yield a significant improvement in the perceptionof the source distance What is noteworthy is that for eachorder there is no significant difference in the perception ofthe source location when compared to real-world sourcesWe therefore conclude that sound field directionality fordistance perception is sufficient with 1st order playback

The presence of the ANOVA false alarm at the 2 m point is of interest. It is noteworthy that the 2 m point represents a source inside the virtual array geometry. It is a known issue that virtual sound sources rendered inside the array of loudspeakers cannot be reproduced in a straightforward way without artifacts. Some of these artifacts include incorrect wavefront curvature and insufficient bass boost.

In the first case, there is ample evidence in the literature to suggest that the wavefront curvature translates to significant binaural cues for sound sources near the head [30, 38]. It was already shown in section 2.1 that, as a source moves closer to the head, the levels of the monaural transfer function and the ILD both change significantly with source angle. However, this effect is not strong at 1 m and beyond. For sources further away, it has been shown in [39] that it is very difficult to assess distance by binaural cues alone.

In the second case, the requirement for distance compensation filtering due to near field effects for the large loudspeaker radius (3.27 m) and the given source distances (>2 m) is only prominent below 100 Hz. For the female speech test stimuli, this will not have an effect, since the first formant frequencies do not go below 180 Hz. Also, the current method employed for capturing HRIRs allowed for reliably obtaining filters with a frequency response reaching down to around 170 Hz, thereby also band-limiting the delivery of the pink noise stimuli.

Finally, there was no significant difference in the results presented for different sources, although the greater variance in the results for pink noise suggests that the familiarity of the source does indeed play a role in the perception of source distance, as mentioned in section 2.3. Future studies will investigate the use of these monaural cues further and will utilize 0th order sound field rendering, since it will remove the influence of any directional information.

Considering the aforementioned study of Bronkhorst et al. [14], where the accuracy of distance perception for binaural playback increases with the number of reflections, our findings demonstrate that the net effect of the monaural cues of direct to reverberant ratio, level difference and time of arrival of early reflections is of greater importance in distance perception for binaural rendering than Ambisonic directional accuracy beyond 1st order.

9. Conclusions

We have assessed, through subjective analysis, the perceived source distance in virtual Ambisonic sound fields in comparison to real world sources. The hypothesis tested was that enhanced directional accuracy of the deterministic part of the sound field may lead to better reconstruction of environmental depth and thus improve the perception of sound source distance. However, it was shown that Ambisonic reproduction matches the perceived real world source distances well even at 1st order, and no improvement in this regard was observed when increasing the order. It must be emphasized, though, that this analysis applies to Ambisonic-to-binaural decodes with higher order synthesis achieved using the directional analysis method of [23]. Therefore, further work will examine this topic for loudspeaker reproduction, for both centre and off-centre listening, as well as investigate the effectiveness of HOA synthesis in comparison to real world HOA measurements.

Acknowledgments

The authors gratefully acknowledge the participation of the test subjects, for both their time and constructive comments, as well as the technical support staff at the Department of Theatre, Film and Television at the University of York for their assistance in the experimental setups. This research is supported by Science Foundation Ireland.

References

[1] L. Fauster: Stereoscopic techniques in computer graphics. Technical paper, TU Wien, 2007.

[2] J. Lee: Head tracking for desktop VR displays using the Wii remote. http://johnnylee.net/projects/wii/, accessed 30th Sept. 2011.

[3] D. R. Begault: Direct comparison of the impact of head tracking, reverberation, and individualized head-related transfer functions on the spatial perception of a virtual sound source. J. Audio Eng. Soc. 49 (2001) 904–916.

[4] M. Otani, T. Hirahara: Auditory artifacts due to switching head-related transfer functions of a dynamic virtual auditory display. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. E91-A (2008) 1320–1328.

[5] V. Pulkki: Virtual sound source positioning using Vector Base Amplitude Panning. J. Audio Eng. Soc. 45 (1997) 456–466.

[6] A. J. Berkhout: A Holographic Approach to Acoustic Control. J. Audio Eng. Soc. 36 (1988) 977–995.

[7] M. A. Gerzon: Periphony: With-height sound reproduction. J. Audio Eng. Soc. 21 (1973) 2–10.

[8] F. Rumsey: Spatial quality evaluation for reproduced sound: Terminology, meaning, and a scene-based paradigm. J. Audio Eng. Soc. 50 (2002) 651–666.

[9] J. Blauert: Communication acoustics. Springer, 2008.

[10] D. H. Ashmead, D. L. Davis, A. Northington: Contribution of listeners' approaching motion to auditory distance perception. J. Exp. Psy.: Hum. Percep. and Perform. 21 (1995) 239–256.

[11] E. Czerwinski, A. Voishvillo, S. Alexandrov, A. Terekhov: Propagation distortion in sound systems: Can we avoid it? J. Audio Eng. Soc. 48 (2000) 30–48.

[12] S. H. Nielsen: Auditory distance perception in different rooms. J. Audio Eng. Soc. 41 (1993) 755–770.

[13] M. B. Gardner: Distance estimation of 0° or apparent 0°-oriented speech signals in anechoic space. J. Acoust. Soc. Am. 45 (1969) 47–53.

[14] A. W. Bronkhorst, T. Houtgast: Auditory distance perception in rooms. Nature 397 (1999) 517–520.

[15] J. B. Allen, D. A. Berkley: Image method for efficiently simulating small-room acoustics. J. Acoust. Soc. Am. 65 (1979) 943–950.

[16] M. Rychtarikova, T. V. d. Bogaert, G. Vermeir, J. Wouters: Binaural sound source localization in real and virtual rooms. J. Audio Eng. Soc. 57 (2009) 205–220.

[17] J. S. Chan, C. Maguinness, D. Lisiecka, C. Ennis, M. Larkin, C. O'Sullivan, F. Newell: Comparing audiovisual distance perception in various real and virtual environments. Proc. of the 32nd Euro. Conf. on Vis. Percep., Regensburg, Germany, 2009.

[18] D. Waller: Factors affecting the perception of interobject distances in virtual environments. Presence: Teleoper. Virtual Environ. 8 (1999) 657–670.

[19] A. McKeag, D. McGrath: Sound field format to binaural decoder with head-tracking. Proc. of the 6th Australian Regional Convention of the AES, 1996.

[20] M. Noisternig, A. Sontacchi, T. Musil, R. Höldrich: A 3D Ambisonic based binaural sound reproduction system. Proc. of the 24th Int. Conf. of the Audio Eng. Soc., Alberta, Canada, 2003.

[21] B.-I. Dalenbäck, M. Strömberg: Real time walkthrough auralization - the first year. Proc. of the Inst. of Acous., Copenhagen, Denmark, 2006.

[22] C. Masterson, S. Adams, G. Kearney, F. Boland: A method for head related impulse response simplification. Proc. of the 17th European Signal Processing Conference (EUSIPCO), Glasgow, Scotland, 2009.

[23] J. Merimaa, V. Pulkki: Spatial impulse response rendering I: Analysis and synthesis. J. Audio Eng. Soc. 53 (2005).

[24] W. M. Hartmann: Localization of sound in rooms. J. Acoust. Soc. Am. 74 (1983) 1380–1391.

[25] D. Griesinger: Spatial impression and envelopment in small rooms. Proc. of the 103rd Conv. of the Audio Eng. Soc., New York, USA, 1997.

[26] G. Kearney, M. Gorzel, H. Rice, F. Boland: Depth perception in interactive virtual acoustic environments using higher order ambisonic soundfields. Proc. of the 2nd Int. Ambisonics Symp., Paris, France, 2010.

[27] A. Farina: Simultaneous measurement of impulse response and distortion with a swept-sine technique. Proc. of the 108th Conv. of the Audio Eng. Soc., Paris, France, 2000.

[28] M. Gerzon: The design of precisely coincident microphone arrays for stereo and surround sound. Proc. of the 50th Conv. of the Audio Eng. Soc., London, UK, 1975.


[29] C. Guastavino, B. F. G. Katz: Perceptual evaluation of multi-dimensional spatial audio reproduction. J. Acoust. Soc. Am. 116 (2004) 1105–1115.

[30] P. Zahorik: Assessing auditory distance perception using virtual acoustics. J. Acoust. Soc. Am. 111 (2002) 1832–1846.

[31] J. M. Loomis, R. L. Klatzky, J. W. Philbeck, R. G. Golledge: Assessing auditory distance perception using perceptually directed action. Perception and Psychophysics 60 (1998) 966–980.

[32] T. Y. Grechkin, T. D. Nguyen, J. M. Plumert, J. F. Cremer, J. K. Kearney: How does presentation method and measurement protocol affect distance estimation in real and virtual environments? ACM Trans. Appl. Percept. 7 (2010) 26:1–26:18.

[33] S. Bertet: Formats audio 3D hiérarchiques: Caractérisation objective et perceptive des systèmes ambisonics d'ordres supérieurs. PhD dissertation, INSA Lyon, 2008.

[34] W. M. Fisher, G. R. Doddington, K. M. Goudie-Marshall: The DARPA speech recognition research database: Specifications and status. Proc. of the DARPA Workshop on Speech Recognition, 1986.

[35] M. A. Gerzon, G. J. Barton: Ambisonic decoders for HDTV. Proc. of the 92nd Conv. of the Audio Eng. Soc., Vienna, Austria, 1992.

[36] NaturalPoint: TrackIR 5. http://www.naturalpoint.com/trackir/, accessed 30th Sept. 2011.

[37] M. Puckette: Pure Data. http://puredata.info/, accessed 30th Sept. 2011.

[38] P. Zahorik, D. S. Brungart, A. W. Bronkhorst: Auditory distance perception in humans: A summary of past and present research. Acta Acustica united with Acustica 91 (2005) 409–420.

[39] H. Wittek: Perceptual differences between Wavefield Synthesis and Stereophony. Department of Music and Sound Recording, School of Arts, Communication and Humanities, University of Surrey, UK, 2007.


The output of the analysis is then subject to smoothing based on the Equivalent Rectangular Bandwidth (ERB) scale, such that the resolution of the human auditory system is approximated. Since the frequency dependent direction of arrival of the non-diffuse portion of the sound field can be determined, HOA reproduction can be achieved by re-encoding point-like sources, corresponding to the direction indicated in each temporal average and frequency band, into a higher order spherical harmonic representation. The resultant Ambisonic signals are then weighted in each frequency band $k$ according to $1 - \psi_k$. However, it is only vital to re-encode non-diffuse components to higher order, and the diffuse field can be obtained by multiplying the FOA signals by $\sqrt{\psi_k}$ and forming a first order decode.

This is justified since source localisation is dependent on the direction of arrival of the direct sound and early reflections, and not on late room reverberation [24]. Thus, from the perceptual point of view, it is questionable whether there is a need to preserve the full directional accuracy of the reverberant field. Furthermore, if there exists a general directional distribution to the diffuse field, this will still be preserved in first order form. On the other hand, the diffuse component should not be simply derived from the 0th order signal. One can easily see that such a solution would provide perfectly correlated versions of the diffuse field to the left and right ear signals, which have no equivalent in the physical world (i.e. a real physical sound field). Moreover, interaural decorrelation is an important factor in providing spatial impression in enclosed environments [25].

Figure 4 shows an example of the first 20 ms of a 1st order impulse response taken in a reverberant hall [26]. Here, the source was located 3 m from a Soundfield ST350 microphone and the Spatial Room Impulse Response (SRIR) was captured using the exponentially swept-sine tone technique [27]. In these plots, particular attention is drawn to the direct sound (coming from directly in front of the microphone) and a left wall reflection at approximately 14 ms. It can be seen that the directional resolution increases significantly with HOA representation. It should be noted that the A-format capsules on sound field microphones only display adequate directionality up to 10 kHz [28]. Spatial aliasing is therefore an issue for high frequencies and, as a result, the directional information above 10 kHz cannot be relied upon.

6. Method: Localization of Distance of Test Sounds

Different protocols have been used in the literature for subjective assessment of distance perception, most notably a verbal report [29, 30], direct or indirect blind walking [31, 32], or imagined timed walking [32]. All of these methods have been shown to provide reliable and comparable results for both auditory and visual stimuli, with direct blind walking exhibiting the least between-subject variability [31, 32].

In former work [26], the authors of this paper developed a method where subjects indicated the perceived distance of real and virtual sound sources by selecting one of several physical loudspeakers lined up (and slightly offset in order to provide 'acoustic transparency') in front of their eyes. However, for the present study, in order to completely eliminate any possible anchors as well as visual cues, it was decided to utilize the method of direct blind walking. One of the main concerns in the experiment was a direct comparison of distance perception of real sound sources versus virtual sound sources presented over headphones. Due to different apparatus requirements, the experiment had to be conducted in two separate phases.

Figure 4. Ambisonic sound field from 1st order measurement with a Soundfield ST350. (a) 1st order representation, (b) 3rd order up-mix. Both panels label the direct sound and the left wall reflection.

6.1. Participants

Seven participants, aged 24–58, took part in the experiment. All subjects were of good hearing and were either music technology students or practitioners actively involved in audio research or production. Prior to the test, HRIR data for all the participants were obtained in a sound-proof, large (18 × 15 × 10 m³) but quite damped (T60 at 1000 Hz = 0.57 s) multipurpose room (Black Box) in the Department of Theatre, Film and Television at the University of York. Additional damping was assured by thick, heavy curtains covering all four walls and a carpet on the floor. The measurement process consisted of a standard procedure, where miniature omnidirectional microphones (Knowles FG-23629-P16) were placed at the entrance of a blocked ear canal in order to capture the acoustic pressure generated by one loudspeaker at a time, located at a constant distance and varying angular direction.

Figure 5. Measuring Head Related Impulse Responses with miniature microphones.

Subjects were seated on an elevated platform so that their ears were 2.20 m above the ground and their head was in the centre of a spherical loudspeaker array arranged in diametrically opposed pairs. The ear height was calibrated using a laser guide, as shown in Figure 5. The array consisted of 16 full range Genelec 8050A loudspeakers, since the intention was to reproduce Ambisonic sound fields up to and including 3rd order. This 3-D setup, shown in Figure 6, comprised a flat-front horizontal octagon and a cube (four loudspeakers on top and four on the bottom). The radius of the loudspeaker array (and thus the virtual loudspeaker array) was 3.27 m. For FOA-to-binaural decode, only virtual loudspeakers from the cube configuration were utilized, since no directional resolution is gained by using a higher number of loudspeakers. Furthermore, despite careful alignment, oversampling of the sound field with higher numbers of speakers has the potential to yield sound field distortions [33]. Note that for 2nd and 3rd order reproduction all 16 loudspeakers were used. Although the oversampled configuration was not optimal from the 2nd order reproduction point of view, it was not possible to easily and accurately rearrange the loudspeaker array in order to accommodate a different layout.

Figure 6. Array of 16 loudspeakers used for HRIR measurements.
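The relation between Ambisonic order and array size can be made concrete. A full 3-D representation of order N has (N + 1)² spherical harmonic components; this is the standard count for full-sphere Ambisonics and is not spelled out in the text. The sketch below compares that count with the number of virtual loudspeakers used for each order as described above. The function and variable names are illustrative only.

```python
def ambisonic_channel_count(order: int) -> int:
    """Number of spherical harmonic components in a full 3-D set up to `order`."""
    if order < 0:
        raise ValueError("Ambisonic order must be non-negative")
    return (order + 1) ** 2


# Virtual loudspeakers used per order, as described in the text:
# the 8-speaker cube for FOA, all 16 speakers for 2nd and 3rd order.
virtual_loudspeakers = {1: 8, 2: 16, 3: 16}

for order, speakers in virtual_loudspeakers.items():
    channels = ambisonic_channel_count(order)
    ratio = speakers / channels
    print(f"order {order}: {channels} channels, {speakers} loudspeakers, "
          f"loudspeakers per channel = {ratio:.2f}")
```

On this count, the 16-speaker array matches the 16 components of 3rd order exactly and exceeds the 9 components of 2nd order, which fits the description of the 2nd order layout as oversampled.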

HRIRs were captured using the exponentially swept-sine tone technique [27] at 44.1 kHz sampling rate and 16-bit resolution. Since the measurement environment was not fully anechoic, further processing of the measured data was necessary. The HRIRs were tapered before the arrival of the first reflection (from the floor), yielding filter kernels with 257 taps, and were subsequently diffuse-field equalized.
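As an illustration of the tapering step, the sketch below truncates an impulse response to 257 taps and fades out the tail. The text gives only the kernel length and the sampling rate; the half-Hann fade shape, the 32-sample fade length, the function name and the synthetic input are all assumptions made for this example.

```python
import math


def taper_hrir(hrir, n_taps=257, fade_len=32):
    """Truncate an impulse response to n_taps and fade out its tail.

    The fade is the falling half of a Hann (raised-cosine) window applied to
    the last `fade_len` samples. Window shape and length are assumptions.
    """
    if not 0 < fade_len <= n_taps:
        raise ValueError("fade_len must be between 1 and n_taps")
    kernel = list(hrir[:n_taps])
    kernel += [0.0] * (n_taps - len(kernel))  # zero-pad inputs shorter than n_taps
    start = n_taps - fade_len
    for i in range(fade_len):
        # Gain falls smoothly from just below 1 to exactly 0 at the final tap.
        gain = 0.5 * (1.0 + math.cos(math.pi * (i + 1) / fade_len))
        kernel[start + i] *= gain
    return kernel


SAMPLE_RATE_HZ = 44100

# Synthetic decaying oscillation standing in for a measured response.
raw_response = [math.exp(-n / 60.0) * math.cos(0.3 * n) for n in range(400)]
kernel = taper_hrir(raw_response)
duration_ms = 1000.0 * len(kernel) / SAMPLE_RATE_HZ
print(f"{len(kernel)} taps, {duration_ms:.2f} ms")
```

At 44.1 kHz a 257-tap kernel spans a little under 6 ms, so everything arriving later than that, including the floor reflection mentioned above, is excluded from the filter.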

6.2. Stimuli

The stimuli used in the experiment were pink noise bursts and phonetically balanced phrases selected from the TIMIT Acoustic-Phonetic Continuous Speech Corpus database and recorded by a female reader [34]. A sampling rate of 44.1 kHz and 16-bit resolution were used in both cases. These two sample types were selected in order to represent both unfamiliar and familiar sound sources. They were presented to the subjects in a pseudo-randomized manner to avoid any ordering effects.

For headphone reproduction, prior to the test phase, FOA impulse response measurements were taken at the listener position for each loudspeaker position using the exponentially swept-sine tone technique [27]. From these measurements, 2nd and 3rd order impulse response sets were extracted using the directional analysis approach outlined in section 5. 0th order Ambisonics does not provide any directional information, which means that it would lack the cues that are investigated in the higher order renderings. Therefore, it was decided not to include it in this comparison.

The only psychoacoustical optimization applied to the Ambisonic decodes was shelf filtering, which was intended to satisfy Gerzon's localization criteria for maximized velocity decode at low frequencies and energy decode at higher frequencies [35]. This involved changing the ratio of the pressure to velocity components at low and high frequencies. Whilst the crossover frequency for the high frequency boost in the pressure channel at first order is normally in the region of 400 Hz for regular loudspeaker listening, here we restore the crossover point to 700 Hz, since the subject is always perfectly centred in the virtual loudspeaker array.

6.3. Test Environment and Apparatus

A series of subjective listening tests was conducted in the Large Rehearsal Room in the Department of Theatre, Film and Television at the University of York. The room dimensions were 12 × 9 × 3.5 m³ and the spatially averaged T60 at 1 kHz was 0.26 s. A low T60 was desired for this study, so the walls were covered with thick, heavy curtains, as shown in Figure 7. Since the up-mix from 1st to 2nd and 3rd order Ambisonics concerned only the deterministic part of the measured SRIRs, it was assumed that no advantage would be gained from using a more reverberant space.

A professional camera dolly track was set up roughly in the direction of the diagonal of the room. It not only allowed for testing distances of the real loudspeaker up to 8 m, but its non-symmetrical position also assured that early reflections of the same order from different surfaces did not easily coincide at the subject's ears, but instead arrived at different times. A single full-range loudspeaker (Genelec 8050A) was mounted on a camera dolly, which enabled it to be noiselessly translated by the experiment assistant to different locations. A guiding rope was hung along the dolly track to help guide the participants when walking toward the sound source. Since it was not possible to walk exactly on the dolly track, it was decided that the walking path would be directly next to it, as shown in Figure 7. The only weakness of this solution was that the sound source horizontal angle varied from 14.04 degrees at the closest distance (2 m) to 3.58 degrees at the furthest distance (8 m). However, this did not have any effect on the distance judgments, for two reasons. Firstly, the subjects were allowed (or even encouraged) to rotate their head in order to fully utilize the available ITD and ILD cues. Secondly, the initial head orientation was not in any way fixed. This, combined with the fact that there were no clear cues to the subject's initial orientation in the room at the origin, made this small initial angular offset unimportant. Furthermore, none of the participants reported any bias in their assessment based on the horizontal offset of the sound source.
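The two quoted angles are consistent with simple geometry. As a sketch, assume the walking path runs parallel to the dolly track at a fixed lateral offset and that distances are measured along the track from the origin. The text does not give the offset; the 0.5 m used below is an inferred value, chosen because it reproduces the reported angles at 2 m and 8 m.

```python
import math

# Assumed lateral offset between walking path and dolly track (not in the text).
LATERAL_OFFSET_M = 0.5


def source_angle_deg(distance_m: float, offset_m: float = LATERAL_OFFSET_M) -> float:
    """Horizontal angle of the source as seen from the origin, in degrees."""
    return math.degrees(math.atan2(offset_m, distance_m))


for distance_m in (2.0, 4.0, 6.0, 8.0):
    angle = source_angle_deg(distance_m)
    print(f"{distance_m:.0f} m: {angle:.2f} degrees")
```

Under the same assumption the intermediate presentation points at 4 m and 6 m fall between the two reported extremes.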

For trials with binaural presentation, high quality open-back headphones (AKG-K601) were used, which exhibit low levels of interaural magnitude and group delay distortion. Sound field rotation, tilt and tumble control was implemented via the TrackIR 5 infra-red head tracking system [36], resulting in stable virtual images with head rotations. The system responsible for playback of virtualized sound sources was built entirely in the Pure Data visual programming environment [37], and its combined latency (including head-tracker data porting and audio update rate) was 20 ms.

Figure 7. Participant performing a trial during the experiment.

6.4. Procedure

In the experiment, subjects entered the test environment blindfolded and without any prior expectation regarding the room dimensions, its acoustic properties or the test apparatus. They were guided by the experimenter to the reference point (the 'origin'). After a short explanation of the experiment objectives, a training session began with a short (3–5 min) walking-only trial until participants felt comfortable with walking blindfolded and using a guide rope. Next, they performed 4–6 training trials in which the same test stimuli to be used in the experiment (speech and pink noise) were played by the loudspeaker at randomly chosen distances. No feedback was given and no results were recorded after each training trial. The end of the training session was clearly announced, and after a 1 minute interval the first phase of the test began.

In test phase I, participants were asked to listen to static sound sources at randomly chosen points, focusing on the perceived distance. They could listen to any audio sample as many times as they wished. During the playback, they were instructed to stay still and refrain from any translational head movements. However, they were encouraged to rotate their head freely. After the playback had stopped, they were asked to walk, guided by the rope, to the point where they thought the sound originated from. The distance walked was subsequently recorded by the assistant using a laser measuring tool, after which the participant walked backwards to the origin. In the meantime, the loudspeaker was noiselessly translated to its new position and the test proceeded. As in the training session, no feedback was given at any stage.

During the first test phase, participants had to indicate the perceived distance for sound sources randomly located at 2 m, 4 m, 6 m or 8 m. Taking into account that both speech and pink noise burst samples were used (in a pseudo-random order), the number of trials in the first phase added up to 8. Each subject performed all the trials only once.

Upon completion of the first phase of the test, there was a short (approximately 2 minutes) interval that was required in order to put on the headphones and calibrate the head-tracking system. In phase II, subjects were also asked to identify the sound source distance, but this time using Ambisonic sound fields presented over headphones. Other than the fact that headphones and the head-tracking system were used, the test protocol remained the same as in phase I. However, because there were three playback configurations to be tested (1st, 2nd and 3rd order Ambisonics), participants had to perform 24 trials instead of 8. Instead of separate phases for each Ambisonic order, all samples were randomly presented to the subject within the same test phase. Again, subjects performed all the trials only once and no feedback was given at any stage.
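The trial counts follow directly from the factorial structure of the two phases. The sketch below builds a shuffled trial list for each phase; the labels, the function name and the use of a seeded shuffle are assumptions for illustration and do not describe the randomization actually used in the experiment.

```python
import itertools
import random

DISTANCES_M = (2, 4, 6, 8)
STIMULI = ("speech", "pink_noise")
AMBISONIC_ORDERS = ("FOA", "SOA", "TOA")


def build_trials(playback_options, seed=None):
    """Return every (playback, stimulus, distance) combination once, shuffled."""
    rng = random.Random(seed)
    trials = list(itertools.product(playback_options, STIMULI, DISTANCES_M))
    rng.shuffle(trials)
    return trials


phase_1 = build_trials(("REAL",), seed=1)      # real loudspeaker
phase_2 = build_trials(AMBISONIC_ORDERS, seed=2)  # all three orders interleaved

print(f"phase I: {len(phase_1)} trials, phase II: {len(phase_2)} trials")
```

Mixing all three orders in a single shuffled list mirrors the description above, in which the orders were not presented in separate blocks.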

7. Results

The perceived sound source distance (indicated by the distance walked) was collected from 7 subjects for 4 presentation points (2 m, 4 m, 6 m and 8 m), two stimuli (female speech and pink noise bursts) and four playback options: 1st, 2nd and 3rd order Ambisonics and real loudspeakers, which for analysis we will denote FOA, SOA, TOA and REAL respectively. With headphone trials, none of the participants reported in-head localization; however, there were 3 cases where the proximity of the sound source was so apparent that participants decided not to move at all. In some cases, the virtual sound source was initially localized behind the subjects, but all participants were able to resolve the confusion by applying head rotation.

We computed the mean values of walked distances $\mu$ for each test condition, along with the corresponding standard errors $se(\mu)$. The results are presented separately for each stimulus type with 95% confidence intervals.
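A minimal sketch of this per-condition summary is given below. It assumes the confidence interval is built from the Student-t distribution with 6 degrees of freedom (seven subjects), which the text does not specify, and it uses made-up walked distances rather than data from the study.

```python
import math
import statistics

# Two-sided 95% Student-t critical value for 6 degrees of freedom (7 subjects).
T_CRIT_95_DF6 = 2.447


def mean_se_ci(walked_m):
    """Return (mean, standard error, 95% confidence interval) for 7 observations."""
    n = len(walked_m)
    if n != 7:
        raise ValueError("the hard-coded critical value is only valid for n = 7")
    mean = statistics.fmean(walked_m)
    se = statistics.stdev(walked_m) / math.sqrt(n)  # sample standard deviation
    half_width = T_CRIT_95_DF6 * se
    return mean, se, (mean - half_width, mean + half_width)


# Hypothetical walked distances in metres for one condition (not study data).
example_walked_m = [1.2, 1.9, 1.5, 2.1, 1.4, 1.8, 1.6]

mean, se, (ci_low, ci_high) = mean_se_ci(example_walked_m)
print(f"mean = {mean:.3f} m, se = {se:.3f} m, 95% CI = [{ci_low:.3f}, {ci_high:.3f}] m")
```

With only seven subjects the t-based interval is noticeably wider than a normal-approximation interval would be, which is why the choice of critical value is flagged here as an assumption.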

As expected, the perception of distance for the real sources was more accurate for near sources. Beyond 4 m, distance was consistently underestimated, which is congruent with the previous studies outlined in section 2. Furthermore, the standard deviation of localization increases as the source moves further into the diffuse field. We also see that unfamiliar stimuli produce greater variability in subjects' answers. The mean localization of the virtual sources follows the reference source localization well. The answers for virtual sources deviate from their means in roughly the same fashion as the answers for reference sources, as localization becomes more difficult within the diffuse field.

Figure 8. Mean localization of real and virtual sound sources (female speech). Axes: real distance [m] against distance walked [m], both from 0 to 9; series: FOA, SOA, TOA, Real.

Figure 9. Mean localization of real and virtual sound sources (pink noise bursts). Axes: real distance [m] against distance walked [m], both from 0 to 9; series: FOA, SOA, TOA, Real.

Since the study followed a within-subject factorial design with 2 (stimuli) × 4 (playback conditions), a two-way ANOVA was performed for each presentation distance in order to investigate the effects of these two factors (referred to later as factors A and B), as well as potential interaction effects. The null hypothesis being tested here is that the mean perceived distances for all the stimuli and playback methods do not differ significantly:

$H_0$: $\mu_{FOA} = \mu_{SOA} = \mu_{TOA} = \mu_{Real} = \mu$

$H_1$: not all localization means ($\mu_i$) are the same.

No statistically significant effect of stimuli (familiar vs. unfamiliar) on the perception of distance has been found ($F_{2m}(3, 48) = 0.835$, $p = 0.365$; $F_{4m}(3, 48) = 2.0462$, $p = 0.159$; $F_{6m}(3, 48) = 2.575$, $p = 0.115$; $F_{8m}(3, 48) = 2.0462$, $p = 0.159$). For distances of 4 m and more, playback option also had no statistically significant effect ($F_{4m}(3, 48) = 2.192$, $p = 0.101$; $F_{6m}(3, 48) = 0.665$, $p = 0.577$; $F_{8m}(3, 48) = 0.202$, $p = 0.894$).

However, a statistically significant difference has been detected for the distance of 2 m. In larger study designs with multiple levels, it is advisable to use the Honestly Significant Difference (HSD) approach, since there is an increased risk of spuriously significant differences arising purely by chance. So, in order to investigate further where the difference occurs, an HSD has been computed (HSD = 1.423 m). If we now compile the table of mean perceived distances for the sound sources located at 2 m (Table I), we can see that all of the values clearly lie within a single HSD of each other and cannot be distinguished. We can then safely assume an ANOVA false alarm (type I error) and no statistically significant effect of playback method for the sources at the distance of 2 m as well.

Table I. Mean localization [m] of virtual and real sound sources at 2 m.

          μ_FOA   μ_SOA   μ_TOA   μ_Real
Speech    1.119   1.389   0.841   1.638
Noise     0.877   1.001   0.902   1.641

Table II. Correlation coefficients ρ and corresponding p-values for pairs of distance estimations for real and virtual sound sources (Speech).

                ρ        p-value
Real vs. FOA    0.9828   0.0172
Real vs. SOA    0.9960   0.0040
Real vs. TOA    0.9590   0.0410

Table III. Correlation coefficients ρ and corresponding p-values for pairs of distance estimations for real and virtual sound sources (Noise).

                ρ        p-value
Real vs. FOA    0.9913   0.0087
Real vs. SOA    0.9857   0.0143
Real vs. TOA    0.9972   0.0028
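The HSD argument for the 2 m point can be checked numerically from the reported means. The sketch below takes the largest pairwise difference between playback conditions for each stimulus and compares it with the reported HSD of 1.423 m. Comparing within each stimulus row is an assumption about how the check was carried out, and the names are illustrative.

```python
from itertools import combinations

HSD_M = 1.423  # honestly significant difference reported in the text, in metres

# Mean walked distances at the 2 m presentation point, in metres.
means_2m = {
    "speech": {"FOA": 1.119, "SOA": 1.389, "TOA": 0.841, "Real": 1.638},
    "noise": {"FOA": 0.877, "SOA": 1.001, "TOA": 0.902, "Real": 1.641},
}


def largest_pairwise_gap(means):
    """Largest absolute difference between any two condition means."""
    return max(abs(a - b) for a, b in combinations(means.values(), 2))


for stimulus, means in means_2m.items():
    gap = largest_pairwise_gap(means)
    print(f"{stimulus}: largest gap = {gap:.3f} m, within one HSD: {gap < HSD_M}")
```

Even the widest gap in either row is well under the HSD, which is the basis for treating the 2 m ANOVA result as a false alarm.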


[3] D R Begault Direct comparison of the impact of headtracking reverberation and individualized head-relatedtransfer functions on the spatial perception of a virtualsound source J Audio Eng Soc 49 (2001) 904ndash916

[4] M Otani T Hirahara Auditory artifacts due to switchinghead-related transfer functions of a dynamic virtual audi-tory display IEICE Trans Fundam Electron CommunComput Sci E91-A (2008) 1320ndash1328

[5] V Pulkki Virtual sound source positioning using VectorBase Amplitude Panning J Audio Eng Soc 45 (1997)456ndash466

[6] A J Berkhout A Holographic Approach to Acoustic Con-trol J Audio Eng Soc 36 (1988) 977ndash995

[7] M A Gerzon Periphony With-height sound reproductionJ Audio Eng Soc 21 (1973) 2ndash10

[8] F Rumsey Spatial quality evaluation for reproducedsound Terminology meaning and a scene-based para-digm J Audio Eng Soc 50 (2002) 651ndash666

[9] J Blauert Communication acoustics Springer 2008

[10] D H Ashmead D L Davis A Northington Contributionof listenersrsquo approaching motion to auditory distance per-ception J Exp Psy Hum Percep and Perform 21 (1995)239ndash256

[11] E Czerwinski A Voishvillo S Alexandrov A TerekhovPropagation distortion in sound systems Can we avoid itJ Audio Eng Soc 48 (2000) 30ndash48

[12] S H Nielsen Auditory distance perception in differentrooms J Audio Eng Soc 41 (1993) 755ndash770

[13] M B Gardner Distance estimation of 0 or apparent 0oriented speech signals in anechoic space J Acoust SocAm 45 (1969) 47ndash53

[14] A W Bronkhorst T Houtgast Auditory distance percep-tion in rooms Nature 397 (1999) 517ndash520

[15] J B Allen D A Berkley Image method for efficientlysimulating small-room acoustics J Acoust Soc Am 65(1979) 943ndash950

[16] M Rychtarikova T V d Bogaert G Vermeir J WoutersBinaural sound source localization in real and virtualrooms J Audio Eng Soc 57 (2009) 205ndash220

[17] J S Chan C Maguinness D Lisiecka C Ennis MLarkin C OrsquoSullivan F Newell Comparing audiovisualdistance perception in various real and virtual environ-ments Proc of the 32nd Euro Conf on Vis Percep Re-gensburg Germany 2009

[18] D Waller Factors affecting the perception of interobjectdistances in virtual environments Presence Teleoper Vir-tual Environ 8 (1999) 657ndash670

[19] A McKeag D McGrath Sound field format to binauraldecoder with head-tracking Proc of the 6th Australian Re-gional Convention of the AES 1996

[20] M Noisternig A Sontacchi T Musil R Holdrich A3D Ambisonic based binaural sound reproduction systemProc of the 24th Int Conf of the Audio Eng Soc AlbertaCanada 2003

[21] B-I Dalenback M Str

omberg Real time walkthrough au-

ralization - the first year Proc of the Inst of AcousCopenhagen Denmark 2006

[22] C Masterson S Adams G Kearney F Boland A methodfor head related impulse response simplification Procof the 17th European Signal Processing Conference (EU-SIPCO) Glasgow Scotland 2009

[23] J Merimaa V Pulkki Spatial impulse response renderingi Analysis and synthesis J Audio Eng Soc 53 (2005)

[24] W M Hartmann Localization of sound in rooms JAcoust Soc Am 74 (1983) 1380ndash1391

[25] D Griesinger Spatial impression and envelopment in smallrooms Proc of the 103rd Conv of the Audio Eng SocNew York USA 1997

[26] G Kearney M Gorzel H Rice F Boland Depth per-ception in interactive virual acoustic environments usinghigher order ambisonic soundfields Proc of the 2nd IntAmbisonics Symp Paris France 2010

[27] A Farina Simultaneous measurement of impulse responseand distortion with a swept-sine technique Proc of the108th Conv of the Audio Eng Soc Paris France 2000

[28] M Gerzon The design of precisely coincident microphonearrays for stereoand surround sound Proc of the 50thConv of the Audio Eng Soc London UK 1975

70

Kearney et al Distance perception ACTA ACUSTICA UNITED WITH ACUSTICAVol 98 (2012)

[29] C Guastavino B F G Katz Perceptual evaluation ofmulti-dimensional spatial audio reproduction J AcoustSoc Am 116 (2004) 1105ndash1115

[30] P Zahorik Assessing auditory distance perception usingvirtual acoustics J Acoust Soc Am 111 (2002) 1832ndash1846

[31] J M Loomis R L Klatzky J W Philbeck R G Goll-edge Assessing auditory distance perception using percep-tually directed action Perception And Psychophysics 60(1998) 966ndash980

[32] T Y Grechkin T D Nguyen J M Plumert J F CremerJ K Kearney How does presentation method and measure-ment protocol affect distance estimation in real and virtualenvironments ACM Trans Appl Percept 7 (2010) 261ndash2618

[33] S Bertet Formats audio 3d hieacuterarchiques Caracteacuterisationobjective et perceptive des systeacutemes ambisonicsdrsquoordressupeacuterieurs PhD dissertation INSA Lyon 2008

[34] W M Fisher G R Doddington K M Goudie-MarshallThe darpa speech recognition research database Specifica-tions and status Proc of the DARPA Workshop on SpeechRecognition 1986

[35] M A Gerzon G J Barton Ambisonic decoders forHDTV Proc of the 92nd Conv of the Audio Eng SocVienna Austria 1992

[36] NaturalPoint Trackir 5 httpwwwnaturalpointcomtrackir accessed 30th Sept 2011

[37] M Puckette Pure data httppuredatainfo ac-cessed 30th Sept 2011

[38] P Zahorik D S Brungart A W Bronkhorst Auditory dis-tance perception in humans A summary of past and presentresearch Acta Acustica united with Acustica 91 (2005)409ndash420

[39] H Wittek Perceptual differences between Wavefield Syn-thesis and Stereophony Department of Music and SoundRecording School of Arts Communication and Humani-ties University of Surrey UK 2007

71


Figure 5. Measuring Head Related Impulse Responses with miniature microphones.

for all the participants has been obtained in a sound-proof, large (18 × 15 × 10 m³) but quite damped (T60 at 1000 Hz = 0.57 s) multipurpose room (Black Box) in the Department of Theatre, Film and Television at the University of York. Additional damping was assured by thick, heavy curtains covering all four walls and a carpet on the floor. The measurement process consisted of a standard procedure, where miniature omnidirectional microphones (Knowles FG-23629-P16) were placed at the entrance of a blocked ear canal in order to capture acoustic pressure generated by one loudspeaker at a time, located at constant distance and varying angular direction.

Subjects were seated on an elevated platform so that their ears were 2.20 m above the ground and their head was in the centre of a spherical loudspeaker array arranged in diametrically opposed pairs. The ear height was calibrated using a laser guide, as shown in Figure 5. The array consisted of 16 full-range Genelec 8050A loudspeakers, since the intention was to reproduce Ambisonic sound fields up to and including 3rd order. This 3-D setup, shown in Figure 6, comprised a flat-front horizontal octagon and a cube (four loudspeakers on top and four on the bottom). The radius of the loudspeaker array (and thus the virtual loudspeaker array) was 3.27 m. For FOA-to-binaural decode, only virtual loudspeakers from the cube configuration were utilized, since no directional resolution is gained by using a higher number of loudspeakers. Furthermore, despite careful alignment, oversampling of the sound field with higher numbers of speakers has the potential to yield sound field distortions [33]. Note that for 2nd and 3rd order reproduction all 16 loudspeakers were used. Although the oversampled configuration was not optimal from the 2nd order reproduction point of view, it was not possible to easily and accurately rearrange the loudspeaker array in order to accommodate a different layout.

Figure 6. Array of 16 loudspeakers used for HRIR measurements.

HRIRs were captured using the exponentially swept-sine tone technique [27] at a 44.1 kHz sampling rate and 16-bit resolution. Since the measurement environment was not fully anechoic, further processing of the measured data was necessary. The HRIRs were tapered before the arrival of the first reflection (from the floor), yielding filter kernels with 257 taps, and were subsequently diffuse-field equalized.
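The tapering step can be illustrated with a minimal sketch. The text gives only the kernel length (257 taps) and the sampling rate (44.1 kHz); the half-Hann fade-out and its length (`fade_len`) are assumptions made for illustration and are not taken from the paper.

```python
import math

FS_HZ = 44100   # sampling rate stated in the text
N_TAPS = 257    # kernel length stated in the text


def taper_hrir(hrir, n_taps=N_TAPS, fade_len=64):
    """Truncate an impulse response to n_taps samples and fade out its tail.

    fade_len and the half-Hann shape are hypothetical choices; the paper
    does not specify the window that was used.
    """
    if fade_len > n_taps:
        raise ValueError("fade_len must not exceed n_taps")
    # Keep the first n_taps samples, zero-padding if the input is shorter.
    kernel = list(hrir[:n_taps]) + [0.0] * max(0, n_taps - len(hrir))
    start = n_taps - fade_len
    for i in range(fade_len):
        # Half-Hann weight: close to 1 at the start of the fade, 0 at the last tap.
        weight = 0.5 * (1.0 + math.cos(math.pi * (i + 1) / fade_len))
        kernel[start + i] *= weight
    return kernel


# Illustrative input: a constant sequence standing in for a measured HRIR.
kernel = taper_hrir([1.0] * 1000)
duration_ms = 1000.0 * len(kernel) / FS_HZ  # kernel length in milliseconds
```

At 44.1 kHz, 257 taps correspond to roughly 5.8 ms, which is the time span that has to end before the floor reflection arrives.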

6.2. Stimuli

The stimuli used in the experiment were pink noise bursts and phonetically balanced phrases selected from the TIMIT Acoustic-Phonetic Continuous Speech Corpus database and recorded by a female reader [34]. A sampling rate of 44.1 kHz and 16-bit resolution was used in both cases. These two sample types were selected in order to represent both unfamiliar and familiar sound sources. They were presented to the subjects in a pseudo-randomized manner to avoid any ordering effects.

For headphone reproduction, prior to the test phase, FOA impulse response measurements were taken from the listener position for each loudspeaker using the exponentially swept-sine tone technique [27]. From these measurements, 2nd and 3rd order impulse response sets were extracted using the directional analysis approach outlined in section 5. 0th order Ambisonics does not provide any directional information, which means that it would lack the cues that are investigated in the higher order renderings. Therefore, it was decided not to include it in this comparison.

The only psychoacoustical optimization applied to the Ambisonic decodes was shelf filtering, intended to satisfy Gerzon's localization criteria for maximized velocity decode at low frequencies and energy decode at higher frequencies [35]. This involved changing the ratio of the pressure to velocity components at low and high frequencies. Whilst the crossover frequency for the high frequency boost in the pressure channel at first order is normally in the region of 400 Hz for regular loudspeaker listening, here we restore the crossover point to 700 Hz, since the subject is always perfectly centred in the virtual loudspeaker array.

6.3. Test Environment and Apparatus

A series of subjective listening tests was conducted in the Large Rehearsal Room in the Department of Theatre, Film and Television at the University of York. The room dimensions were 12 × 9 × 3.5 m³ and the spatially averaged T60 at 1 kHz was 0.26 s. A low T60 was desired for this study, so the walls were covered with thick, heavy curtains, as shown in Figure 7. Since the up-mix from 1st to 2nd and 3rd order Ambisonics concerned only the deterministic part of the measured SRIRs, it was assumed that no advantage would be gained from using a more reverberant space.

A professional camera dolly track was set up roughly in the direction of the diagonal of the room. It not only allowed for testing distances of the real loudspeaker up to 8 m, but its non-symmetrical position also assured that early reflections of the same order from different surfaces did not easily coincide at the subject's ears, but instead arrived at different times. A single full-range loudspeaker (Genelec 8050A) was mounted on a camera dolly, which enabled it to be noiselessly translated by the experiment assistant to different locations. A guide rope was hung along the dolly track to help and guide the participants when walking toward the sound source. Since it was not possible to walk exactly on the dolly track, it was decided that the walking path would be directly next to it, as shown in Figure 7. The only weakness of this solution was that the sound source horizontal angle varied from 14.04 degrees at the closest distance (2 m) to 3.58 degrees at the furthest distance (8 m). However, this did not have any effect on the distance judgments, for two reasons. Firstly, the subjects were allowed (or even encouraged) to rotate their head in order to fully utilize the available ITD and ILD cues. Secondly, the initial head orientation was not in any way fixed. This, combined with the fact that there were no clear cues to the subject's initial orientation in the room at the origin, made this small initial angular offset unimportant. Furthermore, none of the participants reported any bias in their assessment based on the horizontal offset of the sound source.
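The two quoted angles are consistent with a fixed lateral offset between the walking path and the dolly track. The sketch below assumes an offset of 0.5 m and a simple arctangent geometry; neither is stated in the text, and both were chosen only because they reproduce the reported values.

```python
import math

# Assumption: lateral offset between walking path and dolly track.
# The paper does not state this value; 0.5 m reproduces the quoted angles.
LATERAL_OFFSET_M = 0.5


def horizontal_angle_deg(distance_m, offset_m=LATERAL_OFFSET_M):
    """Angle between the walking path and the direction to the source."""
    return math.degrees(math.atan2(offset_m, distance_m))


# Angles at the four presentation distances used in the experiment.
angles = {d: round(horizontal_angle_deg(d), 2) for d in (2, 4, 6, 8)}
```

Under these assumptions the offset angle falls from 14.04 degrees at 2 m to 3.58 degrees at 8 m, matching the figures quoted above.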

For trials with binaural presentation, high quality open-back headphones (AKG-K601) were used, which exhibit low levels of interaural magnitude and group delay distortion. Sound field rotation, tilt and tumble control was implemented via the TrackIR 5 infra-red head tracking system [36], resulting in stable virtual images with head rotations. The system responsible for playback of virtualized sound sources was built entirely in the Pure Data visual programming environment [37], and its combined latency (including head-tracker data porting and audio update rate) was 20 ms.

Figure 7. Participant performing a trial during the experiment.

6.4. Procedure

In the experiment, subjects entered the test environment blindfolded and without any prior expectation regarding the room dimensions, its acoustic properties or the test apparatus. They were guided by the experimenter to the reference point (the 'origin'). After a short explanation of the experiment objectives, a training session began with a short (3–5 min) walking-only trial until participants felt comfortable with walking blindfolded and using a guide rope. Next, they performed 4–6 training trials in which the same test stimuli to be used in the experiment (speech and pink noise) were played by the loudspeaker at randomly chosen distances. No feedback was given and no results were recorded after each training trial. The end of the training session was clearly announced, and after a 1 minute interval the first phase of the test began.

In test phase I, participants were asked to listen to static sound sources at randomly chosen points, focusing on the perceived distance. They could listen to any audio sample as many times as they wished. During the playback, they were instructed to stay still and refrain from any translational head movements. However, they were encouraged to rotate their head freely. After the playback had stopped, they were asked to walk, guided by the rope, to the point where they thought the sound originated from. The distance walked was subsequently recorded by the assistant using a laser measuring tool, after which the participant walked backwards to the origin. In the meantime, the loudspeaker was noiselessly translated to its new position and the test proceeded. Similar to the training session, no feedback was given at any stage.

During the first test phase, participants had to indicate the perceived distance for sound sources randomly located at 2 m, 4 m, 6 m or 8 m. Taking into account that both speech and pink noise burst samples were used (in a pseudo-random order), the number of trials in the first phase added up to 8. Each subject performed all the trials only once.

Upon completion of the first phase of the test, there was a short (approximately 2 minutes) interval that was required in order to put on the headphones and calibrate the head-tracking system. In phase II, subjects were also asked to identify the sound source distance, but this time using Ambisonic sound fields presented over headphones. Other than the fact that headphones and the head-tracking system were used, the test protocol remained the same as in phase I. However, because there were three playback configurations to be tested (1st, 2nd and 3rd order Ambisonics), participants had to perform 24 trials instead of 8. Instead of separate phases for each Ambisonic order, all samples were randomly presented to the subject within the same test phase. Again, subjects performed all the trials only once and no feedback was given at any stage.
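The trial counts follow directly from the factorial structure of the two phases. As a minimal sketch, the trial lists could be generated as below; the function name, the labels and the seeded shuffle are illustrative assumptions, not the authors' Pure Data implementation.

```python
import itertools
import random

DISTANCES_M = (2, 4, 6, 8)
STIMULI = ("speech", "pink_noise")
AMBISONIC_ORDERS = ("FOA", "SOA", "TOA")


def build_trials(playbacks, seed=None):
    """Return every (playback, stimulus, distance) combination once, shuffled."""
    trials = list(itertools.product(playbacks, STIMULI, DISTANCES_M))
    # Seeded generator so that a given presentation order can be reproduced.
    random.Random(seed).shuffle(trials)
    return trials


phase_1 = build_trials(("REAL",), seed=1)      # 1 x 2 x 4 = 8 trials
phase_2 = build_trials(AMBISONIC_ORDERS, seed=2)  # 3 x 2 x 4 = 24 trials
```

Mixing all three orders into a single shuffled list mirrors the choice, described above, not to run a separate phase per Ambisonic order.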

7. Results

The perceived sound source distance (indicated by the distance walked) was collected from 7 subjects for 4 presentation points (2 m, 4 m, 6 m and 8 m), two stimuli (female speech and pink noise bursts) and four playback options: 1st, 2nd and 3rd order Ambisonics and real loudspeakers, which for analysis we will denote FOA, SOA, TOA and REAL, respectively. With headphone trials, none of the participants reported in-head localization. However, there were 3 cases where the proximity of the sound source was very apparent, so participants decided not to move at all. In some cases, the virtual sound source was initially localized behind the subjects, but all participants were able to resolve the confusion by applying head rotation.

We computed the mean values of walked distances $\mu$ for each test condition along with the corresponding standard errors $\mathrm{se}(\mu)$. The results are presented separately for each stimulus type within 95% confidence intervals.
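A minimal sketch of these summary statistics for a single test condition is given below. The walked distances are invented for illustration, and the use of a Student-t interval with 6 degrees of freedom (7 subjects) is an assumption; the text does not say how the 95% intervals were constructed.

```python
import math
import statistics

# Two-sided 95% Student-t quantile for 6 degrees of freedom (7 subjects).
T_CRIT_95_DF6 = 2.447


def mean_se_ci(samples, t_crit=T_CRIT_95_DF6):
    """Return the mean, its standard error and a confidence interval."""
    n = len(samples)
    mu = statistics.fmean(samples)
    # Standard error of the mean from the sample standard deviation.
    se = statistics.stdev(samples) / math.sqrt(n)
    half_width = t_crit * se
    return mu, se, (mu - half_width, mu + half_width)


# Illustrative walked distances in metres for 7 subjects (not measured data).
walked_m = [1.2, 1.6, 1.9, 1.4, 2.1, 1.7, 1.5]
mu, se, (ci_low, ci_high) = mean_se_ci(walked_m)
```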

As expected, the perception of distance for the real sources was more accurate for near sources. Beyond 4 m, distance was consistently underestimated, which is congruent with the previous studies outlined in section 2. Furthermore, the standard deviation of localization increases as the source moves further into the diffuse field. We also see that unfamiliar stimuli produce greater variability in subjects' answers. The mean localization of the virtual sources follows the reference source localization well. The answers for virtual sources deviate from their means roughly in the same fashion as the answers for reference sources, as localization becomes more difficult within the diffuse field.

Since the study followed a within-subject factorial design with 2 (stimuli) × 4 (playback conditions), a two-way ANOVA has been performed for each presentation distance in order to investigate the effects of these two factors (referred to later as factors A and B) as well as potential interaction effects. The null hypothesis being tested here is that the mean perceived distances for all the stimuli and playback methods do not differ significantly.

Figure 8. Mean localization of real and virtual sound sources (female speech). Axes: real distance [m] against distance walked [m], both from 0 to 9; series: FOA, SOA, TOA, Real.

Figure 9. Mean localization of real and virtual sound sources (pink noise bursts). Axes: real distance [m] against distance walked [m], both from 0 to 9; series: FOA, SOA, TOA, Real.

$H_0$: $\mu_{\mathrm{FOA}} = \mu_{\mathrm{SOA}} = \mu_{\mathrm{TOA}} = \mu_{\mathrm{Real}} = \mu$

$H_1$: not all localization means ($\mu_i$) are the same.

No statistically significant effect of stimuli (familiar vs. unfamiliar) on the perception of distance has been found ($F_{2\mathrm{m}}(3, 48) = 0.835$, $p = 0.365$; $F_{4\mathrm{m}}(3, 48) = 2.0462$, $p = 0.159$; $F_{6\mathrm{m}}(3, 48) = 2.575$, $p = 0.115$; $F_{8\mathrm{m}}(3, 48) = 2.0462$, $p = 0.159$). For distances of 4 m and more, playback option also had no statistically significant effect ($F_{4\mathrm{m}}(3, 48) = 2.192$, $p = 0.101$; $F_{6\mathrm{m}}(3, 48) = 0.665$, $p = 0.577$; $F_{8\mathrm{m}}(3, 48) = 0.202$, $p = 0.894$).

However, a statistically significant difference has been detected for the distance of 2 m. In larger study designs with multiple levels, it is advisable to use the Honestly Significant Difference (HSD) approach, since there is an increased risk of a spuriously significant difference arising purely by chance. So, in order to investigate further where the difference occurs, an HSD has been computed (HSD = 1.423 m). If we now compile the table of mean perceived distances for the sound sources located at 2 m (Table I), we can see that all of the values clearly lie within a single HSD of each other and cannot be distinguished. We can then safely assume an ANOVA false alarm (type I error) and no statistically significant effect of playback method for the sources at the distance of 2 m as well.

Table I. Mean localization [m] of virtual and real sound sources at 2 m.

| Stimulus | μ_FOA | μ_SOA | μ_TOA | μ_Real |
|----------|-------|-------|-------|--------|
| Speech   | 1.119 | 1.389 | 0.841 | 1.638  |
| Noise    | 0.877 | 1.001 | 0.902 | 1.641  |

Table II. Correlation coefficients ρ and corresponding p-values for pairs of distance estimations for real and virtual sound sources (Speech).

| Pair        | ρ      | p-value |
|-------------|--------|---------|
| Real vs FOA | 0.9828 | 0.0172  |
| Real vs SOA | 0.9960 | 0.0040  |
| Real vs TOA | 0.9590 | 0.0410  |

Table III. Correlation coefficients ρ and corresponding p-values for pairs of distance estimations for real and virtual sound sources (Noise).

| Pair        | ρ      | p-value |
|-------------|--------|---------|
| Real vs FOA | 0.9913 | 0.0087  |
| Real vs SOA | 0.9857 | 0.0143  |
| Real vs TOA | 0.9972 | 0.0028  |
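The HSD comparison can be checked directly against the means reported in Table I (read with their decimal points, in metres). The sketch below only performs the pairwise comparison; it does not recompute the HSD itself, which would require the ANOVA error term that the text does not report. The function and variable names are illustrative.

```python
from itertools import combinations

HSD_M = 1.423  # value reported in the text

# Mean perceived distances at 2 m, in metres, as listed in Table I.
MEANS_2M = {
    ("speech", "FOA"): 1.119, ("speech", "SOA"): 1.389,
    ("speech", "TOA"): 0.841, ("speech", "Real"): 1.638,
    ("noise", "FOA"): 0.877, ("noise", "SOA"): 1.001,
    ("noise", "TOA"): 0.902, ("noise", "Real"): 1.641,
}


def pairs_exceeding_hsd(means, hsd):
    """List the condition pairs whose mean difference is larger than the HSD."""
    return [
        (a, b, abs(means[a] - means[b]))
        for a, b in combinations(means, 2)
        if abs(means[a] - means[b]) > hsd
    ]


exceeding = pairs_exceeding_hsd(MEANS_2M, HSD_M)
largest_gap = max(MEANS_2M.values()) - min(MEANS_2M.values())
```

The largest gap between any two means is about 0.80 m, well below 1.423 m, so no pair is flagged.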

Lastly, for all the distances, no synergetic effects of factors A (stimuli) and B (playback conditions) have been detected.

Additionally, we calculated correlation coefficients ρ for pairs of distance estimations for real and virtual sound sources (either 1st, 2nd or 3rd order) and two stimuli. In all cases, high correlation coefficients have been obtained, which confirms our findings that, for these particular test conditions, the perception of distance of binaurally rendered Ambisonic sound fields of orders 1 to 3 cannot be distinguished from the perception of distance of the real sound sources.
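The tabulated p-values each equal 1 − ρ, which is what a two-sided test of a Pearson coefficient gives when it is computed from four paired values (2 degrees of freedom). That the correlations were taken over the four mean distances per condition is an inference from this pattern, not something stated in the text. A minimal sketch under that assumption:

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def p_two_sided_df2(r):
    """Two-sided p-value for a correlation computed from n = 4 pairs.

    Uses t = |r| * sqrt(df / (1 - r^2)) with df = 2 and the closed-form
    Student-t distribution for 2 degrees of freedom.
    """
    if abs(r) >= 1.0:
        return 0.0
    t = abs(r) * math.sqrt(2.0 / (1.0 - r * r))
    return 1.0 - t / math.sqrt(t * t + 2.0)


# Correlation coefficients reported for the speech stimulus.
RHO_SPEECH = {"FOA": 0.9828, "SOA": 0.9960, "TOA": 0.9590}
p_speech = {k: round(p_two_sided_df2(r), 4) for k, r in RHO_SPEECH.items()}
```

With only four points per correlation, even coefficients above 0.95 yield p-values of a few percent, which is worth keeping in mind when reading Tables II and III.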

8. Discussion

The results presented for real sources corroborate the classic underestimation of source distance, as reported in the literature. These results were used as a basis with which to measure the ability of Ambisonic sound fields of different orders to present sources at different distances. It was expected that a further underestimation of the source distance would ensue with the binaural rendering, as reported in [17]. However, this was not the case, even for first order presentations, and the apparent distances of the virtual sources matched the real source distances well. One should note that the major difference between this study and that of [17] is our use of head-tracking, indicating the importance of head movements in perceiving source distance, which develops the findings of Waller [18] and Ashmead et al. [10] on user interaction in a virtual space. Further work is required to quantify the effect of this.

Moreover, the presented study demonstrates that the enhanced directional accuracy gained by presenting sound sources in HOA through head-tracked binaural rendering does not yield a significant improvement in the perception of the source distance. What is noteworthy is that, for each order, there is no significant difference in the perception of the source location when compared to real-world sources. We therefore conclude that sound field directionality for distance perception is sufficient with 1st order playback.

The presence of the ANOVA false alarm at the 2 m point is of interest. It is noteworthy that the 2 m point represents a source inside the virtual array geometry. It is a known issue that virtual sound sources rendered inside the array of loudspeakers cannot be reproduced in a straightforward way without artifacts. Some of these artifacts include incorrect wavefront curvature and insufficient bass boost.

In the first case, there is ample evidence in the literature to suggest that the wavefront curvature translates to significant binaural cues for sound sources near the head [30, 38]. It was already shown in section 2.1 that, as a source moves closer to the head, the levels of the monaural transfer function and the ILD both change significantly with source angle. However, this effect is not strong at 1 m and beyond. For sources further away, it has been shown in [39] that it is very difficult to assess distance by binaural cues alone.

In the second case, the requirement for distance compensation filtering due to near field effects for the large loudspeaker radius (3.27 m) and the given source distances (> 2 m) is only prominent below 100 Hz. For the female speech test stimuli, this will not have an effect, since the first formant frequencies do not go below 180 Hz. Also, the current method employed for capturing HRIRs allowed for reliably obtaining filters with a frequency response reaching down to around 170 Hz, thereby also band-limiting the delivery of the pink noise stimuli.

Finally, there was no significant difference in the results presented for different sources, although the greater variance in the results for pink noise suggests that the familiarity of the source does indeed play a role in the perception of source distance, as mentioned in section 2.3. Future studies will investigate the use of these monaural cues further and will utilize 0th order sound field rendering, since it will remove the influence of any directional information.

Considering the aforementioned study of Bronkhorst et al. [14], where the accuracy of distance perception for binaural playback increases with the number of reflections, our findings demonstrate that the net effect of the monaural cues of direct-to-reverberant ratio, level difference and time of arrival of early reflections is of greater importance in distance perception for binaural rendering than Ambisonic directional accuracy beyond 1st order.

9. Conclusions

We have assessed, through subjective analysis, the perceived source distance in virtual Ambisonic sound fields in comparison to real world sources. The hypothesis tested was that enhanced directional accuracy of the deterministic part of the sound field may lead to better reconstruction of environmental depth and thus improve the perception of sound source distance. However, it was shown that Ambisonic reproduction matches the perceived real world source distances well even at 1st order, and no improvement in this regard was observed when increasing the order. It must be emphasized, though, that this analysis applies to Ambisonic-to-binaural decodes with higher order synthesis achieved using the directional analysis method of [23]. Therefore, further work will examine this topic for loudspeaker reproduction for both centre and off-centre listening, as well as investigate the effectiveness of HOA synthesis in comparison to real world HOA measurements.

Acknowledgments

The authors gratefully acknowledge the participation of the test subjects for both their time and constructive comments, as well as the technical support staff at the Department of Theatre, Film and Television at the University of York for their assistance in the experimental setups. This research is supported by Science Foundation Ireland.

References

[1] L. Fauster: Stereoscopic techniques in computer graphics. Technical paper, TU Wien, 2007.

[2] J. Lee: Head tracking for desktop VR displays using the Wii remote. http://johnnylee.net/projects/wii/, accessed 30th Sept. 2011.

[3] D. R. Begault: Direct comparison of the impact of head tracking, reverberation, and individualized head-related transfer functions on the spatial perception of a virtual sound source. J. Audio Eng. Soc. 49 (2001) 904–916.

[4] M. Otani, T. Hirahara: Auditory artifacts due to switching head-related transfer functions of a dynamic virtual auditory display. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. E91-A (2008) 1320–1328.

[5] V. Pulkki: Virtual sound source positioning using Vector Base Amplitude Panning. J. Audio Eng. Soc. 45 (1997) 456–466.

[6] A. J. Berkhout: A Holographic Approach to Acoustic Control. J. Audio Eng. Soc. 36 (1988) 977–995.

[7] M. A. Gerzon: Periphony: With-height sound reproduction. J. Audio Eng. Soc. 21 (1973) 2–10.

[8] F. Rumsey: Spatial quality evaluation for reproduced sound: Terminology, meaning, and a scene-based paradigm. J. Audio Eng. Soc. 50 (2002) 651–666.

[9] J. Blauert: Communication acoustics. Springer, 2008.

[10] D. H. Ashmead, D. L. Davis, A. Northington: Contribution of listeners' approaching motion to auditory distance perception. J. Exp. Psy.: Hum. Percep. and Perform. 21 (1995) 239–256.

[11] E. Czerwinski, A. Voishvillo, S. Alexandrov, A. Terekhov: Propagation distortion in sound systems: Can we avoid it? J. Audio Eng. Soc. 48 (2000) 30–48.

[12] S. H. Nielsen: Auditory distance perception in different rooms. J. Audio Eng. Soc. 41 (1993) 755–770.

[13] M. B. Gardner: Distance estimation of 0° or apparent 0°-oriented speech signals in anechoic space. J. Acoust. Soc. Am. 45 (1969) 47–53.

[14] A. W. Bronkhorst, T. Houtgast: Auditory distance perception in rooms. Nature 397 (1999) 517–520.

[15] J. B. Allen, D. A. Berkley: Image method for efficiently simulating small-room acoustics. J. Acoust. Soc. Am. 65 (1979) 943–950.

[16] M. Rychtarikova, T. V. d. Bogaert, G. Vermeir, J. Wouters: Binaural sound source localization in real and virtual rooms. J. Audio Eng. Soc. 57 (2009) 205–220.

[17] J. S. Chan, C. Maguinness, D. Lisiecka, C. Ennis, M. Larkin, C. O'Sullivan, F. Newell: Comparing audiovisual distance perception in various real and virtual environments. Proc. of the 32nd Euro. Conf. on Vis. Percep., Regensburg, Germany, 2009.

[18] D. Waller: Factors affecting the perception of interobject distances in virtual environments. Presence: Teleoper. Virtual Environ. 8 (1999) 657–670.

[19] A. McKeag, D. McGrath: Sound field format to binaural decoder with head-tracking. Proc. of the 6th Australian Regional Convention of the AES, 1996.

[20] M. Noisternig, A. Sontacchi, T. Musil, R. Höldrich: A 3D Ambisonic based binaural sound reproduction system. Proc. of the 24th Int. Conf. of the Audio Eng. Soc., Alberta, Canada, 2003.

[21] B.-I. Dalenbäck, M. Strömberg: Real time walkthrough auralization - the first year. Proc. of the Inst. of Acous., Copenhagen, Denmark, 2006.

[22] C. Masterson, S. Adams, G. Kearney, F. Boland: A method for head related impulse response simplification. Proc. of the 17th European Signal Processing Conference (EUSIPCO), Glasgow, Scotland, 2009.

[23] J. Merimaa, V. Pulkki: Spatial impulse response rendering I: Analysis and synthesis. J. Audio Eng. Soc. 53 (2005).

[24] W. M. Hartmann: Localization of sound in rooms. J. Acoust. Soc. Am. 74 (1983) 1380–1391.

[25] D. Griesinger: Spatial impression and envelopment in small rooms. Proc. of the 103rd Conv. of the Audio Eng. Soc., New York, USA, 1997.

[26] G. Kearney, M. Gorzel, H. Rice, F. Boland: Depth perception in interactive virtual acoustic environments using higher order ambisonic soundfields. Proc. of the 2nd Int. Ambisonics Symp., Paris, France, 2010.

[27] A. Farina: Simultaneous measurement of impulse response and distortion with a swept-sine technique. Proc. of the 108th Conv. of the Audio Eng. Soc., Paris, France, 2000.

[28] M. Gerzon: The design of precisely coincident microphone arrays for stereo and surround sound. Proc. of the 50th Conv. of the Audio Eng. Soc., London, UK, 1975.

[29] C. Guastavino, B. F. G. Katz: Perceptual evaluation of multi-dimensional spatial audio reproduction. J. Acoust. Soc. Am. 116 (2004) 1105–1115.

[30] P. Zahorik: Assessing auditory distance perception using virtual acoustics. J. Acoust. Soc. Am. 111 (2002) 1832–1846.

[31] J. M. Loomis, R. L. Klatzky, J. W. Philbeck, R. G. Golledge: Assessing auditory distance perception using perceptually directed action. Perception and Psychophysics 60 (1998) 966–980.

[32] T. Y. Grechkin, T. D. Nguyen, J. M. Plumert, J. F. Cremer, J. K. Kearney: How does presentation method and measurement protocol affect distance estimation in real and virtual environments? ACM Trans. Appl. Percept. 7 (2010) 26:1–26:18.

[33] S. Bertet: Formats audio 3D hiérarchiques: Caractérisation objective et perceptive des systèmes ambisonics d'ordres supérieurs. PhD dissertation, INSA Lyon, 2008.

[34] W. M. Fisher, G. R. Doddington, K. M. Goudie-Marshall: The DARPA speech recognition research database: Specifications and status. Proc. of the DARPA Workshop on Speech Recognition, 1986.

[35] M. A. Gerzon, G. J. Barton: Ambisonic decoders for HDTV. Proc. of the 92nd Conv. of the Audio Eng. Soc., Vienna, Austria, 1992.

[36] NaturalPoint: TrackIR 5. http://www.naturalpoint.com/trackir/, accessed 30th Sept. 2011.

[37] M. Puckette: Pure Data. http://puredata.info, accessed 30th Sept. 2011.

[38] P. Zahorik, D. S. Brungart, A. W. Bronkhorst: Auditory distance perception in humans: A summary of past and present research. Acta Acustica united with Acustica 91 (2005) 409–420.

[39] H. Wittek: Perceptual differences between Wavefield Synthesis and Stereophony. Department of Music and Sound Recording, School of Arts, Communication and Humanities, University of Surrey, UK, 2007.

Page 7: 1 ) -0 #0*- ! 1 ) #! !-* 0* ! ) *- ! ) ) ) *#! #0! *sigmedia/pmwiki/uploads/Main...# ( " &/4$/' ,$ +$ *- ! ) %- #! ! !- ) - 1 ) -0 #0*- ! 1 ) #! !-* 0* ! ) *- ! )) ) *#! #0! * K05

Kearney et al Distance perception ACTA ACUSTICA UNITED WITH ACUSTICAVol 98 (2012)

of the pressure to velocity components at low and high frequencies. Whilst the crossover frequency for the high frequency boost in the pressure channel at first order is normally in the region of 400 Hz for regular loudspeaker listening, here we restore the crossover point to 700 Hz, since the subject is always perfectly centred in the virtual loudspeaker array.

6.3. Test Environment and Apparatus

A series of subjective listening tests was conducted in the Large Rehearsal Room in the Department of Theatre, Film and Television at the University of York. The room dimensions were 12 × 9 × 3.5 m³ and the spatially averaged T60 at 1 kHz was 0.26 s. A low T60 was desired for this study, so the walls were covered with thick, heavy curtains, as shown in Figure 7. Since the up-mix from 1st to 2nd and 3rd order Ambisonics concerned only the deterministic part of the measured SRIRs, it was assumed that no advantage would be gained from using a more reverberant space.

A professional camera dolly track was set up roughly in the direction of the diagonal of the room. It not only allowed for testing distances of the real loudspeaker up to 8 m, but its non-symmetrical position also assured that early reflections of the same order from different surfaces did not easily coincide at the subject's ears, but instead arrived at different times. A single full-range loudspeaker (Genelec 8050A) was mounted on a camera dolly, which enabled it to be noiselessly translated by the experiment assistant to different locations. The guiding rope was hung along the dolly track, which was intended to help and guide the participants when walking toward the sound source. Since it was not possible to walk exactly on the dolly track, it was decided that the walking path would be directly next to it, as shown in Figure 7. The only weakness of this solution was that the sound source horizontal angle varied from 14.04 degrees at the closest distance (2 m) to 3.58 degrees at the furthest distance (8 m). However, this did not have any effect on the distance judgments, for two reasons. Firstly, the subjects were allowed (or even encouraged) to rotate their head in order to fully utilize the available ITD and ILD cues. Secondly, the initial head orientation was not in any way fixed. This, combined with the fact that there were no clear cues to the subject's initial orientation in the room at the origin, made this small initial angular offset unimportant. Furthermore, none of the participants reported any bias in their assessment based on the horizontal offset of the sound source.
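The two quoted angles are consistent with a walking path that runs parallel to the dolly track at a fixed lateral offset. The sketch below reproduces them under that assumption; the 0.5 m offset and the name `source_angle_deg` are inferred for illustration and are not stated in the text.

```python
import math

# Assumption: the walking path is parallel to the dolly track at a fixed
# lateral offset. The 0.5 m value is inferred from the reported angles,
# it is not given in the text.
LATERAL_OFFSET_M = 0.5

def source_angle_deg(distance_m, offset_m=LATERAL_OFFSET_M):
    """Horizontal angle of the source as seen from the origin, in degrees."""
    return math.degrees(math.atan2(offset_m, distance_m))

for d in (2, 4, 6, 8):
    print(f"{d} m: {source_angle_deg(d):.2f} degrees")
```

With this offset the angle shrinks as the source moves away, which is why the largest deviation occurs at the closest test distance.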

For trials with binaural presentation, high quality open back headphones (AKG-K601) were used, which exhibit low levels of interaural magnitude and group delay distortion. Sound field rotation, tilt and tumble control was implemented via the TrackIR 5 infra-red head tracking system [36], resulting in stable virtual images with head rotations. The system responsible for playback of virtualized sound sources was built entirely in the Pure Data visual programming environment [37], and its combined latency (including head-tracker data porting and audio update rate) was 20 ms.
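As an illustration of how head tracking stabilizes the virtual image, the following minimal sketch counter-rotates a first order sound field about the vertical axis. It covers rotation (yaw) only, not tilt or tumble, and it is not the Pure Data patch used in the study. The channel convention (X forward, Y left, Z up, positive yaw counter-clockwise) and the function names are assumptions.

```python
import math

def rotate_foa_yaw(w, x, y, z, yaw_rad):
    """Rotate a first order sound field about the vertical axis.

    Assumed convention: X points forward, Y left, Z up, and positive yaw
    is counter-clockwise when seen from above. W and Z are unaffected.
    """
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return w, c * x - s * y, s * x + c * y, z

def compensate_head_yaw(w, x, y, z, head_yaw_rad):
    """Counter-rotate the field so sources stay fixed in the room."""
    return rotate_foa_yaw(w, x, y, z, -head_yaw_rad)

# A horizontal source at 30 degrees to the left; the listener then turns
# the head 30 degrees to the left, so the source should appear straight ahead.
az = math.radians(30.0)
w, x, y, z = compensate_head_yaw(1.0, math.cos(az), math.sin(az), 0.0, az)
print(round(x, 6), round(y, 6))
```

Because only the signals feeding the fixed virtual loudspeakers change, the HRTFs themselves never need to be switched.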

Figure 7. Participant performing a trial during the experiment.

6.4. Procedure

In the experiment, subjects entered the test environment blindfolded and without any prior expectation regarding the room dimensions, its acoustic properties or the test apparatus. They were guided by the experimenter to the reference point (the 'origin'). After a short explanation of the experiment objectives, a training session began with a short (3–5 min) walking-only trial until participants felt comfortable with walking blindfolded and using a guide rope. Next, they performed 4–6 training trials in which the same test stimuli to be used in the experiment (speech and pink noise) were played by the loudspeaker at randomly chosen distances. No feedback was given and no results were recorded after each test trial. The end of the training session was clearly announced and, after a 1 minute interval, the first phase of the test began.

In test phase I, participants were asked to listen to static sound sources at randomly chosen points, focusing on the perceived distance. They could listen to any audio sample as many times as they wished. During the playback, they were instructed to stay still and refrain from any translational head movements. However, they were encouraged to rotate their head freely. After the playback had stopped, they were asked to walk, guided by the rope, to the point where they thought the sound originated from. The distance walked was subsequently recorded by the assistant using a laser measuring tool, after which the participant walked backwards to the origin. In the meantime, the loudspeaker was noiselessly translated to its new position and the test proceeded. Similar to the training session, no feedback was given at any stage.

During the first test phase, participants had to indicate the perceived distance for sound sources randomly located at 2 m, 4 m, 6 m or 8 m. Taking into account that both speech and pink noise burst samples were used (in a pseudo-random order), the number of trials in the first phase added up to 8. Each subject performed all the trials only once.

Upon completion of the first phase of the test, there was a short (approximately 2 minutes) interval that was required in order to put on the headphones and calibrate the head-tracking system. In phase II, subjects were also asked to identify the sound source distance, but this time using Ambisonic sound fields presented over headphones. Other than the fact that headphones and the head-tracking system were used, the test protocol remained the same as in phase I. However, because there were three playback configurations to be tested (1st, 2nd and 3rd order Ambisonics), participants had to perform 24 trials instead of 8. Instead of separate phases for each Ambisonic order, all samples were randomly presented to the subject within the same test phase. Again, subjects performed all the trials only once and no feedback was given at any stage.
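The trial counts follow from fully crossing the test factors. A minimal sketch of how such randomized trial lists could be generated is given below; the function names and the use of a seeded shuffle are illustrative assumptions, not a description of the software used in the experiment.

```python
import itertools
import random

DISTANCES_M = (2, 4, 6, 8)
STIMULI = ("speech", "pink_noise")
AMBISONIC_ORDERS = (1, 2, 3)

def phase_one_trials(rng):
    """Real loudspeaker: every distance with every stimulus, each once."""
    trials = list(itertools.product(DISTANCES_M, STIMULI))
    rng.shuffle(trials)
    return trials

def phase_two_trials(rng):
    """Headphones: all three orders interleaved within a single phase."""
    trials = list(itertools.product(AMBISONIC_ORDERS, DISTANCES_M, STIMULI))
    rng.shuffle(trials)
    return trials

rng = random.Random(0)  # fixed seed only to make the sketch repeatable
print(len(phase_one_trials(rng)), len(phase_two_trials(rng)))
```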

7. Results

The perceived sound source distance (indicated by the distance walked) was collected from 7 subjects for 4 presentation points (2 m, 4 m, 6 m and 8 m), two stimuli (female speech and pink noise bursts) and four playback options: 1st, 2nd and 3rd order Ambisonics and real loudspeakers, which for analysis we will denote FOA, SOA, TOA and REAL, respectively. With headphone trials, none of the participants reported in-head localization; however, there were 3 cases where the proximity of the sound source was very apparent, so participants decided not to move at all. In some cases the virtual sound source was initially localized behind the subjects, but all participants were able to resolve the confusion by applying head-rotation.

We computed the mean values of walked distances $\mu$ for each test condition, along with the corresponding standard errors $se(\mu)$. The results are presented separately for each stimulus type within 95% confidence intervals.
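For one test condition, these summary statistics can be sketched as follows. The text does not say how the 95% intervals were formed; the sketch assumes a Student's t interval with 7 - 1 = 6 degrees of freedom, and the sample values are hypothetical, not data from the study.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t for 6 degrees of freedom.
# Using a t interval is an assumption; the text does not specify the method.
T_CRIT_DF6 = 2.447

def mean_se_ci(walked_m, t_crit=T_CRIT_DF6):
    """Return (mean, standard error, (ci_low, ci_high)) for one condition."""
    mu = statistics.mean(walked_m)
    se = statistics.stdev(walked_m) / math.sqrt(len(walked_m))
    return mu, se, (mu - t_crit * se, mu + t_crit * se)

# Hypothetical walked distances [m] for 7 subjects, for illustration only.
example_walked_m = [1.2, 1.9, 1.5, 2.1, 1.4, 1.8, 1.6]
mu, se, (low, high) = mean_se_ci(example_walked_m)
print(f"mean = {mu:.3f} m, se = {se:.3f} m, 95% CI = [{low:.3f}, {high:.3f}] m")
```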

As expected, the perception of distance for the real sources was more accurate for near sources. Beyond 4 m, distance perception was continuously underestimated, which is congruent with the previous studies outlined in section 2. Furthermore, the standard deviation of localization increases as the source moves further into the diffuse field. We also see that unfamiliar stimuli produce greater variability in subjects' answers. The mean localization of the virtual sources follows the reference source localization well. The answers for virtual sources deviate from their means in roughly the same fashion as the answers for reference sources, as localization becomes more difficult within the diffuse field.

[Figure 8. Mean localization of real and virtual sound sources (female speech). Axes: real distance [m] against distance walked [m], both from 0 to 9; series: FOA, SOA, TOA, Real.]

[Figure 9. Mean localization of real and virtual sound sources (pink noise bursts). Axes: real distance [m] against distance walked [m], both from 0 to 9; series: FOA, SOA, TOA, Real.]

Since the study followed a within-subject factorial design with 2 (stimuli) × 4 (playback conditions), a two-way ANOVA was performed for each presentation distance in order to investigate the effects of these two factors (referred to below as factors A and B) as well as potential interaction effects. The null hypothesis being tested here is that the mean perceived distances for all the stimuli and playback methods do not differ significantly.

$H_0$: $\mu_{FOA} = \mu_{SOA} = \mu_{TOA} = \mu_{Real} = \mu$

$H_1$: not all localization means ($\mu_i$) are the same.

No statistically significant effect of stimuli (familiar vs. unfamiliar) on the perception of distance has been found ($F_{2m}(3, 48) = 0.835$, $p = 0.365$; $F_{4m}(3, 48) = 2.0462$, $p = 0.159$; $F_{6m}(3, 48) = 2.575$, $p = 0.115$; $F_{8m}(3, 48) = 2.0462$, $p = 0.159$). For distances of 4 m and more, playback option also had no statistically significant effect ($F_{4m}(3, 48) = 2.192$, $p = 0.101$; $F_{6m}(3, 48) = 0.665$, $p = 0.577$; $F_{8m}(3, 48) = 0.202$, $p = 0.894$).
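The 48 error degrees of freedom quoted with the F statistics can be sanity-checked with a short calculation. The sketch assumes a balanced two-way layout with replication, in which each of the 7 subjects contributes one observation to each of the 2 × 4 cells at a given distance; the function name is illustrative.

```python
def two_way_anova_dof(levels_a, levels_b, n_per_cell):
    """Degrees of freedom for a balanced two-way ANOVA with replication."""
    n_total = levels_a * levels_b * n_per_cell
    return {
        "A": levels_a - 1,
        "B": levels_b - 1,
        "AxB": (levels_a - 1) * (levels_b - 1),
        "error": n_total - levels_a * levels_b,
        "total": n_total - 1,
    }

# Factor A: 2 stimuli; factor B: 4 playback options; 7 subjects per cell.
print(two_way_anova_dof(levels_a=2, levels_b=4, n_per_cell=7))
```

Under these assumptions the error term has 56 - 8 = 48 degrees of freedom, the stimulus factor would carry 1 numerator degree of freedom, and the playback factor and the interaction 3 each.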

However, a statistically significant difference has been detected for the distance of 2 m. In larger study designs with multiple levels, it is advisable to use the Honestly Significant Difference (HSD) approach, since there is an increased risk of spuriously significant differences arising purely by chance. So, in order to investigate further where the difference occurs, an HSD has been computed (HSD = 1.423 m). If we now compile the table of mean perceived distances for the sound sources located at 2 m (Table I), we can see that all of the values lie within a single HSD of each other and cannot be distinguished. We can then safely assume an ANOVA false alarm (type I error), and no statistically significant effect of playback method for the sources at the distance of 2 m as well.

Table I. Mean localization [m] of virtual and real sound sources at 2 m.

          $\mu_{FOA}$   $\mu_{SOA}$   $\mu_{TOA}$   $\mu_{Real}$
Speech    1.119         1.389         0.841         1.638
Noise     0.877         1.001         0.902         1.641

Table II. Correlation coefficients $\rho$ and corresponding $p$-values for pairs of distance estimations for real and virtual sound sources (Speech).

                $\rho$    $p$-value
Real vs. FOA    0.9828    0.0172
Real vs. SOA    0.9960    0.0040
Real vs. TOA    0.9590    0.0410

Table III. Correlation coefficients $\rho$ and corresponding $p$-values for pairs of distance estimations for real and virtual sound sources (Noise).

                $\rho$    $p$-value
Real vs. FOA    0.9913    0.0087
Real vs. SOA    0.9857    0.0143
Real vs. TOA    0.9972    0.0028
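The HSD comparison for the 2 m means can be reproduced directly from the values reported in Table I. The sketch only checks pairwise differences against the quoted HSD of 1.423 m; it does not recompute the HSD itself, which would need the raw data. The helper name is illustrative.

```python
from itertools import combinations

HSD_M = 1.423  # value quoted in the text

# Mean perceived distances [m] at 2 m, as reported in Table I.
TABLE_I = {
    "speech": {"FOA": 1.119, "SOA": 1.389, "TOA": 0.841, "Real": 1.638},
    "noise": {"FOA": 0.877, "SOA": 1.001, "TOA": 0.902, "Real": 1.641},
}

def pairs_exceeding_hsd(means, hsd):
    """Return the pairs of conditions whose means differ by more than hsd."""
    return [
        (a, b)
        for a, b in combinations(means, 2)
        if abs(means[a] - means[b]) > hsd
    ]

for stimulus, means in TABLE_I.items():
    largest = max(means.values()) - min(means.values())
    print(stimulus, f"largest difference = {largest:.3f} m",
          pairs_exceeding_hsd(means, HSD_M))
```

No pair exceeds the HSD for either stimulus, which is the basis for treating the 2 m ANOVA result as a false alarm.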

Lastly, for all the distances, no synergetic effects of factors A (stimuli) and B (playback conditions) have been detected.

Additionally, we calculated correlation coefficients $\rho$ for pairs of distance estimations for real and virtual sound sources (either 1st, 2nd or 3rd order) and two stimuli. In all cases, high correlation coefficients have been obtained, which confirms our findings that, for these particular test conditions, the perception of distance of binaurally rendered Ambisonic sound fields of orders 1 to 3 cannot be distinguished from the perception of distance of the real sound sources.
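The reported $p$-values are consistent with the usual t-test on a Pearson correlation computed over the four presentation distances, which is assumed here rather than stated in the text. With four points the test has 2 degrees of freedom, and the two-sided $p$-value then reduces to $1 - |\rho|$. A minimal sketch, with illustrative function names:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def p_value_four_points(r):
    """Two-sided p-value of the t-test on r for n = 4 points, |r| < 1.

    t = |r| * sqrt((n - 2) / (1 - r^2)) with n - 2 = 2 degrees of freedom;
    for 2 degrees of freedom the two-sided tail is 1 - t / sqrt(t^2 + 2).
    """
    t = abs(r) * math.sqrt(2.0 / (1.0 - r * r))
    return 1.0 - t / math.sqrt(t * t + 2.0)

for label, r in (("Real vs. FOA", 0.9828), ("Real vs. SOA", 0.9960)):
    print(label, f"p = {p_value_four_points(r):.4f}")
```

With so few points, even correlations above 0.95 yield $p$-values only slightly below 0.05, so the coefficients should be read as descriptive rather than as strong evidence on their own.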

8. Discussion

The results presented for real sources corroborate the classic underestimation of source distance as reported in the literature. These results were used as a basis with which to measure the ability of Ambisonic sound fields of different orders to present sources at different distances. It was expected that a further underestimation of the source distance would ensue with the binaural rendering, as reported in [17]. However, this was not the case, even for first order presentations, and the apparent distances of the virtual sources matched the real source distances well. One should note that the major difference between this study and that of [17] is our use of head-tracking, indicating the importance of head-movements in perceiving source distance, which develops the findings of Waller [18] and Ashmead et al. [10] on user interaction in a virtual space. Further work is required to quantify the effect of this.

Moreover, the presented study demonstrates that the enhanced directional accuracy gained by presenting sound sources in HOA through head-tracked binaural rendering does not yield a significant improvement in the perception of the source distance. What is noteworthy is that for each order there is no significant difference in the perception of the source location when compared to real-world sources. We therefore conclude that sound field directionality for distance perception is sufficient with 1st order playback.

The presence of the ANOVA false alarm at the 2 m point is of interest. It is noteworthy that the 2 m point represents a source inside the virtual array geometry. It is a known issue that virtual sound sources rendered inside the array of loudspeakers cannot be reproduced in a straightforward way without artifacts. Some of these artifacts include incorrect wave-front curvature and insufficient bass boost.

In the first case, there is ample evidence in the literature to suggest that the wavefront curvature translates to significant binaural cues for sound sources near the head [30, 38]. It was already shown in section 2.1 that, as a source moves closer to the head, the levels of the monaural transfer function and the ILD both change significantly with source angle. However, this effect is not strong at 1 m and beyond. For sources further away, it has been shown in [39] that it is very difficult to assess distance by binaural cues alone.

In the second case, the requirement for distance compensation filtering due to near field effects for the large loudspeaker radius (3.27 m) and the given source distances (> 2 m) is only prominent below 100 Hz. For the female speech test stimuli this will not have an effect, since the first formant frequencies do not go down below 180 Hz. Also, the current method employed for capturing HRIRs allowed for reliably obtaining filters with a frequency response reaching down to around 170 Hz, thereby also band-limiting the delivery of the pink noise stimuli.
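A rough order-of-magnitude check of the 100 Hz figure is sketched below. It uses the first order proximity-effect corner frequency c / (2πr) of a point source at distance r and, as a rule-of-thumb assumption not taken from the text, scales it linearly with Ambisonic order. The speed of sound and the function name are also assumptions.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed, air at roughly 20 degrees Celsius

def near_field_corner_hz(distance_m, order=1, c=SPEED_OF_SOUND_M_S):
    """Approximate frequency below which near field bass boost is relevant.

    Rule of thumb: order * c / (2 * pi * distance). The linear scaling
    with order is an assumption used only for this estimate.
    """
    return order * c / (2.0 * math.pi * distance_m)

for distance_m in (2.0, 3.27):
    for order in (1, 2, 3):
        f = near_field_corner_hz(distance_m, order)
        print(f"r = {distance_m} m, order {order}: {f:.1f} Hz")
```

Under these assumptions, every combination of the nearest source distance, the loudspeaker radius and orders 1 to 3 gives a corner frequency below 100 Hz, in line with the statement above.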

Finally, there was no significant difference in the results presented for different sources, although the greater variance in the results for pink noise suggests that the familiarity of the source does indeed play a role in the perception of source distance, as mentioned in section 2.3. Future studies will investigate the use of these monaural cues further and will utilize 0th order sound field rendering, since it will remove the influence of any directional information.

Considering the aforementioned study of Bronkhorst et al. [14], where the accuracy of distance perception for binaural playback increases with the number of reflections, our findings demonstrate that the net effect of the monaural cues of direct to reverberant ratio, level difference and time of arrival of early reflections is of greater importance in distance perception for binaural rendering than Ambisonic directional accuracy beyond 1st order.

9. Conclusions

We have assessed, through subjective analysis, the perceived source distance in virtual Ambisonic sound fields in comparison to real world sources. The hypothesis tested was that enhanced directional accuracy of the deterministic part of the sound field may lead to better reconstruction of environmental depth and thus improve the perception of sound source distance. However, it was shown that Ambisonic reproduction matches the perceived real world source distances well even at 1st order, and no improvement in this regard was observed when increasing the order. It must be emphasized, though, that this analysis applies to Ambisonic-to-binaural decodes, with higher order synthesis achieved using the directional analysis method of [23]. Therefore, further work will examine this topic for loudspeaker reproduction, for both centre and off-centre listening, as well as investigate the effectiveness of HOA synthesis in comparison to real world HOA measurements.

Acknowledgments

The authors gratefully acknowledge the participation of the test subjects for both their time and constructive comments, as well as the technical support staff at the Department of Theatre, Film and Television at the University of York for their assistance in the experimental setups. This research is supported by Science Foundation Ireland.

References

[1] L. Fauster: Stereoscopic techniques in computer graphics. Technical paper, TU Wien, 2007.

[2] J. Lee: Head tracking for desktop VR displays using the Wii remote. http://johnnylee.net/projects/wii, accessed 30th Sept. 2011.

[3] D. R. Begault: Direct comparison of the impact of head tracking, reverberation, and individualized head-related transfer functions on the spatial perception of a virtual sound source. J. Audio Eng. Soc. 49 (2001) 904–916.

[4] M. Otani, T. Hirahara: Auditory artifacts due to switching head-related transfer functions of a dynamic virtual auditory display. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. E91-A (2008) 1320–1328.

[5] V. Pulkki: Virtual sound source positioning using Vector Base Amplitude Panning. J. Audio Eng. Soc. 45 (1997) 456–466.

[6] A. J. Berkhout: A Holographic Approach to Acoustic Control. J. Audio Eng. Soc. 36 (1988) 977–995.

[7] M. A. Gerzon: Periphony: With-height sound reproduction. J. Audio Eng. Soc. 21 (1973) 2–10.

[8] F. Rumsey: Spatial quality evaluation for reproduced sound: Terminology, meaning, and a scene-based paradigm. J. Audio Eng. Soc. 50 (2002) 651–666.

[9] J. Blauert: Communication acoustics. Springer, 2008.

[10] D. H. Ashmead, D. L. Davis, A. Northington: Contribution of listeners' approaching motion to auditory distance perception. J. Exp. Psy.: Hum. Percep. and Perform. 21 (1995) 239–256.

[11] E. Czerwinski, A. Voishvillo, S. Alexandrov, A. Terekhov: Propagation distortion in sound systems: Can we avoid it? J. Audio Eng. Soc. 48 (2000) 30–48.

[12] S. H. Nielsen: Auditory distance perception in different rooms. J. Audio Eng. Soc. 41 (1993) 755–770.

[13] M. B. Gardner: Distance estimation of 0° or apparent 0°-oriented speech signals in anechoic space. J. Acoust. Soc. Am. 45 (1969) 47–53.

[14] A. W. Bronkhorst, T. Houtgast: Auditory distance perception in rooms. Nature 397 (1999) 517–520.

[15] J. B. Allen, D. A. Berkley: Image method for efficiently simulating small-room acoustics. J. Acoust. Soc. Am. 65 (1979) 943–950.

[16] M. Rychtarikova, T. V. d. Bogaert, G. Vermeir, J. Wouters: Binaural sound source localization in real and virtual rooms. J. Audio Eng. Soc. 57 (2009) 205–220.

[17] J. S. Chan, C. Maguinness, D. Lisiecka, C. Ennis, M. Larkin, C. O'Sullivan, F. Newell: Comparing audiovisual distance perception in various real and virtual environments. Proc. of the 32nd Euro. Conf. on Vis. Percep., Regensburg, Germany, 2009.

[18] D. Waller: Factors affecting the perception of interobject distances in virtual environments. Presence: Teleoper. Virtual Environ. 8 (1999) 657–670.

[19] A. McKeag, D. McGrath: Sound field format to binaural decoder with head-tracking. Proc. of the 6th Australian Regional Convention of the AES, 1996.

[20] M. Noisternig, A. Sontacchi, T. Musil, R. Höldrich: A 3D Ambisonic based binaural sound reproduction system. Proc. of the 24th Int. Conf. of the Audio Eng. Soc., Alberta, Canada, 2003.

[21] B.-I. Dalenbäck, M. Strömberg: Real time walkthrough auralization - the first year. Proc. of the Inst. of Acous., Copenhagen, Denmark, 2006.

[22] C. Masterson, S. Adams, G. Kearney, F. Boland: A method for head related impulse response simplification. Proc. of the 17th European Signal Processing Conference (EUSIPCO), Glasgow, Scotland, 2009.

[23] J. Merimaa, V. Pulkki: Spatial impulse response rendering I: Analysis and synthesis. J. Audio Eng. Soc. 53 (2005).

[24] W. M. Hartmann: Localization of sound in rooms. J. Acoust. Soc. Am. 74 (1983) 1380–1391.

[25] D. Griesinger: Spatial impression and envelopment in small rooms. Proc. of the 103rd Conv. of the Audio Eng. Soc., New York, USA, 1997.

[26] G. Kearney, M. Gorzel, H. Rice, F. Boland: Depth perception in interactive virtual acoustic environments using higher order ambisonic soundfields. Proc. of the 2nd Int. Ambisonics Symp., Paris, France, 2010.

[27] A. Farina: Simultaneous measurement of impulse response and distortion with a swept-sine technique. Proc. of the 108th Conv. of the Audio Eng. Soc., Paris, France, 2000.

[28] M. Gerzon: The design of precisely coincident microphone arrays for stereo and surround sound. Proc. of the 50th Conv. of the Audio Eng. Soc., London, UK, 1975.

[29] C. Guastavino, B. F. G. Katz: Perceptual evaluation of multi-dimensional spatial audio reproduction. J. Acoust. Soc. Am. 116 (2004) 1105–1115.

[30] P. Zahorik: Assessing auditory distance perception using virtual acoustics. J. Acoust. Soc. Am. 111 (2002) 1832–1846.

[31] J. M. Loomis, R. L. Klatzky, J. W. Philbeck, R. G. Golledge: Assessing auditory distance perception using perceptually directed action. Perception and Psychophysics 60 (1998) 966–980.

[32] T. Y. Grechkin, T. D. Nguyen, J. M. Plumert, J. F. Cremer, J. K. Kearney: How does presentation method and measurement protocol affect distance estimation in real and virtual environments? ACM Trans. Appl. Percept. 7 (2010) 26:1–26:18.

[33] S. Bertet: Formats audio 3D hiérarchiques: Caractérisation objective et perceptive des systèmes ambisonics d'ordres supérieurs. PhD dissertation, INSA Lyon, 2008.

[34] W. M. Fisher, G. R. Doddington, K. M. Goudie-Marshall: The DARPA speech recognition research database: Specifications and status. Proc. of the DARPA Workshop on Speech Recognition, 1986.

[35] M. A. Gerzon, G. J. Barton: Ambisonic decoders for HDTV. Proc. of the 92nd Conv. of the Audio Eng. Soc., Vienna, Austria, 1992.

[36] NaturalPoint: TrackIR 5. http://www.naturalpoint.com/trackir, accessed 30th Sept. 2011.

[37] M. Puckette: Pure Data. http://puredata.info, accessed 30th Sept. 2011.

[38] P. Zahorik, D. S. Brungart, A. W. Bronkhorst: Auditory distance perception in humans: A summary of past and present research. Acta Acustica united with Acustica 91 (2005) 409–420.

[39] H. Wittek: Perceptual differences between Wavefield Synthesis and Stereophony. Department of Music and Sound Recording, School of Arts, Communication and Humanities, University of Surrey, UK, 2007.

71

Page 8: 1 ) -0 #0*- ! 1 ) #! !-* 0* ! ) *- ! ) ) ) *#! #0! *sigmedia/pmwiki/uploads/Main...# ( " &/4$/' ,$ +$ *- ! ) %- #! ! !- ) - 1 ) -0 #0*- ! 1 ) #! !-* 0* ! ) *- ! )) ) *#! #0! * K05

ACTA ACUSTICA UNITED WITH ACUSTICA Kearney et al Distance perceptionVol 98 (2012)

the test proceeded Similar to the training session no feed-back was given at any stage

During the first test phase participants had to indicatethe perceived distance for sound sources randomly lo-cated at 2m 4m 6m or 8m Taking into account thatboth speech and pink noise bursts samples were used (ina pseudo-random order) the number of trials in the firstphase added up to 8 Each subject performed all the trialsonly once

Upon completion of the first phase of the test there wasa short (approximately 2 minutes) interval that was re-quired in order to put on the headphones and calibratethe head-tracking system In phase II subjects were alsoasked to identify the sound source distance but this timeusing Ambisonic sound fields presented over headphonesOther than the fact that headphones and the head-trackingsystem were used the test protocol remained the same asin phase I However due to the fact that there were threeplayback configurations to be tested (1st 2nd and 3rd orderAmbisonics) participants had to perform 24 trails insteadof 8 Instead of separate phases for each Ambisonic orderall samples were randomly presented to the subject withinthe same test phase Again subjects performed all the tri-als only once and no feedback was given at any stage

7 Results

The perceived sound source distance (indicated by the dis-tance walked) was collected from 7 subjects for 4 presen-tation points (2m 4m 6m and 8m) two stimuli (femalespeech and pink noise bursts) and four playback options1st 2nd and 3rd Order Ambisonics and real loudspeakerswhich for analysis we will denote FOA SOA TOA andREAL respectively With headphone trials none of theparticipants reported in-head localization however therewere 3 cases were the proximity of the sound source wasvery apparent so participants decided not to move at allIn some cases the virtual sound source was initially local-ized behind the subjects but all participants were able toresolve the confusion by applying head-rotation

We computed the mean values of walked distances micro foreach test condition along with the corresponding standarderrors se(micro) The results are presented separately for eachstimulus type within 95 Confidence Intervals

As expected the perception of distance for the real sour-ces was more accurate for near sources Beyond 4m dis-tance perception was continuously underestimated whichis congruent with the previous studies outlined in sec-tion 2 Furthermore the standard deviation of localiza-tion increases as the source moves further into the diffusefield We also see that unfamiliar stimuli produce greatervariability in subjectsrsquo answers The mean localization ofthe virtual sources follows the reference source localiza-tion well The answers for virtual sources deviate fromtheir means roughly in the same fashion as the answers forreference sources as localization becomes more difficultwithin the diffuse field

Since the study followed the within-subject factorial de-sign with 2(stimuli)4(playback conditions) in order to

0 1 2 3 4 5 6 7 8 90

1

2

3

4

5

6

7

8

9

Real distance [m]

Dis

tance

walk

ed

[m]

FOASOATOAReal

Figure 8 Mean localization of real and virtual sound sources (fe-male speech)

0 1 2 3 4 5 6 7 8 90

1

2

3

4

5

6

7

8

9

Real distance [m]

Dis

tance

walk

ed

[m]

FOASOATOAReal

Figure 9 Mean localization of real and virtual sound sources(pink noise bursts)

investigate the effects of these two factors (referred lateras factors A and B) as well as potential interaction ef-fects for each presentation distance a two-way ANOVAhas been performed The null hypothesis being tested hereis that all the mean perceived distances for all the stimuliand playback methods do not differ significantly

H0 microFOA=microSOA=microTOA=microReal=micro

H1 not all localization means (microi) are the same

No statistically significant effect of stimuli (familiar vsunfamiliar) on the perception of distance has been found(F2m(3 48) = 0835 p = 0365 F4m(3 48) = 20462p = 0159 F6m(3 48) = 2575 p = 0115 F8m(3 48) =20462 p = 0159) For distances of 4m and moreplayback option had also no statistically significant effect(F4m(3 48) = 2192 p = 0101 F6m(3 48) = 0665p = 0577 F8m(3 48) = 0202 p = 0894)

However a statistically significant difference has beendetected for the distance of 2m In larger study de-signs with multiple levels it is advisable to use the Hon-

68

Kearney et al Distance perception ACTA ACUSTICA UNITED WITH ACUSTICAVol 98 (2012)

Table I Mean localization [m] of virtual and real sound sourcesat 2m

microFOA microSOA microTOA microReal

Speech 1119 1389 0841 1638Noise 0877 1001 0902 1641

Table II Correlation coefficients ρ and corresponding pminus valuesfor pairs of distance estimations for real and virtual soundsources (Speech)

ρ p minus valueReal vs FOA 09828 00172Real vs SOA 09960 00040Real vs TOA 09590 00410

Table III Correlation coefficients ρ and corresponding pminusvaluesfor pairs of distance estimations for real and virtual soundsources (Noise)

ρ p minus valueReal vs FOA 09913 00087Real vs SOA 09857 00143Real vs TOA 09972 00028

estly Significant Difference (HSD) approach since thereis an increased risk of spuriously significant differencearisen purely by chance So in order to investigate furtherwhere the difference occurs anHSD has been computed(HSD = 1423m) If we now compile the table of meanperceived distances for the sound sources located at 2mwe can see that all of the above values clearly lie withina single HSD to each other and cannot be distinguishedWe can safely assume then an ANOVA false alarm (typeI error) and no statistically significant effect of playbackmethod for the sources at the distance of 2m as well

Lastly for all the distances no synergetic effects of fac-tors A (stimuli) and B (playback conditions) have beendetected

Additionally we calculated correlation coefficients ρ forpairs of distance estimations for real and virtual soundsources (either 1st 2nd or 3rd order) and two stimuli Inall cases high correlation coefficients have been obtainedwhich confirms our findings that for these particular testconditions the perception of distance of binaurally ren-dered Ambisonic sound fields of orders 1 to 3 cannot bedistinguished from the perception of distance of the realsound sources

8 Discussion

The results presented for real sources corroborate the clas-sic underestimation of source distance as reported in theliterature These results were used as a basis with whichto measure the ability of Ambisonic sound fields of differ-ent orders to present sources at different distances It was

expected that a further underestimation of the source dis-tance would ensue with the binaural rendering as reportedin [17] However this was not the case even for first or-der presentations and the apparent distances of the vir-tual sources matched the real source distances well Oneshould note that the major difference between this studyand that of [17] is our use of head-tracking indicatingthe importance of head-movements in perceiving sourcedistance which develops the findings of Waller [18] andAshmead et al [10] on user interaction in a virtual spaceFurther work is required to quantify the effect of this

Moreover the presented study demonstrates that the en-hanced directional accuracy gained by presenting soundsources in HOA through head-tracked binaural renderingdoes not yield a significant improvement in the perceptionof the source distance What is noteworthy is that for eachorder there is no significant difference in the perception ofthe source location when compared to real-world sourcesWe therefore conclude that sound field directionality fordistance perception is sufficient with 1st order playback

The presence of the ANOVA false alarm at the 2m pointis of interest It is noteworthy that the 2m point representsa source inside the virtual array geometry It is a knownissue that virtual sound sources rendered inside the arrayof loudspeakers cannot be reproduced in a straightforwardway without artifacts Some of these artifacts include in-correct wave-front curvature and insufficient bass boost

In the first case, there is ample evidence in the literature to suggest that the wavefront curvature translates to significant binaural cues for sound sources near the head [30, 38]. It was already shown in Section 2.1 that, as a source moves closer to the head, the levels of the monaural transfer function and the ILD both change significantly with source angle. However, this effect is not strong at 1 m and beyond. For sources further away, it has been shown in [39] that it is very difficult to assess distance by binaural cues alone.

In the second case, the requirement for distance compensation filtering due to near field effects for the large loudspeaker radius (3.27 m) and the given source distances (>2 m) is only prominent below 100 Hz. For the female speech test stimuli, this will not have an effect, since the first formant frequencies do not go below 180 Hz. Also, the current method employed for capturing HRIRs allowed for reliably obtaining filters with a frequency response reaching down to around 170 Hz, thereby also band-limiting the delivery of the pink noise stimuli.
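To give a feel for the size of this low-frequency effect, the sketch below evaluates the textbook first-order near-field term 1 + 1/(jkr) for a point source at distance r, and takes the ratio between a source at 2 m and a loudspeaker array of radius 3.27 m. This is only an illustration under stated assumptions: the speed of sound (343 m/s), the restriction to the first-order term and the function names are choices made here, not the compensation filters of the original study. Higher-order components are generally affected more strongly, so the figures should be read as indicative only.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def near_field_gain(freq_hz, distance_m):
    """Magnitude of the first-order near-field term |1 + 1/(j k r)|."""
    kr = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND * distance_m
    return abs(1.0 + 1.0 / (1j * kr))


def compensation_db(freq_hz, source_m, array_radius_m):
    """Bass boost (dB) needed for a source at source_m rendered over
    loudspeakers at array_radius_m, first-order term only."""
    ratio = near_field_gain(freq_hz, source_m) / near_field_gain(freq_hz, array_radius_m)
    return 20.0 * math.log10(ratio)


# Source distance and array radius taken from the text
for freq in (20.0, 50.0, 100.0, 200.0, 500.0):
    boost = compensation_db(freq, source_m=2.0, array_radius_m=3.27)
    print(f"{freq:6.0f} Hz: {boost:5.2f} dB")
```

Under these assumptions the first-order boost is a few decibels at 20 Hz and falls to a fraction of a decibel by 100 Hz, in line with the statement that the effect matters mainly below 100 Hz.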

Finally, there was no significant difference in the results presented for different sources, although the greater variance in the results for pink noise suggests that the familiarity of the source does indeed play a role in the perception of source distance, as mentioned in Section 2.3. Future studies will investigate the use of these monaural cues further and will utilize 0th order sound field rendering, since it will remove the influence of any directional information.

Considering the aforementioned study of Bronkhorst and Houtgast [14], where the accuracy of distance perception for binaural playback increases with the number of reflections, our findings demonstrate that the net effect of the monaural cues of direct to reverberant ratio, level difference and time of arrival of early reflections is of greater importance in distance perception for binaural rendering than Ambisonic directional accuracy beyond 1st order.
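As an illustration of one of the monaural cues named above, the following sketch estimates a direct to reverberant ratio from an impulse response by splitting the energy around the direct-sound peak. The 2.5 ms window, the function names and the toy impulse response (a 1/r direct impulse followed by a distance-independent decaying tail) are all assumptions made for this example; none of them is taken from the measurements in this study.

```python
import math


def direct_to_reverberant_db(ir, fs, window_ms=2.5):
    """Direct to reverberant ratio in dB.

    The direct part is the energy within +/- window_ms of the largest
    peak; everything after that window counts as reverberant.
    """
    peak = max(range(len(ir)), key=lambda i: abs(ir[i]))
    half = int(round(window_ms * 1e-3 * fs))
    start, stop = max(0, peak - half), min(len(ir), peak + half + 1)
    direct = sum(x * x for x in ir[start:stop])
    reverberant = sum(x * x for x in ir[stop:])
    return 10.0 * math.log10(direct / reverberant)


def toy_impulse_response(distance_m, length=4000):
    """Hypothetical response: direct level falls as 1/r, tail stays fixed."""
    ir = [0.0] * length
    ir[10] = 1.0 / distance_m
    for n in range(80, length):
        sign = 1.0 if n % 2 == 0 else -1.0
        ir[n] = 0.05 * sign * math.exp(-(n - 80) / 800.0)
    return ir


FS = 8000  # Hz, assumed sample rate for the toy example
for distance in (2.0, 4.0, 8.0):
    drr = direct_to_reverberant_db(toy_impulse_response(distance), FS)
    print(f"{distance:.0f} m: DRR = {drr:.1f} dB")
```

In this idealised model the ratio drops by about 6 dB per doubling of distance, which is why it can act as a distance cue that does not depend on directional resolution.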

9. Conclusions

We have assessed, through subjective analysis, the perceived source distance in virtual Ambisonic sound fields in comparison to real world sources. The hypothesis tested was that enhanced directional accuracy of the deterministic part of the sound field may lead to better reconstruction of environmental depth and thus improve the perception of sound source distance. However, it was shown that Ambisonic reproduction matches the perceived real world source distances well even at 1st order, and no improvement in this regard was observed when increasing the order. It must be emphasized, though, that this analysis applies to Ambisonic-to-binaural decodes with higher order synthesis achieved using the directional analysis method of [23]. Therefore, further work will examine this topic for loudspeaker reproduction, for both centre and off-centre listening, as well as investigate the effectiveness of HOA synthesis in comparison to real world HOA measurements.

Acknowledgments

The authors gratefully acknowledge the participation of the test subjects, for both their time and constructive comments, as well as the technical support staff at the Department of Theatre, Film and Television at the University of York for their assistance in the experimental setups. This research is supported by Science Foundation Ireland.

References

[1] L. Fauster: Stereoscopic techniques in computer graphics. Technical paper, TU Wien, 2007.

[2] J. Lee: Head tracking for desktop VR displays using the Wii remote. http://johnnylee.net/projects/wii, accessed 30th Sept. 2011.

[3] D. R. Begault: Direct comparison of the impact of head tracking, reverberation, and individualized head-related transfer functions on the spatial perception of a virtual sound source. J. Audio Eng. Soc. 49 (2001) 904–916.

[4] M. Otani, T. Hirahara: Auditory artifacts due to switching head-related transfer functions of a dynamic virtual auditory display. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. E91-A (2008) 1320–1328.

[5] V. Pulkki: Virtual sound source positioning using Vector Base Amplitude Panning. J. Audio Eng. Soc. 45 (1997) 456–466.

[6] A. J. Berkhout: A Holographic Approach to Acoustic Control. J. Audio Eng. Soc. 36 (1988) 977–995.

[7] M. A. Gerzon: Periphony: With-height sound reproduction. J. Audio Eng. Soc. 21 (1973) 2–10.

[8] F. Rumsey: Spatial quality evaluation for reproduced sound: Terminology, meaning, and a scene-based paradigm. J. Audio Eng. Soc. 50 (2002) 651–666.

[9] J. Blauert: Communication acoustics. Springer, 2008.

[10] D. H. Ashmead, D. L. Davis, A. Northington: Contribution of listeners' approaching motion to auditory distance perception. J. Exp. Psy.: Hum. Percep. and Perform. 21 (1995) 239–256.

[11] E. Czerwinski, A. Voishvillo, S. Alexandrov, A. Terekhov: Propagation distortion in sound systems: Can we avoid it? J. Audio Eng. Soc. 48 (2000) 30–48.

[12] S. H. Nielsen: Auditory distance perception in different rooms. J. Audio Eng. Soc. 41 (1993) 755–770.

[13] M. B. Gardner: Distance estimation of 0° or apparent 0°-oriented speech signals in anechoic space. J. Acoust. Soc. Am. 45 (1969) 47–53.

[14] A. W. Bronkhorst, T. Houtgast: Auditory distance perception in rooms. Nature 397 (1999) 517–520.

[15] J. B. Allen, D. A. Berkley: Image method for efficiently simulating small-room acoustics. J. Acoust. Soc. Am. 65 (1979) 943–950.

[16] M. Rychtarikova, T. V. d. Bogaert, G. Vermeir, J. Wouters: Binaural sound source localization in real and virtual rooms. J. Audio Eng. Soc. 57 (2009) 205–220.

[17] J. S. Chan, C. Maguinness, D. Lisiecka, C. Ennis, M. Larkin, C. O'Sullivan, F. Newell: Comparing audiovisual distance perception in various real and virtual environments. Proc. of the 32nd Euro. Conf. on Vis. Percep., Regensburg, Germany, 2009.

[18] D. Waller: Factors affecting the perception of interobject distances in virtual environments. Presence: Teleoper. Virtual Environ. 8 (1999) 657–670.

[19] A. McKeag, D. McGrath: Sound field format to binaural decoder with head-tracking. Proc. of the 6th Australian Regional Convention of the AES, 1996.

[20] M. Noisternig, A. Sontacchi, T. Musil, R. Höldrich: A 3D Ambisonic based binaural sound reproduction system. Proc. of the 24th Int. Conf. of the Audio Eng. Soc., Alberta, Canada, 2003.

[21] B.-I. Dalenbäck, M. Strömberg: Real time walkthrough auralization - the first year. Proc. of the Inst. of Acous., Copenhagen, Denmark, 2006.

[22] C. Masterson, S. Adams, G. Kearney, F. Boland: A method for head related impulse response simplification. Proc. of the 17th European Signal Processing Conference (EUSIPCO), Glasgow, Scotland, 2009.

[23] J. Merimaa, V. Pulkki: Spatial impulse response rendering I: Analysis and synthesis. J. Audio Eng. Soc. 53 (2005).

[24] W. M. Hartmann: Localization of sound in rooms. J. Acoust. Soc. Am. 74 (1983) 1380–1391.

[25] D. Griesinger: Spatial impression and envelopment in small rooms. Proc. of the 103rd Conv. of the Audio Eng. Soc., New York, USA, 1997.

[26] G. Kearney, M. Gorzel, H. Rice, F. Boland: Depth perception in interactive virtual acoustic environments using higher order ambisonic soundfields. Proc. of the 2nd Int. Ambisonics Symp., Paris, France, 2010.

[27] A. Farina: Simultaneous measurement of impulse response and distortion with a swept-sine technique. Proc. of the 108th Conv. of the Audio Eng. Soc., Paris, France, 2000.

[28] M. Gerzon: The design of precisely coincident microphone arrays for stereo and surround sound. Proc. of the 50th Conv. of the Audio Eng. Soc., London, UK, 1975.

[29] C. Guastavino, B. F. G. Katz: Perceptual evaluation of multi-dimensional spatial audio reproduction. J. Acoust. Soc. Am. 116 (2004) 1105–1115.

[30] P. Zahorik: Assessing auditory distance perception using virtual acoustics. J. Acoust. Soc. Am. 111 (2002) 1832–1846.

[31] J. M. Loomis, R. L. Klatzky, J. W. Philbeck, R. G. Golledge: Assessing auditory distance perception using perceptually directed action. Perception and Psychophysics 60 (1998) 966–980.

[32] T. Y. Grechkin, T. D. Nguyen, J. M. Plumert, J. F. Cremer, J. K. Kearney: How does presentation method and measurement protocol affect distance estimation in real and virtual environments? ACM Trans. Appl. Percept. 7 (2010) 26:1–26:18.

[33] S. Bertet: Formats audio 3D hiérarchiques: Caractérisation objective et perceptive des systèmes ambisonics d'ordres supérieurs. PhD dissertation, INSA Lyon, 2008.

[34] W. M. Fisher, G. R. Doddington, K. M. Goudie-Marshall: The DARPA speech recognition research database: Specifications and status. Proc. of the DARPA Workshop on Speech Recognition, 1986.

[35] M. A. Gerzon, G. J. Barton: Ambisonic decoders for HDTV. Proc. of the 92nd Conv. of the Audio Eng. Soc., Vienna, Austria, 1992.

[36] NaturalPoint: Trackir 5. http://www.naturalpoint.com/trackir, accessed 30th Sept. 2011.

[37] M. Puckette: Pure data. http://puredata.info, accessed 30th Sept. 2011.

[38] P. Zahorik, D. S. Brungart, A. W. Bronkhorst: Auditory distance perception in humans: A summary of past and present research. Acta Acustica united with Acustica 91 (2005) 409–420.

[39] H. Wittek: Perceptual differences between Wavefield Synthesis and Stereophony. Department of Music and Sound Recording, School of Arts, Communication and Humanities, University of Surrey, UK, 2007.

Table I. Mean localization [m] of virtual and real sound sources at 2 m.

          µFOA    µSOA    µTOA    µReal
Speech    1.119   1.389   0.841   1.638
Noise     0.877   1.001   0.902   1.641

Table II. Correlation coefficients ρ and corresponding p-values for pairs of distance estimations for real and virtual sound sources (Speech).

                ρ        p-value
Real vs. FOA    0.9828   0.0172
Real vs. SOA    0.9960   0.0040
Real vs. TOA    0.9590   0.0410

Table III. Correlation coefficients ρ and corresponding p-values for pairs of distance estimations for real and virtual sound sources (Noise).

                ρ        p-value
Real vs. FOA    0.9913   0.0087
Real vs. SOA    0.9857   0.0143
Real vs. TOA    0.9972   0.0028

estly Significant Difference (HSD) approach, since there is an increased risk of a spuriously significant difference arising purely by chance. So, in order to investigate further where the difference occurs, an HSD has been computed (HSD = 1.423 m). If we now compile the table of mean perceived distances for the sound sources located at 2 m, we can see that all of the above values clearly lie within a single HSD of each other and cannot be distinguished. We can then safely assume an ANOVA false alarm (type I error) and no statistically significant effect of playback method for the sources at the distance of 2 m as well.
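The comparison described above can be sketched as a simple pairwise check of the Table I means against the reported HSD. The values are read from Table I as metres with three decimal places, and HSD = 1.423 m is taken from the text; the function and variable names are illustrative only, and the check is a reading aid rather than a re-run of the authors' ANOVA.

```python
from itertools import combinations

HSD_M = 1.423  # reported Honestly Significant Difference, in metres

# Mean perceived distance (m) for sources at 2 m, from Table I
MEAN_PERCEIVED_M = {
    "Speech": {"FOA": 1.119, "SOA": 1.389, "TOA": 0.841, "Real": 1.638},
    "Noise": {"FOA": 0.877, "SOA": 1.001, "TOA": 0.902, "Real": 1.641},
}


def pairwise_differences(means):
    """Absolute difference between every pair of playback conditions."""
    return {(a, b): abs(means[a] - means[b]) for a, b in combinations(means, 2)}


def pairs_beyond_hsd(means, hsd):
    """Pairs of conditions whose means differ by more than one HSD."""
    return [pair for pair, diff in pairwise_differences(means).items() if diff > hsd]


for stimulus, means in MEAN_PERCEIVED_M.items():
    diffs = pairwise_differences(means)
    worst = max(diffs, key=diffs.get)
    print(f"{stimulus}: largest gap {diffs[worst]:.3f} m ({worst[0]} vs. {worst[1]}), "
          f"pairs beyond HSD: {pairs_beyond_hsd(means, HSD_M)}")
```

The largest gap in either row is roughly 0.8 m, well inside the 1.423 m HSD, so no pair of playback conditions is separated by this criterion.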

Lastly, for all the distances, no synergetic effects of factors A (stimuli) and B (playback conditions) have been detected.
