development of a test environment to evaluate performance ... · leads to a larger change in speech...

15
J Am Acad Audiol 16:27–41 (2005) 27 *Auditory Research Department, Sonic Innovations, Inc.; †Department of Audiology and Speech-Language Pathology, David O. McKay School of Education, Brigham Young University Michael J. Nilsson, Ph.D., Sonic Innovations, Inc., Hearing Aid Research Laboratory, 2795 East Cottonwood Parkway, Suite 100, Salt Lake City, Utah 84121; Phone: 801-365-2871; Fax: 801-365-3005 Development of a Test Environment to Evaluate Performance of Modern Hearing Aid Features Michael Nilsson* Robert M. Ghent* Victor Bray* Richard Harris† Abstract This article describes a new test environment and materials with the potential to measure performance differences among different hearing aid signal pro- cessing methods and features. Normative data suggest a linearly predictable increase in difficulty on a speech-in-noise task as the masker changes from random noise to multiple-talker speech, and the number of talkers increases. Data collected with normal listeners revealed no differences across four test sites for the single loudspeaker (Noise-Front) results and some differences across test sites for the multiple loudspeaker results. Room dimension differ- ences among audiometric test enclosures and diffusion (or lack thereof) of the maskers in the vertical and horizontal dimensions of the sound fields appear to account for performance differences for the multiloudspeaker arrays, con- firming the need to limit maskers to aperiodic signals in rooms with controlled ceiling height or to establish norms for each test environment such that results obtained in different enclosures can be compared. Key Words: Diffuse field, directional microphone, hearing aid, HINT, multi- ple-talker speech, noise reduction, random noise, sound field, speech testing, verification Abbreviations: 2DD = two-dimensionally diffuse; ANSI = American National Standards Institute; BYU = (test site at) Brigham Young University; CCF = (test site at) Cleveland Clinic Foundation; DAW = digital audio workstation; HIA = Hearing Industries Association; HINT = Hearing in Noise Test; ISO = International Organization for Standardization; ITU = International Telecommunications Union; LTAS = long-term average spectrum; MANOVA= multivariate analysis of variance; MTS = multiple-talker speech; P-I = performance-intensity; rms = root-mean-square; RTS = reception thresholds for sentences; SACD = super audio compact disc; SLC = (test site at) Salt Lake City; SNR = signal-to-noise ratio; SPIN = Speech Perception in Noise test; WAV = digital audio files in Windows Media format; WUSM = (test site at) Washington University School of Medicine Sumario Este artículo describe un nuevo ambiente de evaluación y los materiales con el potencial de medir las diferencias de desempeño entre diferentes métodos de procesamiento de la señal para auxiliares auditivos, y sus características. Los datos normativos sugieren un incremento linealmente predecible en la dificultad de una tarea de lenguaje en ruido, conforme el enmascarador cambia de ruido aleatorio a ruido de lenguaje de hablantes múltiples, y conforme aumenta el número de hablantes. La información colectada con sujetos normo-oyentes no reveló diferencias en los cuatro sitios de evaluación

Upload: others

Post on 30-May-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Development of a Test Environment to Evaluate Performance ... · leads to a larger change in speech recognition. The P-I function for the HINT is an 8.9% improvement in speech recognition

J Am Acad Audiol 16:27–41 (2005)

27

*Auditory Research Department, Sonic Innovations, Inc.; †Department of Audiology and Speech-Language Pathology, David O.McKay School of Education, Brigham Young University

Michael J. Nilsson, Ph.D., Sonic Innovations, Inc., Hearing Aid Research Laboratory, 2795 East Cottonwood Parkway, Suite100, Salt Lake City, Utah 84121; Phone: 801-365-2871; Fax: 801-365-3005

Development of a Test Environment toEvaluate Performance of Modern HearingAid Features

Michael Nilsson*Robert M. Ghent*Victor Bray*Richard Harris†

Abstract

This article describes a new test environment and materials with the potentialto measure performance differences among different hearing aid signal pro-cessing methods and features. Normative data suggest a linearly predictableincrease in difficulty on a speech-in-noise task as the masker changes fromrandom noise to multiple-talker speech, and the number of talkers increases. Data collected with normal listeners revealed no differences across four testsites for the single loudspeaker (Noise-Front) results and some differencesacross test sites for the multiple loudspeaker results. Room dimension differ-ences among audiometric test enclosures and diffusion (or lack thereof) of themaskers in the vertical and horizontal dimensions of the sound fields appearto account for performance differences for the multiloudspeaker arrays, con-firming the need to limit maskers to aperiodic signals in rooms with controlledceiling height or to establish norms for each test environment such that resultsobtained in different enclosures can be compared.

Key Words: Diffuse field, directional microphone, hearing aid, HINT, multi-ple-talker speech, noise reduction, random noise, sound field, speech testing,verification

Abbreviations: 2DD = two-dimensionally diffuse; ANSI = American NationalStandards Institute; BYU = (test site at) Brigham Young University; CCF = (testsite at) Cleveland Clinic Foundation; DAW = digital audio workstation; HIA =Hearing Industries Association; HINT = Hearing in Noise Test; ISO = InternationalOrganization for Standardization; ITU = International TelecommunicationsUnion; LTAS = long-term average spectrum; MANOVA = multivariate analysisof variance; MTS = multiple-talker speech; P-I = performance-intensity; rms =root-mean-square; RTS = reception thresholds for sentences; SACD = superaudio compact disc; SLC = (test site at) Salt Lake City; SNR = signal-to-noiseratio; SPIN = Speech Perception in Noise test; WAV = digital audio files inWindows Media format; WUSM = (test site at) Washington University Schoolof Medicine

Sumario

Este artículo describe un nuevo ambiente de evaluación y los materiales conel potencial de medir las diferencias de desempeño entre diferentes métodosde procesamiento de la señal para auxiliares auditivos, y sus características.Los datos normativos sugieren un incremento linealmente predecible en ladificultad de una tarea de lenguaje en ruido, conforme el enmascaradorcambia de ruido aleatorio a ruido de lenguaje de hablantes múltiples, yconforme aumenta el número de hablantes. La información colectada consujetos normo-oyentes no reveló diferencias en los cuatro sitios de evaluación

Page 2: Development of a Test Environment to Evaluate Performance ... · leads to a larger change in speech recognition. The P-I function for the HINT is an 8.9% improvement in speech recognition

28

JJoouurrnnaall ooff tthhee AAmmeerriiccaann AAccaaddeemmyy ooff AAuuddiioollooggyy/Volume 16, Number 1, 2005

Laboratory studies of advanced signalprocessing hearing aids typicallyevaluate benefit for speech recognition

in noise (speech-in-noise) with respect to thefactors of audibility, directionality, and noisereduction. Many recent studies have notshown performance differences associatedwith these signal processing parameters(Elberling et al, 1993; Ludvigsen et al, 1993).This may be due to the lack of true benefitassociated with the particular parameter,the lack of sensitivity to parameterperformance in the test environment, or acombination of both. The purpose of thisexperiment was to test the design of acontrolled, sensitive environment with thepotential for isolating and quantifying benefitattributable to the advanced parametersfound in today’s hearing aids. Additionalmotivation comes from the establishment ofguidelines by the HIA (Hearing IndustriesAssociation) for the substantiation ofadvertising claims made by hearing aidmanufacturers (2003). Such a test systemcould provide a valid, repeatable, norm-referenced baseline method capable of

measuring benefit for a wide range oftechnologies without unduly biasing results.

Many competing requirements must beconsidered when designing test materials toassess human performance with hearing aids.Among these are the general experimentaldesign requirements of reliability, sensitivity,and validity. In addition, when quantifyingspeech recognition in noise (especially withadvanced hearing aids), temporalcharacteristics (used by noise reductionalgorithms) and spatial characteristics (usedby directional microphone systems) must becontrolled or consistently manipulated toallow for parsing of individual contributorsto performance.

The test environment described in thepresent paper was designed around theHearing in Noise Test (HINT), a well-known,norm-referenced, reliable, and valid test ofspeech recognition in quiet and in noise(Nilsson et al, 1994). The HINT is aprerecorded test used to measure the RTS(reception thresholds for sentences) in quietor in noise. The 250 sentences comprisingthe HINT are of approximately equal length

para los resultados de parlante único (ruido al frente), y sí algunas diferenciasen los resultados de los lugares de evaluación con parlantes múltiples. Lasdiferencias de las dimensiones de la cabina en los espacios de evaluaciónaudiométrica y la difusión (o la ausencia de ella) de los enmascaradorescolocados en la dimensión vertical y horizontal del campo sonoro, parecengenerar diferencias de desempeño en el arreglo multiparlante. Esto confirmala necesidad de limitar los enmascaradores a señales aperiódicas, en cabinascon altura superior controlada, o de establecer normas para cada ambientede evaluación, para que los resultados obtenidos en diferentes espaciospuedan ser comparados.

Palabras Clave: Campo difuso, micrófono direccional, auxiliar auditivo, HINT,lenguaje de hablantes múltiples, reducción del ruido, ruido aleatorio, camposonoro, evaluación del lenguaje, verificación

Abreviaturas: 2DD = difusión bi-dimensional; ANSI = Instituto NacionalAmericano de Estándares; BYU = (sitio de evaluación en la) UniversidadBrigham Young; CCF = (sitio de evaluación en la) Fundación Clínica Cleveland;DAW = Estación de audio digital; HIA = Asociación de Industrias de la Audición;NINT = Prueba de Audición en Ruido; ISO = Organización Internacional deEstandarización; ITU = Sindicato Internacional de Telecomunicaciones; LTAS= espectro de promediación a largo plazo; MANOVA = análisis multivariadode variancia; MTS = Lenguaje de hablantes múltiples; P-I = Desempeño-Intensidad; rms = raíz media cuadrada; RTS = Umbral de recepción de frases;SACD = super disco compacto de audio; SLC = (sitio de evaluación en) SaltLake City; SNR = tasa señal-ruido; SPIN = Prueba de Percepción de Lenguajeen Ruido; WAV = archivos digitales de audio en el formato Windows Media;WUSM = (sitio de evaluación en la) Escuela de Medicina de la Universidadde Washington

Page 3: Development of a Test Environment to Evaluate Performance ... · leads to a larger change in speech recognition. The P-I function for the HINT is an 8.9% improvement in speech recognition

DDeevveellooppmmeenntt ooff aa TTeesstt EEnnvviirroonnmmeenntt/Nilsson et al

29

(five to seven syllables) and difficulty (first-grade reading level). They were spoken by amale voice actor and digitally recorded. Thesentences were equated for difficulty forpresentation in quiet or in noise and groupedinto phonemically balanced lists of 10 or 20sentences, providing 25 equivalent 10-sentence lists or 12 equivalent 20-sentencelists (Hanks and Johnson, 1998). Thetemporal and spectral characteristics of theoriginal masking noise are stationary. Thenoise spectrum matches the long-termaverage spectrum (LTAS) of the sentences,thereby maintaining a steep P-I(performance-intensity) function (Studebakeret al, 1987).

The choice of number and length of listsdepends on the need for measurementaccuracy and testing time. RTSs in noise aremeasured by fixing the noise level andadaptively varying the sentence level until thesentences can be repeated correctly 50% of thetime. RTSs in quiet are obtained using thesame adaptive technique with an equivalentnoise level of zero. The adaptive procedureavoids nonlinear ceiling and floor effectscustomarily associated with intelligibilitytests given at fixed presentation levels, andoptimizes the statistical efficiency of the test(Levitt, 1978). A threshold measurement witha single list usually takes less than twominutes, contributing to the clinical feasibilityof the test.

The reliability and sensitivity of the testhave been established through use of within-subject repeated measurements of thresholdswith different lists (Soli and Nilsson, 1997).The reliability of the HINT comparesfavorably with a set of Dutch materials(Plomp and Mimpen, 1979) and demonstratesimproved reliability over the SPIN (SpeechPerception in Noise test) sentences (Gelfandet al, 1988). A P-I function defines thesensitivity of a test: A steeper slope meansthat a smaller change in presentation levelleads to a larger change in speech recognition.The P-I function for the HINT is an 8.9%improvement in speech recognition for each1 dB change in presentation level in quiet andnoise when testing both normal-hearing andhearing-impaired listeners (Soli and Nilsson,1997). The slope of the P-I function is closerthan other sentence materials to previouslyreported P-I functions that have beenmeasured with word lists (Studebaker andSherbecoe, 1991). The well-controlled length

and simplicity of the sentences limit thecontextual effects that account for the steeperfunctions shown with continuous discourse(Bronkhorst and Plomp, 1989). A tradeoffbetween reliability and realistic quality ofthe materials forces the sacrifice of one (theuse of continuous discourse andconversational speech) for the other (improvedreliability and sensitivity).

Digital noise reduction algorithms inhearing aids monitor auditory signals overtime, seeking out steady-state energy withinthe continuous waveform (Chabries and Bray,2002). There are two implications for testingspeech recognition in noise with noisereduction algorithms. First, the test materialsmust allow the noise reduction algorithmsufficient time to sample the signal for noise.This requires that any masker must precedethe target signal by sufficient time to allowthe noise reduction algorithm to engage andstabilize (Nilsson et al, 2000).

Second, noise reduction algorithms act onthe more-or-less steady-state elements of asignal, requiring maskers to contain somesteady-state energy. There are many steady-state maskers found in the real world (roador traffic noise, fan or motor noise, airplanenoise, or crowd noise) as well as manymodulated maskers (speech). Specifying onetype of masker as representative of allenvironments is artificial, no matter whichmasker is chosen; instead, maskers should bechosen for their efficiency or for their physicalproperties to trigger specific features. Theclinically reasonable solution is to samplemaskers that extend over a range ofconditions to compare and contrast theproperties of interest. By defining the hardestcondition, or even the easiest, laboratorytesting can set boundaries to understand thepotential of a system, with real-world benefitoccurring somewhere within a defined range.This must be done with well-controlledstimuli in order to increase sensitivity andmust be correlated with patients’ experiencesin the real world.

Directional microphones used in analogand digital hearing aids require spatialseparation between the target signal and themasker in order for benefit to be realized,requiring use of multiple loudspeakers inthe test environment. This allows the targetsignal to be delivered from in front of thesubject (0˚ azimuth, assuming listeners facethe person with whom they are talking) with

Page 4: Development of a Test Environment to Evaluate Performance ... · leads to a larger change in speech recognition. The P-I function for the HINT is an 8.9% improvement in speech recognition

JJoouurrnnaall ooff tthhee AAmmeerriiccaann AAccaaddeemmyy ooff AAuuddiioollooggyy/Volume 16, Number 1, 2005

30

the competing masker coming from otherlocations, generally from the sides and backof the subject. Various configurations of noisesources have been reported (Leeuw andDreschler, 1991; Valente et al, 1995; Voss,1997; Ricketts and Dhar, 1999; Wouters et al,1999; Ricketts, 2000; Revit et al, 2002), butwhile more complex sound fields may betterrepresent the environments encountered inreal-world listening situations, complex soundfields are very difficult to replicate in clinicalsettings.

Limitations must be considered whengeneralizing laboratory soundfield testconditions to real-world hearing aidperformance. Listeners are often exposed toenvironments where maskers have broadspectra and/or modulated properties (e.g.,cocktail party noise). However, manyenvironmental maskers may be steady stateand have a limited bandwidth (e.g., motors,fans, road noise). Listeners are often in diffuseor reverberant fields where there is as muchnoise in the foreground as in thebackground—few environmental maskersare positioned ideally at or near the polarresponse nulls of directional microphones—and significant noise from many directionssimultaneously may confuse adaptivedirectional microphone systems, which resultsin a fixed directional response from thesystem.

The test environment must reflect itsclinical purpose: verification, validation, orboth. Verification entails measurement ofbehavioral performance and quantifiablebenefit from the individual hearing aidfeatures. Such a test environment must berepeatable and replicable, well controlledand clinically valid—established in line withaccepted norms. For the purposes ofvalidation, such a system should be able torepresent as nearly as possible the real-worldlistening environments in which individualpatients might reasonably be expected tocommunicate, where the patient’s unaidedperformance in the same environment servesas the individual’s control. In this case, thetest environment need not be replicable orwell controlled, and by individual necessity,norms do not exist (Anastasi, 1988).

We propose a test environment thatmeets the rigors of clinical verification butmay also be used to validate hearing aidfittings. In evaluating advanced hearing aidswith noise reduction and/or directional

microphones, the challenge is to develop alaboratory test that provides careful controlover both temporal and spatial variables.This paper describes the development of thematerials, equipment, and environment forsuch a test, and reports on normativeperformance. A follow-up paper in preparationdescribes unaided and aided performance inthe same test environment with hearing-impaired listeners.

The identical test configuration wasinstalled at four sites. Normative data werecollected at all sites to evaluate theequivalency of the test configuration and toallow comparisons of data collected fromsamples of different normal-hearingpopulations. To accommodate the operationalrequirements of the noise reduction anddirectional features of modern hearing aids,the HINT masker was modified for noiseonset time, diffusion, and multisignalcorrelation. A new set of maskers was createdusing multiple-talker speech. Details of thesemodifications are found in the “Methods”section below.

MMEETTHHOODDSS

TTeesstt MMaatteerriiaallss

HHeeaarriinngg--iinn--NNooiissee TTeesstt

The HINT masking noise is shaped tomatch the LTAS of the male talker in anattempt to reduce auditory cues elicited byspectral differences between the target signaland the masker (Nilsson et al, 1994). Severalmodifications were made to the HINTmasking noise for the present study.

The first modification was to move thestart of the noise forward in time relative tothe start of each sentence by increasing thetime interval between the onset of the noiseand the onset of the sentences from 0.5seconds to approximately 5.0 seconds. (Theoriginal intention of this modification wasto allow time for single-microphone noisereduction algorithms to engage, but effects onnormal performance were also observed andwill be discussed later.) Evaluation of single-microphone adaptive noise reductionalgorithms can be accomplished with a singleloudspeaker in this design.

The second and third modifications to

Page 5: Development of a Test Environment to Evaluate Performance ... · leads to a larger change in speech recognition. The P-I function for the HINT is an 8.9% improvement in speech recognition

DDeevveellooppmmeenntt ooff aa TTeesstt EEnnvviirroonnmmeenntt/Nilsson et al

31

the HINT noise involved the creation ofdiffuse, correlated and diffuse, uncorrelatedmasker tracks to be reproducedsimultaneously from four differentloudspeakers arranged in a soundfield array.A single noise signal delivered through all fourloudspeakers simultaneously creates acorrelated noise field. Four separate recordingtracks of noise, each having a differentstarting point when sampled from the originalHINT noise recording and delivered throughthe individual loudspeakers surroundingeach listener, create an uncorrelated noisefield.

Binaural presentation of correlatedsignals in a sound field or under headphonescreates a fused auditory image in the centerof a listener’s head, whereas uncorrelatedmaskers are perceived as originating outsidea listener’s head. These phenomena (well-known and described in myriad otherreferences) were confirmed in the presenttest environment using the HINT maskingnoise. Data were collected with normallisteners using the correlated noise presentedfrom multiple loudspeakers to compare itseffect on speech-in-noise performance to thatof uncorrelated noise in the same testenvironment.

A new set of maskers was created bycombining recordings of uncorrelatedmultiple-talker speech, with four, eight,twelve, and sixteen individual talkers. Thismasking condition adds the variable oftemporal modulation across differentnumbers of talkers. Details on the creationof these new maskers follow.

MMuullttiippllee--TTaallkkeerr SSppeeeecchh

Recordings of eight young adult maleand eight young adult female talkers wererandomly selected from a sample of 50individual digital recordings of each gender(100 total recordings) reading the “TelevisionPassage” (Cox and Moore, 1988). The originalrecordings had been made in an anechoicroom with a free-field microphone positioned30 cm away from each talker (Layton, 1991).

All of the original recordings weretransferred onto a computer hard disk andconverted to Windows Media format digitalaudio (WAV) files at 16-bit resolution and asampling rate of 44.1 kHz using Sound Forge4.5 (Sonic Foundry). The files remained in thedigital domain, at this bit resolution and

sample rate, throughout editing, mixing, filetransfer, and final reproduction for subjecttesting.

Editing was accomplished using CoolEdit Pro 1.2 (Syntrillium Software). Eachpassage was edited to remove pauses, breaths,and discontinuities (>75 msec) in order tocreate continuous, intelligible, runningspeech. Temporal gaps due to the morphemic,phonemic, and prosodic features of speechwere retained.

Since the original masking noise of theHINT was filtered to match the LTAS of themale talker for the target sentences, therecordings of the individual talkers werelikewise filtered to approximate the LTAS ofthe HINT noise. The result of this filteringwas to remove from the recording of eachtalker any energy not present in the HINTnoise spectrum. However, energy present inthe HINT noise spectrum (predominantlylow-frequency) was not added to the spectraof the talkers where missing. This producedsmall changes to the sibilant energy for eachtalker but did not alter the formant structure,such that individual talkers could still beuniquely identified. As the talkers weresummed to create the multiple-talker files,the resulting waveforms and spectra becameincreasingly like those of the HINT noise(Figures 1 and 2).

The recorded passage was uncorrelatedamong the 16 talkers by setting the startingpoint for each talker at a different positionwithin the passage. The individual recordingswere trimmed to the approximate length oftime required to deliver two sequential listsof ten HINT sentences, including theincreased noise onset time and an appropriateinterstimulus interval for subject response(about 4.2 total minutes for each 20-sentencelist). The recordings were then mixed toachieve the desired number of multipletalkers per test condition, with an equalaverage rms (root-mean-square) powercontribution from each talker. Each testcondition was equated to within 0.1 dB of theaverage rms power of the HINT noise. Themix of talkers for each test condition wasgender-balanced, and there was no talkerredundancy in any of the multiple-talkermaskers.

Finally, each recording was synchronizedwith the presentation of the HINT sentenceswith silence inserted at the interstimulusintervals. The synchronized tests were saved

Page 6: Development of a Test Environment to Evaluate Performance ... · leads to a larger change in speech recognition. The P-I function for the HINT is an 8.9% improvement in speech recognition

JJoouurrnnaall ooff tthhee AAmmeerriiccaann AAccaaddeemmyy ooff AAuuddiioollooggyy/Volume 16, Number 1, 2005

32

FFiigguurree 11.. Spectra of maskers: (a) a single talker forreference; (b) four-talker MTS; (c) eight-talker MTS;(d) twelve-talker MTS; (e) sixteen-talker MTS; (f)HINT noise. All of the MTS maskers were filtered tomatch the LTAS of the HINT noise and obtainedusing a 1024-line FFT with a Blackmann-Harris win-dow.

FFiigguurree 22.. Time-waveforms of maskers: (a) a singletalker for reference; (b) four-talker MTS; (c) eight-talker MTS; (d) twelve-talker MTS; (e) sixteen-talkerMTS; (f) HINT noise. All waveforms are ten-secondsamples and equated for average rms power.

a

b

c

d

e

f

a

b

c

d

e

f

Dan Brown
Text Box
Errata: View the correct figure for Figure 2 at http://www.audiology.org/includes/display.php?f=jaaa/16_2/p123.pdf
Page 7: Development of a Test Environment to Evaluate Performance ... · leads to a larger change in speech recognition. The P-I function for the HINT is an 8.9% improvement in speech recognition

DDeevveellooppmmeenntt ooff aa TTeesstt EEnnvviirroonnmmeenntt/Nilsson et al

33

as WAV files on recordable compact disc anduploaded to a Yamaha AW4416 DAW (digitalaudio workstation) for final mixing, routing,and playback in the sound field.

TTeesstt EEqquuiippmmeenntt

MMuullttiittrraacckk RReeccoorrddiinngg aanndd PPllaayybbaacckk

The Yamaha AW4416 DAW is a 16-track,hard disk-based digital mixer and recorder.Each HINT list in the present study consistedof two sets of HINT sentences (a total of 20sentences per list). Signal routing and finalmixing levels for playback were precalibratedas established in the sound field by the fader(volume control slider) positions for each testcondition (quiet, HINT noise from the front,HINT noise from the loudspeaker array, andfour-, eight-, twelve-, and sixteen-talkerspeech from the loudspeaker array). Mixerlevel and signal routing settings were savedto the hard disk along with each HINT list.Playback of a particular test conditionrequired selecting a HINT list from a menu,then recalling the settings from memory.

LLoouuddssppeeaakkeerr AArrrraayy

Six discrete outputs from the DAW wereused to deliver signals to the sound field.Two outputs were routed to channels oneand two of the audiometer for control ofsignals (target and masker) to be presentedfrom the center/front loudspeaker of thesoundfield array. The left soundfield outputof the audiometer was routed to one channelof a five-channel power amplifier. Four otherassignable outputs of the DAW were routedto the four remaining channels of the poweramplifier.

The outputs of the power amplifierchannels were used to drive the fiveloudspeakers of the soundfield array,arranged according to Figure 3. All five of theloudspeakers (RCA PRO-X44AV) werepositioned on stands 1 m away from thecenter of the listening position per ANSI(American National Standards Institute)requirements (1996) and ISO (InternationalOrganization for Standardization)requirements (1992). Speaker height wasselected to match average adult head levelwhen seated (1.15 m). The four loudspeakers

FFiigguurree 33.. Schematic diagram of the equipment connections and soundfield loudspeaker array. The center/frontloudspeaker is placed toward a corner of the audiometric test enclosure with the axes of the masking loudspeakersperpendicular to the walls.

Page 8: Development of a Test Environment to Evaluate Performance ... · leads to a larger change in speech recognition. The P-I function for the HINT is an 8.9% improvement in speech recognition

JJoouurrnnaall ooff tthhee AAmmeerriiccaann AAccaaddeemmyy ooff AAuuddiioollooggyy/Volume 16, Number 1, 2005

34

used for reproducing the maskers wereequidistant from each other and set at 45˚,135˚, 225˚, and 315˚ azimuth.

The system was calibrated such that theHINT masking noise was reproduced at 65dB(A) at the center of the listening positionunder any test condition (any masker fromthe front loudspeaker or any masker from thesoundfield array). When the soundfield arraywas used to reproduce the masking condition(both HINT noise and multiple-talker speech[MTS]), the mixer settings ensured equalcontribution from each loudspeaker.

While the sound field met the ANSI(1996) and ISO (1992) requirements for aquasi-free sound field, this definition doesnot adequately describe the behavior of thetest environment, nor is the term “quasi-freesound field” an accurate label for what isdescribed in the two standards. A free soundfield, by definition, has no boundaries. Typicalaudiometric test environments are enclosedvolumes with boundaries that exhibitabsorption and reflection to varying degreesat different frequencies. By placinguncorrelated sound sources around theazimuth of the listening position, the effectof diffusion can be created within a subspaceof the enclosed volume (see Appendix byGhent, 2004, in this issue), but only in the twodimensions represented azimuthally.Therefore, the sound field thus created maybe defined, with limitations, as two-dimensionally diffuse, or 2DD.

The design rationale for using fiveloudspeakers in the array described in Figure3 was based on several factors. Quantifiableverification of speech-in-noise performancerequires controlled laboratory conditions,specifically a diffuse sound field at thelistening position. Having four evenly spacedloudspeakers for delivering the maskingsignal satisfies this requirement, creating astable 2DD environment. Should it bedesirable to validate individual hearing aidfittings, however, where simulation of real-world listening environments is desired, thesame five loudspeakers can be spaced forreproduction of surround-encoded audioaccording to the InternationalTelecommunications Union RecommendationITU-R BS.775-1 (1994). Audio productiontechniques for creating realistic simulatedaudio environments using the recommendedfive-loudspeaker array, without the need foraccompanying motion picture visual

reinforcement, have been described (Bosun,2001), and the five-loudspeaker array thusarranged has been shown to create asimulated listening environment that isnearly equivalent psychoacoustically to a 12-loudspeaker reference array (Hiyama et al,2002), while being much less costly andcomplex.

TTeesstt EEnnvviirroonnmmeennttss

The test setup described was replicatedat four sites: the Hearing Aid ResearchLaboratory at Sonic Innovations in Salt LakeCity, Utah (SLC), the Department ofOtolaryngology at Washington UniversitySchool of Medicine in St. Louis, Missouri(WUSM), the Audiology Research Laboratoryat the Cleveland Clinic Foundation inCleveland, Ohio (CCF), and the Departmentof Audiology and Speech-Language Pathologyat Brigham Young University in Provo, Utah(BYU). The audiometric test room at each sitehad to be large enough to accommodate thefive-loudspeaker soundfield array shown inFigure 3 (a minimum inside dimension of 2m by 2 m). The listening position was locatedat the center of each audiometric test room.The audiometric test rooms complied withANSI requirements for permissible ambientnoise, ears uncovered (ANSI, 1991).

Replication of the equipment setup atthe various test sites was accomplished usingidentical equipment and bit-for-bit copies ofthe digitally recorded test materials, includingthe precalibrated mixer settings. Calibrationat the listening position of each of thealternate test sites was verified to be within0.5 dB of the original test site. The spectraof the HINT maskers were measured in theNoise-Front and uncorrelated 2DD soundfieldconditions at the listening position of threeof the four test sites. The 1/3-octave spectrawere obtained using a Bruel & Kjaer 2260Investigator connected to a 2.54 cmmicrophone and are shown in Figure 4. Thesemeasurements were undertaken to ensurethat the HINT noise spectrum (including thepeaks at 1 kHz, 3 kHz, and 6 kHz in theFFT-derived spectrum in Figure 2 and the 1/3-octave spectrum in Figure 4) was maintainedbetween the Noise-Front and 2DD soundfieldconditions as well as across test sites. Theconsistency between 500 Hz and 8 kHz acrosssites was high enough that collection ofspectral data at the fourth site was deemed

Page 9: Development of a Test Environment to Evaluate Performance ... · leads to a larger change in speech recognition. The P-I function for the HINT is an 8.9% improvement in speech recognition

DDeevveellooppmmeenntt ooff aa TTeesstt EEnnvviirroonnmmeenntt/Nilsson et al

35

unnecessary and cannot be collected as thebooth has since been disassembled and nolonger exists in its original configuration.

TTeesstt PPrroocceedduurree

Modified hyper-latin-squares were usedto randomize the starting order andpresentation order of the HINT lists, andthe starting order and presentation order ofthe test conditions. The prescribed adaptivepresentation and scoring procedure for theHINT was followed, as described in the HINTinstructions. The presentation level of theHINT sentences was varied by adjusting theHL attenuator on Channel 1 of theaudiometer.

RREESSUULLTTSS AANNDD DDIISSCCUUSSSSIIOONN

Soundfield test results have been difficultto compare across test facilities because

of interactions with the sound field itself;different room sizes and differences inreflecting surfaces within an audiometrictest room can change base performance(Ghent, 2004, in this issue). Therefore, the

validity of the present set of materials wasestablished across several levels. First,questions concerning noise onset time andcorrelation of noise in the multipleloudspeaker soundfield conditions wereresolved to determine the test conditions tobe used in the multisite testing. Next, thedifference between the Noise-Front and 2DDsoundfield noise conditions was establishedacross the four test setups. Finally, the changein performance with the different multiple-talker speech maskers was evaluated.

HHIINNTT NNooiissee UUssiinngg TTwwoo NNooiissee OOnnsseettTTiimmeess

Normative data were collected on tennormal-hearing listeners (three male andseven female with pure-tone thresholds lessthan 20 dB HL between 250 and 6000 Hz,native speakers of English, age range from24 to 47 years with a mean age of 30.7 years)at the SLC site. Subjects were placed at thelistening position, and target sentences werepresented from the front loudspeaker whileuncorrelated HINT noise was delivered fromthe four surrounding loudspeakers. Adaptive

FFiigguurree 44.. 1/3-octave real-time spectral analysis of HINT noise measured at the listening position, from the Noise-Front (dashed lines) and the 2DD soundfield (solid lines) conditions of three of the four test sites. The dashedlines are measurements of the Noise-Front condition, and the solid lines are measurements of the 2DD noisefield.

Page 10: Development of a Test Environment to Evaluate Performance ... · leads to a larger change in speech recognition. The P-I function for the HINT is an 8.9% improvement in speech recognition

JJoouurrnnaall ooff tthhee AAmmeerriiccaann AAccaaddeemmyy ooff AAuuddiioollooggyy/Volume 16, Number 1, 2005

36

thresholds were measured using the 20-sentence lists with both 0.5 second noiseonset time and 5.0 second noise onset time.

An ANOVA found a significant differencein thresholds for the two noise onset times[F(1, 9) = 38.55, p < .01]. The longer noiseonset time produced lower thresholds (-8.16dB SNR [signal-to-noise ratio]) than the shortonset time (-5.58 dB SNR). It is possible thatallocation of attention is better accomplishedwith sufficient time to prepare for listeningin this difficult condition, thereby improvingthresholds by over 2.5 dB. All multisite datawere therefore collected with the long noiseonset time, which benefits absoluteperformance but also allows evaluation ofsignal processing algorithms that requireprocessing time to stabilize.

The longer noise onset time can also beconsidered as an accommodation for thelimitations of the HINT procedure.Background noise is turned off betweensentences in order to hear the subject’sresponse and determine the presentationlevel of the next sentence. Real-worldsituations do not typically include interruptedmaskers that occur only when a speech targetis present. The longer onset time re-establishes the more typical listeningsituation of a continuous masker. The clinicalimplication that should not be overlooked isthat using the short onset time in the unaidedcondition (for clinical expediency) and thelong onset time in the aided condition willartificially inflate any observed benefit, orshow benefit where none exists.

CCoorrrreellaatteedd vveerrssuuss UUnnccoorrrreellaatteedd NNooiisseeSSoouurrcceess

Speech-in-noise performance with asingle noise signal delivered through all fourloudspeakers simultaneously (a correlatednoise field condition) was compared to thatof four separate tracks of noise, each havinga different starting phase, delivered throughthe individual loudspeakers surroundingeach listener (an uncorrelated noise fieldcondition). The same 10 subjects were testedwith a five-second noise onset time in bothconditions. An ANOVA found significantdifferences between the correlated anduncorrelated noise field conditions [F(1, 9) =6.28, p < .05], with lower thresholds in thecorrelated noise condition (-8.16 dB SNR)than in the uncorrelated noise condition (-7.27

dB SNR). Since one goal of the system is toprovide laboratory control over conditionsthat are representative of those that may beencountered in the real world, the use of theuncorrelated noise in the sound field issuggested. It is important to note thatdifferences in the time alignment of even arandom noise can cause changes inperformance (cf. masking level differences),suggesting that correlated and uncorrelatedsoundfield conditions should not be directlycompared.

TTeessttiinngg aatt MMuullttiippllee SSiitteess

Thresholds were measured on normal-hearing subjects in one single-loudspeakercondition, and five multiple-loudspeakerconditions at the four test sites (SLC, WUSM,CCF, and BYU). Of interest was the abilityof the replicated equipment setup (includingdigital recorder, speakers, recordings, androom layout) to produce comparableperformance in a range of listening conditions.At the same time, differences in absoluteperformance with various maskers could alsobe evaluated. All conditions used the five-second noise onset time, and the multiple-loudspeaker soundfield conditions all useduncorrelated signals.

An ANOVA was used to evaluate thesingle-loudspeaker condition with HINTspeech and noise presented from the frontloudspeaker. No significant differencebetween sites was found [F(3, 46) = 0.6, p =.62]. The mean RTS for the normal-hearingsubjects across sites was -2.9 dB SNR.

A MANOVA (multivariate analysis ofvariance) was used to compare the four testsites and the five different multiple-loudspeaker conditions (uncorrelated HINTnoise presented from four loudspeakers, andone, two, three, or four individual talkersplayed out of each of the four soundfieldloudspeakers creating four-, eight-, twelve-,or sixteen-talker speech maskers). A maineffect of test site [F(3, 46) = 9.1, p < .01] wasfound, with a post-hoc analysis revealingthat SLC (-5.1 dB SNR) was different fromthe other three sites (BYU = -3.5 dB SNR;WSM = -3.5 dB SNR; CCF = -3.8 dB SNR).A main effect of masker type was found [F(4,184) = 71.0, p < .01], with post-hoc analysesshowing the five maskers producedsignificantly different RTSs (2DD noise = -5.4dB SNR; four talkers = -4.7 dB SNR; eight

Page 11: Development of a Test Environment to Evaluate Performance ... · leads to a larger change in speech recognition. The P-I function for the HINT is an 8.9% improvement in speech recognition

DDeevveellooppmmeenntt ooff aa TTeesstt EEnnvviirroonnmmeenntt/Nilsson et al

37

talkers = -4.0 dB SNR; twelve talkers = -3.3dB SNR; sixteen talkers = -2.1 dB SNR). Nosignificant interaction was found [F(12, 184)= 1.08, p = .38].

The respective statistical analyses reduceto several succinct observations: (1) No siteeffect was shown in the single-loudspeakercondition; (2) A site effect was found in themultiple-loudspeaker conditions; (3) RTSvalues were different for the front versus2DD noise; (4) variation in the number oftalkers produced different RTS values; (5)RTS values were lower for 2DD noise than forall multiple-talker masker conditions.

The consistency of RTS scores acrosssites in the single-loudspeaker noise conditionsupports the ability of this test configurationto be accurately replicated and calibrated atmultiple sites. This also suggests that thesubject pools were equivalent, and differencesfound in other conditions should not beattributed to population sampling errors.

The major difference among the test sitesthat could affect the multiple-loudspeakerconditions is the variation in room size,specifically dimensions of the respectiveaudiometric test enclosures. The audiometric

test suite at SLC has a volume of 22.79 m3;the test suite at BYU has a volume of 17.52m3; and the enclosures at WUSM and CCFhave volumes of 14.05 m3 and 13.78 m3,respectively. The effect of enclosure volumeon HINT RTSs in noise is shown in Figure 5.

Performance with the multiple-talkermaskers appears consistent and predictable,and step-wise regressions were used todetermine which room dimensions weresignificant predictors of performance. Factorsincluded in the analyses were room height,depth, and width; largest and smallestdistance between any loudspeaker and thewalls of the test enclosure; and distance fromthe front loudspeaker to the closest wall.Backwards regression after a full loading ofthe dependent variables yields significantpredictive relationships between either roomdepth or width and the multiple-talkerconditions [F(1, 48) = 9.6 to 12.9, p < .01 andr2 = 0.2 for all analyses]. Backwardsregression after a full loading of the samedependent variables listed above yields asignificant predictive relationship between2DD noise and ceiling height [F(1, 48) = 25.3,p < .01, r2 = 0.3].

FFiigguurree 55.. Performance (RTSs in noise) with various maskers presented from the loudspeaker array, as a func-tion of the internal volume of the audiometric test enclosures. The respective internal volumes and ceiling heightsof the test enclosures at each site are as follows. SLC: 22.79 m3 and 2.44 m; BYU: 17.52 m3 and 1.98 m; WUS:14.05 m3 and 1.98 m; CCF: 13.78 m3 and 1.98 m. BBN = broadband noise.

Page 12: Development of a Test Environment to Evaluate Performance ... · leads to a larger change in speech recognition. The P-I function for the HINT is an 8.9% improvement in speech recognition

Therefore, variability in the RTS scoresfor the multiple-talker conditions is bestpredicted by the length-width dimensions ofthe various test enclosures. The variability inRTS for the 2DD noise is dominated by ceilingheight (2.44 m at SLC versus 1.98 m at allof the other sites) where there is no controlof diffusion. Unlike the multiple-talkermaskers, the lack of periodicity in the randomnoise means true diffusion (random incidence)is achieved in the two horizontal dimensionscontrolled by the uncorrelated signal delivery.If the third (vertical) dimension were to becontrolled by placement of loudspeakersdelivering uncorrelated noise above and belowthe listening position, the room dimensionswould no longer significantly predict RTS inrandom noise. There would be little change,however, in the relationships for the multiple-talker conditions because their periodicstructure would continue to interact withthe overall volume of the enclosure (Ghent,2004, in this issue). Therefore, the limitationof the enclosed 2DD sound field is thatdiffusion can only be achieved in thecontrolled dimensions with uncorrelatedrandom energy.

Regarding the observation that the RTSvaries between Noise-Front (-2.9 dB SNR)and 2DD noise (-5.4 dB SNR), previousresearch has long established improvedperformance in noise due to spatial separationof the target signal and masker for bothbroadband noise (Licklider, 1948; Dubno,Ahlstrom, 2002) and multiple-talker speech(Gelfand et al, 1988). The present data simplyconfirm this fact.

Testing with different numbers of talkersin a multiple-talker speech masker was doneto test the theory that with fewer talkers inthe background, a listener may be able totake advantage of temporal gaps in themasker to extract information from the targetsignal and complete the message throughauditory closure. Recent work by Dubno,Horwitz (2002, 2003) supports this theory,although interrupted noise was used as themasker instead of speech. The theory includedthe notion that increasing the number oftalkers would increase the difficulty of thetask as the masker became more gapless.The present experiment supports this notion.

Regarding the observation that RTS washigher for all multiple-talker maskerconditions than for the 2DD noise condition,it is most likely other acoustical differences

between the maskers, namely the smallerpeak-to-power ratio and random nature of thenoise versus the similar peak-to-power ratioand periodicity of the multiple-talker maskersto the target that account for the differences,or contextual differences, between themaskers.

The random noise, with its moreconsistent distribution over time, will havea smaller peak-to-power ratio (a smallerrange from the highest intensity peak to thelowest) than is seen in any of the multiple-talker speech maskers (compare the time-waveforms in Figure 2). At some point thepeak intensity may be determining theamount of masking provided, which istypically seen with very short speech samplessuch as individual words (Zhang and Zeng,1997), especially in the higher frequencies.Therefore, listeners can take advantage ofsome gaps (which explains why betterperformance is found with fewer talkers in themasker), but because the alignment of thetarget and masker is random, the effectivemasking compared to random noise is greaterbecause of the wider range of levels that aremasked.

In 1969, Carhart et al introduced theterm “perceptual masking” to explain theirfinding that a speech-in-noise task increasedin complexity as the background (masker)required the listener to disentangle the targetfrom increasing numbers of competing signalscomprised of meaningful speech. Data fromthe present experiment would seem tosupport this model. However, an experimentby Lewis et al (1988), while confirming againthat a speech-based masker was moreefficient than broadband noise, did not findsupport for the perceptual masking modelof Carhart et al (1969). Lewis et al (1988),while offering many possible explanations,could not account for their observeddifferences; however, they did observe that theP-I functions of the noise and babble resultsdiffered by a constant (2.3 dB) and that theslopes of the functions were nearly identicalas is shown in Figure 5 with the currentdata.

CCOONNCCLLUUSSIIOONNSS

Modifications to the HINT have beendeveloped that control the temporal

and spatial characteristics that are used byfeatures available in modern hearing aids.

JJoouurrnnaall ooff tthhee AAmmeerriiccaann AAccaaddeemmyy ooff AAuuddiioollooggyy/Volume 16, Number 1, 2005

38

Page 13: Development of a Test Environment to Evaluate Performance ... · leads to a larger change in speech recognition. The P-I function for the HINT is an 8.9% improvement in speech recognition

Performance characteristics with thesedevices historically have been difficult toassess because traditional test environmentsand materials do not adequately addresstheir capabilities and limitations. The newtest environment and materials described inthis paper increase the noise onset interval,allowing temporal processes sufficient timeto take effect, while the introduction of newmasking conditions presented from aloudspeaker array allow evaluation of hearingaid features that rely on spatial factors. Thenew test environment and materials alsomeet or exceed the requirements set forth inthe HIA guidelines (2003).

The materials and procedures describedherein were replicated at four laboratories.The Noise-Front condition does not exhibitvariation with respect to dimensions of theaudiometric enclosure. A consistent shift inperformance as a function of the enclosurewidth and depth was observed for thematerials masked by multiple-talker speech.A shift in performance that is dominated byceiling height was demonstrated with theuncorrelated noise presented in the 2DDsound field. Although both the 2DD and themultiple-talker maskers met the ISO andANSI definition of a “quasi-free sound field,”we found that room size affected the results,which was unexpected.

With variation in room size, the aperiodicnoise generated diffusion in two dimensions(i.e., the results are independent of the roomwidth and depth) but failed to generatediffusion in the third dimension (i.e., theresults interacted with room height). Thisinteraction will be a natural consequencewhenever a horizontal loudspeaker array isplaced at ear level, rather than at varyingheights around the enclosure. This suggeststhat additional uncorrelated noise introducingdiffusion in the vertical dimension wouldeliminate any interaction between RTS androom size. Additionally, standard audiometrictest rooms with ceiling heights comparableto the the CCF, WUMC, and BYU rooms(1.98 m) should generate normative datacomparable to the current study. Theregression equation found can also be usedto correct for nonstandard ceiling heights.

On the other hand, with periodic signals,diffusion cannot be generated to the sameextent as random noise. The elimination ofinteractions between RTS and horizontalroom dimensions was not attained with these

speech materials. Therefore, the cost of usingspeech as an experimental masker is the lossof control (higher variability) associated withroom dimensions, which is not addressed inthe current definition of the “quasi-free soundfield.” The regression equation predicts thatperformance will shift by 1 dB for each 6.3 m3

change in volume, and can also be used toadjust the current data to the size of anyparticular audiometric test room.

Despite the limitation associated withroom size described above, the testconfiguration still addresses our otherdevelopment criteria (control of spatialseparation, noise onset time, and temporalcharacteristics). This limitation does notprevent the evaluation of absoluteperformance, which can be attained bycomparison of site-specific data to normativedata for that site. Analysis of combined groupdata is possible by using site-specific controls(subjects running as their own controls) andcan be verified by the lack of significantinteraction between site and the variable ofinterest. The current norms can be appliedto other audiometric test rooms with thecaveats that a minimum size is needed for thespeaker array, and the appropriate ceilingheight or room volume adjustment is applied(based upon the regression equation).

As of this writing, the test environmentand materials described in this paper havebeen used with hearing-impaired listeners toevaluate digital noise reduction, directionalmicrophones, and multichannel signalprocessing algorithms in hearing aids. Thesemultisite clinical trial results will help usunderstand how temporal and spatialcharacteristics of the masking noise affectaided speech recognition with advanced signalprocessing.

By standardizing the test setup on oneof the commonly available 5.1-channelsurround-sound configurations, we foreseethis methodology being cost-effectivelyutilized in various clinical settings. Encodingfor 5.1-channel sound reproduction is nowavailable on super audio compact discs(SACDs) and audio DVDs, as well as on theaudio tracks of conventional DVDs. It ispossible to use the test system described inthe present study to develop standardizedtest materials that can then be distributed onthe commercially available digital audiomedia listed above. For general clinical utility,less costly home theater equipment designed

DDeevveellooppmmeenntt ooff aa TTeesstt EEnnvviirroonnmmeenntt/Nilsson et al

39

Page 14: Development of a Test Environment to Evaluate Performance ... · leads to a larger change in speech recognition. The P-I function for the HINT is an 8.9% improvement in speech recognition

for playback of 5.1-channel audio can bedropped in place of the multitrack DAW usedfor test material development, mixing, andencoding. Additionally, on the DVD formatmany alternate maskers can be included, aswell as tests in multiple languages.Simulations of real-world listeningenvironments can be menu-accessed andenhanced with the addition of motion picturevisual reinforcement for use as a validationtool.

AAcckknnoowwlleeddggmmeennttss.. The authors wish to thank JaneEnrietto and Suzanne Kornhass for data collectionat WUSM and CCF, respectively; Jeannette Johnsonfor her contributions to protocol development; HarveyDillon for his suggestions regarding technical issues;and Patrick Murphy, Michael Valente, and DavidFabry for their reviews of and comments on the exper-imental design. Special thanks to Roxanne Olsen andRebecca Nilsson for their help in preparing data col-lection materials, and to Jeff Tackett for helping usget over the learning curve on the Yamaha equipmentfaster than we would have otherwise.

RREEFFEERREENNCCEESS

American National Standards Institute. (1991)Maximum Permissible Ambient Noise Levels orAudiometric Test Rooms [ANSI S3.1-1991]. New York:American National Standards Institute.

American National Standards Institute. (1996)Specification for Audiometers [ANSI S3.6-1996]. NewYork: American National Standards Institute.

Anastasi A. (1988) Psychological Testing. New York:Macmillan.

Bosun X. (2001) Signal mixing for a 5.1-channel sur-round sound system—analysis and experiment. JAudio-Engineering Soc 49:263–274.

Bronkhorst AW, Plomp R. (1989) Binaural speechrecognition in noise for hearing-impaired listeners.J Acoust Soc Am 86:1374–1383.

Carhart R, Tillman TW, Greetis ES. (1969) Perceptualmasking in multiple sound backgrounds. J AcoustSoc Am 45:694–703.

Chabries DM, Bray VH. (2002) Use of DSP techniquesto enhance the performance of hearing aids in noise.In: Davis GM, ed. Noise Reduction in SpeechApplications. Boca Raton, FL: CRC Press.

Cox RM, Moore JN. (1988) Composite speech spec-trum for hearing aid gain prescriptions. J SpeechHear Res 31:102–107.

Dubno JR, Ahlstrom JB, Horwitz AR. (2002) Spectralcontributions to the benefit from spatial separationof speech and noise. J Speech Hear Res 45:1297–1310.

Dubno JR, Horwitz AR, Ahlstrom JB. (2002) Benefitof modulated maskers for speech recognition by

younger and older adults with normal hearing. JAcoust Soc Am 111:2897–2907.

Dubno JR, Horwitz AR, Ahlstrom JB. (2003) Recoveryfrom prior stimulation: masking of speech by inter-rupted noise for younger and older adults with normalhearing. J Acoust Soc Am 113:2084–2094.

Elberling C, Ludvigsen C, Keidser G. (1993) Thedesign and testing of a noise reduction algorithmbased on spectral subtraction. Scand Audiol Suppl38:39–49.

Gelfand SA, Ross L, Miller S. (1988) Sentence recep-tion in noise from one versus two sources: effects ofaging and hearing loss. J Acoust Soc Am 83:248–256.

Ghent RM. (2004) A tutorial on complex sound fieldsfor audiometric testing. J Am Acad Audiol 15:18–26.

Hanks WD, Johnson GD. (1998) HINT list equiva-lency using older listeners. J Speech Hear Res41:1335–1340.

Hearing Industries Association. (2003) Guidelines forHearing Aid Manufacturers for Substantiation ofPerformance Claims. Alexandria, VA: HearingIndustries Association.

Hiyama K, Komiyama S, Hamasaki K. (2002) Theminimum number of loudspeakers and its arrange-ment for reproducing the spatial impression of diffusesound field. J Aud Eng Soc 50:964.

International Organization for Standardization. (1992)Acoustics—Audiometric Test Methods—Part 2: SoundField Audiometry with Pure Tone and Narrow-BandTest Signals [ISO 8253-2]. Geneva: InternationalOrganization for Standardization.

International Telecommunications Union. (1994)Multichannel Stereophonic Sound System with andwithout Accompanying Picture [RecommendationITU-R BS.775-1]. Geneva: InternationalTelecommunications Union.

Layton BJ. (1991) Distribution of rms levels in speech.Master's thesis, Brigham Young University, Provo,UT.

Leeuw AR, Dreschler WA. (1991) Advantages of direc-tional hearing aid microphones related to roomacoustics. Audiology 30:330–344.

Levitt H. (1978) Adaptive testing in audiology. ScandAudiol Suppl 6:241–291.

Lewis HD, Benignus VA, Muller KE, Malott CM,Barton CN. (1988) Babble and random-noise mask-ing of speech in high and low context cue conditions.J Speech Hear Res 31:108–114.

Licklider JCR. (1948) The influence of interauralphase relations upon the masking of speech by whitenoise. J Acoust Soc Am 20:150–159.

Ludvigsen C, Elberling C, Keidser G. (1993)Evaluation of a noise reduction method—comparisonbetween observed scores and scores predicted fromSTI. Scand Audiol Suppl 38:50–55.

Nilsson M, Fang X, Ghent R, Murphy P, Bray V. (2000)Improved speech intelligibility in noise with a single-

JJoouurrnnaall ooff tthhee AAmmeerriiccaann AAccaaddeemmyy ooff AAuuddiioollooggyy/Volume 16, Number 1, 2005

40

Page 15: Development of a Test Environment to Evaluate Performance ... · leads to a larger change in speech recognition. The P-I function for the HINT is an 8.9% improvement in speech recognition

microphone noise reduction technique. J Acoust SocAm 108:2574.

Nilsson MJ, Soli SD, Sullivan JA. (1994) Developmentof the Hearing In Noise Test for the measurement ofspeech reception thresholds in quiet and in noise. JAcoust Soc Am 95:1085–1099.

Plomp R, Mimpen AM. (1979) Improving the relia-bility of testing the speech reception threshold forsentences. Audiology 18:43–52.

Revit LJ, Schulein RB, Julstrom SD. (2002) Towardaccurate assessment of real-world hearing aid bene-fit. Hear Rev 9:34–51.

Ricketts T. (2000) Impact of noise source configura-tion on directional hearing aid benefit andperformance. Ear Hear 21:194–205.

Ricketts T, Dhar S. (1999) Comparison of perform-ance across three directional hearing aids. J Am AcadAudiol 10:180–189.

Soli SD, Nilsson MJ. (1997) Predicting speech intel-ligibility in noise: the role of factors other thanpure-tone sensitivity. J Acoust Soc Am 101:3201.

Studebaker GA, Pavlovic CV, Sherbecoe RL. (1987)A frequency importance function for continuous dis-course. J Acoust Soc Am 81:1130–1138.

Studebaker GA, Sherbecoe RL. (1991) Frequency-importance and transfer functions for recorded CIDW-22 word lists. J Speech Hear Res 34:427–438.

Valente M, Fabry DA, Potts LG. (1995) Recognitionof speech in noise with hearing aids using dual micro-phones. J Am Acad Audiol 6:440–449.

Voss T. (1997) Clinical evaluation of multi-microphonehearing instruments. Hear Rev 4:36–74.

Wouters J, Litiere L, van Wieringen A. (1999) Speechintelligibility in noisy environments with one- andtwo-microphone hearing aids. Audiology 38:91–98.

Zhang C, Zeng FG. (1997) Loudness of dynamic stim-uli in acoustic and electric hearing. J Acoust Soc Am102:2925–2934.

DDeevveellooppmmeenntt ooff aa TTeesstt EEnnvviirroonnmmeenntt/Nilsson et al

41