

JOURNAL OF THE AUDIO ENGINEERING SOCIETY
AUDIO / ACOUSTICS / APPLICATIONS
Volume 51   Number 9   2003 September

In this issue…

Down Mixing 5.1 Surround

Approximations to HRTF Computations

Listeners Evaluate Loudspeakers

Objective Measure of Envelopment

Features…

23rd Conference Report, Copenhagen

Digital Rights Management

Call for Papers, 25th Conference, London

SUSTAINING MEMBER ORGANIZATIONS

The Audio Engineering Society recognizes with gratitude the financial support given by its sustaining members, which enables the work of the Society to be extended. Addresses and brief descriptions of the business activities of the sustaining members appear in the October issue of the Journal.

The Society invites applications for sustaining membership. Information may be obtained from the Chair, Sustaining Memberships Committee, Audio Engineering Society, 60 East 42nd St., Room 2520, New York, New York 10165-2520, USA, tel: 212-661-8528, fax: 212-682-0477.

ACO Pacific, Inc.; Acustica Beyma SA; Air Studios Ltd.; AKG Acoustics GmbH; AKM Semiconductor, Inc.; Amber Technology Limited; AMS Neve plc; ATC Loudspeaker Technology Ltd.; Audio Limited; Audiomatica S.r.l.; Audio Media/IMAS Publishing Ltd.; Audio Precision, Inc.; AudioScience, Inc.; Audio-Technica U.S., Inc.; AudioTrack Corporation; Autograph Sound Recording Ltd.; B & W Loudspeakers Limited; BMP Recording; British Broadcasting Corporation; BSS Audio; Cadac Electronics PLC; Calrec Audio; Canford Audio plc; CEDAR Audio Ltd.; Celestion International Limited; Cerwin-Vega, Incorporated; ClearOne Communications Corp.; Community Professional Loudspeakers, Inc.; Crystal Audio Products/Cirrus Logic Inc.; D.A.S. Audio, S.A.; D.A.T. Ltd.; dCS Ltd.; Deltron Emcon Limited; Digidesign; Digigram; Digital Audio Disc Corporation; Dolby Laboratories, Inc.; DRA Laboratories; DTS, Inc.; DYNACORD, EVI Audio GmbH; Eastern Acoustic Works, Inc.; Eminence Speaker LLC; Event Electronics, LLC; Ferrotec (USA) Corporation; Focusrite Audio Engineering Ltd.; Fostex America, a division of Foster Electric U.S.A., Inc.; Fraunhofer IIS-A; FreeSystems Private Limited; FTG Sandar TeleCast AS; Harman Becker; HHB Communications Ltd.; Innova SON; Innovative Electronic Designs (IED), Inc.; International Federation of the Phonographic Industry; JBL Professional; Jensen Transformers Inc.; Kawamura Electrical Laboratory; KEF Audio (UK) Limited; Kenwood U.S.A. Corporation; Klark Teknik Group (UK) Plc; Klipsch L.L.C.; Laboratories for Information; L-Acoustics US; Leitch Technology Corporation; Lindos Electronics; Magnetic Reference Laboratory (MRL) Inc.; Martin Audio Ltd.; Meridian Audio Limited; Metropolis Group; Middle Atlantic Products Inc.; Mosses & Mitchell; M2 Gauss Corp.; Georg Neumann GmbH; Neutrik AG; NVision; NXT (New Transducers Ltd.); 1 Limited; Ontario Institute of Audio Recording Technology; Outline snc; Pacific Audio-Visual; PRIMEDIA Business Magazines & Media Inc.; Prism Sound; Pro-Bel Limited; Pro-Sound News; Psychotechnology, Inc.; Radio Free Asia; Rane Corporation; Recording Connection; Rocket Network; Royal National Institute for the Blind; RTI Tech Pte. Ltd.; Rycote Microphone Windshields Ltd.; SADiE; Sanctuary Studios Ltd.; Sekaku Electron Ind. Co., Ltd.; Sennheiser Electronic Corporation; Shure Inc.; Snell & Wilcox Ltd.; Solid State Logic, Ltd.; Sony Broadcast & Professional Europe; Sound Devices LLC; Sound On Sound Ltd.; Soundcraft Electronics Ltd.; Sowter Audio Transformers; SRS Labs, Inc.; Stage Accompany; Sterling Sound, Inc.; Studer North America Inc.; Studer Professional Audio AG; Tannoy Limited; TASCAM; THAT Corporation; TOA Electronics, Inc.; Tommex; Touchtunes Music Corp.; Turbosound; United Entertainment Media, Inc.; Uniton AG; University of Derby; University of Salford; University of Surrey, Dept. of Sound Recording; VCS Aktiengesellschaft; VidiPax; Wenger Corporation; J. M. Woodgate and Associates; Yamaha Research and Development


AUDIO ENGINEERING SOCIETY, INC.
INTERNATIONAL HEADQUARTERS

60 East 42nd Street, Room 2520, New York, NY 10165-2520, USA
Tel: +1 212 661 8528. Fax: +1 212 682 0477. E-mail: [email protected]. Internet: http://www.aes.org

Roger K. Furness Executive Director
Sandra J. Requa Executive Assistant to the Executive Director

ADMINISTRATION

STANDARDS COMMITTEE

GOVERNORS

OFFICERS 2002/2003

Karl-Otto Bäder, Curtis Hoyt, Roy Pritts, Don Puluse, David Robinson, Annemarie Staepelaere, Roland Tan, Kunimaro Tanaka

Ted Sheldon Chair Dietrich Schüller Vice Chair

Mendel Kleiner Chair David Josephson Vice Chair

SC-04-01 Acoustics and Sound Source Modeling: Richard H. Campbell, Wolfgang Ahnert
SC-04-02 Characterization of Acoustical Materials: Peter D'Antonio, Trevor J. Cox
SC-04-03 Loudspeaker Modeling and Measurement: David Prince, Neil Harris, Steve Hutt
SC-04-04 Microphone Measurement and Characterization: David Josephson, Jackie Green
SC-04-07 Listening Tests: David Clark, T. Nousaine
SC-06-01 Audio-File Transfer and Exchange: Mark Yonge, Brooks Harris
SC-06-02 Audio Applications Using the High Performance Serial Bus (IEEE 1394): John Strawn, Bob Moses
SC-06-04 Internet Audio Delivery System: Karlheinz Brandenburg
SC-06-06 Audio Metadata: C. Chambers

Kees A. Immink President

Ronald Streicher President-Elect

Garry Margolis Past President

Jim Anderson Vice President, Eastern Region, USA/Canada
James A. Kaiser Vice President, Central Region, USA/Canada
Bob Moses Vice President, Western Region, USA/Canada
Søren Bech Vice President, Northern Region, Europe
Markus Erne Vice President, Central Region, Europe
Daniel Zalay Vice President, Southern Region, Europe
Mercedes Onorato Vice President, Latin American Region
Neville Thiele Vice President, International Region

Han Tendeloo Secretary

Marshall Buck Treasurer

TECHNICAL COUNCIL

Wieslaw V. Woszczyk Chair
Jürgen Herre and Robert Schulein Vice Chairs

COMMITTEES

SC-02-01 Digital Audio Measurement Techniques: Richard C. Cabot, I. Dennis, M. Keyhl
SC-02-02 Digital Input-Output Interfacing: Julian Dunn, Robert A. Finger, John Grant
SC-02-05 Synchronization: Robin Caine

John P. Nunn Chair Robert A. Finger Vice Chair

Robin Caine Chair Steve Harris Vice Chair

John P. Nunn Chair
John Woodgate Vice Chair
Bruce Olson Vice Chair, Western Hemisphere
Mark Yonge Secretary, Standards Manager

Yoshizo Sohma Vice Chair, International

SC-02 SUBCOMMITTEE ON DIGITAL AUDIO

Working Groups

SC-03 SUBCOMMITTEE ON THE PRESERVATION AND RESTORATION OF AUDIO RECORDING

Working Groups

SC-04 SUBCOMMITTEE ON ACOUSTICS

Working Groups

SC-06 SUBCOMMITTEE ON NETWORK AND FILE TRANSFER OF AUDIO

Working Groups

TECHNICAL COMMITTEES

SC-03-01 Analog Recording: J. G. McKnight

SC-03-02 Transfer Technologies: Lars Gaustad, Greg Faris

SC-03-04 Storage and Handling of Media: Ted Sheldon, Gerd Cyrener

SC-03-06 Digital Library and Archives Systems: David Ackerman, Ted Sheldon

SC-03-12 Forensic Audio: Tom Owen, M. McDermott, Eddy Bogh Brixen

TELLERS
Christopher V. Freitag Chair

Correspondence to AES officers and committee chairs should be addressed to them at the society’s international headquarters.

Ray Rayburn Chair John Woodgate Vice Chair

SC-05-02 Audio Connectors: Ray Rayburn, Werner Bachmann
SC-05-03 Audio Connector Documentation: Dave Tosti-Lane, J. Chester

SC-05-05 Grounding and EMC Practices Bruce Olson, Jim Brown

SC-05 SUBCOMMITTEE ON INTERCONNECTIONS

Working Groups

ACOUSTICS & SOUND REINFORCEMENT: Mendel Kleiner Chair, Kurt Graffy Vice Chair
ARCHIVING, RESTORATION AND DIGITAL LIBRARIES: David Ackerman Chair
AUDIO FOR GAMES: Martin Wilde Chair
AUDIO FOR TELECOMMUNICATIONS: Bob Zurek Chair, Andrew Bright Vice Chair
CODING OF AUDIO SIGNALS: James Johnston and Jürgen Herre Cochairs
AUTOMOTIVE AUDIO: Richard S. Stroud Chair, Tim Nind Vice Chair
HIGH-RESOLUTION AUDIO: Malcolm Hawksford Chair, Vicki R. Melchior and Takeo Yamamoto Vice Chairs
LOUDSPEAKERS & HEADPHONES: David Clark Chair, Juha Backman Vice Chair
MICROPHONES & APPLICATIONS: David Josephson Chair, Wolfgang Niehoff Vice Chair
MULTICHANNEL & BINAURAL AUDIO TECHNOLOGIES: Francis Rumsey Chair, Gunther Theile Vice Chair
NETWORK AUDIO SYSTEMS: Jeremy Cooperstock Chair, Robert Rowe and Thomas Sporer Vice Chairs
AUDIO RECORDING & STORAGE SYSTEMS: Derk Reefman Chair, Kunimaro Tanaka Vice Chair
PERCEPTION & SUBJECTIVE EVALUATION OF AUDIO SIGNALS: Durand Begault Chair, Søren Bech and Eiichi Miyasaka Vice Chairs
SEMANTIC AUDIO ANALYSIS: Mark Sandler Chair
SIGNAL PROCESSING: Ronald Aarts Chair, James Johnston and Christoph M. Musialik Vice Chairs
STUDIO PRACTICES & PRODUCTION: George Massenburg Chair, Alan Parsons, David Smith and Mick Sawaguchi Vice Chairs
TRANSMISSION & BROADCASTING: Stephen Lyman Chair, Neville Thiele Vice Chair

AWARDS: Roy Pritts Chair
CONFERENCE POLICY: Søren Bech Chair
CONVENTION POLICY & FINANCE: Marshall Buck Chair
EDUCATION: Don Puluse Chair
FUTURE DIRECTIONS: Kees A. Immink Chair
HISTORICAL: J. G. (Jay) McKnight Chair, Irving Joel Vice Chair, Donald J. Plunkett Chair Emeritus
LAWS & RESOLUTIONS: Ron Streicher Chair
MEMBERSHIP/ADMISSIONS: Francis Rumsey Chair
NOMINATIONS: Garry Margolis Chair
PUBLICATIONS POLICY: Richard H. Small Chair
REGIONS AND SECTIONS: Subir Pramanik Chair
STANDARDS: John P. Nunn Chair


AES Journal of the Audio Engineering Society (ISSN 0004-7554), Volume 51, Number 9, 2003 September. Published monthly, except January/February and July/August when published bi-monthly, by the Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA. Telephone: +1 212 661 8528. Fax: +1 212 682 0477. E-mail: [email protected]. Periodical postage paid at New York, New York, and at an additional mailing office. Postmaster: Send address corrections to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520.

The Audio Engineering Society is not responsible for statements made by its contributors.

COPYRIGHT
Copyright © 2003 by the Audio Engineering Society, Inc. It is permitted to quote from this Journal with customary credit to the source.

COPIES
Individual readers are permitted to photocopy isolated articles for research or other noncommercial use. Permission to photocopy for internal or personal use of specific clients is granted by the Audio Engineering Society to libraries and other users registered with the Copyright Clearance Center (CCC), provided that the base fee of $1 per copy plus $.50 per page is paid directly to CCC, 222 Rosewood Dr., Danvers, MA 01923, USA. 0004-7554/95. Photocopies of individual articles may be ordered from the AES Headquarters office at $5 per article.

REPRINTS AND REPUBLICATION
Multiple reproduction or republication of any material in this Journal requires the permission of the Audio Engineering Society. Permission may also be required from the author(s). Send inquiries to the AES Editorial office.

ONLINE JOURNAL
AES members can view the Journal online at www.aes.org/journal/online.

SUBSCRIPTIONS
The Journal is available by subscription. Annual rates are $180 surface mail, $225 air mail. For information, contact AES Headquarters.

BACK ISSUES
Selected back issues are available: from Vol. 1 (1953) through Vol. 12 (1964), $10 per issue (members), $15 (nonmembers); Vol. 13 (1965) to present, $6 per issue (members), $11 (nonmembers). For information, contact the AES Headquarters office.

MICROFILM
Copies of Vol. 19, No. 1 (1971 January) to the present edition are available on microfilm from University Microfilms International, 300 North Zeeb Rd., Ann Arbor, MI 48106, USA.

ADVERTISING
Call the AES Editorial office or send e-mail to: [email protected].

MANUSCRIPTS
For information on the presentation and processing of manuscripts, see Information for Authors.

EDITORIAL STAFF

William T. McQuaide Managing Editor
Gerri M. Calamusa Senior Editor
Abbie J. Cohen Senior Editor
Mary Ellen Ilich Associate Editor
Patricia L. Sarch Art Director
Flávia Elzinga Advertising

AES REGIONAL OFFICES

Europe Conventions: Zevenbunderslaan 142/9, BE-1190 Brussels, Belgium, Tel: +32 2 345 7971, Fax: +32 2 345 3419, E-mail for convention information: [email protected]
Europe Services: B.P. 50, FR-94364 Bry Sur Marne Cedex, France, Tel: +33 1 4881 4632, Fax: +33 1 4706 0648, E-mail for membership and publication sales: [email protected]
United Kingdom: British Section, Audio Engineering Society Ltd., P.O. Box 645, Slough, SL1 8BJ UK, Tel: +44 1628 663725, Fax: +44 1628 667002, E-mail: [email protected]
Japan: Japan Section, 1-38-2 Yoyogi, Room 703, Shibuya-ku, Tokyo 151-0053, Japan, Tel: +81 3 5358 7320, Fax: +81 3 5358 7328, E-mail: [email protected]

PURPOSE: The Audio Engineering Society is organized for the purpose of: uniting persons performing professional services in the audio engineering field and its allied arts; collecting, collating, and disseminating scientific knowledge in the field of audio engineering and its allied arts; advancing such science in both theoretical and practical applications; and preparing, publishing, and distributing literature and periodicals relative to the foregoing purposes and policies.

MEMBERSHIP: Individuals who are interested in audio engineering may become members of the AES. Information on joining the AES can be found at www.aes.org. Grades and annual dues are: Full members and associate members, $90 for both the printed and online Journal; $60 for online Journal only. Student members: $50 for printed and online Journal; $20 for online Journal only. A subscription to the Journal is included with all memberships. Sustaining memberships are available to persons, corporations, or organizations who wish to support the Society.

REVIEW BOARD

Ronald M. Aarts, James A. S. Angus, George L. Augspurger, Jeffrey Barish, Jerry Bauck, James W. Beauchamp, Søren Bech, Durand Begault, Barry A. Blesser, John S. Bradley, Robert Bristow-Johnson, John J. Bubbers, Marshall Buck, Mahlon D. Burkhard, Richard C. Cabot, Edward M. Cherry, Robert R. Cordell, Andrew Duncan, John M. Eargle, Louis D. Fielder, Edward J. Foster, Mark R. Gander, Earl R. Geddes, David Griesinger, Malcolm O. J. Hawksford, Jürgen Herre, Tomlinson Holman, Andrew Horner, Jyri Huopaniemi, James D. Johnston, Arie J. M. Kaizer, James M. Kates, D. B. Keele, Jr., Mendel Kleiner, David L. Klepper, W. Marshall Leach, Jr., Stanley P. Lipshitz, Robert C. Maher, Dan Mapes-Riordan, J. G. (Jay) McKnight, Guy W. McNally, D. J. Meares, Robert A. Moog, Brian C. J. Moore, James A. Moorer, Dick Pierce, Martin Polon, D. Preis, Francis Rumsey, Kees A. Schouhamer Immink, Manfred R. Schroeder, Robert B. Schulein, Richard H. Small, Julius O. Smith III, Gilbert Soulodre, Herman J. M. Steeneken, John Strawn, G. R. (Bob) Thurmond, Jiri Tichy, Floyd E. Toole, Emil L. Torick, John Vanderkooy, Alexander Voishvillo, Daniel R. von Recklinghausen, Rhonda Wilson, John M. Woodgate, Wieslaw V. Woszczyk

Ingeborg M. Stochmal Copy Editor
Barry A. Blesser Consulting Technical Editor
Stephanie Paynes Writer

Daniel R. von Recklinghausen Editor

AES REGIONS AND SECTIONS

Eastern Region, USA/Canada
Sections: Atlanta, Boston, District of Columbia, New York, Philadelphia, Toronto
Student Sections: American University, Berklee College of Music, Carnegie Mellon University, Duquesne University, Fredonia, Full Sail Real World Education, Hampton University, Institute of Audio Research, McGill University, Peabody Institute of Johns Hopkins University, Pennsylvania State University, University of Hartford, University of Massachusetts-Lowell, University of Miami, University of North Carolina at Asheville, William Patterson University, Worcester Polytechnic University

Central Region, USA/Canada
Sections: Central Indiana, Chicago, Detroit, Kansas City, Nashville, New Orleans, St. Louis, Upper Midwest, West Michigan
Student Sections: Ball State University, Belmont University, Columbia College, Michigan Technological University, Middle Tennessee State University, Music Tech College, SAE Nashville, Northeast Community College, Ohio University, Ridgewater College, Hutchinson Campus, Southwest Texas State University, University of Arkansas-Pine Bluff, University of Cincinnati, University of Illinois-Urbana-Champaign

Western Region, USA/Canada
Sections: Alberta, Colorado, Los Angeles, Pacific Northwest, Portland, San Diego, San Francisco, Utah, Vancouver
Student Sections: American River College, Brigham Young University, California State University–Chico, Citrus College, Cogswell Polytechnical College, Conservatory of Recording Arts and Sciences, Denver, Expression Center for New Media, Long Beach City College, San Diego State University, San Francisco State University, Cal Poly San Luis Obispo, Stanford University, The Art Institute of Seattle, University of Southern California, Vancouver

Northern Region, Europe
Sections: Belgian, British, Danish, Finnish, Moscow, Netherlands, Norwegian, St. Petersburg, Swedish
Student Sections: All-Russian State Institute of Cinematography, Danish, Netherlands, Russian Academy of Music, St. Petersburg, University of Lulea-Pitea

Central Region, Europe
Sections: Austrian, Belarus, Czech, Central German, North German, South German, Hungarian, Lithuanian, Polish, Slovakian Republic, Swiss, Ukrainian
Student Sections: Aachen, Berlin, Czech Republic, Darmstadt, Detmold, Düsseldorf, Graz, Ilmenau, Technical University of Gdansk (Poland), Vienna, Wroclaw University of Technology

Southern Region, Europe
Sections: Bosnia-Herzegovina, Bulgarian, Croatian, French, Greek, Israel, Italian, Portugal, Romanian, Slovenian, Spanish, Serbia and Montenegro, Turkish
Student Sections: Croatian, Conservatoire de Paris, Italian, Louis-Lumière School

Latin American Region
Sections: Argentina, Brazil, Chile, Colombia (Medellin), Mexico, Uruguay, Venezuela
Student Sections: Taller de Arte Sonoro (Caracas)

International Region
Sections: Adelaide, Brisbane, Hong Kong, India, Japan, Korea, Malaysia, Melbourne, Philippines, Singapore, Sydney


AES JOURNAL OF THE

AUDIO ENGINEERING SOCIETY

AUDIO/ACOUSTICS/APPLICATIONS

VOLUME 51 NUMBER 9 2003 SEPTEMBER

CONTENTS

In Memoriam: Patricia Macdonald .................................................................................Roger K. Furness 779

PAPERS

Effects of Down-Mix Algorithms on Quality of Surround Sound .......... Sławomir K. Zielinski, Francis Rumsey, and Søren Bech   780
When channel limitations prevent the transmission of a full 5.1 surround mix, there are many options for converting to a lesser number of channels using down mixing. Listeners were asked to evaluate eight different algorithms from two listening positions in terms of preferences rather than quality. Unfortunately, different audio cases produced variations in the conclusion about the optimum. The presence of a video picture influenced the experience of reduced audio channels.

A Study on Head-Shape Simplification Using Spherical Harmonics for HRTF Computation at Low Frequencies .......... Yufei Tao, Anthony I. Tew, and Stuart J. Porter   799
Using a simplified shape for the human head in computing head-related transfer functions (HRTFs) produces errors in the calculated pressures on the surface. A model of a head can be represented as a series of spherical harmonics. This study computes the errors in acoustic pressure that result from truncating the series, which corresponds to low-pass shape filtering. These shape errors follow the corresponding pressure errors for frequencies below 3 kHz. Harmonics to order 11 are sufficient for the low-frequency representation of a head. Beyond order 14 there is no additional improvement.

Differences in Performance and Preference of Trained versus Untrained Listeners in Loudspeaker Tests: A Case Study .......... Sean E. Olive   806
The audio industry makes many assumptions about the appropriateness of various quality testing methods, but there have not been any significant studies to validate these assumptions. The choices are reduced to using trained listeners, who are efficient and discriminating, or untrained listeners, who are more representative of the user population. This 18-month study shows that trained listeners produce the same conclusion as 268 untrained listeners when evaluating loudspeakers.

Objective Measures of Listener Envelopment in Multichannel Surround Systems .......... Gilbert A. Soulodre, Michel C. Lavoie, and Scott G. Norcross   826
Predicting the degree of listener envelopment is more complex than the traditional measures of lateral energy after the first 80 ms. This detailed study shows that the transition threshold between early and late energy is frequency-dependent. In addition, the loudness of the lateral energy is equally important. A new objective measure is proposed with a very high correlation between perceived envelopment and the calculated metric.

LETTERS TO THE EDITOR

“More Comments on President’s Message and Comments” .....................................John Woodgate 841

STANDARDS AND INFORMATION DOCUMENTS

AES Standards Committee News .......... 842
Sampling frequencies; digital audio synchronization; preservation and restoration of recordings; loudspeaker modeling and measurement

FEATURES

23rd Conference Report, Copenhagen .......... 846
Digital Rights Management .......... 855
25th Conference, London, Call for Papers .......... 871

DEPARTMENTS

News of the Sections .......... 861
Sound Track .......... 865
New Products and Developments .......... 867
Upcoming Meetings .......... 868
Available Literature .......... 869

Membership Information .......... 872
Advertiser Internet Directory .......... 873
Sections Contacts Directory .......... 874
AES Conventions and Conferences .......... 880


In Memoriam
Patricia M. Macdonald 1936-2003

Patricia M. Macdonald, executive editor of the Journal of the Audio Engineering Society, died on July 6, 2003, after a long and brave battle with cancer. It is indeed a daunting task to try to summarize Pat's contribution to the AES over many years. All those who worked with her: staff, authors, reviewers and colleagues remember her dedication, incisive mind, diligence and sense of fairness. But, to tell the story of a life is never easy, perhaps because it is imbued not only with biographical information but with personality and character.

Pat was born in London and educated in the Republic of Ireland and the U.K. She traveled with friends first to Canada and then the U.S. in 1955 as a tourist. They were taken with the excitement of the U.S. and enjoyed the favorable climate. She emigrated to the U.S. in 1956.

She began her career in the audio field when she joined Koessler Sales Company, Los Angeles, a manufacturers' representative whose products included JBL and Magnecord. In 1961 she worked for British Industries, Port Washington, NY, which represented Garrard and Wharfedale in the U.S. She later assisted Gerry, her husband, in establishing Magnetic Recording Systems in Westbury, NY.

Gerry designed one of the earliest professional servo-controlled tape machines for stereo audio, and Pat built much of the electronics. They showed this recorder at one of the AES conventions, which was her introduction to the Society. She and Gerry also started Choice Records, a specialized record label featuring some of the most prominent jazz artists.

In 1969 Pat began working on Audio Engineering Society projects when she joined Jacqueline Harvey, Harvey Associates, responsible for producing the AES Journal and managing equipment exhibits at AES conventions. In 1974, when all Harvey Associates' employees joined the AES, she became associate editor of the Journal. She was appointed managing editor in 1976 with responsibility for production of the Journal and in 1979 for all other AES technical publications. She remained in that position until 1989, when she moved to Annapolis, Maryland, concentrating on Journal papers.

In 1992 she was persuaded to become executive editor, working from her home office, with regular trips to New York for meetings and other work. Her managerial ability brought clarification to staff meetings. She had an extraordinary focus, concentration and analytical mind. Her tireless energy was an inspiration to her editorial staff and others who knew her. She was an intent listener and a compassionate friend.

During the years that she was responsible for the production of AES publications they have expanded to include, in addition to the Journal and its special issues, anthologies, conference proceedings, special publications, and electronic publication on CD-ROM of convention preprints, conference papers, indices, and the complete AES electronic library. She received an honorary membership award in 2001 for "extraordinary contributions to audio engineering publications and to the dissemination of scientific knowledge."

This is a short and very inadequate list of Pat's achievements. Behind this is the person whose dedication to the AES was unbelievable. Although most of her energies were spent in publications, she was well versed in all areas of the AES. She could always be relied upon to give an unbiased opinion on some aspect of Society life, allowing many of us to rethink some new plan that we might have, before making a mistake. She was generous with help to all those who sought it, but swift to defend the high standards that she felt the AES needed in order to continue as the leader in audio. This led to many a heated discussion, which could be misunderstood by those not used to her directness. The discussions always resulted in arriving at the best possible solution or path on which to proceed.

I count myself very lucky to have known Pat and am certain there are many who share my feeling. She was a mentor, a good friend, and socially, a lot of fun, with a great sense of humor. Her bright smile, optimism, intelligence, and refinement will be sorely missed. The best tribute we can pay her is to maintain the high standards she set for herself and the AES.

She is survived by her husband Gerry.

ROGER K. FURNESS

Executive Director



Effects of Down-Mix Algorithms on Quality of Surround Sound*

SŁAWOMIR K. ZIELINSKI, AES Member, AND FRANCIS RUMSEY, AES Fellow

Institute of Sound Recording, University of Surrey, Guildford, Surrey, GU2 7XH, UK

AND

SØREN BECH, AES Fellow

Bang & Olufsen, Struer, Denmark

Eight down-mix algorithms were evaluated in terms of basic audio quality. The investigation was focused on the standard 5.1 multichannel audio setup (ITU-R BS.775-1) and limited to two listening positions. The results obtained are summarized and detailed specifications of the subjectively best algorithms are given. The effect of the presentation of moving pictures on the assessment of audio quality was also investigated. The results show that exposure to visual content has a considerable effect on the evaluation of audio quality at the off-center position for some types of program material.

*Manuscript received 2002 August 19; revised 2003 May 22.

0 INTRODUCTION

Sound quality optimization for audio, video, and multimedia products is an interdisciplinary issue that draws upon the investigators' experience of acoustics, psychology, music, audio engineering, and electronics. It has important implications for product designers, program makers, and media service providers because ultimately a human recipient or consumer is the judge of product quality. A knowledge of the design factors that lead to certain subjective judgments of perceived quality is increasingly valuable in an industry that is concerned with tradeoffs between cost, data bandwidth/storage requirements, product complexity, and quality of service.

In an earlier paper [1] we discussed the effects of bandwidth limitation on audio quality in the context of multichannel audiovisual systems. We showed that it might be possible to limit the bandwidth of the center channel or to limit the bandwidth of the rear channels without a significant deterioration of quality for some types of program material. We also investigated the way in which a native picture accompanying audio material affects the evaluation of audio quality by exposing the listener to program material with and without pictures. It was found that the presence of pictures had only a small effect on the evaluation of the audio quality of band-limited audio material. In this paper we present the results of a subjective evaluation of eight down-mix algorithms. (A down-mix algorithm can be defined as a process of reducing the number of channels.) The main objective of this experiment was to investigate perceptual effects caused by different down-mix algorithms, which may help broadcasters or codec designers to find optimum tradeoffs between the number of transmitted or coded channels and the resultant audio quality. It is hoped that one of the possible applications of these results is multichannel audio streaming over the Internet. The main research questions in this experiment were as follows:

• Which down-mix algorithms (from the group selected) are best in terms of basic audio quality?

• What is the difference in audio quality between the center and the off-center listening positions for both original and down-mixed items?

• Does the presentation of a native picture accompanying audio material have any effect on audio quality evaluation as a function of down-mix algorithm and listening position?

In order to answer these questions a formal listening test was carried out.


1 SELECTION OF PROGRAM MATERIAL

The main and most obvious criterion of the selection of program material was to choose the most popular and generic types of material that are currently used. Therefore it was decided to choose excerpts representing categories such as classical, pop music, movies, and TV shows. A special excerpt with applause recorded after a concert of classical music was also included in our selection. In the authors' opinion this item created a very enveloping impression and therefore was suitable for testing down-mix algorithms.

Since surround audio material varies in its spatial content, it was considered to choose a criterion of program material selection based on microphone or panning techniques used during recording. However, there were two problems related to this criterion. First, it would be necessary to select excerpts representing a large number of types of multichannel microphone and panning techniques, which would increase the number of excerpts used in the experiment, making the listening test longer and more complicated. (A detailed discussion of different multichannel microphone and panning techniques can be found in [2].) Second, detailed information about the microphone and panning techniques used in some recordings is not always easily accessible. Therefore, in order to simplify the method of program selection, it was decided to use a criterion based on an audio-scene paradigm [3].

In this approach, program material is divided into two basic categories according to the spatial characteristics of the program content presented in the front and rear channels, as judged by a small selection panel. Basically, it is possible to distinguish between the two most typical audio-scene categories, called F-B and F-F, respectively. The first category (F-B) describes the case where the front channels reproduce predominant foreground audio content (mainly close and clearly perceived audio sources), whereas the rear channels contain only background audio content (room response, reverberant sounds, unclear, "foggy"). This situation may be compared to the typical sound impression perceived by a listener sitting in a concert hall (sound stage with musicians at the front, reflections from the sides and back). Therefore typical recordings of classical music can be described as F-B scenes, since the front loudspeakers reproduce predominant foreground content (orchestra, soloists, and so on) whereas the rear channels contain only room response in the form of reverberation. (The interested reader may find exemplary recordings representing the F-B category elsewhere [4], [5].) The second audio scene (F-F) describes a recording in which both front and rear channels contain predominant foreground content. This category may refer to the audio impression when a listener is surrounded by the orchestra. As opposed to the F-B category, where rear channels contain only reverberation, in the F-F category the rear channels also contain clearly identifiable sound sources, often different from the instruments reproduced by the front channels, for example, percussion instruments, backing vocals, and so on. Nowadays many modern pop-music recordings are mixed down in a way that can be described by means of the F-F spatial audio scene (see [6], [7] as exemplary recordings).

The authors found that sometimes it is difficult to use this approach for the categorization of some items, especially those with variable spatial characteristics (such as dynamically panned effects). Moreover, some recordings do not fall neatly into one or the other category. For example, some film music items with a string section "pulled back" toward a listener are difficult to categorize precisely into either of the categories. Therefore in the experiment it was decided to use only "obvious" items of short duration. The short loops selected for the experiment were judged to be reasonably consistent in their spatial characteristics and predominantly in the chosen category. The selection process was performed by the authors and verified informally by their colleagues. Three items with F-B characteristics and three with F-F characteristics were selected for the experiment. The results of the informal validation by the authors' colleagues confirmed that the material selected fell correctly into the F-B and F-F categories, respectively.

One may raise some questions about the reliability of the procedure applied, possible biases, and their implications. The authors believe that the applied method of program material categorization based on the spatial audio-scene paradigm is externally valid, that is, any other experimenter can select recordings representing the basic audio-scene categories (F-B and F-F) in the same way or a way similar to the one proposed by the authors. However, this statement needs experimental verification, which will be the subject of the next experiment. It is planned that in a future experiment more than one hundred recordings will be judged by a group of trained listeners and subjectively classified in terms of basic audio-scene categories. This will enable us to assess the reliability of the method and to detect any possible biases involved in the categorization process. It is also hoped that some physical descriptors of the audio signals, allowing for automatic classification of audio recordings, would be derived on the basis of the experimental data.

As already mentioned, an important selection criterion for the material was consistency of its characteristics. Long items having variable spectral and spatial characteristics are difficult to assess. Therefore it was decided to use relatively short (approximate duration 20 seconds), looped items with possibly time-invariant characteristics. The exception was the TV show item, in which case it was impossible to select any excerpt with very consistent characteristics. Special attention was paid to creating artistically "correct" loops from both an audio and a visual point of view. The final selection of program material, with descriptions, is presented in Table 1. The measurement results of the total rms power for each item selected are given in the Appendix (Table 7).

2 DOWN-MIX ALGORITHMS

The development of new down-mix algorithms requires the engineering of advanced time-variant and frequency-dependent algorithms optimized by means of formal listening tests, which is beyond the scope of this study. In this experiment it was decided to compare the standard down-mix algorithms based on international recommendations [8]. Alternative algorithms exist (for example, [9]); however, their possible advantages over the standard ones have not yet been proven, and therefore they were not included in this study.

Table 2 gives a detailed list of the algorithms evaluated together with the corresponding mixing equations. Most of the down-mix algorithms used in this experiment were adapted from ITU-R BS.775 [8]: 1/0, 2/0, 3/0, 2/1, 3/1, 2/2. Since the original recommendation gives flexibility concerning the choice of coefficients for down-mixing the rear channels, some informal experiments (described later) have been undertaken in order to optimize these coefficients subjectively. Two further algorithms (1/2 and LR-mono) were also included due to their potential applicability.

The first algorithm (1/0) presented in Table 2 allows for the down-mixing of surround five-channel audio material into mono format. This algorithm can be described using the following equation:

    C' = 0.71 L + 0.71 R + C + 0.5 LS + 0.5 RS        (1)

where

    C' - center output channel
    L  - front left input channel
    R  - front right input channel
    C  - front center input channel
    LS - left surround input channel
    RS - right surround input channel.

A broad range of program material was auditioned using this algorithm, and it was found that apart from a substantial change in spatial characteristics, distortions of timbre were also clearly noticeable. These distortions were perceived mainly in the form of pronounced low-frequency content or, sometimes, as sound coloration due to a comb-filtering effect (especially for the Applause item). Some informal listening tests aimed at the optimization of this algorithm were carried out. The results showed that increasing the value of the coefficients used for down-mixing the left surround channel LS and the right surround channel RS from 0.5 to 0.71 may decrease intelligibility for some types of program material. The authors, in an informal listening test, determined that the recommended default value of 0.5 is the best compromise between intelligibility and aesthetic quality of the recording; see Eq. (1).
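As a concrete illustration, Eq. (1) with the 0.5 surround weighting discussed above amounts to a single weighted sum. The following is a minimal NumPy sketch, assuming the five main channels are supplied as equal-length sample arrays; it is not code from the paper.

```python
import numpy as np

def downmix_1_0(L, R, C, LS, RS, surround_gain=0.5):
    """Mono (1/0) down-mix of Eq. (1): C' = 0.71 L + 0.71 R + C + 0.5 LS + 0.5 RS.
    Setting surround_gain = 0.71 reproduces the louder rear weighting that the
    informal tests found could reduce intelligibility for some material."""
    return 0.71 * (L + R) + C + surround_gain * (LS + RS)
```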

The next algorithm (2/0) allows for the down-mixing of five-channel audio material to the traditional two-channel stereo format. Therefore it seems to be of special importance, as it has a broad range of possible applications. Some informal listening tests aimed at the optimization of this algorithm were also carried out. According to the results, both a value of 0.5 and a value of 0.71 used as coefficients for down-mixing the surround channels could give artistically satisfactory results. Both values were almost equally good, and it was difficult to establish an optimum value. This discrepancy in the results was also observed in the experiment conducted at the BBC Research Department (described in [2]).

The algorithm denoted 3/0 corresponds to a situation in which the rear channels are redirected and mixed with the front channels.


Table 1. Audiovisual material selected for the experiment. (All excerpts contained a moving picture.)

Item 1: Classical music, F-B, 5 s. Typical orchestral music recording with pronounced violin and cello sections. Instruments in front channels with reverberation in rear ones. No LFE (low-frequency-effects) channel. Picture content: view of playing musicians in orchestra.

Item 2: Pop music, F-B, 11 s. Live recording. Instruments panned to front channels with reverberation in rear channels. Center channel: mainly leading vocal. Picture content: view of leading vocalist and musicians performing on stage.

Item 3: Pop music, F-F, 14 s. Live recording. Instruments mixed to all channels. Center channel: leading vocal, kick and snare drum. Rear channels: piano and string section. Picture content: view of leading vocalist and musicians performing on stage.

Item 4: Movie, F-B, 8 s. Typical movie excerpt. Center channel: dialogue. Front left and right channels: some special audio effects. Orchestral music spread around all loudspeakers except center one. Front loudspeakers louder than rear ones. No LFE channel. Picture content: group of talking people.

Item 5: TV show, F-F, 10 s. Typical TV show with audience (live). Audience laughter and applause in all channels. Center channel: mainly voice of presenter, also audience laughter. No LFE channel. Picture content: presenter and audience.

Item 6: Applause, F-F, 14 s. Applause in all channels. Very spatial and enveloping item. No LFE channel. Picture content: view of audience and musicians.

The next algorithm (2/1) represents the case in which the center channel is down-mixed to the front left and right channels, with the rear channel operating in a mono mode. In the 1/2 algorithm all front channels are down-mixed to the center channel without any modification of the rear channels. This algorithm was not indicated by any recommendation, but it was included in our selection due to its possible applicability (for example, for program material with a narrow front image). The next algorithm described in Table 2 is denoted 3/1. In this algorithm the front channels are not processed, but the rear channels are down-mixed to mono. For the 2/2 algorithm the center channel is down-mixed to the front left and right channels, and the rear channels remain unchanged. The last algorithm used in this experiment is denoted LR-mono. Similarly to the 1/2 algorithm, it was not indicated by any recommendation. This algorithm was included in the experiment mainly because the results of the informal pilot tests showed that for some program material it was one of the "best" algorithms in terms of audio quality.

A broad range of program material was auditioned using the algorithms selected, and some informal trials of optimizing the down-mix equations were performed. The results confirm that the coefficients presented in Table 2 were acceptable in terms of resultant audio quality. It does not mean that the coefficients presented for each down-mix algorithm are optimal, but they were found to be the best in the informal tests. In order to optimize each down-mix algorithm, according to the informal tests undertaken by the authors, it would be necessary to modify the equations by taking into account any short-time cross correlation between channels and their instantaneous spectra. Therefore optimization of the down-mix algorithms would require engineering of advanced time-variant and frequency-dependent algorithms evaluated by formal listening tests, which is beyond the scope of this experiment.

Regardless of the type of down-mix algorithm, it was decided to preserve the low-frequency-effects (LFE) channel in recordings that originally contained this channel. In other words, the content of the LFE channel was not modified in the down-mixed versions of the original recordings.
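A sketch of how the weighting factors of Table 2 could be applied in practice is shown below. The coefficient matrix is the 2/0 (stereo) entry of Table 2; the channel ordering and function names are illustrative assumptions, and, as described above, the LFE channel is simply passed through unchanged.

```python
import numpy as np

# Rows are output channels; columns are the inputs in the assumed order [L, R, C, LS, RS].
# The values are the 2/0 (stereo) entry of Table 2.
WEIGHTS_2_0 = np.array([
    [1.0, 0.0, 0.71, 0.71, 0.00],   # output L
    [0.0, 1.0, 0.71, 0.00, 0.71],   # output R
])

def downmix_main(main, weights):
    """Down-mix a (5, n_samples) array of the main channels with a Table 2 matrix."""
    return weights @ main

def downmix_item(main, lfe, weights):
    """Down-mix one program item; the LFE channel is left unmodified."""
    return downmix_main(main, weights), lfe
```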

3 SELECTION OF LISTENING PANEL

The listening panel consisted of ten experienced listeners selected from the group of listeners who had already taken part in previous experiments [1]. This group was recruited using a special screening procedure during which a questionnaire, audiometric measurements, and a special "discrimination" test were carried out to verify each listener's reliability and consistency.


Table 2. Weighting factors in down-mix equations (see Eq. (1) for comparison).

                                  Output     Input channels
Down-Mix Algorithm                Channel    L      R      C      LS     RS
Mono (1/0 format)                 C          0.71   0.71   1      0.5    0.5
Stereo (2/0 format)               L          1      0      0.71   0.71   0
                                  R          0      1      0.71   0      0.71
Three channels (3/0 format)       L          1      0      0      0.71   0
                                  R          0      1      0      0      0.71
                                  C          0      0      1      0      0
Three channels (2/1 format)       L          1      0      0.71   0      0
                                  R          0      1      0.71   0      0
                                  LS RS      0      0      0      0.71   0.71
Three channels (1/2 format)       C          0.71   0.71   1      0      0
                                  LS         0      0      0      1      0
                                  RS         0      0      0      0      1
Four channels (3/1 format)        L          1      0      0      0      0
                                  R          0      1      0      0      0
                                  C          0      0      1      0      0
                                  LS RS      0      0      0      0.71   0.71
Four channels (2/2 format)        L          1      0      0.71   0      0
                                  R          0      1      0.71   0      0
                                  LS         0      0      0      1      0
                                  RS         0      0      0      0      1
Four channels (LR-mono format)    L R        0.5    0.5    0      0      0
                                  C          0      0      1      0      0
                                  LS         0      0      0      1      0
                                  RS         0      0      0      0      1

Note: L - left; R - right; C - center; LS - left surround; RS - right surround.

During the discrimination test listeners were asked to evaluate the quality of slightly impaired items and a hidden reference (unprocessed recording). This procedure made it possible to find out how accurately listeners can discriminate between the hidden reference and slightly impaired items. More details concerning this procedure can be found in [1].

Although the selected listeners demonstrated high reliability and consistency in previous experiments investigating the effects of bandwidth limitation, one could not be sure that these listeners would perform equally well in an experiment related to spatial deterioration. It was also recognized that most of the listeners might have been biased due to their habits of listening to traditional two-channel stereo recordings. Therefore it was decided to provide the listeners with the opportunity of extra training (1 hour per subject). After reading the instructions the subjects watched a short audiovisual excerpt and then were asked to listen to a number of different surround recordings (approximately 40 excerpts). The intent of this watching and listening was to present to the listeners typical surround recordings, and thus to minimize their traditional "two-channel stereo" bias. Toward the end of the training the listeners could familiarize themselves with the interface and take part in an exemplary listening test. The instructions given to the subjects in the training phase are presented in the Appendix.

4 EQUIPMENT

Five loudspeakers were arranged according to ITU-R BS.775 [8] (see Fig. 1). The distance between the loudspeakers and the center listening position was 2.1 m. The subwoofer was located behind the center loudspeaker, about 30 mm from the wall and 350 mm from the center loudspeaker.

A TV monitor (42-in plasma display, 16:9 aspect ratio) was used for the visual presentations. The distance between the TV monitor and the listener was set to 4H, where H is the height of the viewing area. This distance conformed to [10]. It was not easy to decide where to install the TV monitor with respect to the center loudspeaker. Several options were informally tested. Eventually it was decided to set up the TV monitor below the center loudspeaker and to set the center loudspeaker higher than the remaining channels. This was the most comfortable arrangement for the listeners/viewers. To minimize the phase distortion at high and mid frequencies due to the different distances between the listener and the tweeters of the front loudspeakers, the center loudspeaker was installed upside down in such a way that the tweeters were aligned at the same height (see Fig. 2). Informal subjective tests showed that this arrangement did not cause the audio quality to deteriorate noticeably, and further adjustments (such as loudspeaker phase alignment) were not needed.

In this experiment all channels (L, R, C, LS, RS) were driven without a bass management system in order to minimize any undesired effects (deterioration of spatial characteristics) due to the bass management system. Therefore all channels were connected directly to the corresponding loudspeakers. The LFE channel was connected directly to the subwoofer. The gain of the LFE channel in the console was set 10 dB higher than the gain of the main channels. The technical specifications of the loudspeakers used in the experiment are presented in the Appendix.

Fig. 1. Loudspeaker setup used in the experiment. A: center listening position; B: off-center listening position. Dimensions in meters. [Figure: plan view of the five loudspeakers on a 2.1-m radius, surround loudspeakers at 115 degrees, subwoofer and TV monitor behind the center loudspeaker, and listening positions A and B.]

The listening tests were automated using the Alex software developed at the Institute of Sound Recording. It was run on an SGI computer with built-in digital audio (ADAT) and analog video extension cards. The audio items were stored using six-channel uncompressed "wav" audio files (16-bit resolution, 48 kHz sampling rate), whereas the accompanying video material was stored in the M-JPEG format using a spatial compression factor of 0.85. The audio signal was transmitted digitally from the SGI computer to a digital mixing desk (Yamaha O2R) and then fed to the active loudspeakers using analog connections. A computer monitor with a mouse was set up in front of the listener, low enough that any distortion due to acoustical "shadowing" or reflections was minimized. The computer monitor did not hide the TV monitor.

The luminance of the TV monitor, the luminance of the computer monitor, and the background room illumination were not measured, but these parameters were kept constant during the experiment. The illuminance of the TV monitor was set to the standard settings.
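For readers who want to experiment with material stored in this way, a minimal sketch of loading a six-channel, 16-bit, 48-kHz WAV file follows (Python standard library plus NumPy). The L/R/C/LFE/LS/RS interleaving order is an assumption; the paper does not state the channel order used in its files.

```python
import wave
import numpy as np

CHANNEL_NAMES = ["L", "R", "C", "LFE", "LS", "RS"]  # assumed interleaving order

def load_surround_wav(path):
    """Read a six-channel, 16-bit, 48-kHz WAV file into a dict of float arrays."""
    with wave.open(path, "rb") as w:
        assert w.getnchannels() == 6 and w.getsampwidth() == 2
        assert w.getframerate() == 48000
        raw = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
    samples = raw.reshape(-1, 6).T.astype(np.float32) / 32768.0
    return dict(zip(CHANNEL_NAMES, samples))
```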

5 ACOUSTICAL CONDITIONS

The listening tests were conducted in the listening room of the Institute of Sound Recording at the University of Surrey, UK. The acoustical parameters of this room conform to the requirements of ITU-R BS.1116 [11].

5.1 Level Alignment

All channels (L, R, C, LS, RS) were aligned relative to each other with a tolerance of less than 0.3 dB SPL (measured at the reference listening position). Absolute level alignment was carried out using a modified Surround Sound Forum approach. This method is explained in detail in [12]. The modification consisted in reducing the alignment signals by 10 dB. The reason for this modification was the high sensitivity of the loudspeakers in conjunction with the high analog output level of the console's digital-to-analog converters. The alignment procedure was as follows:

• Band-limited pink noise (200 Hz to 20 kHz, -30 dB FS rms) was generated consecutively through each main loudspeaker (one channel at a time). The input sensitivity potentiometers in each loudspeaker were adjusted to achieve a sound pressure level (SPL) at the optimum listening position equal to 78 dBA (slow).

• Band-limited uncorrelated pink noise (200 Hz to 20 kHz, -30 dB FS rms) was generated through all main channels at the same time. The SPL measured at the optimum listening position was equal to 85 dBA (slow).

All measurements were performed using a 1/2-in pressure microphone (B&K type 4143) at the center listening position (measurements were carried out only at one listening position). The microphone was installed at a height of 1.2 m and was pointing upward.

The level of the subwoofer was aligned using band-limited pink noise in the LFE channel (20 Hz to 200 Hz). This signal was reproduced over the subwoofer and the center channel by means of a simple bass management system used only for alignment purposes. (The bass management was not used during the experiment.) The reason for using this bass management system was to avoid an overlap in the frequency region determined by the lower cutoff frequency of the center loudspeaker (41 Hz) and the upper cutoff frequency of the subwoofer (85 Hz). Details concerning the bass management system used for alignment purposes are described in [1]. The spectrum of the resultant sound at the optimum listening position was analyzed in one-third-octave bands. The subwoofer sensitivity was adjusted to achieve a maximally flat response within the range of frequencies from 20 Hz to 200 Hz measured in one-third-octave bands. Once the alignment of the subwoofer had been completed, the gain of the LFE channel in the console was set 10 dB higher than the gain of the main channels.
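A rough sketch of generating the kind of alignment signal described above (band-limited pink noise at a given rms level re digital full scale) is shown below. It is an illustration only, not the Surround Sound Forum's exact signal specification; uncorrelated noise for the all-channels measurement can be obtained by using a different seed per channel.

```python
import numpy as np

def bandlimited_pink_noise(duration_s, fs=48000, f_lo=200.0, f_hi=20000.0,
                           level_dbfs=-30.0, seed=0):
    """Band-limited pink noise scaled to the requested rms level re full scale (1.0)."""
    rng = np.random.default_rng(seed)
    n = int(duration_s * fs)
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    shaping = np.zeros_like(freqs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    shaping[band] = 1.0 / np.sqrt(freqs[band])   # 1/f power spectrum gives the pink slope
    noise = np.fft.irfft(spectrum * shaping, n=n)
    target_rms = 10.0 ** (level_dbfs / 20.0)
    return noise * (target_rms / np.sqrt(np.mean(noise ** 2)))
```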

Fig. 2. Setup of the center loudspeaker with respect to the TV monitor and other loudspeakers. Dimensions in meters. [Figure: front view of the center loudspeaker mounted, inverted, above the 42-in TV monitor, with the left and right loudspeakers on either side.]

5.2 Loudness Alignment

The loudness of all stimuli (both original and down-mixed) used in the experiment was equalized in order to minimize any experimental error due to loudness changes. Equalization was performed only at the center listening position. The level of the audio source material was adjusted to achieve a loudness of 41 sones at the listening position. This value was assessed by the author as the most comfortable during informal listening tests. Loudness evaluation was accomplished by the measurement of equivalent SPLs (Leq) in one-third-octave bands over a 32-second time window (audio material was looped). The loudness was calculated using Moore's loudness model [13]. Since the loudness model was originally developed for stationary signals only (the model for nonstationary signals was not available to the author in the course of this experiment), it was necessary to check its applicability to the loudness equalization of the nonstationary, but relatively consistent, audio material used in this experiment. Informal listening tests showed that the results obtained were acceptable, although some loudness differences between different down-mix algorithms were still perceivable (especially between the reference and the mono version).
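The loudness equalization can be pictured as a simple iterative gain search against a loudness model. The sketch below assumes a hypothetical loudness_sones() estimator standing in for Moore's model, which is not implemented here; the update step only uses the rule of thumb that loudness in sones roughly doubles for every 10 dB increase in level.

```python
import numpy as np

def equalize_loudness(item, loudness_sones, target_sones=41.0, tol=0.1, max_iter=50):
    """Scale a (channels, samples) item until the loudness model reports the target.
    `loudness_sones` is a hypothetical stand-in for a Moore-style loudness estimator."""
    gain = 1.0
    for _ in range(max_iter):
        current = loudness_sones(gain * item)
        if abs(current - target_sones) < tol:
            break
        # ~10 dB of gain per doubling of loudness (Stevens' power-law rule of thumb).
        gain *= 10.0 ** (10.0 * np.log2(target_sones / current) / 20.0)
    return gain * item
```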

6 EXPERIMENTAL DESIGN

6.1 Listening Test Method

It was decided to use a modified double-blind multistimulus test method with a hidden reference and a hidden anchor [14], [15] as a basis for the experimental design. The main reason for this choice is its suitability for the assessment of medium and large impairments. The quality of most of the processed items used in this experiment was degraded quite considerably. Moreover, this test allows for a quick comparison and assessment of a large number of stimuli, which is beneficial in terms of the duration of a listening test. (Details about the number of stimuli assessed by each listener and the duration of the test are discussed later in this section.) In order to check whether listeners could discriminate between the original and processed items correctly, the original unprocessed item was included in each test among the other items to be evaluated. This item is therefore referred to as a "hidden reference" and was used solely as a control item to check the listeners' performance. Another control item, called the "hidden anchor," being a mono version of the original recording (1/0 format), was employed in the tests mainly for two reasons: first, to make the listeners use the scale in a consistent manner, and second, to check their consistency of grading. (Details concerning this issue will be discussed in Section 7.1.) Moreover, the hidden anchor was also intended to be used as a known "bad" quality item to force the listeners' results to span most of the scale. The type of hidden anchor used in the experiment was different from the one suggested by the MUSHRA recommendation [14] in order to avoid possible confusion during the evaluation process. (Originally a 3.5-kHz low-pass-filtered version of the original recording is recommended to be used as the hidden anchor.) In other words, it was decided to keep the nature of the degradation of the audio items the same in the entire experiment and not to mix low-pass-filtered items with the down-mixed items. (According to informal tests undertaken by the authors, the simultaneous evaluation of the quality of items having different types of degradation is more difficult than the evaluation of items degraded in a similar way, that is, only by means of the down-mix algorithm.)
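The structure of a single trial can be sketched as follows: the processed versions of one item are pooled with the hidden reference and the hidden mono anchor and presented for grading in random order, while the open reference remains available separately. This is only an illustration of the test logic, not the behavior of the Alex software used in the experiment; the argument names are assumptions.

```python
import random

def build_trial(processed_versions, reference, mono_anchor, seed=None):
    """processed_versions: list of (label, signal) pairs for the down-mixed items.
    Returns the open reference plus a shuffled list of stimuli to be graded."""
    stimuli = list(processed_versions)
    stimuli.append(("hidden_reference", reference))
    stimuli.append(("hidden_anchor", mono_anchor))
    random.Random(seed).shuffle(stimuli)
    return {"reference": reference, "graded_stimuli": stimuli}
```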

Before taking part in the tests the listeners were asked to read the instructions carefully and to listen to the original (unprocessed) items and the most degraded ones (versions down-mixed to mono). Then they participated in the listening test. Listeners were asked to grade basic audio quality, defined as the global attribute describing any and all detected differences between the reference and the evaluated excerpts. It was emphasized that this attribute might include differences in timbre, in spatial characteristics, in the number of active channels, balance, dynamic range, changes in the front image, changes in the localization of audio sources, changes in envelopment, occurrences of any type of linear or nonlinear distortion, any kind of noise or distortion, distortion caused by compression algorithms, phase distortion, and so on. The reason for providing such a long list of possible subattributes contributing to overall basic audio quality was to reduce any bias caused by the fact that listeners may arbitrarily assume that this global attribute is only related to timbre, distortion, and noise presence, and may not take into consideration any spatial quality aspects. In other words, listeners may identify basic audio quality only with "conventional technical quality of audio," neglecting any spatial audio quality distortions. The grading scale used in this experiment is presented in Table 3.

Since various down-mix algorithms may affect the perceived audio quality differently at different listening positions, it was decided to evaluate the audio quality at two fixed listening positions: (A) center and (B) off-center. These two positions roughly corresponded to the typical domestic situation of two people sitting next to each other. The distance between the two positions was 0.7 m (see Fig. 1). In the listening room two chairs were positioned according to the listening positions investigated. During the listening tests investigating the effect of the off-center listening position, each listener could switch between these two chairs at his or her discretion in order to evaluate possible differences in audio quality between the center and off-center positions.

Informal listening tests at the off-center position performed by the authors prior to the experiment showed that the front image was skewed for some program material, leading to a localization mismatch between audio and video cues. Moreover, the right surround channel seemed to be too loud in comparison with the remaining channels for program material having the F-F spatial characteristic.

Table 3. Scale used for evaluating audio quality.

Quality      Grading Range
Excellent    80–100
Good         60–80
Fair         40–60
Poor         20–40
Bad          0–20

The evaluation procedure was different at the two listening positions. At the center position the listeners were just asked to grade the quality of the processed items in terms of basic audio quality. An exemplary graphic user interface used at the center listening position is presented in the Appendix (Fig. 13). The main button (REFERENCE) was used to play back the original (unprocessed) item. Buttons labeled 1 to 6 represent excerpts to be graded (processed items, the hidden reference, and the hidden anchor). Sliders were used to record the scores given by the listeners for each item. Listeners were instructed that the scale was continuous and that they were free to record their scores using any number from the minimum to the maximum of the scale. Labels (numbers) on the scale defined only some characteristic points. Subjects were instructed that at the center listening position one or more excerpts should be given the maximum grade on the scale because the unprocessed reference excerpt was included as one of the excerpts to be graded. (It was assumed that the quality of each original item at the center listening position was excellent because it referred closely to the intent of the producer.)

At the off-center position the subjects were asked to perform two separate tasks. The first task was to evaluate the quality of the original excerpt at the off-center position in relation to the quality of this item perceived at the center position. During this task the listeners were free to move (they could swap listening positions at their discretion). In this way they could compare the quality of the original excerpt at the off-center position with the quality of the same excerpt at the center position and evaluate a possible loss of quality due to changing from the center position to the off-center position. After completion of this task they were asked to perform the second task, which was grading the quality of the processed items. During the second task they could not move from the off-center position. In other words, at both listening positions listeners were asked to grade the quality of the down-mixed items, but at the off-center listening position the subjects were also asked to grade the quality of the original excerpt in comparison with the quality of this excerpt at the center listening position. The user interface used at the off-center listening position was different from that used at the center listening position. The main difference was that the button R representing the reference item was made smaller and was aligned in the same row as the remaining items to be graded. An extra slider associated with this button was also inserted between the other sliders. This modification allowed for the evaluation of the original excerpt at the off-center listening position.

Listeners could listen to the excerpts in any order and any number of times. The audio material was looped during each trial. The subjects were able to switch between different audio items at their discretion. It is important to note that after switching to a new item, playback continued from the time "point" the previous item had reached at the moment of switching. In other words, it was a synchronous type of switching. A cross-fade transition between switched audio items was used in order to avoid any problems with clicks. The looped accompanying visual excerpt was displayed synchronously with the audio.

The listeners were asked to have their eyes closed during the audio-only presentations and to keep their eyes open and fixed on the TV monitor when the audiovisual material was presented. They had to look at the computer monitor occasionally in order to record their scores using a mouse, but most of the time they could switch between the stimuli "by touch" using the computer keyboard. It was emphasized that during the audiovisual presentations subjects were still expected to grade the quality of the audio, not the video. The detailed instructions given to the listeners are provided in the Appendix.

6.2 Experimental Factors

The following factors were used in the experiment: down-mix algorithm (DMIX), program material (ITEM), listening position (POS), and picture presence/absence (PICTURE). Table 4 shows all experimental levels and values corresponding to these factors. The PICTURE factor was included in the experiment in order to check whether there is any relation between the evaluations of audio quality with and without the native pictures accompanying the audio material.

6.3 Blocking and Randomization

The experiment was designed as a full factorial one. (Each listener took part in each experimental condition, being a combination of all experimental factors and levels.) Therefore, taking into account all possible combinations of the experimental factors and levels, there were 216 excerpts to be graded by each listener (9 × 6 × 2 × 2). This number did not include the listeners' "error check" excerpts such as the hidden reference (HR) or hidden anchor (HA). Since there were too many excerpts for the evaluation to be completed in one session, it was necessary to block them into eight separate sessions. For technical reasons the main blocking factors were the listening position and the picture. Another blocking factor was the item (in each session three items out of six were evaluated). The down-mixed versions of each item were also blocked between two consecutive trials, representing two sets of stimuli to be graded (two consecutive windows in the graphic interface). In each trial the hidden reference HR and the hidden anchor HA were included in order to check the listener's consistency or reliability. During the experimental design the schedule of the sessions and the order of presentation of the items within each session were randomized for each subject individually in order to minimize the carry-over effect. The order of assigning the stimuli to buttons on the graphic interface was also randomized. The average duration of one session was about 25 minutes. Breaks between two consecutive sessions for each subject were never shorter than 1 hour (on average the breaks lasted a few hours or sometimes even a few days). The whole listening test was carried out within two weeks.
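For illustration, a minimal sketch of how such a blocked, per-subject randomization could be generated is shown below; the session structure, trial sizes, and names are illustrative assumptions rather than the authors' actual test software.

    # Sketch: per-subject randomized schedule of eight sessions
    # (2 listening positions x 2 picture conditions, 3 of the 6 items per session).
    import random

    ITEMS = ["Classical F-B", "Pop F-B", "Pop F-F", "Movie F-B", "TV show F-F", "Applause F-F"]
    DMIX = ["1/0", "2/0", "3/0", "2/1", "1/2", "3/1", "2/2", "LR-mono"]

    def subject_schedule(seed):
        rng = random.Random(seed)                      # individual randomization per subject
        sessions = []
        for pos in ("center", "off-center"):           # blocking factors: position and picture
            for picture in ("on", "off"):
                items = ITEMS[:]
                rng.shuffle(items)
                for block in (items[:3], items[3:]):   # three items per session
                    trials = []
                    for item in block:
                        versions = DMIX[:]
                        rng.shuffle(versions)
                        # each item's versions are split over two consecutive trials of six
                        # stimuli (four down-mixes plus hidden reference and hidden anchor)
                        for chunk in (versions[:4], versions[4:]):
                            stimuli = chunk + ["HR", "HA"]
                            rng.shuffle(stimuli)       # randomized button assignment
                            trials.append({"item": item, "stimuli": stimuli})
                    sessions.append({"position": pos, "picture": picture, "trials": trials})
        rng.shuffle(sessions)                          # randomized session order
        return sessions

    schedule = subject_schedule(seed=1)                # e.g., subject 1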

7 DATA ANALYSIS

7.1 Test of Listeners' Reliability and Consistency

scores given by this listener due to his low reliability, butthe results of an additional test (discussed later) showedthat this listener was one of the best in terms of consis-tency. Therefore, although this listener can be character-ized as the least reliable, his scores were included in allfurther analysis.

There were no repetitions included in the experimentaldesign, apart from the hidden reference (two repetitions ateach listening position) and the hidden anchor (three rep-etitions at each listening position). Therefore, the scoresobtained for these two items were used for evaluating eachlistener’s inconsistency. It was decided to exclude fromthis analysis the scores obtained for the hidden referenceat the center listening position since this is a relativelyeasy task in terms of consistency. In order to check eachlistener’s inconsistency an ANOVA test was performedseparately for each subject. A square root of the error vari-ance from the ANOVA analysis was used as a measure ofa listener’s inconsistency. The reason for calculating thesquare root of the error variance instead of taking intoaccount its direct value was the need for having a measurethat is comparable to the “units” of the original scale usedin the test. For example, the square root of the error vari-ance for the most consistent listener was equal to 0.1,which can be interpreted that the average inconsistencyerror measured on a 100-point scale was equal to only 1point (a surprisingly high consistency). A more detailedanalysis of the data obtained from this listener showed thathis low error value was due to the fact that he was gradingusing some characteristic points on the scale such as labelsor midway points between labels. The average inconsis-tency error for the least consistent listener was 8 points.Interestingly, the listener showing the least reliability inthe previously discussed test was one of the most consis-tent listeners (average inconsistency error equal to 3.3.


Table 4. Experimental factors, levels, and values used in experiment.

Factor      Number of Levels    Values
Down-mix    9                   1/0 (mono), 2/0 (stereo), 3/0, 2/1, 1/2, 3/1, 2/2, LR-mono, HR (hidden reference)
Item        6                   Classical F-B, Pop F-B, Pop F-F, Movie F-B, TV show F-F, Applause F-F
Position    2                   Center, Off-center
Picture     2                   On, Off



Taking into account the fact that all listeners were relatively consistent (the absolute value of the inconsistency error was less than 10), no postscreening was needed.
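A minimal sketch of this inconsistency measure (the square root of the error variance pooled over repeated gradings of the same stimulus) is given below; the data layout and example numbers are assumptions.

    # Sketch: listener inconsistency as the square root of the within-cell (error) variance
    # pooled over repeated gradings of the same stimulus, expressed in scale "units".
    import numpy as np

    def inconsistency(repeated_scores):
        """repeated_scores: list of lists, one inner list per repeated stimulus
        (e.g. the hidden anchor graded several times under the same condition)."""
        ss_error, df_error = 0.0, 0
        for scores in repeated_scores:
            s = np.asarray(scores, dtype=float)
            ss_error += np.sum((s - s.mean()) ** 2)
            df_error += len(s) - 1
        return np.sqrt(ss_error / df_error)

    # Hypothetical repetitions of the hidden anchor and the off-center hidden reference:
    print(inconsistency([[18, 22, 20], [15, 19, 17], [95, 99]]))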

7.2 Test of ANOVA Assumptions

Three main assumptions for the ANOVA test are as follows: 1) independence of grading, 2) normal distribution of scores for each case, and 3) homogeneity of variance between cases.

There are several mechanisms that cause dependency in grading. For example, the evaluation of the quality of a given item may be affected by the quality of the other excerpts contained in the user interface. Another possible source of dependency in grading is the visual influence of the slider positions in the interface. These and other possible sources of dependency were minimized in this experiment due to the randomization of the experimental factors.

The distributions of the scores obtained for all experimental factors were examined using the Kolmogorov–Smirnov test. The results revealed that the scores for most of the experimental conditions show significant departures from normality.

Another assumption for ANOVA is that the data in each cell come from populations with the same variance (homogeneity of variance). This assumption was also violated.

According to the results obtained, two main assumptions for ANOVA were violated: normality of distributions and homogeneity of variance. Nevertheless, it is known that the ANOVA test is "robust" to violations of the normality assumption provided the sample size is large (minimum 15 cases per group) [16]. In this experiment this requirement was fulfilled. Moreover, an ANOVA test may still give reliable results even when the variances are not equal across different groups, provided the number of cases in each group is the same [17]. This condition was not fulfilled in the experiment since the hidden reference HR and the hidden anchor HA were evaluated more frequently than the other excerpts (unbalanced design). Therefore it was decided to balance the data obtained (equalizing the number of cases across groups) by calculating and taking into the ANOVA test the mean values of the scores obtained for the hidden reference HR and the hidden anchor HA. ("Raw" scores obtained for HR and HA were ignored in the main analysis.) After this preprocessing of the data the use of ANOVA in our experiment is legitimate. The scores obtained for different subjects were not normalized.
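A sketch of how these checks and the balancing step could be carried out with standard tools is shown below; the SciPy-based tests merely stand in for whatever statistics package was actually used, and the data layout is an assumption.

    # Sketch: Kolmogorov-Smirnov normality check and Levene variance check per cell,
    # plus balancing by collapsing the repeated HR/HA gradings to one mean score per cell.
    import numpy as np
    from scipy import stats

    def normality_p(scores):
        x = np.asarray(scores, dtype=float)
        z = (x - x.mean()) / x.std(ddof=1)          # standardize before the KS test
        return stats.kstest(z, "norm").pvalue       # small p -> departure from normality

    def homogeneity_p(groups):
        return stats.levene(*groups).pvalue         # small p -> unequal variances

    def balance_hr_ha(cells):
        """cells: dict mapping (listener, item, position, picture, dmix) -> list of scores."""
        return {key: [float(np.mean(v))] if key[4] in ("HR", "HA") else list(v)
                for key, v in cells.items()}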

8 RESULTS

8.1 ANOVA Test

The ANOVA test was performed using a custom model comprising the main effects and all second-order interactions. Subjects were also included as an experimental factor in the ANOVA model. The results of the ANOVA test are presented in Table 5.

According to the results, four factors and eight interactions were detected as significant at P < 0.05. It is important to note that the F values presented in this table can be used only for determining whether a particular factor or interaction is significant; they cannot be used for estimating the magnitude of an experimental effect. ("The fact that an analysis of variance has produced a significant F simply tells us that there are differences among the means of treatments that cannot be attributed to error. It says nothing about whether these differences are of any practical importance" [17].) Therefore, in order to estimate the magnitude of the effects observed, it was decided to use the partial eta squared (η²) values presented in the last column of Table 5.
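As a check, partial η² for a given factor can be computed directly from the sums of squares as SS_factor / (SS_factor + SS_error); the few lines below, with values copied from the DMIX and Error rows of Table 5, reproduce the reported 0.715.

    # Partial eta squared: SS_effect / (SS_effect + SS_error), using the DMIX row of Table 5.
    ss_dmix, ss_error = 694058.477, 277125.793
    partial_eta_sq = ss_dmix / (ss_dmix + ss_error)
    print(round(partial_eta_sq, 3))                 # 0.715, as reported for DMIX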

Table 5. Results of ANOVA test. Tests of between-subjects effects.

Source              Type III Sum of Squares    df      Mean Square    F          P        Partial η²
DMIX                694058.477                 8       86757.310      603.582    0.000    0.715
ITEM                29984.573                  5       5996.915       41.721     0.000    0.098
POS                 27115.351                  1       27115.351      188.645    0.000    0.089
PICTURE             154.242                    1       154.242        1.073      0.300    0.001
SUB                 109270.826                 9       12141.203      84.468     0.000    0.283
DMIX ∗ ITEM         265220.074                 40      6630.502       46.129     0.000    0.489
DMIX ∗ POS          20962.445                  8       2620.306       18.230     0.000    0.070
DMIX ∗ PICTURE      1685.716                   8       210.714        1.466      0.165    0.006
DMIX ∗ SUB          49631.601                  72      689.328        4.796      0.000    0.152
ITEM ∗ POS          2788.123                   5       557.625        3.879      0.002    0.010
ITEM ∗ PICTURE      1953.682                   5       390.736        2.718      0.019    0.007
ITEM ∗ SUB          12860.120                  45      285.780        1.988      0.000    0.044
POS ∗ PICTURE       1720.617                   1       1720.617       11.971     0.001    0.006
POS ∗ SUB           2110.385                   9       234.487        1.631      0.101    0.008
PICTURE ∗ SUB       7112.469                   9       790.274        5.498      0.000    0.025
Error               277125.793                 1928    143.737
Total               9932461.040                2155
Corrected total     1502175.288                2154

Note: Dependent variable—Score.

Fig. 3 shows the magnitude of the effects observed in descending order. According to expectations, the down-mix algorithm (DMIX) had the greatest influence on the scores obtained (η² = 0.715). A strong interaction between program material and down-mix algorithm was also detected (ITEM ∗ DMIX, η² = 0.489). The results presented in Fig. 3 indicate a large variability between subjects (SUB, η² = 0.283). A strong interaction between subjects and down-mix algorithms was also found (SUB ∗ DMIX, η² = 0.152). The results obtained also depend on the program material (ITEM, η² = 0.098). It was found that the scores obtained were different for the different listening positions (POS, η² = 0.089) and that this effect was influenced by the type of down-mix algorithm (POS ∗ DMIX, η² = 0.07). There was a group of interactions that, although statistically significant, had only a small effect on the experimental results (SUB ∗ ITEM, SUB ∗ PICTURE, POS ∗ ITEM, PICTURE ∗ ITEM, and POS ∗ PICTURE, η² < 0.05). Thus the experiment can be summarized by plotting the results for the following factors and interactions: ITEM ∗ DMIX, SUB ∗ DMIX, and POS ∗ DMIX.

8.2 Effects of Down-Mix Algorithms

As discussed in the previous section, the down-mix algorithms and the interaction between down-mix algorithms and program material affected the experimental results in the most significant way. Fig. 4 shows the effects of the down-mix algorithms for the different program material, averaged across both listening positions.

The "best" algorithm for the Classical F-B item was the algorithm in which the rear channels were down-mixed to mono (3/1). This algorithm was also assessed subjectively as the best in the case of the remaining two items having the F-B spatial characteristic (Pop F-B and Movie F-B). This observation indicates that the results are strongly dependent on the spatial characteristic of the program material. For example, it is possible to note that for all the items having the F-B spatial characteristic the second-best algorithm is the down-mix to the front channels (3/0). It can be summarized that for program material with the F-B spatial characteristic, limiting the number of channels from five to four was achieved with a minimum loss of quality by means of the 3/1 down-mix algorithm. A further limitation of the number of channels down to three was accomplished successfully by using the 3/0 down-mix algorithm (down-mix to front channels only), with resultant "good" or even "excellent" quality.

In the case of program material having the F-F spatial characteristic the subjectively best down-mix algorithms are different for different items. For example, for the Pop F-F item, LR-mono was evaluated as the best four-channel down-mix algorithm and 1/2 as the best three-channel algorithm. For the TV show F-F item, LR-mono and 3/1 were assessed as the best four-channel algorithms, whereas 3/0 together with 1/2 were the best three-channel algorithms. For the Applause F-F item, the 2/2 down-mix algorithm was evaluated as the best four-channel algorithm, whereas the statistical differences between the mean values obtained for the three-channel algorithms were insignificant (similar audio quality). Therefore it is difficult to summarize the results for program material having the F-F spatial characteristic. According to the results it can be concluded that all the four-channel down-mix algorithms investigated (LR-mono, 3/1, and 2/2) can be used successfully, but the subjective results depend strongly on the program material. The 1/2 and 3/0 down-mix algorithms were assessed as the best three-channel algorithms for program material having the F-F spatial characteristic.

Fig. 3. Magnitude of significant effects and interactions on scores obtained according to partial η² values from ANOVA test.

Fig. 4. Effects of different down-mix algorithms (panels: Classical F-B, Pop F-B, Pop F-F, Movie F-B, TV Show F-F, Applause F-F). Scores averaged across both listening positions. Marginal means and 95% CI estimated by ANOVA model.

Not surprisingly, regardless of the program type, the worst results were obtained using the down-mix to mono algorithm (1/0). It is clear that this was caused by a substantial deterioration of the spatial characteristic combined with a coloration of the sound (bass tip-up and a comb-filtering effect were especially pronounced for the Applause F-F item).

The mean values of the scores obtained for the down-mix to stereo (2/0) can be characterized as "fair." The only exception is the Classical F-B item, for which this algorithm was assessed as "good."

It is noteworthy that irrespective of the program material, the mean scores obtained for the original recordings (hidden reference HR), averaged across both listening positions, were "excellent" (80–100). This result indicates that the system of multichannel audio reproduction investigated (5.1 setup) may provide satisfactory audio quality for both the center and the off-center listening positions. This issue, however, needs further investigation in order to quantify more precisely a map of the areas of equal quality.

As discussed previously, the subjective effects of down-mix algorithms depend on the spatial characteristic of the program material. In order to understand this phenomenon better it was decided to plot and analyze the scores for the F-B and F-F spatial characteristics separately (Fig. 5). All the down-mix algorithms investigated were deliberately divided into three groups: A, B, and C. Group A consists of the down-mix algorithms affecting both front and rear channels (1/0, 2/0, and 2/1), group B is composed of the algorithms affecting only the front channels (LR-mono, 2/2, and 1/2), whereas group C comprises the algorithms affecting only the rear channels (3/1 and 3/0). It is possible to notice that for the first group of algorithms (group A: 1/0, 2/0, 2/1) the results do not depend significantly on the spatial characteristics (confidence intervals overlap). However, for the second group of down-mix algorithms (group B: LR-mono, 2/2, 1/2) a significant interaction with the spatial characteristics can be observed. The mean scores obtained for the F-F spatial characteristic are better than those obtained for the F-B characteristic. Interestingly, an opposite interaction can be observed for the last group of down-mix algorithms (group C: 3/1, 3/0). The scores obtained for the items having the F-F spatial characteristic are worse in comparison with the scores obtained for program material with the F-B characteristic. A more detailed analysis of Fig. 5 reveals an interesting "rule" that program material with the F-B spatial characteristic is "robust" to quality degradation when it is processed using the algorithms affecting only the rear channels (group C). Conversely, the quality of program material with the F-B spatial characteristic can deteriorate considerably when the algorithms applied modify the content of the front channels (groups A and B).

8.3 Effects of Changing the Listening Position

The effects of changing the listening position for the different down-mix algorithms are illustrated in Fig. 6. This figure shows the marginal means estimated by the ANOVA model, presented separately for the different down-mix algorithms. For clarity, 95% confidence intervals (CI) are not presented in this figure. (Information about the significance of differences can be based solely on a t test.) According to the results of the t test, the differences between the means obtained for the center and off-center listening positions are insignificant only for the 1/0 and LR-mono down-mix algorithms. For the remaining down-mix algorithms these differences are significant at P < 0.1. The results presented in this figure show that the down-mix to mono (1/0) resulted in "poor" audio quality and that the subjective impression related to this algorithm was assessed as being the same at both listening positions. In other words, the down-mix to mono sounds equally poor "everywhere." Surprisingly, the mean scores estimated for the LR-mono down-mix algorithm were also independent of the listening position. This surprising observation implies that the LR-mono down-mix may provide the same audio quality for a relatively large listening area.

Fig. 5. Effects of down-mix algorithms presented separately for different spatial characteristics of program material. Scores averaged across both listening positions. Means and 95% CI based on raw scores.

The greatest differences between the scores obtained for the different listening positions were observed for the hidden reference (HR) and for the 2/2 down-mix algorithm. This means that the loss of quality due to the change of listening position from the center to the off-center location is greatest for the original recordings and for the 2/2 down-mix algorithm (so-called phantom center algorithm). The last observation confirms the superiority of the physical center channel over the phantom one. It is believed that this loss of quality at the off-center listening position is due to the lack of a physical center channel.

For the original recordings (HR) the change in listening position from center to off-center caused a loss in quality from 100 ("excellent") to 83 (boundary of "excellent" and "good"). This observation confirms the previously drawn conclusion that the standard 5.1 multichannel audio setup may provide good audio quality over a relatively large listening area. Moreover, it is possible to note that apart from the results obtained for the original excerpts (HR) and the 2/2 down-mix algorithm, the difference between the audio quality perceived at the center listening position and that at the off-center listening position was not substantial. Therefore it might be concluded that the 5.1 multichannel audio setup may provide stable (or similar) audio quality over a relatively large area for some down-mix algorithms.

8.4 Listener Variability

The results of the previously discussed ANOVA test showed substantial variability between subjects and also a significant interaction between subjects and down-mix algorithms. This interaction is plotted in Fig. 7. For reasons of simplicity this plot does not include the interactions with all down-mix algorithms (algorithms omitted: 1/2, 2/1, 3/0, LR-mono), and it does not show the 95% confidence intervals. The results presented show that listener 2 and listener 10 were more "tolerant" in their assessments than the other listeners (they were giving "better" scores than the others). This effect is especially noticeable for the severely deteriorated items. The least impaired items were graded by all listeners in an almost similar way.

8.5 Audiovisual Interactions

The picture factor by itself did not have any significant effect on the scores obtained. However, according to the ANOVA test, some interactions between picture and other experimental factors had a small but statistically significant effect on the results obtained. For example, Fig. 8 shows the differences between the scores obtained during audiovisual presentation and audio-only presentation for each subject separately. Positive mean values represent an improvement in audio quality due to the presentation of motion pictures, whereas negative values indicate an opposite interaction (zero represents no audiovisual interaction). Stars in this figure indicate mean values that are significantly different from zero according to the results of the t test. It was found that some listeners were more susceptible to pictures than others and also that the presentation of pictures caused different effects for different subjects. For example, subjects 4, 9, and 10 had a tendency to grade the audio quality slightly "better" for audiovisual presentations than for audio only. An opposite interaction was found for subject 5.

Fig. 9 shows the differences between the scores obtained during audiovisual presentations and audio-only presentations for each item. Only the diffgrades obtained for the Pop F-F item were significantly different from zero, which means that for this item the audio quality was graded slightly better (up to 5 points on the 100-point scale) when a picture accompanied the audio presentation. The remaining items were graded similarly for both audiovisual and audio-only presentations.

Fig. 10 illustrates the interaction between picture and listening position. It is possible to note that picture presence caused a slight improvement in audio quality (up to 4 points) at the center listening position. On the contrary, the scores obtained at the off-center position show that the listeners had a tendency to grade the audio quality a bit lower (up to 3 points down) when a picture accompanied the audio presentation. This effect of a negative interaction with the picture at the off-center listening position might have been caused by an audiovisual localization mismatch occurring for some of the down-mix algorithms. In general, the results obtained show that video had a small (but statistically significant) effect on audio scores. This observation is in line with results obtained by Beerends and De Caluwe [18].

Fig. 6. Effects of changing listening position. Results averaged across audiovisual and audio-only presentation. Means and 95% CI based on ANOVA model.

Fig. 7. Listener variability. Scores averaged across both listening positions. Means estimated by ANOVA model.

Fig. 8. Differences between scores obtained with and without picture for different subjects. ∗—means significantly different from zero. Means and 95% CI based on raw data.

Fig. 9. Differences between scores obtained with and without picture for different items. ∗—means significantly different from zero. Means and 95% CI based on raw data.

Fig. 10. Differences between scores obtained with and without picture for different listening positions. Means and 95% CI based on raw data.

A detailed analysis of the data obtained revealed a very interesting interaction between picture and listening position for some of the down-mix algorithms. Fig. 11 shows the differences between the scores obtained with and without pictures for program material having the F-B spatial characteristic. The presentation of the results is limited to the most interesting cases (down-mix algorithms 2/0, 2/2, and 2/1). It is clear that the presentation of a picture caused a slight improvement in audio quality for the center listening position. On the contrary, for the off-center listening position the presentation of a picture caused a substantial deterioration of the audio quality. The magnitude of this interaction is relatively high (10 points). A detailed explanation of this interesting observation is not easy. (An additional experiment focusing on this phenomenon would be required.) However, it is very likely that this effect was caused by the previously mentioned audiovisual localization mismatch (spatial discrepancy between audio and visual cues). It is interesting to note that all the down-mix algorithms presented in Fig. 11 can be characterized as different forms of a "phantom center" algorithm. (In each case the center channel was down-mixed to the front left and right channels.) As a consequence, a front-image skew could occur at the off-center listening position, which would result in an audiovisual localization mismatch. Regardless of the correctness of this explanation, the phenomenon observed supports the view of the high importance of the center channel, especially in the context of audiovisual systems.
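A minimal sketch of the diffgrade computation underlying Figs. 8–11 (score with picture minus score without picture for otherwise identical conditions, tested against zero) is given below; the pairing of conditions and the example numbers are assumptions.

    # Sketch: "diffgrades" (audiovisual score minus audio-only score for otherwise
    # identical conditions) and a one-sample t test against zero.
    import numpy as np
    from scipy import stats

    def diffgrades(scores_with_picture, scores_without_picture):
        """Both sequences must be aligned so that element i refers to the same
        listener / item / position / down-mix condition."""
        return (np.asarray(scores_with_picture, dtype=float)
                - np.asarray(scores_without_picture, dtype=float))

    d = diffgrades([62, 55, 71, 48], [58, 57, 66, 45])   # illustrative numbers only
    t, p = stats.ttest_1samp(d, popmean=0.0)
    print(d.mean(), p)                                   # mean diffgrade and its significance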

8.6 Feedback from Listeners

After the experiment had been completed, the listeners were asked a couple of questions concerning different aspects of the listening tests. According to their answers the test was difficult. Subjects could easily hear the differences between the stimuli, but it was difficult to grade them. Although the listeners had not been informed about the nature of the experiment, they easily recognized that the main perceptual differences between stimuli were related to the spatial characteristics, the number of active loudspeakers, the positioning of audio sources, and changes in the frontal image. They also noticed changes in bandwidth, coloration of the sound, and phase distortions. Listeners reported that at the off-center listening position the nearest surround loudspeaker was too loud. They also noticed some effects related to a front-image skew (for example, one of the listeners reported that "the speech moved right"), which confirmed the results of the informal listening tests performed by the authors prior to the experiment.

Some listeners noticed effects related to the audiovisual localization mismatch at the off-center listening position. For example, they reported that "it is disturbing when something you are seeing is not corresponding to what you are hearing" or that "the sound becomes divorced from the image." This observation shows that down-mix algorithms that may cause a front-image skew should be used with caution in the context of audiovisual systems. In particular, algorithms in which the center channel is not used (2/2, 2/1, 2/0) should be avoided in audiovisual systems.

Fig. 11. Differences between scores obtained with and without picture for selected down-mix algorithms (2/0, 2/2, 2/1) at the center and off-center listening positions. Scores averaged only for material with F-B spatial characteristic. Means and 95% CI based on raw data.

One listener reported an interesting problem concerning the evaluation of the audio quality at the off-center listening position in comparison with the audio quality perceived at the center listening position. In general, it is possible to use two different criteria for evaluating the changes in spatial characteristics between the center listening position and the off-center one, which may lead to different results. The first criterion assumes an "absolute frame of reference" (audio sources are anchored to particular places in space). This can be illustrated by the example of a listener who is changing seats in a concert hall: a change of the listening position affects the angles at which sound sources are perceived. Therefore in this case a front-image skew is tolerable, or even desired, due to the change of the listening position. The other criterion assumes a "relative frame of reference" (audio sources move together with the listener, and as a consequence the angles at which the audio sources are perceived remain constant). In this case it is expected that the front image and the overall spatial characteristic would be the same regardless of the listening position (for example, a leading vocalist should always be positioned in front of the listener), and therefore any changes to the front image are undesirable. There is no clear answer as to which criterion is "correct" for the audio-only mode of presentation. However, for the audiovisual mode of presentation it is appropriate that the spatial quality of audio should be judged using the "absolute frame of reference" approach, since the main audio sources are anchored to the TV monitor.

9 DISCUSSION

Despite the fact that the listeners had some experience in listening to surround audio material, one cannot exclude the possibility that the results obtained were biased by their habits of listening to traditional two-channel stereo recordings.

One of the most surprising results concerned the LR-mono down-mix algorithm, which was evaluated as one of the best algorithms for program material having the F-F spatial characteristic. Informal listening tests carried out for this algorithm showed that although the front left and right channels operated in mono, the overall spatial impression was similar to the spatial impression of the original excerpt. This might have been caused by the fact that the room may decorrelate the signals from these loudspeakers to some extent.

10 CONCLUSIONS

In this experiment a number of down-mix algorithms for a standard 5.1 multichannel audio setup were evaluated in terms of basic audio quality. The investigation was limited to two listening positions. The most important results obtained in the experiment are summarized in Table 6.

The limitation of the number of channels from five to four for program material having foreground content in the front and background content in the rear (F-B spatial characteristic) may be accomplished "gracefully" by means of the 3/1 down-mix algorithm. A further limitation down to three channels can be performed with minimum loss of quality using the 3/0 down-mix algorithm.

The results obtained for program material with foreground content in all channels (F-F spatial characteristic) are not very conclusive in the case of the four-channel down-mix algorithms. It was found that all the four-channel algorithms evaluated (LR-mono, 3/1, and 2/2) may give subjectively good results, depending on the program material. The limitation of channels from five to three for program material having the F-F spatial characteristic may be achieved with the best subjective results by means of the 1/2 or the 3/0 down-mix algorithm.

Degradation of the audio quality due to a change in listening position from the center to the off-center location was also investigated. The most pronounced degradation of quality (20 points on the 100-point scale) was observed for the original recordings and for items processed using the 2/2 down-mix algorithm (a so-called phantom center algorithm). Degradation of the quality for the items processed using the remaining down-mix algorithms was much smaller, which shows that the standard 5.1 setup may provide stable audio quality across a relatively large listening area for some down-mix algorithms.

The results obtained prove the importance of the center channel, especially in the context of audiovisual presentations. It was found that the presentation of motion pictures had a negative influence on the evaluation of audio quality at the off-center listening position for down-mix algorithms not using the center channel (2/2, 2/1, 2/0). Informal feedback from the listeners supports the hypothesis that this effect was caused by an audiovisual localization mismatch.

As discussed in Section 1, one may criticize the new program material categorization method, which was used in the experimental design, as potentially causing some bias. At this moment this supposition can be neither denied nor confirmed. (A separate experiment is planned to investigate this issue in more detail.) Despite the fact that some bias might have influenced the program selection process, the main conclusions drawn from the experiment remain unaffected (3/1 is the "best" four-channel down-mix algorithm, whereas 3/0 is the "best" three-channel down-mix algorithm).

It is hoped that the results reported in this paper will help broadcasters and codec designers to find optimum tradeoffs between the number of audio channels and the resultant audio quality.

11 ACKNOWLEDGMENT

The authors would like to express their gratitude to David Meares (BBC R&D Department) for his comments on the results of the experiments described in this paper and for stimulating discussion. We would also like to thank Ben Supper, who helped with proofreading. This project was carried out with the financial support of the Engineering and Physical Sciences Research Council, UK. Some of the audiovisual excerpts used in this experiment were kindly supplied by the BBC R&D Department (used with permission).


Table 6. "Best" down-mix algorithms in terms of basic audio quality.

Number of Channels in a        Spatial Characteristic of Program Material
Down-Mix Algorithm             F-B        F-F
Four channels                  3/1        LR-mono or 3/1 or 2/2*
Three channels                 3/0        1/2 or 3/0

*Possibility of audiovisual localization mismatch.



12 REFERENCES

[1] S. K. Zielinski, F. Rumsey, and S. Bech, "Effects of Bandwidth Limitation on Audio Quality in Consumer Multichannel Audiovisual Delivery Systems," J. Audio Eng. Soc., vol. 51, pp. 475–501 (2003 June).

[2] F. Rumsey, Spatial Audio (Focal Press, Oxford, UK, 2001), pp. 151–218, 226.

[3] F. Rumsey, "Spatial Quality Evaluation for Reproduced Sound: Terminology, Meaning, and a Scene-Based Paradigm," J. Audio Eng. Soc., vol. 50, pp. 651–666 (2002 Sept.).

[4] Swedish Radio Choir and Berliner Philharmoniker, Claudio Abbado, conductor, W. A. Mozart, "Requiem in D Minor," Herbert von Karajan Memorial Concert, DVD-Video disk, cat. no. PAL 100 036 (Arthaus Musik, Euroarts, Videal/Brilliant Media, Salzburg, Austria, 1999).

[5] Oldfield, "Tubular Bells II," DVD-Video disk, 3984-27243-2 (Warner Music UK, Oldfield Music, 1999).

[6] Fourplay, "Fourplay," DVD-Audio disk, 7599-26656-9 (Warner Bros. Records Inc., A Time Warner Company, 2001).

[7] Pat Metheny Group, "Imaginary Day," DVD-Audio disk, 9362-46791-2 (Warner Bros. Records Inc., An AOL Time Warner Company, 2001).

[8] ITU-R Rec. BS.775-1, "Multi-Channel Stereophonic Sound System with or without Accompanying Picture," International Telecommunication Union, Geneva, Switzerland (1992–1994).

[9] J. Bauck and D. H. Cooper, "Generalized Transaural Stereo and Applications," J. Audio Eng. Soc., vol. 44, pp. 683–705 (1996 Sept.).

[10] EBU Tech. 3276-E, "Listening Conditions for the Assessment of Sound Programme Material. Supplement 1—Multichannel Sound," European Broadcasting Union, Geneva, Switzerland (1999).

[11] ITU-R Rec. BS.1116, "Methods for Subjective Assessment of Small Impairments in Audio Systems Including Multichannel Sound Systems," International Telecommunication Union, Geneva, Switzerland (1994).

[12] Multichannel Universe, "Die Referenz Demo- und Test DVD," DVD-Video disk BAL-9500-3 (Surround Sound Forum, Balance and Media City, Balance München, 2000).

[13] B. C. J. Moore, B. R. Glasberg, and T. Baer, "A Model for the Prediction of Thresholds, Loudness, and Partial Loudness," J. Audio Eng. Soc., vol. 45, pp. 224–240 (1997 Apr.).

[14] MUSHRA-EBU, "Method for Subjective Listening Tests of Intermediate Audio Quality," Draft EBU Rec. B/AIM 022 (Rev. 8)/BMC 607rev, European Broadcasting Union (2000 Jan.).

[15] ITU-R Draft New Rec. BS.[Doc. 6/106 (Rev. 1)-E] 1534, "Method for the Subjective Assessment of Intermediate Audio Quality," International Telecommunication Union, Geneva, Switzerland (2001 Apr.).

[16] S. B. Green, N. J. Salkind, and T. M. Akey, Using SPSS for Windows (Prentice-Hall, Englewood Cliffs, NJ, 2000).

[17] D. C. Howell, Statistical Methods for Psychology (Duxbury, NY, 1997).

[18] J. G. Beerends and F. E. De Caluwe, "The Influence of Video Quality on Perceived Audio Quality and Vice Versa," J. Audio Eng. Soc. (Engineering Reports), vol. 47, pp. 355–362 (1999 May).

13 APPENDIX

A.1 RMS Power of Program Material

Table 7 shows the total rms power (in dB) for each item used in the experiment. The measured values are presented for each channel separately.

A.2 Technical Specification of Loudspeakers

The specifications of the loudspeakers used during the listening test are as follows:

• Five main loudspeakers—Genelec 1032A (active monitors). Free-field frequency response 42 Hz to 21 kHz (±2.5 dB).
• Subwoofer—Genelec 1094A (active monitor). Free-field frequency response 29 Hz to 80 Hz (±2.5 dB); short-term output power 400 W (8 Ω); crossover frequency 85 Hz.

Table 7. Total rms power of program material used in experiment.

Item              L       R       C       LFE     LS      RS
Classical (F-B)   34.3    33.8    36.63   –       36.66   36.75
Pop (F-B)         26.98   30.74   25.33   34.87   42.69   44.23
Pop (F-F)         29.07   27.37   23.98   38.61   31.59   25.75
Movie (F-B)       39.69   42.34   27.06   –       48      51.33
TV show (F-F)     31.91   31.85   28.32   –       36.08   38.63
Applause (F-F)    36.04   36.08   40.73   –       35.38   34.64

A.3 Instructions to Listeners

A.3.1 Familiarization and Training

The first step in the listening test is familiarization with the listening test process. This phase is called the training phase, and it precedes the true evaluation phase. The purpose of the training phase is to allow you, as an evaluator, to achieve two objectives:

• PART A: to become familiar with the sound excerpts under test and their quality level
• PART B: to learn how to use the test equipment and the grading scale.

In PART A of the training phase you will be able to listen to different DVD-V disks. Moreover, you will be able to listen to a selection of many short audio items using a software sampler. The aim of this phase is to make you more familiar with different ways of producing audio material. At the end of this phase you will be able to listen to all excerpts that have been selected for the tests in order to illustrate the range of possible qualities. You may click on different buttons to listen to different sound excerpts.

A.3.2 Grading Phase

During this phase you are asked to grade the basic audio quality.

Basic audio quality is defined as the global attribute that describes any and all detected differences between the reference and the evaluated excerpt. For example, this may include differences in timbre, differences in spatial characteristics, differences in the number of active channels, balance, dynamic range, changes in the front image, changes in the localization of audio sources, changes in envelopment, occurrences of any type of linear and/or nonlinear distortion, any kind of noise and distortion, distortion caused by compression algorithms, phase distortion, and so on.

When assigning your grades you will use the quality scale presented in Fig. 12. The grading scale is continuous from "excellent" to "bad." A grade of 0 corresponds to the bottom of the "bad" category, whereas a grade of 100 corresponds to the top of the "excellent" category.

The evaluation of the audio quality may depend on the application considered. For example, some excerpts may be graded "good" when somebody is listening to a personal hi-fi or is listening to the audio over the Internet, whereas the same excerpts may be graded as "bad" when somebody is listening to a DVD-A and as a consequence has higher expectations. In this test you can assume that you are listening to an audiovisual home-theater system installed in a living room.

During the training phase you should be able to learn how you, as an individual, interpret the audible impairments in terms of the grading scale. You should not discuss your personal interpretation with the other subjects at any time during the test. Please make your own interpretation of the scale and be consistent in your grading. There is no right or wrong answer. Your judgment is the correct answer, and you will not be marked on your choices. However, it is expected that your answers will be consistent and repeatable within consecutive trials.

No grades given during the training phase will be taken into account in the true tests.

In the current experiment two listening positions are taken into consideration (Fig. 1). The evaluation procedure and the user interface are different for these two positions. In the center position (A) you are asked to grade each item in comparison with the reference. The user interface used in this position is shown in Fig. 13. In evaluating the sound excerpts, please note that you should not necessarily give a grade in the "bad" category to the sound excerpt with the lowest quality in the test. However, one or more excerpts must be given a grade of 100 because the unprocessed reference signal is included as one of the excerpts to be graded.

Your scores should reflect your subjective judgement of the quality level for each of the sound excerpts presented to you. Each trial will contain six excerpts to be graded (labels 1 to 6). Each of the items is approximately 10 to 20 seconds long. You can switch between the excerpts using either the computer mouse or the keyboard (keys 1, 2, ..., 6, R for reference, and S for STOP). You may listen to the excerpts in any order and any number of times. The audio excerpts are looped. You can stop the playback by clicking the STOP button. You can switch between the different excerpts while they are played back in order to make quick comparisons.

Fig. 12. Quality scale used in experiment (100–80 excellent, 80–60 good, 60–40 fair, 40–20 poor, 20–0 bad).

Fig. 13. User interface used at center listening position.

At the off-center listening position (B) the quality of the reference may be degraded due to the nonoptimum listening position (it is assumed that the quality of the reference is best in the center position). Therefore you are asked to start the evaluation procedure in the off-center listening position by grading the quality of the reference first. The user interface for the off-center listening position is shown in Fig. 14. If you wish, you can change your sitting position (between positions A and B) as many times as you want during the evaluation of the reference. Once you have graded the reference you are asked to evaluate the quality of the other items (you are not allowed to change your place at this stage). Note that in general the quality of the items evaluated may be graded even higher than the quality of the reference.

Use the slider for each excerpt to indicate your opinion of its quality. When you are satisfied with your grading of all excerpts you should click on the button at the bottom of the screen. It will automatically save your scores and will allow you to move on to the next trial.

When evaluating the audio-only items keep your eyes closed and switch between the excerpts "by touch" using the keyboard. Whenever the audiovisual material is presented keep your eyes fixed on the picture (of course you will have to look at the computer monitor occasionally in order to switch between the stimuli and to record your scores). Please remember that in both cases (audio only and audiovisual) you are asked to evaluate the quality of the audio, not the video.

To recapitulate, you are asked to grade the basic audio quality.

Basic audio quality is defined as the global attribute that describes any and all detected differences between the reference and the evaluated excerpt. It is important to bear in mind that at the center listening position one or more excerpts should be given the grade "excellent" because the hidden reference excerpt is included as one of the excerpts to be graded.

Please feel free to ask any questions, preferably before the test.

Thank you for taking part in this experiment. Enjoy the listening!

The biographies of Sławomir K. Zielinski, Francis Rumsey, and Søren Bech were published in the 2003 June issue of the Journal.


Fig. 14. User interface used at off-center listening position.



PAPERS

0 INTRODUCTION

HRTFs are determined by the size and shape of indi-viduals’ anatomical structure components, such as torso,head, and pinna. The form and extent to which differentbody parts contribute to HRTFs is therefore of great inter-est. Research has been carried out on the transfer func-tions of the head and pinna in isolation [1]–[6], either byextensive acoustic measurements on manikins or by usingsimplified physical or numerical models. Although theindividual effects from different parts of the whole are noteasily discerned, it has been shown that the total effect onHRTFs can sometimes be estimated using a relatively sim-ple model [7], [8]. An investigation of how HRTFs areaffected by differences in the shape of the torso, head, andpinna may provide insights into the operation of the spa-tial hearing system.

Various simplified head shapes have been studied forpurposes of HRTF approximation. Yet as far as theauthors are aware, there are no parameterization methodsthat systematically simplify head shapes for HRTFresearch. In this paper we propose a parameterizationmethod for the systematic representation and simplifica-tion of head shapes based on spherical harmonics (SHs).Guidelines are deduced for the tradeoff between head

model simplification and the numerical accuracy of theresulting low-frequency HRTFs by studying the pressureerrors that originate from head shapes that are simplifiedto different degrees.

The use of parameterization to study the influence ofhead shapes on HRTFs may lead to an efficient methodfor computing individualized HRTFs. For example, wehave recently proposed the differential pressure synthe-sis (DPS) method [9], [10] to compute acoustic pres-sures efficiently at the ear canal entrances. The compu-tation was based on low-pass-filtered head shapesdescribed by SHs. The head-shape simplificationmethod proposed in this paper could make the DPSmethod more systematic.

We concentrate on the influence of head-shape fea-tures on low-frequency HRTFs excluding the pinnae, andthe investigation is based on a pinnaless KEMAR, thesurface of which we described using SHs. Of particularinterest are the pressure errors on the surface of this headmodel caused by low-pass filtering the head shape to dif-ferent degrees. Throughout this paper, pressures are cal-culated using the boundary-element method (BEM)implemented in the software package PAFEC [11], [12].To limit our discussions to the influences of shape sim-plifications only, we assume all head shapes to be rigid.As pinna effects are not considered in this paper and thepinnae start to have a significant effect on HRTFs aboveabout 3 kHz, we limit our discussion to frequencies up tothis point.


A Study on Head-Shape Simplification Using Spherical Harmonics for HRTF Computation at Low Frequencies*

YUFEI TAO, ANTHONY I. TEW, AES Associate, AND STUART J. PORTER

Department of Electronics, University of York, Heslington, YO10 5DD, UK

Simplified head shapes, such as spheres and ellipsoids, have often been applied in the research of head-related transfer functions (HRTFs). However, the effects of the missing head-shape features in these simplified head models have not been thoroughly examined. Head shapes are represented using spherical harmonics, which allows the simplification of head shapes to be carried out in a controlled and systematic way. The KEMAR head shape is low-pass filtered to different degrees. The errors in both the head shape and the acoustic pressures introduced by the low-pass filters are studied. Guidelines are presented for examining the tradeoff between head-shape simplification and accuracy of pressure estimation. It is concluded that spherical harmonics above degree 11 may be ignored in the computation of HRTFs below 3 kHz.

*Presented at the 114th Convention of the Audio Engineering Society, Amsterdam, The Netherlands, 2003 March 22–25; revised 2003 June 12.


1 HEAD-SHAPE REPRESENTATION

The KEMAR head shape was captured using a FastSCAN1 laser scanner. A simulated flat plate was inserted under the truncated neck of the scan to close the surface.

1.1 Spherical Harmonics

The surface of the head shape is described by its radial distance r from the origin, where r is a function of the elevation angle θ and the azimuth angle ϕ, as shown in Fig. 1. The head shape can then be analyzed using SHs [9], [13],

r(\theta, \varphi) = \sum_{n=0}^{N} \sum_{m=0}^{n} P_n^m(\cos\theta) \left[ a_{nm} \cos(m\varphi) + b_{nm} \sin(m\varphi) \right]    (1)

where N is large enough to ensure that the truncation error is negligible; a_{nm} and b_{nm} are the SH coefficients; P_n^m(\cos\theta) is the associated Legendre polynomial of degree n, order m; and a_n \equiv a_{n0}. The SH coefficients a_{nm} and b_{nm} in Eq. (1) can be calculated by

a_{nm} = \int_0^{2\pi} \int_0^{\pi} r(\theta, \varphi)\, Y_{nm}^{1}(\theta, \varphi) \sin\theta \, d\theta \, d\varphi    (2a)

b_{nm} = \int_0^{2\pi} \int_0^{\pi} r(\theta, \varphi)\, Y_{nm}^{0}(\theta, \varphi) \sin\theta \, d\theta \, d\varphi    (2b)

where Y_{nm}^{1} and Y_{nm}^{0} are the SHs,

Y_{nm}^{1}(\theta, \varphi) = \frac{1}{N_{nm}} P_n^m(\cos\theta) \cos(m\varphi)    (3a)

Y_{nm}^{0}(\theta, \varphi) = \frac{1}{N_{nm}} P_n^m(\cos\theta) \sin(m\varphi)    (3b)

with

N_{nm} = \frac{4\pi}{\varepsilon_m (2n+1)} \cdot \frac{(n+m)!}{(n-m)!}, \qquad \varepsilon_m = \begin{cases} 1, & m = 0 \\ 2, & m \neq 0 \end{cases}

being the normalization coefficient. It is easily seen from Eqs. (2) and (3) that b_{n0} = 0. For clarity, the SH coefficients a_{nm} and b_{nm} will be referred to as coefficients of degree n, order m, in the remainder of this paper.

1.2 Gaussian Quadrature

The integrals in Eqs. (2) can be calculated numerically using the computationally efficient Gauss–Legendre quadrature formula [14], [15],

\int_0^{2\pi} \int_0^{\pi} f(\theta, \varphi) \sin\theta \, d\theta \, d\varphi \approx \frac{\pi}{N} \sum_{j=1}^{2N} \sum_{i=1}^{N} w_i \, f(\theta_i, \varphi_j)    (4)

The elevation angle set θ_i is chosen so that cos(θ_i) and w_i are the Gauss–Legendre nodes and weights [16] on [−1, 1]. The azimuth angles ϕ_j are evenly spaced on [0, 2π).

The SH coefficients in Eqs. (2) are calculated for 0 ≤ n ≤ N, 0 ≤ m ≤ n, and written in matrix form, A = [a_{nm}], B = [b_{nm}].
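To make the analysis concrete, here is a minimal numerical sketch (not the authors’ code) of Eqs. (1)–(4): SH coefficients are computed from radii sampled on the N × 2N Gauss–Legendre mesh. The function name sh_analysis and the array r_grid are illustrative, scipy’s lpmv includes the Condon–Shortley phase (harmless as long as the same P_n^m convention is used for analysis and synthesis), and N_nm follows the normalization reconstructed above.

```python
# A minimal sketch (not the authors' code) of the SH analysis of Eqs. (1)-(4).
# r_grid[i, j] holds the sampled radius r(theta_i, phi_j) on the N x 2N mesh.
import numpy as np
from math import factorial
from scipy.special import lpmv   # associated Legendre function P_n^m

def sh_analysis(r_grid, N):
    """Return the coefficient matrices A = [a_nm] and B = [b_nm] of Eq. (2)."""
    x, w = np.polynomial.legendre.leggauss(N)   # Gauss-Legendre nodes/weights on [-1, 1]
    theta = np.arccos(x)                        # elevation angles theta_i
    phi = np.arange(2 * N) * np.pi / N          # 2N evenly spaced azimuths on [0, 2*pi)
    A = np.zeros((N + 1, N + 1))
    B = np.zeros((N + 1, N + 1))
    for n in range(N + 1):
        for m in range(n + 1):
            eps = 1.0 if m == 0 else 2.0
            # normalization N_nm as reconstructed from Eq. (3)
            Nnm = 4.0 * np.pi * factorial(n + m) / (eps * (2 * n + 1) * factorial(n - m))
            P = lpmv(m, n, np.cos(theta))       # P_n^m(cos(theta_i)), shape (N,)
            # quadrature of Eq. (4): (pi/N) * sum_j sum_i w_i f(theta_i, phi_j)
            quad = (np.pi / N) * (w[:, None] * r_grid * P[:, None])
            A[n, m] = np.sum(quad * np.cos(m * phi)[None, :]) / Nnm
            B[n, m] = np.sum(quad * np.sin(m * phi)[None, :]) / Nnm
    return A, B
```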

2 LOW-PASS-FILTERED HEAD SHAPES

2.1 Method of Low-pass Filtering

The original shape, consisting of the maximum degree N, can be restored using the SH coefficients A and B according to Eq. (1). By excluding higher degree/order coefficients of the A and B matrices in the reconstructed shape, the original shape can be low-pass filtered, or “smoothed,” to different degrees. As more coefficients of higher n and m are discarded by the low-pass filters (LPFs), more shape features are lost. Here we apply LPFs that include all values of m up to n, 0 ≤ m ≤ n, where n ranges up to the required degree, 0 ≤ n ≤ N.
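As an illustration of this truncation, the following sketch (hypothetical helper name sh_synthesis, with A and B as produced by the analysis of Section 1.2) resynthesizes the surface from Eq. (1) while discarding every coefficient of degree greater than a chosen cutoff n_cut.

```python
# A minimal sketch of low-pass filtering by coefficient truncation (Section 2.1):
# resynthesize r(theta, phi) from Eq. (1), keeping only degrees n <= n_cut.
import numpy as np
from scipy.special import lpmv

def sh_synthesis(A, B, n_cut, theta, phi):
    r = np.zeros((theta.size, phi.size))
    for n in range(n_cut + 1):
        for m in range(n + 1):
            P = lpmv(m, n, np.cos(theta))[:, None]           # P_n^m(cos theta_i)
            r += P * (A[n, m] * np.cos(m * phi)[None, :]
                      + B[n, m] * np.sin(m * phi)[None, :])  # truncated Eq. (1)
    return r
```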

We have chosen N = 34 as the highest degree with which to represent the KEMAR head, as shown in Fig. 2. We call this the reference shape. The N elevation angles in θ are chosen to be the Gauss–Legendre nodes of order N. There are 2N evenly spaced azimuth angles in ϕ, where ϕ = 0 lies along the x axis. These θ and ϕ values are used to form the mesh on the surface. There are a total of 2N² grid nodes on the mesh.

The head shape was low-pass filtered to all degrees n ranging from 0 to 34, where n = N = 34 corresponds to the reference shape. Shape errors for three regions were considered: over the whole surface, at the ear position (only errors at one ear were calculated, as the head shape is symmetrical), and in the vicinity of the nose. Errors were calculated by comparing node coordinates for each low-pass-filtered shape with the corresponding coordinates of the reference shape.

Fig. 1. Spherical coordinates used to represent head shape.

Fig. 2. Original shape, N = 34.

1FastSCAN is a trademark of Polhemus Incorporated, Colchester, VT, USA.

2.2 Calculation of Shape Errors

With the KEMAR head shape facing the x direction, the position of the ear was measured to be 5° below the horizontal plane (that is, θ = 95°) and 14° behind the vertical plane (that is, ϕ_L = 90° + 14° = 104° and ϕ_R = 270° − 14° = 256° for the left and right ears, respectively). Measurements around the nose region were taken at the same elevation angle as the ears (θ = 95°), on the tip of the nose and 15° to either side (ϕ_nL, ϕ_nR = ±15°), which we term the “nose side points.” The tip of the nose is on the edge of adjoining surface patches, which leads to convergence problems. We therefore used the nose side points to determine shape errors in this region; that is, the nose shape was considered to have converged when the radii r(θ, ϕ_nL,nR) of the nose side points had converged.

The rms error of the shape, low-pass filtered to degree n, may be calculated using

\varepsilon_{\mathrm{rms}}^{r}(n) = \sqrt{ \frac{1}{2N^2} \sum_{i=1}^{2N^2} \left[ r_n(i) - r_N(i) \right]^2 }    (5)

in which the summation is carried out over all 2N² mesh grid nodes on the surface of the head. The percentage rms error is expressed as

\varepsilon_{\mathrm{rms}}^{r\%}(n) = 100 \cdot \frac{ \sqrt{ \sum_{i=1}^{2N^2} \left[ r_n(i) - r_N(i) \right]^2 / (2N^2) } }{ \sqrt{ \sum_{i=1}^{2N^2} \left[ r_N(i) \right]^2 / (2N^2) } }    (6)

Errors at the ear positions may be calculated using

\varepsilon_{\mathrm{ear}}^{r}(n) = r_{\mathrm{ear}}(n) - r_{\mathrm{ear}}(N)    (7)

and the corresponding percentage error is given by

\varepsilon_{\mathrm{ear}}^{r\%}(n) = 100 \cdot \frac{ r_{\mathrm{ear}}(n) - r_{\mathrm{ear}}(N) }{ r_{\mathrm{ear}}(N) }    (8)

where r_{\mathrm{ear}}(n) is the radius at the ear position in the shape low-pass filtered to degree n. Errors at and around the nose may be calculated similarly.
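A minimal sketch of the error measures of Eqs. (5)–(8) follows, assuming r_n and r_ref hold the radii of a low-pass-filtered shape and of the reference shape at the same 2N² mesh nodes; the function names are illustrative.

```python
# A minimal sketch of the shape-error measures of Eqs. (5)-(8).
import numpy as np

def rms_shape_error(r_n, r_ref):
    """Absolute and percentage rms shape errors over the whole mesh, Eqs. (5) and (6)."""
    abs_err = np.sqrt(np.mean((r_n - r_ref) ** 2))
    pct_err = 100.0 * abs_err / np.sqrt(np.mean(r_ref ** 2))
    return abs_err, pct_err

def point_shape_error(r_point_n, r_point_ref):
    """Absolute and percentage errors at a single node (e.g. the ear), Eqs. (7) and (8)."""
    abs_err = r_point_n - r_point_ref
    return abs_err, 100.0 * abs_err / r_point_ref
```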

2.3 Results of Shape Errors

Shape errors are plotted as a function of LPF degree in Fig. 3. The figure displays some important features.

1) The monotonic decrease of the rms error suggests that the overall accuracy of the shape representation improves with the inclusion of increasingly higher degrees/orders of harmonics. Degree 2 SHs of large amplitude exist in the head shape, resulting in a significant drop in the rms error curve at n = 2. The error curve decreases steadily from n = 2 to approximately n = 9. An even slower decrease is present for n ≥ 9.

2) It can be seen that the low-pass-filtered shape at the ear position converges to the reference shape very early. The 0th- and 1st-degree shapes diverge greatly from the n = N = 34 shape, while a significant drop in the error curve occurs at n = 2. Small fluctuations are present in the error curve between n = 2 and 10, after which little further improvement is observable. This reflects the fact that the surface at the pinnaless ear positions is very smooth and can be expressed quite accurately using only lower degree harmonics.

3) On the other hand, Fig. 3 also shows that the shape of the nose converges relatively late with increasing degree n. This is because the nose is not a smooth feature and contains significant contributions from high-degree harmonics. Relatively high values of n are needed for the shape in this region to approach that of the reference shape. Convergence for the nose side points is substantially complete by n = 14, while that for the front of the nose remains incomplete throughout almost the whole range of n.


Fig. 3. Shape errors for shapes low-pass filtered to different degrees n. (a) Absolute shape errors (m). (b) Percentage shape errors. — rms error; — nose tip; -∆- nose, 15° to the side; -*- ear.


As is borne out by direct observation, the ear regions are very smooth in the pinnaless head model. Therefore only a small number of SHs are needed to represent them accurately, despite the existence of remote shape features containing greater contributions of the high-order SHs, in particular the nose. In other words, removing high-order SHs in the pinnaless head model mainly modifies regions such as the nose and has little effect on the regions close to each ear. This property of SH parameterization makes it possible to determine the extent to which shape features remote from the ears play a part in numerically defining HRTFs.

3 PRESSURE ERRORS ON LOW-PASS-FILTERED HEAD SHAPES

3.1 Calculation of Pressures on Head Shapes

Acoustic pressures on the surface of the head shapes described in Section 2 were computed using the BEM. The acoustic source was a plane wave traveling along the x axis in the negative direction. Six head orientations were considered, for which the head was rotated from the original orientation (ϕ = 0°) by 30°, 90°, 150°, 210°, 270°, and 330°. These orientations are depicted in Fig. 4. As the head shapes are symmetrical about the median plane, only the first three orientations were computed using the BEM. The pressure results of the other three orientations were obtained from the results of the first three by swapping the left–right sides.

3.2 Calculation of Pressure Errors

Pressures and pressure errors on the shapes vary with the head-shape orientation o, frequency f, and degree n of the LPF applied. The rms pressure error of the shapes low-pass filtered to degree n is calculated by

\varepsilon_{\mathrm{rms}}^{p}(o, f, n) = \sqrt{ \frac{1}{2N^2} \sum_{i=1}^{2N^2} \left| p(o, f, n, i) - p(o, f, N, i) \right|^2 }    (9)

Again, the summation is carried out over all 2N² mesh grid nodes on the surface of the head shape. The percentage rms pressure error is calculated by normalizing the rms error of the low-pass-filtered shapes using the average pressure over the whole surface of the reference shape,

\varepsilon_{\mathrm{rms}}^{p\%}(o, f, n) = 100 \cdot \frac{ \sqrt{ \sum_{i=1}^{2N^2} \left| p(o, f, n, i) - p(o, f, N, i) \right|^2 / (2N^2) } }{ \sqrt{ \sum_{i=1}^{2N^2} \left| p(o, f, N, i) \right|^2 / (2N^2) } }    (10)

Pressure errors at the ear positions are calculated using

\varepsilon_{\mathrm{ear}}^{p}(o, f, n) = p_{\mathrm{ear}}(o, f, n) - p_{\mathrm{ear}}(o, f, N)    (11)

and the corresponding percentage error is

\varepsilon_{\mathrm{ear}}^{p\%}(o, f, n) = 100 \cdot \frac{ \left| p_{\mathrm{ear}}(o, f, n) - p_{\mathrm{ear}}(o, f, N) \right| }{ \left| p_{\mathrm{ear}}(o, f, N) \right| }    (12)

Errors at and around the nose are calculated similarly.

As some parts of the head (such as the face) contain more prominent features than others, the wave incident direction will have an impact on the pressure errors. In this paper we are primarily interested in the directionless pressure errors, and so the orientation-related effects have been removed by averaging the percentage pressure errors over all the wave incident directions. As changing the wave incident angle is equivalent to rotating the head, the directionless pressure errors are calculated by averaging the percentage pressure errors for all head orientations (Fig. 4). For example, the errors at the ear position may be calculated using

E_{\mathrm{ear}}^{p\%}(f, n) = \frac{1}{N_o} \sum_{o=1}^{N_o} \varepsilon_{\mathrm{ear}}^{p\%}(o, f, n)    (13)

where N_o = 6 is the total number of head orientations. Other directionless pressure errors are calculated similarly. This reduces the effects of the features associated with any particular wave incident angle while at the same time it preserves the main trends of the pressure error curves.
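A minimal sketch of the pressure-error measures of Eqs. (9)–(13) is given below, assuming p_n and p_ref are the complex surface pressures of a filtered shape and of the reference shape at one orientation and frequency; the direction-averaged error of Eq. (13) is then just the mean of the percentage errors over the N_o = 6 orientations.

```python
# A minimal sketch of the pressure-error measures of Eqs. (9)-(13).
import numpy as np

def pct_rms_pressure_error(p_n, p_ref):
    """Percentage rms pressure error over the mesh, Eqs. (9) and (10)."""
    num = np.sqrt(np.mean(np.abs(p_n - p_ref) ** 2))
    den = np.sqrt(np.mean(np.abs(p_ref) ** 2))
    return 100.0 * num / den

def pct_point_pressure_error(p_point_n, p_point_ref):
    """Percentage pressure error at a single node (e.g. the ear), Eqs. (11) and (12)."""
    return 100.0 * np.abs(p_point_n - p_point_ref) / np.abs(p_point_ref)

def directionless_error(pct_errors_over_orientations):
    """Average a percentage error over all head orientations, Eq. (13)."""
    return np.mean(pct_errors_over_orientations)
```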

3.3 Results of Pressure Errors

The percentage directionless errors are shown in Fig. 5 for 250 Hz, 500 Hz, 1 kHz, 2 kHz, and 3 kHz. The convergence of the pressures with increasing LPF degree n is clearly demonstrated in Fig. 5, from which the following can be observed:

1) The rms errors at all frequencies decrease almost monotonically with the rise in n. This confirms that a more accurate head shape leads to a better approximation of the true pressure.

2) Pressures at the ear positions converge very early. A significant drop in the error curve occurs at n = 2 for all frequencies. The accuracy keeps improving within the region from n = 2 to 10. No significant improvement is seen for n > 10.

3) Pressures in the nose region converge much later than those close to the ear positions. Relatively good convergence appears at n ≥ 12 for 250 Hz, 500 Hz, and 1 kHz and at n ≥ 14 for 2 kHz and 3 kHz.

Fig. 4. Six head orientations.

4) A low-pressure region forms at the nose due to waves reflected from the face. This results in high percentage pressure errors at the nose, as is clearly seen in Fig. 5(d) and (e).

4 DISCUSSION

The application of LPFs of different degrees introduces changes to both the shape of the head and the pressures on its surface. By analyzing Figs. 3 and 5 we find some interesting similarities between the error curves of the shape and pressures.

1) The rms error curves for both the shape and the pressures at all five frequencies decrease almost monotonically, with a steep drop at n = 2. Rms error curves of the shapes and pressures follow very similar trends, especially at low frequencies up to 1 kHz.

2) At the ears, both shapes and pressures converge very early to the non-low-pass-filtered values, with a significant drop at n = 2. Both curves decrease from n = 2 to 10, with negligible improvement for n > 10.

3) At the nose, both shape and pressure converge much later than at the ears. The convergence of the nose shape, indicated by the convergence of the nose side points, settles for n ≥ 14.

These similarities confirm that there is a close correspondence between shape errors and the resulting pressure errors. The similarity between the rms error curves shows that the overall pressure accuracy is determined by the overall shape accuracy. A particularly good correspondence occurs at low frequencies, that is, 250 Hz, 500 Hz, and 1 kHz. Similarity also exists between the error curves of shapes and pressures around local features, namely, the ear and nose. This indicates that local shape errors make a significant contribution to the pressure errors associated with these local features, especially at low frequencies. The discrepancies between pressure error curves and shape error curves, on the other hand, probably arise because the local pressure accuracy is also affected by the acoustic field from other parts of the shape.

Fig. 5. Direction-averaged percent pressure errors. (a) 250 Hz. (b) 500 Hz. (c) 1 kHz. (d) 2 kHz. (e) 3 kHz. Note changes in vertical scale. — rms error; — nose tip; -∆- nose, 15° to the side; -*- ear.

As the frequency rises, for example at 2 and 3 kHz, errors much larger than those of the lower frequencies appear at the ear between SHs of degrees 2 and 10. Shape discrepancies caused by such LPFs are about 2–3 mm at the ear position and 7–24 mm at the nose. Considering that the wavelengths in air at 2 and 3 kHz are approximately 170 and 113 mm, respectively, shape errors of such sizes tend to have more significant effects on the pressures than at lower frequencies (longer wavelengths).

From these results it is seen that for the computation of HRTFs at frequencies not higher than 3 kHz, an LPF consisting of SHs of degrees up to 11 or higher introduces an rms pressure error of less than 5%. No further improvement in pressure accuracy is achieved by increasing the degree of the LPF beyond 14.

5 CONCLUSION

This paper demonstrates the pressure errors that arise when the KEMAR head is described using spherical harmonics of various degrees. At low frequencies the overall shape of the human head plays an important role in determining HRTFs. Fine details of head-shape features do not have a significant impact. Exploring the relationship between shape and pressure has led to guidelines for the degree of SHs needed to achieve a particular pressure accuracy. The computation has been carried out on a KEMAR head shape, which is designed to possess moderate anthropometric human features. Therefore it is reasonable to assume that the results obtained in this work are valid for the study of the acoustic properties of a wide range of real human head shapes. The results thus provide a basis for choosing simplified head shapes in the study of low-frequency HRTFs. Furthermore, this systematic method for head-shape simplification facilitates a controlled tradeoff between complexity of the head shape and numerical accuracy of the HRTFs. In conjunction with work we are conducting on the parameterization of pinna shapes [17], [18], the method appears to provide a promising route toward the efficient generation of individualized HRTFs.

6 ACKNOWLEDGEMENT

Much of this work was funded by Grant GR/M53363 from the Engineering and Physical Sciences Research Council (EPSRC), UK. The authors wish to thank Patrick Macey of PACSYS Ltd. for his technical assistance throughout the work. The reviewers of this paper are gratefully acknowledged for their helpful comments.

7 REFERENCES

[1] R. Teranishi and E. A. G. Shaw, “External-Ear Acoustic Models with Simple Geometry,” J. Acoust. Soc. Am., vol. 44, pp. 257–263 (1968).

[2] E. A. G. Shaw and R. Teranishi, “Sound Pressure Generated in an External-Ear Replica and Real Human Ears by a Nearby Point Source,” J. Acoust. Soc. Am., vol. 44, pp. 240–249 (1968).

[3] D. J. Haigh, “Evidence for the Generation of Multipath Localisation Cues by Human Pinnae,” Acustica–Acta Acustica, vol. 84, pp. 914–917 (1998).

[4] E. A. Lopez-Poveda and R. Meddis, “A Physical Model of Sound Diffraction and Reflection,” J. Acoust. Soc. Am., vol. 100, pp. 3248–3259 (1996).

[5] C. Avendano, V. R. Algazi, and R. O. Duda, “A Head-and-Torso Model for Low-Frequency Binaural Elevation Effects,” in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (1999), pp. 179–182.

[6] V. R. Algazi, C. Avendano, and R. O. Duda, “Elevation Localization and Head-Related Transfer Function Analysis at Low Frequencies,” J. Acoust. Soc. Am., vol. 109, pp. 1110–1122 (2001).

[7] K. Genuit, “A Description of the Human Outer Ear Transfer Function by Elements of Communication Theory,” presented at the 12th International Congress on Acoustics (Toronto, Ont., Canada, 1986).

[8] V. R. Algazi, R. O. Duda, R. P. Morrison, and D. M. Thompson, “Structural Composition and Decomposition of HRTFs,” in Proc. 2001 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA’01) (New York, 2001 Oct. 21), pp. 103–106.

[9] Y. Tao, A. I. Tew, and S. J. Porter, “The Differential Pressure Synthesis Method for Estimating Acoustic Pressures on Human Heads,” J. Audio Eng. Soc., vol. 51, pp. 647–656 (2003 July/August).

[10] Y. Tao, A. I. Tew, and S. J. Porter, “Interaural Time Difference Estimation Using the Differential Pressure Synthesis Method,” in Proc. AES 22nd Int. Conf. on Virtual, Synthetic and Entertainment Audio (Espoo, Finland, 2002 June 15), pp. 99–105.

[11] “PAFEC-FE Level 8.6 Data Preparation Manual,” PACSYS Ltd., Nottingham, UK (1999).

[12] “PAFEC-FE Acoustics User Manual 8.5,” SER Systems Ltd., Nottingham, UK (1999).

[13] E. W. Hobson, Spherical and Ellipsoidal Harmonics, 2nd ed. (Chelsea Publ., New York, 1965).

[14] K. Atkinson, “Numerical Integration on the Sphere,” J. Aust. Math. Soc., vol. B23, pp. 332–347 (1982).

[15] M. J. Evans, J. A. S. Angus, and A. I. Tew, “Analyzing Head-Related Transfer Function Measurements Using Surface Spherical Harmonics,” J. Acoust. Soc. Am., vol. 104, pp. 2400–2411 (1998).

[16] A. H. Stroud and D. Secrest, Gaussian Quadrature Formulas (Prentice-Hall, Englewood Cliffs, NJ, 1966).

[17] C. T. Hetherington, A. I. Tew, and Y. Tao, “Three-Dimensional Elliptic Fourier Methods for the Parameterization of Human Pinna Shape,” presented at the IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP 2003) (Hong Kong, 2003 April 6–10), paper V-612.

[18] C. T. Hetherington and A. I. Tew, “Parameterizing Human Pinna Shape for the Estimation of Head-Related Transfer Functions,” presented at the 114th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 51, pp. 415–416 (2003 May), paper 5753.

The biographies of Yufei Tao, Anthony I. Tew, and Stuart J. Porter were published in the 2003 July/August issue of the Journal.



0 INTRODUCTION

Unfortunately, controlled listening tests of audio products are seldom performed by audio manufacturers, retailers, and the audio review press. The most common excuse is that the tests are too time consuming, expensive, or difficult to conduct. Among the few organizations that routinely perform controlled listening tests, it is common practice to use a small panel of highly trained expert listeners [1]–[6] on the basis that they are more reliable and discriminating in their judgements of various attributes of sound quality and preference. For example, Bech has reported that one trained listener can yield the equivalent statistical confidence of using seven untrained listeners [1]. If this is true, training listeners can save considerable time and money over the long term.

In sensory measurements of consumer products (such as food and wine), subjects must be highly trained to perform reliably a complex descriptive analysis of various perceptual attributes of the products being tested [7]. On the other hand, for preference testing, naive or untrained subjects representative of the targeted customer are preferred. This is because preference for most consumer products is influenced by demographic and socioeconomic factors [7]. Some audio marketing departments have argued that the same rationale for testing chocolate and chardonnay should be applied to preference testing of audio products, their underlying assumption being that different demographics have different tastes in sound quality. Another legitimate concern is that the training process itself may inherently bias listeners’ preferences. They may become conditioned to prefer certain types of loudspeakers based on how they are trained and rewarded. Costly sonic improvements valued by a critically trained ear may be unappreciated by the average untrained listener. The money might be better spent on improving the product’s cosmetics or increasing the marketing and advertising budgets. These are all valid arguments that challenge the wisdom and rationale for using trained listening panels for preference testing.

A third approach in selecting listeners is to solicit the opinions of the audio retailers who sell the products and the reviewers who write about them, based on the assumption that their opinions largely determine a product’s commercial success. The problem with this approach is that unless their opinions can be measured under the same controlled listening conditions, there is little chance that they will agree with themselves or with each other. In other words, the opinions on sound quality are biased by the influence of a number of nuisance variables [8], which include visual and psychologically related bias (such as size, brand, price, cosmetics) [9], listening-room acoustics [10]–[13], and loudspeaker placement [14], [15].

Differences in Performance and Preference of Trained versus Untrained Listeners in Loudspeaker Tests: A Case Study*

Sean E. Olive, AES Fellow

Research & Development Group, Harman International Industries, Inc., Northridge, CA 91329, USA

Listening tests on four different loudspeakers were conducted over the course of 18 months using 36 different groups of listeners. The groups included 256 untrained listeners whose occupations fell into one of four categories: audio retailer, marketing and sales, professional audio reviewer, and college student. The loudspeaker preferences and performance of these listeners were compared to those of a panel of 12 trained listeners. Significant differences in performance, expressed in terms of the magnitude of the loudspeaker F statistic FL, were found among the different categories of listeners. The trained listeners were the most discriminating and reliable listeners, with mean FL values 3–27 times higher than those of the other four listener categories. Performance differences aside, loudspeaker preferences were generally consistent across all categories of listeners, providing evidence that the preferences of trained listeners can be safely extrapolated to a larger population. The highest rated loudspeakers had the flattest measured frequency response, maintained uniformly off axis. Effects and interactions between training, programs, and loudspeakers are discussed.

*Presented at the 114th Convention of the Audio Engineering Society, Amsterdam, The Netherlands, 2003 March 22–25; revised 2003 June 19.

All of these arguments have been largely untested or unsupported by scientific data. With this in mind, a study was designed to answer the following question: to what extent do the loudspeaker preferences of a group of untrained listeners, measured under identical listening conditions, agree with those of an expert panel of trained listeners? To answer this question we measured the loudspeaker preferences of 256 listeners with little or no formal training or experience in judging sound quality under controlled listening conditions. The untrained listeners included audio marketing and sales people, retailers, audio reviewers, and college students. The measured variations in loudspeaker preference and performance between the groups were compared to those of a panel of trained listeners. This study addresses directly some of the untested arguments against using trained listeners, which are essentially that their preferences are too biased and unrepresentative of a naive, untrained listener.

1 PREVIOUS WORK

To the best of the author’s knowledge, no scientific studies have investigated the sound-quality judgements of trained listeners and compared them to those made by audio retailers, professional reviewers, and untrained listeners. However, a few studies have compared the judgements of experienced listeners versus inexperienced listeners and examined how training and hearing performance affect the listener’s reliability in judging sound quality.

Kirk, in 1956, was one of the first to report the effects of listening experience and learning on loudspeaker bandwidth preferences among 210 college students [16]. He found that preference was dictated by the quality of the reproduction systems the students most commonly experienced. Most students preferred a severely band-restricted loudspeaker, and preferred the wider bandwidth loudspeakers only after repeated exposure to them over 6.5 weeks. By today’s standards, these tests were not very well controlled. Besides the questionable linearity of the loudspeaker used, the recordings and phonographs were likely major sources of noise and distortion, which undoubtedly would have been the least audible and annoying on the band-limited loudspeaker.

More extensive work has been done by Gabrielsson and his colleagues [17]–[19], who investigated 12 listeners’ judgements of sound quality among five different loudspeakers [17]. These 12 subjects were divided into three categories: listeners in general (L), musicians (M), and “hi-fi” subjects (H). All listeners had normal hearing and ranged in age from 23 to 41. Listeners gave ratings based on loudspeaker fidelity, similarity, and various verbal descriptions. The reliability of ratings among groups was generally high, although the inexperienced group L tended to be less reliable than the other two groups and generally awarded higher ratings to the poorer loudspeakers. All groups tended to vary in the weightings applied to certain dimensions. The experienced listeners gave greater weight to the “brightness” dimension compared to the inexperienced listeners, who gave more weight to “loudness.”

Toole conducted a large-scale series of listening tests that involved 42 listeners and 37 loudspeakers over a period of 2 years [20], [21]. The ratings were given on a 0–10 point interval fidelity scale. This was the first large-scale study that examined the effect of hearing loss on the repeatability of the listener. As the mean hearing threshold below 1 kHz increased, the listeners’ standard deviations in responses increased. Listeners with normal hearing had standard deviations of less than 1 interval.

Bech further explored the effects of hearing loss, listening experience, and training on a listener’s ability to rate the fidelity of four different loudspeakers reliably [1]. He found no clear correlation between hearing loss and standard deviation in ratings, although he noted that his subjects were on average younger and all had normal hearing (<15 dB HL) compared to Toole’s subjects. To explore the effects of training, Bech repeated a loudspeaker test (four loudspeakers and four programs) six times using 12 inexperienced listeners. He found that 65% of the subjects reached an asymptotic performance after only four listening sessions, based on the magnitude of their error variance and their individual loudspeaker F statistic, which is defined hereafter. The remaining subjects reached their peak performance after seven to eight listening sessions. The difference in performance between a trained and an untrained listener seemed to disappear after about four to eight listening sessions. This finding agrees with the 50 combined years of loudspeaker listening test experience of the author and Toole. Similar training effects have been reported using various computer-based listener training programs [3], [4], [22].

The issue of which metric is best for measuring listener performance has been examined in depth by Gabrielsson [23] and more recently by Bech [1]. While the use of standard deviations in response ratings represents the repeatability of a listener accurately, it fails to measure the effect size or the ability to discriminate among loudspeakers. For example, a listener using a very small range (for example a 0.5 rating) produces a low standard deviation score. The listener who recognizes the sonic signature of the loudspeakers, uses a larger scoring range, and replicates their ratings perfectly will also produce a standard deviation of 0. In this case the second listener is more useful to the experimenter because he is as reliable as the first subject but more discriminating. However, his usefulness is somewhat compromised if the judgements are no longer independent due to product recognition.

Another performance metric proposed by Gabrielsson et al. [17] is the intraindividual reliability index MS_w, which represents the normalized within-cell error variance calculated from an individual analysis of variance (ANOVA). Gabrielsson noted that the reliability index value varies significantly depending on the attribute. The most reliable ratings were for loudness (0.53), with fidelity at 1.29. Ratings on spatial attributes were among the least reliable (spaciousness 1.64, nearness 1.48). One benefit of using this metric is that it accounts for variance in a listener’s ratings caused by other factors such as program and its interactions with loudspeakers.

However, if the main focus of interest is the effect of the loudspeaker on preference ratings, Bech argues that the individual loudspeaker F statistic FL is a better choice [1]. FL is the ratio of the loudspeaker effect (the mean sum of squares for the loudspeaker ratings) divided by the error variance (the mean sum of squares of the residual). This metric accounts for the listeners’ ability to discriminate between loudspeakers as well as their ability to repeat their ratings, expressed in the denominator. In the current study, listener performance is based on the magnitude of the loudspeaker F statistic FL. The author uses this metric for selecting the best listeners based on their performance in various training tasks [5] and day-to-day performance in preference testing of audio products.

One of the issues with using FL is the problem that occurs when the error variance is 0, resulting in an undefined value due to division by 0. This happens when a listener replicates his or her ratings perfectly in every trial. In this paper the author arbitrarily assigned a maximum FL value of 2000. Only 16 of the 268 listeners (6%) achieved a 0 error variance, all occurring in the three-way loudspeaker test.
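As an illustration, the sketch below shows one plausible reading (not the author’s analysis code) of how FL could be computed for a single listener from a ratings array indexed by loudspeaker, program, and repeat; in a design without repeats the residual would instead have to come from the loudspeaker-by-program interaction.

```python
# A minimal sketch (one plausible reading, not the author's code) of the
# individual loudspeaker F statistic F_L = MS(loudspeaker) / MS(residual).
import numpy as np

def loudspeaker_F(ratings, cap=2000.0):
    """F_L for one listener's ratings, indexed [loudspeaker, program, repeat]."""
    n_l, n_p, n_r = ratings.shape
    if n_r < 2:
        raise ValueError("needs repeated ratings; without repeats use the "
                         "loudspeaker-by-program interaction as the residual")
    grand_mean = ratings.mean()
    # loudspeaker effect: mean sum of squares between loudspeaker means
    ss_l = n_p * n_r * np.sum((ratings.mean(axis=(1, 2)) - grand_mean) ** 2)
    ms_l = ss_l / (n_l - 1)
    # residual: within-cell variation across the repeated ratings
    cell_means = ratings.mean(axis=2, keepdims=True)
    ss_res = np.sum((ratings - cell_means) ** 2)
    ms_res = ss_res / (n_l * n_p * (n_r - 1))
    # a listener who replicates every rating perfectly gets the arbitrary
    # cap of 2000 used in this paper
    return cap if ms_res == 0 else ms_l / ms_res
```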

2 EXPERIMENTS

This section describes the experimental design of the tests, including the selection of loudspeakers, programs, and listeners, the physical test setup, and the experiment protocol. Two different tests were repeated over the course of 18 months, involving a total of 268 different listeners and 36 listening groups. A total of 5,256 preference ratings were measured. The categorization and details of the different listening groups are discussed in Section 2.8.

The two tests are referred to hereafter as the four-way test and the three-way test. The four-way test involved multiple comparisons among four loudspeakers rated independently using four different programs. The tests comprised four trials conducted in the morning, followed by a repeat of the test in the afternoon, for a total of eight trials. In the three-way test loudspeaker I was dropped from the test, and there were no repeats. Otherwise the two tests were identical in all aspects, including program material, playback level, and seating arrangements. One confounding factor was that seating was not included as a variable in the design of the experiment. The 12 trained listeners listened alone, seated in the front row directly on axis to the loudspeakers, whereas the 256 untrained listeners were assigned randomly to one of eight seats arranged in two rows. This would explain some of the differences in preference and intergroup reliability between the trained and untrained listeners, but not necessarily differences in their individual performances.

2.1 Loudspeakers and Measurements

The four loudspeakers used in both tests are shown in Table 1. Each loudspeaker is coded with a letter, since the brand name and model were not relevant to the aims of this study. The manufacturer’s suggested retail price per pair (MSRP) ranges from approximately $5000 to $11,000. The loudspeakers were chosen because they are all widely available and compete against each other in the marketplace. Given the relatively high prices of the loudspeakers, they should in theory represent “state-of-the-art” designs in terms of technical and sonic performance. Indeed, all four models have received high accolades and recommendations from the audiophile press. In one magazine, two of the models (P and M) have received the highest performance category status possible (class A) for the past three years, and loudspeaker M was declared a “product of the year.”

The loudspeaker measurements are shown in Appendix 1 (Fig. 9). Each loudspeaker was measured in the large Harman anechoic chamber at a distance of 2 m with 2-Hz frequency resolution. The chamber is anechoic down to approximately 60 Hz and has been calibrated down to 20 Hz. For each loudspeaker the set of curves represents (from top to bottom) the on-axis response, the spatially averaged (±30° horizontal, ±10° vertical) listening window, the average early reflected sounds, and the calculated sound power response. The lower two curves represent the directivity indices derived from the early reflected sound and the total radiated sound power. Details of the anechoic chamber and measurement procedure are available in [24]. A discussion of these measurements and their correlation with listeners’ preferences is presented in Section 3.13.

2.2 Program Selections

Table 2 lists the four program selections used in these tests. Each program was a short 20–30-second loop digitally extracted from a compact disc as a 16-bit, 44.1-kHz stereo WAV file. The programs were selected on the basis of their ability to reveal spectral and preferential differences between loudspeakers in over 100 different listening tests and various listener training exercises.

Table 1. Codes and descriptions of loudspeakers.

Loudspeaker Code    Description               MSRP (approx.)
B                   Three-way dynamic         $8,000
P                   Four-way dynamic          $10,000
M                   Electrostatic/dynamic     $11,000
I                   Four-way dynamic          $5,000

Table 2. Program selection used in tests.

Program Code    Artist, Track, and Album
JT              James Taylor, “That’s Why I’m Here” from “That’s Why I’m Here,” Sony Records.
LF              Little Feat, “Hangin’ on to the Good Times” from “Let It Roll,” Warner Brothers.
TC              Tracy Chapman, “Fast Car” from “Tracy Chapman,” Elektra/Asylum Records.
JW              Jennifer Warnes, “Bird on a Wire” from “Famous Blue Raincoat,” Attic Records.

2.3 Preference Scale

In each listening test, listeners were required to rate each loudspeaker on the interval preference scale defined in the listener instructions (see Appendix 2). The scale consists of 11 points ranging from 0 to 10, where the magnitude of the rating indicates the degree to which the listener likes or dislikes the sound quality of a loudspeaker. The distance between two loudspeaker ratings represents the magnitude of preference: separations of 2 or more points indicate a strong preference for the higher rated loudspeaker; a 1 point difference, a moderate preference; and a 0.5 point difference, a slight preference. These definitions were intended to encourage listeners to use the scale in a similar manner and to help make the scale linear, so that equal distances of separation between two loudspeakers imply the same thing no matter what part of the scale is used. Listeners were instructed not to give tied ratings.

2.4 Listening Room

All of the listening tests were conducted in the multichannel listening lab (MLL) located at Harman International in Northridge, CA. The physical and acoustical characteristics of the listening room and its special features have been described extensively in [25]. Since the publication of this document, the walls and ceiling of the room have been finished with standard gypsum board to better simulate the surfaces found in domestic homes.

One unique feature of this listening room is the automated loudspeaker shuffler. The device permits fast (~3 second) positional substitution of four mono, stereo, or left–center–right sets of loudspeakers and effectively eliminates loudspeaker position as a variable in the test. This is important since the effect of loudspeaker position on the perceived sound quality has been shown to be a significant variable, at times larger than the effect between two different loudspeakers [14], [15].

Another benefit accrued from the loudspeaker shuffler is that the judgments of loudspeakers between trials or different program selections are truly independent. The control computer automatically shuffles the loudspeakers between trials and randomly assigns a letter code (A through D) to each loudspeaker. For multiple-comparison loudspeaker tests that do not employ a loudspeaker shuffler, the positions of the loudspeakers should be randomized between trials to reduce bias. The effects of this bias on the results of nonshuffled loudspeaker tests have been raised by Jason [26], [27]. Given that loudspeakers can be recognized easily or identified by differences in their physical positions in the room, there is an increased likelihood of measuring artificially high individual listener FL values. Bech has argued the opposite, saying that the loudspeaker–position interactions are likely to increase the variations in a listener’s ratings each time the positions of the loudspeakers are swapped [1]. Clearly, having a loudspeaker shuffler eliminates the need to sort out the various effects and biases that loudspeaker positions have on subjective ratings.

All control of the equipment, including the switching of audio signals and loudspeakers, was computer automated through custom software. In these tests, listeners were required to enter their responses on a standard listening test form using a pencil. The ratings were later entered manually into the database server by the experimenter. This labor-intensive process has recently been eliminated by giving each listener a personal digital assistant (PDA) that is wirelessly networked to the database server and control computer. In this way, listener data input and storage is completely automated and monitored. Real-time analysis of the data is also possible.

2.5 Playback Equipment

The program signals were reproduced from the hard disk on the control computer equipped with a digital sound card (SEK’D ProDif 96). The AES-EBU signal was fed to a digital switcher–distributor (Spirit 328 digital mixer) and converted to four analog signals using an eight-channel Studer D19 digital-to-analog converter. Precise level matching between loudspeakers was done by adjusting the trim controls on each analog output. Each loudspeaker was amplified with a Proceed AMP3 amplifier.

2.6 Level Adjustment

Each loudspeaker was level matched to within 0.1 dB (B-weighted) using pink noise fed to each loudspeaker. The calibrated microphone (AKG-CK62) was positioned at ear height over the middle front-row chair. Levels were calculated using SpectraLAB (version 4.32). The average playback level of the program selections in the listening room was 75 dB (B-weighted).

2.7 Test Procedure

All tests were performed double blind using monophonic (single-loudspeaker) comparisons. Before each test, listeners were given their instructions and were free to ask questions about the test procedure.

In both tests the program order was randomized. For each trial the control computer determined randomly the letter (A through D) assigned to each loudspeaker. Listeners were provided feedback through an LCD monitor that indicated the current loudspeaker being played.

Switching between loudspeakers in each trial was performed in a random sequence by the experimenter. The music was paused during the 3-second interval required to substitute the positions of the loudspeakers. Although the effect of this silent interval on the loudspeaker tests has not been investigated, different studies have shown that increasing the interstimulus interval impairs listeners’ discrimination of pitch, loudness [28], and timbre [29]. While decreasing the interstimulus time gap is advisable for measuring much smaller audible differences (such as different high-quality audio codecs), the author has not found the 3-second time gap to be a limiting factor in measuring loudspeakers. In fact, Toole found that using positional substitution of loudspeakers in both stereo and monophonic comparisons led to a lower error variance in the listeners’ ratings, despite the fact that the method increased the interstimulus interval from almost 0 to 5 seconds [30]. The benefits of controlling the loudspeaker positional biases clearly outweigh any effects that result from increasing the interstimulus time interval to a few seconds.

The presentation time for each loudspeaker was typically equal to the length of the program loop (15–30 seconds) and was shortened to 10–15 seconds toward the end of each trial. Switching continued until all listeners had entered a rating for each loudspeaker, at which point the next trial would begin. A trial typically lasted 3–5 minutes, with an entire session typically lasting 15–20 minutes.

For the four-way test listeners were told not to discuss their responses with one another until the end of the second session. All listeners were shown their results after the completion of the test.

2.8 Listeners

The 268 listeners were categorized according to their occupations (see Table 3). The table shows the number of listeners in each category and the percentage of listeners based on the total number. Note that the number of listeners in each category was not balanced. This is because the recruitment and selection of the untrained listeners was not a factor controlled by the experimenter. The listeners were all guests invited by the various Harman-brand marketing groups, including the retailers and audio reviewers. The students were unsolicited guests who were interested in visiting the facility. Factors such as age, gender, years of audio experience, and hearing loss were not measured or controlled, except for the trained listeners.

The first category (AR) comprised 215 audio equipment retailers, ranging from small privately owned boutiques to large audio retail chains located across North America. This group represented by a wide margin the largest percentage of the total listeners (80.2%).

The second group (S) consisted of 14 university students from two California universities. One group (CALP) consisted of undergraduate electrical/mechanical engineering students with an interest in audio engineering. The other student group (UC) was enrolled in programs preparing them for careers in the music and recording industries. Based on personal observations it would be safe to say that the student group was the youngest group in this study and had the least amount of experience judging the sound quality of loudspeakers. As a group they represent 5.2% of the total sample size.

The third group (MS) consisted of field marketing and sales people within Harman Consumer Group (HCG) and JBL Professional (JBL). This group had relatively more professional audio experience in evaluating sound quality compared to the students. However, none were members of the Harman-trained listening panel, and they had little experience in controlled listening tests. This group consisted of 21 listeners, or 7.8% of the sample size.

The fourth group (PR) consisted of six professional audio reviewers who review products for some of the most popular audio and home theater trade magazines. These members had considerable experience evaluating the sound quality of audio products, but not necessarily under controlled listening test conditions.

The final group (T) included 12 members of the Harman-trained listening panel, including one trainee with broad-band hearing loss (mean of 38 dB HL between 250 Hz and 8 kHz) in one ear caused by a genetic mechanical defect in the ossicles of the middle ear. Post-hoc analysis of this listener’s test results showed a perfect (−1.0) negative correlation between loudspeaker preferences in the four-way and three-way tests. In other words, he completely reversed his order of loudspeaker preferences between tests, supporting Toole’s finding that listeners with hearing loss are less reliable [20], [21]. This listener was not included in the final results. All other listeners were audiometrically normal (<15 dB HL at all audiometric frequencies between 250 Hz and 8 kHz) and had completed listener training successfully. All had participated in numerous controlled loudspeaker listening tests, with experience ranging from 2 to 17 years. The mean age of the trained listeners was 36 years, ranging from 25 to 43 years.

3 RESULTS

In this section the results of the two listening tests are presented and discussed.

3.1 Statistical Analysis

The results of the four-way and three-way tests were analyzed separately using a repeated-measures analysis of variance (ANOVA). In both tests the dependent variable was preference rating.

The four-way test was analyzed as a 16 × 4 × 4 × 2 design, where the within-subject fixed factors included loudspeaker (4 levels), program (4 levels), and session (2 levels; morning and afternoon). The between-subjects factor, group (16 levels), is a nominal variable representing the 16 different groups of listeners that participated in the test.
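For readers who wish to reproduce this kind of analysis, the sketch below runs a repeated-measures ANOVA over the within-subject factors only, using the statsmodels AnovaRM class; the file and column names are assumptions, and the paper’s full model additionally includes the between-subjects group factor and the session factor, which this simplified sketch omits.

```python
# A minimal sketch (not the author's analysis code) of a repeated-measures ANOVA
# over the within-subject factors only; file and column names are assumed.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

df = pd.read_csv("preference_ratings.csv")        # long format: one rating per row
res = AnovaRM(df, depvar="preference", subject="listener",
              within=["loudspeaker", "program"],
              aggregate_func="mean").fit()         # average over repeated sessions
print(res)                                         # F and p values per within-subject factor
```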

Table 3. Category and number of untrained listeners according to occupation.

Occupation                  Code    Count    Percent of Total
Audio retailers             AR      215      80.2
University students         S       14       5.2
Marketing and sales         MS      21       7.8
Audio reviewers             PR      6        2.2
Trained Harman listeners    T       12       4.5
Total                               268      100

The repeated-measures ANOVA for the three-way test consisted of a 20 × 3 × 4 design that included the between-subjects factor, group (20 levels), and the within-subject fixed factors, loudspeaker (3 levels) and program (4 levels). There was no afternoon repetition of the test, so session was not a variable. The differences in the total numbers of listeners and listening groups between the two tests were not factors controlled by the experimenter. The selection and pooling of the 256 untrained listeners were based on the availability of guests invited to participate by various marketing departments within the company. A complete factorial analysis was used in the ANOVA model with a significance level of 0.05 for all statistical tests. Inspecting the distribution of the mean loudspeaker ratings, we found that the means were relatively normal and symmetrical except in the four-way test, where the means for loudspeakers P and I were negatively skewed and the ratings for loudspeaker M were somewhat positively skewed. The deviations from normality were likely related to the untrained listeners’ tendency to use the entire range of the preference scale, including the extreme end points. The tendency for subjects to use the entire scale in psychophysical judgement is described in Parducci’s range–frequency theory [31]–[38]. Over many judgements, subjects tend to use all available categories defined on the scale an equal number of times. When closing in or spreading out the overall range of products, subjects will map their experience onto the available categories. Distortions in the scale tend to decrease as the range, number, and frequency of the stimuli increase. These three factors also influence the extent to which the ratings are biased by contextual effects (see Section 3.5).

Given that ANOVA is quite robust to deviations from normality, particularly when the sample size is quite large, the probability of committing a type I error was considered remote. As a safeguard, a nonparametric analysis (both Friedman and Wilcoxon signed-rank post-hoc tests) was performed on the data, and this led to the same general results and conclusions found in the ANOVA tests.
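The nonparametric safeguard can be reproduced along the following lines; this is a sketch that assumes a hypothetical data file holding one mean preference rating per listener for each of the loudspeaker columns B, P, M, and I.

```python
# A minimal sketch of the nonparametric checks: a Friedman test across the
# loudspeakers followed by pairwise Wilcoxon signed-rank tests.
import itertools
import pandas as pd
from scipy.stats import friedmanchisquare, wilcoxon

# hypothetical file: one row per listener, one column of mean ratings per loudspeaker
ratings = pd.read_csv("mean_preferences.csv")
cols = ["B", "P", "M", "I"]

stat, p = friedmanchisquare(*(ratings[c] for c in cols))
print(f"Friedman chi2 = {stat:.2f}, p = {p:.4f}")

for a, b in itertools.combinations(cols, 2):      # post-hoc pairwise comparisons
    w, pw = wilcoxon(ratings[a], ratings[b])
    print(f"{a} vs {b}: W = {w:.1f}, p = {pw:.4f}")
```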

3.2 ANOVA Summary Tables

Appendix 3 gives the ANOVA summary tables for the four-way and three-way tests (Table 4), as well as the Scheffé post-hoc test summary tables for the variable loudspeaker (Table 5). In the following discussion the F value for each effect is given in the form

F(\mathrm{DF}_{\mathrm{source}}, \mathrm{DF}_{\mathrm{residual}}) = x, \quad p    (1)

where F( ) is the F statistic expressed as the number x, DF_source is the degrees of freedom of the factor, variable, or interaction, DF_residual is the degrees of freedom of the residual, and p is the level of significance.

3.3 Practical Significance and Effect Size

In practice, significant effects can easily be achieved in listening tests by using a very large number of listeners.


Table 4. ANOVA table for preference.

Source  DF  Sum of Squares  Mean Square  F Value  P Value  Lambda  Power

1. Four-way test
Group  15  1969.183  131.279  5.185  <0.001  77.779  1.000
Subject (Group)  86  2177.316  25.318
Session  1  1.072  1.071  1.048  0.3088  1.048  0.164
Session ∗ Group  15  13.571  0.905  0.884  0.5836  13.263  0.530
Session ∗ Subject (Group)  86  87.993  1.023
Program  3  9.710  3.237  4.689  0.0033  14.045  0.903
Program ∗ Group  45  28.743  0.639  0.925  0.6109  41.645  0.919
Program ∗ Subject (Group)  258  178.067  0.690
Loudspeaker  3  9496.861  3165.620  230.954  <0.0001  692.863  1.000
Loudspeaker ∗ Group  45  675.109  15.002  1.095  0.3256  49.254  0.966
Loudspeaker ∗ Subject (Group)  258  3536.325  13.707
Session ∗ Program  3  1.702  0.567  0.920  0.4318  2.760  0.244
Session ∗ Program ∗ Group  45  19.747  0.439  0.711  0.9153  32.016  0.795
Session ∗ Program ∗ Subject (Group)  258  159.131  0.617
Session ∗ Loudspeaker  3  5.875  1.958  0.585  0.6252  1.755  0.167
Session ∗ Loudspeaker ∗ Group  45  121.377  2.697  0.806  0.8063  36.270  0.860
Session ∗ Loudspeaker ∗ Subject (Group)  258  863.383  3.346
Program ∗ Loudspeaker  9  107.217  11.913  3.816  <0.0001  34.342  0.996
Program ∗ Loudspeaker ∗ Group  135  654.398  4.847  1.553  0.0002  209.608  1.000
Program ∗ Loudspeaker ∗ Subject (Group)  774  2416.437  3.122
Session ∗ Program ∗ Loudspeaker  9  44.929  4.992  1.866  0.0539  16.791  0.831
Session ∗ Program ∗ Loudspeaker ∗ Group  135  378.575  2.804  1.048  0.3489  141.486  1.000
Session ∗ Program ∗ Loudspeaker ∗ Subject (Group)  774  2070.992  2.676

2. Three-way test
Group  19  702.353  36.966  4.206  <0.0001  79.919  1.000
Subject (Group)  146  1283.095  8.788
Program  3  16.025  5.342  9.989  <0.0001  29.968  0.999
Program ∗ Group  57  30.207  0.530  0.991  0.4975  56.492  0.979
Program ∗ Subject (Group)  438  234.208  0.535
Loudspeaker  2  2234.033  1117.017  149.233  <0.0001  298.466  1.000
Loudspeaker ∗ Group  38  722.270  19.007  2.539  <0.0001  96.495  1.000
Loudspeaker ∗ Subject (Group)  292  2185.634  7.485
Program ∗ Loudspeaker  6  44.295  7.382  4.244  0.0003  25.465  0.986
Program ∗ Loudspeaker ∗ Group  114  243.610  2.137  1.229  0.0623  140.051  1.000
Program ∗ Loudspeaker ∗ Subject (Group)  876  1523.752  1.739


However, does statistical significance have any important practical consequence when the difference in preference between two loudspeakers is less than a 0.5 rating, a slight preference?

The 5th edition of the Publication Manual of the American Psychological Association (2001) states: “...for the reader to fully understand the importance of your findings, it is almost always necessary to include some [measure of practical significance such as an] index of effect size or strength of relationship” [39].

According to Hurlburt, there is no universally used method for reporting the effect size in an experiment [39]. The raw effect size is the magnitude of an experimental result measured on the same scale used in the experiment. In this study the maximum raw effect size was 4.2 preference ratings for the variable loudspeaker (a very strong preference) and 3.4 preference ratings for the variable listening group. The effect size index d is a unitless measure that expresses the magnitude of an experimental result, and it is widely used in many scientific journals. In repeated-measures tests, where more than two loudspeakers are compared, Hurlburt recommends calculating the maximum effect size index dM as follows:

d_M = \frac{D_{\max}}{\sqrt{2\,\mathrm{MS}_{\mathrm{residual}}}}    (2)

where Dmax is the largest raw difference in the means found between different levels of the independent variable, and MS_residual is the residual mean sum of squares in the ratings. Fig. 1 shows the maximum effect size indices dM calculated for each of the three independent variables (listening group, loudspeaker, and program) in both tests. According to Cohen [40], dM values of 0.2, 0.5, and 0.8 represent small, medium, and large effects, respectively. In both the four-way and three-way tests the variable loudspeaker had a large effect on the preference ratings (dM = 0.8 and 0.77), while listening group produced a medium effect (dM = 0.47 and 0.64). Program had a very small effect (dM = 0.11 and 0.25) in both tests.
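Eq. (2) is easy to evaluate directly; the sketch below does so for the four-way loudspeaker means reported in Section 3.5 together with the loudspeaker-by-subject residual mean square from Table 4, reproducing a dM of roughly 0.8 (the function name is illustrative).

```python
# A minimal sketch of the maximum effect size index d_M of Eq. (2).
import numpy as np

def effect_size_dM(condition_means, ms_residual):
    d_max = np.max(condition_means) - np.min(condition_means)  # largest raw mean difference
    return d_max / np.sqrt(2.0 * ms_residual)                  # Eq. (2)

# four-way loudspeaker means (Section 3.5) with the loudspeaker-by-subject
# residual mean square from Table 4: d_M is approximately 0.8
print(effect_size_dM([7.51, 7.17, 5.59, 3.21], 13.707))
```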

The anticipated effect size has important practical implications for the design of an experiment. The number of listeners required to meet a certain level of statistical power decreases as the effect size index increases and as the residual error variance in judgements decreases. In other words, fewer listeners are required as the reliability and the differences between mean ratings increase. For example, Cohen has shown that to achieve the same power (for example, 0.8) in a repeated-measures test that has two loudspeakers, the number of listeners required for tests with small, medium, and large effect size indices is 196, 33, and 14 listeners, respectively [39]. Reducing the power can lower the number of subjects needed, but at the expense of an increased probability of making a type II error (that is, failing to reject the null hypothesis when in fact it is false). A better solution for reducing the size of the subject pool is to train a few listeners who can produce more reliable and discriminating judgements of sound quality.

3.4 Main Effects

In both tests there was a highly significant difference in preference between the different loudspeakers; F(3, 258) = 231.0, p < 0.0001 for the four-way test, and F(2, 292) = 149.2, p < 0.0001 for the three-way test. A Scheffé post-hoc test performed at a significance level of 0.05 showed a significant difference in the means between all pairs of loudspeakers in both tests.

Other main effects that were statistically significant in both tests were listening group; F(15, 86) = 5.2, p < 0.0001 for the four-way test and F(19, 146) = 4.2, p < 0.0001 for the three-way test.


Table 5. Scheffé table for preference; variable, loudspeaker.

1. Four-way test
  Pair    Mean Difference    Critical Difference    P Value
  B, P    1.917              0.302                  <0.0001   S
  B, M    2.382              0.302                  <0.0001   S
  B, I    1.581              0.302                  <0.0001   S
  P, M    4.300              0.302                  <0.0001   S
  P, I    0.336              0.302                  0.0214    S
  M, I    3.963              0.302                  <0.0001   S

2. Three-way test
  Pair    Mean Difference    Critical Difference    P Value
  B, P    1.113              0.252                  <0.0001   S
  B, M    1.865              0.252                  <0.0001   S
  P, M    2.979              0.252                  <0.0001   S

Note: Significance level 5%. S—significant.
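The “S” flags can be read directly from the two numeric columns: a pair of loudspeakers is judged significantly different when the absolute mean difference exceeds the critical difference. A tiny illustrative check for the three-way test (not part of the original analysis):

```python
# Three-way test values copied from Table 5.
pairs = {"B, P": 1.113, "B, M": 1.865, "P, M": 2.979}
critical_difference = 0.252

for pair, mean_diff in pairs.items():
    flag = "S" if abs(mean_diff) > critical_difference else "n.s."
    print(f"{pair}: {mean_diff:.3f} vs {critical_difference:.3f} -> {flag}")
```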

Fig. 1. Effect size index dM in four-way and three-way tests for main independent variables: listening group, loudspeaker, and program.


Program was statistically significant in both tests; F(3, 258) = 4.698, p = 0.0033 for the four-way test and F(3, 438) = 9.99, p < 0.0001 for the three-way test. Details on the main effects and interactions are discussed in the following sections.

3.5 Loudspeaker Effects

The mean loudspeaker ratings and 95% confidence intervals are shown in Fig. 2 for both tests. Note that in all the graphs that follow, only the upper half of the 95% confidence interval is shown. The true mean lies somewhere between the upper bar and the same distance below the estimated mean, with a probability of 95%.

In the four-way tests the mean loudspeaker ratings were 7.51 (loudspeaker P), 7.17 (loudspeaker I), 5.59 (loudspeaker B), and 3.21 (loudspeaker M). The difference in the means (0.34 preference rating) between loudspeakers P and I represents a slight preference. The difference in the mean ratings between these two loudspeakers and loudspeaker B (1.92 and 1.58) represents a moderate to strong preference. Both loudspeakers P and I were very strongly preferred over loudspeaker M based on the difference in the mean ratings (4.3 and 3.96). Loudspeaker B was strongly preferred over loudspeaker M (a 2.38 rating difference).

The mean loudspeaker ratings in the three-way tests were 7.07 (loudspeaker P), 5.96 (loudspeaker B), and 4.09 (loudspeaker M).

The rank orders of preference for loudspeakers P, B, and M were identical in both the four-way and three-way tests. However, the relative magnitude of preference between loudspeakers P, B, and M was somewhat smaller in the three-way tests. The differences in the loudspeaker ratings were reduced by between 0.51 and 1.3 preference ratings compared to those measured in the four-way tests.

A possible explanation for this difference is that a scaling effect occurred when the number of loudspeakers increased from three to four. The listeners may have expanded the separation and range in order to accommodate the additional loudspeaker. Another cause could be the well-known context effect described earlier [31]–[38]. Contextual biases highlight a general principle in sensory judgment: human observers act like measuring instruments that constantly readjust themselves to the context or expected frame of reference. For example, how warm or cold 40°F feels depends on when (January versus July) or where (Arizona versus Alaska) the question is asked. Similarly, the preference and perceived attributes of a loudspeaker will be influenced by the context in which the judgment is made. A mediocre loudspeaker may receive higher ratings when it is compared against a group of weaker loudspeakers versus a group of stronger competitors.


Fig. 2. Mean loudspeaker ratings and 95% confidence intervals. (a) Four-way test. (b) Three-way test.



An otherwise neutral loudspeaker may be perceived as sounding “too dull” when it is compared against a predominantly “bright” group of loudspeakers, due to the contrast effect [41]. Therefore the range of loudspeakers tested affects the distribution of their ratings and where they fall on the scale.

In these tests the context effect would be as follows. In the four-way test the relative sonic similarities between loudspeakers P and I may have accentuated the sonic differences and deficiencies of loudspeakers B and M, and listeners accordingly adjusted the ratings of these loudspeakers downward.

The context or contrast effects can be minimized in listening tests by randomizing the order of presentation, using a large number of intervals, increasing the sample size of test loudspeakers, and including anchors or references. Training the listeners may possibly reduce their susceptibility to context or range effects by creating a more stable sound-quality reference in their long-term memories. A comparison of the two tests as rated by the trained listeners shows that the relative and absolute loudspeaker ratings did not change significantly. This suggests that they may have been less susceptible to range or context effects compared to the untrained listeners. More experiments are needed to test this hypothesis.

3.6 Listener Group Effect

Significant effects were reported earlier for the experimental variable, listening group. The effects of listening group on the mean preference ratings are plotted in Fig. 3 for the two tests.


Fig. 3. Mean preference ratings and 95% confidence intervals for each listening group. (a) Four-way test. (b) Three-way test.



The graph indicates some significant variance among the different listening groups in their mean preference ratings. This implies that different groups used different parts of the preference scale. In both tests the trained listeners (HAR) gave the lowest mean preference ratings.

The mean preference ratings were also calculated as a function of the five different listener occupation categories. In the four-way test the trained-listener mean preference rating was 3.32 compared to 5.64 (audio reviewers) and 6.06 (audio retailers). In the three-way test the trained listeners' mean rating was 4.17 compared to 5.42 (marketing and sales), 5.72 (audio retailers), and 6.51 (students). If we assume that the scale is interpreted the same way by the different categories of listeners, the students were generally the most pleased with the sound quality of the loudspeakers, whereas the trained listeners, on average, found more things to complain about. One possibility is that trained listeners are generally more critical and difficult to please than untrained listeners. Gabrielsson et al. noted that experienced listeners tend to be more critical and give lower ratings to poorer loudspeakers than inexperienced listeners [17]. The author's experience has been that trained listeners tend to use the same part of the preference scale whether they are judging small inexpensive computer loudspeakers or very expensive “state-of-the-art” loudspeakers such as the ones used in these tests. These listeners put less value on the absolute ratings than they do on establishing meaningful differences between the ratings.

3.7 Program Effects

The mean preference ratings for the four-way and three-way tests are plotted in Fig. 4. Note that the order in which programs TC and JW are plotted is reversed between tests and that the scale has been zoomed in to highlight the small but significant effects program had on the preference ratings. Listeners on average gave higher preference ratings when the loudspeakers were auditioned using program JT, with lower ratings given for programs TC, JW, and LF. More noteworthy is that the rank order and relative magnitude of the preference ratings related to program were remarkably consistent across both listening tests.

There are a number of plausible reasons why program might influence the preference ratings. They include effects related to a listener's musical taste for certain programs as well as differences in the sonic fidelity of the recordings. Finally, certain programs may be better at revealing or concealing differences in the spectral, spatial, and nonlinear distortions that exist among the different loudspeakers.


Fig. 4. Mean preference ratings and 95% confidence intervals for each program. (a) Four-way test. (b) Three-way test.




3.8 Interaction Effects

There was an interaction effect between program and loudspeaker in both the four-way and three-way tests; F(9, 774) = 3.186, p < 0.0001 and F(6, 876) = 4.244, p = 0.0003, respectively. Interaction effects between program, loudspeaker, and group were also found in the four-way test, F(135, 774) = 1.553, p = 0.0002.

In the three-way test an interaction was found between loudspeaker and group, F(26, 22) = 3.292, p = 0.0458. These interaction effects are discussed separately in more detail in the following sections.

3.9 Program–Loudspeaker Interactions

Fig. 5 shows the interactions between program and loudspeaker in the four-way and three-way tests. The interaction effect is largely isolated to interactions between loudspeaker B and program LF. In both tests the mean rating of loudspeaker B dropped almost 1 preference rating when auditioned with this program. The interactions between other loudspeakers and program were comparatively much smaller.

Program–loudspeaker interactions in listening tests have been reported widely in the literature [9], [11], [12], [14], [15], [42], [43]. Gabrielsson has attempted to explain the interactions by doing spectral analysis of the loudspeakers in the room while they were reproducing each program and looking for correlations with the subjective ratings [42]. It is the author's experience that if the programs are selected carefully, well recorded, and spectrally homogeneous, the program interactions can be minimized.

3.10 Listener Group and Loudspeaker Interactions

The statistically significant interactions between group and loudspeaker are shown graphically in Fig. 6. Interaction effects are indicated by changes in the relative distances between the four horizontal lines, each representing the mean loudspeaker rating as a function of the listening group. In the four-way test the largest deviation occurred between loudspeakers P and I, and to a lesser extent, between loudspeakers B and I. Some listening groups were better than others in their ability to discriminate between loudspeakers P and I, although the differences were seldom greater than 0.5 rating (slight preference).


Fig. 5. Mean loudspeaker preference ratings and 95% confidence intervals for each loudspeaker as a function of program. (a) Four-way test. (b) Three-way test.



The trained listeners (HAR) had difficulty discriminating between loudspeakers I and P compared to other groups. One explanation for this could be related to differences in the seating positions between trained and untrained listeners. As mentioned previously, the same seat position was common to all trained listeners, who sat directly on axis to the loudspeakers. The untrained listeners were distributed among eight seats arranged in two rows. A hypothesis is that the audible differences between these two loudspeakers were more apparent for those untrained listeners who were off axis or in the second-row seating positions.

Apart from these interactions, the difference in the means between loudspeakers B, M, and I was remarkably consistent across the 16 different listening groups in the four-way test.

In the three-way test the interaction between loudspeaker and group is much stronger. The reasons for this are not clear. Some of this interaction effect is traceable to the two student groups (UC and CALP). Both groups had difficulty forming reliable preferences between loudspeakers P and B and, to a lesser extent, loudspeaker M. Some additional interaction variance comes from groups RD1 and RD2, who rated loudspeaker B the highest. Both these groups experienced an equipment failure during the test, where a cable became disconnected from the subwoofer in loudspeaker P, effectively removing all bass below 80 Hz from this loudspeaker.


Fig. 6. Mean preference ratings and 95% confidence intervals for each loudspeaker as a function of listening group. (a) Four-way test. (b) Three-way test.



3.11 Performance among Different Listening Groups

The listener performance metric FL described in Section 1 was calculated for each of the 268 listeners. This was done by performing a one-way ANOVA for each individual listener, where the independent factor was loudspeaker. FL represents the mean sum of squares of the loudspeaker ratings divided by the residual mean sum of squares, also known as the error variance.
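As a rough illustration of how such a per-listener statistic might be computed (the ratings below are hypothetical, and this is a sketch rather than the author's actual analysis code):

```python
from scipy.stats import f_oneway

# Hypothetical ratings from one listener, grouped by loudspeaker
# (one rating per program/observation). F_L is the between-loudspeaker
# mean square divided by the residual (error) mean square.
ratings_P = [7.5, 7.2, 7.8, 7.4]
ratings_B = [5.6, 5.9, 5.4, 5.8]
ratings_M = [3.1, 3.4, 2.9, 3.3]

F_L, p_value = f_oneway(ratings_P, ratings_B, ratings_M)
print(f"F_L = {F_L:.1f}, p = {p_value:.4g}")
```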

The average FL values for each of the 36 different listening groups are plotted in Fig. 7 for the four-way and three-way tests. Due to the wide range of values in the three-way test, the 95% confidence intervals are omitted in Fig. 7(b) to better clarify the differences between the groups at the lower end of the FL scale. The confidence intervals in the three-way test were similar in proportion to those found in the four-way test. In the four-way test the mean listening group FL value was 34.5, ranging from a low of 10.06 (Acad9) to a maximum value of 96.21 (Acad14), slightly higher than the 94.6 for the trained listeners (HAR). The fairly large confidence intervals indicate higher standard deviations in listener performance within the group. However, given the relatively small number of listeners in each group, a high standard deviation could result if one of the listeners got a perfect score of 2000.

The same mean listening group FL values are shown for the three-way test. In the three-way test the mean FL value averaged across all listeners was 247.29 compared to 37.12 in the four-way test. The range of mean FL values between groups is also much larger (8.14 to 781.32). This suggests two things. First, the three-way test presented an easier task in discriminating between the loudspeakers. Second, the variance in performance among the listening groups was much larger in the three-way test than in the four-way test.


Fig. 7. Mean loudspeaker F statistic as a function of listening group. (a) Four-way test. (b) Three-way test.




Finally, the high FL values of the nominally untrained groups Acad14 (four-way test) and APL (three-way test) relative to the trained listeners suggest that formalized training is not always a prerequisite for listeners to perform well in preference tests. These two groups of listeners apparently had sufficient experience, aptitude, and motivation to perform as reliably as a group of trained listeners. As the results show, however, this is the exception rather than the norm.

3.12 Occupation as a Factor in Listener Performance

To examine listener performance in view of occupation more clearly, the mean listener FL values were plotted as a function of occupation for both tests (see Fig. 8).

In the four-way tests the listener performance of the different categories based on the mean FL values, from highest to lowest, was trained listeners (94.36), audio retailers (34.57), and audio reviewers (18.16). In the three-way test the mean performance scores were trained listeners (857.04), audio retailers (273.13), marketing and sales (112.15), and students (32.35).

The performance of the trained panel is significantly better than the performance of any other category of listener. They are about three times better than the best group of audio retailers, five times better than the reviewers, and 27 times better than the students. The combination of training and experience in controlled listening tests clearly has a positive effect on a listener's performance. The students' poor performance is likely due to their lack of training and professional experience in the field of audio. The reviewers' performance is somewhat of a surprise given that they are all paid to audition and review products for various audiophile magazines. In terms of listening performance, they are about equal to the marketing and sales people, who are well below the performance of the audio retailers and trained listeners.

3.13 Correlation with Objective Measurements

The acoustic measurements of the loudspeakers first described in Section 2.1 are now discussed to determine the extent to which they correlate with the listening test results. The measurements of each loudspeaker are shown in Appendix 1 (Fig. 9) in the order (top to bottom) in which the loudspeakers were rated, most preferred to least preferred.


Fig. 8. Mean loudspeaker F statistic FL as a function of occupation category. (a) Four-way test. (b) Three-way test.



Loudspeakers P and I are very similar in terms of bass extension and flatness in frequency response that is maintained well off axis. Based on their similarity in measured performance, it is not surprising that the mean difference in the preference ratings was only 0.34 (a slight preference).

Loudspeaker B was rated third in the four-way test (5.59 preference), 1.92 and 1.58 lower than loudspeakers P and I, respectively. This represents a strong and moderately strong preference. Loudspeaker B has a respectable performance on axis with some gentle undulations in its response. The loudspeaker also has less bass output below 80 Hz compared to loudspeakers P and I. More serious is the rather substantial dip in its sound power response centered at 3 kHz, which is caused by a mismatch in the directivities of the midrange and tweeter through their transitional passband regions. Listeners described the subjective effect as a hollow and recessed midrange coloration, which explains, in part, why it scored lower. The qualities of the indirect and reverberant sounds are evidently important since this coloration would not have affected the direct sound heard by the listener. It is important that manufacturers pay attention to these details and have the ability to measure and characterize the complete off-axis performance of loudspeakers accurately.

Now we turn to loudspeaker M, an electrostatic hybrid loudspeaker that received a mean rating of 3.21 in the four-way test, a full 2.38 to 4.3 ratings below the other three loudspeakers. The response curves are not very flat or pretty. There are many visible resonances that are well above threshold [44], [45] and are present in both the on-axis and the off-axis curves. This means that the colorations will be present in both the direct and the reflected sounds in the listening room. The loudspeaker has less bass output below 40 Hz than the other three loudspeakers, and the midrange frequencies are somewhat emphasized. The slope of the sound power curve indicates that the high-frequency output drops dramatically as the listener moves off axis from the loudspeaker. Listeners who sit off axis will hear a much duller sound than those sitting on axis. Apparently this did not matter in these tests since the listeners who sat on axis (trained listeners) tended to rate the loudspeaker as low or lower than the untrained listeners sitting off axis. The colorations of this loudspeaker were dominant and omnipresent, regardless of where the listener sat in the room.

In conclusion, we can see clear visual correlations between these four sets of measurements and the listeners' preference ratings. The loudspeakers with the flattest, smoothest, and most extended frequency responses received the highest ratings.

4 DISCUSSION

This study reports one of the largest controlled loudspeaker listening tests conducted to date in terms of the sheer number of listeners involved. It is also unique in that most of the listeners (96%) had no formal training and little or no prior experience in controlled tests. One of the most significant findings is that the loudspeaker preferences of these nominally untrained listeners were very similar to those of the panel of trained listeners. The results may finally validate the use of trained listeners on the basis that their preferences can be extrapolated to a larger population of untrained listeners. The notion that the loudspeaker preferences of trained listeners are somehow biased and cannot be used to predict those of reviewers, audio retailers, and the intended (untrained) customer is not supported by scientific data.

The differences between trained and untrained listeners are mostly related to differences in performance. The mean performances of the trained listeners based on loudspeaker FL values were 3–27 times higher than those of the other four listener occupations measured in this study. Training and experience in controlled tests lead to significant gains in performance, so that fewer listeners are required to achieve the same statistical power. The comparatively poorer performance of the students relative to the other three groups of audio professionals suggests that in-field job experience can be beneficial to making more reliable judgments of sound quality. This implies that some form of training may be necessary in order to measure statistically significant preferences using more naïve and inexperienced listeners. Fortunately Bech has shown that very little training (four to eight sessions) is required [1].

The trained listeners were also found to use lower preference ratings than the untrained listeners. However, the loudspeaker rank ordering and the relative differences in preference between them were quite similar for both trained and untrained listeners. This means that extrapolations across different listener groups are possible based on the results from trained listeners. Trained listeners were the least forgiving when it came to rating the technically and sonically weakest loudspeaker in the test (for example, loudspeaker M).

The study provides strong validation for the current set of acoustic loudspeaker measurements used to design and test loudspeakers in our organization. There are clear visual correlations between measurements and subjective preference ratings, which supports the earlier findings reported by Toole [20], [21]. While interpreting the loudspeaker measurements still takes some skill and experience, the set of frequency-response curves alone could have largely predicted the outcome of these listening tests. The audio product reviewing industry could do a great service to consumers if it adopted a more meaningful set of technical measurements such as the ones shown here. Unfortunately such measurements are difficult and costly to perform, and beyond the reach of most audio reviewers. In the end it is the listening test that is the final arbiter of performance, and it is here that the reviewers need to spend more time and take greater care. Hopefully doing so will prevent reviewers from recommending two loudspeakers (P and M) as “state-of-the-art” equals when their technical and subjective performances have nothing in common. In retrospect, the only common denominator between these two loudspeakers is price.

It is the author's experience that most of the differences in opinion about the sound quality of audio products in our industry are confounded by the influence of nuisance factors that have nothing to do with the product itself.


our industry are confounded by the influence of nuisancefactors that have nothing to do with the product itself.These include differences in listening rooms, loudspeakerpositions, and personal prejudices (such as price, brand,and reputation) known to strongly influence a person’sjudgment of sound quality [9]. This study has only rein-forced this view. The remarkable consensus in loud-speaker preference among these 268 listeners was onlypossible because the judgments were all made under con-trolled double-blind listening conditions.

5 CONCLUSION

The conclusions from this study are summarized in the following.

1) The loudspeaker preferences of trained listeners were generally the same as those measured using a group of nominally untrained listeners composed of audio retailers, marketing and sales people, audio reviewers, and college students.

2) Different groups of listeners use different parts of the preference scale. Trained listeners use the lowest part of the preference scale, indicating they may be more critical and harder to please.

3) Significant differences in performance were measured among the four different occupations. The average FL values of the trained listeners were 3–27 times higher than those measured by the other groups. The second most discriminating and reliable group of listeners were the audio retailers, followed by the audio reviewers, who were about equal to the marketing and sales people. The students had the worst performance, most likely due to their lack of audio experience compared to the other groups.

4) There were clear correlations between listeners' loudspeaker preferences and a set of acoustic anechoic measurements. The most preferred loudspeakers had the smoothest, flattest, and most extended frequency responses, maintained uniformly off axis.

5) The rank order of the loudspeaker preferences did not change between the four-way and the three-way tests. However, eliminating loudspeaker I in the three-way test reduced the differences in mean ratings between loudspeakers P, B, and M. The most likely cause is a scaling or context effect related to the number and relative sound quality of the loudspeakers compared in the test.

6) The individual loudspeaker F statistics were on average seven times higher in the three-way test, indicating that rating these three loudspeakers may have been an easier task for the listeners.

6 ACKNOWLEDGMENT

Harman International sponsored this work. The author would like to thank all of the 268 listeners who participated in this study, as well as the engineering interns who helped set up and run many of these tests: Charles Sprinkle, Daniel Faissol, Ara Baghdassarian, and John Jackson. He is also grateful to his wife Valerie, Floyd Toole, and Søren Bech, who provided valuable suggestions and corrections to this text.

7 REFERENCES

[1] S. Bech, “Selection and Training of Subjects for Listening Tests on Sound-Reproducing Equipment,” J. Audio Eng. Soc., vol. 40, pp. 590–610 (1992 July/Aug.).

[2] R. Shively, “Subjective Evaluation of Reproduced Sound in Automotive Spaces,” in Proc. AES 15th Int. Conf. on Audio, Acoustics and Small Spaces (1998), pp. 109–121.

[3] S. E. Olive, “A Method for Training Listeners and Selecting Program Material for Listening Tests,” presented at the 97th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 42, p. 1058 (1994 Dec.), preprint 3893.

[4] S. E. Olive, “A Method for Training Listeners: Part II,” presented at the 101st Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 44, p. 1160 (1996 Dec.), no preprint.

[5] S. E. Olive, “A New Listener Training Software Application,” presented at the 110th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 49, p. 542 (2001 June), preprint 5384.

[6] T. Neher, F. Rumsey, and T. Brookes, “Training of Listeners for the Evaluation of Spatial Sound Reproduction,” presented at the 112th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 50, pp. 518–519 (2002 June), preprint 5584.

[7] M. Meilgaard, G. V. Civille, and C. T. Carr, Sensory Evaluation Techniques, 2nd ed. (CRC Press, Boca Raton, FL, 1991).

[8] F. E. Toole, “Subjective Evaluation: Identifying and Controlling the Variables,” presented at the AES 8th Int. Conf.: The Sound of Audio (1990 Apr.), paper 8-013.

[9] F. E. Toole and S. E. Olive, “Hearing Is Believing vs. Believing Is Hearing: Blind vs. Sighted Listening Tests, and Other Interesting Things,” presented at the 97th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 42, p. 1058 (1994 Dec.), preprint 3894.

[10] F. E. Toole, “The Acoustics and Psychoacoustics of Loudspeakers and Rooms: The Stereo Past and the Multichannel Future,” presented at the 109th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 48, p. 1101 (2000 Nov.), preprint 5201.

[11] P. Schuck, S. Olive, J. Ryan, F. Toole, S. Sally, M. Bonneville, E. Verreault, and K. Momtahan, “Perception of Reproduced Sound in Rooms: Some Results from the Athena Project,” in Proc. AES 12th Int. Conf. (1993 June), pp. 49–73.

[12] S. E. Olive, P. Schuck, S. Sally, and M. Bonneville, “The Variability of Loudspeaker Sound Quality among Four Domestic-Sized Rooms,” presented at the 99th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 43, pp. 1088–1089 (1995 Dec.), preprint 4092.

[13] F. E. Toole, “Loudspeakers and Rooms for Stereophonic Sound Reproduction,” presented at the AES 8th Int. Conf.: The Sound of Audio (1990 Apr.), paper 8-011.

[14] S. E. Olive, P. L. Schuck, S. L. Sally, and M. E. Bonneville, “The Effects of Loudspeaker Placement on Listener Preference Ratings,” J. Audio Eng. Soc., vol. 42, pp. 651–669 (1994 Sept.).


[15] S. Bech, “Timbral Aspects of Reproduced Sound in Small Rooms. I,” J. Acoust. Soc. Am., vol. 97, pp. 1717–1726 (1995).

[16] R. E. Kirk, “Learning, a Major Factor Influencing Preferences for High-Fidelity Reproducing Systems,” J. Acoust. Soc. Am., vol. 28, pp. 1113–1116 (1956).

[17] A. Gabrielsson, U. Rosenburg, and H. Sjogren, “Judgments and Dimension Analysis of Perceived Sound Quality of Sound-Reproducing Systems,” J. Acoust. Soc. Am., vol. 55, pp. 854–861 (1974).

[18] A. Gabrielsson, “Loudspeaker Frequency Response and Perceived Sound Quality,” J. Acoust. Soc. Am., vol. 90, pp. 707–719 (1991).

[19] A. Gabrielsson and B. Lindstrom, “Perceived Sound Quality of High-Fidelity Loudspeakers,” J. Audio Eng. Soc., vol. 33, pp. 33–53 (1985 Jan./Feb.).

[20] F. E. Toole, “Loudspeaker Measurements and Their Relationship to Listener Preferences: Part 1,” J. Audio Eng. Soc., vol. 34, pp. 227–235 (1986 Apr.).

[21] F. E. Toole, “Loudspeaker Measurements and Their Relationship to Listener Preferences: Part 2,” J. Audio Eng. Soc., vol. 34, pp. 323–348 (1986 May).

[22] R. Quesnel, “A Computer-Assisted Method for Training and Researching Timbre Memory Evaluation Skills,” Ph.D. dissertation, Tech. Rep. 1, McGill University, Montreal, P.Q., Canada (2002).

[23] A. Gabrielsson, “Statistical Treatment of Data for Listening Tests on Sound-Reproducing Systems,” Rep. TA, Karolinska Institute, Technical Audiology, HH, Stockholm, Sweden (1979).

[24] A. Devantier, “Characterizing the Amplitude Response of Loudspeaker Systems,” presented at the 113th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 50, p. 954 (2002 Nov.), preprint 5638.

[25] S. E. Olive, B. Castro, and F. E. Toole, “A New Laboratory for Evaluating Multichannel Audio Components and Systems,” presented at the 105th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 46, pp. 1032–1033 (1998 Nov.), preprint 4842.

[26] M. R. Jason, “A Real-World Implementation of Current Theory in Loudspeaker Subjective Evaluation,” presented at the 90th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 39, p. 385 (1991 May), preprint 3048.

[27] M. R. Jason, “Design Considerations for Loudspeaker Preference Experiments,” J. Audio Eng. Soc., vol. 40, pp. 979–996 (1992 Dec.).

[28] S. Clement, L. Demany, and C. Semal, “Memory for Pitch versus Memory for Loudness,” J. Acoust. Soc. Am., vol. 106 (1999 Nov.).

[29] G. E. Starr and M. A. Pitt, “Interference Effects in Short-Term Memory for Timbre,” J. Acoust. Soc. Am., vol. 102, pp. 486–494 (1997).

[30] F. E. Toole, private conversation (2003).

[31] A. Parducci, “The Relativism of Absolute Judgment,” Sci. Am., vol. 219, pp. 84–90 (1968).

[32] A. Parducci, “Contextual Effects: A Range-Frequency Analysis,” in E. C. Carterette and M. P. Friedman, Eds., Handbook of Perception, vol. 2 (Academic Press, New York, 1974).

[33] A. Parducci and D. H. Wedell, “The Category Effect with Rating Scales: Number of Categories, Number of Stimuli, and Method of Presentation,” J. Experim. Psychol., vol. 12, pp. 496–516 (1986).

[34] D. H. Wedell and A. Parducci, “The Category Effect in Social Judgment: Experimental Ratings of Happiness,” J. Personal. Soc. Psychol., vol. 55, pp. 341–356 (1988).

[35] D. H. Wedell, A. Parducci, and M. Lane, “Reducing the Dependence of Clinical Judgment on the Immediate Context: Effects of Number of Categories and Type of Anchors,” J. Personal. Soc. Psychol., vol. 58, pp. 319–329 (1990).

[36] D. H. Wedell and J. C. Pettibone, “Preference and the Contextual Basis of Ideals in Judgment and Choice,” J. Experim. Psychol.: General, vol. 128, pp. 346–361 (1999).

[37] E. C. Poulton, Bias in Quantifying Judgments (Lawrence Erlbaum Assoc., Hove, UK, 1989), 304 pp.

[38] H. Lawless, “Bias and Context Effects in Ratings,” Lecture in Food Science 410: Sensory Evaluation, Cornell University (2002), http://zingerone.foodsci.cornell.edu/fs410/lectures/context.pdf.

[39] R. T. Hurlburt, Comprehending Behavioral Statistics, 3rd ed. (Thomson Wadsworth, 2003).

[40] J. Cohen, Statistical Power Analysis for the Behavioral Sciences (Lawrence Erlbaum Assoc., Hove, UK, 1988).

[41] B. C. J. Moore, An Introduction to the Psychology of Hearing, 4th ed. (Academic Press, New York, 1997).

[42] A. Gabrielsson, B. Hagerman, T. Bech-Kristensen, and G. Lundberg, “Perceived Sound Quality of Reproductions with Different Frequency Responses and Sound Levels,” J. Acoust. Soc. Am., vol. 83, pp. 1359–1366 (1990).

[43] S. Olive, “Evaluation of Five Commercial Stereo Enhancement 3-D Audio Software Plug-ins,” presented at the 110th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 49, p. 543 (2001 June), preprint 5386.

[44] F. E. Toole and S. E. Olive, “The Modification of Timbre by Resonances: Perception and Measurement,” J. Audio Eng. Soc., vol. 36, pp. 122–142 (1988 Mar.).

[45] S. E. Olive, P. L. Schuck, J. G. Ryan, S. L. Sally, and M. E. Bonneville, “The Detection Thresholds of Resonances at Low Frequencies,” J. Audio Eng. Soc., vol. 45, pp. 116–128 (1997 Mar.).

APPENDIX 1
LOUDSPEAKER MEASUREMENTS

The spatially averaged anechoic measurements of each loudspeaker used in the listening tests, in the order in which the loudspeakers were rated from highest to lowest, are presented in Fig. 9. See Section 2.1 for a description of what each curve represents.

APPENDIX 2
INSTRUCTIONS TO LISTENERS

In these tests you will be judging the sound quality of different loudspeakers and rating them according to your personal preference. You MUST enter a rating for each loudspeaker in the appropriate box after the program selection has ended.


Fig. 9. Spatially averaged anechoic measurements. (a) Loudspeaker P. (b) Loudspeaker I. (c) Loudspeaker B. (d) Loudspeaker M.



Please enter your ratings using the following preference scale:

Preference Scale

Your ratings can contain up to one decimal place (e.g., 7.3, 2.5).

DO NOT GIVE TIED SCORES IN ANY ROUND. If you do, the computer will ask you to reenter your ratings.

You should separate your preference ratings among different speakers to reflect your relative preference between two speakers. Use the following guidelines:

Slight preference: Speaker A – Speaker B at least 0.5 point
Moderate preference: Speaker A – Speaker B at least 1 point
Strong preference: Speaker A – Speaker B at least 2 points

Comments

Finally, we encourage you to write comments about what you like and dislike about the sound of the speakers you are comparing: what aspects of the speaker make you prefer it (or not prefer it) over the other speaker(s)?

APPENDIX 3

Tables 4 and 5 are the ANOVA summary table and the Scheffé post-hoc test table for the variable, loudspeaker.

APPENDIX 4

Table 6 lists the 36 different listening groups, showing the number of listeners in each group, their occupation category (AR—audio retailer, PR—professional audio reviewer, T—Harman-trained listener, MS—Harman marketing and sales, and S—student), and the dates of the listening tests.

824 J. Audio Eng. Soc., Vol. 51, No. 9, 2003 September

Table 6. Listening groups, occupation categories, and test dates.

Group Count Category Test Date

Acad1             7   AR   3/11/02
Acad2             6   AR   3/12/02
Acad3             6   AR   3/13/02
Acad4             7   AR   3/14/02
Acad5             4   AR   3/15/02
Acad6             5   AR   4/17/02
Audio reviewers   6   PR   4/16/02
Acad7             8   AR   9/10/02
Acad8             8   AR   9/11/02
HAR               6   T    9/16/02
Acad9             8   AR   9/12/02
Acad10            5   AR   2/11/03
Acad11            6   AR   2/12/03
Acad12            7   AR   2/13/03
Acad13            8   AR   4/23/03
Acad14            5   AR   4/24/03

Subtotal 102

AC1               6   AR   11/1/01
AC2               3   AR   11/5/01
APP               6   AR   5/1/03
CALP              7   S    11/5/01
HAR               6   T    10/23/02
HCG               9   MS   11/5/01
JBL               12  MS   11/8/01
RD1               7   AR   11/1/01
RD2               7   AR   11/5/01
RD3               5   AR   11/5/01
RD4               6   AR   11/5/01
RD5               5   AR   1/14/02
RD6               23  AR   5/20/02
RD7               22  AR   10/18/02
RD8               13  AR   1/10/03
RD9               5   AR   3/10/03
RD10              6   AR   3/10/03
RD11              6   AR   3/10/03
RD12              5   AR   3/10/03
UC                7   S    2/26/02

Subtotal 166

Total 268


THE AUTHOR

Sean E. Olive received a bachelor of music degree from the University of Toronto, Ont., Canada, in 1982 and a master's degree in sound recording from McGill University, Montreal, P.Q., in 1986. He is currently pursuing a Ph.D. degree in sound recording at McGill University, investigating the perception of spectral distortion and its effect on listener preference.

From 1986 to 1993 he was a research scientist in the Acoustics and Signal Processing Group at the National Research Council in Ottawa, Ont. There he worked with Dr. Floyd Toole on research related to subjective and objective testing of loudspeakers and microphones, room-adaptive loudspeakers, and the detection of reflections and resonances. Much of this work has been presented in various AES publications. For two of these papers he received, as a coauthor, AES publication awards in 1990 and 1995, and an AES fellowship in 1996. Since 1993 he has been the manager of Subjective Evaluation with the R&D group at Harman International in Northridge, CA, where he is responsible for subjective testing of all Harman consumer products and conducting psychoacoustics-related research in sound reproduction.

Mr. Olive is a former chair of the AES Los Angeles Section, a past AES governor, and a current member of two AES technical committees. For the past five years he has taught psychoacoustics and critical listening at the UCLA Extension's recording engineering certificate program.


0 INTRODUCTION

There has been a dramatic growth in the number of home multichannel surround systems in the past several years. This has been fueled largely by the availability of movies in DVD format, which offer multichannel soundtracks. These systems enable users to audition multichannel film soundtracks as well as multichannel musical recordings.

An obvious application of the surround channels is to place sound sources at locations that are not otherwise possible with a standard stereo reproduction system. For example, using the surround loudspeakers it is possible to place a sound source either to the side or behind the listener. This application of the surround channels is found primarily in movie soundtracks where specific sound effects are desired.

In musical applications the surround channels are more commonly used to try to create a more realistic and aesthetically pleasing sound field. For example, one common goal of multichannel systems is to create a better approximation of the concert-hall experience than can be achieved with only two loudspeakers. The addition of the center and surround channels allows the sound to arrive from more locations, which in turn allows for a greater sense of realism in the resulting sound fields.

While the majority of multichannel surround systems are based on a 5.1-channel configuration, some researchers have been investigating the use of more independent channels [1].

It is well known in the field of concert-hall acoustics that a strong sense of spatial impression is important in order to obtain a subjectively pleasing sound field. While there has been a good deal of confusion in the past three decades over the definition of spatial impression, it is now well established that it is composed of at least two components, apparent source width (ASW) and listener envelopment (LEV) [2]–[4]. ASW is defined as a broadening of the apparent width of the sound source, whereas LEV refers to the listener's sense of being surrounded or enveloped by sound. Work by Bradley and Soulodre indicates that ASW is primarily determined by the energy arriving within the first 80 ms after the arrival of the direct sound. Conversely, LEV was found to be determined primarily by the amount of late lateral energy (arriving after 80 ms) in the sound field [5], [6].

In their work Bradley and Soulodre conducted subjective experiments in an anechoic chamber to examine the various acoustical parameters that affect LEV. As a result they proposed an objective measure LG_80^∞, which is related to the sum of the lateral energy arriving after 80 ms. LG_80^∞ was shown to correlate highly with the subjective perception of LEV in those experiments. LG_80^∞ is defined as

$$\mathrm{LG}_{80}^{\infty} = 10 \log \left[ \frac{\int_{0.08}^{\infty} p_F^2(t)\,\mathrm{d}t}{\int_{0}^{\infty} p_A^2(t)\,\mathrm{d}t} \right] \quad \mathrm{dB} \qquad (1)$$


Objective Measures of Listener Envelopment in Multichannel Surround Systems*

Gilbert A. Soulodre, AES Fellow, Michel C. Lavoie, and Scott G. Norcross, AES Member

Communications Research Centre, Ottawa, Ont. K2H 8S2, Canada

A common goal in multichannel musical recordings is to create a better approximation of the concert-hall experience than can be achieved with a traditional stereo reproduction system. Listener envelopment (LEV) is known to be an important part of good concert-hall acoustics and is therefore desirable in multichannel reproduction. In the present study a series of subjective tests were conducted to determine which acoustic parameters are important to the creation of LEV. It is shown that LEV can be controlled systematically in a home listening environment by varying the level and angular distribution of the late arriving sound. While the perceptual transition point between early and late energy has traditionally been set to 80 ms when predicting LEV, this matter has not been investigated rigorously. Subjective tests were conducted wherein the temporal and spatial distributions of the late energy were varied. A new frequency-dependent objective measure GSperc was derived, and it was shown to outperform other objective measures significantly.

*Presented under the title “Temporal Aspects of Listener Envelopment in Multichannel Surround Systems” at the 114th Convention of the Audio Engineering Society, Amsterdam, The Netherlands, 2003 March 22–25; revised 2003 June 10.


where pF(t) is the instantaneous lateral sound pressure as measured using a figure-of-eight microphone and pA(t) is the response of the same source at a distance of 10 m in a free field. They obtained the highest correlation with their subjective results when they used the value of LG_80^∞ averaged over the octave bands from 125 to 1000 Hz.
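A direct numerical transcription of Eq. (1) is sketched below, assuming sampled impulse responses and a known sample rate; in practice the measure would be computed per octave band (125–1000 Hz) and then averaged, and the band filtering is omitted here. The function name and arguments are ours.

```python
import numpy as np

def lg80_inf(p_fig8, p_ref_10m, fs):
    """Late lateral energy level LG_80^inf in dB, per Eq. (1).

    p_fig8    -- impulse response at the listener, figure-of-eight microphone
    p_ref_10m -- free-field response of the same source measured at 10 m
    fs        -- sample rate in Hz
    """
    n80 = int(round(0.080 * fs))              # 80-ms early/late boundary
    late_lateral = np.sum(p_fig8[n80:] ** 2)  # integral of p_F^2 from 80 ms on
    reference = np.sum(p_ref_10m ** 2)        # integral of p_A^2 over all time
    return 10.0 * np.log10(late_lateral / reference)
```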

In a recent study the authors examined whether the addition of surround channels could be used to enhance and control the perception of LEV by providing additional late lateral energy as compared to a traditional stereo reproduction system [7]. A series of formal subjective and objective tests were conducted to investigate the components of a sound field that contribute to the sense of LEV in a multichannel surround system. Several parameters of the sound fields (C80, RT, overall level, and spatial distribution) were varied systematically, and subjects rated their perception of LEV for each of these sound fields. Thus the relative influence on LEV of each parameter could be determined.

The results confirmed that LEV is determined primarily by the level and spatial distribution of the late energy. Secondary factors that also influenced LEV included the overall playback level and the reverberation time. The objective measure LG_80^∞ was found to correlate well with the perception of LEV in the sound fields examined in that study.

In concert-hall research the dividing point between early and late energy (for music) has traditionally been set to 80 ms. That is, any energy arriving within the first 80 ms after the direct sound is considered to be “early” energy, whereas all energy arriving after 80 ms is considered to be “late” energy. This is reflected in acoustic measures such as C80, which is a relative measure of the early arriving energy versus the late arriving energy in a sound field. As such, objective measures of LEV have used 80 ms as the point where the late energy begins. However, the transition point between early and late energy has never been investigated in studies of LEV, even though there is evidence that this may not be the optimal transition point for measuring LEV.

Soulodre and Bradley conducted a subjective study in which a gated burst of energy was systematically varied in time relative to a fixed set of early reflections [2]. It was found that energy arriving at 100 and 120 ms gave a greater sense of LEV than energy arriving at 80 ms. Therefore it may be that an objective measure of LEV based on a transition point other than 80 ms (such as 100 or 120 ms) may be more suitable.

In the present study formal subjective tests were conducted to examine directly the transition point between early and late energy for the perception of LEV. Objective measures of LEV using various early/late transition points are evaluated, and a new measure is proposed based on the integration properties of the ear.

1 EXPERIMENTAL OVERVIEW

Four formal subjective experiments were conducted to examine the various aspects of listener envelopment in a multichannel surround system [7], [8]. The results of these experiments provide a database that will be used in Section 4 to investigate objective measures of LEV.

The experiments were conducted in a listening environment that meets all of the acoustical requirements set forth in ITU-R BS.1116-1 [9]. For each experiment the loudspeakers were arranged in conformance with the configuration defined in ITU-R BS.775-1 [10], as shown in Fig. 1. Five Tannoy 800A self-powered loudspeakers were used. This is a dual-concentric style loudspeaker and thus provides a better approximation to a point source.

In the experiments a 20-s segment of anechoic music was used as the test stimulus (Denon recording of Handel's Water Music Suite). A computer-based multichannel sound-field generator developed by the authors allowed real-time independent control over the early reflections, reverberation, equalization, and mixing for each channel (see Fig. 2). The system was used to create the various sound fields through which the anechoic music was played. The reverberation algorithm is based on a commonly used commercial reverberation system and provides uncorrelated reverberation for each of the output channels. While the system allows for as many as 16 output channels, only five were employed in the subjective tests.

The subjective test method used was a modified version of the MUSHRA methodology (ITU-R BS.1534) [11]. Specifically, band-limited anchor points were not included since they were not appropriate for the present study. This multistimulus methodology allows the subject to compare instantly several test items (sound fields) in order to derive a score for each of the items, and it is particularly well suited for evaluating widely different sounds [12]. In each test, subjects were asked to rate the amount of LEV for each of the sound fields as compared to a reference sound field. Subjects were instructed that the reference sound field had the lowest level of LEV. A 100-point subjective grading scale more suitable to evaluating LEV was used instead of the standard MUSHRA scale. A score of 0 indicated that the sound field had very low LEV, whereas a score of 100 indicated very high LEV.

The sound-field generator was used to create a series of multichannel sound files, one for each of the sound fields to be evaluated in the subjective experiments.


Fig. 1. Loudspeaker configuration used in experiments.



A computer-based multichannel playback system was used to play the sound files to the subjects. Subjects were presented with the computer interface shown in Fig. 3. Using the mouse, subjects could switch between any of the sound fields by simply clicking on the appropriate button. The subjects could also listen to the reference sound field at any time by selecting the REF button.


Fig. 3. Computer interface used by subjects in the experiments.

Fig. 2. Computer interface for multichannel sound-field generating system.


The music played continuously, and switching between sound fields was accomplished through a rapid cross-fade. In keeping with the MUSHRA methodology, subjects were allowed to compare the sound fields as much as they required in order to make their judgments. Subjects were also allowed to refine their scores until they were satisfied with all of the grades that they had given.

A total of 17 subjects participated in each of the first two experiments; 10 subjects participated in the third and fourth experiments. Prior to conducting the formal experiments, each subject went through a thorough training session wherein they were exposed to the full range of sound fields that they would later rate in the blind tests. During the training session subjects were advised that, while many subjective parameters of the sound fields may be varying, they were to rate only their perception of LEV in each sound field. LEV was described as the sense of being enveloped or surrounded by sound. Each subject was alone when conducting the formal blind tests and was allowed to conduct the experiment at their own pace.

The subjects were required to make numerous ratings and so, to eliminate any possible systematic temporal effects (such as fatigue), the experiments were divided into several sessions. Different sets of sessions were created, and the assignment of sound fields to the buttons on the computer interface was randomized. As such, the presentation of sound fields was different for each subject.

2 SUBJECTIVE EXPERIMENTS

2.1 Experiment 1

In the first experiment subjects rated the amount of LEV for 27 sound fields with respect to a reference sound field. Fig. 4 gives a symbolic representation of the structure of the impulse responses used in the experiment. The direct sound and four early reflections were not varied between sound fields, except as part of an adjustment to the overall sound field level. This was done so that relative measures of the early energy such as the lateral energy fraction and the interaural cross correlation would remain constant throughout the experiments [5]. These measures are known to be related to the ASW component of spatial impression.

In the first experiment the three parameters RT, C80, and angular distribution of the late sound were varied systematically. Here C80 is defined as

$$C_{80} = 10 \log \left[ \frac{\int_{0}^{0.08} p^2(t)\,\mathrm{d}t}{\int_{0.08}^{\infty} p^2(t)\,\mathrm{d}t} \right] \quad \mathrm{dB} \qquad (2)$$

where p(t) is the instantaneous sound pressure. Three values of each parameter were used, yielding a total of 27 different sound fields. The average values (500- and 1000-Hz octave bands) of the three parameters are given in Table 1. The range of values found in this table is similar to that found in real concert halls.
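Computed from a sampled omnidirectional impulse response, Eq. (2) reduces to a simple early/late energy ratio. The sketch below is illustrative only (broadband, with no octave-band filtering), and the function name is ours.

```python
import numpy as np

def c80(p, fs):
    """Clarity index C80 in dB, per Eq. (2): early (0-80 ms) vs. late energy."""
    n80 = int(round(0.080 * fs))
    early = np.sum(p[:n80] ** 2)
    late = np.sum(p[n80:] ** 2)
    return 10.0 * np.log10(early / late)
```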

Since the early portion of the sound fields was held constant, variations in C80 were obtained by altering the level of the late energy, as depicted in Fig. 4. Changes to RT were achieved by changing the reverberation times for each channel of the multichannel sound-field generator. Finally, the angular distribution parameter was varied by having the reverberant energy come from either one, three, or five loudspeakers, corresponding to late energy distributed over an angle of 0°, ±30°, or ±110°. The total late energy was evenly distributed across the one, three, or five loudspeakers. The reference signal consisted of the sound field having RT = 0.5 s, C80 = 7.0 dB, and 0° angular distribution.

An analysis of variance (ANOVA) of the results indicated that in this experiment there were highly significant main effects (p < 0.001) for all three independent variables (RT, C80, and angular distribution). The ANOVA also indicated that there was a highly significant interaction effect between C80 and the angular distribution of the late energy. This means that these parameters are not independent of each other, and therefore the effect of C80 on LEV depends on the angular distribution of the sound. Similarly, the effect of angular distribution on LEV depends on the value of C80.

Fig. 5 plots the perceived LEV as a function of angular distribution for three values of C80. The error bars represent the critical difference for the experiment derived using a t-test. The critical difference is an indicator of whether or not the differences between data points are statistically significant. As such, any two data points are statistically different (p < 0.05) if their error bars do not overlap, whereas overlapping error bars indicate that the data points must be considered to be statistically identical.


Fig. 4. Symbolic impulse response depicting sound field structure, experiments 1 and 2. C—center; L—left; R—right; LSur—left surround; RSur—right surround; vertical lines—discrete reflections.

Table 1. Average midfrequency values of independent variables for 27 sound fields in experiments 1 and 2.

                                       Low    Medium   High
Experiment 1
  RT (s)                               0.5    1.2      1.9
  C80 (dB)                             7.0    4.0      −2.0
  Angle (deg)                          0      ±30      ±110
  A-weighted playback level (dBA)      77     77       77

Experiment 2
  RT (s)                               1.9    1.9      1.9
  C80 (dB)                             7.0    4.0      −2.0
  Angle (deg)                          0      ±30      ±110
  A-weighted playback level (dBA)      74     77       80



It can be seen from Fig. 5 that the interaction occurs primarily for the 0° angular distribution. For this angular distribution all of the late energy is arriving from directly ahead of the listener. The results suggest that increasing the level of the late energy (from C80 = 7 dB to C80 = 4 dB) can increase the perception of LEV somewhat. However, a further increase in the level of the late energy (from C80 = 4 dB to C80 = −2 dB) does not result in a further increase in the perception of LEV. Thus it can be concluded that, beyond some limit, further increases in the level of late energy arriving from the center loudspeaker will not result in an increase in LEV. It is interesting to note that Bradley and Soulodre did not find this interaction in their study. However, their sound fields did not include as broad a range of C80 values, and so the effect may not have been revealed in that study.

Since the ANOVA did not show any interaction effects between RT and the other independent variables, it is possible to examine the effects of RT on LEV independently. Fig. 6 shows the mean LEV scores versus RT when averaged over all values of C80 and angular distribution. Again the error bars represent the critical difference for this experiment. As can be seen, LEV increases with increasing reverberation time. However, the change in LEV versus RT is not as large as the change in LEV due to variations in C80 and angular distribution, as shown in Fig. 5. That is, the variation in C80 has a larger effect on LEV than the variations in RT. This is in good agreement with the findings in [5].

The results of the first experiment show that it is possible to vary the perception of LEV in a typical listening environment by varying either the relative level or the angular distribution of the late energy. It is also possible to vary LEV, to a lesser extent, by altering the reverberation time of the sound field.

2.2 Experiment 2
The second experiment was very similar to the first except that RT was held constant at 1.9 s while the A-weighted playback level was varied. Therefore the 27 sound fields in this experiment consisted of three values of C80, by three playback levels, by three angular distributions of the late sound. The average values (500- and 1000-Hz octave bands) of the three parameters are given in Table 1. Subjects rated the magnitude of LEV for the 27 sound fields as compared to a reference sound field. For this experiment the reference consisted of the sound field having C80 = 7.0 dB, 74-dBA playback level, and 0° angular distribution.

An ANOVA of the results showed that there were highly significant main effects due to C80, playback level, and the angular distribution of the late sound. Again a highly significant interaction effect was found between C80 and the angular distribution of the late energy.

Fig. 7 plots the perceived LEV as a function of angular distribution for three values of C80 for the second experiment. The error bars again represent the critical difference for this experiment. As was found for the first experiment, the interaction appears to occur primarily for the 0° angular distribution. The results suggest that increasing the level of the late frontal energy is not an effective means of increasing LEV, as any increase is small and occurs only for higher values of C80.

The ANOVA showed that there was only a weak interaction between the overall playback level and C80, and no


Fig. 5. Mean LEV scores versus angular distribution for the three values of C80, experiment 1.
Fig. 7. Mean LEV scores versus angular distribution for the three values of C80, experiment 2.
Fig. 6. Mean LEV versus RT averaged over all values of other independent variables, experiment 1.

[Figs. 5 and 7 plot LEV (0–100) versus angular distribution (0°, ±30°, ±110°), with separate curves for C80 = −2, 4, and 7 dB. Fig. 6 plots LEV (0–100) versus RT (0.4–2.2 s).]


interaction between overall playback level and angular distribution. Therefore it is reasonable to examine the effects of playback level on LEV.

Fig. 8 plots the mean LEV versus the three playback levels averaged over all values of C80 and angular distribution. It can be seen that there is a uniform increase in LEV with increased playback level. Of course, one may not expect this trend to continue. It seems reasonable to expect that beyond some point any further increases in playback level will not result in a corresponding increase in LEV.

The results of this second experiment confirm that it is possible to vary the perception of LEV in a typical listening environment by varying either the relative level or the angular distribution of the late energy. It is also possible to control the perception of LEV by varying the overall playback level of the signal.

2.3 Experiment 3
Previous work by Soulodre and Bradley indicated that the perception of LEV is dependent on the level and temporal distribution of the late-arriving energy [2]. In this experiment a subjective test based on [2] was conducted to examine this matter further.

Subjects rated the amount of LEV in eight sound fields relative to a reference sound field. A symbolic representation of the impulse responses of the multichannel sound fields is shown in Fig. 9. The direct sound and early reflections were held constant for all sound fields. A gated burst of energy was used instead of an exponentially decaying reverberant tail. The onset time of the gated burst varied between sound fields (0, 40, 80, and 120 ms), while the total duration of the gated burst was approximately 100 ms. The gated burst was emitted from all five loudspeakers, although separate uncorrelated bursts were used for each channel.

Two levels of the gated burst were included in the test, thus giving eight (2 levels by 4 onset times) different sound fields. The level of the gated burst was either 3 or 6 dB relative to the combined energy of the direct sound and early reflections. The sound field having the lower level gated burst and a 0-ms onset time was used as the reference signal. The level, delay times, and angle of arrival of the early reflections were chosen to minimize any echo disturbance in the sound fields having longer onset times.

An ANOVA of the results indicates that there were highly significant main effects (p < 0.001) for both independent variables (onset time and level of the gated burst) in the experiment. Fig. 10 shows the mean LEV scores versus onset time for the two levels of the gated burst. The error bars represent the critical difference for the experiment, derived using a t-test. It can be seen from the figure that the perception of LEV increases monotonically with increasing onset delay time of the gated burst, with the maximum LEV occurring at the longest delay time (120 ms). The results also indicate that for a given onset time the higher level gated burst of energy provided a greater sense of LEV.

Examining the results shown in Fig. 10, one might be tempted to speculate that an onset delay time greater than 120 ms should provide a further increase in LEV. However, in informal pilot tests conducted while establishing the parameters of this experiment, it was found that longer delay times produced a disturbing echo rather than an increase in LEV. This echo disturbance caused some confusion among subjects since it created a perceptual effect that was different and separate from LEV. Evidence of this confusion was seen in some of the subjects' scores


Fig. 9. Symbolic impulse response depicting sound field structure, experiment 3. C—center; L—left; R—right; LSur—left surround; vertical lines—discrete reflections.
Fig. 8. Mean LEV versus playback level averaged over all values of C80 and angular distribution, experiment 2.


Fig. 10. Mean LEV scores versus onset time of gated burst of energy.

[Fig. 10 plots LEV (0–100) versus onset time (0, 40, 80, 120 ms), with separate curves for the low-level and high-level gated bursts.]


wherein they were not as consistent in their grading of the sound field with the high-level gated burst and a 120-ms onset time. As can be seen in the upper curve of Fig. 10, the increase in LEV is not significant between 80 and 120 ms.

The fact that an onset time of 120 ms produced a greater sense of LEV than an onset time of 80 ms suggests that the 80-ms integration limit used in calculating LG80∞ may not be optimal. Therefore another subjective experiment was designed to examine this matter further.

2.4 Experiment 4
In the first two subjective experiments the delineation point between early and late energy was assumed to be 80 ms. As a result, the sound fields in these experiments were designed so that the late energy arrived at some time after 80 ms. More specifically, the discrete early reflections arrived before 80 ms and the diffuse late energy arrived after 80 ms. This fact may inadvertently introduce a type of bias that could result in LG80∞ being an inherently good objective measure of LEV in those experiments. In this fourth experiment the sound fields were designed to eliminate this possible bias by broadly varying and blurring any boundaries between early and late energy. The results were used to examine the effect of the integration limit used in objective measures of LEV.

Fig. 11 provides a symbolic representation of the structure of the impulse responses used in the fourth experiment. Unlike in the previous experiments, nothing except the direct sound was held fixed from one sound field to the next. The number, levels, delays, and angles of arrival of the early reflections were varied between sound fields, and discrete reflections were allowed to arrive at times beyond 80 ms. Similarly, the level as well as the temporal and spatial distributions of the diffuse reverberant energy were also varied broadly among the sound fields. None of the sound fields in the test sounded unrealistic.

In certain sound fields the diffuse reverberant energy started immediately after the direct sound, whereas for others there was a delay ranging from 20 to 120 ms before the onset of the reverberant energy. The reverberation time was also allowed to vary between sound fields. In this way possible biases that could result from having a fixed temporal distribution were eliminated. Subjects rated a total of 12 sound fields relative to a reference sound field, which had the least amount of LEV.

An ANOVA of the results was performed, and the critical difference for this experiment was found to be 4.62, thus indicating that there was a high degree of correlation between the subjects' scores. The results are plotted in Fig. 12, with the data points arranged in order of increasing LEV. It can be seen that the sound fields were distributed uniformly in terms of the perception of LEV and that subjects were able to discriminate many levels of LEV. The results of this experiment were used in Section 4 to evaluate objective measures of LEV.

3 ACOUSTIC MEASUREMENTS

In order to derive an objective measure that can predict LEV scores accurately, acoustic measures of the sound fields of the experiments were collected using a software-based measurement system developed by the authors (see Fig. 13). CRC-MARS (multichannel audio research system) can be used to measure impulse responses using either a maximum-length-sequence or a swept-sine-wave approach.

Pairs of impulse responses were measured for each sound field with an omnidirectional and a figure-of-eight microphone. The microphones were placed at the listener's position with the null of the figure-of-eight microphone directed toward the center (front) loudspeaker. The combination of impulse responses measured with an omnidirectional and a figure-of-eight microphone allows certain spatial aspects of a sound field to be investigated. Impulse responses were also measured for each test condition using a Neumann KU81i dummy head.

A swept sine wave was used as the input to the multichannel sound-field generator. The outputs of the sound-field generator were then fed to the five corresponding loudspeakers. As such, the measured impulse responses include the effects of the multichannel sound-field generator, the loudspeakers, and the listening room. Therefore the impulse responses were a true representation of what the subjects heard during the formal blind test.
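CRC-MARS itself is not described in detail here; the following is a minimal sketch of a generic exponential swept-sine measurement with inverse-filter deconvolution, of the general kind the text refers to. The sweep length, band edges, and recorded signal are assumed inputs, not the authors' settings.

import numpy as np

def exp_sweep(fs, duration, f1=20.0, f2=20000.0):
    # Exponential (logarithmic) sine sweep and its amplitude-compensated inverse.
    t = np.arange(int(duration * fs)) / fs
    rate = np.log(f2 / f1)
    sweep = np.sin(2 * np.pi * f1 * duration / rate * (np.exp(t * rate / duration) - 1.0))
    inverse = sweep[::-1] * np.exp(-t * rate / duration)  # time-reversed, +6 dB/oct tilt
    return sweep, inverse

def impulse_response(recorded, inverse):
    # Convolving the recorded sweep response with the inverse sweep yields the IR.
    return np.convolve(recorded, inverse, mode="full")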

4 OBJECTIVE MEASURES OF LEV

The results of several subjective tests, both in the present study and in previous works, indicate that LEV is


Fig. 11. Symbolic impulse response depicting sound field structure, experiment 4. C—center; L—left; R—right; LSur—left surround; vertical lines—discrete reflections.
Fig. 12. LEV scores for sound fields, experiment 4.



related to the relative level and angular distribution of the late-arriving sound. As mentioned earlier, Bradley and Soulodre proposed an objective measure for LEV (LG80∞) that reflects this [5].

LG80∞ is an objective measure of the level of the late sound arriving from lateral angles. The pickup pattern of the figure-of-eight microphone used to measure LG80∞ emphasizes sounds arriving from more lateral angles. LG80∞ uses 80 ms as the delineation point between early and late energy.

In the context of a multichannel surround system the denominator of Eq. (1) is meaningless since there is no absolute reference level for the source signal. This is due to the fact that the overall playback level can be changed easily by adjusting the volume control. Conversely, in a concert-hall context the term in the denominator acts as a necessary reference. Therefore Eq. (1) can be altered to make it more suitable for predicting LEV in a multichannel surround sound system by setting the denominator to 1. Thus we have

LG_{80}^{\infty} = 10 \log \left[ \int_{0.08}^{\infty} p_F^2(t)\, dt \right] \ \mathrm{dB} \qquad (3)

as a possible predictor of LEV.

As a first step in evaluating objective measures of LEV, the values of LG80∞ for the sound fields of experiments 1, 2, and 4 were determined from their impulse responses. The resulting LG80∞ values were then correlated against their corresponding LEV scores from the subjective experiments.
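A minimal sketch of Eq. (3) as it might be evaluated from a measured figure-of-eight impulse response; calibration and octave-band filtering are omitted, and the array and sampling-rate inputs are assumptions.

import numpy as np

def lg_late(ir_fig8, fs, t_limit=0.080):
    # Eq. (3): 10*log10 of the late energy in the figure-of-eight impulse
    # response, integrated from t_limit (80 ms for LG80) to the end.
    n = int(t_limit * fs)
    return 10.0 * np.log10(np.sum(ir_fig8[n:] ** 2))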

While Bradley and Soulodre found that LG80∞ averaged over the 125–1000-Hz octave bands gave the highest correlation for their experiments, it is useful to investigate other possible combinations of frequency bands. Table 2 shows the correlations between several multiband versions of LG80∞ and the LEV scores from these three experiments.

It can be seen that the correlations are fairly independent of how the various octave bands are grouped. However, it can also be seen that the grouping that gives the highest correlation is different for the different experiments. More specifically, for the first two experiments the highest correlation is obtained when LG80∞ is averaged across either the octave bands from 63 to 1000 Hz or the bands from 125 to 1000 Hz. This is in good agreement with the findings of Bradley and Soulodre. Conversely,


Table 2. Correlations between LEV and various octave-band averages of LG80∞.

Octave Bands (Hz)   Exp 1   Exp 2   Exp 4   Average
63–500              0.973   0.930   0.892   0.932
63–1000             0.973   0.931   0.908   0.938
125–1000            0.973   0.931   0.906   0.937
250–1000            0.968   0.922   0.916   0.935
500–1000            0.971   0.928   0.919   0.939
125–2000            0.969   0.925   0.909   0.934
63–8000             0.969   0.924   0.913   0.935

Fig. 13. Screen shot of CRC-MARS used to measure impulse responses of sound fields.


for the fourth experiment the highest correlation is obtained when only the 500- and 1000-Hz octave bands are used to calculate LG80∞. This result suggests that LG80∞ may not be the optimal objective measure of listener envelopment.
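A sketch of how such band groupings could be evaluated, reusing lg_late from the earlier sketch; the Butterworth octave-band filter, the simple mean across bands, and the data containers in the usage comment are illustrative assumptions.

import numpy as np
from scipy.signal import butter, sosfiltfilt
from scipy.stats import pearsonr

def octave_band(ir, fs, fc, order=3):
    # Octave-wide Butterworth band-pass centred on fc (illustrative filter choice).
    sos = butter(order, [fc / np.sqrt(2.0), fc * np.sqrt(2.0)],
                 btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, ir)

def band_averaged_lg(ir_fig8, fs, centres, t_limit=0.080):
    # Mean of the per-band late lateral levels over the chosen octave bands.
    return np.mean([lg_late(octave_band(ir_fig8, fs, fc), fs, t_limit) for fc in centres])

# Hypothetical usage: correlate a 125-1000-Hz average with subjective LEV scores.
# lg_vals = [band_averaged_lg(ir, fs, [125, 250, 500, 1000]) for ir in fig8_irs]
# r, _ = pearsonr(lg_vals, lev_scores)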

4.1 Temporal Aspects of LEV
The results of experiment 3 suggest that an 80-ms transition point between the early and late energy may not be the most suitable for predicting LEV. Experiment 4 was specifically designed to examine this question and to eliminate possible biases that may have existed in previous subjective tests.

To examine the effect of the early–late transition point, the impulse responses from experiments 1, 2, and 4 were processed to calculate LGx∞, where

LG_{x}^{\infty} = 10 \log \left[ \int_{x}^{\infty} p_F^2(t)\, dt \right] \ \mathrm{dB} \qquad (4)

where x = 5–200 ms. LGx∞ was calculated at 5-ms intervals, in octave bands from 63 to 8000 Hz.

Fig. 14 shows the correlation between LGx∞ and the mean LEV scores from experiment 4. The values of LGx∞ were averaged over the octave bands from 125 to 1000 Hz since this was found to give the highest correlation in the first two experiments.
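A sketch of the integration-limit sweep behind Fig. 14, reusing band_averaged_lg from the previous sketch; the 5-ms grid follows the text, while the lists of impulse responses and LEV scores are assumed inputs.

import numpy as np
from scipy.stats import pearsonr

def correlation_vs_limit(fig8_irs, lev_scores, fs, centres, limits_ms):
    # For each candidate lower integration limit x, compute the band-averaged LGx
    # for every sound field and correlate it with the subjective LEV scores.
    corrs = []
    for x_ms in limits_ms:
        lg_x = [band_averaged_lg(ir, fs, centres, t_limit=x_ms / 1000.0) for ir in fig8_irs]
        corrs.append(pearsonr(lg_x, lev_scores)[0])
    return np.array(corrs)

# limits_ms = range(5, 205, 5)   # the 5-200-ms grid described in the text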

It can be seen from Fig. 14 that similar correlations are obtained for lower integration limits between about 50 and 110 ms. The highest correlation with the LEV scores is not obtained using LG80∞. Rather, an integration limit of 105 ms gives the highest correlation (r = 0.919). Beyond about 115 ms the correlation drops off steadily with further increases in the lower integration limit. Therefore LG105∞ is a better predictor of LEV scores for this experiment, where

LG_{105}^{\infty} = 10 \log \left[ \int_{0.105}^{\infty} p_F^2(t)\, dt \right] \ \mathrm{dB} \qquad (5)

While LG105∞ was found to give the highest correlation with the subjective results of the fourth experiment, it is important to investigate how well it predicts the results of the first two experiments. Therefore the results from experiments 1 and 2 were correlated against the corresponding values of LGx∞ (x = 5–200 ms) as defined in Eq. (4).

Again, the average values over the 125–1000-Hz octave bands were used. The results are shown in Table 3. It can be seen that LG105∞ gave a higher average correlation than LG80∞. In fact LG105∞ was found to give the highest average correlation of any integration limit between 5 and 200 ms, thus supporting the conclusion that LG105∞ is a more suitable objective measure of LEV than LG80∞. While LG105∞ gave a higher average correlation, it did not provide the highest correlation for each experiment. It can be seen in Table 3 that for experiment 1 LG105∞ gave a lower correlation than LG80∞. Moreover, for each of the three experiments a different value of x [the lower integration limit in Eq. (4)] gave the highest correlation. This is shown in Table 4. The results of Tables 3 and 4 suggest that a single integration limit for calculating LGx∞ may not be optimal. One refinement is to use a frequency-dependent integration limit when calculating LGx∞.


Table 3. Correlation between LEV scores and LG80∞ and LG105∞.

          Exp 1   Exp 2   Exp 4   Average
LG80∞     0.973   0.931   0.906   0.937
LG105∞    0.966   0.936   0.919   0.940

Fig. 14. Correlation to mean LEV scores of experiment 4 versus lower integration limit used for calculating late lateral energy.



4.2 Perceptually Motivated Integration Limits
The well-known precedence effect, or Haas effect, is the phenomenon by which reflected sounds in a room are not heard as individual echoes; rather, they are integrated with the direct sound so that only a single unified sound source is perceived. This unified source appears to come from the direction of the direct sound, which arrives first. Thus in general early lateral reflections in a room simply tend to make the location of the source slightly ambiguous and cause the apparent width of the source to broaden. This is the ASW component of spatial impression.

Conversely, later arriving lateral reflections are not integrated with the direct sound, but are both temporally and spatially separated from it. Thus to the listener, the later arriving lateral reflections appear to arrive from all directions, thereby creating the sense of the listener being enveloped by the sound (LEV). It is therefore reasonable to assume that the two components of spaciousness (ASW and LEV) are related to the temporal integration properties of our hearing system.

Jesteadt et al. as well as Moore and Glasberg have shown that forward masking is frequency dependent [13], [14]. In particular there is more forward masking at lower frequencies than at higher frequencies. This implies that the integration time of the ear is longer at lower frequencies than at higher frequencies. These results suggest that objective measures of LEV (and ASW) should take into account the frequency-dependent integration times.

Based on the findings of Jesteadt et al., Soulodre developed an analytic expression to predict forward masking across frequency [15]. Fig. 15 plots the predicted forward masking versus frequency for a 70-dB SPL masker. It can be seen that the amount of forward masking decreases steadily from 100 to about 1000 Hz. Above about 1000 Hz the amount of forward masking remains constant. The curve in Fig. 15 suggests that an optimal objective measure of LEV should have higher integration limits at lower frequencies and should drop to some constant value for frequencies above 1000 Hz.

While the plot of Fig. 15 provides an indication of how the integration limit should vary with frequency, it does not provide the actual values. To determine the optimal integration limits as a function of frequency, a comprehensive search was conducted. Using the impulse responses from the three experiments, approximately 100 million combinations of LGx∞ (x = 5–200 ms) were computed across the frequency bands from 63 to 8000 Hz, and the resulting correlations to the LEV scores were calculated.
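A schematic of such a search over per-band integration limits; this brute-force loop is illustrative only, far smaller than the search described, and it scores combinations purely by correlation, whereas the text notes that robustness and perceptual meaning were also weighed.

import itertools
import numpy as np
from scipy.stats import pearsonr

def search_band_limits(band_lg, lev_scores, candidate_ms):
    # band_lg[(fc, x_ms)] is an array of LG values (one per sound field) for that
    # octave band and lower limit. Try every per-band combination of limits and
    # keep the one whose across-band average correlates best with the LEV scores.
    bands = sorted({fc for fc, _ in band_lg})
    best_limits, best_r = None, -np.inf
    for combo in itertools.product(candidate_ms, repeat=len(bands)):
        avg = np.mean([band_lg[(fc, x)] for fc, x in zip(bands, combo)], axis=0)
        r, _ = pearsonr(avg, lev_scores)
        if r > best_r:
            best_limits, best_r = dict(zip(bands, combo)), r
    return best_limits, best_r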

Given the large amount of data resulting from the search, we were faced with the problem of determining a suitable criterion for selecting the optimum values of x. It was decided that it was not sufficient to simply choose the combination that yields the highest correlation. Rather, the measure should also be robust as well as perceptually meaningful. The integration limits that were ultimately selected are given in Table 5.

No value is given for the 63-Hz octave band since the results indicated that the correlation was always higher


Fig. 15. Forward masking versus frequency for 70-dB SPL masker.

Table 4. Integration limit x giving the highest correlation with LEV scores, and corresponding correlation r.

     Exp 1    Exp 2    Exp 4
x    70 ms    195 ms   105 ms
r    0.976    0.941    0.919


when this frequency band was excluded from the measure. We now define the term LGperc to refer to this perceptually motivated measure of LGx∞ based on the values of Table 5.

Table 6 shows the correlation between LGperc and the LEV scores for the experiments. A comparison of Tables 3 and 6 indicates that the average correlation is only marginally better when using LGperc. LGperc gives a significant increase in correlation for experiment 4. This is reasonable since experiment 4 was designed specifically to examine the effect of the integration limits. Conversely, for experiment 2 the correlation drops significantly when using LGperc. To investigate this further, the LEV scores for experiment 2 are plotted versus LGperc in Fig. 16.
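A sketch of LGperc using the Table 5 limits and the octave_band/lg_late helpers sketched earlier; the simple mean across the listed bands is an assumed averaging rule, not necessarily the authors' exact procedure.

import numpy as np

# Table 5 limits in ms; the 63-Hz band is excluded, as noted in the text.
PERC_LIMITS_MS = {125: 160, 250: 160, 500: 160, 1000: 75, 2000: 55, 4000: 45, 8000: 45}

def lg_perc(ir_fig8, fs):
    # Per-band late lateral level with the perceptually motivated limits,
    # then a simple mean across the bands (assumed averaging rule).
    return np.mean([lg_late(octave_band(ir_fig8, fs, fc), fs, x_ms / 1000.0)
                    for fc, x_ms in PERC_LIMITS_MS.items()])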

Recall that in experiment 2 there were nine sound fields reproduced at three different playback levels. The symbols used for the data points in Fig. 16 indicate the playback levels (−3 dB, 0 dB, +3 dB). It can be seen that the data tend to form three separate groups according to the playback level. This suggests that LGperc is not accounting for the effects of level sufficiently well.

4.3 Components of LEV
The results of the subjective tests have demonstrated that listener envelopment is related to the relative level and angular distribution of the late-arriving sound. LGperc attempts to account for both of these factors at the same time. However, the results of Fig. 16 indicate that it may be better to consider the components of LEV separately.

Recall the generalized equation for LGx∞,

LG_{x}^{\infty} = 10 \log \left[ \frac{\int_{x}^{\infty} p_F^2(t)\, dt}{\int_{0}^{\infty} p_A^2(t)\, dt} \right] \ \mathrm{dB} \qquad (6)

A quantity that is commonly used to measure spatial aspects in concert-hall acoustics is the lateral energy fraction, defined as

\mathrm{LF} = \frac{\int_{0.005}^{0.08} p_F^2(t)\, dt}{\int_{0}^{0.08} p_O^2(t)\, dt} \qquad (7)

where pF(t) is the instantaneous sound pressure as measured using a figure-of-eight microphone and pO(t) is measured using an omnidirectional microphone [16]. By changing the integration limits we obtain a measure of the spatial aspects of the late energy,

\mathrm{LF}_{x}^{\infty} = \frac{\int_{x}^{\infty} p_F^2(t)\, dt}{\int_{x}^{\infty} p_O^2(t)\, dt} \qquad (8)
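A minimal sketch of Eq. (8), assuming time-aligned omnidirectional and figure-of-eight impulse responses sampled at the same rate.

import numpy as np

def lf_late(ir_fig8, ir_omni, fs, t_limit):
    # Eq. (8): late figure-of-eight energy divided by late omnidirectional energy,
    # both integrated from t_limit onward.
    n = int(t_limit * fs)
    return np.sum(ir_fig8[n:] ** 2) / np.sum(ir_omni[n:] ** 2)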

Substituting Eq. (8) into Eq. (6) gives

LG_{x}^{\infty} = 10 \log \left[ \mathrm{LF}_{x}^{\infty} \cdot \frac{\int_{x}^{\infty} p_O^2(t)\, dt}{\int_{0}^{\infty} p_A^2(t)\, dt} \right] \ \mathrm{dB} \qquad (9)


Fig. 16. LEV versus LGperc, experiment 2.

[Fig. 16 plots LEV (0–100) versus LGperc (−2 to 14 dB); symbols distinguish the −3-dB, 0-dB, and +3-dB playback levels.]

Table 6. Correlation between LEV scores and LGperc.

          Exp 1   Exp 2   Exp 4   Average
LGperc    0.963   0.901   0.960   0.941

Table 5. Perceptually motivated integration limits.

Octave Band (Hz)   Integration Limit x (ms)
63                 —
125                160
250                160
500                160
1000               75
2000               55
4000               45
8000               45


and

LG_{x}^{\infty} = 10 \log \left[ \frac{\int_{x}^{\infty} p_O^2(t)\, dt}{\int_{0}^{\infty} p_A^2(t)\, dt} \right] + 10 \log \left[ \mathrm{LF}_{x}^{\infty} \right] \ \mathrm{dB} \qquad (10)

The first term in Eq. (10) is recognized as a measure of the relative sound level of the late energy, Gx∞ (referred to as strength in concert-hall acoustics) [5],

G_{x}^{\infty} = 10 \log \left[ \frac{\int_{x}^{\infty} p_O^2(t)\, dt}{\int_{0}^{\infty} p_A^2(t)\, dt} \right] \ \mathrm{dB} \qquad (11)

Substituting Eq. (11) into Eq. (10) yields

LG_{x}^{\infty} = G_{x}^{\infty} + 10 \log \left[ \mathrm{LF}_{x}^{\infty} \right] \ \mathrm{dB} \qquad (12)

We define the right-hand term in Eq. (12) as Sx∞,

S_{x}^{\infty} \equiv 10 \log \left[ \mathrm{LF}_{x}^{\infty} \right] \qquad (13)

to give a new representation of LGx∞,

LG_{x}^{\infty} = G_{x}^{\infty} + S_{x}^{\infty} \ \mathrm{dB} \qquad (14)

In Eq. (14) the first term accounts for the level component of LGx∞ and the second term accounts for the spatial distribution of the sound energy,

LGx∞ = level component + spatial component.

Eq. (14) enables us to consider the effects of the two components of LEV separately. Specifically, we can apply separate weightings to the two components to reflect their relative influence on LEV,

\mathrm{LEV} \propto G_{\mathrm{perc}} + \alpha\, S_{\mathrm{perc}} \qquad (15)

where we now use the perceptually motivated integration limits defined in Table 5.

Using the data from experiment 2, it was found that a value of α = 0.5 in Eq. (15) gave the highest correlation (r = 0.983). Thus we propose a new objective measure of LEV that accounts for the relative influence of level and spatial distribution,

GS_{\mathrm{perc}} = G_{\mathrm{perc}} + 0.5\, S_{\mathrm{perc}} \ \mathrm{dB} \qquad (16)
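A sketch assembling GSperc from the pieces above (octave_band, lf_late, PERC_LIMITS_MS); treating the reference energy of the level term as unity follows the Eq. (3) convention for surround playback and, like the band averaging, is an assumption rather than the paper's exact procedure.

import numpy as np

def g_late(ir_omni, fs, t_limit):
    # Late level from the omnidirectional response; the Eq. (11) denominator is
    # taken as unity here, following the Eq. (3) convention.
    n = int(t_limit * fs)
    return 10.0 * np.log10(np.sum(ir_omni[n:] ** 2))

def gs_perc(ir_fig8, ir_omni, fs, limits_ms=PERC_LIMITS_MS, alpha=0.5):
    # GSperc = Gperc + 0.5 * Sperc, Eq. (16), with the Table 5 band-dependent limits.
    g_bands, s_bands = [], []
    for fc, x_ms in limits_ms.items():
        f8 = octave_band(ir_fig8, fs, fc)
        om = octave_band(ir_omni, fs, fc)
        t_lim = x_ms / 1000.0
        g_bands.append(g_late(om, fs, t_lim))
        s_bands.append(10.0 * np.log10(lf_late(f8, om, fs, t_lim)))  # Sx per Eq. (13)
    return np.mean(g_bands) + alpha * np.mean(s_bands)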

Fig. 17 plots the measured values of GSperc for experiment 2 versus the corresponding LEV scores. It can be seen that this new measure is significantly better than previous measures.

The data from experiments 1 and 4 were analyzed and it was found that a value of α = 0.5 also gave the highest correlation for those experiments. The results are given in Table 7 and plotted in Figs. 18 and 19, respectively. The results show that GSperc significantly outperforms all other objective measures of LEV for all three subjective experiments. The correlations resulting from GSperc are the highest obtained for each individual experiment as


Table 7. Correlation between LEV scores and GSperc.

          Exp 1   Exp 2   Exp 4   Average
GSperc    0.980   0.983   0.975   0.979

Fig. 17. LEV versus GSperc, experiment 2.



well as on average.

GSperc is a new objective measure of LEV that accounts for the frequency-dependent integration properties of the human auditory system, and also for the relative influence of the level and spatial distribution components of LEV. It has been shown to be robust in the sense that it provides a very good estimate of subjective LEV scores across a broad range of experimental conditions.

Since GSperc consists of a level component and a spatial distribution component, it should be possible to use other measures to represent the spatial distribution component. For example, the interaural cross correlation (IACC), measured using a dummy head, is commonly used to characterize the spatial distribution of a sound field. It seems reasonable that the spatial component Sperc of GSperc could be represented by a measure based on IACC. This is the subject of ongoing research.
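The paper does not specify an IACC computation; the sketch below uses the common definition (normalized interaural cross-correlation maximized over lags of about ±1 ms), so the lag range and the use of the full binaural impulse responses are assumptions.

import numpy as np

def iacc(ir_left, ir_right, fs, max_lag_ms=1.0):
    # Normalized interaural cross-correlation, maximized over lags within +/- 1 ms.
    max_lag = int(max_lag_ms * 1e-3 * fs)
    norm = np.sqrt(np.sum(ir_left ** 2) * np.sum(ir_right ** 2))
    xcorr = np.correlate(ir_left, ir_right, mode="full")
    zero = len(ir_right) - 1
    return np.max(np.abs(xcorr[zero - max_lag:zero + max_lag + 1])) / norm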

5 CONCLUSIONS

Formal double-blind subjective tests were conducted to investigate the acoustical parameters that influence the perception of LEV. Subjects rated the amount of LEV in a series of sound fields. The tests employed a modified version of the MUSHRA methodology and were conducted in a typical listening environment.

The first two subjective experiments showed that the perception of listener envelopment is influenced primarily by the overall playback level as well as the level and angular distribution of the late arriving sound. These parameters are not entirely independent in their effect on LEV. The subjective data showed a strong interaction between the level and the angular distribution of the late arriving sound.

In the third experiment subjects rated the degree of LEV resulting from a gated burst of energy that varied in onset time and amplitude. The perception of LEV was


Fig. 19. LEV versus GSperc, experiment 4.

Fig. 18. LEV versus GSperc, experiment 1.



THE AUTHORS


related to both the onset time and the amplitude of the gated burst of energy. Later onset times and higher amplitudes provided an increase in the perception of LEV.

In the fourth experiment subjects were exposed to a wide range of sound fields in which the temporal and spatial aspects of the early and late energy were broadly varied.

The relative late lateral sound level LG80∞ was evaluated as a predictor of the LEV scores in these experiments. It was found that LG80∞ was effective in some cases, but not in others. Alternative limits of integration were examined, and it was found that the optimum value varied across the subjective experiments. This suggested that the limits of integration should be frequency dependent, and a perceptually motivated set of integration limits was derived based on the forward-masking properties of the human auditory system.

A new representation of the late lateral sound level was derived to separate it into a level component and a spatial distribution component. This, in turn, allowed the two components to be weighted differently to reflect their relative influence on the perception of LEV. An optimum weighting for the two components was found, and a new objective measure GSperc was proposed. GSperc was found to be the best predictor of LEV for all of the subjective experiments, and is thus proposed as the preferred objective measure of LEV.

6 REFERENCES

[1] T. Holman, 5.1 Surround Sound Up and Running (Focal Press, an imprint of Butterworth-Heinemann, Boston, MA, 2000).
[2] G. A. Soulodre and J. S. Bradley, "The Influence of Later Arriving Energy on Concert Hall Spatial Impression," in Proc. Sabine Centennial Symp. (Cambridge, MA) (Acoustical Society of America, Woodbury, NY, 1994), pp. 101–104.
[3] M. Morimoto and Z. Maekawa, "Auditory Spaciousness and Envelopment," in Proc. 13th Int. Congr. on Acoustics (Belgrade, 1989), vol. 2, pp. 215–218.
[4] J. S. Bradley and G. A. Soulodre, "Listener Envelopment: An Essential Part of Good Concert Hall Acoustics," J. Acoust. Soc. Am., vol. 99, p. 22 (1996 Jan.).
[5] J. S. Bradley and G. A. Soulodre, "Objective Measures of Listener Envelopment," J. Acoust. Soc. Am., vol. 98, pt. 1, pp. 2590–2597 (1995 Nov.).
[6] G. A. Soulodre and J. S. Bradley, "An Objective Measure of the Listener Envelopment Component of Spatial Impression," in Proc. 15th Int. Congr. on Acoustics (Trondheim, Norway, 1995), vol. 2, pp. 649–652.
[7] G. A. Soulodre, M. C. Lavoie, and S. G. Norcross, "Investigation of Listener Envelopment in Multichannel Surround Systems," presented at the 113th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 50, pp. 964–965 (2002 Nov.), paper 5676.
[8] G. A. Soulodre, M. C. Lavoie, and S. G. Norcross, "Temporal Aspects of Listener Envelopment in Multichannel Surround Systems," presented at the 114th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 51, pp. 429–430 (2003 May), paper 5803.
[9] ITU-R BS.1116, "Methods for the Subjective Assessment of Small Impairments in Audio Systems Including Multichannel Sound Systems," International Telecommunications Union, Geneva, Switzerland.
[10] ITU-R BS.775-1, "Multi-channel Stereophonic Sound System with or without Accompanying Picture," International Telecommunications Union, Geneva, Switzerland.
[11] ITU-R BS.1534, "Methods for the Subjective Assessment of Intermediate Audio Quality," International Telecommunications Union, Geneva, Switzerland.
[12] G. A. Soulodre and M. C. Lavoie, "Subjective Evaluation of Large and Small Impairments in Audio Codecs," in Proc. AES 17th Int. Conf. (Florence, Italy, 1999), pp. 329–336.
[13] W. Jesteadt, S. P. Bacon, and J. R. Lehman, "Forward Masking as a Function of Frequency, Masker Level, and Signal Delay," J. Acoust. Soc. Am., vol. 71, pp. 950–962 (1982 Apr.).
[14] B. C. J. Moore and B. R. Glasberg, "Growth of Forward Masking for Sinusoidal and Noise Maskers as a Function of Signal Delay; Implications for Suppression in Noise," J. Acoust. Soc. Am., vol. 73, pp. 1249–1259 (1983 Apr.).
[15] G. A. Soulodre, "Adaptive Methods for Removing Camera Noise from Film Soundtracks," Ph.D. thesis, McGill University, Montreal, Quebec, Canada (1998).
[16] M. Barron and A. H. Marshall, "Spatial Impression Due to Early Lateral Reflections in Concert Halls: The Derivation of a Physical Measure," J. Sound Vibr., vol. 77, pp. 475–494 (1981).


G. A. Soulodre M. C. Lavoie S. G. Norcross


Gilbert A. Soulodre received B.Sc. and M.Sc. degrees in electrical engineering from the University of Manitoba in Winnipeg, Canada. In 1987 he joined the Audio and Acoustics department of Bell-Northern Research (now Nortel) as a member of the scientific staff. There he was involved in the development of digital audio systems for telecommunications. In 1990 he began research on the development of adaptive DSP algorithms for removing noise from audio signals for his Ph.D. degree. From 1991 to 1994 he was an assistant professor in the Graduate Program in Sound Recording at McGill University, Montreal, Quebec, Canada.

Dr. Soulodre is currently a researcher with the Advanced Audio Systems Group at the Communications Research Centre, Ottawa, Canada. There his main focus is in the areas of audio processing and sound perception. He participates in the ITU-R audio standards committees and was heavily involved in the development of the BS.1116 and MUSHRA standards for subjective testing. He was also a research adjunct professor of psychology at Carleton University, where he examined the subjective components of sound fields in concert halls and multichannel surround systems.

In 1996, Dr. Soulodre was recognized for his work on spatial impression and listener envelopment by the Acoustical Society of America and the American Institute of Physics. He is a fellow of the AES.

Michel C. Lavoie received a bachelor's degree in electrical engineering from the University of Manitoba, Canada, in 1984. For the next two years he worked for the Canadian Broadcasting Corporation, where he occupied various positions in radio and television production. From 1986 until 1995 he worked as an independent contractor in live and recorded audio production. In 1995 he joined the Signal Processing and Psychoacoustics Group at the Communications Research Centre, Ottawa, Canada, where he conducts research in subjective testing.

Scott G. Norcross received a B.Sc. degree in physics from McGill University, Montreal, Quebec, Canada, in 1993. He joined the Audio Research Group at the University of Waterloo, Canada, and received an M.Sc. degree in physics in 1996, under the supervision of Professors John Vanderkooy and Stanley Lipshitz. His thesis was on "The Effects of Nonlinearity on Impulse Response Measurements."

Mr. Norcross then spent the next year at the American University in Washington, DC, where he taught acoustics, electronics, and audio technology for the Audio Technology Program in the Department of Physics. In 1997 he joined the acoustics group at the National Research Council (NRC) of Canada in Ottawa, under the supervision of Dr. John S. Bradley. There he worked on acoustical measurement systems for concert halls, rooms, airplanes, and offices, and designed and conducted subjective tests on speech intelligibility and open office acoustics. He is now a research engineer in the Advanced Audio Systems Group at the Communications Research Centre in Ottawa, working under the supervision of Dr. Gilbert Soulodre, doing research on DSP techniques and subjective aspects of multichannel audio and inverse filtering for room and loudspeaker equalization. The latter is the research area for his Ph.D. degree in electrical engineering at the University of Ottawa, which he is currently working on under the supervision of Prof. Martin Bouchard.


LETTERS TO THE EDITOR


MORE COMMENTS ON PRESIDENT'S MESSAGE AND COMMENTS*

Dr. Immink writes1: “Over the years the ‘old regime’ generation of analog audio equipment has been made obsolete, and the ‘new regime,’ digital, took sway.”

Now, I am sure that most people know exactly what that means, and it reflects current widely expressed opinions, but it gives an impression that has unfortunate, and potentially serious, consequences. In professional sound reinforcement and distribution, a very great deal of analog equipment is already installed, with a long life expectancy, and a very great deal is still being manufactured and installed. These installations may or may not also include digital equipment, of course.

The consequences that I refer to here include:

• A strong reluctance to provide education and training in analog techniques
• An aversion by students to take up one of the limited number of such courses offered
• A reluctance of standards-making bodies, notably IEC TC100 and its supporters in the industry, to devote resources to standards for analog audio equipment.

This last point is particularly significant because of developments in Europe. The fire-alarm industry is keen to establish performance standards (multipart EN 54) for voice-alarm systems, which are, of course, sound distribution systems. This work is not being done, as would be expected, in CENELEC, but in CEN, because that is the standards-making body for all other European fire engineering and prevention industry standards. There is parallel international work under way in ISO.

In order to guide the fire-alarm experts toward realistic and economically achievable performance requirements, we in the UK audio industry see the great advantages of appealing, for general information and proven methods of measurement, to the long-established multipart international standard for sound systems, IEC 60268. It is clearly unacceptable to impose performance requirements that are not based on well-defined and economically implementable methods of measurement.

Authoritative implications that the techniques referred to, and thus the standards that deal with them, are obsolete are not helpful in this process.

It should be noted that the standard EN 54 is very likely to be cited under the European Construction Products Directive, which requires third-party certification for equipment. Companies intending to supply European markets in the future may well need to pay attention to this.

JOHN WOODGATE

Rayleigh, Essex SS6 8RG, UK


* Manuscript received 2003 April 21.
1 J. Audio Eng. Soc. (Letters to the Editor), vol. 51, pp. 251–252 (2003 Apr.).


Call for Comment on DRAFT AES5-20xx, DRAFT REVISED AES Recommended Practice for Professional Digital Audio—Preferred sampling frequencies for applications employing pulse-code modulation has been published
This document was developed by a writing group of the Audio Engineering Society Standards Committee (AESSC) and has been prepared for comment according to AES policies and procedures. It has been brought to the attention of International Electrotechnical Commission Technical Committee 100. Existing international standards relating to the subject of this document were used and referenced throughout its development.

To view this document go to http://www.aes.org/standards/b_comments/cfc-draft-aes5-20xx.

Address comments by mail to the AESSC Secretariat, Audio Engineering Society, 60 E. 42nd St., New York, NY 10165, US; or by e-mail to the secretariat at [email protected]. E-mail is preferred. Only comments so addressed will be considered. Comments that suggest changes must include proposed wording. Comments must be restricted to this document only. Send comments to other documents separately.

This document will be approved by the AES after any adverse comment received within three months of the publication of this call on www.aes.org/standards, 2003-07-02, has been resolved. All comments will be published on the Web site.

Persons unable to obtain this document from the Web site may request a copy from the secretariat at: Audio Engineering Society Standards Committee, Draft Comments Dept., Audio Engineering Society, Inc., 60 E. 42nd St., New York, NY 10165, US.

Because this document is a draft and is subject to change, no portion of it shall be quoted in any publication without the written permission of the AES, and all published references to it must include a prominent warning that the draft will be changed and must not be used as a standard.

Call for Comment on DRAFT AES11-20xx, DRAFT REVISED AES Recommended Practice for Digital Audio Engineering—Synchronization of digital audio equipment in studio operations has been published
This document was developed by a writing group of the Audio Engineering Society Standards Committee (AESSC) and has been prepared for comment according to AES policies and procedures. It has been brought to the attention of International Electrotechnical Commission Technical Committee 100. Existing international standards relating to the subject of this document were used and referenced throughout its development.

To view this document go to http://www.aes.org/standards/b_comments/cfc-draft-aes11-20xx.

Address comments by mail to the AESSC Secretariat, Audio Engineering Society, 60 E. 42nd St., New York, NY 10165, US; or by e-mail to the secretariat at [email protected]. E-mail is preferred. Only comments so addressed will be considered. Comments that suggest changes must include proposed wording. Comments must be restricted to this document only. Send comments to other documents separately.

This document will be approved by the AES after any adverse comment received within three months of the publication of this call on www.aes.org/standards, 2003-07-10, has been resolved. All comments will be published on the Web site.

Persons unable to obtain this document from the Web site may request a copy from the secretariat at: Audio Engineering Society Standards Committee, Draft Comments Dept., Audio Engineering Society, Inc., 60 E. 42nd St., New York, NY 10165, US.

Because this document is a draft and is subject to change, no portion of it shall be quoted in any publication without the written permission of the AES, and all published references to it must include a prominent warning that the draft will be changed and must not be used as a standard.


AES STANDARDS COMMITTEE NEWS

Information regarding Standards Committee activities including meetings, structure, procedures, reports, and membership may be obtained via http://www.aes.org/standards/. For its published documents and reports, including this column, the AESSC is guided by International Electrotechnical Commission (IEC) style as described in the ISO-IEC Directives, Part 3. IEC style differs in some respects from the style of the AES as used elsewhere in this Journal. For current project schedules, see the project-status document on the Web site. AESSC document stages referenced are proposed task-group draft (PTD), proposed working-group draft (PWD), proposed call for comment (PCFC), and call for comment (CFC).


Report of the SC-03-06 Working Group on Digital Library and Archive Systems of the SC-03 Subcommittee on the Preservation and Restoration of Audio Recording meeting, held in conjunction with the AES 114th Convention in Amsterdam, The Netherlands, 2003-03-22

Chair T. Sheldon convened the meeting and welcomed all attendees.

The agenda and the report from the 2002-10 meeting in Los Angeles were approved as distributed.

Open projects

AES-X98 Review of Audio Metadata
The administrative metadata area is being addressed in two parts.

Part I: Review of Administrative Metadata for Audio. Audio Processing History Metadata Schema. D. Ackerman, principal author, presented the latest working draft from Task Group SC-03-06-A and reviewed the background that led to its current shape. It is not a framework for data. Rather, it documents what equipment was used, how it was set up, and other similar pertinent information that could be needed in the future. The schema is designed to be flexible and adaptable. Ackerman felt that this Processing History metadata should not be transmitted with the data.

R. Wright indicated concern about alignment of the draft with SMPTE approaches and documents. C. Chambers said that the opportunity exists to feed back into the SMPTE Dictionary any additional elements that are needed.

Part II: Core Audio. Ackerman serves as principal author for this document also. He proposes to change the approach of Core Audio to align it better with SMPTE. At some point, SMPTE will be consulted for comments to strengthen alignment of the Core Audio document with SMPTE metadata principles and approaches. The meeting agreed that the Core Audio document should be developed as a standard.

The Descriptive Metadata draft document was reviewed and deemed ready for review as a PWD. Following that stage it will be formatted by the Standards Secretariat with the aim of publication as a standard.

Liaisons

AES-X120 Liaison with International Association of Sound and Audiovisual Archives (IASA)
The current arrangements for the liaison relationship were reviewed. It was reported that within IASA the liaison has now been assigned to the Technical Committee. The liaison has been strengthened in the last year by joint memberships on both IASA and AES committees.

New projects
The possibility of reorganizing the work undertaken by SC-03-06 and SC-06-06 was noted and discussed briefly.

New business
There was no new business.

The next meeting is scheduled to be held in conjunction with the AES 115th Convention in New York, NY, in October 2003.

Report of the SC-04-03 Working Group on Loudspeaker Modeling and Measurement of the SC-04 Subcommittee on Acoustics meeting, held in conjunction with the AES 114th Convention in Amsterdam, The Netherlands, on 2003-03-23

Vice chair N. Harris convened the meeting.

The agenda and the report of the previous meeting in Los Angeles were accepted as written.

Open projects

AES-1id-R Review of AES-1id-1991 (r2003) AES information document—Plane-wave tubes: design and practice
M. Dodd enquired about optimum placement of the microphone in plane-wave tubes (PWT). J. Panzer suggested that placement near the wall avoids pressure nodes; a short technical discussion ensued.

D. Gunness and Dodd suggested that the PWT is now not commonly used for compression driver design—it is only good for low-frequency measurements. The main area of concern for this type of driver is its high-frequency performance. Gunness said that he tests drivers through a reference horn.

J. Woodgate reminded the committee that the Call for Comment (CFC) period for reaffirmation closed on 2003-03-14.

AES-5id-R Review of AES-5id-1997 (r2003) AES information document for room acoustics and sound-reinforcement systems—Loudspeaker modeling and measurement—Frequency and angular resolution for measuring, presenting and predicting loudspeaker polar data
No action was proposed or required.

AES2-R Revision of AES2-1984 (r2003) AES recommended practice—Specification of loudspeaker components used in professional audio and sound reinforcement
This project is awaiting contributions. Woodgate gave a verbal report from Task Group SC-04-03-A. He noted that, for example, graph scales and some words were inconsistent with the IEC 60268-5 specification.

D. Clark recommended one mandatory sheet and one optional sheet. The group recommended using a graph per IEC 60263 with a 25 dB/decade ratio for the scale; Woodgate is to clarify. The reference should point to the IEC document.

A number of questions need to be considered. "Point of rotation" for polar measurements should be defined. "AES Music program" should be clarified. Measurement of power rating should be compared with the "long term maximum power" specification of IEC 60268-5.


It was felt that the simulated program signal should be tailored to driver bandwidth. Bandwidth-limiting filter slopes need to be clarified: IEC 60268-5 uses 24 dB per octave. Sound pressure levels should be specified to a precision of 0.1 dB.

W. Klippel proposed using the DC resistance of the voice coil to derive figures for power compression. In discussion, those present were not in favor of abandoning "Xmax," a parameter describing maximum driver excursion. It should instead be clarified.

AES19-R Review of AES19-1992 (r1998) AES-ALMA standard test method for audio engineering—Measurement of the lowest resonance frequency of loudspeaker cones
It was noted that a Call for Comment for withdrawal of this standard was in progress. The document will subsequently be maintained by ALMA International.

Development projects

AES-X72 Acoustic Center of Loudspeakers
In discussion it was felt that the term "Acoustic Center" was unhelpful. Instead a number of alternative terms were explored: Point of Rotation, Point of Reference, Reference Point (IEC), Wavefront Shape Center, Apparent Source Location, Temporal Center, or Time Center.

Woodgate expressed concerns over the use of the term "Point of Reference" as he felt it could be confused with the IEC "Reference Point." Gunness felt that the two terms were equivalent. Woodgate pointed out that such a Reference Point did not "appear to have the properties of an acoustic center—it is purely mechanical."

It was noted that for purely practical purposes the Point of Rotation is typically the center of gravity of the loudspeaker. It usefully provides an unambiguous reference point.

It was noted that the Temporal Center is not a fixed point at any frequency. The center of wavefront is also frequency dependent. These are not good references, but the document should state why this is so, together with the reasons why we cannot define "acoustic center."

The literature suggests that knowledge of a driver's physical location and rotation axis should allow proper reconstruction of polar data. Woodgate suggested that a more clearly defined naming convention is needed.

AES-X103 Large Signal Parameters of Low-Frequency Loudspeaker Drivers
The meeting felt that the "Xmax" name should remain because people would continue to use the term in any case. However, it needs a proper definition in AES2. Klippel gave a verbal report on his progress on this project.

Various approaches to the measurement of Xmax were discussed. Physical displacement could be calculated using the Keele method or measured using a laser displacement meter.

Klippel suggested a clarification for Xmax: it could be specified as 10% THD or M% THD, or M% stiffness; in other words, it could be specified in terms of any nonlinear model parameter. In practice there were not many measurement-system options.

For two-tone methods some defined symbol or nomenclature is needed. For example, "[email protected]" indicating that the upper frequency, F2, was 8.5 times the frequency of F1.

The IEC method could be used for measurement of percentage Total Harmonic Distortion (THD) or Intermodulation Distortion (IMD), whichever is higher. There was some discussion of the IEC method where F1 was at resonance and F2 was set to give a level 12 dB below the F1 level. An alternative suggestion is that the maximum SPL measurement be made using a two-tone test where F1 is set to give a level 3 dB below that at resonance.

It was felt that a standard format is needed for use with these nonlinear model parameter approaches. Gunness observed that most designers were using the physical measurement approach of M. Gander.

Klippel will prepare a draft document for the group.

AES-X129 Loudspeaker Distortion Perception and Measurement
R. Heinecke-Schmitt presented a synopsis of her research work from 1996 into the audibility of nonlinear distortions as typically generated by loudspeakers. Distortion was correlated with listener perception using different source materials, with the distortion generated electrically using Klippel's mirror-filter method. Testing was carried out to determine perception thresholds of distortion in different source materials. The importance of signal type was emphasized. Two AES preprints exist, 4016 and 4131, and this re-presentation of her 1996 paper was felt to be useful.

R. Cabot requested that her presentation be posted to the group document site. Heinecke-Schmitt agreed that, with some modification, it would be possible. Heinecke-Schmitt is also in the process of completing her doctorate on an extension of the same work and indicated that upon completion, and with an invitation, she would present the results to the Working Group.

J. Stewart asked whether there was a good measurement—even for a specific class of signal—that would correlate with listener perception.

The research indicated which model parameter was most critical but not how one would measure the objective effects of that parameter.

S. Temme had made contact with representatives from Opticom, who could not attend the meeting but agreed to run PEAQ evaluations on some recordings of loudspeakers.

Should a standard specify the parameters or the distortions? Following a brief discussion of the values that might appear on a specification sheet—direct measures or model parameter values—the meeting was closed with the discussion to continue.

New projects
No new projects were received or introduced.

New business
There was no new business.

The next meeting will be held in conjunction with the AES 115th Convention in New York, NY, US, 2003-10.


AES 23rd INTERNATIONAL CONFERENCE

Signal Processing in Audio Recording and Reproduction


Marienlyst Hotel, Helsingør
Copenhagen, Denmark

May 23–25, 2003

Jeff Bier, keynote speaker; Kees Immink, AES president; Per Rubak, conference chair


The town of Helsingør is situated on the northeast corner of Denmark overlooking the narrowest part of the Øresund, a busy waterway that links the North Sea to the Baltic Sea. The history of the town can be traced back to 70 AD, and the area contains a number of impressive royal castles dating from as early as 1100 AD. Against this historic backdrop, from May 23rd to the 25th, the AES held its 23rd International Conference, Signal Processing in Audio Recording and Reproduction, covering some of the very latest techniques in signal processing for audio.

The conference committee worked hard to ensure thatthe conference was a success. Per Rubak as conferencechair and Jan Abildgaard Pedersen and Lars Gottfried Jo-hansen as papers cochairs drew together over 20 excellentpapers covering many aspects of signal processing. Theywere ably assisted by Knud Bank Christensen, conferencesecretary, Eddy Bøgh Brixen, facilities chair, and SubirPramanik, treasurer.

The conference was held in the Marienlyst Hotel and Conference Center, overlooking the Øresund and the shore of Sweden across the water. Included in the program was a mix of papers sessions, demonstrations, and social events. In addition to these, there were plenty of opportunities for the delegates to discuss hot topics and to contribute to the global community that is the Audio Engineering Society.

OPENING
Conference Chair Per Rubak opened the proceedings by giving an overview of the wide range of applications that were possible through the increasing use and availability of digital signal processing technology. He reflected on the great amount of progress that has been achieved in this area so far and hoped that the conference would provide inspiration for further progress.

AES President Kees Immink thanked the hard-working committee for producing a successful conference. He stated that it was an odd place to hold an audio event, as the reigning king of Denmark had introduced a tax on the Sound in 1429. He was, of course, talking about the tax levied on ships passing through the Øresund (Sound). He encouraged the delegates to take advantage of the conference format by conferring with each other to gain from the international expertise and knowledge of those present.

SIGNAL PROCESSING HARDWARE
The keynote speech, given by Jeff Bier of Berkeley Design Technology, was an overview of trends in signal processing hardware. He covered different types of signal processing hardware that may be used for audio purposes, considering not only the processing capability but also the practical and commercial aspects of each option. He explained that one of the main decisions that has to be made when choosing a suitable processor is the compromise between efficiency and flexibility, and that the optimum choice can differ greatly depending on the specific application. Bier summarized a number of important trends in the consumer marketplace, including the convergence of multimedia applications into other devices and the increasing connectivity between devices. "These are interesting times," he stated, because of the increasing ubiquity of digital audio, the change in emphasis from hardware to software, and the capabilities and challenges of increasing connectivity. He forecast that we will see a great increase in the use and development of the techniques discussed at the conference.

SIGNAL CONVERSION
The afternoon session of the first day started with a paper by Søren Nielsen and Thomas Lund of TC Electronic, focusing on the topic of overload in signal conversion. It was an interesting study into the clipping that can occur even when the digital sample peaks are below the full range afforded by the system. The authors suggested a measurement technique to evaluate this problem and showed results from a number of processes and commercial devices. They finished by summarizing that the problem could be avoided by reducing the level either on the recorded media or prior to conversion.
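As background to the overload problem studied in that paper, the short Python sketch below shows, under stated assumptions, how a reconstructed intersample peak can exceed the largest stored sample value. It is a generic oversampling check using numpy and scipy, not the measurement technique the authors proposed, and the test signal, function name, and oversampling factor are illustrative choices.

import numpy as np
from scipy.signal import resample_poly

def true_peak_dbfs(x, oversample=4):
    # Polyphase oversampling approximates the reconstruction filter of a DAC,
    # revealing peaks that lie between the stored samples.
    y = resample_poly(x, oversample, 1)
    return 20 * np.log10(np.max(np.abs(y)))

# A sine at fs/4 whose stored samples all sit at +/-0.707 of full scale:
# the sample peak reads about -3 dBFS, yet the reconstructed waveform reaches roughly 0 dBFS.
fs = 44100
n = np.arange(1024)
x = np.sin(2 * np.pi * (fs / 4) * n / fs + np.pi / 4)
print("sample peak (dBFS):", 20 * np.log10(np.max(np.abs(x))))
print("true peak (dBFS):  ", true_peak_dbfs(x))

Reserving headroom equal to the difference between the two readings, either on the recorded media or before conversion, is essentially the remedy the authors describe.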

SPEECH PROCESSING AND METADATA
This session started with an invited paper by Patrick Bastien of TC-Helicon on voice-specific signal processing tools. He explained some of the unique characteristics of voice signals that have to be considered when processing and gave a number of entertaining and informative audio demonstrations to illustrate the potential pitfalls and solutions. He concluded by showing a complex voice-modeling system that could alter and enhance vocals, but he conceded that it could not yet turn Joe Cocker into Britney Spears.

Continuing the topic of speech processing, Niels Henrik Pontoppidan and Mads Dyrholm of the Technical University of Denmark covered the problem of separating multiple speech signals captured with one receiver. Demonstrations of the system they developed showed that even though the resulting audio quality was not of a high standard, the speech signals were sufficiently separated to allow further analysis and processing.

The final paper of the day was presented on video tape, because invited author Elizabeth Cohen of UCLA was not able to travel to the conference. In the presentation she discussed the value of metadata and stressed the importance of creating metadata at the same time as the content so that it can accompany the audio signal through the entire chain from creation to product.

INTERFACING LOUDSPEAKER AND ROOM
One of the major emerging trends in signal processing for sound reproduction that was discussed at the conference was compensation for a less than ideal interaction between the loudspeaker and the room. Saturday morning's session was devoted entirely to this topic, covering a range from modal equalization to full room compensation. One of the common themes throughout these presentations was the importance of the spatial robustness of the processing, to ensure that any attempts to improve the sound at a single listening position do not degrade the sound anywhere else in the room.

In the first presentation of the session, Jan Abildgaard Pedersen of Bang & Olufsen noted that loudspeaker manufacturers go to great trouble to optimize a large number of parameters. However, the final reproduction room of the customer has a large effect on the sound. But since the properties of the room can vary widely and are unknown by the manufacturer, they cannot be compensated for in a static design. A solution to this dilemma is the use of active compensation, where the processing is adapted to take into account the effect of an individual room and even a specific loudspeaker position within that room. Pedersen described one method that involves measurement and compensation for each loudspeaker. He explained that this can be done by including a microphone in the loudspeaker cabinet to make measurements of the loudspeaker radiation resistance for two receiver positions. This information can then be used to correct the frequency response from 20 to 500 Hz.

A Bang & Olufsen loudspeaker with this technology was shown in a small demonstration room during the conference. The loudspeaker was used for replay in more than one position in the room, exhibiting the problems of the position-dependent interaction between the loudspeaker and the room. The active compensation was then demonstrated by aligning the loudspeaker in each position using the method described in the presentation. This resulted in the pair of loudspeakers in different positions having a more similar timbre.

INVITED AUTHORS (clockwise, from bottom left): Wolfgang Klippel, Jean-Marc Jot, Ronald Aarts, and Patrick Bastien.

Two further papers on the equalization of room modes were given by Rhonda Wilson and Michael Capp of Meridian Audio and Matti Karjalainen, Poju Antsalo, and Aki Mäkivirta of Helsinki University of Technology and Genelec. The common aim of these papers was to reduce the decay time of low-frequency room modes, though each used different approaches to analyze and filter the most prominent modes in a given reproduction room.
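As a rough illustration of the shared idea, and not the specific filters proposed by either group of authors, the Python sketch below shortens the decay of a single identified room mode by cancelling its resonant poles and re-inserting them at a smaller radius; the mode frequency, decay times, and helper names are assumed values for the example.

import numpy as np
from scipy.signal import lfilter

def pole_radius(t60, fs):
    # Pole radius of a resonator whose impulse response decays by 60 dB in t60 seconds.
    return 10.0 ** (-3.0 / (fs * t60))

def modal_eq_coeffs(f0, t60_measured, t60_target, fs):
    # Numerator cancels the measured mode; denominator re-creates it with the shorter decay.
    theta = 2.0 * np.pi * f0 / fs
    r_old = pole_radius(t60_measured, fs)
    r_new = pole_radius(t60_target, fs)
    b = [1.0, -2.0 * r_old * np.cos(theta), r_old ** 2]
    a = [1.0, -2.0 * r_new * np.cos(theta), r_new ** 2]
    return b, a

# Example: shorten a 45 Hz mode from a 1.2 s decay to a 0.4 s decay at a 48 kHz sample rate.
fs = 48000
b, a = modal_eq_coeffs(45.0, 1.2, 0.4, fs)
impulse = np.zeros(fs)
impulse[0] = 1.0
corrected = lfilter(b, a, impulse)  # in practice the filter runs on the loudspeaker feed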

The application-based papers on the subject of modal equalization were supported by a theoretical paper presented by Jean-Dominique Polack of the University of Paris and coauthored by Jan Abildgaard Pedersen. They used semiclassical theory to explain and predict the creation of room modes and the correction required for loudspeakers at different positions. Comparison of measured and predicted results showed that the approximation was reasonable, but that further refinements could be used to improve the accuracy of the predicted results.

Attendees commented that the quality of the papers presentations was consistently high throughout the entire conference; questions and comments from the audience enhanced the exchange of technical information.

Coffee breaks and meals gave attendees time to relax, catch up with old friends, and make new ones.

Other papers in the session covered a range of topics related to the interaction between a loudspeaker and the room. Andrew Goldberg and Aki Mäkivirta of Genelec presented a paper that described an automated method to determine the correct equalization settings on an active loudspeaker in a given room based on the measured frequency response. By the use of heuristic analysis developed from knowledge gained from manual alignment, the complexity of the process was greatly reduced, meaning that the system could compute the optimum settings in a short time.

A paper by Etienne Corteel of IRCAM and Rozenn Nicol of France Telecom focused on the problems that the reproduction room can cause for wavefield synthesis (WFS) and how the properties of the WFS system can be used to compensate for this. By the use of simulations they showed that WFS can be used to cancel early reflections in the horizontal plane over a wide listening area, though only up to the spatial aliasing frequency of the reproduction system.

Some of the attendees congregated at the conference center entrance before the short walk to Helsingør Castle.

Jan Pedersen, right, demonstrates new loudspeaker technology of Bang & Olufsen.

Wolfgang Klippel, left, demonstrates practical systems for loudspeaker measurement and active compensation.

CREATING SPACE WITH DSP
The Saturday afternoon session focused on using signal processing to simulate and reproduce spatial properties. An invited paper by Jean-Marc Jot and Carlos Avendano of the Creative Advanced Technology Center considered the problem of combining sources and reproduction systems with a wide range of spatial characteristics (mono, 2-channel, and 5.1 surround). A number of conversion techniques were discussed, including headphone virtualization, stereo widening, upmixing, and downmixing. To support the presentation a separate demonstration was given that compared a number of different techniques for creating a 5-channel surround sound signal from a 2-channel stereo original.

Saturday evening attendees were treated to a guided tour of historic Kronborg Castle, immortalized by Shakespeare in Hamlet.

The topic of downmixing was continued in a paper by Attila Kiss and István Matók of Digital Pro Studio. This explored the use of the center channel when recording in 5-channel surround sound, considering the effect on any potential downmixing to 2-channel stereo. Yasuyo Yasuda of NTT DoCoMo presented a paper on 3-D audio for mobile communications, which highlighted the wide range of ways in which this technology can be applied. The results of a number of subjective tests that attempted to evaluate the performance of such systems with varying levels of complexity were also discussed.

A paper by Per Rubak and Lars Gottfried Johansen of Aalborg University that considered the perception of coloration in room impulse responses was presented by Per Rubak. This included a detailed literature review on the topic, which resulted in the proposal of a new method of measuring this effect.

The final paper of the day was presented by Jérôme Daniel of France Telecom. He discussed the problem of coding distance in spatial recording and reproduction formats such as high-order Ambisonics. He explained that Ambisonics assumes that the virtual sources and reproduction loudspeakers are in the far field, meaning that plane waves reach the listener. He showed that this is not the case in a practical situation, but that it can be compensated for by using digital signal processing. He finished by suggesting a method for simulating the distance of the sound source together with a means of transmitting the parameters to ensure that the sound is rendered correctly.

KRONBORG TOUR AND BANQUET
Following the final session of the day, the delegates were treated to a tour of Kronborg Slot, a large castle built in the Dutch Renaissance style. It is one of the largest and most extravagant castles of the period. The tour guides explained that the castle was built to defend the Øresund and to show off the wealth of Denmark. It is better known in English-speaking countries as the Elsinore Castle immortalized in Shakespeare's Hamlet. It is not known whether Shakespeare ever visited the castle, though he could have heard descriptions of the castle and the traditional Danish tale on which Hamlet is based from the links between the royalty of Denmark and England.

The tour of the castle was followed by a banquet that included an excellent musical performance by two members of the Danish National Symphony Orchestra, Klaus Tönshoff on clarinet and Per Salo on piano. During the banquet Roger Furness, AES executive director, again thanked the organizing committee for the hard work that went into making the conference so successful. After the banquet the delegates had coffee in a lounge overlooking the Sound where they viewed a spectacular display of lightning, for which Facilities Chair Eddy Bøgh Brixen would have liked to claim credit, along with the rest of the faultless organization.

DSP IN LOUDSPEAKERS
The final day of the conference consisted of two sessions on signal processing in loudspeaker systems. Ronald Aarts of Philips Research Laboratories presented an invited paper with an overview of a number of digital signal processing techniques that can improve the end-user experience. One such technique improves the perceived low-frequency performance of a loudspeaker by taking advantage of the psychoacoustic effect of virtual pitch, where a fundamental frequency can be perceived even when it is absent. He summarized by stating that the combination of DSP and psychoacoustics can be a very powerful tool.
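The virtual-pitch trick can be sketched in a few lines of Python; this is a generic illustration assuming a simple half-wave rectifier as the harmonics generator, not the algorithm Aarts presented, and the crossover frequency, filter orders, and mixing gain are arbitrary example values.

import numpy as np
from scipy.signal import butter, lfilter

def enhance_virtual_bass(x, fs, cutoff=120.0, gain=0.5):
    # Isolate the band the small loudspeaker cannot reproduce.
    b_lo, a_lo = butter(2, cutoff / (fs / 2), btype="low")
    bass = lfilter(b_lo, a_lo, x)
    # A half-wave rectifier is a cheap generator of harmonics of the bass band.
    harmonics = np.maximum(bass, 0.0)
    harmonics -= np.mean(harmonics)
    # Keep only the harmonics that fall inside the loudspeaker's usable band.
    b_bp, a_bp = butter(2, [cutoff / (fs / 2), 8 * cutoff / (fs / 2)], btype="band")
    harmonics = lfilter(b_bp, a_bp, harmonics)
    # Remove the original deep bass and add the harmonic substitute; the ear
    # infers the missing fundamental (virtual pitch) from the harmonic series.
    b_hi, a_hi = butter(2, cutoff / (fs / 2), btype="high")
    return lfilter(b_hi, a_hi, x) + gain * harmonics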

Subir Pramanik (far left), conference treasurer, and Roger Furness, AES executive director, welcomed everyone at Saturday night's banquet.



This was followed by a presentation given by John Mourjopoulos and Nicholas-Alexander Tatlas of the University of Patras. They considered the benefits and potential problems of an all-digital audio signal path from the source to the loudspeaker, including the possible applications of wireless networks and the current limitations in digital loudspeaker designs. Peter Mapp of Peter Mapp Associates presented a paper that focused on the range of signal processing employed in sound reinforcement. He outlined the kinds of problems that can be alleviated by signal processing and those that require a physical solution.

Continuing the theme of using digital signal processing to compensate for poor acoustical performance, two papers discussed active compensation of nonlinear distortion in loudspeaker transducers. In an invited paper Wolfgang Klippel discussed the need for small, inexpensive, and lightweight loudspeakers with a high power output. Klippel explained that it can be difficult to achieve high sound quality with such loudspeakers through transducer design alone. He suggested active loudspeaker control as a method of alleviating some of the problems, and he reviewed the relative advantages and disadvantages of a number of techniques that could be used for compensation.

Throughout the conference Klippel also gave demonstrations of practical systems for loudspeaker measurement and active compensation. He showed examples of the causes of nonlinear distortions in loudspeakers and demonstrated the audible effects on test signals. A short set of measurements was made of the loudspeaker, from which the required compensation was derived. The application of this compensation to the input signal showed that the problems were greatly reduced.

Andrew Bright of Nokia also discussed compensation of nonlinear distortion of loudspeakers in his presentation. He explained his use of a simplified algorithm that was derived from a discrete-time model of the loudspeaker nonlinearity. He also presented experimental results that demonstrated the improvements to be gained by using this model.

The final paper of the conference was by Steen Munk and Kennet Skov Andersen of Bang & Olufsen ICEpower. This described the use of class D amplifiers, how their performance may be improved by compensation in the modulator and in the analog stage of the amplifier, and methods to implement control of the amplifier gain. They also compared the performance of the class D amplifier with state-of-the-art analog amplifier designs. They found that while the digital amplifier is currently inferior in some respects, it is expected that with further development the performance can be improved to match or exceed the analog designs.

CONFERENCE CLOSE
After three days of informative papers that were conducive to discussion and inspiration, Chair Per Rubak closed the conference by thanking all those involved and hoping that many would return for future international conferences in Denmark. Delegates were impressed with the high technical content of the presentations and the relaxed and friendly atmosphere of the conference. All look forward to similar AES events in the future.


Conference committee: from left, seated, Per Rubak, Ole Moesmann, and Martin Rune Andersen; standing, Subir Pramanik, Eddy Bøgh Brixen, Lars Gottfried Johansen, Knud Bank Christensen, Jan Abildgaard Pedersen, and Crilles Bak Rasmussen. Russell Mason, conference webmaster, missed the photo.



DIGITAL RIGHTS MANAGEMENT

In one sense digital rights management, or DRM, has nothing to do with audio. However, rather in the same way that metadata (covered in the July/August issue) is data that relates to or describes audio information, DRM is concerned with the means by which the intellectual property rights in audio information are managed in electronic-commerce systems. Its importance cannot be overstated, as it is the primary means by which owners of audio content hope to get paid in the fast-growing world of e-commerce.

In the May 2002 Journal, Keith Hill's article described some of the business and technical challenges to be addressed in rights management. This article introduces some of the key DRM technologies and terms, with the aim of providing an overview of this fast-changing and politically charged field. It is impossible to make such an overview entirely comprehensive owing to the huge range of patents, commercial interests, and legal issues involved in DRM. It concentrates, therefore, on some of the most prominent technical issues relating to DRM for audio that are emerging in current standardization and commerce. Some useful websites having further details on the technologies and concepts described in this article are listed in the accompanying Useful Websites box.

WHAT IS DRM?
The term digital rights management covers a range of technologies intended to perform functions relating to the expression and protection of rights in content. The overall aim is to ensure that the owner of an audio item (for example a song, a sound effect, a piece of dialog) can control the way in which that material is used. This can be done by granting licenses to those who wish to use it and in some cases receiving payment in return. There may also be an element of protection involved so that the owner can prevent unauthorized usage of the material. Most independent commentators seem to agree that the acquisition of rights and access to recorded media must be a one-step process for the potential user. In other words, it must not, like the shareware market, be based on an honor system whereby users gain access to the material freely and then pay the owner if they feel like it; this is simply inconvenient and unreliable. The software market has found ways of protecting its assets at the same time as allowing users to obtain material, including restricted trial versions. Although the music business has some different characteristics, there are many things that can be learned here.

There are three main elements to DRM: description, identification, and protection. Description is used to define what the material is and how it may be used, relying on the use of metadata. Identification is used to describe under what circumstances and by whom the material can be used (the license conditions, for example, expressed using a rights expression language); protection then normally requires the adoption of some technical means to enforce the rights expressed in the previous two elements.

A number of different approaches to DRM have been taken over the years. In the audio field there have been many attempts at copy protection, such as the relatively simple SCMS (serial copy management system) approach used in consumer equipment. But DRM is much more than simple copy protection, as discussed below. The field has been characterized by numerous technical developments that have resulted in a veritable sea of patents, many of them running to hundreds or even thousands of pages, and many resulting from the work of startups and dotcom companies that have since been bought out by the big boys or have otherwise ceased to exist. In fact many observers say that the large multinationals have acquired such companies only to gain control of the patent pools.

The Secure Digital Music Initiative (SDMI) was originally established as a collaborative project to develop open technology specifications for new digital music distribution paradigms. It got as far as developing a Phase 1 portable device specification and a watermark system but was unable to establish a consensus for Phase 2. It was put on hold in 2001 pending developments in the market.

Recently there has been a shake-up in the DRM field due to the recognition (at last, some might say) by content producers such as record companies that new models for music distribution must be introduced or else the system will be overtaken by external forces. The shake-up has also been due in part to initiatives introduced under the MPEG banner, such as the IPMP initiative (intellectual property management and protection) that was started under MPEG-4, the metadata standards of MPEG-7, and the e-commerce tools inherent in MPEG-21. The result has been a competitive environment that to some extent has forced the issue concerning a standardized rights expression language (REL) for MPEG-21, forcing rapid realignments among the alliances of the major players. The computer giants such as Apple and Microsoft have jumped in with both feet, striving to become major forces in the on-line content distribution market. The allegiance of Apple and Microsoft and music-industry giants such as Sony to particular technologies has helped to polarize the field.

IS DRM COPY PROTECTION?
Although a lot of early DRM was little more than copy protection, there is general agreement that copy protection just annoys consumers. Therefore models have to be developed that allow copying but build in safeguards and means by which the rights owner can be paid. Many people in the field say that a problem with any form of copy protection is that it makes it impossible for legitimate owners to make "fair use" back-up copies. This article is mainly about the means by which rights can be managed on networked media, rather than ways in which people can be stopped from copying physical media. The two issues are closely related, though, and at least one of the weapons in the DRM armory is indeed some form of copy protection or encryption.

A form of copy protection has been applied to CDs, for example, by some record companies in an attempt to reduce the incidence of college students "ripping" copies on their computers for all their friends. These disks have small modifications to the data structure and certain errors that a normal CD player will probably correct or conceal, but that a CD-ROM drive will probably reject as corrupted. However, Philips has claimed that such disks cannot properly be called CD-Audio disks because they may break the Red Book format requirements. Media such as DVD and SACD incorporate various proprietary copy protection mechanisms, either encryption or watermarking or physical modification of the data surface.

Recent copyright laws in the United States, for example the U.S. Digital Millennium Copyright Act 1998 (DMCA), now make it an offense to attempt to circumvent any technical means that may have been put in place for protecting intellectual property in digital media. This has led to a range of objections from organizations that represent those who make legitimate copies in professional settings. New bills have been presented in the U.S. Senate that propose to amend the DMCA so as to give users greater freedom. Previously it was the copying itself that was illegal, but now the attempt to circumvent the means of protection is also proscribed. This could mean, for example, that you would be committing a crime if you tried to defeat the SCMS mechanism in a consumer DAT machine to make legitimate copies for professional or academic uses, whereas previously it was not illegal. Copyright directives in Europe (see, for example, European Copyright Directive 2001/29/EC at www.eurorights.org/eudmca/CopyrightDirective.html) appear to be following a similar direction, and American DMCA concepts are rapidly being exported to other countries.

Usage control seems to be preferred these days to out-and-out copy protection, and this becomes easier in a networked environment. In other words, the message is "I can't completely stop you from copying my data, but I will find ways of controlling how or whether you can use it. And I will try to find ways to ensure that I am notified when you use my data and that it will always be identified as my intellectual property."

RIGHTS EXPRESSION LANGUAGES
Rights expression languages (RELs) are at the heart of DRM. They are the means by which the rights in content can be described and the licensing conditions spelled out. This is one of the areas in which serious competition has been taking place in recent years, and it is the core of the legal rights protection mechanism that underpins DRM. It is common for RELs to use XML (eXtensible Markup Language) as a format.

Part of the debate regarding RELs has centered around whether to adopt a royalty-bearing language or whether to go for so-called open-source approaches. There are many who would like to see the latter adopted as the norm for rights expression, and a number of projects have taken place in this domain, probably the most successful of which has been Renato Iannella's ODRL (Open Digital Rights Language), as described below. However, there has been substantial commercial support for a REL called XrML that is based on a large patent pool owned by ContentGuard and endorsed by Microsoft. This is the REL that is almost certain to be adopted as the base architecture within MPEG-21 Part 5, as discussed below, but it can be argued that there is nothing to prevent users implementing ODRL within MPEG-21 architectures if they wish. Indeed both languages provide a means for the unambiguous expression of rights, and suitable middleware could be used to translate between them.

The competing intellectual property (IP) claims in this field are phenomenally complex to unravel and will almost certainly provide years of work for armies of lawyers. For example, there are said to be overlapping claims in patents, and even those who use some of the open-source RELs (see below) could be in danger of infringing upon some of ContentGuard's or Intertrust's patents. It is not the intention of this article, though, to get bogged down in this legal quagmire.

ContentGuard and XrML

XrML (eXtensible rights Markup Language) is a REL managed by ContentGuard, originating in work undertaken at Xerox and currently in its Core 2.1 version (see http://xml.coverpages.org/xrml2core.htm). Like other RELs, XrML uses XML-formatted documents to describe rights. A number of elements of DRM have been subsumed within the ContentGuard portfolio; for example, its software development kit integrates two primarily Xerox technologies: digital property rights language (DPRL) and self-protecting documents (SPD).

The scope of the XrML standard states:

This document explains the basic concepts for issuing rights in a machine-readable language and describes the language syntax and semantics. It does not provide specifications for security in trusted systems, propose specific applications, or describe the details of the accounting systems required. One of the goals of this document is to develop an approach and language that can be used throughout industry to stipulate rights to use resources and the conditions under which those rights may be exercised and by whom. This document does not address the agreements, coordination or institutional challenges involved in achieving that goal.

In other words it is a means or protocol by which the rights in a digital document can be expressed but not a means by which they can be protected or given security; the latter are dealt with by other technologies such as encryption and watermarking.

XrML has many different parts and has gone through a number of versions, some of which have been submitted to standards committees such as MPEG, the Moving Picture Experts Group, and OASIS, a global, not-for-profit consortium working on e-business standards. There is still some confusion concerning what degree of interoperability will exist within XrML-based systems, owing to the variety of core and extension sets that are possible. However, standards organizations are being encouraged to agree on a single core and extension set so that interoperability can be maintained. It is also not entirely clear what potential users might expect to have to pay when using XrML. It is supposed to be freely available and not strictly licensed by ContentGuard, but users could find themselves infringing on ContentGuard's patent portfolio and have to pay the company a royalty. The legalese can be tricky. ContentGuard's FAQ page, for example, has the following statement on the matter:

ContentGuard has a portfolio of patented technologies in the area of digital rights management, among other things. They are not specific to XrML. Claims in the patents cover the distribution and use of digital works and the use of a grammar in connection with the distribution of digital works. You may need to be licensed to use XrML in a context covered by the patents.

Key concepts in XrML are those of the license and the grant. A license can be authorized by a digital signature from the issuer, which then makes it legitimate as a means by which rights can be issued in one or more resources. A license may contain a number of grants or grant groups that cover different resources. A license is associated with a particular issuer who can indicate information such as the time at which the license is deemed to be granted and any time limits it may bear. It is possible within the standard to encrypt the details of the license so that they can be hidden from third parties. Multiple grants may be related to different types of rights in the same material, such as a right to play and a right to copy for example. Conditions may be attached to grants so that some requirement has to be fulfilled before the grant is authorized.

The effective target or subject of the right is the so-called principal, who might be the person who is granted the right (for example the user that downloads an audio file). A right can be granted without a principal, but this is regarded as dangerous practice, for obvious reasons (the right would then be deemed to have been granted to any user). A key-holder principal is one who has access to a private key for decrypting the material relating to a certain public key.

The standard describes an authorization algorithm that is expected to be present in some form in any software that might need to check and authorize the rights in digital material. Its outputs can be yes, no, or maybe, the last one being dependent on the fulfilling of certain conditions before the right is authorized.
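To make the license, grant, principal, and condition concepts concrete, here is a minimal Python model of such a three-valued check; the class names, fields, and identifiers are hypothetical illustrations, not the XrML schema or its normative authorization algorithm.

from dataclasses import dataclass, field

@dataclass
class Grant:
    principal: str                                   # e.g. a key holder identified by a public key
    right: str                                       # e.g. "play", "copy"
    resource: str                                    # e.g. a song identifier
    conditions: list = field(default_factory=list)   # e.g. ["fee-paid"]

@dataclass
class License:
    issuer: str
    grants: list

def authorize(license_, principal, right, resource, satisfied):
    # Return "yes", "no", or "maybe" (a grant exists but its conditions are unmet).
    for g in license_.grants:
        if (g.principal, g.right, g.resource) == (principal, right, resource):
            unmet = [c for c in g.conditions if c not in satisfied]
            return "yes" if not unmet else "maybe"
    return "no"

lic = License(issuer="label.example", grants=[
    Grant("alice-pubkey", "play", "urn:song:123", conditions=["fee-paid"]),
])
print(authorize(lic, "alice-pubkey", "play", "urn:song:123", satisfied=[]))            # maybe
print(authorize(lic, "alice-pubkey", "play", "urn:song:123", satisfied=["fee-paid"]))  # yes
print(authorize(lic, "alice-pubkey", "copy", "urn:song:123", satisfied=["fee-paid"]))  # no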

XMCL

Extensible Media Commerce Language (XMCL) was proposed by RealNetworks as a universal business language for expressing rights in digital media. It is supported by a number of other companies including Nokia. It is strictly a rights-specification language rather than a rights-expression language. XMCL was merged with ODRL (see below) instead of being submitted separately to MPEG in the competition for the MPEG-21 REL, with RealNetworks deciding to support ODRL instead. This is just one example of the complicated chain of alliances and intellectual property sparring that has been a feature of the jockeying for positions in the DRM world in recent years.

ODRL

Open Digital Rights Language (ODRL) is an alternative XML schema to XrML for expressing rights. The project summary states:

The ODRL Initiative Supporters are focused on fostering and supporting open and free standards for the specification of media commerce rights languages. The ODRL Initiative is a forum used to propose, discuss, and gather consensus for a language that it will subsequently nurture via formal standards bodies. The ODRL Initiative will strive to openly participate in standards groups that allow for the adoption of royalty-free specifications… . The ODRL Initiative is committed to supporting MPEG-21 and is a compatible Rights Language that will support open and free interoperability within and across the MPEG-21 Multimedia Framework.

It enables the "signing" of a digital license to authorize the usage and ensures that each licensed version is cryptographically unique to the licensee. Some of the relationships involved are pictured in Fig. 1. It was originally proposed by the W3C consortium and has recently been adopted by the Open Mobile Alliance (formerly the WAP forum) as the rights language for all mobile content. It was not, however, successful in becoming the adopted base architecture REL for MPEG-21. It is understood, nonetheless, that any XML-based REL can still be used within the MPEG-21 framework to express rights, as an alternative to the royalty-bearing MPEG REL (which is XrML). ODRL is also a major feature of the Open IPMP project (see below).

Fig. 1. Relationships between elements in ODRL (Fig. courtesy of Renato Iannella)

OPEN SOURCE SOLUTIONS
A number of developers are unhappy with the idea of large companies having total control over DRM solutions, so they are promoting open-source solutions that are freely available. Open systems are recognized around the industry as a valuable alternative to proprietary systems. Some examples are introduced here.

Open IPMP

Open IPMP (Intellectual Property Management and Protection) is a project dedicated to developing open-source solutions to DRM. IPMP is a term that MPEG originally coined in relation to MPEG-4. Under MPEG-4 it is possible to have tight coupling between IPMP data and content data, so that rights in content elements can be checked at the rendering stage, as shown in Fig. 2. MPEG-4, however, did not specify which systems should be used for IPMP, but a number of requirements were introduced that were subsequently addressed in detail by MPEG-21.

Open IPMP embodies many of the concepts that have been identified as necessary for effective DRM. It incorporates tools for user/content identification and management, such as digital object identifiers (DOI) and the Open Digital Rights Language (ODRL), and for cryptography, such as public key infrastructure (PKI), asymmetric and symmetric encryption, digital signatures, SSL (secure sockets layer), and secure storage. Asymmetric encryption is used for data such as licenses and symmetric encryption is used for content. It conforms to the Internet Streaming Media Alliance requirements (ISMA 1.0).
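The split between asymmetric and symmetric encryption mentioned above is a standard hybrid pattern. The Python sketch below shows the shape of it using the third-party cryptography package, purely as an illustration and not as the Open IPMP implementation; the key size, padding choice, and placeholder content are assumptions.

from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# The user's key pair; in practice the public key would come from a PKI certificate.
user_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

# Symmetric encryption protects the bulky content itself.
content_key = Fernet.generate_key()
protected_audio = Fernet(content_key).encrypt(b"...encoded audio frames...")

# The small content key travels inside the license, encrypted to this user only.
wrapped_key = user_key.public_key().encrypt(content_key, oaep)

# Playback side: unwrap the key with the private key, then decrypt the content.
unwrapped = user_key.decrypt(wrapped_key, oaep)
audio = Fernet(unwrapped).decrypt(protected_audio)
assert audio == b"...encoded audio frames..."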

Media-S

Media-S was developed by Sidespace Solutions and is intended to be an open-source development package that provides an open digital rights interface for multimedia source material. It is initially aimed at the Ogg Vorbis file format, which is an alternative to MP3 for the encoding of low-bit-rate audio but without any royalty requirements.

Authena

Authena seems to be primarily concerned with providing a forum for linking open-source DRM to content-management systems (CMS). It is claimed that "by providing a set of modules and an architectural philosophy for generating RDF/RSS descriptions incorporating the Dublin Core (see the 2003 July/August Journal article on Metadata) and the extensible Creative Commons licenses (see below), Authena seeks to marry a full spectrum of rights definitions to Open Source CMS."

RDF is the Resource Description Framework, which is a W3C format for describing networked content. RSS stands for RDF site summary and is a means of enabling easy syndication of web-based content such as news feeds and the like. It is also said to stand for really simple syndication.

Creative Commons

The Creative Commons is a not-for-profit organization set up for the sole purpose of providing a rights vehicle for authors who wish to make their material available under a "some rights reserved" or "no rights reserved" banner. In other words it believes that there is a large need for a licensing mechanism that looks after the author's intellectual property but makes it available to the world without the protection of full-fledged copyright.

MPEG-21: INTEGRATING THE PIECES
The background and vision of MPEG-21, as originally described in ISO/IEC (2002) ISO/IEC JTC1/SC29/WG11 N5333 MPEG-21 Requirements v 1.4, is as follows:

Today, many elements exist to build an infrastructure for the delivery and consumption of multimedia content. There is, however, no "big picture" to describe how these elements, either in existence or under development, relate to each other. The aim for MPEG-21 is to describe how these various elements fit together. Where gaps are identified, MPEG-21 will recommend which new standards are required. ISO/IEC JTC 1/SC 29/WG 11 (MPEG) will then develop new standards as appropriate while other relevant standards may be developed by other bodies. These specifications will be integrated into the multimedia framework through collaboration between MPEG and these bodies.

Fig. 2. MPEG-4 possible model for tight coupling between IPMP and content (courtesy MPEG)

The result is an open framework for multimedia delivery and consumption for use by all the players in the delivery and consumption chain. This open framework thus provides content creators and service providers with equal opportunities in the MPEG-21 enabled open market. This will also be to the benefit of the content consumer, providing them access to a large variety of content in an interoperable manner.

The vision for MPEG-21 is to define a multimedia framework to enable transparent and augmented use of multimedia resources across a wide range of networks and devices used by different communities.

In order to achieve this the following elements were identified as necessary (see Fig. 3): digital item declaration, digital item identification and description, content handling and usage, intellectual property management and protection, terminals and networks, content representation, and event reporting.

MPEG-21 is ISO standard 21000-N [ISO/IEC (2003) ISO/IEC 21000-N: Information technology — Multimedia framework (MPEG-21)] and currently consists of a number of parts, namely:

Part 1: Vision, Technologies, and Strategy
Part 2: Digital Item Declaration
Part 3: Digital Item Identification
Part 4: Intellectual Property Management and Protection
Part 5: Rights Expression Language
Part 6: Rights Data Dictionary
Part 7: Digital Item Adaptation
Part 8: Reference Software
Part 9: File Format.

USEFUL WEBSITES
Authena: http://authena.org/
Copy protection on CDs: http://ukcdr.org
Creative Commons: http://creativecommons.org
Media-S: www.sidespace.com
MPEG IPMP: www.chiariglione.org/mpeg/standards/ipmp/
MPEG 21: www.chiariglione.org/mpeg/standards/mpeg-21/mpeg-21.htm
OASIS: www.oasis-open.org
ODRL: www.w3.org/TR/odrl/ and http://odrl.net/
Open IPMP: http://openipmp.com/
Open Source development: www.sourceforge.net
Secure Digital Music Initiative (SDMI): www.sdmi.org
XrML: http://xml.coverpages.org/xrml2core.htm

Part 2 is primarily concerned with developing a flexible means by which digital items (containing content) can be defined, so that they can be used interoperably across systems. This includes systems for, among other things, containing content in a hierarchical fashion and enabling searching and revision management. Part 3, on the other hand, is concerned with the means by which such digital items and other information can be described and identified. Part 4 (IPMP) is concerned with the means of protection, including such technologies as watermarking, authentication, and encryption. The issues surrounding Part 5 (RELs) have already been covered in some detail above.

The scope statement for Part 6 states:

The Rights Data Dictionary (RDD) comprises a set of clear, consistent, structured, integrated and uniquely identified Terms… to support the MPEG-21 Rights Expression Language… . Use of the RDD System will facilitate the accurate exchange and processing of information between interested parties involved in the administration of rights in, and use of, Digital Items, and in particular it is intended to support the MPEG-21 REL… . As well as providing definitions of Terms for use in the REL, the RDD System is designed to support the mapping and transformation of metadata from the terminology of one namespace (or Authority) into that of another namespace (or Authority) in an automated or partially-automated way, with the minimum ambiguity or loss of semantic integrity.

It therefore relates to the definition of terms for use in the expression of rights as well as the means by which such metadata may be translated between different systems (such as the terminological systems of different licensing authorities).

Part 7 (Digital Item Adaptation) seems to be related to a means by which digital items and descriptive data can be modified by some authorized system.

LIGHT WEIGHT DIGITAL RIGHTS MANAGEMENT (LWDRM)
The Fraunhofer Institute has coined the term Light Weight Digital Rights Management to refer to a system it has developed for enabling the marking of content with the user's digital signature. Rather than always acting as a full-blown DRM system, LWDRM works on the principle that fair use of content by a user is made possible but if such content "leaks out" to the general public it can be traced back to the user. It is intended as a plug-in for popular players as well as for dedicated applications allowing publication of material. Protection is facilitated by encryption (using the Advanced Encryption Standard) and the institute's own watermarking technology. Initial implementations are based around MPEG-4 AAC (audio) codecs, but others are planned for MP3 files and MPEG-4 AVC (video).

It achieves the desired end by means of two different formats for the material: local media format (LMF) and signed media format (SMF). LMF files are unique to the machine they were generated on and cannot be played elsewhere, whereas SMF files can be used for fair use copies that a user makes for replay on another system, for example in the same house or for the car. It is proposed that three different levels of authorization and use would be allowed. Level 1 involves the replay of only SMF files using a LWDRM-compliant player. Level 2 allows the user to create content and replay SMF material. The content created is tied to the machine on which it was generated (LMF files). The example given is a local jukebox allowing the user to replay his own content. Level 3 enables the user to generate (publish) his own SMF files from LMF files, which could be replayed by any LWDRM player. The user has to register with a Certification Authority to be able to use Level 3. The Certification Authority is used to prove the identity of the registered user.
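Read as a playback policy, the three levels above reduce to a few simple checks. The Python sketch below is one hypothetical reading of that policy; the function names, the host comparison for LMF files, and the registration flag are assumptions for illustration, not Fraunhofer's specification.

def may_play(file_format, file_host, player_host, user_level):
    # Level 1 and above: SMF files play on any LWDRM-compliant player.
    if file_format == "SMF":
        return user_level >= 1
    # LMF files are locked to the machine that created them (Level 2 usage).
    if file_format == "LMF":
        return user_level >= 2 and file_host == player_host
    return False

def may_publish_smf(user_level, registered_with_ca):
    # Level 3: turning LMF content into publishable SMF requires registration
    # with a Certification Authority that can prove the user's identity.
    return user_level >= 3 and registered_with_ca

print(may_play("SMF", "laptop", "car-player", user_level=1))    # True
print(may_play("LMF", "laptop", "car-player", user_level=2))    # False: wrong machine
print(may_publish_smf(user_level=3, registered_with_ca=True))   # True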

CONCLUSION
There can be little doubt that some form of DRM is here to stay, particularly in relation to the consumer distribution of audio content. The challenge is primarily to develop and deploy satisfactory business models that make the experience of using digital content a pleasant and convenient one for the user while ensuring that appropriate rights are maintained.


Fig. 3. Examples of elements within MPEG-21 (courtesy MPEG)


NEWS OF THE SECTIONS

We appreciate the assistance of the section secretaries in providing the information for the following reports.


Power of Firewire
The New York Section's May 13 meeting began with an introduction of candidates running for office. A motion for the election of the committee as a slate was seconded and passed. Since there were no nominations from the floor, ballots were then distributed for the election of individual officers and a slate of committee members. These ballots were also mailed to those members who were unable to attend the May meeting.

Guest speakers Mike Overlin from Yamaha America and John Strawn from S-Systems, Inc. spoke about audio over IEEE 1394, commonly known as Firewire. Recent extensions to Firewire technology have made it a very powerful audio device networking tool.

Strawn presented a technical overview of the standards related to IEEE 1394. Although most people are aware of hooking up individual Firewire devices such as a hard drive, scanner or video camera to a computer for relatively high-speed data transfer, the 1394 Trade Association has developed a series of standards for various functions that have been formally adopted as standards by the International Electrotechnical Commission (IEC). These standards have the basic number 61883 with subsets having an appended digit (e.g. IEC 61883-1). Additional standards include a MIDI Manufacturers Association (MMA) standard for carrying MIDI over a Firewire cable and a 1394 Trade Association standard for Audio Visual Control Protocol. Strawn described the various standards and their applications. On top of the 61883 standards, Yamaha has also developed an mLan standard to give audio and musical devices additional capability.

Overlin then presented a thorough summary of mLan in audio applications. Thanks to these standards, IEEE 1394 is capable of becoming a networking environment that will allow devices to send simultaneous multichannel data streams bidirectionally over a single mLan cable. In addition, equipment built for mLan standards can be configured as part of a network by using a computer to set up the virtual network. This equipment will "remember" the network structure after a power shutdown even without a computer in the network. Thus, equipment can be unplugged and reconnected without destroying the integrity of the network. Moreover, bridge devices also allow mLan networks to be interconnected. According to Strawn and Overlin, although Yamaha took the lead in developing mLan, over 30 manufacturers currently make devices for the standard. Using slides, they showed the group examples of some of the equipment now available from a variety of manufacturers.

A lively question-and-answer period followed the presentation. Strawn and Overlin recently published an article in Electronic Musician about mLan. Thanks to the generosity of the magazine's publisher, they were able to distribute copies of the issue to everyone present. The section thanked Yamaha for sponsoring both presenters.

Section members enjoyed the picnic in June. It featured a demonstration of ambiophonic surround sound reproduction.

John Strawn (left) and Mike Overlin discuss Firewire at New York Section meeting.

Eric Somers

Audio in Medical Tech
Members of the Pacific Northwest Section gathered in Seattle on April 8 to hear Bob Smith, staff scientist at Medtronic Physio-Control and owner of BS Studios, talk about Audio Design for Nontraditional Audio Products.

Rick Chinn, section chair, opened the meeting with a brief report on the AES Convention in Amsterdam.

Smith then related the story of the audio development of one of his company's heart defibrillators.

Heart defibrillators are electronic medical devices that shock an improperly beating heart back into a regular rhythm. Growing venues for these devices are public spaces such as airports, so that they are as available as fire extinguishers and first aid kits to untrained personnel who may access them in the case of a heart attack emergency. The units are completely microprocessor-controlled and cannot send a shock unless needed. The standard way to guide the user is with prerecorded voice prompts. The on-site sound is also recorded for later analysis.

Not surprisingly, audio is often treated as an afterthought in the design. Smith felt compelled to convince management of the need for good audio engineering for the defibrillator. He argued that the voice prompts are critical in noisy, tense situations. To demonstrate this, Smith played a security video synched to the defibrillator audio/heart data recording during an actual resuscitation to demonstrate the potential commotion that might occur in such a situation. He also demonstrated the unit used in the video, and notably, the prompts were somewhat distorted. The level of the logging audio fluctuated due to a poor AGC.

According to Smith, some design changes were obvious. First, it was important to choose a decent loudspeaker and place it on top, pointed at the user, rather than inside the box at the end of a channel. Second, it would be a mistake to put the logging microphone way inside at the end of a long plastic channel. Some other audio elements required research and testing, such as codecs, bit depths, sampling rates and compression. To determine the environment, noise measurements were made on real aid calls, in aid cars, a ferryboat, and emergency rooms. This analysis helped determine the requirements for better playback and recording.

Smith noted that although the internal microphones were familiar aluminum electret capsules, a waterproofing membrane on the case, electrical isolation requirements and mounting on the PC board at the end of a plastic channel meant that the old unit was actually working very much like a kazoo. Its simple AGC made gross volume changes every time a prompt played. A new unit, which had a microphone placed near the case surface away from the loudspeaker, used a much improved AGC, a better codec, higher sample rate, and greater bit depth. Needless to say, this unit worked much better.

Smith stated that the audio system in medical products must be of a higher grade than those in consumer products. This limits design and component selection. The power budget for the defibrillator was about 1 W, since it is a portable device. Smith demonstrated how a few intelligent design changes greatly improved the loudness and clarity of the playback system. Actual voice actors recorded the voice prompts in twenty-one languages and great care was taken to achieve the proper urgency, cadence, delivery and diction. These recordings were made with later compression and encoding in mind.

Smith ran the older unit (with a simulator) to hear all the prompts, then ran the newer unit, which had obvious improvements in loudness and clarity. Although both used the same audio amplifier, there was about a 12-dB difference. A modest amount of additional memory was also needed to supplement the sound.

At the end of the meeting, a grab bag of door prizes was distributed to the 14 attendees. They included some items from the AES Amsterdam Convention, courtesy of Rick Chinn, and plenty of Medtronic novelties, such as keychains with CPR moisture barriers inside. Smith's lecture served as a reminder that audio should be a critical aspect of the initial design of such products.

Bob Smith demonstrates audio aspects of a defibrillator to Pacific Northwest members in April.

Gary Louie

From Theory to Practice
Neil A. Shaw and Jeffrey Riedmiller took members of the Los Angeles Section on a journey through loudness, from theory to practice, on March 25. Shaw, a principal of Menlo Scientific Acoustics, Inc., is a fellow of the Acoustical Society of America. Riedmiller, from Dolby Laboratories, Inc., is co-chair of the National Cable Television Association (NCTA) audio quality subcommittee.

Shaw began with an historical look at loudness, from how the human ear works to the research and development of sophisticated loudness estimation algorithms, including the ISO 532B method. He gave a very detailed explanation of the complexities of the inner ear and how it processes sound. He then continued with an overview of the research on the frequency sensitivity of human hearing (as studied by Fletcher and Munson back in the 1930s) and how it applies to many products today. Shaw explained that ISO 532B, developed by E. Zwicker, was created to calculate the loudness of steady complex sounds for which 1/3-octave band analyses have been obtained.

Riedmiller took a look at developing a standard for loudness measurement for the broadcast industry. While the International Telecommunication Union (ITU) is currently looking at a number of measurement techniques to accurately measure loudness, the cable, satellite and broadcast industries also have to deal with how to measure loudness quickly and easily (not just accurately). Time is a luxury broadcasters do not have. Program audio, especially live, has to be measured as it goes to air. Even preprogrammed content must be measured quickly because of the quantity of shows being aired, especially in cable and satellite systems that receive, store and pass through huge amounts of content every minute of every day.

Riedmiller pointed out the two fundamental issues in measuring loudness for broadcast. First, what do we measure? Although there are exceptions, the simple answer is dialog. This is what cable, satellite and terrestrial broadcasters are struggling to level-match. Indeed, for DTV, the FCC references the ATSC A53B document in requiring that the dialnorm metadata in the Dolby Digital bitstream be properly set to reflect the average level of dialog, thereby allowing the consumer's receiver to match levels.

The second issue according to Riedmiller was how accurate is accurate enough? Riedmiller described the computational overhead of some of the measuring techniques. For example, the Moore/Glasberg algorithm is accurate but requires 6 x 2 K FFTs per millisecond. The scope of the work being done in the NCTA is to determine the average consumer tolerance to level variations. This information can then be used to quantify the degree of accuracy, and conversely the degree of complexity, needed to measure dialog level in the real world.
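To give a feel for the trade-off being discussed, the deliberately crude Python sketch below estimates an average dialog level by gating out near-silent frames and averaging the rest. It is an illustration of a low-complexity measure only, not the Moore/Glasberg model, dialnorm, or any broadcast standard, and the frame size and gate threshold are arbitrary assumptions.

import numpy as np

def rough_dialog_level_dbfs(x, fs, frame_ms=20, gate_db=-60.0):
    # Average the RMS level (in dBFS) of frames that are likely to contain speech.
    frame = int(fs * frame_ms / 1000)
    levels = []
    for start in range(0, len(x) - frame + 1, frame):
        seg = x[start:start + frame]
        level = 20 * np.log10(np.sqrt(np.mean(seg ** 2)) + 1e-12)
        if level > gate_db:  # simple energy gate against silence
            levels.append(level)
    return float(np.mean(levels)) if levels else float("-inf")

The open question reported above is how far such a cheap estimate may deviate from a perceptually accurate one before viewers reach for the volume control.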

Reidmiller ended the meeting byasking attendees to take an audio leveltolerance survey. The purpose of theexercise was not to determine whetheror not attendees could hear a differ-ence in level, but to determine howmuch level difference was consideredacceptable. In other words, how muchlevel change would cause one to turnthe TV volume up or down? Althoughthis was certainly not a controlled test-ing environment, the results gave aglimpse of how tolerant some are tolevel differences in TV audio, specifi-cally dialog. A chart of the results ofthis survey can be found at the sec-tion’s Web site:

www.aes.org/sections/la/.Steve Venezia

Speech recognitionOn April 29, the section met for anoverview of speech recognition tech-nologies on the PC platform and em-bedded devices, such as portable audio/video player products. To kickoff the meeting, executive committeemember and director of product man-agement and development, ChrisPalmer, described some basic commu-nication and language challenges

critical aspect of the initial design of such products.

Gary Louie

From Theory to PracticeNeil A. Shaw and Jeffrey Riedmillertook members of the Los AngelesSection on a journey through loudness—from theory to practice on March25. Shaw, a principal of Menlo Scien-tific Acoustics, Inc., is a fellow of theAcoustical Society of America. Ried-miller, from Dolby Laboratories, Inc.,is co-chair of the National Cable Tele-vision Association (NCTA) audioquality subcommittee.

Shaw began with an historical lookat loudness, from how the human earworks to the research and develop-ment of sophisticated loudness estima-tion algorithms, including the ISO532B method. He gave a very detailedexplanation of the complexities of theinner ear and how it processes sound.He then continued with an overviewof the research on the frequency sensi-tivity of human hearing (as studied byFletcher and Munson back in the1930s) and how it applies to manyproducts today. Shaw explained thatISO 532B, developed by E. Zwicker,was created to calculate the loudnessof steady complex sounds for which1/3-octave band analyses have beenobtained.

Riedmiller took a look at developing a standard for loudness measurement for the broadcast industry. While the International Telecommunication Union (ITU) is currently looking at a number of measurement techniques to accurately measure loudness, the cable, satellite and broadcast industries also have to deal with how to measure loudness quickly and easily (not just accurately). Time is a luxury broadcasters do not have. Program audio, especially live, has to be measured as it goes to air. Even preprogrammed content must be measured quickly because of the quantity of shows being aired, especially in cable and satellite systems that receive, store and pass through huge amounts of content every minute of every day.

Riedmiller pointed out the two fundamental issues in measuring loudness

the environment, noise measurements were made on real aid calls, in aid cars, a ferryboat, and emergency rooms. This analysis helped determine the requirements for better playback and recording.

Smith noted that although the internal microphones were familiar aluminum electret capsules, a waterproofing membrane on the case, electrical isolation requirements and mounting on the PC board at the end of a plastic channel meant that the old unit was actually working very much like a kazoo. Its simple AGC made gross volume changes every time a prompt played. A new unit, which had a microphone placed near the case surface away from the loudspeaker, used a much improved AGC, a better codec, higher sample rate, and greater bit depth. Needless to say, this unit worked much better.
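The contrast between the two gain controls is easy to picture with a minimal sketch of a smoothed AGC with separate attack and release times; this is my own illustration, not Medtronic's actual algorithm.

# Minimal sketch of a smoothed AGC, illustrating why separate attack and
# release time constants avoid the "gross volume changes" of a naive gain
# control. Illustrative only; not the defibrillator's design.
import math

def smoothed_agc(samples, fs, target_rms=0.1, attack_s=0.005, release_s=0.5):
    """Return gain-controlled samples using a one-pole envelope follower."""
    attack = math.exp(-1.0 / (attack_s * fs))
    release = math.exp(-1.0 / (release_s * fs))
    env, out = 1e-6, []
    for x in samples:
        level = abs(x)
        coeff = attack if level > env else release   # track up quickly, recover slowly
        env = coeff * env + (1.0 - coeff) * level
        out.append(x * min(target_rms / env, 20.0))  # cap the boost at about +26 dB
    return out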

Smith stated that the audio system in medical products must be of a higher grade than those in consumer products. This limits design and component selection. The power budget for the defibrillator was about 1 W, since it is a portable device. Smith demonstrated how a few intelligent design changes greatly improved the loudness and clarity of the playback system. Actual voice actors recorded the voice prompts in twenty-one languages, and great care was taken to achieve the proper urgency, cadence, delivery and diction. These recordings were made with later compression and encoding in mind.

Smith ran the older unit (with a simulator) to hear all the prompts, then ran the newer unit, which had obvious improvements in loudness and clarity. Although both used the same audio amplifier, there was about a 12-dB difference. A modest amount of additional memory was also needed to supplement the sound.

At the end of the meeting, a grab bag of door prizes was distributed to the 14 attendees. They included some items from the AES Amsterdam Convention, courtesy of Rick Chinn, and plenty of Medtronic novelties, such as keychains with CPR moisture barriers inside. Smith's lecture served as a reminder that audio should be a


challenge for hardware and software designers in that portable devices have limited processing power and memory, require voice independence, and must be capable of effectively detecting speech. The system architecture must be able to manage and address issues of audio level clipping, signal-to-noise, fidelity, microphone mounting and noise. To demonstrate, Anandpura showed an e.Digital-designed music player with a 20 GB hard drive, which can store thousands of sound files. This hand-held device was able to accurately recognize the artist and album names spoken to it by several audience members.
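The report does not say how the player matches spoken names; a classic approach for small-vocabulary, speaker-trained recognition is template matching with dynamic time warping (DTW) over feature vectors. The sketch below is illustrative, with placeholder feature data, and is not presented as e.Digital's method.

# Illustrative dynamic time warping (DTW) template match, a classic technique
# for small-vocabulary, speaker-trained recognition. The feature vectors are
# placeholders; a real system would use e.g. MFCC frames.
import math

def dtw_distance(a, b):
    """DTW distance between two sequences of equal-length feature vectors."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])   # local frame distance
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def recognize(utterance, templates):
    """Return the template label with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))

if __name__ == "__main__":
    templates = {                      # hypothetical stored voice-profile templates
        "miles davis": [(0.1, 0.9), (0.4, 0.7), (0.8, 0.2)],
        "john coltrane": [(0.9, 0.1), (0.5, 0.5), (0.2, 0.8)],
    }
    spoken = [(0.12, 0.85), (0.45, 0.68), (0.79, 0.25)]   # a noisy repetition
    print(recognize(spoken, templates))                   # -> "miles davis"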

Finally, Anandpura demonstrated dictation in Microsoft Word using Palmer's voice profile, which produced humorous results. For comparison, a female audience member also read the same passage of text. As expected, the demonstration showed that when using Palmer's profile, the software more accurately recognized his voice over the other voices.

Following such a survey of recent advances in speech-recognition technologies, it was clear that although PC applications are able to take advantage of extensive processing power in dictation and command-and-control functions, there are several more recent and far less known advances that have been made in non-PC embedded designs, such as the portable audio device shown during the meeting. The group thanked Anandpura for such a thoughtful and informative overview. The slide presentation for this meeting is available for downloading from the section Web site at:

www.aes.org/sections/la/.
Chris Palmer

Soundwire in SF
Forty-five people attended the May meeting of the San Francisco Section, held at CCRMA (Center for Computer Research in Music and Acoustics) at Stanford University.

trol functionality. The purpose of this was to provide a contrast to the inherent challenges in handling speech and voice navigation on non-PC, embedded, audio device applications.

Guest speaker Atul Anandpura provided an overview of the input and control devices used in portable devices, such as keypads, touchscreens, styluses, hand gloves, and eye-tracking devices. As the methods of inputting data have grown, so has the use of high-capacity compact storage. The proliferation of mass media storage of audio and music files has created both an opportunity and a challenge for hardware developers and manufacturers. Locating thousands of audio clips quickly on small LCD displays has opened new avenues for the use of speech recognition.

Historically, voice recognition technology includes the early work of Alexander Graham Bell using visible speech for the hearing impaired, the 1950s Bell Labs 10-digit recognition systems, and, in more recent history, PC dictation software applications that are able to recognize over 100 000 words. Today, it is possible to navigate data on embedded portable devices in a structured hierarchy that could include such information as genre, artist, album and title. Transport controls such as play can be voice activated. This is quite a

that occur in an increasingly global business and entertainment world. Speech recognition is one tool that can assist in breaking down some of the barriers to communication by offering a hands-free input option, versus the more conventional keyboard input.

Over the past ten years, numerous PC speech recognition applications have been introduced into the market. In general, these systems rely on comparison of phoneme-based acoustic patterns. Phonemes are the smallest units of speech sound from which recognition profile libraries are created. Almost all speech recognition software packages and many word processing applications are able to support continuous dictation as well as command-and-control functionality. In order to prepare for accepting voice input, the software trains itself by analyzing samples of the user's voice. In general, the more training the user provides, the more accurate the speech recognition will be. Most applications claim between 80 and 90 percent accuracy. This is based on a combination of accurate voice profiles, proper headset microphone technique and extensive dictation training. Palmer read a paragraph of text in order to demonstrate continuous speech recognition based on his voice profile. He also showed how menus, or even macros, could be controlled using command-and-con-


Professor Chris Chafe (left) and Scott Wilson (seated) explain Soundwire at San Francisco meeting.


ABOUT PEOPLE…

The IEEE has named AES fellows Richard H. Small and Neville Thiele co-recipients of the 2003 IEEE Masaru Ibuka Consumer Electronics Award for their contributions to the synthesis and analysis of loudspeakers. Sponsored by the Sony Corporation, the award honors the efforts of individuals in the field of consumer electronics technology. The award was presented to Small and Thiele on June 18 at the 2003 IEEE International Conference on Consumer Electronics in Los Angeles.

Small, a senior principal engineer at Harman/Becker Automotive Systems in Martinsville, Indiana, and Thiele, a consulting engineer in Sydney, Australia, worked to revolutionize loudspeaker design through their development of the Thiele-Small (TS) parameters.

For more than 40 years, TS parameters have been the de facto criteria for assessing loudspeaker performance. This unified approach analyzes the electromechanical behavior of a loudspeaker through the interaction of its components and with the air inside and outside the loudspeaker cabinet. The resulting equation is mathematically identical to that describing a circuit, so the sound produced by the loudspeaker can be obtained using a simple circuit analysis. By employing the TS parameters in computer models, users can design the loudspeaker/cabinet interface without having to manually build a loudspeaker cabinet.
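For readers who want the flavor of the parameters themselves, a few of the standard textbook Thiele-Small relations are shown below; these are the conventional definitions, not equations reproduced from the award citation.

% Standard textbook Thiele-Small relations (conventional definitions).
\begin{align}
  f_s &= \frac{1}{2\pi\sqrt{M_{ms}\,C_{ms}}}       && \text{driver free-air resonance}\\
  Q_{ts} &= \frac{Q_{es}\,Q_{ms}}{Q_{es}+Q_{ms}}   && \text{total Q from electrical and mechanical Q}\\
  V_{as} &= \rho_0\,c^2\,S_d^{\,2}\,C_{ms}         && \text{compliance-equivalent air volume}
\end{align}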

A senior member of the IEEE, Small is also a member of the Institution of Engineers Australia. He has received the AES's Publication Award, Silver Medal and Gold Medal. He earned his master's degree in electrical engineering from the Massachusetts Institute of Technology and his doctorate, for research on direct-radiator electrodynamic loudspeaker systems, from the University of Sydney.

Thiele has published 37 papers and has been awarded the Norman W. V. Hayes Medals of the Institution of Radio and Electronics Engineers, Australia, for best papers published in the Institution's Proceedings. His other honors include the Institution's Award of Honour and the Silver Medal of the Audio Engineering Society. He is an active member of the ITU-R's Australian National Study Group and the Committee on Digital Audio and Video, Standards Australia. Thiele earned his bachelor's degree in mechanical and electrical engineering from the University of Sydney.

Jim Anderson, AES vice president, Eastern Region, USA/Canada, has been appointed visiting professor in the newly created Clive Davis School for Recorded Music at New York University. The Clive Davis Department of Recorded Music offers a course of study leading to a Bachelor of Fine Arts that is designed to educate students in all aspects of contemporary recorded music, with a special focus on the art of identifying musical talent and developing creative material within the complex range of recorded music technologies.

The program, the first of its kind in the country, recognizes creative record producers as artists in their own right and musical recording itself as a creative medium. Anderson has been active on the New York recording scene for the past 23 years and

Professor Chris Chafe and Scott Wilson, Ph.D. candidate, explained their research project, SoundWIRE (Sound Waves on the Internet from Real-time Echoes).

The principle of streaming Internet audio is to digitize sounds, break up the data stream into packets, and send the packets through the Internet. Packets must be reassembled at the receiving end.
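As a toy illustration of that packetize-and-reassemble step (my own sketch, not SoundWIRE code), the fragment below tags frames with sequence numbers and lets a simple receiver rebuild the stream, filling any lost packet with silence.

# Toy packetizer/reassembler illustrating the principle described above.
# Real systems add timestamps, jitter buffers and concealment, but the
# sequence-number bookkeeping is the same basic idea.

def packetize(samples, frame_size=128):
    """Split a sample list into (sequence_number, frame) packets."""
    return [
        (seq, samples[i:i + frame_size])
        for seq, i in enumerate(range(0, len(samples), frame_size))
    ]

def reassemble(packets, frame_size=128):
    """Rebuild the stream; missing packets are replaced with silence."""
    received = dict(packets)                        # packets may arrive out of order
    last_seq = max(received) if received else -1
    stream = []
    for seq in range(last_seq + 1):
        stream.extend(received.get(seq, [0] * frame_size))
    return stream

if __name__ == "__main__":
    original = list(range(1000))
    packets = packetize(original)
    del packets[3]                                  # simulate one lost packet
    packets.reverse()                               # simulate out-of-order arrival
    rebuilt = reassemble(packets)
    print(len(rebuilt), rebuilt[400:405])           # packet 3 (samples 384-511) is now silence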

The main problems with streaming Internet audio are time delays (latency) and lost data. To avoid problems of latency and data loss, universities and research labs across the USA are developing an improved, next-generation data network, known as Internet2. On advanced networks, audio signals travel at close to the speed of light, and data loss is minimized.

The SoundWIRE research project revolves around a software application by the same name. The SoundWIRE utility is a prototype system for interactive audio, which makes full use of next-generation networks. SoundWIRE uses an elegant method for testing data networks, called sonification. An audible "ping" is bounced back and forth between network hosts. The data network is treated as an acoustic environment, such as a concert hall. The listener can easily detect problems with sound quality by listening to reflections.
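To give a feel for the sonification idea (my own sketch, not the SoundWIRE utility), the round-trip time of a recirculating ping can be read as the dimension of an equivalent acoustic space, and the per-trip attenuation as its decay.

# Sketch of the "network as a room" reading of a recirculating ping.
# Illustrative only; the real SoundWIRE utility streams audio between hosts.
import math

SPEED_OF_SOUND = 343.0   # m/s in air

def equivalent_path_m(rtt_s: float) -> float:
    """Acoustic path whose round trip takes as long as the network RTT."""
    return SPEED_OF_SOUND * rtt_s / 2.0

def echo_train(rtt_s: float, feedback_gain: float, floor_db: float = -60.0):
    """Times and levels of successive echoes until they fall below floor_db."""
    echoes, level_db, t = [], 0.0, 0.0
    while level_db > floor_db:
        echoes.append((round(t, 4), round(level_db, 1)))
        t += rtt_s
        level_db += 20.0 * math.log10(feedback_gain)
    return echoes

if __name__ == "__main__":
    rtt = 0.044                      # 44-ms round trip, a plausible wide-area figure
    print(f"equivalent one-way path: {equivalent_path_m(rtt):.1f} m")   # ~7.5 m
    for t, db in echo_train(rtt, feedback_gain=0.7)[:5]:
        print(f"echo at {t * 1000:5.1f} ms, {db:6.1f} dB")
    # Jitter smears the echo spacing and dropped packets punch holes in the
    # decay, which is exactly what a listener hears as degraded "room" quality.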

High-quality, near real-time streaming audio on the Internet is moving from theory to reality. Near real-time, CD-quality, bidirectional, multichannel music has been demonstrated at several performances, featuring artists playing hundreds of kilometers apart.

One consequence of high-quality interactive audio over wide-area networks is shared acoustics. By combining sounds from several locations, the acoustic environments are also shared.

In a question-and-answer session after the presentation, the problem of security was raised. Other questions included what measures can be taken to keep unauthorized people from accessing the data stream and how security will affect quality.

Paul Howard



prior to that was on staff at National Public Radio.

Anderson is facilities chair of the 115th Convention in New York slated for this October.

Neutrik AG, a sustaining member of the AES, announces the appointment of Heinrich Zant as international sales manager. Zant takes over responsibility for all sales activities from the company's headquarters in Schaan, Liechtenstein. Previously with AKG Acoustics in Vienna, Austria, Zant brings 28 years of experience as assistant sales director to his new position. To contact the company by phone, call +423 237 24 25 or e-mail:

[email protected].

AES sustaining member Cadac Electronics Plc of Luton, UK, announces the appointment of Thomas Bensen and GLS Marketing as its two new sales representatives in the United States.

Bensen is a familiar face in the pro audio industry. With over 25 years of experience with a wide variety of leading brands, he has both sound reinforcement expertise and a knowledge of marketing. He will be handling the sales of Cadac consoles on the East Coast.

On the West Coast, GLS Marketing offers a mix of technical skills and creative sales and marketing techniques. Headed by president Greg Hockman with Ron Thomas as general manager, the company was established in 1981 and operates out of Huntington Beach, California, supporting southern California, southern Nevada, Hawaii and Arizona.

Both Bensen and GLS will represent Cadac's J-Type and F-Type Live Production Consoles, the M-Type Monitor Board, and the R-Type Lightweight Touring Console, as well as the newly launched S-Type Live Performance Mixing Console. For information: www.cadac-sound.com.

COURSES, SEMINARS…

National Instruments and Endevco have just announced a new seminar on "Measuring Shock and Vibration: Fundamentals Through Advanced Technology." The seminar will feature Patrick L. Walter, senior technologist of Dynamic Instrumentation at Endevco and professor of engineering at TCU.

Walter will discuss the fundamentals of shock and vibration measurements and advanced topics such as TEDS "smart transducers" and the handling and processing of their signals. Subjects will include the physics of how accelerometers work; how to define the right accelerometer and system requirements based on measurement applications; new technologies, such as IEEE P1451.4 TEDS, Plug and Play Sensors and more; and several interactive demonstrations. The seminar is designed for managers, engineers and technicians who work with or are involved in shock and vibration measurements and accelerometers.

For more information on the locations and dates of this and other National Instruments seminars, visit: www.ni.com/seminars.

A short course in "Underwater Acoustics and Signal Processing" will be given at the Penn State Conference Center Hotel in State College, Pennsylvania, from September 29 to October 3. Presented by Penn State's Applied Research Laboratory (ARL), the program offers participants the opportunity to gain a practical understanding of fundamental concepts and of current research and development activities in the field of underwater acoustics and signal processing.

For information, call: 814-863-5100 or e-mail:

[email protected].

2003 TEC AWARDS

The Mix Foundation for Excellence in Audio has announced the nominees for the 19th Annual Technical Excellence & Creativity Awards, honoring outstanding technical and creative achievement in professional audio recording and sound production. The awards ceremony will be held on Saturday, October 11, at the New York Marriott Marquis, during the 115th AES Convention. The event is expected to attract more than 700 audio professionals from around the world.

A panel of 140 audio industry professionals who reviewed the products, facilities and recording and/or broadcast projects completed between March 1, 2002 and April 1, 2003 made nominations in 24 categories. These categories comprised 17 areas of technical achievement in product design and seven in creative achievement in audio production. Winners will be chosen by the 41 000 BPA-qualified Mix subscribers who cast the ballots bound in the August issue of the magazine.

Proceeds of the ceremony are donated to organizations working for the prevention of noise-induced hearing loss and to scholarships for students of the audio arts and sciences. For complete information on the awards and a list of nominees, visit:

www.mixfoundation.org or contact Karen Dunn, executive director, at 925-939-6149 or e-mail: [email protected].

WEB OF SCIENCE

The National Diet Library (NDL) in Japan has purchased the entire Web of Science® content file, whose archives date back to 1945. An integral part of Thomson ISI's grand database, the Web of Knowledge, the Web of Science is a powerful Web-based resource that enables users to search current and retrospective multidisciplinary information from more than 8500 of the world's most prestigious scholarly journals.

The complete version of the content file includes Science Citation Index Expanded®, Social Sciences Citation Index®, and the Arts & Humanities Citation Index™. This research tool allows users to navigate through the multidisciplinary literature to uncover all information relevant to their work. Cited reference searching also allows researchers to learn who is citing their work and the impact they, or their colleagues, are having on the global research community.

The NDL is the only national library in Japan. It was established in 1948 by the National Diet Library Law. The NDL's mission is to provide library


MULTICHANNEL MONITORING SYSTEM is designed for post-production suites and recording studios with control rooms measuring less than 3000 cubic feet (W x D x H). The 1029.LSE PowerPak™ consists of five Genelec 1029A two-way, bi-amplified active monitors, one 7060A LSE™ Series active subwoofer and an Acousti/Tape™ frequency/wavelength measuring tape. A 1029.LSE PowerPak setup guide is included for accurate loudspeaker placement, wiring, and fine-tuning. Genelec Inc., 7 Tech Circle, Natick, MA 01760, USA; tel. +1 508 652 0900; fax +1 508 652 0909; Web site www.genelec.com.

SOFTWARE PLUG-IN is a digital stereo delay and phase sampler. Capable of generating a wide variety of flange, phase, and other delay-based effects, the plug-in also incorporates a precise tape saturation algorithm, as well as simulation of vintage tape machine delays. A flexible modulation section allows the user to fine-tune effects to add subtle animation and movement to the processed signal. Tape saturation and high-end absorption effects are available, and the freely adjustable delay-line sampling rate can be used to create smooth-sounding low-fidelity processing. Lexicon, 3 Oak Park, Bedford, MA 01730-1441, USA; tel. +1 781 280 0300; fax +1 781 280 0490; e-mail [email protected]; Web site www.lexicon.com.

AES SUSTAINING MEMBER

NOISE-FREE HEADPHONES for portable music players reduce annoying ambient noise. The PX 250 dynamic stereo mini headphones employ sealed ear cups, snug, comfortably fitting ear pads and switchable NoiseGard™ active noise compensation. Bass tube technology delivers punchy bass while adaptive baffle damping (patent pending) ensures a smooth, detailed sound across the frequency range from 10 Hz to 21 kHz. Soft ring ear pads and closed ear cups offer passive attenuation, reducing undesirable ambient noise at frequencies above 1200 Hz by 15 dB to 25 dB. Sennheiser Electronic Corporation, 1 Enterprise Drive, Old Lyme, CT 06371, USA; tel. +1 860 434 9190; fax +1 860 434 1759; Web site www.sennheiserusa.com.

SURROUND TOOLKIT offers a complete set of tools for surround audio production to industry standards. This suite of tools is being released exclusively on DSP for Digidesign® Pro Tools|HD® and MIX™ systems on the Macintosh® platform. The 360-degree Surround Toolkit consists of seven surround tools for surround localization, spatializa-

NEW PRODUCTS AND DEVELOPMENTS
Product information is provided as a service to our readers. Contact manufacturers directly for additional information and please refer to the Journal of the Audio Engineering Society.


services for the executive and judicial branches of the national government as well as the general public. As the only depository library in Japan, the library acquires all materials published in Japan, preserves them as national cultural heritage, compiles catalogs of these publications in a database or other format, and with these collections provides library services. With the purchase of these indices, NDL joins a growing list of other prestigious Japanese institutions to invest in the full Web of Science file for their research community.

ISI, part of The Thomson Corporation and based in Philadelphia, Pennsylvania, provides value-added information, software tools and applications to business and professional customers in the fields of scientific research and healthcare, law, tax, accounting, financial services, higher education, reference information, corporate training and assessment. For more information about ISI, visit: www.isinet.com.

SOUND RECORDING EXHIBIT

The Museum of Sound Recording is proud to announce the installation of a permanent public exhibit entitled "Making Tracks," a celebration of the development of multitrack recording. The exhibit's showcase is located in the renovated gallery of the RKO Keith Theater in Queens, New York. As a museum exhibit, all equipment is intended to be fully operable by staff. In this way, the exhibit will continue to evolve, deepen and grow in usefulness for the public and industry.

The main consoles used for operation of the exhibit will be an MCI 32X24, for equipment up to 1980, and a small WE console representing systems from the early '50s. Some of the equipment on display includes: a WE 6X1 console; Ampex 300; Ampex 300/351; an Ampex 300/351 8-track, 1-in deck; 3M 1-in 8-track; Ampex 351 1/4-in 2-track; Scully 280 1/2-in 4-track; Scully 280 1/4-in 2-track; three Altec A7s; an EV bass reflex loudspeaker; Eventide, Deltalab, Urei, and Altec outboard gear; and more. Anyone who wishes to participate should contact Dan Gaydos at 718-441-6767 or 718-794-1183.



tion, and envelopment with enhanced panning, reverberation, and dynamics. The Surround Manager allows the calibration of a studio setup to all industry-standard surround release formats and includes flexible bass management. Surround Reverb includes six channels of de-correlated reverberation with special front and rear surround control. Surround Imager, Limiter, Compressor and Mixdown functions are also included in the package. Waves, 306 West Depot Road, Suite 100, Knoxville, TN 37917, USA; tel. +1 865 546 6115; fax +1 865 546 8445; e-mail [email protected]; Web site www.waves.com.

HANDHELD VOCAL MICROPHONE is the first model to join the Shure SM line in over ten years. The new SM86 has a cardioid polar pattern with a wide frequency response of 50 Hz to 18 kHz. The microphone delivers high gain-before-feedback and a tailored frequency response for clear reproduction of vocals. The SM86 is equipped with a two-stage windscreen and pop filter and has a built-in, three-point shock mount, which virtually eliminates stand and handling noise. The microphone requires phantom power for operation and is housed in a rugged, silver-colored, enamel-painted enclosure incorporating a steel-mesh grille. Shure Inc., 222 Hartrey Avenue, Evanston, IL 60202, USA; tel. +1 847 866 2200; Web site www.shure.com.

AUDIO DSP 96/24 SOLUTION features a certified decoder as well as a complete software system that includes auto-detection, input/output (I/O) and stream management. The Aureus™ Audio DSP performs these functions using only 30 to 40 percent of the DSP. Manufacturers can use the additional performance to differentiate their products by adding post-processing features such as dual digital zones, loudspeaker virtualization, and automatic room correction. Features of the DSP 96/24 include the ability to provide 5.1 channels of 96/24 along with full-motion video on DVD-Video and DVD-Audio, compatibility with all DVD-Video players, and accessibility through the digital output. Texas Instruments Inc., Semiconductor Group, SC-01156, Literature Response Center, P.O. Box 954, Santa Clarita, CA 91380, USA; tel. 1 800 477 8924 ext. 4500 (toll free); Web site www.ti.com.

RECORDING CONSOLE combines the best of early recording technology with surround sound, 5.1 mix, and playback facilities. The Series 80 5.1 is based on the original Trident Audio Developments Series 80 but offers more auxiliary sends and stereo returns, routing, and equalization. The center master section has two-channel high-definition equalization and a sonicomp compressor/limiter to produce a finalized two-mix master straight from the board. A new feature allows the user to combine up to 24 Century Modules from the Oram BEQ PRO-24 Series consoles into the same Series 80 5.1 frame. A moving fader and mute automation system is available as an option. Trident Audio Products, Oram Pro Audio, London, UK; tel. +44 1474 815 300; e-mail [email protected]; Web site www.oram.co.uk.

PROFESSIONAL SYSTEM FOR DAB MULTIPLEXERS supports the operation of a DAB ensemble multiplexer, which is a central element in a DAB transmission chain. The Redundancy DAB Ensemble Multiplexer R&S DM001-R helps ensure high availability and minimize transmitter failures, which would impair the operation of the entire ensemble. The system also supports switchover in the case of instrument failures; for example, if the ETI output signal or the entire AC supply of the master multiplexer fails. For servicing, operation can be switched to the standby multiplexer without interrupting transmitter operation. In addition, new software for multiplexer monitoring checks a number of error conditions simultaneously. Rohde & Schwarz GmbH & Co. KG, Mühldorfstr. 15, D-81671 München, Germany; tel. +49 89 4129 13779; fax +49 89 4129 13777; e-mail [email protected].



Upcoming Meetings

•2003 October 10-13: AES 115th Convention, Jacob K. Javits Convention Center, New York, NY, USA. See p. 880 for details.

•2003 October 20-23: NAB Europe Radio Conference, Prague, Czech Republic. Contact Mark Rebholz (202) 429-3191 or e-mail: [email protected].

•2003 October 30-November 1: Broadcast India 2003 Exhibition, World Trade Centre, Mumbai, India. For information contact Kavita Meer, director, Saicom Trade Fairs & Exhibitions Pvt. Ltd., tel: +(91-22) 2215 1396, fax: +(91-22) 2215 1269.

•2003 October 30-31: Autumn Meeting of the Swiss Acoustical Society, Basel, Switzerland. Contact SGA-SSA, c/o Akustik, Suva, P.O. Box 4358, 6002 Lucerne, Switzerland; fax: +41 419 62 13; on the Web: www.sga-ssa.ch.

•2003 November 5-6: Institute of Acoustics (UK) Autumn Conference, Oxford, UK. Contact 77A St. Peter's St., St. Albans, Hertfordshire AL1 3BN, UK. Fax: +44 1727 850553, or on the Web: www.ioa.org.uk.

•2003 November 10-14: 146th Meeting of the Acoustical Society of America, Austin, TX. For information contact tel: 516-576-2360, fax: 516-576-2377, or on the Internet: asa.aip.org.

•2004 May 8-11: AES 116th Convention, Messe Berlin, Berlin, Germany. Contact e-mail: [email protected]. See page 880 for details.


With the growth and proliferation of computer-based digital audio workstations (DAWs), all this has changed, however. Today, a record project can be accomplished by a single "home recordist," working literally in a "bedroom" from start to finish. Recording, tracking, mixing, mastering, and even CD replication all can be done within a relatively inexpensive computer system using readily available, off-the-shelf hardware and software.

Nonetheless, it is still the final stage of mastering that will determine the ultimate sound of the recording project. So, whether done by a "specialist" or a home-studio engineer, this continues to demand critical skill and talent. Understanding these skills is the focus of this book by Bob Katz.

In the book's 22 chapters and 13 appendices, Katz presents a text that is clear and easy to understand, with just enough mathematics and "technical jargon" to explain the technology behind the theory, philosophy, and methodology of the mastering processes, but without overwhelming the reader. Ample diagrams (many of which are "screen shots" directly from the DAW), appropriate quotations, and "debugging" of common myths further enhance the explanations and make for very enjoyable reading.

Perhaps the most significant sentence in the book is contained in the very first chapter: "Attention to detail: the last 10% of the job takes 90% of the time." With this as the underlying theme for his presentation, the book is divided into four major sections (plus the appendices) entitled "Preparation," "Mastering Techniques," "Advanced Theory and Practice," and "Out of the Jungle." Each section and chapter focuses on the "attention to detail" necessary to achieve the goal of an excellent final record release.

"Preparation" describes all of the initial work necessary to begin the project, such as proper logs, the equipment required and how to configure it efficiently, developing good listening skills, appropriate selection and use of dither, and the procedures for maintaining proper signal levels. "Myth: Normalization makes the song levels correct."
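Katz's quoted myth is easy to demonstrate numerically: peak normalization equalizes maxima, not perceived level. The quick sketch below is mine, not taken from the book.

# Quick numeric illustration of the quoted myth: two tracks peak-normalized to
# the same ceiling can still differ widely in average (and perceived) level.
import math

def peak_normalize(samples, ceiling=0.891):          # 0.891 is about -1 dBFS
    peak = max(abs(s) for s in samples)
    return [s * ceiling / peak for s in samples]

def rms_dbfs(samples):
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms)

if __name__ == "__main__":
    dense = [0.5 * math.sin(2 * math.pi * 220 * n / 44100) for n in range(44100)]
    sparse = [0.0] * 44100
    sparse[100] = 0.9                                 # a single transient "click"
    for name, sig in (("dense sine", dense), ("sparse transient", sparse)):
        print(name, f"{rms_dbfs(peak_normalize(sig)):.1f} dBFS RMS")
    # Both now peak at -1 dBFS, yet their RMS levels differ by tens of dB:
    # matching peaks does not make "the song levels correct."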

"Mastering Techniques" presents the meat of the methodology: specific criteria for final assembly of the tracks, equalization and other forms of signal processing, noise reduction techniques, and dynamic and spatial manipulations. "It's not how loud you make it, it is how you make it LOUD."

"Advanced Theory and Practice" focuses on techniques for producing surround-sound recordings, additional details relating to the various digital signal formats (word length and bit depth), and methods for dealing with digital-specific problems such as format conversion, jitter, metering and monitoring, etc. "Making good SOUND is like preparing GOOD FOOD: if you overcook it, it loses its taste."

"Out of the Jungle" brings the reader back to the practical world and relates some of Katz's personal experiences and recommendations for creating: "A world which recognizes craft and training in audio itself which is not distaining; where overall excellence is what we seek and art comes from long-worked technique."

The appendices provide a wealth of information in themselves: in-

AVAILABLE LITERATURE
The opinions expressed are those of the individual reviewers and are not necessarily endorsed by the Editors of the Journal.

MASTERING AUDIO: THE ART AND THE SCIENCE, by Bob Katz, Focal Press, Burlington, MA, USA, 2002, 318 pages, $39.99, ISBN: 0-240-80545-3.

Mastering has long been considered one of the "secret arts" in the varied and complex process of producing a commercial recording. Traditionally, this has been done behind closed doors, most often by a "specialist" who heretofore has not been involved in the recording project, but whose technical and artistic judgment ultimately will determine what the record-buying public will hear coming from their loudspeakers. The mastering engineer therefore has become the "person preeminent in a discipline … with the ability or power to control" the sound of the final record product. (Definitions quoted from the Random House/Webster College Dictionary, 1995 edition.)

Throughout the "vinyl era," mastering suites employed expensive and highly specialized equipment not always found even in the most elaborate recording studio control rooms: equipment specifically designed to tailor the frequency response and dynamic range to fit within the limitations of the vinyl disc and keep the stylus in the groove (such as tape playback machines with "preview" heads, low-frequency "crossover equalizers," and vertical/lateral phase-monitor oscilloscopes). Even in the more modern times of CD production, special digital playback devices and converter/processors have been required for generating the master tape and/or disc that ultimately goes on to the replication facility.




cluded are several essays, a further explanation of various audio-file formats, formats and recommendations for preparation of logs and other documentation, recommended readings, a listing of relevant technical CDs appropriate for ear training and calibrating the audio production system, and an extensive glossary. "Mastering is the art of compromise … Practice is the best of all instructions."

Ron Streicher
Pasadena, CA

IN BRIEF AND OF INTEREST…

Producing in the Home Studio with Pro Tools, Second Edition, by David Franz (Berklee Press) is a 282-page guide to professional home studio production using industry-standard Pro Tools® digital audio software.

The first edition of the book became a top choice in instruction for home studio-based artists, producers and engineers looking to learn or hone production skills and techniques and to achieve top-quality recordings. The second edition is fully updated to cover the new features of Pro Tools 6 software and Digidesign home studio hardware products, including Digi 002, Mbox, Digi 001 and Digi Toolbox XP.

The book contains four major sections, Getting Started, Preproduction, Production and Postproduction, as well as several appendices for range charts, Web sites, technical support, equipment manufacturers and recommended reading.

The author walks the reader through all steps of recording, mixing and mastering. The new edition also features a chapter on digital audio editing and new hands-on projects in every chapter. These projects follow, from start to finish, four songs through the complete production process. In addition, there are updated, more advanced Pro Tools sessions with audio examples that explain many in-demand Pro Tools techniques, such as how to comp a vocal track, set up an effects send, tune a vocal track, edit a voiceover track, design sounds using plug-ins, and more. QuickTime movies on the enclosed CD-ROM explain several of the techniques. Clear illustrations also enhance reader comprehension of the myriad recording processes.

Franz is a songwriter, record producer, engineer, multi-instrumentalist, arranger, orchestrator, performing artist, studio musician, author and instructor. He earned bachelor's and master's degrees in industrial and systems engineering from Virginia Tech and attended the Berklee College of Music, where he studied music production and engineering. He is also the author of an online Pro Tools course available through Berklee Media at: www.berkleemusic.com. Producing in the Home Studio with Pro Tools, Second Edition, is available from Berklee Press at: www.berkleepress.com. Price is $34.95 (soft cover). ISBN: 0-87639-008-4. Berklee Press, 1140 Boylston Street, Boston, MA 02215-3693, USA; tel: 617-747-2146, fax: 617-747-2149, Internet: www.berkleepress.com.


THE PROCEEDINGS OF THE AES 23RD INTERNATIONAL CONFERENCE
Signal Processing in Audio Recording and Reproduction
2003 May 23–25, Copenhagen, Denmark

These 22 papers focus on sound recording and reproduction from microphone to loudspeaker, including the interaction between loudspeaker and room. 291 pages.

You can purchase the book and CD-ROM online at www.aes.org. For more information e-mail Andy Veloz at [email protected] or telephone +1 212 661 8528 x39.


AUDIO ENGINEERING SOCIETY
CALL FOR PAPERS: AES 25th Conference, 2004
London, UK

Dates: June 17–19, 2004. Location: London, UK.
Chair: John Grant, Nine Tiles Networks, UK. Email: [email protected]

As the means for production and distribution of digital audio proliferate, appropriate metadata tools are needed to facilitate, control, and extend these activities. There has been a great deal of activity in individual organizations to develop metadata tools. However, substantial issues remain to be addressed before the desired goal of global exchange and common understanding can be reached. International standardization, such as the work of MPEG-7 and MPEG-21, may hold some important answers. This conference seeks to describe the state of the art, identify the issues, and indicate directions for the development of advanced metadata systems, both for consumer distribution and business-to-business. It will bring together media publishers and software designers, media librarians and archivists, database managers and streaming engineers, whose operations are increasingly dependent on the success of sophisticated metadata systems.

The AES 25th Conference Committee invites submission of technical papers for presentation at the conference in 2004 in London. By 2003 December 10, a proposed title, 60- to 120-word abstract, and 500- to 750-word precis of the paper should be submitted via the Internet to the AES 25th Conference paper-submission site at www.aes.org/25th_authors. You can visit this site for more information and complete instructions for using the site anytime after 2003 September 17. The author's information, title, abstract, and precis should all be submitted online. The precis should describe the work performed, methods employed, conclusion(s), and significance of the paper. Titles and abstracts should follow the guidelines in Information for Authors at www.aes.org/journal/con_infoauth.html. Acceptance of papers will be determined by the 25th Conference review committee based on an assessment of the abstract and precis.

PROPOSED TOPICS FOR PAPERS
Metadata in the broadcasting industry
Metadata in the recording industry
Metadata in libraries and archives
Management of metadata
Unique identification
Metadata implementations in equipment
Standards and related issues
Feature extraction, indexing, and retrieval
Biometrics
Voice recognition and speech-to-text
Universal multimedia access technologies
Metadata and audio formats
Character sets for metadata exchange
Usage environment description
Digital rights management
Controls on access to content
Watermarking and encryption
Content adaptation
Content personalization
Electronic program guides

SUBMISSION OF PAPERS SCHEDULE
Proposal deadline: 2003 December 10
Acceptance emailed: 2004 January 29
Paper deadline: 2004 April 7

Authors whose contributions have been accepted for presentation will receive additional instructions for submission of their manuscripts.

Please submit proposed title, abstract, and precis at www.aes.org/25th_authors no later than 2003 December 10. If you have any questions, contact:

PAPERS COCHAIRS
Gerhard Stoll, IRT, Munich, Germany
Russell Mason, University of Surrey, Guildford, UK
Email: [email protected]




Section symbols are: Aachen Student Section (AA), Adelaide (ADE), Alberta (AB), All-Russian State Institute of Cinematography(ARSIC), American River College (ARC), American University (AMU), Argentina (RA), Atlanta (AT), Austrian (AU), Ball StateUniversity (BSU), Belarus (BLS), Belgian (BEL), Belmont University (BU), Berklee College of Music (BCM), Berlin Student(BNS), Bosnia-Herzegovina (BA), Boston (BOS), Brazil (BZ), Brigham Young University (BYU), Brisbane (BRI), British (BR),Bulgarian (BG), Cal Poly San Luis Obispo State University (CPSLO), California State University–Chico (CSU), Carnegie MellonUniversity (CMU), Central German (CG), Central Indiana (CI), Chicago (CH), Chile (RCH), Citrus College (CTC), CogswellPolytechnical College (CPC), Colombia (COL), Colorado (CO), Columbia College (CC), Conservatoire de Paris Student (CPS),Conservatory of Recording Arts and Sciences (CRAS), Croatian (HR), Croatian Student (HRS), Czech (CR), Czech RepublicStudent (CRS), Danish (DA), Danish Student (DAS), Darmstadt (DMS), Denver/Student (DEN/S), Detmold Student (DS), Detroit(DET), District of Columbia (DC), Duquesne University (DU), Düsseldorf (DF), Expression Center for New Media (ECNM),Finnish (FIN), Fredonia (FRE), French (FR), Full Sail Real World Education (FS), Graz (GZ), Greek (GR), Hampton University(HPTU), Hong Kong (HK), Hungarian (HU), Ilmenau (IM), India (IND), Institute of Audio Research (IAR), Israel (IS), Italian(IT), Italian Student (ITS), Japan (JA), Kansas City (KC), Korea (RK), Lithuanian (LT), Long Beach/Student (LB/S), Los Angeles(LA), Louis Lumière (LL), Malaysia (MY), McGill University (MGU), Melbourne (MEL), Mexican (MEX), MichiganTechnological University (MTU), Middle Tennessee State University (MTSU), Moscow (MOS), Music Tech (MT), Nashville (NA),Netherlands (NE), Netherlands Student (NES), New Orleans (NO), New York (NY), North German (NG), Northeast CommunityCollege (NCC), Norwegian (NOR), Ohio University (OU), Pacific Northwest (PNW), Peabody Institute of Johns HopkinsUniversity (PI), Pennsylvania State University (PSU), Philadelphia (PHIL), Philippines (RP), Polish (POL), Portland (POR),Portugal (PT), Ridgewater College, Hutchinson Campus (RC), Romanian (ROM), Russian Academy of Music, Moscow (RAM/S),SAE Nashville (SAENA), St. Louis (STL), St. Petersburg (STP), St. Petersburg Student (STPS), San Diego (SD), San Diego StateUniversity (SDSU), San Francisco (SF), San Francisco State University (SFU), Serbia and Montenegro (SAM), Singapore (SGP),Slovakian Republic (SR), Slovenian (SL), South German (SG), Southwest Texas State University (STSU), Spanish (SPA), StanfordUniversity (SU), Swedish (SWE), Swiss (SWI), Sydney (SYD), Taller de Arte Sonoro, Caracas (TAS), Technical University ofGdansk (TUG), The Art Institute of Seattle (TAIS), Toronto (TOR), Turkey (TR), Ukrainian (UKR), University of Arkansas at PineBluff (UAPB), University of Cincinnati (UC), University of Hartford (UH), University of Illinois at Urbana-Champaign (UIUC),University of Luleå-Piteå (ULP), University of Massachusetts–Lowell (UL), University of Miami (UOM), University of NorthCarolina at Asheville (UNCA), University of Southern California (USC), Upper Midwest (UMW), Uruguay (ROU), Utah (UT),Vancouver (BC), Vancouver Student (BCS), Venezuela (VEN), Vienna (VI), West Michigan (WM), William Paterson University(WPU), Worcester Polytechnic Institute (WPI), Wroclaw University of Technology (WUT).

MEMBERSHIP INFORMATION

Elvezio Agostivia Borsi Giosue 14, IT 20143, Milano, Italy(IT)

Peter Alyea1101 E. Capitol St. SE, #2, Washington, DC20003 (DC)

Jack B. Andersen4825 David Ln. #634, Austin, TX 78749

Cedric AndrieuMotorola PCS/WB5G, BP 1029, FR 31023,Toulouse, France (FR)

Shigeru AokiFuda 1-40 #1108, Chofu-shi, Tokyo 102-8080, Japan (JA)

Larry Appelbaum1733 20th St. NW #301, Washington, D.C.20009 (DC)

Huseyin Selcuk ArtutSabanci University, Faculty of Arts andSocial Sciences, Orhanli-Tuzla, Istanbul,Turkey (TR)

Rathish BabuSAE Technology College, Parsn Paradise ABlock, 109 Gn Chetty Road T. Nagar,

Chennai 600017, India (IND)

Mingsian BaiDepartment of Mechanical Engineering,National Chiao-Tung University, 1001 Ta-Hsueh Rd., Hsin-Chu, Republic of China

Richard BarhamNational Physical Laboratory, Centre forMechanical & Acoustical Metrology,Teddington, Middlesex TW11 0LW, UK(BR)

Robert Barnard3680 Bowen Rd., Howeel, MI 48855 (DET)

John M. Bebbs11009 McKay Rd., Fort Washington, MD20744 (DC)

Gary Beebe235 Westridge Ave. E. #A14, Tacoma, WA98466 (PNW)

Fabio Blasizzovia Sottomonte 12, IT 33018, Tarvisio (VD),Italy (IT)

Bjarkt Pihl BovjergEngen 7, Lem St., DK 6940, Aalborg,Denmark (DA)

Michael BowadtRudolph Wulffsgade 10,4 th, DK 8000,Aarhus C, Denmark (DA)

Jan BruningsDanish Broadcast, TV-Byen, DK 2860,Soborg, Denmark (DA)

Giovanni Bugarivia Quarenghi 41, IT 20151, Milan, Italy (IT)

Dominik G. CampanaSimplex Grinnell LP, 1399 Visher Ferry Rd.,Clifton Park, NY 12065 (BOS)

Simon CarlsenSkogvgen 39 C, NO 9403, Harstad, Norway(NOR)

Francesco Castellottivia dei Missaglia 53, IT 20142, Milan (MI),Italy (IT)

Jung-Kuei Chang7201 Hart Ln. #2077, Austin, TX 78731

Sylvain ChoiselSindergade 42 St. Tv, DK 9000, Aalborg,Denmark (DA)

Joseph Clapp9 Hilltop Rd., Pembroke, MA 02359 (BOS)

MEMBERS

These listings represent new membership according to grade.



Shawn L. CoulterPanasonic Electronic Components, AcomMatsushita, 5105 S. National Dr., Knoxville,TN 37804 (NA)

Stefano Cucellivia Pezzotti 11, IT 20141, Milan (MI), Italy(IT)

Kemper J. Maas732 Jenifer St., Madison, WI 53703 (UMW)

Shuichi Maeda150 Pasito Terrace #603, Sunnyvale, CA94086 (SF)

Marco MassimiVicolo Gentili 1, IT 03011, Alatri, Italy (IT)

Edgar Matysio1880 Saskatchewan Dr., Regina, S4P 0B2,Saschatewan, Canada

Tim McCarthyFlat 2/1, 13 Grantley Gardens, Shawlands,Glasgow, Northumberland, G41 3PZ, UK(BR)

Michael McNeil502 Roxbury Dr., Safety Harbor, FL 34695

David B. McRell610 Willow Wood Dr. #114, Carol Stream,IL 60188 (CH)

Timothy McVey5471 Beechtree Dr., Warrenton, VA 20187(DC)

Jens McVoy49 Bogart St. #43, Brooklyn, NY 11206 (NY)

Armando MendesPaseo de Boliches 3, Planta 2 - 1B, ES11630, Arcos de la Frontera, Spain (SPA)

Bob MichaelsProduction Services, 2231 S. Carmelina Ave.,W. Los Angeles, CA 90064 (LA)

Michael Milbert15221 Arminio Ct., Darnestown, MD 20874(DC)

Masayuki Mimura6-5-404 Shimizu-cho, Nishinomiya-shi,Hyogo, 662-0033, Japan (JA)

George Moore6391 C Smithy Square, Glen Burnie, MD21061 (DC)

Gil G. Moreno4106 Dellbrook Dr., Tampa, FL 33624

Andri MunadiJLN Cipinang-Cempedak 4 RT08/03 No.41,Jakarta, Jakarta-Timur 13340, Indonesia

Bob MunizQSC Audio Products Inc., 1675 MacArthurBlvd., Costa Mesa, CA 92626 (LA)

Sang Wook NamSoundmirror, 265-24 JungSun B/D B1,Yanjaedong, Seochoku, Seoul 137130, Korea(RK)

Ken Nelson1224 Poplar St., Helena, MT 59601.

Vasilis NikolopoulosDios 23A N.Kifisia, GR 14564, Athens,Greece (GR)

Kyle Nineff499 Rosewood Ave., San Jose, CA 95117(SF)

Mikkel NymandGydevang 42-44, DK 3450, Allerod,Denmark (DA)

Karim Abdeselam Haasc/Mesena 106 5d, ES 28033, Madrid Spain

Conor Adams5401 S. Race Ct., Greenwood Village, CO80121 (DEN/S)

Bryan Adamson116 Mustang Dr., San Luis Obispo, CA93405 (CPSLO)

Mathieu Agee851 N. Hoye Ave. #1r, Chicago, IL 60622(CC)

Vincent Agne14625 9th Pl. NE, Seattle, WA 98155 (TAIS)

Aberto Aguirre330 Waymore #12, El Paso, TX 79902(STSU)

Jarrett R. Aitkens4306 Wabasso Rd., Garden Valley, CA95633 (ARC)

Muhammad Tayyad Ali Malik186 Mamdot Blk., Mustafa Town, WahdatRd., Lahhore, Punjab 54890, Pakistan

Raphael Allain1 rue Eugene Jumin, FR 75019, Paris, France(CPS)

Thomas Allen657 Waterloo St., London, N6B 2R6,Ontario, Canada

Jackie Alvarez6401 S. Cedar St., Littleton, CO 80120(DEN/S)

Dominic Ameneyro7650 Milldale Circle, Elverta, CA 95676(ARC)

Moritz AndreesenSchoenaugasse 111, AT 8010, Graz, Austria(GZ)

Kristian AnshelmAnskarskatavagen 89C, SE 941 34, Pitea,Sweden (ULP)

Bart Aromando203 E. 76th St., New York, NY 10021 (IAR)

Nenad ArsenijevicOrasacka 10/11, YU 11550, Lazarevac,Yugoslavia

Michael W. AthertonConservatory of Recording Arts & Sciences,2300 E. Broadway Rd., Tempe, AZ (CRAS)

STUDENTS

ASSOCIATES


Advertiser Internet Directory

BSWA Technology Co. Ltd. ..... 873, www.bswa-tech.com
*NEUTRIK AG ..... 861, www.neutrik.com
*Prism Media Products, Inc. ..... 859, www.prismsound.com
*SRS Labs, Inc. ..... 851, www.srslabs.com

*AES Sustaining Member.



EASTERN REGION,USA/CANADA

Vice President:Jim Anderson12 Garfield PlaceBrooklyn, NY 11215Tel. +1 718 369 7633Fax +1 718 669 7631E-mail [email protected]

UNITED STATES OFAMERICA

CONNECTICUTUniversity of HartfordSection (Student)Howard A. CanistraroFaculty AdvisorAES Student SectionUniversity of HartfordWard College of Technology200 Bloomfield Ave.West Hartford, CT 06117Tel. +1 860 768 5358Fax +1 860 768 5074 E-mail [email protected]

FLORIDA

Full Sail Real WorldEducation Section (Student)Bill Smith, Faculty AdvisorAES Student SectionFull Sail Real World Education3300 University Blvd., Suite 160Winter Park, FL 327922Tel. +1 800 679 0100E-mail [email protected]

University of Miami Section(Student)Ken Pohlmann, Faculty AdvisorAES Student SectionUniversity of MiamiSchool of MusicPO Box 248165Coral Gables, FL 33124-7610Tel. +1 305 284 6252Fax +1 305 284 4448E-mail [email protected]

GEORGIA

Atlanta SectionRobert Mason2712 Leslie Dr.Atlanta, GA 30345Home Tel. +1 770 908 1833E-mail [email protected]

MARYLAND

Peabody Institute of JohnsHopkins University Section

(Student)Neil Shade, Faculty AdvisorAES Student SectionPeabody Institute of Johns

Hopkins UniversityRecording Arts & Science Dept.2nd Floor Conservatory Bldg.1 E. Mount Vernon PlaceBaltimore, MD 21202Tel. +1 410 659 8100 ext. 1226E-mail [email protected]

MASSACHUSETTS

Berklee College of MusicSection (Student)Eric Reuter, Faculty AdvisorBerklee College of MusicAudio Engineering Societyc/o Student Activities1140 Boylston St., Box 82Boston, MA 02215Tel. +1 617 747 8251Fax +1 617 747 2179E-mail [email protected]

Boston SectionJ. Nelson Chadderdonc/o Oceanwave Consulting, Inc.21 Old Town Rd.Beverly, MA 01915Tel. +1 978 232 9535 x201Fax +1 978 232 9537E-mail [email protected]

University of Massachusetts–Lowell Section (Student)John Shirley, Faculty AdvisorAES Student ChapterUniversity of Massachusetts–LowellDept. of Music35 Wilder St., Ste. 3Lowell, MA 01854-3083Tel. +1 978 934 3886Fax +1 978 934 3034E-mail [email protected]

Worcester PolytechnicInstitute Section (Student) William MichalsonFaculty AdvisorAES Student SectionWorcester Polytechnic Institute100 Institute Rd.Worcester, MA 01609Tel. +1 508 831 5766E-mail [email protected]

NEW JERSEY

William Paterson UniversitySection (Student)David Kerzner, Faculty AdvisorAES Student SectionWilliam Paterson University

300 Pompton Rd.Wayne, NJ 07470-2103Tel. +1 973 720 3198Fax +1 973 720 2217E-mail [email protected]

NEW YORK

Fredonia Section (Student)Bernd Gottinger, Faculty AdvisorAES Student SectionSUNY–Fredonia1146 Mason HallFredonia, NY 14063Tel. +1 716 673 4634Fax +1 716 673 3154E-mail [email protected]

Institute of Audio ResearchSection (Student)Noel Smith, Faculty AdvisorAES Student SectionInstitute of Audio Research 64 University Pl.New York, NY 10003Tel. +1 212 677 7580Fax +1 212 677 6549E-mail [email protected]

New York SectionRobbin L. GheeslingBroadness, LLC265 Madison Ave., Second FloorNew York, NY 10016Tel. +1 212 818 1313Fax +1 212 818 1330E-mail [email protected]

NORTH CAROLINA

University of North Carolinaat Asheville Section (Student)Wayne J. KirbyFaculty AdvisorAES Student SectionUniversity of North Carolina at

AshevilleDept. of MusicOne University HeightsAsheville, NC 28804Tel. +1 828 251 6487Fax +1 828 253 4573E-mail [email protected]

PENNSYLVANIA

Carnegie Mellon UniversitySection (Student)Thomas SullivanFaculty AdvisorAES Student SectionCarnegie Mellon UniversityUniversity Center Box 122Pittsburg, PA 15213Tel. +1 412 268 3351E-mail [email protected]

Duquesne University Section(Student)Francisco RodriguezFaculty AdvisorAES Student SectionDuquesne UniversitySchool of Music600 Forbes Ave.Pittsburgh, PA 15282Tel. +1 412 434 1630Fax +1 412 396 5479E-mail [email protected]

Pennsylvania State UniversitySection (Student)Dan ValenteAES Penn State Student ChapterGraduate Program in Acoustics217 Applied Science Bldg.University Park, PA 16802Home Tel. +1 814 863 8282Fax +1 814 865 3119E-mail [email protected]

Philadelphia SectionRebecca MercuriP.O. Box 1166.Philadelphia, PA 19105Tel. +1 609 895 1375E-mail [email protected]

VIRGINIA

Hampton University Section(Student)Bob Ransom, Faculty AdvisorAES Student SectionHampton UniversityDept. of MusicHampton, VA 23668Office Tel. +1 757 727 5658,

+1 757 727 5404Home Tel. +1 757 826 0092Fax +1 757 727 5084E-mail [email protected]

WASHINGTON, DC

American University Section(Student)Benjamin TomassettiFaculty AdvisorAES Student SectionAmerican UniversityPhysics Dept.4400 Massachusetts Ave., N.W.Washington, DC 20016Tel. +1 202 885 2746Fax +1 202 885 2723E-mail [email protected]

District of Columbia SectionJohn W. ReiserDC AES Section SecretaryP.O. Box 169

SECTIONS CONTACTS DIRECTORY

The following is the latest information we have available for our sections contacts. If you wish to change the listing for your section, please mail, fax or e-mail the new information to: Mary Ellen Ilich, AES Publications Office, Audio Engineering Society, Inc., 60 East 42nd Street, Suite 2520, New York, NY 10165-2520, USA. Telephone +1 212 661 8528. Fax +1 212 661 7829. E-mail [email protected].

Updated information that is received by the first of the month will be published in the next month's Journal. Please help us to keep this information accurate and timely.


Mt. Vernon, VA 22121-0169Tel. +1 703 780 4824Fax +1 703 780 4214E-mail [email protected]

CANADA

McGill University Section(Student)John Klepko, Faculty AdvisorAES Student SectionMcGill UniversitySound Recording StudiosStrathcona Music Bldg.555 Sherbrooke St. W.Montreal, Quebec H3A 1E3CanadaTel. +1 514 398 4535 ext. 0454E-mail [email protected]

Toronto SectionAnne Reynolds606-50 Cosburn Ave.Toronto, Ontario M4K 2G8CanadaTel. +1 416 957 6204Fax +1 416 364 1310E-mail [email protected]

CENTRAL REGION,USA/CANADA

Vice President:Jim KaiserMaster Mix1921 Division St.Nashville, TN 37203Tel. +1 615 321 5970Fax +1 615 321 0764E-mail [email protected]

UNITED STATES OFAMERICA

ARKANSAS

University of Arkansas atPine Bluff Section (Student)Robert Elliott, Faculty AdvisorAES Student SectionMusic Dept. Univ. of Arkansasat Pine Bluff1200 N. University DrivePine Bluff, AR 71601Tel. +1 870 575 8916Fax +1 870 543 8108E-mail [email protected]

ILLINOIS

Chicago SectionRobert ZurekMotorola2001 N. Division St.Harvard, IL 60033Tel. +1 847 523 5399Fax +1 847 523 2519E-mail [email protected]

Columbia College Section(Student)Dominique J. ChéenneFaculty AdvisorAES Student Section676 N. LaSalle, Ste. 300Chicago, IL 60610Tel. +1 312 344 7802Fax +1 312 482 9083

University of Illinois atUrbana-Champaign Section(Student)David S. Petruncio Jr.AES Student SectionUniversity of Illinois, Urbana-

ChampaignUrbana, IL 61801Tel. +1 217 621 7586E-mail [email protected]

INDIANA

Ball State University Section(Student)Michael Pounds, Faculty AdvisorAES Student SectionBall State UniversityMET Studios2520 W. BethelMuncie, IN 47306Tel. +1 765 285 5537Fax +1 765 285 8768E-mail [email protected]

Central Indiana SectionJames LattaSound Around6349 Warren Ln.Brownsburg, IN 46112Office Tel. +1 317 852 8379Fax +1 317 858 8105E-mail [email protected]

KANSAS

Kansas City SectionJim MitchellCustom Distribution Limited12301 Riggs Rd.Overland Park, KS 66209Tel. +1 913 661 0131Fax +1 913 663 5662

LOUISIANA

New Orleans SectionJoseph DohertyFactory Masters4611 Magazine St.New Orleans, LA 70115Tel. +1 504 891 4424Cell +1 504 669 4571Fax +1 504 899 9262E-mail [email protected]

MICHIGAN

Detroit SectionTom ConlinDaimlerChryslerE-mail [email protected]

Michigan TechnologicalUniversity Section (Student)Andre LaRoucheAES Student SectionMichigan Technological

UniversityElectrical Engineering Dept.1400 Townsend Dr.Houghton, MI 49931Home Tel. +1 906 847 9324E-mail [email protected]

West Michigan SectionCarl HordykCalvin College3201 Burton S.E.Grand Rapids, MI 49546Tel. +1 616 957 6279

Fax +1 616 957 6469E-mail [email protected]

MINNESOTA

Music Tech College Section(Student)Michael McKernFaculty AdvisorAES Student SectionMusic Tech College19 Exchange Street EastSaint Paul, MN 55101Tel. +1 651 291 0177Fax +1 651 291 [email protected]

Ridgewater College,Hutchinson Campus Section(Student)Dave Igl, Faculty AdvisorAES Student SectionRidgewater College, Hutchinson

Campus2 Century Ave. S.E.Hutchinson, MN 55350E-mail [email protected]

Upper Midwest SectionGreg ReiersonRare Form Mastering4624 34th Avenue SouthMinneapolis, MN 55406Tel. +1 612 327 8750E-mail [email protected]

MISSOURI

St. Louis SectionJohn Nolan, Jr.693 Green Forest Dr.Fenton, MO 63026Tel./Fax +1 636 343 4765E-mail [email protected]

NEBRASKA

Northeast Community CollegeSection (Student)Anthony D. BeardsleeFaculty AdvisorAES Student SectionNortheast Community CollegeP.O. Box 469Norfolk, NE 68702Tel. +1 402 844 7365Fax +1 209 254 8282E-mail [email protected]

OHIO

Ohio University Section(Student)Erin M. DawesAES Student SectionOhio UniversityRTVC Bldg.9 S. College St.Athens, OH 45701-2979Home Tel. +1 740 597 6608E-mail [email protected]

University of Cincinnati Section (Student): Thomas A. Haines, Faculty Advisor, AES Student Section, University of Cincinnati, College-Conservatory of Music, M.L. 0003, Cincinnati, OH 45221. Tel. +1 513 556 9497. Fax +1 513 556 0202

TENNESSEE

Belmont University Section(Student)Wesley Bulla, Faculty AdvisorAES Student SectionBelmont UniversityNashville, TN 37212

Middle Tennessee StateUniversity Section (Student)Phil Shullo, Faculty AdvisorAES Student SectionMiddle Tennessee State University301 E. Main St., Box 21Murfreesboro, TN 37132Tel. +1 615 898 2553E-mail [email protected]

Nashville Section Tom EdwardsMTV Networks330 Commerce St.Nashville, TN 37201Tel. +1 615 335 8520Fax +1 615 335 8608E-mail [email protected]

SAE Nashville Section (Student)Larry Sterling, Faculty AdvisorAES Student Section7 Music Circle N.Nashville, TN 37203Tel. +1 615 244 5848Fax +1 615 244 3192E-mail [email protected]

TEXAS

Southwest Texas State University Section (Student): Mark C. Erickson, Faculty Advisor, AES Student Section, Southwest Texas State University, 224 N. Guadalupe St., San Marcos, TX 78666. Tel. +1 512 245 8451. Fax +1 512 396 1169. E-mail [email protected]

WESTERN REGION, USA/CANADA

Vice President: Bob Moses, Island Digital Media Group, LLC, 26510 Vashon Highway S.W., Vashon, WA 98070. Tel. +1 206 463 6667. Fax +1 810 454 5349. E-mail [email protected]

UNITED STATES OF AMERICA

ARIZONA

Conservatory of The Recording Arts and Sciences Section (Student): Glen O’Hara, Faculty Advisor, AES Student Section, Conservatory of The Recording Arts and Sciences, 2300 E. Broadway Rd., Tempe, AZ 85282. Tel. +1 480 858 9400, 800 562 6383 (toll-free). Fax +1 480 829. E-mail [email protected]

CALIFORNIA

American River CollegeSection (Student)Eric Chun, Faculty AdvisorAES Student SectionAmerican River College Chapter4700 College Oak Dr.Sacramento, CA 95841Tel. +1 916 484 8420E-mail [email protected]

Cal Poly San Luis Obispo State University Section (Student): Jerome R. Breitenbach, Faculty Advisor, AES Student Section, California Polytechnic State University, Dept. of Electrical Engineering, San Luis Obispo, CA 93407. Tel. +1 805 756 5710. Fax +1 805 756 1458. E-mail [email protected]

California State University–Chico Section (Student)Keith Seppanen, Faculty AdvisorAES Student SectionCalifornia State University–Chico400 W. 1st St.Chico, CA 95929-0805Tel. +1 530 898 5500E-mail [email protected]

Citrus College Section(Student)Gary Mraz, Faculty AdvisorAES Student SectionCitrus CollegeRecording Arts1000 W. Foothill Blvd.Glendora, CA 91741-1899Fax +1 626 852 8063

Cogswells PolytechnicalCollege Section (Student)Tim Duncan, Faculty SponsorAES Student SectionCogswell Polytechnical CollegeMusic Engineering Technology1175 Bordeaux Dr.Sunnyvale, CA 94089Tel. +1 408 541 0100, ext. 130Fax +1 408 747 0764E-mail [email protected]

Expression Center for New Media Section (Student): Scott Theakston, Faculty Advisor, AES Student Section, Ex’pression Center for New Media, 6601 Shellmount St., Emeryville, CA 94608. Tel. +1 510 654 2934. Fax +1 510 658 3414. E-mail [email protected]

Long Beach City CollegeSection (Student)Nancy Allen, Faculty AdvisorAES Student SectionLong Beach City College4901 E. Carson St.Long Beach, CA 90808Tel. +1 562 938 4312Fax +1 562 938 4409E-mail [email protected]

Los Angeles SectionAndrew Turner14858 Gilmore St.Van Nuys, CA 91411Tel. +1 818 901 8056E-mail [email protected]

San Diego SectionJ. Russell Lemon2031 Ladera Ct.Carlsbad, CA 92009-8521Home Tel. +1 760 753 2949E-mail [email protected]

San Diego State University Section (Student): John Kennedy, Faculty Advisor, AES Student Section, San Diego State University, Electrical & Computer Engineering Dept., 5500 Campanile Dr., San Diego, CA 92182-1309. Tel. +1 619 594 1053. Fax +1 619 594 2654. E-mail [email protected]

San Francisco SectionBill Orner1513 Meadow LaneMountain View, Ca 94040Tel. +1 650 903 0301Fax +1 650 903 0409E-mail [email protected]

San Francisco State University Section (Student): John Barsotti, Faculty Advisor, AES Student Section, San Francisco State University, Broadcast and Electronic Communication Arts Dept., 1600 Halloway Ave., San Francisco, CA 94132. Tel. +1 415 338 1507. E-mail [email protected]

Stanford University Section(Student)Jay Kadis, Faculty AdvisorStanford AES Student SectionStanford UniversityCCRMA/Dept. of MusicStanford, CA 94305-8180Tel. +1 650 723 4971Fax +1 650 723 8468E-mail [email protected]

University of Southern California Section (Student): Kenneth Lopez, Faculty Advisor, AES Student Section, University of Southern California, 840 W. 34th St., Los Angeles, CA 90089-0851. Tel. +1 213 740 3224. Fax +1 213 740 3217. E-mail [email protected]

COLORADO

Colorado Section: Robert F. Mahoney, Robert F. Mahoney & Associates, 310 Balsam Ave., Boulder, CO 80304. Tel. +1 303 443 2213. Fax +1 303 443 6989. E-mail [email protected]

Denver Section (Student): Roy Pritts, Faculty Advisor, AES Student Section, University of Colorado at Denver, Dept. of Professional Studies, Campus Box 162, P.O. Box 173364, Denver, CO 80217-3364. Tel. +1 303 556 2795. Fax +1 303 556 2335. E-mail [email protected]

OREGON

Portland SectionTony Dal MolinAudio Precision, Inc.5750 S.W. Arctic Dr.Portland, OR 97005Tel. +1 503 627 0832Fax +1 503 641 8906E-mail [email protected]

UTAH

Brigham Young University Section (Student): Timothy Leishman, Faculty Advisor, BYU-AES Student Section, Department of Physics and Astronomy, Brigham Young University, Provo, UT 84602. Tel. +1 801 422 4612. E-mail [email protected]

Utah SectionDeward Timothyc/o Poll Sound4026 S. MainSalt Lake City, UT 84107Tel. +1 801 261 2500Fax +1 801 262 7379

WASHINGTON

Pacific Northwest Section: Gary Louie, University of Washington School of Music, PO Box 353450, Seattle, WA 98195. Office Tel. +1 206 543 1218. Fax +1 206 685 9499. E-mail [email protected]

The Art Institute of SeattleSection (Student)David G. ChristensenFaculty AdvisorAES Student SectionThe Art Institute of Seattle2323 Elliott Ave.Seattle, WA 98121-1622 Tel. +1 206 448 [email protected]

CANADA

Alberta SectionFrank LockwoodAES Alberta SectionSuite 404815 - 50 Avenue S.W.Calgary, Alberta T2S 1H8CanadaHome Tel. +1 403 703 5277Fax +1 403 762 6665E-mail [email protected]

Vancouver SectionPeter L. JanisC-Tec #114, 1585 BroadwayPort Coquitlam, B.C. V3C 2M7CanadaTel. +1 604 942 1001Fax +1 604 942 1010E-mail [email protected]

Vancouver Student Section: Gregg Gorrie, Faculty Advisor, AES Greater Vancouver Student Section, Centre for Digital Imaging and Sound, 3264 Beta Ave., Burnaby, B.C. V5G 4K4, Canada. Tel. +1 604 298 5400. E-mail [email protected]

NORTHERN REGION, EUROPE

Vice President:Søren BechBang & Olufsen a/sCoreTechPeter Bangs Vej 15DK-7600 Struer, DenmarkTel. +45 96 84 49 62Fax +45 97 85 59 [email protected]

BELGIUM

Belgian SectionHermann A. O. WilmsAES Europe Region OfficeZevenbunderslaan 142, #9BE-1190 Vorst-Brussels, BelgiumTel. +32 2 345 7971Fax +32 2 345 3419

DENMARK

Danish Section: Knud Bank Christensen, Skovvej 2, DK-8550 Ryomgård, Denmark. Tel. +45 87 42 71 46. Fax +45 87 42 70 10. E-mail [email protected]

Danish Student SectionKnud Bank ChristensenSkovvej 2DK-8550 Ryomgård, DenmarkTel. +45 87 42 71 46Fax +45 87 42 70 10E-mail [email protected]

FINLAND

Finnish SectionKalle KoivuniemiNokia Research CenterP.O. Box 100FI-33721 Tampere, FinlandTel. +358 7180 35452Fax +358 7180 35897E-mail [email protected]

NETHERLANDS

Netherlands Section: Rinus Boone, Voorweg 105A, NL-2715 NG Zoetermeer, Netherlands. Tel. +31 15 278 14 71, +31 62 127 36 51. Fax +31 79 352 10 08. E-mail [email protected]

Netherlands Student SectionDirk FischerAES Student SectionGroenewegje 143aDen Haag, NetherlandsHome Tel. +31 70 [email protected]

NORWAY

Norwegian SectionJan Erik JensenNøklesvingen 74NO-0689 Oslo, NorwayOffice Tel. +47 22 24 07 52Home Tel. +47 22 26 36 13 Fax +47 22 24 28 06E-mail [email protected]

RUSSIA

All-Russian State Institute of Cinematography Section (Student): Leonid Sheetov, Faculty Sponsor, AES Student Section, All-Russian State Institute of Cinematography (VGIK), W. Pieck St. 3, RU-129226 Moscow, Russia. Tel. +7 095 181 3868. Fax +7 095 187 7174. E-mail [email protected]

Moscow Section: Michael Lannie, Research Institute for Television and Radio, Acoustic Laboratory, 12-79 Chernomorsky bulvar, RU-113452 Moscow, Russia. Tel. +7 095 2502161, +7 095 1929011. Fax +7 095 9430006. E-mail [email protected]

St. Petersburg Section: Irina A. Aldoshina, St. Petersburg University of Telecommunications, Gangutskaya St. 16, #31, RU-191187 St. Petersburg, Russia. Tel. +7 812 272 4405. Fax +7 812 316 1559. E-mail [email protected]

St. Petersburg Student SectionNatalia V. TyurinaFaculty AdvisorProsvescheniya pr., 41, 185RU-194291 St. Petersburg, RussiaTel. +7 812 595 1730Fax +7 812 316 [email protected]

SWEDEN

Swedish SectionMikael OlssonAudio Data LabKatarinavägen 22SE-116 45 Stockholm, SwedenTel. +46 8 30 29 98Fax +46 8 641 67 91E-mail [email protected]

University of Luleå-PiteåSection (Student)Lars Hallberg, Faculty SponsorAES Student SectionUniversity of Luleå-PiteåSchool of MusicBox 744S-94134 Piteå, SwedenTel. +46 911 726 27Fax +46 911 727 10E-mail [email protected]

UNITED KINGDOM

British SectionHeather LaneAudio Engineering SocietyP.O. Box 645Slough GB-SL1 8BJUnited KingdomTel. +44 1628 663725Fax +44 1628 667002E-mail [email protected]

CENTRAL REGION, EUROPE

Vice President: Markus Erne, Scopein Research, Sonnmattweg 6, CH-5000 Aarau, Switzerland. Tel. +41 62 825 09 19. Fax +41 62 825 09 15. E-mail [email protected]

AUSTRIA

Austrian SectionFranz LechleitnerLainergasse 7-19/2/1AT-1230 Vienna, AustriaOffice Tel. +43 1 4277 29602Fax +43 1 4277 9296E-mail [email protected]

Graz Section (Student): Robert Höldrich, Faculty Sponsor, Institut für Elektronische Musik und Akustik, Inffeldgasse 10, AT-8010 Graz, Austria. Tel. +43 316 389 3172. Fax +43 316 389 3171. E-mail [email protected]

Vienna Section (Student): Jürg Jecklin, Faculty Sponsor, Vienna Student Section, Universität für Musik und Darstellende Kunst Wien, Institut für Elektroakustik und Experimentelle Musik, Rienösslgasse 12, AT-1040 Vienna, Austria. Tel. +43 1 587 3478. Fax +43 1 587 3478 20. E-mail [email protected]

CZECH REPUBLIC

Czech SectionJiri OcenasekDejvicka 36CZ-160 00 Prague 6Czech Republic Home Tel. +420 2 24324556E-mail [email protected]

Czech Republic Student Section: Libor Husník, Faculty Advisor, AES Student Section, Czech Technical University at Prague, Technická 2, CZ-116 27 Prague 6, Czech Republic. Tel. +420 2 2435 2115. E-mail [email protected]

GERMANY

Aachen Section (Student)Michael VorländerFaculty AdvisorInstitut für Technische AkustikRWTH AachenTemplergraben 55D-52065 Aachen, GermanyTel. +49 241 807985Fax +49 241 8888214E-mail [email protected]

Berlin Section (Student): Bernhard Güttler, Zionskirchstrasse 14, DE-10119 Berlin, Germany. Tel. +49 30 4404 72 19. Fax +49 30 4405 39 03. E-mail [email protected]

Central German Section: Ernst-Joachim Völker, Institut für Akustik und Bauphysik, Kiesweg 22-24, DE-61440 Oberursel, Germany. Tel. +49 6171 75031. Fax +49 6171 85483. E-mail [email protected]

Darmstadt Section (Student): G. M. Sessler, Faculty Sponsor, AES Student Section, Technical University of Darmstadt, Institut für Übertragungstechnik, Merkstr. 25, DE-64283 Darmstadt, Germany. Tel. +49 6151 162869. E-mail [email protected]

Detmold Section (Student): Andreas Meyer, Faculty Sponsor, AES Student Section, c/o Erich Thienhaus Institut, Tonmeisterausbildung, Hochschule für Musik Detmold, Neustadt 22, DE-32756 Detmold, Germany. Tel./Fax +49 5231 975639. E-mail [email protected]

Düsseldorf Section (Student): Ludwig Kugler, AES Student Section, Bilker Allee 126, DE-40217 Düsseldorf, Germany. Tel. +49 211 3 36 80 11. E-mail [email protected]

Ilmenau Section (Student)Karlheinz BrandenburgFaculty SponsorAES Student SectionInstitut für MedientechnikPF 10 05 65DE-98684 Ilmenau, GermanyTel. +49 3677 69 2676Fax +49 3677 69 [email protected]

North German SectionReinhard O. SahrEickhopskamp 3DE-30938 Burgwedel, GermanyTel. +49 5139 4978Fax +49 5139 5977E-mail [email protected]

South German SectionGerhard E. PicklappLandshuter Allee 162DE-80637 Munich, GermanyTel. +49 89 15 16 17Fax +49 89 157 10 31E-mail [email protected]


HUNGARY

Hungarian SectionIstván MatókRona u. 102. II. 10HU-1149 Budapest, HungaryHome Tel. +36 30 900 1802Fax +36 1 383 24 81E-mail [email protected]

LITHUANIA

Lithuanian Section: Vytautas J. Stauskis, Vilnius Gediminas Technical University, Traku 1/26, Room 112, LT-2001 Vilnius, Lithuania. Tel. +370 5 262 91 78. Fax +370 5 261 91 44. E-mail [email protected]

POLAND

Polish Section: Jan A. Adamczyk, University of Mining and Metallurgy, Dept. of Mechanics and Vibroacoustics, al. Mickiewicza 30, PL-30 059 Cracow, Poland. Tel. +48 12 617 30 55. Fax +48 12 633 23 14. E-mail [email protected]

Technical University of GdanskSection (Student)Pawel ZwanAES Student Section Technical University of GdanskSound Engineering Dept.ul. Narutowicza 11/12PL-80 952 Gdansk, PolandHome Tel. +48 58 347 23 98Office Tel. +4858 3471301Fax +48 58 3471114E-mail [email protected]

Wroclaw University of Technology Section (Student): Andrzej B. Dobrucki, Faculty Sponsor, AES Student Section, Institute of Telecommunications and Acoustics, Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, PL-503 70 Wroclaw, Poland. Tel. +48 71 320 30 68. Fax +48 71 320 31 89. E-mail [email protected]

REPUBLIC OF BELARUS

Belarus Section: Valery Shalatonin, Belarusian State University of Informatics and Radioelectronics, vul. Petrusya Brouki 6, BY-220027 Minsk, Republic of Belarus. Tel. +375 17 239 80 95. Fax +375 17 231 09 14. E-mail [email protected]

SLOVAK REPUBLIC

Slovakian Republic SectionRichard VarkondaCentron Slovakia Ltd.Podhaj 107SK-841 03 BratislavaSlovak RepublicTel. +421 7 6478 0767Fax. +421 7 6478 [email protected]

SWITZERLAND

Swiss SectionJoël GodelAES Swiss SectionSonnmattweg 6CH-5000 AarauSwitzerlandE-mail [email protected]

UKRAINE

Ukrainian Section: Valentin Abakumov, National Technical University of Ukraine, Kiev Politechnical Institute, Politechnical St. 16, Kiev UA-56, Ukraine. Tel./Fax +38 044 2366093

SOUTHERN REGION, EUROPE

Vice President: Daniel Zalay, Conservatoire de Paris, Dept. Son, FR-75019 Paris, France. Office Tel. +33 1 40 40 46 14. Fax +33 1 40 40 47 68. E-mail [email protected]

BOSNIA-HERZEGOVINA

Bosnia-Herzegovina SectionJozo TalajicBulevar Mese Selimovica 12BA-71000 SarajevoBosnia–HerzegovinaTel. +387 33 455 160Fax +387 33 455 163E-mail [email protected]

BULGARIA

Bulgarian Section: Konstantin D. Kounov, Bulgarian National Radio, Technical Dept., 4 Dragan Tzankov Blvd., BG-1040 Sofia, Bulgaria. Tel. +359 2 65 93 37, +359 2 9336 6 01. Fax +359 2 963 1003. E-mail [email protected]

CROATIA

Croatian SectionSilvije StamacHrvatski RadioPrisavlje 3HR-10000 Zagreb, CroatiaTel. +385 1 634 28 81Fax +385 1 611 58 29E-mail [email protected]

Croatian Student Section: Hrvoje Domitrovic, Faculty Advisor, AES Student Section, Faculty of Electrical Engineering and Computing, Dept. of Electroacoustics (X. Fl.), Unska 3, HR-10000 Zagreb, Croatia. Tel. +385 1 6129 640. Fax +385 1 6129 852. E-mail [email protected]

FRANCE

Conservatoire de ParisSection (Student)Alessandra Galleron36, Ave. ParmentierFR-75011 Paris, FranceTel. +33 1 43 38 15 94

French Section: Michael Williams, Ile du Moulin, 62 bis Quai de l’Artois, FR-94170 Le Perreux sur Marne, France. Tel. +33 1 48 81 46 32. Fax +33 1 47 06 06 48. E-mail [email protected]

Louis Lumière Section (Student): Alexandra Carr-Brown, AES Student Section, Ecole Nationale Supérieure Louis Lumière, 7, allée du Promontoire, BP 22, FR-93161 Noisy Le Grand Cedex, France. Tel. +33 6 18 57 84 41. E-mail [email protected]

GREECE

Greek SectionVassilis TsakirisCrystal AudioAiantos 3a VrillissiaGR 15235 Athens, GreeceTel. + 30 2 10 6134767Fax + 30 2 10 6137010E-mail [email protected]

ISRAEL

Israel Section: Ben Bernfeld Jr., H. M. Acustica Ltd., 20G/5 Mashabim St., IL-45201 Hod Hasharon, Israel. Tel./Fax +972 9 7444099. E-mail [email protected]

ITALY

Italian SectionCarlo Perrettac/o AES Italian SectionPiazza Cantore 10IT-20134 Milan, ItalyTel. +39 338 9108768Fax +39 02 58440640E-mail [email protected]

Italian Student SectionFranco Grossi, Faculty AdvisorAES Student SectionViale San Daniele 29 IT-33100 Udine, ItalyTel. +39 [email protected]

PORTUGAL

Portugal SectionRui Miguel Avelans CoelhoR. Paulo Renato 1, 2APT-2745-147 Linda-a-VelhaPortugalTel. +351 214145827E-mail [email protected]

ROMANIA

Romanian SectionMarcia TaiachinRadio Romania60-62 Grl. Berthelot St.RO-79756 Bucharest, RomaniaTel. +40 1 303 12 07Fax +40 1 222 69 19

SERBIA AND MONTENEGRO

Serbia and Montenegro SectionTomislav StanojevicSava centreM. Popovica 9YU-11070 Belgrade, YugoslaviaTel. +381 11 311 1368Fax +38111 605 [email protected]

SLOVENIA

Slovenian SectionTone SeliskarRTV SlovenijaKolodvorska 2SI-1550 Ljubljana, SloveniaTel. +386 61 175 2708Fax +386 61 175 2710E-mail [email protected]

SPAIN

Spanish Section: Juan Recio Morillas, C/Florencia 14 3oD, ES-28850 Torrejon de Ardoz (Madrid), Spain. Tel. +34 91 540 14 03. E-mail [email protected]

TURKEY

Turkish Section: Sorgun Akkor, STD, Gazeteciler Sitesi, Yazarlar Sok. 19/6, Esentepe 80300 Istanbul, Turkey. Tel. +90 212 2889825. Fax +90 212 2889831. E-mail [email protected]

LATIN AMERICAN REGION

Vice President: Mercedes Onorato, Talcahuano 141, Buenos Aires, Argentina. Tel./Fax +5411 4 375 0116. E-mail [email protected]

ARGENTINA

Argentina SectionHernan Ranucci Talcahuano 141Buenos Aires, Argentina 1013Tel./Fax +5411 4 375 0116E-mail [email protected]

BRAZIL

Brazil Section: Rosalfonso Bortoni, Rua Doutor Jesuíno Maciel, 1584/22, Campo Belo, São Paulo, SP, Brazil 04615-004. Tel. +55 11 5533-3970. Fax +55 21 2421 0112. E-mail [email protected]

CHILE

Chile SectionAndres SchmidtHernan Cortes 2768Ñuñoa, Santiago de ChileTel. +56 2 4249583E-mail [email protected]

COLOMBIA

Colombia SectionTony Penarredonda CaraballoCarrera 51 #13-223Medellin, ColombiaTel. +57 4 265 7000Fax +57 4 265 2772E-mail [email protected]

MEXICO

Mexican Section: Javier Posada, Div. Del Norte #1008, Col. Del Valle, Mexico, D.F. MX-03100, Mexico. Tel. +52 5 669 48 79. Fax +52 5 543 60 37. E-mail [email protected]

URUGUAY

Uruguay Section: Rafael Abal, Sondor S.A., Calle Rio Branco 1530, C.P. UY-11100 Montevideo, Uruguay. Tel. +598 2 901 26 70, +598 2 90253 88. Fax +598 2 902 52 72. E-mail [email protected]

VENEZUELA

Taller de Arte Sonoro,Caracas Section (Student)Carmen Bell-Smythe de LealFaculty AdvisorAES Student SectionTaller de Arte SonoroAve. Rio de Janeiro Qta. Tres PinosChuao, VE-1061 CaracasVenezuelaTel. +58 14 9292552Tel./Fax +58 2 9937296E-mail [email protected]

Venezuela SectionElmar LealAve. Rio de JaneiroQta. Tres PinosChuao, VE-1061 CaracasVenezuelaTel. +58 14 9292552Tel./Fax +58 2 9937296E-mail [email protected]

INTERNATIONAL REGION

Vice President:Neville Thiele10 Wycombe St.Epping, NSW AU-2121,AustraliaTel. +61 2 9876 2407Fax +61 2 9876 2749E-mail [email protected]

AUSTRALIA

Adelaide SectionDavid MurphyKrix Loudspeakers14 Chapman Rd.Hackham AU-5163South AustraliaTel. +618 8 8384 3433Fax +618 8 8384 3419E-mail [email protected]

Brisbane Section: David Ringrose, AES Brisbane Section, P.O. Box 642, Roma St. Post Office, Brisbane, Qld. AU-4003, Australia. Office Tel. +61 7 3364 6510. E-mail [email protected]

Melbourne SectionGraham J. HaynesP.O. Box 5266Wantirna South, VictoriaAU-3152, AustraliaTel. +61 3 9887 3765Fax +61 3 9887 [email protected]

Sydney SectionHoward JonesAES Sydney SectionP.O. Box 766Crows Nest, NSW AU-2065AustraliaTel. +61 2 9417 3200Fax +61 2 9417 3714E-mail [email protected]

HONG KONG

Hong Kong Section: Henry Ma Chi Fai, HKAPA, School of Film and Television, 1 Gloucester Rd., Wanchai, Hong Kong. Tel. +852 2584 8824. Fax +852 2588 1303. E-mail [email protected]

INDIA

India SectionAvisound A-20, DeepanjaliShahaji Raje MargVile Parle EastMumbai IN-400 057, IndiaTel. +91 22 26827535E-mail [email protected]

JAPAN

Japan SectionKatsuya (Vic) Goh2-15-4 Tenjin-cho, Fujisawa-shiKanagawa-ken 252-0814, JapanTel./Fax +81 466 81 0681E-mail [email protected]

KOREA

Korea Section: Seong-Hoon Kang, Taejeon Health Science College, Dept. of Broadcasting Technology, 77-3 Gayang-dong, Dong-gu, Taejeon, Korea. Tel. +82 42 630 5990. Fax +82 42 628 1423. E-mail [email protected]

MALAYSIA

Malaysia Section: C. K. Ng, King Musical Industries Sdn Bhd, Lot 5, Jalan 13/2, MY-46200 Kuala Lumpur, Malaysia. Tel. +603 7956 1668. Fax +603 7955 4926. E-mail [email protected]

PHILIPPINES

Philippines Section: Dario (Dar) J. Quintos, 125 Regalia Park Tower, P. Tuazon Blvd., Cubao, Quezon City, Philippines. Tel./Fax +63 2 4211790, +63 2 4211784. E-mail [email protected]

SINGAPORE

Singapore SectionKenneth J. Delbridge480B Upper East Coast Rd.Singapore 466518Tel. +65 9875 0877Fax +65 6220 0328E-mail [email protected]

STUDENT DELEGATE ASSEMBLY

NORTH/SOUTH AMERICA REGIONS

Chair: Dell Harris, Hampton University Section (AES), 63 Litchfield Close, Hampton, VA 23669. Tel. +1 757 265 1033. E-mail [email protected]

Vice Chair: Scott Cannon, Stanford University Section (AES), P.O. Box 15259, Stanford, CA 94309. Tel. +1 650 346 4556. Fax +1 650 723 8468. E-mail [email protected]

EUROPE/INTERNATIONAL REGIONS

Chair: Isabella Biedermann, European Student Section, Auerhahnweg 13, A-9020 Klagenfurt, Austria. Tel. +43 664 452 57 22. E-mail [email protected]

Vice Chair: Felix Dreher, European Student Section, University of Music and Performing Arts, Streichergasse 3/1 A, A-1030 Vienna, Austria. Tel. +43 1 920 54 19. E-mail [email protected]


AES CONVENTIONS AND CONFERENCES

The latest details on the following events are posted on the AES Website: http://www.aes.org

23rd International Conference, Copenhagen, Denmark
“Signal Processing in Audio Recording and Reproduction”
Date: 2003 May 23–25
Location: Marienlyst Hotel, Helsingør, Copenhagen, Denmark
Conference chair: Per Rubak, Aalborg University, Fredrik Bajers Vej 7 A3-216, DK-9220 Aalborg Ø, Denmark. Telephone: +45 9635 8682. Email: [email protected]
Papers cochair: Jan Abildgaard Pedersen, Bang & Olufsen A/S, Peter Bangs Vej 15, P.O. Box 40, DK-7600 Struer. Phone: +45 9684 1122. Email: [email protected]
Papers cochair: Lars Gottfried Johansen, Aalborg University. Phone: +45 9635 9828. Email: [email protected]
Call for papers: Vol. 50, No. 9, p. 737 (2002 September)
Conference preview: Vol. 51, No. 3, pp. 170–179 (2003 March)
Conference report: This issue, pp. 846–854 (2003 September)

24th International Conference, Banff, Canada
“Multichannel Audio: The New Reality”
Date: 2003 June 26–28
Location: The Banff Centre, Banff, Alberta, Canada
Conference chair: Theresa Leonard, The Banff Centre, Banff, Canada. Email: [email protected]
Conference vice chair: John Sorensen, The Banff Centre, Banff, Canada. Email: [email protected]
Papers chair: Geoff Martin. Email: [email protected]
Call for contributions: Vol. 50, No. 10, pp. 851–852 (2002 October)
Conference preview: Vol. 51, No. 4, pp. 258–270 (2003 April)

11th Regional Convention, Tokyo, Japan
Date: 2003 July 7–9
Location: Science Museum, Chiyoda, Tokyo, Japan
Convention chair: Kimio Hamasaki, NHK Science & Technical Research Laboratories. Telephone: +81 3 5494 3208. Fax: +81 3 5494 3219. Email: [email protected]
Convention vice chair: Hiroaki Suzuki, Victor Company of Japan (JVC). Telephone: +81 45 450 1779. Email: [email protected]
Papers chair: Shinji Koyano, Pioneer Corporation. Telephone: +81 49 279 2627. Fax: +81 49 279 1513. Email: [email protected]
Workshops chair: Toru Kamekawa, Tokyo National University of Fine Art & Music. Telephone: +81 3 297 73 8663. Fax: +81 297 73 8670. Email: [email protected]
Exhibit chair: Tadahiko Nakaoki, Pioneer Business Systems Division. Telephone: +81 3 3763 9445. Fax: +81 3 3763 3138. Email: [email protected]
Section contact: Vic Goh. Email: [email protected]
Call for papers: Vol. 50, No. 12, pp. 1124 (2002 December)

115th Convention, New York, NY, USA
Date: 2003 October 10–13
Location: Jacob K. Javits Convention Center, New York, New York, USA
Convention chair: Zoe Thrall, The Hit Factory, 421 West 54th Street, New York, NY 10019, USA. Telephone: +1 212 664 1000. Fax: +1 212 307 6129. Email: [email protected]
Papers chair: James D. Johnston, Microsoft Corporation. Telephone: +1 425 703 6380. Email: [email protected]
Exhibit information: Chris Plunkett. Telephone: +1 212 661 8528. Fax: +1 212 682 0477. Email: [email protected]
Call for papers: Vol. 51, No. 1/2, pp. 112 (2003 January/February)

116th Convention, Berlin, Germany
Date: 2004 May 8–11
Location: Messe Berlin, Berlin, Germany
Convention chair: Reinhard O. Sahr, Eickhopskamp 3, DE-30938 Burgwedel, Germany. Telephone: +49 5139 4978. Fax: +49 5139 5977. Email: [email protected]
Vice chair: Jörg Knothe, DeutschlandRadio. Email: [email protected]
Papers cochair: Ben Bernfeld, Krozinger Str. 22, DE-79219 Staufen, Germany. Email: [email protected]
Papers cochair: Stephan Peus, Georg Neumann GmbH. Email: [email protected]
Exhibit information: Thierry Bergmans. Telephone: +32 2 345 7971. Fax: +32 2 345 3419. Email: [email protected]
Call for papers: Vol. 51, No. 7/8, pp. 768 (2003 July/August)

25th International Conference, London, UK
“Metadata for Audio”
Date: 2004 June 17–19
Conference chair: John Grant, Nine Tiles Networks, Cambridge, UK. Email: [email protected]
Papers cochair: Gerhard Stoll, IRT, Munich, Germany. Email: [email protected]
Papers cochair: Russell Mason, University of Surrey, Guildford, UK. Email: [email protected]
Call for papers: This issue, pp. 871 (2003 September)

INFORMATION FOR AUTHORS

Presentation
Manuscripts submitted should be typewritten on one side of ISO size A4 (210 x 297 mm) or 216-mm x 280-mm (8.5-inch x 11-inch) paper with 40-mm (1.5-inch) margins. All copies, including abstract, text, references, figure captions, and tables, should be double-spaced. Pages should be numbered consecutively. Authors should submit an original plus two copies of text and illustrations.

Review
Manuscripts are reviewed anonymously by members of the review board. After the reviewers’ analysis and recommendation to the editors, the author is advised of either acceptance or rejection. On the basis of the reviewers’ comments, the editor may request that the author make certain revisions which will allow the paper to be accepted for publication.

Content
Technical articles should be informative and well organized. They should cite original work or review previous work, giving proper credit. Results of actual experiments or research should be included. The Journal cannot accept unsubstantiated or commercial statements.

Organization
An informative and self-contained abstract of about 60 words must be provided. The manuscript should develop the main point, beginning with an introduction and ending with a summary or conclusion. Illustrations must have informative captions and must be referred to in the text.

References should be cited numerically in brackets in order of appearance in the text. Footnotes should be avoided, when possible, by making parenthetical remarks in the text.

Mathematical symbols, abbreviations, acronyms, etc., which may not be familiar to readers must be spelled out or defined the first time they are cited in the text.

Subheads are appropriate and should be inserted where necessary. Paragraph division numbers should be of the form 0 (only for introduction), 1, 1.1, 1.1.1, 2, 2.1, 2.1.1, etc.

References should be typed on a manuscript page at the end of the text in order of appearance. References to periodicals should include the authors’ names, title of article, periodical title, volume, page numbers, year and month of publication. Book references should contain the names of the authors, title of book, edition (if other than first), name and location of publisher, publication year, and page numbers. References to AES convention preprints should be replaced with Journal publication citations if the preprint has been published.

Illustrations
Figure captions should be typed on a separate sheet following the references. Captions should be concise. All figures should be labeled with author’s name and figure number.
Photographs should be black and white prints without a halftone screen, preferably 200 mm x 250 mm (8 inch by 10 inch).
Line drawings (graphs or sketches) can be original drawings on white paper, or high-quality photographic reproductions.
The size of illustrations when printed in the Journal is usually 82 mm (3.25 inches) wide, although 170 mm (6.75 inches) wide can be used if required. Letters on original illustrations (before reduction) must be large enough so that the smallest letters are at least 1.5 mm (1/16 inch) high when the illustrations are reduced to one of the above widths. If possible, letters on all original illustrations should be the same size.

Units and Symbols
Metric units according to the System of International Units (SI) should be used. For more details, see G. F. Montgomery, “Metric Review,” JAES, Vol. 32, No. 11, pp. 890–893 (1984 Nov.) and J. G. McKnight, “Quantities, Units, Letter Symbols, and Abbreviations,” JAES, Vol. 24, No. 1, pp. 40, 42, 44 (1976 Jan./Feb.). Following are some frequently used SI units and their symbols, some non-SI units that may be used with SI units (marked *), and some non-SI units that are deprecated (marked †).

Unit Name                    Unit Symbol
ampere                       A
bit or bits                  spell out
bytes                        spell out
decibel                      dB
degree (plane angle) *       °
farad                        F
gauss †                      Gs
gram                         g
henry                        H
hertz                        Hz
hour *                       h
inch †                       in
joule                        J
kelvin                       K
kilohertz                    kHz
kilohm                       kΩ
liter *                      l, L
megahertz                    MHz
meter                        m
microfarad                   µF
micrometer                   µm
microsecond                  µs
milliampere                  mA
millihenry                   mH
millimeter                   mm
millivolt                    mV
minute (time) *              min
minute (plane angle) *       ’
nanosecond                   ns
oersted †                    Oe
ohm                          Ω
pascal                       Pa
picofarad                    pF
second (time)                s
second (plane angle) *       ”
siemens                      S
tesla                        T
volt                         V
watt                         W
weber                        Wb
