
The Real-Time Implementation of 3D Sound System using DSP

Hyung-Jung Kim, Deock-Gu Jee, Man-Ho Park, Byung-Sik Yoon, Song-In Choi Mobile A/V Research Team

Electronics and Telecommunications Research Institute 161 Gajeong-Dong, Yuseong-Gu, Daejeon 305-350, Korea

[email protected]

Abstract—This paper describes a real-time 3D sound system implemented on an embedded DSP. We develop a new headphone-based 3D sound algorithm by applying a source-localization method using HRTFs. Our algorithm generates virtual 3D sound by localizing 5.1-channel data to appropriate locations. Real-time operation is achieved using the TMS320C6713 DSP chip. We also describe the hardware design. Based on the DSP implementation and a subjective listening test, this 3D sound system is suitable for mobile applications.

I. INTRODUCTION

In recent years, in the field of acoustic signal processing, several sophisticated approaches have been attempted for realizing a 3D sound effect [1,2]. These are based mainly on the so-called Head-Related Transfer Function (HRTF), which models the acoustic filtering of sound by the listener's head and ears [3]. In general, given a sound source, 3D localization of the sound can be realized on the basis of the HRTFs from the source to the right and left ears.

A scheme that transforms a monaural sound with no localization into 2-channel stereo sound with 3D localization on the basis of HRTFs has faced the difficulty that the total computational complexity is too high to implement on a general-purpose DSP.

Thus the conventional HRTF approach has so far been unable to support a real-time DSP implementation of 3D sound localization. To address this difficulty, this paper proposes a DSP implementation of a real-time 3D sound localization algorithm running on an embedded DSP, whose sound quality is evaluated by a subjective listening test.

In this paper we present a new 3D sound algorithm and its software and hardware implementation based on the TI TMS320C6713 DSP chip. Conventional 3D sound localization is explained in Section II. We then explain the novel real-time algorithm and the DSP implementation scheme in Sections III and IV, respectively. Finally, we draw conclusions in Section V.

II. CONVENTIONAL 3D SOUND LOCALIZATION

The conventional HRTF approach [2,4] to 3D sound localization is outlined below. First, compute the HRTFs necessary for 3D sound localization. These can be obtained by formulating two equations, one representing the signal of a given sound source and the other the signal output from dummy-head microphones for the sound transferred from that source. Then, monaural input data are processed by these HRTFs, and the results are superposed for output over stereo headphones.
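The processing just described amounts to convolving the mono source with a left-ear and a right-ear head-related impulse response (HRIR). A minimal C sketch, with illustrative tap count and filter data rather than the actual measured HRIRs:

```c
/* Sketch of conventional binaural HRTF filtering: one mono source is
 * convolved with left- and right-ear HRIRs to form a stereo signal.
 * HRIR_TAPS and the filter values used in testing are illustrative;
 * real HRIRs have on the order of 128-200 taps. */
#include <stddef.h>

#define HRIR_TAPS 4

/* One FIR output sample: y[n] = sum_k h[k] * x[n-k].
 * x must provide HRIR_TAPS-1 samples of history before index n. */
static float fir_sample(const float *h, const float *x, size_t n)
{
    float acc = 0.0f;
    for (size_t k = 0; k < HRIR_TAPS; ++k)
        acc += h[k] * x[n - k];
    return acc;
}

/* Binaural rendering: filter the mono input with both ear HRIRs. */
static void binaural_render(const float *hrir_l, const float *hrir_r,
                            const float *mono, float *out_l, float *out_r,
                            size_t start, size_t count)
{
    for (size_t n = start; n < start + count; ++n) {
        out_l[n] = fir_sample(hrir_l, mono, n);
        out_r[n] = fir_sample(hrir_r, mono, n);
    }
}
```

An impulse fed through `binaural_render` simply reproduces each ear's HRIR, which is a convenient sanity check on the convolution.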

In general, the frequency response of an HRTF is so complicated that a large number of digital filters is necessary for 3D sound localization. Among conventional methods, the "Convolvotron" devised by E. M. Wenzel et al. [5] requires 300 MIPS to realize 3D sound localization. Therefore, a great number of arithmetic operations is necessary to realize 3D sound localization.

We use two HRTF DBs as the baseline of this work. One is the MIT HRTF DB, released to the public on the Internet in 1994, and the other is the CIPIC HRTF DB, released by the CIPIC Interface Lab in 2001. In particular, the CIPIC HRTF DB contains HRTF data for 43 human subjects as well as 2 KEMAR dummy heads at high spatial resolution, and it also contains anthropometric measurements for each subject. The impulse responses of the CIPIC DB, which are 200 taps long, are too complex for real-time implementation. So we reduce the CIPIC impulse responses to 128 taps with an equivalent spectral response. We found the most suitable set of digital filters through many simulations and experiments.

Consequently, even with such conventional methods, a substantial computational barrier remains to be removed before a DSP implementation of real-time 3D sound localization can be achieved.

III. NOVEL REAL-TIME ALGORITHM

We propose a new algorithm for a stereo headphone-based multi-channel 3D sound generation system using HRTFs. First, a 3D sound localization algorithm for a mono sound is implemented using the HRTF; then virtual 3D sound is generated by localizing each 5.1-channel signal to its appropriate location. We use the reduced CIPIC DB for the sound localization processing.
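The per-channel localization can be sketched as follows: each 5.1 channel is rendered at the azimuth of its virtual loudspeaker (the ITU-R layout of Fig. 1) and the binaural results are summed into one stereo pair. The `hrtf_gains` stub below stands in for a lookup into the reduced CIPIC DB and is purely illustrative:

```c
/* Sketch of 5.1-channel virtualization: every channel is localized at
 * its virtual loudspeaker azimuth and the binaural results are summed.
 * hrtf_gains() is a stub for the real HRTF filter lookup. */
#include <stddef.h>

enum { CH_L, CH_R, CH_C, CH_LFE, CH_LS, CH_RS, NUM_CH };

/* Virtual loudspeaker azimuths in degrees (ITU-R layout). */
static const float azimuth_deg[NUM_CH] = { -30.f, 30.f, 0.f, 0.f, -110.f, 110.f };

/* Stub: crude level difference standing in for an HRTF filter pair. */
static void hrtf_gains(float az, float *gl, float *gr)
{
    *gl = (az <= 0.f) ? 1.0f : 0.5f;   /* louder in the nearer ear */
    *gr = (az >= 0.f) ? 1.0f : 0.5f;
}

/* Localize each channel at its virtual position and sum into stereo. */
static void virtualize_51(const float *in[NUM_CH], float *out_l, float *out_r,
                          size_t nsamp)
{
    for (size_t n = 0; n < nsamp; ++n) {
        float l = 0.f, r = 0.f;
        for (int c = 0; c < NUM_CH; ++c) {
            float gl, gr;
            hrtf_gains(azimuth_deg[c], &gl, &gr);
            l += gl * in[c][n];
            r += gr * in[c][n];
        }
        out_l[n] = l;
        out_r[n] = r;
    }
}
```

In the real system each `hrtf_gains` call would be a binaural FIR (or MDCT-domain multiply, as Section III goes on to describe) rather than a plain gain.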

In general, binaural filtering of a sound source with HRTFs can localize the sound at a desired elevation and azimuth. But it has some problems, such as the cone of confusion and inside-the-head localization. To alleviate the cone of confusion, we apply conventional algorithms that modify the spectrum of the HRTF, and we also propose a new algorithm that boosts or suppresses


the spectral difference between front and back HRTFs as a weight function. These algorithms somewhat reduce front-back confusion but introduce some distortion of the sound. Adding a reverberation effect through an appropriately designed reverberator solves the problem of inside-the-head localization. The devised reverberator is composed of an early-reflection generator and a late-reverberation generator. To achieve an effective reverberation effect at reasonable computational complexity, the early-reflection generator is designed as an FIR filter and the late-reverberation generator as an IIR filter built from all-pass filters and comb filters.
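The late-reverberation structure named above, IIR comb filters combined with all-pass filters, is the classic Schroeder topology. Below is a minimal single-comb, single-all-pass sketch; the delay lengths and gains are illustrative, since the paper does not give its actual parameters:

```c
/* Sketch of a Schroeder-style late-reverberation generator: a feedback
 * comb filter followed by an all-pass filter.  Delay lengths and gains
 * are illustrative; a practical reverberator uses several parallel
 * combs with mutually prime delays. */
#include <string.h>

#define COMB_DELAY 113   /* samples; illustrative */
#define AP_DELAY    37
#define COMB_G     0.7f
#define AP_G       0.5f

typedef struct {
    float comb_buf[COMB_DELAY];
    float ap_buf[AP_DELAY];
    int comb_pos, ap_pos;
} Reverb;

static void reverb_init(Reverb *rv) { memset(rv, 0, sizeof *rv); }

static float reverb_sample(Reverb *rv, float x)
{
    /* Feedback comb: c[n] = x[n] + g * c[n-D] */
    float comb_out = x + COMB_G * rv->comb_buf[rv->comb_pos];
    rv->comb_buf[rv->comb_pos] = comb_out;
    rv->comb_pos = (rv->comb_pos + 1) % COMB_DELAY;

    /* Schroeder all-pass: y[n] = -g*c[n] + c[n-D] + g*y[n-D] */
    float delayed = rv->ap_buf[rv->ap_pos];
    float ap_out = -AP_G * comb_out + delayed;
    rv->ap_buf[rv->ap_pos] = comb_out + AP_G * ap_out;
    rv->ap_pos = (rv->ap_pos + 1) % AP_DELAY;
    return ap_out;
}
```

The comb supplies the decaying echo train and the all-pass increases echo density without coloring the long-term spectrum, which is why this combination keeps the cost low.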

The computational complexity of the conventional time-domain methods is too high for real-time applications, so a frequency-domain method using the FFT is usually adopted. But in that case, transforming the time-domain signals of the 5.1 channels into the frequency domain and inverse-transforming them after processing is another kind of overhead. So we perform the virtual-sound processing in the MDCT domain. We use the AC-3 (Dolby Digital) and MPEG-AAC algorithms to obtain the 5.1-channel data; these algorithms use the IMDCT to transform the exponents and mantissas into the time domain and produce the decoded PCM samples. The MDCT offers a correspondence between time-domain convolution and MDCT-domain multiplication. So we construct the HRTF database in the MDCT domain and achieve the HRTF effect by multiplying the MDCT coefficients of each channel input by the HRTF coefficients for the position the user demands.
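Under the paper's stated correspondence, the MDCT-domain filtering reduces to a per-coefficient multiply and accumulate across channels, followed by a single IMDCT per ear instead of one per channel. A sketch with an illustrative block size and names:

```c
/* Sketch of MDCT-domain HRTF mixing: each channel's MDCT coefficients
 * (already available inside the AC-3/AAC decoder before the IMDCT) are
 * multiplied by MDCT-domain HRTF coefficients and accumulated into
 * left/right spectra.  One IMDCT per ear then yields the PCM output.
 * NBANDS and all array names are illustrative. */
#define NBANDS 256

static void mdct_hrtf_mix(int nch,
                          float mdct_in[][NBANDS],   /* per-channel spectra */
                          float hrtf_l[][NBANDS],    /* per-channel HRTFs  */
                          float hrtf_r[][NBANDS],
                          float spec_l[NBANDS], float spec_r[NBANDS])
{
    for (int b = 0; b < NBANDS; ++b) {
        spec_l[b] = 0.f;
        spec_r[b] = 0.f;
    }
    for (int c = 0; c < nch; ++c)
        for (int b = 0; b < NBANDS; ++b) {
            spec_l[b] += mdct_in[c][b] * hrtf_l[c][b];
            spec_r[b] += mdct_in[c][b] * hrtf_r[c][b];
        }
}
```

The saving is structural: 5.1 channels need six IMDCTs in a plain decoder, while this mixing needs only two, plus the cheap per-band multiplies.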

Figure 1. Listening space recommended by ITU-R (center C and front L/R loudspeakers at ±30°, surround loudspeakers at ±110° around the listener).

The proposed algorithm is programmed in Visual C/C++ on a PC to examine its performance and check its complexity. The virtual listening space, shown in Fig. 1, is modeled after the listening room recommended by ITU-R. We have developed the spectral-modification technique to reduce front-back confusion and the reverberator to overcome the problem of inside-the-head localization. In subjective listening tests, the devised 3D sound algorithm generates live and rich 3D sound.

IV. DSP IMPLEMENTATION

To implement the real-time 3D sound system, we developed both software and hardware. The software is implemented based on the devised real-time algorithm. The AC-3 and MPEG-AAC algorithms are used to generate the multi-channel audio signal for sound localization, and both are also implemented in real time. The hardware is designed around the TMS320C6713 DSP chip.

A. S/W Scheme for the TMS320C67xx DSP Chip

We perform floating-point C optimization using TMS320C67xx compiler intrinsics during cross-compilation. We then analyze the efficiency of the assembly code produced by the cross-compiler for each subroutine. We carry out linear-assembly optimization for inefficient subroutines and hand-coded assembly optimization for high-complexity subroutines.
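At the C level, this kind of optimization typically means giving the TI compiler aliasing and trip-count information so it can software-pipeline the inner loops. The sketch below is generic: the `restrict` qualifier and the MUST_ITERATE pragma are real C6000 compiler features, but the paper does not show its actual code, so this loop is an assumption about the style, not the project's source:

```c
/* Illustrative C-level optimization for the TMS320C67xx: restrict tells
 * the compiler the buffers do not alias, and (on the TI compiler) a
 * trip-count pragma before the inner loop helps software pipelining.
 * Other compilers simply ignore the pragma, shown here as a comment. */
static void fir_block(const float *restrict x, const float *restrict h,
                      float *restrict y, int nsamp, int ntaps)
{
    for (int n = 0; n < nsamp; ++n) {
        float acc = 0.0f;
        /* On the TI compiler:  #pragma MUST_ITERATE(8,,8)  */
        for (int k = 0; k < ntaps; ++k)
            acc += h[k] * x[n + k];   /* x points at the oldest sample */
        y[n] = acc;
    }
}
```

After this, subroutines the compiler still handles poorly are the candidates for the linear-assembly and hand-coded assembly passes the text describes.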

Table I shows the results of the real-time implementation of the devised 3D sound system using the AC-3 algorithm for the multi-channel audio signal, and Table II shows the corresponding results using the MPEG-AAC algorithm. The frame length of the AC-3 algorithm is 32 msec and that of the MPEG-AAC algorithm is 21.333 msec. The operating clock of the TMS320C6713 DSP is 200 MHz.

TABLE I. IMPLEMENTATION RESULTS USING AC-3 ALGORITHM

    Item                         Performance
    Program/Data size            210 Kbyte
    Complexity: AC-3 decoding    70.5 MHz (35%)
    Complexity: 3D processing    22.5 MHz (11%)

TABLE II. IMPLEMENTATION RESULTS USING AAC ALGORITHM

    Item                         Performance
    Program size                 128 Kbyte
    Data size                    960 Kbyte
    Complexity: AAC decoding     109 MHz (54%)
    Complexity: 3D processing    50.6 MHz (25%)
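The percentage figures in both tables are the cycle consumption relative to the 200 MHz clock, truncated to whole percent; for example, 70.5 MHz / 200 MHz gives 35%. A one-line check:

```c
/* Percentage DSP load: MHz consumed relative to the clock, truncated
 * to a whole percent, matching the figures quoted in the tables. */
static int load_percent(float mhz_used, float clock_mhz)
{
    return (int)(100.0f * mhz_used / clock_mhz);
}
```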

As shown in the implementation results, the real-time implementation using the MPEG-AAC algorithm is not fully optimized; we have some ideas for further reducing its complexity and program/data size.

B. Hardware Design

The board, whose block diagram is shown in Fig. 2, is called the Virtual Sound System (VSS). The VSS is operable in both stand-alone mode and add-on mode.

This board consists of a TMS320C6713 DSP chip, SDRAM, DPRAM, Flash-ROM, an audio signal interface, and control logic. The audio signal interface consists of an S/PDIF interface, used for digital transfer of packet data, and an analog codec, used for D/A conversion. The SDRAM serves as external program and data memory. The DPRAM transmits and receives multi-channel audio packet data and external commands from the host modules. The Flash-ROM is used


for self-booting. We use an FPGA to generate the signals that control the other devices. We also use a J-TAG port for debugging the VSS and an RS-232 port for monitoring it. The LCD module displays the status of the VSS.

Figure 2. Block diagram of the VSS hardware: the TMS320C6713 DSP with DPRAM, SDRAM, Flash-ROM, FPGA, audio interface, J-TAG port, RS-232/LCD interface, power, and clock.

The architecture of the software, which includes real-time multi-channel audio decoding and 3D processing, is shown in Fig. 3. The software for real-time operation consists of a main control module and three functional modules.

Figure 3. Software architecture of the VSS: audio decoding, 3D processing, a PCM interrupt handler, and a command handler, connected to the audio signal interface and the host processor interface.

The three functional modules are the audio decoding module, the 3D processing module, and the interrupt handler. The interrupt handler module is divided into a PCM interrupt handling routine and an external command handling routine. The PCM interrupt handler controls the output of PCM data via the audio signal interface and the operation of the audio processing module. The external command handler services commands sent by the host processor and receives packet data.
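A common shape for such a structure is interrupt handlers that only set flags and move data, with the main control module dispatching the heavy work outside interrupt context. The sketch below is illustrative; the actual VSS firmware is not shown in the paper, so every name here is an assumption:

```c
/* Sketch of an interrupt-driven control structure: ISRs set flags, the
 * main control loop services them.  Illustrative, not the VSS source. */
#include <stdbool.h>

typedef struct {
    volatile bool pcm_request;   /* set by the PCM interrupt handler     */
    volatile bool cmd_pending;   /* set by the external command handler  */
    int frames_processed;
    int cmds_handled;
} VssState;

/* Called from the PCM interrupt: request the next decoded frame. */
static void pcm_isr(VssState *s) { s->pcm_request = true; }

/* Called from the host interface interrupt: a command packet arrived. */
static void cmd_isr(VssState *s) { s->cmd_pending = true; }

/* One pass of the main control loop. */
static void main_loop_step(VssState *s)
{
    if (s->cmd_pending) {        /* service host commands first      */
        s->cmd_pending = false;
        s->cmds_handled++;       /* stands in for command handling   */
    }
    if (s->pcm_request) {        /* then decode + 3D-process a frame */
        s->pcm_request = false;
        s->frames_processed++;   /* stands in for decode + 3D path   */
    }
}
```

Keeping the ISRs this thin is what allows the decode and 3D-processing paths to be budgeted against the frame lengths given in Section IV-A.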

C. Stand-alone Test System

We also developed a stand-alone test system, called the Sound Test Equipment (STE), to verify the operation of the VSS. An STE with a VSS mounted on it is shown in Fig. 4.

Figure 4. Picture of the STE hardware.

A TMS320C6713 DSP chip is used as the host processor of this system. We use the J-TAG port for debugging. The system has an S/PDIF interface for digital transfer of packet data, so we can directly examine the performance of the VSS using a DVD player or a D-VHS player. Through the S/PDIF interface, audio packet data are transferred from the DVD player to the TMS320C6713 DSP in the STE.

V. CONCLUSION

In this paper, we present an effective 3D sound algorithm and a DSP implementation of the 3D sound system using an embedded DSP. We present an efficient software scheme for the real-time implementation, performing C-code optimization, linear-assembly optimization, and hand-coded assembly optimization for real-time operation. We also describe the hardware design and our efforts to verify its operation using our stand-alone test system. The 3D sound software and hardware have been evaluated on this test system.

Our new algorithm efficiently provides a listener with 3D sound through headphones. In informal listening tests, we confirmed that the implemented 3D sound system generates live and rich 3D sound compared with a simple stereo or Dolby Surround system.

REFERENCES

[1] M. Okamoto, I. Kinoshita, S. Aoki, and H. Matusi, "Sound image rendering system for headphone," IEEE Trans. Consumer Electronics, vol. 43, 1997, pp. 689-693.

[2] C. P. Brown and R. O. Duda, "A structural model for binaural sound synthesis," IEEE Trans. Speech Audio Processing, vol. 6, 1998, pp. 476-488.

[3] E. M. Wenzel, M. Arruda, D. J. Kistler, and F. L. Wightman, "Localization using nonindividualized head-related transfer functions," J. Acoust. Soc. Amer., vol. 94, 1993, pp. 111-123.

[4] F. Asano, Y. Suzuki, and T. Sone, "Role of spectral cues in median plane localization," J. Acoust. Soc. Amer., vol. 88, 1990, pp. 159-169.

[5] E. M. Wenzel and S. H. Foster, "Realtime digital synthesis of virtual acoustic environments," Computer Graphics, vol. 24, pp. 139-140, March 1990.

0-7803-8521-7/04/$20.00 (C) 2004 IEEE
