


IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 19, NO. 2, FEBRUARY 2011

Time-Reversal Approach to the Stereophonic Acoustic Echo Cancellation Problem

Dinh-Quy Nguyen, Student Member, IEEE, Woon-Seng Gan, Senior Member, IEEE, and Andy W. H. Khong, Member, IEEE

Abstract—Stereophonic acoustic echo cancellation (SAEC) plays an important role in delivering a realistic teleconferencing experience. The fundamental problem of an SAEC system is that the stereophonic channels are linearly related, and this results in slow convergence of the adaptive filters. In this paper, we present a novel algorithm that employs a selective time-reversal block to solve the SAEC problem, resulting in a significant increase in the convergence performance of the adaptive filters while preserving both the stereophonic image and the audio quality. The proposed algorithm employs a time-reversal operation on selected blocks of input data samples for one of the two channels to decorrelate the stereophonic channels in the SAEC system. To achieve good stereophonic perception, the time-reversal operation is only applied to blocks whose magnitudes fall below a pre-determined threshold. Theoretical and numerical simulation results show that the proposed algorithm achieves faster convergence in terms of normalized misalignment and better stereophonic perception with less audio distortion compared to the well-known nonlinear transformation algorithm for the SAEC system.

Index Terms—Decorrelation techniques, nonlinear transformation, stereophonic acoustic echo cancellation (SAEC), time-reversal.

I. INTRODUCTION

STEREOPHONIC acoustic echo cancellation (SAEC) is an important area of research for applications including video/teleconferencing as well as virtual gaming. These systems can enhance spatial information, which in turn provides a more immersive experience for users. Stereophonic acoustic echo cancellers often employ a pair of adaptive filters for the estimation of acoustic impulse responses in the receiving room. The effectiveness of echo cancellation is often determined by the performance of these adaptive filters, which is quantified by both their convergence rate and their steady-state normalized misalignment. One of the main challenges for a stereophonic acoustic echo canceller is that it suffers from poor convergence [1], [2]. This poor convergence has been found to be caused by the high

Manuscript received November 04, 2009; revised February 03, 2010; accepted April 15, 2010. Date of publication April 22, 2010; date of current version October 29, 2010. This work was supported by the Singapore National Research Foundation Interactive Digital Media R&D Program, under research grant NRF2007IDM-IDM002-086. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Sharon Gannot.

The authors are with the Digital Signal Processing Laboratory, Nanyang Technological University, Singapore (e-mail: [email protected]; [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TASL.2010.2048941

interchannel coherence between the two-channel input signals, which, as a consequence, gives rise to an ill-conditioning problem [1]. To address this problem, several algorithms have since been proposed with the common aim of achieving a high rate of convergence by decorrelating the stereophonic input signals.

One of the first algorithms proposed for SAEC involves the addition of random noise to the stereophonic signals [3]. A similar approach computes the amount of noise to be added adaptively through the use of variable gain and energy detection so as to achieve sufficient decorrelation for fast convergence [4]. One of the main disadvantages of this approach is the degradation of audio quality in order to achieve sufficiently fast convergence for the adaptive filters.

To address the tradeoff between maintaining audio quality and decorrelation, the use of psychoacoustics has been proposed [5], [6]. The first method introduces the addition of spectrally shaped random noise that is designed to be an auxiliary signal based on human auditory properties [5]. Although this technique aims to maintain audio quality by reducing the effect of perceived noise, the frequency-masking operation results in an undesirable processing delay. The signal decorrelation technique proposed in [6] approximates the masking patterns by employing an autoregressive analysis for the source in the transmission room. These techniques, however, require high computational complexity in terms of their design and implementation.

The use of time-varying all-pass filters has also been studied in two signal decorrelation techniques [7], [8]. In particular, the first technique achieves signal decorrelation by altering the phase of the transmitted signals. This is achieved by filtering the signals with two independent time-varying first-order all-pass filters [7]. This approach can achieve a significant reduction in correlation for higher frequencies under the imposed constraints. However, lower frequencies are still fairly unaffected by the time variation. The second technique applies a periodically varying filter to either the left or right channel. The signal is either delayed by one sample or left unprocessed without any delay [8]. Although these techniques can achieve fast convergence for the adaptive filters, altering the phase of the transmitted signals can change the stereophonic perception of the SAEC system.

Nonlinear transformation of the original stereophonic signals has also been proposed to reduce the linear relationship between the transmitted signals [1]. Due to the nonlinearity principle, this transformation can reduce the interchannel coherence and consequently increase the convergence rate of the adaptive filters. Based on a similar concept, a new configuration with nonlinear preprocessing is also proposed in [9]. In this configuration, the nonlinearly processed signals are used as inputs to the exclusive adaptive filters, before tap estimates are copied to a fixed two-channel filter that is subsequently used to perform the echo cancellation using the unprocessed signals. The advantage of this method is the ability to use the simple normalized least-mean-square (NLMS) algorithm to achieve a high convergence rate, instead of more complex adaptive algorithms such as the recursive least squares (RLS) and affine projection (AP) algorithms. However, the stereophonic perception of the processed signals is somewhat degraded due to additional harmonic distortions introduced by the nonlinearities [10]. Another approach is to decorrelate the channels by means of complementary comb filters [3], [11]. This technique is based on removing the energy in certain frequency bands of the speech signal from one channel. It works well for frequencies above 1 kHz, but it requires a combination of nonlinear transformations at lower frequencies. A survey of existing solutions for decorrelation in the SAEC system can be found in [12]. Among these methods, the nonlinear transformation [1] provides an effective approach to achieve signal decorrelation, which results in good convergence performance. The nonlinear transformation was investigated using different types of nonlinearities [13], and it has been shown that the half-wave rectifier (HWR) offers a good tradeoff in terms of stereophonic quality as well as convergence rate.

1558-7916/$26.00 © 2010 IEEE

In addition to the above decorrelation techniques, several other classes of adaptive filtering algorithms have been developed to solve the SAEC problem. Specifically, a multichannel AP algorithm was proposed to overcome the slow convergence rate of adaptive algorithms by using additional projections [14]. Recently, the exclusive-maximum (XM) tap-selection algorithm has been developed to reduce the high interchannel coherence between stereophonic signals by selecting an exclusive set of filter coefficients to update at each sample iteration [15]. It then minimizes the degradation in convergence rate due to tap-selection by jointly maximizing the energies of the selected tap-inputs. This technique can be applied to the NLMS, AP, and RLS algorithms with a nonlinear processor to achieve a significant improvement in rate of convergence. Hence, as can be seen from the discussion above, there is often a tradeoff between the convergence rate of the adaptive filters and stereophonic perception, since the signals processed using such decorrelation techniques often contain additional components, either in the form of noise or distortion, which in turn degrade the stereophonic quality or image.

In parallel with the development of SAEC research, time-reversal (TR) transformation has been proposed as a novel technique for the focusing of acoustic waves. This technique reverses a given process or block of signal samples in the time domain and has been widely applied in the field of acoustics to perform sound focusing [16] as well as acoustic source localization [17]. These applications are based on the TR property of being able to retrace forward wave propagation, which subsequently focuses onto an initial source position. In this paper, we show that we can achieve signal decorrelation in the SAEC system by exploiting the beneficial property that is inherent in a time-reversed signal. To achieve this, the TR technique is applied to only one channel of the SAEC system, which in turn reduces the linear relation between the stereophonic signals [18]. As will be shown in this paper, we exploit the intrinsic property that TR transformation preserves the magnitude response of the original signal so as to minimize frequency distortion. To address the undesirable phase distortion brought about by this TR process, we propose to process one of the input signals based on its average energy within a given frame. Our motivation for employing this selective time-reversal block (STRB) technique is to ensure that both the audio quality and the stereophonic image of the SAEC system are preserved as much as possible. In order to illustrate the benefits brought about by this TR process, we incorporate the proposed STRB algorithm into the NLMS algorithm for the SAEC application.

Fig. 1. Schematic diagram of the SAEC system. Note that only a single-channel algorithm is considered, i.e., one of the two microphones in the receiving room is neglected.

II. REVIEW OF SAEC PROBLEM AND THE PROPOSED SOLUTION

A. Review of the SAEC Problem and Nonlinear Transformation Solution

Fig. 1 shows an SAEC system, where two microphones in the transmission room pick up speech signals from a source through two acoustic impulse responses $\mathbf{g}_i$, each of length $L_g$, where $i = 1, 2$ is defined as the channel index. The stereophonic signals $x_i(n)$ are transmitted to the receiving room, where they are coupled to both microphones in the receiving room via another set of acoustic echo paths $\mathbf{h}_i$, where $L_h$ is defined as the length of $\mathbf{h}_i$. Similar to [1], we describe the problem of SAEC for one microphone, since a similar discussion can be extended to the second microphone in the receiving room.

In order to minimize the echo in the SAEC system, a pair of finite impulse response (FIR) adaptive filters $\hat{\mathbf{h}}_1(n)$ and $\hat{\mathbf{h}}_2(n)$ is employed, where the length of the adaptive filters is assumed to be the same as that of $\mathbf{h}_i$. These adaptive filters are used to estimate the unknown acoustic echo paths in the receiving room, and the output of these adaptive filters is given by

$\hat{y}(n) = \hat{\mathbf{h}}_1^T(n)\,\mathbf{x}_1(n) + \hat{\mathbf{h}}_2^T(n)\,\mathbf{x}_2(n)$ (1)

where $\mathbf{x}_i(n) = \left[x_i(n)\ x_i(n-1)\ \cdots\ x_i(n-L_h+1)\right]^T$ for $i = 1, 2$ are the tap-input vectors of length $L_h$. The microphone signal in the receiving room is then given by

$y(n) = \mathbf{h}_1^T\mathbf{x}_1(n) + \mathbf{h}_2^T\mathbf{x}_2(n) + w(n)$ (2)

where $w(n)$ is defined as the background noise. Employing (1) and (2), the acoustic echo cancellation error is then given as

$e(n) = y(n) - \hat{y}(n)$ (3)
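As an illustrative sketch of the signal model in (1)–(3), the following minimal Python example builds the microphone signal and the error signal by convolution. The echo-path length, the synthetic exponentially decaying random echo paths, and the white-noise inputs are all assumptions of this sketch, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

L = 512                                   # assumed echo-path length (illustrative)
decay = np.exp(-np.arange(L) / 100.0)
h = rng.standard_normal((2, L)) * decay   # receiving-room echo paths h_1, h_2
h_hat = np.zeros((2, L))                  # adaptive-filter estimates, zero-initialized

def mic_and_error(x1, x2, h, h_hat, noise_std=1e-3):
    """Microphone signal y(n) as in (2) and error e(n) = y(n) - yhat(n) as in (3)."""
    n = len(x1)
    w = noise_std * np.random.default_rng(1).standard_normal(n)   # background noise
    y = np.convolve(x1, h[0])[:n] + np.convolve(x2, h[1])[:n] + w
    y_hat = np.convolve(x1, h_hat[0])[:n] + np.convolve(x2, h_hat[1])[:n]
    return y, y - y_hat

x1 = rng.standard_normal(4000)
x2 = rng.standard_normal(4000)
y, e = mic_and_error(x1, x2, h, h_hat)
# With zero-initialized filters, yhat(n) = 0 and the error equals the echo itself.
```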



It has been shown and described comprehensively in [1] that, for a realistic SAEC system, a unique solution exists. However, due to the high interchannel coherence between the input vectors $\mathbf{x}_1(n)$ and $\mathbf{x}_2(n)$, the convergence rate of the adaptive filters is reduced significantly.

The main reason for the high interchannel coherence in an SAEC system is the existence of a linear relation between the tap-input vectors $\mathbf{x}_1(n)$ and $\mathbf{x}_2(n)$ due to them being generated from the same source in the transmission room. In particular, for a linear and time-invariant system, such as occurs in the transmission room, the relation between the stereophonic signals for the SAEC system [1] can be expressed as

$\mathbf{x}_1^T(n)\,\mathbf{g}_2 = \mathbf{x}_2^T(n)\,\mathbf{g}_1$ (4)

where $\mathbf{g}_i = \left[g_{i,0}\ g_{i,1}\ \cdots\ g_{i,L_g-1}\right]^T$ for channels $i = 1, 2$.

As discussed in Section I, the high interchannel coherence degrades the convergence performance of the adaptive filters that are employed in SAEC systems. In addition, it has been shown in [19] that reducing the interchannel coherence improves the conditioning of the SAEC problem, which in turn brings about an improvement in the convergence rate of the adaptive filters. Thus, to reduce the effect of the high interchannel coherence brought about by the above linear relation, one possible solution is to apply a nonlinear transformation to the stereophonic signals before transmitting these processed signals to the receiving room in the SAEC system [1]. The nonlinear transformation solution to the SAEC problem is to add nonlinear signals to the original stereophonic signals as

$x_i'(n) = x_i(n) + \alpha f\left(x_i(n)\right), \quad i = 1, 2$ (5)

where $f(\cdot)$ is a nonlinear function and $\alpha$ is a control variable that determines the tradeoff between convergence rate and audio quality.

Of the several types of nonlinearities that have been investigated in [13], the half-wave rectifier (HWR) function is found to achieve good convergence rate and stereophonic perception performance. This function can be expressed as

$f(x) = \frac{x + |x|}{2}$ (6)

giving

$x_1'(n) = x_1(n) + \alpha\,\frac{x_1(n) + \left|x_1(n)\right|}{2}$ (7a)

$x_2'(n) = x_2(n) + \alpha\,\frac{x_2(n) - \left|x_2(n)\right|}{2}$ (7b)

where $\alpha$ controls the amount of nonlinearity [1]. Further studies on the effect of the HWR on audio quality show that when $\alpha$ approaches 0.5, the listener experiences harmonic distortions introduced by the HWR transformation [13].
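The HWR preprocessing of (5)–(7) can be sketched in a few lines. The sign convention below (positive half-wave added to channel 1, negative half-wave to channel 2) follows the common formulation attributed to [1], and the test signals are arbitrary:

```python
import numpy as np

def hwr_preprocess(x1, x2, alpha):
    """Nonlinear preprocessing per (5)-(7): the positive half-wave is added to
    channel 1 and the negative half-wave to channel 2, both scaled by alpha."""
    x1p = x1 + alpha * (x1 + np.abs(x1)) / 2.0   # (7a)
    x2p = x2 + alpha * (x2 - np.abs(x2)) / 2.0   # (7b)
    return x1p, x2p

# Positive samples of channel 1 are boosted, negative samples of channel 2 are
# deepened; the remaining samples pass through unchanged.
x1p, x2p = hwr_preprocess(np.array([1.0, -1.0]), np.array([1.0, -1.0]), 0.5)
```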

B. Proposed Selective Time-Reversal Block (STRB) Solution

To address the SAEC problem, the main task is to reduce the linear relation given by (4) between the stereophonic signals. We therefore propose to employ TR on only one channel of the SAEC system. However, it is expected that TR can severely distort the audio quality and stereophonic perception if it is applied across every frame of this channel. Hence, in order to preserve the audio quality as well as the stereophonic

perception, we propose a selective time-reversal block (STRB) algorithm that only selects and time-reverses input blocks of one channel with an average magnitude less than a specified threshold. Following that, we discuss how our proposed algorithm can be incorporated into the NLMS algorithm.

1) Decorrelating Stereophonic Signals Using the TR Technique: In this subsection, we show that time-reversing a frame corresponding to one channel can reduce the linear relationship of the tap-input vectors, which, as a consequence, reduces the interchannel coherence. Due to the high coherence between the two channels, we can choose either channel to apply the TR technique. In this paper, we perform TR processing on the first channel and leave the second channel unprocessed. It is useful to note that the stereophonic signals are first normalized before performing the STRB algorithm.

Thus, if we apply TR processing on $\mathbf{x}_1(n)$ and assume that the linear relation among the stereophonic signals after this transformation still exists, it implies that

$\tilde{\mathbf{x}}_1^T(n)\,\mathbf{g}_2 = \mathbf{x}_2^T(n)\,\mathbf{g}_1$ (8)

where $\tilde{\mathbf{x}}_1(n) = \left[x_1(n-L_g+1)\ \cdots\ x_1(n-1)\ x_1(n)\right]^T$ is defined as the time-reversed version of $\mathbf{x}_1(n)$. Combining (8) with (4), the linear relation between $\tilde{\mathbf{x}}_1(n)$ and $\mathbf{x}_2(n)$ exists if and only if

$\left[\mathbf{x}_1(n) - \tilde{\mathbf{x}}_1(n)\right]^T\mathbf{g}_2 = 0$ (9)

For this equation to be valid, either of the two conditions listed below has to be satisfied:

• $\mathbf{g}_2 = \mathbf{0}$, which indicates that the impulse response in the transmission room associated with the second channel has all its amplitudes equal to zero;

• $\mathbf{x}_1(n) = \tilde{\mathbf{x}}_1(n)$, which implies that the input signal of the first channel is symmetric in the time domain.

It is foreseeable that these two conditions are violated in practice. This is because an acoustic impulse response always has nonzero amplitude due to the finite distance between the source and microphone as well as the reverberation of the transmission room [1]. In addition, input signals such as speech, music, or noise are seldom symmetric within a time-domain processing block [10]. The violation of these two conditions shows that (8) cannot be satisfied, and as a result, TR processing can efficiently decorrelate the stereophonic signals.
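This argument can be checked numerically: for two signals generated from a common source, the relation in (4) holds exactly, whereas time-reversing the tap-input vector of the first channel breaks it. The filter length and white-noise source below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
Lg = 32                                         # assumed transmission-room filter length
g1, g2 = rng.standard_normal(Lg), rng.standard_normal(Lg)
s = rng.standard_normal(2048)                   # common source in the transmission room
x1 = np.convolve(s, g1)[:2048]                  # x_1 = g_1 * s
x2 = np.convolve(s, g2)[:2048]                  # x_2 = g_2 * s

n = 1024                                        # any index past the filter transient
u1 = x1[n - Lg + 1:n + 1][::-1]                 # tap-input vector x_1(n)
u2 = x2[n - Lg + 1:n + 1][::-1]                 # tap-input vector x_2(n)
lhs, rhs = u1 @ g2, u2 @ g1                     # eq. (4): equal for the original signals
lhs_tr = u1[::-1] @ g2                          # time-reversed tap vector: equality lost
```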

2) Preserving Stereophonic Perception Using a Magnitude Detector: It is expected that the proposed technique to decorrelate the stereophonic signals will significantly degrade the stereophonic perception if this process is applied to all input signal blocks of the first channel. To address this concern, we propose to select and time-reverse only blocks of the input signal which are relatively inaudible to the listener. To achieve this, we employ a magnitude detector which selects input blocks having an average magnitude smaller than a specified threshold $T$. These blocks are in turn selected for TR processing, while those with an average magnitude above $T$ are left unprocessed.

In order to perform TR, the input signals are preprocessed in blocks of $N$ samples. Defining an input block of the first channel as

$\mathbf{x}_1^{(m)} = \left[x_1(mN)\ x_1(mN+1)\ \cdots\ x_1(mN+N-1)\right]^T$ (10)



Fig. 2. Procedure for implementing the STRB-NLMS algorithm in the SAEC system.

where $m$ denotes the block index, the time-reversed input block is then expressed as

$\tilde{\mathbf{x}}_1^{(m)} = \left[x_1(mN+N-1)\ \cdots\ x_1(mN+1)\ x_1(mN)\right]^T$ (11)

In order to reduce perceived signal distortion, we compute the mean absolute magnitude

$\bar{x}^{(m)} = \frac{1}{N}\sum_{j=0}^{N-1}\left|x_1(mN+j)\right|$ (12)

and time-reverse only blocks with $\bar{x}^{(m)} < T$, i.e.,

$\mathbf{x}_1^{(m)} \leftarrow \begin{cases} \tilde{\mathbf{x}}_1^{(m)}, & \text{if } \bar{x}^{(m)} < T \\ \mathbf{x}_1^{(m)}, & \text{if } \bar{x}^{(m)} \ge T \end{cases}$ (13)

To evaluate an appropriate threshold $T$ for the TR operation, the sound pressure level (SPL) for a typical 16-bit implementation is considered. The relationship between the SPL and the amplitude of the speech input signal can be expressed as

$\text{SPL} = 20\log_{10}\left|x_1(n)\right| + 96\ \text{dB}$ (14)

Under the absolute threshold-in-quiet region approximated by Terhardt [20], the typical SPL in a meeting room is 50 dB, which is equivalent to an average magnitude of the speech signal in the inaudible region of 0.005. It is important to note that in a practical teleconferencing system, comfort noise is often inserted at the network terminals in order to mask pauses and to provide a signal to the far-end user when the transmission is interrupted [21]. Thus, a threshold larger than 0.005 can also be used while sustaining good audio quality in a meeting room.
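The selection rule of (10)–(13) reduces to a few lines of code. The block length and threshold defaults below are placeholders rather than values prescribed by the paper:

```python
import numpy as np

def strb(x1, N=1024, threshold=0.005):
    """Selective time-reversal block processing per (10)-(13): each N-sample
    block of channel 1 whose mean absolute magnitude (12) falls below the
    threshold T is time-reversed (11); louder blocks pass through unchanged."""
    out = np.asarray(x1, dtype=float).copy()
    for start in range(0, len(out) - N + 1, N):
        block = out[start:start + N]
        if np.mean(np.abs(block)) < threshold:   # magnitude detector, eq. (12)
            out[start:start + N] = block[::-1]   # time reversal, eq. (11)
    return out
```

As described in Section II-B3, the unprocessed second channel would additionally be delayed by $N$ samples to keep the two channels aligned.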

3) STRB-NLMS Algorithm: To reduce the echo from the receiving room, two adaptive filters employing the NLMS algorithm can be used. After performing the STRB algorithm as stated in (13), the data samples of the $m$th block are used for sample-by-sample adaptation. While the filter coefficients of the NLMS algorithm are being updated using the $m$th block, our proposed STRB algorithm is concurrently applied to the $(m+1)$th block, as shown in Fig. 2. Since STRB is applied to the first channel, there will be a delay of $N$ samples with respect to the other channel without STRB preprocessing. In order to preserve the stereophonic image, we propose that the second channel is also delayed by $N$ samples without STRB preprocessing. Defining $\delta$ as the regularization parameter for the NLMS algorithm, the STRB-NLMS algorithm is listed in Table I.

TABLE I
STRB-NLMS ALGORITHM

It is also useful to note that the proposed STRB algorithm can easily be implemented. In practice, digital signal processors commonly provide real-time block processing, and the TR operation can easily be achieved by using circular buffer addressing [22]. In addition, the computational complexity of our STRB algorithm is largely dependent on the magnitude detector, which only requires $N-1$ additions and one division per signal block.
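For concreteness, a sketch of the two-channel NLMS adaptation referred to in Table I is given below. This is a generic jointly normalized stereo NLMS update, not a verbatim transcription of Table I; x1 is assumed to be already STRB-processed, and x2 to carry the matching block delay:

```python
import numpy as np

def stereo_nlms(x1, x2, y, L=256, mu=0.5, delta=1e-6):
    """Two-channel NLMS adaptation (sketch). x1, x2 are the preprocessed
    tap-input sequences, y the microphone signal; L the filter length,
    mu the step size, and delta the regularization parameter."""
    h1 = np.zeros(L)
    h2 = np.zeros(L)
    e = np.zeros(len(y))
    for n in range(L, len(y)):
        u1 = x1[n - L + 1:n + 1][::-1]          # tap-input vector of channel 1
        u2 = x2[n - L + 1:n + 1][::-1]          # tap-input vector of channel 2
        e[n] = y[n] - h1 @ u1 - h2 @ u2         # a-priori error, eq. (3)
        norm = u1 @ u1 + u2 @ u2 + delta        # joint normalization
        h1 += mu * e[n] * u1 / norm
        h2 += mu * e[n] * u2 / norm
    return h1, h2, e
```

With decorrelated (here, independent) inputs this update identifies both echo paths; with highly coherent inputs it converges slowly, which is the SAEC problem itself.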

III. QUANTITATIVE PERFORMANCE OF STRB ALGORITHM

We evaluate the quantitative and subjective performance of the STRB algorithm on the transmitted signals

in terms of their interchannel coherence and distortion measurements in both the time and frequency domains. In addition, we illustrate comparisons between our proposed STRB algorithm and the HWR processor [1] as well as the selective random noise addition (SRNA) algorithm. For a fair comparison, the SRNA algorithm is modified from the work reported in [3], where random noise is only added to selected blocks with mean absolute magnitude smaller than the specified threshold $T$. In the numerical simulations, we generate the impulse responses

using the method of images [23], with the source located at {2.2, 1.5, 1.6} m, while the two microphones are placed at {1.5, 2.5, 1.6} m and {2.5, 2.5, 1.6} m in the room. The reverberation time (RT60) is set to 128 ms in each simulation.

A. Interchannel Coherence

We first show that the STRB algorithm reduces the interchannel coherence between the stereophonic signals by defining the interchannel coherence between $x_1(n)$ and $x_2(n)$ as

$\gamma_{x_1x_2}(\omega) = \frac{S_{x_1x_2}(\omega)}{\sqrt{S_{x_1x_1}(\omega)\,S_{x_2x_2}(\omega)}}$ (15)



where $S_{x_1x_2}(\omega)$ denotes the cross-spectral density function between $x_1(n)$ and $x_2(n)$, $S_{x_1x_1}(\omega)$ and $S_{x_2x_2}(\omega)$ denote the corresponding auto-spectral density functions, and $\omega$ is the normalized frequency. When applying the STRB algorithm on the first channel, the interchannel coherence between the stereophonic signals is given by

$\gamma_{\tilde{x}_1x_2}(\omega) = \frac{S_{\tilde{x}_1x_2}(\omega)}{\sqrt{S_{\tilde{x}_1\tilde{x}_1}(\omega)\,S_{x_2x_2}(\omega)}}$ (16)

We note that time reversal does not change the signal amplitudes since it only reverses the positions of the elements in $\mathbf{x}_1^{(m)}$. As a result of this intrinsic property, the auto-correlation function of the STRB processed signal is the same as that of the original signal, i.e.,

$r_{\tilde{x}_1\tilde{x}_1}(\tau) = r_{x_1x_1}(\tau)$ (17)

By taking the Fourier transform of the auto-correlation functions in (17), we obtain the auto-spectral density functions in the frequency domain, giving

$S_{\tilde{x}_1\tilde{x}_1}(\omega) = S_{x_1x_1}(\omega)$ (18)

On the other hand, the cross-spectral density function between the STRB processed signal $\tilde{x}_1(n)$ and $x_2(n)$ is

$S_{\tilde{x}_1x_2}(\omega) = \sum_{\tau=-\infty}^{\infty} r_{\tilde{x}_1x_2}(\tau)\,e^{-j\omega\tau}$ (19)

An important relation involving the magnitude of the cross-spectrum is the cross-spectrum inequality, given by [24]

$\left|S_{\tilde{x}_1x_2}(\omega)\right|^2 \le S_{\tilde{x}_1\tilde{x}_1}(\omega)\,S_{x_2x_2}(\omega)$ (20)

Substituting (18) into (20), we have

$\left|S_{\tilde{x}_1x_2}(\omega)\right|^2 \le S_{x_1x_1}(\omega)\,S_{x_2x_2}(\omega)$ (21)

In the SAEC system, due to the linear relation between the stereophonic signals in the time domain, the interchannel coherence function is very close to unity [1]. Thus, for the unprocessed SAEC system,

$\left|S_{x_1x_2}(\omega)\right|^2 \approx S_{x_1x_1}(\omega)\,S_{x_2x_2}(\omega)$ (22)

By substituting (22) into (21), we can deduce

$\left|S_{\tilde{x}_1x_2}(\omega)\right|^2 \le \left|S_{x_1x_2}(\omega)\right|^2$ (23)

which as a consequence implies that

$\left|\gamma_{\tilde{x}_1x_2}(\omega)\right| \le \left|\gamma_{x_1x_2}(\omega)\right|$ (24)

Fig. 3. Interchannel coherence plots for (a) no decorrelation, (b) HWR algorithm, (c) SRNA algorithm, and (d) STRB algorithm.

Hence, our proposed STRB algorithm can reduce the interchannel coherence of the original stereophonic signals in the SAEC system.

To further illustrate the above, we compare the interchannel coherence of the proposed STRB algorithm with the HWR and SRNA algorithms. We then compute, using (15) and (16), the time-averaged interchannel coherence for a speech sequence of 10 s using a frame size of 128 ms at an 8-kHz sampling rate. This time-averaged interchannel coherence is then plotted against frequency as shown in Fig. 3. As can be seen from Fig. 3(a) and (d), the time-averaged interchannel coherence magnitude using our proposed STRB algorithm is smaller than that of the original signals across all frequencies. More importantly, the proposed STRB algorithm achieves a smaller mean interchannel coherence than the HWR algorithm. This observation can also be made when our proposed algorithm is compared with the SRNA algorithm using a signal-to-noise ratio (SNR) of 30 dB. The average reductions in mean interchannel coherence across all frequencies for the STRB algorithm over the HWR and SRNA algorithms are 0.0278 and 0.0655, respectively.
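The coherence comparison can be reproduced in outline with a synthetic setup. The white-noise source and the random exponentially decaying room filters are assumptions of this sketch, and SciPy's Welch-based coherence estimate stands in for (15) and (16):

```python
import numpy as np
from scipy.signal import coherence, lfilter

rng = np.random.default_rng(0)
fs = 8000
s = rng.standard_normal(8 * fs)                       # source in the transmission room
decay = np.exp(-np.arange(64) / 16.0)
g1 = rng.standard_normal(64) * decay                  # synthetic room filter, channel 1
g2 = rng.standard_normal(64) * decay                  # synthetic room filter, channel 2
x1, x2 = lfilter(g1, [1.0], s), lfilter(g2, [1.0], s)

def reverse_blocks(x, N=1024):
    """Reverse every N-sample block (STRB with the magnitude threshold disabled)."""
    out = x.copy()
    for i in range(0, len(out) - N + 1, N):
        out[i:i + N] = out[i:i + N][::-1]
    return out

f, C_orig = coherence(x1, x2, fs=fs, nperseg=256)     # near unity: common source
f, C_rev = coherence(reverse_blocks(x1), x2, fs=fs, nperseg=256)
# Block-wise time reversal of one channel lowers the average interchannel coherence.
```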

B. Objective Distortion Measurement in Time Domain

To compare the distortions introduced by different decorrelation techniques in the time domain, we evaluate the speech difference between the original and the processed speech signals. For clarity of presentation, we chose a speech segment with a duration of 2 s received from the first microphone in the transmission room. Fig. 4(a) shows the speech difference between the original and the HWR processed speech, while Fig. 4(b) shows such differences for the proposed STRB processed speech signal. Comparing Fig. 4(a) and (b), it can be seen that the STRB algorithm introduces significantly less distortion to the speech signal than the HWR algorithm in the first channel. On the other hand, the HWR algorithm introduces distortion proportional to the magnitude of the received signal for decorrelation in both channels of the SAEC system. It is important to note that, unlike the HWR algorithm, our proposed STRB algorithm does not process the second channel and hence no distortion is introduced in this channel.



Fig. 4. Speech differences between the original and (a) HWR processed signals and (b) STRB processed signals.

Fig. 5. PSDR of the HWR algorithm (dashed line) and of the STRB algorithm with the threshold $T$ in the range (0, 0.3] (solid line).

In order to quantify the effect of the speech difference in-troduced by the proposed STRB algorithm, we define the peaksignal-to-difference ratio (PSDR) as

PSDR (25)

where is the mean speech difference between the original andprocessed speech signals. Within a block of the STRB processedsignal, the mean speech difference is always smaller than thechosen magnitude threshold . Thus, we can relate the PSDR ofthe STRB algorithm to the magnitude threshold as

PSDR (26)

To evaluate distortions of the STRB processed signal in timedomain, the variation of PSDR measurements with the specifiedthreshold is plotted as shown in Fig. 5. From this figure, we ob-serve that the minimum PSDR of the STRB algorithm is alwaysgreater than 40 dB and therefore, it is foreseeable that the pro-posed algorithm can sustain good speech quality. Specifically,the PSDR of the STRB algorithm with and HWRalgorithm with are found to be 57.59 and 34.44 dB,respectively. Hence, our proposed algorithm with and

shows a PSDR improvement of 67.22% over the HWRalgorithm with .

Fig. 6. Time-averaged phase differences of (a) HWR processing, (b) STRBprocessing with no overlap, and (c) STRB processing with 50% overlap.

C. Objective Distortion Measurement in Frequency Domain

We now evaluate the speech distortion in terms of phase andmagnitude differences between the original and processed sig-nals of the first channel. The phase and magnitude responsesof the original and processed signals are analyzed using a 128-point fast Fourier transform (FFT). These values are then time-averaged across the entire speech sequence. We then computeand plot the time-averaged phase differences for signals pro-cessed using HWR preprocessing, STRB processing with nooverlap and STRB processing with 50% overlap, respectively,as shown in Fig. 6(a)–(c). We included STRB processing with50% overlap for the purpose of investigating whether any in-stantaneous change between a time-reversed block and a neigh-boring unprocessed block will bring about a large phase distor-tion over the first channel.

As can be seen from Fig. 6, the time-averaged phase differ-ence using STRB algorithm is smaller than that using the HWRalgorithm across most frequencies. Specifically, the time-aver-aged phase difference of the STRB processing with no overlapis approximately three times smaller than that of the HWR algo-rithm. More importantly, we note that the time-averaged phasedifference arising from STRB processing with no overlap is veryclose to that arising from STRB processing with 50% overlap.In particular, the mean phase difference across all frequenciesof STRB processing with 50% overlap is 0.0003 radians smallerthan that of STRB processing with no overlap. This phase differ-ence corresponds to only a 0.012 s time delay at 8-kHz sam-pling rate, which can be negligible, compared to the just no-ticeable difference (JND) for inter-aural time difference (ITD)of 80 s [25]. Since the STRB processing with 50% overlaprequires higher computational load and more complex imple-mentation compared to conventional STRB processing with nooverlap, the STRB processing with no overlap offers a goodtradeoff between computational complexity and phase distor-tion across frames in the first channel. In addition, as can be seenfrom Fig. 6, the STRB processing does not introduce significantphase distortions in low frequency range of 0–1 kHz where themean phase difference across this frequency range is 0.0086 ra-dians. This corresponds to a time delay of 2.73 s which is in-significant in terms of acceptable JND for ITD. Hence, we notethat the proposed STRB algorithm preserves the stereophonicimage of the SAEC system.

Page 7: Time-Reversal Approach to the Stereophonic Acoustic Echo Cancellation Problem

NGUYEN et al.: TIME-REVERSAL APPROACH TO THE SAEC PROBLEM 391

Fig. 7. Spectrograms of signal differences (a) between HWR processed signaland original signal, and (b) STRB processed signal and original signal. Notethat these signals are belong to the first channel of the SAEC system.

The spectrograms of the signal differences between the pro-cessed signals using HWR as well as STRB algorithms and orig-inal signal are also evaluated as shown in Fig. 7(a) and (b),respectively. We observe that the spectrogram differences be-tween the original and STRB processed signals are negligibleacross most time instances. The spectrogram of the signal dif-ference using the STRB algorithm as shown in Fig. 7(b) is muchlower than that using the HWR algorithm as shown in Fig. 7(a)across all frequencies. Hence, we note that the STRB algorithmpreserves much of the speech quality. These observations in fre-quency domain are also consistent with the objective distortionmeasurement results in time domain as explained in above sub-section. These smaller objective distortion measurements can beachieved in the STRB algorithm since it performs time-reversaloperation on selective blocks.

Furthermore, it is important to relate the phase and magnituderesponses with inter-aural time difference (ITD) and inter-aurallevel difference (ILD). According to the duplex theory [26],ITD is used to localize low frequency sounds, while ILD isused in the localization of high-frequency sounds. As can beseen from the discussion above, the STRB processed signal ofthe first channel can achieve negligible phase distortions in lowfrequency range and small magnitude distortions in high-fre-quency range. Since our proposed algorithm does not performany STRB processing on the second channel, the overall jointeffect shows that the proposed algorithm introduces negligibledistortion to the stereophonic image of the transmitted signals.

IV. NUMERICAL SIMULATIONS AND DISCUSSIONS

In numerical simulations, a male speech with duration of 10s is used to verify the effectiveness of our proposed STRB algo-rithm in an SAEC system. Two microphone signals and

are obtained by convolving the speech signal with twoimpulse responses and in the transmission room.These impulse responses are generated using the method of im-ages [23] with the source at {2.2, 1.5, 1.6} m, while the micro-phones are placed at {1.5, 2.5, 1.6} m and {2.5, 2.5, 1.6} m. Theecho is generated by further convolving and withimpulse responses and in the receiving room. Im-pulse responses and are also generated using themethod of images with the loudspeakers at {1, 2.5, 1.6} m and{3, 2.5, 1.6} m and microphone at {2.1, 1.5, 1.6} m. All impulseresponses are of length samples and both rooms areof dimensions m . A sampling frequency of 8 kHzand the two-channel NLMS algorithm with filter length of 512data samples are used throughout the simulation. The reverber-ation time (RT60) is set to 128 ms. A white noise is added to the

microphone signal in the receiving room to achieve an SNR of30 dB.

A. Comparison Between the STRB and HWR Algorithms

In this subsection, we used and for theevaluation of our proposed STRB algorithm. This block lengthresults in a processing delay of 16 ms at 8-kHz sampling rate,which is smaller than the maximum allowable delay budget of252 ms for a real-time implementation [27]. The threshold ischosen as 0.01 which corresponds to 56-dB SPL giving an ac-ceptable audio quality in a common meeting room [21]. TheSTRB algorithm is benchmarked against the HWR algorithmwith a commonly used value of [1]. The quantitativeperformance of the two algorithms are evaluated and comparedin terms of the Bark Spectral Distortion measurement, conver-gence rate of normalized misalignment, echo return loss en-hancement and subjective listening test.

1) Bark Spectral Distortion (BSD) Measurement: In orderto verify that our proposed STRB algorithm introduces less dis-tortion compared to the HWR algorithm, we employ the BSDmeasurement, whereby a smaller value corresponds to a smallerdistortion [28]. The BSD takes into account auditory frequencywarping, critical band integration, amplitude sensitivity varia-tions with frequency, and subjective loudness. The mean BSDof the STRB and HWR algorithms are found to be 0.0010 and0.0274, respectively. Thus, the STRB algorithm shows an im-provement of 96.35% in terms of BSD over the HWR algorithm.Hence, the proposed STRB algorithm can achieve better BSDmeasurement compared to the HWR algorithm in the SAECsystem [1].

2) Convergence Rate of Normalized Misalignment: We nowevaluate the convergence performance of our proposed STRBand the HWR algorithms in terms of normalized misalignmentdefined by

(27)

In the first simulation, we compare the convergence rate be-tween STRB and HWR algorithms using a synthesized whiteGaussian noise (WGN) as input signal. The purpose of usingsynthesized WGN as a source signal is to allow the STRB al-gorithm to selectively time-reverse blocks having energies thatare less than a specified threshold. This also allows us to com-pare our algorithm with that of the HWR algorithm [1]. Thesynthesized WGN is created by concatenating and alternatingbetween two WGNs whose variances are 1 with duration of1 s and 0.01 with duration of 1.5 s, respectively. This synthe-sized WGN closely mimics a typical speech signal [29], whereapproximately 38.53% of the speech duration consists of highspeech energy information while the remaining 61.47% consistsof pauses. The total duration of synthesized WGN is 60 s asshown at the top part of Fig. 8. In order to achieve the samesteady-state normalized misalignment, we use a step-size of 0.5and 0.7 for the STRB and HWR algorithms, respectively. Asshown in Fig. 8, both the STRB and HWR algorithms convergeto the steady-state normalized misalignment of 8.5 dB. The pro-posed STRB algorithm achieves a 2.3-dB improvement in con-vergence rate compared to the HWR algorithm. It is also useful

Page 8: Time-Reversal Approach to the Stereophonic Acoustic Echo Cancellation Problem

392 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 19, NO. 2, FEBRUARY 2011

Fig. 8. Convergence rate of normalized misalignment plots for HWR algorithmwith step-size � � ��� (dashed line) and STRB algorithm with step-size � �

��� (solid line). The input signal is synthesized WGN source.

Fig. 9. (a) Normalized speech and misalignment plots for (b) no decorrelation,(c) HWR algorithm, and (d) STRB algorithm.

to note that the steady-state normalized misalignment is calcu-lated in time-averaged window of 1024 samples. In our subse-quent speech simulations, we use these step-size values whencomparing STRB and HWR algorithms.

In the second simulation based on a male speech input withduration 10 s, we investigate the convergence performances ofthe STRB algorithm labeled as (d) against that using the HWRalgorithm labeled as (c), as shown in Fig. 9. As shown in thetop part of Fig. 9, selective TR processing is only applied tospeech blocks marked by the rectangular dotted lines. As canbe seen from this result, the proposed STRB-NLMS algorithmcan achieve an improvement of approximately 2.6-dB steady-state normalized misalignment compared to the HWR-NLMSalgorithm [1]. The main reason for this better performance is dueto the fact that STRB processing can decorrelate stereophonicsignals more efficiently than the HWR processing in the SAECsystem.

3) Echo-Return-Loss Enhancement (ERLE): Besides evalu-ating the normalized misalignment, the ERLE defined by

ERLE (28)

is chosen as a metric for performance evaluation. Fig. 10 showsthe ERLE performance of the HWR and STRB algorithmsdenoted by dashed and solid lines, respectively. These plotsare generated using a time-averaged window of 1024 samples.As can be seen, our proposed STRB algorithm achieves higher

Fig. 10. ERLE performances using HWR algorithm (dashed line) and STRBalgorithm (solid line).

Fig. 11. Normalized misalignment plots for HWR algorithm (dashed line) andSTRB algorithm (solid line). An abrupt change is made in the transmission roomat 5 seconds.

ERLE than the HWR algorithm throughout the speech dura-tion. In particular, the time-averaged ERLE performance usingthe STRB algorithm is 2 dB higher than that using the HWRalgorithm.

B. Robustness of the STRB Algorithm in the SAEC System

In this subsection, the proposed STRB algorithm is simu-lated in two different conditions including an abrupt change inthe transmission room impulse responses [30] and backgroundnoise addition to speech signals in the transmission room. Thisabrupt change is achieved by shifting the source from {2.2, 1.5,1.6} m to {1.9, 1.5, 1.6} m, after 5 s from the start of the adaptiveprocess. All acoustic impulse responses are generated using themethod of images [23] with the microphones at {1.5, 2.5, 1.6} mand {2.5, 2.5, 1.6} m. As shown in Fig. 11, our proposed STRBalgorithm can achieve an approximately 4.7-dB improvement innormalized misalignment compared to the HWR algorithm be-fore the echo path change. After the echo path changes at 5 s, theSTRB algorithm is able to track the unknown system by main-taining its high convergence rate giving approximately 2-dB im-provement in convergence over that of the HWR algorithm.

Second, the proposed STRB algorithm is verified for the caseof input speeches containing a background noise in the trans-mission room. The background noise is chosen as a commonroom acoustic noise, which is added to the input speeches. Inthis experiment, the impulse responses in the transmission room

Page 9: Time-Reversal Approach to the Stereophonic Acoustic Echo Cancellation Problem

NGUYEN et al.: TIME-REVERSAL APPROACH TO THE SAEC PROBLEM 393

Fig. 12. (a) Normalized speech containing background noise in the transmis-sion room and normalized misalignment plots for (b) no decorrelation (dottedline), (c) HWR algorithm with � � ��� (dashed line), and (d) STRB algorithmwith � � ���� and � � ��� (solid line).

are generated using the method of images [23] with the sourceat {2.2, 1.5, 1.6} m, while the microphones are placed at {1.5,2.5, 1.6} m and {2.5, 2.5, 1.6} m. These impulse responses areof length samples and the transmission room is ofdimensions m . Due to background noise added toachieve an SNR of 20 dB, the threshold is adjusted to 0.03so as to achieve sufficient convergence of the STRB algorithm.As shown in Fig. 12, the STRB algorithm achieves 7-dB im-provement in terms of steady-state normalized misalignmentcompared to the HWR algorithm. In listening tests, the unde-sirable harmonic distortion of the HWR processed speech canbe perceived more clearly. On the other hand, the STRB pro-cessed speech is perceived to be similar to that of the orig-inal speech. This is due to the fact that the background noisecan mask time-reversed signal with small amplitude. Hence,our proposed STRB algorithm is robust under some differentchanges in the transmission room.

C. Investigation of the STRB Parameters in the SAEC System

The threshold and the block length of the STRB algorithm arestudied and their effects on convergence performance are inves-tigated in this subsection. We first evaluate the performance ofthe STRB algorithm for different thresholds ( , 0.01,0.02 and 0.03) and a fixed block length . As shown inFig. 13, the convergence rate increases with threshold at the ex-pense of degradation in terms of audio quality. With ,the distortions caused by blocks of time-reversed speech are lessaudible since the STRB processed speech is approximately thesame as that for the original speech. This threshold also results inhigher rate of convergence of about 1.1 dB over the HWR algo-rithm. In the case of no background noise added to the transmis-sion room, we observe that the STRB algorithm using thresh-olds of and achieves similar steady-statenormalized misalignment of 10.1 dB. Thus, in this case, thethreshold can be chosen to range between 0.005 and 0.02 with atradeoff between convergence performance and speech quality.A threshold range of [0.02, 0.03] can be used in the actual trans-mission room with environmental noise.

We further investigate the performance of the proposedSTRB algorithm with different block lengths ( , 256and 512 corresponding to adaptive processing delay of 16, 32,and 64 ms, respectively). These experiments are investigatedusing four different thresholds of , 0.01, 0.02, and

Fig. 13. (a) Normalized speech and normalized misalignment plots for (b) nodecorrelation, (c) HWR with � � ���, (d) STRB with � � �����, (e) STRBwith � � ����, (f) STRB with � � ����, and (g) STRB with � � ���� (AllSTRBs have fixed � � ���.)

Fig. 14. (a) Normalized speech and normalized misalignment plots for (b) nodecorrelation, (c) HWR with� � ���, (d) STRB with� � ���, (e) STRB with� � ���, and (f) STRB With � � ���. (All STRBs have fixed � � ����.)

0.03. For purpose of illustration, a sample plot of normalizedmisalignment for the STRB algorithm with is givenas shown in Fig. 14. With the same threshold, we observe thatsmaller block length leads to a higher rate of convergence. Thisis because more blocks are being time-reversed giving lowerinterchannel coherence between the stereophonic channels.Finally, the PSDR, BSD and normalized misalignment resultsunder different thresholds and block lengths are summarizedand listed in Table II. This table is useful to appropriately selectthe threshold and block length of the STRB algorithm for adesired performance in an SAEC system.

D. Comparison Among Different Decorrelation Techniques inthe SAEC System

We now compare the performance of TR against otherdecorrelation techniques applied onto selected blocks. Basedon conventional HWR and RNA algorithms, we extended twodifferent algorithms including selective random noise addi-tion (SRNA) and selective half-wave rectifier (SHWR). TheSRNA and SHWR algorithms perform RNA [3] and HWRtransformations [1] on selected blocks with mean absolutemagnitude smaller than the specified threshold , similar tothat of the proposed STRB algorithm. The SRNA algorithmadds a WGN with an SNR of 30 dB on the selected blocks,and the SHWR algorithm introduces the HWR processing, asdescribed in (6), with on the selected blocks. A white

Page 10: Time-Reversal Approach to the Stereophonic Acoustic Echo Cancellation Problem

394 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 19, NO. 2, FEBRUARY 2011

TABLE IIINVESTIGATION OF THE PROPOSED STRB ALGORITHM IN

DIFFERENT THRESHOLDS AND BLOCK LENGTHS

The normalized misalignment is evaluated as the steady-state normalized

misalignment with input speech duration of 10 s.

Fig. 15. (a) Normalized speech and normalized misalignment plots for (b) nodecorrelation, (c) SHWR algorithm, (d) SRNA algorithm, and (e) STRB algo-rithm. Note that the selected blocks are indicated as rectangular dotted lines inthe normalized speech.

noise is added to the microphone signal in the receiving roomto achieve and SNR of 30 dB. We compare the convergencerate among SHWR, SRNA, and STRB algorithms, respectivelylabeled as (c), (d), and (e) in Fig. 15. The specified threshold

is selected as 0.01. As seen in Fig. 15, the STRB algorithmachieves a 4.4-dB improvement in terms of steady-state normal-ized misalignment compared to the SHWR algorithm. On theother hand, the STRB algorithm achieves similar steady-statenormalized misalignment compared to the SRNA algorithm.However, the SRNA algorithm distorts the audio quality sig-nificantly due to the large variance of the random noise addedto the selected blocks of original signals. This is in contrast tothe STRB algorithm, which retains the audio and stereophonicquality of the SAEC system.

E. Subjective Listening Test

Based on the ITU recommendation [31], subjective listeningtests are undertaken with ten listeners. The stereophonic sig-nals comprise of male and female speeches with a duration of10 s. The stereophonic signals are processed using differentdecorrelation techniques including STRB with and

, HWR with , SHWR with , as wellas SRNA with 30-dB noise addition. Independent listeners hear

TABLE IIIMEAN OPINION SCORE VALUE OF THE STEREOPHONIC SIGNALS

EMPLOYING DIFFERENT DECORRELATION TECHNIQUES

TABLE IVCOMPARISONS AMONG DIFFERENT DECORRELATION

TECHNIQUES IN SAEC SYSTEM

and rank the quality of these processed speech signals usingthe Mean Opinion Score (MOS) with 5 scales of subjectivequality (1—Bad, 2—Poor, 3—Fair, 4—Good and 5—Excel-lent). The MOS of the processed male and female speech sig-nals are averaged over all listeners for each of the five scales.As mentioned in Table III, the mean MOS values of the HWR,SHWR and STRB processed signals are recorded as 3.6, 4.25,and 4.15, respectively, indicating that these processed signalsachieve good audio quality. On the other hand, the SRNA pro-cessed speeches are evaluated as poor quality due to the additionof 30-dB random noise on selected blocks which can be easilybe perceived as speech distortions.

F. Summary

In summary, the comparison of the SRNA, SHWR, STRB,and HWR algorithms are listed in Table IV. These techniquesare evaluated in terms of PSDR, BSD, normalized misalign-ment, and ERLE performances. From Table IV, we observethat the proposed STRB algorithm can achieve fastest conver-gence rate with excellent stereophonic quality (PSDR and BSD)compared to other existing decorrelation techniques. Thus, theproposed STRB algorithm can achieve an effective tradeoff be-tween convergence rate and stereophonic perception. In addi-tion, the STRB algorithm is computationally efficient to imple-ment for an SAEC system.

V. CONCLUSION

In this paper, the STRB algorithm was proposed to mitigatethe well-known misalignment problem in an SAEC system. TheSTRB algorithm employs a magnitude detector to select andtime-reverse appropriate blocks in order to achieve signal decor-relation. Thus, the STRB algorithm can achieve smaller inter-channel coherence, faster convergence and better ERLE com-pared to that for the HWR algorithm in the SAEC system. Inaddition, the proposed STRB algorithm achieves lower stereo-phonic image and speech distortions over the HWR algorithm.The STRB algorithm also performs favorably when comparedwith SRNA and SHWR algorithms in terms of convergence rateand stereophonic perception performances. Hence, the proposedSTRB algorithm can overcome present technical challenges of

Page 11: Time-Reversal Approach to the Stereophonic Acoustic Echo Cancellation Problem

NGUYEN et al.: TIME-REVERSAL APPROACH TO THE SAEC PROBLEM 395

an SAEC system and provides a suitable and cost-effective so-lution for teleconferencing and multi-participant desktop con-ferencing applications.

ACKNOWLEDGMENT

The authors would like to thank the editor and the reviewersfor constructive comments and helpful suggestions.

REFERENCES

[1] J. Benesty, D. R. Morgan, and M. M. Sondhi, “A better understandingand an improved solution to the specific problems of stereophonicacoustic echo cancellation,” IEEE Trans. Speech Audio Process., vol.6, no. 2, pp. 156–165, Mar. 1998.

[2] K.-A. Lee, W.-S. Gan, and S. M. Kuo, Subband Adaptive Filter: Theoryand Implementation. Chichester, U.K.: Wiley, 2009.

[3] M. M. Sondhi, D. R. Morgan, and J. L. Hall, “Stereophonic acousticecho cancellation—An overview of the fundamental problem,” IEEESignal Process. Lett., vol. 2, no. 8, pp. 148–151, Aug. 1995.

[4] P. Surin, N. Tangsangiumvisai, and S. Aramvith, “An adaptive noisedecorrelation technique for stereophonic acoustic echo cancellation,”in Proc. TENCON, Nov. 2004, pp. 112–115.

[5] A. Gilloire and V. Turbin, “Using auditory properties to improve thebehaviour of stereophonic acoustic echo cancellers,” in Proc. IEEE Int.Conf. Acoust., Speech, Signal Process., May 1998, pp. 3681–3684.

[6] Y. W. Jung, J. H. Lee, Y. C. Park, and D. H. Youn, “A new adaptivealgorithm for stereophonic acoustic echo canceller,” in Proc. IEEE Int.Conf. Acoust., Speech, Signal Process., Jun. 2000, pp. 801–804.

[7] M. Ali, “Stereophonic acoustic echo cancellation system using time-varying all-pass filtering for signal decorrelation,” in Proc. IEEE Int.Conf. Acoust., Speech, Signal Process., May 1998, pp. 3689–3692.

[8] Y. Joncour and A. Sugiyama, “A stereophonic echo canceller withpre-processing for correct echo path identification,” in Proc. IEEE Int.Conf. Acoust., Speech, Signal Process., May 1998, pp. 3677–3680.

[9] S. Shimauchi, Y. Haneda, S. Makino, and Y. Keneda, “New configura-tion for a stereophonic echo canceller with nonlinear pre-processing,”in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., May 1998,pp. 3685–3688.

[10] H. I. K. Rao and B. F. Boroujeny, “Fast LMS/Newton algorithms forstereophonic acoustic echo cancellation,” IEEE Trans. Signal Process.,vol. 57, no. 8, pp. 2919–2930, Aug. 2009.

[11] J. Benesty, D. R. Morgan, J. L. Hall, and M. M. Sondhi, “Stereophonicacoustic echo cancellation using nonlinear transformations and combfiltering,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process.,May 1998, pp. 3673–3676.

[12] T. Gansler and J. Benesty, “Stereophonic acoustic echo cancellationand two-channel adaptive filtering: An overview,” Int. J. Adapt. ControlSignal Process., vol. 14, pp. 565–586, Aug. 2000.

[13] D. R. Morgan, J. L. Hall, and J. Benesty, “Investigation of severaltypes of nonlinearity for use in stereophonic acoustic echo cancella-tion,” IEEE Trans. Speech Audio Process., vol. 9, no. 6, pp. 686–696,Sep. 2001.

[14] J. Benesty, P. Duhamel, and Y. Grenier, “A multichannel affine projec-tion algorithm with applications to multichannel acoustic echo cancel-lation,” IEEE Signal Process. Lett., vol. 3, no. 2, pp. 35–37, Feb. 1996.

[15] A. W. H. Khong and P. A. Naylor, “Stereophonic acoustic echo can-cellation employing selective-tap adaptive algorithms,” IEEE Trans.Speech Audio Process., vol. 14, no. 3, pp. 785–796, May 2006.

[16] S. Yon, M. Tanter, and M. Fink, “Sound focusing in rooms: The timereversal approach,” J. Acoust. Soc. Amer., vol. 113, pp. 1533–1543,2003.

[17] S. Catheline, M. Fink, N. Quieffin, and R. K. Ing, “Acoustic source lo-calization model using in-skull reverberation and time reversal,” Appl.Phys. Lett., vol. 90, no. 6, 2007, 063902.

[18] D.-Q. Nguyen, W.-S. Gan, and A. W. H. Khong, “Selective time-re-versal block solution to the stereophonic acoustic echo cancellation,”in Proc. Eur. Signal Process. Conf., 2009, pp. 1987–1991.

[19] A. W. H. Khong, J. Benesty, and P. Naylor, “Stereophonic acousticecho cancellation: Analysis of the misalignment in the frequency do-main,” IEEE Signal Process. Lett., vol. 13, no. 1, pp. 33–36, Jan. 2006.

[20] E. Terhardt, “Calculating virtual pitch,” Hear. Res., vol. 1, no. 155, p.182, 1979.

[21] H. W. Gierlich and F. Kettler, “Advanced speech quality testing ofmodern telecommunication equipment: An overview,” Signal Process.,vol. 86, pp. 1327–1340, 2006.

[22] W. S. Gan and S. M. Kuo, Embedded Signal Processing With the MicroSignal Architecture. Hoboken, NJ: Wiley, 2007.

[23] J. B. Allen and D. A. Berkley, “Image method for efficiently simulatingsmall-room acoustics,” J. Acoust. Soc. Amer., vol. 65, pp. 943–950,1979.

[24] J. S. Bendat and A. G. Piersol, Engineering Applications of Correlationand Spectral Analysis. New York: Wiley, 1993, ch. 3, pp. 50–56.

[25] F. Martellotta, “Subjective study of preferred listening conditions inItalian Catholic churches,” J. Sound Vibr., vol. 317, pp. 378–399, 2008.

[26] T. C. T. Yin, “Neural mechanisms of encoding binaural localizationcues in the auditory brainstem,” in Integrative Functions in the Mam-malian Auditory Pathway, D. Oertel, Ed. et al. Berlin, Germany:Springer-Verlag, 2002, ch. 4, pp. 99–103.

[27] S. Na and S. Yoo, “Allowable propagation delay for VoIP calls of ac-ceptable quality,” in AISA, W. Chang, Ed. New York: Springer, 2002,vol. 2402, LNCS, pp. 47–55.

[28] S. Wang, A. Sekey, and A. Gersho, “An objective measure forpredicting subjective quality of speech coders,” IEEE J. Sel. AreasCommun., vol. 10, no. 5, pp. 819–829, Jun. 1992.

[29] “CCITT Recommendation P.59,” Int. Telecomm. Union (ITU), 1993.[30] C. Paleologu, S. Ciochina, and J. Benesty, “Variable step-size NLMS

algorithm for under-modeling acoustic echo cancellation,” IEEE SignalProcess. Lett., vol. 15, pp. 5–9, 2008.

[31] “CCITT Recommendation E.432,” Int. Telecomm. Union (ITU), 1992.

Dinh-Quy Nguyen (S’07) was born in Hanoi,Vietnam, in 1983. He received the B.Eng. (Honors)degree from the School of Electrical and ElectronicEngineering, Nanyang Technological University(NTU), Singapore, in 2006. He is currently pursuingthe Ph. D. degree in the Digital Signal Processing(DSP) Laboratory, NTU.

His research interests include time reversal signalprocessing, array signal processing, adaptive signalprocessing, acoustic echo cancellation, and acousticinverse scattering problem.

Woon-Seng Gan (S’90–M’93–SM’00) receivedthe B.Eng. (First Class Honors) and Ph.D. degreesin electrical and electronic engineering from theUniversity of Strathclyde, Glasgow, U.K., in 1989and 1993, respectively.

He joined the School of Electrical and ElectronicEngineering, Nanyang Technological University(NTU), Singapore, as a Lecturer and Senior Lecturerin 1993 and 1998, respectively. In 1999, he waspromoted to Associate Professor. He is currentlythe Deputy Director of the Center for Signal Pro-

cessing at NTU. His research interests include adaptive signal processing,psycho-acoustical signal processing, audio processing, and real-time embeddedsystems. He has recently coauthored a book titled Digital Signal Processors:Architectures, Implementations, and Applications (Prentice-Hall, 2005). Thisbook has since been translated to Chinese for adoption by universities inChina. He is also the leading author of a new book titled Embedded SignalProcessing with the Micro Signal Architecture, (Wiley-IEEE, 2007). A newbook on Subband Adaptive Filtering: Theory and Implementation is due to bepublished by Wiley in July 2009.

Dr. Gan won the Institute of Engineer Singapore (IES) Prestigious Engi-neering Achievement Award in 2001 for his work on the Audio Beam System.He has published more than 180 international refereed journals and conferencesand has been awarded four Singapore and U.S. patents.

Andy W. H. Khong (M’06) received the B.Eng.degree in electrical and electronic engineering fromthe Nanyang Technological University, Singapore,in 2002 and the Ph.D. degree from Imperial CollegeLondon, London, U.K., in 2006.

From 2006 to 2008, he was a Research Associateat Imperial College London. Since 2008, he has beena faculty member at the Nanyang TechnologicalUniversity, Singapore. His research interests aremainly in the area of acoustic propagation, (blindand nonblind) adaptive algorithms for single and

multichannel acoustic system identification with applications to echo controland speech enhancement. His industrial research experience includes sourcelocalization and classification using acoustic and seismic signals.