
1860 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 49, NO. 9, SEPTEMBER 2001

Analysis of the Partitioned Frequency-Domain Block LMS (PFBLMS) Algorithm

Kheong Sann Chan and Behrouz Farhang-Boroujeny, Senior Member, IEEE

Abstract—In this paper, we present a new analysis of the partitioned frequency-domain block least-mean-square (PFBLMS) algorithm. We analyze the matrices that control the convergence rates of the various forms of the PFBLMS algorithm and evaluate their eigenvalues for both white and colored input processes. Because of the complexity of the problem, the detailed analyses are only given for the case where the filter input is a first-order autoregressive process (AR-1). However, the results are then generalized to arbitrary processes in a heuristic way by looking into a set of numerical examples. An interesting finding (that is consistent with earlier publications) is that the unconstrained PFBLMS algorithm suffers from slow modes of convergence, which the FBLMS algorithm does not. Fortunately, however, these modes are not present in the constrained PFBLMS algorithm. A simplified version of the constrained PFBLMS algorithm, which is known as the schedule-constrained PFBLMS algorithm, is also discussed, and the reason for its similar behavior to that of its fully constrained version is explained.

Index Terms—Adaptive filters, block LMS, FBLMS, frequency domain, partitioned FBLMS.

I. INTRODUCTION

IN the realization of adaptive filters, the least-mean-square (LMS) algorithm has always been one of the most popular adaptation schemes. Its conventional form, which was first proposed by Widrow and Hoff [1], has been very well analyzed and understood. Its main drawback is that it does not perform very well in the presence of a highly colored filter input. In the past, researchers have developed many variations of the LMS algorithm, reducing the complexity, increasing the convergence rate, and tailoring it for certain specific applications [2]–[16].

The frequency-domain block LMS (FBLMS) algorithm (also known as the fast block LMS) was initially proposed by Ferrara [2] to cut down the computational complexity of the algorithm. It takes advantage of the existence of efficient algorithms for the computation of the discrete Fourier transform (DFT) and the fact that point-wise multiplication in the frequency domain is equivalent to circular convolution in the time domain [17]. In this way, a block of L outputs of the filter is simultaneously calculated. The block size L is typically (although not necessarily) chosen to be the same as or close to the filter size.
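The DFT property the FBLMS algorithm rests on is easy to verify numerically. The following sketch (not from the paper; names are illustrative) checks that point-wise multiplication of DFTs matches circular convolution, using a plain O(N²) DFT for clarity:

```python
# Sketch: point-wise multiplication in the frequency domain equals
# circular convolution in the time domain (the property FBLMS exploits).
import cmath

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def circular_convolve(x, h):
    N = len(x)
    return [sum(x[m] * h[(n - m) % N] for m in range(N)) for n in range(N)]

x = [1.0, 2.0, 3.0, 4.0]
h = [1.0, -1.0, 0.5, 0.0]

direct = circular_convolve(x, h)
via_dft = [c.real for c in idft([X * H for X, H in zip(dft(x), dft(h))])]

assert all(abs(a - b) < 1e-9 for a, b in zip(direct, via_dft))
```

An FFT would compute the same transforms in O(N log N), which is where the computational savings of the FBLMS algorithm come from.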

Manuscript received May 2, 2000; revised May 18, 2001. The associate editor coordinating the review of this paper and approving it for publication was Dr. Ali H. Sayed.

K. S. Chan is with the Data Storage Institute, National University of Singapore, Singapore.

B. Farhang-Boroujeny is with the Department of Electrical Engineering, University of Utah, Salt Lake City, UT 84112-9206 USA and also with the National University of Singapore, Singapore (e-mail: [email protected]).

Publisher Item Identifier S 1053-587X(01)07043-X.

Ferrara’s algorithm was initially proposed as an exact but fast implementation of the time-domain block LMS (BLMS) algorithm. However, Mansour and Gray [16] subsequently showed that it was also possible to omit an operation that constrained certain time-domain quantities at the cost of an increase in the misadjustment. This algorithm, which is referred to as the unconstrained FBLMS algorithm, saves two out of the five fast Fourier transforms (FFTs) required in the original (constrained) FBLMS algorithm. However, it is no longer an exact implementation of the time-domain block LMS algorithm.

It was also found that since the transformed samples of the filter input (known as frequency bins) are almost uncorrelated from one another, one may use separate normalized step-size parameters for each of the frequency bins [18], [19], thereby equalizing the convergence rates of all frequency bins. The step-normalization procedure resolves the problem of slow modes of the LMS algorithm and results in an algorithm that converges faster than the conventional LMS algorithm [18], [20].
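The per-bin normalization described above can be sketched in a few lines (the step size mu and the per-bin powers below are illustrative numbers, not values from the paper):

```python
# Illustrative sketch of per-bin step normalization: each frequency bin is
# adapted with an effective step size mu / P_k, where P_k is the estimated
# power in that bin, so weak bins adapt as fast as strong ones.
mu = 0.1                                 # assumed unnormalized step size
powers = [4.0, 1.0, 0.25, 1.0]           # assumed per-bin power estimates
steps = [mu / P_k for P_k in powers]     # normalized step size per bin
assert steps[2] == 16 * steps[0]         # the weakest bin gets the largest step
```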

Analysis done on the FBLMS algorithm has shown that as the block size grows, the eigenvalues that control the convergence behavior of the FBLMS algorithm all asymptotically approach the same value, resulting in the fastest possible convergence rate for a given filter length [20]. The shortcoming lies in the fact that as the filter length grows, more time must be spent accumulating the input data before any processing can begin. Therefore, when the number of taps of the adaptive filter is large, the FBLMS algorithm suffers a significant delay from the time the first datum of the current block is collected until the processing for the current block is completed. This delay is referred to as the algorithm latency. One solution is simply to use a smaller block size, but this results in an algorithm that is computationally less efficient.

Asharif et al. [3] have suggested a solution to this that involves partitioning the time-domain convolution into smaller-sized convolutions and performing each of these convolutions in the frequency domain. This algorithm is referred to here as the partitioned FBLMS (PFBLMS) algorithm; it solves the problem of a large latency without sacrificing computational efficiency, and the FBLMS algorithm is a special case of the PFBLMS algorithm with P = 1. The analysis of the PFBLMS algorithm, however, is more complex than that of the FBLMS algorithm.
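The partitioning idea can be illustrated numerically. This sketch (illustrative, assuming an N = PM-tap filter split into P contiguous length-M partitions) checks that summing the delayed partition outputs reproduces the full convolution:

```python
# Sketch: a length-N convolution as the sum of P length-M convolutions,
# each applied to the input delayed by p*M samples.
import random

def convolve(x, w):
    # linear convolution, output truncated to len(x); x is zero outside its range
    return [sum(w[i] * x[n - i] for i in range(len(w)) if 0 <= n - i < len(x))
            for n in range(len(x))]

random.seed(0)
M, P = 4, 3                      # partition length and number of partitions
N = M * P                        # total filter length
w = [random.gauss(0, 1) for _ in range(N)]
x = [random.gauss(0, 1) for _ in range(32)]

full = convolve(x, w)

# Partition p uses taps w[p*M : (p+1)*M] applied to the input delayed by p*M.
partitioned = [0.0] * len(x)
for p in range(P):
    delayed = [0.0] * (p * M) + x[:len(x) - p * M]
    for n, v in enumerate(convolve(delayed, w[p * M:(p + 1) * M])):
        partitioned[n] += v

assert all(abs(a - b) < 1e-9 for a, b in zip(full, partitioned))
```

In the PFBLMS algorithm each of these small convolutions is carried out in the frequency domain, so the latency is set by the partition length rather than the full filter length.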

Moulines et al. have analyzed the PFBLMS algorithm in [21] (which they refer to as a multidelay adaptive filter). They have performed first- and second-order analysis on the matrices. Their analysis resembles that done by Lee and Un in [20], in which they derive the matrices that control the various implementations of the FBLMS algorithm. They show that in

1053–587X/01$10.00 © 2001 IEEE


the cases of the normalized constrained/unconstrained FBLMS algorithms and the normalized constrained PFBLMS algorithm, the matrices will be asymptotically equivalent to some other matrices that have identical eigenvalues. Therefore, in the limit as the matrix dimension grows to infinity, the eigenvalue spread will tend toward 1. However, the unconstrained PFBLMS algorithm is found to suffer from slow modes of convergence. Beaufays has shown in [22] that although the asymptotic equivalence of two matrices may mean that the eigenvalue moments will be the same (as proven by Gray [23]), there may still exist individual eigenvalues that do not asymptotically converge to the same values. In addition, in [7], it has been noted that the overlapping of successive partitions in the PFBLMS algorithm degrades its unconstrained convergence behavior. A solution to this problem was then given based on a simplified analysis of the algorithm. It was noted that by reducing the amount of overlap between successive partitions, the slow convergence of the PFBLMS algorithm is resolved to some extent. Further work along this line has been done in [24]. In [25], McLaughlin has proposed an alternative algorithm where the constraining operation is applied to the various partitions on a scheduled basis, with specific reference to the application of acoustic echo cancellation. This scheduling of the constraint results in a significant reduction in the computational complexity of the algorithm while keeping the convergence at almost the same rate as the fully constrained PFBLMS algorithm. This method is therefore referred to as the schedule-constrained PFBLMS algorithm. Analysis of this algorithm has yet to be reported.

In this paper, we present an alternative analysis of the PFBLMS algorithm based on a direct evaluation and analysis of the underlying convergence-controlling matrices, without relying on the concept of asymptotic equivalence. We use Gerschgorin’s theorem to derive bounds on the distribution of the eigenvalues of these matrices. To make the analysis tractable, we first consider the case where the filter input is modeled as a first-order autoregressive process (AR-1). We then show through numerical examples that the matrices evaluated for the AR-1 case are similar to those of more general processes as well. We also provide an overview of the less-understood schedule-constrained PFBLMS algorithm based on the matrices we have developed in this paper.

II. PFBLMS ALGORITHM

We consider the implementation of an adaptive transversal filter with N taps. The filter output y(n) is related to its input x(n) by

y(n) = sum_{i=0}^{N-1} w_i x(n - i)    (1)

where the w_i's are the adaptive filter tap weights. The convolution sum of (1) may be partitioned into P smaller-sized convolution sums according to

y(n) = sum_{p=0}^{P-1} y_p(n)    (2)

where

y_p(n) = sum_{i=0}^{M-1} w_{pM+i} x(n - pM - i)    (3)

and each of these smaller convolutions is then performed efficiently in the frequency domain and summed to give y(n) as per (2). Then, the input and tap-weight vectors for the pth partition of the algorithm are

(4)

respectively, where L is the block size and k is the block index. We also note that there is an overlap of L data samples between successive partitions and that the addition of L zeros at the end of the tap-weight vector is to allow the computation of the filter output based on the overlap-save method [17], which can be implemented in the frequency domain.
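A minimal sketch of the overlap-save step assumed by (4)–(6) (assumptions: partition length M, block size L, plain DFTs in place of FFTs, illustrative names): the last L elements of the (M+L)-point circular convolution are the valid filter outputs.

```python
# Sketch: overlap-save block filtering for a single partition.
import cmath, random

def dft(x, inverse=False):
    N, s = len(x), (1 if inverse else -1)
    out = [sum(x[n] * cmath.exp(s * 2j * cmath.pi * k * n / N) for n in range(N))
           for k in range(N)]
    return [v / N for v in out] if inverse else out

random.seed(1)
M = L = 4
w = [random.gauss(0, 1) for _ in range(M)]          # one partition's taps
x = [random.gauss(0, 1) for _ in range(2 * L)]      # enough history for one block

# Input segment: M overlapped samples followed by the L new samples of the block.
segment = x[L - M:L + L]                            # length M + L
W = dft(w + [0.0] * L)                              # taps padded with L zeros
Y = [a * b for a, b in zip(dft(segment), W)]        # point-wise multiplication
y_block = [v.real for v in dft(Y, inverse=True)][M:]  # keep the last L outputs

# Reference: direct linear convolution at the same output times.
ref = [sum(w[i] * x[n - i] for i in range(M)) for n in range(L, 2 * L)]
assert all(abs(a - b) < 1e-9 for a, b in zip(y_block, ref))
```

The first M outputs of the circular convolution are contaminated by wraparound and are discarded, which is exactly why the windowing matrices of the algorithm zero out part of each (M+L)-point vector.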

The frequency-domain input and tap-weight vectors used in the PFBLMS algorithm are the DFTs of the corresponding time-domain vectors:

(5)

where F is the (M+L) × (M+L) DFT matrix. The frequency-domain output vector is given by

(6)

where ∘ denotes point-wise multiplication of vectors, and the diagonal-matrix form of a vector takes the elements of that vector along its diagonal. The block of L time-domain filter output samples is given by the last L elements of the inverse DFT of the frequency-domain output vector. The corresponding time-domain errors are

e(n) = d(n) - y(n)    (7)

for n = kL, ..., kL + L - 1, where the d(n)'s are the elements of the time-domain training sequence. The frequency-domain error vector is then given as the (M+L) × 1 DFT of the time-domain error vector, where

(8)

This may also be written entirely in the frequency domain as

(9)

where the training-signal term is the DFT of the training signal padded with M zeros at the beginning, and

(10)

¹We note that for a block of length L and partition length M, the length of e(k) need only be M+L−1. The choice of M+L is, however, more common as it greatly simplifies the implementation [19] and analysis.


is a windowing matrix that forces the first M time-domain elements to zero [19]. The frequency-domain error vector is then used to update the frequency-domain tap-weight vectors of each partition according to

(11)

where “*” denotes complex conjugation,

(12)

is the windowing matrix that forces the last L time-domain elements to zero, μ is the unnormalized step size, and the remaining factor is a diagonal matrix of the powers at each of the frequency bins. The powers of the frequency bins are usually updated using the running average algorithm

where the forgetting factor is a parameter between zero and one that determines how much weight is given to previous powers. In the subsequent discussion, we choose to ignore the time dependence of the power estimates and assume that they are equal to a fixed diagonal matrix holding the powers at each frequency bin along its diagonal. Comments on the range of the forgetting factor that guarantees the stability of the PFBLMS algorithm may be found in [21].
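The running-average power update for a single bin can be sketched as follows (the symbol beta for the forgetting factor is an assumption; the paper's symbol is not recoverable from this copy):

```python
# Sketch of the per-bin power update: a first-order running average of |X(k)|^2.
def update_power(P_prev, X_k, beta=0.9):
    # running average: P(k) = beta * P(k-1) + (1 - beta) * |X(k)|^2
    return beta * P_prev + (1.0 - beta) * abs(X_k) ** 2

P = 0.0
for X_k in [1 + 1j, 2 + 0j, 0 + 3j]:
    P = update_power(P, X_k)
# P now holds a smoothed estimate of the power at this frequency bin.
```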

We may write the equations for the PFBLMS algorithm in a more compact form by defining the super matrix

(13)

and the super vector

(14)

With these definitions, (6) is more simply written as

(15)

The error vector is still given by (9), but the update equation (11) becomes²

(16)

where superscript H denotes the matrix Hermitian (conjugate transpose), and the block-diagonal super matrices that apply the constraining operation and the step normalization are formed by repeating the corresponding matrices of (11) along their diagonals.

²It is known that in a finite-precision implementation, (16) may face some numerical stability problems due to round-off errors [19]. The algorithm will work better if the constraining operation is applied to the entire right-hand side of (16). For the purposes of analysis, however, (16) is adequate as it stands.


The constraining matrix performs the constraining operation, and it is thus omitted in the unconstrained algorithm. The step-normalizing matrix is omitted for the unnormalized PFBLMS algorithm. The parameter μ is the unnormalized step size and is used to control how fast the algorithm converges, as well as the misadjustment level after convergence. Typically, μ is chosen such that the final misadjustment is on the order of 10%.
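The constraining operation is easy to mimic in the time domain. The sketch below (illustrative names) shows that applying the constraint zeroes the last L time-domain elements of a partition's tap vector while leaving the first M untouched:

```python
# Sketch: the constraint F G F^{-1} acting on a frequency-domain vector is
# equivalent to zeroing the last L time-domain elements.
import cmath

def dft(x, inverse=False):
    N, s = len(x), (1 if inverse else -1)
    out = [sum(x[n] * cmath.exp(s * 2j * cmath.pi * k * n / N) for n in range(N))
           for k in range(N)]
    return [v / N for v in out] if inverse else out

def constrain(W_freq, L):
    w_time = dft(W_freq, inverse=True)
    w_time[-L:] = [0.0] * L          # force the last L time-domain elements to zero
    return dft(w_time)

M = L = 4
W = dft([float(i) for i in range(M + L)])   # some frequency-domain tap vector
W_c = constrain(W, L)
w_c = dft(W_c, inverse=True)
assert all(abs(v) < 1e-9 for v in w_c[-L:])   # constrained taps vanish in time domain
```

This round trip costs two transforms, which is why omitting the constraint saves two of the five FFTs in the algorithm.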

In the rest of this paper, we limit our discussion to the analysis of step-normalized versions of the PFBLMS algorithm. The constrained PFBLMS algorithm without step normalization is an alternative (fast) implementation of the conventional block LMS algorithm, whose performance is well understood [19]. It performs very similarly to the conventional LMS algorithm. The unconstrained PFBLMS algorithm without step normalization, on the other hand, is difficult to analyze, and to the best of our knowledge, such an analysis has yet to be reported. The limited reports on the analysis of the PFBLMS algorithm, as in this paper, have concentrated mainly on the step-normalized forms of the algorithm. One reason for this is that the PFBLMS algorithm without step normalization is hardly ever used in practice. As mentioned in Section I, step normalization resolves the well-known problem of slow convergence modes in the LMS algorithm. In the FBLMS and PFBLMS algorithms, this solution comes at a minimal computational cost.

III. ANALYSIS OF THE PFBLMS ALGORITHM

We assume a model for the desired signal that consists of passing the input through a fixed FIR filter (the plant) and adding independent zero-mean white noise to the output. The coefficients of this fixed filter will be the optimum tap weights for the adaptive filter, i.e., those that minimize the power of the error vector. As such, we may write (9) as

(17)

where the optimum tap-weight super vector is defined in a way similar to that done in (4), (5), and (14), and the remaining term is the error (noise) added to the output of the plant. Substituting (17) into (16), we obtain

(18)

where the tap-weight error vector is the deviation of the tap-weight super vector from its optimum. Subtracting the optimum tap-weight vector from both sides, taking the expectation, and considering the commonly used independence assumption [19], we get

(19)

where

(20)


and we have used the fact that the cross term between the input and the noise vanishes, which readily follows from the independence of the input and the noise and the fact that the noise is zero mean. We note that (19) is the equation for the constrained normalized PFBLMS algorithm. The analysis for the unconstrained and/or unnormalized PFBLMS algorithms follows analogously by removing the constraining and/or normalizing matrices from the equation, respectively. Equation (19) shows that the modes of convergence of the PFBLMS algorithm are controlled by the properties of the matrix defined in (20). We therefore begin with an analysis of this matrix.

A. Matrix

Substituting (13) into (20), we obtain

(21)

where

(22)

and we have used standard properties of the DFT matrix. Since the diagonal block is Hermitian, the overall matrix is also Hermitian, and therefore, all of its eigenvalues will be real. The diagonal block has been shown to be asymptotically diagonal, and in the case when the number of partitions is one (i.e., the FBLMS algorithm), the unconstrained normalized algorithm performs sufficiently well as the filter length increases, even in the presence of a colored input [20], [26]. When the number of partitions is greater than one, however, we have to consider the effects of the off-diagonal submatrices on the matrix as well.

From (22), we see that since the first factor is diagonal, premultiplying by it is equivalent to multiplying the rows of the middle matrix by the corresponding diagonal elements. Similarly, postmultiplying by the diagonal matrix is equivalent to multiplying the columns by the corresponding diagonal elements. Noting these facts, we obtain

(23)

where

(24)

and the matrix defined in (24) is the correlation matrix between the inputs of partitions 0 and p. Using (23) in (21), we get

(25)

where we have made use of the symmetry properties of the correlation matrices.

Now, we make the extension to the matrix of the normalized PFBLMS algorithm, which, from (19), is

(26)

Here and in the subsequent equations, we use a superscript to indicate normalization. It turns out that the structure of the time-domain matrix

(27)

where the transformation is by a block-diagonal supermatrix whose diagonal block elements are the DFT matrix F, is simpler than that of its frequency-domain counterpart. From the theory of eigenvalues of matrices, it is known that similar matrices share the same set of eigenvalues, so the frequency- and time-domain matrices have identical eigenvalues. We thus proceed with the analysis of the time-domain matrix and its submatrices.

B. Matrix

For our analysis, we assume a first-order autoregressive input process (AR-1) with coloring parameter ρ. Such a process is generated by passing white noise through a filter with transfer function 1/(1 − ρz⁻¹). The normalized submatrices are readily obtained from (23), (25), (26), and (27) as

(28)

A procedure for the derivation of the elements of these submatrices for the case when M = L is given in Appendix A. The expressions obtained are cumbersome for the cases with general M and L. When M = L, examination of (4) reveals that inputs at different partitions will have significant correlation only when their partition numbers do not differ by more than 1. That is, the submatrices relating a partition to itself and to its immediate neighbors will have significant values, but those relating partitions that are further apart will all have norms that are close to zero, given that the filter input is not too highly colored and that M is large enough. The matrix is thus asymptotically block tridiagonal. We therefore summarize our results for the three main submatrices in which we are interested in Tables I–III, where each entry refers to the (i, j)th element of the specified matrix, and δ denotes the Kronecker delta function. More detailed derivations that include general elements can be found in [27].
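The AR-1 input model used throughout this analysis can be sketched as follows (the symbol rho for the coloring parameter is an assumption): white noise filtered by 1/(1 − rho·z⁻¹) yields a process whose normalized autocorrelation decays approximately as rho^|k|.

```python
# Sketch: generate an AR-1 process and check its autocorrelation decay.
import random

random.seed(42)
rho, n = 0.85, 100_000
x, prev = [], 0.0
for _ in range(n):
    prev = rho * prev + random.gauss(0.0, 1.0)   # x(n) = rho*x(n-1) + white noise
    x.append(prev)

def autocorr(x, k):
    m = len(x) - k
    return sum(x[i] * x[i + k] for i in range(m)) / m

r0 = autocorr(x, 0)
for k in (1, 2):
    assert abs(autocorr(x, k) / r0 - rho ** k) < 0.03
```

The geometric decay of this autocorrelation is what makes the cross-partition correlation matrices fall off rapidly and the overall matrix asymptotically block tridiagonal.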

The matrix controlling the convergence of the normalized unconstrained PFBLMS algorithm may then be explicitly written out as

(29)

C. Numerical Examples

At this stage in our study, it is instructive to look at the results of Tables I–III through some numerical examples. The numerical results presented here serve two purposes. First, they are used to confirm the accuracy of our derivations, as we have made an approximation in Appendix A in evaluating the diagonal elements. Second, the numerical examples serve


TABLE I: ELEMENTS OF THE MATRIX R

TABLE II: ELEMENTS OF THE MATRIX R

TABLE III: ELEMENTS OF THE MATRIX R

to fill the gap between the derived theoretical results for AR-1 input and the more general results associated with an arbitrary input process. Fig. 1 presents a pictorial representation of three of the time-domain submatrices in the left column and their normalized counterparts in the right column when the input process is white. Fig. 2 shows six similar matrices that were generated when the input is an AR-1 process with parameter 0.85. The relative magnitudes of the elements are represented in varying shades of gray, as indicated in the corresponding legends. Darker squares correspond to elements with larger magnitude, whereas lighter squares correspond to elements with smaller magnitude. The unnormalized submatrices were evaluated from knowledge of the autocorrelation function for an AR-1 process (which reduces to white input when the parameter is zero), and the normalized submatrices were computed from them using (28). The exact diagonal elements were used instead of the approximations employed in Appendix A. The numerical elements of these submatrices were compared with those obtained using Tables I–III and found to be in close agreement. This confirms the validity of the simplifying assumptions used in Appendix A, which gave rise to the results of Tables I–III. According to (28), each normalized submatrix is derived from its time-domain counterpart by first converting it to the frequency domain, taking the complex conjugate, point-wise multiplying it by the transformed partition-0 matrix, normalizing the result, and then converting it back into the time domain. Comparing Figs. 1 and 2, we see that although the original time-domain matrices differ quite substantially when the input is colored, as compared to when it is white, the normalized matrices look fairly similar for the two cases. This implies that the normalized PFBLMS algorithm ought to perform similarly, regardless of whether the input process is white or AR-1, provided that the block size L and the partition size M are sufficiently large.

With reference to Figs. 1 and 2, we make the following observations.

1) The partition-0 submatrix is close to diagonal. We recall that it is the correlation matrix that governs the convergence behavior of the FBLMS algorithm, i.e., the case when P = 1. Furthermore, the asymptotic diagonality of this matrix has been proven in the context of the FBLMS algorithm [20], [26].

2) The significant nonzero elements of the adjacent-partition submatrices are mainly along the diagonals of the upper-right and lower-left quadrants. The forms of these matrices may be understood when we see that the frequency-domain equivalent of each is generated by the point-wise multiplication of two frequency-domain matrices, as in (23). This corresponds to a two-dimensional (2-D) circular convolution in the time domain. When the input is white, one factor is a rectangular pulse along the diagonal of the top-right quadrant, as shown in Fig. 1(b), and the other is a rectangular pulse along the diagonal of the bottom-right quadrant. Circularly convolving the two rectangular pulses together produces a triangular pulse that spans the top-right and bottom-left quadrants, as shown in Fig. 1(e).

These two observations, as we will see, are fundamental to the convergence behavior of the various versions of the PFBLMS algorithm.

In Fig. 3, we have evaluated all the submatrices and presented a pictorial representation of the matrix for four different input processes:

i) white input process;
ii) lowpass AR-1 process;
iii) autoregressive moving average (ARMA) highpass process generated by a coloring filter;
iv) moving average (MA) bandpass process generated by passing a white process through a bandpass filter.

The interesting observation in Fig. 3 is that the general pattern of the matrix is almost independent of the nature of the input process. We


Fig. 1. Pictorial representation of the submatrices (a)–(f) for a white input process.

thus may conclude that the behavior of the PFBLMS algorithm is almost independent of the coloring of the filter input.

D. Matrix

From (19), we observe that the convergence of the constrained PFBLMS algorithm is governed by the eigenvalues of the constrained matrix or, equivalently, of its time-domain counterpart. Recalling the form of the constraining matrix, it is easy to see that

(30)


Fig. 2. Pictorial representation of the submatrices (a)–(f) for an AR-1 input process with parameter 0.85.

The constraining matrix sets the bottom half of each submatrix to zero. For M = L, this has the effect of zeroing out exactly half of the eigenvalues, as the windowing matrix is only of rank M. These zero eigenvalues correspond to the convergence modes of the second half of each partition, which are forced to zero at each iteration by the constraining operation. They are thus automatically set to their optimum values (zero) at each iteration and, therefore, do not need to converge. The convergence of the constrained PFBLMS algorithm is therefore determined by the remaining nonzero eigenvalues.

At this point, it is instructive to examine this structure in more detail and to determine its effect on the distribution of the eigenvalues. In Fig. 4, we present a pictorial representation of the constrained matrix for the AR-1 process, with


Fig. 3. Pictorial representation of the matrices R (time domain) for (a) white input, (b) AR-1 input, (c) ARMA input, and (d) MA input. P = 4; L = M = 16. The three colored processes were chosen such that the original eigenvalue spread is around 100.

the parameters of the earlier numerical examples. As can be seen from (30), as well as from Fig. 4, the constraining matrix sets the bottom half of each submatrix to zero. Thus, only the top half of each submatrix will affect the eigenvalues. Furthermore, each eigenvalue and its associated eigenvector satisfy the usual eigenvalue equation.

Since the bottom half of each partition of the matrices is zero, the corresponding portions of the eigenvectors must also be zero, and therefore, only the left halves of the submatrices will affect the eigenvalues. The only portions of the submatrices that affect the eigenvalues are therefore the top-left quadrants. These portions are obtained from Tables I–III and are summarized here for the case M = L.

(31)

The presence of the Kronecker delta in these expressions makes these portions of the matrices quite simple. In particular, we note that only the first row of the top-left quadrants contains nonzero elements. This closely matches what is observed in Figs. 2 and 4.

E. Convergence Modes of the Constrained and Unconstrained PFBLMS Algorithms

In this subsection, we wrap up the results developed so far and comment on the various implementations of the PFBLMS algorithm. Obviously, without step normalization, the convergence behavior of the PFBLMS algorithm is highly dependent on the power spectral density of the filter input, and the algorithm performs poorly when the input is highly colored. For this reason, as noted earlier, we only consider the step-normalized versions of the PFBLMS algorithm. We therefore focus only on the two matrices that determine the convergence behavior of the unconstrained and constrained versions of the PFBLMS algorithm, respectively. The analysis of these matrices relies heavily on the following result from matrix theory [28]:

Gerschgorin’s Theorem: Let λ be an eigenvalue of an arbitrary N × N matrix A = [a_ij]. Then, there exists some integer i, 1 ≤ i ≤ N, such that

|λ - a_ii| ≤ sum_{j≠i} |a_ji|    (32)

Gerschgorin’s theorem says that each eigenvalue of is “close” to one of its diagonal elements. How close is quantified by , which is the summation of the absolute values of the off-diagonal elements of the th column of . As the determinants and, therefore, the eigenvalues of and are identical, we see that Gerschgorin’s theorem may equally well be applied by forming the summation over the rows instead of the columns. Since Gerschgorin’s theorem provides a bound for a specific eigenvalue, it is natural to choose the tighter of the two bounds, i.e., to choose to be the smaller of the two possible summations arising from the row or column summations. If we define Gerschgorin disks in the complex plane, each centered at and of radius , for , then the eigenvalues of are distributed such that they all lie within the union of the Gerschgorin disks. A further extension of Gerschgorin’s theorem states that when the Gerschgorin disks are disjoint, there will be one eigenvalue in each disk.
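Gerschgorin’s bound is easy to check numerically. The following Python/NumPy sketch is purely illustrative (the matrix `A` is an arbitrary example, not one of the PFBLMS matrices analyzed in this paper); it builds the disks from either the row sums or the column sums and verifies that every eigenvalue falls inside the union of the disks for each choice:

```python
import numpy as np

def gerschgorin_disks(A, axis=1):
    """Centers and radii of the Gerschgorin disks of a square matrix.

    axis=1 sums the off-diagonal magnitudes along rows, axis=0 along
    columns; Gerschgorin's theorem holds for either choice."""
    A = np.asarray(A, dtype=complex)
    centers = np.diag(A)
    off = np.abs(A - np.diag(centers))
    radii = off.sum(axis=axis)
    return centers, radii

def in_disk_union(eigvals, centers, radii, tol=1e-10):
    """True if every eigenvalue lies in the union of the disks."""
    lam = np.asarray(eigvals).reshape(-1, 1)
    return bool(np.all(np.any(np.abs(lam - centers) <= radii + tol, axis=1)))

# An arbitrary diagonally dominant example matrix.
A = np.array([[4.0, 0.1, 0.2],
              [0.3, 3.0, 0.1],
              [0.0, 0.2, 5.0]])
lam = np.linalg.eigvals(A)

# The eigenvalues lie in the union of the row disks and also in the
# union of the column disks.
assert in_disk_union(lam, *gerschgorin_disks(A, axis=1))
assert in_disk_union(lam, *gerschgorin_disks(A, axis=0))
```

For this example, the three disks are disjoint, so the extended form of the theorem additionally places exactly one eigenvalue in each disk.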

Examining the elements of as described in (31) and in Fig. 4, we see that we may apply Gerschgorin’s theorem to the columns of . The term guarantees that the only nonzero elements in each will be in the first row; therefore, we only need to consider the element of each submatrix and to obtain the radius of each of the Gerschgorin disks. From (31), we observe that these elements will all have the factor in common. This clearly shows that as the FFT length goes to infinity,3 the radii of the Gerschgorin disks go to zero, and therefore, the nonzero eigenvalues of all tend toward their diagonal values of one half.

Fig. 4. Matrix R with an AR-1 input process, with parameter 0.85.

When the constraint is not applied, looking at Figs. 2 and 3, it is apparent that summing over the relevant rows or columns of will give rise to Gerschgorin disks with significant (nondecaying) radii because of the significant elements in the top-right and bottom-left quadrants of . The eigenvalues of are in fact widely spread, as we now proceed to demonstrate with the following numerical example.

Consider the case where the input is white, and . Then

3. We note that a choice of a large value of M may be undesirable in some applications, as it will result in a latency of at least M samples at the filter output.

TABLE IV
EIGENVALUE SPREAD OF R FOR WHITE INPUT

The eigenvalues of this matrix may be evaluated by subtracting from each of the diagonal elements and then performing elementary row operations to reduce the matrix to upper triangular form. The eigenvalues will then be located along the main diagonal. Doing so produces the eigenvalues , which are widely spread.4 In general, the matrix has eigenvalues that are widely spread. One point of interest is the zero eigenvalue, which implies that the eigenvalue spread . The zero eigenvalue corresponds to a degree of redundancy incorporated into the unconstrained PFBLMS algorithm. This redundancy is also best illustrated by way of example. We write out the output vectors for each partition for our example of

and

where denotes a nonrelevant item. The s are later set to zero by . Performing the circular convolution between and , we see that an arbitrary variable may be added to and subtracted from the tap-weight vector in the positions shown, without affecting . This redundancy manifests itself as a zero eigenvalue, as energy may be exchanged arbitrarily between tap 2 of partition and tap 0 of partition . Obviously, this does not contribute toward the convergence of the PFBLMS algorithm. It simply appears as a kind of random-walk stochastic process. This may present practical problems with numerical overflow, as the variance of a random-walk process goes to infinity as time goes to infinity. It is therefore possible that the tap may become arbitrarily large. Furthermore, we note that this is not a problem when the constrained PFBLMS algorithm is used because, being in the second half of the first partition, is constrained to 0, forcing to bear the full responsibility for that tap.

4. Similar, but slightly different, results have also been reported in [21]. In particular, the results given in [21] do not contain any zero eigenvalue. This difference is due to the choice of the FFT length. Here, we use FFTs of length M + L. In [21], an FFT length of M + L − 1 is assumed. See also Footnote 1 after (7).

TABLE V
EIGENVALUE SPREAD OF R FOR AR-1 INPUT WITH PARAMETER

Continuing with the analysis of our example, we see that when the constrained matrix is used, we get

The matrix has the effect of zeroing out the second half of each partition, resulting in eigenvalues that are identically zero. The remaining four eigenvalues may be immediately obtained by using Gerschgorin’s theorem along columns 0, 1, 4, and 5, giving four eigenvalues all equal to 0.5, as all of the Gerschgorin disks in this case have radii of zero. The eigenvalues of are therefore . Ignoring the zero eigenvalues, we thus see that the constraining operation has equalized all the eigenvalues.
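The structure behind this equalization can be illustrated numerically. In the sketch below (symbols are ours, chosen for illustration: M retained time-domain taps out of a 2M-point FFT, with F the normalized DFT matrix), a frequency-domain constraining matrix of the form F diag(I, 0) F^H is an orthogonal projection, so its eigenvalues are exactly M ones and M zeros, i.e., all of its nonzero eigenvalues are equal:

```python
import numpy as np

M = 4                    # illustrative number of free (retained) taps
N = 2 * M                # FFT length with 50% overlap

F = np.fft.fft(np.eye(N)) / np.sqrt(N)            # normalized DFT matrix
window = np.diag(np.r_[np.ones(M), np.zeros(M)])  # keep first half, zero second half

# Frequency-domain constraining matrix: IDFT -> zero last M taps -> DFT.
G = F @ window @ F.conj().T

# G is a projection: idempotent, with M eigenvalues at 1 and M at 0.
assert np.allclose(G @ G, G)
eig = np.sort(np.linalg.eigvals(G).real)
assert np.allclose(eig[:M], 0.0) and np.allclose(eig[M:], 1.0)
```

This is only a sketch of the constraining operation in isolation; in the algorithm itself, the constraint acts on the gradient/tap-weight recursion, and the equalization of the convergence modes follows from the analysis above rather than from this toy matrix alone.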

To further support the conclusions drawn in this section, we present Tables IV–VI. In Tables IV and V, the eigenvalue spread of the matrix is given for a white input process and for an AR-1 input process with , respectively, for various and . In Table VI, the eigenvalue spread of is given for the AR-1 input for various and . Since the eigenvalue spread of is always 1 when the input process is white, that table has been omitted. The matrices were evaluated without the approximation employed in Appendix A. These tables support our conclusion that the constrained PFBLMS algorithm has an eigenvalue spread that asymptotically tends to 1 as (recall that ). We can also see that this does not happen with the unconstrained algorithm, even for the case of white input.

TABLE VI
EIGENVALUE SPREAD OF R FOR AR-1 INPUT WITH PARAMETER 0.85

IV. SCHEDULE-CONSTRAINED PFBLMS ALGORITHM

From the analysis in the previous section, we see that the unconstrained PFBLMS algorithm suffers from slow convergence modes, whereas the convergence of the constrained PFBLMS algorithm asymptotically approaches a single mode at the expense of significantly increased computational complexity. The unconstrained PFBLMS algorithm only requires three FFT/IFFTs per iteration, whereas the constraining operation requires an additional FFTs and IFFTs per iteration to constrain the tap-weight vectors for . For large , this complexity may not be acceptable. As was mentioned in Section I, McLaughlin [25] has proposed a solution in which the tap weights are not all constrained at each iteration. Instead, a schedule is assigned in which different tap weights are constrained at each iteration. A simple schedule could be to apply the constraining operation to each partition tap-weight vector on a rotational basis. As is noted in [25] and shown through computer simulation in Section V, the schedule-constrained PFBLMS algorithm performs almost as well as the fully constrained PFBLMS algorithm. In this section, we use the results derived in the previous sections to give an explanation of the high convergence rate of the schedule-constrained PFBLMS algorithm.
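A minimal sketch of such a rotational schedule is given below. The function and variable names are ours, not the paper’s, and the adaptive update itself is omitted; the sketch only shows the bookkeeping by which one of the P partitions is constrained per iteration (one extra FFT/IFFT pair per iteration instead of P pairs):

```python
import numpy as np

def constrain(W_freq, M):
    """Apply the constraint to one partition's frequency-domain tap
    vector: transform to the time domain, zero the last M taps, and
    transform back.  Costs one IFFT and one FFT."""
    w_time = np.fft.ifft(W_freq)
    w_time[M:] = 0.0
    return np.fft.fft(w_time)

def scheduled_constraint_step(W, M, iteration):
    """Constrain only one of the P partitions per iteration, on a
    rotational basis, instead of all P partitions every iteration."""
    P = len(W)
    k = iteration % P          # partition scheduled for this iteration
    W[k] = constrain(W[k], M)
    return W

# Toy usage: P = 4 partitions, 2M-point FFTs with M = 8.
M, P = 8, 4
rng = np.random.default_rng(0)
W = [rng.standard_normal(2 * M) + 1j * rng.standard_normal(2 * M)
     for _ in range(P)]

for it in range(P):                               # after P iterations, every
    W = scheduled_constraint_step(W, M, it)       # partition has been constrained

# The last M time-domain taps of each partition are now (numerically) zero.
for Wk in W:
    assert np.allclose(np.fft.ifft(Wk)[M:], 0.0)
```

In the full algorithm, the unconstrained update would run between these scheduled constraint applications, so the constrained taps drift slightly away from zero and are reset at their next turn in the schedule, which is exactly the behavior analyzed next.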

The schedule-constrained PFBLMS algorithm takes advantage of the fact that after being constrained, the tap weights in each partition remain “approximately constrained” for the next few iterations. We define those time-domain taps that are set to zero by the constraining operation as the constrained taps, whereas those taps that are not influenced by the constraint are referred to as the free taps. In the implementation of the PFBLMS algorithm with 50% overlap between input partitions , the free taps correspond to the first half of , whereas the constrained taps correspond to the second half of . In terms of these definitions, the constrained taps are set to zero every time the constraint is applied to a tap-weight vector. They will remain close to zero over the course of the next few iterations as the unconstrained PFBLMS algorithm is run but will grow slowly. However, they will be set back to zero at the next scheduled constraint and, therefore, never have the opportunity to reach a significantly large magnitude. Using the matrices developed in Appendix A and summarized in Tables I–III, we can understand the behavior of the schedule-constrained PFBLMS algorithm and show that it will converge with almost no degradation in performance as compared with its fully constrained counterpart.

When is a constrained vector, we know that the algorithm will asymptotically converge with a single mode. This can also be seen from (19), which determines the convergence of . Removing the constraining matrix and premultiplying (19) by to convert the equation into the time domain, we get

(33)

where . From the way the super-vector of tap weights was defined in (14), we see that the super-vector holds the time-domain tap-error weights for all the partitions strung head to tail.

Equation (33) is the update equation for the normalized unconstrained PFBLMS algorithm, and the convergence is controlled by the matrix in (29), which has the submatrices summarized in Tables I–III for and depicted in Fig. 3 for various input processes. As we wish to differentiate between constrained taps and free taps, we split up the submatrices into four submatrices, which we label , and , corresponding to the top-left, top-right, bottom-left, and bottom-right quadrants of , respectively. We can write the tap-errors in terms of their free taps and constrained taps as

where is the tap-error vector for the th partition. With these new terms, (33) is written as

(34)

where, due to space restrictions, we have written the equation out for , although it extends simply to the case of general .


From (34), we can see that when the constrained taps are zero, the update equation for the free taps is governed solely by the matrices , and (34) reduces to

(35)

From Figs. 1–3 and Tables I–III, we know that and are close to zero and that is asymptotically diagonal. It then follows that , which controls the convergence of the free taps when the constrained taps are zero, is also asymptotically diagonal. As was noted before, the constrained taps are never allowed to grow very large, due to the periodic application of the constraint. We therefore expect the schedule-constrained PFBLMS algorithm to perform almost as well as the fully constrained PFBLMS algorithm. This conclusion is verified by the simulations.

V. SIMULATION RESULTS

Figs. 5 and 6 show the learning curves for the PFBLMS algorithm with and for the white input and AR-1 input cases. The curves were ensemble averaged over 500 individual runs each and time averaged over each block . The step-size parameter was chosen so that the final misadjustment would be around 10% in all cases. The following equations were used to obtain

and

where denotes misadjustment, and the superscripts, as before, refer to “constrained,” “unconstrained,” and “step-normalized,” respectively. The forgetting factor was 0.95. The three curves correspond to the unconstrained, schedule-constrained, and fully constrained algorithms, respectively. Similar curves are also obtained for other colored inputs, including the third and fourth processes presented in Section III-C. We can see from these curves that, as predicted by the theory, the schedule-constrained algorithm performs almost identically to the fully constrained PFBLMS algorithm. The unconstrained algorithm, however, shows a slow mode of convergence, as predicted by the large eigenvalue spreads associated with . A further observation from these results is that the convergence modes are more or less independent of the type of input coloring, which is, again, in agreement with what is observed from the matrices in Fig. 3.
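The ensemble- and time-averaging procedure used to produce such learning curves can be sketched generically. In the snippet below, the error sequences are synthetic (an exponentially decaying toy example, not the PFBLMS simulation itself); only the averaging logic — an ensemble average over runs followed by a time average over each block — reflects the procedure described above:

```python
import numpy as np

def learning_curve(errors_per_run, block_len):
    """Ensemble-average squared errors over runs, then time-average
    over each block of block_len samples."""
    e2 = np.abs(np.asarray(errors_per_run)) ** 2   # shape: runs x samples
    mse = e2.mean(axis=0)                          # ensemble average
    n_blocks = mse.size // block_len
    mse = mse[:n_blocks * block_len].reshape(n_blocks, block_len)
    return mse.mean(axis=1)                        # time average per block

# Toy check: 500 runs of exponentially decaying noisy errors.
rng = np.random.default_rng(1)
runs, samples, L = 500, 1024, 64
decay = np.exp(-np.arange(samples) / 200.0)
errors = decay * rng.standard_normal((runs, samples))
curve = learning_curve(errors, L)

assert curve.shape == (samples // L,)
assert curve[0] > curve[-1]        # the averaged curve decays
```

Averaging over many independent runs smooths the stochastic fluctuations of the individual squared-error sequences, which is why the published curves were averaged over 500 runs.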

VI. CONCLUSION

In this paper, we evaluated the matrices that control the convergence rates of the constrained and unconstrained normalized PFBLMS algorithms. We proceeded to calculate each element of the matrices for the case of an AR-1 input. From these, we showed that as the size of the discrete Fourier transform is increased, the eigenvalues associated with the constrained PFBLMS algorithm matrix all tend to the same value, and therefore, the eigenvalue spread asymptotically tends toward 1. We also showed through numerical examples and simulation results that the eigenvalues associated with the unconstrained PFBLMS algorithm are widely spread and, therefore, that the unconstrained PFBLMS algorithm suffers from slow modes of convergence. We also looked at the schedule-constrained PFBLMS algorithm, which has neither the high computational complexity of the fully constrained PFBLMS algorithm nor the slow modes of convergence of the unconstrained PFBLMS algorithm. This not-so-well-known algorithm was thus identified as the best compromise implementation of the PFBLMS algorithm.

Fig. 5. Learning curves for the (a) unconstrained, (b) schedule-constrained, and (c) fully constrained PFBLMS algorithm with white input. N = 1024, P = 16, M = 64.

Fig. 6. Learning curves for the (a) unconstrained, (b) schedule-constrained, and (c) fully constrained PFBLMS algorithm with AR-1 input. N = 1024, P = 16, M = 64.

APPENDIX A
EVALUATION OF FOR AN AR-1 INPUT

Since we need to evaluate matrices of the form and frequently in this Appendix, it is useful to have an explicit derivation of the elements of this matrix. When and are matrices, is arbitrary, and is the normalized DFT matrix, i.e., , it is simple to work out the elements of the matrix as

(36)

and similarly for

(37)

Here, we use the symbol to denote , whereas the variable is used as an index to the columns of a matrix. In this Appendix, we assume that . Applying (36) to the matrix , we get the expression for the matrix

(38)
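The normalized-DFT convention used here (F F^H = I) and the matrix form F A F^H are easy to check numerically. In the sketch below, `F` and `A` follow the Appendix’s usage, while the FFT-based identity used for comparison is a standard one that we add for verification, not a formula from the paper:

```python
import numpy as np

N = 8
F = np.fft.fft(np.eye(N)) / np.sqrt(N)   # normalized DFT matrix

# F is unitary: F F^H = I.
assert np.allclose(F @ F.conj().T, np.eye(N))

# Elements of F A F^H for an arbitrary matrix A can be obtained with two
# FFT passes (an FFT down the columns, then an IFFT along the rows)
# instead of two full matrix products.
rng = np.random.default_rng(2)
A = rng.standard_normal((N, N))
direct = F @ A @ F.conj().T
via_fft = np.fft.ifft(np.fft.fft(A, axis=0), axis=1)
assert np.allclose(direct, via_fft)
```

The 1/sqrt(N) normalization is what makes F unitary, so F^H can be used in place of F^{-1} throughout the derivations that follow.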

The main objective of this Appendix is to evaluate the elements of the matrix

where . Writing out explicitly the input vectors for the input partitions and from (4) and replacing by , we see that for an AR-1 input process with parameter , the elements of may be written as

(39)

Applying (36) to this, we get

Point-wise multiplying this with , we get

(40)

We now need to evaluate the matrix , which is a diagonal matrix holding the powers of each frequency bin of the input. These are simply the diagonal elements of , which, from (36), are given as

We only need to consider the diagonal elements of . Due to the Toeplitz nature of , we may rewrite this as a single summation

We now make the simplifying assumption that is large enough and is small enough such that, for the significant terms in the above summation, may be approximated by . Defining the th diagonal element of as , we thus obtain

Again, making the approximation and combining terms, we get
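These bin powers can be illustrated numerically. In the sketch below (with illustrative values rho = 0.85 for the AR-1 parameter and N = 512 for the DFT length, both our choices), we form the Toeplitz autocorrelation matrix of a unit-variance AR-1 process directly and compare the diagonal of F R F^H with the AR-1 power spectral density; the agreement tightens as N grows, consistent with the approximation above:

```python
import numpy as np

rho, N = 0.85, 512          # illustrative AR-1 parameter and DFT length

# Toeplitz autocorrelation matrix of a unit-variance AR-1 process:
# r[l] = rho**|l|.
idx = np.arange(N)
R = rho ** np.abs(idx[:, None] - idx[None, :])

# Bin powers: diagonal of F R F^H, with F the normalized DFT matrix.
F = np.fft.fft(np.eye(N)) / np.sqrt(N)
bin_powers = np.real(np.diag(F @ R @ F.conj().T))

# For large N, these approach the AR-1 power spectral density
# (1 - rho^2) / (1 + rho^2 - 2 rho cos(2 pi k / N)).
omega = 2 * np.pi * idx / N
psd = (1 - rho**2) / (1 + rho**2 - 2 * rho * np.cos(omega))
assert np.allclose(bin_powers, psd, rtol=0.05)
```

The residual discrepancy is of order 1/N for fixed rho, which is why the approximations in this Appendix require the DFT length to be large relative to the correlation length of the input.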

Multiplying (40) by gives

Applying (37) then gives (41), shown at the top of the next page. The first summation in parentheses in (41), after factoring out the denominator, may be expanded as

(42)

Considering the valid ranges of the variables, the summations of exponents can be rewritten in terms of the Kronecker delta function. Simplifying (41) then gives

(43)


(41)

Expanding the brackets will give an expression with 14 terms in it, each of the form

We can treat each of these summations separately. Evaluation of these summations is a matter of carefully checking the ranges of the summations and the variables (parameters) used and counting the number of elements of the summation that are nonzero. The valid ranges of the summations are different, depending on which portion of the matrix is being examined. The expression has been evaluated and presented for the three main submatrices of interest and in Tables I–III. A more detailed evaluation, including a general expression for , is available in [27].

ACKNOWLEDGMENT

The authors wish to thank the anonymous reviewers for their many constructive comments.

REFERENCES

[1] B. Widrow and M. E. Hoff, Jr., “Adaptive switching circuits,” in IRE WESCON Conv. Rec., 1960, pp. 96–104.

[2] E. R. Ferrara, “Fast implementation of LMS adaptive filters,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-28, pp. 474–475, Aug. 1980.

[3] M. R. Asharif, T. Takebayashi, T. Chugo, and K. Murano, “Frequency-domain noise canceler: Frequency-bin adaptive filtering (FBAF),” in Proc. ICASSP, 1986, pp. 41.22.1–41.22.4.

[4] M. R. Asharif and F. Amano, “Acoustic echo-canceler using the FBAF algorithm,” IEEE Trans. Commun., vol. 42, pp. 3090–3094, Dec. 1994.

[5] J. S. Soo and K. K. Pang, “Multidelay block frequency-domain adaptive filter,” IEEE Trans. Acoust., Speech, Signal Processing, vol. 38, pp. 373–376, Feb. 1990.

[6] C. H. Yon and C. K. Un, “Fast multidelay block transform-domain adaptive filters based on a two-dimensional optimum block algorithm,” IEEE Trans. Circuits Syst. II, vol. 41, pp. 337–345, May 1994.

[7] B. Farhang-Boroujeny, “Analysis and efficient implementation of partitioned block LMS adaptive filters,” IEEE Trans. Signal Processing, vol. 44, pp. 2865–2868, Nov. 1996.

[8] W. Kellermann, “Analysis and design of multirate systems for cancellation of acoustic echoes,” in Proc. IEEE ICASSP, 1988, pp. 2570–2573.

[9] B. Farhang-Boroujeny and Z. Wang, “Adaptive filtering in subbands: Design issues and experimental results for acoustic echo cancellation,” Signal Process., vol. 61, pp. 213–223, 1997.

[10] S. S. Narayan, A. M. Peterson, and M. J. Narasimha, “Transform domain LMS algorithm,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-31, pp. 609–615, June 1983.

[11] B. Farhang-Boroujeny and S. Gazor, “Selection of orthonormal transforms for improving the performance of the transform domain normalized LMS algorithm,” in Proc. Inst. Elect. Eng. F, vol. 139, Oct. 1992, pp. 327–335.

[12] D. F. Marshall and W. K. Jenkins, “A fast quasi-Newton adaptive filtering algorithm,” IEEE Trans. Signal Processing, vol. 40, pp. 1652–1662, July 1992.

[13] C. E. Davila, “A stochastic Newton algorithm with data-adaptive step size,” IEEE Trans. Acoust., Speech, Signal Processing, vol. 38, pp. 1796–1798, Oct. 1990.

[14] B. Farhang-Boroujeny, “Fast LMS/Newton algorithms based on autoregressive modeling and their applications to acoustic echo cancellation,” IEEE Trans. Signal Processing, vol. 45, pp. 1987–2000, Aug. 1997.

[15] B. Widrow and S. D. Stearns, Adaptive Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1985.

[16] D. Mansour and A. H. Gray, Jr., “Unconstrained frequency-domain adaptive filter,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-30, pp. 726–734, Oct. 1982.

[17] A. V. Oppenheim and R. W. Schafer, Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1975.

[18] G. A. Clark, S. R. Parker, and S. K. Mitra, “A unified approach to time- and frequency-domain realization of FIR adaptive digital filters,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-31, pp. 1073–1083, Oct. 1983.

[19] B. Farhang-Boroujeny, Adaptive Filters: Theory and Applications. Chichester, U.K.: Wiley, 1998.

[20] J. C. Lee and C. K. Un, “Performance analysis of frequency-domain block LMS adaptive digital filters,” IEEE Trans. Circuits Syst., vol. 36, pp. 173–189, Feb. 1989.

[21] E. Moulines, O. A. Amrane, and Y. Grenier, “The generalized multidelay adaptive filter: Structure and convergence analysis,” IEEE Trans. Signal Processing, vol. 43, pp. 14–28, Jan. 1995.

[22] F. Beaufays, “Transform domain adaptive filters: An analytic approach,” IEEE Trans. Signal Processing, vol. 43, pp. 422–431, Feb. 1995.

[23] R. M. Gray, “On the asymptotic eigenvalue distribution of Toeplitz matrices,” IEEE Trans. Inform. Theory, vol. IT-18, pp. 725–729, Nov. 1972.

[24] K. S. Chan and B. Farhang-Boroujeny, “Lattice implementation: Fast converging structure for efficient implementation of frequency-domain adaptive filters,” Signal Process., vol. 78, pp. 79–89, 1999.

[25] H. J. McLaughlin, “System and method for an efficiently constrained frequency-domain adaptive filter,” U.S. Patent 5 526 426, June 11, 1996.

[26] B. Farhang-Boroujeny and K. S. Chan, “Analysis of the frequency-domain block LMS algorithm,” IEEE Trans. Signal Processing, vol. 48, pp. 2332–2342, Aug. 2000.

[27] K. S. Chan, “Fast block LMS algorithms and analysis,” Ph.D. dissertation, Nat. Univ. Singapore, Singapore, 2000.

[28] J. H. Wilkinson, The Algebraic Eigenvalue Problem. Monographs on Numerical Analysis. Oxford, U.K.: Oxford Univ. Press, 1965.

Kheong Sann Chan was born in Melbourne, Australia, in 1972 and grew up in Singapore. He received the B.A. degree in mathematics and physics in 1994 and the B.Sc. degree in electrical engineering in 1996 from Northwestern University, Evanston, IL. He then returned to Singapore to pursue the Ph.D. degree at the National University of Singapore.

His research interests include analysis and implementation of adaptive filtering algorithms in the frequency domain and partial response channel equalization. He is currently working at the Data Storage Institute, National University of Singapore.


Berhouz Farhang-Boroujeny (SM’98) received the B.Sc. degree in electrical engineering from Teheran University, Teheran, Iran, in 1976, the M.Eng. degree from the University of Wales Institute of Science and Technology, Cardiff, U.K., in 1977, and the Ph.D. degree from Imperial College, University of London, London, U.K., in 1981.

From 1981 to 1989, he was with Isfahan University of Technology, Isfahan, Iran. From September 1989 to August 2000, he was with the Electrical Engineering Department, National University of Singapore. He recently joined the Department of Electrical Engineering, University of Utah, Salt Lake City. His current scientific interests are adaptive filter theory and applications, multicarrier modulation for wired and wireless channels, code division multiple access, and recording channels.

Dr. Farhang-Boroujeny received the UNESCO Regional Office of Science and Technology for South and Central Asia Young Scientists Award in 1987 in recognition of his outstanding contribution in the field of computer applications and informatics. He is the author of the book Adaptive Filters: Theory and Applications (New York: Wiley, 1998) and the coauthor of an upcoming book, Toeplitz Matrices: Algebra, Algorithms and Analysis (Boston, MA: Kluwer, to be published). He has been an active member of the IEEE. He has served as a member of the Signal Processing, Circuits and Systems, and Communications Chapters in Singapore and Utah. He has also served on the organizing committees of many international conferences, including Globecom ’95 in Singapore and ICASSP 2001 in Salt Lake City, UT.