fault tolerant techniques for fft processors



732 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 13, NO. 6, JUNE 2005

Design-for-Testability and Fault-Tolerant Techniques for FFT Processors

Shyue-Kung Lu, Jen-Sheng Shih, and Shih-Chang Huang

Abstract: In this paper, we first propose a novel design-for-testability approach based on M-testability conditions for module-level systolic fast Fourier transform (FFT) arrays. Our M-testability conditions guarantee 100% single-module-fault testability with a minimum number of test patterns. Based on this testable design, fault-tolerant approaches at the bit level and the multiply-subtract-add (MSA) module level are proposed, respectively. If the reconfiguration is performed at the bit level, then the $\mathrm{FFT_{BIT}}$ network is constructed. Two types of reconfiguration schemes (Type-I $\mathrm{FFT_{MSA}}$ and Type-II $\mathrm{FFT_{MSA}}$) are proposed at the MSA module level. Since both the design for testability (DFT) and the design for yield (DFY) issues are considered at the same time for all these proposed approaches, the resulting architectures are simpler as compared with previous works. The reliability of the FFT system increases significantly. The hardware overhead is low: about 12% and $1/(2N)$ for the $\mathrm{FFT_{BIT}}$ network and the Type-II $\mathrm{FFT_{MSA}}$ network, respectively. An experimental chip is also implemented to verify our approaches. Reliabilities and hardware overhead are also evaluated and compared with previous works.

Index Terms: Butterfly network, C-testable, design for testability (DFT), fast Fourier transform (FFT), fault tolerant, logic testing.

I. INTRODUCTION

Fast Fourier transform (FFT) algorithms are among the most important digital signal processing (DSP) algorithms. They provide a means to greatly speed up discrete Fourier transform computations. The performance gained from FFTs has made the realization of many sophisticated signal processing algorithms economical. Due to the rapid advance in semiconductor fabrication technology, a large number of processing elements can be integrated on a single chip. It will therefore soon be possible to construct FFT systems from special-purpose VLSI chips. A straightforward implementation of the $N$-point FFT uses two-point butterflies: it consists of $\log_2 N$ stages, and each stage contains $N/2$ two-point butterflies. However, integrating a large number of processors on a single chip increases the logic-per-pin ratio, which drastically reduces the controllability and observability of the logic on the chip. Consequently, testing such highly complex and dense circuits becomes very difficult and expensive.

There are several testable structures and fault-tolerant designs proposed to improve the testability and fabrication yield of FFT processors, e.g., triple modular redundancy (TMR) with voting [1], hybrid redundancy [2], recomputing with shifted operands (RESO) [3], and triple time redundancy [4]. Some other concurrent error detection (CED) and testable schemes for FFT networks can be found in [5]–[9], [14], and [16]–[22]. In [8], Choi and Malek proposed a scheme called recomputing by alternate path for concurrent error detection and fault diagnosis of FFT networks. Once an error is detected, a faulty butterfly can be located within additional cycles. In [6], Jou and Abraham presented an algorithm-based fault-tolerant scheme for FFT networks. They show that 100% fault coverage and no loss of throughput could be achieved theoretically. Lombardi and Muzio [9] presented a new approach for CED and fault location in homogeneous VLSI/WSI architectures for computing complex FFT. Tao et al. [5] also proposed an algorithm-based CED scheme for FFT processors, which maintains the low hardware overhead and high throughput of Jou and Abraham's scheme, and at the same time increases the fault coverage significantly.

Manuscript received September 10, 2003; revised November 26, 2004.

S.-K. Lu and S.-C. Huang are with the Very Large Scale Integration/Computer-Aided Design Laboratory, Department of Electronic Engineering, Fu-Jen Catholic University, Taipei 24205, Taiwan, R.O.C. (e-mail: [email protected]).

J.-S. Shih was with the Department of Electronic Engineering, Fu-Jen Catholic University, Taipei 24205, Taiwan, R.O.C. He is now with Pixelworks, U.S., Taipei 24205, Taiwan, R.O.C. (e-mail: [email protected]).

Digital Object Identifier 10.1109/TVLSI.2005.844306

It is well known that the general logic testing problem is NP-complete. For certain iterative logic arrays (ILAs), however, the fault detection problem is solvable in polynomial time [10]. In this paper, we show that the FFT processor can be viewed as an ILA. Our work is grounded in the theory established in a series of papers reported in [11]–[15]. In these papers, testability conditions for C-testable [11], [12], and M-testable [13]–[15] mesh-connected arrays, hexagonally connected arrays, sequential arrays, and bilateral arrays are proposed. A C-testable array is an array testable with a constant number of test patterns independent of the size of the array. An M-testable array is also an array testable with a constant number of test patterns. However, this constant number is also a minimum value. Therefore, M-testable techniques are always superior to C-testable techniques.

In this paper, a design-for-testability approach is applied to the module-level systolic array for computing FFT. Our M-testability conditions guarantee 100% single-module-fault testability with a minimum number of test patterns. Based on this testable design, fault-tolerant approaches at the bit level and the MSA module level are proposed, respectively. If the reconfiguration is performed at the bit level, then the $\mathrm{FFT_{BIT}}$ network is constructed. Two types of reconfiguration schemes (Type-I $\mathrm{FFT_{MSA}}$ and Type-II $\mathrm{FFT_{MSA}}$) are proposed at the MSA module level. Since both the DFT and DFY issues are considered at the same time for all these proposed approaches, the resulting architectures are simpler as compared with previous works. The reliability of the FFT system increases significantly. The hardware overhead is low: about 12% and $1/(2N)$ for the $\mathrm{FFT_{BIT}}$ network and the Type-II $\mathrm{FFT_{MSA}}$ network, respectively. An experimental chip is implemented. Reliabilities and hardware overhead are also evaluated and compared with previous works.

1063-8210/$20.00 © 2005 IEEE

Fig. 1. Eight-point FFT butterfly network.

II. FFT

The discrete Fourier transform is defined by the following equation:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1 \tag{1}$$

where $W_N = e^{-j2\pi/N}$. A straightforward implementation of an $N$-point FFT in hardware is a $\log_2 N$-stage butterfly network. Each stage contains $N/2$ two-point butterfly modules. In this paper, such a circuit is called an FFT network, which is assumed pipelined. An eight-point FFT network is shown in Fig. 1. Each butterfly module (i.e., the two-point FFT module) in Fig. 1 takes two inputs $x$ and $y$ and performs the following computations:

$$X = x + W y, \qquad Y = x - W y \tag{2}$$

where $W$ is a representative twiddle factor. All the quantities in these equations are complex-valued. For implementation purposes, it is necessary to use a functionally equivalent butterfly that employs only real quantities and real operations. Let us express $x$, $y$, and $W$ in complex form as follows:

$$x = x_r + j x_i, \qquad y = y_r + j y_i, \qquad W = W_r + j W_i \tag{3}$$

where $j$ is the square root of $-1$. Combining these equations, we can recast $X$ and $Y$ as

$$X = \left[x_r + (W_r y_r - W_i y_i)\right] + j\left[x_i + (W_r y_i + W_i y_r)\right] \tag{4}$$

$$Y = \left[x_r - (W_r y_r - W_i y_i)\right] + j\left[x_i - (W_r y_i + W_i y_r)\right] \tag{5}$$

Fig. 2. Butterfly consists of four MSA modules.

The butterfly module can be constructed with four identical multiply-subtract-add (MSA) modules, as shown in Fig. 2.
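As an illustration of this section, the following minimal Python sketch evaluates a two-point butterfly with real operations only and wires such butterflies into an $N$-point network. The helper names `msa`, `butterfly_real`, and `fft_network`, the radix-2 decimation-in-time ordering, and the operand order passed to each multiply-subtract-add call are assumptions made for this sketch; they are not taken from Fig. 2.

```python
import cmath

def msa(a, b, c, d, e):
    """Multiply-subtract-add on real numbers: a*b - c*d + e."""
    return a * b - c * d + e

def butterfly_real(x, y, w):
    """Two-point butterfly (x + w*y, x - w*y) built from four MSA calls."""
    xr, xi, yr, yi, wr, wi = x.real, x.imag, y.real, y.imag, w.real, w.imag
    top_r = msa(wr, yr, wi, yi, xr)    # x_r + (W_r*y_r - W_i*y_i)
    top_i = msa(wr, yi, -wi, yr, xi)   # x_i + (W_r*y_i + W_i*y_r)
    bot_r = msa(wi, yi, wr, yr, xr)    # x_r - (W_r*y_r - W_i*y_i)
    bot_i = msa(-wi, yr, wr, yi, xi)   # x_i - (W_r*y_i + W_i*y_r)
    return complex(top_r, top_i), complex(bot_r, bot_i)

def fft_network(samples):
    """Radix-2 butterfly network; returns (spectrum, stages, butterflies used)."""
    n = len(samples)
    stages = n.bit_length() - 1
    if n < 2 or n != 1 << stages:
        raise ValueError("length must be a power of two and at least 2")
    # Feed the inputs in bit-reversed order, as in a decimation-in-time network.
    data = [samples[int(format(i, f"0{stages}b")[::-1], 2)] for i in range(n)]
    butterflies = 0
    for s in range(1, stages + 1):
        span = 1 << s
        half = span // 2
        for start in range(0, n, span):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / span)
                i, j = start + k, start + k + half
                data[i], data[j] = butterfly_real(data[i], data[j], w)
                butterflies += 1
    return data, stages, butterflies

def dft(samples):
    """Direct evaluation of the DFT sum, used only as a reference."""
    n = len(samples)
    return [sum(samples[m] * cmath.exp(-2j * cmath.pi * m * k / n)
                for m in range(n)) for k in range(n)]
```

For an eight-point input the network uses 3 stages and 12 butterflies, which matches the $\log_2 N$ stages of $N/2$ butterflies described above.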

III. REVIEW OF M-TESTABILITY CONDITIONS

Definition: A cell is a combinational machine ,where is the cell function and and

for . A cell can be a bit-level cell such as the adder cell. Moreover, it can also represent a word-level cell such as a two-point butterfly module as shown in Fig. 1. An ILA is an array of cells. We use the terms array and ILA interchangeably.

Definition: A complete or exhaustive input sequence for a cell is an input sequence consisting of all possible input combinations for the cell, i.e., , where .

Definition: A complete output sequence is defined analogously. A minimal complete sequence is a shortest such sequence (which has a length of $2^w$, where $w$ denotes the word length of a cell).

Definition: A C-testable array is an array testable with a constant number of test patterns independent of the size of the array. An M-testable array is also an array testable with a constant number of test patterns. However, this constant number is also a minimum value (equal to $2^w$). Therefore, M-testable techniques are always superior to C-testable techniques.

We assume that the cell's behavior is invariant over time, even if it is faulty. A faulty cell's function may deviate from the correct one in any manner, as long as it remains combinational. That is, we are testing for permanent combinational faults only [12]. We now turn to the FFT arrays. A straightforward implementation of an $N$-point FFT network uses two-point butterflies: it consists of $\log_2 N$ stages, and each stage contains $N/2$ two-input butterflies. Let the inputs of a butterfly module, and , be assigned the values and , respectively, and and be the values of their corresponding outputs (see Fig. 1). Since , and . The bijectivity of the module function is verified in our previous work [11].

Theorem 1: An $N$-point FFT butterfly network can be made M-testable by swapping the outputs of the lower left cells of each four-point module, where [11]. In other words, the number of test patterns for the FFT network is equal to the number of patterns for a single two-point butterfly module. The tessellation of test patterns is shown in Fig. 3, where denotes minimal complete sequence, and . In this figure, a cross mark in a module denotes that its outputs must be swapped.

Fig. 3. Tessellation of test patterns.

Fig. 4. Four-point FFT network.

Proof: The case of a single two-point butterfly is trivial. For $N = 4$, i.e., a four-point FFT module as shown in Fig. 4, we apply a minimal complete sequence to both cells of the first stage. Since the cell function is a bijection, the output sequences of both cells are also minimal complete. We swap the outputs of the lower left cell; then each of the two second-stage cells receives a sequence assembled from the first-stage outputs. Since one of these sequences is minimal complete, so is the other. The resulting outputs of the two second-stage cells are also minimal complete. Using this tessellation, each cell receives a minimal complete input sequence. Thanks to the bijectivity of the cell function, any fault is propagated to some observable primary outputs concurrently.

The four-point FFT module is therefore M-testable after a mechanism is implemented on the lower left cell to swap its outputs during the test mode of the network. In general, an $N$-point FFT network can be shown to be M-testable by induction [11].

According to this theorem, the outputs of the lower left module of each four-point butterfly network should be swapped. Therefore, we modify the original MSA module into a testable MSA module (TMSA), which is described in the following.
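The role of the swap in the proof can be checked on a toy model. The sketch below is only an illustration under stated assumptions: the cell is a hypothetical butterfly-like map over the integers modulo an odd number `M` (chosen so that the map is a bijection), and the four-point wiring, in which each second-stage cell takes one output from each first-stage cell, is assumed from the description of Fig. 4 rather than copied from it.

```python
from itertools import product

M = 5  # hypothetical odd modulus, chosen so the toy cell below is a bijection

def cell(a, b):
    """Toy butterfly-like cell over Z_M: (a, b) -> (a + b, a - b) mod M."""
    return (a + b) % M, (a - b) % M

def is_complete(seq):
    """True if seq contains every input combination exactly once."""
    return len(seq) == M * M and set(seq) == set(product(range(M), repeat=2))

def second_stage_inputs(swap_lower_left):
    """Input sequences seen by the two second-stage cells of a four-point module."""
    seq = list(product(range(M), repeat=2))    # a minimal complete sequence
    upper = [cell(a, b) for a, b in seq]       # upper left cell outputs
    lower = [cell(a, b) for a, b in seq]       # lower left cell outputs
    if swap_lower_left:
        lower = [(v, u) for u, v in lower]     # test-mode output swap
    upper_right = [(p[0], q[0]) for p, q in zip(upper, lower)]
    lower_right = [(p[1], q[1]) for p, q in zip(upper, lower)]
    return upper_right, lower_right

with_swap = second_stage_inputs(swap_lower_left=True)
without_swap = second_stage_inputs(swap_lower_left=False)
print(all(is_complete(s) for s in with_swap))      # True
print(any(is_complete(s) for s in without_swap))   # False
```

Without the swap, both second-stage cells only ever see pairs of equal values, so they are not exhaustively tested; with the swap, each of them receives every input combination exactly once.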

Fig. 5. Testable FFT butterfly network.

Fig. 6. Butterfly consists of four TMSA modules.

IV. TESTABLE DESIGN AT THE MODULE LEVEL

According to Theorem 1, a testable design of the FFT butterfly networks is shown in Fig. 5. In this figure, the lower left module of each four-point butterfly network is constructed with four TMSA modules. Two-point butterflies designated as MSA (TMSA) are constructed with four MSA (TMSA) modules.

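Since the function of a two-point FFT module is bijective, this leaves us the job of designing the TMSA modules in order to make the whole array M-testable. The swapping mechanism can be implemented with negligible cost, since its property is inherent in the computation of the FFT modules. In Section II, we showed that a butterfly module could be constructed with four identical MSA modules (see Fig. 2). Our goal now is to swap the outputs $X$ and $Y$ of the specified modules in test mode. Let $X'$ and $Y'$ be the outputs of a module after swapping; then, for module inputs $x$ and $y$ and twiddle factor $W$, the module performs the following function:

$$X' = Y = x - W y, \qquad Y' = X = x + W y \tag{6}$$

From (4) and (5) we have

$$X' = \left[x_r - (W_r y_r - W_i y_i)\right] + j\left[x_i - (W_r y_i + W_i y_r)\right] \tag{7}$$

$$Y' = \left[x_r + (W_r y_r - W_i y_i)\right] + j\left[x_i + (W_r y_i + W_i y_r)\right] \tag{8}$$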

Comparing (4) and (5) with (7) and (8), respectively, we see that swapping the outputs is tantamount to changing the sign of the twiddle factor $W$, or to replacing the adders by subtractors and vice versa. This can be implemented by using four TMSA modules as shown in Fig. 6. When the processor operates in normal mode, each two-point FFT module is configured as four MSA modules, and their outputs are not swapped. In test mode, each of the specified FFT modules is configured as four MAS (multiply-add-subtract) modules, and their outputs are swapped. A control circuit must be designed for the TMSA modules to switch between these two configurations. The MSA module in the form of an ILA is shown in Fig. 7, where the word length is 3. In this figure, three types of cells are used: the multiplier cell (MC), the adder cell (AC), and the subtractor cell (SC). The cell structures are shown in Fig. 8.

Fig. 7. MSA module in the form of an ILA.

Fig. 8. Three cell types used in the original MSA module. (a) The multiplier cell. (b) The subtractor cell. (c) The adder cell.

Fig. 9. Adder/subtractor cell.
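The equivalence between swapping the outputs and changing the sign of the twiddle factor is easy to confirm numerically. The following lines are a minimal sketch; the function names are hypothetical and the butterfly is written in its complex form rather than with MSA modules.

```python
def butterfly(x, y, w):
    """Normal-mode butterfly: returns (x + w*y, x - w*y)."""
    return x + w * y, x - w * y

def swapped_butterfly(x, y, w):
    """Test-mode butterfly: the two outputs exchange places."""
    upper, lower = butterfly(x, y, w)
    return lower, upper

x, y, w = complex(3, -2), complex(1, 4), complex(2, 5)
print(swapped_butterfly(x, y, w) == butterfly(x, y, -w))   # True
```

Negating $W$ turns every addition of a product term into a subtraction and vice versa, which is why the same hardware can serve both modes once its adders and subtractors are made switchable.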

The TMSA module is similar to that in Fig. 7. The only difference is that the adder cells and subtractor cells are replaced with subtractor/adder (SA) cells. The cell structure of the SA cell is shown in Fig. 9. In this figure, the XOR gate is controlled by the subtractor/adder selection signal. Depending on the value of this signal, the cell performs either the subtractor function or the adder function.

Fig. 10. Fault-tolerant/testable FFT butterfly network.

Fig. 11. FMSA module in the form of an ILA.
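A common way to build such a switchable cell is to let an XOR gate conditionally invert one operand of a full adder. The sketch below models this at the bit level. It is an assumption-laden illustration, not the circuit of Fig. 9: the names `sa_cell` and `sa_word`, the two's-complement formulation, and the polarity of the selection signal (`sel = 1` for subtraction) are all choices made here.

```python
def sa_cell(a, b, carry_in, sel):
    """One adder/subtractor bit cell: XOR conditionally inverts b, then full-adds."""
    b_eff = b ^ sel
    total = a ^ b_eff ^ carry_in
    carry_out = (a & b_eff) | (carry_in & (a ^ b_eff))
    return total, carry_out

def sa_word(a, b, sel, width):
    """Ripple `width` cells; sel=0 adds, sel=1 subtracts, both modulo 2**width."""
    carry = sel          # a carry-in of 1 completes the two's complement of b
    result = 0
    for i in range(width):
        bit, carry = sa_cell((a >> i) & 1, (b >> i) & 1, carry, sel)
        result |= bit << i
    return result

print(sa_word(5, 3, 0, 3))   # 0, because (5 + 3) mod 8 = 0
print(sa_word(5, 3, 1, 3))   # 2, because (5 - 3) mod 8 = 2
```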

V. FAULT TOLERANCE AT THE BIT LEVEL

This section deals with the off-line reconfiguration architecture for FFT arrays at the bit level. In our fault-tolerant design, a redundant column is included and placed between the multiplier cells and the subtractor cells of each TMSA and MSA module. We call the modified fault-tolerant TMSA and MSA modules the FTMSA (fault-tolerant TMSA) and the FMSA (fault-tolerant MSA) modules, respectively. The fault-tolerant/testable structure of FFT butterfly networks is shown in Fig. 10. This type of fault-tolerant FFT network is referred to as the $\mathrm{FFT_{BIT}}$ network. In this figure, the lower left two-point butterfly of each four-point butterfly network is constructed with FTMSA modules. Two-point butterflies designated as FMSA (FTMSA) are constructed with FMSA (FTMSA) modules.

The FMSA module in the form of an ILA is shown in Fig. 11, where the word length is 3. Each column in the array is labeled using the notation col , where . Our reconfiguration algorithm proceeds as follows.

1) A multiplier column is faulty. The faulty column is replaced with the adjacent multiplier column, which is in turn replaced with the next one. This process continues until the last multiplier column is replaced with the redundant column. In this case, the functions of the subtractor column and the adder column remain unchanged.

2) The subtractor column is faulty. The redundant column is used to replace the faulty column. The functions of all other columns remain unchanged.

3) The adder column is faulty. The subtractor column and the redundant column are used to replace the adder column and the subtractor column, respectively. The functions of all other columns remain unchanged.

Fig. 12. (a) Multiplier cell (MC). (b) The multiplier/subtractor cell (MS). (c) The adder/subtractor cell (SA). (d) The adder cell (AC).
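The three cases above share one pattern: exactly one physical column is bypassed (the faulty one, or the redundant one when nothing is faulty) and the logical functions are assigned, in order, to the columns that remain. The following minimal sketch expresses that reading. The column labels `M1`, ..., `R`, `S`, `A`, the function names, and the left-to-right physical order are hypothetical labels introduced for this illustration only.

```python
def reconfigure(word_length, faulty=None):
    """Map each logical function to a physical column, bypassing one column."""
    physical = [f"M{i}" for i in range(1, word_length + 1)] + ["R", "S", "A"]
    logical = [f"mult{i}" for i in range(1, word_length + 1)] + ["sub", "add"]
    bypassed = "R" if faulty is None else faulty   # no fault: spare stays idle
    if bypassed not in physical:
        raise ValueError(f"unknown column {bypassed!r}")
    active = [col for col in physical if col != bypassed]
    return dict(zip(logical, active))

print(reconfigure(3, faulty="M2"))
# {'mult1': 'M1', 'mult2': 'M3', 'mult3': 'R', 'sub': 'S', 'add': 'A'}
print(reconfigure(3, faulty="A"))
# {'mult1': 'M1', 'mult2': 'M2', 'mult3': 'M3', 'sub': 'R', 'add': 'S'}
```

This ordering explains why the redundant column must be able to act as either a multiplier or a subtractor, and why the subtractor column must be able to act as an adder.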

To implement the FMSA module, four types of cells are used in the fault-tolerant design: 1) the multiplier cell (MC); 2) the multiplier-subtractor cell (MS); 3) the subtractor-adder cell (SA); and 4) the adder cell (AC). All these cells should have bypass capability and have almost the same complexity. The detailed implementations of these cell types are shown in Fig. 12. For the multiplier cell and the adder cell, the only difference from their corresponding original cells is the inclusion of a multiplexer that controls the source of the output signal. The multiplexer is controlled by the bypass (BP) signal of the corresponding column, for adder cells and multiplier cells alike. Depending on the value of this signal, the adder (multiplier) cells either perform their normal function or act merely as bypass registers. The design of the redundant multiplier-subtractor cell is more complicated. In Fig. 12(b), there are two multiplexers included in the cell. The first is controlled by the BP signal of the redundant column and has the same function as the multiplexers in the MC and AC cells. The second is controlled by the multiplier/subtractor selection signal, which selects between the multiplier function and the subtractor function. If this column is faulty, it can be bypassed by controlling the BP signal. The subtractor/adder cell is shown in Fig. 12(c). To implement its function, a CMOS XNOR gate, which can be implemented with four transistors, is included in the design. By controlling its selection signal, we can switch the function of the cell between normal phase and reconfiguration phase. If this column is faulty, it can be bypassed by controlling the BP signal. Furthermore, the FTMSA module in the form of an ILA is shown in Fig. 13, where the word length is 3. The only difference between the FTMSA module and the FMSA module is that the last column of the FTMSA module consists of subtractor/adder cells. In test mode, the FTMSA module must be configured as a MAS module, whereas the FMSA module must be configured as an MSA module. For the bit-level design, the FTMSA modules shown in Fig. 10 can also perform the swap operation. Therefore, Theorem 1 can be applied to this fault-tolerant design directly. We can conclude that this fault-tolerant architecture is also M-testable.

Fig. 13. FTMSA module in the form of an ILA.

For the bit-level fault-tolerant design, a single column is used as the basic replacement element. Therefore, diagnosis algorithms must be used to locate a faulty column. Since each cell in the FTMSA module contains a bypass multiplexer, it can be used to isolate a single stage first. If a single stage is isolated and the faulty behaviors cannot be observed from the primary outputs, we can conclude that the isolated stage is faulty. It is evident that the complexity to isolate a faulty stage is . Similarly, after a faulty stage is located, a single column within a butterfly module can be isolated to locate the faulty column. The diagnosis complexity is .
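The isolation procedure can be sketched as a simple scan over the stages. The model below is a deliberately abstract illustration: stages are arbitrary Python functions on a list of integers, a bypassed stage passes its data through unchanged, and a fault-free reference model with the same bypass setting is assumed to be available for comparison. The stage functions and the injected fault are hypothetical.

```python
def run(stages, data, bypass=None):
    """Push data through the pipeline; a bypassed stage passes data unchanged."""
    for index, stage in enumerate(stages):
        if index != bypass:
            data = stage(data)
    return data

def locate_faulty_stage(actual, golden, test_vectors):
    """Return the index of the faulty stage, or None if no error is observed."""
    if all(run(actual, v) == run(golden, v) for v in test_vectors):
        return None
    for s in range(len(actual)):
        # If bypassing stage s hides the faulty behavior, stage s is the culprit.
        if all(run(actual, v, s) == run(golden, v, s) for v in test_vectors):
            return s
    return None

golden = [
    lambda d: [x + 1 for x in d],
    lambda d: [2 * x for x in d],
    lambda d: [x - 3 for x in d],
]
faulty = list(golden)
faulty[1] = lambda d: [(2 * x) | 1 for x in d]   # hypothetical stuck-at-1 on the LSB
vectors = [[0, 1, 2], [4, 6, 8]]
print(locate_faulty_stage(faulty, golden, vectors))   # 1
```

The scan tries each stage at most once, so under this single-fault model the number of isolation trials grows with the number of stages.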

VI. FAULT TOLERANCE AT THE MSA MODULE LEVEL

This section deals with the off-line reconfiguration architecture for FFT arrays at the MSA module level ($\mathrm{FFT_{MSA}}$). Consider the four MSA modules shown in Fig. 2. These four MSA modules can be divided into two groups, each containing two MSA modules. One group computes the real part of the outputs and the other computes the imaginary part. In our reconfigurable architecture, an extra MSA module is included and placed at the top of the two groups, so there are a total of five MSA modules in a butterfly. Extra local interconnections are added in the design as shown in Fig. 14.

Fig. 14. Fault-tolerant structure of a butterfly module.

Fig. 15. Reconfiguration of MSA modules.

If a faulty module is identified, then the reconfiguration mechanism must be activated to replace the faulty module. According to our reconfiguration algorithm, the faulty MSA module is replaced by its neighboring module, which is in turn replaced by the next module, and so on, until the redundant module is used. Fig. 15 shows an example in which one MSA module is faulty; after reconfiguration, the remaining modules again form a first group and a second group of two MSA modules each. The reconfiguration mechanism is simply a multiplexer placed in each MSA module to select appropriate inputs. This type of FFT is referred to as the Type-I $\mathrm{FFT_{MSA}}$ network as opposed to the Type-II $\mathrm{FFT_{MSA}}$ network discussed next. The hardware overhead of this scheme relative to the original circuit is approximately 25% (an extra MSA module out of four).

Instead of using a redundant MSA module in each butterfly, the Type-II $\mathrm{FFT_{MSA}}$ network adds a redundant MSA module at each stage. This redundant MSA module is not limited to replacements within one butterfly; it can replace any faulty MSA module at the same stage. Since each stage contains $N/2$ butterflies of four MSA modules each, the hardware overhead is approximately $1/(2N)$. It is clear that the Type-II $\mathrm{FFT_{MSA}}$ network has more efficient resource utilization than the Type-I $\mathrm{FFT_{MSA}}$ network.
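The two overhead figures follow from counting modules per stage. The short sketch below does that counting; the function name is hypothetical, and routing and multiplexer costs are ignored, as in the approximate figures quoted above.

```python
def msa_redundancy_overhead(n_points):
    """Spare-to-original MSA ratio per stage for the Type-I and Type-II schemes."""
    butterflies_per_stage = n_points // 2
    msa_per_stage = 4 * butterflies_per_stage   # four MSA modules per butterfly
    type1_spares = butterflies_per_stage        # one spare MSA per butterfly
    type2_spares = 1                            # one spare MSA per stage
    return type1_spares / msa_per_stage, type2_spares / msa_per_stage

print(msa_redundancy_overhead(8))     # (0.25, 0.0625)
print(msa_redundancy_overhead(256))   # (0.25, 0.001953125)
```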

To test the MSA, FMSA, and FTMSA modules completely, we cannot assume that there is no fault in the XOR circuit (DFT) or in the multiplexers (DFY). Fortunately, we can apply the all-0 and all-1 patterns to the bit-level cells in normal mode. These two patterns are chosen because the outputs of the basic cell are the same as its inputs when they are applied. These two patterns can detect all the stuck-at faults of the XOR gate and the multiplexers during normal operation mode. Therefore, only patterns are required to achieve 100% fault coverage for the MSA module. For this module-level design, since four MSA modules can be constructed within each modified butterfly module, Theorem 1 can also be applied directly. Therefore, we can see that the fault-tolerant design is also M-testable.

VII. RELIABILITY AND HARDWARE OVERHEAD ANALYSIS

Let the reliability of a fault-tolerant MSA module in the $\mathrm{FFT_{BIT}}$ network be . The word length of the processor is denoted as $w$. Assume that each cell becomes faulty randomly and independently, with a constant failure rate $\lambda$. Then, the reliability of a single cell is $e^{-\lambda t}$. The reliability of a column, , is . Then can be expressed as follows:

(9)

The reliability of a butterfly module with bit-level fault-tolerant design is

(10)

Similarly, the system reliability of the $\mathrm{FFT_{BIT}}$ network can be expressed as

(11)
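As a rough guide to how reliabilities of this kind are commonly modeled, the following sketch combines an exponential cell model with a one-spare survival formula. It is a generic illustration and not necessarily the form of (9)-(11): the assumptions that a column contains `w` cells, that a module needs `w + 2` working columns out of `w + 3`, and that the bypass and selection logic never fail are all made here for the sake of the example.

```python
import math

def cell_reliability(lam, t):
    """Reliability of one cell under a constant failure rate lam."""
    return math.exp(-lam * t)

def one_spare_reliability(r_unit, needed):
    """`needed` units plus one spare survive if at most one unit fails."""
    total = needed + 1
    return r_unit ** total + total * r_unit ** (total - 1) * (1.0 - r_unit)

def bit_level_network_reliability(lam, t, w, n_points, cells_per_column=None):
    """Series system of spare-protected modules (all counts are assumptions)."""
    cells = w if cells_per_column is None else cells_per_column
    r_column = cell_reliability(lam, t) ** cells
    r_module = one_spare_reliability(r_column, needed=w + 2)   # w mult + sub + add
    stages = n_points.bit_length() - 1
    modules = 4 * (n_points // 2) * stages                     # four per butterfly
    return r_module ** modules
```

Under these assumptions a single spare always helps at the module level, since `one_spare_reliability(r, n)` is never smaller than the unprotected value `r ** n`; whether it helps at the system level depends on how much area the spare and its switches add.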

The hardware overhead of the $\mathrm{FFT_{BIT}}$ network is evaluated in the following. We define $\mathrm{TC_{MSA}}$, $\mathrm{TC_{FMSA}}$, and $\mathrm{TC_{FTMSA}}$ as the transistor counts of a single MSA, FMSA, and FTMSA module, respectively. The term TC is defined as the number of extra transistors in the $\mathrm{FFT_{BIT}}$ network. In other words

(12)

The hardware overhead ratio (HO) is defined as

(13)

The number of multiplier cells, multiplier-subtractor cells, subtractor-adder cells, adder cells, and corner cells in a single FMSA module are denoted as , and , respectively. The number of multiplier cells, multiplier-subtractor cells, subtractor-adder cells, and corner cells in a single FTMSA module are denoted as , and , respectively. The word length is denoted as $w$. From Figs. 11 and 13, we can see that . The transistor counts of an MSA, FMSA, and FTMSA module are

(14)

respectively. Substituting (14) into (12) and (13), we can find the hardware overhead of the $\mathrm{FFT_{BIT}}$ network. For example, if and , the hardware overhead ratio is calculated as HO %.

Now we turn our analysis to the module-level design. Let the reliability of an MSA module at the module level be . Then can be expressed as follows:

(15)

The reliability of a two-point butterfly module in the Type-I $\mathrm{FFT_{MSA}}$ network is then given by

(16)

For the network to work correctly, all two-point butterflies must work correctly. Therefore, the reliability of the Type-I $\mathrm{FFT_{MSA}}$ network can be expressed as

(17)

Similarly, the reliability to obtain an operational butterfly stage in the Type-II $\mathrm{FFT_{MSA}}$ network can be expressed as

(18)

For the FFT processor to work correctly, all the stages must operate correctly. That is, the system reliability for the Type-II $\mathrm{FFT_{MSA}}$ network can be expressed as

(19)

Since the number of MSA modules at each stage is $2N$, the hardware overhead for the Type-II $\mathrm{FFT_{MSA}}$ network is approximately $1/(2N)$ (one redundant MSA is added in each stage). Note that the extra routing area is neglected here, since the routing is local and occupies less than 3% of the area of an MSA module. Furthermore, the switches of the MSA-level design are assumed to be fault free in our analysis. In fact, the added routing areas may affect the reliability of the system. To take this effect into consideration, we can increase the failure rate of a cell in proportion to the overall hardware overhead ratio. That is, the reliability of a cell becomes $e^{-(1+\mathrm{HO})\lambda t}$. For example, the hardware overhead ratio can be found in the previous discussions. We can substitute the values of and to these equations to obtain the real hardware overhead. Then the compared results can be analyzed.

Fig. 16. Chip layout.

VIII. EXPERIMENTAL RESULTS AND COMPARISONS

To verify the bit-level design, a VLSI chip for the FTMSA module is designed using Cadence full-custom design tools. The technology used is TSMC 0.18 μm, 1P6M. The transistor count is 40 464, and the chip size is 3.79 mm². The whole chip layout is shown in Fig. 16. The area overhead is about 12%. This overhead is lower than the analysis results shown above, because several routing layers can be used for the layout. If more layout layers are used, the overhead may be further reduced.

The reliabilities of the bit-level designs with different computation points are shown in Fig. 17, where λ = 0.0005 and w = 16. The curve marked nonred denotes the nonfault-tolerant design. From this figure, we can see that as the number of computation points increases, the reliability decreases significantly. However, even when the number of computation points is 256, the reliability is still higher than that of the nonfault-tolerant design. The reliabilities of the bit-level designs with different word lengths are shown in Fig. 18, where the number of computation points is assumed to be 16. From this figure, we can see that a shorter word length results in a greater reliability improvement.

The reliabilities of the module-level design (a redundant MSA module is added in each two-point butterfly) for different computation points and word lengths are shown in Figs. 19 and 20, respectively. From Fig. 20, we can see that if the word length is greater than 32, the reliability is even lower than that of the nonfault-tolerant design. This is because a redundant MSA module with a larger word length occupies more area. Therefore, the probability of getting a faulty module increases significantly. Comparing Figs. 18 and 20, we can see that the bit-level designs have higher reliabilities than the module-level designs.

Fig. 17. Reliabilities of the bit-level designs with different computation points (λ = 0.0005, w = 16).

Fig. 18. Reliabilities of the bit-level designs with different word lengths (λ = 0.0003, N = 16).

The comparison of our approaches (FFT , Type-I FFT , and Type-II FFT ) with previous works [18], [22], [24], [25] is shown in Table I. The approach in [22] uses BIST circuitry in each eight-point FFT network with word length . The test pattern generator (TPG) used is a pseudorandom pattern generator, which cannot guarantee 100% fault coverage with a test length of 4096. In [24], the test approach is derived from algorithm flow graphs (AFGs), which allows detection and location of all single faults. Moreover, interconnect faults can be covered in  operations for an N-point FFT network. In [25], a C-testable approach based on component-level faults was proposed. It requires  test patterns to test the whole FFT network, where  denotes the number of test patterns required for testing a component.

Fig. 19. Reliabilities of the module-level designs with different computation points (λ = 0.0001, w = 16).

Fig. 20. Reliabilities of the module-level designs with different word lengths (λ = 0.0005, N = 16).

TABLE I
FAULT-TOLERANT AND TEST FEATURE COMPARISON WITH PREVIOUS SCHEMES

Our proposed approaches are improvements over our previous work [18] and, as can be seen from the table, are superior to the schemes discussed above. The DFT technique used in [18] aims at the bit level. Since the basic bit-level cells (adder cells, multiplier cells, and subtractor cells) do not inherently possess the property of bijection, two multiplexers must be added to make them bijective. As a result, significant hardware and delay overhead (5.66%) is required. Moreover, the fault-tolerant design proposed in [18] used a spare row. In this approach, the faulty row is replaced by the neighboring row above it, which is in turn replaced by the next row above, and so on. Since a basic cell contains three vertical inputs, three multiplexers are required for each cell to bypass itself when it is faulty. Therefore, the overhead for the DFT/DFY design is almost 40%. In contrast, the column-based reconfiguration used in FFT  is simpler than that in [18], since each cell contains only one horizontal input. From Table I, we can see that the three proposed fault-tolerant approaches are all superior to [18] in terms of hardware overhead. Although the number of test patterns is greater than that in [18], the test patterns can be generated by a binary counter, which can provide at-speed testing. Therefore, the testable design proposed in this paper is also suitable for practical applications.
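To make the two ideas in this comparison concrete, the sketch below enumerates input patterns in the order a binary up-counter would produce them and checks whether a cell maps distinct inputs to distinct outputs. It is only an illustration: the helper names, the three-input width, and the plain full adder used as a stand-in cell are assumptions, not the cells or test sets of this paper or of [18].

```python
def counter_patterns(width: int):
    """Yield all 2**width input patterns in binary counting order (MSB first)."""
    for value in range(1 << width):
        yield tuple((value >> bit) & 1 for bit in reversed(range(width)))


def is_bijective(cell, width: int) -> bool:
    """True if `cell` produces 2**width distinct outputs over all input patterns."""
    outputs = {cell(*pattern) for pattern in counter_patterns(width)}
    return len(outputs) == 1 << width


def full_adder(a: int, b: int, cin: int):
    """Plain full adder (stand-in cell): returns (sum, carry_out)."""
    return (a ^ b ^ cin, (a & b) | (cin & (a ^ b)))


if __name__ == "__main__":
    print(list(counter_patterns(2)))
    print(is_bijective(full_adder, 3))
```

A plain full adder has three inputs but only two outputs, so it cannot be one-to-one. This is consistent with the remark that basic arithmetic cells are not inherently bijective and need extra circuitry to become so.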

Since some modifications are made to the basic cells, a performance penalty is inevitable. We used the HSPICE circuit simulation tool to estimate the dynamic performance of the circuits in Fig. 8(b) (the subtractor cell) and Fig. 12(c) (the adder/subtractor cell with DFT and DFY circuits incorporated). All nMOS and pMOS transistors are designed with a (W/L) ratio of (0.5 µm/0.18 µm), which is the minimum transistor size allowed in the 0.18-µm process technology. The difference between the propagation delay times of the  for both circuits is 0.0248 ns, which is about 10% higher than the original propagation delay time. This penalty is the price paid for increasing the chip's testability, reliability, and yield. If the timing specifications are tight, circuit design techniques can still be used to alleviate the problem. For example, we can increase the (W/L) ratios of all transistors in the critical path of the cell. Of course, this comes at the cost of increased chip area.

IX. CONCLUSION

In this paper, M-testability conditions and a design-for-testability technique are applied to the testable design of FFT butterfly networks. Our M-testability conditions guarantee 100% single-module-fault testability with a constant number of test patterns, which results in a design-for-testability approach requiring negligible hardware overhead. The number of test patterns needed for M-testing the FFT processors at the module level is equal to that for a single module. If the word length is greater than 16, M-testing the FFT processor at the bit level is more appropriate. Although the fault model adopted here is the single module fault, our M-testability condition can be applied to lower level fault models, such as delay faults and sequential faults. Moreover, built-in self-test structures can easily be designed and applied to the module-level array, which can be tested at the system clock rate. Based on this testable design, fault-tolerant approaches at the bit level and the MSA module level are proposed, respectively. If the reconfiguration is performed at the bit level, then the FFT  network is constructed. Two types of reconfiguration schemes (Type-I FFT  and Type-II FFT ) are proposed at the MSA module level. The resulting architectures are simpler than those of previous works. The reliability of the FFT system increases significantly. The hardware overhead is low: about 12% and  for the FFT  network and the Type-II FFT  network, respectively. An experimental chip is also implemented to verify our approaches. Reliabilities and hardware overhead are also evaluated and compared with previous works.

REFERENCES

[1] B. W. Johnson, Design and Analysis of Fault Tolerant Digital Systems. Reading, MA: Addison-Wesley, 1989.

[2] P. K. Lala, Fault Tolerant and Fault Testable Hardware Design. Englewood Cliffs, NJ: Prentice-Hall, 1987.

[3] S. Laha and J. H. Patel, “Error correction in arithmetic operations using time redundancy,” in Proc. 13th Annu. Int. Symp. Fault-Tolerant Computing, Jun. 1983, pp. 298–305.

[4] E. E. Swartzlander, Jr. et al., “Sign/logarithm arithmetic for FFT implementation,” IEEE Trans. Comput., vol. C-32, no. 6, pp. 526–534, Jun. 1983.

[5] D. L. Tao, C. R. P. Hartmann, and Y. S. Chen, “A novel concurrent error detection scheme for FFT networks,” in Proc. Int. Symp. Fault-Tolerant Comput., Jun. 1990, pp. 114–121.

[6] J. Y. Jou and J. A. Abraham, “Fault-tolerant FFT networks,” IEEE Trans. Comput., vol. 37, no. 5, pp. 548–561, May 1988.

[7] M. Tsunoyama and S. Naito, “A fault-tolerant FFT processor,” in Int. Symp. Fault-Tolerant Comput., Jun. 1991, pp. 128–135.

[8] Y. H. Choi and M. Malek, “A fault-tolerant FFT processor,” IEEE Trans. Comput., vol. 37, no. 5, pp. 617–621, May 1988.

[9] F. Lombardi and J. Muzio, “Concurrent error detection in reconfigurable WSI structures for FFT computation,” in Proc. Int. Conf. Wafer Scale Integration, 1991, pp. 46–53.

[10] H. Fujiwara and S. Toida, “The complexity of fault detection problems for combinational logic circuits,” IEEE Trans. Comput., vol. C-31, no. 6, pp. 555–560, Jun. 1982.

[11] S. K. Lu, C. W. Wu, and S.-Y. Kuo, “Enhancing testability of VLSI arrays for fast Fourier transform,” Proc. Inst. Elect. Eng., E, vol. 140, no. 3, pp. 161–166, May 1993.

[12] C. W. Wu and P. R. Cappello, “Easily testable iterative logic arrays,” IEEE Trans. Comput., vol. 39, no. 5, pp. 640–652, May 1990.

[13] W. H. Kautz, “Testing for faults in combinational cellular logic arrays,” in Proc. 8th Annu. Symp. Switching, Automata Theory, 1967, pp. 161–174.

[14] P. R. Menon and A. D. Friedman, “Fault detection in iterative arrays,” IEEE Trans. Comput., vol. C-20, pp. 524–535, May 1971.

[15] A. D. Friedman, “Easily testable iterative systems,” IEEE Trans. Comput., vol. C-22, pp. 1061–1064, Dec. 1973.

[16] T. H. Chen and L. G. Chen, “Concurrent error-detectable butterfly chip for real-time FFT processing through time redundancy,” IEEE J. Solid-State Circuits, vol. 28, no. 5, pp. 537–547, May 1993.

[17] V. K. Jain, H. A. Nienhaus, D. L. Landis, S. Al-Arian, and C. E. Alvarez, “Wafer scale architecture for an FFT processor,” in Proc. Int. Symp. Circuits Systems, 1989, pp. 453–456.

[18] J. F. Li, S. K. Lu, S. Y. Huang, and C. W. Wu, “Easily testable and fault tolerant FFT butterfly networks,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 47, no. 9, pp. 919–929, Sep. 2000.

[19] V. K. Jain, H. Hikawa, and E. E. Swartzlander, “Defect tolerance and yield for a wafer scale FFT processor system,” in Proc. Int. Conf. Wafer Scale Integration, 1991, pp. 54–60.

[20] V. Piuri and E. E. Swartzlander, “Time-shared modular redundancy for fault-tolerant FFT processors,” in Proc. Int. Symp. Defect Fault Tolerance in VLSI Systems, 1999, pp. 265–273.

[21] L. Breveglieri and V. Piuri, “A fast pipelined FFT unit,” in Proc. Int. Conf. Application Specific Array Processors, 1994, pp. 143–151.


[22] K. Yamashita, A. Kanasugi, S. Hijiya, G. Goto, N. Matsumura, and T. Shirato, “A wafer-scale 170 000-gate FFT processor with built-in test circuits,” IEEE J. Solid-State Circuits, vol. 23, no. 2, pp. 336–342, Apr. 1988.

[23] V. K. Jain, S. A. Al-Arian, D. L. Landis, and H. A. Nienhaus, “Fully parallel and testable WSI architecture for an FFT processor,” Int. J. Comput.-Aided VLSI Des., vol. 3, pp. 113–131, 1991.

[24] A. Antola and M. G. Sami, “Testing and diagnosis of FFT arrays,” J. VLSI Signal Process., vol. 3, pp. 225–236, 1991.

[25] C. Feng, J. C. Muzio, and F. Lombardi, “On the testability of the array structures for FFT computation,” J. Electron. Testing: Theory Applicat., vol. 4, pp. 215–224, Aug. 1993.

Shyue-Kung Lu received the Ph.D. degree in electrical engineering from the National Taiwan University, Taipei, in 1995.

From 1995 to 1998, he was an Associate Professor in the Department of Electrical Engineering, Lunghwa Junior College of Technology and Commerce. Since 1998, he has been with the Department of Electronics Engineering, Fu Jen Catholic University, Taipei, where he is a Professor. His research interests include the areas of VLSI testing and fault-tolerant computing.

Jen-Sheng Shih was born in Taiwan, R.O.C., in 1973. He received the B.S. degree from the National Taipei University, Taipei, Taiwan, and the M.S. degree from the Fu-Jen Catholic University, Taipei, in 1998 and 2000, respectively, both in electronic engineering.

He was with the Acer Laboratories Inc. (Ali) from 2000 to 2003, where he was engaged in MPEG IC design. Since 2003, he has been with Pixelworks, U.S., Taipei. His research interests include FPGA testing, MPEG design, and LCD TV systems.

Shih-Chang Huang received the B.S. degree in electronic engineering from Fu-Jen Catholic University, Taipei, Taiwan, R.O.C., in 2002. He is currently pursuing the M.S. degree in electrical engineering at Fu-Jen Catholic University.