1

Fast Memory Addressing Scheme for Radix-4 FFT Implementation

Presented by Cheng-Chien Wu, Master's Student, CSIE, CCU

Authors: Xin Xiao, Erdal Oruklu, and Jafar Saniie (Illinois Institute of Technology)

Source: IEEE International Conference on Electro/Information Technology, 2009 (EIT '09)

2

Outline

– Introduction
– Radix-4 FFT
– Related Work
– Proposed Method
– Experimental Results
– Conclusion

3

Introduction

• The Fast Fourier Transform (FFT) is widely applied in speech processing, image processing, and communication systems.

• It is one of the key components of various signal processing and communications applications, such as software-defined radio and OFDM.

4

Introduction(cont’d)

5

Introduction(cont’d)

• The main objective – This study is primarily concerned with improving the performance of the address generation unit of the FFT processor by eliminating complex components from its critical path.

7

Introduction(cont’d)

• Important FFT issues
– High throughput
– FFT size
– Power consumption
– Low cost
– Area

9

Radix-4

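• The N-point discrete Fourier transform is defined by

$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \quad k = 0, 1, \ldots, N-1, \quad W_N = e^{-j 2\pi / N}$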

10

Data Path of Radix-4

11

Butterfly Units

• The N-point FFT can be decomposed into repeated micro-operations called butterfly operations. When the size of the butterfly is r, the FFT operation is called a radix-r FFT.
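
As a rough illustration of the butterfly idea for r = 4, the sketch below computes a 4-point DFT using only additions, subtractions, and multiplications by ±j, and compares it with a naive DFT. The function names `dft` and `radix4_butterfly` are illustrative assumptions, not names taken from the paper, and the twiddle-factor multiplications that surround the butterfly in a full FFT are left out.

```python
import cmath

def dft(x):
    """Naive N-point DFT, used here only as a reference."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]

def radix4_butterfly(a, b, c, d):
    """4-point DFT of (a, b, c, d) without general complex multiplications."""
    t0, t1 = a + c, a - c
    t2, t3 = b + d, b - d
    return (t0 + t2,        # X[0] = a + b + c + d
            t1 - 1j * t3,   # X[1] = a - jb - c + jd
            t0 - t2,        # X[2] = a - b + c - d
            t1 + 1j * t3)   # X[3] = a + jb - c - jd

if __name__ == "__main__":
    print(radix4_butterfly(1, 2, 3, 4))
    print(dft([1, 2, 3, 4]))
```

The two printed results agree up to floating-point rounding, which is the property a radix-4 butterfly unit relies on.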

12

Butterfly Units in Radix-4

13

Memory-based FFT

• In a memory-based FFT architecture, only one butterfly unit is implemented on the chip; this butterfly unit executes all the calculations recursively.
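
The memory-based organization can be sketched in software as a single butterfly routine applied pass after pass to one data array. The example below is only a model under stated assumptions: it uses a decimation-in-frequency ordering and a final base-4 digit reversal, which may differ from the data flow in the paper, and all function names are hypothetical.

```python
import cmath

def butterfly4(a, b, c, d):
    """The single shared radix-4 butterfly (4-point DFT)."""
    t0, t1, t2, t3 = a + c, a - c, b + d, b - d
    return t0 + t2, t1 - 1j * t3, t0 - t2, t1 + 1j * t3

def digit_reverse4(k, v):
    """Reverse the v base-4 digits of k."""
    r = 0
    for _ in range(v):
        r = (r << 2) | (k & 3)
        k >>= 2
    return r

def fft_radix4_in_place(mem):
    """Radix-4 FFT that overwrites `mem` pass by pass; len(mem) must be 4**v."""
    n = len(mem)
    v = 0
    while (1 << (2 * v)) < n:
        v += 1
    if (1 << (2 * v)) != n:
        raise ValueError("length must be a power of 4")
    for p in range(v):                      # one pass per base-4 digit
        block = n >> (2 * p)                # sub-transform size in this pass
        q = block // 4                      # distance between butterfly inputs
        for start in range(0, n, block):
            for j in range(q):
                idx = [start + j + m * q for m in range(4)]
                y = butterfly4(*(mem[i] for i in idx))
                for m, i in enumerate(idx):
                    # Twiddle factor W_block^(m*j) applied to output m.
                    mem[i] = y[m] * cmath.exp(-2j * cmath.pi * m * j / block)
    # Results sit in base-4 digit-reversed order; return them in natural order.
    return [mem[digit_reverse4(k, v)] for k in range(n)]
```

Every pass reads four values, runs them through the same `butterfly4`, and writes the results back to the same locations, which is why the addressing logic, rather than the arithmetic, decides how quickly the unit can be fed.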

14

Execution Time

• If parallel and pipeline processing techniques are used, an N-point radix-r FFT can be executed in $(N/r)\log_r N$ clock cycles.

• This indicates that a radix-4 FFT can be four times faster than a radix-2 FFT: $(N/4)\log_4 N = (N/8)\log_2 N$ cycles versus $(N/2)\log_2 N$ cycles.
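
The factor of four can be checked numerically. The sketch below assumes the cycle count is (N/r)·log_r(N), that is, one radix-r butterfly per clock cycle; this is consistent with the four-times claim above but is stated here as an assumption. `fft_cycles` is an illustrative name.

```python
def fft_cycles(n, r):
    """Assumed cycle count (N/r) * log_r(N) for an N-point radix-r FFT."""
    passes, size = 0, 1
    while size < n:          # count passes with integer arithmetic
        size *= r
        passes += 1
    if size != n:
        raise ValueError("n must be a power of r")
    return (n // r) * passes

if __name__ == "__main__":
    n = 4096
    print(fft_cycles(n, 2))                     # 24576
    print(fft_cycles(n, 4))                     # 6144
    print(fft_cycles(n, 2) // fft_cycles(n, 4)) # 4
```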

16

Related Work

Year | Title | Publication
1969 | Organization of Large Scale Fourier Processors | J. Assoc. Comput. Mach.
1976 | Simplified control of FFT hardware | IEEE Trans. Acoust., Speech, Signal Processing
1992 | Conflict free memory addressing for dedicated FFT hardware | IEEE Trans. Circuits Syst.
1999 | An effective memory addressing scheme for FFT processors | IEEE Trans. Signal Process.
2008 | An Efficient FFT Engine With Reduced Addressing Logic | IEEE Transactions on Circuits and Systems II

17

Data Path of Radix-2

18

Data Path of Radix-4

20

Memory Banks

• Four memory banks are used to store the data.

21

Read Ports and Write Ports

• However, for pass 1 and pass 2, the four inputs and four outputs of any butterfly stage belong to the same memory bank.

• Since each memory bank is a two-port memory, each bank can export (read) once and import (write) once per clock cycle.

• Four clock cycles are therefore necessary to perform the four read and four write accesses in pass 1 and pass 2.
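
The cycle count follows from counting accesses per bank. The sketch below is a simplified model, assuming each bank serves exactly one read per clock cycle; `read_cycles` and the bank numbering are illustrative, not taken from the paper.

```python
from collections import Counter

def read_cycles(banks):
    """Cycles needed to read all operands when each bank allows one read per cycle.

    `banks` lists the bank index holding each butterfly operand.
    """
    return max(Counter(banks).values())

if __name__ == "__main__":
    print(read_cycles([0, 1, 2, 3]))  # operands spread over four banks: 1
    print(read_cycles([2, 2, 2, 2]))  # all operands in one bank: 4
```

The same count applies to the four writes, since each bank also accepts one write per cycle.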

22

Counter D

• The other main components of the FFT processor are Counter D and the barrel shifter. Counter D has two parts:
– Pass counter P, which is $v = \log_4 N$ bits ($P_{v-1}$ to $P_0$)
– Butterfly counter B, which is $m = \log_2(N/4)$ bits ($B_{m-1}$ to $B_0$)

23

Barrel Shifter

• The barrel shifter generates the addresses for all four memory banks based on the pass number of the FFT, which can be expressed as:

RR(counter B, 2p)

• where RR(counter B, 2p) means rotating butterfly counter B right by 2p bits, and p is the pass number of the FFT.
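
A rotate-right of this kind involves no addition, only a rearrangement of bits. The sketch below models RR(counter B, 2p) in software; the counter width `m` is passed in explicitly and is assumed here to be log2(N/4), and the function names are illustrative.

```python
def rotate_right(value, shift, width):
    """Rotate the low `width` bits of `value` right by `shift` positions."""
    if width == 0:
        return 0
    shift %= width
    mask = (1 << width) - 1
    value &= mask
    return ((value >> shift) | (value << (width - shift))) & mask

def bank_address(b, p, m):
    """Address RR(B, 2p) for butterfly counter value `b` in pass `p` (m-bit counter)."""
    return rotate_right(b, 2 * p, m)

if __name__ == "__main__":
    # N = 256 gives an assumed counter width of m = 6 bits.
    print(format(bank_address(0b000001, 1, 6), "06b"))  # 010000
    print(format(bank_address(0b000110, 1, 6), "06b"))  # 100001
```

In hardware the same operation is pure wiring plus multiplexers, which is what keeps it off the critical path.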

24

Twiddle Factor

• For the twiddle factors $W_b$, $W_c$, and $W_d$, three memory banks are used with the same address generation logic. For pass p, this address is given as:

• (2p 0’s follow)

25

For Larger FFT Size

• For FFTs of different lengths, the control logic of the multiplexers depends only on the last three bits of the counter, so the register and multiplexer structures are fixed, resulting in a common architecture for any N-point FFT.

26

Logic Minimization

• After logic minimization, the control logic reduces to primitive logic gates, such as AND/OR gates, driven by the least significant bits of butterfly counter B.

27

Address Sequences(R0~R15)

28

Address Sequences(R16 ~R31)

30

Experimental Results

31

Experimental Results

33

Conclusions

• The proposed method for radix-4 FFT avoids any addition in the address generation, enabling a fast data path for butterfly operations.

• The same concept can be extended to an FFT of any radix, but the number of registers and multiplexers differs with the radix: for a radix-r FFT, registers and 4r multiplexers are needed.

34

Thanks for Listening