Acceleration of Cooley-Tukey algorithm using Maxeler machine
![Page 1: Acceleration of Cooley-Tukey algorithm using Maxeler machine](https://reader037.vdocument.in/reader037/viewer/2022102818/56812b26550346895d8f28c2/html5/thumbnails/1.jpg)
Acceleration of Cooley-Tukey algorithm using Maxeler machine
Author: Nemanja Trifunović Mentor: Professor dr. Veljko Milutinović
![Page 2: Acceleration of Cooley-Tukey algorithm using Maxeler machine](https://reader037.vdocument.in/reader037/viewer/2022102818/56812b26550346895d8f28c2/html5/thumbnails/2.jpg)
Introduction
● Cooley-Tukey algorithm
  ○ Fast Fourier Transform
  ○ Divide and conquer
  ○ Uses: digital signal processing, telecommunications, the analysis of sound signals, …
● Maxeler platform
  ○ Data flow (vs. control flow)
  ○ FPGA
Example of a Fourier transform.
(Source: https://en.wikipedia.org/wiki/File:Rectangular_function.svg; https://en.wikipedia.org/wiki/File:Sinc_function_(normalized).svg. The illustrations are published under a Creative Commons license.)
1/22
![Page 3: Acceleration of Cooley-Tukey algorithm using Maxeler machine](https://reader037.vdocument.in/reader037/viewer/2022102818/56812b26550346895d8f28c2/html5/thumbnails/3.jpg)
Problem statement
Design and implementation of:
● The fastest possible system for calculating the Fast Fourier Transform using a Maxeler machine.
● A system that outperforms the existing solutions to this problem.
2/22
![Page 4: Acceleration of Cooley-Tukey algorithm using Maxeler machine](https://reader037.vdocument.in/reader037/viewer/2022102818/56812b26550346895d8f28c2/html5/thumbnails/4.jpg)
Problem statement
Benefits of calculating the Fast Fourier Transform with Maxeler machines
● Higher speed of calculation.
● Lower power consumption.
● Lower space consumption.
Conditions
● Huge amounts of data.
3/22
![Page 5: Acceleration of Cooley-Tukey algorithm using Maxeler machine](https://reader037.vdocument.in/reader037/viewer/2022102818/56812b26550346895d8f28c2/html5/thumbnails/5.jpg)
Conditions and assumptions
● Maxeler machine used:
  ○ Two Maxeler cards of type MAX3424A.
● In experiments with multiprocessor systems only one processor core was used.
4/22
![Page 6: Acceleration of Cooley-Tukey algorithm using Maxeler machine](https://reader037.vdocument.in/reader037/viewer/2022102818/56812b26550346895d8f28c2/html5/thumbnails/6.jpg)
Overview of existing solutions
● FFT algorithms: Prime-factor, Bruun’s, Rader’s, Winograd, Bluestein’s, …
● The time complexity: O(N log N).
● Performance comparison of publicly available implementations.
○ Matteo Frigo and Steven G. Johnson (from MIT)
5/22
![Page 7: Acceleration of Cooley-Tukey algorithm using Maxeler machine](https://reader037.vdocument.in/reader037/viewer/2022102818/56812b26550346895d8f28c2/html5/thumbnails/7.jpg)
Illustration of Matteo Frigo’s and Steven G. Johnson’s experiments. (Source: http://www.fftw.org/speed/Pentium4-3.60GHz-icc)
6/22
![Page 8: Acceleration of Cooley-Tukey algorithm using Maxeler machine](https://reader037.vdocument.in/reader037/viewer/2022102818/56812b26550346895d8f28c2/html5/thumbnails/8.jpg)
The proposed solution
● Parallelized radix 2 algorithm.
● Pipeline of depth O(log N), where N is the length of the input sequence.
● Latency is proportional to the depth of the pipeline.
● After the initial delay (latency), one result is produced in every cycle.
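The last two points can be made concrete with a simple cycle-count model. This is only a sketch under stated assumptions: the pipeline takes `depth` cycles to deliver its first result and then delivers one result per cycle. The class name, method names and the depth value used in the usage note are hypothetical and are not taken from the slides.

```java
// Hypothetical cycle-count model of a fully pipelined design (not the author's code).
final class PipelineModel {
    // Cycles needed to obtain `results` outputs from a pipeline of `depth` stages:
    // `depth` cycles until the first result, then one additional result per cycle.
    static long totalCycles(long depth, long results) {
        if (depth < 1 || results < 0) {
            throw new IllegalArgumentException("depth must be >= 1 and results >= 0");
        }
        return results == 0 ? 0 : depth + results - 1;
    }

    // Average number of cycles per result; approaches 1 as `results` grows.
    static double cyclesPerResult(long depth, long results) {
        if (results < 1) {
            throw new IllegalArgumentException("results must be >= 1");
        }
        return (double) totalCycles(depth, results) / results;
    }
}
```

With an assumed depth of 6, a single transform costs 6 cycles, while 1000 consecutive transforms cost 1005 cycles in total, so the latency is amortized only when many transforms are streamed back to back.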
7/22
![Page 9: Acceleration of Cooley-Tukey algorithm using Maxeler machine](https://reader037.vdocument.in/reader037/viewer/2022102818/56812b26550346895d8f28c2/html5/thumbnails/9.jpg)
Formal analysis
The radix-2 Cooley-Tukey algorithm operates as follows:
1. The input sequence is divided into two subsequences of equal length: the even-indexed elements form the first subsequence and the odd-indexed elements form the second.
2. The DFT of each subsequence is calculated (recursively, in the same way), and the DFT of the whole sequence is then calculated from the DFTs of the two subsequences.
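The two steps can be sketched as a plain recursive CPU routine. This is a minimal illustration, not the Maxeler implementation: it assumes the input length is a power of two, stores real and imaginary parts in separate arrays, and uses a hypothetical class name.

```java
// Minimal recursive radix-2 sketch (illustrative only; names are hypothetical).
final class RecursiveFft {
    // Returns {re, im} of the DFT of the input; the length must be a power of two.
    static double[][] fft(double[] re, double[] im) {
        int n = re.length;
        if (n == 0 || (n & (n - 1)) != 0 || im.length != n) {
            throw new IllegalArgumentException("length must be a power of two");
        }
        if (n == 1) {
            return new double[][] { { re[0] }, { im[0] } };
        }
        int half = n / 2;
        // Step 1: split into even-indexed and odd-indexed subsequences.
        double[] evenRe = new double[half], evenIm = new double[half];
        double[] oddRe = new double[half], oddIm = new double[half];
        for (int i = 0; i < half; i++) {
            evenRe[i] = re[2 * i];
            evenIm[i] = im[2 * i];
            oddRe[i] = re[2 * i + 1];
            oddIm[i] = im[2 * i + 1];
        }
        double[][] e = fft(evenRe, evenIm);
        double[][] o = fft(oddRe, oddIm);
        // Step 2: combine the two half-size DFTs into the full DFT.
        double[] outRe = new double[n], outIm = new double[n];
        for (int k = 0; k < half; k++) {
            double angle = -2.0 * Math.PI * k / n;
            double wRe = Math.cos(angle), wIm = Math.sin(angle);
            // t = W * O[k] (complex multiplication)
            double tRe = wRe * o[0][k] - wIm * o[1][k];
            double tIm = wRe * o[1][k] + wIm * o[0][k];
            outRe[k] = e[0][k] + tRe;
            outIm[k] = e[1][k] + tIm;
            outRe[k + half] = e[0][k] - tRe;
            outIm[k + half] = e[1][k] - tIm;
        }
        return new double[][] { outRe, outIm };
    }
}
```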
8/22
![Page 10: Acceleration of Cooley-Tukey algorithm using Maxeler machine](https://reader037.vdocument.in/reader037/viewer/2022102818/56812b26550346895d8f28c2/html5/thumbnails/10.jpg)
Formal analysis
A detailed derivation of the following formula is given in the paper.
● The DFT of the even subsequence is denoted by $E_k$,
● the DFT of the odd subsequence is denoted by $O_k$, and
● $e^{-2\pi i k/N}$ is denoted by $W_N^k$.
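Assuming the formula meant here is the standard radix-2 identity, it can be written as follows. The symbol $X_k$ for the DFT of the whole sequence is introduced here for illustration; $E_k$ and $O_k$ are the DFTs of the even and odd subsequences, and the identity holds for $0 \le k < N/2$:

$$
X_k = E_k + W_N^k\, O_k, \qquad
X_{k+N/2} = E_k - W_N^k\, O_k, \qquad
W_N^k = e^{-2\pi i k/N}.
$$

Each pair $(X_k, X_{k+N/2})$ reuses the same product $W_N^k O_k$, which is why one level of the algorithm needs only $N/2$ complex multiplications.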
9/22
![Page 11: Acceleration of Cooley-Tukey algorithm using Maxeler machine](https://reader037.vdocument.in/reader037/viewer/2022102818/56812b26550346895d8f28c2/html5/thumbnails/11.jpg)
Illustration of pipelined execution of the radix-2 algorithm.
10/22
![Page 12: Acceleration of Cooley-Tukey algorithm using Maxeler machine](https://reader037.vdocument.in/reader037/viewer/2022102818/56812b26550346895d8f28c2/html5/thumbnails/12.jpg)
Measurement and analysis of the performance of the proposed implementation
Types of performed experiments
● Calculation of the Fourier transform of 100, 1,000, 10,000, 1,000,000 and 10,000,000 consecutive input sequences of length 8, 16, 32 and 64 points.
● Maxeler implementation vs. reference CPU implementation
● Maxeler implementation vs. best publicly available implementations
11/22
![Page 13: Acceleration of Cooley-Tukey algorithm using Maxeler machine](https://reader037.vdocument.in/reader037/viewer/2022102818/56812b26550346895d8f28c2/html5/thumbnails/13.jpg)
Generated graphs:
● Maxeler vs best publicly available implementations of FFT algorithm.
● Run times, depending on the number of consecutive FFT calculations (for input sequences of length 8, 16, 32 and 64).
● Acceleration obtained using the Maxeler machine, compared to CPU execution, depending on the number of consecutive FFT calculations (for input sequences of length 8, 16, 32 and 64).
12/22
![Page 14: Acceleration of Cooley-Tukey algorithm using Maxeler machine](https://reader037.vdocument.in/reader037/viewer/2022102818/56812b26550346895d8f28c2/html5/thumbnails/14.jpg)
The average execution time, in seconds, of publicly available algorithms for calculating the FFT on different architectures, for an input sequence of 8 elements.
13/22
![Page 15: Acceleration of Cooley-Tukey algorithm using Maxeler machine](https://reader037.vdocument.in/reader037/viewer/2022102818/56812b26550346895d8f28c2/html5/thumbnails/15.jpg)
Acceleration of the Maxeler implementation compared to the CPU implementation, depending on the number of elements in the input sequence.
14/22
![Page 16: Acceleration of Cooley-Tukey algorithm using Maxeler machine](https://reader037.vdocument.in/reader037/viewer/2022102818/56812b26550346895d8f28c2/html5/thumbnails/16.jpg)
Computation time of consecutive fast Fourier transforms expressed in seconds depending on the number of consecutive calculations.
15/22
![Page 17: Acceleration of Cooley-Tukey algorithm using Maxeler machine](https://reader037.vdocument.in/reader037/viewer/2022102818/56812b26550346895d8f28c2/html5/thumbnails/17.jpg)
Acceleration of Maxeler implementation compared to CPU implementation depending on the number of consecutive calculations.
16/22
![Page 18: Acceleration of Cooley-Tukey algorithm using Maxeler machine](https://reader037.vdocument.in/reader037/viewer/2022102818/56812b26550346895d8f28c2/html5/thumbnails/18.jpg)
Analysis of scalability and bottlenecks of proposed solution
● Transfer of data to and from the Maxeler card
● Limited hardware resources on a single Maxeler card
● Limited number of Maxeler cards
17/22
![Page 19: Acceleration of Cooley-Tukey algorithm using Maxeler machine](https://reader037.vdocument.in/reader037/viewer/2022102818/56812b26550346895d8f28c2/html5/thumbnails/19.jpg)
Analysis of implementation
The Maxeler implementation of the Cooley-Tukey algorithm consists of:
1. Rearrangement of the input sequence in bit-reversed order, and
2. The radix-2 algorithm.
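The same two stages can be sketched on a CPU as an in-place iterative FFT. This is an illustrative sketch, not the kernel shown in the slides: it assumes a power-of-two length, keeps real and imaginary parts in separate arrays, and uses hypothetical class and method names.

```java
// Iterative radix-2 sketch: bit-reversal permutation followed by butterfly levels.
// Illustrative only; names are hypothetical and this is not the Maxeler kernel.
final class IterativeFft {
    // Reverses the lowest `bits` bits of `index` (e.g. 001 -> 100 for bits = 3).
    static int reverseBits(int index, int bits) {
        int reversed = 0;
        for (int b = 0; b < bits; b++) {
            reversed = (reversed << 1) | ((index >> b) & 1);
        }
        return reversed;
    }

    // Replaces (re, im) with its DFT; the length must be a power of two.
    static void fft(double[] re, double[] im) {
        int n = re.length;
        if (n == 0 || (n & (n - 1)) != 0 || im.length != n) {
            throw new IllegalArgumentException("length must be a power of two");
        }
        int levels = Integer.numberOfTrailingZeros(n); // log2(n)
        // Stage 1: rearrange the input in bit-reversed order.
        for (int i = 0; i < n; i++) {
            int j = reverseBits(i, levels);
            if (j > i) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        // Stage 2: log2(n) levels of radix-2 butterflies.
        for (int level = 1; level <= levels; level++) {
            int size = 1 << level;
            int half = size / 2;
            for (int start = 0; start < n; start += size) {
                for (int k = 0; k < half; k++) {
                    double angle = -2.0 * Math.PI * k / size;
                    double wRe = Math.cos(angle), wIm = Math.sin(angle);
                    int a = start + k, b = a + half;
                    double tRe = wRe * re[b] - wIm * im[b];
                    double tIm = wRe * im[b] + wIm * re[b];
                    re[b] = re[a] - tRe;
                    im[b] = im[a] - tIm;
                    re[a] += tRe;
                    im[a] += tIm;
                }
            }
        }
    }
}
```

Because the permutation puts every element where the first butterfly level expects it, each later level reads and writes fixed positions, which is what makes the levels suitable for a pipeline.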
18/22
![Page 20: Acceleration of Cooley-Tukey algorithm using Maxeler machine](https://reader037.vdocument.in/reader037/viewer/2022102818/56812b26550346895d8f28c2/html5/thumbnails/20.jpg)
Illustration of the kernel
19/22
![Page 21: Acceleration of Cooley-Tukey algorithm using Maxeler machine](https://reader037.vdocument.in/reader037/viewer/2022102818/56812b26550346895d8f28c2/html5/thumbnails/21.jpg)
Implementation details
● Two input and two output streams
● These streams are of type arrayType:
    DFEType floatType = dfeFloat(8, 24);
    DFEArrayType<DFEVar> arrayType =
        new DFEArrayType<DFEVar>(floatType, n);
● The coefficients $W_N^k$ aren’t calculated on the Maxeler machine
● Parameters:
  ○ N
  ○ first_level
  ○ last_level
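Since the coefficients are not calculated on the Maxeler machine, they have to be prepared elsewhere, presumably on the host. The following is a sketch of such a host-side precomputation in plain Java, under these assumptions: the coefficients are $e^{-2\pi i k/N}$ for $0 \le k < N/2$, and they are stored as single-precision values because `dfeFloat(8, 24)` has the same exponent and mantissa widths as IEEE single precision. How the table reaches the kernel is not described in the slides; the class and method names are hypothetical.

```java
// Host-side precomputation of the coefficients (hypothetical helper, not from the slides).
final class TwiddleTable {
    // Returns {re, im}, where re[k] + i*im[k] = e^{-2*pi*i*k/n} for 0 <= k < n/2.
    static float[][] compute(int n) {
        if (n < 2 || (n & (n - 1)) != 0) {
            throw new IllegalArgumentException("n must be a power of two, at least 2");
        }
        int half = n / 2;
        float[] re = new float[half];
        float[] im = new float[half];
        for (int k = 0; k < half; k++) {
            // Evaluate in double precision, then round once to single precision.
            double angle = -2.0 * Math.PI * k / n;
            re[k] = (float) Math.cos(angle);
            im[k] = (float) Math.sin(angle);
        }
        return new float[][] { re, im };
    }
}
```

For N = 8 the table holds four entries, starting with 1 and ending with the values for k = 3; only N/2 entries are needed because the second half of each butterfly reuses the same coefficient with the opposite sign.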
20/22
![Page 22: Acceleration of Cooley-Tukey algorithm using Maxeler machine](https://reader037.vdocument.in/reader037/viewer/2022102818/56812b26550346895d8f28c2/html5/thumbnails/22.jpg)
Conclusion
➔ It is shown that the proposed solution has the expected performance and that it works correctly.
➔ The performance of the proposed solution is better than the performance of any publicly available implementation of the Fast Fourier Transform.
➔ Achieving these speedups requires consecutive calculations of the Fast Fourier Transform.
21/22
![Page 23: Acceleration of Cooley-Tukey algorithm using Maxeler machine](https://reader037.vdocument.in/reader037/viewer/2022102818/56812b26550346895d8f28c2/html5/thumbnails/23.jpg)
Q/A
Thank you for your attention