eads: accelerator project

EADS: Accelerator Project Rohit Prakash (2003CS10186) Anand Silodia (2003CS50210)

Upload: gareth-boone

Post on 30-Dec-2015



TRANSCRIPT

Page 1: EADS: Accelerator Project

EADS: Accelerator Project

Rohit Prakash (2003CS10186)
Anand Silodia (2003CS50210)

Page 2: EADS: Accelerator Project

Speed up scientific application

Application

Candidate Partition

Performance Prediction

Choose next partition

Page 3: EADS: Accelerator Project

Time lines (tentative)

28th January: Figure out the best FFT algorithm. Compare the algorithms on the following parameters:
- Execution time
- No. of multiplications
- No. of additions

19th February : Study hardware implementation of FFT.....

Page 4: EADS: Accelerator Project

Terminologies

Radix: the "radix" is the size of an FFT decomposition.

Twiddle factors: "twiddle factors" are the coefficients used to combine results from a previous stage to form inputs to the next stage.
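As a small illustration of the twiddle-factor definition, the sketch below lists the m-th roots of unity for one stage. It assumes the sign convention used later in the deck (ω = e^(2πi/m)); the function name `twiddle_factors` is hypothetical and not taken from the project.

```python
import cmath

def twiddle_factors(m):
    """Return the m twiddle factors w^j = exp(2*pi*i*j/m) for j = 0..m-1."""
    return [cmath.exp(2j * cmath.pi * j / m) for j in range(m)]

w = twiddle_factors(8)
# A quarter turn around the unit circle is the imaginary unit i,
# which is why a radix-4 butterfly can use i directly instead of a general twiddle.
print(abs(w[2] - 1j) < 1e-12)   # True
```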

Page 5: EADS: Accelerator Project

First Implementation

- Implemented a recursive radix-4 FFT
- Analysed it using gprof
- Looked into other FFT implementations:
  - iterative
  - parallel
  - split radix

Page 6: EADS: Accelerator Project

Analysis of the implementation

Considered FFT of 1024 random points (double)

Results from gprof:
- No. of complex multiplications: 21760
- No. of complex additions: 7680

(Each complex multiplication consists of 4 real multiplications and 2 real additions)

(Each complex addition/subtraction consists of 2 real additions/subtractions)
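The two conversion rules above can be applied to the reported complex-operation counts to get real-operation totals. This is only the arithmetic implied by the slide; the helper name `real_op_counts` is made up for illustration.

```python
def real_op_counts(complex_mults, complex_adds):
    """Convert complex operation counts to (real multiplications, real additions).

    Uses the slide's rules: one complex multiplication = 4 real multiplications
    + 2 real additions; one complex addition/subtraction = 2 real additions.
    """
    real_mults = 4 * complex_mults
    real_adds = 2 * complex_mults + 2 * complex_adds
    return real_mults, real_adds

# Counts reported for the recursive radix-4 FFT on 1024 points.
print(real_op_counts(21760, 7680))   # (87040, 58880)
```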

Page 7: EADS: Accelerator Project

Problems with this implementation

Inefficient use of memory (recursive procedure)

Wasted computations (some factors computed multiple times)

Most of the execution time is spent computing twiddle factors (complex number multiplications)

Page 8: EADS: Accelerator Project

2nd Implementation

Radix-4 iterative in-place implementation

iterativeFFT(a)
    BitReversal(a, A)    // copy a into A in permuted order; for this radix-4 butterfly the order must be base-4 digit reversal
    n ← length(a)
    for (s ← 1 to log4(n))    // logarithm is of base 4
    {
        m ← 4^s
        ω ← e^(2πi/m)
        for (k ← 0 to n-1 by m)
        {
            τ ← 1
            for (j ← 0 to m/4 - 1)
            {
                t ← A[k+j]
                u ← τ A[k+j+m/4]
                v ← τ^2 A[k+j+2*m/4]
                x ← τ^3 A[k+j+3*m/4]
                A[k+j] ← t + u + v + x
                A[k+j+m/4] ← t + (i)u - v - (i)x
                A[k+j+2*m/4] ← t - u + v - x
                A[k+j+3*m/4] ← t - (i)u - v + (i)x
                τ ← τ * ω
            }
        }
    }
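The pseudocode above can be turned into a runnable sketch roughly as follows. This is not the project's code: it is a Python transcription under the assumptions that the input length is a power of 4, that the input is permuted by base-4 digit reversal (which this butterfly requires), and that ω = e^(2πi/m) as on the slide. The helper names and the naive reference DFT are added only for illustration and checking.

```python
import cmath

def digit_reverse_base4(index, digits):
    """Reverse the base-4 digits of index (digits = log4 of the transform size)."""
    result = 0
    for _ in range(digits):
        result = result * 4 + index % 4
        index //= 4
    return result

def iterative_fft_radix4(a):
    """Radix-4 iterative FFT; works in place on a permuted copy A of the input."""
    n = len(a)
    stages = 0
    while 4 ** stages < n:
        stages += 1
    if 4 ** stages != n:
        raise ValueError("length must be a power of 4")
    # Copy the input into base-4 digit-reversed order.
    A = [a[digit_reverse_base4(k, stages)] for k in range(n)]
    for s in range(1, stages + 1):
        m = 4 ** s
        quarter = m // 4
        w = cmath.exp(2j * cmath.pi / m)
        for k in range(0, n, m):
            tau = 1 + 0j
            for j in range(quarter):
                tau2 = tau * tau
                t = A[k + j]
                u = tau * A[k + j + quarter]
                v = tau2 * A[k + j + 2 * quarter]
                x = tau2 * tau * A[k + j + 3 * quarter]
                A[k + j] = t + u + v + x
                A[k + j + quarter] = t + 1j * u - v - 1j * x
                A[k + j + 2 * quarter] = t - u + v - x
                A[k + j + 3 * quarter] = t - 1j * u - v + 1j * x
                tau *= w
    return A

def naive_dft(a):
    """Direct O(n^2) DFT with the same exp(+2*pi*i*j*k/n) sign convention."""
    n = len(a)
    return [sum(a[j] * cmath.exp(2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]
```

Comparing `iterative_fft_radix4` with `naive_dft` on a small input (for example 16 or 64 points) is a quick way to confirm that the butterfly signs and the input permutation agree.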

Page 9: EADS: Accelerator Project

Analysis of this implementation

Considered FFT of 1024 random points (double)

Results from gprof:
- No. of complex multiplications: 14080
- No. of complex additions/subtractions: 7680

(Each complex multiplication consists of 4 real multiplications and 2 real additions)

(Each complex addition/subtraction consists of 2 real additions/subtractions)
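One way to sanity-check the multiplication count is to count butterflies. The sketch below assumes the loop body of the iterative pseudocode is executed literally, with τ² and τ³ formed by repeated multiplication and each product with i counted as a complex multiplication. That per-butterfly breakdown is an assumption, not something the slides state, although it does reproduce the reported figure.

```python
def radix4_butterflies(n):
    """Number of radix-4 butterflies in an n-point transform (n a power of 4)."""
    stages = 0
    size = 1
    while size < n:
        size *= 4
        stages += 1
    if size != n:
        raise ValueError("length must be a power of 4")
    return stages * (n // 4)

butterflies = radix4_butterflies(1024)   # 5 stages of 256 butterflies each
# Assumed complex multiplications per butterfly:
#   tau*A (1) + tau*tau*A (2) + tau*tau*tau*A (3)
#   + i*u used twice (2) + i*x used twice (2) + tau*w (1) = 11
print(butterflies, butterflies * 11)   # 1280 14080
```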

Page 10: EADS: Accelerator Project

Improvements

- Precompute twiddle factors
- Trade multiplications for additions (a complex multiplication can be done with 3 real multiplies and 5 real adds rather than the usual 4 real multiplies and 2 real adds)
- Use compiler flags (10%-15% improvement in execution time on some systems): -O3 -march=pentiumpro -ffast-math -fomit-frame-pointer
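The 3-multiply, 5-add complex product mentioned above can be sketched as follows. This is the standard rearrangement of (a + bi)(c + di); the function name `complex_mul_3` is hypothetical, and the slides do not show which exact variant the project used.

```python
def complex_mul_3(a, b, c, d):
    """Return (real, imag) of (a + bi) * (c + di) using 3 real multiplications
    and 5 real additions/subtractions."""
    k1 = c * (a + b)      # 1 add, 1 multiply
    k2 = a * (d - c)      # 1 subtract, 1 multiply
    k3 = b * (c + d)      # 1 add, 1 multiply
    return k1 - k3, k1 + k2   # 2 more additions/subtractions

print(complex_mul_3(1.0, 2.0, 3.0, 4.0))   # (-5.0, 10.0)
```

Whether this pays off depends on how the cost of a multiplication compares with that of an addition on the target machine.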

Page 11: EADS: Accelerator Project

Some results

- Precomputing twiddle factors: No. of complex multiplications: 8960 (5120 fewer)
- Trading multiplications for additions: no appreciable decline in execution time
- Using compiler flags: drastic improvement in execution time
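Twiddle-factor precomputation can be sketched as a single table of n-th roots of unity that every stage indexes with a stride. The slides do not describe the table layout actually used, so the layout and the names `precompute_twiddles` and `stage_twiddle` are assumptions.

```python
import cmath

def precompute_twiddles(n):
    """Table of the n-th roots of unity exp(2*pi*i*k/n), computed once."""
    return [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]

def stage_twiddle(table, m, j, r):
    """Look up exp(2*pi*i*r*j/m), i.e. tau**r for butterfly j of a size-m stage.

    m must divide len(table); the stride n // m maps stage indices onto the table.
    """
    n = len(table)
    return table[(r * j * (n // m)) % n]

table = precompute_twiddles(1024)
tau_cubed = stage_twiddle(table, 64, 5, 3)   # replaces tau * tau * tau at run time
print(abs(tau_cubed - cmath.exp(2j * cmath.pi * 15 / 64)) < 1e-12)   # True
```

With lookups like these, the butterfly no longer needs to multiply τ by itself or by ω, only the three products with the data and the products with i.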

Page 12: EADS: Accelerator Project

Comparative Analysis

User time for 1024 points

[Bar chart. Y-axis: time (milliseconds), 0 to 30. Bars: recursive fft, inplace fft, inplace twiddle precompute, inplace twiddle compiler, final, fftw.]

Page 13: EADS: Accelerator Project

User time for 4096 points

[Bar chart. Y-axis: time (milliseconds), 0 to 70. Bars: recursive fft, inplace fft, inplace twiddle precompute, inplace twiddle compiler, final, fftw.]

Page 14: EADS: Accelerator Project

User time for 262144 points

[Bar chart. Y-axis: time (milliseconds), 0 to 2500. Bars: recursive fft, inplace fft, inplace twiddle precompute, inplace twiddle compiler, final, fftw.]

Page 15: EADS: Accelerator Project

Further enhancements possible

- Use higher radix: 8, 16, 32, etc.
- Use split-radix or Winograd algorithms
- If the data is real, great improvements are possible
- Use the Fast Bit-Reversal method (IEEE, D.M.W. Evans)
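The gain available for real data comes from redundancy in the output: for real input, X[n-k] is the complex conjugate of X[k], so roughly half of the outputs determine the rest. The sketch below only demonstrates that symmetry with a naive DFT. It is not an optimized real-input FFT, and the function names are illustrative.

```python
import cmath

def naive_dft(a):
    """Direct O(n^2) DFT with the exp(+2*pi*i*j*k/n) sign convention."""
    n = len(a)
    return [sum(a[j] * cmath.exp(2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def has_conjugate_symmetry(X, tol=1e-9):
    """True if X[n-k] equals the complex conjugate of X[k] for every k."""
    n = len(X)
    return all(abs(X[(n - k) % n] - X[k].conjugate()) <= tol for k in range(n))

real_input = [0.5, -1.0, 2.0, 3.5, 0.0, 1.5, -2.5, 4.0]
print(has_conjugate_symmetry(naive_dft(real_input)))   # True
```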

Page 16: EADS: Accelerator Project

Resources

- Rivest, Cormen
- Numerical Recipes in C
- IEEE papers:
  - Conversion of Digit-Reversed to Bit-Reversed Order in FFT Algorithms (Panos E. and C.S. Burrus)
  - The Design and Implementation of FFTW3 (Matteo Frigo and Steven G. Johnson)
- cnx.org
- Other FFT implementations on the net (best: fftw)

Page 17: EADS: Accelerator Project

Thank You