AHPCC/NCSA WORKSHOP
Fast Fourier Transform Using FFTw

Guobin Ma (1) (gbma@ahpcc.unm.edu),
Sirpa Saarinen (2) (sirpa@ncsa.uiuc.edu),
and Paul M. Alsing (1)
(1) AHPCC, (2) NCSA

http://www.ahpcc.unm.edu/Workshop/FFTW

4/18/00, Spring 2000 FFTw workshop

Contents

FFT basics (Paul)
  What is the FFT and why use it
FFTw
  Outline of FFTw (Guobin)
    Characteristics
    C routines
  Performance and C example codes (Sirpa)
  Fortran wrappers and example codes (Guobin)
  Exercises (skipped)

FFT Basic

What is FFT and why FFT

by Paul Alsing

Fourier Transform: frequency analysis of time series data.

DFT: Discrete Fourier Transform (N time/frequency points)

FFT: Fast Fourier Transform, an efficient implementation of the DFT, ~O(N log2 N)

$$ H(f) = \int_{-\infty}^{\infty} h(t)\, e^{2\pi i f t}\, dt, \qquad h(t) = \int_{-\infty}^{\infty} H(f)\, e^{-2\pi i f t}\, df $$

$$ h_k = h(t_k), \quad t_k = k\,\Delta t, \quad k = 0, 1, \ldots, N-1 $$

$$ f_n = \frac{n}{N\,\Delta t}, \quad n = -N/2, \ldots, N/2 - 1 $$

$$ H_n = \sum_{k=0}^{N-1} h_k\, e^{2\pi i k n/N}, \qquad h_k = \frac{1}{N} \sum_{n=0}^{N-1} H_n\, e^{-2\pi i k n/N} $$

Aliasing issues:

Let fc = Nyquist frequency = 1/(2Δt), where Δt is the sampling interval. A sine wave of frequency fc is sampled at 2 points per period, the peak and the trough.

Frequency components with |f| > fc will be falsely folded back into the range -fc < f < fc.

Fourier Transform: radix 2, Danielson-Lanczos

$$ H_n = \sum_{k=0}^{N-1} e^{2\pi i k n/N}\, h_k = \sum_{k=0}^{N/2-1} e^{2\pi i (2k) n/N}\, h_{2k} + \sum_{k=0}^{N/2-1} e^{2\pi i (2k+1) n/N}\, h_{2k+1} $$

$$ = \sum_{k=0}^{N/2-1} e^{2\pi i k n/(N/2)}\, h_{2k} + W^n \sum_{k=0}^{N/2-1} e^{2\pi i k n/(N/2)}\, h_{2k+1} = H_n^e + W^n H_n^o, $$

where $W \equiv e^{2\pi i/N}$; $H_n^e$ is the nth component of the FT of length N/2 formed from the even components of the original $h_k$'s; $H_n^o$ is the nth component of the FT of length N/2 formed from the odd components of the original $h_k$'s.

Fourier Transform: radix 2, Danielson-Lanczos (cont.)

After $\log_2 8 = 3$ steps:

$$ H_n = H_n^{e} + W^{n} H_n^{o} $$

$$ = H_n^{ee} + W^{2n} H_n^{eo} + W^{n} \left( H_n^{oe} + W^{2n} H_n^{oo} \right) $$

$$ = H_n^{eee} + W^{4n} H_n^{eeo} + W^{2n} \left( H_n^{eoe} + W^{4n} H_n^{eoo} \right) + W^{n} \left[ H_n^{oee} + W^{4n} H_n^{oeo} + W^{2n} \left( H_n^{ooe} + W^{4n} H_n^{ooo} \right) \right] $$

$H_n$ is of length N
$H_n^{e}, H_n^{o}$ are of length N/2
$H_n^{ee}, H_n^{eo}, H_n^{oe}, H_n^{oo}$ are of length N/4
$H_n^{eee}, H_n^{eeo}, H_n^{eoe}, H_n^{eoo}, H_n^{oee}, H_n^{oeo}, H_n^{ooe}, H_n^{ooo}$ are of length N/8

Fourier Transform: radix 2, butterfly Cooley-Tukey algorithm

We finally get down to 1-point transforms such as

$$ H_n^{eoeeoeo \cdots oee} = h_m \quad \text{(some input value) for some } 0 \le m \le N-1 $$

The question is: which value of m corresponds to which pattern of e’s and o’s?

The answer is: let {e=0, o=1}. Reverse the pattern of e’s and o’s and you will have the value of m in binary.
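The pattern-to-index rule can be checked with a few lines of C. This is a small sketch; `pattern_to_index` and `bit_reverse` are hypothetical helper names. The first reads a string of e's and o's from right to left (that is, reversed) as a binary number with e = 0 and o = 1; the second reverses the lowest bits of an integer, which is the same operation expressed on indices.

```c
#include <stddef.h>
#include <string.h>

/* Map a pattern such as "eeo" to the input index m:
 * reverse the pattern and read it as binary with e = 0, o = 1. */
unsigned pattern_to_index(const char *pattern)
{
    unsigned m = 0;
    for (size_t i = strlen(pattern); i-- > 0; )
        m = (m << 1) | (pattern[i] == 'o' ? 1u : 0u);
    return m;
}

/* Reverse the lowest `bits` bits of m. */
unsigned bit_reverse(unsigned m, unsigned bits)
{
    unsigned r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (m & 1u);
        m >>= 1;
    }
    return r;
}
```

For N = 8, the pattern "eeo" selects the even elements, then the even ones among those, then the odd one of the remaining pair, which is input element 4.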

Bit reversal: The Cooley-Tukey algorithm first rearranges the data in bit-reversed order, then builds up the transform in log2 N stages, for O(N log2 N) operations in total (decimation in time).

[Figure: the eight patterns eee, eeo, eoe, eoo, oee, oeo, ooe, ooo before and after bit-reversal reordering]

Ordering of time series (coordinate space) and frequencies in Fourier (momentum) space.
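As a rough illustration of the usual DFT output ordering (zero frequency first, then positive frequencies, then negative ones), the sketch below maps an output index to its signed frequency. The name `bin_frequency` is hypothetical, and assigning index N/2 to -fc rather than +fc is a convention assumed here.

```c
#include <stddef.h>

/* Frequency of output bin n for N samples spaced dt apart.
 * Bins 0 .. N/2-1 hold non-negative frequencies; bins N/2 .. N-1
 * hold the negative frequencies -fc .. -1/(N*dt). */
double bin_frequency(size_t n, size_t N, double dt)
{
    long m = (n < N / 2) ? (long)n : (long)n - (long)N;
    return (double)m / ((double)N * dt);
}
```

For N = 8 and dt = 1 the bins correspond to 0, 1/8, 2/8, 3/8, -4/8, -3/8, -2/8, -1/8.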

Example Application: Quantum Mechanics
Propagation of the (dimensionless) Schrodinger wave function

$$ i\,\frac{\partial \psi(x,t)}{\partial t} = \left[ -\frac{1}{2} \frac{\partial^2}{\partial x^2} + V(x) \right] \psi(x,t) = H\,\psi(x,t), \qquad H = T + V $$

$$ \psi(x, t+\Delta t) = e^{-iH\Delta t}\, \psi(x,t) \approx e^{-iT\Delta t/2}\, e^{-iV\Delta t}\, e^{-iT\Delta t/2}\, \psi(x,t) $$

In coordinate space,

$$ e^{-iV\Delta t}\, \psi(x_j, t) = e^{-iV(x_j)\Delta t}\, \psi(x_j, t), \quad j = 1, \ldots, N $$

In Fourier (momentum) space,

$$ e^{-iT\Delta t/2}\, \hat{\psi}(k_j, t) = e^{-i (k_j^2/2)\, \Delta t/2}\, \hat{\psi}(k_j, t), \quad j = 1, \ldots, N $$

Serial FFTs

x data is contiguous in memory (Fortran).

Transpose the data to keep the y transforms contiguous in memory.

[Figure: 2D array with axes x and y, before and after the transpose]

Parallel FFTs

In parallel, all x transforms are local operations on each processor (no communication).

In performing the transpose, the processors must perform an all-to-all communication.

[Figure: 2D array distributed over processors P0, P1, P2, P3, before and after the transpose; axes x and y]

Outline of FFTw

By Guobin Ma

Characteristics of FFTw

C routines generated by a Caml Light (ML) program
1D/nD, real/complex data
Arbitrary input size, not necessarily 2^n
Serial/parallel, shared/distributed memory
High performance, typically faster than other publicly available FFT codes
Portable, automatically adapts to the machine

Two Phases of FFTw

Hardware-dependent algorithm
Planner
  ‘Learns’ the fastest way to compute the transform on your machine
  Produces a data structure, the ‘plan’
  The plan is reusable
Executor
  Computes the transform
Applies to all FFTw operation modes: 1D/nD, complex/real, serial/parallel

C Routines of FFTw

Routines
  1D/nD complex
  1D/nD real
  Corresponding parallel (MPI) ones
Arguments
Special notes
Data formats

1D Complex Transform
Typical call

    #include <fftw.h>
    ...
    {
        fftw_complex in[N], out[N];
        fftw_plan p;
        ...
        /* arguments: int n, fftw_direction dir, int flags */
        p = fftw_create_plan(N, FFTW_FORWARD, FFTW_ESTIMATE);
        ...
        fftw_one(p, in, out);
        ...
        fftw_destroy_plan(p);
    }

1D Complex Transform (cont.)
Routines

    fftw_plan fftw_create_plan(int n, fftw_direction dir, int flags);

    void fftw_one(fftw_plan plan, fftw_complex *in, fftw_complex *out);

    fftw_plan fftw_create_plan_specific(int n, fftw_direction dir, int flags,
                                        fftw_complex *in, int istride,
                                        fftw_complex *out, int ostride);

1D Complex Transform (cont.)
Routines (cont.)

    void fftw(fftw_plan plan, int howmany,
              fftw_complex *in, int istride, int idist,
              fftw_complex *out, int ostride, int odist);

    void fftw_destroy_plan(fftw_plan plan);

1D Complex Transform (cont.)
Arguments

plan: data structure containing all the information needed to compute the transform
n: data size
dir: FFTW_FORWARD (-1), FFTW_BACKWARD (+1)
flags: FFTW_MEASURE, FFTW_ESTIMATE, FFTW_OUT_OF_PLACE, FFTW_IN_PLACE, FFTW_USE_WISDOM, separated by |
howmany: number of transforms / input arrays
in, istride, idist: input arrays; element i of array j is in[i*istride + j*idist]
out, ostride, odist: output arrays, ...
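The stride/dist addressing rule is independent of the FFT itself, so it can be illustrated without the library. In this sketch the helper names `strided_index` and `gather_array` are hypothetical; they only apply the in[i*istride + j*idist] rule from the argument list above.

```c
#include <stddef.h>

/* Position of element i of array j in a strided layout. */
size_t strided_index(size_t i, size_t j, size_t stride, size_t dist)
{
    return i * stride + j * dist;
}

/* Copy array j (n elements) out of the strided layout into a
 * contiguous buffer. */
void gather_array(const double *in, size_t n, size_t j,
                  size_t stride, size_t dist, double *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[strided_index(i, j, stride, dist)];
}
```

For a row-major matrix with `rows` rows and `cols` columns, the rows are `howmany = rows` arrays with stride 1 and dist `cols`, while the columns are `howmany = cols` arrays with stride `cols` and dist 1.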

1D Complex Transform (cont.)
Notes

Out of place (default): in[N], out[N]
In place: saves memory, costs more time; ostride and odist are ignored; out is ignored
In-order output: zero frequency at out[0]
Unnormalized: a forward transform followed by a backward transform multiplies the data by N

nD Complex Transform
Routines, similar to the 1D case, except …

    fftwnd_plan fftwnd_create_plan(int rank, const int *n, fftw_direction dir, int flags);

    void fftwnd_one(fftwnd_plan plan, ..., ...);

    fftwnd_plan fftwnd_create_plan_specific(int rank, const int *n, fftw_direction dir, ..., ..., ..., ..., ...);

    void fftwnd(fftwnd_plan plan, ..., ..., ..., ..., ..., ..., ...);

    void fftwnd_destroy_plan(fftwnd_plan plan);

nD Complex Transform (cont.)
Arguments

rank: dimensionality of the arrays to be transformed
n: pointer to an array of length rank giving the size of each dimension, e.g. n = [8,4,5]
  row-major for C, column-major for Fortran

Special routines for the 2D and 3D cases
  In the routine names: nd -> 2d, 3d
  In the arguments: n_dim -> nx, ny or nx, ny, nz

1D Real Transform
Routines, similar to the 1D complex case, except …

    rfftw_plan rfftw_create_plan(..., ..., ...);

    void rfftw_one(rfftw_plan plan, fftw_real *in, fftw_real *out);

    rfftw_plan rfftw_create_plan_specific(int n, fftw_direction dir, int flags,
                                          fftw_real *in, int istride,
                                          fftw_real *out, int ostride);

    void rfftw(rfftw_plan plan, int howmany,
               fftw_real *in, int istride, int idist,
               fftw_real *out, int ostride, int odist);

    void rfftw_destroy_plan(rfftw_plan plan);

1D Real Transform (cont.)
Arguments

dir:
  FFTW_REAL_TO_COMPLEX = FFTW_FORWARD = -1
  FFTW_COMPLEX_TO_REAL = FFTW_BACKWARD = 1
The other arguments have the same meaning as before

nD Real Transform
Routines, similar to the 1D real case, but …

    rfftwnd_plan rfftwnd_create_plan(int rank, const int *n, fftw_direction dir, int flags);

    void rfftwnd_one_real_to_complex(rfftwnd_plan plan, fftw_real *in, fftw_complex *out);

    void rfftwnd_one_complex_to_real(rfftwnd_plan plan, fftw_complex *in, fftw_real *out);

    void rfftwnd_real_to_complex(rfftwnd_plan plan, int howmany,
                                 fftw_real *in, int istride, int idist,
                                 fftw_complex *out, int ostride, int odist);

nD Real Transform (cont.)
Routines (cont.)

    void rfftwnd_complex_to_real(rfftwnd_plan plan, int howmany,
                                 fftw_complex *in, int istride, int idist,
                                 fftw_real *out, int ostride, int odist);

    void rfftwnd_destroy_plan(rfftwnd_plan plan);

Special 2D and 3D routines

nD Array Format

nD arrays are stored as a single contiguous block
C order, row-major order
  First index varies most slowly, last most quickly
Fortran order, column-major order
  First index varies most quickly, last most slowly
Static array: no problem
Dynamic array: may have problems in the nD case
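The two orderings differ only in which index is multiplied out first. The sketch below computes the position of an element in a single contiguous block for either convention; the function names are hypothetical. A dynamically allocated nD array is safe to transform when it is one such block (for example, a single `malloc` of n[0]*n[1]*...*n[rank-1] elements) rather than an array of separately allocated rows.

```c
#include <stddef.h>

/* Offset of element idx[0..rank-1] in C (row-major) order:
 * the last index varies most quickly. */
size_t row_major_index(const size_t *n, const size_t *idx, size_t rank)
{
    size_t pos = 0;
    for (size_t d = 0; d < rank; d++)
        pos = pos * n[d] + idx[d];
    return pos;
}

/* Offset of the same element in Fortran (column-major) order:
 * the first index varies most quickly. Indices are zero-based here. */
size_t column_major_index(const size_t *n, const size_t *idx, size_t rank)
{
    size_t pos = 0;
    for (size_t d = rank; d-- > 0; )
        pos = pos * n[d] + idx[d];
    return pos;
}
```

For dimensions [8,4,5] and element (1,2,3), the row-major offset is (1*4 + 2)*5 + 3 = 33 and the column-major offset is 1 + 8*(2 + 4*3) = 113.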

Parallel FFTw

Multi-thread (skipped)
MPI
  nD complex
    Routines
    Notes
    Data layout
  1D complex
  nD real

nD Complex MPI FFTw
Routines, similar to the uniprocessor case, except mpi…

    fftwnd_mpi_plan fftwnd_mpi_create_plan(MPI_Comm comm, int rank, const int *n,
                                           fftw_direction dir, int flags);

    void fftwnd_mpi_local_sizes(fftwnd_mpi_plan p,
                                int *local_first, int *local_first_start,
                                int *local_second_after_transpose,
                                int *local_second_start_after_transpose,
                                int *total_local_size);

    local_data = (fftw_complex*) malloc(sizeof(fftw_complex) * total_local_size);

    work = (fftw_complex*) malloc(sizeof(fftw_complex) * total_local_size);

nD Complex MPI FFTw (cont.)
Routines (cont.)

    void fftwnd_mpi(fftwnd_mpi_plan p, int n_fields,
                    fftw_complex *local_data, fftw_complex *work,
                    fftw_mpi_output_order output_order);

    void fftwnd_mpi_destroy_plan(fftwnd_mpi_plan p);

nD Complex MPI FFTw (cont.)
Notes

First argument: comm, the MPI communicator
Data layout
All fftw_mpi transforms are in-place
work:
  Optional; same size as local_data; greater efficiency at the cost of extra storage
output_order: normal/transposed
  transposed: improves performance, but the data must be reshaped manually, which may cause problems in some cases

nD Complex MPI FFTw (cont.)
Data layout

Distributed data
  Divided according to rows (1st dimension) in C
  Divided according to columns (last dimension) in Fortran
Given the plan, all other parameters regarding data layout are determined by fftwnd_mpi_local_sizes
  total_local_size = (n1/np)*n2*…*nk*n_fields
  transposed order: n2 will be the 1st dimension in the output
  For the inverse transform of transposed data, use n = [n2,n1,n3,...,nk]
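To make the idea of a slab (row-block) decomposition concrete, the sketch below splits n1 rows over np processes. This is a hypothetical illustration only: `slab_range` is not an FFTw routine, and the library chooses its own distribution, so real code should always use the values returned by the local-sizes call rather than computing them this way.

```c
/* Hypothetical block distribution of n1 rows over np processes:
 * the first (n1 % np) ranks get one extra row.
 * Not FFTw's own rule; for illustration only. */
void slab_range(int n1, int np, int rank, int *local_n1, int *local_start)
{
    int base = n1 / np;    /* rows every process gets       */
    int extra = n1 % np;   /* leftover rows to spread out   */
    *local_n1 = base + (rank < extra ? 1 : 0);
    *local_start = rank * base + (rank < extra ? rank : extra);
}
```

With n1 = 10 and np = 4 this gives row counts 3, 3, 2, 2 starting at rows 0, 3, 6, 8; each process then stores its rows times the remaining dimensions, which is the origin of the (n1/np)*n2*…*nk factor.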

1D Complex MPI FFTw
Routines, similar to the nD case, except no nd…

    fftw_mpi_plan fftw_mpi_create_plan(MPI_Comm comm, int n, fftw_direction dir, int flags);

    void fftw_mpi_local_sizes(fftw_mpi_plan p,
                              int *local_n, int *local_n_start,
                              int *local_n_after_transpose,
                              int *local_start_after_transpose,
                              int *total_local_size);

1D Complex MPI FFTw (cont.)
Routines (cont.)

    void fftw_mpi(fftw_mpi_plan p, int n_fields,
                  fftw_complex *local_data, fftw_complex *work,
                  fftw_mpi_output_order output_order);

    void fftw_mpi_destroy_plan(fftw_mpi_plan p);

Generally gives worse speedup than the nD case; best suited to large transform sizes

nD Real MPI FFTw

Similar to the uniprocessor and complex MPI cases
About a factor of 2 speedup and half the storage, at the expense of a more complicated data format
Can have transposed-order output data
No 1D real MPI FFTw

Break

FFTw Performance

By Sirpa Saarinen

http://www.ncsa.uiuc.edu/MEDIA/agppt/myFFTW2.ppt

FFTw Fortran Wrappers and Example Codes

By Guobin Ma

FFTw Fortran-Callable Wrappers

Routine names: insert _f77 after the prefix of the C routine names
  fftw/fftwnd/rfftw/rfftwnd -> fftw_f77/fftwnd_f77/rfftw_f77/rfftwnd_f77
  fftw_mpi/fftwnd_mpi -> fftw_f77_mpi/fftwnd_f77_mpi
  e.g. fftwnd_create_plan(3, n_dim, FFTW_FORWARD, FFTW_ESTIMATE | FFTW_IN_PLACE)
    -> fftwnd_f77_create_plan(plan, 3, n_dim, FFTW_FORWARD, FFTW_ESTIMATE + FFTW_IN_PLACE)

FFTw Fortran-Callable Wrappers
Notes

Any function that returns a value is converted into a subroutine with an additional (first) parameter.
There is no null in Fortran; you must allocate and pass an array for out.
nD arrays: column-major, Fortran order
plan variables: must be declared as integer
Constants
  FFTW_FORWARD, FFTW_BACKWARD, FFTW_IN_PLACE, …
  separated by ‘+’ instead of ‘|’
  Defined in the files fortran/fftw_f77.i, fftw_f90.i, fftw_f90_mpi.i

Fortran Examples

Source codes at AHPCC (tested on Turing, BB, SGI):
  ~gbma/workshop/fftw/codes or
  http://www.arc.unm.edu/~gbma/Workshop/FFTW/codes
Complex data
  1D serial, fftw_1d.f90
  1D parallel, fftw_1d_p.f90
  nD serial, fftw_3d.f90
  nD parallel
    Normal order, fftw_3d_p_n.f90
    Transposed order, fftw_3d_p_t.f90

Fortran Examples (cont.)

1D case
  Input:

$$ f(x) = \int_{-N/2}^{N/2} k\, e^{ikx}\, dk $$

  Forward output:

$$ F(k) = k = (0, 1, 2, \ldots, N/2 - 1, -N/2, -N/2 + 1, \ldots, -2, -1) $$

  Inverse output: $f(x)$

nD case
  Input:

$$ f(x,y,z) = \iiint F(k_x, k_y, k_z)\, e^{i(k_x x + k_y y + k_z z)}\, dk_x\, dk_y\, dk_z $$

  Forward output: $F(k_x, k_y, k_z)$, with each of $k_x, k_y, k_z$ ordered as $(0, 1, 2, \ldots, -1)$
  Inverse output: $f(x,y,z)$

1D Serial Fortran Example
FFTw codes

    ...
    call fftw_f77_create_plan(plan_forward,N, &
                              FFTW_FORWARD,FFTW_ESTIMATE)
    call fftw_f77_create_plan(plan_reverse,N, &
                              FFTW_BACKWARD,FFTW_ESTIMATE)
    ...
    call fftw_f77_one(plan_forward,in,out)
    ...
    call fftw_f77_one(plan_reverse,out,in)
    ...
    call fftw_f77_destroy_plan(plan_forward)
    call fftw_f77_destroy_plan(plan_reverse)

1D Parallel Fortran Example
FFTw codes

    ...
    call fftw_f77_mpi_create_plan(p_fwd,MPI_COMM_WORLD,N, &
                                  FFTW_FORWARD,FFTW_ESTIMATE)
    ...
    call fftw_f77_mpi_local_sizes(p_fwd, local_n, local_start, &
         local_n_after_trans, local_start_after_trans, total_local_size)
    ...
    allocate( psi_local(0:total_local_size-1) )
    ...
    allocate( work(0:total_local_size-1) )

1D Parallel Fortran Example (cont.)
FFTw codes (cont.)

    ...
    call fftw_f77_mpi(p_fwd,1,psi_local,work,USE_WORK)
    ...
    call fftw_f77_mpi_destroy_plan(p_fwd)
    ...
    call fftw_f77_mpi_create_plan(p_rvs,MPI_COMM_WORLD,N, &
                                  FFTW_BACKWARD,FFTW_ESTIMATE)
    ...
    call fftw_f77_mpi(p_rvs,1,psi_local,work,USE_WORK)
    ...
    call fftw_f77_mpi_destroy_plan(p_rvs)

nD Serial Fortran Example
FFTw codes

    call fftwnd_f77_create_plan(p_fwd,nd,n_dim, &
                                FFTW_FORWARD,FFTW_ESTIMATE + FFTW_IN_PLACE)
    call fftwnd_f77_one(p_fwd,psi,0)
    call fftwnd_f77_destroy_plan(p_fwd)

    call fftwnd_f77_create_plan(p_rvs,nd,n_dim, &
                                FFTW_BACKWARD,FFTW_ESTIMATE + FFTW_IN_PLACE)
    call fftwnd_f77_one(p_rvs,psi,0)
    call fftwnd_f77_destroy_plan(p_rvs)

nD Parallel Fortran Example
FFTw codes, normal order, nD local array

    n_dim(1)=nx; n_dim(2)=ny; n_dim(3)=nz
    call fftwnd_f77_mpi_create_plan(p_fwd,MPI_COMM_WORLD, &
                                    nd,n_dim,FFTW_FORWARD,FFTW_ESTIMATE)
    call fftwnd_f77_mpi_local_sizes(p_fwd, local_nlast, &
         local_last_start, local_nlast2_after_trans, &
         local_last2_start_after_trans, total_local_size)
    allocate( psi_local(0:nx-1,0:ny-1,0:local_nlast-1) )
    allocate( work(0:nx-1,0:ny-1,0:local_nlast-1) )

nD Parallel Fortran Example (cont.)
FFTw codes, normal order, nD local array (cont.)

    call fftwnd_f77_mpi(p_fwd,1,psi_local,work,USE_WORK,order)
    call fftwnd_f77_mpi_destroy_plan(p_fwd)

    call fftwnd_f77_mpi_create_plan(p_rvs,MPI_COMM_WORLD, &
                                    nd,n_dim,FFTW_BACKWARD,FFTW_ESTIMATE)
    call fftwnd_f77_mpi(p_rvs,1,psi_local,work,USE_WORK,order)
    call fftwnd_f77_mpi_destroy_plan(p_rvs)

nD Parallel Fortran Example (cont.)
FFTw codes, transposed order, 1D local array

    n_dim(1)=nx; n_dim(2)=ny; n_dim(3)=nz
    call fftwnd_f77_mpi_create_plan(p_fwd,MPI_COMM_WORLD, &
                                    nd,n_dim,FFTW_FORWARD,FFTW_ESTIMATE)
    call fftwnd_f77_mpi_local_sizes(p_fwd, local_nlast, &
         local_last_start, local_nlast2_after_trans, &
         local_last2_start_after_trans, total_local_size)
    allocate( psi_local(0:total_local_size-1) )
    allocate( work(0:total_local_size-1) )

nD Parallel Fortran Example (cont.)
FFTw codes, transposed order, 1D local array (cont.)

    call fftwnd_f77_mpi(p_fwd,1,psi_local,work,USE_WORK,order)
    call fftwnd_f77_mpi_destroy_plan(p_fwd)

    n_dim(1)=nx; n_dim(2)=nz; n_dim(3)=ny
    call fftwnd_f77_mpi_create_plan(p_rvs,MPI_COMM_WORLD, &
                                    nd,n_dim,FFTW_BACKWARD,FFTW_ESTIMATE)
    call fftwnd_f77_mpi(p_rvs,1,psi_local,work,USE_WORK,order)
    call fftwnd_f77_mpi_destroy_plan(p_rvs)

nD Parallel Fortran Example (cont.)
Notes

Normal order
  Easy to code, ‘low’ performance
Transposed order
  ‘High’ performance, complicated to code, the user must reorder the data
Use-work
  High efficiency, needs more memory

Run the Examples at AHPCC

Copy files to your directory
    cp ~gbma/workshop/fftw/codes/*.* .
Compile
    make filename.tur
    make filename.bb
    make filename.sgi
  with link specification -lfftw -lfftw_mpi (the latter only for MPI)
Run
  BB:
    qsub -I -l nodes=2
    mpirun -np 2 -machinefile $PBS_NODEFILE filename.bb
  Turing:
    filename.tur
  SGI:
    mpirun -np 2 filename.sgi

References

Numerical Recipes (FORTRAN), by William T. Vetterling et al., New York: Cambridge University Press, 1992
Numerical Integration, by P. J. Davis & P. Rabinowitz, Waltham, Mass.: Blaisdell Pub. Co., 1967
www.fftw.org
FFTW User’s Manual, by M. Frigo & S. G. Johnson

Acknowledgement

Brian Baltz
  installation of FFTw at AHPCC
  running MPI at AHPCC
John Greenfield
  setting up the grid access
Andrew Pineda
  computer work environment at AHPCC
Brian Smith & Susan Atlas
  many stimulating discussions
Many others ...