NATIONAL ENERGY RESEARCH SCIENTIFIC COMPUTING CENTER
Porting from the Cray T3E to the IBM SP
Jonathan Carter
NERSC User Services
Overview
• Focus is on Fortran programs using MPI for communication
• Outline common pitfalls:
– f90 vs. xlf Fortran compiler
– Cray vs. IBM MPI library
– Math libraries
– System libraries
– I/O
f90 vs. xlf - Main Differences
• f90
– compiles for parallel (MPI) automatically
– accepts file suffix .f90, .F90
– default optimization is -O2
– allows access to full memory on a PE by default
• xlf
– compiler is accessed by several names, each name “packages” options together
– by default, only file suffix .f and .F allowed
– default is no optimization
– restricted amount of memory available by default
xlf Compiler Options
• Compiler name can have three parts:
– optional prefix “mp” indicates MPI library is automatically linked
– compiler name, xlf, xlf90, or xlf95, indicates language mode
– optional postfix “_r” indicates threads, or OpenMP capability
• Example:
– mpxlf90 - Fortran 90 language compiler with MPI library available
– mpxlf_r - Fortran 77 language compiler with MPI library, threads, and OpenMP capability available.
• If you want to use MPI I/O, the thread capable compiler must be used.
xlf Compiler Options
• To use different file suffixes, e.g. .f90 and .F90:
– -qsuffix=f=f90,F=F90
• For optimization we recommend:
– -O3 -qtune=pwr3 -qarch=pwr3 -qstrict
• xlf defaults to 32 Kbytes for stack space and 128 Mbytes for heap space. To increase to the maximums of 256 Mbytes for stack and 2 Gbytes for heap:
– -bmaxstack:0x10000000 -bmaxdata:0x80000000
Default Datatypes
Type                T3E length (bytes)   SP length (bytes)
Character           1                    1
Complex             2 x 8                2 x 4
Double Complex      2 x 8                2 x 8
Double precision    8                    8
Integer / Logical   8                    4
Real                8                    4
• Double Complex is a language extension
• T3E lengths assume the -dp flag for f90
• xlf compiler has -qrealsize=8 to promote all default reals and real constants to 8 bytes. Also, -qintsize=8 to promote all integers and logicals.
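One way to avoid depending on either compiler's default sizes, or on promotion flags, is to request kinds explicitly. The following is a minimal sketch in standard Fortran 90, not something taken from the original slides; the names wp and i4 and the program name are illustrative assumptions.

```fortran
program portable_kinds
  implicit none
  ! Kind parameters chosen by required precision/range rather than by
  ! compiler default. The names wp and i4 are illustrative only.
  integer, parameter :: wp = selected_real_kind(12)  ! at least 12 decimal digits
  integer, parameter :: i4 = selected_int_kind(9)    ! at least 9 decimal digits
  real(kind=wp)    :: x
  integer(kind=i4) :: n

  x = 1.0_wp / 3.0_wp     ! the _wp suffix keeps constants at the same kind
  n = 100000_i4

  ! precision() and range() report what the compiler actually provided.
  print *, 'real digits:   ', precision(x)
  print *, 'integer range: ', range(n)
  print *, x, n
end program portable_kinds
```

Because the declarations no longer rely on the default REAL or INTEGER size, the same source should behave the same way whether or not -qrealsize=8 or -qintsize=8 is given.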
Available Datatypes
• Fortran 77 “*” syntax is also available to explicitly define a datatype
Type                Kind   T3E length (bytes)   SP length (bytes)
Complex             4      2 x 4                2 x 4
                    8      2 x 8                2 x 8
                    16     NA                   2 x 16
Integer / Logical   1      4                    1
                    2      4                    2
                    4      4                    4
                    8      8                    8
Real                4      4                    4
                    8      8                    8
                    16     NA                   16
MPI Differences
• Different default datatypes between T3E and SP
• More error checking of arguments on the SP
• Default amount of buffering is different
• Different subset of MPI I/O implemented
Available MPI Datatypes
Type                   T3E length (bytes)   SP length (bytes)
MPI_Character          1                    1
MPI_Complex            2 x 8                2 x 4
MPI_Double_Complex     2 x 8                2 x 8
MPI_Double_Precision   8                    8
MPI_Integer            8                    4
MPI_Logical            8                    4
MPI_Real               8                    4
Default MPI Datatypes
Type            T3E length (bytes)   SP length (bytes)
MPI_Complex8    NA                   2 x 4
MPI_Complex16   NA                   2 x 8
MPI_Complex32   NA                   2 x 16
MPI_Integer1    4                    1
MPI_Integer2    4                    2
MPI_Integer4    4                    4
MPI_Integer8    8                    8
MPI_Logical1    NA                   1
MPI_Logical2    NA                   2
MPI_Logical4    NA                   4
MPI_Logical8    NA                   8
MPI_Real4       4                    4
MPI_Real8       8                    8
MPI_Real16      NA                   16
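Since MPI_Real and MPI_Integer describe different lengths on the two machines, one defensive habit is to pair an explicitly sized Fortran declaration with the explicitly sized MPI datatype. The sketch below is an illustration under that assumption, not code from the original talk; the program name and buffer length are made up, and it uses MPI_REAL8, which the table lists as 8 bytes on both machines.

```fortran
program bcast_real8
  implicit none
  include 'mpif.h'
  integer, parameter :: n = 4       ! illustrative buffer length
  real*8  :: a(n)
  integer :: mype, ierr

  call mpi_init(ierr)
  call mpi_comm_rank(MPI_COMM_WORLD, mype, ierr)

  ! Rank 0 holds the data to be broadcast; the others start at zero.
  if (mype == 0) then
     a = 1.0d0
  else
     a = 0.0d0
  end if

  ! MPI_REAL8 matches the real*8 declaration independently of the
  ! default size of REAL on the machine being used.
  call mpi_bcast(a, n, MPI_REAL8, 0, MPI_COMM_WORLD, ierr)

  print *, 'rank', mype, 'a(1) =', a(1)
  call mpi_finalize(ierr)
end program bcast_real8
```

On the SP this would be built with one of the mp-prefixed compiler names so that the MPI library is linked.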
MPI - Argument Checking
• T3E MPI library has several collective routines which do not check arguments in accordance with the MPI standard. The SP does check arguments.
• Examples:
– MPI_Bcast “count” argument is not checked for consistency on T3E
– MPI_Gatherv array of “counts” is not checked for consistency on T3E
MPI - Buffering
• If your program depends on the buffering of standard MPI Sends and Receives, you may see different behavior between the T3E and the SP.
• Classic case:
...
if (mype.eq.0) then
   call mpi_send(buf,count,type,1,tag,MPI_COMM_WORLD,ierr)
   call mpi_recv(buf,count,type,1,tag,MPI_COMM_WORLD,status,ierr)
else if (mype.eq.1) then
   call mpi_send(buf,count,type,0,tag,MPI_COMM_WORLD,ierr)
   call mpi_recv(buf,count,type,0,tag,MPI_COMM_WORLD,status,ierr)
end if
...
MPI - Buffering
• On the T3E, messages of up to 4 Kbytes are buffered. This can be changed by setting the environment variable MPI_BUFFER_MAX.
• On the SP, the default size depends on the number of processors:
Processors      Default size (bytes)
1 to 16         4096
17 to 32        2048
33 to 64        1024
65 to 128       512
129 to 256      256
257 and over    128
• This can be changed by setting the environment variable MP_EAGER_LIMIT.
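A program that must not depend on buffering can let the library pair the send with the receive. The following is a minimal sketch of the two-process exchange rewritten with MPI_SENDRECV; it is one common remedy, not necessarily the one recommended in the original talk, and the program name, message length, and separate send and receive buffers are illustrative assumptions.

```fortran
program exchange
  implicit none
  include 'mpif.h'
  integer, parameter :: n = 10000   ! illustrative; larger than the default limits above
  real*8  :: sendbuf(n), recvbuf(n)
  integer :: mype, npes, other, tag, ierr
  integer :: status(MPI_STATUS_SIZE)

  call mpi_init(ierr)
  call mpi_comm_rank(MPI_COMM_WORLD, mype, ierr)
  call mpi_comm_size(MPI_COMM_WORLD, npes, ierr)
  tag = 1

  ! Only ranks 0 and 1 take part, as in the classic case.
  if (npes >= 2 .and. mype < 2) then
     other   = 1 - mype            ! rank 0 pairs with rank 1 and vice versa
     sendbuf = dble(mype)
     ! The combined call completes whether or not the message is buffered,
     ! so the result does not depend on MPI_BUFFER_MAX or MP_EAGER_LIMIT.
     call mpi_sendrecv(sendbuf, n, MPI_REAL8, other, tag, &
                       recvbuf, n, MPI_REAL8, other, tag, &
                       MPI_COMM_WORLD, status, ierr)
     print *, 'rank', mype, 'received', recvbuf(1)
  end if

  call mpi_finalize(ierr)
end program exchange
```

Reversing the order of the send and receive on one of the two ranks, or using nonblocking calls, would be alternative ways to remove the same dependence.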
Cray SciLib and IBM ESSL
• Both vendors provide libraries of commonly used Linear Algebra subroutines
• On the T3E this is linked by default; on the SP, use “-lessl”
• These libraries are faster than the public domain BLAS, LAPACK, etc.
Using BLAS
• BLAS levels 1 through 3 are completely compatible between the two machines
• Note which precision of BLAS is being called:
– On the T3E
real*8 a(n), b(n), x
…
x = sdot(n,a,1,b,1)
– On the SP
real*8 a(n), b(n), x
…
x = ddot(n,a,1,b,1)
Using BLAS
• Instead of changing program source, loader options can be used to map one routine to another
• To resolve a call to sdot by a call to ddot on the SP:
xlf -o a.out -brename:sdot,ddot b.f
• To resolve a call to ddot by a call to sdot on the T3E:
f90 -o a.out -Wl"-Dequiv(DDOT)=SDOT" b.f
LAPACK routines
• Most other linear algebra routines in Cray SciLib and IBM ESSL are compatible with LAPACK.
• In ESSL there are a few incompatibilities (x may be C, D, S, Z):
xGEEV
xSPEV
xSPSV
xHPEV
xHPSV
xGEGV
xSYGV
• Use installed LAPACK library for these.
ScaLAPACK library
• Cray SciLib and IBM PESSL support pieces of the standard ScaLAPACK library.
• Check precision of routines:
– For real*8 on the T3E, routines start “PS”
– For real*8 on the SP, routines start “PD”
• On the SP, you must call BLACS_GET followed by either BLACS_GRIDINIT or BLACS_GRIDMAP. On the T3E, only a call to one of the latter two routines is required.
• Public domain ScaLAPACK is also installed on both machines.
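The grid initialization order required on the SP can be sketched as below. This is an illustration using the standard BLACS Fortran interface, not code from the original talk; the program name and the 1 x nprocs grid shape are assumptions made only to keep the example short.

```fortran
program grid_setup
  implicit none
  integer :: iam, nprocs, ictxt
  integer :: nprow, npcol, myrow, mycol

  call blacs_pinfo(iam, nprocs)    ! this process's id and the process count

  ! Illustrative 1 x nprocs grid; a real code would choose its own shape.
  nprow = 1
  npcol = nprocs

  ! Obtain the default system context first (second argument 0).
  ! This is the BLACS_GET call that the SP requires before grid creation.
  call blacs_get(-1, 0, ictxt)
  call blacs_gridinit(ictxt, 'R', nprow, npcol)   ! 'R' = row-major process ordering
  call blacs_gridinfo(ictxt, nprow, npcol, myrow, mycol)

  print *, 'process', iam, 'is at grid position', myrow, mycol

  call blacs_gridexit(ictxt)
  call blacs_exit(0)
end program grid_setup
```

The same sequence is the one used in examples for the public domain BLACS, so writing it this way should keep a single source usable on both machines.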
System Libraries
• Generally, any routines which interact with the operating system, and provide extensions to the Fortran language.
• Cray provides very many such routines. Some are available on the SP, for example:
T3E      SP         Function
Abort    Abort      Ends program
Exit     Exit_      Ends program
Flush    Flush_     Flushes Fortran I/O buffer
System   Ishell     Executes a command
Trbk     Xl__trbk   Prints a traceback
System Libraries
• A more comprehensive list is available at: http://hpcf.nersc.gov/computers/SP/port.html
• Some routines have changed names and slightly different arguments.
• There are sometimes identically or similarly named routines on the SP which are designed to be called from C only. Calling them from Fortran will cause unexpected behavior.
• For example, calling exit instead of exit_ will cause the program to end without flushing any Fortran I/O buffer.
Fortran I/O
• Unformatted I/O
– The primitive datatypes on the T3E and SP are compatible (provided they are of the same length), but control words inserted by the Fortran I/O layer prevent sequential access files from being transferred between the machines.
– Direct access files can be freely transferred between the two machines, as can MPI I/O files.
• Namelist Input/Output
– Users familiar with assign -f77 on the T3E, which causes old-style namelist data to be written or read, can set the following environment variable on the SP to obtain the same effect:
setenv XLFRTEOPTS "namelist=old"
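For unformatted data that has to move between the machines, writing it as a direct access file with explicitly sized variables is one option. The sketch below is an illustration, not code from the original talk; the file name data.da, the unit number, and the array sizes are hypothetical. It uses INQUIRE with IOLENGTH so that the record length is expressed in whatever units the compiler uses for RECL.

```fortran
program direct_write
  implicit none
  integer, parameter :: n = 8, nrec = 3   ! illustrative sizes
  real*8  :: a(n), b(n)
  integer :: lrec, i, k

  ! Ask the compiler for the length of one record in its own RECL units,
  ! since those units are compiler dependent.
  inquire(iolength=lrec) a

  open(unit=10, file='data.da', form='unformatted', access='direct', &
       recl=lrec, status='replace')
  do i = 1, nrec
     do k = 1, n
        a(k) = dble(i*100 + k)
     end do
     write(10, rec=i) a        ! one fixed-length record per write
  end do
  close(10)

  ! Read record 2 back as a simple check.
  open(unit=10, file='data.da', form='unformatted', access='direct', &
       recl=lrec, status='old')
  read(10, rec=2) b
  close(10)
  print *, b(1), b(n)
end program direct_write
```

The real*8 declaration matters here: a default REAL would be 8 bytes on the T3E but 4 bytes on the SP, and the records would then no longer have the same length.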
Further Information
• T3E and SP webpages and software webpages contain further information and links to vendor documentation:
http://hpcf.nersc.gov/computers
http://hpcf.nersc.gov/software