
MPI IO

Timothy H. Kaiser, Ph.D.

[email protected]


Purpose

• Introduce Parallel IO

• Introduce MPI-IO

• Not give an exhaustive survey

• Explain why you would want to use it

• Explain why you wouldn’t want to use it

• Give a nontrivial and useful example


What & Why of parallel IO


What & Why of parallel IO

• Same motivation as going parallel in the first place

• You have lots of data

• You want to do things fast

• Parallel IO will (hopefully) enable you to move large amounts of data to/from disk quickly


What & Why of parallel IO

• Parallel implies that some number (or all) of your processors simultaneously participate in an IO operation

• Good parallel IO shows speedup as you add processors

• I typically write about 300 Mbytes/second; others go faster


A Motivating Example

• Earthquake Model E3d

• Finite difference simulation with the grid distributed across N processors

• On BlueGene we run at sizes of 7509 x 7478 x 250 = 14,021,250,000 cells, or 56 GBytes per volume, and output 3 velocity volumes per dump

• For a restart file we write 14 volumes


Simple (non-MPI) Parallel IO

• Each processor dumps its portion of the grid to a separate “unique” file

char* unique(char *name, int myid)
{
    static char unique_str[40];
    int i;
    for (i = 0; i < 40; i++)
        unique_str[i] = (char)0;
    sprintf(unique_str, "%s%5.5d", name, myid);
    return unique_str;
}
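As a usage sketch (the dump_local_block function and its arguments below are hypothetical, not part of the original program), each rank simply pairs the unique name with ordinary C IO:

#include <stdio.h>

/* assumes the unique() helper above; myid is this rank's MPI rank */
void dump_local_block(const float *data, size_t nvals, int myid)
{
    FILE *fp = fopen(unique("grid_", myid), "wb");   /* e.g. grid_00007 */
    if (fp != NULL) {
        fwrite(data, sizeof(float), nvals, fp);      /* my piece only   */
        fclose(fp);
    }
}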


Simple (non-MPI) Parallel IO

module stuff
contains
  function unique(name, myid)
    character (len=*) name
    character (len=20) unique
    character (len=80) temp
    write(temp, "(a,i5.5)") trim(name), myid
    unique = temp
    return
  end function unique
end module


Why not just do this?


Why not just do this?

• Might write thousands of files

• Could be very slow

• Output is dependent on the number of processors

• We might want the data in a single file


MPI-IO to the rescue

• MPI has over 55 calls related to file input and output

• Available in most modern MPI libraries

• Can produce exceptional results

• Supports striping

• A collection of distributed files can look like one

• We will look at output to a single file


Why not?

• Some functionality might not be available

  • 3d data types


Why not?

• Some functionality might not be available

  • 3d data types

• More likely to have/introduce bugs

  • Memory leak

  • File system overload

  • Just hangs


Why not?

• More complex than “normal” output


Why not?

• More complex than “normal” output

• Need support from the file system for good performance

• Have seen 200 bytes/second NOT Megabytes

• Have run out of file locks


Our Real World Example

• We have a 3d volume of some data “v” distributed across N processors

• The size and distribution are input and not the same on each processor

• We are outputting some function of “v”, V = f(v)

• Each processor writes its values to a common file

• We do not “Consider a spherical cow”


Typical Small Case

vista --raw 472 250 242 --minmax -1 15 --skip 36 \ -x 640 -y 480 --outformat png --swapbytes -r 1 0.53 0.51 \ --fov 30 -g 0.9 0.9 0.9 1.0 \ -a 0.002 --opacity 0.01 volume1.0010.3DSMPI

• Grid size 472 x 250 x 242 on 16 processors

• Color shows processor number


Special Considerations

• We are calculating our output on the fly

• Create a buffer

• Fill the buffer and write

• Different processors will have different numbers of writes


Special Considerations

• We want to use a collective write operation (see the sketch below)

• Each process must call the collective write the same number of times

• Each process must determine how many writes it needs to do

• The total number of writes is the max across processes

• Some processors might call write with no data
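A minimal sketch of that bookkeeping, using the same MPI_Allreduce/MPI_MAX and zero-count write idea as the program later in the talk (my_cell_count and buf_vals here are illustrative names):

/* each rank counts its own buffer dumps, then all agree on the maximum */
int do_call = (my_cell_count + buf_vals - 1) / buf_vals;   /* my writes  */
int do_call_max;                                           /* global max */
MPI_Allreduce(&do_call, &do_call_max, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);

/* every real collective write decrements do_call_max; a rank that has   */
/* finished its data keeps the collective balanced with zero-count calls */
while (do_call_max > 0) {
    MPI_File_write_at_all(fh, (MPI_Offset)0, (void *)0, 0, MPI_INT, &status);
    do_call_max = do_call_max - 1;
}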


Procedure #1

• Allocate a temporary output buffer

• Open the file

• Set the view of the file to the beginning

• Process 0 writes the file header (36 bytes); see the sketch below
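A condensed sketch of these steps; the full program later in the talk does exactly this, using a 3-integer header (hl = 3):

MPI_File   fh;
MPI_Status status;
int header[3], hl = 3;

/* the temporary output buffer (ptr) is malloc'ed before this, as shown later */
MPI_File_open(MPI_COMM_WORLD, fname,
              MPI_MODE_RDWR | MPI_MODE_CREATE, MPI_INFO_NULL, &fh);

/* view starts at byte 0, plain integers */
MPI_File_set_view(fh, 0, MPI_INT, MPI_INT, "native", MPI_INFO_NULL);

/* only rank 0 writes the header, so the non-collective write is used */
if (myid == 0) {
    header[0] = nx; header[1] = ny; header[2] = nz;
    MPI_File_write_at(fh, 0, header, hl, MPI_INT, &status);
}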


Procedure #2

• Create a description of how the data is distributed

  • A defined data type

  • Hardest part of the whole process

• Set the view of the file to this description

• Determine how many writes are needed and AllReduce into “do_call_max” (see the sketch below)
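Again a condensed sketch, using the mysubgrid0 helper defined at the end of the talk (disp, filetype, and do_call are as in the full code; do_call is assumed to already hold this rank's count of buffer dumps):

MPI_Datatype filetype;
MPI_Offset   disp;
int          do_call, do_call_max;

/* describe how my sx x sy x sz block sits inside the nx x ny x nz grid */
mysubgrid0(nx, ny, nz, sx, sy, sz, x0, y0, z0, MPI_INT, &disp, &filetype);

/* skip the header, then view the file through the distributed layout */
disp = disp + 4 * hl;
MPI_File_set_view(fh, disp, MPI_INT, filetype, "native", MPI_INFO_NULL);

/* agree on the maximum number of buffer dumps over all ranks */
MPI_Allreduce(&do_call, &do_call_max, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);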


Procedure #3

• Loop over the grid

  • Fill buffer

  • If buffer is full

    • Write it

    • Adjust offset

    • do_call_max = do_call_max - 1

• Call write with no data until do_call_max = 0 (see the sketch below)
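A condensed sketch of the loop. value_at() and is_last_cell() stand in for the program's getS3D() call and its end-of-data test; everything else follows the full code shown later:

int i2 = 0;
MPI_Offset offset = 0;

for (l = y0; l < y0 + sy; l++)
  for (m = z0; m < z0 + sz; m++)
    for (n = x0; n < x0 + sx; n++) {
        ptr[i2++] = value_at(l, m, n);               /* fill the buffer */
        if (i2 == NUM_VALS || is_last_cell(l, m, n)) {
            MPI_File_write_at_all(fh, offset, ptr, i2, MPI_INT, &status);
            offset += i2;                            /* offset in MPI_INTs */
            i2 = 0;
            do_call_max = do_call_max - 1;
        }
    }

/* ranks that finish early keep the collective balanced with empty writes */
while (do_call_max > 0) {
    MPI_File_write_at_all(fh, (MPI_Offset)0, (void *)0, 0, MPI_INT, &status);
    do_call_max = do_call_max - 1;
}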


The MPI-IO Routines

• MPI_File_open(MPI_COMM_WORLD,fname,(MPI_MODE_RDWR|MPI_MODE_CREATE),MPI_INFO_NULL,&fh);

• MPI_File_set_view(fh,disp,MPI_INT,filetype,"native",MPI_INFO_NULL);

• MPI_File_write_at(fh, 0, header, hl, MPI_INT,&status);

• MPI_File_write_at_all(fh, offset, ptr, i2, MPI_INT,&status);

• MPI_File_close(&fh);


MPI_File_open

Synopsis: Opens a file

int MPI_File_open(MPI_Comm comm, char *filename, int amode,
                  MPI_Info info, MPI_File *fh)

Input Parameters
  comm      communicator (handle)
  filename  name of file to open (string)
  amode     file access mode (integer)
  info      info object (handle)

Output Parameters
  fh        file handle (handle)


MPI_File_set_view

Synopsis: Sets the file view

int MPI_File_set_view(MPI_File fh, MPI_Offset disp, MPI_Datatype etype,
                      MPI_Datatype filetype, char *datarep, MPI_Info info)

Input Parameters
  fh        file handle (handle)
  disp      displacement (nonnegative integer)
  etype     elementary datatype (handle)
  filetype  filetype (handle)
  datarep   data representation (string)
  info      info object (handle)


MPI_File_write_at

Synopsis: Write using explicit offset, not collective

int MPI_File_write_at(MPI_File fh, MPI_Offset offset, void *buf, int count,
                      MPI_Datatype datatype, MPI_Status *status)

Input Parameters
  fh        file handle (handle)
  offset    file offset (nonnegative integer)
  buf       initial address of buffer (choice)
  count     number of elements in buffer (nonnegative integer)
  datatype  datatype of each buffer element (handle)

Output Parameters
  status    status object (Status)


MPI_File_write_at_all

Synopsis: Collective write using explicit offset

int MPI_File_write_at_all(MPI_File fh, MPI_Offset offset, void *buf, int count,
                          MPI_Datatype datatype, MPI_Status *status)

Input Parameters
  fh        file handle (handle)
  offset    file offset (nonnegative integer)
  buf       initial address of buffer (choice)
  count     number of elements in buffer (nonnegative integer)
  datatype  datatype of each buffer element (handle)

Output Parameters
  status    status object (Status)


MPI_File_close

Synopsis: Closes a file

int MPI_File_close(MPI_File *fh)

Input Parameters
  fh  file handle (handle)


The Data type Routines

• MPI_Type_create_subarray(3,gsizes,lsizes,istarts,MPI_ORDER_C,old_type,new_type);

• MPI_Type_contiguous(sx,old_type,&VECT);

• MPI_Type_struct(sz,blocklens,indices,old_types,&TWOD);

• MPI_Type_commit(&TWOD);

Our preferred routine creates a 3d description

On some platforms we need to “fake” it


MPI_Type_create_subarray

Synopsis: Creates a datatype describing a subarray of an N dimensional array

int MPI_Type_create_subarray(int ndims, int *array_of_sizes,
                             int *array_of_subsizes, int *array_of_starts,
                             int order, MPI_Datatype oldtype,
                             MPI_Datatype *newtype)

Input Parameters
  ndims              number of array dimensions (positive integer)
  array_of_sizes     number of elements of type oldtype in each dimension of the full array (array of positive integers)
  array_of_subsizes  number of elements of type oldtype in each dimension of the subarray (array of positive integers)
  array_of_starts    starting coordinates of the subarray in each dimension (array of nonnegative integers)
  order              array storage order flag (state)
  oldtype            old datatype (handle)

Output Parameters
  newtype            new datatype (handle)
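For the layout used in this talk (C order, with y slowest and x fastest), the call looks like the following sketch; the preferred mysubgrid0 routine at the end of the talk does exactly this:

int gsizes[3]  = { ny, nz, nx };   /* whole grid, slowest to fastest dimension */
int lsizes[3]  = { sy, sz, sx };   /* the block this rank holds                */
int istarts[3] = { y0, z0, x0 };   /* where that block starts                  */
MPI_Datatype filetype;

MPI_Type_create_subarray(3, gsizes, lsizes, istarts,
                         MPI_ORDER_C, MPI_INT, &filetype);
MPI_Type_commit(&filetype);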


MPI_Type_contiguous

Synopsis: Creates a contiguous datatype

int MPI_Type_contiguous(int count, MPI_Datatype old_type, MPI_Datatype *newtype)

Input Parameters
  count    replication count (nonnegative integer)
  oldtype  old datatype (handle)

Output Parameters
  newtype  new datatype (handle)


MPI_Type_struct

Synopsis: Creates a struct datatype

int MPI_Type_struct(int count, int blocklens[], MPI_Aint indices[],
                    MPI_Datatype old_types[], MPI_Datatype *newtype)

Input Parameters
  count      number of blocks (integer) -- also the number of entries in the arrays blocklens, indices, and old_types
  blocklens  number of elements in each block (array)
  indices    byte displacement of each block (array)
  old_types  type of elements in each block (array of handles to datatype objects)

Output Parameters
  newtype    new datatype (handle)


Our Program...

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Get_processor_name(name, &resultlen);
    printf("process %d running on %s\n", myid, name);

/* we read and broadcast the global grid size (nx,ny,nz) */
    if(myid == 0) {
        if(argc != 4){
            printf("the grid size is not on the command line assuming 100 x 50 x 75\n");
            gblsize[0]=100;
            gblsize[1]=50;
            gblsize[2]=75;
        }
        else {
            gblsize[0]=atoi(argv[1]);
            gblsize[1]=atoi(argv[2]);
            gblsize[2]=atoi(argv[3]);
        }
    }
    MPI_Bcast(gblsize, 3, MPI_INT, 0, MPI_COMM_WORLD);
/********** a ***********/


/* the routine three takes the number of processors and returns a 3d
   decomposition or topology. this is simply a factoring of the number
   of processors into 3 integers stored in comp */
    three(numprocs, comp);

/* the routine mpDecomposition takes the processor topology and the global
   grid dimensions and maps the grid to the topology. mpDecomposition
   returns the number of cells a processor holds and the starting
   coordinates for its portion of the grid */
    if(myid == 0) {
        printf("input mpDecomposition %5d%5d%5d%5d%5d%5d\n",
               gblsize[1], gblsize[2], gblsize[0],
               comp[1], comp[2], comp[0]);
    }
    mpDecomposition(gblsize[1], gblsize[2], gblsize[0],
                    comp[1], comp[2], comp[0], myid, dist);
    printf(" out mpDecomposition %5d%5d%5d%5d%5d%5d%5d\n",
           myid, dist[0], dist[1], dist[2], dist[3], dist[4], dist[5]);
/********** b ***********/


Example Distribution

Global size 50 x 200 x 100 on 8 processors

Processor   Size X   Size Y   Size Z   Start X   Start Y   Start Z
    0         50      200       13        0         0         0
    1         50      200       13       50         0         0
    2         50      200       13        0         0        13
    3         50      200       13       50         0        13
    4         50      200       12        0         0        26
    5         50      200       12       50         0        26
    6         50      200       12        0         0        38
    7         50      200       12       50         0        38


Back to our program...

/* global grid size */
    nx=gblsize[0];
    ny=gblsize[1];
    nz=gblsize[2];

/* amount that i hold */
    sx=dist[0];
    sy=dist[1];
    sz=dist[2];

/* my grid starts here */
    x0=dist[3];
    y0=dist[4];
    z0=dist[5];
/********** c ***********/


/* allocate memory for our volume */
    vol=getArrayF3D((long)sy, (long)0, (long)0,
                    (long)sz, (long)0, (long)0,
                    (long)sx, (long)0, (long)0);

/* fill the volume with numbers 1 to global grid size */
/* the program from which this example was derived, e3d, holds its data
   as a collection of vertical planes. plane number increases with y.
   that is why we loop on y with the outer most loop. */
    k=1+(x0+nx*z0+(nx*nz)*y0);
    for (ltmp=0;ltmp<sy;ltmp++) {
        for (mtmp=0;mtmp<sz;mtmp++) {
            for (ntmp=0;ntmp<sx;ntmp++) {
                val=k+ntmp + mtmp*nx + ltmp*nx*nz;
                if(val > (long)INT_MAX)val=(long)INT_MAX;
                vol[ltmp][mtmp][ntmp]=(int)val;
            }
        }
    }
/********** d ***********/


/* create a file name based on the grid size */
    for(j=1;j<80;j++) {
        fname[j]=(char)0;
    }
    sprintf(fname,"%s_%3.3d_%4.4d_%4.4d_%4.4d","mpiio_dat",
            numprocs,gblsize[0],gblsize[1],gblsize[2]);

/* we open the file fname for output, info is NULL */
    ierr=MPI_File_open(MPI_COMM_WORLD, fname,
                       (MPI_MODE_RDWR | MPI_MODE_CREATE),
                       MPI_INFO_NULL, &fh);

/* we write a 3 integer header */
    hl=3;

/* set the view to the beginning of the file */
    ierr=MPI_File_set_view(fh, 0, MPI_INT, MPI_INT, "native", MPI_INFO_NULL);

/* process 0 writes the header */
    if(myid == 0) {
        header[0]=nx; header[1]=ny; header[2]=nz;
/* MPI_File_write_at is not a collective so only 0 calls it */
        ierr=MPI_File_write_at(fh, 0, header, hl, MPI_INT, &status);
    }
/********** 01 ***********/


/* we create a description of the layout of the data */
/* more on this later */
    printf("mysubgrid0 %5d%5d%5d%5d%5d%5d%5d%5d%5d%5d\n",
           myid, nx, ny, nz, sx, sy, sz, x0, y0, z0);
    mysubgrid0(nx, ny, nz, sx, sy, sz, x0, y0, z0, MPI_INT, &disp, &filetype);

/* length of the header */
    disp=disp+(4*hl);

/* every processor "moves" past the header */
    ierr=MPI_File_set_view(fh, disp, MPI_INT, filetype, "native", MPI_INFO_NULL);
/********** 02 ***********/


/* we are going to create the data on the fly */
/* so we allocate a buffer for it */
    t3=MPI_Wtime();
    isize=sx*sy*sz;
    buf_size=NUM_VALS*sizeof(FLT);
    if( isize < NUM_VALS) {
        buf_size=isize*sizeof(FLT);
    } else {
        buf_size=NUM_VALS*sizeof(FLT);
    }
    ptr=(FLT*)malloc(buf_size);
    offset=0;

/* find the max and min isize over all processors' buffers */
    ierr=MPI_Allreduce(&isize, &max_size, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);
    ierr=MPI_Allreduce(&isize, &min_size, 1, MPI_INT, MPI_MIN, MPI_COMM_WORLD);
/********** 03 ***********/


/* find out how many times each processor will dump its buffer */
    i=0; i2=0; do_call=0; sample=1;
    grid_l=y0+sy;
    grid_m=z0+sz;
    grid_n=x0+sx;
/* could just do division but that would be too easy */
    for(l = y0; l < grid_l; l = l + sample) {
        for(m = z0; m < grid_m; m = m + sample) {
            for(n = x0; n < grid_n; n = n + sample) {
                i++; i2++;
                if(i == isize || i2 == NUM_VALS){
                    do_call++;
                    i2=0;
                }
            }
        }
    }
/* get the maximum number of times a processor will dump its buffer */
    ierr=MPI_Allreduce(&do_call, &do_call_max, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);
/********** 04 ***********/


/* finally we start to write the data */
    i=0; i2=0;
/* we loop over our grid filling the output buffer */
    for(l = y0; l < grid_l; l = l + sample) {
        for(m = z0; m < grid_m; m = m + sample) {
            for(n = x0; n < grid_n; n = n + sample) {
                ptr[i2] = getS3D(vol, l, m, n, y0, z0, x0);
                i++; i2++;
/********** 05 ***********/


/* when we have all our data or the buffer is full we write */
                if(i == isize || i2 == NUM_VALS){
                    t5=MPI_Wtime();
                    t7++;
                    if((isize == max_size) && (max_size == min_size)) {
/* as long as every processor has data to write we use the collective version */
/* the collective version of the write is MPI_File_write_at_all */
                        ierr=MPI_File_write_at_all(fh, offset, ptr, i2, MPI_INT, &status);
                        do_call_max=do_call_max-1;
                    } else {
/* if only I have data to write then we use MPI_File_write_at */
                        /* ierr=MPI_File_write_at(fh, offset, ptr, i2, MPI_INT, &status); */
/* Wait! Why was that line commented out? Why are we using MPI_File_write_at_all? */
/* Answer: Some versions of MPI work better using MPI_File_write_at_all */
/* What happens if some processors are done writing and don't call this? */
/* Answer: See below. */
                        ierr=MPI_File_write_at_all(fh, offset, ptr, i2, MPI_INT, &status);
                        do_call_max=do_call_max-1;
                    }
                    offset=offset+i2;
                    i2=0;
                    t6=MPI_Wtime();
                    dt[5]=dt[5]+(t6-t5);
                }
            }
        }
    }
/********** 06 ***********/


/* Here is where we fix the problem of unmatched calls to MPI_File_write_at_all */
/* If a processor is done with its writes and others still have data to write, */
/* the finished processor just calls MPI_File_write_at_all with 0 values to    */
/* write. All processors call MPI_File_write_at_all the same number of times   */
/* so everyone is happy.                                                        */
    while(do_call_max > 0) {
        ierr=MPI_File_write_at_all(fh, (MPI_Offset)0, (void *)0, 0, MPI_INT, &status);
        do_call_max=do_call_max-1;
    }

/* We finally close the file */
    ierr=MPI_File_close(&fh);
/********* ierr=MPI_Info_free(&fileinfo); *********/
    MPI_Finalize();
    exit(0);
/********** 07 ***********/


Our output:

vista --rawtype int --minmax 0 1000000 --skip 12 -x 640 -y 480 \
  --outformat png --fov 30 bonk --raw 100 50 200 -r .5 .25 1.0 \
  -g 0.9 0.9 0.9 1.0 -a 0.002 --opacity 0.01
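A quick sanity check before rendering (this little reader is not part of the original program) is to read the 3-integer header back and make sure it matches the global grid size:

MPI_File   fh;
MPI_Status status;
int header[3];

MPI_File_open(MPI_COMM_WORLD, fname, MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
MPI_File_read_at(fh, 0, header, 3, MPI_INT, &status);
if (myid == 0)
    printf("header: nx=%d ny=%d nz=%d\n", header[0], header[1], header[2]);
MPI_File_close(&fh);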


void mpDecomposition(int l, int m, int n, int nx, int ny, int nz,
                     int node, int *dist)
{
    int nnode, mnode, rnode;
    int grid_n, grid_n0, grid_m, grid_m0, grid_l, grid_l0;
/* x decomposition */
    rnode = node%nx;
    mnode = (n%nx);
    nnode = (n/nx);
    grid_n = (rnode < mnode) ? (nnode + 1) : (nnode);
    grid_n0 = rnode*nnode;
    grid_n0 += (rnode < mnode) ? (rnode) : (mnode);
/* z decomposition */
    rnode = (node%(nx*nz))/nx;
    mnode = (m%nz);
    nnode = (m/nz);
    grid_m = (rnode < mnode) ? (nnode + 1) : (nnode);
    grid_m0 = rnode*nnode;
    grid_m0 += (rnode < mnode) ? (rnode) : (mnode);
/* y decomposition */
    rnode = node/(nx*nz);
    mnode = (l%ny);
    nnode = (l/ny);
    grid_l = (rnode < mnode) ? (nnode + 1) : (nnode);
    grid_l0 = rnode*nnode;
    grid_l0 += (rnode < mnode) ? (rnode) : (mnode);

    dist[0]=grid_n;  dist[1]=grid_l;  dist[2]=grid_m;
    dist[3]=grid_n0; dist[4]=grid_l0; dist[5]=grid_m0;
}

/* the routine mpDecomposition takes the processor topology (nx, ny,nz) and the global grid dimensions (l,m,n) and maps the grid to the topology.

mpDecomposition returns the number of cells a processor holds, dist[0:2], and the starting coordinates for its portion of the grid dist[3:5] */


void mysubgrid0(int nx, int ny, int nz,
                int sx, int sy, int sz,
                int x0, int y0, int z0,
                MPI_Datatype old_type,
                MPI_Offset *location, MPI_Datatype *new_type)
{
    MPI_Datatype VECT;
#define BSIZE 5000
    int blocklens[BSIZE];
    MPI_Aint indices[BSIZE];
    MPI_Datatype old_types[BSIZE];
    MPI_Datatype TWOD;
    int i;
    if(myid == 0)printf("using mysubgrid version 1\n");
    if(sz > BSIZE)mpi_check(-1,"sz > BSIZE, increase BSIZE and recompile");
    ierr=MPI_Type_contiguous(sx,old_type,&VECT);
    ierr=MPI_Type_commit(&VECT);
    for (i=0;i<sz;i++) {
        blocklens[i]=1;
        old_types[i]=VECT;
        indices[i]=i*nx*4;
    }
    ierr=MPI_Type_struct(sz,blocklens,indices,old_types,&TWOD);
    ierr=MPI_Type_commit(&TWOD);
    for (i=0;i<sy;i++) {
        blocklens[i]=1;
        old_types[i]=TWOD;
        indices[i]=i*nx*nz*4;
    }
    ierr=MPI_Type_struct(sy,blocklens,indices,old_types,new_type);
    ierr=MPI_Type_commit(new_type);
    *location=4*(x0+nx*z0+(nx*nz)*y0);
}

/* we have two versions of mysubgrid0, the routine that creates the description of the data layout. */

/* This version of mysubgrid0 builds up the description from primitives. we start with x, then create VECT, which is a vector of x values. we then take a collection of VECTs and create a vertical slice, TWOD. note that the distance between each VECT in TWOD is given in indices[i]. we next take a collection of vertical slices and create our volume. again we have the distances between the slices given in indices[i] */


void mysubgrid0(int nx, int ny, int nz,
                int sx, int sy, int sz,
                int x0, int y0, int z0,
                MPI_Datatype old_type,
                MPI_Offset *location,
                MPI_Datatype *new_type)
{
    int gsizes[3], lsizes[3], istarts[3];
    gsizes[2]=nx;  gsizes[1]=nz;  gsizes[0]=ny;
    lsizes[2]=sx;  lsizes[1]=sz;  lsizes[0]=sy;
    istarts[2]=x0; istarts[1]=z0; istarts[0]=y0;
    if(myid == 0)printf("using mysubgrid version 2\n");
    ierr=MPI_Type_create_subarray(3,gsizes,lsizes,istarts,MPI_ORDER_C,old_type,new_type);
    ierr=MPI_Type_commit(new_type);
    *location=0;
}

/* This one is actually preferred. it uses a single call to the mpi routine MPI_Type_create_subarray with the grid description as input. what we get back is a data type that is a 3d strided volume. Unfortunately, MPI_Type_create_subarray does not work for 3d arrays for some versions of MPI, in particular LAM. */
