Parallel netCDF Study.
John Tannahill
Lawrence Livermore National Laboratory
([email protected])
September 30, 2003
Slide 2, 05/03/23, Computing Applications and Research Department
Acknowledgments (1).
This work was performed under the auspices of the U.S.
Department of Energy by the University of California, Lawrence
Livermore National Laboratory under contract No. W-7405-Eng-48.
Work funded by the LLNL/CAR Techbase Program.
Many thanks to this program for providing the resources to conduct
this study of parallel netCDF, something that probably would not
have occurred otherwise.
This is LLNL Report: UCRL-PRES-200247
Acknowledgments (2).
Additional thanks to all the people who contributed to this study in one way or another:
Argonne National Laboratory (ANL): William Gropp, Robert Latham, Rob Ross, & Rajeev Thakur.
Northwestern (NW) University: Alok Choudhary, Jianwei Li, & Wei-keng Liao.
Lawrence Livermore National Laboratory (LLNL): Richard Hedges, Bill Loewe, & Tyce McLarty.
Lawrence Berkeley Laboratory (LBL) / NERSC: Chris Ding & Woo-Sun Yang.
UCAR / NCAR / Unidata: Russ Rew.
University of Chicago: Brad Gallagher.
Overview of contents.
Proposal background and goals.
Parallel I/O options initially explored.
A/NW's parallel netCDF library. (A/NW = Argonne National Laboratory / Northwestern University)
  Installation.
  Fortran interface.
Serial vs. parallel netCDF performance.
  Test code details.
  Timing results.
Parallel HDF5 comparison.
Observations / Conclusions.
Remaining questions / issues.
Why parallel netCDF (1)?
Parallel codes need parallel I/O.
Performance.
Ease of programming and understandability of code.
Serial netCDF is in widespread use.
Currently a de-facto standard for much of the climate community.
Easy to learn and use.
Well supported by Unidata.
Huge number of existing netCDF data sets.
Many netCDF post-processing codes and tools.
Why parallel netCDF (2)?
Hopefully a fairly straightforward process to migrate from serial to
parallel netCDF.
From material presented at a SuperComputing 2002 tutorial
(11/02), it appeared that at least one feasible option for a Fortran
parallel netCDF capability would soon be available.
Summary of work performed under proposal (1).
Read material, performed web searches, and communicated with a number of people to determine what options were available.
  Parallel I/O for High Performance Computing by John May.
Once the decision was made to go with A/NW's parallel netCDF, collaborated with them extensively:
  First, to get the kinks out of the installation procedure for each of the platforms of interest.
  Next, to get the Fortran interface working properly.
    C interface complete, but Fortran interface needed considerable work.
    Wrote Fortran 90 (F90) and C interface test codes.
Summary of work performed under proposal (2).
Also developed F90 test codes for performance testing:
  One that emulates the way serial netCDF is currently being used to do I/O in our primary model.
  Another that replaces the serial netCDF code with its A/NW parallel netCDF equivalent.
Ran a large number of serial / parallel netCDF timings.
Collaborated with Livermore Computing personnel to convert the parallel netCDF test code to its parallel HDF5 equivalent.
Ran a limited number of parallel HDF5 timings for comparison with parallel netCDF.
Created this presentation / report.
Ultimate goals.
Bring a much-needed viable Fortran parallel netCDF capability to the Lab.
Incorporate parallel netCDF capabilities into our primary model, an Atmospheric Chemical Transport Model (ACTM) called “Impact”.
Model uses a logically rectangular, 2D lon/lat domain
decomposition, with a processor assigned to each subdomain.
Each subdomain consists of a collection of full vertical columns,
spread over a limited range of latitude and longitude.
Employs a Master / Slaves paradigm.
MPI used to communicate between processors as necessary.
2-D (lon/lat) domain decomposition.
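The decomposition described above can be sketched as follows. This is a hypothetical pure-Python illustration, not code from the Impact model: the `subdomain` helper, the row-major 0-based rank layout, and the npx x npy grid shape are all assumptions made for the example.

```python
# Sketch of a 2-D (lon/lat) block decomposition: an npx x npy grid of
# slaves, each owning a contiguous range of longitude and latitude
# indices (and all levels / species, i.e. full vertical columns).

def subdomain(rank, npx, npy, nlon, nlat):
    """Return (lon_start, lon_count, lat_start, lat_count) for one slave,
    with 0-based ranks laid out row-major over the npx x npy grid."""
    px, py = rank % npx, rank // npx
    lon_start = px * nlon // npx
    lon_count = (px + 1) * nlon // npx - lon_start
    lat_start = py * nlat // npy
    lat_count = (py + 1) * nlat // npy - lat_start
    return lon_start, lon_count, lat_start, lat_count

# e.g. the 64-processor case from the timing runs: a 9 x 7 slave grid
# (+1 master) over the 360 x 252 lon/lat grid.
print(subdomain(0, 9, 7, 360, 252))  # (0, 40, 0, 36)
```

The integer arithmetic guarantees the subdomains tile the grid exactly even when nlon or nlat is not divisible by the processor counts.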
Impact model / Serial netCDF.
Impact currently uses serial netCDF for much of its I/O.
Slaves read their own data.
Writes are done by the Master only.
Data communicated back to Master for output.
Buffering required because of large arrays and limited memory on Master.
Complicates the code considerably.
Increased I/O performance welcomed, but code not necessarily I/O bound.
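The buffered master-write pattern above can be sketched with a pure-Python stand-in for the MPI communication. This is a hypothetical illustration, not Impact code: `assemble_on_master` and the block format are invented for the example.

```python
# Each slave sends its subdomain block back to the Master, which
# assembles them into the global array before the single serial
# netCDF write.

def assemble_on_master(blocks, nlon, nlat):
    """blocks: list of (lon_start, lat_start, 2-D nested list), one
    per slave; returns the assembled nlon x nlat global array."""
    global_arr = [[None] * nlat for _ in range(nlon)]
    for lon0, lat0, data in blocks:      # one "receive" per slave
        for i, row in enumerate(data):
            for j, v in enumerate(row):
                global_arr[lon0 + i][lat0 + j] = v
    return global_arr

# Two slaves each owning half the longitudes of a 4 x 2 grid:
blocks = [(0, 0, [[1, 2], [3, 4]]), (2, 0, [[5, 6], [7, 8]])]
print(assemble_on_master(blocks, 4, 2))  # [[1, 2], [3, 4], [5, 6], [7, 8]]
```

With real array sizes the Master cannot hold every block at once, which is what forces the incremental buffering the slide mentions.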
Impact model / Serial netCDF calls.
File operations:       Nf_Create, Nf_Open, Nf_Close, Nf_Enddef, Nf_Set_Fill, Nf_Sync
Definition:            Nf_Def_Dim, Nf_Def_Var
Inquiry:               Nf_Inq_Dimid, Nf_Inq_Dimlen, Nf_Inq_Unlimdim, Nf_Inq_Varid
Whole-variable I/O:    Nf_Get_Var_Int, Nf_Put_Var_Double, Nf_Put_Var_Int, Nf_Put_Var_Real
Array-section I/O:     Nf_Get_Vara_Double, Nf_Get_Vara_Int, Nf_Get_Vara_Real,
                       Nf_Put_Vara_Double, Nf_Put_Vara_Int, Nf_Put_Vara_Real, Nf_Put_Vara_Text
Attributes:            Nf_Get_Att_Text, Nf_Put_Att_Text
Parallel I/O options initially explored (1).
Parallel netCDF alternatives:
  A/NW (much more later).
  LBL / NERSC:
    Ziolib + parallel netCDF.
    Level of support? Small user base? Recoding effort?
    My lack of understanding in general?
  Unidata / NCSA project:
    "Merging the NetCDF and HDF5 Libraries to Achieve Gains in Performance and Interoperability."
    PI is Russ Rew, one of the primary developers of serial netCDF.
    Multi-year project that has only just begun, so not a viable option.
Abstract of Unidata / NCSA project.
Merging the NetCDF and HDF5 Libraries to Achieve Gains in Performance and Interoperability.
The proposed work will merge Unidata's netCDF and NCSA's HDF5, two widely-used scientific data access libraries. Users of netCDF in numerical models will benefit from support for packed data, large datasets, and parallel I/O, all of which are available with HDF5. Users of HDF5 will benefit from the availability of a simpler high-level interface suitable for array-oriented scientific data, wider use of the HDF5 data format, and the wealth of netCDF software for data management, analysis and visualization that has evolved among the large netCDF user community. The overall goal of this collaborative development project is to create and deploy software that will preserve the desirable common characteristics of netCDF and HDF5 while taking advantage of their separate strengths: the widespread use and simplicity of netCDF and the generality and performance of HDF5.
To achieve this goal, Unidata and NCSA will collaborate to create netCDF-4, using HDF5 as its storage layer. Using netCDF-4 in advanced Earth science modeling efforts will demonstrate its effectiveness. The success of this project will facilitate open and free technologies that support scientific data storage, exchange, access, analysis, discovery, and visualization. The technology resulting from the netCDF-4/HDF5 merger will benefit users of Earth science data and promote cross-disciplinary research through the provision of better facilities for combining, synthesizing, aggregating, and analyzing datasets from disparate sources to make them more accessible.
Parallel I/O options initially explored (2).
Parallel HDF5:
Would require a significant learning curve; fairly complex.
Would require significant code changes.
Feedback from others:
– Difficult to use.
– Limited capability to deal directly with netCDF files.
– Performance issues?
– Fortran interface?
Why A/NW’s parallel netCDF was chosen (1).
Expertise, experience, and track record of developers.
PVFS, MPICH, ROMIO.
Parallel netCDF library already in place.
Small number of users; C interface only.
Initial work on Fortran interface completed, but untested.
Interest level in their product seems to be growing rapidly.
Parallel syntax much like the serial syntax.
Why A/NW’s parallel netCDF was chosen (2).
Level of support that could be expected over the long-term.
Russ Rew recommendation; based on my needs and time frame.
A/NW developers' level of interest and enthusiasm in working with me.
Belief that A/NW may play a role in the Unidata / NCSA project.
Only practical option currently available?
A/NW’s parallel netCDF library.
Based on Unidata’s serial netCDF library.
Syntax and use very much like serial netCDF:
  nf_ functions become nfmpi_ functions.
  Additional arguments necessary for some calls.
    Create / Open require a communicator + MPI hint (used MPI_INFO_NULL).
  Collective functions are suffixed with _all.
  netcdf.inc include file becomes pnetcdf.inc.
  -lnetcdf library becomes -lpnetcdf.
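A minimal sketch of the mapping, side by side. This is a hypothetical fragment, not code from the study's test programs; variable names (ierr, ncid, varid, start, count, buf) are illustrative.

```fortran
! Serial netCDF:
ierr = nf_create('out.nc', NF_CLOBBER, ncid)
ierr = nf_put_vara_real(ncid, varid, start, count, buf)

! A/NW parallel netCDF: nf_ becomes nfmpi_, Create / Open gain a
! communicator plus an MPI hint (this study used MPI_INFO_NULL),
! and collective data calls gain an _all suffix:
ierr = nfmpi_create(MPI_COMM_WORLD, 'out.nc', NF_CLOBBER, &
                    MPI_INFO_NULL, ncid)
ierr = nfmpi_put_vara_real_all(ncid, varid, start, count, buf)
```

The rest of a typical define / write sequence carries over call for call, which is what makes the conversion largely mechanical.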
A/NW’s parallel netCDF library: v0.9.0.
First version with a fully functional Fortran interface.
Installation procedure made more user-friendly.
Fortran test routines added.
Interacted extensively with the developers on the above items.
Several LLNL F90 test codes became part of the v0.9.0 release.
The end product seems to meet our needs in terms of
functionality, ease of use, and portability.
Have been told that the first non-beta release will be soon.
Platforms used for tests.
Machine Type              Center: Machine   Processor Type:        Nodes   CPUs/   Memory/     Parallel File
                          Name              Speed (MHz)                    Node    Node (GB)   System Used
Intel / Linux cluster     LLNL: mcr         Pentium 4 Xeon: 2400    1152      2        4       Lustre /p/gm1
IBM SP                    NERSC: seaborg    Power3: 375              416     16    16-64       GPFS $SCRATCH
Compaq TeraCluster 2000   LLNL: tckk        ES40/EV67: 667           128      4        2       CPFS /cpfs
Parallel netCDF v0.9.0 installation (1).
Web site => http://www-unix.mcs.anl.gov/parallel-netcdf
Subscribe to the mailing list.
Download:
parallel-netcdf-0.9.0.tar.gz
Parallel NetCDF API documentation.
Note that the following paper will also be coming out soon:
  Jianwei Li, Wei-keng Liao, Alok Choudhary, Robert Ross, Rajeev Thakur, William Gropp, and Rob Latham, "Parallel netCDF: A Scientific High-Performance I/O Interface", to appear in the Proceedings of the 15th SuperComputing Conference, November 2003.
Parallel netCDF v0.9.0 installation (2).
Set the following environment variables:

  Environment Variable   mcr       seaborg    tckk
  MPICC                  mpiicc    mpcc_r     mpicc
  MPIF77                 mpiifc    mpxlf_r    mpif77
  F77                    ifc       xlf        ---
  FC                     ifc       xlf        ---
  CC                     icc       xlc        ---
  CXX                    ---       xlC        ---

Uncompress / untar the tar file. Move into the top-level directory. Type:

  ./configure --prefix=/replace with top-level directory path
  make
  make install
Performance test codes (1).
Test codes written in Fortran 90.
MPI_Wtime used to do the timings.
One large 4D floating point array read or written.
Set up to emulate the basic kind of netCDF I/O that is currently
being done in the Impact model.
Use a Master / Slaves paradigm.
Lon x Lat x Levels x Species I/O array dimensions.
Each Slave only has a portion of the first two dimensions.
Performance test codes (2).
Focused on timing the explicit Read / Write calls, along with any
required MPI communication costs.
Typically, Impact files are open for prolonged periods, with large
Read / Writes occurring periodically, then eventually closed.
Not overly concerned with file definition or open / close costs, but kept an eye on them.
Serial netCDF performance test code.
Version 3.5 of serial netCDF used.
Slave processors read their own input data.
Slaves use MPI to communicate their output data back to the
Master for output.
Communication cost included for Write timings.
Only Master creates / opens output file.
Timed over a single iteration of Read / Write calls in any given run.
Parallel netCDF performance test code.
Version 0.9.0 of A/NW's parallel netCDF used.
Slave processors do all netCDF I/O (Master idle).
All Slaves create / open output file.
Translation from serial netCDF test code:
  Same number of netCDF calls.
  Calls are syntactically very similar.
  Explicit MPI communications no longer needed for Writes.
  Two additional arguments are required for Create / Open:
    Communicator + MPI hint (used MPI_INFO_NULL).
  netcdf.inc needs to be changed to pnetcdf.inc.
Timed over 10 iterations of Read / Write calls in any given run.
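The timed write pattern described above might look roughly like this. A hypothetical sketch only: the 10-iteration loop follows the slide, but the variable names and the rate calculation are illustrative, not from the actual test code.

```fortran
! Each slave writes its own subdomain collectively; MPI_Wtime
! brackets only the explicit I/O calls, per the timing approach
! described earlier.
t0 = MPI_Wtime()
do iter = 1, 10
  ierr = nfmpi_put_vara_real_all(ncid, varid, start, count, buf)
end do
t1 = MPI_Wtime()
rate = (10.0 * file_mb) / (t1 - t0)   ! MB/s over the 10 writes
```

Here start and count hold each slave's lon/lat offsets and extents, so no MPI gather to the Master is needed.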
Function calls used in netCDF test codes.
Serial netCDF Calls        Parallel netCDF Calls
Nf_Create                  Nfmpi_Create
Nf_Open                    Nfmpi_Open
Nf_Close (2X)              Nfmpi_Close (2X)
Nf_Def_Dim (4X)            Nfmpi_Def_Dim (4X)
Nf_Def_Var                 Nfmpi_Def_Var
Nf_Enddef                  Nfmpi_Enddef
Nf_Inq_Varid (2X)          Nfmpi_Inq_Varid (2X)
Nf_Get_Vara_Real           Nfmpi_Get_Vara_Real_All
Nf_Put_Vara_Real           Nfmpi_Put_Vara_Real_All
Parallel netCDF test code compilation.
On mcr:
  mpiifc -O3 -xW -tpp7 -Zp16 -ip -cm -w90 -w95 -extend_source \
    -I$(HOME)/parallel-netcdf-0.9.0/include -c slvwrt.F
On seaborg:
  mpxlf90_r -c -d -I$(HOME)/parallel-netcdf-0.9.0/include -O3 \
    -qfixed=132 -qstrict -qarch=auto -qtune=auto \
    -qmaxmem=-1 slvwrt.F
On tckk:
  /usr/bin/f90 -arch host -fast -fpe -assume accuracy_sensitive \
    -extend_source -I$(HOME)/parallel-netcdf-0.9.0/include \
    -c slvwrt.F
Then link with -lpnetcdf and other libraries as necessary (mpi, etc.).
Timing issue.
I/O resources are shared, so getting consistent timings can be
problematic.
More so for some machines (seaborg) than others.
Made many runs and took the best time.
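The best-time reduction can be sketched as follows. The helper name and the example timings are illustrative, not measured values from the study.

```python
# Shared I/O resources make individual timings noisy, so the reported
# rate is the fastest run among repeated timings of the same operation.

def best_rate(file_mb, elapsed_seconds):
    """MB/s of the fastest run among repeated timings of one I/O."""
    return file_mb / min(elapsed_seconds)

# e.g. three noisy timings of a 1814.4 MB transfer; the fastest run
# (12.6 s) determines the reported rate.
print(best_rate(1814.4, [14.3, 12.6, 19.8]))
```

Taking the minimum time (maximum rate) filters out contention from other jobs, at the cost of reporting a best case rather than a typical one.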
I/O timing variables.
Modes      Operations   Platforms   Number of Processors   File Size (MB)
Serial     Read         mcr          16                     302.4
Parallel   Write        seaborg      31                     907.2
                        tckk         64                    1814.4
                                    127

Number of Processors = Lon x Lat + 1 for Master:
   16 = 5 x 3  + 1
   31 = 5 x 6  + 1
   64 = 9 x 7  + 1
  127 = 9 x 14 + 1

File Size (MB), from Lon x Lat x Levels x Species:
   302.4 = 180 x  84 x 50 x 100
   907.2 = 270 x 168 x 50 x 100
  1814.4 = 360 x 252 x 50 x 100
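The file sizes above are consistent with 4-byte floating-point values; the byte size is an inference from the "floating point" array description earlier, and the helper name is illustrative.

```python
# File size follows directly from the Lon x Lat x Levels x Species
# dimensions, assuming 4 bytes per value.

def file_size_mb(nlon, nlat, nlev, nspecies, bytes_per_value=4):
    return nlon * nlat * nlev * nspecies * bytes_per_value / 1e6

print(file_size_mb(180,  84, 50, 100))  # 302.4
print(file_size_mb(270, 168, 50, 100))  # 907.2
print(file_size_mb(360, 252, 50, 100))  # 1814.4
```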
Serial / Parallel netCDF performance test results for mcr (plots to follow).
Serial netCDF Read / Write Rates (MB/s)
                     Number of Processors
File Size (MB)     16        31        64        127
 302             43 / 40   25 / 39   14 / 39    7 / 48
 907             44 / 24   26 / 38   14 / 38    7 / 38
1814             51 / 25   29 / 25   15 / 38    7 / 39

Parallel netCDF Read / Write Rates (MB/s)
                     Number of Processors
File Size (MB)     16         31         64          127
 302             438 / 104  676 / 115   667 / 114   577 / 101
 907             488 / 119  771 / 135   949 / 131   863 / 132
1814             520 / 128  820 / 130  1020 / 144  1032 / 136
Serial / Parallel netCDF performance test results for seaborg (plots to follow).
Serial netCDF Read / Write Rates (MB/s)
                     Number of Processors
File Size (MB)     16        31        64        127
 302             52 / 25   47 / 25   46 / 23   48 / 21
 907             49 / 24   48 / 24   42 / 24   41 / 23
1814             45 / 24   49 / 24   48 / 23   53 / 23

Parallel netCDF Read / Write Rates (MB/s)
                     Number of Processors
File Size (MB)     16         31         64         127
 302             136 / 159  209 / 264  175 / 198  428 / 235
 907             121 / 149  273 / 223  268 / 219  271 / 286
1814             114 / 136  255 / 238  278 / 235  350 / 311
Serial / Parallel netCDF performance test results for tckk (plots to follow).
Serial netCDF Read / Write Rates (MB/s)
                     Number of Processors
File Size (MB)     16        31        64       127
 302             91 / 30   43 / 26    9 / 15   ---
 907             81 / 21   40 / 22   16 / 21   ---
1814             58 / 20   38 / 19   24 / 20   ---

Parallel netCDF Read / Write Rates (MB/s)
                     Number of Processors
File Size (MB)     16         31         64          127
 302             187 / 55   299 / 81   345 / 87    ---
 907             194 / 47   322 / 85   392 / 117   ---
1814             198 / 56   329 / 85   417 / 118   ---
Serial / Parallel netCDF Read / Write rates.
[Bar charts, 1814 MB file / 64 processors. Read rates (MB/s), serial vs. parallel: mcr 15 vs. 1020, seaborg 48 vs. 278, tckk 24 vs. 417. Write rates: mcr 38 vs. 144, seaborg 23 vs. 235, tckk 20 vs. 118.]
Read / Write netCDF Serial / Parallel rates.
[Bar charts of the same data regrouped by mode, 1814 MB file / 64 processors. Serial Read / Write rates (MB/s): mcr 15 / 38, seaborg 48 / 23, tckk 24 / 20. Parallel Read / Write rates: mcr 1020 / 144, seaborg 278 / 235, tckk 417 / 118. Note different y-axis scales.]
Read netCDF Serial / Parallel rates for varying numbers of processors.
[Bar charts, Read rates (MB/s) vs. number of processors (16, 31, 64, 127), 1814 MB file: mcr serial 51, 29, 15, 7 vs. parallel 520, 820, 1020, 1032; seaborg serial 45, 49, 48, 53 vs. parallel 114, 255, 278, 350; tckk serial 58, 38, 24 vs. parallel 198, 329, 417.]
Write netCDF Serial / Parallel rates for varying numbers of processors.
[Bar charts, Write rates (MB/s) vs. number of processors (16, 31, 64, 127), 1814 MB file: mcr serial 25, 25, 38, 39 vs. parallel 128, 130, 144, 136; seaborg serial 24, 24, 23, 23 vs. parallel 136, 238, 235, 311; tckk serial 20, 19, 20 vs. parallel 56, 85, 118.]
Read netCDF Serial / Parallel rates for varying file sizes.
[Bar charts, Read rates (MB/s) vs. file size (302, 907, 1814 MB), 64 processors: mcr serial 14, 14, 15 vs. parallel 667, 949, 1020; seaborg serial 46, 42, 48 vs. parallel 175, 268, 278; tckk serial 9, 16, 24 vs. parallel 345, 392, 417.]
Write netCDF Serial / Parallel rates for varying file sizes.
[Bar charts, Write rates (MB/s) vs. file size (302, 907, 1814 MB), 64 processors: mcr serial 39, 38, 38 vs. parallel 114, 131, 144; seaborg serial 23, 24, 23 vs. parallel 198, 219, 235; tckk serial 15, 21, 20 vs. parallel 87, 117, 118.]
Parallel HDF5 performance test code.
Version 1.4.5 of NCSA's parallel HDF5 used.
Slave processors do all HDF5 I/O (Master idle).
Collaborated with Livermore Computing personnel to convert the parallel netCDF test code to its parallel HDF5 equivalent.
  Conversion seemed to take a good deal of effort.
  Increase in code complexity over parallel netCDF.
Great deal of difficulty in getting test code compiled and linked.
  Irresolvable problems with parallel HDF5 library on mcr and tckk.
  Finally got things working on seaborg.
Made a limited number of timing runs for a "ballpark" comparison.
Function calls used in parallel HDF5 test code.
Parallel HDF5 Calls
Open / create:   h5open_f, h5fcreate_f, h5fopen_f, h5dcreate_f, h5dopen_f
Properties:      h5pcreate_f (3X), h5pset_chunk_f, h5pset_dxpl_mpio_f, h5pset_fapl_mpio_f
Dataspaces:      h5screate_simple_f (2X), h5dget_space_f, h5sselect_hyperslab_f
I/O:             h5dread_f, h5dwrite_f
Close:           h5close_f, h5fclose_f, h5dclose_f, h5pclose_f (2X), h5sclose_f (2X)
Parallel HDF5 test code compilation.
On seaborg:
module load hdf5_par
mpxlf90_r -c -d -O3 -qfixed=132 -qstrict -qarch=auto \
-qtune=auto -qmaxmem=-1 slvwrt_hdf5.F $(HDF5)
Add $(HDF5) at the end of your link line as well.
Parallel HDF5 / netCDF performance test results for seaborg (plot to follow).
Parallel HDF5 Read / Write Rates (MB/s)
                     Number of Processors
File Size (MB)     16       31           64            127
 302              ---      ---           ---           ---
 907              ---    631 / 838    1016 / 1459      ---
1814              ---      ---        1182 / 1465      ---

Parallel netCDF Read / Write Rates (MB/s)
                     Number of Processors
File Size (MB)     16         31         64         127
 302             136 / 159  209 / 264  175 / 198  428 / 235
 907             121 / 149  273 / 223  268 / 219  271 / 286
1814             114 / 136  255 / 238  278 / 235  350 / 311
Parallel HDF5 / netCDF Read / Write rates.
[Bar charts, 1814 MB file / 64 processors on seaborg. Read rates (MB/s): parallel HDF5 1182 vs. parallel netCDF 278. Write rates: parallel HDF5 1465 vs. parallel netCDF 235. No parallel HDF5 results on mcr or tckk.]
Observations.
Parallel netCDF seems to be a very hot topic right now.
Since A/NW’s parallel netCDF is functionally and syntactically very
similar to serial netCDF, code conversion is pretty straightforward.
I/O speeds can vary significantly from machine to machine.
I/O speeds can vary significantly on the same machine, based on
the I/O load at any given time.
Misc. conclusions from plots.
Our current method of doing serial netCDF Slave Reads performed quite poorly in general.
  Unexpected.
  Can degrade significantly as the number of processors is increased.
Parallel netCDF Reads are faster than Writes.
  Magnitude of the difference on a given platform can vary dramatically.
mcr marches to its own netCDF drummer.
  Parallel Reads are quite fast; serial Reads are not.
  Serial Writes faster than Reads.
  Parallel Writes scale poorly.
Parallel netCDF I/O tends to get somewhat faster as the file size increases.
Different platforms can behave very differently!
Overall Conclusions.
Under the specified test conditions:
  A/NW's parallel netCDF (v0.9.0) performed significantly better than serial netCDF (v3.5).
Under the specified test conditions and limited testing:
  Parallel HDF5 (v1.4.5) performed significantly better than A/NW's parallel netCDF (v0.9.0).
    To date, A/NW focus has been on functionality, not performance; they believe that there is substantial room for improvement.
    On a different platform and code, A/NW developers have found that parallel netCDF significantly outperforms parallel HDF5.
    Not a simple matter of one being faster than the other; platform and access patterns may favor one or the other.
Remaining questions / issues.
What about files larger than 2 GB?
  It appears that a general netCDF solution may be forthcoming.
How much will A/NW be able to improve performance?
  They are committed to working this issue.
When will the first A/NW non-beta release be?
  Maybe early next year, after performance issues are addressed.
What will the outcome of the Unidata / NCSA project be? What role will A/NW play?
Have any potential show stoppers been missed?
Will we incorporate A/NW's parallel netCDF capability into our Impact model?