HDF5 I/O Performance
Source: http://hdfeos.org/workshops/ws06/presentations/Pourmal/HDF5_IO_Perf.pdf

HDF and HDF-EOS Workshop VI
December 5, 2002

Goal of this talk
• Give an overview of the HDF5 Library tuning knobs for sequential and parallel performance

Challenging task
• HDF5 Library has to perform well on
  – Variety of UNIX workstations (SGI, Intel, HP, Sun)
  – Windows
  – Cray
  – DOE supercomputers (IBM SP, Intel Tflops)
  – Linux clusters (Compaq, Intel)
• Variety of file systems (GPFS, PVFS, Unix FS)
• Variety of MPI-IO implementations
• Other tasks
  – Efficient memory and file space management
• Applications are different (access patterns, many small objects vs. few large objects, parallel vs. sequential, etc.)

Outline
• Sequential performance
  – Tuning knobs
    • File level
    • Data transfer level
  – Memory management
  – File space management: fill values and storage allocation
  – Chunking
    • Compression
    • Caching
  – Compact storage

Outline (cont.)
• Parallel performance
  – Tuning knobs
    • Data alignment
    • MPI-IO hints
    • HDF5 Split Driver
  – h5perf benchmark

Sequential Performance
• Tuning knobs

Two Sets of Tuning Knobs
• File level knobs
  – Apply to the entire file
• Data transfer level knobs
  – Apply to individual dataset read or write

File Level Knobs
• H5Pset_meta_block_size
• H5Pset_cache

H5Pset_meta_block_size
• Sets the minimum metadata block size allocated for metadata aggregation
• Aggregated block is usually written in a single write action
• Default is 2KB
• Pro:
  – Larger block size reduces I/O requests
• Con:
  – Could create “holes” in the file and make the file bigger

H5Pset_meta_block_size (cont.)
• When to use:
• File is open for a long time and
  – A lot of objects created
  – A lot of operations on the objects performed
  – As a result metadata is interleaved with raw data
  – A lot of new metadata (attributes)

H5Pset_cache
• Sets:
  – The number of elements (objects) in the metadata cache
  – The number of elements, the total number of bytes, and the preemption policy value (default is 0.75) in the raw data chunk cache

H5Pset_cache (cont.)
• Preemption policy:
  – Chunks are stored in a list with the most recently accessed chunk at the end
  – Least recently accessed chunks are at the beginning of the list
  – X*100% of the list is searched for a fully read/written chunk; X is called the preemption value, where X is between 0 and 1
  – If such a chunk is found, it is deleted from the cache; if not, the first chunk in the list is deleted
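
The preemption rule described on this slide can be modeled in a few lines of C. This is a minimal sketch of the slide's description, not the library's code: `pick_victim`, the `fully_accessed` flag array, and the choice to truncate X times the list length to a whole number of entries are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Pick the chunk to evict from a list ordered from least recently used
 * (index 0) to most recently used (index n - 1).
 * fully_accessed[i] is nonzero when chunk i has been completely read or written.
 * w is the preemption value X from the slide, between 0 and 1.
 * Hypothetical helper: a model of the slide's description, not HDF5 source code.
 */
size_t pick_victim(const int *fully_accessed, size_t n, double w)
{
    assert(n > 0 && w >= 0.0 && w <= 1.0);

    /* Only the first X*100% of the list is searched (truncated; an assumption). */
    size_t limit = (size_t)(w * (double)n);

    for (size_t i = 0; i < limit; i++) {
        if (fully_accessed[i]) {
            return i;   /* fully read/written chunk found: evict it */
        }
    }
    return 0;           /* nothing found: evict the least recently used chunk */
}
```

With four cached chunks where only the third is fully written, a preemption value of 0.75 searches the first three entries and evicts the third, while a value of 0 skips the search and evicts the first entry.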

H5Pset_cache (cont.)
• The right value of X
  – May improve I/O performance by controlling the preemption policy
  – A value of 0 forces deletion of the “oldest” chunk from the cache
  – A value of 1 forces a search of the whole list for a chunk that is unlikely to be accessed again
  – Depends on the application access pattern

Data Transfer Level Knobs
• H5Pset_buffer
• H5Pset_sieve_buf_size

H5Pset_buffer
• Sets the size of the internal buffers used during data transfer
• Default is 1 MB
• Pro:
  – Bigger size improves performance
• Con:
  – Library uses more memory

H5Pset_buffer (cont.)
• When to use:
  – Datatype conversion
  – Data gathering-scattering (e.g. checkerboard dataspace selection)

H5Pset_sieve_buf_size
• Sets the size of the data sieve buffer
• Default is 64KB
• Sieve buffer is a buffer in memory that holds part of the dataset raw data
• During I/O operations data is replaced in the buffer first, then one big I/O request occurs

H5Pset_sieve_buf_size (cont.)
• Pro:
  – Bigger size reduces I/O requests issued for raw data access
• Con:
  – Library uses more memory
• When to use:
  – Data scattering-gathering (e.g. checkerboard)
  – Interleaved hyperslabs

HDF5 Application Memory Management
• H5garbage_collect()
  – Memory used by an HDF5 application may grow with the number of objects created and then released
  – Function walks through all the garbage collection routines of the library, freeing any unused memory
  – When to use:
    • Application creates, opens, and releases a substantial number of objects
    • “Number of objects” is application and platform dependent

HDF5 File Space Management
• H5Pset_alloc_time
  – Sets the time of data storage allocation for creating a dataset
    • Early: when dataset is created
    • Late: when dataset is written
• H5Pset_fill_time
  – Sets the time when fill values are written to a dataset
    • When space allocated
    • Never
      – Avoids unnecessary writes

Chunking and Compression
• Chunking storage
  – Provides better partial access to dataset
  – Space is allocated when data is written
  – Con:
    • Storage overhead
    • May degrade performance if cache is not set up properly
• Compression (GZIP, SZIP in HDF5 1.5 release)
  – Saves space
  – User may easily turn on their own compression method
  – Con:
    • May take a lot of time
• Data shuffling (in HDF5 1.5 release)
  – Helps compression algorithms

Data shuffling
• See Kent Yang’s poster
• Not a compression; change of byte order in a stream of data
• Example
  – 1 23 43
• Hexadecimal form
  – 0x01 0x17 0x2B
• Big-endian machine
  – 0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x17 0x00 0x00 0x00 0x2B
• Shuffling
  – 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x01 0x17 0x2B
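
The byte reordering in this example can be written as a short C routine. This is a minimal sketch of the transformation the slide shows, not the library's filter implementation; the name `shuffle_bytes` and its signature are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Byte shuffle: for n elements of elem_size bytes each, write byte 0 of every
 * element, then byte 1 of every element, and so on.
 * Hypothetical helper illustrating the slide's example.
 */
void shuffle_bytes(const unsigned char *in, unsigned char *out,
                   size_t n, size_t elem_size)
{
    assert(in != out);  /* this sketch does not support in-place shuffling */

    for (size_t b = 0; b < elem_size; b++) {
        for (size_t i = 0; i < n; i++) {
            out[b * n + i] = in[i * elem_size + b];
        }
    }
}
```

Calling it with the 12 big-endian bytes of 1, 23, 43 (n = 3, elem_size = 4) produces nine zero bytes followed by 0x01 0x17 0x2B, matching the shuffled stream above. The long run of identical bytes is what helps a compressor that follows.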

[Figure: the byte stream 00 00 00 01 00 00 00 17 00 00 00 2B before shuffling, and the same bytes regrouped as 00 00 00 00 00 00 00 00 00 01 17 2B after shuffling]

Chunking and compression benchmark
• Write one 4-byte integer dataset 256x256x1024 (256MB)
• Using chunks of 256x16x1024 (16MB)
• Random integers between 0 and 255
• Tests with
  – Compression on/off
  – Chunk cache size 16MB to 256MB
  – Data shuffling
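
The sizes quoted for this benchmark follow from the dimensions. The sketch below is only arithmetic on the numbers from the slide; the helper names `block_bytes` and `chunk_count` are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes occupied by one block with the given dimensions and element size. */
size_t block_bytes(const size_t *dims, size_t rank, size_t elem_size)
{
    size_t bytes = elem_size;
    for (size_t i = 0; i < rank; i++) {
        bytes *= dims[i];
    }
    return bytes;
}

/* Number of chunks needed to cover the dataset (edge chunks round up). */
size_t chunk_count(const size_t *dset, const size_t *chunk, size_t rank)
{
    size_t count = 1;
    for (size_t i = 0; i < rank; i++) {
        assert(chunk[i] > 0);
        count *= (dset[i] + chunk[i] - 1) / chunk[i];
    }
    return count;
}
```

For a 256x256x1024 dataset of 4-byte integers with 256x16x1024 chunks, this gives 268435456 bytes of data, 16777216 bytes per chunk, and 16 chunks. A 16MB chunk cache therefore holds exactly one chunk, and a 256MB cache holds all of them.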

Chunking Benchmark Time Definitions
• Total
  – Time to open file, write dataset, close dataset and close file
• Write time
  – Time to write the whole dataset
• Average chunk time
  – Total time / number of chunks

Performance improvement

| Release version | Total time (open-write-close) in seconds | Average time to write a 16MB chunk in seconds |
|---|---|---|
| 1.4.4 release | 8.6950 | 0.4809 |
| 1.4.5-prerelease | 4.5711 | 0.2447 |

Effect of Caching (H5Pset_cache)

| Compression | Cache | Total time in seconds | Write time in seconds | File size |
|---|---|---|---|---|
| No | 16MB | 5.607 | 5.43 | 268.4MB |
| No | 256MB | 5.79 | 3.63 | 268.4MB |
| Yes | 16MB | 672.58 | 630.89 | 102.9MB |
| Yes | 256MB | 74.66 | 3.486 | 102.9MB |

Effect of data shuffling (H5Pset_shuffle + H5Pset_deflate)

| File size | Total time | Write time |
|---|---|---|
| 102.9MB | 671.049 | 629.45 |
| 67.34MB | 83.353 | 78.268 |

Compression combined with shuffling provides
• Better compression ratio
• Better I/O performance

Effect of chunk caching and data shuffling
H5Pset_cache + H5Pset_shuffle + H5Pset_deflate

| Cache | Total time | Write time |
|---|---|---|
| 16MB | 83.353 | 78.268 |
| 128MB | 82.942 | 43.257 |
| 256MB | 82.972 | 3.476 |

• Caching improves chunk write time

Compact storage
• Store small objects (e.g. 4KB dataset) in the file
  – C code example:
      plist = H5Pcreate(H5P_DATASET_CREATE);
      H5Pset_layout(plist, H5D_COMPACT);
      H5Pset_alloc_time(plist, H5D_ALLOC_TIME_EARLY);
      dataset = H5Dcreate(file,…, plist);
  – Raw data is stored in the dataset header
    • Metadata and raw data are written/read in one I/O operation
    • Faster write and read

Compact storage benchmark
• Create a file with N 4KB datasets using regular and compact storage (100 < N < 35000)
• Measure average time needed to write/read a dataset in a file with N datasets
• Benchmark run on Linux 2.2.18 i686, 960MB memory
• gettimeofday function used to measure time
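
A per-operation average of this kind can be measured with the POSIX `gettimeofday` call. The sketch below shows one way to do it and is not the benchmark's actual code; `elapsed_seconds`, `average_seconds`, and the callback-style interface are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Seconds elapsed between two gettimeofday() samples. */
double elapsed_seconds(struct timeval start, struct timeval end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_usec - start.tv_usec) / 1e6;
}

/*
 * Average wall-clock time of one call to op(arg), measured over n calls.
 * Hypothetical helper: op would wrap a single dataset write or read.
 */
double average_seconds(void (*op)(void *), void *arg, size_t n)
{
    struct timeval start, end;

    assert(n > 0);
    gettimeofday(&start, NULL);
    for (size_t i = 0; i < n; i++) {
        op(arg);
    }
    gettimeofday(&end, NULL);
    return elapsed_seconds(start, end) / (double)n;
}
```

Timing many operations and dividing by the count keeps the result meaningful when a single 4KB write takes only a fraction of a millisecond.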

Writing a dataset
[Chart: average time to write a dataset in seconds (0 to 0.0006) vs. number of datasets (4KB size, 100 to 32100); series: regular storage, compact storage]

Reading latest written dataset
[Chart: time in seconds (0 to 0.025) vs. number of datasets (4KB size, 100 to 32100); series: regular storage, compact storage]

Parallel Performance
• Tuning knobs
• h5perf benchmark

Parallel HDF + MPI
PHDF5 Implementation Layers
• User applications: parallel applications
• HDF library
• Parallel I/O layer: MPI-IO
• Parallel file systems: SP GPFS, O2K Unix I/O, TFLOPS PFS

File Level Knobs (Parallel)
• H5Pset_alignment
• H5Pset_fapl_split
• H5Pset_fapl_mpio

H5Pset_alignment
• Sets two parameters
  – Threshold
    • Minimum size of object for alignment to take effect
    • Default 1 byte
  – Alignment
    • Allocate object at the next multiple of alignment
    • Default 1 byte
• Example: (threshold, alignment) = (1024, 4K)
  – All objects of 1024 or more bytes start at a 4KB boundary
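
The placement rule for the two parameters can be expressed as a small address calculation. This is a minimal sketch of the rule as the slide states it, not the library's allocator; the function `aligned_address` and its parameters are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Address at which an object of `size` bytes would be placed if the next free
 * address is `addr`: objects at or above `threshold` bytes move up to the next
 * multiple of `alignment`, smaller objects stay where they are.
 * Hypothetical helper modeling the slide's rule.
 */
uint64_t aligned_address(uint64_t addr, uint64_t size,
                         uint64_t threshold, uint64_t alignment)
{
    assert(alignment > 0);

    if (size < threshold) {
        return addr;                /* too small: alignment does not apply */
    }
    uint64_t rem = addr % alignment;
    return rem == 0 ? addr : addr + (alignment - rem);
}
```

With (threshold, alignment) = (1024, 4096), a 2048-byte object that would otherwise start at address 5000 is moved to 8192, while a 512-byte object stays at 5000. The skipped bytes are the cost of alignment in file size.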

H5Pset_alignment Benefits
• In general, the default (no alignment) is good for single process serial access since the OS already manages buffering.
• For some parallel file systems such as GPFS, an alignment of the disk block size improves I/O speeds.

H5Pset_fapl_split
• HDF5 splits the file into two files
  – Metadata file for metadata
  – Raw data file for raw data (array data)
• Significant I/O improvement if
  – metadata file is stored in Unix file systems (good for many small I/O)
  – raw data file is stored in parallel file systems (good for large I/O)

Write speeds of Standard vs. Split-file HDF5 vs. MPI I/O
[Chart: write speed in MB/sec (4 to 20) vs. number of processes (2, 4, 8, 16); series: MPI I/O write (one file), Split-file HDF5 write, Standard HDF5 write (one file)]
• Results for ASCI Red machine at Sandia National Laboratory
• Each process writes 10MB of array data

I/O Hints via H5Pset_fapl_mpio
• MPI-IO hints can be passed to the MPI-IO layer via the Info parameter of H5Pset_fapl_mpio
• Examples
  – Telling ROMIO to use 2-phase I/O speeds up collective I/O in the ASCI Red machine at Livermore National Laboratory
  – Setting IBM_largeblock_io=true improves GPFS write speeds

2-Phase I/O
• Interleaving
[Diagram: processes p0 p1 p2 p3 p4 p5; processes p0 p1; disk]

2-Phase I/O (cont.)
• Aggregation (available in ROMIO 1.2.4); useful for
  – filling I/O buffers
  – moving data to processors that have better connectivity
[Diagram: processes p0 p1 p2 p3 p4 p5; processes p0 p1; disk]

Effects of I/O Hints: IBM_largeblock_io
• GPFS at Livermore National Laboratory ASCI Blue machine
  – 4 nodes, 16 tasks
  – Total data size 1024MB
  – I/O buffer size 1MB
[Chart: vertical axis 0 to 400; bars for MPI-IO and PHDF5 with IBM_largeblock_io=false and with IBM_largeblock_io=true; series: 16 write, 16 read]

Parallel I/O Benchmark Tool
• h5perf
  – Benchmark to test I/O performance
  – Comes with HDF5 binaries
  – Writes datasets into the file by hyperslabs
  – Variables:
    • Number of datasets
    • Number of processes
    • Number of bytes per process per dataset to write/read
    • Threshold for data alignment
    • Size of transfer buffer (memory buffer) and block per process
    • Collective vs. independent
    • Interleaved blocks vs. contiguous blocks

Parallel I/O Benchmark Tool (cont.)
• Four kinds of API
  – Parallel HDF5
  – MPI-IO
  – Native parallel (e.g., gpfs, pvfs)
  – POSIX (open, close, lseek, read, write)
• Provides standard approach to measure and compare performance results

Parallel I/O Tuning
• Challenging task
  – Performance varies from platform to platform
  – Complex access patterns
  – Many layers can be involved
    • SAF-HDF5-MPIO-GPFS

Parallel I/O Tuning: Example of transfer buffer effect
• h5perf run on NERSC IBM SP
• 4 processes, 4 nodes, 1MB file, 1 dataset, 256KB data per process to write
• Maximum achieved write speed in MB/sec

Parallel I/O Tuning: Example of transfer buffer effect on SP2
[Chart: speed in MB/sec (0 to 300) for write and open-write-close at transfer buffer sizes of 128 and 256 KB; series: POSIX, MPI-IO, PHDF5, HDF5]

Parallel I/O Tuning: Example of transfer buffer effect on SP2 (cont.)
[Chart: speed in MB/sec (0 to 180) for read and open-read-close at transfer buffer sizes of 128 and 256 KB; series: POSIX, MPI-IO, PHDF5]

Parallel I/O Tuning: Example of transfer buffer effect on SGI
[Chart: speed in MB/sec (0 to 160) for write and open-write-close at transfer buffer sizes of 128 and 256 KB; series: POSIX, MPI-IO, PHDF5, HDF5]

Summary results for Blue
• h5perf run on ASCI IBM SP Blue
• 1 to 4 processes per node, 16 nodes, 256KB data per process to write/read, 256KB transfer size, 256KB block size
• Varied:
  – Number of tasks per node (1 – 4)
  – Number of datasets: 50, 100, 200
  – Independent or collective calls

HDF5 collective write results

| Number of datasets | Tasks per node (TPN) 1 – 2, speed in MB/sec | Tasks per node (TPN) 3 – 4, speed in MB/sec | MPI-IO best result, speed in MB/sec |
|---|---|---|---|
| 50 | 20 – 30 | 50 – 60 | 68.47 (3 TPN) |
| 100 | 40 – 60 | 50 – 60 | 66.24 (3 TPN) |
| 200 | 40 – 60 | 25 – 45 | 52.66 (2 TPN) |

HDF5 independent write results

| Number of datasets | Tasks per node 1 – 2, speed in MB/sec | Tasks per node 3 – 4, speed in MB/sec | MPI-IO best result, speed in MB/sec |
|---|---|---|---|
| 50 | 20 – 35 | 20 – 35 | 32.14 (2 TPN) |
| 100 | 20 – 35 | 20 – 35 | 31.09 (4 TPN) |
| 200 | 20 – 35 | 35 – 60 | 61.06 (3 TPN) |

Read performance

| Mode | PHDF5 speed in MB/sec | MPI-IO speed in MB/sec |
|---|---|---|
| Collective | 75 - 200 | 335 - 1800 |
| Independent | 100 - 650 | 330 - 2935 |

Future Parallel HDF5 Features
• Flexible PHDF5
  – Reduces the need for collective calls
  – Sets aside a process to coordinate independent calls
  – Estimated release date: end of 2002

Useful Parallel HDF Links
• Parallel HDF information site
  – http://hdf.ncsa.uiuc.edu/Parallel_HDF/
• Parallel HDF mailing list
  – [email protected]
• Parallel HDF5 tutorial available at
  – http://hdf.ncsa.uiuc.edu/HDF5/doc/Tutor