Performance Tuning in HDF5
DESCRIPTION
In this talk we will examine how to tune HDF5 performance to improve I/O speed. The talk will focus on the chunk and metadata caches, how they affect performance, and which HDF5 APIs can be used for performance tuning. Examples of different chunking strategies will be given. We will also discuss how to reduce file overhead by using special properties of HDF5 groups, datasets, and datatypes.
TRANSCRIPT
04/12/23 HDF and HDF-EOS Workshop X, Landover, MD
Performance Tuning in HDF5
Outline
• HDF5 file overhead
• Performance and options for tuning
  • Chunking
  • Caching
HDF5 file overhead
• Why is my HDF5 file so big?
• Each HDF5 object has an overhead (object header, B-trees, global heaps)
• Examples:
  • Empty HDF5 file (has root group): 976 bytes
  • File with one empty group: 1,952 bytes
  • Dataset without data written: 1,576 bytes
  • Dataset with 4 MB of data written (1000x1000 int): 4,002,048 bytes
  • Dataset with chunked storage, 25 chunks: 4,004,664 bytes
  • Dataset with chunked storage, 10^4 chunks: 4,497,104 bytes
  • Dataset with chunked storage, 25 chunks, with compression: 34,817 bytes (actual data was 8-bit integers)
HDF5 file overhead
• Overhead may come from datatypes
• Example: Compound datatype

typedef struct s1_t {
    int    a;
    char   d;
    float  b;
    double c;
} s1_t;
• If the compiler aligns data on 8-byte boundaries, we have 11 bytes of overhead for each element
• Use H5Tpack to “compress” the data
HDF5 file overhead
• Example: Variable-length datatype

typedef struct vl_t {
    int len;
    void *p;
} vl_t;

• ~20 bytes of overhead for each element
• Raw data is stored in global heaps
• Cannot be compressed
• Opening/closing a VL dataset will increase overhead in a file due to a fragmented global heap
HDF5 file overhead
• Example: 9-bit integer
• Takes at least 16 bits; 7 bits are overhead
• Use the N-bit filter to compress (1.8.0 release)
Summary
• File
  • Try to keep the file open as long as possible; don't open/close it if not necessary
• Groups
  • Use compact storage (available in 1.8.0) for groups with a few members
• Datasets
  • Use compact storage for small (<64 KB) datasets
  • If many datasets are of the same datatype, use a shared datatype
  • Use an appropriate chunking and compression strategy
HDF5 chunking
• Chunking refers to a storage layout where a dataset is partitioned into fixed-size multi-dimensional chunks
• Used when the dataset is:
  • Extendible
  • Compressed
  • Covered by a checksum or other filters
• HDF5 chunk properties:
  • A chunk has the same number of dimensions as the dataset
  • Chunks cover the entire dataset, but the dataset need not be an integral number of chunks
  • If no data is ever written to a chunk, the chunk is not allocated in the file
  • A chunk is an atomic object for I/O operations (e.g., written or read in one I/O operation)
HDF5 chunking
[Figure: a dataset covered by a grid of 12 numbered chunks; shading marks the regions where data has been written]
• The dataset is covered by 12 chunks
• Chunks 1, 6, and 11 are never allocated in the file
• Compression, encryption, checksums, etc. are performed on the entire chunk
HDF5 filter pipeline
[Figure: a chunk moves from the file through the filter pipeline into memory, where a few bytes are modified, then back through the pipeline to the file]

Example: H5Dwrite touches only a few bytes in a chunk
• The entire chunk is read from the file
• The data passes through the filter pipeline
• The few bytes are modified
• The data passes through the filter pipeline again
• The entire chunk is written back to the file
• It may be written to another location, leaving a hole in the file
• This can increase the file size
HDF5 filter pipeline and chunk cache
Example: H5Dwrite touches only a few bytes in a chunk
• Calling H5Dwrite many times would result in poor performance
• Chunk cache layer:
  • Holds up to 521 chunks or 1 MB of chunk data (whichever limit is reached first)
  • Call H5Pset_cache to modify the cache parameters
HDF5 chunk cache
• The preemption policy for the cache favors certain chunks and tries not to preempt them:
  • Chunks that have been accessed frequently in the near past are favored
  • A chunk that has just entered the cache is favored
  • A chunk that has been completely read or completely written, but not partially read or written, is penalized according to an application-specified weighting between zero and one
  • A chunk that is larger than the maximum cache size is not eligible for caching
HDF5 chunk overhead in a file
• A B-tree maps chunk N-dimensional addresses to file addresses
• The B-tree grows with the number of chunks:
  • File storage overhead is higher
  • More disk I/O is required to traverse the tree from root to leaves
  • A large number of B-tree nodes results in higher contention for the metadata cache
• To reduce overhead:
  • Use bigger chunk sizes; don't use a chunk size of 1
HDF5 metadata cache
• Outline:
  • Overview of the HDF5 metadata cache
  • Implementation prior to the 1.8.0 release
  • Adaptive metadata cache in 1.8.0
• Documentation is available in the UG and RM for the 1.8.0 release:
  http://hdf.ncsa.uiuc.edu/HDF5/doc_dev_snapshot/H5_dev/UG/UG_frame17SpecialTopics.html
Overview of HDF5 metadata cache
● Metadata – extra information about your data
● Structural metadata
  ● Stores information about your data
  ● Example: When you create a group, you really create:
    ● Group header
    ● B-tree (to index entries), and
    ● Local heap (to store entry names)
● User-defined metadata (created via the H5A calls)
● Usually small – less than 1 KB
● Accessed frequently
● Small disk accesses are still expensive
Overview of HDF5 metadata cache
● Cache
  ● An area of storage devoted to the high-speed retrieval of frequently used data
● HDF5 metadata cache
  ● A module that tries to keep frequently used metadata in core so as to avoid file I/O
  ● Exists to enhance performance
  ● Limited size – can't hold all the metadata all the time
● Cache hit
  ● A metadata access request that is satisfied from cache
  ● Saves a file access
● Cache miss
  ● A metadata access request that can't be satisfied from cache
  ● Costs a file access (several milliseconds in the worst case)
Overview of HDF5 metadata cache
● Dirty metadata
  Metadata that has been altered in cache but not written to file
● Eviction
  The removal of a piece of metadata from the cache
● Eviction policy
  The procedure for selecting metadata to evict
● Principle of locality
  File access tends not to be random; metadata just accessed is likely to be accessed again soon. This is the reason why caching is practical
● Working set
  The subset of the metadata that is in frequent use at a given point in time; its size is highly variable depending on file structure and access pattern
Overview of HDF5 metadata cache
Scenario                                          Working set size   # of metadata cache accesses
Create datasets A, B, C, D with 10^6 chunks       < 1 MB             < 50K
  under the root group
Initialize the chunks using a round robin         < 1 MB             ~30M
  (1 from A, 1 from B, 1 from C, 1 from D;
  repeat until done)
10^6 random accesses across A, B, C, and D        ~120 MB            ~4M
10^6 random accesses to A only                    ~40 MB             ~4M
Overview of HDF5 metadata cache
• Challenges peculiar to metadata caching in HDF5
  • Varying metadata entry sizes
    • Most entries are less than a few hundred bytes
    • Entries may be of almost any size
    • Encountered variations from a few bytes to megabytes
  • Varying working set sizes
    • < 1 MB for most applications most of the time
    • ~8 MB (astrophysics simulation code)
  • The metadata cache competes with the application for core
    • The cache must be big enough to hold the working set
    • It should never be significantly bigger, lest it starve the user program for core
Metadata Cache in HDF5 1.6.3 and before
[Figure: a hash table whose slots point directly at metadata entries]
• Fast
• No provision for collisions; eviction on collision
• For a small hash table, performance is bad, since frequently accessed entries hash to the same location
• Good performance requires a big hash table
• Inefficient use of core
• Unsustainable as HDF5 file size and complexity increase
Metadata Cache in HDF5 1.6.4 and 1.6.5
• Entries are stored in a hash table as before
• Collisions are handled by chaining
• An LRU list is maintained to select candidates for eviction
• A running sum of the sizes of the entries is maintained
• Entries are evicted when a predefined limit on this sum is reached
• The size of the metadata cache is bounded
  • Hard-coded to 8 MB
  • Doesn't work when the working set size is bigger
  • Larger variations in working set sizes are anticipated
  • Manual control over the cache size is needed!
Overview of HDF5 metadata cache
[Figure: a hash table with chained metadata entries; an LRU list (9, 2, 4, 1, 8, 7, 6, 5, 3) threads through the entries to order eviction candidates]
Metadata Cache in HDF5 1.8.0
• New metadata cache APIs
  • Control the cache size
  • Monitor the actual cache size and current hit rate
• Adaptive cache resizing
  • Enabled by default (min and max sizes are 1 MB)
  • Automatically detects the current working set size
  • Sets the max cache size to the working set size
Metadata Cache in HDF5 1.8.0
• First problem (easy)
  • Detect when the cache is too small and select a size increment (some overshoot is OK)
  • Check the hit rate every n accesses
  • Increase by a defined increment if the hit rate is below a threshold
  • Repeat until the hit rate is above the threshold
  • User-definable via the APIs
• Works well in most cases
• Doesn't work well when the hit rate varies slowly with increasing cache size
  • Happens when the working set size is very large
Metadata Cache in HDF5 1.8.0
• Second problem (hard)
  • Detect when the cache is too big, and select a size decrement (must not overshoot)
  • Track how long it has been since each entry in the cache was accessed
  • Check the hit rate every n cache accesses
  • If the hit rate is above some threshold, evict entries with a number of accesses less than m
  • If the cache size is significantly below the max cache size, reduce the cache size
• Worked well for the Boeing time-segment library application
Metadata Cache in HDF5 1.8.0
• Hints:
  • If the working set size varies quickly, the correct size is never reached
    • Adaptive cache resize algorithms take time to react
    • Control the cache size directly from the application instead
  • You shouldn't see application memory growth due to the metadata cache
  • See the HDF5 documentation; collect statistics on metadata cache access
  • If performance is still poor, contact [email protected]
Questions? Comments?
Thank you!
Acknowledgement
This report is based upon work supported in part by a Cooperative Agreement with NASA under NASA NNG05GC60A. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Aeronautics and Space Administration.