
April 28, 2008 LCI Tutorial 1

Introduction to HDF5 Tools

Tutorial

Part II


Outline

• Overview of HDF5 tools
• Using tools for troubleshooting problems


HDF5 command-line tools

• Readers
   h5dump, h5diff, h5ls
   1.8 tools: h5check, h5stat

• Writers
   h5repack, h5repart, h5import, h5jam/h5unjam
   1.8 tools: h5copy, h5mkgrp

• Converters
   h4toh5, h5toh4, gif2h5, h52gif


h5dump

• Dumps the content of an HDF5 file to standard output and, optionally, to the following types of files
   1. ASCII text file
   2. XML file
   3. Binary file

• Flags to remember
   -H to print header information
   -p to print objects’ properties
   -b to export data in a binary form
   -o to export data to a file (text by default)
   -y to skip printing indices
   -w to specify line width


h5dump -H SDS.h5

HDF5 "SDS.h5" {
GROUP "/" {
   GROUP "Floats" {
      DATASET "FloatArray" {
         DATATYPE H5T_IEEE_F32LE
         DATASPACE SIMPLE { ( 4, 3 ) / ( 4, 3 ) }
      }
   }
   DATASET "IntArray" {
      DATATYPE H5T_STD_I32LE
      DATASPACE SIMPLE { ( 5, 6 ) / ( 5, 6 ) }
   }
}
}


h5dump -d /Floats/FloatArray SDS.h5

HDF5 "SDS.h5" {
DATASET "/Floats/FloatArray" {
   DATATYPE H5T_IEEE_F32LE
   DATASPACE SIMPLE { ( 4, 3 ) / ( 4, 3 ) }
   DATA {
   (0,0): 0.01, 0.02, 0.03,
   (1,0): 0.1, 0.2, 0.3,
   (2,0): 1, 2, 3,
   (3,0): 10, 20, 30
   }
}
}


h5dump -x SDS.h5


h5dump binary output

-b F, --binary=F   The form of the binary output (F):

• MEMORY -- for memory type
   Data in a file will have the same data type as in memory

• FILE -- for the disk file type
   Data in a file will have the same data type as the corresponding dataset in an HDF5 file

• LE -- for pre-defined little endian type
   H5T_IEEE_F64LE

• BE -- for pre-defined big endian type
   H5T_STD_I32BE


h5dump -d /IntArray -o out_le.bin -b LE SDS.h5

od --width=24 -t x4 out_le.bin
0000000 00000000 00000001 00000002 00000003 00000004 00000005
0000030 0000000a 0000000b 0000000c 0000000d 0000000e 0000000f
0000060 00000014 00000015 00000016 00000017 00000018 00000019
0000110 0000001e 0000001f 00000020 00000021 00000022 00000023
0000140 00000028 00000029 0000002a 0000002b 0000002c 0000002d

Dumps a 32-bit integer dataset, IntArray, from SDS.h5 to a little endian binary file out_le.bin
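The od listing can be reproduced without the HDF5 tools, which makes it easier to see how the bytes map to values. This is a minimal sketch, assuming the 5x6 dataset holds the values 10*row + col (which is what the hexadecimal listing suggests); the in-memory buffer stands in for out_le.bin, and all variable names are illustrative only.

```python
import struct

ROWS, COLS = 5, 6

# Assumed dataset contents, inferred from the od listing: value = 10*row + col.
values = [10 * r + c for r in range(ROWS) for c in range(COLS)]

# Pack as little-endian 32-bit signed integers, 4 bytes per element as in the
# listing (this buffer is a stand-in for the bytes of out_le.bin).
raw = struct.pack("<%di" % len(values), *values)

# Decode the buffer again and format it like "od --width=24 -t x4":
# an octal byte offset followed by six 4-byte words in hexadecimal.
decoded = struct.unpack("<%di" % (len(raw) // 4), raw)
lines = []
for row in range(ROWS):
    offset = row * COLS * 4
    words = " ".join("%08x" % v for v in decoded[row * COLS:(row + 1) * COLS])
    lines.append("%07o %s" % (offset, words))
print("\n".join(lines))
```

Each output line covers 24 bytes, that is, one row of six integers, so the offsets advance by 30 in octal.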


h5diff

Using h5diff, you can
• compare two objects in the same file
• compare two objects between two files
• compare all objects between two files


h5diff SDS.h5 SDS2.h5

• Dataset: </IntArray> and </IntArray>
• 5 differences found


h5diff SDS.h5 SDS2.h5 -r /IntArray

Dataset: </IntArray> and </IntArray>
position        IntArray        IntArray        difference
------------------------------------------------------------
[ 0 0 ]         0               10              10
[ 1 0 ]         10              100             90
[ 2 0 ]         20              200             180
[ 3 0 ]         30              300             270
[ 4 0 ]         40              400             360
5 differences found
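The report format is easy to mimic on plain Python lists, which may help when reading h5diff output. This is only a sketch of an element-wise comparison, not the h5diff implementation; the array contents are assumptions chosen to match the values on the slide, and diff_report is a hypothetical helper.

```python
def diff_report(a, b):
    """Return (position, a_value, b_value, abs_difference) for differing elements."""
    report = []
    for i, (row_a, row_b) in enumerate(zip(a, b)):
        for j, (x, y) in enumerate(zip(row_a, row_b)):
            if x != y:
                report.append(((i, j), x, y, abs(x - y)))
    return report

# Assumed contents: the first array holds 10*row + col;
# the second differs from it in column 0 only.
sds = [[10 * r + c for c in range(6)] for r in range(5)]
sds2 = [row[:] for row in sds]
for r, v in enumerate([10, 100, 200, 300, 400]):
    sds2[r][0] = v

report = diff_report(sds, sds2)
for (i, j), x, y, d in report:
    print("[ %d %d ]  %-8d %-8d %d" % (i, j, x, y, d))
print("%d differences found" % len(report))
```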


h5repack

• Copies an HDF5 file to a new file with/without compression/chunking
   Remove unused space
   Apply compression filter
   Apply layout


h5repack: Applying filters

-f FILTER
   GZIP, to apply GZIP compression
   SZIP, to apply SZIP compression
   SHUF, to apply the HDF5 shuffle filter
   FLET, to apply the HDF5 checksum filter
   NBIT, to apply NBIT compression
   SOFF, to apply the HDF5 Scale/Offset filter
   NONE, to remove all filters

For example
h5repack -i SDS2.h5 -o SDS2_compressed.h5 -f /IntArray:GZIP=9

Remember that if your data is smaller than 1K, compression will not be applied; see the -m flag
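What a shuffle filter followed by GZIP does to a buffer can be sketched with the Python standard library. This is an illustration under stated assumptions, not the HDF5 filter code: zlib stands in for the deflate filter, the data are hypothetical, and shuffle and unshuffle are made-up helper names. The sketch assumes the buffer length is a multiple of the element size.

```python
import struct
import zlib

def shuffle(buf, elem_size):
    """Byte-shuffle: byte 0 of every element, then byte 1 of every element, ..."""
    return b"".join(buf[k::elem_size] for k in range(elem_size))

def unshuffle(buf, elem_size):
    """Inverse of shuffle()."""
    n = len(buf) // elem_size
    return bytes(buf[k * n + i] for i in range(n) for k in range(elem_size))

# Hypothetical data: 4096 little-endian 32-bit integers (16 KB, well over 1K).
values = list(range(4096))
raw = struct.pack("<%di" % len(values), *values)

shuffled = shuffle(raw, 4)
plain_size = len(zlib.compress(raw, 9))          # deflate only, level 9 as in GZIP=9
shuffled_size = len(zlib.compress(shuffled, 9))  # shuffle first, then deflate
print(len(raw), plain_size, shuffled_size)
```

Grouping the bytes of equal significance tends to produce long runs of similar bytes, which is why a shuffle step is often combined with compression for integer data.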


h5repack: Data layout

-l LAYOUT
   CHUNK, to apply chunking layout
   COMPA, to apply compact layout
   CONTI, to apply contiguous layout

For example
h5repack -i SDS.h5 -o SDS_chunk.h5 -l /Floats/FloatArray,/IntArray:CHUNK=2x3
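A quick way to judge a chunk size is to count how many chunks it produces for each dataset. The sketch below does this arithmetic for the 2x3 chunks of the example, using the dataset dimensions shown earlier by h5dump -H; chunk_grid is a hypothetical helper, not part of any HDF5 API.

```python
import math

def chunk_grid(dims, chunk):
    """Number of chunks along each dimension; edge chunks may be partial."""
    return [math.ceil(d / c) for d, c in zip(dims, chunk)]

chunk = (2, 3)
datasets = {"/Floats/FloatArray": (4, 3), "/IntArray": (5, 6)}

counts = {}
for name, dims in datasets.items():
    grid = chunk_grid(dims, chunk)
    counts[name] = math.prod(grid)
    print(name, dims, "->", grid, "=", counts[name], "chunks")
```

The 5x6 dataset does not divide evenly by 2 along its first dimension, so its last row of chunks is only partly filled.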


h5repart

Repartitions a file or family of files

For example
h5repart -m 200m int16kx16k.h5 part200m%d.h5

977 MB int16kx16k.h5

200 MB part200m0.h5
200 MB part200m1.h5
200 MB part200m2.h5
200 MB part200m3.h5
177 MB part200m4.h5
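The member sizes follow from simple division of the original file size by the member size given with -m. A minimal sketch of that arithmetic, using the sizes from the slide; family_sizes is a hypothetical helper, and the member names are generated from the part200m%d.h5 pattern.

```python
def family_sizes(total, member):
    """Sizes of the pieces when `total` is split into members of at most `member`."""
    full, rest = divmod(total, member)
    return [member] * full + ([rest] if rest else [])

sizes = family_sizes(977, 200)  # sizes in MB, taken from the slide
names = ["part200m%d.h5" % i for i in range(len(sizes))]
for name, size in zip(names, sizes):
    print("%d MB %s" % (size, name))
```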


h5import

Imports binary/ASCII data into an HDF5 file
h5import infile -c config_file [infile -c config_file2 ...] -outfile outfile

Example:
h5import float5x4x2.txt -c First_set.conf -o First_set.h5

Configuration file (First_set.conf):

PATH work/First-set
INPUT-CLASS TEXTFP
RANK 3
DIMENSION-SIZES 5 2 4
OUTPUT-CLASS FP
OUTPUT-SIZE 64
OUTPUT-ARCHITECTURE IEEE
OUTPUT-BYTE-ORDER LE
CHUNKED-DIMENSION-SIZES 2 2 2
MAXIMUM-DIMENSIONS 8 8 -1

Resulting file:

HDF5 "First_set.h5" {
GROUP "/" {
   GROUP "work" {
      DATASET "First-set" {
         DATATYPE H5T_IEEE_F64LE
         DATASPACE SIMPLE { ( 5, 2, 4 ) / ( 8, 8, H5S_UNLIMITED ) }
         DATA {
         (0,0,0): 1.01, 1.02, 1.03, 1.04,
         (0,1,0): 1.11, 1.12, 1.13, 1.14,
         (1,0,0): 1.21, 1.22, 1.23, 1.24,
         (1,1,0): 1.31, 1.32, 1.33, 1.34,
         (2,0,0): 1.41, 1.42, 1.43, 1.44,
         (2,1,0): 1.51, 1.52, 1.53, 1.54,
         (3,0,0): 2.01, 2.02, 2.03, 2.04,
         (3,1,0): 2.11, 2.12, 2.13, 2.14,
         (4,0,0): 2.21, 2.22, 2.23, 2.24,
         (4,1,0): 2.31, 2.32, 2.33, 2.34
         }
      }
   }
}
}


h5jam/h5unjam

• Adds/removes a file at the beginning of an HDF5 file

• Example:

• h5jam -- adds text to User Block
   h5jam -u test_ub.txt -i test_ub.h5

• h5unjam -- removes text from User Block
   h5unjam -i test_ub.h5 -u out_ub.txt -o out_ub.h5
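A user block works because the HDF5 file signature does not have to sit at byte 0: it may also start at offset 512, 1024, 2048 and so on, and everything before it is free for user data. The sketch below locates the signature in an in-memory image; it is an illustration only, the file image is hypothetical (just the 8-byte signature is modelled, not a real superblock), and find_superblock is a made-up helper name.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

def find_superblock(data):
    """Return the offset of the HDF5 signature, or None if it is not found.

    The signature is looked for at offset 0, then 512, 1024, 2048, ...
    """
    offset = 0
    while offset + len(HDF5_SIGNATURE) <= len(data):
        if data[offset:offset + len(HDF5_SIGNATURE)] == HDF5_SIGNATURE:
            return offset
        offset = 512 if offset == 0 else offset * 2
    return None

# Hypothetical file image: a 512-byte user block holding text, then HDF5 data.
user_text = b"This file was produced by run 42.\n"
user_block = user_text.ljust(512, b"\0")
image = user_block + HDF5_SIGNATURE + b"\0" * 64

offset = find_superblock(image)
print("user block size:", offset)
print("user block text:", image[:offset].rstrip(b"\0").decode())
```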


h5ls

• Lists selected information about file objects in the specified format

Example: h5ls -r SDS2.h5

/Floats                  Group
/Floats/DoubleArray      Dataset {10, 5}
/Floats/FloatArray       Dataset {4, 3}
/Floats/subs             Group
/IntArray                Dataset {5, 6}


gif2h5 / h52gif

• gif2h5 -- Converts a GIF file into HDF5
   gif2h5 apollo17_earth.gif apollo17_earth.h5

• h52gif -- Converts an HDF5 file into GIF
   h52gif apollo17_earth.h5 apollo17_earth2.gif -i /apollo17_earth.gif/Image0 -p "/apollo17_earth.gif/Global Palette"


h5copy

• Copies an object from one location to another location within a file or across files

• Available in 1.8.0 and later

[Figure: a source tree with / containing Floats (which holds FloatArray) and IntArray, and a destination tree with / containing FloatArray]


h5copy

usage: h5copy [OPTIONS] [OBJECTS...]
• -i, --input         input file name
• -o, --output        output file name
• -s, --source        source object name
• -d, --destination   destination object name
• -f, --flag <value>
   shallow   Copy only immediate members for groups
   soft      Expand soft links into new objects
   ext       Expand external links into new objects
   ref       Copy objects that are pointed to by references
   noattr    Copy object without copying attributes


h5copy

Example
h5copy -i SDS.h5 -o SDS_cp.h5 -s /Floats/FloatArray -d /FloatArray

[Figure: SDS.h5 with / containing Floats (which holds FloatArray) and IntArray; SDS_cp.h5 with / containing FloatArray]


h5copy -f shallow

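[Figure: three file trees. Source: / with groups floats and integers, where integers holds i1 and i2 and floats holds 64-bit and f32 with f1 and f2 below them. Copy of floats without the flag: 64-bit, f32, f1 and f2. Copy with -f shallow: only the immediate members 64-bit and f32]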


h5copy -f soft

[Figure: three file trees illustrating -f soft; labels: /, f1, and the soft link dset_SL pointing to /f1/f1]


h5copy -f ref

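[Figure: three file trees illustrating -f ref. One tree: / with d1, d2 and dset_ref, reference values 1895 and 763. One tree: / with d1, d2 and dset_ref, reference values 679 and 1287. One tree: / with dset_ref only, reference values 0 and 0]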


h5stat

• Prints various statistics about an HDF5 file
• Helps
   To troubleshoot size overhead in HDF5 files
   To choose specific object’s properties and storage strategies
• Available in 1.8.0 and later


h5check

Verifies if an HDF5 file is encoded according to the HDF5 File Format Specification
   Does not use the HDF5 library
   Serves as a watchdog that the HDF5 library implementation is compliant with the HDF5 File Format Specification
   Tool is NOT a part of the HDF5 source code distribution


How to use it?

h5check [-vn] <filename>
-vn   verboseness mode
   n=0 Terse: only prints if the file is compliant or not
   n=1 Default: prints its progress and all errors found
   n=2 Verbose: prints everything it knows, usually for debugging


Example: a compliant file

% h5check example1.h5
VALIDATING example1.h5
FOUND super block signature
VALIDATING the super block at 0...
VALIDATING the object header at 928...
VALIDATING the btree at 384...
FOUND btree signature.
VALIDATING the local heap at 96...
FOUND local heap signature.
…
Result: File is in compliance.


Example: a non-compliant file

% h5check invalid2.h5
FOUND super block signature
VALIDATING the super block at 0...
VALIDATING the object header at 928...
VALIDATING the btree at 384...
FOUND btree signature.
VALIDATING the SNOD at 1248...
FOUND SNOD signature.
VALIDATING the object header at 976...
check_sym(at 1248): Errors from check_obj_header()
decode_validate_messages(): Failure in type->decode().
H5O_sdspace_decode(): Bad version number in simple dataspace message.
VALIDATING the local heap at 96...
FOUND local heap signature.
Main(): Errors from check_obj_header().
decode_validate_messages(): Failure in type->decode().
H5O_attr_decode(): Can't decode attribute dataspace.
H5O_sdspace_decode(): Bad version number in simple dataspace message.
…
Result: File is not in compliance.


Using HDF5 Tools for Performance Tuning and Troubleshooting


Introduction

• HDF5 tools may be very useful for performance tuning and troubleshooting
   Discover objects and their properties in HDF5 files
      h5dump -p
   Get file size overhead information
      h5stat
   Get locations of the objects in a file
      h5ls
   Discover differences
      h5diff, h5ls
   Location of raw data
      h5ls -var


h5stat

• Prints various statistics about an HDF5 file
• Helps
   To troubleshoot size overhead in HDF5 files
   To choose specific object’s properties and storage strategies

• To use
   h5stat --help
   h5stat file.h5

• Full spec can be found at http://www.hdfgroup.uiuc.edu/RFC/HDF5/h5stat/

• Let us know if you need some “special” type of statistics


h5stat

• Reports two types of statistics:

• High-level information about objects (examples):
   Number of different objects (groups, datasets, datatypes) in a file
   Number of unique datatypes
   Size of raw data in a file

• Information about object’s structural metadata
   Sizes of structural metadata (total/free)
      Object headers, local and global heaps
      Sizes of B-trees
   Object headers fragmentation


h5stat

• Examples of high-level information:

File information
   # of unique groups: 10008
   # of unique datasets: 30
   # of unique named datatypes: 0
……………………
Max. # of links to object: 1
Max. depth of hierarchy: 4
Max. # of objects in group: 19
……………………
Group bins:
   # of groups of size 0: 10000
   # of groups of size 1 - 9: 7
   # of groups of size 10 - 99: 1
……………………
Max. dimension size of 1-D datasets: 1643
……………………
Dataset filters information:
   Number of datasets with
   ………………
   SZIP filter: 2
   ………………
   NBIT filter: 10
   USER-DEFINED filter: 1
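The "Group bins" part of the report groups objects by order of magnitude of their member count. The sketch below reproduces that kind of binning on a list of group sizes; it is not h5stat code, the decade binning rule is an assumption read off the labels in the output, the member counts are invented so that the totals match the slide, and size_bin is a hypothetical helper.

```python
from collections import Counter

def size_bin(n):
    """Label of the decade bin that a group with n members falls into."""
    if n == 0:
        return "0"
    low = 10 ** (len(str(n)) - 1)
    return "%d - %d" % (low, low * 10 - 1)

# Hypothetical member counts chosen to reproduce the bins on the slide.
group_sizes = [0] * 10000 + [1, 2, 3, 4, 5, 6, 9] + [19]

bins = Counter(size_bin(n) for n in group_sizes)
for label in ["0", "1 - 9", "10 - 99"]:
    print("# of groups of size %s: %d" % (label, bins[label]))

empty_fraction = bins["0"] / len(group_sizes)
print("fraction of empty groups: %.3f" % empty_fraction)
```

A distribution this heavily weighted towards empty groups is the kind of signal that suggests looking at more compact storage for groups.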


h5stat

• Conclusion:

• There are a lot of empty groups in the file; good candidate for compact group feature (h5repack -l ….)

• Some datasets use “user-defined” filters and may not be readable by HDF5 library

• SZIP compression is needed to read some datasets

Oh… my application uses buffers of size 1024 to read data… No wonder it crashes on reading…
Do I have all filters needed to read the data?


h5stat

• Examples of structural metadata information:

Object header size: (total/unused)
   Groups: 1808/72
   Datasets: 15792/832
………
Dataset storage information:
   Total raw data size: 6140688
………
Dataset datatype #3:
   Count (total/named) = (2/0)
   Size (desc./elmt) = (10/65535)
Dataset datatype #4:
   Count (total/named) = (1/0)
   Size (desc./elmt) = (10/32000)


h5stat

• Conclusions
• File size: 6228197
• 1.5% overhead (not bad at all!)
• There are some elements of size 65535 and 32000

Oh… Is it really what I want?
Should I use another datatype and take advantage of compression?
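The overhead figure can be checked from two numbers: the file size and the "Total raw data size" reported by h5stat. A minimal sketch of that arithmetic, assuming overhead is defined as everything in the file that is not raw data, expressed as a fraction of the file size; the slide does not spell out its exact definition or rounding.

```python
file_size = 6228197      # bytes, from the conclusions slide
raw_data_size = 6140688  # bytes, "Total raw data size" in the h5stat output

overhead_bytes = file_size - raw_data_size
overhead = overhead_bytes / file_size
print("overhead: %d bytes (%.2f%% of the file)" % (overhead_bytes, 100 * overhead))
```

Under this definition the result lands between 1.4% and 1.5%, in line with the figure quoted on the slide.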


Case study: Using HDF5 tools to debug a problem

• My application creates files on Windows with VS2005 and VS2003. I can read the VS2003 file but not the VS2005 one. h5dump reads both files OK and there are no differences. What am I doing wrong?

• h5diff good.h5 bad.h5
   Datatype: </Definitions/timespec> and </Definitions/timespec>
   1 differences found

• h5ls -var good.h5 /Definitions/timespec
   Type Location: 0:1:0:900

• h5debug good.h5 900
   Message Information:
   Type class: compound
   Size: 8 bytes

• h5debug bad.h5 900
   Message Information:
   Type class: compound
   Size: 16 bytes


Case study: Using HDF5 tools to debug a problem

• Conclusions
   Compound datatype “timespec” requires a different number of bytes on VS2005 (16 bytes; 2x8 bytes) and on VS2003 (8 bytes; 2x4 bytes)

Oh… How do I read my data back?
I assumed that my struct would need only 8 bytes for each element but it needs 16 bytes on VS2005. I need the H5Tget_native_type function to find the type of my data in memory
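The effect of the size mismatch can be reproduced with Python's struct module standing in for the two compilers. This is only an analogy to the case study, not HDF5 code: the two-field record, its values and the variable names are hypothetical, and the format strings simply model the 2x4-byte and 2x8-byte layouts.

```python
import struct

# Two 32-bit fields, modelling the 8-byte (2x4) layout ("<" = little endian, no padding).
size_2x4 = struct.calcsize("<ii")
# Two 64-bit fields, modelling the 16-byte (2x8) layout.
size_2x8 = struct.calcsize("<qq")
print(size_2x4, size_2x8)

# Hypothetical record written with the 16-byte layout: seconds and nanoseconds.
record = struct.pack("<qq", 1209340800, 500)

# Reading it with the matching layout recovers both values.
good = struct.unpack("<qq", record)
# Reading only 8 bytes with the 2x4 layout silently yields a wrong second field.
bad = struct.unpack("<ii", record[:8])
print(good, bad)
```

No error is raised in the mismatched read; the data are simply wrong, which is why asking the library for the in-memory type is safer than assuming a struct size.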


Questions?

End of Part II