HDF5 and H5py Tutorial

Jialin Liu, Data Analytics & Service Group

Feb 22, 2017


Goals

•  Introduce you to HDF5
   •  HDF5 data model
•  Python interface of HDF5: H5py
   •  Basic usage
   •  Best practice

What is HDF5?

•  HDF5 == Hierarchical Data Format, v5
•  Open file format
   –  Designed for high volume or complex data
•  Open source software
   –  Works with data in the format
•  A data model
   –  Structures for data organization and specification

HDF5 File

lat | lon | temp
----|-----|-----
 12 |  23 |  3.1
 15 |  24 |  4.2
 17 |  21 |  3.6

An HDF5 file is a container that holds data objects.
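
As a rough illustration of the container idea, the sketch below stores the small lat/lon/temp table as one data object in a new file. The file name `example.h5`, the dataset name `table`, and the use of a NumPy structured array are assumptions made for this example; they are not taken from the slides.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "example.h5")

# The lat/lon/temp rows from the slide, as a NumPy structured array.
table = np.array(
    [(12, 23, 3.1), (15, 24, 4.2), (17, 21, 3.6)],
    dtype=[("lat", "i4"), ("lon", "i4"), ("temp", "f8")],
)

# The file is the container; the dataset is one data object inside it.
with h5py.File(path, "w") as f:
    f.create_dataset("table", data=table)

# Reopen read-only and list what the container holds.
with h5py.File(path, "r") as f:
    names = list(f.keys())
    stored = f["table"][...]

print(names, stored["temp"])
```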

HDF5 Data Model

HDF5 objects: File, Dataset, Link, Group, Attribute, Dataspace, Datatype

HDF5 Dataset

•  HDF5 datasets organize and contain data elements.
•  HDF5 datatype describes individual data elements.
•  HDF5 dataspace describes the logical layout of the data elements.

[Figure: a dataset is a multi-dimensional array of identically typed data elements, plus specifications for a single data element and for the array dimensions.]

HDF5 Datatype: Integer, 32-bit, LE

HDF5 Dataspace: Rank = 2, Dimensions: Dim[0] = 4, Dim[1] = 5
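
The datatype and dataspace in the figure can be reproduced from h5py roughly as follows. This is a minimal sketch: the file name `dataset_demo.h5` and the dataset name `grid` are hypothetical, and the values written are arbitrary.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "dataset_demo.h5")  # hypothetical name

with h5py.File(path, "w") as f:
    # Datatype: 32-bit little-endian integer ("<i4").
    # Dataspace: rank 2 with Dim[0] = 4 and Dim[1] = 5.
    dset = f.create_dataset("grid", shape=(4, 5), dtype="<i4")
    dset[...] = np.arange(20, dtype="<i4").reshape(4, 5)

    dtype = dset.dtype  # describes a single data element
    rank = dset.ndim    # number of dimensions
    dims = dset.shape   # extent of each dimension

print(dtype, rank, dims)
```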

HDF5 Groups and Links

HDF5 groups and links organize data objects.

Every HDF5 file has a root group ("/").

[Figure: a tree rooted at "/" containing the groups SimOut and Viz and these objects:]

lat | lon | temp
----|-----|-----
 12 |  23 |  3.1
 15 |  24 |  4.2
 17 |  21 |  3.6

Experiment Notes: Serial Number: 99378920; Date: 3/13/09; Configuration: Standard 3

Parameters: 10; 100; 1000

Timestep: 36,000
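
A small sketch of groups and links in h5py, loosely modeled on the figure. Which object sits under which group is an assumption here (the flattened figure does not preserve that), and the link name `ParamsLink` is hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "groups_demo.h5")  # hypothetical name

with h5py.File(path, "w") as f:
    root_name = f.name  # the File object doubles as the root group

    simout = f.create_group("SimOut")
    viz = f.create_group("Viz")

    simout.create_dataset("Parameters", data=np.array([10, 100, 1000]))
    simout.create_dataset("Timestep", data=36000)

    # Hard link: a second name for the same dataset.
    viz["Timestep"] = simout["Timestep"]
    # Soft link: stores a path that is resolved on access.
    viz["ParamsLink"] = h5py.SoftLink("/SimOut/Parameters")

    groups = sorted(f.keys())
    linked_params = viz["ParamsLink"][...]
    linked_step = int(viz["Timestep"][()])

print(root_name, groups, linked_params, linked_step)
```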

HDF5 Attributes

•  Typically contain user metadata
•  Have a name and a value
•  Attributes “decorate” HDF5 objects
•  Value is described by a datatype and a dataspace
•  Analogous to a dataset, but attributes do not support partial I/O operations, nor can they be compressed or extended

> h5dump
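
In h5py, attributes are reached through the `.attrs` mapping on a file, group, or dataset. The sketch below is illustrative only: the attribute names (`units`, `serial_number`, `date`) and the dataset name `temp` are made up for the example.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "attrs_demo.h5")  # hypothetical name

with h5py.File(path, "w") as f:
    dset = f.create_dataset("temp", data=np.array([3.1, 4.2, 3.6]))

    # Attributes "decorate" an object: each has a name and a value.
    dset.attrs["units"] = "degC"             # hypothetical metadata
    dset.attrs["serial_number"] = 99378920
    f.attrs["date"] = "3/13/09"              # attributes also attach to the file

    attr_names = sorted(dset.attrs.keys())
    serial = int(dset.attrs["serial_number"])
    file_date = f.attrs["date"]

print(attr_names, serial, file_date)
```

Running `h5dump` on the resulting file should list these attributes next to the objects they belong to.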

H5py

The h5py package is a Pythonic interface to the HDF5 binary data format.

•  H5py provides an easy-to-use high-level interface, which allows you to store huge amounts of numerical data
•  Easily manipulate that data from NumPy
•  H5py uses straightforward NumPy and Python metaphors, like dictionary and NumPy array syntax
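
The dictionary and NumPy array metaphors can be seen in a few lines. This is a sketch with made-up names (`a`, `b`, `metaphors_demo.h5`), not code from the slides.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "metaphors_demo.h5")  # hypothetical name

with h5py.File(path, "w") as f:
    # Dictionary-style assignment creates datasets.
    f["a"] = np.arange(10)
    f["b"] = np.ones((3, 3))

    names = sorted(f.keys())  # dictionary-style listing
    has_a = "a" in f          # dictionary-style membership test

    evens = f["a"][::2]       # NumPy-style slicing returns an ndarray
    f["a"][0] = 99            # NumPy-style assignment writes to the file
    first = int(f["a"][0])

print(names, has_a, evens, first)
```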

H5py at NERSC

Serial h5py

Anaconda includes the h5py package
•  H5py 2.6.0
•  Built-in HDF5 library, 1.8.17
•  Easy use with other packages
•  No parallel support

Works on both Edison and Cori

H5py at NERSC

We provide parallel h5py at NERSC

H5py-parallel @ NERSC
•  H5py 2.6.0
•  Compiled with cray-hdf5-parallel/1.8.16
•  No conflict with Anaconda's serial h5py
   –  import h5py (perfectly fine)
   –  Can use together with Anaconda
•  Up-to-date features
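
One way to check which build is active in a given Python session (serial or parallel, and which HDF5 library it was built against) is to query h5py itself. A minimal sketch; the printed versions depend on the environment and will not necessarily match those on the slides.

```python
import h5py

h5py_version = h5py.version.version       # version of the h5py package
hdf5_version = h5py.version.hdf5_version  # version of the HDF5 library behind it
has_mpi = bool(h5py.get_config().mpi)     # True only for a parallel (MPI) build

print("h5py:", h5py_version)
print("HDF5:", hdf5_version)
print("parallel support:", has_mpi)
```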

Basic Usage

Basic Usage of H5py

Read with H5py

File name; read or write mode
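
The code shown on this slide is not preserved in the transcript. The sketch below shows the two arguments the callouts point at, the file name and the mode, using `h5py.File`. The file and dataset names are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "basic_demo.h5")  # hypothetical name

# "w": create the file, truncating it if it already exists.
with h5py.File(path, "w") as f:
    f["x"] = np.arange(5)

# "r": read-only.
with h5py.File(path, "r") as f:
    read_mode = f.mode
    data = f["x"][...]

# "a": read/write, creating the file if it does not exist.
with h5py.File(path, "a") as f:
    f["y"] = data * 2
    names = sorted(f.keys())

print(read_mode, data, names)
```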

Basic Usage of H5py: Dealing with File

Only on compute node

Need MPI for parallel h5py

No difference with serial h5py
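
The slide's code is not preserved. A minimal sketch of opening a file with parallel h5py, assuming an MPI-enabled h5py build together with mpi4py; it falls back to serial mode when MPI support is absent, so it runs either way. The path and dataset name are hypothetical, and a real multi-node job would need a path on a shared file system.

```python
import os
import tempfile

import h5py
import numpy as np

# Use the MPI-IO driver only when this h5py build supports it.
if h5py.get_config().mpi:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank, size = comm.rank, comm.size
    file_kwargs = {"driver": "mpio", "comm": comm}
else:
    # Serial fallback so the sketch still runs without parallel h5py.
    rank, size = 0, 1
    file_kwargs = {}

# Hypothetical path; every rank must see the same file.
path = os.path.join(tempfile.gettempdir(), "parallel_demo.h5")

with h5py.File(path, "w", **file_kwargs) as f:
    # Dataset creation is collective: every rank makes the same call.
    dset = f.create_dataset("ranks", shape=(size, 4), dtype="i4")
    # Each rank then writes its own row.
    dset[rank, :] = np.full(4, rank, dtype="i4")

with h5py.File(path, "r", **file_kwargs) as f:
    my_row = f["ranks"][rank, :]

print(rank, my_row)
```

Apart from the `driver` and `comm` arguments, the calls are the same as in serial h5py. For a parallel run the script would be started through an MPI launcher (for example `srun` or `mpirun`) on a compute node.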

Playing with the data: More than numpy

Use h5py in Astronomy

Path to the dataset; dataset name

Returned a dataset object
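
The astronomy example itself is lost in the transcript. The sketch below only illustrates the callouts: a dataset is addressed by its path, and indexing the file returns a dataset object rather than an in-memory array. The path `survey/plate_0001/flux` is invented for the example.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "astro_demo.h5")  # hypothetical name

with h5py.File(path, "w") as f:
    # Intermediate groups in the path are created automatically.
    f.create_dataset("survey/plate_0001/flux",
                     data=np.arange(50, dtype="f4").reshape(10, 5))

with h5py.File(path, "r") as f:
    dset = f["survey/plate_0001/flux"]  # path to the dataset + dataset name

    is_dataset = isinstance(dset, h5py.Dataset)  # a handle, not yet a NumPy array
    full_name = dset.name
    shape = dset.shape

    arr = dset[()]  # reading happens here; the result is an ndarray

print(is_dataset, full_name, shape, type(arr).__name__)
```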

Playing with the data: More than numpy

Chunking

Compression

Slicing
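
The three features named on this slide can be combined in one short sketch. The dataset name `image`, the 100 x 100 chunk shape, and the gzip level are arbitrary choices for illustration, not recommendations from the slides.

```python
import os
import tempfile

import h5py

path = os.path.join(tempfile.mkdtemp(), "chunk_demo.h5")  # hypothetical name

with h5py.File(path, "w") as f:
    dset = f.create_dataset(
        "image",
        shape=(1000, 1000),
        dtype="f4",
        chunks=(100, 100),    # chunking: stored as 100 x 100 tiles
        compression="gzip",   # compression: applied per chunk
        compression_opts=4,   # gzip level
    )

    # Slicing: write one tile, then read a window that overlaps it.
    dset[0:100, 0:100] = 1.0
    block = dset[50:150, 50:150]

    chunks = dset.chunks
    compression = dset.compression

print(chunks, compression, block.shape, block.sum())
```

Only the chunks that a slice touches have to be read and decompressed, which is where the chunk shape matters for access patterns.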

Best Practice

Strive for Best Performance at NERSC

Really Challenging: More than Single IO Layer

h5py mpi4py numpy

1. Optimal HDF5 file creation

2.25X

Choose the most modern format [optional]
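
In h5py, one way to request the newest file format is the `libver` argument of `h5py.File`. Whether this is the exact setting behind the 2.25X figure is an assumption; the slide's code is not preserved. File and dataset names are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "libver_demo.h5")  # hypothetical name

# libver="latest" lets HDF5 use its most recent file-format features.
# Files written this way may not be readable by older HDF5 versions.
with h5py.File(path, "w", libver="latest") as f:
    f["x"] = np.arange(10)
    bounds = f.libver  # (low, high) format bounds in effect

with h5py.File(path, "r") as f:
    data = f["x"][...]

print(bounds, data)
```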

2. Speed up the I/O with collective I/O

2X

Collective IO reduces the IO contention on server side

Independent IO

Collective IO
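
The two code variants on this slide are lost. In parallel h5py, collective mode is requested with the `dset.collective` context manager; a plain assignment outside it is independent I/O. The sketch assumes an MPI-enabled h5py build and substitutes a no-op context in serial mode so it still runs. Names and sizes are hypothetical.

```python
import contextlib
import os
import tempfile

import h5py
import numpy as np

mpi_enabled = bool(h5py.get_config().mpi)
if mpi_enabled:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank, size = comm.rank, comm.size
    file_kwargs = {"driver": "mpio", "comm": comm}
else:
    rank, size = 0, 1  # serial fallback
    file_kwargs = {}

# Hypothetical path; every rank must see the same file.
path = os.path.join(tempfile.gettempdir(), "collective_demo.h5")
row = np.full(8, rank, dtype="f8")

with h5py.File(path, "w", **file_kwargs) as f:
    dset = f.create_dataset("x", shape=(size, 8), dtype="f8")

    # Independent IO would simply be:  dset[rank, :] = row
    # Collective IO: all ranks enter the context and write together.
    ctx = dset.collective if mpi_enabled else contextlib.nullcontext()
    with ctx:
        dset[rank, :] = row

with h5py.File(path, "r", **file_kwargs) as f:
    mine = f["x"][rank, :]

print(rank, mine)
```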

3. Use low-level API in H5py

Get closer to the HDF5 C library, fine tuning
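
h5py exposes thin wrappers over the HDF5 C API in modules such as `h5py.h5s`, `h5py.h5d`, and `h5py.h5t`. The sketch below creates and writes a dataset through them; it illustrates the style only and is not the tuning code from the slide. File and dataset names are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np
from h5py import h5d, h5s, h5t

path = os.path.join(tempfile.mkdtemp(), "lowlevel_demo.h5")  # hypothetical name
arr = np.arange(20, dtype=np.int32).reshape(4, 5)

with h5py.File(path, "w") as f:
    # f.id is the low-level file identifier behind the high-level File object.
    space = h5s.create_simple(arr.shape)                       # cf. H5Screate_simple
    dsid = h5d.create(f.id, b"data", h5t.NATIVE_INT32, space)  # cf. H5Dcreate
    dsid.write(h5s.ALL, h5s.ALL, arr)                          # cf. H5Dwrite

    dims = dsid.get_space().get_simple_extent_dims()
    # The same dataset is visible through the high-level interface.
    readback = f["data"][...]

print(dims, readback)
```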

4. Avoid type casting on the fly

vs

Reduces the IO time from 527 seconds to 1.3 seconds when writing a 100x100x100 array with 5x5x5 procs
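
The two code variants compared on this slide are not preserved. One common source of on-the-fly casting is writing a float64 NumPy array into a dataset created with h5py's default dtype, which is single precision. The sketch below shows that case and the matching-dtype alternative; it is an assumed illustration, not necessarily the cast the slide measured.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "dtype_demo.h5")  # hypothetical name
arr = np.arange(1000, dtype="f8").reshape(10, 10, 10)     # float64 in memory

with h5py.File(path, "w") as f:
    # No dtype given: the dataset is float32, so each write converts
    # float64 -> float32 on the fly.
    cast = f.create_dataset("cast", shape=arr.shape)
    cast[...] = arr

    # dtype matches the in-memory array: no conversion is needed.
    same = f.create_dataset("same", shape=arr.shape, dtype=arr.dtype)
    same[...] = arr

    cast_dtype = cast.dtype
    same_dtype = same.dtype

print(cast_dtype, same_dtype)
```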

High Performance H5py with Sample Codes

http://www.nersc.gov/users/data-analytics/data-management/i-o-libraries/hdf5-2/h5py/

Thanks

Stay Tuned