problem minimize file size vr...

2
SC14 Poster @ New Orleans, LA, USA 1621, Nov. 2014 Partially supported by Efficient Data Compression by Efficient Use of HDF5 Format with Bit Segmentation Method for Visualization and Analysis Katsumi Hagita(National Defense Academy of JAPAN), Manabu Omiya(Hokkaido Univ.), Takashi Honda(Zeon), Masao Ogino(Nagoya Univ.) Data Transfer Problem Concept of JHPCNDF Minimize file size for VR visualization. Advantage of JHPCNDF Decodable without special API. JHPCNDF is effective for almost software and user application supporting HDF5. Splitting a few HDF5 files by simple converter application. Recoverable with lossless. Peta scale Strorage Relatively narrow LAN and small disk space at local environment H P C Top Supercomputers Neighboring supercomputers Local workstation (PC) Visualization Analysis Lossless Look like random noise Bad Compression IEEE754 exponent fraction sign (11bit) (53bit) Jointed Hierarchical Precision Compression Number Data Format (JHPCNDF) Highly correlated data Good Compression Original data Segmentation 1 Segmentation 2 XOR Record exponent fraction sign (11bit) (53bit) HDF5 HDF5 HDF5 union fi32{ float f; int i32; }; union fi32 fival; double logallo=log(allowerr)/log(2.0); fval=frexp(fval0,&ival); ival2=(int)(logallo+ival1); sval=(int)(23ival2+1); if(sval>23) sval=23; do { sval‐‐; fival.f=fval0; fival.i32=(fival.i32 >> sval); fival.i32=(fival.i32 << sval); fval1=fival.f; } while ((fval1fval0)*(fval1fval0)>allowerr*allowerr); fival.f=fval0; fival1.f=fval1; ixval=(fival1.i32 ^ fival.i32); exponent fraction sign (8bit) (23bit) Padding by zero bit Reduction of Data Size Data Size of zero padding is almost zero. Huffman coding Segmentation XOR Example of implementation This HDF5 file can be used by the standard API. Nbit Filter packing nbit data on output by stripping off all unused bits and unpacking on input (restoring extra bits) ScaleOffset Filter Truncate value to a minimum number of bits before storing by scale and/or offset Relative error < 5*(Dscale_factor+1)) Related method of standard HDF5

Upload: others

Post on 14-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Problem Minimize file size VR visualization.sc14.supercomputing.org/.../poster_files/post173s2-file2.pdfSC14 Poster @ New Orleans, LA, USA 16‐21, Nov. 2014 Partially supported by

SC14 Poster  @ New Orleans, LA, USA  16‐21, Nov. 2014 Partially supported by

Efficient Data Compression by Efficient Use of HDF5 Formatwith Bit Segmentation Method  for Visualization and Analysis

Katsumi Hagita(National Defense Academy of JAPAN), Manabu Omiya(Hokkaido Univ.), Takashi Honda(Zeon), Masao Ogino(Nagoya Univ.)

Data Transfer Problem

Concept of JHPCN‐DF

Minimize file size for VR visualization.

Advantage of JHPCN‐DF

• Decodable without special API.• JHPCN‐DF is effective for almost software 

and user application supporting HDF5.

• Splitting a few HDF5 files by simple converter application.

• Recoverable with lossless.

Peta scaleStrorage

Relatively narrow LAN and small disk space at local environment

H P C

Top SupercomputersNeighboring supercomputersLocal workstation (PC)Visualization

Analysis

LosslessLook like random noiseBad Compression

IEEE754 exponent fraction

sign (11bit) (53bit)

Jointed Hierarchical Precision Compression Number ‐ Data Format  (JHPCN‐DF)

Highly correlated dataGood Compression

Original data

Segmentation 1

Segmentation 2

XOR

Record

exponent fraction

sign (11bit) (53bit)

HDF5 HDF5 HDF5

union fi32{float f;int i32;};union fi32 fival;double logallo=log(allowerr)/log(2.0);

fval=frexp(fval0,&ival);ival2=(int)(‐logallo+ival‐1);sval=(int)(23‐ival2+1);if(sval>23) sval=23;do {sval‐‐;fival.f=fval0;fival.i32=(fival.i32 >> sval);fival.i32=(fival.i32 << sval);fval1=fival.f;} while ((fval1‐fval0)*(fval1‐fval0)>allowerr*allowerr);

fival.f=fval0;fival1.f=fval1;ixval=(fival1.i32  ^ fival.i32);

exponent fraction

sign (8bit) (23bit)

Padding by zero bit

Reduction of Data Size

Data Sizeof zero paddingis almost zero.

Huffmancoding

Segmentation

XOR

Example of implementation

This HDF5 file can be used by the standard API.

• N‐bit Filterpacking n‐bit data on output by strippingoff all unused bitsand

unpacking on input (restoring extra bits)

• Scale‐Offset FilterTruncate value to a minimum number ofbits before storing by scale and/or offsetRelative error < 5*(D‐scale_factor+1)) 

Related method of standard HDF5

Page 2: Problem Minimize file size VR visualization.sc14.supercomputing.org/.../poster_files/post173s2-file2.pdfSC14 Poster @ New Orleans, LA, USA 16‐21, Nov. 2014 Partially supported by

Conclusion and Future

Acknowledgement

This work is partially supported by "Joint Usage/Research Center for Interdisciplinary Large‐scale Information Infrastructures" and “High performance Computing Infrastructure” in Japan.JHPCN 14‐NA28, HPCI hp140191 (JHPCN 10‐MD01, 11‐MD02, 12‐MD03, jh130028‐NA19, HPCI hp130056, hp130062, hp130122)

This work is partially supported by HPC project of Nagoya University.

Test for plasma PIC simulation

Test for FEM simulation

Comparison for allowerr =0.01, 0.02, 0.05

0.010.02

Finite element analysis of 0.24 million elements and 0.38 million nodes

8.3MBfor double

3.3MBfor 1.0e‐6

1.3MBfor 1.0e‐3

Good for analysis

Good for visualization

Test for block copolymerReduce data size from 1182MB to 42MBto present isosurface of phase separated block copolymers.  (mesh size: 512^3)

• Concept of JHPCN‐DF works and shows excellent results for various type of  simulation! JHPCN‐DF is a powerful tool for all large scale simulations.

• Distribute know‐how of JHPCN‐DF for all supercomputing.

• We will upgrade as  a filter of HDF5.

• We consider post process of HDF5 using MPI‐IO to be compressed by JHPCN‐DF as system software coupled with job scheduler.

• Treatment of the last bit is related to rounding mode control of IEEE 754.

allowerr =0.05

exponent fraction

sign (8bit) (23bit)exponent fraction

sign (8bit) (23bit)

0.0 6,336 MB ‐‐‐‐

0.001 3,369 MB 4,503 MB

0.01 1,940 MB 5,298 MB

0.02 1,413 MB 5,586 MB

0.05 883.1 MB 5,712 MB

0.1 568.5 MB 5,857 MB

Size of 20 HDF5 files of 100 timesteps150,000 ions and 150,000 electrons

• Increase number of  examples for various simulation fields (in progress)• Nonhydrostatic ICosahedral Atmospheric Model (NICAM)• Fluid dynamics simulation• 0.1 billion powder simulation

0.05