The HDF Group (www.hdfgroup.org). Milestone 5.1: Initial POSIX Function Shipping Demonstration. Jerome Soumagne, Quincey Koziol. 09/24/2013. © 2013 The HDF Group


The HDF Group
www.hdfgroup.org

Milestone 5.1: Initial POSIX Function Shipping Demonstration

Jerome Soumagne, Quincey Koziol

09/24/2013

© 2013 The HDF Group

Overview – Mercury

• Mercury “Function Shipper”: RPC layer that supports
  • Non-blocking transfers
  • Large data arguments (w/RMA)
  • Native transport protocols of HPC systems
• Mercury serves as a basis for higher-level frameworks that need to operate on/store/access data remotely
  • HDF5 IOD virtual object plugin
  • IOFSL I/O forwarding scalability layer
  • Storage systems
  • Analysis frameworks

Overview – Mercury

• Already largely presented in previous milestones
• No major modification of Mercury for this deliverable in order to support POSIX calls
• But Mercury is still being improved:
  • Performance tuning on InfiniBand cluster
  • Support for additional network transports is being added (TCP / ibverbs / SSM)
• Paper submitted at end of Q4 now accepted and being presented at IEEE Cluster 2013:
  • J. Soumagne, D. Kimpe, J. Zounmevo, M. Chaarawi, Q. Koziol, A. Afsahi, and R. Ross, “Mercury: Enabling Remote Procedure Call for High-Performance Computing”, IEEE International Conference on Cluster Computing, Sep 2013

Fast Forward Stack – Function Shipping

[Diagram. Components shown: HDF5 API, VOL, Native (H5), IOD VOL, VFL, Mercury (Client), Network, Mercury (Server), IOD VOL]

POSIX Function Shipping (Example)

[Diagram. Components shown: HDF5 API, VOL, Native (H5), IOD VOL, VFL, sec2, POSIX I/O, Mercury (Client), Network, Mercury (Server), POSIX I/O, File System]

Mercury POSIX

• Support POSIX I/O routines through Mercury
• Completely separate package built on top of Mercury called: Mercury POSIX (lightweight library + server)
• Design keys:
  • Support 32/64 bit platforms and large files
  • No modification of original source code that uses POSIX I/O (e.g., HDF5 sec2 driver)
  • Redirects I/O to Mercury server with dynamic linking
• Can make use of all the transports available through Mercury (although MPI dynamic connection is not very flexible and is not always available)
• Code for supporting POSIX routines is automatically generated inside Mercury POSIX by using Boost preprocessor macros

Mercury POSIX – Stub Generation

• Most routines are generated with a one-line macro
• Built on top of existing Mercury/Boost macros
• However, supporting variable-argument routines requires some extra lines to create encoding / decoding routines that check argument flags, etc.
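The variable-argument case can be sketched with open(), whose third argument (the mode) is only present when O_CREAT is set in the flags. The block below is a minimal illustration of checking the flags before reading the optional argument; `demo_open_args_t` and `demo_collect_open_args` are hypothetical names and are not part of Mercury POSIX.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical container for the arguments of open(); not a Mercury type. */
typedef struct {
    const char *path;
    int flags;
    int has_mode;   /* 1 when a mode argument was supplied and read */
    mode_t mode;
} demo_open_args_t;

/* Collect open()-style arguments. The third argument is only read when
 * O_CREAT is set, because reading an absent variadic argument is
 * undefined behavior. */
demo_open_args_t
demo_collect_open_args(const char *path, int flags, ...)
{
    demo_open_args_t args = { path, flags, 0, 0 };

    assert(path != NULL);
    if (flags & O_CREAT) {
        va_list ap;

        va_start(ap, flags);
        /* mode_t may be narrower than int, so it is fetched as int. */
        args.mode = (mode_t) va_arg(ap, int);
        va_end(ap);
        args.has_mode = 1;
    }
    return args;
}
```

An encoding routine built on this would serialize `mode` only when `has_mode` is set, and the decoding side would make the same check on the decoded flags. Other flags that imply a mode argument (for example O_TMPFILE on Linux) are not handled in this sketch.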

Mercury POSIX – Stub Generation

• Two main macros:

    /* Non-bulk routines */
    MERCURY_POSIX_GEN_STUB(func_name,
        ret_type,
        in_types,
        out_types)

    /* Bulk routines */
    MERCURY_POSIX_GEN_BULK_STUB(func_name,
        ret_type,
        in_types,
        out_types,
        bulk_read) /* 1/0 if reading/writing bulk data */

Mercury POSIX – Stub Generation

• Example, showing results of the following macro:

    /* off_t lseek(int fildes, off_t offset, int whence) */
    MERCURY_POSIX_GEN_STUB(lseek,
        hg_off_t,
        (hg_int32_t)(hg_off_t)(hg_int32_t),
    )

Mercury POSIX – Stub Generation

• Generate input structure

    typedef struct
    {
        hg_int32_t in_param_0;
        hg_off_t in_param_1;
        hg_int32_t in_param_2;
    } lseek_in_t;
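How one list of argument types can expand into a structure may be easier to see in a small standalone example. The sketch below uses a plain-C X-macro instead of the Boost preprocessor sequences that Mercury POSIX uses, so it is a swapped-in technique and not the actual macro implementation. All `DEMO_`/`demo_` names are hypothetical, and `int32_t`/`int64_t` stand in for `hg_int32_t`/`hg_off_t`.

```c
#include <assert.h>
#include <stdint.h>

/* One X-macro entry per argument: (type, field name). The type names
 * are stand-ins for hg_int32_t / hg_off_t. */
#define DEMO_LSEEK_IN_FIELDS(X) \
    X(int32_t, in_param_0)      \
    X(int64_t, in_param_1)      \
    X(int32_t, in_param_2)

/* Expand each entry into a struct member declaration. */
#define DEMO_DECLARE_FIELD(type, name) type name;
typedef struct {
    DEMO_LSEEK_IN_FIELDS(DEMO_DECLARE_FIELD)
} demo_lseek_in_t;

/* Expand each entry into "+ 1" to count the fields at compile time. */
#define DEMO_COUNT_FIELD(type, name) + 1
enum { DEMO_LSEEK_IN_NFIELDS = 0 DEMO_LSEEK_IN_FIELDS(DEMO_COUNT_FIELD) };

/* Expand each entry into "+ sizeof(type)": the size of the fields
 * when packed back to back, without struct padding. */
#define DEMO_SIZE_FIELD(type, name) + sizeof(type)
enum { DEMO_LSEEK_IN_PACKED_SIZE = 0 DEMO_LSEEK_IN_FIELDS(DEMO_SIZE_FIELD) };

static_assert(DEMO_LSEEK_IN_NFIELDS == 3, "lseek takes three arguments");
```

The same field list can be expanded a second time with a different per-entry macro, which is the property that lets a single macro invocation produce both a structure and the routines that process it.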

Mercury POSIX – Stub Generation

• Generate proc routine for input structure

    static __inline__ int
    hg_proc_lseek_in_t(hg_proc_t proc, void *data)
    {
        int ret = HG_SUCCESS;
        lseek_in_t *struct_data = (lseek_in_t *) data;

        hg_proc_hg_int32_t(proc, &struct_data->in_param_0);
        hg_proc_hg_off_t(proc, &struct_data->in_param_1);
        hg_proc_hg_int32_t(proc, &struct_data->in_param_2);

        return ret;
    }
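A proc routine of this shape is typically used for both directions: the same field-by-field walk either encodes into a buffer or decodes from it, depending on the state of the processor. The block below is a minimal sketch of that idea under stated assumptions. `demo_proc_t` and the `demo_proc_*` functions are hypothetical and do not reproduce Mercury's `hg_proc_t` API, and fields are copied in host byte order with no endianness handling.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef enum { DEMO_ENCODE, DEMO_DECODE } demo_op_t;

/* Hypothetical processor state: a direction plus a cursor into a buffer. */
typedef struct {
    demo_op_t op;
    unsigned char *buf;
    size_t size;   /* total buffer size in bytes */
    size_t pos;    /* current cursor */
} demo_proc_t;

/* Copy nbytes between the buffer and *data, depending on proc->op.
 * Returns 0 on success, -1 if the remaining buffer space is too small. */
int demo_proc_raw(demo_proc_t *proc, void *data, size_t nbytes)
{
    assert(proc != NULL && data != NULL);
    if (proc->size - proc->pos < nbytes)
        return -1;
    if (proc->op == DEMO_ENCODE)
        memcpy(proc->buf + proc->pos, data, nbytes);
    else
        memcpy(data, proc->buf + proc->pos, nbytes);
    proc->pos += nbytes;
    return 0;
}

/* Stand-in for the generated lseek_in_t. */
typedef struct {
    int32_t in_param_0;
    int64_t in_param_1;
    int32_t in_param_2;
} demo_lseek_in_t;

/* Process each field in order; the same code path encodes and decodes. */
int demo_proc_lseek_in(demo_proc_t *proc, demo_lseek_in_t *in)
{
    if (demo_proc_raw(proc, &in->in_param_0, sizeof in->in_param_0) != 0)
        return -1;
    if (demo_proc_raw(proc, &in->in_param_1, sizeof in->in_param_1) != 0)
        return -1;
    if (demo_proc_raw(proc, &in->in_param_2, sizeof in->in_param_2) != 0)
        return -1;
    return 0;
}
```

Because both directions share one routine, the client and the server cannot disagree on field order. Unlike the slide code, this sketch checks the result of every field call and stops at the first failure.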

Mercury POSIX – Stub Generation

• Generate output structure

    typedef struct
    {
        hg_off_t ret;
    } lseek_out_t;

Mercury POSIX – Stub Generation

• Generate proc routine for output structure

    static __inline__ int
    hg_proc_lseek_out_t(hg_proc_t proc, void *data)
    {
        int ret = HG_SUCCESS;
        lseek_out_t *struct_data = (lseek_out_t *) data;

        hg_proc_hg_int64_t(proc, &struct_data->ret);

        return ret;
    }

Mercury POSIX – Stub Generation

• Generate client stub (simplified version)

    hg_off_t
    lseek(hg_int32_t in_param_0, hg_off_t in_param_1, hg_int32_t in_param_2)
    {
        lseek_in_t in_struct;
        lseek_out_t out_struct;
        hg_off_t ret;

        /* Initialization */
        ...

Mercury POSIX – Stub Generation

        /* Register function if not registered */
        MERCURY_REGISTER("lseek", lseek_in_t, lseek_out_t);

        /* Fill input structure */
        in_struct.in_param_0 = in_param_0;
        in_struct.in_param_1 = in_param_1;
        in_struct.in_param_2 = in_param_2;

        /* Forward call to remote addr and get a new request */
        HG_Forward(addr, id, &in_struct, &out_struct, &request);

        /* Wait for call to be executed */
        HG_Wait(request, HG_MAX_IDLE_TIME, &status);

        /* Get output parameters */
        ret = out_struct.ret;

        return ret;
    }

Mercury POSIX – Stub Generation

• Generate server stub (simplified version)

    static int
    lseek_cb(hg_handle_t handle)
    {
        lseek_in_t in_struct;
        lseek_out_t out_struct;

        hg_int32_t in_param_0;
        hg_off_t in_param_1;
        hg_int32_t in_param_2;

        hg_off_t ret;

Mercury POSIX – Stub Generation

        /* Get input buffer */
        HG_Handler_get_input(handle, &in_struct);

        /* Get parameters */
        in_param_0 = in_struct.in_param_0;
        in_param_1 = in_struct.in_param_1;
        in_param_2 = in_struct.in_param_2;

        /* Call function */
        ret = lseek(in_param_0, in_param_1, in_param_2);

        /* Fill output structure */
        out_struct.ret = ret;

        /* Free handle and send response back */
        HG_Handler_start_output(handle, &out_struct);

        return HG_SUCCESS;
    }
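The register / forward / callback pattern used by the client and server stubs can be sketched without a network. In the block below, "forwarding" is a loopback that looks the handler up by id and runs it in the same process; a real deployment would serialize the input structure and send it to the server instead. The `demo_` names, the handler signature, and the fixed-size table are all hypothetical and are not Mercury's API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Stand-ins for the generated lseek_in_t / lseek_out_t structures. */
typedef struct { int32_t fildes; int64_t offset; int32_t whence; } demo_lseek_in_t;
typedef struct { int64_t ret; } demo_lseek_out_t;

/* Hypothetical handler signature: read the input, run the call, fill the output. */
typedef int (*demo_handler_t)(const void *in, void *out);

#define DEMO_MAX_HANDLERS 8
static struct { const char *name; demo_handler_t cb; } demo_table[DEMO_MAX_HANDLERS];
static int demo_count = 0;

/* Register a handler under a name and return its id. Registering the same
 * name twice returns the existing id; -1 means the table is full. */
int demo_register(const char *name, demo_handler_t cb)
{
    assert(name != NULL && cb != NULL);
    for (int i = 0; i < demo_count; i++)
        if (strcmp(demo_table[i].name, name) == 0)
            return i;
    if (demo_count == DEMO_MAX_HANDLERS)
        return -1;
    demo_table[demo_count].name = name;
    demo_table[demo_count].cb = cb;
    return demo_count++;
}

/* Loopback stand-in for forwarding: run the handler in this process. */
int demo_forward(int id, const void *in, void *out)
{
    if (id < 0 || id >= demo_count)
        return -1;
    return demo_table[id].cb(in, out);
}

/* Handler mirroring the server stub: unpack, call lseek(), pack the result. */
int demo_lseek_cb(const void *in, void *out)
{
    const demo_lseek_in_t *in_struct = in;
    demo_lseek_out_t *out_struct = out;

    out_struct->ret = (int64_t) lseek(in_struct->fildes,
        (off_t) in_struct->offset, in_struct->whence);
    return 0;
}
```

A caller would register once with `demo_register("lseek", demo_lseek_cb)`, fill a `demo_lseek_in_t`, and pass both structures to `demo_forward`. The return value of the call travels back in the output structure, so a failed lseek() (for example on an invalid descriptor) reaches the caller as `ret == -1`; errno is not carried across in this sketch.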

Mercury POSIX

• Routines currently supported:

    access   fdatasync  mkdir     truncate
    chdir    fpathconf  mkfifo    umask
    chmod    fstat      mknod     unlink
    chown    fsync      open      write
    creat    ftruncate  pathconf
    close    getcwd     read
    dup      lchown     readlink
    dup2     link       rmdir
    fchdir   lockf      stat
    fchmod   lseek      symlink
    fchown   lstat      sync

• + Large file versions: creat64, ftruncate64, lseek64, open64, etc.

Mercury POSIX

• Routines not yet supported:

    closedir  pipe    readdir
    fcntl     pread   rewinddir
    opendir   pwrite  utime

• + Large file versions: ?

Mercury POSIX - Configuration

• Environment variables required:
  • MERCURY_NA_PLUGIN: Underlying network transport method used to forward calls to remote servers.
    • e.g., "bmi"
  • MERCURY_PORT_NAME: Port name information (IP/port) specific to the network transport chosen – used to establish a connection with a remote server.
    • e.g., "tcp://72.36.68.242:22222"
  • LD_PRELOAD: Path to Mercury POSIX shared library.
    • e.g., "/usr/local/lib/libmercury_posix.so"
• Setting LD_PRELOAD redirects all POSIX calls to the Mercury server (can be an issue with local scripts, etc. that make use of POSIX I/O)
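As an illustration of how a client library might consume MERCURY_PORT_NAME, the sketch below splits a value of the form shown in the example ("tcp://72.36.68.242:22222") into protocol, host, and port. The `protocol://host:port` grammar is an assumption inferred from that one example, since the slide says the format is specific to the chosen transport. `demo_port_name_t` and both functions are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical parsed form of a port name such as "tcp://host:port". */
typedef struct {
    char protocol[16];
    char host[64];
    int port;
} demo_port_name_t;

/* Parse "protocol://host:port". Returns 0 on success, -1 if the string
 * is missing, malformed, has trailing characters, or has a port outside
 * the range 1..65535. */
int demo_parse_port_name(const char *s, demo_port_name_t *out)
{
    int consumed = 0;

    assert(out != NULL);
    if (s == NULL)
        return -1;
    if (sscanf(s, "%15[^:]://%63[^:/]:%d%n",
               out->protocol, out->host, &out->port, &consumed) != 3)
        return -1;
    if (s[consumed] != '\0' || out->port < 1 || out->port > 65535)
        return -1;
    return 0;
}

/* Read the value from the environment; fails if the variable is unset. */
int demo_port_name_from_env(demo_port_name_t *out)
{
    return demo_parse_port_name(getenv("MERCURY_PORT_NAME"), out);
}
```

Failing early on an unset or malformed variable matters here because LD_PRELOAD applies to every process that inherits the environment. One way to limit that scope is to set the variables only for the single command being redirected instead of exporting them for the whole shell session.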

Mercury POSIX - Testing

• Integrated regression tests (limited POSIX test suite)
• HDF5 sec2 driver (demo)
• Lustre POSIX test suite
  • However: the test framework has issues and needs to be modified; Mercury POSIX may also need to support fdopen and FILE* routines?

Demo – Mercury POSIX and HDF5 tools

Client terminal:

    $ pwd
    ~jsoumagne/demo
    $ ls *.h5
    ls: *.h5: No such file or directory
    $ export MERCURY_NA_PLUGIN="bmi"
    $ export MERCURY_PORT_NAME="tcp://127.0.0.1:22222"
    $ export LD_PRELOAD=/path/to/libmercury_posix.so

Server terminal:

    $ pwd
    ~jsoumagne/demo_server
    $ ls
    coord.h5
    $ mercury_posix_server bmi
    Waiting for client...

Demo – Mercury POSIX and HDF5 tools

Client terminal:

    $ h5dump -H coord.h5
    HDF5 "coord.h5" {
    GROUP "/" {
       DATASET "multiple_ends_dset" {
          DATATYPE H5T_STD_I32LE
          DATASPACE SIMPLE { ( 4, 5, 3, 4, 2, 3, 6, 2 ) / ( 4, 5, 3, 4, 2, 3, 6, 2 ) }
       }
       DATASET "multiple_ends_dset_chunked" {
          DATATYPE H5T_STD_I32LE
          DATASPACE SIMPLE { ( 4, 5, 3, 4, 2, 3, 6, 2 ) / ( 4, 5, 3, 4, 2, 3, 6, 2 ) }
       }
       DATASET "single_end_dset" {
          DATATYPE H5T_STD_I32LE
          DATASPACE SIMPLE { ( 2, 3, 6, 2 ) / ( 2, 3, 6, 2 ) }
    ... (skip)

Server terminal:

    $ mercury_posix_server bmi
    Waiting for client...
    Thu, 19 Sep 13 17:31:00 CDT: Executing open64
    Thu, 19 Sep 13 17:31:00 CDT: Executing __fxstat64
    Thu, 19 Sep 13 17:31:00 CDT: Executing lseek64
    Thu, 19 Sep 13 17:31:00 CDT: Executing hg_posix_read
    Thu, 19 Sep 13 17:31:00 CDT: Executing lseek64
    Thu, 19 Sep 13 17:31:00 CDT: Executing hg_posix_read
    Thu, 19 Sep 13 17:31:00 CDT: Executing hg_posix_read
    Thu, 19 Sep 13 17:31:00 CDT: Executing hg_posix_read
    Thu, 19 Sep 13 17:31:00 CDT: Executing getcwd
    ... (skip)

Demo – Mercury POSIX and HDF5 tools

Client terminal:

    $ h5copy -i coord.h5 -s single_end_dset -o coord_simple.h5 -d simple

Server terminal:

    Thu, 19 Sep 13 17:33:51 CDT: Executing open64
    Thu, 19 Sep 13 17:33:51 CDT: Executing open64
    ... (skip)
    Thu, 19 Sep 13 17:33:51 CDT: Executing __fxstat64
    Thu, 19 Sep 13 17:33:51 CDT: Executing lseek64
    Thu, 19 Sep 13 17:33:51 CDT: Executing hg_posix_read
    Thu, 19 Sep 13 17:33:51 CDT: Executing lseek64
    Thu, 19 Sep 13 17:33:51 CDT: Executing hg_posix_read
    ... (skip)
    Thu, 19 Sep 13 17:33:51 CDT: Executing hg_posix_write
    Thu, 19 Sep 13 17:33:51 CDT: Executing hg_posix_write
    Thu, 19 Sep 13 17:33:51 CDT: Executing close

Demo – Mercury POSIX and HDF5 tools

Client terminal:

    $ h5dump coord_simple.h5
    HDF5 "coord_simple.h5" {
    GROUP "/" {
       DATASET "simple" {
          DATATYPE H5T_STD_I32LE
          DATASPACE SIMPLE { ( 2, 3, 6, 2 ) / ( 2, 3, 6, 2 ) }
          DATA {
          (0,0,0,0): 0, 1,
          (0,0,1,0): 1, 2,
          ... (skip)
          (1,2,2,0): 122, 123,
          (1,2,3,0): 123, 124,
          (1,2,4,0): 124, 125,
          (1,2,5,0): 125, 126
          }
       }
    }
    }

Server terminal:

    Thu, 19 Sep 13 17:36:57 CDT: Executing open64
    Thu, 19 Sep 13 17:36:57 CDT: Executing __fxstat64
    Thu, 19 Sep 13 17:36:57 CDT: Executing lseek64
    Thu, 19 Sep 13 17:36:57 CDT: Executing hg_posix_read
    Thu, 19 Sep 13 17:36:57 CDT: Executing lseek64
    Thu, 19 Sep 13 17:36:57 CDT: Executing hg_posix_read
    Thu, 19 Sep 13 17:36:57 CDT: Executing hg_posix_read
    Thu, 19 Sep 13 17:36:57 CDT: Executing hg_posix_read
    ... (skip)
    Thu, 19 Sep 13 17:36:57 CDT: Executing lseek64
    Thu, 19 Sep 13 17:36:57 CDT: Executing hg_posix_read
    Thu, 19 Sep 13 17:36:57 CDT: Executing close

Conclusion – Future Work

• Forwarding POSIX I/O calls is very easy and does not require modification of existing tools / code
• Mercury POSIX can be easily extended to support additional system / library calls
• Can directly take advantage of updates to Mercury (network transports, etc.)
• Next Quarter:
  • Support remaining POSIX routines
  • Test with MPI I/O (ROMIO driver)
  • Test with Lustre POSIX test suite
    • If framework issues are solved

Questions