Doc No.: HPCI-ST01-004E-03
HPCI Shared Storage User Manual For Fugaku users
2021/4/16
HPCI Shared Storage User Manual (for Fugaku users)
Revision History

Revision         Date        Description
Initial and 2nd  2021/03/15  • Substantially changed based on the HPCI Shared Storage
                               User Manual (HPCI-ST01-001), accounting for actual
                               Fugaku computer operations
3rd              2021/04/16  • Correction of errors and figures.
                             • Changed the name of csgw.fugaku.r-ccs.riken.jp to
                               Cloud Storage Gateway Node
Table of Contents
Introduction
Overview of the Shared Storage System
For First-time Users of the Shared Storage System
  Login to the Cloud Storage Gateway Node
  Obtaining the HPCI proxy certificate
  Setting up encrypted network communication
  Mounting shared storage
  Remote copy to shared storage
  Data transfer between Fugaku and shared storage
  Introduction of Replicas
  Parallel File Copy
  Unmounting Shared Storage
Details of Shared Storage
  Direct Access to Shared Storage
  Access control of file and directory
  Storage usage and allocation
  File Sharing in a Project
  Installing the Client Environment
  Introduction of TIPS
Troubleshooting
  Introducing the HPCI Helpdesk
  “Mountpoint is not empty” indicating the mountpoint is already in use
  “No write access to mountpoint” and nothing can be written
  “Transport endpoint is not connected” and the shared storage cannot be accessed
  “Operation not permitted” and the shared storage cannot be mounted
  “Transport endpoint is not connected” and files cannot be accessed
  “Invalid argument” and files cannot be accessed
  “Connection refused” and files cannot be accessed
  “Input/Output Error” and files cannot be accessed
Introduction
This document is intended for participants in HPCI projects who use the Fugaku
supercomputer (hereafter referred to as “Fugaku”). It describes how to access
the HPCI shared storage system (hereafter referred to as the “shared storage
system”) from the Fugaku gateway servers (csgw1.fugaku.r-ccs.riken.jp and
csgw2.fugaku.r-ccs.riken.jp, hereafter referred to as the Cloud Storage Gateway Node).
Note that the client software for using the shared storage is not installed on the
Fugaku login node (login.fugaku.r-ccs.riken.jp, hereafter referred to as the Fugaku
login node). Please use the Cloud Storage Gateway Node to access the shared
storage.
Chapter 1 briefly introduces the shared storage system, and Chapter 2 discusses basic
usage with actual examples for first-time users. Chapter 3 discusses using the shared
storage system in more detail. Finally, Chapter 4 explains how to contact the HPCI
Helpdesk for troubleshooting, as well as general troubleshooting methods.
This document assumes that users have obtained both a digital and a proxy certificate
in advance from the HPCI certificate issuing system. For instructions on how to obtain
a proxy certificate, refer to Chapter 2.2 “Proxy Certificate Issuing Procedure” in the
HPCI Login Manual (HPCI-CA01-001E): https://www.hpci-office.jp/materials/hpci-ca01-001_e.pdf
In this document, italicized characters indicate input commands and bold characters
mean that the comment or the command’s output should be checked.
When using the shared storage system, the following websites may be helpful.
• Shared storage portal website
• Shows notices and maintenance information for the shared storage system:
https://www.hpci-office.jp/info/pages/viewpage.action?pageId=11862295
• Shared storage tips
• Provides detailed explanations of shared storage and Gfarm commands:
https://www.hpci-office.jp/info/pages/viewpage.action?pageId=26935659
• HPCI Helpdesk
• If you have any questions about the shared storage system, contact the HPCI
Helpdesk:
https://www.hpci-office.jp/pages/e_support
• HPCI portal website
• Provides support information related to using the HPCI, including applying for
a project and reporting results.
https://www.hpci-office.jp/
• Shared Storage Operation Information
• Provides a dashboard that visualizes shared storage operation information.
  - Dashboard: https://hpci-web01.r-ccs.riken.jp/grafana/
  - Manual: https://www.hpci-office.jp/info/pages/viewpage.action?pageId=216629492
The shared storage system is managed and operated by an HPCI shared storage
working group comprising the following two organizations.
• RIKEN Center for Computational Science (R-CCS)
http://www.r-ccs.riken.jp/
• Information Technology Center, The University of Tokyo
http://www.cc.u-tokyo.ac.jp/
Overview of the Shared Storage System
The shared storage system is a large-scale data sharing platform for HPCI users. By
using the shared storage system, HPCI users can quickly and safely share large
amounts of data under one file system across the geographically dispersed
computational resources of the HPCI. The shared storage system uses the Gfarm
network shared file system and consists of metadata servers (serving metadata) and
file system nodes (serving file data). The system ensures high fault tolerance by
always copying metadata transactions from a master metadata server, installed at the
University of Tokyo or RIKEN R-CCS, to one or more slave servers, also located at the
University of Tokyo or RIKEN R-CCS. The shared storage system’s client environment
is installed on the login nodes of the HPCI system’s computational resources and on the
Fugaku Cloud Storage Gateway Node, so that HPCI project participants can share storage.
Users can also install the client environment on local machines. For instructions on how
to install the client environment, refer to “HPCI Shared Storage User Manual–Client Introduction”:
https://www.hpci-office.jp/materials/hpci-st01-002.pdf
[Figure: Configuration of the shared storage system. The RIKEN Center for Computational Science and the Information Technology Center, The University of Tokyo each host a master metadata server (*), a slave metadata server, and file system nodes.]
(*) The master metadata server is operated either by the RIKEN Center for Computational Science or by the Information Technology Center, The University of Tokyo.
For First-time Users of the Shared Storage System
This chapter explains the procedure for logging in to the Cloud Storage Gateway Node and
mounting the shared storage system. In this chapter, the following account and Gfarm
group ID are used for all the examples.
Fugaku account name: u000000
HPCI-ID: hpci000000
Gfarm group ID: hp000000
A Gfarm group ID is assigned to each project during the initial project setup process,
except for a strategic program project awarded by FY 2015.
Please refer to the following page for the list of Gfarm group IDs.
• https://www.hpci-office.jp/info/pages/viewpage.action?pageId=178064247
This document assumes a Bourne-type shell (such as bash). If you are using
another type of shell, such as a C shell (e.g. csh or tcsh), adapt the commands to
the shell you are using as necessary. A mount point for each project’s shared storage is
created from the Gfarm group ID. Note that, in the following description, the term “group
ID” refers to the Gfarm group ID.
This document explains how to log in to the Cloud Storage Gateway Node using SSH, mount
the HPCI shared storage, and transfer data to and from Fugaku Global Storage, as shown in
the following figure.
Login to the Cloud Storage Gateway Node
Fugaku provides the Cloud Storage Gateway Node as a login node for accessing the
HPCI shared storage and cloud computing environments outside R-CCS.
The client software for using the HPCI shared storage is installed on the Cloud Storage
Gateway Node.
Cloud Storage Gateway Node
  Representative FQDN: csgw.fugaku.r-ccs.riken.jp
  Actual FQDNs:        csgw1.fugaku.r-ccs.riken.jp
                       csgw2.fugaku.r-ccs.riken.jp
You can log in to the Cloud Storage Gateway Node using SSH or GSI-SSH in the same
way as the Fugaku login node. To log in via SSH, please access the Fugaku portal site
(https://fugaku.r-ccs.riken.jp) and register your public key in advance. For details on
how to register, please refer to the Fugaku User Manual.
■ client$ ssh u000000@csgw.fugaku.r-ccs.riken.jp
■ If you specify csgw.fugaku.r-ccs.riken.jp as the login destination, you are logged in to
either csgw1 or csgw2.
■
■ [u00000@csgw1 ~]$
You can also log in to the Cloud Storage Gateway Node using GSI-SSH, but you need
to issue an HPCI proxy certificate and prepare a GSI-SSH client environment in
advance. For the issuing procedure and the client environment, please refer to the
HPCI Quick Start Guide (https://www.hpci-office.jp/pages/e_hpci_info_manuals).
The following shows how to log in to the Cloud Storage Gateway Node using GSI-SSH.
■ client$ myproxy-logon -s portal.hpci.nii.ac.jp -l hpci000000 -t168
■ Issue a proxy certificate. It is valid for 168 hours.
■ Enter MyProxy pass phrase: ******
■ A credential has been received for user hpci000000 in
/tmp/x509up_XXXXXX.fileXXXXXXX.
■ client$ gsissh -p2222 u000000@csgw.fugaku.r-ccs.riken.jp
■ Log in to csgw.fugaku.r-ccs.riken.jp using GSI-SSH.
You are logged in to either csgw1 or csgw2.
■ [u00000@csgw1 ~]$
Obtaining the HPCI proxy certificate
This document describes how to access shared storage using an HPCI proxy certificate
(hereinafter referred to as a proxy certificate). For information on how to issue a
proxy certificate, please refer to the HPCI Quick Start Guide
(https://www.hpci-office.jp/pages/e_hpci_info_manuals).
After logging in to the Cloud Storage Gateway Node, check the expiration date of the
proxy certificate with the grid-proxy-info command.
The following is an example of a case where the proxy certificate has expired. If it has
expired, you need to obtain a proxy certificate.
■ [u00000@csgw1 ~]$ grid-proxy-info
■
■ ERROR: Couldn't find a valid proxy.
■ globus_sysconfig: Could not find a valid proxy certificate file location
■ globus_sysconfig: Error with key filename
■ globus_sysconfig: File does not exist: /tmp/x509up_pXXXXX is not a valid file
■
■ Use -debug for further information.
■ [u00000@csgw1 ~]$
If a valid proxy certificate has not been obtained, use the myproxy-logon command to
obtain a proxy certificate as follows. The -t option specifies the validity period of the
proxy certificate in hours. You will be prompted to enter the passphrase. Enter the
passphrase that was set when the proxy certificate was issued by the HPCI certificate
issuing system and stored in the repository.
■ [u00000@csgw1 ~]$ myproxy-logon -s portal.hpci.nii.ac.jp -l hpci000000 -t 168
■ Enter MyProxy pass phrase: ******
■ A credential has been received for user hpci000000 in
/tmp/x509up_XXXXXX.fileXXXXXXX.
■ [u00000@csgw1 ~]$
If you cannot obtain a proxy certificate with the myproxy-logon command, try
reissuing the proxy certificate.
After acquiring the proxy certificate, run the grid-proxy-info command again to check
the validity period.
The remaining validity period is displayed in the timeleft field.
■ [u00000@csgw1 ~]$ grid-proxy-info
■ subject : /C=JP/O=NII/OU=HPCI/CN=Hoge%40Foo[hpci000000]/CN=XXXXXXXXXX/CN=XXXXXXXXX/CN=XXXXXXXXX/CN=XXXXXXXXX/CN=XXXXXXXXXX
■ issuer : /C=JP/O=NII/OU=HPCI/CN=Hoge%40Foo[hpci000000]/CN=XXXXXXXXXX/CN=XXXXXXXXX/CN=XXXXXXXXX/CN=XXXXXXXXX
■ identity : /C=JP/O=NII/OU=HPCI/CN=Hoge%40Foo[hpci000000]
■ type : RFC 3820 compliant impersonation proxy
■ strength : 2048 bits
■ path : /tmp/x509up_pXXXX.fileXYZABCD
■ timeleft : 23:59:40
■ [u00000@csgw1 ~]$
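The timeleft check can also be scripted, for example before starting a long transfer. The following is a minimal sketch that assumes the output format shown above (a line of the form "timeleft : H:MM:SS"). The function names are hypothetical helpers for illustration and are not part of the HPCI or Globus tools.

```shell
# Convert a timeleft value such as "23:59:40" into seconds.
# awk treats each field as a decimal number, so "08" is handled correctly.
timeleft_to_seconds() {
    printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Read grid-proxy-info output on stdin and print the timeleft value (H:MM:SS).
proxy_timeleft() {
    awk '$1 == "timeleft" { print $3; exit }'
}

# Read grid-proxy-info output on stdin; succeed only if at least
# $1 seconds of validity remain.
proxy_valid_for() {
    local left
    left=$(proxy_timeleft)
    [ -n "$left" ] && [ "$(timeleft_to_seconds "$left")" -ge "$1" ]
}

# Possible usage on the Cloud Storage Gateway Node (not executed here):
#   grid-proxy-info | proxy_valid_for 86400 || echo "less than one day left"
```

If no timeleft line is found (for example, when no valid proxy exists), proxy_valid_for fails, so the same check covers the expired case.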
Setting up encrypted network communication
By default, access to files and directories on shared storage is not encrypted and data is transmitted
in plain text, so enabling encrypted communication is recommended. Once you are
able to log in to the Cloud Storage Gateway Node, please configure this setting before using shared
storage. To enable encrypted communication for data protection, add the following
lines to the configuration file $HOME/.gfarm2rc.
■ [u00000@csgw1 ~]$ cat $HOME/.gfarm2rc
■ auth enable gsi *
■ auth disable gsi_auth *
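One possible way to add these two lines without duplicating them on repeated runs is sketched below. The helper names ensure_line and enable_gfarm_encryption are hypothetical and introduced only for illustration. The sketch assumes that appending to the end of the file is acceptable for your configuration.

```shell
# Append a line to a file only if that exact line is not already present.
# grep -x matches whole lines and -F treats the pattern as a fixed string,
# so the "*" in the settings is not interpreted as a wildcard.
ensure_line() {
    local file="$1" line="$2"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Hypothetical helper: apply both settings from this section.
# The target file defaults to $HOME/.gfarm2rc.
enable_gfarm_encryption() {
    local rc="${1:-$HOME/.gfarm2rc}"
    ensure_line "$rc" 'auth enable gsi *'
    ensure_line "$rc" 'auth disable gsi_auth *'
}

# Possible usage (not executed here):
#   enable_gfarm_encryption
#   cat "$HOME/.gfarm2rc"
```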
You can check whether the encrypted communication setting is enabled or not by using the gfhost
command. If the second item is uppercase "G", encrypted communication is enabled, and if it is
lowercase "g", communication is not encrypted.
■ [u00000@csgw1 ~]$ gfhost -lv
■ 0.01/0.03/0.03 G i386-fedora3-linux 2 linux-1.example.com 600 0(10.0.0.1)
■ 0.00/0.00/0.00 G i386-fedora3-linux 2 linux-2.example.com 600 0(10.0.0.2)
■ 0.00/0.02/0.00 G i386-redhat8.0-linux 1 linux-4.example.com 600 0(10.0.0.4)
■ 0.10/0.00/0.00 G sparc-sun-solaris8 1 solaris-1.example.com 600 0(10.0.1.1)
■ ...
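With many file system nodes, checking the second field by eye is tedious. The sketch below filters the gfhost -lv output for entries whose second field is not uppercase "G". It assumes the column layout shown above (load average, flag, architecture, CPU count, host name, ...). The function name is a hypothetical helper.

```shell
# Read `gfhost -lv` output on stdin and print the host name (5th field)
# of every entry whose second field is not "G".
# Lines with fewer than five fields (such as "...") are skipped.
unencrypted_hosts() {
    awk 'NF >= 5 && $2 != "G" { print $5 }'
}

# Possible usage on the Cloud Storage Gateway Node (not executed here):
#   gfhost -lv | unencrypted_hosts
```

An empty result means that every listed host reported "G". Any other flag, including lowercase "g", causes the host to be printed.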
Mounting shared storage
To mount the shared storage, execute the mount.hpci command as follows.
■ [u00000@csgw1 ~]$ mount.hpci
■ Update proxy certificate for gfarm2fs
■ timeleft : 23:53:05
■ Mount GfarmFS on /gfarm/hp000000/u000000
■ Mount GfarmFS on /gfarm/hp000001/u000000
■ [u00000@csgw1 ~]$
The mount destination of the shared storage is displayed after "Mount GfarmFS on".
Normally, it is mounted under /gfarm. If that directory does not exist, it is mounted under /tmp instead,
for example /tmp/hp000000/u000000.
If you are a member of multiple research projects, the home directory of the HPCI shared storage for all
projects you belong to will be mounted. In the example above, the HPCI shared storage home directories
of both the hp000000 and hp000001 projects are mounted.
You can check the mount status with the df command.
■ [u00000@csgw1 ~]$ df -h | grep -e Filesystem -e gfarm2fs
■ Filesystem Size Used Avail Use% Mounted on
■ gfarm2fs 85P 47P 39P 55% /gfarm/hp000000/u000000
■ gfarm2fs 85P 47P 39P 55% /gfarm/hp000001/u000000
■ [u00000@csgw1 ~]$
Remote copy to shared storage
You can copy files to the mounted shared storage by using the gsiscp or scp command, just as with a normal
Linux file system. Mount the shared storage on the Cloud Storage Gateway Node, then specify the
mount destination as the copy target.
■ [u00000@csgw1 ~]$ mount.hpci ← mount shared storage on the Cloud Storage Gateway Node
■ [u00000@csgw1 ~]$ exit
■ client$ scp ./data.file u000000@csgw.fugaku.r-ccs.riken.jp:/gfarm/hp000000/u000000/
■ data.file 3% 381MB 72.6MB/s 02:12 ETA
■ client$
Data transfer between Fugaku and shared storage
For the area where the shared storage is mounted, Linux file manipulation commands can be used just
like any other file system. The following is an example of copying files from the global file system area
/data/hp000000/u000000 on Fugaku to the shared storage.
■ [u00000@csgw1 ~]$ cp /data/hp000000/u000000/data.file \
    /gfarm/hp000000/u000000/
■ [u00000@csgw1 ~]$ ls -l /gfarm/hp000000/u000000/data.file
■ -rw-r--r-- 1 hpci000000 hp000000 100000 Feb 11 11:33 data.file
■ [u00000@csgw1 ~]$
Introduction of Replicas
Shared storage automatically creates replicas of files on the file system nodes for data protection.
Every file stored on the shared storage has at least one replica on the file servers of the University of Tokyo
and at least one on those of RIKEN R-CCS, for a total of at least two replicas.
You can check which file servers hold the replicas by using the gfwhere command. The
following example shows that the file “data.file” is stored on two file servers, one at the University
of Tokyo and one at R-CCS.
■ [u00000@csgw1 ~]$ gfwhere /gfarm/hp000000/u000000/data.file
■ gfs13-1.hpci.itc.u-tokyo.ac.j ss-05-0.r-ccs.riken.jp
In the shared storage, one replica each is placed at the University of Tokyo and R-CCS to ensure fault
tolerance. Users should not change the number of replicas or their location.
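The replica placement of a file can also be checked from a script. The sketch below reads the gfwhere output for a single file and assumes the format shown above (file server host names separated by whitespace). The site check relies on the host name suffixes that appear in this manual (u-tokyo.ac.jp and r-ccs.riken.jp). Both function names are hypothetical.

```shell
# Read `gfwhere` output for one file on stdin and print how many
# file server host names it lists.
count_replicas() {
    awk '{ n += NF } END { print n + 0 }'
}

# Read `gfwhere` output for one file on stdin; succeed only if at least one
# host at the University of Tokyo and one at R-CCS are listed.
has_both_sites() {
    local hosts
    hosts=$(cat)
    case "$hosts" in *u-tokyo.ac.jp*) ;; *) return 1 ;; esac
    case "$hosts" in *r-ccs.riken.jp*) ;; *) return 1 ;; esac
}

# Possible usage on the Cloud Storage Gateway Node (not executed here):
#   gfwhere /gfarm/hp000000/u000000/data.file | count_replicas
#   gfwhere /gfarm/hp000000/u000000/data.file | has_both_sites && echo "replicated at both sites"
```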
Parallel File Copy
Shared Storage provides the gfpcopy command to copy multiple files in parallel.
In the gfpcopy command, the parallelism of the copy is specified by the -j option. The default parallelism
is 4. In the following example, TEST_DIRECTORY stored in the Fugaku global file system is recursively
copied to the shared storage.
■ [u00000@csgw1 ~]$ mount.hpci
■ [u00000@csgw1 ~]$ cd /gfarm/hp000000/u000000/
■ [u00000@csgw1 ~]$ gfpcopy -j8 /data/hp000000/u000000/TEST_DIRECTORY/ ./
■ [u00000@csgw1 ~]$ ls /gfarm/hp000000/u000000/TEST_DIRECTORY
■ TEST_FILE_01 TEST_FILE_02 TEST_FILE_03 TEST_FILE_04 TEST_FILE_05 TEST_FILE_06
■ TEST_FILE_07 TEST_FILE_08 TEST_FILE_09 TEST_FILE_10 TEST_FILE_11 TEST_FILE_12
■ TEST_FILE_13 TEST_FILE_14 TEST_FILE_15 TEST_FILE_16 TEST_FILE_17 TEST_FILE_18
■ [u00000@csgw1 ~]$
You can set the parallelism in the client_parallel_copy variable in the configuration file $HOME/.gfarm2rc.
In the following example, the parallelism is set to 8.
■ [u00000@csgw1 ~]$ cat $HOME/.gfarm2rc
■ client_parallel_copy 8
The gfpcopy command overwrites the destination file if the source file is newer than the destination file.
If files in the source directory have been updated, or if some files failed to copy, run the gfpcopy
command again against the same directory. Only the files that failed to copy or that have been
updated are copied.
■ [u00000@csgw1 ~]$ cp NEW_TEST_FILE_01 ./TEST_DIRECTORY/TEST_FILE_01
■
■ The local test file is newer than the file stored in shared storage.
■ [u00000@csgw1 ~]$ ls -l ./TEST_DIRECTORY/TEST_FILE_01
■ -rw-r--r-- 1 u00000 hp000000 300000 Jul 20 12:00 TEST_FILE_01
■ [u00000@csgw1 ~]$ ls -l \
    /gfarm/hp000000/u000000/TEST_DIRECTORY/TEST_FILE_01
■ -rw-r--r-- 1 u00000 hp000000 100000 Jul 15 11:14 TEST_FILE_01
■
■ When the directory is copied again with gfpcopy, only TEST_FILE_01 is copied.
■ [u00000@csgw1 ~]$ gfpcopy -j 8 \
    ./TEST_DIRECTORY /gfarm/hp000000/u000000/
■ [u00000@csgw1 ~]$ ls -l \
    /gfarm/hp000000/u000000/TEST_DIRECTORY/TEST_FILE_01
■ -rw-r--r-- 1 u00000 hp000000 300000 Jul 20 12:00 TEST_FILE_01
In the following example, a parallel copy is performed from the shared storage to the
Fugaku global file system. The parallelism is specified as 8. When the -v option is
specified for gfpcopy, detailed information on file copying is displayed.
■ [u00000@csgw1 ~]$ grid-proxy-info | grep timeleft
■ timeleft : 1:00:00 (0.1 days) ← Check the expiration date. If it is not enough, re-obtain a proxy certificate.
■ [u00000@csgw1 ~]$ myproxy-logon -s portal.hpci.nii.ac.jp -l hpci000000 -t 168
■ Enter MyProxy pass phrase: ******
■ [u00000@csgw1 ~]$ grid-proxy-info | grep timeleft
■ timeleft : 167:59:45 (7.0 days) ← Check the expiration date.
■ [u00000@csgw1 ~]$ gfpcopy -j 8 -v \
    /gfarm/hp000000/u000000/TEST_DIRECTORY2 \
    /data/hp000000/u000000/
■ INFO: mkdir(file:///data/hp000000/u000000, 755) OK
■ INFO: scheduling method = noplan
■ [OK]COPY, 200MB/s(1.0s): gfarm://ms-0.r-ccs.riken.jp:601/home/hp000000/u000000/TEST_DIRECTORY2/FILE01(gfs54-2.hpci.itc.u-tokyo.ac.jp:600) -> file:///data/hp000000/u000000/TEST_DIRECTORY2/FILE01
■ [OK]COPY, 200MB/s(1.0s): gfarm://ms-0.r-ccs.riken.jp:601/home/hp000000/u000000/TEST_DIRECTORY2/FILE02(ss-02-1.r-ccs.riken.jp:600) -> file:///data/hp000000/u000000/TEST_DIRECTORY2/FILE02
■ (snip)
■ [u00000@csgw1 ~]$
Check the expiration date of the proxy certificate before executing the gfpcopy command. If the
proxy certificate expires while files are being copied, all copies attempted after the
expiration will fail. In that case, re-obtain the proxy
certificate and re-execute the gfpcopy command. Only the files that have not yet been copied will be
copied.
Unmounting Shared Storage
To unmount the shared storage, use the umount.hpci command.
■ [u00000@csgw1 ~]$ umount.hpci
■ Unmount GfarmFS on /gfarm/hp000000/u000000
■ Unmount GfarmFS on /gfarm/hp000001/u000000
■ [u00000@csgw1 ~]$
Details of Shared Storage
This chapter provides details on how to use shared storage.
Direct Access to Shared Storage
There are two ways to access the shared storage as shown below.
(A) Mount the shared storage area and access it with standard file manipulation commands (method
described in Chapter 2).
(B) Direct access to the shared storage area using Gfarm-specific commands without mounting the
shared storage area.
This section introduces direct access (B). To specify a file stored in Gfarm in a Gfarm-specific
command, use the Gfarm absolute path beginning with gfarm://. The following is an example of listing
files by the gfls command.
■ [u00000@csgw1 ~]$ gfls -l \
    gfarm:///home/hp000000/u000000/TEST_DIRECTORY
■ -rw-r--r-- 1 hpci000000 hp000000 10485760 Nov 11 16:30 gfarm:///home/hp000000/u000000/TEST_DIRECTORY/TEST_FILE_01
■ [u00000@csgw1 ~]$
The Gfarm absolute path can also be specified for the parallel copy command gfpcopy. By specifying
the Gfarm absolute path, you can access the shared storage without mounting it.
■ [u00000@csgw1 ~]$ mkdir /data/hp000000/u00000/work_dir
■ [u00000@csgw1 ~]$ ls -l /data/hp000000/u00000/work_dir
■ total 0
■ [u00000@csgw1 ~]$ gfpcopy -j 8 \
    gfarm:///home/hp000000/u000000/TEST_DIRECTORY3 \
    /data/hp000000/u00000/
■ [u00000@csgw1 ~]$ ls -l /data/hp000000/u00000/TEST_DIRECTORY3
■ total 921600
■ -rw-r--r-- 1 u00000 hp000000 10485760 Nov 12 13:41 test.000
■ -rw-r--r-- 1 u00000 hp000000 10485760 Nov 12 13:41 test.001
■ (snip)
■ -rw-r--r-- 1 u00000 hp000000 10485760 Nov 12 13:41 test.098
■ -rw-r--r-- 1 u00000 hp000000 10485760 Nov 12 13:41 test.099
■ [u00000@csgw1 ~]$
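The examples in this manual suggest a simple correspondence between a path under the mount point and the Gfarm absolute path: /gfarm/hp000000/u000000/... on the mounted file system appears as gfarm:///home/hp000000/u000000/... in Gfarm-specific commands. The following sketch converts one form to the other. It assumes the default /gfarm mount point (not the /tmp fallback), and the function name is a hypothetical helper.

```shell
# Convert a path under the /gfarm mount point into the corresponding
# Gfarm absolute path (gfarm:///home/...).
# Assumption: shared storage is mounted under /gfarm, as in Chapter 2.
mount_path_to_gfarm_url() {
    local path="$1"
    case "$path" in
        /gfarm/*)
            printf 'gfarm:///home/%s\n' "${path#/gfarm/}"
            ;;
        *)
            printf 'not under /gfarm: %s\n' "$path" >&2
            return 1
            ;;
    esac
}

# Possible usage (not executed here):
#   gfls -l "$(mount_path_to_gfarm_url /gfarm/hp000000/u000000/TEST_DIRECTORY)"
```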
Access control of file and directory
Shared storage supports access control lists (ACLs), which allow you to set access rights
individually for any user or group. The gfgetfacl and gfsetfacl
commands are used to display and set ACLs, respectively.
The following example shows how to display and set an ACL. First, the directory work is created.
Then the ACL of the work directory is displayed using the gfgetfacl command and modified using the
gfsetfacl command.
For detailed usage of the gfgetfacl and gfsetfacl commands, please refer to their manual pages
(man gfgetfacl, man gfsetfacl).
■ [u00000@csgw1 ~]$ gfmkdir -p gfarm:///home/hp000000/u000000/work
■ [u00000@csgw1 ~]$ gfls -dl gfarm:///home/hp000000/u000000/work
■ drwxr-xr-x 2 hpci000000 hp000000 0 Aug 5 15:24 work
■ [u00000@csgw1 ~]$ gfgetfacl gfarm:///home/hp000000/u000000/work
■ # file: gfarm:///home/hp000000/u000000/work
■ # owner: hpci000000
■ # group: hp000000
■ user::rwx
■ group::r-x
■ other::r-x
■ [u00000@csgw1 ~]$ gfsetfacl -m g:hp012345:rwx \
    gfarm:///home/hp000000/u000000/work
■ [u00000@csgw1 ~]$ gfgetfacl gfarm:///home/hp000000/u000000/work
■ # file: gfarm:///home/hp000000/u000000/work
■ # owner: hpci000000
■ # group: hp000000
■ user::rwx
■ group::r-x
■ group:hp012345:rwx
■ other::r-x
■ [u00000@csgw1 ~]$
In the following example, the gfls and gfchmod commands, which are Gfarm-specific commands, are used
to refer to and set the access rights of files and directories, respectively.
■ [u00000@csgw1 ~]$ gfls -dl gfarm:///home/hp000000/u000000/work
■ drwxr-xr-x 2 hpci000000 hp000000 0 Aug 5 15:24 work
■ [u00000@csgw1 ~]$ gfchmod 775 gfarm:///home/hp000000/u000000/work
■ [u00000@csgw1 ~]$ gfls -dl gfarm:///home/hp000000/u000000/work
■ drwxrwxr-x 2 hpci000000 hp000000 0 Aug 5 15:24 work
■ [u00000@csgw1 ~]$
Storage usage and allocation
The gfusage command outputs the amount of usage and number of files used by users.
■ [u00000@csgw1 ~]$ gfusage
■ # UserName : FileSpace FileNum PhysicalSpace PhysicalNum
■ hpci000000 : 155354084939 32 321203401235 33
■ ----------------------------------------------------------------------
■ TOTAL : 155354084939 32 321203401235 33
■ [u00000@csgw1 ~]$
To check the amount of usage for each project, specify the group ID with the -g option. You can also use
the -H option to display values with decimal (power-of-10) unit prefixes (1 KB = 1000 bytes).
■ [u00000@csgw1 ~]$ gfusage -g hp000000 -H ← Displayed with power-of-10 units (1 KB = 1000 bytes)
■ # GroupName : FileSpace FileNum PhysicalSpace PhysicalNum
■ hp000000 : 5.4T 18.0M 11.2T 48.2M
■ ----------------------------------------------------------------------
■ TOTAL : 5.4T 18.0M 11.2T 48.2M
■ [u00000@csgw1 ~]$
The following table describes each item in the gfusage output. For shared storage, the allocated
capacity is enforced against FileSpace, and the allocated number of files is enforced against FileNum.
Please check FileSpace and FileNum when checking the amount of space used and the number of files
used for each project.
(PhysicalSpace and PhysicalNum are not used for limits.)

Item           Description
FileSpace      Storage usage (the capacity limit is applied to the FileSpace value)
               • In the example below: 100
FileNum        Number of files (the limit on the number of files is applied to the FileNum value)
               Note: This is the number of entries in the metadata, that is, the sum of
               files, directories, and symbolic links.
               • In the example below: 3 = 1 file + 1 directory + 1 symbolic link
PhysicalSpace  Physical usage including replicas
PhysicalNum    Number of files including replicas (directories and symbolic links
               are not counted)
In the following example, only one file (100Byte), one directory, and one symbolic link are stored in the
shared storage.
■ [u00000@csgw1 ~]$ gfls -l gfarm:///home/hp000000/u000000
■ drw-r--r-- 1 hpci000000 hp000000 0 Nov 12 13:41 directory
■ -rw-r--r-- 1 hpci000000 hp000000 100 Nov 12 13:41 file
■ lrwxrwxrwx 1 hpci000000 hp000000 0 Nov 12 13:41 symboliclink -> file
■ [u00000@csgw1 ~]$ gfusage
■ # UserName : FileSpace FileNum PhysicalSpace PhysicalNum
■ hpci000000 : 100 3 200 2
■ [u00000@csgw1 ~]$
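The FileNum arithmetic (files + directories + symbolic links) can be reproduced from a listing. The sketch below classifies gfls -l style lines by their first character, as in the example above. It only illustrates the counting rule for a single, non-recursive listing and is not a substitute for gfusage. The function name is hypothetical.

```shell
# Read `gfls -l` style lines on stdin and count entries by type,
# using the first character of the mode string:
#   "-" regular file, "d" directory, "l" symbolic link.
count_entries() {
    awk '
        /^-/ { files++ }
        /^d/ { dirs++ }
        /^l/ { links++ }
        END {
            printf "files=%d dirs=%d symlinks=%d total=%d\n",
                   files, dirs, links, files + dirs + links
        }
    '
}

# Possible usage (not executed here):
#   gfls -l gfarm:///home/hp000000/u000000 | count_entries
```

For the three entries listed above, the total is 3, which matches the FileNum value reported by gfusage.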
Shared storage limits usage by both an allocated capacity and an allocated number of files.
Please note that if either the capacity used or the number of files exceeds its limit, you will not be able to write
any files. To check the allocated capacity and the allocated number of files for your project, execute
the gfquota command with the -g and -H options.
The "FileSpaceHardLimit" and "FileNumHardLimit" fields indicate the amount of space allocated and the
number of files allocated, respectively.
■ [u00000@csgw1 ~]$ gfquota -g hp000000 -H
■ GroupName : hp000000
■ GracePeriod : disabled
■ FileSpace : 100T ← Storage usage
■ FileSpaceGracePeriod : disabled
■ FileSpaceSoftLimit : disabled
■ FileSpaceHardLimit : 500T ← Allocated storage capacity (limit)
■ FileNum : 1K ← Number of existing files
■ FileNumGracePeriod : disabled
■ FileNumSoftLimit : disabled
■ FileNumHardLimit : 6M ← Number of allocated files (limit)
■ (snip)
■ [u00000@csgw1 ~]$
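To see how close a project is to its limit, the usage can be expressed as a percentage of the hard limit. In the example above, 100T used out of 500T allocated is 20%. The sketch below assumes that the output uses the "Name : value" layout shown above and that gfquota is run without -H so that the values are plain numbers. Both points are assumptions to verify on your system, and the function names are hypothetical.

```shell
# Read gfquota output on stdin and print the value of one field,
# assuming lines of the form "FieldName : value".
quota_field() {
    awk -v key="$1" '$1 == key && $2 == ":" { print $3; exit }'
}

# Print used/limit as a percentage with one decimal place.
# Prints "n/a" when the limit is not a positive number (e.g. "disabled").
percent_used() {
    awk -v used="$1" -v limit="$2" 'BEGIN {
        if (limit + 0 <= 0) { print "n/a"; exit }
        printf "%.1f\n", 100 * used / limit
    }'
}

# Possible usage (not executed here):
#   out=$(gfquota -g hp000000)
#   used=$(printf '%s\n' "$out" | quota_field FileSpace)
#   limit=$(printf '%s\n' "$out" | quota_field FileSpaceHardLimit)
#   percent_used "$used" "$limit"
```

The same two helpers can be applied to FileNum and FileNumHardLimit to check the file count.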
File Sharing in a Project
In this section, we will show you how to share data stored in the shared storage between users who
belong to the same project.
The mount.hpci command mounts only the shared storage area of the user who executes it, so
the shared storage areas of other users cannot be accessed through the mount point.
For each project, a directory gfarm:///home/<Gfarm group ID>/shared is provided for sharing data
among users who belong to the same project.
All users in the project have read, write, and execute permissions on the directory
gfarm:///home/<Gfarm group ID>/shared.
To use the directory gfarm:///home/<Gfarm group ID>/shared, create a symbolic link with the gfln
command as follows.
■ [u00000@csgw1 ~]$ gfln -s gfarm:///home/hp000000/shared \
    gfarm:///home/hp000000/u000000/shared
■ [u00000@csgw1 ~]$ gfls -l gfarm:///home/hp000000/u000000/shared
■ lrwxrwxrwx 1 hpci000000 hp000000 0 Jun 10 10:22 gfarm:///home/hp000000/u000000/shared -> gfarm:///home/hp000000/shared
■ [u00000@csgw1 ~]$ gfmkdir gfarm:///home/hp000000/u000000/shared/hpci000000
■ ↑ Create your own directory inside the project's shared directory
■ [u00000@csgw1 ~]$ gfls -ld gfarm:///home/hp000000/u000000/shared/*
■ drwxr-xr-x 1 u00000 hp000000 11 Jun 10 10:30 hpci000000
■ drwxr-xr-x 1 kxxxxx hp000000 11 Apr 23 2014 hpci12xxxx
■ [u00000@csgw1 ~]$
Installing the Client Environment
So far we have described how to use HPCI shared storage on the Cloud Storage Gateway Node, but you can also install
the HPCI shared storage client environment on your own machine. The installation method is described
in the "HPCI Shared Storage User Manual Client Installation (HPCI-ST01-002)" on the manual page of
the HPCI portal site.
https://www.hpci-office.jp/pages/e_hpci_info_manuals
By installing the shared storage client environment, you can mount HPCI shared storage and use
Gfarm-specific commands such as gfls and gfpcopy, which we have introduced so far, on your machine.
Introduction of TIPS
FAQs and useful usage methods are listed on the Shared Storage page of the HPCI CMS as TIPS.
Please visit: https://www.hpci-office.jp/info/pages/viewpage.action?pageId=26935639 .
Troubleshooting
This chapter explains how to deal with problems you may encounter while using
the shared storage system.
Introducing the HPCI Helpdesk
If a problem occurs while using the shared storage system, contact the HPCI
Helpdesk.
• HPCI helpdesk:
http://www.hpci-office.jp/pages/e_support
Please attach a log to your help request, as well as screenshots taken at the time the
problem occurred so that the issue can be quickly identified. The following information
will also help us to solve the problem more efficiently. Your cooperation is appreciated.
Report the status of the problem, including the following.
• The command that resulted in the problem (including the execution method and
accurate output at the time the error occurred)
• The time the problem occurred (as accurately as possible)
• The names of the HPCI System Provider and host where the problem occurred (or
as much configuration detail as possible if this cannot be provided)
• The local account used
• The output of the gfarm2fs -V command (shows the shared storage client version)
• The output of the grid-proxy-info command (to check the proxy certificate’s
validity)
In addition, if the shared storage system cannot be mounted, report the results of
executing the following commands.
• mount.hpci / umount.hpci
• gfhost -lv
• gfmdhost -l
Alternatively, if the shared storage can be mounted but the files cannot be accessed,
report the results of executing the following commands.
• gfdf
• gfexport
• gfls -l
• gfwhere -al
• gfstat
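Gathering this information by hand is easy to get wrong, so the following is a minimal sketch of a helper that runs a list of commands and records each command line, its output, and its exit status in one log file that can be attached to a help request. The function name collect_diag and the log file name are illustrative assumptions. mount.hpci and umount.hpci are deliberately left out of the usage example because they change the mount state; run them by hand and copy their output. The file name test.dat is a placeholder for the file that cannot be accessed.

```shell
#!/usr/bin/env bash
# Run each command given as an argument and append its output to a log file.
# collect_diag is an illustrative name, not an HPCI tool.
collect_diag() {
    local log=$1; shift
    local cmd
    {
        echo "# collected: $(date '+%Y-%m-%d %H:%M:%S %z')"
        echo "# host: $(uname -n)  user: ${USER:-unknown}"
    } > "$log"
    for cmd in "$@"; do
        echo "## $cmd" >> "$log"
        # ${cmd%% *} is the first word of the command line (the program name).
        if command -v "${cmd%% *}" >/dev/null 2>&1; then
            # $cmd is left unquoted on purpose so that it splits into words.
            $cmd >> "$log" 2>&1 || echo "(exit status $?)" >> "$log"
        else
            echo "(command not found)" >> "$log"
        fi
    done
}

# Example:
#   collect_diag hpci-report.log "gfarm2fs -V" "grid-proxy-info" \
#       "gfhost -lv" "gfmdhost -l" "gfdf" \
#       "gfls -l test.dat" "gfwhere -al test.dat" "gfstat test.dat"
```

The timestamp and host name at the top of the log cover two of the items requested above (the time and the host where the problem occurred).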
■ Troubleshooting individual errors
“Mountpoint is not empty” indicating the mountpoint is already in use
■ [u00000@csgw1 ~]$ mount.hpci
■ timeleft : 23:49:19
■ fuse: mountpoint is not empty
■ fuse: if you are sure this is safe, use the 'nonempty' mount option
■ [u00000@csgw1 ~]$
If you see the above message, it is likely that either the shared storage system has
already been mounted and is in use or a file exists at that location. Check to make
sure the mountpoint is empty, then try to mount the shared storage system again.
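As a minimal sketch, the check described above can be done with a small helper that reports whether a directory is missing, empty, or not empty. The function name check_mountpoint is illustrative, and the path in the usage comment is taken from the examples in this manual; substitute your own mountpoint.

```shell
#!/usr/bin/env bash
# Report the state of a prospective mountpoint directory.
# check_mountpoint is an illustrative name, not an HPCI tool.
check_mountpoint() {
    local mp=$1
    if [ ! -d "$mp" ]; then
        echo "missing"
    elif [ -n "$(ls -A "$mp" 2>/dev/null)" ]; then
        # Either files were placed here, or the storage is already mounted.
        echo "not-empty"
    else
        echo "empty"
    fi
}

# Example:
#   check_mountpoint /gfarm/hp000000/u000000
```

If the result is "not-empty", list the directory to see whether it shows the contents of the shared storage (already mounted) or stray local files.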
“No write access to mountpoint” and nothing can be written
■ fusermount: user has no write access to mountpoint
/volumeX/home/hp000000/u00000/gfarm/hp000000/u000000
Write access is granted to the mountpoint according to the settings for the user’s
home directory when the mount.hpci command is used. By default, the directory
owners are set as shown below. If this problem occurs, it is likely that these settings
have been changed, so you should check these permissions and correct them if
necessary.
■ permission | owner | group | directory name
■ -----------+--------+----------+-----------------
■ drwxr-xr-x u000000 hp000000 ./gfarm
■ drwxr-xr-x u000000 hp000000 ./gfarm/hp000000
■ drwxr-xr-x u000000 hp000000 ./gfarm/hp000000/u000000
■ drwxr-xr-x u000000 hp000000 ./gfarm/hp000000/u000000
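To compare your own directories with the defaults in the table, a minimal sketch such as the following can be used. It checks that each directory exists, is owned by the current user, and is writable by that user. The function name check_write_access is illustrative, and the paths in the usage comment assume the layout shown in the table, relative to the home directory.

```shell
#!/usr/bin/env bash
# Check that each directory exists, is owned by the current user, and is
# writable. check_write_access is an illustrative name, not an HPCI tool.
check_write_access() {
    local dir rc=0
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -O "$dir" ] && [ -w "$dir" ]; then
            echo "ok      $dir"
        else
            echo "problem $dir"
            rc=1
        fi
    done
    return "$rc"
}

# Example (run from the home directory):
#   check_write_access ./gfarm ./gfarm/hp000000 ./gfarm/hp000000/u000000
```

For any directory reported as "problem", inspect it with ls -ld and restore the owner and permissions listed in the table.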
“Transport endpoint is not connected” and the shared storage cannot be accessed
■ libgfarm: [2000058] realpath(/home/hp000000/u000000/gfarm/hp000000/u000000):
Transport endpoint is not connected
If you see a message like the one above, a previous mount process may have
terminated abnormally. Execute the umount.hpci command once and try to mount the
shared storage again. The error message “failed to umount” will be shown when you
execute the umount.hpci command, but this is normal.
■ [u00000@csgw1 ~]$ umount.hpci
■ Error: failed to umount GfarmFS on /gfarm/hp000000/u000000
■ [u00000@csgw1 ~]$ mount.hpci
■ timeleft : 22:41:46
■ Mount GfarmFS on /gfarm/hp000000/u000000
■ [u00000@csgw1 ~]$
Proceed to the next step if this method does not resolve the problem.
Check the shared storage mountpoint.
■ [u00000@csgw1 ~]$ df -H 2>/dev/null | grep $USER
■ gfarm2fs 85P 47P 39P 55% /gfarm/hp000000/u000000
■ [u00000@csgw1 ~]$
Using the fusermount command, unmount the mountpoint obtained above. Once this
has been unmounted successfully, remount the shared storage using the mount.hpci
command.
■ [u00000@csgw1 ~]$ fusermount -u /gfarm/hp000000/u000000
■ [u00000@csgw1 ~]$ df -H 2>/dev/null | grep $USER ← Check the mount status
■ [u00000@csgw1 ~]$ ← No output if the unmount succeeded
■ [u00000@csgw1 ~]$ mount.hpci ← Mount the shared storage at the mountpoint
■ timeleft : 22:20:36
■ Mount GfarmFS on /gfarm/hp000000/u000000
■ [u00000@csgw1 ~]$
Contact the HPCI Helpdesk if this has still not resolved the problem.
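In the df output shown in this procedure, the mountpoint is the last field of the line whose first field is gfarm2fs. As a minimal sketch, the filter below extracts that field so it can be passed to fusermount -u. The function name gfarm_mountpoints is illustrative, and the filter assumes that the mountpoint path contains no spaces.

```shell
#!/usr/bin/env bash
# Read df output on standard input and print the mountpoint (last field)
# of every line whose file system name is gfarm2fs.
# gfarm_mountpoints is an illustrative name, not an HPCI tool.
gfarm_mountpoints() {
    awk '$1 == "gfarm2fs" { print $NF }'
}

# Example:
#   df -H 2>/dev/null | gfarm_mountpoints
```

Unlike grep $USER, this filter matches on the file system name, so it also lists gfarm2fs mounts whose path does not contain your user name; check each path before unmounting it.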
“Operation not permitted” and the shared storage cannot be mounted
■ fusermount: mount failed: Operation not permitted
If you see this message, it is possible that the access permission required for mounting
has not yet been granted. If the problem persists, contact the HPCI Helpdesk.
“Transport endpoint is not connected” and files cannot be accessed
If you see this error message when trying to access a file in the (mounted) shared
storage area, the mount process may have terminated after the shared storage
system was mounted. Unmount the storage system as follows and then try to mount it
again.
■ [u00000@csgw1 ~]$ umount.hpci
■ Unmount GfarmFS on /gfarm/hp000000/u000000
■ [u00000@csgw1 ~]$ mount.hpci
■ timeleft : 21:42:44
■ Mount GfarmFS on /gfarm/hp000000/u000000
■ [u00000@csgw1 ~]$
“Invalid argument” and files cannot be accessed
If you see the above error message when trying to access a file in the (mounted)
shared storage area, it is likely that the proxy certificate has expired. Obtain a new
proxy certificate, according to the instructions in Chapter 2.2, “Proxy Certificate
Issuance Procedure,” of the HPCI Login Manual (HPCI-CA01-001E):
https://www.hpci-office.jp/materials/hpci-ca01-001_e.pdf
Contact the HPCI Helpdesk if this does not resolve the problem.
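The mount.hpci examples in this manual print a line such as "timeleft : 23:49:19". Assuming that this value is the remaining lifetime of the proxy certificate in hours, minutes, and seconds (an interpretation, not something stated by the tool's documentation here), the minimal sketch below converts it to seconds so that a script can warn before the certificate expires. The function name timeleft_seconds is illustrative.

```shell
#!/usr/bin/env bash
# Convert "timeleft : HH:MM:SS" (or a bare "HH:MM:SS") to seconds.
# timeleft_seconds is an illustrative name, not an HPCI tool.
timeleft_seconds() {
    local line=$1 h m s
    line=${line#timeleft}        # drop the label if present
    line=${line//[[:space:]]/}   # drop all whitespace
    line=${line#:}               # drop the separator after the label
    IFS=: read -r h m s <<< "$line"
    # 10# forces base 10 so that values such as 08 are not read as octal.
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Example:
#   timeleft_seconds "timeleft : 23:49:19"
```

A result close to zero suggests that a new proxy certificate should be obtained before continuing to work on the shared storage.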
“Connection refused” and files cannot be accessed
If you see this error message when trying to access a file in the (mounted) shared
storage area, it is likely that the metadata server has stopped. Check the output of the
gfls command. If you see the following error, contact the HPCI Helpdesk.
■ [u00000@csgw1 ~]$ gfls
■ libgfarm: [1000058] connecting to gfmd at ms-0.r-ccs.riken.jp:601 failed, sleep 1
sec: connection refused
■ (snip)
■ libgfarm: [1000058] connecting to gfmd at ms-0.r-ccs.riken.jp:601 failed, sleep
16 sec: connection refused
■ libgfarm: [1000059] cannot connect to gfmd at ms-0.r-ccs.riken.jp:601, give up:
connection refused
■ libgfarm: [1000017] connecting to gfmd at ms-0.r-ccs.riken.jp:601: connection
refused
■ gfls: gfarm_initialize: connection refused
■ [u00000@csgw1 ~]$
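When reporting this error, it helps to state which metadata server could not be reached. As a minimal sketch, the filter below picks the host and port out of the "cannot connect to gfmd at ..., give up" line shown above. The function name gfmd_unreachable is illustrative, and the filter assumes that each message arrives on a single line, as the command prints it, rather than wrapped as in this manual.

```shell
#!/usr/bin/env bash
# Read libgfarm messages on standard input and print the host:port of any
# metadata server that libgfarm gave up connecting to.
# gfmd_unreachable is an illustrative name, not an HPCI tool.
gfmd_unreachable() {
    sed -n 's/.*cannot connect to gfmd at \([^,]*\), give up.*/\1/p'
}

# Example:
#   gfls 2>&1 | gfmd_unreachable
```

If the filter prints a host name, include it in your help request together with the full output of gfls.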
“Input/Output Error” and files cannot be accessed
If you see this message when trying to access files in the (mounted) shared storage
area, investigate the situation using the gfexport, gfls, or gfwhere commands.
Examples showing the normal outputs of these commands are given below. If you do
not see messages like these, it is likely that an error has occurred.
■ [u00000@csgw1 ~]$ gfexport test.dat
■ [u00000@csgw1 ~]$ ← No output on success
■ [u00000@csgw1 ~]$ gfls -l test.dat
■ -rw-r--r-- 2 hpci000000 hp000000 104857600 Apr 1 01:23 gfarm://ms-0.r-ccs.riken.jp:601/home/hp000000/u000000/test.dat
■ [u00000@csgw1 ~]$
■ [u00000@csgw1 ~]$ gfwhere -di test.dat
■ ss-09-0-2.r-ccs.riken.jp
■ gfs53-2.hpci.itc.u-tokyo.ac.jp
■ [u00000@csgw1 ~]$
If either the file size is 0 or the gfwhere -di command outputs a file system node, it is
possible that the system has failed. In this case, contact the HPCI Helpdesk.
If the file size is greater than 0 and the gfwhere -di command outputs nothing, all
replicas of the file may have been damaged or lost. In this case, investigate the
situation using the gfstat command. An example of the normal output of the gfstat
command is as follows.
■ [u00000@csgw1 ~]$ gfstat test.dat
■ File: "gfarm://ms-0.r-ccs.riken.jp:601/home/hp000000/u000000/work/test.dat"
■ Size: 10485760 Filetype: regular file
■ Mode: (0644) Uid: (hpci000000) Gid: (hp000000)
■ Inode: 117016462 Gen: 1
■ (0000000006F9878E0000000000000001)
■ Links: 1 Ncopy: 2
■ Access: 2014-11-11 16:30:30.210115479 +0900
■ Modify: 2014-11-11 16:30:24.332430836 +0900
■ Change: 2014-11-11 16:30:24.332430836 +0900
■ [u00000@csgw1 ~]$
If you do not see the above status, it is likely that an error has occurred. In this case,
contact the HPCI Helpdesk.
Note: the gfwhere command outputs the hosts where the replicas of a file are managed.
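The decision rules given for the “Input/Output Error” case can be summarized in a minimal sketch. It takes the file size reported by gfls -l and the number of hosts printed by gfwhere -di, and prints the next step described in this section. The function name diagnose_io_error and the wording of its messages are illustrative, not output of any HPCI tool.

```shell
#!/usr/bin/env bash
# Suggest the next step for an Input/Output Error, given the file size from
# "gfls -l" and the number of hosts listed by "gfwhere -di".
# diagnose_io_error is an illustrative name, not an HPCI tool.
diagnose_io_error() {
    local size=$1      # file size in bytes
    local replicas=$2  # number of hosts printed by gfwhere -di
    if [ "$size" -eq 0 ] || [ "$replicas" -gt 0 ]; then
        echo "possible system failure: contact the HPCI Helpdesk"
    else
        echo "file data may be damaged or lost: check with gfstat"
    fi
}

# Example (test.dat is a placeholder file name):
#   diagnose_io_error 104857600 "$(gfwhere -di test.dat | wc -l)"
```

In both outcomes the end point is the HPCI Helpdesk if the gfstat output also looks abnormal; the sketch only indicates which information to gather first.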