Using the Mass Storage System HPSS
Presented by:
Mitchell Griffith Oak Ridge Leadership Computing Facility (OLCF)
HPSS (archival storage)
• What is HPSS?
  – HPSS is software that manages petabytes of data on disk and robotic tape libraries. HPSS provides highly flexible and scalable hierarchical storage management that keeps recently used data on disk and less recently used data on tape.
  – Hierarchical storage system
    • Disk cache
    • Tape backend
    • Core Server, DB2 Metadata, Tape Control Software
• References
  – http://www.hpss-collaboration.org
  – http://www.hpss-collaboration.org/documents/hpss732/install_guide.pdf
  – http://www.mgleicher.us
HPSS Layout
[Figure: OLCF/NCCS, NICS, and HPSS]
HPSS is shared between the OLCF and NICS
If you need to move data between centers, contact the NICS or OLCF helpline
Typical downtimes are Tuesdays, 8 AM – 11 AM
HPSS Terms
• Storage Class
  – Collection of storage devices
  – Either disk or tape (not a combination)
• Class of Service (COS)
  – Collection of disks and tapes
  – Sets the hierarchy levels for the storage classes
  – Top level is disk, bottom levels are tape
HPSS Terms
• Migrate
  – Copy the file from a higher storage class to a lower storage class (migrate from disk to tape)
• Stage
  – Copy the file from a lower storage class to a higher storage class (stage from tape to disk)
• Purge
  – Remove the file from the top-level storage class (free disk cache)
  – Does NOT delete the file
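The three terms can be illustrated with a toy model. This is only a sketch of the vocabulary, not of how HPSS is implemented: the `ArchivedFile` class, the two level names, and the rule that a purge only happens once a tape copy exists are all assumptions made for the example.

```python
from dataclasses import dataclass, field

DISK, TAPE = "disk", "tape"  # hypothetical names for a two-level hierarchy


@dataclass
class ArchivedFile:
    name: str
    # Set of levels currently holding a copy; new files start in the disk cache.
    levels: set = field(default_factory=lambda: {DISK})

    def migrate(self):
        """Copy from the higher level (disk) to the lower level (tape)."""
        if DISK in self.levels:
            self.levels.add(TAPE)

    def stage(self):
        """Copy from the lower level (tape) back up to the higher level (disk)."""
        if TAPE in self.levels:
            self.levels.add(DISK)

    def purge(self):
        """Free the disk-cache copy; the file itself is not deleted."""
        # Assumption of this sketch: only purge when a tape copy exists.
        if TAPE in self.levels:
            self.levels.discard(DISK)


f = ArchivedFile("results.tar")
f.migrate()
f.purge()
print(sorted(f.levels))  # only the tape copy remains
f.stage()
print(sorted(f.levels))  # copies on both disk and tape
```

In this model a purged file is still listed and still retrievable; getting it back simply requires a stage from tape first.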
HPSS Layout
• COS
  – Single copy COS (lscos)

COS   Name       Min (bytes)           Max (bytes)
5081  Xsmall     0                     131,071 (128K)
6001  Small      131,072 (128K)        16,777,215 (16M)
6002  Medium     16,777,216 (16M)      536,870,911 (512M)
6054  Large_T    536,870,912 (512M)    8,589,934,591 (8G)
6056  X-Large_T  8,589,934,592 (8G)    281,474,976,710,656 (256T)
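The size ranges in the COS table can be turned into a small lookup. The sketch below assumes that a file's COS is chosen purely by its size in bytes, which the Min/Max columns suggest but the slides do not state outright. It also takes the X-Large_T minimum to be 8 GiB (8,589,934,592 bytes) so that the ranges do not overlap. The function name `cos_for_size` is hypothetical.

```python
# (COS id, name, min bytes, max bytes), following the lscos table.
COS_TABLE = [
    (5081, "Xsmall", 0, 128 * 1024 - 1),
    (6001, "Small", 128 * 1024, 16 * 1024**2 - 1),
    (6002, "Medium", 16 * 1024**2, 512 * 1024**2 - 1),
    (6054, "Large_T", 512 * 1024**2, 8 * 1024**3 - 1),
    (6056, "X-Large_T", 8 * 1024**3, 256 * 1024**4),
]


def cos_for_size(size_bytes):
    """Return (cos_id, name) for the first range that contains size_bytes."""
    for cos_id, name, lo, hi in COS_TABLE:
        if lo <= size_bytes <= hi:
            return cos_id, name
    raise ValueError(f"no COS covers a file of {size_bytes} bytes")


print(cos_for_size(1048576))  # a 1 MiB file falls in the Small range
```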
HPSS Layout
• Disk Cache
  – Disk cache is striped (multiple disks, faster device)
  – Upgraded disk cache (660TB in disk cache)
  – (12) disk movers in production
• Tape
  – (17) production movers
• Future Work
  – Planning on updating hardware
  – Adding additional movers
  – Faster network backplane
HPSS Layout
• (6) SL8500s
  – 10,000 tape slots per silo
• 40,000+ tapes in use
  – Tape capacity (500GB, 1TB, 5TB, 8.5TB)
• 112 tape drives
  – 16 T10K-A
  – 60 T10K-B
  – 36 T10K-C
HPSS Interfaces
The only ways to interface with HPSS:
• HSI
  – Similar to ftp
  – Authentication (keytab or combo)
  – Can be interactive or non-interactive
  – Preferably only 1 stream at a time
  – Batch options allow HPSS to retrieve files more efficiently
• HTAR
  – Acts like a wrapper to tar for HPSS
  – Faster than creating a tar file and then putting it to HPSS
  – Good for file aggregation
  – Preferably only 1 stream at a time
HSI
• For a complete list of commands:
  – http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/hsi_commands/
• Most commonly used commands:
• cd • chgrp • chmod • chown • du
• get • In • ls • mkdir • mv
• put • rm • rmdir • pwd
HSI get and put example
local_file_name : hpss_file_name
K:[/home/mitchell]: put 1MB : /home/mitchell/test/1mb
put '1MB' : '/home/mitchell/test/1mb' ( 1048576 bytes, 30173.8 KBS (cos=6001))
K:[/home/mitchell]: get test.file : /home/mitchell/test/1mb
get 'test.file' : '/home/mitchell/test/1mb' (2014/02/06 01:00:35 1048576 bytes, 4965.3 KBS )
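For scripted (non-interactive) use, the same `local_file_name : hpss_file_name` form can be passed to `hsi` as a single quoted command. The helper below is a minimal sketch that only builds the argument list and does not run anything. The function name `hsi_transfer_args` is hypothetical, and actually executing the command (for example with `subprocess.run`) assumes `hsi` is installed and authentication is already set up.

```python
import shlex


def hsi_transfer_args(verb, local_path, hpss_path):
    """Build the argv for a one-shot hsi put/get: hsi "<verb> local : hpss"."""
    if verb not in ("put", "get"):
        raise ValueError("verb must be 'put' or 'get'")
    # The local name comes first and the HPSS name second, for both put and get.
    return ["hsi", f"{verb} {local_path} : {hpss_path}"]


args = hsi_transfer_args("put", "1MB", "/home/mitchell/test/1mb")
print(shlex.join(args))  # shell-quoted form of the command line
```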
htar examples
mitchell@krakenpf6:/lustre/scratch/mitchell> htar cf backup.tar .
HTAR: HTAR SUCCESSFUL
mitchell@krakenpf6:/lustre/scratch/mitchell> htar tvf backup.tar
HTAR: drwx------ mitchell/tsg01 0 2014-02-06 01:26 ./
HTAR: drwxr-xr-x mitchell/tsg01 0 2014-02-06 01:24 ./a/
HTAR: -rw-r--r-- mitchell/tsg01 1048576 2014-02-06 01:23 ./a/a.file
HTAR: -rw-r--r-- mitchell/tsg01 1048576 2014-02-06 01:24 ./a/b.file
HTAR: -rw-r--r-- mitchell/tsg01 1048576 2014-02-06 01:24 ./a/c.file
HTAR: -rw-r--r-- mitchell/tsg01 1048576 2014-02-06 01:24 ./a/d.file
HTAR: drwxr-xr-x mitchell/tsg01 0 2014-02-06 01:24 ./d/
HTAR: -rw-r--r-- mitchell/tsg01 1048576 2014-02-06 01:23 ./d/a.file
HTAR: -rw-r--r-- mitchell/tsg01 1048576 2014-02-06 01:24 ./d/b.file
HTAR: -rw-r--r-- mitchell/tsg01 1048576 2014-02-06 01:24 ./d/c.file
HTAR: -rw-r--r-- mitchell/tsg01 1048576 2014-02-06 01:24 ./d/d.file
HTAR: -rw------- mitchell/tsg01 256 2014-02-06 01:26 /tmp/HTAR_CF_CHK_28882_1391667988
HTAR: Listing complete for backup.tar, 9 files 12 total objects
HTAR: HTAR SUCCESSFUL
htar examples
mitchell@krakenpf6:/lustre/scratch/mitchell> hsi get backup.tar
mitchell@krakenpf6:/lustre/scratch/mitchell> tar tvf backup.tar
drwx------ mitchell/tsg01 0 2014-02-06 01:26 .
drwxr-xr-x mitchell/tsg01 0 2014-02-06 01:24 ./a
-rw-r--r-- mitchell/tsg01 1048576 2014-02-06 01:23 ./a/a.file
-rw-r--r-- mitchell/tsg01 1048576 2014-02-06 01:24 ./a/b.file
-rw-r--r-- mitchell/tsg01 1048576 2014-02-06 01:24 ./a/c.file
-rw-r--r-- mitchell/tsg01 1048576 2014-02-06 01:24 ./a/d.file
drwxr-xr-x mitchell/tsg01 0 2014-02-06 01:24 ./d
-rw-r--r-- mitchell/tsg01 1048576 2014-02-06 01:23 ./d/a.file
-rw-r--r-- mitchell/tsg01 1048576 2014-02-06 01:24 ./d/b.file
-rw-r--r-- mitchell/tsg01 1048576 2014-02-06 01:24 ./d/c.file
-rw-r--r-- mitchell/tsg01 1048576 2014-02-06 01:24 ./d/d.file
tar: Removing leading `/' from member names
-rw------- mitchell/tsg01 256 2014-02-06 01:26 /tmp/HTAR_CF_CHK_28882_1391667988
htar examples
mitchell@krakenpf6:/lustre/scratch/mitchell/test> htar xf backup.tar
HTAR: HTAR SUCCESSFUL
mitchell@krakenpf6:/lustre/scratch/mitchell/test> ls
a d
mitchell@krakenpf6:/lustre/scratch/mitchell/test> rm -fr a d
mitchell@krakenpf6:/lustre/scratch/mitchell/test> ls -l
total 0
mitchell@krakenpf6:/lustre/scratch/mitchell/test> htar xvf backup.tar ./d/a.file
HTAR: x ./d/a.file, 1048576 bytes, 2049 media blocks
HTAR: Extract complete for backup.tar, 1 files. total bytes read: 1,049,088 in 0.006 seconds (188.955 MB/s )
HTAR: HTAR SUCCESSFUL
mitchell@krakenpf6:/lustre/scratch/mitchell/test> ls -l d/a.file
-rw-r--r-- 1 mitchell tsg01 1048576 2014-02-06 01:23 d/a.file
mitchell@krakenpf6:/lustre/scratch/mitchell/test>
HPSS Interfaces
• Access time can be slow if accessing from tape
  – HSI get command:
    K:[/home/USER]: get a.pdf
    Scheduler: retrieving file(s)
    • Waiting on a tape drive (resource contention)
  – HTAR: you will not see any message while it waits
• HTAR can give you just a standard tar file
  – hsi get file.tar
  – $ tar xf file.tar
HPSS Errors (possible tape issue)
[/home/USER]: get test.tar
Scheduler: retrieving file(s)
HPSS EIO error, will retry in 10 seconds0.00 0.00%] [Done: 0 Failed: 0]
HPSS EIO error, will retry in 60 seconds0.00 0.00%] [Done: 0 Failed: 0]
HPSS EIO error, will retry in 360 seconds.00 0.00%] [Done: 0 Failed: 0]
HPSS EIO error, aborting0.00 Xferred: 0.00 0.00%] [Done: 0 Failed: 0]
*** hpss_Open: I/O error [-5: HPSS_EIO]
/home/USER/10_5.5A.tar
*** get: Error -1 on transfer. /lustre/USER/test.tar from
/home/USER/test.tar
Troubleshooting Common Codes
• HPSS_EPERM (-1): Permission issue; the user does not have permission to perform the task
• HPSS_ENOSPACE (-28): HPSS does not have enough space (disk or tape) to perform the task
• HPSS_EIO (-5): I/O error; could be anything from HPSS to the network to the Lustre file system
• HPSS_ECONN (-50): Connection issue
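When scanning transfer logs, the bracketed code in a line such as `*** hpss_Open: I/O error [-5: HPSS_EIO]` can be matched against the four codes listed above. The following is a minimal sketch; the `explain` helper and the regular expression are assumptions based only on the error format shown in this deck, and other HPSS messages may be formatted differently.

```python
import re

# The four codes from the troubleshooting list.
HPSS_CODES = {
    -1: ("HPSS_EPERM", "permission issue"),
    -28: ("HPSS_ENOSPACE", "not enough space (disk or tape)"),
    -5: ("HPSS_EIO", "I/O error (HPSS, network, or Lustre)"),
    -50: ("HPSS_ECONN", "connection issue"),
}

# Matches the "[-5: HPSS_EIO]" fragment of an error line.
_CODE_RE = re.compile(r"\[(-\d+):\s*(HPSS_\w+)\]")


def explain(line):
    """Return a short explanation for the HPSS code in line, or None."""
    match = _CODE_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    name, hint = HPSS_CODES.get(code, (match.group(2), "not in the list above"))
    return f"{name} ({code}): {hint}"


print(explain("*** hpss_Open: I/O error [-5: HPSS_EIO]"))
```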
Authentication Issue
> hsi
result = -11000, errno = 0g]
Unable to authenticate user with HPSS.
result = -11000, errno = 9
Unable to setup communication to HPSS...
*** HSI: error opening logging
Error - authentication/initialization failed
Workaround
> hsi -A combo (only if you have an OTP)
HSI Batch
K:[/home/mitchell]: in hpss.marker
in hpss.marker
get <<MARKER
get 'b1.tar' : '/home/mitchell/b1.tar' (2014/02/06 01:52:41 8395776 bytes, 19723.2 KBS )
get 'b4.tar' : '/home/mitchell/b4.tar' (2014/02/06 01:54:48 8395776 bytes, 400446.2 KBS )
get 'b2.tar' : '/home/mitchell/b2.tar' (2014/02/06 01:52:46 8395776 bytes, 19706.4 KBS )
get 'b3.tar' : '/home/mitchell/b3.tar' (2014/02/06 01:52:51 8395776 bytes, 13278.8 KBS )
K:[/home/mitchell]:
• When retrieving multiple files, it is strongly recommended to use the MARKER option
• This schedules retrievals in an optimal way so as to minimize HPSS tape mounts
$ cat hpss.marker
get <<MARKER
b1.tar
b2.tar
b3.tar
b4.tar
MARKER
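A marker file like `hpss.marker` can be generated from a list of file names instead of being written by hand. The sketch below only produces the text of the file, one file name per line between the `get <<MARKER` line and the closing `MARKER`. The helper name `marker_batch` is hypothetical, and the one-name-per-line layout is an assumption based on the example file in this deck.

```python
def marker_batch(filenames, marker="MARKER"):
    """Return the text of an hsi input file that gets all filenames in one batch."""
    lines = [f"get <<{marker}", *filenames, marker]
    return "\n".join(lines) + "\n"


text = marker_batch(["b1.tar", "b2.tar", "b3.tar", "b4.tar"])
print(text, end="")
```

The resulting text could be saved as `hpss.marker` and read from an interactive session with `in hpss.marker`, as in the session above.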
HPSS Tips
• HPSS is archival storage
  – Many very small files are bad for HPSS (metadata overhead)
  – Files that are too large are bad as well (disk cache fills up)
  – Optimal file size for HPSS is between 2GB and 256GB
  – Use HTAR to archive several smaller files into one
• A member file of an HTAR file cannot be > 64GB
• Submit to the HPSS queue (schedule the transfers)
  – http://www.nics.tennessee.edu/computing-resources/hpss/queue
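The size guidance above can be checked before a transfer. This is a minimal sketch under two assumptions: that "GB" on the slide means 2**30 bytes, and that the 2GB and 256GB bounds are inclusive. The function names are hypothetical.

```python
GIB = 1024**3  # assumption: the slide's "GB" is read as 2**30 bytes


def size_category(size_bytes):
    """Classify a file against the suggested 2GB to 256GB range."""
    if size_bytes < 2 * GIB:
        return "below range: consider aggregating with HTAR"
    if size_bytes > 256 * GIB:
        return "above range: may fill the disk cache"
    return "within range"


def htar_member_ok(size_bytes):
    """A member file of an HTAR archive cannot be larger than 64GB."""
    return size_bytes <= 64 * GIB


print(size_category(1048576))
print(htar_member_ok(100 * GIB))
```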
HPSS Tips
• For additional information, try the transfer with the debug option
  – hsi -d5 "get /dev/null : file.tar"
  – debug 5; get /dev/null : file.tar
• When moving data to Lustre, be mindful of the Lustre striping
  – Rule of thumb for NICS: 1 stripe every 50GB
  – http://www.nics.tennessee.edu/node/311
• Once data is on scratch, use standard transfer methods:
  – http://www.nics.tennessee.edu/computing-resources/data-transfer
  – https://www.olcf.ornl.gov/kb_articles/transferring-data/?nccssystems=DTN
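The NICS striping rule of thumb (1 stripe every 50GB) amounts to a small calculation. The sketch below assumes that "GB" means 2**30 bytes, that partial chunks round up, and that every file gets at least one stripe. None of these details are spelled out on the slide, and the function name `stripe_count` is hypothetical.

```python
GIB = 1024**3  # assumption: the slide's "GB" is read as 2**30 bytes


def stripe_count(size_bytes, bytes_per_stripe=50 * GIB):
    """Suggested Lustre stripe count: one stripe per 50GB, rounded up, minimum 1."""
    # Integer ceiling division avoids floating-point rounding on large sizes.
    count = -(-size_bytes // bytes_per_stripe)
    return max(1, count)


print(stripe_count(200 * GIB))  # 4
```

The result would typically be applied to the destination directory before the transfer, for example with Lustre's `lfs setstripe -c <count> <dir>`. That command is not shown in the slides, so check the linked NICS page for the recommended procedure.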
Questions and Contact Information
• OLCF: [email protected]
• NICS and XSEDE users: [email protected]
• NICS and non-XSEDE users: [email protected]