Linux Clusters Institute: High Performance Storage
University of Oklahoma, 05/19/2015
Mehmet Belgin, Georgia Tech [email protected]
(in collaboration with Wesley Emeneker)
The Fundamental Question
• How do we meet *all* user needs for storage?
• Is it even possible?
• Confounding factors
  • User expectations (in their own words)
  • Budget constraints
  • Application needs and use cases
  • Expertise in team
  • Existing infrastructure
Examples of Common Storage Systems
• Network File System (NFS) – a distributed file system protocol for accessing files over a network.
• Lustre – a parallel, distributed file system
  • OSS – object storage server. This server stores and manages pieces of files (aka objects)
  • OST – object storage target. This disk is managed by the OSS and stores data
  • MDS – metadata server. This server stores file metadata.
  • MDT – metadata target. This disk is managed by the MDS and stores file metadata
• General Parallel File System (GPFS) – a parallel, distributed file system.
  • Metadata is not owned by any particular server or set of servers.
  • All clients participate in filesystem management
  • NSD – network shared disk
• Panasas/PanFS – a parallel, distributed file system
  • Metadata is owned by director blades
  • File data is owned by storage blades
Nomenclature
• Object store – a place where chunks of data (aka objects) are stored. Objects are not files, though they can store individual files or different pieces of files.
• Raw space – what the disk label shows. Typically given in base 10.
i.e. 10TB (terabyte) == 10*10^12 bytes
• Usable space – what “df” shows once the storage is mounted. Typically given in base 2.
i.e. 10TiB (tebibyte) == 10*2^40 bytes
• Usable space is often about 30% smaller (sometimes more, sometimes less) than raw space.
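A quick sanity check of the base-10 vs. base-2 gap alone (a minimal sketch using bc; the values are illustrative):

$ echo '10 * 10^12 / 2^40' | bc -l    # ≈ 9.09, so a "10TB" label is only ~9.1TiB

The rest of the difference typically comes from filesystem overhead, RAID parity, and reserved space.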
Which one is right for me?
Lustre
The End.
Thanks for participating!
Before we start…
What is a File System?
What is a filesystem?
• A system for files (Duh!)
• A source of constant frustration
• A filesystem is used to control how data is stored and retrieved –Wikipedia
• It’s a container (that contains files)
• It’s the set of disks, servers (computational components), networking, and software
• All of the above
Disclaimer
• There are no right answers
• There are wrong answers
  • No, seriously.
• It comes down to balancing tradeoffs of preferences, expertise, costs, and case-by-case analysis
Know Your Stakeholders
… and keep all of them happy! (at the same time)
1. Users
2. Managers and University Leadership
3. University support staff
4. System administrators
5. Vendor
What do you need to support?
Common Storage Requirements (which most users can’t articulate)
• Temporary storage for intermediate results from jobs (a.k.a scratch)
• Long-term storage for runtime use
• Backups
• Archive
• Exporting said filesystem to other machines (like a user's Windows XP laptop)
• Virtual Machine hosting
• Database hosting
• Map/Reduce (a.k.a Hadoop)
• Data ingest and outgest (DMZ?)
• System Administrator storage
Tradeoffs
First, try to define ‘use purpose’ and ‘operational lifetime’…
• Speed (… is a relative term!)
• Space
• Cost
• Scalability
• Administrative burden
• Monitoring
• Reliability/Redundancy
• Features
• Support from vendor
Parallel/Distributed vs. Serial Filesystems*
Serial
• It doesn’t scale beyond a single server
• It often isn’t easy to make it reliable or redundant beyond a single server
• A single server controls everything
Parallel
• Speed increases as more components are added to it
• Built for distributed redundancy and reliability
• Multiple servers contribute to the management of the filesystem
*None of these things are 100% true
The Most Common Solutions for HPC
Want to access your data from everywhere? You need “Network Attached Storage (NAS)”!
• NFS (serial-‐ish)
• GPFS (Parallel)
• Lustre (Parallel)
• Panasas (Parallel)
• What about others like OrangeFS, Gluster, Ceph, XtreemFS, CIFS, HDFS, Swift, etc.?
Prepare for a Challenge
• Administrative burden & needed expertise (anecdotal), increasing roughly from low to high:
  NFS → Panasas → GPFS → Lustre
• Your mileage may vary!
Network File System (NFS)
• Can be built from commodity parts or purchased as an appliance
• A single server typically controls everything
• Where does it fall for our tradeoffs?
  • No software cost
  • Compatible (not 100% POSIX)
  • Underlying filesystem does not matter much (ZFS, ext3, …)
  • True redundancy is harder (single point of failure)
  • Mostly for low-volume, low-throughput workloads
  • Strong client-side caching, works well for small files
  • Requires minimal expertise and is (relatively) easy to manage
(tradeoffs to consider: Speed, Space, Cost, Scalability, Administrative Burden, Monitoring, Reliability/Redundancy, Features, Vendor Support)
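The “single server controls everything” model keeps setup simple. A minimal sketch of one export and one client mount (hostnames, paths, and options here are hypothetical, not a recommendation):

# on the NFS server, in /etc/exports:
/export/home  10.10.0.0/16(rw,sync,no_subtree_check)
$ exportfs -ra                                   # re-read /etc/exports

# on each client (or via /etc/fstab or autofs):
$ mount -t nfs nfs1.example.edu:/export/home /nfs/home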
General Parallel File System (GPFS)
• Can be built from commodity parts or purchased as an appliance
• All nodes in the GPFS cluster par9cipate in filesystem management
• Metadata is managed by every node in the cluster
• Where does it fall in our tradeoffs?
(diagram: GPFS clients and NSD servers connected by a network)
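For a feel of what “all nodes participate” looks like operationally, here are a few read-only commands an admin might run to inspect the layout (a sketch assuming the GPFS mm* tools are installed and a filesystem named gpfs0 exists):

$ mmlscluster      # cluster members, quorum and manager nodes
$ mmlsnsd          # NSDs and the servers that serve them
$ mmlsfs gpfs0     # filesystem parameters (block size, replication, ...)
$ mmdf gpfs0       # free/used space per NSD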
Lustre
• Can be built from commodity parts, or purchased as an appliance
• Separate servers for data and metadata
• Where does it fall in our tradeoffs?
* Image credit: nor-tech.com
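Because data and metadata are split across OSSs/OSTs and the MDS/MDT, striping is the main user-visible knob. A minimal sketch with the lfs client tool (the paths and stripe count are illustrative):

$ lfs df -h /lustre/scratch                      # capacity and usage per OST/MDT
$ lfs setstripe -c 4 /lustre/scratch/run42       # new files in this dir stripe over 4 OSTs
$ lfs getstripe /lustre/scratch/run42/out.dat    # show which OSTs hold an existing file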
Panasas
• Is an appliance
• Separate servers for metadata and data
• Where does it fall in our tradeoffs?
* Image credit: panasas.com
Appliances
• Appliances generally come with vendor tools for monitoring and management
• Do these tools increase or decrease management complexity?
• How important is vendor support for your team?
Screenshot of Panasas management tool
Good idea? Bad idea? Let’s discuss!
• NFS for everything
• Panasas for everything
• Lustre for everything
• GPFS for everything
How about…
• Lustre for work (files stored here are temporary)
• NFS for home
• Tape for backup and archival

• Lustre available everywhere
• Tape available on data movers
• NFS only available on login machines
Designing your storage solution
• Who are the stakeholders?
• How quickly should we be able to read any one file?
• How will people want to use it?
• How much training will you need?
• How much training will your users need to effectively use your storage?
• Do you have the knowledge necessary to do the training?
• How often do they need the training?
• Do you need different tiers or types of storage?
  • Long-term
  • Temporary
  • Archive
• From what science/usage domains are the users?
  • aka what applications will they be using?
• What features are necessary?
Application-Driven Tradeoffs
• Domain Science
  • Chemistry
  • Aerospace
  • Bio* (biology, bioinformatics, biomedical)
  • Physics
  • Business
  • Economics
  • etc.
• Data and Application Restrictions
  • HIPAA and PHI
  • ITAR
  • PCI DSS
  • And many more (SOX, GLBA, CJIS, FERPA, SOC, …)
What you need to know
• What is the distribution of files?
  • sizes, count
• What is the expected workload?
  • How many bytes are written for every byte read?
  • How many bytes are read for each file opened?
  • How many bytes are written for each file opened?
• Are there any system-based restrictions?
  • POSIX conformance. Do you need a POSIX filesystem?
  • Limitations on the number of files, or files per directory
  • Network compatibility (IB vs. Ethernet)
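If you do not already know your file-size distribution, a rough histogram is cheap to collect. A sketch assuming GNU find/awk and a hypothetical project directory:

$ find /scratch/myproject -type f -printf '%s\n' |
    awk '{ b = 2^int(log($1 + 1)/log(2)); n[b]++ }
         END { for (b in n) printf "%15d bytes   %d files\n", b, n[b] }' |
    sort -n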
Use Case: Data Movement
• Scenario: User needs to import a lot of data
• Where is the data coming from?
  • Campus LAN?
  • Campus WAN?
  • WAN?
• How often will the data be ingested?
• Does it need to be outgested?
• What kind of data is it?
• Is it a one-time ingest or regular?
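For a recurring ingest over the campus network or WAN, plain rsync over ssh is often enough before reaching for dedicated transfer tools. A sketch assuming a reasonably recent rsync; hosts and paths are hypothetical:

$ rsync -a --partial --info=progress2 \
    user@instrument.example.edu:/export/run42/ \
    /scratch/project42/ingest/
# re-running transfers only new or changed files, which suits a nightly cron job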
Designing your storage solution
• What technologies do you need to sa9sfy the requirements that you now have?
• Can you put a number on the following?
  • Minimum disk throughput from a single compute node
  • Minimum aggregate throughput for the entire filesystem for a benchmark (like iozone or IOR)
  • I/O load for representative workloads from your site
    • How much data and metadata is read/written per job?
  • Temporary space requirements
  • Archive and backup space requirements
    • How much churn is there in data that needs to be backed up?
Storage Devices
• Solid State
  • RAM
  • PCIe SSD
  • SATA/SAS SSD
• Spinning Disk
  • SAS
  • NL-SAS
  • SATA
• Tape
(ordered from highest speed & cost and lowest capacity at the top to lowest speed & cost and highest capacity at the bottom)
o Serial ATA (SATA): $/byte, large capacity, less reliable, slower (7.2k RPM)
o Serial Attached SCSI (SAS): $$/byte, small capacity, reliable, fast (15k RPM)
o Nearline-SAS: SATA drives with a SAS interface: more reliable than SATA, cheaper than SAS, ~SATA speeds but with lower overhead
o Solid State Disk (SSD): no spinning disks, $$$/byte, blazing fast, reliable
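When in doubt about what is actually behind a block device, the kernel and SMART data will tell you whether it spins and how fast. A sketch assuming smartmontools is installed and /dev/sda is the device in question:

$ cat /sys/block/sda/queue/rotational              # 1 = spinning disk, 0 = SSD
$ smartctl -i /dev/sda | grep -i 'rotation rate'   # e.g. 7200 rpm, or "Solid State Device"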
What is an IOP?
• IOP == Input/Output Operation
• IOPS == Input/Output Operations per Second
• We care about two IOPS reports
  • The number we tell people when we say “Our Veridian Dynamics Frobulator 2021 gets 300PiB/s bandwidth!”
  • The number that affects users: “Our Veridian Dynamics Frobulator 2021 only gets 5KiB/s for <insert your application’s name>”
• Why the difference?
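Part of the gap is plain arithmetic: sustained throughput is roughly IOPS × transfer size per operation, so the same device looks very different depending on how it is asked to work. With illustrative numbers (not measurements from any system mentioned here):
  5,000 IOPS × 1 MiB per op ≈ 4.9 GiB/s  (large, streaming I/O)
  5,000 IOPS × 4 KiB per op ≈ 20 MiB/s   (small, random I/O)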
More tradeoffs …
Space vs. Speed
• Do you need 10GiB/s and 10TiB of space?
• Do you need 1PiB of usable storage and 1GiB/s?
• How do you meet your requirements?
Large vs. Small Files
• What is a small file?
  • No hard rule. It depends on how you define it.
  • At GT, small is < 1MiB?
• Why do you care?
  • Metadata operations are deadly. A metadata lookup on a 1TiB file takes the same time as a lookup on a 1KiB file.
Example Storage Solution (Georgia Tech)
Experienced catastrophic failure(s) with all of them at least once
• Panasas appliance for scratch (shared by all)
• GPFS appliance on SATA/NL-SAS/SAS for long-term (and some home)
• NFS on SATA for long-term (many servers)
• NFS on SATA for home (a few servers)
• NFS for administrative storage
• NFS for daily backups
• Coraid system (NFS) for application repository and VM images
• Building a homebrew GPFS from commodity components for scratch!
(informational purposes only, not a recommendation)
Storage Policies (Georgia Tech)
• 5GB home space
  • backed up daily
  • provided by GT
  • NFS
• ∞ project space
  • backed up daily
  • faculty-purchased, but GT buys the backup space
  • mix of NFS and GPFS (transitioning to GPFS)
• 5TB/7TB Scratch/Temporary
  • not backed up
  • purchased by GT
  • PanFS (soon to be something else)
Storage Policies (Georgia Tech)
• Scratch
  • Files older than 60 days are marked for removal
  • Users are given one week to save their data (or make a plea for more time)
  • Marked files are removed after 1 week
  • Not backed up
• Quotas
  • Quota increases must be requested by the owner or a designated manager
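The 60-day sweep can be approximated with find. A sketch (GNU find assumed; the path and output file are hypothetical, and in production you would notify owners and stage removals rather than delete directly):

$ find /scratch -xdev -type f -mtime +60 -printf '%TY-%Tm-%Td %u %p\n' \
    > /var/tmp/scratch-purge-candidates.txt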
Best Practices
• Benchmark the system whenever you can
  • Especially when you first get it (this is the baseline)
  • Then, every time you take the system down (so that you can tell if something has changed)
• Run the EXACT SAME test!
• Test the redundancy and reliability
  • Does it survive a drive or server failure? Power something off or rip it out while you are putting a load on it
• Don’t solely rely on generic benchmarks
  • Run the applications your stakeholders care about
• Regularly get data about your data
  • Monitor the status of your filesystem, proactively fix problems
• Constantly ask users (and other stakeholders) how they feel about performance
  • It doesn’t matter if benchmarks are good if they feel it is bad
How About Cloud and Big Data?
Design/Standards:
• POSIX: Portable Operating System Interface (NFS, GPFS, Panasas, Lustre)
• REST: Representational State Transfer, designed for scalable web services
Case-specific solutions:
• Software-defined, hardware-independent storage (e.g. Swift)
• Proprietary object storage (e.g. S3 for AWS, which is RESTful)
• Geo-replication: DDN WOS, Azure, Amazon S3
• Open-source object storage: Ceph vs. Gluster vs. …
• Big data (map/reduce): Hadoop Distributed File System (HDFS), QFS, …
Future
• Hybridization of storage
  • Connecting different storages
  • Seamless migration between storage solutions (Object store <-> Object store, POSIX <-> Object)
• Ethernet-connected drives
  • Seagate’s Kinetic interface
  • HGST’s open Ethernet drive
• YAC (Yet Another Cache)
  • Intel Cache Acceleration Software
  • DDN Infinite Memory Engine
  • IBM FlashCache
BONUS material: a little bit of Benchmarking
• Use real user applications when possible!
• “dd” … quick & easy.
• “iozone” great for single/multi-node read/write performance
• “Bonnie++” simple to run, but comprehensive suite of tests
• “zcav” good test for spinning hard disks, where speed is a function of distance from the first sector.
dd
• Never run as root (destructive if used incorrectly)!
• Reads from an input file (“if”) and writes to an output file (“of”)
• You don’t need to use real files…
  • can read from devices, e.g. /dev/zero, /dev/random, etc.
  • can write to /dev/null
• Caching can be misleading… Prefer direct I/O (oflag=direct)
Example:
$ dd if=/dev/zero of=./test.dd bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 37.8403 s, 28.4 MB/s
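The same approach gives a quick read test. A companion sketch that reads the file just written back out, again bypassing the page cache:

$ dd if=./test.dd of=/dev/null bs=1G count=1 iflag=direct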
iozone
• Common test utility for read/write performance
• Great for both single-node and multi-node testing (i.e. aggregate performance)
• Sensitive to caching, use “-I” for direct I/O.
• Can run multithreaded (-t)
• Simple, ‘auto’ mode:
    iozone -a
• Or pick tests using ‘-i’:
    -i 0: read/re-read
    -i 1: write/rewrite
# iozone -i 0 -i 1 -+n -r 1M -s 1G -t 16 -I
…
Throughput test with 16 processes
Each process writes a 1048576 kByte file in 1024 kByte records

Children see throughput for 16 initial writers = 1582953.29 kB/sec
Parent sees throughput for 16 initial writers  = 1542978.62 kB/sec
Min throughput per process = 97130.07 kB/sec
Max throughput per process = 100058.16 kB/sec
Avg throughput per process = 98934.58 kB/sec
Min xfer = 1019904.00 kB

Children see throughput for 16 readers = 1393510.91 kB/sec
Parent sees throughput for 16 readers  = 1392664.11 kB/sec
Min throughput per process = 84657.09 kB/sec
Max throughput per process = 88483.99 kB/sec
Avg throughput per process = 87094.43 kB/sec
Min xfer = 1003520.00 kB
iozone multi-node testing
• Great for testing HPC storage “peak” aggregate performance
• Network becomes a significant contributor
• Requires a “hostfile” with: hosts, test_dir, iozone_path. E.g.:
    iw-h34-17 /gpfs/pace1/ddn /usr/bin/iozone
    iw-h34-18 /gpfs/pace1/ddn /usr/bin/iozone
    iw-h34-19 /gpfs/pace1/ddn /usr/bin/iozone
• Fire away!
    iozone -i 0 -i 1 -+n -e -r 128k -s <file_size> -t <num_threads> -+m <hostfile>
      -i  : tests (0: read/re-read, 1: write/re-write)
      -+n : no retests selected
      -e  : include flush/fflush in timing calculations
      -r  : record (block) size in KB
Bonnie++
• Comprehensive set of tests:
  • Create files in sequential order
  • Stat files in sequential order
  • Delete files in sequential order
  • Create files in random order
  • Stat files in random order
  • Delete files in random order
  (– Wikipedia)
• Just ‘cd’ to the directory on the filesystem, then run ‘bonnie++’
• Uses 2x client memory (by default) to avoid caching effects
• Reports performance (K/sec, higher is better) and the CPU used to perform operations (%CP, lower is better)
• Highly configurable, check its man page!
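A typical invocation on a mounted filesystem under test might look like the sketch below (the directory, size, file count, and user are placeholders; -s should be at least twice the client RAM unless you also pass -r):

$ bonnie++ -d /mnt/teststorage -s 64g -n 128 -u nobody
# -d  directory on the filesystem under test
# -s  total size of the sequential I/O test files
# -n  number of files (in multiples of 1024) for the create/stat/delete tests
# -u  user to run as when started as root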
Bonnie++
$ bonnie++
Writing with putc()...done
…
Delete files in random order...done.
Version 1.03e       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
atlas-6.pace. 7648M 39329  74 235328  34  2599   2 37794  67 37943   4  46.9   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   740   6   930   2    89   0   282   2   340   1   212   1
atlas-6.pace.gatech.edu,7648M,39329,74,235328,34,2599,2,37794,67,37943,4,46.9,0,16,740,6,930,2,89,0,282,2,340,1,212,1
zcav
• Part of the Bonnie++ suite
• “Constant Angular Velocity (CAV)” tests for spinning media
• I/O performance will differ depending on the distance of the heads from the center of the circular spinning media (first sector).
• Not meaningful for network-attached storage
• SSD runs can be interesting (you expect to see a flat line, but…)
(plots: SATA disk example from http://www.coker.com.au/bonnie++/zcav/results.html, and an SSD example from a GT machine)
zcav
Example:
$ zcav -f /dev/sda
#loops: 1, version: 1.03e
#block offset (GiB), MiB/s, time
0.00    115.75  2.212
0.25    95.93   2.669
0.50    114.63  2.233
0.75    119.14  2.149
…

What’s going on here??
$ zcav -f /dev/sda
#loops: 1, version: 1.03e
#block offset (GiB), MiB/s, time
0.00    ++++    0.092
0.25    ++++    0.094
0.50    ++++    0.091
…
When you run the same example twice, you see super fast “cached” results! Here’s how you flush I/O cache:
sync && echo 3 > /proc/sys/vm/drop_caches
The End. (for real this time)
Thanks for participating!