Final May 2015 LCI HPC Storage - Linux Clusters Institute


Page 1:

Linux Clusters Institute: High Performance Storage

University of Oklahoma, 05/19/2015

Mehmet Belgin, Georgia Tech ([email protected])

(in collaboration with Wesley Emeneker)

18-22 May 2015 1

Page 2:

The Fundamental Question

18-22 May 2015 2

• How do we meet *all* user needs for storage?

• Is it even possible?

• Confounding factors:
  • User expectations (in their own words)
  • Budget constraints
  • Application needs and use cases
  • Expertise in the team
  • Existing infrastructure

Page 3:

Examples of Common Storage Systems

18-22 May 2015 3

•  Network File System (NFS) – a distributed file system protocol for accessing files over a network.

•  Lustre – a parallel, distributed file system
   •  OSS – object storage server. This server stores and manages pieces of files (aka objects)
   •  OST – object storage target. This disk is managed by the OSS and stores data
   •  MDS – metadata server. This server stores file metadata.
   •  MDT – metadata target. This disk is managed by the MDS and stores file metadata

•  General Parallel File System (GPFS) – a parallel, distributed file system.
   •  Metadata is not owned by any particular server or set of servers.
   •  All clients participate in filesystem management
   •  NSD – Network Shared Disk

•  Panasas/PanFS – a parallel, distributed file system
   •  Metadata is owned by director blades
   •  File data is owned by storage blades

Page 4:

Nomenclature

18-22 May 2015 4

•  Object store – a place where chunks of data (aka objects) are stored. Objects are not files, though they can store individual files or different pieces of files.

•  Raw space – what the disk label shows. Typically given in base 10.

   i.e. 10TB (terabyte) == 10*10^12 bytes

•  Usable space – what "df" shows once the storage is mounted. Typically given in base 2.

   i.e. 10TiB (tebibyte) == 10*2^40 bytes

•  Usable space is often about 30% smaller (sometimes more, sometimes less) than raw space.
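A quick shell-arithmetic sanity check (illustrative only, not from the original slides): the base-10 vs. base-2 difference alone costs about 9%; RAID parity and filesystem overhead account for the rest of the gap.

$ # a "10TB" label, counted the way the vendor counts it (base 10)
$ echo $((10 * 10**12))
10000000000000
$ # the same bytes expressed in TiB (base 2), before any RAID or filesystem overhead
$ echo "scale=2; 10*10^12 / 2^40" | bc
9.09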

Page 5:

Which one is right for me?

18-22 May 2015 5

Lustre  

Page 6:

The End.

18-22 May 2015 6

Thanks for participating!

Page 7:

Before we start…

18-22 May 2015 7

What is a File System?

 

Page 8:

What is a filesystem?

•  A system for files (Duh!)

•  A source of constant frustration

•  A filesystem is used to control how data is stored and retrieved – Wikipedia

•  It's a container (that contains files)

•  It's the set of disks, servers (computational components), networking, and software

•  All of the above

18-22 May 2015 8

Page 9:

Disclaimer

• There are no right answers

• There are wrong answers
  • No, seriously.

• It comes down to balancing tradeoffs of preferences, expertise, costs, and case-by-case analysis

18-22 May 2015 9

Page 10:

Know Your Stakeholders

18-22 May 2015 10

… and keep all of them happy! (at the same time)

1.  Users

2.  Managers and University Leadership

3.  University support staff

4.  System administrators

5.  Vendor

[Diagram: Users, Managers, Sysadmins]

Page 11:

What do you need to support?
Common Storage Requirements (which most users can't articulate)

•  Temporary storage for intermediate results from jobs (a.k.a. scratch)

•  Long-term storage for runtime use

•  Backups

•  Archive

•  Exporting said filesystem to other machines (like a user's Windows XP laptop)

•  Virtual Machine hosting

•  Database hosting

•  Map/Reduce (a.k.a. Hadoop)

•  Data ingest and outgest (DMZ?)

•  System Administrator storage

18-22 May 2015 11

Page 12:

Tradeoffs

First, try to define 'use purpose' and 'operational lifetime'…

•  Speed (… is a relative term!)
•  Space
•  Cost
•  Scalability
•  Administrative burden
•  Monitoring
•  Reliability/Redundancy
•  Features
•  Support from vendor

18-22 May 2015 12

Page 13:

Parallel/Distributed vs. Serial Filesystems*

Serial
•  It doesn't scale beyond a single server
•  It often isn't easy to make it reliable or redundant beyond a single server
•  A single server controls everything

Parallel
•  Speed increases as more components are added to it
•  Built for distributed redundancy and reliability
•  Multiple servers contribute to the management of the filesystem

18-22 May 2015 13

*None of these things are 100% true

Page 14:

The Most Common Solutions for HPC

Want to access your data from everywhere? You need "Network Attached Storage (NAS)"!

• NFS (serial-ish)

• GPFS (Parallel)

• Lustre (Parallel)

• Panasas (Parallel)

• What about others like OrangeFS, Gluster, Ceph, XtreemFS, CIFS, HDFS, Swift, etc.?

18-22 May 2015 14

Page 15:

Prepare for a Challenge

• Administrative burden & needed expertise (anecdotal), from low to high:
  • NFS
  • Panasas
  • GPFS
  • Lustre

•  Your mileage may vary!

18-22 May 2015 15

Page 16:

Network File System (NFS)

• Can be built from commodity parts or purchased as an appliance

• A single server typically controls everything

• Where does it fall for our tradeoffs?
  •  No software cost
  •  Compatible (not 100% POSIX)
  •  Underlying filesystem does not matter much (ZFS, ext3, …)
  •  True redundancy is harder (single point of failure)
  •  Mostly for low-volume, low-throughput workloads
  •  Strong client-side caching, works well for small files
  •  Requires minimal expertise and (relatively) easy to manage
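To make the "single server controls everything" point concrete, here is a minimal, hypothetical NFS setup (hostname "nfs01", paths and subnet invented for illustration); everything lives in one exports file on one box:

# On the NFS server (hypothetical host nfs01):
$ cat /etc/exports
/export/home    10.0.0.0/24(rw,sync,no_subtree_check)
$ exportfs -ra                      # re-read /etc/exports

# On each client (or automate via /etc/fstab or autofs):
$ mount -t nfs nfs01:/export/home /nfs/home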

18-22 May 2015 16

*Speed  *Space  *Cost  *Scalability  *Administrative Burden  *Monitoring  *Reliability/Redundancy  *Features  *Vendor Support

Page 17:

General Parallel File System (GPFS)

• Can be built from commodity parts or purchased as an appliance

• All nodes in the GPFS cluster participate in filesystem management

• Metadata is managed by every node in the cluster

• Where does it fall in our tradeoffs?

18-22 May 2015 17

[Diagram: GPFS clients and NSD servers connected by a network]

*Speed  *Space  *Cost  *Scalability  *Administrative Burden  *Monitoring  *Reliability/Redundancy  *Features  *Vendor Support
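Not from the slides, but useful for orientation: on a running GPFS (Spectrum Scale) cluster the standard mm* administration commands show how clients, NSD servers, and NSDs fit together. The filesystem name "gpfs0" below is hypothetical.

# Which nodes make up the cluster (clients and NSD servers alike)?
$ mmlscluster

# Which NSDs exist, and which servers serve them?
$ mmlsnsd

# Filesystem attributes (block size, replication, ...) for one filesystem
$ mmlsfs gpfs0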

Page 18:

Lustre

•  Can be built from commodity parts, or purchased as an appliance

•  Separate servers for data and metadata

• Where does it fall in our tradeoffs?

18-22 May 2015 18

* Image credit: nor-tech.com

*Speed  *Space  *Cost  *Scalability  *Administrative Burden  *Monitoring  *Reliability/Redundancy  *Features  *Vendor Support
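A brief, hedged illustration of how the OSS/OST split is visible from a Lustre client (the /lustre paths below are hypothetical):

# Free space reported per OST and MDT
$ lfs df -h /lustre

# Stripe new files in this directory across 4 OSTs with a 1 MiB stripe size
$ lfs setstripe -c 4 -S 1M /lustre/myproject

# See how an existing file was striped
$ lfs getstripe /lustre/myproject/bigfile.dat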

Page 19:

Panasas

•  Is an appliance

•  Separate servers for metadata and data

• Where does it fall in our tradeoffs?

18-22 May 2015 19

* Image credit: panasas.com

*Speed  *Space  *Cost  *Scalability  *Administrative Burden  *Monitoring  *Reliability/Redundancy  *Features  *Vendor Support

Page 20:

Appliances

• Appliances generally come with vendor tools for monitoring and management

• Do these tools increase or decrease management complexity?

• How important is vendor support for your team?

 

18-22 May 2015 20

Screenshot of Panasas management tool

Page 21:

Good idea? Bad idea? Let's discuss!

• NFS for everything

• Panasas for everything

• Lustre for everything

• GPFS for everything

   18-22 May 2015 21

Page 22:

How about…

•  Lustre for work (files stored here are temporary)
•  NFS for home
•  Tape for backup and archival

•  Lustre available everywhere
•  Tape available on data movers
•  NFS only available on login machines

18-22 May 2015 22

Page 23:

Designing your storage solution

• Who are the stakeholders?
•  How quickly should we be able to read any one file?
•  How will people want to use it?
•  How much training will you need?
•  How much training will your users need to effectively use your storage?
•  Do you have the knowledge necessary to do the training?
•  How often do they need the training?
•  Do you need different tiers or types of storage?
   •  Long-term
   •  Temporary
   •  Archive

•  From what science/usage domains are the users?
   •  aka what applications will they be using?

• What features are necessary?

18-22 May 2015 23

Page 24:

Application Driven Tradeoffs

• Domain Science
  •  Chemistry
  •  Aerospace
  •  Bio* (biology, bioinformatics, biomedical)
  •  Physics
  •  Business
  •  Economics
  •  etc.

• Data and Application Restrictions
  •  HIPAA and PHI
  •  ITAR
  •  PCI DSS
  •  And many more (SOX, GLBA, CJIS, FERPA, SOC, …)

18-22 May 2015 24

Page 25:

What you need to know

• What is the distribution of files?
  •  sizes, count (a quick survey sketch follows this list)

• What is the expected workload?
  •  How many bytes are written for every byte read?
  •  How many bytes are read for each file opened?
  •  How many bytes are written for each file opened?

• Are there any system-based restrictions?
  •  POSIX conformance. Do you need a POSIX filesystem?
  •  Limitations on number of files or files per directory
  •  Network compatibility (IB vs. Ethernet)
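A minimal sketch (not from the slides) of how to start answering the size/count question on an existing filesystem; the path is hypothetical, GNU find is assumed, and a full scan is itself a heavy metadata workload on a large filesystem:

$ find /data/project -type f -printf '%s\n' 2>/dev/null | awk '
    { n++; bytes += $1 }
    $1 <  1048576    { small++ }   # files under 1 MiB
    $1 >= 1073741824 { large++ }   # files of 1 GiB or more
    END { printf "files: %d  total: %.1f GiB  <1MiB: %d  >=1GiB: %d\n",
                 n, bytes/2^30, small, large }'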

18-22 May 2015 25

Page 26:

Use Case: Data Movement

•  Scenario: User needs to import a lot of data

• Where is the data coming from?
  •  Campus LAN?
  •  Campus WAN?
  •  WAN?

• How often will the data be ingested?

• Does it need to be outgested?

• What kind of data is it?

•  Is it a one-time ingest or regular?

18-22 May 2015 26

Page 27:

Designing your storage solution

• What technologies do you need to satisfy the requirements that you now have?

• Can you put a number on the following?
  •  Minimum disk throughput from a single compute node
  •  Minimum aggregate throughput for the entire filesystem for a benchmark (like iozone or IOR)
  •  I/O load for representative workloads from your site
     •  How much data and metadata is read/written per job?
  •  Temporary space requirements
  •  Archive and backup space requirements
     •  How much churn is there in data that needs to be backed up?

18-22 May 2015 27

Page 28:

Storage Devices

18-22 May 2015 28

Ordered from high speed & cost / low capacity (top) to low speed & cost / high capacity (bottom):

•  Solid State
   •  RAM
   •  PCIe SSD
   •  SATA/SAS SSD

•  Spinning Disk
   •  SAS
   •  NL-SAS
   •  SATA

•  Tape

o  Serial ATA (SATA): $/byte, large capacity, less reliable, slower (7.2k RPM)

o  Serial Attached SCSI (SAS): $$/byte, small capacity, reliable, fast (15k RPM)

o  Nearline-SAS: SATA drives with SAS interface: more reliable than SATA, cheaper than SAS, ~SATA speeds but with lower overhead

o  Solid State Disk (SSD): No spinning disks, $$$/byte, blazing fast, reliable

Page 29:

What is an IOP?

•  IOP == Input/Output Operation
•  IOPS == Input/Output Operations per Second
• We care about two IOPS reports

•  The number we tell people when we say "Our Veridian Dynamics Frobulator 2021 gets 300PiB/s bandwidth!"

•  The number that affects users: "Our Veridian Dynamics Frobulator 2021 only gets 5KiB/s for <insert your application's name>"

• Why the difference?
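One way (not shown in the original deck) to see why the two numbers diverge: benchmark the same storage with large sequential transfers and then with small random operations. iozone's -O flag reports results in operations per second; the sizes and the target file below are arbitrary.

# Large sequential write/read: produces the flattering bandwidth number
$ iozone -i 0 -i 1 -I -r 1M -s 4G -f /scratch/iozone.tmp

# Small random read/write, reported in ops/sec (-i 2 needs -i 0 to create the file)
$ iozone -i 0 -i 2 -I -O -r 4k -s 1G -f /scratch/iozone.tmp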

18-22 May 2015 29

Page 30:

More tradeoffs …

Space vs. Speed
•  Do you need 10GiB/s and 10TiB of space?
•  Do you need 1PiB of usable storage and 1GiB/s?
•  How do you meet your requirements?

Large vs. Small Files
• What is a small file?
  •  No hard rule. It depends on how you define it.
  •  At GT, small is < 1MiB?

• Why do you care?
  •  Metadata operations are deadly. A metadata lookup on a 1TiB file takes the same time as a lookup on a 1KiB file.

18-22 May 2015 30

Page 31:

Example Storage Solution (Georgia Tech)

Experienced catastrophic failure(s) with all of them at least once

• Panasas appliance for scratch (shared by all)
• GPFS appliance on SATA/NL-SAS/SAS for long-term (and some home)
• NFS on SATA for long-term (many servers)
• NFS on SATA for home (a few servers)
• NFS for administrative storage
• NFS for daily backups
• Coraid system (NFS) for application repository and VM images

• Building a homebrew GPFS from commodity components for scratch!

18-22 May 2015 31

informational purposes only, not a recommendation

Page 32:

Storage Policies (Georgia Tech)

•  5GB home space
   •  backed up daily
   •  provided by GT
   •  NFS

• ∞ project space
   •  backed up daily
   •  faculty-purchased, but GT buys the backup space
   •  Mix of NFS and GPFS (transitioning to GPFS)

•  5TB/7TB Scratch/Temporary
   •  not backed up
   •  purchased by GT
   •  PanFS (soon to be something else)

18-22 May 2015 32

Page 33:

Storage Policies (Georgia Tech)

18-22 May 2015 33

•  Scratch
   •  files older than 60 days are marked for removal
   •  Users are given one week to save their data (or make a plea for more time)
   •  Marked files are removed after 1 week
   •  Not backed up

• Quotas
   •  Quota increases must be requested by owner or designated manager
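The deck doesn't show how the 60-day sweep is implemented; a deliberately simplified sketch with find (hypothetical path; a real policy engine would also track the one-week grace period and user appeals):

# Dry run: list scratch files not modified in the last 60 days
$ find /scratch -type f -mtime +60 > /tmp/expired.list

# After the grace period: delete files now older than 60 + 7 days
# (destructive -- review the list before letting this loose)
$ find /scratch -type f -mtime +67 -delete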

Page 34:

Best Practices

•  Benchmark the system whenever you can

   •  Especially when you first get it (this is the baseline)
   •  Then, every time you take the system down (so that you can tell if something has changed)
   •  Run the EXACT SAME test!

•  Test the redundancy and reliability
   •  Does it survive a drive or server failure? Power something off or rip it out while you are putting a load on it

•  Don't solely rely on generic benchmarks
   •  Run the applications your stakeholders care about

•  Regularly get data about your data
   •  Monitor the status of your filesystem, proactively fix problems

•  Constantly ask users (and other stakeholders) how they feel about performance
   •  It doesn't matter if benchmarks are good if they feel it is bad

18-22 May 2015 34

Page 35:

How About Cloud and Big Data?

Design/Standards:  

•  POSIX: Portable Operating System Interface (NFS, GPFS, Panasas, Lustre)

•  REST: Representational State Transfer, designed for scalable web services

Case-specific solutions:

•  Software defined, hardware independent storage (e.g. Swift)
•  Proprietary object storage (e.g. S3 for AWS, which is RESTful)

• Geo-replication: DDN WOS, Azure, Amazon S3

• Open Source object storage: Ceph vs Gluster vs …
•  Big data (map/reduce): Hadoop Distributed File System (HDFS), QFS, …

18-22 May 2015 35

Page 36:

Future

• Hybridization of storage
  •  Connecting different storage systems
  •  Seamless migration between storage solutions (Object store <-> Object store, POSIX <-> Object)

•  Ethernet-connected drives
  •  Seagate's Kinetic interface
  •  HGST's open Ethernet drive

•  YAC (Yet Another Cache)
  •  Intel Cache Acceleration Software
  •  DDN Infinite Memory Engine
  •  IBM FlashCache

18-22 May 2015 36

Page 37:

BONUS material: a little bit of Benchmarking

18-22 May 2015 37

• Use real user applications when possible!

•  "dd"        … quick & easy.

•  "iozone"    great for single/multi-node read/write performance

•  "Bonnie++"  simple to run, but comprehensive suite of tests

•  "zcav"      good test for spinning hard disks, where speed is a function of distance from the first sector.

Page 38:

dd

• Never run as root (destructive if used incorrectly)!
• Reads from an input file "if" and writes to an output file "of"

•  You don't need to use real files…
   •  can read from devices, e.g. /dev/zero, /dev/random, etc.
   •  can write to /dev/null

• Caching can be misleading… Prefer direct I/O (oflag=direct)

Example:

$ dd if=/dev/zero of=./test.dd bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 37.8403 s, 28.4 MB/s
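The slide only shows a write test; a matching read test (not in the original) can read the same file back, again with direct I/O so the page cache doesn't inflate the result:

$ dd if=./test.dd of=/dev/null bs=1G count=1 iflag=direct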

18-22 May 2015 38

Page 39:

iozone

• Common test utility for read/write performance
• Great for both single-node and multi-node testing (i.e. aggr. perf.)
•  Sensitive to caching, use "-I" for direct I/O.
• Can run multithreaded (-t)
•  Simple, 'auto' mode:

    iozone -a

• Or pick tests using '-i':
    -i 0: write/rewrite
    -i 1: read/re-read

18-22 May 2015 39

# iozone -i 0 -i 1 -+n -r 1M -s 1G -t 16 -I
…
…
        Throughput test with 16 processes
        Each process writes a 1048576 kByte file in 1024 kByte records

        Children see throughput for 16 initial writers = 1582953.29 kB/sec
        Parent sees throughput for 16 initial writers  = 1542978.62 kB/sec
        Min throughput per process                     =   97130.07 kB/sec
        Max throughput per process                     =  100058.16 kB/sec
        Avg throughput per process                     =   98934.58 kB/sec
        Min xfer                                       = 1019904.00 kB

        Children see throughput for 16 readers         = 1393510.91 kB/sec
        Parent sees throughput for 16 readers          = 1392664.11 kB/sec
        Min throughput per process                     =   84657.09 kB/sec
        Max throughput per process                     =   88483.99 kB/sec
        Avg throughput per process                     =   87094.43 kB/sec
        Min xfer                                       = 1003520.00 kB

Page 40:

iozone multi-node testing

• Great for testing HPC storage "peak" aggregate performance

• Network becomes a significant contributor

• Requires a "hostfile" with: host, test_dir, iozone_path. E.g.:
    iw-h34-17 /gpfs/pace1/ddn /usr/bin/iozone
    iw-h34-18 /gpfs/pace1/ddn /usr/bin/iozone
    iw-h34-19 /gpfs/pace1/ddn /usr/bin/iozone

•  Fire away!
    iozone -i 0 -i 1 -+n -e -r 128k -s <file_size> -t <num_threads> -+m <hostfile>
      -i  : tests (0: write/re-write, 1: read/re-read)
      -+n : no retests selected
      -e  : include flush/fflush in timing calculations
      -r  : record (block) size in Kb

18-22 May 2015 40

Page 41:

Bonnie++

• Comprehensive set of tests:
  •  Create files in sequential order
  •  Stat files in sequential order
  •  Delete files in sequential order
  •  Create files in random order
  •  Stat files in random order
  •  Delete files in random order
  (--Wikipedia)

•  Just 'cd' to the directory on the filesystem, then run 'bonnie++'

• Uses 2x client memory (by default) to avoid caching effects

• Reports performance (K/sec, higher is better) and the CPU used to perform operations (%CP, lower is better)

• Highly configurable, check its man page!
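A typical invocation, as a hedged example (not on the slide): run it as a regular user from a directory on the filesystem under test, or point -d at one; bonnie++ refuses to run as root unless you hand it a user with -u.

# As a regular user, from the filesystem under test:
$ cd /scratch/$USER && bonnie++ -d .

# If you must start it as root, make it drop privileges:
$ bonnie++ -d /scratch/benchmark -u nobody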

18-22 May 2015 41

Page 42:

Bonnie++

$ bonnie++
Writing with putc()...done
……
Delete files in random order...done.
Version 1.03e       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
atlas-6.pace. 7648M 39329  74 235328  34  2599   2 37794  67 37943   4  46.9   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   740   6   930   2    89   0   282   2   340   1   212   1
atlas-6.pace.gatech.edu,7648M,39329,74,235328,34,2599,2,37794,67,37943,4,46.9,0,16,740,6,930,2,89,0,282,2,340,1,212,1

18-22 May 2015 42

Page 43:

zcav

•  Part of the Bonnie++ suite
•  "Constant Angular Velocity (CAV)" tests for spinning media
•  I/O performance will differ depending on the distance of the heads from the first sector of the spinning media.

•  Not meaningful for network attached storage

•  SSD runs can be interesting (you expect to see a flat line, but…)

18-22 May 2015 43

[Plots: SATA disk example from http://www.coker.com.au/bonnie++/zcav/results.html, and an SSD example from a GT machine]

Page 44:

zcav

18-22 May 2015 44

Example:

$ zcav -f /dev/sda
#loops: 1, version: 1.03e
#block offset (GiB), MiB/s, time
0.00 115.75 2.212
0.25 95.93 2.669
0.50 114.63 2.233
0.75 119.14 2.149
……

What's going on here??

$ zcav -f /dev/sda
#loops: 1, version: 1.03e
#block offset (GiB), MiB/s, time
#0.00 ++++ 0.092
#0.25 ++++ 0.094
#0.50 ++++ 0.091
……

When you run the same example twice, you see super fast "cached" results! Here's how you flush the I/O cache:

sync && echo 3 > /proc/sys/vm/drop_caches

Page 45:

The End. (for real this time)

18-22 May 2015 45

Thanks for participating!