Tony Wildish, Dan Udwary
Using the Burst Buffer on Cori
June 23, 2017

Full documentation
• On the NERSC website, at https://www.nersc.gov/users/computational-systems/cori/burst-buffer/

What is a Burst buffer?
• Unlike Genepool, Cori batch nodes have no local disk
• Global filesystems can be slow for some workloads
  – Applications with large I/O requirements
  – Random-access I/O, instead of streaming
  – Problem magnifies with scale
    • Many jobs reading the same reference DBs etc.
• Burst buffer is fast disk you can use in batch, on Cori
  – Physically part of Cori, close to the compute nodes
  – Much faster (SSDs instead of spinning disks)
  – Smaller volume (10s–100s of GB, not TB)
  – User-configurable properties
    • Lifetime, performance

HPC memory hierarchy
[Figure: the HPC memory hierarchy, ordered from fast and expensive (top) to slow and cheap (bottom)]
• Past: CPU (L1/2/3 cache) → Memory (DRAM) → Storage (HDD: /global/homes, projectb, scratch)
• Present: CPU → Near Memory (HBM) → Far Memory (DRAM) → Near Storage (SSD) → Far Storage (HDD)
• The Burst Buffer is the Near Storage (SSD) layer

Accessing the Burst buffer
• Cori only, not available on Edison or Genepool
  – ~1.8 PB available, spread over 288 nodes
  – Accessible to both Haswell and KNL partitions
  – Batch nodes only, not available on login nodes
• Create/delete Burst buffer reservations
  – Use #DW or #BB directives in your batch jobs
    • At top of script, just below any #SBATCH directives
  – Granularity: 'pool' size fixed, but can ask for any capacity you want
    • 80 GB default granularity
    • 20 GB – for really intense I/O, add 'pool=sm_pool' to commands
• View existing reservations
  – 'scontrol show burst | grep $USER'

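To see what the granularity means for a request, here is a minimal sketch. It assumes a requested capacity is rounded up to a whole number of granules; the 80 GB and 20 GB granule sizes come from the slide, while the helper name round_to_granularity is made up for illustration.

```shell
#!/bin/bash
# Sketch only: estimate the capacity actually allocated for a request.
# Assumption: requests are rounded UP to a multiple of the pool granularity.
# round_to_granularity is a hypothetical helper, not a Slurm/DataWarp command.
round_to_granularity() {
    local request_gb=$1 granule_gb=$2
    # Integer ceiling division, then scale back up to GB.
    echo $(( (request_gb + granule_gb - 1) / granule_gb * granule_gb ))
}

round_to_granularity 200 80   # default pool (80 GB granules)
round_to_granularity 200 20   # pool=sm_pool (20 GB granules)
```

Under that assumption a 200 GB request occupies three 80 GB granules in the default pool, but fits exactly into ten 20 GB granules in sm_pool.
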
Burst buffer characteristics
• Per-job reservation – scratch space
  – Lasts as long as the batch job it's created for
  – Only visible to that batch job
  – Used for:
    • staging files in/out of job
    • fast scratch space
    • checkpoints
• Persistent reservation – sharing data
  – Can be shared among jobs
  – Lifetime controlled by person who creates it
  – Used for:
    • staging files in/out of jobs
    • sharing data (reference files)
    • coupling job workflows
  – Not for long-term storage of data!
    • No guarantees, instance may disappear at any time

Using Burst buffer as scratch
    #!/bin/bash
    #SBATCH -p debug
    #SBATCH -N 1
    #SBATCH -C haswell
    #SBATCH -t 00:15:00
    #DW jobdw capacity=200GB access_mode=striped type=scratch
    ...
• '#DW jobdw': required keywords
• 'capacity': size
• 'access_mode':
  – striped: all nodes in a multi-node job share the same space
  – private: each node in a multi-node job gets its own space
• 'type=scratch': currently, the only option

Using Burst buffer as scratch
    #!/bin/bash
    #SBATCH -p debug
    #SBATCH -N 1
    #SBATCH -C haswell
    #SBATCH -t 00:15:00
    #DW jobdw capacity=200GB access_mode=striped type=scratch
    cd $DW_JOB_STRIPED
    cp $HOME/my-file.dat .
    ./do-something --with my-file.dat --output my-output.dat
    cp my-output.dat $HOME/   # Save your output, or lose it!
• '#DW' only gets you capacity; it's up to you to actually use it!
• $DW_* environment variables point to the space on disk
  – $DW_JOB_PRIVATE if access_mode=private

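"Save your output, or lose it" can be made harder to forget with a shell trap. This is a sketch, not a NERSC recipe: RESULTS_DIR, save_outputs and the *.out naming are illustrative choices, and the script falls back to a temporary directory when $DW_JOB_STRIPED is unset so that it can be tried outside a batch job.

```shell
#!/bin/bash
# Sketch only: copy results off the Burst Buffer even if a job step fails.
# RESULTS_DIR, save_outputs and the *.out pattern are illustrative names.

# Outside a batch job $DW_JOB_STRIPED is unset; fall back to a temp dir.
BB_DIR="${DW_JOB_STRIPED:-$(mktemp -d)}"
RESULTS_DIR="${RESULTS_DIR:-$(mktemp -d)}"

save_outputs() {
    local f
    # Copy every *.out file; silently skip if none exist.
    for f in "$BB_DIR"/*.out; do
        [ -e "$f" ] && cp "$f" "$RESULTS_DIR"/
    done
    return 0
}
# Run save_outputs whenever the script exits, successfully or not.
trap save_outputs EXIT

cd "$BB_DIR" || exit 1
echo "partial result" > my-output.out   # stand-in for ./do-something
```

The trap only helps while the script itself is still running; it cannot rescue data if the job is killed outright, so staging out remains the safer option for large outputs.
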
Staging data to the Burst buffer
    #!/bin/bash
    #SBATCH -p debug
    #SBATCH -N 1
    #SBATCH -C haswell
    #SBATCH -t 00:15:00
    #DW jobdw capacity=20GB access_mode=striped type=scratch
    #DW stage_in source=/global/cscratch1/sd/username/path/to/filename destination=$DW_JOB_STRIPED/filename type=file
    #DW stage_out source=$DW_JOB_STRIPED/dirname destination=/global/cscratch1/sd/username/path/to/dirname type=directory
• Paths on the global filesystem: full path, no environment variables!
• Stage in/out files or directories
• Staging in/out happens before/after the job runs
• #DW directives at top of script, not inline
• Not counted against your batch-job time
• Can't use environment variables – why not?
  – Your job hasn't 'logged in' yet!

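Since the staging directives cannot expand variables such as $SCRATCH, one workaround is to generate the directive text before submission, when the shell can still resolve the path. The sketch below assumes that approach; stage_in_directive is a hypothetical helper, not a Slurm or DataWarp command.

```shell
#!/bin/bash
# Sketch only: emit a #DW stage_in line with the source path already expanded.
# stage_in_directive is a hypothetical helper name.
stage_in_directive() {
    local src=$1
    case "$src" in
        /*) ;;   # absolute path: acceptable
        *)  echo "need a full path, got: $src" >&2; return 1 ;;
    esac
    # Single quotes keep $DW_JOB_STRIPED literal, as it appears on the slide.
    printf '#DW stage_in source=%s destination=$DW_JOB_STRIPED/%s type=file\n' \
        "$src" "$(basename "$src")"
}

stage_in_directive /global/cscratch1/sd/username/path/to/filename
```

The printed line can be pasted, or written by a wrapper script, into the top of the batch script before it is submitted.
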
Persistent Burst buffer reservations
• Use #BB directives to create/delete, #DW to use it
  – Create batch jobs to create/delete the reservation
  – No lifetime guarantees, always back up valuable data!
  – #BB create_persistent name=TW_BB capacity=80GB access=striped type=scratch
    • Create my persistent reservation
    • $DW_PERSISTENT_STRIPED_TW_BB points to directory
  – #DW persistentdw name=TW_BB
    • Use it in subsequent batch jobs
  – #BB destroy_persistent name=TW_BB
    • It's your responsibility to destroy the reservation yourself

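The create/use/destroy cycle above maps onto three small batch scripts. The following sketch writes them to disk from the directives on the slide; the file names, the 5-minute time limit, the placeholder commands and the write_job helper are illustrative assumptions.

```shell
#!/bin/bash
# Sketch only: write the three job scripts of a persistent-reservation lifecycle.
# RES_NAME, OUT_DIR, write_job, the file names and time limit are illustrative.
RES_NAME="${RES_NAME:-TW_BB}"
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"

write_job() {   # usage: write_job <file> <directive line> <command line>
    cat > "$OUT_DIR/$1" <<EOF
#!/bin/bash
#SBATCH -p debug
#SBATCH -N 1
#SBATCH -C haswell
#SBATCH -t 00:05:00
$2
$3
EOF
}

write_job create.sh  "#BB create_persistent name=$RES_NAME capacity=80GB access=striped type=scratch" "echo created"
write_job use.sh     "#DW persistentdw name=$RES_NAME" "ls \$DW_PERSISTENT_STRIPED_$RES_NAME"
write_job destroy.sh "#BB destroy_persistent name=$RES_NAME" "echo destroyed"
echo "wrote job scripts to $OUT_DIR"
```

The scripts would then be submitted in order with sbatch; chaining them with Slurm job dependencies is one way to make sure the destroy step is not forgotten.
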
Burst buffer and interactive sessions
• Want an interactive session for debugging, with Burst buffer? You can do that!
  – Create a file with the same #DW or #BB directives you'd put in a batch script
  – Use the --bbf flag to salloc to create the burst buffer allocation
    • > salloc --qos=interactive -C haswell -t 01:00:00 --bbf="mybbf.conf"
  – N.B. the quotes around the filename are obligatory!
  – Can create temporary reservations, for lifetime of interactive session, or create/use/delete persistent reservations

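As a concrete starting point, the config file for a temporary (per-session) reservation might look like the sketch below. The 80GB capacity is an example value, and the salloc command is only printed, not run, so the snippet is harmless on a machine without Slurm.

```shell
#!/bin/bash
# Sketch only: write a Burst Buffer config file for an interactive session.
# The capacity is an example value; the file name matches the slide.
cat > mybbf.conf <<'EOF'
#DW jobdw capacity=80GB access_mode=striped type=scratch
EOF

# Print the command instead of running it.
echo 'salloc --qos=interactive -C haswell -t 01:00:00 --bbf="mybbf.conf"'
```
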
Best practices
• Experiment, to see if using the Burst buffer helps your application
  – Not everything will benefit, try it and see
  – Don't forget to try the data-staging in/out too!
• Prefer per-job scratch to persistent reservations
  – Easier to manage
• Choose unique names for persistent reservations
  – Make them meaningful
• Clean up persistent reservations when done
  – Build it into your workflow

Environment variables
• Only one DW_* environment variable will be set at a time
  – DW_JOB_STRIPED, DW_JOB_PRIVATE, or DW_PERSISTENT_STRIPED_*
  – But which? Don't want to keep changing your batch scripts just because you changed Burst buffer reservation!
    # Find whichever DW_* variable is set for this job
    v=$(env | grep -E '^DW_')
    variable=$(echo "$v" | awk -F= '{print $1}')
    value=$(echo "$v" | awk -F= '{print $2}')
    echo "I found a variable called $variable with value $value"

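The snippet above relies on exactly one DW_* variable being set. A slightly more defensive variant, sketched below, checks that assumption and fails loudly otherwise; find_bb_dir is a hypothetical helper name, and the pattern only matches the three variable forms listed on the slide.

```shell
#!/bin/bash
# Sketch only: return the Burst Buffer directory from whichever DW_* variable is set.
# find_bb_dir is a hypothetical helper; it matches only the names listed above.
find_bb_dir() {
    local lines count
    lines=$(env | grep -E '^DW_(JOB_STRIPED|JOB_PRIVATE|PERSISTENT_STRIPED_[^=]*)=' || true)
    # Count non-empty lines, so "no match" gives 0 rather than 1.
    count=$(printf '%s\n' "$lines" | grep -c . || true)
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one DW_* directory variable, found $count" >&2
        return 1
    fi
    # Strip everything up to the first '=' to leave the directory path.
    printf '%s\n' "${lines#*=}"
}
```

A batch script could then start with something like `cd "$(find_bb_dir)" || exit 1` and stay unchanged when the type of reservation changes.
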
Exercises!
1. Create a config file to specify a 40 GB persistent reservation
   1. Use salloc to get an interactive session and create this reservation
   2. Copy some files to the burst buffer directory
   3. Terminate your interactive batch session
2. Use the scontrol command to list information about the persistent reservation you created
3. Create a batch job to list files on the persistent reservation
   1. Submit it, wait until it runs
4. Create a config file to destroy the persistent reservation, 'execute' it with salloc

National Energy Research Scientific Computing Center