TRANSCRIPT
Zhengji Zhao, NERSC User Engagement Group
Using VASP at NERSC
June 10, 2016
Agenda
• VASP access at NERSC
• Get started with available VASP builds at NERSC
• Common problems users run into
• Good practices
• Compiling VASP on NERSC systems
VASP Access at NERSC
VASP is available to users who hold their own license
• You need to confirm your license with the VASP developers
  – Send your license info (license number, PI's name, etc.) to vasp.Materialphysik@univie.ac.at and CC: vasp_licensing@nersc.gov
  – To avoid unnecessary delay, make sure you are a registered user under your PI's VASP license
  – This is a manual process. It normally takes a couple of days, but sometimes takes longer, e.g., when the VASP support staff is on vacation.
  – Once your license is confirmed by VASP support in Vienna, NERSC gives you access to the VASP binaries on NERSC machines. Access is controlled by a Unix file group, vasp5, or vasp (for VASP 4). Type the groups command to see whether you have access to the VASP binaries at NERSC.
https://www.nersc.gov/users/software/applications/materials-science/vasp/#toc-anchor-2
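The groups check above can be scripted. The following is a minimal sketch, not a NERSC-provided tool: the helper name in_group is hypothetical, and it takes the group list as an argument so the logic can be tried on any machine.

```shell
#!/bin/bash
# in_group GROUP "LIST": succeed if GROUP appears as a whole word in the
# space-separated LIST (the format printed by `groups` or `id -Gn`).
# The function name is an illustration, not part of any NERSC tooling.
in_group() {
    local wanted="$1" g
    for g in $2; do
        if [ "$g" = "$wanted" ]; then
            return 0
        fi
    done
    return 1
}

# Typical use on a login node: check the current user's groups.
if in_group vasp5 "$(id -Gn)"; then
    echo "VASP 5 binaries are accessible"
else
    echo "not in the vasp5 group yet: confirm your license first"
fi
```

Matching whole words matters here because vasp (VASP 4) is a prefix of vasp5, so a plain substring search would report false positives.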
Get Started with Available VASP Builds
How many VASP builds are available at NERSC?
    zz217@cori01:~> module avail vasp
    ---------------------- /usr/common/software/modulefiles ----------------------
    vasp/5.3.5  vasp/5.3.5-cce  vasp/5.3.5_vtst  vasp/5.3.5_vtst-cce  vasp/5.4.1(default)

    zz217@edison01:~> module avail vasp
    ----------------------- /usr/common/usg/Modules/modulefiles -----------------------
    vasp/4.6.35_vtst  vasp/5.3.5  vasp/5.3.5_vtst-cce  vasp/5.4.1
    vasp/5.3.2  vasp/5.3.5-cce(default)  vasp/5.3.2_vtst  vasp/5.3.5_vtst

Modulefile naming convention: vasp/<version><_vtst><-compiler>
– The version is the official VASP release version; -compiler is the name of the compiler used to build the code (when omitted, the Intel and Cray compilers were used for Edison and Hopper, respectively); and _vtst denotes builds with third-party contributed codes such as VTST and Wannier90.
– On Edison, vasp/5.3.5-cce is a build of the official VASP release compiled with a Cray compiler; vasp/5.3.5_vtst-cce is built with a Cray compiler and has VTST (U. Texas, Austin) and Wannier90 enabled.
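To make the naming convention concrete, here is a small sketch that splits a module name into its three parts. The function name parse_vasp_module and the word "default" for an omitted compiler suffix are illustrative choices, not something the module system itself provides.

```shell
#!/bin/bash
# parse_vasp_module NAME: split vasp/<version><_vtst><-compiler> into fields.
# Prints the version, whether _vtst is present, and the compiler suffix
# ("default" is a placeholder used here when the suffix is omitted).
parse_vasp_module() {
    local name="${1#vasp/}"       # drop the leading "vasp/"
    local compiler=default vtst=no
    case "$name" in
        *-*) compiler="${name##*-}"; name="${name%-*}" ;;
    esac
    case "$name" in
        *_vtst) vtst=yes; name="${name%_vtst}" ;;
    esac
    echo "version=$name vtst=$vtst compiler=$compiler"
}

parse_vasp_module vasp/5.3.5_vtst-cce
parse_vasp_module vasp/5.4.1
```

The compiler suffix is stripped first because it is the last component of the name; the _vtst marker is then the trailing part of what remains.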
How many VASP builds are available at NERSC? - cont
• There are different compiler builds
  – They provide alternatives, because some problems may occur with one compiler build but not with another
  – On Edison, Intel compiler builds run faster than Cray compiler builds; however, they run into LAPACK errors more often. So the default VASP module on Edison was built with a Cray compiler.
• The default module is recommended
• To access, do: module load vasp
• For more info, do: module show vasp/<version string>
What do the modulefiles do?
    edison01> module show vasp
    -------------------------------------------------------------------
    /usr/common/usg/Modules/modulefiles/vasp/5.3.5-cce:

    module-whatis   VASP: Vienna Ab-initio Simulation Package
    Access to the vasp suite is allowed only for research groups with existing
    licenses for VASP. If you have a VASP license please email
    vasp.Materialphysik@univie.ac.at and CC: vasp_licensing@nersc.gov
    with the information on which research group your license derives from.
    The PI of the group as well as the institution and license number will
    help speed the process.

    setenv          PSEUDOPOTENTIAL_DIR /usr/common/usg/vasp/pseudopotentials/5.3.5
    setenv          VDW_KERNAL_DIR /usr/common/usg/vasp/vdw_kernal
    setenv          NO_STOP_MESSAGE 1
    setenv          MPICH_NO_BUFFER_ALIAS_CHECK 1
    prepend-path    PATH /usr/common/usg/vasp/vtstscripts/3.1
    prepend-path    PATH /usr/common/usg/vasp/5.3.5-cce/bin
    -------------------------------------------------------------------
Where do VASP binaries reside?
    zz217@edison01> ls -ld /usr/common/usg/vasp/5.3.5-cce/bin
    drwxr-x---+ 4 zz217 vasp5 512 Jan  6 21:24 /usr/common/usg/vasp/5.3.5-cce/bin
    zz217@edison01> ls -l /usr/common/usg/vasp/5.3.5-cce/bin
    total 395660
    -rwxrwxr-x+ 1 zz217 usg   69225563 Jul 31  2014 gvasp
    -rwxrwxr-x+ 1 zz217 usg   66153550 Aug 21  2014 makeparam
    -rwxrwxr-x+ 1 zz217 usg   69601938 Aug 22  2014 vasp
    -rwxr-xr-x+ 1 zz217 usg   69604098 Jul 31  2014 vasp.NMAX_DEG=128
    -rwxrwxr-x+ 1 zz217 usg   60887560 Aug 21  2014 vasp.serial
    lrwxrwxrwx  1 zz217 zz217        5 Jan  6 21:24 vasp_gam -> gvasp
    -rwxrwxr-x+ 1 zz217 usg   69676474 Jul 31  2014 vasp_ncl
    lrwxrwxrwx  1 zz217 zz217        4 Jan  6 21:24 vasp_std -> vasp

vasp – general k-point VASP build
gvasp – Gamma-point-only build
vasp_ncl – non-collinear version
Other binaries are special builds made per user request.
Note: the vasp_std and vasp_gam links are there for consistency with the vasp/5.4.1 version.
VASP builds with VTST and Wannier90 enabled
    edison01> module show vasp/5.3.5_vtst-cce
    -------------------------------------------------------------------
    /usr/common/usg/Modules/modulefiles/vasp/5.3.5_vtst-cce:

    module-whatis   VASP: Vienna Ab-initio Simulation Package
    Note: This build enabled the VTST 3.1 code (U. Texas, Austin) and Wannier90 1.2.0.
    Access to the vasp suite is allowed only for research groups with existing
    licenses for VASP. If you have a VASP license please email
    vasp.Materialphysik@univie.ac.at and CC: vasp_licensing@nersc.gov
    with the information on which research group your license derives from.
    The PI of the group as well as the institution and license number will
    help speed the process.

    setenv          PSEUDOPOTENTIAL_DIR /usr/common/usg/vasp/pseudopotentials/5.3.5
    setenv          VDW_KERNAL_DIR /usr/common/usg/vasp/vdw_kernal
    setenv          NO_STOP_MESSAGE 1
    setenv          MPICH_NO_BUFFER_ALIAS_CHECK 1
    prepend-path    PATH /usr/common/usg/vasp/vtstscripts/3.1
    prepend-path    PATH /usr/common/usg/vasp/5.3.5_vtst-cce/bin
    -------------------------------------------------------------------
VASP builds with VTST and Wannier90 enabled
    zz217@edison01> ls -l /usr/common/usg/vasp/5.3.5_vtst-cce/bin
    total 340624
    -rwxrwxr-x 1 zz217 usg   100514105 Feb 19 16:21 gvasp
    -rwxrwxr-x 1 zz217 usg   101455347 Feb 19 16:21 vasp
    lrwxrwxrwx 1 zz217 zz217         5 Feb 17 16:33 vasp_gam -> gvasp
    -rwxrwxr-x 1 zz217 usg   101515371 Feb 19 16:21 vasp_ncl
    lrwxrwxrwx 1 zz217 zz217         4 Feb 17 16:33 vasp_std -> vasp
    -rwxrwxr-x 1 zz217 usg    45308309 May  8  2015 wannier90.x
Get started with VASP at NERSC
Running VASP via a batch job
    edison01> cat run.slurm
    #!/bin/bash -l
    #SBATCH -J test_vasp
    #SBATCH -p debug
    #SBATCH -N 2
    #SBATCH -t 00:30:00
    #SBATCH -o test_vasp.o%j
    module load vasp
    srun -n 48 vasp_std
    edison01> sbatch run.slurm
    Submitted batch job 518354

Additional #SBATCH options:
    #SBATCH --mail-type=BEGIN,END,FAIL
    #SBATCH -A <your repo>

Running interactively
    edison01> salloc -N 2 -p debug -t 30:00
    …
    module load vasp
    srun -n 48 vasp_std

Commonly used commands:
    squeue -u <username>
    scancel <jobid>
    scontrol show job <jobid>
    sqs -u <username>
    sinfo -s
    scontrol hold <jobid>
    scontrol release <jobid>
    scontrol update jobid=<jobid> TimeLimit=24:00:00
    scontrol update jobid=<jobid> QOS=premium
For more info, read the man pages.
https://www.nersc.gov/users/software/applications/materials-science/vasp/
Common Problems Users Run into
VASP not found error
• No access to VASP: check whether you have access to the VASP binaries at NERSC. If not, confirm your license.
    usgsw@edison01:~> groups
    usgsw oprofile usg    # user usgsw does not have access to VASP
• Failed to load a vasp module
  – You will also see an error in your job's standard error file:
    module: command not found
Running VASP jobs with too many cores
• VASP is generally considered to scale up to 1 core/atom. When running outside the parallel scaling region, various errors may occur
• internal ERROR RSPHER: running out of buffer
  – This error was seen when using too many cores for a small system, and also when a small NPAR is combined with LPLANE, so that the number of cores per orbital is similar to or even larger than NGZ (VASP manual: LPLANE=.TRUE. should only be used if NGZ is at least 3*(number of cores)/NPAR).
  – There could be multiple fixes, including modifying the source code to allocate larger work arrays, but you can eliminate this error by reducing the total number of cores, using a larger NPAR (closer to the default value), or setting LPLANE=.FALSE.
Running with too few cores: Out of Memory error
• Error message: OOM killer terminated this process
  – Edison: 2.6 GB/core; 24 cores per node
  – Cori: 4 GB/core; 32 cores per node
• To avoid
  – Consider using ~1 core/atom
  – For Gamma-point calculations, use gvasp instead of vasp
  – Use more cores and/or run on unpacked nodes, e.g., use 12 cores/node
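The effect of running on unpacked nodes can be estimated with simple arithmetic. The sketch below is an illustration only: the helper names are hypothetical, 2.6 GB is approximated as 2600 MB, and the srun line is printed rather than executed. Depending on the system, extra task-placement options may also be needed, so check the NERSC documentation before using it.

```shell
#!/bin/bash
# nodes_for TASKS TASKS_PER_NODE: number of nodes needed, rounded up.
nodes_for() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# mem_per_task_mb MB_PER_CORE CORES_PER_NODE TASKS_PER_NODE:
# memory available to each MPI task when a node is only partly filled.
mem_per_task_mb() {
    echo $(( $1 * $2 / $3 ))
}

tasks=48
per_node=12                       # half of Edison's 24 cores per node
nodes=$(nodes_for "$tasks" "$per_node")
echo "nodes: $nodes, memory per task: $(mem_per_task_mb 2600 24 "$per_node") MB"
# Printed for inspection only; not executed here.
echo "srun -N $nodes -n $tasks --ntasks-per-node=$per_node vasp_std"
```

Halving the tasks per node doubles the memory available to each task, at the cost of twice as many nodes being charged.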
How many cores to use?
• Using a larger number of cores may not reduce the time to solution cost-effectively
  – VASP may spend most of the time in communication even if it does not run into errors
• 1 core/atom is a good reference
  – Conservatively, you may go with 0.5 core/atom or slightly less
  – Use NPAR ~ sqrt(total number of cores used) where applicable
  – If there are many k-points, use 0.5 core/atom for each k-point group; the total number of cores = KPAR x 0.5 core/atom x (# of atoms)
  – If there are multiple images, the total number of cores = IMAGES x 0.5 core/atom x (# of atoms)
• This will also effectively reduce Out of Memory (OOM) errors
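The core-count rule of thumb above can be turned into a small calculator. This is a sketch under stated assumptions: the function names are hypothetical, 0.5 core/atom is rounded up to a whole core, and the second argument stands for either KPAR or IMAGES.

```shell
#!/bin/bash
# total_cores ATOMS [NGROUPS]: 0.5 core/atom per k-point group or image,
# where NGROUPS is KPAR (or IMAGES) and defaults to 1.
total_cores() {
    local atoms="$1" ngroups="${2:-1}"
    echo $(( ngroups * ((atoms + 1) / 2) ))   # rounds 0.5*atoms up
}

# round_to_nodes CORES CORES_PER_NODE: round up to fill whole nodes.
round_to_nodes() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

cores=$(total_cores 96 4)         # 96 atoms, KPAR=4
echo "suggested: $cores cores, $(round_to_nodes "$cores" 24) when filling whole Edison nodes"
```

Rounding up to whole nodes can produce a total that is no longer divisible by KPAR or NPAR, so it is worth checking the final number against those tags before submitting.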
Error when total number of cores is not divisible by NPAR
• M_divide: can not subdivide
• The NPAR default is the same as the total number of cores
  – Max NPAR = 256
  – NPAR ~ sqrt(total number of cores) performs better than NPAR=1 or the default NPAR (= total number of cores) when applicable
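One way to satisfy both constraints above (NPAR divides the core count, and NPAR is close to its square root) is to take the largest divisor that does not exceed the square root. This is only one possible selection rule, sketched here with a hypothetical helper name; short scaling tests remain the better guide.

```shell
#!/bin/bash
# pick_npar CORES: largest divisor of CORES that does not exceed sqrt(CORES).
pick_npar() {
    local n="$1" d=1 best=1
    while [ $(( d * d )) -le "$n" ]; do
        if [ $(( n % d )) -eq 0 ]; then
            best=$d
        fi
        d=$(( d + 1 ))
    done
    echo "$best"
}

for cores in 24 48 64 96; do
    echo "cores=$cores NPAR=$(pick_npar "$cores")"
done
```

For a prime core count this rule falls back to NPAR=1, which is a hint that the core count itself is a poor choice.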
Error due to aliasing buffers in MPI collective calls in VASP code
• Fatal error in PMPI_Allgatherv: Invalid buffer pointer, error stack:
    PMPI_Allgatherv(1235): MPI_Allgatherv(sbuf=0x9ed6000, scount=64, MPI_DOUBLE_COMPLEX, rbuf=0x9ed6000, rcounts=0xb27f7c0, displs=0xad81140, MPI_DOUBLE_COMPLEX, comm=0xc4000003) failed
    PMPI_Allgatherv(1183): Buffers must not be aliased. Consider using MPI_IN_PLACE or setting MPICH_NO_BUFFER_ALIAS_CHECK
  – This error was seen with HSE jobs. To work around it:
    export MPICH_NO_BUFFER_ALIAS_CHECK=1    # for bash/sh
    setenv MPICH_NO_BUFFER_ALIAS_CHECK 1    # for csh/tcsh
  – In the NERSC VASP modules, we set MPICH_NO_BUFFER_ALIAS_CHECK=1 to bypass the buffer-aliasing error checking in MPI collective calls.
  – If you run your own VASP builds, you may need to set this environment variable yourself.
Hung jobs – difficult to debug
• File system issues: Lustre, global homes
  – File systems may slow down or hang due to hardware/software issues, excessive use by some users, users over their $HOME quota, etc. Jobs may hang or run slower due to under-performing file systems
• Use srun -u to disable Slurm's I/O buffering
LAPACK’s ZHEGV failure
• Error EDDDAV: Call to ZHEGV failed. Returncode = 25 2 48
  – ZHEGV solves (diagonalizes) the complex generalized Hermitian eigenproblem A*x = (lambda)*B*x, where B must be positive definite (i.e., its eigenvalues are positive).
  – For some systems VASP may generate a matrix B that is not positive definite (the reason for this is not well understood) and ZHEGV returns an error code. (Comments provided by Osni Marques at LBNL.)
• This error was more frequently seen with Intel builds on Edison
• Switching to Cray builds may help
• The Edison default vasp module, vasp/5.3.5-cce, was built with a Cray compiler
Known issues with the Wannier90 enabled builds
• Cray compiler builds do not work well with Wannier90 on either Edison or Cori
  – lib-4095 : UNRECOVERABLE library error
    A WRITE operation is invalid if the file is positioned after the end-of-file.
Be aware that jobs may fail due to various system issues
• System hardware and software failures occur
  – File systems under-performing
  – File systems not available, GPFS expel
  – Batch system errors
  – Upgrades of OS, system, library, and application software
  – Node failure
  – Network failure
  – Power glitch
• Users who misuse the system may affect other users
  – Pounding the file systems and making them unresponsive, which affects all user jobs running out of the same file systems
  – Running I/O-intensive parallel jobs out of $HOME
• Report problems to NERSC: [email protected]
  – Help us to resolve the problem faster
Good practices
Which system to use?
Goal: get more computing done with a given allocation within a given time period
• Charging:
  – Edison has a machine charge factor of 2; however, since most codes run twice as fast on Edison, the MPP charge for jobs with the same core count should be similar on both systems. VASP on Edison can easily outperform Hopper by 2-3 times.
NERSC-6 Application Benchmarks

Application      CAM     GAMESS    GTC     IMPACT-T  MAESTRO  MILC    PARATEC
Concurrency      240     1024      2048    1024      2048     8192    1024
Streams/Core     2       2         2       2         1        1       1
Edison Time (s)  273.08  1,125.80  863.88  579.78    935.45   446.36  173.51
Hopper Time (s)  348     1389      1338    618       1901     921     353
Speedup 1)       2.5     2.5       3.1     2.1       2.0      2.1     2.0

1) Speedup = [Time(Hopper)/Time(Edison)] * Streams/Core
Which system to use? - cont
Helen He at NERSC provided the figure
• Cori has a charge factor of 2.5 while Edison's is 2. As long as an application runs more than 25% faster on Cori, running on Cori is more MPP-charge efficient.
• VASP is about 30% faster on Cori (same core count), so both systems should be similar in MPP charge.
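The break-even point quoted above follows from charge = charge factor x cores x time. The sketch below works through it with integer arithmetic; the function name charge, the core count, and the wall times are all hypothetical, and the result is a relative number rather than an actual NERSC accounting unit.

```shell
#!/bin/bash
# charge FACTOR_X10 CORES SECONDS: relative MPP charge, with the machine
# charge factor passed in tenths (2.5 -> 25) to stay in integer arithmetic.
charge() {
    echo $(( $1 * $2 * $3 ))
}

cores=48
edison_s=1300                     # hypothetical Edison wall time
cori_s=1000                       # 30% faster on Cori: 1300 / 1.3

edison=$(charge 20 "$cores" "$edison_s")
cori=$(charge 25 "$cores" "$cori_s")
echo "Edison: $edison  Cori: $cori"
if [ "$cori" -lt "$edison" ]; then
    echo "Cori is cheaper for this job"
else
    echo "Edison is cheaper or equal for this job"
fi
```

With a Cori wall time of 1040 s (exactly 25% faster) the two charges come out equal, which is the break-even case described in the first bullet.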
Which system to use? - cont
• Queue policy:
  – Edison favors larger jobs; small regular jobs (1-682 nodes) have a lower priority on Edison. Cori has the same priority for all jobs.
  – In general, shorter jobs have more backfill opportunities. Avoid unnecessarily long wall time requests.
  – Bundle many small regular jobs, if they do similar computations, to get a higher queue priority and a charging discount.
• You can use either system to run your jobs
https://www.nersc.gov/users/computational-systems/edison/running-jobs/queues-and-policies/
https://www.nersc.gov/users/computational-systems/cori/running-jobs/queues-and-policies/
Choose the right file systems to run jobs on
• Your global home, $HOME, is not the right file system for running VASP jobs
  – You may exceed your home quota (40 GB) and cause system issues.
  – You slow down yourself and other users
• The Lustre file systems ($SCRATCH) are the recommended file systems for running jobs, both for larger storage space and for better I/O performance.
  – 10 TB scratch quota on Edison; 100 TB on /scratch3 (for I/O-intensive workloads only)
  – 20 TB scratch quota on Cori (/cscratch1)
  – Note: files that are not accessed for more than 8 weeks on Edison and 12 weeks on Cori will be purged
  – You can back up your important files to your project directory, /project/projectdirs/<your repo>, which has a 4 TB quota, or to HPSS after analysis.
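A job script can guard against being launched from the home file system. The following is a minimal sketch with a hypothetical helper, is_under; it compares path strings only and does not resolve symbolic links.

```shell
#!/bin/bash
# is_under DIR PARENT: succeed if DIR is PARENT itself or lies below it.
# Pure string comparison; symlinks are not resolved.
is_under() {
    case "$1/" in
        "$2"/*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Warn when the current working directory is inside $HOME.
if is_under "$PWD" "$HOME"; then
    echo "warning: running from \$HOME; consider moving the job to \$SCRATCH" >&2
fi
```

Appending a slash before matching keeps a sibling directory such as /home/alice2 from being mistaken for something under /home/alice.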
Short scaling tests are recommended to choose optimal core counts and NPAR
• The debug queue has a high priority on Edison and Cori
  INCAR:
    LWAVE = .FALSE.
    LCHARG = .FALSE.
    NELMDL = -1
    NELM = 2
    NSW = 1
  http://cms.mpi.univie.ac.at/vasp/vasp/Parallelisation_NPAR_NCORE_LPLANE_KPAR_tag.html
• Also do debug runs to check that your jobs can run to completion before submitting long jobs
  – Make sure there are no memory or other failures during the run
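A scaling test over several NPAR values can be prepared with a short loop. This sketch only writes the short-test tags listed above plus one NPAR value into a temporary directory; the function name, the NPAR candidates, and the file name INCAR.test are illustrative assumptions, and the fragments still need to be merged with the rest of your own INCAR.

```shell
#!/bin/bash
# scaling_incar NPAR: print the short-test tags plus one NPAR value.
scaling_incar() {
    cat <<EOF
LWAVE = .FALSE.
LCHARG = .FALSE.
NELMDL = -1
NELM = 2
NSW = 1
NPAR = $1
EOF
}

# Write one INCAR fragment per NPAR candidate into a scratch directory.
workdir=$(mktemp -d)
for npar in 4 6 8; do
    mkdir -p "$workdir/npar_$npar"
    scaling_incar "$npar" > "$workdir/npar_$npar/INCAR.test"
done
ls "$workdir"
```

Each directory can then be submitted to the debug queue and the timings compared before committing to a long run.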
Ask questions and send problem reports to NERSC consultants
• Email: [email protected]
• Phone: 1-800-666-3772, menu option 3
• We may not solve all your problems, but we may be able to help you find workarounds
Compiling VASP on NERSC systems
Where to find the makefiles
• We provide makefiles for users who want to compile the code themselves. They are open to everyone; you do not need to confirm your license to access the makefiles
• The makefiles are available in the VASP installation directories
  – instdir=/usr/common/usg on Edison; instdir=/usr/common/software on Cori
  – VASP 5.3.5: $instdir/vasp/<version-string>/makefiles
  – VASP 5.4.1: the makefile.include is available at $instdir/vasp/5.4.1
  – Try the build script, build.sh, for VASP 5.3.5, or run make after getting the makefile.include file
  – To compile other, older versions (< 5.3.5) you may just need to replace the SOURCE variable in the makefile with the list of source files that matches your VASP version (get the source list from the makefiles that were distributed with your VASP code).
• In general, VASP was built with two compilers, Intel and Cray, on Edison and Cori.
  – Intel compiler + MKL + fftw3 wrapper routines from MKL
  – Cray compiler + LibSci + fftw3
Thank you