Research Computing at ILRI
Alan Orth
ICT Managers Meeting, ILRI, Kenya, 5 March 2014
Where we came from (2003)
- 32 dual-core compute nodes
- 32 * 2 != 64
- Writing MPI code is hard!
- Data storage over NFS to “master” node
- “Rocks” cluster distro
- Revolutionary at the time!
Where we came from (2010)
- Most of the original cluster removed
- Replaced with a single Dell PowerEdge R910
- 64 cores, 8 TB storage, 128 GB RAM
- Threading is easier* than MPI!
- Data is local
- Easier to manage!
To infinity and beyond (2013)
- A little bit back to the “old” model
- Mixture of “thin” and “thick” nodes
- Networked storage
- Pure CentOS
- Supermicro boxen
- Pretty exciting!
Primary characteristics
- Computational capacity
- Data storage
Platform
- 152 compute cores
- 32* TB storage
- 700 GB RAM
- 10 GbE interconnects
- LTO-4 tape backups (LOL?)
Homogeneous computing environment
User IDs, applications, and data are available everywhere.
Scaling out storage with GlusterFS
- Developed by Red Hat
- Abstracts backend storage (file systems, technology, etc.)
- Can do replicate, distribute, replicate+distribute, geo-replication (off site!), etc.
- Scales “out”, not “up”
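To illustrate the replicate mode, a two-brick replicated volume can be created roughly like this; the brick paths are placeholders, not necessarily the actual ILRI layout:

gluster volume create homes replica 2 wingu0:/bricks/homes wingu1:/bricks/homes
gluster volume start homes
gluster volume info homes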
How we use GlusterFS
[aorth@hpc: ~]$ df -h
Filesystem      Size  Used  Avail Use%  Mounted on
...
wingu1:/homes    31T  9.5T   21T   32%  /home
wingu0:/apps     31T  9.5T   21T   32%  /export/apps
wingu1:/data     31T  9.5T   21T   32%  /export/data
- Persistent paths for homes, data, and applications across the cluster.
- These volumes are replicated, so essentially application-layer RAID1.
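On the clients these paths are presumably mounted with the native GlusterFS client; a minimal /etc/fstab sketch, with mount options assumed rather than taken from the deck:

wingu1:/homes  /home         glusterfs  defaults,_netdev  0 0
wingu0:/apps   /export/apps  glusterfs  defaults,_netdev  0 0
wingu1:/data   /export/data  glusterfs  defaults,_netdev  0 0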
GlusterFS <3 10GbE
SLURM
- Project from Lawrence Livermore National Laboratory (LLNL)
- Manages resources
  - Users request CPU, memory, and node allocations
  - Queues / prioritizes jobs, logs usage, etc.
- More like an accountant than a bouncer
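The standard SLURM client commands expose that accounting view, for example (output omitted):

[aorth@hpc: ~]$ squeue          # jobs currently queued or running
[aorth@hpc: ~]$ sinfo           # state of nodes and partitions
[aorth@hpc: ~]$ sacct -u aorth  # accounting records for a user's past jobs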
Topology
How we use SLURM
- Can submit “batch” jobs (long-running jobs, invoke program many times with different variables, etc.); see the batch sketch after the interactive example below
- Can run “interactively” (something that needs keyboard interaction)
Make it easy for users to do the “right thing”:
[aorth@hpc: ~]$ interactive -c 10
salloc: Granted job allocation 1080
[aorth@compute0: ~]$
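For the batch case, a submission script might look roughly like this; the resource requests, file names, and BLAST invocation are illustrative, not taken from the deck:

#!/bin/bash
#SBATCH --job-name=blastn          # name shown in the queue
#SBATCH --cpus-per-task=10         # request 10 cores, as in the interactive example
#SBATCH --output=blastn-%j.log     # %j expands to the job ID

module load blast/2.2.28+
blastn -query input.fa -db nt -out results.txt

Submit it with sbatch and SLURM queues it until the requested resources are free:

[aorth@hpc: ~]$ sbatch blast.sbatch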
Managing applications
- Environment modules - http://modules.sourceforge.net
- Dynamically load support for packages in a user’s environment
- Makes it easy to support multiple versions, complicated packages with $PERL5LIB, package dependencies, etc.
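Each application version is described by a small Tcl modulefile. A minimal sketch for the blast module used below, assuming only the install prefix shown by the "which blastn" example; everything else is illustrative:

#%Module1.0
## blast/2.2.28+ - illustrative modulefile
module-whatis "NCBI BLAST+ 2.2.28"
set prefix /export/apps/blast/2.2.28+
prepend-path PATH $prefix/bin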
Managing applications
Install once, use everywhere...
[aorth@hpc: ~]$ module avail blast
blast/2.2.25+  blast/2.2.26  blast/2.2.26+  blast/2.2.28+

[aorth@hpc: ~]$ module load blast/2.2.28+

[aorth@hpc: ~]$ which blastn
/export/apps/blast/2.2.28+/bin/blastn
Works anywhere on the cluster!
Users and Groups
- Consistent UID/GIDs across systems
- LDAP + SSSD (also from Red Hat) is a great match
- 389 LDAP works great with CentOS
- SSSD is simpler than pam_ldap and does caching
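A minimal sketch of the client-side /etc/sssd/sssd.conf for an LDAP domain; the server URI and search base below are placeholders, not ILRI's actual values:

[sssd]
config_file_version = 2
services = nss, pam
domains = default

[domain/default]
id_provider = ldap
auth_provider = ldap
ldap_uri = ldap://ldap.example.org
ldap_search_base = dc=example,dc=org
cache_credentials = true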