NSF Site Visit 2-23-2006
HYDRA: Using Windows Desktop Systems in Distributed Parallel Computing
Introduction…
• Windows desktop systems at IUB student labs
– 2300 systems, 3-year replacement cycle
– Pentium IV (>=1.6 GHz), 256/512/1024 MB memory, 10/100 Mbps/GigE, Windows XP
– More than 1.5 TF
Possibly Utilize Idle Cycles?
Figure legend: Red: total owner; Blue: total idle; Green: total Condor
Problem Description
• Once again... Windows desktop systems at IUB student labs:
– As a scientific resource
– Harvest idle cycles
Constraints
• Systems are dedicated to students using desktop office applications, not parallel scientific computing, so their availability is unpredictable and sporadic
• Microsoft Windows environment
• Daily software rebuild (updates)
What could these systems be used for?
• Many small computations and a few small messages
– Foreman-worker
– Parameter studies
– Monte Carlo
• Goal: High Throughput Computing (not HPC)
– Parallel runs of the aforementioned small computations to make better use of the resource
– Parallel libraries (MPI, PVM, etc.) have constraints if the availability of resources is ephemeral, i.e. not predictable
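The foreman-worker pattern applied to a Monte Carlo study can be sketched as follows. This is a minimal single-process illustration in Python, not Hydra code: the names `estimate_pi_chunk` and `run_foreman`, the pi estimate, and the simulated worker loss are all assumptions chosen for illustration. The point it tries to show is that each task is small and independent, so a task lost along with a vanished worker can simply be queued again.

```python
import random
from collections import deque


def estimate_pi_chunk(seed, samples):
    """One small, independent task: count random points inside the unit quarter circle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def run_foreman(num_tasks, samples_per_task, failure_rate=0.2, seed=0):
    """Hand out tasks; requeue any task whose (simulated) worker disappears."""
    failures = random.Random(seed)      # drives the simulated worker loss
    pending = deque(range(num_tasks))   # task ids double as per-task seeds
    results = {}
    while pending:
        task = pending.popleft()
        if failures.random() < failure_rate:
            pending.append(task)        # worker vanished: try the task again later
            continue
        results[task] = estimate_pi_chunk(task, samples_per_task)
    total_hits = sum(results.values())
    return 4.0 * total_hits / (num_tasks * samples_per_task), len(results)


if __name__ == "__main__":
    pi_estimate, completed = run_foreman(num_tasks=20, samples_per_task=5000)
    print(f"completed {completed} tasks, pi is roughly {pi_estimate:.3f}")
```

Because every task carries its own seed, the final estimate does not depend on which tasks had to be retried, which is one reason this style of workload tolerates unpredictable machines.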
Solution
• Simple Message Brokering Library (SMBL)
– Limited replacement for MPI; both server and client library are based on a TCP socket abstraction
– Porting from MPI is fairly straightforward
• Process and Port Manager (PPM)
• Plus …
– Condor for job management and file transfer (no checkpointing or parallelism)
– Web portal for job submission
The Big Picture
We’ll discuss each part in more detail next…
The shaded box indicates components hosted on multiple desktop computers
SMBL (Server)
• SMBL server maintains a dynamic pool of client process connections
• Worker job manager hides details of ephemeral workers at the application level
SMBL Server Process Table for 4 CPU parallel session

| SMBL Rank | Condor Assigned Node |
|---|---|
| 0 (Foreman) | Wrubel Computing Center, sacramento |
| 1 | Chemistry Student Lab, computer_14 |
| 2 | CS Student Lab, computer_8 |
| 3 | Library, computer_6 |
SMBL Server Process Table for 4 CPU parallel session (rank 2 now assigned to a different node)

| SMBL Rank | Condor Assigned Node |
|---|---|
| 0 (Foreman) | Wrubel Computing Center, sacramento |
| 1 | Chemistry Student Lab, computer_14 |
| 2 | Physics Student Lab, computer_11 |
| 3 | Library, computer_6 |
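One way to picture a process table whose ranks stay fixed while the nodes behind them change is the sketch below. It is an illustration only and does not reproduce the SMBL server: the class `RankTable` and its `join` and `leave` methods are hypothetical names, and the policy of giving a new node the lowest vacant rank is an assumption.

```python
class RankTable:
    """Maps fixed ranks to whichever node currently serves them (hypothetical helper)."""

    def __init__(self, size):
        self.nodes = {rank: None for rank in range(size)}

    def join(self, node):
        """Give a newly connected node the lowest vacant rank; return None if full."""
        for rank in sorted(self.nodes):
            if self.nodes[rank] is None:
                self.nodes[rank] = node
                return rank
        return None

    def leave(self, node):
        """Vacate the rank held by a node that disconnected."""
        for rank, current in self.nodes.items():
            if current == node:
                self.nodes[rank] = None
                return rank
        return None


if __name__ == "__main__":
    table = RankTable(4)
    for node in ["Wrubel Computing Center, sacramento",
                 "Chemistry Student Lab, computer_14",
                 "CS Student Lab, computer_8",
                 "Library, computer_6"]:
        table.join(node)
    table.leave("CS Student Lab, computer_8")              # e.g. the machine is reclaimed
    print(table.join("Physics Student Lab, computer_11"))  # prints 2
```

The application only ever addresses rank 2, so under this assumed policy it does not need to know that the node behind that rank changed.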
SMBL (Client)
• Client library implements selected MPI-like calls
– MPI_Send () → SMBL_Send ()
– MPI_Recv () → SMBL_Recv ()
• In charge of message delivery for each parallel process
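To give a feel for what MPI-like send and receive calls over a TCP socket abstraction involve, here is a minimal Python sketch. It is not the SMBL API: the functions `send_msg` and `recv_msg`, the header layout (source rank, tag, payload length), and the use of `pickle` for the payload are all assumptions made to keep the example short, and `pickle` should only be used between trusted peers.

```python
import pickle
import socket
import struct

HEADER = struct.Struct("!iiI")  # source rank, tag, payload length (network byte order)


def send_msg(sock, source, tag, obj):
    """Frame one message: fixed-size header followed by the pickled payload."""
    payload = pickle.dumps(obj)
    sock.sendall(HEADER.pack(source, tag, len(payload)) + payload)


def _recv_exact(sock, count):
    """Read exactly `count` bytes; a stream socket may deliver data in several pieces."""
    chunks = []
    while count > 0:
        chunk = sock.recv(count)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def recv_msg(sock):
    """Return (source, tag, obj) for the next framed message on the socket."""
    source, tag, length = HEADER.unpack(_recv_exact(sock, HEADER.size))
    return source, tag, pickle.loads(_recv_exact(sock, length))


if __name__ == "__main__":
    worker_end, server_end = socket.socketpair()  # stand-in for a worker-to-server link
    send_msg(worker_end, source=1, tag=7, obj=[0.5, 1.5])
    print(recv_msg(server_end))  # prints (1, 7, [0.5, 1.5])
    worker_end.close()
    server_end.close()
```

The explicit length in the header is what lets the receiver rebuild message boundaries on top of a byte stream, which MPI programs otherwise take for granted.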
Process and Port Manager (PPM)
• Starts the SMBL server and application processes on demand
• Assigns port/host to each parallel session
• Directs workers to their servers
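The bookkeeping behind assigning a port/host to each session and directing workers to it might look roughly like the sketch below. It is a hypothetical illustration, not the PPM implementation: the class `PortManager`, its method names, the host name `ppm.example.org`, and the port range are all assumptions, and starting the actual server process is left out.

```python
class PortManager:
    """Hands out one (host, port) pair per parallel session (hypothetical sketch)."""

    def __init__(self, host, first_port, last_port):
        self.host = host
        self.free_ports = list(range(first_port, last_port + 1))
        self.sessions = {}

    def open_session(self, session_id):
        """Reserve a port for a new session; a real manager would start its server here."""
        if session_id in self.sessions:
            return self.sessions[session_id]
        if not self.free_ports:
            raise RuntimeError("no free ports left")
        endpoint = (self.host, self.free_ports.pop(0))
        self.sessions[session_id] = endpoint
        return endpoint

    def lookup(self, session_id):
        """Tell a worker which server endpoint belongs to its session."""
        return self.sessions[session_id]

    def close_session(self, session_id):
        """Release the session's port so a later session can reuse it."""
        _, port = self.sessions.pop(session_id)
        self.free_ports.append(port)


if __name__ == "__main__":
    manager = PortManager("ppm.example.org", 5000, 5001)  # assumed host and port range
    manager.open_session("session-1")
    manager.open_session("session-2")
    print(manager.lookup("session-2"))  # prints ('ppm.example.org', 5001)
```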
PPM (cont’d ...)
PPM with two SMBL servers (two parallel sessions)

Parallel Session 1

| SMBL Rank | Condor Assigned Node |
|---|---|
| 0 (Foreman) | Wrubel Computing Center, sacramento |
| 1 | Chemistry Student Lab, computer_14 |
| 2 | CS Student Lab, computer_8 |
| 3 | Wells Library, computer_6 |

Parallel Session 2

| SMBL Rank | Condor Assigned Node |
|---|---|
| 0 (Foreman) | Wrubel Computing Center, sacramento |
| 1 | Wells Library, computer_27 |
| 2 | Biology Student Lab, computer_4 |
| 3 | CS Student Lab, computer_2 |
Once again … the big picture
The shaded box indicates components hosted on multiple desktop computers
Recent Development
• Hydra cluster Teragrid enabled! (Nov 2005)
– Allows TG users to use the resource
– Virtual Host based solution: two different URLs for IU and Teragrid users
– Teragrid users authenticate against PSC’s Kerberos server
System Layout
• PPM, SMBL server, Condor and web portal running on Linux server
– Dual Intel Xeon 3.0 GHz, 4 GB memory, GigE
• Second Linux server running Samba to serve BLAST database
Portal
• Creates and submits Condor files, handles data files
• Apache/PHP based
• Kerberos authentication
• URLs:
– http://hydra.indiana.edu (IU users)
– http://hydra.iu.teragrid.org (Teragrid users)
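As an illustration of what creating a Condor file for a batch of jobs can involve, the sketch below builds a submit description as text. It is written in Python rather than the portal's PHP and is not taken from the portal: the function `make_submit_description`, the file names, and the choice of the vanilla universe with file transfer enabled are assumptions. The individual submit commands used (`universe`, `executable`, `arguments`, `should_transfer_files`, `when_to_transfer_output`, `transfer_input_files`, `output`, `error`, `log`, `queue`) are standard Condor submit file commands.

```python
def make_submit_description(executable, arguments, input_files, num_jobs, job_name):
    """Return the text of a Condor submit description for `num_jobs` identical jobs."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        f"arguments = {arguments}",
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
    ]
    if input_files:
        # Condor expects a comma-separated list of files to ship with the job.
        lines.append(f"transfer_input_files = {', '.join(input_files)}")
    lines += [
        f"output = {job_name}.$(Process).out",  # $(Process) numbers each queued job
        f"error = {job_name}.$(Process).err",
        f"log = {job_name}.log",
        f"queue {num_jobs}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    print(make_submit_description("worker.exe", "--session 1", ["params.txt"], 3, "hydra_demo"))
```

The resulting text would be written to a file and handed to `condor_submit`; how the portal actually does this is not described in the slides.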
Utilization of Idle Cycles
Figure legend: Red: total owner; Blue: total idle; Green: total Condor
Summary
• Large parallel computing facility created at a low cost
– SMBL parallel message passing library that can deal with ephemeral resources
– PPM port broker that can handle multiple parallel sessions
• SMBL Homepage
– http://smbl.sourceforge.net (Open Source)
Links and References
• Hydra Portal
– http://hydra.indiana.edu (IU users)
– http://hydra.iu.teragrid.org (Teragrid users)
• SMBL home page: http://smbl.sourceforge.net
• Condor home page: http://www.cs.wisc.edu/condor/
• IU Teragrid home page: http://iu.teragrid.org
Links and References (cont’d..)
• Parallel FastDNAml: http://www.indiana.edu/~rac/hpc/fastDNAml
• BLAST: http://www.ncbi.nlm.nih.gov/BLAST
• MEME: http://meme.sdsc.edu/meme/intro.html