
HYDRA: Using Windows Desktop Systems in Distributed Parallel Computing

NSF Site Visit, 2-23-2006

Introduction…

• Windows desktop systems at IUB student labs
  – 2,300 systems, 3-year replacement cycle
  – Pentium IV (>= 1.6 GHz), 256/512/1024 MB memory, 10/100 Mbps/GigE, Windows XP
  – More than 1.5 TF in aggregate


Possibly Utilize Idle Cycles?

[Usage graph. Red: total owner; blue: total idle; green: total Condor]


Problem Description

• Once again... Windows desktop systems at IUB student labs:
  – As a scientific resource
  – Harvest idle cycles


Constraints

• Systems are dedicated to students running desktop office applications, not parallel scientific computing, which makes their availability unpredictable and sporadic

• Microsoft Windows environment

• Daily software rebuild (updates)


What could these systems be used for?

• Many small computations and a few small messages (see the sketch after this list)
  – Foreman-worker
  – Parameter studies
  – Monte Carlo
• Goal: High Throughput Computing (not HPC)
  – Run the aforementioned small computations in parallel to make better use of the resource
  – Parallel libraries such as MPI and PVM have constraints when the availability of resources is ephemeral, i.e. not predictable
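As a concrete illustration of this workload class, here is a minimal foreman-worker Monte Carlo sketch in plain C/MPI. It is a hypothetical example of the pattern the slide names, not code taken from Hydra.

```c
/* Foreman-worker Monte Carlo pi estimate: each worker does many small
 * computations and exchanges only a couple of small messages with the
 * foreman (rank 0). Hypothetical sketch, not Hydra code. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define SAMPLES 1000000L

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                     /* foreman: gather partial counts */
        long total = 0, part;
        MPI_Status st;
        for (int w = 1; w < size; w++) {
            MPI_Recv(&part, 1, MPI_LONG, MPI_ANY_SOURCE, 0,
                     MPI_COMM_WORLD, &st);
            total += part;
        }
        printf("pi ~= %f\n", 4.0 * total / ((double)SAMPLES * (size - 1)));
    } else {                             /* worker: many small computations */
        long hits = 0;
        srand(rank);
        for (long i = 0; i < SAMPLES; i++) {
            double x = rand() / (double)RAND_MAX;
            double y = rand() / (double)RAND_MAX;
            if (x * x + y * y <= 1.0)
                hits++;
        }
        MPI_Send(&hits, 1, MPI_LONG, 0, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
```

Because each worker talks to the foreman only once, the pattern tolerates slow or sporadically available desktop nodes, which is what makes it a good fit for harvested idle cycles.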


Solution

• Simple Message Brokering Library (SMBL)
  – Limited replacement for MPI
  – Both server and client library are based on a TCP socket abstraction
  – Porting from MPI is fairly straightforward
• Process and Port Manager (PPM)
• Plus ...
  – Condor for job management and file transfer (no checkpointing or parallelism)
  – Web portal for job submission


The Big Picture

We'll discuss each part in more detail next...

[Architecture diagram: the shaded box indicates components hosted on multiple desktop computers]


SMBL (Server)

• SMBL server maintains a dynamic pool of client process connections

• Worker job manager hides details of ephemeral workers at the application level

    SMBL Rank     Condor Assigned Node
    0 (Foreman)   Wrubel Computing Center, sacramento
    1             Chemistry Student Lab, computer_14
    2             CS Student Lab, computer_8
    3             Library, computer_6

SMBL Server Process Table for a 4-CPU parallel session


SMBL (Server, cont'd)

A second snapshot of the same process table: rank 2 now maps to a Physics lab machine instead of the CS lab, showing how the dynamic pool replaces ephemeral workers as they come and go. A sketch of the dynamic-pool idea follows the table.

    SMBL Rank     Condor Assigned Node
    0 (Foreman)   Wrubel Computing Center, sacramento
    1             Chemistry Student Lab, computer_14
    2             Physics Student Lab, computer_11
    3             Library, computer_6

SMBL Server Process Table for a 4-CPU parallel session
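A minimal sketch of that dynamic-pool idea, assuming a conventional BSD-socket accept loop; the names, rank-table layout, and fixed pool size here are hypothetical, not SMBL's actual internals:

```c
/* Hypothetical sketch of an SMBL-style dynamic pool: accept TCP
 * connections from workers and map each one to the first vacant rank.
 * SMBL's real internals are not documented in these slides. */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define NPROCS 4                        /* ranks in one parallel session */

int main(void)
{
    int pool[NPROCS];                   /* rank -> socket fd, -1 = vacant */
    memset(pool, -1, sizeof pool);

    int srv = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(9000);        /* in Hydra, PPM assigns the port */
    bind(srv, (struct sockaddr *)&addr, sizeof addr);
    listen(srv, NPROCS);

    for (;;) {
        int fd = accept(srv, NULL, NULL);
        for (int r = 0; r < NPROCS; r++) {
            if (pool[r] == -1) {        /* first vacant rank gets it */
                pool[r] = fd;
                printf("worker connected, assigned rank %d\n", r);
                break;
            }
        }
        /* A production server would also notice dropped connections and
         * mark their ranks vacant again, so replacement workers (like the
         * rank-2 change in the table above) can take them over. */
    }
}
```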


SMBL (Client)

• Client library implements selected MPI-like calls (see the sketch below)
  – MPI_Send() → SMBL_Send()
  – MPI_Recv() → SMBL_Recv()

• In charge of message delivery for each parallel process
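To make the porting claim concrete, a one-line before/after sketch. The slide gives the call mapping but not the exact signatures, so the SMBL argument list and the SMBL_LONG constant below are assumptions that simply mirror MPI:

```c
/* Before: MPI worker returns its partial result to the foreman (rank 0). */
MPI_Send(&hits, 1, MPI_LONG, 0 /* dest */, 0 /* tag */, MPI_COMM_WORLD);

/* After: the same call against the SMBL client library. Hypothetical
 * signature, inferred from the MPI_Send() -> SMBL_Send() mapping above;
 * SMBL brokers the message through the central server over TCP rather
 * than point-to-point. */
SMBL_Send(&hits, 1, SMBL_LONG, 0 /* dest */, 0 /* tag */);
```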


Process and Port Manager (PPM)

• Starts the SMBL server and application processes on demand
• Assigns a port/host to each parallel session (see the toy sketch below)
• Directs workers to their servers
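A toy sketch of the port-brokering idea, assuming one SMBL server per session starting from a fixed base port; all names and the allocation scheme are hypothetical, since the slides do not describe PPM's protocol:

```c
/* Hypothetical port broker: each parallel session gets its own port, on
 * which its SMBL server listens; workers asking for session s are told
 * that port. PPM's real allocation scheme is not shown in the slides. */
#define BASE_PORT    9000
#define MAX_SESSIONS 16

static int session_port[MAX_SESSIONS];   /* 0 = no server started yet */

/* Return the port for session s, starting its SMBL server on first use. */
int ppm_port_for_session(int s)
{
    if (session_port[s] == 0) {
        session_port[s] = BASE_PORT + s;
        /* here PPM would launch an SMBL server bound to that port */
    }
    return session_port[s];
}
```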


PPM (cont'd...)

PPM with two SMBL servers (two parallel sessions):

Parallel Session 1:

    SMBL Rank     Condor Assigned Node
    0 (Foreman)   Wrubel Computing Center, sacramento
    1             Chemistry Student Lab, computer_14
    2             CS Student Lab, computer_8
    3             Wells Library, computer_6

Parallel Session 2:

    SMBL Rank     Condor Assigned Node
    0 (Foreman)   Wrubel Computing Center, sacramento
    1             Wells Library, computer_27
    2             Biology Student Lab, computer_4
    3             CS Student Lab, computer_2

Once again... the big picture

[Architecture diagram: the shaded box indicates components hosted on multiple desktop computers]


Recent Development

• Hydra cluster Teragrid enabled! (Nov 2005)
  – Allows TG users to use the resource
  – Virtual-host based solution: two different URLs for IU and Teragrid users
  – Teragrid users authenticate against PSC's Kerberos server


System Layout

• PPM, SMBL server, Condor, and the web portal run on a Linux server
  – Dual Intel Xeon 3.0 GHz, 4 GB memory, GigE
• A second Linux server runs Samba to serve the BLAST database


Portal

• Creates and submits Condor files, handles data files
• Apache/PHP based
• Kerberos authentication
• URLs:
  – http://hydra.indiana.edu (IU users)
  – http://hydra.iu.teragrid.org (Teragrid users)


Utilization of Idle Cycles

[Usage graph. Red: total owner; blue: total idle; green: total Condor]


Summary

• Large parallel computing facility created at low cost
  – SMBL: a parallel message-passing library that can deal with ephemeral resources
  – PPM: a port broker that can handle multiple parallel sessions
• SMBL homepage
  – http://smbl.sourceforge.net (Open Source)


Links and References

• Hydra Portal
  – http://hydra.indiana.edu (IU users)
  – http://hydra.iu.teragrid.org (Teragrid users)
• SMBL home page: http://smbl.sourceforge.net
• Condor home page: http://www.cs.wisc.edu/condor/
• IU Teragrid home page: http://iu.teragrid.org


Links and References (cont'd)

• Parallel fastDNAml: http://www.indiana.edu/~rac/hpc/fastDNAml
• BLAST: http://www.ncbi.nlm.nih.gov/BLAST
• MEME: http://meme.sdsc.edu/meme/intro.html

