htpc - high throughput parallel computing (on the osg)
DESCRIPTION
HTPC - High Throughput Parallel Computing (on the OSG). Dan Fraser, UChicago OSG Production Coordinator Horst Severini , OU (Greg Thain , Uwisc ) OU Supercomputing Symposium Oct 6, 2010. Rough Outline. What is the OSG? (think ebay ) HTPC as a new paradigm - PowerPoint PPT PresentationTRANSCRIPT
HTPC - High Throughput HTPC - High Throughput Parallel ComputingParallel Computing
(on the OSG)(on the OSG)Dan Fraser, UChicago Dan Fraser, UChicago OSG Production CoordinatorOSG Production Coordinator
Horst Severini, OUHorst Severini, OU(Greg Thain, Uwisc)(Greg Thain, Uwisc)
OU Supercomputing SymposiumOU Supercomputing SymposiumOct 6, 2010Oct 6, 2010
Rough OutlineRough Outline
What is the OSG? (think ebay)What is the OSG? (think ebay)HTPC as a new paradigmHTPC as a new paradigmAdvantages of HTPC for parallel jobsAdvantages of HTPC for parallel jobsHow does HTPC work?How does HTPC work?Who is using it?Who is using it?The FutureThe FutureConclusionsConclusions
Making sense of the OSGMaking sense of the OSG
OSG = Technology + Process + SociologyOSG = Technology + Process + Sociology70+ sites70+ sites (& growing) -- Supply (& growing) -- Supply contribute resources to the OSGcontribute resources to the OSG
Virtual Organizations -- DemandVirtual Organizations -- Demand VO’s are Multidisciplinary Research GroupsVO’s are Multidisciplinary Research Groups
Sites and VOs often overlapSites and VOs often overlapOSG Delivers: OSG Delivers: ~1M CPU hours every day~1M CPU hours every day 1 Pbyte of data transferred every day1 Pbyte of data transferred every day
eBay (naïve) eBay (naïve)
EBAY
Demand Supply
List itemSearch/buy
eBay (more realistic) eBay (more realistic)
Demand SupplyUPS Endpoints
Match Maker
Accounting Ratings
Complaints
PayPalUPS
Bank 1Bank 2
OSG-Bay OSG-Bay
VO’s(Demand)
ResourceProviders(Supply)
ClientStack
Match Making
Accounting Monitoring
Integration& Testing
EngageSupport
ServerStack
SoftwarePackaging
Ticketing
Servers &Storage
Servers &Storage
Servers &Storage
Security
Coordination
ProposalAnchoring …
…
Where does HTPC fit?Where does HTPC fit?
The two familiar HPC Models The two familiar HPC Models
High Throughput Computing (e.g. OSG)High Throughput Computing (e.g. OSG) Run ensembles of single core jobsRun ensembles of single core jobs
Capability Computing (e.g. TeraGrid)Capability Computing (e.g. TeraGrid) A few jobs parallelized over the whole systemA few jobs parallelized over the whole system Use whatever parallel s/w is on the systemUse whatever parallel s/w is on the system
HTPC – an emerging modelHTPC – an emerging model
Ensembles of small- Ensembles of small- way parallel jobsway parallel jobs(10’s – 1000’s)(10’s – 1000’s)
Use whatever Use whatever parallel s/w you want parallel s/w you want (It ships with the job)(It ships with the job)
Tackling Four ProblemsTackling Four Problems
Parallel job portabilityParallel job portability
Effective use of multi-core technologiesEffective use of multi-core technologies
Identify suitable resources & submit jobsIdentify suitable resources & submit jobs
Job Management, tracking, accounting, …Job Management, tracking, accounting, …
Current plan of attackCurrent plan of attack
Force jobs to consume an entire processorForce jobs to consume an entire processor Today 4-8+ cores, tomorrow 32+ cores, …Today 4-8+ cores, tomorrow 32+ cores, … Package jobs with a parallel libraryPackage jobs with a parallel library
HTPC jobs as portable as any other jobHTPC jobs as portable as any other jobMPI, OpenMP, your own scripts, …MPI, OpenMP, your own scripts, …Parallel libraries can be optimized for on-board Parallel libraries can be optimized for on-board memory accessmemory access
All memory is available for efficient utilizationAll memory is available for efficient utilization Submit the jobs via OSG (or Condor-G)Submit the jobs via OSG (or Condor-G)
Problem areasProblem areas
Advertising HTPC capability on OSGAdvertising HTPC capability on OSGAdapting OSG job submission/mgmt toolsAdapting OSG job submission/mgmt tools GlideinWMSGlideinWMS
Ensure that Gratia accounting can identify Ensure that Gratia accounting can identify jobs and apply the correct multiplierjobs and apply the correct multiplierSupport more HTPC scientistsSupport more HTPC scientistsHTPC enable more sitesHTPC enable more sites
What’s the magic RSL?What’s the magic RSL?
Site SpecificSite Specific We’re working on documents/standards We’re working on documents/standards
PBSPBS (host_xcount=1)(xcount=8)(queue=?)(host_xcount=1)(xcount=8)(queue=?)
LSFLSF (queue=?)(exclusive=1)(queue=?)(exclusive=1)
CondorCondor (condorsubmit=(‘+WholeMachine’ true))(condorsubmit=(‘+WholeMachine’ true))
Examples of HTPC users:Examples of HTPC users:
Oceanographers:Oceanographers: Brian Blanton, Howard Lander (RENCI)Brian Blanton, Howard Lander (RENCI)
Redrawing flood map boundariesRedrawing flood map boundaries ADCIRCADCIRC
Coastal circulation and storm surge modelCoastal circulation and storm surge modelRuns on 256+ cores, several daysRuns on 256+ cores, several days
Parameter sensitivity studiesParameter sensitivity studiesDetermine best settings for large runsDetermine best settings for large runs220 jobs to determine optimal mesh size220 jobs to determine optimal mesh sizeEach job takes 8 processors, several hoursEach job takes 8 processors, several hours
Examples of HTPC users:Examples of HTPC users:
ChemistsChemists UW Chemistry groupUW Chemistry group GromacsGromacs Jobs take 24 hours on 8 coresJobs take 24 hours on 8 cores Steady stream of 20-40 jobs/daySteady stream of 20-40 jobs/day
Peak usage is 320,000 hours per monthPeak usage is 320,000 hours per monthWritten 9 papers in 10 months based on thisWritten 9 papers in 10 months based on this
Chemistry Usage of HTPCChemistry Usage of HTPC
OSG sites that allow HTPCOSG sites that allow HTPC
OUOU The first site to run HTPC jobs on the OSG!The first site to run HTPC jobs on the OSG!
PurduePurdueClemsonClemsonNebraskaNebraskaSan Diego, CMS Tier-2San Diego, CMS Tier-2
Your site can be on this list!
Future DirectionsFuture Directions
More Sites, more cycles!More Sites, more cycles!
More usersMore users Working with Atlas (AthenaMP)Working with Atlas (AthenaMP) Working with Amber 9Working with Amber 9 There is room for you…There is room for you…
Use glide-in to homogenize accessUse glide-in to homogenize access
ConclusionsConclusions
HTPC adds a new dimension to HPC HTPC adds a new dimension to HPC computing – ensembles of parallel jobscomputing – ensembles of parallel jobsThis approach minimizes portability issues This approach minimizes portability issues with parallel codeswith parallel codesKeep same job submission modelKeep same job submission modelNot hypothetical – we’re already running Not hypothetical – we’re already running HTPC jobsHTPC jobsThanks to many helping handsThanks to many helping hands
Additional SlidesAdditional Slides
Some of these are from Greg Thain Some of these are from Greg Thain (UWisc)(UWisc)
The playersThe players
Dan FraserDan FraserComputation Inst.Computation Inst.University of ChicagoUniversity of Chicago
Miron LivnyMiron LivnyU WisconsinU Wisconsin
John McGeeJohn McGeeRENCIRENCI
Greg ThainGreg ThainU WisconsinU WisconsinKey DeveloperKey Developer
Funded by NSF-STCIFunded by NSF-STCI
Configuring Condor for HTPCConfiguring Condor for HTPC
Two strategies:Two strategies: Suspend/drain jobs to open HTPC slotsSuspend/drain jobs to open HTPC slots Hold empty cores until HTPC slot is openHold empty cores until HTPC slot is open
http://condor-wiki.cs.wisc.eduhttp://condor-wiki.cs.wisc.edu
How to submitHow to submit
universe = vanilla
requirements = (CAN_RUN_WHOLE_MACHINE =?= TRUE)
+RequiresWholeMachine=trueexecutable = some job
arguments = arguments
should_transfer_files = yes
when_to_transfer_output = on_exit
transfer_input_files = inputs
queue
MPI on Whole machine jobsMPI on Whole machine jobs
universe = vanilla
requirements = (CAN_RUN_WHOLE_MACHINE =?= TRUE)
+RequiresWholeMachine=true
executable = mpiexec
arguments = -np 8 real_exeshould_transfer_files = yes
when_to_transfer_output = on_exit
transfer_input_files = real_exequeue
Whole machine mpi submit file
How to submit to OSGHow to submit to OSGuniverse = griduniverse = grid
GridResource = some_grid_hostGridResource = some_grid_host
GlobusRSL = MagicRSLGlobusRSL = MagicRSL
executable = wrapper.shexecutable = wrapper.sh
arguments = argumentsarguments = arguments
should_transfer_files = yesshould_transfer_files = yes
when_to_transfer_output = on_exitwhen_to_transfer_output = on_exit
transfer_input_files = inputstransfer_input_files = inputs
transfer_output_files = outputtransfer_output_files = output
queuequeue