
2005 IEEE International Conference on Cluster Computing, Burlington, MA, USA, September 27–30, 2005

Co-Scheduling Parallel Electronic Structure Calculations in SMP Cluster Environments∗

Nurzhan Ustemirov†, Masha Sosonkina†

1. Introduction

The General Atomic and Molecular Electronic Structure System (GAMESS) is a program for ab initio molecular quantum chemistry calculations [6]. GAMESS places significant demands on high-performance computing platforms to perform a wide range of Hartree-Fock (HF) wave function (RHF, ROHF, UHF, GVB, and MCSCF) calculations. Using the Self-Consistent-Field (SCF) method, GAMESS iteratively approximates a solution to the Schrödinger equation, which describes the basic structure of atoms and molecules. There are two different implementations of the SCF method in GAMESS: direct and conventional. The direct algorithm recomputes integrals on the fly in each iteration, requiring enormous main memory and CPU resources. The conventional algorithm, in contrast, calculates the integrals once, stores them on disk, and reuses them in subsequent iterations; it requires less computational power but makes heavy use of disk I/O.
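To make the trade-off concrete, the following minimal sketch (in Python, purely illustrative and not GAMESS code) contrasts the two strategies; compute_integrals, scf_step, and the toy state update are stand-ins invented for this example.

```python
import os
import pickle
import random

def compute_integrals(n=1000):
    # Stand-in for the expensive two-electron integral evaluation.
    return [random.random() for _ in range(n)]

def scf_step(state, integrals):
    # Stand-in for the Fock-matrix build and density update.
    return 0.5 * state + 0.5 * sum(integrals) / len(integrals)

def scf_direct(iters=10):
    """Direct SCF: integrals are recomputed on the fly in every
    iteration, trading CPU and memory for the absence of disk traffic."""
    state = 0.0
    for _ in range(iters):
        state = scf_step(state, compute_integrals())
    return state

def scf_conventional(iters=10, path="integrals.dat"):
    """Conventional SCF: integrals are computed once, written to disk,
    and re-read in every iteration (less CPU, heavy disk I/O)."""
    with open(path, "wb") as f:
        pickle.dump(compute_integrals(), f)      # one-time compute cost
    state = 0.0
    for _ in range(iters):
        with open(path, "rb") as f:              # per-iteration disk read
            state = scf_step(state, pickle.load(f))
    os.remove(path)
    return state
```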

Competition for the same computational resources by a parallel GAMESS execution, or by multiple instances of the GAMESS code, has led to inefficient resource utilization [8]. As experiments showed, on a two-processor SMP node, a parallel execution of the conventional GAMESS code was significantly slower than its direct counterpart. On the other hand, two concurrent instances of sequential direct executions were slower than a combination of direct and conventional. To maximize throughput (the amount of work completed over a given time), a scheduling strategy that diminishes resource contention based on knowledge of the current GAMESS algorithmic stage is highly desirable. Unfortunately, modern “all-purpose” (system-level) schedulers, such as PBS [1] and LoadLeveler [4], have limited capabilities in this respect since they have no means to “peek” into the application’s (GAMESS) execution.

∗This work was supported in part by the U.S. Department of Energy under Contract W-7405-ENG-82, in part by Iowa State University under a University Research Grant, and in part by the University of Minnesota Duluth.

†Ames Laboratory, Iowa State University, Ames, IA 50011, {nurzhan,masha}@scl.ameslab.gov.

On the other hand, incorporating self-scheduling mechanisms into GAMESS based on resource availability is rather difficult, since the system-level details may obscure both usage and development. Fortunately, there are middleware tools that could be integrated with GAMESS to act as a co-scheduler while invoking application adaptations to a given (and possibly changing) computing platform [2, 3]. One such middleware tool is Network Information Conveyer and Application Notification (NICAN), a framework that enables adaptation functionality in distributed applications [7].

2. GAMESS-NICAN Integration

This work presents a modification to the integration model of NICAN into GAMESS that was described in [8] for the concurrent execution of sequential GAMESS jobs. In the work presented here, NICAN acts as an application-level co-scheduler in distributed SMP environments. The primary goal of such a co-scheduler is to increase the throughput of parallel GAMESS calculations.

Figure 1. GAMESS-NICAN Integration

Extending the model to parallelized GAMESS brought new challenges. In a parallel GAMESS job, all processes use the same method for the Hartree-Fock calculations, causing resource contention not only among concurrent GAMESS executions but also within a single parallel code. Prior to executing the core quantum chemistry computations, the GAMESS-NICAN Manager (Figure 1) performs preprocessing and scheduling. First, the Manager analyzes system parameters (such as main memory or disk I/O channel capacity). Second, it selects an appropriate SCF algorithm based on the node mapping assigned by the batch scheduler and on the algorithms of the peer GAMESS jobs already running on those nodes. On a multiprocessor SMP node, if a parallel GAMESS job with the conventional SCF algorithm has been assigned to more than one processor, its algorithm will be switched by NICAN to direct. Otherwise, the peer GAMESS jobs are checked and decisions are made so that only one processor per SMP node executes the conventional algorithm. The rationale is that multiple conventional jobs on the same SMP node significantly degrade the disk I/O response time. In Figure 1, the module labels (G) and (S) stand for “general-use” and “specific-use” modules, respectively. For the general-use modules, NICAN requires no customized integration with an application; application-specific features are included in the specific-use NICAN modules. The modifications to the Manager differentiate this integration model from the previous one in [8].
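The selection rule just described fits in a few lines. The sketch below (Python, illustrative only; the function name, its arguments, and the job representation are assumptions, not NICAN code) encodes the two decisions: switch a multi-processor conventional job to direct, and allow at most one conventional job per SMP node.

```python
def choose_scf_method(requested, procs_on_node, peer_methods):
    """Pick 'direct' or 'conventional' for a new GAMESS job on one node.

    requested      -- SCF method the job was submitted with
    procs_on_node  -- processors of this node assigned to the job
    peer_methods   -- methods of peer GAMESS jobs already on the node
    """
    if requested == "conventional":
        # Rule 1: a parallel conventional job on more than one processor
        # of an SMP node would contend with itself for the disk.
        if procs_on_node > 1:
            return "direct"
        # Rule 2: only one conventional job per node, so the disk I/O
        # channel is never shared by two out-of-core jobs.
        if "conventional" in peer_methods:
            return "direct"
    return requested

# A sequential conventional job arriving on a node that already runs a
# conventional peer is switched to direct; otherwise it keeps its method.
assert choose_scf_method("conventional", 1, ["conventional"]) == "direct"
assert choose_scf_method("conventional", 1, ["direct"]) == "conventional"
assert choose_scf_method("conventional", 2, []) == "direct"
```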

3. Test Results

In the experiments (Figure 2), we queued 12 four-processor parallel jobs to the Portable Batch System (PBS) scheduler [1]. By choosing 12 jobs, we simulated a rather large pool of similar (identical) parallel jobs, where identical means that each job computes the structure of the same Luciferin molecule [5]. One half of these jobs was preset to use the direct (D) SCF method; the other half used the conventional (C) method. We compared the execution of parallel GAMESS jobs with and without the co-scheduler.

Figure 2. Execution of twelve parallel GAMESS jobs on four processors

GAMESS without the co-scheduler had the best performance (Figure 2, label Best) when the two types of jobs (direct and conventional) were queued in alternating order. Conversely, it performed the worst (Figure 2, label Worst) when all jobs with the same SCF algorithm were queued back to back. The other x-axis labels correspond to three samples of arbitrarily ordered queues. Since the best and worst total throughput are observed when the peer jobs are assigned different and identical patterns of execution, respectively, switching from one algorithm to the other, as performed by the co-scheduler, yields better performance regardless of the job order.
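For concreteness, the two extreme queue orderings compared above can be written out as follows (an illustrative reconstruction from the text, not the authors’ submission scripts):

```python
# "Best": direct (D) and conventional (C) jobs alternate in the queue.
best = ["D", "C"] * 6           # D C D C D C D C D C D C

# "Worst": all jobs with the same SCF algorithm are queued back to back.
worst = ["D"] * 6 + ["C"] * 6   # D D D D D D C C C C C C
```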

4. Summary

In this work, we have shown how to improve the concurrent execution of multiple instances of parallel GAMESS jobs submitted to a cluster of SMP nodes. The advantage is observed when a GAMESS calculation adapts its execution pattern by choosing either the in-core or the out-of-core implementation. The choice depends on the system characteristics and on the execution patterns of the peer GAMESS calculations. In particular, the throughput of the adaptive GAMESS calculations was observed to be higher than that of the non-adaptive ones.

We have enabled the adaptation process by integrating GAMESS with the NICAN middleware. NICAN modules discover system parameters, analyze peer GAMESS jobs, and choose a particular execution pattern for GAMESS. An attractive feature of such an implementation is that it requires only a few lines of trivial modifications to the original GAMESS code.
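To illustrate why the application-side change can stay this small, here is a self-contained toy (the Notifier class and all names below are stand-ins invented for this sketch; the paper does not show NICAN’s actual API): the application registers a single callback, and the scheduling logic lives entirely on the middleware side.

```python
class Notifier:
    """Stand-in for a NICAN-like notification layer."""
    def __init__(self):
        self._handlers = []

    def register(self, handler):
        self._handlers.append(handler)

    def notify(self, message):
        for handler in self._handlers:
            handler(message)

scf_method = "conventional"

def on_notification(message):
    # The only application-level change: accept a method switch
    # before the main computation starts.
    global scf_method
    if message in ("direct", "conventional"):
        scf_method = message

notifier = Notifier()
notifier.register(on_notification)   # the "few lines" added to the code
notifier.notify("direct")            # co-scheduler requests a switch
assert scf_method == "direct"
```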

References

[1] Bayucan et al. Portable Batch System: Administrator Guide. OpenPBS 2.3, August 2000.

[2] F. Chang and V. Karamcheti. A Framework for Automatic Adaptation of Tunable Distributed Applications. Cluster Computing, 4(1):49–62, 2001.

[3] J. Hollingsworth and P. Keleher. Prediction and Adaptation in Active Harmony. Cluster Computing, 2(3):195–205, 1999.

[4] LoadLeveler for AIX 5L and Linux V3.2: Using and Administering. IBM LoadLeveler Publications, Third Edition, May 2004.

[5] E. H. White, F. Capra, and W. D. McElroy. The Structure and Synthesis of Firefly Luciferin. J. Am. Chem. Soc., 83(10):2402–2403, 1961.

[6] Schmidt et al. General Atomic and Molecular Electronic Structure System. J. Computational Chemistry, 14:1347–1363, 1993.

[7] M. Sosonkina. Adapting Distributed Scientific Applications to Run-time Network Conditions. In PARA’04 Workshop on State-of-the-Art in Scientific Computing, Denmark, June 20–23, 2004.

[8] N. Ustemirov, M. Sosonkina, M. S. Gordon, and M. W. Schmidt. Concurrent Execution of Electronic Structure Calculations in SMP Environments. In Proceedings of HPC 2005, April 2005.