
    Exploring Virtual Workspace Concepts in a Dynamic Universe for Condor

    Quinn Lewis

    ABSTRACT

    Virtualization offers a cost-effective and flexible way to use and manage computing resources.
    Such an abstraction is appealing in grid computing for better matching jobs (applications) to
    computational resources. This paper applies the virtual workspace concept introduced in the
    Globus Toolkit to the Condor workload management system. It allows existing computing resources
    to be dynamically provisioned at run-time by users based on application requirements instead of
    statically at design-time.

    INTRODUCTION

    A common goal of computer systems is to minimize cost while maximizing other criteria, such as
    performance, reliability, and scalability, to achieve the objectives of the user(s). In Grid
    computing, a scalable way to harness large amounts of computing power across various
    organizations is to aggregate many relatively inexpensive computing resources. Coordinating
    these distributed and heterogeneous computing resources for the purposes of potentially many
    users can be difficult. In such an environment, resource consumers have varying, specific, and
    demanding requirements and preferences for how they would like their applications and services
    to leverage the resources made available by resource providers. Resource providers must ensure
    the resources meet a certain quality of service (e.g., make resources securely and consistently
    available to several concurrent users).

    In the past, control over the availability, quantity, and software configurations of resources
    has been limited to the resource provider. With virtualization, it becomes possible for resource
    providers to offer more control of the resources to a user without sacrificing quality of
    service to other resource consumers. Users (resource consumers) can more easily create execution
    environments that meet the needs of their applications and jobs within the policies defined by
    the resource providers. Such a relationship, enabled by virtualization, is both cost-effective
    and flexible for the resource provider and consumer [1].

    The virtual workspace term, initially coined in [2] for use with the Globus Toolkit, "is an
    abstraction of an execution environment that can be made dynamically available to authorized
    clients by using well-defined protocols". This execution environment can encompass several
    physical resources. Generically, this concept could be implemented in various ways; however,
    virtualization has proven itself to be a practicable implementation [3].

    Condor, "is a specialized workload management system for compute-intensive

    jobs" [4]. Condor currently abstracts the resources of a single physical machine into

    virtual machines which can run multiple jobs at the same time [5]. A "universe" is used

    to statically describe the execution environment in which the jobs are expected to run.

    This approach assumes the resources (whether real or virtual) have to all be allocated in

    advance. While there is support for adding more resources to an existing pool via the

    Glide-in mechanism, the user still has to dedicate the use of these other physical

    resources.
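
    To make this static model concrete, the minimal sketch below (not taken from the original paper;
    the executable, file names, and requirement values are hypothetical, and Python driving the
    command-line tools stands in for whatever submission client is used) writes a conventional
    Condor submit description, in which the universe and resource requirements are fixed before
    submission, and hands it to condor_submit:

        import subprocess, textwrap

        # Hypothetical submit description: the universe and the resource
        # requirements are declared statically, before any resource is chosen.
        submit_description = textwrap.dedent("""\
            universe     = vanilla
            executable   = my_job.exe
            requirements = (OpSys == "WINNT51") && (Memory >= 128)
            output       = my_job.out
            error        = my_job.err
            log          = my_job.log
            queue
        """)

        with open("my_job.sub", "w") as f:
            f.write(submit_description)

        # condor_submit can only match the job against resources that already
        # exist (and are running) in the pool.
        subprocess.run(["condor_submit", "my_job.sub"], check=True)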

    The purpose of this paper is to describe how a Condor execution environment (universe) can be
    dynamically created at run-time by users to more flexibly and cost-effectively use and manage
    existing resources using virtualization. Two of the unique implementation details described in
    this paper are the use of Microsoft Windows and Microsoft Virtual Server 2005 R2 for the virtual
    machine manager (VMM) on the host operating system (instead of being Linux-based using Xen or
    VMware) and the use of differencing virtual hard disks. More details about virtual workspaces
    and similar attempts to virtualize Condor are described in Related Work. The implementation
    details of the work performed for a dynamic Condor universe are provided along with performance
    test results. Future enhancements are included for making this work-in-progress more robust.

    RELATED WORK

    While virtualization has a number of applications for business computing and software
    development and testing, the work outlined in this paper most directly applies to technical
    computing, including Grid computing, clusters, and resource-scavenging systems.

    Grid Computing

    The use of virtualization in Grid computing has been proposed before, touting the benefits of
    legacy application support, improved security, and the ability to deploy computation
    independently of site administration. The challenges of dynamically creating and managing
    virtual machines are also described [6]. The virtual workspace concept [7] extended [6] to
    present "a unified abstraction" and address additional issues associated with the complexities
    of managing such an environment in the Grid. Two key differences between the Grid-related work
    mentioned and this paper are the emphasis on dynamically creating the execution environment at
    run-time and the (Microsoft) virtualization software employed.


    As mentioned previously, the Condor Glide-in mechanism works in conjunction with the Globus
    Toolkit to temporarily make Globus resources available to a user's Condor pool. This has the
    advantage of being able to submit Condor jobs using Condor capabilities (matchmaking and
    scheduling) on Globus-managed resources [8]. However, it is expected that the user acquire these
    remote resources before the jobs are executed. Using virtualization allows the existing local
    Condor resources to be leveraged as the jobs require.

    Clusters

    Many of the same motivations that exist for this work have also been applied to clusters
    [9, 10], but that work focuses more on dynamically provisioning homogeneous execution
    environments on resources. Although perhaps accommodated in the design of Cluster-on-Demand
    [9], virtualization technology is not used in the implementation of that system. The resources
    are assumed to physically exist, and the software is deployed by re-imaging the machine. In
    [10], virtualization is used to provision the software on the cluster(s), but the time required
    to stage in the virtual image(s) is costly. The use of the differencing virtual hard disk image
    type in this work offers a mitigating solution to this problem [11].

    Condor

    Additional work with virtualization and Condor focuses on exploiting Condor's cycle-stealing
    capability at the University of Nebraska-Lincoln to transform typical Windows campus machines
    into the Unix-based machines required by researchers [12]. That solution leveraged coLinux to
    run a Condor compute node through a Windows device driver [13]. While some of the same
    motivation exists for this work, using a virtualization technology such as Virtual Server 2005
    R2 allows other operating systems and versions to be used and provides more flexible ways to
    programmatically control the dynamic environment.

    IMPLEMENTATION

    We leverage Condor's existing ability to schedule jobs, advertise resource availability, and
    match jobs to resources, and introduce a flexible extension for dynamically describing,
    deploying, and using virtual execution resources in the Condor universe.

    In Condor, one or more machines (resources) along with jobs (resource requests) are part of a
    collection, known as a pool. The resources in the pool have one or more of the following roles:
    Central Manager, Execute, and/or Submit. The Central Manager collects information and negotiates
    how jobs are matched to available resources. Submit resources allow jobs to be submitted to the
    Condor pool through a description of the job and its requirements. Execute resources run jobs
    submitted by users after having been matched and negotiated by the Central Manager [14].

    We extend the responsibilities of each of these three roles to incorporate virtualization into
    Condor. Each Execute resource describes (to the Central Manager) the extent to which it can be
    virtualized and is responsible for hosting additional (virtual) resources. The Submit
    resource(s) takes a workflow of jobs and requirements, initiates the deployment of the virtual
    resources, and signals their usage (start/stop) to the host/Execute machine. The Central Manager
    is responsible for storing virtual machine metadata used for scheduling. For this
    implementation, a single machine is used for the Central Manager, Submit, and Execute roles.


    The virtualization capabilities of a particular Execute resource can be published to the Central
    Manager via authorized use of condor_advertise. Attributes about the virtual Execute resources,
    such as the operating system (and version), available memory and disk space, and more specific
    data about the status of the virtual machine, are included. Currently, the host Execute resource
    invokes condor_advertise for each guest, or virtual Execute resource, it anticipates hosting at
    start-up. This approach allows virtual resources to appear almost indistinguishable from real
    physical resources and to be included in Condor's resource scheduling. Note that the real
    resources are running while the virtual resources are not; they have only been described.
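
    A minimal sketch of how such an anticipatory advertisement might be produced follows. The
    guest's name and attribute values are hypothetical, and IsVirtualResource and
    VirtualMachineState are invented stand-ins for the "more specific data" about virtual machine
    status mentioned above; condor_advertise with the UPDATE_STARTD_AD command is standard Condor.

        import subprocess, textwrap

        # Machine (startd) ClassAd for a guest that is only anticipated, not
        # running.  OpSys, Arch, Memory, and Disk are standard Machine-ad
        # attributes; IsVirtualResource and VirtualMachineState are
        # hypothetical custom attributes describing the guest's status.
        guest_ad = textwrap.dedent("""\
            MyType = "Machine"
            Name = "vm-debian31@host.example.org"
            OpSys = "LINUX"
            Arch = "INTEL"
            Memory = 128
            Disk = 2000000
            IsVirtualResource = True
            VirtualMachineState = "NotDeployed"
        """)

        with open("guest_ad.txt", "w") as f:
            f.write(guest_ad)

        # Publish the ad to the Central Manager's collector so the anticipated
        # guest shows up in condor_status alongside real machines.
        subprocess.run(["condor_advertise", "UPDATE_STARTD_AD", "guest_ad.txt"],
                       check=True)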

    Using the standard Condor tools, such as condor_status, users can view the resources (real and
    virtual) available in the pool. Users can then create workflows (using Windows Workflow
    Foundation [16]) for one or more jobs intended to run on the provided resources. Since the
    virtual resource(s) may not be running when a job is submitted, the initial scheduling will
    fail. Fortunately, Condor provides a SOAP-based API for submitting and querying jobs [15]. Using
    this Condor API via workflows, unsuccessful job submissions can be checked against the
    attributes of the advertised machine to determine whether the resource is a virtual machine and
    whether it needs to be deployed and/or started.
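
    The sketch below outlines only that decision flow. It substitutes condor_status for the SOAP API
    used in the actual implementation, the helper functions are hypothetical placeholders for the
    corresponding workflow activities, and IsVirtualResource and VirtualMachineState are the same
    invented attributes used in the advertisement sketch above.

        import subprocess

        def machine_ad(name):
            """Fetch the advertised ClassAd for `name` as a dict of strings.
            Uses condor_status here; the paper queries via the Condor SOAP API."""
            out = subprocess.run(["condor_status", "-long", name],
                                 capture_output=True, text=True, check=True).stdout
            ad = {}
            for line in out.splitlines():
                key, sep, value = line.partition("=")
                if sep:
                    ad[key.strip()] = value.strip().strip('"')
            return ad

        def deploy_virtual_machine(ad):
            # Placeholder: transfer or locate the guest's virtual hard disk files.
            print("would deploy", ad["Name"])

        def start_virtual_machine(ad):
            # Placeholder: start the guest (see the Virtual Server sketch below).
            print("would start", ad["Name"])

        def handle_failed_schedule(intended_resource):
            """After an initial schedule fails: deploy and/or start the advertised
            virtual resource, then let Condor re-match the job."""
            ad = machine_ad(intended_resource)
            if ad.get("IsVirtualResource") != "True":
                return                                  # real machine; just retry
            if ad.get("VirtualMachineState") == "NotDeployed":
                deploy_virtual_machine(ad)
            if ad.get("VirtualMachineState") != "Running":
                start_virtual_machine(ad)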

    The user can indicate specific job requirements in the workflow. These requirements can
    optionally specify the location of the files required to run the virtual machine, for consumer
    flexibility (assuming the provider has allowed it). These files provide the operating system and
    necessary configuration (including Condor) for executing the job. The workflow is invoked by the
    Submit machine. If the virtual resource is specified by the workflow, the workflow manager on
    the Submit machine either transfers the virtual machine files to the Execute resource or
    provides the Execute resource with the location and protocol for retrieving the virtual machine
    files. (The automatic copying of virtual images was not completely implemented for this paper.)
    For performance, it is expected that host Execute machines have base virtual images, providing
    the operating system and Condor, local to the resource. Additional software and configuration
    can be added in a separate file that stores only the blocks modified relative to a parent hard
    disk file, called a differencing virtual hard disk. This provides a flexible balance, allowing
    resource providers to supply base images and giving resource consumers the ability to extend
    them.
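
    The staging step might look like the following sketch, in which all paths and file names are
    hypothetical: only the small differencing disk is copied to the host, while the base image is
    assumed to already reside there.

        import os, shutil

        # Hypothetical paths: the provider keeps the large base image local to
        # each Execute host, so the consumer ships only the differencing disk.
        BASE_VHD = r"D:\vhds\base-win2000.vhd"        # already on the Execute host
        DIFF_VHD = r"condor-and-apps.vhd"             # consumer's differencing disk
        DEST_DIR = r"\\execute-host\vhds\guests"      # share exposed by the host

        # Stage only the differencing disk; the differencing VHD itself records
        # which parent (base) image its unmodified blocks come from [11].
        os.makedirs(DEST_DIR, exist_ok=True)
        shutil.copy(DIFF_VHD, DEST_DIR)
        print("staged", os.path.basename(DIFF_VHD), "against base", BASE_VHD)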

    The workflow, running on the Submit machine, also provides the logic for starting the virtual
    resource on the host. Microsoft Virtual Server 2005 R2 provides an API for managing local and
    remote virtual machines, and the workflow leverages this API for starting the virtual resources.
    For this paper, it is assumed that virtual resources are started from a cold state; as a result,
    startup times are as long as a normal boot of the respective operating system.
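
    For illustration, the sketch below starts a guest through the Virtual Server COM automation
    interface from Python (pywin32) rather than from a Windows Workflow Foundation activity; the
    virtual machine name is hypothetical, and the exact COM calls should be verified against the
    Virtual Server 2005 R2 documentation.

        import win32com.client   # pywin32; Windows host only

        # Attach to the local Virtual Server service and look up the guest by name.
        virtual_server = win32com.client.Dispatch("VirtualServer.Application")
        vm = virtual_server.FindVirtualMachine("condor-guest-win2000")

        # Startup() returns asynchronously; because the guest boots from a cold
        # state, its Condor daemons only become available after a full OS boot.
        task = vm.Startup()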

    PERFORMANCE TESTS AND MEASUREMENTS

    To test performance, a machine with a 2 GHz AMD Athlon 64 processor and 1 GB of RAM running
    Windows XP served in the Central Manager, Execute, and Submit roles. Two virtual Execute
    machines, running Debian Linux 3.1 and Windows 2000, each with 128 MB of RAM, were created. A
    virtual network was created to allow communication between the three different operating
    systems, each running Condor.


    The MEME [17] bioinformatics application was used as the test job. Initially, a MEME job was
    submitted to the Condor pool using the standard Condor command-line tools (e.g.,
    condor_submit). With the test input and configuration options, job submission, execution, and
    retrieval of results took less than one minute.

    Using Windows Workflow Foundation and Visual Studio, a graphical workflow was constructed that
    submitted the same MEME job to the pool, specifically requesting a Windows 2000 or Linux
    resource. The same test input and configuration options took 6 to 8 minutes on average. Since
    the virtual machines are programmatically started only after an initial job schedule fails and
    currently start from a cold state, the start times include the setup and also reflect the time
    for the operating system to boot. There is also an unresolved issue with the (5-minute) cycle
    time between scheduling attempts when using the Condor SOAP API [18].

    Additionally, the Windows 2000 virtual machine was created as a base image (932 MB) with a
    differencing virtual disk that included Condor and other support software (684 MB). Since the
    differencing disks use a sector bitmap to indicate which sectors are within the current disk
    (1s) or on the parent (0s), the specification [11] suggests it may be possible to achieve
    performance improvements. It also lent itself well to compression. The 684 MB difference disk
    was compressed to 116 MB (using standard ZIP compression). This file could be transferred over a
    standard broadband Internet connection in 3.7 minutes (at 511.88 Kb/s) as opposed to 30 minutes.
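
    A sketch of the compress-and-estimate step is shown below; the file name, the link rate, and its
    unit are placeholder assumptions rather than the measured values above.

        import os, zipfile

        DIFF_VHD = "condor-and-apps.vhd"      # hypothetical differencing disk
        LINK_RATE_KBYTES_PER_S = 512          # assumed link rate (unit is an assumption)

        # ZIP the differencing disk before shipping it to the Execute host.
        with zipfile.ZipFile(DIFF_VHD + ".zip", "w", zipfile.ZIP_DEFLATED) as z:
            z.write(DIFF_VHD)

        compressed = os.path.getsize(DIFF_VHD + ".zip")
        minutes = compressed / 1024.0 / LINK_RATE_KBYTES_PER_S / 60.0
        print(f"compressed size {compressed / 2**20:.0f} MB, "
              f"~{minutes:.1f} min at {LINK_RATE_KBYTES_PER_S} KB/s")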

    CONCLUSION AND FUTURE WORK

    A number of additional modifications are required for this solution to become more robust. For
    example, security was not considered. Also, the current times for executing short-running jobs
    are not acceptable. Another improvement would be to start the virtual machines from a hot or
    paused state. Since the virtual machines used in this exercise obtained their addresses via
    DHCP, they would need static IPs, or additional knowledge of when the virtual machines are
    un-paused would be required. The virtual hard disk(s) may be further compressed using a
    compression algorithm that takes the disk format into account. Performance considerations could
    also be given to differencing hard disks that are chained together for application
    extensibility purposes.

    This paper describes a mechanism for extending Condor to take advantage of virtualization to
    more flexibly (and cost-effectively) create an execution environment at run-time that balances
    the interests of the resource providers and consumers.


    REFERENCES

    1. Keahey, K., Foster, I., Freeman, T., Zhang, X. Virtual Workspaces: Achieving Quality of Service and Quality of Life in the Grid. CCGRID 2006, Singapore, May 2006.
    2. Keahey, K., Foster, I., Freeman, T., Zhang, X., Galron, D. Virtual Workspaces in the Grid. Euro-Par 2005, Lisbon, Portugal, September 2005.
    3. http://workspace.globus.org/vm
    4. http://www.cs.wisc.edu/condor/description.html
    5. http://www.bo.infn.it/alice/alice-doc/mll-doc/condor/node4.html
    6. Figueiredo, R., Dinda, P., Fortes, J. A Case for Grid Computing on Virtual Machines.
    7. Keahey, K., Ripeanu, M., Doering, K. Dynamic Creation and Management of Runtime Environments in the Grid.
    8. http://www.cs.wisc.edu/condor/CondorWeek2005/presentations/user_tutorial.ppt
    9. Chase, J., Irwin, D., Grit, L., Moore, J., Sprenkle, S. Dynamic Virtual Clusters in a Grid Site Manager.
    10. Zhang, X., Keahey, K., Foster, I., Freeman, T. Virtual Cluster Workspaces for Grid Applications.
    11. Virtual Hard Disk Image Format Specification. Version 1.0, October 11, 2006. Microsoft.
    12. Sumanth, J. Running Condor in a Virtual Environment with coLinux. http://www.cs.wisc.edu/condor/CondorWeek2006/presentations/sumanth_condor_colinux.ppt
    13. Santosa, M., Schaefer, A. Build a heterogeneous cluster with coLinux and openMosix. http://www-128.ibm.com/developerworks/linux/library/l-colinux/index.html
    14. Condor Version 6.9.2 Manual. http://www.cs.wisc.edu/condor/manual/v6.9/
    15. http://www.cs.wisc.edu/condor/birdbath/
    16. http://wf.netfx3.com/content/WFIntro.aspx
    17. MEME. http://meme.sdsc.edu
    18. https://lists.cs.wisc.edu/archive/condor-users/2006-May/msg00296.shtml
